From patchwork Sun Apr 9 12:12:41 2023
X-Patchwork-Submitter: James Simmons
X-Patchwork-Id: 13205939
From: James Simmons
To: Andreas Dilger, Oleg Drokin, NeilBrown
Date: Sun, 9 Apr 2023 08:12:41 -0400
Message-Id: <1681042400-15491-2-git-send-email-jsimmons@infradead.org>
X-Mailer: git-send-email 1.8.3.1
In-Reply-To: <1681042400-15491-1-git-send-email-jsimmons@infradead.org>
References: <1681042400-15491-1-git-send-email-jsimmons@infradead.org>
Subject: [lustre-devel] [PATCH 01/40] lustre: protocol: basic batching processing framework
List-Id: "For discussing Lustre software development."
Cc: Lustre Development List
Errors-To: lustre-devel-bounces@lists.lustre.org
Sender: "lustre-devel"

From: Qian Yingjin

Batch processing can boost performance. The larger the batch size, the
higher the latency for the entire batch; although the latency of the
whole batch of operations is higher than that of any single operation,
the throughput of the batch is much higher.

This patch implements the basic batch processing framework for Lustre.
It can be used for future batched statahead and WBC.

A batched RPC does not require that the opcodes of the sub requests in
a batch be the same; each sub request has its own opcode. This allows
batching not only read-only requests but also multiple modification
updates with different opcodes, and even a mixed workload that contains
both read-only requests and modification updates.

For recovery, only the batched RPC carries a client XID; there is no
separate client XID for each sub request. Although the server generates
a transno for each update sub request, the transno is only stored into
the batched RPC (in @ptlrpc_body) when the sub update request finishes.
Thus the batched RPC only stores the transno of the last sub update
request.

Only the batched RPC contains the @ptlrpc_body message field; the sub
requests in a batched RPC do not carry their own @ptlrpc_body.

A new field named @lrd_batch_idx is added to the client reply data
@lsd_reply_data. It indicates the index of the last executed sub
request in a batched RPC. When the server finishes a sub update
request, it updates @lrd_batch_idx accordingly.
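As a rough illustration of how @lrd_batch_idx can drive resend handling,
here is a hedged C sketch. The names below (struct reply_data standing in
for @lsd_reply_data, and batch_resend_check) are illustrative only and do
not appear in this patch:

```c
#include <assert.h>

/*
 * Hypothetical model of the server-side resend check. "reply_data"
 * stands in for @lsd_reply_data; only @lrd_batch_idx, the index of
 * the last executed sub request, matters here.
 */
struct reply_data {
	unsigned int lrd_batch_idx;
};

enum batch_resend_action {
	BATCH_RECONSTRUCT_REPLY,	/* already executed and committed */
	BATCH_REEXECUTE,		/* not yet executed on the server */
};

static enum batch_resend_action
batch_resend_check(const struct reply_data *lrd, unsigned int sub_idx)
{
	/*
	 * Index <= @lrd_batch_idx: the sub request was already executed
	 * and committed, so the server only reconstructs its reply.
	 */
	if (sub_idx <= lrd->lrd_batch_idx)
		return BATCH_RECONSTRUCT_REPLY;

	/* Otherwise the server re-executes the sub request. */
	return BATCH_REEXECUTE;
}
```

Because only the batched RPC carries a client XID, this single index is
enough to resume a partially executed batch at the correct sub request.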
When the server finds that a batched RPC is a resend, and the index of
a sub request in the batch is less than or equal to @lrd_batch_idx in
the reply data, that sub request has already been executed and
committed, so the server reconstructs the reply for it; if the index
is larger than @lrd_batch_idx, the server re-executes the sub request
in the batched RPC.

To simplify the reply and resend handling of batched RPCs, batch
processing stops at the first failure in the current design.

WC-bug-id: https://jira.whamcloud.com/browse/LU-14393
Lustre-commit: 840274b5c5e95e44a ("LU-14393 protocol: basic batching processing framework")
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/41378
Lustre-commit: 178988d67aa2f83aa ("LU-14393 recovery: reply reconstruction for batched RPCs")
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/48228
Signed-off-by: Qian Yingjin
Reviewed-by: Mikhail Pershin
Reviewed-by: Andreas Dilger
Reviewed-by: Alex Zhuravlev
Reviewed-by: Oleg Drokin
Signed-off-by: James Simmons
---
 fs/lustre/include/lustre_net.h         |  43 +++
 fs/lustre/include/lustre_req_layout.h  |  28 +-
 fs/lustre/include/lustre_swab.h        |   3 +
 fs/lustre/include/obd.h                |  78 +++++
 fs/lustre/include/obd_class.h          |  48 +++
 fs/lustre/lmv/lmv_internal.h           |  12 +
 fs/lustre/lmv/lmv_obd.c                | 173 ++++++++++
 fs/lustre/mdc/Makefile                 |   2 +-
 fs/lustre/mdc/mdc_batch.c              |  62 ++++
 fs/lustre/mdc/mdc_internal.h           |   3 +
 fs/lustre/mdc/mdc_request.c            |   4 +
 fs/lustre/ptlrpc/Makefile              |   2 +-
 fs/lustre/ptlrpc/batch.c               | 588 +++++++++++++++++++++++++++++++++
 fs/lustre/ptlrpc/client.c              |  25 ++
 fs/lustre/ptlrpc/layout.c              | 126 ++++++-
 fs/lustre/ptlrpc/lproc_ptlrpc.c        |   1 +
 fs/lustre/ptlrpc/pack_generic.c        |  27 +-
 fs/lustre/ptlrpc/wiretest.c            |  12 +-
 include/uapi/linux/lustre/lustre_idl.h |  82 ++++-
 19 files changed, 1280 insertions(+), 39 deletions(-)
 create mode 100644 fs/lustre/mdc/mdc_batch.c
 create mode 100644 fs/lustre/ptlrpc/batch.c

diff --git a/fs/lustre/include/lustre_net.h
b/fs/lustre/include/lustre_net.h index 1605fcc..1ffe9f7 100644 --- a/fs/lustre/include/lustre_net.h +++ b/fs/lustre/include/lustre_net.h @@ -1161,6 +1161,13 @@ struct ptlrpc_bulk_frag_ops { struct page *page, int pageoffset, int len); /** + * Add a @fragment to the bulk descriptor @desc. + * Data to transfer in the fragment is pointed to by @frag + * The size of the fragment is @len + */ + int (*add_iov_frag)(struct ptlrpc_bulk_desc *desc, void *frag, int len); + + /** * Uninitialize and free bulk descriptor @desc. * Works on bulk descriptors both from server and client side. */ @@ -1170,6 +1177,42 @@ struct ptlrpc_bulk_frag_ops { extern const struct ptlrpc_bulk_frag_ops ptlrpc_bulk_kiov_pin_ops; extern const struct ptlrpc_bulk_frag_ops ptlrpc_bulk_kiov_nopin_ops; +static inline bool req_capsule_ptlreq(struct req_capsule *pill) +{ + struct ptlrpc_request *req = pill->rc_req; + + return req && pill == &req->rq_pill; +} + +static inline bool req_capsule_subreq(struct req_capsule *pill) +{ + struct ptlrpc_request *req = pill->rc_req; + + return !req || pill != &req->rq_pill; +} + +/** + * Returns true if request needs to be swabbed into local cpu byteorder + */ +static inline bool req_capsule_req_need_swab(struct req_capsule *pill) +{ + struct ptlrpc_request *req = pill->rc_req; + + return req && req_capsule_req_swabbed(&req->rq_pill, + MSG_PTLRPC_HEADER_OFF); +} + +/** + * Returns true if request reply needs to be swabbed into local cpu byteorder + */ +static inline bool req_capsule_rep_need_swab(struct req_capsule *pill) +{ + struct ptlrpc_request *req = pill->rc_req; + + return req && req_capsule_rep_swabbed(&req->rq_pill, + MSG_PTLRPC_HEADER_OFF); +} + /** * Definition of bulk descriptor. 
* Bulks are special "Two phase" RPCs where initial request message diff --git a/fs/lustre/include/lustre_req_layout.h b/fs/lustre/include/lustre_req_layout.h index 9f22134b..a7ed89b 100644 --- a/fs/lustre/include/lustre_req_layout.h +++ b/fs/lustre/include/lustre_req_layout.h @@ -82,7 +82,9 @@ void req_capsule_init(struct req_capsule *pill, struct ptlrpc_request *req, void req_capsule_set(struct req_capsule *pill, const struct req_format *fmt); size_t req_capsule_filled_sizes(struct req_capsule *pill, enum req_location loc); -int req_capsule_server_pack(struct req_capsule *pill); +int req_capsule_server_pack(struct req_capsule *pill); +int req_capsule_client_pack(struct req_capsule *pill); +void req_capsule_set_replen(struct req_capsule *pill); void *req_capsule_client_get(struct req_capsule *pill, const struct req_msg_field *field); @@ -150,22 +152,6 @@ static inline bool req_capsule_rep_swabbed(struct req_capsule *pill, } /** - * Returns true if request needs to be swabbed into local cpu byteorder - */ -static inline bool req_capsule_req_need_swab(struct req_capsule *pill) -{ - return req_capsule_req_swabbed(pill, MSG_PTLRPC_HEADER_OFF); -} - -/** - * Returns true if request reply needs to be swabbed into local cpu byteorder - */ -static inline bool req_capsule_rep_need_swab(struct req_capsule *pill) -{ - return req_capsule_rep_swabbed(pill, MSG_PTLRPC_HEADER_OFF); -} - -/** * Mark request buffer at offset \a index that it was already swabbed */ static inline void req_capsule_set_req_swabbed(struct req_capsule *pill, @@ -295,6 +281,14 @@ static inline void req_capsule_set_rep_swabbed(struct req_capsule *pill, extern struct req_format RQF_CONNECT; +/* Batch UpdaTe req_format */ +extern struct req_format RQF_MDS_BATCH; + +/* Batch UpdaTe format */ +extern struct req_msg_field RMF_BUT_REPLY; +extern struct req_msg_field RMF_BUT_HEADER; +extern struct req_msg_field RMF_BUT_BUF; + extern struct req_msg_field RMF_GENERIC_DATA; extern struct req_msg_field 
RMF_PTLRPC_BODY; extern struct req_msg_field RMF_MDT_BODY; diff --git a/fs/lustre/include/lustre_swab.h b/fs/lustre/include/lustre_swab.h index 000e622..eda3532 100644 --- a/fs/lustre/include/lustre_swab.h +++ b/fs/lustre/include/lustre_swab.h @@ -96,6 +96,9 @@ void lustre_swab_lov_user_md_objects(struct lov_user_ost_data *lod, void lustre_swab_hsm_user_state(struct hsm_user_state *hus); void lustre_swab_hsm_user_item(struct hsm_user_item *hui); void lustre_swab_hsm_request(struct hsm_request *hr); +void lustre_swab_but_update_header(struct but_update_header *buh); +void lustre_swab_but_update_buffer(struct but_update_buffer *bub); +void lustre_swab_batch_update_reply(struct batch_update_reply *bur); void lustre_swab_swap_layouts(struct mdc_swap_layouts *msl); void lustre_swab_close_data(struct close_data *data); void lustre_swab_close_data_resync_done(struct close_data_resync_done *resync); diff --git a/fs/lustre/include/obd.h b/fs/lustre/include/obd.h index e9752a3..a980bf0 100644 --- a/fs/lustre/include/obd.h +++ b/fs/lustre/include/obd.h @@ -835,7 +835,14 @@ struct md_readdir_info { struct md_op_item; typedef int (*md_op_item_cb_t)(struct md_op_item *item, int rc); +enum md_item_opcode { + MD_OP_NONE = 0, + MD_OP_GETATTR = 1, + MD_OP_MAX, +}; + struct md_op_item { + enum md_item_opcode mop_opc; struct md_op_data mop_data; struct lookup_intent mop_it; struct lustre_handle mop_lockh; @@ -847,6 +854,69 @@ struct md_op_item { struct work_struct mop_work; }; +enum lu_batch_flags { + BATCH_FL_NONE = 0x0, + /* All requests in a batch are read-only. */ + BATCH_FL_RDONLY = 0x1, + /* Will create PTLRPC request set for the batch. */ + BATCH_FL_RQSET = 0x2, + /* Whether need sync commit. */ + BATCH_FL_SYNC = 0x4, +}; + +struct lu_batch { + struct ptlrpc_request_set *lbt_rqset; + __s32 lbt_result; + __u32 lbt_flags; + /* Max batched SUB requests count in a batch. 
*/ + __u32 lbt_max_count; +}; + +struct batch_update_head { + struct obd_export *buh_exp; + struct lu_batch *buh_batch; + int buh_flags; + __u32 buh_count; + __u32 buh_update_count; + __u32 buh_buf_count; + __u32 buh_reqsize; + __u32 buh_repsize; + __u32 buh_batchid; + struct list_head buh_buf_list; + struct list_head buh_cb_list; +}; + +struct object_update_callback; +typedef int (*object_update_interpret_t)(struct ptlrpc_request *req, + struct lustre_msg *repmsg, + struct object_update_callback *ouc, + int rc); + +struct object_update_callback { + struct list_head ouc_item; + object_update_interpret_t ouc_interpret; + struct batch_update_head *ouc_head; + void *ouc_data; +}; + +typedef int (*md_update_pack_t)(struct batch_update_head *head, + struct lustre_msg *reqmsg, + size_t *max_pack_size, + struct md_op_item *item); + +struct cli_batch { + struct lu_batch cbh_super; + struct batch_update_head *cbh_head; +}; + +struct lu_batch *cli_batch_create(struct obd_export *exp, + enum lu_batch_flags flags, __u32 max_count); +int cli_batch_stop(struct obd_export *exp, struct lu_batch *bh); +int cli_batch_flush(struct obd_export *exp, struct lu_batch *bh, bool wait); +int cli_batch_add(struct obd_export *exp, struct lu_batch *bh, + struct md_op_item *item, md_update_pack_t packer, + object_update_interpret_t interpreter); + struct obd_ops { struct module *owner; int (*iocontrol)(unsigned int cmd, struct obd_export *exp, int len, @@ -1086,6 +1156,14 @@ struct md_ops { const union lmv_mds_md *lmv, size_t lmv_size); int (*rmfid)(struct obd_export *exp, struct fid_array *fa, int *rcs, struct ptlrpc_request_set *set); + struct lu_batch *(*batch_create)(struct obd_export *exp, + enum lu_batch_flags flags, + u32 max_count); + int (*batch_stop)(struct obd_export *exp, struct lu_batch *bh); + int (*batch_flush)(struct obd_export *exp, struct lu_batch *bh, + bool wait); + int (*batch_add)(struct obd_export *exp, struct lu_batch *bh, + struct md_op_item *item); }; static inline 
struct md_open_data *obd_mod_alloc(void) diff --git a/fs/lustre/include/obd_class.h b/fs/lustre/include/obd_class.h index 81ef59e..e4ad600 100644 --- a/fs/lustre/include/obd_class.h +++ b/fs/lustre/include/obd_class.h @@ -1673,6 +1673,54 @@ static inline int md_rmfid(struct obd_export *exp, struct fid_array *fa, return MDP(exp->exp_obd, rmfid)(exp, fa, rcs, set); } +static inline struct lu_batch * +md_batch_create(struct obd_export *exp, enum lu_batch_flags flags, + __u32 max_count) +{ + int rc; + + rc = exp_check_ops(exp); + if (rc) + return ERR_PTR(rc); + + return MDP(exp->exp_obd, batch_create)(exp, flags, max_count); +} + +static inline int md_batch_stop(struct obd_export *exp, struct lu_batch *bh) +{ + int rc; + + rc = exp_check_ops(exp); + if (rc) + return rc; + + return MDP(exp->exp_obd, batch_stop)(exp, bh); +} + +static inline int md_batch_flush(struct obd_export *exp, struct lu_batch *bh, + bool wait) +{ + int rc; + + rc = exp_check_ops(exp); + if (rc) + return rc; + + return MDP(exp->exp_obd, batch_flush)(exp, bh, wait); +} + +static inline int md_batch_add(struct obd_export *exp, struct lu_batch *bh, + struct md_op_item *item) +{ + int rc; + + rc = exp_check_ops(exp); + if (rc) + return rc; + + return MDP(exp->exp_obd, batch_add)(exp, bh, item); +} + /* OBD Metadata Support */ int obd_init_caches(void); diff --git a/fs/lustre/lmv/lmv_internal.h b/fs/lustre/lmv/lmv_internal.h index 9e89f88..64ec4ae 100644 --- a/fs/lustre/lmv/lmv_internal.h +++ b/fs/lustre/lmv/lmv_internal.h @@ -42,6 +42,18 @@ #define LL_IT2STR(it) \ ((it) ? 
ldlm_it2str((it)->it_op) : "0") +struct lmvsub_batch { + struct lu_batch *sbh_sub; + struct lmv_tgt_desc *sbh_tgt; + struct list_head sbh_sub_item; +}; + +struct lmv_batch { + struct lu_batch lbh_super; + struct ptlrpc_request_set *lbh_rqset; + struct list_head lbh_sub_batch_list; +}; + int lmv_intent_lock(struct obd_export *exp, struct md_op_data *op_data, struct lookup_intent *it, struct ptlrpc_request **reqp, ldlm_blocking_callback cb_blocking, diff --git a/fs/lustre/lmv/lmv_obd.c b/fs/lustre/lmv/lmv_obd.c index 3a02cc1..64d16d8 100644 --- a/fs/lustre/lmv/lmv_obd.c +++ b/fs/lustre/lmv/lmv_obd.c @@ -3790,6 +3790,175 @@ static int lmv_merge_attr(struct obd_export *exp, return 0; } +static struct lu_batch *lmv_batch_create(struct obd_export *exp, + enum lu_batch_flags flags, + __u32 max_count) +{ + struct lu_batch *bh; + struct lmv_batch *lbh; + + lbh = kzalloc(sizeof(*lbh), GFP_NOFS); + if (!lbh) + return ERR_PTR(-ENOMEM); + + bh = &lbh->lbh_super; + bh->lbt_flags = flags; + bh->lbt_max_count = max_count; + + if (flags & BATCH_FL_RQSET) { + bh->lbt_rqset = ptlrpc_prep_set(); + if (!bh->lbt_rqset) { + kfree(lbh); + return ERR_PTR(-ENOMEM); + } + } + + INIT_LIST_HEAD(&lbh->lbh_sub_batch_list); + return bh; +} + +static int lmv_batch_stop(struct obd_export *exp, struct lu_batch *bh) +{ + struct lmv_batch *lbh; + struct lmvsub_batch *sub; + struct lmvsub_batch *tmp; + int rc = 0; + + lbh = container_of(bh, struct lmv_batch, lbh_super); + list_for_each_entry_safe(sub, tmp, &lbh->lbh_sub_batch_list, + sbh_sub_item) { + list_del(&sub->sbh_sub_item); + rc = md_batch_stop(sub->sbh_tgt->ltd_exp, sub->sbh_sub); + if (rc < 0) { + CERROR("%s: stop batch processing failed: rc = %d\n", + exp->exp_obd->obd_name, rc); + if (bh->lbt_result == 0) + bh->lbt_result = rc; + } + kfree(sub); + } + + if (bh->lbt_flags & BATCH_FL_RQSET) { + rc = ptlrpc_set_wait(NULL, bh->lbt_rqset); + ptlrpc_set_destroy(bh->lbt_rqset); + } + + kfree(lbh); + return rc; +} + +static int 
lmv_batch_flush(struct obd_export *exp, struct lu_batch *bh,
+			   bool wait)
+{
+	struct lmv_batch *lbh;
+	struct lmvsub_batch *sub;
+	int rc = 0;
+	int rc1;
+
+	lbh = container_of(bh, struct lmv_batch, lbh_super);
+	list_for_each_entry(sub, &lbh->lbh_sub_batch_list, sbh_sub_item) {
+		rc1 = md_batch_flush(sub->sbh_tgt->ltd_exp, sub->sbh_sub, wait);
+		if (rc1 < 0) {
+			CERROR("%s: batch flush failed: rc = %d\n",
+			       exp->exp_obd->obd_name, rc1);
+			if (bh->lbt_result == 0)
+				bh->lbt_result = rc1;
+
+			if (rc == 0)
+				rc = rc1;
+		}
+	}
+
+	if (wait && bh->lbt_flags & BATCH_FL_RQSET) {
+		rc1 = ptlrpc_set_wait(NULL, bh->lbt_rqset);
+		if (rc == 0)
+			rc = rc1;
+	}
+
+	return rc;
+}
+
+static inline struct lmv_tgt_desc *
+lmv_batch_locate_tgt(struct lmv_obd *lmv, struct md_op_item *item)
+{
+	struct lmv_tgt_desc *tgt;
+
+	switch (item->mop_opc) {
+	default:
+		tgt = ERR_PTR(-EOPNOTSUPP);
+	}
+
+	return tgt;
+}
+
+struct lu_batch *lmv_batch_lookup_sub(struct lmv_batch *lbh,
+				      struct lmv_tgt_desc *tgt)
+{
+	struct lmvsub_batch *sub;
+
+	list_for_each_entry(sub, &lbh->lbh_sub_batch_list, sbh_sub_item) {
+		if (sub->sbh_tgt == tgt)
+			return sub->sbh_sub;
+	}
+
+	return NULL;
+}
+
+struct lu_batch *lmv_batch_get_sub(struct lmv_batch *lbh,
+				   struct lmv_tgt_desc *tgt)
+{
+	struct lmvsub_batch *sbh;
+	struct lu_batch *child_bh;
+	struct lu_batch *bh;
+
+	child_bh = lmv_batch_lookup_sub(lbh, tgt);
+	if (child_bh)
+		return child_bh;
+
+	sbh = kzalloc(sizeof(*sbh), GFP_NOFS);
+	if (!sbh)
+		return ERR_PTR(-ENOMEM);
+
+	INIT_LIST_HEAD(&sbh->sbh_sub_item);
+	sbh->sbh_tgt = tgt;
+
+	bh = &lbh->lbh_super;
+	child_bh = md_batch_create(tgt->ltd_exp, bh->lbt_flags,
+				   bh->lbt_max_count);
+	if (IS_ERR(child_bh)) {
+		kfree(sbh);
+		return child_bh;
+	}
+
+	child_bh->lbt_rqset = bh->lbt_rqset;
+	sbh->sbh_sub = child_bh;
+	list_add(&sbh->sbh_sub_item, &lbh->lbh_sub_batch_list);
+	return child_bh;
+}
+
+static int lmv_batch_add(struct obd_export *exp, struct lu_batch *bh,
+			 struct md_op_item *item)
+{ + struct obd_device *obd = exp->exp_obd; + struct lmv_obd *lmv = &obd->u.lmv; + struct lmv_tgt_desc *tgt; + struct lmv_batch *lbh; + struct lu_batch *child_bh; + int rc; + + tgt = lmv_batch_locate_tgt(lmv, item); + if (IS_ERR(tgt)) + return PTR_ERR(tgt); + + lbh = container_of(bh, struct lmv_batch, lbh_super); + child_bh = lmv_batch_get_sub(lbh, tgt); + if (IS_ERR(child_bh)) + return PTR_ERR(child_bh); + + rc = md_batch_add(tgt->ltd_exp, child_bh, item); + return rc; +} + static const struct obd_ops lmv_obd_ops = { .owner = THIS_MODULE, .setup = lmv_setup, @@ -3840,6 +4009,10 @@ static int lmv_merge_attr(struct obd_export *exp, .get_fid_from_lsm = lmv_get_fid_from_lsm, .unpackmd = lmv_unpackmd, .rmfid = lmv_rmfid, + .batch_create = lmv_batch_create, + .batch_add = lmv_batch_add, + .batch_stop = lmv_batch_stop, + .batch_flush = lmv_batch_flush, }; static int __init lmv_init(void) diff --git a/fs/lustre/mdc/Makefile b/fs/lustre/mdc/Makefile index 1ac97ee..191c400 100644 --- a/fs/lustre/mdc/Makefile +++ b/fs/lustre/mdc/Makefile @@ -2,5 +2,5 @@ ccflags-y += -I$(srctree)/$(src)/../include obj-$(CONFIG_LUSTRE_FS) += mdc.o mdc-y := mdc_changelog.o mdc_request.o mdc_reint.o mdc_lib.o mdc_locks.o lproc_mdc.o -mdc-y += mdc_dev.o +mdc-y += mdc_dev.o mdc_batch.o mdc-$(CONFIG_LUSTRE_FS_POSIX_ACL) += mdc_acl.o diff --git a/fs/lustre/mdc/mdc_batch.c b/fs/lustre/mdc/mdc_batch.c new file mode 100644 index 0000000..496d61e3 --- /dev/null +++ b/fs/lustre/mdc/mdc_batch.c @@ -0,0 +1,62 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * GPL HEADER START + * + * DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER. + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License version 2 only, + * as published by the Free Software Foundation. 
+ * + * This program is distributed in the hope that it will be useful, but + * WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + * General Public License version 2 for more details (a copy is included + * in the LICENSE file that accompanied this code). + * + * You should have received a copy of the GNU General Public License + * version 2 along with this program; If not, see + * http://www.gnu.org/licenses/gpl-2.0.html + * + * GPL HEADER END + */ +/* + * Copyright (c) 2020, 2022, DDN Storage Corporation. + */ +/* + * This file is part of Lustre, http://www.lustre.org/ + */ +/* + * lustre/mdc/mdc_batch.c + * + * Batch Metadata Updating on the client (MDC) + * + * Author: Qian Yingjin + */ + +#define DEBUG_SUBSYSTEM S_MDC + +#include +#include + +#include "mdc_internal.h" + +static md_update_pack_t mdc_update_packers[MD_OP_MAX]; + +static object_update_interpret_t mdc_update_interpreters[MD_OP_MAX]; + +int mdc_batch_add(struct obd_export *exp, struct lu_batch *bh, + struct md_op_item *item) +{ + enum md_item_opcode opc = item->mop_opc; + + if (opc >= MD_OP_MAX || !mdc_update_packers[opc] || + !mdc_update_interpreters[opc]) { + CERROR("%s: unexpected opcode %d\n", + exp->exp_obd->obd_name, opc); + return -EFAULT; + } + + return cli_batch_add(exp, bh, item, mdc_update_packers[opc], + mdc_update_interpreters[opc]); +} diff --git a/fs/lustre/mdc/mdc_internal.h b/fs/lustre/mdc/mdc_internal.h index 2416607..ae12a37 100644 --- a/fs/lustre/mdc/mdc_internal.h +++ b/fs/lustre/mdc/mdc_internal.h @@ -132,6 +132,9 @@ int mdc_revalidate_lock(struct obd_export *exp, struct lookup_intent *it, int mdc_intent_getattr_async(struct obd_export *exp, struct md_op_item *item); +int mdc_batch_add(struct obd_export *exp, struct lu_batch *bh, + struct md_op_item *item); + enum ldlm_mode mdc_lock_match(struct obd_export *exp, u64 flags, const struct lu_fid *fid, enum ldlm_type type, union ldlm_policy_data 
*policy, diff --git a/fs/lustre/mdc/mdc_request.c b/fs/lustre/mdc/mdc_request.c index c073da2..643b6ee 100644 --- a/fs/lustre/mdc/mdc_request.c +++ b/fs/lustre/mdc/mdc_request.c @@ -3023,6 +3023,10 @@ static int mdc_cleanup(struct obd_device *obd) .intent_getattr_async = mdc_intent_getattr_async, .revalidate_lock = mdc_revalidate_lock, .rmfid = mdc_rmfid, + .batch_create = cli_batch_create, + .batch_stop = cli_batch_stop, + .batch_flush = cli_batch_flush, + .batch_add = mdc_batch_add, }; dev_t mdc_changelog_dev; diff --git a/fs/lustre/ptlrpc/Makefile b/fs/lustre/ptlrpc/Makefile index 3badb05..29287b4 100644 --- a/fs/lustre/ptlrpc/Makefile +++ b/fs/lustre/ptlrpc/Makefile @@ -13,7 +13,7 @@ ldlm_objs += $(LDLM)ldlm_pool.o ptlrpc_objs := client.o recover.o connection.o niobuf.o pack_generic.o ptlrpc_objs += events.o ptlrpc_module.o service.o pinger.o ptlrpc_objs += llog_net.o llog_client.o import.o ptlrpcd.o -ptlrpc_objs += pers.o lproc_ptlrpc.o wiretest.o layout.o +ptlrpc_objs += pers.o batch.o lproc_ptlrpc.o wiretest.o layout.o ptlrpc_objs += sec.o sec_bulk.o sec_gc.o sec_config.o ptlrpc_objs += sec_null.o sec_plain.o ptlrpc_objs += heap.o nrs.o nrs_fifo.o nrs_delay.o diff --git a/fs/lustre/ptlrpc/batch.c b/fs/lustre/ptlrpc/batch.c new file mode 100644 index 0000000..76eb4cf --- /dev/null +++ b/fs/lustre/ptlrpc/batch.c @@ -0,0 +1,588 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * GPL HEADER START + * + * DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER. + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License version 2 only, + * as published by the Free Software Foundation. + * + * This program is distributed in the hope that it will be useful, but + * WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. 
See the GNU
+ * General Public License version 2 for more details (a copy is included
+ * in the LICENSE file that accompanied this code).
+ *
+ * You should have received a copy of the GNU General Public License
+ * version 2 along with this program; If not, see
+ * http://www.gnu.org/licenses/gpl-2.0.html
+ *
+ * GPL HEADER END
+ */
+/*
+ * Copyright (c) 2020, 2022, DDN/Whamcloud Storage Corporation.
+ */
+/*
+ * This file is part of Lustre, http://www.lustre.org/
+ */
+/*
+ * lustre/ptlrpc/batch.c
+ *
+ * Batch Metadata Updating on the client
+ *
+ * Author: Qian Yingjin
+ */
+
+#define DEBUG_SUBSYSTEM S_MDC
+
+#include
+#include
+#include
+
+#define OUT_UPDATE_REPLY_SIZE		4096
+
+static inline struct lustre_msg *
+batch_update_repmsg_next(struct batch_update_reply *bur,
+			 struct lustre_msg *repmsg)
+{
+	if (repmsg)
+		return (struct lustre_msg *)((char *)repmsg +
+					     lustre_packed_msg_size(repmsg));
+	else
+		return &bur->burp_repmsg[0];
+}
+
+struct batch_update_buffer {
+	struct batch_update_request	*bub_req;
+	size_t				 bub_size;
+	size_t				 bub_end;
+	struct list_head		 bub_item;
+};
+
+struct batch_update_args {
+	struct batch_update_head	*ba_head;
+};
+
+/**
+ * Prepare inline update request
+ *
+ * Prepare a BUT update ptlrpc inline request; such a request usually
+ * includes one update buffer, which does not need bulk transfer.
+ */ +static int batch_prep_inline_update_req(struct batch_update_head *head, + struct ptlrpc_request *req, + int repsize) +{ + struct batch_update_buffer *buf; + struct but_update_header *buh; + int rc; + + buf = list_entry(head->buh_buf_list.next, + struct batch_update_buffer, bub_item); + req_capsule_set_size(&req->rq_pill, &RMF_BUT_HEADER, RCL_CLIENT, + buf->bub_end + sizeof(*buh)); + + rc = ptlrpc_request_pack(req, LUSTRE_MDS_VERSION, MDS_BATCH); + if (rc != 0) + return rc; + + buh = req_capsule_client_get(&req->rq_pill, &RMF_BUT_HEADER); + buh->buh_magic = BUT_HEADER_MAGIC; + buh->buh_count = 1; + buh->buh_inline_length = buf->bub_end; + buh->buh_reply_size = repsize; + buh->buh_update_count = head->buh_update_count; + + memcpy(buh->buh_inline_data, buf->bub_req, buf->bub_end); + + req_capsule_set_size(&req->rq_pill, &RMF_BUT_REPLY, + RCL_SERVER, repsize); + + ptlrpc_request_set_replen(req); + req->rq_request_portal = OUT_PORTAL; + req->rq_reply_portal = OSC_REPLY_PORTAL; + + return rc; +} + +static int batch_prep_update_req(struct batch_update_head *head, + struct ptlrpc_request **reqp) +{ + struct ptlrpc_request *req; + struct ptlrpc_bulk_desc *desc; + struct batch_update_buffer *buf; + struct but_update_header *buh; + struct but_update_buffer *bub; + int page_count = 0; + int total = 0; + int repsize; + int rc; + + repsize = head->buh_repsize + + cfs_size_round(offsetof(struct batch_update_reply, + burp_repmsg[0])); + if (repsize < OUT_UPDATE_REPLY_SIZE) + repsize = OUT_UPDATE_REPLY_SIZE; + + LASSERT(head->buh_buf_count > 0); + + req = ptlrpc_request_alloc(class_exp2cliimp(head->buh_exp), + &RQF_MDS_BATCH); + if (!req) + return -ENOMEM; + + if (head->buh_buf_count == 1) { + buf = list_entry(head->buh_buf_list.next, + struct batch_update_buffer, bub_item); + + /* Check whether it can be packed inline */ + if (buf->bub_end + sizeof(struct but_update_header) < + OUT_UPDATE_MAX_INLINE_SIZE) { + rc = batch_prep_inline_update_req(head, req, repsize); + if (rc == 
0) + *reqp = req; + goto out_req; + } + } + + req_capsule_set_size(&req->rq_pill, &RMF_BUT_HEADER, RCL_CLIENT, + sizeof(struct but_update_header)); + req_capsule_set_size(&req->rq_pill, &RMF_BUT_BUF, RCL_CLIENT, + head->buh_buf_count * sizeof(*bub)); + + rc = ptlrpc_request_pack(req, LUSTRE_MDS_VERSION, MDS_BATCH); + if (rc != 0) + goto out_req; + + buh = req_capsule_client_get(&req->rq_pill, &RMF_BUT_HEADER); + buh->buh_magic = BUT_HEADER_MAGIC; + buh->buh_count = head->buh_buf_count; + buh->buh_inline_length = 0; + buh->buh_reply_size = repsize; + buh->buh_update_count = head->buh_update_count; + bub = req_capsule_client_get(&req->rq_pill, &RMF_BUT_BUF); + list_for_each_entry(buf, &head->buh_buf_list, bub_item) { + bub->bub_size = buf->bub_size; + bub++; + /* First *and* last might be partial pages, hence +1 */ + page_count += DIV_ROUND_UP(buf->bub_size, PAGE_SIZE) + 1; + } + + req->rq_bulk_write = 1; + desc = ptlrpc_prep_bulk_imp(req, page_count, + MD_MAX_BRW_SIZE >> LNET_MTU_BITS, + PTLRPC_BULK_GET_SOURCE, + MDS_BULK_PORTAL, + &ptlrpc_bulk_kiov_nopin_ops); + if (!desc) { + rc = -ENOMEM; + goto out_req; + } + + list_for_each_entry(buf, &head->buh_buf_list, bub_item) { + desc->bd_frag_ops->add_iov_frag(desc, buf->bub_req, + buf->bub_size); + total += buf->bub_size; + } + CDEBUG(D_OTHER, "Total %d in %u\n", total, head->buh_update_count); + + req_capsule_set_size(&req->rq_pill, &RMF_BUT_REPLY, + RCL_SERVER, repsize); + + ptlrpc_request_set_replen(req); + req->rq_request_portal = OUT_PORTAL; + req->rq_reply_portal = OSC_REPLY_PORTAL; + *reqp = req; + +out_req: + if (rc < 0) + ptlrpc_req_finished(req); + + return rc; +} + +static struct batch_update_buffer * +current_batch_update_buffer(struct batch_update_head *head) +{ + if (list_empty(&head->buh_buf_list)) + return NULL; + + return list_entry(head->buh_buf_list.prev, struct batch_update_buffer, + bub_item); +} + +static int batch_update_buffer_create(struct batch_update_head *head, + size_t size) +{ + struct 
batch_update_buffer *buf; + struct batch_update_request *bur; + + buf = kzalloc(sizeof(*buf), GFP_KERNEL); + if (!buf) + return -ENOMEM; + + LASSERT(size > 0); + size = round_up(size, PAGE_SIZE); + bur = kvzalloc(size, GFP_KERNEL); + if (!bur) { + kfree(buf); + return -ENOMEM; + } + + bur->burq_magic = BUT_REQUEST_MAGIC; + bur->burq_count = 0; + buf->bub_req = bur; + buf->bub_size = size; + buf->bub_end = sizeof(*bur); + INIT_LIST_HEAD(&buf->bub_item); + list_add_tail(&buf->bub_item, &head->buh_buf_list); + head->buh_buf_count++; + + return 0; +} + +/** + * Destroy an @object_update_callback. + */ +static void object_update_callback_fini(struct object_update_callback *ouc) +{ + LASSERT(list_empty(&ouc->ouc_item)); + + kfree(ouc); +} + +/** + * Insert an @object_update_callback into the @batch_update_head. + * + * Usually each update in @batch_update_head will have one correspondent + * callback, and these callbacks will be called in ->rq_interpret_reply. + */ +static int +batch_insert_update_callback(struct batch_update_head *head, void *data, + object_update_interpret_t interpret) +{ + struct object_update_callback *ouc; + + ouc = kzalloc(sizeof(*ouc), GFP_KERNEL); + if (!ouc) + return -ENOMEM; + + INIT_LIST_HEAD(&ouc->ouc_item); + ouc->ouc_interpret = interpret; + ouc->ouc_head = head; + ouc->ouc_data = data; + list_add_tail(&ouc->ouc_item, &head->buh_cb_list); + + return 0; +} + +/** + * Allocate and initialize batch update request. + * + * @batch_update_head is being used to track updates being executed on + * this OBD device. The update buffer will be 4K initially, and increased + * if needed. 
+ */
+static struct batch_update_head *
+batch_update_request_create(struct obd_export *exp, struct lu_batch *bh)
+{
+	struct batch_update_head *head;
+	int rc;
+
+	head = kzalloc(sizeof(*head), GFP_KERNEL);
+	if (!head)
+		return ERR_PTR(-ENOMEM);
+
+	INIT_LIST_HEAD(&head->buh_cb_list);
+	INIT_LIST_HEAD(&head->buh_buf_list);
+	head->buh_exp = exp;
+	head->buh_batch = bh;
+
+	rc = batch_update_buffer_create(head, PAGE_SIZE);
+	if (rc != 0) {
+		kfree(head);
+		return ERR_PTR(rc);
+	}
+
+	return head;
+}
+
+static void batch_update_request_destroy(struct batch_update_head *head)
+{
+	struct batch_update_buffer *bub, *tmp;
+
+	if (!head)
+		return;
+
+	list_for_each_entry_safe(bub, tmp, &head->buh_buf_list, bub_item) {
+		list_del(&bub->bub_item);
+		kvfree(bub->bub_req);
+		kfree(bub);
+	}
+
+	kfree(head);
+}
+
+static int batch_update_request_fini(struct batch_update_head *head,
+				     struct ptlrpc_request *req,
+				     struct batch_update_reply *reply, int rc)
+{
+	struct object_update_callback *ouc, *next;
+	struct lustre_msg *repmsg = NULL;
+	int count = 0;
+	int index = 0;
+
+	if (reply)
+		count = reply->burp_count;
+
+	list_for_each_entry_safe(ouc, next, &head->buh_cb_list, ouc_item) {
+		int rc1 = 0;
+
+		list_del_init(&ouc->ouc_item);
+
+		/*
+		 * The peer may only have handled some requests (indicated by
+		 * @count) in the packaged OUT RPC; we can only get results
+		 * for the handled part.
+		 */
+		if (index < count) {
+			repmsg = batch_update_repmsg_next(reply, repmsg);
+			if (!repmsg)
+				rc1 = -EPROTO;
+			else
+				rc1 = repmsg->lm_result;
+		} else {
+			/*
+			 * The peer did not handle these requests; let us
+			 * return -ECANCELED to the update interpreter for now.
+ */ + repmsg = NULL; + rc1 = -ECANCELED; + } + + if (ouc->ouc_interpret) + ouc->ouc_interpret(req, repmsg, ouc, rc1); + + object_update_callback_fini(ouc); + if (rc == 0 && rc1 < 0) + rc = rc1; + } + + batch_update_request_destroy(head); + + return rc; +} + +static int batch_update_interpret(const struct lu_env *env, + struct ptlrpc_request *req, + void *args, int rc) +{ + struct batch_update_args *aa = (struct batch_update_args *)args; + struct batch_update_reply *reply = NULL; + + if (!aa->ba_head) + return 0; + + ptlrpc_put_mod_rpc_slot(req); + /* Unpack the results from the reply message. */ + if (req->rq_repmsg && req->rq_replied) { + reply = req_capsule_server_sized_get(&req->rq_pill, + &RMF_BUT_REPLY, + sizeof(*reply)); + if ((!reply || + reply->burp_magic != BUT_REPLY_MAGIC) && rc == 0) + rc = -EPROTO; + } + + rc = batch_update_request_fini(aa->ba_head, req, reply, rc); + + return rc; +} + +static int batch_send_update_req(const struct lu_env *env, + struct batch_update_head *head) +{ + struct lu_batch *bh; + struct ptlrpc_request *req = NULL; + struct batch_update_args *aa; + int rc; + + if (!head) + return 0; + + bh = head->buh_batch; + rc = batch_prep_update_req(head, &req); + if (rc) { + rc = batch_update_request_fini(head, NULL, NULL, rc); + return rc; + } + + aa = ptlrpc_req_async_args(aa, req); + aa->ba_head = head; + req->rq_interpret_reply = batch_update_interpret; + + /* + * Only acquire modification RPC slot for the batched RPC + * which contains metadata updates. 
+ */ + if (!(bh->lbt_flags & BATCH_FL_RDONLY)) + ptlrpc_get_mod_rpc_slot(req); + + if (bh->lbt_flags & BATCH_FL_SYNC) { + rc = ptlrpc_queue_wait(req); + } else { + if ((bh->lbt_flags & (BATCH_FL_RDONLY | BATCH_FL_RQSET)) == + BATCH_FL_RDONLY) { + ptlrpcd_add_req(req); + } else if (bh->lbt_flags & BATCH_FL_RQSET) { + ptlrpc_set_add_req(bh->lbt_rqset, req); + ptlrpc_check_set(env, bh->lbt_rqset); + } else { + ptlrpcd_add_req(req); + } + req = NULL; + } + + if (req) + ptlrpc_req_finished(req); + + return rc; +} + +static int batch_update_request_add(struct batch_update_head **headp, + struct md_op_item *item, + md_update_pack_t packer, + object_update_interpret_t interpreter) +{ + struct batch_update_head *head = *headp; + struct lu_batch *bh = head->buh_batch; + struct batch_update_buffer *buf; + struct lustre_msg *reqmsg; + size_t max_len; + int rc; + + for (;;) { + buf = current_batch_update_buffer(head); + LASSERT(buf); + max_len = buf->bub_size - buf->bub_end; + reqmsg = (struct lustre_msg *)((char *)buf->bub_req + + buf->bub_end); + rc = packer(head, reqmsg, &max_len, item); + if (rc == -E2BIG) { + int rc2; + + /* Create new batch object update buffer */ + rc2 = batch_update_buffer_create(head, + max_len + offsetof(struct batch_update_request, + burq_reqmsg[0]) + 1); + if (rc2 != 0) { + rc = rc2; + break; + } + } else { + if (rc == 0) { + buf->bub_end += max_len; + buf->bub_req->burq_count++; + head->buh_update_count++; + head->buh_repsize += reqmsg->lm_repsize; + } + break; + } + } + + if (rc) + goto out; + + rc = batch_insert_update_callback(head, item, interpreter); + if (rc) + goto out; + + /* Unplug the batch queue if enough update requests have accumulated.
*/ + if (bh->lbt_max_count && head->buh_update_count >= bh->lbt_max_count) { + rc = batch_send_update_req(NULL, head); + *headp = NULL; + } +out: + if (rc) { + batch_update_request_destroy(head); + *headp = NULL; + } + + return rc; +} + +struct lu_batch *cli_batch_create(struct obd_export *exp, + enum lu_batch_flags flags, u32 max_count) +{ + struct cli_batch *cbh; + struct lu_batch *bh; + + cbh = kzalloc(sizeof(*cbh), GFP_KERNEL); + if (!cbh) + return ERR_PTR(-ENOMEM); + + bh = &cbh->cbh_super; + bh->lbt_result = 0; + bh->lbt_flags = flags; + bh->lbt_max_count = max_count; + + cbh->cbh_head = batch_update_request_create(exp, bh); + if (IS_ERR(cbh->cbh_head)) { + bh = (struct lu_batch *)cbh->cbh_head; + kfree(cbh); + } + + return bh; +} +EXPORT_SYMBOL(cli_batch_create); + +int cli_batch_stop(struct obd_export *exp, struct lu_batch *bh) +{ + struct cli_batch *cbh; + int rc; + + cbh = container_of(bh, struct cli_batch, cbh_super); + rc = batch_send_update_req(NULL, cbh->cbh_head); + + kfree(cbh); + return rc; +} +EXPORT_SYMBOL(cli_batch_stop); + +int cli_batch_flush(struct obd_export *exp, struct lu_batch *bh, bool wait) +{ + struct cli_batch *cbh; + int rc; + + cbh = container_of(bh, struct cli_batch, cbh_super); + if (!cbh->cbh_head) + return 0; + + rc = batch_send_update_req(NULL, cbh->cbh_head); + cbh->cbh_head = NULL; + + return rc; +} +EXPORT_SYMBOL(cli_batch_flush); + +int cli_batch_add(struct obd_export *exp, struct lu_batch *bh, + struct md_op_item *item, md_update_pack_t packer, + object_update_interpret_t interpreter) +{ + struct cli_batch *cbh; + int rc; + + cbh = container_of(bh, struct cli_batch, cbh_super); + if (!cbh->cbh_head) { + cbh->cbh_head = batch_update_request_create(exp, bh); + if (IS_ERR(cbh->cbh_head)) + return PTR_ERR(cbh->cbh_head); + } + + rc = batch_update_request_add(&cbh->cbh_head, item, + packer, interpreter); + + return rc; +} +EXPORT_SYMBOL(cli_batch_add); diff --git a/fs/lustre/ptlrpc/client.c b/fs/lustre/ptlrpc/client.c index 
6c1d98d..c9a8c8f 100644 --- a/fs/lustre/ptlrpc/client.c +++ b/fs/lustre/ptlrpc/client.c @@ -70,6 +70,30 @@ static void ptlrpc_release_bulk_page_pin(struct ptlrpc_bulk_desc *desc) put_page(desc->bd_vec[i].bv_page); } +static int ptlrpc_prep_bulk_frag_pages(struct ptlrpc_bulk_desc *desc, + void *frag, int len) +{ + unsigned int offset = (unsigned long)frag & ~PAGE_MASK; + + while (len > 0) { + int page_len = min_t(unsigned int, PAGE_SIZE - offset, + len); + struct page *pg; + + if (is_vmalloc_addr(frag)) + pg = vmalloc_to_page(frag); + else + pg = virt_to_page(frag); + + ptlrpc_prep_bulk_page_nopin(desc, pg, offset, page_len); + offset = 0; + len -= page_len; + frag += page_len; + } + + return desc->bd_nob; +} + const struct ptlrpc_bulk_frag_ops ptlrpc_bulk_kiov_pin_ops = { .add_kiov_frag = ptlrpc_prep_bulk_page_pin, .release_frags = ptlrpc_release_bulk_page_pin, @@ -79,6 +103,7 @@ static void ptlrpc_release_bulk_page_pin(struct ptlrpc_bulk_desc *desc) const struct ptlrpc_bulk_frag_ops ptlrpc_bulk_kiov_nopin_ops = { .add_kiov_frag = ptlrpc_prep_bulk_page_nopin, .release_frags = NULL, + .add_iov_frag = ptlrpc_prep_bulk_frag_pages, }; EXPORT_SYMBOL(ptlrpc_bulk_kiov_nopin_ops); diff --git a/fs/lustre/ptlrpc/layout.c b/fs/lustre/ptlrpc/layout.c index 82ec899..0fe74ff 100644 --- a/fs/lustre/ptlrpc/layout.c +++ b/fs/lustre/ptlrpc/layout.c @@ -561,6 +561,17 @@ &RMF_CAPA2 }; +static const struct req_msg_field *mds_batch_client[] = { + &RMF_PTLRPC_BODY, + &RMF_BUT_HEADER, + &RMF_BUT_BUF, +}; + +static const struct req_msg_field *mds_batch_server[] = { + &RMF_PTLRPC_BODY, + &RMF_BUT_REPLY, +}; + static const struct req_msg_field *llog_origin_handle_create_client[] = { &RMF_PTLRPC_BODY, &RMF_LLOGD_BODY, @@ -800,6 +811,7 @@ &RQF_LLOG_ORIGIN_HANDLE_PREV_BLOCK, &RQF_LLOG_ORIGIN_HANDLE_READ_HEADER, &RQF_CONNECT, + &RQF_MDS_BATCH, }; struct req_msg_field { @@ -1222,6 +1234,20 @@ struct req_msg_field RMF_OST_LADVISE = lustre_swab_ladvise, NULL); EXPORT_SYMBOL(RMF_OST_LADVISE); 
+struct req_msg_field RMF_BUT_REPLY = + DEFINE_MSGF("batch_update_reply", 0, -1, + lustre_swab_batch_update_reply, NULL); +EXPORT_SYMBOL(RMF_BUT_REPLY); + +struct req_msg_field RMF_BUT_HEADER = DEFINE_MSGF("but_update_header", 0, + -1, lustre_swab_but_update_header, NULL); +EXPORT_SYMBOL(RMF_BUT_HEADER); + +struct req_msg_field RMF_BUT_BUF = DEFINE_MSGF("but_update_buf", + RMF_F_STRUCT_ARRAY, sizeof(struct but_update_buffer), + lustre_swab_but_update_buffer, NULL); +EXPORT_SYMBOL(RMF_BUT_BUF); + /* * Request formats. */ @@ -1422,6 +1448,11 @@ struct req_format RQF_MDS_GET_INFO = mds_getinfo_server); EXPORT_SYMBOL(RQF_MDS_GET_INFO); +struct req_format RQF_MDS_BATCH = + DEFINE_REQ_FMT0("MDS_BATCH", mds_batch_client, + mds_batch_server); +EXPORT_SYMBOL(RQF_MDS_BATCH); + struct req_format RQF_LDLM_ENQUEUE = DEFINE_REQ_FMT0("LDLM_ENQUEUE", ldlm_enqueue_client, ldlm_enqueue_lvb_server); @@ -1849,17 +1880,61 @@ int req_capsule_server_pack(struct req_capsule *pill) LASSERT(fmt); count = req_capsule_filled_sizes(pill, RCL_SERVER); - rc = lustre_pack_reply(pill->rc_req, count, - pill->rc_area[RCL_SERVER], NULL); - if (rc != 0) { - DEBUG_REQ(D_ERROR, pill->rc_req, - "Cannot pack %d fields in format '%s'", - count, fmt->rf_name); + if (req_capsule_ptlreq(pill)) { + rc = lustre_pack_reply(pill->rc_req, count, + pill->rc_area[RCL_SERVER], NULL); + if (rc != 0) { + DEBUG_REQ(D_ERROR, pill->rc_req, + "Cannot pack %d fields in format '%s'", + count, fmt->rf_name); + } + } else { /* SUB request */ + u32 msg_len; + + msg_len = lustre_msg_size_v2(count, pill->rc_area[RCL_SERVER]); + if (msg_len > pill->rc_reqmsg->lm_repsize) { + /* TODO: Check whether there is enough buffer size */ + CDEBUG(D_INFO, + "Overflow pack %d fields in format '%s' for the SUB request with message len %u:%u\n", + count, fmt->rf_name, msg_len, + pill->rc_reqmsg->lm_repsize); + } + + rc = 0; + lustre_init_msg_v2(pill->rc_repmsg, count, + pill->rc_area[RCL_SERVER], NULL); } + return rc; } 
EXPORT_SYMBOL(req_capsule_server_pack); +int req_capsule_client_pack(struct req_capsule *pill) +{ + const struct req_format *fmt; + int count; + int rc = 0; + + LASSERT(pill->rc_loc == RCL_CLIENT); + fmt = pill->rc_fmt; + LASSERT(fmt); + + count = req_capsule_filled_sizes(pill, RCL_CLIENT); + if (req_capsule_ptlreq(pill)) { + struct ptlrpc_request *req = pill->rc_req; + + rc = lustre_pack_request(req, req->rq_import->imp_msg_magic, + count, pill->rc_area[RCL_CLIENT], + NULL); + } else { + /* Sub request in a batch PTLRPC request */ + lustre_init_msg_v2(pill->rc_reqmsg, count, + pill->rc_area[RCL_CLIENT], NULL); + } + return rc; +} +EXPORT_SYMBOL(req_capsule_client_pack); + /** * Returns the PTLRPC request or reply (@loc) buffer offset of a @pill * corresponding to the given RMF (@field). @@ -2050,6 +2125,7 @@ static void *__req_capsule_get(struct req_capsule *pill, value = getter(msg, offset, len); if (!value) { + LASSERT(pill->rc_req); DEBUG_REQ(D_ERROR, pill->rc_req, "Wrong buffer for field '%s' (%u of %u) in format '%s', %u vs. 
%u (%s)", field->rmf_name, offset, lustre_msg_bufcount(msg), @@ -2218,10 +2294,18 @@ u32 req_capsule_get_size(const struct req_capsule *pill, */ u32 req_capsule_msg_size(struct req_capsule *pill, enum req_location loc) { - return lustre_msg_size(pill->rc_req->rq_import->imp_msg_magic, - pill->rc_fmt->rf_fields[loc].nr, - pill->rc_area[loc]); + if (req_capsule_ptlreq(pill)) { + return lustre_msg_size(pill->rc_req->rq_import->imp_msg_magic, + pill->rc_fmt->rf_fields[loc].nr, + pill->rc_area[loc]); + } else { /* SUB request in a batch request */ + int count; + + count = req_capsule_filled_sizes(pill, loc); + return lustre_msg_size_v2(count, pill->rc_area[loc]); + } } +EXPORT_SYMBOL(req_capsule_msg_size); /** * While req_capsule_msg_size() computes the size of a PTLRPC request or reply @@ -2373,16 +2457,32 @@ void req_capsule_shrink(struct req_capsule *pill, LASSERTF(newlen <= len, "%s:%s, oldlen=%u, newlen=%u\n", fmt->rf_name, field->rmf_name, len, newlen); + len = lustre_shrink_msg(msg, offset, newlen, 1); if (loc == RCL_CLIENT) { - pill->rc_req->rq_reqlen = lustre_shrink_msg(msg, offset, newlen, - 1); + if (req_capsule_ptlreq(pill)) + pill->rc_req->rq_reqlen = len; } else { - pill->rc_req->rq_replen = lustre_shrink_msg(msg, offset, newlen, - 1); /* update also field size in reply lengths arrays for possible * reply re-pack due to req_capsule_server_grow() call.
*/ req_capsule_set_size(pill, field, loc, newlen); + if (req_capsule_ptlreq(pill)) + pill->rc_req->rq_replen = len; } } EXPORT_SYMBOL(req_capsule_shrink); + +void req_capsule_set_replen(struct req_capsule *pill) +{ + if (req_capsule_ptlreq(pill)) { + ptlrpc_request_set_replen(pill->rc_req); + } else { /* SUB request in a batch request */ + int count; + + count = req_capsule_filled_sizes(pill, RCL_SERVER); + pill->rc_reqmsg->lm_repsize = + lustre_msg_size_v2(count, + pill->rc_area[RCL_SERVER]); + } +} +EXPORT_SYMBOL(req_capsule_set_replen); diff --git a/fs/lustre/ptlrpc/lproc_ptlrpc.c b/fs/lustre/ptlrpc/lproc_ptlrpc.c index f3f8a71..af83902 100644 --- a/fs/lustre/ptlrpc/lproc_ptlrpc.c +++ b/fs/lustre/ptlrpc/lproc_ptlrpc.c @@ -98,6 +98,7 @@ { MDS_HSM_CT_UNREGISTER, "mds_hsm_ct_unregister" }, { MDS_SWAP_LAYOUTS, "mds_swap_layouts" }, { MDS_RMFID, "mds_rmfid" }, + { MDS_BATCH, "mds_batch" }, { LDLM_ENQUEUE, "ldlm_enqueue" }, { LDLM_CONVERT, "ldlm_convert" }, { LDLM_CANCEL, "ldlm_cancel" }, diff --git a/fs/lustre/ptlrpc/pack_generic.c b/fs/lustre/ptlrpc/pack_generic.c index 3499611..8d58f9b 100644 --- a/fs/lustre/ptlrpc/pack_generic.c +++ b/fs/lustre/ptlrpc/pack_generic.c @@ -491,7 +491,7 @@ static int lustre_unpack_msg_v2(struct lustre_msg_v2 *m, int len) __swab32s(&m->lm_repsize); __swab32s(&m->lm_cksum); __swab32s(&m->lm_flags); - BUILD_BUG_ON(offsetof(typeof(*m), lm_padding_2) == 0); + __swab32s(&m->lm_opc); BUILD_BUG_ON(offsetof(typeof(*m), lm_padding_3) == 0); } @@ -2591,6 +2591,31 @@ void lustre_swab_hsm_request(struct hsm_request *hr) __swab32s(&hr->hr_data_len); } +/* TODO: swab each sub reply message. 
*/ +void lustre_swab_batch_update_reply(struct batch_update_reply *bur) +{ + __swab32s(&bur->burp_magic); + __swab16s(&bur->burp_count); + __swab16s(&bur->burp_padding); +} + +void lustre_swab_but_update_header(struct but_update_header *buh) +{ + __swab32s(&buh->buh_magic); + __swab32s(&buh->buh_count); + __swab32s(&buh->buh_inline_length); + __swab32s(&buh->buh_reply_size); + __swab32s(&buh->buh_update_count); +} +EXPORT_SYMBOL(lustre_swab_but_update_header); + +void lustre_swab_but_update_buffer(struct but_update_buffer *bub) +{ + __swab32s(&bub->bub_size); + __swab32s(&bub->bub_padding); +} +EXPORT_SYMBOL(lustre_swab_but_update_buffer); + void lustre_swab_swap_layouts(struct mdc_swap_layouts *msl) { __swab64s(&msl->msl_flags); diff --git a/fs/lustre/ptlrpc/wiretest.c b/fs/lustre/ptlrpc/wiretest.c index 372dc10..2c02430 100644 --- a/fs/lustre/ptlrpc/wiretest.c +++ b/fs/lustre/ptlrpc/wiretest.c @@ -181,7 +181,9 @@ void lustre_assert_wire_constants(void) (long long)MDS_SWAP_LAYOUTS); LASSERTF(MDS_RMFID == 62, "found %lld\n", (long long)MDS_RMFID); - LASSERTF(MDS_LAST_OPC == 63, "found %lld\n", + LASSERTF(MDS_BATCH == 63, "found %lld\n", + (long long)MDS_BATCH); + LASSERTF(MDS_LAST_OPC == 64, "found %lld\n", (long long)MDS_LAST_OPC); LASSERTF(REINT_SETATTR == 1, "found %lld\n", (long long)REINT_SETATTR); @@ -661,10 +663,10 @@ void lustre_assert_wire_constants(void) (long long)(int)offsetof(struct lustre_msg_v2, lm_flags)); LASSERTF((int)sizeof(((struct lustre_msg_v2 *)0)->lm_flags) == 4, "found %lld\n", (long long)(int)sizeof(((struct lustre_msg_v2 *)0)->lm_flags)); - LASSERTF((int)offsetof(struct lustre_msg_v2, lm_padding_2) == 24, "found %lld\n", - (long long)(int)offsetof(struct lustre_msg_v2, lm_padding_2)); - LASSERTF((int)sizeof(((struct lustre_msg_v2 *)0)->lm_padding_2) == 4, "found %lld\n", - (long long)(int)sizeof(((struct lustre_msg_v2 *)0)->lm_padding_2)); + LASSERTF((int)offsetof(struct lustre_msg_v2, lm_opc) == 24, "found %lld\n", + (long 
long)(int)offsetof(struct lustre_msg_v2, lm_opc)); + LASSERTF((int)sizeof(((struct lustre_msg_v2 *)0)->lm_opc) == 4, "found %lld\n", + (long long)(int)sizeof(((struct lustre_msg_v2 *)0)->lm_opc)); LASSERTF((int)offsetof(struct lustre_msg_v2, lm_padding_3) == 28, "found %lld\n", (long long)(int)offsetof(struct lustre_msg_v2, lm_padding_3)); LASSERTF((int)sizeof(((struct lustre_msg_v2 *)0)->lm_padding_3) == 4, "found %lld\n", diff --git a/include/uapi/linux/lustre/lustre_idl.h b/include/uapi/linux/lustre/lustre_idl.h index 8cf9323..99735fc 100644 --- a/include/uapi/linux/lustre/lustre_idl.h +++ b/include/uapi/linux/lustre/lustre_idl.h @@ -544,7 +544,7 @@ struct lustre_msg_v2 { __u32 lm_repsize; /* size of preallocated reply buffer */ __u32 lm_cksum; /* CRC32 of ptlrpc_body early reply messages */ __u32 lm_flags; /* enum lustre_msghdr MSGHDR_* flags */ - __u32 lm_padding_2; /* unused */ + __u32 lm_opc; /* SUB request opcode in a batch request */ __u32 lm_padding_3; /* unused */ __u32 lm_buflens[0]; /* length of additional buffers in bytes, * padded to a multiple of 8 bytes. @@ -555,6 +555,9 @@ struct lustre_msg_v2 { */ }; +/* The returned result of the SUB request in a batch request */ +#define lm_result lm_opc + /* ptlrpc_body packet pb_types */ #define PTL_RPC_MSG_REQUEST 4711 /* normal RPC request message */ #define PTL_RPC_MSG_ERR 4712 /* error reply if request unprocessed */ @@ -1428,6 +1431,7 @@ enum mds_cmd { MDS_HSM_CT_UNREGISTER = 60, MDS_SWAP_LAYOUTS = 61, MDS_RMFID = 62, + MDS_BATCH = 63, MDS_LAST_OPC }; @@ -2860,6 +2864,82 @@ struct hsm_progress_kernel { __u64 hpk_padding2; } __attribute__((packed)); +#define OUT_UPDATE_MAX_INLINE_SIZE 4096 + +#define BUT_REQUEST_MAGIC 0xBADE0001 +/* Hold batched updates sending to the remote target in a single RPC */ +struct batch_update_request { + /* Magic number: BUT_REQUEST_MAGIC. */ + __u32 burq_magic; + /* Number of sub requests packed in this batched RPC: burq_reqmsg[]. 
*/ + __u16 burq_count; + /* Unused padding field. */ + __u16 burq_padding; + /* + * Sub request message array. As message field buffers for each sub + * request are packed after padded lustre_msg.lm_buflens[] array, thus + * it can locate the next request message via the function + * @batch_update_reqmsg_next() in lustre/include/obj_update.h + */ + struct lustre_msg burq_reqmsg[0]; +}; + +#define BUT_HEADER_MAGIC 0xBADF0001 +/* Header for Batched UpdaTes request */ +struct but_update_header { + /* Magic number: BUT_HEADER_MAGIC */ + __u32 buh_magic; + /* + * When the total request buffer length is less than MAX_INLINE_SIZE, + * @buh_count is set with 1 and the batched RPC request can be packed + * inline. + * Otherwise, @buh_count indicates the IO vector count transferring in + * bulk I/O. + */ + __u32 buh_count; + /* inline buffer length when the batched RPC can be packed inline. */ + __u32 buh_inline_length; + /* The reply buffer size the client prepared. */ + __u32 buh_reply_size; + /* Sub request count in this batched RPC. */ + __u32 buh_update_count; + /* Unused padding field. */ + __u32 buh_padding; + /* Inline buffer used when the RPC request can be packed inline. */ + __u32 buh_inline_data[0]; +}; + +struct but_update_buffer { + __u32 bub_size; + __u32 bub_padding; +}; + +#define BUT_REPLY_MAGIC 0x00AD0001 +/* Batched reply received from a remote target in a batched RPC. */ +struct batch_update_reply { + /* Magic number: BUT_REPLY_MAGIC. */ + __u32 burp_magic; + /* Successfully returned sub requests. */ + __u16 burp_count; + /* Unused padding field. */ + __u16 burp_padding; + /* + * Sub reply message array. + * It can locate the next reply message buffer via the function + * @batch_update_repmsg_next() in lustre/include/obj_update.h + */ + struct lustre_msg burp_repmsg[0]; +}; + +/** + * Batch update opcode.
+ */ +enum batch_update_cmd { + BUT_GETATTR = 1, + BUT_LAST_OPC, + BUT_FIRST_OPC = BUT_GETATTR, +}; + /** layout swap request structure * fid1 and fid2 are in mdt_body */ From patchwork Sun Apr 9 12:12:42 2023 X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 13205941 From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Sun, 9 Apr 2023 08:12:42 -0400 Message-Id: <1681042400-15491-3-git-send-email-jsimmons@infradead.org> In-Reply-To: <1681042400-15491-1-git-send-email-jsimmons@infradead.org> References: <1681042400-15491-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 02/40] lustre: lov: fiemap improperly handles fm_extent_count=0
Cc: Andrew Perepechko , Lustre Development List From: Andrew Perepechko FIEMAP calls with fm_extent_count=0 are supposed only to return the number of extents. lov_object_fiemap() attempts to initialize stripe_last based on fiemap->fm_extents[0], which is not initialized in userspace and not even allocated in kernelspace. Eventually, the call exits with -EINVAL and a "FIEMAP does not init start entry" kernel log message. Fixes: f39704f6e1 ("lustre: lov: FIEMAP support for PFL and FLR file") HPE-bug-id: LUS-11443 WC-bug-id: https://jira.whamcloud.com/browse/LU-16480 Lustre-commit: 829af7b029d8e4e39 ("LU-16480 lov: fiemap improperly handles fm_extent_count=0") Signed-off-by: Andrew Perepechko Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/49645 Reviewed-by: Andreas Dilger Reviewed-by: Alexander Boyko Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/lov/lov_object.c | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/fs/lustre/lov/lov_object.c b/fs/lustre/lov/lov_object.c index 34cb6a0..5d65aab 100644 --- a/fs/lustre/lov/lov_object.c +++ b/fs/lustre/lov/lov_object.c @@ -1896,7 +1896,7 @@ static int lov_object_fiemap(const struct lu_env *env, struct cl_object *obj, struct fiemap_state fs = { NULL }; struct lu_extent range; int cur_ext; - int stripe_last; + int stripe_last = 0; int start_stripe = 0; bool resume = false; @@ -1992,9 +1992,10 @@ static int lov_object_fiemap(const struct lu_env *env, struct cl_object *obj, * the high 16bits of fe_device remember which stripe the last * call arrived at, we'd continue from there in this call.
*/ - if (fiemap->fm_extent_count && fiemap->fm_extents[0].fe_logical) + if (fiemap->fm_extent_count && fiemap->fm_extents[0].fe_logical) { resume = true; - stripe_last = get_fe_stripenr(&fiemap->fm_extents[0]); + stripe_last = get_fe_stripenr(&fiemap->fm_extents[0]); + } /** * stripe_last records the stripe number we processed in the last * call From patchwork Sun Apr 9 12:12:43 2023 X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 13205943 From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Sun, 9 Apr 2023 08:12:43 -0400 Message-Id: <1681042400-15491-4-git-send-email-jsimmons@infradead.org> In-Reply-To:
<1681042400-15491-1-git-send-email-jsimmons@infradead.org> References: <1681042400-15491-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 03/40] lustre: llite: SIGBUS is possible on a race with page reclaim Cc: Patrick Farrell , Andrew Perepechko , Lustre Development List From: Andrew Perepechko We can restart fault handling if page truncation happens in parallel with the fault handler. WC-bug-id: https://jira.whamcloud.com/browse/LU-16160 Lustre-commit: b4da788a819f82d35 ("LU-16160 llite: SIGBUS is possible on a race with page reclaim") Signed-off-by: Andrew Perepechko Signed-off-by: Patrick Farrell Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/49647 Reviewed-by: Andreas Dilger Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/llite/llite_internal.h | 4 ++++ fs/lustre/llite/llite_lib.c | 1 + fs/lustre/llite/llite_mmap.c | 19 +++++++++++++++++++ fs/lustre/llite/vvp_page.c | 37 +++++++++++++++++++++++++++++++++++++ fs/lustre/obdclass/cl_page.c | 18 ------------------ 5 files changed, 61 insertions(+), 18 deletions(-) diff --git a/fs/lustre/llite/llite_internal.h b/fs/lustre/llite/llite_internal.h index c42330e..0dac71d 100644 --- a/fs/lustre/llite/llite_internal.h +++ b/fs/lustre/llite/llite_internal.h @@ -47,6 +47,7 @@ #include #include #include +#include #include #include #include @@ -287,6 +288,7 @@ struct ll_inode_info { struct mutex lli_xattrs_enq_lock; struct list_head lli_xattrs; /* ll_xattr_entry->xe_list */ struct list_head lli_lccs; /* list of ll_cl_context */ + seqlock_t lli_page_inv_lock; }; static inline void ll_trunc_sem_init(struct ll_trunc_sem *sem) @@ -1834,4 +1836,6 @@ int
ll_file_open_encrypt(struct inode *inode, struct file *filp) bool ll_foreign_is_openable(struct dentry *dentry, unsigned int flags); bool ll_foreign_is_removable(struct dentry *dentry, bool unset); +int ll_filemap_fault(struct vm_area_struct *vma, struct vm_fault *vmf); + #endif /* LLITE_INTERNAL_H */ diff --git a/fs/lustre/llite/llite_lib.c b/fs/lustre/llite/llite_lib.c index 30056a6..f84b6f5 100644 --- a/fs/lustre/llite/llite_lib.c +++ b/fs/lustre/llite/llite_lib.c @@ -1213,6 +1213,7 @@ void ll_lli_init(struct ll_inode_info *lli) memset(lli->lli_jobid, 0, sizeof(lli->lli_jobid)); /* ll_cl_context initialize */ INIT_LIST_HEAD(&lli->lli_lccs); + seqlock_init(&lli->lli_page_inv_lock); } int ll_fill_super(struct super_block *sb) diff --git a/fs/lustre/llite/llite_mmap.c b/fs/lustre/llite/llite_mmap.c index 4acc7ee..db069de 100644 --- a/fs/lustre/llite/llite_mmap.c +++ b/fs/lustre/llite/llite_mmap.c @@ -257,6 +257,25 @@ static inline vm_fault_t to_fault_error(int result) return result; } +int ll_filemap_fault(struct vm_area_struct *vma, struct vm_fault *vmf) +{ + struct inode *inode = file_inode(vma->vm_file); + int ret; + unsigned int seq; + + /* this seqlock lets us notice if a page has been deleted on this inode + * during the fault process, allowing us to catch an erroneous SIGBUS + * See LU-16160 + */ + do { + seq = read_seqbegin(&ll_i2info(inode)->lli_page_inv_lock); + ret = filemap_fault(vmf); + } while (read_seqretry(&ll_i2info(inode)->lli_page_inv_lock, seq) && + (ret & VM_FAULT_SIGBUS)); + + return ret; +} + /** * Lustre implementation of a vm_operations_struct::fault() method, called by * VM to serve page fault (both in kernel and user space).
diff --git a/fs/lustre/llite/vvp_page.c b/fs/lustre/llite/vvp_page.c index f359596..30524fd 100644 --- a/fs/lustre/llite/vvp_page.c +++ b/fs/lustre/llite/vvp_page.c @@ -63,6 +63,42 @@ static void vvp_page_discard(const struct lu_env *env, ll_ra_stats_inc(vmpage->mapping->host, RA_STAT_DISCARDED); } +static void vvp_page_delete(const struct lu_env *env, + const struct cl_page_slice *slice) +{ + struct cl_page *cp = slice->cpl_page; + + if (cp->cp_type == CPT_CACHEABLE) { + struct page *vmpage = cp->cp_vmpage; + struct inode *inode = vmpage->mapping->host; + + LASSERT(PageLocked(vmpage)); + LASSERT((struct cl_page *)vmpage->private == cp); + + /* Drop the reference count held in vvp_page_init */ + refcount_dec(&cp->cp_ref); + + ClearPagePrivate(vmpage); + vmpage->private = 0; + + /* ClearPageUptodate() prevents the page being read by the + * kernel after it has been deleted from Lustre, which avoids + * potential stale data reads. The seqlock allows us to see + * that a page was potentially deleted and catch the resulting + * SIGBUS - see ll_filemap_fault() (LU-16160) + */ + write_seqlock(&ll_i2info(inode)->lli_page_inv_lock); + ClearPageUptodate(vmpage); + write_sequnlock(&ll_i2info(inode)->lli_page_inv_lock); + + /* + * The reference from vmpage to cl_page is removed, + * but the reference back is still here. It is removed + * later in cl_page_free(). + */ + } +} + /** * Handles page transfer errors at VM level.
* @@ -146,6 +182,7 @@ static void vvp_page_completion_write(const struct lu_env *env, } static const struct cl_page_operations vvp_page_ops = { + .cpo_delete = vvp_page_delete, .cpo_discard = vvp_page_discard, .io = { [CRT_READ] = { diff --git a/fs/lustre/obdclass/cl_page.c b/fs/lustre/obdclass/cl_page.c index 7011235..62d8ee5 100644 --- a/fs/lustre/obdclass/cl_page.c +++ b/fs/lustre/obdclass/cl_page.c @@ -704,7 +704,6 @@ void cl_page_discard(const struct lu_env *env, static void __cl_page_delete(const struct lu_env *env, struct cl_page *cp) { const struct cl_page_slice *slice; - struct page *vmpage; int i; PASSERT(env, cp, cp->cp_state != CPS_FREEING); @@ -719,23 +718,6 @@ static void __cl_page_delete(const struct lu_env *env, struct cl_page *cp) if (slice->cpl_ops->cpo_delete) (*slice->cpl_ops->cpo_delete)(env, slice); } - - if (cp->cp_type == CPT_CACHEABLE) { - vmpage = cp->cp_vmpage; - LASSERT(PageLocked(vmpage)); - LASSERT((struct cl_page *)vmpage->private == cp); - - /* Drop the reference count held in vvp_page_init */ - refcount_dec(&cp->cp_ref); - ClearPagePrivate(vmpage); - vmpage->private = 0; - - /* - * The reference from vmpage to cl_page is removed, - * but the reference back is still here. It is removed - * later in cl_page_free(). 
- */ - } } /** From patchwork Sun Apr 9 12:12:44 2023 X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 13205938 From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Sun, 9 Apr 2023 08:12:44 -0400 Message-Id: <1681042400-15491-5-git-send-email-jsimmons@infradead.org> In-Reply-To: <1681042400-15491-1-git-send-email-jsimmons@infradead.org> References: <1681042400-15491-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 04/40] lustre: osc: page fault in osc_release_bounce_pages()
Cc: Andriy Skulysh, Lustre Development List
From: Andriy Skulysh

pga[i] can be uninitialized. It happens after the following code path in osc_build_rpc():

	oa = kmem_cache_zalloc(osc_obdo_kmem, GFP_NOFS);
	if (!oa) {
		rc = -ENOMEM;
		goto out;
	}

On that error path, osc_release_bounce_pages() is reached before pga[] has been filled in, so it must bail out when pga[0] is NULL.

Fixes: ef93d889b4c6 ("lustre: sec: encryption for write path")
HPE-bug-id: LUS-10991
WC-bug-id: https://jira.whamcloud.com/browse/LU-16333
Signed-off-by: Andriy Skulysh
Reviewed-by: Alexander Zarochentsev
Reviewed-by: Alexander Boyko
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/49210
Reviewed-by: Sebastien Buisson
Reviewed-by: Oleg Drokin
Reviewed-by: Alexander Boyko
Reviewed-by: Andreas Dilger
Signed-off-by: James Simmons
---
 fs/lustre/osc/osc_request.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/fs/lustre/osc/osc_request.c b/fs/lustre/osc/osc_request.c
index bd294c5..6ea1db6 100644
--- a/fs/lustre/osc/osc_request.c
+++ b/fs/lustre/osc/osc_request.c
@@ -1463,6 +1463,9 @@ static inline void osc_release_bounce_pages(struct brw_page **pga,
 	struct page **pa = NULL;
 	int i, j = 0;
 
+	if (!pga[0])
+		return;
+
 	if (PageChecked(pga[0]->pg)) {
 		pa = kvmalloc_array(page_count, sizeof(*pa),
 				    GFP_KERNEL | __GFP_ZERO);

From patchwork Sun Apr 9 12:12:45 2023
X-Patchwork-Submitter: James Simmons
X-Patchwork-Id: 13205940
From: James Simmons
To: Andreas Dilger, Oleg Drokin, NeilBrown
Date: Sun, 9 Apr 2023 08:12:45 -0400
Message-Id: <1681042400-15491-6-git-send-email-jsimmons@infradead.org>
In-Reply-To: <1681042400-15491-1-git-send-email-jsimmons@infradead.org>
Subject: [lustre-devel] [PATCH 05/40] lustre: readahead: add stats for read-ahead page count
List-Id: "For discussing Lustre software development."
List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Qian Yingjin This patch adds the stats for read-ahead page count: lctl get_param llite.*.read_ahead_stats llite.lustre-ffff938b7849d000.read_ahead_stats= snapshot_time 4011.320890492 secs.nsecs start_time 0.000000000 secs.nsecs elapsed_time 4011.320890492 secs.nsecs hits 4 samples [pages] misses 1 samples [pages] zero_size_window 4 samples [pages] failed_to_reach_end 1 samples [pages] failed_to_fast_read 1 samples [pages] readahead_pages 1 samples [pages] 255 255 255 WC-bug-id: https://jira.whamcloud.com/browse/LU-16338 Lustre-commit: cdcf97e17e73dfdd6 ("LU-16338 readahead: add stats for read-ahead page count") Signed-off-by: Qian Yingjin Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/49224 Reviewed-by: Andreas Dilger Reviewed-by: Patrick Farrell Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/llite/llite_internal.h | 1 + fs/lustre/llite/lproc_llite.c | 15 ++++++++++++--- fs/lustre/llite/rw.c | 12 ++++++++++++ 3 files changed, 25 insertions(+), 3 deletions(-) diff --git a/fs/lustre/llite/llite_internal.h b/fs/lustre/llite/llite_internal.h index 0dac71d..1d85d0b 100644 --- a/fs/lustre/llite/llite_internal.h +++ b/fs/lustre/llite/llite_internal.h @@ -578,6 +578,7 @@ enum ra_stat { RA_STAT_ASYNC, RA_STAT_FAILED_FAST_READ, RA_STAT_MMAP_RANGE_READ, + RA_STAT_READAHEAD_PAGES, _NR_RA_STAT, }; diff --git a/fs/lustre/llite/lproc_llite.c b/fs/lustre/llite/lproc_llite.c index 3d64a93..70dbc87 100644 --- a/fs/lustre/llite/lproc_llite.c +++ b/fs/lustre/llite/lproc_llite.c @@ -1858,6 +1858,7 @@ void ll_stats_ops_tally(struct ll_sb_info *sbi, int op, long count) [RA_STAT_ASYNC] = "async readahead", [RA_STAT_FAILED_FAST_READ] = "failed to fast read", [RA_STAT_MMAP_RANGE_READ] = "mmap range read", + [RA_STAT_READAHEAD_PAGES] = "readahead_pages", }; 
int ll_debugfs_register_super(struct super_block *sb, const char *name) @@ -1911,9 +1912,17 @@ int ll_debugfs_register_super(struct super_block *sb, const char *name) goto out_stats; } - for (id = 0; id < ARRAY_SIZE(ra_stat_string); id++) - lprocfs_counter_init(sbi->ll_ra_stats, id, LPROCFS_TYPE_PAGES, - ra_stat_string[id]); + for (id = 0; id < ARRAY_SIZE(ra_stat_string); id++) { + if (id == RA_STAT_READAHEAD_PAGES) + lprocfs_counter_init(sbi->ll_ra_stats, id, + LPROCFS_TYPE_PAGES | + LPROCFS_CNTR_AVGMINMAX, + ra_stat_string[id]); + else + lprocfs_counter_init(sbi->ll_ra_stats, id, + LPROCFS_TYPE_PAGES, + ra_stat_string[id]); + } debugfs_create_file("read_ahead_stats", 0644, sbi->ll_debugfs_entry, sbi->ll_ra_stats, &lprocfs_stats_seq_fops); diff --git a/fs/lustre/llite/rw.c b/fs/lustre/llite/rw.c index 2290b31..0b14ea6 100644 --- a/fs/lustre/llite/rw.c +++ b/fs/lustre/llite/rw.c @@ -150,6 +150,14 @@ void ll_ra_stats_inc(struct inode *inode, enum ra_stat which) ll_ra_stats_inc_sbi(sbi, which); } +void ll_ra_stats_add(struct inode *inode, enum ra_stat which, long count) +{ + struct ll_sb_info *sbi = ll_i2sbi(inode); + + LASSERTF(which < _NR_RA_STAT, "which: %u\n", which); + lprocfs_counter_add(sbi->ll_ra_stats, which, count); +} + #define RAS_CDEBUG(ras) \ CDEBUG(D_READA, \ "lre %llu cr %lu cb %llu wsi %lu wp %lu nra %lu rpc %lu r %lu csr %lu so %llu sb %llu sl %llu lr %lu\n", \ @@ -528,6 +536,10 @@ static bool ras_inside_ra_window(pgoff_t idx, struct ra_io_arg *ria) } cl_read_ahead_release(env, &ra); + if (count) + ll_ra_stats_add(vvp_object_inode(io->ci_obj), + RA_STAT_READAHEAD_PAGES, count); + return count; } From patchwork Sun Apr 9 12:12:46 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 13205946 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from 
From: James Simmons
To: Andreas Dilger, Oleg Drokin, NeilBrown
Date: Sun, 9 Apr 2023 08:12:46 -0400
Message-Id: <1681042400-15491-7-git-send-email-jsimmons@infradead.org>
In-Reply-To: <1681042400-15491-1-git-send-email-jsimmons@infradead.org>
Subject: [lustre-devel] [PATCH 06/40] lustre: quota: enforce project quota for root
List-Id: "For discussing Lustre software development."
Cc: Sergey Cheremencev, Lustre Development List
From: Sergey Cheremencev

Patch adds an option to enforce project quotas for root.
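The client-side plumbing for such an option typically reduces to latching a server-advertised feature bit into a 1-bit structure field, as the diff below does for `cl_root_prjquota`. A minimal sketch of that `!!(flags & BIT)` idiom, with hypothetical names standing in for the real `OBD_FL_*` flags and `struct client_obd` fields:

```c
#include <stdint.h>

/* Hypothetical stand-ins for OBD_FL_ROOT_SQUASH / OBD_FL_ROOT_PRJQUOTA. */
#define FL_ROOT_SQUASH   0x00800000u
#define FL_ROOT_PRJQUOTA 0x01000000u

/* 1-bit fields, as in struct client_obd: "!!" normalizes the flag word
 * to 0/1, so bits above bit 0 are not silently truncated on assignment. */
struct cli_state {
	unsigned int root_squash:1;
	unsigned int root_prjquota:1;
};

static void cli_update_flags(struct cli_state *cli, uint32_t flags)
{
	cli->root_squash = !!(flags & FL_ROOT_SQUASH);
	cli->root_prjquota = !!(flags & FL_ROOT_PRJQUOTA);
}
```

Without the `!!`, assigning `flags & FL_ROOT_PRJQUOTA` (bit 24) to a 1-bit field would always store 0.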
It is disabled by default, to enable set osd-ldiskfs.*.quota_slave.root_prj_enable to 1 at each target where you need this option. WC-bug-id: https://jira.whamcloud.com/browse/LU-16415 Lustre-commit: f147655c33ea61450 ("LU-16415 quota: enforce project quota for root") Signed-off-by: Sergey Cheremencev Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/49460 Reviewed-by: Hongchao Zhang Reviewed-by: Sebastien Buisson Reviewed-by: Oleg Drokin Reviewed-by: Andreas Dilger Signed-off-by: James Simmons --- fs/lustre/include/obd.h | 8 ++++---- fs/lustre/osc/osc_cache.c | 2 +- fs/lustre/osc/osc_quota.c | 1 + include/uapi/linux/lustre/lustre_idl.h | 2 ++ 4 files changed, 8 insertions(+), 5 deletions(-) diff --git a/fs/lustre/include/obd.h b/fs/lustre/include/obd.h index a980bf0..54bef2e 100644 --- a/fs/lustre/include/obd.h +++ b/fs/lustre/include/obd.h @@ -208,8 +208,10 @@ struct client_obd { unsigned int cl_checksum:1, /* 0 = disabled, 1 = enabled */ cl_checksum_dump:1, /* same */ cl_ocd_grant_param:1, - cl_lsom_update:1; /* send LSOM updates */ - /* supported checksum types that are worked out at connect time */ + cl_lsom_update:1, /* send LSOM updates */ + cl_root_squash:1, /* if root squash enabled*/ + /* check prj quota for root */ + cl_root_prjquota:1; enum lustre_sec_part cl_sp_me; enum lustre_sec_part cl_sp_to; struct sptlrpc_flavor cl_flvr_mgc; /* fixed flavor of mgc->mgs */ @@ -233,8 +235,6 @@ struct client_obd { struct list_head cl_grant_chain; time64_t cl_grant_shrink_interval; /* seconds */ - int cl_root_squash; /* if root squash enabled*/ - /* A chunk is an optimal size used by osc_extent to determine * the extent size. 
A chunk is max(PAGE_SIZE, OST block size) */ diff --git a/fs/lustre/osc/osc_cache.c b/fs/lustre/osc/osc_cache.c index b339aef..dddf98f 100644 --- a/fs/lustre/osc/osc_cache.c +++ b/fs/lustre/osc/osc_cache.c @@ -2366,7 +2366,7 @@ int osc_queue_async_io(const struct lu_env *env, struct cl_io *io, * we should bypass quota */ if ((!oio->oi_cap_sys_resource || - cli->cl_root_squash) && + cli->cl_root_squash || cli->cl_root_prjquota) && !io->ci_noquota) { struct cl_object *obj; struct cl_attr *attr; diff --git a/fs/lustre/osc/osc_quota.c b/fs/lustre/osc/osc_quota.c index 708ad3c..c48a89f3 100644 --- a/fs/lustre/osc/osc_quota.c +++ b/fs/lustre/osc/osc_quota.c @@ -120,6 +120,7 @@ int osc_quota_setdq(struct client_obd *cli, u64 xid, const unsigned int qid[], mutex_lock(&cli->cl_quota_mutex); cli->cl_root_squash = !!(flags & OBD_FL_ROOT_SQUASH); + cli->cl_root_prjquota = !!(flags & OBD_FL_ROOT_PRJQUOTA); /* still mark the quots is running out for the old request, because it * could be processed after the new request at OST, the side effect is * the following request will be processed synchronously, but it will diff --git a/include/uapi/linux/lustre/lustre_idl.h b/include/uapi/linux/lustre/lustre_idl.h index 99735fc..b4185a7 100644 --- a/include/uapi/linux/lustre/lustre_idl.h +++ b/include/uapi/linux/lustre/lustre_idl.h @@ -961,6 +961,7 @@ enum obdo_flags { OBD_FL_FLUSH = 0x00200000, /* flush pages on the OST */ OBD_FL_SHORT_IO = 0x00400000, /* short io request */ OBD_FL_ROOT_SQUASH = 0x00800000, /* root squash */ + OBD_FL_ROOT_PRJQUOTA = 0x01000000, /* check prj quota for root */ /* OBD_FL_LOCAL_MASK = 0xF0000000, was local-only flags until 2.10 */ /* @@ -1250,6 +1251,7 @@ struct hsm_state_set { * it to sync quickly */ #define OBD_BRW_OVER_PRJQUOTA 0x8000 /* Running out of project quota */ +#define OBD_BRW_ROOT_PRJQUOTA 0x10000 /* check project quota for root */ #define OBD_BRW_RDMA_ONLY 0x20000 /* RPC contains RDMA-only pages*/ #define OBD_BRW_SYS_RESOURCE 0x40000 /* page 
has CAP_SYS_RESOURCE */

From patchwork Sun Apr 9 12:12:47 2023
X-Patchwork-Submitter: James Simmons
X-Patchwork-Id: 13205942
From: James Simmons
To: Andreas Dilger, Oleg Drokin, NeilBrown
Date: Sun, 9 Apr 2023 08:12:47 -0400
Message-Id: <1681042400-15491-8-git-send-email-jsimmons@infradead.org>
In-Reply-To: <1681042400-15491-1-git-send-email-jsimmons@infradead.org>
Subject: [lustre-devel] [PATCH 07/40] lustre: ldlm: send the cancel RPC asap
List-Id: "For discussing Lustre software development."
Cc: Yang Sheng, Lustre Development List
From: Yang Sheng

This patch tries to send the cancel RPC as soon as possible when a bl_ast is received from the server. One existing problem is that a lock may already have been added to the regular cancel queue for other reasons before the bl_ast arrives, which prevents the lock from being canceled in a timely manner. The other problem is that many locks are collected into one RPC to save network traffic, and assembling that RPC can take a long time while dirty pages are being flushed.

- The lock cancel is now processed even if the lock was already added to the bl queue when the bl_ast arrived, unless the cancel RPC has already been sent.
- The cancel RPC is sent immediately for a bl_ast lock, without trying to pack additional locks into it.

WC-bug-id: https://jira.whamcloud.com/browse/LU-16285
Lustre-commit: b65374d96b2027213 ("LU-16285 ldlm: send the cancel RPC asap")
Signed-off-by: Yang Sheng
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/49527
Reviewed-by: Andreas Dilger
Reviewed-by: Qian Yingjin
Reviewed-by: Oleg Drokin
Signed-off-by: James Simmons
---
fs/lustre/include/lustre_dlm.h | 1 + fs/lustre/ldlm/ldlm_lockd.c | 9 ++-- fs/lustre/ldlm/ldlm_request.c | 100 ++++++++++++++++++++++++++++------------- 3 files changed, 75 insertions(+), 35 deletions(-) diff --git a/fs/lustre/include/lustre_dlm.h b/fs/lustre/include/lustre_dlm.h index d08c48f..3a4f152 100644 --- a/fs/lustre/include/lustre_dlm.h +++ b/fs/lustre/include/lustre_dlm.h @@ -593,6 +593,7 @@ enum ldlm_cancel_flags { LCF_BL_AST = 0x4, /* Cancel locks marked as LDLM_FL_BL_AST * in the same RPC */ + LCF_ONE_LOCK = 0x8, /* Cancel locks pack only one lock.
*/ }; struct ldlm_flock { diff --git a/fs/lustre/ldlm/ldlm_lockd.c b/fs/lustre/ldlm/ldlm_lockd.c index 0ff4e3a..3a085db 100644 --- a/fs/lustre/ldlm/ldlm_lockd.c +++ b/fs/lustre/ldlm/ldlm_lockd.c @@ -700,8 +700,7 @@ static int ldlm_callback_handler(struct ptlrpc_request *req) * we can tell the server we have no lock. Otherwise, we * should send cancel after dropping the cache. */ - if ((ldlm_is_canceling(lock) && ldlm_is_bl_done(lock)) || - ldlm_is_failed(lock)) { + if (ldlm_is_ast_sent(lock) || ldlm_is_failed(lock)) { LDLM_DEBUG(lock, "callback on lock %#llx - lock disappeared", dlm_req->lock_handle[0].cookie); @@ -736,7 +735,7 @@ static int ldlm_callback_handler(struct ptlrpc_request *req) switch (lustre_msg_get_opc(req->rq_reqmsg)) { case LDLM_BL_CALLBACK: - CDEBUG(D_INODE, "blocking ast\n"); + LDLM_DEBUG(lock, "blocking ast\n"); req_capsule_extend(&req->rq_pill, &RQF_LDLM_BL_CALLBACK); if (!ldlm_is_cancel_on_block(lock)) { rc = ldlm_callback_reply(req, 0); @@ -748,14 +747,14 @@ static int ldlm_callback_handler(struct ptlrpc_request *req) ldlm_handle_bl_callback(ns, &dlm_req->lock_desc, lock); break; case LDLM_CP_CALLBACK: - CDEBUG(D_INODE, "completion ast\n"); + LDLM_DEBUG(lock, "completion ast\n"); req_capsule_extend(&req->rq_pill, &RQF_LDLM_CP_CALLBACK); rc = ldlm_handle_cp_callback(req, ns, dlm_req, lock); if (!OBD_FAIL_CHECK(OBD_FAIL_LDLM_CANCEL_BL_CB_RACE)) ldlm_callback_reply(req, rc); break; case LDLM_GL_CALLBACK: - CDEBUG(D_INODE, "glimpse ast\n"); + LDLM_DEBUG(lock, "glimpse ast\n"); req_capsule_extend(&req->rq_pill, &RQF_LDLM_GL_CALLBACK); ldlm_handle_gl_callback(req, ns, dlm_req, lock); break; diff --git a/fs/lustre/ldlm/ldlm_request.c b/fs/lustre/ldlm/ldlm_request.c index 8b244d7..ef3ad28 100644 --- a/fs/lustre/ldlm/ldlm_request.c +++ b/fs/lustre/ldlm/ldlm_request.c @@ -994,14 +994,34 @@ static u64 ldlm_cli_cancel_local(struct ldlm_lock *lock) return rc; } +static inline int __ldlm_pack_lock(struct ldlm_lock *lock, + struct ldlm_request *dlm) +{ + 
LASSERT(lock->l_conn_export); + lock_res_and_lock(lock); + if (ldlm_is_ast_sent(lock)) { + unlock_res_and_lock(lock); + return 0; + } + ldlm_set_ast_sent(lock); + unlock_res_and_lock(lock); + + /* Pack the lock handle to the given request buffer. */ + LDLM_DEBUG(lock, "packing"); + dlm->lock_handle[dlm->lock_count++] = lock->l_remote_handle; + + return 1; +} +#define ldlm_cancel_pack(req, head, count) \ + _ldlm_cancel_pack(req, NULL, head, count) + /** * Pack @count locks in @head into ldlm_request buffer of request @req. */ -static void ldlm_cancel_pack(struct ptlrpc_request *req, +static int _ldlm_cancel_pack(struct ptlrpc_request *req, struct ldlm_lock *lock, struct list_head *head, int count) { struct ldlm_request *dlm; - struct ldlm_lock *lock; int max, packed = 0; dlm = req_capsule_client_get(&req->rq_pill, &RMF_DLM_REQ); @@ -1019,24 +1039,23 @@ static void ldlm_cancel_pack(struct ptlrpc_request *req, * so that the server cancel would call filter_lvbo_update() less * frequently. */ - list_for_each_entry(lock, head, l_bl_ast) { - if (!count--) - break; - LASSERT(lock->l_conn_export); - /* Pack the lock handle to the given request buffer. */ - LDLM_DEBUG(lock, "packing"); - dlm->lock_handle[dlm->lock_count++] = lock->l_remote_handle; - packed++; + if (lock) { /* only pack one lock */ + packed = __ldlm_pack_lock(lock, dlm); + } else { + list_for_each_entry(lock, head, l_bl_ast) { + if (!count--) + break; + packed += __ldlm_pack_lock(lock, dlm); + } } - CDEBUG(D_DLMTRACE, "%d locks packed\n", packed); + return packed; } /** * Prepare and send a batched cancel RPC. It will include @count lock * handles of locks given in @cancels list. 
*/ -static int ldlm_cli_cancel_req(struct obd_export *exp, - struct list_head *cancels, +static int ldlm_cli_cancel_req(struct obd_export *exp, void *ptr, int count, enum ldlm_cancel_flags flags) { struct ptlrpc_request *req = NULL; @@ -1085,7 +1104,15 @@ static int ldlm_cli_cancel_req(struct obd_export *exp, req->rq_reply_portal = LDLM_CANCEL_REPLY_PORTAL; ptlrpc_at_set_req_timeout(req); - ldlm_cancel_pack(req, cancels, count); + if (flags & LCF_ONE_LOCK) + rc = _ldlm_cancel_pack(req, ptr, NULL, count); + else + rc = _ldlm_cancel_pack(req, NULL, ptr, count); + if (rc == 0) { + ptlrpc_req_finished(req); + sent = count; + goto out; + } ptlrpc_request_set_replen(req); if (flags & LCF_ASYNC) { @@ -1235,10 +1262,10 @@ int ldlm_cli_convert(struct ldlm_lock *lock, * Lock must not have any readers or writers by this time. */ int ldlm_cli_cancel(const struct lustre_handle *lockh, - enum ldlm_cancel_flags cancel_flags) + enum ldlm_cancel_flags flags) { struct obd_export *exp; - int avail, count = 1; + int avail, count = 1, bl_ast = 0; u64 rc = 0; struct ldlm_namespace *ns; struct ldlm_lock *lock; @@ -1253,11 +1280,17 @@ int ldlm_cli_cancel(const struct lustre_handle *lockh, lock_res_and_lock(lock); LASSERT(!ldlm_is_converting(lock)); - /* Lock is being canceled and the caller doesn't want to wait */ - if (ldlm_is_canceling(lock)) { + if (ldlm_is_bl_ast(lock)) { + if (ldlm_is_ast_sent(lock)) { + unlock_res_and_lock(lock); + LDLM_LOCK_RELEASE(lock); + return 0; + } + bl_ast = 1; + } else if (ldlm_is_canceling(lock)) { + /* Lock is being canceled and the caller doesn't want to wait */ unlock_res_and_lock(lock); - - if (!(cancel_flags & LCF_ASYNC)) + if (flags & LCF_ASYNC) wait_event_idle(lock->l_waitq, is_bl_done(lock)); LDLM_LOCK_RELEASE(lock); @@ -1267,24 +1300,30 @@ int ldlm_cli_cancel(const struct lustre_handle *lockh, ldlm_set_canceling(lock); unlock_res_and_lock(lock); - if (cancel_flags & LCF_LOCAL) + if (flags & LCF_LOCAL) 
OBD_FAIL_TIMEOUT(OBD_FAIL_LDLM_LOCAL_CANCEL_PAUSE, cfs_fail_val); rc = ldlm_cli_cancel_local(lock); - if (rc == LDLM_FL_LOCAL_ONLY || cancel_flags & LCF_LOCAL) { + if (rc == LDLM_FL_LOCAL_ONLY || flags & LCF_LOCAL) { LDLM_LOCK_RELEASE(lock); return 0; } - /* - * Even if the lock is marked as LDLM_FL_BL_AST, this is a LDLM_CANCEL - * RPC which goes to canceld portal, so we can cancel other LRU locks - * here and send them all as one LDLM_CANCEL RPC. - */ - LASSERT(list_empty(&lock->l_bl_ast)); - list_add(&lock->l_bl_ast, &cancels); exp = lock->l_conn_export; + if (bl_ast) { /* Send RPC immedaitly for LDLM_FL_BL_AST */ + ldlm_cli_cancel_req(exp, lock, count, flags | LCF_ONE_LOCK); + LDLM_LOCK_RELEASE(lock); + return 0; + } + + LASSERT(list_empty(&lock->l_bl_ast)); + list_add(&lock->l_bl_ast, &cancels); + /* + * This is a LDLM_CANCEL RPC which goes to canceld portal, + * so we can cancel other LRU locks here and send them all + * as one LDLM_CANCEL RPC. + */ if (exp_connect_cancelset(exp)) { avail = ldlm_format_handles_avail(class_exp2cliimp(exp), &RQF_LDLM_CANCEL, @@ -1295,7 +1334,8 @@ int ldlm_cli_cancel(const struct lustre_handle *lockh, count += ldlm_cancel_lru_local(ns, &cancels, 0, avail - 1, LCF_BL_AST, 0); } - ldlm_cli_cancel_list(&cancels, count, NULL, cancel_flags); + ldlm_cli_cancel_list(&cancels, count, NULL, flags); + return 0; } EXPORT_SYMBOL(ldlm_cli_cancel); From patchwork Sun Apr 9 12:12:48 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 13205948 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from pdx1-mailman-customer002.dreamhost.com (listserver-buz.dreamhost.com [69.163.136.29]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 48F31C77B61 for ; Sun, 9 Apr 2023 
12:26:39 +0000 (UTC)

From: James Simmons
To: Andreas Dilger, Oleg Drokin, NeilBrown
Date: Sun, 9 Apr 2023 08:12:48 -0400
Message-Id: <1681042400-15491-9-git-send-email-jsimmons@infradead.org>
In-Reply-To: <1681042400-15491-1-git-send-email-jsimmons@infradead.org>
Subject: [lustre-devel] [PATCH 08/40] lustre: enc: align Base64 encoding with RFC 4648 base64url
List-Id: "For discussing Lustre software development."
Cc: Lustre Development List
From: Sebastien Buisson

Lustre encryption uses a Base64 encoding to encode no-key filenames (the filenames that are presented to userspace when a directory is listed without its encryption key). Make this Base64 encoding compliant with RFC 4648 base64url, and use a leading '+' character to distinguish digested names.
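The alphabet change itself is small: RFC 4648 base64url replaces the '+' and '/' of standard Base64 with '-' and '_', which is what frees '+' for use as the digested-name marker. A sketch of the mapping (the helper name is illustrative, not the Lustre or fscrypt implementation):

```c
/* Map one character of the standard Base64 alphabet to its RFC 4648
 * base64url equivalent; only the last two alphabet characters differ. */
static char b64_to_b64url(char c)
{
	if (c == '+')
		return '-';
	if (c == '/')
		return '_';
	return c;	/* A-Z, a-z, 0-9 and '=' padding are unchanged */
}
```

Because a base64url-encoded no-key name can never contain '+', a leading '+' unambiguously marks a digested (long) name, while old-style encodings keep using '_' (LLCRYPT_DIGESTED_CHAR_OLD).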
This is adapted from kernel commit ba47b515f594 ("fscrypt: align Base64 encoding with RFC 4648 base64url") To maintain compatibility with older clients, a new llite parameter named 'filename_enc_use_old_base64' is introduced, set to 0 by default. When 0, Lustre uses new-fashion base64 encoding. When set to 1, Lustre uses old-style base64 encoding. To set this parameter globally for all clients, do on the MGS: mgs# lctl set_param -P llite.*.filename_enc_use_old_base64={0,1} WC-bug-id: https://jira.whamcloud.com/browse/LU-16374 Lustre-commit: 583ee6911b6cac7f2 ("LU-16374 enc: align Base64 encoding with RFC 4648 base64url") Signed-off-by: Sebastien Buisson Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/49581 Reviewed-by: Andreas Dilger Reviewed-by: jsimmons Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/include/lustre_crypto.h | 3 +++ fs/lustre/include/lustre_disk.h | 3 ++- fs/lustre/llite/crypto.c | 24 ++++++++++++------- fs/lustre/llite/llite_lib.c | 3 +++ fs/lustre/llite/lproc_llite.c | 49 +++++++++++++++++++++++++++++++++++++++ 5 files changed, 72 insertions(+), 10 deletions(-) diff --git a/fs/lustre/include/lustre_crypto.h b/fs/lustre/include/lustre_crypto.h index 2252798..ced1a191 100644 --- a/fs/lustre/include/lustre_crypto.h +++ b/fs/lustre/include/lustre_crypto.h @@ -32,6 +32,9 @@ #include +#define LLCRYPT_DIGESTED_CHAR '+' +#define LLCRYPT_DIGESTED_CHAR_OLD '_' + /* Macro to extract digest from Lustre specific structures */ #define LLCRYPT_EXTRACT_DIGEST(name, len) \ ((name) + round_down((len) - FS_CRYPTO_BLOCK_SIZE - 1, \ diff --git a/fs/lustre/include/lustre_disk.h b/fs/lustre/include/lustre_disk.h index 15f94ad8..a8e935e 100644 --- a/fs/lustre/include/lustre_disk.h +++ b/fs/lustre/include/lustre_disk.h @@ -136,7 +136,8 @@ struct lustre_sb_info { struct fscrypt_dummy_context lsi_dummy_enc_ctx; }; -#define LSI_UMOUNT_FAILOVER 0x00200000 +#define LSI_UMOUNT_FAILOVER 0x00200000 +#define LSI_FILENAME_ENC_B64_OLD_CLI 
0x01000000 /* use old style base64 */ #define s2lsi(sb) ((struct lustre_sb_info *)((sb)->s_fs_info)) #define s2lsi_nocast(sb) ((sb)->s_fs_info) diff --git a/fs/lustre/llite/crypto.c b/fs/lustre/llite/crypto.c index d6750fb..5fb7f4d 100644 --- a/fs/lustre/llite/crypto.c +++ b/fs/lustre/llite/crypto.c @@ -227,15 +227,16 @@ int ll_setup_filename(struct inode *dir, const struct qstr *iname, struct qstr dname; int rc; - if (fid) { - fid->f_seq = 0; - fid->f_oid = 0; - fid->f_ver = 0; - } - if (fid && IS_ENCRYPTED(dir) && !fscrypt_has_encryption_key(dir) && - iname->name[0] == '_') - digested = 1; + !fscrypt_has_encryption_key(dir)) { + struct lustre_sb_info *lsi = s2lsi(dir->i_sb); + + if ((!(lsi->lsi_flags & LSI_FILENAME_ENC_B64_OLD_CLI) && + iname->name[0] == LLCRYPT_DIGESTED_CHAR) || + ((lsi->lsi_flags & LSI_FILENAME_ENC_B64_OLD_CLI) && + iname->name[0] == LLCRYPT_DIGESTED_CHAR_OLD)) + digested = 1; + } dname.name = iname->name + digested; dname.len = iname->len - digested; @@ -375,6 +376,8 @@ int ll_fname_disk_to_usr(struct inode *inode, } if (lltr.len > FS_CRYPTO_BLOCK_SIZE * 2 && !fscrypt_has_encryption_key(inode)) { + struct lustre_sb_info *lsi = s2lsi(inode->i_sb); + digested = 1; /* Without the key for long names, set the dentry name * to the representing struct ll_digest_filename. 
It @@ -391,7 +394,10 @@ int ll_fname_disk_to_usr(struct inode *inode, lltr.name = (char *)&digest; lltr.len = sizeof(digest); - oname->name[0] = '_'; + if (!(lsi->lsi_flags & LSI_FILENAME_ENC_B64_OLD_CLI)) + oname->name[0] = LLCRYPT_DIGESTED_CHAR; + else + oname->name[0] = LLCRYPT_DIGESTED_CHAR_OLD; oname->name = oname->name + 1; oname->len--; } diff --git a/fs/lustre/llite/llite_lib.c b/fs/lustre/llite/llite_lib.c index f84b6f5..e48bb6c 100644 --- a/fs/lustre/llite/llite_lib.c +++ b/fs/lustre/llite/llite_lib.c @@ -508,10 +508,13 @@ static int client_common_fill_super(struct super_block *sb, char *md, char *dt) } if (ll_sbi_has_name_encrypt(sbi) && !obd_connect_has_name_enc(data)) { + struct lustre_sb_info *lsi = s2lsi(sb); + if (ll_sb_has_test_dummy_encryption(sb)) LCONSOLE_WARN("%s: server %s does not support name encryption, not using it.\n", sbi->ll_fsname, sbi->ll_md_exp->exp_obd->obd_name); + lsi->lsi_flags &= ~LSI_FILENAME_ENC_B64_OLD_CLI; ll_sbi_set_name_encrypt(sbi, false); } diff --git a/fs/lustre/llite/lproc_llite.c b/fs/lustre/llite/lproc_llite.c index 70dbc87..48d93c6 100644 --- a/fs/lustre/llite/lproc_llite.c +++ b/fs/lustre/llite/lproc_llite.c @@ -1653,6 +1653,53 @@ static ssize_t ll_nosquash_nids_seq_write(struct file *file, LDEBUGFS_SEQ_FOPS(ll_nosquash_nids); +static int ll_old_b64_enc_seq_show(struct seq_file *m, void *v) +{ + struct super_block *sb = m->private; + struct lustre_sb_info *lsi = s2lsi(sb); + + seq_printf(m, "%u\n", + lsi->lsi_flags & LSI_FILENAME_ENC_B64_OLD_CLI ? 
1 : 0); + return 0; +} + +static ssize_t ll_old_b64_enc_seq_write(struct file *file, + const char __user *buffer, + size_t count, loff_t *off) +{ + struct seq_file *m = file->private_data; + struct super_block *sb = m->private; + struct lustre_sb_info *lsi = s2lsi(sb); + struct ll_sb_info *sbi = ll_s2sbi(sb); + bool val; + int rc; + + rc = kstrtobool_from_user(buffer, count, &val); + if (rc) + return rc; + + if (val) { + if (!ll_sbi_has_name_encrypt(sbi)) { + /* server does not support name encryption, + * so force it to NULL on client + */ + CDEBUG(D_SEC, + "%s: server does not support name encryption\n", + sbi->ll_fsname); + lsi->lsi_flags &= ~LSI_FILENAME_ENC_B64_OLD_CLI; + return -EOPNOTSUPP; + } + + lsi->lsi_flags |= LSI_FILENAME_ENC_B64_OLD_CLI; + } else { + lsi->lsi_flags &= ~LSI_FILENAME_ENC_B64_OLD_CLI; + } + + return count; +} + +LDEBUGFS_SEQ_FOPS(ll_old_b64_enc); + static int ll_pcc_seq_show(struct seq_file *m, void *v) { struct super_block *sb = m->private; @@ -1709,6 +1756,8 @@ struct ldebugfs_vars lprocfs_llite_obd_vars[] = { .fops = &ll_nosquash_nids_fops }, { .name = "pcc", .fops = &ll_pcc_fops, }, + { .name = "filename_enc_use_old_base64", + .fops = &ll_old_b64_enc_fops, }, { NULL } }; From patchwork Sun Apr 9 12:12:49 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 13205944 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from pdx1-mailman-customer002.dreamhost.com (listserver-buz.dreamhost.com [69.163.136.29]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id B7AF7C77B61 for ; Sun, 9 Apr 2023 12:22:16 +0000 (UTC) Received: from pdx1-mailman-customer002.dreamhost.com (localhost [127.0.0.1]) by pdx1-mailman-customer002.dreamhost.com (Postfix) with ESMTP id 
4PvWJb20BTz215N; Sun, 9 Apr 2023 05:15:11 -0700 (PDT) Received: from smtp4.ccs.ornl.gov (smtp4.ccs.ornl.gov [160.91.203.40]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by pdx1-mailman-customer002.dreamhost.com (Postfix) with ESMTPS id 4PvWHH6gl7z1yBw for ; Sun, 9 Apr 2023 05:14:03 -0700 (PDT) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp4.ccs.ornl.gov (Postfix) with ESMTP id E060F1008277; Sun, 9 Apr 2023 08:13:27 -0400 (EDT) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id DEC3C2B3; Sun, 9 Apr 2023 08:13:27 -0400 (EDT) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Sun, 9 Apr 2023 08:12:49 -0400 Message-Id: <1681042400-15491-10-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1681042400-15491-1-git-send-email-jsimmons@infradead.org> References: <1681042400-15491-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 09/40] lustre: quota: fix insane grant quota X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.39 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Hongchao Zhang , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Hongchao Zhang Fix the insane grant value in quota master/slave index, the logs often contain the content similar to the following, LustreError: 39815:0:(qmt_handler.c:527:qmt_dqacq0()) $$$ Release too much! 
uuid:work-MDT0000-lwp-MDT0002_UUID release:18446744070274413724 granted:18446744070291193856, total:4118877744 qmt:work-QMT0000 pool:0-dt id:40212 enforced:1 hard:128849018880 soft:12884901888 granted:4118877744 time:0 qunit: 16777216 edquot:0 may_rel:0 revoke:0 default:no This can be caused by chgrp, which reserves quota at the MDT before changing the GID of a file, then releases the reserved quota after the file's GID has been changed on the corresponding OST (this issue is tracked in LU-5152 and LU-11303). In some cases quota can be released even though it was not reserved correctly, which drives the grant down to a negative value. Because the grant is stored as a "u64", the negative value is interpreted as an insanely large one; normal grant release then fails, and the grant field of the affected quota ID in the quota file (at both the QMT and the QSD) holds an insane value that cannot be reset correctly. This patch resets the affected quota by clearing the quota limits and grant; the grant is reported again by each QSD when the quota ID is next enforced, and the QMT rebuilds the grant from those reports.
WC-bug-id: https://jira.whamcloud.com/browse/LU-15880 Lustre-commit: a2fd4d3aee9739dcb ("LU-15880 quota: fix insane grant quota") Signed-off-by: Hongchao Zhang Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/48981 Reviewed-by: Sergey Cheremencev Reviewed-by: Andreas Dilger Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/llite/dir.c | 1 + include/uapi/linux/lustre/lustre_user.h | 1 + 2 files changed, 2 insertions(+) diff --git a/fs/lustre/llite/dir.c b/fs/lustre/llite/dir.c index 7dca0fc..56ef1bb 100644 --- a/fs/lustre/llite/dir.c +++ b/fs/lustre/llite/dir.c @@ -1158,6 +1158,7 @@ int quotactl_ioctl(struct super_block *sb, struct if_quotactl *qctl) case LUSTRE_Q_SETINFOPOOL: case LUSTRE_Q_SETDEFAULT_POOL: case LUSTRE_Q_DELETEQID: + case LUSTRE_Q_RESETQID: if (!capable(CAP_SYS_ADMIN)) return -EPERM; diff --git a/include/uapi/linux/lustre/lustre_user.h b/include/uapi/linux/lustre/lustre_user.h index 9bbb1c9..68fddcf 100644 --- a/include/uapi/linux/lustre/lustre_user.h +++ b/include/uapi/linux/lustre/lustre_user.h @@ -1041,6 +1041,7 @@ static inline void obd_uuid2fsname(char *buf, char *uuid, int buflen) #define LUSTRE_Q_GETDEFAULT_POOL 0x800013 /* get default pool quota*/ #define LUSTRE_Q_SETDEFAULT_POOL 0x800014 /* set default pool quota */ #define LUSTRE_Q_DELETEQID 0x800015 /* delete quota ID */ +#define LUSTRE_Q_RESETQID 0x800016 /* reset quota ID */ /* In the current Lustre implementation, the grace time is either the time * or the timestamp to be used after some quota ID exceeds the soft limt, * 48 bits should be enough, its high 16 bits can be used as quota flags. 
From patchwork Sun Apr 9 12:12:50 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 13205950 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from pdx1-mailman-customer002.dreamhost.com (listserver-buz.dreamhost.com [69.163.136.29]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 96D97C77B70 for ; Sun, 9 Apr 2023 12:29:02 +0000 (UTC) Received: from pdx1-mailman-customer002.dreamhost.com (localhost [127.0.0.1]) by pdx1-mailman-customer002.dreamhost.com (Postfix) with ESMTP id 4PvWLd5yWqz21Hj; Sun, 9 Apr 2023 05:16:57 -0700 (PDT) Received: from smtp4.ccs.ornl.gov (smtp4.ccs.ornl.gov [160.91.203.40]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by pdx1-mailman-customer002.dreamhost.com (Postfix) with ESMTPS id 4PvWHK1ppRz1yCH for ; Sun, 9 Apr 2023 05:14:05 -0700 (PDT) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp4.ccs.ornl.gov (Postfix) with ESMTP id E63551008278; Sun, 9 Apr 2023 08:13:27 -0400 (EDT) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id E39C62B4; Sun, 9 Apr 2023 08:13:27 -0400 (EDT) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Sun, 9 Apr 2023 08:12:50 -0400 Message-Id: <1681042400-15491-11-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1681042400-15491-1-git-send-email-jsimmons@infradead.org> References: <1681042400-15491-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 10/40] lustre: llite: check truncated page in ->readpage() X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.39 Precedence: list List-Id: "For discussing Lustre software development." 
List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Qian Yingjin The page end offset calculation in filemap_get_read_batch() was off by one. The bug was introduced by commit v5.11-10234-gcbd59c48ae ("mm/filemap: use head pages in generic_file_buffered_read"). When a read is submitted with end offset 1048575, it calculates an end page index of 256 where it should be 255. As a result, the readpage() call for the page with index 256 crosses a stripe boundary and may not be covered by a DLM extent lock. This happens in a corner race case: filemap_get_read_batch() batches the page with index 256 for read, but this page is later removed from the page cache because the lock protecting it is revoked; it still holds a reference count taken by the batch. The page therefore reaches the read path without being covered by any DLM lock. The solution is simple. We can check in ->readpage() whether the page was truncated and removed from the page cache, using the page's address_space pointer. If it was truncated, return AOP_TRUNCATED_PAGE to the upper caller. The kernel will then retry batching pages, and the truncated page will not be added again since it has already been removed from the file's page cache.
WC-bug-id: https://jira.whamcloud.com/browse/LU-16412 Lustre-commit: 209afbe28b5f164bd ("LU-16412 llite: check truncated page in ->readpage()") Signed-off-by: Qian Yingjin Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/49433 Reviewed-by: Patrick Farrell Reviewed-by: Zhenyu Xu Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/include/obd_support.h | 6 ++++-- fs/lustre/llite/rw.c | 35 +++++++++++++++++++++++++++++++++++ fs/lustre/llite/rw26.c | 7 +++++++ 3 files changed, 46 insertions(+), 2 deletions(-) diff --git a/fs/lustre/include/obd_support.h b/fs/lustre/include/obd_support.h index a2930c8..4ef5c61 100644 --- a/fs/lustre/include/obd_support.h +++ b/fs/lustre/include/obd_support.h @@ -458,8 +458,8 @@ /* was OBD_FAIL_LLOG_CATINFO_NET 0x1309 until 2.3 */ #define OBD_FAIL_MDS_SYNC_CAPA_SL 0x1310 #define OBD_FAIL_SEQ_ALLOC 0x1311 -#define OBD_FAIL_PLAIN_RECORDS 0x1319 -#define OBD_FAIL_CATALOG_FULL_CHECK 0x131a +#define OBD_FAIL_PLAIN_RECORDS 0x1319 +#define OBD_FAIL_CATALOG_FULL_CHECK 0x131a #define OBD_FAIL_LLITE 0x1400 #define OBD_FAIL_LLITE_FAULT_TRUNC_RACE 0x1401 @@ -488,6 +488,8 @@ #define OBD_FAIL_LLITE_PAGE_ALLOC 0x1418 #define OBD_FAIL_LLITE_OPEN_DELAY 0x1419 #define OBD_FAIL_LLITE_XATTR_PAUSE 0x1420 +#define OBD_FAIL_LLITE_PAGE_INVALIDATE_PAUSE 0x1421 +#define OBD_FAIL_LLITE_READPAGE_PAUSE 0x1422 #define OBD_FAIL_FID_INDIR 0x1501 #define OBD_FAIL_FID_INLMA 0x1502 diff --git a/fs/lustre/llite/rw.c b/fs/lustre/llite/rw.c index 0b14ea6..dea2af1 100644 --- a/fs/lustre/llite/rw.c +++ b/fs/lustre/llite/rw.c @@ -1865,6 +1865,41 @@ int ll_readpage(struct file *file, struct page *vmpage) struct ll_sb_info *sbi = ll_i2sbi(inode); int result; + if (OBD_FAIL_PRECHECK(OBD_FAIL_LLITE_READPAGE_PAUSE)) { + unlock_page(vmpage); + OBD_FAIL_TIMEOUT(OBD_FAIL_LLITE_READPAGE_PAUSE, cfs_fail_val); + lock_page(vmpage); + } + + /* + * The @vmpage got truncated. 
+ * This is a kernel bug introduced since kernel 5.12: + * comment: cbd59c48ae2bcadc4a7599c29cf32fd3f9b78251 + * ("mm/filemap: use head pages in generic_file_buffered_read") + * + * The page end offset calculation in filemap_get_read_batch() was off + * by one. When a read is submitted with end offset 1048575, then it + * calculates the end page for read of 256 where it should be 255. This + * results in the readpage() for the page with index 256 is over stripe + * boundary and may not covered by a DLM extent lock. + * + * This happens in a corner race case: filemap_get_read_batch() adds + * the page with index 256 for read which is not in the current read + * I/O context, and this page is being invalidated and will be removed + * from page cache due to the lock protected it being revoken. This + * results in this page in the read path not covered by any DLM lock. + * + * The solution is simple. Check whether the page was truncated in + * ->readpage(). If so, just return AOP_TRUNCATED_PAGE to the upper + * caller. Then the kernel will retry to batch pages, and it will not + * add the truncated page into batches as it was removed from page + * cache of the file. 
+ */ + if (vmpage->mapping != inode->i_mapping) { + unlock_page(vmpage); + return AOP_TRUNCATED_PAGE; + } + lcc = ll_cl_find(inode); if (lcc) { env = lcc->lcc_env; diff --git a/fs/lustre/llite/rw26.c b/fs/lustre/llite/rw26.c index cadded4..6700717 100644 --- a/fs/lustre/llite/rw26.c +++ b/fs/lustre/llite/rw26.c @@ -96,6 +96,13 @@ static void ll_invalidatepage(struct page *vmpage, unsigned int offset, } cl_env_percpu_put(env); } + + if (OBD_FAIL_PRECHECK(OBD_FAIL_LLITE_PAGE_INVALIDATE_PAUSE)) { + unlock_page(vmpage); + OBD_FAIL_TIMEOUT(OBD_FAIL_LLITE_PAGE_INVALIDATE_PAUSE, + cfs_fail_val); + lock_page(vmpage); + } } static int ll_releasepage(struct page *vmpage, gfp_t gfp_mask) From patchwork Sun Apr 9 12:12:51 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 13205952 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from pdx1-mailman-customer002.dreamhost.com (listserver-buz.dreamhost.com [69.163.136.29]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 0ED5BC77B70 for ; Sun, 9 Apr 2023 12:30:26 +0000 (UTC) Received: from pdx1-mailman-customer002.dreamhost.com (localhost [127.0.0.1]) by pdx1-mailman-customer002.dreamhost.com (Postfix) with ESMTP id 4PvWMF2Y7Wz21Jq; Sun, 9 Apr 2023 05:17:29 -0700 (PDT) Received: from smtp4.ccs.ornl.gov (smtp4.ccs.ornl.gov [160.91.203.40]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by pdx1-mailman-customer002.dreamhost.com (Postfix) with ESMTPS id 4PvWHM08yZz1yCN for ; Sun, 9 Apr 2023 05:14:07 -0700 (PDT) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp4.ccs.ornl.gov (Postfix) with ESMTP id EA1481008279; Sun, 9 Apr 2023 08:13:27 -0400 (EDT) Received: by star.ccs.ornl.gov 
(Postfix, from userid 2004) id E8D392AB; Sun, 9 Apr 2023 08:13:27 -0400 (EDT) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Sun, 9 Apr 2023 08:12:51 -0400 Message-Id: <1681042400-15491-12-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1681042400-15491-1-git-send-email-jsimmons@infradead.org> References: <1681042400-15491-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 11/40] lnet: o2iblnd: Fix key mismatch issue X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.39 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Cyril Bordage , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Cyril Bordage If a pool memory region (mr) is mapped then unmapped without being used, its key becomes out of sync with the RDMA subsystem. At pool mr map time, the present code will create a local invalidate work request (wr) using the mr's present key and then change the mr's key. When the mr is first used after being mapped, the local invalidate wr will invalidate the original mr key, and then a fast register wr is used with the modified key. The fast register will update the RDMA subsystem's key for the mr. The error occurs when the mr is never used. The next time the mr is mapped, a local invalidate wr will again be created, but this time it will use the mr's modified key. The RDMA subsystem never saw the original local invalidate, so now the RDMA subsystem's key for the mr and o2iblnd's key for the mr are out of sync. Fix the issue by tracking if the invalidate has been used. Repurpose the boolean frd->frd_valid. Presently, frd_valid is always false. Remove the code that used frd_valid to conditionally split the invalidate from the fast register. 
Instead, use frd_valid to indicate when a new invalidate needs to be generated. After a post, evaluate if the invalidate was successfully used in the post. These changes are only meaningful to the FRWR code path. The failure has only been observed when using Omni-Path Architecture. WC-bug-id: https://jira.whamcloud.com/browse/LU-16349 Lustre-commit: 0c93919f1375ce16d ("LU-16349 o2iblnd: Fix key mismatch issue") Signed-off-by: Cyril Bordage Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/49714 Reviewed-by: Serguei Smirnov Reviewed-by: Amir Shehata Reviewed-by: Frank Sehr Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- net/lnet/klnds/o2iblnd/o2iblnd.c | 5 +++-- net/lnet/klnds/o2iblnd/o2iblnd_cb.c | 17 +++++++++++------ 2 files changed, 14 insertions(+), 8 deletions(-) diff --git a/net/lnet/klnds/o2iblnd/o2iblnd.c b/net/lnet/klnds/o2iblnd/o2iblnd.c index c1dfbe5..a7a3c79 100644 --- a/net/lnet/klnds/o2iblnd/o2iblnd.c +++ b/net/lnet/klnds/o2iblnd/o2iblnd.c @@ -1584,7 +1584,8 @@ static int kiblnd_alloc_freg_pool(struct kib_fmr_poolset *fps, goto out_middle; } - frd->frd_valid = true; + /* indicate that the local invalidate needs to be generated */ + frd->frd_valid = false; list_add_tail(&frd->frd_list, &fpo->fast_reg.fpo_pool_list); fpo->fast_reg.fpo_pool_size++; @@ -1738,7 +1739,6 @@ void kiblnd_fmr_pool_unmap(struct kib_fmr *fmr, int status) fps = fpo->fpo_owner; if (frd) { - frd->frd_valid = false; frd->frd_posted = false; fmr->fmr_frd = NULL; spin_lock(&fps->fps_lock); @@ -1800,6 +1800,7 @@ int kiblnd_fmr_pool_map(struct kib_fmr_poolset *fps, struct kib_tx *tx, u32 key = is_rx ? 
mr->rkey : mr->lkey; struct ib_send_wr *inv_wr; + frd->frd_valid = true; inv_wr = &frd->frd_inv_wr; memset(inv_wr, 0, sizeof(*inv_wr)); inv_wr->opcode = IB_WR_LOCAL_INV; diff --git a/net/lnet/klnds/o2iblnd/o2iblnd_cb.c b/net/lnet/klnds/o2iblnd/o2iblnd_cb.c index 6fc1730..5596fd6b 100644 --- a/net/lnet/klnds/o2iblnd/o2iblnd_cb.c +++ b/net/lnet/klnds/o2iblnd/o2iblnd_cb.c @@ -847,12 +847,8 @@ static int kiblnd_map_tx(struct lnet_ni *ni, struct kib_tx *tx, struct ib_send_wr *wrq = &tx->tx_wrq[0].wr; if (frd && !frd->frd_posted) { - if (!frd->frd_valid) { - wrq = &frd->frd_inv_wr; - wrq->next = &frd->frd_fastreg_wr.wr; - } else { - wrq = &frd->frd_fastreg_wr.wr; - } + wrq = &frd->frd_inv_wr; + wrq->next = &frd->frd_fastreg_wr.wr; frd->frd_fastreg_wr.wr.next = &tx->tx_wrq[0].wr; } @@ -866,6 +862,15 @@ static int kiblnd_map_tx(struct lnet_ni *ni, struct kib_tx *tx, rc = -EINVAL; else rc = ib_post_send(conn->ibc_cmid->qp, wrq, &bad); + + if (frd && !frd->frd_posted) { + /* The local invalidate becomes invalid (has been + * successfully used) if the post succeeds or the + * failing wr was not the invalidate. 
+ */ + frd->frd_valid = + !(rc == 0 || (bad != &frd->frd_inv_wr)); + } } conn->ibc_last_send = ktime_get(); From patchwork Sun Apr 9 12:12:52 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 13205945 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from pdx1-mailman-customer002.dreamhost.com (listserver-buz.dreamhost.com [69.163.136.29]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 5A1AEC77B70 for ; Sun, 9 Apr 2023 12:24:42 +0000 (UTC) Received: from pdx1-mailman-customer002.dreamhost.com (localhost [127.0.0.1]) by pdx1-mailman-customer002.dreamhost.com (Postfix) with ESMTP id 4PvWJr6Mpmz1yBV; Sun, 9 Apr 2023 05:15:24 -0700 (PDT) Received: from smtp4.ccs.ornl.gov (smtp4.ccs.ornl.gov [160.91.203.40]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by pdx1-mailman-customer002.dreamhost.com (Postfix) with ESMTPS id 4PvWHN6zX6z1yCk for ; Sun, 9 Apr 2023 05:14:08 -0700 (PDT) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp4.ccs.ornl.gov (Postfix) with ESMTP id EE9C8100827A; Sun, 9 Apr 2023 08:13:27 -0400 (EDT) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id ED5C02B2; Sun, 9 Apr 2023 08:13:27 -0400 (EDT) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Sun, 9 Apr 2023 08:12:52 -0400 Message-Id: <1681042400-15491-13-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1681042400-15491-1-git-send-email-jsimmons@infradead.org> References: <1681042400-15491-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 12/40] lustre: sec: fid2path for encrypted files X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.39 
Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Sebastien Buisson Add support for fid2path on encrypted files. The server side returns the raw encrypted path name to the client, which needs to process the returned string. This is done from top to bottom, by iteratively decrypting a parent name and then doing a lookup on it, so that the child can in turn be decrypted. For encrypted files that do not have their names encrypted, the lookups can be skipped: name decryption is a no-op in this case, so it is not necessary to fetch the encryption key associated with the parent inode. Without the encryption key, lookups are skipped for the same reason, but names still have to be encoded and/or digested. So the server needs to insert the FIDs of the individual path components into the returned string; the client interprets these FIDs to build the encoded/digested names.
WC-bug-id: https://jira.whamcloud.com/browse/LU-16205 Lustre-commit: fa9da556ad22b1485 ("LU-16205 sec: fid2path for encrypted files") Signed-off-by: Sebastien Buisson Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/48930 Reviewed-by: Andreas Dilger Reviewed-by: jsimmons Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/include/lustre_export.h | 5 ++ fs/lustre/llite/file.c | 160 +++++++++++++++++++++++++++++++++++++- fs/lustre/llite/llite_internal.h | 17 ++++ fs/lustre/llite/llite_lib.c | 1 + fs/lustre/lmv/lmv_obd.c | 38 +++++++-- fs/lustre/mdc/mdc_request.c | 10 +-- 6 files changed, 214 insertions(+), 17 deletions(-) diff --git a/fs/lustre/include/lustre_export.h b/fs/lustre/include/lustre_export.h index 6a59e6c..59f1dea 100644 --- a/fs/lustre/include/lustre_export.h +++ b/fs/lustre/include/lustre_export.h @@ -284,6 +284,11 @@ static inline int exp_connect_encrypt(struct obd_export *exp) return !!(exp_connect_flags2(exp) & OBD_CONNECT2_ENCRYPT); } +static inline int exp_connect_encrypt_fid2path(struct obd_export *exp) +{ + return !!(exp_connect_flags2(exp) & OBD_CONNECT2_ENCRYPT_FID2PATH); +} + static inline int exp_connect_lseek(struct obd_export *exp) { return !!(exp_connect_flags2(exp) & OBD_CONNECT2_LSEEK); diff --git a/fs/lustre/llite/file.c b/fs/lustre/llite/file.c index aa9c5da..668d544 100644 --- a/fs/lustre/llite/file.c +++ b/fs/lustre/llite/file.c @@ -2744,12 +2744,146 @@ static int ll_do_fiemap(struct inode *inode, struct fiemap *fiemap, return rc; } +static int fid2path_for_enc_file(struct inode *parent, char *gfpath, + u32 gfpathlen) +{ + struct dentry *de = NULL, *de_parent = d_find_any_alias(parent); + struct fscrypt_str lltr = FSTR_INIT(NULL, 0); + struct fscrypt_str de_name; + char *p, *ptr = gfpath; + size_t len = 0, len_orig = 0; + int enckey = -1, nameenc = -1; + int rc = 0; + + gfpath++; + while ((p = strsep(&gfpath, "/")) != NULL) { + struct lu_fid fid; + + de = NULL; + if (!*p) { + dput(de_parent); + break; + 
} + len_orig = strlen(p); + + rc = sscanf(p, "["SFID"]", RFID(&fid)); + if (rc == 3) + p = strchr(p, ']') + 1; + else + fid_zero(&fid); + rc = 0; + len = strlen(p); + + if (!IS_ENCRYPTED(parent)) { + if (gfpathlen < len + 1) { + dput(de_parent); + rc = -EOVERFLOW; + break; + } + memmove(ptr, p, len); + p = ptr; + ptr += len; + *(ptr++) = '/'; + gfpathlen -= len + 1; + goto lookup; + } + + /* From here, we know parent is encrypted */ + if (enckey != 0) { + rc = fscrypt_get_encryption_info(parent); + if (rc && rc != -ENOKEY) { + dput(de_parent); + break; + } + } + + if (enckey == -1) { + if (fscrypt_has_encryption_key(parent)) + enckey = 1; + else + enckey = 0; + if (enckey == 1) + nameenc = true; + } + + /* Even if names are not encrypted, we still need to call + * ll_fname_disk_to_usr in order to decode names as they are + * coming from the wire. + */ + rc = fscrypt_fname_alloc_buffer(parent, NAME_MAX + 1, &lltr); + if (rc < 0) { + dput(de_parent); + break; + } + + de_name.name = p; + de_name.len = len; + rc = ll_fname_disk_to_usr(parent, 0, 0, &de_name, + &lltr, &fid); + if (rc) { + fscrypt_fname_free_buffer(&lltr); + dput(de_parent); + break; + } + lltr.name[lltr.len] = '\0'; + + if (lltr.len <= len_orig && gfpathlen >= lltr.len + 1) { + memcpy(ptr, lltr.name, lltr.len); + p = ptr; + len = lltr.len; + ptr += lltr.len; + *(ptr++) = '/'; + gfpathlen -= lltr.len + 1; + } else { + rc = -EOVERFLOW; + } + fscrypt_fname_free_buffer(&lltr); + + if (rc == -EOVERFLOW) { + dput(de_parent); + break; + } + +lookup: + if (!gfpath) { + /* We reached the end of the string, which means + * we are dealing with the last component in the path. + * So save a useless lookup and exit. 
+ */ + dput(de_parent); + break; + } + + if (enckey == 0 || nameenc == 0) + continue; + + inode_lock(parent); + de = lookup_one_len(p, de_parent, len); + inode_unlock(parent); + if (IS_ERR_OR_NULL(de) || !de->d_inode) { + dput(de_parent); + rc = -ENODATA; + break; + } + + parent = de->d_inode; + dput(de_parent); + de_parent = de; + } + + if (len) + *(ptr - 1) = '\0'; + if (!IS_ERR_OR_NULL(de)) + dput(de); + return rc; +} + int ll_fid2path(struct inode *inode, void __user *arg) { struct obd_export *exp = ll_i2mdexp(inode); const struct getinfo_fid2path __user *gfin = arg; struct getinfo_fid2path *gfout; - u32 pathlen; + u32 pathlen, pathlen_orig; size_t outsize; int rc; @@ -2763,7 +2897,9 @@ int ll_fid2path(struct inode *inode, void __user *arg) if (pathlen > PATH_MAX) return -EINVAL; + pathlen_orig = pathlen; +gf_alloc: outsize = sizeof(*gfout) + pathlen; gfout = kzalloc(outsize, GFP_KERNEL); @@ -2781,17 +2917,37 @@ int ll_fid2path(struct inode *inode, void __user *arg) * old server without fileset mount support will ignore this. 
*/ *gfout->gf_root_fid = *ll_inode2fid(inode); + gfout->gf_pathlen = pathlen; /* Call mdc_iocontrol */ rc = obd_iocontrol(OBD_IOC_FID2PATH, exp, outsize, gfout, NULL); if (rc != 0) goto gf_free; - if (copy_to_user(arg, gfout, outsize)) + if (gfout->gf_pathlen && gfout->gf_path[0] == '/') { + /* by convention, server side (mdt_path_current()) puts + * a leading '/' to tell client that we are dealing with + * an encrypted file + */ + rc = fid2path_for_enc_file(inode, gfout->gf_path, + gfout->gf_pathlen); + if (rc) + goto gf_free; + if (strlen(gfout->gf_path) > gfin->gf_pathlen) { + rc = -EOVERFLOW; + goto gf_free; + } + } + + if (copy_to_user(arg, gfout, sizeof(*gfout) + pathlen_orig)) rc = -EFAULT; gf_free: kfree(gfout); + if (rc == -ENAMETOOLONG) { + pathlen += PATH_MAX; + goto gf_alloc; + } return rc; } diff --git a/fs/lustre/llite/llite_internal.h b/fs/lustre/llite/llite_internal.h index 1d85d0b..2223dbb 100644 --- a/fs/lustre/llite/llite_internal.h +++ b/fs/lustre/llite/llite_internal.h @@ -523,6 +523,23 @@ static inline void obd_connect_set_name_enc(struct obd_connect_data *data) #endif } +static inline bool obd_connect_has_enc_fid2path(struct obd_connect_data *data) +{ +#ifdef HAVE_LUSTRE_CRYPTO + return data->ocd_connect_flags & OBD_CONNECT_FLAGS2 && + data->ocd_connect_flags2 & OBD_CONNECT2_ENCRYPT_FID2PATH; +#else + return false; +#endif +} + +static inline void obd_connect_set_enc_fid2path(struct obd_connect_data *data) +{ +#ifdef HAVE_LUSTRE_CRYPTO + data->ocd_connect_flags2 |= OBD_CONNECT2_ENCRYPT_FID2PATH; +#endif +} + /* * Locking to guarantee consistency of non-atomic updates to long long i_size, * consistency between file size and KMS. 
diff --git a/fs/lustre/llite/llite_lib.c b/fs/lustre/llite/llite_lib.c index e48bb6c..3774ca8 100644 --- a/fs/lustre/llite/llite_lib.c +++ b/fs/lustre/llite/llite_lib.c @@ -358,6 +358,7 @@ static int client_common_fill_super(struct super_block *sb, char *md, char *dt) obd_connect_set_secctx(data); if (ll_sbi_has_encrypt(sbi)) { + obd_connect_set_enc_fid2path(data); obd_connect_set_name_enc(data); obd_connect_set_enc(data); } diff --git a/fs/lustre/lmv/lmv_obd.c b/fs/lustre/lmv/lmv_obd.c index 64d16d8..99604e8 100644 --- a/fs/lustre/lmv/lmv_obd.c +++ b/fs/lustre/lmv/lmv_obd.c @@ -551,6 +551,8 @@ static int lmv_fid2path(struct obd_export *exp, int len, void *karg, struct getinfo_fid2path *remote_gf = NULL; struct lu_fid root_fid; int remote_gf_size = 0; + int currentisenc = 0; + int globalisenc = 0; int rc; tgt = lmv_fid2tgt(lmv, &gf->gf_fid); @@ -565,11 +567,23 @@ static int lmv_fid2path(struct obd_export *exp, int len, void *karg, if (rc != 0 && rc != -EREMOTE) goto out_fid2path; + if (gf->gf_path[0] == '/') { + /* by convention, server side (mdt_path_current()) puts + * a leading '/' to tell client that we are dealing with + * an encrypted file + */ + currentisenc = 1; + globalisenc = 1; + } else { + currentisenc = 0; + } + /* If remote_gf != NULL, it means just building the * path on the remote MDT, copy this path segment to gf */ if (remote_gf) { struct getinfo_fid2path *ori_gf; + int oldisenc = 0; char *ptr; int len; @@ -581,14 +595,22 @@ static int lmv_fid2path(struct obd_export *exp, int len, void *karg, } ptr = ori_gf->gf_path; + oldisenc = ptr[0] == '/'; len = strlen(gf->gf_path); - /* move the current path to the right to release space - * for closer-to-root part - */ - memmove(ptr + len + 1, ptr, strlen(ori_gf->gf_path)); - memcpy(ptr, gf->gf_path, len); - ptr[len] = '/'; + if (len) { + /* move the current path to the right to release space + * for closer-to-root part + */ + memmove(ptr + len - currentisenc + 1 + globalisenc, + ptr + oldisenc, + 
strlen(ori_gf->gf_path) - oldisenc + 1); + if (globalisenc) + *(ptr++) = '/'; + memcpy(ptr, gf->gf_path + currentisenc, + len - currentisenc); + ptr[len - currentisenc] = '/'; + } } CDEBUG(D_INFO, "%s: get path %s " DFID " rec: %llu ln: %u\n", @@ -601,13 +623,13 @@ static int lmv_fid2path(struct obd_export *exp, int len, void *karg, /* sigh, has to go to another MDT to do path building further */ if (!remote_gf) { - remote_gf_size = sizeof(*remote_gf) + PATH_MAX; + remote_gf_size = sizeof(*remote_gf) + len - sizeof(*gf); remote_gf = kzalloc(remote_gf_size, GFP_NOFS); if (!remote_gf) { rc = -ENOMEM; goto out_fid2path; } - remote_gf->gf_pathlen = PATH_MAX; + remote_gf->gf_pathlen = len - sizeof(*gf); } if (!fid_is_sane(&gf->gf_fid)) { diff --git a/fs/lustre/mdc/mdc_request.c b/fs/lustre/mdc/mdc_request.c index 643b6ee..58ea982 100644 --- a/fs/lustre/mdc/mdc_request.c +++ b/fs/lustre/mdc/mdc_request.c @@ -1707,8 +1707,6 @@ static int mdc_ioc_fid2path(struct obd_export *exp, struct getinfo_fid2path *gf) void *key; int rc; - if (gf->gf_pathlen > PATH_MAX) - return -ENAMETOOLONG; if (gf->gf_pathlen < 2) return -EOVERFLOW; @@ -1746,12 +1744,10 @@ static int mdc_ioc_fid2path(struct obd_export *exp, struct getinfo_fid2path *gf) goto out; } - CDEBUG(D_IOCTL, "path got " DFID " from %llu #%d: %s\n", + CDEBUG(D_IOCTL, "path got " DFID " from %llu #%d: %.*s\n", PFID(&gf->gf_fid), gf->gf_recno, gf->gf_linkno, - gf->gf_pathlen < 512 ? 
gf->gf_path : - /* only log the last 512 characters of the path */ - gf->gf_path + gf->gf_pathlen - 512); - + /* only log the first 512 characters of the path */ + 512, gf->gf_path); out: kfree(key); return rc; From patchwork Sun Apr 9 12:12:53 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 13205953 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from pdx1-mailman-customer002.dreamhost.com (listserver-buz.dreamhost.com [69.163.136.29]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 7C167C77B61 for ; Sun, 9 Apr 2023 12:30:40 +0000 (UTC) Received: from pdx1-mailman-customer002.dreamhost.com (localhost [127.0.0.1]) by pdx1-mailman-customer002.dreamhost.com (Postfix) with ESMTP id 4PvWMq3XkKz226P; Sun, 9 Apr 2023 05:17:59 -0700 (PDT) Received: from smtp4.ccs.ornl.gov (smtp4.ccs.ornl.gov [160.91.203.40]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by pdx1-mailman-customer002.dreamhost.com (Postfix) with ESMTPS id 4PvWHS4TCPz1yDK for ; Sun, 9 Apr 2023 05:14:12 -0700 (PDT) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp4.ccs.ornl.gov (Postfix) with ESMTP id F30BF100827B; Sun, 9 Apr 2023 08:13:27 -0400 (EDT) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id F1CD42B3; Sun, 9 Apr 2023 08:13:27 -0400 (EDT) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Sun, 9 Apr 2023 08:12:53 -0400 Message-Id: <1681042400-15491-14-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1681042400-15491-1-git-send-email-jsimmons@infradead.org> References: <1681042400-15491-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 13/40] lustre: sec: 
Lustre/HSM on enc file with enc key X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.39 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Sebastien Buisson Support for Lustre/HSM on encrypted files when the encryption key is available requires similar attention as with file migration. The volatile file used for HSM restore must have the same encryption context as the Lustre file being restored, so that file content remains accessible after the layout swap at the end of the restore procedure. Please note that using Lustre/HSM with the encryption key creates clear text copies of encrypted files on the HSM backend storage. WC-bug-id: https://jira.whamcloud.com/browse/LU-16310 Lustre-commit: df7a8d92d2378e236 ("LU-16310 sec: Lustre/HSM on enc file with enc key") Signed-off-by: Sebastien Buisson Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/49153 Reviewed-by: Oleg Drokin Reviewed-by: jsimmons Reviewed-by: Andreas Dilger Reviewed-by: Etienne AUJAMES Signed-off-by: James Simmons --- fs/lustre/llite/crypto.c | 11 ++++++++++- 1 file changed, 10 insertions(+), 1 deletion(-) diff --git a/fs/lustre/llite/crypto.c b/fs/lustre/llite/crypto.c index 5fb7f4d..61b85c8 100644 --- a/fs/lustre/llite/crypto.c +++ b/fs/lustre/llite/crypto.c @@ -246,7 +246,16 @@ int ll_setup_filename(struct inode *dir, const struct qstr *iname, fid->f_oid = 0; fid->f_ver = 0; } - rc = fscrypt_setup_filename(dir, &dname, lookup, fname); + if (unlikely(filename_is_volatile(iname->name, + iname->len, NULL))) { + /* keep volatile name as-is, matters for server side */ + memset(fname, 0, sizeof(struct fscrypt_name)); + fname->disk_name.name = (unsigned char *)iname->name; + fname->disk_name.len = iname->len; + rc = 0; + } else { + rc = 
fscrypt_setup_filename(dir, &dname, lookup, fname); + } if (rc == -ENOENT && lookup) { if (((is_root_inode(dir) && iname->len == strlen(dot_fscrypt_name) && From patchwork Sun Apr 9 12:12:54 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 13205947 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from pdx1-mailman-customer002.dreamhost.com (listserver-buz.dreamhost.com [69.163.136.29]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id CF506C77B70 for ; Sun, 9 Apr 2023 12:26:17 +0000 (UTC) Received: from pdx1-mailman-customer002.dreamhost.com (localhost [127.0.0.1]) by pdx1-mailman-customer002.dreamhost.com (Postfix) with ESMTP id 4PvWKT2Rw5z1yBt; Sun, 9 Apr 2023 05:15:57 -0700 (PDT) Received: from smtp4.ccs.ornl.gov (smtp4.ccs.ornl.gov [160.91.203.40]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by pdx1-mailman-customer002.dreamhost.com (Postfix) with ESMTPS id 4PvWHl4d4Mz1yDc for ; Sun, 9 Apr 2023 05:14:26 -0700 (PDT) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp4.ccs.ornl.gov (Postfix) with ESMTP id 034E1100827C; Sun, 9 Apr 2023 08:13:28 -0400 (EDT) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 0229D2AB; Sun, 9 Apr 2023 08:13:28 -0400 (EDT) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Sun, 9 Apr 2023 08:12:54 -0400 Message-Id: <1681042400-15491-15-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1681042400-15491-1-git-send-email-jsimmons@infradead.org> References: <1681042400-15491-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 14/40] lustre: llite: check read page past requested X-BeenThere: 
lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.39 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Qian Yingjin Due to a kernel bug introduced in 5.12 in commit: cbd59c48ae2bcadc4a7599c29cf32fd3f9b78251 ("mm/filemap: use head pages in generic_file_buffered_read") if the page immediately after the current read is in cache, the kernel will try to read it. This attempts to read a page past the end of requested read from userspace, and so has not been safely locked by Lustre. For a page after the end of the current read, check whether it is under the protection of a DLM lock. If so, we take a reference on the DLM lock until the page read has finished and then release the reference. If the page is not covered by a DLM lock, then we are racing with the page being removed from Lustre. In that case, we return AOP_TRUNCATED_PAGE, which makes the kernel release its reference on the page and retry the page read. This allows the page to be removed from cache, so the kernel will not find it and incorrectly attempt to read it again. NB: Earlier versions of this description refer to stripe boundaries, but the locking issue can occur whether or not the page is on a stripe boundary, because dlmlocks can cover part of a stripe. (This is rare, but is allowed.) 
WC-bug-id: https://jira.whamcloud.com/browse/LU-16412 Lustre-commit: 2f8f38effac3a9519 ("LU-16412 llite: check read page past requested") Signed-off-by: Qian Yingjin Signed-off-by: Patrick Farrell Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/49723 Reviewed-by: Zhenyu Xu Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/llite/llite_internal.h | 2 ++ fs/lustre/llite/rw.c | 58 +++++++++++++++++++++++++++++++++++++--- fs/lustre/llite/vvp_io.c | 10 +++++-- 3 files changed, 65 insertions(+), 5 deletions(-) diff --git a/fs/lustre/llite/llite_internal.h b/fs/lustre/llite/llite_internal.h index 2223dbb..970b144 100644 --- a/fs/lustre/llite/llite_internal.h +++ b/fs/lustre/llite/llite_internal.h @@ -1371,6 +1371,8 @@ struct ll_cl_context { struct cl_io *lcc_io; struct cl_page *lcc_page; enum lcc_type lcc_type; + struct kiocb *lcc_iocb; + struct iov_iter *lcc_iter; }; struct ll_thread_info { diff --git a/fs/lustre/llite/rw.c b/fs/lustre/llite/rw.c index dea2af1..d285ae1 100644 --- a/fs/lustre/llite/rw.c +++ b/fs/lustre/llite/rw.c @@ -1858,11 +1858,14 @@ int ll_readpage(struct file *file, struct page *vmpage) { struct inode *inode = file_inode(file); struct cl_object *clob = ll_i2info(inode)->lli_clob; - struct ll_cl_context *lcc; + struct ll_sb_info *sbi = ll_i2sbi(inode); const struct lu_env *env = NULL; + struct cl_read_ahead ra = { 0 }; + struct ll_cl_context *lcc; struct cl_io *io = NULL; + struct iov_iter *iter; struct cl_page *page; - struct ll_sb_info *sbi = ll_i2sbi(inode); + struct kiocb *iocb; int result; if (OBD_FAIL_PRECHECK(OBD_FAIL_LLITE_READPAGE_PAUSE)) { @@ -1911,6 +1914,8 @@ int ll_readpage(struct file *file, struct page *vmpage) struct ll_readahead_state *ras = &fd->fd_ras; struct lu_env *local_env = NULL; + CDEBUG(D_VFSTRACE, "fast read pgno: %ld\n", vmpage->index); + result = -ENODATA; /* @@ -1968,6 +1973,47 @@ int ll_readpage(struct file *file, struct page *vmpage) return result; } + if (lcc && lcc->lcc_type != 
LCC_MMAP) { + iocb = lcc->lcc_iocb; + iter = lcc->lcc_iter; + + CDEBUG(D_VFSTRACE, "pgno:%ld, cnt:%ld, pos:%lld\n", + vmpage->index, iter->count, iocb->ki_pos); + + /* + * This handles a kernel bug introduced in kernel 5.12: + * commit: cbd59c48ae2bcadc4a7599c29cf32fd3f9b78251 + * ("mm/filemap: use head pages in generic_file_buffered_read") + * + * See above in this function for a full description of the + * bug. Briefly, the kernel will try to read 1 more page than + * was actually requested *if that page is already in cache*. + * + * Because this page is beyond the boundary of the requested + * read, Lustre does not lock it as part of the read. This + * means we must check if there is a valid dlmlock on this + * page and reference it before we attempt to read in the + * page. If there is not a valid dlmlock, then we are racing + * with dlmlock cancellation and the page is being removed + * from the cache. + * + * That means we should return AOP_TRUNCATED_PAGE, which will + * cause the kernel to retry the read, which should allow the + * page to be removed from cache as the lock is cancelled. + * + * This should never occur except in kernels with the bug + * mentioned above.
+ */ + if (cl_offset(clob, vmpage->index) >= iter->count + iocb->ki_pos) { + result = cl_io_read_ahead(env, io, vmpage->index, &ra); + if (result < 0 || vmpage->index > ra.cra_end_idx) { + cl_read_ahead_release(env, &ra); + unlock_page(vmpage); + return AOP_TRUNCATED_PAGE; + } + } + } + /** * Direct read can fall back to buffered read, but DIO is done * with lockless i/o, and buffered requires LDLM locking, so in @@ -1979,7 +2025,8 @@ int ll_readpage(struct file *file, struct page *vmpage) unlock_page(vmpage); io->ci_dio_lock = 1; io->ci_need_restart = 1; - return -ENOLCK; + result = -ENOLCK; + goto out; } page = cl_page_find(env, clob, vmpage->index, vmpage, CPT_CACHEABLE); @@ -1999,5 +2046,10 @@ int ll_readpage(struct file *file, struct page *vmpage) unlock_page(vmpage); result = PTR_ERR(page); } + +out: + if (ra.cra_release) + cl_read_ahead_release(env, &ra); + return result; } diff --git a/fs/lustre/llite/vvp_io.c b/fs/lustre/llite/vvp_io.c index eacb35b..2da74a2 100644 --- a/fs/lustre/llite/vvp_io.c +++ b/fs/lustre/llite/vvp_io.c @@ -806,6 +806,7 @@ static int vvp_io_read_start(const struct lu_env *env, loff_t pos = io->u.ci_rd.rd.crw_pos; size_t cnt = io->u.ci_rd.rd.crw_count; size_t tot = vio->vui_tot_count; + struct ll_cl_context *lcc; int exceed = 0; int result; struct iov_iter iter; @@ -868,9 +869,14 @@ static int vvp_io_read_start(const struct lu_env *env, file_accessed(file); LASSERT(vio->vui_iocb->ki_pos == pos); iter = *vio->vui_iter; - result = generic_file_read_iter(vio->vui_iocb, &iter); - goto out; + lcc = ll_cl_find(inode); + lcc->lcc_iter = &iter; + lcc->lcc_iocb = vio->vui_iocb; + CDEBUG(D_VFSTRACE, "cnt:%ld,iocb pos:%lld\n", lcc->lcc_iter->count, + lcc->lcc_iocb->ki_pos); + + result = generic_file_read_iter(vio->vui_iocb, &iter); out: if (result >= 0) { if (result < cnt) From patchwork Sun Apr 9 12:12:55 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons 
X-Patchwork-Id: 13205949 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from pdx1-mailman-customer002.dreamhost.com (listserver-buz.dreamhost.com [69.163.136.29]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 1F937C77B70 for ; Sun, 9 Apr 2023 12:28:27 +0000 (UTC) Received: from pdx1-mailman-customer002.dreamhost.com (localhost [127.0.0.1]) by pdx1-mailman-customer002.dreamhost.com (Postfix) with ESMTP id 4PvWLX46LSz21HH; Sun, 9 Apr 2023 05:16:52 -0700 (PDT) Received: from smtp4.ccs.ornl.gov (smtp4.ccs.ornl.gov [160.91.203.40]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by pdx1-mailman-customer002.dreamhost.com (Postfix) with ESMTPS id 4PvWJ66wCqz1yFH for ; Sun, 9 Apr 2023 05:14:46 -0700 (PDT) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp4.ccs.ornl.gov (Postfix) with ESMTP id 082F1100827D; Sun, 9 Apr 2023 08:13:28 -0400 (EDT) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 06E102B2; Sun, 9 Apr 2023 08:13:28 -0400 (EDT) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Sun, 9 Apr 2023 08:12:55 -0400 Message-Id: <1681042400-15491-16-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1681042400-15491-1-git-send-email-jsimmons@infradead.org> References: <1681042400-15491-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 15/40] lustre: llite: fix relatime support X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.39 Precedence: list List-Id: "For discussing Lustre software development." 
List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Aurelien Degremont relatime behavior is properly managed by VFS, however Lustre also stores acmtime on OST objects and atime updates for OST objects should honor relatime behavior. This patch updates 'ci_noatime' feature which was introduced to properly honor noatime option for OST objects, to also support 'relatime'. file_is_noatime() code already comes from upstream touch_atime(). Add missing parts from touch_atime() to also support relatime. It also forces atime to disk on MDD if ondisk atime is older than ondisk mtime/ctime to match relatime (even if relatime is not enabled) WC-bug-id: https://jira.whamcloud.com/browse/LU-15728 Lustre-commit: c10c6eeb37dd55316 ("LU-15728 llite: fix relatime support") Signed-off-by: Aurelien Degremont Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/47017 Reviewed-by: Shaun Tancheff Reviewed-by: Oleg Drokin Reviewed-by: Yang Sheng Reviewed-by: Andreas Dilger Signed-off-by: James Simmons --- fs/lustre/llite/file.c | 45 ++++++++++++++++++++++++++++++++++++++++++--- 1 file changed, 42 insertions(+), 3 deletions(-) diff --git a/fs/lustre/llite/file.c b/fs/lustre/llite/file.c index 668d544..18f3302 100644 --- a/fs/lustre/llite/file.c +++ b/fs/lustre/llite/file.c @@ -1541,12 +1541,46 @@ void ll_io_set_mirror(struct cl_io *io, const struct file *file) file->f_path.dentry->d_name.name, io->ci_designated_mirror); } +/* + * This is relatime_need_update() from Linux 5.17, which is not exported. + */ +static int relatime_need_update(struct vfsmount *mnt, struct inode *inode, + struct timespec64 now) +{ + if (!(mnt->mnt_flags & MNT_RELATIME)) + return 1; + /* + * Is mtime younger than atime? If yes, update atime: + */ + if (timespec64_compare(&inode->i_mtime, &inode->i_atime) >= 0) + return 1; + /* + * Is ctime younger than atime? 
If yes, update atime: + */ + if (timespec64_compare(&inode->i_ctime, &inode->i_atime) >= 0) + return 1; + + /* + * Is the previous atime value older than a day? If yes, + * update atime: + */ + if ((long)(now.tv_sec - inode->i_atime.tv_sec) >= 24*60*60) + return 1; + /* + * Good, we can skip the atime update: + */ + return 0; +} + +/* + * Very similar to kernel function: !__atime_needs_update() + */ static bool file_is_noatime(const struct file *file) { - const struct vfsmount *mnt = file->f_path.mnt; - const struct inode *inode = file_inode(file); + struct vfsmount *mnt = file->f_path.mnt; + struct inode *inode = file_inode(file); + struct timespec64 now; - /* Adapted from file_accessed() and touch_atime().*/ if (file->f_flags & O_NOATIME) return true; @@ -1565,6 +1599,11 @@ static bool file_is_noatime(const struct file *file) if ((inode->i_sb->s_flags & SB_NODIRATIME) && S_ISDIR(inode->i_mode)) return true; + now = current_time(inode); + + if (!relatime_need_update(mnt, inode, now)) + return true; + return false; } From patchwork Sun Apr 9 12:12:56 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 13205951 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from pdx1-mailman-customer002.dreamhost.com (listserver-buz.dreamhost.com [69.163.136.29]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 5632BC77B70 for ; Sun, 9 Apr 2023 12:30:19 +0000 (UTC) Received: from pdx1-mailman-customer002.dreamhost.com (localhost [127.0.0.1]) by pdx1-mailman-customer002.dreamhost.com (Postfix) with ESMTP id 4PvWMB3SWnz21JY; Sun, 9 Apr 2023 05:17:26 -0700 (PDT) Received: from smtp4.ccs.ornl.gov (smtp4.ccs.ornl.gov [160.91.203.40]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client 
certificate requested) by pdx1-mailman-customer002.dreamhost.com (Postfix) with ESMTPS id 4PvWJR0HG3z1yGG for ; Sun, 9 Apr 2023 05:15:02 -0700 (PDT) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp4.ccs.ornl.gov (Postfix) with ESMTP id 0C964100827E; Sun, 9 Apr 2023 08:13:28 -0400 (EDT) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 0B5C42B3; Sun, 9 Apr 2023 08:13:28 -0400 (EDT) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Sun, 9 Apr 2023 08:12:56 -0400 Message-Id: <1681042400-15491-17-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1681042400-15491-1-git-send-email-jsimmons@infradead.org> References: <1681042400-15491-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 16/40] lustre: ptlrpc: clarify AT error message X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.39 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Aurelien Degremont Clarify the error message emitted when the deadline for AT early replies has passed. It claimed the system was CPU bound, which is wrong most of the time: the issue is usually a communication failure delaying RPC traffic. The old wording could mislead admins into investigating CPU resource consumption when network traffic is the more likely cause. Also use less cryptic wording that makes sense to administrators, not only to the feature's developer.
WC-bug-id: https://jira.whamcloud.com/browse/LU-930 Lustre-commit: 9ce04000fba07706c ("LU-930 ptlrpc: clarify AT error message") Signed-off-by: Aurelien Degremont Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/49548 Reviewed-by: Andreas Dilger Reviewed-by: Yang Sheng Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/ptlrpc/service.c | 11 +++++------ 1 file changed, 5 insertions(+), 6 deletions(-) diff --git a/fs/lustre/ptlrpc/service.c b/fs/lustre/ptlrpc/service.c index aaf7529..bf76272 100644 --- a/fs/lustre/ptlrpc/service.c +++ b/fs/lustre/ptlrpc/service.c @@ -1303,12 +1303,11 @@ static void ptlrpc_at_check_timed(struct ptlrpc_service_part *svcpt) * We're already past request deadlines before we even get a * chance to send early replies */ - LCONSOLE_WARN("%s: This server is not able to keep up with request traffic (cpu-bound).\n", - svcpt->scp_service->srv_name); - CWARN("earlyQ=%d reqQ=%d recA=%d, svcEst=%d, delay=%lldms\n", - counter, svcpt->scp_nreqs_incoming, - svcpt->scp_nreqs_active, - at_get(&svcpt->scp_at_estimate), delay_ms); + LCONSOLE_WARN("'%s' is processing requests too slowly, client may timeout. 
Late by %ds, missed %d early replies (reqs waiting=%d active=%d, at_estimate=%d, delay=%lldms)\n", + svcpt->scp_service->srv_name, -first, counter, + svcpt->scp_nreqs_incoming, + svcpt->scp_nreqs_active, + at_get(&svcpt->scp_at_estimate), delay_ms); } /* From patchwork Sun Apr 9 12:12:57 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 13205956 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from pdx1-mailman-customer002.dreamhost.com (listserver-buz.dreamhost.com [69.163.136.29]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 6411EC77B70 for ; Sun, 9 Apr 2023 12:34:23 +0000 (UTC) Received: from pdx1-mailman-customer002.dreamhost.com (localhost [127.0.0.1]) by pdx1-mailman-customer002.dreamhost.com (Postfix) with ESMTP id 4PvWNc04x1z22Pn; Sun, 9 Apr 2023 05:18:39 -0700 (PDT) Received: from smtp4.ccs.ornl.gov (smtp4.ccs.ornl.gov [160.91.203.40]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by pdx1-mailman-customer002.dreamhost.com (Postfix) with ESMTPS id 4PvWJc06d6z215Y for ; Sun, 9 Apr 2023 05:15:11 -0700 (PDT) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp4.ccs.ornl.gov (Postfix) with ESMTP id 10DB1100827F; Sun, 9 Apr 2023 08:13:28 -0400 (EDT) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 0FCE72AB; Sun, 9 Apr 2023 08:13:28 -0400 (EDT) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Sun, 9 Apr 2023 08:12:57 -0400 Message-Id: <1681042400-15491-18-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1681042400-15491-1-git-send-email-jsimmons@infradead.org> References: <1681042400-15491-1-git-send-email-jsimmons@infradead.org> Subject: 
[lustre-devel] [PATCH 17/40] lustre: update version to 2.15.54 X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.39 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Oleg Drokin New tag 2.15.54 Signed-off-by: Oleg Drokin Signed-off-by: James Simmons --- include/uapi/linux/lustre/lustre_ver.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/include/uapi/linux/lustre/lustre_ver.h b/include/uapi/linux/lustre/lustre_ver.h index 96267428..bc7a49c 100644 --- a/include/uapi/linux/lustre/lustre_ver.h +++ b/include/uapi/linux/lustre/lustre_ver.h @@ -3,9 +3,9 @@ #define LUSTRE_MAJOR 2 #define LUSTRE_MINOR 15 -#define LUSTRE_PATCH 53 +#define LUSTRE_PATCH 54 #define LUSTRE_FIX 0 -#define LUSTRE_VERSION_STRING "2.15.53" +#define LUSTRE_VERSION_STRING "2.15.54" #define OBD_OCD_VERSION(major, minor, patch, fix) \ (((major) << 24) + ((minor) << 16) + ((patch) << 8) + (fix)) From patchwork Sun Apr 9 12:12:58 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 13205958 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from pdx1-mailman-customer002.dreamhost.com (listserver-buz.dreamhost.com [69.163.136.29]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id B2EC6C77B61 for ; Sun, 9 Apr 2023 12:35:03 +0000 (UTC) Received: from pdx1-mailman-customer002.dreamhost.com (localhost [127.0.0.1]) by pdx1-mailman-customer002.dreamhost.com (Postfix) with ESMTP id 4PvWP30GBSz1yDy; Sun, 9 Apr 2023 05:19:03 -0700 (PDT) Received: from smtp4.ccs.ornl.gov 
(smtp4.ccs.ornl.gov [160.91.203.40]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by pdx1-mailman-customer002.dreamhost.com (Postfix) with ESMTPS id 4PvWJf5JlDz215y for ; Sun, 9 Apr 2023 05:15:14 -0700 (PDT) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp4.ccs.ornl.gov (Postfix) with ESMTP id 15CAC1008480; Sun, 9 Apr 2023 08:13:28 -0400 (EDT) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 149372B2; Sun, 9 Apr 2023 08:13:28 -0400 (EDT) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Sun, 9 Apr 2023 08:12:58 -0400 Message-Id: <1681042400-15491-19-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1681042400-15491-1-git-send-email-jsimmons@infradead.org> References: <1681042400-15491-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 18/40] lustre: tgt: skip free inodes in OST weights X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.39 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Andreas Dilger In lu_tgt_qos_weight_calc() calculate the target weight consistently with how the per-OST and per-OSS penalty calculation is done in ltd_qos_penalties_calc(). Otherwise, the QOS weighting calculations combine two different units, which incorrectly weighs allocations on OST with more free inodes over those with more free space. 
Fixes: 1fa303725063 ("lustre: lmv: share object alloc QoS code with LMV") WC-bug-id: https://jira.whamcloud.com/browse/LU-16501 Lustre-commit: 511bf2f4ccd1482d6 ("LU-16501 tgt: skip free inodes in OST weights") Signed-off-by: Andreas Dilger Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/49890 Reviewed-by: Artem Blagodarenko Reviewed-by: Lai Siyao Reviewed-by: Sergey Cheremencev Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/include/lu_object.h | 14 ++++++++++++- fs/lustre/lmv/lmv_obd.c | 4 ++-- fs/lustre/obdclass/lu_tgt_descs.c | 41 ++++++++++++++++----------------------- 3 files changed, 32 insertions(+), 27 deletions(-) diff --git a/fs/lustre/include/lu_object.h b/fs/lustre/include/lu_object.h index 4e101fa..0562f806 100644 --- a/fs/lustre/include/lu_object.h +++ b/fs/lustre/include/lu_object.h @@ -1539,6 +1539,18 @@ struct lu_tgt_desc { ltd_connecting:1; /* target is connecting */ }; +static inline u64 tgt_statfs_bavail(struct lu_tgt_desc *tgt) +{ + struct obd_statfs *statfs = &tgt->ltd_statfs; + + return statfs->os_bavail * statfs->os_bsize; +} + +static inline u64 tgt_statfs_iavail(struct lu_tgt_desc *tgt) +{ + return tgt->ltd_statfs.os_ffree; +} + /* number of pointers at 2nd level */ #define TGT_PTRS_PER_BLOCK (PAGE_SIZE / sizeof(void *)) /* number of pointers at 1st level - only need as many as max OST/MDT count */ @@ -1593,7 +1605,7 @@ struct lu_tgt_descs { u64 lu_prandom_u64_max(u64 ep_ro); int lu_qos_add_tgt(struct lu_qos *qos, struct lu_tgt_desc *ltd); int lu_qos_del_tgt(struct lu_qos *qos, struct lu_tgt_desc *ltd); -void lu_tgt_qos_weight_calc(struct lu_tgt_desc *tgt); +void lu_tgt_qos_weight_calc(struct lu_tgt_desc *tgt, bool is_mdt); int lu_tgt_descs_init(struct lu_tgt_descs *ltd, bool is_mdt); void lu_tgt_descs_fini(struct lu_tgt_descs *ltd); diff --git a/fs/lustre/lmv/lmv_obd.c b/fs/lustre/lmv/lmv_obd.c index 99604e8..1b6e4aa 100644 --- a/fs/lustre/lmv/lmv_obd.c +++ b/fs/lustre/lmv/lmv_obd.c @@ -1512,7 +1512,7 
@@ static struct lu_tgt_desc *lmv_locate_tgt_qos(struct lmv_obd *lmv, } tgt->ltd_qos.ltq_usable = 1; - lu_tgt_qos_weight_calc(tgt); + lu_tgt_qos_weight_calc(tgt, true); if (tgt->ltd_index == op_data->op_mds) cur = tgt; total_avail += tgt->ltd_qos.ltq_avail; @@ -1613,7 +1613,7 @@ static struct lu_tgt_desc *lmv_locate_tgt_lf(struct lmv_obd *lmv) } tgt->ltd_qos.ltq_usable = 1; - lu_tgt_qos_weight_calc(tgt); + lu_tgt_qos_weight_calc(tgt, true); avail += tgt->ltd_qos.ltq_avail; if (!min || min->ltd_qos.ltq_avail > tgt->ltd_qos.ltq_avail) min = tgt; diff --git a/fs/lustre/obdclass/lu_tgt_descs.c b/fs/lustre/obdclass/lu_tgt_descs.c index 7394789..35e7c7c 100644 --- a/fs/lustre/obdclass/lu_tgt_descs.c +++ b/fs/lustre/obdclass/lu_tgt_descs.c @@ -198,33 +198,26 @@ int lu_qos_del_tgt(struct lu_qos *qos, struct lu_tgt_desc *ltd) } EXPORT_SYMBOL(lu_qos_del_tgt); -static inline u64 tgt_statfs_bavail(struct lu_tgt_desc *tgt) -{ - struct obd_statfs *statfs = &tgt->ltd_statfs; - - return statfs->os_bavail * statfs->os_bsize; -} - -static inline u64 tgt_statfs_iavail(struct lu_tgt_desc *tgt) -{ - return tgt->ltd_statfs.os_ffree; -} - /** * Calculate weight for a given tgt. * - * The final tgt weight is bavail >> 16 * iavail >> 8 minus the tgt and server - * penalties. See ltd_qos_penalties_calc() for how penalties are calculated. + * The final tgt weight uses only free space for OSTs, but combines + * both free space and inodes for MDTs, minus tgt and server penalties. + * See ltd_qos_penalties_calc() for how penalties are calculated. 
* * @tgt target descriptor + * @is_mdt target table is for MDT selection (use inodes) */ -void lu_tgt_qos_weight_calc(struct lu_tgt_desc *tgt) +void lu_tgt_qos_weight_calc(struct lu_tgt_desc *tgt, bool is_mdt) { struct lu_tgt_qos *ltq = &tgt->ltd_qos; u64 penalty; - ltq->ltq_avail = (tgt_statfs_bavail(tgt) >> 16) * - (tgt_statfs_iavail(tgt) >> 8); + if (is_mdt) + ltq->ltq_avail = (tgt_statfs_bavail(tgt) >> 16) * + (tgt_statfs_iavail(tgt) >> 8); + else + ltq->ltq_avail = tgt_statfs_bavail(tgt) >> 8; penalty = ltq->ltq_penalty + ltq->ltq_svr->lsq_penalty; if (ltq->ltq_avail < penalty) ltq->ltq_weight = 0; @@ -512,11 +505,10 @@ int ltd_qos_penalties_calc(struct lu_tgt_descs *ltd) /* * per-tgt penalty is - * prio * bavail * iavail / (num_tgt - 1) / 2 + * prio * bavail * iavail / (num_tgt - 1) / prio_max / 2 */ - tgt->ltd_qos.ltq_penalty_per_obj = prio_wide * ba * ia >> 8; + tgt->ltd_qos.ltq_penalty_per_obj = prio_wide * ba * ia >> 9; do_div(tgt->ltd_qos.ltq_penalty_per_obj, num_active); - tgt->ltd_qos.ltq_penalty_per_obj >>= 1; age = (now - tgt->ltd_qos.ltq_used) >> 3; if (test_bit(LQ_RESET, &qos->lq_flags) || @@ -563,14 +555,11 @@ int ltd_qos_penalties_calc(struct lu_tgt_descs *ltd) svr->lsq_penalty >>= age / desc->ld_qos_maxage; } - clear_bit(LQ_DIRTY, &qos->lq_flags); - clear_bit(LQ_RESET, &qos->lq_flags); /* * If each tgt has almost same free space, do rr allocation for better * creation performance */ - clear_bit(LQ_SAME_SPACE, &qos->lq_flags); if (((ba_max * (QOS_THRESHOLD_MAX - qos->lq_threshold_rr)) / QOS_THRESHOLD_MAX) < ba_min && ((ia_max * (QOS_THRESHOLD_MAX - qos->lq_threshold_rr)) / @@ -578,7 +567,11 @@ int ltd_qos_penalties_calc(struct lu_tgt_descs *ltd) set_bit(LQ_SAME_SPACE, &qos->lq_flags); /* Reset weights for the next time we enter qos mode */ set_bit(LQ_RESET, &qos->lq_flags); + } else { + clear_bit(LQ_SAME_SPACE, &qos->lq_flags); + clear_bit(LQ_RESET, &qos->lq_flags); } + clear_bit(LQ_DIRTY, &qos->lq_flags); rc = 0; out: @@ -653,7 +646,7 @@ int 
ltd_qos_update(struct lu_tgt_descs *ltd, struct lu_tgt_desc *tgt, else ltq->ltq_penalty -= ltq->ltq_penalty_per_obj; - lu_tgt_qos_weight_calc(tgt); + lu_tgt_qos_weight_calc(tgt, ltd->ltd_is_mdt); /* Recalc the total weight of usable osts */ if (ltq->ltq_usable) From patchwork Sun Apr 9 12:12:59 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 13205954 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from pdx1-mailman-customer002.dreamhost.com (listserver-buz.dreamhost.com [69.163.136.29]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 7F333C77B70 for ; Sun, 9 Apr 2023 12:30:40 +0000 (UTC) Received: from pdx1-mailman-customer002.dreamhost.com (localhost [127.0.0.1]) by pdx1-mailman-customer002.dreamhost.com (Postfix) with ESMTP id 4PvWMp5Dfjz226L; Sun, 9 Apr 2023 05:17:58 -0700 (PDT) Received: from smtp4.ccs.ornl.gov (smtp4.ccs.ornl.gov [160.91.203.40]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by pdx1-mailman-customer002.dreamhost.com (Postfix) with ESMTPS id 4PvWJm6QGYz216P for ; Sun, 9 Apr 2023 05:15:20 -0700 (PDT) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp4.ccs.ornl.gov (Postfix) with ESMTP id 1A4BF1008481; Sun, 9 Apr 2023 08:13:28 -0400 (EDT) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 190952B3; Sun, 9 Apr 2023 08:13:28 -0400 (EDT) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Sun, 9 Apr 2023 08:12:59 -0400 Message-Id: <1681042400-15491-20-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1681042400-15491-1-git-send-email-jsimmons@infradead.org> References: <1681042400-15491-1-git-send-email-jsimmons@infradead.org> 
Subject: [lustre-devel] [PATCH 19/40] lustre: fileset: check fileset for operations by fid

From: Sebastien Buisson

Some operations by FID, such as lfs rmfid, must be aware of a
subdirectory mount (fileset) so that they do not operate on files that
are outside of the namespace currently mounted by the client.

For lfs rmfid, we first perform a fid2path resolution. As fid2path is
already fileset aware, it fails if a file, or a link to a file, is
outside of the subdirectory mount. So we carry on with rmfid only for
FIDs for which the file and all its links appear under the current
fileset.

This new behavior is enabled as soon as we detect that a subdirectory
mount is in use (either directly or imposed by a nodemap fileset). This
means the new behavior does not impact normal, whole-namespace client
mounts.
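The filtering described above can be sketched in plain userspace C. This is a minimal sketch of the partitioning scheme only: the integer "fid", the visibility bitmask (standing in for the per-link __ll_fid2path() loop), and the name rmfid_filter are illustrative assumptions, not Lustre code. Visible FIDs are packed at the front of the array handed down to the MDC; hidden ones are moved to the tail with -ENOENT recorded in rcs[], mirroring how ll_rmfid() builds lfa_new.

```c
#include <assert.h>
#include <stddef.h>

/* Partition @in into @out: visible FIDs first (returned count), hidden
 * FIDs at the tail with rcs[] set to -ENOENT for each hidden slot.
 * Bit i of @visible_mask says whether in[i] is visible in the fileset. */
static size_t rmfid_filter(const unsigned long long *in,
			   unsigned long long *out, int *rcs, size_t nr,
			   unsigned long visible_mask)
{
	size_t front = 0, back, i;

	if (nr == 0)
		return 0;

	back = nr - 1;
	for (i = 0; i < nr; i++) {
		if (visible_mask & (1UL << i)) {
			out[front++] = in[i];	/* will be sent to the MDC */
		} else {
			out[back] = in[i];	/* hidden from lower layers */
			rcs[back--] = -2;	/* -ENOENT */
		}
	}
	return front;	/* the fa_nr actually passed down */
}
```

As in the patch, the caller would pass only the first `front` entries to the rmfid RPC and report -ENOENT for the tail entries back to userspace.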
WC-bug-id: https://jira.whamcloud.com/browse/LU-16494
Lustre-commit: 9a72c073d33b04542 ("LU-16494 fileset: check fileset for operations by fid")
Signed-off-by: Sebastien Buisson
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/49696
Reviewed-by: Andreas Dilger
Reviewed-by: jsimmons
Reviewed-by: Oleg Drokin
Signed-off-by: James Simmons
---
 fs/lustre/llite/dir.c            | 84 ++++++++++++++++++++++++++++++++++++++++
 fs/lustre/llite/file.c           | 55 ++++++++++++++------------
 fs/lustre/llite/llite_internal.h |  2 +
 3 files changed, 116 insertions(+), 25 deletions(-)

diff --git a/fs/lustre/llite/dir.c b/fs/lustre/llite/dir.c
index 56ef1bb..1298bd6 100644
--- a/fs/lustre/llite/dir.c
+++ b/fs/lustre/llite/dir.c
@@ -1295,6 +1295,7 @@ int ll_rmfid(struct file *file, void __user *arg)
 {
 	const struct fid_array __user *ufa = arg;
 	struct inode *inode = file_inode(file);
+	struct ll_sb_info *sbi = ll_i2sbi(inode);
 	struct fid_array *lfa = NULL;
 	size_t size;
 	unsigned int nr;
@@ -1325,8 +1326,91 @@ int ll_rmfid(struct file *file, void __user *arg)
 		goto free_rcs;
 	}
 
+	/* In case of subdirectory mount, we need to make sure all the files
+	 * for which we want to remove FID are visible in the namespace.
+	 */
+	if (!fid_is_root(&sbi->ll_root_fid)) {
+		struct fid_array *lfa_new = NULL;
+		int path_len = PATH_MAX, linkno;
+		struct getinfo_fid2path *gf;
+		int idx, last_idx = nr - 1;
+
+		lfa_new = kzalloc(size, GFP_NOFS);
+		if (!lfa_new) {
+			rc = -ENOMEM;
+			goto free_rcs;
+		}
+		lfa_new->fa_nr = 0;
+
+		gf = kmalloc(sizeof(*gf) + path_len + 1, GFP_NOFS);
+		if (!gf) {
+			rc = -ENOMEM;
+			goto free_rcs;
+		}
+
+		for (idx = 0; idx < nr; idx++) {
+			linkno = 0;
+			while (1) {
+				memset(gf, 0, sizeof(*gf) + path_len + 1);
+				gf->gf_fid = lfa->fa_fids[idx];
+				gf->gf_pathlen = path_len;
+				gf->gf_linkno = linkno;
+				rc = __ll_fid2path(inode, gf,
+						   sizeof(*gf) + gf->gf_pathlen,
+						   gf->gf_pathlen);
+				if (rc == -ENAMETOOLONG) {
+					struct getinfo_fid2path *tmpgf;
+
+					path_len += PATH_MAX;
+					tmpgf = krealloc(gf,
+						     sizeof(*gf) + path_len + 1,
+						     GFP_NOFS);
+					if (!tmpgf) {
+						kfree(gf);
+						kfree(lfa_new);
+						rc = -ENOMEM;
+						goto free_rcs;
+					}
+					gf = tmpgf;
+					continue;
+				}
+				if (rc)
+					break;
+				if (gf->gf_linkno == linkno)
+					break;
+				linkno = gf->gf_linkno;
+			}
+
+			if (!rc) {
+				/* All the links for this fid are visible in the
+				 * mounted subdir. So add it to the list of fids
+				 * to remove.
+				 */
+				lfa_new->fa_fids[lfa_new->fa_nr++] =
+					lfa->fa_fids[idx];
+			} else {
+				/* At least one link for this fid is not visible
+				 * in the mounted subdir. So add it at the end
+				 * of the list that will be hidden to lower
+				 * layers, and set -ENOENT as ret code.
+				 */
+				lfa_new->fa_fids[last_idx] = lfa->fa_fids[idx];
+				rcs[last_idx--] = rc;
+			}
+		}
+		kfree(gf);
+		kfree(lfa);
+		lfa = lfa_new;
+	}
+
+	if (lfa->fa_nr == 0) {
+		rc = rcs[nr - 1];
+		goto free_rcs;
+	}
+
 	/* Call mdc_iocontrol */
 	rc = md_rmfid(ll_i2mdexp(file_inode(file)), lfa, rcs, NULL);
+	lfa->fa_nr = nr;
 	if (!rc) {
 		for (i = 0; i < nr; i++)
 			if (rcs[i])
diff --git a/fs/lustre/llite/file.c b/fs/lustre/llite/file.c
index 18f3302..a9d247c 100644
--- a/fs/lustre/llite/file.c
+++ b/fs/lustre/llite/file.c
@@ -2917,9 +2917,37 @@ static int fid2path_for_enc_file(struct inode *parent, char *gfpath,
 	return rc;
 }
 
-int ll_fid2path(struct inode *inode, void __user *arg)
+int __ll_fid2path(struct inode *inode, struct getinfo_fid2path *gfout,
+		  size_t outsize, __u32 pathlen_orig)
 {
 	struct obd_export *exp = ll_i2mdexp(inode);
+	int rc;
+
+	/* Append root FID after gfout to let MDT know the root FID so that
+	 * it can lookup the correct path, this is mainly for fileset.
+	 * old server without fileset mount support will ignore this.
+	 */
+	*gfout->gf_root_fid = *ll_inode2fid(inode);
+
+	/* Call mdc_iocontrol */
+	rc = obd_iocontrol(OBD_IOC_FID2PATH, exp, outsize, gfout, NULL);
+
+	if (!rc && gfout->gf_pathlen && gfout->gf_path[0] == '/') {
+		/* by convention, server side (mdt_path_current()) puts
+		 * a leading '/' to tell client that we are dealing with
+		 * an encrypted file
+		 */
+		rc = fid2path_for_enc_file(inode, gfout->gf_path,
+					   gfout->gf_pathlen);
+		if (!rc && strlen(gfout->gf_path) > pathlen_orig)
+			rc = -EOVERFLOW;
+	}
+
+	return rc;
+}
+
+int ll_fid2path(struct inode *inode, void __user *arg)
+{
 	const struct getinfo_fid2path __user *gfin = arg;
 	struct getinfo_fid2path *gfout;
 	u32 pathlen, pathlen_orig;
@@ -2950,34 +2978,11 @@ int ll_fid2path(struct inode *inode, void __user *arg)
 		goto gf_free;
 	}
 
-	/*
-	 * append root FID after gfout to let MDT know the root FID so that it
-	 * can lookup the correct path, this is mainly for fileset.
-	 * old server without fileset mount support will ignore this.
-	 */
-	*gfout->gf_root_fid = *ll_inode2fid(inode);
 	gfout->gf_pathlen = pathlen;
-
-	/* Call mdc_iocontrol */
-	rc = obd_iocontrol(OBD_IOC_FID2PATH, exp, outsize, gfout, NULL);
+	rc = __ll_fid2path(inode, gfout, outsize, pathlen_orig);
 	if (rc != 0)
 		goto gf_free;
 
-	if (gfout->gf_pathlen && gfout->gf_path[0] == '/') {
-		/* by convention, server side (mdt_path_current()) puts
-		 * a leading '/' to tell client that we are dealing with
-		 * an encrypted file
-		 */
-		rc = fid2path_for_enc_file(inode, gfout->gf_path,
-					   gfout->gf_pathlen);
-		if (rc)
-			goto gf_free;
-		if (strlen(gfout->gf_path) > gfin->gf_pathlen) {
-			rc = -EOVERFLOW;
-			goto gf_free;
-		}
-	}
-
 	if (copy_to_user(arg, gfout, sizeof(*gfout) + pathlen_orig))
 		rc = -EFAULT;
diff --git a/fs/lustre/llite/llite_internal.h b/fs/lustre/llite/llite_internal.h
index 970b144..6bbc781 100644
--- a/fs/lustre/llite/llite_internal.h
+++ b/fs/lustre/llite/llite_internal.h
@@ -1245,6 +1245,8 @@ int ll_dir_getstripe(struct inode *inode, void **plmm, int *plmm_size,
 int ll_fsync(struct file *file, loff_t start, loff_t end, int data);
 int ll_merge_attr(const struct lu_env *env, struct inode *inode);
 int ll_fid2path(struct inode *inode, void __user *arg);
+int __ll_fid2path(struct inode *inode, struct getinfo_fid2path *gfout,
+		  size_t outsize, u32 pathlen_orig);
 int ll_data_version(struct inode *inode, u64 *data_version, int flags);
 int ll_hsm_release(struct inode *inode);
 int ll_hsm_state_set(struct inode *inode, struct hsm_state_set *hss);

From patchwork Sun Apr 9 12:13:00 2023
From: James Simmons
To: Andreas Dilger, Oleg Drokin, NeilBrown
Date: Sun, 9 Apr 2023 08:13:00 -0400
Message-Id: <1681042400-15491-21-git-send-email-jsimmons@infradead.org>
Subject: [lustre-devel] [PATCH 20/40] lustre: clio: Remove cl_page_size()

From: Patrick Farrell

cl_page_size() is just a function which does 1 << PAGE_SHIFT, and the
kernel provides a macro for that: PAGE_SIZE. Maybe it didn't when this
function was added, but it certainly does now. So, remove cl_page_size().
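The equivalence being claimed can be checked in userspace. This is a hedged sketch: PAGE_SHIFT = 12 is an assumption (the value is architecture-dependent), and cl_page_size_like() is a stand-in for the removed helper, which ignored its object argument.

```c
#include <assert.h>

/* PAGE_SIZE is defined from PAGE_SHIFT exactly as the removed helper
 * computed it; 12 (4 KiB pages) is assumed here for illustration. */
#define PAGE_SHIFT 12
#define PAGE_SIZE  (1UL << PAGE_SHIFT)

static unsigned long cl_page_size_like(const void *obj)
{
	(void)obj;		/* the object was never consulted */
	return 1UL << PAGE_SHIFT;
}
```

Since the return value never depends on the object, every call site can use PAGE_SIZE directly, which is what the patch does.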
WC-bug-id: https://jira.whamcloud.com/browse/LU-16515
Lustre-commit: 19c38f6c94ae161b1 ("LU-16515 clio: Remove cl_page_size()")
Signed-off-by: Patrick Farrell
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/49918
Reviewed-by: Andreas Dilger
Reviewed-by: Sebastien Buisson
Reviewed-by: Oleg Drokin
Signed-off-by: James Simmons
---
 fs/lustre/include/cl_object.h | 1 -
 fs/lustre/llite/rw26.c        | 9 ++++-----
 fs/lustre/llite/vvp_io.c      | 2 +-
 fs/lustre/lov/lov_page.c      | 2 +-
 fs/lustre/obdclass/cl_page.c  | 6 ------
 5 files changed, 6 insertions(+), 14 deletions(-)

diff --git a/fs/lustre/include/cl_object.h b/fs/lustre/include/cl_object.h
index 41ce0b0..8a98413 100644
--- a/fs/lustre/include/cl_object.h
+++ b/fs/lustre/include/cl_object.h
@@ -2268,7 +2268,6 @@ void cl_page_touch(const struct lu_env *env, const struct cl_page *pg,
 		   size_t to);
 loff_t cl_offset(const struct cl_object *obj, pgoff_t idx);
 pgoff_t cl_index(const struct cl_object *obj, loff_t offset);
-size_t cl_page_size(const struct cl_object *obj);
 int cl_pages_prune(const struct lu_env *env, struct cl_object *obj);
 
 void cl_lock_print(const struct lu_env *env, void *cookie,
diff --git a/fs/lustre/llite/rw26.c b/fs/lustre/llite/rw26.c
index 6700717..6b338b2 100644
--- a/fs/lustre/llite/rw26.c
+++ b/fs/lustre/llite/rw26.c
@@ -218,7 +218,6 @@ static unsigned long ll_iov_iter_alignment(struct iov_iter *i)
 	struct cl_sync_io *anchor = &sdio->csd_sync;
 	loff_t offset = pv->ldp_file_offset;
 	int io_pages = 0;
-	size_t page_size = cl_page_size(obj);
 	int i;
 	ssize_t rc = 0;
 
@@ -257,12 +256,12 @@ static unsigned long ll_iov_iter_alignment(struct iov_iter *i)
 		 * Set page clip to tell transfer formation engine
 		 * that page has to be sent even if it is beyond KMS.
 		 */
-		if (size < page_size)
+		if (size < PAGE_SIZE)
 			cl_page_clip(env, page, 0, size);
 		++io_pages;
 
-		offset += page_size;
-		size -= page_size;
+		offset += PAGE_SIZE;
+		size -= PAGE_SIZE;
 	}
 	if (rc == 0 && io_pages > 0) {
 		int iot = rw == READ ? CRT_READ : CRT_WRITE;
@@ -478,7 +477,7 @@ static int ll_prepare_partial_page(const struct lu_env *env, struct cl_io *io,
 	if (attr->cat_kms <= offset) {
 		char *kaddr = kmap_atomic(pg->cp_vmpage);
 
-		memset(kaddr, 0, cl_page_size(obj));
+		memset(kaddr, 0, PAGE_SIZE);
 		kunmap_atomic(kaddr);
 		result = 0;
 		goto out;
diff --git a/fs/lustre/llite/vvp_io.c b/fs/lustre/llite/vvp_io.c
index 2da74a2..561ce66 100644
--- a/fs/lustre/llite/vvp_io.c
+++ b/fs/lustre/llite/vvp_io.c
@@ -1525,7 +1525,7 @@ static int vvp_io_fault_start(const struct lu_env *env,
 		 */
 		fio->ft_nob = size - cl_offset(obj, fio->ft_index);
 	else
-		fio->ft_nob = cl_page_size(obj);
+		fio->ft_nob = PAGE_SIZE;
 
 	lu_ref_add(&page->cp_reference, "fault", io);
 	fio->ft_page = page;
diff --git a/fs/lustre/lov/lov_page.c b/fs/lustre/lov/lov_page.c
index 6e28e62..e9283aa 100644
--- a/fs/lustre/lov/lov_page.c
+++ b/fs/lustre/lov/lov_page.c
@@ -144,7 +144,7 @@ int lov_page_init_empty(const struct lu_env *env, struct cl_object *obj,
 	page->cp_lov_index = CP_LOV_INDEX_EMPTY;
 	addr = kmap(page->cp_vmpage);
-	memset(addr, 0, cl_page_size(obj));
+	memset(addr, 0, PAGE_SIZE);
 	kunmap(page->cp_vmpage);
 	SetPageUptodate(page->cp_vmpage);
 	return 0;
diff --git a/fs/lustre/obdclass/cl_page.c b/fs/lustre/obdclass/cl_page.c
index 62d8ee5..8320293 100644
--- a/fs/lustre/obdclass/cl_page.c
+++ b/fs/lustre/obdclass/cl_page.c
@@ -1035,12 +1035,6 @@ pgoff_t cl_index(const struct cl_object *obj, loff_t offset)
 }
 EXPORT_SYMBOL(cl_index);
 
-size_t cl_page_size(const struct cl_object *obj)
-{
-	return 1UL << PAGE_SHIFT;
-}
-EXPORT_SYMBOL(cl_page_size);
-
 /**
  * Adds page slice to the compound page.
 *

From patchwork Sun Apr 9 12:13:01 2023
From: James Simmons
To: Andreas Dilger, Oleg Drokin, NeilBrown
Date: Sun, 9 Apr 2023 08:13:01 -0400
Message-Id: <1681042400-15491-22-git-send-email-jsimmons@infradead.org>
Subject: [lustre-devel] [PATCH 21/40] lustre: fid: clean up OBIF_MAX_OID and IDIF_MAX_OID
Cc: Li Dongyang, Lustre Development List

From: Li Dongyang

Define the OBIF_MAX_OID and IDIF_MAX_OID macros as
(1ULL << OBIF_OID_MAX_BITS) - 1 and (1ULL << IDIF_OID_MAX_BITS) - 1.
Clean up the callers and remove OBIF_OID_MASK and IDIF_OID_MASK, which
are no longer used.

WC-bug-id: https://jira.whamcloud.com/browse/LU-11912
Lustre-commit: bb2f0dac868cf1321 ("LU-11912 fid: clean up OBIF_MAX_OID and IDIF_MAX_OID")
Signed-off-by: Li Dongyang
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/45659
Reviewed-by: Andreas Dilger
Reviewed-by: Sergey Cheremencev
Reviewed-by: Oleg Drokin
Signed-off-by: James Simmons
---
 fs/lustre/include/lustre_fid.h           | 6 +++---
 include/uapi/linux/lustre/lustre_idl.h   | 6 ++----
 include/uapi/linux/lustre/lustre_ostid.h | 4 ++--
 3 files changed, 7 insertions(+), 9 deletions(-)

diff --git a/fs/lustre/include/lustre_fid.h b/fs/lustre/include/lustre_fid.h
index b8a3f2e..88a6061 100644
--- a/fs/lustre/include/lustre_fid.h
+++ b/fs/lustre/include/lustre_fid.h
@@ -481,18 +481,18 @@ static inline int ostid_res_name_eq(const struct ost_id *oi,
 static inline int ostid_set_id(struct ost_id *oi, u64 oid)
 {
 	if (fid_seq_is_mdt0(oi->oi.oi_seq)) {
-		if (oid >= IDIF_MAX_OID)
+		if (oid > IDIF_MAX_OID)
 			return -E2BIG;
 		oi->oi.oi_id = oid;
 	} else if (fid_is_idif(&oi->oi_fid)) {
-		if (oid >= IDIF_MAX_OID)
+		if (oid > IDIF_MAX_OID)
 			return -E2BIG;
 		oi->oi_fid.f_seq = fid_idif_seq(oid,
 						fid_idif_ost_idx(&oi->oi_fid));
 		oi->oi_fid.f_oid = oid;
 		oi->oi_fid.f_ver = oid >> 48;
 	} else {
-		if (oid >= OBIF_MAX_OID)
+		if (oid > OBIF_MAX_OID)
 			return -E2BIG;
 		oi->oi_fid.f_oid = oid;
 	}
diff --git a/include/uapi/linux/lustre/lustre_idl.h b/include/uapi/linux/lustre/lustre_idl.h
index b4185a7..a752639 100644
--- a/include/uapi/linux/lustre/lustre_idl.h
+++ b/include/uapi/linux/lustre/lustre_idl.h
@@ -295,11 +295,9 @@ enum fid_seq {
 };
 
 #define OBIF_OID_MAX_BITS	32
-#define OBIF_MAX_OID		((1ULL << OBIF_OID_MAX_BITS))
-#define OBIF_OID_MASK		((1ULL << OBIF_OID_MAX_BITS) - 1)
+#define OBIF_MAX_OID		((1ULL << OBIF_OID_MAX_BITS) - 1)
 #define IDIF_OID_MAX_BITS	48
-#define IDIF_MAX_OID		((1ULL << IDIF_OID_MAX_BITS))
-#define IDIF_OID_MASK		((1ULL << IDIF_OID_MAX_BITS) - 1)
+#define IDIF_MAX_OID		((1ULL << IDIF_OID_MAX_BITS) - 1)
 
 /** OID for FID_SEQ_SPECIAL */
 enum special_oid {
diff --git a/include/uapi/linux/lustre/lustre_ostid.h b/include/uapi/linux/lustre/lustre_ostid.h
index 90fa213..baf7c8f 100644
--- a/include/uapi/linux/lustre/lustre_ostid.h
+++ b/include/uapi/linux/lustre/lustre_ostid.h
@@ -91,7 +91,7 @@ static inline __u64 ostid_seq(const struct ost_id *ostid)
 static inline __u64 ostid_id(const struct ost_id *ostid)
 {
 	if (fid_seq_is_mdt0(ostid->oi.oi_seq))
-		return ostid->oi.oi_id & IDIF_OID_MASK;
+		return ostid->oi.oi_id & IDIF_MAX_OID;
 
 	if (fid_seq_is_default(ostid->oi.oi_seq))
 		return ostid->oi.oi_id;
@@ -212,7 +212,7 @@ static inline int ostid_to_fid(struct lu_fid *fid, const struct ost_id *ostid,
 		 * been in production for years. This can handle create rates
 		 * of 1M objects/s/OST for 9 years, or combinations thereof.
 		 */
-		if (oid >= IDIF_MAX_OID)
+		if (oid > IDIF_MAX_OID)
 			return -EBADF;
 
 		fid->f_seq = fid_idif_seq(oid, ost_idx);

From patchwork Sun Apr 9 12:13:02 2023
From: James Simmons
To: Andreas Dilger, Oleg Drokin, NeilBrown
Date: Sun, 9 Apr 2023 08:13:02 -0400
Message-Id: <1681042400-15491-23-git-send-email-jsimmons@infradead.org>
Subject: [lustre-devel] [PATCH 22/40] lustre: llog: fix processing of a wrapped catalog
Cc: Etienne AUJAMES, Lustre Development List

From: Etienne AUJAMES

Several issues were found with "lfs changelog --follow" for a wrapped
catalog (llog_cat_process() with startidx):

1/ incorrect lpcd_first_idx value for a wrapped catalog (startcat > 0)

The first llog index to process is "lpcd_first_idx + 1". The startidx
represents the last record index processed for a llog plain, and the
catalog index of this llog is startcat. So lpcd_first_idx of a catalog
should be set to "startcat - 1".

e.g.: llog_cat_process(... startcat=10, startidx=101) means that
processing will start with the llog plain at index 10 of the catalog,
and the first record to process will be at index 102.

2/ startidx is not reset for an incorrect startcat index

startidx is only relevant for a given startcat. So if the corresponding
llog plain is removed, or if startcat is out of range, we need to reset
startidx.

This patch removes LLOG_CAT_FIRST, which was really confusing (LU-14158),
and updates osp_sync_thread() to match the corrected llog_cat_process()
behavior. It also modifies llog_cat_retain_cb() to zap empty plain llogs
directly in it (as llog_cat_size_cb() does), since the current
implementation is not compatible with this patch.
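The index convention from point 1/ can be written down as a tiny sketch. The struct and function names below are illustrative, not the kernel's llog structures; only the arithmetic (processing begins at lpcd_first_idx + 1, so the catalog walk must start at startcat - 1) mirrors the patch.

```c
#include <assert.h>

/* Resume points for a catalog walk: llog processing begins at
 * lpcd_first_idx + 1, so to resume from catalog slot @startcat whose
 * plain llog was already processed up to record @startidx, the catalog
 * walk starts at startcat - 1 and the plain walk at startidx. */
struct llog_resume {
	int cat_first_idx;	/* lpcd_first_idx for the catalog */
	int plain_first_idx;	/* lpcd_first_idx for the llog plain */
};

static struct llog_resume llog_cat_resume(int startcat, int startidx)
{
	struct llog_resume r = {
		.cat_first_idx = startcat - 1,
		.plain_first_idx = startidx,
	};
	return r;
}
```

With startcat=10 and startidx=101, the first catalog slot processed is 10 and the first plain record processed is 102, matching the example in the commit message.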
Fixes: 58239d59 ("lustre: llog: fix processing of a wrapped catalog")
WC-bug-id: https://jira.whamcloud.com/browse/LU-15280
Lustre-commit: 76cf7427145a397a3 ("LU-15280 llog: fix processing of a wrapped catalog")
Signed-off-by: Etienne AUJAMES
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/45708
Reviewed-by: Oleg Drokin
Reviewed-by: Alexander Boyko
Reviewed-by: Mikhail Pershin
Signed-off-by: James Simmons
---
 fs/lustre/include/lustre_log.h         |  16 +++++
 fs/lustre/include/obd_support.h        |   1 +
 fs/lustre/obdclass/llog.c              |  21 +++---
 fs/lustre/obdclass/llog_cat.c          | 128 ++++++++++++++++++++++-----------
 include/uapi/linux/lustre/lustre_idl.h |   5 -
 5 files changed, 115 insertions(+), 56 deletions(-)

diff --git a/fs/lustre/include/lustre_log.h b/fs/lustre/include/lustre_log.h
index dbf3fd6..1586595 100644
--- a/fs/lustre/include/lustre_log.h
+++ b/fs/lustre/include/lustre_log.h
@@ -375,6 +375,15 @@ static inline int llog_next_block(const struct lu_env *env,
 	return rc;
 }
 
+static inline int llog_max_idx(struct llog_log_hdr *lh)
+{
+	if (OBD_FAIL_PRECHECK(OBD_FAIL_CAT_RECORDS) &&
+	    unlikely(lh->llh_flags & LLOG_F_IS_CAT))
+		return cfs_fail_val;
+	else
+		return LLOG_HDR_BITMAP_SIZE(lh) - 1;
+}
+
 /* Determine if a llog plain of a catalog could be skiped based on record
  * custom indexes.
  * This assumes that indexes follow each other. The number of records to skip
@@ -391,6 +400,13 @@ static inline int llog_is_plain_skipable(struct llog_log_hdr *lh,
 	return (LLOG_HDR_BITMAP_SIZE(lh) - rec->lrh_index) < (start - curr);
 }
 
+static inline bool llog_cat_is_wrapped(struct llog_handle *cat)
+{
+	struct llog_log_hdr *llh = cat->lgh_hdr;
+
+	return llh->llh_cat_idx >= cat->lgh_last_idx && llh->llh_count > 1;
+}
+
 /* llog.c */
 int lustre_process_log(struct super_block *sb, char *logname,
 		       struct config_llog_instance *cfg);
diff --git a/fs/lustre/include/obd_support.h b/fs/lustre/include/obd_support.h
index 4ef5c61..55196ce 100644
--- a/fs/lustre/include/obd_support.h
+++ b/fs/lustre/include/obd_support.h
@@ -458,6 +458,7 @@
 /* was OBD_FAIL_LLOG_CATINFO_NET 0x1309 until 2.3 */
 #define OBD_FAIL_MDS_SYNC_CAPA_SL	0x1310
 #define OBD_FAIL_SEQ_ALLOC		0x1311
+#define OBD_FAIL_CAT_RECORDS		0x1312
 #define OBD_FAIL_PLAIN_RECORDS		0x1319
 #define OBD_FAIL_CATALOG_FULL_CHECK	0x131a
 
diff --git a/fs/lustre/obdclass/llog.c b/fs/lustre/obdclass/llog.c
index 90bb8bd..09fda39 100644
--- a/fs/lustre/obdclass/llog.c
+++ b/fs/lustre/obdclass/llog.c
@@ -247,7 +247,7 @@ int llog_verify_record(const struct llog_handle *llh, struct llog_rec_hdr *rec)
 	else if (rec->lrh_len == 0 || rec->lrh_len > chunk_size)
 		LLOG_ERROR_REC(llh, rec, "bad record len, chunk size is %d",
 			       chunk_size);
-	else if (rec->lrh_index >= LLOG_HDR_BITMAP_SIZE(llh->lgh_hdr))
+	else if (rec->lrh_index > llog_max_idx(llh->lgh_hdr))
 		LLOG_ERROR_REC(llh, rec, "index is too high");
 	else
 		return 0;
@@ -292,16 +292,21 @@ static int llog_process_thread(void *arg)
 		return 0;
 	}
 
+	last_index = llog_max_idx(llh);
 	if (cd) {
-		last_called_index = cd->lpcd_first_idx;
+		if (cd->lpcd_first_idx >= llog_max_idx(llh)) {
+			/* End of the indexes -> Nothing to do */
+			rc = 0;
+			goto out;
+		}
 		index = cd->lpcd_first_idx + 1;
+		last_called_index = cd->lpcd_first_idx;
+		if (cd->lpcd_last_idx > 0 &&
+		    cd->lpcd_last_idx <= llog_max_idx(llh))
+			last_index = cd->lpcd_last_idx;
+		else if (cd->lpcd_read_mode & LLOG_READ_MODE_RAW)
+			last_index = loghandle->lgh_last_idx;
 	}
-	if (cd && cd->lpcd_last_idx)
-		last_index = cd->lpcd_last_idx;
-	else if (cd && (cd->lpcd_read_mode & LLOG_READ_MODE_RAW))
-		last_index = loghandle->lgh_last_idx;
-	else
-		last_index = LLOG_HDR_BITMAP_SIZE(llh) - 1;
 
 	while (rc == 0) {
 		unsigned int buf_offset = 0;
diff --git a/fs/lustre/obdclass/llog_cat.c b/fs/lustre/obdclass/llog_cat.c
index 95bfa65..9d624a7 100644
--- a/fs/lustre/obdclass/llog_cat.c
+++ b/fs/lustre/obdclass/llog_cat.c
@@ -174,19 +174,25 @@ static int llog_cat_process_cb(const struct lu_env *env,
 	struct llog_handle *llh = NULL;
 	int rc;
 
+	/* Skip processing of the logs until startcat */
+	if (rec->lrh_index < d->lpd_startcat)
+		return 0;
+
 	rc = llog_cat_process_common(env, cat_llh, rec, &llh);
 	if (rc)
 		goto out;
 
-	if (rec->lrh_index < d->lpd_startcat)
-		/* Skip processing of the logs until startcat */
-		rc = 0;
-	else if (d->lpd_startidx > 0) {
-		struct llog_process_cat_data cd;
-
-		cd.lpcd_read_mode = LLOG_READ_MODE_NORMAL;
-		cd.lpcd_first_idx = d->lpd_startidx;
-		cd.lpcd_last_idx = 0;
+	if (d->lpd_startidx > 0) {
+		struct llog_process_cat_data cd = {
+			.lpcd_first_idx = 0,
+			.lpcd_last_idx = 0,
+			.lpcd_read_mode = LLOG_READ_MODE_NORMAL,
+		};
+
+		/* startidx is always associated with a catalog index */
+		if (d->lpd_startcat == rec->lrh_index)
+			cd.lpcd_first_idx = d->lpd_startidx;
+
 		rc = llog_process_or_fork(env, llh, d->lpd_cb, d->lpd_data,
 					  &cd, false);
 		/* Continue processing the next log from idx 0 */
@@ -208,57 +214,93 @@ static int llog_cat_process_or_fork(const struct lu_env *env,
 				    void *data, int startcat, int startidx,
 				    bool fork)
 {
-	struct llog_process_data d;
 	struct llog_log_hdr *llh = cat_llh->lgh_hdr;
+	struct llog_process_data d;
+	struct llog_process_cat_data cd;
 	int rc;
 
 	LASSERT(llh->llh_flags & LLOG_F_IS_CAT);
 	d.lpd_data = data;
 	d.lpd_cb = cb;
-	d.lpd_startcat = (startcat == LLOG_CAT_FIRST ? 0 : startcat);
-	d.lpd_startidx = startidx;
 
-	if (llh->llh_cat_idx > cat_llh->lgh_last_idx) {
-		struct llog_process_cat_data cd = {
-			.lpcd_read_mode = LLOG_READ_MODE_NORMAL
-		};
+	/* default: start from the oldest record */
+	d.lpd_startidx = 0;
+	d.lpd_startcat = llh->llh_cat_idx + 1;
+	cd.lpcd_first_idx = llh->llh_cat_idx;
+	cd.lpcd_last_idx = 0;
+	cd.lpcd_read_mode = LLOG_READ_MODE_NORMAL;
+
+	if (startcat > 0 && startcat <= llog_max_idx(llh)) {
+		/* start from a custom catalog/llog plain indexes*/
+		d.lpd_startidx = startidx;
+		d.lpd_startcat = startcat;
+		cd.lpcd_first_idx = startcat - 1;
+	} else if (startcat != 0) {
+		CWARN("%s: startcat %d out of range for catlog "DFID"\n",
+		      loghandle2name(cat_llh), startcat,
+		      PLOGID(&cat_llh->lgh_id));
+		return -EINVAL;
+	}
+
+	startcat = d.lpd_startcat;
+
+	/* if startcat <= lgh_last_idx, we only need to process the first part
+	 * of the catalog (from startcat).
+	 */
+	if (llog_cat_is_wrapped(cat_llh) && startcat > cat_llh->lgh_last_idx) {
+		int cat_idx_origin = llh->llh_cat_idx;
 
 		CWARN("%s: catlog " DFID " crosses index zero\n",
 		      loghandle2name(cat_llh),
 		      PLOGID(&cat_llh->lgh_id));
-		/*startcat = 0 is default value for general processing */
-		if ((startcat != LLOG_CAT_FIRST &&
-		     startcat >= llh->llh_cat_idx) || !startcat) {
-			/* processing the catalog part at the end */
-			cd.lpcd_first_idx = (startcat ? startcat :
-					     llh->llh_cat_idx);
-			cd.lpcd_last_idx = 0;
-			rc = llog_process_or_fork(env, cat_llh, cat_cb,
-						  &d, &cd, fork);
-			/* Reset the startcat because it has already reached
-			 * catalog bottom.
-			 */
-			startcat = 0;
-			d.lpd_startcat = 0;
-			if (rc != 0)
-				return rc;
-		}
-		/* processing the catalog part at the beginning */
-		cd.lpcd_first_idx = (startcat == LLOG_CAT_FIRST) ? 0 : startcat;
-		/* Note, the processing will stop at the lgh_last_idx value,
-		 * and it could be increased during processing. So records
-		 * between current lgh_last_idx and lgh_last_idx in future
-		 * would left unprocessed.
-		 */
-		cd.lpcd_last_idx = cat_llh->lgh_last_idx;
+
+		/* processing the catalog part at the end */
 		rc = llog_process_or_fork(env, cat_llh, cat_cb, &d, &cd, fork);
-	} else {
-		rc = llog_process_or_fork(env, cat_llh, cat_cb, &d, NULL, fork);
+		if (rc)
+			return rc;
+
+		/* Reset the startcat because it has already reached catalog
+		 * bottom.
+		 * lgh_last_idx value could be increased during processing. So
+		 * we process the remaining of catalog entries to be sure.
+		 */
+		d.lpd_startcat = 1;
+		d.lpd_startidx = 0;
+		cd.lpcd_first_idx = 0;
+		cd.lpcd_last_idx = max(cat_idx_origin, cat_llh->lgh_last_idx);
+	} else if (llog_cat_is_wrapped(cat_llh)) {
+		/* only process 1st part -> stop before reaching 2sd part */
+		cd.lpcd_last_idx = llh->llh_cat_idx;
 	}
 
+	/* processing the catalog part at the beginning */
+	rc = llog_process_or_fork(env, cat_llh, cat_cb, &d, &cd, fork);
+
 	return rc;
 }
 
+/**
+ * Process catalog records with a callback
+ *
+ * @note
+ * If "starcat = 0", this is the default processing. "startidx" argument is
+ * ignored and processing begin from the oldest record.
+ * If "startcat > 0", this is a custom starting point. Processing begin with
+ * the llog plain defined in the catalog record at index "startcat". The first
+ * llog plain record to process is at index "startidx + 1".
+ *
+ * @env		Lustre environnement
+ * @cat_llh	Catalog llog handler
+ * @cb		Callback executed for each records (in llog plain files)
+ * @data	Callback data argument
+ * @startcat	Catalog index of the llog plain to start with.
+ * @startidx	Index of the llog plain to start processing. The first
+ *		record to process is at startidx + 1.
+ *
+ * RETURN	0 processing successfully completed
+ *		LLOG_PROC_BREAK processing was stopped by the callback.
+ *		-errno on error.
+ */
 int llog_cat_process(const struct lu_env *env, struct llog_handle *cat_llh,
 		     llog_cb_t cb, void *data, int startcat, int startidx)
 {
diff --git a/include/uapi/linux/lustre/lustre_idl.h b/include/uapi/linux/lustre/lustre_idl.h
index a752639..d60d1d8 100644
--- a/include/uapi/linux/lustre/lustre_idl.h
+++ b/include/uapi/linux/lustre/lustre_idl.h
@@ -2619,11 +2619,6 @@ enum llog_flag {
 			  LLOG_F_EXT_X_OMODE | LLOG_F_EXT_X_XATTR,
 };
 
-/* means first record of catalog */
-enum {
-	LLOG_CAT_FIRST = -1,
-};
-
 /* On-disk header structure of each log object, stored in little endian order */
 #define LLOG_MIN_CHUNK_SIZE	8192
 #define LLOG_HEADER_SIZE	(96)	/* sizeof (llog_log_hdr) +

From patchwork Sun Apr 9 12:13:03 2023
From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Sun, 9 Apr 2023 08:13:03 -0400 Message-Id: <1681042400-15491-24-git-send-email-jsimmons@infradead.org> In-Reply-To: <1681042400-15491-1-git-send-email-jsimmons@infradead.org> References: <1681042400-15491-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 23/40] lustre: llite: replace lld_nfs_dentry flag with opencache handling Cc: Lustre Development List The lld_nfs_dentry flag was created to cache the open lock (opencache) when fetching fhandles for NFSv3. The same path is used by the fhandle APIs. This lighter open changes key behavior, since the open lock is always cached, which we don't want. Lustre introduced a way to modify open lock caching based on the number of opens performed on a file within a certain span of time. We can replace the lld_nfs_dentry flag with this new open lock caching, so that fhandle handling matches the open lock caching behavior of a normal file open.
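The threshold logic described above can be sketched in userspace C. The helper below is a hypothetical model (not the kernel code) of the decision made in ll_file_open(): a per-inode threshold (lli_open_thrsh_count, left at a UINT_MAX sentinel when unset) overrides the filesystem-wide default (ll_oc_thrsh_count) when deciding whether to request MDS_OPEN_LOCK:

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical model of the open lock caching decision: the per-inode
 * threshold overrides the filesystem-wide default unless it was left at
 * the UINT_MAX sentinel set at inode allocation time; a threshold of 0
 * disables open lock caching entirely, and otherwise the lock is
 * requested once the inode has been opened "threshold" or more times.
 */
static bool want_open_lock(unsigned int fs_threshold,
			   unsigned int inode_threshold,
			   unsigned long long open_count)
{
	unsigned int threshold = fs_threshold;

	if (inode_threshold != UINT_MAX)	/* per-inode override set */
		threshold = inode_threshold;

	return threshold > 0 && open_count >= threshold;
}
```

With an inode threshold of 1, as the fhandle path would set it, every open requests the lock, matching the old always-cached lld_nfs_dentry behavior.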
WC-bug-id: https://jira.whamcloud.com/browse/LU-16463 Lustre-commit: d7a85652f4fcb8319 ("LU-16463 llite: replace lld_nfs_dentry flag with opencache handling") Signed-off-by: James Simmons Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/49237 Reviewed-by: Andreas Dilger Reviewed-by: Etienne AUJAMES Reviewed-by: Oleg Drokin --- fs/lustre/llite/file.c | 26 ++++++++------------------ fs/lustre/llite/llite_internal.h | 13 +++++++------ fs/lustre/llite/llite_nfs.c | 15 +-------------- fs/lustre/llite/namei.c | 8 +++++++- fs/lustre/llite/super25.c | 2 ++ 5 files changed, 25 insertions(+), 39 deletions(-) diff --git a/fs/lustre/llite/file.c b/fs/lustre/llite/file.c index a9d247c..fb8ede2 100644 --- a/fs/lustre/llite/file.c +++ b/fs/lustre/llite/file.c @@ -916,7 +916,7 @@ int ll_file_open(struct inode *inode, struct file *file) if (!it->it_disposition) { struct dentry *dentry = file_dentry(file); struct ll_sb_info *sbi = ll_i2sbi(inode); - struct ll_dentry_data *ldd; + int open_threshold = sbi->ll_oc_thrsh_count; /* We cannot just request lock handle now, new ELC code * means that one of other OPEN locks for this file @@ -927,22 +927,20 @@ int ll_file_open(struct inode *inode, struct file *file) mutex_unlock(&lli->lli_och_mutex); /* * Normally called under two situations: - * 1. NFS export. + * 1. fhandle / NFS export. * 2. A race/condition on MDS resulting in no open * handle to be returned from LOOKUP|OPEN request, * for example if the target entry was a symlink. * - * In NFS path we know there's pathologic behavior - * so we always enable open lock caching when coming - * from there. It's detected by setting a flag in - * ll_iget_for_nfs. - * * After reaching number of opens of this inode * we always ask for an open lock on it to handle * bad userspace actors that open and close files * in a loop for absolutely no good reason */ - ldd = ll_d2d(dentry); + /* fhandle / NFS path. 
*/ + if (lli->lli_open_thrsh_count != UINT_MAX) + open_threshold = lli->lli_open_thrsh_count; + if (filename_is_volatile(dentry->d_name.name, dentry->d_name.len, NULL)) { @@ -951,17 +949,9 @@ int ll_file_open(struct inode *inode, struct file *file) * We do not want openlock for volatile * files under any circumstances */ - } else if (ldd && ldd->lld_nfs_dentry) { - /* NFS path. This also happens to catch - * open by fh files I guess - */ - it->it_flags |= MDS_OPEN_LOCK; - /* clear the flag for future lookups */ - ldd->lld_nfs_dentry = 0; - } else if (sbi->ll_oc_thrsh_count > 0) { + } else if (open_threshold > 0) { /* Take MDS_OPEN_LOCK with many opens */ - if (lli->lli_open_fd_count >= - sbi->ll_oc_thrsh_count) + if (lli->lli_open_fd_count >= open_threshold) it->it_flags |= MDS_OPEN_LOCK; /* If this is open after we just closed */ diff --git a/fs/lustre/llite/llite_internal.h b/fs/lustre/llite/llite_internal.h index 6bbc781..cdfc75e 100644 --- a/fs/lustre/llite/llite_internal.h +++ b/fs/lustre/llite/llite_internal.h @@ -72,7 +72,6 @@ struct ll_dentry_data { unsigned int lld_sa_generation; unsigned int lld_invalid:1; - unsigned int lld_nfs_dentry:1; struct rcu_head lld_rcu_head; }; @@ -145,11 +144,6 @@ struct ll_inode_info { u64 lli_open_fd_write_count; u64 lli_open_fd_exec_count; - /* Number of times this inode was opened */ - u64 lli_open_fd_count; - /* When last close was performed on this inode */ - ktime_t lli_close_fd_time; - /* Protects access to och pointers and their usage counters */ struct mutex lli_och_mutex; @@ -162,6 +156,13 @@ struct ll_inode_info { s64 lli_btime; spinlock_t lli_agl_lock; + /* inode specific open lock caching threshold */ + u32 lli_open_thrsh_count; + /* Number of times this inode was opened */ + u64 lli_open_fd_count; + /* When last close was performed on this inode */ + ktime_t lli_close_fd_time; + /* Try to make the d::member and f::member are aligned. Before using * these members, make clear whether it is directory or not. 
*/ diff --git a/fs/lustre/llite/llite_nfs.c b/fs/lustre/llite/llite_nfs.c index 3c4c9ef..232b2b3 100644 --- a/fs/lustre/llite/llite_nfs.c +++ b/fs/lustre/llite/llite_nfs.c @@ -114,7 +114,6 @@ struct inode *search_inode_for_lustre(struct super_block *sb, struct lu_fid *fid, struct lu_fid *parent) { struct inode *inode; - struct dentry *result; if (!fid_is_sane(fid)) return ERR_PTR(-ESTALE); @@ -131,19 +130,7 @@ struct inode *search_inode_for_lustre(struct super_block *sb, return ERR_PTR(-ESTALE); } - result = d_obtain_alias(inode); - if (IS_ERR(result)) - return result; - - /* - * Need to signal to the ll_intent_file_open that - * we came from NFS and so opencache needs to be - * enabled for this one - */ - spin_lock(&result->d_lock); - ll_d2d(result)->lld_nfs_dentry = 1; - spin_unlock(&result->d_lock); - return result; + return d_obtain_alias(inode); } /** diff --git a/fs/lustre/llite/namei.c b/fs/lustre/llite/namei.c index 9314a17..ada539e 100644 --- a/fs/lustre/llite/namei.c +++ b/fs/lustre/llite/namei.c @@ -1144,6 +1144,7 @@ static int ll_atomic_open(struct inode *dir, struct dentry *dentry, struct ll_sb_info *sbi = NULL; struct pcc_create_attach pca = { NULL, NULL }; bool encrypt = false; + int open_threshold; int rc = 0; CDEBUG(D_VFSTRACE, @@ -1224,7 +1225,12 @@ static int ll_atomic_open(struct inode *dir, struct dentry *dentry, * we only need to request open lock if it was requested * for every open */ - if (ll_i2sbi(dir)->ll_oc_thrsh_count == 1 && + if (ll_i2info(dir)->lli_open_thrsh_count != UINT_MAX) + open_threshold = ll_i2info(dir)->lli_open_thrsh_count; + else + open_threshold = ll_i2sbi(dir)->ll_oc_thrsh_count; + + if (open_threshold == 1 && exp_connect_flags2(ll_i2mdexp(dir)) & OBD_CONNECT2_ATOMIC_OPEN_LOCK) it->it_flags |= MDS_OPEN_LOCK; diff --git a/fs/lustre/llite/super25.c b/fs/lustre/llite/super25.c index 5349a25..50272a7 100644 --- a/fs/lustre/llite/super25.c +++ b/fs/lustre/llite/super25.c @@ -55,6 +55,8 @@ static struct inode 
*ll_alloc_inode(struct super_block *sb) return NULL; inode_init_once(&lli->lli_vfs_inode); + lli->lli_open_thrsh_count = UINT_MAX; + return &lli->lli_vfs_inode; } From patchwork Sun Apr 9 12:13:04 2023 X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 13205957 From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Sun, 9 Apr 2023 08:13:04 -0400 Message-Id: <1681042400-15491-25-git-send-email-jsimmons@infradead.org> In-Reply-To: <1681042400-15491-1-git-send-email-jsimmons@infradead.org> References: <1681042400-15491-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 24/40] lustre: llite: match lock in corresponding namespace
Cc: Lai Siyao , Lustre Development List From: Lai Siyao For a remote object, the LOOKUP lock is held on the parent MDT, so lmv_lock_match() iterates all MDT namespaces to match locks. This is needed in places where only the LOOKUP ibit is matched and the lock namespace is unknown. WC-bug-id: https://jira.whamcloud.com/browse/LU-15971 Lustre-commit: 64264dc424ca13d90 ("LU-15971 llite: match lock in corresponding namespace") Signed-off-by: Lai Siyao Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/47843 Reviewed-by: Andreas Dilger Reviewed-by: Qian Yingjin Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/llite/dir.c | 3 ++- fs/lustre/llite/file.c | 6 ++--- fs/lustre/llite/llite_internal.h | 4 ++-- fs/lustre/llite/namei.c | 7 +++--- fs/lustre/lmv/lmv_obd.c | 52 ++++++++++++++++++++++++---------------- 5 files changed, 42 insertions(+), 30 deletions(-) diff --git a/fs/lustre/llite/dir.c b/fs/lustre/llite/dir.c index 1298bd6..0422701 100644 --- a/fs/lustre/llite/dir.c +++ b/fs/lustre/llite/dir.c @@ -347,7 +347,8 @@ static int ll_readdir(struct file *filp, struct dir_context *ctx) struct inode *parent; parent = file_dentry(filp)->d_parent->d_inode; - if (ll_have_md_lock(parent, &ibits, LCK_MINMODE)) + if (ll_have_md_lock(ll_i2mdexp(parent), parent, &ibits, + LCK_MINMODE)) pfid = *ll_inode2fid(parent); } diff --git a/fs/lustre/llite/file.c b/fs/lustre/llite/file.c index fb8ede2..746c18f 100644 --- a/fs/lustre/llite/file.c +++ b/fs/lustre/llite/file.c @@ -5198,7 +5198,7 @@ int ll_migrate(struct inode *parent, struct file *file, struct lmv_user_md *lum, * * Returns: boolean, true iff all bits are found */ -int ll_have_md_lock(struct inode *inode, u64 *bits,
+int ll_have_md_lock(struct obd_export *exp, struct inode *inode, u64 *bits, enum ldlm_mode l_req_mode) { struct lustre_handle lockh; @@ -5222,8 +5222,8 @@ int ll_have_md_lock(struct inode *inode, u64 *bits, if (policy.l_inodebits.bits == 0) continue; - if (md_lock_match(ll_i2mdexp(inode), flags, fid, LDLM_IBITS, - &policy, mode, &lockh)) { + if (md_lock_match(exp, flags, fid, LDLM_IBITS, &policy, mode, + &lockh)) { struct ldlm_lock *lock; lock = ldlm_handle2lock(&lockh); diff --git a/fs/lustre/llite/llite_internal.h b/fs/lustre/llite/llite_internal.h index cdfc75e..b101a71 100644 --- a/fs/lustre/llite/llite_internal.h +++ b/fs/lustre/llite/llite_internal.h @@ -1175,8 +1175,8 @@ static inline void mapping_clear_exiting(struct address_space *mapping) /* llite/file.c */ extern const struct inode_operations ll_file_inode_operations; const struct file_operations *ll_select_file_operations(struct ll_sb_info *sbi); -int ll_have_md_lock(struct inode *inode, u64 *bits, - enum ldlm_mode l_req_mode); +int ll_have_md_lock(struct obd_export *exp, struct inode *inode, + u64 *bits, enum ldlm_mode l_req_mode); enum ldlm_mode ll_take_md_lock(struct inode *inode, u64 bits, struct lustre_handle *lockh, u64 flags, enum ldlm_mode mode); diff --git a/fs/lustre/llite/namei.c b/fs/lustre/llite/namei.c index ada539e..0c4c8e6 100644 --- a/fs/lustre/llite/namei.c +++ b/fs/lustre/llite/namei.c @@ -256,7 +256,8 @@ static void ll_lock_cancel_bits(struct ldlm_lock *lock, u64 to_cancel) * LCK_CR, LCK_CW, LCK_PR - bug 22891 */ if (bits & MDS_INODELOCK_OPEN) - ll_have_md_lock(inode, &bits, lock->l_req_mode); + ll_have_md_lock(lock->l_conn_export, inode, &bits, + lock->l_req_mode); if (bits & MDS_INODELOCK_OPEN) { fmode_t fmode; @@ -284,7 +285,7 @@ static void ll_lock_cancel_bits(struct ldlm_lock *lock, u64 to_cancel) if (bits & (MDS_INODELOCK_LOOKUP | MDS_INODELOCK_UPDATE | MDS_INODELOCK_LAYOUT | MDS_INODELOCK_PERM | MDS_INODELOCK_DOM)) - ll_have_md_lock(inode, &bits, LCK_MINMODE); + 
ll_have_md_lock(lock->l_conn_export, inode, &bits, LCK_MINMODE); if (bits & MDS_INODELOCK_DOM) { rc = ll_dom_lock_cancel(inode, lock); @@ -435,7 +436,7 @@ int ll_md_need_convert(struct ldlm_lock *lock) unlock_res_and_lock(lock); inode = ll_inode_from_resource_lock(lock); - ll_have_md_lock(inode, &bits, mode); + ll_have_md_lock(lock->l_conn_export, inode, &bits, mode); iput(inode); return !!(bits); } diff --git a/fs/lustre/lmv/lmv_obd.c b/fs/lustre/lmv/lmv_obd.c index 1b6e4aa..1b95d93 100644 --- a/fs/lustre/lmv/lmv_obd.c +++ b/fs/lustre/lmv/lmv_obd.c @@ -3558,39 +3558,49 @@ static enum ldlm_mode lmv_lock_match(struct obd_export *exp, u64 flags, { struct obd_device *obd = exp->exp_obd; struct lmv_obd *lmv = &obd->u.lmv; - enum ldlm_mode rc; struct lu_tgt_desc *tgt; - int i; + u64 bits = policy->l_inodebits.bits; + enum ldlm_mode rc = LCK_MINMODE; int index; + int i; CDEBUG(D_INODE, "Lock match for " DFID "\n", PFID(fid)); - /* - * With DNE every object can have two locks in different namespaces: + /* only one bit is set */ + LASSERT(bits && !(bits & (bits - 1))); + /* With DNE every object can have two locks in different namespaces: * lookup lock in space of MDT storing direntry and update/open lock in * space of MDT storing inode. Try the MDT that the FID maps to first, * since this can be easily found, and only try others if that fails. 
*/ - for (i = 0, index = lmv_fid2tgt_index(lmv, fid); - i < lmv->lmv_mdt_descs.ltd_tgts_size; - i++, index = (index + 1) % lmv->lmv_mdt_descs.ltd_tgts_size) { - if (index < 0) { - CDEBUG(D_HA, "%s: " DFID " is inaccessible: rc = %d\n", - obd->obd_name, PFID(fid), index); - index = 0; + if (bits == MDS_INODELOCK_LOOKUP) { + for (i = 0, index = lmv_fid2tgt_index(lmv, fid); + i < lmv->lmv_mdt_descs.ltd_tgts_size; i++, + index = (index + 1) % lmv->lmv_mdt_descs.ltd_tgts_size) { + if (index < 0) { + CDEBUG(D_HA, + "%s: " DFID " is inaccessible: rc = %d\n", + obd->obd_name, PFID(fid), index); + index = 0; + } + tgt = lmv_tgt(lmv, index); + if (!tgt || !tgt->ltd_exp || !tgt->ltd_active) + continue; + rc = md_lock_match(tgt->ltd_exp, flags, fid, type, + policy, mode, lockh); + if (rc) + break; } - - tgt = lmv_tgt(lmv, index); - if (!tgt || !tgt->ltd_exp || !tgt->ltd_active) - continue; - - rc = md_lock_match(tgt->ltd_exp, flags, fid, type, policy, mode, - lockh); - if (rc) - return rc; + } else { + tgt = lmv_fid2tgt(lmv, fid); + if (!IS_ERR(tgt) && tgt->ltd_exp && tgt->ltd_active) + rc = md_lock_match(tgt->ltd_exp, flags, fid, type, + policy, mode, lockh); } - return 0; + CDEBUG(D_INODE, "Lock match for "DFID": %d\n", PFID(fid), rc); + + return rc; } static int lmv_get_lustre_md(struct obd_export *exp, From patchwork Sun Apr 9 12:13:05 2023 X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 13205959
From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Sun, 9 Apr 2023 08:13:05 -0400 Message-Id: <1681042400-15491-26-git-send-email-jsimmons@infradead.org> In-Reply-To: <1681042400-15491-1-git-send-email-jsimmons@infradead.org> References: <1681042400-15491-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 25/40] lnet: libcfs: remove unused hash code Cc: Lustre Development List From: Timothy Day Two unused functions which hash a key and then apply a mask are removed.
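For reference, the two helpers being removed combined a golden-ratio multiplicative hash with a power-of-two mask. The sketch below re-creates that pattern in plain userspace C; the constants mirror the historical Linux <linux/hash.h> primes that libcfs used, but treat the exact values as illustrative:

```c
#include <stdint.h>

/* Multiplicative hash by a golden-ratio prime, masked down to a
 * power-of-two table size (mask == 2^n - 1). Illustrative re-creation
 * of the removed cfs_hash_u32_hash()/cfs_hash_u64_hash() pattern;
 * the constants follow the historical Linux <linux/hash.h> values. */
#define GOLDEN_RATIO_PRIME_32 0x9e370001U
#define GOLDEN_RATIO_PRIME_64 0x9e37fffffffc0001ULL

static unsigned int hash_u32_mask(uint32_t key, unsigned int mask)
{
	/* unsigned multiply wraps mod 2^32, which is intended here */
	return (key * GOLDEN_RATIO_PRIME_32) & mask;
}

static unsigned int hash_u64_mask(uint64_t key, unsigned int mask)
{
	/* keep only the low 32 bits of the product before masking */
	return (unsigned int)(key * GOLDEN_RATIO_PRIME_64) & mask;
}
```

Because the mask is applied directly, these helpers only make sense for power-of-two bucket counts, which is one reason a single fixed hash implementation can replace them.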
WC-bug-id: https://jira.whamcloud.com/browse/LU-16518 Lustre-commit: 239e826876e5e2040 ("LU-16518 misc: use fixed hash code") Signed-off-by: Timothy Day Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/49916 Reviewed-by: jsimmons Reviewed-by: Oleg Drokin Reviewed-by: Andreas Dilger Signed-off-by: James Simmons --- include/linux/libcfs/libcfs_hash.h | 18 ------------------ 1 file changed, 18 deletions(-) diff --git a/include/linux/libcfs/libcfs_hash.h b/include/linux/libcfs/libcfs_hash.h index d3b4875..d60e002 100644 --- a/include/linux/libcfs/libcfs_hash.h +++ b/include/linux/libcfs/libcfs_hash.h @@ -829,24 +829,6 @@ static inline int __cfs_hash_theta(struct cfs_hash *hs) return (hash & mask); } -/* - * Generic u32 hash algorithm. - */ -static inline unsigned -cfs_hash_u32_hash(const u32 key, unsigned int mask) -{ - return ((key * CFS_GOLDEN_RATIO_PRIME_32) & mask); -} - -/* - * Generic u64 hash algorithm. - */ -static inline unsigned -cfs_hash_u64_hash(const u64 key, unsigned int mask) -{ - return ((unsigned int)(key * CFS_GOLDEN_RATIO_PRIME_64) & mask); -} - /** iterate over all buckets in @bds (array of struct cfs_hash_bd) */ #define cfs_hash_for_each_bd(bds, n, i) \ for (i = 0; i < n && (bds)[i].bd_bucket != NULL; i++) From patchwork Sun Apr 9 12:13:06 2023 X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 13205962
From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Sun, 9 Apr 2023 08:13:06 -0400 Message-Id: <1681042400-15491-27-git-send-email-jsimmons@infradead.org> In-Reply-To: <1681042400-15491-1-git-send-email-jsimmons@infradead.org> References: <1681042400-15491-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 26/40] lustre: client: -o network needs add_conn processing Cc: Mikhail Pershin , Lustre Development List From: Mikhail Pershin The -o network mount option restricts a client import to use only the selected network. It processes connection UUID/NIDs while handling the 'setup' config command, but skips any 'add_conn' command whose UUID does not mention that network. Meanwhile, a connection UUID is just a name and may have many NIDs configured, including NIDs on the restricted network, which are skipped as well. Therefore the client import configuration misses failover NIDs on the restricted network.
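The per-network filtering described above can be modeled in a few lines. The sketch below is hypothetical (the types and the NET_ANY_SIM sentinel are simplified stand-ins for lnet_nid_t and LNET_NET_ANY), using the classic 64-bit NID layout in which the upper 32 bits name the network:

```c
#include <stddef.h>
#include <stdint.h>

typedef uint64_t nid_sim_t;          /* stand-in for a 64-bit LNET NID */
#define NET_ANY_SIM 0xffffffffU      /* stand-in for LNET_NET_ANY */

/* In the classic NID layout, the upper 32 bits identify the network
 * (what LNET_NIDNET() extracts in the real code). */
static uint32_t nid_net_sim(nid_sim_t nid)
{
	return (uint32_t)(nid >> 32);
}

/* Keep only the NIDs on the restricted network; with no restriction
 * (NET_ANY_SIM) every NID passes. Returns the number of NIDs kept. */
static size_t filter_nids(const nid_sim_t *in, size_t n,
			  uint32_t refnet, nid_sim_t *out)
{
	size_t kept = 0;
	size_t i;

	for (i = 0; i < n; i++)
		if (refnet == NET_ANY_SIM || nid_net_sim(in[i]) == refnet)
			out[kept++] = in[i];
	return kept;
}
```

Saving refnet on the import once, as the patch does, means this same filter naturally applies to both 'setup' and every later 'add_conn'.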
The patch makes the import save the restricted network information after 'setup' command processing, so it is applied to every client_import_add_conn() call. The 'add_conn' command is always processed now, and its NIDs are filtered in the same way as for 'setup'. WC-bug-id: https://jira.whamcloud.com/browse/LU-16557 Lustre-commit: c508c9426838f1625 ("LU-16557 client: -o network needs add_conn processing") Signed-off-by: Mikhail Pershin Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/49986 Reviewed-by: Sebastien Buisson Reviewed-by: Andreas Dilger Reviewed-by: Cyril Bordage Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/include/lustre_import.h | 1 + fs/lustre/ldlm/ldlm_lib.c | 20 +++++++++----------- fs/lustre/obdclass/obd_config.c | 17 ----------------- 3 files changed, 10 insertions(+), 28 deletions(-) diff --git a/fs/lustre/include/lustre_import.h b/fs/lustre/include/lustre_import.h index 3ae05b5..ac46aae 100644 --- a/fs/lustre/include/lustre_import.h +++ b/fs/lustre/include/lustre_import.h @@ -340,6 +340,7 @@ struct obd_import { struct imp_at imp_at; /* adaptive timeout data */ time64_t imp_last_reply_time; /* for health check */ + u32 imp_conn_restricted_net; }; /* import.c : adaptive timeout handling.
diff --git a/fs/lustre/ldlm/ldlm_lib.c b/fs/lustre/ldlm/ldlm_lib.c index ddedaad..0b8389e 100644 --- a/fs/lustre/ldlm/ldlm_lib.c +++ b/fs/lustre/ldlm/ldlm_lib.c @@ -56,7 +56,7 @@ static int import_set_conn(struct obd_import *imp, struct obd_uuid *uuid, { struct ptlrpc_connection *ptlrpc_conn; struct obd_import_conn *imp_conn = NULL, *item; - u32 refnet = LNET_NET_ANY; + u32 refnet = imp->imp_conn_restricted_net; int rc = 0; if (!create && !priority) { @@ -64,10 +64,11 @@ static int import_set_conn(struct obd_import *imp, struct obd_uuid *uuid, return -EINVAL; } - if (imp->imp_connection && - imp->imp_connection->c_remote_uuid.uuid[0] == 0) - /* refnet is used to restrict network connections */ - refnet = LNET_NID_NET(&imp->imp_connection->c_self); + /* refnet is used to restrict network connections */ + if (refnet != LNET_NET_ANY) + CDEBUG(D_HA, "imp %s: restrict %s to %s net\n", + imp->imp_obd->obd_name, uuid->uuid, + libcfs_net2str(refnet)); ptlrpc_conn = ptlrpc_uuid_to_connection(uuid, refnet); if (!ptlrpc_conn) { @@ -296,10 +297,6 @@ int client_obd_setup(struct obd_device *obd, struct lustre_cfg *lcfg) int rq_portal, rp_portal, connect_op; const char *name = obd->obd_type->typ_name; enum ldlm_ns_type ns_type = LDLM_NS_TYPE_UNKNOWN; - struct ptlrpc_connection fake_conn = { - .c_self = {}, - .c_remote_uuid.uuid[0] = 0 - }; int rc; /* @@ -494,8 +491,9 @@ int client_obd_setup(struct obd_device *obd, struct lustre_cfg *lcfg) rc); goto err_import; } - lnet_nid4_to_nid(LNET_MKNID(refnet, 0), &fake_conn.c_self); - imp->imp_connection = &fake_conn; + imp->imp_conn_restricted_net = refnet; + } else { + imp->imp_conn_restricted_net = LNET_NET_ANY; } rc = client_import_add_conn(imp, &server_uuid, 1); diff --git a/fs/lustre/obdclass/obd_config.c b/fs/lustre/obdclass/obd_config.c index 953f544..f2173df 100644 --- a/fs/lustre/obdclass/obd_config.c +++ b/fs/lustre/obdclass/obd_config.c @@ -1331,23 +1331,6 @@ int class_config_llog_handler(const struct lu_env *env, } } - /* Skip 
add_conn command if uuid is not on restricted net */ - if (clli && clli->cfg_sb && s2lsi(clli->cfg_sb)) { - struct lustre_sb_info *lsi = s2lsi(clli->cfg_sb); - char *uuid_str = lustre_cfg_string(lcfg, 1); - - if (lcfg->lcfg_command == LCFG_ADD_CONN && - lsi->lsi_lmd->lmd_nidnet && - LNET_NIDNET(libcfs_str2nid(uuid_str)) != - libcfs_str2net(lsi->lsi_lmd->lmd_nidnet)) { - CDEBUG(D_CONFIG, "skipping add_conn for %s\n", - uuid_str); - rc = 0; - /* No processing! */ - break; - } - } - lcfg_len = lustre_cfg_len(bufs.lcfg_bufcount, bufs.lcfg_buflen); lcfg_new = kzalloc(lcfg_len, GFP_NOFS); if (!lcfg_new) { From patchwork Sun Apr 9 12:13:07 2023 X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 13205973 From:
James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Sun, 9 Apr 2023 08:13:07 -0400 Message-Id: <1681042400-15491-28-git-send-email-jsimmons@infradead.org> In-Reply-To: <1681042400-15491-1-git-send-email-jsimmons@infradead.org> References: <1681042400-15491-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 27/40] lnet: Lock primary NID logic Cc: Amir Shehata , Lustre Development List From: Amir Shehata If a peer is created by Lustre, make sure to lock that peer's primary NID. The peer can still be discovered in the background; there is no need to block until discovery is complete, as Lustre can continue with the primary NID it provided. Discovery will populate the peer with the other interfaces the peer has, but will not change the peer's primary NID. It can also delete the peer's NIDs which Lustre told it about (though not the primary NID). If a peer has already been manually discovered via the 'lnetctl discover' command, then delete that manually discovered peer and recreate it with the NID information Lustre provided.
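The rule above reduces to a small invariant: once a peer's primary NID is locked (because Lustre created the peer), discovery may add or drop other NIDs but must never replace the primary. A hypothetical userspace model, with illustrative names rather than the real LNet API:

```c
#include <stdbool.h>
#include <stdint.h>

/* Minimal model of a peer whose primary NID may be locked down. */
struct peer_sim {
	uint64_t primary_nid;
	bool primary_locked;
};

/* Attempt to change the primary NID, e.g. on behalf of discovery.
 * Returns true if the primary now equals "nid"; a locked primary can
 * never be replaced by a different NID, so such attempts fail. */
static bool peer_set_primary(struct peer_sim *p, uint64_t nid)
{
	if (p->primary_locked && p->primary_nid != nid)
		return false;	/* keep the Lustre-provided primary */
	p->primary_nid = nid;
	return true;
}
```

In the real patch this invariant is carried by the LNET_PEER_LOCK_PRIMARY state flag, which both LNetPrimaryNID() and the peer add/delete paths consult.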
WC-bug-id: https://jira.whamcloud.com/browse/LU-14668 Lustre-commit: aacb16191a72bc6db ("LU-14668 lnet: Lock primary NID logic") Signed-off-by: Amir Shehata Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/50106 Reviewed-by: Oleg Drokin Reviewed-by: Cyril Bordage Reviewed-by: Frank Sehr Signed-off-by: James Simmons --- net/lnet/lnet/peer.c | 106 +++++++++++++++++++++++++++++++++++++++++---------- 1 file changed, 86 insertions(+), 20 deletions(-) diff --git a/net/lnet/lnet/peer.c b/net/lnet/lnet/peer.c index da1f8d4..0539cb4 100644 --- a/net/lnet/lnet/peer.c +++ b/net/lnet/lnet/peer.c @@ -534,6 +534,15 @@ static void lnet_peer_cancel_discovery(struct lnet_peer *lp) } } + /* If we're asked to lock down the primary NID we shouldn't be + * deleting it + */ + if (lp->lp_state & LNET_PEER_LOCK_PRIMARY && + nid_same(&primary_nid, nid)) { + rc = -EPERM; + goto out; + } + lpni = lnet_peer_ni_find_locked(nid); if (!lpni) { rc = -ENOENT; @@ -1358,6 +1367,19 @@ struct lnet_peer_ni * if (LNET_NID_IS_ANY(&pnid)) { lnet_nid4_to_nid(nids[i], &pnid); rc = lnet_add_peer_ni(&pnid, &LNET_ANY_NID, mr, true); + if (rc == -EALREADY) { + struct lnet_peer *lp; + + CDEBUG(D_NET, "A peer exists for NID %s\n", + libcfs_nidstr(&pnid)); + rc = 0; + /* Adds a refcount */ + lp = lnet_find_peer(&pnid); + LASSERT(lp); + pnid = lp->lp_primary_nid; + /* Drop refcount from lookup */ + lnet_peer_decref_locked(lp); + } } else if (lnet_peer_discovery_disabled) { lnet_nid4_to_nid(nids[i], &nid); rc = lnet_add_peer_ni(&nid, &LNET_ANY_NID, mr, true); @@ -1405,13 +1427,20 @@ void LNetPrimaryNID(struct lnet_nid *nid) * down then this discovery can introduce long delays into the mount * process, so skip it if it isn't necessary. 
*/ - while (!lnet_peer_discovery_disabled && !lnet_peer_is_uptodate(lp)) { - spin_lock(&lp->lp_lock); + spin_lock(&lp->lp_lock); + if (!lnet_peer_discovery_disabled && + (!(lp->lp_state & LNET_PEER_LOCK_PRIMARY) || + !lnet_peer_is_uptodate_locked(lp))) { /* force a full discovery cycle */ - lp->lp_state |= LNET_PEER_FORCE_PING | LNET_PEER_FORCE_PUSH; + lp->lp_state |= LNET_PEER_FORCE_PING | LNET_PEER_FORCE_PUSH | + LNET_PEER_LOCK_PRIMARY; spin_unlock(&lp->lp_lock); - rc = lnet_discover_peer_locked(lpni, cpt, true); + /* start discovery in the background. Messages to that + * peer will not go through until the discovery is + * complete + */ + rc = lnet_discover_peer_locked(lpni, cpt, false); if (rc) goto out_decref; /* The lpni (or lp) for this NID may have changed and our ref is @@ -1425,14 +1454,8 @@ void LNetPrimaryNID(struct lnet_nid *nid) goto out_unlock; } lp = lpni->lpni_peer_net->lpn_peer; - - /* If we find that the peer has discovery disabled then we will - * not modify whatever primary NID is currently set for this - * peer. Thus, we can break out of this loop even if the peer - * is not fully up to date. - */ - if (lnet_is_discovery_disabled(lp)) - break; + } else { + spin_unlock(&lp->lp_lock); } *nid = lp->lp_primary_nid; out_decref: @@ -1538,6 +1561,8 @@ struct lnet_peer_net * lnet_peer_clr_non_mr_pref_nids(lp); } } + if (flags & LNET_PEER_LOCK_PRIMARY) + lp->lp_state |= LNET_PEER_LOCK_PRIMARY; spin_unlock(&lp->lp_lock); lp->lp_nnis++; @@ -1599,13 +1624,28 @@ struct lnet_peer_net * else if ((lp->lp_state ^ flags) & LNET_PEER_MULTI_RAIL) rc = -EPERM; goto out; - } else if (!(flags & LNET_PEER_CONFIGURED)) { + } else if (lp->lp_state & LNET_PEER_LOCK_PRIMARY) { if (nid_same(&lp->lp_primary_nid, nid)) { rc = -EEXIST; goto out; } + /* we're trying to recreate an existing peer which + * has already been created and its primary + * locked. This is likely due to two servers + * existing on the same node. 
So we'll just refer + * to that node with the primary NID which was + * first added by Lustre + */ + rc = -EALREADY; + goto out; } - /* Delete and recreate as a configured peer. */ + /* Delete and recreate the peer. + * We can get here: + * 1. If the peer is being recreated as a configured NID + * 2. if there already exists a peer which + * was discovered manually, but is recreated via Lustre + * with PRIMARY_lock + */ rc = lnet_peer_del(lp); if (rc) goto out; @@ -1695,19 +1735,36 @@ struct lnet_peer_net * } /* If this is the primary NID, destroy the peer. */ if (lnet_peer_ni_is_primary(lpni)) { - struct lnet_peer *rtr_lp = + struct lnet_peer *lp2 = lpni->lpni_peer_net->lpn_peer; - int rtr_refcount = rtr_lp->lp_rtr_refcount; - + int rtr_refcount = lp2->lp_rtr_refcount; + + /* If the new peer that this NID belongs to is + * a primary NID for another peer which we're + * suppose to preserve the Primary for then we + * don't want to mess with it. But the + * configuration is wrong at this point, so we + * should flag both of these peers as in a bad + * state + */ + if (lp2->lp_state & LNET_PEER_LOCK_PRIMARY) { + spin_lock(&lp->lp_lock); + lp->lp_state |= LNET_PEER_BAD_CONFIG; + spin_unlock(&lp->lp_lock); + spin_lock(&lp2->lp_lock); + lp2->lp_state |= LNET_PEER_BAD_CONFIG; + spin_unlock(&lp2->lp_lock); + goto out_free_lpni; + } /* if we're trying to delete a router it means * we're moving this peer NI to a new peer so must * transfer router properties to the new peer */ if (rtr_refcount > 0) { flags |= LNET_PEER_RTR_NI_FORCE_DEL; - lnet_rtr_transfer_to_peer(rtr_lp, lp); + lnet_rtr_transfer_to_peer(lp2, lp); } - lnet_peer_del(lpni->lpni_peer_net->lpn_peer); + lnet_peer_del(lp2); lnet_peer_ni_decref_locked(lpni); lpni = lnet_peer_ni_alloc(nid); if (!lpni) { @@ -1765,7 +1822,8 @@ struct lnet_peer_net * if (nid_same(&lp->lp_primary_nid, nid)) goto out; - lp->lp_primary_nid = *nid; + if (!(lp->lp_state & LNET_PEER_LOCK_PRIMARY)) + lp->lp_primary_nid = *nid; rc = 
lnet_peer_add_nid(lp, nid, flags); if (rc) { @@ -1773,6 +1831,14 @@ struct lnet_peer_net * goto out; } out: + /* if this is a configured peer or the primary for that peer has + * been locked, then we don't want to flag this scenario as + * a failure + */ + if (lp->lp_state & LNET_PEER_CONFIGURED || + lp->lp_state & LNET_PEER_LOCK_PRIMARY) + return 0; + CDEBUG(D_NET, "peer %s NID %s: %d\n", libcfs_nidstr(&old), libcfs_nidstr(nid), rc); From patchwork Sun Apr 9 12:13:08 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 13205963 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from pdx1-mailman-customer002.dreamhost.com (listserver-buz.dreamhost.com [69.163.136.29]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id F0D38C77B70 for ; Sun, 9 Apr 2023 12:38:37 +0000 (UTC) Received: from pdx1-mailman-customer002.dreamhost.com (localhost [127.0.0.1]) by pdx1-mailman-customer002.dreamhost.com (Postfix) with ESMTP id 4PvWQv6r7Jz22TX; Sun, 9 Apr 2023 05:20:39 -0700 (PDT) Received: from smtp4.ccs.ornl.gov (smtp4.ccs.ornl.gov [160.91.203.40]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by pdx1-mailman-customer002.dreamhost.com (Postfix) with ESMTPS id 4PvWNJ4GZRz22PM for ; Sun, 9 Apr 2023 05:18:24 -0700 (PDT) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp4.ccs.ornl.gov (Postfix) with ESMTP id 43479100848A; Sun, 9 Apr 2023 08:13:28 -0400 (EDT) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 420632B3; Sun, 9 Apr 2023 08:13:28 -0400 (EDT) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Sun, 9 Apr 2023 08:13:08 -0400 Message-Id: <1681042400-15491-29-git-send-email-jsimmons@infradead.org> 
X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1681042400-15491-1-git-send-email-jsimmons@infradead.org> References: <1681042400-15491-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 28/40] lnet: Peers added via kernel API should be permanent X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.39 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Chris Horn , Amir Shehata , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Chris Horn The LNetAddPeer() API allows Lustre to predefine the Peer for LNet. Originally these peers would be temporary and potentially re-created via discovery. Instead, let's make these peers permanent. This allows Lustre to dictate the primary NID of the peer. LNet makes sure this primary NID is not changed afterwards. WC-bug-id: https://jira.whamcloud.com/browse/LU-14668 Lustre-commit: 41733dadd8ad0e87e ("LU-14668 lnet: Peers added via kernel API should be permanent") Signed-off-by: Amir Shehata Signed-off-by: Chris Horn Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/43788 Reviewed-by: Oleg Drokin Reviewed-by: Serguei Smirnov Reviewed-by: Cyril Bordage Reviewed-by: Frank Sehr Signed-off-by: James Simmons --- include/linux/lnet/lib-lnet.h | 3 +-- net/lnet/lnet/api-ni.c | 2 +- net/lnet/lnet/peer.c | 34 +++++++++++++++++----------------- 3 files changed, 19 insertions(+), 20 deletions(-) diff --git a/include/linux/lnet/lib-lnet.h b/include/linux/lnet/lib-lnet.h index d03dcf8..a8aa924 100644 --- a/include/linux/lnet/lib-lnet.h +++ b/include/linux/lnet/lib-lnet.h @@ -953,8 +953,7 @@ bool lnet_peer_is_pref_rtr_locked(struct lnet_peer_ni *lpni, int lnet_peer_add_pref_rtr(struct lnet_peer_ni *lpni, struct lnet_nid *nid); int lnet_peer_ni_set_non_mr_pref_nid(struct lnet_peer_ni *lpni, struct lnet_nid *nid); -int 
lnet_add_peer_ni(struct lnet_nid *key_nid, struct lnet_nid *nid, bool mr, - bool temp); +int lnet_user_add_peer_ni(struct lnet_nid *key_nid, struct lnet_nid *nid, bool mr); int lnet_del_peer_ni(struct lnet_nid *key_nid, struct lnet_nid *nid); int lnet_get_peer_info(struct lnet_ioctl_peer_cfg *cfg, void __user *bulk); int lnet_get_peer_ni_info(u32 peer_index, u64 *nid, diff --git a/net/lnet/lnet/api-ni.c b/net/lnet/lnet/api-ni.c index a4fb95f..20093a9 100644 --- a/net/lnet/lnet/api-ni.c +++ b/net/lnet/lnet/api-ni.c @@ -4239,7 +4239,7 @@ u32 lnet_get_dlc_seq_locked(void) mutex_lock(&the_lnet.ln_api_mutex); lnet_nid4_to_nid(cfg->prcfg_prim_nid, &prim_nid); lnet_nid4_to_nid(cfg->prcfg_cfg_nid, &nid); - rc = lnet_add_peer_ni(&prim_nid, &nid, cfg->prcfg_mr, false); + rc = lnet_user_add_peer_ni(&prim_nid, &nid, cfg->prcfg_mr); mutex_unlock(&the_lnet.ln_api_mutex); return rc; } diff --git a/net/lnet/lnet/peer.c b/net/lnet/lnet/peer.c index 0539cb4..fa2ca54 100644 --- a/net/lnet/lnet/peer.c +++ b/net/lnet/lnet/peer.c @@ -42,6 +42,8 @@ #define LNET_REDISCOVER_PEER (1) static int lnet_peer_queue_for_discovery(struct lnet_peer *lp); +static int lnet_add_peer_ni(struct lnet_nid *prim_nid, struct lnet_nid *nid, bool mr, + unsigned int flags); static void lnet_peer_remove_from_remote_list(struct lnet_peer_ni *lpni) @@ -1366,7 +1368,8 @@ struct lnet_peer_ni * lnet_nid4_to_nid(nids[i], &nid); if (LNET_NID_IS_ANY(&pnid)) { lnet_nid4_to_nid(nids[i], &pnid); - rc = lnet_add_peer_ni(&pnid, &LNET_ANY_NID, mr, true); + rc = lnet_add_peer_ni(&pnid, &LNET_ANY_NID, mr, + LNET_PEER_LOCK_PRIMARY); if (rc == -EALREADY) { struct lnet_peer *lp; @@ -1382,10 +1385,12 @@ struct lnet_peer_ni * } } else if (lnet_peer_discovery_disabled) { lnet_nid4_to_nid(nids[i], &nid); - rc = lnet_add_peer_ni(&nid, &LNET_ANY_NID, mr, true); + rc = lnet_add_peer_ni(&nid, &LNET_ANY_NID, mr, + LNET_PEER_LOCK_PRIMARY); } else { lnet_nid4_to_nid(nids[i], &nid); - rc = lnet_add_peer_ni(&pnid, &nid, mr, true); + rc = 
lnet_add_peer_ni(&pnid, &nid, mr, + LNET_PEER_LOCK_PRIMARY); } if (rc && rc != -EEXIST) @@ -1918,22 +1923,18 @@ struct lnet_peer_net * * The caller must hold ln_api_mutex. This prevents the peer from * being created/modified/deleted by a different thread. */ -int +static int lnet_add_peer_ni(struct lnet_nid *prim_nid, struct lnet_nid *nid, bool mr, - bool temp) + unsigned int flags) __must_hold(&the_lnet.ln_api_mutex) { struct lnet_peer *lp = NULL; struct lnet_peer_ni *lpni; - unsigned int flags = 0; /* The prim_nid must always be specified */ if (LNET_NID_IS_ANY(prim_nid)) return -EINVAL; - if (!temp) - flags = LNET_PEER_CONFIGURED; - if (mr) flags |= LNET_PEER_MULTI_RAIL; @@ -1951,13 +1952,6 @@ struct lnet_peer_net * lnet_peer_ni_decref_locked(lpni); lp = lpni->lpni_peer_net->lpn_peer; - /* Peer must have been configured. */ - if (!temp && !(lp->lp_state & LNET_PEER_CONFIGURED)) { - CDEBUG(D_NET, "peer %s was not configured\n", - libcfs_nidstr(prim_nid)); - return -ENOENT; - } - /* Primary NID must match */ if (!nid_same(&lp->lp_primary_nid, prim_nid)) { CDEBUG(D_NET, "prim_nid %s is not primary for peer %s\n", @@ -1973,7 +1967,8 @@ struct lnet_peer_net * return -EPERM; } - if (temp && lnet_peer_is_uptodate(lp)) { + if ((flags & LNET_PEER_LOCK_PRIMARY) && + (lnet_peer_is_uptodate(lp) && (lp->lp_state & LNET_PEER_LOCK_PRIMARY))) { CDEBUG(D_NET, "Don't add temporary peer NI for uptodate peer %s\n", libcfs_nidstr(&lp->lp_primary_nid)); @@ -1983,6 +1978,11 @@ struct lnet_peer_net * return lnet_peer_add_nid(lp, nid, flags); } +int lnet_user_add_peer_ni(struct lnet_nid *prim_nid, struct lnet_nid *nid, bool mr) +{ + return lnet_add_peer_ni(prim_nid, nid, mr, LNET_PEER_CONFIGURED); +} + /* * Implementation of IOC_LIBCFS_DEL_PEER_NI. 
* From patchwork Sun Apr 9 12:13:09 2023 X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 13205965 From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Sun, 9 Apr 2023 08:13:09 -0400 Message-Id: <1681042400-15491-30-git-send-email-jsimmons@infradead.org> In-Reply-To: <1681042400-15491-1-git-send-email-jsimmons@infradead.org> References: <1681042400-15491-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 29/40] lnet: don't delete peer created by Lustre List-Id: "For discussing Lustre software development."
List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Amir Shehata , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Amir Shehata Peers created by Lustre have their primary NIDs locked. If that peer is deleted, it'll confuse lustre. So when manually deleting a peer using: lnetctl peer del --prim_nid ... We must continue to preserve the primary NID. Therefore we delete all the constituent NIDs, but keep the primary NID. We then flag the peer for rediscovery. WC-bug-id: https://jira.whamcloud.com/browse/LU-14668 Lustre-commit: 7cc5b4329fc2eecbf ("LU-14668 lnet: don't delete peer created by Lustre") Signed-off-by: Amir Shehata Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/43565 Reviewed-by: Oleg Drokin Reviewed-by: Serguei Smirnov Reviewed-by: Cyril Bordage Signed-off-by: James Simmons --- net/lnet/lnet/peer.c | 45 +++++++++++++++++++++++++++++++++++++++++++-- 1 file changed, 43 insertions(+), 2 deletions(-) diff --git a/net/lnet/lnet/peer.c b/net/lnet/lnet/peer.c index fa2ca54..0a5e73a 100644 --- a/net/lnet/lnet/peer.c +++ b/net/lnet/lnet/peer.c @@ -1983,6 +1983,40 @@ int lnet_user_add_peer_ni(struct lnet_nid *prim_nid, struct lnet_nid *nid, bool return lnet_add_peer_ni(prim_nid, nid, mr, LNET_PEER_CONFIGURED); } +static int +lnet_reset_peer(struct lnet_peer *lp) +{ + struct lnet_peer_net *lpn, *lpntmp; + struct lnet_peer_ni *lpni, *lpnitmp; + unsigned int flags; + int rc; + + lnet_peer_cancel_discovery(lp); + + flags = LNET_PEER_CONFIGURED; + if (lp->lp_state & LNET_PEER_MULTI_RAIL) + flags |= LNET_PEER_MULTI_RAIL; + + list_for_each_entry_safe(lpn, lpntmp, &lp->lp_peer_nets, lpn_peer_nets) { + list_for_each_entry_safe(lpni, lpnitmp, &lpn->lpn_peer_nis, + lpni_peer_nis) { + if (nid_same(&lpni->lpni_nid, &lp->lp_primary_nid)) + continue; + + rc = lnet_peer_del_nid(lp, &lpni->lpni_nid, flags); + if (rc) { + CERROR("Failed to delete %s from peer %s\n", + 
libcfs_nidstr(&lpni->lpni_nid), + libcfs_nidstr(&lp->lp_primary_nid)); + } + } + } + + /* mark it for discovery the next time we use it */ + lp->lp_state &= ~LNET_PEER_NIDS_UPTODATE; + return 0; +} + /* * Implementation of IOC_LIBCFS_DEL_PEER_NI. * @@ -2026,8 +2060,15 @@ int lnet_user_add_peer_ni(struct lnet_nid *prim_nid, struct lnet_nid *nid, bool } lnet_net_unlock(LNET_LOCK_EX); - if (LNET_NID_IS_ANY(nid) || nid_same(nid, &lp->lp_primary_nid)) - return lnet_peer_del(lp); + if (LNET_NID_IS_ANY(nid) || nid_same(nid, &lp->lp_primary_nid)) { + if (lp->lp_state & LNET_PEER_LOCK_PRIMARY) { + CERROR("peer %s created by Lustre. Must preserve primary NID, but will remove other NIDs\n", + libcfs_nidstr(&lp->lp_primary_nid)); + return lnet_reset_peer(lp); + } else { + return lnet_peer_del(lp); + } + } flags = LNET_PEER_CONFIGURED; if (lp->lp_state & LNET_PEER_MULTI_RAIL) From patchwork Sun Apr 9 12:13:10 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 13205971 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from pdx1-mailman-customer002.dreamhost.com (listserver-buz.dreamhost.com [69.163.136.29]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 11A39C77B61 for ; Sun, 9 Apr 2023 12:41:33 +0000 (UTC) Received: from pdx1-mailman-customer002.dreamhost.com (localhost [127.0.0.1]) by pdx1-mailman-customer002.dreamhost.com (Postfix) with ESMTP id 4PvWTP3mmHz20rQ; Sun, 9 Apr 2023 05:22:49 -0700 (PDT) Received: from smtp4.ccs.ornl.gov (smtp4.ccs.ornl.gov [160.91.203.40]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by pdx1-mailman-customer002.dreamhost.com (Postfix) with ESMTPS id 4PvWNh2nSvz22Q5 for ; Sun, 9 Apr 2023 05:18:44 -0700 (PDT) Received: from 
star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp4.ccs.ornl.gov (Postfix) with ESMTP id 4FB84100848C; Sun, 9 Apr 2023 08:13:28 -0400 (EDT) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 4C84E2B2; Sun, 9 Apr 2023 08:13:28 -0400 (EDT) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Sun, 9 Apr 2023 08:13:10 -0400 Message-Id: <1681042400-15491-31-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1681042400-15491-1-git-send-email-jsimmons@infradead.org> References: <1681042400-15491-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 30/40] lnet: memory leak in copy_ioc_udsp_descr X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.39 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Chris Horn , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Chris Horn copy_ioc_udsp_descr() doesn't correctly handle the case where a net number was not specified. In this case, there isn't any net number range that needs to be copied into the udsp descriptor. 
WC-bug-id: https://jira.whamcloud.com/browse/LU-16575 Lustre-commit: f8e129198b002589d ("LU-16575 lnet: memory leak in copy_ioc_udsp_descr") Signed-off-by: Chris Horn Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/50081 Reviewed-by: Serguei Smirnov Reviewed-by: Frank Sehr Reviewed-by: Cyril Bordage Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- net/lnet/lnet/udsp.c | 15 +++++++++++++-- 1 file changed, 13 insertions(+), 2 deletions(-) diff --git a/net/lnet/lnet/udsp.c b/net/lnet/lnet/udsp.c index 2594df1..deaca51 100644 --- a/net/lnet/lnet/udsp.c +++ b/net/lnet/lnet/udsp.c @@ -1485,8 +1485,19 @@ struct lnet_udsp * CDEBUG(D_NET, "%u\n", nid_descr->ud_net_id.udn_net_type); /* allocate the total memory required to copy this NID descriptor */ - alloc_size = (sizeof(struct cfs_expr_list) * (expr_count + 1)) + - (sizeof(struct cfs_range_expr) * (range_count)); + if (ioc_nid->iud_net.ud_net_num_expr.le_count) { + if (ioc_nid->iud_net.ud_net_num_expr.le_count != 1) { + CERROR("Unexpected number of net numeric ranges \"%u\". 
Cannot add UDSP rule.\n", + ioc_nid->iud_net.ud_net_num_expr.le_count); + return -EINVAL; + } + alloc_size = (sizeof(struct cfs_expr_list) * (expr_count + 1)) + + (sizeof(struct cfs_range_expr) * (range_count)); + } else { + alloc_size = (sizeof(struct cfs_expr_list) * (expr_count)) + + (sizeof(struct cfs_range_expr) * (range_count)); + } + buf = kzalloc(alloc_size, GFP_KERNEL); if (!buf) return -ENOMEM; From patchwork Sun Apr 9 12:13:11 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 13205970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from pdx1-mailman-customer002.dreamhost.com (listserver-buz.dreamhost.com [69.163.136.29]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id C1A93C77B61 for ; Sun, 9 Apr 2023 12:40:25 +0000 (UTC) Received: from pdx1-mailman-customer002.dreamhost.com (localhost [127.0.0.1]) by pdx1-mailman-customer002.dreamhost.com (Postfix) with ESMTP id 4PvWSd3qPTz22WX; Sun, 9 Apr 2023 05:22:09 -0700 (PDT) Received: from smtp4.ccs.ornl.gov (smtp4.ccs.ornl.gov [160.91.203.40]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by pdx1-mailman-customer002.dreamhost.com (Postfix) with ESMTPS id 4PvWP938nDz22Qt for ; Sun, 9 Apr 2023 05:19:09 -0700 (PDT) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp4.ccs.ornl.gov (Postfix) with ESMTP id 530F2100848D; Sun, 9 Apr 2023 08:13:28 -0400 (EDT) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 515032B3; Sun, 9 Apr 2023 08:13:28 -0400 (EDT) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Sun, 9 Apr 2023 08:13:11 -0400 Message-Id: <1681042400-15491-32-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 
In-Reply-To: <1681042400-15491-1-git-send-email-jsimmons@infradead.org> References: <1681042400-15491-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 31/40] lnet: remove crash with UDSP List-Id: "For discussing Lustre software development." Cc: Cyril Bordage , Lustre Development List From: Cyril Bordage The following sequence of commands caused a crash: # lnetctl udsp add --dst tcp --prio 1 # lnetctl discover 192.168.122.60@tcp The pointer to lnet_peer_net in udsp_info is now checked before being used. WC-bug-id: https://jira.whamcloud.com/browse/LU-15944 Lustre-commit: c56b9455f05f760ae ("LU-15944 lnet: remove crash with UDSP") Signed-off-by: Cyril Bordage Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/48801 Reviewed-by: Chris Horn Reviewed-by: Serguei Smirnov Reviewed-by: Frank Sehr Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- net/lnet/lnet/udsp.c | 21 +++++++++++++-------- 1 file changed, 13 insertions(+), 8 deletions(-) diff --git a/net/lnet/lnet/udsp.c b/net/lnet/lnet/udsp.c index deaca51..eb9a614 100644 --- a/net/lnet/lnet/udsp.c +++ b/net/lnet/lnet/udsp.c @@ -74,13 +74,17 @@ * from the policy list.
* * Generally, the syntax is as follows - * lnetctl policy - * --src: ip2nets syntax specifying the local NID to match - * --dst: ip2nets syntax specifying the remote NID to match - * --rte: ip2nets syntax specifying the router NID to match - * --priority: Priority to apply to rule matches - * --idx: Index of where to insert or delete the rule - * By default add appends to the end of the rule list + * lnetctl udsp add: add a udsp + * --src: ip2nets syntax specifying the local NID to match + * --dst: ip2nets syntax specifying the remote NID to match + * --rte: ip2nets syntax specifying the router NID to match + * --priority: priority value (0 - highest priority) + * --idx: index of where to insert the rule. + * By default, appends to the end of the rule list. + * lnetctl udsp del: delete a udsp + * --idx: index of the Policy. + * lnetctl udsp show: show udsps + * --idx: index of the policy to show. * * Author: Amir Shehata */ @@ -536,7 +540,8 @@ enum udsp_apply { /* check if looking for a net match */ if (!rc && - (lnet_get_list_len(&lp_match->ud_addr_range) || + (!udi->udi_lpn || + lnet_get_list_len(&lp_match->ud_addr_range) || !cfs_match_net(udi->udi_lpn->lpn_net_id, lp_match->ud_net_id.udn_net_type, &lp_match->ud_net_id.udn_net_num_range))) { From patchwork Sun Apr 9 12:13:12 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 13205974 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from pdx1-mailman-customer002.dreamhost.com (listserver-buz.dreamhost.com [69.163.136.29]) by smtp.lore.kernel.org (Postfix) with ESMTP id 9B06FC77B70 for ; Sun, 9 Apr 2023 12:42:08 +0000 (UTC) Received: from pdx1-mailman-customer002.dreamhost.com (localhost [127.0.0.1]) by pdx1-mailman-customer002.dreamhost.com (Postfix) with ESMTP id 4PvWV96PHRz22Yl; Sun, 9 Apr 2023 05:23:29 -0700 (PDT) Received: from 
smtp4.ccs.ornl.gov (smtp4.ccs.ornl.gov [160.91.203.40]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by pdx1-mailman-customer002.dreamhost.com (Postfix) with ESMTPS id 4PvWQ00kPKz22S6 for ; Sun, 9 Apr 2023 05:19:52 -0700 (PDT) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp4.ccs.ornl.gov (Postfix) with ESMTP id 57C75100848E; Sun, 9 Apr 2023 08:13:28 -0400 (EDT) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 55FEB2AB; Sun, 9 Apr 2023 08:13:28 -0400 (EDT) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Sun, 9 Apr 2023 08:13:12 -0400 Message-Id: <1681042400-15491-33-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1681042400-15491-1-git-send-email-jsimmons@infradead.org> References: <1681042400-15491-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 32/40] lustre: ptlrpc: fix clang build errors X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.39 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Timothy Day Fixed bugs which cause errors on Clang. The majority of changes involve adding defines for the 'ptlrpc_nrs_ctl' enum. This avoids having to explicitly cast enums from one type to another. A 'strlcpy' in 'sptlrpc_process_config' was copying the wrong number of bytes. 
WC-bug-id: https://jira.whamcloud.com/browse/LU-16518 Lustre-commit: 50f28f81b5aa8f8ad ("LU-16518 ptlrpc: fix clang build errors") Signed-off-by: Timothy Day Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/49859 Reviewed-by: Andreas Dilger Reviewed-by: jsimmons Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/include/lustre_nrs.h | 11 ++++++++++- fs/lustre/include/lustre_nrs_delay.h | 14 ++++++-------- fs/lustre/ptlrpc/nrs_delay.c | 2 +- fs/lustre/ptlrpc/sec_config.c | 2 +- 4 files changed, 18 insertions(+), 11 deletions(-) diff --git a/fs/lustre/include/lustre_nrs.h b/fs/lustre/include/lustre_nrs.h index 7e0a840..0e0dd73 100644 --- a/fs/lustre/include/lustre_nrs.h +++ b/fs/lustre/include/lustre_nrs.h @@ -64,7 +64,16 @@ enum ptlrpc_nrs_ctl { * Policies can start using opcodes from this value and onwards for * their own purposes; the assigned value itself is arbitrary. */ - PTLRPC_NRS_CTL_1ST_POL_SPEC = 0x20, + PTLRPC_NRS_CTL_POL_SPEC_01 = 0x20, + PTLRPC_NRS_CTL_POL_SPEC_02, + PTLRPC_NRS_CTL_POL_SPEC_03, + PTLRPC_NRS_CTL_POL_SPEC_04, + PTLRPC_NRS_CTL_POL_SPEC_05, + PTLRPC_NRS_CTL_POL_SPEC_06, + PTLRPC_NRS_CTL_POL_SPEC_07, + PTLRPC_NRS_CTL_POL_SPEC_08, + PTLRPC_NRS_CTL_POL_SPEC_09, + PTLRPC_NRS_CTL_POL_SPEC_10 }; /** diff --git a/fs/lustre/include/lustre_nrs_delay.h b/fs/lustre/include/lustre_nrs_delay.h index 52c3885..75bf56d 100644 --- a/fs/lustre/include/lustre_nrs_delay.h +++ b/fs/lustre/include/lustre_nrs_delay.h @@ -73,14 +73,12 @@ struct nrs_delay_req { time64_t req_start_time; }; -enum nrs_ctl_delay { - NRS_CTL_DELAY_RD_MIN = PTLRPC_NRS_CTL_1ST_POL_SPEC, - NRS_CTL_DELAY_WR_MIN, - NRS_CTL_DELAY_RD_MAX, - NRS_CTL_DELAY_WR_MAX, - NRS_CTL_DELAY_RD_PCT, - NRS_CTL_DELAY_WR_PCT, -}; +#define NRS_CTL_DELAY_RD_MIN PTLRPC_NRS_CTL_POL_SPEC_01 +#define NRS_CTL_DELAY_WR_MIN PTLRPC_NRS_CTL_POL_SPEC_02 +#define NRS_CTL_DELAY_RD_MAX PTLRPC_NRS_CTL_POL_SPEC_03 +#define NRS_CTL_DELAY_WR_MAX PTLRPC_NRS_CTL_POL_SPEC_04 +#define 
NRS_CTL_DELAY_RD_PCT PTLRPC_NRS_CTL_POL_SPEC_05 +#define NRS_CTL_DELAY_WR_PCT PTLRPC_NRS_CTL_POL_SPEC_06 /** @} delay */ diff --git a/fs/lustre/ptlrpc/nrs_delay.c b/fs/lustre/ptlrpc/nrs_delay.c index 127f00c..b249749 100644 --- a/fs/lustre/ptlrpc/nrs_delay.c +++ b/fs/lustre/ptlrpc/nrs_delay.c @@ -322,7 +322,7 @@ static int nrs_delay_ctl(struct ptlrpc_nrs_policy *policy, assert_spin_locked(&policy->pol_nrs->nrs_lock); - switch ((enum nrs_ctl_delay)opc) { + switch (opc) { default: return -EINVAL; diff --git a/fs/lustre/ptlrpc/sec_config.c b/fs/lustre/ptlrpc/sec_config.c index e0ddebd..1b56ef4 100644 --- a/fs/lustre/ptlrpc/sec_config.c +++ b/fs/lustre/ptlrpc/sec_config.c @@ -649,7 +649,7 @@ int sptlrpc_process_config(struct lustre_cfg *lcfg) * is a actual filesystem. */ if (server_name2fsname(target, fsname, NULL)) - strlcpy(fsname, target, sizeof(target)); + strlcpy(fsname, target, sizeof(fsname)); rc = sptlrpc_parse_rule(param, &rule); if (rc) From patchwork Sun Apr 9 12:13:13 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 13205972 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from pdx1-mailman-customer002.dreamhost.com (listserver-buz.dreamhost.com [69.163.136.29]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 5CF86C77B70 for ; Sun, 9 Apr 2023 12:41:35 +0000 (UTC) Received: from pdx1-mailman-customer002.dreamhost.com (localhost [127.0.0.1]) by pdx1-mailman-customer002.dreamhost.com (Postfix) with ESMTP id 4PvWTL5YqMz22Xq; Sun, 9 Apr 2023 05:22:46 -0700 (PDT) Received: from smtp4.ccs.ornl.gov (smtp4.ccs.ornl.gov [160.91.203.40]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by pdx1-mailman-customer002.dreamhost.com (Postfix) with 
ESMTPS id 4PvWRF0NJ9z22Ts for ; Sun, 9 Apr 2023 05:20:57 -0700 (PDT) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp4.ccs.ornl.gov (Postfix) with ESMTP id 5C3E2100848F; Sun, 9 Apr 2023 08:13:28 -0400 (EDT) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 5AECB2B2; Sun, 9 Apr 2023 08:13:28 -0400 (EDT) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Sun, 9 Apr 2023 08:13:13 -0400 Message-Id: <1681042400-15491-34-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1681042400-15491-1-git-send-email-jsimmons@infradead.org> References: <1681042400-15491-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 33/40] lustre: ldlm: remove client_import_find_conn() X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.39 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Mr NeilBrown This function hasn't been used since Commit 3dd3fe462023 ("lustre: mgc: Use IR for client->MDS/OST connections"). So remove it. 
WC-bug-id: https://jira.whamcloud.com/browse/LU-10360 Lustre-commit: 14544bdca5cc42a3e ("LU-10360 ldlm: remove client_import_find_conn()") Signed-off-by: Mr NeilBrown Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/50000 Reviewed-by: Oleg Drokin Reviewed-by: Andreas Dilger Reviewed-by: jsimmons Signed-off-by: James Simmons --- fs/lustre/include/lustre_net.h | 2 -- fs/lustre/ldlm/ldlm_lib.c | 24 ------------------------ 2 files changed, 26 deletions(-) diff --git a/fs/lustre/include/lustre_net.h b/fs/lustre/include/lustre_net.h index 1ffe9f7..a305ba3 100644 --- a/fs/lustre/include/lustre_net.h +++ b/fs/lustre/include/lustre_net.h @@ -2358,8 +2358,6 @@ int client_import_dyn_add_conn(struct obd_import *imp, struct obd_uuid *uuid, int client_import_add_nids_to_conn(struct obd_import *imp, lnet_nid_t *nids, int nid_count, struct obd_uuid *uuid); int client_import_del_conn(struct obd_import *imp, struct obd_uuid *uuid); -int client_import_find_conn(struct obd_import *imp, lnet_nid_t peer, - struct obd_uuid *uuid); int import_set_conn_priority(struct obd_import *imp, struct obd_uuid *uuid); void client_destroy_import(struct obd_import *imp); /** @} */ diff --git a/fs/lustre/ldlm/ldlm_lib.c b/fs/lustre/ldlm/ldlm_lib.c index 0b8389e..b1ce0d4 100644 --- a/fs/lustre/ldlm/ldlm_lib.c +++ b/fs/lustre/ldlm/ldlm_lib.c @@ -243,30 +243,6 @@ int client_import_del_conn(struct obd_import *imp, struct obd_uuid *uuid) } EXPORT_SYMBOL(client_import_del_conn); -/** - * Find conn UUID by peer NID. @peer is a server NID. This function is used - * to find a conn uuid of @imp which can reach @peer. - */ -int client_import_find_conn(struct obd_import *imp, lnet_nid_t peer, - struct obd_uuid *uuid) -{ - struct obd_import_conn *conn; - int rc = -ENOENT; - - spin_lock(&imp->imp_lock); - list_for_each_entry(conn, &imp->imp_conn_list, oic_item) { - /* Check if conn UUID does have this peer NID. 
*/ - if (class_check_uuid(&conn->oic_uuid, peer)) { - *uuid = conn->oic_uuid; - rc = 0; - break; - } - } - spin_unlock(&imp->imp_lock); - return rc; -} -EXPORT_SYMBOL(client_import_find_conn); - void client_destroy_import(struct obd_import *imp) { /* From patchwork Sun Apr 9 12:13:14 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 13205975 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from pdx1-mailman-customer002.dreamhost.com (listserver-buz.dreamhost.com [69.163.136.29]) by smtp.lore.kernel.org (Postfix) with ESMTP id 67EC0C77B61 for ; Sun, 9 Apr 2023 12:42:11 +0000 (UTC) Received: from pdx1-mailman-customer002.dreamhost.com (localhost [127.0.0.1]) by pdx1-mailman-customer002.dreamhost.com (Postfix) with ESMTP id 4PvWVC6wGHz22Yp; Sun, 9 Apr 2023 05:23:31 -0700 (PDT) Received: from smtp4.ccs.ornl.gov (smtp4.ccs.ornl.gov [160.91.203.40]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by pdx1-mailman-customer002.dreamhost.com (Postfix) with ESMTPS id 4PvWS62n1tz1yGl for ; Sun, 9 Apr 2023 05:21:42 -0700 (PDT) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp4.ccs.ornl.gov (Postfix) with ESMTP id 608AE1008490; Sun, 9 Apr 2023 08:13:28 -0400 (EDT) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 5F5262B3; Sun, 9 Apr 2023 08:13:28 -0400 (EDT) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Sun, 9 Apr 2023 08:13:14 -0400 Message-Id: <1681042400-15491-35-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1681042400-15491-1-git-send-email-jsimmons@infradead.org> References: <1681042400-15491-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 34/40] lnet: add 'force' option to lnetctl peer del X-BeenThere: 
lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.39 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Serguei Smirnov , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Serguei Smirnov Add --force option to 'lnetctl peer del' command. If the peer has primary NID locked, this option allows for the peer to be deleted manually: lnetctl peer del --prim_nid --force Add --prim_lock option to 'lnetctl peer add' command. If specified, the primary NID of the peer is locked such that it is going to be the NID used to identify the peer in communications with Lustre layer. WC-bug-id: https://jira.whamcloud.com/browse/LU-14668 Lustre-commit: f1b2d8d60c593a670 ("LU-14668 lnet: add 'force' option to lnetctl peer del") Signed-off-by: Serguei Smirnov Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/50149 Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- include/linux/lnet/lib-lnet.h | 6 ++++-- include/uapi/linux/lnet/lnet-dlc.h | 4 +++- net/lnet/lnet/api-ni.c | 6 ++++-- net/lnet/lnet/peer.c | 12 ++++++++---- 4 files changed, 19 insertions(+), 9 deletions(-) diff --git a/include/linux/lnet/lib-lnet.h b/include/linux/lnet/lib-lnet.h index a8aa924..e26e150 100644 --- a/include/linux/lnet/lib-lnet.h +++ b/include/linux/lnet/lib-lnet.h @@ -953,8 +953,10 @@ bool lnet_peer_is_pref_rtr_locked(struct lnet_peer_ni *lpni, int lnet_peer_add_pref_rtr(struct lnet_peer_ni *lpni, struct lnet_nid *nid); int lnet_peer_ni_set_non_mr_pref_nid(struct lnet_peer_ni *lpni, struct lnet_nid *nid); -int lnet_user_add_peer_ni(struct lnet_nid *key_nid, struct lnet_nid *nid, bool mr); -int lnet_del_peer_ni(struct lnet_nid *key_nid, struct lnet_nid *nid); +int lnet_user_add_peer_ni(struct lnet_nid *key_nid, struct lnet_nid *nid, + bool mr, bool lock_prim); +int lnet_del_peer_ni(struct lnet_nid *key_nid, struct 
lnet_nid *nid, + int force); int lnet_get_peer_info(struct lnet_ioctl_peer_cfg *cfg, void __user *bulk); int lnet_get_peer_ni_info(u32 peer_index, u64 *nid, char alivness[LNET_MAX_STR_LEN], diff --git a/include/uapi/linux/lnet/lnet-dlc.h b/include/uapi/linux/lnet/lnet-dlc.h index 63578a0..fc1d40c 100644 --- a/include/uapi/linux/lnet/lnet-dlc.h +++ b/include/uapi/linux/lnet/lnet-dlc.h @@ -298,7 +298,9 @@ struct lnet_ioctl_peer_cfg { struct libcfs_ioctl_hdr prcfg_hdr; lnet_nid_t prcfg_prim_nid; lnet_nid_t prcfg_cfg_nid; - __u32 prcfg_count; + __u32 prcfg_count; /* ADD_PEER_NI: used for 'lock_prim' option + * DEL_PEER_NI: used for 'force' option + */ __u32 prcfg_mr; __u32 prcfg_state; __u32 prcfg_size; diff --git a/net/lnet/lnet/api-ni.c b/net/lnet/lnet/api-ni.c index 20093a9..9095d4e 100644 --- a/net/lnet/lnet/api-ni.c +++ b/net/lnet/lnet/api-ni.c @@ -4239,7 +4239,8 @@ u32 lnet_get_dlc_seq_locked(void) mutex_lock(&the_lnet.ln_api_mutex); lnet_nid4_to_nid(cfg->prcfg_prim_nid, &prim_nid); lnet_nid4_to_nid(cfg->prcfg_cfg_nid, &nid); - rc = lnet_user_add_peer_ni(&prim_nid, &nid, cfg->prcfg_mr); + rc = lnet_user_add_peer_ni(&prim_nid, &nid, cfg->prcfg_mr, + cfg->prcfg_count == 1); mutex_unlock(&the_lnet.ln_api_mutex); return rc; } @@ -4255,7 +4256,8 @@ u32 lnet_get_dlc_seq_locked(void) lnet_nid4_to_nid(cfg->prcfg_prim_nid, &prim_nid); lnet_nid4_to_nid(cfg->prcfg_cfg_nid, &nid); rc = lnet_del_peer_ni(&prim_nid, - &nid); + &nid, + cfg->prcfg_count); mutex_unlock(&the_lnet.ln_api_mutex); return rc; } diff --git a/net/lnet/lnet/peer.c b/net/lnet/lnet/peer.c index 0a5e73a..619973b 100644 --- a/net/lnet/lnet/peer.c +++ b/net/lnet/lnet/peer.c @@ -1978,9 +1978,12 @@ struct lnet_peer_net * return lnet_peer_add_nid(lp, nid, flags); } -int lnet_user_add_peer_ni(struct lnet_nid *prim_nid, struct lnet_nid *nid, bool mr) +int lnet_user_add_peer_ni(struct lnet_nid *prim_nid, struct lnet_nid *nid, + bool mr, bool lock_prim) { - return lnet_add_peer_ni(prim_nid, nid, mr, 
LNET_PEER_CONFIGURED); + int fl = LNET_PEER_CONFIGURED | (LNET_PEER_LOCK_PRIMARY * lock_prim); + + return lnet_add_peer_ni(prim_nid, nid, mr, fl); } static int @@ -2029,7 +2032,8 @@ int lnet_user_add_peer_ni(struct lnet_nid *prim_nid, struct lnet_nid *nid, bool * being modified/deleted by a different thread. */ int -lnet_del_peer_ni(struct lnet_nid *prim_nid, struct lnet_nid *nid) +lnet_del_peer_ni(struct lnet_nid *prim_nid, struct lnet_nid *nid, + int force) { struct lnet_peer *lp; struct lnet_peer_ni *lpni; @@ -2061,7 +2065,7 @@ int lnet_user_add_peer_ni(struct lnet_nid *prim_nid, struct lnet_nid *nid, bool lnet_net_unlock(LNET_LOCK_EX); if (LNET_NID_IS_ANY(nid) || nid_same(nid, &lp->lp_primary_nid)) { - if (lp->lp_state & LNET_PEER_LOCK_PRIMARY) { + if (!force && lp->lp_state & LNET_PEER_LOCK_PRIMARY) { CERROR("peer %s created by Lustre. Must preserve primary NID, but will remove other NIDs\n", libcfs_nidstr(&lp->lp_primary_nid)); return lnet_reset_peer(lp); From patchwork Sun Apr 9 12:13:15 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 13205976 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from pdx1-mailman-customer002.dreamhost.com (listserver-buz.dreamhost.com [69.163.136.29]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 676ECC77B70 for ; Sun, 9 Apr 2023 12:42:55 +0000 (UTC) Received: from pdx1-mailman-customer002.dreamhost.com (localhost [127.0.0.1]) by pdx1-mailman-customer002.dreamhost.com (Postfix) with ESMTP id 4PvWVn21Rxz22Zg; Sun, 9 Apr 2023 05:24:01 -0700 (PDT) Received: from smtp4.ccs.ornl.gov (smtp4.ccs.ornl.gov [160.91.203.40]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by 
pdx1-mailman-customer002.dreamhost.com (Postfix) with ESMTPS id 4PvWTF30cXz22Xd for ; Sun, 9 Apr 2023 05:22:41 -0700 (PDT) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp4.ccs.ornl.gov (Postfix) with ESMTP id 651091008491; Sun, 9 Apr 2023 08:13:28 -0400 (EDT) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 63D062AB; Sun, 9 Apr 2023 08:13:28 -0400 (EDT) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Sun, 9 Apr 2023 08:13:15 -0400 Message-Id: <1681042400-15491-36-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1681042400-15491-1-git-send-email-jsimmons@infradead.org> References: <1681042400-15491-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 35/40] lustre: ldlm: BL_AST lock cancel still can be batched X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.39 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Vitaly Fertman , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Vitaly Fertman The previous patch made BL_AST locks be cancelled separately. However, the main cost is flushing the data under the other batched locks, so a BL_AST lock with no data left to flush can still be batched with them. This can be optimized only for locks that are not yet CANCELLING; otherwise the lock is already on the l_bl_ast list.
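The choice the patch makes in ldlm_cli_cancel() can be sketched as below. This is an illustrative simplification, not Lustre API: the helper `choose_cancel_path` and the enum are hypothetical names. A BL_AST lock that is already in the CANCELLING state (already on the l_bl_ast list) is sent as an individual CANCEL RPC right away; any other lock may join a batched cancel.

```c
#include <stdbool.h>

/* Hypothetical sketch of the fast-path decision: only a BL_AST lock that
 * is already CANCELLING is cancelled by a separate, immediate RPC; all
 * other locks may be batched together with LRU locks. */
enum cancel_path { CANCEL_BATCHED = 0, CANCEL_SEPARATE = 1 };

static enum cancel_path choose_cancel_path(bool bl_ast, bool canceling)
{
	if (bl_ast && canceling)
		return CANCEL_SEPARATE;	/* send the RPC right away, do not wait */
	return CANCEL_BATCHED;		/* data flushed, safe to batch */
}
```

The point of the patch is exactly the second branch: a BL_AST lock with nothing left to flush no longer forces a separate RPC.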
Fixes: 1ada5c64 ("lustre: ldlm: send the cancel RPC asap") WC-bug-id: https://jira.whamcloud.com/browse/LU-16285 Lustre-commit: 9d79f92076b6a9ca7 ("LU-16285 ldlm: BL_AST lock cancel still can be batched") Signed-off-by: Vitaly Fertman Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/50158 Reviewed-by: Yang Sheng Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/include/lustre_dlm.h | 1 - fs/lustre/ldlm/ldlm_lockd.c | 3 ++- fs/lustre/ldlm/ldlm_request.c | 42 +++++++++++++++++++++++++----------------- 3 files changed, 27 insertions(+), 19 deletions(-) diff --git a/fs/lustre/include/lustre_dlm.h b/fs/lustre/include/lustre_dlm.h index 3a4f152..d08c48f 100644 --- a/fs/lustre/include/lustre_dlm.h +++ b/fs/lustre/include/lustre_dlm.h @@ -593,7 +593,6 @@ enum ldlm_cancel_flags { LCF_BL_AST = 0x4, /* Cancel locks marked as LDLM_FL_BL_AST * in the same RPC */ - LCF_ONE_LOCK = 0x8, /* Cancel locks pack only one lock. */ }; struct ldlm_flock { diff --git a/fs/lustre/ldlm/ldlm_lockd.c b/fs/lustre/ldlm/ldlm_lockd.c index 3a085db..abd853b 100644 --- a/fs/lustre/ldlm/ldlm_lockd.c +++ b/fs/lustre/ldlm/ldlm_lockd.c @@ -700,7 +700,8 @@ static int ldlm_callback_handler(struct ptlrpc_request *req) * we can tell the server we have no lock. Otherwise, we * should send cancel after dropping the cache. */ - if (ldlm_is_ast_sent(lock) || ldlm_is_failed(lock)) { + if ((ldlm_is_canceling(lock) && ldlm_is_bl_done(lock)) || + ldlm_is_failed(lock)) { LDLM_DEBUG(lock, "callback on lock %#llx - lock disappeared", dlm_req->lock_handle[0].cookie); diff --git a/fs/lustre/ldlm/ldlm_request.c b/fs/lustre/ldlm/ldlm_request.c index ef3ad28..11071d9 100644 --- a/fs/lustre/ldlm/ldlm_request.c +++ b/fs/lustre/ldlm/ldlm_request.c @@ -1055,8 +1055,9 @@ static int _ldlm_cancel_pack(struct ptlrpc_request *req, struct ldlm_lock *lock, * Prepare and send a batched cancel RPC. It will include @count lock * handles of locks given in @cancels list. 
*/ -static int ldlm_cli_cancel_req(struct obd_export *exp, void *ptr, - int count, enum ldlm_cancel_flags flags) +static int ldlm_cli_cancel_req(struct obd_export *exp, struct ldlm_lock *lock, + struct list_head *head, int count, + enum ldlm_cancel_flags flags) { struct ptlrpc_request *req = NULL; struct obd_import *imp; @@ -1065,6 +1066,7 @@ static int ldlm_cli_cancel_req(struct obd_export *exp, void *ptr, LASSERT(exp); LASSERT(count > 0); + LASSERT(!head || !lock); CFS_FAIL_TIMEOUT(OBD_FAIL_LDLM_PAUSE_CANCEL, cfs_fail_val); @@ -1104,10 +1106,7 @@ static int ldlm_cli_cancel_req(struct obd_export *exp, void *ptr, req->rq_reply_portal = LDLM_CANCEL_REPLY_PORTAL; ptlrpc_at_set_req_timeout(req); - if (flags & LCF_ONE_LOCK) - rc = _ldlm_cancel_pack(req, ptr, NULL, count); - else - rc = _ldlm_cancel_pack(req, NULL, ptr, count); + rc = _ldlm_cancel_pack(req, lock, head, count); if (rc == 0) { ptlrpc_req_finished(req); sent = count; @@ -1265,7 +1264,8 @@ int ldlm_cli_cancel(const struct lustre_handle *lockh, enum ldlm_cancel_flags flags) { struct obd_export *exp; - int avail, count = 1, bl_ast = 0; + int avail, count = 1, separate = 0; + enum ldlm_lru_flags lru_flags = 0; u64 rc = 0; struct ldlm_namespace *ns; struct ldlm_lock *lock; @@ -1286,7 +1286,8 @@ int ldlm_cli_cancel(const struct lustre_handle *lockh, LDLM_LOCK_RELEASE(lock); return 0; } - bl_ast = 1; + if (ldlm_is_canceling(lock)) + separate = 1; } else if (ldlm_is_canceling(lock)) { /* Lock is being canceled and the caller doesn't want to wait */ unlock_res_and_lock(lock); @@ -1308,11 +1309,18 @@ int ldlm_cli_cancel(const struct lustre_handle *lockh, if (rc == LDLM_FL_LOCAL_ONLY || flags & LCF_LOCAL) { LDLM_LOCK_RELEASE(lock); return 0; + } else if (rc == LDLM_FL_BL_AST) { + /* BL_AST lock must not wait. 
*/ + lru_flags |= LDLM_LRU_FLAG_NO_WAIT; } exp = lock->l_conn_export; - if (bl_ast) { /* Send RPC immedaitly for LDLM_FL_BL_AST */ - ldlm_cli_cancel_req(exp, lock, count, flags | LCF_ONE_LOCK); + /* If a lock has been taken from lru for a batched cancel and a later + * BL_AST came, send a CANCEL RPC individually for it right away, not + * waiting for the batch to be handled. + */ + if (separate) { + ldlm_cli_cancel_req(exp, lock, NULL, 1, flags); LDLM_LOCK_RELEASE(lock); return 0; } @@ -1332,7 +1340,7 @@ int ldlm_cli_cancel(const struct lustre_handle *lockh, ns = ldlm_lock_to_ns(lock); count += ldlm_cancel_lru_local(ns, &cancels, 0, avail - 1, - LCF_BL_AST, 0); + LCF_BL_AST, lru_flags); } ldlm_cli_cancel_list(&cancels, count, NULL, flags); @@ -1345,7 +1353,7 @@ int ldlm_cli_cancel(const struct lustre_handle *lockh, * Return the number of cancelled locks. */ int ldlm_cli_cancel_list_local(struct list_head *cancels, int count, - enum ldlm_cancel_flags flags) + enum ldlm_cancel_flags cancel_flags) { LIST_HEAD(head); struct ldlm_lock *lock, *next; @@ -1357,7 +1365,7 @@ int ldlm_cli_cancel_list_local(struct list_head *cancels, int count, if (left-- == 0) break; - if (flags & LCF_LOCAL) { + if (cancel_flags & LCF_LOCAL) { rc = LDLM_FL_LOCAL_ONLY; ldlm_lock_cancel(lock); } else { @@ -1369,7 +1377,7 @@ int ldlm_cli_cancel_list_local(struct list_head *cancels, int count, * with the LDLM_FL_BL_AST flag in a separate RPC from * the one being generated now. 
*/ - if (!(flags & LCF_BL_AST) && (rc == LDLM_FL_BL_AST)) { + if (!(cancel_flags & LCF_BL_AST) && (rc == LDLM_FL_BL_AST)) { LDLM_DEBUG(lock, "Cancel lock separately"); list_move(&lock->l_bl_ast, &head); bl_ast++; @@ -1384,7 +1392,7 @@ int ldlm_cli_cancel_list_local(struct list_head *cancels, int count, } if (bl_ast > 0) { count -= bl_ast; - ldlm_cli_cancel_list(&head, bl_ast, NULL, 0); + ldlm_cli_cancel_list(&head, bl_ast, NULL, cancel_flags); } return count; @@ -1887,11 +1895,11 @@ int ldlm_cli_cancel_list(struct list_head *cancels, int count, ldlm_cancel_pack(req, cancels, count); else res = ldlm_cli_cancel_req(lock->l_conn_export, - cancels, count, + NULL, cancels, count, flags); } else { res = ldlm_cli_cancel_req(lock->l_conn_export, - cancels, 1, flags); + NULL, cancels, 1, flags); } if (res < 0) { From patchwork Sun Apr 9 12:13:16 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 13205981 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from pdx1-mailman-customer002.dreamhost.com (listserver-buz.dreamhost.com [69.163.136.29]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 0852DC77B70 for ; Sun, 9 Apr 2023 12:44:43 +0000 (UTC) Received: from pdx1-mailman-customer002.dreamhost.com (localhost [127.0.0.1]) by pdx1-mailman-customer002.dreamhost.com (Postfix) with ESMTP id 4PvWW41vsfz22ZH; Sun, 9 Apr 2023 05:24:16 -0700 (PDT) Received: from smtp4.ccs.ornl.gov (smtp4.ccs.ornl.gov [160.91.203.40]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by pdx1-mailman-customer002.dreamhost.com (Postfix) with ESMTPS id 4PvWVL2tFlz22ZJ for ; Sun, 9 Apr 2023 05:23:38 -0700 (PDT) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by 
smtp4.ccs.ornl.gov (Postfix) with ESMTP id 6989E1008492; Sun, 9 Apr 2023 08:13:28 -0400 (EDT) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 684822B2; Sun, 9 Apr 2023 08:13:28 -0400 (EDT) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Sun, 9 Apr 2023 08:13:16 -0400 Message-Id: <1681042400-15491-37-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1681042400-15491-1-git-send-email-jsimmons@infradead.org> References: <1681042400-15491-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 36/40] lnet: lnet_parse_route uses wrong loop var X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.39 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Chris Horn , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Chris Horn When looping over the gateways list, we're referencing the wrong loop variable to get the gateway nid (ltb instead of ltb2). 
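The bug class fixed here is worth a minimal illustration: in a nested list walk, reading the outer iterator where the inner one is meant silently compares the same element on every pass. The function and data below are invented for illustration and are not LNet code.

```c
#include <stdbool.h>
#include <string.h>

/* Models the inner loop of lnet_parse_route(): scan the gateway strings
 * for one matching a local NID.  The fixed code reads the inner iterator
 * (ltb2->ltb_text); the buggy code read the outer one (ltb->ltb_text),
 * testing the same string on each iteration. */
static bool any_gateway_is_local(const char *const *gateways, size_t n,
				 const char *local_nid)
{
	for (size_t i = 0; i < n; i++)
		if (strcmp(gateways[i], local_nid) == 0)
			return true;
	return false;
}
```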
Fixes: 1a77031c36 ("lustre: lnet/config: convert list_for_each to list_for_each_entry") WC-bug-id: https://jira.whamcloud.com/browse/LU-16606 Lustre-commit: 0a414b1077a2f9dbc ("LU-16606 lnet: lnet_parse_route uses wrong loop var") Signed-off-by: Chris Horn Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/50173 Reviewed-by: jsimmons Reviewed-by: Serguei Smirnov Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- net/lnet/lnet/config.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/net/lnet/lnet/config.c b/net/lnet/lnet/config.c index a54e1db..c239f9c 100644 --- a/net/lnet/lnet/config.c +++ b/net/lnet/lnet/config.c @@ -1168,7 +1168,7 @@ struct lnet_ni * LASSERT(net != LNET_NET_ANY); list_for_each_entry(ltb2, &gateways, ltb_list) { - LASSERT(libcfs_strnid(&nid, ltb->ltb_text) == 0); + LASSERT(libcfs_strnid(&nid, ltb2->ltb_text) == 0); if (lnet_islocalnid(&nid)) { *im_a_router = 1; From patchwork Sun Apr 9 12:13:17 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 13205977 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from pdx1-mailman-customer002.dreamhost.com (listserver-buz.dreamhost.com [69.163.136.29]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id DC93AC77B61 for ; Sun, 9 Apr 2023 12:43:26 +0000 (UTC) Received: from pdx1-mailman-customer002.dreamhost.com (localhost [127.0.0.1]) by pdx1-mailman-customer002.dreamhost.com (Postfix) with ESMTP id 4PvWWK2CKZz22bh; Sun, 9 Apr 2023 05:24:29 -0700 (PDT) Received: from smtp4.ccs.ornl.gov (smtp4.ccs.ornl.gov [160.91.203.40]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by pdx1-mailman-customer002.dreamhost.com (Postfix) with ESMTPS id 4PvWVw4NSSz2161 
for ; Sun, 9 Apr 2023 05:24:08 -0700 (PDT) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp4.ccs.ornl.gov (Postfix) with ESMTP id 6E3B51008493; Sun, 9 Apr 2023 08:13:28 -0400 (EDT) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 6CB5C2B3; Sun, 9 Apr 2023 08:13:28 -0400 (EDT) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Sun, 9 Apr 2023 08:13:17 -0400 Message-Id: <1681042400-15491-38-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1681042400-15491-1-git-send-email-jsimmons@infradead.org> References: <1681042400-15491-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 37/40] lustre: tgt: add qos debug X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.39 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Sergey Cheremencev , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Sergey Cheremencev Add several debug lines to the QOS allocator. The patch also changes the DEBUG_SUBSYSTEM from S_CLASS to S_LOV in lu_tgt_descs.c so that it can be enabled to capture only QOS debugging.
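The values printed by the new CDEBUG lines feed the weight calculation in lu_tgt_qos_weight_calc(), which can be sketched as below (the helper name is illustrative, not the kernel function):

```c
#include <stdint.h>

/* Sketch of the arithmetic traced by the new debug lines: a target's
 * weight is its available space minus the combined target (ltq_penalty)
 * and server (lsq_penalty) penalties, clamped at zero when the penalties
 * exceed the available space. */
static uint64_t qos_weight(uint64_t avail, uint64_t ltq_penalty,
			   uint64_t lsq_penalty)
{
	uint64_t penalty = ltq_penalty + lsq_penalty;

	return avail < penalty ? 0 : avail - penalty;
}
```

Logging ltq_penalty, lsq_penalty, and ltq_avail side by side is enough to reconstruct why a given target got weight 0 and was skipped by the allocator.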
WC-bug-id: https://jira.whamcloud.com/browse/LU-16501 Lustre-commit: 5fe45f0ff98064561 ("LU-16501 tgt: add qos debug") Signed-off-by: Sergey Cheremencev Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/49977 Reviewed-by: Andreas Dilger Reviewed-by: Alex Zhuravlev Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/obdclass/lu_tgt_descs.c | 11 ++++++++++- 1 file changed, 10 insertions(+), 1 deletion(-) diff --git a/fs/lustre/obdclass/lu_tgt_descs.c b/fs/lustre/obdclass/lu_tgt_descs.c index 35e7c7c..d573c12 100644 --- a/fs/lustre/obdclass/lu_tgt_descs.c +++ b/fs/lustre/obdclass/lu_tgt_descs.c @@ -31,7 +31,7 @@ * */ -#define DEBUG_SUBSYSTEM S_CLASS +#define DEBUG_SUBSYSTEM S_LOV #include #include @@ -219,6 +219,8 @@ void lu_tgt_qos_weight_calc(struct lu_tgt_desc *tgt, bool is_mdt) else ltq->ltq_avail = tgt_statfs_bavail(tgt) >> 8; penalty = ltq->ltq_penalty + ltq->ltq_svr->lsq_penalty; + CDEBUG(D_OTHER, "ltq_penalty: %llu lsq_penalty: %llu tgt_bavail: %llu\n", + ltq->ltq_penalty, ltq->ltq_svr->lsq_penalty, ltq->ltq_avail); if (ltq->ltq_avail < penalty) ltq->ltq_weight = 0; else @@ -623,8 +625,15 @@ int ltd_qos_update(struct lu_tgt_descs *ltd, struct lu_tgt_desc *tgt, /* Set max penalties for this tgt and server */ ltq->ltq_penalty += ltq->ltq_penalty_per_obj * ltd->ltd_lov_desc.ld_active_tgt_count; + CDEBUG(D_OTHER, "ltq_penalty: %llu per_obj: %llu tgt_count: %d\n", + ltq->ltq_penalty, ltq->ltq_penalty_per_obj, + ltd->ltd_lov_desc.ld_active_tgt_count); svr->lsq_penalty += svr->lsq_penalty_per_obj * qos->lq_active_svr_count; + CDEBUG(D_OTHER, "lsq_penalty: %llu per_obj: %llu srv_count: %d\n", + svr->lsq_penalty, svr->lsq_penalty_per_obj, + qos->lq_active_svr_count); + /* Decrease all MDS penalties */ list_for_each_entry(svr, &qos->lq_svr_list, lsq_svr_list) { From patchwork Sun Apr 9 12:13:18 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons 
X-Patchwork-Id: 13205978 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from pdx1-mailman-customer002.dreamhost.com (listserver-buz.dreamhost.com [69.163.136.29]) by smtp.lore.kernel.org (Postfix) with ESMTP id 86CE7C77B70 for ; Sun, 9 Apr 2023 12:43:41 +0000 (UTC) Received: from pdx1-mailman-customer002.dreamhost.com (localhost [127.0.0.1]) by pdx1-mailman-customer002.dreamhost.com (Postfix) with ESMTP id 4PvWWc5vRFz22cC; Sun, 9 Apr 2023 05:24:44 -0700 (PDT) Received: from smtp4.ccs.ornl.gov (smtp4.ccs.ornl.gov [160.91.203.40]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by pdx1-mailman-customer002.dreamhost.com (Postfix) with ESMTPS id 4PvWWV14nWz216X for ; Sun, 9 Apr 2023 05:24:38 -0700 (PDT) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp4.ccs.ornl.gov (Postfix) with ESMTP id 727111008494; Sun, 9 Apr 2023 08:13:28 -0400 (EDT) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 712C02AB; Sun, 9 Apr 2023 08:13:28 -0400 (EDT) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Sun, 9 Apr 2023 08:13:18 -0400 Message-Id: <1681042400-15491-39-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1681042400-15491-1-git-send-email-jsimmons@infradead.org> References: <1681042400-15491-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 38/40] lustre: enc: file names encryption when using secure boot X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.39 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Alex Deiter , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Alex Deiter Secure boot activates lockdown mode in the Linux kernel. 
And debugfs is restricted when the kernel is locked down. This patch moves file names encryption from debugfs to sysfs. WC-bug-id: https://jira.whamcloud.com/browse/LU-16621 Lustre-commit: 716675fff642655c4 ("LU-16621 enc: file names encryption when using secure boot") Signed-off-by: Alex Deiter Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/50219 Reviewed-by: Andreas Dilger Reviewed-by: Sebastien Buisson Reviewed-by: jsimmons Reviewed-by: Oleg Drokin Signed-off-by: James Simmons --- fs/lustre/llite/llite_internal.h | 1 + fs/lustre/llite/llite_lib.c | 5 +++-- fs/lustre/llite/lproc_llite.c | 35 ++++++++++++++++++----------------- 3 files changed, 22 insertions(+), 19 deletions(-) diff --git a/fs/lustre/llite/llite_internal.h b/fs/lustre/llite/llite_internal.h index b101a71..72de8f7 100644 --- a/fs/lustre/llite/llite_internal.h +++ b/fs/lustre/llite/llite_internal.h @@ -737,6 +737,7 @@ struct ll_sb_info { spinlock_t ll_lock; spinlock_t ll_pp_extent_lock; /* pp_extent entry*/ spinlock_t ll_process_lock; /* ll_rw_process_info */ + struct lustre_sb_info *lsi; struct obd_uuid ll_sb_uuid; struct obd_export *ll_md_exp; struct obd_export *ll_dt_exp; diff --git a/fs/lustre/llite/llite_lib.c b/fs/lustre/llite/llite_lib.c index 3774ca8..5a9bc61 100644 --- a/fs/lustre/llite/llite_lib.c +++ b/fs/lustre/llite/llite_lib.c @@ -79,7 +79,7 @@ static inline unsigned int ll_get_ra_async_max_active(void) return cfs_cpt_weight(cfs_cpt_tab, CFS_CPT_ANY) >> 1; } -static struct ll_sb_info *ll_init_sbi(void) +static struct ll_sb_info *ll_init_sbi(struct lustre_sb_info *lsi) { struct ll_sb_info *sbi = NULL; unsigned long pages; @@ -99,6 +99,7 @@ static struct ll_sb_info *ll_init_sbi(void) mutex_init(&sbi->ll_lco.lco_lock); spin_lock_init(&sbi->ll_pp_extent_lock); spin_lock_init(&sbi->ll_process_lock); + sbi->lsi = lsi; sbi->ll_rw_stats_on = 0; sbi->ll_statfs_max_age = OBD_STATFS_CACHE_SECONDS; @@ -1245,7 +1246,7 @@ int ll_fill_super(struct super_block *sb) } /* client 
additional sb info */ - sbi = ll_init_sbi(); + sbi = ll_init_sbi(lsi); lsi->lsi_llsbi = sbi; if (IS_ERR(sbi)) { err = PTR_ERR(sbi); diff --git a/fs/lustre/llite/lproc_llite.c b/fs/lustre/llite/lproc_llite.c index 48d93c6..8b6c86f 100644 --- a/fs/lustre/llite/lproc_llite.c +++ b/fs/lustre/llite/lproc_llite.c @@ -1653,28 +1653,30 @@ static ssize_t ll_nosquash_nids_seq_write(struct file *file, LDEBUGFS_SEQ_FOPS(ll_nosquash_nids); -static int ll_old_b64_enc_seq_show(struct seq_file *m, void *v) +static ssize_t filename_enc_use_old_base64_show(struct kobject *kobj, + struct attribute *attr, + char *buffer) { - struct super_block *sb = m->private; - struct lustre_sb_info *lsi = s2lsi(sb); + struct ll_sb_info *sbi = container_of(kobj, struct ll_sb_info, + ll_kset.kobj); + struct lustre_sb_info *lsi = sbi->lsi; - seq_printf(m, "%u\n", - lsi->lsi_flags & LSI_FILENAME_ENC_B64_OLD_CLI ? 1 : 0); - return 0; + return scnprintf(buffer, PAGE_SIZE, "%u\n", + lsi->lsi_flags & LSI_FILENAME_ENC_B64_OLD_CLI ? 
1 : 0); } -static ssize_t ll_old_b64_enc_seq_write(struct file *file, - const char __user *buffer, - size_t count, loff_t *off) +static ssize_t filename_enc_use_old_base64_store(struct kobject *kobj, + struct attribute *attr, + const char *buffer, + size_t count) { - struct seq_file *m = file->private_data; - struct super_block *sb = m->private; - struct lustre_sb_info *lsi = s2lsi(sb); - struct ll_sb_info *sbi = ll_s2sbi(sb); + struct ll_sb_info *sbi = container_of(kobj, struct ll_sb_info, + ll_kset.kobj); + struct lustre_sb_info *lsi = sbi->lsi; bool val; int rc; - rc = kstrtobool_from_user(buffer, count, &val); + rc = kstrtobool(buffer, &val); if (rc) return rc; @@ -1698,7 +1700,7 @@ static ssize_t ll_old_b64_enc_seq_write(struct file *file, return count; } -LDEBUGFS_SEQ_FOPS(ll_old_b64_enc); +LUSTRE_RW_ATTR(filename_enc_use_old_base64); static int ll_pcc_seq_show(struct seq_file *m, void *v) { @@ -1756,8 +1758,6 @@ struct ldebugfs_vars lprocfs_llite_obd_vars[] = { .fops = &ll_nosquash_nids_fops }, { .name = "pcc", .fops = &ll_pcc_fops, }, - { .name = "filename_enc_use_old_base64", - .fops = &ll_old_b64_enc_fops, }, { NULL } }; @@ -1805,6 +1805,7 @@ struct ldebugfs_vars lprocfs_llite_obd_vars[] = { &lustre_attr_opencache_threshold_ms.attr, &lustre_attr_opencache_max_ms.attr, &lustre_attr_inode_cache.attr, + &lustre_attr_filename_enc_use_old_base64.attr, NULL, }; From patchwork Sun Apr 9 12:13:19 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Simmons X-Patchwork-Id: 13205979 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from pdx1-mailman-customer002.dreamhost.com (listserver-buz.dreamhost.com [69.163.136.29]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 68118C77B70 for ; Sun, 9 Apr 2023 12:43:59 
+0000 (UTC) Received: from pdx1-mailman-customer002.dreamhost.com (localhost [127.0.0.1]) by pdx1-mailman-customer002.dreamhost.com (Postfix) with ESMTP id 4PvWX93K1Gz217H; Sun, 9 Apr 2023 05:25:13 -0700 (PDT) Received: from smtp4.ccs.ornl.gov (smtp4.ccs.ornl.gov [160.91.203.40]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by pdx1-mailman-customer002.dreamhost.com (Postfix) with ESMTPS id 4PvWWw3ymlz22cZ for ; Sun, 9 Apr 2023 05:25:00 -0700 (PDT) Received: from star.ccs.ornl.gov (star.ccs.ornl.gov [160.91.202.134]) by smtp4.ccs.ornl.gov (Postfix) with ESMTP id 771451008495; Sun, 9 Apr 2023 08:13:28 -0400 (EDT) Received: by star.ccs.ornl.gov (Postfix, from userid 2004) id 759FF2B2; Sun, 9 Apr 2023 08:13:28 -0400 (EDT) From: James Simmons To: Andreas Dilger , Oleg Drokin , NeilBrown Date: Sun, 9 Apr 2023 08:13:19 -0400 Message-Id: <1681042400-15491-40-git-send-email-jsimmons@infradead.org> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1681042400-15491-1-git-send-email-jsimmons@infradead.org> References: <1681042400-15491-1-git-send-email-jsimmons@infradead.org> Subject: [lustre-devel] [PATCH 39/40] lustre: uapi: add DMV_IMP_INHERIT connect flag X-BeenThere: lustre-devel@lists.lustre.org X-Mailman-Version: 2.1.39 Precedence: list List-Id: "For discussing Lustre software development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Lai Siyao , Lustre Development List MIME-Version: 1.0 Errors-To: lustre-devel-bounces@lists.lustre.org Sender: "lustre-devel" From: Lai Siyao Add OBD_CONNECT2_DMV_IMP_INHERIT for implicit default LMV inherit. 
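Connect flags share one 64-bit mask, so a new flag must be a single bit that does not overlap its neighbours — that is the invariant the wiretest LASSERTF() additions pin down at compile time. A quick sketch of the check (flag values copied from the patch; `flag_is_single_bit` is an illustrative helper):

```c
#include <stdbool.h>

/* Flag values as defined in lustre_idl.h by this patch and its
 * neighbours; each OBD_CONNECT2_* flag must occupy exactly one bit so
 * the flags can be OR-ed together during connect negotiation. */
#define OBD_CONNECT2_ENCRYPT_NAME	0x8000000ULL	/* name encrypt */
#define OBD_CONNECT2_DMV_IMP_INHERIT	0x20000000ULL	/* DMV inheritance */
#define OBD_CONNECT2_ENCRYPT_FID2PATH	0x40000000ULL	/* fid2path enc file */

static bool flag_is_single_bit(unsigned long long f)
{
	return f != 0 && (f & (f - 1)) == 0;	/* power of two => one bit set */
}
```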
WC-bug-id: https://jira.whamcloud.com/browse/LU-15971
Lustre-commit: 203745e7b07101bb6 ("LU-15971 uapi: add DMV_IMP_INHERIT connect flag")
Signed-off-by: Lai Siyao
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/47788
Reviewed-by: Andreas Dilger
Reviewed-by: Hongchao Zhang
Reviewed-by: Oleg Drokin
Signed-off-by: James Simmons
---
 fs/lustre/ptlrpc/wiretest.c            | 2 ++
 include/uapi/linux/lustre/lustre_idl.h | 1 +
 2 files changed, 3 insertions(+)

diff --git a/fs/lustre/ptlrpc/wiretest.c b/fs/lustre/ptlrpc/wiretest.c
index 2c02430..472d155c 100644
--- a/fs/lustre/ptlrpc/wiretest.c
+++ b/fs/lustre/ptlrpc/wiretest.c
@@ -1245,6 +1245,8 @@ void lustre_assert_wire_constants(void)
 		 OBD_CONNECT2_ATOMIC_OPEN_LOCK);
 	LASSERTF(OBD_CONNECT2_ENCRYPT_NAME == 0x8000000ULL, "found 0x%.16llxULL\n",
 		 OBD_CONNECT2_ENCRYPT_NAME);
+	LASSERTF(OBD_CONNECT2_DMV_IMP_INHERIT == 0x20000000ULL, "found 0x%.16llxULL\n",
+		 OBD_CONNECT2_DMV_IMP_INHERIT);
 	LASSERTF(OBD_CONNECT2_ENCRYPT_FID2PATH == 0x40000000ULL, "found 0x%.16llxULL\n",
 		 OBD_CONNECT2_ENCRYPT_FID2PATH);
 	LASSERTF(OBD_CKSUM_CRC32 == 0x00000001UL, "found 0x%.8xUL\n",
diff --git a/include/uapi/linux/lustre/lustre_idl.h b/include/uapi/linux/lustre/lustre_idl.h
index d60d1d8..c979e24 100644
--- a/include/uapi/linux/lustre/lustre_idl.h
+++ b/include/uapi/linux/lustre/lustre_idl.h
@@ -784,6 +784,7 @@ struct ptlrpc_body_v2 {
 #define OBD_CONNECT2_LOCK_CONTENTION	0x2000000ULL	/* contention detect */
 #define OBD_CONNECT2_ATOMIC_OPEN_LOCK	0x4000000ULL	/* lock on first open */
 #define OBD_CONNECT2_ENCRYPT_NAME	0x8000000ULL	/* name encrypt */
+#define OBD_CONNECT2_DMV_IMP_INHERIT	0x20000000ULL	/* client handle DMV inheritance */
 #define OBD_CONNECT2_ENCRYPT_FID2PATH	0x40000000ULL	/* fid2path enc file */
 /* XXX README XXX README XXX README XXX README XXX README XXX README XXX
  * Please DO NOT add OBD_CONNECT flags before first ensuring that this value

From patchwork Sun Apr 9 12:13:20 2023
X-Patchwork-Submitter: James Simmons
X-Patchwork-Id: 13205980
From: James Simmons
To: Andreas Dilger, Oleg Drokin, NeilBrown
Date: Sun, 9 Apr 2023 08:13:20 -0400
Message-Id: <1681042400-15491-41-git-send-email-jsimmons@infradead.org>
X-Mailer: git-send-email 1.8.3.1
In-Reply-To: <1681042400-15491-1-git-send-email-jsimmons@infradead.org>
References: <1681042400-15491-1-git-send-email-jsimmons@infradead.org>
Subject: [lustre-devel] [PATCH 40/40] lustre: llite: dir layout inheritance fixes
X-BeenThere: lustre-devel@lists.lustre.org
List-Id: "For discussing Lustre software development."
Cc: Vitaly Fertman, Lustre Development List

From: Vitaly Fertman

Fixes for several minor problems:

- it can happen that the depth is not set on a directory; do not treat
  depth == 0 as a real depth when checking whether the root default is
  applicable;
- the setdirstripe utility implicitly sets max_inherit to 3 for a
  non-striped directory when the -i option is given but -c is not,
  whereas 3 is the default for striped directories only;
- getdirstripe shows inherited default layouts with max_inherit == 0,
  which is no longer meaningful; the same applies to an explicitly set
  default layout on a directory or root with max_inherit == 0;
- getdirstripe hides max_inherit_rr when stripe_offset != -1, since it
  is meaningless there and is reset to 0; however, hiding it confuses
  users.

HPE-bug-id: LUS-11090
WC-bug-id: https://jira.whamcloud.com/browse/LU-16527
Lustre-commit: 6a5a4b49fabcb4c97 ("LU-16527 llite: dir layout inheritance fixes")
Signed-off-by: Vitaly Fertman
Reviewed-on: https://es-gerrit.dev.cray.com/161035
Reviewed-by: Alexander Boyko
Reviewed-by: Alexey Lyashkov
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/49882
Reviewed-by: Lai Siyao
Reviewed-by: Oleg Drokin
Signed-off-by: James Simmons
---
 fs/lustre/llite/dir.c   | 34 +++++++++++++---------------------
 fs/lustre/llite/namei.c | 10 ++++++++--
 2 files changed, 21 insertions(+), 23 deletions(-)

diff --git a/fs/lustre/llite/dir.c b/fs/lustre/llite/dir.c
index 0422701..871dd93 100644
--- a/fs/lustre/llite/dir.c
+++ b/fs/lustre/llite/dir.c
@@ -1695,40 +1695,34 @@ static long ll_dir_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
 
 	/* Get default LMV EA */
 	if (lum.lum_magic == LMV_USER_MAGIC) {
+		struct lmv_user_md *lum;
+		struct ll_inode_info *lli;
+
 		if (lmmsize > sizeof(*ulmv)) {
 			rc = -EINVAL;
 			goto finish_req;
 		}
 
-		if (root_request) {
-			struct lmv_user_md *lum;
-			struct ll_inode_info *lli;
+		lum = (struct lmv_user_md *)lmm;
+		if (lum->lum_max_inherit == LMV_INHERIT_NONE) {
+			rc = -ENODATA;
+			goto finish_req;
+		}
 
-			lum = (struct lmv_user_md *)lmm;
+		if (root_request) {
 			lli = ll_i2info(inode);
 			if (lum->lum_max_inherit != LMV_INHERIT_UNLIMITED) {
-				if (lum->lum_max_inherit ==
-				    LMV_INHERIT_NONE ||
-				    lum->lum_max_inherit <
+				if (lum->lum_max_inherit <
 				    LMV_INHERIT_END ||
 				    lum->lum_max_inherit > LMV_INHERIT_MAX ||
-				    lum->lum_max_inherit <
+				    lum->lum_max_inherit <=
 				    lli->lli_dir_depth) {
 					rc = -ENODATA;
 					goto finish_req;
 				}
 
-				if (lum->lum_max_inherit ==
-				    lli->lli_dir_depth) {
-					lum->lum_max_inherit =
-						LMV_INHERIT_NONE;
-					lum->lum_max_inherit_rr =
-						LMV_INHERIT_RR_NONE;
-					goto out_copy;
-				}
-
 				lum->lum_max_inherit -= lli->lli_dir_depth;
 			}
@@ -1748,10 +1742,8 @@ static long ll_dir_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
 					goto out_copy;
 				}
 
-				if (lum->lum_max_inherit_rr >
-				    lli->lli_dir_depth)
-					lum->lum_max_inherit_rr -=
-						lli->lli_dir_depth;
+				lum->lum_max_inherit_rr -=
+					lli->lli_dir_depth;
 			}
 		}
 out_copy:
diff --git a/fs/lustre/llite/namei.c b/fs/lustre/llite/namei.c
index 0c4c8e6..a19e5f7 100644
--- a/fs/lustre/llite/namei.c
+++ b/fs/lustre/llite/namei.c
@@ -1464,8 +1464,10 @@ static void ll_qos_mkdir_prep(struct md_op_data *op_data, struct inode *dir)
 	struct ll_inode_info *rlli = ll_i2info(root);
 	struct ll_inode_info *lli = ll_i2info(dir);
 	struct lmv_stripe_md *lsm;
+	unsigned short depth;
 
 	op_data->op_dir_depth = lli->lli_inherit_depth ?: lli->lli_dir_depth;
+	depth = lli->lli_dir_depth;
 
 	/* parent directory is striped */
 	if (unlikely(lli->lli_lsm_md))
@@ -1492,13 +1494,17 @@ static void ll_qos_mkdir_prep(struct md_op_data *op_data, struct inode *dir)
 	if (lsm->lsm_md_master_mdt_index != LMV_OFFSET_DEFAULT)
 		goto unlock;
 
+	/**
+	 * Check if the fs default is to be applied.
+	 * depth == 0 means 'not inited' for not root dir.
+	 */
 	if (lsm->lsm_md_max_inherit != LMV_INHERIT_NONE &&
 	    (lsm->lsm_md_max_inherit == LMV_INHERIT_UNLIMITED ||
-	     lsm->lsm_md_max_inherit >= lli->lli_dir_depth)) {
+	     (depth && lsm->lsm_md_max_inherit > depth))) {
 		op_data->op_flags |= MF_QOS_MKDIR;
 		if (lsm->lsm_md_max_inherit_rr != LMV_INHERIT_RR_NONE &&
 		    (lsm->lsm_md_max_inherit_rr == LMV_INHERIT_RR_UNLIMITED ||
-		     lsm->lsm_md_max_inherit_rr >= lli->lli_dir_depth))
+		     (depth && lsm->lsm_md_max_inherit_rr > depth)))
 			op_data->op_flags |= MF_RR_MKDIR;
 		CDEBUG(D_INODE, DFID" requests qos mkdir %#x\n",
 		       PFID(&lli->lli_fid), op_data->op_flags);