From patchwork Sat Jan 15 04:29:08 2022 X-Patchwork-Submitter: Bob Pearson X-Patchwork-Id: 12714322
From: Bob Pearson To: jgg@nvidia.com, zyjzyj2000@gmail.com, linux-rdma@vger.kernel.org Cc: Bob Pearson Subject: [PATCH for-next v9 1/4] RDMA/rxe: Move keyed objects to rxe_mcast.c Date: Fri, 14 Jan 2022 22:29:08 -0600 Message-Id: <20220115042910.40181-2-rpearsonhpe@gmail.com> In-Reply-To: <20220115042910.40181-1-rpearsonhpe@gmail.com> References: <20220115042910.40181-1-rpearsonhpe@gmail.com> List-ID: X-Mailing-List: linux-rdma@vger.kernel.org
Currently there are two different types of objects supported by the pool machinery in rxe_pool.c. One is shared with rdma/core and can be indexed; the other has a key. The keyed type is used only once, by rxe_mcast.c, and is not shared with rdma/core. This patch separates out the keyed type and moves its code into rxe_mcast.c, which will allow simplification of the other object types. rxe_mcast.c is mostly rewritten.
Suggested-by: Jason Gunthorpe Signed-off-by: Bob Pearson --- drivers/infiniband/sw/rxe/rxe.c | 19 +- drivers/infiniband/sw/rxe/rxe_loc.h | 19 +- drivers/infiniband/sw/rxe/rxe_mcast.c | 436 +++++++++++++++++++------- drivers/infiniband/sw/rxe/rxe_net.c | 18 -- drivers/infiniband/sw/rxe/rxe_pool.c | 135 -------- drivers/infiniband/sw/rxe/rxe_pool.h | 39 --- drivers/infiniband/sw/rxe/rxe_qp.c | 5 +- drivers/infiniband/sw/rxe/rxe_recv.c | 28 +- drivers/infiniband/sw/rxe/rxe_verbs.c | 26 -- drivers/infiniband/sw/rxe/rxe_verbs.h | 29 +- 10 files changed, 367 insertions(+), 387 deletions(-) diff --git a/drivers/infiniband/sw/rxe/rxe.c b/drivers/infiniband/sw/rxe/rxe.c index fab291245366..c560d467a972 100644 --- a/drivers/infiniband/sw/rxe/rxe.c +++ b/drivers/infiniband/sw/rxe/rxe.c @@ -28,8 +28,6 @@ void rxe_dealloc(struct ib_device *ib_dev) rxe_pool_cleanup(&rxe->cq_pool); rxe_pool_cleanup(&rxe->mr_pool); rxe_pool_cleanup(&rxe->mw_pool); - rxe_pool_cleanup(&rxe->mc_grp_pool); - rxe_pool_cleanup(&rxe->mc_elem_pool); if (rxe->tfm) crypto_free_shash(rxe->tfm); @@ -158,22 +156,8 @@ static int rxe_init_pools(struct rxe_dev *rxe) if (err) goto err8; - err = rxe_pool_init(rxe, &rxe->mc_grp_pool, RXE_TYPE_MC_GRP, - rxe->attr.max_mcast_grp); - if (err) - goto err9; - - err = rxe_pool_init(rxe, &rxe->mc_elem_pool, RXE_TYPE_MC_ELEM, - rxe->attr.max_total_mcast_qp_attach); - if (err) - goto err10; - return 0; -err10: - rxe_pool_cleanup(&rxe->mc_grp_pool); -err9: - rxe_pool_cleanup(&rxe->mw_pool); err8: rxe_pool_cleanup(&rxe->mr_pool); err7: @@ -206,6 +190,9 @@ static int rxe_init(struct rxe_dev *rxe) if (err) return err; + spin_lock_init(&rxe->mcg_lock); + rxe->mcg_tree = RB_ROOT; + /* init pending mmap list */ spin_lock_init(&rxe->mmap_offset_lock); spin_lock_init(&rxe->pending_lock); diff --git a/drivers/infiniband/sw/rxe/rxe_loc.h b/drivers/infiniband/sw/rxe/rxe_loc.h index b1e174afb1d4..805f40f84e62 100644 --- a/drivers/infiniband/sw/rxe/rxe_loc.h +++ b/drivers/infiniband/sw/rxe/rxe_loc.h @@ -40,18 +40,11 @@ void rxe_cq_disable(struct rxe_cq *cq); void rxe_cq_cleanup(struct rxe_pool_elem *arg); /* rxe_mcast.c */ -int rxe_mcast_get_grp(struct rxe_dev *rxe, union ib_gid *mgid, - struct rxe_mc_grp **grp_p); - -int rxe_mcast_add_grp_elem(struct rxe_dev *rxe, struct rxe_qp *qp, - struct rxe_mc_grp *grp); - -int rxe_mcast_drop_grp_elem(struct rxe_dev *rxe, struct rxe_qp *qp, - union ib_gid *mgid); - 
-void rxe_drop_all_mcast_groups(struct rxe_qp *qp); - -void rxe_mc_cleanup(struct rxe_pool_elem *arg); +struct rxe_mcg *rxe_lookup_mcg(struct rxe_dev *rxe, union ib_gid *mgid); +int rxe_attach_mcast(struct ib_qp *ibqp, union ib_gid *mgid, u16 mlid); +int rxe_detach_mcast(struct ib_qp *ibqp, union ib_gid *mgid, u16 mlid); +void rxe_cleanup_mcast(struct rxe_qp *qp); +void rxe_cleanup_mcg(struct kref *kref); /* rxe_mmap.c */ struct rxe_mmap_info { @@ -106,8 +99,6 @@ int rxe_prepare(struct rxe_pkt_info *pkt, struct sk_buff *skb); int rxe_xmit_packet(struct rxe_qp *qp, struct rxe_pkt_info *pkt, struct sk_buff *skb); const char *rxe_parent_name(struct rxe_dev *rxe, unsigned int port_num); -int rxe_mcast_add(struct rxe_dev *rxe, union ib_gid *mgid); -int rxe_mcast_delete(struct rxe_dev *rxe, union ib_gid *mgid); /* rxe_qp.c */ int rxe_qp_chk_init(struct rxe_dev *rxe, struct ib_qp_init_attr *init); diff --git a/drivers/infiniband/sw/rxe/rxe_mcast.c b/drivers/infiniband/sw/rxe/rxe_mcast.c index bd1ac88b8700..95eb8c9ccda0 100644 --- a/drivers/infiniband/sw/rxe/rxe_mcast.c +++ b/drivers/infiniband/sw/rxe/rxe_mcast.c @@ -1,178 +1,398 @@ // SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB /* + * Copyright (c) 2022 Hewlett Packard Enterprise, Inc. All rights reserved. * Copyright (c) 2016 Mellanox Technologies Ltd. All rights reserved. * Copyright (c) 2015 System Fabric Works, Inc. All rights reserved. */ +/* + * rxe_mcast.c implements driver support for multicast transport. + * It is based on two data structures: struct rxe_mcg ('mcg') and + * struct rxe_mca ('mca'). An mcg is allocated the first time a qp is + * attached to a new mgid. mcg's are held in a red-black + * tree and indexed by the mgid. This tree is searched for + * the mcast group when a multicast packet is received and when another + * qp is attached to the same mgid. The mcg is cleaned up when the last qp + * is detached from it. Each time a qp is attached to an mcg + * an mca is created to hold pointers to the qp and + * the mcg and is added to two lists. One is the list of mcg's + * the qp is attached to and the other is the list of qp's attached + * to the mcg. mcg's are reference counted and once the count goes to + * zero the mcg is inactive and will be cleaned up. + * + * The qp list is protected by mcg->lock while the other data + * structures are protected by rxe->mcg_lock. The performance-critical + * path of processing multicast packets only requires holding mcg->lock + * while the multicast-related verbs APIs require holding both locks.
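+ *
+ * Pictorially (an illustrative summary of the description above, for
+ * one mcg with two attached qps):
+ *
+ *	rxe->mcg_tree (rb-tree, keyed by mgid)
+ *	      |
+ *	     mcg --- mcg->qp_list: mca --- mca
+ *	                            |       |
+ *	                           qp0     qp1
+ *
+ * Each mca sits on its mcg's qp_list and on its qp's mcg_list and
+ * holds a reference on both the mcg and the qp.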
+ */ + #include "rxe.h" -#include "rxe_loc.h" -/* caller should hold mc_grp_pool->pool_lock */ -static struct rxe_mc_grp *create_grp(struct rxe_dev *rxe, - struct rxe_pool *pool, - union ib_gid *mgid) +static int rxe_mcast_add(struct rxe_dev *rxe, union ib_gid *mgid) { - int err; - struct rxe_mc_grp *grp; + unsigned char ll_addr[ETH_ALEN]; + + ipv6_eth_mc_map((struct in6_addr *)mgid->raw, ll_addr); - grp = rxe_alloc_locked(&rxe->mc_grp_pool); - if (!grp) - return ERR_PTR(-ENOMEM); + return dev_mc_add(rxe->ndev, ll_addr); +} - INIT_LIST_HEAD(&grp->qp_list); - spin_lock_init(&grp->mcg_lock); - grp->rxe = rxe; - rxe_add_key_locked(grp, mgid); +static int rxe_mcast_delete(struct rxe_dev *rxe, union ib_gid *mgid) +{ + unsigned char ll_addr[ETH_ALEN]; - err = rxe_mcast_add(rxe, mgid); - if (unlikely(err)) { - rxe_drop_key_locked(grp); - rxe_drop_ref(grp); - return ERR_PTR(err); + ipv6_eth_mc_map((struct in6_addr *)mgid->raw, ll_addr); + + return dev_mc_del(rxe->ndev, ll_addr); +} +/** + * __rxe_insert_mcg() - insert an mcg into red-black tree (rxe->mcg_tree) + * @mcg: mcast group object with an embedded red-black tree node + * + * Context: caller must hold rxe->mcg_lock and must first search + * the tree to see if the mcg is already present. + */ +static void __rxe_insert_mcg(struct rxe_mcg *mcg) +{ + struct rb_root *tree = &mcg->rxe->mcg_tree; + struct rb_node **link = &tree->rb_node; + struct rb_node *node = NULL; + struct rxe_mcg *tmp; + int cmp; + + while (*link) { + node = *link; + tmp = rb_entry(node, struct rxe_mcg, node); + + cmp = memcmp(&tmp->mgid, &mcg->mgid, sizeof(mcg->mgid)); + if (cmp > 0) + link = &(*link)->rb_left; + else + link = &(*link)->rb_right; } - return grp; + rb_link_node(&mcg->node, node, link); + rb_insert_color(&mcg->node, tree); } -int rxe_mcast_get_grp(struct rxe_dev *rxe, union ib_gid *mgid, - struct rxe_mc_grp **grp_p) +static void __rxe_remove_mcg(struct rxe_mcg *mcg) { + rb_erase(&mcg->node, &mcg->rxe->mcg_tree); +} + +/** + * __rxe_lookup_mcg() - look up mcg in rxe->mcg_tree while holding lock + * @rxe: rxe device object + * @mgid: multicast IP address + * + * Context: caller must hold rxe->mcg_lock + * Returns: mcg on success or NULL + */ +static struct rxe_mcg *__rxe_lookup_mcg(struct rxe_dev *rxe, + union ib_gid *mgid) +{ + struct rb_root *tree = &rxe->mcg_tree; + struct rxe_mcg *mcg; + struct rb_node *node; + int cmp; + + node = tree->rb_node; + + while (node) { + mcg = rb_entry(node, struct rxe_mcg, node); + + cmp = memcmp(&mcg->mgid, mgid, sizeof(*mgid)); + + if (cmp > 0) + node = node->rb_left; + else if (cmp < 0) + node = node->rb_right; + else + break; + } + + if (node && kref_get_unless_zero(&mcg->ref_cnt)) + return mcg; + + return NULL; +} + +/** + * rxe_lookup_mcg() - look up mcast group from mgid + * @rxe: rxe device object + * @mgid: multicast IP address + * + * Returns: mcg if found else NULL + */ +struct rxe_mcg *rxe_lookup_mcg(struct rxe_dev *rxe, + union ib_gid *mgid) +{ + struct rxe_mcg *mcg; + + spin_lock_bh(&rxe->mcg_lock); + mcg = __rxe_lookup_mcg(rxe, mgid); + spin_unlock_bh(&rxe->mcg_lock); + + return mcg; +} + +/** + * rxe_get_mcg() - look up or allocate an mcg + * @rxe: rxe device object + * @mgid: multicast IP address + * @mcgp: address of returned mcg value + * + * Returns: 0 on success else an error + */ +static int rxe_get_mcg(struct rxe_dev *rxe, union ib_gid *mgid, + struct rxe_mcg **mcgp) +{ + struct rxe_mcg *mcg, *tmp; int err; - struct rxe_mc_grp *grp; - struct rxe_pool *pool = &rxe->mc_grp_pool; - if 
(rxe->attr.max_mcast_qp_attach == 0) + if (rxe->attr.max_mcast_grp == 0) return -EINVAL; - write_lock_bh(&pool->pool_lock); + mcg = rxe_lookup_mcg(rxe, mgid); + if (mcg) + goto done; + + mcg = kzalloc(sizeof(*mcg), GFP_KERNEL); + if (!mcg) + return -ENOMEM; - grp = rxe_pool_get_key_locked(pool, mgid); - if (grp) + spin_lock_bh(&rxe->mcg_lock); + tmp = __rxe_lookup_mcg(rxe, mgid); + if (unlikely(tmp)) { + /* another thread just added this mcg, use that one */ + spin_unlock_bh(&rxe->mcg_lock); + kfree(mcg); + mcg = tmp; goto done; + } - grp = create_grp(rxe, pool, mgid); - if (IS_ERR(grp)) { - write_unlock_bh(&pool->pool_lock); - err = PTR_ERR(grp); - return err; + if (rxe->num_mcg >= rxe->attr.max_mcast_grp) { + err = -ENOMEM; + goto err_out; } + err = rxe_mcast_add(rxe, mgid); + if (unlikely(err)) + goto err_out; + + INIT_LIST_HEAD(&mcg->qp_list); + spin_lock_init(&mcg->lock); + mcg->rxe = rxe; + memcpy(&mcg->mgid, mgid, sizeof(*mgid)); + kref_init(&mcg->ref_cnt); + __rxe_insert_mcg(mcg); + spin_unlock_bh(&rxe->mcg_lock); done: - write_unlock_bh(&pool->pool_lock); - *grp_p = grp; + *mcgp = mcg; return 0; +err_out: + spin_unlock_bh(&rxe->mcg_lock); + kfree(mcg); + return err; } -int rxe_mcast_add_grp_elem(struct rxe_dev *rxe, struct rxe_qp *qp, - struct rxe_mc_grp *grp) +/** + * rxe_attach_mcg() - attach qp to mcg + * @qp: qp object + * @mcg: mcg object + * + * Context: caller must hold reference on qp and mcg. + * Returns: 0 on success else an error + */ +static int rxe_attach_mcg(struct rxe_qp *qp, struct rxe_mcg *mcg) { + struct rxe_dev *rxe = to_rdev(qp->ibqp.device); + struct rxe_mca *mca; int err; - struct rxe_mc_elem *elem; - /* check to see of the qp is already a member of the group */ - spin_lock_bh(&qp->grp_lock); - spin_lock_bh(&grp->mcg_lock); - list_for_each_entry(elem, &grp->qp_list, qp_list) { - if (elem->qp == qp) { + spin_lock_bh(&rxe->mcg_lock); + spin_lock_bh(&mcg->lock); + list_for_each_entry(mca, &mcg->qp_list, qp_list) { + if (mca->qp == qp) { err = 0; goto out; } } - if (grp->num_qp >= rxe->attr.max_mcast_qp_attach) { + if (rxe->num_attach >= rxe->attr.max_total_mcast_qp_attach || mcg->num_qp >= rxe->attr.max_mcast_qp_attach) { err = -ENOMEM; goto out; } - elem = rxe_alloc_locked(&rxe->mc_elem_pool); - if (!elem) { + mca = kzalloc(sizeof(*mca), GFP_KERNEL); + if (!mca) { err = -ENOMEM; goto out; } - /* each qp holds a ref on the grp */ - rxe_add_ref(grp); + /* each mca holds a ref on mcg and qp */ + kref_get(&mcg->ref_cnt); + rxe_add_ref(qp); - grp->num_qp++; - elem->qp = qp; - elem->grp = grp; + mcg->num_qp++; + rxe->num_attach++; + mca->qp = qp; + mca->mcg = mcg; - list_add(&elem->qp_list, &grp->qp_list); - list_add(&elem->grp_list, &qp->grp_list); + list_add(&mca->qp_list, &mcg->qp_list); + list_add(&mca->mcg_list, &qp->mcg_list); err = 0; out: - spin_unlock_bh(&grp->mcg_lock); - spin_unlock_bh(&qp->grp_lock); + spin_unlock_bh(&mcg->lock); + spin_unlock_bh(&rxe->mcg_lock); return err; } -int rxe_mcast_drop_grp_elem(struct rxe_dev *rxe, struct rxe_qp *qp, - union ib_gid *mgid) +/** + * __rxe_cleanup_mca() - cleanup mca object + * @mca: mca object + * + * Context: caller holds rxe->mcg_lock and holds at least one reference + * to mca->mcg from the mca object and one from the rxe_get_mcg() + * call. If this is the last attachment to the mcast mcg object then + * drop the last reference to it.
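+ *
+ * Ref accounting note (added for clarity): the mca's kref on the mcg
+ * is dropped here; if this was the last attachment (num_qp reaches
+ * zero) the initial reference from kref_init() in rxe_get_mcg() is
+ * dropped as well, so rxe_cleanup_mcg() runs once any concurrent
+ * lookup references are also gone.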
+ */ +static void __rxe_cleanup_mca(struct rxe_mca *mca) { - struct rxe_mc_grp *grp; - struct rxe_mc_elem *elem, *tmp; - - grp = rxe_pool_get_key(&rxe->mc_grp_pool, mgid); - if (!grp) - goto err1; - - spin_lock_bh(&qp->grp_lock); - spin_lock_bh(&grp->mcg_lock); - - list_for_each_entry_safe(elem, tmp, &grp->qp_list, qp_list) { - if (elem->qp == qp) { - list_del(&elem->qp_list); - list_del(&elem->grp_list); - grp->num_qp--; - - spin_unlock_bh(&grp->mcg_lock); - spin_unlock_bh(&qp->grp_lock); - rxe_drop_ref(elem); - rxe_drop_ref(grp); /* ref held by QP */ - rxe_drop_ref(grp); /* ref from get_key */ - return 0; + struct rxe_mcg *mcg = mca->mcg; + struct rxe_dev *rxe = mcg->rxe; + + list_del(&mca->qp_list); + list_del(&mca->mcg_list); + rxe_drop_ref(mca->qp); + kfree(mca); + kref_put(&mcg->ref_cnt, rxe_cleanup_mcg); + rxe->num_attach--; + if (--mcg->num_qp <= 0) + kref_put(&mcg->ref_cnt, rxe_cleanup_mcg); +} + +/** + * rxe_detach_mcg() - detach qp from mcg + * @qp: qp object + * @mcg: mcg object + * + * Context: caller must hold reference to qp and mcg. + * Returns: 0 on success else an error. + */ +static int rxe_detach_mcg(struct rxe_qp *qp, struct rxe_mcg *mcg) +{ + struct rxe_dev *rxe = to_rdev(qp->ibqp.device); + struct rxe_mca *mca, *tmp; + int ret = -EINVAL; + + spin_lock_bh(&rxe->mcg_lock); + spin_lock_bh(&mcg->lock); + + list_for_each_entry_safe(mca, tmp, &mcg->qp_list, qp_list) { + if (mca->qp == qp) { + __rxe_cleanup_mca(mca); + ret = 0; + goto done; } } +done: + spin_unlock_bh(&mcg->lock); + spin_unlock_bh(&rxe->mcg_lock); + return ret; +} + +/** + * rxe_attach_mcast() - attach qp to multicast group (see IBA-11.3.1) + * @ibqp: (IB) qp object + * @mgid: multicast IP address + * @mlid: multicast LID, ignored for RoCEv2 (see IBA-A17.5.6) + * + * Returns: 0 on success else an errno + */ +int rxe_attach_mcast(struct ib_qp *ibqp, union ib_gid *mgid, u16 mlid) +{ + struct rxe_dev *rxe = to_rdev(ibqp->device); + struct rxe_qp *qp = to_rqp(ibqp); + struct rxe_mcg *mcg; + int err; + + err = rxe_get_mcg(rxe, mgid, &mcg); + if (err) + return err; + + err = rxe_attach_mcg(qp, mcg); + kref_put(&mcg->ref_cnt, rxe_cleanup_mcg); - spin_unlock_bh(&grp->mcg_lock); - spin_unlock_bh(&qp->grp_lock); - rxe_drop_ref(grp); /* ref from get_key */ -err1: - return -EINVAL; + return err; } -void rxe_drop_all_mcast_groups(struct rxe_qp *qp) +/** + * rxe_detach_mcast() - detach qp from multicast group (see IBA-11.3.2) + * @ibqp: address of (IB) qp object + * @mgid: multicast IP address + * @mlid: multicast LID, ignored for RoCEv2 (see IBA-A17.5.6) + * + * Returns: 0 on success else an errno + */ +int rxe_detach_mcast(struct ib_qp *ibqp, union ib_gid *mgid, u16 mlid) { - struct rxe_mc_grp *grp; - struct rxe_mc_elem *elem; + struct rxe_dev *rxe = to_rdev(ibqp->device); + struct rxe_qp *qp = to_rqp(ibqp); + struct rxe_mcg *mcg; + int err; + + mcg = rxe_lookup_mcg(rxe, mgid); + if (!mcg) + return -EINVAL; + + err = rxe_detach_mcg(qp, mcg); + kref_put(&mcg->ref_cnt, rxe_cleanup_mcg); + + return err; +} + +/** + * rxe_cleanup_mcast() - cleanup all mcg's qp is attached to + * @qp: qp object + */ +void rxe_cleanup_mcast(struct rxe_qp *qp) +{ + struct rxe_dev *rxe = to_rdev(qp->ibqp.device); + struct rxe_mca *mca; + struct rxe_mcg *mcg; while (1) { - spin_lock_bh(&qp->grp_lock); - if (list_empty(&qp->grp_list)) { - spin_unlock_bh(&qp->grp_lock); - break; + spin_lock_bh(&rxe->mcg_lock); + if (list_empty(&qp->mcg_list)) { + spin_unlock_bh(&rxe->mcg_lock); + return; } - elem = list_first_entry(&qp->grp_list, struct 
rxe_mc_elem, - grp_list); - list_del(&elem->grp_list); - spin_unlock_bh(&qp->grp_lock); - - grp = elem->grp; - spin_lock_bh(&grp->mcg_lock); - list_del(&elem->qp_list); - grp->num_qp--; - spin_unlock_bh(&grp->mcg_lock); - rxe_drop_ref(grp); - rxe_drop_ref(elem); + mca = list_first_entry(&qp->mcg_list, typeof(*mca), mcg_list); + mcg = mca->mcg; + spin_lock_bh(&mcg->lock); + __rxe_cleanup_mca(mca); + spin_unlock_bh(&mcg->lock); + spin_unlock_bh(&rxe->mcg_lock); } } -void rxe_mc_cleanup(struct rxe_pool_elem *elem) +/** + * rxe_cleanup_mcg() - cleanup mcg object + * @mcg: mcg object + * + * Context: caller has removed all references to mcg + */ +void rxe_cleanup_mcg(struct kref *kref) { - struct rxe_mc_grp *grp = container_of(elem, typeof(*grp), elem); - struct rxe_dev *rxe = grp->rxe; + struct rxe_mcg *mcg = container_of(kref, typeof(*mcg), ref_cnt); + struct rxe_dev *rxe = mcg->rxe; - rxe_drop_key(grp); - rxe_mcast_delete(rxe, &grp->mgid); + __rxe_remove_mcg(mcg); + rxe_mcast_delete(rxe, &mcg->mgid); + kfree(mcg); } diff --git a/drivers/infiniband/sw/rxe/rxe_net.c b/drivers/infiniband/sw/rxe/rxe_net.c index be72bdbfb4ba..a8cfa7160478 100644 --- a/drivers/infiniband/sw/rxe/rxe_net.c +++ b/drivers/infiniband/sw/rxe/rxe_net.c @@ -20,24 +20,6 @@ static struct rxe_recv_sockets recv_sockets; -int rxe_mcast_add(struct rxe_dev *rxe, union ib_gid *mgid) -{ - unsigned char ll_addr[ETH_ALEN]; - - ipv6_eth_mc_map((struct in6_addr *)mgid->raw, ll_addr); - - return dev_mc_add(rxe->ndev, ll_addr); -} - -int rxe_mcast_delete(struct rxe_dev *rxe, union ib_gid *mgid) -{ - unsigned char ll_addr[ETH_ALEN]; - - ipv6_eth_mc_map((struct in6_addr *)mgid->raw, ll_addr); - - return dev_mc_del(rxe->ndev, ll_addr); -} - static struct dst_entry *rxe_find_route4(struct net_device *ndev, struct in_addr *saddr, struct in_addr *daddr) diff --git a/drivers/infiniband/sw/rxe/rxe_pool.c b/drivers/infiniband/sw/rxe/rxe_pool.c index 4cb003885e00..4e558d5e0ded 100644 --- a/drivers/infiniband/sw/rxe/rxe_pool.c +++ b/drivers/infiniband/sw/rxe/rxe_pool.c @@ -81,20 +81,6 @@ static const struct rxe_type_info { .min_index = RXE_MIN_MW_INDEX, .max_index = RXE_MAX_MW_INDEX, }, - [RXE_TYPE_MC_GRP] = { - .name = "rxe-mc_grp", - .size = sizeof(struct rxe_mc_grp), - .elem_offset = offsetof(struct rxe_mc_grp, elem), - .cleanup = rxe_mc_cleanup, - .flags = RXE_POOL_KEY, - .key_offset = offsetof(struct rxe_mc_grp, mgid), - .key_size = sizeof(union ib_gid), - }, - [RXE_TYPE_MC_ELEM] = { - .name = "rxe-mc_elem", - .size = sizeof(struct rxe_mc_elem), - .elem_offset = offsetof(struct rxe_mc_elem, elem), - }, }; static int rxe_pool_init_index(struct rxe_pool *pool, u32 max, u32 min) @@ -152,12 +138,6 @@ int rxe_pool_init( goto out; } - if (pool->flags & RXE_POOL_KEY) { - pool->key.tree = RB_ROOT; - pool->key.key_offset = info->key_offset; - pool->key.key_size = info->key_size; - } - out: return err; } @@ -214,77 +194,6 @@ static int rxe_insert_index(struct rxe_pool *pool, struct rxe_pool_elem *new) return 0; } -static int rxe_insert_key(struct rxe_pool *pool, struct rxe_pool_elem *new) -{ - struct rb_node **link = &pool->key.tree.rb_node; - struct rb_node *parent = NULL; - struct rxe_pool_elem *elem; - int cmp; - - while (*link) { - parent = *link; - elem = rb_entry(parent, struct rxe_pool_elem, key_node); - - cmp = memcmp((u8 *)elem + pool->key.key_offset, - (u8 *)new + pool->key.key_offset, - pool->key.key_size); - - if (cmp == 0) { - pr_warn("key already exists!\n"); - return -EINVAL; - } - - if (cmp > 0) - link = &(*link)->rb_left; - else - link 
= &(*link)->rb_right; - } - - rb_link_node(&new->key_node, parent, link); - rb_insert_color(&new->key_node, &pool->key.tree); - - return 0; -} - -int __rxe_add_key_locked(struct rxe_pool_elem *elem, void *key) -{ - struct rxe_pool *pool = elem->pool; - int err; - - memcpy((u8 *)elem + pool->key.key_offset, key, pool->key.key_size); - err = rxe_insert_key(pool, elem); - - return err; -} - -int __rxe_add_key(struct rxe_pool_elem *elem, void *key) -{ - struct rxe_pool *pool = elem->pool; - int err; - - write_lock_bh(&pool->pool_lock); - err = __rxe_add_key_locked(elem, key); - write_unlock_bh(&pool->pool_lock); - - return err; -} - -void __rxe_drop_key_locked(struct rxe_pool_elem *elem) -{ - struct rxe_pool *pool = elem->pool; - - rb_erase(&elem->key_node, &pool->key.tree); -} - -void __rxe_drop_key(struct rxe_pool_elem *elem) -{ - struct rxe_pool *pool = elem->pool; - - write_lock_bh(&pool->pool_lock); - __rxe_drop_key_locked(elem); - write_unlock_bh(&pool->pool_lock); -} - int __rxe_add_index_locked(struct rxe_pool_elem *elem) { struct rxe_pool *pool = elem->pool; @@ -448,47 +357,3 @@ void *rxe_pool_get_index(struct rxe_pool *pool, u32 index) return obj; } - -void *rxe_pool_get_key_locked(struct rxe_pool *pool, void *key) -{ - struct rb_node *node; - struct rxe_pool_elem *elem; - void *obj; - int cmp; - - node = pool->key.tree.rb_node; - - while (node) { - elem = rb_entry(node, struct rxe_pool_elem, key_node); - - cmp = memcmp((u8 *)elem + pool->key.key_offset, - key, pool->key.key_size); - - if (cmp > 0) - node = node->rb_left; - else if (cmp < 0) - node = node->rb_right; - else - break; - } - - if (node) { - kref_get(&elem->ref_cnt); - obj = elem->obj; - } else { - obj = NULL; - } - - return obj; -} - -void *rxe_pool_get_key(struct rxe_pool *pool, void *key) -{ - void *obj; - - read_lock_bh(&pool->pool_lock); - obj = rxe_pool_get_key_locked(pool, key); - read_unlock_bh(&pool->pool_lock); - - return obj; -} diff --git a/drivers/infiniband/sw/rxe/rxe_pool.h b/drivers/infiniband/sw/rxe/rxe_pool.h index 214279310f4d..b6de415e10d2 100644 --- a/drivers/infiniband/sw/rxe/rxe_pool.h +++ b/drivers/infiniband/sw/rxe/rxe_pool.h @@ -9,7 +9,6 @@ enum rxe_pool_flags { RXE_POOL_INDEX = BIT(1), - RXE_POOL_KEY = BIT(2), RXE_POOL_NO_ALLOC = BIT(4), }; @@ -23,7 +22,6 @@ enum rxe_elem_type { RXE_TYPE_MR, RXE_TYPE_MW, RXE_TYPE_MC_GRP, - RXE_TYPE_MC_ELEM, RXE_NUM_TYPES, /* keep me last */ }; @@ -33,9 +31,6 @@ struct rxe_pool_elem { struct kref ref_cnt; struct list_head list; - /* only used if keyed */ - struct rb_node key_node; - /* only used if indexed */ struct rb_node index_node; u32 index; @@ -62,13 +57,6 @@ struct rxe_pool { u32 max_index; u32 min_index; } index; - - /* only used if keyed */ - struct { - struct rb_root tree; - size_t key_offset; - size_t key_size; - } key; }; /* initialize a pool of objects with given limit on @@ -113,26 +101,6 @@ void __rxe_drop_index(struct rxe_pool_elem *elem); #define rxe_drop_index(obj) __rxe_drop_index(&(obj)->elem) -/* assign a key to a keyed object and insert object into - * pool's rb tree holding and not holding pool_lock - */ -int __rxe_add_key_locked(struct rxe_pool_elem *elem, void *key); - -#define rxe_add_key_locked(obj, key) __rxe_add_key_locked(&(obj)->elem, key) - -int __rxe_add_key(struct rxe_pool_elem *elem, void *key); - -#define rxe_add_key(obj, key) __rxe_add_key(&(obj)->elem, key) - -/* remove elem from rb tree holding and not holding the pool_lock */ -void __rxe_drop_key_locked(struct rxe_pool_elem *elem); - -#define rxe_drop_key_locked(obj) 
__rxe_drop_key_locked(&(obj)->elem) - -void __rxe_drop_key(struct rxe_pool_elem *elem); - -#define rxe_drop_key(obj) __rxe_drop_key(&(obj)->elem) - /* lookup an indexed object from index holding and not holding the pool_lock. * takes a reference on object */ @@ -140,13 +108,6 @@ void *rxe_pool_get_index_locked(struct rxe_pool *pool, u32 index); void *rxe_pool_get_index(struct rxe_pool *pool, u32 index); -/* lookup keyed object from key holding and not holding the pool_lock. - * takes a reference on the objecti - */ -void *rxe_pool_get_key_locked(struct rxe_pool *pool, void *key); - -void *rxe_pool_get_key(struct rxe_pool *pool, void *key); - /* cleanup an object when all references are dropped */ void rxe_elem_release(struct kref *kref); diff --git a/drivers/infiniband/sw/rxe/rxe_qp.c b/drivers/infiniband/sw/rxe/rxe_qp.c index afe11f475b8c..4c0cea0833ee 100644 --- a/drivers/infiniband/sw/rxe/rxe_qp.c +++ b/drivers/infiniband/sw/rxe/rxe_qp.c @@ -188,9 +188,8 @@ static void rxe_qp_init_misc(struct rxe_dev *rxe, struct rxe_qp *qp, break; } - INIT_LIST_HEAD(&qp->grp_list); + INIT_LIST_HEAD(&qp->mcg_list); - spin_lock_init(&qp->grp_lock); spin_lock_init(&qp->state_lock); atomic_set(&qp->ssn, 0); @@ -799,7 +798,7 @@ static void rxe_qp_do_cleanup(struct work_struct *work) { struct rxe_qp *qp = container_of(work, typeof(*qp), cleanup_work.work); - rxe_drop_all_mcast_groups(qp); + rxe_cleanup_mcast(qp); if (qp->sq.queue) rxe_queue_cleanup(qp->sq.queue); diff --git a/drivers/infiniband/sw/rxe/rxe_recv.c b/drivers/infiniband/sw/rxe/rxe_recv.c index 6a6cc1fa90e4..78681f25a6d9 100644 --- a/drivers/infiniband/sw/rxe/rxe_recv.c +++ b/drivers/infiniband/sw/rxe/rxe_recv.c @@ -233,32 +233,33 @@ static inline void rxe_rcv_pkt(struct rxe_pkt_info *pkt, struct sk_buff *skb) static void rxe_rcv_mcast_pkt(struct rxe_dev *rxe, struct sk_buff *skb) { struct rxe_pkt_info *pkt = SKB_TO_PKT(skb); - struct rxe_mc_grp *mcg; - struct rxe_mc_elem *mce; + struct rxe_mcg *mcg; + struct rxe_mca *mca, *last; struct rxe_qp *qp; - union ib_gid dgid; + union ib_gid mgid; int err; if (skb->protocol == htons(ETH_P_IP)) ipv6_addr_set_v4mapped(ip_hdr(skb)->daddr, - (struct in6_addr *)&dgid); + (struct in6_addr *)&mgid); else if (skb->protocol == htons(ETH_P_IPV6)) - memcpy(&dgid, &ipv6_hdr(skb)->daddr, sizeof(dgid)); + memcpy(&mgid, &ipv6_hdr(skb)->daddr, sizeof(mgid)); - /* lookup mcast group corresponding to mgid, takes a ref */ - mcg = rxe_pool_get_key(&rxe->mc_grp_pool, &dgid); + mcg = rxe_lookup_mcg(rxe, &mgid); if (!mcg) goto drop; /* mcast group not registered */ - spin_lock_bh(&mcg->mcg_lock); + spin_lock_bh(&mcg->lock); + + last = list_last_entry(&mcg->qp_list, typeof(*last), qp_list); /* this is unreliable datagram service so we let * failures to deliver a multicast packet to a * single QP happen and just move on and try * the rest of them on the list */ - list_for_each_entry(mce, &mcg->qp_list, qp_list) { - qp = mce->qp; + list_for_each_entry(mca, &mcg->qp_list, qp_list) { + qp = mca->qp; /* validate qp for incoming packet */ err = check_type_state(rxe, pkt, qp); @@ -273,7 +274,7 @@ static void rxe_rcv_mcast_pkt(struct rxe_dev *rxe, struct sk_buff *skb) * skb and pass to the QP. Pass the original skb to * the last QP in the list. 
*/ - if (mce->qp_list.next != &mcg->qp_list) { + if (mca != last) { struct sk_buff *cskb; struct rxe_pkt_info *cpkt; @@ -298,9 +299,8 @@ static void rxe_rcv_mcast_pkt(struct rxe_dev *rxe, struct sk_buff *skb) } } - spin_unlock_bh(&mcg->mcg_lock); - - rxe_drop_ref(mcg); /* drop ref from rxe_pool_get_key. */ + spin_unlock_bh(&mcg->lock); + kref_put(&mcg->ref_cnt, rxe_cleanup_mcg); if (likely(!skb)) return; diff --git a/drivers/infiniband/sw/rxe/rxe_verbs.c b/drivers/infiniband/sw/rxe/rxe_verbs.c index 915ad6664321..f7682541f9af 100644 --- a/drivers/infiniband/sw/rxe/rxe_verbs.c +++ b/drivers/infiniband/sw/rxe/rxe_verbs.c @@ -999,32 +999,6 @@ static int rxe_map_mr_sg(struct ib_mr *ibmr, struct scatterlist *sg, return n; } -static int rxe_attach_mcast(struct ib_qp *ibqp, union ib_gid *mgid, u16 mlid) -{ - int err; - struct rxe_dev *rxe = to_rdev(ibqp->device); - struct rxe_qp *qp = to_rqp(ibqp); - struct rxe_mc_grp *grp; - - /* takes a ref on grp if successful */ - err = rxe_mcast_get_grp(rxe, mgid, &grp); - if (err) - return err; - - err = rxe_mcast_add_grp_elem(rxe, qp, grp); - - rxe_drop_ref(grp); - return err; -} - -static int rxe_detach_mcast(struct ib_qp *ibqp, union ib_gid *mgid, u16 mlid) -{ - struct rxe_dev *rxe = to_rdev(ibqp->device); - struct rxe_qp *qp = to_rqp(ibqp); - - return rxe_mcast_drop_grp_elem(rxe, qp, mgid); -} - static ssize_t parent_show(struct device *device, struct device_attribute *attr, char *buf) { diff --git a/drivers/infiniband/sw/rxe/rxe_verbs.h b/drivers/infiniband/sw/rxe/rxe_verbs.h index e48969e8d4c8..e2431753755e 100644 --- a/drivers/infiniband/sw/rxe/rxe_verbs.h +++ b/drivers/infiniband/sw/rxe/rxe_verbs.h @@ -232,9 +232,7 @@ struct rxe_qp { struct rxe_av pri_av; struct rxe_av alt_av; - /* list of mcast groups qp has joined (for cleanup) */ - struct list_head grp_list; - spinlock_t grp_lock; /* guard grp_list */ + struct list_head mcg_list; struct sk_buff_head req_pkts; struct sk_buff_head resp_pkts; @@ -353,23 +351,23 @@ struct rxe_mw { u64 length; }; -struct rxe_mc_grp { - struct rxe_pool_elem elem; - spinlock_t mcg_lock; /* guard group */ - struct rxe_dev *rxe; - struct list_head qp_list; +struct rxe_mcg { + struct rb_node node; union ib_gid mgid; + struct list_head qp_list; + struct kref ref_cnt; + struct rxe_dev *rxe; + spinlock_t lock; /* guard qp_list */ int num_qp; u32 qkey; u16 pkey; }; -struct rxe_mc_elem { - struct rxe_pool_elem elem; +struct rxe_mca { struct list_head qp_list; - struct list_head grp_list; + struct list_head mcg_list; struct rxe_qp *qp; - struct rxe_mc_grp *grp; + struct rxe_mcg *mcg; }; struct rxe_port { @@ -400,8 +398,11 @@ struct rxe_dev { struct rxe_pool cq_pool; struct rxe_pool mr_pool; struct rxe_pool mw_pool; - struct rxe_pool mc_grp_pool; - struct rxe_pool mc_elem_pool; + + spinlock_t mcg_lock; /* guard mcg_tree and mcg->qp_list */ + struct rb_root mcg_tree; + int num_mcg; + int num_attach; spinlock_t pending_lock; /* guard pending_mmaps */ struct list_head pending_mmaps;
From patchwork Sat Jan 15 04:29:09 2022 X-Patchwork-Submitter: Bob Pearson X-Patchwork-Id: 12714324
From: Bob Pearson To: jgg@nvidia.com, zyjzyj2000@gmail.com, linux-rdma@vger.kernel.org Cc: Bob Pearson Subject: [PATCH for-next v9 2/4] RDMA/rxe: Replace RB tree by xarray for indexes Date: Fri, 14 Jan 2022 22:29:09 -0600 Message-Id: <20220115042910.40181-3-rpearsonhpe@gmail.com> In-Reply-To: <20220115042910.40181-1-rpearsonhpe@gmail.com> References: <20220115042910.40181-1-rpearsonhpe@gmail.com> List-ID: X-Mailing-List: linux-rdma@vger.kernel.org
Currently the rxe driver uses red-black trees to add indices to the rxe object pools. Linux xarrays provide a better way to implement the same functionality for indices, so this patch replaces the red-black trees by xarrays for indexed objects. Since caller-managed locks for indexed objects are no longer used, those APIs are deleted as well. To avoid double locking, the rxe_pool rwlock is replaced by the spinlock already included in the xarray.
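For reference, the fast-path lookup this protects looks roughly like the following (an illustrative sketch of a caller, not text from the patch; the qpn handling is simplified):

	struct rxe_qp *qp;

	/* the receive path runs in soft IRQ context, which is why the
	 * xarray is locked with the _bh variants; a successful lookup
	 * returns holding a reference on the qp
	 */
	qp = rxe_pool_get_index(&rxe->qp_pool, qpn);
	if (!qp)
		goto drop;

	/* ... process the packet ... */

	rxe_drop_ref(qp);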
The RDMA objects are created and destroyed by verbs calls from rdma_core but are looked up from indices or keys from soft IRQs, so _bh style locks are the correct type to use. A private copy of kref_put_lock is added that calls a private copy of refcount_dec_and_lock, both using _bh locks. This routine generates a sparse warning, as does the original refcount_dec_and_lock routine; there does not seem to be a way to annotate this away. There is only one remaining object type that allocates its own memory, MR, so the sense of RXE_POOL_NO_ALLOC is inverted to RXE_POOL_ALLOC. Last-ditch code is added to free MRs allocated by rxe if applications do not free their objects; rdma/core is responsible for freeing the objects it allocated, which covers the remaining cases for applications that do not clean up.
Signed-off-by: Bob Pearson --- drivers/infiniband/sw/rxe/rxe.c | 87 +----- drivers/infiniband/sw/rxe/rxe_mr.c | 1 - drivers/infiniband/sw/rxe/rxe_mw.c | 4 - drivers/infiniband/sw/rxe/rxe_pool.c | 371 +++++++++++++------------- drivers/infiniband/sw/rxe/rxe_pool.h | 72 +---- drivers/infiniband/sw/rxe/rxe_verbs.c | 12 - 6 files changed, 205 insertions(+), 342 deletions(-) diff --git a/drivers/infiniband/sw/rxe/rxe.c b/drivers/infiniband/sw/rxe/rxe.c index c560d467a972..2952b27b3f20 100644 --- a/drivers/infiniband/sw/rxe/rxe.c +++ b/drivers/infiniband/sw/rxe/rxe.c @@ -112,83 +112,26 @@ static void rxe_init_ports(struct rxe_dev *rxe) } /* init pools of managed objects */ -static int rxe_init_pools(struct rxe_dev *rxe) +static void rxe_init_pools(struct rxe_dev *rxe) { - int err; - - err = rxe_pool_init(rxe, &rxe->uc_pool, RXE_TYPE_UC, - rxe->max_ucontext); - if (err) - goto err1; - - err = rxe_pool_init(rxe, &rxe->pd_pool, RXE_TYPE_PD, - rxe->attr.max_pd); - if (err) - goto err2; - - err = rxe_pool_init(rxe, &rxe->ah_pool, RXE_TYPE_AH, - rxe->attr.max_ah); - if (err) - goto err3; - - err = rxe_pool_init(rxe, &rxe->srq_pool, RXE_TYPE_SRQ, - rxe->attr.max_srq); - if (err) - goto err4; - - err = rxe_pool_init(rxe, &rxe->qp_pool, RXE_TYPE_QP, - rxe->attr.max_qp); - if (err) - goto err5; - - err = rxe_pool_init(rxe, &rxe->cq_pool, RXE_TYPE_CQ, - rxe->attr.max_cq); - if (err) - goto err6; - - err = rxe_pool_init(rxe, &rxe->mr_pool, RXE_TYPE_MR, - rxe->attr.max_mr); - if (err) - goto err7; - - err = rxe_pool_init(rxe, &rxe->mw_pool, RXE_TYPE_MW, - rxe->attr.max_mw); - if (err) - goto err8; - - return 0; - -err8: - rxe_pool_cleanup(&rxe->mr_pool); -err7: - rxe_pool_cleanup(&rxe->cq_pool); -err6: - rxe_pool_cleanup(&rxe->qp_pool); -err5: - rxe_pool_cleanup(&rxe->srq_pool); -err4: - rxe_pool_cleanup(&rxe->ah_pool); -err3: - rxe_pool_cleanup(&rxe->pd_pool); -err2: - rxe_pool_cleanup(&rxe->uc_pool); -err1: - return err; + rxe_pool_init(rxe, &rxe->uc_pool, RXE_TYPE_UC, rxe->max_ucontext); + rxe_pool_init(rxe, &rxe->pd_pool, RXE_TYPE_PD, rxe->attr.max_pd); + rxe_pool_init(rxe, &rxe->ah_pool, RXE_TYPE_AH, rxe->attr.max_ah); + rxe_pool_init(rxe, &rxe->srq_pool, RXE_TYPE_SRQ, rxe->attr.max_srq); + rxe_pool_init(rxe, &rxe->qp_pool, RXE_TYPE_QP, rxe->attr.max_qp); + rxe_pool_init(rxe, &rxe->cq_pool, RXE_TYPE_CQ, rxe->attr.max_cq); + rxe_pool_init(rxe, &rxe->mr_pool, RXE_TYPE_MR, rxe->attr.max_mr); + rxe_pool_init(rxe, &rxe->mw_pool, RXE_TYPE_MW, rxe->attr.max_mw); } /* initialize rxe device state */ -static int rxe_init(struct rxe_dev *rxe) +static void rxe_init(struct rxe_dev *rxe) { - int err; - /* init default device parameters */ rxe_init_device_param(rxe); rxe_init_ports(rxe); - - err = rxe_init_pools(rxe); - if (err) - 
return err; + rxe_init_pools(rxe); spin_lock_init(&rxe->mcg_lock); rxe->mcg_tree = RB_ROOT; @@ -199,8 +142,6 @@ static int rxe_init(struct rxe_dev *rxe) INIT_LIST_HEAD(&rxe->pending_mmaps); mutex_init(&rxe->usdev_lock); - - return 0; } void rxe_set_mtu(struct rxe_dev *rxe, unsigned int ndev_mtu) @@ -222,11 +163,7 @@ void rxe_set_mtu(struct rxe_dev *rxe, unsigned int ndev_mtu) */ int rxe_add(struct rxe_dev *rxe, unsigned int mtu, const char *ibdev_name) { - int err; - - err = rxe_init(rxe); - if (err) - return err; + rxe_init(rxe); rxe_set_mtu(rxe, mtu); diff --git a/drivers/infiniband/sw/rxe/rxe_mr.c b/drivers/infiniband/sw/rxe/rxe_mr.c index 453ef3c9d535..35628b8a00b4 100644 --- a/drivers/infiniband/sw/rxe/rxe_mr.c +++ b/drivers/infiniband/sw/rxe/rxe_mr.c @@ -691,7 +691,6 @@ int rxe_dereg_mr(struct ib_mr *ibmr, struct ib_udata *udata) mr->state = RXE_MR_STATE_INVALID; rxe_drop_ref(mr_pd(mr)); - rxe_drop_index(mr); rxe_drop_ref(mr); return 0; diff --git a/drivers/infiniband/sw/rxe/rxe_mw.c b/drivers/infiniband/sw/rxe/rxe_mw.c index 32dd8c0b8b9e..3ae981d77c25 100644 --- a/drivers/infiniband/sw/rxe/rxe_mw.c +++ b/drivers/infiniband/sw/rxe/rxe_mw.c @@ -20,7 +20,6 @@ int rxe_alloc_mw(struct ib_mw *ibmw, struct ib_udata *udata) return ret; } - rxe_add_index(mw); mw->rkey = ibmw->rkey = (mw->elem.index << 8) | rxe_get_next_key(-1); mw->state = (mw->ibmw.type == IB_MW_TYPE_2) ? RXE_MW_STATE_FREE : RXE_MW_STATE_VALID; @@ -332,7 +331,4 @@ struct rxe_mw *rxe_lookup_mw(struct rxe_qp *qp, int access, u32 rkey) void rxe_mw_cleanup(struct rxe_pool_elem *elem) { - struct rxe_mw *mw = container_of(elem, typeof(*mw), elem); - - rxe_drop_index(mw); } diff --git a/drivers/infiniband/sw/rxe/rxe_pool.c b/drivers/infiniband/sw/rxe/rxe_pool.c index 4e558d5e0ded..14c67e2a0904 100644 --- a/drivers/infiniband/sw/rxe/rxe_pool.c +++ b/drivers/infiniband/sw/rxe/rxe_pool.c @@ -4,6 +4,7 @@ * Copyright (c) 2015 System Fabric Works, Inc. All rights reserved. 
*/ +#include <linux/xarray.h> #include "rxe.h" #define RXE_POOL_ALIGN (16) @@ -23,19 +24,17 @@ static const struct rxe_type_info { .name = "rxe-uc", .size = sizeof(struct rxe_ucontext), .elem_offset = offsetof(struct rxe_ucontext, elem), - .flags = RXE_POOL_NO_ALLOC, }, [RXE_TYPE_PD] = { .name = "rxe-pd", .size = sizeof(struct rxe_pd), .elem_offset = offsetof(struct rxe_pd, elem), - .flags = RXE_POOL_NO_ALLOC, }, [RXE_TYPE_AH] = { .name = "rxe-ah", .size = sizeof(struct rxe_ah), .elem_offset = offsetof(struct rxe_ah, elem), - .flags = RXE_POOL_INDEX | RXE_POOL_NO_ALLOC, + .flags = RXE_POOL_INDEX, .min_index = RXE_MIN_AH_INDEX, .max_index = RXE_MAX_AH_INDEX, }, @@ -43,7 +42,7 @@ static const struct rxe_type_info { .name = "rxe-srq", .size = sizeof(struct rxe_srq), .elem_offset = offsetof(struct rxe_srq, elem), - .flags = RXE_POOL_INDEX | RXE_POOL_NO_ALLOC, + .flags = RXE_POOL_INDEX, .min_index = RXE_MIN_SRQ_INDEX, .max_index = RXE_MAX_SRQ_INDEX, }, @@ -52,7 +51,7 @@ static const struct rxe_type_info { .size = sizeof(struct rxe_qp), .elem_offset = offsetof(struct rxe_qp, elem), .cleanup = rxe_qp_cleanup, - .flags = RXE_POOL_INDEX | RXE_POOL_NO_ALLOC, + .flags = RXE_POOL_INDEX, .min_index = RXE_MIN_QP_INDEX, .max_index = RXE_MAX_QP_INDEX, }, @@ -60,7 +59,6 @@ static const struct rxe_type_info { .name = "rxe-cq", .size = sizeof(struct rxe_cq), .elem_offset = offsetof(struct rxe_cq, elem), - .flags = RXE_POOL_NO_ALLOC, .cleanup = rxe_cq_cleanup, }, [RXE_TYPE_MR] = { .name = "rxe-mr", .size = sizeof(struct rxe_mr), .elem_offset = offsetof(struct rxe_mr, elem), .cleanup = rxe_mr_cleanup, - .flags = RXE_POOL_INDEX, + .flags = RXE_POOL_INDEX | RXE_POOL_ALLOC, .min_index = RXE_MIN_MR_INDEX, .max_index = RXE_MAX_MR_INDEX, }, @@ -77,43 +75,16 @@ static const struct rxe_type_info { .size = sizeof(struct rxe_mw), .elem_offset = offsetof(struct rxe_mw, elem), .cleanup = rxe_mw_cleanup, - .flags = RXE_POOL_INDEX | RXE_POOL_NO_ALLOC, + .flags = RXE_POOL_INDEX, .min_index = RXE_MIN_MW_INDEX, .max_index = RXE_MAX_MW_INDEX, }, }; -static int rxe_pool_init_index(struct rxe_pool *pool, u32 max, u32 min) -{ - int err = 0; - - if ((max - min + 1) < pool->max_elem) { - pr_warn("not enough indices for max_elem\n"); - err = -EINVAL; - goto out; - } - - pool->index.max_index = max; - pool->index.min_index = min; - - pool->index.table = bitmap_zalloc(max - min + 1, GFP_KERNEL); - if (!pool->index.table) { - err = -ENOMEM; - goto out; - } - -out: - return err; -} - -int rxe_pool_init( - struct rxe_dev *rxe, - struct rxe_pool *pool, - enum rxe_elem_type type, - unsigned int max_elem) +void rxe_pool_init(struct rxe_dev *rxe, struct rxe_pool *pool, + enum rxe_elem_type type, unsigned int max_elem) { const struct rxe_type_info *info = &rxe_type_info[type]; - int err = 0; memset(pool, 0, sizeof(*pool)); @@ -128,148 +99,69 @@ int rxe_pool_init( atomic_set(&pool->num_elem, 0); - rwlock_init(&pool->pool_lock); - - if (pool->flags & RXE_POOL_INDEX) { - pool->index.tree = RB_ROOT; - err = rxe_pool_init_index(pool, info->max_index, - info->min_index); - if (err) - goto out; - } - -out: - return err; + /* the xarray proper is used for pools with RXE_POOL_INDEX; + * the other pools only use the xa spinlock + */ + xa_init_flags(&pool->xa, XA_FLAGS_ALLOC); + pool->limit.max = info->max_index; + pool->limit.min = info->min_index; } void rxe_pool_cleanup(struct rxe_pool *pool) { + struct rxe_pool_elem *elem; + if (atomic_read(&pool->num_elem) > 0) pr_warn("%s pool destroyed with unfree'd elem\n", pool->name); - if (pool->flags & 
RXE_POOL_INDEX) - bitmap_free(pool->index.table); -} - -static u32 alloc_index(struct rxe_pool *pool) -{ - u32 index; - u32 range = pool->index.max_index - pool->index.min_index + 1; - - index = find_next_zero_bit(pool->index.table, range, pool->index.last); - if (index >= range) - index = find_first_zero_bit(pool->index.table, range); - - WARN_ON_ONCE(index >= range); - set_bit(index, pool->index.table); - pool->index.last = index; - return index + pool->index.min_index; -} - -static int rxe_insert_index(struct rxe_pool *pool, struct rxe_pool_elem *new) -{ - struct rb_node **link = &pool->index.tree.rb_node; - struct rb_node *parent = NULL; - struct rxe_pool_elem *elem; - - while (*link) { - parent = *link; - elem = rb_entry(parent, struct rxe_pool_elem, index_node); - - if (elem->index == new->index) { - pr_warn("element already exists!\n"); - return -EINVAL; - } - - if (elem->index > new->index) - link = &(*link)->rb_left; - else - link = &(*link)->rb_right; + if (pool->flags & RXE_POOL_INDEX) { + unsigned long index = 0; + unsigned long max = ULONG_MAX; + unsigned int elem_count = 0; + unsigned int free_count = 0; + + do { + elem = xa_find(&pool->xa, &index, max, XA_PRESENT); + if (elem) { + elem_count++; + xa_erase(&pool->xa, index); + if (pool->flags & RXE_POOL_ALLOC) { + kfree(elem->obj); + free_count++; + } + } + + } while (elem); + + if (elem_count || free_count) + pr_warn("Freed %d indices, %d objects\n", + elem_count, free_count); } - rb_link_node(&new->index_node, parent, link); - rb_insert_color(&new->index_node, &pool->index.tree); - - return 0; -} - -int __rxe_add_index_locked(struct rxe_pool_elem *elem) -{ - struct rxe_pool *pool = elem->pool; - int err; - - elem->index = alloc_index(pool); - err = rxe_insert_index(pool, elem); - - return err; -} - -int __rxe_add_index(struct rxe_pool_elem *elem) -{ - struct rxe_pool *pool = elem->pool; - int err; - - write_lock_bh(&pool->pool_lock); - err = __rxe_add_index_locked(elem); - write_unlock_bh(&pool->pool_lock); - - return err; -} - -void __rxe_drop_index_locked(struct rxe_pool_elem *elem) -{ - struct rxe_pool *pool = elem->pool; - - clear_bit(elem->index - pool->index.min_index, pool->index.table); - rb_erase(&elem->index_node, &pool->index.tree); -} - -void __rxe_drop_index(struct rxe_pool_elem *elem) -{ - struct rxe_pool *pool = elem->pool; - - write_lock_bh(&pool->pool_lock); - __rxe_drop_index_locked(elem); - write_unlock_bh(&pool->pool_lock); -} - -void *rxe_alloc_locked(struct rxe_pool *pool) -{ - struct rxe_pool_elem *elem; - void *obj; - - if (atomic_inc_return(&pool->num_elem) > pool->max_elem) - goto out_cnt; - - obj = kzalloc(pool->elem_size, GFP_ATOMIC); - if (!obj) - goto out_cnt; - - elem = (struct rxe_pool_elem *)((u8 *)obj + pool->elem_offset); - - elem->pool = pool; - elem->obj = obj; - kref_init(&elem->ref_cnt); - - return obj; - -out_cnt: - atomic_dec(&pool->num_elem); - return NULL; + xa_destroy(&pool->xa); } +/** + * rxe_alloc() - create a new rxe object + * @pool: rxe object pool + * + * Adds a new object to object pool allocating the storage here. + * If object pool has an index add elem to xarray. 
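+ * (With this patch only MR uses rxe_alloc(); the other object types
+ * are allocated by rdma/core and added with rxe_add_to_pool() below.)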
+ * + * Returns: the object on success else NULL + */ void *rxe_alloc(struct rxe_pool *pool) { struct rxe_pool_elem *elem; void *obj; if (atomic_inc_return(&pool->num_elem) > pool->max_elem) - goto out_cnt; + goto err_cnt; obj = kzalloc(pool->elem_size, GFP_KERNEL); if (!obj) - goto out_cnt; + goto err_cnt; elem = (struct rxe_pool_elem *)((u8 *)obj + pool->elem_offset); @@ -277,40 +169,117 @@ void *rxe_alloc(struct rxe_pool *pool) elem->pool = pool; elem->obj = obj; kref_init(&elem->ref_cnt); + if (pool->flags & RXE_POOL_INDEX) { + int err = xa_alloc_cyclic_bh(&pool->xa, &elem->index, + elem, pool->limit, + &pool->next, GFP_KERNEL); + if (err) + goto err_free; + } + return obj; -out_cnt: +err_free: + kfree(obj); +err_cnt: atomic_dec(&pool->num_elem); return NULL; } +/** + * __rxe_add_to_pool() - add pool element to object pool + * @pool: rxe object pool + * @elem: a pool element embedded in a rxe object + * + * Adds a rxe pool element to object pool when the storage is + * allocated by rdma/core before calling the verb that creates + * the object. If object pool has an index add elem to xarray. + * + * The rxe_add_to_pool() macro converts the 2nd argument from + * an object to a pool element embedded in the object. + * + * Returns: 0 on success else an error + */ int __rxe_add_to_pool(struct rxe_pool *pool, struct rxe_pool_elem *elem) { if (atomic_inc_return(&pool->num_elem) > pool->max_elem) - goto out_cnt; + goto err_cnt; elem->pool = pool; elem->obj = (u8 *)elem - pool->elem_offset; kref_init(&elem->ref_cnt); + if (pool->flags & RXE_POOL_INDEX) { + int err = xa_alloc_cyclic_bh(&pool->xa, &elem->index, + elem, pool->limit, + &pool->next, GFP_KERNEL); + if (err) + goto err_cnt; + } + return 0; -out_cnt: +err_cnt: atomic_dec(&pool->num_elem); return -EINVAL; } -void rxe_elem_release(struct kref *kref) +/** + * rxe_pool_get_index() - look up object from index + * @pool: the object pool + * @index: the index of the object + * + * Acquire the xa spinlock to make looking up the object from + * its index atomic with the call to kref_get_unless_zero() to avoid + * a race condition with a second thread deleting the object + * before we can acquire the reference. + * + * Returns: the object if the index exists in the pool + * and the reference count on the object is positive + * else NULL + */ +void *rxe_pool_get_index(struct rxe_pool *pool, u32 index) +{ + struct rxe_pool_elem *elem; + void *obj; + + xa_lock_bh(&pool->xa); + elem = xa_load(&pool->xa, index); + if (elem && kref_get_unless_zero(&elem->ref_cnt)) + obj = elem->obj; + else + obj = NULL; + xa_unlock_bh(&pool->xa); + + return obj; +} + +/** + * rxe_elem_release() - cleanup object + * @kref: pointer to kref embedded in pool element + * + * The kref_put_lock_bh() call in rxe_drop_ref() takes the + * xa spinlock if the ref count goes to zero, which is then + * released here after removing the xarray entry to prevent + * overlapping with rxe_pool_get_index(). 
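+ *
+ * In short (explanatory note summarizing the code below): rxe_drop_ref()
+ * ends in kref_put_lock_bh(), which takes xa_lock_bh() only when the
+ * count reaches zero; this function then erases the index under that
+ * lock, drops the lock, and runs the type-specific cleanup before
+ * freeing the object.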
+ */ +static void rxe_elem_release(struct kref *kref) + __releases(&pool->xa.xa_lock) { struct rxe_pool_elem *elem = container_of(kref, struct rxe_pool_elem, ref_cnt); struct rxe_pool *pool = elem->pool; void *obj; + if (pool->flags & RXE_POOL_INDEX) + __xa_erase(&pool->xa, elem->index); + + xa_unlock_bh(&pool->xa); + if (pool->cleanup) pool->cleanup(elem); - if (!(pool->flags & RXE_POOL_NO_ALLOC)) { + if (pool->flags & RXE_POOL_ALLOC) { obj = elem->obj; kfree(obj); } @@ -318,42 +287,60 @@ void rxe_elem_release(struct kref *kref) atomic_dec(&pool->num_elem); } -void *rxe_pool_get_index_locked(struct rxe_pool *pool, u32 index) +/** + * __rxe_add_ref() - takes a ref on pool element + * @elem: pool element + * + * Takes a ref on pool element if count is not zero + * + * The rxe_add_ref() macro converts argument from object to pool element + * + * Returns 1 if successful else 0 + */ +int __rxe_add_ref(struct rxe_pool_elem *elem) { - struct rb_node *node; - struct rxe_pool_elem *elem; - void *obj; - - node = pool->index.tree.rb_node; - - while (node) { - elem = rb_entry(node, struct rxe_pool_elem, index_node); + return kref_get_unless_zero(&elem->ref_cnt); +} - if (elem->index > index) - node = node->rb_left; - else if (elem->index < index) - node = node->rb_right; - else - break; - } +static bool refcount_dec_and_lock_bh(refcount_t *r, spinlock_t *lock) + __acquires(lock) __releases(lock) +{ + if (refcount_dec_not_one(r)) + return false; - if (node) { - kref_get(&elem->ref_cnt); - obj = elem->obj; - } else { - obj = NULL; + spin_lock_bh(lock); + if (!refcount_dec_and_test(r)) { + spin_unlock_bh(lock); + return false; } - return obj + return true; } -void *rxe_pool_get_index(struct rxe_pool *pool, u32 index) +static int kref_put_lock_bh(struct kref *kref, + void (*release)(struct kref *kref), + spinlock_t *lock) { - void *obj; - - read_lock_bh(&pool->pool_lock); - obj = rxe_pool_get_index_locked(pool, index); - read_unlock_bh(&pool->pool_lock); + if (refcount_dec_and_lock_bh(&kref->refcount, lock)) { + release(kref); + return 1; + } + return 0; } - return obj +/** + * __rxe_drop_ref() - drops a ref on pool element + * @elem: pool element + * + * Drops a ref on pool element and if count goes to zero atomically + * acquires the xa lock and then calls rxe_elem_release() holding the lock + * + * The rxe_drop_ref() macro converts argument from object to pool element + * + * Returns 1 if rxe_elem_release called else 0 + */ +int __rxe_drop_ref(struct rxe_pool_elem *elem) +{ + return kref_put_lock_bh(&elem->ref_cnt, rxe_elem_release, + &elem->pool->xa.xa_lock); } diff --git a/drivers/infiniband/sw/rxe/rxe_pool.h b/drivers/infiniband/sw/rxe/rxe_pool.h index b6de415e10d2..0ff5e1d8b935 100644 --- a/drivers/infiniband/sw/rxe/rxe_pool.h +++ b/drivers/infiniband/sw/rxe/rxe_pool.h @@ -7,9 +7,11 @@ #ifndef RXE_POOL_H #define RXE_POOL_H +#include <linux/xarray.h> + enum rxe_pool_flags { RXE_POOL_INDEX = BIT(1), - RXE_POOL_NO_ALLOC = BIT(4), + RXE_POOL_ALLOC = BIT(2), }; enum rxe_elem_type { @@ -30,16 +32,12 @@ struct rxe_pool_elem { void *obj; struct kref ref_cnt; struct list_head list; - - /* only used if indexed */ - struct rb_node index_node; u32 index; }; struct rxe_pool { struct rxe_dev *rxe; const char *name; - rwlock_t pool_lock; /* protects pool add/del/search */ void (*cleanup)(struct rxe_pool_elem *obj); enum rxe_pool_flags flags; enum rxe_elem_type type; @@ -48,73 +46,31 @@ struct rxe_pool { atomic_t num_elem; size_t elem_size; size_t elem_offset; - - /* only used if indexed */ - struct { - struct rb_root tree; - 
unsigned long *table; - u32 last; - u32 max_index; - u32 min_index; - } index; + struct xarray xa; + struct xa_limit limit; + u32 next; + int locked; }; -/* initialize a pool of objects with given limit on - * number of elements. gets parameters from rxe_type_info - * pool elements will be allocated out of a slab cache - */ -int rxe_pool_init(struct rxe_dev *rxe, struct rxe_pool *pool, - enum rxe_elem_type type, u32 max_elem); +void rxe_pool_init(struct rxe_dev *rxe, struct rxe_pool *pool, + enum rxe_elem_type type, u32 max_elem); -/* free resources from object pool */ void rxe_pool_cleanup(struct rxe_pool *pool); -/* allocate an object from pool holding and not holding the pool lock */ -void *rxe_alloc_locked(struct rxe_pool *pool); - void *rxe_alloc(struct rxe_pool *pool); -/* connect already allocated object to pool */ int __rxe_add_to_pool(struct rxe_pool *pool, struct rxe_pool_elem *elem); #define rxe_add_to_pool(pool, obj) __rxe_add_to_pool(pool, &(obj)->elem) -/* assign an index to an indexed object and insert object into - * pool's rb tree holding and not holding the pool_lock - */ -int __rxe_add_index_locked(struct rxe_pool_elem *elem); - -#define rxe_add_index_locked(obj) __rxe_add_index_locked(&(obj)->elem) - -int __rxe_add_index(struct rxe_pool_elem *elem); - -#define rxe_add_index(obj) __rxe_add_index(&(obj)->elem) - -/* drop an index and remove object from rb tree - * holding and not holding the pool_lock - */ -void __rxe_drop_index_locked(struct rxe_pool_elem *elem); - -#define rxe_drop_index_locked(obj) __rxe_drop_index_locked(&(obj)->elem) - -void __rxe_drop_index(struct rxe_pool_elem *elem); - -#define rxe_drop_index(obj) __rxe_drop_index(&(obj)->elem) - -/* lookup an indexed object from index holding and not holding the pool_lock. 
- * takes a reference on object */ -void *rxe_pool_get_index_locked(struct rxe_pool *pool, u32 index); - void *rxe_pool_get_index(struct rxe_pool *pool, u32 index); -/* cleanup an object when all references are dropped */ -void rxe_elem_release(struct kref *kref); +int __rxe_add_ref(struct rxe_pool_elem *elem); + +#define rxe_add_ref(obj) __rxe_add_ref(&(obj)->elem) -/* take a reference on an object */ -#define rxe_add_ref(obj) kref_get(&(obj)->elem.ref_cnt) +int __rxe_drop_ref(struct rxe_pool_elem *elem); -/* drop a reference on an object */ -#define rxe_drop_ref(obj) kref_put(&(obj)->elem.ref_cnt, rxe_elem_release) +#define rxe_drop_ref(obj) __rxe_drop_ref(&(obj)->elem) #endif /* RXE_POOL_H */ diff --git a/drivers/infiniband/sw/rxe/rxe_verbs.c b/drivers/infiniband/sw/rxe/rxe_verbs.c index f7682541f9af..bc3094d851f7 100644 --- a/drivers/infiniband/sw/rxe/rxe_verbs.c +++ b/drivers/infiniband/sw/rxe/rxe_verbs.c @@ -181,7 +181,6 @@ static int rxe_create_ah(struct ib_ah *ibah, return err; /* create index > 0 */ - rxe_add_index(ah); ah->ah_num = ah->elem.index; if (uresp) { @@ -189,7 +188,6 @@ static int rxe_create_ah(struct ib_ah *ibah, err = copy_to_user(&uresp->ah_num, &ah->ah_num, sizeof(uresp->ah_num)); if (err) { - rxe_drop_index(ah); rxe_drop_ref(ah); return -EFAULT; } @@ -230,7 +228,6 @@ static int rxe_destroy_ah(struct ib_ah *ibah, u32 flags) { struct rxe_ah *ah = to_rah(ibah); - rxe_drop_index(ah); rxe_drop_ref(ah); return 0; } @@ -437,7 +434,6 @@ static int rxe_create_qp(struct ib_qp *ibqp, struct ib_qp_init_attr *init, if (err) return err; - rxe_add_index(qp); err = rxe_qp_from_init(rxe, qp, pd, init, uresp, ibqp->pd, udata); if (err) goto qp_init; @@ -445,7 +441,6 @@ static int rxe_create_qp(struct ib_qp *ibqp, struct ib_qp_init_attr *init, return 0; qp_init: - rxe_drop_index(qp); rxe_drop_ref(qp); return err; } @@ -495,7 +490,6 @@ static int rxe_destroy_qp(struct ib_qp *ibqp, struct ib_udata *udata) struct rxe_qp *qp = to_rqp(ibqp); rxe_qp_destroy(qp); - rxe_drop_index(qp); rxe_drop_ref(qp); return 0; } @@ -898,7 +892,6 @@ static struct ib_mr *rxe_get_dma_mr(struct ib_pd *ibpd, int access) if (!mr) return ERR_PTR(-ENOMEM); - rxe_add_index(mr); rxe_add_ref(pd); rxe_mr_init_dma(pd, access, mr); @@ -922,7 +915,6 @@ static struct ib_mr *rxe_reg_user_mr(struct ib_pd *ibpd, goto err2; } - rxe_add_index(mr); rxe_add_ref(pd); @@ -934,7 +926,6 @@ static struct ib_mr *rxe_reg_user_mr(struct ib_pd *ibpd, err3: rxe_drop_ref(pd); - rxe_drop_index(mr); rxe_drop_ref(mr); err2: return ERR_PTR(err); @@ -957,8 +948,6 @@ static struct ib_mr *rxe_alloc_mr(struct ib_pd *ibpd, enum ib_mr_type mr_type, goto err1; } - rxe_add_index(mr); - rxe_add_ref(pd); err = rxe_mr_init_fast(pd, max_num_sg, mr); @@ -969,7 +958,6 @@ static struct ib_mr *rxe_alloc_mr(struct ib_pd *ibpd, enum ib_mr_type mr_type, err2: rxe_drop_ref(pd); - rxe_drop_index(mr); rxe_drop_ref(mr); err1: return ERR_PTR(err); From patchwork Sat Jan 15 04:29:10 2022 X-Patchwork-Submitter: Bob Pearson X-Patchwork-Id: 12714323 From: Bob Pearson To: jgg@nvidia.com, zyjzyj2000@gmail.com, linux-rdma@vger.kernel.org Cc: Bob Pearson Subject: [PATCH for-next v9 3/4] RDMA/rxe: Fix ref error in rxe_av.c Date: Fri, 14 Jan 2022 22:29:10 -0600 Message-Id: <20220115042910.40181-4-rpearsonhpe@gmail.com> In-Reply-To: <20220115042910.40181-1-rpearsonhpe@gmail.com> References: <20220115042910.40181-1-rpearsonhpe@gmail.com> List-ID: X-Mailing-List: linux-rdma@vger.kernel.org The commit referenced below can take a reference to the AH which is never dropped. This only happens in the UD request path. This patch optionally passes that AH back to the caller so that it can hold the reference while the AV is being accessed and then drop it. Code to do this is added to rxe_req.c. The AV is also passed to rxe_prepare in rxe_net.c as an optimization.
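As a rough illustration (not part of the patch; abbreviated, with most error handling trimmed), the resulting calling pattern in the UD request path looks like this:

	/* sketch of the requester side: the caller now owns the AH
	 * reference returned through ahp and must drop it once it is
	 * done touching the AV
	 */
	struct rxe_ah *ah;
	struct rxe_av *av;

	av = rxe_get_av(&pkt, &ah);	/* takes a ref on *ah when an ah_num is present */
	if (unlikely(!av))
		goto err;

	skb = init_req_packet(qp, av, wqe, opcode, payload, &pkt);
	ret = finish_packet(qp, av, wqe, &pkt, skb, payload);

	if (ah)
		rxe_drop_ref(ah);	/* drop the ref taken by rxe_get_av() */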
Fixes: e2fe06c90806 ("RDMA/rxe: Lookup kernel AH from ah index in UD WQEs") Signed-off-by: Bob Pearson --- drivers/infiniband/sw/rxe/rxe_av.c | 19 +++++++++- drivers/infiniband/sw/rxe/rxe_loc.h | 5 ++- drivers/infiniband/sw/rxe/rxe_net.c | 17 +++++---- drivers/infiniband/sw/rxe/rxe_req.c | 55 +++++++++++++++++----------- drivers/infiniband/sw/rxe/rxe_resp.c | 2 +- 5 files changed, 63 insertions(+), 35 deletions(-) diff --git a/drivers/infiniband/sw/rxe/rxe_av.c b/drivers/infiniband/sw/rxe/rxe_av.c index 38c7b6fb39d7..360a567159fe 100644 --- a/drivers/infiniband/sw/rxe/rxe_av.c +++ b/drivers/infiniband/sw/rxe/rxe_av.c @@ -99,11 +99,14 @@ void rxe_av_fill_ip_info(struct rxe_av *av, struct rdma_ah_attr *attr) av->network_type = type; } -struct rxe_av *rxe_get_av(struct rxe_pkt_info *pkt) +struct rxe_av *rxe_get_av(struct rxe_pkt_info *pkt, struct rxe_ah **ahp) { struct rxe_ah *ah; u32 ah_num; + if (ahp) + *ahp = NULL; + if (!pkt || !pkt->qp) return NULL; @@ -117,10 +120,22 @@ struct rxe_av *rxe_get_av(struct rxe_pkt_info *pkt) if (ah_num) { /* only new user provider or kernel client */ ah = rxe_pool_get_index(&pkt->rxe->ah_pool, ah_num); - if (!ah || ah->ah_num != ah_num || rxe_ah_pd(ah) != pkt->qp->pd) { + if (!ah) { pr_warn("Unable to find AH matching ah_num\n"); return NULL; } + + if (rxe_ah_pd(ah) != pkt->qp->pd) { + pr_warn("PDs don't match for AH and QP\n"); + rxe_drop_ref(ah); + return NULL; + } + + if (ahp) + *ahp = ah; + else + rxe_drop_ref(ah); + return &ah->av; } diff --git a/drivers/infiniband/sw/rxe/rxe_loc.h b/drivers/infiniband/sw/rxe/rxe_loc.h index 805f40f84e62..35665d4bd960 100644 --- a/drivers/infiniband/sw/rxe/rxe_loc.h +++ b/drivers/infiniband/sw/rxe/rxe_loc.h @@ -19,7 +19,7 @@ void rxe_av_to_attr(struct rxe_av *av, struct rdma_ah_attr *attr); void rxe_av_fill_ip_info(struct rxe_av *av, struct rdma_ah_attr *attr); -struct rxe_av *rxe_get_av(struct rxe_pkt_info *pkt); +struct rxe_av *rxe_get_av(struct rxe_pkt_info *pkt, struct rxe_ah **ahp); /* rxe_cq.c */ int rxe_cq_chk_attr(struct rxe_dev *rxe, struct rxe_cq *cq, @@ -95,7 +95,8 @@ void rxe_mw_cleanup(struct rxe_pool_elem *arg); /* rxe_net.c */ struct sk_buff *rxe_init_packet(struct rxe_dev *rxe, struct rxe_av *av, int paylen, struct rxe_pkt_info *pkt); -int rxe_prepare(struct rxe_pkt_info *pkt, struct sk_buff *skb); +int rxe_prepare(struct rxe_av *av, struct rxe_pkt_info *pkt, + struct sk_buff *skb); int rxe_xmit_packet(struct rxe_qp *qp, struct rxe_pkt_info *pkt, struct sk_buff *skb); const char *rxe_parent_name(struct rxe_dev *rxe, unsigned int port_num); diff --git a/drivers/infiniband/sw/rxe/rxe_net.c b/drivers/infiniband/sw/rxe/rxe_net.c index a8cfa7160478..b06f22ffc5a8 100644 --- a/drivers/infiniband/sw/rxe/rxe_net.c +++ b/drivers/infiniband/sw/rxe/rxe_net.c @@ -271,13 +271,13 @@ static void prepare_ipv6_hdr(struct dst_entry *dst, struct sk_buff *skb, ip6h->payload_len = htons(skb->len - sizeof(*ip6h)); } -static int prepare4(struct rxe_pkt_info *pkt, struct sk_buff *skb) +static int prepare4(struct rxe_av *av, struct rxe_pkt_info *pkt, + struct sk_buff *skb) { struct rxe_qp *qp = pkt->qp; struct dst_entry *dst; bool xnet = false; __be16 df = htons(IP_DF); - struct rxe_av *av = rxe_get_av(pkt); struct in_addr *saddr = &av->sgid_addr._sockaddr_in.sin_addr; struct in_addr *daddr = &av->dgid_addr._sockaddr_in.sin_addr; @@ -297,11 +297,11 @@ static int prepare4(struct rxe_pkt_info *pkt, struct sk_buff *skb) return 0; } -static int prepare6(struct rxe_pkt_info *pkt, struct sk_buff *skb) +static int prepare6(struct 
rxe_av *av, struct rxe_pkt_info *pkt, + struct sk_buff *skb) { struct rxe_qp *qp = pkt->qp; struct dst_entry *dst; - struct rxe_av *av = rxe_get_av(pkt); struct in6_addr *saddr = &av->sgid_addr._sockaddr_in6.sin6_addr; struct in6_addr *daddr = &av->dgid_addr._sockaddr_in6.sin6_addr; @@ -322,16 +322,17 @@ static int prepare6(struct rxe_pkt_info *pkt, struct sk_buff *skb) return 0; } -int rxe_prepare(struct rxe_pkt_info *pkt, struct sk_buff *skb) +int rxe_prepare(struct rxe_av *av, struct rxe_pkt_info *pkt, + struct sk_buff *skb) { int err = 0; if (skb->protocol == htons(ETH_P_IP)) - err = prepare4(pkt, skb); + err = prepare4(av, pkt, skb); else if (skb->protocol == htons(ETH_P_IPV6)) - err = prepare6(pkt, skb); + err = prepare6(av, pkt, skb); - if (ether_addr_equal(skb->dev->dev_addr, rxe_get_av(pkt)->dmac)) + if (ether_addr_equal(skb->dev->dev_addr, av->dmac)) pkt->mask |= RXE_LOOPBACK_MASK; return err; diff --git a/drivers/infiniband/sw/rxe/rxe_req.c b/drivers/infiniband/sw/rxe/rxe_req.c index 5eb89052dd66..f44535f82bea 100644 --- a/drivers/infiniband/sw/rxe/rxe_req.c +++ b/drivers/infiniband/sw/rxe/rxe_req.c @@ -358,6 +358,7 @@ static inline int get_mtu(struct rxe_qp *qp) } static struct sk_buff *init_req_packet(struct rxe_qp *qp, + struct rxe_av *av, struct rxe_send_wqe *wqe, int opcode, int payload, struct rxe_pkt_info *pkt) @@ -365,7 +366,6 @@ static struct sk_buff *init_req_packet(struct rxe_qp *qp, struct rxe_dev *rxe = to_rdev(qp->ibqp.device); struct sk_buff *skb; struct rxe_send_wr *ibwr = &wqe->wr; - struct rxe_av *av; int pad = (-payload) & 0x3; int paylen; int solicited; @@ -374,21 +374,9 @@ static struct sk_buff *init_req_packet(struct rxe_qp *qp, /* length from start of bth to end of icrc */ paylen = rxe_opcode[opcode].length + payload + pad + RXE_ICRC_SIZE; - - /* pkt->hdr, port_num and mask are initialized in ifc layer */ - pkt->rxe = rxe; - pkt->opcode = opcode; - pkt->qp = qp; - pkt->psn = qp->req.psn; - pkt->mask = rxe_opcode[opcode].mask; - pkt->paylen = paylen; - pkt->wqe = wqe; + pkt->paylen = paylen; /* init skb */ - av = rxe_get_av(pkt); - if (!av) - return NULL; - skb = rxe_init_packet(rxe, av, paylen, pkt); if (unlikely(!skb)) return NULL; @@ -447,13 +435,13 @@ static struct sk_buff *init_req_packet(struct rxe_qp *qp, return skb; } -static int finish_packet(struct rxe_qp *qp, struct rxe_send_wqe *wqe, - struct rxe_pkt_info *pkt, struct sk_buff *skb, - int paylen) +static int finish_packet(struct rxe_qp *qp, struct rxe_av *av, + struct rxe_send_wqe *wqe, struct rxe_pkt_info *pkt, + struct sk_buff *skb, int paylen) { int err; - err = rxe_prepare(pkt, skb); + err = rxe_prepare(av, pkt, skb); if (err) return err; @@ -608,6 +596,7 @@ static int rxe_do_local_ops(struct rxe_qp *qp, struct rxe_send_wqe *wqe) int rxe_requester(void *arg) { struct rxe_qp *qp = (struct rxe_qp *)arg; + struct rxe_dev *rxe = to_rdev(qp->ibqp.device); struct rxe_pkt_info pkt; struct sk_buff *skb; struct rxe_send_wqe *wqe; @@ -619,6 +608,8 @@ int rxe_requester(void *arg) struct rxe_send_wqe rollback_wqe; u32 rollback_psn; struct rxe_queue *q = qp->sq.queue; + struct rxe_ah *ah; + struct rxe_av *av; rxe_add_ref(qp); @@ -705,14 +696,28 @@ int rxe_requester(void *arg) payload = mtu; } - skb = init_req_packet(qp, wqe, opcode, payload, &pkt); + pkt.rxe = rxe; + pkt.opcode = opcode; + pkt.qp = qp; + pkt.psn = qp->req.psn; + pkt.mask = rxe_opcode[opcode].mask; + pkt.wqe = wqe; + + av = rxe_get_av(&pkt, &ah); + if (unlikely(!av)) { + pr_err("qp#%d Failed no address vector\n", qp_num(qp)); + wqe->status 
= IB_WC_LOC_QP_OP_ERR; + goto err_drop_ah; + } + + skb = init_req_packet(qp, av, wqe, opcode, payload, &pkt); if (unlikely(!skb)) { pr_err("qp#%d Failed allocating skb\n", qp_num(qp)); wqe->status = IB_WC_LOC_QP_OP_ERR; - goto err; + goto err_drop_ah; } - ret = finish_packet(qp, wqe, &pkt, skb, payload); + ret = finish_packet(qp, av, wqe, &pkt, skb, payload); if (unlikely(ret)) { pr_debug("qp#%d Error during finish packet\n", qp_num(qp)); if (ret == -EFAULT) @@ -720,9 +725,12 @@ else wqe->status = IB_WC_LOC_QP_OP_ERR; kfree_skb(skb); - goto err; + goto err_drop_ah; } + if (ah) + rxe_drop_ref(ah); + /* * To prevent a race on wqe access between requester and completer, * wqe members state and psn need to be set before calling @@ -751,6 +759,9 @@ goto next_wqe; +err_drop_ah: + if (ah) + rxe_drop_ref(ah); err: wqe->state = wqe_state_error; __rxe_do_task(&qp->comp.task); diff --git a/drivers/infiniband/sw/rxe/rxe_resp.c b/drivers/infiniband/sw/rxe/rxe_resp.c index e8f435fa6e4d..f589f4dde35c 100644 --- a/drivers/infiniband/sw/rxe/rxe_resp.c +++ b/drivers/infiniband/sw/rxe/rxe_resp.c @@ -632,7 +632,7 @@ static struct sk_buff *prepare_ack_packet(struct rxe_qp *qp, if (ack->mask & RXE_ATMACK_MASK) atmack_set_orig(ack, qp->resp.atomic_orig); - err = rxe_prepare(ack, skb); + err = rxe_prepare(&qp->pri_av, ack, skb); if (err) { kfree_skb(skb); return NULL; From patchwork Sat Jan 15 04:29:11 2022 X-Patchwork-Submitter: Bob Pearson X-Patchwork-Id: 12714325 From: Bob Pearson To: jgg@nvidia.com, zyjzyj2000@gmail.com, linux-rdma@vger.kernel.org Cc: Bob Pearson Subject: [PATCH for-next v9 4/4] RDMA/rxe: Replace mr by rkey in responder resources Date: Fri, 14 Jan 2022 22:29:11 -0600 Message-Id: <20220115042910.40181-5-rpearsonhpe@gmail.com> In-Reply-To: <20220115042910.40181-1-rpearsonhpe@gmail.com> References: <20220115042910.40181-1-rpearsonhpe@gmail.com> List-ID: X-Mailing-List: linux-rdma@vger.kernel.org Currently rxe saves a copy of the MR in the responder resources for RDMA reads. Since the responder resources are never freed, just overwritten when more are needed, this MR may not have its reference dropped until the QP is destroyed. This patch uses the rkey instead of the MR and, on subsequent packets of a multi-packet read reply message, looks up the MR from the rkey for each packet. This makes it possible for a user to deregister an MR or unbind a MW on the fly and still get correct behaviour.
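In outline (a condensed sketch of the new read_reply() flow shown in the diff below, with error paths trimmed), each reply packet now resolves its own MR:

	/* first packet: inherit the reference already held in qp->resp.mr */
	if (res->state == rdatm_res_state_new) {
		mr = qp->resp.mr;
		qp->resp.mr = NULL;
	} else {
		/* continuation packets: revalidate from the saved rkey,
		 * which catches an MR deregistered or an MW unbound
		 * in the middle of the reply series
		 */
		mr = rxe_recheck_mr(qp, res->read.rkey);
		if (!mr)
			return RESPST_ERR_RKEY_VIOLATION;
	}

	err = rxe_mr_copy(mr, res->read.va, payload_addr(&ack_pkt),
			  payload, RXE_FROM_MR_OBJ);
	if (mr)
		rxe_drop_ref(mr);	/* per-packet reference, dropped immediately */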
Signed-off-by: Bob Pearson --- drivers/infiniband/sw/rxe/rxe_qp.c | 10 +-- drivers/infiniband/sw/rxe/rxe_resp.c | 123 ++++++++++++++++++-------- drivers/infiniband/sw/rxe/rxe_verbs.h | 1 - 3 files changed, 87 insertions(+), 47 deletions(-) diff --git a/drivers/infiniband/sw/rxe/rxe_qp.c b/drivers/infiniband/sw/rxe/rxe_qp.c index 4c0cea0833ee..8986f871b7cb 100644 --- a/drivers/infiniband/sw/rxe/rxe_qp.c +++ b/drivers/infiniband/sw/rxe/rxe_qp.c @@ -135,12 +135,8 @@ static void free_rd_atomic_resources(struct rxe_qp *qp) void free_rd_atomic_resource(struct rxe_qp *qp, struct resp_res *res) { - if (res->type == RXE_ATOMIC_MASK) { + if (res->type == RXE_ATOMIC_MASK) kfree_skb(res->atomic.skb); - } else if (res->type == RXE_READ_MASK) { - if (res->read.mr) - rxe_drop_ref(res->read.mr); - } res->type = 0; } @@ -816,10 +812,8 @@ static void rxe_qp_do_cleanup(struct work_struct *work) if (qp->pd) rxe_drop_ref(qp->pd); - if (qp->resp.mr) { + if (qp->resp.mr) rxe_drop_ref(qp->resp.mr); - qp->resp.mr = NULL; - } if (qp_type(qp) == IB_QPT_RC) sk_dst_reset(qp->sk->sk); diff --git a/drivers/infiniband/sw/rxe/rxe_resp.c b/drivers/infiniband/sw/rxe/rxe_resp.c index f589f4dde35c..c776289842e5 100644 --- a/drivers/infiniband/sw/rxe/rxe_resp.c +++ b/drivers/infiniband/sw/rxe/rxe_resp.c @@ -641,6 +641,78 @@ static struct sk_buff *prepare_ack_packet(struct rxe_qp *qp, return skb; } +static struct resp_res *rxe_prepare_read_res(struct rxe_qp *qp, + struct rxe_pkt_info *pkt) +{ + struct resp_res *res; + u32 pkts; + + res = &qp->resp.resources[qp->resp.res_head]; + rxe_advance_resp_resource(qp); + free_rd_atomic_resource(qp, res); + + res->type = RXE_READ_MASK; + res->replay = 0; + res->read.va = qp->resp.va + qp->resp.offset; + res->read.va_org = qp->resp.va + qp->resp.offset; + res->read.resid = qp->resp.resid; + res->read.length = qp->resp.resid; + res->read.rkey = qp->resp.rkey; + + pkts = max_t(u32, (reth_len(pkt) + qp->mtu - 1)/qp->mtu, 1); + res->first_psn = pkt->psn; + res->cur_psn = pkt->psn; + res->last_psn = (pkt->psn + pkts - 1) & BTH_PSN_MASK; + + res->state = rdatm_res_state_new; + + return res; +} + +/** + * rxe_recheck_mr() - revalidate MR from rkey and get a reference + * @qp: the qp + * @rkey: the rkey + * + * This allows the MR to be invalidated or deregistered, or the MW, + * if one was used, to be invalidated or deallocated, between packets. + * It is assumed that the access permissions, if originally good, + * are still good and that the mappings are unchanged. + * + * Return: mr on success else NULL + */ +static struct rxe_mr *rxe_recheck_mr(struct rxe_qp *qp, u32 rkey) +{ + struct rxe_dev *rxe = to_rdev(qp->ibqp.device); + struct rxe_mr *mr; + struct rxe_mw *mw; + + if (rkey_is_mw(rkey)) { + mw = rxe_pool_get_index(&rxe->mw_pool, rkey >> 8); + if (!mw || mw->rkey != rkey) + return NULL; + + if (mw->state != RXE_MW_STATE_VALID) { + rxe_drop_ref(mw); + return NULL; + } + + mr = mw->mr; + rxe_drop_ref(mw); + } else { + mr = rxe_pool_get_index(&rxe->mr_pool, rkey >> 8); + if (!mr || mr->rkey != rkey) + return NULL; + } + + if (mr->state != RXE_MR_STATE_VALID) { + rxe_drop_ref(mr); + return NULL; + } + + return mr; +} + /* RDMA read response. If res is not NULL, then we have a current RDMA request * being processed or replayed. */ @@ -655,53 +727,26 @@ static enum resp_states read_reply(struct rxe_qp *qp, int opcode; int err; struct resp_res *res = qp->resp.res; + struct rxe_mr *mr; if (!res) { - /* This is the first time we process that request.
Get a - * resource - */ - res = &qp->resp.resources[qp->resp.res_head]; - - free_rd_atomic_resource(qp, res); - rxe_advance_resp_resource(qp); - - res->type = RXE_READ_MASK; - res->replay = 0; - - res->read.va = qp->resp.va + - qp->resp.offset; - res->read.va_org = qp->resp.va + - qp->resp.offset; - - res->first_psn = req_pkt->psn; - - if (reth_len(req_pkt)) { - res->last_psn = (req_pkt->psn + - (reth_len(req_pkt) + mtu - 1) / - mtu - 1) & BTH_PSN_MASK; - } else { - res->last_psn = res->first_psn; - } - res->cur_psn = req_pkt->psn; - - res->read.resid = qp->resp.resid; - res->read.length = qp->resp.resid; - res->read.rkey = qp->resp.rkey; - - /* note res inherits the reference to mr from qp */ - res->read.mr = qp->resp.mr; - qp->resp.mr = NULL; - - qp->resp.res = res; - res->state = rdatm_res_state_new; + res = rxe_prepare_read_res(qp, req_pkt); + qp->resp.res = res; } if (res->state == rdatm_res_state_new) { + mr = qp->resp.mr; + qp->resp.mr = NULL; + if (res->read.resid <= mtu) opcode = IB_OPCODE_RC_RDMA_READ_RESPONSE_ONLY; else opcode = IB_OPCODE_RC_RDMA_READ_RESPONSE_FIRST; } else { + mr = rxe_recheck_mr(qp, res->read.rkey); + if (!mr) + return RESPST_ERR_RKEY_VIOLATION; + if (res->read.resid > mtu) opcode = IB_OPCODE_RC_RDMA_READ_RESPONSE_MIDDLE; else @@ -717,10 +762,12 @@ static enum resp_states read_reply(struct rxe_qp *qp, if (!skb) return RESPST_ERR_RNR; - err = rxe_mr_copy(res->read.mr, res->read.va, payload_addr(&ack_pkt), + err = rxe_mr_copy(mr, res->read.va, payload_addr(&ack_pkt), payload, RXE_FROM_MR_OBJ); if (err) pr_err("Failed copying memory\n"); + if (mr) + rxe_drop_ref(mr); if (bth_pad(&ack_pkt)) { u8 *pad = payload_addr(&ack_pkt) + payload; diff --git a/drivers/infiniband/sw/rxe/rxe_verbs.h b/drivers/infiniband/sw/rxe/rxe_verbs.h index e2431753755e..46eb6a19a800 100644 --- a/drivers/infiniband/sw/rxe/rxe_verbs.h +++ b/drivers/infiniband/sw/rxe/rxe_verbs.h @@ -157,7 +157,6 @@ struct resp_res { struct sk_buff *skb; } atomic; struct { - struct rxe_mr *mr; u64 va_org; u32 rkey; u32 length;
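Taken together, the series leaves pool object lifetimes looking roughly like this (a minimal sketch, not code from the patches, assuming the xarray-backed pool introduced earlier in the series):

	/* lookup by index takes a reference
	 * (kref_get_unless_zero() under the hood)
	 */
	struct rxe_mr *mr = rxe_pool_get_index(&rxe->mr_pool, rkey >> 8);
	if (!mr)
		return NULL;	/* index stale or already being freed */

	/* ... use mr ... */

	/* the last rxe_drop_ref() erases the index under the xa lock and
	 * frees the object in rxe_elem_release()
	 */
	rxe_drop_ref(mr);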