From patchwork Thu Dec 7 19:29:02 2023
X-Patchwork-Submitter: Bob Pearson
X-Patchwork-Id: 13483878
From: Bob Pearson
To: jgg@nvidia.com, yanjun.zhu@linux.dev, linux-rdma@vger.kernel.org,
    netdev@vger.kernel.org, dsahern@kernel.org, rain.1986.08.12@gmail.com
Cc: Bob Pearson
Subject: [PATCH for-next v6 1/7] RDMA/rxe: Cleanup rxe_ah/av_chk_attr
Date: Thu, 7 Dec 2023 13:29:02 -0600
Message-Id: <20231207192907.10113-2-rpearsonhpe@gmail.com>
In-Reply-To: <20231207192907.10113-1-rpearsonhpe@gmail.com>

Replace rxe_ah_chk_attr() and rxe_av_chk_attr() with a single routine
rxe_chk_ah_attr().
Signed-off-by: Bob Pearson
---
 drivers/infiniband/sw/rxe/rxe_av.c    | 43 ++++-----------------------
 drivers/infiniband/sw/rxe/rxe_loc.h   |  3 +-
 drivers/infiniband/sw/rxe/rxe_qp.c    |  4 +--
 drivers/infiniband/sw/rxe/rxe_verbs.c |  5 ++--
 4 files changed, 12 insertions(+), 43 deletions(-)

diff --git a/drivers/infiniband/sw/rxe/rxe_av.c b/drivers/infiniband/sw/rxe/rxe_av.c
index 889d7adbd455..4ac17b8def28 100644
--- a/drivers/infiniband/sw/rxe/rxe_av.c
+++ b/drivers/infiniband/sw/rxe/rxe_av.c
@@ -14,45 +14,24 @@ void rxe_init_av(struct rdma_ah_attr *attr, struct rxe_av *av)
     memcpy(av->dmac, attr->roce.dmac, ETH_ALEN);
 }

-static int chk_attr(void *obj, struct rdma_ah_attr *attr, bool obj_is_ah)
+int rxe_chk_ah_attr(struct rxe_dev *rxe, struct rdma_ah_attr *attr)
 {
     const struct ib_global_route *grh = rdma_ah_read_grh(attr);
-    struct rxe_port *port;
-    struct rxe_dev *rxe;
-    struct rxe_qp *qp;
-    struct rxe_ah *ah;
+    struct rxe_port *port = &rxe->port;
     int type;

-    if (obj_is_ah) {
-        ah = obj;
-        rxe = to_rdev(ah->ibah.device);
-    } else {
-        qp = obj;
-        rxe = to_rdev(qp->ibqp.device);
-    }
-
-    port = &rxe->port;
-
     if (rdma_ah_get_ah_flags(attr) & IB_AH_GRH) {
         if (grh->sgid_index > port->attr.gid_tbl_len) {
-            if (obj_is_ah)
-                rxe_dbg_ah(ah, "invalid sgid index = %d\n",
-                       grh->sgid_index);
-            else
-                rxe_dbg_qp(qp, "invalid sgid index = %d\n",
-                       grh->sgid_index);
+            rxe_dbg_dev(rxe, "invalid sgid index = %d\n",
+                    grh->sgid_index);
             return -EINVAL;
         }

         type = rdma_gid_attr_network_type(grh->sgid_attr);
         if (type < RDMA_NETWORK_IPV4 || type > RDMA_NETWORK_IPV6) {
-            if (obj_is_ah)
-                rxe_dbg_ah(ah, "invalid network type for rdma_rxe = %d\n",
-                       type);
-            else
-                rxe_dbg_qp(qp, "invalid network type for rdma_rxe = %d\n",
-                       type);
+            rxe_dbg_dev(rxe, "invalid network type for rdma_rxe = %d\n",
+                    type);
             return -EINVAL;
         }
     }
@@ -60,16 +39,6 @@ static int chk_attr(void *obj, struct rdma_ah_attr *attr, bool obj_is_ah)
     return 0;
 }

-int rxe_av_chk_attr(struct rxe_qp *qp, struct rdma_ah_attr *attr)
-{
-    return chk_attr(qp, attr, false);
-}
-
-int rxe_ah_chk_attr(struct rxe_ah *ah, struct rdma_ah_attr *attr)
-{
-    return chk_attr(ah, attr, true);
-}
-
 void rxe_av_from_attr(u8 port_num, struct rxe_av *av,
               struct rdma_ah_attr *attr)
 {
diff --git a/drivers/infiniband/sw/rxe/rxe_loc.h b/drivers/infiniband/sw/rxe/rxe_loc.h
index 4d2a8ef52c85..3d2504a0ae56 100644
--- a/drivers/infiniband/sw/rxe/rxe_loc.h
+++ b/drivers/infiniband/sw/rxe/rxe_loc.h
@@ -9,8 +9,7 @@

 /* rxe_av.c */
 void rxe_init_av(struct rdma_ah_attr *attr, struct rxe_av *av);
-int rxe_av_chk_attr(struct rxe_qp *qp, struct rdma_ah_attr *attr);
-int rxe_ah_chk_attr(struct rxe_ah *ah, struct rdma_ah_attr *attr);
+int rxe_chk_ah_attr(struct rxe_dev *rxe, struct rdma_ah_attr *attr);
 void rxe_av_from_attr(u8 port_num, struct rxe_av *av,
               struct rdma_ah_attr *attr);
 void rxe_av_to_attr(struct rxe_av *av, struct rdma_ah_attr *attr);
diff --git a/drivers/infiniband/sw/rxe/rxe_qp.c b/drivers/infiniband/sw/rxe/rxe_qp.c
index 28e379c108bc..c28005db032d 100644
--- a/drivers/infiniband/sw/rxe/rxe_qp.c
+++ b/drivers/infiniband/sw/rxe/rxe_qp.c
@@ -456,11 +456,11 @@ int rxe_qp_chk_attr(struct rxe_dev *rxe, struct rxe_qp *qp,
         goto err1;
     }

-    if (mask & IB_QP_AV && rxe_av_chk_attr(qp, &attr->ah_attr))
+    if (mask & IB_QP_AV && rxe_chk_ah_attr(rxe, &attr->ah_attr))
         goto err1;

     if (mask & IB_QP_ALT_PATH) {
-        if (rxe_av_chk_attr(qp, &attr->alt_ah_attr))
+        if (rxe_chk_ah_attr(rxe, &attr->alt_ah_attr))
             goto err1;
         if (!rdma_is_port_valid(&rxe->ib_dev, attr->alt_port_num)) {
             rxe_dbg_qp(qp, "invalid alt port %d\n", attr->alt_port_num);
%d\n", attr->alt_port_num); diff --git a/drivers/infiniband/sw/rxe/rxe_verbs.c b/drivers/infiniband/sw/rxe/rxe_verbs.c index 48f86839d36a..6706d540f1f6 100644 --- a/drivers/infiniband/sw/rxe/rxe_verbs.c +++ b/drivers/infiniband/sw/rxe/rxe_verbs.c @@ -286,7 +286,7 @@ static int rxe_create_ah(struct ib_ah *ibah, /* create index > 0 */ ah->ah_num = ah->elem.index; - err = rxe_ah_chk_attr(ah, init_attr->ah_attr); + err = rxe_chk_ah_attr(rxe, init_attr->ah_attr); if (err) { rxe_dbg_ah(ah, "bad attr"); goto err_cleanup; @@ -322,10 +322,11 @@ static int rxe_create_ah(struct ib_ah *ibah, static int rxe_modify_ah(struct ib_ah *ibah, struct rdma_ah_attr *attr) { + struct rxe_dev *rxe = to_rdev(ibah->device); struct rxe_ah *ah = to_rah(ibah); int err; - err = rxe_ah_chk_attr(ah, attr); + err = rxe_chk_ah_attr(rxe, attr); if (err) { rxe_dbg_ah(ah, "bad attr"); goto err_out; From patchwork Thu Dec 7 19:29:03 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Bob Pearson X-Patchwork-Id: 13483879 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="TqNpJanj" Received: from mail-oa1-x29.google.com (mail-oa1-x29.google.com [IPv6:2001:4860:4864:20::29]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id E392A1722; Thu, 7 Dec 2023 11:30:18 -0800 (PST) Received: by mail-oa1-x29.google.com with SMTP id 586e51a60fabf-1fb9a22b4a7so787313fac.3; Thu, 07 Dec 2023 11:30:18 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1701977417; x=1702582217; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=XAo9Voy4RzHWA16EdVudg5saPOZLVlTIEuxiR5JPPMk=; b=TqNpJanje8kkQVMPKrLidQ738NHVW3WDmhI7totVpeNpT1a3xIBe4E6ZLeEv6lzPqo bOHMQjdTI950dEv2GqnlxNp3UJ66AAn0xo85NhtgnUeV3hzbD3O0g4NMX1IVCwcAB8MH drkvzWgdtGPlbut47n7uflG8usco/RSmlGkJ6C5/IijWt+V5cVQQbhn0GF40qDfuqWta 5B6IY+tZenRch2Z/IhuWceQenGr/NZjDOQnDHCkzoaidlm4gWZBM7+Kmoj0pW+Uv9DOy iKHlYiGLftPavg/5OYCDq1SyONEt392KELJL2wsdCMX2gnaTGI9LzxWlXwYDv80hXRkP hgJg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1701977417; x=1702582217; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=XAo9Voy4RzHWA16EdVudg5saPOZLVlTIEuxiR5JPPMk=; b=t2TrpWEbJoFMh0S65eySx7x/pVYGgJE8YUSMshLRNjhUUo9Iamc9lm5FIY7RvM0xVo 92R9axn/X4d2+chuAVlLtEyHQnRHVOWIHq9rsGLHgeNPxHZre/soI3OsCaMP9YBnjN8q 8IZoZO+EPznFtlLOmbq8pxW0WP75RG64x/Hr5PCnjCPM39JrApDdjXxql8/Ls6cTAzde wAVU1pcp9sIisDwwghC36gynnUEy0rEIr59yO8XQXM+B/fkoWhuBQ6AjlfIleJFEvD3v CYHtgSkvOahueRW5KjhO/kdvwkNdaJdAHbJLcs/CGVgrbOkDy6Vo6OM/M40bl+HJDvwg rtSg== X-Gm-Message-State: AOJu0YxaIKTVu87wWyAuGjTUKv6hGSWa8fH5RC9ouA2P1rDOGwOOqs/8 e4xwrrgFzHovCoZkEpttYsg= X-Google-Smtp-Source: AGHT+IEVAmegguGtfNHWli3NSVAd9B1FfQ+9wHqXSwq9t0bBPjvI+hoO179TRxj0kIglx8x1lsh1LQ== X-Received: by 2002:a05:6870:1f16:b0:1fb:75b:99bd with SMTP id pd22-20020a0568701f1600b001fb075b99bdmr3526861oab.108.1701977417514; Thu, 07 Dec 2023 11:30:17 -0800 (PST) Received: from bob-3900x.lan (2603-8081-1405-679b-ca1f-53fd-59c8-8b84.res6.spectrum.com. 
From patchwork Thu Dec 7 19:29:03 2023
X-Patchwork-Submitter: Bob Pearson
X-Patchwork-Id: 13483879
From: Bob Pearson
To: jgg@nvidia.com, yanjun.zhu@linux.dev, linux-rdma@vger.kernel.org,
    netdev@vger.kernel.org, dsahern@kernel.org, rain.1986.08.12@gmail.com
Cc: Bob Pearson
Subject: [PATCH for-next v6 2/7] RDMA/rxe: Fix sending of mcast packets
Date: Thu, 7 Dec 2023 13:29:03 -0600
Message-Id: <20231207192907.10113-3-rpearsonhpe@gmail.com>
In-Reply-To: <20231207192907.10113-1-rpearsonhpe@gmail.com>

Currently the rdma_rxe driver does not send mcast packets correctly;
it uses the wrong qp number for the packets. Add a mask bit to indicate
that a multicast packet has been locally sent and use it to set the
correct qpn for multicast packets and to identify mcast packets when
sending.

Fixes: 8700e3e7c485 ("Soft RoCE driver")
Signed-off-by: Bob Pearson
---
 drivers/infiniband/sw/rxe/rxe_av.c     |  7 +++++++
 drivers/infiniband/sw/rxe/rxe_loc.h    |  1 +
 drivers/infiniband/sw/rxe/rxe_net.c    |  4 +++-
 drivers/infiniband/sw/rxe/rxe_opcode.h |  2 +-
 drivers/infiniband/sw/rxe/rxe_recv.c   |  4 ++++
 drivers/infiniband/sw/rxe/rxe_req.c    | 11 +++++++++--
 6 files changed, 25 insertions(+), 4 deletions(-)

diff --git a/drivers/infiniband/sw/rxe/rxe_av.c b/drivers/infiniband/sw/rxe/rxe_av.c
index 4ac17b8def28..022173eb5d75 100644
--- a/drivers/infiniband/sw/rxe/rxe_av.c
+++ b/drivers/infiniband/sw/rxe/rxe_av.c
@@ -7,6 +7,13 @@
 #include "rxe.h"
 #include "rxe_loc.h"

+bool rxe_is_mcast_av(struct rxe_av *av)
+{
+    struct in6_addr *daddr = (struct in6_addr *)av->grh.dgid.raw;
+
+    return rdma_is_multicast_addr(daddr);
+}
+
 void rxe_init_av(struct rdma_ah_attr *attr, struct rxe_av *av)
 {
     rxe_av_from_attr(rdma_ah_get_port_num(attr), av, attr);
diff --git a/drivers/infiniband/sw/rxe/rxe_loc.h b/drivers/infiniband/sw/rxe/rxe_loc.h
index 3d2504a0ae56..62b2b25903fc 100644
--- a/drivers/infiniband/sw/rxe/rxe_loc.h
+++ b/drivers/infiniband/sw/rxe/rxe_loc.h
@@ -8,6 +8,7 @@
 #define RXE_LOC_H

 /* rxe_av.c */
+bool rxe_is_mcast_av(struct rxe_av *av);
 void rxe_init_av(struct rdma_ah_attr *attr, struct rxe_av *av);
 int rxe_chk_ah_attr(struct rxe_dev *rxe, struct rdma_ah_attr *attr);
 void rxe_av_from_attr(u8 port_num, struct rxe_av *av,
diff --git a/drivers/infiniband/sw/rxe/rxe_net.c b/drivers/infiniband/sw/rxe/rxe_net.c
index cd59666158b1..58c3f3759bf0 100644
--- a/drivers/infiniband/sw/rxe/rxe_net.c
+++ b/drivers/infiniband/sw/rxe/rxe_net.c
@@ -431,7 +431,9 @@ int rxe_xmit_packet(struct rxe_qp *qp, struct rxe_pkt_info *pkt,

     rxe_icrc_generate(skb, pkt);

-    if (pkt->mask & RXE_LOOPBACK_MASK)
+    if (pkt->mask & RXE_MCAST_MASK)
+        err = rxe_send(skb, pkt);
+    else if (pkt->mask & RXE_LOOPBACK_MASK)
         err = rxe_loopback(skb, pkt);
     else
         err = rxe_send(skb, pkt);
diff --git a/drivers/infiniband/sw/rxe/rxe_opcode.h b/drivers/infiniband/sw/rxe/rxe_opcode.h
index 5686b691d6b8..c4cf672ea26d 100644
--- a/drivers/infiniband/sw/rxe/rxe_opcode.h
+++ b/drivers/infiniband/sw/rxe/rxe_opcode.h
@@ -85,7 +85,7 @@ enum rxe_hdr_mask {
     RXE_END_MASK        = BIT(NUM_HDR_TYPES + 11),

     RXE_LOOPBACK_MASK   = BIT(NUM_HDR_TYPES + 12),
-
+    RXE_MCAST_MASK      = BIT(NUM_HDR_TYPES + 13),
     RXE_ATOMIC_WRITE_MASK   = BIT(NUM_HDR_TYPES + 14),
     RXE_READ_OR_ATOMIC_MASK = (RXE_READ_MASK | RXE_ATOMIC_MASK),
diff --git a/drivers/infiniband/sw/rxe/rxe_recv.c b/drivers/infiniband/sw/rxe/rxe_recv.c
index 5861e4244049..7153de0799fc 100644
--- a/drivers/infiniband/sw/rxe/rxe_recv.c
+++ b/drivers/infiniband/sw/rxe/rxe_recv.c
@@ -217,6 +217,10 @@ static void rxe_rcv_mcast_pkt(struct rxe_dev *rxe, struct sk_buff *skb)
     list_for_each_entry(mca, &mcg->qp_list, qp_list) {
         qp = mca->qp;

+        /* don't reply packet to sender if locally sent */
+        if (pkt->mask & RXE_MCAST_MASK && qp_num(qp) == deth_sqp(pkt))
+            continue;
+
         /* validate qp for incoming packet */
         err = check_type_state(rxe, pkt, qp);
         if (err)
diff --git a/drivers/infiniband/sw/rxe/rxe_req.c b/drivers/infiniband/sw/rxe/rxe_req.c
index d8c41fd626a9..599bec88cb54 100644
--- a/drivers/infiniband/sw/rxe/rxe_req.c
+++ b/drivers/infiniband/sw/rxe/rxe_req.c
@@ -442,8 +442,12 @@ static struct sk_buff *init_req_packet(struct rxe_qp *qp,
             (pkt->mask & (RXE_WRITE_MASK | RXE_IMMDT_MASK)) ==
             (RXE_WRITE_MASK | RXE_IMMDT_MASK));

-    qp_num = (pkt->mask & RXE_DETH_MASK) ? ibwr->wr.ud.remote_qpn :
-                     qp->attr.dest_qp_num;
+    if (pkt->mask & RXE_MCAST_MASK)
+        qp_num = IB_MULTICAST_QPN;
+    else if (pkt->mask & RXE_DETH_MASK)
+        qp_num = ibwr->wr.ud.remote_qpn;
+    else
+        qp_num = qp->attr.dest_qp_num;

     ack_req = ((pkt->mask & RXE_END_MASK) ||
            (qp->req.noack_pkts++ > RXE_MAX_PKT_PER_ACK));
@@ -809,6 +813,9 @@ int rxe_requester(struct rxe_qp *qp)
         goto err;
     }

+    if (rxe_is_mcast_av(av))
+        pkt.mask |= RXE_MCAST_MASK;
+
     skb = init_req_packet(qp, av, wqe, opcode, payload, &pkt);
     if (unlikely(!skb)) {
         rxe_dbg_qp(qp, "Failed allocating skb\n");
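To illustrate the two decisions this patch adds, here is a hedged,
self-contained condensation (the helper name choose_bth_qpn is
hypothetical; RXE_MCAST_MASK, RXE_DETH_MASK and IB_MULTICAST_QPN are
the driver's/IBA's names): the destination GID of the AV identifies a
multicast packet, and such packets must carry the fixed multicast QPN
(0xffffff per the IBA) in the BTH instead of a unicast destination QPN.

    /* hypothetical condensation of the init_req_packet() logic above */
    static u32 choose_bth_qpn(struct rxe_pkt_info *pkt,
                  struct rxe_send_wqe *wqe, struct rxe_qp *qp)
    {
        if (pkt->mask & RXE_MCAST_MASK)     /* locally sent mcast */
            return IB_MULTICAST_QPN;    /* 0xffffff */
        if (pkt->mask & RXE_DETH_MASK)      /* UD: QPN from the WR */
            return wqe->wr.ud.remote_qpn;
        return qp->attr.dest_qp_num;        /* connected QPs */
    }

The same mask bit then steers transmission (multicast frames always go
out on the wire, never through the loopback shortcut) and lets the
receive path skip delivering a locally sent packet back to its sender.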
From patchwork Thu Dec 7 19:29:04 2023
X-Patchwork-Submitter: Bob Pearson
X-Patchwork-Id: 13483880
From: Bob Pearson
To: jgg@nvidia.com, yanjun.zhu@linux.dev, linux-rdma@vger.kernel.org,
    netdev@vger.kernel.org, dsahern@kernel.org, rain.1986.08.12@gmail.com
Cc: Bob Pearson
Subject: [PATCH for-next v6 3/7] RDMA/rxe: Register IP mcast address
Date: Thu, 7 Dec 2023 13:29:04 -0600
Message-Id: <20231207192907.10113-4-rpearsonhpe@gmail.com>
In-Reply-To: <20231207192907.10113-1-rpearsonhpe@gmail.com>

Currently the rdma_rxe driver does not receive mcast packets at all.
Add code to rxe_mcast_add() and rxe_mcast_del() to register/deregister
the IP mcast address. This is required for mcast traffic to reach the
rxe driver when it comes from an external source.

Fixes: 8700e3e7c485 ("Soft RoCE driver")
Signed-off-by: Bob Pearson
---
 drivers/infiniband/sw/rxe/rxe_mcast.c | 119 +++++++++++++++++++++-----
 drivers/infiniband/sw/rxe/rxe_net.c   |   2 +-
 drivers/infiniband/sw/rxe/rxe_net.h   |   1 +
 drivers/infiniband/sw/rxe/rxe_verbs.h |   1 +
 4 files changed, 102 insertions(+), 21 deletions(-)

diff --git a/drivers/infiniband/sw/rxe/rxe_mcast.c b/drivers/infiniband/sw/rxe/rxe_mcast.c
index 86cc2e18a7fd..5236761892dd 100644
--- a/drivers/infiniband/sw/rxe/rxe_mcast.c
+++ b/drivers/infiniband/sw/rxe/rxe_mcast.c
@@ -19,38 +19,116 @@
  * mcast packets in the rxe receive path.
  */

+#include
+
 #include "rxe.h"

-/**
- * rxe_mcast_add - add multicast address to rxe device
- * @rxe: rxe device object
- * @mgid: multicast address as a gid
- *
- * Returns 0 on success else an error
- */
-static int rxe_mcast_add(struct rxe_dev *rxe, union ib_gid *mgid)
+static int rxe_mcast_add6(struct rxe_dev *rxe, union ib_gid *mgid)
 {
+    struct in6_addr *addr6 = (struct in6_addr *)mgid;
+    struct sock *sk = recv_sockets.sk6->sk;
     unsigned char ll_addr[ETH_ALEN];
+    int err;
+
+    lock_sock(sk);
+    rtnl_lock();
+    err = ipv6_sock_mc_join(sk, rxe->ndev->ifindex, addr6);
+    rtnl_unlock();
+    release_sock(sk);
+    if (err && err != -EADDRINUSE)
+        goto err_out;

     ipv6_eth_mc_map((struct in6_addr *)mgid->raw, ll_addr);
+    err = dev_mc_add(rxe->ndev, ll_addr);
+    if (err)
+        goto err_drop;
+
+    return 0;

-    return dev_mc_add(rxe->ndev, ll_addr);
+err_drop:
+    lock_sock(sk);
+    rtnl_lock();
+    ipv6_sock_mc_drop(sk, rxe->ndev->ifindex, addr6);
+    rtnl_unlock();
+    release_sock(sk);
+err_out:
+    return err;
 }

-/**
- * rxe_mcast_del - delete multicast address from rxe device
- * @rxe: rxe device object
- * @mgid: multicast address as a gid
- *
- * Returns 0 on success else an error
- */
-static int rxe_mcast_del(struct rxe_dev *rxe, union ib_gid *mgid)
+static int rxe_mcast_add(struct rxe_mcg *mcg)
 {
+    struct rxe_dev *rxe = mcg->rxe;
+    union ib_gid *mgid = &mcg->mgid;
     unsigned char ll_addr[ETH_ALEN];
+    struct ip_mreqn imr = {};
+    int err;
+
+    if (mcg->is_ipv6)
+        return rxe_mcast_add6(rxe, mgid);
+
+    imr.imr_multiaddr = *(struct in_addr *)(mgid->raw + 12);
+    imr.imr_ifindex = rxe->ndev->ifindex;
+    rtnl_lock();
+    err = ip_mc_join_group(recv_sockets.sk4->sk, &imr);
+    rtnl_unlock();
+    if (err && err != -EADDRINUSE)
+        goto err_out;
+
+    ip_eth_mc_map(imr.imr_multiaddr.s_addr, ll_addr);
+    err = dev_mc_add(rxe->ndev, ll_addr);
+    if (err)
+        goto err_leave;
+
+    return 0;
+
+err_leave:
+    rtnl_lock();
+    ip_mc_leave_group(recv_sockets.sk4->sk, &imr);
+    rtnl_unlock();
+err_out:
+    return err;
+}
+
+static int rxe_mcast_del6(struct rxe_dev *rxe, union ib_gid *mgid)
+{
+    struct sock *sk = recv_sockets.sk6->sk;
+    unsigned char ll_addr[ETH_ALEN];
+    int err, err2;

     ipv6_eth_mc_map((struct in6_addr *)mgid->raw, ll_addr);
+    err = dev_mc_del(rxe->ndev, ll_addr);
+
+    lock_sock(sk);
+    rtnl_lock();
+    err2 = ipv6_sock_mc_drop(sk, rxe->ndev->ifindex,
+                 (struct in6_addr *)mgid);
+    rtnl_unlock();
+    release_sock(sk);
+
+    return err ?: err2;
+}
+
+static int rxe_mcast_del(struct rxe_mcg *mcg)
+{
+    struct rxe_dev *rxe = mcg->rxe;
+    union ib_gid *mgid = &mcg->mgid;
+    unsigned char ll_addr[ETH_ALEN];
+    struct ip_mreqn imr = {};
+    int err, err2;
+
+    if (mcg->is_ipv6)
+        return rxe_mcast_del6(rxe, mgid);
+
+    imr.imr_multiaddr = *(struct in_addr *)(mgid->raw + 12);
+    imr.imr_ifindex = rxe->ndev->ifindex;
+    ip_eth_mc_map(imr.imr_multiaddr.s_addr, ll_addr);
+    err = dev_mc_del(rxe->ndev, ll_addr);
+
+    rtnl_lock();
+    err2 = ip_mc_leave_group(recv_sockets.sk4->sk, &imr);
+    rtnl_unlock();

-    return dev_mc_del(rxe->ndev, ll_addr);
+    return err ?: err2;
 }

 /**
@@ -164,6 +242,7 @@ static void __rxe_init_mcg(struct rxe_dev *rxe, union ib_gid *mgid,
 {
     kref_init(&mcg->ref_cnt);
     memcpy(&mcg->mgid, mgid, sizeof(mcg->mgid));
+    mcg->is_ipv6 = !ipv6_addr_v4mapped((struct in6_addr *)mgid);
     INIT_LIST_HEAD(&mcg->qp_list);
     mcg->rxe = rxe;

@@ -225,7 +304,7 @@ static struct rxe_mcg *rxe_get_mcg(struct rxe_dev *rxe, union ib_gid *mgid)
     spin_unlock_bh(&rxe->mcg_lock);

     /* add mcast address outside of lock */
-    err = rxe_mcast_add(rxe, mgid);
+    err = rxe_mcast_add(mcg);
     if (!err)
         return mcg;

@@ -273,7 +352,7 @@ static void __rxe_destroy_mcg(struct rxe_mcg *mcg)
 static void rxe_destroy_mcg(struct rxe_mcg *mcg)
 {
     /* delete mcast address outside of lock */
-    rxe_mcast_del(mcg->rxe, &mcg->mgid);
+    rxe_mcast_del(mcg);

     spin_lock_bh(&mcg->rxe->mcg_lock);
     __rxe_destroy_mcg(mcg);
diff --git a/drivers/infiniband/sw/rxe/rxe_net.c b/drivers/infiniband/sw/rxe/rxe_net.c
index 58c3f3759bf0..b481f8da2002 100644
--- a/drivers/infiniband/sw/rxe/rxe_net.c
+++ b/drivers/infiniband/sw/rxe/rxe_net.c
@@ -18,7 +18,7 @@
 #include "rxe_net.h"
 #include "rxe_loc.h"

-static struct rxe_recv_sockets recv_sockets;
+struct rxe_recv_sockets recv_sockets;

 static struct dst_entry *rxe_find_route4(struct rxe_qp *qp,
                      struct net_device *ndev,
diff --git a/drivers/infiniband/sw/rxe/rxe_net.h b/drivers/infiniband/sw/rxe/rxe_net.h
index 45d80d00f86b..89cee7d5340f 100644
--- a/drivers/infiniband/sw/rxe/rxe_net.h
+++ b/drivers/infiniband/sw/rxe/rxe_net.h
@@ -15,6 +15,7 @@ struct rxe_recv_sockets {
     struct socket *sk4;
     struct socket *sk6;
 };
+extern struct rxe_recv_sockets recv_sockets;

 int rxe_net_add(const char *ibdev_name, struct net_device *ndev);
diff --git a/drivers/infiniband/sw/rxe/rxe_verbs.h b/drivers/infiniband/sw/rxe/rxe_verbs.h
index ccb9d19ffe8a..7be9e6232dd9 100644
--- a/drivers/infiniband/sw/rxe/rxe_verbs.h
+++ b/drivers/infiniband/sw/rxe/rxe_verbs.h
@@ -352,6 +352,7 @@ struct rxe_mcg {
     atomic_t qp_num;
     u32 qkey;
     u16 pkey;
+    bool is_ipv6;
 };

 struct rxe_mca {
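The key fact the patch relies on, sketched below with a hypothetical
standalone helper (only the mapping functions and types are existing
kernel APIs): a RoCEv2 MGID is just an IP multicast address in GID
form, so an IPv4-mapped MGID carries the group address in bytes 12..15,
and the corresponding ethernet multicast MAC must be in the NIC's
filter before any frames are delivered to the driver at all.

    #include <linux/if_ether.h>
    #include <net/addrconf.h>
    #include <net/ip.h>
    #include <net/ipv6.h>
    #include <rdma/ib_verbs.h>

    /* hypothetical helper: derive the L2 multicast address for an MGID */
    static void mgid_to_eth_mcast(union ib_gid *mgid,
                      unsigned char ll_addr[ETH_ALEN])
    {
        if (ipv6_addr_v4mapped((struct in6_addr *)mgid)) {
            __be32 ip = *(__be32 *)(mgid->raw + 12); /* v4 group */
            ip_eth_mc_map(ip, ll_addr);   /* 01:00:5e:xx:xx:xx */
        } else {
            ipv6_eth_mc_map((struct in6_addr *)mgid->raw,
                    ll_addr);     /* 33:33:xx:xx:xx:xx */
        }
    }

Joining at the IP socket level (ip_mc_join_group/ipv6_sock_mc_join) and
adding the mapped MAC with dev_mc_add() together make externally
originated multicast traffic reach the rxe receive path.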
From patchwork Thu Dec 7 19:29:05 2023
X-Patchwork-Submitter: Bob Pearson
X-Patchwork-Id: 13483881
From: Bob Pearson
To: jgg@nvidia.com, yanjun.zhu@linux.dev, linux-rdma@vger.kernel.org,
    netdev@vger.kernel.org, dsahern@kernel.org, rain.1986.08.12@gmail.com
Cc: Bob Pearson
Subject: [PATCH for-next v6 4/7] RDMA/rxe: Let rxe_lookup_mcg use rcu_read_lock
Date: Thu, 7 Dec 2023 13:29:05 -0600
Message-Id: <20231207192907.10113-5-rpearsonhpe@gmail.com>
In-Reply-To: <20231207192907.10113-1-rpearsonhpe@gmail.com>

Change the locking of read-side operations on the mcast group red-black
tree to use RCU read locking. This will allow the mcast lock to become
a mutex in the next patch without breaking rxe_recv.c, which runs in
atomic context. It is also a better implementation than the current use
of a spinlock per rdma device, since receiving mcast packets will be
much more common than registering/deregistering mcast groups.

Signed-off-by: Bob Pearson
---
 drivers/infiniband/sw/rxe/rxe_mcast.c | 59 +++++++++------------------
 drivers/infiniband/sw/rxe/rxe_verbs.h |  1 +
 2 files changed, 21 insertions(+), 39 deletions(-)

diff --git a/drivers/infiniband/sw/rxe/rxe_mcast.c b/drivers/infiniband/sw/rxe/rxe_mcast.c
index 5236761892dd..315e7615e6a7 100644
--- a/drivers/infiniband/sw/rxe/rxe_mcast.c
+++ b/drivers/infiniband/sw/rxe/rxe_mcast.c
@@ -151,13 +151,18 @@ static void __rxe_insert_mcg(struct rxe_mcg *mcg)
         tmp = rb_entry(node, struct rxe_mcg, node);

         cmp = memcmp(&tmp->mgid, &mcg->mgid, sizeof(mcg->mgid));
-        if (cmp > 0)
+        if (cmp > 0) {
             link = &(*link)->rb_left;
-        else
+        } else if (cmp < 0) {
             link = &(*link)->rb_right;
+        } else {
+            /* we must delete the old mcg before adding one */
+            WARN_ON_ONCE(1);
+            return;
+        }
     }

-    rb_link_node(&mcg->node, node, link);
+    rb_link_node_rcu(&mcg->node, node, link);
     rb_insert_color(&mcg->node, tree);
 }

@@ -172,15 +177,11 @@ static void __rxe_insert_mcg(struct rxe_mcg *mcg)
     rb_erase(&mcg->node, &mcg->rxe->mcg_tree);
 }

-/**
- * __rxe_lookup_mcg - lookup mcg in rxe->mcg_tree while holding lock
- * @rxe: rxe device object
- * @mgid: multicast IP address
- *
- * Context: caller must hold rxe->mcg_lock
- * Returns: mcg on success and takes a ref to mcg else NULL
+/*
+ * Lookup mgid in the multicast group red-black tree and try to
+ * get a ref on it. Return mcg on success else NULL.
  */
-static struct rxe_mcg *__rxe_lookup_mcg(struct rxe_dev *rxe,
+struct rxe_mcg *rxe_lookup_mcg(struct rxe_dev *rxe,
                    union ib_gid *mgid)
 {
     struct rb_root *tree = &rxe->mcg_tree;
@@ -188,7 +189,8 @@ static struct rxe_mcg *__rxe_lookup_mcg(struct rxe_dev *rxe,
     struct rb_node *node;
     int cmp;

-    node = tree->rb_node;
+    rcu_read_lock();
+    node = rcu_dereference_raw(tree->rb_node);

     while (node) {
         mcg = rb_entry(node, struct rxe_mcg, node);
@@ -196,35 +198,14 @@ static struct rxe_mcg *__rxe_lookup_mcg(struct rxe_dev *rxe,
         cmp = memcmp(&mcg->mgid, mgid, sizeof(*mgid));

         if (cmp > 0)
-            node = node->rb_left;
+            node = rcu_dereference_raw(node->rb_left);
         else if (cmp < 0)
-            node = node->rb_right;
+            node = rcu_dereference_raw(node->rb_right);
         else
             break;
     }
-
-    if (node) {
-        kref_get(&mcg->ref_cnt);
-        return mcg;
-    }
-
-    return NULL;
-}
-
-/**
- * rxe_lookup_mcg - lookup up mcg in red-back tree
- * @rxe: rxe device object
- * @mgid: multicast IP address
- *
- * Returns: mcg if found else NULL
- */
-struct rxe_mcg *rxe_lookup_mcg(struct rxe_dev *rxe, union ib_gid *mgid)
-{
-    struct rxe_mcg *mcg;
-
-    spin_lock_bh(&rxe->mcg_lock);
-    mcg = __rxe_lookup_mcg(rxe, mgid);
-    spin_unlock_bh(&rxe->mcg_lock);
+    mcg = (node && kref_get_unless_zero(&mcg->ref_cnt)) ? mcg : NULL;
+    rcu_read_unlock();

     return mcg;
 }
@@ -292,7 +273,7 @@ static struct rxe_mcg *rxe_get_mcg(struct rxe_dev *rxe, union ib_gid *mgid)
     spin_lock_bh(&rxe->mcg_lock);
     /* re-check to see if someone else just added it */
-    tmp = __rxe_lookup_mcg(rxe, mgid);
+    tmp = rxe_lookup_mcg(rxe, mgid);
     if (tmp) {
         spin_unlock_bh(&rxe->mcg_lock);
         atomic_dec(&rxe->mcg_num);
@@ -322,7 +303,7 @@ void rxe_cleanup_mcg(struct kref *kref)
 {
     struct rxe_mcg *mcg = container_of(kref, typeof(*mcg), ref_cnt);

-    kfree(mcg);
+    kfree_rcu(mcg, rcu);
 }

 /**
diff --git a/drivers/infiniband/sw/rxe/rxe_verbs.h b/drivers/infiniband/sw/rxe/rxe_verbs.h
index 7be9e6232dd9..8058e5039322 100644
--- a/drivers/infiniband/sw/rxe/rxe_verbs.h
+++ b/drivers/infiniband/sw/rxe/rxe_verbs.h
@@ -345,6 +345,7 @@ struct rxe_mw {

 struct rxe_mcg {
     struct rb_node node;
+    struct rcu_head rcu;
     struct kref ref_cnt;
     struct rxe_dev *rxe;
     struct list_head qp_list;
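The essence of the read-side pattern adopted here, as a generic sketch
(obj, tree and compare() are placeholders, not driver names): readers
walk the tree under rcu_read_lock() and must treat any node they find
as possibly being torn down concurrently, so the only safe way to
return it is to raise its refcount with kref_get_unless_zero(); writers
in turn free nodes through kfree_rcu(), so a reader still inside its
RCU critical section never touches freed memory.

    rcu_read_lock();
    node = rcu_dereference_raw(tree->rb_node);
    while (node) {
        obj = rb_entry(node, struct obj, node);
        cmp = compare(key, obj);
        if (cmp < 0)
            node = rcu_dereference_raw(node->rb_left);
        else if (cmp > 0)
            node = rcu_dereference_raw(node->rb_right);
        else
            break;
    }
    /* a zero refcount means the object is already on its way out */
    obj = (node && kref_get_unless_zero(&obj->ref_cnt)) ? obj : NULL;
    rcu_read_unlock();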
From patchwork Thu Dec 7 19:29:06 2023
X-Patchwork-Submitter: Bob Pearson
X-Patchwork-Id: 13483882
From: Bob Pearson
To: jgg@nvidia.com, yanjun.zhu@linux.dev, linux-rdma@vger.kernel.org,
    netdev@vger.kernel.org, dsahern@kernel.org, rain.1986.08.12@gmail.com
Cc: Bob Pearson
Subject: [PATCH for-next v6 5/7] RDMA/rxe: Split multicast lock
Date: Thu, 7 Dec 2023 13:29:06 -0600
Message-Id: <20231207192907.10113-6-rpearsonhpe@gmail.com>
In-Reply-To: <20231207192907.10113-1-rpearsonhpe@gmail.com>

Split rxe->mcg_lock into two locks: one to protect mcg->qp_list, and
one to protect rxe->mcg_tree (red-black tree) write-side operations and
to provide serialization between rxe_attach_mcast and rxe_detach_mcast.

Make the qp_list lock a spin_lock_irqsave lock and move it to the mcg
struct. It protects the qp_list from simultaneous access from
rxe_mcast.c and rxe_recv.c when processing incoming multicast packets.
In theory some ethernet driver could bypass NAPI, so an irq lock is
better than a bh lock.

Make the mcg_tree lock a mutex, since the attach/detach APIs are not
called in atomic context. This allows some significant cleanup: we can
call kzalloc while holding the mutex, so some recheck code can be
eliminated.
Signed-off-by: Bob Pearson
---
 drivers/infiniband/sw/rxe/rxe.c       |   2 +-
 drivers/infiniband/sw/rxe/rxe_mcast.c | 254 ++++++++++----------------
 drivers/infiniband/sw/rxe/rxe_recv.c  |   5 +-
 drivers/infiniband/sw/rxe/rxe_verbs.h |   3 +-
 4 files changed, 105 insertions(+), 159 deletions(-)

diff --git a/drivers/infiniband/sw/rxe/rxe.c b/drivers/infiniband/sw/rxe/rxe.c
index 54c723a6edda..147cb16e937d 100644
--- a/drivers/infiniband/sw/rxe/rxe.c
+++ b/drivers/infiniband/sw/rxe/rxe.c
@@ -142,7 +142,7 @@ static void rxe_init(struct rxe_dev *rxe)
     INIT_LIST_HEAD(&rxe->pending_mmaps);

     /* init multicast support */
-    spin_lock_init(&rxe->mcg_lock);
+    mutex_init(&rxe->mcg_mutex);
     rxe->mcg_tree = RB_ROOT;

     mutex_init(&rxe->usdev_lock);
diff --git a/drivers/infiniband/sw/rxe/rxe_mcast.c b/drivers/infiniband/sw/rxe/rxe_mcast.c
index 315e7615e6a7..1c85aa03f79b 100644
--- a/drivers/infiniband/sw/rxe/rxe_mcast.c
+++ b/drivers/infiniband/sw/rxe/rxe_mcast.c
@@ -135,7 +135,7 @@ static int rxe_mcast_del(struct rxe_mcg *mcg)
  * __rxe_insert_mcg - insert an mcg into red-black tree (rxe->mcg_tree)
  * @mcg: mcg object with an embedded red-black tree node
  *
- * Context: caller must hold a reference to mcg and rxe->mcg_lock and
+ * Context: caller must hold a reference to mcg and rxe->mcg_mutex and
  * is responsible to avoid adding the same mcg twice to the tree.
  */
 static void __rxe_insert_mcg(struct rxe_mcg *mcg)
@@ -170,7 +170,7 @@ static void __rxe_insert_mcg(struct rxe_mcg *mcg)
  * __rxe_remove_mcg - remove an mcg from red-black tree holding lock
  * @mcg: mcast group object with an embedded red-black tree node
  *
- * Context: caller must hold a reference to mcg and rxe->mcg_lock
+ * Context: caller must hold a reference to mcg and rxe->mcg_mutex
  */
 static void __rxe_remove_mcg(struct rxe_mcg *mcg)
 {
@@ -210,34 +210,6 @@ struct rxe_mcg *rxe_lookup_mcg(struct rxe_dev *rxe,
     return mcg;
 }

-/**
- * __rxe_init_mcg - initialize a new mcg
- * @rxe: rxe device
- * @mgid: multicast address as a gid
- * @mcg: new mcg object
- *
- * Context: caller should hold rxe->mcg lock
- */
-static void __rxe_init_mcg(struct rxe_dev *rxe, union ib_gid *mgid,
-               struct rxe_mcg *mcg)
-{
-    kref_init(&mcg->ref_cnt);
-    memcpy(&mcg->mgid, mgid, sizeof(mcg->mgid));
-    mcg->is_ipv6 = !ipv6_addr_v4mapped((struct in6_addr *)mgid);
-    INIT_LIST_HEAD(&mcg->qp_list);
-    mcg->rxe = rxe;
-
-    /* caller holds a ref on mcg but that will be
-     * dropped when mcg goes out of scope. We need to take a ref
-     * on the pointer that will be saved in the red-black tree
-     * by __rxe_insert_mcg and used to lookup mcg from mgid later.
-     * Inserting mcg makes it visible to outside so this should
-     * be done last after the object is ready.
-     */
-    kref_get(&mcg->ref_cnt);
-    __rxe_insert_mcg(mcg);
-}
-
 /**
  * rxe_get_mcg - lookup or allocate a mcg
  * @rxe: rxe device object
@@ -247,51 +219,48 @@ static void __rxe_init_mcg(struct rxe_dev *rxe, union ib_gid *mgid,
  */
 static struct rxe_mcg *rxe_get_mcg(struct rxe_dev *rxe, union ib_gid *mgid)
 {
-    struct rxe_mcg *mcg, *tmp;
+    struct rxe_mcg *mcg;
     int err;

-    if (rxe->attr.max_mcast_grp == 0)
-        return ERR_PTR(-EINVAL);
-
-    /* check to see if mcg already exists */
+    mutex_lock(&rxe->mcg_mutex);
     mcg = rxe_lookup_mcg(rxe, mgid);
     if (mcg)
-        return mcg;
+        goto out;   /* nothing to do */

-    /* check to see if we have reached limit */
     if (atomic_inc_return(&rxe->mcg_num) > rxe->attr.max_mcast_grp) {
         err = -ENOMEM;
         goto err_dec;
     }

-    /* speculative alloc of new mcg */
     mcg = kzalloc(sizeof(*mcg), GFP_KERNEL);
     if (!mcg) {
         err = -ENOMEM;
         goto err_dec;
     }

-    spin_lock_bh(&rxe->mcg_lock);
-    /* re-check to see if someone else just added it */
-    tmp = rxe_lookup_mcg(rxe, mgid);
-    if (tmp) {
-        spin_unlock_bh(&rxe->mcg_lock);
-        atomic_dec(&rxe->mcg_num);
-        kfree(mcg);
-        return tmp;
-    }
-
-    __rxe_init_mcg(rxe, mgid, mcg);
-    spin_unlock_bh(&rxe->mcg_lock);
+    memcpy(&mcg->mgid, mgid, sizeof(mcg->mgid));
+    mcg->is_ipv6 = !ipv6_addr_v4mapped((struct in6_addr *)mgid);
+    mcg->rxe = rxe;
+    kref_init(&mcg->ref_cnt);
+    INIT_LIST_HEAD(&mcg->qp_list);
+    spin_lock_init(&mcg->lock);
+    kref_get(&mcg->ref_cnt);
+    __rxe_insert_mcg(mcg);

-    /* add mcast address outside of lock */
     err = rxe_mcast_add(mcg);
-    if (!err)
-        return mcg;
+    if (err)
+        goto err_free;
+
+out:
+    mutex_unlock(&rxe->mcg_mutex);
+    return mcg;

+err_free:
+    __rxe_remove_mcg(mcg);
     kfree(mcg);
 err_dec:
     atomic_dec(&rxe->mcg_num);
+    mutex_unlock(&rxe->mcg_mutex);
     return ERR_PTR(err);
 }

@@ -307,10 +276,10 @@ void rxe_cleanup_mcg(struct kref *kref)
 }

 /**
- * __rxe_destroy_mcg - destroy mcg object holding rxe->mcg_lock
+ * __rxe_destroy_mcg - destroy mcg object holding rxe->mcg_mutex
  * @mcg: the mcg object
  *
- * Context: caller is holding rxe->mcg_lock
+ * Context: caller is holding rxe->mcg_mutex
  * no qp's are attached to mcg
  */
 static void __rxe_destroy_mcg(struct rxe_mcg *mcg)
@@ -335,151 +304,123 @@ static void rxe_destroy_mcg(struct rxe_mcg *mcg)
     /* delete mcast address outside of lock */
     rxe_mcast_del(mcg);

-    spin_lock_bh(&mcg->rxe->mcg_lock);
+    mutex_lock(&mcg->rxe->mcg_mutex);
     __rxe_destroy_mcg(mcg);
-    spin_unlock_bh(&mcg->rxe->mcg_lock);
+    mutex_unlock(&mcg->rxe->mcg_mutex);
 }

 /**
- * __rxe_init_mca - initialize a new mca holding lock
+ * rxe_attach_mcg - attach qp to mcg if not already attached
  * @qp: qp object
  * @mcg: mcg object
- * @mca: empty space for new mca
- *
- * Context: caller must hold references on qp and mcg, rxe->mcg_lock
- * and pass memory for new mca
  *
  * Returns: 0 on success else an error
  */
-static int __rxe_init_mca(struct rxe_qp *qp, struct rxe_mcg *mcg,
-              struct rxe_mca *mca)
+static int rxe_attach_mcg(struct rxe_mcg *mcg, struct rxe_qp *qp)
 {
-    struct rxe_dev *rxe = to_rdev(qp->ibqp.device);
-    int n;
+    struct rxe_dev *rxe = mcg->rxe;
+    struct rxe_mca *mca;
+    unsigned long flags;
+    int err;

-    n = atomic_inc_return(&rxe->mcg_attach);
-    if (n > rxe->attr.max_total_mcast_qp_attach) {
-        atomic_dec(&rxe->mcg_attach);
-        return -ENOMEM;
+    mutex_lock(&rxe->mcg_mutex);
+    spin_lock_irqsave(&mcg->lock, flags);
+    list_for_each_entry(mca, &mcg->qp_list, qp_list) {
+        if (mca->qp == qp) {
+            spin_unlock_irqrestore(&mcg->lock, flags);
+            goto out;   /* nothing to do */
+        }
     }
+    spin_unlock_irqrestore(&mcg->lock, flags);
-    n = atomic_inc_return(&mcg->qp_num);
-    if (n > rxe->attr.max_mcast_qp_attach) {
-        atomic_dec(&mcg->qp_num);
-        atomic_dec(&rxe->mcg_attach);
-        return -ENOMEM;
+    if (atomic_inc_return(&rxe->mcg_attach) >
+        rxe->attr.max_total_mcast_qp_attach) {
+        err = -EINVAL;
+        goto err_dec_attach;
     }

-    atomic_inc(&qp->mcg_num);
+    if (atomic_inc_return(&mcg->qp_num) >
+        rxe->attr.max_mcast_qp_attach) {
+        err = -EINVAL;
+        goto err_dec_qp_num;
+    }
+
+    mca = kzalloc(sizeof(*mca), GFP_KERNEL);
+    if (!mca) {
+        err = -ENOMEM;
+        goto err_dec_qp_num;
+    }

+    atomic_inc(&qp->mcg_num);
     rxe_get(qp);
     mca->qp = qp;

+    spin_lock_irqsave(&mcg->lock, flags);
     list_add_tail(&mca->qp_list, &mcg->qp_list);
-
+    spin_unlock_irqrestore(&mcg->lock, flags);
+out:
+    mutex_unlock(&rxe->mcg_mutex);
     return 0;
+
+err_dec_qp_num:
+    atomic_dec(&mcg->qp_num);
+err_dec_attach:
+    atomic_dec(&rxe->mcg_attach);
+    mutex_unlock(&rxe->mcg_mutex);
+    return err;
 }

 /**
- * rxe_attach_mcg - attach qp to mcg if not already attached
- * @qp: qp object
+ * rxe_detach_mcg - detach qp from mcg
  * @mcg: mcg object
+ * @qp: qp object
  *
- * Context: caller must hold reference on qp and mcg.
- * Returns: 0 on success else an error
+ * Returns: 0 on success else an error if qp is not attached.
  */
-static int rxe_attach_mcg(struct rxe_mcg *mcg, struct rxe_qp *qp)
+static int rxe_detach_mcg(struct rxe_mcg *mcg, struct rxe_qp *qp)
 {
     struct rxe_dev *rxe = mcg->rxe;
-    struct rxe_mca *mca, *tmp;
-    int err;
+    struct rxe_mca *mca;
+    unsigned long flags;
+    int err = 0;

-    /* check to see if the qp is already a member of the group */
-    spin_lock_bh(&rxe->mcg_lock);
+    mutex_lock(&rxe->mcg_mutex);
+    spin_lock_irqsave(&mcg->lock, flags);
     list_for_each_entry(mca, &mcg->qp_list, qp_list) {
         if (mca->qp == qp) {
-            spin_unlock_bh(&rxe->mcg_lock);
-            return 0;
+            spin_unlock_irqrestore(&mcg->lock, flags);
+            goto found;
         }
     }
-    spin_unlock_bh(&rxe->mcg_lock);
+    spin_unlock_irqrestore(&mcg->lock, flags);

-    /* speculative alloc new mca without using GFP_ATOMIC */
-    mca = kzalloc(sizeof(*mca), GFP_KERNEL);
-    if (!mca)
-        return -ENOMEM;
-
-    spin_lock_bh(&rxe->mcg_lock);
-    /* re-check to see if someone else just attached qp */
-    list_for_each_entry(tmp, &mcg->qp_list, qp_list) {
-        if (tmp->qp == qp) {
-            kfree(mca);
-            err = 0;
-            goto out;
-        }
-    }
-
-    err = __rxe_init_mca(qp, mcg, mca);
-    if (err)
-        kfree(mca);
-out:
-    spin_unlock_bh(&rxe->mcg_lock);
-    return err;
-}
+    /* we didn't find the qp on the list */
+    err = -EINVAL;
+    goto err_out;

-/**
- * __rxe_cleanup_mca - cleanup mca object holding lock
- * @mca: mca object
- * @mcg: mcg object
- *
- * Context: caller must hold a reference to mcg and rxe->mcg_lock
- */
-static void __rxe_cleanup_mca(struct rxe_mca *mca, struct rxe_mcg *mcg)
-{
+found:
+    spin_lock_irqsave(&mcg->lock, flags);
     list_del(&mca->qp_list);
+    spin_unlock_irqrestore(&mcg->lock, flags);

     atomic_dec(&mcg->qp_num);
     atomic_dec(&mcg->rxe->mcg_attach);
     atomic_dec(&mca->qp->mcg_num);
     rxe_put(mca->qp);
-
     kfree(mca);
-}
-
-/**
- * rxe_detach_mcg - detach qp from mcg
- * @mcg: mcg object
- * @qp: qp object
- *
- * Returns: 0 on success else an error if qp is not attached.
- */
-static int rxe_detach_mcg(struct rxe_mcg *mcg, struct rxe_qp *qp)
-{
-    struct rxe_dev *rxe = mcg->rxe;
-    struct rxe_mca *mca, *tmp;

-    spin_lock_bh(&rxe->mcg_lock);
-    list_for_each_entry_safe(mca, tmp, &mcg->qp_list, qp_list) {
-        if (mca->qp == qp) {
-            __rxe_cleanup_mca(mca, mcg);
-
-            /* if the number of qp's attached to the
-             * mcast group falls to zero go ahead and
-             * tear it down. This will not free the
-             * object since we are still holding a ref
-             * from the caller
-             */
-            if (atomic_read(&mcg->qp_num) <= 0)
-                __rxe_destroy_mcg(mcg);
-
-            spin_unlock_bh(&rxe->mcg_lock);
-            return 0;
-        }
-    }
+    /* if the number of qp's attached to the
+     * mcast group falls to zero go ahead and
+     * tear it down. This will not free the
+     * object since we are still holding a ref
+     * from the caller
+     */
+    if (atomic_read(&mcg->qp_num) <= 0)
+        __rxe_destroy_mcg(mcg);

-    /* we didn't find the qp on the list */
-    spin_unlock_bh(&rxe->mcg_lock);
-    return -EINVAL;
+err_out:
+    mutex_unlock(&rxe->mcg_mutex);
+    return err;
 }

 /**
@@ -497,6 +438,9 @@ int rxe_attach_mcast(struct ib_qp *ibqp, union ib_gid *mgid, u16 mlid)
     struct rxe_qp *qp = to_rqp(ibqp);
     struct rxe_mcg *mcg;

+    if (rxe->attr.max_mcast_grp == 0)
+        return -EINVAL;
+
     /* takes a ref on mcg if successful */
     mcg = rxe_get_mcg(rxe, mgid);
     if (IS_ERR(mcg))
diff --git a/drivers/infiniband/sw/rxe/rxe_recv.c b/drivers/infiniband/sw/rxe/rxe_recv.c
index 7153de0799fc..6cf0da958864 100644
--- a/drivers/infiniband/sw/rxe/rxe_recv.c
+++ b/drivers/infiniband/sw/rxe/rxe_recv.c
@@ -194,6 +194,7 @@ static void rxe_rcv_mcast_pkt(struct rxe_dev *rxe, struct sk_buff *skb)
     struct rxe_mca *mca;
     struct rxe_qp *qp;
     union ib_gid dgid;
+    unsigned long flags;
     int err;

     if (skb->protocol == htons(ETH_P_IP))
@@ -207,7 +208,7 @@ static void rxe_rcv_mcast_pkt(struct rxe_dev *rxe, struct sk_buff *skb)
     if (!mcg)
         goto drop;  /* mcast group not registered */

-    spin_lock_bh(&rxe->mcg_lock);
+    spin_lock_irqsave(&mcg->lock, flags);

     /* this is unreliable datagram service so we let
      * failures to deliver a multicast packet to a
@@ -259,7 +260,7 @@ static void rxe_rcv_mcast_pkt(struct rxe_dev *rxe, struct sk_buff *skb)
         }
     }

-    spin_unlock_bh(&rxe->mcg_lock);
+    spin_unlock_irqrestore(&mcg->lock, flags);

     kref_put(&mcg->ref_cnt, rxe_cleanup_mcg);

diff --git a/drivers/infiniband/sw/rxe/rxe_verbs.h b/drivers/infiniband/sw/rxe/rxe_verbs.h
index 8058e5039322..f21963dcb2c8 100644
--- a/drivers/infiniband/sw/rxe/rxe_verbs.h
+++ b/drivers/infiniband/sw/rxe/rxe_verbs.h
@@ -351,6 +351,7 @@ struct rxe_mcg {
     struct list_head qp_list;
     union ib_gid mgid;
     atomic_t qp_num;
+    spinlock_t lock; /* protect qp_list */
     u32 qkey;
     u16 pkey;
     bool is_ipv6;
@@ -390,7 +391,7 @@ struct rxe_dev {
     struct rxe_pool mw_pool;

     /* multicast support */
-    spinlock_t mcg_lock;
+    struct mutex mcg_mutex;
     struct rb_root mcg_tree;
     atomic_t mcg_num;
     atomic_t mcg_attach;
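The resulting division of labor, sketched with a hypothetical deliver()
placeholder (the lock names match the patch):

    /* slow path: attach/detach, process context only */
    mutex_lock(&rxe->mcg_mutex);        /* serializes tree writers */
    ...
    spin_lock_irqsave(&mcg->lock, flags);   /* brief qp_list edit */
    list_add_tail(&mca->qp_list, &mcg->qp_list);
    spin_unlock_irqrestore(&mcg->lock, flags);
    mutex_unlock(&rxe->mcg_mutex);

    /* fast path: packet receive, possibly hard-irq context */
    spin_lock_irqsave(&mcg->lock, flags);   /* only walks qp_list */
    list_for_each_entry(mca, &mcg->qp_list, qp_list)
        deliver(mca->qp);
    spin_unlock_irqrestore(&mcg->lock, flags);

Because the receive path never takes the mutex, a slow attach/detach
may sleep (in kzalloc or the socket join) without stalling packet
delivery on anything longer than the short qp_list critical section.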
From patchwork Thu Dec 7 19:29:07 2023
X-Patchwork-Submitter: Bob Pearson
X-Patchwork-Id: 13483884
From: Bob Pearson
To: jgg@nvidia.com, yanjun.zhu@linux.dev, linux-rdma@vger.kernel.org,
    netdev@vger.kernel.org, dsahern@kernel.org, rain.1986.08.12@gmail.com
Cc: Bob Pearson
Subject: [PATCH for-next v6 6/7] RDMA/rxe: Cleanup mcg lifetime
Date: Thu, 7 Dec 2023 13:29:07 -0600
Message-Id: <20231207192907.10113-7-rpearsonhpe@gmail.com>
In-Reply-To: <20231207192907.10113-1-rpearsonhpe@gmail.com>

Currently the rdma_rxe driver has two different and not really
compatible ways of managing the lifetime of an mcast group: by ref
counting the mcg struct and by counting the number of attached qp's.
They each do part of the job of cleaning up an mcg when the last qp is
detached, and are racy in the process. This patch removes the use of
the qp count. Fix up mcg reference counting so the ref count drops to
zero correctly, and move code from rxe_destroy_mcg to rxe_cleanup_mcg,
since rxe_destroy_mcg is no longer needed. This set of fixes reworks
most of the code in rxe_mcast.c, and as a result a lot of cleanup has
been done as well.
Fixes: 6090a0c4c7c6 ("RDMA/rxe: Cleanup rxe_mcast.c")
Signed-off-by: Bob Pearson
---
 drivers/infiniband/sw/rxe/rxe_loc.h   |   2 +-
 drivers/infiniband/sw/rxe/rxe_mcast.c | 170 +++++++-------------------
 drivers/infiniband/sw/rxe/rxe_recv.c  |   2 +-
 3 files changed, 46 insertions(+), 128 deletions(-)

diff --git a/drivers/infiniband/sw/rxe/rxe_loc.h b/drivers/infiniband/sw/rxe/rxe_loc.h
index 62b2b25903fc..0509ccdaa2f2 100644
--- a/drivers/infiniband/sw/rxe/rxe_loc.h
+++ b/drivers/infiniband/sw/rxe/rxe_loc.h
@@ -37,7 +37,7 @@ void rxe_cq_cleanup(struct rxe_pool_elem *elem);
 struct rxe_mcg *rxe_lookup_mcg(struct rxe_dev *rxe, union ib_gid *mgid);
 int rxe_attach_mcast(struct ib_qp *ibqp, union ib_gid *mgid, u16 mlid);
 int rxe_detach_mcast(struct ib_qp *ibqp, union ib_gid *mgid, u16 mlid);
-void rxe_cleanup_mcg(struct kref *kref);
+int rxe_put_mcg(struct rxe_mcg *mcg);

 /* rxe_mmap.c */
 struct rxe_mmap_info {
diff --git a/drivers/infiniband/sw/rxe/rxe_mcast.c b/drivers/infiniband/sw/rxe/rxe_mcast.c
index 1c85aa03f79b..8dd37375423b 100644
--- a/drivers/infiniband/sw/rxe/rxe_mcast.c
+++ b/drivers/infiniband/sw/rxe/rxe_mcast.c
@@ -131,13 +131,31 @@ static int rxe_mcast_del(struct rxe_mcg *mcg)
     return err ?: err2;
 }

-/**
- * __rxe_insert_mcg - insert an mcg into red-black tree (rxe->mcg_tree)
- * @mcg: mcg object with an embedded red-black tree node
- *
- * Context: caller must hold a reference to mcg and rxe->mcg_mutex and
- * is responsible to avoid adding the same mcg twice to the tree.
- */
+static void __rxe_remove_mcg(struct rxe_mcg *mcg)
+{
+    rb_erase(&mcg->node, &mcg->rxe->mcg_tree);
+}
+
+static void rxe_cleanup_mcg(struct kref *kref)
+{
+    struct rxe_mcg *mcg = container_of(kref, typeof(*mcg), ref_cnt);
+
+    __rxe_remove_mcg(mcg);
+    rxe_mcast_del(mcg);
+    atomic_dec(&mcg->rxe->mcg_num);
+    kfree_rcu(mcg, rcu);
+}
+
+static int rxe_get_mcg(struct rxe_mcg *mcg)
+{
+    return kref_get_unless_zero(&mcg->ref_cnt);
+}
+
+int rxe_put_mcg(struct rxe_mcg *mcg)
+{
+    return kref_put(&mcg->ref_cnt, rxe_cleanup_mcg);
+}
+
 static void __rxe_insert_mcg(struct rxe_mcg *mcg)
 {
     struct rb_root *tree = &mcg->rxe->mcg_tree;
@@ -166,23 +184,11 @@ static void __rxe_insert_mcg(struct rxe_mcg *mcg)
     rb_insert_color(&mcg->node, tree);
 }

-/**
- * __rxe_remove_mcg - remove an mcg from red-black tree holding lock
- * @mcg: mcast group object with an embedded red-black tree node
- *
- * Context: caller must hold a reference to mcg and rxe->mcg_mutex
- */
-static void __rxe_remove_mcg(struct rxe_mcg *mcg)
-{
-    rb_erase(&mcg->node, &mcg->rxe->mcg_tree);
-}
-
 /*
  * Lookup mgid in the multicast group red-black tree and try to
  * get a ref on it. Return mcg on success else NULL.
  */
-struct rxe_mcg *rxe_lookup_mcg(struct rxe_dev *rxe,
-                   union ib_gid *mgid)
+struct rxe_mcg *rxe_lookup_mcg(struct rxe_dev *rxe, union ib_gid *mgid)
 {
     struct rb_root *tree = &rxe->mcg_tree;
     struct rxe_mcg *mcg;
@@ -204,20 +210,14 @@ struct rxe_mcg *rxe_lookup_mcg(struct rxe_dev *rxe,
         else
             break;
     }
-    mcg = (node && kref_get_unless_zero(&mcg->ref_cnt)) ? mcg : NULL;
+    mcg = (node && rxe_get_mcg(mcg)) ? mcg : NULL;
     rcu_read_unlock();

     return mcg;
 }

-/**
- * rxe_get_mcg - lookup or allocate a mcg
- * @rxe: rxe device object
- * @mgid: multicast IP address as a gid
- *
- * Returns: mcg on success else ERR_PTR(error)
- */
-static struct rxe_mcg *rxe_get_mcg(struct rxe_dev *rxe, union ib_gid *mgid)
+/* find an existing mcg or allocate a new one */
+static struct rxe_mcg *rxe_alloc_mcg(struct rxe_dev *rxe, union ib_gid *mgid)
 {
     struct rxe_mcg *mcg;
     int err;
@@ -228,7 +228,7 @@ static struct rxe_mcg *rxe_get_mcg(struct rxe_dev *rxe, union ib_gid *mgid)
         goto out;   /* nothing to do */

     if (atomic_inc_return(&rxe->mcg_num) > rxe->attr.max_mcast_grp) {
-        err = -ENOMEM;
+        err = -EINVAL;
         goto err_dec;
     }

@@ -244,19 +244,17 @@ static struct rxe_mcg *rxe_get_mcg(struct rxe_dev *rxe, union ib_gid *mgid)
     kref_init(&mcg->ref_cnt);
     INIT_LIST_HEAD(&mcg->qp_list);
     spin_lock_init(&mcg->lock);
-    kref_get(&mcg->ref_cnt);
-    __rxe_insert_mcg(mcg);

     err = rxe_mcast_add(mcg);
     if (err)
         goto err_free;

+    __rxe_insert_mcg(mcg);
 out:
     mutex_unlock(&rxe->mcg_mutex);
     return mcg;

 err_free:
-    __rxe_remove_mcg(mcg);
     kfree(mcg);
 err_dec:
     atomic_dec(&rxe->mcg_num);
@@ -264,64 +262,12 @@ static struct rxe_mcg *rxe_get_mcg(struct rxe_dev *rxe, union ib_gid *mgid)
     return ERR_PTR(err);
 }

-/**
- * rxe_cleanup_mcg - cleanup mcg for kref_put
- * @kref: struct kref embnedded in mcg
- */
-void rxe_cleanup_mcg(struct kref *kref)
-{
-    struct rxe_mcg *mcg = container_of(kref, typeof(*mcg), ref_cnt);
-
-    kfree_rcu(mcg, rcu);
-}
-
-/**
- * __rxe_destroy_mcg - destroy mcg object holding rxe->mcg_mutex
- * @mcg: the mcg object
- *
- * Context: caller is holding rxe->mcg_mutex
- * no qp's are attached to mcg
- */
-static void __rxe_destroy_mcg(struct rxe_mcg *mcg)
-{
-    struct rxe_dev *rxe = mcg->rxe;
-
-    /* remove mcg from red-black tree then drop ref */
-    __rxe_remove_mcg(mcg);
-    kref_put(&mcg->ref_cnt, rxe_cleanup_mcg);
-
-    atomic_dec(&rxe->mcg_num);
-}
-
-/**
- * rxe_destroy_mcg - destroy mcg object
- * @mcg: the mcg object
- *
- * Context: no qp's are attached to mcg
- */
-static void rxe_destroy_mcg(struct rxe_mcg *mcg)
-{
-    /* delete mcast address outside of lock */
-    rxe_mcast_del(mcg);
-
-    mutex_lock(&mcg->rxe->mcg_mutex);
-    __rxe_destroy_mcg(mcg);
-    mutex_unlock(&mcg->rxe->mcg_mutex);
-}
-
-/**
- * rxe_attach_mcg - attach qp to mcg if not already attached
- * @qp: qp object
- * @mcg: mcg object
- *
- * Returns: 0 on success else an error
- */
-static int rxe_attach_mcg(struct rxe_mcg *mcg, struct rxe_qp *qp)
+static int rxe_attach_mcg(struct rxe_qp *qp, struct rxe_mcg *mcg)
 {
     struct rxe_dev *rxe = mcg->rxe;
     struct rxe_mca *mca;
     unsigned long flags;
-    int err;
+    int err = 0;

     mutex_lock(&rxe->mcg_mutex);
     spin_lock_irqsave(&mcg->lock, flags);
@@ -355,29 +301,24 @@ static int rxe_attach_mcg(struct rxe_mcg *mcg, struct rxe_qp *qp)

     rxe_get(qp);
     mca->qp = qp;
+    rxe_get_mcg(mcg);
+
     spin_lock_irqsave(&mcg->lock, flags);
     list_add_tail(&mca->qp_list, &mcg->qp_list);
     spin_unlock_irqrestore(&mcg->lock, flags);
-out:
-    mutex_unlock(&rxe->mcg_mutex);
-    return 0;
+    goto out;

 err_dec_qp_num:
     atomic_dec(&mcg->qp_num);
 err_dec_attach:
     atomic_dec(&rxe->mcg_attach);
+out:
+    rxe_put_mcg(mcg);
     mutex_unlock(&rxe->mcg_mutex);
     return err;
 }

-/**
- * rxe_detach_mcg - detach qp from mcg
- * @mcg: mcg object
- * @qp: qp object
- *
- * Returns: 0 on success else an error if qp is not attached.
- */ -static int rxe_detach_mcg(struct rxe_mcg *mcg, struct rxe_qp *qp) +static int rxe_detach_mcg(struct rxe_qp *qp, struct rxe_mcg *mcg) { struct rxe_dev *rxe = mcg->rxe; struct rxe_mca *mca; @@ -394,7 +335,6 @@ static int rxe_detach_mcg(struct rxe_mcg *mcg, struct rxe_qp *qp) } spin_unlock_irqrestore(&mcg->lock, flags); - /* we didn't find the qp on the list */ err = -EINVAL; goto err_out; @@ -402,23 +342,15 @@ static int rxe_detach_mcg(struct rxe_mcg *mcg, struct rxe_qp *qp) spin_lock_irqsave(&mcg->lock, flags); list_del(&mca->qp_list); spin_unlock_irqrestore(&mcg->lock, flags); + rxe_put_mcg(mcg); atomic_dec(&mcg->qp_num); atomic_dec(&mcg->rxe->mcg_attach); atomic_dec(&mca->qp->mcg_num); rxe_put(mca->qp); kfree(mca); - - /* if the number of qp's attached to the - * mcast group falls to zero go ahead and - * tear it down. This will not free the - * object since we are still holding a ref - * from the caller - */ - if (atomic_read(&mcg->qp_num) <= 0) - __rxe_destroy_mcg(mcg); - err_out: + rxe_put_mcg(mcg); mutex_unlock(&rxe->mcg_mutex); return err; } @@ -433,7 +365,6 @@ static int rxe_detach_mcg(struct rxe_mcg *mcg, struct rxe_qp *qp) */ int rxe_attach_mcast(struct ib_qp *ibqp, union ib_gid *mgid, u16 mlid) { - int err; struct rxe_dev *rxe = to_rdev(ibqp->device); struct rxe_qp *qp = to_rqp(ibqp); struct rxe_mcg *mcg; @@ -441,20 +372,11 @@ int rxe_attach_mcast(struct ib_qp *ibqp, union ib_gid *mgid, u16 mlid) if (rxe->attr.max_mcast_grp == 0) return -EINVAL; - /* takes a ref on mcg if successful */ - mcg = rxe_get_mcg(rxe, mgid); + mcg = rxe_alloc_mcg(rxe, mgid); if (IS_ERR(mcg)) return PTR_ERR(mcg); - err = rxe_attach_mcg(mcg, qp); - - /* if we failed to attach the first qp to mcg tear it down */ - if (atomic_read(&mcg->qp_num) == 0) - rxe_destroy_mcg(mcg); - - kref_put(&mcg->ref_cnt, rxe_cleanup_mcg); - - return err; + return rxe_attach_mcg(qp, mcg); } /** @@ -470,14 +392,10 @@ int rxe_detach_mcast(struct ib_qp *ibqp, union ib_gid *mgid, u16 mlid) struct rxe_dev *rxe = to_rdev(ibqp->device); struct rxe_qp *qp = to_rqp(ibqp); struct rxe_mcg *mcg; - int err; mcg = rxe_lookup_mcg(rxe, mgid); if (!mcg) return -EINVAL; - err = rxe_detach_mcg(mcg, qp); - kref_put(&mcg->ref_cnt, rxe_cleanup_mcg); - - return err; + return rxe_detach_mcg(qp, mcg); } diff --git a/drivers/infiniband/sw/rxe/rxe_recv.c b/drivers/infiniband/sw/rxe/rxe_recv.c index 6cf0da958864..e3ec3dfc57f4 100644 --- a/drivers/infiniband/sw/rxe/rxe_recv.c +++ b/drivers/infiniband/sw/rxe/rxe_recv.c @@ -262,7 +262,7 @@ static void rxe_rcv_mcast_pkt(struct rxe_dev *rxe, struct sk_buff *skb) spin_unlock_irqrestore(&mcg->lock, flags); - kref_put(&mcg->ref_cnt, rxe_cleanup_mcg); + rxe_put_mcg(mcg); if (likely(!skb)) return;
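The net effect of the rewrite above is the standard kref-plus-RCU object lifetime idiom: a lookup succeeds only if kref_get_unless_zero() wins under rcu_read_lock(), and the final rxe_put_mcg() both unlinks the object and frees it after a grace period. A minimal stand-alone sketch of that idiom, with hypothetical names (obj, the_obj, obj_lookup) rather than the driver's own, and with a single published pointer standing in for the red-black tree:

    #include <linux/err.h>
    #include <linux/kref.h>
    #include <linux/rcupdate.h>
    #include <linux/slab.h>

    struct obj {
            struct kref ref_cnt;
            struct rcu_head rcu;
    };

    static struct obj __rcu *the_obj;       /* stand-in for rxe->mcg_tree */

    static struct obj *obj_create(void)     /* caller holds the update-side lock */
    {
            struct obj *obj = kzalloc(sizeof(*obj), GFP_KERNEL);

            if (!obj)
                    return ERR_PTR(-ENOMEM);
            kref_init(&obj->ref_cnt);         /* initial ref owned by the "tree" */
            rcu_assign_pointer(the_obj, obj); /* publish to RCU readers */
            return obj;
    }

    static void obj_release(struct kref *kref)
    {
            struct obj *obj = container_of(kref, struct obj, ref_cnt);

            /* unlink first, then free only after a grace period so a
             * reader still traversing the structure never touches
             * freed memory
             */
            RCU_INIT_POINTER(the_obj, NULL);
            kfree_rcu(obj, rcu);
    }

    static struct obj *obj_lookup(void)
    {
            struct obj *obj;

            rcu_read_lock();
            obj = rcu_dereference(the_obj);
            /* refuse an object whose last reference is already gone */
            if (obj && !kref_get_unless_zero(&obj->ref_cnt))
                    obj = NULL;
            rcu_read_unlock();
            return obj;
    }

    static int obj_put(struct obj *obj)     /* returns 1 if this put freed it */
    {
            return kref_put(&obj->ref_cnt, obj_release);
    }

kref_put() returns nonzero exactly when its caller's put was the final one, which is what lets rxe_put_mcg() report to callers whether the group was torn down.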
From patchwork Thu Dec 7 19:29:08 2023 X-Patchwork-Submitter: Bob Pearson X-Patchwork-Id: 13483883 From: Bob Pearson To: jgg@nvidia.com, yanjun.zhu@linux.dev, linux-rdma@vger.kernel.org, netdev@vger.kernel.org, dsahern@kernel.org, rain.1986.08.12@gmail.com Cc: Bob Pearson Subject: [PATCH for-next v6 7/7] RDMA/rxe: Add module parameters for mcast limits Date: Thu, 7 Dec 2023 13:29:08 -0600 Message-Id: <20231207192907.10113-8-rpearsonhpe@gmail.com> In-Reply-To: <20231207192907.10113-1-rpearsonhpe@gmail.com> References: <20231207192907.10113-1-rpearsonhpe@gmail.com> X-Mailing-List: linux-rdma@vger.kernel.org Add module parameters for max_mcast_grp, max_mcast_qp_attach, and max_tot_mcast_qp_attach to allow setting these limits to small values when the driver is loaded, so that the limit-checking paths can be exercised in testing.
Signed-off-by: Bob Pearson --- drivers/infiniband/sw/rxe/Makefile | 3 ++- drivers/infiniband/sw/rxe/rxe.c | 6 +++--- drivers/infiniband/sw/rxe/rxe_param.c | 23 +++++++++++++++++++++++ drivers/infiniband/sw/rxe/rxe_param.h | 4 ++++ 4 files changed, 32 insertions(+), 4 deletions(-) create mode 100644 drivers/infiniband/sw/rxe/rxe_param.c diff --git a/drivers/infiniband/sw/rxe/Makefile b/drivers/infiniband/sw/rxe/Makefile index 5395a581f4bb..b183924ea01d 100644 --- a/drivers/infiniband/sw/rxe/Makefile +++ b/drivers/infiniband/sw/rxe/Makefile @@ -22,4 +22,5 @@ rdma_rxe-y := \ rxe_mcast.o \ rxe_task.o \ rxe_net.o \ - rxe_hw_counters.o + rxe_hw_counters.o \ + rxe_param.o diff --git a/drivers/infiniband/sw/rxe/rxe.c b/drivers/infiniband/sw/rxe/rxe.c index 147cb16e937d..599fbfdeb426 100644 --- a/drivers/infiniband/sw/rxe/rxe.c +++ b/drivers/infiniband/sw/rxe/rxe.c @@ -59,9 +59,9 @@ static void rxe_init_device_param(struct rxe_dev *rxe) rxe->attr.max_res_rd_atom = RXE_MAX_RES_RD_ATOM; rxe->attr.max_qp_init_rd_atom = RXE_MAX_QP_INIT_RD_ATOM; rxe->attr.atomic_cap = IB_ATOMIC_HCA; - rxe->attr.max_mcast_grp = RXE_MAX_MCAST_GRP; - rxe->attr.max_mcast_qp_attach = RXE_MAX_MCAST_QP_ATTACH; - rxe->attr.max_total_mcast_qp_attach = RXE_MAX_TOT_MCAST_QP_ATTACH; + rxe->attr.max_mcast_grp = rxe_max_mcast_grp; + rxe->attr.max_mcast_qp_attach = rxe_max_mcast_qp_attach; + rxe->attr.max_total_mcast_qp_attach = rxe_max_tot_mcast_qp_attach; rxe->attr.max_ah = RXE_MAX_AH; rxe->attr.max_srq = RXE_MAX_SRQ; rxe->attr.max_srq_wr = RXE_MAX_SRQ_WR; diff --git a/drivers/infiniband/sw/rxe/rxe_param.c b/drivers/infiniband/sw/rxe/rxe_param.c new file mode 100644 index 000000000000..27873e7de753 --- /dev/null +++ b/drivers/infiniband/sw/rxe/rxe_param.c @@ -0,0 +1,23 @@ +// SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB +/* + * Copyright (c) 2023 Hewlett Packard Enterprise, Inc. All rights reserved. + */ + +#include "rxe.h" + +int rxe_max_mcast_grp = RXE_MAX_MCAST_GRP; +module_param_named(max_mcast_grp, rxe_max_mcast_grp, int, 0444); +MODULE_PARM_DESC(max_mcast_grp, + "Maximum number of multicast groups per device"); + +int rxe_max_mcast_qp_attach = RXE_MAX_MCAST_QP_ATTACH; +module_param_named(max_mcast_qp_attach, rxe_max_mcast_qp_attach, + int, 0444); +MODULE_PARM_DESC(max_mcast_qp_attach, + "Maximum number of QPs attached to a multicast group"); + +int rxe_max_tot_mcast_qp_attach = RXE_MAX_TOT_MCAST_QP_ATTACH; +module_param_named(max_tot_mcast_qp_attach, rxe_max_tot_mcast_qp_attach, + int, 0444); +MODULE_PARM_DESC(max_tot_mcast_qp_attach, + "Maximum total number of QPs attached to multicast groups per device"); diff --git a/drivers/infiniband/sw/rxe/rxe_param.h b/drivers/infiniband/sw/rxe/rxe_param.h index d2f57ead78ad..d6fe50f5f483 100644 --- a/drivers/infiniband/sw/rxe/rxe_param.h +++ b/drivers/infiniband/sw/rxe/rxe_param.h @@ -125,6 +125,10 @@ enum rxe_device_param { RXE_VENDOR_ID = 0XFFFFFF, }; +extern int rxe_max_mcast_grp; +extern int rxe_max_mcast_qp_attach; +extern int rxe_max_tot_mcast_qp_attach; + /* default/initial rxe port parameters */ enum rxe_port_param { RXE_PORT_GID_TBL_LEN = 1024,
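Because the three parameters are registered with permission 0444, they are read-only at runtime under /sys/module/rdma_rxe/parameters/ and take effect at load time, e.g. modprobe rdma_rxe max_mcast_grp=4 max_mcast_qp_attach=1 max_tot_mcast_qp_attach=8. A sketch of how a lowered limit could then be exercised from userspace through libibverbs, whose ibv_attach_mcast()/ibv_detach_mcast() reach the rxe_attach_mcast()/rxe_detach_mcast() verbs reworked earlier in the series (the helper name is hypothetical, QP setup and error handling are trimmed, and the exact errno returned at the limit is driver-defined):

    #include <infiniband/verbs.h>
    #include <stdio.h>
    #include <string.h>

    /* hypothetical test helper: attach two UD QPs to the same multicast
     * GID; with the module loaded as max_mcast_qp_attach=1 the second
     * attach is expected to be rejected by the driver limit check
     */
    static int try_double_attach(struct ibv_qp *qp1, struct ibv_qp *qp2,
                                 const union ibv_gid *mgid)
    {
            int ret = ibv_attach_mcast(qp1, mgid, 0);

            if (ret)
                    return ret;             /* first attach should succeed */

            ret = ibv_attach_mcast(qp2, mgid, 0);
            fprintf(stderr, "second attach: %s\n",
                    ret ? strerror(ret) : "unexpectedly succeeded");

            if (!ret)
                    ibv_detach_mcast(qp2, mgid, 0);
            ibv_detach_mcast(qp1, mgid, 0);
            return 0;
    }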