diff mbox series

[rdma-next] RDMA/core: Fix resolve_prepare_src error cleanup

Message ID ec81a9d50462d9b9303966176b17b85f7dfbb96a.1670749660.git.leonro@nvidia.com (mailing list archive)
State Changes Requested
Commit 4d90832aabbe22a9627ba3083e56aabe9f908fb0
Headers show
Series [rdma-next] RDMA/core: Fix resolve_prepare_src error cleanup | expand

Commit Message

Leon Romanovsky Dec. 11, 2022, 9:08 a.m. UTC
From: Patrisious Haddad <phaddad@nvidia.com>

resolve_prepare_src() changes the destination address of the id,
regardless of success, and on failure zeroes it out.

Instead on function failure keep the original destination address
of the id.

Since the id could have been already added to the cm id tree and
zeroing its destination address, could result in a key mismatch or
multiple ids having the same key(zero) in the tree which could lead to:

BUG: KASAN: slab-out-of-bounds in compare_netdev_and_ip.isra.0+0x25/0x120 drivers/infiniband/core/cma.c:473
Read of size 4 at addr ffff88800920d9e4 by task syz-executor.4/14865

CPU: 2 PID: 14865 Comm: syz-executor.4 Not tainted 5.19.0-rc2 #21
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
Call Trace:
 <TASK>
 __dump_stack lib/dump_stack.c:88 [inline]
 dump_stack_lvl+0x6e/0x91 lib/dump_stack.c:106
 print_address_description mm/kasan/report.c:313 [inline]
 print_report.cold+0xff/0x68e mm/kasan/report.c:429
 kasan_report+0xa8/0x130 mm/kasan/report.c:491
 compare_netdev_and_ip.isra.0+0x25/0x120 drivers/infiniband/core/cma.c:473
 cma_add_id_to_tree drivers/infiniband/core/cma.c:506 [inline]
 rdma_resolve_route+0xbc4/0x1560 drivers/infiniband/core/cma.c:3303
 ucma_resolve_route+0xe4/0x150 drivers/infiniband/core/ucma.c:746
 ucma_write+0x19f/0x280 drivers/infiniband/core/ucma.c:1744
 vfs_write fs/read_write.c:589 [inline]
 vfs_write+0x181/0x580 fs/read_write.c:571
 ksys_write+0x1a1/0x1e0 fs/read_write.c:644
 do_syscall_x64 arch/x86/entry/common.c:50 [inline]
 do_syscall_64+0x3d/0x90 arch/x86/entry/common.c:80
 entry_SYSCALL_64_after_hwframe+0x46/0xb0
RIP: 0033:0x7f2afbe8c89d
Code: 02 b8 ff ff ff ff c3 66 0f 1f 44 00 00 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 bc ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007f2afcf1ec28 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
RAX: ffffffffffffffda RBX: 00007f2afbfabf60 RCX: 00007f2afbe8c89d
RDX: 0000000000000010 RSI: 0000000020000100 RDI: 0000000000000003
RBP: 00007f2afbef910d R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007ffcdc77c66f R14: 00007f2afbfabf60 R15: 00007f2afcf1edc0
 </TASK>

Fixes: 19b752a19dce ("IB/cma: Allow port reuse for rdma_id")
Signed-off-by: Patrisious Haddad <phaddad@nvidia.com>
Reviewed-by: Maor Gottlieb <maorg@nvidia.com>
Reported-by: Wei Chen <harperchen1110@gmail.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
---
 drivers/infiniband/core/cma.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

Comments

Jason Gunthorpe Dec. 12, 2022, 1:27 p.m. UTC | #1
On Sun, Dec 11, 2022 at 11:08:30AM +0200, Leon Romanovsky wrote:
> From: Patrisious Haddad <phaddad@nvidia.com>
> 
> resolve_prepare_src() changes the destination address of the id,
> regardless of success, and on failure zeroes it out.
> 
> Instead on function failure keep the original destination address
> of the id.
> 
> Since the id could have been already added to the cm id tree and
> zeroing its destination address, could result in a key mismatch or
> multiple ids having the same key(zero) in the tree which could lead to:

Oh, this can't be right

The destination address is variable and it is changed by resolve even
in good cases.

So this part of the rb search is nonsense:

		result = compare_netdev_and_ip(
			node_id_priv->id.route.addr.dev_addr.bound_dev_if,
			cma_dst_addr(node_id_priv), this);

The only way to fix it is to freeze the dst_addr before inserting
things into the rb tree.

ie completely block resolve_prepare_src()

Most probably this suggests that the id is being inserted into the
rbtree at the wrong time, before the dst_add becomes unchangable.

Jason
Patrisious Haddad Dec. 12, 2022, 1:42 p.m. UTC | #2
I think we have the same realization but different understanding of the 
code, please correct what I'm missing, rest inline:

On 12/12/2022 15:27, Jason Gunthorpe wrote:
> On Sun, Dec 11, 2022 at 11:08:30AM +0200, Leon Romanovsky wrote:
>> From: Patrisious Haddad <phaddad@nvidia.com>
>>
>> resolve_prepare_src() changes the destination address of the id,
>> regardless of success, and on failure zeroes it out.
>>
>> Instead on function failure keep the original destination address
>> of the id.
>>
>> Since the id could have been already added to the cm id tree and
>> zeroing its destination address, could result in a key mismatch or
>> multiple ids having the same key(zero) in the tree which could lead to:
> 
> Oh, this can't be right
> 
> The destination address is variable and it is changed by resolve even
> in good cases.
This is what I don't think can happen, since one address is 
resolved(bound), it can't be bound again so each an other try of resolve 
would fail and enter the error flow which I just fixed.
> 
> So this part of the rb search is nonsense:
> 
> 		result = compare_netdev_and_ip(
> 			node_id_priv->id.route.addr.dev_addr.bound_dev_if,
> 			cma_dst_addr(node_id_priv), this);
> 
> The only way to fix it is to freeze the dst_addr before inserting
> things into the rb tree.
I completely agree, and this was my assumption that after resolve 
address, and resolve route(where I add to the tree), the dst_addr is 
frozen, the only scenario where it isn't was the resolve_prepare_src 
failure which some why nullified the value instead of keeping the original.

and what I'm trying to say, is that once the CM is added the tree(aka 
passed resolve addr once + resolve route) , there can't be a 
good(success) case for the resolve_prepare_src again, since it is 
already bound so every consecutive call should fail, meaning the 
cma_dst_addr is technically frozen.
> 
> ie completely block resolve_prepare_src()
> 
> Most probably this suggests that the id is being inserted into the
> rbtree at the wrong time, before the dst_add becomes unchangable.
> 
> Jason
Jason Gunthorpe Dec. 12, 2022, 1:43 p.m. UTC | #3
On Mon, Dec 12, 2022 at 03:42:07PM +0200, Patrisious Haddad wrote:
> I think we have the same realization but different understanding of the
> code, please correct what I'm missing, rest inline:
> 
> On 12/12/2022 15:27, Jason Gunthorpe wrote:
> > On Sun, Dec 11, 2022 at 11:08:30AM +0200, Leon Romanovsky wrote:
> > > From: Patrisious Haddad <phaddad@nvidia.com>
> > > 
> > > resolve_prepare_src() changes the destination address of the id,
> > > regardless of success, and on failure zeroes it out.
> > > 
> > > Instead on function failure keep the original destination address
> > > of the id.
> > > 
> > > Since the id could have been already added to the cm id tree and
> > > zeroing its destination address, could result in a key mismatch or
> > > multiple ids having the same key(zero) in the tree which could lead to:
> > 
> > Oh, this can't be right
> > 
> > The destination address is variable and it is changed by resolve even
> > in good cases.
> This is what I don't think can happen, since one address is resolved(bound),
> it can't be bound again so each an other try of resolve would fail and enter
> the error flow which I just fixed.
> > 
> > So this part of the rb search is nonsense:
> > 
> > 		result = compare_netdev_and_ip(
> > 			node_id_priv->id.route.addr.dev_addr.bound_dev_if,
> > 			cma_dst_addr(node_id_priv), this);
> > 
> > The only way to fix it is to freeze the dst_addr before inserting
> > things into the rb tree.
> I completely agree, and this was my assumption that after resolve address,
> and resolve route(where I add to the tree), the dst_addr is frozen, the only
> scenario where it isn't was the resolve_prepare_src failure which some why
> nullified the value instead of keeping the original.

Then fix the control flow so it doesn't do the nullification if it
didn't change the value

You can't just change it while it is in the rb tree, that is racy

Jason
Patrisious Haddad Dec. 12, 2022, 1:55 p.m. UTC | #4
Agree, but changing the control flow of this function is really 
problematic , it was even tried before if you remember commit 
"e4103312d7b7a" , it got something to do with port allocation, I'll take 
another look over the code to see what other options we have though.

Since in short, you are right it is racy now.

On 12/12/2022 15:43, Jason Gunthorpe wrote:
> On Mon, Dec 12, 2022 at 03:42:07PM +0200, Patrisious Haddad wrote:
>> I think we have the same realization but different understanding of the
>> code, please correct what I'm missing, rest inline:
>>
>> On 12/12/2022 15:27, Jason Gunthorpe wrote:
>>> On Sun, Dec 11, 2022 at 11:08:30AM +0200, Leon Romanovsky wrote:
>>>> From: Patrisious Haddad <phaddad@nvidia.com>
>>>>
>>>> resolve_prepare_src() changes the destination address of the id,
>>>> regardless of success, and on failure zeroes it out.
>>>>
>>>> Instead on function failure keep the original destination address
>>>> of the id.
>>>>
>>>> Since the id could have been already added to the cm id tree and
>>>> zeroing its destination address, could result in a key mismatch or
>>>> multiple ids having the same key(zero) in the tree which could lead to:
>>>
>>> Oh, this can't be right
>>>
>>> The destination address is variable and it is changed by resolve even
>>> in good cases.
>> This is what I don't think can happen, since one address is resolved(bound),
>> it can't be bound again so each an other try of resolve would fail and enter
>> the error flow which I just fixed.
>>>
>>> So this part of the rb search is nonsense:
>>>
>>> 		result = compare_netdev_and_ip(
>>> 			node_id_priv->id.route.addr.dev_addr.bound_dev_if,
>>> 			cma_dst_addr(node_id_priv), this);
>>>
>>> The only way to fix it is to freeze the dst_addr before inserting
>>> things into the rb tree.
>> I completely agree, and this was my assumption that after resolve address,
>> and resolve route(where I add to the tree), the dst_addr is frozen, the only
>> scenario where it isn't was the resolve_prepare_src failure which some why
>> nullified the value instead of keeping the original.
> 
> Then fix the control flow so it doesn't do the nullification if it
> didn't change the value
> 
> You can't just change it while it is in the rb tree, that is racy
> 
> Jason
Jason Gunthorpe Dec. 12, 2022, 2 p.m. UTC | #5
On Mon, Dec 12, 2022 at 03:55:37PM +0200, Patrisious Haddad wrote:
> Agree, but changing the control flow of this function is really problematic
> , it was even tried before if you remember commit "e4103312d7b7a" , it got
> something to do with port allocation, I'll take another look over the code
> to see what other options we have though.

Yes, we've changed this function many times because it is badly
mis-designed

Jason
Patrisious Haddad Dec. 12, 2022, 2:06 p.m. UTC | #6
Btw there is the easy ugly fix obviously, which would be this patch + 
locking this function with the tree spin-lock(to avoid any race).

I'll check however if there is hope for a better possible design for 
this function.

On 12/12/2022 16:00, Jason Gunthorpe wrote:
> On Mon, Dec 12, 2022 at 03:55:37PM +0200, Patrisious Haddad wrote:
>> Agree, but changing the control flow of this function is really problematic
>> , it was even tried before if you remember commit "e4103312d7b7a" , it got
>> something to do with port allocation, I'll take another look over the code
>> to see what other options we have though.
> 
> Yes, we've changed this function many times because it is badly
> mis-designed
> 
> Jason
Jason Gunthorpe Dec. 12, 2022, 8:15 p.m. UTC | #7
On Mon, Dec 12, 2022 at 04:06:03PM +0200, Patrisious Haddad wrote:
> Btw there is the easy ugly fix obviously, which would be this patch +
> locking this function with the tree spin-lock(to avoid any race).
> 
> I'll check however if there is hope for a better possible design for this
> function.

The usual way I've fixed this is to avoid touching, in this case,
cma_dst_addr() in the call chain. eg we already pass in the correct
dst_addr

What you've done is made it so that in RDMA_CM_ROUTE_QUERY and beyond
the CM id's dst cannot change. The trick with this nasty code is that
it is trying to trigger auto-bind, and it has to do it blind because
of bad code structure

So, try something like this:

diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
index 13e0ab785baa24..1d1f9cd01dd38f 100644
--- a/drivers/infiniband/core/cma.c
+++ b/drivers/infiniband/core/cma.c
@@ -3547,7 +3547,7 @@ static int cma_bind_addr(struct rdma_cm_id *id, struct sockaddr *src_addr,
 	struct sockaddr_storage zero_sock = {};
 
 	if (src_addr && src_addr->sa_family)
-		return rdma_bind_addr(id, src_addr);
+		return rdma_bind_addr_dst(id, src_addr, dst_addr);
 
 	/*
 	 * When the src_addr is not specified, automatically supply an any addr
@@ -3567,7 +3567,7 @@ static int cma_bind_addr(struct rdma_cm_id *id, struct sockaddr *src_addr,
 		((struct sockaddr_ib *)&zero_sock)->sib_pkey =
 			((struct sockaddr_ib *)dst_addr)->sib_pkey;
 	}
-	return rdma_bind_addr(id, (struct sockaddr *)&zero_sock);
+	return rdma_bind_addr_dst(id, (struct sockaddr *)&zero_sock, dst_addr);
 }
 
 /*
@@ -3582,17 +3582,14 @@ static int resolve_prepare_src(struct rdma_id_private *id_priv,
 {
 	int ret;
 
-	memcpy(cma_dst_addr(id_priv), dst_addr, rdma_addr_size(dst_addr));
 	if (!cma_comp_exch(id_priv, RDMA_CM_ADDR_BOUND, RDMA_CM_ADDR_QUERY)) {
 		/* For a well behaved ULP state will be RDMA_CM_IDLE */
 		ret = cma_bind_addr(&id_priv->id, src_addr, dst_addr);
 		if (ret)
-			goto err_dst;
+			return ret;
 		if (WARN_ON(!cma_comp_exch(id_priv, RDMA_CM_ADDR_BOUND,
-					   RDMA_CM_ADDR_QUERY))) {
-			ret = -EINVAL;
-			goto err_dst;
-		}
+					   RDMA_CM_ADDR_QUERY)))
+			return -EINVAL;
 	}
 
 	if (cma_family(id_priv) != dst_addr->sa_family) {
@@ -3603,8 +3600,6 @@ static int resolve_prepare_src(struct rdma_id_private *id_priv,
 
 err_state:
 	cma_comp_exch(id_priv, RDMA_CM_ADDR_QUERY, RDMA_CM_ADDR_BOUND);
-err_dst:
-	memset(cma_dst_addr(id_priv), 0, rdma_addr_size(dst_addr));
 	return ret;
 }
 
@@ -4058,27 +4053,25 @@ int rdma_listen(struct rdma_cm_id *id, int backlog)
 }
 EXPORT_SYMBOL(rdma_listen);
 
-int rdma_bind_addr(struct rdma_cm_id *id, struct sockaddr *addr)
+static int rdma_bind_addr_dst(struct rdma_id_private *id_priv,
+			      struct sockaddr *addr, struct sockaddr *daddr)
 {
-	struct rdma_id_private *id_priv;
 	int ret;
-	struct sockaddr  *daddr;
 
 	if (addr->sa_family != AF_INET && addr->sa_family != AF_INET6 &&
 	    addr->sa_family != AF_IB)
 		return -EAFNOSUPPORT;
 
-	id_priv = container_of(id, struct rdma_id_private, id);
 	if (!cma_comp_exch(id_priv, RDMA_CM_IDLE, RDMA_CM_ADDR_BOUND))
 		return -EINVAL;
 
-	ret = cma_check_linklocal(&id->route.addr.dev_addr, addr);
+	ret = cma_check_linklocal(&id_priv->id.route.addr.dev_addr, addr);
 	if (ret)
 		goto err1;
 
 	memcpy(cma_src_addr(id_priv), addr, rdma_addr_size(addr));
 	if (!cma_any_addr(addr)) {
-		ret = cma_translate_addr(addr, &id->route.addr.dev_addr);
+		ret = cma_translate_addr(addr, &id_priv->id.route.addr.dev_addr);
 		if (ret)
 			goto err1;
 
@@ -4098,8 +4091,14 @@ int rdma_bind_addr(struct rdma_cm_id *id, struct sockaddr *addr)
 		}
 #endif
 	}
-	daddr = cma_dst_addr(id_priv);
+
+	/*
+	 * FIXME: This seems wrong, we can't just blidnly replace the sa_family
+	 * unless we know the daddr is zero. It will corrupt it.
+	 */
 	daddr->sa_family = addr->sa_family;
+	if (daddr != cma_dst_addr(id_priv))
+		memcpy(cma_dst_addr(id_priv), daddr, rdma_addr_size(addr));
 
 	ret = cma_get_port(id_priv);
 	if (ret)
@@ -4115,6 +4114,14 @@ int rdma_bind_addr(struct rdma_cm_id *id, struct sockaddr *addr)
 	cma_comp_exch(id_priv, RDMA_CM_ADDR_BOUND, RDMA_CM_IDLE);
 	return ret;
 }
+
+int rdma_bind_addr(struct rdma_cm_id *id, struct sockaddr *addr)
+{
+	struct rdma_id_private *id_priv =
+		container_of(id, struct rdma_id_private, id);
+
+	return rdma_bind_addr_dst(id_priv, addr, cma_dst_addr(id_priv));
+}
 EXPORT_SYMBOL(rdma_bind_addr);
 
 static int cma_format_hdr(void *hdr, struct rdma_id_private *id_priv)
diff mbox series

Patch

diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
index 1fca0a24f30f..2d4c391e36a9 100644
--- a/drivers/infiniband/core/cma.c
+++ b/drivers/infiniband/core/cma.c
@@ -3584,8 +3584,11 @@  static int resolve_prepare_src(struct rdma_id_private *id_priv,
 			       struct sockaddr *src_addr,
 			       const struct sockaddr *dst_addr)
 {
+	struct sockaddr org_addr = {};
 	int ret;
 
+	memcpy(&org_addr, cma_dst_addr(id_priv),
+	       rdma_addr_size(cma_dst_addr(id_priv)));
 	memcpy(cma_dst_addr(id_priv), dst_addr, rdma_addr_size(dst_addr));
 	if (!cma_comp_exch(id_priv, RDMA_CM_ADDR_BOUND, RDMA_CM_ADDR_QUERY)) {
 		/* For a well behaved ULP state will be RDMA_CM_IDLE */
@@ -3608,7 +3611,7 @@  static int resolve_prepare_src(struct rdma_id_private *id_priv,
 err_state:
 	cma_comp_exch(id_priv, RDMA_CM_ADDR_QUERY, RDMA_CM_ADDR_BOUND);
 err_dst:
-	memset(cma_dst_addr(id_priv), 0, rdma_addr_size(dst_addr));
+	memcpy(cma_dst_addr(id_priv), &org_addr, rdma_addr_size(&org_addr));
 	return ret;
 }