Message ID | ec81a9d50462d9b9303966176b17b85f7dfbb96a.1670749660.git.leonro@nvidia.com |
---|---|
State | Changes Requested |
Commit | 4d90832aabbe22a9627ba3083e56aabe9f908fb0 |
Series | [rdma-next] RDMA/core: Fix resolve_prepare_src error cleanup |
On Sun, Dec 11, 2022 at 11:08:30AM +0200, Leon Romanovsky wrote:
> From: Patrisious Haddad <phaddad@nvidia.com>
>
> resolve_prepare_src() changes the destination address of the id,
> regardless of success, and on failure zeroes it out.
>
> Instead on function failure keep the original destination address
> of the id.
>
> Since the id could have been already added to the cm id tree and
> zeroing its destination address, could result in a key mismatch or
> multiple ids having the same key(zero) in the tree which could lead to:

Oh, this can't be right.

The destination address is variable and it is changed by resolve even
in good cases.

So this part of the rb search is nonsense:

	result = compare_netdev_and_ip(
		node_id_priv->id.route.addr.dev_addr.bound_dev_if,
		cma_dst_addr(node_id_priv), this);

The only way to fix it is to freeze the dst_addr before inserting
things into the rb tree.

That is, completely block resolve_prepare_src().

Most probably this suggests that the id is being inserted into the
rbtree at the wrong time, before the dst_addr becomes unchangeable.

Jason
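Jason's objection rests on a general property of ordered search trees: a node's position is fixed by its key at insertion time, so rewriting the key in place (as the error path's zeroing of the destination does) leaves the node where the tree no longer expects it. A minimal userspace sketch, assuming a plain unbalanced binary search tree in place of the kernel rbtree and a simplified key of (bound_dev_if, IPv4 destination as a `uint32_t`); `struct demo_id`, `demo_insert()`, `demo_find()` and `demo_lookup_after_zeroing()` are hypothetical names, not cma.c code:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified CM id: the tree key is (bound_dev_if, dst),
 * loosely mirroring what compare_netdev_and_ip() compares. */
struct demo_id {
	int bound_dev_if;
	uint32_t dst;                   /* IPv4 destination, host order */
	struct demo_id *left, *right;
};

static int demo_cmp(int ifindex, uint32_t dst, const struct demo_id *node)
{
	if (ifindex != node->bound_dev_if)
		return ifindex < node->bound_dev_if ? -1 : 1;
	if (dst != node->dst)
		return dst < node->dst ? -1 : 1;
	return 0;
}

/* Unbalanced BST insert; an rbtree keeps the same ordering invariant. */
static void demo_insert(struct demo_id **root, struct demo_id *id)
{
	while (*root)
		root = demo_cmp(id->bound_dev_if, id->dst, *root) < 0 ?
		       &(*root)->left : &(*root)->right;
	id->left = id->right = NULL;
	*root = id;
}

static struct demo_id *demo_find(struct demo_id *root, int ifindex,
				 uint32_t dst)
{
	while (root) {
		int c = demo_cmp(ifindex, dst, root);

		if (c == 0)
			return root;
		root = c < 0 ? root->left : root->right;
	}
	return NULL;
}

/* Returns 1 if the id is still found by its (now zeroed) key, else 0. */
int demo_lookup_after_zeroing(void)
{
	struct demo_id a = { .bound_dev_if = 1, .dst = 0x0a000001 };
	struct demo_id b = { .bound_dev_if = 1, .dst = 0x0a000005 };
	struct demo_id c = { .bound_dev_if = 1, .dst = 0x0a000009 };
	struct demo_id *root = NULL;

	demo_insert(&root, &b);
	demo_insert(&root, &a);
	demo_insert(&root, &c);
	assert(demo_find(root, 1, 0x0a000009) == &c);

	/* What the old error path does: rewrite the key in place. */
	c.dst = 0;
	return demo_find(root, 1, 0) == &c;
}
```

`demo_lookup_after_zeroing()` returns 0: after `c.dst` is zeroed, the search for key (1, 0) descends to the left of every node and never reaches `c`, which still sits in the right subtree.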
I think we have the same realization but a different understanding of the
code; please correct what I'm missing. Rest inline:

On 12/12/2022 15:27, Jason Gunthorpe wrote:
> Oh, this can't be right.
>
> The destination address is variable and it is changed by resolve even
> in good cases.

This is what I don't think can happen: once an address is resolved
(bound), it can't be bound again, so every further resolve attempt fails
and enters the error flow, which is what I just fixed.

>
> So this part of the rb search is nonsense:
>
>	result = compare_netdev_and_ip(
>		node_id_priv->id.route.addr.dev_addr.bound_dev_if,
>		cma_dst_addr(node_id_priv), this);
>
> The only way to fix it is to freeze the dst_addr before inserting
> things into the rb tree.

I completely agree, and this was my assumption: after resolve address
and resolve route (where I add the id to the tree), the dst_addr is
frozen. The only scenario where it isn't was the resolve_prepare_src()
failure, which for some reason nullified the value instead of keeping
the original.

What I'm trying to say is that once the CM id has been added to the tree
(i.e., it has passed resolve addr once plus resolve route), there can't
be a good (successful) case for resolve_prepare_src() again: the id is
already bound, so every subsequent call should fail, meaning
cma_dst_addr() is effectively frozen.
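Patrisious's argument is that a resolved id cannot be bound again, so every later resolve fails. That holds for the state check, but in the pre-patch ordering the destination is copied into the id before `cma_comp_exch()` runs, and the `err_dst` path then zeroes it, so even a call that fails the state check rewrites the key twice. A minimal sketch of that ordering, assuming a reduced state machine, a `uint32_t` destination and a non-atomic `demo_comp_exch()` in place of `cma_comp_exch()`; all `demo_*` names are hypothetical, and the real function's address-family check is omitted:

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical subset of the CM id state machine. */
enum demo_state {
	DEMO_IDLE,
	DEMO_ADDR_BOUND,
	DEMO_ADDR_QUERY,
	DEMO_ROUTE_RESOLVED,
};

struct demo_id {
	enum demo_state state;
	uint32_t dst;                   /* part of the tree key */
};

/* Non-atomic stand-in for cma_comp_exch(): switch to 'exch' only if
 * the id is currently in 'comp'; returns 1 on success. */
static int demo_comp_exch(struct demo_id *id, enum demo_state comp,
			  enum demo_state exch)
{
	if (id->state != comp)
		return 0;
	id->state = exch;
	return 1;
}

/* Stand-in for the auto-bind path: binding is only legal from IDLE. */
static int demo_bind(struct demo_id *id)
{
	return demo_comp_exch(id, DEMO_IDLE, DEMO_ADDR_BOUND) ? 0 : -EINVAL;
}

/* Pre-patch ordering: write the key, then check state, zero on error. */
int demo_resolve_prepare_src_old(struct demo_id *id, uint32_t new_dst)
{
	int ret;

	id->dst = new_dst;              /* key rewritten before any check */
	if (!demo_comp_exch(id, DEMO_ADDR_BOUND, DEMO_ADDR_QUERY)) {
		ret = demo_bind(id);
		if (ret)
			goto err_dst;
		if (!demo_comp_exch(id, DEMO_ADDR_BOUND, DEMO_ADDR_QUERY)) {
			ret = -EINVAL;
			goto err_dst;
		}
	}
	return 0;

err_dst:
	memset(&id->dst, 0, sizeof(id->dst));   /* key zeroed on failure */
	return ret;
}
```

With an id already in `DEMO_ROUTE_RESOLVED`, the call returns `-EINVAL` and leaves the state untouched, yet `id.dst` ends up 0: the key of an id that may already be in the tree was changed by a call that was rejected.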
On Mon, Dec 12, 2022 at 03:42:07PM +0200, Patrisious Haddad wrote:
> I completely agree, and this was my assumption: after resolve address
> and resolve route (where I add the id to the tree), the dst_addr is
> frozen. The only scenario where it isn't was the resolve_prepare_src()
> failure, which for some reason nullified the value instead of keeping
> the original.

Then fix the control flow so it doesn't do the nullification if it
didn't change the value.

You can't just change it while it is in the rb tree; that is racy.

Jason
Agreed, but changing the control flow of this function is really
problematic; it was even tried before, if you remember commit
"e4103312d7b7a", and it had something to do with port allocation. I'll
take another look over the code to see what other options we have,
though.

In short, you are right: it is racy now.

On 12/12/2022 15:43, Jason Gunthorpe wrote:
> Then fix the control flow so it doesn't do the nullification if it
> didn't change the value.
>
> You can't just change it while it is in the rb tree; that is racy.
On Mon, Dec 12, 2022 at 03:55:37PM +0200, Patrisious Haddad wrote:
> Agreed, but changing the control flow of this function is really
> problematic; it was even tried before, if you remember commit
> "e4103312d7b7a", and it had something to do with port allocation. I'll
> take another look over the code to see what other options we have,
> though.

Yes, we've changed this function many times because it is badly
misdesigned.

Jason
Btw, there is obviously the easy, ugly fix: this patch plus locking this
function with the tree spinlock (to avoid any race).

I'll check, however, whether there is hope for a better design for this
function.

On 12/12/2022 16:00, Jason Gunthorpe wrote:
> Yes, we've changed this function many times because it is badly
> misdesigned.
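The "easy ugly fix" pairs the save-and-restore error path with holding the tree's lock for the whole function, so lookups never observe the tentative destination. A minimal userspace sketch of that idea, assuming a C11 `atomic_flag` busy-wait lock as a stand-in for the kernel spinlock and a `uint32_t` key; `demo_tree_lock`, `demo_prepare_locked()` and `demo_read_key()` are hypothetical. It does not check whether everything the real resolve_prepare_src() calls may run under a spinlock:

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-in for the spinlock that protects the id tree. */
static atomic_flag demo_tree_lock = ATOMIC_FLAG_INIT;

static void demo_lock(void)
{
	while (atomic_flag_test_and_set_explicit(&demo_tree_lock,
						 memory_order_acquire))
		;       /* busy-wait, like a spinlock */
}

static void demo_unlock(void)
{
	atomic_flag_clear_explicit(&demo_tree_lock, memory_order_release);
}

struct demo_id {
	uint32_t dst;   /* part of the tree key */
	int resolved;   /* nonzero once the id is past address resolution */
};

/*
 * Tentatively write the new key and restore the saved one on failure,
 * all under the tree lock. Lookups take the same lock, so no reader
 * can observe the tentative value.
 */
int demo_prepare_locked(struct demo_id *id, uint32_t new_dst)
{
	uint32_t saved;
	int ret = 0;

	demo_lock();
	saved = id->dst;
	id->dst = new_dst;
	if (id->resolved)       /* stand-in for the failing bind path */
		ret = -EINVAL;
	if (ret)
		id->dst = saved;
	demo_unlock();
	return ret;
}

/* Reader side: a lookup reads the key only while holding the lock. */
uint32_t demo_read_key(struct demo_id *id)
{
	uint32_t dst;

	demo_lock();
	dst = id->dst;
	demo_unlock();
	return dst;
}
```

The lock only hides the transient value from readers. It does not make it safe to change the key of a node that is already in the tree: a successful change would still require removing the node and re-inserting it under the new key.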
On Mon, Dec 12, 2022 at 04:06:03PM +0200, Patrisious Haddad wrote:
> Btw, there is obviously the easy, ugly fix: this patch plus locking this
> function with the tree spinlock (to avoid any race).
>
> I'll check, however, whether there is hope for a better design for this
> function.

The usual way I've fixed this is to avoid touching, in this case,
cma_dst_addr() in the call chain, e.g. we already pass in the correct
dst_addr.

What you've done is make it so that in RDMA_CM_ROUTE_QUERY and beyond
the CM id's dst cannot change.

The trick with this nasty code is that it is trying to trigger
auto-bind, and it has to do it blind because of bad code structure.

So, try something like this:

diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
index 13e0ab785baa24..1d1f9cd01dd38f 100644
--- a/drivers/infiniband/core/cma.c
+++ b/drivers/infiniband/core/cma.c
@@ -3547,7 +3547,7 @@ static int cma_bind_addr(struct rdma_cm_id *id, struct sockaddr *src_addr,
 	struct sockaddr_storage zero_sock = {};
 
 	if (src_addr && src_addr->sa_family)
-		return rdma_bind_addr(id, src_addr);
+		return rdma_bind_addr_dst(id, src_addr, dst_addr);
 
 	/*
 	 * When the src_addr is not specified, automatically supply an any addr
@@ -3567,7 +3567,7 @@ static int cma_bind_addr(struct rdma_cm_id *id, struct sockaddr *src_addr,
 		((struct sockaddr_ib *)&zero_sock)->sib_pkey =
 			((struct sockaddr_ib *)dst_addr)->sib_pkey;
 	}
-	return rdma_bind_addr(id, (struct sockaddr *)&zero_sock);
+	return rdma_bind_addr_dst(id, (struct sockaddr *)&zero_sock, dst_addr);
 }
 
 /*
@@ -3582,17 +3582,14 @@ static int resolve_prepare_src(struct rdma_id_private *id_priv,
 {
 	int ret;
 
-	memcpy(cma_dst_addr(id_priv), dst_addr, rdma_addr_size(dst_addr));
 	if (!cma_comp_exch(id_priv, RDMA_CM_ADDR_BOUND, RDMA_CM_ADDR_QUERY)) {
 		/* For a well behaved ULP state will be RDMA_CM_IDLE */
 		ret = cma_bind_addr(&id_priv->id, src_addr, dst_addr);
 		if (ret)
-			goto err_dst;
+			return ret;
 		if (WARN_ON(!cma_comp_exch(id_priv, RDMA_CM_ADDR_BOUND,
-					   RDMA_CM_ADDR_QUERY))) {
-			ret = -EINVAL;
-			goto err_dst;
-		}
+					   RDMA_CM_ADDR_QUERY)))
+			return -EINVAL;
 	}
 
 	if (cma_family(id_priv) != dst_addr->sa_family) {
@@ -3603,8 +3600,6 @@ static int resolve_prepare_src(struct rdma_id_private *id_priv,
 
 err_state:
 	cma_comp_exch(id_priv, RDMA_CM_ADDR_QUERY, RDMA_CM_ADDR_BOUND);
-err_dst:
-	memset(cma_dst_addr(id_priv), 0, rdma_addr_size(dst_addr));
 	return ret;
 }
 
@@ -4058,27 +4053,25 @@ int rdma_listen(struct rdma_cm_id *id, int backlog)
 }
 EXPORT_SYMBOL(rdma_listen);
 
-int rdma_bind_addr(struct rdma_cm_id *id, struct sockaddr *addr)
+static int rdma_bind_addr_dst(struct rdma_id_private *id_priv,
+			      struct sockaddr *addr, struct sockaddr *daddr)
 {
-	struct rdma_id_private *id_priv;
 	int ret;
-	struct sockaddr *daddr;
 
 	if (addr->sa_family != AF_INET && addr->sa_family != AF_INET6 &&
 	    addr->sa_family != AF_IB)
 		return -EAFNOSUPPORT;
 
-	id_priv = container_of(id, struct rdma_id_private, id);
 	if (!cma_comp_exch(id_priv, RDMA_CM_IDLE, RDMA_CM_ADDR_BOUND))
 		return -EINVAL;
 
-	ret = cma_check_linklocal(&id->route.addr.dev_addr, addr);
+	ret = cma_check_linklocal(&id_priv->id.route.addr.dev_addr, addr);
 	if (ret)
 		goto err1;
 
 	memcpy(cma_src_addr(id_priv), addr, rdma_addr_size(addr));
 	if (!cma_any_addr(addr)) {
-		ret = cma_translate_addr(addr, &id->route.addr.dev_addr);
+		ret = cma_translate_addr(addr, &id_priv->id.route.addr.dev_addr);
 		if (ret)
 			goto err1;
 
@@ -4098,8 +4091,14 @@ int rdma_bind_addr(struct rdma_cm_id *id, struct sockaddr *addr)
 		}
 #endif
 	}
-	daddr = cma_dst_addr(id_priv);
+
+	/*
+	 * FIXME: This seems wrong, we can't just blindly replace the sa_family
+	 * unless we know the daddr is zero. It will corrupt it.
+	 */
 	daddr->sa_family = addr->sa_family;
+	if (daddr != cma_dst_addr(id_priv))
+		memcpy(cma_dst_addr(id_priv), daddr, rdma_addr_size(addr));
 
 	ret = cma_get_port(id_priv);
 	if (ret)
@@ -4115,6 +4114,14 @@ int rdma_bind_addr(struct rdma_cm_id *id, struct sockaddr *addr)
 	cma_comp_exch(id_priv, RDMA_CM_ADDR_BOUND, RDMA_CM_IDLE);
 	return ret;
 }
+
+int rdma_bind_addr(struct rdma_cm_id *id, struct sockaddr *addr)
+{
+	struct rdma_id_private *id_priv =
+		container_of(id, struct rdma_id_private, id);
+
+	return rdma_bind_addr_dst(id_priv, addr, cma_dst_addr(id_priv));
+}
 EXPORT_SYMBOL(rdma_bind_addr);
 
 static int cma_format_hdr(void *hdr, struct rdma_id_private *id_priv)
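The direction in the proposed diff is that resolve_prepare_src() stops writing cma_dst_addr() itself and the destination travels down to the bind step, which stores it in the id only once binding goes ahead. A minimal userspace sketch of that idea, assuming a reduced id with IPv4/IPv6 only and a `bound` flag in place of the state machine; `struct demo_id`, `demo_addr_size()` and `demo_bind_addr_dst()` are hypothetical. As one possible answer to the FIXME about overwriting `sa_family`, this sketch rejects a family mismatch up front and fills in the family only when the destination has none; that choice is an assumption, not something settled in the thread:

```c
#include <arpa/inet.h>
#include <assert.h>
#include <errno.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical, simplified CM id: only the fields this sketch needs. */
struct demo_id {
	int bound;                      /* 0 = idle, 1 = bound */
	struct sockaddr_storage src;
	struct sockaddr_storage dst;    /* tree key material */
};

static size_t demo_addr_size(const struct sockaddr *sa)
{
	switch (sa->sa_family) {
	case AF_INET:
		return sizeof(struct sockaddr_in);
	case AF_INET6:
		return sizeof(struct sockaddr_in6);
	default:
		return 0;
	}
}

/* Bind with an explicit destination: every check runs before any
 * field of the id is written. */
int demo_bind_addr_dst(struct demo_id *id, const struct sockaddr *src,
		       const struct sockaddr *daddr)
{
	struct sockaddr *id_dst = (struct sockaddr *)&id->dst;

	if (src->sa_family != AF_INET && src->sa_family != AF_INET6)
		return -EAFNOSUPPORT;
	/* Refuse a family mismatch instead of overwriting sa_family. */
	if (daddr->sa_family != AF_UNSPEC && daddr->sa_family != src->sa_family)
		return -EINVAL;
	if (id->bound)
		return -EINVAL;

	/* All checks passed: only now touch the id's key fields. */
	memcpy(&id->src, src, demo_addr_size(src));
	if (daddr != id_dst) {          /* never memcpy a buffer onto itself */
		memset(id_dst, 0, sizeof(id->dst));
		memcpy(id_dst, daddr, demo_addr_size(daddr));
	}
	if (id_dst->sa_family == AF_UNSPEC)
		id_dst->sa_family = src->sa_family;
	id->bound = 1;
	return 0;
}
```

A rejected call (wrong family, or an id that is already bound) returns before the id is modified, so its key stays whatever it was. The `daddr != id_dst` test mirrors the diff's own check and matters because `memcpy()` between overlapping buffers is undefined behavior.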
diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
index 1fca0a24f30f..2d4c391e36a9 100644
--- a/drivers/infiniband/core/cma.c
+++ b/drivers/infiniband/core/cma.c
@@ -3584,8 +3584,11 @@ static int resolve_prepare_src(struct rdma_id_private *id_priv,
 			       struct sockaddr *src_addr,
 			       const struct sockaddr *dst_addr)
 {
+	struct sockaddr_storage org_addr = {};
 	int ret;
 
+	memcpy(&org_addr, cma_dst_addr(id_priv),
+	       rdma_addr_size(cma_dst_addr(id_priv)));
 	memcpy(cma_dst_addr(id_priv), dst_addr, rdma_addr_size(dst_addr));
 	if (!cma_comp_exch(id_priv, RDMA_CM_ADDR_BOUND, RDMA_CM_ADDR_QUERY)) {
 		/* For a well behaved ULP state will be RDMA_CM_IDLE */
@@ -3608,7 +3611,7 @@ static int resolve_prepare_src(struct rdma_id_private *id_priv,
 err_state:
 	cma_comp_exch(id_priv, RDMA_CM_ADDR_QUERY, RDMA_CM_ADDR_BOUND);
 err_dst:
-	memset(cma_dst_addr(id_priv), 0, rdma_addr_size(dst_addr));
+	memcpy(cma_dst_addr(id_priv), &org_addr, rdma_addr_size((struct sockaddr *)&org_addr));
 	return ret;
 }
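The patch keeps the key stable across a failed call by saving the old destination and copying it back on the error path. The save buffer has to be big enough for any address the id can hold: `struct sockaddr` is 16 bytes, while rdma_addr_size() returns `sizeof(struct sockaddr_in6)` (28 bytes) for AF_INET6 and `sizeof(struct sockaddr_ib)` for AF_IB, so `struct sockaddr_storage` is the safe type. A minimal userspace sketch of the save/restore, assuming IPv4/IPv6 only; `demo_addr_size()` (a stand-in for rdma_addr_size()), `demo_prepare_restore()` and its `bind_fails` switch are hypothetical:

```c
#include <arpa/inet.h>
#include <assert.h>
#include <errno.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/* Simplified stand-in for rdma_addr_size(): bytes used by each family. */
static size_t demo_addr_size(const struct sockaddr *sa)
{
	switch (sa->sa_family) {
	case AF_INET:
		return sizeof(struct sockaddr_in);
	case AF_INET6:
		return sizeof(struct sockaddr_in6);
	default:
		return 0;
	}
}

/*
 * Save/restore variant of the error path. 'id_dst' plays the role of
 * cma_dst_addr(id_priv); 'bind_fails' is a hypothetical switch that
 * simulates cma_bind_addr() returning an error.
 */
int demo_prepare_restore(struct sockaddr_storage *id_dst,
			 const struct sockaddr *dst_addr, int bind_fails)
{
	/* sockaddr_storage fits any family; a 16-byte struct sockaddr
	 * cannot hold a 28-byte sockaddr_in6. */
	struct sockaddr_storage org_addr;

	memcpy(&org_addr, id_dst, sizeof(org_addr));
	memcpy(id_dst, dst_addr, demo_addr_size(dst_addr));
	if (bind_fails) {
		/* Put the original key back instead of zeroing it. */
		memcpy(id_dst, &org_addr, sizeof(org_addr));
		return -EINVAL;
	}
	return 0;
}
```

This only changes what the error path leaves behind. The id's key is still overwritten for the duration of the call, which is the window Jason calls racy when the id is already in the tree.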