Message ID | alpine.DEB.2.22.394.2101211318530.120233@www.lameter.com (mailing list archive) |
---|---|
State | Superseded |
Series | Fix sendonly join going away after Reregister event |
On Thu, Jan 21, 2021 at 01:24:43PM +0000, Christoph Lameter wrote:
> From: Christoph Lameter <cl@linux.com>
> Subject: [PATCH] Fix sendonly join going away after Reregister event
>
> When a server receives a REREG event then the SM information in
> the kernel is marked as invalid and a request is sent to the SM to update
> the information.
>
> However, receiving a REREG also occurs in user space applications that
> are now trying to rejoin the multicast groups.
>
> If the SM information is invalid then ib_sa_sendonly_fullmem_support()
> returns false. That is wrong because it just means that we do not know
> yet if the potentially new SM supports sendonly joins. It does not mean
> that the SM does not support Sendonly joins.
>
> This patch simply attempts to wait until the SM information is updated
> and the determination can be made.
>
> The code has not been tested but compiles fine.
> I am not sure if it is good to do an msleep here.
>
> Signed-off-by: Christoph Lameter <cl@linux.com>
>
> Index: linux/drivers/infiniband/core/sa_query.c
> ===================================================================
> --- linux.orig/drivers/infiniband/core/sa_query.c	2020-12-17 14:51:15.301206041 +0000
> +++ linux/drivers/infiniband/core/sa_query.c	2021-01-21 12:52:53.577943481 +0000
> @@ -1963,11 +1963,19 @@ bool ib_sa_sendonly_fullmem_support(stru
>  	if (!sa_dev)
>  		return ret;
>
> +redo:
>  	port = &sa_dev->port[port_num - sa_dev->start_port];
>
> +	while (!port->classport_info.valid)
> +		msleep(100);
> +
>  	spin_lock_irqsave(&port->classport_lock, flags);
> -	if ((port->classport_info.valid) &&
> -	    (port->classport_info.data.type == RDMA_CLASS_PORT_INFO_IB))
> +	if (!port->classport_info.valid) {
> +		/* Need to wait until the SM data is available */
> +		spin_unlock_irqrestore(&port->classport_lock, flags);
> +		goto redo;

We have all potential to loop forever here, if valid doesn't change.

> +	}
> +	if ((port->classport_info.data.type == RDMA_CLASS_PORT_INFO_IB))
>  		ret = ib_get_cpi_capmask2(&port->classport_info.data.ib)
>  			& IB_SA_CAP_MASK2_SENDONLY_FULL_MEM_SUPPORT;
>  	spin_unlock_irqrestore(&port->classport_lock, flags);

On Thu, 21 Jan 2021, Leon Romanovsky wrote:

> >  	spin_lock_irqsave(&port->classport_lock, flags);
> > -	if ((port->classport_info.valid) &&
> > -	    (port->classport_info.data.type == RDMA_CLASS_PORT_INFO_IB))
> > +	if (!port->classport_info.valid) {
> > +		/* Need to wait until the SM data is available */
> > +		spin_unlock_irqrestore(&port->classport_lock, flags);
> > +		goto redo;
>
> We have all potential to loop forever here, if valid doesn't change.
>

Right. So what is the right solution here? The sendonly check function could return
an errno instead?

	0 = Sendonly join is supported
	-EAGAIN = SM information is currently invalid
	-ENOSUP = SM does not support sendonly join

Since all SMs out there have had support for sendonly join for years now
we could just remove the check entirely. If there is an old grizzly SM out
there then it would not process that join request and would return an
error.

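The three-way return value proposed above can be sketched in a small userspace model. This is only an illustration under stated assumptions, not the kernel change: `struct classport_info_model`, `sendonly_fullmem_check()`, and the `CAP_MASK2_SENDONLY_FULL_MEM` bit value are hypothetical stand-ins, and locking is left out. Since `ENOSUP` is not a defined errno constant, the sketch assumes `-EOPNOTSUPP` was meant.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace stand-ins for the kernel's classport info. */
enum cpi_type_model { CPI_TYPE_IB, CPI_TYPE_OTHER };

/* Illustrative bit only; not the real capability mask value. */
#define CAP_MASK2_SENDONLY_FULL_MEM (1u << 0)

struct classport_info_model {
	bool valid;               /* false while an SM refresh is pending */
	enum cpi_type_model type; /* which kind of class port info is cached */
	uint32_t capmask2;        /* capability bits reported by the SM */
};

/*
 * Returns 0 if sendonly full-member join is supported,
 * -EAGAIN if the cached SM information is currently invalid,
 * -EOPNOTSUPP if the SM does not advertise the capability.
 */
int sendonly_fullmem_check(const struct classport_info_model *cpi)
{
	assert(cpi != NULL);

	if (!cpi->valid)
		return -EAGAIN;      /* unknown yet, caller may retry later */
	if (cpi->type != CPI_TYPE_IB)
		return -EOPNOTSUPP;
	if (!(cpi->capmask2 & CAP_MASK2_SENDONLY_FULL_MEM))
		return -EOPNOTSUPP;
	return 0;
}
```

With this shape the caller can tell "not known yet" apart from "not supported", which the boolean return cannot express.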
On Fri, Jan 22, 2021 at 08:24:57AM +0000, Christoph Lameter wrote:
> On Thu, 21 Jan 2021, Leon Romanovsky wrote:
>
> > >  	spin_lock_irqsave(&port->classport_lock, flags);
> > > -	if ((port->classport_info.valid) &&
> > > -	    (port->classport_info.data.type == RDMA_CLASS_PORT_INFO_IB))
> > > +	if (!port->classport_info.valid) {
> > > +		/* Need to wait until the SM data is available */
> > > +		spin_unlock_irqrestore(&port->classport_lock, flags);
> > > +		goto redo;
> >
> > We have all potential to loop forever here, if valid doesn't change.
> >
>
> Right. So what is the right solution here? The sendonly check function could return
> an errno instead?
>
> 	0 = Sendonly join is supported
> 	-EAGAIN = SM information is currently invalid
> 	-ENOSUP = SM does not support sendonly join

I would do the same flow as in update_ib_cpi(), use retry count and loop
with delay, but without workqueue.

>
> Since all SMs out there have had support for sendonly join for years now
> we could just remove the check entirely. If there is an old grizzly SM out
> there then it would not process that join request and would return an
> error.

I have no idea if it is possible; if yes, this will be the best solution.

Thanks

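The "retry count and loop with delay" suggestion can be sketched as a bounded wait in userspace C. This is a minimal model and does not reproduce update_ib_cpi(): `wait_for_valid()`, `valid_after_countdown()`, and `sleep_ms()` are hypothetical names, `nanosleep()` stands in for `msleep()`, and the predicate callback stands in for reading `classport_info.valid` under the lock.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical predicate type: returns true once the SM info is valid. */
typedef bool (*valid_fn)(void *ctx);

/* Userspace stand-in for msleep(). */
static void sleep_ms(long ms)
{
	struct timespec ts = {
		.tv_sec = ms / 1000,
		.tv_nsec = (ms % 1000) * 1000000L,
	};
	nanosleep(&ts, NULL);
}

/*
 * Poll is_valid() once, then up to max_retries more times with a delay
 * in between. Returns true as soon as the predicate holds, or false once
 * the retry budget is used up, so the wait cannot loop forever.
 */
bool wait_for_valid(valid_fn is_valid, void *ctx, int max_retries, long delay_ms)
{
	assert(is_valid != NULL);

	for (int attempt = 0; attempt <= max_retries; attempt++) {
		if (is_valid(ctx))
			return true;
		if (attempt < max_retries)
			sleep_ms(delay_ms);
	}
	return false;
}

/* Example predicate: becomes valid after *ctx unsuccessful polls. */
bool valid_after_countdown(void *ctx)
{
	int *remaining = ctx;

	if (*remaining > 0) {
		(*remaining)--;
		return false;
	}
	return true;
}
```

A caller that gets `false` back would then have to decide between reporting "unknown" and treating sendonly join as unsupported, which is the open question in this thread.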
Index: linux/drivers/infiniband/core/sa_query.c
===================================================================
--- linux.orig/drivers/infiniband/core/sa_query.c	2020-12-17 14:51:15.301206041 +0000
+++ linux/drivers/infiniband/core/sa_query.c	2021-01-21 12:52:53.577943481 +0000
@@ -1963,11 +1963,19 @@ bool ib_sa_sendonly_fullmem_support(stru
 	if (!sa_dev)
 		return ret;
 
+redo:
 	port = &sa_dev->port[port_num - sa_dev->start_port];
 
+	while (!port->classport_info.valid)
+		msleep(100);
+
 	spin_lock_irqsave(&port->classport_lock, flags);
-	if ((port->classport_info.valid) &&
-	    (port->classport_info.data.type == RDMA_CLASS_PORT_INFO_IB))
+	if (!port->classport_info.valid) {
+		/* Need to wait until the SM data is available */
+		spin_unlock_irqrestore(&port->classport_lock, flags);
+		goto redo;
+	}
+	if ((port->classport_info.data.type == RDMA_CLASS_PORT_INFO_IB))
 		ret = ib_get_cpi_capmask2(&port->classport_info.data.ib)
 			& IB_SA_CAP_MASK2_SENDONLY_FULL_MEM_SUPPORT;
 	spin_unlock_irqrestore(&port->classport_lock, flags);