Fix sendonly join going away after Reregister event

Message ID alpine.DEB.2.22.394.2101211318530.120233@www.lameter.com (mailing list archive)
State Superseded

Commit Message

Christoph Lameter (Ampere) Jan. 21, 2021, 1:24 p.m. UTC
From: Christoph Lameter <cl@linux.com>
Subject: [PATCH] Fix sendonly join going away after Reregister event

When a server receives a REREG event then the SM information in
the kernel is marked as invalid and a request is sent to the SM to update
the information.

However, receiving a REREG also occurs in user space applications that
are now trying to rejoin the multicast groups.

If the SM information is invalid then ib_sa_sendonly_fullmem_support()
returns false. That is wrong because it just means that we do not know
yet if the potentially new SM supports sendonly joins. It does not mean
that the SM does not support Sendonly joins.

This patch simply attempts to wait until the SM information is updated
and the determination can be made.

The code has not been tested but compiles fine.
I am not sure if it is good to do an msleep here.

Signed-off-by: Christoph Lameter <cl@linux.com>

Comments

Leon Romanovsky Jan. 21, 2021, 4:11 p.m. UTC | #1
On Thu, Jan 21, 2021 at 01:24:43PM +0000, Christoph Lameter wrote:
> From: Christoph Lameter <cl@linux.com>
> Subject: [PATCH] Fix sendonly join going away after Reregister event
>
> When a server receives a REREG event then the SM information in
> the kernel is marked as invalid and a request is sent to the SM to update
> the information.
>
> However, receiving a REREG also occurs in user space applications that
> are now trying to rejoin the multicast groups.
>
> If the SM information is invalid then ib_sa_sendonly_fullmem_support()
> returns false. That is wrong because it just means that we do not know
> yet if the potentially new SM supports sendonly joins. It does not mean
> that the SM does not support Sendonly joins.
>
> This patch simply attempts to wait until the SM information is updated
> and the determination can be made.
>
> The code has not been tested but compiles fine.
> I am not sure if it is good to do an msleep here.
>
> Signed-off-by: Christoph Lameter <cl@linux.com>
>
> Index: linux/drivers/infiniband/core/sa_query.c
> ===================================================================
> --- linux.orig/drivers/infiniband/core/sa_query.c	2020-12-17 14:51:15.301206041 +0000
> +++ linux/drivers/infiniband/core/sa_query.c	2021-01-21 12:52:53.577943481 +0000
> @@ -1963,11 +1963,19 @@ bool ib_sa_sendonly_fullmem_support(stru
>  	if (!sa_dev)
>  		return ret;
>
> +redo:
>  	port  = &sa_dev->port[port_num - sa_dev->start_port];
>
> +	while (!port->classport_info.valid)
> +		msleep(100);
> +
>  	spin_lock_irqsave(&port->classport_lock, flags);
> -	if ((port->classport_info.valid) &&
> -	    (port->classport_info.data.type == RDMA_CLASS_PORT_INFO_IB))
> +	if (!port->classport_info.valid) {
> +		/* Need to wait until the SM data is available */
> +		spin_unlock_irqrestore(&port->classport_lock, flags);
> +		goto redo;

This can loop forever if valid never changes.

> +	}
> +	if ((port->classport_info.data.type == RDMA_CLASS_PORT_INFO_IB))
>  		ret = ib_get_cpi_capmask2(&port->classport_info.data.ib)
>  			& IB_SA_CAP_MASK2_SENDONLY_FULL_MEM_SUPPORT;
>  	spin_unlock_irqrestore(&port->classport_lock, flags);
Christoph Lameter (Ampere) Jan. 22, 2021, 8:24 a.m. UTC | #2
On Thu, 21 Jan 2021, Leon Romanovsky wrote:

> >  	spin_lock_irqsave(&port->classport_lock, flags);
> > -	if ((port->classport_info.valid) &&
> > -	    (port->classport_info.data.type == RDMA_CLASS_PORT_INFO_IB))
> > +	if (!port->classport_info.valid) {
> > +		/* Need to wait until the SM data is available */
> > +		spin_unlock_irqrestore(&port->classport_lock, flags);
> > +		goto redo;
>
> This can loop forever if valid never changes.
>

Right. So what is the right solution here? The sendonly check function could return
an errno instead?

0           = Sendonly join is supported
-EAGAIN     = SM information is currently invalid
-EOPNOTSUPP = SM does not support sendonly join

Since all SMs out there have had support for sendonly join for years now
we could just remove the check entirely. If there is an old grizzly SM out
there then it would not process that join request and would return an
error.
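
A minimal user-space sketch of the three-way return value proposed in this
reply, not the kernel implementation. The struct, enum, function name, and
capability bit below are hypothetical stand-ins for the kernel's
classport_info state, and the real function would read that state under
port->classport_lock. The sketch uses -EOPNOTSUPP, the usual "operation not
supported" errno, for the third case.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the cached ClassPortInfo of one port. */
enum cpi_type { CPI_TYPE_IB, CPI_TYPE_OPA };

struct cpi_cache {
	bool valid;          /* false between a REREG event and the SM's reply */
	enum cpi_type type;  /* which kind of ClassPortInfo was cached */
	uint32_t capmask2;   /* capability bits reported by the SM */
};

/* Illustrative bit only; the kernel header defines the real mask. */
#define CAP_SENDONLY_FULL_MEM (1u << 5)

/*
 * Returns:
 *   0            sendonly join is supported
 *   -EAGAIN      SM information is currently invalid (not known yet)
 *   -EOPNOTSUPP  SM information is valid and lacks sendonly support
 */
static int sendonly_support_status(const struct cpi_cache *cpi)
{
	assert(cpi != NULL);

	if (!cpi->valid)
		return -EAGAIN;      /* unknown is not the same as unsupported */
	if (cpi->type != CPI_TYPE_IB)
		return -EOPNOTSUPP;  /* capability bit is only checked for IB */
	if (!(cpi->capmask2 & CAP_SENDONLY_FULL_MEM))
		return -EOPNOTSUPP;
	return 0;
}
```

With a return value like this, a caller can tell "retry later" (-EAGAIN) apart
from "fall back to a full-member join" (-EOPNOTSUPP), which the current bool
return cannot express.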
Leon Romanovsky Jan. 24, 2021, 6:57 a.m. UTC | #3
On Fri, Jan 22, 2021 at 08:24:57AM +0000, Christoph Lameter wrote:
> On Thu, 21 Jan 2021, Leon Romanovsky wrote:
>
> > >  	spin_lock_irqsave(&port->classport_lock, flags);
> > > -	if ((port->classport_info.valid) &&
> > > -	    (port->classport_info.data.type == RDMA_CLASS_PORT_INFO_IB))
> > > +	if (!port->classport_info.valid) {
> > > +		/* Need to wait until the SM data is available */
> > > +		spin_unlock_irqrestore(&port->classport_lock, flags);
> > > +		goto redo;
> >
> > This can loop forever if valid never changes.
> >
>
> Right. So what is the right solution here? The sendonly check function could return
> an errno instead?
>
> 0           = Sendonly join is supported
> -EAGAIN     = SM information is currently invalid
> -EOPNOTSUPP = SM does not support sendonly join

I would follow the same flow as update_ib_cpi(): use a retry count and loop
with a delay, but without the workqueue.

>
> Since all SMs out there have had support for sendonly join for years now
> we could just remove the check entirely. If there is an old grizzly SM out
> there then it would not process that join request and would return an
> error.

I have no idea whether that is possible; if it is, it would be the best solution.

Thanks
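
The bounded retry suggested in comment #3 could look roughly like the
user-space sketch below. It only models the control flow. struct fake_port,
port_info_valid(), fake_delay(), and wait_for_port_info() are hypothetical
names, and fake_delay() stands in for msleep(). In the kernel the delay would
have to happen outside the classport_lock spinlock, as the msleep() in the
posted patch already does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a port whose SM data turns valid after some polls. */
struct fake_port {
	int polls_until_valid;  /* how many more polls still report "invalid" */
	int sleeps;             /* delays taken so far, kept for inspection */
};

/* Stand-in for reading port->classport_info.valid. */
static bool port_info_valid(struct fake_port *p)
{
	if (p->polls_until_valid > 0) {
		p->polls_until_valid--;
		return false;
	}
	return true;
}

/* Stand-in for msleep(); it only counts, so the sketch runs instantly. */
static void fake_delay(struct fake_port *p)
{
	p->sleeps++;
}

/*
 * Poll up to max_retries + 1 times with a delay between polls.
 * Returns true once the data is valid, false when the retries run out,
 * so the loop always terminates even if valid never changes.
 */
static bool wait_for_port_info(struct fake_port *p, int max_retries)
{
	assert(p != NULL && max_retries >= 0);

	for (int attempt = 0; attempt <= max_retries; attempt++) {
		if (port_info_valid(p))
			return true;
		if (attempt < max_retries)
			fake_delay(p);  /* no delay after the final failed poll */
	}
	return false;
}
```

On a false return the caller still has to decide what to report; treating a
timeout as "not supported" would reintroduce the original problem, so an
-EAGAIN style result may fit better.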

Patch

Index: linux/drivers/infiniband/core/sa_query.c
===================================================================
--- linux.orig/drivers/infiniband/core/sa_query.c	2020-12-17 14:51:15.301206041 +0000
+++ linux/drivers/infiniband/core/sa_query.c	2021-01-21 12:52:53.577943481 +0000
@@ -1963,11 +1963,19 @@  bool ib_sa_sendonly_fullmem_support(stru
 	if (!sa_dev)
 		return ret;

+redo:
 	port  = &sa_dev->port[port_num - sa_dev->start_port];

+	while (!port->classport_info.valid)
+		msleep(100);
+
 	spin_lock_irqsave(&port->classport_lock, flags);
-	if ((port->classport_info.valid) &&
-	    (port->classport_info.data.type == RDMA_CLASS_PORT_INFO_IB))
+	if (!port->classport_info.valid) {
+		/* Need to wait until the SM data is available */
+		spin_unlock_irqrestore(&port->classport_lock, flags);
+		goto redo;
+	}
+	if ((port->classport_info.data.type == RDMA_CLASS_PORT_INFO_IB))
 		ret = ib_get_cpi_capmask2(&port->classport_info.data.ib)
 			& IB_SA_CAP_MASK2_SENDONLY_FULL_MEM_SUPPORT;
 	spin_unlock_irqrestore(&port->classport_lock, flags);