diff mbox

failure to get gid with rdma_bind_addr with >= 3.10 kernels

Message ID 1828884A29C6694DAF28B7E6B8A8237388CF6FD0@ORSMSX109.amr.corp.intel.com (mailing list archive)
State Rejected
Headers show

Commit Message

Hefty, Sean Nov. 11, 2013, 6:46 p.m. UTC
> > As part of the AF_IB changes, additional queries were introduced that
> > separated out retrieving the different kernel data -- assigned address,
> > GID mappings, and IB PR data.  (New queries were necessary, since
> > sockaddr_ib is larger than sockaddr_in6, and we want to eventually
> > handle non-reversible paths.)  The librdmacm 1.0.17 changed which query
> > was called after rdma_bind_addr was invoked to only retrieve the
> > assigned address, versus retrieving everything.  I think this is why you
> > see an SGID of all 0's after calling rdma_bind_addr.
> 
> Is there another call that allows the retrieval of the SGID?

Yes - something like this patch should help, but I don't think this is the correct behavior when the IP address is 'any' addresss.

Retrieve SGID after calling rdma_bind_addr

From: Sean Hefty <sean.hefty@intel.com>

A change was made to rdma_bind_addr when AF_IB is enabled
to only retrieve the resulting bound address.  Previously,
rdma_bind_addr would retrieve the corresponding SGID as
well.  This breaks some apps which were checking the
SGID after binding to an IP address.  Revert to the
previous behavior of also retrieving the SGID after
calling rdma_bind_addr.
---
 src/cma.c |    5 ++++-
 1 files changed, 4 insertions(+), 1 deletions(-)



--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Christoph Lameter (Ampere) Nov. 11, 2013, 7:41 p.m. UTC | #1
On Mon, 11 Nov 2013, Hefty, Sean wrote:

> Yes - something like this patch should help, but I don't think this is the correct behavior when the IP address is 'any' addresss.

Well it works fine. Application starts now.

Tested-by: Christoph Lameter <cl@linux.com>
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Or Gerlitz Nov. 12, 2013, 7:08 a.m. UTC | #2
On 11/11/2013 20:46, Hefty, Sean wrote:
> Yes - something like this patch should help, but I don't think this is the correct behavior when the IP address is 'any' addresss.
>
> Retrieve SGID after calling rdma_bind_addr
>
> From: Sean Hefty<sean.hefty@intel.com>
>
> A change was made to rdma_bind_addr when AF_IB is enabled
> to only retrieve the resulting bound address.  Previously,
> rdma_bind_addr would retrieve the corresponding SGID as
> well.  This breaks some apps which were checking the
> SGID after binding to an IP address.  Revert to the
> previous behavior of also retrieving the SGID after
> calling rdma_bind_addr.
> ---
>   src/cma.c |    5 ++++-
>   1 files changed, 4 insertions(+), 1 deletions(-)
>
> diff --git a/src/cma.c b/src/cma.c
> index 4f41879..0cf4203 100644
> --- a/src/cma.c
> +++ b/src/cma.c
> @@ -753,7 +753,10 @@ static int rdma_bind_addr2(struct rdma_cm_id *id, struct sockaddr *addr,
>   	if (ret != sizeof cmd)
>   		return (ret >= 0) ? ERR(ENODATA) : -1;
>   
> -	return ucma_query_addr(id);
> +	ret = ucma_query_addr(id);
> +	if (!ret)
> +		ret = ucma_query_gid(id);
> +	return ret;
>   }
>   
>   int rdma_bind_addr(struct rdma_cm_id *id, struct sockaddr *addr)

Sean, how do we continue here? the patch worked for Christoph, so are 
going to merge it into librdmacm or it needs more work?
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Hefty, Sean Nov. 13, 2013, 8:15 p.m. UTC | #3
> Sean, how do we continue here? the patch worked for Christoph, so are
> going to merge it into librdmacm or it needs more work?

I will merge it, since it fixes the issue.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Or Gerlitz Nov. 13, 2013, 10:38 p.m. UTC | #4
On Wed, Nov 13, 2013 at 10:15 PM, Hefty, Sean <sean.hefty@intel.com> wrote:
>
>> Sean, how do we continue here? the patch worked for Christoph, so are
>> going to merge it into librdmacm or it needs more work?
>
> I will merge it, since it fixes the issue.

cool
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Or Gerlitz Jan. 2, 2014, 9:39 a.m. UTC | #5
Hi Sean,

Can you please prepare a fresh (gift to 2014) release of librdmacm, such 
that distributions && people using kernels >= 3.11 will be able to have 
this functionality restored once they upgrade librdmacm.

thanks --

Or
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Yann Droneaud Jan. 2, 2014, 3:17 p.m. UTC | #6
Hi,

Le jeudi 02 janvier 2014 à 11:39 +0200, Or Gerlitz a écrit :

> Can you please prepare a fresh (gift to 2014) release of librdmacm, such 
> that distributions && people using kernels >= 3.11 will be able to have 
> this functionality restored once they upgrade librdmacm.
> 

Why is this issue addressed by a patch in librdmacm ?

After reading quickly the thread, I still believe that's sound like a
kernel regression which broke existing userspace applications.

So a patch must be applied on the kernel to fix that regression.

Introducing AF_IB must not have changed the behavor for existing
applications. Existing applications must not need a new librdmacm when
they don't use newer kernel extensions.

Regards.

PS: Happy new (gregorian calendar's) year 2014 and best wishes ;)
Hefty, Sean Jan. 2, 2014, 4:33 p.m. UTC | #7
> After reading quickly the thread, I still believe that's sound like a

> kernel regression which broke existing userspace applications.

> 

> So a patch must be applied on the kernel to fix that regression.


When AF_IB was added to the kernel, corresponding changes were added to the librdmacm.  Those changes to the librdmacm are the cause of the issue.

The kernel rdma cm previously supported a single 'query' call.  With AF_IB, it supports the original call, plus 3 additional calls.  The problem is the librdmacm using the newer query calls.
 
> Introducing AF_IB must not have changed the behavor for existing

> applications. Existing applications must not need a new librdmacm when

> they don't use newer kernel extensions.


An older version of the librdmacm that does NOT support AF_IB should work fine.

- Sean
Yann Droneaud Jan. 2, 2014, 4:59 p.m. UTC | #8
Le jeudi 02 janvier 2014 à 16:33 +0000, Hefty, Sean a écrit :
> > After reading quickly the thread, I still believe that's sound like a
> > kernel regression which broke existing userspace applications.
> > 
> > So a patch must be applied on the kernel to fix that regression.
> 
> When AF_IB was added to the kernel, corresponding changes were added to the librdmacm.
> Those changes to the librdmacm are the cause of the issue.
> 

I was missing this part, the original message was misleading for me.

So it's the combination of librdmacm >= 1.0.17 and kernel >= 3.10+ which
is affected by the problem reported.

> The kernel rdma cm previously supported a single 'query' call. 
> With AF_IB, it supports the original call, plus 3 additional calls.
> The problem is the librdmacm using the newer query calls.
>  
> > Introducing AF_IB must not have changed the behavor for existing
> > applications. Existing applications must not need a new librdmacm when
> > they don't use newer kernel extensions.
> 
> An older version of the librdmacm that does NOT support AF_IB should work fine.
> 

Thanks for clarifying this for me.

Regards.
Hefty, Sean Jan. 2, 2014, 5:04 p.m. UTC | #9
> Can you please prepare a fresh (gift to 2014) release of librdmacm, such
> that distributions && people using kernels >= 3.11 will be able to have
> this functionality restored once they upgrade librdmacm.

There's a 1.0.17.1 release available with this fix.

I will release 1.0.18 within a couple of weeks.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Or Gerlitz Jan. 5, 2014, 7:25 a.m. UTC | #10
On 02/01/2014 19:04, Hefty, Sean wrote:
>> Can you please prepare a fresh (gift to 2014) release of librdmacm, such
>> that distributions && people using kernels >= 3.11 will be able to have
>> this functionality restored once they upgrade librdmacm.
> There's a 1.0.17.1 release available with this fix.

good to know... was it announced on the list?


> I will release 1.0.18 within a couple of weeks.

cool
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/src/cma.c b/src/cma.c
index 4f41879..0cf4203 100644
--- a/src/cma.c
+++ b/src/cma.c
@@ -753,7 +753,10 @@  static int rdma_bind_addr2(struct rdma_cm_id *id, struct sockaddr *addr,
 	if (ret != sizeof cmd)
 		return (ret >= 0) ? ERR(ENODATA) : -1;
 
-	return ucma_query_addr(id);
+	ret = ucma_query_addr(id);
+	if (!ret)
+		ret = ucma_query_gid(id);
+	return ret;
 }
 
 int rdma_bind_addr(struct rdma_cm_id *id, struct sockaddr *addr)