diff mbox

IB/cma: cma_match_net_dev needs to take into account port_num

Message ID 1450710084-22547-1-git-send-email-matanb@mellanox.com (mailing list archive)
State Accepted
Headers show

Commit Message

Matan Barak Dec. 21, 2015, 3:01 p.m. UTC
Previously, cma_match_net_dev called cma_protocol_roce which
tried to verify that the IB device uses RoCE protocol. However,
if rdma_id didn't have a bounded port, it used the first port
of the device.

In VPI systems, the first port might be an IB port while the second
one could be an Ethernet port. This made requests for unbounded rdma_ids
that come from the Ethernet port fail.
Fixing this by passing the port of the request and checking this port
of the device.

Fixes: b8cab5dab15f ('IB/cma: Accept connection without a valid netdev on RoCE')
Signed-off-by: Matan Barak <matanb@mellanox.com>
---
Hi Doug,

This patch fixes a bug in VPI systems, where the first port is configured
as IB and the second one is configured as Ethernet.
In this case, if the rdma_id isn't bounded to a port, cma_match_net_dev
will try to verify that the first port is a RoCE port and fail.
This is fixed by passing the port of the incoming request.

Regards,
Matan

 drivers/infiniband/core/cma.c |   16 +++++++++-------
 1 files changed, 9 insertions(+), 7 deletions(-)

Comments

Or Gerlitz Dec. 22, 2015, 6:44 a.m. UTC | #1
On 12/21/2015 5:01 PM, Matan Barak wrote:
> Previously, cma_match_net_dev called cma_protocol_roce which
> tried to verify that the IB device uses RoCE protocol. However,
> if rdma_id didn't have a bounded port, it used the first port
> of the device.

maybe prefer a higher then code speak language e.g "if the rdma id 
didn't have"  also below "unbounded rdma ids"

>
> In VPI systems, the first port might be an IB port while the second
> one could be an Ethernet port. This made requests for unbounded rdma_ids
> that come from the Ethernet port fail.

add "to" --> "Ethernet port to fail"

> Fixing this by passing the port of the request and checking this port
> of the device.

OK, so this fix will work for both ib/eth and eth/ib configs, right? good.


>
> Fixes: b8cab5dab15f ('IB/cma: Accept connection without a valid netdev on RoCE')

Reported-by: Or Gerlitz <ogerlitz@mellanox.com>


> Signed-off-by: Matan Barak <matanb@mellanox.com>

Doug, the bug fixes a commit from from 4.3, lets fix it in 4.4 and later 
we will send it to -stable as well. So for 4.4 there's this one and the 
kvfree fix [1]

Or.

[1] https://patchwork.kernel.org/patch/7868481/


> ---
> Hi Doug,
>
> This patch fixes a bug in VPI systems, where the first port is configured
> as IB and the second one is configured as Ethernet.
> In this case, if the rdma_id isn't bounded to a port, cma_match_net_dev
> will try to verify that the first port is a RoCE port and fail.
> This is fixed by passing the port of the incoming request.
>
> Regards,
> Matan
>
>   drivers/infiniband/core/cma.c |   16 +++++++++-------
>   1 files changed, 9 insertions(+), 7 deletions(-)
>
> diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
> index d2d5d00..c8a265c 100644
> --- a/drivers/infiniband/core/cma.c
> +++ b/drivers/infiniband/core/cma.c
> @@ -1265,15 +1265,17 @@ static bool cma_protocol_roce(const struct rdma_cm_id *id)
>   	return cma_protocol_roce_dev_port(device, port_num);
>   }
>   
> -static bool cma_match_net_dev(const struct rdma_id_private *id_priv,
> -			      const struct net_device *net_dev)
> +static bool cma_match_net_dev(const struct rdma_cm_id *id,
> +			      const struct net_device *net_dev,
> +			      u8 port_num)
>   {
> -	const struct rdma_addr *addr = &id_priv->id.route.addr;
> +	const struct rdma_addr *addr = &id->route.addr;
>   
>   	if (!net_dev)
>   		/* This request is an AF_IB request or a RoCE request */
> -		return addr->src_addr.ss_family == AF_IB ||
> -		       cma_protocol_roce(&id_priv->id);
> +		return (!id->port_num || id->port_num == port_num) &&
> +		       (addr->src_addr.ss_family == AF_IB ||
> +			cma_protocol_roce_dev_port(id->device, port_num));
>   
>   	return !addr->dev_addr.bound_dev_if ||
>   	       (net_eq(dev_net(net_dev), addr->dev_addr.net) &&
> @@ -1295,13 +1297,13 @@ static struct rdma_id_private *cma_find_listener(
>   	hlist_for_each_entry(id_priv, &bind_list->owners, node) {
>   		if (cma_match_private_data(id_priv, ib_event->private_data)) {
>   			if (id_priv->id.device == cm_id->device &&
> -			    cma_match_net_dev(id_priv, net_dev))
> +			    cma_match_net_dev(&id_priv->id, net_dev, req->port))
>   				return id_priv;
>   			list_for_each_entry(id_priv_dev,
>   					    &id_priv->listen_list,
>   					    listen_list) {
>   				if (id_priv_dev->id.device == cm_id->device &&
> -				    cma_match_net_dev(id_priv_dev, net_dev))
> +				    cma_match_net_dev(&id_priv_dev->id, net_dev, req->port))
>   					return id_priv_dev;
>   			}
>   		}

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Or Gerlitz Dec. 22, 2015, 7:17 a.m. UTC | #2
On 12/21/2015 5:01 PM, Matan Barak wrote:
> This patch fixes a bug in VPI systems, where the first port is configured
> as IB and the second one is configured as Ethernet. In this case, if the rdma_id isn't bounded to a port, cma_match_net_dev will try to verify that the first port is a RoCE port and fail. This is fixed by passing the port of the incoming request.

OK -- we have another bug down there, cma loopback doesn't work, same 
reject reason (below).This happens in both VPI and non-VPI configurations.

Works well with 4.2.3

Or.


$ rping -d -v -c -a 127.0.0.1 -C 1
verbose
client
count 1
created cm_id 0x6087d0
cma_event type RDMA_CM_EVENT_ADDR_RESOLVED cma_id 0x6087d0 (parent)
cma_event type RDMA_CM_EVENT_ROUTE_RESOLVED cma_id 0x6087d0 (parent)
rdma_resolve_addr - rdma_resolve_route successful
created pd 0x60e5f0
created channel 0x608250
created cq 0x608a20
created qp 0x6082e0
rping_setup_buffers called on cb 0x606010
allocated & registered buffers...
cq_thread started.
wait for CONNECTED state 10
cma_event type RDMA_CM_EVENT_REJECTED cma_id 0x6087d0 (parent)
cma event RDMA_CM_EVENT_REJECTED, error 28
connect error -1
rping_free_buffers called on cb 0x606010
destroy cm_id 0x6087d0


--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Or Gerlitz Dec. 22, 2015, 10:47 a.m. UTC | #3
On 12/21/2015 5:01 PM, Matan Barak wrote:
> Previously, cma_match_net_dev called cma_protocol_roce which
> tried to verify that the IB device uses RoCE protocol. However,
> if rdma_id didn't have a bounded port, it used the first port
> of the device.
>
> In VPI systems, the first port might be an IB port while the second
> one could be an Ethernet port. This made requests for unbounded rdma_ids
> that come from the Ethernet port fail.
> Fixing this by passing the port of the request and checking this port
> of the device.
>
> Fixes: b8cab5dab15f ('IB/cma: Accept connection without a valid netdev on RoCE')
> Signed-off-by: Matan Barak<matanb@mellanox.com>

seems that the patch is missing from patchworks, I can't explain that.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Or Gerlitz Dec. 22, 2015, 2:42 p.m. UTC | #4
On 12/22/2015 9:17 AM, Or Gerlitz wrote:
> On 12/21/2015 5:01 PM, Matan Barak wrote:
>> This patch fixes a bug in VPI systems, where the first port is 
>> configured
>> as IB and the second one is configured as Ethernet. In this case, if 
>> the rdma_id isn't bounded to a port, cma_match_net_dev will try to 
>> verify that the first port is a RoCE port and fail. This is fixed by 
>> passing the port of the incoming request.
>
> OK -- we have another bug down there, cma loopback doesn't work, same 
> reject reason (below).This happens in both VPI and non-VPI 
> configurations.
>
> Works well with 4.2.3

I made more checks with the 4.2.3 kernel (before all the IB core/cache 
changes that went in 4.3) -- rdma-cm loopback does work as long as there 
are active IB ports in the system. When there are no active IB ports, 
but rather only Eth ports (VPI or plain Eth), rping fails:

# rping -d -v -c -a 127.0.0.1 -C 1
created cm_id 0x6087d0
cma_event type RDMA_CM_EVENT_ADDR_RESOLVED cma_id 0x6087d0 (parent)
rdma_resolve_route: No such device


so it seems we had something there before the 4.3 changes.

Or.

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Doug Ledford Dec. 22, 2015, 6:58 p.m. UTC | #5
On 12/22/2015 05:47 AM, Or Gerlitz wrote:
> On 12/21/2015 5:01 PM, Matan Barak wrote:
>> Previously, cma_match_net_dev called cma_protocol_roce which
>> tried to verify that the IB device uses RoCE protocol. However,
>> if rdma_id didn't have a bounded port, it used the first port
>> of the device.
>>
>> In VPI systems, the first port might be an IB port while the second
>> one could be an Ethernet port. This made requests for unbounded rdma_ids
>> that come from the Ethernet port fail.
>> Fixing this by passing the port of the request and checking this port
>> of the device.
>>
>> Fixes: b8cab5dab15f ('IB/cma: Accept connection without a valid netdev
>> on RoCE')
>> Signed-off-by: Matan Barak<matanb@mellanox.com>
> 
> seems that the patch is missing from patchworks, I can't explain that.

I've already downloaded it and marked it accepted.
Matan Barak Dec. 22, 2015, 7:26 p.m. UTC | #6
On Tue, Dec 22, 2015 at 8:58 PM, Doug Ledford <dledford@redhat.com> wrote:
> On 12/22/2015 05:47 AM, Or Gerlitz wrote:
>> On 12/21/2015 5:01 PM, Matan Barak wrote:
>>> Previously, cma_match_net_dev called cma_protocol_roce which
>>> tried to verify that the IB device uses RoCE protocol. However,
>>> if rdma_id didn't have a bounded port, it used the first port
>>> of the device.
>>>
>>> In VPI systems, the first port might be an IB port while the second
>>> one could be an Ethernet port. This made requests for unbounded rdma_ids
>>> that come from the Ethernet port fail.
>>> Fixing this by passing the port of the request and checking this port
>>> of the device.
>>>
>>> Fixes: b8cab5dab15f ('IB/cma: Accept connection without a valid netdev
>>> on RoCE')
>>> Signed-off-by: Matan Barak<matanb@mellanox.com>
>>
>> seems that the patch is missing from patchworks, I can't explain that.
>
> I've already downloaded it and marked it accepted.
>

Thanks Doug. Would you like that I'll repost the patch with the commit
message changed as Or suggested or is the current version good enough?

Regarding the Ethernet loopback issue, I started looking into that,
but as Or stated, it's broken even before the RoCE patches.

> --
> Doug Ledford <dledford@redhat.com>
>               GPG KeyID: 0E572FDD
>
>

Matan
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Doug Ledford Dec. 22, 2015, 8:01 p.m. UTC | #7
On 12/22/2015 02:26 PM, Matan Barak wrote:
> On Tue, Dec 22, 2015 at 8:58 PM, Doug Ledford <dledford@redhat.com> wrote:
>> On 12/22/2015 05:47 AM, Or Gerlitz wrote:
>>> On 12/21/2015 5:01 PM, Matan Barak wrote:
>>>> Previously, cma_match_net_dev called cma_protocol_roce which
>>>> tried to verify that the IB device uses RoCE protocol. However,
>>>> if rdma_id didn't have a bounded port, it used the first port
>>>> of the device.
>>>>
>>>> In VPI systems, the first port might be an IB port while the second
>>>> one could be an Ethernet port. This made requests for unbounded rdma_ids
>>>> that come from the Ethernet port fail.
>>>> Fixing this by passing the port of the request and checking this port
>>>> of the device.
>>>>
>>>> Fixes: b8cab5dab15f ('IB/cma: Accept connection without a valid netdev
>>>> on RoCE')
>>>> Signed-off-by: Matan Barak<matanb@mellanox.com>
>>>
>>> seems that the patch is missing from patchworks, I can't explain that.
>>
>> I've already downloaded it and marked it accepted.
>>
> 
> Thanks Doug. Would you like that I'll repost the patch with the commit
> message changed as Or suggested or is the current version good enough?

I rewrote most of the commit message myself.  It now reads as such:

commit b00a1609f845df4c75bc62f56d0d71a01fe87cc6
Author: Matan Barak <matanb@mellanox.com>
Date:   Mon Dec 21 17:01:24 2015 +0200

    IB/cma: cma_match_net_dev needs to take into account port_num

    Previously, cma_match_net_dev called cma_protocol_roce which
    tried to verify that the IB device uses RoCE protocol. However,
    if rdma_id wasn't bound to a port, then the check would occur
    against the first port of the device without regard to whether
    that port was even of the same type as the type of port the
    incoming packet was received on.

    Fix this by passing the port of the request and only checking
    against the same port of the device.

    Fixes: b8cab5dab15f ('IB/cma: Accept connection without a valid
netdev on RoCE')
    Signed-off-by: Matan Barak <matanb@mellanox.com>
    Signed-off-by: Doug Ledford <dledford@redhat.com>


> Regarding the Ethernet loopback issue, I started looking into that,
> but as Or stated, it's broken even before the RoCE patches.

Sure, let me know if you root cause it, I'd like to get it into 4.5 if
possible.
Sagi Grimberg Dec. 23, 2015, 8:48 a.m. UTC | #8
FWIW, looks good to me,

Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Doug Ledford Dec. 23, 2015, 4:08 p.m. UTC | #9
On 12/22/2015 02:26 PM, Matan Barak wrote:
> On Tue, Dec 22, 2015 at 8:58 PM, Doug Ledford <dledford@redhat.com> wrote:
>> On 12/22/2015 05:47 AM, Or Gerlitz wrote:
>>> On 12/21/2015 5:01 PM, Matan Barak wrote:
>>>> Previously, cma_match_net_dev called cma_protocol_roce which
>>>> tried to verify that the IB device uses RoCE protocol. However,
>>>> if rdma_id didn't have a bounded port, it used the first port
>>>> of the device.
>>>>
>>>> In VPI systems, the first port might be an IB port while the second
>>>> one could be an Ethernet port. This made requests for unbounded rdma_ids
>>>> that come from the Ethernet port fail.
>>>> Fixing this by passing the port of the request and checking this port
>>>> of the device.
>>>>
>>>> Fixes: b8cab5dab15f ('IB/cma: Accept connection without a valid netdev
>>>> on RoCE')
>>>> Signed-off-by: Matan Barak<matanb@mellanox.com>
>>>
>>> seems that the patch is missing from patchworks, I can't explain that.
>>
>> I've already downloaded it and marked it accepted.
>>
> 
> Thanks Doug. Would you like that I'll repost the patch with the commit
> message changed as Or suggested or is the current version good enough?
> 
> Regarding the Ethernet loopback issue, I started looking into that,
> but as Or stated, it's broken even before the RoCE patches.

Ping.  Any progress on this?
Matan Barak Dec. 23, 2015, 4:20 p.m. UTC | #10
On Wed, Dec 23, 2015 at 6:08 PM, Doug Ledford <dledford@redhat.com> wrote:
> On 12/22/2015 02:26 PM, Matan Barak wrote:
>> On Tue, Dec 22, 2015 at 8:58 PM, Doug Ledford <dledford@redhat.com> wrote:
>>> On 12/22/2015 05:47 AM, Or Gerlitz wrote:
>>>> On 12/21/2015 5:01 PM, Matan Barak wrote:
>>>>> Previously, cma_match_net_dev called cma_protocol_roce which
>>>>> tried to verify that the IB device uses RoCE protocol. However,
>>>>> if rdma_id didn't have a bounded port, it used the first port
>>>>> of the device.
>>>>>
>>>>> In VPI systems, the first port might be an IB port while the second
>>>>> one could be an Ethernet port. This made requests for unbounded rdma_ids
>>>>> that come from the Ethernet port fail.
>>>>> Fixing this by passing the port of the request and checking this port
>>>>> of the device.
>>>>>
>>>>> Fixes: b8cab5dab15f ('IB/cma: Accept connection without a valid netdev
>>>>> on RoCE')
>>>>> Signed-off-by: Matan Barak<matanb@mellanox.com>
>>>>
>>>> seems that the patch is missing from patchworks, I can't explain that.
>>>
>>> I've already downloaded it and marked it accepted.
>>>
>>
>> Thanks Doug. Would you like that I'll repost the patch with the commit
>> message changed as Or suggested or is the current version good enough?
>>
>> Regarding the Ethernet loopback issue, I started looking into that,
>> but as Or stated, it's broken even before the RoCE patches.
>
> Ping.  Any progress on this?

Yeah, there's some progress - the basic problem is that we don't have
a bounded ndev and thus cma_resolve_iboe_route returns -ENODEV.
The root cause for this is that we have to store the ndev in
cma_bind_loopback. Even after doing that, cma_set_loopback changes the
sgid to be the localhost GID, which doesn't exist in the GID table and
thus will fail later in the GID lookup.
I think that regarding loopback, we actually want to send the data on
the link local default GID, which is guaranteed to exist. That's why I
think we should:
1. Change the cma_src_addr and cma_dst_addr in cma_bind_loopback to be
the default GID.
2. Store the associated ndev of this default GID as the bounded device.
3. In cma_resolve_loopback, get the MAC of this bounded device and
store it as the DMAC.
4. In cma_resolve_iboe_route, don't try to do route resolve if the
dGID matches the default GID.

It's still not working though, but this is where I'm headed. What do you think?


>
>
> --
> Doug Ledford <dledford@redhat.com>
>               GPG KeyID: 0E572FDD
>
>

Regards,
Matan
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Matan Barak Dec. 23, 2015, 4:35 p.m. UTC | #11
On Wed, Dec 23, 2015 at 6:08 PM, Doug Ledford <dledford@redhat.com> wrote:
> On 12/22/2015 02:26 PM, Matan Barak wrote:
>> On Tue, Dec 22, 2015 at 8:58 PM, Doug Ledford <dledford@redhat.com> wrote:
>>> On 12/22/2015 05:47 AM, Or Gerlitz wrote:
>>>> On 12/21/2015 5:01 PM, Matan Barak wrote:
>>>>> Previously, cma_match_net_dev called cma_protocol_roce which
>>>>> tried to verify that the IB device uses RoCE protocol. However,
>>>>> if rdma_id didn't have a bounded port, it used the first port
>>>>> of the device.
>>>>>
>>>>> In VPI systems, the first port might be an IB port while the second
>>>>> one could be an Ethernet port. This made requests for unbounded rdma_ids
>>>>> that come from the Ethernet port fail.
>>>>> Fixing this by passing the port of the request and checking this port
>>>>> of the device.
>>>>>
>>>>> Fixes: b8cab5dab15f ('IB/cma: Accept connection without a valid netdev
>>>>> on RoCE')
>>>>> Signed-off-by: Matan Barak<matanb@mellanox.com>
>>>>
>>>> seems that the patch is missing from patchworks, I can't explain that.
>>>
>>> I've already downloaded it and marked it accepted.
>>>
>>
>> Thanks Doug. Would you like that I'll repost the patch with the commit
>> message changed as Or suggested or is the current version good enough?
>>
>> Regarding the Ethernet loopback issue, I started looking into that,
>> but as Or stated, it's broken even before the RoCE patches.
>
> Ping.  Any progress on this?

Yeah, there's some progress - the basic problem is that we don't have
a bounded ndev and thus cma_resolve_iboe_route returns -ENODEV.
The root cause for this is that we have to store the ndev in
cma_bind_loopback. Even after doing that, cma_set_loopback changes the
sgid to be the localhost GID, which doesn't exist in the GID table and
thus will fail later in the GID lookup.
I think that regarding loopback, we actually want to send the data on
the link local default GID, which is guaranteed to exist. That's why I
think we should:
1. Change the cma_src_addr and cma_dst_addr in cma_bind_loopback to be
the default GID.
2. Store the associated ndev of this default GID as the bounded device.
3. In cma_resolve_loopback, get the MAC of this bounded device and
store it as the DMAC.
4. In cma_resolve_iboe_route, don't try to do route resolve if the
dGID matches the default GID.

It's still not working though, but this is where I'm headed. What do you think?


>
>
> --
> Doug Ledford <dledford@redhat.com>
>               GPG KeyID: 0E572FDD
>
>
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Doug Ledford Dec. 23, 2015, 5:57 p.m. UTC | #12
On 12/23/2015 11:35 AM, Matan Barak wrote:
> On Wed, Dec 23, 2015 at 6:08 PM, Doug Ledford <dledford@redhat.com> wrote:
>> On 12/22/2015 02:26 PM, Matan Barak wrote:
>>> On Tue, Dec 22, 2015 at 8:58 PM, Doug Ledford <dledford@redhat.com> wrote:
>>>> On 12/22/2015 05:47 AM, Or Gerlitz wrote:
>>>>> On 12/21/2015 5:01 PM, Matan Barak wrote:
>>>>>> Previously, cma_match_net_dev called cma_protocol_roce which
>>>>>> tried to verify that the IB device uses RoCE protocol. However,
>>>>>> if rdma_id didn't have a bounded port, it used the first port
>>>>>> of the device.
>>>>>>
>>>>>> In VPI systems, the first port might be an IB port while the second
>>>>>> one could be an Ethernet port. This made requests for unbounded rdma_ids
>>>>>> that come from the Ethernet port fail.
>>>>>> Fixing this by passing the port of the request and checking this port
>>>>>> of the device.
>>>>>>
>>>>>> Fixes: b8cab5dab15f ('IB/cma: Accept connection without a valid netdev
>>>>>> on RoCE')
>>>>>> Signed-off-by: Matan Barak<matanb@mellanox.com>
>>>>>
>>>>> seems that the patch is missing from patchworks, I can't explain that.
>>>>
>>>> I've already downloaded it and marked it accepted.
>>>>
>>>
>>> Thanks Doug. Would you like that I'll repost the patch with the commit
>>> message changed as Or suggested or is the current version good enough?
>>>
>>> Regarding the Ethernet loopback issue, I started looking into that,
>>> but as Or stated, it's broken even before the RoCE patches.
>>
>> Ping.  Any progress on this?
> 
> Yeah, there's some progress - the basic problem is that we don't have
> a bounded ndev and thus cma_resolve_iboe_route returns -ENODEV.

Which makes sense considering that 127.0.0.1 doesn't belong to any of
the devs.

> The root cause for this is that we have to store the ndev in
> cma_bind_loopback. Even after doing that, cma_set_loopback changes the
> sgid to be the localhost GID, which doesn't exist in the GID table and
> thus will fail later in the GID lookup.

Again, makes sense.

> I think that regarding loopback, we actually want to send the data on
> the link local default GID,

Which link local default GID?  If you have more than one port or card,
then that is not a unique value.

> which is guaranteed to exist.

And in many cases, multiple times.

> That's why I
> think we should:
> 1. Change the cma_src_addr and cma_dst_addr in cma_bind_loopback to be
> the default GID.
> 2. Store the associated ndev of this default GID as the bounded device.
> 3. In cma_resolve_loopback, get the MAC of this bounded device and
> store it as the DMAC.
> 4. In cma_resolve_iboe_route, don't try to do route resolve if the
> dGID matches the default GID.
> 
> It's still not working though, but this is where I'm headed. What do you think?

Let's punt this until later.  It only effects the situation when you use
127.0.0.1 as the address.  If you use the local IP address of a specific
interface, you get the same loopback behavior, but no failures (and on
top of that instead of getting a random device to handle the loopback
transfer, you get a specific device of your choosing).  To me, that
qualifies as a reasonable workaround.  The 127.0.0.1 behavior has been
broken for a while (and I'm not sure it should have ever been relied
upon anyway), so I don't think we have to hold things up.
Matan Barak Dec. 24, 2015, 7:57 a.m. UTC | #13
On Wed, Dec 23, 2015 at 7:57 PM, Doug Ledford <dledford@redhat.com> wrote:
> On 12/23/2015 11:35 AM, Matan Barak wrote:
>> On Wed, Dec 23, 2015 at 6:08 PM, Doug Ledford <dledford@redhat.com> wrote:
>>> On 12/22/2015 02:26 PM, Matan Barak wrote:
>>>> On Tue, Dec 22, 2015 at 8:58 PM, Doug Ledford <dledford@redhat.com> wrote:
>>>>> On 12/22/2015 05:47 AM, Or Gerlitz wrote:
>>>>>> On 12/21/2015 5:01 PM, Matan Barak wrote:
>>>>>>> Previously, cma_match_net_dev called cma_protocol_roce which
>>>>>>> tried to verify that the IB device uses RoCE protocol. However,
>>>>>>> if rdma_id didn't have a bounded port, it used the first port
>>>>>>> of the device.
>>>>>>>
>>>>>>> In VPI systems, the first port might be an IB port while the second
>>>>>>> one could be an Ethernet port. This made requests for unbounded rdma_ids
>>>>>>> that come from the Ethernet port fail.
>>>>>>> Fixing this by passing the port of the request and checking this port
>>>>>>> of the device.
>>>>>>>
>>>>>>> Fixes: b8cab5dab15f ('IB/cma: Accept connection without a valid netdev
>>>>>>> on RoCE')
>>>>>>> Signed-off-by: Matan Barak<matanb@mellanox.com>
>>>>>>
>>>>>> seems that the patch is missing from patchworks, I can't explain that.
>>>>>
>>>>> I've already downloaded it and marked it accepted.
>>>>>
>>>>
>>>> Thanks Doug. Would you like that I'll repost the patch with the commit
>>>> message changed as Or suggested or is the current version good enough?
>>>>
>>>> Regarding the Ethernet loopback issue, I started looking into that,
>>>> but as Or stated, it's broken even before the RoCE patches.
>>>
>>> Ping.  Any progress on this?
>>
>> Yeah, there's some progress - the basic problem is that we don't have
>> a bounded ndev and thus cma_resolve_iboe_route returns -ENODEV.
>
> Which makes sense considering that 127.0.0.1 doesn't belong to any of
> the devs.
>
>> The root cause for this is that we have to store the ndev in
>> cma_bind_loopback. Even after doing that, cma_set_loopback changes the
>> sgid to be the localhost GID, which doesn't exist in the GID table and
>> thus will fail later in the GID lookup.
>
> Again, makes sense.
>
>> I think that regarding loopback, we actually want to send the data on
>> the link local default GID,
>
> Which link local default GID?  If you have more than one port or card,
> then that is not a unique value.

We assume that every RoCE port has an associated net device. Since a
net device should have a unique MAC, it should have a unique IPv6 link
local address and thus a unique GID.

>
>> which is guaranteed to exist.
>
> And in many cases, multiple times.
>
>> That's why I
>> think we should:
>> 1. Change the cma_src_addr and cma_dst_addr in cma_bind_loopback to be
>> the default GID.
>> 2. Store the associated ndev of this default GID as the bounded device.
>> 3. In cma_resolve_loopback, get the MAC of this bounded device and
>> store it as the DMAC.
>> 4. In cma_resolve_iboe_route, don't try to do route resolve if the
>> dGID matches the default GID.
>>
>> It's still not working though, but this is where I'm headed. What do you think?
>
> Let's punt this until later.  It only effects the situation when you use
> 127.0.0.1 as the address.  If you use the local IP address of a specific
> interface, you get the same loopback behavior, but no failures (and on
> top of that instead of getting a random device to handle the loopback
> transfer, you get a specific device of your choosing).  To me, that
> qualifies as a reasonable workaround.  The 127.0.0.1 behavior has been
> broken for a while (and I'm not sure it should have ever been relied
> upon anyway), so I don't think we have to hold things up.
>

I totally agree that it's better to use the local IP address and not
just get a random device by using 127.0.0.1. You could get a specific
device by binding it, but then - use its local IP instead of
127.0.0.1.


> --
> Doug Ledford <dledford@redhat.com>
>               GPG KeyID: 0E572FDD
>
>

Matan
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Or Gerlitz Dec. 24, 2015, 8:18 a.m. UTC | #14
On 12/24/2015 9:57 AM, Matan Barak wrote:
> I totally agree that it's better to use the local IP address and not
> just get a random device by using 127.0.0.1. You could get a specific
> device by binding it, but then - use its local IP instead of
> 127.0.0.1.

Yes guys, it might be better but the user can do that. However, loopback 
ala 127.0.0.1 is working in the rdma-cm since the 2.6.18 day one and was 
broken recently when Eth ports are around.

Lets fix.

Indeed, this is 4.5 and not 4.4 material.

Or.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
index d2d5d00..c8a265c 100644
--- a/drivers/infiniband/core/cma.c
+++ b/drivers/infiniband/core/cma.c
@@ -1265,15 +1265,17 @@  static bool cma_protocol_roce(const struct rdma_cm_id *id)
 	return cma_protocol_roce_dev_port(device, port_num);
 }
 
-static bool cma_match_net_dev(const struct rdma_id_private *id_priv,
-			      const struct net_device *net_dev)
+static bool cma_match_net_dev(const struct rdma_cm_id *id,
+			      const struct net_device *net_dev,
+			      u8 port_num)
 {
-	const struct rdma_addr *addr = &id_priv->id.route.addr;
+	const struct rdma_addr *addr = &id->route.addr;
 
 	if (!net_dev)
 		/* This request is an AF_IB request or a RoCE request */
-		return addr->src_addr.ss_family == AF_IB ||
-		       cma_protocol_roce(&id_priv->id);
+		return (!id->port_num || id->port_num == port_num) &&
+		       (addr->src_addr.ss_family == AF_IB ||
+			cma_protocol_roce_dev_port(id->device, port_num));
 
 	return !addr->dev_addr.bound_dev_if ||
 	       (net_eq(dev_net(net_dev), addr->dev_addr.net) &&
@@ -1295,13 +1297,13 @@  static struct rdma_id_private *cma_find_listener(
 	hlist_for_each_entry(id_priv, &bind_list->owners, node) {
 		if (cma_match_private_data(id_priv, ib_event->private_data)) {
 			if (id_priv->id.device == cm_id->device &&
-			    cma_match_net_dev(id_priv, net_dev))
+			    cma_match_net_dev(&id_priv->id, net_dev, req->port))
 				return id_priv;
 			list_for_each_entry(id_priv_dev,
 					    &id_priv->listen_list,
 					    listen_list) {
 				if (id_priv_dev->id.device == cm_id->device &&
-				    cma_match_net_dev(id_priv_dev, net_dev))
+				    cma_match_net_dev(&id_priv_dev->id, net_dev, req->port))
 					return id_priv_dev;
 			}
 		}