[RFC,3/3] RDMA/siw: Require non-zero 6-byte MACs for soft iWARP

Message ID 168330138101.5953.12575990094340826016.stgit@oracle-102.nfsv4bat.org
State Rejected
Series siw on tunnel devices

Commit Message

Chuck Lever May 5, 2023, 3:43 p.m. UTC
From: Chuck Lever <chuck.lever@oracle.com>

In the past, LOOPBACK and NONE (tunnel) devices had all-zero MAC
addresses. siw_device_create() would fall back to copying the
device's name in those cases, because an all-zero MAC address breaks
the RDMA core IP-to-device lookup mechanism.

However, in some cases, the net_device::name field is also empty.
So we're back at square one.

Rather than checking the device type, look at the
net_device::addr_len field. If it's got the right number of octets
and it is not all zeroes, use that.

Then, to enable siw support for that device/address type, change
the device driver to ensure such devices have a valid 6-octet MAC
address. For virtual devices, using eth_hw_addr_random() is
sufficient.

Fixes: a2d36b02c15d ("RDMA/siw: Enable siw on tunnel devices")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
 drivers/infiniband/sw/siw/siw_main.c |   22 +++++++---------------
 1 file changed, 7 insertions(+), 15 deletions(-)
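
The acceptance test described in the commit message can be sketched in user-space C. This is a minimal model rather than the kernel code: `struct fake_netdev`, `mac_is_usable()` and `mac_is_usable_selfcheck()` are hypothetical names, and only the `addr_len` and `dev_addr` fields are modeled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define ETH_ALEN 6	/* octets in an Ethernet MAC address */

/* Hypothetical stand-in for the two net_device fields the check reads. */
struct fake_netdev {
	uint8_t addr_len;
	uint8_t dev_addr[32];
};

/* True only for a 6-octet hardware address that is not all zeroes. */
bool mac_is_usable(const struct fake_netdev *nd)
{
	static const uint8_t zeromac[ETH_ALEN] = { 0 };

	if (nd->addr_len != ETH_ALEN)
		return false;
	return memcmp(nd->dev_addr, zeromac, ETH_ALEN) != 0;
}

/* Self-check covering the cases the commit message distinguishes. */
void mac_is_usable_selfcheck(void)
{
	struct fake_netdev eth = { .addr_len = ETH_ALEN,
				   .dev_addr = { 0x02, 0x00, 0x00, 0x00, 0x00, 0x01 } };
	struct fake_netdev lo = { .addr_len = ETH_ALEN };	/* all-zero MAC */
	struct fake_netdev tun = { .addr_len = 0 };		/* no L2 address */

	assert(mac_is_usable(&eth));
	assert(!mac_is_usable(&lo));
	assert(!mac_is_usable(&tun));
}
```

Under this model a device is rejected either because its address has the wrong length or because the address is all zeroes. The device type is never consulted.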

Comments

Jason Gunthorpe May 5, 2023, 7:58 p.m. UTC | #1
On Fri, May 05, 2023 at 11:43:11AM -0400, Chuck Lever wrote:
> From: Chuck Lever <chuck.lever@oracle.com>
> 
> In the past, LOOPBACK and NONE (tunnel) devices had all-zero MAC
> addresses. siw_device_create() would fall back to copying the
> device's name in those cases, because an all-zero MAC address breaks
> the RDMA core IP-to-device lookup mechanism.

Why not just make up a dummy address in SIW? It shouldn't need to leak
out of it.. It is just some artifact of how the iWarp stuff has been
designed

Jason

Chuck Lever May 5, 2023, 8:03 p.m. UTC | #2
> On May 5, 2023, at 3:58 PM, Jason Gunthorpe <jgg@ziepe.ca> wrote:
> 
> On Fri, May 05, 2023 at 11:43:11AM -0400, Chuck Lever wrote:
>> From: Chuck Lever <chuck.lever@oracle.com>
>> 
>> In the past, LOOPBACK and NONE (tunnel) devices had all-zero MAC
>> addresses. siw_device_create() would fall back to copying the
>> device's name in those cases, because an all-zero MAC address breaks
>> the RDMA core IP-to-device lookup mechanism.
> 
> Why not just make up a dummy address in SIW? It shouldn't need to leak
> out of it.. It is just some artifact of how the iWarp stuff has been
> designed

I've been trying that.

Even though the siw0 device is now registered with a non-zero GID, 
cma_acquire_dev_by_src_ip() still comes up with a zero GID which
matches no device. Address resolution then fails.

I'm still looking into why.


--
Chuck Lever

Chuck Lever May 6, 2023, 6:05 p.m. UTC | #3
> On May 5, 2023, at 4:03 PM, Chuck Lever III <chuck.lever@oracle.com> wrote:
> 
>> On May 5, 2023, at 3:58 PM, Jason Gunthorpe <jgg@ziepe.ca> wrote:
>> 
>> On Fri, May 05, 2023 at 11:43:11AM -0400, Chuck Lever wrote:
>>> From: Chuck Lever <chuck.lever@oracle.com>
>>> 
>>> In the past, LOOPBACK and NONE (tunnel) devices had all-zero MAC
>>> addresses. siw_device_create() would fall back to copying the
>>> device's name in those cases, because an all-zero MAC address breaks
>>> the RDMA core IP-to-device lookup mechanism.
>> 
>> Why not just make up a dummy address in SIW? It shouldn't need to leak
>> out of it.. It is just some artifact of how the iWarp stuff has been
>> designed
> 
> I've been trying that.
> 
> Even though the siw0 device is now registered with a non-zero GID, 
> cma_acquire_dev_by_src_ip() still comes up with a zero GID which
> matches no device. Address resolution then fails.
> 
> I'm still looking into why.

The tun0 device's flags are:

   UP|POINTOPOINT|NOARP|MULTICAST

That flag combination turns addr_resolve_neigh() into a no-op, so
that the returned GIDs and addresses are uninitialized.
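
The effect of that flag combination can be modeled roughly as follows. This is a simplified user-space sketch, assuming the decision depends only on the LOOPBACK and NOARP flags; that assumption, and every `sk_`/`SK_` name, belongs to the sketch and is not the kernel's API. The flag values mirror the Linux `IFF_*` constants.

```c
#include <assert.h>

/* Values mirror the Linux IFF_* interface flags; SK_ marks them as sketch-local. */
#define SK_IFF_UP		0x0001
#define SK_IFF_LOOPBACK		0x0008
#define SK_IFF_POINTOPOINT	0x0010
#define SK_IFF_NOARP		0x0080
#define SK_IFF_MULTICAST	0x1000

/* What the modeled resolver does with the destination L2 address. */
enum sk_l2_action {
	SK_L2_COPY_SRC,		/* loopback: destination copied from source */
	SK_L2_NEIGH_LOOKUP,	/* neighbour lookup fills in the destination */
	SK_L2_NOTHING,		/* nothing is written; caller's buffer is untouched */
};

/* Assumed decision: only LOOPBACK and NOARP influence the outcome. */
enum sk_l2_action sk_resolve_action(unsigned int ndev_flags)
{
	if (ndev_flags & SK_IFF_LOOPBACK)
		return SK_L2_COPY_SRC;
	if (!(ndev_flags & SK_IFF_NOARP))
		return SK_L2_NEIGH_LOOKUP;
	return SK_L2_NOTHING;
}

/* Self-check using the tun0 flags quoted above. */
void sk_resolve_selfcheck(void)
{
	unsigned int tun0 = SK_IFF_UP | SK_IFF_POINTOPOINT |
			    SK_IFF_NOARP | SK_IFF_MULTICAST;

	assert(sk_resolve_action(tun0) == SK_L2_NOTHING);
}
```

In this model a NOARP device that is not a loopback device takes the branch that writes nothing, which is consistent with the addresses being left uninitialized.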

Cc'ing Parav because he's the last person who did significant work
on this code path. I can hack this to make it work, but I have no
idea what the proper solution would be.


--
Chuck Lever

Chuck Lever May 23, 2023, 7:18 p.m. UTC | #4
> On May 5, 2023, at 3:58 PM, Jason Gunthorpe <jgg@ziepe.ca> wrote:
> 
> On Fri, May 05, 2023 at 11:43:11AM -0400, Chuck Lever wrote:
>> From: Chuck Lever <chuck.lever@oracle.com>
>> 
>> In the past, LOOPBACK and NONE (tunnel) devices had all-zero MAC
>> addresses. siw_device_create() would fall back to copying the
>> device's name in those cases, because an all-zero MAC address breaks
>> the RDMA core IP-to-device lookup mechanism.
> 
> Why not just make up a dummy address in SIW? It shouldn't need to leak
> out of it.. It is just some artifact of how the iWarp stuff has been
> designed

So that approach is already being done in siw_device_create(),
even though it is broken (the device name hasn't been initialized
when the phony MAC is created, so it is all zeroes). I've fixed
that and it still doesn't help.
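
A small user-space sketch of why an uninitialized name yields an all-zero phony MAC. It mirrors the copy-up-to-six-bytes logic of the fallback that the patch removes; `mac_from_name()`, `mac_is_zero()` and `mac_from_name_selfcheck()` are hypothetical helpers written for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAC_LEN 6

/* Copy at most six bytes of the name into a zero-filled buffer. */
void mac_from_name(const char *name, unsigned char mac[MAC_LEN])
{
	size_t len = strlen(name);

	if (len > MAC_LEN)
		len = MAC_LEN;
	memset(mac, 0, MAC_LEN);
	memcpy(mac, name, len);
}

/* Returns 1 when every octet is zero, 0 otherwise. */
int mac_is_zero(const unsigned char mac[MAC_LEN])
{
	static const unsigned char zero[MAC_LEN] = { 0 };

	return memcmp(mac, zero, MAC_LEN) == 0;
}

/* Self-check: an empty name gives zeroes, a set name does not. */
void mac_from_name_selfcheck(void)
{
	unsigned char mac[MAC_LEN];

	mac_from_name("", mac);
	assert(mac_is_zero(mac));

	mac_from_name("siw0", mac);
	assert(!mac_is_zero(mac));
}
```

With an empty name nothing is copied, so the buffer keeps its zero fill. The fallback only produces a non-zero address once the name has been set.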

siw cannot modify the underlying net_device to add a made-up
MAC address.

The core address resolution code wants to find an L2 address
for the egress device. The underlying ib_device, where a made-up
GID might be stored, is not involved with address resolution
AFAICT.

tun devices have no L2 address. Neither do loopback devices,
but address resolution makes an exception for LOOPBACK devices
by redirecting to a local physical Ethernet device.

Redirecting tun traffic to the local Ethernet device seems
dodgy at best.

I wasn't sure that an L2 address was required for siw before,
but now I'm pretty confident that it is required by our
implementation.

--
Chuck Lever

Tom Talpey May 23, 2023, 7:44 p.m. UTC | #5
On 5/23/2023 3:18 PM, Chuck Lever III wrote:
> 
>> On May 5, 2023, at 3:58 PM, Jason Gunthorpe <jgg@ziepe.ca> wrote:
>>
>> On Fri, May 05, 2023 at 11:43:11AM -0400, Chuck Lever wrote:
>>> From: Chuck Lever <chuck.lever@oracle.com>
>>>
>>> In the past, LOOPBACK and NONE (tunnel) devices had all-zero MAC
>>> addresses. siw_device_create() would fall back to copying the
>>> device's name in those cases, because an all-zero MAC address breaks
>>> the RDMA core IP-to-device lookup mechanism.
>>
>> Why not just make up a dummy address in SIW? It shouldn't need to leak
>> out of it.. It is just some artifact of how the iWarp stuff has been
>> designed
> 
> So that approach is already being done in siw_device_create(),
> even though it is broken (the device name hasn't been initialized
> when the phony MAC is created, so it is all zeroes). I've fixed
> that and it still doesn't help.
> 
> siw cannot modify the underlying net_device to add a made-up
> MAC address.
> 
> The core address resolution code wants to find an L2 address
> for the egress device. The underlying ib_device, where a made-up
> GID might be stored, is not involved with address resolution
> AFAICT.
> 
> tun devices have no L2 address. Neither do loopback devices,
> but address resolution makes an exception for LOOPBACK devices
> by redirecting to a local physical Ethernet device.
> 
> Redirecting tun traffic to the local Ethernet device seems
> dodgy at best.
> 
> I wasn't sure that an L2 address was required for siw before,
> but now I'm pretty confident that it is required by our
> implementation.

Does rxe work over tunnels? Seems like it would have the same issue.

int rxe_register_device(struct rxe_dev *rxe, const char *ibdev_name)
{
...
         addrconf_addr_eui48((unsigned char *)&dev->node_guid,
                             rxe->ndev->dev_addr);

static struct siw_device *siw_device_create(struct net_device *netdev)
{
...
         addrconf_addr_eui48((unsigned char *)&base_dev->node_guid,
                                     netdev->dev_addr);
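
For reference, both drivers derive node_guid from the 6-octet dev_addr. A user-space sketch of that derivation follows, assuming addrconf_addr_eui48() performs the usual modified EUI-64 mapping (insert FF:FE in the middle and flip the universal/local bit); `eui48_to_guid()` and `eui48_to_guid_selfcheck()` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Expand a 48-bit MAC into a 64-bit identifier (modified EUI-64). */
void eui48_to_guid(uint8_t guid[8], const uint8_t mac[6])
{
	memcpy(guid, mac, 3);		/* first three MAC octets */
	guid[3] = 0xFF;			/* fixed filler octets */
	guid[4] = 0xFE;
	memcpy(guid + 5, mac + 3, 3);	/* last three MAC octets */
	guid[0] ^= 2;			/* flip the universal/local bit */
}

/* Self-check with an arbitrary example MAC. */
void eui48_to_guid_selfcheck(void)
{
	static const uint8_t mac[6] = { 0x00, 0x11, 0x22, 0x33, 0x44, 0x55 };
	static const uint8_t want[8] = { 0x02, 0x11, 0x22, 0xFF,
					 0xFE, 0x33, 0x44, 0x55 };
	uint8_t guid[8];

	eui48_to_guid(guid, mac);
	assert(memcmp(guid, want, sizeof(want)) == 0);
}
```

The mapping always reads six octets from the source address, so it presumes a device that actually has a 6-octet dev_addr.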

Tom.

Chuck Lever May 23, 2023, 10:50 p.m. UTC | #6
> On May 23, 2023, at 3:44 PM, Tom Talpey <tom@talpey.com> wrote:
> 
> On 5/23/2023 3:18 PM, Chuck Lever III wrote:
>>> On May 5, 2023, at 3:58 PM, Jason Gunthorpe <jgg@ziepe.ca> wrote:
>>> 
>>> On Fri, May 05, 2023 at 11:43:11AM -0400, Chuck Lever wrote:
>>>> From: Chuck Lever <chuck.lever@oracle.com>
>>>> 
>>>> In the past, LOOPBACK and NONE (tunnel) devices had all-zero MAC
>>>> addresses. siw_device_create() would fall back to copying the
>>>> device's name in those cases, because an all-zero MAC address breaks
>>>> the RDMA core IP-to-device lookup mechanism.
>>> 
>>> Why not just make up a dummy address in SIW? It shouldn't need to leak
>>> out of it.. It is just some artifact of how the iWarp stuff has been
>>> designed
>> So that approach is already being done in siw_device_create(),
>> even though it is broken (the device name hasn't been initialized
>> when the phony MAC is created, so it is all zeroes). I've fixed
>> that and it still doesn't help.
>> siw cannot modify the underlying net_device to add a made-up
>> MAC address.
>> The core address resolution code wants to find an L2 address
>> for the egress device. The underlying ib_device, where a made-up
>> GID might be stored, is not involved with address resolution
>> AFAICT.
>> tun devices have no L2 address. Neither do loopback devices,
>> but address resolution makes an exception for LOOPBACK devices
>> by redirecting to a local physical Ethernet device.
>> Redirecting tun traffic to the local Ethernet device seems
>> dodgy at best.
>> I wasn't sure that an L2 address was required for siw before,
>> but now I'm pretty confident that it is required by our
>> implementation.
> 
> Does rxe work over tunnels?

(It's not tunnels per se, it's devices that don't have
L2 addresses... and tun happens to be one instance of
that class).

My (brief) reading of the source code is that the use of
devices that do not have L2 addresses is prohibited for
rxe.


> Seems like it would have the same issue.

Agreed, if rxe did not prohibit them, it would have the same
issue.

To be clear: siw itself and the family of iWARP protocols
shouldn't have any problem at all with such devices. The
issue seems to be with the Linux implementation of address
resolution.


> int rxe_register_device(struct rxe_dev *rxe, const char *ibdev_name)
> {
> ...
>        addrconf_addr_eui48((unsigned char *)&dev->node_guid,
>                            rxe->ndev->dev_addr);
> 
> static struct siw_device *siw_device_create(struct net_device *netdev)
> {
> ...
>        addrconf_addr_eui48((unsigned char *)&base_dev->node_guid,
>                                    netdev->dev_addr);
> 
> Tom.


--
Chuck Lever

Patch

diff --git a/drivers/infiniband/sw/siw/siw_main.c b/drivers/infiniband/sw/siw/siw_main.c
index 65b5cda5457b..2c31bf397993 100644
--- a/drivers/infiniband/sw/siw/siw_main.c
+++ b/drivers/infiniband/sw/siw/siw_main.c
@@ -304,10 +304,15 @@  static const struct ib_device_ops siw_device_ops = {
 
 static struct siw_device *siw_device_create(struct net_device *netdev)
 {
+	static const u8 zeromac[ETH_ALEN] = { 0 };
 	struct siw_device *sdev = NULL;
 	struct ib_device *base_dev;
 	int rv;
 
+	if ((netdev->addr_len != ETH_ALEN) ||
+	    (memcmp(netdev->dev_addr, zeromac, ETH_ALEN) == 0))
+		return NULL;
+
 	sdev = ib_alloc_device(siw_device, base_dev);
 	if (!sdev)
 		return NULL;
@@ -316,21 +321,8 @@  static struct siw_device *siw_device_create(struct net_device *netdev)
 
 	sdev->netdev = netdev;
 
-	if (netdev->type != ARPHRD_LOOPBACK && netdev->type != ARPHRD_NONE) {
-		addrconf_addr_eui48((unsigned char *)&base_dev->node_guid,
-				    netdev->dev_addr);
-	} else {
-		/*
-		 * This device does not have a HW address,
-		 * but connection mangagement lib expects gid != 0
-		 */
-		size_t len = min_t(size_t, strlen(base_dev->name), 6);
-		char addr[6] = { };
-
-		memcpy(addr, base_dev->name, len);
-		addrconf_addr_eui48((unsigned char *)&base_dev->node_guid,
-				    addr);
-	}
+	addrconf_addr_eui48((unsigned char *)&base_dev->node_guid,
+			    netdev->dev_addr);
 
 	base_dev->uverbs_cmd_mask |= BIT_ULL(IB_USER_VERBS_CMD_POST_SEND);