[0/3] RDMA net namespace

Message ID 20221023220450.2287909-1-yanjun.zhu@intel.com (mailing list archive)

Message

Zhu Yanjun Oct. 23, 2022, 10:04 p.m. UTC
From: Zhu Yanjun <yanjun.zhu@linux.dev>

The RDMA net namespace has shared and exclusive modes. After
discussion with Leon, these modes are compatible with legacy IB
devices.

For RoCE and iWARP devices, the ib devices should be in the same net
namespace as the related net devices, regardless of whether shared or
exclusive mode is in use.

In the first commit, when net devices are moved to a new net
namespace, the related ib devices are also moved to the same net
namespace.

In the second commit, the shared/exclusive modes still work with legacy
ib devices. For RoCE and iWARP devices, these modes are not
considered.

Because MLX4/5 do not call ib_device_set_netdev() to map ib devices to
the related net devices, ib_device_get_by_netdev() cannot get ib
devices from net devices. In the third commit, all the registered ib
devices are walked to get their net devices, which are then compared
with the given net device.
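
As a rough user-space analogue of the lookup described for the third commit (this is not the kernel code from the series), the sketch below scans `rdma link` style output and prints every ib device/port whose netdev field matches a given net device. The function name ibdevs_for_netdev is hypothetical; the input format is assumed to match the `rdma link` line shown in the test steps.

```shell
#!/bin/sh
# ibdevs_for_netdev NETDEV
# Hypothetical helper: reads `rdma link` style lines on stdin and prints
# the "device/port" of every link whose "netdev" field equals NETDEV.
ibdevs_for_netdev() {
    awk -v want="$1" '
        $1 == "link" {
            # Fields after the device name come as "key value" pairs.
            for (i = 3; i < NF; i++)
                if ($i == "netdev" && $(i + 1) == want)
                    print $2
        }'
}
```

On a live system this could be used as `rdma link | ibdevs_for_netdev enp7s0np1`.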

Steps to test:
1) Create a new net namespace net0

   ip netns add net0

2) Show the rdma links in init_net

   rdma link

   "
   link mlx5_0/1 state DOWN physical_state DISABLED netdev enp7s0np1
   "

3) Move the net device to net namespace net0

   ip link set enp7s0np1 netns net0

4) Show the rdma links in init_net again

   rdma link

   There are no rdma links

5) Show the rdma links in net0

   ip netns exec net0 rdma link

   "
   link mlx5_0/1 state DOWN physical_state DISABLED netdev enp7s0np1
   "

This confirms that the rdma links are moved to the same net namespace as
the net devices.
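
The five steps above can be collected into one small script. This is only a sketch: the helper names run and check_rdma_follows_netdev are hypothetical, and the script defaults to a dry run that prints the commands, since really running them needs root and an RDMA-capable net device.

```shell
#!/bin/sh
# DRY_RUN=1 (the default) only prints each command.
# DRY_RUN=0 executes them and needs root plus an RDMA-capable netdev.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# check_rdma_follows_netdev NETNS NETDEV
check_rdma_follows_netdev() {
    ns=$1
    dev=$2
    run ip netns add "$ns"               # 1) create the namespace
    run rdma link                        # 2) links visible in init_net
    run ip link set "$dev" netns "$ns"   # 3) move the net device
    run rdma link                        # 4) init_net again
    run ip netns exec "$ns" rdma link    # 5) links visible in NETNS
}
```

For example, `check_rdma_follows_netdev net0 enp7s0np1` prints the five commands; setting DRY_RUN=0 runs them.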

Zhu Yanjun (3):
  RDMA/core: Move ib device to the same net namespace with net device
  RDMA/core: The legacy IB devices still work with shared/exclusive mode
  RDMA/core: Get all the ib devices from net devices

 drivers/infiniband/core/device.c | 107 ++++++++++++++++++++++++++++++-
 1 file changed, 105 insertions(+), 2 deletions(-)

Comments

Leon Romanovsky Oct. 23, 2022, 1:04 p.m. UTC | #1
On Sun, Oct 23, 2022 at 06:04:47PM -0400, Zhu Yanjun wrote:
> From: Zhu Yanjun <yanjun.zhu@linux.dev>
> 
> There are shared and exclusive modes in RDMA net namespace. After
> discussion with Leon, the above modes are compatible with legacy IB
> device. 
> 
> To the RoCE and iWARP devices, the ib devices should be in the same net
> namespace with the related net devices regardless of in shared or
> exclusive mode.
> 
> In the first commit, when the net devices are moved to a new net
> namespace, the related ib devices are also moved to the same net
> namespace.

I think that rdma_dev_net_ops are supposed to handle this.

Thanks
Zhu Yanjun Oct. 23, 2022, 1:42 p.m. UTC | #2
On 2022/10/23 21:04, Leon Romanovsky wrote:
> On Sun, Oct 23, 2022 at 06:04:47PM -0400, Zhu Yanjun wrote:
>> From: Zhu Yanjun <yanjun.zhu@linux.dev>
>>
>> There are shared and exclusive modes in RDMA net namespace. After
>> discussion with Leon, the above modes are compatible with legacy IB
>> device.
>>
>> To the RoCE and iWARP devices, the ib devices should be in the same net
>> namespace with the related net devices regardless of in shared or
>> exclusive mode.
>>
>> In the first commit, when the net devices are moved to a new net
>> namespace, the related ib devices are also moved to the same net
>> namespace.
> I think that rdma_dev_net_ops are supposed to handle this.

Yes. rdma_dev_net_ops can move ib devices from one net to another net.

But these functions are called by a netlink command "rdma dev...".

In my commit, for RoCE devices, ib devices and net devices should be in
the same net.

That is, when the net devices are moved to another net, the ib devices
are moved to the same net automatically, instead of running a netlink
command to move the ib devices.

For legacy ib devices, this netlink command is needed. For RoCE devices,
this command is not needed. When net devices are moved to a new net, the
ib devices are also moved automatically.

Per our discussion, if a RoCE device's net devices and ib devices are
separated into different nets, the ib devices cannot work.
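
For comparison, the manual move that the "rdma dev..." command refers to might look like the sketch below. It assumes the iproute2 commands `rdma system set netns exclusive` and `rdma dev set DEV netns NSNAME`; the helper names run and move_ibdev_to_netns are hypothetical. The mode switch is placed before the namespace is created on the assumption that the kernel refuses to change the mode once namespaces other than init_net exist.

```shell
#!/bin/sh
# DRY_RUN=1 (the default) only prints each command; DRY_RUN=0 needs root.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# move_ibdev_to_netns IBDEV NETNS
move_ibdev_to_netns() {
    ibdev=$1
    ns=$2
    # Assumption: switch the mode before any extra namespace exists.
    run rdma system set netns exclusive
    run ip netns add "$ns"
    # Explicit netlink request to move the ib device.
    run rdma dev set "$ibdev" netns "$ns"
    run ip netns exec "$ns" rdma dev show
}
```

For example, `move_ibdev_to_netns mlx5_0 net0` prints the four commands in dry-run mode.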

Zhu Yanjun

>
> Thanks
Leon Romanovsky Oct. 23, 2022, 4:45 p.m. UTC | #3
On Sun, Oct 23, 2022 at 09:42:00PM +0800, Yanjun Zhu wrote:
> 
> On 2022/10/23 21:04, Leon Romanovsky wrote:
> > On Sun, Oct 23, 2022 at 06:04:47PM -0400, Zhu Yanjun wrote:
> > > From: Zhu Yanjun <yanjun.zhu@linux.dev>
> > > 
> > > There are shared and exclusive modes in RDMA net namespace. After
> > > discussion with Leon, the above modes are compatible with legacy IB
> > > device.
> > > 
> > > To the RoCE and iWARP devices, the ib devices should be in the same net
> > > namespace with the related net devices regardless of in shared or
> > > exclusive mode.
> > > 
> > > In the first commit, when the net devices are moved to a new net
> > > namespace, the related ib devices are also moved to the same net
> > > namespace.
> > I think that rdma_dev_net_ops are supposed to handle this.
> 
> Yes. rdma_dev_net_ops can move ib devices from one net to another net.
> 
> But these functions are called by a netlink command "rdma dev...".

rdma_dev_net_ops are called when you move netdevice from one netlink to
another.

However you raised an interesting question if it is correct behaviour to
move IB device after moved netdevice.

I don't know an answer for that.

Thanks
Dust Li Oct. 24, 2022, 1:10 a.m. UTC | #4
On Sun, Oct 23, 2022 at 06:04:47PM -0400, Zhu Yanjun wrote:
>From: Zhu Yanjun <yanjun.zhu@linux.dev>
>
>There are shared and exclusive modes in RDMA net namespace. After
>discussion with Leon, the above modes are compatible with legacy IB
>device. 
>
>To the RoCE and iWARP devices, the ib devices should be in the same net
>namespace with the related net devices regardless of in shared or
>exclusive mode.

Does this mean that shared mode is no longer supported for RoCE and iWarp
devices ?


Zhu Yanjun Oct. 24, 2022, 6:15 a.m. UTC | #5
October 24, 2022 9:10 AM, "Dust Li" <dust.li@linux.alibaba.com> wrote:

> On Sun, Oct 23, 2022 at 06:04:47PM -0400, Zhu Yanjun wrote:
> 
>> From: Zhu Yanjun <yanjun.zhu@linux.dev>
>> 
>> There are shared and exclusive modes in RDMA net namespace. After
>> discussion with Leon, the above modes are compatible with legacy IB
>> device.
>> 
>> To the RoCE and iWARP devices, the ib devices should be in the same net
>> namespace with the related net devices regardless of in shared or
>> exclusive mode.
> 
> Does this mean that shared mode is no longer supported for RoCE and iWarp
> devices ?

From the discussion, a RoCE or iWARP device should keep its ib devices and net devices in the same net. So a RoCE or iWARP device has no shared/exclusive modes.

Shared/exclusive modes are for legacy ib devices, such as ipoib.

In this patch series, shared/exclusive modes are left for legacy ib devices.
For a RoCE or iWARP device, we just keep net devices and ib devices in the same net.



Zhu Yanjun Oct. 24, 2022, 7:20 a.m. UTC | #6
October 24, 2022 12:45 AM, "Leon Romanovsky" <leon@kernel.org> wrote:

> On Sun, Oct 23, 2022 at 09:42:00PM +0800, Yanjun Zhu wrote:
> 
>> On 2022/10/23 21:04, Leon Romanovsky wrote:
>> On Sun, Oct 23, 2022 at 06:04:47PM -0400, Zhu Yanjun wrote:
>>> From: Zhu Yanjun <yanjun.zhu@linux.dev>
>>> 
>>> There are shared and exclusive modes in RDMA net namespace. After
>>> discussion with Leon, the above modes are compatible with legacy IB
>>> device.
>>> 
>>> To the RoCE and iWARP devices, the ib devices should be in the same net
>>> namespace with the related net devices regardless of in shared or
>>> exclusive mode.
>>> 
>>> In the first commit, when the net devices are moved to a new net
>>> namespace, the related ib devices are also moved to the same net
>>> namespace.
>> I think that rdma_dev_net_ops are supposed to handle this.
>> 
>> Yes. rdma_dev_net_ops can move ib devices from one net to another net.
>> 
>> But these functions are called by a netlink command "rdma dev...".
> 
> rdma_dev_net_ops are called when you move netdevice from one netlink to
> another.

Regarding "rdma_dev_net_ops are called when you move netdevice from one netlink to another":

If I understand you correctly, you mean that rdma_dev_net_ops will be called when a net device is moved from one net namespace to another.

In fact, rdma_dev_net_ops will be called when running the two commands "ip netns add ..." and "ip netns del ...".

> 
> However you raised an interesting question if it is correct behaviour to
> move IB device after moved netdevice.

Now we come back to the original problem: how can a RoCE ib device work when the ib device and its net devices are separated into two different net namespaces?

If you know, please let me know.

Currently I keep ib devices and the related net devices in the same net namespace to make the ib devices work. So I made these commits to keep ib devices and the related net devices in the same net namespace automatically.

Zhu Yanjun

> 
> I don't know an answer for that.
> 
> Thanks
Dust Li Oct. 24, 2022, 11:52 a.m. UTC | #7
On Mon, Oct 24, 2022 at 06:15:01AM +0000, yanjun.zhu@linux.dev wrote:
>October 24, 2022 9:10 AM, "Dust Li" <dust.li@linux.alibaba.com> wrote:
>
>> On Sun, Oct 23, 2022 at 06:04:47PM -0400, Zhu Yanjun wrote:
>> 
>>> From: Zhu Yanjun <yanjun.zhu@linux.dev>
>>> 
>>> There are shared and exclusive modes in RDMA net namespace. After
>>> discussion with Leon, the above modes are compatible with legacy IB
>>> device.
>>> 
>>> To the RoCE and iWARP devices, the ib devices should be in the same net
>>> namespace with the related net devices regardless of in shared or
>>> exclusive mode.
>> 
>> Does this mean that shared mode is no longer supported for RoCE and iWarp
>> devices ?
>
>From the discussion,  a RoCE and iWarp device should make ib devices and net devices in the same net. So a RoCE and iWarp device has no shared/exclusive modes.
>
>Shared/exclusive modes are for legacy ib devices, such as ipoib. 
>
>In this patch series, shared/exclusive modes are left for legacy ib devices.
>To a RoCE and iWarp device, we just keep net devices and ib devices in the same net.

I think this may limit the use cases of RoCE and iWARP.

Consider the following use case:
In a container environment, we may have lots of containers on a host,
for example, more than 100. We don't have that many VFs, so we use
ipvlan or other virtual network devices for each container, and put
those virtual network devices into each container (net namespace).
Since we only use one physical network device for all those containers,
there is only one RoCE device. If we don't support shared mode, we
cannot even enable RDMA for those containers with RoCE.

I don't know any other way to solve this. Maybe I missed something?
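
A minimal sketch of the container setup described above, in dry-run form. The namespace names (c1, c2, ...), the ipvlan names (ipvl1, ipvl2, ...) and the helpers run and setup_ipvlan_containers are hypothetical; whether the final `rdma link` shows the single RoCE device inside each namespace is exactly the shared-mode question being raised here.

```shell
#!/bin/sh
# DRY_RUN=1 (the default) only prints each command; DRY_RUN=0 needs root.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# setup_ipvlan_containers PARENT_NETDEV COUNT
setup_ipvlan_containers() {
    parent=$1
    count=$2
    i=1
    while [ "$i" -le "$count" ]; do
        run ip netns add "c$i"
        # One ipvlan slave per container, all on the same physical netdev.
        run ip link add link "$parent" name "ipvl$i" type ipvlan mode l2
        run ip link set "ipvl$i" netns "c$i"
        # Check which rdma links are visible inside the container.
        run ip netns exec "c$i" rdma link
        i=$((i + 1))
    done
}
```

For example, `setup_ipvlan_containers enp7s0np1 100` would print the commands for 100 containers that share one physical device.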

Thanks

Zhu Yanjun Oct. 24, 2022, 1:12 p.m. UTC | #8
On 2022/10/24 19:52, Dust Li wrote:
> On Mon, Oct 24, 2022 at 06:15:01AM +0000, yanjun.zhu@linux.dev wrote:
>> October 24, 2022 9:10 AM, "Dust Li" <dust.li@linux.alibaba.com> wrote:
>>
>>> On Sun, Oct 23, 2022 at 06:04:47PM -0400, Zhu Yanjun wrote:
>>>
>>>> From: Zhu Yanjun <yanjun.zhu@linux.dev>
>>>>
>>>> There are shared and exclusive modes in RDMA net namespace. After
>>>> discussion with Leon, the above modes are compatible with legacy IB
>>>> device.
>>>>
>>>> To the RoCE and iWARP devices, the ib devices should be in the same net
>>>> namespace with the related net devices regardless of in shared or
>>>> exclusive mode.
>>> Does this mean that shared mode is no longer supported for RoCE and iWarp
>>> devices ?
>> From the discussion, a RoCE and iWarp device should make ib devices and net devices in the same net. So a RoCE and iWarp device has no shared/exclusive modes.
>> Shared/exclusive modes are for legacy ib devices, such as ipoib.
>>
>> In this patch series, shared/exclusive modes are left for legacy ib devices.
>> To a RoCE and iWarp device, we just keep net devices and ib devices in the same net.
> I think this may limit the use case of RoCE and iWarp.
>
> See the following use case:
> In the container enviroment, we may have lots of containers on a host,
> for example, more than 100. And we don't have that much VFs, so we use
> ipvlan or other virtual network devices for each container, and put
> those virtual network devices into each container(net namespace).
> Since we only use 1 physical network device for all those containers,
> there is only one RoCE device. If we don't support shared mode, we
> cannot even enable RDMA for those containers with RoCE.

You use ipvlan or other virtual network devices for each container.

In these containers, you also use RDMA, correct?

Since all the packets for these virtual network devices finally reach
the physical network device, it should work without shared/exclusive
modes.

So we do not consider the shared/exclusive mode.

Zhu Yanjun

Dust Li Oct. 24, 2022, 2:35 p.m. UTC | #9
On Mon, Oct 24, 2022 at 09:12:56PM +0800, Yanjun Zhu wrote:
>
>On 2022/10/24 19:52, Dust Li wrote:
>> On Mon, Oct 24, 2022 at 06:15:01AM +0000, yanjun.zhu@linux.dev wrote:
>> > October 24, 2022 9:10 AM, "Dust Li" <dust.li@linux.alibaba.com> wrote:
>> > 
>> > > On Sun, Oct 23, 2022 at 06:04:47PM -0400, Zhu Yanjun wrote:
>> > > 
>> > > > From: Zhu Yanjun <yanjun.zhu@linux.dev>
>> > > > 
>> > > > There are shared and exclusive modes in RDMA net namespace. After
>> > > > discussion with Leon, the above modes are compatible with legacy IB
>> > > > device.
>> > > > 
>> > > > To the RoCE and iWARP devices, the ib devices should be in the same net
>> > > > namespace with the related net devices regardless of in shared or
>> > > > exclusive mode.
>> > > Does this mean that shared mode is no longer supported for RoCE and iWarp
>> > > devices ?
>> > From the discussion, a RoCE and iWarp device should make ib devices and net devices in the same net. So a RoCE and iWarp device has no shared/exclusive modes.
>> > Shared/exclusive modes are for legacy ib devices, such as ipoib.
>> > 
>> > In this patch series, shared/exclusive modes are left for legacy ib devices.
>> > To a RoCE and iWarp device, we just keep net devices and ib devices in the same net.
>> I think this may limit the use case of RoCE and iWarp.
>> 
>> See the following use case:
>> In the container enviroment, we may have lots of containers on a host,
>> for example, more than 100. And we don't have that much VFs, so we use
>> ipvlan or other virtual network devices for each container, and put
>> those virtual network devices into each container(net namespace).
>> Since we only use 1 physical network device for all those containers,
>> there is only one RoCE device. If we don't support shared mode, we
>> cannot even enable RDMA for those containers with RoCE.
>
>You use the ipvlan or other virtual network devices for each container.
>
>In these containers, you also use RDMA, correct?
>
>Since all the packets for these virtual network devices finally come to
>
>the physical network devices, without shared/exclusive modes, it should work.
>
>So we do not consider the shared/exclusive mode.

For the netdevice, that's true. But for RDMA, we should not even see
the ib device in the containers any more, so I think we cannot create
qp/cq, and RDMA is not available for these containers in this case.

Thanks



Jason Gunthorpe Oct. 24, 2022, 4:41 p.m. UTC | #10
On Mon, Oct 24, 2022 at 10:35:21PM +0800, Dust Li wrote:

> For the netdevice, that's true. But for RDMA, we should not even see
> the ib device in the containers any more, so I think we cannot create
> qp/cq, and RDMA is not available for these containers in this case.

Correct, in shared mode the RDMA device should only allow using GID
table entries that have netdevs that are present in the process's net
namespace.

This is, in general, the philosophy. The user is supposed to keep the
various devices in the namespace in sync, because the kernel cannot
guess what is correct.
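
To illustrate the GID-to-netdev association mentioned above, the sketch below lists the GID table entries of one port whose netdev is visible from the current net namespace. It only inspects sysfs from user space and is not how the kernel enforces anything. It assumes the gid_attrs/ndevs/<index> files under /sys/class/infiniband/<dev>/ports/<port>/ and that /sys/class/net reflects the caller's namespace (as it does under `ip netns exec`). The function name usable_gid_netdevs and the optional SYSFS_ROOT argument are hypothetical.

```shell
#!/bin/sh
# usable_gid_netdevs IBDEV PORT [SYSFS_ROOT]
# Prints "<gid index> <netdev>" for every GID entry whose netdev
# exists in the net namespace that SYSFS_ROOT (default /sys) shows.
usable_gid_netdevs() {
    root=${3:-/sys}
    for f in "$root/class/infiniband/$1/ports/$2/gid_attrs/ndevs"/*; do
        [ -f "$f" ] || continue
        # Reading an unused GID slot may fail; skip those entries.
        ndev=$(cat "$f" 2>/dev/null) || continue
        [ -n "$ndev" ] || continue
        if [ -e "$root/class/net/$ndev" ]; then
            echo "${f##*/} $ndev"
        fi
    done
}
```

For example, `ip netns exec net0 sh -c '. ./gids.sh; usable_gid_netdevs mlx5_0 1'`, where gids.sh is a hypothetical file holding the function.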

Jason
Zhu Yanjun Oct. 25, 2022, 2:51 a.m. UTC | #11
On 2022/10/24 22:35, Dust Li wrote:
> On Mon, Oct 24, 2022 at 09:12:56PM +0800, Yanjun Zhu wrote:
>> On 2022/10/24 19:52, Dust Li wrote:
>>> On Mon, Oct 24, 2022 at 06:15:01AM +0000, yanjun.zhu@linux.dev wrote:
>>>> October 24, 2022 9:10 AM, "Dust Li" <dust.li@linux.alibaba.com> wrote:
>>>>
>>>>> On Sun, Oct 23, 2022 at 06:04:47PM -0400, Zhu Yanjun wrote:
>>>>>
>>>>>> From: Zhu Yanjun <yanjun.zhu@linux.dev>
>>>>>>
>>>>>> There are shared and exclusive modes in RDMA net namespace. After
>>>>>> discussion with Leon, the above modes are compatible with legacy IB
>>>>>> device.
>>>>>>
>>>>>> To the RoCE and iWARP devices, the ib devices should be in the same net
>>>>>> namespace with the related net devices regardless of in shared or
>>>>>> exclusive mode.
>>>>> Does this mean that shared mode is no longer supported for RoCE and iWarp
>>>>> devices ?
>>>> From the discussion, a RoCE and iWarp device should make ib devices and net devices in the same net. So a RoCE and iWarp device has no shared/exclusive modes.
>>>> Shared/exclusive modes are for legacy ib devices, such as ipoib.
>>>>
>>>> In this patch series, shared/exclusive modes are left for legacy ib devices.
>>>> To a RoCE and iWarp device, we just keep net devices and ib devices in the same net.
>>> I think this may limit the use case of RoCE and iWarp.
>>>
>>> See the following use case:
>>> In the container enviroment, we may have lots of containers on a host,
>>> for example, more than 100. And we don't have that much VFs, so we use
>>> ipvlan or other virtual network devices for each container, and put
>>> those virtual network devices into each container(net namespace).
>>> Since we only use 1 physical network device for all those containers,
>>> there is only one RoCE device. If we don't support shared mode, we
>>> cannot even enable RDMA for those containers with RoCE.
>> You use the ipvlan or other virtual network devices for each container.
>>
>> In these containers, you also use RDMA, correct?
>>
>> Since all the packets for these virtual network devices finally come to
>>
>> the physical network devices, without shared/exclusive modes, it should work.
>>
>> So we do not consider the shared/exclusive mode.
> For the netdevice, that's true. But for RDMA, we should not even see
> the ib device in the containers any more, so I think we cannot create
> qp/cq, and RDMA is not available for these containers in this case.

I do not follow you.

Do you mean that RDMA can not be accessed in the container after these 
patches are applied?

Can you share a test case with me?


Zhu Yanjun

Dust Li Oct. 26, 2022, 4:08 a.m. UTC | #12
On Tue, Oct 25, 2022 at 10:51:38AM +0800, Yanjun Zhu wrote:
>
>On 2022/10/24 22:35, Dust Li wrote:
>> On Mon, Oct 24, 2022 at 09:12:56PM +0800, Yanjun Zhu wrote:
>> > On 2022/10/24 19:52, Dust Li wrote:
>> > > On Mon, Oct 24, 2022 at 06:15:01AM +0000, yanjun.zhu@linux.dev wrote:
>> > > > October 24, 2022 9:10 AM, "Dust Li" <dust.li@linux.alibaba.com> wrote:
>> > > > 
>> > > > > On Sun, Oct 23, 2022 at 06:04:47PM -0400, Zhu Yanjun wrote:
>> > > > > 
>> > > > > > From: Zhu Yanjun <yanjun.zhu@linux.dev>
>> > > > > > 
>> > > > > > There are shared and exclusive modes in RDMA net namespace. After
>> > > > > > discussion with Leon, the above modes are compatible with legacy IB
>> > > > > > device.
>> > > > > > 
>> > > > > > To the RoCE and iWARP devices, the ib devices should be in the same net
>> > > > > > namespace with the related net devices regardless of in shared or
>> > > > > > exclusive mode.
>> > > > > Does this mean that shared mode is no longer supported for RoCE and iWarp
>> > > > > devices ?
>> > > > From the discussion, a RoCE and iWarp device should make ib devices and net devices in the same net. So a RoCE and iWarp device has no shared/exclusive modes.
>> > > > Shared/exclusive modes are for legacy ib devices, such as ipoib.
>> > > > 
>> > > > In this patch series, shared/exclusive modes are left for legacy ib devices.
>> > > > To a RoCE and iWarp device, we just keep net devices and ib devices in the same net.
>> > > I think this may limit the use case of RoCE and iWarp.
>> > > 
>> > > See the following use case:
>> > > In the container enviroment, we may have lots of containers on a host,
>> > > for example, more than 100. And we don't have that much VFs, so we use
>> > > ipvlan or other virtual network devices for each container, and put
>> > > those virtual network devices into each container(net namespace).
>> > > Since we only use 1 physical network device for all those containers,
>> > > there is only one RoCE device. If we don't support shared mode, we
>> > > cannot even enable RDMA for those containers with RoCE.
>> > You use the ipvlan or other virtual network devices for each container.
>> > 
>> > In these containers, you also use RDMA, correct?
>> > 
>> > Since all the packets for these virtual network devices finally come to
>> > 
>> > the physical network devices, without shared/exclusive modes, it should work.
>> > 
>> > So we do not consider the shared/exclusive mode.
>> For the netdevice, that's true. But for RDMA, we should not even see
>> the ib device in the containers any more, so I think we cannot create
>> qp/cq, and RDMA is not available for these containers in this case.
>
>I can not get you.
>
>Do you mean that RDMA can not be accessed in the container after these
>patches are applied?
>
>Can you share a test case with me?

OK, I will test your patch first and, if possible, I will provide a test
case.

Thanks


>
>
>Zhu Yanjun
>
>> 
>> Thanks
>> 
>> 
>> 
>> > Zhu Yanjun
>> > 
>> > > I don't know any other way to solve this, maybe I missed something ?
>> > > 
>> > > Thanks
>> > > 
>> > > > 
>> > > > > > In the first commit, when the net devices are moved to a new net
>> > > > > > namespace, the related ib devices are also moved to the same net
>> > > > > > namespace.
>> > > > > > 
>> > > > > > In the second commit, the shared/exclusive modes still work with legacy
>> > > > > > ib devices. To the RoCE and iWARP devices, these modes will not be
>> > > > > > considered.
>> > > > > > 
>> > > > > > Because MLX4/5 do not call the function ib_device_set_netdev to map ib
>> > > > > > devices and the related net devices, the function ib_device_get_by_netdev
>> > > > > > can not get ib devices from net devices. In the third commit, all the
>> > > > > > registered ib devices are parsed to get the net devices, then compared
>> > > > > > with the given net devices.
>> > > > > > 
>> > > > > > The steps to make tests:
>> > > > > > 1) Create a new net namespace net0
>> > > > > > 
>> > > > > > ip netns add net0
>> > > > > > 
>> > > > > > 2) Show the rdma links in init_net
>> > > > > > 
>> > > > > > rdma link
>> > > > > > 
>> > > > > > "
>> > > > > > link mlx5_0/1 state DOWN physical_state DISABLED netdev enp7s0np1
>> > > > > > "
>> > > > > > 
>> > > > > > 3) Move the net device to net namespace net0
>> > > > > > 
>> > > > > > ip link set enp7s0np1 netns net0
>> > > > > > 
>> > > > > > 4) Show the rdma links in init_net again
>> > > > > > 
>> > > > > > rdma link
>> > > > > > 
>> > > > > > There is no rdma links
>> > > > > > 
>> > > > > > 5) Show the rdma links in net0
>> > > > > > 
>> > > > > > ip netns exec net0 rdma link
>> > > > > > 
>> > > > > > "
>> > > > > > link mlx5_0/1 state DOWN physical_state DISABLED netdev enp7s0np1
>> > > > > > "
>> > > > > > 
>> > > > > > We can confirm that rdma links are moved to the same net namespace with
>> > > > > > the net devices.
>> > > > > > 
>> > > > > > Zhu Yanjun (3):
>> > > > > > RDMA/core: Move ib device to the same net namespace with net device
>> > > > > > RDMA/core: The legacy IB devices still work with shared/exclusive mode
>> > > > > > RDMA/core: Get all the ib devices from net devices
>> > > > > > 
>> > > > > > drivers/infiniband/core/device.c | 107 ++++++++++++++++++++++++++++++-
>> > > > > > 1 file changed, 105 insertions(+), 2 deletions(-)
>> > > > > > 
>> > > > > > --
>> > > > > > 2.27.0
Dust Li Oct. 26, 2022, 3:01 p.m. UTC | #13
On Sun, Oct 23, 2022 at 06:04:47PM -0400, Zhu Yanjun wrote:
>From: Zhu Yanjun <yanjun.zhu@linux.dev>
>
>There are shared and exclusive modes in RDMA net namespace. After
>discussion with Leon, the above modes are compatible with legacy IB
>device. 
>
>To the RoCE and iWARP devices, the ib devices should be in the same net
>namespace with the related net devices regardless of in shared or
>exclusive mode.
>
>In the first commit, when the net devices are moved to a new net
>namespace, the related ib devices are also moved to the same net
>namespace.
>
>In the second commit, the shared/exclusive modes still work with legacy
>ib devices. To the RoCE and iWARP devices, these modes will not be
>considered.
>
>Because MLX4/5 do not call the function ib_device_set_netdev to map ib
>devices and the related net devices, the function ib_device_get_by_netdev
>can not get ib devices from net devices. In the third commit, all the
>registered ib devices are parsed to get the net devices, then compared
>with the given net devices.
>
>The steps to make tests:
>1) Create a new net namespace net0
>
>   ip netns add net0
>
>2) Show the rdma links in init_net
>
>   rdma link
>
>   "
>   link mlx5_0/1 state DOWN physical_state DISABLED netdev enp7s0np1
>   "
>
>3) Move the net device to net namespace net0
>
>   ip link set enp7s0np1 netns net0
>
>4) Show the rdma links in init_net again
>
>   rdma link
>
>   There is no rdma links

Following your steps, after step 3) I cannot reproduce this:
`rdma link` run in init_net still shows the link.

I'm testing on a VM with a ConnectX-4Lx, SR-IOV enabled, and a VF passed through
to the VM.

Did I miss anything?
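
For anyone repeating the check in steps 2) to 5), here is a small sketch that filters `rdma link` output for a given netdev. The helper name rdma_link_for_netdev is made up for illustration, and the parsing assumes the one-line "link DEV/PORT ... netdev NAME" format quoted in the cover letter.

```shell
# Hypothetical helper: print the RDMA link (dev/port) bound to the netdev
# named in $1, reading `rdma link` output on stdin. Prints nothing when no
# line mentions that netdev.
rdma_link_for_netdev() {
    awk -v nd="$1" '
        $1 == "link" {
            for (i = 3; i < NF; i++)
                if ($i == "netdev" && $(i + 1) == nd) print $2
        }'
}

# Sample input copied from the cover letter; prints "mlx5_0/1".
printf '%s\n' 'link mlx5_0/1 state DOWN physical_state DISABLED netdev enp7s0np1' |
    rdma_link_for_netdev enp7s0np1

# On a real system the same filter could be fed from each namespace:
#   rdma link | rdma_link_for_netdev enp7s0np1
#   ip netns exec net0 rdma link | rdma_link_for_netdev enp7s0np1
```

An empty result in init_net together with a match in net0 would correspond to the outcome the cover letter describes; a match in init_net corresponds to what I see here.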

>
>5) Show the rdma links in net0
>
>   ip netns exec net0 rdma link
>
>   "
>   link mlx5_0/1 state DOWN physical_state DISABLED netdev enp7s0np1
>   "
>
>We can confirm that rdma links are moved to the same net namespace with
>the net devices.
>
>Zhu Yanjun (3):
>  RDMA/core: Move ib device to the same net namespace with net device
>  RDMA/core: The legacy IB devices still work with shared/exclusive mode
>  RDMA/core: Get all the ib devices from net devices
>
> drivers/infiniband/core/device.c | 107 ++++++++++++++++++++++++++++++-
> 1 file changed, 105 insertions(+), 2 deletions(-)
>
>-- 
>2.27.0
Dust Li Oct. 27, 2022, 2:30 a.m. UTC | #14
On Wed, Oct 26, 2022 at 11:01:13PM +0800, Dust Li wrote:
>On Sun, Oct 23, 2022 at 06:04:47PM -0400, Zhu Yanjun wrote:
>>From: Zhu Yanjun <yanjun.zhu@linux.dev>
>>
>>There are shared and exclusive modes in RDMA net namespace. After
>>discussion with Leon, the above modes are compatible with legacy IB
>>device. 
>>
>>To the RoCE and iWARP devices, the ib devices should be in the same net
>>namespace with the related net devices regardless of in shared or
>>exclusive mode.
>>
>>In the first commit, when the net devices are moved to a new net
>>namespace, the related ib devices are also moved to the same net
>>namespace.
>>
>>In the second commit, the shared/exclusive modes still work with legacy
>>ib devices. To the RoCE and iWARP devices, these modes will not be
>>considered.
>>
>>Because MLX4/5 do not call the function ib_device_set_netdev to map ib
>>devices and the related net devices, the function ib_device_get_by_netdev
>>can not get ib devices from net devices. In the third commit, all the
>>registered ib devices are parsed to get the net devices, then compared
>>with the given net devices.
>>
>>The steps to make tests:
>>1) Create a new net namespace net0
>>
>>   ip netns add net0
>>
>>2) Show the rdma links in init_net
>>
>>   rdma link
>>
>>   "
>>   link mlx5_0/1 state DOWN physical_state DISABLED netdev enp7s0np1
>>   "
>>
>>3) Move the net device to net namespace net0
>>
>>   ip link set enp7s0np1 netns net0
>>
>>4) Show the rdma links in init_net again
>>
>>   rdma link
>>
>>   There is no rdma links
>
>Follow your steps, after step 3), I cannot reproduce this,
>`rdma link` running in init_net still show the link.
>
>I'm testing on a VM with ConnectX-4Lx, SRIOV enabled, and VF is passthroughed
>to the VM.
>
>Anything I missed ?

Hi Zhu:

I think I know what's wrong here.

With your patch, if I move the netdevice from init_net into another
net namespace (say ns0), the RDMA device is not moved, and `rdma link`
cannot see the RDMA device in ns0. (We can see it if we are in shared mode.)

I think this is not the correct behaviour.

Maybe we should do the following:
1. If we are in shared mode, keep the current behaviour.
2. Otherwise we are in exclusive mode: when the corresponding netdevice of the RoCE
   or iWarp device is moved from one net namespace to another, we move the
   RDMA device into that net namespace as well.

What do you think ?

Thanks.

>
>>
>>5) Show the rdma links in net0
>>
>>   ip netns exec net0 rdma link
>>
>>   "
>>   link mlx5_0/1 state DOWN physical_state DISABLED netdev enp7s0np1
>>   "
>>
>>We can confirm that rdma links are moved to the same net namespace with
>>the net devices.
>>
>>Zhu Yanjun (3):
>>  RDMA/core: Move ib device to the same net namespace with net device
>>  RDMA/core: The legacy IB devices still work with shared/exclusive mode
>>  RDMA/core: Get all the ib devices from net devices
>>
>> drivers/infiniband/core/device.c | 107 ++++++++++++++++++++++++++++++-
>> 1 file changed, 105 insertions(+), 2 deletions(-)
>>
>>-- 
>>2.27.0
Zhu Yanjun Oct. 27, 2022, 2:54 a.m. UTC | #15
October 27, 2022 10:30 AM, "Dust Li" <dust.li@linux.alibaba.com> wrote:

> On Wed, Oct 26, 2022 at 11:01:13PM +0800, Dust Li wrote:
> 
>> On Sun, Oct 23, 2022 at 06:04:47PM -0400, Zhu Yanjun wrote:
>>> From: Zhu Yanjun <yanjun.zhu@linux.dev>
>>> 
>>> There are shared and exclusive modes in RDMA net namespace. After
>>> discussion with Leon, the above modes are compatible with legacy IB
>>> device.
>>> 
>>> To the RoCE and iWARP devices, the ib devices should be in the same net
>>> namespace with the related net devices regardless of in shared or
>>> exclusive mode.
>>> 
>>> In the first commit, when the net devices are moved to a new net
>>> namespace, the related ib devices are also moved to the same net
>>> namespace.
>>> 
>>> In the second commit, the shared/exclusive modes still work with legacy
>>> ib devices. To the RoCE and iWARP devices, these modes will not be
>>> considered.
>>> 
>>> Because MLX4/5 do not call the function ib_device_set_netdev to map ib
>>> devices and the related net devices, the function ib_device_get_by_netdev
>>> can not get ib devices from net devices. In the third commit, all the
>>> registered ib devices are parsed to get the net devices, then compared
>>> with the given net devices.
>>> 
>>> The steps to make tests:
>>> 1) Create a new net namespace net0
>>> 
>>> ip netns add net0
>>> 
>>> 2) Show the rdma links in init_net
>>> 
>>> rdma link
>>> 
>>> "
>>> link mlx5_0/1 state DOWN physical_state DISABLED netdev enp7s0np1
>>> "
>>> 
>>> 3) Move the net device to net namespace net0
>>> 
>>> ip link set enp7s0np1 netns net0
>>> 
>>> 4) Show the rdma links in init_net again
>>> 
>>> rdma link
>>> 
>>> There is no rdma links
>> 
>> Follow your steps, after step 3), I cannot reproduce this,
>> `rdma link` running in init_net still show the link.
>> 
>> I'm testing on a VM with ConnectX-4Lx, SRIOV enabled, and VF is passthroughed
>> to the VM.
>> 
>> Anything I missed ?
> 
> Hi Zhu:
> 
> I think I know what's wrong here.
> 
> With your patch, if I put the netdevice from init_net into another
> net_namespace(say ns0), the RDMA device is not moved, and `rdma link`
> can't see the RDMA device in ns0(We can see it if we are in shared mode)


Yes. The patches should move the rdma device to ns0, the same net namespace as the net device.

I used the following device for my tests, and it works well.
"
Ethernet controller: Mellanox Technologies MT27800 Family [ConnectX-5]
"
The driver is drivers/infiniband/hw/mlx5.

You are using a ConnectX-4Lx. Can you let me know which driver it uses?

Thanks and Regards,
Zhu Yanjun

> 
> I think this is not the correct behaviour.
> 
> Maybe we should do:
> 1. If we are in shared mode, keep the current behaviour
> 2. else we are in exclusive mode. When the corresponding netdevice of the RoCE
> or iWarp device is moved from one net namespace to another, we move the
> RDMA device into that net namespace
> 
> What do you think ?
> 
> Thanks.
> 
>>> 5) Show the rdma links in net0
>>> 
>>> ip netns exec net0 rdma link
>>> 
>>> "
>>> link mlx5_0/1 state DOWN physical_state DISABLED netdev enp7s0np1
>>> "
>>> 
>>> We can confirm that rdma links are moved to the same net namespace with
>>> the net devices.
>>> 
>>> Zhu Yanjun (3):
>>> RDMA/core: Move ib device to the same net namespace with net device
>>> RDMA/core: The legacy IB devices still work with shared/exclusive mode
>>> RDMA/core: Get all the ib devices from net devices
>>> 
>>> drivers/infiniband/core/device.c | 107 ++++++++++++++++++++++++++++++-
>>> 1 file changed, 105 insertions(+), 2 deletions(-)
>>> 
>>> --
>>> 2.27.0
Parav Pandit Oct. 27, 2022, 3:01 a.m. UTC | #16
> From: Dust Li <dust.li@linux.alibaba.com>
> Sent: Wednesday, October 26, 2022 10:31 PM


> 2. else we are in
> exclusive mode. When the corresponding netdevice of the RoCE
>    or iWarp device is moved from one net namespace to another, we move
> the
>    RDMA device into that net namespace
> 
> What do you think ?
No. One device is not supposed to move other devices.
Every device is independent and should be moved by an explicit command.

Also, changes like the above break the existing orchestration, so it is a no-go.
Zhu Yanjun Oct. 27, 2022, 3:07 a.m. UTC | #17
October 27, 2022 11:01 AM, "Parav Pandit" <parav@nvidia.com> wrote:

>> From: Dust Li <dust.li@linux.alibaba.com>
>> Sent: Wednesday, October 26, 2022 10:31 PM
>> 
>> 2. else we are in
>> exclusive mode. When the corresponding netdevice of the RoCE
>> or iWarp device is moved from one net namespace to another, we move
>> the
>> RDMA device into that net namespace
>> 
>> What do you think ?
> 
> No. one device is not supposed to move other devices.
> Every device is independent that should be moved by explicit command.

Can you show us where we can find this rule "Every device is independent that should be moved by explicit command."?

> 
> Also changes like above breaks the existing orchestration, it no-go.

In a RoCE device, the ib device is tied to the net device. When a net device is moved
to a new net namespace, if the ib device is not in the same net namespace, how can the ib device work?

Thanks and Regards,
Zhu Yanjun
Parav Pandit Oct. 27, 2022, 3:10 a.m. UTC | #18
> From: yanjun.zhu@linux.dev <yanjun.zhu@linux.dev>
> Sent: Wednesday, October 26, 2022 11:08 PM
> 
> October 27, 2022 11:01 AM, "Parav Pandit" <parav@nvidia.com> wrote:
> 
> >> From: Dust Li <dust.li@linux.alibaba.com>
> >> Sent: Wednesday, October 26, 2022 10:31 PM
> >>
> >> 2. else we are in
> >> exclusive mode. When the corresponding netdevice of the RoCE or iWarp
> >> device is moved from one net namespace to another, we move the RDMA
> >> device into that net namespace
> >>
> >> What do you think ?
> >
> > No. one device is not supposed to move other devices.
> > Every device is independent that should be moved by explicit command.
> 
> Can you show us where we can find this rule "Every device is independent
> that should be moved by explicit command."?
> 
> >
> > Also changes like above breaks the existing orchestration, it no-go.
> 
> In a RoCE device, ib device is related with the net device. When a net device
> is moved to a new net namespace, if the ib device is not in the same net
> device, how to make ib device work?
The RDMA device should also be moved to the same net namespace as the netdev.

The steps should be:

$ rdma system set netns exclusive
$ ip netns add NSNAME
$ ip link set [NETDEV] netns NSNAME
$ rdma dev set [ RDMA_DEV ] netns NSNAME
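
As a sketch only, the four steps above could be wrapped in a script that keeps each move explicit. The function names run and move_pair, the DRY_RUN switch, and the example device names (taken from the cover letter's test output) are assumptions for illustration, not part of iproute2.

```shell
#!/bin/sh
# DRY_RUN=1 (the default in this sketch) only prints the commands;
# set DRY_RUN=0 to actually run them (needs root and a real RDMA device).
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Hypothetical wrapper: $1 = namespace, $2 = netdev, $3 = RDMA device.
# The netdev and the RDMA device are each moved by their own command,
# in the order given in the steps above.
move_pair() {
    run rdma system set netns exclusive
    run ip netns add "$1"
    run ip link set "$2" netns "$1"
    run rdma dev set "$3" netns "$1"
}

move_pair net0 enp7s0np1 mlx5_0
```

With the defaults this prints the four commands without touching the system, so the order can be reviewed before running it for real.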
Zhu Yanjun Oct. 27, 2022, 3:17 a.m. UTC | #19
October 27, 2022 11:10 AM, "Parav Pandit" <parav@nvidia.com> wrote:

>> From: yanjun.zhu@linux.dev <yanjun.zhu@linux.dev>
>> Sent: Wednesday, October 26, 2022 11:08 PM
>> 
>> October 27, 2022 11:01 AM, "Parav Pandit" <parav@nvidia.com> wrote:
>> 
>> From: Dust Li <dust.li@linux.alibaba.com>
>> Sent: Wednesday, October 26, 2022 10:31 PM
>> 
>> 2. else we are in
>> exclusive mode. When the corresponding netdevice of the RoCE or iWarp
>> device is moved from one net namespace to another, we move the RDMA
>> device into that net namespace
>> 
>> What do you think ?
>> 
>> No. one device is not supposed to move other devices.
>> Every device is independent that should be moved by explicit command.
>> 
>> Can you show us where we can find this rule "Every device is independent
>> that should be moved by explicit command."?
>> 
>> Also changes like above breaks the existing orchestration, it no-go.
>> 
>> In a RoCE device, ib device is related with the net device. When a net device
>> is moved to a new net namespace, if the ib device is not in the same net
>> device, how to make ib device work?
> 
> RDMA device should also be moved to the same net namespace as that of netdev.

Sure. I know the commands below.

In my commits, the IB devices are moved automatically to the same net namespace as the net devices.

Is that OK?

Zhu Yanjun

> 
> Steps should be,
> 
> $ rdma system set netns exclusive
> $ ip netns add NSNAME
> $ ip link set [NETDEV] netns NSNAME
> $ rdma dev set [ RDMA_DEV ] netns NSNAME
Parav Pandit Oct. 27, 2022, 3:21 a.m. UTC | #20
> From: yanjun.zhu@linux.dev <yanjun.zhu@linux.dev>
> Sent: Wednesday, October 26, 2022 11:17 PM
> 
> October 27, 2022 11:10 AM, "Parav Pandit" <parav@nvidia.com> wrote:
> 
> >> From: yanjun.zhu@linux.dev <yanjun.zhu@linux.dev>
> >> Sent: Wednesday, October 26, 2022 11:08 PM
> >>
> >> October 27, 2022 11:01 AM, "Parav Pandit" <parav@nvidia.com> wrote:
> >>
> >> From: Dust Li <dust.li@linux.alibaba.com>
> >> Sent: Wednesday, October 26, 2022 10:31 PM
> >>
> >> 2. else we are in
> >> exclusive mode. When the corresponding netdevice of the RoCE or iWarp
> >> device is moved from one net namespace to another, we move the RDMA
> >> device into that net namespace
> >>
> >> What do you think ?
> >>
> >> No. one device is not supposed to move other devices.
> >> Every device is independent that should be moved by explicit command.
> >>
> >> Can you show us where we can find this rule "Every device is
> >> independent that should be moved by explicit command."?
> >>
> >> Also changes like above breaks the existing orchestration, it no-go.
> >>
> >> In a RoCE device, ib device is related with the net device. When a
> >> net device is moved to a new net namespace, if the ib device is not
> >> in the same net device, how to make ib device work?
> >
> > RDMA device should also be moved to the same net namespace as that of
> netdev.
> 
> sure. I know the following commands.
> 
> In my commits, the process of moving IB devices to the same net namespace
> with net devices is automatically finished.
> 
> Is it OK?
> 
No.
A change like this breaks user space that expects to move the rdma device to the net namespace explicitly.
It won't find the device, because the device was already moved as part of another device's move.
The currently defined scheme covers at least 3 different types of RDMA devices:
1. IB and IPoIB
2. RoCE
3. iWarp

Each has a somewhat different relation to its net device.
Zhu Yanjun Oct. 27, 2022, 3:39 a.m. UTC | #21
October 27, 2022 11:21 AM, "Parav Pandit" <parav@nvidia.com> wrote:

>> From: yanjun.zhu@linux.dev <yanjun.zhu@linux.dev>
>> Sent: Wednesday, October 26, 2022 11:17 PM
>> 
>> October 27, 2022 11:10 AM, "Parav Pandit" <parav@nvidia.com> wrote:
>> 
>> From: yanjun.zhu@linux.dev <yanjun.zhu@linux.dev>
>> Sent: Wednesday, October 26, 2022 11:08 PM
>> 
>> October 27, 2022 11:01 AM, "Parav Pandit" <parav@nvidia.com> wrote:
>> 
>> From: Dust Li <dust.li@linux.alibaba.com>
>> Sent: Wednesday, October 26, 2022 10:31 PM
>> 
>> 2. else we are in
>> exclusive mode. When the corresponding netdevice of the RoCE or iWarp
>> device is moved from one net namespace to another, we move the RDMA
>> device into that net namespace
>> 
>> What do you think ?
>> 
>> No. one device is not supposed to move other devices.
>> Every device is independent that should be moved by explicit command.
>> 
>> Can you show us where we can find this rule "Every device is
>> independent that should be moved by explicit command."?
>> 
>> Also changes like above breaks the existing orchestration, it no-go.
>> 
>> In a RoCE device, ib device is related with the net device. When a
>> net device is moved to a new net namespace, if the ib device is not
>> in the same net device, how to make ib device work?
>> 
>> RDMA device should also be moved to the same net namespace as that of
>> netdev.
>> 
>> sure. I know the following commands.
>> 
>> In my commits, the process of moving IB devices to the same net namespace
>> with net devices is automatically finished.
>> 
>> Is it OK?
> 
> No.
> Change like this breaks the user space who expect to move the rdma device to the net namespace
> explicitly.

Which specification makes this kind of rule? Where can we find it?

> It wont find the device which got moved as part of some other device movement.
> Currently define scheme covers at least 3 different types of RDMA devices.
> 1. IB and IPoIB
> 2. RoCE
> 3. iWarp
> 
> Each has somewhat different relation to their net device.

IPoIB, RoCE and iWarp each have a somewhat different relation to their net devices.
For RoCE and iWarp devices, the ib devices should be in the same net namespace as the related net devices.
Otherwise we cannot make the ib devices work. This is why I sent out these commits.

Zhu Yanjun
Parav Pandit Oct. 27, 2022, 3:48 a.m. UTC | #22
> From: yanjun.zhu@linux.dev <yanjun.zhu@linux.dev>
> Sent: Wednesday, October 26, 2022 11:39 PM
> 
> October 27, 2022 11:21 AM, "Parav Pandit" <parav@nvidia.com> wrote:
> 
> >> From: yanjun.zhu@linux.dev <yanjun.zhu@linux.dev>
> >> Sent: Wednesday, October 26, 2022 11:17 PM
> >>
> >> October 27, 2022 11:10 AM, "Parav Pandit" <parav@nvidia.com> wrote:
> >>
> >> From: yanjun.zhu@linux.dev <yanjun.zhu@linux.dev>
> >> Sent: Wednesday, October 26, 2022 11:08 PM
> >>
> >> October 27, 2022 11:01 AM, "Parav Pandit" <parav@nvidia.com> wrote:
> >>
> >> From: Dust Li <dust.li@linux.alibaba.com>
> >> Sent: Wednesday, October 26, 2022 10:31 PM
> >>
> >> 2. else we are in
> >> exclusive mode. When the corresponding netdevice of the RoCE or iWarp
> >> device is moved from one net namespace to another, we move the RDMA
> >> device into that net namespace
> >>
> >> What do you think ?
> >>
> >> No. one device is not supposed to move other devices.
> >> Every device is independent that should be moved by explicit command.
> >>
> >> Can you show us where we can find this rule "Every device is
> >> independent that should be moved by explicit command."?
> >>
> >> Also changes like above breaks the existing orchestration, it no-go.
> >>
> >> In a RoCE device, ib device is related with the net device. When a
> >> net device is moved to a new net namespace, if the ib device is not
> >> in the same net device, how to make ib device work?
> >>
> >> RDMA device should also be moved to the same net namespace as that of
> >> netdev.
> >>
> >> sure. I know the following commands.
> >>
> >> In my commits, the process of moving IB devices to the same net
> >> namespace with net devices is automatically finished.
> >>
> >> Is it OK?
> >
> > No.
> > Change like this breaks the user space who expect to move the rdma
> > device to the net namespace explicitly.
> 
> Which specification makes this kind of rule? Where can we find it?
>
The existing ABI defines this, and it has existed for many years now.
To my knowledge, there is no Linux kernel subsystem or module that attempts to move multiple devices using a single command.
When a user executes a command and explicitly gives the device name "foo", only "foo" should move.
Another loosely coupled device with a different life cycle, whose name is not specified in the ip command, should not move along with "foo".

You are trying to define a new rule that breaks the existing ABI and the iproute2 (ip and rdma) command semantics.
It is implicit that when a command is issued on device A, it operates on device A. This is part of how iproute2 functions.

> > It wont find the device which got moved as part of some other device
> movement.
> > Currently define scheme covers at least 3 different types of RDMA devices.
> > 1. IB and IPoIB
> > 2. RoCE
> > 3. iWarp
> >
> > Each has somewhat different relation to their net device.
> 
> IPoIB, RoCE and iWarp are somewhat different relation to their net device.
> To RoCE and iWarp devices, ib devices should be the same net namespace
> with the related net devices.
> Or else we can not make ib devices work. This is why I send out these
> commits.
So please also move the rdma device to the desired net namespace and it will work.
Zhu Yanjun Oct. 27, 2022, 6:01 a.m. UTC | #23
October 27, 2022 11:48 AM, "Parav Pandit" <parav@nvidia.com> wrote:

>> From: yanjun.zhu@linux.dev <yanjun.zhu@linux.dev>
>> Sent: Wednesday, October 26, 2022 11:39 PM
>> 
>> October 27, 2022 11:21 AM, "Parav Pandit" <parav@nvidia.com> wrote:
>> 
>> From: yanjun.zhu@linux.dev <yanjun.zhu@linux.dev>
>> Sent: Wednesday, October 26, 2022 11:17 PM
>> 
>> October 27, 2022 11:10 AM, "Parav Pandit" <parav@nvidia.com> wrote:
>> 
>> From: yanjun.zhu@linux.dev <yanjun.zhu@linux.dev>
>> Sent: Wednesday, October 26, 2022 11:08 PM
>> 
>> October 27, 2022 11:01 AM, "Parav Pandit" <parav@nvidia.com> wrote:
>> 
>> From: Dust Li <dust.li@linux.alibaba.com>
>> Sent: Wednesday, October 26, 2022 10:31 PM
>> 
>> 2. else we are in
>> exclusive mode. When the corresponding netdevice of the RoCE or iWarp
>> device is moved from one net namespace to another, we move the RDMA
>> device into that net namespace
>> 
>> What do you think ?
>> 
>> No. one device is not supposed to move other devices.
>> Every device is independent that should be moved by explicit command.
>> 
>> Can you show us where we can find this rule "Every device is
>> independent that should be moved by explicit command."?
>> 
>> Also changes like above breaks the existing orchestration, it no-go.
>> 
>> In a RoCE device, ib device is related with the net device. When a
>> net device is moved to a new net namespace, if the ib device is not
>> in the same net device, how to make ib device work?
>> 
>> RDMA device should also be moved to the same net namespace as that of
>> netdev.
>> 
>> sure. I know the following commands.
>> 
>> In my commits, the process of moving IB devices to the same net
>> namespace with net devices is automatically finished.
>> 
>> Is it OK?
>> 
>> No.
>> Change like this breaks the user space who expect to move the rdma
>> device to the net namespace explicitly.
>> 
>> Which specification makes this kind of rule? Where can we find it?
> 
> Existing ABI defines this which exists for many years now.

Regarding the ABI, I read through this link: https://en.wikipedia.org/wiki/Application_binary_interface

Details covered by an ABI include the following:

Processor instruction set, with details like register file structure, stack organization, memory access types, etc.

Sizes, layouts, and alignments of basic data types that the processor can directly access

Calling convention, which controls how the arguments of functions are passed, and return values retrieved; for example, it controls the following:
  Whether all parameters are passed on the stack, or some are passed in registers
  Which registers are used for which function parameters
  Whether the first function parameter passed on the stack is pushed first or last

How an application should make system calls to the operating system, and if the ABI specifies direct system calls rather than procedure calls to system call stubs, the system call numbers

In the case of a complete operating system ABI, the binary format of object files, program libraries, etc.

I did not find the rule that you mentioned there.

> There is no Linux kernel subsystem or module to my knowledge that attempt to move multiple devices
> using single command.
> When user executes command , user explicitly give device name "foo", only "foo" should move.
> Other loosely coupled device whose name is not specified in the ip command which has a different
> life cycle should not move along with "foo".
> 
> You are trying to define the new rule that breaks the existing ABI and the iproute2 (ip and rdma)
> command semantics.
> It is implicit that when command is issued on device A, operate on device A. This is part of
> iproute2 functioning.

Regarding iproute2, I read this link: https://wiki.linuxfoundation.org/networking/iproute2#documentation

It does not contain the rule that you mentioned.

Is this rule defined explicitly or implicitly?

Zhu Yanjun
> 
>> It wont find the device which got moved as part of some other device
>> movement.
>> Currently define scheme covers at least 3 different types of RDMA devices.
>> 1. IB and IPoIB
>> 2. RoCE
>> 3. iWarp
>> 
>> Each has somewhat different relation to their net device.
>> 
>> IPoIB, RoCE and iWarp are somewhat different relation to their net device.
>> To RoCE and iWarp devices, ib devices should be the same net namespace
>> with the related net devices.
>> Or else we can not make ib devices work. This is why I send out these
>> commits.
> 
> So please move the rdma device also to the desired net namespace and it will work.
Parav Pandit Oct. 27, 2022, 2:06 p.m. UTC | #24
> From: yanjun.zhu@linux.dev <yanjun.zhu@linux.dev>
> Sent: Thursday, October 27, 2022 2:02 AM
> To: Parav Pandit <parav@nvidia.com>; dust.li@linux.alibaba.com; Zhu Yanjun
> <yanjun.zhu@intel.com>; jgg@ziepe.ca; leon@kernel.org; linux-
> rdma@vger.kernel.org
> Subject: Re: [PATCH 0/3] RDMA net namespace
> 
> October 27, 2022 11:48 AM, "Parav Pandit" <parav@nvidia.com> wrote:
> 
> >> From: yanjun.zhu@linux.dev <yanjun.zhu@linux.dev>
> >> Sent: Wednesday, October 26, 2022 11:39 PM
> >>
> >> October 27, 2022 11:21 AM, "Parav Pandit" <parav@nvidia.com> wrote:
> >>
> >> From: yanjun.zhu@linux.dev <yanjun.zhu@linux.dev>
> >> Sent: Wednesday, October 26, 2022 11:17 PM
> >>
> >> October 27, 2022 11:10 AM, "Parav Pandit" <parav@nvidia.com> wrote:
> >>
> >> From: yanjun.zhu@linux.dev <yanjun.zhu@linux.dev>
> >> Sent: Wednesday, October 26, 2022 11:08 PM
> >>
> >> October 27, 2022 11:01 AM, "Parav Pandit" <parav@nvidia.com> wrote:
> >>
> >> From: Dust Li <dust.li@linux.alibaba.com>
> >> Sent: Wednesday, October 26, 2022 10:31 PM
> >>
> >> 2. else we are in
> >> exclusive mode. When the corresponding netdevice of the RoCE or iWarp
> >> device is moved from one net namespace to another, we move the
> RDMA
> >> device into that net namespace
> >>
> >> What do you think ?
> >>
> >> No. one device is not supposed to move other devices.
> >> Every device is independent that should be moved by explicit command.
> >>
> >> Can you show us where we can find this rule "Every device is
> >> independent that should be moved by explicit command."?
> >>
> >> Also changes like above breaks the existing orchestration, it no-go.
> >>
> 
> And I do not find the rule that you mentioned.
> 
> > There is no Linux kernel subsystem or module to my knowledge that
> > attempt to move multiple devices using single command.
> > When user executes command , user explicitly give device name "foo",
> only "foo" should move.
> > Other loosely coupled device whose name is not specified in the ip
> > command which has a different life cycle should not move along with "foo".
> >
> > You are trying to define the new rule that breaks the existing ABI and
> > the iproute2 (ip and rdma) command semantics.
> > It is implicit that when command is issued on device A, operate on
> > device A. This is part of
> > iproute2 functioning.
> 
> About iproute2, I read this link
> https://wiki.linuxfoundation.org/networking/iproute2#documentation
> 
> There is no rules that you mentioned.
> 
> This rule is defined explicitly or implicitly?
> 
Wiki page links are not the documentation.
The man pages of iproute2 are its documentation; see [1] and [2].

[1] https://man7.org/linux/man-pages/man8/rdma-system.8.html
[2] https://man7.org/linux/man-pages/man8/rdma-dev.8.html

As I explained, the explicit rule that you are looking for would be one that says "when I modify device foo, it can also modify device bar".
No part of the Linux kernel usually does that, unless the device is a representor/control object etc. or has a parent/child relationship.
It is fundamental to a command definition, not a matter of explicit or implicit.

And clearly in this discussion foo and bar are loosely coupled network devices; one is not controlling the other.

Also, an rdma device is attached to multiple net devices: the primary one and other upper devices such as vlan, macvlan etc.
Zhu Yanjun Oct. 28, 2022, 3:21 a.m. UTC | #25
On 2022/10/27 22:06, Parav Pandit wrote:
>> From: yanjun.zhu@linux.dev <yanjun.zhu@linux.dev>
>> Sent: Thursday, October 27, 2022 2:02 AM
>> To: Parav Pandit <parav@nvidia.com>; dust.li@linux.alibaba.com; Zhu Yanjun
>> <yanjun.zhu@intel.com>; jgg@ziepe.ca; leon@kernel.org; linux-
>> rdma@vger.kernel.org
>> Subject: Re: [PATCH 0/3] RDMA net namespace
>>
>> October 27, 2022 11:48 AM, "Parav Pandit" <parav@nvidia.com> wrote:
>>
>>>> From: yanjun.zhu@linux.dev <yanjun.zhu@linux.dev>
>>>> Sent: Wednesday, October 26, 2022 11:39 PM
>>>>
>>>> October 27, 2022 11:21 AM, "Parav Pandit" <parav@nvidia.com> wrote:
>>>>
>>>> From: yanjun.zhu@linux.dev <yanjun.zhu@linux.dev>
>>>> Sent: Wednesday, October 26, 2022 11:17 PM
>>>>
>>>> October 27, 2022 11:10 AM, "Parav Pandit" <parav@nvidia.com> wrote:
>>>>
>>>> From: yanjun.zhu@linux.dev <yanjun.zhu@linux.dev>
>>>> Sent: Wednesday, October 26, 2022 11:08 PM
>>>>
>>>> October 27, 2022 11:01 AM, "Parav Pandit" <parav@nvidia.com> wrote:
>>>>
>>>> From: Dust Li <dust.li@linux.alibaba.com>
>>>> Sent: Wednesday, October 26, 2022 10:31 PM
>>>>
>>>> 2. else we are in
>>>> exclusive mode. When the corresponding netdevice of the RoCE or iWarp
>>>> device is moved from one net namespace to another, we move the
>> RDMA
>>>> device into that net namespace
>>>>
>>>> What do you think ?
>>>>
>>>> No. one device is not supposed to move other devices.
>>>> Every device is independent that should be moved by explicit command.
>>>>
>>>> Can you show us where we can find this rule "Every device is
>>>> independent that should be moved by explicit command."?
>>>>
>>>> Also changes like above breaks the existing orchestration, it no-go.
>>>>
>> And I do not find the rule that you mentioned.
>>
>>> There is no Linux kernel subsystem or module to my knowledge that
>>> attempt to move multiple devices using single command.
>>> When user executes command , user explicitly give device name "foo",
>> only "foo" should move.
>>> Other loosely coupled device whose name is not specified in the ip
>>> command which has a different life cycle should not move along with "foo".
>>>
>>> You are trying to define the new rule that breaks the existing ABI and
>>> the iproute2 (ip and rdma) command semantics.
>>> It is implicit that when command is issued on device A, operate on
>>> device A. This is part of
>>> iproute2 functioning.
>> About iproute2, I read this link
>> https://wiki.linuxfoundation.org/networking/iproute2#documentation
>>
>> There is no rules that you mentioned.
>>
>> This rule is defined explicitly or implicitly?
>>
> Wiki pages links are not the documentation.
> Man pages of the iproute2 is documentation of iproute2 at [1] and [2].
>
> [1] https://man7.org/linux/man-pages/man8/rdma-system.8.html
> [2] https://man7.org/linux/man-pages/man8/rdma-dev.8.html
>
> As I explained, the explicit rule that you are looking for that say "when I modify device foo, it can also modifies the device bar".
> Because no part of the Linux kernel does that usually, unless the device is representor/control object etc or has parent/child relationship.
> It is fundamental to a command definition, not a matter of explicit or implicit.

From the ABI, iproute2 and the current rdma command links, I cannot find
the rule that you mentioned.

Can you tell me the exact link that makes such a definition?

>
> And clearly in this discussion foo and bar are loosely coupled network devices, one is not controlling the other.
>
> Also, a rdma device is attached to multiple net devices, primary and other upper devices such as vlan, macvlan etc.

For a RoCE device, how do you attach an rdma device to a vlan or macvlan?

Regarding "a rdma device is attached to multiple net devices, primary and other
upper devices such as vlan, macvlan etc",
can you show us an example? Is the rdma device a RoCE, iWarp or
ipoib device?

Zhu Yanjun

>
>
Parav Pandit Oct. 28, 2022, 3:31 a.m. UTC | #26
> From: Yanjun Zhu <yanjun.zhu@linux.dev>
> Sent: Thursday, October 27, 2022 11:21 PM
> 
> 
> On 2022/10/27 22:06, Parav Pandit wrote:
> >> From: yanjun.zhu@linux.dev <yanjun.zhu@linux.dev>
> >> Sent: Thursday, October 27, 2022 2:02 AM
> >> To: Parav Pandit <parav@nvidia.com>; dust.li@linux.alibaba.com; Zhu
> >> Yanjun <yanjun.zhu@intel.com>; jgg@ziepe.ca; leon@kernel.org; linux-
> >> rdma@vger.kernel.org
> >> Subject: Re: [PATCH 0/3] RDMA net namespace
> >>
> >> October 27, 2022 11:48 AM, "Parav Pandit" <parav@nvidia.com> wrote:
> >>
> >>>> From: yanjun.zhu@linux.dev <yanjun.zhu@linux.dev>
> >>>> Sent: Wednesday, October 26, 2022 11:39 PM
> >>>>
> >>>> October 27, 2022 11:21 AM, "Parav Pandit" <parav@nvidia.com> wrote:
> >>>>
> >>>> From: yanjun.zhu@linux.dev <yanjun.zhu@linux.dev>
> >>>> Sent: Wednesday, October 26, 2022 11:17 PM
> >>>>
> >>>> October 27, 2022 11:10 AM, "Parav Pandit" <parav@nvidia.com> wrote:
> >>>>
> >>>> From: yanjun.zhu@linux.dev <yanjun.zhu@linux.dev>
> >>>> Sent: Wednesday, October 26, 2022 11:08 PM
> >>>>
> >>>> October 27, 2022 11:01 AM, "Parav Pandit" <parav@nvidia.com> wrote:
> >>>>
> >>>> From: Dust Li <dust.li@linux.alibaba.com>
> >>>> Sent: Wednesday, October 26, 2022 10:31 PM
> >>>>
> >>>> 2. else we are in
> >>>> exclusive mode. When the corresponding netdevice of the RoCE or
> >>>> iWarp device is moved from one net namespace to another, we move
> >>>> the
> >> RDMA
> >>>> device into that net namespace
> >>>>
> >>>> What do you think ?
> >>>>
> >>>> No. one device is not supposed to move other devices.
> >>>> Every device is independent that should be moved by explicit
> command.
> >>>>
> >>>> Can you show us where we can find this rule "Every device is
> >>>> independent that should be moved by explicit command."?
> >>>>
> >>>> Also changes like above breaks the existing orchestration, it no-go.
> >>>>
> >> And I do not find the rule that you mentioned.
> >>
> >>> There is no Linux kernel subsystem or module to my knowledge that
> >>> attempt to move multiple devices using single command.
> >>> When user executes command , user explicitly give device name "foo",
> >> only "foo" should move.
> >>> Other loosely coupled device whose name is not specified in the ip
> >>> command which has a different life cycle should not move along with
> "foo".
> >>>
> >>> You are trying to define the new rule that breaks the existing ABI
> >>> and the iproute2 (ip and rdma) command semantics.
> >>> It is implicit that when command is issued on device A, operate on
> >>> device A. This is part of
> >>> iproute2 functioning.
> >> About iproute2, I read this link
> >> https://wiki.linuxfoundation.org/networking/iproute2#documentation
> >>
> >> There is no rules that you mentioned.
> >>
> >> This rule is defined explicitly or implicitly?
> >>
> > Wiki pages links are not the documentation.
> > Man pages of the iproute2 is documentation of iproute2 at [1] and [2].
> >
> > [1] https://man7.org/linux/man-pages/man8/rdma-system.8.html
> > [2] https://man7.org/linux/man-pages/man8/rdma-dev.8.html
> >
> > As I explained, the explicit rule that you are looking for that say "when I
> modify device foo, it can also modifies the device bar".
> > Because no part of the Linux kernel does that usually, unless the device is
> representor/control object etc or has parent/child relationship.
> > It is fundamental to a command definition, not a matter of explicit or
> implicit.
> 
>  From the ABI, iproute2 and current rdma command links, I can not find the
> rule that you mentioned.
> 
> Can you tell me the exact link that make such definition?
> 
I already explained this above. You are repeating your weird question.

Can you show one iproute2 example where you specify a command on device A and the kernel operates on devices A, P, Q and R?
This is what you are attempting to do, for unknown reasons.

So, can you please explain what the problem is with using the existing rdma dev commands that move an rdma device to a net namespace?
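
For reference, a minimal sketch of the explicit, per-device flow referred to here, based on the rdma-system(8) and rdma-dev(8) man pages cited above. The names mlx5_0, enp7s0np1 and net0 are taken from the cover letter; the helper functions `run` and `move_rdma_and_netdev` are hypothetical and exist only for this illustration. The script prints the commands by default (dry run) rather than executing them.

```shell
#!/bin/sh
# Dry run by default: each command is printed, not executed.
# Set DRY_RUN=0 (as root, on a host with an RDMA device) to really run them.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Hypothetical helper: move the rdma device and its netdev with one
# explicit command per device.
move_rdma_and_netdev() {
    ibdev=$1
    netdev=$2
    ns=$3
    # Assumption: exclusive mode has to be selected while only the initial
    # net namespace exists, so it is set before the namespace is created.
    run rdma system set netns exclusive
    run ip netns add "$ns"
    run rdma dev set "$ibdev" netns "$ns"      # moves only the rdma device
    run ip link set "$netdev" netns "$ns"      # moves only the net device
    run ip netns exec "$ns" rdma link          # check the result inside $ns
}

move_rdma_and_netdev mlx5_0 enp7s0np1 net0
```

Each device is named in the command that moves it, so existing orchestration that issues both commands keeps working unchanged.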

> >
> > And clearly in this discussion foo and bar are loosely coupled network
> devices, one is not controlling the other.
> >
> > Also, a rdma device is attached to multiple net devices, primary and other
> upper devices such as vlan, macvlan etc.
> 
> To a RoCE device, how to attach a rdma device to vlan, macvlan?
> 
> To "a rdma device is attached to multiple net devices, primary and other
> upper devices such as vlan, macvlan etc",
> 
> Can you show us an example? The rdma device is RoCE device, iWarp or ipoib
> device?
The rdma device is a RoCE device.
Add a vlan or macvlan device on top of the netdevice linked to the RoCE device using iproute2.
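
A minimal sketch of what that could look like, assuming the netdev enp7s0np1 from the cover letter. The VLAN id 100, the upper device names and the helper functions `run` and `add_upper_devices` are arbitrary choices for illustration only. The script prints the commands by default (dry run).

```shell
#!/bin/sh
# Dry run by default: each command is printed, not executed.
# Set DRY_RUN=0 (as root) to really run them.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Hypothetical helper: stack a vlan and a macvlan on top of the netdev
# that is linked to the RoCE device.
add_upper_devices() {
    netdev=$1
    run ip link add link "$netdev" name "$netdev.100" type vlan id 100
    run ip link add link "$netdev" name macvlan0 type macvlan mode bridge
    # The rdma link still reports a single netdev, although the upper
    # devices now sit on top of it.
    run rdma link show
}

add_upper_devices enp7s0np1
```

With such a stack in place there are several net devices related to one RoCE device, and each of them can be moved to a different net namespace independently.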
Zhu Yanjun Oct. 28, 2022, 3:49 a.m. UTC | #27
On 2022/10/28 11:31, Parav Pandit wrote:
>> From: Yanjun Zhu <yanjun.zhu@linux.dev>
>> Sent: Thursday, October 27, 2022 11:21 PM
>>
>>
>> On 2022/10/27 22:06, Parav Pandit wrote:
>>>> From: yanjun.zhu@linux.dev <yanjun.zhu@linux.dev>
>>>> Sent: Thursday, October 27, 2022 2:02 AM
>>>> To: Parav Pandit <parav@nvidia.com>; dust.li@linux.alibaba.com; Zhu
>>>> Yanjun <yanjun.zhu@intel.com>; jgg@ziepe.ca; leon@kernel.org; linux-
>>>> rdma@vger.kernel.org
>>>> Subject: Re: [PATCH 0/3] RDMA net namespace
>>>>
>>>> October 27, 2022 11:48 AM, "Parav Pandit" <parav@nvidia.com> wrote:
>>>>
>>>>>> From: yanjun.zhu@linux.dev <yanjun.zhu@linux.dev>
>>>>>> Sent: Wednesday, October 26, 2022 11:39 PM
>>>>>>
>>>>>> October 27, 2022 11:21 AM, "Parav Pandit" <parav@nvidia.com> wrote:
>>>>>>
>>>>>> From: yanjun.zhu@linux.dev <yanjun.zhu@linux.dev>
>>>>>> Sent: Wednesday, October 26, 2022 11:17 PM
>>>>>>
>>>>>> October 27, 2022 11:10 AM, "Parav Pandit" <parav@nvidia.com> wrote:
>>>>>>
>>>>>> From: yanjun.zhu@linux.dev <yanjun.zhu@linux.dev>
>>>>>> Sent: Wednesday, October 26, 2022 11:08 PM
>>>>>>
>>>>>> October 27, 2022 11:01 AM, "Parav Pandit" <parav@nvidia.com> wrote:
>>>>>>
>>>>>> From: Dust Li <dust.li@linux.alibaba.com>
>>>>>> Sent: Wednesday, October 26, 2022 10:31 PM
>>>>>>
>>>>>> 2. else we are in
>>>>>> exclusive mode. When the corresponding netdevice of the RoCE or
>>>>>> iWarp device is moved from one net namespace to another, we move
>>>>>> the
>>>> RDMA
>>>>>> device into that net namespace
>>>>>>
>>>>>> What do you think ?
>>>>>>
>>>>>> No. one device is not supposed to move other devices.
>>>>>> Every device is independent that should be moved by explicit
>> command.
>>>>>> Can you show us where we can find this rule "Every device is
>>>>>> independent that should be moved by explicit command."?
>>>>>>
>>>>>> Also changes like above breaks the existing orchestration, it no-go.
>>>>>>
>>>> And I do not find the rule that you mentioned.
>>>>
>>>>> There is no Linux kernel subsystem or module to my knowledge that
>>>>> attempt to move multiple devices using single command.
>>>>> When user executes command , user explicitly give device name "foo",
>>>> only "foo" should move.
>>>>> Other loosely coupled device whose name is not specified in the ip
>>>>> command which has a different life cycle should not move along with
>> "foo".
>>>>> You are trying to define the new rule that breaks the existing ABI
>>>>> and the iproute2 (ip and rdma) command semantics.
>>>>> It is implicit that when command is issued on device A, operate on
>>>>> device A. This is part of
>>>>> iproute2 functioning.
>>>> About iproute2, I read this link
>>>> https://wiki.linuxfoundation.org/networking/iproute2#documentation
>>>>
>>>> There is no rules that you mentioned.
>>>>
>>>> This rule is defined explicitly or implicitly?
>>>>
>>> Wiki pages links are not the documentation.
>>> Man pages of the iproute2 is documentation of iproute2 at [1] and [2].
>>>
>>> [1] https://man7.org/linux/man-pages/man8/rdma-system.8.html
>>> [2] https://man7.org/linux/man-pages/man8/rdma-dev.8.html
>>>
>>> As I explained, the explicit rule that you are looking for that say "when I
>> modify device foo, it can also modifies the device bar".
>>> Because no part of the Linux kernel does that usually, unless the device is
>> representor/control object etc or has parent/child relationship.
>>> It is fundamental to a command definition, not a matter of explicit or
>> implicit.
>>
>>   From the ABI, iproute2 and current rdma command links, I can not find the
>> rule that you mentioned.
>>
>> Can you tell me the exact link that make such definition?
>>
> I explained you already above. You are repeating your weird question.

You mentioned that it is a rule. I just want to know where it is defined.

>
> Can you show one iproute2 example, where you specify a command on device A, and kernel operates on device, A, P, Q, R?

When you add a net device to a bonding device, you will find changes on
both the bonding device and the net device.

There are other commands like this as well.

Zhu Yanjun

> This is the attempt you are trying to do for unknown reasons.
>
> So, can you please explain, what is the problem in using existing rdma dev commands that move rdma device to net namespace?
>
>>> And clearly in this discussion foo and bar are loosely coupled network
>> devices, one is not controlling the other.
>>> Also, a rdma device is attached to multiple net devices, primary and other
>> upper devices such as vlan, macvlan etc.
>>
>> To a RoCE device, how to attach a rdma device to vlan, macvlan?
>>
>> To "a rdma device is attached to multiple net devices, primary and other
>> upper devices such as vlan, macvlan etc",
>>
>> Can you show us an example? The rdma device is RoCE device, iWarp or ipoib
>> device?
> Rdma device is roce device.
> Add vlan, macvlan device on top of the netdevice linked to the roce device using iproute2.
Parav Pandit Oct. 28, 2022, 3:58 a.m. UTC | #28
> From: Yanjun Zhu <yanjun.zhu@linux.dev>
> Sent: Thursday, October 27, 2022 11:49 PM

> > Can you show one iproute2 example, where you specify a command on
> device A, and kernel operates on device, A, P, Q, R?
> 
> When you add a net devices to a bonding device, you will find changes on the
> bonding device and the net devices.
> 
> Or some other commands like this.
That doesn’t count; as I explained, it is more of a parent-child or similar control relationship, unlike rdma and netdevice, which are loosely coupled devices.

Also, when you move an underlying netdev interface of a bond device, the bond device doesn’t automatically move to the new net ns.
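
A minimal sketch for checking this on a test machine. The interface name eth1, the bond name bond0, the namespace net0 and the helper functions `run` and `bond_move_check` are placeholders chosen for illustration. The script prints the commands by default (dry run).

```shell
#!/bin/sh
# Dry run by default: each command is printed, not executed.
# Set DRY_RUN=0 (as root, on a disposable test machine) to really run them.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Hypothetical helper: enslave a netdev to a bond, then move only the
# underlying netdev to another net namespace.
bond_move_check() {
    slave=$1
    ns=$2
    run ip link add bond0 type bond
    run ip link set "$slave" down              # interface is taken down before enslaving
    run ip link set "$slave" master bond0
    run ip netns add "$ns"
    run ip link set "$slave" netns "$ns"       # only the underlying netdev is named
    # Per the statement above, bond0 is expected to still be listed here,
    # in the original namespace.
    run ip link show bond0
    run ip netns exec "$ns" ip link show "$slave"
}

bond_move_check eth1 net0
```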
Zhu Yanjun Nov. 11, 2022, 2:38 a.m. UTC | #29
On 2022/10/28 11:58, Parav Pandit wrote:
> 
>> From: Yanjun Zhu <yanjun.zhu@linux.dev>
>> Sent: Thursday, October 27, 2022 11:49 PM
> 
>>> Can you show one iproute2 example, where you specify a command on
>> device A, and kernel operates on device, A, P, Q, R?
>>
>> When you add a net devices to a bonding device, you will find changes on the
>> bonding device and the net devices.
>>
>> Or some other commands like this.
> That doesn’t count as I explained that it is more parent-child or similar control relationship, unlike rdma and netdevice as loosely coupled devices.

OK. I will follow your advice.

Zhu Yanjun

> 
> Also when you move a underlaying netdev interface of bond device, bond device doesn’t automatically move to new net ns.