diff mbox series

RDMA: don't ignore client->add() failures

Message ID 0e031582-aed6-58ee-3477-6d787f06560a@I-love.SAKURA.ne.jp (mailing list archive)
State Changes Requested
Delegated to: Jason Gunthorpe
Headers show
Series RDMA: don't ignore client->add() failures | expand

Commit Message

Tetsuo Handa March 28, 2023, 10:52 a.m. UTC
syzbot is reporting refcount leak when client->add() from
add_client_context() fails, for commit 11a0ae4c4bff ("RDMA: Allow
ib_client's to fail when add() is called") for unknown reason ignores
error from client->add(). We need to return an error in order to indicate
that client could not be added, otherwise the caller will get confused.

Reported-by: syzbot <syzbot+5e70d01ee8985ae62a3b@syzkaller.appspotmail.com>
Link: https://syzkaller.appspot.com/bug?extid=5e70d01ee8985ae62a3b
Fixes: 11a0ae4c4bff ("RDMA: Allow ib_client's to fail when add() is called")
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
---
 drivers/infiniband/core/device.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

Comments

Jason Gunthorpe March 28, 2023, 11:24 a.m. UTC | #1
On Tue, Mar 28, 2023 at 07:52:32PM +0900, Tetsuo Handa wrote:
> syzbot is reporting refcount leak when client->add() from
> add_client_context() fails, for commit 11a0ae4c4bff ("RDMA: Allow
> ib_client's to fail when add() is called") for unknown reason ignores
> error from client->add(). We need to return an error in order to indicate
> that client could not be added, otherwise the caller will get confused.
> 
> Reported-by: syzbot <syzbot+5e70d01ee8985ae62a3b@syzkaller.appspotmail.com>
> Link: https://syzkaller.appspot.com/bug?extid=5e70d01ee8985ae62a3b
> Fixes: 11a0ae4c4bff ("RDMA: Allow ib_client's to fail when add() is called")
> Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
> ---
>  drivers/infiniband/core/device.c | 10 +++++-----
>  1 file changed, 5 insertions(+), 5 deletions(-)

This doesn't make any sense, you need to explain what "caller will get
confused" actually means

Jason
Tetsuo Handa March 28, 2023, 11:33 a.m. UTC | #2
On 2023/03/28 20:24, Jason Gunthorpe wrote:
> On Tue, Mar 28, 2023 at 07:52:32PM +0900, Tetsuo Handa wrote:
>> syzbot is reporting refcount leak when client->add() from
>> add_client_context() fails, for commit 11a0ae4c4bff ("RDMA: Allow
>> ib_client's to fail when add() is called") for unknown reason ignores
>> error from client->add(). We need to return an error in order to indicate
>> that client could not be added, otherwise the caller will get confused.
>>
>> Reported-by: syzbot <syzbot+5e70d01ee8985ae62a3b@syzkaller.appspotmail.com>
>> Link: https://syzkaller.appspot.com/bug?extid=5e70d01ee8985ae62a3b
>> Fixes: 11a0ae4c4bff ("RDMA: Allow ib_client's to fail when add() is called")
>> Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
>> ---
>>  drivers/infiniband/core/device.c | 10 +++++-----
>>  1 file changed, 5 insertions(+), 5 deletions(-)
> 
> This doesn't make any sense, you need to explain what "caller will get
> confused" actually means

The caller cannot perform cleanup upon error, which in turn manifests as
"unregister_netdevice: waiting for DEV to become free" message.
Please check https://lkml.kernel.org/r/710ef3ef-cd0a-5630-a68f-628d123180bf@I-love.SAKURA.ne.jp .
Jason Gunthorpe March 28, 2023, 11:34 a.m. UTC | #3
On Tue, Mar 28, 2023 at 08:33:21PM +0900, Tetsuo Handa wrote:
> On 2023/03/28 20:24, Jason Gunthorpe wrote:
> > On Tue, Mar 28, 2023 at 07:52:32PM +0900, Tetsuo Handa wrote:
> >> syzbot is reporting refcount leak when client->add() from
> >> add_client_context() fails, for commit 11a0ae4c4bff ("RDMA: Allow
> >> ib_client's to fail when add() is called") for unknown reason ignores
> >> error from client->add(). We need to return an error in order to indicate
> >> that client could not be added, otherwise the caller will get confused.
> >>
> >> Reported-by: syzbot <syzbot+5e70d01ee8985ae62a3b@syzkaller.appspotmail.com>
> >> Link: https://syzkaller.appspot.com/bug?extid=5e70d01ee8985ae62a3b
> >> Fixes: 11a0ae4c4bff ("RDMA: Allow ib_client's to fail when add() is called")
> >> Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
> >> ---
> >>  drivers/infiniband/core/device.c | 10 +++++-----
> >>  1 file changed, 5 insertions(+), 5 deletions(-)
> > 
> > This doesn't make any sense, you need to explain what "caller will get
> > confused" actually means
> 
> The caller cannot perform cleanup upon error, which in turn manifests as
> "unregister_netdevice: waiting for DEV to become free" message.
> Please check https://lkml.kernel.org/r/710ef3ef-cd0a-5630-a68f-628d123180bf@I-love.SAKURA.ne.jp .

Describe exactly where the "cannot perform cleanup upon error" is.

Jason
Tetsuo Handa March 28, 2023, 2:59 p.m. UTC | #4
On 2023/03/28 20:34, Jason Gunthorpe wrote:
> On Tue, Mar 28, 2023 at 08:33:21PM +0900, Tetsuo Handa wrote:
>> On 2023/03/28 20:24, Jason Gunthorpe wrote:
>>> On Tue, Mar 28, 2023 at 07:52:32PM +0900, Tetsuo Handa wrote:
>>>> syzbot is reporting refcount leak when client->add() from
>>>> add_client_context() fails, for commit 11a0ae4c4bff ("RDMA: Allow
>>>> ib_client's to fail when add() is called") for unknown reason ignores
>>>> error from client->add(). We need to return an error in order to indicate
>>>> that client could not be added, otherwise the caller will get confused.
>>>>
>>>> Reported-by: syzbot <syzbot+5e70d01ee8985ae62a3b@syzkaller.appspotmail.com>
>>>> Link: https://syzkaller.appspot.com/bug?extid=5e70d01ee8985ae62a3b
>>>> Fixes: 11a0ae4c4bff ("RDMA: Allow ib_client's to fail when add() is called")
>>>> Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
>>>> ---
>>>>  drivers/infiniband/core/device.c | 10 +++++-----
>>>>  1 file changed, 5 insertions(+), 5 deletions(-)
>>>
>>> This doesn't make any sense, you need to explain what "caller will get
>>> confused" actually means
>>
>> The caller cannot perform cleanup upon error, which in turn manifests as
>> "unregister_netdevice: waiting for DEV to become free" message.
>> Please check https://lkml.kernel.org/r/710ef3ef-cd0a-5630-a68f-628d123180bf@I-love.SAKURA.ne.jp .
> 
> Describe exactly where the "cannot perform cleanup upon error" is.
> 

Here is a debug printk() patch for explanation.
----------------------------------------
diff --git a/drivers/infiniband/core/device.c b/drivers/infiniband/core/device.c
index a666847bd714..2b9fc90d5cc4 100644
--- a/drivers/infiniband/core/device.c
+++ b/drivers/infiniband/core/device.c
@@ -1327,15 +1327,21 @@ static int enable_device_and_get(struct ib_device *device)
 			goto out;
 	}
 
+	pr_info("before add_client_context() loop: device=%p usage=%d\n", device, refcount_read(&device->refcount));
+	ret = 0;
 	down_read(&clients_rwsem);
 	xa_for_each_marked (&clients, index, client, CLIENT_REGISTERED) {
-		ret = add_client_context(device, client);
-		if (ret)
-			break;
+		const int ret2 = add_client_context(device, client);
+		pr_info("add_client_context()=%d\n", ret2);
+		if (!ret)
+			ret = ret2;
 	}
 	up_read(&clients_rwsem);
-	if (!ret)
+	pr_info("after add_client_context() loop: ret=%d device=%p usage=%d\n", ret, device, refcount_read(&device->refcount));
+	if (!ret) {
 		ret = add_compat_devs(device);
+	}
+	pr_info("return from add_client_context(): ret=%d device=%p usage=%d\n", ret, device, refcount_read(&device->refcount));
 out:
 	up_read(&devices_rwsem);
 	return ret;
@@ -1417,7 +1423,9 @@ int ib_register_device(struct ib_device *device, const char *name,
 		goto dev_cleanup;
 	}
 
+	pr_info("before0: device=%p usage=%d\n", device, refcount_read(&device->refcount));
 	ret = enable_device_and_get(device);
+	pr_info("after0: ret=%d device=%p usage=%d\n", ret, device, refcount_read(&device->refcount));
 	if (ret) {
 		void (*dealloc_fn)(struct ib_device *);
 
@@ -1435,7 +1443,9 @@ int ib_register_device(struct ib_device *device, const char *name,
 		dealloc_fn = device->ops.dealloc_driver;
 		device->ops.dealloc_driver = prevent_dealloc_device;
 		ib_device_put(device);
+		pr_info("after1: device=%p usage=%d\n", device, refcount_read(&device->refcount));
 		__ib_unregister_device(device);
+		pr_info("after2: device=%p usage=%d\n", device, refcount_read(&device->refcount));
 		device->ops.dealloc_driver = dealloc_fn;
 		dev_set_uevent_suppress(&device->dev, false);
 		return ret;
diff --git a/drivers/infiniband/sw/siw/siw_main.c b/drivers/infiniband/sw/siw/siw_main.c
index dacc174604bf..bc353b254257 100644
--- a/drivers/infiniband/sw/siw/siw_main.c
+++ b/drivers/infiniband/sw/siw/siw_main.c
@@ -519,7 +519,9 @@ static int siw_newlink(const char *basedev_name, struct net_device *netdev)
 		else
 			sdev->state = IB_PORT_DOWN;
 
+		pr_info("before siw_device_register(): device=%p usage=%d\n", &sdev->base_dev, refcount_read(&sdev->base_dev.refcount));
 		rv = siw_device_register(sdev, basedev_name);
+		pr_info("after siw_device_register(): rv=%d device=%p usage=%d\n", rv, &sdev->base_dev, refcount_read(&sdev->base_dev.refcount));
 		if (rv)
 			ib_dealloc_device(&sdev->base_dev);
 	}
----------------------------------------

Without this patch, __ib_unregister_device(device) is not called because
enable_device_and_get() returns 0 because add_client_context() returns 0
because add_client_context() ignores client->add() faiulures. As a result,
device's refcount remains 7, which later prevents unregister_netdevice()
 from unregistering this device.
----------------------------------------
[   32.366225][ T3491] netdevsim netdevsim0 netdevsim0: set [1, 0] type 2 family 0 port 6081 - 0
[   32.369414][ T3491] netdevsim netdevsim0 netdevsim1: set [1, 0] type 2 family 0 port 6081 - 0
[   32.372875][ T3491] netdevsim netdevsim0 netdevsim2: set [1, 0] type 2 family 0 port 6081 - 0
[   32.376025][ T3491] netdevsim netdevsim0 netdevsim3: set [1, 0] type 2 family 0 port 6081 - 0
[   32.390366][   T22] IPv6: ADDRCONF(NETDEV_CHANGE): veth1_to_batadv: link becomes ready
[   32.394826][ T3491] before siw_device_register(): device=ffff888108206000 usage=0
[   32.397861][ T3491] before0: device=ffff888108206000 usage=0
[   32.400057][ T3491] before add_client_context() loop: device=ffff888108206000 usage=2
[   32.407313][ T3491] add_client_context()=0
[   32.409077][ T3491] add_client_context()=0
[   32.411592][ T3491] add_client_context()=0
[   32.413276][ T3491] add_client_context()=0
[   32.414931][ T3491] add_client_context()=0
[   32.416596][ T3491] add_client_context()=0
[   32.418288][ T3491] iwpm_register_pid: Unable to send a nlmsg (client = 2)
[   32.423042][ T3491] infiniband syj1: RDMA CMA: cma_listen_on_dev, error -98
[   32.425622][ T3491] add_client_context()=0
[   32.428176][ T3491] add_client_context()=0
[   32.429835][ T3491] add_client_context()=0
[   32.435844][ T3491] add_client_context()=0
[   32.438270][ T3491] add_client_context()=0
[   32.439967][ T3491] add_client_context()=0
[   32.441667][ T3491] add_client_context()=0
[   32.443575][ T3491] add_client_context()=0
[   32.445507][ T3491] add_client_context()=0
[   32.447142][ T3491] after add_client_context() loop: ret=0 device=ffff888108206000 usage=8
[   32.450930][ T3491] return from add_client_context(): ret=0 device=ffff888108206000 usage=8
[   32.454207][ T3491] after0: ret=0 device=ffff888108206000 usage=8
[   32.456541][ T3491] after siw_device_register(): rv=0 device=ffff888108206000 usage=7
[  172.605175][ T3491] unregister_netdevice: waiting for vlan0 (ffff8881075ba000) to become free. Usage count = 2
[  172.634645][ T3491] leaked reference.
[  172.645548][ T3491]  ib_device_set_netdev+0x20c/0x3e0
[  172.651096][ T3491]  siw_newlink+0x2e0/0x540
[  172.659445][ T3491]  nldev_newlink+0x31f/0x3c0
[  172.666976][ T3491]  rdma_nl_rcv+0x379/0x4a0
[  172.683027][ T3491]  netlink_unicast+0x3b8/0x480
[  172.693245][ T3491]  netlink_sendmsg+0x574/0x660
[  172.702226][ T3491]  ____sys_sendmsg+0x2b7/0x3d0
[  172.715316][ T3491]  __sys_sendmsg+0x352/0x3c0
[  172.740055][ T3491]  do_syscall_64+0x41/0x90
[  172.788179][ T3491]  entry_SYSCALL_64_after_hwframe+0x63/0xcd
----------------------------------------

With this patch, __ib_unregister_device(device) is called because
enable_device_and_get() returns error because add_client_context() returns error
because add_client_context() does not ignore client->add() faiulures. As a result,
device's refcount properly drops to 0, which allows unregister_netdevice() to
unregister this device.
----------------------------------------
[   26.351298][ T3504] netdevsim netdevsim0 netdevsim0: set [1, 0] type 2 family 0 port 6081 - 0
[   26.373714][ T3504] netdevsim netdevsim0 netdevsim1: set [1, 0] type 2 family 0 port 6081 - 0
[   26.376967][ T3504] netdevsim netdevsim0 netdevsim2: set [1, 0] type 2 family 0 port 6081 - 0
[   26.380105][ T3504] netdevsim netdevsim0 netdevsim3: set [1, 0] type 2 family 0 port 6081 - 0
[   26.383755][  T160] IPv6: ADDRCONF(NETDEV_CHANGE): veth1_to_batadv: link becomes ready
[   26.396376][ T3504] before siw_device_register(): device=ffff8880075fc000 usage=0
[   26.402221][ T3504] before0: device=ffff8880075fc000 usage=0
[   26.404461][ T3504] before add_client_context() loop: device=ffff8880075fc000 usage=2
[   26.408205][ T3504] add_client_context()=-95
[   26.416782][ T3504] add_client_context()=-95
[   26.419042][ T3504] add_client_context()=-95
[   26.420843][ T3504] add_client_context()=0
[   26.422506][ T3504] add_client_context()=0
[   26.424163][ T3504] add_client_context()=-95
[   26.425958][ T3504] iwpm_register_pid: Unable to send a nlmsg (client = 2)
[   26.428983][ T3504] infiniband syj1: RDMA CMA: cma_listen_on_dev, error -98
[   26.431641][ T3504] add_client_context()=-98
[   26.433457][ T3504] add_client_context()=-95
[   26.435729][ T3504] add_client_context()=0
[   26.438326][ T3504] add_client_context()=0
[   26.439989][ T3504] add_client_context()=0
[   26.441629][ T3504] add_client_context()=-95
[   26.443411][ T3504] add_client_context()=0
[   26.445082][ T3504] add_client_context()=-95
[   26.446974][ T3504] add_client_context()=-95
[   26.449578][ T3504] after add_client_context() loop: ret=-95 device=ffff8880075fc000 usage=8
[   26.452833][ T3504] return from add_client_context(): ret=-95 device=ffff8880075fc000 usage=8
[   26.456563][ T3504] after0: ret=-95 device=ffff8880075fc000 usage=8
[   26.459548][ T3504] after1: device=ffff8880075fc000 usage=7
[   26.462257][ T3504] after2: device=ffff8880075fc000 usage=0
[   26.464377][ T3504] siw: device registration error -95
[   26.466639][ T3504] after siw_device_register(): rv=-95 device=ffff8880075fc000 usage=0
[   29.593478][  T821] netdevsim netdevsim0 netdevsim3 (unregistering): unset [1, 0] type 2 family 0 port 6081 - 0
[   32.039458][  T821] netdevsim netdevsim0 netdevsim2 (unregistering): unset [1, 0] type 2 family 0 port 6081 - 0
[   32.099082][  T821] netdevsim netdevsim0 netdevsim1 (unregistering): unset [1, 0] type 2 family 0 port 6081 - 0
[   32.138972][  T821] netdevsim netdevsim0 netdevsim0 (unregistering): unset [1, 0] type 2 family 0 port 6081 - 0
----------------------------------------
Jason Gunthorpe March 28, 2023, 4:18 p.m. UTC | #5
On Tue, Mar 28, 2023 at 11:59:48PM +0900, Tetsuo Handa wrote:
> Without this patch, __ib_unregister_device(device) is not called because
> enable_device_and_get() returns 0 because add_client_context() returns 0
> because add_client_context() ignores client->add() faiulures. As a result,
> device's refcount remains 7, which later prevents unregister_netdevice()
>  from unregistering this device.

That is completely correct, the device was successfully registered
without one of the clients.

It seems to me all this has done done is make it so the device doesn't
register and that will hide the refcount bug because the bug is
actually something that happens during operation - and we never get to
operation now.

Jason
Tetsuo Handa March 28, 2023, 10:17 p.m. UTC | #6
On 2023/03/29 1:18, Jason Gunthorpe wrote:
> On Tue, Mar 28, 2023 at 11:59:48PM +0900, Tetsuo Handa wrote:
>> Without this patch, __ib_unregister_device(device) is not called because
>> enable_device_and_get() returns 0 because add_client_context() returns 0
>> because add_client_context() ignores client->add() failures. As a result,
>> device's refcount remains 7, which later prevents unregister_netdevice()
>>  from unregistering this device.
> 
> That is completely correct, the device was successfully registered
> without one of the clients.

ib_register_device() is responsible for unregistering that device (by calling
__ib_unregister_device(device)) if ib_register_device() failed to
initialize a device, isn't it?

> 
> It seems to me all this has done done is make it so the device doesn't
> register and that will hide the refcount bug because the bug is
> actually something that happens during operation - and we never get to
> operation now.

If ib_register_device() were not responsible for unregistering that device
when ib_register_device() failed to initialize a device, who is responsible
for unregistering that device?

The caller of ib_register_device() (i.e. siw_device_register() from
siw_newlink()) is assuming that somebody will call __ib_unregister_device(),
but nobody is calling __ib_unregister_device().
Jason Gunthorpe March 29, 2023, 3:44 p.m. UTC | #7
On Wed, Mar 29, 2023 at 07:17:26AM +0900, Tetsuo Handa wrote:
> On 2023/03/29 1:18, Jason Gunthorpe wrote:
> > On Tue, Mar 28, 2023 at 11:59:48PM +0900, Tetsuo Handa wrote:
> >> Without this patch, __ib_unregister_device(device) is not called because
> >> enable_device_and_get() returns 0 because add_client_context() returns 0
> >> because add_client_context() ignores client->add() failures. As a result,
> >> device's refcount remains 7, which later prevents unregister_netdevice()
> >>  from unregistering this device.
> > 
> > That is completely correct, the device was successfully registered
> > without one of the clients.
> 
> ib_register_device() is responsible for unregistering that device (by calling
> __ib_unregister_device(device)) if ib_register_device() failed to
> initialize a device, isn't it?

Yes, if it fails. It isn't returning a failure code, so of course it
doesn't fail.

You changed this code and forced it to always fail, so of course it
radically changes everything about the bug you are looking at.

> The caller of ib_register_device() (i.e. siw_device_register() from
> siw_newlink()) is assuming that somebody will call __ib_unregister_device(),
> but nobody is calling __ib_unregister_device().

On the success path this stuff happens during dellink

Jason
Tetsuo Handa March 29, 2023, 11:51 p.m. UTC | #8
On 2023/03/30 0:44, Jason Gunthorpe wrote:
>> The caller of ib_register_device() (i.e. siw_device_register() from
>> siw_newlink()) is assuming that somebody will call __ib_unregister_device(),
>> but nobody is calling __ib_unregister_device().
> 
> On the success path this stuff happens during dellink
> 

"struct rdma_link_ops" has "newlink" callback but does not have "dellink" callback.
"struct rtnl_link_ops" has both "newlink" callback and "dellink" callback, but
only ipoib_netlink.c defines it inside drivers/infiniband/ directory.

Where is the dellink you are talking about?
I'm not familiar with rdma code. Please explain using specific function/symbol names.
Jason Gunthorpe March 30, 2023, 11:39 p.m. UTC | #9
On Thu, Mar 30, 2023 at 08:51:44AM +0900, Tetsuo Handa wrote:
> On 2023/03/30 0:44, Jason Gunthorpe wrote:
> >> The caller of ib_register_device() (i.e. siw_device_register() from
> >> siw_newlink()) is assuming that somebody will call __ib_unregister_device(),
> >> but nobody is calling __ib_unregister_device().
> > 
> > On the success path this stuff happens during dellink
> > 
> 
> "struct rdma_link_ops" has "newlink" callback but does not have "dellink" callback.
> "struct rtnl_link_ops" has both "newlink" callback and "dellink" callback, but
> only ipoib_netlink.c defines it inside drivers/infiniband/ directory.
> 
> Where is the dellink you are talking about?
> I'm not familiar with rdma code. Please explain using specific function/symbol names.

Your test code deletes a vlan

If SIW is attached to a vlan then SIW will destroy itself based on a
net notifier

Look at siw_netdev_event:

	case NETDEV_UNREGISTER:
		ib_unregister_device_queued(&sdev->base_dev);
		break;

Jason
Tetsuo Handa March 31, 2023, 4:19 p.m. UTC | #10
On 2023/03/31 8:39, Jason Gunthorpe wrote:
> Look at siw_netdev_event:
> 
> 	case NETDEV_UNREGISTER:
> 		ib_unregister_device_queued(&sdev->base_dev);
> 		break;

I see. We can observe that

  net vlan0: siw: event 6

is emitted for every second, but unfortunately ib_unregister_device_queued() is
never called because dev_net(netdev) != &init_net is true. Changing like below
avoids this problem.

I guess that either dev_net(netdev) is not appropriately initialized or
dev_net(netdev) != &init_net is too restrictive to call ib_unregister_device_queued().
Where is dev_net(netdev) initialized?

----------------------------------------
diff --git a/drivers/infiniband/sw/siw/siw_main.c b/drivers/infiniband/sw/siw/siw_main.c
index dacc174604bf..6f125e282dfc 100644
--- a/drivers/infiniband/sw/siw/siw_main.c
+++ b/drivers/infiniband/sw/siw/siw_main.c
@@ -434,13 +434,20 @@ static int siw_netdev_event(struct notifier_block *nb, unsigned long event,
 	struct net_device *netdev = netdev_notifier_info_to_dev(arg);
 	struct ib_device *base_dev;
 	struct siw_device *sdev;
+	bool flag = false;
 
-	dev_dbg(&netdev->dev, "siw: event %lu\n", event);
+	dev_info(&netdev->dev, "siw: event %lu\n", event);
 
-	if (dev_net(netdev) != &init_net)
-		return NOTIFY_OK;
+	if (dev_net(netdev) != &init_net) {
+		pr_info("dev_net(netdev)=%px init_net=%px\n", dev_net(netdev), &init_net);
+		if (event != NETDEV_UNREGISTER)
+			return NOTIFY_OK;
+		flag = true;
+	}
 
 	base_dev = ib_device_get_by_netdev(netdev, RDMA_DRIVER_SIW);
+	if (flag)
+		pr_info("base_dev=%px for netdev=%px\n", base_dev, netdev);
 	if (!base_dev)
 		return NOTIFY_OK;
 
@@ -471,6 +478,8 @@ static int siw_netdev_event(struct notifier_block *nb, unsigned long event,
 		break;
 
 	case NETDEV_UNREGISTER:
+		if (flag)
+			pr_info("ib_unregister_device_queued(%px)\n", &sdev->base_dev);
 		ib_unregister_device_queued(&sdev->base_dev);
 		break;
 
----------------------------------------

----------------------------------------
[   47.201976] net vlan0: siw: event 6
[   47.203645] dev_net(netdev)=ffff88800ef58000 init_net=ffffffff86435700
[   47.206277] base_dev=ffff888100e32000 for netdev=ffff88810389f000
[   47.208708] ib_unregister_device_queued(ffff888100e32000)
----------------------------------------
Jason Gunthorpe March 31, 2023, 4:23 p.m. UTC | #11
On Sat, Apr 01, 2023 at 01:19:47AM +0900, Tetsuo Handa wrote:
> On 2023/03/31 8:39, Jason Gunthorpe wrote:
> > Look at siw_netdev_event:
> > 
> > 	case NETDEV_UNREGISTER:
> > 		ib_unregister_device_queued(&sdev->base_dev);
> > 		break;
> 
> I see. We can observe that
> 
>   net vlan0: siw: event 6
> 
> is emitted for every second, but unfortunately ib_unregister_device_queued() is
> never called because dev_net(netdev) != &init_net is true. Changing like below
> avoids this problem.
> 
> I guess that either dev_net(netdev) is not appropriately initialized or
> dev_net(netdev) != &init_net is too restrictive to call ib_unregister_device_queued().
> Where is dev_net(netdev) initialized?

Bernard? What is this net ns check for? It seems surprising this would
be here

Jason
Tetsuo Handa April 1, 2023, 7:30 a.m. UTC | #12
On 2023/04/01 1:23, Jason Gunthorpe wrote:
> On Sat, Apr 01, 2023 at 01:19:47AM +0900, Tetsuo Handa wrote:
>> I guess that either dev_net(netdev) is not appropriately initialized or
>> dev_net(netdev) != &init_net is too restrictive to call ib_unregister_device_queued().
>> Where is dev_net(netdev) initialized?
> 
> Bernard? What is this net ns check for? It seems surprising this would
> be here
> 

Commit bdcf26bf9b3a ("rdma/siw: network and RDMA core interface") implemented
siw_netdev_event() with

	if (dev_net(netdev) != &init_net)
		return NOTIFY_OK;

check. But why this check is needed was not explained.
Maybe ib_devices_shared_netns is relevant?

Since network devices created upon/after unshare(CLONE_NEWNET) have network
namespace other than init_net, this check completely disables siw_netdev_event()
after unshare(CLONE_NEWNET). Thus, removing this check looks reasonable.
Leon Romanovsky April 1, 2023, 6:30 p.m. UTC | #13
On Sat, Apr 01, 2023 at 04:30:08PM +0900, Tetsuo Handa wrote:
> On 2023/04/01 1:23, Jason Gunthorpe wrote:
> > On Sat, Apr 01, 2023 at 01:19:47AM +0900, Tetsuo Handa wrote:
> >> I guess that either dev_net(netdev) is not appropriately initialized or
> >> dev_net(netdev) != &init_net is too restrictive to call ib_unregister_device_queued().
> >> Where is dev_net(netdev) initialized?
> > 
> > Bernard? What is this net ns check for? It seems surprising this would
> > be here
> > 
> 
> Commit bdcf26bf9b3a ("rdma/siw: network and RDMA core interface") implemented
> siw_netdev_event() with
> 
> 	if (dev_net(netdev) != &init_net)
> 		return NOTIFY_OK;
> 
> check. But why this check is needed was not explained.
> Maybe ib_devices_shared_netns is relevant?
> 
> Since network devices created upon/after unshare(CLONE_NEWNET) have network
> namespace other than init_net, this check completely disables siw_netdev_event()
> after unshare(CLONE_NEWNET). Thus, removing this check looks reasonable.

I agree with Jason, this check is not supposed to be in siw in first place
and needs to be removed.

Thanks
diff mbox series

Patch

diff --git a/drivers/infiniband/core/device.c b/drivers/infiniband/core/device.c
index a666847bd714..c72f810ceae1 100644
--- a/drivers/infiniband/core/device.c
+++ b/drivers/infiniband/core/device.c
@@ -718,17 +718,17 @@  static int add_client_context(struct ib_device *device,
 		goto out;
 	downgrade_write(&device->client_data_rwsem);
 	if (client->add) {
-		if (client->add(device)) {
+		ret = client->add(device);
+		if (ret) {
 			/*
-			 * If a client fails to add then the error code is
-			 * ignored, but we won't call any more ops on this
-			 * client.
+			 * If a client fails to add, we won't call any more
+			 * ops on this client.
 			 */
 			xa_erase(&device->client_data, client->client_id);
 			up_read(&device->client_data_rwsem);
 			ib_device_put(device);
 			ib_client_put(client);
-			return 0;
+			return ret;
 		}
 	}