
Delay the initialization of rnbd_server module to late_initcall level

Message ID 20200617103732.10356-1-haris.iqbal@cloud.ionos.com (mailing list archive)
State New, archived
Series Delay the initialization of rnbd_server module to late_initcall level

Commit Message

Md Haris Iqbal June 17, 2020, 10:37 a.m. UTC
From: Md Haris Iqbal <haris.iqbal@cloud.ionos.com>

Fixes: 2de6c8de192b ("block/rnbd: server: main functionality")
Reported-by: kernel test robot <rong.a.chen@intel.com>
Signed-off-by: Md Haris Iqbal <haris.iqbal@cloud.ionos.com>

The rnbd_server module's communication manager initialization depends on the
registration of the "network namespace subsystem" of the RDMA CM agent module.
As such, when the kernel is configured to load the rnbd_server and the RDMA
cma module during initialization; and if the rnbd_server module is initialized
before RDMA cma module, a null ptr dereference occurs during the RDMA bind
operation.
This patch delays the initialization of the rnbd_server module to the
late_initcall level, since RDMA cma module uses module_init which puts it into
the device_initcall level.
---
 drivers/block/rnbd/rnbd-srv.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
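
For reference, the ordering this change relies on comes from the kernel's
initcall levels. A rough sketch of the relevant macros, paraphrased from
include/linux/init.h for the built-in (non-modular) case; exact definitions
vary between kernel versions:

/* Built-in initcalls run level by level, lowest level first. */
#define subsys_initcall(fn)	__define_initcall(fn, 4)
#define device_initcall(fn)	__define_initcall(fn, 6)
#define late_initcall(fn)	__define_initcall(fn, 7)
#define __initcall(fn)		device_initcall(fn)
#define module_init(x)		__initcall(x)	/* so module_init() lands at level 6 */

/*
 * rdma_cm keeps module_init(cma_init), i.e. level 6.  Moving
 * rnbd_srv_init_module() to late_initcall() (level 7) makes it run only
 * after every level-6 initcall, including cma_init(), has completed.
 */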

Comments

Jinpu Wang June 17, 2020, 10:57 a.m. UTC | #1
Hi Haris,

Thanks for the patch, a few comments inline
On Wed, Jun 17, 2020 at 12:37 PM <haris.iqbal@cloud.ionos.com> wrote:
>
> From: Md Haris Iqbal <haris.iqbal@cloud.ionos.com>
>
> Fixes: 2de6c8de192b ("block/rnbd: server: main functionality")
> Reported-by: kernel test robot <rong.a.chen@intel.com>
> Signed-off-by: Md Haris Iqbal <haris.iqbal@cloud.ionos.com>
These tags should come last, after the commit message.
The subject might read better as: block/rnbd: Delay the initialization
of rnbd_server module to late_initcall level
>
> The rnbd_server module's communication manager initialization depends on the
> registration of the "network namespace subsystem" of the RDMA CM agent module.
> As such, when the kernel is configured to load the rnbd_server and the RDMA
> cma module during initialization; and if the rnbd_server module is initialized
> before RDMA cma module, a null ptr dereference occurs during the RDMA bind
> operation.
It would be better to include the call trace here.
> This patch delays the initialization of the rnbd_server module to the
> late_initcall level, since RDMA cma module uses module_init which puts it into
> the device_initcall level.

With the comments addressed:
Acked-by: Jack Wang <jinpu.wang@cloud.ionos.com>
> ---
>  drivers/block/rnbd/rnbd-srv.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/block/rnbd/rnbd-srv.c b/drivers/block/rnbd/rnbd-srv.c
> index 86e61523907b..213df05e5994 100644
> --- a/drivers/block/rnbd/rnbd-srv.c
> +++ b/drivers/block/rnbd/rnbd-srv.c
> @@ -840,5 +840,5 @@ static void __exit rnbd_srv_cleanup_module(void)
>         rnbd_srv_destroy_sysfs_files();
>  }
>
> -module_init(rnbd_srv_init_module);
> +late_initcall(rnbd_srv_init_module);
>  module_exit(rnbd_srv_cleanup_module);
> --
> 2.25.1
>
Leon Romanovsky June 17, 2020, 11:28 a.m. UTC | #2
On Wed, Jun 17, 2020 at 04:07:32PM +0530, haris.iqbal@cloud.ionos.com wrote:
> From: Md Haris Iqbal <haris.iqbal@cloud.ionos.com>
>
> Fixes: 2de6c8de192b ("block/rnbd: server: main functionality")
> Reported-by: kernel test robot <rong.a.chen@intel.com>
> Signed-off-by: Md Haris Iqbal <haris.iqbal@cloud.ionos.com>
>
> The rnbd_server module's communication manager initialization depends on the
> registration of the "network namespace subsystem" of the RDMA CM agent module.
> As such, when the kernel is configured to load the rnbd_server and the RDMA
> cma module during initialization; and if the rnbd_server module is initialized
> before RDMA cma module, a null ptr dereference occurs during the RDMA bind
> operation.
> This patch delays the initialization of the rnbd_server module to the
> late_initcall level, since RDMA cma module uses module_init which puts it into
> the device_initcall level.
> ---
>  drivers/block/rnbd/rnbd-srv.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/block/rnbd/rnbd-srv.c b/drivers/block/rnbd/rnbd-srv.c
> index 86e61523907b..213df05e5994 100644
> --- a/drivers/block/rnbd/rnbd-srv.c
> +++ b/drivers/block/rnbd/rnbd-srv.c
> @@ -840,5 +840,5 @@ static void __exit rnbd_srv_cleanup_module(void)
>  	rnbd_srv_destroy_sysfs_files();
>  }
>
> -module_init(rnbd_srv_init_module);
> +late_initcall(rnbd_srv_init_module);

I don't think that this is correct change. Somehow nvme-rdma works:
module_init(nvme_rdma_init_module);
-> nvme_rdma_init_module
 -> nvmf_register_transport(&nvme_rdma_transport);
  -> nvme_rdma_create_ctrl
   -> nvme_rdma_setup_ctrl
    -> nvme_rdma_configure_admin_queue
     -> nvme_rdma_alloc_queue
      -> rdma_create_id

>  module_exit(rnbd_srv_cleanup_module);
> --
> 2.25.1
>
Jason Gunthorpe June 17, 2020, 6:20 p.m. UTC | #3
On Wed, Jun 17, 2020 at 02:28:11PM +0300, Leon Romanovsky wrote:
> On Wed, Jun 17, 2020 at 04:07:32PM +0530, haris.iqbal@cloud.ionos.com wrote:
> > From: Md Haris Iqbal <haris.iqbal@cloud.ionos.com>
> >
> > Fixes: 2de6c8de192b ("block/rnbd: server: main functionality")
> > Reported-by: kernel test robot <rong.a.chen@intel.com>
> > Signed-off-by: Md Haris Iqbal <haris.iqbal@cloud.ionos.com>
> >
> > The rnbd_server module's communication manager initialization depends on the
> > registration of the "network namespace subsystem" of the RDMA CM agent module.
> > As such, when the kernel is configured to load the rnbd_server and the RDMA
> > cma module during initialization; and if the rnbd_server module is initialized
> > before RDMA cma module, a null ptr dereference occurs during the RDMA bind
> > operation.
> > This patch delays the initialization of the rnbd_server module to the
> > late_initcall level, since RDMA cma module uses module_init which puts it into
> > the device_initcall level.
> >  drivers/block/rnbd/rnbd-srv.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/drivers/block/rnbd/rnbd-srv.c b/drivers/block/rnbd/rnbd-srv.c
> > index 86e61523907b..213df05e5994 100644
> > +++ b/drivers/block/rnbd/rnbd-srv.c
> > @@ -840,5 +840,5 @@ static void __exit rnbd_srv_cleanup_module(void)
> >  	rnbd_srv_destroy_sysfs_files();
> >  }
> >
> > -module_init(rnbd_srv_init_module);
> > +late_initcall(rnbd_srv_init_module);
> 
> I don't think that this is correct change. Somehow nvme-rdma works:
> module_init(nvme_rdma_init_module);
> -> nvme_rdma_init_module
>  -> nvmf_register_transport(&nvme_rdma_transport);
>   -> nvme_rdma_create_ctrl
>    -> nvme_rdma_setup_ctrl
>     -> nvme_rdma_configure_admin_queue
>      -> nvme_rdma_alloc_queue
>       -> rdma_create_id

If it does work, it is by luck.

Keep in mind all this only matters for kernels without modules.

Maybe cma should be upgraded to subsystem_init ? That is a bit tricky
too as it needs the ib_client stuff working

Jason
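
If cma were promoted as suggested (i.e. to subsys_initcall()), the change
would presumably be a one-liner at the bottom of drivers/infiniband/core/cma.c,
roughly like the untested sketch below; the caveat about the ib_client
infrastructure having to be ready first still applies:

-module_init(cma_init);
+subsys_initcall(cma_init);
 module_exit(cma_cleanup);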
Leon Romanovsky June 17, 2020, 7:07 p.m. UTC | #4
On Wed, Jun 17, 2020 at 03:20:46PM -0300, Jason Gunthorpe wrote:
> On Wed, Jun 17, 2020 at 02:28:11PM +0300, Leon Romanovsky wrote:
> > On Wed, Jun 17, 2020 at 04:07:32PM +0530, haris.iqbal@cloud.ionos.com wrote:
> > > From: Md Haris Iqbal <haris.iqbal@cloud.ionos.com>
> > >
> > > Fixes: 2de6c8de192b ("block/rnbd: server: main functionality")
> > > Reported-by: kernel test robot <rong.a.chen@intel.com>
> > > Signed-off-by: Md Haris Iqbal <haris.iqbal@cloud.ionos.com>
> > >
> > > The rnbd_server module's communication manager initialization depends on the
> > > registration of the "network namespace subsystem" of the RDMA CM agent module.
> > > As such, when the kernel is configured to load the rnbd_server and the RDMA
> > > cma module during initialization; and if the rnbd_server module is initialized
> > > before RDMA cma module, a null ptr dereference occurs during the RDMA bind
> > > operation.
> > > This patch delays the initialization of the rnbd_server module to the
> > > late_initcall level, since RDMA cma module uses module_init which puts it into
> > > the device_initcall level.
> > >  drivers/block/rnbd/rnbd-srv.c | 2 +-
> > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > >
> > > diff --git a/drivers/block/rnbd/rnbd-srv.c b/drivers/block/rnbd/rnbd-srv.c
> > > index 86e61523907b..213df05e5994 100644
> > > +++ b/drivers/block/rnbd/rnbd-srv.c
> > > @@ -840,5 +840,5 @@ static void __exit rnbd_srv_cleanup_module(void)
> > >  	rnbd_srv_destroy_sysfs_files();
> > >  }
> > >
> > > -module_init(rnbd_srv_init_module);
> > > +late_initcall(rnbd_srv_init_module);
> >
> > I don't think that this is correct change. Somehow nvme-rdma works:
> > module_init(nvme_rdma_init_module);
> > -> nvme_rdma_init_module
> >  -> nvmf_register_transport(&nvme_rdma_transport);
> >   -> nvme_rdma_create_ctrl
> >    -> nvme_rdma_setup_ctrl
> >     -> nvme_rdma_configure_admin_queue
> >      -> nvme_rdma_alloc_queue
> >       -> rdma_create_id
>
> If it does work, it is by luck.

I didn't check every ULP, but it seems that all ULPs use the same
module_init.

>
> Keep in mind all this only matters for kernels without modules.

Can it be related to the fact that other ULPs call to ib_register_client()
before calling to rdma-cm? RNBD does not have such call.

>
> Maybe cma should be upgraded to subsystem_init ? That is a bit tricky
> too as it needs the ib_client stuff working.

It can work, but makes me wonder if it is last change in this area,
every time we touch *_initcall(), we break something else.

I'm not proposing this, but just loudly wondering, do we really need rdma-cm
as a separate module? Can we bring it to be part of ib_core?

Thanks

>
> Jason
Jason Gunthorpe June 17, 2020, 7:26 p.m. UTC | #5
On Wed, Jun 17, 2020 at 10:07:56PM +0300, Leon Romanovsky wrote:
> On Wed, Jun 17, 2020 at 03:20:46PM -0300, Jason Gunthorpe wrote:
> > On Wed, Jun 17, 2020 at 02:28:11PM +0300, Leon Romanovsky wrote:
> > > On Wed, Jun 17, 2020 at 04:07:32PM +0530, haris.iqbal@cloud.ionos.com wrote:
> > > > From: Md Haris Iqbal <haris.iqbal@cloud.ionos.com>
> > > >
> > > > Fixes: 2de6c8de192b ("block/rnbd: server: main functionality")
> > > > Reported-by: kernel test robot <rong.a.chen@intel.com>
> > > > Signed-off-by: Md Haris Iqbal <haris.iqbal@cloud.ionos.com>
> > > >
> > > > The rnbd_server module's communication manager initialization depends on the
> > > > registration of the "network namespace subsystem" of the RDMA CM agent module.
> > > > As such, when the kernel is configured to load the rnbd_server and the RDMA
> > > > cma module during initialization; and if the rnbd_server module is initialized
> > > > before RDMA cma module, a null ptr dereference occurs during the RDMA bind
> > > > operation.
> > > > This patch delays the initialization of the rnbd_server module to the
> > > > late_initcall level, since RDMA cma module uses module_init which puts it into
> > > > the device_initcall level.
> > > >  drivers/block/rnbd/rnbd-srv.c | 2 +-
> > > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > > >
> > > > diff --git a/drivers/block/rnbd/rnbd-srv.c b/drivers/block/rnbd/rnbd-srv.c
> > > > index 86e61523907b..213df05e5994 100644
> > > > +++ b/drivers/block/rnbd/rnbd-srv.c
> > > > @@ -840,5 +840,5 @@ static void __exit rnbd_srv_cleanup_module(void)
> > > >  	rnbd_srv_destroy_sysfs_files();
> > > >  }
> > > >
> > > > -module_init(rnbd_srv_init_module);
> > > > +late_initcall(rnbd_srv_init_module);
> > >
> > > I don't think that this is correct change. Somehow nvme-rdma works:
> > > module_init(nvme_rdma_init_module);
> > > -> nvme_rdma_init_module
> > >  -> nvmf_register_transport(&nvme_rdma_transport);
> > >   -> nvme_rdma_create_ctrl
> > >    -> nvme_rdma_setup_ctrl
> > >     -> nvme_rdma_configure_admin_queue
> > >      -> nvme_rdma_alloc_queue
> > >       -> rdma_create_id
> >
> > If it does work, it is by luck.
> 
> I didn't check every ULP, but it seems that all ULPs use the same
> module_init.
> 
> >
> > Keep in mind all this only matters for kernels without modules.
> 
> Can it be related to the fact that other ULPs call to ib_register_client()
> before calling to rdma-cm? RNBD does not have such call.

If the rdma_create_id() is not on a callchain from module_init then
you don't have a problem.

nvme has a bug here, IIRC. It is not OK to create RDMA CM IDs outside
a client - CM IDs are supposed to be cleaned up when the client is
removed.

Similarly they are supposed to be created from the client attachment.

Though listening CM IDs unbound to any device may change that
slightly, I think it is probably best practice to create the listening
ID only if a client is bound.

Most probably that is the best way to fix rnbd

> I'm not proposing this, but just loudly wondering, do we really need rdma-cm
> as a separate module? Can we bring it to be part of ib_core?

No idea.. It doesn't help this situation at least

Jason
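
A rough illustration of that direction (not a tested patch; the rnbd_srv_*
listening helpers below are invented for illustration, and the ib_client
add() callback signature changes between kernel versions) would be to
register a struct ib_client at module init and only create the listening
CM ID once an RDMA device is actually attached:

static int rnbd_srv_dev_count;	/* illustrative only, locking omitted */

static void rnbd_srv_add_one(struct ib_device *ibdev)
{
	/* Create the listening CM ID from the client attachment, so its
	 * lifetime is tied to the RDMA devices rather than to module init.
	 */
	if (rnbd_srv_dev_count++ == 0)
		rnbd_srv_start_listening();	/* invented helper: rdma_create_id() + rdma_bind_addr() */
}

static void rnbd_srv_remove_one(struct ib_device *ibdev, void *client_data)
{
	if (--rnbd_srv_dev_count == 0)
		rnbd_srv_stop_listening();	/* invented helper: destroy the listening CM ID */
}

static struct ib_client rnbd_srv_ib_client = {
	.name	= "rnbd_server",
	.add	= rnbd_srv_add_one,
	.remove	= rnbd_srv_remove_one,
};

/* rnbd_srv_init_module() would then only call ib_register_client(&rnbd_srv_ib_client). */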
Md Haris Iqbal June 18, 2020, 1:45 a.m. UTC | #6
(Apologies for multiple emails. Was having trouble with an extension,
cause of which emails did not get delivered to the mailing list.
Resolved now.)

> Somehow nvme-rdma works:

I think that's because the callchain during the nvme_rdma_init_module
initialization stops at "nvmf_register_transport()". Here only the
"struct nvmf_transport_ops nvme_rdma_transport" is registered, which
contains the function "nvme_rdma_create_ctrl()". I tested this in my
local setup and during kernel boot, that's the extent of the
callchain.
The ".create_ctrl"; which now points to "nvme_rdma_create_ctrl()" is
called later from "nvmf_dev_write()". I am not sure when this is
called, probably when the "discover" happens from the client side or
during the server config. I am trying to test this to confirm, will
send more details once I am done.
Am I missing something here?
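
For context, nvme_rdma_init_module() only performs registrations at init
time; a condensed sketch paraphrased from drivers/nvme/host/rdma.c (option
flags and error labels trimmed, details differ between kernel versions):

static struct nvmf_transport_ops nvme_rdma_transport = {
	.name		= "rdma",
	.module		= THIS_MODULE,
	.create_ctrl	= nvme_rdma_create_ctrl,	/* only invoked later, from nvmf_dev_write() */
	/* .required_opts / .allowed_opts trimmed */
};

static int __init nvme_rdma_init_module(void)
{
	int ret;

	/* nvme-rdma also registers an ib_client here, as Leon notes above. */
	ret = ib_register_client(&nvme_rdma_ib_client);
	if (ret)
		return ret;

	ret = nvmf_register_transport(&nvme_rdma_transport);
	if (ret)
		ib_unregister_client(&nvme_rdma_ib_client);
	return ret;
}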


> If the rdma_create_id() is not on a callchain from module_init then you don't have a problem.

I am a little confused. I thought the problem occurs from a call to
either "rdma_resolve_addr()" which calls "rdma_bind_addr()",
or a direct call to "rdma_bind_addr()" as in rtrs case.
In both the cases, a call to "rdma_create_id()" is needed before this.


> Similarly they are supposed to be created from the client attachment.
I am a beginner in terms of concepts here. Did you mean when a client
tries to establish the first connection to an rdma server?


On Thu, Jun 18, 2020 at 12:56 AM Jason Gunthorpe <jgg@ziepe.ca> wrote:
>
> On Wed, Jun 17, 2020 at 10:07:56PM +0300, Leon Romanovsky wrote:
> > On Wed, Jun 17, 2020 at 03:20:46PM -0300, Jason Gunthorpe wrote:
> > > On Wed, Jun 17, 2020 at 02:28:11PM +0300, Leon Romanovsky wrote:
> > > > On Wed, Jun 17, 2020 at 04:07:32PM +0530, haris.iqbal@cloud.ionos.com wrote:
> > > > > From: Md Haris Iqbal <haris.iqbal@cloud.ionos.com>
> > > > >
> > > > > Fixes: 2de6c8de192b ("block/rnbd: server: main functionality")
> > > > > Reported-by: kernel test robot <rong.a.chen@intel.com>
> > > > > Signed-off-by: Md Haris Iqbal <haris.iqbal@cloud.ionos.com>
> > > > >
> > > > > The rnbd_server module's communication manager initialization depends on the
> > > > > registration of the "network namespace subsystem" of the RDMA CM agent module.
> > > > > As such, when the kernel is configured to load the rnbd_server and the RDMA
> > > > > cma module during initialization; and if the rnbd_server module is initialized
> > > > > before RDMA cma module, a null ptr dereference occurs during the RDMA bind
> > > > > operation.
> > > > > This patch delays the initialization of the rnbd_server module to the
> > > > > late_initcall level, since RDMA cma module uses module_init which puts it into
> > > > > the device_initcall level.
> > > > >  drivers/block/rnbd/rnbd-srv.c | 2 +-
> > > > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > > > >
> > > > > diff --git a/drivers/block/rnbd/rnbd-srv.c b/drivers/block/rnbd/rnbd-srv.c
> > > > > index 86e61523907b..213df05e5994 100644
> > > > > +++ b/drivers/block/rnbd/rnbd-srv.c
> > > > > @@ -840,5 +840,5 @@ static void __exit rnbd_srv_cleanup_module(void)
> > > > >         rnbd_srv_destroy_sysfs_files();
> > > > >  }
> > > > >
> > > > > -module_init(rnbd_srv_init_module);
> > > > > +late_initcall(rnbd_srv_init_module);
> > > >
> > > > I don't think that this is correct change. Somehow nvme-rdma works:
> > > > module_init(nvme_rdma_init_module);
> > > > -> nvme_rdma_init_module
> > > >  -> nvmf_register_transport(&nvme_rdma_transport);
> > > >   -> nvme_rdma_create_ctrl
> > > >    -> nvme_rdma_setup_ctrl
> > > >     -> nvme_rdma_configure_admin_queue
> > > >      -> nvme_rdma_alloc_queue
> > > >       -> rdma_create_id
> > >
> > > If it does work, it is by luck.
> >
> > I didn't check every ULP, but it seems that all ULPs use the same
> > module_init.
> >
> > >
> > > Keep in mind all this only matters for kernels without modules.
> >
> > Can it be related to the fact that other ULPs call to ib_register_client()
> > before calling to rdma-cm? RNBD does not have such call.
>
> If the rdma_create_id() is not on a callchain from module_init then
> you don't have a problem.
>
> nvme has a bug here, IIRC. It is not OK to create RDMA CM IDs outside
> a client - CM IDs are supposed to be cleaned up when the client is
> removed.
>
> Similarly they are supposed to be created from the client attachment.
>
> Though listening CM IDs unbound to any device may change that
> slightly, I think it is probably best practice to create the listening
> ID only if a client is bound.
>
> Most probably that is the best way to fix rnbd
>
> > I'm not proposing this, but just loudly wondering, do we really need rdma-cm
> > as a separate module? Can we bring it to be part of ib_core?
>
> No idea.. It doesn't help this situation at least
>
> Jason
Md Haris Iqbal June 18, 2020, 9:14 a.m. UTC | #7
It seems that the "rdma_bind_addr()" is called by the nvme rdma
module; but during the following events
1) When a discover happens from the client side. Call trace for that looks like,
[ 1098.409398] nvmf_dev_write
[ 1098.409403] nvmf_create_ctrl
[ 1098.414568] nvme_rdma_create_ctrl
[ 1098.415009] nvme_rdma_setup_ctrl
[ 1098.415010] nvme_rdma_configure_admin_queue
[ 1098.415010] nvme_rdma_alloc_queue
                             -->(rdma_create_id)
[ 1098.415032] rdma_resolve_addr
[ 1098.415032] cma_bind_addr
[ 1098.415033] rdma_bind_addr

2) When a connect happens from the client side. Call trace is the same
as above, plus "nvme_rdma_alloc_queue()" is called n number of times;
n being the number of IO queues being created.


On the server side, when an nvmf port is enabled, that also triggers a
call to "rdma_bind_addr()", but that is not from the nvme rdma module.
may be nvme target rdma? (not sure).

On Thu, Jun 18, 2020 at 7:15 AM Haris Iqbal <haris.iqbal@cloud.ionos.com> wrote:
>
> (Apologies for multiple emails. Was having trouble with an extension,
> cause of which emails did not get delivered to the mailing list.
> Resolved now.)
>
> > Somehow nvme-rdma works:
>
> I think that's because the callchain during the nvme_rdma_init_module
> initialization stops at "nvmf_register_transport()". Here only the
> "struct nvmf_transport_ops nvme_rdma_transport" is registered, which
> contains the function "nvme_rdma_create_ctrl()". I tested this in my
> local setup and during kernel boot, that's the extent of the
> callchain.
> The ".create_ctrl"; which now points to "nvme_rdma_create_ctrl()" is
> called later from "nvmf_dev_write()". I am not sure when this is
> called, probably when the "discover" happens from the client side or
> during the server config. I am trying to test this to confirm, will
> send more details once I am done.
> Am I missing something here?
>
>
> > If the rdma_create_id() is not on a callchain from module_init then you don't have a problem.
>
> I am a little confused. I thought the problem occurs from a call to
> either "rdma_resolve_addr()" which calls "rdma_bind_addr()",
> or a direct call to "rdma_bind_addr()" as in rtrs case.
> In both the cases, a call to "rdma_create_id()" is needed before this.
>
>
> > Similarly they are supposed to be created from the client attachment.
> I am a beginner in terms of concepts here. Did you mean when a client
> tries to establish the first connection to an rdma server?
>
>
> On Thu, Jun 18, 2020 at 12:56 AM Jason Gunthorpe <jgg@ziepe.ca> wrote:
> >
> > On Wed, Jun 17, 2020 at 10:07:56PM +0300, Leon Romanovsky wrote:
> > > On Wed, Jun 17, 2020 at 03:20:46PM -0300, Jason Gunthorpe wrote:
> > > > On Wed, Jun 17, 2020 at 02:28:11PM +0300, Leon Romanovsky wrote:
> > > > > On Wed, Jun 17, 2020 at 04:07:32PM +0530, haris.iqbal@cloud.ionos.com wrote:
> > > > > > From: Md Haris Iqbal <haris.iqbal@cloud.ionos.com>
> > > > > >
> > > > > > Fixes: 2de6c8de192b ("block/rnbd: server: main functionality")
> > > > > > Reported-by: kernel test robot <rong.a.chen@intel.com>
> > > > > > Signed-off-by: Md Haris Iqbal <haris.iqbal@cloud.ionos.com>
> > > > > >
> > > > > > The rnbd_server module's communication manager initialization depends on the
> > > > > > registration of the "network namespace subsystem" of the RDMA CM agent module.
> > > > > > As such, when the kernel is configured to load the rnbd_server and the RDMA
> > > > > > cma module during initialization; and if the rnbd_server module is initialized
> > > > > > before RDMA cma module, a null ptr dereference occurs during the RDMA bind
> > > > > > operation.
> > > > > > This patch delays the initialization of the rnbd_server module to the
> > > > > > late_initcall level, since RDMA cma module uses module_init which puts it into
> > > > > > the device_initcall level.
> > > > > >  drivers/block/rnbd/rnbd-srv.c | 2 +-
> > > > > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > > > > >
> > > > > > diff --git a/drivers/block/rnbd/rnbd-srv.c b/drivers/block/rnbd/rnbd-srv.c
> > > > > > index 86e61523907b..213df05e5994 100644
> > > > > > +++ b/drivers/block/rnbd/rnbd-srv.c
> > > > > > @@ -840,5 +840,5 @@ static void __exit rnbd_srv_cleanup_module(void)
> > > > > >         rnbd_srv_destroy_sysfs_files();
> > > > > >  }
> > > > > >
> > > > > > -module_init(rnbd_srv_init_module);
> > > > > > +late_initcall(rnbd_srv_init_module);
> > > > >
> > > > > I don't think that this is correct change. Somehow nvme-rdma works:
> > > > > module_init(nvme_rdma_init_module);
> > > > > -> nvme_rdma_init_module
> > > > >  -> nvmf_register_transport(&nvme_rdma_transport);
> > > > >   -> nvme_rdma_create_ctrl
> > > > >    -> nvme_rdma_setup_ctrl
> > > > >     -> nvme_rdma_configure_admin_queue
> > > > >      -> nvme_rdma_alloc_queue
> > > > >       -> rdma_create_id
> > > >
> > > > If it does work, it is by luck.
> > >
> > > I didn't check every ULP, but it seems that all ULPs use the same
> > > module_init.
> > >
> > > >
> > > > Keep in mind all this only matters for kernels without modules.
> > >
> > > Can it be related to the fact that other ULPs call to ib_register_client()
> > > before calling to rdma-cm? RNBD does not have such call.
> >
> > If the rdma_create_id() is not on a callchain from module_init then
> > you don't have a problem.
> >
> > nvme has a bug here, IIRC. It is not OK to create RDMA CM IDs outside
> > a client - CM IDs are supposed to be cleaned up when the client is
> > removed.
> >
> > Similarly they are supposed to be created from the client attachment.
> >
> > Though listening CM IDs unbound to any device may change that
> > slightly, I think it is probably best practice to create the listening
> > ID only if a client is bound.
> >
> > Most probably that is the best way to fix rnbd
> >
> > > I'm not proposing this, but just loudly wondering, do we really need rdma-cm
> > > as a separate module? Can we bring it to be part of ib_core?
> >
> > No idea.. It doesn't help this situation at least
> >
> > Jason
>
>
>
> --
>
> Regards
> -Haris
Md Haris Iqbal June 23, 2020, 9:50 a.m. UTC | #8
Hi Jason and Leon,

Did you get a chance to look into my previous email?

On Thu, Jun 18, 2020 at 2:44 PM Haris Iqbal <haris.iqbal@cloud.ionos.com> wrote:
>
> It seems that the "rdma_bind_addr()" is called by the nvme rdma
> module; but during the following events
> 1) When a discover happens from the client side. Call trace for that looks like,
> [ 1098.409398] nvmf_dev_write
> [ 1098.409403] nvmf_create_ctrl
> [ 1098.414568] nvme_rdma_create_ctrl
> [ 1098.415009] nvme_rdma_setup_ctrl
> [ 1098.415010] nvme_rdma_configure_admin_queue
> [ 1098.415010] nvme_rdma_alloc_queue
>                              -->(rdma_create_id)
> [ 1098.415032] rdma_resolve_addr
> [ 1098.415032] cma_bind_addr
> [ 1098.415033] rdma_bind_addr
>
> 2) When a connect happens from the client side. Call trace is the same
> as above, plus "nvme_rdma_alloc_queue()" is called n number of times;
> n being the number of IO queues being created.
>
>
> On the server side, when an nvmf port is enabled, that also triggers a
> call to "rdma_bind_addr()", but that is not from the nvme rdma module.
> may be nvme target rdma? (not sure).
>
> On Thu, Jun 18, 2020 at 7:15 AM Haris Iqbal <haris.iqbal@cloud.ionos.com> wrote:
> >
> > (Apologies for multiple emails. Was having trouble with an extension,
> > cause of which emails did not get delivered to the mailing list.
> > Resolved now.)
> >
> > > Somehow nvme-rdma works:
> >
> > I think that's because the callchain during the nvme_rdma_init_module
> > initialization stops at "nvmf_register_transport()". Here only the
> > "struct nvmf_transport_ops nvme_rdma_transport" is registered, which
> > contains the function "nvme_rdma_create_ctrl()". I tested this in my
> > local setup and during kernel boot, that's the extent of the
> > callchain.
> > The ".create_ctrl"; which now points to "nvme_rdma_create_ctrl()" is
> > called later from "nvmf_dev_write()". I am not sure when this is
> > called, probably when the "discover" happens from the client side or
> > during the server config. I am trying to test this to confirm, will
> > send more details once I am done.
> > Am I missing something here?
> >
> >
> > > If the rdma_create_id() is not on a callchain from module_init then you don't have a problem.
> >
> > I am a little confused. I thought the problem occurs from a call to
> > either "rdma_resolve_addr()" which calls "rdma_bind_addr()",
> > or a direct call to "rdma_bind_addr()" as in rtrs case.
> > In both the cases, a call to "rdma_create_id()" is needed before this.
> >
> >
> > > Similarly they are supposed to be created from the client attachment.
> > I am a beginner in terms of concepts here. Did you mean when a client
> > tries to establish the first connection to an rdma server?
> >
> >
> > On Thu, Jun 18, 2020 at 12:56 AM Jason Gunthorpe <jgg@ziepe.ca> wrote:
> > >
> > > On Wed, Jun 17, 2020 at 10:07:56PM +0300, Leon Romanovsky wrote:
> > > > On Wed, Jun 17, 2020 at 03:20:46PM -0300, Jason Gunthorpe wrote:
> > > > > On Wed, Jun 17, 2020 at 02:28:11PM +0300, Leon Romanovsky wrote:
> > > > > > On Wed, Jun 17, 2020 at 04:07:32PM +0530, haris.iqbal@cloud.ionos.com wrote:
> > > > > > > From: Md Haris Iqbal <haris.iqbal@cloud.ionos.com>
> > > > > > >
> > > > > > > Fixes: 2de6c8de192b ("block/rnbd: server: main functionality")
> > > > > > > Reported-by: kernel test robot <rong.a.chen@intel.com>
> > > > > > > Signed-off-by: Md Haris Iqbal <haris.iqbal@cloud.ionos.com>
> > > > > > >
> > > > > > > The rnbd_server module's communication manager initialization depends on the
> > > > > > > registration of the "network namespace subsystem" of the RDMA CM agent module.
> > > > > > > As such, when the kernel is configured to load the rnbd_server and the RDMA
> > > > > > > cma module during initialization; and if the rnbd_server module is initialized
> > > > > > > before RDMA cma module, a null ptr dereference occurs during the RDMA bind
> > > > > > > operation.
> > > > > > > This patch delays the initialization of the rnbd_server module to the
> > > > > > > late_initcall level, since RDMA cma module uses module_init which puts it into
> > > > > > > the device_initcall level.
> > > > > > >  drivers/block/rnbd/rnbd-srv.c | 2 +-
> > > > > > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > > > > > >
> > > > > > > diff --git a/drivers/block/rnbd/rnbd-srv.c b/drivers/block/rnbd/rnbd-srv.c
> > > > > > > index 86e61523907b..213df05e5994 100644
> > > > > > > +++ b/drivers/block/rnbd/rnbd-srv.c
> > > > > > > @@ -840,5 +840,5 @@ static void __exit rnbd_srv_cleanup_module(void)
> > > > > > >         rnbd_srv_destroy_sysfs_files();
> > > > > > >  }
> > > > > > >
> > > > > > > -module_init(rnbd_srv_init_module);
> > > > > > > +late_initcall(rnbd_srv_init_module);
> > > > > >
> > > > > > I don't think that this is correct change. Somehow nvme-rdma works:
> > > > > > module_init(nvme_rdma_init_module);
> > > > > > -> nvme_rdma_init_module
> > > > > >  -> nvmf_register_transport(&nvme_rdma_transport);
> > > > > >   -> nvme_rdma_create_ctrl
> > > > > >    -> nvme_rdma_setup_ctrl
> > > > > >     -> nvme_rdma_configure_admin_queue
> > > > > >      -> nvme_rdma_alloc_queue
> > > > > >       -> rdma_create_id
> > > > >
> > > > > If it does work, it is by luck.
> > > >
> > > > I didn't check every ULP, but it seems that all ULPs use the same
> > > > module_init.
> > > >
> > > > >
> > > > > Keep in mind all this only matters for kernels without modules.
> > > >
> > > > Can it be related to the fact that other ULPs call to ib_register_client()
> > > > before calling to rdma-cm? RNBD does not have such call.
> > >
> > > If the rdma_create_id() is not on a callchain from module_init then
> > > you don't have a problem.
> > >
> > > nvme has a bug here, IIRC. It is not OK to create RDMA CM IDs outside
> > > a client - CM IDs are supposed to be cleaned up when the client is
> > > removed.
> > >
> > > Similarly they are supposed to be created from the client attachment.
> > >
> > > Though listening CM IDs unbound to any device may change that
> > > slightly, I think it is probably best practice to create the listening
> > > ID only if a client is bound.
> > >
> > > Most probably that is the best way to fix rnbd
> > >
> > > > I'm not proposing this, but just loudly wondering, do we really need rdma-cm
> > > > as a separate module? Can we bring it to be part of ib_core?
> > >
> > > No idea.. It doesn't help this situation at least
> > >
> > > Jason
> >
> >
> >
> > --
> >
> > Regards
> > -Haris
>
>
>
> --
>
> Regards
> -Haris
Md Haris Iqbal June 23, 2020, 11:35 a.m. UTC | #9
On Tue, Jun 23, 2020 at 7:54 PM Jason Gunthorpe <jgg@ziepe.ca> wrote:
>
> On Tue, Jun 23, 2020 at 07:15:03PM +0530, Haris Iqbal wrote:
> > On Tue, Jun 23, 2020 at 5:47 PM Jason Gunthorpe <jgg@ziepe.ca> wrote:
> > >
> > > On Tue, Jun 23, 2020 at 03:20:27PM +0530, Haris Iqbal wrote:
> > > > Hi Jason and Leon,
> > > >
> > > > Did you get a chance to look into my previous email?
> > >
> > > Was there a question?
> >
> > Multiple actually :)
> >
> > >
> > > Jason
> >
> > In response to your emails,
> >
> > > Somehow nvme-rdma works:
> >
> > I think that's because the callchain during the nvme_rdma_init_module
> > initialization stops at "nvmf_register_transport()". Here only the
> > "struct nvmf_transport_ops nvme_rdma_transport" is registered, which
> > contains the function "nvme_rdma_create_ctrl()". I tested this in my
> > local setup and during kernel boot, that's the extent of the
> > callchain.
> > The ".create_ctrl"; which now points to "nvme_rdma_create_ctrl()" is
> > called later from "nvmf_dev_write()". I am not sure when this is
> > called, probably when the "discover" happens from the client side or
> > during the server config.
> >
> > It seems that the "rdma_bind_addr()" is called by the nvme rdma
> > module; but during the following events
> > 1) When a discover happens from the client side. Call trace for that looks like,
> > [ 1098.409398] nvmf_dev_write
> > [ 1098.409403] nvmf_create_ctrl
> > [ 1098.414568] nvme_rdma_create_ctrl
> > [ 1098.415009] nvme_rdma_setup_ctrl
> > [ 1098.415010] nvme_rdma_configure_admin_queue
> > [ 1098.415010] nvme_rdma_alloc_queue
> > [ 1098.415032] rdma_resolve_addr
> > [ 1098.415032] cma_bind_addr
> > [ 1098.415033] rdma_bind_addr
> >
> > 2) When a connect happens from the client side. Call trace is the same
> > as above, plus "nvme_rdma_alloc_queue()" is called n number of times;
> > n being the number of IO queues being created.
> >
> > On the server side, when an nvmf port is enabled, that also triggers a
> > call to "rdma_bind_addr()", but that is not from the nvme rdma module.
> > may be nvme target rdma? (not sure).
> >
> > Does this make sense or am I missing something here?
>
> > It makes sense; delaying the creation of CM IDs until user space starts
> > will solve this init-time problem

Right, and the patch is trying to achieve the delay by changing the
init level to "late_initcall()"

>
> >
> > > If the rdma_create_id() is not on a callchain from module_init then you don't have a problem.
> >
> > I am a little confused. I thought the problem occurs from a call to
> > either "rdma_resolve_addr()" which calls "rdma_bind_addr()",
> > or a direct call to "rdma_bind_addr()" as in rtrs case.
> > In both the cases, a call to "rdma_create_id()" is needed before this.
>
> Right rdma_create_id() must precede anything that has problems, and it
> should not be done from module_init.

I understand this, but I am not sure why that is; as in why it should
not be done from module_init?

Also, about one of your previous statements,

> It is not OK to create RDMA CM IDs outside
> a client - CM IDs are supposed to be cleaned up when the client is
> removed.
>
> Similarly they are supposed to be created from the client attachment.

This again is a little confusing to me, since what I've observed in
nvmt is, when a server port is created, the "rdma_bind_addr()"
function is called.
And this goes well with the server/target and client/initiator model,
where the server has to get ready and start listening before a client
can initiate a connection.
What am I missing here?

>
> Jason
Jason Gunthorpe June 23, 2020, 12:17 p.m. UTC | #10
On Tue, Jun 23, 2020 at 03:20:27PM +0530, Haris Iqbal wrote:
> Hi Jason and Leon,
> 
> Did you get a chance to look into my previous email?

Was there a question?

Jason
Md Haris Iqbal June 23, 2020, 1:45 p.m. UTC | #11
On Tue, Jun 23, 2020 at 5:47 PM Jason Gunthorpe <jgg@ziepe.ca> wrote:
>
> On Tue, Jun 23, 2020 at 03:20:27PM +0530, Haris Iqbal wrote:
> > Hi Jason and Leon,
> >
> > Did you get a chance to look into my previous email?
>
> Was there a question?

Multiple actually :)

>
> Jason

In response to your emails,

> Somehow nvme-rdma works:

I think that's because the callchain during the nvme_rdma_init_module
initialization stops at "nvmf_register_transport()". Here only the
"struct nvmf_transport_ops nvme_rdma_transport" is registered, which
contains the function "nvme_rdma_create_ctrl()". I tested this in my
local setup and during kernel boot, that's the extent of the
callchain.
The ".create_ctrl"; which now points to "nvme_rdma_create_ctrl()" is
called later from "nvmf_dev_write()". I am not sure when this is
called, probably when the "discover" happens from the client side or
during the server config.

It seems that the "rdma_bind_addr()" is called by the nvme rdma
module; but during the following events
1) When a discover happens from the client side. Call trace for that looks like,
[ 1098.409398] nvmf_dev_write
[ 1098.409403] nvmf_create_ctrl
[ 1098.414568] nvme_rdma_create_ctrl
[ 1098.415009] nvme_rdma_setup_ctrl
[ 1098.415010] nvme_rdma_configure_admin_queue
[ 1098.415010] nvme_rdma_alloc_queue
                             -->(rdma_create_id)
[ 1098.415032] rdma_resolve_addr
[ 1098.415032] cma_bind_addr
[ 1098.415033] rdma_bind_addr

2) When a connect happens from the client side. Call trace is the same
as above, plus "nvme_rdma_alloc_queue()" is called n number of times;
n being the number of IO queues being created.

On the server side, when an nvmf port is enabled, that also triggers a
call to "rdma_bind_addr()", but that is not from the nvme rdma module.
may be nvme target rdma? (not sure).

Does this make sense or am I missing something here?


> If the rdma_create_id() is not on a callchain from module_init then you don't have a problem.

I am a little confused. I thought the problem occurs from a call to
either "rdma_resolve_addr()" which calls "rdma_bind_addr()",
or a direct call to "rdma_bind_addr()" as in rtrs case.
In both the cases, a call to "rdma_create_id()" is needed before this.


> Similarly they are supposed to be created from the client attachment.
I am a beginner in terms of concepts here. Did you mean when a client
tries to establish the first connection to an rdma server?
Jason Gunthorpe June 23, 2020, 2:24 p.m. UTC | #12
On Tue, Jun 23, 2020 at 07:15:03PM +0530, Haris Iqbal wrote:
> On Tue, Jun 23, 2020 at 5:47 PM Jason Gunthorpe <jgg@ziepe.ca> wrote:
> >
> > On Tue, Jun 23, 2020 at 03:20:27PM +0530, Haris Iqbal wrote:
> > > Hi Jason and Leon,
> > >
> > > Did you get a chance to look into my previous email?
> >
> > Was there a question?
> 
> Multiple actually :)
> 
> >
> > Jason
> 
> In response to your emails,
> 
> > Somehow nvme-rdma works:
> 
> I think that's because the callchain during the nvme_rdma_init_module
> initialization stops at "nvmf_register_transport()". Here only the
> "struct nvmf_transport_ops nvme_rdma_transport" is registered, which
> contains the function "nvme_rdma_create_ctrl()". I tested this in my
> local setup and during kernel boot, that's the extent of the
> callchain.
> The ".create_ctrl"; which now points to "nvme_rdma_create_ctrl()" is
> called later from "nvmf_dev_write()". I am not sure when this is
> called, probably when the "discover" happens from the client side or
> during the server config.
> 
> It seems that the "rdma_bind_addr()" is called by the nvme rdma
> module; but during the following events
> 1) When a discover happens from the client side. Call trace for that looks like,
> [ 1098.409398] nvmf_dev_write
> [ 1098.409403] nvmf_create_ctrl
> [ 1098.414568] nvme_rdma_create_ctrl
> [ 1098.415009] nvme_rdma_setup_ctrl
> [ 1098.415010] nvme_rdma_configure_admin_queue
> [ 1098.415010] nvme_rdma_alloc_queue
> [ 1098.415032] rdma_resolve_addr
> [ 1098.415032] cma_bind_addr
> [ 1098.415033] rdma_bind_addr
> 
> 2) When a connect happens from the client side. Call trace is the same
> as above, plus "nvme_rdma_alloc_queue()" is called n number of times;
> n being the number of IO queues being created.
> 
> On the server side, when an nvmf port is enabled, that also triggers a
> call to "rdma_bind_addr()", but that is not from the nvme rdma module.
> may be nvme target rdma? (not sure).
> 
> Does this make sense or am I missing something here?

It makes sense; delaying the creation of CM IDs until user space starts
will solve this init-time problem

> 
> > If the rdma_create_id() is not on a callchain from module_init then you don't have a problem.
> 
> I am a little confused. I thought the problem occurs from a call to
> either "rdma_resolve_addr()" which calls "rdma_bind_addr()",
> or a direct call to "rdma_bind_addr()" as in rtrs case.
> In both the cases, a call to "rdma_create_id()" is needed before this.

Right rdma_create_id() must precede anything that has problems, and it
should not be done from module_init.

Jason
Jason Gunthorpe June 23, 2020, 5:23 p.m. UTC | #13
On Tue, Jun 23, 2020 at 05:05:51PM +0530, Haris Iqbal wrote:
> On Tue, Jun 23, 2020 at 7:54 PM Jason Gunthorpe <jgg@ziepe.ca> wrote:
> >
> > On Tue, Jun 23, 2020 at 07:15:03PM +0530, Haris Iqbal wrote:
> > > On Tue, Jun 23, 2020 at 5:47 PM Jason Gunthorpe <jgg@ziepe.ca> wrote:
> > > >
> > > > On Tue, Jun 23, 2020 at 03:20:27PM +0530, Haris Iqbal wrote:
> > > > > Hi Jason and Leon,
> > > > >
> > > > > Did you get a chance to look into my previous email?
> > > >
> > > > Was there a question?
> > >
> > > Multiple actually :)
> > >
> > > >
> > > > Jason
> > >
> > > In response to your emails,
> > >
> > > > Somehow nvme-rdma works:
> > >
> > > I think that's because the callchain during the nvme_rdma_init_module
> > > initialization stops at "nvmf_register_transport()". Here only the
> > > "struct nvmf_transport_ops nvme_rdma_transport" is registered, which
> > > contains the function "nvme_rdma_create_ctrl()". I tested this in my
> > > local setup and during kernel boot, that's the extent of the
> > > callchain.
> > > The ".create_ctrl"; which now points to "nvme_rdma_create_ctrl()" is
> > > called later from "nvmf_dev_write()". I am not sure when this is
> > > called, probably when the "discover" happens from the client side or
> > > during the server config.
> > >
> > > It seems that the "rdma_bind_addr()" is called by the nvme rdma
> > > module; but during the following events
> > > 1) When a discover happens from the client side. Call trace for that looks like,
> > > [ 1098.409398] nvmf_dev_write
> > > [ 1098.409403] nvmf_create_ctrl
> > > [ 1098.414568] nvme_rdma_create_ctrl
> > > [ 1098.415009] nvme_rdma_setup_ctrl
> > > [ 1098.415010] nvme_rdma_configure_admin_queue
> > > [ 1098.415010] nvme_rdma_alloc_queue
> > > [ 1098.415032] rdma_resolve_addr
> > > [ 1098.415032] cma_bind_addr
> > > [ 1098.415033] rdma_bind_addr
> > >
> > > 2) When a connect happens from the client side. Call trace is the same
> > > as above, plus "nvme_rdma_alloc_queue()" is called n number of times;
> > > n being the number of IO queues being created.
> > >
> > > On the server side, when an nvmf port is enabled, that also triggers a
> > > call to "rdma_bind_addr()", but that is not from the nvme rdma module.
> > > may be nvme target rdma? (not sure).
> > >
> > > Does this make sense or am I missing something here?
> >
> > It makes sense; delaying the creation of CM IDs until user space starts
> > will solve this init-time problem
> 
> Right, and the patch is trying to achieve the delay by changing the
> init level to "late_initcall()"

It should not be done with initcall levels

> > Right rdma_create_id() must precede anything that has problems, and it
> > should not be done from module_init.
> 
> I understand this, but I am not sure why that is; as in why it should
> not be done from module_init?

Because that is how our module ordering scheme works

> > It is not OK to create RDMA CM IDs outside
> > a client - CM IDs are supposed to be cleaned up when the client is
> > removed.
> >
> > Similarly they are supposed to be created from the client attachment.
> 
> This again is a little confusing to me, since what I've observed in
> nvmt is, when a server port is created, the "rdma_bind_addr()"
> function is called.
> And this goes well with the server/target and client/initiator model,
> where the server has to get ready and start listening before a client
> can initiate a connection.
> What am I missing here?

client means a struct ib_client

Jason

> 
> >
> > Jason
>
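
The "client" Jason refers to is the per-ULP registration hook declared in
include/rdma/ib_verbs.h; a trimmed sketch (callback signatures vary between
kernel versions, e.g. add() returns int in newer kernels):

struct ib_client {
	const char *name;
	void (*add)(struct ib_device *ibdev);
	void (*remove)(struct ib_device *ibdev, void *client_data);
	/* rename(), get_net_dev_by_params(), list linkage, ... trimmed */
};

int  ib_register_client(struct ib_client *client);
void ib_unregister_client(struct ib_client *client);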

Patch

diff --git a/drivers/block/rnbd/rnbd-srv.c b/drivers/block/rnbd/rnbd-srv.c
index 86e61523907b..213df05e5994 100644
--- a/drivers/block/rnbd/rnbd-srv.c
+++ b/drivers/block/rnbd/rnbd-srv.c
@@ -840,5 +840,5 @@  static void __exit rnbd_srv_cleanup_module(void)
 	rnbd_srv_destroy_sysfs_files();
 }
 
-module_init(rnbd_srv_init_module);
+late_initcall(rnbd_srv_init_module);
 module_exit(rnbd_srv_cleanup_module);