diff mbox series

[RFC,WIP,1/2] RDMA/Core: add RDMA_NLDEV_CMD_NEWLINK/DELLLINK support

Message ID 5753829890367ccdd54a6a80d2420930922dd4cd.1537896647.git.swise@opengridcomputing.com (mailing list archive)
State RFC
Headers show
Series *** SUBJECT HERE *** | expand

Commit Message

Steve Wise Sept. 13, 2018, 7:16 p.m. UTC
Add support for new LINK messages to allow adding and deleting rdma
interfaces.  This will be used initially for soft rdma drivers which
instantiate device instances dynamically by the admin specifying a netdev
device to use.  The rdma_rxe module will be the first user of these
messages.

The design is modeled after RTNL_NEWLINK/DELLINK:  rdma drivers
register with the rdma core if they provide link add/delete functions.
Each driver registers with a unique "type" string, that is used to
dispatch messages coming from user space.  A new RDMA_NLDEV_ATTR is
defined for the "type" string.  User mode will pass 3 attributes in a
NEWLINK message: RDMA_NLDEV_ATTR_IBDEV_NAME for the desired rdma device
name to be created, RDMA_NLDEV_ATTR_LINK_TYPE for the "type" of link
being added, and RDMA_NLDEV_ATTR_NDEV_NAME for the net_device interface
to use for this link. The DELLINK message will contain the IBDEV_NAME
and LINK_TYPE attributes.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
---
 drivers/infiniband/core/nldev.c  | 113 +++++++++++++++++++++++++++++++++++++++
 include/rdma/rdma_netlink.h      |  11 ++++
 include/uapi/rdma/rdma_netlink.h |   9 ++++
 3 files changed, 133 insertions(+)

Comments

Leon Romanovsky Sept. 25, 2018, 5:55 p.m. UTC | #1
On Thu, Sep 13, 2018 at 12:16:20PM -0700, Steve Wise wrote:
> Add support for new LINK messages to allow adding and deleting rdma
> interfaces.  This will be used initially for soft rdma drivers which
> instantiate device instances dynamically by the admin specifying a netdev
> device to use.  The rdma_rxe module will be the first user of these
> messages.
>
> The design is modeled after RTNL_NEWLINK/DELLINK:  rdma drivers
> register with the rdma core if they provide link add/delete functions.
> Each driver registers with a unique "type" string, that is used to
> dispatch messages coming from user space.  A new RDMA_NLDEV_ATTR is
> defined for the "type" string.  User mode will pass 3 attributes in a
> NEWLINK message: RDMA_NLDEV_ATTR_IBDEV_NAME for the desired rdma device
> name to be created, RDMA_NLDEV_ATTR_LINK_TYPE for the "type" of link
> being added, and RDMA_NLDEV_ATTR_NDEV_NAME for the net_device interface
> to use for this link. The DELLINK message will contain the IBDEV_NAME
> and LINK_TYPE attributes.
>
> Signed-off-by: Steve Wise <swise@opengridcomputing.com>
> ---

<...>
>
> +	RDMA_NLDEV_CMD_NEWLINK,
> +
> +	RDMA_NLDEV_CMD_DELLINK,
> +
>  	RDMA_NLDEV_NUM_OPS
>  };
>
> @@ -427,6 +431,11 @@ enum rdma_nldev_attr {
>  	RDMA_NLDEV_ATTR_DRIVER_U64,		/* u64 */
>
>  	/*
> +	 * Identifies the rdma driver. eg: "rxe" or "siw"
> +	 */
> +	RDMA_NLDEV_ATTR_LINK_TYPE,		/* string */

I'll review it more deeply tomorrow, but two things caught my attention:
1. You need RDMA_NL_ADMIN_PERM for your netlink commands
2. LINK_TYPE should be index and not string, because it is why we are
using using netlink :)

Thanks

> +
> +	/*
>  	 * Always the end
>  	 */
>  	RDMA_NLDEV_ATTR_MAX
> --
> 1.8.3.1
>
Leon Romanovsky Sept. 25, 2018, 6:19 p.m. UTC | #2
On Tue, Sep 25, 2018 at 01:00:47PM -0500, Steve Wise wrote:
>
>
> On 9/25/2018 12:55 PM, Leon Romanovsky wrote:
> > On Thu, Sep 13, 2018 at 12:16:20PM -0700, Steve Wise wrote:
> >> Add support for new LINK messages to allow adding and deleting rdma
> >> interfaces.  This will be used initially for soft rdma drivers which
> >> instantiate device instances dynamically by the admin specifying a netdev
> >> device to use.  The rdma_rxe module will be the first user of these
> >> messages.
> >>
> >> The design is modeled after RTNL_NEWLINK/DELLINK:  rdma drivers
> >> register with the rdma core if they provide link add/delete functions.
> >> Each driver registers with a unique "type" string, that is used to
> >> dispatch messages coming from user space.  A new RDMA_NLDEV_ATTR is
> >> defined for the "type" string.  User mode will pass 3 attributes in a
> >> NEWLINK message: RDMA_NLDEV_ATTR_IBDEV_NAME for the desired rdma device
> >> name to be created, RDMA_NLDEV_ATTR_LINK_TYPE for the "type" of link
> >> being added, and RDMA_NLDEV_ATTR_NDEV_NAME for the net_device interface
> >> to use for this link. The DELLINK message will contain the IBDEV_NAME
> >> and LINK_TYPE attributes.
> >>
> >> Signed-off-by: Steve Wise <swise@opengridcomputing.com>
> >> ---
> > <...>
> >> +	RDMA_NLDEV_CMD_NEWLINK,
> >> +
> >> +	RDMA_NLDEV_CMD_DELLINK,
> >> +
> >>  	RDMA_NLDEV_NUM_OPS
> >>  };
> >>
> >> @@ -427,6 +431,11 @@ enum rdma_nldev_attr {
> >>  	RDMA_NLDEV_ATTR_DRIVER_U64,		/* u64 */
> >>
> >>  	/*
> >> +	 * Identifies the rdma driver. eg: "rxe" or "siw"
> >> +	 */
> >> +	RDMA_NLDEV_ATTR_LINK_TYPE,		/* string */
> > I'll review it more deeply tomorrow, but two things caught my attention:
> > 1. You need RDMA_NL_ADMIN_PERM for your netlink commands
>
> Yes.  Thanks.
>
> > 2. LINK_TYPE should be index and not string, because it is why we are
> > using using netlink :)
>
> Hey Leon,
>
> Using a string allows the rdma tool to not have to map a string to some
> enum value, allows adding new drivers w/o updating rdmatool...

Most probably, you won't find real system with old userspace and new kernel.

>
> This was at Jason's suggestion.
>
> Steve.
>
Jason Gunthorpe Sept. 25, 2018, 8:11 p.m. UTC | #3
On Tue, Sep 25, 2018 at 08:55:29PM +0300, Leon Romanovsky wrote:
> On Thu, Sep 13, 2018 at 12:16:20PM -0700, Steve Wise wrote:
> > Add support for new LINK messages to allow adding and deleting rdma
> > interfaces.  This will be used initially for soft rdma drivers which
> > instantiate device instances dynamically by the admin specifying a netdev
> > device to use.  The rdma_rxe module will be the first user of these
> > messages.
> >
> > The design is modeled after RTNL_NEWLINK/DELLINK:  rdma drivers
> > register with the rdma core if they provide link add/delete functions.
> > Each driver registers with a unique "type" string, that is used to
> > dispatch messages coming from user space.  A new RDMA_NLDEV_ATTR is
> > defined for the "type" string.  User mode will pass 3 attributes in a
> > NEWLINK message: RDMA_NLDEV_ATTR_IBDEV_NAME for the desired rdma device
> > name to be created, RDMA_NLDEV_ATTR_LINK_TYPE for the "type" of link
> > being added, and RDMA_NLDEV_ATTR_NDEV_NAME for the net_device interface
> > to use for this link. The DELLINK message will contain the IBDEV_NAME
> > and LINK_TYPE attributes.
> >
> > Signed-off-by: Steve Wise <swise@opengridcomputing.com>
> 
> <...>
> >
> > +	RDMA_NLDEV_CMD_NEWLINK,
> > +
> > +	RDMA_NLDEV_CMD_DELLINK,
> > +
> >  	RDMA_NLDEV_NUM_OPS
> >  };
> >
> > @@ -427,6 +431,11 @@ enum rdma_nldev_attr {
> >  	RDMA_NLDEV_ATTR_DRIVER_U64,		/* u64 */
> >
> >  	/*
> > +	 * Identifies the rdma driver. eg: "rxe" or "siw"
> > +	 */
> > +	RDMA_NLDEV_ATTR_LINK_TYPE,		/* string */
> 
> I'll review it more deeply tomorrow, but two things caught my attention:
> 1. You need RDMA_NL_ADMIN_PERM for your netlink commands
> 2. LINK_TYPE should be index and not string, because it is why we are
> using using netlink :)

netdev uses strings. See IFLA_INFO_KIND

strings are not bad in netlink

Jason
Jason Gunthorpe Sept. 25, 2018, 8:16 p.m. UTC | #4
On Thu, Sep 13, 2018 at 12:16:20PM -0700, Steve Wise wrote:
> Add support for new LINK messages to allow adding and deleting rdma
> interfaces.  This will be used initially for soft rdma drivers which
> instantiate device instances dynamically by the admin specifying a netdev
> device to use.  The rdma_rxe module will be the first user of these
> messages.
> 
> The design is modeled after RTNL_NEWLINK/DELLINK:  rdma drivers
> register with the rdma core if they provide link add/delete functions.
> Each driver registers with a unique "type" string, that is used to
> dispatch messages coming from user space.  A new RDMA_NLDEV_ATTR is
> defined for the "type" string.  User mode will pass 3 attributes in a
> NEWLINK message: RDMA_NLDEV_ATTR_IBDEV_NAME for the desired rdma device
> name to be created, RDMA_NLDEV_ATTR_LINK_TYPE for the "type" of link
> being added, and RDMA_NLDEV_ATTR_NDEV_NAME for the net_device interface
> to use for this link. The DELLINK message will contain the IBDEV_NAME
> and LINK_TYPE attributes.
> 
> Signed-off-by: Steve Wise <swise@opengridcomputing.com>
>  drivers/infiniband/core/nldev.c  | 113 +++++++++++++++++++++++++++++++++++++++
>  include/rdma/rdma_netlink.h      |  11 ++++
>  include/uapi/rdma/rdma_netlink.h |   9 ++++
>  3 files changed, 133 insertions(+)
> 
> diff --git a/drivers/infiniband/core/nldev.c b/drivers/infiniband/core/nldev.c
> index 0385ab438320..d107b982c210 100644
> +++ b/drivers/infiniband/core/nldev.c
> @@ -33,6 +33,7 @@
>  #include <linux/module.h>
>  #include <linux/pid.h>
>  #include <linux/pid_namespace.h>
> +#include <linux/mutex.h>
>  #include <net/netlink.h>
>  #include <rdma/rdma_cm.h>
>  #include <rdma/rdma_netlink.h>
> @@ -107,6 +108,8 @@
>  	[RDMA_NLDEV_ATTR_DRIVER_U32]		= { .type = NLA_U32 },
>  	[RDMA_NLDEV_ATTR_DRIVER_S64]		= { .type = NLA_S64 },
>  	[RDMA_NLDEV_ATTR_DRIVER_U64]		= { .type = NLA_U64 },
> +	[RDMA_NLDEV_ATTR_LINK_TYPE]		= { .type = NLA_NUL_STRING,
> +						    .len = IFNAMSIZ },
>  };
>  
>  static int put_driver_name_print_type(struct sk_buff *msg, const char *name,
> @@ -1072,6 +1075,110 @@ static int nldev_res_get_pd_dumpit(struct sk_buff *skb,
>  	return res_get_common_dumpit(skb, cb, RDMA_RESTRACK_PD);
>  }
>  
> +static LIST_HEAD(link_ops);
> +static DEFINE_MUTEX(link_ops_mutex);
> +
> +void rdma_link_register(struct rdma_link_ops *ops)
> +{
> +	mutex_lock(&link_ops_mutex);
> +	list_add(&ops->list, &link_ops);

Make sure the name is unique

> +	mutex_unlock(&link_ops_mutex);
> +}
> +EXPORT_SYMBOL(rdma_link_register);
> +
> +void rdma_link_unregister(struct rdma_link_ops *ops)
> +{
> +	mutex_lock(&link_ops_mutex);
> +	list_del_init(&ops->list);
> +	mutex_unlock(&link_ops_mutex);
> +}
> +EXPORT_SYMBOL(rdma_link_unregister);
> +
> +static const struct rdma_link_ops *link_ops_get(const char *type)
> +{
> +	const struct rdma_link_ops *ops;
> +
> +	mutex_lock(&link_ops_mutex);
> +	list_for_each_entry(ops, &link_ops, list) {
> +		if (!strncmp(ops->type, type, IFNAMSIZ))

Why strncmp here? Both strings should be null terminated.

> +			goto out;
> +	}
> +	ops = NULL;

This should trigger module automatic loading and try again.

Ie call 

   request_module("rdma-link-%s", type);

And rxe should provide that symbol as a module-alias.

See the similar code in netdev, ie MODULE_ALIAS_RTNL_LINK and the
request_module calls

Jason
Leon Romanovsky Sept. 26, 2018, 8:05 a.m. UTC | #5
On Tue, Sep 25, 2018 at 02:11:18PM -0600, Jason Gunthorpe wrote:
> On Tue, Sep 25, 2018 at 08:55:29PM +0300, Leon Romanovsky wrote:
> > On Thu, Sep 13, 2018 at 12:16:20PM -0700, Steve Wise wrote:
> > > Add support for new LINK messages to allow adding and deleting rdma
> > > interfaces.  This will be used initially for soft rdma drivers which
> > > instantiate device instances dynamically by the admin specifying a netdev
> > > device to use.  The rdma_rxe module will be the first user of these
> > > messages.
> > >
> > > The design is modeled after RTNL_NEWLINK/DELLINK:  rdma drivers
> > > register with the rdma core if they provide link add/delete functions.
> > > Each driver registers with a unique "type" string, that is used to
> > > dispatch messages coming from user space.  A new RDMA_NLDEV_ATTR is
> > > defined for the "type" string.  User mode will pass 3 attributes in a
> > > NEWLINK message: RDMA_NLDEV_ATTR_IBDEV_NAME for the desired rdma device
> > > name to be created, RDMA_NLDEV_ATTR_LINK_TYPE for the "type" of link
> > > being added, and RDMA_NLDEV_ATTR_NDEV_NAME for the net_device interface
> > > to use for this link. The DELLINK message will contain the IBDEV_NAME
> > > and LINK_TYPE attributes.
> > >
> > > Signed-off-by: Steve Wise <swise@opengridcomputing.com>
> >
> > <...>
> > >
> > > +	RDMA_NLDEV_CMD_NEWLINK,
> > > +
> > > +	RDMA_NLDEV_CMD_DELLINK,
> > > +
> > >  	RDMA_NLDEV_NUM_OPS
> > >  };
> > >
> > > @@ -427,6 +431,11 @@ enum rdma_nldev_attr {
> > >  	RDMA_NLDEV_ATTR_DRIVER_U64,		/* u64 */
> > >
> > >  	/*
> > > +	 * Identifies the rdma driver. eg: "rxe" or "siw"
> > > +	 */
> > > +	RDMA_NLDEV_ATTR_LINK_TYPE,		/* string */
> >
> > I'll review it more deeply tomorrow, but two things caught my attention:
> > 1. You need RDMA_NL_ADMIN_PERM for your netlink commands
> > 2. LINK_TYPE should be index and not string, because it is why we are
> > using using netlink :)
>
> netdev uses strings. See IFLA_INFO_KIND
>
> strings are not bad in netlink

It is not the best example, because IFLA_INFO_KIND is used to forward
filtering information and one day we will do the same with various
rdmatool dumping options. However in case of ibdevice type, it is
declared as number and doesn't seem to me right to convert it to string
and limit our KABI to name.

Thanks

>
> Jason
Leon Romanovsky Sept. 26, 2018, 8:24 a.m. UTC | #6
On Thu, Sep 13, 2018 at 12:16:20PM -0700, Steve Wise wrote:
> Add support for new LINK messages to allow adding and deleting rdma
> interfaces.  This will be used initially for soft rdma drivers which
> instantiate device instances dynamically by the admin specifying a netdev
> device to use.  The rdma_rxe module will be the first user of these
> messages.
>
> The design is modeled after RTNL_NEWLINK/DELLINK:  rdma drivers
> register with the rdma core if they provide link add/delete functions.
> Each driver registers with a unique "type" string, that is used to
> dispatch messages coming from user space.  A new RDMA_NLDEV_ATTR is
> defined for the "type" string.  User mode will pass 3 attributes in a
> NEWLINK message: RDMA_NLDEV_ATTR_IBDEV_NAME for the desired rdma device
> name to be created, RDMA_NLDEV_ATTR_LINK_TYPE for the "type" of link
> being added, and RDMA_NLDEV_ATTR_NDEV_NAME for the net_device interface
> to use for this link. The DELLINK message will contain the IBDEV_NAME
> and LINK_TYPE attributes.
>
> Signed-off-by: Steve Wise <swise@opengridcomputing.com>
> ---
>  drivers/infiniband/core/nldev.c  | 113 +++++++++++++++++++++++++++++++++++++++
>  include/rdma/rdma_netlink.h      |  11 ++++
>  include/uapi/rdma/rdma_netlink.h |   9 ++++
>  3 files changed, 133 insertions(+)
>
> diff --git a/drivers/infiniband/core/nldev.c b/drivers/infiniband/core/nldev.c
> index 0385ab438320..d107b982c210 100644
> --- a/drivers/infiniband/core/nldev.c
> +++ b/drivers/infiniband/core/nldev.c
> @@ -33,6 +33,7 @@
>  #include <linux/module.h>
>  #include <linux/pid.h>
>  #include <linux/pid_namespace.h>
> +#include <linux/mutex.h>
>  #include <net/netlink.h>
>  #include <rdma/rdma_cm.h>
>  #include <rdma/rdma_netlink.h>
> @@ -107,6 +108,8 @@
>  	[RDMA_NLDEV_ATTR_DRIVER_U32]		= { .type = NLA_U32 },
>  	[RDMA_NLDEV_ATTR_DRIVER_S64]		= { .type = NLA_S64 },
>  	[RDMA_NLDEV_ATTR_DRIVER_U64]		= { .type = NLA_U64 },
> +	[RDMA_NLDEV_ATTR_LINK_TYPE]		= { .type = NLA_NUL_STRING,
> +						    .len = IFNAMSIZ },

In case, we are continuing this path, better to use our define RDMA_NLDEV_ATTR_ENTRY_STRLEN.

>  };
>
>  static int put_driver_name_print_type(struct sk_buff *msg, const char *name,
> @@ -1072,6 +1075,110 @@ static int nldev_res_get_pd_dumpit(struct sk_buff *skb,
>  	return res_get_common_dumpit(skb, cb, RDMA_RESTRACK_PD);
>  }
>
> +static LIST_HEAD(link_ops);
> +static DEFINE_MUTEX(link_ops_mutex);
> +
> +void rdma_link_register(struct rdma_link_ops *ops)
> +{
> +	mutex_lock(&link_ops_mutex);
> +	list_add(&ops->list, &link_ops);
> +	mutex_unlock(&link_ops_mutex);
> +}
> +EXPORT_SYMBOL(rdma_link_register);
> +
> +void rdma_link_unregister(struct rdma_link_ops *ops)
> +{
> +	mutex_lock(&link_ops_mutex);
> +	list_del_init(&ops->list);

Why did you use list_del_init and not list_del?

And in case we will use device_type, we won't need manage list at all :)

> +	mutex_unlock(&link_ops_mutex);
> +}
> +EXPORT_SYMBOL(rdma_link_unregister);
> +
> +static const struct rdma_link_ops *link_ops_get(const char *type)
> +{
> +	const struct rdma_link_ops *ops;
> +
> +	mutex_lock(&link_ops_mutex);
> +	list_for_each_entry(ops, &link_ops, list) {
> +		if (!strncmp(ops->type, type, IFNAMSIZ))
> +			goto out;
> +	}
> +	ops = NULL;
> +out:
> +	mutex_unlock(&link_ops_mutex);
> +	return ops;
> +}
> +
> +static int nldev_newlink(struct sk_buff *skb, struct nlmsghdr *nlh,
> +			  struct netlink_ext_ack *extack)
> +{
> +	struct nlattr *tb[RDMA_NLDEV_ATTR_MAX];
> +	char ibdev_name[IB_DEVICE_NAME_MAX];
> +	const struct rdma_link_ops *ops;
> +	char ndev_name[IFNAMSIZ];
> +	char type[IFNAMSIZ];
> +	int err;
> +
> +	err = nlmsg_parse(nlh, 0, tb, RDMA_NLDEV_ATTR_MAX - 1,
> +			  nldev_policy, extack);
> +	if (err || !tb[RDMA_NLDEV_ATTR_DEV_NAME] ||
> +	    !tb[RDMA_NLDEV_ATTR_LINK_TYPE] || !tb[RDMA_NLDEV_ATTR_NDEV_NAME]) {
> +		err = -EINVAL;
> +		goto err_out;
> +	}
> +
> +	nla_strlcpy(ibdev_name, tb[RDMA_NLDEV_ATTR_DEV_NAME],
> +		    sizeof(ibdev_name));

Query from userspace should be done RDMA_NLDEV_ATTR_DEV_INDEX, because
it is our handle and not device name.

> +	nla_strlcpy(type, tb[RDMA_NLDEV_ATTR_LINK_TYPE], sizeof(type));

Will it mean that type should be unique?

> +	nla_strlcpy(ndev_name, tb[RDMA_NLDEV_ATTR_NDEV_NAME],
> +		    sizeof(ndev_name));
> +	pr_debug("ibdev_name |%s| type |%s| ndev_name |%s|\n", ibdev_name,
> +		 type, ndev_name);
> +
> +	ops = link_ops_get(type);
> +	if (!ops) {
> +		err = -ENODEV;
> +		goto err_out;
> +	}
> +
> +	err = ops->newlink(ibdev_name, ndev_name);
> +err_out:
> +	return err;
> +}
> +
> +static int nldev_dellink(struct sk_buff *skb, struct nlmsghdr *nlh,
> +			  struct netlink_ext_ack *extack)
> +{
> +	struct nlattr *tb[RDMA_NLDEV_ATTR_MAX];
> +	char ibdev_name[IB_DEVICE_NAME_MAX];
> +	const struct rdma_link_ops *ops;
> +	char type[IFNAMSIZ];
> +	int err;
> +
> +	err = nlmsg_parse(nlh, 0, tb, RDMA_NLDEV_ATTR_MAX - 1,
> +			  nldev_policy, extack);
> +	if (err || !tb[RDMA_NLDEV_ATTR_DEV_NAME] ||
> +	    !tb[RDMA_NLDEV_ATTR_LINK_TYPE]) {
> +		err = !err ? -EINVAL : err;
> +		goto err_out;
> +	}
> +
> +	nla_strlcpy(ibdev_name, tb[RDMA_NLDEV_ATTR_DEV_NAME],
> +		    sizeof(ibdev_name));
> +	nla_strlcpy(type, tb[RDMA_NLDEV_ATTR_LINK_TYPE], sizeof(type));
> +	pr_debug("ibdev_name |%s| type |%s|\n", ibdev_name, type);

The same comments,

> +
> +	ops = link_ops_get(type);
> +	if (!ops) {
> +		err = -ENODEV;
> +		goto err_out;
> +	}
> +
> +	err = ops->dellink(ibdev_name);
> +err_out:
> +	return err;
> +}
> +
>  static const struct rdma_nl_cbs nldev_cb_table[RDMA_NLDEV_NUM_OPS] = {
>  	[RDMA_NLDEV_CMD_GET] = {
>  		.doit = nldev_get_doit,
> @@ -1110,6 +1217,12 @@ static int nldev_res_get_pd_dumpit(struct sk_buff *skb,
>  	[RDMA_NLDEV_CMD_RES_PD_GET] = {
>  		.dump = nldev_res_get_pd_dumpit,
>  	},
> +	[RDMA_NLDEV_CMD_NEWLINK] = {
> +		.doit = nldev_newlink,
> +	},
> +	[RDMA_NLDEV_CMD_DELLINK] = {
> +		.doit = nldev_dellink,
> +	},
>  };
>
>  void __init nldev_init(void)
> diff --git a/include/rdma/rdma_netlink.h b/include/rdma/rdma_netlink.h
> index c369703fcd69..b5a5a1a7cddc 100644
> --- a/include/rdma/rdma_netlink.h
> +++ b/include/rdma/rdma_netlink.h
> @@ -99,4 +99,15 @@ int ibnl_put_attr(struct sk_buff *skb, struct nlmsghdr *nlh,
>   * Returns 0 on success or a negative for no listeners.
>   */
>  int rdma_nl_chk_listeners(unsigned int group);
> +
> +struct rdma_link_ops {
> +	struct list_head list;
> +	const char *type;
> +	int (*newlink)(char *ibdev_name, char *ndev_name);
> +	int (*dellink)(char *ibdev_name);
> +};
> +
> +void rdma_link_register(struct rdma_link_ops *ops);
> +void rdma_link_unregister(struct rdma_link_ops *ops);
> +
>  #endif /* _RDMA_NETLINK_H */
> diff --git a/include/uapi/rdma/rdma_netlink.h b/include/uapi/rdma/rdma_netlink.h
> index edba6351ac13..025172d41fa7 100644
> --- a/include/uapi/rdma/rdma_netlink.h
> +++ b/include/uapi/rdma/rdma_netlink.h
> @@ -246,6 +246,10 @@ enum rdma_nldev_command {
>
>  	RDMA_NLDEV_CMD_RES_PD_GET, /* can dump */
>
> +	RDMA_NLDEV_CMD_NEWLINK,
> +
> +	RDMA_NLDEV_CMD_DELLINK,
> +

Let's reuse 3 and 4 indexes, which are not occupied.

>  	RDMA_NLDEV_NUM_OPS
>  };
>
> @@ -427,6 +431,11 @@ enum rdma_nldev_attr {
>  	RDMA_NLDEV_ATTR_DRIVER_U64,		/* u64 */
>
>  	/*
> +	 * Identifies the rdma driver. eg: "rxe" or "siw"
> +	 */
> +	RDMA_NLDEV_ATTR_LINK_TYPE,		/* string */
> +
> +	/*
>  	 * Always the end
>  	 */
>  	RDMA_NLDEV_ATTR_MAX
> --
> 1.8.3.1
>
Steve Wise Sept. 26, 2018, 3:44 p.m. UTC | #7
On 9/26/2018 3:05 AM, Leon Romanovsky wrote:
> On Tue, Sep 25, 2018 at 02:11:18PM -0600, Jason Gunthorpe wrote:
>> On Tue, Sep 25, 2018 at 08:55:29PM +0300, Leon Romanovsky wrote:
>>> On Thu, Sep 13, 2018 at 12:16:20PM -0700, Steve Wise wrote:
>>>> Add support for new LINK messages to allow adding and deleting rdma
>>>> interfaces.  This will be used initially for soft rdma drivers which
>>>> instantiate device instances dynamically by the admin specifying a netdev
>>>> device to use.  The rdma_rxe module will be the first user of these
>>>> messages.
>>>>
>>>> The design is modeled after RTNL_NEWLINK/DELLINK:  rdma drivers
>>>> register with the rdma core if they provide link add/delete functions.
>>>> Each driver registers with a unique "type" string, that is used to
>>>> dispatch messages coming from user space.  A new RDMA_NLDEV_ATTR is
>>>> defined for the "type" string.  User mode will pass 3 attributes in a
>>>> NEWLINK message: RDMA_NLDEV_ATTR_IBDEV_NAME for the desired rdma device
>>>> name to be created, RDMA_NLDEV_ATTR_LINK_TYPE for the "type" of link
>>>> being added, and RDMA_NLDEV_ATTR_NDEV_NAME for the net_device interface
>>>> to use for this link. The DELLINK message will contain the IBDEV_NAME
>>>> and LINK_TYPE attributes.
>>>>
>>>> Signed-off-by: Steve Wise <swise@opengridcomputing.com>
>>>
>>> <...>
>>>>
>>>> +	RDMA_NLDEV_CMD_NEWLINK,
>>>> +
>>>> +	RDMA_NLDEV_CMD_DELLINK,
>>>> +
>>>>  	RDMA_NLDEV_NUM_OPS
>>>>  };
>>>>
>>>> @@ -427,6 +431,11 @@ enum rdma_nldev_attr {
>>>>  	RDMA_NLDEV_ATTR_DRIVER_U64,		/* u64 */
>>>>
>>>>  	/*
>>>> +	 * Identifies the rdma driver. eg: "rxe" or "siw"
>>>> +	 */
>>>> +	RDMA_NLDEV_ATTR_LINK_TYPE,		/* string */
>>>
>>> I'll review it more deeply tomorrow, but two things caught my attention:
>>> 1. You need RDMA_NL_ADMIN_PERM for your netlink commands
>>> 2. LINK_TYPE should be index and not string, because it is why we are
>>> using using netlink :)
>>
>> netdev uses strings. See IFLA_INFO_KIND
>>
>> strings are not bad in netlink
> 
> It is not the best example, because IFLA_INFO_KIND is used to forward
> filtering information and one day we will do the same with various
> rdmatool dumping options. However in case of ibdevice type, it is
> declared as number and doesn't seem to me right to convert it to string
> and limit our KABI to name.
> 

Are you suggesting rdmatool uses enum rdma_driver_id from
rdma/rdma_user_ioctl_cmds.h?  So I add code in rdmatool to map the
user's string, eg "mlx5" or "rxe" into the enum and send that down?

That works for me.  If rdma_driver_id is part of the KABI, then I guess
its fine for rdmatool to use it and map names to enum values...

Steve.
Steve Wise Sept. 26, 2018, 3:51 p.m. UTC | #8
On 9/25/2018 3:16 PM, Jason Gunthorpe wrote:
> On Thu, Sep 13, 2018 at 12:16:20PM -0700, Steve Wise wrote:
>> Add support for new LINK messages to allow adding and deleting rdma
>> interfaces.  This will be used initially for soft rdma drivers which
>> instantiate device instances dynamically by the admin specifying a netdev
>> device to use.  The rdma_rxe module will be the first user of these
>> messages.
>>
>> The design is modeled after RTNL_NEWLINK/DELLINK:  rdma drivers
>> register with the rdma core if they provide link add/delete functions.
>> Each driver registers with a unique "type" string, that is used to
>> dispatch messages coming from user space.  A new RDMA_NLDEV_ATTR is
>> defined for the "type" string.  User mode will pass 3 attributes in a
>> NEWLINK message: RDMA_NLDEV_ATTR_IBDEV_NAME for the desired rdma device
>> name to be created, RDMA_NLDEV_ATTR_LINK_TYPE for the "type" of link
>> being added, and RDMA_NLDEV_ATTR_NDEV_NAME for the net_device interface
>> to use for this link. The DELLINK message will contain the IBDEV_NAME
>> and LINK_TYPE attributes.
>>
>> Signed-off-by: Steve Wise <swise@opengridcomputing.com>
>>  drivers/infiniband/core/nldev.c  | 113 +++++++++++++++++++++++++++++++++++++++
>>  include/rdma/rdma_netlink.h      |  11 ++++
>>  include/uapi/rdma/rdma_netlink.h |   9 ++++
>>  3 files changed, 133 insertions(+)
>>
>> diff --git a/drivers/infiniband/core/nldev.c b/drivers/infiniband/core/nldev.c
>> index 0385ab438320..d107b982c210 100644
>> +++ b/drivers/infiniband/core/nldev.c
>> @@ -33,6 +33,7 @@
>>  #include <linux/module.h>
>>  #include <linux/pid.h>
>>  #include <linux/pid_namespace.h>
>> +#include <linux/mutex.h>
>>  #include <net/netlink.h>
>>  #include <rdma/rdma_cm.h>
>>  #include <rdma/rdma_netlink.h>
>> @@ -107,6 +108,8 @@
>>  	[RDMA_NLDEV_ATTR_DRIVER_U32]		= { .type = NLA_U32 },
>>  	[RDMA_NLDEV_ATTR_DRIVER_S64]		= { .type = NLA_S64 },
>>  	[RDMA_NLDEV_ATTR_DRIVER_U64]		= { .type = NLA_U64 },
>> +	[RDMA_NLDEV_ATTR_LINK_TYPE]		= { .type = NLA_NUL_STRING,
>> +						    .len = IFNAMSIZ },
>>  };
>>  
>>  static int put_driver_name_print_type(struct sk_buff *msg, const char *name,
>> @@ -1072,6 +1075,110 @@ static int nldev_res_get_pd_dumpit(struct sk_buff *skb,
>>  	return res_get_common_dumpit(skb, cb, RDMA_RESTRACK_PD);
>>  }
>>  
>> +static LIST_HEAD(link_ops);
>> +static DEFINE_MUTEX(link_ops_mutex);
>> +
>> +void rdma_link_register(struct rdma_link_ops *ops)
>> +{
>> +	mutex_lock(&link_ops_mutex);
>> +	list_add(&ops->list, &link_ops);
> 
> Make sure the name is unique
>

The function name?

>> +	mutex_unlock(&link_ops_mutex);
>> +}
>> +EXPORT_SYMBOL(rdma_link_register);
>> +
>> +void rdma_link_unregister(struct rdma_link_ops *ops)
>> +{
>> +	mutex_lock(&link_ops_mutex);
>> +	list_del_init(&ops->list);
>> +	mutex_unlock(&link_ops_mutex);
>> +}
>> +EXPORT_SYMBOL(rdma_link_unregister);
>> +
>> +static const struct rdma_link_ops *link_ops_get(const char *type)
>> +{
>> +	const struct rdma_link_ops *ops;
>> +
>> +	mutex_lock(&link_ops_mutex);
>> +	list_for_each_entry(ops, &link_ops, list) {
>> +		if (!strncmp(ops->type, type, IFNAMSIZ))
> 
> Why strncmp here? Both strings should be null terminated.
> 

ok.

>> +			goto out;
>> +	}
>> +	ops = NULL;
> 
> This should trigger module automatic loading and try again.
> 
> Ie call 
> 
>    request_module("rdma-link-%s", type);
> 
> And rxe should provide that symbol as a module-alias.
> 
> See the similar code in netdev, ie MODULE_ALIAS_RTNL_LINK and the
> request_module calls

Hmm, I'll look into this.

Thanks!
Jason Gunthorpe Sept. 26, 2018, 4:51 p.m. UTC | #9
On Wed, Sep 26, 2018 at 10:44:56AM -0500, Steve Wise wrote:
> 
> 
> On 9/26/2018 3:05 AM, Leon Romanovsky wrote:
> > On Tue, Sep 25, 2018 at 02:11:18PM -0600, Jason Gunthorpe wrote:
> >> On Tue, Sep 25, 2018 at 08:55:29PM +0300, Leon Romanovsky wrote:
> >>> On Thu, Sep 13, 2018 at 12:16:20PM -0700, Steve Wise wrote:
> >>>> Add support for new LINK messages to allow adding and deleting rdma
> >>>> interfaces.  This will be used initially for soft rdma drivers which
> >>>> instantiate device instances dynamically by the admin specifying a netdev
> >>>> device to use.  The rdma_rxe module will be the first user of these
> >>>> messages.
> >>>>
> >>>> The design is modeled after RTNL_NEWLINK/DELLINK:  rdma drivers
> >>>> register with the rdma core if they provide link add/delete functions.
> >>>> Each driver registers with a unique "type" string, that is used to
> >>>> dispatch messages coming from user space.  A new RDMA_NLDEV_ATTR is
> >>>> defined for the "type" string.  User mode will pass 3 attributes in a
> >>>> NEWLINK message: RDMA_NLDEV_ATTR_IBDEV_NAME for the desired rdma device
> >>>> name to be created, RDMA_NLDEV_ATTR_LINK_TYPE for the "type" of link
> >>>> being added, and RDMA_NLDEV_ATTR_NDEV_NAME for the net_device interface
> >>>> to use for this link. The DELLINK message will contain the IBDEV_NAME
> >>>> and LINK_TYPE attributes.
> >>>>
> >>>> Signed-off-by: Steve Wise <swise@opengridcomputing.com>
> >>>
> >>> <...>
> >>>>
> >>>> +	RDMA_NLDEV_CMD_NEWLINK,
> >>>> +
> >>>> +	RDMA_NLDEV_CMD_DELLINK,
> >>>> +
> >>>>  	RDMA_NLDEV_NUM_OPS
> >>>>  };
> >>>>
> >>>> @@ -427,6 +431,11 @@ enum rdma_nldev_attr {
> >>>>  	RDMA_NLDEV_ATTR_DRIVER_U64,		/* u64 */
> >>>>
> >>>>  	/*
> >>>> +	 * Identifies the rdma driver. eg: "rxe" or "siw"
> >>>> +	 */
> >>>> +	RDMA_NLDEV_ATTR_LINK_TYPE,		/* string */
> >>>
> >>> I'll review it more deeply tomorrow, but two things caught my attention:
> >>> 1. You need RDMA_NL_ADMIN_PERM for your netlink commands
> >>> 2. LINK_TYPE should be index and not string, because it is why we are
> >>> using using netlink :)
> >>
> >> netdev uses strings. See IFLA_INFO_KIND
> >>
> >> strings are not bad in netlink
> > 
> > It is not the best example, because IFLA_INFO_KIND is used to forward
> > filtering information and one day we will do the same with various
> > rdmatool dumping options. However in case of ibdevice type, it is
> > declared as number and doesn't seem to me right to convert it to string
> > and limit our KABI to name.
> > 
> 
> Are you suggesting rdmatool uses enum rdma_driver_id from
> rdma/rdma_user_ioctl_cmds.h?  So I add code in rdmatool to map the
> user's string, eg "mlx5" or "rxe" into the enum and send that down?
> 
> That works for me.  If rdma_driver_id is part of the KABI, then I guess
> its fine for rdmatool to use it and map names to enum values...

I think this is a bad idea, we don't want to tie the kernel to iproute
so tightly, flowing the string from the command line is the right way
to handle this, just like netdev does.

Also, remember, the DRIVER_ID specifies the driver *NOT* the link
type. It would be perfectly resonable for a driver like RXE to have
multiple link types using the same user space driver. I can't imagine
a use case for this, but is wrong to mix these two different concepts.

Jason
Jason Gunthorpe Sept. 26, 2018, 4:52 p.m. UTC | #10
On Wed, Sep 26, 2018 at 10:51:00AM -0500, Steve Wise wrote:
> 
> 
> On 9/25/2018 3:16 PM, Jason Gunthorpe wrote:
> > On Thu, Sep 13, 2018 at 12:16:20PM -0700, Steve Wise wrote:
> >> Add support for new LINK messages to allow adding and deleting rdma
> >> interfaces.  This will be used initially for soft rdma drivers which
> >> instantiate device instances dynamically by the admin specifying a netdev
> >> device to use.  The rdma_rxe module will be the first user of these
> >> messages.
> >>
> >> The design is modeled after RTNL_NEWLINK/DELLINK:  rdma drivers
> >> register with the rdma core if they provide link add/delete functions.
> >> Each driver registers with a unique "type" string, that is used to
> >> dispatch messages coming from user space.  A new RDMA_NLDEV_ATTR is
> >> defined for the "type" string.  User mode will pass 3 attributes in a
> >> NEWLINK message: RDMA_NLDEV_ATTR_IBDEV_NAME for the desired rdma device
> >> name to be created, RDMA_NLDEV_ATTR_LINK_TYPE for the "type" of link
> >> being added, and RDMA_NLDEV_ATTR_NDEV_NAME for the net_device interface
> >> to use for this link. The DELLINK message will contain the IBDEV_NAME
> >> and LINK_TYPE attributes.
> >>
> >> Signed-off-by: Steve Wise <swise@opengridcomputing.com>
> >>  drivers/infiniband/core/nldev.c  | 113 +++++++++++++++++++++++++++++++++++++++
> >>  include/rdma/rdma_netlink.h      |  11 ++++
> >>  include/uapi/rdma/rdma_netlink.h |   9 ++++
> >>  3 files changed, 133 insertions(+)
> >>
> >> diff --git a/drivers/infiniband/core/nldev.c b/drivers/infiniband/core/nldev.c
> >> index 0385ab438320..d107b982c210 100644
> >> +++ b/drivers/infiniband/core/nldev.c
> >> @@ -33,6 +33,7 @@
> >>  #include <linux/module.h>
> >>  #include <linux/pid.h>
> >>  #include <linux/pid_namespace.h>
> >> +#include <linux/mutex.h>
> >>  #include <net/netlink.h>
> >>  #include <rdma/rdma_cm.h>
> >>  #include <rdma/rdma_netlink.h>
> >> @@ -107,6 +108,8 @@
> >>  	[RDMA_NLDEV_ATTR_DRIVER_U32]		= { .type = NLA_U32 },
> >>  	[RDMA_NLDEV_ATTR_DRIVER_S64]		= { .type = NLA_S64 },
> >>  	[RDMA_NLDEV_ATTR_DRIVER_U64]		= { .type = NLA_U64 },
> >> +	[RDMA_NLDEV_ATTR_LINK_TYPE]		= { .type = NLA_NUL_STRING,
> >> +						    .len = IFNAMSIZ },
> >>  };
> >>  
> >>  static int put_driver_name_print_type(struct sk_buff *msg, const char *name,
> >> @@ -1072,6 +1075,110 @@ static int nldev_res_get_pd_dumpit(struct sk_buff *skb,
> >>  	return res_get_common_dumpit(skb, cb, RDMA_RESTRACK_PD);
> >>  }
> >>  
> >> +static LIST_HEAD(link_ops);
> >> +static DEFINE_MUTEX(link_ops_mutex);
> >> +
> >> +void rdma_link_register(struct rdma_link_ops *ops)
> >> +{
> >> +	mutex_lock(&link_ops_mutex);
> >> +	list_add(&ops->list, &link_ops);
> > 
> > Make sure the name is unique
> 
> The function name?

No, ops->type

Jason
Leon Romanovsky Sept. 26, 2018, 4:55 p.m. UTC | #11
On Wed, Sep 26, 2018 at 10:44:56AM -0500, Steve Wise wrote:
>
>
> On 9/26/2018 3:05 AM, Leon Romanovsky wrote:
> > On Tue, Sep 25, 2018 at 02:11:18PM -0600, Jason Gunthorpe wrote:
> >> On Tue, Sep 25, 2018 at 08:55:29PM +0300, Leon Romanovsky wrote:
> >>> On Thu, Sep 13, 2018 at 12:16:20PM -0700, Steve Wise wrote:
> >>>> Add support for new LINK messages to allow adding and deleting rdma
> >>>> interfaces.  This will be used initially for soft rdma drivers which
> >>>> instantiate device instances dynamically by the admin specifying a netdev
> >>>> device to use.  The rdma_rxe module will be the first user of these
> >>>> messages.
> >>>>
> >>>> The design is modeled after RTNL_NEWLINK/DELLINK:  rdma drivers
> >>>> register with the rdma core if they provide link add/delete functions.
> >>>> Each driver registers with a unique "type" string, that is used to
> >>>> dispatch messages coming from user space.  A new RDMA_NLDEV_ATTR is
> >>>> defined for the "type" string.  User mode will pass 3 attributes in a
> >>>> NEWLINK message: RDMA_NLDEV_ATTR_IBDEV_NAME for the desired rdma device
> >>>> name to be created, RDMA_NLDEV_ATTR_LINK_TYPE for the "type" of link
> >>>> being added, and RDMA_NLDEV_ATTR_NDEV_NAME for the net_device interface
> >>>> to use for this link. The DELLINK message will contain the IBDEV_NAME
> >>>> and LINK_TYPE attributes.
> >>>>
> >>>> Signed-off-by: Steve Wise <swise@opengridcomputing.com>
> >>>
> >>> <...>
> >>>>
> >>>> +	RDMA_NLDEV_CMD_NEWLINK,
> >>>> +
> >>>> +	RDMA_NLDEV_CMD_DELLINK,
> >>>> +
> >>>>  	RDMA_NLDEV_NUM_OPS
> >>>>  };
> >>>>
> >>>> @@ -427,6 +431,11 @@ enum rdma_nldev_attr {
> >>>>  	RDMA_NLDEV_ATTR_DRIVER_U64,		/* u64 */
> >>>>
> >>>>  	/*
> >>>> +	 * Identifies the rdma driver. eg: "rxe" or "siw"
> >>>> +	 */
> >>>> +	RDMA_NLDEV_ATTR_LINK_TYPE,		/* string */
> >>>
> >>> I'll review it more deeply tomorrow, but two things caught my attention:
> >>> 1. You need RDMA_NL_ADMIN_PERM for your netlink commands
> >>> 2. LINK_TYPE should be index and not string, because it is why we are
> >>> using using netlink :)
> >>
> >> netdev uses strings. See IFLA_INFO_KIND
> >>
> >> strings are not bad in netlink
> >
> > It is not the best example, because IFLA_INFO_KIND is used to forward
> > filtering information and one day we will do the same with various
> > rdmatool dumping options. However in case of ibdevice type, it is
> > declared as number and doesn't seem to me right to convert it to string
> > and limit our KABI to name.
> >
>
> Are you suggesting rdmatool uses enum rdma_driver_id from
> rdma/rdma_user_ioctl_cmds.h?  So I add code in rdmatool to map the
> user's string, eg "mlx5" or "rxe" into the enum and send that down?
>
> That works for me.  If rdma_driver_id is part of the KABI, then I guess
> its fine for rdmatool to use it and map names to enum values...

I think so, for me it sounds like a good thing. I'll reuse this enum
rdma_driver_id in rdma-core later on.

Thanks

>
> Steve.
>
Leon Romanovsky Sept. 26, 2018, 4:57 p.m. UTC | #12
On Wed, Sep 26, 2018 at 10:51:46AM -0600, Jason Gunthorpe wrote:
> On Wed, Sep 26, 2018 at 10:44:56AM -0500, Steve Wise wrote:
> >
> >
> > On 9/26/2018 3:05 AM, Leon Romanovsky wrote:
> > > On Tue, Sep 25, 2018 at 02:11:18PM -0600, Jason Gunthorpe wrote:
> > >> On Tue, Sep 25, 2018 at 08:55:29PM +0300, Leon Romanovsky wrote:
> > >>> On Thu, Sep 13, 2018 at 12:16:20PM -0700, Steve Wise wrote:
> > >>>> Add support for new LINK messages to allow adding and deleting rdma
> > >>>> interfaces.  This will be used initially for soft rdma drivers which
> > >>>> instantiate device instances dynamically by the admin specifying a netdev
> > >>>> device to use.  The rdma_rxe module will be the first user of these
> > >>>> messages.
> > >>>>
> > >>>> The design is modeled after RTNL_NEWLINK/DELLINK:  rdma drivers
> > >>>> register with the rdma core if they provide link add/delete functions.
> > >>>> Each driver registers with a unique "type" string, that is used to
> > >>>> dispatch messages coming from user space.  A new RDMA_NLDEV_ATTR is
> > >>>> defined for the "type" string.  User mode will pass 3 attributes in a
> > >>>> NEWLINK message: RDMA_NLDEV_ATTR_IBDEV_NAME for the desired rdma device
> > >>>> name to be created, RDMA_NLDEV_ATTR_LINK_TYPE for the "type" of link
> > >>>> being added, and RDMA_NLDEV_ATTR_NDEV_NAME for the net_device interface
> > >>>> to use for this link. The DELLINK message will contain the IBDEV_NAME
> > >>>> and LINK_TYPE attributes.
> > >>>>
> > >>>> Signed-off-by: Steve Wise <swise@opengridcomputing.com>
> > >>>
> > >>> <...>
> > >>>>
> > >>>> +	RDMA_NLDEV_CMD_NEWLINK,
> > >>>> +
> > >>>> +	RDMA_NLDEV_CMD_DELLINK,
> > >>>> +
> > >>>>  	RDMA_NLDEV_NUM_OPS
> > >>>>  };
> > >>>>
> > >>>> @@ -427,6 +431,11 @@ enum rdma_nldev_attr {
> > >>>>  	RDMA_NLDEV_ATTR_DRIVER_U64,		/* u64 */
> > >>>>
> > >>>>  	/*
> > >>>> +	 * Identifies the rdma driver. eg: "rxe" or "siw"
> > >>>> +	 */
> > >>>> +	RDMA_NLDEV_ATTR_LINK_TYPE,		/* string */
> > >>>
> > >>> I'll review it more deeply tomorrow, but two things caught my attention:
> > >>> 1. You need RDMA_NL_ADMIN_PERM for your netlink commands
> > >>> 2. LINK_TYPE should be index and not string, because it is why we are
> > >>> using using netlink :)
> > >>
> > >> netdev uses strings. See IFLA_INFO_KIND
> > >>
> > >> strings are not bad in netlink
> > >
> > > It is not the best example, because IFLA_INFO_KIND is used to forward
> > > filtering information and one day we will do the same with various
> > > rdmatool dumping options. However in case of ibdevice type, it is
> > > declared as number and doesn't seem to me right to convert it to string
> > > and limit our KABI to name.
> > >
> >
> > Are you suggesting rdmatool uses enum rdma_driver_id from
> > rdma/rdma_user_ioctl_cmds.h?  So I add code in rdmatool to map the
> > user's string, eg "mlx5" or "rxe" into the enum and send that down?
> >
> > That works for me.  If rdma_driver_id is part of the KABI, then I guess
> > its fine for rdmatool to use it and map names to enum values...
>
> I think this is a bad idea, we don't want to tie the kernel to iproute
> so tightly, flowing the string from the command line is the right way
> to handle this, just like netdev does.

Sorry, but netdev uses strings for filtering and not for types.

>
> Also, remember, the DRIVER_ID specifies the driver *NOT* the link
> type. It would be perfectly resonable for a driver like RXE to have
> multiple link types using the same user space driver. I can't imagine
> a use case for this, but is wrong to mix these two different concepts.
>
> Jason
Jason Gunthorpe Sept. 26, 2018, 5:09 p.m. UTC | #13
On Wed, Sep 26, 2018 at 07:57:26PM +0300, Leon Romanovsky wrote:
> On Wed, Sep 26, 2018 at 10:51:46AM -0600, Jason Gunthorpe wrote:
> > On Wed, Sep 26, 2018 at 10:44:56AM -0500, Steve Wise wrote:
> > >
> > >
> > > On 9/26/2018 3:05 AM, Leon Romanovsky wrote:
> > > > On Tue, Sep 25, 2018 at 02:11:18PM -0600, Jason Gunthorpe wrote:
> > > >> On Tue, Sep 25, 2018 at 08:55:29PM +0300, Leon Romanovsky wrote:
> > > >>> On Thu, Sep 13, 2018 at 12:16:20PM -0700, Steve Wise wrote:
> > > >>>> Add support for new LINK messages to allow adding and deleting rdma
> > > >>>> interfaces.  This will be used initially for soft rdma drivers which
> > > >>>> instantiate device instances dynamically by the admin specifying a netdev
> > > >>>> device to use.  The rdma_rxe module will be the first user of these
> > > >>>> messages.
> > > >>>>
> > > >>>> The design is modeled after RTNL_NEWLINK/DELLINK:  rdma drivers
> > > >>>> register with the rdma core if they provide link add/delete functions.
> > > >>>> Each driver registers with a unique "type" string, that is used to
> > > >>>> dispatch messages coming from user space.  A new RDMA_NLDEV_ATTR is
> > > >>>> defined for the "type" string.  User mode will pass 3 attributes in a
> > > >>>> NEWLINK message: RDMA_NLDEV_ATTR_IBDEV_NAME for the desired rdma device
> > > >>>> name to be created, RDMA_NLDEV_ATTR_LINK_TYPE for the "type" of link
> > > >>>> being added, and RDMA_NLDEV_ATTR_NDEV_NAME for the net_device interface
> > > >>>> to use for this link. The DELLINK message will contain the IBDEV_NAME
> > > >>>> and LINK_TYPE attributes.
> > > >>>>
> > > >>>> Signed-off-by: Steve Wise <swise@opengridcomputing.com>
> > > >>>
> > > >>> <...>
> > > >>>>
> > > >>>> +	RDMA_NLDEV_CMD_NEWLINK,
> > > >>>> +
> > > >>>> +	RDMA_NLDEV_CMD_DELLINK,
> > > >>>> +
> > > >>>>  	RDMA_NLDEV_NUM_OPS
> > > >>>>  };
> > > >>>>
> > > >>>> @@ -427,6 +431,11 @@ enum rdma_nldev_attr {
> > > >>>>  	RDMA_NLDEV_ATTR_DRIVER_U64,		/* u64 */
> > > >>>>
> > > >>>>  	/*
> > > >>>> +	 * Identifies the rdma driver. eg: "rxe" or "siw"
> > > >>>> +	 */
> > > >>>> +	RDMA_NLDEV_ATTR_LINK_TYPE,		/* string */
> > > >>>
> > > >>> I'll review it more deeply tomorrow, but two things caught my attention:
> > > >>> 1. You need RDMA_NL_ADMIN_PERM for your netlink commands
> > > >>> 2. LINK_TYPE should be index and not string, because it is why we are
> > > >>> using using netlink :)
> > > >>
> > > >> netdev uses strings. See IFLA_INFO_KIND
> > > >>
> > > >> strings are not bad in netlink
> > > >
> > > > It is not the best example, because IFLA_INFO_KIND is used to forward
> > > > filtering information and one day we will do the same with various
> > > > rdmatool dumping options. However in case of ibdevice type, it is
> > > > declared as number and doesn't seem to me right to convert it to string
> > > > and limit our KABI to name.
> > > >
> > >
> > > Are you suggesting rdmatool uses enum rdma_driver_id from
> > > rdma/rdma_user_ioctl_cmds.h?  So I add code in rdmatool to map the
> > > user's string, eg "mlx5" or "rxe" into the enum and send that down?
> > >
> > > That works for me.  If rdma_driver_id is part of the KABI, then I guess
> > > its fine for rdmatool to use it and map names to enum values...
> >
> > I think this is a bad idea, we don't want to tie the kernel to iproute
> > so tightly, flowing the string from the command line is the right way
> > to handle this, just like netdev does.
> 
> Sorry, but netdev uses strings for filtering and not for types.

I showed you, it is using a string for NEWLINK, which is what we are
copying.

Jason
Leon Romanovsky Sept. 26, 2018, 5:21 p.m. UTC | #14
On Wed, Sep 26, 2018 at 11:09:53AM -0600, Jason Gunthorpe wrote:
> On Wed, Sep 26, 2018 at 07:57:26PM +0300, Leon Romanovsky wrote:
> > On Wed, Sep 26, 2018 at 10:51:46AM -0600, Jason Gunthorpe wrote:
> > > On Wed, Sep 26, 2018 at 10:44:56AM -0500, Steve Wise wrote:
> > > >
> > > >
> > > > On 9/26/2018 3:05 AM, Leon Romanovsky wrote:
> > > > > On Tue, Sep 25, 2018 at 02:11:18PM -0600, Jason Gunthorpe wrote:
> > > > >> On Tue, Sep 25, 2018 at 08:55:29PM +0300, Leon Romanovsky wrote:
> > > > >>> On Thu, Sep 13, 2018 at 12:16:20PM -0700, Steve Wise wrote:
> > > > >>>> Add support for new LINK messages to allow adding and deleting rdma
> > > > >>>> interfaces.  This will be used initially for soft rdma drivers which
> > > > >>>> instantiate device instances dynamically by the admin specifying a netdev
> > > > >>>> device to use.  The rdma_rxe module will be the first user of these
> > > > >>>> messages.
> > > > >>>>
> > > > >>>> The design is modeled after RTNL_NEWLINK/DELLINK:  rdma drivers
> > > > >>>> register with the rdma core if they provide link add/delete functions.
> > > > >>>> Each driver registers with a unique "type" string, that is used to
> > > > >>>> dispatch messages coming from user space.  A new RDMA_NLDEV_ATTR is
> > > > >>>> defined for the "type" string.  User mode will pass 3 attributes in a
> > > > >>>> NEWLINK message: RDMA_NLDEV_ATTR_IBDEV_NAME for the desired rdma device
> > > > >>>> name to be created, RDMA_NLDEV_ATTR_LINK_TYPE for the "type" of link
> > > > >>>> being added, and RDMA_NLDEV_ATTR_NDEV_NAME for the net_device interface
> > > > >>>> to use for this link. The DELLINK message will contain the IBDEV_NAME
> > > > >>>> and LINK_TYPE attributes.
> > > > >>>>
> > > > >>>> Signed-off-by: Steve Wise <swise@opengridcomputing.com>
> > > > >>>
> > > > >>> <...>
> > > > >>>>
> > > > >>>> +	RDMA_NLDEV_CMD_NEWLINK,
> > > > >>>> +
> > > > >>>> +	RDMA_NLDEV_CMD_DELLINK,
> > > > >>>> +
> > > > >>>>  	RDMA_NLDEV_NUM_OPS
> > > > >>>>  };
> > > > >>>>
> > > > >>>> @@ -427,6 +431,11 @@ enum rdma_nldev_attr {
> > > > >>>>  	RDMA_NLDEV_ATTR_DRIVER_U64,		/* u64 */
> > > > >>>>
> > > > >>>>  	/*
> > > > >>>> +	 * Identifies the rdma driver. eg: "rxe" or "siw"
> > > > >>>> +	 */
> > > > >>>> +	RDMA_NLDEV_ATTR_LINK_TYPE,		/* string */
> > > > >>>
> > > > >>> I'll review it more deeply tomorrow, but two things caught my attention:
> > > > >>> 1. You need RDMA_NL_ADMIN_PERM for your netlink commands
> > > > >>> 2. LINK_TYPE should be index and not string, because it is why we are
> > > > >>> using using netlink :)
> > > > >>
> > > > >> netdev uses strings. See IFLA_INFO_KIND
> > > > >>
> > > > >> strings are not bad in netlink
> > > > >
> > > > > It is not the best example, because IFLA_INFO_KIND is used to forward
> > > > > filtering information and one day we will do the same with various
> > > > > rdmatool dumping options. However in case of ibdevice type, it is
> > > > > declared as number and doesn't seem to me right to convert it to string
> > > > > and limit our KABI to name.
> > > > >
> > > >
> > > > Are you suggesting rdmatool uses enum rdma_driver_id from
> > > > rdma/rdma_user_ioctl_cmds.h?  So I add code in rdmatool to map the
> > > > user's string, eg "mlx5" or "rxe" into the enum and send that down?
> > > >
> > > > That works for me.  If rdma_driver_id is part of the KABI, then I guess
> > > > its fine for rdmatool to use it and map names to enum values...
> > >
> > > I think this is a bad idea, we don't want to tie the kernel to iproute
> > > so tightly, flowing the string from the command line is the right way
> > > to handle this, just like netdev does.
> >
> > Sorry, but netdev uses strings for filtering and not for types.
>
> I showed you, it is using a string for NEWLINK, which is what we are
> copying.

You showed me IFLA_INFO_KIND, which was introduce to filter link dumps.
dc599f76c22b ("net: Add support for filtering link dump by master device and kind")

It is not what we are doing here.

Thanks

>
> Jason
Leon Romanovsky Sept. 26, 2018, 5:34 p.m. UTC | #15
On Wed, Sep 26, 2018 at 08:21:16PM +0300, Leon Romanovsky wrote:
> On Wed, Sep 26, 2018 at 11:09:53AM -0600, Jason Gunthorpe wrote:
> > On Wed, Sep 26, 2018 at 07:57:26PM +0300, Leon Romanovsky wrote:
> > > On Wed, Sep 26, 2018 at 10:51:46AM -0600, Jason Gunthorpe wrote:
> > > > On Wed, Sep 26, 2018 at 10:44:56AM -0500, Steve Wise wrote:
> > > > >
> > > > >
> > > > > On 9/26/2018 3:05 AM, Leon Romanovsky wrote:
> > > > > > On Tue, Sep 25, 2018 at 02:11:18PM -0600, Jason Gunthorpe wrote:
> > > > > >> On Tue, Sep 25, 2018 at 08:55:29PM +0300, Leon Romanovsky wrote:
> > > > > >>> On Thu, Sep 13, 2018 at 12:16:20PM -0700, Steve Wise wrote:
> > > > > >>>> Add support for new LINK messages to allow adding and deleting rdma
> > > > > >>>> interfaces.  This will be used initially for soft rdma drivers which
> > > > > >>>> instantiate device instances dynamically by the admin specifying a netdev
> > > > > >>>> device to use.  The rdma_rxe module will be the first user of these
> > > > > >>>> messages.
> > > > > >>>>
> > > > > >>>> The design is modeled after RTNL_NEWLINK/DELLINK:  rdma drivers
> > > > > >>>> register with the rdma core if they provide link add/delete functions.
> > > > > >>>> Each driver registers with a unique "type" string, that is used to
> > > > > >>>> dispatch messages coming from user space.  A new RDMA_NLDEV_ATTR is
> > > > > >>>> defined for the "type" string.  User mode will pass 3 attributes in a
> > > > > >>>> NEWLINK message: RDMA_NLDEV_ATTR_IBDEV_NAME for the desired rdma device
> > > > > >>>> name to be created, RDMA_NLDEV_ATTR_LINK_TYPE for the "type" of link
> > > > > >>>> being added, and RDMA_NLDEV_ATTR_NDEV_NAME for the net_device interface
> > > > > >>>> to use for this link. The DELLINK message will contain the IBDEV_NAME
> > > > > >>>> and LINK_TYPE attributes.
> > > > > >>>>
> > > > > >>>> Signed-off-by: Steve Wise <swise@opengridcomputing.com>
> > > > > >>>
> > > > > >>> <...>
> > > > > >>>>
> > > > > >>>> +	RDMA_NLDEV_CMD_NEWLINK,
> > > > > >>>> +
> > > > > >>>> +	RDMA_NLDEV_CMD_DELLINK,
> > > > > >>>> +
> > > > > >>>>  	RDMA_NLDEV_NUM_OPS
> > > > > >>>>  };
> > > > > >>>>
> > > > > >>>> @@ -427,6 +431,11 @@ enum rdma_nldev_attr {
> > > > > >>>>  	RDMA_NLDEV_ATTR_DRIVER_U64,		/* u64 */
> > > > > >>>>
> > > > > >>>>  	/*
> > > > > >>>> +	 * Identifies the rdma driver. eg: "rxe" or "siw"
> > > > > >>>> +	 */
> > > > > >>>> +	RDMA_NLDEV_ATTR_LINK_TYPE,		/* string */
> > > > > >>>
> > > > > >>> I'll review it more deeply tomorrow, but two things caught my attention:
> > > > > >>> 1. You need RDMA_NL_ADMIN_PERM for your netlink commands
> > > > > >>> 2. LINK_TYPE should be index and not string, because it is why we are
> > > > > >>> using using netlink :)
> > > > > >>
> > > > > >> netdev uses strings. See IFLA_INFO_KIND
> > > > > >>
> > > > > >> strings are not bad in netlink
> > > > > >
> > > > > > It is not the best example, because IFLA_INFO_KIND is used to forward
> > > > > > filtering information and one day we will do the same with various
> > > > > > rdmatool dumping options. However in case of ibdevice type, it is
> > > > > > declared as number and doesn't seem to me right to convert it to string
> > > > > > and limit our KABI to name.
> > > > > >
> > > > >
> > > > > Are you suggesting rdmatool uses enum rdma_driver_id from
> > > > > rdma/rdma_user_ioctl_cmds.h?  So I add code in rdmatool to map the
> > > > > user's string, eg "mlx5" or "rxe" into the enum and send that down?
> > > > >
> > > > > That works for me.  If rdma_driver_id is part of the KABI, then I guess
> > > > > its fine for rdmatool to use it and map names to enum values...
> > > >
> > > > I think this is a bad idea, we don't want to tie the kernel to iproute
> > > > so tightly, flowing the string from the command line is the right way
> > > > to handle this, just like netdev does.
> > >
> > > Sorry, but netdev uses strings for filtering and not for types.
> >
> > I showed you, it is using a string for NEWLINK, which is what we are
> > copying.
>
> You showed me IFLA_INFO_KIND, which was introduce to filter link dumps.
> dc599f76c22b ("net: Add support for filtering link dump by master device and kind")
>
> It is not what we are doing here.

More on that, in netdev, they are not creating any KABI dependencies on
the input provided by IFLA_INFO_KIND, however, you are suggesting that
rdmatool will be forever bounded to the kernel by specific names rxe, siw
e.t.c.

Thanks

>
> Thanks
>
> >
> > Jason
Jason Gunthorpe Sept. 26, 2018, 5:43 p.m. UTC | #16
On Wed, Sep 26, 2018 at 08:21:16PM +0300, Leon Romanovsky wrote:

> > I showed you, it is using a string for NEWLINK, which is what we are
> > copying.
> 
> You showed me IFLA_INFO_KIND, which was introduce to filter link dumps.
> dc599f76c22b ("net: Add support for filtering link dump by master device and kind")

No, that commit is just adding some support to dump_ifinfo.

The IFLA_INFO_KIND has existed since day 1 of NEWLINK and is used to
lookup the link driver:

static int rtnl_newlink(struct sk_buff *skb, struct nlmsghdr *nlh,
                        struct netlink_ext_ack *extack)
{
        if (linkinfo[IFLA_INFO_KIND]) {
                nla_strlcpy(kind, linkinfo[IFLA_INFO_KIND], sizeof(kind));
                ops = rtnl_link_ops_get(kind);

Which is exactly the pattern we need to follow.

Jason
Jason Gunthorpe Sept. 26, 2018, 5:47 p.m. UTC | #17
On Wed, Sep 26, 2018 at 08:34:44PM +0300, Leon Romanovsky wrote:

> > You showed me IFLA_INFO_KIND, which was introduce to filter link dumps.
> > dc599f76c22b ("net: Add support for filtering link dump by master device and kind")
> >
> > It is not what we are doing here.
> 
> More on that, in netdev, they are not creating any KABI dependencies on
> the input provided by IFLA_INFO_KIND, however, you are suggesting that
> rdmatool will be forever bounded to the kernel by specific names rxe, siw
> e.t.c.

What are you talking about? netdev uses strings.

When you do 

  ip link add link ib0 ipoib

It passes string 'ipoib' in a IFLA_INFO_LIND attribute of a
RTM_NEWLINK message. The kernel calls rtnl_link_ops_get("ipoib") and
matches it to

static struct rtnl_link_ops ipoib_link_ops __read_mostly = {
        .kind           = "ipoib",

And that kind name string is forever part of the kernel ABI and part
of the netlink command interface.

They don't use for this numbers, that is just pointless extra work to
marshall.

Jason
Steve Wise Sept. 26, 2018, 10:05 p.m. UTC | #18
On 9/26/2018 3:24 AM, Leon Romanovsky wrote:
> On Thu, Sep 13, 2018 at 12:16:20PM -0700, Steve Wise wrote:
>> Add support for new LINK messages to allow adding and deleting rdma
>> interfaces.  This will be used initially for soft rdma drivers which
>> instantiate device instances dynamically by the admin specifying a netdev
>> device to use.  The rdma_rxe module will be the first user of these
>> messages.
>>
>> The design is modeled after RTNL_NEWLINK/DELLINK:  rdma drivers
>> register with the rdma core if they provide link add/delete functions.
>> Each driver registers with a unique "type" string, that is used to
>> dispatch messages coming from user space.  A new RDMA_NLDEV_ATTR is
>> defined for the "type" string.  User mode will pass 3 attributes in a
>> NEWLINK message: RDMA_NLDEV_ATTR_IBDEV_NAME for the desired rdma device
>> name to be created, RDMA_NLDEV_ATTR_LINK_TYPE for the "type" of link
>> being added, and RDMA_NLDEV_ATTR_NDEV_NAME for the net_device interface
>> to use for this link. The DELLINK message will contain the IBDEV_NAME
>> and LINK_TYPE attributes.
>>
>> Signed-off-by: Steve Wise <swise@opengridcomputing.com>
>> ---
>>  drivers/infiniband/core/nldev.c  | 113 +++++++++++++++++++++++++++++++++++++++
>>  include/rdma/rdma_netlink.h      |  11 ++++
>>  include/uapi/rdma/rdma_netlink.h |   9 ++++
>>  3 files changed, 133 insertions(+)
>>
>> diff --git a/drivers/infiniband/core/nldev.c b/drivers/infiniband/core/nldev.c
>> index 0385ab438320..d107b982c210 100644
>> --- a/drivers/infiniband/core/nldev.c
>> +++ b/drivers/infiniband/core/nldev.c
>> @@ -33,6 +33,7 @@
>>  #include <linux/module.h>
>>  #include <linux/pid.h>
>>  #include <linux/pid_namespace.h>
>> +#include <linux/mutex.h>
>>  #include <net/netlink.h>
>>  #include <rdma/rdma_cm.h>
>>  #include <rdma/rdma_netlink.h>
>> @@ -107,6 +108,8 @@
>>  	[RDMA_NLDEV_ATTR_DRIVER_U32]		= { .type = NLA_U32 },
>>  	[RDMA_NLDEV_ATTR_DRIVER_S64]		= { .type = NLA_S64 },
>>  	[RDMA_NLDEV_ATTR_DRIVER_U64]		= { .type = NLA_U64 },
>> +	[RDMA_NLDEV_ATTR_LINK_TYPE]		= { .type = NLA_NUL_STRING,
>> +						    .len = IFNAMSIZ },
> 
> In case, we are continuing this path, better to use our define RDMA_NLDEV_ATTR_ENTRY_STRLEN.
> 
>>  };
>>
>>  static int put_driver_name_print_type(struct sk_buff *msg, const char *name,
>> @@ -1072,6 +1075,110 @@ static int nldev_res_get_pd_dumpit(struct sk_buff *skb,
>>  	return res_get_common_dumpit(skb, cb, RDMA_RESTRACK_PD);
>>  }
>>
>> +static LIST_HEAD(link_ops);
>> +static DEFINE_MUTEX(link_ops_mutex);
>> +
>> +void rdma_link_register(struct rdma_link_ops *ops)
>> +{
>> +	mutex_lock(&link_ops_mutex);
>> +	list_add(&ops->list, &link_ops);
>> +	mutex_unlock(&link_ops_mutex);
>> +}
>> +EXPORT_SYMBOL(rdma_link_register);
>> +
>> +void rdma_link_unregister(struct rdma_link_ops *ops)
>> +{
>> +	mutex_lock(&link_ops_mutex);
>> +	list_del_init(&ops->list);
> 
> Why did you use list_del_init and not list_del?
> 
> And in case we will use device_type, we won't need manage list at all :)
> 
>> +	mutex_unlock(&link_ops_mutex);
>> +}
>> +EXPORT_SYMBOL(rdma_link_unregister);
>> +
>> +static const struct rdma_link_ops *link_ops_get(const char *type)
>> +{
>> +	const struct rdma_link_ops *ops;
>> +
>> +	mutex_lock(&link_ops_mutex);
>> +	list_for_each_entry(ops, &link_ops, list) {
>> +		if (!strncmp(ops->type, type, IFNAMSIZ))
>> +			goto out;
>> +	}
>> +	ops = NULL;
>> +out:
>> +	mutex_unlock(&link_ops_mutex);
>> +	return ops;
>> +}
>> +
>> +static int nldev_newlink(struct sk_buff *skb, struct nlmsghdr *nlh,
>> +			  struct netlink_ext_ack *extack)
>> +{
>> +	struct nlattr *tb[RDMA_NLDEV_ATTR_MAX];
>> +	char ibdev_name[IB_DEVICE_NAME_MAX];
>> +	const struct rdma_link_ops *ops;
>> +	char ndev_name[IFNAMSIZ];
>> +	char type[IFNAMSIZ];
>> +	int err;
>> +
>> +	err = nlmsg_parse(nlh, 0, tb, RDMA_NLDEV_ATTR_MAX - 1,
>> +			  nldev_policy, extack);
>> +	if (err || !tb[RDMA_NLDEV_ATTR_DEV_NAME] ||
>> +	    !tb[RDMA_NLDEV_ATTR_LINK_TYPE] || !tb[RDMA_NLDEV_ATTR_NDEV_NAME]) {
>> +		err = -EINVAL;
>> +		goto err_out;
>> +	}
>> +
>> +	nla_strlcpy(ibdev_name, tb[RDMA_NLDEV_ATTR_DEV_NAME],
>> +		    sizeof(ibdev_name));
> 
> Query from userspace should be done RDMA_NLDEV_ATTR_DEV_INDEX, because
> it is our handle and not device name.

I don't understand.  There is no ib_device yet, so there is no device
index yet.

> 
>> +	nla_strlcpy(type, tb[RDMA_NLDEV_ATTR_LINK_TYPE], sizeof(type));
> 
> Will it mean that type should be unique?

The type will have to be unique; only one driver can register link ops
for a given type string.

Steve
diff mbox series

Patch

diff --git a/drivers/infiniband/core/nldev.c b/drivers/infiniband/core/nldev.c
index 0385ab438320..d107b982c210 100644
--- a/drivers/infiniband/core/nldev.c
+++ b/drivers/infiniband/core/nldev.c
@@ -33,6 +33,7 @@ 
 #include <linux/module.h>
 #include <linux/pid.h>
 #include <linux/pid_namespace.h>
+#include <linux/mutex.h>
 #include <net/netlink.h>
 #include <rdma/rdma_cm.h>
 #include <rdma/rdma_netlink.h>
@@ -107,6 +108,8 @@ 
 	[RDMA_NLDEV_ATTR_DRIVER_U32]		= { .type = NLA_U32 },
 	[RDMA_NLDEV_ATTR_DRIVER_S64]		= { .type = NLA_S64 },
 	[RDMA_NLDEV_ATTR_DRIVER_U64]		= { .type = NLA_U64 },
+	[RDMA_NLDEV_ATTR_LINK_TYPE]		= { .type = NLA_NUL_STRING,
+						    .len = IFNAMSIZ },
 };
 
 static int put_driver_name_print_type(struct sk_buff *msg, const char *name,
@@ -1072,6 +1075,110 @@  static int nldev_res_get_pd_dumpit(struct sk_buff *skb,
 	return res_get_common_dumpit(skb, cb, RDMA_RESTRACK_PD);
 }
 
+static LIST_HEAD(link_ops);
+static DEFINE_MUTEX(link_ops_mutex);
+
+void rdma_link_register(struct rdma_link_ops *ops)
+{
+	mutex_lock(&link_ops_mutex);
+	list_add(&ops->list, &link_ops);
+	mutex_unlock(&link_ops_mutex);
+}
+EXPORT_SYMBOL(rdma_link_register);
+
+void rdma_link_unregister(struct rdma_link_ops *ops)
+{
+	mutex_lock(&link_ops_mutex);
+	list_del_init(&ops->list);
+	mutex_unlock(&link_ops_mutex);
+}
+EXPORT_SYMBOL(rdma_link_unregister);
+
+static const struct rdma_link_ops *link_ops_get(const char *type)
+{
+	const struct rdma_link_ops *ops;
+
+	mutex_lock(&link_ops_mutex);
+	list_for_each_entry(ops, &link_ops, list) {
+		if (!strncmp(ops->type, type, IFNAMSIZ))
+			goto out;
+	}
+	ops = NULL;
+out:
+	mutex_unlock(&link_ops_mutex);
+	return ops;
+}
+
+static int nldev_newlink(struct sk_buff *skb, struct nlmsghdr *nlh,
+			  struct netlink_ext_ack *extack)
+{
+	struct nlattr *tb[RDMA_NLDEV_ATTR_MAX];
+	char ibdev_name[IB_DEVICE_NAME_MAX];
+	const struct rdma_link_ops *ops;
+	char ndev_name[IFNAMSIZ];
+	char type[IFNAMSIZ];
+	int err;
+
+	err = nlmsg_parse(nlh, 0, tb, RDMA_NLDEV_ATTR_MAX - 1,
+			  nldev_policy, extack);
+	if (err || !tb[RDMA_NLDEV_ATTR_DEV_NAME] ||
+	    !tb[RDMA_NLDEV_ATTR_LINK_TYPE] || !tb[RDMA_NLDEV_ATTR_NDEV_NAME]) {
+		err = -EINVAL;
+		goto err_out;
+	}
+
+	nla_strlcpy(ibdev_name, tb[RDMA_NLDEV_ATTR_DEV_NAME],
+		    sizeof(ibdev_name));
+	nla_strlcpy(type, tb[RDMA_NLDEV_ATTR_LINK_TYPE], sizeof(type));
+	nla_strlcpy(ndev_name, tb[RDMA_NLDEV_ATTR_NDEV_NAME],
+		    sizeof(ndev_name));
+	pr_debug("ibdev_name |%s| type |%s| ndev_name |%s|\n", ibdev_name,
+		 type, ndev_name);
+
+	ops = link_ops_get(type);
+	if (!ops) {
+		err = -ENODEV;
+		goto err_out;
+	}
+	
+	err = ops->newlink(ibdev_name, ndev_name);
+err_out:
+	return err;
+}
+
+static int nldev_dellink(struct sk_buff *skb, struct nlmsghdr *nlh,
+			  struct netlink_ext_ack *extack)
+{
+	struct nlattr *tb[RDMA_NLDEV_ATTR_MAX];
+	char ibdev_name[IB_DEVICE_NAME_MAX];
+	const struct rdma_link_ops *ops;
+	char type[IFNAMSIZ];
+	int err;
+
+	err = nlmsg_parse(nlh, 0, tb, RDMA_NLDEV_ATTR_MAX - 1,
+			  nldev_policy, extack);
+	if (err || !tb[RDMA_NLDEV_ATTR_DEV_NAME] ||
+	    !tb[RDMA_NLDEV_ATTR_LINK_TYPE]) {
+		err = !err ? -EINVAL : err;
+		goto err_out;
+	}
+
+	nla_strlcpy(ibdev_name, tb[RDMA_NLDEV_ATTR_DEV_NAME],
+		    sizeof(ibdev_name));
+	nla_strlcpy(type, tb[RDMA_NLDEV_ATTR_LINK_TYPE], sizeof(type));
+	pr_debug("ibdev_name |%s| type |%s|\n", ibdev_name, type);
+
+	ops = link_ops_get(type);
+	if (!ops) {
+		err = -ENODEV;
+		goto err_out;
+	}
+	
+	err = ops->dellink(ibdev_name);
+err_out:
+	return err;
+}
+
 static const struct rdma_nl_cbs nldev_cb_table[RDMA_NLDEV_NUM_OPS] = {
 	[RDMA_NLDEV_CMD_GET] = {
 		.doit = nldev_get_doit,
@@ -1110,6 +1217,12 @@  static int nldev_res_get_pd_dumpit(struct sk_buff *skb,
 	[RDMA_NLDEV_CMD_RES_PD_GET] = {
 		.dump = nldev_res_get_pd_dumpit,
 	},
+	[RDMA_NLDEV_CMD_NEWLINK] = {
+		.doit = nldev_newlink,
+	},
+	[RDMA_NLDEV_CMD_DELLINK] = {
+		.doit = nldev_dellink,
+	},
 };
 
 void __init nldev_init(void)
diff --git a/include/rdma/rdma_netlink.h b/include/rdma/rdma_netlink.h
index c369703fcd69..b5a5a1a7cddc 100644
--- a/include/rdma/rdma_netlink.h
+++ b/include/rdma/rdma_netlink.h
@@ -99,4 +99,15 @@  int ibnl_put_attr(struct sk_buff *skb, struct nlmsghdr *nlh,
  * Returns 0 on success or a negative for no listeners.
  */
 int rdma_nl_chk_listeners(unsigned int group);
+
+struct rdma_link_ops {
+	struct list_head list;
+	const char *type;
+	int (*newlink)(char *ibdev_name, char *ndev_name);
+	int (*dellink)(char *ibdev_name);
+};
+
+void rdma_link_register(struct rdma_link_ops *ops);
+void rdma_link_unregister(struct rdma_link_ops *ops);
+
 #endif /* _RDMA_NETLINK_H */
diff --git a/include/uapi/rdma/rdma_netlink.h b/include/uapi/rdma/rdma_netlink.h
index edba6351ac13..025172d41fa7 100644
--- a/include/uapi/rdma/rdma_netlink.h
+++ b/include/uapi/rdma/rdma_netlink.h
@@ -246,6 +246,10 @@  enum rdma_nldev_command {
 
 	RDMA_NLDEV_CMD_RES_PD_GET, /* can dump */
 
+	RDMA_NLDEV_CMD_NEWLINK,
+
+	RDMA_NLDEV_CMD_DELLINK,
+
 	RDMA_NLDEV_NUM_OPS
 };
 
@@ -427,6 +431,11 @@  enum rdma_nldev_attr {
 	RDMA_NLDEV_ATTR_DRIVER_U64,		/* u64 */
 
 	/*
+	 * Identifies the rdma driver. eg: "rxe" or "siw"
+	 */
+	RDMA_NLDEV_ATTR_LINK_TYPE,		/* string */
+
+	/*
 	 * Always the end
 	 */
 	RDMA_NLDEV_ATTR_MAX