From patchwork Wed Dec 5 15:14:15 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Steve Wise X-Patchwork-Id: 10714745 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id ECCE817D5 for ; Wed, 5 Dec 2018 18:51:42 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id DF85C2E1CA for ; Wed, 5 Dec 2018 18:51:42 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id DDCDA2E1E6; Wed, 5 Dec 2018 18:51:42 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-6.3 required=2.0 tests=BAYES_00,DATE_IN_PAST_03_06, MAILING_LIST_MULTI,RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 6C4FB2E1CA for ; Wed, 5 Dec 2018 18:51:42 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727763AbeLESvm (ORCPT ); Wed, 5 Dec 2018 13:51:42 -0500 Received: from opengridcomputing.com ([72.48.214.68]:51386 "EHLO smtp.opengridcomputing.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727297AbeLESvl (ORCPT ); Wed, 5 Dec 2018 13:51:41 -0500 Received: by smtp.opengridcomputing.com (Postfix, from userid 503) id B88FF227A0; Wed, 5 Dec 2018 12:51:40 -0600 (CST) Message-Id: <6d6a4bb000730db4fcd8e54e9c7a1151c68dba27.1544033819.git.swise@opengridcomputing.com> In-Reply-To: References: From: Steve Wise Date: Wed, 5 Dec 2018 07:14:15 -0800 Subject: [PATCH v6 1/4] rdma_rxe: serialize device removal To: dledford@redhat.com, jgg@mellanox.com Cc: linux-rdma@vger.kernel.org, BMT@zurich.ibm.com, leon@kernel.org, markb@mellanox.com, yanjun.zhu@oracle.com Sender: linux-rdma-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-rdma@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP The rxe device can be removed, which includes unregistering with the rdma core, from 3 places: a netdev notifier call, the sysfs handler used to delete a rxe device, and module unload. Currently these are not synchronized to ensure that the device is unregistered once and the memory only freed once. This commit adds proper serialization. Device removal and deregistration is consolidated into rxe_net_remove() which uses dev_list_lock to serialize removal from rxe_dev_list and ensures only 1 deregister. Functions net_to_rxe() and get_rxe_by_name() both now take a kref reference with dev_list_lock held to ensure that during a race between 2 removes, the rxe struct memory remains around until both racers release the reference. So now callers to these functions must call rxe_dev_put() when they are done using the rxe pointer. Signed-off-by: Steve Wise --- drivers/infiniband/sw/rxe/rxe.c | 8 -------- drivers/infiniband/sw/rxe/rxe.h | 1 - drivers/infiniband/sw/rxe/rxe_net.c | 26 ++++++++++++++++++++++---- drivers/infiniband/sw/rxe/rxe_net.h | 1 + drivers/infiniband/sw/rxe/rxe_sysfs.c | 4 ++-- 5 files changed, 25 insertions(+), 15 deletions(-) diff --git a/drivers/infiniband/sw/rxe/rxe.c b/drivers/infiniband/sw/rxe/rxe.c index 383e65c7bbc0..971f0862cefe 100644 --- a/drivers/infiniband/sw/rxe/rxe.c +++ b/drivers/infiniband/sw/rxe/rxe.c @@ -331,14 +331,6 @@ int rxe_add(struct rxe_dev *rxe, unsigned int mtu) return err; } -/* called by the ifc layer to remove a device */ -void rxe_remove(struct rxe_dev *rxe) -{ - rxe_unregister_device(rxe); - - rxe_dev_put(rxe); -} - static int __init rxe_module_init(void) { int err; diff --git a/drivers/infiniband/sw/rxe/rxe.h b/drivers/infiniband/sw/rxe/rxe.h index 8f79bd86d033..e0c0dce80bbf 100644 --- a/drivers/infiniband/sw/rxe/rxe.h +++ b/drivers/infiniband/sw/rxe/rxe.h @@ -96,7 +96,6 @@ static inline u32 rxe_crc32(struct rxe_dev *rxe, void rxe_set_mtu(struct rxe_dev *rxe, unsigned int dev_mtu); int rxe_add(struct rxe_dev *rxe, unsigned int mtu); -void rxe_remove(struct rxe_dev *rxe); void rxe_remove_all(void); void rxe_rcv(struct sk_buff *skb); diff --git a/drivers/infiniband/sw/rxe/rxe_net.c b/drivers/infiniband/sw/rxe/rxe_net.c index b26a8141f3ed..6dc1a5b20e31 100644 --- a/drivers/infiniband/sw/rxe/rxe_net.c +++ b/drivers/infiniband/sw/rxe/rxe_net.c @@ -57,6 +57,7 @@ struct rxe_dev *net_to_rxe(struct net_device *ndev) list_for_each_entry(rxe, &rxe_dev_list, list) { if (rxe->ndev == ndev) { found = rxe; + kref_get(&rxe->ref_cnt); break; } } @@ -74,6 +75,7 @@ struct rxe_dev *get_rxe_by_name(const char *name) list_for_each_entry(rxe, &rxe_dev_list, list) { if (!strcmp(name, dev_name(&rxe->ib_dev.dev))) { found = rxe; + kref_get(&rxe->ref_cnt); break; } } @@ -573,6 +575,21 @@ struct rxe_dev *rxe_net_add(struct net_device *ndev) return rxe; } +void rxe_net_remove(struct rxe_dev *rxe) +{ + bool already_removed; + + spin_lock_bh(&dev_list_lock); + already_removed = list_empty(&rxe->list); + list_del_init(&rxe->list); + spin_unlock_bh(&dev_list_lock); + + if (!already_removed) { + rxe_unregister_device(rxe); + rxe_dev_put(rxe); + } +} + void rxe_remove_all(void) { spin_lock_bh(&dev_list_lock); @@ -580,9 +597,10 @@ void rxe_remove_all(void) struct rxe_dev *rxe = list_first_entry(&rxe_dev_list, struct rxe_dev, list); - list_del(&rxe->list); + kref_get(&rxe->ref_cnt); spin_unlock_bh(&dev_list_lock); - rxe_remove(rxe); + rxe_net_remove(rxe); + rxe_dev_put(rxe); spin_lock_bh(&dev_list_lock); } spin_unlock_bh(&dev_list_lock); @@ -637,8 +655,7 @@ static int rxe_notify(struct notifier_block *not_blk, switch (event) { case NETDEV_UNREGISTER: - list_del(&rxe->list); - rxe_remove(rxe); + rxe_net_remove(rxe); break; case NETDEV_UP: rxe_port_up(rxe); @@ -666,6 +683,7 @@ static int rxe_notify(struct notifier_block *not_blk, event, ndev->name); break; } + rxe_dev_put(rxe); out: return NOTIFY_OK; } diff --git a/drivers/infiniband/sw/rxe/rxe_net.h b/drivers/infiniband/sw/rxe/rxe_net.h index 106c586dbb26..222234a8d525 100644 --- a/drivers/infiniband/sw/rxe/rxe_net.h +++ b/drivers/infiniband/sw/rxe/rxe_net.h @@ -44,6 +44,7 @@ struct rxe_recv_sockets { }; struct rxe_dev *rxe_net_add(struct net_device *ndev); +void rxe_net_remove(struct rxe_dev *rxe); int rxe_net_init(void); void rxe_net_exit(void); diff --git a/drivers/infiniband/sw/rxe/rxe_sysfs.c b/drivers/infiniband/sw/rxe/rxe_sysfs.c index 73a19f808e1b..fc52eb3d1856 100644 --- a/drivers/infiniband/sw/rxe/rxe_sysfs.c +++ b/drivers/infiniband/sw/rxe/rxe_sysfs.c @@ -137,8 +137,8 @@ static int rxe_param_set_remove(const char *val, const struct kernel_param *kp) return -EINVAL; } - list_del(&rxe->list); - rxe_remove(rxe); + rxe_net_remove(rxe); + rxe_dev_put(rxe); return 0; } From patchwork Wed Dec 5 15:14:17 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Steve Wise X-Patchwork-Id: 10714747 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 5160C17D5 for ; Wed, 5 Dec 2018 18:51:47 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 43C682E1C6 for ; Wed, 5 Dec 2018 18:51:47 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 37A3B2E1CF; Wed, 5 Dec 2018 18:51:47 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-6.3 required=2.0 tests=BAYES_00,DATE_IN_PAST_03_06, MAILING_LIST_MULTI,RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id C6C022E1CA for ; Wed, 5 Dec 2018 18:51:46 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727783AbeLESvq (ORCPT ); Wed, 5 Dec 2018 13:51:46 -0500 Received: from opengridcomputing.com ([72.48.214.68]:51410 "EHLO smtp.opengridcomputing.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727297AbeLESvq (ORCPT ); Wed, 5 Dec 2018 13:51:46 -0500 Received: by smtp.opengridcomputing.com (Postfix, from userid 503) id C296A227AF; Wed, 5 Dec 2018 12:51:45 -0600 (CST) Message-Id: In-Reply-To: References: From: Steve Wise Date: Wed, 5 Dec 2018 07:14:17 -0800 Subject: [PATCH v6 2/4] rdma_rxe: schedule rxe_net_remove() To: dledford@redhat.com, jgg@mellanox.com Cc: linux-rdma@vger.kernel.org, BMT@zurich.ibm.com, leon@kernel.org, markb@mellanox.com, yanjun.zhu@oracle.com Sender: linux-rdma-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-rdma@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP Schedule rxe_net_remove() to run in a workqueue context to avoid a deadlock when processing NETDEV_UNREGISTER events. Signed-off-by: Steve Wise --- drivers/infiniband/sw/rxe/rxe.c | 1 + drivers/infiniband/sw/rxe/rxe_net.c | 12 +++++++++++- drivers/infiniband/sw/rxe/rxe_net.h | 1 + drivers/infiniband/sw/rxe/rxe_verbs.h | 1 + 4 files changed, 14 insertions(+), 1 deletion(-) diff --git a/drivers/infiniband/sw/rxe/rxe.c b/drivers/infiniband/sw/rxe/rxe.c index 971f0862cefe..19d8e522b72b 100644 --- a/drivers/infiniband/sw/rxe/rxe.c +++ b/drivers/infiniband/sw/rxe/rxe.c @@ -280,6 +280,7 @@ static int rxe_init(struct rxe_dev *rxe) spin_lock_init(&rxe->pending_lock); INIT_LIST_HEAD(&rxe->pending_mmaps); INIT_LIST_HEAD(&rxe->list); + INIT_WORK(&rxe->net_remove_work, rxe_process_net_remove); mutex_init(&rxe->usdev_lock); diff --git a/drivers/infiniband/sw/rxe/rxe_net.c b/drivers/infiniband/sw/rxe/rxe_net.c index 6dc1a5b20e31..f1b559d728b6 100644 --- a/drivers/infiniband/sw/rxe/rxe_net.c +++ b/drivers/infiniband/sw/rxe/rxe_net.c @@ -643,6 +643,15 @@ void rxe_port_down(struct rxe_dev *rxe) dev_info(&rxe->ib_dev.dev, "set down\n"); } +void rxe_process_net_remove(struct work_struct *work) +{ + struct rxe_dev *rxe; + + rxe = container_of(work, struct rxe_dev, net_remove_work); + rxe_net_remove(rxe); + rxe_dev_put(rxe); +} + static int rxe_notify(struct notifier_block *not_blk, unsigned long event, void *arg) @@ -655,7 +664,8 @@ static int rxe_notify(struct notifier_block *not_blk, switch (event) { case NETDEV_UNREGISTER: - rxe_net_remove(rxe); + kref_get(&rxe->ref_cnt); + schedule_work(&rxe->net_remove_work); break; case NETDEV_UP: rxe_port_up(rxe); diff --git a/drivers/infiniband/sw/rxe/rxe_net.h b/drivers/infiniband/sw/rxe/rxe_net.h index 222234a8d525..feb1d715468c 100644 --- a/drivers/infiniband/sw/rxe/rxe_net.h +++ b/drivers/infiniband/sw/rxe/rxe_net.h @@ -48,5 +48,6 @@ struct rxe_recv_sockets { int rxe_net_init(void); void rxe_net_exit(void); +void rxe_process_net_remove(struct work_struct *work); #endif /* RXE_NET_H */ diff --git a/drivers/infiniband/sw/rxe/rxe_verbs.h b/drivers/infiniband/sw/rxe/rxe_verbs.h index 831381b7788d..32c8ecb707d1 100644 --- a/drivers/infiniband/sw/rxe/rxe_verbs.h +++ b/drivers/infiniband/sw/rxe/rxe_verbs.h @@ -414,6 +414,7 @@ struct rxe_dev { struct rxe_port port; struct list_head list; struct crypto_shash *tfm; + struct work_struct net_remove_work; }; static inline void rxe_counter_inc(struct rxe_dev *rxe, enum rxe_counters cnt) From patchwork Wed Dec 5 15:14:18 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Steve Wise X-Patchwork-Id: 10714749 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id AADEE13BB for ; Wed, 5 Dec 2018 18:51:52 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 9D0F32E1CE for ; Wed, 5 Dec 2018 18:51:52 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 9B9FB2E1DC; Wed, 5 Dec 2018 18:51:52 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-6.3 required=2.0 tests=BAYES_00,DATE_IN_PAST_03_06, MAILING_LIST_MULTI,RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id E94CF2E1D6 for ; Wed, 5 Dec 2018 18:51:51 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727804AbeLESvv (ORCPT ); Wed, 5 Dec 2018 13:51:51 -0500 Received: from opengridcomputing.com ([72.48.214.68]:51430 "EHLO smtp.opengridcomputing.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727297AbeLESvv (ORCPT ); Wed, 5 Dec 2018 13:51:51 -0500 Received: by smtp.opengridcomputing.com (Postfix, from userid 503) id CCA6B227AE; Wed, 5 Dec 2018 12:51:50 -0600 (CST) Message-Id: In-Reply-To: References: From: Steve Wise Date: Wed, 5 Dec 2018 07:14:18 -0800 Subject: [PATCH v6 3/4] RDMA/Core: add RDMA_NLDEV_CMD_NEWLINK/DELLINK support To: dledford@redhat.com, jgg@mellanox.com Cc: linux-rdma@vger.kernel.org, BMT@zurich.ibm.com, leon@kernel.org, markb@mellanox.com, yanjun.zhu@oracle.com Sender: linux-rdma-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-rdma@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP Add support for new LINK messages to allow adding and deleting rdma interfaces. This will be used initially for soft rdma drivers which instantiate device instances dynamically by the admin specifying a netdev device to use. The rdma_rxe module will be the first user of these messages. The design is modeled after RTNL_NEWLINK/DELLINK: rdma drivers register with the rdma core if they provide link add/delete functions. Each driver registers with a unique "type" string, that is used to dispatch messages coming from user space. A new RDMA_NLDEV_ATTR is defined for the "type" string. User mode will pass 3 attributes in a NEWLINK message: RDMA_NLDEV_ATTR_DEV_NAME for the desired rdma device name to be created, RDMA_NLDEV_ATTR_LINK_TYPE for the "type" of link being added, and RDMA_NLDEV_ATTR_NDEV_NAME for the net_device interface to use for this link. The DELLINK message will contain the RDMA_NLDEV_ATTR_DEV_INDEX of the device to delete. Signed-off-by: Steve Wise Reviewed-by: Leon Romanovsky Reviewed-by: Michael J. Ruhl Signed-off-by: Jason Gunthorpe --- drivers/infiniband/core/nldev.c | 134 +++++++++++++++++++++++++++++++++++++++ include/rdma/ib_verbs.h | 2 + include/rdma/rdma_netlink.h | 13 ++++ include/uapi/rdma/rdma_netlink.h | 11 +++- 4 files changed, 158 insertions(+), 2 deletions(-) diff --git a/drivers/infiniband/core/nldev.c b/drivers/infiniband/core/nldev.c index 9abbadb9e366..b6d97c592074 100644 --- a/drivers/infiniband/core/nldev.c +++ b/drivers/infiniband/core/nldev.c @@ -33,6 +33,7 @@ #include #include #include +#include #include #include #include @@ -107,6 +108,8 @@ [RDMA_NLDEV_ATTR_DRIVER_U32] = { .type = NLA_U32 }, [RDMA_NLDEV_ATTR_DRIVER_S64] = { .type = NLA_S64 }, [RDMA_NLDEV_ATTR_DRIVER_U64] = { .type = NLA_U64 }, + [RDMA_NLDEV_ATTR_LINK_TYPE] = { .type = NLA_NUL_STRING, + .len = RDMA_NLDEV_ATTR_ENTRY_STRLEN }, }; static int put_driver_name_print_type(struct sk_buff *msg, const char *name, @@ -1104,6 +1107,129 @@ static int nldev_res_get_pd_dumpit(struct sk_buff *skb, return res_get_common_dumpit(skb, cb, RDMA_RESTRACK_PD); } +static LIST_HEAD(link_ops); +static DECLARE_RWSEM(link_ops_rwsem); + +static const struct rdma_link_ops *link_ops_get(const char *type) +{ + const struct rdma_link_ops *ops; + + list_for_each_entry(ops, &link_ops, list) { + if (!strcmp(ops->type, type)) + goto out; + } + ops = NULL; +out: + return ops; +} + +void rdma_link_register(struct rdma_link_ops *ops) +{ + down_write(&link_ops_rwsem); + if (link_ops_get(ops->type)) { + WARN_ONCE("Duplicate rdma_link_ops! %s\n", ops->type); + goto out; + } + list_add(&ops->list, &link_ops); +out: + up_write(&link_ops_rwsem); +} +EXPORT_SYMBOL(rdma_link_register); + +void rdma_link_unregister(struct rdma_link_ops *ops) +{ + down_write(&link_ops_rwsem); + list_del(&ops->list); + up_write(&link_ops_rwsem); +} +EXPORT_SYMBOL(rdma_link_unregister); + +static int nldev_newlink(struct sk_buff *skb, struct nlmsghdr *nlh, + struct netlink_ext_ack *extack) +{ + struct nlattr *tb[RDMA_NLDEV_ATTR_MAX]; + char ibdev_name[IB_DEVICE_NAME_MAX]; + const struct rdma_link_ops *ops; + struct ib_device *device; + char ndev_name[IFNAMSIZ]; + char type[IFNAMSIZ]; + int err; + + err = nlmsg_parse(nlh, 0, tb, RDMA_NLDEV_ATTR_MAX - 1, + nldev_policy, extack); + if (err || !tb[RDMA_NLDEV_ATTR_DEV_NAME] || + !tb[RDMA_NLDEV_ATTR_LINK_TYPE] || !tb[RDMA_NLDEV_ATTR_NDEV_NAME]) + return -EINVAL; + + nla_strlcpy(ibdev_name, tb[RDMA_NLDEV_ATTR_DEV_NAME], + sizeof(ibdev_name)); + if (strchr(ibdev_name, '%')) + return -EINVAL; + + nla_strlcpy(type, tb[RDMA_NLDEV_ATTR_LINK_TYPE], sizeof(type)); + nla_strlcpy(ndev_name, tb[RDMA_NLDEV_ATTR_NDEV_NAME], + sizeof(ndev_name)); + + down_read(&link_ops_rwsem); + ops = link_ops_get(type); +#ifdef CONFIG_MODULES + if (!ops) { + up_read(&link_ops_rwsem); + request_module("rdma-link-%s", type); + down_read(&link_ops_rwsem); + ops = link_ops_get(type); + } +#endif + if (ops) { + device = ops->newlink(ibdev_name, ndev_name); + if (IS_ERR(device)) + err = PTR_ERR(device); + else + device->link_ops = ops; + } else { + err = -ENODEV; + } + up_read(&link_ops_rwsem); + + return err; +} + +static int nldev_dellink(struct sk_buff *skb, struct nlmsghdr *nlh, + struct netlink_ext_ack *extack) +{ + struct nlattr *tb[RDMA_NLDEV_ATTR_MAX]; + const struct rdma_link_ops *ops; + struct device *dma_device; + struct ib_device *device; + u32 index; + int err; + + err = nlmsg_parse(nlh, 0, tb, RDMA_NLDEV_ATTR_MAX - 1, + nldev_policy, extack); + if (err || !tb[RDMA_NLDEV_ATTR_DEV_INDEX]) + return -EINVAL; + + index = nla_get_u32(tb[RDMA_NLDEV_ATTR_DEV_INDEX]); + device = ib_device_get_by_index(index); + if (!device) + return -EINVAL; + + ops = device->link_ops; + + /* + * Deref the ib_device before deleting it. Otherwise we + * deadlock unregistering the device. Hold a ref on the + * underlying dma_device though to keep the memory around + * until we're done. + */ + dma_device = get_device(device->dma_device); + ib_device_put(device); + err = ops ? ops->dellink(device) : -ENODEV; + put_device(dma_device); + + return err; +} + static const struct rdma_nl_cbs nldev_cb_table[RDMA_NLDEV_NUM_OPS] = { [RDMA_NLDEV_CMD_GET] = { .doit = nldev_get_doit, @@ -1113,6 +1239,14 @@ static int nldev_res_get_pd_dumpit(struct sk_buff *skb, .doit = nldev_set_doit, .flags = RDMA_NL_ADMIN_PERM, }, + [RDMA_NLDEV_CMD_NEWLINK] = { + .doit = nldev_newlink, + .flags = RDMA_NL_ADMIN_PERM, + }, + [RDMA_NLDEV_CMD_DELLINK] = { + .doit = nldev_dellink, + .flags = RDMA_NL_ADMIN_PERM, + }, [RDMA_NLDEV_CMD_PORT_GET] = { .doit = nldev_port_get_doit, .dump = nldev_port_get_dumpit, diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h index 85021451eee0..237d87c52e3a 100644 --- a/include/rdma/ib_verbs.h +++ b/include/rdma/ib_verbs.h @@ -2612,6 +2612,8 @@ struct ib_device { */ refcount_t refcount; struct completion unreg_completion; + + const struct rdma_link_ops *link_ops; }; struct ib_client { diff --git a/include/rdma/rdma_netlink.h b/include/rdma/rdma_netlink.h index 70218e6b5187..9b50e957cbae 100644 --- a/include/rdma/rdma_netlink.h +++ b/include/rdma/rdma_netlink.h @@ -99,4 +99,17 @@ int ibnl_put_attr(struct sk_buff *skb, struct nlmsghdr *nlh, * Returns true on success or false if no listeners. */ bool rdma_nl_chk_listeners(unsigned int group); + +struct rdma_link_ops { + struct list_head list; + const char *type; + struct ib_device *(*newlink)(const char *ibdev_name, + const char *ndev_name); + int (*dellink)(struct ib_device *ibdev); +}; + +void rdma_link_register(struct rdma_link_ops *ops); +void rdma_link_unregister(struct rdma_link_ops *ops); + +#define MODULE_ALIAS_RDMA_LINK(type) MODULE_ALIAS("rdma-link-" type) #endif /* _RDMA_NETLINK_H */ diff --git a/include/uapi/rdma/rdma_netlink.h b/include/uapi/rdma/rdma_netlink.h index f9c41bf59efc..dfdfc2b608b8 100644 --- a/include/uapi/rdma/rdma_netlink.h +++ b/include/uapi/rdma/rdma_netlink.h @@ -229,9 +229,11 @@ enum rdma_nldev_command { RDMA_NLDEV_CMD_GET, /* can dump */ RDMA_NLDEV_CMD_SET, - /* 3 - 4 are free to use */ + RDMA_NLDEV_CMD_NEWLINK, - RDMA_NLDEV_CMD_PORT_GET = 5, /* can dump */ + RDMA_NLDEV_CMD_DELLINK, + + RDMA_NLDEV_CMD_PORT_GET, /* can dump */ /* 6 - 8 are free to use */ @@ -428,6 +430,11 @@ enum rdma_nldev_attr { RDMA_NLDEV_ATTR_DRIVER_U64, /* u64 */ /* + * Identifies the rdma driver. eg: "rxe" or "siw" + */ + RDMA_NLDEV_ATTR_LINK_TYPE, /* string */ + + /* * Always the end */ RDMA_NLDEV_ATTR_MAX From patchwork Wed Dec 5 15:14:20 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Steve Wise X-Patchwork-Id: 10714751 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 260CB13BB for ; Wed, 5 Dec 2018 18:51:58 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 161A02E1D6 for ; Wed, 5 Dec 2018 18:51:58 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 0966D2E18D; Wed, 5 Dec 2018 18:51:58 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-6.3 required=2.0 tests=BAYES_00,DATE_IN_PAST_03_06, MAILING_LIST_MULTI,RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 307FD2E1F1 for ; Wed, 5 Dec 2018 18:51:57 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727866AbeLESv5 (ORCPT ); Wed, 5 Dec 2018 13:51:57 -0500 Received: from opengridcomputing.com ([72.48.214.68]:51454 "EHLO smtp.opengridcomputing.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727297AbeLESv4 (ORCPT ); Wed, 5 Dec 2018 13:51:56 -0500 Received: by smtp.opengridcomputing.com (Postfix, from userid 503) id D6BBB227B6; Wed, 5 Dec 2018 12:51:55 -0600 (CST) Message-Id: In-Reply-To: References: From: Steve Wise Date: Wed, 5 Dec 2018 07:14:20 -0800 Subject: [PATCH v6 4/4] rdma_rxe: use netlink messages to add/delete links To: dledford@redhat.com, jgg@mellanox.com Cc: linux-rdma@vger.kernel.org, BMT@zurich.ibm.com, leon@kernel.org, markb@mellanox.com, yanjun.zhu@oracle.com Sender: linux-rdma-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-rdma@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP Add support for the RDMA_NLDEV_CMD_NEWLINK/DELLINK messages which allow dynamically adding new RXE links. Deprecate the old module options for now. Cc: Moni Shoua Reviewed-by: Yanjun Zhu Signed-off-by: Steve Wise --- drivers/infiniband/sw/rxe/rxe.c | 66 +++++++++++++++++++++++++++++++++-- drivers/infiniband/sw/rxe/rxe.h | 3 +- drivers/infiniband/sw/rxe/rxe_net.c | 21 +++++++++-- drivers/infiniband/sw/rxe/rxe_net.h | 2 +- drivers/infiniband/sw/rxe/rxe_sysfs.c | 6 ++-- drivers/infiniband/sw/rxe/rxe_verbs.c | 4 +-- drivers/infiniband/sw/rxe/rxe_verbs.h | 2 +- 7 files changed, 92 insertions(+), 12 deletions(-) diff --git a/drivers/infiniband/sw/rxe/rxe.c b/drivers/infiniband/sw/rxe/rxe.c index 19d8e522b72b..b98141442cb2 100644 --- a/drivers/infiniband/sw/rxe/rxe.c +++ b/drivers/infiniband/sw/rxe/rxe.c @@ -31,6 +31,7 @@ * SOFTWARE. */ +#include #include #include "rxe.h" #include "rxe_loc.h" @@ -309,7 +310,7 @@ void rxe_set_mtu(struct rxe_dev *rxe, unsigned int ndev_mtu) /* called by ifc layer to create new rxe device. * The caller should allocate memory for rxe by calling ib_alloc_device. */ -int rxe_add(struct rxe_dev *rxe, unsigned int mtu) +int rxe_add(struct rxe_dev *rxe, unsigned int mtu, const char *ibdev_name) { int err; @@ -321,7 +322,7 @@ int rxe_add(struct rxe_dev *rxe, unsigned int mtu) rxe_set_mtu(rxe, mtu); - err = rxe_register_device(rxe); + err = rxe_register_device(rxe, ibdev_name); if (err) goto err1; @@ -332,6 +333,63 @@ int rxe_add(struct rxe_dev *rxe, unsigned int mtu) return err; } +static struct ib_device *rxe_newlink(const char *ibdev_name, + const char *ndev_name) +{ + struct net_device *ndev = NULL; + struct rxe_dev *rxe; + int err = 0; + + ndev = dev_get_by_name(&init_net, ndev_name); + if (!ndev) { + pr_err("interface %s not found\n", ndev_name); + err = -ENODEV; + goto err; + } + + rxe = net_to_rxe(ndev); + if (rxe) { + pr_err("already configured on %s\n", ndev_name); + err = -EEXIST; + rxe_dev_put(rxe); + goto err; + } + + rxe = rxe_net_add(ibdev_name, ndev); + if (!rxe) { + pr_err("failed to add %s\n", ndev_name); + err = -EINVAL; + goto err; + } + + if (netif_running(ndev) && netif_carrier_ok(ndev)) + rxe_port_up(rxe); + else + rxe_port_down(rxe); + pr_info("added %s to %s\n", rxe->ib_dev.name, ndev->name); +err: + if (ndev) + dev_put(ndev); + return err ? ERR_PTR(err) : &rxe->ib_dev; +} + +static int rxe_dellink(struct ib_device *device) +{ + struct rxe_dev *rxe = ibdev_to_rxe(device); + + if (!rxe) + return -ENODEV; + rxe_net_remove(rxe); + rxe_dev_put(rxe); + return 0; +} + +static struct rdma_link_ops rxe_link_ops = { + .type = "rxe", + .newlink = rxe_newlink, + .dellink = rxe_dellink, +}; + static int __init rxe_module_init(void) { int err; @@ -347,12 +405,14 @@ static int __init rxe_module_init(void) if (err) return err; + rdma_link_register(&rxe_link_ops); pr_info("loaded\n"); return 0; } static void __exit rxe_module_exit(void) { + rdma_link_unregister(&rxe_link_ops); rxe_remove_all(); rxe_net_exit(); rxe_cache_exit(); @@ -362,3 +422,5 @@ static void __exit rxe_module_exit(void) late_initcall(rxe_module_init); module_exit(rxe_module_exit); + +MODULE_ALIAS_RDMA_LINK("rxe"); diff --git a/drivers/infiniband/sw/rxe/rxe.h b/drivers/infiniband/sw/rxe/rxe.h index e0c0dce80bbf..b8e754af6310 100644 --- a/drivers/infiniband/sw/rxe/rxe.h +++ b/drivers/infiniband/sw/rxe/rxe.h @@ -95,7 +95,7 @@ static inline u32 rxe_crc32(struct rxe_dev *rxe, void rxe_set_mtu(struct rxe_dev *rxe, unsigned int dev_mtu); -int rxe_add(struct rxe_dev *rxe, unsigned int mtu); +int rxe_add(struct rxe_dev *rxe, unsigned int mtu, const char *ibdev_name); void rxe_remove_all(void); void rxe_rcv(struct sk_buff *skb); @@ -106,6 +106,7 @@ static inline void rxe_dev_put(struct rxe_dev *rxe) } struct rxe_dev *net_to_rxe(struct net_device *ndev); struct rxe_dev *get_rxe_by_name(const char *name); +struct rxe_dev *ibdev_to_rxe(struct ib_device *device); void rxe_port_up(struct rxe_dev *rxe); void rxe_port_down(struct rxe_dev *rxe); diff --git a/drivers/infiniband/sw/rxe/rxe_net.c b/drivers/infiniband/sw/rxe/rxe_net.c index f1b559d728b6..7897bcb61394 100644 --- a/drivers/infiniband/sw/rxe/rxe_net.c +++ b/drivers/infiniband/sw/rxe/rxe_net.c @@ -83,6 +83,23 @@ struct rxe_dev *get_rxe_by_name(const char *name) return found; } +struct rxe_dev *ibdev_to_rxe(struct ib_device *device) +{ + struct rxe_dev *rxe; + struct rxe_dev *found = NULL; + + spin_lock_bh(&dev_list_lock); + list_for_each_entry(rxe, &rxe_dev_list, list) { + if (&rxe->ib_dev == device) { + found = rxe; + kref_get(&rxe->ref_cnt); + break; + } + } + spin_unlock_bh(&dev_list_lock); + + return found; +} static struct rxe_recv_sockets recv_sockets; @@ -552,7 +569,7 @@ enum rdma_link_layer rxe_link_layer(struct rxe_dev *rxe, unsigned int port_num) return IB_LINK_LAYER_ETHERNET; } -struct rxe_dev *rxe_net_add(struct net_device *ndev) +struct rxe_dev *rxe_net_add(const char *ibdev_name, struct net_device *ndev) { int err; struct rxe_dev *rxe = NULL; @@ -563,7 +580,7 @@ struct rxe_dev *rxe_net_add(struct net_device *ndev) rxe->ndev = ndev; - err = rxe_add(rxe, ndev->mtu); + err = rxe_add(rxe, ndev->mtu, ibdev_name); if (err) { ib_dealloc_device(&rxe->ib_dev); return NULL; diff --git a/drivers/infiniband/sw/rxe/rxe_net.h b/drivers/infiniband/sw/rxe/rxe_net.h index feb1d715468c..ced836f346ee 100644 --- a/drivers/infiniband/sw/rxe/rxe_net.h +++ b/drivers/infiniband/sw/rxe/rxe_net.h @@ -43,7 +43,7 @@ struct rxe_recv_sockets { struct socket *sk6; }; -struct rxe_dev *rxe_net_add(struct net_device *ndev); +struct rxe_dev *rxe_net_add(const char *ibdev_name, struct net_device *ndev); void rxe_net_remove(struct rxe_dev *rxe); int rxe_net_init(void); diff --git a/drivers/infiniband/sw/rxe/rxe_sysfs.c b/drivers/infiniband/sw/rxe/rxe_sysfs.c index fc52eb3d1856..1a3410647854 100644 --- a/drivers/infiniband/sw/rxe/rxe_sysfs.c +++ b/drivers/infiniband/sw/rxe/rxe_sysfs.c @@ -97,7 +97,7 @@ static int rxe_param_set_add(const char *val, const struct kernel_param *kp) goto err; } - rxe = rxe_net_add(ndev); + rxe = rxe_net_add("rxe%d", ndev); if (!rxe) { pr_err("failed to add %s\n", intf); err = -EINVAL; @@ -152,6 +152,6 @@ static int rxe_param_set_remove(const char *val, const struct kernel_param *kp) }; module_param_cb(add, &rxe_add_ops, NULL, 0200); -MODULE_PARM_DESC(add, "Create RXE device over network interface"); +MODULE_PARM_DESC(add, "DEPRECATED. Create RXE device over network interface"); module_param_cb(remove, &rxe_remove_ops, NULL, 0200); -MODULE_PARM_DESC(remove, "Remove RXE device over network interface"); +MODULE_PARM_DESC(remove, "DEPRECATED. Remove RXE device over network interface"); diff --git a/drivers/infiniband/sw/rxe/rxe_verbs.c b/drivers/infiniband/sw/rxe/rxe_verbs.c index 30817c79ba96..3536ca4627cd 100644 --- a/drivers/infiniband/sw/rxe/rxe_verbs.c +++ b/drivers/infiniband/sw/rxe/rxe_verbs.c @@ -1165,7 +1165,7 @@ static ssize_t parent_show(struct device *device, .attrs = rxe_dev_attributes, }; -int rxe_register_device(struct rxe_dev *rxe) +int rxe_register_device(struct rxe_dev *rxe, const char *ibdev_name) { int err; struct ib_device *dev = &rxe->ib_dev; @@ -1273,7 +1273,7 @@ int rxe_register_device(struct rxe_dev *rxe) rdma_set_device_sysfs_group(dev, &rxe_attr_group); dev->driver_id = RDMA_DRIVER_RXE; - err = ib_register_device(dev, "rxe%d", NULL); + err = ib_register_device(dev, ibdev_name, NULL); if (err) { pr_warn("%s failed with error %d\n", __func__, err); goto err1; diff --git a/drivers/infiniband/sw/rxe/rxe_verbs.h b/drivers/infiniband/sw/rxe/rxe_verbs.h index 32c8ecb707d1..01e2e5604137 100644 --- a/drivers/infiniband/sw/rxe/rxe_verbs.h +++ b/drivers/infiniband/sw/rxe/rxe_verbs.h @@ -467,7 +467,7 @@ static inline struct rxe_mem *to_rmw(struct ib_mw *mw) return mw ? container_of(mw, struct rxe_mem, ibmw) : NULL; } -int rxe_register_device(struct rxe_dev *rxe); +int rxe_register_device(struct rxe_dev *rxe, const char *ibdev_name); void rxe_unregister_device(struct rxe_dev *rxe); void rxe_mc_cleanup(struct rxe_pool_entry *arg);