From patchwork Fri Dec 14 16:05:47 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jason Gunthorpe X-Patchwork-Id: 10731821 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 3BC0B1399 for ; Fri, 14 Dec 2018 23:37:32 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id D42642AD56 for ; Fri, 14 Dec 2018 23:37:31 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id C7E562D562; Fri, 14 Dec 2018 23:37:31 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-6.4 required=2.0 tests=BAYES_00,DATE_IN_PAST_06_12, MAILING_LIST_MULTI,RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 6D9362AD56 for ; Fri, 14 Dec 2018 23:37:30 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727775AbeLNXha (ORCPT ); Fri, 14 Dec 2018 18:37:30 -0500 Received: from opengridcomputing.com ([72.48.214.68]:47318 "EHLO smtp.opengridcomputing.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726423AbeLNXha (ORCPT ); Fri, 14 Dec 2018 18:37:30 -0500 Received: by smtp.opengridcomputing.com (Postfix, from userid 503) id 1F59E227A9; Fri, 14 Dec 2018 17:37:29 -0600 (CST) Message-Id: In-Reply-To: References: From: Jason Gunthorpe Date: Fri, 14 Dec 2018 08:05:47 -0800 Subject: [PATCH v7 1/5] verbs/rxe: Use core services to keep track of RXE things To: dledford@redhat.com, jgg@mellanox.com Cc: linux-rdma@vger.kernel.org, BMT@zurich.ibm.com, leon@kernel.org, markb@mellanox.com, yanjun.zhu@oracle.com Sender: linux-rdma-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-rdma@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP Replace - The nonsense rxe_dev.kref with rxe_dev.ib_dev.dev.kref - rxe_dev_list with core's device_list. - Buggy net_to_rxe with ib_device_get_by_netdev that does proper lifetime management - Buggy get_rxe_by_name with ib_device_get_by_name Add - ib_unregister_driver so drivers like rxe don't have to keep track of things to do module unload - ib_unregister_device_and_put to work with potiners returned by that various ib_device_get_* functions The refcount change to the pool might need more work, but fundamentally the driver cannot have krefs open in the pool across unregistration or it contains module unlolad races. Signed-off-by: Jason Gunthorpe --- drivers/infiniband/core/core_priv.h | 1 - drivers/infiniband/core/device.c | 213 ++++++++++++++++++++++++++++++++-- drivers/infiniband/sw/rxe/rxe.c | 38 +----- drivers/infiniband/sw/rxe/rxe.h | 14 ++- drivers/infiniband/sw/rxe/rxe_loc.h | 2 +- drivers/infiniband/sw/rxe/rxe_net.c | 83 +++---------- drivers/infiniband/sw/rxe/rxe_pool.c | 4 - drivers/infiniband/sw/rxe/rxe_sysfs.c | 39 +++---- drivers/infiniband/sw/rxe/rxe_verbs.c | 7 -- drivers/infiniband/sw/rxe/rxe_verbs.h | 3 - include/rdma/ib_verbs.h | 10 ++ 11 files changed, 261 insertions(+), 153 deletions(-) diff --git a/drivers/infiniband/core/core_priv.h b/drivers/infiniband/core/core_priv.h index cc7535c5e192..1b2575430032 100644 --- a/drivers/infiniband/core/core_priv.h +++ b/drivers/infiniband/core/core_priv.h @@ -267,7 +267,6 @@ static inline int ib_mad_enforce_security(struct ib_mad_agent_private *map, #endif struct ib_device *ib_device_get_by_index(u32 ifindex); -void ib_device_put(struct ib_device *device); /* RDMA device netlink */ void nldev_init(void); void nldev_exit(void); diff --git a/drivers/infiniband/core/device.c b/drivers/infiniband/core/device.c index 348a7fb1f945..a2523605e02a 100644 --- a/drivers/infiniband/core/device.c +++ b/drivers/infiniband/core/device.c @@ -85,6 +85,9 @@ struct ib_client_data { static DEFINE_MUTEX(device_mutex); static DECLARE_RWSEM(lists_rwsem); +/* Used to serialize ib_unregister_driver */ +static DECLARE_RWSEM(unregister_rwsem); + static int ib_security_change(struct notifier_block *nb, unsigned long event, void *lsm_data); static void ib_policy_change_task(struct work_struct *work); @@ -168,6 +171,7 @@ void ib_device_put(struct ib_device *device) if (refcount_dec_and_test(&device->refcount)) complete(&device->unreg_completion); } +EXPORT_SYMBOL(ib_device_put); static struct ib_device *__ib_device_get_by_name(const char *name) { @@ -180,6 +184,27 @@ static struct ib_device *__ib_device_get_by_name(const char *name) return NULL; } +struct ib_device *ib_device_get_by_name(const char *name, + enum rdma_driver_id driver_id) +{ + struct ib_device *device; + + down_read(&lists_rwsem); + device = __ib_device_get_by_name(name); + if (device && driver_id != RDMA_DRIVER_UNKNOWN && + device->driver_id != driver_id) + device = NULL; + + if (device) { + /* Do not return a device if unregistration has started. */ + if (!refcount_inc_not_zero(&device->refcount)) + device = NULL; + } + up_read(&lists_rwsem); + return device; +} +EXPORT_SYMBOL(ib_device_get_by_name); + int ib_device_rename(struct ib_device *ibdev, const char *name) { struct ib_device *device; @@ -641,28 +666,41 @@ int ib_register_device(struct ib_device *device, const char *name, } EXPORT_SYMBOL(ib_register_device); -/** - * ib_unregister_device - Unregister an IB device - * @device:Device to unregister +/* Returns false if the device is already undergoing unregistration in another + * thread, and the device pointer may be invalid upon return. * - * Unregister an IB device. All clients will receive a remove callback. + * If true is returned then the caller owns a kref associated with the device + * (taken from the kref owned by the list). + * + * Upon return the device cannot be returned by any 'get' functions. */ -void ib_unregister_device(struct ib_device *device) +static bool ib_pre_unregister(struct ib_device *device) +{ + bool already_removed; + + mutex_lock(&device_mutex); + down_write(&lists_rwsem); + already_removed = list_empty(&device->core_list); + list_del_init(&device->core_list); + up_write(&lists_rwsem); + mutex_unlock(&device_mutex); + + /* Matches the recound_set in ib_alloc_device */ + ib_device_put(device); + + return !already_removed; +} + +static void __ib_unregister_device(struct ib_device *device) { struct ib_client_data *context, *tmp; unsigned long flags; - /* - * Wait for all netlink command callers to finish working on the - * device. - */ - ib_device_put(device); wait_for_completion(&device->unreg_completion); mutex_lock(&device_mutex); down_write(&lists_rwsem); - list_del(&device->core_list); write_lock_irq(&device->client_data_lock); list_for_each_entry(context, &device->client_data_list, list) context->going_down = true; @@ -696,10 +734,116 @@ void ib_unregister_device(struct ib_device *device) up_write(&lists_rwsem); device->reg_state = IB_DEV_UNREGISTERED; + + /* + * Drivers using the new flow may not call ib_dealloc_device except + * in error unwind prior to registration success. + */ + if (device->driver_unregister) { + device->driver_unregister(device); + ib_dealloc_device(device); + } +} + +/** + * ib_unregister_device - Unregister an IB device + * @device: The device to unregister + * + * Unregister an IB device. All clients will receive a remove callback. + * + * Callers can call this routine only once, and must protect against + * races. Typically it should only be called as part of a remove callback in + * an implementation of driver core's struct device_driver and related. + * + * A driver must choose to use either only ib_unregister_device or + * ib_unregister_device_and_put & ib_unregister_driver. + */ +void ib_unregister_device(struct ib_device *device) +{ + down_read(&unregister_rwsem); + if (ib_pre_unregister(device)) + __ib_unregister_device(device); + else + WARN(true, "Driver mixing %s with _and_put", __func__); + up_read(&unregister_rwsem); } EXPORT_SYMBOL(ib_unregister_device); /** + * ib_unregister_device_and_put - Unregister a device while holding a 'get' + * device: The device to unregister + * + * This is the same as ib_unregister_device(), except it includes an internal + * ib_device_put() that should match a 'get' obtained by the caller. + * + * It is safe to call this routine concurrently from multiple threads while + * holding the 'get', however in this case return from the function does not + * guarantee the device is destroyed, only that destruction is in-progress. + * + * Drivers using this flow MUST use the driver_unregister callback to clean up + * their resources associated with the device and dealloc it. + */ +void ib_unregister_device_and_put(struct ib_device *device) +{ + down_read(&unregister_rwsem); + if (ib_pre_unregister(device)) { + /* + * Caller's get can be returned now that we hold the kref, + * otherwise ib_pre_unregister returned the caller's get + */ + ib_device_put(device); + __ib_unregister_device(device); + } + up_read(&unregister_rwsem); +} +EXPORT_SYMBOL(ib_unregister_device_and_put); + +/** + * ib_unregister_driver - Unregister all IB devices for a driver + * @driver_id: The driver to unregister + * + * This implements a fence for device unregistration. It only returns once all + * devices associated with the driver_id have fully completed their + * unregistration and returned from ib_unregister_device*(). + * + * If device's are not yet unregistered it goes ahead and starts unregistering + * them. + * + * The caller must ensure that no devices can be be added while this is + * running (or at all for the module_unload case) + */ +void ib_unregister_driver(enum rdma_driver_id driver_id) +{ + struct ib_device *device, *tmp; + LIST_HEAD(driver_devs); + + down_write(&unregister_rwsem); + mutex_lock(&device_mutex); + down_write(&lists_rwsem); + list_for_each_entry_safe(device, tmp, &device_list, core_list) { + if (device->driver_id == driver_id) + list_move(&device->core_list, &driver_devs); + } + up_write(&lists_rwsem); + + while ((device = list_first_entry_or_null( + &driver_devs, struct ib_device, core_list))) { + mutex_unlock(&device_mutex); + /* + * We are holding the unregister_rwsem, so it is impossible + * for another thread to be doing registration. + */ + if (!ib_pre_unregister(device)) + WARN(true, "Impossible failure of ib_pre_unregister"); + __ib_unregister_device(device); + mutex_lock(&device_mutex); + } + mutex_unlock(&device_mutex); + up_write(&unregister_rwsem); +} +EXPORT_SYMBOL(ib_unregister_driver); + +/** * ib_register_client - Register an IB client * @client:Client to register * @@ -981,6 +1125,53 @@ void ib_enum_roce_netdev(struct ib_device *ib_dev, } } +struct ib_device *ib_device_get_by_netdev(struct net_device *ndev, + enum rdma_driver_id driver_id) +{ + struct ib_device *ib_dev; + + down_read(&lists_rwsem); + list_for_each_entry(ib_dev, &device_list, core_list) { + unsigned int port; + + if (driver_id != RDMA_DRIVER_UNKNOWN && + ib_dev->driver_id != driver_id) + continue; + + if (!ib_dev->get_netdev) + continue; + + for (port = rdma_start_port(ib_dev); + port <= rdma_end_port(ib_dev); port++) { + struct net_device *idev; + + idev = ib_dev->get_netdev(ib_dev, port); + if (idev != ndev) { + if (idev) + dev_put(idev); + continue; + } + + /* + * Do not return a device if unregistration has + * started. + */ + if (!refcount_inc_not_zero(&ib_dev->refcount)) { + dev_put(idev); + continue; + } + + up_read(&lists_rwsem); + dev_put(idev); + return ib_dev; + } + } + up_read(&lists_rwsem); + + return NULL; +} +EXPORT_SYMBOL(ib_device_get_by_netdev); + /** * ib_enum_all_roce_netdevs - enumerate all RoCE devices * @filter: Should we call the callback? diff --git a/drivers/infiniband/sw/rxe/rxe.c b/drivers/infiniband/sw/rxe/rxe.c index 383e65c7bbc0..4069463f9a66 100644 --- a/drivers/infiniband/sw/rxe/rxe.c +++ b/drivers/infiniband/sw/rxe/rxe.c @@ -50,8 +50,10 @@ static void rxe_cleanup_ports(struct rxe_dev *rxe) /* free resources for a rxe device all objects created for this device must * have been destroyed */ -static void rxe_cleanup(struct rxe_dev *rxe) +void rxe_unregistered(struct ib_device *ib_dev) { + struct rxe_dev *rxe = container_of(ib_dev, struct rxe_dev, ib_dev); + rxe_pool_cleanup(&rxe->uc_pool); rxe_pool_cleanup(&rxe->pd_pool); rxe_pool_cleanup(&rxe->ah_pool); @@ -68,15 +70,6 @@ static void rxe_cleanup(struct rxe_dev *rxe) crypto_free_shash(rxe->tfm); } -/* called when all references have been dropped */ -void rxe_release(struct kref *kref) -{ - struct rxe_dev *rxe = container_of(kref, struct rxe_dev, ref_cnt); - - rxe_cleanup(rxe); - ib_dealloc_device(&rxe->ib_dev); -} - /* initialize rxe device parameters */ static void rxe_init_device_param(struct rxe_dev *rxe) { @@ -279,7 +272,6 @@ static int rxe_init(struct rxe_dev *rxe) spin_lock_init(&rxe->mmap_offset_lock); spin_lock_init(&rxe->pending_lock); INIT_LIST_HEAD(&rxe->pending_mmaps); - INIT_LIST_HEAD(&rxe->list); mutex_init(&rxe->usdev_lock); @@ -312,31 +304,13 @@ int rxe_add(struct rxe_dev *rxe, unsigned int mtu) { int err; - kref_init(&rxe->ref_cnt); - err = rxe_init(rxe); if (err) - goto err1; + return err; rxe_set_mtu(rxe, mtu); - err = rxe_register_device(rxe); - if (err) - goto err1; - - return 0; - -err1: - rxe_dev_put(rxe); - return err; -} - -/* called by the ifc layer to remove a device */ -void rxe_remove(struct rxe_dev *rxe) -{ - rxe_unregister_device(rxe); - - rxe_dev_put(rxe); + return rxe_register_device(rxe); } static int __init rxe_module_init(void) @@ -360,7 +334,7 @@ static int __init rxe_module_init(void) static void __exit rxe_module_exit(void) { - rxe_remove_all(); + ib_unregister_driver(RDMA_DRIVER_RXE); rxe_net_exit(); rxe_cache_exit(); diff --git a/drivers/infiniband/sw/rxe/rxe.h b/drivers/infiniband/sw/rxe/rxe.h index 8f79bd86d033..75e7113a756d 100644 --- a/drivers/infiniband/sw/rxe/rxe.h +++ b/drivers/infiniband/sw/rxe/rxe.h @@ -96,17 +96,19 @@ static inline u32 rxe_crc32(struct rxe_dev *rxe, void rxe_set_mtu(struct rxe_dev *rxe, unsigned int dev_mtu); int rxe_add(struct rxe_dev *rxe, unsigned int mtu); -void rxe_remove(struct rxe_dev *rxe); -void rxe_remove_all(void); void rxe_rcv(struct sk_buff *skb); -static inline void rxe_dev_put(struct rxe_dev *rxe) +/* The caller must do a matching ib_device_put(&dev->ib_dev) */ +static inline struct rxe_dev *rxe_get_dev_from_net(struct net_device *ndev) { - kref_put(&rxe->ref_cnt, rxe_release); + struct ib_device *ibdev = + ib_device_get_by_netdev(ndev, RDMA_DRIVER_RXE); + + if (!ibdev) + return NULL; + return container_of(ibdev, struct rxe_dev, ib_dev); } -struct rxe_dev *net_to_rxe(struct net_device *ndev); -struct rxe_dev *get_rxe_by_name(const char *name); void rxe_port_up(struct rxe_dev *rxe); void rxe_port_down(struct rxe_dev *rxe); diff --git a/drivers/infiniband/sw/rxe/rxe_loc.h b/drivers/infiniband/sw/rxe/rxe_loc.h index a675c9f2b427..e830d56a15c2 100644 --- a/drivers/infiniband/sw/rxe/rxe_loc.h +++ b/drivers/infiniband/sw/rxe/rxe_loc.h @@ -231,7 +231,7 @@ int rxe_srq_from_attr(struct rxe_dev *rxe, struct rxe_srq *srq, struct ib_srq_attr *attr, enum ib_srq_attr_mask mask, struct rxe_modify_srq_cmd *ucmd); -void rxe_release(struct kref *kref); +void rxe_unregistered(struct ib_device *ib_dev); int rxe_completer(void *arg); int rxe_requester(void *arg); diff --git a/drivers/infiniband/sw/rxe/rxe_net.c b/drivers/infiniband/sw/rxe/rxe_net.c index b26a8141f3ed..c7637ad5e584 100644 --- a/drivers/infiniband/sw/rxe/rxe_net.c +++ b/drivers/infiniband/sw/rxe/rxe_net.c @@ -45,43 +45,6 @@ #include "rxe_net.h" #include "rxe_loc.h" -static LIST_HEAD(rxe_dev_list); -static DEFINE_SPINLOCK(dev_list_lock); /* spinlock for device list */ - -struct rxe_dev *net_to_rxe(struct net_device *ndev) -{ - struct rxe_dev *rxe; - struct rxe_dev *found = NULL; - - spin_lock_bh(&dev_list_lock); - list_for_each_entry(rxe, &rxe_dev_list, list) { - if (rxe->ndev == ndev) { - found = rxe; - break; - } - } - spin_unlock_bh(&dev_list_lock); - - return found; -} - -struct rxe_dev *get_rxe_by_name(const char *name) -{ - struct rxe_dev *rxe; - struct rxe_dev *found = NULL; - - spin_lock_bh(&dev_list_lock); - list_for_each_entry(rxe, &rxe_dev_list, list) { - if (!strcmp(name, dev_name(&rxe->ib_dev.dev))) { - found = rxe; - break; - } - } - spin_unlock_bh(&dev_list_lock); - return found; -} - - static struct rxe_recv_sockets recv_sockets; struct device *rxe_dma_device(struct rxe_dev *rxe) @@ -229,18 +192,19 @@ static int rxe_udp_encap_recv(struct sock *sk, struct sk_buff *skb) struct udphdr *udph; struct net_device *ndev = skb->dev; struct net_device *rdev = ndev; - struct rxe_dev *rxe = net_to_rxe(ndev); + struct rxe_dev *rxe = rxe_get_dev_from_net(ndev); struct rxe_pkt_info *pkt = SKB_TO_PKT(skb); if (!rxe && is_vlan_dev(rdev)) { rdev = vlan_dev_real_dev(ndev); - rxe = net_to_rxe(rdev); + rxe = rxe_get_dev_from_net(rdev); } if (!rxe) goto drop; if (skb_linearize(skb)) { pr_err("skb_linearize failed\n"); + ib_device_put(&rxe->ib_dev); goto drop; } @@ -253,6 +217,12 @@ static int rxe_udp_encap_recv(struct sock *sk, struct sk_buff *skb) rxe_rcv(skb); + /* + * FIXME: this is in the wrong place, it needs to be done when pkt is + * destroyed + */ + ib_device_put(&rxe->ib_dev); + return 0; drop: kfree_skb(skb); @@ -558,36 +528,18 @@ struct rxe_dev *rxe_net_add(struct net_device *ndev) rxe = (struct rxe_dev *)ib_alloc_device(sizeof(*rxe)); if (!rxe) return NULL; - + rxe->ib_dev.driver_unregister = rxe_unregistered; rxe->ndev = ndev; err = rxe_add(rxe, ndev->mtu); if (err) { - ib_dealloc_device(&rxe->ib_dev); + rxe_unregistered(&rxe->ib_dev); return NULL; } - spin_lock_bh(&dev_list_lock); - list_add_tail(&rxe->list, &rxe_dev_list); - spin_unlock_bh(&dev_list_lock); return rxe; } -void rxe_remove_all(void) -{ - spin_lock_bh(&dev_list_lock); - while (!list_empty(&rxe_dev_list)) { - struct rxe_dev *rxe = - list_first_entry(&rxe_dev_list, struct rxe_dev, list); - - list_del(&rxe->list); - spin_unlock_bh(&dev_list_lock); - rxe_remove(rxe); - spin_lock_bh(&dev_list_lock); - } - spin_unlock_bh(&dev_list_lock); -} - static void rxe_port_event(struct rxe_dev *rxe, enum ib_event_type event) { @@ -630,16 +582,16 @@ static int rxe_notify(struct notifier_block *not_blk, void *arg) { struct net_device *ndev = netdev_notifier_info_to_dev(arg); - struct rxe_dev *rxe = net_to_rxe(ndev); + struct rxe_dev *rxe = rxe_get_dev_from_net(ndev); if (!rxe) - goto out; + return NOTIFY_OK; switch (event) { case NETDEV_UNREGISTER: - list_del(&rxe->list); - rxe_remove(rxe); - break; + ib_unregister_device_and_put(&rxe->ib_dev); + return NOTIFY_OK; + case NETDEV_UP: rxe_port_up(rxe); break; @@ -666,7 +618,8 @@ static int rxe_notify(struct notifier_block *not_blk, event, ndev->name); break; } -out: + + ib_device_put(&rxe->ib_dev); return NOTIFY_OK; } diff --git a/drivers/infiniband/sw/rxe/rxe_pool.c b/drivers/infiniband/sw/rxe/rxe_pool.c index a04a076e03af..bb7e3268713e 100644 --- a/drivers/infiniband/sw/rxe/rxe_pool.c +++ b/drivers/infiniband/sw/rxe/rxe_pool.c @@ -390,8 +390,6 @@ void *rxe_alloc(struct rxe_pool *pool) kref_get(&pool->ref_cnt); read_unlock_irqrestore(&pool->pool_lock, flags); - kref_get(&pool->rxe->ref_cnt); - if (atomic_inc_return(&pool->num_elem) > pool->max_elem) goto out_put_pool; @@ -408,7 +406,6 @@ void *rxe_alloc(struct rxe_pool *pool) out_put_pool: atomic_dec(&pool->num_elem); - rxe_dev_put(pool->rxe); rxe_pool_put(pool); return NULL; } @@ -424,7 +421,6 @@ void rxe_elem_release(struct kref *kref) kmem_cache_free(pool_cache(pool), elem); atomic_dec(&pool->num_elem); - rxe_dev_put(pool->rxe); rxe_pool_put(pool); } diff --git a/drivers/infiniband/sw/rxe/rxe_sysfs.c b/drivers/infiniband/sw/rxe/rxe_sysfs.c index 73a19f808e1b..87c17675cf9d 100644 --- a/drivers/infiniband/sw/rxe/rxe_sysfs.c +++ b/drivers/infiniband/sw/rxe/rxe_sysfs.c @@ -53,20 +53,14 @@ static int sanitize_arg(const char *val, char *intf, int intf_len) return len; } -static void rxe_set_port_state(struct net_device *ndev) +static void rxe_set_port_state(struct rxe_dev *rxe, struct net_device *ndev) { - struct rxe_dev *rxe = net_to_rxe(ndev); bool is_up = netif_running(ndev) && netif_carrier_ok(ndev); - if (!rxe) - goto out; - if (is_up) rxe_port_up(rxe); else rxe_port_down(rxe); /* down for unknown state */ -out: - return; } static int rxe_param_set_add(const char *val, const struct kernel_param *kp) @@ -74,24 +68,25 @@ static int rxe_param_set_add(const char *val, const struct kernel_param *kp) int len; int err = 0; char intf[32]; - struct net_device *ndev = NULL; + struct net_device *ndev; + struct rxe_dev *exists; struct rxe_dev *rxe; len = sanitize_arg(val, intf, sizeof(intf)); if (!len) { pr_err("add: invalid interface name\n"); - err = -EINVAL; - goto err; + return -EINVAL; } ndev = dev_get_by_name(&init_net, intf); if (!ndev) { pr_err("interface %s not found\n", intf); - err = -EINVAL; - goto err; + return -EINVAL; } - if (net_to_rxe(ndev)) { + exists = rxe_get_dev_from_net(ndev); + if (exists) { + ib_device_put(&exists->ib_dev); pr_err("already configured on %s\n", intf); err = -EINVAL; goto err; @@ -104,11 +99,11 @@ static int rxe_param_set_add(const char *val, const struct kernel_param *kp) goto err; } - rxe_set_port_state(ndev); + rxe_set_port_state(rxe, ndev); dev_info(&rxe->ib_dev.dev, "added %s\n", intf); + err: - if (ndev) - dev_put(ndev); + dev_put(ndev); return err; } @@ -116,7 +111,7 @@ static int rxe_param_set_remove(const char *val, const struct kernel_param *kp) { int len; char intf[32]; - struct rxe_dev *rxe; + struct ib_device *ib_dev; len = sanitize_arg(val, intf, sizeof(intf)); if (!len) { @@ -126,20 +121,18 @@ static int rxe_param_set_remove(const char *val, const struct kernel_param *kp) if (strncmp("all", intf, len) == 0) { pr_info("rxe_sys: remove all"); - rxe_remove_all(); + ib_unregister_driver(RDMA_DRIVER_RXE); return 0; } - rxe = get_rxe_by_name(intf); + ib_dev = ib_device_get_by_name(intf, RDMA_DRIVER_RXE); - if (!rxe) { + if (!ib_dev) { pr_err("not configured on %s\n", intf); return -EINVAL; } - list_del(&rxe->list); - rxe_remove(rxe); - + ib_unregister_device_and_put(ib_dev); return 0; } diff --git a/drivers/infiniband/sw/rxe/rxe_verbs.c b/drivers/infiniband/sw/rxe/rxe_verbs.c index 30817c79ba96..28beee8f5838 100644 --- a/drivers/infiniband/sw/rxe/rxe_verbs.c +++ b/drivers/infiniband/sw/rxe/rxe_verbs.c @@ -1286,10 +1286,3 @@ int rxe_register_device(struct rxe_dev *rxe) return err; } - -void rxe_unregister_device(struct rxe_dev *rxe) -{ - struct ib_device *dev = &rxe->ib_dev; - - ib_unregister_device(dev); -} diff --git a/drivers/infiniband/sw/rxe/rxe_verbs.h b/drivers/infiniband/sw/rxe/rxe_verbs.h index 831381b7788d..a0b635812784 100644 --- a/drivers/infiniband/sw/rxe/rxe_verbs.h +++ b/drivers/infiniband/sw/rxe/rxe_verbs.h @@ -385,7 +385,6 @@ struct rxe_dev { struct ib_device_attr attr; int max_ucontext; int max_inline_data; - struct kref ref_cnt; struct mutex usdev_lock; struct net_device *ndev; @@ -412,7 +411,6 @@ struct rxe_dev { u64 stats_counters[RXE_NUM_OF_COUNTERS]; struct rxe_port port; - struct list_head list; struct crypto_shash *tfm; }; @@ -467,7 +465,6 @@ static inline struct rxe_mem *to_rmw(struct ib_mw *mw) } int rxe_register_device(struct rxe_dev *rxe); -void rxe_unregister_device(struct rxe_dev *rxe); void rxe_mc_cleanup(struct rxe_pool_entry *arg); diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h index 85021451eee0..5fbeedacb3aa 100644 --- a/include/rdma/ib_verbs.h +++ b/include/rdma/ib_verbs.h @@ -2534,6 +2534,9 @@ struct ib_device { struct ib_counters_read_attr *counters_read_attr, struct uverbs_attr_bundle *attrs); + /* Called after all clients have been destroyed*/ + void (*driver_unregister)(struct ib_device *dev); + /** * rdma netdev operation * @@ -2653,6 +2656,8 @@ int ib_register_device(struct ib_device *device, const char *name, int (*port_callback)(struct ib_device *, u8, struct kobject *)); void ib_unregister_device(struct ib_device *device); +void ib_unregister_driver(enum rdma_driver_id driver_id); +void ib_unregister_device_and_put(struct ib_device *device); int ib_register_client (struct ib_client *client); void ib_unregister_client(struct ib_client *client); @@ -3937,6 +3942,11 @@ static inline bool ib_access_writable(int access_flags) int ib_check_mr_status(struct ib_mr *mr, u32 check_mask, struct ib_mr_status *mr_status); +void ib_device_put(struct ib_device *device); +struct ib_device *ib_device_get_by_netdev(struct net_device *ndev, + enum rdma_driver_id driver_id); +struct ib_device *ib_device_get_by_name(const char *name, + enum rdma_driver_id driver_id); struct net_device *ib_get_net_dev_by_params(struct ib_device *dev, u8 port, u16 pkey, const union ib_gid *gid, const struct sockaddr *addr); From patchwork Fri Dec 14 16:05:49 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Yuval Shaia X-Patchwork-Id: 10731823 X-Patchwork-Delegate: jgg@ziepe.ca Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id E15C11399 for ; Fri, 14 Dec 2018 23:37:35 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 87C612AD56 for ; Fri, 14 Dec 2018 23:37:35 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 7B97B2D562; Fri, 14 Dec 2018 23:37:35 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-6.4 required=2.0 tests=BAYES_00,DATE_IN_PAST_06_12, MAILING_LIST_MULTI,RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 1467A2AD56 for ; Fri, 14 Dec 2018 23:37:35 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728077AbeLNXhe (ORCPT ); Fri, 14 Dec 2018 18:37:34 -0500 Received: from opengridcomputing.com ([72.48.214.68]:47346 "EHLO smtp.opengridcomputing.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726423AbeLNXhe (ORCPT ); Fri, 14 Dec 2018 18:37:34 -0500 Received: by smtp.opengridcomputing.com (Postfix, from userid 503) id 29785227A9; Fri, 14 Dec 2018 17:37:34 -0600 (CST) Message-Id: <8ee7efc93c47cb6186834d4e17196238223cd2e6.1544826079.git.swise@opengridcomputing.com> In-Reply-To: References: From: Yuval Shaia Date: Fri, 14 Dec 2018 08:05:49 -0800 Subject: [PATCH v7 2/5] IB/rxe: Reuse code which sets port state MIME-Version: 1.0 To: dledford@redhat.com, jgg@mellanox.com Cc: linux-rdma@vger.kernel.org, BMT@zurich.ibm.com, leon@kernel.org, markb@mellanox.com, yanjun.zhu@oracle.com Sender: linux-rdma-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-rdma@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP Same code is executed in both rxe_param_set_add and rxe_notify functions. Make one function and call it from both places. Note that the code that checks validity of rxe object is omitted since both callers already make sure it is valid. Signed-off-by: Yuval Shaia Reviewed-by: Steve Wise   Reviewed-by: Bart Van Assche --- drivers/infiniband/sw/rxe/rxe.h | 1 + drivers/infiniband/sw/rxe/rxe_net.c | 13 +++++++++---- drivers/infiniband/sw/rxe/rxe_sysfs.c | 12 +----------- 3 files changed, 11 insertions(+), 15 deletions(-) diff --git a/drivers/infiniband/sw/rxe/rxe.h b/drivers/infiniband/sw/rxe/rxe.h index 75e7113a756d..4ff33d921865 100644 --- a/drivers/infiniband/sw/rxe/rxe.h +++ b/drivers/infiniband/sw/rxe/rxe.h @@ -112,5 +112,6 @@ static inline struct rxe_dev *rxe_get_dev_from_net(struct net_device *ndev) void rxe_port_up(struct rxe_dev *rxe); void rxe_port_down(struct rxe_dev *rxe); +void rxe_set_port_state(struct net_device *ndev, struct rxe_dev *rxe); #endif /* RXE_H */ diff --git a/drivers/infiniband/sw/rxe/rxe_net.c b/drivers/infiniband/sw/rxe/rxe_net.c index c7637ad5e584..6a25b514d448 100644 --- a/drivers/infiniband/sw/rxe/rxe_net.c +++ b/drivers/infiniband/sw/rxe/rxe_net.c @@ -577,6 +577,14 @@ void rxe_port_down(struct rxe_dev *rxe) dev_info(&rxe->ib_dev.dev, "set down\n"); } +void rxe_set_port_state(struct net_device *ndev, struct rxe_dev *rxe) +{ + if (netif_running(ndev) && netif_carrier_ok(ndev)) + rxe_port_up(rxe); + else + rxe_port_down(rxe); +} + static int rxe_notify(struct notifier_block *not_blk, unsigned long event, void *arg) @@ -603,10 +611,7 @@ static int rxe_notify(struct notifier_block *not_blk, rxe_set_mtu(rxe, ndev->mtu); break; case NETDEV_CHANGE: - if (netif_running(ndev) && netif_carrier_ok(ndev)) - rxe_port_up(rxe); - else - rxe_port_down(rxe); + rxe_set_port_state(ndev, rxe); break; case NETDEV_REBOOT: case NETDEV_GOING_DOWN: diff --git a/drivers/infiniband/sw/rxe/rxe_sysfs.c b/drivers/infiniband/sw/rxe/rxe_sysfs.c index 87c17675cf9d..de9f51056f52 100644 --- a/drivers/infiniband/sw/rxe/rxe_sysfs.c +++ b/drivers/infiniband/sw/rxe/rxe_sysfs.c @@ -53,16 +53,6 @@ static int sanitize_arg(const char *val, char *intf, int intf_len) return len; } -static void rxe_set_port_state(struct rxe_dev *rxe, struct net_device *ndev) -{ - bool is_up = netif_running(ndev) && netif_carrier_ok(ndev); - - if (is_up) - rxe_port_up(rxe); - else - rxe_port_down(rxe); /* down for unknown state */ -} - static int rxe_param_set_add(const char *val, const struct kernel_param *kp) { int len; @@ -99,7 +89,7 @@ static int rxe_param_set_add(const char *val, const struct kernel_param *kp) goto err; } - rxe_set_port_state(rxe, ndev); + rxe_set_port_state(ndev, rxe); dev_info(&rxe->ib_dev.dev, "added %s\n", intf); err: From patchwork Fri Dec 14 16:05:51 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Steve Wise X-Patchwork-Id: 10731825 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id D13EC924 for ; Fri, 14 Dec 2018 23:37:40 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 75EE52AD56 for ; Fri, 14 Dec 2018 23:37:40 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 6A48F2D562; Fri, 14 Dec 2018 23:37:40 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-6.4 required=2.0 tests=BAYES_00,DATE_IN_PAST_06_12, MAILING_LIST_MULTI,RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 0F29E2AD56 for ; Fri, 14 Dec 2018 23:37:40 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728244AbeLNXhj (ORCPT ); Fri, 14 Dec 2018 18:37:39 -0500 Received: from opengridcomputing.com ([72.48.214.68]:47368 "EHLO smtp.opengridcomputing.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726423AbeLNXhj (ORCPT ); Fri, 14 Dec 2018 18:37:39 -0500 Received: by smtp.opengridcomputing.com (Postfix, from userid 503) id 3193B227A9; Fri, 14 Dec 2018 17:37:39 -0600 (CST) Message-Id: <69f5176272c93b5168b8df6ae71a5e285d124a86.1544826079.git.swise@opengridcomputing.com> In-Reply-To: References: From: Steve Wise Date: Fri, 14 Dec 2018 08:05:51 -0800 Subject: [PATCH v7 3/5] rdma_rxe: schedule ib_unregister_device_and_put() To: dledford@redhat.com, jgg@mellanox.com Cc: linux-rdma@vger.kernel.org, BMT@zurich.ibm.com, leon@kernel.org, markb@mellanox.com, yanjun.zhu@oracle.com Sender: linux-rdma-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-rdma@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP Schedule ib_unregister_device_and_put() to run in a workqueue context to avoid a deadlock when processing NETDEV_UNREGISTER events. Signed-off-by: Steve Wise --- drivers/infiniband/sw/rxe/rxe.c | 1 + drivers/infiniband/sw/rxe/rxe_net.c | 10 +++++++++- drivers/infiniband/sw/rxe/rxe_net.h | 1 + drivers/infiniband/sw/rxe/rxe_verbs.h | 1 + 4 files changed, 12 insertions(+), 1 deletion(-) diff --git a/drivers/infiniband/sw/rxe/rxe.c b/drivers/infiniband/sw/rxe/rxe.c index 4069463f9a66..91ad339fd6e1 100644 --- a/drivers/infiniband/sw/rxe/rxe.c +++ b/drivers/infiniband/sw/rxe/rxe.c @@ -272,6 +272,7 @@ static int rxe_init(struct rxe_dev *rxe) spin_lock_init(&rxe->mmap_offset_lock); spin_lock_init(&rxe->pending_lock); INIT_LIST_HEAD(&rxe->pending_mmaps); + INIT_WORK(&rxe->netdev_unregister_work, rxe_netdev_unregister_work); mutex_init(&rxe->usdev_lock); diff --git a/drivers/infiniband/sw/rxe/rxe_net.c b/drivers/infiniband/sw/rxe/rxe_net.c index 6a25b514d448..c05549d09268 100644 --- a/drivers/infiniband/sw/rxe/rxe_net.c +++ b/drivers/infiniband/sw/rxe/rxe_net.c @@ -585,6 +585,14 @@ void rxe_set_port_state(struct net_device *ndev, struct rxe_dev *rxe) rxe_port_down(rxe); } +void rxe_netdev_unregister_work(struct work_struct *work) +{ + struct rxe_dev *rxe; + + rxe = container_of(work, struct rxe_dev, netdev_unregister_work); + ib_unregister_device_and_put(&rxe->ib_dev); +} + static int rxe_notify(struct notifier_block *not_blk, unsigned long event, void *arg) @@ -597,7 +605,7 @@ static int rxe_notify(struct notifier_block *not_blk, switch (event) { case NETDEV_UNREGISTER: - ib_unregister_device_and_put(&rxe->ib_dev); + schedule_work(&rxe->netdev_unregister_work); return NOTIFY_OK; case NETDEV_UP: diff --git a/drivers/infiniband/sw/rxe/rxe_net.h b/drivers/infiniband/sw/rxe/rxe_net.h index 106c586dbb26..685aa98baf55 100644 --- a/drivers/infiniband/sw/rxe/rxe_net.h +++ b/drivers/infiniband/sw/rxe/rxe_net.h @@ -47,5 +47,6 @@ struct rxe_recv_sockets { int rxe_net_init(void); void rxe_net_exit(void); +void rxe_netdev_unregister_work(struct work_struct *work); #endif /* RXE_NET_H */ diff --git a/drivers/infiniband/sw/rxe/rxe_verbs.h b/drivers/infiniband/sw/rxe/rxe_verbs.h index a0b635812784..ce63cbf0d738 100644 --- a/drivers/infiniband/sw/rxe/rxe_verbs.h +++ b/drivers/infiniband/sw/rxe/rxe_verbs.h @@ -412,6 +412,7 @@ struct rxe_dev { struct rxe_port port; struct crypto_shash *tfm; + struct work_struct netdev_unregister_work; }; static inline void rxe_counter_inc(struct rxe_dev *rxe, enum rxe_counters cnt) From patchwork Fri Dec 14 16:05:55 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Steve Wise X-Patchwork-Id: 10731827 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 646C8924 for ; Fri, 14 Dec 2018 23:37:46 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 0AFFF2AD56 for ; Fri, 14 Dec 2018 23:37:46 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id F30332D562; Fri, 14 Dec 2018 23:37:45 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-6.4 required=2.0 tests=BAYES_00,DATE_IN_PAST_06_12, MAILING_LIST_MULTI,RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 5EC0E2AD56 for ; Fri, 14 Dec 2018 23:37:45 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728415AbeLNXhp (ORCPT ); Fri, 14 Dec 2018 18:37:45 -0500 Received: from opengridcomputing.com ([72.48.214.68]:47392 "EHLO smtp.opengridcomputing.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726423AbeLNXhp (ORCPT ); Fri, 14 Dec 2018 18:37:45 -0500 Received: by smtp.opengridcomputing.com (Postfix, from userid 503) id 399D4227A9; Fri, 14 Dec 2018 17:37:44 -0600 (CST) Message-Id: <0f008a91c3221782886828e8fc4db1b5f928a7c5.1544826079.git.swise@opengridcomputing.com> In-Reply-To: References: From: Steve Wise Date: Fri, 14 Dec 2018 08:05:55 -0800 Subject: [PATCH v7 4/5] RDMA/Core: add RDMA_NLDEV_CMD_NEWLINK/DELLINK support To: dledford@redhat.com, jgg@mellanox.com Cc: linux-rdma@vger.kernel.org, BMT@zurich.ibm.com, leon@kernel.org, markb@mellanox.com, yanjun.zhu@oracle.com Sender: linux-rdma-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-rdma@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP Add support for new LINK messages to allow adding and deleting rdma interfaces. This will be used initially for soft rdma drivers which instantiate device instances dynamically by the admin specifying a netdev device to use. The rdma_rxe module will be the first user of these messages. The design is modeled after RTNL_NEWLINK/DELLINK: rdma drivers register with the rdma core if they provide link add/delete functions. Each driver registers with a unique "type" string, that is used to dispatch messages coming from user space. A new RDMA_NLDEV_ATTR is defined for the "type" string. User mode will pass 3 attributes in a NEWLINK message: RDMA_NLDEV_ATTR_DEV_NAME for the desired rdma device name to be created, RDMA_NLDEV_ATTR_LINK_TYPE for the "type" of link being added, and RDMA_NLDEV_ATTR_NDEV_NAME for the net_device interface to use for this link. The DELLINK message will contain the RDMA_NLDEV_ATTR_DEV_INDEX of the device to delete. Signed-off-by: Steve Wise Reviewed-by: Leon Romanovsky Reviewed-by: Michael J. Ruhl --- drivers/infiniband/core/nldev.c | 122 +++++++++++++++++++++++++++++++++++++++ include/rdma/ib_verbs.h | 3 + include/rdma/rdma_netlink.h | 11 ++++ include/uapi/rdma/rdma_netlink.h | 11 +++- 4 files changed, 145 insertions(+), 2 deletions(-) diff --git a/drivers/infiniband/core/nldev.c b/drivers/infiniband/core/nldev.c index 9abbadb9e366..bd22f48e3ed5 100644 --- a/drivers/infiniband/core/nldev.c +++ b/drivers/infiniband/core/nldev.c @@ -33,6 +33,7 @@ #include #include #include +#include #include #include #include @@ -107,6 +108,8 @@ [RDMA_NLDEV_ATTR_DRIVER_U32] = { .type = NLA_U32 }, [RDMA_NLDEV_ATTR_DRIVER_S64] = { .type = NLA_S64 }, [RDMA_NLDEV_ATTR_DRIVER_U64] = { .type = NLA_U64 }, + [RDMA_NLDEV_ATTR_LINK_TYPE] = { .type = NLA_NUL_STRING, + .len = RDMA_NLDEV_ATTR_ENTRY_STRLEN }, }; static int put_driver_name_print_type(struct sk_buff *msg, const char *name, @@ -1104,6 +1107,117 @@ static int nldev_res_get_pd_dumpit(struct sk_buff *skb, return res_get_common_dumpit(skb, cb, RDMA_RESTRACK_PD); } +static LIST_HEAD(link_ops); +static DECLARE_RWSEM(link_ops_rwsem); + +static const struct rdma_link_ops *link_ops_get(const char *type) +{ + const struct rdma_link_ops *ops; + + list_for_each_entry(ops, &link_ops, list) { + if (!strcmp(ops->type, type)) + goto out; + } + ops = NULL; +out: + return ops; +} + +void rdma_link_register(struct rdma_link_ops *ops) +{ + down_write(&link_ops_rwsem); + if (link_ops_get(ops->type)) { + WARN_ONCE("Duplicate rdma_link_ops! %s\n", ops->type); + goto out; + } + list_add(&ops->list, &link_ops); +out: + up_write(&link_ops_rwsem); +} +EXPORT_SYMBOL(rdma_link_register); + +void rdma_link_unregister(struct rdma_link_ops *ops) +{ + down_write(&link_ops_rwsem); + list_del(&ops->list); + up_write(&link_ops_rwsem); +} +EXPORT_SYMBOL(rdma_link_unregister); + +static int nldev_newlink(struct sk_buff *skb, struct nlmsghdr *nlh, + struct netlink_ext_ack *extack) +{ + struct nlattr *tb[RDMA_NLDEV_ATTR_MAX]; + char ibdev_name[IB_DEVICE_NAME_MAX]; + const struct rdma_link_ops *ops; + char ndev_name[IFNAMSIZ]; + struct net_device *ndev; + char type[IFNAMSIZ]; + int err; + + err = nlmsg_parse(nlh, 0, tb, RDMA_NLDEV_ATTR_MAX - 1, + nldev_policy, extack); + if (err || !tb[RDMA_NLDEV_ATTR_DEV_NAME] || + !tb[RDMA_NLDEV_ATTR_LINK_TYPE] || !tb[RDMA_NLDEV_ATTR_NDEV_NAME]) + return -EINVAL; + + nla_strlcpy(ibdev_name, tb[RDMA_NLDEV_ATTR_DEV_NAME], + sizeof(ibdev_name)); + if (strchr(ibdev_name, '%')) + return -EINVAL; + + nla_strlcpy(type, tb[RDMA_NLDEV_ATTR_LINK_TYPE], sizeof(type)); + nla_strlcpy(ndev_name, tb[RDMA_NLDEV_ATTR_NDEV_NAME], + sizeof(ndev_name)); + + ndev = dev_get_by_name(&init_net, ndev_name); + if (!ndev) + return -ENODEV; + + down_read(&link_ops_rwsem); + ops = link_ops_get(type); +#ifdef CONFIG_MODULES + if (!ops) { + up_read(&link_ops_rwsem); + request_module("rdma-link-%s", type); + down_read(&link_ops_rwsem); + ops = link_ops_get(type); + } +#endif + err = ops ? ops->newlink(ibdev_name, ndev) : -EINVAL; + up_read(&link_ops_rwsem); + dev_put(ndev); + + return err; +} + +static int nldev_dellink(struct sk_buff *skb, struct nlmsghdr *nlh, + struct netlink_ext_ack *extack) +{ + struct nlattr *tb[RDMA_NLDEV_ATTR_MAX]; + struct ib_device *device; + u32 index; + int err; + + err = nlmsg_parse(nlh, 0, tb, RDMA_NLDEV_ATTR_MAX - 1, + nldev_policy, extack); + if (err || !tb[RDMA_NLDEV_ATTR_DEV_INDEX]) + return -EINVAL; + + index = nla_get_u32(tb[RDMA_NLDEV_ATTR_DEV_INDEX]); + device = ib_device_get_by_index(index); + if (!device) + return -EINVAL; + + if (!(device->attrs.device_cap_flags & IB_DEVICE_ALLOW_USER_UNREG)) { + ib_device_put(device); + return -EINVAL; + } + + ib_unregister_device_and_put(device); + return 0; +} + static const struct rdma_nl_cbs nldev_cb_table[RDMA_NLDEV_NUM_OPS] = { [RDMA_NLDEV_CMD_GET] = { .doit = nldev_get_doit, @@ -1113,6 +1227,14 @@ static int nldev_res_get_pd_dumpit(struct sk_buff *skb, .doit = nldev_set_doit, .flags = RDMA_NL_ADMIN_PERM, }, + [RDMA_NLDEV_CMD_NEWLINK] = { + .doit = nldev_newlink, + .flags = RDMA_NL_ADMIN_PERM, + }, + [RDMA_NLDEV_CMD_DELLINK] = { + .doit = nldev_dellink, + .flags = RDMA_NL_ADMIN_PERM, + }, [RDMA_NLDEV_CMD_PORT_GET] = { .doit = nldev_port_get_doit, .dump = nldev_port_get_dumpit, diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h index 5fbeedacb3aa..5a4833408476 100644 --- a/include/rdma/ib_verbs.h +++ b/include/rdma/ib_verbs.h @@ -238,6 +238,7 @@ enum ib_device_cap_flags { IB_DEVICE_RDMA_NETDEV_OPA_VNIC = (1ULL << 35), /* The device supports padding incoming writes to cacheline. */ IB_DEVICE_PCI_WRITE_END_PADDING = (1ULL << 36), + IB_DEVICE_ALLOW_USER_UNREG = (1ULL << 37), }; enum ib_signature_prot_cap { @@ -2615,6 +2616,8 @@ struct ib_device { */ refcount_t refcount; struct completion unreg_completion; + + const struct rdma_link_ops *link_ops; }; struct ib_client { diff --git a/include/rdma/rdma_netlink.h b/include/rdma/rdma_netlink.h index 70218e6b5187..10732ab31ba2 100644 --- a/include/rdma/rdma_netlink.h +++ b/include/rdma/rdma_netlink.h @@ -99,4 +99,15 @@ int ibnl_put_attr(struct sk_buff *skb, struct nlmsghdr *nlh, * Returns true on success or false if no listeners. */ bool rdma_nl_chk_listeners(unsigned int group); + +struct rdma_link_ops { + struct list_head list; + const char *type; + int (*newlink)(const char *ibdev_name, struct net_device *ndev); +}; + +void rdma_link_register(struct rdma_link_ops *ops); +void rdma_link_unregister(struct rdma_link_ops *ops); + +#define MODULE_ALIAS_RDMA_LINK(type) MODULE_ALIAS("rdma-link-" type) #endif /* _RDMA_NETLINK_H */ diff --git a/include/uapi/rdma/rdma_netlink.h b/include/uapi/rdma/rdma_netlink.h index f9c41bf59efc..dfdfc2b608b8 100644 --- a/include/uapi/rdma/rdma_netlink.h +++ b/include/uapi/rdma/rdma_netlink.h @@ -229,9 +229,11 @@ enum rdma_nldev_command { RDMA_NLDEV_CMD_GET, /* can dump */ RDMA_NLDEV_CMD_SET, - /* 3 - 4 are free to use */ + RDMA_NLDEV_CMD_NEWLINK, - RDMA_NLDEV_CMD_PORT_GET = 5, /* can dump */ + RDMA_NLDEV_CMD_DELLINK, + + RDMA_NLDEV_CMD_PORT_GET, /* can dump */ /* 6 - 8 are free to use */ @@ -428,6 +430,11 @@ enum rdma_nldev_attr { RDMA_NLDEV_ATTR_DRIVER_U64, /* u64 */ /* + * Identifies the rdma driver. eg: "rxe" or "siw" + */ + RDMA_NLDEV_ATTR_LINK_TYPE, /* string */ + + /* * Always the end */ RDMA_NLDEV_ATTR_MAX From patchwork Fri Dec 14 16:05:57 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Steve Wise X-Patchwork-Id: 10731829 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 84B0F13AD for ; Fri, 14 Dec 2018 23:37:51 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 2AFCF2AD56 for ; Fri, 14 Dec 2018 23:37:51 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 1D6CD2D562; Fri, 14 Dec 2018 23:37:51 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-6.4 required=2.0 tests=BAYES_00,DATE_IN_PAST_06_12, MAILING_LIST_MULTI,RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 7EC3F2AD56 for ; Fri, 14 Dec 2018 23:37:50 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728567AbeLNXhu (ORCPT ); Fri, 14 Dec 2018 18:37:50 -0500 Received: from opengridcomputing.com ([72.48.214.68]:47420 "EHLO smtp.opengridcomputing.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726423AbeLNXhu (ORCPT ); Fri, 14 Dec 2018 18:37:50 -0500 Received: by smtp.opengridcomputing.com (Postfix, from userid 503) id 43B48227A9; Fri, 14 Dec 2018 17:37:49 -0600 (CST) Message-Id: <454cee5c5751111e0705510d773b71be5cdcb751.1544826079.git.swise@opengridcomputing.com> In-Reply-To: References: From: Steve Wise Date: Fri, 14 Dec 2018 08:05:57 -0800 Subject: [PATCH v7 5/5] rdma_rxe: use netlink messages to add/delete links To: dledford@redhat.com, jgg@mellanox.com Cc: linux-rdma@vger.kernel.org, BMT@zurich.ibm.com, leon@kernel.org, markb@mellanox.com, yanjun.zhu@oracle.com Sender: linux-rdma-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-rdma@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP Add support for the RDMA_NLDEV_CMD_NEWLINK/DELLINK messages which allow dynamically adding new RXE links. Deprecate the old module options for now. Cc: Moni Shoua Reviewed-by: Yanjun Zhu Signed-off-by: Steve Wise --- drivers/infiniband/sw/rxe/rxe.c | 39 +++++++++++++++++++++++++++++++++-- drivers/infiniband/sw/rxe/rxe.h | 2 +- drivers/infiniband/sw/rxe/rxe_net.c | 4 ++-- drivers/infiniband/sw/rxe/rxe_net.h | 2 +- drivers/infiniband/sw/rxe/rxe_param.h | 3 ++- drivers/infiniband/sw/rxe/rxe_sysfs.c | 6 +++--- drivers/infiniband/sw/rxe/rxe_verbs.c | 4 ++-- drivers/infiniband/sw/rxe/rxe_verbs.h | 2 +- 8 files changed, 49 insertions(+), 13 deletions(-) diff --git a/drivers/infiniband/sw/rxe/rxe.c b/drivers/infiniband/sw/rxe/rxe.c index 91ad339fd6e1..07fea762f0ea 100644 --- a/drivers/infiniband/sw/rxe/rxe.c +++ b/drivers/infiniband/sw/rxe/rxe.c @@ -31,6 +31,7 @@ * SOFTWARE. */ +#include #include #include "rxe.h" #include "rxe_loc.h" @@ -301,7 +302,7 @@ void rxe_set_mtu(struct rxe_dev *rxe, unsigned int ndev_mtu) /* called by ifc layer to create new rxe device. * The caller should allocate memory for rxe by calling ib_alloc_device. */ -int rxe_add(struct rxe_dev *rxe, unsigned int mtu) +int rxe_add(struct rxe_dev *rxe, unsigned int mtu, const char *ibdev_name) { int err; @@ -311,9 +312,39 @@ int rxe_add(struct rxe_dev *rxe, unsigned int mtu) rxe_set_mtu(rxe, mtu); - return rxe_register_device(rxe); + return rxe_register_device(rxe, ibdev_name); } +static int rxe_newlink(const char *ibdev_name, struct net_device *ndev) +{ + struct rxe_dev *rxe = rxe_get_dev_from_net(ndev); + int err = 0; + + if (rxe) { + pr_err("already configured on %s\n", ndev->name); + err = -EEXIST; + ib_device_put(&rxe->ib_dev); + goto err; + } + + rxe = rxe_net_add(ibdev_name, ndev); + if (!rxe) { + pr_err("failed to add %s\n", ndev->name); + err = -EINVAL; + goto err; + } + + rxe_set_port_state(ndev, rxe); + pr_info("added %s to %s\n", rxe->ib_dev.name, ndev->name); +err: + return err; +} + +static struct rdma_link_ops rxe_link_ops = { + .type = "rxe", + .newlink = rxe_newlink, +}; + static int __init rxe_module_init(void) { int err; @@ -329,12 +360,14 @@ static int __init rxe_module_init(void) if (err) return err; + rdma_link_register(&rxe_link_ops); pr_info("loaded\n"); return 0; } static void __exit rxe_module_exit(void) { + rdma_link_unregister(&rxe_link_ops); ib_unregister_driver(RDMA_DRIVER_RXE); rxe_net_exit(); rxe_cache_exit(); @@ -344,3 +377,5 @@ static void __exit rxe_module_exit(void) late_initcall(rxe_module_init); module_exit(rxe_module_exit); + +MODULE_ALIAS_RDMA_LINK("rxe"); diff --git a/drivers/infiniband/sw/rxe/rxe.h b/drivers/infiniband/sw/rxe/rxe.h index 4ff33d921865..d1861b2067e5 100644 --- a/drivers/infiniband/sw/rxe/rxe.h +++ b/drivers/infiniband/sw/rxe/rxe.h @@ -95,7 +95,7 @@ static inline u32 rxe_crc32(struct rxe_dev *rxe, void rxe_set_mtu(struct rxe_dev *rxe, unsigned int dev_mtu); -int rxe_add(struct rxe_dev *rxe, unsigned int mtu); +int rxe_add(struct rxe_dev *rxe, unsigned int mtu, const char *ibdev_name); void rxe_rcv(struct sk_buff *skb); diff --git a/drivers/infiniband/sw/rxe/rxe_net.c b/drivers/infiniband/sw/rxe/rxe_net.c index c05549d09268..d65c88ff949d 100644 --- a/drivers/infiniband/sw/rxe/rxe_net.c +++ b/drivers/infiniband/sw/rxe/rxe_net.c @@ -520,7 +520,7 @@ enum rdma_link_layer rxe_link_layer(struct rxe_dev *rxe, unsigned int port_num) return IB_LINK_LAYER_ETHERNET; } -struct rxe_dev *rxe_net_add(struct net_device *ndev) +struct rxe_dev *rxe_net_add(const char *ibdev_name, struct net_device *ndev) { int err; struct rxe_dev *rxe = NULL; @@ -531,7 +531,7 @@ struct rxe_dev *rxe_net_add(struct net_device *ndev) rxe->ib_dev.driver_unregister = rxe_unregistered; rxe->ndev = ndev; - err = rxe_add(rxe, ndev->mtu); + err = rxe_add(rxe, ndev->mtu, ibdev_name); if (err) { rxe_unregistered(&rxe->ib_dev); return NULL; diff --git a/drivers/infiniband/sw/rxe/rxe_net.h b/drivers/infiniband/sw/rxe/rxe_net.h index 685aa98baf55..4f87be455b5e 100644 --- a/drivers/infiniband/sw/rxe/rxe_net.h +++ b/drivers/infiniband/sw/rxe/rxe_net.h @@ -43,7 +43,7 @@ struct rxe_recv_sockets { struct socket *sk6; }; -struct rxe_dev *rxe_net_add(struct net_device *ndev); +struct rxe_dev *rxe_net_add(const char *ibdev_name, struct net_device *ndev); int rxe_net_init(void); void rxe_net_exit(void); diff --git a/drivers/infiniband/sw/rxe/rxe_param.h b/drivers/infiniband/sw/rxe/rxe_param.h index bdea899a58ac..1abed47ca221 100644 --- a/drivers/infiniband/sw/rxe/rxe_param.h +++ b/drivers/infiniband/sw/rxe/rxe_param.h @@ -78,7 +78,8 @@ enum rxe_device_param { | IB_DEVICE_SYS_IMAGE_GUID | IB_DEVICE_RC_RNR_NAK_GEN | IB_DEVICE_SRQ_RESIZE - | IB_DEVICE_MEM_MGT_EXTENSIONS, + | IB_DEVICE_MEM_MGT_EXTENSIONS + | IB_DEVICE_ALLOW_USER_UNREG, RXE_MAX_SGE = 32, RXE_MAX_SGE_RD = 32, RXE_MAX_CQ = 16384, diff --git a/drivers/infiniband/sw/rxe/rxe_sysfs.c b/drivers/infiniband/sw/rxe/rxe_sysfs.c index de9f51056f52..4dbd004956c8 100644 --- a/drivers/infiniband/sw/rxe/rxe_sysfs.c +++ b/drivers/infiniband/sw/rxe/rxe_sysfs.c @@ -82,7 +82,7 @@ static int rxe_param_set_add(const char *val, const struct kernel_param *kp) goto err; } - rxe = rxe_net_add(ndev); + rxe = rxe_net_add("rxe%d", ndev); if (!rxe) { pr_err("failed to add %s\n", intf); err = -EINVAL; @@ -135,6 +135,6 @@ static int rxe_param_set_remove(const char *val, const struct kernel_param *kp) }; module_param_cb(add, &rxe_add_ops, NULL, 0200); -MODULE_PARM_DESC(add, "Create RXE device over network interface"); +MODULE_PARM_DESC(add, "DEPRECATED. Create RXE device over network interface"); module_param_cb(remove, &rxe_remove_ops, NULL, 0200); -MODULE_PARM_DESC(remove, "Remove RXE device over network interface"); +MODULE_PARM_DESC(remove, "DEPRECATED. Remove RXE device over network interface"); diff --git a/drivers/infiniband/sw/rxe/rxe_verbs.c b/drivers/infiniband/sw/rxe/rxe_verbs.c index 28beee8f5838..05ed6d8f46ea 100644 --- a/drivers/infiniband/sw/rxe/rxe_verbs.c +++ b/drivers/infiniband/sw/rxe/rxe_verbs.c @@ -1165,7 +1165,7 @@ static ssize_t parent_show(struct device *device, .attrs = rxe_dev_attributes, }; -int rxe_register_device(struct rxe_dev *rxe) +int rxe_register_device(struct rxe_dev *rxe, const char *ibdev_name) { int err; struct ib_device *dev = &rxe->ib_dev; @@ -1273,7 +1273,7 @@ int rxe_register_device(struct rxe_dev *rxe) rdma_set_device_sysfs_group(dev, &rxe_attr_group); dev->driver_id = RDMA_DRIVER_RXE; - err = ib_register_device(dev, "rxe%d", NULL); + err = ib_register_device(dev, ibdev_name, NULL); if (err) { pr_warn("%s failed with error %d\n", __func__, err); goto err1; diff --git a/drivers/infiniband/sw/rxe/rxe_verbs.h b/drivers/infiniband/sw/rxe/rxe_verbs.h index ce63cbf0d738..cae04e016d83 100644 --- a/drivers/infiniband/sw/rxe/rxe_verbs.h +++ b/drivers/infiniband/sw/rxe/rxe_verbs.h @@ -465,7 +465,7 @@ static inline struct rxe_mem *to_rmw(struct ib_mw *mw) return mw ? container_of(mw, struct rxe_mem, ibmw) : NULL; } -int rxe_register_device(struct rxe_dev *rxe); +int rxe_register_device(struct rxe_dev *rxe, const char *ibdev_name); void rxe_mc_cleanup(struct rxe_pool_entry *arg);