From patchwork Wed Sep 27 09:32:48 2017 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yuval Shaia X-Patchwork-Id: 9973611 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork.web.codeaurora.org (Postfix) with ESMTP id E340260375 for ; Wed, 27 Sep 2017 09:33:29 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id CB8C028384 for ; Wed, 27 Sep 2017 09:33:29 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id C036B29157; Wed, 27 Sep 2017 09:33:29 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-6.9 required=2.0 tests=BAYES_00, RCVD_IN_DNSWL_HI, UNPARSEABLE_RELAY autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id B686728384 for ; Wed, 27 Sep 2017 09:33:28 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752350AbdI0Jd1 (ORCPT ); Wed, 27 Sep 2017 05:33:27 -0400 Received: from userp1040.oracle.com ([156.151.31.81]:19441 "EHLO userp1040.oracle.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1750703AbdI0JdX (ORCPT ); Wed, 27 Sep 2017 05:33:23 -0400 Received: from userv0021.oracle.com (userv0021.oracle.com [156.151.31.71]) by userp1040.oracle.com (Sentrion-MTA-4.3.2/Sentrion-MTA-4.3.2) with ESMTP id v8R9X6CZ031105 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Wed, 27 Sep 2017 09:33:06 GMT Received: from aserv0121.oracle.com (aserv0121.oracle.com [141.146.126.235]) by userv0021.oracle.com (8.14.4/8.14.4) with ESMTP id v8R9X51x015009 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Wed, 27 Sep 2017 09:33:05 GMT Received: from abhmp0009.oracle.com (abhmp0009.oracle.com [141.146.116.15]) by aserv0121.oracle.com (8.14.4/8.13.8) with ESMTP id v8R9X2FN023741; Wed, 27 Sep 2017 09:33:02 GMT Received: from yuvallap.uk.oracle.com (/10.175.215.146) by default (Oracle Beehive Gateway v4.0) with ESMTP ; Wed, 27 Sep 2017 02:33:02 -0700 From: Yuval Shaia To: dledford@redhat.com, sean.hefty@intel.com, hal.rosenstock@gmail.com, corbet@lwn.net, leon@kernel.org, valex@mellanox.com, erezsh@mellanox.com, yuval.shaia@oracle.com, dasaratharaman.chandramouli@intel.com, yanjun.zhu@oracle.com, pabeni@redhat.com, kernel@kyup.com, ferasda@mellanox.com, shamir.rabinovitch@oracle.com, mingo@kernel.org, linux-rdma@vger.kernel.org, mukesh.kacker@oracle.com, chien.yen@oracle.com Subject: [PATCH] IB/ipoib: Enable pkey and device name decoupling Date: Wed, 27 Sep 2017 12:32:48 +0300 Message-Id: <20170927093248.3819-1-yuval.shaia@oracle.com> X-Mailer: git-send-email 2.13.5 X-Source-IP: userv0021.oracle.com [156.151.31.71] Sender: linux-rdma-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-rdma@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP The sysfs "create_child" interface creates pkey based child interface but derives the name from parent device name and pkey value. This makes administration difficult where pkey values can change but policies encoded with device names do not. We add ability to create a child interface with a user specified name and a specified pkey with a new sysfs "create_named_child" interface (and also add a corresponding "delete_named_child" interface). We also add a new module api interface to query pkey from a netdevice so any kernel users of pkey based child interfaces can query it - since with device name decoupled from pkey, it can no longer be deduced from parsing the device name by other kernel users. Signed-off-by: Mukesh Kacker Reviewed-by: Yuval Shaia Reviewed-by: Chien-Hua Yen Signed-off-by: Yuval Shaia --- Documentation/infiniband/ipoib.txt | 12 ++ drivers/infiniband/ulp/ipoib/ipoib.h | 3 + drivers/infiniband/ulp/ipoib/ipoib_main.c | 187 ++++++++++++++++++++++++++++++ drivers/infiniband/ulp/ipoib/ipoib_vlan.c | 76 +++++++++++- 4 files changed, 272 insertions(+), 6 deletions(-) diff --git a/Documentation/infiniband/ipoib.txt b/Documentation/infiniband/ipoib.txt index 47c1dd9818f2..1db53c9b2906 100644 --- a/Documentation/infiniband/ipoib.txt +++ b/Documentation/infiniband/ipoib.txt @@ -21,6 +21,18 @@ Partitions and P_Keys echo 0x8001 > /sys/class/net/ib0/delete_child + Interfaces with a user chosen name can be created in a similar + manner with a different name and P_Key, by writing them into the + main interface's /sys/class/net//create_named_child + For example: + echo "epart2 0x8002" > /sys/class/net/ib1/create_named_child + + This will create an interfaces named epart2 with P_Key 0x8002 and + parent ib1. To remove a named subinterface, use the + "delete_named_child" file: + + echo epart2 > /sys/class/net/ib1/delete_named_child + The P_Key for any interface is given by the "pkey" file, and the main interface for a subinterface is in "parent." diff --git a/drivers/infiniband/ulp/ipoib/ipoib.h b/drivers/infiniband/ulp/ipoib/ipoib.h index 4a5c7a07a631..9d0010f9b324 100644 --- a/drivers/infiniband/ulp/ipoib/ipoib.h +++ b/drivers/infiniband/ulp/ipoib/ipoib.h @@ -589,6 +589,9 @@ void ipoib_event(struct ib_event_handler *handler, int ipoib_vlan_add(struct net_device *pdev, unsigned short pkey); int ipoib_vlan_delete(struct net_device *pdev, unsigned short pkey); +int ipoib_named_vlan_add(struct net_device *pdev, unsigned short pkey, + char *child_name_buf); +int ipoib_named_vlan_delete(struct net_device *pdev, char *child_name_buf); int __ipoib_vlan_add(struct ipoib_dev_priv *ppriv, struct ipoib_dev_priv *priv, u16 pkey, int child_type); diff --git a/drivers/infiniband/ulp/ipoib/ipoib_main.c b/drivers/infiniband/ulp/ipoib/ipoib_main.c index bac95b509a9b..2bdd4055d69f 100644 --- a/drivers/infiniband/ulp/ipoib/ipoib_main.c +++ b/drivers/infiniband/ulp/ipoib/ipoib_main.c @@ -34,6 +34,7 @@ #include "ipoib.h" +#include #include #include @@ -136,6 +137,13 @@ static int ipoib_netdev_event(struct notifier_block *this, } #endif +/* + * PKEY_HEXSTRING_MAXWIDTH - number of hex + * digits needed to represent max width of + * pkey value. + */ +#define PKEY_HEXSTRING_MAXWIDTH 4 + int ipoib_open(struct net_device *dev) { struct ipoib_dev_priv *priv = ipoib_priv(dev); @@ -2111,6 +2119,121 @@ static int ipoib_set_mac(struct net_device *dev, void *addr) return 0; } +/* + * Check if a buffer has name of the format + * + * .<4hexcharacters> + * e.g. ib1.8004 etc. + * + * Such names are generated by create_child() by + * concatenating parent device with 16-bit pkey + * in hex, and disallowed from usage with + * create_named_child() interface. + * + */ +static bool ipoib_disallowed_named_child_namespace(const char *buf) +{ + char localbuf[IFNAMSIZ]; + char *dotp = NULL; + char *buf_before_dot = NULL; + char *buf_after_dot = NULL; + unsigned int ii; + + memcpy(localbuf, buf, IFNAMSIZ); + localbuf[IFNAMSIZ-1] = '\0'; /* paranoia! */ + + dotp = strnchr(localbuf, IFNAMSIZ, '.'); + /* no dot or dot at end! */ + if (dotp == NULL || dotp == localbuf+IFNAMSIZ-2) + return false; + + *dotp = '\0'; /* split buffer at "dot" */ + buf_before_dot = localbuf; + buf_after_dot = dotp + 1; + + /* + * Check if buf_after_dot is hexstring of width + * that could be a pkey! + */ + if (strlen(buf_after_dot) != PKEY_HEXSTRING_MAXWIDTH) + return false; + + for (ii = 0; ii < PKEY_HEXSTRING_MAXWIDTH; ii++) { + if (!isxdigit(buf_after_dot[ii])) + return false; + } + + /* + * (1) buf_after_dot check above makes it valid hexdigit .XXXX format + * + * Now verify if buf_before_dot is a valid net device name - + * (if it is not, then we are not in disallowed namespace) + */ + if (__dev_get_by_name(&init_net, buf_before_dot) == NULL) + return false; + + /* + * (2) buf_before_dot is valid net device name + * - reserved namespace is being used! + * + * Note: No check on netdev->type to be ARPHRD_INFINIBAND etc + * We implicitly treat even misleading names such as eth1.XXXX + * (ethernet device prefix) for child interface name of an + * infiniband device as intrusion of reserved namespace! + */ + return true; +} + +static int parse_named_child(struct device *dev, const char *buf, + char *child_name_buf, int *pkeyp) +{ + int ret; + struct ipoib_dev_priv *priv = ipoib_priv(to_net_dev(dev)); + + if (pkeyp) + *pkeyp = -1; + + /* + * First parameter is child interface name, after that + * 'pkey' is required if we were passed a pkey buffer + * (Note: From create_named_child, we are passed a pkey + * buffer to parse input, from delete_named_child we are + * not!) + * Note: IFNAMSIZ is 16, allowing for tail null + * we only scan 15 characters for name. + */ + if (pkeyp) { + ret = sscanf(buf, "%15s %i", child_name_buf, pkeyp); + if (ret != 2) + return -EINVAL; + } else { + ret = sscanf(buf, "%15s", child_name_buf); + if (ret != 1) + return -EINVAL; + } + + if (strlen(child_name_buf) <= 0 || !dev_valid_name(child_name_buf)) + return -EINVAL; + + if (pkeyp && (*pkeyp <= 0 || *pkeyp > 0xffff || *pkeyp == 0x8000)) + return -EINVAL; + + if (ipoib_disallowed_named_child_namespace(child_name_buf)) { + pr_warn("child name %s not allowed to be used with create_named_child as it uses .XXXX format reserved for create_child/delete_child interfaces!\n", + child_name_buf); + return -EINVAL; + } + + if (pkeyp) + ipoib_dbg(priv, "%s inp %s out child_name_buf %s, pkey %04x\n", + __func__, buf, child_name_buf, *pkeyp); + else + ipoib_dbg(priv, "%s inp %s out child_name_buf %s\n", __func__, + buf, child_name_buf); + return 0; +} + + static ssize_t create_child(struct device *dev, struct device_attribute *attr, const char *buf, size_t count) @@ -2156,6 +2279,44 @@ static ssize_t delete_child(struct device *dev, } static DEVICE_ATTR(delete_child, S_IWUSR, NULL, delete_child); +static ssize_t create_named_child(struct device *dev, + struct device_attribute *attr, + const char *buf, size_t count) +{ + int pkey; + char child_name[IFNAMSIZ]; + int ret = 0; + + child_name[0] = '\0'; + + if (parse_named_child(dev, buf, child_name, &pkey)) + return -EINVAL; + + ret = ipoib_named_vlan_add(to_net_dev(dev), pkey, child_name); + return ret ? ret : count; +} +static DEVICE_ATTR(create_named_child, S_IWUSR, NULL, create_named_child); + +static ssize_t delete_named_child(struct device *dev, + struct device_attribute *attr, + const char *buf, size_t count) +{ + char child_name[IFNAMSIZ]; + int ret = 0; + + child_name[0] = '\0'; + + if (parse_named_child(dev, buf, child_name, NULL)) + return -EINVAL; + + ret = ipoib_named_vlan_delete(to_net_dev(dev), child_name); + + return ret ? ret : count; + +} +static DEVICE_ATTR(delete_named_child, S_IWUSR, NULL, delete_named_child); + + int ipoib_add_pkey_attr(struct net_device *dev) { return device_create_file(&dev->dev, &dev_attr_pkey); @@ -2263,6 +2424,11 @@ static struct net_device *ipoib_add_port(const char *format, goto sysfs_failed; if (device_create_file(&priv->dev->dev, &dev_attr_delete_child)) goto sysfs_failed; + if (device_create_file(&priv->dev->dev, &dev_attr_create_named_child)) + goto sysfs_failed; + if (device_create_file(&priv->dev->dev, &dev_attr_delete_named_child)) + goto sysfs_failed; + return priv->dev; @@ -2367,6 +2533,27 @@ static struct notifier_block ipoib_netdev_notifier = { }; #endif +int +ipoib_get_netdev_pkey(struct net_device *dev, u16 *pkey) +{ + struct ipoib_dev_priv *priv; + + if (dev->type != ARPHRD_INFINIBAND) + return -EINVAL; + + /* only for ipoib net devices! */ + if ((dev->netdev_ops != &ipoib_netdev_ops_pf) && + (dev->netdev_ops != &ipoib_netdev_ops_vf)) + return -EINVAL; + + priv = ipoib_priv(dev); + + *pkey = priv->pkey; + + return 0; +} +EXPORT_SYMBOL(ipoib_get_netdev_pkey); + static int __init ipoib_init_module(void) { int ret; diff --git a/drivers/infiniband/ulp/ipoib/ipoib_vlan.c b/drivers/infiniband/ulp/ipoib/ipoib_vlan.c index 9927cd6b7082..f5ae55f4f845 100644 --- a/drivers/infiniband/ulp/ipoib/ipoib_vlan.c +++ b/drivers/infiniband/ulp/ipoib/ipoib_vlan.c @@ -115,7 +115,9 @@ int __ipoib_vlan_add(struct ipoib_dev_priv *ppriv, struct ipoib_dev_priv *priv, return result; } -int ipoib_vlan_add(struct net_device *pdev, unsigned short pkey) +int ipoib_vlan_add_common(struct net_device *pdev, + unsigned short pkey, + char *child_name_buf) { struct ipoib_dev_priv *ppriv, *priv; char intf_name[IFNAMSIZ]; @@ -130,8 +132,21 @@ int ipoib_vlan_add(struct net_device *pdev, unsigned short pkey) if (test_bit(IPOIB_FLAG_GOING_DOWN, &ppriv->flags)) return -EPERM; - snprintf(intf_name, sizeof intf_name, "%s.%04x", - ppriv->dev->name, pkey); + if (child_name_buf == NULL) { + /* + * If child name is not provided, we generated + * one using name of parent and pkey. + */ + snprintf(intf_name, sizeof(intf_name), "%s.%04x", + ppriv->dev->name, pkey); + } else { + /* + * Note: Duplicate intf_name will be detected later in the code + * by register_netdevice() (inside __ipoib_vlan_add() call + * below) returning EEXIST! + */ + strncpy(intf_name, child_name_buf, IFNAMSIZ); + } if (!mutex_trylock(&ppriv->sysfs_mutex)) return restart_syscall(); @@ -183,10 +198,27 @@ int ipoib_vlan_add(struct net_device *pdev, unsigned short pkey) return result; } -int ipoib_vlan_delete(struct net_device *pdev, unsigned short pkey) +int ipoib_vlan_add(struct net_device *pdev, unsigned short pkey) +{ + return ipoib_vlan_add_common(pdev, pkey, NULL); +} + +int ipoib_named_vlan_add(struct net_device *pdev, + unsigned short pkey, + char *child_name_buf) +{ + return ipoib_vlan_add_common(pdev, pkey, child_name_buf); +} + +int ipoib_vlan_delete_common(struct net_device *pdev, + unsigned short pkey, + char *child_name_buf) { struct ipoib_dev_priv *ppriv, *priv, *tpriv; struct net_device *dev = NULL; + char gen_intf_name[IFNAMSIZ]; + + gen_intf_name[0] = '\0'; /* initialize - paranoia! */ if (!capable(CAP_NET_ADMIN)) return -EPERM; @@ -205,9 +237,30 @@ int ipoib_vlan_delete(struct net_device *pdev, unsigned short pkey) } down_write(&ppriv->vlan_rwsem); + if (child_name_buf == NULL && ppriv->dev) { + /* + * If child name is not provided, we generate the + * expected one using name of parent and pkey + * and use it in addition to pkey value + * (other children with same pkey may exist that have + * created by create_named_child() - we do not allow + * delete_child() to delete them - delete_named_child() + * has to be used!) + */ + snprintf(gen_intf_name, sizeof(gen_intf_name), + "%s.%04x", ppriv->dev->name, pkey); + } list_for_each_entry_safe(priv, tpriv, &ppriv->child_intfs, list) { - if (priv->pkey == pkey && - priv->child_type == IPOIB_LEGACY_CHILD) { + if ((priv->child_type == IPOIB_LEGACY_CHILD) && + /* user named child (match by name) OR */ + ((child_name_buf && priv->dev && + !strcmp(child_name_buf, priv->dev->name)) || + /* + * OR classic (devname.hexpkey generated name) child + * (match by pkey and generated name) + */ + (!child_name_buf && priv->pkey == pkey && + priv->dev && !strcmp(gen_intf_name, priv->dev->name)))) { list_del(&priv->list); dev = priv->dev; break; @@ -231,3 +284,14 @@ int ipoib_vlan_delete(struct net_device *pdev, unsigned short pkey) return -ENODEV; } + +int ipoib_vlan_delete(struct net_device *pdev, unsigned short pkey) +{ + + return ipoib_vlan_delete_common(pdev, pkey, NULL); +} + +int ipoib_named_vlan_delete(struct net_device *pdev, char *child_name_buf) +{ + return ipoib_vlan_delete_common(pdev, 0, child_name_buf); +}