Message ID | 20170927093248.3819-1-yuval.shaia@oracle.com (mailing list archive) |
---|---|
State | Changes Requested |
On Wed, 2017-09-27 at 12:32 +0300, Yuval Shaia wrote:
> +	/*
> +	 * (1) buf_after_dot check above makes it valid hexdigit .XXXX format
> +	 *
> +	 * Now verify if buf_before_dot is a valid net device name -
> +	 * (if it is not, then we are not in disallowed namespace)
> +	 */
> +	if (__dev_get_by_name(&init_net, buf_before_dot) == NULL)
> +		return false;

This is wrong. We don't use &init_net in ipoib any more, we are
namespace aware, so your patch must be namespace aware too.
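For readers following the namespace point above, here is a minimal userspace sketch of the same "<device>.<4 hex digits>" check with the lookup scoped to a namespace supplied by the caller. It is not the patch's code and not necessarily the fix the reviewer has in mind: `struct fake_net`, `fake_dev_exists()` and `reserved_child_name()` are hypothetical stand-ins so the logic can be compiled outside the kernel. In the driver, the analogous change would presumably be to pass the parent device's namespace (for example via `dev_net()`) to `__dev_get_by_name()` instead of `&init_net`.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define IFNAMSIZ 16                /* same value as the kernel constant */
#define PKEY_HEXSTRING_MAXWIDTH 4  /* taken from the quoted patch */

/* Hypothetical stand-in for a network namespace: a list of device names. */
struct fake_net {
	const char *const *names;
	size_t count;
};

/* Hypothetical stand-in for __dev_get_by_name(net, name) != NULL. */
static bool fake_dev_exists(const struct fake_net *net, const char *name)
{
	for (size_t i = 0; i < net->count; i++)
		if (strcmp(net->names[i], name) == 0)
			return true;
	return false;
}

/*
 * Returns true if buf has the form "<existing-device>.<4 hex digits>",
 * where "existing" is judged against the namespace passed in by the
 * caller rather than against one global namespace.
 */
static bool reserved_child_name(const struct fake_net *net, const char *buf)
{
	char local[IFNAMSIZ];
	const char *suffix;
	char *dot;

	/* Bounded copy, always NUL-terminated. */
	strncpy(local, buf, IFNAMSIZ - 1);
	local[IFNAMSIZ - 1] = '\0';

	dot = strchr(local, '.');
	if (dot == NULL)
		return false;

	*dot = '\0';            /* split the copy at the first dot */
	suffix = dot + 1;

	if (strlen(suffix) != PKEY_HEXSTRING_MAXWIDTH)
		return false;
	for (size_t i = 0; i < PKEY_HEXSTRING_MAXWIDTH; i++)
		if (!isxdigit((unsigned char)suffix[i]))
			return false;

	/* Only names whose prefix is a device in *this* namespace count. */
	return fake_dev_exists(net, local);
}
```

With this shape, a name such as ib0.8001 is treated as reserved only when ib0 is visible in the namespace of the caller, which is the behaviour the review comment appears to ask for.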
On Wed, Sep 27, 2017 at 12:32:48PM +0300, Yuval Shaia wrote:
> The sysfs "create_child" interface creates pkey based child interface but
> derives the name from parent device name and pkey value.
> This makes administration difficult where pkey values can change but
> policies encoded with device names do not.
>
> We add ability to create a child interface with a user specified name and a
> specified pkey with a new sysfs "create_named_child" interface (and also
> add a corresponding "delete_named_child" interface).
>
> We also add a new module api interface to query pkey from a netdevice so
> any kernel users of pkey based child interfaces can query it - since with
> device name decoupled from pkey, it can no longer be deduced from parsing
> the device name by other kernel users.
>
> Signed-off-by: Mukesh Kacker <mukesh.kacker@oracle.com>
> Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com>
> Reviewed-by: Chien-Hua Yen <chien.yen@oracle.com>
> Signed-off-by: Yuval Shaia <yuval.shaia@oracle.com>
> ---
>  Documentation/infiniband/ipoib.txt        |  12 ++
>  drivers/infiniband/ulp/ipoib/ipoib.h      |   3 +
>  drivers/infiniband/ulp/ipoib/ipoib_main.c | 187 ++++++++++++++++++++++++++++++
>  drivers/infiniband/ulp/ipoib/ipoib_vlan.c |  76 +++++++++++-
>  4 files changed, 272 insertions(+), 6 deletions(-)
>
> diff --git a/Documentation/infiniband/ipoib.txt b/Documentation/infiniband/ipoib.txt
> index 47c1dd9818f2..1db53c9b2906 100644
> --- a/Documentation/infiniband/ipoib.txt
> +++ b/Documentation/infiniband/ipoib.txt
> @@ -21,6 +21,18 @@ Partitions and P_Keys
>
>      echo 0x8001 > /sys/class/net/ib0/delete_child
>
> +  Interfaces with a user chosen name can be created in a similar
> +  manner with a different name and P_Key, by writing them into the
> +  main interface's /sys/class/net/<intf name>/create_named_child
> +  For example:
> +    echo "epart2 0x8002" > /sys/class/net/ib1/create_named_child
> +
> +  This will create an interfaces named epart2 with P_Key 0x8002 and
> +  parent ib1. To remove a named subinterface, use the
> +  "delete_named_child" file:
> +
> +    echo epart2 > /sys/class/net/ib1/delete_named_child

I doubt that delete_named_child is actually needed. You can use delete_child
on the pkey, which you used to create named child.

Maybe better to add support to rename child instead of introducing named
child concept?

Thanks
Please do retain original author name from UEK4 and that should be me! :-)
[ can be fixed by editing with "git commit --amend --author="<string>" ]

-Mukesh Kacker

On 09/27/2017 02:32 AM, Yuval Shaia wrote:
> The sysfs "create_child" interface creates pkey based child interface but
> derives the name from parent device name and pkey value.
> This makes administration difficult where pkey values can change but
> policies encoded with device names do not.
>
> We add ability to create a child interface with a user specified name and a
> specified pkey with a new sysfs "create_named_child" interface (and also
> add a corresponding "delete_named_child" interface).
>
> We also add a new module api interface to query pkey from a netdevice so
> any kernel users of pkey based child interfaces can query it - since with
> device name decoupled from pkey, it can no longer be deduced from parsing
> the device name by other kernel users.
>
> Signed-off-by: Mukesh Kacker <mukesh.kacker@oracle.com>
> Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com>
> Reviewed-by: Chien-Hua Yen <chien.yen@oracle.com>
> Signed-off-by: Yuval Shaia <yuval.shaia@oracle.com>
> ---
>  Documentation/infiniband/ipoib.txt        |  12 ++
>  drivers/infiniband/ulp/ipoib/ipoib.h      |   3 +
>  drivers/infiniband/ulp/ipoib/ipoib_main.c | 187 ++++++++++++++++++++++++++++++
>  drivers/infiniband/ulp/ipoib/ipoib_vlan.c |  76 +++++++++++-
>  4 files changed, 272 insertions(+), 6 deletions(-)
>
> diff --git a/Documentation/infiniband/ipoib.txt b/Documentation/infiniband/ipoib.txt
> index 47c1dd9818f2..1db53c9b2906 100644
> --- a/Documentation/infiniband/ipoib.txt
> +++ b/Documentation/infiniband/ipoib.txt
> @@ -21,6 +21,18 @@ Partitions and P_Keys
>
>      echo 0x8001 > /sys/class/net/ib0/delete_child
>
> +  Interfaces with a user chosen name can be created in a similar
> +  manner with a different name and P_Key, by writing them into the
> +  main interface's /sys/class/net/<intf name>/create_named_child
> +  For example:
> +    echo "epart2 0x8002" > /sys/class/net/ib1/create_named_child
> +
> +  This will create an interfaces named epart2 with P_Key 0x8002 and
> +  parent ib1. To remove a named subinterface, use the
> +  "delete_named_child" file:
> +
> +    echo epart2 > /sys/class/net/ib1/delete_named_child
> +
>    The P_Key for any interface is given by the "pkey" file, and the
>    main interface for a subinterface is in "parent."
>
> diff --git a/drivers/infiniband/ulp/ipoib/ipoib.h b/drivers/infiniband/ulp/ipoib/ipoib.h
> index 4a5c7a07a631..9d0010f9b324 100644
> --- a/drivers/infiniband/ulp/ipoib/ipoib.h
> +++ b/drivers/infiniband/ulp/ipoib/ipoib.h
> @@ -589,6 +589,9 @@ void ipoib_event(struct ib_event_handler *handler,
>
>  int ipoib_vlan_add(struct net_device *pdev, unsigned short pkey);
>  int ipoib_vlan_delete(struct net_device *pdev, unsigned short pkey);
> +int ipoib_named_vlan_add(struct net_device *pdev, unsigned short pkey,
> +			 char *child_name_buf);
> +int ipoib_named_vlan_delete(struct net_device *pdev, char *child_name_buf);
>
>  int __ipoib_vlan_add(struct ipoib_dev_priv *ppriv, struct ipoib_dev_priv *priv,
>  		     u16 pkey, int child_type);
> diff --git a/drivers/infiniband/ulp/ipoib/ipoib_main.c b/drivers/infiniband/ulp/ipoib/ipoib_main.c
> index bac95b509a9b..2bdd4055d69f 100644
> --- a/drivers/infiniband/ulp/ipoib/ipoib_main.c
> +++ b/drivers/infiniband/ulp/ipoib/ipoib_main.c
> @@ -34,6 +34,7 @@
>
>  #include "ipoib.h"
>
> +#include <linux/ctype.h>
>  #include <linux/module.h>
>
>  #include <linux/init.h>
> @@ -136,6 +137,13 @@ static int ipoib_netdev_event(struct notifier_block *this,
>  }
>  #endif
>
> +/*
> + * PKEY_HEXSTRING_MAXWIDTH - number of hex
> + * digits needed to represent max width of
> + * pkey value.
> + */
> +#define PKEY_HEXSTRING_MAXWIDTH 4
> +
>  int ipoib_open(struct net_device *dev)
>  {
>  	struct ipoib_dev_priv *priv = ipoib_priv(dev);
> @@ -2111,6 +2119,121 @@ static int ipoib_set_mac(struct net_device *dev, void *addr)
>  	return 0;
>  }
>
> +/*
> + * Check if a buffer has name of the format
> + *
> + * <network-device-name>.<4hexcharacters>
> + * e.g. ib1.8004 etc.
> + *
> + * Such names are generated by create_child() by
> + * concatenating parent device with 16-bit pkey
> + * in hex, and disallowed from usage with
> + * create_named_child() interface.
> + *
> + */
> +static bool ipoib_disallowed_named_child_namespace(const char *buf)
> +{
> +	char localbuf[IFNAMSIZ];
> +	char *dotp = NULL;
> +	char *buf_before_dot = NULL;
> +	char *buf_after_dot = NULL;
> +	unsigned int ii;
> +
> +	memcpy(localbuf, buf, IFNAMSIZ);
> +	localbuf[IFNAMSIZ-1] = '\0';	/* paranoia! */
> +
> +	dotp = strnchr(localbuf, IFNAMSIZ, '.');
> +	/* no dot or dot at end! */
> +	if (dotp == NULL || dotp == localbuf+IFNAMSIZ-2)
> +		return false;
> +
> +	*dotp = '\0';	/* split buffer at "dot" */
> +	buf_before_dot = localbuf;
> +	buf_after_dot = dotp + 1;
> +
> +	/*
> +	 * Check if buf_after_dot is hexstring of width
> +	 * that could be a pkey!
> +	 */
> +	if (strlen(buf_after_dot) != PKEY_HEXSTRING_MAXWIDTH)
> +		return false;
> +
> +	for (ii = 0; ii < PKEY_HEXSTRING_MAXWIDTH; ii++) {
> +		if (!isxdigit(buf_after_dot[ii]))
> +			return false;
> +	}
> +
> +	/*
> +	 * (1) buf_after_dot check above makes it valid hexdigit .XXXX format
> +	 *
> +	 * Now verify if buf_before_dot is a valid net device name -
> +	 * (if it is not, then we are not in disallowed namespace)
> +	 */
> +	if (__dev_get_by_name(&init_net, buf_before_dot) == NULL)
> +		return false;
> +
> +	/*
> +	 * (2) buf_before_dot is valid net device name
> +	 *     - reserved namespace is being used!
> +	 *
> +	 * Note: No check on netdev->type to be ARPHRD_INFINIBAND etc
> +	 * We implicitly treat even misleading names such as eth1.XXXX
> +	 * (ethernet device prefix) for child interface name of an
> +	 * infiniband device as intrusion of reserved namespace!
> +	 */
> +	return true;
> +}
> +
> +static int parse_named_child(struct device *dev, const char *buf,
> +			     char *child_name_buf, int *pkeyp)
> +{
> +	int ret;
> +	struct ipoib_dev_priv *priv = ipoib_priv(to_net_dev(dev));
> +
> +	if (pkeyp)
> +		*pkeyp = -1;
> +
> +	/*
> +	 * First parameter is child interface name, after that
> +	 * 'pkey' is required if we were passed a pkey buffer
> +	 * (Note: From create_named_child, we are passed a pkey
> +	 * buffer to parse input, from delete_named_child we are
> +	 * not!)
> +	 * Note: IFNAMSIZ is 16, allowing for tail null
> +	 * we only scan 15 characters for name.
> +	 */
> +	if (pkeyp) {
> +		ret = sscanf(buf, "%15s %i", child_name_buf, pkeyp);
> +		if (ret != 2)
> +			return -EINVAL;
> +	} else {
> +		ret = sscanf(buf, "%15s", child_name_buf);
> +		if (ret != 1)
> +			return -EINVAL;
> +	}
> +
> +	if (strlen(child_name_buf) <= 0 || !dev_valid_name(child_name_buf))
> +		return -EINVAL;
> +
> +	if (pkeyp && (*pkeyp <= 0 || *pkeyp > 0xffff || *pkeyp == 0x8000))
> +		return -EINVAL;
> +
> +	if (ipoib_disallowed_named_child_namespace(child_name_buf)) {
> +		pr_warn("child name %s not allowed to be used with create_named_child as it uses <network-device-name>.XXXX format reserved for create_child/delete_child interfaces!\n",
> +			child_name_buf);
> +		return -EINVAL;
> +	}
> +
> +	if (pkeyp)
> +		ipoib_dbg(priv, "%s inp %s out child_name_buf %s, pkey %04x\n",
> +			  __func__, buf, child_name_buf, *pkeyp);
> +	else
> +		ipoib_dbg(priv, "%s inp %s out child_name_buf %s\n", __func__,
> +			  buf, child_name_buf);
> +	return 0;
> +}
> +
> +
>  static ssize_t create_child(struct device *dev,
>  			    struct device_attribute *attr,
>  			    const char *buf, size_t count)
> @@ -2156,6 +2279,44 @@ static ssize_t delete_child(struct device *dev,
>  }
>  static DEVICE_ATTR(delete_child, S_IWUSR, NULL, delete_child);
>
> +static ssize_t create_named_child(struct device *dev,
> +				  struct device_attribute *attr,
> +				  const char *buf, size_t count)
> +{
> +	int pkey;
> +	char child_name[IFNAMSIZ];
> +	int ret = 0;
> +
> +	child_name[0] = '\0';
> +
> +	if (parse_named_child(dev, buf, child_name, &pkey))
> +		return -EINVAL;
> +
> +	ret = ipoib_named_vlan_add(to_net_dev(dev), pkey, child_name);
> +	return ret ? ret : count;
> +}
> +static DEVICE_ATTR(create_named_child, S_IWUSR, NULL, create_named_child);
> +
> +static ssize_t delete_named_child(struct device *dev,
> +				  struct device_attribute *attr,
> +				  const char *buf, size_t count)
> +{
> +	char child_name[IFNAMSIZ];
> +	int ret = 0;
> +
> +	child_name[0] = '\0';
> +
> +	if (parse_named_child(dev, buf, child_name, NULL))
> +		return -EINVAL;
> +
> +	ret = ipoib_named_vlan_delete(to_net_dev(dev), child_name);
> +
> +	return ret ? ret : count;
> +
> +}
> +static DEVICE_ATTR(delete_named_child, S_IWUSR, NULL, delete_named_child);
> +
> +
>  int ipoib_add_pkey_attr(struct net_device *dev)
>  {
>  	return device_create_file(&dev->dev, &dev_attr_pkey);
> @@ -2263,6 +2424,11 @@ static struct net_device *ipoib_add_port(const char *format,
>  		goto sysfs_failed;
>  	if (device_create_file(&priv->dev->dev, &dev_attr_delete_child))
>  		goto sysfs_failed;
> +	if (device_create_file(&priv->dev->dev, &dev_attr_create_named_child))
> +		goto sysfs_failed;
> +	if (device_create_file(&priv->dev->dev, &dev_attr_delete_named_child))
> +		goto sysfs_failed;
> +
>
>  	return priv->dev;
>
> @@ -2367,6 +2533,27 @@ static struct notifier_block ipoib_netdev_notifier = {
>  };
>  #endif
>
> +int
> +ipoib_get_netdev_pkey(struct net_device *dev, u16 *pkey)
> +{
> +	struct ipoib_dev_priv *priv;
> +
> +	if (dev->type != ARPHRD_INFINIBAND)
> +		return -EINVAL;
> +
> +	/* only for ipoib net devices! */
> +	if ((dev->netdev_ops != &ipoib_netdev_ops_pf) &&
> +	    (dev->netdev_ops != &ipoib_netdev_ops_vf))
> +		return -EINVAL;
> +
> +	priv = ipoib_priv(dev);
> +
> +	*pkey = priv->pkey;
> +
> +	return 0;
> +}
> +EXPORT_SYMBOL(ipoib_get_netdev_pkey);
> +
>  static int __init ipoib_init_module(void)
>  {
>  	int ret;
> diff --git a/drivers/infiniband/ulp/ipoib/ipoib_vlan.c b/drivers/infiniband/ulp/ipoib/ipoib_vlan.c
> index 9927cd6b7082..f5ae55f4f845 100644
> --- a/drivers/infiniband/ulp/ipoib/ipoib_vlan.c
> +++ b/drivers/infiniband/ulp/ipoib/ipoib_vlan.c
> @@ -115,7 +115,9 @@ int __ipoib_vlan_add(struct ipoib_dev_priv *ppriv, struct ipoib_dev_priv *priv,
>  	return result;
>  }
>
> -int ipoib_vlan_add(struct net_device *pdev, unsigned short pkey)
> +int ipoib_vlan_add_common(struct net_device *pdev,
> +			  unsigned short pkey,
> +			  char *child_name_buf)
>  {
>  	struct ipoib_dev_priv *ppriv, *priv;
>  	char intf_name[IFNAMSIZ];
> @@ -130,8 +132,21 @@ int ipoib_vlan_add(struct net_device *pdev, unsigned short pkey)
>  	if (test_bit(IPOIB_FLAG_GOING_DOWN, &ppriv->flags))
>  		return -EPERM;
>
> -	snprintf(intf_name, sizeof intf_name, "%s.%04x",
> -		 ppriv->dev->name, pkey);
> +	if (child_name_buf == NULL) {
> +		/*
> +		 * If child name is not provided, we generated
> +		 * one using name of parent and pkey.
> +		 */
> +		snprintf(intf_name, sizeof(intf_name), "%s.%04x",
> +			 ppriv->dev->name, pkey);
> +	} else {
> +		/*
> +		 * Note: Duplicate intf_name will be detected later in the code
> +		 * by register_netdevice() (inside __ipoib_vlan_add() call
> +		 * below) returning EEXIST!
> +		 */
> +		strncpy(intf_name, child_name_buf, IFNAMSIZ);
> +	}
>
>  	if (!mutex_trylock(&ppriv->sysfs_mutex))
>  		return restart_syscall();
> @@ -183,10 +198,27 @@ int ipoib_vlan_add(struct net_device *pdev, unsigned short pkey)
>  	return result;
>  }
>
> -int ipoib_vlan_delete(struct net_device *pdev, unsigned short pkey)
> +int ipoib_vlan_add(struct net_device *pdev, unsigned short pkey)
> +{
> +	return ipoib_vlan_add_common(pdev, pkey, NULL);
> +}
> +
> +int ipoib_named_vlan_add(struct net_device *pdev,
> +			 unsigned short pkey,
> +			 char *child_name_buf)
> +{
> +	return ipoib_vlan_add_common(pdev, pkey, child_name_buf);
> +}
> +
> +int ipoib_vlan_delete_common(struct net_device *pdev,
> +			     unsigned short pkey,
> +			     char *child_name_buf)
>  {
>  	struct ipoib_dev_priv *ppriv, *priv, *tpriv;
>  	struct net_device *dev = NULL;
> +	char gen_intf_name[IFNAMSIZ];
> +
> +	gen_intf_name[0] = '\0';	/* initialize - paranoia! */
>
>  	if (!capable(CAP_NET_ADMIN))
>  		return -EPERM;
> @@ -205,9 +237,30 @@ int ipoib_vlan_delete(struct net_device *pdev, unsigned short pkey)
>  	}
>
>  	down_write(&ppriv->vlan_rwsem);
> +	if (child_name_buf == NULL && ppriv->dev) {
> +		/*
> +		 * If child name is not provided, we generate the
> +		 * expected one using name of parent and pkey
> +		 * and use it in addition to pkey value
> +		 * (other children with same pkey may exist that have
> +		 * created by create_named_child() - we do not allow
> +		 * delete_child() to delete them - delete_named_child()
> +		 * has to be used!)
> +		 */
> +		snprintf(gen_intf_name, sizeof(gen_intf_name),
> +			 "%s.%04x", ppriv->dev->name, pkey);
> +	}
>  	list_for_each_entry_safe(priv, tpriv, &ppriv->child_intfs, list) {
> -		if (priv->pkey == pkey &&
> -		    priv->child_type == IPOIB_LEGACY_CHILD) {
> +		if ((priv->child_type == IPOIB_LEGACY_CHILD) &&
> +		    /* user named child (match by name) OR */
> +		    ((child_name_buf && priv->dev &&
> +		      !strcmp(child_name_buf, priv->dev->name)) ||
> +		    /*
> +		     * OR classic (devname.hexpkey generated name) child
> +		     * (match by pkey and generated name)
> +		     */
> +		     (!child_name_buf && priv->pkey == pkey &&
> +		      priv->dev && !strcmp(gen_intf_name, priv->dev->name)))) {
>  			list_del(&priv->list);
>  			dev = priv->dev;
>  			break;
> @@ -231,3 +284,14 @@ int ipoib_vlan_delete(struct net_device *pdev, unsigned short pkey)
>
>  	return -ENODEV;
>  }
> +
> +int ipoib_vlan_delete(struct net_device *pdev, unsigned short pkey)
> +{
> +
> +	return ipoib_vlan_delete_common(pdev, pkey, NULL);
> +}
> +
> +int ipoib_named_vlan_delete(struct net_device *pdev, char *child_name_buf)
> +{
> +	return ipoib_vlan_delete_common(pdev, 0, child_name_buf);
> +}
>
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
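As an aside on the input format accepted by create_named_child in the quoted patch, the parsing and pkey range check can be tried out in userspace. The sketch below is an illustration under assumptions, not the driver code: `parse_name_and_pkey()` is a hypothetical name, it uses the C library `sscanf()` rather than the kernel one, and it leaves out the `dev_valid_name()` and reserved-name checks that the patch also performs.

```c
#include <errno.h>
#include <stdio.h>

#define IFNAMSIZ 16  /* same value as the kernel constant */

/*
 * Parse "<name> <pkey>" with the same format string and the same
 * pkey bounds as the quoted parse_named_child(): the name is limited
 * to 15 characters plus the terminating NUL, and the pkey must be in
 * 1..0xffff and must not be 0x8000.
 *
 * Returns 0 on success and -EINVAL on malformed input.
 */
static int parse_name_and_pkey(const char *buf, char name[IFNAMSIZ], int *pkey)
{
	name[0] = '\0';
	*pkey = -1;

	/* %i accepts decimal, 0x-prefixed hex and 0-prefixed octal. */
	if (sscanf(buf, "%15s %i", name, pkey) != 2)
		return -EINVAL;

	if (*pkey <= 0 || *pkey > 0xffff || *pkey == 0x8000)
		return -EINVAL;

	return 0;
}
```

One property of this format string that may be worth knowing: a name longer than 15 characters is not silently truncated on the create path, because the leftover characters then fail to parse as the pkey and the call returns -EINVAL. The delete path in the patch scans only "%15s", so the same protection does not obviously apply there.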
On Wed, Sep 27, 2017 at 10:20:20AM -0700, Mukesh Kacker wrote:
> Please do retain original author name from UEK4 and that should be me! :-)

Oops, will be fixed for v1.

> [ can be fixed by editing with "git commit --amend --author="<string>" ]
>
> -Mukesh Kacker
On 09/27/2017 08:01 AM, Leon Romanovsky wrote: > On Wed, Sep 27, 2017 at 12:32:48PM +0300, Yuval Shaia wrote: >> The sysfs "create_child" interface creates pkey based child interface but >> derives the name from parent device name and pkey value. >> This makes administration difficult where pkey values can change but >> policies encoded with device names do not. >> >> We add ability to create a child interface with a user specified name and a >> specified pkey with a new sysfs "create_named_child" interface (and also >> add a corresponding "delete_named_child" interface). >> >> We also add a new module api interface to query pkey from a netdevice so >> any kernel users of pkey based child interfaces can query it - since with >> device name decoupled from pkey, it can no longer be deduced from parsing >> the device name by other kernel users. >> >> Signed-off-by: Mukesh Kacker <mukesh.kacker@oracle.com> >> Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com> >> Reviewed-by: Chien-Hua Yen <chien.yen@oracle.com> >> Signed-off-by: Yuval Shaia <yuval.shaia@oracle.com> >> --- >> Documentation/infiniband/ipoib.txt | 12 ++ >> drivers/infiniband/ulp/ipoib/ipoib.h | 3 + >> drivers/infiniband/ulp/ipoib/ipoib_main.c | 187 ++++++++++++++++++++++++++++++ >> drivers/infiniband/ulp/ipoib/ipoib_vlan.c | 76 +++++++++++- >> 4 files changed, 272 insertions(+), 6 deletions(-) >> >> diff --git a/Documentation/infiniband/ipoib.txt b/Documentation/infiniband/ipoib.txt >> index 47c1dd9818f2..1db53c9b2906 100644 >> --- a/Documentation/infiniband/ipoib.txt >> +++ b/Documentation/infiniband/ipoib.txt >> @@ -21,6 +21,18 @@ Partitions and P_Keys >> >> echo 0x8001 > /sys/class/net/ib0/delete_child >> >> + Interfaces with a user chosen name can be created in a similar >> + manner with a different name and P_Key, by writing them into the >> + main interface's /sys/class/net/<intf name>/create_named_child >> + For example: >> + echo "epart2 0x8002" > /sys/class/net/ib1/create_named_child >> + >> + 
This will create an interfaces named epart2 with P_Key 0x8002 and >> + parent ib1. To remove a named subinterface, use the >> + "delete_named_child" file: >> + >> + echo epart2 > /sys/class/net/ib1/delete_named_child > > I doubt that delete_named_child is actually needed. You can use delete_child > on the pkey, which you used to create named child. > > Maybe better to add support to rename child instead of introducing named > child concept? > > Thanks > I can offer a slightly indirect answer to justify the current interface by providing the background behind the requirements for this change. The requirement for this change had come from the desire for ease of writing management tools and facilitate "renumbering" of pkeys as IB network clouds are reconfigured. The renumbering still requires the name-value pair (e.g. PKEY_ID=<n>) to be propagated to hosts configurations, but having the pkey embeded in device name was introducing complexity as various sysadmin scripts and other things need to pick it up. Having devices with names like ib0.datanet, ib1.cellnet or any other ib<N>.<string> simplifies that life of people designing the management tools for networks and integrating them for the use case of renumbering of pkeys. Probably many future redesigns are possible, but for this tweak of the existing sysfs "create_child" interface, a rename child may not be the best variant if it requires using device name with pkey values at any stage in the use case. Same for delete_named_child. Also, some related trivia - which I would not use to justify this design but can explain why certain things were done. In ancient kernels like 2.6.39 (still widely used by our customers :-) ) where this was implemented first, it was possible to create multiple child interfaces with same pkey value through variants, so a delete interface just using pkey would have been ambiguous (probably not true in current kernels!). 
Another piece of trivia: we also have accompanying changes to the script usually installed as /etc/sysconfig/network-scripts/ifup-ib, which is part of the startup scripts (usually in RHEL and related distributions). It uses "create_child" and was enhanced to allow both "create_child" and "create_named_child". If these changes are accepted, the script changes should also be presented to the appropriate upstream for those scripts. -Mukesh Kacker mukesh.kacker@oracle.com
On Wed, Sep 27, 2017 at 12:03:40PM -0700, Mukesh Kacker wrote: > On 09/27/2017 08:01 AM, Leon Romanovsky wrote: > > On Wed, Sep 27, 2017 at 12:32:48PM +0300, Yuval Shaia wrote: > > > The sysfs "create_child" interface creates pkey based child interface but > > > derives the name from parent device name and pkey value. > > > This makes administration difficult where pkey values can change but > > > policies encoded with device names do not. > > > > > > We add ability to create a child interface with a user specified name and a > > > specified pkey with a new sysfs "create_named_child" interface (and also > > > add a corresponding "delete_named_child" interface). > > > > > > We also add a new module api interface to query pkey from a netdevice so > > > any kernel users of pkey based child interfaces can query it - since with > > > device name decoupled from pkey, it can no longer be deduced from parsing > > > the device name by other kernel users. > > > > > > Signed-off-by: Mukesh Kacker <mukesh.kacker@oracle.com> > > > Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com> > > > Reviewed-by: Chien-Hua Yen <chien.yen@oracle.com> > > > Signed-off-by: Yuval Shaia <yuval.shaia@oracle.com> > > > --- > > > Documentation/infiniband/ipoib.txt | 12 ++ > > > drivers/infiniband/ulp/ipoib/ipoib.h | 3 + > > > drivers/infiniband/ulp/ipoib/ipoib_main.c | 187 ++++++++++++++++++++++++++++++ > > > drivers/infiniband/ulp/ipoib/ipoib_vlan.c | 76 +++++++++++- > > > 4 files changed, 272 insertions(+), 6 deletions(-) > > > > > > diff --git a/Documentation/infiniband/ipoib.txt b/Documentation/infiniband/ipoib.txt > > > index 47c1dd9818f2..1db53c9b2906 100644 > > > --- a/Documentation/infiniband/ipoib.txt > > > +++ b/Documentation/infiniband/ipoib.txt > > > @@ -21,6 +21,18 @@ Partitions and P_Keys > > > > > > echo 0x8001 > /sys/class/net/ib0/delete_child > > > > > > + Interfaces with a user chosen name can be created in a similar > > > + manner with a different name and P_Key, by writing them 
into the > > > + main interface's /sys/class/net/<intf name>/create_named_child > > > + For example: > > > + echo "epart2 0x8002" > /sys/class/net/ib1/create_named_child > > > + > > > + This will create an interfaces named epart2 with P_Key 0x8002 and > > > + parent ib1. To remove a named subinterface, use the > > > + "delete_named_child" file: > > > + > > > + echo epart2 > /sys/class/net/ib1/delete_named_child > > > > I doubt that delete_named_child is actually needed. You can use delete_child > > on the pkey, which you used to create named child. > > > > Maybe better to add support to rename child instead of introducing named > > child concept? > > > > Thanks > > > > > I can offer a slightly indirect answer to justify the current interface by > providing the background behind the requirements for this change. > > The requirement for this change had come from the desire for ease of writing > management tools and facilitate "renumbering" of pkeys as IB network clouds > are reconfigured. > > The renumbering still requires the name-value pair (e.g. PKEY_ID=<n>) to be > propagated to hosts configurations, but having the pkey embeded in device > name was introducing complexity as various sysadmin scripts and other things > need to pick it up. > > Having devices with names like ib0.datanet, ib1.cellnet or any other > ib<N>.<string> simplifies that life of people designing the management tools > for networks and integrating them for the use case of renumbering of pkeys. > > Probably many future redesigns are possible, but for this tweak of the > existing sysfs "create_child" interface, a rename child may not be the best > variant if it requires using device name with pkey values at any stage in > the use case. Same for delete_named_child. I'm not the IPoIB expert, but I see ipoib_netlink.c which uses netdev stable index and can be easily extended without addition of new sysfs model to allow rename from ip tool. 
I'm aware of many management tools which uses directly netlink interface to configure network devices. Did you see it? > > Also, some related trivia - which I would not use to justify this design but > can explain why certain things were done. > > In ancient kernels like 2.6.39 (still widely used by our customers :-) ) > where this was implemented first, it was possible to create multiple child > interfaces with same pkey value through variants, so a delete interface just > using pkey would have been ambiguous (probably not true in current > kernels!). > > Another trivia: We also have an accompanying change diffs to the script > usually installed as /etc/sysconfig/network-scripts/ifup-ib and part of > startup scripts (usually in RHEL and related distributions) which uses > "create_child" and was enhanced to allow both "create_child" and > "create_named_child" - if these changes are accepted, those changes should > also be presented to the appropriate upstream for those scripts. Those "trivia" are not relevant for any modern distribution and looks like specific to ancient RHELs. > > -Mukesh Kacker > mukesh.kacker@oracle.com
On Wed, Sep 27, 2017 at 12:32:48PM +0300, Yuval Shaia wrote: > The sysfs "create_child" interface creates pkey based child interface but > derives the name from parent device name and pkey value. > This makes administration difficult where pkey values can change but > policies encoded with device names do not. > > We add ability to create a child interface with a user specified name and a > specified pkey with a new sysfs "create_named_child" interface (and also > add a corresponding "delete_named_child" interface). > > We also add a new module api interface to query pkey from a netdevice so > any kernel users of pkey based child interfaces can query it - since with > device name decoupled from pkey, it can no longer be deduced from parsing > the device name by other kernel users. This should all use netlink these days, not more sysfs files. Leon? What do you think about using rdmatool to provide a command line for creating ipoib children? > + Interfaces with a user chosen name can be created in a similar > + manner with a different name and P_Key, by writing them into the > + main interface's /sys/class/net/<intf name>/create_named_child > + For example: > + echo "epart2 0x8002" > /sys/class/net/ib1/create_named_child 'multi-value' sysfs files like this are categorically banned by Greg. Any kind of configuration in sysfs is really frowned on these days. Jason
On Wed, Sep 27, 2017 at 12:03:40PM -0700, Mukesh Kacker wrote: > Having devices with names like ib0.datanet, ib1.cellnet or any other > ib<N>.<string> simplifies that life of people designing the management tools > for networks and integrating them for the use case of renumbering of pkeys. You should already be able to rename ipoib devices via 'ip link set name' - why didn't you use that to get your names? Do you need atomicity for some reason?? > variant if it requires using device name with pkey values at any stage in > the use case. Same for delete_named_child. delete_named_child is ugly, it should be a netlink command, and it should use a ifindex, not name. Jason
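The point about keying on an ifindex rather than a name can be illustrated from user space: a netdev's ifindex is assigned once and does not change when the interface is renamed with 'ip link set name'. The sketch below resolves a name to its ifindex with the standard if_nametoindex() call. The helper describe_ifindex() is a hypothetical name used only for this illustration, and any interface name passed to it is an assumption about the local system.

```c
#include <net/if.h>   /* if_nametoindex() */
#include <stdio.h>

/*
 * Format "<name> -> ifindex <n>" into buf, or a not-found message.
 * Returns the ifindex, or 0 when no interface has that name.
 * describe_ifindex() is a hypothetical helper for illustration.
 */
static unsigned int describe_ifindex(const char *name, char *buf, size_t len)
{
    unsigned int idx = if_nametoindex(name);

    if (idx == 0)
        snprintf(buf, len, "%s: no such interface", name);
    else
        snprintf(buf, len, "%s -> ifindex %u", name, idx);
    return idx;
}
```

A management tool that records the returned index when the child is created can keep referring to the same device after a rename, which a name-based delete_named_child file cannot guarantee.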
On Thu, Sep 28, 2017 at 10:34:06AM -0600, Jason Gunthorpe wrote: > On Wed, Sep 27, 2017 at 12:32:48PM +0300, Yuval Shaia wrote: > > The sysfs "create_child" interface creates pkey based child interface but > > derives the name from parent device name and pkey value. > > This makes administration difficult where pkey values can change but > > policies encoded with device names do not. > > > > We add ability to create a child interface with a user specified name and a > > specified pkey with a new sysfs "create_named_child" interface (and also > > add a corresponding "delete_named_child" interface). > > > > We also add a new module api interface to query pkey from a netdevice so > > any kernel users of pkey based child interfaces can query it - since with > > device name decoupled from pkey, it can no longer be deduced from parsing > > the device name by other kernel users. > > This should all use netlink these days, not more sysfs files. > > Leon? What do you think about using rdmatool to provide a command line > for creating ipoib children? As far as I understand ipoib_netlink.c, ipoib_new_child_link() already implements it and it is supported in "ip". And I think that it is more netdev than rdma, so IMHO, the "ip" is more appropriate, but if someone decides to add "ipoib" object in rdmatool, I won't stand against it. > > > + Interfaces with a user chosen name can be created in a similar > > + manner with a different name and P_Key, by writing them into the > > + main interface's /sys/class/net/<intf name>/create_named_child > > + For example: > > + echo "epart2 0x8002" > /sys/class/net/ib1/create_named_child > > 'multi-value' sysfs files like this are categorically banned by Greg. > > Any kind of configuration in sysfs is really frowned on these days. > > Jason
On Thu, Sep 28, 2017 at 07:47:35PM +0300, Leon Romanovsky wrote: > On Thu, Sep 28, 2017 at 10:34:06AM -0600, Jason Gunthorpe wrote: > > On Wed, Sep 27, 2017 at 12:32:48PM +0300, Yuval Shaia wrote: > > > The sysfs "create_child" interface creates pkey based child interface but > > > derives the name from parent device name and pkey value. > > > This makes administration difficult where pkey values can change but > > > policies encoded with device names do not. > > > > > > We add ability to create a child interface with a user specified name and a > > > specified pkey with a new sysfs "create_named_child" interface (and also > > > add a corresponding "delete_named_child" interface). > > > > > > We also add a new module api interface to query pkey from a netdevice so > > > any kernel users of pkey based child interfaces can query it - since with > > > device name decoupled from pkey, it can no longer be deduced from parsing > > > the device name by other kernel users. > > > > This should all use netlink these days, not more sysfs files. > > > > Leon? What do you think about using rdmatool to provide a command line > > for creating ipoib children? > > As far as I understand ipoib_netlink.c, ipoib_new_child_link() already > implements it and it is supported in "ip". Oh right: ip link add DEVICE name NAME type ipoib [ pkey PKEY ] [mode MODE ] So what is the point of this series? NAK from me. Jason
On Thu, Sep 28, 2017 at 10:53:05AM -0600, Jason Gunthorpe wrote: > On Thu, Sep 28, 2017 at 07:47:35PM +0300, Leon Romanovsky wrote: > > On Thu, Sep 28, 2017 at 10:34:06AM -0600, Jason Gunthorpe wrote: > > > On Wed, Sep 27, 2017 at 12:32:48PM +0300, Yuval Shaia wrote: > > > > The sysfs "create_child" interface creates pkey based child interface but > > > > derives the name from parent device name and pkey value. > > > > This makes administration difficult where pkey values can change but > > > > policies encoded with device names do not. > > > > > > > > We add ability to create a child interface with a user specified name and a > > > > specified pkey with a new sysfs "create_named_child" interface (and also > > > > add a corresponding "delete_named_child" interface). > > > > > > > > We also add a new module api interface to query pkey from a netdevice so > > > > any kernel users of pkey based child interfaces can query it - since with > > > > device name decoupled from pkey, it can no longer be deduced from parsing > > > > the device name by other kernel users. > > > > > > This should all use netlink these days, not more sysfs files. > > > > > > Leon? What do you think about using rdmatool to provide a command line > > > for creating ipoib children? > > > > As far as I understand ipoib_netlink.c, ipoib_new_child_link() already > > implements it and it is supported in "ip". > > Oh right: > > ip link add DEVICE name NAME type ipoib [ pkey PKEY ] [mode MODE ] > > So what is the point of this series? > > NAK from me. I tried to be more polite than you :) > > Jason
On Thu, Sep 28, 2017 at 10:53:05AM -0600, Jason Gunthorpe wrote: > On Thu, Sep 28, 2017 at 07:47:35PM +0300, Leon Romanovsky wrote: > > On Thu, Sep 28, 2017 at 10:34:06AM -0600, Jason Gunthorpe wrote: > > > On Wed, Sep 27, 2017 at 12:32:48PM +0300, Yuval Shaia wrote: > > > > The sysfs "create_child" interface creates pkey based child interface but > > > > derives the name from parent device name and pkey value. > > > > This makes administration difficult where pkey values can change but > > > > policies encoded with device names do not. > > > > > > > > We add ability to create a child interface with a user specified name and a > > > > specified pkey with a new sysfs "create_named_child" interface (and also > > > > add a corresponding "delete_named_child" interface). > > > > > > > > We also add a new module api interface to query pkey from a netdevice so > > > > any kernel users of pkey based child interfaces can query it - since with > > > > device name decoupled from pkey, it can no longer be deduced from parsing > > > > the device name by other kernel users. > > > > > > This should all use netlink these days, not more sysfs files. > > > > > > Leon? What do you think about using rdmatool to provide a command line > > > for creating ipoib children? > > > > As far as I understand ipoib_netlink.c, ipoib_new_child_link() already > > implements it and it is supported in "ip". > > Oh right: > > ip link add DEVICE name NAME type ipoib [ pkey PKEY ] [mode MODE ] So with this interface we can entirely remove the sysfs interface to create child, right? > > So what is the point of this series? > > NAK from me. > > Jason
On Sun, Oct 15, 2017 at 08:47:46AM +0300, Yuval Shaia wrote: > On Thu, Sep 28, 2017 at 10:53:05AM -0600, Jason Gunthorpe wrote: > > On Thu, Sep 28, 2017 at 07:47:35PM +0300, Leon Romanovsky wrote: > > > On Thu, Sep 28, 2017 at 10:34:06AM -0600, Jason Gunthorpe wrote: > > > > On Wed, Sep 27, 2017 at 12:32:48PM +0300, Yuval Shaia wrote: > > > > > The sysfs "create_child" interface creates pkey based child interface but > > > > > derives the name from parent device name and pkey value. > > > > > This makes administration difficult where pkey values can change but > > > > > policies encoded with device names do not. > > > > > > > > > > We add ability to create a child interface with a user specified name and a > > > > > specified pkey with a new sysfs "create_named_child" interface (and also > > > > > add a corresponding "delete_named_child" interface). > > > > > > > > > > We also add a new module api interface to query pkey from a netdevice so > > > > > any kernel users of pkey based child interfaces can query it - since with > > > > > device name decoupled from pkey, it can no longer be deduced from parsing > > > > > the device name by other kernel users. > > > > > > > > This should all use netlink these days, not more sysfs files. > > > > > > > > Leon? What do you think about using rdmatool to provide a command line > > > > for creating ipoib children? > > > > > > As far as I understand ipoib_netlink.c, ipoib_new_child_link() already > > > implements it and it is supported in "ip". > > > > Oh right: > > > > ip link add DEVICE name NAME type ipoib [ pkey PKEY ] [mode MODE ] > > So with this interface we can entirely remove the sysfs interface to create > child, right? We can't :(, it will break user's scripts. Thanks > > > > > So what is the point of this series? > > > > NAK from me. > > > > Jason
On Sun, Oct 15, 2017 at 08:47:46AM +0300, Yuval Shaia wrote: > > > As far as I understand ipoib_netlink.c, ipoib_new_child_link() already > > > implements it and it is supported in "ip". > > > > Oh right: > > > > ip link add DEVICE name NAME type ipoib [ pkey PKEY ] [mode MODE ] > > So with this interface we can entirely remove the sysfs interface to create > child, right? Yes, we should add a deprecation one shot printk to the kernel for the sysfs interface to encourage people to use ip Jason
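For readers unfamiliar with the "one shot printk" idea: the message is emitted the first time the deprecated path is hit and suppressed afterwards, so logs are not flooded. In the kernel this is usually done with the pr_warn_once() macro. The block below is only a user-space analogue of that behaviour, not kernel code. The function name, the counter, and the message text are all assumptions made for the sketch, and whether such a warning should be added at all is disputed in the replies.

```c
#include <stdbool.h>
#include <stdio.h>

/* Counts how many warnings were actually printed (for demonstration only). */
static int warnings_emitted;

/*
 * Print a deprecation notice the first time any legacy sysfs attribute
 * is used, and stay silent on every later call.
 * warn_sysfs_child_deprecated() is a hypothetical helper.
 */
static void warn_sysfs_child_deprecated(const char *attr)
{
    static bool warned;

    if (warned)
        return;
    warned = true;
    warnings_emitted++;
    fprintf(stderr,
            "ipoib: sysfs attribute '%s' is deprecated, "
            "use 'ip link add ... type ipoib' instead\n", attr);
}
```

Calling the helper from both the create and the delete handlers would still produce a single line in the log, because the static flag is shared by all callers.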
On Tue, Oct 17, 2017 at 02:18:37AM -0600, Jason Gunthorpe wrote: > On Sun, Oct 15, 2017 at 08:47:46AM +0300, Yuval Shaia wrote: > > > > > As far as I understand ipoib_netlink.c, ipoib_new_child_link() already > > > > implements it and it is supported in "ip". > > > > > > Oh right: > > > > > > ip link add DEVICE name NAME type ipoib [ pkey PKEY ] [mode MODE ] > > > > So with this interface we can entirely remove the sysfs interface to create > > child, right? > > Yes, we should add a deprecation one shot printk to the kernel for the > sysfs interface to encourage people to use ip Please don't do that, it won't help for anyone, and especially for the people who didn't hear about "ip" in 2017. IPoIB netlink doesn't support enhanced IPoIB device because child device in netlink code was not allocated with rdma_alloc_netdev call as it was done for other flows. Thanks > > Jason
On Tue, Oct 17, 2017 at 01:21:21PM +0300, Leon Romanovsky wrote: > > Yes, we should add a deprecation one shot printk to the kernel for the > > sysfs interface to encourage people to use ip > > Please don't do that, it won't help for anyone, and especially for the people > who didn't hear about "ip" in 2017. This is the standard kernel way to encourage people to use the new interfaces... > IPoIB netlink doesn't support enhanced IPoIB device because child > device in netlink code was not allocated with rdma_alloc_netdev call > as it was done for other flows. I don't understand, isn't this just a (bad) bug to be fixed? Jason
On Thu, Oct 19, 2017 at 12:11:00AM -0600, Jason Gunthorpe wrote: > On Tue, Oct 17, 2017 at 01:21:21PM +0300, Leon Romanovsky wrote: > > > Yes, we should add a deprecation one shot printk to the kernel for the > > > sysfs interface to encourage people to use ip > > > > Please don't do that, it won't help for anyone, and especially for the people > > who didn't hear about "ip" in 2017. > > This is the standard kernel way to encourage people to use the new > interfaces... > > > IPoIB netlink doesn't support enhanced IPoIB device because child > > device in netlink code was not allocated with rdma_alloc_netdev call > > as it was done for other flows. > > I don't understand, isn't this just a (bad) bug to be fixed? Yes and we want to fix it BEFORE adding discouraging warnings. Thanks > > Jason
On 10/19/2017 9:14 AM, Leon Romanovsky wrote: > On Thu, Oct 19, 2017 at 12:11:00AM -0600, Jason Gunthorpe wrote: >> On Tue, Oct 17, 2017 at 01:21:21PM +0300, Leon Romanovsky wrote: >>>> Yes, we should add a deprecation one shot printk to the kernel for the >>>> sysfs interface to encourage people to use ip >>> Please don't do that, it won't help for anyone, and especially for the people >>> who didn't hear about "ip" in 2017. >> This is the standard kernel way to encourage people to use the new >> interfaces... >> >>> IPoIB netlink doesn't support enhanced IPoIB device because child >>> device in netlink code was not allocated with rdma_alloc_netdev call >>> as it was done for other flows. >> I don't understand, isn't this just a (bad) bug to be fixed? > Yes and we want to fix it BEFORE adding discouraging warnings. > > Thanks > >> Jason We plan to block ipoib_netlink in the meanwhile until we fix it. The fix is a little tricky since the netdev is allocated in rtnl_newlink and it should be allocated with the new logic introduced in enhanced IPoIB, setting rn_ops, allocating an rdma_netdev if possible.
On Thu, Oct 19, 2017 at 09:28:31AM +0300, Alex Vesker wrote: > On 10/19/2017 9:14 AM, Leon Romanovsky wrote: > >On Thu, Oct 19, 2017 at 12:11:00AM -0600, Jason Gunthorpe wrote: > >>On Tue, Oct 17, 2017 at 01:21:21PM +0300, Leon Romanovsky wrote: > >>>>Yes, we should add a deprecation one shot printk to the kernel for the > >>>>sysfs interface to encourage people to use ip > >>>Please don't do that, it won't help for anyone, and especially for the people > >>>who didn't hear about "ip" in 2017. > >>This is the standard kernel way to encourage people to use the new > >>interfaces... > >> > >>>IPoIB netlink doesn't support enhanced IPoIB device because child > >>>device in netlink code was not allocated with rdma_alloc_netdev call > >>>as it was done for other flows. > >>I don't understand, isn't this just a (bad) bug to be fixed? > >Yes and we want to fix it BEFORE adding discouraging warnings. > We plan to block ipoib_netlink in the meanwhile until we fix it. That isn't a good idea, don't break APIs for performance reasons.. > The fix is a little tricky since the netdev is allocated in > rtnl_newlink and it should be allocated with the new logic > introduced in enhanced IPoIB, setting rn_ops, allocating an > rdma_netdev if possible. Should get on it then .. :) Jason
On Sat, Oct 21, 2017 at 01:42:15PM -0600, Jason Gunthorpe wrote: > On Thu, Oct 19, 2017 at 09:28:31AM +0300, Alex Vesker wrote: > > On 10/19/2017 9:14 AM, Leon Romanovsky wrote: > > >On Thu, Oct 19, 2017 at 12:11:00AM -0600, Jason Gunthorpe wrote: > > >>On Tue, Oct 17, 2017 at 01:21:21PM +0300, Leon Romanovsky wrote: > > >>>>Yes, we should add a deprecation one shot printk to the kernel for the > > >>>>sysfs interface to encourage people to use ip > > >>>Please don't do that, it won't help for anyone, and especially for the people > > >>>who didn't hear about "ip" in 2017. > > >>This is the standard kernel way to encourage people to use the new > > >>interfaces... > > >> > > >>>IPoIB netlink doesn't support enhanced IPoIB device because child > > >>>device in netlink code was not allocated with rdma_alloc_netdev call > > >>>as it was done for other flows. > > >>I don't understand, isn't this just a (bad) bug to be fixed? > > >Yes and we want to fix it BEFORE adding discouraging warnings. > > > We plan to block ipoib_netlink in the meanwhile until we fix it. > > That isn't a good idea, don't break APIs for performance reasons.. It is not "performance", but simple attempt to prevent kernel panic, while calling that function, because I'm unsure if we success to provide proper fix till merge window. > > > The fix is a little tricky since the netdev is allocated in > > rtnl_newlink and it should be allocated with the new logic > > introduced in enhanced IPoIB, setting rn_ops, allocating an > > rdma_netdev if possible. > > Should get on it then .. :) > > Jason
On Mon, Oct 23, 2017 at 09:04:36AM +0300, Leon Romanovsky wrote: > On Sat, Oct 21, 2017 at 01:42:15PM -0600, Jason Gunthorpe wrote: > > On Thu, Oct 19, 2017 at 09:28:31AM +0300, Alex Vesker wrote: > > > On 10/19/2017 9:14 AM, Leon Romanovsky wrote: > > > >On Thu, Oct 19, 2017 at 12:11:00AM -0600, Jason Gunthorpe wrote: > > > >>On Tue, Oct 17, 2017 at 01:21:21PM +0300, Leon Romanovsky wrote: > > > >>>>Yes, we should add a deprecation one shot printk to the kernel for the > > > >>>>sysfs interface to encourage people to use ip > > > >>>Please don't do that, it won't help for anyone, and especially for the people > > > >>>who didn't hear about "ip" in 2017. > > > >>This is the standard kernel way to encourage people to use the new > > > >>interfaces... > > > >> > > > >>>IPoIB netlink doesn't support enhanced IPoIB device because child > > > >>>device in netlink code was not allocated with rdma_alloc_netdev call > > > >>>as it was done for other flows. > > > >>I don't understand, isn't this just a (bad) bug to be fixed? > > > >Yes and we want to fix it BEFORE adding discouraging warnings. > > > > > We plan to block ipoib_netlink in the meanwhile until we fix it. > > > > That isn't a good idea, don't break APIs for performance reasons.. > > It is not "performance", but simple attempt to prevent kernel panic, > while calling that function, because I'm unsure if we success to provide > proper fix till merge window. Er, why should a non-accelerated ipoib instance panic? That makes no sense :( In any event, someone needs to send patches to fix these oops's! Jason
On Mon, Oct 23, 2017 at 09:23:40AM -0600, Jason Gunthorpe wrote: > On Mon, Oct 23, 2017 at 09:04:36AM +0300, Leon Romanovsky wrote: > > On Sat, Oct 21, 2017 at 01:42:15PM -0600, Jason Gunthorpe wrote: > > > On Thu, Oct 19, 2017 at 09:28:31AM +0300, Alex Vesker wrote: > > > > On 10/19/2017 9:14 AM, Leon Romanovsky wrote: > > > > >On Thu, Oct 19, 2017 at 12:11:00AM -0600, Jason Gunthorpe wrote: > > > > >>On Tue, Oct 17, 2017 at 01:21:21PM +0300, Leon Romanovsky wrote: > > > > >>>>Yes, we should add a deprecation one shot printk to the kernel for the > > > > >>>>sysfs interface to encourage people to use ip > > > > >>>Please don't do that, it won't help for anyone, and especially for the people > > > > >>>who didn't hear about "ip" in 2017. > > > > >>This is the standard kernel way to encourage people to use the new > > > > >>interfaces... > > > > >> > > > > >>>IPoIB netlink doesn't support enhanced IPoIB device because child > > > > >>>device in netlink code was not allocated with rdma_alloc_netdev call > > > > >>>as it was done for other flows. > > > > >>I don't understand, isn't this just a (bad) bug to be fixed? > > > > >Yes and we want to fix it BEFORE adding discouraging warnings. > > > > > > > We plan to block ipoib_netlink in the meanwhile until we fix it. > > > > > > That isn't a good idea, don't break APIs for performance reasons.. > > > > It is not "performance", but simple attempt to prevent kernel panic, > > while calling that function, because I'm unsure if we success to provide > > proper fix till merge window. > > Er, why should a non-accelerated ipoib instance panic? That makes no > sense :( > > In any event, someone needs to send patches to fix these oops's! Agree, Alex is supposed to work on it. > > Jason
diff --git a/Documentation/infiniband/ipoib.txt b/Documentation/infiniband/ipoib.txt
index 47c1dd9818f2..1db53c9b2906 100644
--- a/Documentation/infiniband/ipoib.txt
+++ b/Documentation/infiniband/ipoib.txt
@@ -21,6 +21,18 @@ Partitions and P_Keys
 
     echo 0x8001 > /sys/class/net/ib0/delete_child
 
+  Interfaces with a user chosen name can be created in a similar
+  manner with a different name and P_Key, by writing them into the
+  main interface's /sys/class/net/<intf name>/create_named_child
+  For example:
+    echo "epart2 0x8002" > /sys/class/net/ib1/create_named_child
+
+  This will create an interfaces named epart2 with P_Key 0x8002 and
+  parent ib1. To remove a named subinterface, use the
+  "delete_named_child" file:
+
+    echo epart2 > /sys/class/net/ib1/delete_named_child
+
   The P_Key for any interface is given by the "pkey" file, and the
   main interface for a subinterface is in "parent."
 
diff --git a/drivers/infiniband/ulp/ipoib/ipoib.h b/drivers/infiniband/ulp/ipoib/ipoib.h
index 4a5c7a07a631..9d0010f9b324 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib.h
+++ b/drivers/infiniband/ulp/ipoib/ipoib.h
@@ -589,6 +589,9 @@ void ipoib_event(struct ib_event_handler *handler,
 
 int ipoib_vlan_add(struct net_device *pdev, unsigned short pkey);
 int ipoib_vlan_delete(struct net_device *pdev, unsigned short pkey);
+int ipoib_named_vlan_add(struct net_device *pdev, unsigned short pkey,
+			 char *child_name_buf);
+int ipoib_named_vlan_delete(struct net_device *pdev, char *child_name_buf);
 
 int __ipoib_vlan_add(struct ipoib_dev_priv *ppriv, struct ipoib_dev_priv *priv,
 		     u16 pkey, int child_type);
diff --git a/drivers/infiniband/ulp/ipoib/ipoib_main.c b/drivers/infiniband/ulp/ipoib/ipoib_main.c
index bac95b509a9b..2bdd4055d69f 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_main.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_main.c
@@ -34,6 +34,7 @@
 
 #include "ipoib.h"
 
+#include <linux/ctype.h>
 #include <linux/module.h>
 
 #include <linux/init.h>
@@ -136,6 +137,13 @@ static int ipoib_netdev_event(struct notifier_block *this,
 }
 #endif
 
+/*
+ * PKEY_HEXSTRING_MAXWIDTH - number of hex
+ * digits needed to represent max width of
+ * pkey value.
+ */
+#define PKEY_HEXSTRING_MAXWIDTH 4
+
 int ipoib_open(struct net_device *dev)
 {
 	struct ipoib_dev_priv *priv = ipoib_priv(dev);
@@ -2111,6 +2119,121 @@ static int ipoib_set_mac(struct net_device *dev, void *addr)
 	return 0;
 }
 
+/*
+ * Check if a buffer has name of the format
+ *
+ * <network-device-name>.<4hexcharacters>
+ * e.g. ib1.8004 etc.
+ *
+ * Such names are generated by create_child() by
+ * concatenating parent device with 16-bit pkey
+ * in hex, and disallowed from usage with
+ * create_named_child() interface.
+ *
+ */
+static bool ipoib_disallowed_named_child_namespace(const char *buf)
+{
+	char localbuf[IFNAMSIZ];
+	char *dotp = NULL;
+	char *buf_before_dot = NULL;
+	char *buf_after_dot = NULL;
+	unsigned int ii;
+
+	memcpy(localbuf, buf, IFNAMSIZ);
+	localbuf[IFNAMSIZ-1] = '\0'; /* paranoia! */
+
+	dotp = strnchr(localbuf, IFNAMSIZ, '.');
+	/* no dot or dot at end! */
+	if (dotp == NULL || dotp == localbuf+IFNAMSIZ-2)
+		return false;
+
+	*dotp = '\0'; /* split buffer at "dot" */
+	buf_before_dot = localbuf;
+	buf_after_dot = dotp + 1;
+
+	/*
+	 * Check if buf_after_dot is hexstring of width
+	 * that could be a pkey!
+	 */
+	if (strlen(buf_after_dot) != PKEY_HEXSTRING_MAXWIDTH)
+		return false;
+
+	for (ii = 0; ii < PKEY_HEXSTRING_MAXWIDTH; ii++) {
+		if (!isxdigit(buf_after_dot[ii]))
+			return false;
+	}
+
+	/*
+	 * (1) buf_after_dot check above makes it valid hexdigit .XXXX format
+	 *
+	 * Now verify if buf_before_dot is a valid net device name -
+	 * (if it is not, then we are not in disallowed namespace)
+	 */
+	if (__dev_get_by_name(&init_net, buf_before_dot) == NULL)
+		return false;
+
+	/*
+	 * (2) buf_before_dot is valid net device name
+	 * - reserved namespace is being used!
+	 *
+	 * Note: No check on netdev->type to be ARPHRD_INFINIBAND etc
+	 * We implicitly treat even misleading names such as eth1.XXXX
+	 * (ethernet device prefix) for child interface name of an
+	 * infiniband device as intrusion of reserved namespace!
+	 */
+	return true;
+}
+
+static int parse_named_child(struct device *dev, const char *buf,
+			     char *child_name_buf, int *pkeyp)
+{
+	int ret;
+	struct ipoib_dev_priv *priv = ipoib_priv(to_net_dev(dev));
+
+	if (pkeyp)
+		*pkeyp = -1;
+
+	/*
+	 * First parameter is child interface name, after that
+	 * 'pkey' is required if we were passed a pkey buffer
+	 * (Note: From create_named_child, we are passed a pkey
+	 * buffer to parse input, from delete_named_child we are
+	 * not!)
+	 * Note: IFNAMSIZ is 16, allowing for tail null
+	 * we only scan 15 characters for name.
+	 */
+	if (pkeyp) {
+		ret = sscanf(buf, "%15s %i", child_name_buf, pkeyp);
+		if (ret != 2)
+			return -EINVAL;
+	} else {
+		ret = sscanf(buf, "%15s", child_name_buf);
+		if (ret != 1)
+			return -EINVAL;
+	}
+
+	if (strlen(child_name_buf) <= 0 || !dev_valid_name(child_name_buf))
+		return -EINVAL;
+
+	if (pkeyp && (*pkeyp <= 0 || *pkeyp > 0xffff || *pkeyp == 0x8000))
+		return -EINVAL;
+
+	if (ipoib_disallowed_named_child_namespace(child_name_buf)) {
+		pr_warn("child name %s not allowed to be used with create_named_child as it uses <network-device-name>.XXXX format reserved for create_child/delete_child interfaces!\n",
+			child_name_buf);
+		return -EINVAL;
+	}
+
+	if (pkeyp)
+		ipoib_dbg(priv, "%s inp %s out child_name_buf %s, pkey %04x\n",
+			  __func__, buf, child_name_buf, *pkeyp);
+	else
+		ipoib_dbg(priv, "%s inp %s out child_name_buf %s\n", __func__,
+			  buf, child_name_buf);
+	return 0;
+}
+
+
 static ssize_t create_child(struct device *dev,
 			    struct device_attribute *attr,
 			    const char *buf, size_t count)
@@ -2156,6 +2279,44 @@ static ssize_t delete_child(struct device *dev,
 }
 static DEVICE_ATTR(delete_child, S_IWUSR, NULL, delete_child);
 
+static ssize_t create_named_child(struct device *dev,
+				  struct device_attribute *attr,
+				  const char *buf, size_t count)
+{
+	int pkey;
+	char child_name[IFNAMSIZ];
+	int ret = 0;
+
+	child_name[0] = '\0';
+
+	if (parse_named_child(dev, buf, child_name, &pkey))
+		return -EINVAL;
+
+	ret = ipoib_named_vlan_add(to_net_dev(dev), pkey, child_name);
+	return ret ? ret : count;
+}
+static DEVICE_ATTR(create_named_child, S_IWUSR, NULL, create_named_child);
+
+static ssize_t delete_named_child(struct device *dev,
+				  struct device_attribute *attr,
+				  const char *buf, size_t count)
+{
+	char child_name[IFNAMSIZ];
+	int ret = 0;
+
+	child_name[0] = '\0';
+
+	if (parse_named_child(dev, buf, child_name, NULL))
+		return -EINVAL;
+
+	ret = ipoib_named_vlan_delete(to_net_dev(dev), child_name);
+
+	return ret ? ret : count;
+
+}
+static DEVICE_ATTR(delete_named_child, S_IWUSR, NULL, delete_named_child);
+
+
 int ipoib_add_pkey_attr(struct net_device *dev)
 {
 	return device_create_file(&dev->dev, &dev_attr_pkey);
@@ -2263,6 +2424,11 @@ static struct net_device *ipoib_add_port(const char *format,
 		goto sysfs_failed;
 	if (device_create_file(&priv->dev->dev, &dev_attr_delete_child))
 		goto sysfs_failed;
+	if (device_create_file(&priv->dev->dev, &dev_attr_create_named_child))
+		goto sysfs_failed;
+	if (device_create_file(&priv->dev->dev, &dev_attr_delete_named_child))
+		goto sysfs_failed;
+
 
 	return priv->dev;
 
@@ -2367,6 +2533,27 @@ static struct notifier_block ipoib_netdev_notifier = {
 };
 #endif
 
+int
+ipoib_get_netdev_pkey(struct net_device *dev, u16 *pkey)
+{
+	struct ipoib_dev_priv *priv;
+
+	if (dev->type != ARPHRD_INFINIBAND)
+		return -EINVAL;
+
+	/* only for ipoib net devices! */
+	if ((dev->netdev_ops != &ipoib_netdev_ops_pf) &&
+	    (dev->netdev_ops != &ipoib_netdev_ops_vf))
+		return -EINVAL;
+
+	priv = ipoib_priv(dev);
+
+	*pkey = priv->pkey;
+
+	return 0;
+}
+EXPORT_SYMBOL(ipoib_get_netdev_pkey);
+
 static int __init ipoib_init_module(void)
 {
 	int ret;
diff --git a/drivers/infiniband/ulp/ipoib/ipoib_vlan.c b/drivers/infiniband/ulp/ipoib/ipoib_vlan.c
index 9927cd6b7082..f5ae55f4f845 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_vlan.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_vlan.c
@@ -115,7 +115,9 @@ int __ipoib_vlan_add(struct ipoib_dev_priv *ppriv, struct ipoib_dev_priv *priv,
 	return result;
 }
 
-int ipoib_vlan_add(struct net_device *pdev, unsigned short pkey)
+int ipoib_vlan_add_common(struct net_device *pdev,
+			  unsigned short pkey,
+			  char *child_name_buf)
 {
 	struct ipoib_dev_priv *ppriv, *priv;
 	char intf_name[IFNAMSIZ];
@@ -130,8 +132,21 @@ int ipoib_vlan_add(struct net_device *pdev, unsigned short pkey)
 	if (test_bit(IPOIB_FLAG_GOING_DOWN, &ppriv->flags))
 		return -EPERM;
 
-	snprintf(intf_name, sizeof intf_name, "%s.%04x",
-		 ppriv->dev->name, pkey);
+	if (child_name_buf == NULL) {
+		/*
+		 * If child name is not provided, we generated
+		 * one using name of parent and pkey.
+		 */
+		snprintf(intf_name, sizeof(intf_name), "%s.%04x",
+			 ppriv->dev->name, pkey);
+	} else {
+		/*
+		 * Note: Duplicate intf_name will be detected later in the code
+		 * by register_netdevice() (inside __ipoib_vlan_add() call
+		 * below) returning EEXIST!
+		 */
+		strncpy(intf_name, child_name_buf, IFNAMSIZ);
+	}
 
 	if (!mutex_trylock(&ppriv->sysfs_mutex))
 		return restart_syscall();
@@ -183,10 +198,27 @@ int ipoib_vlan_add(struct net_device *pdev, unsigned short pkey)
 	return result;
 }
 
-int ipoib_vlan_delete(struct net_device *pdev, unsigned short pkey)
+int ipoib_vlan_add(struct net_device *pdev, unsigned short pkey)
+{
+	return ipoib_vlan_add_common(pdev, pkey, NULL);
+}
+
+int ipoib_named_vlan_add(struct net_device *pdev,
+			 unsigned short pkey,
+			 char *child_name_buf)
+{
+	return ipoib_vlan_add_common(pdev, pkey, child_name_buf);
+}
+
+int ipoib_vlan_delete_common(struct net_device *pdev,
+			     unsigned short pkey,
+			     char *child_name_buf)
 {
 	struct ipoib_dev_priv *ppriv, *priv, *tpriv;
 	struct net_device *dev = NULL;
+	char gen_intf_name[IFNAMSIZ];
+
+	gen_intf_name[0] = '\0'; /* initialize - paranoia! */
 
 	if (!capable(CAP_NET_ADMIN))
 		return -EPERM;
@@ -205,9 +237,30 @@ int ipoib_vlan_delete(struct net_device *pdev, unsigned short pkey)
 	}
 
 	down_write(&ppriv->vlan_rwsem);
+	if (child_name_buf == NULL && ppriv->dev) {
+		/*
+		 * If child name is not provided, we generate the
+		 * expected one using name of parent and pkey
+		 * and use it in addition to pkey value
+		 * (other children with same pkey may exist that have
+		 * created by create_named_child() - we do not allow
+		 * delete_child() to delete them - delete_named_child()
+		 * has to be used!)
+		 */
+		snprintf(gen_intf_name, sizeof(gen_intf_name),
+			 "%s.%04x", ppriv->dev->name, pkey);
+	}
 	list_for_each_entry_safe(priv, tpriv, &ppriv->child_intfs, list) {
-		if (priv->pkey == pkey &&
-		    priv->child_type == IPOIB_LEGACY_CHILD) {
+		if ((priv->child_type == IPOIB_LEGACY_CHILD) &&
+		    /* user named child (match by name) OR */
+		    ((child_name_buf && priv->dev &&
+		      !strcmp(child_name_buf, priv->dev->name)) ||
+		     /*
+		      * OR classic (devname.hexpkey generated name) child
+		      * (match by pkey and generated name)
+		      */
+		     (!child_name_buf && priv->pkey == pkey &&
+		      priv->dev && !strcmp(gen_intf_name, priv->dev->name)))) {
 			list_del(&priv->list);
 			dev = priv->dev;
 			break;
@@ -231,3 +284,14 @@ int ipoib_vlan_delete(struct net_device *pdev, unsigned short pkey)
 
 	return -ENODEV;
 }
+
+int ipoib_vlan_delete(struct net_device *pdev, unsigned short pkey)
+{
+
+	return ipoib_vlan_delete_common(pdev, pkey, NULL);
+}
+
+int ipoib_named_vlan_delete(struct net_device *pdev, char *child_name_buf)
+{
+	return ipoib_vlan_delete_common(pdev, 0, child_name_buf);
+}
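
For readers who want to try the reserved-name rule outside the kernel, the following is a minimal userspace sketch of the format half of ipoib_disallowed_named_child_namespace(): it only tests whether a name has the shape <prefix>.<4 hex digits>. The helper name looks_like_legacy_child_name() is hypothetical and not part of the patch, IFNAMSIZ is redefined locally with the kernel's value of 16, and the netdev lookup (the __dev_get_by_name() step that the review comment flags for using &init_net) is deliberately left out, so this may report true for prefixes the kernel code would accept.

```c
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

#define IFNAMSIZ 16                /* local copy of the kernel's value */
#define PKEY_HEXSTRING_MAXWIDTH 4  /* hex digits in a 16-bit P_Key */

/*
 * Hypothetical userspace helper, not part of the patch.
 * Returns true when name looks like "<prefix>.<4 hex digits>", the
 * shape that create_child() generates (for example "ib1.8004").
 * Assumption: unlike the kernel function, no check is made that
 * <prefix> is an existing network device.
 */
bool looks_like_legacy_child_name(const char *name)
{
	size_t len = strlen(name);
	const char *dot;
	const char *suffix;
	size_t i;

	/* A valid interface name fits in IFNAMSIZ including the NUL. */
	if (len == 0 || len >= IFNAMSIZ)
		return false;

	/* Split at the first dot, as the patch does with strnchr(). */
	dot = strchr(name, '.');
	if (dot == NULL || dot == name)
		return false;

	/* The part after the dot must be exactly four hex digits. */
	suffix = dot + 1;
	if (strlen(suffix) != PKEY_HEXSTRING_MAXWIDTH)
		return false;

	for (i = 0; i < PKEY_HEXSTRING_MAXWIDTH; i++) {
		if (!isxdigit((unsigned char)suffix[i]))
			return false;
	}

	return true;
}
```

With this sketch, "ib1.8004" and "eth1.00ff" are treated as reserved-looking names, while "epart2", "ib1.804" and "ib1.80zz" are not.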