Message ID | 1432225447-6536-2-git-send-email-ogerlitz@mellanox.com (mailing list archive) |
---|---|
State | Changes Requested |
Headers | show |
On Thu, May 21, 2015 at 07:24:05PM +0300, Or Gerlitz wrote: > Standard configuration of SRIOV VFs through the host is done over the > following chain of calls: libvirt --> netlink --> PF netdevice > > When this comes to IB/IPoIB we should normalize this into the verbs > framework so we further go: PF IPoIB --> verbs API --> PF HW driver > > Virtualization systems assign VMs 48 bits mac, to allow working with > non-modified SW layers (open-stack, libvirt, etc), we can safely > extend this mac to unique 64 bits GUID. Hence the IPoIB ndo_set_vf_mac > entry calls the set_vf_guid verb. > > Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> > drivers/infiniband/ulp/ipoib/ipoib_main.c | 39 +++++++++++++++++++++++++++++ > include/rdma/ib_verbs.h | 4 +++ > 2 files changed, 43 insertions(+), 0 deletions(-) > > diff --git a/drivers/infiniband/ulp/ipoib/ipoib_main.c b/drivers/infiniband/ulp/ipoib/ipoib_main.c > index 9e1b203..8f82870 100644 > +++ b/drivers/infiniband/ulp/ipoib/ipoib_main.c > @@ -1357,6 +1357,43 @@ void ipoib_dev_cleanup(struct net_device *dev) > priv->tx_ring = NULL; > } > > +static int ipoib_get_vf_config(struct net_device *dev, int vf, struct ifla_vf_info *ivf) > +{ > + struct ipoib_dev_priv *priv = netdev_priv(dev); > + > + if (priv->ca->get_vf_config) > + return priv->ca->get_vf_config(priv->ca, priv->port, vf, ivf); > + else > + return -EINVAL; > +} > + > +static int ipoib_set_vf_mac(struct net_device *dev, int queue, u8 *mac) > +{ > + char *raw_guid; > + u64 guid = 0; This doesn't seem right at all. It makes no sense that a IPoIB interface with a 20 byte LLADDR would accept an 8 byte LLADDR only for 'ip link set vf mac' The definition of the netlink struct seems to confirm this: struct ifla_vf_mac { __u32 vf; __u8 mac[32]; /* MAX_ADDR_LEN */ }; If it was really just ever a mac it would really only be 6 bytes. [Honestly, this whole feature seems very inconistent with the rest of the design of ip net link, so who knows] If the ifla_vf_mac had been variable-sized (like every other address related attribute) then sure, auto detect the length and do the right thing. But with this API, I think you have no choice, 'ip set vf mac LLADDR' can only be the 20 byte address. If you really, really want this: Get someone from iproute or netdev to sign off that they really mean that 'ip set vf mac LLADDR' is always a 6 byte mac. Jason -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Thu, May 21, 2015, Jason Gunthorpe wrote: >> +static int ipoib_set_vf_mac(struct net_device *dev, int queue, u8 *mac) >> +{ >> + char *raw_guid; >> + u64 guid = 0; > > This doesn't seem right at all. > > It makes no sense that a IPoIB interface with a 20 byte LLADDR would > accept an 8 byte LLADDR only for 'ip link set vf mac' > > The definition of the netlink struct seems to confirm this: > > struct ifla_vf_mac { > __u32 vf; > __u8 mac[32]; /* MAX_ADDR_LEN */ > }; > > If it was really just ever a mac it would really only be 6 bytes. > [Honestly, this whole feature seems very inconistent with the rest of > the design of ip net link, so who knows] > > If the ifla_vf_mac had been variable-sized (like every other address > related attribute) then sure, auto detect the length and do the right > thing. > > But with this API, I think you have no choice, 'ip set vf mac LLADDR' > can only be the 20 byte address. You can't enforce 20 byte address on ipoib instance b/c the driver can't dictate the QPN of their UD QP. Also, you don't need to force that, b/c what the virtualization system want to provision relates to a VM ID which is their mac (--> guid). If the ifla_vf_mac had been variable-sized we could have the ipoib set_vf_mac implementation to check if user-space provided 48 bits (mac), 64bits (guid) or even 128bits (whole gid), but it doesn't and I would like to either use the lowest common denominator (48bits) or just take always 64bits which could have two zero bytes (e.g when libvirt calls into that api through netlink). -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Thu, May 21, 2015 at 11:05:32PM +0300, Or Gerlitz wrote: > On Thu, May 21, 2015, Jason Gunthorpe wrote: > > >> +static int ipoib_set_vf_mac(struct net_device *dev, int queue, u8 *mac) > >> +{ > >> + char *raw_guid; > >> + u64 guid = 0; > > > > This doesn't seem right at all. > > > > It makes no sense that a IPoIB interface with a 20 byte LLADDR would > > accept an 8 byte LLADDR only for 'ip link set vf mac' > > > > The definition of the netlink struct seems to confirm this: > > > > struct ifla_vf_mac { > > __u32 vf; > > __u8 mac[32]; /* MAX_ADDR_LEN */ > > }; > > > > If it was really just ever a mac it would really only be 6 bytes. > > [Honestly, this whole feature seems very inconistent with the rest of > > the design of ip net link, so who knows] > > > > If the ifla_vf_mac had been variable-sized (like every other address > > related attribute) then sure, auto detect the length and do the right > > thing. > > > > But with this API, I think you have no choice, 'ip set vf mac LLADDR' > > can only be the 20 byte address. > > You can't enforce 20 byte address on ipoib instance b/c the driver > can't dictate the QPN of their UD QP. I can think of several ways off the top of my head to deal with that. Don't see it as a problem. > Also, you don't need to force that, b/c what the virtualization > system want to provision relates to a VM ID which is their mac (--> > guid). No, I do need to force that because that is how netlink works: each device has a defined hw address length, and all netlink messages dealing with LLADDR must use an identical format for the hw address. For IPoIB that is 20 byte. It is a simple matter of UAPI sanity, it is not the kernel's problem that some userspace tools assume a 6 byte LLADDR for all devices - they are broken. > (mac), 64bits (guid) or even 128bits (whole gid), but it doesn't and I > would like to either use the lowest common denominator (48bits) or > just take always 64bits which could have two zero bytes (e.g when > libvirt calls into that api through netlink). Then get the iproute and netdev people to sign off on using ifla_vf_mac with something other than the defined 20 byte IPoIB HW address. Maybe you can work out some scheme with them. Making the IFLA_VF_MAC properly variable sized might be a good angle. NACK otherwise. The rest of the series's idea seems sound. Jason -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Thu, 2015-05-21 at 12:46 -0600, Jason Gunthorpe wrote: > On Thu, May 21, 2015 at 07:24:05PM +0300, Or Gerlitz wrote: > > Standard configuration of SRIOV VFs through the host is done over the > > following chain of calls: libvirt --> netlink --> PF netdevice > > > > When this comes to IB/IPoIB we should normalize this into the verbs > > framework so we further go: PF IPoIB --> verbs API --> PF HW driver > > > > Virtualization systems assign VMs 48 bits mac, to allow working with > > non-modified SW layers (open-stack, libvirt, etc), we can safely > > extend this mac to unique 64 bits GUID. Hence the IPoIB ndo_set_vf_mac > > entry calls the set_vf_guid verb. > > > > Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> > > drivers/infiniband/ulp/ipoib/ipoib_main.c | 39 +++++++++++++++++++++++++++++ > > include/rdma/ib_verbs.h | 4 +++ > > 2 files changed, 43 insertions(+), 0 deletions(-) > > > > diff --git a/drivers/infiniband/ulp/ipoib/ipoib_main.c b/drivers/infiniband/ulp/ipoib/ipoib_main.c > > index 9e1b203..8f82870 100644 > > +++ b/drivers/infiniband/ulp/ipoib/ipoib_main.c > > @@ -1357,6 +1357,43 @@ void ipoib_dev_cleanup(struct net_device *dev) > > priv->tx_ring = NULL; > > } > > > > +static int ipoib_get_vf_config(struct net_device *dev, int vf, struct ifla_vf_info *ivf) > > +{ > > + struct ipoib_dev_priv *priv = netdev_priv(dev); > > + > > + if (priv->ca->get_vf_config) > > + return priv->ca->get_vf_config(priv->ca, priv->port, vf, ivf); > > + else > > + return -EINVAL; > > +} > > + > > +static int ipoib_set_vf_mac(struct net_device *dev, int queue, u8 *mac) > > +{ > > + char *raw_guid; > > + u64 guid = 0; > > This doesn't seem right at all. > > It makes no sense that a IPoIB interface with a 20 byte LLADDR would > accept an 8 byte LLADDR only for 'ip link set vf mac' It has to be the 8byte guid, the next 8bytes are the subnet prefix which is set by the SM and not under driver control, and the final 4bytes are specific to the IPoIB connection. You could pass in all 20bytes, but you would only be able to set 8 of them.
On Thu, May 21, 2015 at 05:05:31PM -0400, Doug Ledford wrote: > It has to be the 8byte guid, the next 8bytes are the subnet prefix which > is set by the SM and not under driver control, and the final 4bytes are > specific to the IPoIB connection. You could pass in all 20bytes, but > you would only be able to set 8 of them. Of course you can't set all 20 bytes, but you still have to provide them, that is the defined LLADDR format for this device. 1) Kernel demands immutable bits are zero and fills them in automatically 2) Userspace copies the current LLADDR and only modifies the 8 bytes of the GUID, kernel demands immutable bits are unchanged 3) Who is to say the QPN can't be made settable someday? People have complained about the unstable QPN in the past. Jason -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/drivers/infiniband/ulp/ipoib/ipoib_main.c b/drivers/infiniband/ulp/ipoib/ipoib_main.c index 9e1b203..8f82870 100644 --- a/drivers/infiniband/ulp/ipoib/ipoib_main.c +++ b/drivers/infiniband/ulp/ipoib/ipoib_main.c @@ -1357,6 +1357,43 @@ void ipoib_dev_cleanup(struct net_device *dev) priv->tx_ring = NULL; } +static int ipoib_get_vf_config(struct net_device *dev, int vf, struct ifla_vf_info *ivf) +{ + struct ipoib_dev_priv *priv = netdev_priv(dev); + + if (priv->ca->get_vf_config) + return priv->ca->get_vf_config(priv->ca, priv->port, vf, ivf); + else + return -EINVAL; +} + +static int ipoib_set_vf_mac(struct net_device *dev, int queue, u8 *mac) +{ + char *raw_guid; + u64 guid = 0; + + struct ipoib_dev_priv *priv = netdev_priv(dev); + + raw_guid = (char *)&guid; + raw_guid[0] = mac[0]; + raw_guid[1] = mac[1]; + raw_guid[2] = mac[2]; + raw_guid[3] = 0xff; + raw_guid[4] = 0xfe; + raw_guid[5] = mac[3]; + raw_guid[6] = mac[4]; + raw_guid[7] = mac[5]; + + guid &= ~(cpu_to_be64(1ULL << 56)); + guid |= cpu_to_be64(1ULL << 57); + + if (priv->ca->set_vf_guid) + return priv->ca->set_vf_guid(priv->ca, priv->port, queue, guid); + else + return -EINVAL; +} + + static const struct header_ops ipoib_header_ops = { .create = ipoib_hard_header, }; @@ -1371,6 +1408,8 @@ static const struct net_device_ops ipoib_netdev_ops = { .ndo_tx_timeout = ipoib_timeout, .ndo_set_rx_mode = ipoib_set_mcast_list, .ndo_get_iflink = ipoib_get_iflink, + .ndo_get_vf_config = ipoib_get_vf_config, + .ndo_set_vf_mac = ipoib_set_vf_mac, }; void ipoib_setup(struct net_device *dev) diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h index 65994a1..6589520 100644 --- a/include/rdma/ib_verbs.h +++ b/include/rdma/ib_verbs.h @@ -53,6 +53,7 @@ #include <linux/atomic.h> #include <linux/mmu_notifier.h> #include <asm/uaccess.h> +#include <linux/if_link.h> extern struct workqueue_struct *ib_wq; @@ -1653,6 +1654,9 @@ struct ib_device { int (*check_mr_status)(struct ib_mr *mr, u32 check_mask, struct ib_mr_status *mr_status); + int (*set_vf_guid) (struct ib_device *device, int port, int vf, u64 guid); + int (*get_vf_config)(struct ib_device *device, int port, int vf, struct ifla_vf_info *ivf); + struct ib_dma_mapping_ops *dma_ops; struct module *owner;
Standard configuration of SRIOV VFs through the host is done over the following chain of calls: libvirt --> netlink --> PF netdevice When this comes to IB/IPoIB we should normalize this into the verbs framework so we further go: PF IPoIB --> verbs API --> PF HW driver Virtualization systems assign VMs 48 bits mac, to allow working with non-modified SW layers (open-stack, libvirt, etc), we can safely extend this mac to unique 64 bits GUID. Hence the IPoIB ndo_set_vf_mac entry calls the set_vf_guid verb. Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> --- drivers/infiniband/ulp/ipoib/ipoib_main.c | 39 +++++++++++++++++++++++++++++ include/rdma/ib_verbs.h | 4 +++ 2 files changed, 43 insertions(+), 0 deletions(-)