| Message ID | 20201112192424.2742-4-parav@nvidia.com (mailing list archive) |
|---|---|
| State | Not Applicable |
| Series | Add mlx5 subfunction support |
On 11/12/20 12:24 PM, Parav Pandit wrote:
> Extended devlink interface for the user to add and delete port.
> Extend devlink to connect user requests to driver to add/delete
> such port in the device.
>
> When driver routines are invoked, devlink instance lock is not held.
> This enables driver to perform several devlink objects registration,
> unregistration such as (port, health reporter, resource etc)
> by using existing devlink APIs.
> This also helps to uniformly use the code for port unregistration
> during driver unload and during port deletion initiated by user.
>
> Examples of add, show and delete commands:
> $ devlink dev eswitch set pci/0000:06:00.0 mode switchdev
>
> $ devlink port show
> pci/0000:06:00.0/65535: type eth netdev ens2f0np0 flavour physical port 0 splittable false
>
> $ devlink port add pci/0000:06:00.0 flavour pcisf pfnum 0 sfnum 88
>
> $ devlink port show pci/0000:06:00.0/32768
> pci/0000:06:00.0/32768: type eth netdev eth0 flavour pcisf controller 0 pfnum 0 sfnum 88 external false splittable false
>   function:
>     hw_addr 00:00:00:00:88:88 state inactive opstate detached

There have to be limits on the number of subfunctions that can be created for a device. How does a user find that limit?

Also, it seems like there are hardware constraints at play. e.g., can a user reduce the number of queues used by the physical function to support more subfunctions? If so, how does a user programmatically learn about this limitation? e.g., devlink could have support to show resource sizing and configure constraints similar to what mlxsw has.
> From: David Ahern <dsahern@gmail.com>
> Sent: Wednesday, November 18, 2020 9:51 PM
>
> There have to be limits on the number of subfunctions that can be created for
> a device. How does a user find that limit?

Yes, this came up internally, but didn't really converge.
The devlink resource looked too verbose for average or simple use cases, but it may be fine.
The hurdle I faced with devlink resource is defining the granularity.

For example, one devlink instance deploys subfunctions on multiple PCI functions.
So how to name them? Currently we have controller and PFs in the port annotation.
So the resource names would be:
c0pf0_subfunctions -> for controller 0, pf 0
c1pf2_subfunctions -> for controller 1, pf 2

Couldn't convince myself to name it this way.

The example below looked simpler to use, but the plumbing doesn't exist for it.

$ devlink resource show pci/0000:03:00.0
pci/0000:03:00.0/1: name max_sfs count 256 controller 0 pf 0
pci/0000:03:00.0/2: name max_sfs count 100 controller 1 pf 0
pci/0000:03:00.0/3: name max_sfs count 64 controller 1 pf 1

$ devlink resource set pci/0000:03:00.0/1 max_sfs 100

The second option I was considering was to use port params, which doesn't sound right for a resource.

> Also, it seems like there are hardware constraints at play. e.g., can a user reduce
> the number of queues used by the physical function to support more
> subfunctions? If so, how does a user programmatically learn about this limitation?
> e.g., devlink could have support to show resource sizing and configure
> constraints similar to what mlxsw has.

Yes, need to figure out its naming. For mlx5, the number of queues doesn't have a relation to subfunctions.
But the PCI resource does, and this is something we want to do in the future, as you said, possibly using devlink resource.
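For reference, the first option above (per-controller/per-PF resources) maps onto the existing devlink resource API roughly as sketched below. This is only an illustration for the discussion: the resource names, IDs and counts are invented here, and mlx5 registers no such resources in this series.

/* Illustrative sketch only: expose per-(controller, PF) subfunction limits
 * through the existing devlink resource API. Names, IDs and sizes are
 * made up for this example.
 */
#include <net/devlink.h>

enum {
	EXAMPLE_RES_C0PF0_SFS = 1,
	EXAMPLE_RES_C1PF2_SFS = 2,
};

static int example_register_sf_limits(struct devlink *devlink)
{
	struct devlink_resource_size_params params;
	int err;

	/* allow resizing between 0 and 256 subfunctions, one entry at a time */
	devlink_resource_size_params_init(&params, 0, 256, 1,
					  DEVLINK_RESOURCE_UNIT_ENTRY);

	err = devlink_resource_register(devlink, "c0pf0_subfunctions", 256,
					EXAMPLE_RES_C0PF0_SFS,
					DEVLINK_RESOURCE_ID_PARENT_TOP, &params);
	if (err)
		return err;

	return devlink_resource_register(devlink, "c1pf2_subfunctions", 64,
					 EXAMPLE_RES_C1PF2_SFS,
					 DEVLINK_RESOURCE_ID_PARENT_TOP, &params);
}

Userspace would then read the limits with "devlink resource show" and resize them with "devlink resource set", much like the second example above, only with the granularity encoded in the resource name instead of in new attributes.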
On 11/18/20 10:02 AM, Parav Pandit wrote:
> Yes, need to figure out its naming. For mlx5, the number of queues doesn't have a relation to subfunctions.
> But the PCI resource does, and this is something we want to do in the future, as you said, possibly using devlink resource.

With ConnectX-4 Lx, for example, the netdev can have at most 63 queues, leaving 96-CPU servers a bit short - as an example of the limited number of queues that a NIC can handle (or currently exposes to the OS, not sure which). If I create a subfunction for ethernet traffic, how many queues are allocated to it by default, is it managed via ethtool like the PF, and is there an impact to the resources used by / available to the primary device?
On Wed, Nov 18, 2020 at 11:03:24AM -0700, David Ahern wrote:
> With Connectx-4 Lx for example the netdev can have at most 63 queues
What netdev calls a queue is really a "can the device deliver
interrupts and packets to a given per-CPU queue" and covers a whole
spectrum of smaller limits like RSS scheme, # of available interrupts,
ability of the device to create queues, etc.
CX4Lx can create a huge number of queues, but hits one of these limits
that means netdev's specific usage can't scale up. Other stuff like
RDMA doesn't have the same limits, and has tonnes of queues.
What seems to be needed is a resource controller concept like cgroup
has for processes. The system is really organized into a tree:
physical device
mlx5_core
/ | \ \ (aux bus)
netdev rdma vdpa SF etc
| (aux bus)
mlx5_core
/ \ (aux bus)
netdev vdpa
And it does make a lot of sense to start to talk about limits at each
tree level.
eg the top of the tree may have 128 physical interrupts. With 128 CPU
cores that isn't enough interrupts to support all of those things
concurrently.
So the user may want to configure:
- The first level netdev only gets 64,
- 3rd level mlx5_core gets 32
- Final level vdpa gets 8
Other stuff has to fight it out with the remaining shared interrupts.
In netdev land # of interrupts governs # of queues
For RDMA # of interrupts limits the CPU affinities for queues
VDPA limits the # of VMs that can use VT-d
The same story repeats for other less general resources, mlx5 also
has consumption of limited BAR space, and consumption of some limited
memory elements. These numbers are much bigger and may not need
explicit governing, but the general concept holds.
It would be very nice if the limit could be injected when the aux
device is created but before the driver is bound. I'm not sure how to
manage that though..
I assume other devices will be different, maybe some devices have a
limit on the number of total queues, or a limit on the number of
VDPA or RDMA devices.
Jason
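To make the tree above a bit more concrete, here is a sketch of how a parent driver spawns one aux-bus child (e.g. the netdev leaf). The example_* names and the "eth" id are illustrative, not mlx5's actual code.

/* Illustrative sketch: spawning one child of the aux-bus tree described
 * above. An auxiliary_driver registered for "<modname>.eth" is bound
 * automatically once the device is added.
 */
#include <linux/auxiliary_bus.h>
#include <linux/slab.h>
#include <linux/err.h>

static void example_adev_release(struct device *dev)
{
	struct auxiliary_device *adev = container_of(dev, struct auxiliary_device, dev);

	kfree(adev);
}

static struct auxiliary_device *example_spawn_child(struct device *parent, u32 id)
{
	struct auxiliary_device *adev;
	int err;

	adev = kzalloc(sizeof(*adev), GFP_KERNEL);
	if (!adev)
		return ERR_PTR(-ENOMEM);

	adev->name = "eth";
	adev->id = id;
	adev->dev.parent = parent;
	adev->dev.release = example_adev_release;

	err = auxiliary_device_init(adev);
	if (err) {
		kfree(adev);
		return ERR_PTR(err);
	}

	err = auxiliary_device_add(adev);
	if (err) {
		/* uninit drops the reference and ends up in the release above */
		auxiliary_device_uninit(adev);
		return ERR_PTR(err);
	}
	return adev;
}

The open question in the thread is where per-child limits (interrupts, queues, BAR space) would attach in that sequence - ideally between init and add, which is the "inject the limit before the driver is bound" problem mentioned above.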
> From: David Ahern <dsahern@gmail.com>
> Sent: Wednesday, November 18, 2020 11:33 PM
>
> With ConnectX-4 Lx, for example, the netdev can have at most 63 queues, leaving
> 96-CPU servers a bit short - as an example of the limited number of queues that
> a NIC can handle (or currently exposes to the OS, not sure which). If I create a
> subfunction for ethernet traffic, how many queues are allocated to it by default,
> is it managed via ethtool like the PF, and is there an impact to the resources
> used by / available to the primary device?

Jason already answered it with details.
Thanks a lot Jason.

Short answer to the ethtool question: yes, ethtool can change the number of queues for a subfunction just like for the PF.
In this patchset the default for a subfunction is the same number of queues as the PF.
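For completeness, the driver-side piece behind "ethtool -L" on a subfunction netdev is the usual channels ops. A minimal sketch follows, assuming made-up example_* names and a single combined channel count rather than mlx5's real implementation.

/* Sketch: ethtool channel hooks on a subfunction netdev. The example_*
 * names and the priv layout are hypothetical.
 */
#include <linux/ethtool.h>
#include <linux/netdevice.h>

struct example_sf_priv {
	unsigned int max_channels;	/* limit granted to this subfunction */
	unsigned int cur_channels;
};

static void example_sf_get_channels(struct net_device *dev,
				    struct ethtool_channels *ch)
{
	struct example_sf_priv *priv = netdev_priv(dev);

	ch->max_combined = priv->max_channels;
	ch->combined_count = priv->cur_channels;
}

static int example_sf_set_channels(struct net_device *dev,
				   struct ethtool_channels *ch)
{
	struct example_sf_priv *priv = netdev_priv(dev);

	if (!ch->combined_count || ch->combined_count > priv->max_channels)
		return -EINVAL;
	priv->cur_channels = ch->combined_count;
	/* a real driver would reallocate queues and interrupts here */
	return 0;
}

static const struct ethtool_ops example_sf_ethtool_ops = {
	.get_channels = example_sf_get_channels,
	.set_channels = example_sf_set_channels,
};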
On 11/18/20 11:38 AM, Jason Gunthorpe wrote:
> What seems to be needed is a resource controller concept like cgroup
> has for processes.
>
> And it does make a lot of sense to start to talk about limits at each
> tree level.

A lot of low-level resource details need to be summarized into a nicer user/config perspective to specify limits/allocations. Thanks for the detailed response.
On Wed, Nov 18, 2020 at 12:36:26PM -0700, David Ahern wrote:
> A lot of low-level resource details need to be summarized into a nicer
> user/config perspective to specify limits/allocations.

Well, now that we have the aux bus stuff there is a nice natural place
to put things..

The aux bus owner device (mlx5_core) could have a list of available
resources.

Each aux bus device (netdev/rdma/vdpa) could have a list of consumed
resources.

Some API to place a limit on the consumed resources at each aux bus
device.

The tricky bit is the auto-probing/configure. By the time the user has
a chance to apply a limit the drivers are already bound and have
already done their setup. So each subsystem has to support dynamically
imposing a limit..

And I simplified things a bit above too, we actually have two kinds of
interrupt demand: sharable and dedicated. The actual need is to carve
out a bunch of dedicated interrupts and only allow subsystems that are
doing VT-d guest interrupt assignment to consume them (eg VDPA)

Jason
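None of the following exists in the kernel; it is only a data-structure sketch of the bookkeeping described above - a pool owned by the aux bus owner, a per-aux-device budget, and the shared/dedicated interrupt split tracked separately.

/* Hypothetical accounting sketch: an interrupt pool on the aux bus owner
 * and a budget per aux device. This is not an existing kernel API.
 */
#include <linux/auxiliary_bus.h>
#include <linux/mutex.h>
#include <linux/errno.h>

struct example_irq_pool {
	struct mutex lock;
	unsigned int total;		/* physical vectors on the parent */
	unsigned int dedicated_free;	/* reserved for VT-d style consumers */
	unsigned int shared_free;	/* everything else fights over these */
};

struct example_adev_budget {
	struct auxiliary_device *adev;
	unsigned int limit;		/* cap imposed on this aux device */
	unsigned int consumed;
};

static int example_claim_irqs(struct example_irq_pool *pool,
			      struct example_adev_budget *budget,
			      unsigned int n, bool dedicated)
{
	int ret = 0;

	mutex_lock(&pool->lock);
	if (budget->consumed + n > budget->limit ||
	    n > (dedicated ? pool->dedicated_free : pool->shared_free)) {
		ret = -ENOSPC;
	} else {
		budget->consumed += n;
		if (dedicated)
			pool->dedicated_free -= n;
		else
			pool->shared_free -= n;
	}
	mutex_unlock(&pool->lock);
	return ret;
}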
On 11/18/2020 11:22 AM, Parav Pandit wrote:
> Short answer to the ethtool question: yes, ethtool can change the number of queues for a subfunction just like for the PF.
> In this patchset the default for a subfunction is the same number of queues as the PF.

But what is the mechanism for partitioning the global resources of the device into each subfunction?

Is it just evenly divided between the subfunctions? Is there some maximum limit per subfunction?
On 11/18/2020 9:02 AM, Parav Pandit wrote:
> The hurdle I faced with devlink resource is defining the granularity.
>
> For example, one devlink instance deploys subfunctions on multiple PCI functions.
> So how to name them? Currently we have controller and PFs in the port annotation.
> So the resource names would be:
> c0pf0_subfunctions -> for controller 0, pf 0
> c1pf2_subfunctions -> for controller 1, pf 2
>
> Couldn't convince myself to name it this way.

Yeah, I think we need to extend the plumbing of resources to allow specifying or assigning parent resources to a subfunction.

> The example below looked simpler to use, but the plumbing doesn't exist for it.
>
> $ devlink resource show pci/0000:03:00.0
> pci/0000:03:00.0/1: name max_sfs count 256 controller 0 pf 0
> pci/0000:03:00.0/2: name max_sfs count 100 controller 1 pf 0
> pci/0000:03:00.0/3: name max_sfs count 64 controller 1 pf 1
>
> $ devlink resource set pci/0000:03:00.0/1 max_sfs 100
>
> The second option I was considering was to use port params, which doesn't sound right for a resource.

I don't think port parameters make sense here. They only encapsulate single name -> value pairs, and don't really help show the relationships between the subfunction ports and the parent device.

>> Also, it seems like there are hardware constraints at play. e.g., can a user reduce
>> the number of queues used by the physical function to support more
>> subfunctions? If so, how does a user programmatically learn about this limitation?
>> e.g., devlink could have support to show resource sizing and configure
>> constraints similar to what mlxsw has.
>
> Yes, need to figure out its naming. For mlx5, the number of queues doesn't have a relation to subfunctions.
> But the PCI resource does, and this is something we want to do in the future, as you said, possibly using devlink resource.

I've been looking into queue management and being able to add and remove queue groups and queues. I'm leaning towards building on top of devlink resource for this.

Specifically, I have been looking at picking up the work Magnus started last year around creating an interface that represents queues to the stack better for AF_XDP, but it also has other possible uses. I'd like to make sure it aligns with the ideas here for partitioning resources. It seems like that should best be done at the devlink level, where the main devlink instance knows about all the parts' limitations and can then have new commands for allowing assignment of resources to ports.
On 11/18/20 5:41 PM, Jacob Keller wrote:
> But what is the mechanism for partitioning the global resources of the
> device into each subfunction?
>
> Is it just evenly divided between the subfunctions? Is there some maximum
> limit per subfunction?

I hope it is not just evenly divided; it should be user controllable. If I create a subfunction for, say, a container's networking, I may want to assign only 1 Rx and 1 Tx queue pair (or 1 channel, depending on terminology, where a channel includes Rx, Tx and CQ).
On 11/18/2020 5:17 PM, David Ahern wrote:
> I hope it is not just evenly divided; it should be user controllable. If
> I create a subfunction for, say, a container's networking, I may want to
> assign only 1 Rx and 1 Tx queue pair (or 1 channel, depending on
> terminology, where a channel includes Rx, Tx and CQ).

I think we need a way to expose and configure policy for the resources associated with each type of auxiliary device, for example default, min and max queues and interrupt vectors. Once an auxiliary device is created, the user should be able to configure the resources within the allowed min-max values.
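A hypothetical shape for such a per-device-type policy, only to make the default/min/max idea concrete (this is not an existing devlink or driver structure):

/* Hypothetical policy record for one type of auxiliary device (e.g. an
 * Ethernet SF); nothing like this exists upstream today.
 */
#include <linux/errno.h>

struct example_sf_resource_policy {
	unsigned int def_queues;	/* handed out when the SF is created */
	unsigned int min_queues;	/* floor later config cannot go below */
	unsigned int max_queues;	/* ceiling enforced against the device pool */
	unsigned int def_irqs;
	unsigned int min_irqs;
	unsigned int max_irqs;
};

static int example_validate_queue_request(const struct example_sf_resource_policy *p,
					   unsigned int requested)
{
	if (requested < p->min_queues || requested > p->max_queues)
		return -ERANGE;
	return 0;
}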
diff --git a/include/net/devlink.h b/include/net/devlink.h
index 1b7c9fbc607a..3991345ef3e2 100644
--- a/include/net/devlink.h
+++ b/include/net/devlink.h
@@ -152,6 +152,17 @@ struct devlink_port {
 	struct mutex reporters_lock; /* Protects reporter_list */
 };
 
+struct devlink_port_new_attrs {
+	enum devlink_port_flavour flavour;
+	unsigned int port_index;
+	u32 controller;
+	u32 sfnum;
+	u16 pfnum;
+	u8 port_index_valid:1,
+	   controller_valid:1,
+	   sfnum_valid:1;
+};
+
 struct devlink_sb_pool_info {
 	enum devlink_sb_pool_type pool_type;
 	u32 size;
@@ -1360,6 +1371,33 @@ struct devlink_ops {
 	int (*port_function_hw_addr_set)(struct devlink *devlink, struct devlink_port *port,
 					 const u8 *hw_addr, int hw_addr_len,
 					 struct netlink_ext_ack *extack);
+	/**
+	 * @port_new: Port add function.
+	 *
+	 * Should be used by device driver to let caller add new port of a specified flavour
+	 * with optional attributes.
+	 * Driver should return -EOPNOTSUPP if it doesn't support port addition of a specified
+	 * flavour or specified attributes. Driver should set extack error message in case of fail
+	 * to add the port.
+	 * devlink core does not hold a devlink instance lock when this callback is invoked.
+	 * Driver must ensures synchronization when adding or deleting a port. Driver must
+	 * register a port with devlink core.
+	 */
+	int (*port_new)(struct devlink *devlink, const struct devlink_port_new_attrs *attrs,
+			struct netlink_ext_ack *extack);
+	/**
+	 * @port_del: Port delete function.
+	 *
+	 * Should be used by device driver to let caller delete port which was previously created
+	 * using port_new() callback.
+	 * Driver should return -EOPNOTSUPP if it doesn't support port deletion.
+	 * Driver should set extack error message in case of fail to delete the port.
+	 * devlink core does not hold a devlink instance lock when this callback is invoked.
+	 * Driver must ensures synchronization when adding or deleting a port. Driver must
+	 * register a port with devlink core.
+	 */
+	int (*port_del)(struct devlink *devlink, unsigned int port_index,
+			struct netlink_ext_ack *extack);
 };
 
 static inline void *devlink_priv(struct devlink *devlink)
diff --git a/net/core/devlink.c b/net/core/devlink.c
index b1e849b624a6..dccdf36afba6 100644
--- a/net/core/devlink.c
+++ b/net/core/devlink.c
@@ -1124,6 +1124,57 @@ static int devlink_nl_cmd_port_unsplit_doit(struct sk_buff *skb,
 	return devlink_port_unsplit(devlink, port_index, info->extack);
 }
 
+static int devlink_nl_cmd_port_new_doit(struct sk_buff *skb, struct genl_info *info)
+{
+	struct netlink_ext_ack *extack = info->extack;
+	struct devlink_port_new_attrs new_attrs = {};
+	struct devlink *devlink = info->user_ptr[0];
+
+	if (!info->attrs[DEVLINK_ATTR_PORT_FLAVOUR] ||
+	    !info->attrs[DEVLINK_ATTR_PORT_PCI_PF_NUMBER]) {
+		NL_SET_ERR_MSG_MOD(extack, "Port flavour or PCI PF are not specified");
+		return -EINVAL;
+	}
+	new_attrs.flavour = nla_get_u16(info->attrs[DEVLINK_ATTR_PORT_FLAVOUR]);
+	new_attrs.pfnum = nla_get_u16(info->attrs[DEVLINK_ATTR_PORT_PCI_PF_NUMBER]);
+
+	if (info->attrs[DEVLINK_ATTR_PORT_INDEX]) {
+		new_attrs.port_index = nla_get_u32(info->attrs[DEVLINK_ATTR_PORT_INDEX]);
+		new_attrs.port_index_valid = true;
+	}
+	if (info->attrs[DEVLINK_ATTR_PORT_CONTROLLER_NUMBER]) {
+		new_attrs.controller =
+			nla_get_u16(info->attrs[DEVLINK_ATTR_PORT_CONTROLLER_NUMBER]);
+		new_attrs.controller_valid = true;
+	}
+	if (info->attrs[DEVLINK_ATTR_PORT_PCI_SF_NUMBER]) {
+		new_attrs.sfnum = nla_get_u32(info->attrs[DEVLINK_ATTR_PORT_PCI_SF_NUMBER]);
+		new_attrs.sfnum_valid = true;
+	}
+
+	if (!devlink->ops->port_new)
+		return -EOPNOTSUPP;
+
+	return devlink->ops->port_new(devlink, &new_attrs, extack);
+}
+
+static int devlink_nl_cmd_port_del_doit(struct sk_buff *skb, struct genl_info *info)
+{
+	struct netlink_ext_ack *extack = info->extack;
+	struct devlink *devlink = info->user_ptr[0];
+	unsigned int port_index;
+
+	if (!info->attrs[DEVLINK_ATTR_PORT_INDEX]) {
+		NL_SET_ERR_MSG_MOD(extack, "Port index is not specified");
+		return -EINVAL;
+	}
+	port_index = nla_get_u32(info->attrs[DEVLINK_ATTR_PORT_INDEX]);
+
+	if (!devlink->ops->port_del)
+		return -EOPNOTSUPP;
+	return devlink->ops->port_del(devlink, port_index, extack);
+}
+
 static int devlink_nl_sb_fill(struct sk_buff *msg, struct devlink *devlink,
 			      struct devlink_sb *devlink_sb,
 			      enum devlink_command cmd, u32 portid,
@@ -7565,6 +7616,10 @@ static const struct nla_policy devlink_nl_policy[DEVLINK_ATTR_MAX + 1] = {
 	[DEVLINK_ATTR_RELOAD_ACTION] = NLA_POLICY_RANGE(NLA_U8, DEVLINK_RELOAD_ACTION_DRIVER_REINIT,
 							DEVLINK_RELOAD_ACTION_MAX),
 	[DEVLINK_ATTR_RELOAD_LIMITS] = NLA_POLICY_BITFIELD32(DEVLINK_RELOAD_LIMITS_VALID_MASK),
+	[DEVLINK_ATTR_PORT_FLAVOUR] = { .type = NLA_U16 },
+	[DEVLINK_ATTR_PORT_PCI_PF_NUMBER] = { .type = NLA_U16 },
+	[DEVLINK_ATTR_PORT_PCI_SF_NUMBER] = { .type = NLA_U32 },
+	[DEVLINK_ATTR_PORT_CONTROLLER_NUMBER] = { .type = NLA_U32 },
 };
 
 static const struct genl_small_ops devlink_nl_ops[] = {
@@ -7604,6 +7659,18 @@ static const struct genl_small_ops devlink_nl_ops[] = {
 		.flags = GENL_ADMIN_PERM,
 		.internal_flags = DEVLINK_NL_FLAG_NO_LOCK,
 	},
+	{
+		.cmd = DEVLINK_CMD_PORT_NEW,
+		.doit = devlink_nl_cmd_port_new_doit,
+		.flags = GENL_ADMIN_PERM,
+		.internal_flags = DEVLINK_NL_FLAG_NO_LOCK,
+	},
+	{
+		.cmd = DEVLINK_CMD_PORT_DEL,
+		.doit = devlink_nl_cmd_port_del_doit,
+		.flags = GENL_ADMIN_PERM,
+		.internal_flags = DEVLINK_NL_FLAG_NO_LOCK,
+	},
 	{
 		.cmd = DEVLINK_CMD_SB_GET,
 		.validate = GENL_DONT_VALIDATE_STRICT | GENL_DONT_VALIDATE_DUMP,
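For context, a driver plugs into the two new callbacks roughly as below. The example_sf_* helpers are hypothetical placeholders for the driver's own SF allocation and teardown (the real mlx5 implementation comes in later patches of this series), and the flavour check assumes the DEVLINK_PORT_FLAVOUR_PCI_SF value introduced elsewhere in the series.

/* Sketch of a driver wiring up the new ops; example_sf_* are hypothetical. */
#include <net/devlink.h>

int example_sf_alloc_and_register(struct devlink *devlink,
				  const struct devlink_port_new_attrs *attrs,
				  struct netlink_ext_ack *extack);
int example_sf_unregister_and_free(struct devlink *devlink,
				   unsigned int port_index,
				   struct netlink_ext_ack *extack);

static int example_devlink_port_new(struct devlink *devlink,
				    const struct devlink_port_new_attrs *attrs,
				    struct netlink_ext_ack *extack)
{
	if (attrs->flavour != DEVLINK_PORT_FLAVOUR_PCI_SF) {
		NL_SET_ERR_MSG_MOD(extack, "Only PCI SF ports can be added");
		return -EOPNOTSUPP;
	}
	if (!attrs->sfnum_valid) {
		NL_SET_ERR_MSG_MOD(extack, "sfnum attribute is required");
		return -EOPNOTSUPP;
	}
	/* Allocate the SF and call devlink_port_register() for it. The devlink
	 * instance lock is not held here, so the driver serializes add/delete
	 * against its own unload path.
	 */
	return example_sf_alloc_and_register(devlink, attrs, extack);
}

static int example_devlink_port_del(struct devlink *devlink,
				    unsigned int port_index,
				    struct netlink_ext_ack *extack)
{
	/* Look up the SF by port index, unregister its devlink port, free it. */
	return example_sf_unregister_and_free(devlink, port_index, extack);
}

static const struct devlink_ops example_devlink_ops = {
	.port_new = example_devlink_port_new,
	.port_del = example_devlink_port_del,
};

The ops table is then handed to devlink_alloc() as usual; because the core does not take the instance lock around these callbacks, the driver can reuse the same port unregistration path for both user-initiated deletion and driver unload, as the commit message notes.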