diff mbox series

[net-next,RFC,1/2] devlink: Add shared descriptor eswitch attr

Message ID 20240228015954.11981-1-witu@nvidia.com (mailing list archive)
State Superseded
Delegated to: Netdev Maintainers
Headers show
Series [net-next,RFC,1/2] devlink: Add shared descriptor eswitch attr | expand

Checks

Context Check Description
netdev/series_format success Single patches do not need cover letters
netdev/tree_selection success Clearly marked for net-next
netdev/ynl fail Tree is dirty after regen; no warnings/errors; no diff in generated;
netdev/fixes_present success Fixes tag not required for -next series
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 1310 this patch: 1310
netdev/build_tools success Errors and warnings before: 0 this patch: 0
netdev/cc_maintainers warning 2 maintainers not CCed: pabeni@redhat.com edumazet@google.com
netdev/build_clang fail Errors and warnings before: 965 this patch: 967
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success No Fixes tag
netdev/build_allmodconfig_warn success Errors and warnings before: 1492 this patch: 1492
netdev/checkpatch fail CHECK: Alignment should match open parenthesis CHECK: Blank lines aren't necessary before a close brace '}' ERROR: code indent should use tabs where possible WARNING: line length of 104 exceeds 80 columns WARNING: line length of 87 exceeds 80 columns WARNING: line length of 90 exceeds 80 columns WARNING: line length of 91 exceeds 80 columns WARNING: line length of 93 exceeds 80 columns
netdev/build_clang_rust success No Rust files in patch. Skipping build
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0

Commit Message

William Tu Feb. 28, 2024, 1:59 a.m. UTC
Add two eswitch attrs: shrdesc_mode and shrdesc_count.

1. shrdesc_mode: to enable a sharing memory buffer for
representor's rx buffer, and 2. shrdesc_count: to control the
number of buffers in this shared memory pool.

When using switchdev mode, the representor ports handles the slow path
traffic, the traffic that can't be offloaded will be redirected to the
representor port for processing. Memory consumption of the representor
port's rx buffer can grow to several GB when scaling to 1k VFs reps.
For example, in mlx5 driver, each RQ, with a typical 1K descriptors,
consumes 3MB of DMA memory for packet buffer in WQEs, and with four
channels, it consumes 4 * 3MB * 1024 = 12GB of memory. And since rep
ports are for slow path traffic, most of these rx DMA memory are idle.

Add shrdesc_mode configuration, allowing multiple representors
to share a rx memory buffer pool. When enabled, individual representor
doesn't need to allocate its dedicated rx buffer, but just pointing
its rq to the memory pool. This could make the memory being better
utilized. The shrdesc_count represents the number of rx ring
entries, e.g., same meaning as ethtool -g, that's shared across other
representors. Users adjust it based on how many reps, total system
memory, or performance expectation.

The two params are also useful for other vendors such as Intel ICE
drivers and Broadcom's driver, which also have representor ports for
slow path traffic.

An example use case:
$ devlink dev eswitch show pci/0000:08:00.0
  pci/0000:08:00.0: mode legacy inline-mode none encap-mode basic \
  shrdesc-mode none shrdesc-count 0
$ devlink dev eswitch set pci/0000:08:00.0 mode switchdev \
  shrdesc-mode basic shrdesc-count 1024
$ devlink dev eswitch show pci/0000:08:00.0
  pci/0000:08:00.0: mode switchdev inline-mode none encap-mode basic \
  shrdesc-mode basic shrdesc-count 1024

Note that new configurations are set at legacy mode, and enabled at
switchdev mode.

Signed-off-by: William Tu <witu@nvidia.com>
---
previous devlink-sd discussion
https://lore.kernel.org/netdev/20240220141709.42a9c640@kernel.org/
---
 include/net/devlink.h        |  8 +++++++
 include/uapi/linux/devlink.h |  7 ++++++
 net/devlink/dev.c            | 43 ++++++++++++++++++++++++++++++++++++
 net/devlink/netlink_gen.c    |  6 +++--
 4 files changed, 62 insertions(+), 2 deletions(-)

Comments

Simon Horman Feb. 28, 2024, 1:12 p.m. UTC | #1
On Wed, Feb 28, 2024 at 03:59:53AM +0200, William Tu wrote:

...

> diff --git a/net/devlink/netlink_gen.c b/net/devlink/netlink_gen.c
> index c81cf2dd154f..ac8b0c7105dd 100644
> --- a/net/devlink/netlink_gen.c
> +++ b/net/devlink/netlink_gen.c
> @@ -194,12 +194,14 @@ static const struct nla_policy devlink_eswitch_get_nl_policy[DEVLINK_ATTR_DEV_NA
>  };
>  
>  /* DEVLINK_CMD_ESWITCH_SET - do */
> -static const struct nla_policy devlink_eswitch_set_nl_policy[DEVLINK_ATTR_ESWITCH_ENCAP_MODE + 1] = {
> +static const struct nla_policy devlink_eswitch_set_nl_policy[DEVLINK_ATTR_ESWITCH_SHRDESC_COUNT + 1] = {
>  	[DEVLINK_ATTR_BUS_NAME] = { .type = NLA_NUL_STRING, },
>  	[DEVLINK_ATTR_DEV_NAME] = { .type = NLA_NUL_STRING, },
>  	[DEVLINK_ATTR_ESWITCH_MODE] = NLA_POLICY_MAX(NLA_U16, 1),
>  	[DEVLINK_ATTR_ESWITCH_INLINE_MODE] = NLA_POLICY_MAX(NLA_U16, 3),
>  	[DEVLINK_ATTR_ESWITCH_ENCAP_MODE] = NLA_POLICY_MAX(NLA_U8, 1),
> +	[DEVLINK_ATTR_ESWITCH_SHRDESC_MODE] = NLA_POLICY_MAX(NLA_U8, 1),
> +	[DEVLINK_ATTR_ESWITCH_SHRDESC_COUNT] = NLA_POLICY_MAX(NLA_U32, 65535),
>  };

Hi William,

I realise this is probably not central to your purpose in sending an RFC,
but my understanding is that the max value set using NLA_POLICY_MAX
is of type s16, and thus 65535 is too large - it becomes -1.

Flagged by W=1 build with clang-17.

>  
>  /* DEVLINK_CMD_DPIPE_TABLE_GET - do */

...
Jiri Pirko Feb. 28, 2024, 1:20 p.m. UTC | #2
Wed, Feb 28, 2024 at 02:12:49PM CET, horms@kernel.org wrote:
>On Wed, Feb 28, 2024 at 03:59:53AM +0200, William Tu wrote:
>
>...
>
>> diff --git a/net/devlink/netlink_gen.c b/net/devlink/netlink_gen.c
>> index c81cf2dd154f..ac8b0c7105dd 100644
>> --- a/net/devlink/netlink_gen.c
>> +++ b/net/devlink/netlink_gen.c
>> @@ -194,12 +194,14 @@ static const struct nla_policy devlink_eswitch_get_nl_policy[DEVLINK_ATTR_DEV_NA
>>  };
>>  
>>  /* DEVLINK_CMD_ESWITCH_SET - do */
>> -static const struct nla_policy devlink_eswitch_set_nl_policy[DEVLINK_ATTR_ESWITCH_ENCAP_MODE + 1] = {
>> +static const struct nla_policy devlink_eswitch_set_nl_policy[DEVLINK_ATTR_ESWITCH_SHRDESC_COUNT + 1] = {
>>  	[DEVLINK_ATTR_BUS_NAME] = { .type = NLA_NUL_STRING, },
>>  	[DEVLINK_ATTR_DEV_NAME] = { .type = NLA_NUL_STRING, },
>>  	[DEVLINK_ATTR_ESWITCH_MODE] = NLA_POLICY_MAX(NLA_U16, 1),
>>  	[DEVLINK_ATTR_ESWITCH_INLINE_MODE] = NLA_POLICY_MAX(NLA_U16, 3),
>>  	[DEVLINK_ATTR_ESWITCH_ENCAP_MODE] = NLA_POLICY_MAX(NLA_U8, 1),
>> +	[DEVLINK_ATTR_ESWITCH_SHRDESC_MODE] = NLA_POLICY_MAX(NLA_U8, 1),
>> +	[DEVLINK_ATTR_ESWITCH_SHRDESC_COUNT] = NLA_POLICY_MAX(NLA_U32, 65535),
>>  };
>
>Hi William,
>
>I realise this is probably not central to your purpose in sending an RFC,
>but my understanding is that the max value set using NLA_POLICY_MAX
>is of type s16, and thus 65535 is too large - it becomes -1.
>
>Flagged by W=1 build with clang-17.

First of all:
/* Do not edit directly, auto-generated from: */
/*      Documentation/netlink/specs/devlink.yaml */



>
>>  
>>  /* DEVLINK_CMD_DPIPE_TABLE_GET - do */
>
>...
>
William Tu Feb. 29, 2024, 5:24 a.m. UTC | #3
On 2/28/24 5:20 AM, Jiri Pirko wrote:
> External email: Use caution opening links or attachments
>
>
> Wed, Feb 28, 2024 at 02:12:49PM CET, horms@kernel.org wrote:
>> On Wed, Feb 28, 2024 at 03:59:53AM +0200, William Tu wrote:
>>
>> ...
>>
>>> diff --git a/net/devlink/netlink_gen.c b/net/devlink/netlink_gen.c
>>> index c81cf2dd154f..ac8b0c7105dd 100644
>>> --- a/net/devlink/netlink_gen.c
>>> +++ b/net/devlink/netlink_gen.c
>>> @@ -194,12 +194,14 @@ static const struct nla_policy devlink_eswitch_get_nl_policy[DEVLINK_ATTR_DEV_NA
>>>   };
>>>
>>>   /* DEVLINK_CMD_ESWITCH_SET - do */
>>> -static const struct nla_policy devlink_eswitch_set_nl_policy[DEVLINK_ATTR_ESWITCH_ENCAP_MODE + 1] = {
>>> +static const struct nla_policy devlink_eswitch_set_nl_policy[DEVLINK_ATTR_ESWITCH_SHRDESC_COUNT + 1] = {
>>>       [DEVLINK_ATTR_BUS_NAME] = { .type = NLA_NUL_STRING, },
>>>       [DEVLINK_ATTR_DEV_NAME] = { .type = NLA_NUL_STRING, },
>>>       [DEVLINK_ATTR_ESWITCH_MODE] = NLA_POLICY_MAX(NLA_U16, 1),
>>>       [DEVLINK_ATTR_ESWITCH_INLINE_MODE] = NLA_POLICY_MAX(NLA_U16, 3),
>>>       [DEVLINK_ATTR_ESWITCH_ENCAP_MODE] = NLA_POLICY_MAX(NLA_U8, 1),
>>> +    [DEVLINK_ATTR_ESWITCH_SHRDESC_MODE] = NLA_POLICY_MAX(NLA_U8, 1),
>>> +    [DEVLINK_ATTR_ESWITCH_SHRDESC_COUNT] = NLA_POLICY_MAX(NLA_U32, 65535),
>>>   };
>> Hi William,
>>
>> I realise this is probably not central to your purpose in sending an RFC,
>> but my understanding is that the max value set using NLA_POLICY_MAX
>> is of type s16, and thus 65535 is too large - it becomes -1.
>>
>> Flagged by W=1 build with clang-17.
> First of all:
> /* Do not edit directly, auto-generated from: */
> /*      Documentation/netlink/specs/devlink.yaml */
>
>
>
Hi Jiri and Simon,
Thanks for the feedback. Will fix it.
William
diff mbox series

Patch

diff --git a/include/net/devlink.h b/include/net/devlink.h
index 9ac394bdfbe4..aca25183e72a 100644
--- a/include/net/devlink.h
+++ b/include/net/devlink.h
@@ -1327,6 +1327,14 @@  struct devlink_ops {
 	int (*eswitch_encap_mode_set)(struct devlink *devlink,
 				      enum devlink_eswitch_encap_mode encap_mode,
 				      struct netlink_ext_ack *extack);
+	int (*eswitch_shrdesc_mode_get)(struct devlink *devlink,
+				        enum devlink_eswitch_shrdesc_mode *p_shrdesc_mode);
+	int (*eswitch_shrdesc_mode_set)(struct devlink *devlink,
+				        enum devlink_eswitch_shrdesc_mode shrdesc_mode,
+				        struct netlink_ext_ack *extack);
+	int (*eswitch_shrdesc_count_get)(struct devlink *devlink, int *count);
+	int (*eswitch_shrdesc_count_set)(struct devlink *devlink, int count,
+				        struct netlink_ext_ack *extack);
 	int (*info_get)(struct devlink *devlink, struct devlink_info_req *req,
 			struct netlink_ext_ack *extack);
 	/**
diff --git a/include/uapi/linux/devlink.h b/include/uapi/linux/devlink.h
index 130cae0d3e20..31323c481614 100644
--- a/include/uapi/linux/devlink.h
+++ b/include/uapi/linux/devlink.h
@@ -195,6 +195,11 @@  enum devlink_eswitch_encap_mode {
 	DEVLINK_ESWITCH_ENCAP_MODE_BASIC,
 };
 
+enum devlink_eswitch_shrdesc_mode {
+	DEVLINK_ESWITCH_SHRDESC_MODE_NONE,
+	DEVLINK_ESWITCH_SHRDESC_MODE_BASIC,
+};
+
 enum devlink_port_flavour {
 	DEVLINK_PORT_FLAVOUR_PHYSICAL, /* Any kind of a port physically
 					* facing the user.
@@ -614,6 +619,8 @@  enum devlink_attr {
 
 	DEVLINK_ATTR_REGION_DIRECT,		/* flag */
 
+	DEVLINK_ATTR_ESWITCH_SHRDESC_MODE,	/* u8 */
+	DEVLINK_ATTR_ESWITCH_SHRDESC_COUNT,	/* u32 */
 	/* add new attributes above here, update the policy in devlink.c */
 
 	__DEVLINK_ATTR_MAX,
diff --git a/net/devlink/dev.c b/net/devlink/dev.c
index 19dbf540748a..90b279c3c1e6 100644
--- a/net/devlink/dev.c
+++ b/net/devlink/dev.c
@@ -631,8 +631,10 @@  static int devlink_nl_eswitch_fill(struct sk_buff *msg, struct devlink *devlink,
 				   enum devlink_command cmd, u32 portid,
 				   u32 seq, int flags)
 {
+	enum devlink_eswitch_shrdesc_mode shrdesc_mode;
 	const struct devlink_ops *ops = devlink->ops;
 	enum devlink_eswitch_encap_mode encap_mode;
+	u32 shrdesc_count;
 	u8 inline_mode;
 	void *hdr;
 	int err = 0;
@@ -674,6 +676,25 @@  static int devlink_nl_eswitch_fill(struct sk_buff *msg, struct devlink *devlink,
 			goto nla_put_failure;
 	}
 
+	if (ops->eswitch_shrdesc_mode_get) {
+		err = ops->eswitch_shrdesc_mode_get(devlink, &shrdesc_mode);
+		if (err)
+			goto nla_put_failure;
+		err = nla_put_u8(msg, DEVLINK_ATTR_ESWITCH_SHRDESC_MODE, shrdesc_mode);
+		if (err)
+			goto nla_put_failure;
+
+	}
+
+	if (ops->eswitch_shrdesc_count_get) {
+		err = ops->eswitch_shrdesc_count_get(devlink, &shrdesc_count);
+		if (err)
+			goto nla_put_failure;
+		err = nla_put_u32(msg, DEVLINK_ATTR_ESWITCH_SHRDESC_COUNT, shrdesc_count);
+		if (err)
+			goto nla_put_failure;
+	}
+
 	genlmsg_end(msg, hdr);
 	return 0;
 
@@ -705,9 +726,11 @@  int devlink_nl_eswitch_get_doit(struct sk_buff *skb, struct genl_info *info)
 
 int devlink_nl_eswitch_set_doit(struct sk_buff *skb, struct genl_info *info)
 {
+	enum devlink_eswitch_shrdesc_mode shrdesc_mode;
 	struct devlink *devlink = info->user_ptr[0];
 	const struct devlink_ops *ops = devlink->ops;
 	enum devlink_eswitch_encap_mode encap_mode;
+	u32 shrdesc_count;
 	u8 inline_mode;
 	int err = 0;
 	u16 mode;
@@ -744,6 +767,26 @@  int devlink_nl_eswitch_set_doit(struct sk_buff *skb, struct genl_info *info)
 			return err;
 	}
 
+	if (info->attrs[DEVLINK_ATTR_ESWITCH_SHRDESC_MODE]) {
+		if (!ops->eswitch_shrdesc_mode_set)
+			return -EOPNOTSUPP;
+		shrdesc_mode = nla_get_u8(info->attrs[DEVLINK_ATTR_ESWITCH_SHRDESC_MODE]);
+		err = ops->eswitch_shrdesc_mode_set(devlink, shrdesc_mode,
+						    info->extack);
+		if (err)
+			return err;
+	}
+
+	if (info->attrs[DEVLINK_ATTR_ESWITCH_SHRDESC_COUNT]) {
+		if (!ops->eswitch_shrdesc_count_set)
+			return -EOPNOTSUPP;
+		shrdesc_count = nla_get_u32(info->attrs[DEVLINK_ATTR_ESWITCH_SHRDESC_COUNT]);
+		err = ops->eswitch_shrdesc_count_set(devlink, shrdesc_count,
+						     info->extack);
+		if (err)
+			return err;
+	}
+
 	return 0;
 }
 
diff --git a/net/devlink/netlink_gen.c b/net/devlink/netlink_gen.c
index c81cf2dd154f..ac8b0c7105dd 100644
--- a/net/devlink/netlink_gen.c
+++ b/net/devlink/netlink_gen.c
@@ -194,12 +194,14 @@  static const struct nla_policy devlink_eswitch_get_nl_policy[DEVLINK_ATTR_DEV_NA
 };
 
 /* DEVLINK_CMD_ESWITCH_SET - do */
-static const struct nla_policy devlink_eswitch_set_nl_policy[DEVLINK_ATTR_ESWITCH_ENCAP_MODE + 1] = {
+static const struct nla_policy devlink_eswitch_set_nl_policy[DEVLINK_ATTR_ESWITCH_SHRDESC_COUNT + 1] = {
 	[DEVLINK_ATTR_BUS_NAME] = { .type = NLA_NUL_STRING, },
 	[DEVLINK_ATTR_DEV_NAME] = { .type = NLA_NUL_STRING, },
 	[DEVLINK_ATTR_ESWITCH_MODE] = NLA_POLICY_MAX(NLA_U16, 1),
 	[DEVLINK_ATTR_ESWITCH_INLINE_MODE] = NLA_POLICY_MAX(NLA_U16, 3),
 	[DEVLINK_ATTR_ESWITCH_ENCAP_MODE] = NLA_POLICY_MAX(NLA_U8, 1),
+	[DEVLINK_ATTR_ESWITCH_SHRDESC_MODE] = NLA_POLICY_MAX(NLA_U8, 1),
+	[DEVLINK_ATTR_ESWITCH_SHRDESC_COUNT] = NLA_POLICY_MAX(NLA_U32, 65535),
 };
 
 /* DEVLINK_CMD_DPIPE_TABLE_GET - do */
@@ -787,7 +789,7 @@  const struct genl_split_ops devlink_nl_ops[74] = {
 		.doit		= devlink_nl_eswitch_set_doit,
 		.post_doit	= devlink_nl_post_doit,
 		.policy		= devlink_eswitch_set_nl_policy,
-		.maxattr	= DEVLINK_ATTR_ESWITCH_ENCAP_MODE,
+		.maxattr	= DEVLINK_ATTR_ESWITCH_SHRDESC_COUNT,
 		.flags		= GENL_ADMIN_PERM | GENL_CMD_CAP_DO,
 	},
 	{