[v4,net-next,1/5] ethtool: Add support for configuring tx_push_buf_len

Message ID 20230309131319.2531008-2-shayagr@amazon.com (mailing list archive)
State Superseded
Delegated to: Netdev Maintainers
Series Add tx push buf len param to ethtool

Checks

Context Check Description
netdev/series_format success Posting correctly formatted
netdev/tree_selection success Clearly marked for net-next
netdev/fixes_present success Fixes tag not required for -next series
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 1902 this patch: 1902
netdev/cc_maintainers warning 9 maintainers not CCed: wangjie125@huawei.com pabeni@redhat.com corbet@lwn.net linux-doc@vger.kernel.org shannon.nelson@amd.com huangguangbin2@huawei.com edumazet@google.com vladimir.oltean@nxp.com andrew@lunn.ch
netdev/build_clang success Errors and warnings before: 591 this patch: 591
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success No Fixes tag
netdev/build_allmodconfig_warn success Errors and warnings before: 2011 this patch: 2011
netdev/checkpatch warning WARNING: line length of 86 exceeds 80 columns WARNING: line length of 95 exceeds 80 columns
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0

Commit Message

Shay Agroskin March 9, 2023, 1:13 p.m. UTC
This attribute, which is part of ethtool's ring param configuration,
allows the user to specify the maximum number of bytes of a packet's
payload that can be written directly to the device.

Example usage:
    # ethtool -G [interface] tx-push-buf-len [number of bytes]

Co-developed-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Shay Agroskin <shayagr@amazon.com>
---
 Documentation/netlink/specs/ethtool.yaml     |  8 ++++
 Documentation/networking/ethtool-netlink.rst | 43 ++++++++++++--------
 include/linux/ethtool.h                      | 14 +++++--
 include/uapi/linux/ethtool_netlink.h         |  2 +
 net/ethtool/netlink.h                        |  2 +-
 net/ethtool/rings.c                          | 33 ++++++++++++++-
 6 files changed, 79 insertions(+), 23 deletions(-)
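
For driver authors, a hedged sketch of the driver-facing side may help.
Everything prefixed my_ below is hypothetical; only the
kernel_ethtool_ringparam fields and the ETHTOOL_RING_USE_TX_PUSH_BUF_LEN
capability bit come from this patch:

static void my_get_ringparam(struct net_device *dev,
			     struct ethtool_ringparam *ring,
			     struct kernel_ethtool_ringparam *kring,
			     struct netlink_ext_ack *extack)
{
	struct my_priv *priv = netdev_priv(dev);

	/* Report the current and maximum push buffer sizes via the
	 * fields added by this patch. MY_TX_PUSH_BUF_LEN_MAX is a
	 * hypothetical device limit.
	 */
	kring->tx_push_buf_len     = priv->tx_push_buf_len;
	kring->tx_push_buf_max_len = MY_TX_PUSH_BUF_LEN_MAX;
}

/* The set side is sketched after the patch at the end of this page. */
static int my_set_ringparam(struct net_device *dev,
			    struct ethtool_ringparam *ring,
			    struct kernel_ethtool_ringparam *kring,
			    struct netlink_ext_ack *extack);

static const struct ethtool_ops my_ethtool_ops = {
	/* Opt in so the core accepts ETHTOOL_A_RINGS_TX_PUSH_BUF_LEN */
	.supported_ring_params = ETHTOOL_RING_USE_TX_PUSH_BUF_LEN,
	.get_ringparam         = my_get_ringparam,
	.set_ringparam         = my_set_ringparam,
};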

Comments

Simon Horman March 9, 2023, 4:25 p.m. UTC | #1
On Thu, Mar 09, 2023 at 03:13:15PM +0200, Shay Agroskin wrote:
> This attribute, which is part of ethtool's ring param configuration,
> allows the user to specify the maximum number of bytes of a packet's
> payload that can be written directly to the device.
> 
> Example usage:
>     # ethtool -G [interface] tx-push-buf-len [number of bytes]
> 
> Co-developed-by: Jakub Kicinski <kuba@kernel.org>
> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
> Signed-off-by: Shay Agroskin <shayagr@amazon.com>

Reviewed-by: Simon Horman <simon.horman@corigine.com>
Gal Pressman March 9, 2023, 5:15 p.m. UTC | #2
On 09/03/2023 15:13, Shay Agroskin wrote:
> +``ETHTOOL_A_RINGS_TX_PUSH_BUF_LEN`` specifies the maximum number of bytes of a
> +transmitted packet a driver can push directly to the underlying device
> +('push' mode). Pushing some of the payload bytes to the device has the
> +advantages of reducing latency for small packets by avoiding DMA mapping (same
> +as ``ETHTOOL_A_RINGS_TX_PUSH`` parameter) as well as allowing the underlying
> +device to process packet headers ahead of fetching its payload.
> +This can help the device to make fast actions based on the packet's headers.

I know Jakub prefers the new parameter, but the description of this
still sounds extremely similar to TX copybreak to me..
TX copybreak was traditionally used to copy packets to preallocated DMA
buffers, but this could be implemented as copying the packet to the
(preallocated) WQE's inline part. That usually means DMA memory, but
could also be device memory in this ENA LLQ case.

Are we drawing a line that TX copybreak is the threshold for DMA memory
and tx_push_buf_len is the threshold for device memory?
Are they mutually exclusive?

BTW, as I mentioned in v1, ENA doesn't advertise tx_push, which is a bit
strange given the fact that it now adds tx_push_buf_len support.
Jakub Kicinski March 10, 2023, 6:53 a.m. UTC | #3
On Thu, 9 Mar 2023 19:15:43 +0200 Gal Pressman wrote:
> I know Jakub prefers the new parameter, but the description of this
> still sounds extremely similar to TX copybreak to me..
> TX copybreak was traditionally used to copy packets to preallocated DMA
> buffers, but this could be implemented as copying the packet to the
> (preallocated) WQE's inline part. That usually means DMA memory, but
> could also be device memory in this ENA LLQ case.
> 
> Are we drawing a line that TX copybreak is the threshold for DMA memory
> and tx_push_buf_len is the threshold for device memory?

Pretty much, yes. Not an amazing distinction but since TX copybreak can
already mean two different things (inline or DMA buf) I'd err on 
the side of not overloading it with another one. 
And Pensando needed a similar knob? Maybe I'm misremembering now.

> Are they mutually exclusive?

Not sure.

> BTW, as I mentioned in v1, ENA doesn't advertise tx_push, which is a bit
> strange given the fact that it now adds tx_push_buf_len support.

Fair point.
Gal Pressman March 12, 2023, 12:41 p.m. UTC | #4
On 10/03/2023 8:53, Jakub Kicinski wrote:
> On Thu, 9 Mar 2023 19:15:43 +0200 Gal Pressman wrote:
>> I know Jakub prefers the new parameter, but the description of this
>> still sounds extremely similar to TX copybreak to me..
>> TX copybreak was traditionally used to copy packets to preallocated DMA
>> buffers, but this could be implemented as copying the packet to the
>> (preallocated) WQE's inline part. That usually means DMA memory, but
>> could also be device memory in this ENA LLQ case.
>>
>> Are we drawing a line that TX copybreak is the threshold for DMA memory
>> and tx_push_buf_len is the threshold for device memory?
> 
> Pretty much, yes. Not an amazing distinction but since TX copybreak can
> already mean two different things (inline or DMA buf) I'd err on 
> the side of not overloading it with another one. 

Can we document that please?

> And Pensando needed a similar knob? Maybe I'm misremembering now.
> 
>> Are they mutually exclusive?
> 
> Not sure.

Me neither, but it's not clear which takes precedence when both are set.
Jakub Kicinski March 13, 2023, 7:09 p.m. UTC | #5
On Sun, 12 Mar 2023 14:41:39 +0200 Gal Pressman wrote:
> On 10/03/2023 8:53, Jakub Kicinski wrote:
> > On Thu, 9 Mar 2023 19:15:43 +0200 Gal Pressman wrote:  
> >> I know Jakub prefers the new parameter, but the description of this
> >> still sounds extremely similar to TX copybreak to me..
> >> TX copybreak was traditionally used to copy packets to preallocated DMA
> >> buffers, but this could be implemented as copying the packet to the
> >> (preallocated) WQE's inline part. That usually means DMA memory, but
> >> could also be device memory in this ENA LLQ case.
> >>
> >> Are we drawing a line that TX copybreak is the threshold for DMA memory
> >> and tx_push_buf_len is the threshold for device memory?  
> > 
> > Pretty much, yes. Not an amazing distinction but since TX copybreak can
> > already mean two different things (inline or DMA buf) I'd err on 
> > the side of not overloading it with another one.   
> 
> Can we document that please?

Shay, could you add a paragraph in the docs regarding copybreak in v5?
Shay Agroskin March 14, 2023, 3:38 p.m. UTC | #6
Jakub Kicinski <kuba@kernel.org> writes:

> On Sun, 12 Mar 2023 14:41:39 +0200 Gal Pressman wrote:
>> On 10/03/2023 8:53, Jakub Kicinski wrote:
>> > On Thu, 9 Mar 2023 19:15:43 +0200 Gal Pressman wrote:
>> >> I know Jakub prefers the new parameter, but the description 
>> >> of this
>> >> still sounds extremely similar to TX copybreak to me..
>> >> TX copybreak was traditionally used to copy packets to 
>> >> preallocated DMA
>> >> buffers, but this could be implemented as copying the packet 
>> >> to the
>> >> (preallocated) WQE's inline part. That usually means DMA 
>> >> memory, but
>> >> could also be device memory in this ENA LLQ case.
>> >>
>> >> Are we drawing a line that TX copybreak is the threshold for 
>> >> DMA memory
>> >> and tx_push_buf_len is the threshold for device memory?
>> >
>> > Pretty much, yes. Not an amazing distinction but since TX 
>> > copybreak can
>> > already mean two different things (inline or DMA buf) I'd err 
>> > on
>> > the side of not overloading it with another one.
>>
>> Can we document that please?
>
> Shay, could you add a paragraph in the docs regarding copybreak 
> in v5?

Document that tx_copybreak defines the threshold below which the 
packet is copied into a preallocated DMA'd buffer, and that 
tx_push_buf defines the same but for device memory?
Are we sure we want to make this distinction? While the meaning 
of both params can overlap in their current definition, the 
motivation to use them is pretty different.
A driver can implement both for different purposes (and still copy 
both into the device).

I'll modify the documentation in the next version.
Shay Agroskin March 14, 2023, 3:46 p.m. UTC | #7
Gal Pressman <gal@nvidia.com> writes:

> On 09/03/2023 15:13, Shay Agroskin wrote:
>> +``ETHTOOL_A_RINGS_TX_PUSH_BUF_LEN`` specifies the maximum 
>> number of bytes of a
>> +transmitted packet a driver can push directly to the 
>> underlying device
>> +('push' mode). Pushing some of the payload bytes to the device 
>> has the
>> +advantages of reducing latency for small packets by avoiding 
>> DMA mapping (same
>> +as ``ETHTOOL_A_RINGS_TX_PUSH`` parameter) as well as allowing 
>> the underlying
>> +device to process packet headers ahead of fetching its 
>> payload.
>> +This can help the device to make fast actions based on the 
>> packet's headers.
> ...
>
> BTW, as I mentioned in v1, ENA doesn't advertise tx_push, which 
> is a bit
> strange given the fact that it now adds tx_push_buf_len support.

Yup, you're right, sorry for missing it. I'll add advertisement of 
tx_push support in the next version.
Gal Pressman March 16, 2023, 11:57 a.m. UTC | #8
On 14/03/2023 17:38, Shay Agroskin wrote:
> 
> Jakub Kicinski <kuba@kernel.org> writes:
> 
>> On Sun, 12 Mar 2023 14:41:39 +0200 Gal Pressman wrote:
>>> On 10/03/2023 8:53, Jakub Kicinski wrote:
>>> > On Thu, 9 Mar 2023 19:15:43 +0200 Gal Pressman wrote:
>>> >> I know Jakub prefers the new parameter, but the description of this
>>> >> still sounds extremely similar to TX copybreak to me..
>>> >> TX copybreak was traditionally used to copy packets to preallocated DMA
>>> >> buffers, but this could be implemented as copying the packet to the
>>> >> (preallocated) WQE's inline part. That usually means DMA memory, but
>>> >> could also be device memory in this ENA LLQ case.
>>> >>
>>> >> Are we drawing a line that TX copybreak is the threshold for DMA memory
>>> >> and tx_push_buf_len is the threshold for device memory?
>>> >
>>> > Pretty much, yes. Not an amazing distinction but since TX copybreak can
>>> > already mean two different things (inline or DMA buf) I'd err on
>>> > the side of not overloading it with another one.
>>>
>>> Can we document that please?
>>
>> Shay, could you add a paragraph in the docs regarding copybreak in v5?
> 
> Document that tx_copybreak defines the threshold below which the packet
> is copied into a preallocated DMA'ed buffer and that tx_push_buf defines
> the same but for device memory?
> Are we sure we want to make this distinction ? While the meaning of both
> params can overlap in their current definition, the motivation to use
> them is pretty different.
> A driver can implement both for different purposes (and still copy both
> into the device).

I don't understand what it means to implement both.

It's confusing because both parameters result in skipping the DMA map
operation, but their usage motivation is different?
What are we instructing our customers? Use copybreak when your iommu is
slow, but when should they use this new parameter?

IMO, a reasonable way to use both would be to only account for the
tx_push_buf_len when tx_push is enabled, otherwise use copybreak.
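
Something like this hedged sketch (all my_ and priv field names are
hypothetical) is what that precedence could look like in a driver:

/* Consult tx_push_buf_len only while TX push is enabled; otherwise
 * fall back to the classic tx_copybreak threshold.
 */
static bool my_should_copy_head(const struct my_priv *priv,
				unsigned int head_len)
{
	if (priv->tx_push)
		return head_len <= priv->tx_push_buf_len; /* device memory */

	return head_len <= priv->tx_copybreak; /* pre-mapped DMA buffer */
}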
Jakub Kicinski March 16, 2023, 8:42 p.m. UTC | #9
On Thu, 16 Mar 2023 13:57:26 +0200 Gal Pressman wrote:
> On 14/03/2023 17:38, Shay Agroskin wrote:
> >> Shay, could you add a paragraph in the docs regarding copybreak in v5?  
> > 
> > Document that tx_copybreak defines the threshold below which the packet
> > is copied into a preallocated DMA'ed buffer and that tx_push_buf defines
> > the same but for device memory?
> > Are we sure we want to make this distinction ? While the meaning of both
> > params can overlap in their current definition, the motivation to use
> > them is pretty different.
> > A driver can implement both for different purposes (and still copy both
> > into the device).  
> 
> I don't understand what it means to implement both.

If skb head is large you can copy part into the iomem, part into 
a pre-mapped space.
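
Roughly, as a sketch (llq_slot and bounce_buf are hypothetical names for
the device-memory slot and the pre-mapped buffer):

/* Push the first tx_push_buf_len bytes of the head into device memory
 * and copy any remainder into a pre-mapped DMA bounce buffer.
 */
static void my_copy_skb_head(struct my_priv *priv, const struct sk_buff *skb)
{
	unsigned int head = skb_headlen(skb);
	unsigned int push = min(head, priv->tx_push_buf_len);

	memcpy_toio(priv->llq_slot, skb->data, push);
	if (head > push)
		memcpy(priv->bounce_buf, skb->data + push, head - push);
}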

> It's confusing because both parameters result in skipping the DMA map
> operation, but their usage motivation is different?
> What are we instructing our customers? Use copybreak when your iommu is
> slow, but when should they use this new parameter?

Your customers? Is mlx5 going to implement the iomem based push which
needs explicit slot size control?

> IMO, a reasonable way to use both would be to only account for the
> tx_push_buf_len when tx_push is enabled, otherwise use copybreak.

I thought Shay already agreed. Let's get the v5 out.

Patch

diff --git a/Documentation/netlink/specs/ethtool.yaml b/Documentation/netlink/specs/ethtool.yaml
index 08b776908d15..244864c96feb 100644
--- a/Documentation/netlink/specs/ethtool.yaml
+++ b/Documentation/netlink/specs/ethtool.yaml
@@ -174,6 +174,12 @@  attribute-sets:
       -
         name: rx-push
         type: u8
+      -
+        name: tx-push-buf-len
+        type: u32
+      -
+        name: tx-push-buf-len-max
+        type: u32
 
   -
     name: mm-stat
@@ -324,6 +330,8 @@  operations:
             - cqe-size
             - tx-push
             - rx-push
+            - tx-push-buf-len
+            - tx-push-buf-len-max
       dump: *ring-get-op
     -
       name: rings-set
diff --git a/Documentation/networking/ethtool-netlink.rst b/Documentation/networking/ethtool-netlink.rst
index e1bc6186d7ea..1aa09e7e8dcc 100644
--- a/Documentation/networking/ethtool-netlink.rst
+++ b/Documentation/networking/ethtool-netlink.rst
@@ -860,22 +860,24 @@  Request contents:
 
 Kernel response contents:
 
-  ====================================  ======  ===========================
-  ``ETHTOOL_A_RINGS_HEADER``            nested  reply header
-  ``ETHTOOL_A_RINGS_RX_MAX``            u32     max size of RX ring
-  ``ETHTOOL_A_RINGS_RX_MINI_MAX``       u32     max size of RX mini ring
-  ``ETHTOOL_A_RINGS_RX_JUMBO_MAX``      u32     max size of RX jumbo ring
-  ``ETHTOOL_A_RINGS_TX_MAX``            u32     max size of TX ring
-  ``ETHTOOL_A_RINGS_RX``                u32     size of RX ring
-  ``ETHTOOL_A_RINGS_RX_MINI``           u32     size of RX mini ring
-  ``ETHTOOL_A_RINGS_RX_JUMBO``          u32     size of RX jumbo ring
-  ``ETHTOOL_A_RINGS_TX``                u32     size of TX ring
-  ``ETHTOOL_A_RINGS_RX_BUF_LEN``        u32     size of buffers on the ring
-  ``ETHTOOL_A_RINGS_TCP_DATA_SPLIT``    u8      TCP header / data split
-  ``ETHTOOL_A_RINGS_CQE_SIZE``          u32     Size of TX/RX CQE
-  ``ETHTOOL_A_RINGS_TX_PUSH``           u8      flag of TX Push mode
-  ``ETHTOOL_A_RINGS_RX_PUSH``           u8      flag of RX Push mode
-  ====================================  ======  ===========================
+  =======================================   ======  ===========================
+  ``ETHTOOL_A_RINGS_HEADER``                nested  reply header
+  ``ETHTOOL_A_RINGS_RX_MAX``                u32     max size of RX ring
+  ``ETHTOOL_A_RINGS_RX_MINI_MAX``           u32     max size of RX mini ring
+  ``ETHTOOL_A_RINGS_RX_JUMBO_MAX``          u32     max size of RX jumbo ring
+  ``ETHTOOL_A_RINGS_TX_MAX``                u32     max size of TX ring
+  ``ETHTOOL_A_RINGS_RX``                    u32     size of RX ring
+  ``ETHTOOL_A_RINGS_RX_MINI``               u32     size of RX mini ring
+  ``ETHTOOL_A_RINGS_RX_JUMBO``              u32     size of RX jumbo ring
+  ``ETHTOOL_A_RINGS_TX``                    u32     size of TX ring
+  ``ETHTOOL_A_RINGS_RX_BUF_LEN``            u32     size of buffers on the ring
+  ``ETHTOOL_A_RINGS_TCP_DATA_SPLIT``        u8      TCP header / data split
+  ``ETHTOOL_A_RINGS_CQE_SIZE``              u32     Size of TX/RX CQE
+  ``ETHTOOL_A_RINGS_TX_PUSH``               u8      flag of TX Push mode
+  ``ETHTOOL_A_RINGS_RX_PUSH``               u8      flag of RX Push mode
+  ``ETHTOOL_A_RINGS_TX_PUSH_BUF_LEN``       u32     size of TX push buffer
+  ``ETHTOOL_A_RINGS_TX_PUSH_BUF_LEN_MAX``   u32     max size of TX push buffer
+  =======================================   ======  ===========================
 
 ``ETHTOOL_A_RINGS_TCP_DATA_SPLIT`` indicates whether the device is usable with
 page-flipping TCP zero-copy receive (``getsockopt(TCP_ZEROCOPY_RECEIVE)``).
@@ -891,6 +893,14 @@  through MMIO writes, thus reducing the latency. However, enabling this feature
 may increase the CPU cost. Drivers may enforce additional per-packet
 eligibility checks (e.g. on packet size).
 
+``ETHTOOL_A_RINGS_TX_PUSH_BUF_LEN`` specifies the maximum number of bytes of a
+transmitted packet a driver can push directly to the underlying device
+('push' mode). Pushing some of the payload bytes to the device has the
+advantages of reducing latency for small packets by avoiding DMA mapping (same
+as ``ETHTOOL_A_RINGS_TX_PUSH`` parameter) as well as allowing the underlying
+device to process packet headers ahead of fetching its payload.
+This can help the device to make fast actions based on the packet's headers.
+
 RINGS_SET
 =========
 
@@ -908,6 +918,7 @@  Request contents:
   ``ETHTOOL_A_RINGS_CQE_SIZE``          u32     Size of TX/RX CQE
   ``ETHTOOL_A_RINGS_TX_PUSH``           u8      flag of TX Push mode
   ``ETHTOOL_A_RINGS_RX_PUSH``           u8      flag of RX Push mode
+  ``ETHTOOL_A_RINGS_TX_PUSH_BUF_LEN``   u32     size of TX push buffer
   ====================================  ======  ===========================
 
 Kernel checks that requested ring sizes do not exceed limits reported by
diff --git a/include/linux/ethtool.h b/include/linux/ethtool.h
index 2792185dda22..798d35890118 100644
--- a/include/linux/ethtool.h
+++ b/include/linux/ethtool.h
@@ -75,6 +75,8 @@  enum {
  * @tx_push: The flag of tx push mode
  * @rx_push: The flag of rx push mode
  * @cqe_size: Size of TX/RX completion queue event
+ * @tx_push_buf_len: Size of TX push buffer
+ * @tx_push_buf_max_len: Maximum allowed size of TX push buffer
  */
 struct kernel_ethtool_ringparam {
 	u32	rx_buf_len;
@@ -82,6 +84,8 @@  struct kernel_ethtool_ringparam {
 	u8	tx_push;
 	u8	rx_push;
 	u32	cqe_size;
+	u32	tx_push_buf_len;
+	u32	tx_push_buf_max_len;
 };
 
 /**
@@ -90,12 +94,14 @@  struct kernel_ethtool_ringparam {
  * @ETHTOOL_RING_USE_CQE_SIZE: capture for setting cqe_size
  * @ETHTOOL_RING_USE_TX_PUSH: capture for setting tx_push
  * @ETHTOOL_RING_USE_RX_PUSH: capture for setting rx_push
+ * @ETHTOOL_RING_USE_TX_PUSH_BUF_LEN: capture for setting tx_push_buf_len
  */
 enum ethtool_supported_ring_param {
-	ETHTOOL_RING_USE_RX_BUF_LEN = BIT(0),
-	ETHTOOL_RING_USE_CQE_SIZE   = BIT(1),
-	ETHTOOL_RING_USE_TX_PUSH    = BIT(2),
-	ETHTOOL_RING_USE_RX_PUSH    = BIT(3),
+	ETHTOOL_RING_USE_RX_BUF_LEN		= BIT(0),
+	ETHTOOL_RING_USE_CQE_SIZE		= BIT(1),
+	ETHTOOL_RING_USE_TX_PUSH		= BIT(2),
+	ETHTOOL_RING_USE_RX_PUSH		= BIT(3),
+	ETHTOOL_RING_USE_TX_PUSH_BUF_LEN	= BIT(4),
 };
 
 #define __ETH_RSS_HASH_BIT(bit)	((u32)1 << (bit))
diff --git a/include/uapi/linux/ethtool_netlink.h b/include/uapi/linux/ethtool_netlink.h
index d39ce21381c5..1ebf8d455f07 100644
--- a/include/uapi/linux/ethtool_netlink.h
+++ b/include/uapi/linux/ethtool_netlink.h
@@ -357,6 +357,8 @@  enum {
 	ETHTOOL_A_RINGS_CQE_SIZE,			/* u32 */
 	ETHTOOL_A_RINGS_TX_PUSH,			/* u8 */
 	ETHTOOL_A_RINGS_RX_PUSH,			/* u8 */
+	ETHTOOL_A_RINGS_TX_PUSH_BUF_LEN,		/* u32 */
+	ETHTOOL_A_RINGS_TX_PUSH_BUF_LEN_MAX,		/* u32 */
 
 	/* add new constants above here */
 	__ETHTOOL_A_RINGS_CNT,
diff --git a/net/ethtool/netlink.h b/net/ethtool/netlink.h
index f7b189ed96b2..79424b34b553 100644
--- a/net/ethtool/netlink.h
+++ b/net/ethtool/netlink.h
@@ -413,7 +413,7 @@  extern const struct nla_policy ethnl_features_set_policy[ETHTOOL_A_FEATURES_WANT
 extern const struct nla_policy ethnl_privflags_get_policy[ETHTOOL_A_PRIVFLAGS_HEADER + 1];
 extern const struct nla_policy ethnl_privflags_set_policy[ETHTOOL_A_PRIVFLAGS_FLAGS + 1];
 extern const struct nla_policy ethnl_rings_get_policy[ETHTOOL_A_RINGS_HEADER + 1];
-extern const struct nla_policy ethnl_rings_set_policy[ETHTOOL_A_RINGS_RX_PUSH + 1];
+extern const struct nla_policy ethnl_rings_set_policy[ETHTOOL_A_RINGS_TX_PUSH_BUF_LEN_MAX + 1];
 extern const struct nla_policy ethnl_channels_get_policy[ETHTOOL_A_CHANNELS_HEADER + 1];
 extern const struct nla_policy ethnl_channels_set_policy[ETHTOOL_A_CHANNELS_COMBINED_COUNT + 1];
 extern const struct nla_policy ethnl_coalesce_get_policy[ETHTOOL_A_COALESCE_HEADER + 1];
diff --git a/net/ethtool/rings.c b/net/ethtool/rings.c
index f358cd57d094..574a6f2e57af 100644
--- a/net/ethtool/rings.c
+++ b/net/ethtool/rings.c
@@ -11,6 +11,7 @@  struct rings_reply_data {
 	struct ethnl_reply_data		base;
 	struct ethtool_ringparam	ringparam;
 	struct kernel_ethtool_ringparam	kernel_ringparam;
+	u32				supported_ring_params;
 };
 
 #define RINGS_REPDATA(__reply_base) \
@@ -32,6 +33,8 @@  static int rings_prepare_data(const struct ethnl_req_info *req_base,
 
 	if (!dev->ethtool_ops->get_ringparam)
 		return -EOPNOTSUPP;
+
+	data->supported_ring_params = dev->ethtool_ops->supported_ring_params;
 	ret = ethnl_ops_begin(dev);
 	if (ret < 0)
 		return ret;
@@ -57,7 +60,9 @@  static int rings_reply_size(const struct ethnl_req_info *req_base,
 	       nla_total_size(sizeof(u8))  +	/* _RINGS_TCP_DATA_SPLIT */
 	       nla_total_size(sizeof(u32)  +	/* _RINGS_CQE_SIZE */
 	       nla_total_size(sizeof(u8))  +	/* _RINGS_TX_PUSH */
-	       nla_total_size(sizeof(u8)));	/* _RINGS_RX_PUSH */
+	       nla_total_size(sizeof(u8))) +	/* _RINGS_RX_PUSH */
+	       nla_total_size(sizeof(u32)) +	/* _RINGS_TX_PUSH_BUF_LEN */
+	       nla_total_size(sizeof(u32));	/* _RINGS_TX_PUSH_BUF_LEN_MAX */
 }
 
 static int rings_fill_reply(struct sk_buff *skb,
@@ -67,6 +72,7 @@  static int rings_fill_reply(struct sk_buff *skb,
 	const struct rings_reply_data *data = RINGS_REPDATA(reply_base);
 	const struct kernel_ethtool_ringparam *kr = &data->kernel_ringparam;
 	const struct ethtool_ringparam *ringparam = &data->ringparam;
+	u32 supported_ring_params = data->supported_ring_params;
 
 	WARN_ON(kr->tcp_data_split > ETHTOOL_TCP_DATA_SPLIT_ENABLED);
 
@@ -98,7 +104,12 @@  static int rings_fill_reply(struct sk_buff *skb,
 	    (kr->cqe_size &&
 	     (nla_put_u32(skb, ETHTOOL_A_RINGS_CQE_SIZE, kr->cqe_size))) ||
 	    nla_put_u8(skb, ETHTOOL_A_RINGS_TX_PUSH, !!kr->tx_push) ||
-	    nla_put_u8(skb, ETHTOOL_A_RINGS_RX_PUSH, !!kr->rx_push))
+	    nla_put_u8(skb, ETHTOOL_A_RINGS_RX_PUSH, !!kr->rx_push) ||
+	    ((supported_ring_params & ETHTOOL_RING_USE_TX_PUSH_BUF_LEN) &&
+	     (nla_put_u32(skb, ETHTOOL_A_RINGS_TX_PUSH_BUF_LEN_MAX,
+			  kr->tx_push_buf_max_len) ||
+	      nla_put_u32(skb, ETHTOOL_A_RINGS_TX_PUSH_BUF_LEN,
+			  kr->tx_push_buf_len))))
 		return -EMSGSIZE;
 
 	return 0;
@@ -117,6 +128,7 @@  const struct nla_policy ethnl_rings_set_policy[] = {
 	[ETHTOOL_A_RINGS_CQE_SIZE]		= NLA_POLICY_MIN(NLA_U32, 1),
 	[ETHTOOL_A_RINGS_TX_PUSH]		= NLA_POLICY_MAX(NLA_U8, 1),
 	[ETHTOOL_A_RINGS_RX_PUSH]		= NLA_POLICY_MAX(NLA_U8, 1),
+	[ETHTOOL_A_RINGS_TX_PUSH_BUF_LEN]	= { .type = NLA_U32 },
 };
 
 static int
@@ -158,6 +170,14 @@  ethnl_set_rings_validate(struct ethnl_req_info *req_info,
 		return -EOPNOTSUPP;
 	}
 
+	if (tb[ETHTOOL_A_RINGS_TX_PUSH_BUF_LEN] &&
+	    !(ops->supported_ring_params & ETHTOOL_RING_USE_TX_PUSH_BUF_LEN)) {
+		NL_SET_ERR_MSG_ATTR(info->extack,
+				    tb[ETHTOOL_A_RINGS_TX_PUSH_BUF_LEN],
+				    "setting tx push buf len is not supported");
+		return -EOPNOTSUPP;
+	}
+
 	return ops->get_ringparam && ops->set_ringparam ? 1 : -EOPNOTSUPP;
 }
 
@@ -189,6 +209,8 @@  ethnl_set_rings(struct ethnl_req_info *req_info, struct genl_info *info)
 			tb[ETHTOOL_A_RINGS_TX_PUSH], &mod);
 	ethnl_update_u8(&kernel_ringparam.rx_push,
 			tb[ETHTOOL_A_RINGS_RX_PUSH], &mod);
+	ethnl_update_u32(&kernel_ringparam.tx_push_buf_len,
+			 tb[ETHTOOL_A_RINGS_TX_PUSH_BUF_LEN], &mod);
 	if (!mod)
 		return 0;
 
@@ -209,6 +231,13 @@  ethnl_set_rings(struct ethnl_req_info *req_info, struct genl_info *info)
 		return -EINVAL;
 	}
 
+	if (kernel_ringparam.tx_push_buf_len > kernel_ringparam.tx_push_buf_max_len) {
+		NL_SET_ERR_MSG_ATTR(info->extack, tb[ETHTOOL_A_RINGS_TX_PUSH_BUF_LEN],
+				    "Requested TX push buffer exceeds maximum");
+
+		return -EINVAL;
+	}
+
 	ret = dev->ethtool_ops->set_ringparam(dev, &ringparam,
 					      &kernel_ringparam, info->extack);
 	return ret < 0 ? ret : 1;
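
And the set_ringparam counterpart to the earlier sketch, again with
hypothetical my_ names. Note that by the time this is called, the core
code above (ethnl_set_rings) has already rejected values that exceed the
tx_push_buf_max_len reported by get_ringparam:

static int my_set_ringparam(struct net_device *dev,
			    struct ethtool_ringparam *ring,
			    struct kernel_ethtool_ringparam *kring,
			    struct netlink_ext_ack *extack)
{
	struct my_priv *priv = netdev_priv(dev);

	/* Already validated against tx_push_buf_max_len by the core */
	priv->tx_push_buf_len = kring->tx_push_buf_len;

	/* Hypothetical helper: re-create TX rings with the new slot size */
	return my_rebuild_tx_rings(priv);
}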