Message ID | 20241003160620.1521626-4-ap420073@gmail.com (mailing list archive) |
---|---|
State | Changes Requested |
Delegated to: | Netdev Maintainers |
Headers | show |
Series | bnxt_en: implement device memory TCP for bnxt | expand |
On Thu, Oct 3, 2024 at 9:07 AM Taehee Yoo <ap420073@gmail.com> wrote: > > The tcp-data-split-thresh option configures the threshold value of > the tcp-data-split. > If a received packet size is larger than this threshold value, a packet > will be split into header and payload. Why do you need this? devmem TCP will always not work with unsplit packets. Seems like you always want to set thresh to 0 to support something like devmem TCP. Why would the user ever want to configure this? I can't think of a scenario where the user wouldn't want packets under X bytes to be unsplit.
On Fri, Oct 4, 2024 at 3:25 AM Mina Almasry <almasrymina@google.com> wrote: > > On Thu, Oct 3, 2024 at 9:07 AM Taehee Yoo <ap420073@gmail.com> wrote: > > > > The tcp-data-split-thresh option configures the threshold value of > > the tcp-data-split. > > If a received packet size is larger than this threshold value, a packet > > will be split into header and payload. > > Why do you need this? devmem TCP will always not work with unsplit > packets. Seems like you always want to set thresh to 0 to support > something like devmem TCP. > > Why would the user ever want to configure this? I can't think of a > scenario where the user wouldn't want packets under X bytes to be > unsplit. I totally understand what you mean, Yes, tcp-data-split is zerocopy friendly option but as far as I know, this option is not only for the zerocopy usecase. So, If users enable tcp-data-split, they would assume threshold is 0. But there are already NICs that have been supporting tcp-data-split enabled by default. bnxt_en's default value is 256bytes. If we just assume the tcp-data-split-threshold to 0 for all cases, it would change the default behavior of bnxt_en driver(maybe other drivers too) for the not zerocopy case. Jakub pointed out the generic case, not only for zerocopy usecase in the v1 and I agree with that opinion. https://lore.kernel.org/netdev/20240906183844.2e8226f3@kernel.org/
On Thu, Oct 3, 2024 at 12:33 PM Taehee Yoo <ap420073@gmail.com> wrote: > > On Fri, Oct 4, 2024 at 3:25 AM Mina Almasry <almasrymina@google.com> wrote: > > > > On Thu, Oct 3, 2024 at 9:07 AM Taehee Yoo <ap420073@gmail.com> wrote: > > > > > > The tcp-data-split-thresh option configures the threshold value of > > > the tcp-data-split. > > > If a received packet size is larger than this threshold value, a packet > > > will be split into header and payload. > > > > Why do you need this? devmem TCP will always not work with unsplit > > packets. Seems like you always want to set thresh to 0 to support > > something like devmem TCP. > > > > Why would the user ever want to configure this? I can't think of a > > scenario where the user wouldn't want packets under X bytes to be > > unsplit. > > I totally understand what you mean, > Yes, tcp-data-split is zerocopy friendly option but as far as I know, > this option is not only for the zerocopy usecase. > So, If users enable tcp-data-split, they would assume threshold is 0. > But there are already NICs that have been supporting tcp-data-split > enabled by default. > bnxt_en's default value is 256bytes. > If we just assume the tcp-data-split-threshold to 0 for all cases, > it would change the default behavior of bnxt_en driver(maybe other drivers too) > for the not zerocopy case. > Jakub pointed out the generic case, not only for zerocopy usecase > in the v1 and I agree with that opinion. > https://lore.kernel.org/netdev/20240906183844.2e8226f3@kernel.org/ I see, thanks. The ability to tune the threshold to save some pcie bandwidth is interesting. Not sure how much it would matter in practice. I guess if you're receiving _lots_ of small packets then it could be critical. Sounds good then, please consider adding Jakub's reasoning for why tuning this could be valuable to the commit message for future userspace readers that wonder why to set this.
On Fri, Oct 4, 2024 at 10:47 AM Mina Almasry <almasrymina@google.com> wrote: > > On Thu, Oct 3, 2024 at 12:33 PM Taehee Yoo <ap420073@gmail.com> wrote: > > > > On Fri, Oct 4, 2024 at 3:25 AM Mina Almasry <almasrymina@google.com> wrote: > > > > > > On Thu, Oct 3, 2024 at 9:07 AM Taehee Yoo <ap420073@gmail.com> wrote: > > > > > > > > The tcp-data-split-thresh option configures the threshold value of > > > > the tcp-data-split. > > > > If a received packet size is larger than this threshold value, a packet > > > > will be split into header and payload. > > > > > > Why do you need this? devmem TCP will always not work with unsplit > > > packets. Seems like you always want to set thresh to 0 to support > > > something like devmem TCP. > > > > > > Why would the user ever want to configure this? I can't think of a > > > scenario where the user wouldn't want packets under X bytes to be > > > unsplit. > > > > I totally understand what you mean, > > Yes, tcp-data-split is zerocopy friendly option but as far as I know, > > this option is not only for the zerocopy usecase. > > So, If users enable tcp-data-split, they would assume threshold is 0. > > But there are already NICs that have been supporting tcp-data-split > > enabled by default. > > bnxt_en's default value is 256bytes. > > If we just assume the tcp-data-split-threshold to 0 for all cases, > > it would change the default behavior of bnxt_en driver(maybe other drivers too) > > for the not zerocopy case. > > Jakub pointed out the generic case, not only for zerocopy usecase > > in the v1 and I agree with that opinion. > > https://lore.kernel.org/netdev/20240906183844.2e8226f3@kernel.org/ > > I see, thanks. The ability to tune the threshold to save some pcie > bandwidth is interesting. Not sure how much it would matter in > practice. I guess if you're receiving _lots_ of small packets then it > could be critical. > > Sounds good then, please consider adding Jakub's reasoning for why > tuning this could be valuable to the commit message for future > userspace readers that wonder why to set this. Okay, I will add an explanation of this feature to commit message in v4 patch. Thanks a lot! Taehee Yoo > > -- > Thanks, > Mina
On Thu, 3 Oct 2024 16:06:16 +0000 Taehee Yoo wrote: > The tcp-data-split-thresh option configures the threshold value of > the tcp-data-split. > If a received packet size is larger than this threshold value, a packet > will be split into header and payload. > The header indicates TCP header, but it depends on driver spec. > The bnxt_en driver supports HDS(Header-Data-Split) configuration at > FW level, affecting TCP and UDP too. > So, like the tcp-data-split option, If tcp-data-split-thresh is set, > it affects UDP and TCP packets. > > The tcp-data-split-thresh has a dependency, that is tcp-data-split > option. This threshold value can be get/set only when tcp-data-split > option is enabled. > > Example: > # ethtool -G <interface name> tcp-data-split-thresh <value> > > # ethtool -G enp14s0f0np0 tcp-data-split on tcp-data-split-thresh 256 > # ethtool -g enp14s0f0np0 > Ring parameters for enp14s0f0np0: > Pre-set maximums: > ... > TCP data split thresh: 256 > Current hardware settings: > ... > TCP data split: on > TCP data split thresh: 256 > > The tcp-data-split is not enabled, the tcp-data-split-thresh will > not be used and can't be configured. > > # ethtool -G enp14s0f0np0 tcp-data-split off > # ethtool -g enp14s0f0np0 > Ring parameters for enp14s0f0np0: > Pre-set maximums: > ... > TCP data split thresh: 256 > Current hardware settings: > ... > TCP data split: off > TCP data split thresh: n/a My reply to Sridhar was probably quite unclear on this point, but FWIW I do also have a weak preference to drop the "TCP" from the new knob. Rephrasing what I said here: https://lore.kernel.org/all/20240911173150.571bf93b@kernel.org/ the old knob is defined as being about TCP but for the new one we can pick how widely applicable it is (and make it cover UDP as well). > The default/min/max values are not defined in the ethtool so the drivers > should define themself. > The 0 value means that all TCP and UDP packets' header and payload > will be split. > diff --git a/include/linux/ethtool.h b/include/linux/ethtool.h > index 12f6dc567598..891f55b0f6aa 100644 > --- a/include/linux/ethtool.h > +++ b/include/linux/ethtool.h > @@ -78,6 +78,8 @@ enum { > * @cqe_size: Size of TX/RX completion queue event > * @tx_push_buf_len: Size of TX push buffer > * @tx_push_buf_max_len: Maximum allowed size of TX push buffer > + * @tcp_data_split_thresh: Threshold value of tcp-data-split > + * @tcp_data_split_thresh_max: Maximum allowed threshold of tcp-data-split-threshold Please wrap at 80 chars: ./scripts/checkpatch.pl --max-line-length=80 --strict $patch.. > static int rings_fill_reply(struct sk_buff *skb, > @@ -108,7 +110,13 @@ static int rings_fill_reply(struct sk_buff *skb, > (nla_put_u32(skb, ETHTOOL_A_RINGS_TX_PUSH_BUF_LEN_MAX, > kr->tx_push_buf_max_len) || > nla_put_u32(skb, ETHTOOL_A_RINGS_TX_PUSH_BUF_LEN, > - kr->tx_push_buf_len)))) > + kr->tx_push_buf_len))) || > + (kr->tcp_data_split == ETHTOOL_TCP_DATA_SPLIT_ENABLED && Please add a new ETHTOOL_RING_USE_* flag for this, or fix all the drivers which set ETHTOOL_RING_USE_TCP_DATA_SPLIT already and use that. I don't think we should hide the value when HDS is disabled. > + (nla_put_u32(skb, ETHTOOL_A_RINGS_TCP_DATA_SPLIT_THRESH, > + kr->tcp_data_split_thresh))) || nit: unnecessary brackets around the nla_put_u32() > + (kr->tcp_data_split == ETHTOOL_TCP_DATA_SPLIT_ENABLED && > + (nla_put_u32(skb, ETHTOOL_A_RINGS_TCP_DATA_SPLIT_THRESH_MAX, > + kr->tcp_data_split_thresh_max)))) > return -EMSGSIZE; > > return 0; > + if (tb[ETHTOOL_A_RINGS_TCP_DATA_SPLIT_THRESH] && > + !(ops->supported_ring_params & ETHTOOL_RING_USE_TCP_DATA_SPLIT)) { here you use the existing flag, yet gve and idpf set that flag and will ignore the setting silently. They need to be changed or we need a new flag. > + NL_SET_ERR_MSG_ATTR(info->extack, > + tb[ETHTOOL_A_RINGS_TCP_DATA_SPLIT_THRESH], > + "setting tcp-data-split-thresh is not supported"); > + return -EOPNOTSUPP; > + } > + > if (tb[ETHTOOL_A_RINGS_CQE_SIZE] && > !(ops->supported_ring_params & ETHTOOL_RING_USE_CQE_SIZE)) { > NL_SET_ERR_MSG_ATTR(info->extack, > @@ -196,9 +213,9 @@ ethnl_set_rings(struct ethnl_req_info *req_info, struct genl_info *info) > struct kernel_ethtool_ringparam kernel_ringparam = {}; > struct ethtool_ringparam ringparam = {}; > struct net_device *dev = req_info->dev; > + bool mod = false, thresh_mod = false; > struct nlattr **tb = info->attrs; > const struct nlattr *err_attr; > - bool mod = false; > int ret; > > dev->ethtool_ops->get_ringparam(dev, &ringparam, > @@ -222,9 +239,30 @@ ethnl_set_rings(struct ethnl_req_info *req_info, struct genl_info *info) > tb[ETHTOOL_A_RINGS_RX_PUSH], &mod); > ethnl_update_u32(&kernel_ringparam.tx_push_buf_len, > tb[ETHTOOL_A_RINGS_TX_PUSH_BUF_LEN], &mod); > - if (!mod) > + ethnl_update_u32(&kernel_ringparam.tcp_data_split_thresh, > + tb[ETHTOOL_A_RINGS_TCP_DATA_SPLIT_THRESH], > + &thresh_mod); > + if (!mod && !thresh_mod) > return 0; > > + if (kernel_ringparam.tcp_data_split == ETHTOOL_TCP_DATA_SPLIT_DISABLED && > + thresh_mod) { > + NL_SET_ERR_MSG_ATTR(info->extack, > + tb[ETHTOOL_A_RINGS_TCP_DATA_SPLIT_THRESH], > + "tcp-data-split-thresh can not be updated while tcp-data-split is disabled"); > + return -EINVAL; > + } I'm not sure we need to reject changing the setting when HDS is disabled. Driver can just store the value until HDS gets enabled? WDYT? I don't have a strong preference. > + if (kernel_ringparam.tcp_data_split_thresh > > + kernel_ringparam.tcp_data_split_thresh_max) { > + NL_SET_ERR_MSG_ATTR_FMT(info->extack, > + tb[ETHTOOL_A_RINGS_TCP_DATA_SPLIT_THRESH_MAX], > + "Requested tcp-data-split-thresh exceeds the maximum of %u", No need for the string, just NL_SET_BAD_ATTR() + ERANGE is enough > + kernel_ringparam.tcp_data_split_thresh_max); > + > + return -EINVAL; ERANGE > + } > + > /* ensure new ring parameters are within limits */ > if (ringparam.rx_pending > ringparam.rx_max_pending) > err_attr = tb[ETHTOOL_A_RINGS_RX];
On Wed, Oct 9, 2024 at 3:33 AM Jakub Kicinski <kuba@kernel.org> wrote: > > On Thu, 3 Oct 2024 16:06:16 +0000 Taehee Yoo wrote: > > The tcp-data-split-thresh option configures the threshold value of > > the tcp-data-split. > > If a received packet size is larger than this threshold value, a packet > > will be split into header and payload. > > The header indicates TCP header, but it depends on driver spec. > > The bnxt_en driver supports HDS(Header-Data-Split) configuration at > > FW level, affecting TCP and UDP too. > > So, like the tcp-data-split option, If tcp-data-split-thresh is set, > > it affects UDP and TCP packets. > > > > The tcp-data-split-thresh has a dependency, that is tcp-data-split > > option. This threshold value can be get/set only when tcp-data-split > > option is enabled. > > > > Example: > > # ethtool -G <interface name> tcp-data-split-thresh <value> > > > > # ethtool -G enp14s0f0np0 tcp-data-split on tcp-data-split-thresh 256 > > # ethtool -g enp14s0f0np0 > > Ring parameters for enp14s0f0np0: > > Pre-set maximums: > > ... > > TCP data split thresh: 256 > > Current hardware settings: > > ... > > TCP data split: on > > TCP data split thresh: 256 > > > > The tcp-data-split is not enabled, the tcp-data-split-thresh will > > not be used and can't be configured. > > > > # ethtool -G enp14s0f0np0 tcp-data-split off > > # ethtool -g enp14s0f0np0 > > Ring parameters for enp14s0f0np0: > > Pre-set maximums: > > ... > > TCP data split thresh: 256 > > Current hardware settings: > > ... > > TCP data split: off > > TCP data split thresh: n/a > > My reply to Sridhar was probably quite unclear on this point, but FWIW > I do also have a weak preference to drop the "TCP" from the new knob. > Rephrasing what I said here: > https://lore.kernel.org/all/20240911173150.571bf93b@kernel.org/ > the old knob is defined as being about TCP but for the new one we can > pick how widely applicable it is (and make it cover UDP as well). I'm not sure that I understand about "knob". The knob means the command "tcp-data-split-thresh"? If so, I would like to change from "tcp-data-split-thresh" to "header-data-split-thresh". > > > The default/min/max values are not defined in the ethtool so the drivers > > should define themself. > > The 0 value means that all TCP and UDP packets' header and payload > > will be split. > > > diff --git a/include/linux/ethtool.h b/include/linux/ethtool.h > > index 12f6dc567598..891f55b0f6aa 100644 > > --- a/include/linux/ethtool.h > > +++ b/include/linux/ethtool.h > > @@ -78,6 +78,8 @@ enum { > > * @cqe_size: Size of TX/RX completion queue event > > * @tx_push_buf_len: Size of TX push buffer > > * @tx_push_buf_max_len: Maximum allowed size of TX push buffer > > + * @tcp_data_split_thresh: Threshold value of tcp-data-split > > + * @tcp_data_split_thresh_max: Maximum allowed threshold of tcp-data-split-threshold > > Please wrap at 80 chars: > > ./scripts/checkpatch.pl --max-line-length=80 --strict $patch.. Thanks, I will fix this in v4 patch. > > > static int rings_fill_reply(struct sk_buff *skb, > > @@ -108,7 +110,13 @@ static int rings_fill_reply(struct sk_buff *skb, > > (nla_put_u32(skb, ETHTOOL_A_RINGS_TX_PUSH_BUF_LEN_MAX, > > kr->tx_push_buf_max_len) || > > nla_put_u32(skb, ETHTOOL_A_RINGS_TX_PUSH_BUF_LEN, > > - kr->tx_push_buf_len)))) > > + kr->tx_push_buf_len))) || > > + (kr->tcp_data_split == ETHTOOL_TCP_DATA_SPLIT_ENABLED && > > Please add a new ETHTOOL_RING_USE_* flag for this, or fix all the > drivers which set ETHTOOL_RING_USE_TCP_DATA_SPLIT already and use that. > I don't think we should hide the value when HDS is disabled. > > > + (nla_put_u32(skb, ETHTOOL_A_RINGS_TCP_DATA_SPLIT_THRESH, > > + kr->tcp_data_split_thresh))) || > > nit: unnecessary brackets around the nla_put_u32() I will fix this too. > > > + (kr->tcp_data_split == ETHTOOL_TCP_DATA_SPLIT_ENABLED && > > + (nla_put_u32(skb, ETHTOOL_A_RINGS_TCP_DATA_SPLIT_THRESH_MAX, > > + kr->tcp_data_split_thresh_max)))) > > return -EMSGSIZE; > > > > return 0; > > > + if (tb[ETHTOOL_A_RINGS_TCP_DATA_SPLIT_THRESH] && > > + !(ops->supported_ring_params & ETHTOOL_RING_USE_TCP_DATA_SPLIT)) { > > here you use the existing flag, yet gve and idpf set that flag and will > ignore the setting silently. They need to be changed or we need a new > flag. Okay, I would like to add the ETHTOOL_RING_USE_TCP_DATA_SPLIT_THRESH flag. Or ETHTOOL_RING_USE_HDS_THRESH, which indicates header-data-split thresh. If you agree with adding a new flag, how do you think about naming it? > > > + NL_SET_ERR_MSG_ATTR(info->extack, > > + tb[ETHTOOL_A_RINGS_TCP_DATA_SPLIT_THRESH], > > + "setting tcp-data-split-thresh is not supported"); > > + return -EOPNOTSUPP; > > + } > > + > > if (tb[ETHTOOL_A_RINGS_CQE_SIZE] && > > !(ops->supported_ring_params & ETHTOOL_RING_USE_CQE_SIZE)) { > > NL_SET_ERR_MSG_ATTR(info->extack, > > @@ -196,9 +213,9 @@ ethnl_set_rings(struct ethnl_req_info *req_info, struct genl_info *info) > > struct kernel_ethtool_ringparam kernel_ringparam = {}; > > struct ethtool_ringparam ringparam = {}; > > struct net_device *dev = req_info->dev; > > + bool mod = false, thresh_mod = false; > > struct nlattr **tb = info->attrs; > > const struct nlattr *err_attr; > > - bool mod = false; > > int ret; > > > > dev->ethtool_ops->get_ringparam(dev, &ringparam, > > @@ -222,9 +239,30 @@ ethnl_set_rings(struct ethnl_req_info *req_info, struct genl_info *info) > > tb[ETHTOOL_A_RINGS_RX_PUSH], &mod); > > ethnl_update_u32(&kernel_ringparam.tx_push_buf_len, > > tb[ETHTOOL_A_RINGS_TX_PUSH_BUF_LEN], &mod); > > - if (!mod) > > + ethnl_update_u32(&kernel_ringparam.tcp_data_split_thresh, > > + tb[ETHTOOL_A_RINGS_TCP_DATA_SPLIT_THRESH], > > + &thresh_mod); > > + if (!mod && !thresh_mod) > > return 0; > > > > + if (kernel_ringparam.tcp_data_split == ETHTOOL_TCP_DATA_SPLIT_DISABLED && > > + thresh_mod) { > > + NL_SET_ERR_MSG_ATTR(info->extack, > > + tb[ETHTOOL_A_RINGS_TCP_DATA_SPLIT_THRESH], > > + "tcp-data-split-thresh can not be updated while tcp-data-split is disabled"); > > + return -EINVAL; > > + } > > I'm not sure we need to reject changing the setting when HDS is > disabled. Driver can just store the value until HDS gets enabled? > WDYT? I don't have a strong preference. I checked similar options, which are tx-push and tx-push-buff-len, updating tx-push-buff-len may not fail when tx-push is disabled. I think It's too strong condition check and it's not consistent with similar options. The disabling HDS option is going to be removed in v4 patch. I asked about how to handle hds_threshold when it is UNKNOWN mode in the previous patch thread. If the hds_threshold should follow rx-copybreak value in the UNKNOWN mode, this condition check is not necessary. > > > + if (kernel_ringparam.tcp_data_split_thresh > > > + kernel_ringparam.tcp_data_split_thresh_max) { > > + NL_SET_ERR_MSG_ATTR_FMT(info->extack, > > + tb[ETHTOOL_A_RINGS_TCP_DATA_SPLIT_THRESH_MAX], > > + "Requested tcp-data-split-thresh exceeds the maximum of %u", > > No need for the string, just NL_SET_BAD_ATTR() + ERANGE is enough Thanks, I will fix it. > > > + kernel_ringparam.tcp_data_split_thresh_max); > > + > > + return -EINVAL; > > ERANGE I will fix it too. > > > + } > > + > > /* ensure new ring parameters are within limits */ > > if (ringparam.rx_pending > ringparam.rx_max_pending) > > err_attr = tb[ETHTOOL_A_RINGS_RX]; >
On Wed, 9 Oct 2024 23:25:55 +0900 Taehee Yoo wrote: > > > The tcp-data-split is not enabled, the tcp-data-split-thresh will > > > not be used and can't be configured. > > > > > > # ethtool -G enp14s0f0np0 tcp-data-split off > > > # ethtool -g enp14s0f0np0 > > > Ring parameters for enp14s0f0np0: > > > Pre-set maximums: > > > ... > > > TCP data split thresh: 256 > > > Current hardware settings: > > > ... > > > TCP data split: off > > > TCP data split thresh: n/a > > > > My reply to Sridhar was probably quite unclear on this point, but FWIW > > I do also have a weak preference to drop the "TCP" from the new knob. > > Rephrasing what I said here: > > https://lore.kernel.org/all/20240911173150.571bf93b@kernel.org/ > > the old knob is defined as being about TCP but for the new one we can > > pick how widely applicable it is (and make it cover UDP as well). > > I'm not sure that I understand about "knob". > The knob means the command "tcp-data-split-thresh"? > If so, I would like to change from "tcp-data-split-thresh" to > "header-data-split-thresh". Sounds good! > > > + if (tb[ETHTOOL_A_RINGS_TCP_DATA_SPLIT_THRESH] && > > > + !(ops->supported_ring_params & ETHTOOL_RING_USE_TCP_DATA_SPLIT)) { > > > > here you use the existing flag, yet gve and idpf set that flag and will > > ignore the setting silently. They need to be changed or we need a new > > flag. > > Okay, I would like to add the ETHTOOL_RING_USE_TCP_DATA_SPLIT_THRESH flag. > Or ETHTOOL_RING_USE_HDS_THRESH, which indicates header-data-split thresh. > If you agree with adding a new flag, how do you think about naming it? How about ETHTOOL_RING_USE_HDS_THRS ?
On Thu, Oct 10, 2024 at 12:46 AM Jakub Kicinski <kuba@kernel.org> wrote: > > On Wed, 9 Oct 2024 23:25:55 +0900 Taehee Yoo wrote: > > > > The tcp-data-split is not enabled, the tcp-data-split-thresh will > > > > not be used and can't be configured. > > > > > > > > # ethtool -G enp14s0f0np0 tcp-data-split off > > > > # ethtool -g enp14s0f0np0 > > > > Ring parameters for enp14s0f0np0: > > > > Pre-set maximums: > > > > ... > > > > TCP data split thresh: 256 > > > > Current hardware settings: > > > > ... > > > > TCP data split: off > > > > TCP data split thresh: n/a > > > > > > My reply to Sridhar was probably quite unclear on this point, but FWIW > > > I do also have a weak preference to drop the "TCP" from the new knob. > > > Rephrasing what I said here: > > > https://lore.kernel.org/all/20240911173150.571bf93b@kernel.org/ > > > the old knob is defined as being about TCP but for the new one we can > > > pick how widely applicable it is (and make it cover UDP as well). > > > > I'm not sure that I understand about "knob". > > The knob means the command "tcp-data-split-thresh"? > > If so, I would like to change from "tcp-data-split-thresh" to > > "header-data-split-thresh". > > Sounds good! > > > > > + if (tb[ETHTOOL_A_RINGS_TCP_DATA_SPLIT_THRESH] && > > > > + !(ops->supported_ring_params & ETHTOOL_RING_USE_TCP_DATA_SPLIT)) { > > > > > > here you use the existing flag, yet gve and idpf set that flag and will > > > ignore the setting silently. They need to be changed or we need a new > > > flag. > > > > Okay, I would like to add the ETHTOOL_RING_USE_TCP_DATA_SPLIT_THRESH flag. > > Or ETHTOOL_RING_USE_HDS_THRESH, which indicates header-data-split thresh. > > If you agree with adding a new flag, how do you think about naming it? > > How about ETHTOOL_RING_USE_HDS_THRS ? Thanks! I will use that name.
diff --git a/Documentation/netlink/specs/ethtool.yaml b/Documentation/netlink/specs/ethtool.yaml index 6a050d755b9c..96298fe5ed43 100644 --- a/Documentation/netlink/specs/ethtool.yaml +++ b/Documentation/netlink/specs/ethtool.yaml @@ -215,6 +215,12 @@ attribute-sets: - name: tx-push-buf-len-max type: u32 + - + name: tcp-data-split-thresh + type: u32 + - + name: tcp-data-split-thresh-max + type: u32 - name: mm-stat @@ -1393,6 +1399,8 @@ operations: - rx-push - tx-push-buf-len - tx-push-buf-len-max + - tcp-data-split-thresh + - tcp-data-split-thresh-max dump: *ring-get-op - name: rings-set diff --git a/Documentation/networking/ethtool-netlink.rst b/Documentation/networking/ethtool-netlink.rst index 295563e91082..f0cd918dbe7e 100644 --- a/Documentation/networking/ethtool-netlink.rst +++ b/Documentation/networking/ethtool-netlink.rst @@ -875,24 +875,32 @@ Request contents: Kernel response contents: - ======================================= ====== =========================== - ``ETHTOOL_A_RINGS_HEADER`` nested reply header - ``ETHTOOL_A_RINGS_RX_MAX`` u32 max size of RX ring - ``ETHTOOL_A_RINGS_RX_MINI_MAX`` u32 max size of RX mini ring - ``ETHTOOL_A_RINGS_RX_JUMBO_MAX`` u32 max size of RX jumbo ring - ``ETHTOOL_A_RINGS_TX_MAX`` u32 max size of TX ring - ``ETHTOOL_A_RINGS_RX`` u32 size of RX ring - ``ETHTOOL_A_RINGS_RX_MINI`` u32 size of RX mini ring - ``ETHTOOL_A_RINGS_RX_JUMBO`` u32 size of RX jumbo ring - ``ETHTOOL_A_RINGS_TX`` u32 size of TX ring - ``ETHTOOL_A_RINGS_RX_BUF_LEN`` u32 size of buffers on the ring - ``ETHTOOL_A_RINGS_TCP_DATA_SPLIT`` u8 TCP header / data split - ``ETHTOOL_A_RINGS_CQE_SIZE`` u32 Size of TX/RX CQE - ``ETHTOOL_A_RINGS_TX_PUSH`` u8 flag of TX Push mode - ``ETHTOOL_A_RINGS_RX_PUSH`` u8 flag of RX Push mode - ``ETHTOOL_A_RINGS_TX_PUSH_BUF_LEN`` u32 size of TX push buffer - ``ETHTOOL_A_RINGS_TX_PUSH_BUF_LEN_MAX`` u32 max size of TX push buffer - ======================================= ====== =========================== + ============================================= ====== ======================= + ``ETHTOOL_A_RINGS_HEADER`` nested reply header + ``ETHTOOL_A_RINGS_RX_MAX`` u32 max size of RX ring + ``ETHTOOL_A_RINGS_RX_MINI_MAX`` u32 max size of RX mini + ring + ``ETHTOOL_A_RINGS_RX_JUMBO_MAX`` u32 max size of RX jumbo + ring + ``ETHTOOL_A_RINGS_TX_MAX`` u32 max size of TX ring + ``ETHTOOL_A_RINGS_RX`` u32 size of RX ring + ``ETHTOOL_A_RINGS_RX_MINI`` u32 size of RX mini ring + ``ETHTOOL_A_RINGS_RX_JUMBO`` u32 size of RX jumbo ring + ``ETHTOOL_A_RINGS_TX`` u32 size of TX ring + ``ETHTOOL_A_RINGS_RX_BUF_LEN`` u32 size of buffers on the + ring + ``ETHTOOL_A_RINGS_TCP_DATA_SPLIT`` u8 TCP header / data split + ``ETHTOOL_A_RINGS_CQE_SIZE`` u32 Size of TX/RX CQE + ``ETHTOOL_A_RINGS_TX_PUSH`` u8 flag of TX Push mode + ``ETHTOOL_A_RINGS_RX_PUSH`` u8 flag of RX Push mode + ``ETHTOOL_A_RINGS_TX_PUSH_BUF_LEN`` u32 size of TX push buffer + ``ETHTOOL_A_RINGS_TX_PUSH_BUF_LEN_MAX`` u32 max size of TX push + buffer + ``ETHTOOL_A_RINGS_TCP_DATA_SPLIT_THRESH`` u32 threshold of + TCP header / data split + ``ETHTOOL_A_RINGS_TCP_DATA_SPLIT_THRESH_MAX`` u32 max threshold of + TCP header / data split + ============================================= ====== ======================= ``ETHTOOL_A_RINGS_TCP_DATA_SPLIT`` indicates whether the device is usable with page-flipping TCP zero-copy receive (``getsockopt(TCP_ZEROCOPY_RECEIVE)``). @@ -927,18 +935,21 @@ Sets ring sizes like ``ETHTOOL_SRINGPARAM`` ioctl request. Request contents: - ==================================== ====== =========================== - ``ETHTOOL_A_RINGS_HEADER`` nested reply header - ``ETHTOOL_A_RINGS_RX`` u32 size of RX ring - ``ETHTOOL_A_RINGS_RX_MINI`` u32 size of RX mini ring - ``ETHTOOL_A_RINGS_RX_JUMBO`` u32 size of RX jumbo ring - ``ETHTOOL_A_RINGS_TX`` u32 size of TX ring - ``ETHTOOL_A_RINGS_RX_BUF_LEN`` u32 size of buffers on the ring - ``ETHTOOL_A_RINGS_CQE_SIZE`` u32 Size of TX/RX CQE - ``ETHTOOL_A_RINGS_TX_PUSH`` u8 flag of TX Push mode - ``ETHTOOL_A_RINGS_RX_PUSH`` u8 flag of RX Push mode - ``ETHTOOL_A_RINGS_TX_PUSH_BUF_LEN`` u32 size of TX push buffer - ==================================== ====== =========================== + ========================================= ====== ======================= + ``ETHTOOL_A_RINGS_HEADER`` nested reply header + ``ETHTOOL_A_RINGS_RX`` u32 size of RX ring + ``ETHTOOL_A_RINGS_RX_MINI`` u32 size of RX mini ring + ``ETHTOOL_A_RINGS_RX_JUMBO`` u32 size of RX jumbo ring + ``ETHTOOL_A_RINGS_TX`` u32 size of TX ring + ``ETHTOOL_A_RINGS_RX_BUF_LEN`` u32 size of buffers on the ring + ``ETHTOOL_A_RINGS_TCP_DATA_SPLIT`` u8 TCP header / data split + ``ETHTOOL_A_RINGS_CQE_SIZE`` u32 Size of TX/RX CQE + ``ETHTOOL_A_RINGS_TX_PUSH`` u8 flag of TX Push mode + ``ETHTOOL_A_RINGS_RX_PUSH`` u8 flag of RX Push mode + ``ETHTOOL_A_RINGS_TX_PUSH_BUF_LEN`` u32 size of TX push buffer + ``ETHTOOL_A_RINGS_TCP_DATA_SPLIT_THRESH`` u32 threshold of + TCP header / data split + ========================================= ====== ======================= Kernel checks that requested ring sizes do not exceed limits reported by driver. Driver may impose additional constraints and may not support all @@ -954,6 +965,10 @@ A bigger CQE can have more receive buffer pointers, and in turn the NIC can transfer a bigger frame from wire. Based on the NIC hardware, the overall completion queue size can be adjusted in the driver if CQE size is modified. +``ETHTOOL_A_RINGS_TCP_DATA_SPLIT_THRESH`` specifies the threshold value of +tcp data split feature. If tcp-data-split is enabled and a received packet +size is larger than this threshold value, header and data will be split. + CHANNELS_GET ============ diff --git a/include/linux/ethtool.h b/include/linux/ethtool.h index 12f6dc567598..891f55b0f6aa 100644 --- a/include/linux/ethtool.h +++ b/include/linux/ethtool.h @@ -78,6 +78,8 @@ enum { * @cqe_size: Size of TX/RX completion queue event * @tx_push_buf_len: Size of TX push buffer * @tx_push_buf_max_len: Maximum allowed size of TX push buffer + * @tcp_data_split_thresh: Threshold value of tcp-data-split + * @tcp_data_split_thresh_max: Maximum allowed threshold of tcp-data-split-threshold */ struct kernel_ethtool_ringparam { u32 rx_buf_len; @@ -87,6 +89,8 @@ struct kernel_ethtool_ringparam { u32 cqe_size; u32 tx_push_buf_len; u32 tx_push_buf_max_len; + u32 tcp_data_split_thresh; + u32 tcp_data_split_thresh_max; }; /** diff --git a/include/uapi/linux/ethtool_netlink.h b/include/uapi/linux/ethtool_netlink.h index 283305f6b063..20fe6065b7ba 100644 --- a/include/uapi/linux/ethtool_netlink.h +++ b/include/uapi/linux/ethtool_netlink.h @@ -364,6 +364,8 @@ enum { ETHTOOL_A_RINGS_RX_PUSH, /* u8 */ ETHTOOL_A_RINGS_TX_PUSH_BUF_LEN, /* u32 */ ETHTOOL_A_RINGS_TX_PUSH_BUF_LEN_MAX, /* u32 */ + ETHTOOL_A_RINGS_TCP_DATA_SPLIT_THRESH, /* u32 */ + ETHTOOL_A_RINGS_TCP_DATA_SPLIT_THRESH_MAX, /* u32 */ /* add new constants above here */ __ETHTOOL_A_RINGS_CNT, diff --git a/net/ethtool/netlink.h b/net/ethtool/netlink.h index 203b08eb6c6f..8bea47a26605 100644 --- a/net/ethtool/netlink.h +++ b/net/ethtool/netlink.h @@ -455,7 +455,7 @@ extern const struct nla_policy ethnl_features_set_policy[ETHTOOL_A_FEATURES_WANT extern const struct nla_policy ethnl_privflags_get_policy[ETHTOOL_A_PRIVFLAGS_HEADER + 1]; extern const struct nla_policy ethnl_privflags_set_policy[ETHTOOL_A_PRIVFLAGS_FLAGS + 1]; extern const struct nla_policy ethnl_rings_get_policy[ETHTOOL_A_RINGS_HEADER + 1]; -extern const struct nla_policy ethnl_rings_set_policy[ETHTOOL_A_RINGS_TX_PUSH_BUF_LEN_MAX + 1]; +extern const struct nla_policy ethnl_rings_set_policy[ETHTOOL_A_RINGS_TCP_DATA_SPLIT_THRESH_MAX + 1]; extern const struct nla_policy ethnl_channels_get_policy[ETHTOOL_A_CHANNELS_HEADER + 1]; extern const struct nla_policy ethnl_channels_set_policy[ETHTOOL_A_CHANNELS_COMBINED_COUNT + 1]; extern const struct nla_policy ethnl_coalesce_get_policy[ETHTOOL_A_COALESCE_HEADER + 1]; diff --git a/net/ethtool/rings.c b/net/ethtool/rings.c index b7865a14fdf8..c7824515857f 100644 --- a/net/ethtool/rings.c +++ b/net/ethtool/rings.c @@ -61,7 +61,9 @@ static int rings_reply_size(const struct ethnl_req_info *req_base, nla_total_size(sizeof(u8)) + /* _RINGS_TX_PUSH */ nla_total_size(sizeof(u8))) + /* _RINGS_RX_PUSH */ nla_total_size(sizeof(u32)) + /* _RINGS_TX_PUSH_BUF_LEN */ - nla_total_size(sizeof(u32)); /* _RINGS_TX_PUSH_BUF_LEN_MAX */ + nla_total_size(sizeof(u32)) + /* _RINGS_TX_PUSH_BUF_LEN_MAX */ + nla_total_size(sizeof(u32)) + /* _RINGS_TCP_DATA_SPLIT_THRESH */ + nla_total_size(sizeof(u32)); /* _RINGS_TCP_DATA_SPLIT_THRESH_MAX */ } static int rings_fill_reply(struct sk_buff *skb, @@ -108,7 +110,13 @@ static int rings_fill_reply(struct sk_buff *skb, (nla_put_u32(skb, ETHTOOL_A_RINGS_TX_PUSH_BUF_LEN_MAX, kr->tx_push_buf_max_len) || nla_put_u32(skb, ETHTOOL_A_RINGS_TX_PUSH_BUF_LEN, - kr->tx_push_buf_len)))) + kr->tx_push_buf_len))) || + (kr->tcp_data_split == ETHTOOL_TCP_DATA_SPLIT_ENABLED && + (nla_put_u32(skb, ETHTOOL_A_RINGS_TCP_DATA_SPLIT_THRESH, + kr->tcp_data_split_thresh))) || + (kr->tcp_data_split == ETHTOOL_TCP_DATA_SPLIT_ENABLED && + (nla_put_u32(skb, ETHTOOL_A_RINGS_TCP_DATA_SPLIT_THRESH_MAX, + kr->tcp_data_split_thresh_max)))) return -EMSGSIZE; return 0; @@ -130,6 +138,7 @@ const struct nla_policy ethnl_rings_set_policy[] = { [ETHTOOL_A_RINGS_TX_PUSH] = NLA_POLICY_MAX(NLA_U8, 1), [ETHTOOL_A_RINGS_RX_PUSH] = NLA_POLICY_MAX(NLA_U8, 1), [ETHTOOL_A_RINGS_TX_PUSH_BUF_LEN] = { .type = NLA_U32 }, + [ETHTOOL_A_RINGS_TCP_DATA_SPLIT_THRESH] = { .type = NLA_U32 }, }; static int @@ -155,6 +164,14 @@ ethnl_set_rings_validate(struct ethnl_req_info *req_info, return -EOPNOTSUPP; } + if (tb[ETHTOOL_A_RINGS_TCP_DATA_SPLIT_THRESH] && + !(ops->supported_ring_params & ETHTOOL_RING_USE_TCP_DATA_SPLIT)) { + NL_SET_ERR_MSG_ATTR(info->extack, + tb[ETHTOOL_A_RINGS_TCP_DATA_SPLIT_THRESH], + "setting tcp-data-split-thresh is not supported"); + return -EOPNOTSUPP; + } + if (tb[ETHTOOL_A_RINGS_CQE_SIZE] && !(ops->supported_ring_params & ETHTOOL_RING_USE_CQE_SIZE)) { NL_SET_ERR_MSG_ATTR(info->extack, @@ -196,9 +213,9 @@ ethnl_set_rings(struct ethnl_req_info *req_info, struct genl_info *info) struct kernel_ethtool_ringparam kernel_ringparam = {}; struct ethtool_ringparam ringparam = {}; struct net_device *dev = req_info->dev; + bool mod = false, thresh_mod = false; struct nlattr **tb = info->attrs; const struct nlattr *err_attr; - bool mod = false; int ret; dev->ethtool_ops->get_ringparam(dev, &ringparam, @@ -222,9 +239,30 @@ ethnl_set_rings(struct ethnl_req_info *req_info, struct genl_info *info) tb[ETHTOOL_A_RINGS_RX_PUSH], &mod); ethnl_update_u32(&kernel_ringparam.tx_push_buf_len, tb[ETHTOOL_A_RINGS_TX_PUSH_BUF_LEN], &mod); - if (!mod) + ethnl_update_u32(&kernel_ringparam.tcp_data_split_thresh, + tb[ETHTOOL_A_RINGS_TCP_DATA_SPLIT_THRESH], + &thresh_mod); + if (!mod && !thresh_mod) return 0; + if (kernel_ringparam.tcp_data_split == ETHTOOL_TCP_DATA_SPLIT_DISABLED && + thresh_mod) { + NL_SET_ERR_MSG_ATTR(info->extack, + tb[ETHTOOL_A_RINGS_TCP_DATA_SPLIT_THRESH], + "tcp-data-split-thresh can not be updated while tcp-data-split is disabled"); + return -EINVAL; + } + + if (kernel_ringparam.tcp_data_split_thresh > + kernel_ringparam.tcp_data_split_thresh_max) { + NL_SET_ERR_MSG_ATTR_FMT(info->extack, + tb[ETHTOOL_A_RINGS_TCP_DATA_SPLIT_THRESH_MAX], + "Requested tcp-data-split-thresh exceeds the maximum of %u", + kernel_ringparam.tcp_data_split_thresh_max); + + return -EINVAL; + } + /* ensure new ring parameters are within limits */ if (ringparam.rx_pending > ringparam.rx_max_pending) err_attr = tb[ETHTOOL_A_RINGS_RX];
The tcp-data-split-thresh option configures the threshold value of the tcp-data-split. If a received packet size is larger than this threshold value, a packet will be split into header and payload. The header indicates TCP header, but it depends on driver spec. The bnxt_en driver supports HDS(Header-Data-Split) configuration at FW level, affecting TCP and UDP too. So, like the tcp-data-split option, If tcp-data-split-thresh is set, it affects UDP and TCP packets. The tcp-data-split-thresh has a dependency, that is tcp-data-split option. This threshold value can be get/set only when tcp-data-split option is enabled. Example: # ethtool -G <interface name> tcp-data-split-thresh <value> # ethtool -G enp14s0f0np0 tcp-data-split on tcp-data-split-thresh 256 # ethtool -g enp14s0f0np0 Ring parameters for enp14s0f0np0: Pre-set maximums: ... TCP data split thresh: 256 Current hardware settings: ... TCP data split: on TCP data split thresh: 256 The tcp-data-split is not enabled, the tcp-data-split-thresh will not be used and can't be configured. # ethtool -G enp14s0f0np0 tcp-data-split off # ethtool -g enp14s0f0np0 Ring parameters for enp14s0f0np0: Pre-set maximums: ... TCP data split thresh: 256 Current hardware settings: ... TCP data split: off TCP data split thresh: n/a The default/min/max values are not defined in the ethtool so the drivers should define themself. The 0 value means that all TCP and UDP packets' header and payload will be split. Users should consider the overhead due to this feature. Signed-off-by: Taehee Yoo <ap420073@gmail.com> --- v3: - Fix documentation and ynl - Update error messages - Validate configuration of tcp-data-split and tcp-data-split-thresh v2: - Patch added. Documentation/netlink/specs/ethtool.yaml | 8 +++ Documentation/networking/ethtool-netlink.rst | 75 ++++++++++++-------- include/linux/ethtool.h | 4 ++ include/uapi/linux/ethtool_netlink.h | 2 + net/ethtool/netlink.h | 2 +- net/ethtool/rings.c | 46 ++++++++++-- 6 files changed, 102 insertions(+), 35 deletions(-)