diff mbox series

[net-next,v2,2/2] virtio-net: support dim profile fine-tuning

Message ID 1711531146-91920-3-git-send-email-hengqi@linux.alibaba.com (mailing list archive)
State Superseded
Delegated to: Netdev Maintainers
Headers show
Series ethtool: provide the dim profile fine-tuning channel | expand

Checks

Context Check Description
netdev/series_format success Posting correctly formatted
netdev/tree_selection success Clearly marked for net-next
netdev/ynl success Generated files up to date; no warnings/errors; no diff in generated;
netdev/fixes_present success Fixes tag not required for -next series
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 944 this patch: 944
netdev/build_tools success No tools touched, skip
netdev/cc_maintainers warning 1 maintainers not CCed: virtualization@lists.linux.dev
netdev/build_clang success Errors and warnings before: 955 this patch: 955
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success No Fixes tag
netdev/build_allmodconfig_warn success Errors and warnings before: 955 this patch: 955
netdev/checkpatch success total: 0 errors, 0 warnings, 0 checks, 90 lines checked
netdev/build_clang_rust success No Rust files in patch. Skipping build
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0
netdev/contest success net-next-2024-03-27--15-00 (tests: 952)

Commit Message

Heng Qi March 27, 2024, 9:19 a.m. UTC
Virtio-net has different types of back-end device
implementations. In order to effectively optimize
the dim library's gains for different device
implementations, let's use the new interface params
to fine-tune the profile list.

Signed-off-by: Heng Qi <hengqi@linux.alibaba.com>
---
 drivers/net/virtio_net.c | 54 ++++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 52 insertions(+), 2 deletions(-)

Comments

Alexander Lobakin March 27, 2024, 2:45 p.m. UTC | #1
From: Heng Qi <hengqi@linux.alibaba.com>
Date: Wed, 27 Mar 2024 17:19:06 +0800

> Virtio-net has different types of back-end device
> implementations. In order to effectively optimize
> the dim library's gains for different device
> implementations, let's use the new interface params
> to fine-tune the profile list.

Nice idea, but

> 
> Signed-off-by: Heng Qi <hengqi@linux.alibaba.com>
> ---
>  drivers/net/virtio_net.c | 54 ++++++++++++++++++++++++++++++++++++++++++++++--
>  1 file changed, 52 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index e709d44..9b6c727 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -57,6 +57,16 @@
>  
>  #define VIRTNET_DRIVER_VERSION "1.0.0"
>  
> +/* This is copied from NET_DIM_RX_EQE_PROFILES in DIM library */
> +#define VIRTNET_DIM_RX_PKTS 256
> +static struct dim_cq_moder rx_eqe_conf[] = {
> +	{.usec = 1,   .pkts = VIRTNET_DIM_RX_PKTS,},
> +	{.usec = 8,   .pkts = VIRTNET_DIM_RX_PKTS,},
> +	{.usec = 64,  .pkts = VIRTNET_DIM_RX_PKTS,},
> +	{.usec = 128, .pkts = VIRTNET_DIM_RX_PKTS,},
> +	{.usec = 256, .pkts = VIRTNET_DIM_RX_PKTS,}
> +};

This is wrong.
This way you will have one global table for ALL the virtio devices in
the system, while Ethtool performs configuration on a per-netdevice basis.
What you need is to have 1 dim_cq_moder per each virtio netdevice,
embedded somewhere into its netdev_priv(). Then
virtio_dim_{rx,tx}_work() will take profiles from there, not the global
struct. The global struct can stay here as const to initialize default
per-netdevice params.

> +
>  static const unsigned long guest_offloads[] = {
>  	VIRTIO_NET_F_GUEST_TSO4,
>  	VIRTIO_NET_F_GUEST_TSO6,

Thanks,
Olek
Jakub Kicinski March 28, 2024, 12:32 a.m. UTC | #2
On Wed, 27 Mar 2024 15:45:50 +0100 Alexander Lobakin wrote:
> > +/* This is copied from NET_DIM_RX_EQE_PROFILES in DIM library */
> > +#define VIRTNET_DIM_RX_PKTS 256
> > +static struct dim_cq_moder rx_eqe_conf[] = {
> > +	{.usec = 1,   .pkts = VIRTNET_DIM_RX_PKTS,},
> > +	{.usec = 8,   .pkts = VIRTNET_DIM_RX_PKTS,},
> > +	{.usec = 64,  .pkts = VIRTNET_DIM_RX_PKTS,},
> > +	{.usec = 128, .pkts = VIRTNET_DIM_RX_PKTS,},
> > +	{.usec = 256, .pkts = VIRTNET_DIM_RX_PKTS,}
> > +};  
> 
> This is wrong.
> This way you will have one global table for ALL the virtio devices in
> the system, while Ethtool performs configuration on a per-netdevice basis.
> What you need is to have 1 dim_cq_moder per each virtio netdevice,
> embedded somewhere into its netdev_priv(). Then
> virtio_dim_{rx,tx}_work() will take profiles from there, not the global
> struct. The global struct can stay here as const to initialize default
> per-netdevice params.

I've been wondering lately if adaptive IRQ moderation isn't exactly
the kind of heuristic we would be best off deferring to BPF.
I have done 0 experiments -- are the thresholds enough
or do more interesting algos come to mind for anyone?
Heng Qi March 28, 2024, 2:02 a.m. UTC | #3
在 2024/3/27 下午10:45, Alexander Lobakin 写道:
> From: Heng Qi <hengqi@linux.alibaba.com>
> Date: Wed, 27 Mar 2024 17:19:06 +0800
>
>> Virtio-net has different types of back-end device
>> implementations. In order to effectively optimize
>> the dim library's gains for different device
>> implementations, let's use the new interface params
>> to fine-tune the profile list.
> Nice idea, but
>
>> Signed-off-by: Heng Qi <hengqi@linux.alibaba.com>
>> ---
>>   drivers/net/virtio_net.c | 54 ++++++++++++++++++++++++++++++++++++++++++++++--
>>   1 file changed, 52 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
>> index e709d44..9b6c727 100644
>> --- a/drivers/net/virtio_net.c
>> +++ b/drivers/net/virtio_net.c
>> @@ -57,6 +57,16 @@
>>   
>>   #define VIRTNET_DRIVER_VERSION "1.0.0"
>>   
>> +/* This is copied from NET_DIM_RX_EQE_PROFILES in DIM library */
>> +#define VIRTNET_DIM_RX_PKTS 256
>> +static struct dim_cq_moder rx_eqe_conf[] = {
>> +	{.usec = 1,   .pkts = VIRTNET_DIM_RX_PKTS,},
>> +	{.usec = 8,   .pkts = VIRTNET_DIM_RX_PKTS,},
>> +	{.usec = 64,  .pkts = VIRTNET_DIM_RX_PKTS,},
>> +	{.usec = 128, .pkts = VIRTNET_DIM_RX_PKTS,},
>> +	{.usec = 256, .pkts = VIRTNET_DIM_RX_PKTS,}
>> +};
> This is wrong.
> This way you will have one global table for ALL the virtio devices in
> the system, while Ethtool performs configuration on a per-netdevice basis.
> What you need is to have 1 dim_cq_moder per each virtio netdevice,
> embedded somewhere into its netdev_priv(). Then
> virtio_dim_{rx,tx}_work() will take profiles from there, not the global
> struct. The global struct can stay here as const to initialize default
> per-netdevice params.

You are right. Good catch!

Thanks,
Heng

>> +
>>   static const unsigned long guest_offloads[] = {
>>   	VIRTIO_NET_F_GUEST_TSO4,
>>   	VIRTIO_NET_F_GUEST_TSO6,
> Thanks,
> Olek
Xuan Zhuo March 28, 2024, 2:12 a.m. UTC | #4
On Wed, 27 Mar 2024 17:32:58 -0700, Jakub Kicinski <kuba@kernel.org> wrote:
> On Wed, 27 Mar 2024 15:45:50 +0100 Alexander Lobakin wrote:
> > > +/* This is copied from NET_DIM_RX_EQE_PROFILES in DIM library */
> > > +#define VIRTNET_DIM_RX_PKTS 256
> > > +static struct dim_cq_moder rx_eqe_conf[] = {
> > > +	{.usec = 1,   .pkts = VIRTNET_DIM_RX_PKTS,},
> > > +	{.usec = 8,   .pkts = VIRTNET_DIM_RX_PKTS,},
> > > +	{.usec = 64,  .pkts = VIRTNET_DIM_RX_PKTS,},
> > > +	{.usec = 128, .pkts = VIRTNET_DIM_RX_PKTS,},
> > > +	{.usec = 256, .pkts = VIRTNET_DIM_RX_PKTS,}
> > > +};
> >
> > This is wrong.
> > This way you will have one global table for ALL the virtio devices in
> > the system, while Ethtool performs configuration on a per-netdevice basis.
> > What you need is to have 1 dim_cq_moder per each virtio netdevice,
> > embedded somewhere into its netdev_priv(). Then
> > virtio_dim_{rx,tx}_work() will take profiles from there, not the global
> > struct. The global struct can stay here as const to initialize default
> > per-netdevice params.
>
> I've been wondering lately if adaptive IRQ moderation isn't exactly
> the kind of heuristic we would be best off deferring to BPF.
> I have done 0 experiments -- are the thresholds enough
> or do more interesting algos come to mind for anyone?


Yes, we are working on this as well.

For netdim, I think profiles are an aspect. In many cases, this can solve many
problems.

It is a good idea to introduce bpf, but what bpf should do is some logical
situations that are not universal. In our practice, modifying profiles can solve
most situations. We also have a small number of scenarios that cannot be handled,
and we hope to use ebpf to solve them.

Thanks.
Heng Qi March 28, 2024, 2:26 a.m. UTC | #5
在 2024/3/28 上午8:32, Jakub Kicinski 写道:
> On Wed, 27 Mar 2024 15:45:50 +0100 Alexander Lobakin wrote:
>>> +/* This is copied from NET_DIM_RX_EQE_PROFILES in DIM library */
>>> +#define VIRTNET_DIM_RX_PKTS 256
>>> +static struct dim_cq_moder rx_eqe_conf[] = {
>>> +	{.usec = 1,   .pkts = VIRTNET_DIM_RX_PKTS,},
>>> +	{.usec = 8,   .pkts = VIRTNET_DIM_RX_PKTS,},
>>> +	{.usec = 64,  .pkts = VIRTNET_DIM_RX_PKTS,},
>>> +	{.usec = 128, .pkts = VIRTNET_DIM_RX_PKTS,},
>>> +	{.usec = 256, .pkts = VIRTNET_DIM_RX_PKTS,}
>>> +};
>> This is wrong.
>> This way you will have one global table for ALL the virtio devices in
>> the system, while Ethtool performs configuration on a per-netdevice basis.
>> What you need is to have 1 dim_cq_moder per each virtio netdevice,
>> embedded somewhere into its netdev_priv(). Then
>> virtio_dim_{rx,tx}_work() will take profiles from there, not the global
>> struct. The global struct can stay here as const to initialize default
>> per-netdevice params.
> I've been wondering lately if adaptive IRQ moderation isn't exactly
> the kind of heuristic we would be best off deferring to BPF.
> I have done 0 experiments -- are the thresholds enough
> or do more interesting algos come to mind for anyone?

Hi Jakub.

I totally agree with your idea. In order to get the best practices for 
virtio DIM on our DPUs,
I actually spent a lot of energy on tuning, similar to the current 
custom profile list and virtio
ctrlq asynchronousization. There are some other tuning methods that may 
not be applicable to
other manufacturers, so they are cannot be released for the time being.

But anyway, adding a dim_bpf interface similar to native XDP (located at 
the beginning
of net_dim()) I think would be good, if you and BPF maintainers (Cc'd) 
are willing for a new bpf type
to be introduced. dim_bpf allows devices that want to implement best 
practices for adaptive
IRQ moderation to implement custom logic.

Also, I think the existing patches are still necessary, and this 
provides the simplest and most
direct way to tune.

Thanks!
Michael S. Tsirkin March 28, 2024, 7:27 a.m. UTC | #6
On Wed, Mar 27, 2024 at 05:19:06PM +0800, Heng Qi wrote:
> Virtio-net has different types of back-end device
> implementations. In order to effectively optimize
> the dim library's gains for different device
> implementations, let's use the new interface params
> to fine-tune the profile list.
> 
> Signed-off-by: Heng Qi <hengqi@linux.alibaba.com>
> ---
>  drivers/net/virtio_net.c | 54 ++++++++++++++++++++++++++++++++++++++++++++++--
>  1 file changed, 52 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index e709d44..9b6c727 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -57,6 +57,16 @@
>  
>  #define VIRTNET_DRIVER_VERSION "1.0.0"
>  
> +/* This is copied from NET_DIM_RX_EQE_PROFILES in DIM library */

So maybe move it to a header and reuse?


> +#define VIRTNET_DIM_RX_PKTS 256
> +static struct dim_cq_moder rx_eqe_conf[] = {
> +	{.usec = 1,   .pkts = VIRTNET_DIM_RX_PKTS,},
> +	{.usec = 8,   .pkts = VIRTNET_DIM_RX_PKTS,},
> +	{.usec = 64,  .pkts = VIRTNET_DIM_RX_PKTS,},
> +	{.usec = 128, .pkts = VIRTNET_DIM_RX_PKTS,},
> +	{.usec = 256, .pkts = VIRTNET_DIM_RX_PKTS,}
> +};
> +
>  static const unsigned long guest_offloads[] = {
>  	VIRTIO_NET_F_GUEST_TSO4,
>  	VIRTIO_NET_F_GUEST_TSO6,
> @@ -3584,7 +3594,10 @@ static void virtnet_rx_dim_work(struct work_struct *work)
>  		if (!rq->dim_enabled)
>  			continue;
>  
> -		update_moder = net_dim_get_rx_moderation(dim->mode, dim->profile_ix);
> +		if (dim->profile_ix >= ARRAY_SIZE(rx_eqe_conf))
> +			dim->profile_ix = ARRAY_SIZE(rx_eqe_conf) - 1;
> +
> +		update_moder = rx_eqe_conf[dim->profile_ix];
>  		if (update_moder.usec != rq->intr_coal.max_usecs ||
>  		    update_moder.pkts != rq->intr_coal.max_packets) {
>  			err = virtnet_send_rx_ctrl_coal_vq_cmd(vi, qnum,
> @@ -3627,6 +3640,34 @@ static int virtnet_should_update_vq_weight(int dev_flags, int weight,
>  	return 0;
>  }
>  
> +static int virtnet_update_profile(struct virtnet_info *vi,
> +				  struct kernel_ethtool_coalesce *kc)
> +{
> +	int i;
> +
> +	if (virtio_has_feature(vi->vdev, VIRTIO_NET_F_VQ_NOTF_COAL)) {
> +		for (i = 0; i < NET_DIM_PARAMS_NUM_PROFILES; i++)
> +			if (kc->rx_eqe_profs[i].comps)
> +				return -EINVAL;
> +	} else {
> +		for (i = 0; i < NET_DIM_PARAMS_NUM_PROFILES; i++) {
> +			if (kc->rx_eqe_profs[i].usec != rx_eqe_conf[i].usec ||
> +			    kc->rx_eqe_profs[i].pkts != rx_eqe_conf[i].pkts ||
> +			    kc->rx_eqe_profs[i].comps)
> +				return -EINVAL;
> +		}
> +
> +		return 0;
> +	}
> +
> +	for (i = 0; i < NET_DIM_PARAMS_NUM_PROFILES; i++) {
> +		rx_eqe_conf[i].usec = kc->rx_eqe_profs[i].usec;
> +		rx_eqe_conf[i].pkts = kc->rx_eqe_profs[i].pkts;
> +	}
> +
> +	return 0;
> +}
> +
>  static int virtnet_set_coalesce(struct net_device *dev,
>  				struct ethtool_coalesce *ec,
>  				struct kernel_ethtool_coalesce *kernel_coal,
> @@ -3653,6 +3694,10 @@ static int virtnet_set_coalesce(struct net_device *dev,
>  		}
>  	}
>  
> +	ret = virtnet_update_profile(vi, kernel_coal);
> +	if (ret)
> +		return ret;
> +
>  	if (virtio_has_feature(vi->vdev, VIRTIO_NET_F_NOTF_COAL))
>  		ret = virtnet_send_notf_coal_cmds(vi, ec);
>  	else
> @@ -3689,6 +3734,10 @@ static int virtnet_get_coalesce(struct net_device *dev,
>  			ec->tx_max_coalesced_frames = 1;
>  	}
>  
> +	if (virtio_has_feature(vi->vdev, VIRTIO_NET_F_VQ_NOTF_COAL))
> +		memcpy(kernel_coal->rx_eqe_profs, rx_eqe_conf,
> +		       sizeof(rx_eqe_conf));
> +
>  	return 0;
>  }
>  
> @@ -3868,7 +3917,8 @@ static int virtnet_set_rxnfc(struct net_device *dev, struct ethtool_rxnfc *info)
>  
>  static const struct ethtool_ops virtnet_ethtool_ops = {
>  	.supported_coalesce_params = ETHTOOL_COALESCE_MAX_FRAMES |
> -		ETHTOOL_COALESCE_USECS | ETHTOOL_COALESCE_USE_ADAPTIVE_RX,
> +		ETHTOOL_COALESCE_USECS | ETHTOOL_COALESCE_USE_ADAPTIVE_RX |
> +		ETHTOOL_COALESCE_RX_EQE_PROFILE,
>  	.get_drvinfo = virtnet_get_drvinfo,
>  	.get_link = ethtool_op_get_link,
>  	.get_ringparam = virtnet_get_ringparam,
> -- 
> 1.8.3.1
Michael S. Tsirkin March 28, 2024, 7:27 a.m. UTC | #7
On Wed, Mar 27, 2024 at 03:45:50PM +0100, Alexander Lobakin wrote:
> From: Heng Qi <hengqi@linux.alibaba.com>
> Date: Wed, 27 Mar 2024 17:19:06 +0800
> 
> > Virtio-net has different types of back-end device
> > implementations. In order to effectively optimize
> > the dim library's gains for different device
> > implementations, let's use the new interface params
> > to fine-tune the profile list.
> 
> Nice idea, but
> 
> > 
> > Signed-off-by: Heng Qi <hengqi@linux.alibaba.com>
> > ---
> >  drivers/net/virtio_net.c | 54 ++++++++++++++++++++++++++++++++++++++++++++++--
> >  1 file changed, 52 insertions(+), 2 deletions(-)
> > 
> > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > index e709d44..9b6c727 100644
> > --- a/drivers/net/virtio_net.c
> > +++ b/drivers/net/virtio_net.c
> > @@ -57,6 +57,16 @@
> >  
> >  #define VIRTNET_DRIVER_VERSION "1.0.0"
> >  
> > +/* This is copied from NET_DIM_RX_EQE_PROFILES in DIM library */
> > +#define VIRTNET_DIM_RX_PKTS 256
> > +static struct dim_cq_moder rx_eqe_conf[] = {
> > +	{.usec = 1,   .pkts = VIRTNET_DIM_RX_PKTS,},
> > +	{.usec = 8,   .pkts = VIRTNET_DIM_RX_PKTS,},
> > +	{.usec = 64,  .pkts = VIRTNET_DIM_RX_PKTS,},
> > +	{.usec = 128, .pkts = VIRTNET_DIM_RX_PKTS,},
> > +	{.usec = 256, .pkts = VIRTNET_DIM_RX_PKTS,}
> > +};
> 
> This is wrong.

Yes I'd expect anything global to be const.

> This way you will have one global table for ALL the virtio devices in
> the system, while Ethtool performs configuration on a per-netdevice basis.
> What you need is to have 1 dim_cq_moder per each virtio netdevice,
> embedded somewhere into its netdev_priv(). Then
> virtio_dim_{rx,tx}_work() will take profiles from there, not the global
> struct. The global struct can stay here as const to initialize default
> per-netdevice params.
> 
> > +
> >  static const unsigned long guest_offloads[] = {
> >  	VIRTIO_NET_F_GUEST_TSO4,
> >  	VIRTIO_NET_F_GUEST_TSO6,
> 
> Thanks,
> Olek
Heng Qi March 28, 2024, 7:51 a.m. UTC | #8
在 2024/3/28 下午3:27, Michael S. Tsirkin 写道:
> On Wed, Mar 27, 2024 at 05:19:06PM +0800, Heng Qi wrote:
>> Virtio-net has different types of back-end device
>> implementations. In order to effectively optimize
>> the dim library's gains for different device
>> implementations, let's use the new interface params
>> to fine-tune the profile list.
>>
>> Signed-off-by: Heng Qi <hengqi@linux.alibaba.com>
>> ---
>>   drivers/net/virtio_net.c | 54 ++++++++++++++++++++++++++++++++++++++++++++++--
>>   1 file changed, 52 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
>> index e709d44..9b6c727 100644
>> --- a/drivers/net/virtio_net.c
>> +++ b/drivers/net/virtio_net.c
>> @@ -57,6 +57,16 @@
>>   
>>   #define VIRTNET_DRIVER_VERSION "1.0.0"
>>   
>> +/* This is copied from NET_DIM_RX_EQE_PROFILES in DIM library */
> So maybe move it to a header and reuse?

Will do. And plan to put configurable parameters into 'vi'.

Thanks.

>
>
>> +#define VIRTNET_DIM_RX_PKTS 256
>> +static struct dim_cq_moder rx_eqe_conf[] = {
>> +	{.usec = 1,   .pkts = VIRTNET_DIM_RX_PKTS,},
>> +	{.usec = 8,   .pkts = VIRTNET_DIM_RX_PKTS,},
>> +	{.usec = 64,  .pkts = VIRTNET_DIM_RX_PKTS,},
>> +	{.usec = 128, .pkts = VIRTNET_DIM_RX_PKTS,},
>> +	{.usec = 256, .pkts = VIRTNET_DIM_RX_PKTS,}
>> +};
>> +
>>   static const unsigned long guest_offloads[] = {
>>   	VIRTIO_NET_F_GUEST_TSO4,
>>   	VIRTIO_NET_F_GUEST_TSO6,
>> @@ -3584,7 +3594,10 @@ static void virtnet_rx_dim_work(struct work_struct *work)
>>   		if (!rq->dim_enabled)
>>   			continue;
>>   
>> -		update_moder = net_dim_get_rx_moderation(dim->mode, dim->profile_ix);
>> +		if (dim->profile_ix >= ARRAY_SIZE(rx_eqe_conf))
>> +			dim->profile_ix = ARRAY_SIZE(rx_eqe_conf) - 1;
>> +
>> +		update_moder = rx_eqe_conf[dim->profile_ix];
>>   		if (update_moder.usec != rq->intr_coal.max_usecs ||
>>   		    update_moder.pkts != rq->intr_coal.max_packets) {
>>   			err = virtnet_send_rx_ctrl_coal_vq_cmd(vi, qnum,
>> @@ -3627,6 +3640,34 @@ static int virtnet_should_update_vq_weight(int dev_flags, int weight,
>>   	return 0;
>>   }
>>   
>> +static int virtnet_update_profile(struct virtnet_info *vi,
>> +				  struct kernel_ethtool_coalesce *kc)
>> +{
>> +	int i;
>> +
>> +	if (virtio_has_feature(vi->vdev, VIRTIO_NET_F_VQ_NOTF_COAL)) {
>> +		for (i = 0; i < NET_DIM_PARAMS_NUM_PROFILES; i++)
>> +			if (kc->rx_eqe_profs[i].comps)
>> +				return -EINVAL;
>> +	} else {
>> +		for (i = 0; i < NET_DIM_PARAMS_NUM_PROFILES; i++) {
>> +			if (kc->rx_eqe_profs[i].usec != rx_eqe_conf[i].usec ||
>> +			    kc->rx_eqe_profs[i].pkts != rx_eqe_conf[i].pkts ||
>> +			    kc->rx_eqe_profs[i].comps)
>> +				return -EINVAL;
>> +		}
>> +
>> +		return 0;
>> +	}
>> +
>> +	for (i = 0; i < NET_DIM_PARAMS_NUM_PROFILES; i++) {
>> +		rx_eqe_conf[i].usec = kc->rx_eqe_profs[i].usec;
>> +		rx_eqe_conf[i].pkts = kc->rx_eqe_profs[i].pkts;
>> +	}
>> +
>> +	return 0;
>> +}
>> +
>>   static int virtnet_set_coalesce(struct net_device *dev,
>>   				struct ethtool_coalesce *ec,
>>   				struct kernel_ethtool_coalesce *kernel_coal,
>> @@ -3653,6 +3694,10 @@ static int virtnet_set_coalesce(struct net_device *dev,
>>   		}
>>   	}
>>   
>> +	ret = virtnet_update_profile(vi, kernel_coal);
>> +	if (ret)
>> +		return ret;
>> +
>>   	if (virtio_has_feature(vi->vdev, VIRTIO_NET_F_NOTF_COAL))
>>   		ret = virtnet_send_notf_coal_cmds(vi, ec);
>>   	else
>> @@ -3689,6 +3734,10 @@ static int virtnet_get_coalesce(struct net_device *dev,
>>   			ec->tx_max_coalesced_frames = 1;
>>   	}
>>   
>> +	if (virtio_has_feature(vi->vdev, VIRTIO_NET_F_VQ_NOTF_COAL))
>> +		memcpy(kernel_coal->rx_eqe_profs, rx_eqe_conf,
>> +		       sizeof(rx_eqe_conf));
>> +
>>   	return 0;
>>   }
>>   
>> @@ -3868,7 +3917,8 @@ static int virtnet_set_rxnfc(struct net_device *dev, struct ethtool_rxnfc *info)
>>   
>>   static const struct ethtool_ops virtnet_ethtool_ops = {
>>   	.supported_coalesce_params = ETHTOOL_COALESCE_MAX_FRAMES |
>> -		ETHTOOL_COALESCE_USECS | ETHTOOL_COALESCE_USE_ADAPTIVE_RX,
>> +		ETHTOOL_COALESCE_USECS | ETHTOOL_COALESCE_USE_ADAPTIVE_RX |
>> +		ETHTOOL_COALESCE_RX_EQE_PROFILE,
>>   	.get_drvinfo = virtnet_get_drvinfo,
>>   	.get_link = ethtool_op_get_link,
>>   	.get_ringparam = virtnet_get_ringparam,
>> -- 
>> 1.8.3.1
Jakub Kicinski March 28, 2024, 4:48 p.m. UTC | #9
On Thu, 28 Mar 2024 10:12:10 +0800 Xuan Zhuo wrote:
> For netdim, I think profiles are an aspect. In many cases, this can solve many
> problems.

Okay, but then you should try harder to hide all the config in the core.
The driver should be blissfully unaware that the user is changing 
the settings. It should just continue calling net_dim_get_*moderation().

You can create proper dim_init(), dim_destroy() functions for drivers
to call, instead of doing

	INIT_WORK(&bla->dim.work, my_driver_do_dim_work);

directly. In dim_init() you can hook the dim structure to net_device
and then ethtool code can operation on it without driver involvement.

About the uAPI - please make sure you add the new stuff to
Documentation/netlink/specs/ethtool.yaml
see: https://docs.kernel.org/next/userspace-api/netlink/specs.html

And break up the attributes, please, no raw C structs of this nature:

+	return nla_put(skb, attr_type, sizeof(struct dim_cq_moder) *
+		       NET_DIM_PARAMS_NUM_PROFILES, profs);

They are hard to extend.
Heng Qi March 29, 2024, 8:56 a.m. UTC | #10
在 2024/3/29 上午12:48, Jakub Kicinski 写道:
> On Thu, 28 Mar 2024 10:12:10 +0800 Xuan Zhuo wrote:
>> For netdim, I think profiles are an aspect. In many cases, this can solve many
>> problems.
> Okay, but then you should try harder to hide all the config in the core.
> The driver should be blissfully unaware that the user is changing
> the settings. It should just continue calling net_dim_get_*moderation().
>
> You can create proper dim_init(), dim_destroy() functions for drivers
> to call, instead of doing
>
> 	INIT_WORK(&bla->dim.work, my_driver_do_dim_work);
>
> directly. In dim_init() you can hook the dim structure to net_device
> and then ethtool code can operation on it without driver involvement.

Ok. Will try this.

>
> About the uAPI - please make sure you add the new stuff to
> Documentation/netlink/specs/ethtool.yaml
> see: https://docs.kernel.org/next/userspace-api/netlink/specs.html
>
> And break up the attributes, please, no raw C structs of this nature:
>
> +	return nla_put(skb, attr_type, sizeof(struct dim_cq_moder) *
> +		       NET_DIM_PARAMS_NUM_PROFILES, profs);
>
> They are hard to extend.

Sorry, I don't seem to get your point, why does this make extending hard?

Are you referring to specifying ETHTOOL_A_COALESCE_RX_EQE_PROFILE
as a nested array, i.e. having each element explicitly have an attr 
name? or passing the
u16 pointer and length as arguments?

Thanks.
Jakub Kicinski March 29, 2024, 3:18 p.m. UTC | #11
On Fri, 29 Mar 2024 16:56:12 +0800 Heng Qi wrote:
> > About the uAPI - please make sure you add the new stuff to
> > Documentation/netlink/specs/ethtool.yaml
> > see: https://docs.kernel.org/next/userspace-api/netlink/specs.html
> >
> > And break up the attributes, please, no raw C structs of this nature:
> >
> > +	return nla_put(skb, attr_type, sizeof(struct dim_cq_moder) *
> > +		       NET_DIM_PARAMS_NUM_PROFILES, profs);
> >
> > They are hard to extend.  
> 
> Sorry, I don't seem to get your point, why does this make extending hard?

It's not possible to make some fields optional later on.
It's also possible that user space will make assumptions about the size
of this struct so we won't be able to add fields.

So it's preferred to render the C struct members as individual netlink 
attributes. Look around ethtool netlink, you'll see there are no
structs dumped.

> Are you referring to specifying ETHTOOL_A_COALESCE_RX_EQE_PROFILE
> as a nested array, i.e. having each element explicitly have an attr 
> name? or passing the u16 pointer and length as arguments?
diff mbox series

Patch

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index e709d44..9b6c727 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -57,6 +57,16 @@ 
 
 #define VIRTNET_DRIVER_VERSION "1.0.0"
 
+/* This is copied from NET_DIM_RX_EQE_PROFILES in DIM library */
+#define VIRTNET_DIM_RX_PKTS 256
+static struct dim_cq_moder rx_eqe_conf[] = {
+	{.usec = 1,   .pkts = VIRTNET_DIM_RX_PKTS,},
+	{.usec = 8,   .pkts = VIRTNET_DIM_RX_PKTS,},
+	{.usec = 64,  .pkts = VIRTNET_DIM_RX_PKTS,},
+	{.usec = 128, .pkts = VIRTNET_DIM_RX_PKTS,},
+	{.usec = 256, .pkts = VIRTNET_DIM_RX_PKTS,}
+};
+
 static const unsigned long guest_offloads[] = {
 	VIRTIO_NET_F_GUEST_TSO4,
 	VIRTIO_NET_F_GUEST_TSO6,
@@ -3584,7 +3594,10 @@  static void virtnet_rx_dim_work(struct work_struct *work)
 		if (!rq->dim_enabled)
 			continue;
 
-		update_moder = net_dim_get_rx_moderation(dim->mode, dim->profile_ix);
+		if (dim->profile_ix >= ARRAY_SIZE(rx_eqe_conf))
+			dim->profile_ix = ARRAY_SIZE(rx_eqe_conf) - 1;
+
+		update_moder = rx_eqe_conf[dim->profile_ix];
 		if (update_moder.usec != rq->intr_coal.max_usecs ||
 		    update_moder.pkts != rq->intr_coal.max_packets) {
 			err = virtnet_send_rx_ctrl_coal_vq_cmd(vi, qnum,
@@ -3627,6 +3640,34 @@  static int virtnet_should_update_vq_weight(int dev_flags, int weight,
 	return 0;
 }
 
+static int virtnet_update_profile(struct virtnet_info *vi,
+				  struct kernel_ethtool_coalesce *kc)
+{
+	int i;
+
+	if (virtio_has_feature(vi->vdev, VIRTIO_NET_F_VQ_NOTF_COAL)) {
+		for (i = 0; i < NET_DIM_PARAMS_NUM_PROFILES; i++)
+			if (kc->rx_eqe_profs[i].comps)
+				return -EINVAL;
+	} else {
+		for (i = 0; i < NET_DIM_PARAMS_NUM_PROFILES; i++) {
+			if (kc->rx_eqe_profs[i].usec != rx_eqe_conf[i].usec ||
+			    kc->rx_eqe_profs[i].pkts != rx_eqe_conf[i].pkts ||
+			    kc->rx_eqe_profs[i].comps)
+				return -EINVAL;
+		}
+
+		return 0;
+	}
+
+	for (i = 0; i < NET_DIM_PARAMS_NUM_PROFILES; i++) {
+		rx_eqe_conf[i].usec = kc->rx_eqe_profs[i].usec;
+		rx_eqe_conf[i].pkts = kc->rx_eqe_profs[i].pkts;
+	}
+
+	return 0;
+}
+
 static int virtnet_set_coalesce(struct net_device *dev,
 				struct ethtool_coalesce *ec,
 				struct kernel_ethtool_coalesce *kernel_coal,
@@ -3653,6 +3694,10 @@  static int virtnet_set_coalesce(struct net_device *dev,
 		}
 	}
 
+	ret = virtnet_update_profile(vi, kernel_coal);
+	if (ret)
+		return ret;
+
 	if (virtio_has_feature(vi->vdev, VIRTIO_NET_F_NOTF_COAL))
 		ret = virtnet_send_notf_coal_cmds(vi, ec);
 	else
@@ -3689,6 +3734,10 @@  static int virtnet_get_coalesce(struct net_device *dev,
 			ec->tx_max_coalesced_frames = 1;
 	}
 
+	if (virtio_has_feature(vi->vdev, VIRTIO_NET_F_VQ_NOTF_COAL))
+		memcpy(kernel_coal->rx_eqe_profs, rx_eqe_conf,
+		       sizeof(rx_eqe_conf));
+
 	return 0;
 }
 
@@ -3868,7 +3917,8 @@  static int virtnet_set_rxnfc(struct net_device *dev, struct ethtool_rxnfc *info)
 
 static const struct ethtool_ops virtnet_ethtool_ops = {
 	.supported_coalesce_params = ETHTOOL_COALESCE_MAX_FRAMES |
-		ETHTOOL_COALESCE_USECS | ETHTOOL_COALESCE_USE_ADAPTIVE_RX,
+		ETHTOOL_COALESCE_USECS | ETHTOOL_COALESCE_USE_ADAPTIVE_RX |
+		ETHTOOL_COALESCE_RX_EQE_PROFILE,
 	.get_drvinfo = virtnet_get_drvinfo,
 	.get_link = ethtool_op_get_link,
 	.get_ringparam = virtnet_get_ringparam,