mbox series

[RFC,net,0/3] virtio-net: allow usage of small vrings

Message ID 20230430131518.2708471-1-alvaro.karsz@solid-run.com (mailing list archive)
Headers show
Series virtio-net: allow usage of small vrings | expand

Message

Alvaro Karsz April 30, 2023, 1:15 p.m. UTC
At the moment, if a virtio network device uses vrings with less than
MAX_SKB_FRAGS + 2 entries, the device won't be functional.

The following condition vq->num_free >= 2 + MAX_SKB_FRAGS will always
evaluate to false, leading to TX timeouts.

This patchset attempts this fix this bug, and to allow small rings down
to 4 entries.

The first patch introduces a new mechanism in virtio core - it allows to
block features in probe time.

If a virtio drivers blocks features and fails probe, virtio core will
reset the device, re-negotiate the features and probe again.

This is needed since some virtio net features are not supported with
small rings.

This patchset follows a discussion in the mailing list [1].

This fixes only part of the bug, rings with less than 4 entries won't
work.
My intention is to split the effort and fix the RING_SIZE < 4 case in a
follow up patchset.

Maybe we should fail probe if RING_SIZE < 4 until the follow up patchset?

I tested the patchset with SNET DPU (drivers/vdpa/solidrun), with packed
and split VQs, with rings down to 4 entries, with and without
VIRTIO_NET_F_MRG_RXBUF, with big MTUs.

I would appreciate more testing.
Xuan: I wasn't able to test XDP with my setup, maybe you can help with
that?

[1] https://lore.kernel.org/lkml/20230416074607.292616-1-alvaro.karsz@solid-run.com/

Alvaro Karsz (3):
  virtio: re-negotiate features if probe fails and features are blocked
  virtio-net: allow usage of vrings smaller than MAX_SKB_FRAGS + 2
  virtio-net: block ethtool from converting a ring to a small ring

 drivers/net/virtio_net.c | 161 +++++++++++++++++++++++++++++++++++++--
 drivers/virtio/virtio.c  |  73 +++++++++++++-----
 include/linux/virtio.h   |   3 +
 3 files changed, 212 insertions(+), 25 deletions(-)

Comments

Michael S. Tsirkin April 30, 2023, 2:06 p.m. UTC | #1
On Sun, Apr 30, 2023 at 04:15:15PM +0300, Alvaro Karsz wrote:
> At the moment, if a virtio network device uses vrings with less than
> MAX_SKB_FRAGS + 2 entries, the device won't be functional.
> 
> The following condition vq->num_free >= 2 + MAX_SKB_FRAGS will always
> evaluate to false, leading to TX timeouts.
> 
> This patchset attempts this fix this bug, and to allow small rings down
> to 4 entries.
> The first patch introduces a new mechanism in virtio core - it allows to
> block features in probe time.
> 
> If a virtio drivers blocks features and fails probe, virtio core will
> reset the device, re-negotiate the features and probe again.
> 
> This is needed since some virtio net features are not supported with
> small rings.
> 
> This patchset follows a discussion in the mailing list [1].
> 
> This fixes only part of the bug, rings with less than 4 entries won't
> work.

Why the difference?

> My intention is to split the effort and fix the RING_SIZE < 4 case in a
> follow up patchset.
> 
> Maybe we should fail probe if RING_SIZE < 4 until the follow up patchset?

I'd keep current behaviour.

> I tested the patchset with SNET DPU (drivers/vdpa/solidrun), with packed
> and split VQs, with rings down to 4 entries, with and without
> VIRTIO_NET_F_MRG_RXBUF, with big MTUs.
> 
> I would appreciate more testing.
> Xuan: I wasn't able to test XDP with my setup, maybe you can help with
> that?
> 
> [1] https://lore.kernel.org/lkml/20230416074607.292616-1-alvaro.karsz@solid-run.com/
> 
> Alvaro Karsz (3):
>   virtio: re-negotiate features if probe fails and features are blocked
>   virtio-net: allow usage of vrings smaller than MAX_SKB_FRAGS + 2
>   virtio-net: block ethtool from converting a ring to a small ring
> 
>  drivers/net/virtio_net.c | 161 +++++++++++++++++++++++++++++++++++++--
>  drivers/virtio/virtio.c  |  73 +++++++++++++-----
>  include/linux/virtio.h   |   3 +
>  3 files changed, 212 insertions(+), 25 deletions(-)
> 
> -- 
> 2.34.1
Alvaro Karsz April 30, 2023, 6:15 p.m. UTC | #2
> > This patchset follows a discussion in the mailing list [1].
> >
> > This fixes only part of the bug, rings with less than 4 entries won't
> > work.
> 
> Why the difference?
> 

Because the RING_SIZE < 4 case requires much more adjustments.

* We may need to squeeze the virtio header into the headroom.
* We may need to squeeze the GSO header into the headroom, or block the features.
* At the moment, without NETIF_F_SG, we can receive a skb with 2 segments, we may need to reduce it to 1.
* We may need to change all the control commands, so class,  command and command specific data will fit in a single segment.
* We may need to disable the control command and all the features depending on it.
* We may need to disable NAPI?

There may be more changes..

I was thinking that it may be easier to start with the easier case RING_SIZE >= 4, make sure everything is working fine, then send a follow up patchset with the required adjustments for RING_SIZE < 4.
Michael S. Tsirkin May 1, 2023, 10:27 a.m. UTC | #3
On Sun, Apr 30, 2023 at 06:15:03PM +0000, Alvaro Karsz wrote:
> 
> > > This patchset follows a discussion in the mailing list [1].
> > >
> > > This fixes only part of the bug, rings with less than 4 entries won't
> > > work.
> > 
> > Why the difference?
> > 
> 
> Because the RING_SIZE < 4 case requires much more adjustments.
> 
> * We may need to squeeze the virtio header into the headroom.
> * We may need to squeeze the GSO header into the headroom, or block the features.

We alread do this though no?
I think we'll need to tweak hard_header_len to guarantee it's there
as opposed to needed_headroom ...

> * At the moment, without NETIF_F_SG, we can receive a skb with 2 segments, we may need to reduce it to 1.

You are saying clearing NETIF_F_SG does not guarantee a linear skb?

> * We may need to change all the control commands, so class,  command and command specific data will fit in a single segment.
> * We may need to disable the control command and all the features depending on it.

well if we don't commands just fail as we can't add them right?
no corruption or stalls ...

> * We may need to disable NAPI?

hmm why napi?

> There may be more changes..
> 
> I was thinking that it may be easier to start with the easier case RING_SIZE >= 4, make sure everything is working fine, then send a follow up patchset with the required adjustments for RING_SIZE < 4.


it's ok but I'm just trying to figure out where does 4 come from.
Alvaro Karsz May 1, 2023, 11:41 a.m. UTC | #4
> > > Why the difference?
> > >
> >
> > Because the RING_SIZE < 4 case requires much more adjustments.
> >
> > * We may need to squeeze the virtio header into the headroom.
> > * We may need to squeeze the GSO header into the headroom, or block the features.
> 
> We alread do this though no?
> I think we'll need to tweak hard_header_len to guarantee it's there
> as opposed to needed_headroom ...
> 
> > * At the moment, without NETIF_F_SG, we can receive a skb with 2 segments, we may need to reduce it to 1.
> 
> You are saying clearing NETIF_F_SG does not guarantee a linear skb?
> 

I don't know..
I'm not sure what is the cause, but using this patchset, without any host GSO feature, I can get a chain of 3 descriptors.
Posing an example of a 4 entries ring during iperf3, acting as a client:
(TX descriptors)

len=86       flags 0x1         addr 0xf738d000
len=1448   flags 0x0         addr 0xf738d800
len=86       flags 0x8081   addr 0xf738e000
len=1184,   flags 0x8081  addr 0xf738e800
len=264     flags 0x8080   addr 0xf738f000
len=86       flags 0x8081   addr 0xf738f800
len=1448   flags 0x0         addr 0xf7390000
len=86       flags 0x1         addr 0xf7390800
len=1448   flags 0x0         addr 0xf7391000
len=86       flags 0x1         addr 0xf716a800
len=1448   flags 0x8080   addr 0xf716b000
len=86       flags 0x8081   addr 0xf7391800
len=1448   flags 0x8080   addr 0xf7392000

We got a chain of 3 in here.
This happens often.

Now, when negotiating host GSO features, I can get up to 4:

len=86       flags 0x1         addr 0xf71fc800
len=21328 flags 0x1         addr 0xf6e00800
len=32768 flags 0x8081   addr 0xf6e06000
len=11064 flags 0x8080   addr 0xf6e0e000
len=86       flags 0x8081   addr 0xf738b000
len=1         flags 0x8080   addr 0xf738b800
len=86       flags 0x1         addr 0xf738c000
len=21704 flags 0x1         addr 0xf738c800
len=32768 flags 0x1         addr 0xf7392000
len=10688 flags 0x0         addr 0xf739a000
len=86       flags 0x8081   addr 0xf739d000
len=22080 flags 0x8081   addr 0xf739d800
len=32768 flags 0x8081   addr 0xf73a3000
len=10312 flags 0x8080   addr 0xf73ab000

TBH, I thought that this behaviour was expected until you mentioned it,
This is why virtnet_calc_max_descs returns 3 if no host_gso feature is negotiated, and 4 otherwise.
I was thinking that we may need to use another skb to hold the TSO template (for headers generation)...

Any ideas?

> > * We may need to change all the control commands, so class,  command and command specific data will fit in a single segment.
> > * We may need to disable the control command and all the features depending on it.
> 
> well if we don't commands just fail as we can't add them right?
> no corruption or stalls ...
> 
> > * We may need to disable NAPI?
> 
> hmm why napi?
> 

I'm not sure if it's required to disable it, but I'm not sure what's the point having napi if the ring size is 1..
Will it work?

> > There may be more changes..
> >
> > I was thinking that it may be easier to start with the easier case RING_SIZE >= 4, make sure everything is working fine, then send a follow up patchset with the required adjustments for RING_SIZE < 4.
> 
> 
> it's ok but I'm just trying to figure out where does 4 come from.
> 

I guess this part was not clear, sorry..
In case of split vqs, we have ring size 2 before 4.
And as you saw, we still get chains of 3 when NETIF_F_SG is off (Which I thought was expected).
Michael S. Tsirkin June 2, 2023, 11:29 a.m. UTC | #5
On Mon, May 01, 2023 at 11:41:44AM +0000, Alvaro Karsz wrote:
> > > > Why the difference?
> > > >
> > >
> > > Because the RING_SIZE < 4 case requires much more adjustments.
> > >
> > > * We may need to squeeze the virtio header into the headroom.
> > > * We may need to squeeze the GSO header into the headroom, or block the features.
> > 
> > We alread do this though no?
> > I think we'll need to tweak hard_header_len to guarantee it's there
> > as opposed to needed_headroom ...
> > 
> > > * At the moment, without NETIF_F_SG, we can receive a skb with 2 segments, we may need to reduce it to 1.
> > 
> > You are saying clearing NETIF_F_SG does not guarantee a linear skb?
> > 
> 
> I don't know..
> I'm not sure what is the cause, but using this patchset, without any host GSO feature, I can get a chain of 3 descriptors.
> Posing an example of a 4 entries ring during iperf3, acting as a client:
> (TX descriptors)
> 
> len=86       flags 0x1         addr 0xf738d000
> len=1448   flags 0x0         addr 0xf738d800
> len=86       flags 0x8081   addr 0xf738e000
> len=1184,   flags 0x8081  addr 0xf738e800
> len=264     flags 0x8080   addr 0xf738f000
> len=86       flags 0x8081   addr 0xf738f800
> len=1448   flags 0x0         addr 0xf7390000
> len=86       flags 0x1         addr 0xf7390800
> len=1448   flags 0x0         addr 0xf7391000
> len=86       flags 0x1         addr 0xf716a800
> len=1448   flags 0x8080   addr 0xf716b000
> len=86       flags 0x8081   addr 0xf7391800
> len=1448   flags 0x8080   addr 0xf7392000
> 
> We got a chain of 3 in here.
> This happens often.
> 
> Now, when negotiating host GSO features, I can get up to 4:
> 
> len=86       flags 0x1         addr 0xf71fc800
> len=21328 flags 0x1         addr 0xf6e00800
> len=32768 flags 0x8081   addr 0xf6e06000
> len=11064 flags 0x8080   addr 0xf6e0e000
> len=86       flags 0x8081   addr 0xf738b000
> len=1         flags 0x8080   addr 0xf738b800
> len=86       flags 0x1         addr 0xf738c000
> len=21704 flags 0x1         addr 0xf738c800
> len=32768 flags 0x1         addr 0xf7392000
> len=10688 flags 0x0         addr 0xf739a000
> len=86       flags 0x8081   addr 0xf739d000
> len=22080 flags 0x8081   addr 0xf739d800
> len=32768 flags 0x8081   addr 0xf73a3000
> len=10312 flags 0x8080   addr 0xf73ab000
> 
> TBH, I thought that this behaviour was expected until you mentioned it,
> This is why virtnet_calc_max_descs returns 3 if no host_gso feature is negotiated, and 4 otherwise.
> I was thinking that we may need to use another skb to hold the TSO template (for headers generation)...
> 
> Any ideas?

Something's wrong with the code apparently. Want to try sticking
printk's in the driver to see what is going on?

Documentation/networking/netdev-features.rst says:
	Those features say that ndo_start_xmit can handle fragmented skbs:
	NETIF_F_SG --- paged skbs (skb_shinfo()->frags), NETIF_F_FRAGLIST ---
	chained skbs (skb->next/prev list).


of course we can always linearize if we want to ...


> > > * We may need to change all the control commands, so class,  command and command specific data will fit in a single segment.
> > > * We may need to disable the control command and all the features depending on it.
> > 
> > well if we don't commands just fail as we can't add them right?
> > no corruption or stalls ...
> > 
> > > * We may need to disable NAPI?
> > 
> > hmm why napi?
> > 
> 
> I'm not sure if it's required to disable it, but I'm not sure what's the point having napi if the ring size is 1..
> Will it work?

Not much point in it but it's simpler to just keep things consistent.

> > > There may be more changes..
> > >
> > > I was thinking that it may be easier to start with the easier case RING_SIZE >= 4, make sure everything is working fine, then send a follow up patchset with the required adjustments for RING_SIZE < 4.
> > 
> > 
> > it's ok but I'm just trying to figure out where does 4 come from.
> > 
> 
> I guess this part was not clear, sorry..
> In case of split vqs, we have ring size 2 before 4.
> And as you saw, we still get chains of 3 when NETIF_F_SG is off (Which I thought was expected).
> 

Worth figuring out where do these come from.
Michael S. Tsirkin June 17, 2023, 7:44 a.m. UTC | #6
On Sun, Apr 30, 2023 at 04:15:15PM +0300, Alvaro Karsz wrote:
> At the moment, if a virtio network device uses vrings with less than
> MAX_SKB_FRAGS + 2 entries, the device won't be functional.
> 
> The following condition vq->num_free >= 2 + MAX_SKB_FRAGS will always
> evaluate to false, leading to TX timeouts.
> 
> This patchset attempts this fix this bug, and to allow small rings down
> to 4 entries.
> 
> The first patch introduces a new mechanism in virtio core - it allows to
> block features in probe time.
> 
> If a virtio drivers blocks features and fails probe, virtio core will
> reset the device, re-negotiate the features and probe again.
> 
> This is needed since some virtio net features are not supported with
> small rings.
> 
> This patchset follows a discussion in the mailing list [1].
> 
> This fixes only part of the bug, rings with less than 4 entries won't
> work.
> My intention is to split the effort and fix the RING_SIZE < 4 case in a
> follow up patchset.
> 
> Maybe we should fail probe if RING_SIZE < 4 until the follow up patchset?
> 
> I tested the patchset with SNET DPU (drivers/vdpa/solidrun), with packed
> and split VQs, with rings down to 4 entries, with and without
> VIRTIO_NET_F_MRG_RXBUF, with big MTUs.
> 
> I would appreciate more testing.
> Xuan: I wasn't able to test XDP with my setup, maybe you can help with
> that?
> 
> [1] https://lore.kernel.org/lkml/20230416074607.292616-1-alvaro.karsz@solid-run.com/

the work is orphaned for now. Jason do you want to pick this up?
Related to all the hardening I guess ...

> Alvaro Karsz (3):
>   virtio: re-negotiate features if probe fails and features are blocked
>   virtio-net: allow usage of vrings smaller than MAX_SKB_FRAGS + 2
>   virtio-net: block ethtool from converting a ring to a small ring
> 
>  drivers/net/virtio_net.c | 161 +++++++++++++++++++++++++++++++++++++--
>  drivers/virtio/virtio.c  |  73 +++++++++++++-----
>  include/linux/virtio.h   |   3 +
>  3 files changed, 212 insertions(+), 25 deletions(-)
> 
> -- 
> 2.34.1