Message ID | 20240621192541.2082657-2-avkrasnov@salutedevices.com (mailing list archive)
---|---
State | RFC
Delegated to | Netdev Maintainers
Series | virtio/vsock: some updates for deferred credit update
On Fri, Jun 21, 2024 at 10:25:40PM GMT, Arseniy Krasnov wrote:
>Previous calculation of 'free_space' was wrong (but worked as expected
>in most cases, see below), because it didn't account number of bytes in
>rx queue. Let's rework 'free_space' calculation in the following way:
>as this value is considered free space at rx side from tx point of view,
>it must be equal to return value of 'virtio_transport_get_credit()' at
>tx side. This function uses 'tx_cnt' counter and 'peer_fwd_cnt': first
>is number of transmitted bytes (without wrap), second is last 'fwd_cnt'
>value received from rx. So let's use same approach at rx side during
>'free_space' calculation: add 'rx_cnt' counter which is number of
>received bytes (also without wrap) and subtract 'last_fwd_cnt' from it.
>Now we have:
>1) 'rx_cnt' == 'tx_cnt' at both sides.
>2) 'last_fwd_cnt' == 'peer_fwd_cnt' - because first is last 'fwd_cnt'
>   sent to tx, while second is last 'fwd_cnt' received from rx.
>
>Now 'free_space' is handled correctly and also we don't need

mmm, I don't know if it was wrong before, maybe we could say it was
less accurate.

That said, could we have the same problem now if we have a lot of
producers and the virtqueue becomes full?

>'low_rx_bytes' flag - this was more like a hack.
>
>Previous calculation of 'free_space' worked (in 99% cases), because if
>we take a look on behaviour of both expressions (new and previous):
>
>'(rx_cnt - last_fwd_cnt)' and '(fwd_cnt - last_fwd_cnt)'
>
>Both of them always grows up, with almost same "speed": only difference
>is that 'rx_cnt' is incremented earlier during packet is received,
>while 'fwd_cnt' in incremented when packet is read by user. So if 'rx_cnt'
>grows "faster", then resulting 'free_space' become smaller also, so we
>send credit updates a little bit more, but:
>
> * 'free_space' calculation based on 'rx_cnt' gives the same value,
>   which tx sees as free space at rx side, so original idea of

Ditto, what happen if the virtqueue is full?

>   'free_space' is now implemented as planned.
> * Hack with 'low_rx_bytes' now is not needed.

Yeah, so this patch should also mitigate issue reported by Alex (added
in CC), right?

If yes, please mention that problem and add a Reported-by giving credit
to Alex.

>
>Also here is some performance comparison between both versions of
>'free_space' calculation:
>
> *------*----------*----------*
> |      | 'rx_cnt' | previous |
> *------*----------*----------*
> |H -> G|   8.42   |   7.82   |
> *------*----------*----------*
> |G -> H|   11.6   |   12.1   |
> *------*----------*----------*

How many seconds did you run it? How many repetitions? There's a little
discrepancy anyway, but I can't tell if it's just noise.

>
>As benchmark 'vsock-iperf' with default arguments was used. There is no
>significant performance difference before and after this patch.
>
>Signed-off-by: Arseniy Krasnov <avkrasnov@salutedevices.com>
>---
> include/linux/virtio_vsock.h            | 1 +
> net/vmw_vsock/virtio_transport_common.c | 8 +++-----
> 2 files changed, 4 insertions(+), 5 deletions(-)

Thanks for working on this, I'll do more tests but the approach LGTM.

Thanks,
Stefano

>
>diff --git a/include/linux/virtio_vsock.h b/include/linux/virtio_vsock.h
>index c82089dee0c8..3579491c411e 100644
>--- a/include/linux/virtio_vsock.h
>+++ b/include/linux/virtio_vsock.h
>@@ -135,6 +135,7 @@ struct virtio_vsock_sock {
> 	u32 peer_buf_alloc;
>
> 	/* Protected by rx_lock */
>+	u32 rx_cnt;
> 	u32 fwd_cnt;
> 	u32 last_fwd_cnt;
> 	u32 rx_bytes;
>diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
>index 16ff976a86e3..1d4e2328e06e 100644
>--- a/net/vmw_vsock/virtio_transport_common.c
>+++ b/net/vmw_vsock/virtio_transport_common.c
>@@ -441,6 +441,7 @@ static bool virtio_transport_inc_rx_pkt(struct virtio_vsock_sock *vvs,
> 		return false;
>
> 	vvs->rx_bytes += len;
>+	vvs->rx_cnt += len;
> 	return true;
> }
>
>@@ -558,7 +559,6 @@ virtio_transport_stream_do_dequeue(struct vsock_sock *vsk,
> 	size_t bytes, total = 0;
> 	struct sk_buff *skb;
> 	u32 fwd_cnt_delta;
>-	bool low_rx_bytes;
> 	int err = -EFAULT;
> 	u32 free_space;
>
>@@ -603,9 +603,7 @@ virtio_transport_stream_do_dequeue(struct vsock_sock *vsk,
> 	}
>
> 	fwd_cnt_delta = vvs->fwd_cnt - vvs->last_fwd_cnt;
>-	free_space = vvs->buf_alloc - fwd_cnt_delta;
>-	low_rx_bytes = (vvs->rx_bytes <
>-			sock_rcvlowat(sk_vsock(vsk), 0, INT_MAX));
>+	free_space = vvs->buf_alloc - (vvs->rx_cnt - vvs->last_fwd_cnt);
>
> 	spin_unlock_bh(&vvs->rx_lock);
>
>@@ -619,7 +617,7 @@ virtio_transport_stream_do_dequeue(struct vsock_sock *vsk,
> 	 * number of bytes in rx queue is not enough to wake up reader.
> 	 */
> 	if (fwd_cnt_delta &&
>-	    (free_space < VIRTIO_VSOCK_MAX_PKT_BUF_SIZE || low_rx_bytes))
>+	    (free_space < VIRTIO_VSOCK_MAX_PKT_BUF_SIZE))
> 		virtio_transport_send_credit_update(vsk);
>
> 	return total;
>--
>2.25.1
>
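For readers following the thread: the tx-side calculation that the commit
message wants the rx side to mirror, 'virtio_transport_get_credit()', boils
down to the sketch below. This is a simplified illustration only; the
authoritative version in net/vmw_vsock/virtio_transport_common.c also takes
tx_lock, caps the result by the requested amount, and advances 'tx_cnt'.

#include <linux/virtio_vsock.h>

/* Simplified sketch (not the upstream function): how much the sender
 * believes it may still transmit without overrunning the peer's buffer.
 */
static u32 tx_free_space_sketch(const struct virtio_vsock_sock *vvs)
{
	/* peer_buf_alloc: receive buffer size advertised by the peer.
	 * tx_cnt:         bytes transmitted so far (free-running u32).
	 * peer_fwd_cnt:   last fwd_cnt the peer reported it has consumed.
	 */
	return vvs->peer_buf_alloc - (vvs->tx_cnt - vvs->peer_fwd_cnt);
}

The patch makes the receiver compute the same quantity from its own
counters ('rx_cnt' and 'last_fwd_cnt') when deciding whether a credit
update is worth sending.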
On 25.06.2024 16:46, Stefano Garzarella wrote:
> On Fri, Jun 21, 2024 at 10:25:40PM GMT, Arseniy Krasnov wrote:
>> Previous calculation of 'free_space' was wrong (but worked as expected
>> in most cases, see below), because it didn't account number of bytes in
>> rx queue. Let's rework 'free_space' calculation in the following way:
>> as this value is considered free space at rx side from tx point of view,
>> it must be equal to return value of 'virtio_transport_get_credit()' at
>> tx side. This function uses 'tx_cnt' counter and 'peer_fwd_cnt': first
>> is number of transmitted bytes (without wrap), second is last 'fwd_cnt'
>> value received from rx. So let's use same approach at rx side during
>> 'free_space' calculation: add 'rx_cnt' counter which is number of
>> received bytes (also without wrap) and subtract 'last_fwd_cnt' from it.
>> Now we have:
>> 1) 'rx_cnt' == 'tx_cnt' at both sides.
>> 2) 'last_fwd_cnt' == 'peer_fwd_cnt' - because first is last 'fwd_cnt'
>>    sent to tx, while second is last 'fwd_cnt' received from rx.
>>
>> Now 'free_space' is handled correctly and also we don't need
>
> mmm, I don't know if it was wrong before, maybe we could say it was
> less accurate.

May be "now 'free_space' is handled in more precise way and also we ..." ?

>
> That said, could we have the same problem now if we have a lot of
> producers and the virtqueue becomes full?
>

I guess if virtqueue is full, we just wait by returning skb back to tx
queue... e.g. data exchange between two sockets just freezes. ?

>> 'low_rx_bytes' flag - this was more like a hack.
>>
>> Previous calculation of 'free_space' worked (in 99% cases), because if
>> we take a look on behaviour of both expressions (new and previous):
>>
>> '(rx_cnt - last_fwd_cnt)' and '(fwd_cnt - last_fwd_cnt)'
>>
>> Both of them always grows up, with almost same "speed": only difference
>> is that 'rx_cnt' is incremented earlier during packet is received,
>> while 'fwd_cnt' in incremented when packet is read by user. So if 'rx_cnt'
>> grows "faster", then resulting 'free_space' become smaller also, so we
>> send credit updates a little bit more, but:
>>
>>  * 'free_space' calculation based on 'rx_cnt' gives the same value,
>>    which tx sees as free space at rx side, so original idea of
>
> Ditto, what happen if the virtqueue is full?
>
>>    'free_space' is now implemented as planned.
>>  * Hack with 'low_rx_bytes' now is not needed.
>
> Yeah, so this patch should also mitigate issue reported by Alex (added
> in CC), right?
>
> If yes, please mention that problem and add a Reported-by giving credit
> to Alex.

Yes, of course!

>
>>
>> Also here is some performance comparison between both versions of
>> 'free_space' calculation:
>>
>>  *------*----------*----------*
>>  |      | 'rx_cnt' | previous |
>>  *------*----------*----------*
>>  |H -> G|   8.42   |   7.82   |
>>  *------*----------*----------*
>>  |G -> H|   11.6   |   12.1   |
>>  *------*----------*----------*
>
> How many seconds did you run it? How many repetitions? There's a little
> discrepancy anyway, but I can't tell if it's just noise.

I run 4 times, each run for ~10 seconds... I think I can also add number
of credit update messages to this report.

>
>>
>> As benchmark 'vsock-iperf' with default arguments was used. There is no
>> significant performance difference before and after this patch.
>>
>> Signed-off-by: Arseniy Krasnov <avkrasnov@salutedevices.com>
>> ---
>>  include/linux/virtio_vsock.h            | 1 +
>>  net/vmw_vsock/virtio_transport_common.c | 8 +++-----
>>  2 files changed, 4 insertions(+), 5 deletions(-)
>
> Thanks for working on this, I'll do more tests but the approach LGTM.

Got it, Thanks

>
> Thanks,
> Stefano
>
>>
>> diff --git a/include/linux/virtio_vsock.h b/include/linux/virtio_vsock.h
>> index c82089dee0c8..3579491c411e 100644
>> --- a/include/linux/virtio_vsock.h
>> +++ b/include/linux/virtio_vsock.h
>> @@ -135,6 +135,7 @@ struct virtio_vsock_sock {
>>  	u32 peer_buf_alloc;
>>
>>  	/* Protected by rx_lock */
>> +	u32 rx_cnt;
>>  	u32 fwd_cnt;
>>  	u32 last_fwd_cnt;
>>  	u32 rx_bytes;
>> diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
>> index 16ff976a86e3..1d4e2328e06e 100644
>> --- a/net/vmw_vsock/virtio_transport_common.c
>> +++ b/net/vmw_vsock/virtio_transport_common.c
>> @@ -441,6 +441,7 @@ static bool virtio_transport_inc_rx_pkt(struct virtio_vsock_sock *vvs,
>>  		return false;
>>
>>  	vvs->rx_bytes += len;
>> +	vvs->rx_cnt += len;
>>  	return true;
>>  }
>>
>> @@ -558,7 +559,6 @@ virtio_transport_stream_do_dequeue(struct vsock_sock *vsk,
>>  	size_t bytes, total = 0;
>>  	struct sk_buff *skb;
>>  	u32 fwd_cnt_delta;
>> -	bool low_rx_bytes;
>>  	int err = -EFAULT;
>>  	u32 free_space;
>>
>> @@ -603,9 +603,7 @@ virtio_transport_stream_do_dequeue(struct vsock_sock *vsk,
>>  	}
>>
>>  	fwd_cnt_delta = vvs->fwd_cnt - vvs->last_fwd_cnt;
>> -	free_space = vvs->buf_alloc - fwd_cnt_delta;
>> -	low_rx_bytes = (vvs->rx_bytes <
>> -			sock_rcvlowat(sk_vsock(vsk), 0, INT_MAX));
>> +	free_space = vvs->buf_alloc - (vvs->rx_cnt - vvs->last_fwd_cnt);
>>
>>  	spin_unlock_bh(&vvs->rx_lock);
>>
>> @@ -619,7 +617,7 @@ virtio_transport_stream_do_dequeue(struct vsock_sock *vsk,
>>  	 * number of bytes in rx queue is not enough to wake up reader.
>>  	 */
>>  	if (fwd_cnt_delta &&
>> -	    (free_space < VIRTIO_VSOCK_MAX_PKT_BUF_SIZE || low_rx_bytes))
>> +	    (free_space < VIRTIO_VSOCK_MAX_PKT_BUF_SIZE))
>>  		virtio_transport_send_credit_update(vsk);
>>
>>  	return total;
>> --
>> 2.25.1
>>
>>
>
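One hypothetical way to gather the credit-update count mentioned above
(not part of this series; all names below are invented for illustration)
is a debug counter bumped next to each call to
virtio_transport_send_credit_update():

#include <linux/atomic.h>
#include <linux/printk.h>

/* Hypothetical debug aid: count credit updates so the 'rx_cnt' and the
 * previous 'free_space' formulas can also be compared by the number of
 * control messages, not only by throughput.
 */
static atomic_long_t vsock_dbg_credit_updates = ATOMIC_LONG_INIT(0);

static inline void vsock_dbg_count_credit_update(void)
{
	long n = atomic_long_inc_return(&vsock_dbg_credit_updates);

	/* Print only every 1024 updates to keep the log quiet. */
	if (!(n % 1024))
		pr_debug("vsock: %ld credit updates sent so far\n", n);
}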
Hi Arseniy,

On Fri, Jun 21, 2024 at 10:25:40PM GMT, Arseniy Krasnov wrote:
>Previous calculation of 'free_space' was wrong (but worked as expected
>in most cases, see below), because it didn't account number of bytes in
>rx queue. Let's rework 'free_space' calculation in the following way:
>as this value is considered free space at rx side from tx point of view,
>it must be equal to return value of 'virtio_transport_get_credit()' at
>tx side. This function uses 'tx_cnt' counter and 'peer_fwd_cnt': first
>is number of transmitted bytes (without wrap), second is last 'fwd_cnt'
>value received from rx. So let's use same approach at rx side during
>'free_space' calculation: add 'rx_cnt' counter which is number of
>received bytes (also without wrap) and subtract 'last_fwd_cnt' from it.
>Now we have:
>1) 'rx_cnt' == 'tx_cnt' at both sides.
>2) 'last_fwd_cnt' == 'peer_fwd_cnt' - because first is last 'fwd_cnt'
>   sent to tx, while second is last 'fwd_cnt' received from rx.
>
>Now 'free_space' is handled correctly and also we don't need
>'low_rx_bytes' flag - this was more like a hack.
>
>Previous calculation of 'free_space' worked (in 99% cases), because if
>we take a look on behaviour of both expressions (new and previous):
>
>'(rx_cnt - last_fwd_cnt)' and '(fwd_cnt - last_fwd_cnt)'
>
>Both of them always grows up, with almost same "speed": only difference
>is that 'rx_cnt' is incremented earlier during packet is received,
>while 'fwd_cnt' in incremented when packet is read by user. So if 'rx_cnt'
>grows "faster", then resulting 'free_space' become smaller also, so we
>send credit updates a little bit more, but:
>
> * 'free_space' calculation based on 'rx_cnt' gives the same value,
>   which tx sees as free space at rx side, so original idea of
>   'free_space' is now implemented as planned.
> * Hack with 'low_rx_bytes' now is not needed.
>
>Also here is some performance comparison between both versions of
>'free_space' calculation:
>
> *------*----------*----------*
> |      | 'rx_cnt' | previous |
> *------*----------*----------*
> |H -> G|   8.42   |   7.82   |
> *------*----------*----------*
> |G -> H|   11.6   |   12.1   |
> *------*----------*----------*

I did some tests on an Intel(R) Xeon(R) Silver 4410Y using iperf-vsock:

- kernel 6.9.0

  pkt_size   G->H   H->G
  4k          4.6    6.4
  64k        13.8   11.5
  128k       13.4   11.7

- kernel 6.9.0 with this series applied

  pkt_size   G->H   H->G
  4k          4.6    8.16
  64k        12.2    8.9
  128k       12.8    8.8

I see a big drop, especially on H->G with big packets.

Can you try to replicate on your env? I'll try to understand more and
also an i7 on the next days.

Thanks,
Stefano

>
>As benchmark 'vsock-iperf' with default arguments was used. There is no
>significant performance difference before and after this patch.
>
>Signed-off-by: Arseniy Krasnov <avkrasnov@salutedevices.com>
>---
> include/linux/virtio_vsock.h            | 1 +
> net/vmw_vsock/virtio_transport_common.c | 8 +++-----
> 2 files changed, 4 insertions(+), 5 deletions(-)
>
>diff --git a/include/linux/virtio_vsock.h b/include/linux/virtio_vsock.h
>index c82089dee0c8..3579491c411e 100644
>--- a/include/linux/virtio_vsock.h
>+++ b/include/linux/virtio_vsock.h
>@@ -135,6 +135,7 @@ struct virtio_vsock_sock {
> 	u32 peer_buf_alloc;
>
> 	/* Protected by rx_lock */
>+	u32 rx_cnt;
> 	u32 fwd_cnt;
> 	u32 last_fwd_cnt;
> 	u32 rx_bytes;
>diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
>index 16ff976a86e3..1d4e2328e06e 100644
>--- a/net/vmw_vsock/virtio_transport_common.c
>+++ b/net/vmw_vsock/virtio_transport_common.c
>@@ -441,6 +441,7 @@ static bool virtio_transport_inc_rx_pkt(struct virtio_vsock_sock *vvs,
> 		return false;
>
> 	vvs->rx_bytes += len;
>+	vvs->rx_cnt += len;
> 	return true;
> }
>
>@@ -558,7 +559,6 @@ virtio_transport_stream_do_dequeue(struct vsock_sock *vsk,
> 	size_t bytes, total = 0;
> 	struct sk_buff *skb;
> 	u32 fwd_cnt_delta;
>-	bool low_rx_bytes;
> 	int err = -EFAULT;
> 	u32 free_space;
>
>@@ -603,9 +603,7 @@ virtio_transport_stream_do_dequeue(struct vsock_sock *vsk,
> 	}
>
> 	fwd_cnt_delta = vvs->fwd_cnt - vvs->last_fwd_cnt;
>-	free_space = vvs->buf_alloc - fwd_cnt_delta;
>-	low_rx_bytes = (vvs->rx_bytes <
>-			sock_rcvlowat(sk_vsock(vsk), 0, INT_MAX));
>+	free_space = vvs->buf_alloc - (vvs->rx_cnt - vvs->last_fwd_cnt);
>
> 	spin_unlock_bh(&vvs->rx_lock);
>
>@@ -619,7 +617,7 @@ virtio_transport_stream_do_dequeue(struct vsock_sock *vsk,
> 	 * number of bytes in rx queue is not enough to wake up reader.
> 	 */
> 	if (fwd_cnt_delta &&
>-	    (free_space < VIRTIO_VSOCK_MAX_PKT_BUF_SIZE || low_rx_bytes))
>+	    (free_space < VIRTIO_VSOCK_MAX_PKT_BUF_SIZE))
> 		virtio_transport_send_credit_update(vsk);
>
> 	return total;
>--
>2.25.1
>
diff --git a/include/linux/virtio_vsock.h b/include/linux/virtio_vsock.h
index c82089dee0c8..3579491c411e 100644
--- a/include/linux/virtio_vsock.h
+++ b/include/linux/virtio_vsock.h
@@ -135,6 +135,7 @@ struct virtio_vsock_sock {
 	u32 peer_buf_alloc;
 
 	/* Protected by rx_lock */
+	u32 rx_cnt;
 	u32 fwd_cnt;
 	u32 last_fwd_cnt;
 	u32 rx_bytes;
diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
index 16ff976a86e3..1d4e2328e06e 100644
--- a/net/vmw_vsock/virtio_transport_common.c
+++ b/net/vmw_vsock/virtio_transport_common.c
@@ -441,6 +441,7 @@ static bool virtio_transport_inc_rx_pkt(struct virtio_vsock_sock *vvs,
 		return false;
 
 	vvs->rx_bytes += len;
+	vvs->rx_cnt += len;
 	return true;
 }
 
@@ -558,7 +559,6 @@ virtio_transport_stream_do_dequeue(struct vsock_sock *vsk,
 	size_t bytes, total = 0;
 	struct sk_buff *skb;
 	u32 fwd_cnt_delta;
-	bool low_rx_bytes;
 	int err = -EFAULT;
 	u32 free_space;
 
@@ -603,9 +603,7 @@ virtio_transport_stream_do_dequeue(struct vsock_sock *vsk,
 	}
 
 	fwd_cnt_delta = vvs->fwd_cnt - vvs->last_fwd_cnt;
-	free_space = vvs->buf_alloc - fwd_cnt_delta;
-	low_rx_bytes = (vvs->rx_bytes <
-			sock_rcvlowat(sk_vsock(vsk), 0, INT_MAX));
+	free_space = vvs->buf_alloc - (vvs->rx_cnt - vvs->last_fwd_cnt);
 
 	spin_unlock_bh(&vvs->rx_lock);
 
@@ -619,7 +617,7 @@ virtio_transport_stream_do_dequeue(struct vsock_sock *vsk,
 	 * number of bytes in rx queue is not enough to wake up reader.
 	 */
 	if (fwd_cnt_delta &&
-	    (free_space < VIRTIO_VSOCK_MAX_PKT_BUF_SIZE || low_rx_bytes))
+	    (free_space < VIRTIO_VSOCK_MAX_PKT_BUF_SIZE))
 		virtio_transport_send_credit_update(vsk);
 
 	return total;
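A note on the "without wrap" wording used throughout the thread: 'rx_cnt',
'tx_cnt' and 'fwd_cnt' are free-running u32 counters that are never reset,
so the unsigned subtraction in the hunk above stays correct even after a
counter wraps past 2^32. A small standalone demonstration in plain
userspace C (example values chosen only for illustration):

#include <stdint.h>
#include <stdio.h>

int main(void)
{
	/* rx_cnt has wrapped past 2^32 (2^32 + 100 bytes received in
	 * total), last_fwd_cnt has not; the unsigned subtraction still
	 * yields the true number of received-but-not-yet-read bytes.
	 */
	uint32_t rx_cnt       = 100;
	uint32_t last_fwd_cnt = 0xFFFFFF00u;
	uint32_t buf_alloc    = 0x40000;     /* example 256 KiB window */

	uint32_t in_flight  = rx_cnt - last_fwd_cnt;  /* 356 bytes */
	uint32_t free_space = buf_alloc - in_flight;

	printf("in_flight=%u free_space=%u\n", in_flight, free_space);
	return 0;
}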
Previous calculation of 'free_space' was wrong (but worked as expected
in most cases, see below), because it didn't account for the number of
bytes in the rx queue. Let's rework the 'free_space' calculation in the
following way: as this value is considered the free space at the rx side
from the tx point of view, it must be equal to the return value of
'virtio_transport_get_credit()' at the tx side. That function uses the
'tx_cnt' counter and 'peer_fwd_cnt': the first is the number of
transmitted bytes (without wrap), the second is the last 'fwd_cnt' value
received from rx. So let's use the same approach at the rx side for the
'free_space' calculation: add an 'rx_cnt' counter which is the number of
received bytes (also without wrap) and subtract 'last_fwd_cnt' from it.
Now we have:
1) 'rx_cnt' == 'tx_cnt' at both sides.
2) 'last_fwd_cnt' == 'peer_fwd_cnt' - because the first is the last
   'fwd_cnt' sent to tx, while the second is the last 'fwd_cnt' received
   from rx.

Now 'free_space' is handled correctly and we also don't need the
'low_rx_bytes' flag - it was more like a hack.

The previous calculation of 'free_space' worked (in 99% of cases),
because if we look at the behaviour of both expressions (new and
previous):

'(rx_cnt - last_fwd_cnt)' and '(fwd_cnt - last_fwd_cnt)'

both of them always grow, at almost the same "speed": the only
difference is that 'rx_cnt' is incremented earlier, when a packet is
received, while 'fwd_cnt' is incremented when a packet is read by the
user. So if 'rx_cnt' grows "faster", the resulting 'free_space' becomes
smaller as well, and we send credit updates a little more often, but:

 * The 'free_space' calculation based on 'rx_cnt' gives the same value
   that tx sees as free space at the rx side, so the original idea of
   'free_space' is now implemented as planned.
 * The hack with 'low_rx_bytes' is not needed anymore.

Also here is a performance comparison between both versions of the
'free_space' calculation:

 *------*----------*----------*
 |      | 'rx_cnt' | previous |
 *------*----------*----------*
 |H -> G|   8.42   |   7.82   |
 *------*----------*----------*
 |G -> H|   11.6   |   12.1   |
 *------*----------*----------*

As a benchmark, 'vsock-iperf' with default arguments was used. There is
no significant performance difference before and after this patch.

Signed-off-by: Arseniy Krasnov <avkrasnov@salutedevices.com>
---
 include/linux/virtio_vsock.h            | 1 +
 net/vmw_vsock/virtio_transport_common.c | 8 +++-----
 2 files changed, 4 insertions(+), 5 deletions(-)
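To make the equivalence argued in the commit message concrete, here is a
self-contained userspace model of the accounting (not kernel code; the
buffer size and byte counts are example values): with the new formula the
receiver computes exactly the credit the sender sees at every step, while
the old 'fwd_cnt'-based formula ignores bytes that were received but not
yet read.

#include <assert.h>
#include <stdint.h>
#include <stdio.h>

#define BUF_ALLOC 262144u /* example buf_alloc, 256 KiB */

int main(void)
{
	/* tx side */
	uint32_t tx_cnt = 0, peer_fwd_cnt = 0;
	/* rx side */
	uint32_t rx_cnt = 0, fwd_cnt = 0, last_fwd_cnt = 0;

	/* 1) sender transmits 100 KiB, receiver queues it (not read yet) */
	tx_cnt += 102400;
	rx_cnt += 102400;

	uint32_t tx_credit      = BUF_ALLOC - (tx_cnt - peer_fwd_cnt);
	uint32_t free_space_new = BUF_ALLOC - (rx_cnt - last_fwd_cnt);
	uint32_t free_space_old = BUF_ALLOC - (fwd_cnt - last_fwd_cnt);

	assert(free_space_new == tx_credit);  /* both 159744 bytes        */
	assert(free_space_old == BUF_ALLOC);  /* old view: still 256 KiB  */

	/* 2) reader consumes the data, a credit update carries fwd_cnt back */
	fwd_cnt += 102400;
	last_fwd_cnt = fwd_cnt;
	peer_fwd_cnt = fwd_cnt;

	tx_credit      = BUF_ALLOC - (tx_cnt - peer_fwd_cnt);
	free_space_new = BUF_ALLOC - (rx_cnt - last_fwd_cnt);

	assert(free_space_new == tx_credit);  /* both back to 256 KiB */

	printf("rx view matches tx view at every step\n");
	return 0;
}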