mbox series

[net-next,v4,0/9] vsock: updates for SO_RCVLOWAT handling

Message ID de41de4c-0345-34d7-7c36-4345258b7ba8@sberdevices.ru (mailing list archive)
Headers show
Series vsock: updates for SO_RCVLOWAT handling | expand

Message

Arseniy Krasnov Aug. 19, 2022, 5:21 a.m. UTC
Hello,

This patchset includes some updates for SO_RCVLOWAT:

1) af_vsock:
   During my experiments with zerocopy receive, i found, that in some
   cases, poll() implementation violates POSIX: when socket has non-
   default SO_RCVLOWAT(e.g. not 1), poll() will always set POLLIN and
   POLLRDNORM bits in 'revents' even number of bytes available to read
   on socket is smaller than SO_RCVLOWAT value. In this case,user sees
   POLLIN flag and then tries to read data(for example using  'read()'
   call), but read call will be blocked, because  SO_RCVLOWAT logic is
   supported in dequeue loop in af_vsock.c. But the same time,  POSIX
   requires that:

   "POLLIN     Data other than high-priority data may be read without
               blocking.
    POLLRDNORM Normal data may be read without blocking."

   See https://www.open-std.org/jtc1/sc22/open/n4217.pdf, page 293.

   So, we have, that poll() syscall returns POLLIN, but read call will
   be blocked.

   Also in man page socket(7) i found that:

   "Since Linux 2.6.28, select(2), poll(2), and epoll(7) indicate a
   socket as readable only if at least SO_RCVLOWAT bytes are available."

   I checked TCP callback for poll()(net/ipv4/tcp.c, tcp_poll()), it
   uses SO_RCVLOWAT value to set POLLIN bit, also i've tested TCP with
   this case for TCP socket, it works as POSIX required.

   I've added some fixes to af_vsock.c and virtio_transport_common.c,
   test is also implemented.

2) virtio/vsock:
   It adds some optimization to wake ups, when new data arrived. Now,
   SO_RCVLOWAT is considered before wake up sleepers who wait new data.
   There is no sense, to kick waiter, when number of available bytes
   in socket's queue < SO_RCVLOWAT, because if we wake up reader in
   this case, it will wait for SO_RCVLOWAT data anyway during dequeue,
   or in poll() case, POLLIN/POLLRDNORM bits won't be set, so such
   exit from poll() will be "spurious". This logic is also used in TCP
   sockets.

3) vmci/vsock:
   Same as 2), but i'm not sure about this changes. Will be very good,
   to get comments from someone who knows this code.

4) Hyper-V:
   As Dexuan Cui mentioned, for Hyper-V transport it is difficult to
   support SO_RCVLOWAT, so he suggested to disable this feature for
   Hyper-V.

Thank You

Arseniy Krasnov(9):
 vsock: SO_RCVLOWAT transport set callback
 hv_sock: disable SO_RCVLOWAT support
 virtio/vsock: use 'target' in notify_poll_in callback
 vmci/vsock: use 'target' in notify_poll_in callback
 vsock: pass sock_rcvlowat to notify_poll_in as target
 vsock: add API call for data ready
 virtio/vsock: check SO_RCVLOWAT before wake up reader
 vmci/vsock: check SO_RCVLOWAT before wake up reader
 vsock_test: POLLIN + SO_RCVLOWAT test

 include/net/af_vsock.h                       |   2 +
 net/vmw_vsock/af_vsock.c                     |  33 +++++++-
 net/vmw_vsock/hyperv_transport.c             |   7 ++
 net/vmw_vsock/virtio_transport_common.c      |   7 +-
 net/vmw_vsock/vmci_transport_notify.c        |  10 +--
 net/vmw_vsock/vmci_transport_notify_qstate.c |  12 +--
 tools/testing/vsock/vsock_test.c             | 108 +++++++++++++++++++++++++++
 7 files changed, 162 insertions(+), 17 deletions(-)

 Changelog:

 v1 -> v2:
 1) Patches for VMCI transport(same as for virtio-vsock).
 2) Patches for Hyper-V transport(disabling SO_RCVLOWAT setting).
 3) Waiting logic in test was updated(sleep() -> poll()).

 v2 -> v3:
 1) Patches were reordered.
 2) Commit message updated in 0005.
 3) Check 'transport' pointer in 0001 for NULL.

 v3 -> v4:
 1) vsock_set_rcvlowat() logic changed. Previous version required
    assigned transport and always called its 'set_rcvlowat' callback
    (if present). Now, assignment is not needed.
 2) 0003,0004,0005,0006,0007,0008 - commit messages updated.
 3) 0009 - small refactoring and style fixes.
 4) RFC tag was removed.

Comments

patchwork-bot+netdevbpf@kernel.org Aug. 23, 2022, 9:20 a.m. UTC | #1
Hello:

This series was applied to netdev/net-next.git (master)
by Paolo Abeni <pabeni@redhat.com>:

On Fri, 19 Aug 2022 05:21:58 +0000 you wrote:
> Hello,
> 
> This patchset includes some updates for SO_RCVLOWAT:
> 
> 1) af_vsock:
>    During my experiments with zerocopy receive, i found, that in some
>    cases, poll() implementation violates POSIX: when socket has non-
>    default SO_RCVLOWAT(e.g. not 1), poll() will always set POLLIN and
>    POLLRDNORM bits in 'revents' even number of bytes available to read
>    on socket is smaller than SO_RCVLOWAT value. In this case,user sees
>    POLLIN flag and then tries to read data(for example using  'read()'
>    call), but read call will be blocked, because  SO_RCVLOWAT logic is
>    supported in dequeue loop in af_vsock.c. But the same time,  POSIX
>    requires that:
> 
> [...]

Here is the summary with links:
  - [net-next,v4,1/9] vsock: SO_RCVLOWAT transport set callback
    https://git.kernel.org/netdev/net-next/c/e38f22c860ed
  - [net-next,v4,2/9] hv_sock: disable SO_RCVLOWAT support
    https://git.kernel.org/netdev/net-next/c/24764f8d3c31
  - [net-next,v4,3/9] virtio/vsock: use 'target' in notify_poll_in callback
    https://git.kernel.org/netdev/net-next/c/e7a3266c9167
  - [net-next,v4,4/9] vmci/vsock: use 'target' in notify_poll_in callback
    https://git.kernel.org/netdev/net-next/c/a274f6ff3c5c
  - [net-next,v4,5/9] vsock: pass sock_rcvlowat to notify_poll_in as target
    https://git.kernel.org/netdev/net-next/c/ee0b3843a269
  - [net-next,v4,6/9] vsock: add API call for data ready
    https://git.kernel.org/netdev/net-next/c/f2fdcf67aceb
  - [net-next,v4,7/9] virtio/vsock: check SO_RCVLOWAT before wake up reader
    https://git.kernel.org/netdev/net-next/c/39f1ed33a448
  - [net-next,v4,8/9] vmci/vsock: check SO_RCVLOWAT before wake up reader
    https://git.kernel.org/netdev/net-next/c/e061aed99855
  - [net-next,v4,9/9] vsock_test: POLLIN + SO_RCVLOWAT test
    https://git.kernel.org/netdev/net-next/c/b1346338fbae

You are awesome, thank you!
Stefan Hajnoczi Aug. 23, 2022, 7:14 p.m. UTC | #2
On Fri, Aug 19, 2022 at 05:21:58AM +0000, Arseniy Krasnov wrote:
> Hello,
> 
> This patchset includes some updates for SO_RCVLOWAT:
> 
> 1) af_vsock:
>    During my experiments with zerocopy receive, i found, that in some
>    cases, poll() implementation violates POSIX: when socket has non-
>    default SO_RCVLOWAT(e.g. not 1), poll() will always set POLLIN and
>    POLLRDNORM bits in 'revents' even number of bytes available to read
>    on socket is smaller than SO_RCVLOWAT value. In this case,user sees
>    POLLIN flag and then tries to read data(for example using  'read()'
>    call), but read call will be blocked, because  SO_RCVLOWAT logic is
>    supported in dequeue loop in af_vsock.c. But the same time,  POSIX
>    requires that:
> 
>    "POLLIN     Data other than high-priority data may be read without
>                blocking.
>     POLLRDNORM Normal data may be read without blocking."
> 
>    See https://www.open-std.org/jtc1/sc22/open/n4217.pdf, page 293.
> 
>    So, we have, that poll() syscall returns POLLIN, but read call will
>    be blocked.
> 
>    Also in man page socket(7) i found that:
> 
>    "Since Linux 2.6.28, select(2), poll(2), and epoll(7) indicate a
>    socket as readable only if at least SO_RCVLOWAT bytes are available."
> 
>    I checked TCP callback for poll()(net/ipv4/tcp.c, tcp_poll()), it
>    uses SO_RCVLOWAT value to set POLLIN bit, also i've tested TCP with
>    this case for TCP socket, it works as POSIX required.
> 
>    I've added some fixes to af_vsock.c and virtio_transport_common.c,
>    test is also implemented.
> 
> 2) virtio/vsock:
>    It adds some optimization to wake ups, when new data arrived. Now,
>    SO_RCVLOWAT is considered before wake up sleepers who wait new data.
>    There is no sense, to kick waiter, when number of available bytes
>    in socket's queue < SO_RCVLOWAT, because if we wake up reader in
>    this case, it will wait for SO_RCVLOWAT data anyway during dequeue,
>    or in poll() case, POLLIN/POLLRDNORM bits won't be set, so such
>    exit from poll() will be "spurious". This logic is also used in TCP
>    sockets.
> 
> 3) vmci/vsock:
>    Same as 2), but i'm not sure about this changes. Will be very good,
>    to get comments from someone who knows this code.
> 
> 4) Hyper-V:
>    As Dexuan Cui mentioned, for Hyper-V transport it is difficult to
>    support SO_RCVLOWAT, so he suggested to disable this feature for
>    Hyper-V.
> 
> Thank You

Hi Arseniy,
Stefano will be online again on Monday. I suggest we wait for him to
review this series. If it's urgent, please let me know and I'll take a
look.

Thanks,
Stefan

> Arseniy Krasnov(9):
>  vsock: SO_RCVLOWAT transport set callback
>  hv_sock: disable SO_RCVLOWAT support
>  virtio/vsock: use 'target' in notify_poll_in callback
>  vmci/vsock: use 'target' in notify_poll_in callback
>  vsock: pass sock_rcvlowat to notify_poll_in as target
>  vsock: add API call for data ready
>  virtio/vsock: check SO_RCVLOWAT before wake up reader
>  vmci/vsock: check SO_RCVLOWAT before wake up reader
>  vsock_test: POLLIN + SO_RCVLOWAT test
> 
>  include/net/af_vsock.h                       |   2 +
>  net/vmw_vsock/af_vsock.c                     |  33 +++++++-
>  net/vmw_vsock/hyperv_transport.c             |   7 ++
>  net/vmw_vsock/virtio_transport_common.c      |   7 +-
>  net/vmw_vsock/vmci_transport_notify.c        |  10 +--
>  net/vmw_vsock/vmci_transport_notify_qstate.c |  12 +--
>  tools/testing/vsock/vsock_test.c             | 108 +++++++++++++++++++++++++++
>  7 files changed, 162 insertions(+), 17 deletions(-)
> 
>  Changelog:
> 
>  v1 -> v2:
>  1) Patches for VMCI transport(same as for virtio-vsock).
>  2) Patches for Hyper-V transport(disabling SO_RCVLOWAT setting).
>  3) Waiting logic in test was updated(sleep() -> poll()).
> 
>  v2 -> v3:
>  1) Patches were reordered.
>  2) Commit message updated in 0005.
>  3) Check 'transport' pointer in 0001 for NULL.
> 
>  v3 -> v4:
>  1) vsock_set_rcvlowat() logic changed. Previous version required
>     assigned transport and always called its 'set_rcvlowat' callback
>     (if present). Now, assignment is not needed.
>  2) 0003,0004,0005,0006,0007,0008 - commit messages updated.
>  3) 0009 - small refactoring and style fixes.
>  4) RFC tag was removed.
> 
> -- 
> 2.25.1
Jakub Kicinski Aug. 23, 2022, 7:18 p.m. UTC | #3
On Tue, 23 Aug 2022 15:14:10 -0400 Stefan Hajnoczi wrote:
> Stefano will be online again on Monday. I suggest we wait for him to
> review this series. If it's urgent, please let me know and I'll take a
> look.

It was already applied, sorry about that. But please continue with
review as if it wasn't. We'll just revert based on Stefano's feedback
as needed.
Stefan Hajnoczi Aug. 23, 2022, 8:30 p.m. UTC | #4
On Tue, Aug 23, 2022 at 12:18:52PM -0700, Jakub Kicinski wrote:
> On Tue, 23 Aug 2022 15:14:10 -0400 Stefan Hajnoczi wrote:
> > Stefano will be online again on Monday. I suggest we wait for him to
> > review this series. If it's urgent, please let me know and I'll take a
> > look.
> 
> It was already applied, sorry about that. But please continue with
> review as if it wasn't. We'll just revert based on Stefano's feedback
> as needed.

Okay, no problem.

Stefan
Paolo Abeni Aug. 23, 2022, 8:57 p.m. UTC | #5
On Tue, 2022-08-23 at 16:30 -0400, Stefan Hajnoczi wrote:
> On Tue, Aug 23, 2022 at 12:18:52PM -0700, Jakub Kicinski wrote:
> > On Tue, 23 Aug 2022 15:14:10 -0400 Stefan Hajnoczi wrote:
> > > Stefano will be online again on Monday. I suggest we wait for him to
> > > review this series. If it's urgent, please let me know and I'll take a
> > > look.
> > 
> > It was already applied, sorry about that. But please continue with
> > review as if it wasn't. We'll just revert based on Stefano's feedback
> > as needed.
> 
> Okay, no problem.

For the records, I applied the series since it looked to me Arseniy
addressed all the feedback from Stefano on the first patch (the only
one still lacking an acked-by/reviewed-by tag) and I thought it would
be better avoiding such delay.

We are still early in this net-next cycle, I think it can be improved
incrementally as needed.

Paolo
Arseniy Krasnov Aug. 24, 2022, 3:48 a.m. UTC | #6
On 23.08.2022 23:57, Paolo Abeni wrote:
> On Tue, 2022-08-23 at 16:30 -0400, Stefan Hajnoczi wrote:
>> On Tue, Aug 23, 2022 at 12:18:52PM -0700, Jakub Kicinski wrote:
>>> On Tue, 23 Aug 2022 15:14:10 -0400 Stefan Hajnoczi wrote:
>>>> Stefano will be online again on Monday. I suggest we wait for him to
>>>> review this series. If it's urgent, please let me know and I'll take a
>>>> look.
Hi,

sure, nothing urgent, no problem. Let's wait for Stefano! Is there something
wrong with this patchset? I've replaced RFC -> net-next since Stefano reviewed
all patches except 0001 and suggested to use net-next in v4.
>>>
>>> It was already applied, sorry about that. But please continue with
>>> review as if it wasn't. We'll just revert based on Stefano's feedback
>>> as needed.
>>
>> Okay, no problem.
> 
> For the records, I applied the series since it looked to me Arseniy
> addressed all the feedback from Stefano on the first patch (the only
> one still lacking an acked-by/reviewed-by tag) and I thought it would
> be better avoiding such delay.
Yes, only 0001 lacks reviewed-by
> 
> We are still early in this net-next cycle, I think it can be improved
> incrementally as needed.
> 
> Paolo
> 
Thank You, Arseniy
Stefano Garzarella Aug. 29, 2022, 1:52 p.m. UTC | #7
On Tue, Aug 23, 2022 at 10:57:01PM +0200, Paolo Abeni wrote:
>On Tue, 2022-08-23 at 16:30 -0400, Stefan Hajnoczi wrote:
>> On Tue, Aug 23, 2022 at 12:18:52PM -0700, Jakub Kicinski wrote:
>> > On Tue, 23 Aug 2022 15:14:10 -0400 Stefan Hajnoczi wrote:
>> > > Stefano will be online again on Monday. I suggest we wait for him to
>> > > review this series. If it's urgent, please let me know and I'll take a
>> > > look.
>> >
>> > It was already applied, sorry about that. But please continue with
>> > review as if it wasn't. We'll just revert based on Stefano's feedback
>> > as needed.

Just back, and I'm fine with this version, so thanks for merging!
I also run tests with virtio-vsock and everything is fine.

>>
>> Okay, no problem.
>
>For the records, I applied the series since it looked to me Arseniy
>addressed all the feedback from Stefano on the first patch (the only
>one still lacking an acked-by/reviewed-by tag) and I thought it would
>be better avoiding such delay.

Yep, from v3 I asked some changes on the first patch that Arseniy 
addressed in this version, and we were waiting an ack for VMCI changes 
(thanks Vishnu for giving it).

So, it should be fine.

Thanks,
Stefano