diff mbox series

[net] inet: bring NLM_DONE out to a separate recv() again

Message ID 20240411180202.399246-1-kuba@kernel.org (mailing list archive)
State Accepted
Commit 460b0d33cf10eee33de651381d3170ef13241650
Delegated to: Netdev Maintainers
Headers show
Series [net] inet: bring NLM_DONE out to a separate recv() again | expand

Checks

Context Check Description
netdev/series_format success Single patches do not need cover letters
netdev/tree_selection success Clearly marked for net
netdev/ynl success Generated files up to date; no warnings/errors; no diff in generated;
netdev/fixes_present success Fixes tag present in non-next series
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 944 this patch: 944
netdev/build_tools success No tools touched, skip
netdev/cc_maintainers success CCed 6 of 6 maintainers
netdev/build_clang success Errors and warnings before: 954 this patch: 954
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success Fixes tag looks correct
netdev/build_allmodconfig_warn success Errors and warnings before: 955 this patch: 955
netdev/checkpatch success total: 0 errors, 0 warnings, 0 checks, 11 lines checked
netdev/build_clang_rust success No Rust files in patch. Skipping build
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0
netdev/contest success net-next-2024-04-12--06-00 (tests: 962)

Commit Message

Jakub Kicinski April 11, 2024, 6:02 p.m. UTC
Commit under Fixes optimized the number of recv() calls
needed during RTM_GETROUTE dumps, but we got multiple
reports of applications hanging on recv() calls.
Applications expect that a route dump will be terminated
with a recv() reading an individual NLM_DONE message.

Coalescing NLM_DONE is perfectly legal in netlink,
but even tho reporters fixed the code in respective
projects, chances are it will take time for those
applications to get updated. So revert to old behavior
(for now)?

Old kernel (5.19):

 $ ./cli.py --dbg-small-recv 4096 --spec netlink/specs/rt_route.yaml \
            --dump getroute --json '{"rtm-family": 2}'
 Recv: read 692 bytes, 11 messages
   nl_len = 68 (52) nl_flags = 0x22 nl_type = 24
 ...
   nl_len = 60 (44) nl_flags = 0x22 nl_type = 24
 Recv: read 20 bytes, 1 messages
   nl_len = 20 (4) nl_flags = 0x2 nl_type = 3

Before (6.9-rc2):

 $ ./cli.py --dbg-small-recv 4096 --spec netlink/specs/rt_route.yaml \
            --dump getroute --json '{"rtm-family": 2}'
 Recv: read 712 bytes, 12 messages
   nl_len = 68 (52) nl_flags = 0x22 nl_type = 24
 ...
   nl_len = 60 (44) nl_flags = 0x22 nl_type = 24
   nl_len = 20 (4) nl_flags = 0x2 nl_type = 3

After:

 $ ./cli.py --dbg-small-recv 4096 --spec netlink/specs/rt_route.yaml \
            --dump getroute --json '{"rtm-family": 2}'
 Recv: read 692 bytes, 11 messages
   nl_len = 68 (52) nl_flags = 0x22 nl_type = 24
 ...
   nl_len = 60 (44) nl_flags = 0x22 nl_type = 24
 Recv: read 20 bytes, 1 messages
   nl_len = 20 (4) nl_flags = 0x2 nl_type = 3

Reported-by: Stefano Brivio <sbrivio@redhat.com>
Link: https://lore.kernel.org/all/20240315124808.033ff58d@elisabeth
Reported-by: Ilya Maximets <i.maximets@ovn.org>
Link: https://lore.kernel.org/all/02b50aae-f0e9-47a4-8365-a977a85975d3@ovn.org
Fixes: 4ce5dc9316de ("inet: switch inet_dump_fib() to RCU protection")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
---
CC: dsahern@kernel.org
CC: donald.hunter@gmail.com
---
 net/ipv4/fib_frontend.c | 5 +++++
 1 file changed, 5 insertions(+)

Comments

Eric Dumazet April 11, 2024, 6:42 p.m. UTC | #1
On Thu, Apr 11, 2024 at 8:02 PM Jakub Kicinski <kuba@kernel.org> wrote:
>
> Commit under Fixes optimized the number of recv() calls
> needed during RTM_GETROUTE dumps, but we got multiple
> reports of applications hanging on recv() calls.
> Applications expect that a route dump will be terminated
> with a recv() reading an individual NLM_DONE message.
>
> Coalescing NLM_DONE is perfectly legal in netlink,
> but even tho reporters fixed the code in respective
> projects, chances are it will take time for those
> applications to get updated. So revert to old behavior
> (for now)?
>
> Old kernel (5.19):
>

> Reported-by: Stefano Brivio <sbrivio@redhat.com>
> Link: https://lore.kernel.org/all/20240315124808.033ff58d@elisabeth
> Reported-by: Ilya Maximets <i.maximets@ovn.org>
> Link: https://lore.kernel.org/all/02b50aae-f0e9-47a4-8365-a977a85975d3@ovn.org
> Fixes: 4ce5dc9316de ("inet: switch inet_dump_fib() to RCU protection")
> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
> ---
> CC: dsahern@kernel.org
> CC: donald.hunter@gmail.com
> ---
>  net/ipv4/fib_frontend.c | 5 +++++
>  1 file changed, 5 insertions(+)
>
> diff --git a/net/ipv4/fib_frontend.c b/net/ipv4/fib_frontend.c
> index 48741352a88a..c484b1c0fc00 100644
> --- a/net/ipv4/fib_frontend.c
> +++ b/net/ipv4/fib_frontend.c
> @@ -1050,6 +1050,11 @@ static int inet_dump_fib(struct sk_buff *skb, struct netlink_callback *cb)
>                         e++;
>                 }
>         }
> +
> +       /* Don't let NLM_DONE coalesce into a message, even if it could.
> +        * Some user space expects NLM_DONE in a separate recv().
> +        */
> +       err = skb->len;

My plan was to perform this generically from netlink_dump_done().

This would still avoid calling a RTNL-enabled-dump() again if EOF has
been met already.

A sysctl could opt-in for the coalescing, if there is interest.

diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c
index dc8c3c01d51b709c132ff63a0c534c1cc286589a..cad1124393ac74f3d5bfa86556ed9028f5ec8f65
100644
--- a/net/netlink/af_netlink.c
+++ b/net/netlink/af_netlink.c
@@ -2282,7 +2282,12 @@ static int netlink_dump(struct sock *sk, bool lock_taken)
                cb->extack = NULL;
        }

+       /* Don't let NLM_DONE coalesce into a message, even if it could.
+        * Some user space expects NLM_DONE in a separate recv().
+        * Maybe opt-in this coalescing with a sysctl or socket option ?
+        */
        if (nlk->dump_done_errno > 0 ||
+           skb->len ||
            skb_tailroom(skb) <
nlmsg_total_size(sizeof(nlk->dump_done_errno))) {
                mutex_unlock(&nlk->nl_cb_mutex);
Jakub Kicinski April 11, 2024, 6:57 p.m. UTC | #2
On Thu, 11 Apr 2024 20:42:45 +0200 Eric Dumazet wrote:
> My plan was to perform this generically from netlink_dump_done().
> 
> This would still avoid calling a RTNL-enabled-dump() again if EOF has
> been met already.
> 
> A sysctl could opt-in for the coalescing, if there is interest.

I'd prefer to keep hacks contained to the few places that need them.
Netlink is full of strange quirks, I don't think this one deserves
a sysctl.
Ilya Maximets April 11, 2024, 7:35 p.m. UTC | #3
On 4/11/24 20:02, Jakub Kicinski wrote:
> Commit under Fixes optimized the number of recv() calls
> needed during RTM_GETROUTE dumps, but we got multiple
> reports of applications hanging on recv() calls.
> Applications expect that a route dump will be terminated
> with a recv() reading an individual NLM_DONE message.
> 
> Coalescing NLM_DONE is perfectly legal in netlink,
> but even tho reporters fixed the code in respective
> projects, chances are it will take time for those
> applications to get updated. So revert to old behavior
> (for now)?
> 
> Old kernel (5.19):
> 
>  $ ./cli.py --dbg-small-recv 4096 --spec netlink/specs/rt_route.yaml \
>             --dump getroute --json '{"rtm-family": 2}'
>  Recv: read 692 bytes, 11 messages
>    nl_len = 68 (52) nl_flags = 0x22 nl_type = 24
>  ...
>    nl_len = 60 (44) nl_flags = 0x22 nl_type = 24
>  Recv: read 20 bytes, 1 messages
>    nl_len = 20 (4) nl_flags = 0x2 nl_type = 3
> 
> Before (6.9-rc2):
> 
>  $ ./cli.py --dbg-small-recv 4096 --spec netlink/specs/rt_route.yaml \
>             --dump getroute --json '{"rtm-family": 2}'
>  Recv: read 712 bytes, 12 messages
>    nl_len = 68 (52) nl_flags = 0x22 nl_type = 24
>  ...
>    nl_len = 60 (44) nl_flags = 0x22 nl_type = 24
>    nl_len = 20 (4) nl_flags = 0x2 nl_type = 3
> 
> After:
> 
>  $ ./cli.py --dbg-small-recv 4096 --spec netlink/specs/rt_route.yaml \
>             --dump getroute --json '{"rtm-family": 2}'
>  Recv: read 692 bytes, 11 messages
>    nl_len = 68 (52) nl_flags = 0x22 nl_type = 24
>  ...
>    nl_len = 60 (44) nl_flags = 0x22 nl_type = 24
>  Recv: read 20 bytes, 1 messages
>    nl_len = 20 (4) nl_flags = 0x2 nl_type = 3
> 
> Reported-by: Stefano Brivio <sbrivio@redhat.com>
> Link: https://lore.kernel.org/all/20240315124808.033ff58d@elisabeth
> Reported-by: Ilya Maximets <i.maximets@ovn.org>
> Link: https://lore.kernel.org/all/02b50aae-f0e9-47a4-8365-a977a85975d3@ovn.org
> Fixes: 4ce5dc9316de ("inet: switch inet_dump_fib() to RCU protection")
> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
> ---
> CC: dsahern@kernel.org
> CC: donald.hunter@gmail.com
> ---
>  net/ipv4/fib_frontend.c | 5 +++++
>  1 file changed, 5 insertions(+)
> 
> diff --git a/net/ipv4/fib_frontend.c b/net/ipv4/fib_frontend.c
> index 48741352a88a..c484b1c0fc00 100644
> --- a/net/ipv4/fib_frontend.c
> +++ b/net/ipv4/fib_frontend.c
> @@ -1050,6 +1050,11 @@ static int inet_dump_fib(struct sk_buff *skb, struct netlink_callback *cb)
>  			e++;
>  		}
>  	}
> +
> +	/* Don't let NLM_DONE coalesce into a message, even if it could.
> +	 * Some user space expects NLM_DONE in a separate recv().
> +	 */
> +	err = skb->len;
>  out:
>  
>  	cb->args[1] = e;

FWIW, on current net-next this fixes the issue with Libreswan and IPv4
(IPv6 issue remains, obviously).  I also did a round of other OVS system
tests and they worked fine.

Tested-by: Ilya Maximets <i.maximets@ovn.org>
David Ahern April 11, 2024, 7:45 p.m. UTC | #4
On 4/11/24 12:02 PM, Jakub Kicinski wrote:
> Commit under Fixes optimized the number of recv() calls
> needed during RTM_GETROUTE dumps, but we got multiple
> reports of applications hanging on recv() calls.
> Applications expect that a route dump will be terminated
> with a recv() reading an individual NLM_DONE message.
> 
> Coalescing NLM_DONE is perfectly legal in netlink,
> but even tho reporters fixed the code in respective
> projects, chances are it will take time for those
> applications to get updated. So revert to old behavior
> (for now)?
> 
> Old kernel (5.19):
> 
>  $ ./cli.py --dbg-small-recv 4096 --spec netlink/specs/rt_route.yaml \
>             --dump getroute --json '{"rtm-family": 2}'
>  Recv: read 692 bytes, 11 messages
>    nl_len = 68 (52) nl_flags = 0x22 nl_type = 24
>  ...
>    nl_len = 60 (44) nl_flags = 0x22 nl_type = 24
>  Recv: read 20 bytes, 1 messages
>    nl_len = 20 (4) nl_flags = 0x2 nl_type = 3
> 
> Before (6.9-rc2):
> 
>  $ ./cli.py --dbg-small-recv 4096 --spec netlink/specs/rt_route.yaml \
>             --dump getroute --json '{"rtm-family": 2}'
>  Recv: read 712 bytes, 12 messages
>    nl_len = 68 (52) nl_flags = 0x22 nl_type = 24
>  ...
>    nl_len = 60 (44) nl_flags = 0x22 nl_type = 24
>    nl_len = 20 (4) nl_flags = 0x2 nl_type = 3
> 
> After:
> 
>  $ ./cli.py --dbg-small-recv 4096 --spec netlink/specs/rt_route.yaml \
>             --dump getroute --json '{"rtm-family": 2}'
>  Recv: read 692 bytes, 11 messages
>    nl_len = 68 (52) nl_flags = 0x22 nl_type = 24
>  ...
>    nl_len = 60 (44) nl_flags = 0x22 nl_type = 24
>  Recv: read 20 bytes, 1 messages
>    nl_len = 20 (4) nl_flags = 0x2 nl_type = 3
> 
> Reported-by: Stefano Brivio <sbrivio@redhat.com>
> Link: https://lore.kernel.org/all/20240315124808.033ff58d@elisabeth
> Reported-by: Ilya Maximets <i.maximets@ovn.org>
> Link: https://lore.kernel.org/all/02b50aae-f0e9-47a4-8365-a977a85975d3@ovn.org
> Fixes: 4ce5dc9316de ("inet: switch inet_dump_fib() to RCU protection")
> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
> ---
> CC: dsahern@kernel.org
> CC: donald.hunter@gmail.com
> ---
>  net/ipv4/fib_frontend.c | 5 +++++
>  1 file changed, 5 insertions(+)
> 
> diff --git a/net/ipv4/fib_frontend.c b/net/ipv4/fib_frontend.c
> index 48741352a88a..c484b1c0fc00 100644
> --- a/net/ipv4/fib_frontend.c
> +++ b/net/ipv4/fib_frontend.c
> @@ -1050,6 +1050,11 @@ static int inet_dump_fib(struct sk_buff *skb, struct netlink_callback *cb)
>  			e++;
>  		}
>  	}
> +
> +	/* Don't let NLM_DONE coalesce into a message, even if it could.
> +	 * Some user space expects NLM_DONE in a separate recv().

that's unfortunate

> +	 */
> +	err = skb->len;
>  out:
>  
>  	cb->args[1] = e;


Reviewed-by: David Ahern <dsahern@kernel.org>
Jakub Kicinski April 11, 2024, 9:01 p.m. UTC | #5
On Thu, 11 Apr 2024 13:45:42 -0600 David Ahern wrote:
> > +	/* Don't let NLM_DONE coalesce into a message, even if it could.
> > +	 * Some user space expects NLM_DONE in a separate recv().  
> 
> that's unfortunate

Do you have an opinion on the sysfs/opt-in question?
Feels to me like there shouldn't be that much user space doing raw
netlink, without a library. Old crufty code usually does ioctls, right?
So maybe we can periodically reintroduce this bug to shake out all 
the bad apps? :D
David Ahern April 11, 2024, 9:14 p.m. UTC | #6
On 4/11/24 3:01 PM, Jakub Kicinski wrote:
> On Thu, 11 Apr 2024 13:45:42 -0600 David Ahern wrote:
>>> +	/* Don't let NLM_DONE coalesce into a message, even if it could.
>>> +	 * Some user space expects NLM_DONE in a separate recv().  
>>
>> that's unfortunate
> 
> Do you have an opinion on the sysfs/opt-in question?

Not a performance critical path, so I would not add it right now.

> Feels to me like there shouldn't be that much user space doing raw
> netlink, without a library. Old crufty code usually does ioctls, right?
> So maybe we can periodically reintroduce this bug to shake out all 
> the bad apps? :D

:-) On the one hand, yea, push apps to improve. On the other, Donald
(FRR) is one who complains about nightmares chasing nuances across kernels.
Stefano Brivio April 12, 2024, 5:22 p.m. UTC | #7
On Thu, 11 Apr 2024 14:01:54 -0700
Jakub Kicinski <kuba@kernel.org> wrote:

> On Thu, 11 Apr 2024 13:45:42 -0600 David Ahern wrote:
> > > +	/* Don't let NLM_DONE coalesce into a message, even if it could.
> > > +	 * Some user space expects NLM_DONE in a separate recv().    
> > 
> > that's unfortunate  
> 
> Do you have an opinion on the sysfs/opt-in question?
> Feels to me like there shouldn't be that much user space doing raw
> netlink, without a library. Old crufty code usually does ioctls, right?

I think so too -- if there were more (maintained) applications with
this issue, we would have noticed by now.

> So maybe we can periodically reintroduce this bug to shake out all 
> the bad apps? :D

Actually, I had half a mind of proposing something on these lines: add
a TODO comment here and revisit in, say, two years.

I guess it's definitely more painful for libreswan, but for passt, I
think it's quite unlikely that distribution users could get the
"breaking" kernel change without a fixed version of the application: we
made a new release relatively close to the NLM_DONE change.

There might be substantial value in keeping this type of short netlink
exchanges fast, for example for container engines that need to be able
to spawn a bazillion containers per second.
Ilya Maximets April 12, 2024, 5:38 p.m. UTC | #8
On 4/12/24 19:22, Stefano Brivio wrote:
> On Thu, 11 Apr 2024 14:01:54 -0700
> Jakub Kicinski <kuba@kernel.org> wrote:
> 
>> On Thu, 11 Apr 2024 13:45:42 -0600 David Ahern wrote:
>>>> +	/* Don't let NLM_DONE coalesce into a message, even if it could.
>>>> +	 * Some user space expects NLM_DONE in a separate recv().    
>>>
>>> that's unfortunate  
>>
>> Do you have an opinion on the sysfs/opt-in question?
>> Feels to me like there shouldn't be that much user space doing raw
>> netlink, without a library. Old crufty code usually does ioctls, right?
> 
> I think so too -- if there were more (maintained) applications with
> this issue, we would have noticed by now.

It depends on how you define "maintained".  Most application devs
do not test with unreleased kernels.

> 
>> So maybe we can periodically reintroduce this bug to shake out all 
>> the bad apps? :D
> 
> Actually, I had half a mind of proposing something on these lines: add
> a TODO comment here and revisit in, say, two years.
> 
> I guess it's definitely more painful for libreswan, but for passt, I
> think it's quite unlikely that distribution users could get the
> "breaking" kernel change without a fixed version of the application: we
> made a new release relatively close to the NLM_DONE change.
> 
> There might be substantial value in keeping this type of short netlink
> exchanges fast, for example for container engines that need to be able
> to spawn a bazillion containers per second.
>
Stefano Brivio April 12, 2024, 6:03 p.m. UTC | #9
On Fri, 12 Apr 2024 19:38:53 +0200
Ilya Maximets <i.maximets@ovn.org> wrote:

> On 4/12/24 19:22, Stefano Brivio wrote:
> > On Thu, 11 Apr 2024 14:01:54 -0700
> > Jakub Kicinski <kuba@kernel.org> wrote:
> >   
> >> On Thu, 11 Apr 2024 13:45:42 -0600 David Ahern wrote:  
> >>>> +	/* Don't let NLM_DONE coalesce into a message, even if it could.
> >>>> +	 * Some user space expects NLM_DONE in a separate recv().      
> >>>
> >>> that's unfortunate    
> >>
> >> Do you have an opinion on the sysfs/opt-in question?
> >> Feels to me like there shouldn't be that much user space doing raw
> >> netlink, without a library. Old crufty code usually does ioctls, right?  
> > 
> > I think so too -- if there were more (maintained) applications with
> > this issue, we would have noticed by now.  
> 
> It depends on how you define "maintained".  Most application devs
> do not test with unreleased kernels.

I haven't, either, but users started shouting: we have nowadays plenty
of distributions shipping unreleased kernels.
Ilya Maximets April 12, 2024, 6:22 p.m. UTC | #10
On 4/12/24 20:03, Stefano Brivio wrote:
> On Fri, 12 Apr 2024 19:38:53 +0200
> Ilya Maximets <i.maximets@ovn.org> wrote:
> 
>> On 4/12/24 19:22, Stefano Brivio wrote:
>>> On Thu, 11 Apr 2024 14:01:54 -0700
>>> Jakub Kicinski <kuba@kernel.org> wrote:
>>>   
>>>> On Thu, 11 Apr 2024 13:45:42 -0600 David Ahern wrote:  
>>>>>> +	/* Don't let NLM_DONE coalesce into a message, even if it could.
>>>>>> +	 * Some user space expects NLM_DONE in a separate recv().      
>>>>>
>>>>> that's unfortunate    
>>>>
>>>> Do you have an opinion on the sysfs/opt-in question?
>>>> Feels to me like there shouldn't be that much user space doing raw
>>>> netlink, without a library. Old crufty code usually does ioctls, right?  
>>>
>>> I think so too -- if there were more (maintained) applications with
>>> this issue, we would have noticed by now.  
>>
>> It depends on how you define "maintained".  Most application devs
>> do not test with unreleased kernels.
> 
> I haven't, either, but users started shouting: we have nowadays plenty
> of distributions shipping unreleased kernels.
> 

In OVS we typically don't get any bug reports from users until the
issue hits a major distribution like RHEL or Ubuntu.

Slightly different crowd, I guess. :)

Best regards, Ilya Maximets.
patchwork-bot+netdevbpf@kernel.org April 15, 2024, 9:30 a.m. UTC | #11
Hello:

This patch was applied to netdev/net.git (main)
by David S. Miller <davem@davemloft.net>:

On Thu, 11 Apr 2024 11:02:02 -0700 you wrote:
> Commit under Fixes optimized the number of recv() calls
> needed during RTM_GETROUTE dumps, but we got multiple
> reports of applications hanging on recv() calls.
> Applications expect that a route dump will be terminated
> with a recv() reading an individual NLM_DONE message.
> 
> Coalescing NLM_DONE is perfectly legal in netlink,
> but even tho reporters fixed the code in respective
> projects, chances are it will take time for those
> applications to get updated. So revert to old behavior
> (for now)?
> 
> [...]

Here is the summary with links:
  - [net] inet: bring NLM_DONE out to a separate recv() again
    https://git.kernel.org/netdev/net/c/460b0d33cf10

You are awesome, thank you!
Ilya Maximets June 17, 2024, 3:09 p.m. UTC | #12
On 4/11/24 21:35, Ilya Maximets wrote:
> On 4/11/24 20:02, Jakub Kicinski wrote:
>> Commit under Fixes optimized the number of recv() calls
>> needed during RTM_GETROUTE dumps, but we got multiple
>> reports of applications hanging on recv() calls.
>> Applications expect that a route dump will be terminated
>> with a recv() reading an individual NLM_DONE message.
>>
>> Coalescing NLM_DONE is perfectly legal in netlink,
>> but even tho reporters fixed the code in respective
>> projects, chances are it will take time for those
>> applications to get updated. So revert to old behavior
>> (for now)?
>>
>> Old kernel (5.19):
>>
>>  $ ./cli.py --dbg-small-recv 4096 --spec netlink/specs/rt_route.yaml \
>>             --dump getroute --json '{"rtm-family": 2}'
>>  Recv: read 692 bytes, 11 messages
>>    nl_len = 68 (52) nl_flags = 0x22 nl_type = 24
>>  ...
>>    nl_len = 60 (44) nl_flags = 0x22 nl_type = 24
>>  Recv: read 20 bytes, 1 messages
>>    nl_len = 20 (4) nl_flags = 0x2 nl_type = 3
>>
>> Before (6.9-rc2):
>>
>>  $ ./cli.py --dbg-small-recv 4096 --spec netlink/specs/rt_route.yaml \
>>             --dump getroute --json '{"rtm-family": 2}'
>>  Recv: read 712 bytes, 12 messages
>>    nl_len = 68 (52) nl_flags = 0x22 nl_type = 24
>>  ...
>>    nl_len = 60 (44) nl_flags = 0x22 nl_type = 24
>>    nl_len = 20 (4) nl_flags = 0x2 nl_type = 3
>>
>> After:
>>
>>  $ ./cli.py --dbg-small-recv 4096 --spec netlink/specs/rt_route.yaml \
>>             --dump getroute --json '{"rtm-family": 2}'
>>  Recv: read 692 bytes, 11 messages
>>    nl_len = 68 (52) nl_flags = 0x22 nl_type = 24
>>  ...
>>    nl_len = 60 (44) nl_flags = 0x22 nl_type = 24
>>  Recv: read 20 bytes, 1 messages
>>    nl_len = 20 (4) nl_flags = 0x2 nl_type = 3
>>
>> Reported-by: Stefano Brivio <sbrivio@redhat.com>
>> Link: https://lore.kernel.org/all/20240315124808.033ff58d@elisabeth
>> Reported-by: Ilya Maximets <i.maximets@ovn.org>
>> Link: https://lore.kernel.org/all/02b50aae-f0e9-47a4-8365-a977a85975d3@ovn.org
>> Fixes: 4ce5dc9316de ("inet: switch inet_dump_fib() to RCU protection")
>> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
>> ---
>> CC: dsahern@kernel.org
>> CC: donald.hunter@gmail.com
>> ---
>>  net/ipv4/fib_frontend.c | 5 +++++
>>  1 file changed, 5 insertions(+)
>>
>> diff --git a/net/ipv4/fib_frontend.c b/net/ipv4/fib_frontend.c
>> index 48741352a88a..c484b1c0fc00 100644
>> --- a/net/ipv4/fib_frontend.c
>> +++ b/net/ipv4/fib_frontend.c
>> @@ -1050,6 +1050,11 @@ static int inet_dump_fib(struct sk_buff *skb, struct netlink_callback *cb)
>>  			e++;
>>  		}
>>  	}
>> +
>> +	/* Don't let NLM_DONE coalesce into a message, even if it could.
>> +	 * Some user space expects NLM_DONE in a separate recv().
>> +	 */
>> +	err = skb->len;
>>  out:
>>  
>>  	cb->args[1] = e;
> 
> FWIW, on current net-next this fixes the issue with Libreswan and IPv4
> (IPv6 issue remains, obviously).  I also did a round of other OVS system
> tests and they worked fine.
> 
> Tested-by: Ilya Maximets <i.maximets@ovn.org>

Hi, Jakub.  Now that IPv6 change is in 6.10-rc, do you plan to submit a similar
fix for it as well?  (Sorry if I missed it.)  Libreswan is getting stuck on IPv6
route lookups with 6.10-rc4.

Note: Libreswan fixed the issue on their main branch, but it is not available in
any release yet, and I'm not sure if the fix is going to make it into stable
releases.

Best regards, Ilya Maximets.
Jakub Kicinski June 17, 2024, 4:36 p.m. UTC | #13
On Mon, 17 Jun 2024 17:09:44 +0200 Ilya Maximets wrote:
> > FWIW, on current net-next this fixes the issue with Libreswan and IPv4
> > (IPv6 issue remains, obviously).  I also did a round of other OVS system
> > tests and they worked fine.
> > 
> > Tested-by: Ilya Maximets <i.maximets@ovn.org>  
> 
> Hi, Jakub.  Now that IPv6 change is in 6.10-rc, do you plan to submit a similar
> fix for it as well?  (Sorry if I missed it.)  Libreswan is getting stuck on IPv6
> route lookups with 6.10-rc4.
> 
> Note: Libreswan fixed the issue on their main branch, but it is not available in
> any release yet, and I'm not sure if the fix is going to make it into stable
> releases.

Sorry for the delay, we'll get it resolved before this week's PR.
Ilya Maximets June 17, 2024, 5:05 p.m. UTC | #14
On 6/17/24 18:36, Jakub Kicinski wrote:
> On Mon, 17 Jun 2024 17:09:44 +0200 Ilya Maximets wrote:
>>> FWIW, on current net-next this fixes the issue with Libreswan and IPv4
>>> (IPv6 issue remains, obviously).  I also did a round of other OVS system
>>> tests and they worked fine.
>>>
>>> Tested-by: Ilya Maximets <i.maximets@ovn.org>  
>>
>> Hi, Jakub.  Now that IPv6 change is in 6.10-rc, do you plan to submit a similar
>> fix for it as well?  (Sorry if I missed it.)  Libreswan is getting stuck on IPv6
>> route lookups with 6.10-rc4.
>>
>> Note: Libreswan fixed the issue on their main branch, but it is not available in
>> any release yet, and I'm not sure if the fix is going to make it into stable
>> releases.
> 
> Sorry for the delay, we'll get it resolved before this week's PR.

Ack.  Thanks!

Best regards, Ilya Maximets.
diff mbox series

Patch

diff --git a/net/ipv4/fib_frontend.c b/net/ipv4/fib_frontend.c
index 48741352a88a..c484b1c0fc00 100644
--- a/net/ipv4/fib_frontend.c
+++ b/net/ipv4/fib_frontend.c
@@ -1050,6 +1050,11 @@  static int inet_dump_fib(struct sk_buff *skb, struct netlink_callback *cb)
 			e++;
 		}
 	}
+
+	/* Don't let NLM_DONE coalesce into a message, even if it could.
+	 * Some user space expects NLM_DONE in a separate recv().
+	 */
+	err = skb->len;
 out:
 
 	cb->args[1] = e;