Message ID | 20240601212517.644844-1-kuba@kernel.org (mailing list archive) |
---|---|
State | Superseded |
Delegated to: | Netdev Maintainers |
Headers | show |
Series | [net] inet: bring NLM_DONE out to a separate recv() in inet_dump_ifaddr() | expand |
On Sat, 1 Jun 2024 14:25:17 -0700 Jakub Kicinski <kuba@kernel.org> wrote: > + * When used in combination with for_each_netdev_dump() - make sure to > + * invalidate the ifindex when iteration is done. for_each_netdev_dump() > + * does not move the iterator index "after" the last valid entry. > + * > + * NOTE: Do not use this helper for dumps without known legacy users! > + * Most families are accessed only using well-written libraries > + * so starting to coalesce NLM_DONE is perfectly fine, and more efficient. Sorry, I disagree. You can't just fix the problem areas. The split was an ABI change, and there could be a problem in any dump. This the ABI version of the old argument If a tree falls in a forest and no one is around to hear it, does it make a sound? All dumps must behave the same. You are stuck with the legacy behavior.
On Sat, 1 Jun 2024 16:10:13 -0700 Stephen Hemminger wrote: > Sorry, I disagree. > > You can't just fix the problem areas. The split was an ABI change, and there could > be a problem in any dump. This the ABI version of the old argument > If a tree falls in a forest and no one is around to hear it, does it make a sound? > > All dumps must behave the same. You are stuck with the legacy behavior. The dump partitioning is up to the family. Multiple families coalesce NLM_DONE from day 1. "All dumps must behave the same" is saying we should convert all families to be poorly behaved. Admittedly changing the most heavily used parts of rtnetlink is very risky. And there's couple more corner cases which I'm afraid someone will hit. I'm adding this helper to clearly annotate "legacy" callbacks, so we don't regress again. At the same time nobody should use this in new code or "just to be safe" (read: because they don't understand netlink).
On 6/1/24 5:48 PM, Jakub Kicinski wrote: > On Sat, 1 Jun 2024 16:10:13 -0700 Stephen Hemminger wrote: >> Sorry, I disagree. >> >> You can't just fix the problem areas. The split was an ABI change, and there could >> be a problem in any dump. This the ABI version of the old argument >> If a tree falls in a forest and no one is around to hear it, does it make a sound? >> >> All dumps must behave the same. You are stuck with the legacy behavior. I don't agree with such a hard line stance. Mistakes made 20 years ago cannot hold Linux back from moving forward. We have to continue searching for ways to allow better or more performant behavior. > > The dump partitioning is up to the family. Multiple families > coalesce NLM_DONE from day 1. "All dumps must behave the same" > is saying we should convert all families to be poorly behaved. > > Admittedly changing the most heavily used parts of rtnetlink is very > risky. And there's couple more corner cases which I'm afraid someone > will hit. I'm adding this helper to clearly annotate "legacy" > callbacks, so we don't regress again. At the same time nobody should > use this in new code or "just to be safe" (read: because they don't > understand netlink). What about a socket option that says "I am a modern app and can handle the new way" - similar to the strict mode option that was added? Then the decision of requiring a separate message for NLM_DONE can be based on the app. Could even throw a `pr_warn_once("modernize app %s/%d\n")` to help old apps understand they need to move forward.
On Sun, Jun 2, 2024 at 4:23 AM David Ahern <dsahern@kernel.org> wrote: > > On 6/1/24 5:48 PM, Jakub Kicinski wrote: > > On Sat, 1 Jun 2024 16:10:13 -0700 Stephen Hemminger wrote: > >> Sorry, I disagree. > >> > >> You can't just fix the problem areas. The split was an ABI change, and there could > >> be a problem in any dump. This the ABI version of the old argument > >> If a tree falls in a forest and no one is around to hear it, does it make a sound? > >> > >> All dumps must behave the same. You are stuck with the legacy behavior. > > I don't agree with such a hard line stance. Mistakes made 20 years ago > cannot hold Linux back from moving forward. We have to continue > searching for ways to allow better or more performant behavior. > > > > > The dump partitioning is up to the family. Multiple families > > coalesce NLM_DONE from day 1. "All dumps must behave the same" > > is saying we should convert all families to be poorly behaved. > > > > Admittedly changing the most heavily used parts of rtnetlink is very > > risky. And there's couple more corner cases which I'm afraid someone > > will hit. I'm adding this helper to clearly annotate "legacy" > > callbacks, so we don't regress again. At the same time nobody should > > use this in new code or "just to be safe" (read: because they don't > > understand netlink). > > What about a socket option that says "I am a modern app and can handle > the new way" - similar to the strict mode option that was added? Then > the decision of requiring a separate message for NLM_DONE can be based > on the app. Could even throw a `pr_warn_once("modernize app %s/%d\n")` > to help old apps understand they need to move forward. Main motivation for me was to avoid re-grabbing RTNL again just to get NLM_DONE. The avoidance of the two system calls was really secondary. I think we could make a generic change in netlink_dump() to force NLM_DONE in an empty message _and_ avoiding a useless call to the dump method, which might still use RTNL or other contended mutex. In a prior feedback I suggested a sysctl that Jakub disliked, but really we do not care yet, as long as we avoid RTNL as much as we can. Jakub, what about the following generic change, instead of ad-hoc changes ? I tested it, I can send it with the minimal change (the alloc skb optim will reach net-next) diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c index fa9c090cf629e6e92c097285b262ed90324c7656..0a58e5d13b8e68dd3fbb2b3fb362c3399fa29909 100644 --- a/net/netlink/af_netlink.c +++ b/net/netlink/af_netlink.c @@ -2289,15 +2289,20 @@ static int netlink_dump(struct sock *sk, bool lock_taken) * ever provided a big enough buffer. */ cb = &nlk->cb; - alloc_min_size = max_t(int, cb->min_dump_alloc, NLMSG_GOODSIZE); - - max_recvmsg_len = READ_ONCE(nlk->max_recvmsg_len); - if (alloc_min_size < max_recvmsg_len) { - alloc_size = max_recvmsg_len; - skb = alloc_skb(alloc_size, + if (nlk->dump_done_errno) { + alloc_min_size = max_t(int, cb->min_dump_alloc, NLMSG_GOODSIZE); + max_recvmsg_len = READ_ONCE(nlk->max_recvmsg_len); + if (alloc_min_size < max_recvmsg_len) { + alloc_size = max_recvmsg_len; + skb = alloc_skb(alloc_size, (GFP_KERNEL & ~__GFP_DIRECT_RECLAIM) | __GFP_NOWARN | __GFP_NORETRY); + } + } else { + /* Allocate the space needed for NLMSG_DONE alone. */ + alloc_min_size = nlmsg_total_size(sizeof(nlk->dump_done_errno)); } + if (!skb) { alloc_size = alloc_min_size; skb = alloc_skb(alloc_size, GFP_KERNEL); @@ -2350,8 +2355,7 @@ static int netlink_dump(struct sock *sk, bool lock_taken) cb->extack = NULL; } - if (nlk->dump_done_errno > 0 || - skb_tailroom(skb) < nlmsg_total_size(sizeof(nlk->dump_done_errno))) { + if (skb->len) { mutex_unlock(&nlk->nl_cb_mutex); if (sk_filter(sk, skb))
On Sat, 1 Jun 2024 20:23:17 -0600 David Ahern wrote: > > The dump partitioning is up to the family. Multiple families > > coalesce NLM_DONE from day 1. "All dumps must behave the same" > > is saying we should convert all families to be poorly behaved. > > > > Admittedly changing the most heavily used parts of rtnetlink is very > > risky. And there's couple more corner cases which I'm afraid someone > > will hit. I'm adding this helper to clearly annotate "legacy" > > callbacks, so we don't regress again. At the same time nobody should > > use this in new code or "just to be safe" (read: because they don't > > understand netlink). > > What about a socket option that says "I am a modern app and can handle > the new way" - similar to the strict mode option that was added? Then > the decision of requiring a separate message for NLM_DONE can be based > on the app. That seems like a good solution, with the helper marking the "legacy" handlers - I hope it should be trivial to add such option and change the helper's behavior based on the socket state. > Could even throw a `pr_warn_once("modernize app %s/%d\n")` > to help old apps understand they need to move forward. Hm, do you think people would actually modernize all the legacy apps? Coincidentally, looking at Jaroslav's traces it appears that the app sets ifindex for the link dump, so it must not be opting into strict checking, either.
On Sun, 2 Jun 2024 12:00:09 +0200 Eric Dumazet wrote: > Main motivation for me was to avoid re-grabbing RTNL again just to get NLM_DONE. > > The avoidance of the two system calls was really secondary. > > I think we could make a generic change in netlink_dump() to force NLM_DONE > in an empty message _and_ avoiding a useless call to the dump method, which > might still use RTNL or other contended mutex. > > In a prior feedback I suggested a sysctl that Jakub disliked, > but really we do not care yet, as long as we avoid RTNL as much as we can. > > Jakub, what about the following generic change, instead of ad-hoc changes ? Netlink is full of legacy behavior, the only way to make it usable in modern environments is to let new families not repeat the mistakes. That's why I'd really rather not add a workaround at the af_netlink level. Why would ethtool (which correctly coalesced NLM_DONE from day 1) suddenly start needed another recv(). A lot of the time the entire dump fits in one skb. If you prefer to sacrifice all of rtnetlink (some of which, to be clear, has also been correctly coded from day 1) - we can add a trampoline for rtnetlink dump handlers? (just to illustrate, not even compile tested) diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c index 522bbd70c205..c59b39ee544f 100644 --- a/net/core/rtnetlink.c +++ b/net/core/rtnetlink.c @@ -6487,6 +6487,18 @@ static int rtnl_mdb_del(struct sk_buff *skb, struct nlmsghdr *nlh, /* Process one rtnetlink message. */ +static int rtnl_dumpit(struct sk_buff *skb, struct netlink_callback *cb) +{ + rtnl_dumpit_func dumpit = cb->data; + int err; + + err = dumpit(skb, cb); + if (err < 0 && err != -EMSGSIZE) + return err; + + return skb->len; +} + static int rtnetlink_rcv_msg(struct sk_buff *skb, struct nlmsghdr *nlh, struct netlink_ext_ack *extack) { @@ -6546,10 +6558,11 @@ static int rtnetlink_rcv_msg(struct sk_buff *skb, struct nlmsghdr *nlh, rtnl = net->rtnl; if (err == 0) { struct netlink_dump_control c = { - .dump = dumpit, + .dump = rtnl_dumpit, .min_dump_alloc = min_dump_alloc, .module = owner, .flags = flags, + .data = dumpit, }; err = netlink_dump_start(rtnl, skb, nlh, &c); /* netlink_dump_start() will keep a reference on
On 6/2/24 3:59 PM, Jakub Kicinski wrote: > On Sat, 1 Jun 2024 20:23:17 -0600 David Ahern wrote: >>> The dump partitioning is up to the family. Multiple families >>> coalesce NLM_DONE from day 1. "All dumps must behave the same" >>> is saying we should convert all families to be poorly behaved. >>> >>> Admittedly changing the most heavily used parts of rtnetlink is very >>> risky. And there's couple more corner cases which I'm afraid someone >>> will hit. I'm adding this helper to clearly annotate "legacy" >>> callbacks, so we don't regress again. At the same time nobody should >>> use this in new code or "just to be safe" (read: because they don't >>> understand netlink). >> >> What about a socket option that says "I am a modern app and can handle >> the new way" - similar to the strict mode option that was added? Then >> the decision of requiring a separate message for NLM_DONE can be based >> on the app. > > That seems like a good solution, with the helper marking the "legacy" > handlers - I hope it should be trivial to add such option and change > the helper's behavior based on the socket state. > >> Could even throw a `pr_warn_once("modernize app %s/%d\n")` >> to help old apps understand they need to move forward. > > Hm, do you think people would actually modernize all the legacy apps? I have worked for a few companies that do monitor dmesg and when given the right push will update apps. Best an OS can do. > > Coincidentally, looking at Jaroslav's traces it appears that the app > sets ifindex for the link dump, so it must not be opting into strict > checking, either. :-( I should have added a warning back when the option was introduced - that and a warning when options to a dump are ignored.
On Sun, 2 Jun 2024 15:21:02 -0700 Jakub Kicinski wrote: > Netlink is full of legacy behavior, the only way to make it usable > in modern environments is to let new families not repeat the mistakes. > That's why I'd really rather not add a workaround at the af_netlink > level. Why would ethtool (which correctly coalesced NLM_DONE from day 1) > suddenly start needed another recv(). A lot of the time the entire dump > fits in one skb. > > If you prefer to sacrifice all of rtnetlink (some of which, to be clear, > has also been correctly coded from day 1) - we can add a trampoline for > rtnetlink dump handlers? Hi Eric, how do you feel about this approach? It would also let us extract the "RTNL unlocked dump" handling from af_netlink.c, which would be nice. BTW it will probably need to be paired with fixing the for_each_netdev_dump() foot gun, maybe (untested): --- a/include/linux/netdevice.h +++ b/include/linux/netdevice.h @@ -3025,7 +3025,8 @@ int call_netdevice_notifiers_info(unsigned long val, #define net_device_entry(lh) list_entry(lh, struct net_device, dev_list) #define for_each_netdev_dump(net, d, ifindex) \ - xa_for_each_start(&(net)->dev_by_index, (ifindex), (d), (ifindex)) + for (; (d = xa_find(&(net)->dev_by_index, &ifindex, \ + ULONG_MAX, XA_PRESENT)); ifindex++) static inline struct net_device *next_net_device(struct net_device *dev) {
On Sat, Jun 1, 2024 at 10:23 PM David Ahern <dsahern@kernel.org> wrote: > > On 6/1/24 5:48 PM, Jakub Kicinski wrote: > > On Sat, 1 Jun 2024 16:10:13 -0700 Stephen Hemminger wrote: > >> Sorry, I disagree. > >> > >> You can't just fix the problem areas. The split was an ABI change, and there could > >> be a problem in any dump. This the ABI version of the old argument > >> If a tree falls in a forest and no one is around to hear it, does it make a sound? > >> > >> All dumps must behave the same. You are stuck with the legacy behavior. > > I don't agree with such a hard line stance. Mistakes made 20 years ago > cannot hold Linux back from moving forward. We have to continue > searching for ways to allow better or more performant behavior. > > > > > The dump partitioning is up to the family. Multiple families > > coalesce NLM_DONE from day 1. "All dumps must behave the same" > > is saying we should convert all families to be poorly behaved. > > > > Admittedly changing the most heavily used parts of rtnetlink is very > > risky. And there's couple more corner cases which I'm afraid someone > > will hit. I'm adding this helper to clearly annotate "legacy" > > callbacks, so we don't regress again. At the same time nobody should > > use this in new code or "just to be safe" (read: because they don't > > understand netlink). > > What about a socket option that says "I am a modern app and can handle > the new way" - similar to the strict mode option that was added? Then > the decision of requiring a separate message for NLM_DONE can be based > on the app. Could even throw a `pr_warn_once("modernize app %s/%d\n")` > to help old apps understand they need to move forward. > Sorry, being a little lazy so asking instead: NLMSG_DONE is traditionally the "EOT" (end of transaction) signal, if you get rid of it - how does the user know there are more msgs coming or the dump transaction is over? In addition to the user->kernel "I am modern", perhaps set the nlmsg_flag in the reverse path to either say "there's more coming" which you dont set on the last message or "we are doing this the new way". Backward compat is very important - there are dinosaur apps out there that will break otherwise. cheers, jamal
On Mon, Jun 3, 2024 at 3:54 PM Jakub Kicinski <kuba@kernel.org> wrote: > > On Sun, 2 Jun 2024 15:21:02 -0700 Jakub Kicinski wrote: > > Netlink is full of legacy behavior, the only way to make it usable > > in modern environments is to let new families not repeat the mistakes. > > That's why I'd really rather not add a workaround at the af_netlink > > level. Why would ethtool (which correctly coalesced NLM_DONE from day 1) > > suddenly start needed another recv(). A lot of the time the entire dump > > fits in one skb. > > > > If you prefer to sacrifice all of rtnetlink (some of which, to be clear, > > has also been correctly coded from day 1) - we can add a trampoline for > > rtnetlink dump handlers? > > Hi Eric, how do you feel about this approach? It would also let us > extract the "RTNL unlocked dump" handling from af_netlink.c, which > would be nice. Sure, I have not thought of af_netlink > > BTW it will probably need to be paired with fixing the > for_each_netdev_dump() foot gun, maybe (untested): > I confess I am a bit lost : this part relates to your original submission, when you set "ctx->ifindex = ULONG_MAX;" in inet_dump_ifaddr() ? > --- a/include/linux/netdevice.h > +++ b/include/linux/netdevice.h > @@ -3025,7 +3025,8 @@ int call_netdevice_notifiers_info(unsigned long val, > #define net_device_entry(lh) list_entry(lh, struct net_device, dev_list) > > #define for_each_netdev_dump(net, d, ifindex) \ > - xa_for_each_start(&(net)->dev_by_index, (ifindex), (d), (ifindex)) > + for (; (d = xa_find(&(net)->dev_by_index, &ifindex, \ > + ULONG_MAX, XA_PRESENT)); ifindex++) > > static inline struct net_device *next_net_device(struct net_device *dev) > { >
On Mon, 3 Jun 2024 16:22:19 +0200 Eric Dumazet wrote: > > Hi Eric, how do you feel about this approach? It would also let us > > extract the "RTNL unlocked dump" handling from af_netlink.c, which > > would be nice. > > Sure, I have not thought of af_netlink > > > BTW it will probably need to be paired with fixing the > > for_each_netdev_dump() foot gun, maybe (untested): > > I confess I am a bit lost : this part relates to your original submission, > when you set "ctx->ifindex = ULONG_MAX;" in inet_dump_ifaddr() ? Yes, xa_for_each_start() leaves ifindex at the last valid entry, so without this snippet it's not safe to call dump again after we reached the end.
On Mon, Jun 3, 2024 at 4:59 PM Jakub Kicinski <kuba@kernel.org> wrote: > > On Mon, 3 Jun 2024 16:22:19 +0200 Eric Dumazet wrote: > > > Hi Eric, how do you feel about this approach? It would also let us > > > extract the "RTNL unlocked dump" handling from af_netlink.c, which > > > would be nice. > > > > Sure, I have not thought of af_netlink > > > > > BTW it will probably need to be paired with fixing the > > > for_each_netdev_dump() foot gun, maybe (untested): > > > > I confess I am a bit lost : this part relates to your original submission, > > when you set "ctx->ifindex = ULONG_MAX;" in inet_dump_ifaddr() ? > > Yes, xa_for_each_start() leaves ifindex at the last valid entry, so > without this snippet it's not safe to call dump again after we reached > the end. A generic change would be to make sure we remember we hit the end, I guess this can be done later.
On 6/3/24 8:05 AM, Jamal Hadi Salim wrote: > > Sorry, being a little lazy so asking instead: > NLMSG_DONE is traditionally the "EOT" (end of transaction) signal, if > you get rid of it - how does the user know there are more msgs coming > or the dump transaction is over? In addition to the user->kernel "I am > modern", perhaps set the nlmsg_flag in the reverse path to either say > "there's more coming" which you dont set on the last message or "we > are doing this the new way". Backward compat is very important - there > are dinosaur apps out there that will break otherwise. > NLM_DONE is not getting removed. The recent changes allow the end message signal without a separate system call.
On Mon, Jun 3, 2024 at 4:05 PM Jamal Hadi Salim <jhs@mojatatu.com> wrote: > > On Sat, Jun 1, 2024 at 10:23 PM David Ahern <dsahern@kernel.org> wrote: > > > > On 6/1/24 5:48 PM, Jakub Kicinski wrote: > > > On Sat, 1 Jun 2024 16:10:13 -0700 Stephen Hemminger wrote: > > >> Sorry, I disagree. > > >> > > >> You can't just fix the problem areas. The split was an ABI change, and there could > > >> be a problem in any dump. This the ABI version of the old argument > > >> If a tree falls in a forest and no one is around to hear it, does it make a sound? > > >> > > >> All dumps must behave the same. You are stuck with the legacy behavior. > > > > I don't agree with such a hard line stance. Mistakes made 20 years ago > > cannot hold Linux back from moving forward. We have to continue > > searching for ways to allow better or more performant behavior. > > > > > > > > The dump partitioning is up to the family. Multiple families > > > coalesce NLM_DONE from day 1. "All dumps must behave the same" > > > is saying we should convert all families to be poorly behaved. > > > > > > Admittedly changing the most heavily used parts of rtnetlink is very > > > risky. And there's couple more corner cases which I'm afraid someone > > > will hit. I'm adding this helper to clearly annotate "legacy" > > > callbacks, so we don't regress again. At the same time nobody should > > > use this in new code or "just to be safe" (read: because they don't > > > understand netlink). > > > > What about a socket option that says "I am a modern app and can handle > > the new way" - similar to the strict mode option that was added? Then > > the decision of requiring a separate message for NLM_DONE can be based > > on the app. Could even throw a `pr_warn_once("modernize app %s/%d\n")` > > to help old apps understand they need to move forward. > > > > Sorry, being a little lazy so asking instead: > NLMSG_DONE is traditionally the "EOT" (end of transaction) signal, if > you get rid of it - how does the user know there are more msgs coming > or the dump transaction is over? In addition to the user->kernel "I am > modern", perhaps set the nlmsg_flag in the reverse path to either say > "there's more coming" which you dont set on the last message or "we > are doing this the new way". Backward compat is very important - there > are dinosaur apps out there that will break otherwise. The NLMSG_DONE was not removed. Some applications expected it to be carried in a standalone message for some of the rtnetlink operations, because old kernel implementations accidentally had this (undocumented) behavior. When the kernel started to be smart and piggy-back the NLMSG_DONE in the 'last given answer', these applications started to complain. Basically these applications do not correctly parse the full answer the kernel gives to them.
diff --git a/include/net/netlink.h b/include/net/netlink.h index e78ce008e07c..8369aca32443 100644 --- a/include/net/netlink.h +++ b/include/net/netlink.h @@ -1198,6 +1198,48 @@ nl_dump_check_consistent(struct netlink_callback *cb, cb->prev_seq = cb->seq; } +/** + * nl_dump_legacy_retval - get legacy return code for netlink dumps + * @err: error encountered during dump, negative errno or zero + * @skb: skb with the dump results + * + * Netlink dump callbacks get called multiple times per dump, because + * all the objects may not fit into a single skb. Whether another iteration + * is necessary gets decided based on the return value of the callback + * (with 0 meaning "end reached"). + * + * The semantics used to be more complicated, with positive return values + * meaning "continue" and negative meaning "end with an error". A lot of + * handlers simplified this to return skb->len ? : -errno. Meaning that zero + * would only be returned when skb was empty, requiring another recvmsg() + * syscall just to get the NLM_DONE message. + * + * The current semantics allow handlers to also return -EMSGSIZE to continue. + * + * Unfortunately, some user space has started to depend on the NLM_DONE + * message being returned individually, in a separate recvmsg(). Select + * netlink dumps must preserve those semantics. + * + * This helper wraps the "legacy logic" and serves as an annotation for + * dumps which are known to require legacy handling. + * + * When used in combination with for_each_netdev_dump() - make sure to + * invalidate the ifindex when iteration is done. for_each_netdev_dump() + * does not move the iterator index "after" the last valid entry. + * + * NOTE: Do not use this helper for dumps without known legacy users! + * Most families are accessed only using well-written libraries + * so starting to coalesce NLM_DONE is perfectly fine, and more efficient. + * + * Return: return code to use for a dump handler + */ +static inline int nl_dump_legacy_retval(int err, const struct sk_buff *skb) +{ + if (err < 0 && err != -EMSGSIZE) + return err; + return skb->len; +} + /************************************************************************** * Netlink Attributes **************************************************************************/ diff --git a/net/ipv4/devinet.c b/net/ipv4/devinet.c index 96accde527da..6d0e5cbd95b4 100644 --- a/net/ipv4/devinet.c +++ b/net/ipv4/devinet.c @@ -1911,7 +1911,10 @@ static int inet_dump_ifaddr(struct sk_buff *skb, struct netlink_callback *cb) if (err < 0) goto done; } + ctx->ifindex = ULONG_MAX; done: + err = nl_dump_legacy_retval(err, skb); + if (fillargs.netnsid >= 0) put_net(tgt_net); rcu_read_unlock(); diff --git a/net/ipv4/fib_frontend.c b/net/ipv4/fib_frontend.c index c484b1c0fc00..100d77eafe35 100644 --- a/net/ipv4/fib_frontend.c +++ b/net/ipv4/fib_frontend.c @@ -1051,10 +1051,7 @@ static int inet_dump_fib(struct sk_buff *skb, struct netlink_callback *cb) } } - /* Don't let NLM_DONE coalesce into a message, even if it could. - * Some user space expects NLM_DONE in a separate recv(). - */ - err = skb->len; + err = nl_dump_legacy_retval(err, skb); out: cb->args[1] = e;
Jaroslav reports Dell's OMSA Systems Management Data Engine expects NLM_DONE in a separate recvmsg(), so revert to the old behavior. This is the same kind of fix as we added in commit 460b0d33cf10 ("inet: bring NLM_DONE out to a separate recv() again") so wrap the logic into a helper, to make it easier to keep track of which dump handles we know to require legacy handling (and possibly one day let sockets opt into not doing this). Tested: ./cli.py --dbg-small-recv 4096 --spec netlink/specs/rt_addr.yaml \ --dump getaddr --json '{"ifa-family": 2}' ./cli.py --dbg-small-recv 4096 --spec netlink/specs/rt_route.yaml \ --dump getroute --json '{"rtm-family": 2}' Fixes: cdb2f80f1c10 ("inet: use xa_array iterator to implement inet_dump_ifaddr()") Reported-by: Jaroslav Pulchart <jaroslav.pulchart@gooddata.com> Link: https://lore.kernel.org/all/CAK8fFZ7MKoFSEzMBDAOjoUt+vTZRRQgLDNXEOfdCCXSoXXKE0g@mail.gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> --- CC: dsahern@kernel.org --- include/net/netlink.h | 42 +++++++++++++++++++++++++++++++++++++++++ net/ipv4/devinet.c | 3 +++ net/ipv4/fib_frontend.c | 5 +---- 3 files changed, 46 insertions(+), 4 deletions(-)