diff mbox series

[net-next,v2,1/3] netlink: handle EMSGSIZE errors in the core

Message ID 20240303052408.310064-2-kuba@kernel.org (mailing list archive)
State Accepted
Commit b5a899154aa94cc573db3ae1f61dabe7bfe8b579
Delegated to: Netdev Maintainers
Headers show
Series netlink: handle EMSGSIZE errors in the core | expand

Checks

Context Check Description
netdev/series_format success Posting correctly formatted
netdev/tree_selection success Clearly marked for net-next
netdev/ynl success Generated files up to date; no warnings/errors; no diff in generated;
netdev/fixes_present success Fixes tag not required for -next series
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 945 this patch: 945
netdev/build_tools success No tools touched, skip
netdev/cc_maintainers success CCed 4 of 4 maintainers
netdev/build_clang success Errors and warnings before: 957 this patch: 957
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success No Fixes tag
netdev/build_allmodconfig_warn success Errors and warnings before: 961 this patch: 961
netdev/checkpatch success total: 0 errors, 0 warnings, 0 checks, 15 lines checked
netdev/build_clang_rust success No Rust files in patch. Skipping build
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0
netdev/contest success net-next-2024-03-05--21-00 (tests: 892)

Commit Message

Jakub Kicinski March 3, 2024, 5:24 a.m. UTC
Eric points out that our current suggested way of handling
EMSGSIZE errors ((err == -EMSGSIZE) ? skb->len : err) will
break if we didn't fit even a single object into the buffer
provided by the user. This should not happen for well behaved
applications, but we can fix that, and free netlink families
from dealing with that completely by moving error handling
into the core.

Let's assume from now on that all EMSGSIZE errors in dumps are
because we run out of skb space. Families can now propagate
the error nla_put_*() etc generated and not worry about any
return value magic. If some family really wants to send EMSGSIZE
to user space, assuming it generates the same error on the next
dump iteration the skb->len should be 0, and user space should
still see the EMSGSIZE.

This should simplify families and prevent mistakes in return
values which lead to DONE being forced into a separate recv()
call as discovered by Ido some time ago.

Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
---
CC: kuniyu@amazon.com
---
 net/netlink/af_netlink.c | 9 +++++++++
 1 file changed, 9 insertions(+)

Comments

Ido Schimmel March 3, 2024, 3:01 p.m. UTC | #1
On Sat, Mar 02, 2024 at 09:24:06PM -0800, Jakub Kicinski wrote:
> Eric points out that our current suggested way of handling
> EMSGSIZE errors ((err == -EMSGSIZE) ? skb->len : err) will
> break if we didn't fit even a single object into the buffer
> provided by the user. This should not happen for well behaved
> applications, but we can fix that, and free netlink families
> from dealing with that completely by moving error handling
> into the core.
> 
> Let's assume from now on that all EMSGSIZE errors in dumps are
> because we run out of skb space. Families can now propagate
> the error nla_put_*() etc generated and not worry about any
> return value magic. If some family really wants to send EMSGSIZE
> to user space, assuming it generates the same error on the next
> dump iteration the skb->len should be 0, and user space should
> still see the EMSGSIZE.
> 
> This should simplify families and prevent mistakes in return
> values which lead to DONE being forced into a separate recv()
> call as discovered by Ido some time ago.
> 
> Reviewed-by: Eric Dumazet <edumazet@google.com>
> Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Reviewed-by: Ido Schimmel <idosch@nvidia.com>

I read your comment here [1] and I believe this patch [2] is needed to
avoid another pass in case of an error code other than EMSGSIZE. I can
submit it after your series is accepted.

[1] https://lore.kernel.org/netdev/20240229073750.6e59155e@kernel.org/

[2]
diff --git a/net/ipv4/nexthop.c b/net/ipv4/nexthop.c
index 70509da4f080..b3a24b61f76b 100644
--- a/net/ipv4/nexthop.c
+++ b/net/ipv4/nexthop.c
@@ -3241,10 +3241,6 @@ static int rtm_dump_nexthop(struct sk_buff *skb, struct netlink_callback *cb)
 
        err = rtm_dump_walk_nexthops(skb, cb, root, ctx,
                                     &rtm_dump_nexthop_cb, &filter);
-       if (err < 0) {
-               if (likely(skb->len))
-                       err = skb->len;
-       }
 
        cb->seq = net->nexthop.seq;
        nl_dump_check_consistent(cb, nlmsg_hdr(skb));
@@ -3439,11 +3435,6 @@ static int rtm_dump_nexthop_bucket(struct sk_buff *skb,
                                             &rtm_dump_nexthop_bucket_cb, &dd);
        }
 
-       if (err < 0) {
-               if (likely(skb->len))
-                       err = skb->len;
-       }
-
        cb->seq = net->nexthop.seq;
        nl_dump_check_consistent(cb, nlmsg_hdr(skb));
        return err;
diff mbox series

Patch

diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c
index ad7b645e3ae7..da846212fb9b 100644
--- a/net/netlink/af_netlink.c
+++ b/net/netlink/af_netlink.c
@@ -2267,6 +2267,15 @@  static int netlink_dump(struct sock *sk, bool lock_taken)
 		if (extra_mutex)
 			mutex_unlock(extra_mutex);
 
+		/* EMSGSIZE plus something already in the skb means
+		 * that there's more to dump but current skb has filled up.
+		 * If the callback really wants to return EMSGSIZE to user space
+		 * it needs to do so again, on the next cb->dump() call,
+		 * without putting data in the skb.
+		 */
+		if (nlk->dump_done_errno == -EMSGSIZE && skb->len)
+			nlk->dump_done_errno = skb->len;
+
 		cb->extack = NULL;
 	}