diff mbox series

[net] net: loopback: Avoid sending IP packets without an Ethernet header

Message ID 20250220072559.782296-1-idosch@nvidia.com (mailing list archive)
State Accepted
Commit 0e4427f8f587c4b603475468bb3aee9418574893
Delegated to: Netdev Maintainers
Headers show
Series [net] net: loopback: Avoid sending IP packets without an Ethernet header | expand

Checks

Context Check Description
netdev/series_format success Single patches do not need cover letters
netdev/tree_selection success Clearly marked for net
netdev/ynl success Generated files up to date; no warnings/errors; no diff in generated;
netdev/fixes_present success Fixes tag present in non-next series
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 0 this patch: 0
netdev/build_tools success No tools touched, skip
netdev/cc_maintainers success CCed 7 of 7 maintainers
netdev/build_clang success Errors and warnings before: 0 this patch: 0
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success Fixes tag looks correct
netdev/build_allmodconfig_warn success Errors and warnings before: 0 this patch: 0
netdev/checkpatch success total: 0 errors, 0 warnings, 0 checks, 22 lines checked
netdev/build_clang_rust success No Rust files in patch. Skipping build
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0
netdev/contest success net-next-2025-02-20--12-00 (tests: 893)

Commit Message

Ido Schimmel Feb. 20, 2025, 7:25 a.m. UTC
After commit 22600596b675 ("ipv4: give an IPv4 dev to blackhole_netdev")
IPv4 neighbors can be constructed on the blackhole net device, but they
are constructed with an output function (neigh_direct_output()) that
simply calls dev_queue_xmit(). The latter will transmit packets via
'skb->dev' which might not be the blackhole net device if dst_dev_put()
switched 'dst->dev' to the blackhole net device while another CPU was
using the dst entry in ip_output(), but after it already initialized
'skb->dev' from 'dst->dev'.

Specifically, the following can happen:

    CPU1                                      CPU2

udp_sendmsg(sk1)                          udp_sendmsg(sk2)
udp_send_skb()                            [...]
ip_output()
    skb->dev = skb_dst(skb)->dev
                                          dst_dev_put()
                                              dst->dev = blackhole_netdev
ip_finish_output2()
    resolves neigh on dst->dev
neigh_output()
neigh_direct_output()
dev_queue_xmit()

This will result in IPv4 packets being sent without an Ethernet header
via a valid net device:

tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on enp9s0, link-type EN10MB (Ethernet), snapshot length 262144 bytes
22:07:02.329668 20:00:40:11:18:fb > 45:00:00:44:f4:94, ethertype Unknown
(0x58c6), length 68:
        0x0000:  8dda 74ca f1ae ca6c ca6c 0098 969c 0400  ..t....l.l......
        0x0010:  0000 4730 3f18 6800 0000 0000 0000 9971  ..G0?.h........q
        0x0020:  c4c9 9055 a157 0a70 9ead bf83 38ca ab38  ...U.W.p....8..8
        0x0030:  8add ab96 e052                           .....R

Fix by making sure that neighbors are constructed on top of the
blackhole net device with an output function that simply consumes the
packets, in a similar fashion to dst_discard_out() and
blackhole_netdev_xmit().

Fixes: 8d7017fd621d ("blackhole_netdev: use blackhole_netdev to invalidate dst entries")
Fixes: 22600596b675 ("ipv4: give an IPv4 dev to blackhole_netdev")
Reported-by: Florian Meister <fmei@sfs.com>
Closes: https://lore.kernel.org/netdev/20250210084931.23a5c2e4@hermes.local/
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
---
 drivers/net/loopback.c | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

Comments

Eric Dumazet Feb. 20, 2025, 10:40 a.m. UTC | #1
On Thu, Feb 20, 2025 at 8:26 AM Ido Schimmel <idosch@nvidia.com> wrote:
>
> After commit 22600596b675 ("ipv4: give an IPv4 dev to blackhole_netdev")
> IPv4 neighbors can be constructed on the blackhole net device, but they
> are constructed with an output function (neigh_direct_output()) that
> simply calls dev_queue_xmit(). The latter will transmit packets via
> 'skb->dev' which might not be the blackhole net device if dst_dev_put()
> switched 'dst->dev' to the blackhole net device while another CPU was
> using the dst entry in ip_output(), but after it already initialized
> 'skb->dev' from 'dst->dev'.
>
> Specifically, the following can happen:
>
>     CPU1                                      CPU2
>
> udp_sendmsg(sk1)                          udp_sendmsg(sk2)
> udp_send_skb()                            [...]
> ip_output()
>     skb->dev = skb_dst(skb)->dev
>                                           dst_dev_put()
>                                               dst->dev = blackhole_netdev
> ip_finish_output2()
>     resolves neigh on dst->dev
> neigh_output()
> neigh_direct_output()
> dev_queue_xmit()
>
> This will result in IPv4 packets being sent without an Ethernet header
> via a valid net device:
>
> tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
> listening on enp9s0, link-type EN10MB (Ethernet), snapshot length 262144 bytes
> 22:07:02.329668 20:00:40:11:18:fb > 45:00:00:44:f4:94, ethertype Unknown
> (0x58c6), length 68:
>         0x0000:  8dda 74ca f1ae ca6c ca6c 0098 969c 0400  ..t....l.l......
>         0x0010:  0000 4730 3f18 6800 0000 0000 0000 9971  ..G0?.h........q
>         0x0020:  c4c9 9055 a157 0a70 9ead bf83 38ca ab38  ...U.W.p....8..8
>         0x0030:  8add ab96 e052                           .....R
>
> Fix by making sure that neighbors are constructed on top of the
> blackhole net device with an output function that simply consumes the
> packets, in a similar fashion to dst_discard_out() and
> blackhole_netdev_xmit().
>
> Fixes: 8d7017fd621d ("blackhole_netdev: use blackhole_netdev to invalidate dst entries")
> Fixes: 22600596b675 ("ipv4: give an IPv4 dev to blackhole_netdev")
> Reported-by: Florian Meister <fmei@sfs.com>
> Closes: https://lore.kernel.org/netdev/20250210084931.23a5c2e4@hermes.local/
> Signed-off-by: Ido Schimmel <idosch@nvidia.com>
> ---
>  drivers/net/loopback.c | 14 ++++++++++++++
>  1 file changed, 14 insertions(+)
>
> diff --git a/drivers/net/loopback.c b/drivers/net/loopback.c
> index c8840c3b9a1b..f1d68153987e 100644
> --- a/drivers/net/loopback.c
> +++ b/drivers/net/loopback.c
> @@ -244,8 +244,22 @@ static netdev_tx_t blackhole_netdev_xmit(struct sk_buff *skb,
>         return NETDEV_TX_OK;
>  }
>
> +static int blackhole_neigh_output(struct neighbour *n, struct sk_buff *skb)
> +{
> +       kfree_skb(skb);

If there is any risk of this being hit often, I would probably use the
recent SKB_DROP_REASON_BLACKHOLE

(feel free to resubmit
https://lore.kernel.org/netdev/20250212164323.2183023-1-edumazet@google.com/T/#mbb8d4b0779cb8f0654a382772c943af5389606ea
?)

Otherwise, this looks good to me, thanks !

Reviewed-by: Eric Dumazet <edumazet@google.com>
Ido Schimmel Feb. 20, 2025, 3:38 p.m. UTC | #2
On Thu, Feb 20, 2025 at 11:40:07AM +0100, Eric Dumazet wrote:
> On Thu, Feb 20, 2025 at 8:26 AM Ido Schimmel <idosch@nvidia.com> wrote:
> > diff --git a/drivers/net/loopback.c b/drivers/net/loopback.c
> > index c8840c3b9a1b..f1d68153987e 100644
> > --- a/drivers/net/loopback.c
> > +++ b/drivers/net/loopback.c
> > @@ -244,8 +244,22 @@ static netdev_tx_t blackhole_netdev_xmit(struct sk_buff *skb,
> >         return NETDEV_TX_OK;
> >  }
> >
> > +static int blackhole_neigh_output(struct neighbour *n, struct sk_buff *skb)
> > +{
> > +       kfree_skb(skb);
> 
> If there is any risk of this being hit often, I would probably use the
> recent SKB_DROP_REASON_BLACKHOLE

Not very often. About 10 times while running the reproducer I shared
here:
https://lore.kernel.org/netdev/Z7D9cR22BDPN7WSJ@shredder/

In line with the original report:
https://github.com/siderolabs/talos/issues/9837#issuecomment-2642116378

> (feel free to resubmit
> https://lore.kernel.org/netdev/20250212164323.2183023-1-edumazet@google.com/T/#mbb8d4b0779cb8f0654a382772c943af5389606ea
> ?)

Can we do it in net-next?

A few questions / suggestions regarding the patch:

1. Can we use it for IPv4 as well? I tested the following:

diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index 753704f75b2c..2aeab70c1cb5 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -966,6 +966,7 @@ static int ip_error(struct sk_buff *skb)
 
 	switch (rt->dst.error) {
 	case EINVAL:
+		SKB_DR_SET(reason, BLACKHOLE);
 	default:
 		goto out;
 	case EHOSTUNREACH:

2. Given that this reason is going to be triggered both for
user-installed blackhole routes and dst entries being destroyed how
about adjusting the comment? Otherwise I think it will be confusing for
users who didn't install a blackhole route. Something like:

diff --git a/drivers/net/loopback.c b/drivers/net/loopback.c
index f1d68153987e..cb269b3251d4 100644
--- a/drivers/net/loopback.c
+++ b/drivers/net/loopback.c
@@ -239,14 +239,13 @@ struct pernet_operations __net_initdata loopback_net_ops = {
 static netdev_tx_t blackhole_netdev_xmit(struct sk_buff *skb,
 					 struct net_device *dev)
 {
-	kfree_skb(skb);
-	net_warn_ratelimited("%s(): Dropping skb.\n", __func__);
+	kfree_skb_reason(skb, SKB_DROP_REASON_BLACKHOLE);
 	return NETDEV_TX_OK;
 }
 
 static int blackhole_neigh_output(struct neighbour *n, struct sk_buff *skb)
 {
-	kfree_skb(skb);
+	kfree_skb_reason(skb, SKB_DROP_REASON_BLACKHOLE);
 	return 0;
 }
 
diff --git a/include/net/dropreason-core.h b/include/net/dropreason-core.h
index de42577f16dd..0ef6869dbd1b 100644
--- a/include/net/dropreason-core.h
+++ b/include/net/dropreason-core.h
@@ -556,7 +556,8 @@ enum skb_drop_reason {
 	 */
 	SKB_DROP_REASON_BRIDGE_INGRESS_STP_STATE,
 	/**
-	 * @SKB_DROP_REASON_BLACKHOLE: blackhole route.
+	 * @SKB_DROP_REASON_BLACKHOLE: blackhole route or dst entry being
+	 * destroyed.
 	 */
 	SKB_DROP_REASON_BLACKHOLE,
 	/**

> Otherwise, this looks good to me, thanks !
> 
> Reviewed-by: Eric Dumazet <edumazet@google.com>

Thanks!
patchwork-bot+netdevbpf@kernel.org Feb. 22, 2025, 12:40 a.m. UTC | #3
Hello:

This patch was applied to netdev/net.git (main)
by Jakub Kicinski <kuba@kernel.org>:

On Thu, 20 Feb 2025 09:25:59 +0200 you wrote:
> After commit 22600596b675 ("ipv4: give an IPv4 dev to blackhole_netdev")
> IPv4 neighbors can be constructed on the blackhole net device, but they
> are constructed with an output function (neigh_direct_output()) that
> simply calls dev_queue_xmit(). The latter will transmit packets via
> 'skb->dev' which might not be the blackhole net device if dst_dev_put()
> switched 'dst->dev' to the blackhole net device while another CPU was
> using the dst entry in ip_output(), but after it already initialized
> 'skb->dev' from 'dst->dev'.
> 
> [...]

Here is the summary with links:
  - [net] net: loopback: Avoid sending IP packets without an Ethernet header
    https://git.kernel.org/netdev/net/c/0e4427f8f587

You are awesome, thank you!
diff mbox series

Patch

diff --git a/drivers/net/loopback.c b/drivers/net/loopback.c
index c8840c3b9a1b..f1d68153987e 100644
--- a/drivers/net/loopback.c
+++ b/drivers/net/loopback.c
@@ -244,8 +244,22 @@  static netdev_tx_t blackhole_netdev_xmit(struct sk_buff *skb,
 	return NETDEV_TX_OK;
 }
 
+static int blackhole_neigh_output(struct neighbour *n, struct sk_buff *skb)
+{
+	kfree_skb(skb);
+	return 0;
+}
+
+static int blackhole_neigh_construct(struct net_device *dev,
+				     struct neighbour *n)
+{
+	n->output = blackhole_neigh_output;
+	return 0;
+}
+
 static const struct net_device_ops blackhole_netdev_ops = {
 	.ndo_start_xmit = blackhole_netdev_xmit,
+	.ndo_neigh_construct = blackhole_neigh_construct,
 };
 
 /* This is a dst-dummy device used specifically for invalidated