Message ID | 20ea2ab0472ecf2d1625dadb7ca0df39cf4fe0f5.1712226175.git.antony.antony@secunet.com (mailing list archive) |
---|---|
State | Changes Requested |
Delegated to: | Netdev Maintainers |
Headers | show |
Series | fix icmp error source with ICMP reverse lookup | expand |
Antony Antony via Devel <devel@linux-ipsec.org> wrote: > This commit would force to use source address from the gatway/host. > The ICMP error message source address correctly set from the host. While that seems more correct, since that host is generating, it might not fit into the IPsec tunnel, and therefore might go the right place or anywhere. Perhaps you could pick the internal IP of the gateway, but in more complex policies, the gateway itself might not be part of the VPN. > Again before the fix ping -W 5 -c 1 10.1.4.3 From 10.1.4.3 icmp_seq=1 > Destination Host Unreachable > After the fix From 10.1.3.2 icmp_seq=1 Destination Host Unreachable ip -netns host2 xfrm policy add src 10.1.1.0/24 dst 10.1.4.0/24 dir out \ flag icmp tmpl src 10.1.2.1 dst 10.1.3.2 proto esp reqid 1 mode tunnel As far as I can see, 10.1.3.2 does not fit into this policy. You appear to be selecting the outside ("WAN") interface of the gateway. It would be less confusing if you had used 172.16.0.0/24 for the outside of the gateways in your example. How will the WAN interface manage to talk to the internal sender of the packet except via the tunnel? -- ] Never tell me the odds! | ipv6 mesh networks [ ] Michael Richardson, Sandelman Software Works | IoT architect [ ] mcr@sandelman.ca http://www.sandelman.ca/ | ruby on rails [
Hi Michael, On Thu, Apr 04, 2024 at 07:38:04AM -0400, Michael Richardson via Devel wrote: > > Antony Antony via Devel <devel@linux-ipsec.org> wrote: > > This commit would force to use source address from the gatway/host. > > The ICMP error message source address correctly set from the host. > > While that seems more correct, since that host is generating, it might not > fit into the IPsec tunnel, and therefore might go the right place or > anywhere. Perhaps you could pick the internal IP of the gateway, but in > more complex policies, the gateway itself might not be part of the VPN. > > > Again before the fix ping -W 5 -c 1 10.1.4.3 From 10.1.4.3 icmp_seq=1 > > Destination Host Unreachable > > > After the fix From 10.1.3.2 icmp_seq=1 Destination Host Unreachable > > ip -netns host2 xfrm policy add src 10.1.1.0/24 dst 10.1.4.0/24 dir out \ > flag icmp tmpl src 10.1.2.1 dst 10.1.3.2 proto esp reqid 1 mode tunnel > > As far as I can see, 10.1.3.2 does not fit into this policy. Indeed, 10.1.3.2 does not match the policy. However, notice the "flag icmp" in the above line. That means the policy lookup will use the inner payload for policy lookup as specified in RFC 4301, Section 6, which will match. The inner payload 10.1.4.1 <=> 10.1.4.3 will match the policy. > You appear to be selecting the outside ("WAN") interface of the gateway. > It would be less confusing if you had used 172.16.0.0/24 for the outside of > the gateways in your example. With this fix, I am leaving the source address selection of the error response to the stack instead of unconditionally copying from the packet that was dropped. So, in a simple case, the outgoing interface will be the end of the tunnel, i.e., 10.1.3.2. Also that is the end point of the tunnel. > How will the WAN interface manage to talk to the internal sender of the > packet except via the tunnel? WAN is geneerating an ESP packet? In this case ESP tunnel mode with 10.1.3.2. Note: the reciver has also the icmp enabled. -antony
On 04.04.24 12:31, Antony Antony wrote: > When enabling support for XFRM lookup using reverse ICMP payload, we > have identified an issue where the source address of the IPv4, e.g., > "Destination Host Unreachable" message, is incorrect. The error message > received by the sender appears to originate from a > non-existing/unreachable host. IPv6 seems to behave correctly; respond > with an existing source address from the gateway. > > Here is example of incorrect source address for ICMP error response. > When sending a ping to an unreachable host, the sender would receive an > ICMP unreachable response with a fake source address. Rather the address > of the host that generated ICMP Unreachable message. This is confusing > and incorrect. > > Example: > ping -W 9 -w 5 -c 1 10.1.4.3 > PING 10.1.4.3 (10.1.4.3) 56(84) bytes of data. > From 10.1.4.3 icmp_seq=1 Destination Host Unreachable > > Notice : packet has the source address of the ICMP "Unreachable host!" > > This issue can be traced back to commit > 415b3334a21a ("icmp: Fix regression in nexthop resolution during replies.") > which introduced a change that copied the source address from the ICMP > payload. > > This commit would force to use source address from the gatway/host. > The ICMP error message source address correctly set from the host. > > After fixing: > ping -W 5 -c 1 10.1.4.3 > PING 10.1.4.3 (10.1.4.3) 56(84) bytes of data. > From 10.1.3.2 icmp_seq=1 Destination Host Unreachable > > Here is an snippt to reporduce the issue. > > export AB="10.1" > for i in 1 2 3 4 5; do > h="host${i}" > ip netns add ${h} > ip -netns ${h} link set lo up > ip netns exec ${h} sysctl -wq net.ipv4.ip_forward=1 > if [ $i -lt 5 ]; then > ip -netns ${h} link add eth0 type veth peer name eth10${i} > ip -netns ${h} addr add "${AB}.${i}.1/24" dev eth0 > ip -netns ${h} link set up dev eth0 > fi > done > > for i in 1 2 3 4 5; do > h="host${i}" > p=$((i - 1)) > ph="host${p}" > # connect to previous host > if [ $i -gt 1 ]; then > ip -netns ${ph} link set eth10${p} netns ${h} > ip -netns ${h} link set eth10${p} name eth1 > ip -netns ${h} link set up dev eth1 > ip -netns ${h} addr add "${AB}.${p}.2/24" dev eth1 > fi > # add forward routes > for k in $(seq ${i} $((5 - 1))); do > ip -netns ${h} route 2>/dev/null | (grep "${AB}.${k}.0" 2>/dev/null) || \ > ip -netns ${h} route add "${AB}.${k}.0/24" via "${AB}.${i}.2" 2>/dev/nul > done > > # add reverse routes > for k in $(seq 1 $((i - 2))); do > ip -netns ${h} route 2>/dev/null | grep "${AB}.${k}.0" 2>/dev/null || \ > ip -netns ${h} route add "${AB}.${k}.0/24" via "${AB}.${p}.1" 2>/dev/nul > done > done > > ip netns exec host1 ping -q -W 2 -w 1 -c 1 10.1.4.2 2>&1>/dev/null && echo "success 10.1.4.2 reachable" || echo "ERROR" > ip netns exec host1 ping -W 9 -w 5 -c 1 10.1.4.3 || echo "note the source address of unreachble of gateway" > ip -netns host1 route flush cache > > ip netns exec host3 nft add table inet filter > ip netns exec host3 nft add chain inet filter FORWARD { type filter hook forward priority filter\; policy drop \; } > ip netns exec host3 nft add rule inet filter FORWARD counter ip protocol icmp drop > ip netns exec host3 nft add rule inet filter FORWARD counter ip protocol esp accept > ip netns exec host3 nft add rule inet filter FORWARD counter drop > > ip -netns host2 xfrm policy add src 10.1.1.0/24 dst 10.1.4.0/24 dir out \ > flag icmp tmpl src 10.1.2.1 dst 10.1.3.2 proto esp reqid 1 mode tunnel > > ip -netns host2 xfrm policy add src 10.1.4.0/24 dst 10.1.1.0/24 dir in \ > tmpl src 10.1.3.2 dst 10.1.2.1 proto esp reqid 2 mode tunnel > > ip -netns host2 xfrm policy add src 10.1.4.0/24 dst 10.1.1.0/24 dir fwd \ > flag icmp tmpl src 10.1.3.2 dst 10.1.2.1 proto esp reqid 2 mode tunnel > > ip -netns host2 xfrm state add src 10.1.2.1 dst 10.1.3.2 proto esp spi 1 \ > reqid 1 replay-window 1 mode tunnel aead 'rfc4106(gcm(aes))' \ > 0x1111111111111111111111111111111111111111 96 \ > sel src 10.1.1.0/24 dst 10.1.4.0/24 > > ip -netns host2 xfrm state add src 10.1.3.2 dst 10.1.2.1 proto esp spi 2 \ > flag icmp reqid 2 replay-window 10 mode tunnel aead 'rfc4106(gcm(aes))' \ > 0x2222222222222222222222222222222222222222 96 > > ip -netns host4 xfrm policy add src 10.1.4.0/24 dst 10.1.1.0/24 dir out \ > flag icmp tmpl src 10.1.3.2 dst 10.1.2.1 proto esp reqid 1 mode tunnel > > ip -netns host4 xfrm policy add src 10.1.1.0/24 dst 10.1.4.0/24 dir in \ > tmpl src 10.1.2.1 dst 10.1.3.2 proto esp reqid 2 mode tunnel > > ip -netns host4 xfrm policy add src 10.1.1.0/24 dst 10.1.4.0/24 dir fwd \ > flag icmp tmpl src 10.1.2.1 dst 10.1.3.2 proto esp reqid 2 mode tunnel > > ip -netns host4 xfrm state add src 10.1.3.2 dst 10.1.2.1 proto esp spi 2 \ > reqid 1 replay-window 1 mode tunnel aead 'rfc4106(gcm(aes))' \ > 0x2222222222222222222222222222222222222222 96 > > ip -netns host4 xfrm state add src 10.1.2.1 dst 10.1.3.2 proto esp spi 1 \ > reqid 2 replay-window 20 flag icmp mode tunnel aead 'rfc4106(gcm(aes))' \ > 0x1111111111111111111111111111111111111111 96 \ > sel src 10.1.1.0/24 dst 10.1.4.0/24 > > ip netns exec host1 ping -W 5 -c 1 10.1.4.2 2>&1 > /dev/null && echo "" > ip netns exec host1 ping -W 5 -c 1 10.1.4.3 || echo "note source address of gateway 10.1.3.2" > > Again before the fix > ping -W 5 -c 1 10.1.4.3 > From 10.1.4.3 icmp_seq=1 Destination Host Unreachable > > After the fix > From 10.1.3.2 icmp_seq=1 Destination Host Unreachable > > Signed-off-by: Antony Antony <antony.antony@secunet.com> > --- > net/ipv4/icmp.c | 1 - > 1 file changed, 1 deletion(-) > > diff --git a/net/ipv4/icmp.c b/net/ipv4/icmp.c > index e63a3bf99617..bec234637122 100644 > --- a/net/ipv4/icmp.c > +++ b/net/ipv4/icmp.c > @@ -555,7 +555,6 @@ static struct rtable *icmp_route_lookup(struct net *net, > XFRM_LOOKUP_ICMP); > if (!IS_ERR(rt2)) { > dst_release(&rt->dst); > - memcpy(fl4, &fl4_dec, sizeof(*fl4)); > rt = rt2; > } else if (PTR_ERR(rt2) == -EPERM) { > if (rt) Acked-by: Tobias Brunner <tobias@strongswan.org>
Antony Antony <antony@phenome.org> wrote: > Indeed, 10.1.3.2 does not match the policy. However, notice the "flag > icmp" in the above line. That means the policy lookup will use the > inner payload for policy lookup as specified in RFC 4301, Section 6, > which will match. The inner payload 10.1.4.1 <=> 10.1.4.3 will match > the policy. How is "flag icmp" communicated via IKEv2? Won't the other gateway just drop this packet?
Hi Michael, On Thu, Apr 04, 2024 at 10:39:17AM -0400, Michael Richardson wrote: > > Antony Antony <antony@phenome.org> wrote: > > Indeed, 10.1.3.2 does not match the policy. However, notice the "flag > > icmp" in the above line. That means the policy lookup will use the > > inner payload for policy lookup as specified in RFC 4301, Section 6, > > which will match. The inner payload 10.1.4.1 <=> 10.1.4.3 will match > > the policy. > > How is "flag icmp" communicated via IKEv2? As far as I'm aware, it isn't communicated via IKEv2. I believe it's considered a local policy, and possibly specified in BCP. However, how is communicating it over IKEv2 relevant to this kernel patch? I don't see any connection! If there is one, please elaborate. Without a clear link, the netdev maintainers might reject this patch. > Won't the other gateway just drop this packet? That's would be a local choice, fate of an ICMP message:), akin to ICMP errors elsewhere. Let's not dive into filtering choices and PMTU for now:) Just thinking out loud, I haven't seen forwarding ICMP error messages negotiated in other tunneling protocols like MPLS or pptp...., if I recall correctly, QUIC does indeed have it specified. -antony
Antony Antony <antony@phenome.org> wrote: > On Thu, Apr 04, 2024 at 10:39:17AM -0400, Michael Richardson wrote: >> >> Antony Antony <antony@phenome.org> wrote: > Indeed, 10.1.3.2 does not >> match the policy. However, notice the "flag > icmp" in the above >> line. That means the policy lookup will use the > inner payload for >> policy lookup as specified in RFC 4301, Section 6, > which will >> match. The inner payload 10.1.4.1 <=> 10.1.4.3 will match > the >> policy. >> >> How is "flag icmp" communicated via IKEv2? > As far as I'm aware, it isn't communicated via IKEv2. I believe it's > considered a local policy, and possibly specified in BCP. > However, how is communicating it over IKEv2 relevant to this kernel > patch? I don't see any connection! If there is one, please > elaborate. Without a clear link, the netdev maintainers might reject > this patch. because, we are using a custom Linux kernel flag to get the ICMP back into the tunnel, then the other end might not accept the packet if it doesn't have a similiar configuration. >> Won't the other gateway just drop this packet? > That's would be a local choice, fate of an ICMP message:), akin to ICMP > errors elsewhere. Let's not dive into filtering choices and PMTU for > now:) No, it's not. It's up to IKEv2 to configure those flags. Your choice requires extra flags. The previous behaviour was rather ingenious because it guaranteed that the packet always fit into the tunnel. (the bug was that it didn't do it for IPv6 as well) > Just thinking out loud, I haven't seen forwarding ICMP error messages > negotiated in other tunneling protocols like MPLS or pptp...., if I > recall correctly, QUIC does indeed have it specified. So, what? They aren't L3 tunnel protocols are they? MPLS is L2.5, pptp is L2 and QUIC is L4. -- ] Never tell me the odds! | ipv6 mesh networks [ ] Michael Richardson, Sandelman Software Works | IoT architect [ ] mcr@sandelman.ca http://www.sandelman.ca/ | ruby on rails [
Michael Richardson via Devel writes: > > Antony Antony <antony@phenome.org> wrote: > > Indeed, 10.1.3.2 does not match the policy. However, notice the "flag > > icmp" in the above line. That means the policy lookup will use the > > inner payload for policy lookup as specified in RFC 4301, Section 6, > > which will match. The inner payload 10.1.4.1 <=> 10.1.4.3 will match > > the policy. > > How is "flag icmp" communicated via IKEv2? > Won't the other gateway just drop this packet? It is not specified in IKE, it is mandated by the RFC4301 section 6.2: ---------------------------------------------------------------------- 6.2. Processing Protected, Transit ICMP Error Messages ... If no SA exists that would carry the outbound ICMP message in question, and if no SPD entry would allow carriage of this outbound ICMP error message, then an IPsec implementation MUST map the message to the SA that would carry the return traffic associated with the packet that triggered the ICMP error message. This requires an IPsec implementation to detect outbound ICMP error messages that map to no extant SA or SPD entry, and treat them specially with regard to SA creation and lookup. The implementation extracts the header for the packet that triggered the error (from the ICMP message payload), reverses the source and destination IP address fields, extracts the protocol field, and reverses the port fields (if accessible). It then uses this extracted information to locate an appropriate, active outbound SA, and transmits the error message via this SA. If no such SA exists, no SA will be created, and this is an auditable event. If an IPsec implementation receives an inbound ICMP error message on an SA, and the IP and ICMP headers of the message do not match the traffic selectors for the SA, the receiver MUST process the received message in a special fashion. Specifically, the receiver must extract the header of the triggering packet from the ICMP payload, and reverse fields as described above to determine if the packet is consistent with the selectors for the SA via which the ICMP error message was received. If the packet fails this check, the IPsec implementation MUST NOT forwarded the ICMP message to the destination. This is an auditable event.
Hi Michael, thanks for your clarifications. On Thu, Apr 04, 2024 at 11:35:57AM -0400, Michael Richardson wrote: > > Antony Antony <antony@phenome.org> wrote: > > On Thu, Apr 04, 2024 at 10:39:17AM -0400, Michael Richardson wrote: > >> > >> Antony Antony <antony@phenome.org> wrote: > Indeed, 10.1.3.2 does not > >> match the policy. However, notice the "flag > icmp" in the above > >> line. That means the policy lookup will use the > inner payload for > >> policy lookup as specified in RFC 4301, Section 6, > which will > >> match. The inner payload 10.1.4.1 <=> 10.1.4.3 will match > the > >> policy. > >> > >> How is "flag icmp" communicated via IKEv2? > > > As far as I'm aware, it isn't communicated via IKEv2. I believe it's > > considered a local policy, and possibly specified in BCP. > > > However, how is communicating it over IKEv2 relevant to this kernel > > patch? I don't see any connection! If there is one, please > > elaborate. Without a clear link, the netdev maintainers might reject > > this patch. > > because, we are using a custom Linux kernel flag to get the ICMP back into > the tunnel, then the other end might not accept the packet if it doesn't have > a similiar configuration. The use of "flags icmp" option here for ICMP handling within tunnels is directly aligned with the standards outlined in IPsec RFC 4301, section 6.2. This section mandates that both sender and receiver must be configurable to verify inner payload header information against the SA it traverses. It explicitly states, "an IPsec implementation MUST be configurable to check that this payload header information is consistent with the SA via which it arrives." Furthermore, it details the required handling of inbound ICMP error messages that don't match the SA's traffic selectors, emphasizing the necessity for special processing in such cases. Therefore, the need for IKEv2 in this context does not arise, as Linux kernel approach is fully compliant with the specified standards. > > >> Won't the other gateway just drop this packet? > > > That's would be a local choice, fate of an ICMP message:), akin to ICMP > > errors elsewhere. Let's not dive into filtering choices and PMTU for > > now:) > > No, it's not. It's up to IKEv2 to configure those flags. > Your choice requires extra flags. The previous behaviour was rather > ingenious because it guaranteed that the packet always fit into the tunnel. > (the bug was that it didn't do it for IPv6 as well) While the current behavior might seem beneficial, it has a major flaw. The generated response matches the IPsec policy only because it uses incorrect source addresses. This is address spoofing by the IPsec peer. This should be considered a bug, not a feature. In contrast, the correct approach, as IPv6 demonstrates, avoids such tactics. Ensuring IPv4 also adheres to standards is crucial. Thus, applying this fix is an improvment. I think the patch should be applied, and leave it to the kernel community's insights and judgment. > > > Just thinking out loud, I haven't seen forwarding ICMP error messages > > negotiated in other tunneling protocols like MPLS or pptp...., if I > > recall correctly, QUIC does indeed have it specified. > > So, what? They aren't L3 tunnel protocols are they? Based on RFC 4301, section 6—as referenced in the quote above—it explicitly mandates that implementing a locally configurable option/ Therefore, negotiation over IKEv2 seems unnecessary. > MPLS is L2.5, pptp is L2 and QUIC is L4. -antony
On Thu, 4 Apr 2024 12:31:56 +0200 Antony Antony wrote: > export AB="10.1" > for i in 1 2 3 4 5; do > h="host${i}" > ip netns add ${h} > ip -netns ${h} link set lo up > ip netns exec ${h} sysctl -wq net.ipv4.ip_forward=1 > if [ $i -lt 5 ]; then > ip -netns ${h} link add eth0 type veth peer name eth10${i} > ip -netns ${h} addr add "${AB}.${i}.1/24" dev eth0 > ip -netns ${h} link set up dev eth0 > fi > done > > for i in 1 2 3 4 5; do > h="host${i}" > p=$((i - 1)) > ph="host${p}" > # connect to previous host > if [ $i -gt 1 ]; then > ip -netns ${ph} link set eth10${p} netns ${h} > ip -netns ${h} link set eth10${p} name eth1 > ip -netns ${h} link set up dev eth1 > ip -netns ${h} addr add "${AB}.${p}.2/24" dev eth1 > fi > # add forward routes > for k in $(seq ${i} $((5 - 1))); do > ip -netns ${h} route 2>/dev/null | (grep "${AB}.${k}.0" 2>/dev/null) || \ > ip -netns ${h} route add "${AB}.${k}.0/24" via "${AB}.${i}.2" 2>/dev/nul > done > > # add reverse routes > for k in $(seq 1 $((i - 2))); do > ip -netns ${h} route 2>/dev/null | grep "${AB}.${k}.0" 2>/dev/null || \ > ip -netns ${h} route add "${AB}.${k}.0/24" via "${AB}.${p}.1" 2>/dev/nul > done > done > > ip netns exec host1 ping -q -W 2 -w 1 -c 1 10.1.4.2 2>&1>/dev/null && echo "success 10.1.4.2 reachable" || echo "ERROR" > ip netns exec host1 ping -W 9 -w 5 -c 1 10.1.4.3 || echo "note the source address of unreachble of gateway" > ip -netns host1 route flush cache > > ip netns exec host3 nft add table inet filter > ip netns exec host3 nft add chain inet filter FORWARD { type filter hook forward priority filter\; policy drop \; } > ip netns exec host3 nft add rule inet filter FORWARD counter ip protocol icmp drop > ip netns exec host3 nft add rule inet filter FORWARD counter ip protocol esp accept > ip netns exec host3 nft add rule inet filter FORWARD counter drop > > ip -netns host2 xfrm policy add src 10.1.1.0/24 dst 10.1.4.0/24 dir out \ > flag icmp tmpl src 10.1.2.1 dst 10.1.3.2 proto esp reqid 1 mode tunnel > > ip -netns host2 xfrm policy add src 10.1.4.0/24 dst 10.1.1.0/24 dir in \ > tmpl src 10.1.3.2 dst 10.1.2.1 proto esp reqid 2 mode tunnel > > ip -netns host2 xfrm policy add src 10.1.4.0/24 dst 10.1.1.0/24 dir fwd \ > flag icmp tmpl src 10.1.3.2 dst 10.1.2.1 proto esp reqid 2 mode tunnel > > ip -netns host2 xfrm state add src 10.1.2.1 dst 10.1.3.2 proto esp spi 1 \ > reqid 1 replay-window 1 mode tunnel aead 'rfc4106(gcm(aes))' \ > 0x1111111111111111111111111111111111111111 96 \ > sel src 10.1.1.0/24 dst 10.1.4.0/24 > > ip -netns host2 xfrm state add src 10.1.3.2 dst 10.1.2.1 proto esp spi 2 \ > flag icmp reqid 2 replay-window 10 mode tunnel aead 'rfc4106(gcm(aes))' \ > 0x2222222222222222222222222222222222222222 96 > > ip -netns host4 xfrm policy add src 10.1.4.0/24 dst 10.1.1.0/24 dir out \ > flag icmp tmpl src 10.1.3.2 dst 10.1.2.1 proto esp reqid 1 mode tunnel > > ip -netns host4 xfrm policy add src 10.1.1.0/24 dst 10.1.4.0/24 dir in \ > tmpl src 10.1.2.1 dst 10.1.3.2 proto esp reqid 2 mode tunnel > > ip -netns host4 xfrm policy add src 10.1.1.0/24 dst 10.1.4.0/24 dir fwd \ > flag icmp tmpl src 10.1.2.1 dst 10.1.3.2 proto esp reqid 2 mode tunnel > > ip -netns host4 xfrm state add src 10.1.3.2 dst 10.1.2.1 proto esp spi 2 \ > reqid 1 replay-window 1 mode tunnel aead 'rfc4106(gcm(aes))' \ > 0x2222222222222222222222222222222222222222 96 > > ip -netns host4 xfrm state add src 10.1.2.1 dst 10.1.3.2 proto esp spi 1 \ > reqid 2 replay-window 20 flag icmp mode tunnel aead 'rfc4106(gcm(aes))' \ > 0x1111111111111111111111111111111111111111 96 \ > sel src 10.1.1.0/24 dst 10.1.4.0/24 > > ip netns exec host1 ping -W 5 -c 1 10.1.4.2 2>&1 > /dev/null && echo "" > ip netns exec host1 ping -W 5 -c 1 10.1.4.3 || echo "note source address of gateway 10.1.3.2" Could you turn this into a selftest?
On Mon, Apr 08, 2024 at 19:15:34 -0700, Jakub Kicinski wrote: > On Thu, 4 Apr 2024 12:31:56 +0200 Antony Antony wrote: > > export AB="10.1" > > for i in 1 2 3 4 5; do > > h="host${i}" > > ip netns add ${h} > > ip -netns ${h} link set lo up > > ip netns exec ${h} sysctl -wq net.ipv4.ip_forward=1 > > if [ $i -lt 5 ]; then > > ip -netns ${h} link add eth0 type veth peer name eth10${i} > > ip -netns ${h} addr add "${AB}.${i}.1/24" dev eth0 > > ip -netns ${h} link set up dev eth0 > > fi > > done > > > > for i in 1 2 3 4 5; do > > h="host${i}" > > p=$((i - 1)) > > ph="host${p}" > > # connect to previous host > > if [ $i -gt 1 ]; then > > ip -netns ${ph} link set eth10${p} netns ${h} > > ip -netns ${h} link set eth10${p} name eth1 > > ip -netns ${h} link set up dev eth1 > > ip -netns ${h} addr add "${AB}.${p}.2/24" dev eth1 > > fi > > # add forward routes > > for k in $(seq ${i} $((5 - 1))); do > > ip -netns ${h} route 2>/dev/null | (grep "${AB}.${k}.0" 2>/dev/null) || \ > > ip -netns ${h} route add "${AB}.${k}.0/24" via "${AB}.${i}.2" 2>/dev/nul > > done > > > > # add reverse routes > > for k in $(seq 1 $((i - 2))); do > > ip -netns ${h} route 2>/dev/null | grep "${AB}.${k}.0" 2>/dev/null || \ > > ip -netns ${h} route add "${AB}.${k}.0/24" via "${AB}.${p}.1" 2>/dev/nul > > done > > done > > > > ip netns exec host1 ping -q -W 2 -w 1 -c 1 10.1.4.2 2>&1>/dev/null && echo "success 10.1.4.2 reachable" || echo "ERROR" > > ip netns exec host1 ping -W 9 -w 5 -c 1 10.1.4.3 || echo "note the source address of unreachble of gateway" > > ip -netns host1 route flush cache > > > > ip netns exec host3 nft add table inet filter > > ip netns exec host3 nft add chain inet filter FORWARD { type filter hook forward priority filter\; policy drop \; } > > ip netns exec host3 nft add rule inet filter FORWARD counter ip protocol icmp drop > > ip netns exec host3 nft add rule inet filter FORWARD counter ip protocol esp accept > > ip netns exec host3 nft add rule inet filter FORWARD counter drop > > > > ip -netns host2 xfrm policy add src 10.1.1.0/24 dst 10.1.4.0/24 dir out \ > > flag icmp tmpl src 10.1.2.1 dst 10.1.3.2 proto esp reqid 1 mode tunnel > > > > ip -netns host2 xfrm policy add src 10.1.4.0/24 dst 10.1.1.0/24 dir in \ > > tmpl src 10.1.3.2 dst 10.1.2.1 proto esp reqid 2 mode tunnel > > > > ip -netns host2 xfrm policy add src 10.1.4.0/24 dst 10.1.1.0/24 dir fwd \ > > flag icmp tmpl src 10.1.3.2 dst 10.1.2.1 proto esp reqid 2 mode tunnel > > > > ip -netns host2 xfrm state add src 10.1.2.1 dst 10.1.3.2 proto esp spi 1 \ > > reqid 1 replay-window 1 mode tunnel aead 'rfc4106(gcm(aes))' \ > > 0x1111111111111111111111111111111111111111 96 \ > > sel src 10.1.1.0/24 dst 10.1.4.0/24 > > > > ip -netns host2 xfrm state add src 10.1.3.2 dst 10.1.2.1 proto esp spi 2 \ > > flag icmp reqid 2 replay-window 10 mode tunnel aead 'rfc4106(gcm(aes))' \ > > 0x2222222222222222222222222222222222222222 96 > > > > ip -netns host4 xfrm policy add src 10.1.4.0/24 dst 10.1.1.0/24 dir out \ > > flag icmp tmpl src 10.1.3.2 dst 10.1.2.1 proto esp reqid 1 mode tunnel > > > > ip -netns host4 xfrm policy add src 10.1.1.0/24 dst 10.1.4.0/24 dir in \ > > tmpl src 10.1.2.1 dst 10.1.3.2 proto esp reqid 2 mode tunnel > > > > ip -netns host4 xfrm policy add src 10.1.1.0/24 dst 10.1.4.0/24 dir fwd \ > > flag icmp tmpl src 10.1.2.1 dst 10.1.3.2 proto esp reqid 2 mode tunnel > > > > ip -netns host4 xfrm state add src 10.1.3.2 dst 10.1.2.1 proto esp spi 2 \ > > reqid 1 replay-window 1 mode tunnel aead 'rfc4106(gcm(aes))' \ > > 0x2222222222222222222222222222222222222222 96 > > > > ip -netns host4 xfrm state add src 10.1.2.1 dst 10.1.3.2 proto esp spi 1 \ > > reqid 2 replay-window 20 flag icmp mode tunnel aead 'rfc4106(gcm(aes))' \ > > 0x1111111111111111111111111111111111111111 96 \ > > sel src 10.1.1.0/24 dst 10.1.4.0/24 > > > > ip netns exec host1 ping -W 5 -c 1 10.1.4.2 2>&1 > /dev/null && echo "" > > ip netns exec host1 ping -W 5 -c 1 10.1.4.3 || echo "note source address of gateway 10.1.3.2" > > Could you turn this into a selftest? I thought about it, and I didn't find any selftest file that match this test. This test need a topology with 4, ideally 5, namespaces connected in a line, and ip xfrm. git/linux/tools/testing/selftests/net/pmtu.sh is probably the easiest I can think off. git/linux/tools/testing/selftests/net/xfrm_policy.sh seems to be a bit more complex to a extra tests to. Do you have any preference? which file to add? -antonz
On Wed, 10 Apr 2024 19:48:46 +0200 Antony Antony wrote: > > Could you turn this into a selftest? > > I thought about it, and I didn't find any selftest file that match this > test. This test need a topology with 4, ideally 5, namespaces connected in a line, and ip xfrm. > > git/linux/tools/testing/selftests/net/pmtu.sh is probably the easiest I > can think off. > > git/linux/tools/testing/selftests/net/xfrm_policy.sh seems to be a bit > more complex to a extra tests to. > > Do you have any preference? which file to add? Whatever's easiest, I don't have much experience with either of those. You can also create a new file, it's perfectly fine, just make sure you add it to the makefile so kselftest knows about it
diff --git a/net/ipv4/icmp.c b/net/ipv4/icmp.c index e63a3bf99617..bec234637122 100644 --- a/net/ipv4/icmp.c +++ b/net/ipv4/icmp.c @@ -555,7 +555,6 @@ static struct rtable *icmp_route_lookup(struct net *net, XFRM_LOOKUP_ICMP); if (!IS_ERR(rt2)) { dst_release(&rt->dst); - memcpy(fl4, &fl4_dec, sizeof(*fl4)); rt = rt2; } else if (PTR_ERR(rt2) == -EPERM) { if (rt)
When enabling support for XFRM lookup using reverse ICMP payload, we have identified an issue where the source address of the IPv4, e.g., "Destination Host Unreachable" message, is incorrect. The error message received by the sender appears to originate from a non-existing/unreachable host. IPv6 seems to behave correctly; respond with an existing source address from the gateway. Here is example of incorrect source address for ICMP error response. When sending a ping to an unreachable host, the sender would receive an ICMP unreachable response with a fake source address. Rather the address of the host that generated ICMP Unreachable message. This is confusing and incorrect. Example: ping -W 9 -w 5 -c 1 10.1.4.3 PING 10.1.4.3 (10.1.4.3) 56(84) bytes of data. From 10.1.4.3 icmp_seq=1 Destination Host Unreachable Notice : packet has the source address of the ICMP "Unreachable host!" This issue can be traced back to commit 415b3334a21a ("icmp: Fix regression in nexthop resolution during replies.") which introduced a change that copied the source address from the ICMP payload. This commit would force to use source address from the gatway/host. The ICMP error message source address correctly set from the host. After fixing: ping -W 5 -c 1 10.1.4.3 PING 10.1.4.3 (10.1.4.3) 56(84) bytes of data. From 10.1.3.2 icmp_seq=1 Destination Host Unreachable Here is an snippt to reporduce the issue. export AB="10.1" for i in 1 2 3 4 5; do h="host${i}" ip netns add ${h} ip -netns ${h} link set lo up ip netns exec ${h} sysctl -wq net.ipv4.ip_forward=1 if [ $i -lt 5 ]; then ip -netns ${h} link add eth0 type veth peer name eth10${i} ip -netns ${h} addr add "${AB}.${i}.1/24" dev eth0 ip -netns ${h} link set up dev eth0 fi done for i in 1 2 3 4 5; do h="host${i}" p=$((i - 1)) ph="host${p}" # connect to previous host if [ $i -gt 1 ]; then ip -netns ${ph} link set eth10${p} netns ${h} ip -netns ${h} link set eth10${p} name eth1 ip -netns ${h} link set up dev eth1 ip -netns ${h} addr add "${AB}.${p}.2/24" dev eth1 fi # add forward routes for k in $(seq ${i} $((5 - 1))); do ip -netns ${h} route 2>/dev/null | (grep "${AB}.${k}.0" 2>/dev/null) || \ ip -netns ${h} route add "${AB}.${k}.0/24" via "${AB}.${i}.2" 2>/dev/nul done # add reverse routes for k in $(seq 1 $((i - 2))); do ip -netns ${h} route 2>/dev/null | grep "${AB}.${k}.0" 2>/dev/null || \ ip -netns ${h} route add "${AB}.${k}.0/24" via "${AB}.${p}.1" 2>/dev/nul done done ip netns exec host1 ping -q -W 2 -w 1 -c 1 10.1.4.2 2>&1>/dev/null && echo "success 10.1.4.2 reachable" || echo "ERROR" ip netns exec host1 ping -W 9 -w 5 -c 1 10.1.4.3 || echo "note the source address of unreachble of gateway" ip -netns host1 route flush cache ip netns exec host3 nft add table inet filter ip netns exec host3 nft add chain inet filter FORWARD { type filter hook forward priority filter\; policy drop \; } ip netns exec host3 nft add rule inet filter FORWARD counter ip protocol icmp drop ip netns exec host3 nft add rule inet filter FORWARD counter ip protocol esp accept ip netns exec host3 nft add rule inet filter FORWARD counter drop ip -netns host2 xfrm policy add src 10.1.1.0/24 dst 10.1.4.0/24 dir out \ flag icmp tmpl src 10.1.2.1 dst 10.1.3.2 proto esp reqid 1 mode tunnel ip -netns host2 xfrm policy add src 10.1.4.0/24 dst 10.1.1.0/24 dir in \ tmpl src 10.1.3.2 dst 10.1.2.1 proto esp reqid 2 mode tunnel ip -netns host2 xfrm policy add src 10.1.4.0/24 dst 10.1.1.0/24 dir fwd \ flag icmp tmpl src 10.1.3.2 dst 10.1.2.1 proto esp reqid 2 mode tunnel ip -netns host2 xfrm state add src 10.1.2.1 dst 10.1.3.2 proto esp spi 1 \ reqid 1 replay-window 1 mode tunnel aead 'rfc4106(gcm(aes))' \ 0x1111111111111111111111111111111111111111 96 \ sel src 10.1.1.0/24 dst 10.1.4.0/24 ip -netns host2 xfrm state add src 10.1.3.2 dst 10.1.2.1 proto esp spi 2 \ flag icmp reqid 2 replay-window 10 mode tunnel aead 'rfc4106(gcm(aes))' \ 0x2222222222222222222222222222222222222222 96 ip -netns host4 xfrm policy add src 10.1.4.0/24 dst 10.1.1.0/24 dir out \ flag icmp tmpl src 10.1.3.2 dst 10.1.2.1 proto esp reqid 1 mode tunnel ip -netns host4 xfrm policy add src 10.1.1.0/24 dst 10.1.4.0/24 dir in \ tmpl src 10.1.2.1 dst 10.1.3.2 proto esp reqid 2 mode tunnel ip -netns host4 xfrm policy add src 10.1.1.0/24 dst 10.1.4.0/24 dir fwd \ flag icmp tmpl src 10.1.2.1 dst 10.1.3.2 proto esp reqid 2 mode tunnel ip -netns host4 xfrm state add src 10.1.3.2 dst 10.1.2.1 proto esp spi 2 \ reqid 1 replay-window 1 mode tunnel aead 'rfc4106(gcm(aes))' \ 0x2222222222222222222222222222222222222222 96 ip -netns host4 xfrm state add src 10.1.2.1 dst 10.1.3.2 proto esp spi 1 \ reqid 2 replay-window 20 flag icmp mode tunnel aead 'rfc4106(gcm(aes))' \ 0x1111111111111111111111111111111111111111 96 \ sel src 10.1.1.0/24 dst 10.1.4.0/24 ip netns exec host1 ping -W 5 -c 1 10.1.4.2 2>&1 > /dev/null && echo "" ip netns exec host1 ping -W 5 -c 1 10.1.4.3 || echo "note source address of gateway 10.1.3.2" Again before the fix ping -W 5 -c 1 10.1.4.3 From 10.1.4.3 icmp_seq=1 Destination Host Unreachable After the fix From 10.1.3.2 icmp_seq=1 Destination Host Unreachable Signed-off-by: Antony Antony <antony.antony@secunet.com> --- net/ipv4/icmp.c | 1 - 1 file changed, 1 deletion(-) -- 2.30.2