diff mbox series

[v2,ipsec-next,2/2] xfrm: fix source address in icmp error generation from IPsec gateway

Message ID 300c36a0644b63228cee8d0a74be0e1e81d0fe98.1698394516.git.antony.antony@secunet.com (mailing list archive)
State Awaiting Upstream
Delegated to: Netdev Maintainers
Headers show
Series None | expand

Commit Message

Antony Antony Oct. 27, 2023, 8:16 a.m. UTC
When enabling support for xfrm lookup using reverse ICMP payload,
We have identified an issue where the source address of the IPv4 e.g
"Destination Host Unreachable" message is incorrect. The IPv6 appear
to do the right thing.

Here is example of incorrect source address for ICMP error response.
When sending a ping to an unreachable host, the sender would receive an
ICMP unreachable response with a fake source address. Rather the address
of the host that generated ICMP Unreachable message. This is confusing
and incorrect.

Example:
ping -W 9 -w 5 -c 1 10.1.4.3
PING 10.1.4.3 (10.1.4.3) 56(84) bytes of data.
>From 10.1.4.3 icmp_seq=1 Destination Host Unreachable

Notice : packet has the source address of the ICMP "Unreachable host!"

This issue can be traced back to commit
415b3334a21a ("icmp: Fix regression in nexthop resolution during replies.")
which introduced a change that copied the source address from the ICMP
payload.

This commit would force to use source address from the gatway/host.
The ICMP error message source address correctly set from the host.

After fixing:
ping -W 5 -c 1 10.1.4.3
PING 10.1.4.3 (10.1.4.3) 56(84) bytes of data.
>From 10.1.3.2 icmp_seq=1 Destination Host Unreachable

Here is an example to reporduce:

export AB="10.1"
for i in 1 2 3 4 5; do
        h="host${i}"
        ip netns add ${h}
        ip -netns ${h} link set lo up
        ip netns exec ${h} sysctl -wq net.ipv4.ip_forward=1
        if [ $i -lt 5 ]; then
                ip -netns ${h} link add eth0 type veth peer name eth10${i}
                ip -netns ${h} addr add "${AB}.${i}.1/24" dev eth0
                ip -netns ${h} link set up dev eth0
        fi
done

for i in 1 2 3 4 5; do
        h="host${i}"
        p=$((i - 1))
        ph="host${p}"
        # connect to previous host
        if [ $i -gt 1 ]; then
                ip -netns ${ph} link set eth10${p} netns ${h}
                ip -netns ${h} link set eth10${p} name eth1
                ip -netns ${h} link set up dev eth1
                ip -netns ${h} addr add "${AB}.${p}.2/24" dev eth1
        fi
        # add forward routes
        for k in $(seq ${i} $((5 - 1))); do
                ip -netns ${h} route 2>/dev/null | (grep "${AB}.${k}.0" 2>/dev/null) || \
                ip -netns ${h} route add "${AB}.${k}.0/24" via "${AB}.${i}.2" 2>/dev/nul
        done

        # add reverse routes
        for k in $(seq 1 $((i - 2))); do
                ip -netns ${h} route 2>/dev/null | grep "${AB}.${k}.0" 2>/dev/null || \
                ip -netns ${h} route add "${AB}.${k}.0/24" via "${AB}.${p}.1" 2>/dev/nul
        done
done

ip netns exec host1 ping -q -W 2 -w 1 -c 1 10.1.4.2 2>&1>/dev/null && echo "success 10.1.4.2 reachable" || echo "ERROR"
ip netns exec host1 ping -W 9 -w 5 -c 1 10.1.4.3 || echo  "note the source address of unreachble"
ip -netns host1 route flush cache

ip netns exec host3 nft add table inet filter
ip netns exec host3 nft add chain inet filter FORWARD { type filter hook forward priority filter\; policy drop \; }
ip netns exec host3 nft add rule inet filter FORWARD counter ip protocol icmp drop
ip netns exec host3 nft add rule inet filter FORWARD counter ip protocol esp accept
ip netns exec host3 nft add rule inet filter FORWARD counter drop

ip -netns host2 xfrm policy add src 10.1.1.0/24 dst 10.1.4.0/24 dir out \
        flag icmp tmpl src 10.1.2.1 dst 10.1.3.2 proto esp reqid 1 mode tunnel

ip -netns host2 xfrm policy add src 10.1.4.0/24 dst 10.1.1.0/24 dir in \
        tmpl src 10.1.3.2 dst 10.1.2.1 proto esp reqid 2 mode tunnel

ip -netns host2 xfrm policy add src 10.1.4.0/24 dst 10.1.1.0/24 dir fwd \
        flag icmp tmpl src 10.1.3.2 dst 10.1.2.1 proto esp reqid 2 mode tunnel

ip -netns host2 xfrm state add src 10.1.2.1 dst 10.1.3.2 proto esp spi 1 \
        reqid 1 replay-window 1  mode tunnel aead 'rfc4106(gcm(aes))' \
        0x1111111111111111111111111111111111111111 96 \
        sel src 10.1.1.0/24 dst 10.1.4.0/24

ip -netns host2 xfrm state add src 10.1.3.2 dst 10.1.2.1 proto esp spi 2 \
        flag icmp reqid 2 replay-window 10 mode tunnel aead 'rfc4106(gcm(aes))' \
        0x2222222222222222222222222222222222222222 96

ip -netns host4 xfrm policy add src 10.1.4.0/24 dst 10.1.1.0/24 dir out \
        flag icmp tmpl src 10.1.3.2 dst 10.1.2.1 proto esp reqid 1 mode tunnel

ip -netns host4 xfrm policy add src 10.1.1.0/24 dst 10.1.4.0/24 dir in \
        tmpl src 10.1.2.1 dst 10.1.3.2 proto esp reqid 2  mode tunnel

ip -netns host4 xfrm policy add src 10.1.1.0/24 dst 10.1.4.0/24 dir fwd \
                flag icmp tmpl src 10.1.2.1 dst 10.1.3.2 proto esp reqid 2 mode tunnel

ip -netns host4 xfrm state add src 10.1.3.2 dst 10.1.2.1 proto esp spi 2 \
        reqid 1 replay-window 1 mode tunnel aead 'rfc4106(gcm(aes))' \
        0x2222222222222222222222222222222222222222 96

ip -netns host4 xfrm state add src 10.1.2.1 dst 10.1.3.2 proto esp spi 1 \
        reqid 2 replay-window 20 flag icmp  mode tunnel aead 'rfc4106(gcm(aes))' \
        0x1111111111111111111111111111111111111111 96 \
        sel src 10.1.1.0/24 dst 10.1.4.0/24

ip netns exec host1 ping -W 5 -c 1 10.1.4.2 2>&1 > /dev/null && echo ""
ip netns exec host1 ping -W 5 -c 1 10.1.4.3 || echo "note source address"

Again before the fix
ping -W 5 -c 1 10.1.4.3
>From 10.1.4.3 icmp_seq=1 Destination Host Unreachable

After the fix
>From 10.1.3.2 icmp_seq=1 Destination Host Unreachable

Signed-off-by: Antony Antony <antony.antony@secunet.com>
---
 net/ipv4/icmp.c | 1 -
 1 file changed, 1 deletion(-)

Comments

Michael Richardson Oct. 27, 2023, 1:30 p.m. UTC | #1
Antony Antony via Devel <devel@linux-ipsec.org> wrote:
    > When enabling support for xfrm lookup using reverse ICMP payload,
    > We have identified an issue where the source address of the IPv4 e.g
    > "Destination Host Unreachable" message is incorrect. The IPv6 appear
    > to do the right thing.

One thing that operators of routers with a multitude of interfaces want to do
is send all ICMP messages from a specific IP address.  Often the public
address, that has the sane reverse DNS name.
AFAIK, this is not an option on Linux, but Cisco/Juniper/etc. devices usually
can do this.  I can't recall how today. (I was actually looking that up this week)

This can conflict however, with the need to get the result back into the
tunnel.  I don't have a good answer, except that we probably need a fair bit
of flexibility, with some good automatically discovered defaults.
Antony Antony Oct. 29, 2023, 10:26 a.m. UTC | #2
On Fri, Oct 27, 2023 at 09:30:07AM -0400, Michael Richardson via Devel wrote:
> 
> Antony Antony via Devel <devel@linux-ipsec.org> wrote:
>     > When enabling support for xfrm lookup using reverse ICMP payload,
>     > We have identified an issue where the source address of the IPv4 e.g
>     > "Destination Host Unreachable" message is incorrect. The IPv6 appear
>     > to do the right thing.
> 
> One thing that operators of routers with a multitude of interfaces want to do
> is send all ICMP messages from a specific IP address.  Often the public
> address, that has the sane reverse DNS name.

While it makes sense for routers with multiple interfaces, receiving ICMP 
errors from private addresses can be confusing. However, wouldn't this also 
make it more challenging to adhere to BCP 32 and BCP 38? Routing with 
multiple interfaces is tricky on Linux, especially when it comes to 
compliance with these BCPs.

> AFAIK, this is not an option on Linux, but Cisco/Juniper/etc. devices usually
> can do this.  I can't recall how today. (I was actually looking that up this week)

I wonder if a netfilter rule would be a solution for you, something like:

"ip protocol icmp type <error codes> snat to x.x.x.x"

I would love see a simple option instead of a SNAT hack. May be a routing 
rule that will choose sourse address for ICMP error code.

> This can conflict however, with the need to get the result back into the
> tunnel.  I don't have a good answer, except that we probably need a fair bit

Forwarding that ICMP packet, which is not covered by your forward IPsec 
policy, would be fixed with the second patch in this series. In that case 
lookup would using the orginal packet as describe in RFC 4301, Section 6.

-antony
Michael Richardson Oct. 31, 2023, 7:59 a.m. UTC | #3
Antony Antony <antony@phenome.org> wrote:
    > On Fri, Oct 27, 2023 at 09:30:07AM -0400, Michael Richardson via Devel
    > wrote:
    >>
    >> Antony Antony via Devel <devel@linux-ipsec.org> wrote: > When enabling
    >> support for xfrm lookup using reverse ICMP payload, > We have
    >> identified an issue where the source address of the IPv4 e.g >
    >> "Destination Host Unreachable" message is incorrect. The IPv6 appear >
    >> to do the right thing.
    >>
    >> One thing that operators of routers with a multitude of interfaces
    >> want to do is send all ICMP messages from a specific IP address.
    >> Often the public address, that has the sane reverse DNS name.

    > While it makes sense for routers with multiple interfaces, receiving
    > ICMP errors from private addresses can be confusing. However, wouldn't
    > this also make it more challenging to adhere to BCP 32 and BCP 38?
    > Routing with multiple interfaces is tricky on Linux, especially when it
    > comes to compliance with these BCPs.

Yes, that's why sending from a public, topically significant source address
is really important.  Yet, many links are numbered in 1918 because..

    > I wonder if a netfilter rule would be a solution for you, something
    > like:

    > I would love see a simple option instead of a SNAT hack. May be a
    > routing rule that will choose sourse address for ICMP error code.

yeah, I really don't want to do SNAT stuff.
I'd like to have a flag on each interface that says to use the "global" ICMP
source or use the heuristic we have now.  And then we need a way to set that
source address.  Most routing platforms put a /32 address (and /128) on lo
(or a dummy) as the device's reachable address, and then spread that through
OSPF.
Steffen Klassert Nov. 17, 2023, 9:21 a.m. UTC | #4
On Fri, Oct 27, 2023 at 10:16:52AM +0200, Antony Antony wrote:
> 
> diff --git a/net/ipv4/icmp.c b/net/ipv4/icmp.c
> index e63a3bf99617..bec234637122 100644
> --- a/net/ipv4/icmp.c
> +++ b/net/ipv4/icmp.c
> @@ -555,7 +555,6 @@ static struct rtable *icmp_route_lookup(struct net *net,
>  					    XFRM_LOOKUP_ICMP);
>  	if (!IS_ERR(rt2)) {
>  		dst_release(&rt->dst);
> -		memcpy(fl4, &fl4_dec, sizeof(*fl4));

This is not really IPsec code. The change needs either an
Ack of one of the netdev Maintainers, or it has to go
through the nedev tree. Also, please consider this as
a fix.
Antony Antony Nov. 25, 2023, 11:15 p.m. UTC | #5
On Fri, Nov 17, 2023 at 10:21:37AM +0100, Steffen Klassert via Devel wrote:
> On Fri, Oct 27, 2023 at 10:16:52AM +0200, Antony Antony wrote:
> > 
> > diff --git a/net/ipv4/icmp.c b/net/ipv4/icmp.c
> > index e63a3bf99617..bec234637122 100644
> > --- a/net/ipv4/icmp.c
> > +++ b/net/ipv4/icmp.c
> > @@ -555,7 +555,6 @@ static struct rtable *icmp_route_lookup(struct net *net,
> >  					    XFRM_LOOKUP_ICMP);
> >  	if (!IS_ERR(rt2)) {
> >  		dst_release(&rt->dst);
> > -		memcpy(fl4, &fl4_dec, sizeof(*fl4));
> 
> This is not really IPsec code. The change needs either an
> Ack of one of the netdev Maintainers, or it has to go

I understand your concern. I chose to submit the change to ipsec-next as it 
is directly related to the outcome of a successful xfrm_lookup().

> through the nedev tree. Also, please consider this as
> a fix.

It is a fix:) I considered including a 'Fixes:' tag initially but ultimately 
decided against it.  My hesitation stemmed from the concern that if this fix 
were backported, it could inadvertently trigger regressions in someone’s 
test suite. This might lead to requests for a revert through the ipsec tree, 
which I am keen to avoid.

However, I do concur that this submission qualifies as a fix. Is there a way
to include the 'Fixes:' tag while also advising against backporting it to 
reduce the risk of potential regressions?

I will add the 'Fixes:' tag to the new version. When it comes to backport I 
will recomend not to backport this fix. Please keep an eye out for those 
messages. This could get backported to all curently maintained releases!

The key reason for pairing this update with my other patch ("xfrm: introduce 
forwarding of ICMP Error messages") is to proactively address any potential 
claims of a regression. Without this new patch, it's  conceivable that the 
changes could be misinterpreted as causing a regression, especially 
considering that the commit this patch addresses is 12 years old! By 
submitting them together, it should help clarify that these changes are, in 
fact, rectifying long-standing issues rather than introducing new ones.

I believe applying two patches together will provide a clearer context for 
both the changes and help streamline their acceptance and integration.

thanks,
-antony
diff mbox series

Patch

diff --git a/net/ipv4/icmp.c b/net/ipv4/icmp.c
index e63a3bf99617..bec234637122 100644
--- a/net/ipv4/icmp.c
+++ b/net/ipv4/icmp.c
@@ -555,7 +555,6 @@  static struct rtable *icmp_route_lookup(struct net *net,
 					    XFRM_LOOKUP_ICMP);
 	if (!IS_ERR(rt2)) {
 		dst_release(&rt->dst);
-		memcpy(fl4, &fl4_dec, sizeof(*fl4));
 		rt = rt2;
 	} else if (PTR_ERR(rt2) == -EPERM) {
 		if (rt)