[net] netfilter: nf_nat: fix action not being set for all ct states

Message ID: 20231221224311.130319-1-brad@faucet.nz
State: Awaiting Upstream
Delegated to: Netdev Maintainers

Checks

Context                         Check    Description
netdev/series_format            success  Single patches do not need cover letters
netdev/tree_selection           success  Clearly marked for net
netdev/ynl                      success  SINGLE THREAD; Generated files up to date; no warnings/errors; no diff in generated;
netdev/fixes_present            success  Fixes tag present in non-next series
netdev/header_inline            success  No static functions without inline keyword in header files
netdev/build_32bit              success  Errors and warnings before: 1113 this patch: 1113
netdev/cc_maintainers           fail     2 blamed authors not CCed: lucien.xin@gmail.com aconole@redhat.com; 3 maintainers not CCed: coreteam@netfilter.org aconole@redhat.com lucien.xin@gmail.com
netdev/build_clang              success  Errors and warnings before: 1140 this patch: 1140
netdev/verify_signedoff         success  Signed-off-by tag matches author and committer
netdev/deprecated_api           success  None detected
netdev/check_selftest           success  No net selftest shell script
netdev/verify_fixes             success  Fixes tag looks correct
netdev/build_allmodconfig_warn  success  Errors and warnings before: 1140 this patch: 1140
netdev/checkpatch               success  total: 0 errors, 0 warnings, 0 checks, 11 lines checked
netdev/build_clang_rust         success  No Rust files in patch. Skipping build
netdev/kdoc                     success  Errors and warnings before: 0 this patch: 0
netdev/source_inline            success  Was 0 now: 0

Commit Message

Brad Cowie Dec. 21, 2023, 10:43 p.m. UTC
This fixes openvswitch's handling of nat packets in the related state.

In nf_ct_nat_execute(), which is called from nf_ct_nat(), ICMP/ICMPv6
packets in the IP_CT_RELATED or IP_CT_RELATED_REPLY state, which have
not been dropped, will follow the goto, however the placement of the
goto label means that updating the action bit field will be bypassed.

This causes ovs_nat_update_key() to not be called from ovs_ct_nat()
which means the openvswitch match key for the ICMP/ICMPv6 packet is not
updated and the pre-nat value will be retained for the key, which will
result in the wrong openflow rule being matched for that packet.

Move the goto label above where the action bit field is being set so
that it is updated in all cases where the packet is accepted.

Fixes: ebddb1404900 ("net: move the nat function to nf_nat_ovs for ovs and tc")
Signed-off-by: Brad Cowie <brad@faucet.nz>
---
 net/netfilter/nf_nat_ovs.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)
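
The label-placement problem described in the commit message can be reproduced in a few lines of userspace C. This is only a sketch, not the kernel function: the enum, both function names, and the collapsed "related" branch are assumptions made for illustration. Only NF_ACCEPT, BIT() and the position of the out label mirror the patch.

```c
#include <assert.h>

/* Userspace stand-ins for the kernel definitions used by the patch. */
#define NF_DROP   0
#define NF_ACCEPT 1
#define BIT(n)    (1UL << (n))

/* Hypothetical, simplified conntrack states for this sketch. */
enum sketch_state { SKETCH_ESTABLISHED, SKETCH_RELATED };

/* Label below the update: related packets skip the action bit. */
static int execute_label_below(enum sketch_state state, int maniptype,
			       unsigned long *action)
{
	int err = NF_ACCEPT;

	if (state == SKETCH_RELATED)
		goto out;	/* accepted ICMP error takes the goto */

	/* nf_nat_packet() would set err here for the other states. */
	if (err == NF_ACCEPT)
		*action |= BIT(maniptype);
out:
	return err;
}

/* Label above the update: every accepted packet sets the action bit. */
static int execute_label_above(enum sketch_state state, int maniptype,
			       unsigned long *action)
{
	int err = NF_ACCEPT;

	if (state == SKETCH_RELATED)
		goto out;

	/* nf_nat_packet() would set err here for the other states. */
out:
	if (err == NF_ACCEPT)
		*action |= BIT(maniptype);

	return err;
}

/* Both variants accept the packet; only the second records the action. */
static void check_label_placement(void)
{
	unsigned long below = 0, above = 0;

	assert(execute_label_below(SKETCH_RELATED, 1, &below) == NF_ACCEPT);
	assert(below == 0);
	assert(execute_label_above(SKETCH_RELATED, 1, &above) == NF_ACCEPT);
	assert(above == BIT(1));
}
```

In both variants the return value is NF_ACCEPT, so a caller that looks only at the verdict cannot tell them apart. The difference shows up solely in the action bit field.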

Comments

Simon Horman Dec. 23, 2023, 9:13 p.m. UTC | #1
+ Xin Long <lucien.xin@gmail.com>
  Aaron Conole <aconole@redhat.com>
  coreteam@netfilter.org

On Fri, Dec 22, 2023 at 11:43:11AM +1300, Brad Cowie wrote:
> This fixes openvswitch's handling of nat packets in the related state.
> 
> In nf_ct_nat_execute(), which is called from nf_ct_nat(), ICMP/ICMPv6
> packets in the IP_CT_RELATED or IP_CT_RELATED_REPLY state, which have
> not been dropped, will follow the goto, however the placement of the
> goto label means that updating the action bit field will be bypassed.
> 
> This causes ovs_nat_update_key() to not be called from ovs_ct_nat()
> which means the openvswitch match key for the ICMP/ICMPv6 packet is not
> updated and the pre-nat value will be retained for the key, which will
> result in the wrong openflow rule being matched for that packet.
> 
> Move the goto label above where the action bit field is being set so
> that it is updated in all cases where the packet is accepted.
> 
> Fixes: ebddb1404900 ("net: move the nat function to nf_nat_ovs for ovs and tc")
> Signed-off-by: Brad Cowie <brad@faucet.nz>

Thanks Brad,

I agree with your analysis and that the problem appears to
have been introduced by the cited commit.

I am curious to know what use case triggers this /
why it went unnoticed for a year.

But in any case, this fix looks good to me.

Reviewed-by: Simon Horman <horms@kernel.org>

> ---
>  net/netfilter/nf_nat_ovs.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/net/netfilter/nf_nat_ovs.c b/net/netfilter/nf_nat_ovs.c
> index 551abd2da614..0f9a559f6207 100644
> --- a/net/netfilter/nf_nat_ovs.c
> +++ b/net/netfilter/nf_nat_ovs.c
> @@ -75,9 +75,10 @@ static int nf_ct_nat_execute(struct sk_buff *skb, struct nf_conn *ct,
>  	}
>  
>  	err = nf_nat_packet(ct, ctinfo, hooknum, skb);
> +out:
>  	if (err == NF_ACCEPT)
>  		*action |= BIT(maniptype);
> -out:
> +
>  	return err;
>  }
>  
> -- 
> 2.34.1
> 
> _______________________________________________
> dev mailing list
> dev@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>

Brad Cowie Dec. 24, 2023, 2:47 a.m. UTC | #2
On Sun, 24 Dec 2023 at 10:13, Simon Horman <horms@kernel.org> wrote:
> Thanks Brad,
>
> I agree with your analysis and that the problem appears to
> have been introduced by the cited commit.

Thanks for the review Simon.

> I am curious to know what use case triggers this /
> why it went unnoticed for a year.

We encountered this issue while upgrading some routers from
linux 5.15 to 6.2. The dataplane on these routers is provided
by an openvswitch bridge which is controlled via openflow by
faucet. These routers are also performing SNAT on all traffic
to/from the wan interface via openvswitch conntrack openflow
rules.

We noticed that after upgrading the linux kernel, traceroute/mtr
no longer worked when run from clients behind the router.
We eventually discovered the reason for this is that the
ICMP time exceeded messages elicited by traceroute were
matching openflow rules with the incorrect destination ip,
despite there being an openflow rule to undo the nat.
Other packets in the established or new state matched the
expected openflow rules.
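
The effect on the match key can be sketched as follows. This is a hedged userspace model, not the openvswitch code: struct flow_key, update_key_if_flagged(), MANIP_DST and the example addresses are all hypothetical. It only illustrates the behaviour described in the commit message, where the key is refreshed from the translated packet only when the NAT step has set the corresponding action bit.

```c
#include <assert.h>
#include <stdint.h>

#define BIT(n)    (1UL << (n))
#define MANIP_DST 1	/* stand-in for a destination-NAT manip type */

/* Hypothetical, heavily reduced match key: only the IPv4 destination. */
struct flow_key {
	uint32_t ipv4_dst;
};

/* Refresh the key from the packet only if the NAT step flagged a change. */
static void update_key_if_flagged(struct flow_key *key, unsigned long action,
				  uint32_t pkt_dst)
{
	if (action & BIT(MANIP_DST))
		key->ipv4_dst = pkt_dst;
}

static void check_key_refresh(void)
{
	const uint32_t wan_ip = 0xCB007101;	/* 203.0.113.1, example only */
	const uint32_t client_ip = 0xC0A8010A;	/* 192.168.1.10, example only */
	struct flow_key stale = { .ipv4_dst = wan_ip };
	struct flow_key fresh = { .ipv4_dst = wan_ip };

	/* Action bit missing: the key keeps the pre-NAT destination. */
	update_key_if_flagged(&stale, 0, client_ip);
	assert(stale.ipv4_dst == wan_ip);

	/* Action bit set: the key follows the translated packet. */
	update_key_if_flagged(&fresh, BIT(MANIP_DST), client_ip);
	assert(fresh.ipv4_dst == client_ip);
}
```

With the stale key, a later lookup would still see the WAN-side destination, which matches the symptom reported here for the ICMP time exceeded messages.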

A git bisect between 5.15 and 6.2 showed that this change in
behaviour was introduced by commit ebddb1404900. After the
above patch is applied our routers perform nat correctly
again for traceroute/mtr.
Xin Long Dec. 28, 2023, 3:59 p.m. UTC | #3
On Sat, Dec 23, 2023 at 9:48 PM Brad Cowie <brad@faucet.nz> wrote:
>
> On Sun, 24 Dec 2023 at 10:13, Simon Horman <horms@kernel.org> wrote:
> > Thanks Brad,
> >
> > I agree with your analysis and that the problem appears to
> > have been introduced by the cited commit.
>
> Thanks for the review Simon.
>
> > I am curious to know what use case triggers this /
> > why it went unnoticed for a year.
>
> We encountered this issue while upgrading some routers from
> linux 5.15 to 6.2. The dataplane on these routers is provided
> by an openvswitch bridge which is controlled via openflow by
> faucet. These routers are also performing SNAT on all traffic
> to/from the wan interface via openvswitch conntrack openflow
> rules.
>
> We noticed that after upgrading the linux kernel, traceroute/mtr
> no longer worked when run from clients behind the router.
> We eventually discovered the reason for this is that the
> ICMP time exceeded messages elicited by traceroute were
> matching openflow rules with the incorrect destination ip,
> despite there being an openflow rule to undo the nat.
> Other packets in the established or new state matched the
> expected openflow rules.
>
> A git bisect between 5.15 and 6.2 showed that this change in
> behaviour was introduced by commit ebddb1404900. After the
> above patch is applied our routers perform nat correctly
> again for traceroute/mtr.

Acked-by: Xin Long <lucien.xin@gmail.com>

Aaron Conole Jan. 2, 2024, 3:10 p.m. UTC | #4
Simon Horman <horms@kernel.org> writes:

> + Xin Long <lucien.xin@gmail.com>
>   Aaron Conole <aconole@redhat.com>
>   coreteam@netfilter.org
>
> On Fri, Dec 22, 2023 at 11:43:11AM +1300, Brad Cowie wrote:
>> This fixes openvswitch's handling of nat packets in the related state.
>> 
>> In nf_ct_nat_execute(), which is called from nf_ct_nat(), ICMP/ICMPv6
>> packets in the IP_CT_RELATED or IP_CT_RELATED_REPLY state, which have
>> not been dropped, will follow the goto, however the placement of the
>> goto label means that updating the action bit field will be bypassed.
>> 
>> This causes ovs_nat_update_key() to not be called from ovs_ct_nat()
>> which means the openvswitch match key for the ICMP/ICMPv6 packet is not
>> updated and the pre-nat value will be retained for the key, which will
>> result in the wrong openflow rule being matched for that packet.
>> 
>> Move the goto label above where the action bit field is being set so
>> that it is updated in all cases where the packet is accepted.
>> 
>> Fixes: ebddb1404900 ("net: move the nat function to nf_nat_ovs for ovs and tc")
>> Signed-off-by: Brad Cowie <brad@faucet.nz>
>
> Thanks Brad,
>
> I agree with your analysis and that the problem appears to
> have been introduced by the cited commit.
>
> I am curious to know what use case triggers this /
> why it went unnoticed for a year.
>
> But in any case, this fix looks good to me.
>
> Reviewed-by: Simon Horman <horms@kernel.org>
>
>> ---

LGTM.  I guess we should try to codify the specific flows that were used
to flag this into the ovs selftest - we clearly have a missing case
after NAT lookup.

I'll add it to my (ever growing) list.

Meanwhile,

Acked-by: Aaron Conole <aconole@redhat.com>

>>  net/netfilter/nf_nat_ovs.c | 3 ++-
>>  1 file changed, 2 insertions(+), 1 deletion(-)
>> 
>> diff --git a/net/netfilter/nf_nat_ovs.c b/net/netfilter/nf_nat_ovs.c
>> index 551abd2da614..0f9a559f6207 100644
>> --- a/net/netfilter/nf_nat_ovs.c
>> +++ b/net/netfilter/nf_nat_ovs.c
>> @@ -75,9 +75,10 @@ static int nf_ct_nat_execute(struct sk_buff *skb, struct nf_conn *ct,
>>  	}
>>  
>>  	err = nf_nat_packet(ct, ctinfo, hooknum, skb);
>> +out:
>>  	if (err == NF_ACCEPT)
>>  		*action |= BIT(maniptype);
>> -out:
>> +
>>  	return err;
>>  }
>>  
>> -- 
>> 2.34.1
>> 
>> _______________________________________________
>> dev mailing list
>> dev@openvswitch.org
>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>>

Pablo Neira Ayuso Jan. 3, 2024, 10:26 a.m. UTC | #5
Applied to nf.git, thanks everyone for reviewing.

Brad Cowie Jan. 4, 2024, 5:04 a.m. UTC | #6
On Wed, 3 Jan 2024 at 04:10, Aaron Conole <aconole@redhat.com> wrote:

> LGTM.  I guess we should try to codify the specific flows that were used
> to flag this into the ovs selftest - we clearly have a missing case
> after NAT lookup.

Thanks for the review Aaron, and the sensible suggestion to add a
test to ovs to avoid this problem occurring again in future.

I've simplified our NAT ruleset and turned it into an ovs system test,
which I've submitted as a patch [1] to ovs-dev. The test reproduces
the issue introduced by ebddb1404900 and passes when e6345d2824a3
is applied.

[1]: https://mail.openvswitch.org/pipermail/ovs-dev/2024-January/410476.html

Patch

diff --git a/net/netfilter/nf_nat_ovs.c b/net/netfilter/nf_nat_ovs.c
index 551abd2da614..0f9a559f6207 100644
--- a/net/netfilter/nf_nat_ovs.c
+++ b/net/netfilter/nf_nat_ovs.c
@@ -75,9 +75,10 @@  static int nf_ct_nat_execute(struct sk_buff *skb, struct nf_conn *ct,
 	}
 
 	err = nf_nat_packet(ct, ctinfo, hooknum, skb);
+out:
 	if (err == NF_ACCEPT)
 		*action |= BIT(maniptype);
-out:
+
 	return err;
 }