Message ID | 20210614193440.3813-1-olek2@wp.pl (mailing list archive) |
---|---|
State | Awaiting Upstream |
Delegated to: | Netdev Maintainers |
Headers | show |
Series | [nf] Revert "netfilter: flowtable: Remove redundant hw refresh bit" | expand |
Context | Check | Description |
---|---|---|
netdev/tree_selection | success | Guessing tree name failed - patch did not apply |
Hi, On Mon, Jun 14, 2021 at 09:34:40PM +0200, Aleksander Jan Bajkowski wrote: > This reverts commit c07531c01d8284aedaf95708ea90e76d11af0e21. > > The previously mentioned commit significantly reduces NAT performance > in OpenWRT. Another user reports a high ping issue. The results of > IPv4 NAT benchmark on BT Home Hub 5A (with software flow offloading): > * 5.4.124 515 Mb/s > * 5.10.41 570 Mb/s > * 5.10.42 250 Mb/s > * 5.10.42 + revert 580 Mb/s > > Reverting this commit fixes this issue. The xt_flowoffload module is inconditionally setting on the hardware offload flag: static int __init xt_flowoffload_tg_init(void) { int ret; register_netdevice_notifier(&flow_offload_netdev_notifier); ret = init_flowtable(&flowtable[0]); if (ret) return ret; ret = init_flowtable(&flowtable[1]); if (ret) goto cleanup; flowtable[1].ft.flags = NF_FLOWTABLE_HW_OFFLOAD; [...] which is triggering the slow down because packet path is allocating work to offload the entry to hardware, however, this driver does not support for hardware offload. Probably this module can be updated to unset the flowtable flag if the harware does not support hardware offload.
Hi Aleksander, > The xt_flowoffload module is inconditionally setting on the hardware > offload flag: [...] > > which is triggering the slow down because packet path is allocating > work to offload the entry to hardware, however, this driver does not > support for hardware offload. > > Probably this module can be updated to unset the flowtable flag if the > harware does not support hardware offload. yesterday there was a discussion about this on the #openwrt-devel IRC channel. I am adding the IRC log to the end of this email because I am not sure if you're using IRC. I typically don't test with flow offloading enabled (I am testing with OpenWrt's "default" network configuration, where flow offloading is disabled by default). Also I am not familiar with the flow offloading code at all and reading the xt_FLOWOFFLOAD code just raised more questions for me. Maybe you can share some info whether your workaround from [0] "fixes" this issue. I am aware that it will probably break other devices. But maybe it helps Pablo to confirm whether it's really an xt_FLOWOFFLOAD bug or rather some generic flow offload issue (as Felix suggested on IRC). Best regards, Martin [0] https://github.com/abajk/openwrt/commit/ee4d790574c0edd170e1710d7cd4819727b23721 <rsalvaterra> nbd: I saw your flow offloading updates. Just to make sure, this issue hasn't been addressed yet, has it? https://lore.kernel.org/netdev/20210614215351.GA734@salvia/ <nbd> i don't think so <nbd> can you reproduce it? <rsalvaterra> nbd: Not really, I don't have the hardware. <rsalvaterra> It's lantiq, I think (bthh5a). <rsalvaterra> However, I believe dwmw2_gone has one, iirc. <xdarklight> nbd: I also have a HH5A. if you have any patch ready you can also send it as RFC on the mailing list and Cc Aleksander <rsalvaterra> xdarklight: Have you been able to reproduce the flow offloading regression? <xdarklight> I can try (typically I test with a "default" network configuration, where flow offloading is disabled) <rsalvaterra> xdarklight: I don't have a lot of details, only this thread: https://github.com/openwrt/openwrt/pull/4225#issuecomment-855454607 <xdarklight> rsalvaterra: this is the workaround that Aleksander has in his tree: https://github.com/abajk/openwrt/commit/ee4d790574c0edd170e1710d7cd4819727b23721 <rsalvaterra> xdarklight: Well, but that basically breaks hw flow offloading everywhere else, if I'm reading correctly. :) <xdarklight> rsalvaterra: I am not arguing with that :). I wanted to point out that Pablo's finding on the netfilter mailing list seems to be correct <rsalvaterra> xdarklight: Sure, which is why I pinged nbd, since he's the original author of the xt_FLOWOFFLOAD target. <rsalvaterra> What it seems is that it isn't such trivial fix. :) <xdarklight> I looked at the xt_FLOWOFFLOAD code myself and it raised more questions than I had before looking at the code. so I decided to wait for someone with better knowledge to look into that issue :) <rsalvaterra> Same here. I just went "oh, this requires divine intervention" and set it aside. :P <nbd> xdarklight: which finding did you mean? <xdarklight> nbd: "The xt_flowoffload module is inconditionally setting on the hardware offload flag" [...] flowtable[1].ft.flags = NF_FLOWTABLE_HW_OFFLOAD; <xdarklight> nbd: from this email: https://lore.kernel.org/netdev/20210614215351.GA734@salvia/ <nbd> i actually think that finding is wrong <nbd> xt_FLOWOFFLOAD registers two flowtables <nbd> one with hw offload, one without <nbd> the target code does this: <nbd> table = &flowtable[!!(info->flags & XT_FLOWOFFLOAD_HW)]; <nbd> so it selects flowtable[1] only if info->flags has XT_FLOWOFFLOAD_HW set <rsalvaterra> nbd: That's between you and Pablo, I mustn't interfere. :) <nbd> i did reply to pablo, but never heard back from him <rsalvaterra> nbd: The merge window is still open, he's probably busy at the moment… maybe ping him on Monday? <xdarklight> nbd: it seems that your mail also didn't make it to the netdev mailing list (at least I can't find it) <rsalvaterra> xdarklight: Now that you mention it, neither do I. <nbd> he wrote to me in private for some reason <xdarklight> oh okay. so for me to summarize: you're saying that the xt_FLOWOFFLOAD code should be fine. in that case Aleksander's workaround (https://github.com/abajk/openwrt/commit/ee4d790574c0edd170e1710d7cd4819727b23721) is also a no-op and the original problem would still be seen <rsalvaterra> xdarklight: I don't think it's a no-op, otherwise he wouldn't carry it in his tree… maybe something's wrong in the table selection? table = &flowtable[!!(info->flags & XT_FLOWOFFLOAD_HW)]; does look correct, though. <nbd> xdarklight: well, it's not a no-op if the issue was reproduced with option flow_offloading_hw 1 in the config <rsalvaterra> nbd: Uh… If he has option flow_offloading_hw '1' on a system which doesn't support hw flow offloading… he gets to keep the pieces, right? <nbd> it shouldn't break <nbd> and normally i'd expect the generic flow offload api to be able to deal with it without attempting to re-enable hw offload over and over again
Hi Martin, Thanks for the IRC log. Today I repeated my previous tests. I think I had to have Hardware flow offloading enabled before, even though Lantiq xRX200 doesn't support it. With only the software flow offloading turned on, I do not see a significant performance drop. Today's results (IPv6 routing with DSA driver and Burst Length Patch): Device Kernel Flow offload Upload Download HH5A 5.4.128 Disabled 101 170 HH5A 5.10.46 Disabled 95.4 159 HH5A 5.10.42 Disabled 96.6 165 HH5A 5.10.41 Disabled 101 165 HH5A 5.4.128 Soft 471 463 HH5A 5.10.46 Soft 553 472 HH5A 5.10.42 Soft 556 468 HH5A 5.10.41 Soft 558 492 HH5A 5.4.128 Soft+Hard 434 460 HH5A 5.10.46 Soft+Hard 229 181 HH5A 5.10.42 Soft+Hard 228 177 HH5A 5.10.41 Soft+Hard 577 482 It seems my workaround is unnecessary. Best regards, Aleksander Jan Bajkowski On 7/11/21 3:02 AM, Martin Blumenstingl wrote: > Hi Aleksander, > >> The xt_flowoffload module is inconditionally setting on the hardware >> offload flag: > [...] >> >> which is triggering the slow down because packet path is allocating >> work to offload the entry to hardware, however, this driver does not >> support for hardware offload. >> >> Probably this module can be updated to unset the flowtable flag if the >> harware does not support hardware offload. > > yesterday there was a discussion about this on the #openwrt-devel IRC > channel. I am adding the IRC log to the end of this email because I am > not sure if you're using IRC. > > I typically don't test with flow offloading enabled (I am testing with > OpenWrt's "default" network configuration, where flow offloading is > disabled by default). Also I am not familiar with the flow offloading > code at all and reading the xt_FLOWOFFLOAD code just raised more > questions for me. > > Maybe you can share some info whether your workaround from [0] "fixes" > this issue. I am aware that it will probably break other devices. But > maybe it helps Pablo to confirm whether it's really an xt_FLOWOFFLOAD > bug or rather some generic flow offload issue (as Felix suggested on > IRC). > > > Best regards, > Martin > > > [0] https://github.com/abajk/openwrt/commit/ee4d790574c0edd170e1710d7cd4819727b23721 > > > <rsalvaterra> nbd: I saw your flow offloading updates. Just to make sure, this issue hasn't been addressed yet, has it? https://lore.kernel.org/netdev/20210614215351.GA734@salvia/ > <nbd> i don't think so > <nbd> can you reproduce it? > <rsalvaterra> nbd: Not really, I don't have the hardware. > <rsalvaterra> It's lantiq, I think (bthh5a). > <rsalvaterra> However, I believe dwmw2_gone has one, iirc. > <xdarklight> nbd: I also have a HH5A. if you have any patch ready you can also send it as RFC on the mailing list and Cc Aleksander > <rsalvaterra> xdarklight: Have you been able to reproduce the flow offloading regression? > <xdarklight> I can try (typically I test with a "default" network configuration, where flow offloading is disabled) > <rsalvaterra> xdarklight: I don't have a lot of details, only this thread: https://github.com/openwrt/openwrt/pull/4225#issuecomment-855454607 > <xdarklight> rsalvaterra: this is the workaround that Aleksander has in his tree: https://github.com/abajk/openwrt/commit/ee4d790574c0edd170e1710d7cd4819727b23721 > <rsalvaterra> xdarklight: Well, but that basically breaks hw flow offloading everywhere else, if I'm reading correctly. :) > <xdarklight> rsalvaterra: I am not arguing with that :). I wanted to point out that Pablo's finding on the netfilter mailing list seems to be correct > <rsalvaterra> xdarklight: Sure, which is why I pinged nbd, since he's the original author of the xt_FLOWOFFLOAD target. > <rsalvaterra> What it seems is that it isn't such trivial fix. :) > <xdarklight> I looked at the xt_FLOWOFFLOAD code myself and it raised more questions than I had before looking at the code. so I decided to wait for someone with better knowledge to look into that issue :) > <rsalvaterra> Same here. I just went "oh, this requires divine intervention" and set it aside. :P > <nbd> xdarklight: which finding did you mean? > <xdarklight> nbd: "The xt_flowoffload module is inconditionally setting on the hardware offload flag" [...] flowtable[1].ft.flags = NF_FLOWTABLE_HW_OFFLOAD; > <xdarklight> nbd: from this email: https://lore.kernel.org/netdev/20210614215351.GA734@salvia/ > <nbd> i actually think that finding is wrong > <nbd> xt_FLOWOFFLOAD registers two flowtables > <nbd> one with hw offload, one without > <nbd> the target code does this: > <nbd> table = &flowtable[!!(info->flags & XT_FLOWOFFLOAD_HW)]; > <nbd> so it selects flowtable[1] only if info->flags has XT_FLOWOFFLOAD_HW set > <rsalvaterra> nbd: That's between you and Pablo, I mustn't interfere. :) > <nbd> i did reply to pablo, but never heard back from him > <rsalvaterra> nbd: The merge window is still open, he's probably busy at the moment… maybe ping him on Monday? > <xdarklight> nbd: it seems that your mail also didn't make it to the netdev mailing list (at least I can't find it) > <rsalvaterra> xdarklight: Now that you mention it, neither do I. > <nbd> he wrote to me in private for some reason > <xdarklight> oh okay. so for me to summarize: you're saying that the xt_FLOWOFFLOAD code should be fine. in that case Aleksander's workaround (https://github.com/abajk/openwrt/commit/ee4d790574c0edd170e1710d7cd4819727b23721) is also a no-op and the original problem would still be seen > <rsalvaterra> xdarklight: I don't think it's a no-op, otherwise he wouldn't carry it in his tree… maybe something's wrong in the table selection? table = &flowtable[!!(info->flags & XT_FLOWOFFLOAD_HW)]; does look correct, though. > <nbd> xdarklight: well, it's not a no-op if the issue was reproduced with option flow_offloading_hw 1 in the config > <rsalvaterra> nbd: Uh… If he has option flow_offloading_hw '1' on a system which doesn't support hw flow offloading… he gets to keep the pieces, right? > <nbd> it shouldn't break > <nbd> and normally i'd expect the generic flow offload api to be able to deal with it without attempting to re-enable hw offload over and over again >
Hi Martin, On Sun, Jul 11, 2021 at 03:02:44AM +0200, Martin Blumenstingl wrote: > Hi Aleksander, > > > The xt_flowoffload module is inconditionally setting on the hardware > > offload flag: > [...] > > > > which is triggering the slow down because packet path is allocating > > work to offload the entry to hardware, however, this driver does not > > support for hardware offload. > > > > Probably this module can be updated to unset the flowtable flag if the > > harware does not support hardware offload. > > yesterday there was a discussion about this on the #openwrt-devel IRC > channel. I am adding the IRC log to the end of this email because I am > not sure if you're using IRC. > > I typically don't test with flow offloading enabled (I am testing with > OpenWrt's "default" network configuration, where flow offloading is > disabled by default). Also I am not familiar with the flow offloading > code at all and reading the xt_FLOWOFFLOAD code just raised more > questions for me. > > Maybe you can share some info whether your workaround from [0] "fixes" > this issue. I am aware that it will probably break other devices. But > maybe it helps Pablo to confirm whether it's really an xt_FLOWOFFLOAD > bug or rather some generic flow offload issue (as Felix suggested on > IRC). Maybe the user reporting this issue is enabling the --hw option? As I said, the patch that is being proposed to be revert is just amplifying. The only way to trigger this bug that I can find is: - NF_FLOWTABLE_HW_OFFLOAD is enabled. - packets are following the software path. I don't see yet how this can happen with upstream codebase, nftables enables NF_FLOWTABLE_HW_OFFLOAD at configuration time, if the driver does not support for hardware offload, then NF_FLOWTABLE_HW_OFFLOAD is not set. Is xt_flowoffload rejecting the rule load if user specifies --hw and the hardware does not support for hardware offload? By reading Felix's discussion on the IRC, it seems to me he does not like that the packet path retries to offload flows. If so, it should be possible to add a driver flag to disable this behaviour, so driver developers select what they prefer that flowtable core retries to offload entries. I can have a look into adding such flag and use it from the mtk driver.
On 2021-07-12 11:46, Pablo Neira Ayuso wrote: > Maybe the user reporting this issue is enabling the --hw option? > As I said, the patch that is being proposed to be revert is just > amplifying. > > The only way to trigger this bug that I can find is: > > - NF_FLOWTABLE_HW_OFFLOAD is enabled. > - packets are following the software path. > > I don't see yet how this can happen with upstream codebase, nftables > enables NF_FLOWTABLE_HW_OFFLOAD at configuration time, if the driver > does not support for hardware offload, then NF_FLOWTABLE_HW_OFFLOAD is > not set. > > Is xt_flowoffload rejecting the rule load if user specifies --hw and > the hardware does not support for hardware offload? > > By reading Felix's discussion on the IRC, it seems to me he does not > like that the packet path retries to offload flows. If so, it should > be possible to add a driver flag to disable this behaviour, so driver > developers select what they prefer that flowtable core retries to > offload entries. I can have a look into adding such flag and use it > from the mtk driver. I'd prefer making the retry behavior depend on the error code during setup. For example, if we get -ENOMEM, -EAGAIN or something like that, we should definitely retry. If we get -EOPNOTSUPP or -EINVAL, I don't think a retry makes any sense on any driver. - Felix
diff --git a/include/net/netfilter/nf_flow_table.h b/include/net/netfilter/nf_flow_table.h index 48ef7460ff30..51d8eb99764d 100644 --- a/include/net/netfilter/nf_flow_table.h +++ b/include/net/netfilter/nf_flow_table.h @@ -157,6 +157,7 @@ enum nf_flow_flags { NF_FLOW_HW, NF_FLOW_HW_DYING, NF_FLOW_HW_DEAD, + NF_FLOW_HW_REFRESH, NF_FLOW_HW_PENDING, }; diff --git a/net/netfilter/nf_flow_table_core.c b/net/netfilter/nf_flow_table_core.c index 1d02650dd715..39c02d1aeedf 100644 --- a/net/netfilter/nf_flow_table_core.c +++ b/net/netfilter/nf_flow_table_core.c @@ -306,7 +306,8 @@ void flow_offload_refresh(struct nf_flowtable *flow_table, { flow->timeout = nf_flowtable_time_stamp + NF_FLOW_TIMEOUT; - if (likely(!nf_flowtable_hw_offload(flow_table))) + if (likely(!nf_flowtable_hw_offload(flow_table) || + !test_and_clear_bit(NF_FLOW_HW_REFRESH, &flow->flags))) return; nf_flow_offload_add(flow_table, flow); diff --git a/net/netfilter/nf_flow_table_offload.c b/net/netfilter/nf_flow_table_offload.c index 528b2f172684..2af7bdb38407 100644 --- a/net/netfilter/nf_flow_table_offload.c +++ b/net/netfilter/nf_flow_table_offload.c @@ -902,11 +902,10 @@ static void flow_offload_work_add(struct flow_offload_work *offload) err = flow_offload_rule_add(offload, flow_rule); if (err < 0) - goto out; - - set_bit(IPS_HW_OFFLOAD_BIT, &offload->flow->ct->status); + set_bit(NF_FLOW_HW_REFRESH, &offload->flow->flags); + else + set_bit(IPS_HW_OFFLOAD_BIT, &offload->flow->ct->status); -out: nf_flow_offload_destroy(flow_rule); }
This reverts commit c07531c01d8284aedaf95708ea90e76d11af0e21. The previously mentioned commit significantly reduces NAT performance in OpenWRT. Another user reports a high ping issue. The results of IPv4 NAT benchmark on BT Home Hub 5A (with software flow offloading): * 5.4.124 515 Mb/s * 5.10.41 570 Mb/s * 5.10.42 250 Mb/s * 5.10.42 + revert 580 Mb/s Reverting this commit fixes this issue. Fixes: c07531c01d8284aedaf95708ea90e76d11af0e21 ("netfilter: flowtable: Remove redundant hw refresh bit") Signed-off-by: Aleksander Jan Bajkowski <olek2@wp.pl> --- include/net/netfilter/nf_flow_table.h | 1 + net/netfilter/nf_flow_table_core.c | 3 ++- net/netfilter/nf_flow_table_offload.c | 7 +++---- 3 files changed, 6 insertions(+), 5 deletions(-)