
[RFC] netfilter: nf_tables: ignore errors on flowtable device hw offload setup

Message ID 20220510202739.67068-1-nbd@nbd.name (mailing list archive)
State RFC
Delegated to: Netdev Maintainers
Series [RFC] netfilter: nf_tables: ignore errors on flowtable device hw offload setup

Checks

Context Check Description
netdev/tree_selection success Guessed tree name to be net-next
netdev/fixes_present success Fixes tag not required for -next series
netdev/subject_prefix warning Target tree name not specified in the subject
netdev/cover_letter success Single patches do not need cover letters
netdev/patch_count success Link
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 16 this patch: 16
netdev/cc_maintainers warning 7 maintainers not CCed: coreteam@netfilter.org edumazet@google.com kadlec@netfilter.org pabeni@redhat.com fw@strlen.de kuba@kernel.org davem@davemloft.net
netdev/build_clang success Errors and warnings before: 0 this patch: 0
netdev/module_param success Was 0 now: 0
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/verify_fixes success No Fixes tag
netdev/build_allmodconfig_warn success Errors and warnings before: 16 this patch: 16
netdev/checkpatch success total: 0 errors, 0 warnings, 0 checks, 14 lines checked
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0

Commit Message

Felix Fietkau May 10, 2022, 8:27 p.m. UTC
In many cases, it's not easy for user space to know which devices
properly support hardware offload. Even if a device supports hardware
flow offload, it is not guaranteed that it will actually be able to
handle the flows for which hardware offload is requested.

Ignoring errors on the FLOW_BLOCK_BIND makes it a lot easier to set up
configurations that use hardware offload where possible and gracefully
fall back to software offload for everything else.

Cc: Jo-Philipp Wich <jo@mein.io>
Signed-off-by: Felix Fietkau <nbd@nbd.name>
---
 net/netfilter/nf_tables_api.c | 8 +++-----
 1 file changed, 3 insertions(+), 5 deletions(-)

Comments

Pablo Neira Ayuso May 13, 2022, 7:49 a.m. UTC | #1
Hi,

On Tue, May 10, 2022 at 10:27:39PM +0200, Felix Fietkau wrote:
> In many cases, it's not easily possible for user space to know, which
> devices properly support hardware offload.

Then, is it a matter of extending the netlink interface to expose this
feature? Perhaps add a FLOW_BLOCK_PROBE or similar that allows user
space to check whether this feature is available?

> Even if a device supports hardware flow offload, it is not
> guaranteed that it will actually be able to handle the flows for
> which hardware offload is requested.

When might this happen?

> Ignoring errors on the FLOW_BLOCK_BIND makes it a lot easier to set up
> configurations that use hardware offload where possible and gracefully
> fall back to software offload for everything else.

I understand this might be useful from a userspace perspective, because
forcing the user to re-try is silly.

However, the user should have some way to know from the control plane
that the feature (hardware offload) they requested is not available for
their setup.

> Cc: Jo-Philipp Wich <jo@mein.io>
> Signed-off-by: Felix Fietkau <nbd@nbd.name>
> ---
>  net/netfilter/nf_tables_api.c | 8 +++-----
>  1 file changed, 3 insertions(+), 5 deletions(-)
> 
> diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c
> index 16c3a39689f4..9d4528f0aa12 100644
> --- a/net/netfilter/nf_tables_api.c
> +++ b/net/netfilter/nf_tables_api.c
> @@ -7323,11 +7323,9 @@ static int nft_register_flowtable_net_hooks(struct net *net,
>  			}
>  		}
>  
> -		err = flowtable->data.type->setup(&flowtable->data,
> -						  hook->ops.dev,
> -						  FLOW_BLOCK_BIND);
> -		if (err < 0)
> -			goto err_unregister_net_hooks;
> +		flowtable->data.type->setup(&flowtable->data,
> +					    hook->ops.dev,
> +					    FLOW_BLOCK_BIND);
>  
>  		err = nf_register_net_hook(net, &hook->ops);
>  		if (err < 0) {
> -- 
> 2.36.1
>
Felix Fietkau May 13, 2022, 8:03 a.m. UTC | #2
On 13.05.22 09:49, Pablo Neira Ayuso wrote:
> Hi,
> 
> On Tue, May 10, 2022 at 10:27:39PM +0200, Felix Fietkau wrote:
>> In many cases, it's not easily possible for user space to know, which
>> devices properly support hardware offload.
> 
> Then, it is a matter of extending the netlink interface to expose this
> feature? Probably add a FLOW_BLOCK_PROBE or similar which allow to
> consult if this feature is available?
> 
>> Even if a device supports hardware flow offload, it is not
>> guaranteed that it will actually be able to handle the flows for
>> which hardware offload is requested.
> 
> When might this happen?
I think there are many possible reasons: the flow might be using
features not supported by the offload driver, or the hardware might not
have any space left in its offload table. I'm sure there are plenty of
other ways it could fail.

>> Ignoring errors on the FLOW_BLOCK_BIND makes it a lot easier to set up
>> configurations that use hardware offload where possible and gracefully
>> fall back to software offload for everything else.
> 
> I understand this might be useful from userspace perspective, because
> forcing the user to re-try is silly.
> 
> However, on the other hand, the user should have some way to know from
> the control plane that the feature (hardware offload) that they
> request is not available for their setup.
In my opinion, most users of this API probably don't care and just want
to have offload on a best-effort basis. Assuming that is the case,
wouldn't it be better if we simply had an API that indicates which
flowtable members hardware offload was actually enabled for?

- Felix
Pablo Neira Ayuso May 13, 2022, 8:15 a.m. UTC | #3
On Fri, May 13, 2022 at 10:03:13AM +0200, Felix Fietkau wrote:
> 
> On 13.05.22 09:49, Pablo Neira Ayuso wrote:
> > Hi,
> > 
> > On Tue, May 10, 2022 at 10:27:39PM +0200, Felix Fietkau wrote:
> > > In many cases, it's not easily possible for user space to know, which
> > > devices properly support hardware offload.
> > 
> > Then, it is a matter of extending the netlink interface to expose this
> > feature? Probably add a FLOW_BLOCK_PROBE or similar which allow to
> > consult if this feature is available?
> > 
> > > Even if a device supports hardware flow offload, it is not
> > > guaranteed that it will actually be able to handle the flows for
> > > which hardware offload is requested.
> > 
> > When might this happen?
>
> I think there are many possible reasons: The flow might be using features
> not supported by the offload driver. Maybe it doesn't have any space left in
> the offload table. I'm sure there are many other possible reasons it could
> fail.

This fallback to the software flowtable path for partial scenarios
already exists.

> > > Ignoring errors on the FLOW_BLOCK_BIND makes it a lot easier to set up
> > > configurations that use hardware offload where possible and gracefully
> > > fall back to software offload for everything else.
> > 
> > I understand this might be useful from userspace perspective, because
> > forcing the user to re-try is silly.
> > 
> > However, on the other hand, the user should have some way to know from
> > the control plane that the feature (hardware offload) that they
> > request is not available for their setup.
>
> In my opinion, most users of this API probably don't care and just want to
> have offload on a best effort basis.

OK, but if the setup does not support hardware offload at all, why
should the control plane accept this? I think the user should know in
the first place that not a single flow is going to be offloaded to
hardware.

> Assuming that is the case, wouldn't it be better if we simply have
> an API that indicates, which flowtable members hardware offload was
> actually enabled for?

What are you proposing?

I think it would be good to expose through the netlink interface what
the device can actually do, relative to the features supported by the
flowtable software datapath.
Felix Fietkau May 13, 2022, 9:09 a.m. UTC | #4
On 13.05.22 10:15, Pablo Neira Ayuso wrote:
> On Fri, May 13, 2022 at 10:03:13AM +0200, Felix Fietkau wrote:
>> 
>> On 13.05.22 09:49, Pablo Neira Ayuso wrote:
>> > Hi,
>> > 
>> > On Tue, May 10, 2022 at 10:27:39PM +0200, Felix Fietkau wrote:
>> > > In many cases, it's not easily possible for user space to know, which
>> > > devices properly support hardware offload.
>> > 
>> > Then, it is a matter of extending the netlink interface to expose this
>> > feature? Probably add a FLOW_BLOCK_PROBE or similar which allow to
>> > consult if this feature is available?
>> > 
>> > > Even if a device supports hardware flow offload, it is not
>> > > guaranteed that it will actually be able to handle the flows for
>> > > which hardware offload is requested.
>> > 
>> > When might this happen?
>>
>> I think there are many possible reasons: The flow might be using features
>> not supported by the offload driver. Maybe it doesn't have any space left in
>> the offload table. I'm sure there are many other possible reasons it could
>> fail.
> 
> This fallback to software flowtable path for partial scenarios already
> exists.
I know. All I meant was to point out that hardware offload is already
not guaranteed in one place, so I don't think bailing out with an error
because the flow block bind didn't work for one of the flowtable
devices is justified.

>> > > Ignoring errors on the FLOW_BLOCK_BIND makes it a lot easier to set up
>> > > configurations that use hardware offload where possible and gracefully
>> > > fall back to software offload for everything else.
>> > 
>> > I understand this might be useful from userspace perspective, because
>> > forcing the user to re-try is silly.
>> > 
>> > However, on the other hand, the user should have some way to know from
>> > the control plane that the feature (hardware offload) that they
>> > request is not available for their setup.
>>
>> In my opinion, most users of this API probably don't care and just want to
>> have offload on a best effort basis.
> 
> OK, but if the setup does not support hardware offload at all, why
> should the control plane accept this? I think user should know in
> first place that no one single flow is going to be offloaded to
> hardware.
It makes for a much cleaner configuration if you can just create a 
single hw-offload enabled flowtable containing multiple devices, some of 
which support hardware offload and some of which don't.

>> Assuming that is the case, wouldn't it be better if we simply have
>> an API that indicates, which flowtable members hardware offload was
>> actually enabled for?
> 
> What are you proposing?
> 
> I think it would be good to expose through netlink interface what the
> device can actually do according to the existing supported flowtable
> software datapath features.
In addition to the NFTA_FLOWTABLE_HOOK_DEVS array, the netlink API
could also return another array, e.g. NFTA_FLOWTABLE_HOOK_OFFLOAD_DEVS,
indicating the devices for which hw offload is enabled.
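If such an attribute existed, user space could tell hw-offloaded
members from software-fallback ones with a simple membership check. A
hypothetical sketch (the NFTA_FLOWTABLE_HOOK_OFFLOAD_DEVS attribute and
these helper names are not part of the current kernel API):

```c
#include <string.h>

/* Hypothetical: assumes the kernel returned both the existing
 * NFTA_FLOWTABLE_HOOK_DEVS array and the proposed
 * NFTA_FLOWTABLE_HOOK_OFFLOAD_DEVS array as lists of device names.
 * A device present in the first list but not the second would be
 * using the software flowtable path. */
static int dev_is_hw_offloaded(const char *dev,
			       const char *const *offload_devs,
			       size_t n_offload)
{
	for (size_t i = 0; i < n_offload; i++) {
		if (strcmp(dev, offload_devs[i]) == 0)
			return 1;	/* hw offload enabled */
	}
	return 0;			/* falls back to sw offload */
}
```

With that, a tool could print a per-device offload status after the
flowtable listing instead of guessing from bind errors.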

What I really don't like about the current state of the flowtable 
offload API is the (in my opinion completely unnecessary) complexity 
that is required for the simple use case of enabling hw/sw flow 
offloading on a best effort basis for all devices.
What I like even less is the number of implementation details that it 
has to consider.

For example: Let's assume we have a machine with several devices, some 
of which support hw offload, some of which don't. We have a mix of VLANs 
and bridges in there as well, maybe even PPPoE.
Now the admin of that machine wants to enable best-effort hardware +
software flow offloading for that configuration. He (or a piece of
user space software dealing with the config) has to do these things:
- figure out which devices could support hw offload, create a separate 
flow table for them
- be aware of which of these devices are actually used by looking at the 
stack of bridges, vlans, dsa devices, etc.
- if an error occurs, test them individually just to see which one 
actually failed and leave it out of the flowtable
- for sw offload, be aware that there is limited support for offloading 
decap of vlans/pppoe: count the number of decaps and figure out the 
right input device to add based on the behavior of nft_dev_path_info, so 
that the 'indev' it selects matches the device you put in the flow table.

So I'm asking you: Am I getting any of this completely wrong? Do you 
consider it to be a reasonable trade-off to force the admin (or 
intermediate user space layer) to jump through these hoops for such a 
simple use case, just because somebody might want more fine grained control?

I consider this patch to be a first step towards making simple use cases 
easier to configure. I'd also be fine with adding a flag to make the 
fallback behavior opt-in, even though I think it would make a much 
better default.

Eventually I'd also like to add a flag that makes it unnecessary to even 
specify the devices in the flow table by making the code auto-create 
hooks for devices with active flows, just like I did in my xtables target.
You correctly pointed out to me in the past that this comes at the cost 
of a few packets delay before offloading kicks in, but I'm still 
wondering: who actually cares about that?

If I'm completely off-base with this, please let me know. I'm simply 
trying to make sense of all of this...

- Felix
Pablo Neira Ayuso May 16, 2022, 12:57 a.m. UTC | #5
On Fri, May 13, 2022 at 11:09:51AM +0200, Felix Fietkau wrote:
> 
> On 13.05.22 10:15, Pablo Neira Ayuso wrote:
> > On Fri, May 13, 2022 at 10:03:13AM +0200, Felix Fietkau wrote:
> > > 
> > > On 13.05.22 09:49, Pablo Neira Ayuso wrote:
> > > > Hi,
> > > >
> > > > On Tue, May 10, 2022 at 10:27:39PM +0200, Felix Fietkau wrote:
> > > > > In many cases, it's not easily possible for user space to know, which
> > > > > devices properly support hardware offload.
> > > >
> > > > Then, it is a matter of extending the netlink interface to expose this
> > > > feature? Probably add a FLOW_BLOCK_PROBE or similar which allow to
> > > > consult if this feature is available?
> > > >
> > > > > Even if a device supports hardware flow offload, it is not
> > > > > guaranteed that it will actually be able to handle the flows for
> > > > > which hardware offload is requested.
> > > >
> > > > When might this happen?
> > > 
> > > I think there are many possible reasons: The flow might be using features
> > > not supported by the offload driver. Maybe it doesn't have any space left in
> > > the offload table. I'm sure there are many other possible reasons it could
> > > fail.
> > 
> > This fallback to software flowtable path for partial scenarios already
> > exists.
> I know. All I meant was to point out that hardware offload is not guaranteed
> in one place, so I don't think bailing out with an error because flow block
> bind didn't work for one of the flowtable devices is justified.
> 
> > > > > Ignoring errors on the FLOW_BLOCK_BIND makes it a lot easier to set up
> > > > > configurations that use hardware offload where possible and gracefully
> > > > > fall back to software offload for everything else.
> > > > I understand this might be useful from userspace perspective, because
> > > > forcing the user to re-try is silly.
> > > >
> > > > However, on the other hand, the user should have some way to know from
> > > > the control plane that the feature (hardware offload) that they
> > > > request is not available for their setup.
> > > 
> > > In my opinion, most users of this API probably don't care and just want to
> > > have offload on a best effort basis.
> > 
> > OK, but if the setup does not support hardware offload at all, why
> > should the control plane accept this? I think user should know in
> > first place that no one single flow is going to be offloaded to
> > hardware.
>
> It makes for a much cleaner configuration if you can just create a single
> hw-offload enabled flowtable containing multiple devices, some of which
> support hardware offload and some of which don't.

This scenario, mixing devices that support hw offload with devices
that do not, makes sense.

> > > Assuming that is the case, wouldn't it be better if we simply have
> > > an API that indicates, which flowtable members hardware offload was
> > > actually enabled for?
> > 
> > What are you proposing?
> > 
> > I think it would be good to expose through netlink interface what the
> > device can actually do according to the existing supported flowtable
> > software datapath features.
> In addition to the NFTA_FLOWTABLE_HOOK_DEVS array, the netlink API could
> also return another array, e.g. NFTA_FLOWTABLE_HOOK_OFFLOAD_DEVS which
> indicates devices for which hw offload is enabled.
> 
> What I really don't like about the current state of the flowtable offload
> API is the (in my opinion completely unnecessary) complexity that is
> required for the simple use case of enabling hw/sw flow offloading on a best
> effort basis for all devices.
> What I like even less is the number of implementation details that it has to
> consider.
> 
> For example: Let's assume we have a machine with several devices, some of
> which support hw offload, some of which don't. We have a mix of VLANs and
> bridges in there as well, maybe even PPPoE.
> Now the admin of that machine wants to enable best-effort hardware +
> software flow offloading for that configuration.
> Now he (or a piece of user space software dealing with the config) has to do
> these things:
> - figure out which devices could support hw offload, create a separate flow
> table for them
> - be aware of which of these devices are actually used by looking at the
> stack of bridges, vlans, dsa devices, etc.
> - if an error occurs, test them individually just to see which one actually
> failed and leave it out of the flowtable
> - for sw offload be aware that there is limited support for offloading decap
> of vlans/pppoe, count the number of decaps and figure out the right input
> device to add based on the behavior of nft_dev_path_info, so that the
> 'indev' it selects matches the device you put in the flow table.
> 
> So I'm asking you: Am I getting any of this completely wrong? Do you
> consider it to be a reasonable trade-off to force the admin (or intermediate
> user space layer) to jump through these hoops for such a simple use case,
> just because somebody might want more fine grained control?
> 
> I consider this patch to be a first step towards making simple use cases
> easier to configure. I'd also be fine with adding a flag to make the
> fallback behavior opt-in, even though I think it would make a much better
> default.
> 
> Eventually I'd also like to add a flag that makes it unnecessary to even
> specify the devices in the flow table by making the code auto-create hooks
> for devices with active flows, just like I did in my xtables target.
> You correctly pointed out to me in the past that this comes at the cost of a
> few packets delay before offloading kicks in, but I'm still wondering: who
> actually cares about that?
> 
> If I'm completely off-base with this, please let me know. I'm simply trying
> to make sense of all of this...

Maybe only fail if _none_ of the selected devices supports hardware
offload, i.e. instead of silently accepting all devices, count the
number of devices for which a block has been set up; if it is == 0,
then bail out with EOPNOTSUPP.
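The counting approach sketched above can be modeled as a small
standalone program (the function names and the stubbed setup() callback
are hypothetical stand-ins, not the actual nf_tables code):

```c
#include <errno.h>
#include <stddef.h>

/* Stand-in for flowtable->data.type->setup(..., FLOW_BLOCK_BIND):
 * succeeds only for devices that support hw offload. */
static int setup_stub(int dev_supports_offload)
{
	return dev_supports_offload ? 0 : -EOPNOTSUPP;
}

/* Try to bind a flow block on every hook device, ignore per-device
 * failures, and only report EOPNOTSUPP when no device could be bound. */
static int register_flowtable_hw_hooks(const int *devs, size_t ndevs)
{
	size_t bound = 0;

	for (size_t i = 0; i < ndevs; i++) {
		/* Errors for individual devices are deliberately
		 * ignored; those devices stay on the software path. */
		if (setup_stub(devs[i]) == 0)
			bound++;
	}

	/* Bail out only if not a single device accepted the bind. */
	return bound > 0 ? 0 : -EOPNOTSUPP;
}
```

Under this model, a flowtable with at least one hw-offload-capable
member is accepted, while an all-software device list still gets a
clear EOPNOTSUPP back to the control plane.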
Felix Fietkau May 19, 2022, 3:37 p.m. UTC | #6
On 16.05.22 02:57, Pablo Neira Ayuso wrote:
> On Fri, May 13, 2022 at 11:09:51AM +0200, Felix Fietkau wrote:
>> > > Assuming that is the case, wouldn't it be better if we simply have
>> > > an API that indicates, which flowtable members hardware offload was
>> > > actually enabled for?
>> > 
>> > What are you proposing?
>> > 
>> > I think it would be good to expose through netlink interface what the
>> > device can actually do according to the existing supported flowtable
>> > software datapath features.
>> In addition to the NFTA_FLOWTABLE_HOOK_DEVS array, the netlink API could
>> also return another array, e.g. NFTA_FLOWTABLE_HOOK_OFFLOAD_DEVS which
>> indicates devices for which hw offload is enabled.
>> 
>> What I really don't like about the current state of the flowtable offload
>> API is the (in my opinion completely unnecessary) complexity that is
>> required for the simple use case of enabling hw/sw flow offloading on a best
>> effort basis for all devices.
>> What I like even less is the number of implementation details that it has to
>> consider.
>> 
>> For example: Let's assume we have a machine with several devices, some of
>> which support hw offload, some of which don't. We have a mix of VLANs and
>> bridges in there as well, maybe even PPPoE.
>> Now the admin of that machine wants to enable best-effort hardware +
>> software flow offloading for that configuration.
>> Now he (or a piece of user space software dealing with the config) has to do
>> these things:
>> - figure out which devices could support hw offload, create a separate flow
>> table for them
>> - be aware of which of these devices are actually used by looking at the
>> stack of bridges, vlans, dsa devices, etc.
>> - if an error occurs, test them individually just to see which one actually
>> failed and leave it out of the flowtable
>> - for sw offload be aware that there is limited support for offloading decap
>> of vlans/pppoe, count the number of decaps and figure out the right input
>> device to add based on the behavior of nft_dev_path_info, so that the
>> 'indev' it selects matches the device you put in the flow table.
>> 
>> So I'm asking you: Am I getting any of this completely wrong? Do you
>> consider it to be a reasonable trade-off to force the admin (or intermediate
>> user space layer) to jump through these hoops for such a simple use case,
>> just because somebody might want more fine grained control?
>> 
>> I consider this patch to be a first step towards making simple use cases
>> easier to configure. I'd also be fine with adding a flag to make the
>> fallback behavior opt-in, even though I think it would make a much better
>> default.
>> 
>> Eventually I'd also like to add a flag that makes it unnecessary to even
>> specify the devices in the flow table by making the code auto-create hooks
>> for devices with active flows, just like I did in my xtables target.
>> You correctly pointed out to me in the past that this comes at the cost of a
>> few packets delay before offloading kicks in, but I'm still wondering: who
>> actually cares about that?
>> 
>> If I'm completely off-base with this, please let me know. I'm simply trying
>> to make sense of all of this...
> 
> Maybe only fail if _none_ of the selected devices support for hardware
> offload, ie. instead of silently accepting all devices, count of the
> number of devices for which a block has been set up, if it is == 0
> then bail out with EOPNOTSUPP.
I've thought about this some more. The problem with that is that if you
start with only non-hw-offload devices in the flow table, you can't
create it with the offload flag.
If you then want to add another device that is capable of doing hw 
offload, it means you have to delete the flowtable and recreate it 
(along with the rules that depend on it), since adding the offload flag 
at runtime isn't supported.

I still think the best course of action is to silently accept the 
offload flag even if none of the devices support hw offload.

- Felix
Pablo Neira Ayuso May 20, 2022, 7:50 a.m. UTC | #7
On Thu, May 19, 2022 at 05:37:12PM +0200, Felix Fietkau wrote:
> 
> On 16.05.22 02:57, Pablo Neira Ayuso wrote:
> > On Fri, May 13, 2022 at 11:09:51AM +0200, Felix Fietkau wrote:
> > > > > Assuming that is the case, wouldn't it be better if we simply have
> > > > > an API that indicates, which flowtable members hardware offload was
> > > > > actually enabled for?
> > > >
> > > > What are you proposing?
> > > >
> > > > I think it would be good to expose through netlink interface what the
> > > > device can actually do according to the existing supported flowtable
> > > > software datapath features.
> > > In addition to the NFTA_FLOWTABLE_HOOK_DEVS array, the netlink API could
> > > also return another array, e.g. NFTA_FLOWTABLE_HOOK_OFFLOAD_DEVS which
> > > indicates devices for which hw offload is enabled.
> > > 
> > > What I really don't like about the current state of the flowtable offload
> > > API is the (in my opinion completely unnecessary) complexity that is
> > > required for the simple use case of enabling hw/sw flow offloading on a best
> > > effort basis for all devices.
> > > What I like even less is the number of implementation details that it has to
> > > consider.
> > > 
> > > For example: Let's assume we have a machine with several devices, some of
> > > which support hw offload, some of which don't. We have a mix of VLANs and
> > > bridges in there as well, maybe even PPPoE.
> > > Now the admin of that machine wants to enable best-effort hardware +
> > > software flow offloading for that configuration.
> > > Now he (or a piece of user space software dealing with the config) has to do
> > > these things:
> > > - figure out which devices could support hw offload, create a separate flow
> > > table for them
> > > - be aware of which of these devices are actually used by looking at the
> > > stack of bridges, vlans, dsa devices, etc.
> > > - if an error occurs, test them individually just to see which one actually
> > > failed and leave it out of the flowtable
> > > - for sw offload be aware that there is limited support for offloading decap
> > > of vlans/pppoe, count the number of decaps and figure out the right input
> > > device to add based on the behavior of nft_dev_path_info, so that the
> > > 'indev' it selects matches the device you put in the flow table.
> > > 
> > > So I'm asking you: Am I getting any of this completely wrong? Do you
> > > consider it to be a reasonable trade-off to force the admin (or intermediate
> > > user space layer) to jump through these hoops for such a simple use case,
> > > just because somebody might want more fine grained control?
> > > 
> > > I consider this patch to be a first step towards making simple use cases
> > > easier to configure. I'd also be fine with adding a flag to make the
> > > fallback behavior opt-in, even though I think it would make a much better
> > > default.
> > > 
> > > Eventually I'd also like to add a flag that makes it unnecessary to even
> > > specify the devices in the flow table by making the code auto-create hooks
> > > for devices with active flows, just like I did in my xtables target.
> > > You correctly pointed out to me in the past that this comes at the cost of a
> > > few packets delay before offloading kicks in, but I'm still wondering: who
> > > actually cares about that?
> > > 
> > > If I'm completely off-base with this, please let me know. I'm simply trying
> > > to make sense of all of this...
> > 
> > Maybe only fail if _none_ of the selected devices support for hardware
> > offload, ie. instead of silently accepting all devices, count of the
> > number of devices for which a block has been set up, if it is == 0
> > then bail out with EOPNOTSUPP.
>
> I've thought about this some more. The problem with that is if you start by
> having only non-hw-offload devices in the flow table, you can't create it
> with the offload flag.
>
> If you then want to add another device that is capable of doing hw offload,
> it means you have to delete the flowtable and recreate it (along with the
> rules that depend on it), since adding the offload flag at runtime isn't
> supported.

I'm assuming we relax the requirement as I proposed, i.e. not all
devices need to support hardware offload, but at least one must.

Then, it should be possible to extend the netlink interface to promote
a flowtable to support hardware offload, e.g.

 add flowtable inet x y { hook ingress devices = { eth0, eth1 } priority 0; flags offload; }

For an existing flowtable, that will add eth0 and eth1, and it will
request to turn on hardware offload.

This is not supported yet; these bits are missing in the netlink
interface.

> I still think the best course of action is to silently accept the offload
> flag even if none of the devices support hw offload.

Silently accepting means the user is asking for something that is
actually not supported; there will be no effective way from the
control plane to check whether what they requested is actually being
applied.

I'd propose two changes:

- relax the existing requirement, so that if at least one device
  supports hw offload, the configuration is accepted.

- allow updating a flowtable to turn hardware offload on/off from the
  netlink interface without needing to reload your whole ruleset.
Felix Fietkau May 20, 2022, 6:07 p.m. UTC | #8
On 20.05.22 09:50, Pablo Neira Ayuso wrote:
> I'm assuming we relax the requirement as I proposed, i.e. not all
> devices need to support hardware offload, but at least one must.
> 
> Then, it should be possible to extend the netlink interface to promote
> a flowtable to support hardware offload, e.g.
> 
>   add flowtable inet x y { hook ingress devices = { eth0, eth1 } priority 0; flags offload; }
> 
> For an existing flowtable, that will add eth0 and eth1, and it will
> request to turn hardware offload.
> 
> This is not supported, these bits are missing in the netlink interface.
> 
>> I still think the best course of action is to silently accept the offload
>> flag even if none of the devices support hw offload.
> 
> Silent means user is asking for something that is actually not
> supported, there will be no effective way from the control plane to
> check if what they request is actually being applied.
> 
> I'd propose two changes:
> 
> - relax the existing requirement, so if one device support hw offload,
>    then accept the configuration.
> 
> - allow to update a flowtable to on/off hardware offload from netlink
>    interface without needing to reload your whole ruleset.
I still don't see the value in forcing user space to do the 
failure-and-retry dance if none of the devices support hw offload.
If this is about notifying user space about the hw offload status, I
think it's much better to simply accept such configurations as-is and
extend the netlink API to report which of the member devices hw offload
was actually enabled for.
This would be much more valuable to users who actually care about the
hw offload status than knowing whether one of the devices in the list
has hw offload support, and it would simplify the code as well, for
kernel and user space alike.

- Felix
Pablo Neira Ayuso May 20, 2022, 9:47 p.m. UTC | #9
On Fri, May 20, 2022 at 08:07:44PM +0200, Felix Fietkau wrote:
> 
> On 20.05.22 09:50, Pablo Neira Ayuso wrote:
> > I'm assuming we relax the requirement as I proposed, i.e. not all
> > devices need to support hardware offload, but at least one must.
> > 
> > Then, it should be possible to extend the netlink interface to promote
> > a flowtable to support hardware offload, e.g.
> > 
> >   add flowtable inet x y { hook ingress devices = { eth0, eth1 } priority 0; flags offload; }
> > 
> > For an existing flowtable, that will add eth0 and eth1, and it will
> > request to turn hardware offload.
> > 
> > This is not supported, these bits are missing in the netlink interface.
> > 
> > > I still think the best course of action is to silently accept the offload
> > > flag even if none of the devices support hw offload.
> > 
> > Silent means user is asking for something that is actually not
> > supported, there will be no effective way from the control plane to
> > check if what they request is actually being applied.
> > 
> > I'd propose two changes:
> > 
> > - relax the existing requirement, so if one device supports hw offload,
> >    then accept the configuration.
> > 
> > - allow turning hardware offload on/off for a flowtable from the netlink
> >    interface without needing to reload the whole ruleset.
>
> I still don't see the value in forcing user space to do the
> failure-and-retry dance if none of the devices support hw offload.
> If this is about notifying user space of the hw offload status, I
> think it's much better to simply accept such configurations as-is and
> extend the netlink API to report for which of the member devices hw
> offload was actually enabled.
> That would be much more valuable to users who actually care about the
> hw offload status than merely knowing that one of the devices in the
> list has hw offload support, and it would simplify the code as well,
> for kernel and user space alike.

I would suggest extending the API to expose whether each device actually
supports flowtable hardware offload; then, from the listing, the user
knows whether the feature is available and can turn it on.
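A listing with such per-device capability information could look roughly
like the sketch below. This is purely hypothetical output: no per-device
offload annotation exists in nft today, and the comment markers are
illustrative only.

```
# hypothetical 'nft list flowtable' output with per-device
# hw offload capability reporting (does not exist today)
table inet x {
	flowtable y {
		hook ingress priority 0
		devices = { eth0, eth1 }
		# eth0: hw offload supported
		# eth1: hw offload not supported
	}
}
```

With output like this, user space could decide up front whether setting
'flags offload' is worthwhile, instead of probing by trial and error.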
Pablo Neira Ayuso May 30, 2022, 4:55 p.m. UTC | #10
On Fri, May 20, 2022 at 11:48:01PM +0200, Pablo Neira Ayuso wrote:
> [...]
> 
> I would suggest extending the API to expose whether each device actually
> supports flowtable hardware offload; then, from the listing, the user
> knows whether the feature is available and can turn it on.

Thinking about it more: something in between your proposal and mine.

Allow the user to set the 'offload' flag, then have the kernel clear
the flag if no device supports hardware offload. The update path would
also need to allow for this new behaviour.

The user can check via 'nft list ruleset' whether the flag is on or off.
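Under these semantics, a session could look like the following sketch
(hypothetical transcript; it assumes neither eth0 nor eth1 supports
hardware offload, so the kernel accepts the request but clears the flag):

```
# request hw offload on devices that cannot do it; the command
# still succeeds instead of returning an error
$ nft add flowtable inet x y \
    '{ hook ingress priority 0; devices = { eth0, eth1 }; flags offload; }'

# the kernel has cleared the flag, which the listing makes visible
$ nft list ruleset
table inet x {
	flowtable y {
		hook ingress priority 0
		devices = { eth0, eth1 }
	}
}
```

The absence of 'flags offload' in the listing is what tells the user
that hardware offload was not actually enabled.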
Felix Fietkau May 30, 2022, 6:52 p.m. UTC | #11
On 30.05.22 18:55, Pablo Neira Ayuso wrote:
> [...]
> 
> Thinking about it more: something in between your proposal and mine.
> 
> Allow the user to set the 'offload' flag, then have the kernel clear
> the flag if no device supports hardware offload. The update path would
> also need to allow for this new behaviour.
> 
> The user can check via 'nft list ruleset' whether the flag is on or off.

I think that's reasonable. Let's implement it this way for now.

Thanks,

- Felix
diff mbox series

Patch

diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c
index 16c3a39689f4..9d4528f0aa12 100644
--- a/net/netfilter/nf_tables_api.c
+++ b/net/netfilter/nf_tables_api.c
@@ -7323,11 +7323,9 @@  static int nft_register_flowtable_net_hooks(struct net *net,
 			}
 		}
 
-		err = flowtable->data.type->setup(&flowtable->data,
-						  hook->ops.dev,
-						  FLOW_BLOCK_BIND);
-		if (err < 0)
-			goto err_unregister_net_hooks;
+		flowtable->data.type->setup(&flowtable->data,
+					    hook->ops.dev,
+					    FLOW_BLOCK_BIND);
 
 		err = nf_register_net_hook(net, &hook->ops);
 		if (err < 0) {