diff mbox series

[net-next,1/3] bonding: return detailed error when loading native XDP fails

Message ID 20241016031649.880-2-liuhangbin@gmail.com (mailing list archive)
State Changes Requested
Delegated to: Netdev Maintainers
Headers show
Series Bonding: return detailed error about XDP failures | expand

Checks

Context Check Description
netdev/series_format success Posting correctly formatted
netdev/tree_selection success Clearly marked for net-next
netdev/ynl success Generated files up to date; no warnings/errors; no diff in generated;
netdev/fixes_present success Fixes tag not required for -next series
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 5 this patch: 5
netdev/build_tools success No tools touched, skip
netdev/cc_maintainers success CCed 11 of 11 maintainers
netdev/build_clang success Errors and warnings before: 3 this patch: 3
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success No Fixes tag
netdev/build_allmodconfig_warn success Errors and warnings before: 4 this patch: 4
netdev/checkpatch success total: 0 errors, 0 warnings, 0 checks, 12 lines checked
netdev/build_clang_rust success No Rust files in patch. Skipping build
netdev/kdoc success Errors and warnings before: 13 this patch: 13
netdev/source_inline success Was 0 now: 0
netdev/contest success net-next-2024-10-19--00-00 (tests: 777)

Commit Message

Hangbin Liu Oct. 16, 2024, 3:16 a.m. UTC
Bonding only supports native XDP for specific modes, which can lead to
confusion for users regarding why XDP loads successfully at times and
fails at others. This patch enhances error handling by returning detailed
error messages, providing users with clearer insights into the specific
reasons for the failure when loading native XDP.

Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
---
 drivers/net/bonding/bond_main.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

Comments

Nikolay Aleksandrov Oct. 16, 2024, 7:30 a.m. UTC | #1
On 16/10/2024 06:16, Hangbin Liu wrote:
> Bonding only supports native XDP for specific modes, which can lead to
> confusion for users regarding why XDP loads successfully at times and
> fails at others. This patch enhances error handling by returning detailed
> error messages, providing users with clearer insights into the specific
> reasons for the failure when loading native XDP.
> 
> Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
> ---
>  drivers/net/bonding/bond_main.c | 5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
> index b1bffd8e9a95..f0f76b6ac8be 100644
> --- a/drivers/net/bonding/bond_main.c
> +++ b/drivers/net/bonding/bond_main.c
> @@ -5676,8 +5676,11 @@ static int bond_xdp_set(struct net_device *dev, struct bpf_prog *prog,
>  
>  	ASSERT_RTNL();
>  
> -	if (!bond_xdp_check(bond))
> +	if (!bond_xdp_check(bond)) {
> +		BOND_NL_ERR(dev, extack,
> +			    "No native XDP support for the current bonding mode");
>  		return -EOPNOTSUPP;
> +	}
>  
>  	old_prog = bond->xdp_prog;
>  	bond->xdp_prog = prog;

I guess this is based on our discussion earlier?

Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
Daniel Borkmann Oct. 16, 2024, 7:59 a.m. UTC | #2
On 10/16/24 5:16 AM, Hangbin Liu wrote:
> Bonding only supports native XDP for specific modes, which can lead to
> confusion for users regarding why XDP loads successfully at times and
> fails at others. This patch enhances error handling by returning detailed
> error messages, providing users with clearer insights into the specific
> reasons for the failure when loading native XDP.
> 
> Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
> ---
>   drivers/net/bonding/bond_main.c | 5 ++++-
>   1 file changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
> index b1bffd8e9a95..f0f76b6ac8be 100644
> --- a/drivers/net/bonding/bond_main.c
> +++ b/drivers/net/bonding/bond_main.c
> @@ -5676,8 +5676,11 @@ static int bond_xdp_set(struct net_device *dev, struct bpf_prog *prog,
>   
>   	ASSERT_RTNL();
>   
> -	if (!bond_xdp_check(bond))
> +	if (!bond_xdp_check(bond)) {
> +		BOND_NL_ERR(dev, extack,
> +			    "No native XDP support for the current bonding mode");
>   		return -EOPNOTSUPP;
> +	}
>   
>   	old_prog = bond->xdp_prog;
>   	bond->xdp_prog = prog;

LGTM, but independent of these I was more thinking whether something like this
could do the trick (only compile tested). That way you also get the fallback
without changing anything in the core XDP code.

Thanks,
Daniel

diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index b1bffd8e9a95..2861b3a895ff 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -5915,6 +5915,10 @@ static const struct ethtool_ops bond_ethtool_ops = {
  	.get_ts_info		= bond_ethtool_get_ts_info,
  };
  
+static const struct device_type bond_type = {
+	.name = "bond",
+};
+
  static const struct net_device_ops bond_netdev_ops = {
  	.ndo_init		= bond_init,
  	.ndo_uninit		= bond_uninit,
@@ -5951,9 +5955,20 @@ static const struct net_device_ops bond_netdev_ops = {
  	.ndo_hwtstamp_set	= bond_hwtstamp_set,
  };
  
-static const struct device_type bond_type = {
-	.name = "bond",
-};
+static struct net_device_ops bond_netdev_ops_noxdp __ro_after_init;
+
+static void __init bond_setup_noxdp_ops(void)
+{
+	memcpy(&bond_netdev_ops_noxdp, &bond_netdev_ops,
+	       sizeof(bond_netdev_ops));
+
+	/* Used for bond device mode which does not support XDP
+	 * yet, see also bond_xdp_check().
+	 */
+	bond_netdev_ops_noxdp.ndo_bpf = NULL;
+	bond_netdev_ops_noxdp.ndo_xdp_xmit = NULL;
+	bond_netdev_ops_noxdp.ndo_xdp_get_xmit_slave = NULL;
+}
  
  static void bond_destructor(struct net_device *bond_dev)
  {
@@ -5978,7 +5993,9 @@ void bond_setup(struct net_device *bond_dev)
  	/* Initialize the device entry points */
  	ether_setup(bond_dev);
  	bond_dev->max_mtu = ETH_MAX_MTU;
-	bond_dev->netdev_ops = &bond_netdev_ops;
+	bond_dev->netdev_ops = bond_xdp_check(bond) ?
+			       &bond_netdev_ops :
+			       &bond_netdev_ops_noxdp;
  	bond_dev->ethtool_ops = &bond_ethtool_ops;
  
  	bond_dev->needs_free_netdev = true;
@@ -6591,6 +6608,8 @@ static int __init bonding_init(void)
  	int i;
  	int res;
  
+	bond_setup_noxdp_ops();
+
  	res = bond_check_params(&bonding_defaults);
  	if (res)
  		goto out;
Nikolay Aleksandrov Oct. 16, 2024, 8:13 a.m. UTC | #3
On 16/10/2024 10:59, Daniel Borkmann wrote:
> On 10/16/24 5:16 AM, Hangbin Liu wrote:
>> Bonding only supports native XDP for specific modes, which can lead to
>> confusion for users regarding why XDP loads successfully at times and
>> fails at others. This patch enhances error handling by returning detailed
>> error messages, providing users with clearer insights into the specific
>> reasons for the failure when loading native XDP.
>>
>> Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
>> ---
>>   drivers/net/bonding/bond_main.c | 5 ++++-
>>   1 file changed, 4 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
>> index b1bffd8e9a95..f0f76b6ac8be 100644
>> --- a/drivers/net/bonding/bond_main.c
>> +++ b/drivers/net/bonding/bond_main.c
>> @@ -5676,8 +5676,11 @@ static int bond_xdp_set(struct net_device *dev, struct bpf_prog *prog,
>>         ASSERT_RTNL();
>>   -    if (!bond_xdp_check(bond))
>> +    if (!bond_xdp_check(bond)) {
>> +        BOND_NL_ERR(dev, extack,
>> +                "No native XDP support for the current bonding mode");
>>           return -EOPNOTSUPP;
>> +    }
>>         old_prog = bond->xdp_prog;
>>       bond->xdp_prog = prog;
> 
> LGTM, but independent of these I was more thinking whether something like this
> could do the trick (only compile tested). That way you also get the fallback
> without changing anything in the core XDP code.
> 
> Thanks,
> Daniel
> 
> diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
> index b1bffd8e9a95..2861b3a895ff 100644
> --- a/drivers/net/bonding/bond_main.c
> +++ b/drivers/net/bonding/bond_main.c
> @@ -5915,6 +5915,10 @@ static const struct ethtool_ops bond_ethtool_ops = {
>      .get_ts_info        = bond_ethtool_get_ts_info,
>  };
>  
> +static const struct device_type bond_type = {
> +    .name = "bond",
> +};
> +
>  static const struct net_device_ops bond_netdev_ops = {
>      .ndo_init        = bond_init,
>      .ndo_uninit        = bond_uninit,
> @@ -5951,9 +5955,20 @@ static const struct net_device_ops bond_netdev_ops = {
>      .ndo_hwtstamp_set    = bond_hwtstamp_set,
>  };
>  
> -static const struct device_type bond_type = {
> -    .name = "bond",
> -};
> +static struct net_device_ops bond_netdev_ops_noxdp __ro_after_init;
> +
> +static void __init bond_setup_noxdp_ops(void)
> +{
> +    memcpy(&bond_netdev_ops_noxdp, &bond_netdev_ops,
> +           sizeof(bond_netdev_ops));
> +
> +    /* Used for bond device mode which does not support XDP
> +     * yet, see also bond_xdp_check().
> +     */
> +    bond_netdev_ops_noxdp.ndo_bpf = NULL;
> +    bond_netdev_ops_noxdp.ndo_xdp_xmit = NULL;
> +    bond_netdev_ops_noxdp.ndo_xdp_get_xmit_slave = NULL;
> +}
>  
>  static void bond_destructor(struct net_device *bond_dev)
>  {
> @@ -5978,7 +5993,9 @@ void bond_setup(struct net_device *bond_dev)
>      /* Initialize the device entry points */
>      ether_setup(bond_dev);
>      bond_dev->max_mtu = ETH_MAX_MTU;
> -    bond_dev->netdev_ops = &bond_netdev_ops;
> +    bond_dev->netdev_ops = bond_xdp_check(bond) ?
> +                   &bond_netdev_ops :
> +                   &bond_netdev_ops_noxdp;

This will have to be done safely on bond mode change as well.
If all slaves are released we can switch modes without destroying
the device.

>      bond_dev->ethtool_ops = &bond_ethtool_ops;
>  
>      bond_dev->needs_free_netdev = true;
> @@ -6591,6 +6608,8 @@ static int __init bonding_init(void)
>      int i;
>      int res;
>  
> +    bond_setup_noxdp_ops();
> +
>      res = bond_check_params(&bonding_defaults);
>      if (res)
>          goto out;
Hangbin Liu Oct. 16, 2024, 8:17 a.m. UTC | #4
On Wed, Oct 16, 2024 at 09:59:01AM +0200, Daniel Borkmann wrote:
> > diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
> > index b1bffd8e9a95..f0f76b6ac8be 100644
> > --- a/drivers/net/bonding/bond_main.c
> > +++ b/drivers/net/bonding/bond_main.c
> > @@ -5676,8 +5676,11 @@ static int bond_xdp_set(struct net_device *dev, struct bpf_prog *prog,
> >   	ASSERT_RTNL();
> > -	if (!bond_xdp_check(bond))
> > +	if (!bond_xdp_check(bond)) {
> > +		BOND_NL_ERR(dev, extack,
> > +			    "No native XDP support for the current bonding mode");
> >   		return -EOPNOTSUPP;
> > +	}
> >   	old_prog = bond->xdp_prog;
> >   	bond->xdp_prog = prog;
> 
> LGTM, but independent of these I was more thinking whether something like this
> could do the trick (only compile tested). That way you also get the fallback
> without changing anything in the core XDP code.

Yes, I also thought about do fallback on bonding. But Nikolay suggested
just use extack msg[1], and Jakub think this is report by QE rather than
a real user. So I think we can use extack first, and convert to auto
fallback on bonding if a real user complains. What do you think?

[1] https://lore.kernel.org/netdev/8088f2a7-3ab1-4a1e-996d-c15703da13cc@blackwall.org/
[2] https://lore.kernel.org/netdev/20241015085121.5f22e96f@kernel.org/

Thanks
Hangbin
Daniel Borkmann Oct. 16, 2024, 8:24 a.m. UTC | #5
On 10/16/24 10:13 AM, Nikolay Aleksandrov wrote:
> On 16/10/2024 10:59, Daniel Borkmann wrote:
>> On 10/16/24 5:16 AM, Hangbin Liu wrote:
>>> Bonding only supports native XDP for specific modes, which can lead to
>>> confusion for users regarding why XDP loads successfully at times and
>>> fails at others. This patch enhances error handling by returning detailed
>>> error messages, providing users with clearer insights into the specific
>>> reasons for the failure when loading native XDP.
>>>
>>> Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
>>> ---
>>>    drivers/net/bonding/bond_main.c | 5 ++++-
>>>    1 file changed, 4 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
>>> index b1bffd8e9a95..f0f76b6ac8be 100644
>>> --- a/drivers/net/bonding/bond_main.c
>>> +++ b/drivers/net/bonding/bond_main.c
>>> @@ -5676,8 +5676,11 @@ static int bond_xdp_set(struct net_device *dev, struct bpf_prog *prog,
>>>          ASSERT_RTNL();
>>>    -    if (!bond_xdp_check(bond))
>>> +    if (!bond_xdp_check(bond)) {
>>> +        BOND_NL_ERR(dev, extack,
>>> +                "No native XDP support for the current bonding mode");
>>>            return -EOPNOTSUPP;
>>> +    }
>>>          old_prog = bond->xdp_prog;
>>>        bond->xdp_prog = prog;
>>
>> LGTM, but independent of these I was more thinking whether something like this
>> could do the trick (only compile tested). That way you also get the fallback
>> without changing anything in the core XDP code.
>>
>> Thanks,
>> Daniel
>>
>> diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
>> index b1bffd8e9a95..2861b3a895ff 100644
>> --- a/drivers/net/bonding/bond_main.c
>> +++ b/drivers/net/bonding/bond_main.c
>> @@ -5915,6 +5915,10 @@ static const struct ethtool_ops bond_ethtool_ops = {
>>       .get_ts_info        = bond_ethtool_get_ts_info,
>>   };
>>   
>> +static const struct device_type bond_type = {
>> +    .name = "bond",
>> +};
>> +
>>   static const struct net_device_ops bond_netdev_ops = {
>>       .ndo_init        = bond_init,
>>       .ndo_uninit        = bond_uninit,
>> @@ -5951,9 +5955,20 @@ static const struct net_device_ops bond_netdev_ops = {
>>       .ndo_hwtstamp_set    = bond_hwtstamp_set,
>>   };
>>   
>> -static const struct device_type bond_type = {
>> -    .name = "bond",
>> -};
>> +static struct net_device_ops bond_netdev_ops_noxdp __ro_after_init;
>> +
>> +static void __init bond_setup_noxdp_ops(void)
>> +{
>> +    memcpy(&bond_netdev_ops_noxdp, &bond_netdev_ops,
>> +           sizeof(bond_netdev_ops));
>> +
>> +    /* Used for bond device mode which does not support XDP
>> +     * yet, see also bond_xdp_check().
>> +     */
>> +    bond_netdev_ops_noxdp.ndo_bpf = NULL;
>> +    bond_netdev_ops_noxdp.ndo_xdp_xmit = NULL;
>> +    bond_netdev_ops_noxdp.ndo_xdp_get_xmit_slave = NULL;
>> +}
>>   
>>   static void bond_destructor(struct net_device *bond_dev)
>>   {
>> @@ -5978,7 +5993,9 @@ void bond_setup(struct net_device *bond_dev)
>>       /* Initialize the device entry points */
>>       ether_setup(bond_dev);
>>       bond_dev->max_mtu = ETH_MAX_MTU;
>> -    bond_dev->netdev_ops = &bond_netdev_ops;
>> +    bond_dev->netdev_ops = bond_xdp_check(bond) ?
>> +                   &bond_netdev_ops :
>> +                   &bond_netdev_ops_noxdp;
> 
> This will have to be done safely on bond mode change as well.
> If all slaves are released we can switch modes without destroying
> the device.

Ah fair, yeah perhaps not worth the added complexity. Tbh, if someone
loads an XDP program with the fallback to generic, it feels super fragile
in the first place and I wouldn't do this ever for production workloads.
Meaning, fixed to native for production, generic XDP for testing where
native is not available (e.g. CI), at least that's how we use it.

Thanks,
Daniel
diff mbox series

Patch

diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index b1bffd8e9a95..f0f76b6ac8be 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -5676,8 +5676,11 @@  static int bond_xdp_set(struct net_device *dev, struct bpf_prog *prog,
 
 	ASSERT_RTNL();
 
-	if (!bond_xdp_check(bond))
+	if (!bond_xdp_check(bond)) {
+		BOND_NL_ERR(dev, extack,
+			    "No native XDP support for the current bonding mode");
 		return -EOPNOTSUPP;
+	}
 
 	old_prog = bond->xdp_prog;
 	bond->xdp_prog = prog;