diff mbox series

[net,v1] devlink: fix xa_alloc_cyclic error handling

Message ID 20250214132453.4108-1-michal.swiatkowski@linux.intel.com (mailing list archive)
State Changes Requested
Delegated to: Netdev Maintainers
Headers show
Series [net,v1] devlink: fix xa_alloc_cyclic error handling | expand

Checks

Context Check Description
netdev/series_format success Single patches do not need cover letters
netdev/tree_selection success Clearly marked for net
netdev/ynl success Generated files up to date; no warnings/errors; no diff in generated;
netdev/fixes_present fail Series targets non-next tree, but doesn't contain any Fixes tags
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 0 this patch: 0
netdev/build_tools success No tools touched, skip
netdev/cc_maintainers success CCed 6 of 6 maintainers
netdev/build_clang success Errors and warnings before: 1 this patch: 1
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success No Fixes tag
netdev/build_allmodconfig_warn success Errors and warnings before: 0 this patch: 0
netdev/checkpatch success total: 0 errors, 0 warnings, 0 checks, 8 lines checked
netdev/build_clang_rust success No Rust files in patch. Skipping build
netdev/kdoc success Errors and warnings before: 3 this patch: 3
netdev/source_inline success Was 0 now: 0
netdev/contest success net-next-2025-02-15--03-00 (tests: 891)

Commit Message

Michal Swiatkowski Feb. 14, 2025, 1:24 p.m. UTC
Pierre Riteau <pierre@stackhpc.com> found suspicious handling an error
from xa_alloc_cyclic() in scheduler code [1]. The same is done in
devlink_rel_alloc().

In case of returning 1 from xa_alloc_cyclic() (wrapping) ERR_PTR(1) will
be returned, which will cause IS_ERR() to be false. Which can lead to
dereference not allocated pointer (rel).

Fix it by checking if err is lower than zero.

This wasn't found in real usecase, only noticed. Credit to Pierre.

Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
---
 net/devlink/core.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Andrew Lunn Feb. 14, 2025, 1:44 p.m. UTC | #1
On Fri, Feb 14, 2025 at 02:24:53PM +0100, Michal Swiatkowski wrote:
> Pierre Riteau <pierre@stackhpc.com> found suspicious handling an error
> from xa_alloc_cyclic() in scheduler code [1]. The same is done in
> devlink_rel_alloc().

If the same bug exists twice it might exist more times. Did you find
this instance by searching the whole tree? Or just networking?

This is also something which would be good to have the static
analysers check for. I wounder if smatch can check this?

	Andrew

> 
> In case of returning 1 from xa_alloc_cyclic() (wrapping) ERR_PTR(1) will
> be returned, which will cause IS_ERR() to be false. Which can lead to
> dereference not allocated pointer (rel).
> 
> Fix it by checking if err is lower than zero.
> 
> This wasn't found in real usecase, only noticed. Credit to Pierre.
> 
> Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
> ---
>  net/devlink/core.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/net/devlink/core.c b/net/devlink/core.c
> index f49cd83f1955..7203c39532fc 100644
> --- a/net/devlink/core.c
> +++ b/net/devlink/core.c
> @@ -117,7 +117,7 @@ static struct devlink_rel *devlink_rel_alloc(void)
>  
>  	err = xa_alloc_cyclic(&devlink_rels, &rel->index, rel,
>  			      xa_limit_32b, &next, GFP_KERNEL);
> -	if (err) {
> +	if (err < 0) {
>  		kfree(rel);
>  		return ERR_PTR(err);
>  	}
> -- 
> 2.42.0
> 
>
Michal Swiatkowski Feb. 14, 2025, 1:58 p.m. UTC | #2
On Fri, Feb 14, 2025 at 02:44:49PM +0100, Andrew Lunn wrote:
> On Fri, Feb 14, 2025 at 02:24:53PM +0100, Michal Swiatkowski wrote:
> > Pierre Riteau <pierre@stackhpc.com> found suspicious handling an error
> > from xa_alloc_cyclic() in scheduler code [1]. The same is done in
> > devlink_rel_alloc().
> 
> If the same bug exists twice it might exist more times. Did you find
> this instance by searching the whole tree? Or just networking?
> 
> This is also something which would be good to have the static
> analysers check for. I wounder if smatch can check this?
> 
> 	Andrew
> 

You are right, I checked only net folder and there are two usage like
that in drivers. I will send v2 with wider fixing, thanks.

It can be not so easy to check. What if someone want to treat wrapping
as an error (don't know if it is valid)? If one of the caller is
checking err < 0 it will be fine.

Thanks,
Michal

> > 
> > In case of returning 1 from xa_alloc_cyclic() (wrapping) ERR_PTR(1) will
> > be returned, which will cause IS_ERR() to be false. Which can lead to
> > dereference not allocated pointer (rel).
> > 
> > Fix it by checking if err is lower than zero.
> > 
> > This wasn't found in real usecase, only noticed. Credit to Pierre.
> > 
> > Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
> > ---
> >  net/devlink/core.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/net/devlink/core.c b/net/devlink/core.c
> > index f49cd83f1955..7203c39532fc 100644
> > --- a/net/devlink/core.c
> > +++ b/net/devlink/core.c
> > @@ -117,7 +117,7 @@ static struct devlink_rel *devlink_rel_alloc(void)
> >  
> >  	err = xa_alloc_cyclic(&devlink_rels, &rel->index, rel,
> >  			      xa_limit_32b, &next, GFP_KERNEL);
> > -	if (err) {
> > +	if (err < 0) {
> >  		kfree(rel);
> >  		return ERR_PTR(err);
> >  	}
> > -- 
> > 2.42.0
> > 
> >
Andrew Lunn Feb. 14, 2025, 2:14 p.m. UTC | #3
On Fri, Feb 14, 2025 at 02:58:41PM +0100, Michal Swiatkowski wrote:
> On Fri, Feb 14, 2025 at 02:44:49PM +0100, Andrew Lunn wrote:
> > On Fri, Feb 14, 2025 at 02:24:53PM +0100, Michal Swiatkowski wrote:
> > > Pierre Riteau <pierre@stackhpc.com> found suspicious handling an error
> > > from xa_alloc_cyclic() in scheduler code [1]. The same is done in
> > > devlink_rel_alloc().
> > 
> > If the same bug exists twice it might exist more times. Did you find
> > this instance by searching the whole tree? Or just networking?
> > 
> > This is also something which would be good to have the static
> > analysers check for. I wounder if smatch can check this?
> > 
> > 	Andrew
> > 
> 
> You are right, I checked only net folder and there are two usage like
> that in drivers. I will send v2 with wider fixing, thanks.
> 
> It can be not so easy to check. What if someone want to treat wrapping
> as an error (don't know if it is valid)? If one of the caller is
> checking err < 0 it will be fine.

I put Dan in Cc:, lets see what he thinks.

There is at least one other functions i can think of which has similar
behaviour, < 0 on error, 0 or 1 are both different sorts of
success. If there are two, there are probably more. Having tooling to
find this sort of problem would be nice, even if it has a high false
positive rate and needs combining with manual inspection.

	Andrew
Dan Carpenter Feb. 16, 2025, 3:06 p.m. UTC | #4
On Fri, Feb 14, 2025 at 02:44:49PM +0100, Andrew Lunn wrote:
> On Fri, Feb 14, 2025 at 02:24:53PM +0100, Michal Swiatkowski wrote:
> > Pierre Riteau <pierre@stackhpc.com> found suspicious handling an error
> > from xa_alloc_cyclic() in scheduler code [1]. The same is done in
> > devlink_rel_alloc().
> 
> If the same bug exists twice it might exist more times. Did you find
> this instance by searching the whole tree? Or just networking?
> 
> This is also something which would be good to have the static
> analysers check for. I wounder if smatch can check this?

That's a great idea, thanks!  I'll try a couple experiments and see what
works tomorrow.  I've add these lines to check_zero_to_err_ptr.c

   183          max = rl_max(estate_rl(sm->state));
   184          if (max.value > 0 && !sval_is_a_max(max))
   185                  sm_warning("passing non-max range '%s' to '%s'", sm->state->name, fn);
   186  

I'm hoping this one works.  It complains about any positive returns
except for when the return is "some non-zero value".

   194                  if (estate_get_single_value(tmp->state, &sval) &&
   195                      (sval.value < -4096 || sval.value > 0)) {
   196                          sm_warning("passing invalid error code %lld to '%s'", sval.value, fn);
   197                          return;
   198                  }

This one might miss some bugs but it should catch most stuff and have few
false positives.  Both of them work on this example.

net/devlink/core.c:122 devlink_rel_alloc() warn: passing non-max range '(-4095)-(-1),1' to 'ERR_PTR'
net/devlink/core.c:122 devlink_rel_alloc() warn: passing invalid error code 1 to 'ERR_PTR'

regards,
dan carpenter
Andrew Lunn Feb. 16, 2025, 4:08 p.m. UTC | #5
On Sun, Feb 16, 2025 at 06:06:48PM +0300, Dan Carpenter wrote:
> On Fri, Feb 14, 2025 at 02:44:49PM +0100, Andrew Lunn wrote:
> > On Fri, Feb 14, 2025 at 02:24:53PM +0100, Michal Swiatkowski wrote:
> > > Pierre Riteau <pierre@stackhpc.com> found suspicious handling an error
> > > from xa_alloc_cyclic() in scheduler code [1]. The same is done in
> > > devlink_rel_alloc().
> > 
> > If the same bug exists twice it might exist more times. Did you find
> > this instance by searching the whole tree? Or just networking?
> > 
> > This is also something which would be good to have the static
> > analysers check for. I wounder if smatch can check this?
> 
> That's a great idea, thanks!  I'll try a couple experiments and see what
> works tomorrow.  I've add these lines to check_zero_to_err_ptr.c
> 
>    183          max = rl_max(estate_rl(sm->state));
>    184          if (max.value > 0 && !sval_is_a_max(max))
>    185                  sm_warning("passing non-max range '%s' to '%s'", sm->state->name, fn);
>    186  
> 
> I'm hoping this one works.  It complains about any positive returns
> except for when the return is "some non-zero value".
> 
>    194                  if (estate_get_single_value(tmp->state, &sval) &&
>    195                      (sval.value < -4096 || sval.value > 0)) {
>    196                          sm_warning("passing invalid error code %lld to '%s'", sval.value, fn);
>    197                          return;
>    198                  }
> 
> This one might miss some bugs but it should catch most stuff and have few
> false positives.  Both of them work on this example.
> 
> net/devlink/core.c:122 devlink_rel_alloc() warn: passing non-max range '(-4095)-(-1),1' to 'ERR_PTR'
> net/devlink/core.c:122 devlink_rel_alloc() warn: passing invalid error code 1 to 'ERR_PTR'

Nice. In networking, ethernet PHYs, there are a few functions ending
in _changed() have the same behaviour:

 * Returns negative errno, 0 if there was no change, and 1 in case of change

So there is the potential for the same issue with
mdiobus_modify_changed(), phy_modify_changed(),
phy_modify_mmd_changed(), phy_modify_paged_changed(). Hope this helps
with testing.

	Andrew
Dan Carpenter Feb. 17, 2025, 6:57 a.m. UTC | #6
Both versions find 97 warning but they're slightly different so I should
create some kind of combined check.

drivers/input/touchscreen/cyttsp_core.c:658 cyttsp_probe() warn: passing non-max range 's32min-(-1),1' to 'ERR_PTR'
drivers/gpu/drm/mediatek/mtk_dp.c:2736 mtk_dp_probe() warn: passing non-max range 's32min-(-1),1' to 'dev_err_probe'
drivers/clk/clk-gpio.c:371 clk_gated_fixed_probe() warn: passing non-max range 's32min-(-1),1' to 'dev_err_probe'
drivers/net/ethernet/socionext/netsec.c:1902 netsec_acpi_probe() warn: passing non-max range 's32min-(-1),1' to 'dev_err_probe'
drivers/net/ethernet/socionext/netsec.c:1909 netsec_acpi_probe() warn: passing non-max range 's32min-(-1),1' to 'dev_err_probe'
drivers/net/can/spi/mcp251xfd/mcp251xfd-core.c:2050 mcp251xfd_probe() warn: passing non-max range 's32min-(-1),1' to 'dev_err_probe'
drivers/net/can/spi/hi311x.c:857 hi3110_can_probe() warn: passing non-max range 's32min-(-1),1' to 'dev_err_probe'
drivers/net/can/dev/dev.c:496 can_get_termination() warn: passing non-max range 's32min-(-1),1' to 'ERR_PTR'
drivers/pwm/pwm-sl28cpld.c:228 sl28cpld_pwm_probe() warn: passing non-max range 's32min-(-1),1' to 'ERR_PTR'
drivers/leds/leds-is31fl319x.c:423 is31fl319x_parse_fw() warn: passing non-max range 's32min-(-1),1' to 'dev_err_probe'
drivers/leds/rgb/leds-mt6370-rgb.c:738 mt6370_assign_multicolor_info() warn: passing non-max range 's32min-(-1),1' to 'dev_err_probe'
drivers/leds/rgb/leds-mt6370-rgb.c:945 mt6370_leds_probe() warn: passing non-max range 's32min-(-1),1' to 'dev_err_probe'
drivers/leds/rgb/leds-mt6370-rgb.c:952 mt6370_leds_probe() warn: passing non-max range 's32min-(-1),1' to 'dev_err_probe'

The first return from acpi_data_prop_read() could be 1.  I'll report
this.

drivers/gpu/drm/msm/msm_gem_submit.c:537 msm_parse_deps() warn: passing non-max range '(-4095)-(-1),22' to 'ERR_PTR'

Clear bug.  Double negative -(-EINVAL).  I'll send a patch.

drivers/gpu/drm/xe/xe_guc.c:1241 xe_guc_suspend() warn: passing non-max range '(-110),(-71),(-6),1-268435455' to 'ERR_PTR'
drivers/gpu/drm/i915/gt/uc/intel_guc.c:698 intel_guc_suspend() warn: passing non-max range '1-268435455' to 'ERR_PTR'

Smatch gets confused by the return FIELD_GET(GUC_HXG_RESPONSE_MSG_0_DATA0, header);
in xe_guc_mmio_send_recv().

drivers/spi/spi-pxa2xx-platform.c:101 pxa2xx_spi_init_pdata() warn: passing non-max range 's32min-(-1),1' to 'ERR_PTR'
drivers/spi/spi-cs42l43.c:396 cs42l43_spi_probe() warn: passing non-max range 's32min-(-23),(-21)-(-1),1' to 'dev_err_probe'
drivers/spi/spi-rockchip-sfc.c:654 rockchip_sfc_probe() warn: passing non-max range 's32min-(-1),1' to 'dev_err_probe'
drivers/net/dsa/mv88e6xxx/pcs-6352.c:250 marvell_c22_pcs_link_up() warn: passing non-max range '(-110),(-95),(-16),1' to 'ERR_PTR'

False positive.  marvell_c22_pcs_restore_page() is complicated.

drivers/net/phy/phy_device.c:1150 phy_connect() warn: passing non-max range 's32min-(-1),1' to 'ERR_PTR'
drivers/net/phy/phy_device.c:1651 phy_attach() warn: passing non-max range 's32min-(-1),1' to 'ERR_PTR'
drivers/net/ethernet/ethoc.c:719 ethoc_mdio_probe() warn: passing non-max range 's32min-(-1),1' to 'dev_err_probe'
drivers/net/ethernet/hisilicon/hibmcge/hbg_mdio.c:178 hbg_phy_connect() warn: passing non-max range 's32min-(-1),1' to 'dev_err_probe'

xa_alloc_cyclic() returns 1 if the allocation succeeded after wrapping.
I'll report this.

drivers/net/ethernet/aquantia/atlantic/aq_ring.c:481 aq_xdp_run_prog() warn: passing non-max range '(-16),0,5-6' to 'ERR_PTR'

Bug.  Mixing error codes from aq_hw_err_from_flags() and NETDEV_TX_BUSY.

drivers/media/i2c/ds90ub913.c:856 ub913_probe() warn: passing non-max range 's32min-(-1),1' to 'dev_err_probe'

This is a transient bug that went away after a database rebuild.  I'll
debug the rest over time.

regards,
dan carpenter
Dan Carpenter Feb. 17, 2025, 7:46 a.m. UTC | #7
On Sun, Feb 16, 2025 at 05:08:23PM +0100, Andrew Lunn wrote:
>  * Returns negative errno, 0 if there was no change, and 1 in case of change
> 
> So there is the potential for the same issue with
> mdiobus_modify_changed(), phy_modify_changed(),
> phy_modify_mmd_changed(), phy_modify_paged_changed(). Hope this helps
> with testing.

Thanks, that's useful.

The first check would have caught all these, but the second one would
only have caught the first three.  The combined check will catch them
all.  :)

regards,
dan carpenter
Paolo Abeni Feb. 18, 2025, 11:56 a.m. UTC | #8
On 2/14/25 2:58 PM, Michal Swiatkowski wrote:
> On Fri, Feb 14, 2025 at 02:44:49PM +0100, Andrew Lunn wrote:
>> On Fri, Feb 14, 2025 at 02:24:53PM +0100, Michal Swiatkowski wrote:
>>> Pierre Riteau <pierre@stackhpc.com> found suspicious handling an error
>>> from xa_alloc_cyclic() in scheduler code [1]. The same is done in
>>> devlink_rel_alloc().
>>
>> If the same bug exists twice it might exist more times. Did you find
>> this instance by searching the whole tree? Or just networking?
>>
>> This is also something which would be good to have the static
>> analysers check for. I wounder if smatch can check this?
>>
>> 	Andrew
>>
> 
> You are right, I checked only net folder and there are two usage like
> that in drivers. I will send v2 with wider fixing, thanks.

While at that, please add the suitable fixes tag(s).

Thanks,

Paolo
diff mbox series

Patch

diff --git a/net/devlink/core.c b/net/devlink/core.c
index f49cd83f1955..7203c39532fc 100644
--- a/net/devlink/core.c
+++ b/net/devlink/core.c
@@ -117,7 +117,7 @@  static struct devlink_rel *devlink_rel_alloc(void)
 
 	err = xa_alloc_cyclic(&devlink_rels, &rel->index, rel,
 			      xa_limit_32b, &next, GFP_KERNEL);
-	if (err) {
+	if (err < 0) {
 		kfree(rel);
 		return ERR_PTR(err);
 	}