Message ID | 20200928174153.GA446008@mwanda (mailing list archive) |
---|---|
State | Not Applicable |
Series | [1/2,net-next] net/mlx5e: TC: Fix IS_ERR() vs NULL checks |
On Sep 28, 2020, at 13:42, Dan Carpenter <dan.carpenter@oracle.com> wrote:
>
> The mlx5_tc_ct_init() function doesn't return error pointers it returns
> NULL. Also we need to set the error codes on this path.
>
> Fixes: aedd133d17bc ("net/mlx5e: Support CT offload for tc nic flows")
> Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
> ---
>  drivers/net/ethernet/mellanox/mlx5/core/en_tc.c | 8 ++++++--
>  1 file changed, 6 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
> index 104b1c339de0..438fbcf478d1 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
> @@ -5224,8 +5224,10 @@ int mlx5e_tc_nic_init(struct mlx5e_priv *priv)
>
>          tc->ct = mlx5_tc_ct_init(priv, tc->chains, &priv->fs.tc.mod_hdr,
>                                   MLX5_FLOW_NAMESPACE_KERNEL);
> -        if (IS_ERR(tc->ct))
> +        if (!tc->ct) {
> +                err = -ENOMEM;
>                  goto err_ct;
> +        }

Hi Dan,

That was implemented like that on purpose. If mlx5_tc_ct_init() returns
NULL, it means the device doesn't support CT offload, which can happen
with older devices or old FW on the devices.

However, in this case we want to continue with the rest of the TC
initialization because we can still support other TC offloads. There is
no need to fail the entire TC init in this case. Only if
mlx5_tc_ct_init() returns an ERR_PTR does it mean that the TC init
failed not because of lack of support but due to a real error, and only
then do we want to fail the rest of the TC init.

Your change will break compatibility for devices/FW versions that
don't have CT offload support.

Ariel

>
>          tc->netdevice_nb.notifier_call = mlx5e_tc_netdev_event;
>          err = register_netdevice_notifier_dev_net(priv->netdev,
> @@ -5300,8 +5300,10 @@ int mlx5e_tc_esw_init(struct rhashtable *tc_ht)
>                                                 esw_chains(esw),
>                                                 &esw->offloads.mod_hdr,
>                                                 MLX5_FLOW_NAMESPACE_FDB);
> -        if (IS_ERR(uplink_priv->ct_priv))
> +        if (!uplink_priv->ct_priv) {
> +                err = -ENOMEM;
>                  goto err_ct;
> +        }
>
>          mapping = mapping_create(sizeof(struct tunnel_match_key),
>                                   TUNNEL_INFO_BITS_MASK, true);
> --
> 2.28.0
>
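The difference between the two caller-side checks discussed above can be illustrated with a minimal userspace sketch. It is not the driver code: `ERR_PTR()`, `PTR_ERR()` and `IS_ERR()` are simplified stand-ins for the kernel helpers in `<linux/err.h>`, and `struct ct_priv`, `check_is_err()` and `check_null()` are hypothetical names introduced only for this example. `check_is_err()` models the intended behavior Ariel describes (it also propagates the error code, which is an assumption about the intent), while `check_null()` models the check from the proposed patch.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified userspace stand-ins for the kernel helpers in <linux/err.h>. */
#define MAX_ERRNO 4095

static inline void *ERR_PTR(long error) { return (void *)error; }
static inline long PTR_ERR(const void *ptr) { return (long)ptr; }
static inline int IS_ERR(const void *ptr)
{
    return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical stand-in for struct mlx5_tc_ct_priv. */
struct ct_priv { int unused; };

/*
 * Both helpers return 0 when TC init may continue and a negative errno
 * when it must abort.
 */

/* Intended check: only an error pointer aborts; NULL (no CT support) continues. */
int check_is_err(struct ct_priv *ct)
{
    if (IS_ERR(ct))
        return (int)PTR_ERR(ct);
    return 0;
}

/* Check from the proposed patch: NULL aborts with -ENOMEM. */
int check_null(struct ct_priv *ct)
{
    if (!ct)
        return -ENOMEM;
    return 0;
}
```

With a NULL handle, which in this thread means "CT offload not supported", `check_is_err()` lets initialization continue while `check_null()` aborts it. That is the compatibility break described in the reply.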
On Mon, Sep 28, 2020 at 06:31:04PM +0000, Ariel Levkovich wrote:
> On Sep 28, 2020, at 13:42, Dan Carpenter <dan.carpenter@oracle.com> wrote:
> >
> > The mlx5_tc_ct_init() function doesn't return error pointers it returns
> > NULL. Also we need to set the error codes on this path.
> >
> > Fixes: aedd133d17bc ("net/mlx5e: Support CT offload for tc nic flows")
> > Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
> > ---
> >  drivers/net/ethernet/mellanox/mlx5/core/en_tc.c | 8 ++++++--
> >  1 file changed, 6 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
> > index 104b1c339de0..438fbcf478d1 100644
> > --- a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
> > +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
> > @@ -5224,8 +5224,10 @@ int mlx5e_tc_nic_init(struct mlx5e_priv *priv)
> >
> >          tc->ct = mlx5_tc_ct_init(priv, tc->chains, &priv->fs.tc.mod_hdr,
> >                                   MLX5_FLOW_NAMESPACE_KERNEL);
> > -        if (IS_ERR(tc->ct))
> > +        if (!tc->ct) {
> > +                err = -ENOMEM;
> >                  goto err_ct;
> > +        }
>
> Hi Dan,
> That was implemented like that on purpose. If mlx5_tc_ct_init() returns
> NULL, it means the device doesn't support CT offload, which can happen
> with older devices or old FW on the devices.
> However, in this case we want to continue with the rest of the TC
> initialization because we can still support other TC offloads. There is
> no need to fail the entire TC init in this case. Only if
> mlx5_tc_ct_init() returns an ERR_PTR does it mean that the TC init
> failed not because of lack of support but due to a real error, and only
> then do we want to fail the rest of the TC init.
>
> Your change will break compatibility for devices/FW versions that
> don't have CT offload support.
>

I should have looked at this more closely. It seems the bug is in
mlx5_tc_ct_init().

drivers/net/ethernet/mellanox/mlx5/core/en/tc_ct.c
  1897  struct mlx5_tc_ct_priv *
  1898  mlx5_tc_ct_init(struct mlx5e_priv *priv, struct mlx5_fs_chains *chains,
  1899                  struct mod_hdr_tbl *mod_hdr,
  1900                  enum mlx5_flow_namespace_type ns_type)
  1901  {
  1902          struct mlx5_tc_ct_priv *ct_priv;
  1903          struct mlx5_core_dev *dev;
  1904          const char *msg;
  1905          int err;
  1906
  1907          dev = priv->mdev;
  1908          err = mlx5_tc_ct_init_check_support(priv, ns_type, &msg);
  1909          if (err) {
  1910                  mlx5_core_warn(dev,
  1911                                 "tc ct offload not supported, %s\n",
  1912                                 msg);
  1913                  goto err_support;

This should probably return NULL and it does.

  1914          }
  1915
  1916          ct_priv = kzalloc(sizeof(*ct_priv), GFP_KERNEL);
  1917          if (!ct_priv)
  1918                  goto err_alloc;

This should probably return an ERR_PTR(-ENOMEM) but it instead returns NULL.

  1919
  1920          ct_priv->zone_mapping = mapping_create(sizeof(u16), 0, true);
  1921          if (IS_ERR(ct_priv->zone_mapping)) {
  1922                  err = PTR_ERR(ct_priv->zone_mapping);
  1923                  goto err_mapping_zone;
                        ^^^^^^^^^^^^^^^^^^^^^^
This sets "err" but it still returns NULL.

Then in the caller if the mlx5_tc_ct_init() call returns an error pointer,
it should set the error code. (NULL is a special case of success etc).

Can you fix this and give me a reported-by tag? I think my new analysis
is correct...

regards,
dan carpenter

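The return convention suggested in this analysis (NULL when the optional feature is unsupported, an error pointer when something actually fails) might look roughly like the following userspace sketch. It is only an illustration under stated assumptions, not the eventual driver fix: the `ERR_PTR()` family is re-implemented in simplified form, `calloc()` stands in for `kzalloc()`, and `struct ct_priv`, `struct ct_sim` and `ct_init_sketch()` are hypothetical names that exist only so each path can be exercised.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Simplified userspace stand-ins for the kernel helpers in <linux/err.h>. */
#define MAX_ERRNO 4095

static inline void *ERR_PTR(long error) { return (void *)error; }
static inline long PTR_ERR(const void *ptr) { return (long)ptr; }
static inline int IS_ERR(const void *ptr)
{
    return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical stand-in for struct mlx5_tc_ct_priv. */
struct ct_priv { void *zone_mapping; };

/* Hypothetical switches that force each outcome of the init function. */
struct ct_sim {
    bool supported;
    bool alloc_fails;
    bool mapping_fails;
};

static int dummy_mapping; /* placeholder object for a successful mapping */

struct ct_priv *ct_init_sketch(const struct ct_sim *sim)
{
    struct ct_priv *ct_priv;
    int err;

    /* Optional feature is absent: NULL is a special kind of success. */
    if (!sim->supported)
        return NULL;

    /* Allocation failure is a real error: report it as an error pointer. */
    ct_priv = sim->alloc_fails ? NULL : calloc(1, sizeof(*ct_priv));
    if (!ct_priv)
        return ERR_PTR(-ENOMEM);

    ct_priv->zone_mapping = sim->mapping_fails ? ERR_PTR(-EINVAL)
                                               : (void *)&dummy_mapping;
    if (IS_ERR(ct_priv->zone_mapping)) {
        err = (int)PTR_ERR(ct_priv->zone_mapping);
        goto err_mapping_zone;
    }

    return ct_priv;

err_mapping_zone:
    free(ct_priv);
    /* Hand the saved error code back instead of collapsing it to NULL. */
    return ERR_PTR(err);
}
```

The point of the sketch is the final `return ERR_PTR(err)`: the unwind path keeps the error code it recorded, so a caller that tests `IS_ERR()` can tell a real failure from missing support.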
On Mon, Sep 28, 2020 at 06:31:04PM +0000, Ariel Levkovich wrote:
> On Sep 28, 2020, at 13:42, Dan Carpenter <dan.carpenter@oracle.com> wrote:
> >
> > The mlx5_tc_ct_init() function doesn't return error pointers it returns
> > NULL. Also we need to set the error codes on this path.
> >
> > Fixes: aedd133d17bc ("net/mlx5e: Support CT offload for tc nic flows")
> > Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
> > ---
> >  drivers/net/ethernet/mellanox/mlx5/core/en_tc.c | 8 ++++++--
> >  1 file changed, 6 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
> > index 104b1c339de0..438fbcf478d1 100644
> > --- a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
> > +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
> > @@ -5224,8 +5224,10 @@ int mlx5e_tc_nic_init(struct mlx5e_priv *priv)
> >
> >          tc->ct = mlx5_tc_ct_init(priv, tc->chains, &priv->fs.tc.mod_hdr,
> >                                   MLX5_FLOW_NAMESPACE_KERNEL);
> > -        if (IS_ERR(tc->ct))
> > +        if (!tc->ct) {
> > +                err = -ENOMEM;
> >                  goto err_ct;
> > +        }
>
> Hi Dan,
> That was implemented like that on purpose. If mlx5_tc_ct_init() returns NULL, it means the device doesn't support CT offload, which can happen with older devices or old FW on the devices.
> However, in this case we want to continue with the rest of the TC initialization because we can still support other TC offloads. There is no need to fail the entire TC init in this case. Only if mlx5_tc_ct_init() returns an ERR_PTR does it mean that the TC init failed not because of lack of support but due to a real error, and only then do we want to fail the rest of the TC init.
>
> Your change will break compatibility for devices/FW versions that don't have CT offload support.
>

When we have a function like this which is optional then returning NULL
is a special kind of success as you say. Returning NULL should not
generate a warning message. At the same time, if the user enables the
option and the code fails because we are low on memory then returning an
error pointer is the correct behavior. Just because the feature is
optional does not mean we should ignore what the user told us to do.

This code never returns error pointers. It always returns NULL/success
when an allocation fails. That triggers the first static checker
warning from last year. Now Smatch is complaining about a new static
checker warning:

drivers/net/ethernet/mellanox/mlx5/core/en_tc.c:4754 mlx5e_tc_esw_init() warn: missing error code here? 'IS_ERR()' failed. 'err' = '0'

  4708  int mlx5e_tc_esw_init(struct rhashtable *tc_ht)
  4709  {
  4710          const size_t sz_enc_opts = sizeof(struct tunnel_match_enc_opts);
  4711          struct mlx5_rep_uplink_priv *uplink_priv;
  4712          struct mlx5e_rep_priv *rpriv;
  4713          struct mapping_ctx *mapping;
  4714          struct mlx5_eswitch *esw;
  4715          struct mlx5e_priv *priv;
  4716          int err = 0;
  4717
  4718          uplink_priv = container_of(tc_ht, struct mlx5_rep_uplink_priv, tc_ht);
  4719          rpriv = container_of(uplink_priv, struct mlx5e_rep_priv, uplink_priv);
  4720          priv = netdev_priv(rpriv->netdev);
  4721          esw = priv->mdev->priv.eswitch;
  4722
  4723          uplink_priv->ct_priv = mlx5_tc_ct_init(netdev_priv(priv->netdev),
  4724                                                 esw_chains(esw),
  4725                                                 &esw->offloads.mod_hdr,
  4726                                                 MLX5_FLOW_NAMESPACE_FDB);
  4727          if (IS_ERR(uplink_priv->ct_priv))
  4728                  goto err_ct;

If mlx5_tc_ct_init() fails, which it should do if kmalloc() fails but
currently it does not, then the error should be propagated all the way
back. So this code should preserve the error code instead of returning
success.

  4729
  4730          mapping = mapping_create(sizeof(struct tunnel_match_key),
  4731                                   TUNNEL_INFO_BITS_MASK, true);

regards,
dan carpenter

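The "missing error code" pattern that Smatch reports can be reduced to a few lines. The sketch below is a userspace illustration, not the driver source: the `ERR_PTR()` helpers are simplified stand-ins for `<linux/err.h>`, and `esw_init_buggy()` and `esw_init_fixed()` are hypothetical names that take the result of the CT init call as a parameter so both shapes can be compared.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified userspace stand-ins for the kernel helpers in <linux/err.h>. */
#define MAX_ERRNO 4095

static inline void *ERR_PTR(long error) { return (void *)error; }
static inline long PTR_ERR(const void *ptr) { return (long)ptr; }
static inline int IS_ERR(const void *ptr)
{
    return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Mirrors the reported shape: err starts at 0 and is never updated. */
int esw_init_buggy(void *ct_priv)
{
    int err = 0;

    if (IS_ERR(ct_priv))
        goto err_ct;

    return 0;

err_ct:
    return err; /* still 0, so the failure is reported as success */
}

/* Same flow, but the error code is taken from the pointer before the jump. */
int esw_init_fixed(void *ct_priv)
{
    int err = 0;

    if (IS_ERR(ct_priv)) {
        err = (int)PTR_ERR(ct_priv);
        goto err_ct;
    }

    /* ct_priv may be NULL here: CT offload is simply unavailable. */
    return 0;

err_ct:
    return err;
}
```

Passing `ERR_PTR(-ENOMEM)` to `esw_init_buggy()` yields 0, while `esw_init_fixed()` yields `-ENOMEM`; a NULL handle is accepted by both.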
On Mon, 2021-02-15 at 11:30 +0300, Dan Carpenter wrote:
> On Mon, Sep 28, 2020 at 06:31:04PM +0000, Ariel Levkovich wrote:
> > On Sep 28, 2020, at 13:42, Dan Carpenter <dan.carpenter@oracle.com> wrote:
> > >
> > > The mlx5_tc_ct_init() function doesn't return error pointers it returns
> > > NULL. Also we need to set the error codes on this path.
> > >
> > > Fixes: aedd133d17bc ("net/mlx5e: Support CT offload for tc nic flows")
> > > Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
> > > ---
> > >  drivers/net/ethernet/mellanox/mlx5/core/en_tc.c | 8 ++++++--
> > >  1 file changed, 6 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
> > > index 104b1c339de0..438fbcf478d1 100644
> > > --- a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
> > > +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
> > > @@ -5224,8 +5224,10 @@ int mlx5e_tc_nic_init(struct mlx5e_priv *priv)
> > >
> > >          tc->ct = mlx5_tc_ct_init(priv, tc->chains, &priv->fs.tc.mod_hdr,
> > >                                   MLX5_FLOW_NAMESPACE_KERNEL);
> > > -        if (IS_ERR(tc->ct))
> > > +        if (!tc->ct) {
> > > +                err = -ENOMEM;
> > >                  goto err_ct;
> > > +        }
> >
> > Hi Dan,
> > That was implemented like that on purpose. If mlx5_tc_ct_init() returns
> > NULL, it means the device doesn't support CT offload, which can
> > happen with older devices or old FW on the devices.
> > However, in this case we want to continue with the rest of the TC
> > initialization because we can still support other TC offloads. There is
> > no need to fail the entire TC init in this case. Only if
> > mlx5_tc_ct_init() returns an ERR_PTR does it mean that the TC init
> > failed not because of lack of support but due to a real error, and
> > only then do we want to fail the rest of the TC init.
> >
> > Your change will break compatibility for devices/FW versions that
> > don't have CT offload support.
> >
>
> When we have a function like this which is optional then returning NULL
> is a special kind of success as you say. Returning NULL should not
> generate a warning message. At the same time, if the user enables the
> option and the code fails because we are low on memory then returning an
> error pointer is the correct behavior. Just because the feature is
> optional does not mean we should ignore what the user told us to do.
>
> This code never returns error pointers. It always returns NULL/success
> when an allocation fails. That triggers the first static checker
> warning from last year. Now Smatch is complaining about a new static
> checker warning:
>
> drivers/net/ethernet/mellanox/mlx5/core/en_tc.c:4754 mlx5e_tc_esw_init() warn: missing error code here? 'IS_ERR()' failed. 'err' = '0'
>
>   4708  int mlx5e_tc_esw_init(struct rhashtable *tc_ht)
>   4709  {
>   4710          const size_t sz_enc_opts = sizeof(struct tunnel_match_enc_opts);
>   4711          struct mlx5_rep_uplink_priv *uplink_priv;
>   4712          struct mlx5e_rep_priv *rpriv;
>   4713          struct mapping_ctx *mapping;
>   4714          struct mlx5_eswitch *esw;
>   4715          struct mlx5e_priv *priv;
>   4716          int err = 0;
>   4717
>   4718          uplink_priv = container_of(tc_ht, struct mlx5_rep_uplink_priv, tc_ht);
>   4719          rpriv = container_of(uplink_priv, struct mlx5e_rep_priv, uplink_priv);
>   4720          priv = netdev_priv(rpriv->netdev);
>   4721          esw = priv->mdev->priv.eswitch;
>   4722
>   4723          uplink_priv->ct_priv = mlx5_tc_ct_init(netdev_priv(priv->netdev),
>   4724                                                 esw_chains(esw),
>   4725                                                 &esw->offloads.mod_hdr,
>   4726                                                 MLX5_FLOW_NAMESPACE_FDB);
>   4727          if (IS_ERR(uplink_priv->ct_priv))
>   4728                  goto err_ct;
>

The driver is designed to tolerate failure in mlx5_tc_ct_init() and is
supposed to continue here, not abort with return 0. So it should either
return a proper errno or continue initializing; the code currently has a
bug.

Thanks, Dan, for pointing that out.

> If mlx5_tc_ct_init() fails, which it should do if kmalloc() fails but
> currently it does not, then the error should be propagated all the way
> back. So this code should preserve the error code instead of returning
> success.
>
>   4729
>   4730          mapping = mapping_create(sizeof(struct tunnel_match_key),
>   4731                                   TUNNEL_INFO_BITS_MASK, true);
>
> regards,
> dan carpenter
>
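Of the two options named in this reply, "return a proper errno" amounts to taking the code from the error pointer before bailing out. The other option, "continue initializing", could be read as in the sketch below. This is only one possible interpretation and not the fix that was applied: the `ERR_PTR()` helpers are simplified userspace stand-ins for `<linux/err.h>`, and `esw_init_tolerant()` is a hypothetical function that treats a failed CT init the same way as missing CT support.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Simplified userspace stand-ins for the kernel helpers in <linux/err.h>. */
#define MAX_ERRNO 4095

static inline void *ERR_PTR(long error) { return (void *)error; }
static inline long PTR_ERR(const void *ptr) { return (long)ptr; }
static inline int IS_ERR(const void *ptr)
{
    return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/*
 * Hypothetical caller that tolerates a CT init failure: it logs the error,
 * downgrades the handle to NULL ("no CT offload") and carries on.
 */
int esw_init_tolerant(void **ct_priv)
{
    if (IS_ERR(*ct_priv)) {
        fprintf(stderr,
                "ct init failed (%ld), continuing without CT offload\n",
                PTR_ERR(*ct_priv));
        *ct_priv = NULL;
    }

    /* ... the rest of the initialization would run here ... */
    return 0;
}
```

Clearing the handle matters in this variant: later code that only checks for NULL must never see a stale error pointer.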
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
index 104b1c339de0..438fbcf478d1 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
@@ -5224,8 +5224,10 @@ int mlx5e_tc_nic_init(struct mlx5e_priv *priv)
 
         tc->ct = mlx5_tc_ct_init(priv, tc->chains, &priv->fs.tc.mod_hdr,
                                  MLX5_FLOW_NAMESPACE_KERNEL);
-        if (IS_ERR(tc->ct))
+        if (!tc->ct) {
+                err = -ENOMEM;
                 goto err_ct;
+        }
 
         tc->netdevice_nb.notifier_call = mlx5e_tc_netdev_event;
         err = register_netdevice_notifier_dev_net(priv->netdev,
@@ -5300,8 +5300,10 @@ int mlx5e_tc_esw_init(struct rhashtable *tc_ht)
                                                esw_chains(esw),
                                                &esw->offloads.mod_hdr,
                                                MLX5_FLOW_NAMESPACE_FDB);
-        if (IS_ERR(uplink_priv->ct_priv))
+        if (!uplink_priv->ct_priv) {
+                err = -ENOMEM;
                 goto err_ct;
+        }
 
         mapping = mapping_create(sizeof(struct tunnel_match_key),
                                  TUNNEL_INFO_BITS_MASK, true);
The mlx5_tc_ct_init() function doesn't return error pointers it returns
NULL. Also we need to set the error codes on this path.

Fixes: aedd133d17bc ("net/mlx5e: Support CT offload for tc nic flows")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en_tc.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)