Message ID | 20230609115806.2625564-1-saikrishnag@marvell.com (mailing list archive) |
---|---|
State | Changes Requested |
Delegated to: | Netdev Maintainers |
Headers | show |
Series | [net,v2] octeontx2-af: Move validation of ptp pointer before its usage | expand |
On Fri, Jun 09, 2023 at 05:28:06PM +0530, Sai Krishna wrote: > @@ -428,7 +427,7 @@ static int ptp_probe(struct pci_dev *pdev, > return 0; > > error_free: > - devm_kfree(dev, ptp); > + kfree(ptp); Yeah. It's strange any time we call devm_kfree()... So there is something here which I have not understood. > > error: > /* For `ptp_get()` we need to differentiate between the case This probe function is super weird how it returns success on the failure path. One concern, I had initially was that if anything returns -EPROBE_DEFER then we cannot recover. That's not possible in the current code, but it makes me itch... But here is a different crash. drivers/net/ethernet/marvell/octeontx2/af/ptp.c 432 error: 433 /* For `ptp_get()` we need to differentiate between the case 434 * when the core has not tried to probe this device and the case when 435 * the probe failed. In the later case we pretend that the 436 * initialization was successful and keep the error in 437 * `dev->driver_data`. 438 */ 439 pci_set_drvdata(pdev, ERR_PTR(err)); 440 if (!first_ptp_block) 441 first_ptp_block = ERR_PTR(err); first_ptp_block is NULL for unprobed, an error pointer for probe failure, or valid pointer. 442 443 return 0; 444 } drivers/net/ethernet/marvell/octeontx2/af/ptp.c 201 struct ptp *ptp_get(void) 202 { 203 struct ptp *ptp = first_ptp_block; ^^^^^^^^^^^^^^^^^^^^^^ 204 205 /* Check PTP block is present in hardware */ 206 if (!pci_dev_present(ptp_id_table)) 207 return ERR_PTR(-ENODEV); 208 /* Check driver is bound to PTP block */ 209 if (!ptp) 210 ptp = ERR_PTR(-EPROBE_DEFER); 211 else 212 pci_dev_get(ptp->pdev); ^^^^^^^^^ if first_ptp_block is an error pointer this will Oops. 213 214 return ptp; 215 } regards, dan carpenter
> -----Original Message----- > From: Dan Carpenter <dan.carpenter@linaro.org> > Sent: Friday, June 9, 2023 6:48 PM > To: Sai Krishna Gajula <saikrishnag@marvell.com> > Cc: davem@davemloft.net; edumazet@google.com; kuba@kernel.org; > pabeni@redhat.com; netdev@vger.kernel.org; linux-kernel@vger.kernel.org; > Sunil Kovvuri Goutham <sgoutham@marvell.com>; > maciej.fijalkowski@intel.com; Naveen Mamindlapalli > <naveenm@marvell.com> > Subject: Re: [net PATCH v2] octeontx2-af: Move validation of ptp > pointer before its usage > > On Fri, Jun 09, 2023 at 05:28:06PM +0530, Sai Krishna wrote: > > @@ -428,7 +427,7 @@ static int ptp_probe(struct pci_dev *pdev, > > return 0; > > > > error_free: > > - devm_kfree(dev, ptp); > > + kfree(ptp); > > Yeah. It's strange any time we call devm_kfree()... So there is something > here which I have not understood. We moved from devm_kzalloc/devm_kfree to kzalloc/kfree as per review comments from Maciej. > > > > > error: > > /* For `ptp_get()` we need to differentiate between the case > > This probe function is super weird how it returns success on the failure path. > One concern, I had initially was that if anything returns -EPROBE_DEFER then > we cannot recover. That's not possible in the current code, but it makes me > itch... But here is a different crash. > In few circumstances, the PTP device is probed before the AF device in the driver. In such instance, -EPROBE_DEFER is used. -- EDEFER_PROBE is useful when probe order changes. Ex: AF driver probes before PTP. > drivers/net/ethernet/marvell/octeontx2/af/ptp.c > 432 error: > 433 /* For `ptp_get()` we need to differentiate between the case > 434 * when the core has not tried to probe this device and the case > when > 435 * the probe failed. In the later case we pretend that the > 436 * initialization was successful and keep the error in > 437 * `dev->driver_data`. > 438 */ > 439 pci_set_drvdata(pdev, ERR_PTR(err)); > 440 if (!first_ptp_block) > 441 first_ptp_block = ERR_PTR(err); > > first_ptp_block is NULL for unprobed, an error pointer for probe failure, or > valid pointer. This is correct. > > 442 > 443 return 0; > 444 } > > drivers/net/ethernet/marvell/octeontx2/af/ptp.c > 201 struct ptp *ptp_get(void) > 202 { > 203 struct ptp *ptp = first_ptp_block; > ^^^^^^^^^^^^^^^^^^^^^^ > > 204 > 205 /* Check PTP block is present in hardware */ > 206 if (!pci_dev_present(ptp_id_table)) > 207 return ERR_PTR(-ENODEV); > 208 /* Check driver is bound to PTP block */ > 209 if (!ptp) > 210 ptp = ERR_PTR(-EPROBE_DEFER); > 211 else > 212 pci_dev_get(ptp->pdev); > ^^^^^^^^^ if first_ptp_block is an error pointer this will > Oops. Will fix this in v3 patch. Thanks, Sai > > 213 > 214 return ptp; > 215 } > > regards, > dan carpenter
On Fri, Jun 23, 2023 at 11:28:19AM +0000, Sai Krishna Gajula wrote: > > This probe function is super weird how it returns success on the failure path. > > One concern, I had initially was that if anything returns -EPROBE_DEFER then > > we cannot recover. That's not possible in the current code, but it makes me > > itch... But here is a different crash. > > > > In few circumstances, the PTP device is probed before the AF device in > the driver. In such instance, -EPROBE_DEFER is used. > -- EDEFER_PROBE is useful when probe order changes. Ex: AF driver probes before PTP. > You're describing how -EPROBE_DEFER is *supposed* to work. But that's not what this driver does. If the AF driver is probed before the PTP driver then ptp_probe() should return -EPROBE_DEFER and that would allow the kernel to automatically retry ptp_probe() later. But instead of that, ptp_probe() returns success. So I guess the user would have to manually rmmod it and insmod it again? So, what I'm saying I don't understand why we can't do this in the normal way. The other thing I'm saying is that the weird return success on error stuff hasn't been tested or we would have discovered the crash through testing. regards, dan carpenter
> -----Original Message----- > From: Dan Carpenter <dan.carpenter@linaro.org> > Sent: Friday, June 23, 2023 5:14 PM > To: Sai Krishna Gajula <saikrishnag@marvell.com> > Cc: davem@davemloft.net; edumazet@google.com; kuba@kernel.org; > pabeni@redhat.com; netdev@vger.kernel.org; linux- > kernel@vger.kernel.org; Sunil Kovvuri Goutham <sgoutham@marvell.com>; > maciej.fijalkowski@intel.com; Naveen Mamindlapalli > <naveenm@marvell.com> > Subject: Re: [net PATCH v2] octeontx2-af: Move validation of ptp > pointer before its usage > > On Fri, Jun 23, 2023 at 11:28:19AM +0000, Sai Krishna Gajula wrote: > > > This probe function is super weird how it returns success on the failure > path. > > > One concern, I had initially was that if anything returns > > > -EPROBE_DEFER then we cannot recover. That's not possible in the > > > current code, but it makes me itch... But here is a different crash. > > > > > > > In few circumstances, the PTP device is probed before the AF device in > > the driver. In such instance, -EPROBE_DEFER is used. > > -- EDEFER_PROBE is useful when probe order changes. Ex: AF driver probes > before PTP. > > > > You're describing how -EPROBE_DEFER is *supposed* to work. But that's not > what this driver does. > > If the AF driver is probed before the PTP driver then ptp_probe() should > return -EPROBE_DEFER and that would allow the kernel to automatically retry > ptp_probe() later. But instead of that, ptp_probe() returns success. So I > guess the user would have to manually rmmod it and insmod it again? So, > what I'm saying I don't understand why we can't do this in the normal way. > > The other thing I'm saying is that the weird return success on error stuff > hasn't been tested or we would have discovered the crash through testing. > > regards, > dan carpenter As suggested, we will return error in ptp_probe in case of any failure conditions. In this case AF driver continues without PTP support. When the AF driver is probed before PTP driver , we will defer the AF probe. Hope these changes are inline with your view. I will send a v3 patch with these changes. Regards, Sai
On Fri, Jun 30, 2023 at 05:19:27AM +0000, Sai Krishna Gajula wrote: > > > -----Original Message----- > > From: Dan Carpenter <dan.carpenter@linaro.org> > > Sent: Friday, June 23, 2023 5:14 PM > > To: Sai Krishna Gajula <saikrishnag@marvell.com> > > Cc: davem@davemloft.net; edumazet@google.com; kuba@kernel.org; > > pabeni@redhat.com; netdev@vger.kernel.org; linux- > > kernel@vger.kernel.org; Sunil Kovvuri Goutham <sgoutham@marvell.com>; > > maciej.fijalkowski@intel.com; Naveen Mamindlapalli > > <naveenm@marvell.com> > > Subject: Re: [net PATCH v2] octeontx2-af: Move validation of ptp > > pointer before its usage > > > > On Fri, Jun 23, 2023 at 11:28:19AM +0000, Sai Krishna Gajula wrote: > > > > This probe function is super weird how it returns success on the failure > > path. > > > > One concern, I had initially was that if anything returns > > > > -EPROBE_DEFER then we cannot recover. That's not possible in the > > > > current code, but it makes me itch... But here is a different crash. > > > > > > > > > > In few circumstances, the PTP device is probed before the AF device in > > > the driver. In such instance, -EPROBE_DEFER is used. > > > -- EDEFER_PROBE is useful when probe order changes. Ex: AF driver probes > > before PTP. > > > > > > > You're describing how -EPROBE_DEFER is *supposed* to work. But that's not > > what this driver does. > > > > If the AF driver is probed before the PTP driver then ptp_probe() should > > return -EPROBE_DEFER and that would allow the kernel to automatically retry > > ptp_probe() later. But instead of that, ptp_probe() returns success. So I > > guess the user would have to manually rmmod it and insmod it again? So, > > what I'm saying I don't understand why we can't do this in the normal way. > > > > The other thing I'm saying is that the weird return success on error stuff > > hasn't been tested or we would have discovered the crash through testing. > > > > regards, > > dan carpenter > > As suggested, we will return error in ptp_probe in case of any > failure conditions. In this case AF driver continues without PTP support. I'm concerned about the "AF driver continues without PTP support". > When the AF driver is probed before PTP driver , we will defer the AF > probe. Hope these changes are inline with your view. > I will send a v3 patch with these changes. > I don't really understand the situation. You have two drivers. Normally, the relationship is very simple where you have to load one before you can load the other. But here it sounds like the relationships are very complicated and you are loading one in a degraded state for some reason... When drivers are loaded that happens in drivers/base/dd.c. We start with a list of drivers to probe. Then if any of them return -EPROBE_DEFER, we put them on deferred_probe_pending_list. Then as soon as we manage to probe another driver successfully we put the drivers on deferred_probe_pending_list onto another list and start trying to probe them again. That process continues until we've gone through the list of drivers and nothing succeeds. regards, dan carpenter
On Fri, Jun 30, 2023 at 11:16 AM Dan Carpenter <dan.carpenter@linaro.org> wrote: > > On Fri, Jun 30, 2023 at 05:19:27AM +0000, Sai Krishna Gajula wrote: > > > > > -----Original Message----- > > > > As suggested, we will return error in ptp_probe in case of any > > failure conditions. In this case AF driver continues without PTP support. > > I'm concerned about the "AF driver continues without PTP support". > Yes, it doesn't make sense to proceed with AF driver if PTP driver probe has failed. PTP driver probe will fail upon memory alloc or ioremap failures, such failures will most likely be encountered by AF driver as well. So better not continue with AF driver probe. > > When the AF driver is probed before PTP driver , we will defer the AF > > probe. Hope these changes are inline with your view. > > I will send a v3 patch with these changes. > > > > I don't really understand the situation. You have two drivers. > Normally, the relationship is very simple where you have to load one > before you can load the other. But here it sounds like the relationships > are very complicated and you are loading one in a degraded state for > some reason... > No, the relationship is simple. Idea is to defer AF driver probe until PTP driver is loaded. Once the above is fixed there won't be any degraded state. Thanks, Sunil.
diff --git a/drivers/net/ethernet/marvell/octeontx2/af/ptp.c b/drivers/net/ethernet/marvell/octeontx2/af/ptp.c index 3411e2e47d46..248316c4a53f 100644 --- a/drivers/net/ethernet/marvell/octeontx2/af/ptp.c +++ b/drivers/net/ethernet/marvell/octeontx2/af/ptp.c @@ -388,11 +388,10 @@ static int ptp_extts_on(struct ptp *ptp, int on) static int ptp_probe(struct pci_dev *pdev, const struct pci_device_id *ent) { - struct device *dev = &pdev->dev; struct ptp *ptp; int err; - ptp = devm_kzalloc(dev, sizeof(*ptp), GFP_KERNEL); + ptp = kzalloc(sizeof(struct ptp), GFP_KERNEL); if (!ptp) { err = -ENOMEM; goto error; @@ -428,7 +427,7 @@ static int ptp_probe(struct pci_dev *pdev, return 0; error_free: - devm_kfree(dev, ptp); + kfree(ptp); error: /* For `ptp_get()` we need to differentiate between the case @@ -449,16 +448,17 @@ static void ptp_remove(struct pci_dev *pdev) struct ptp *ptp = pci_get_drvdata(pdev); u64 clock_cfg; - if (cn10k_ptp_errata(ptp) && hrtimer_active(&ptp->hrtimer)) - hrtimer_cancel(&ptp->hrtimer); - if (IS_ERR_OR_NULL(ptp)) return; + if (cn10k_ptp_errata(ptp) && hrtimer_active(&ptp->hrtimer)) + hrtimer_cancel(&ptp->hrtimer); + /* Disable PTP clock */ clock_cfg = readq(ptp->reg_base + PTP_CLOCK_CFG); clock_cfg &= ~PTP_CLOCK_CFG_PTP_EN; writeq(clock_cfg, ptp->reg_base + PTP_CLOCK_CFG); + kfree(ptp); } static const struct pci_device_id ptp_id_table[] = {