Message ID | 20160126095205.0e5923bd@endymion.delvare (mailing list archive) |
---|---|
State | New, archived |
Delegated to: | Bjorn Helgaas |
On Tue, Jan 26, 2016 at 09:52:05AM +0100, Jean Delvare (by way of Jean Delvare <jdelvare@suse.de>) wrote:
> The aer_inject driver is very quiet. In most cases, it merely returns
> an error code to user-space, leaving the user with little clue about
> the actual reason for the failure.
>
> So, log error messages for 4 of the most frequent causes of failure:
> * Can't find the root port of the specified device.
> * Device doesn't support AER.
> * Root port doesn't support AER.
> * AER device not found.
> This gives the user a chance to understand why aer-inject failed.
>
> Based on a preliminary patch by Thomas Renninger.
>
> Signed-off-by: Jean Delvare <jdelvare@suse.de>
> Cc: Thomas Renninger <trenn@suse.de>
> Cc: Bjorn Helgaas <bhelgaas@google.com>
> ---
>  drivers/pci/pcie/aer/aer_inject.c | 8 +++++++-
>  1 file changed, 7 insertions(+), 1 deletion(-)
>
> --- linux-4.5-rc0.orig/drivers/pci/pcie/aer/aer_inject.c 2016-01-20 09:25:54.815852332 +0100
> +++ linux-4.5-rc0/drivers/pci/pcie/aer/aer_inject.c 2016-01-26 09:41:17.361994839 +0100
> @@ -334,12 +334,14 @@ static int aer_inject(struct aer_error_i
>  		return -ENODEV;
>  	rpdev = pcie_find_root_port(dev);
>  	if (!rpdev) {
> +		dev_err(&dev->dev, "aer_inject: Root port not found\n");
>  		ret = -ENODEV;
>  		goto out_put;
>  	}
>
>  	pos_cap_err = pci_find_ext_capability(dev, PCI_EXT_CAP_ID_ERR);
>  	if (!pos_cap_err) {
> +		dev_err(&dev->dev, "aer_inject: Device doesn't support AER\n");
>  		ret = -EPERM;

Btw, this -EPERM looks wrong - if we're checking for capabilities, we
shouldn't be returning -EPERM but maybe something like -ENODEV or so.

>  		goto out_put;
>  	}
> @@ -350,6 +352,8 @@ static int aer_inject(struct aer_error_i
>
>  	rp_pos_cap_err = pci_find_ext_capability(rpdev, PCI_EXT_CAP_ID_ERR);
>  	if (!rp_pos_cap_err) {
> +		dev_err(&rpdev->dev,
> +			"aer_inject: Root port doesn't support AER\n");
>  		ret = -EPERM;

Ditto.

>  		goto out_put;
>  	}
> @@ -462,8 +466,10 @@ static int aer_inject(struct aer_error_i
>  			goto out_put;
>  		}
>  		aer_irq(-1, edev);
> -	} else
> +	} else {
> +		dev_err(&rpdev->dev, "aer_inject: AER device not found\n");

So other error prints in that function do printk(KERN_WARNING. Why
dev_err()?

Why not pr_err() and define pr_fmt to "aer_inject: " and then drop
that prefix from the messages?

Thanks.
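For readers comparing the two logging styles raised in this review, here is a minimal user-space sketch, not kernel code. `mock_pr_err`, `mock_dev_err` and `struct mock_dev` are hypothetical stand-ins invented for illustration; the real `pr_err()` and `dev_err()` live in the kernel headers and write to the kernel log. The `"<driver> <device name>: "` prefix is an approximation of what `dev_err()` prints, and the driver name and PCI address used below are made up.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Same idea as the kernel convention: prepend a fixed prefix to every
 * format string through string-literal concatenation. */
#define pr_fmt(fmt) "aer_inject: " fmt

/* Hypothetical stand-in for struct device: only the two strings that
 * end up in front of a dev_err() message. */
struct mock_dev {
	const char *driver;	/* name of the bound driver */
	const char *name;	/* device name, e.g. a PCI address */
};

/* Mock of pr_err(): fixed prefix, no device information.
 * Accepts only a string literal because of the concatenation. */
#define mock_pr_err(buf, len, literal) \
	snprintf((buf), (len), "%s", pr_fmt(literal))

/* Mock of dev_err(): "<driver> <device name>: <message>". */
static int mock_dev_err(char *buf, size_t len, const struct mock_dev *dev,
			const char *msg)
{
	assert(buf && dev && msg);
	return snprintf(buf, len, "%s %s: %s", dev->driver, dev->name, msg);
}
```

With a mock device `{ "pcieport", "0000:00:1c.0" }`, the message "Root port doesn't support AER" renders as `aer_inject: Root port doesn't support AER` through `mock_pr_err` and as `pcieport 0000:00:1c.0: Root port doesn't support AER` through `mock_dev_err`. The first form needs no per-message prefix; the second identifies the device involved.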
Hi Borislav,

Thanks for the quick review.

On Tuesday 26 January 2016 at 11:12 +0100, Borislav Petkov wrote:
> On Tue, Jan 26, 2016 at 09:52:05AM +0100, Jean Delvare (by way of Jean Delvare <jdelvare@suse.de>) wrote:
> > The aer_inject driver is very quiet. In most cases, it merely returns
> > an error code to user-space, leaving the user with little clue about
> > the actual reason for the failure.
> >
> > So, log error messages for 4 of the most frequent causes of failure:
> > * Can't find the root port of the specified device.
> > * Device doesn't support AER.
> > * Root port doesn't support AER.
> > * AER device not found.
> > This gives the user a chance to understand why aer-inject failed.
> >
> > Based on a preliminary patch by Thomas Renninger.
> >
> > Signed-off-by: Jean Delvare <jdelvare@suse.de>
> > Cc: Thomas Renninger <trenn@suse.de>
> > Cc: Bjorn Helgaas <bhelgaas@google.com>
> > ---
> >  drivers/pci/pcie/aer/aer_inject.c | 8 +++++++-
> >  1 file changed, 7 insertions(+), 1 deletion(-)
> >
> > --- linux-4.5-rc0.orig/drivers/pci/pcie/aer/aer_inject.c 2016-01-20 09:25:54.815852332 +0100
> > +++ linux-4.5-rc0/drivers/pci/pcie/aer/aer_inject.c 2016-01-26 09:41:17.361994839 +0100
> > @@ -334,12 +334,14 @@ static int aer_inject(struct aer_error_i
> >  		return -ENODEV;
> >  	rpdev = pcie_find_root_port(dev);
> >  	if (!rpdev) {
> > +		dev_err(&dev->dev, "aer_inject: Root port not found\n");
> >  		ret = -ENODEV;
> >  		goto out_put;
> >  	}
> >
> >  	pos_cap_err = pci_find_ext_capability(dev, PCI_EXT_CAP_ID_ERR);
> >  	if (!pos_cap_err) {
> > +		dev_err(&dev->dev, "aer_inject: Device doesn't support AER\n");
> >  		ret = -EPERM;
>
> Btw, this -EPERM looks wrong - if we're checking for capabilities, we
> shouldn't be returning -EPERM but maybe something like -ENODEV or so.

I agree. It was originally -ENOTTY, changed to -EPERM by:

commit e82b14bdd390c534750a191f9936f842bab255d4
Author: Prarit Bhargava <prarit@redhat.com>
Date: Wed Mar 20 12:04:43 2013 +0000

But I'd say -EPERM is hardly better. The problem with -ENODEV is that it
is already returned by this function for several other error causes.
Also the aer-inject user-space tool will print the error message from
the error code, and I don't think "No such device" is helpful in that
case. What about -ENOTSUPP ("Operation not supported") or
-EPROTONOSUPPORT ("Protocol not supported")?

I can change it if nobody objects. I think the change can be included in
this patch as it is quite related.

> >  		goto out_put;
> >  	}
> > @@ -350,6 +352,8 @@ static int aer_inject(struct aer_error_i
> >
> >  	rp_pos_cap_err = pci_find_ext_capability(rpdev, PCI_EXT_CAP_ID_ERR);
> >  	if (!rp_pos_cap_err) {
> > +		dev_err(&rpdev->dev,
> > +			"aer_inject: Root port doesn't support AER\n");
> >  		ret = -EPERM;
>
> Ditto.
>
> >  		goto out_put;
> >  	}
> > @@ -462,8 +466,10 @@ static int aer_inject(struct aer_error_i
> >  			goto out_put;
> >  		}
> >  		aer_irq(-1, edev);
> > -	} else
> > +	} else {
> > +		dev_err(&rpdev->dev, "aer_inject: AER device not found\n");
>
> So other error prints in that function do printk(KERN_WARNING. Why
> dev_err()?

I'd rather ask, why printk? ;-) Using raw printk is considered bad and
should be avoided whenever possible. So says checkpatch.pl. If anything,
all these printks should be converted to at least pr_* and ideally
dev_*. But that would be a separate patch.

> Why not pr_err() and define pr_fmt to "aer_inject: " and then drop
> that prefix from the messages?

Because I believe that including the device name in the error messages
makes them more helpful to understand and diagnose the problem. If the
device where we try to inject the error has a problem, its PCI name
will be included in the error message. If the error is with the root
port, then we include the root port's PCI name. If I used pr_err()
instead then the device information would be missing.
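The point about the user-space tool printing a message derived from the error code can be checked with a small user-space sketch. It lists the text the C library associates with each errno value mentioned in this thread; the helper name `print_candidates` and the table are made up for illustration, and the exact strings depend on the C library, so no output is shown. One assumption to flag: as far as I know, the kernel's ENOTSUPP is an internal code that is not defined in the user-space `<errno.h>`, so the sketch uses EOPNOTSUPP as the code that maps to an "Operation not supported" style message.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Errno values discussed in the thread, paired with their names. */
struct errno_candidate {
	const char *name;
	int code;
};

static const struct errno_candidate candidates[] = {
	{ "ENOTTY",          ENOTTY },		/* original return value */
	{ "EPERM",           EPERM },		/* current return value */
	{ "ENODEV",          ENODEV },		/* reviewer's suggestion */
	{ "EOPNOTSUPP",      EOPNOTSUPP },	/* stand-in for ENOTSUPP (assumption) */
	{ "EPROTONOSUPPORT", EPROTONOSUPPORT },
};

#define N_CANDIDATES (sizeof(candidates) / sizeof(candidates[0]))

/* Print one "NAME  message" line per candidate, using the C library's
 * strerror() text; returns the number of lines written. */
static size_t print_candidates(FILE *out)
{
	size_t i;

	for (i = 0; i < N_CANDIDATES; i++)
		fprintf(out, "%-16s %s\n", candidates[i].name,
			strerror(candidates[i].code));
	return i;
}
```

Calling `print_candidates(stdout)` from a `main()` gives a quick view of what a user would read for each candidate return value.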
On Tue, Jan 26, 2016 at 01:27:18PM +0100, Jean Delvare wrote:
> But I'd say -EPERM is hardly better. The problem with -ENODEV is that it
> is already returned by this function for several other error causes.
> Also the aer-inject user-space tool will print the error message from
> the error code, and I don't think "No such device" is helpful in that
> case. What about -ENOTSUPP ("Operation not supported") or
> -EPROTONOSUPPORT ("Protocol not supported")?

Makes sense.

> I can change it if nobody objects. I think the change can be included in
> this patch as it is quite related.

I'd do a separate patch but this is only my opinion. I guess that's
Bjorn's call.

> I'd rather ask, why printk? ;-) Using raw printk is considered bad and
> should be avoided whenever possible.

Hmm, interesting. Why?

> So says checkpatch.pl.

Please don't tell me you believe what checkpatch says.

> > Why not pr_err() and define pr_fmt to "aer_inject: " and then drop
> > that prefix from the messages?
>
> Because I believe that including the device name in the error messages
> makes them more helpful to understand and diagnose the problem. If the
> device where we try to inject the error has a problem, its PCI name
> will be included in the error message. If the error is with the root
> port, then we include the root port's PCI name. If I used pr_err()
> instead then the device information would be missing.

True, that's a good argument.

However, if you're doing aer injection, you already *know* the device
you're injecting to. Unless you want to inject in multiple devices and
then it is helpful.

So sure, dev_* sounds better as it gives more info about which device
fails, but then please convert the whole driver.

Thanks.
On Tuesday 26 January 2016 at 13:49 +0100, Borislav Petkov wrote:
> On Tue, Jan 26, 2016 at 01:27:18PM +0100, Jean Delvare wrote:
> > But I'd say -EPERM is hardly better. The problem with -ENODEV is that it
> > is already returned by this function for several other error causes.
> > Also the aer-inject user-space tool will print the error message from
> > the error code, and I don't think "No such device" is helpful in that
> > case. What about -ENOTSUPP ("Operation not supported") or
> > -EPROTONOSUPPORT ("Protocol not supported")?
>
> Makes sense.
>
> > I can change it if nobody objects. I think the change can be included in
> > this patch as it is quite related.
>
> I'd do a separate patch but this is only my opinion. I guess that's
> Bjorn's call.

I am almost always advocating for separate patches, but here it seemed
like hairsplitting so I wasn't sure. I'm fine both ways really.

> > I'd rather ask, why printk? ;-) Using raw printk is considered bad and
> > should be avoided whenever possible.
>
> Hmm, interesting. Why?

I guess the idea is that it makes message formats more consistent and
valuable.

> > So says checkpatch.pl.
>
> Please don't tell me you believe what checkpatch says.

Of course I believe it, as long as it says what I want to hear. If not
then I just claim it's a piece of crap and ignore it ;-) As everybody
does, it seems.

> > > Why not pr_err() and define pr_fmt to "aer_inject: " and then drop
> > > that prefix from the messages?
> >
> > Because I believe that including the device name in the error messages
> > makes them more helpful to understand and diagnose the problem. If the
> > device where we try to inject the error has a problem, its PCI name
> > will be included in the error message. If the error is with the root
> > port, then we include the root port's PCI name. If I used pr_err()
> > instead then the device information would be missing.
>
> True, that's a good argument.
>
> However, if you're doing aer injection, you already *know* the device
> you're injecting to. Unless you want to inject in multiple devices and
> then it is helpful.

You know the device, but you don't know its root device, which
apparently matters a lot for AER, and is used for 2 of the messages I
introduced. Even for the device itself, a confirmation of the PCI
device name is always good to have, to avoid confusion if you made a
typo in your injection data for example.

> So sure, dev_* sounds better as it gives more info about which device
> fails, but then please convert the whole driver.

OK, I'll work on this once the first round of reviews is done. I don't
know if others have more comments, so let's wait a bit.
On Tue, Jan 26, 2016 at 02:05:54PM +0100, Jean Delvare wrote:
> On Tuesday 26 January 2016 at 13:49 +0100, Borislav Petkov wrote:
> > On Tue, Jan 26, 2016 at 01:27:18PM +0100, Jean Delvare wrote:
> > > But I'd say -EPERM is hardly better. The problem with -ENODEV is that it
> > > is already returned by this function for several other error causes.
> > > Also the aer-inject user-space tool will print the error message from
> > > the error code, and I don't think "No such device" is helpful in that
> > > case. What about -ENOTSUPP ("Operation not supported") or
> > > -EPROTONOSUPPORT ("Protocol not supported")?
> >
> > Makes sense.
> >
> > > I can change it if nobody objects. I think the change can be included in
> > > this patch as it is quite related.
> >
> > I'd do a separate patch but this is only my opinion. I guess that's
> > Bjorn's call.
>
> I am almost always advocating for separate patches, but here it seemed
> like hairsplitting so I wasn't sure. I'm fine both ways really.

I'd prefer one patch to change the errno (only) and another to add the
printk logging.

I definitely prefer dev_* whenever possible. The aer_inject user knows
the relevant device at the time, but dev_* makes the dmesg log more
useful later.

In fact, your patch only adds logging to some error paths. I'd like to
have some indication in dmesg that aer_inject was used at all. Maybe
even a synopsis of the injected error, though maybe that's too much if
aer_inject is used in an automated way. It would just be nice to have a
dmesg indication that subsequent AER events *might* be injected rather
than real errors.

Bjorn
--- linux-4.5-rc0.orig/drivers/pci/pcie/aer/aer_inject.c 2016-01-20 09:25:54.815852332 +0100
+++ linux-4.5-rc0/drivers/pci/pcie/aer/aer_inject.c 2016-01-26 09:41:17.361994839 +0100
@@ -334,12 +334,14 @@ static int aer_inject(struct aer_error_i
 		return -ENODEV;
 	rpdev = pcie_find_root_port(dev);
 	if (!rpdev) {
+		dev_err(&dev->dev, "aer_inject: Root port not found\n");
 		ret = -ENODEV;
 		goto out_put;
 	}
 
 	pos_cap_err = pci_find_ext_capability(dev, PCI_EXT_CAP_ID_ERR);
 	if (!pos_cap_err) {
+		dev_err(&dev->dev, "aer_inject: Device doesn't support AER\n");
 		ret = -EPERM;
 		goto out_put;
 	}
@@ -350,6 +352,8 @@ static int aer_inject(struct aer_error_i
 
 	rp_pos_cap_err = pci_find_ext_capability(rpdev, PCI_EXT_CAP_ID_ERR);
 	if (!rp_pos_cap_err) {
+		dev_err(&rpdev->dev,
+			"aer_inject: Root port doesn't support AER\n");
 		ret = -EPERM;
 		goto out_put;
 	}
@@ -462,8 +466,10 @@ static int aer_inject(struct aer_error_i
 			goto out_put;
 		}
 		aer_irq(-1, edev);
-	} else
+	} else {
+		dev_err(&rpdev->dev, "aer_inject: AER device not found\n");
 		ret = -EINVAL;
+	}
 out_put:
 	kfree(err_alloc);
 	kfree(rperr_alloc);
The aer_inject driver is very quiet. In most cases, it merely returns
an error code to user-space, leaving the user with little clue about
the actual reason for the failure.

So, log error messages for 4 of the most frequent causes of failure:
* Can't find the root port of the specified device.
* Device doesn't support AER.
* Root port doesn't support AER.
* AER device not found.
This gives the user a chance to understand why aer-inject failed.

Based on a preliminary patch by Thomas Renninger.

Signed-off-by: Jean Delvare <jdelvare@suse.de>
Cc: Thomas Renninger <trenn@suse.de>
Cc: Bjorn Helgaas <bhelgaas@google.com>
---
 drivers/pci/pcie/aer/aer_inject.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)