Message ID | 20211124171652.831184-7-anthony.l.nguyen@intel.com (mailing list archive) |
---|---
State | Changes Requested |
Delegated to: | Netdev Maintainers |
Series | 40GbE Intel Wired LAN Driver Updates 2021-11-24 |
On Wed, 24 Nov 2021 09:16:46 -0800 Tony Nguyen wrote:
> Add kernel trace that device was removed.
> Currently there is no such information.
> I.e. Host admin removes a PCI device from a VM,
> than on VM shall be info about the event.
>
> This patch adds info log to iavf_remove function.

Why is this an important thing to print to logs about?
If it is why is PCI core not doing the printing?
On 2021-11-24 15:48, Jakub Kicinski wrote:
> On Wed, 24 Nov 2021 09:16:46 -0800 Tony Nguyen wrote:
> > Add kernel trace that device was removed.
> > Currently there is no such information.
> > I.e. Host admin removes a PCI device from a VM,
> > than on VM shall be info about the event.
> >
> > This patch adds info log to iavf_remove function.
>
> Why is this an important thing to print to logs about?
> If it is why is PCI core not doing the printing?

From personal experience I'd say this piece of information has value,
especially when debugging it can be interesting to know exactly when
the driver was removed.

Stefan
On Thu, 25 Nov 2021 07:50:49 +0100 Stefan Assmann wrote:
> On 2021-11-24 15:48, Jakub Kicinski wrote:
> > On Wed, 24 Nov 2021 09:16:46 -0800 Tony Nguyen wrote:
> > > Add kernel trace that device was removed.
> > > Currently there is no such information.
> > > I.e. Host admin removes a PCI device from a VM,
> > > than on VM shall be info about the event.
> > >
> > > This patch adds info log to iavf_remove function.
> >
> > Why is this an important thing to print to logs about?
> > If it is why is PCI core not doing the printing?
>
> From personal experience I'd say this piece of information has value,
> especially when debugging it can be interesting to know exactly when
> the driver was removed.

But there isn't anything specific to iavf here, right? If it really
is important then core should be doing the printing for all drivers.

Actually, I can't come up with any uses for this print on the spot.
What debugging scenarios do you have in mind?
On 2021-11-25 07:13, Jakub Kicinski wrote:
> On Thu, 25 Nov 2021 07:50:49 +0100 Stefan Assmann wrote:
> > On 2021-11-24 15:48, Jakub Kicinski wrote:
> > > On Wed, 24 Nov 2021 09:16:46 -0800 Tony Nguyen wrote:
> > > > Add kernel trace that device was removed.
> > > > Currently there is no such information.
> > > > I.e. Host admin removes a PCI device from a VM,
> > > > than on VM shall be info about the event.
> > > >
> > > > This patch adds info log to iavf_remove function.
> > >
> > > Why is this an important thing to print to logs about?
> > > If it is why is PCI core not doing the printing?
> >
> > From personal experience I'd say this piece of information has value,
> > especially when debugging it can be interesting to know exactly when
> > the driver was removed.
>
> But there isn't anything specific to iavf here, right? If it really
> is important then core should be doing the printing for all drivers.
>
> Actually, I can't come up with any uses for this print on the spot.
> What debugging scenarios do you have in mind?

There was a lot of trouble with iavf in terms of device reset, device
unbinding (DPDK), stress testing of driver load/unload issues. When
looking through the crash logs it was not always easy to determine if
the driver was still loaded.
Especially on problems that weren't easy to reproduce.

So for iavf having that information would have been valuable. Not sure
if that justifies a PCI core message or if others might find that too
verbose.

Stefan
On Thu, 25 Nov 2021 16:43:49 +0100 Stefan Assmann wrote:
> On 2021-11-25 07:13, Jakub Kicinski wrote:
> > On Thu, 25 Nov 2021 07:50:49 +0100 Stefan Assmann wrote:
> > > From personal experience I'd say this piece of information has value,
> > > especially when debugging it can be interesting to know exactly when
> > > the driver was removed.
> >
> > But there isn't anything specific to iavf here, right? If it really
> > is important then core should be doing the printing for all drivers.
> >
> > Actually, I can't come up with any uses for this print on the spot.
> > What debugging scenarios do you have in mind?
>
> There was a lot of trouble with iavf in terms of device reset, device
> unbinding (DPDK), stress testing of driver load/unload issues. When
> looking through the crash logs it was not always easy to determine if
> the driver was still loaded.
> Especially on problems that weren't easy to reproduce.

That's a slippery slope, historically we were always pushing for
avoiding informational prints. Otherwise every driver reconfig would
result in a line in the logs.

> So for iavf having that information would have been valuable. Not sure
> if that justifies a PCI core message or if others might find that too
> verbose.

So what you're saying is from your experience iavf is of significantly
poorer quality than other vendors' drivers?
On 2021-11-25 07:57, Jakub Kicinski wrote:
> On Thu, 25 Nov 2021 16:43:49 +0100 Stefan Assmann wrote:
> > On 2021-11-25 07:13, Jakub Kicinski wrote:
> > > On Thu, 25 Nov 2021 07:50:49 +0100 Stefan Assmann wrote:
> > > > From personal experience I'd say this piece of information has value,
> > > > especially when debugging it can be interesting to know exactly when
> > > > the driver was removed.
> > >
> > > But there isn't anything specific to iavf here, right? If it really
> > > is important then core should be doing the printing for all drivers.
> > >
> > > Actually, I can't come up with any uses for this print on the spot.
> > > What debugging scenarios do you have in mind?
> >
> > There was a lot of trouble with iavf in terms of device reset, device
> > unbinding (DPDK), stress testing of driver load/unload issues. When
> > looking through the crash logs it was not always easy to determine if
> > the driver was still loaded.
> > Especially on problems that weren't easy to reproduce.
>
> That's a slippery slope, historically we were always pushing for
> avoiding informational prints. Otherwise every driver reconfig would
> result in a line in the logs.
>
> > So for iavf having that information would have been valuable. Not sure
> > if that justifies a PCI core message or if others might find that too
> > verbose.
>
> So what you're saying is from your experience iavf is of significantly
> poorer quality than other vendors' drivers?

No, I can't really comment on how iavf compares to other vendor drivers
since I've mostly worked with Intel. There's a lot of async tasks and
communication going on between PF and VF, and while that seems great it
has a lot of potential for things to go wrong as well. If you look at
the git history of iavf (previously i40evf) you'll see that especially
the reset, shutdown and error handling code has seen a lot of fixes and
refactoring.
Just recently in net-next:

898ef1cb1cb2 iavf: Combine init and watchdog state machines
45eebd62999d iavf: Refactor iavf state machine tracking

Finding and fixing all those corner cases is hard, especially when we
have a situation where we rely on firmware to perform a certain task in
a defined time.

Example: In iavf_reset_task() we poll a register for a certain amount
of time, as the firmware is supposed to complete the reset within that
window. Now that works well 99.999% of the time but sometimes it
doesn't, so the driver logs a message and continues operation. That
might work, or it might not, and when things go wrong we had to figure
out why. Or some async message came in at a bad time, while the driver
was already being removed, and rescheduled the reset task, which of
course caused a crash as the structures were being freed underneath the
reset task.

So from the perspective of somebody working on the driver I'd like to
see it when the driver gets removed.

Stefan
diff --git a/drivers/net/ethernet/intel/iavf/iavf_main.c b/drivers/net/ethernet/intel/iavf/iavf_main.c
index bb2e91cb9cd4..4019191e6b0c 100644
--- a/drivers/net/ethernet/intel/iavf/iavf_main.c
+++ b/drivers/net/ethernet/intel/iavf/iavf_main.c
@@ -3987,6 +3987,7 @@ static void iavf_remove(struct pci_dev *pdev)
 	if (iavf_lock_timeout(&adapter->crit_lock, 5000))
 		dev_warn(&adapter->pdev->dev, "failed to acquire crit_lock in %s\n",
 			 __FUNCTION__);
+	dev_info(&adapter->pdev->dev, "Removing device\n");
 	/* Shut down all the garbage mashers on the detention level */
 	iavf_change_state(adapter, __IAVF_REMOVE);
 	adapter->aq_required = 0;