
[net-next,06/12] iavf: Add trace while removing device

Message ID: 20211124171652.831184-7-anthony.l.nguyen@intel.com
State: Changes Requested
Delegated to: Netdev Maintainers
Series: 40GbE Intel Wired LAN Driver Updates 2021-11-24

Commit Message

Tony Nguyen Nov. 24, 2021, 5:16 p.m. UTC
From: Jedrzej Jagielski <jedrzej.jagielski@intel.com>

Add a kernel trace that the device was removed.
Currently there is no such information.
E.g. a host admin removes a PCI device from a VM;
then the VM should log info about the event.

This patch adds an info log to the iavf_remove function.

Signed-off-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com>
Signed-off-by: Jedrzej Jagielski <jedrzej.jagielski@intel.com>
Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
---
 drivers/net/ethernet/intel/iavf/iavf_main.c | 1 +
 1 file changed, 1 insertion(+)

Comments

Jakub Kicinski Nov. 24, 2021, 11:48 p.m. UTC | #1
On Wed, 24 Nov 2021 09:16:46 -0800 Tony Nguyen wrote:
> Add a kernel trace that the device was removed.
> Currently there is no such information.
> E.g. a host admin removes a PCI device from a VM;
> then the VM should log info about the event.
> 
> This patch adds an info log to the iavf_remove function.

Why is this an important thing to print to logs about?
If it is why is PCI core not doing the printing?
Stefan Assmann Nov. 25, 2021, 6:50 a.m. UTC | #2
On 2021-11-24 15:48, Jakub Kicinski wrote:
> On Wed, 24 Nov 2021 09:16:46 -0800 Tony Nguyen wrote:
> > Add a kernel trace that the device was removed.
> > Currently there is no such information.
> > E.g. a host admin removes a PCI device from a VM;
> > then the VM should log info about the event.
> > 
> > This patch adds an info log to the iavf_remove function.
> 
> Why is this an important thing to print to logs about?
> If it is why is PCI core not doing the printing?
> 

From personal experience I'd say this piece of information has value,
especially when debugging it can be interesting to know exactly when the
driver was removed.

  Stefan
Jakub Kicinski Nov. 25, 2021, 3:13 p.m. UTC | #3
On Thu, 25 Nov 2021 07:50:49 +0100 Stefan Assmann wrote:
> On 2021-11-24 15:48, Jakub Kicinski wrote:
> > On Wed, 24 Nov 2021 09:16:46 -0800 Tony Nguyen wrote:  
> > > Add a kernel trace that the device was removed.
> > > Currently there is no such information.
> > > E.g. a host admin removes a PCI device from a VM;
> > > then the VM should log info about the event.
> > > 
> > > This patch adds an info log to the iavf_remove function.
> > 
> > Why is this an important thing to print to logs about?
> > If it is why is PCI core not doing the printing?
> 
> From personal experience I'd say this piece of information has value,
> especially when debugging it can be interesting to know exactly when
> the driver was removed.

But there isn't anything specific to iavf here, right? If it really 
is important then core should be doing the printing for all drivers.

Actually, I can't come up with any uses for this print on the spot.
What debugging scenarios do you have in mind?
Stefan Assmann Nov. 25, 2021, 3:43 p.m. UTC | #4
On 2021-11-25 07:13, Jakub Kicinski wrote:
> On Thu, 25 Nov 2021 07:50:49 +0100 Stefan Assmann wrote:
> > On 2021-11-24 15:48, Jakub Kicinski wrote:
> > > On Wed, 24 Nov 2021 09:16:46 -0800 Tony Nguyen wrote:  
> > > > Add a kernel trace that the device was removed.
> > > > Currently there is no such information.
> > > > E.g. a host admin removes a PCI device from a VM;
> > > > then the VM should log info about the event.
> > > > 
> > > > This patch adds an info log to the iavf_remove function.
> > > 
> > > Why is this an important thing to print to logs about?
> > > If it is why is PCI core not doing the printing?
> > 
> > From personal experience I'd say this piece of information has value,
> > especially when debugging it can be interesting to know exactly when
> > the driver was removed.
> 
> But there isn't anything specific to iavf here, right? If it really 
> is important then core should be doing the printing for all drivers.
> 
> Actually, I can't come up with any uses for this print on the spot.
> What debugging scenarios do you have in mind?

There was a lot of trouble with iavf in terms of device reset, device
unbinding (DPDK), stress testing of driver load/unload issues. When
looking through the crash logs it was not always easy to determine if
the driver was still loaded.
Especially on problems that weren't easy to reproduce.

So for iavf having that information would have been valuable. Not sure
if that justifies a PCI core message or if others might find that too
verbose.

  Stefan
Jakub Kicinski Nov. 25, 2021, 3:57 p.m. UTC | #5
On Thu, 25 Nov 2021 16:43:49 +0100 Stefan Assmann wrote:
> On 2021-11-25 07:13, Jakub Kicinski wrote:
> > On Thu, 25 Nov 2021 07:50:49 +0100 Stefan Assmann wrote:  
> > > From personal experience I'd say this piece of information has value,
> > > especially when debugging it can be interesting to know exactly when
> > > the driver was removed.  
> > 
> > But there isn't anything specific to iavf here, right? If it really 
> > is important then core should be doing the printing for all drivers.
> > 
> > Actually, I can't come up with any uses for this print on the spot.
> > What debugging scenarios do you have in mind?  
> 
> There was a lot of trouble with iavf in terms of device reset, device
> unbinding (DPDK), stress testing of driver load/unload issues. When
> looking through the crash logs it was not always easy to determine if
> the driver was still loaded.
> Especially on problems that weren't easy to reproduce.

That's a slippery slope, historically we were always pushing for
avoiding informational prints. Otherwise every driver reconfig would
result in a line in the logs.

> So for iavf having that information would have been valuable. Not sure
> if that justifies a PCI core message or if others might find that too
> verbose.

So what you're saying is from your experience iavf is of significantly
poorer quality than other vendors' drivers?
Stefan Assmann Nov. 25, 2021, 4:46 p.m. UTC | #6
On 2021-11-25 07:57, Jakub Kicinski wrote:
> On Thu, 25 Nov 2021 16:43:49 +0100 Stefan Assmann wrote:
> > On 2021-11-25 07:13, Jakub Kicinski wrote:
> > > On Thu, 25 Nov 2021 07:50:49 +0100 Stefan Assmann wrote:  
> > > > From personal experience I'd say this piece of information has value,
> > > > especially when debugging it can be interesting to know exactly when
> > > > the driver was removed.  
> > > 
> > > But there isn't anything specific to iavf here, right? If it really 
> > > is important then core should be doing the printing for all drivers.
> > > 
> > > Actually, I can't come up with any uses for this print on the spot.
> > > What debugging scenarios do you have in mind?  
> > 
> > There was a lot of trouble with iavf in terms of device reset, device
> > unbinding (DPDK), stress testing of driver load/unload issues. When
> > looking through the crash logs it was not always easy to determine if
> > the driver was still loaded.
> > Especially on problems that weren't easy to reproduce.
> 
> That's a slippery slope, historically we were always pushing for
> avoiding informational prints. Otherwise every driver reconfig would
> result in a line in the logs.
> 
> > So for iavf having that information would have been valuable. Not sure
> > if that justifies a PCI core message or if others might find that too
> > verbose.
> 
> So what you're saying is from your experience iavf is of significantly
> poorer quality than other vendors' drivers?

No, I can't really comment on how iavf compares to other vendors'
drivers since I've mostly worked with Intel. There are a lot of async
tasks and communication going on between PF and VF, and while that
seems great, it also has a lot of potential for things to go wrong. If
you look at the git history of iavf (previously i40evf) you'll see that
especially the reset, shutdown, and error handling code has seen a lot
of fixes and refactoring. Just recently in net-next:
898ef1cb1cb2 iavf: Combine init and watchdog state machines
45eebd62999d iavf: Refactor iavf state machine tracking

Finding and fixing all those corner cases is hard, especially when we
have situations where we rely on firmware to perform a certain task in
a defined time.
Example:
In iavf_reset_task() we poll a register for a certain amount of time,
as the firmware is supposed to complete the reset within that window.
That works well 99.999% of the time, but sometimes it doesn't, so the
driver logs a message and continues operation. That might work, or it
might not; when things went wrong we had to figure out why.
Or some async message came in at a bad time, while the driver was
already being removed, and rescheduled the reset task, which of course
caused a crash as the structures were being freed underneath the reset
task.

So from the perspective of somebody working on the driver I'd like to
see it when the driver gets removed.

  Stefan

Patch

diff --git a/drivers/net/ethernet/intel/iavf/iavf_main.c b/drivers/net/ethernet/intel/iavf/iavf_main.c
index bb2e91cb9cd4..4019191e6b0c 100644
--- a/drivers/net/ethernet/intel/iavf/iavf_main.c
+++ b/drivers/net/ethernet/intel/iavf/iavf_main.c
@@ -3987,6 +3987,7 @@  static void iavf_remove(struct pci_dev *pdev)
 	if (iavf_lock_timeout(&adapter->crit_lock, 5000))
 		dev_warn(&adapter->pdev->dev, "failed to acquire crit_lock in %s\n", __FUNCTION__);
 
+	dev_info(&adapter->pdev->dev, "Removing device\n");
 	/* Shut down all the garbage mashers on the detention level */
 	iavf_change_state(adapter, __IAVF_REMOVE);
 	adapter->aq_required = 0;