
net/mlx4_core: print firmware version during driver loading

Message ID 20180914002514.27571-1-qing.huang@oracle.com (mailing list archive)
State Not Applicable

Commit Message

Qing Huang Sept. 14, 2018, 12:25 a.m. UTC
When debugging firmware-related issues, it's very helpful to have
the installed FW version in the kernel log when the driver is
loaded. It's easier to match error/warning messages with different
FW versions in the log than to run a separate tool each time to
get that information.

Signed-off-by: Qing Huang <qing.huang@oracle.com>
---
 drivers/net/ethernet/mellanox/mlx4/fw.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

Comments

Leon Romanovsky Sept. 14, 2018, 4:43 a.m. UTC | #1
On Thu, Sep 13, 2018 at 05:25:14PM -0700, Qing Huang wrote:
> When debugging firmware related issues, it's very helpful to have

      ^^^^^^^^^^ exactly, this is why we set this print as mlx4_dbg and
      not mlx4_info.

> the installed FW version info in the kernel log when the driver is
> loaded. It's easier to match error/warning messages with different
> FW versions in the log other than running a separate tool to get
> the information back and forth.
>
> Signed-off-by: Qing Huang <qing.huang@oracle.com>
> ---
>  drivers/net/ethernet/mellanox/mlx4/fw.c | 10 +++++-----
>  1 file changed, 5 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/net/ethernet/mellanox/mlx4/fw.c b/drivers/net/ethernet/mellanox/mlx4/fw.c
> index babcfd9..e1c5218 100644
> --- a/drivers/net/ethernet/mellanox/mlx4/fw.c
> +++ b/drivers/net/ethernet/mellanox/mlx4/fw.c
> @@ -1686,11 +1686,11 @@ int mlx4_QUERY_FW(struct mlx4_dev *dev)
>  	MLX4_GET(lg, outbox, QUERY_FW_MAX_CMD_OFFSET);
>  	cmd->max_cmds = 1 << lg;
>
> -	mlx4_dbg(dev, "FW version %d.%d.%03d (cmd intf rev %d), max commands %d\n",
> -		 (int) (dev->caps.fw_ver >> 32),
> -		 (int) (dev->caps.fw_ver >> 16) & 0xffff,
> -		 (int) dev->caps.fw_ver & 0xffff,
> -		 cmd_if_rev, cmd->max_cmds);
> +	mlx4_info(dev, "FW version %d.%d.%03d (cmd intf rev %d), max commands %d\n",
> +		  (int)(dev->caps.fw_ver >> 32),
> +		  (int)(dev->caps.fw_ver >> 16) & 0xffff,
> +		  (int)dev->caps.fw_ver & 0xffff,
> +		  cmd_if_rev, cmd->max_cmds);
>
>  	MLX4_GET(fw->catas_offset, outbox, QUERY_FW_ERR_START_OFFSET);
>  	MLX4_GET(fw->catas_size,   outbox, QUERY_FW_ERR_SIZE_OFFSET);
> --
> 2.9.3
>
Qing Huang Sept. 14, 2018, 5:15 p.m. UTC | #2
The FW version is actually a very crucial piece of information and is only
printed once here, when the driver is loaded. People tend to get confused
when switching back and forth between multiple FW files without running
separate utility tools, especially at customer sites.
IMHO, this information is very useful and takes up very little log file
space. :-)

I was also thinking of doing something slightly different. Maybe we
just trim down the output string and add something like this?
--- a/drivers/net/ethernet/mellanox/mlx4/main.c
+++ b/drivers/net/ethernet/mellanox/mlx4/main.c
@@ -2208,6 +2208,11 @@ static int mlx4_init_fw(struct mlx4_dev *dev)
                         return err;
                 }

+               mlx4_info(dev, "Installed FW version is %d.%d.%03d.\n",
+                         (int) (dev->caps.fw_ver >> 32),
+                         (int) (dev->caps.fw_ver >> 16) & 0xffff,
+                         (int) dev->caps.fw_ver & 0xffff);
+
                 err = mlx4_load_fw(dev);
                 if (err) {
                         mlx4_err(dev, "Failed to start FW, aborting\n");

Thanks,
Qing

On 9/13/2018 9:43 PM, Leon Romanovsky wrote:
> On Thu, Sep 13, 2018 at 05:25:14PM -0700, Qing Huang wrote:
>> When debugging firmware related issues, it's very helpful to have
>        ^^^^^^^^^^ exactly, this is why we set this print as mlx4_dbg and
>        not mlx4_info.
>
>> the installed FW version info in the kernel log when the driver is
>> loaded. It's easier to match error/warning messages with different
>> FW versions in the log other than running a separate tool to get
>> the information back and forth.
>>
>> Signed-off-by: Qing Huang <qing.huang@oracle.com>
>> ---
>>   drivers/net/ethernet/mellanox/mlx4/fw.c | 10 +++++-----
>>   1 file changed, 5 insertions(+), 5 deletions(-)
>>
>> diff --git a/drivers/net/ethernet/mellanox/mlx4/fw.c b/drivers/net/ethernet/mellanox/mlx4/fw.c
>> index babcfd9..e1c5218 100644
>> --- a/drivers/net/ethernet/mellanox/mlx4/fw.c
>> +++ b/drivers/net/ethernet/mellanox/mlx4/fw.c
>> @@ -1686,11 +1686,11 @@ int mlx4_QUERY_FW(struct mlx4_dev *dev)
>>   	MLX4_GET(lg, outbox, QUERY_FW_MAX_CMD_OFFSET);
>>   	cmd->max_cmds = 1 << lg;
>>
>> -	mlx4_dbg(dev, "FW version %d.%d.%03d (cmd intf rev %d), max commands %d\n",
>> -		 (int) (dev->caps.fw_ver >> 32),
>> -		 (int) (dev->caps.fw_ver >> 16) & 0xffff,
>> -		 (int) dev->caps.fw_ver & 0xffff,
>> -		 cmd_if_rev, cmd->max_cmds);
>> +	mlx4_info(dev, "FW version %d.%d.%03d (cmd intf rev %d), max commands %d\n",
>> +		  (int)(dev->caps.fw_ver >> 32),
>> +		  (int)(dev->caps.fw_ver >> 16) & 0xffff,
>> +		  (int)dev->caps.fw_ver & 0xffff,
>> +		  cmd_if_rev, cmd->max_cmds);
>>
>>   	MLX4_GET(fw->catas_offset, outbox, QUERY_FW_ERR_START_OFFSET);
>>   	MLX4_GET(fw->catas_size,   outbox, QUERY_FW_ERR_SIZE_OFFSET);
>> --
>> 2.9.3
>>
Andrew Lunn Sept. 14, 2018, 6:17 p.m. UTC | #3
On Fri, Sep 14, 2018 at 10:15:48AM -0700, Qing Huang wrote:
> The FW version is actually a very crucial piece of information and only
> printed once here
> when the driver is loaded. People tend to get confused when switching
> multiple FW files
> back and forth without running separate utility tools, especially at
> customer sites.
> IMHO, this information is very useful and only takes up very little log file
> space. :-)

Why not use ethtool -i ?

$ sudo ethtool -i eth0
driver: r8169
version: 2.3LK-NAPI
firmware-version: rtl8168g-2_0.0.1 02/06/13

    Andrew
Qing Huang Sept. 14, 2018, 6:33 p.m. UTC | #4
On 9/14/2018 11:17 AM, Andrew Lunn wrote:
> On Fri, Sep 14, 2018 at 10:15:48AM -0700, Qing Huang wrote:
>> The FW version is actually a very crucial piece of information and only
>> printed once here
>> when the driver is loaded. People tend to get confused when switching
>> multiple FW files
>> back and forth without running separate utility tools, especially at
>> customer sites.
>> IMHO, this information is very useful and only takes up very little log file
>> space. :-)
> Why not use ethtool -i ?
>
> $ sudo ethtool -i eth0
> driver: r8169
> version: 2.3LK-NAPI
> firmware-version: rtl8168g-2_0.0.1 02/06/13
>
>      Andrew
Sure. You can also use the ibstat or ibv_devinfo tools if they are
installed, but that's not very convenient in some cases.

E.g., a customer upgrades FW on HCAs and encounters issues. During triage,
it's much easier to study customer-uploaded log files when remotely
testing different FW files.

Thanks.
Andrew Lunn Sept. 14, 2018, 8:13 p.m. UTC | #5
> >$ sudo ethtool -i eth0
> >driver: r8169
> >version: 2.3LK-NAPI
> >firmware-version: rtl8168g-2_0.0.1 02/06/13
> >
> >     Andrew

> Sure. You can also use ibstat or ibv_devinfo tool if they are installed. But
> it's not very convenient in some cases.

This is the standardised way to do this. It should work for any
Ethernet driver, so long as it fills in the needed information.
Anything else is non-standard, and so inconvenient by definition.

	Andrew
David Miller Sept. 14, 2018, 9:09 p.m. UTC | #6
From: Qing Huang <qing.huang@oracle.com>
Date: Fri, 14 Sep 2018 10:15:48 -0700

> IMHO, this information is very useful and only takes up very little
> log file space. :-)

If it's critical then the log is the wrong place for it as the log
is lossy.

The proper place to obtain this information is via the fw_version
field of the ethtool_drvinfo struct.  This can be obtained at any time
and is reliable.  And if it isn't reliable or correct, we must fix
that.
David Miller Sept. 14, 2018, 9:14 p.m. UTC | #7
From: Qing Huang <qing.huang@oracle.com>
Date: Fri, 14 Sep 2018 11:33:40 -0700

> 
> 
> On 9/14/2018 11:17 AM, Andrew Lunn wrote:
>> On Fri, Sep 14, 2018 at 10:15:48AM -0700, Qing Huang wrote:
>>> The FW version is actually a very crucial piece of information and
>>> only
>>> printed once here
>>> when the driver is loaded. People tend to get confused when switching
>>> multiple FW files
>>> back and forth without running separate utility tools, especially at
>>> customer sites.
>>> IMHO, this information is very useful and only takes up very little
>>> log file
>>> space. :-)
>> Why not use ethtool -i ?
>>
>> $ sudo ethtool -i eth0
>> driver: r8169
>> version: 2.3LK-NAPI
>> firmware-version: rtl8168g-2_0.0.1 02/06/13
>>
>>      Andrew
> Sure. You can also use ibstat or ibv_devinfo tool if they are
> installed. But it's not very
> convenient in some cases.
> 
> E.g.
> A customer upgrades FW on HCAs and encounters issues. During triage,
> it's much easier
> to study customer uploaded log files when remotely testing different
> FW files.

Not a valid argument.  You can print the ethtool output from initramfs
if necessary for triage.

I still stand by the fact that ethtool is the only fully reliable way
to obtain this information, the kernel log is not.
David Miller Sept. 14, 2018, 9:14 p.m. UTC | #8
From: Andrew Lunn <andrew@lunn.ch>
Date: Fri, 14 Sep 2018 20:17:18 +0200

> On Fri, Sep 14, 2018 at 10:15:48AM -0700, Qing Huang wrote:
>> The FW version is actually a very crucial piece of information and only
>> printed once here
>> when the driver is loaded. People tend to get confused when switching
>> multiple FW files
>> back and forth without running separate utility tools, especially at
>> customer sites.
>> IMHO, this information is very useful and only takes up very little log file
>> space. :-)
> 
> Why not use ethtool -i ?
> 
> $ sudo ethtool -i eth0
> driver: r8169
> version: 2.3LK-NAPI
> firmware-version: rtl8168g-2_0.0.1 02/06/13

+1
Qing Huang Sept. 14, 2018, 10:36 p.m. UTC | #9
On 9/14/2018 2:14 PM, David Miller wrote:
> From: Qing Huang<qing.huang@oracle.com>
> Date: Fri, 14 Sep 2018 11:33:40 -0700
>
>> On 9/14/2018 11:17 AM, Andrew Lunn wrote:
>>> On Fri, Sep 14, 2018 at 10:15:48AM -0700, Qing Huang wrote:
>>>> The FW version is actually a very crucial piece of information and
>>>> only
>>>> printed once here
>>>> when the driver is loaded. People tend to get confused when switching
>>>> multiple FW files
>>>> back and forth without running separate utility tools, especially at
>>>> customer sites.
>>>> IMHO, this information is very useful and only takes up very little
>>>> log file
>>>> space. :-)
>>> Why not use ethtool -i ?
>>>
>>> $ sudo ethtool -i eth0
>>> driver: r8169
>>> version: 2.3LK-NAPI
>>> firmware-version: rtl8168g-2_0.0.1 02/06/13
>>>
>>>       Andrew
>> Sure. You can also use ibstat or ibv_devinfo tool if they are
>> installed. But it's not very
>> convenient in some cases.
>>
>> E.g.
>> A customer upgrades FW on HCAs and encounters issues. During triage,
>> it's much easier
>> to study customer uploaded log files when remotely testing different
>> FW files.
> Not a valid argument.  You can print the ethtool output from initramfs
> if necessary for triage.
>
> I still stand by the fact that ethtool is the only fully reliable way
> to obtain this information, the kernel log is not.

This is more for InfiniBand mode, which depends more on features and
functionality provided in firmware and gets much more frequent FW bug
fixes than typical Ethernet devices. This is not meant to replace other
ways of getting the information; it is more of an enhancement for
checking log history.

This can provide valuable information when tracing through system log
history to discover what happened with a specific HCA driver version and
FW version combination in the past.

Regards,
Qing
Leon Romanovsky Sept. 15, 2018, 8:50 a.m. UTC | #10
On Fri, Sep 14, 2018 at 03:36:46PM -0700, Qing Huang wrote:
>
>
> On 9/14/2018 2:14 PM, David Miller wrote:
> > From: Qing Huang<qing.huang@oracle.com>
> > Date: Fri, 14 Sep 2018 11:33:40 -0700
> >
> > > On 9/14/2018 11:17 AM, Andrew Lunn wrote:
> > > > On Fri, Sep 14, 2018 at 10:15:48AM -0700, Qing Huang wrote:
> > > > > The FW version is actually a very crucial piece of information and
> > > > > only
> > > > > printed once here
> > > > > when the driver is loaded. People tend to get confused when switching
> > > > > multiple FW files
> > > > > back and forth without running separate utility tools, especially at
> > > > > customer sites.
> > > > > IMHO, this information is very useful and only takes up very little
> > > > > log file
> > > > > space. :-)
> > > > Why not use ethtool -i ?
> > > >
> > > > $ sudo ethtool -i eth0
> > > > driver: r8169
> > > > version: 2.3LK-NAPI
> > > > firmware-version: rtl8168g-2_0.0.1 02/06/13
> > > >
> > > >       Andrew
> > > Sure. You can also use ibstat or ibv_devinfo tool if they are
> > > installed. But it's not very
> > > convenient in some cases.
> > >
> > > E.g.
> > > A customer upgrades FW on HCAs and encounters issues. During triage,
> > > it's much easier
> > > to study customer uploaded log files when remotely testing different
> > > FW files.
> > Not a valid argument.  You can print the ethtool output from initramfs
> > if necessary for triage.
> >
> > I still stand by the fact that ethtool is the only fully reliable way
> > to obtain this information, the kernel log is not.
>
> This is more for Infiniband mode which depends more on features and
> functionalities

For pure InfiniBand devices you have rdmatool (the rdma command), part of iproute2.
[leonro@server-14-015 ~]$ rdma dev
1: mlx5_0: node_type ca fw 3.8.9999 node_guid 5254:00c0:fe12:3455 sys_image_guid 5254:00c0:fe12:3455

> provided in firmware and get much more frequent FW bug fixes than typical
> Ethernet
> devices. This is not meant to replace other ways of getting the information,
> more like
> an enhancement for checking log history.
>
> This can provide valuable information when tracing through system log
> history to
> discover what happened with a specific HCA drv ver and fw ver combination in
> the past.
>
> Regards,
> Qing

Patch

diff --git a/drivers/net/ethernet/mellanox/mlx4/fw.c b/drivers/net/ethernet/mellanox/mlx4/fw.c
index babcfd9..e1c5218 100644
--- a/drivers/net/ethernet/mellanox/mlx4/fw.c
+++ b/drivers/net/ethernet/mellanox/mlx4/fw.c
@@ -1686,11 +1686,11 @@  int mlx4_QUERY_FW(struct mlx4_dev *dev)
 	MLX4_GET(lg, outbox, QUERY_FW_MAX_CMD_OFFSET);
 	cmd->max_cmds = 1 << lg;
 
-	mlx4_dbg(dev, "FW version %d.%d.%03d (cmd intf rev %d), max commands %d\n",
-		 (int) (dev->caps.fw_ver >> 32),
-		 (int) (dev->caps.fw_ver >> 16) & 0xffff,
-		 (int) dev->caps.fw_ver & 0xffff,
-		 cmd_if_rev, cmd->max_cmds);
+	mlx4_info(dev, "FW version %d.%d.%03d (cmd intf rev %d), max commands %d\n",
+		  (int)(dev->caps.fw_ver >> 32),
+		  (int)(dev->caps.fw_ver >> 16) & 0xffff,
+		  (int)dev->caps.fw_ver & 0xffff,
+		  cmd_if_rev, cmd->max_cmds);
 
 	MLX4_GET(fw->catas_offset, outbox, QUERY_FW_ERR_START_OFFSET);
 	MLX4_GET(fw->catas_size,   outbox, QUERY_FW_ERR_SIZE_OFFSET);
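
The shifts and masks in the print above imply that fw_ver packs the
version as major (bits 32 and up), minor (bits 16-31) and subminor
(bits 0-15). The following standalone sketch decodes a value the same
way; the struct and function names are hypothetical and the layout is
inferred only from the format arguments in this patch.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical container for the three version components. */
struct fw_version {
	unsigned int major;
	unsigned int minor;
	unsigned int subminor;
};

/* Split a packed 64-bit version using the same shifts as the patch. */
static struct fw_version fw_ver_decode(uint64_t fw_ver)
{
	struct fw_version v = {
		.major    = (unsigned int)(fw_ver >> 32),
		.minor    = (unsigned int)((fw_ver >> 16) & 0xffff),
		.subminor = (unsigned int)(fw_ver & 0xffff),
	};
	return v;
}

/* Format as "major.minor.subminor", zero-padding subminor to 3 digits. */
static int fw_ver_format(uint64_t fw_ver, char *buf, size_t len)
{
	struct fw_version v = fw_ver_decode(fw_ver);

	return snprintf(buf, len, "%u.%u.%03u", v.major, v.minor, v.subminor);
}
```

For example, a packed value of ((uint64_t)2 << 32) | (42 << 16) | 5000
formats as 2.42.5000, and a subminor of 7 would be printed as 007.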