Message ID | 20180914002514.27571-1-qing.huang@oracle.com (mailing list archive) |
---|---|
State | Not Applicable |
Series | net/mlx4_core: print firmware version during driver loading |
On Thu, Sep 13, 2018 at 05:25:14PM -0700, Qing Huang wrote:
> When debugging firmware related issues, it's very helpful to have
       ^^^^^^^^^^
exactly, this is why we set this print as mlx4_dbg and not mlx4_info.

> the installed FW version info in the kernel log when the driver is
> loaded. It's easier to match error/warning messages with different
> FW versions in the log other than running a separate tool to get
> the information back and forth.
>
> Signed-off-by: Qing Huang <qing.huang@oracle.com>
> ---
>  drivers/net/ethernet/mellanox/mlx4/fw.c | 10 +++++-----
>  1 file changed, 5 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/net/ethernet/mellanox/mlx4/fw.c b/drivers/net/ethernet/mellanox/mlx4/fw.c
> index babcfd9..e1c5218 100644
> --- a/drivers/net/ethernet/mellanox/mlx4/fw.c
> +++ b/drivers/net/ethernet/mellanox/mlx4/fw.c
> @@ -1686,11 +1686,11 @@ int mlx4_QUERY_FW(struct mlx4_dev *dev)
>          MLX4_GET(lg, outbox, QUERY_FW_MAX_CMD_OFFSET);
>          cmd->max_cmds = 1 << lg;
>
> -        mlx4_dbg(dev, "FW version %d.%d.%03d (cmd intf rev %d), max commands %d\n",
> -                 (int) (dev->caps.fw_ver >> 32),
> -                 (int) (dev->caps.fw_ver >> 16) & 0xffff,
> -                 (int) dev->caps.fw_ver & 0xffff,
> -                 cmd_if_rev, cmd->max_cmds);
> +        mlx4_info(dev, "FW version %d.%d.%03d (cmd intf rev %d), max commands %d\n",
> +                  (int)(dev->caps.fw_ver >> 32),
> +                  (int)(dev->caps.fw_ver >> 16) & 0xffff,
> +                  (int)dev->caps.fw_ver & 0xffff,
> +                  cmd_if_rev, cmd->max_cmds);
>
>          MLX4_GET(fw->catas_offset, outbox, QUERY_FW_ERR_START_OFFSET);
>          MLX4_GET(fw->catas_size, outbox, QUERY_FW_ERR_SIZE_OFFSET);
> --
> 2.9.3
>
The FW version is actually a very crucial piece of information and only
printed once here when the driver is loaded. People tend to get confused
when switching multiple FW files back and forth without running separate
utility tools, especially at customer sites.

IMHO, this information is very useful and only takes up very little log
file space. :-)

I was also thinking of doing something slightly differently. Maybe we just
trim down the output string, and add something like this?

--- a/drivers/net/ethernet/mellanox/mlx4/main.c
+++ b/drivers/net/ethernet/mellanox/mlx4/main.c
@@ -2208,6 +2208,11 @@ static int mlx4_init_fw(struct mlx4_dev *dev)
                 return err;
         }

+        mlx4_info(dev, "Installed FW version is %d.%d.%03d.\n",
+                  (int) (dev->caps.fw_ver >> 32),
+                  (int) (dev->caps.fw_ver >> 16) & 0xffff,
+                  (int) dev->caps.fw_ver & 0xffff);
+
         err = mlx4_load_fw(dev);
         if (err) {
                 mlx4_err(dev, "Failed to start FW, aborting\n");

Thanks,
Qing

On 9/13/2018 9:43 PM, Leon Romanovsky wrote:
> On Thu, Sep 13, 2018 at 05:25:14PM -0700, Qing Huang wrote:
>> When debugging firmware related issues, it's very helpful to have
>       ^^^^^^^^^^
> exactly, this is why we set this print as mlx4_dbg and not mlx4_info.
>
>> the installed FW version info in the kernel log when the driver is
>> loaded. It's easier to match error/warning messages with different
>> FW versions in the log other than running a separate tool to get
>> the information back and forth.
>>
>> Signed-off-by: Qing Huang <qing.huang@oracle.com>
>> ---
>>  drivers/net/ethernet/mellanox/mlx4/fw.c | 10 +++++-----
>>  1 file changed, 5 insertions(+), 5 deletions(-)
>>
>> diff --git a/drivers/net/ethernet/mellanox/mlx4/fw.c b/drivers/net/ethernet/mellanox/mlx4/fw.c
>> index babcfd9..e1c5218 100644
>> --- a/drivers/net/ethernet/mellanox/mlx4/fw.c
>> +++ b/drivers/net/ethernet/mellanox/mlx4/fw.c
>> @@ -1686,11 +1686,11 @@ int mlx4_QUERY_FW(struct mlx4_dev *dev)
>>          MLX4_GET(lg, outbox, QUERY_FW_MAX_CMD_OFFSET);
>>          cmd->max_cmds = 1 << lg;
>>
>> -        mlx4_dbg(dev, "FW version %d.%d.%03d (cmd intf rev %d), max commands %d\n",
>> -                 (int) (dev->caps.fw_ver >> 32),
>> -                 (int) (dev->caps.fw_ver >> 16) & 0xffff,
>> -                 (int) dev->caps.fw_ver & 0xffff,
>> -                 cmd_if_rev, cmd->max_cmds);
>> +        mlx4_info(dev, "FW version %d.%d.%03d (cmd intf rev %d), max commands %d\n",
>> +                  (int)(dev->caps.fw_ver >> 32),
>> +                  (int)(dev->caps.fw_ver >> 16) & 0xffff,
>> +                  (int)dev->caps.fw_ver & 0xffff,
>> +                  cmd_if_rev, cmd->max_cmds);
>>
>>          MLX4_GET(fw->catas_offset, outbox, QUERY_FW_ERR_START_OFFSET);
>>          MLX4_GET(fw->catas_size, outbox, QUERY_FW_ERR_SIZE_OFFSET);
>> --
>> 2.9.3
>>
On Fri, Sep 14, 2018 at 10:15:48AM -0700, Qing Huang wrote:
> The FW version is actually a very crucial piece of information and only
> printed once here when the driver is loaded. People tend to get confused
> when switching multiple FW files back and forth without running separate
> utility tools, especially at customer sites.
> IMHO, this information is very useful and only takes up very little log
> file space. :-)

Why not use ethtool -i ?

$ sudo ethtool -i eth0
driver: r8169
version: 2.3LK-NAPI
firmware-version: rtl8168g-2_0.0.1 02/06/13

Andrew
On 9/14/2018 11:17 AM, Andrew Lunn wrote:
> On Fri, Sep 14, 2018 at 10:15:48AM -0700, Qing Huang wrote:
>> The FW version is actually a very crucial piece of information and only
>> printed once here when the driver is loaded. People tend to get confused
>> when switching multiple FW files back and forth without running separate
>> utility tools, especially at customer sites.
>> IMHO, this information is very useful and only takes up very little log
>> file space. :-)
> Why not use ethtool -i ?
>
> $ sudo ethtool -i eth0
> driver: r8169
> version: 2.3LK-NAPI
> firmware-version: rtl8168g-2_0.0.1 02/06/13
>
> Andrew

Sure. You can also use ibstat or ibv_devinfo tool if they are installed.
But it's not very convenient in some cases.

E.g.
A customer upgrades FW on HCAs and encounters issues. During triage, it's
much easier to study customer uploaded log files when remotely testing
different FW files.

Thanks.
> >$ sudo ethtool -i eth0
> >driver: r8169
> >version: 2.3LK-NAPI
> >firmware-version: rtl8168g-2_0.0.1 02/06/13
> >
> > Andrew
> Sure. You can also use ibstat or ibv_devinfo tool if they are installed. But
> it's not very convenient in some cases.

This is the standardised way to do this. It should work for any
Ethernet driver, so long as it fills in the needed information.
Anything else is non-standard, and so inconvenient by definition.

Andrew
From: Qing Huang <qing.huang@oracle.com>
Date: Fri, 14 Sep 2018 10:15:48 -0700

> IMHO, this information is very useful and only takes up very little
> log file space. :-)

If it's critical then the log is the wrong place for it as the log is
lossy.

The proper place to obtain this information is via the fw_version field
of the ethtool_drvinfo struct.

This can be obtained at any time and is reliable. And if it isn't
reliable or correct, we must fix that.
From: Qing Huang <qing.huang@oracle.com>
Date: Fri, 14 Sep 2018 11:33:40 -0700

> On 9/14/2018 11:17 AM, Andrew Lunn wrote:
>> On Fri, Sep 14, 2018 at 10:15:48AM -0700, Qing Huang wrote:
>>> The FW version is actually a very crucial piece of information and
>>> only printed once here when the driver is loaded. People tend to get
>>> confused when switching multiple FW files back and forth without
>>> running separate utility tools, especially at customer sites.
>>> IMHO, this information is very useful and only takes up very little
>>> log file space. :-)
>> Why not use ethtool -i ?
>>
>> $ sudo ethtool -i eth0
>> driver: r8169
>> version: 2.3LK-NAPI
>> firmware-version: rtl8168g-2_0.0.1 02/06/13
>>
>> Andrew
> Sure. You can also use ibstat or ibv_devinfo tool if they are
> installed. But it's not very convenient in some cases.
>
> E.g.
> A customer upgrades FW on HCAs and encounters issues. During triage,
> it's much easier to study customer uploaded log files when remotely
> testing different FW files.

Not a valid argument. You can print the ethtool output from initramfs
if necessary for triage.

I still stand by the fact that ethtool is the only fully reliable way
to obtain this information, the kernel log is not.
From: Andrew Lunn <andrew@lunn.ch>
Date: Fri, 14 Sep 2018 20:17:18 +0200

> On Fri, Sep 14, 2018 at 10:15:48AM -0700, Qing Huang wrote:
>> The FW version is actually a very crucial piece of information and only
>> printed once here when the driver is loaded. People tend to get confused
>> when switching multiple FW files back and forth without running separate
>> utility tools, especially at customer sites.
>> IMHO, this information is very useful and only takes up very little log
>> file space. :-)
>
> Why not use ethtool -i ?
>
> $ sudo ethtool -i eth0
> driver: r8169
> version: 2.3LK-NAPI
> firmware-version: rtl8168g-2_0.0.1 02/06/13

+1
On 9/14/2018 2:14 PM, David Miller wrote:
> From: Qing Huang<qing.huang@oracle.com>
> Date: Fri, 14 Sep 2018 11:33:40 -0700
>
>> On 9/14/2018 11:17 AM, Andrew Lunn wrote:
>>> On Fri, Sep 14, 2018 at 10:15:48AM -0700, Qing Huang wrote:
>>>> The FW version is actually a very crucial piece of information and
>>>> only printed once here when the driver is loaded. People tend to get
>>>> confused when switching multiple FW files back and forth without
>>>> running separate utility tools, especially at customer sites.
>>>> IMHO, this information is very useful and only takes up very little
>>>> log file space. :-)
>>> Why not use ethtool -i ?
>>>
>>> $ sudo ethtool -i eth0
>>> driver: r8169
>>> version: 2.3LK-NAPI
>>> firmware-version: rtl8168g-2_0.0.1 02/06/13
>>>
>>> Andrew
>> Sure. You can also use ibstat or ibv_devinfo tool if they are
>> installed. But it's not very convenient in some cases.
>>
>> E.g.
>> A customer upgrades FW on HCAs and encounters issues. During triage,
>> it's much easier to study customer uploaded log files when remotely
>> testing different FW files.
> Not a valid argument. You can print the ethtool output from initramfs
> if necessary for triage.
>
> I still stand by the fact that ethtool is the only fully reliable way
> to obtain this information, the kernel log is not.

This is more for Infiniband mode which depends more on features and
functionalities provided in firmware and get much more frequent FW bug
fixes than typical Ethernet devices. This is not meant to replace other
ways of getting the information, more like an enhancement for checking
log history.

This can provide valuable information when tracing through system log
history to discover what happened with a specific HCA drv ver and fw ver
combination in the past.

Regards,
Qing
On Fri, Sep 14, 2018 at 03:36:46PM -0700, Qing Huang wrote:
>
>
> On 9/14/2018 2:14 PM, David Miller wrote:
> > From: Qing Huang<qing.huang@oracle.com>
> > Date: Fri, 14 Sep 2018 11:33:40 -0700
> >
> > > On 9/14/2018 11:17 AM, Andrew Lunn wrote:
> > > > On Fri, Sep 14, 2018 at 10:15:48AM -0700, Qing Huang wrote:
> > > > > The FW version is actually a very crucial piece of information and
> > > > > only printed once here when the driver is loaded. People tend to get
> > > > > confused when switching multiple FW files back and forth without
> > > > > running separate utility tools, especially at customer sites.
> > > > > IMHO, this information is very useful and only takes up very little
> > > > > log file space. :-)
> > > > Why not use ethtool -i ?
> > > >
> > > > $ sudo ethtool -i eth0
> > > > driver: r8169
> > > > version: 2.3LK-NAPI
> > > > firmware-version: rtl8168g-2_0.0.1 02/06/13
> > > >
> > > > Andrew
> > > Sure. You can also use ibstat or ibv_devinfo tool if they are
> > > installed. But it's not very convenient in some cases.
> > >
> > > E.g.
> > > A customer upgrades FW on HCAs and encounters issues. During triage,
> > > it's much easier to study customer uploaded log files when remotely
> > > testing different FW files.
> > Not a valid argument. You can print the ethtool output from initramfs
> > if necessary for triage.
> >
> > I still stand by the fact that ethtool is the only fully reliable way
> > to obtain this information, the kernel log is not.
>
> This is more for Infiniband mode which depends more on features and
> functionalities

For pure infiniband devices you have rdmatool, part of iproute2.

[leonro@server-14-015 ~]$ rdma dev
1: mlx5_0: node_type ca fw 3.8.9999 node_guid 5254:00c0:fe12:3455 sys_image_guid 5254:00c0:fe12:3455

> provided in firmware and get much more frequent FW bug fixes than typical
> Ethernet devices. This is not meant to replace other ways of getting the
> information, more like an enhancement for checking log history.
>
> This can provide valuable information when tracing through system log
> history to discover what happened with a specific HCA drv ver and fw ver
> combination in the past.
>
> Regards,
> Qing
diff --git a/drivers/net/ethernet/mellanox/mlx4/fw.c b/drivers/net/ethernet/mellanox/mlx4/fw.c
index babcfd9..e1c5218 100644
--- a/drivers/net/ethernet/mellanox/mlx4/fw.c
+++ b/drivers/net/ethernet/mellanox/mlx4/fw.c
@@ -1686,11 +1686,11 @@ int mlx4_QUERY_FW(struct mlx4_dev *dev)
         MLX4_GET(lg, outbox, QUERY_FW_MAX_CMD_OFFSET);
         cmd->max_cmds = 1 << lg;

-        mlx4_dbg(dev, "FW version %d.%d.%03d (cmd intf rev %d), max commands %d\n",
-                 (int) (dev->caps.fw_ver >> 32),
-                 (int) (dev->caps.fw_ver >> 16) & 0xffff,
-                 (int) dev->caps.fw_ver & 0xffff,
-                 cmd_if_rev, cmd->max_cmds);
+        mlx4_info(dev, "FW version %d.%d.%03d (cmd intf rev %d), max commands %d\n",
+                  (int)(dev->caps.fw_ver >> 32),
+                  (int)(dev->caps.fw_ver >> 16) & 0xffff,
+                  (int)dev->caps.fw_ver & 0xffff,
+                  cmd_if_rev, cmd->max_cmds);

         MLX4_GET(fw->catas_offset, outbox, QUERY_FW_ERR_START_OFFSET);
         MLX4_GET(fw->catas_size, outbox, QUERY_FW_ERR_SIZE_OFFSET);
When debugging firmware related issues, it's very helpful to have
the installed FW version info in the kernel log when the driver is
loaded. It's easier to match error/warning messages with different
FW versions in the log other than running a separate tool to get
the information back and forth.

Signed-off-by: Qing Huang <qing.huang@oracle.com>
---
 drivers/net/ethernet/mellanox/mlx4/fw.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)
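
As a side note on the format string in the patch: the three numbers it prints are unpacked from the single 64-bit dev->caps.fw_ver value with the shifts and masks visible in the diff. The standalone sketch below mirrors that unpacking outside the kernel. The field layout is inferred only from those expressions, and the helper names (fw_major, fw_minor, fw_subminor, fw_ver_to_str) are hypothetical, not driver API.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Layout assumed from the patch: major above bit 32, then two 16-bit fields. */
static int fw_major(uint64_t fw_ver)
{
	return (int)(fw_ver >> 32);
}

static int fw_minor(uint64_t fw_ver)
{
	return (int)(fw_ver >> 16) & 0xffff;
}

static int fw_subminor(uint64_t fw_ver)
{
	return (int)fw_ver & 0xffff;
}

/*
 * Hypothetical helper: format the version with the same "%d.%d.%03d"
 * pattern the driver message uses. Returns snprintf()'s result, i.e. the
 * number of characters the full string needs (excluding the NUL).
 */
int fw_ver_to_str(uint64_t fw_ver, char *buf, size_t len)
{
	assert(buf != NULL && len > 0);
	return snprintf(buf, len, "%d.%d.%03d",
			fw_major(fw_ver), fw_minor(fw_ver),
			fw_subminor(fw_ver));
}
```

Note that the sub-minor field is zero-padded to three digits by "%03d", so a packed value with major 2, minor 42 and sub-minor 5 is rendered as 2.42.005.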