Message ID | 20231005170110.3221306-6-anthony.l.nguyen@intel.com (mailing list archive) |
---|---|
State | Changes Requested |
Delegated to: | Netdev Maintainers |
Series | add v2 FW logging for ice driver |

Context | Check | Description |
---|---|---|
netdev/series_format | success | Posting correctly formatted |
netdev/tree_selection | success | Clearly marked for net-next |
netdev/fixes_present | success | Fixes tag not required for -next series |
netdev/header_inline | success | No static functions without inline keyword in header files |
netdev/build_32bit | success | Errors and warnings before: 9 this patch: 9 |
netdev/cc_maintainers | success | CCed 7 of 7 maintainers |
netdev/build_clang | success | Errors and warnings before: 9 this patch: 9 |
netdev/verify_signedoff | success | Signed-off-by tag matches author and committer |
netdev/deprecated_api | success | None detected |
netdev/check_selftest | success | No net selftest shell script |
netdev/verify_fixes | success | No Fixes tag |
netdev/build_allmodconfig_warn | success | Errors and warnings before: 9 this patch: 9 |
netdev/checkpatch | success | total: 0 errors, 0 warnings, 0 checks, 123 lines checked |
netdev/kdoc | success | Errors and warnings before: 0 this patch: 0 |
netdev/source_inline | success | Was 0 now: 0 |
On Thu, 5 Oct 2023 10:01:10 -0700 Tony Nguyen wrote:
> From: Paul M Stillwell Jr <paul.m.stillwell.jr@intel.com>
>
> Add documentation for FW logging in
> Documentation/networking/device-drivers/ethernet/intel/ice.rst

Wrong spelling, I think, because no such file.

> Signed-off-by: Paul M Stillwell Jr <paul.m.stillwell.jr@intel.com>
> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

> +Firmware (FW) logging
> +---------------------

I think you need empty lines after the headers.
Did you try to build this documentation and check the warnings?

> +The driver supports FW logging via the debugfs interface on PF 0 only. In order
> +for FW logging to work, the NVM must support it. The 'fwlog' file will only get
> +created in the ice debugfs directory if the NVM supports FW logging.

Odd phrasing - "in order to work it needs to be supported"

also NVM == non-volatile memory, you mean the logging goes into NVM
or NVM as in FW in the NVM needs to support it?

> +Module configuration
> +~~~~~~~~~~~~~~~~~~~~
> +To see the status of FW logging, read the 'fwlog/modules' file like this::
> +
> +  # cat /sys/kernel/debug/ice/0000\:18\:00.0/fwlog/modules
> +
> +To configure FW logging, write to the 'fwlog/modules' file like this::
> +
> +  # echo <fwlog_event> <fwlog_level> > /sys/kernel/debug/ice/0000\:18\:00.0/fwlog/modules
> +
> +where
> +
> +* fwlog_level is a name as described below. Each level includes the
> +  messages from the previous/lower level
> +
> +    * NONE
> +    * ERROR
> +    * WARNING
> +    * NORMAL
> +    * VERBOSE

Is this going to give us a nice list when we render the docs?
White space looks odd.

> +* fwlog_event is a name that represents the module to receive events for. The
> +  module names are
> +
> +    * GENERAL
> +    * CTRL
> +    * LINK
> +    * LINK_TOPO
> +    * DNL
> +    * I2C
> +    * SDP
> +    * MDIO
> +    * ADMINQ
> +    * HDMA
> +    * LLDP
> +    * DCBX
> +    * DCB
> +    * XLR
> +    * NVM
> +    * AUTH
> +    * VPD
> +    * IOSF
> +    * PARSER
> +    * SW
> +    * SCHEDULER
> +    * TXQ
> +    * RSVD
> +    * POST
> +    * WATCHDOG
> +    * TASK_DISPATCH
> +    * MNG
> +    * SYNCE
> +    * HEALTH
> +    * TSDRV
> +    * PFREG
> +    * MDLVER
> +    * ALL
> +
> +The name ALL is special and specifies setting all of the modules to the
> +specified fwlog_level.
> +
> +Example usage to configure the modules::
> +
> +  # echo LINK VERBOSE > /sys/kernel/debug/ice/0000\:18\:00.0/fwlog/modules
> +
> +Enabling FW log
> +~~~~~~~~~~~~~~~
> +Once the desired modules are configured the user enables logging. To do
> +this the user can write a 1 (enable) or 0 (disable) to 'fwlog/enable'. An
> +example is::
> +
> +  # echo 1 > /sys/kernel/debug/ice/0000\:18\:00.0/fwlog/enable

Hm, so we "select" the module and then enable / disable?

It'd feel more natural to steal the +/- thing from dynamic printing.

To enable:

  # echo '+LINK VERBOSE' > /sys/kernel/debug/ice/0000\:18\:00.0/fwlog/active

To disable:

  # echo '-LINK VERBOSE' > /sys/kernel/debug/ice/0000\:18\:00.0/fwlog/active

No?

> +Retrieving FW log data
> +~~~~~~~~~~~~~~~~~~~~~~
> +The FW log data can be retrieved by reading from 'fwlog/data'. The user can
> +write to 'fwlog/data' to clear the data. The data can only be cleared when FW
> +logging is disabled.

Oh, now it sounds like only one thing can be enabled at a time.
Can you clarify?

> The FW log data is a binary file that is sent to Intel and
> +used to help debug user issues.
> +
> +An example to read the data is::
> +
> +  # cat /sys/kernel/debug/ice/0000\:18\:00.0/fwlog/data > fwlog.bin
> +
> +An example to clear the data is::
> +
> +  # echo 0 > /sys/kernel/debug/ice/0000\:18\:00.0/fwlog/data
> +
> +Changing how often the log events are sent to the driver
> +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> +The driver receives FW log data from the Admin Receive Queue (ARQ). The
> +frequency that the FW sends the ARQ events can be configured by writing to
> +'fwlog/resolution'. The range is 1-128 (1 means push every log message, 128
> +means push only when the max AQ command buffer is full). The suggested value is
> +10. The user can see what the value is configured to by reading
> +'fwlog/resolution'. An example to set the value is::
> +
> +  # echo 50 > /sys/kernel/debug/ice/0000\:18\:00.0/fwlog/resolution

Resolution doesn't sound quite right, batch_size maybe?

> +Configuring the number of buffers used to store FW log data
> +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> +The driver stores FW log data in a ring within the driver. The default size of
> +the ring is 256 4K buffers. Some use cases may require more or less data so
> +the user can change the number of buffers that are allocated for FW log data.
> +To change the number of buffers write to 'fwlog/nr_buffs'. The value must be one
> +of: 64, 128, 256, or 512. FW logging must be disabled to change the value. An
> +example of changing the value is::
> +
> +  # echo 128 > /sys/kernel/debug/ice/0000\:18\:00.0/fwlog/nr_buffs

Why 4K? The number of buffers is irrelevant to the user, why not let
the user configure the size in bytes (which is how much DRAM the
driver will hold hostage)?
On 10/6/2023 4:46 PM, Jakub Kicinski wrote:
> On Thu, 5 Oct 2023 10:01:10 -0700 Tony Nguyen wrote:
>> From: Paul M Stillwell Jr <paul.m.stillwell.jr@intel.com>
>>
>> Add documentation for FW logging in
>> Documentation/networking/device-drivers/ethernet/intel/ice.rst
>
> Wrong spelling, I think, because no such file.
>

Sorry, hyphen vs underscore issue, will fix.

>> Signed-off-by: Paul M Stillwell Jr <paul.m.stillwell.jr@intel.com>
>> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
>
>> +Firmware (FW) logging
>> +---------------------
>
> I think you need empty lines after the headers.
> Did you try to build this documentation and check the warnings?
>

I believe this to be correct. It is the same as the section above it for
GNSS and it looks correct when complete. I did run 'make htmldocs' on
this and I don't get any errors or warnings and the page looks correct.

>> +The driver supports FW logging via the debugfs interface on PF 0 only. In order
>> +for FW logging to work, the NVM must support it. The 'fwlog' file will only get
>> +created in the ice debugfs directory if the NVM supports FW logging.
>
> Odd phrasing - "in order to work it needs to be supported"
>
> also NVM == non-volatile memory, you mean the logging goes into NVM
> or NVM as in FW in the NVM needs to support it?
>

Yeah, I can see it as oddly phrased. What I'm trying to say is that the
NVM image on the NIC has to support FW logging and if it doesn't then
the 'fwlog' directory will not be created. I'll take another run at it
to try to make it less confusing.

>> +Module configuration
>> +~~~~~~~~~~~~~~~~~~~~
>> +To see the status of FW logging, read the 'fwlog/modules' file like this::
>> +
>> +  # cat /sys/kernel/debug/ice/0000\:18\:00.0/fwlog/modules
>> +
>> +To configure FW logging, write to the 'fwlog/modules' file like this::
>> +
>> +  # echo <fwlog_event> <fwlog_level> > /sys/kernel/debug/ice/0000\:18\:00.0/fwlog/modules
>> +
>> +where
>> +
>> +* fwlog_level is a name as described below. Each level includes the
>> +  messages from the previous/lower level
>> +
>> +    * NONE
>> +    * ERROR
>> +    * WARNING
>> +    * NORMAL
>> +    * VERBOSE
>
> Is this going to give us a nice list when we render the docs?
> White space looks odd.
>

Yes, it does give a nice list

>> +* fwlog_event is a name that represents the module to receive events for. The
>> +  module names are
>> +
>> +    * GENERAL
>> +    * CTRL
>> +    * LINK
>> +    * LINK_TOPO
>> +    * DNL
>> +    * I2C
>> +    * SDP
>> +    * MDIO
>> +    * ADMINQ
>> +    * HDMA
>> +    * LLDP
>> +    * DCBX
>> +    * DCB
>> +    * XLR
>> +    * NVM
>> +    * AUTH
>> +    * VPD
>> +    * IOSF
>> +    * PARSER
>> +    * SW
>> +    * SCHEDULER
>> +    * TXQ
>> +    * RSVD
>> +    * POST
>> +    * WATCHDOG
>> +    * TASK_DISPATCH
>> +    * MNG
>> +    * SYNCE
>> +    * HEALTH
>> +    * TSDRV
>> +    * PFREG
>> +    * MDLVER
>> +    * ALL
>> +
>> +The name ALL is special and specifies setting all of the modules to the
>> +specified fwlog_level.
>> +
>> +Example usage to configure the modules::
>> +
>> +  # echo LINK VERBOSE > /sys/kernel/debug/ice/0000\:18\:00.0/fwlog/modules
>> +
>> +Enabling FW log
>> +~~~~~~~~~~~~~~~
>> +Once the desired modules are configured the user enables logging. To do
>> +this the user can write a 1 (enable) or 0 (disable) to 'fwlog/enable'. An
>> +example is::
>> +
>> +  # echo 1 > /sys/kernel/debug/ice/0000\:18\:00.0/fwlog/enable
>
> Hm, so we "select" the module and then enable / disable?
>
> It'd feel more natural to steal the +/- thing from dynamic printing.
>
> To enable:
>
>   # echo '+LINK VERBOSE' > /sys/kernel/debug/ice/0000\:18\:00.0/fwlog/active
>
> To disable:
>
>   # echo '-LINK VERBOSE' > /sys/kernel/debug/ice/0000\:18\:00.0/fwlog/active
>
> No?
>

I like this idea, but not sure if it will work or not for us. What I'm
trying to do is reduce the number of AQ commands we send to the FW when
configuring/enabling logging. What normally happens is the user sets
multiple different modules up with different log values so my initial
thought is to allow the user to do all the configuration first and then
'enable' that configuration. This way there is only 1 AQ write to the FW
instead of a bunch of them and we know that once the logging is
'enabled' then the data we get from the FW is the data that we expect to
see.

If we enable each module individually then we are going to get data
coming from the FW as each module gets enabled. That can get confusing
to the FW team as they look at the log data because they may not see all
the events they expect to see in any given time because the event wasn't
enabled.

>> +Retrieving FW log data
>> +~~~~~~~~~~~~~~~~~~~~~~
>> +The FW log data can be retrieved by reading from 'fwlog/data'. The user can
>> +write to 'fwlog/data' to clear the data. The data can only be cleared when FW
>> +logging is disabled.
>
> Oh, now it sounds like only one thing can be enabled at a time.
> Can you clarify?
>

What I'm trying to describe here is a mechanism to read all the data
(whatever modules have been enabled) as it's coming in and to also be
able to clear the data in case the user wants to start fresh (by writing
0 to the file). Does that make sense? I probably wasn't clear in the
previous section that the user can enable many modules at the same time.

>> The FW log data is a binary file that is sent to Intel and
>> +used to help debug user issues.
>> +
>> +An example to read the data is::
>> +
>> +  # cat /sys/kernel/debug/ice/0000\:18\:00.0/fwlog/data > fwlog.bin
>> +
>> +An example to clear the data is::
>> +
>> +  # echo 0 > /sys/kernel/debug/ice/0000\:18\:00.0/fwlog/data
>> +
>> +Changing how often the log events are sent to the driver
>> +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>> +The driver receives FW log data from the Admin Receive Queue (ARQ). The
>> +frequency that the FW sends the ARQ events can be configured by writing to
>> +'fwlog/resolution'. The range is 1-128 (1 means push every log message, 128
>> +means push only when the max AQ command buffer is full). The suggested value is
>> +10. The user can see what the value is configured to by reading
>> +'fwlog/resolution'. An example to set the value is::
>> +
>> +  # echo 50 > /sys/kernel/debug/ice/0000\:18\:00.0/fwlog/resolution
>
> Resolution doesn't sound quite right, batch_size maybe?
>

I agree, resolution is what the FW team uses, but I'll change this to
some other name

>> +Configuring the number of buffers used to store FW log data
>> +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>> +The driver stores FW log data in a ring within the driver. The default size of
>> +the ring is 256 4K buffers. Some use cases may require more or less data so
>> +the user can change the number of buffers that are allocated for FW log data.
>> +To change the number of buffers write to 'fwlog/nr_buffs'. The value must be one
>> +of: 64, 128, 256, or 512. FW logging must be disabled to change the value. An
>> +example of changing the value is::
>> +
>> +  # echo 128 > /sys/kernel/debug/ice/0000\:18\:00.0/fwlog/nr_buffs
>
> Why 4K? The number of buffers is irrelevant to the user, why not let
> the user configure the size in bytes (which is how much DRAM the
> driver will hold hostage)?

I'm trying to keep the numbers small for the user :). I could say
1048576 bytes (256 x 4096), but those kinds of numbers get unwieldy to a
user (IMO).

The FW logs generate a LOT of data depending on what modules are enabled
so we typically need a lot of buffers to handle them.

In the past we have tried to use the syslog mechanism, but we generate
SO much data that we overwhelm that and lose data. That's why the idea
of using static buffers is appealing to us. We could still overrun the
buffers, but at least we will have contiguous data. The problem then
becomes one of allocating enough space for what the user is trying to
catch instead of trying to start/stop logging and hoping you get all the
events in the log.

I can drop the mention of 4K buffers in the documentation. Or we could
use terms like 1M, 2M, 512K, et al. That would require string parsing in
the driver though and I'm trying to avoid that if possible. What do you
think?
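For reference, the buffer-count versus byte-count arithmetic in this exchange can be sketched in shell. This is only an illustration: the function name `fwlog_ring_bytes` and the variable `FWLOG_BUF_BYTES` are hypothetical, and the 4096-byte buffer size is taken from the "256 4K buffers" wording in the patch rather than from the driver source.

```shell
#!/bin/sh
# Sketch: convert an nr_buffs value into the number of bytes the ring holds.
# FWLOG_BUF_BYTES is an assumption based on the "4K buffers" text in the patch.
FWLOG_BUF_BYTES=4096

# fwlog_ring_bytes NR_BUFFS
# Prints the ring size in bytes, or fails for a count the patch does not allow.
fwlog_ring_bytes() {
    case "$1" in
        64|128|256|512)
            echo $(( $1 * FWLOG_BUF_BYTES ))
            ;;
        *)
            echo "nr_buffs must be one of 64, 128, 256, 512" >&2
            return 1
            ;;
    esac
}
```

With the default of 256 buffers this prints 1048576, the figure quoted above.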
On Tue, 10 Oct 2023 16:00:13 -0700 Paul M Stillwell Jr wrote:
> >> +Retrieving FW log data
> >> +~~~~~~~~~~~~~~~~~~~~~~
> >> +The FW log data can be retrieved by reading from 'fwlog/data'. The user can
> >> +write to 'fwlog/data' to clear the data. The data can only be cleared when FW
> >> +logging is disabled.
> >
> > Oh, now it sounds like only one thing can be enabled at a time.
> > Can you clarify?
> >
>
> What I'm trying to describe here is a mechanism to read all the data
> (whatever modules have been enabled) as it's coming in and to also be
> able to clear the data in case the user wants to start fresh (by writing
> 0 to the file). Does that make sense?

Yes that part does.

> I probably wasn't clear in the
> previous section that the user can enable many modules at the same time.

Probably best if you describe enabling of multiple modules in the
example. I'm not sure how one disables a module with the current API.

> > Why 4K? The number of buffers is irrelevant to the user, why not let
> > the user configure the size in bytes (which is how much DRAM the
> > driver will hold hostage)?
>
> I'm trying to keep the numbers small for the user :). I could say
> 1048576 bytes (256 x 4096), but those kinds of numbers get unwieldy to a
> user (IMO).

echo $((256 * 4096)) >> $the_file

But also...

> The FW logs generate a LOT of data depending on what modules are enabled
> so we typically need a lot of buffers to handle them.
>
> In the past we have tried to use the syslog mechanism, but we generate
> SO much data that we overwhelm that and lose data. That's why the idea
> of using static buffers is appealing to us. We could still overrun the
> buffers, but at least we will have contiguous data. The problem then
> becomes one of allocating enough space for what the user is trying to
> catch instead of trying to start/stop logging and hoping you get all the
> events in the log.
>
> I can drop the mention of 4K buffers in the documentation. Or we could
> use terms like 1M, 2M, 512K, et al. That would require string parsing in
> the driver though and I'm trying to avoid that if possible. What do you
> think?

.. I thought such helpers already existed.
On 10/10/2023 6:18 PM, Jakub Kicinski wrote:
> On Tue, 10 Oct 2023 16:00:13 -0700 Paul M Stillwell Jr wrote:
>>>> +Retrieving FW log data
>>>> +~~~~~~~~~~~~~~~~~~~~~~
>>>> +The FW log data can be retrieved by reading from 'fwlog/data'. The user can
>>>> +write to 'fwlog/data' to clear the data. The data can only be cleared when FW
>>>> +logging is disabled.
>>>
>>> Oh, now it sounds like only one thing can be enabled at a time.
>>> Can you clarify?
>>>
>>
>> What I'm trying to describe here is a mechanism to read all the data
>> (whatever modules have been enabled) as it's coming in and to also be
>> able to clear the data in case the user wants to start fresh (by writing
>> 0 to the file). Does that make sense?
>
> Yes that part does.
>
>> I probably wasn't clear in the
>> previous section that the user can enable many modules at the same time.
>
> Probably best if you describe enabling of multiple modules in the
> example. I'm not sure how one disables a module with the current API.
>

Will do

>>> Why 4K? The number of buffers is irrelevant to the user, why not let
>>> the user configure the size in bytes (which is how much DRAM the
>>> driver will hold hostage)?
>>
>> I'm trying to keep the numbers small for the user :). I could say
>> 1048576 bytes (256 x 4096), but those kinds of numbers get unwieldy to a
>> user (IMO).
>
> echo $((256 * 4096)) >> $the_file
>

I'll change it to be the number of bytes of data to store instead of
the number of buffers

> But also...
>
>> The FW logs generate a LOT of data depending on what modules are enabled
>> so we typically need a lot of buffers to handle them.
>>
>> In the past we have tried to use the syslog mechanism, but we generate
>> SO much data that we overwhelm that and lose data. That's why the idea
>> of using static buffers is appealing to us. We could still overrun the
>> buffers, but at least we will have contiguous data. The problem then
>> becomes one of allocating enough space for what the user is trying to
>> catch instead of trying to start/stop logging and hoping you get all the
>> events in the log.
>>
>> I can drop the mention of 4K buffers in the documentation. Or we could
>> use terms like 1M, 2M, 512K, et al. That would require string parsing in
>> the driver though and I'm trying to avoid that if possible. What do you
>> think?
>
> .. I thought such helpers already existed.

If you are referring to helpers to handle 1M et al, I couldn't find
anything. I found the kstrto<x> stuff, but that doesn't handle this case
correctly I don't think.
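As an aside on the "1M, 2M, 512K" notation: the kernel does have `memparse()` (lib/cmdline.c), which accepts K/M/G-style suffixes, though whether that is the helper meant above is an assumption. If the driver instead took a plain byte count, the suffix handling could live in user space. A minimal sketch of that idea, where the function name `size_to_bytes` is hypothetical and only K, M and G suffixes are handled:

```shell
#!/bin/sh
# Sketch: convert "512K", "1M", "2M" or a plain byte count into bytes.
# size_to_bytes is a hypothetical helper, not part of the ice driver or any tool.
size_to_bytes() {
    # Strip one trailing K/M/G suffix (either case), leaving the numeric part.
    num=${1%[KkMmGg]}

    # Reject empty, non-numeric, or leading-zero input (the shell would
    # otherwise treat a leading zero as octal inside $(( ))).
    case "$num" in
        ''|*[!0-9]*|0?*)
            echo "invalid size: $1" >&2
            return 1
            ;;
    esac

    case "$1" in
        *[Kk]) echo $(( num * 1024 )) ;;
        *[Mm]) echo $(( num * 1024 * 1024 )) ;;
        *[Gg]) echo $(( num * 1024 * 1024 * 1024 )) ;;
        *)     echo "$num" ;;
    esac
}
```

The result could then be written to whatever byte-sized control file a later revision of the series exposes; no such file name is assumed here.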
diff --git a/Documentation/networking/device_drivers/ethernet/intel/ice.rst b/Documentation/networking/device_drivers/ethernet/intel/ice.rst
index e4d065c55ea8..9042349f354a 100644
--- a/Documentation/networking/device_drivers/ethernet/intel/ice.rst
+++ b/Documentation/networking/device_drivers/ethernet/intel/ice.rst
@@ -895,6 +895,123 @@ driver writes raw bytes by the GNSS object to the receiver through i2c.
 Please refer to the hardware GNSS module documentation for configuration
 details.
 
+Firmware (FW) logging
+---------------------
+The driver supports FW logging via the debugfs interface on PF 0 only. In order
+for FW logging to work, the NVM must support it. The 'fwlog' file will only get
+created in the ice debugfs directory if the NVM supports FW logging.
+
+Module configuration
+~~~~~~~~~~~~~~~~~~~~
+To see the status of FW logging, read the 'fwlog/modules' file like this::
+
+  # cat /sys/kernel/debug/ice/0000\:18\:00.0/fwlog/modules
+
+To configure FW logging, write to the 'fwlog/modules' file like this::
+
+  # echo <fwlog_event> <fwlog_level> > /sys/kernel/debug/ice/0000\:18\:00.0/fwlog/modules
+
+where
+
+* fwlog_level is a name as described below. Each level includes the
+  messages from the previous/lower level
+
+    * NONE
+    * ERROR
+    * WARNING
+    * NORMAL
+    * VERBOSE
+
+* fwlog_event is a name that represents the module to receive events for. The
+  module names are
+
+    * GENERAL
+    * CTRL
+    * LINK
+    * LINK_TOPO
+    * DNL
+    * I2C
+    * SDP
+    * MDIO
+    * ADMINQ
+    * HDMA
+    * LLDP
+    * DCBX
+    * DCB
+    * XLR
+    * NVM
+    * AUTH
+    * VPD
+    * IOSF
+    * PARSER
+    * SW
+    * SCHEDULER
+    * TXQ
+    * RSVD
+    * POST
+    * WATCHDOG
+    * TASK_DISPATCH
+    * MNG
+    * SYNCE
+    * HEALTH
+    * TSDRV
+    * PFREG
+    * MDLVER
+    * ALL
+
+The name ALL is special and specifies setting all of the modules to the
+specified fwlog_level.
+
+Example usage to configure the modules::
+
+  # echo LINK VERBOSE > /sys/kernel/debug/ice/0000\:18\:00.0/fwlog/modules
+
+Enabling FW log
+~~~~~~~~~~~~~~~
+Once the desired modules are configured the user enables logging. To do
+this the user can write a 1 (enable) or 0 (disable) to 'fwlog/enable'. An
+example is::
+
+  # echo 1 > /sys/kernel/debug/ice/0000\:18\:00.0/fwlog/enable
+
+Retrieving FW log data
+~~~~~~~~~~~~~~~~~~~~~~
+The FW log data can be retrieved by reading from 'fwlog/data'. The user can
+write to 'fwlog/data' to clear the data. The data can only be cleared when FW
+logging is disabled. The FW log data is a binary file that is sent to Intel and
+used to help debug user issues.
+
+An example to read the data is::
+
+  # cat /sys/kernel/debug/ice/0000\:18\:00.0/fwlog/data > fwlog.bin
+
+An example to clear the data is::
+
+  # echo 0 > /sys/kernel/debug/ice/0000\:18\:00.0/fwlog/data
+
+Changing how often the log events are sent to the driver
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+The driver receives FW log data from the Admin Receive Queue (ARQ). The
+frequency that the FW sends the ARQ events can be configured by writing to
+'fwlog/resolution'. The range is 1-128 (1 means push every log message, 128
+means push only when the max AQ command buffer is full). The suggested value is
+10. The user can see what the value is configured to by reading
+'fwlog/resolution'. An example to set the value is::
+
+  # echo 50 > /sys/kernel/debug/ice/0000\:18\:00.0/fwlog/resolution
+
+Configuring the number of buffers used to store FW log data
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+The driver stores FW log data in a ring within the driver. The default size of
+the ring is 256 4K buffers. Some use cases may require more or less data so
+the user can change the number of buffers that are allocated for FW log data.
+To change the number of buffers write to 'fwlog/nr_buffs'. The value must be one
+of: 64, 128, 256, or 512. FW logging must be disabled to change the value. An
+example of changing the value is::
+
+  # echo 128 > /sys/kernel/debug/ice/0000\:18\:00.0/fwlog/nr_buffs
+
+
 Performance Optimization
 ========================
 Driver defaults are meant to fit a wide variety of workloads, but if further
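
Taken together, the steps documented in the patch (configure one or more modules, enable logging, then disable, read 'fwlog/data' and clear it) could be scripted roughly as follows. This is a sketch under stated assumptions, not part of the series: the function names `fwlog_start` and `fwlog_stop_and_save` and the `FWLOG_DIR` variable are hypothetical, the default path is the example device from the patch, and the file names are those of this v2 posting, which the review suggests may change.

```shell
#!/bin/sh
# Sketch of the configure-then-enable flow from the patch text.
# FWLOG_DIR defaults to the example device path in the patch; override it
# for another device. The interface is the v2 proposal and may differ later.
FWLOG_DIR=${FWLOG_DIR:-/sys/kernel/debug/ice/0000:18:00.0/fwlog}

# fwlog_start MODULE LEVEL [MODULE LEVEL ...]
# Writes each "<fwlog_event> <fwlog_level>" pair to 'modules', then enables
# logging once, so every module is configured before data starts to arrive.
fwlog_start() {
    if [ "$#" -lt 2 ] || [ $(( $# % 2 )) -ne 0 ]; then
        echo "usage: fwlog_start MODULE LEVEL [MODULE LEVEL ...]" >&2
        return 1
    fi
    while [ "$#" -ge 2 ]; do
        echo "$1 $2" > "$FWLOG_DIR/modules" || return 1
        shift 2
    done
    echo 1 > "$FWLOG_DIR/enable"
}

# fwlog_stop_and_save OUTFILE
# Disables logging, copies the binary log to OUTFILE, then clears the data.
# Clearing comes last because the patch says data can only be cleared while
# logging is disabled.
fwlog_stop_and_save() {
    echo 0 > "$FWLOG_DIR/enable" || return 1
    cat "$FWLOG_DIR/data" > "$1" || return 1
    echo 0 > "$FWLOG_DIR/data"
}
```

A typical session would be `fwlog_start LINK VERBOSE NVM ERROR`, reproduce the problem, then `fwlog_stop_and_save fwlog.bin`. It needs root and a mounted debugfs.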