[00/20] ASoC: SOF: Re-visit firmware state and panic tracking/handling

Message ID 20211223113628.18582-1-peter.ujfalusi@linux.intel.com (mailing list archive)

Message

Péter Ujfalusi Dec. 23, 2021, 11:36 a.m. UTC
Hi,

this series improves how we track the firmware's state, so that we can avoid
communicating with it when it is not going to answer due to a panic. In that
case we will attempt to force a power cycle of the DSP at the next runtime
suspend in order to recover.
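
To illustrate the gating described above, here is a minimal userspace sketch.
It is not the code from the patches: the struct, the function names and the
FW_* enumerators are hypothetical stand-ins for the SOF_FW_* states named in
the patch subjects. The -ENODEV return value and the way 'non_recoverable'
maps to the crashed state are assumptions as well.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical, simplified model; not the kernel's definitions. */
enum fw_state_sketch {
	FW_BOOT_FAILED,		/* stands in for SOF_FW_BOOT_FAILED */
	FW_BOOT_READY_OK,	/* stands in for SOF_FW_BOOT_READY_OK */
	FW_BOOT_COMPLETE,	/* stands in for SOF_FW_BOOT_COMPLETE */
	FW_CRASHED,		/* stands in for SOF_FW_CRASHED */
};

struct dsp_sketch {
	enum fw_state_sketch state;
	bool powered;
};

/* Refuse to talk to a firmware that is not fully booted. */
static int sketch_send_ipc(const struct dsp_sketch *dsp)
{
	if (dsp->state != FW_BOOT_COMPLETE)
		return -ENODEV;	/* assumed error code */

	/* A real driver would write the message to the DSP mailbox here. */
	return 0;
}

/* Record a panic; assumed to mark the firmware as crashed only if fatal. */
static void sketch_dsp_panic(struct dsp_sketch *dsp, bool non_recoverable)
{
	if (non_recoverable)
		dsp->state = FW_CRASHED;
}

/* At runtime suspend, force the DSP off if the firmware is unusable. */
static bool sketch_runtime_suspend(struct dsp_sketch *dsp)
{
	bool force_off = dsp->state == FW_CRASHED ||
			 dsp->state == FW_BOOT_FAILED;

	if (force_off)
		dsp->powered = false;

	return force_off;
}
```

With this model, an IPC attempted after a non-recoverable panic fails
immediately instead of waiting for a reply that will never come, and the next
runtime suspend powers the DSP down so that a later resume can boot it again.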

The state handling also brings improvements to the way the kernel reports
errors and DSP panics: fewer lines are printed for normal users, while
developers (and bug reports) get more precise information to track down the
issue.

We can now easily place messages at the correct log level instead of being
bound to a static ERROR level for some of the print chains, which caused either
an excessive amount of output or only partial information to be printed,
confusing users and machines (CI).
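
As a rough illustration of passing the log level down a print chain, consider
the following userspace sketch. All names here are hypothetical, the level
tags are plain strings standing in for the kernel's log levels, and the rule
that "optional" prints go to DEBUG is a simplifying assumption rather than the
exact logic of the patches.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical tags standing in for kernel log levels (ERROR, DEBUG). */
#define SKETCH_LVL_ERR   "[err]"
#define SKETCH_LVL_DEBUG "[dbg]"

/* Decide the level once, at the top of the print chain. */
static const char *sketch_pick_level(bool optional_print)
{
	return optional_print ? SKETCH_LVL_DEBUG : SKETCH_LVL_ERR;
}

/* Innermost helper: uses the caller's level instead of hardcoding ERROR. */
static int sketch_format_status(char *buf, size_t len, const char *level,
				unsigned int status)
{
	return snprintf(buf, len, "%s status: 0x%08x", level, status);
}

/* Entry point of the chain: picks the level and forwards it downwards. */
static int sketch_dump(char *buf, size_t len, bool optional_print,
		       unsigned int status)
{
	return sketch_format_status(buf, len,
				    sketch_pick_level(optional_print), status);
}
```

Because the level is chosen in one place and handed to every helper below it,
a whole dump ends up at one consistent level rather than being split between
ERROR and DEBUG.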

I would have preferred to split this series up, but it was developed together to
achieve a single goal: reduce the noise while still providing the details we
need to root cause issues.

Regards,
Peter
---
Peter Ujfalusi (20):
  ASoC: SOF: ops: Use dev_warn() if the panic offsets differ
  ASoC: SOF: Intel: hda-loader: Avoid re-defining the
    HDA_FW_BOOT_ATTEMPTS
  ASoC: SOF: core: Add simple wrapper to check flags in sof_core_debug
  ASoC: SOF: Use sof_debug_check_flag() instead of sof_core_debug
    directly
  ASoC: SOF: Add 'non_recoverable' parameter to snd_sof_dsp_panic()
  ASoC: SOF: Add a 'message' parameter to snd_sof_dsp_dbg_dump()
  ASoC: SOF: Introduce new firmware state: SOF_FW_CRASHED
  ASoC: SOF: Introduce new firmware state: SOF_FW_BOOT_READY_OK
  ASoC: SOF: Move the definition of enum snd_sof_fw_state to global
    header
  ASoC: SOF: Rename 'enum snd_sof_fw_state' to 'enum sof_fw_state'
  ASoC: SOF: ipc: Only allow sending of an IPC in SOF_FW_BOOT_COMPLETE
    state
  ASoC: SOF: Set SOF_FW_BOOT_FAILED in case we have failure during boot
  ASoC: SOF: pm: Force DSP off on suspend in BOOT_FAILED state also
  ASoc: SOF: core: Update the FW boot state transition diagram
  ASoC: SOF: ops: Always print DSP Panic message but use different
    message
  ASoC: SOF: dsp_arch_ops: add kernel log level parameter for oops and
    stack
  ASoC: SOF: Rename snd_sof_get_status() and add kernel log level
    parameter
  ASoC: SOF: Add clarifying comments for sof_core_debug and DSP dump
    flags
  ASoC: SOF: debug: Use DEBUG log level for optional prints
  ASoC: SOF: Intel: hda: Use DEBUG log level for optional prints

 include/sound/sof.h              |  22 ++++++
 sound/soc/sof/core.c             | 119 +++++++++++++++++++------------
 sound/soc/sof/debug.c            |  35 +++++----
 sound/soc/sof/imx/imx-common.c   |   4 +-
 sound/soc/sof/imx/imx8.c         |   2 +-
 sound/soc/sof/imx/imx8m.c        |   2 +-
 sound/soc/sof/intel/atom.c       |   8 +--
 sound/soc/sof/intel/bdw.c        |   8 +--
 sound/soc/sof/intel/cnl.c        |  21 +++++-
 sound/soc/sof/intel/hda-ipc.c    |  19 ++++-
 sound/soc/sof/intel/hda-loader.c |  24 ++++---
 sound/soc/sof/intel/hda.c        |  20 +++---
 sound/soc/sof/intel/hda.h        |   2 +-
 sound/soc/sof/ipc.c              |   4 +-
 sound/soc/sof/loader.c           |  16 ++---
 sound/soc/sof/ops.c              |  47 ++++++++----
 sound/soc/sof/ops.h              |   4 +-
 sound/soc/sof/pm.c               |  10 +++
 sound/soc/sof/sof-priv.h         |  44 +++++-------
 sound/soc/sof/topology.c         |  12 ++--
 sound/soc/sof/xtensa/core.c      |  44 +++++++-----
 21 files changed, 299 insertions(+), 168 deletions(-)

Comments

Mark Brown Dec. 23, 2021, 5:17 p.m. UTC | #1
On Thu, 23 Dec 2021 13:36:08 +0200, Peter Ujfalusi wrote:
> this series will improve how we are tracking the firmware's state to be able to
> avoid communication with it when it is not going to answer due to a panic and
> we will attempt to force power cycle the DSP to recover at the next runtime
> suspend time.
> 
> The state handling brings in other improvements on the way the kernel reports
> errors and DSP panics to reduce the printed lines for normal users, but at the
> same time allowing developers (or for bug reports) to have more precise
> information available to track down the issue.
> 
> [...]

Applied to

   https://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound.git for-next

Thanks!

[01/20] ASoC: SOF: ops: Use dev_warn() if the panic offsets differ
        commit: 72b8ed83f7eccf84c54b68a551beae400949cc29
[02/20] ASoC: SOF: Intel: hda-loader: Avoid re-defining the HDA_FW_BOOT_ATTEMPTS
        commit: b2539ef00e4427350b26896540ccabd98e88c7bb
[03/20] ASoC: SOF: core: Add simple wrapper to check flags in sof_core_debug
        commit: f902b21adba98f28eaa1cf5e509d99eaa7b1b36e
[04/20] ASoC: SOF: Use sof_debug_check_flag() instead of sof_core_debug directly
        commit: 12b401f4de787627f4a25784a0278bbbf93122b6
[05/20] ASoC: SOF: Add 'non_recoverable' parameter to snd_sof_dsp_panic()
        commit: b2b10aa79fe2fb3d3393d0e90ffb5c1802992412
[06/20] ASoC: SOF: Add a 'message' parameter to snd_sof_dsp_dbg_dump()
        commit: 2f148430b96e975e895163d763bfc9c5088100eb
[07/20] ASoC: SOF: Introduce new firmware state: SOF_FW_CRASHED
        commit: 4e1f86482189ddbef73f7be8c6e62e8e3730e6b9
[08/20] ASoC: SOF: Introduce new firmware state: SOF_FW_BOOT_READY_OK
        commit: b2e9eb3adb9a498b997b18852773e75d7af3b60d
[09/20] ASoC: SOF: Move the definition of enum snd_sof_fw_state to global header
        commit: fc179420fde3821c4d191e81b4f7b05c1dab87e2
[10/20] ASoC: SOF: Rename 'enum snd_sof_fw_state' to 'enum sof_fw_state'
        commit: d41607d37c1385da799f9a2ddb10c460e573687e
[11/20] ASoC: SOF: ipc: Only allow sending of an IPC in SOF_FW_BOOT_COMPLETE state
        commit: 9421ff7665f66452f61ee40566c6f562d3847873
[12/20] ASoC: SOF: Set SOF_FW_BOOT_FAILED in case we have failure during boot
        commit: e2406275be2b6b15d985f33aec921e6555e4f87a
[13/20] ASoC: SOF: pm: Force DSP off on suspend in BOOT_FAILED state also
        commit: b54b3a4e08bc0210768a1839af2ff888376cae4c
[14/20] ASoc: SOF: core: Update the FW boot state transition diagram
        commit: 9f89a988d5c222f2fba495bbc861a476bdf1bd30
[15/20] ASoC: SOF: ops: Always print DSP Panic message but use different message
        commit: fdc573b1c26a8859996de6fbae2d436511b74e00
[16/20] ASoC: SOF: dsp_arch_ops: add kernel log level parameter for oops and stack
        commit: b9f0bfd16d8b390b35dbec67c3ed74e74a0ade24
[17/20] ASoC: SOF: Rename snd_sof_get_status() and add kernel log level parameter
        commit: 4995ffce2ce2164fa507a5dbaf1aa38bab679cca
[18/20] ASoC: SOF: Add clarifying comments for sof_core_debug and DSP dump flags
        commit: beb6ade168177bf6c43abe78b3c9512b260b8068
[19/20] ASoC: SOF: debug: Use DEBUG log level for optional prints
        commit: 0152b8a2f0831b03bb7483159ef28167dcd33ab0
[20/20] ASoC: SOF: Intel: hda: Use DEBUG log level for optional prints
        commit: 34bfba9a63ece79c683591e757899e61fbcaa753

All being well this means that it will be integrated into the linux-next
tree (usually sometime in the next 24 hours) and sent to Linus during
the next merge window (or sooner if it is a bug fix), however if
problems are discovered then the patch may be dropped or reverted.

You may get further e-mails resulting from automated or manual testing
and review of the tree, please engage with people reporting problems and
send followup patches addressing any issues that are reported if needed.

If any updates are required or you are submitting further changes they
should be sent as incremental updates against current git, existing
patches will not be replaced.

Please add any relevant lists and maintainers to the CCs when replying
to this mail.

Thanks,
Mark