mbox series

[v8,00/10] Add Qualcomm APSS Minidump driver related support

Message ID 20240131110837.14218-1-quic_mojha@quicinc.com (mailing list archive)
Headers show
Series Add Qualcomm APSS Minidump driver related support | expand

Message

Mukesh Ojha Jan. 31, 2024, 11:08 a.m. UTC
Abstract and PDF here:
https://lpc.events/event/17/contributions/1468/

Video:
https://www.youtube.com/watch?v=3vL3gtAu84s

Patch 1 deals in detail documentation on minidump.
Patch 2-4 refactors minidump existing layout and separate it from remoteproc files.
Patch 6 is the Qualcomm APSS minidump driver.
Patch 7-10 Enable support to reserve dynamic ramoops and the support to
  register ramoops region with minidump.

Detail about Minidump is discussed in documentation patch (1/10) and also briefly
discussed after below changelog.

Changes in v8:
 - Addressed documentation comment made by Randy Dunlap.
 - Rebased on linux-next tag next-20240130

Changes in v7:
 - Addressed comment made by [Pavan.K] to use generic notifiers.
 - Addresses comment made on Dynamic ramoops about error handling.
 - Significant change minidump documentation suggested by [Bryan O'Donoghue]
 - Added Reviewed by from [Bagas]
 - Renamed ramoops notifiers.

Changes in v6: https://lore.kernel.org/lkml/1700864395-1479-1-git-send-email-quic_mojha@quicinc.com/
 - Accumalated the feedback received on v5 and rebase v5 versions in v6.
 - Removed the exported function as there is no current users of them.
 - Applied [Pavan.K] suggestion on caller/callee placement of dynamic ramoops reserve memory.
 - Addressed [krzysztof] comment on sizeof() and to have qcom_apss_md_table_exit().
 - Addressed [Bagas.S] comment on minidump doc.
 - Tried to implement [Kees] suggestion in slight different way with callback registration
   with ramoops instead of pstore core.

Change in rebase v5: https://lore.kernel.org/lkml/1694429639-21484-1-git-send-email-quic_mojha@quicinc.com/
 - Rebased it on latest tag available on linux-next
 - Added missed Poovendhan sign-off on 15/17 and tested-by tag from
   Kathiravan. Thanks to him for testing and reminding me of missing sign-off.

Changes in v5: https://lore.kernel.org/lkml/1694290578-17733-1-git-send-email-quic_mojha@quicinc.com/
 - On suggestion from Pavan.k, to have single function call for minidump collection
   from remoteproc driver, separated the logic to have separate minidump file called
   qcom_rproc_minidump.c and also renamed the function from qcom_minidump() to
   qcom_rproc_minidump(); however, dropped his suggestion about rework on lazy deletion
   during region unregister in this series, will pursue it in next series.

 - To simplify the minidump driver, removed the complication for frontend and different
   backend from Greg suggestion, will pursue this once main driver gets mainlined.

 - Move the dynamic ramoops region allocation from Device tree approach to command line
   approch with the introduction command line parsing and memblock reservation during
   early boot up; Not added documentation about it yet, will add if it gets positive
   response.

 - Exporting linux banner from kernel to make minidump build also as module, however,
   minidump is a debug module and should be kernel built to get most debug information
   from kernel.

 - Tried to address comments given on dload patch series.

Changes in v4: https://lore.kernel.org/lkml/1687955688-20809-1-git-send-email-quic_mojha@quicinc.com/
 - Redesigned the driver and divided the driver into front end and backend (smem) so
   that any new backend can be attached easily to avoid code duplication.
 - Patch reordering as per the driver and subsystem to easier review of the code.
 - Removed minidump specific code from remoteproc to minidump smem based driver.
 - Enabled the all the driver as modules.
 - Address comments made on documentation and yaml and Device tree file [Krzysztof/Konrad]
 - Address comments made qcom_pstore_minidump driver and given its Device tree
   same set of properties as ramoops. [Luca/Kees]
 - Added patch for MAINTAINER file.
 - Include defconfig change as one patch as per [Krzysztof] suggestion.
 - Tried to remove the redundant file scope variables from the module as per [Krzysztof] suggestion.
 - Addressed comments made on dload mode patch v6 version
   https://lore.kernel.org/lkml/1680076012-10785-1-git-send-email-quic_mojha@quicinc.com/

Changes in v3: https://lore.kernel.org/lkml/1683133352-10046-1-git-send-email-quic_mojha@quicinc.com/
 - Addressed most of the comments by Srini on v2 and refactored the minidump driver.
    - Added platform device support
    - Unregister region support.
 - Added update region for clients.
 - Added pending region support.
 - Modified the documentation guide accordingly.
 - Added qcom_pstore_ramdump client driver which happen to add ramoops platform
   device and also registers ramoops region with minidump.
 - Added download mode patch series with this minidump series.
    https://lore.kernel.org/lkml/1680076012-10785-1-git-send-email-quic_mojha@quicinc.com/

Changes in v2: https://lore.kernel.org/lkml/1679491817-2498-1-git-send-email-quic_mojha@quicinc.com/
 - Addressed review comment made by [quic_tsoni/bmasney] to add documentation.
 - Addressed comments made by [srinivas.kandagatla]
 - Dropped pstore 6/6 from the last series, till i get conclusion to get pstore
   region in minidump.
 - Fixed issue reported by kernel test robot.

Changes in v1: https://lore.kernel.org/lkml/1676978713-7394-1-git-send-email-quic_mojha@quicinc.com/

Minidump is a best effort mechanism to collect useful and predefined data
for first level of debugging on end user devices running on Qualcomm SoCs.
It is built on the premise that System on Chip (SoC) or subsystem part of
SoC crashes, due to a range of hardware and software bugs.

Qualcomm devices in engineering mode provides a mechanism for generating
full system ramdumps for post mortem debugging. But in some cases it's
however not feasible to capture the entire content of RAM. The minidump
mechanism provides the means for selecting which snippets should be
included in the ramdump.

The core of SMEM based minidump feature is part of Qualcomm's boot
firmware code. It initializes shared memory (SMEM), which is a part of
DDR and allocates a small section of SMEM to minidump table i.e also
called global table of content (G-ToC). Each subsystem (APSS, ADSP, ...)
has their own table of segments to be included in the minidump and all
get their reference from G-ToC. Each segment/region has some details
like name, physical address and it's size etc. and it could be anywhere
scattered in the DDR.

Existing upstream Qualcomm remoteproc driver[1] already supports SMEM
based minidump feature for remoteproc instances like ADSP, MODEM, ...
where predefined selective segments of subsystem region can be dumped
as part of coredump collection which generates smaller size artifacts
compared to complete coredump of subsystem on crash.

[1]
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/drivers/remoteproc/qcom_common.c#n142

In addition to managing and querying the APSS minidump description,
the Linux driver maintains a ELF header in a segment. This segment
gets updated with section/program header whenever a new entry gets
registered.

Support for Minidump enablement on Qualcomm SoCs is pursued separately and
can be done via below series of changes. For testing, these patches can be
applied

https://lore.kernel.org/lkml/1704727654-13999-1-git-send-email-quic_mojha@quicinc.com/

https://lore.kernel.org/lkml/20240109153200.12848-12-quic_mojha@quicinc.com/
https://lore.kernel.org/lkml/20240109153200.12848-13-quic_mojha@quicinc.com/

Testing of these patches has been done on sm8450 target after enabling kernel
config like CONFIG_PSTORE_RAM/CONFIG_PSTORE_CONSOLE and once the device boots
up. Below command can be executed from sysfs to enable minidump in the firmware.

 echo mini > /sys/module/qcom_scm/parameters/download_mode

Try crashing it via devmem2 0xf11c000(this is known command to create xpu violation
and put the device crash dump mode) on command prompt.

Default storage type is set to via USB, so Minidump would be downloaded with the
help of x86_64 machine (running PCAT tool) attached to Qualcomm product which has
backed Minidump boot firmware support.

After that we will see a bunch of predefined registered region as binary blobs files
starts with md_* downloaded on the x86 machine at configured/default location in PCAT
tool from the product, more about this can be found in qualcomm minidump guide
patch.

Mukesh Ojha (10):
  docs: qcom: Add qualcomm minidump guide
  soc: qcom: Add qcom_rproc_minidump module
  remoteproc: qcom_q6v5_pas: Use qcom_rproc_minidump()
  remoteproc: qcom: Remove minidump related data from qcom_common.c
  init: export linux_banner data variable
  soc: qcom: Add Qualcomm APSS minidump kernel driver
  MAINTAINERS: Add entry for minidump related files
  pstore/ram: Add dynamic ramoops region support through commandline
  pstore/ram: Add ramoops information notifier support
  soc: qcom: register ramoops region with APSS minidump

 Documentation/admin-guide/index.rst         |   1 +
 Documentation/admin-guide/qcom_minidump.rst | 318 +++++++++
 Documentation/admin-guide/ramoops.rst       |  23 +-
 MAINTAINERS                                 |  10 +
 drivers/remoteproc/Kconfig                  |   1 +
 drivers/remoteproc/qcom_common.c            | 160 -----
 drivers/remoteproc/qcom_q6v5_pas.c          |   3 +-
 drivers/soc/qcom/Kconfig                    |  23 +
 drivers/soc/qcom/Makefile                   |   2 +
 drivers/soc/qcom/qcom_minidump.c            | 690 ++++++++++++++++++++
 drivers/soc/qcom/qcom_minidump_internal.h   |  74 +++
 drivers/soc/qcom/qcom_rproc_minidump.c      | 111 ++++
 drivers/soc/qcom/smem.c                     |  20 +
 fs/pstore/Kconfig                           |  15 +
 fs/pstore/ram.c                             | 180 ++++-
 include/linux/init.h                        |   3 +
 include/linux/pstore_ram.h                  |  20 +
 include/linux/soc/qcom/smem.h               |   2 +
 include/soc/qcom/qcom_minidump.h            |  41 ++
 init/main.c                                 |   3 +
 init/version-timestamp.c                    |   3 +
 21 files changed, 1538 insertions(+), 165 deletions(-)
 create mode 100644 Documentation/admin-guide/qcom_minidump.rst
 create mode 100644 drivers/soc/qcom/qcom_minidump.c
 create mode 100644 drivers/soc/qcom/qcom_minidump_internal.h
 create mode 100644 drivers/soc/qcom/qcom_rproc_minidump.c
 create mode 100644 include/soc/qcom/qcom_minidump.h


base-commit: 41d66f96d0f15a0a2ad6fa2208f6bac1a66cbd52

Comments

Mukesh Ojha March 4, 2024, 4:45 p.m. UTC | #1
I would really appreciate if i get review on this series..
Thank you..

-Mukesh

On 1/31/2024 4:38 PM, Mukesh Ojha wrote:
> Abstract and PDF here:
> https://lpc.events/event/17/contributions/1468/
> 
> Video:
> https://www.youtube.com/watch?v=3vL3gtAu84s
> 
> Patch 1 deals in detail documentation on minidump.
> Patch 2-4 refactors minidump existing layout and separate it from remoteproc files.
> Patch 6 is the Qualcomm APSS minidump driver.
> Patch 7-10 Enable support to reserve dynamic ramoops and the support to
>    register ramoops region with minidump.
> 
> Detail about Minidump is discussed in documentation patch (1/10) and also briefly
> discussed after below changelog.
> 
> Changes in v8:
>   - Addressed documentation comment made by Randy Dunlap.
>   - Rebased on linux-next tag next-20240130
> 
> Changes in v7:
>   - Addressed comment made by [Pavan.K] to use generic notifiers.
>   - Addresses comment made on Dynamic ramoops about error handling.
>   - Significant change minidump documentation suggested by [Bryan O'Donoghue]
>   - Added Reviewed by from [Bagas]
>   - Renamed ramoops notifiers.
> 
> Changes in v6: https://lore.kernel.org/lkml/1700864395-1479-1-git-send-email-quic_mojha@quicinc.com/
>   - Accumalated the feedback received on v5 and rebase v5 versions in v6.
>   - Removed the exported function as there is no current users of them.
>   - Applied [Pavan.K] suggestion on caller/callee placement of dynamic ramoops reserve memory.
>   - Addressed [krzysztof] comment on sizeof() and to have qcom_apss_md_table_exit().
>   - Addressed [Bagas.S] comment on minidump doc.
>   - Tried to implement [Kees] suggestion in slight different way with callback registration
>     with ramoops instead of pstore core.
> 
> Change in rebase v5: https://lore.kernel.org/lkml/1694429639-21484-1-git-send-email-quic_mojha@quicinc.com/
>   - Rebased it on latest tag available on linux-next
>   - Added missed Poovendhan sign-off on 15/17 and tested-by tag from
>     Kathiravan. Thanks to him for testing and reminding me of missing sign-off.
> 
> Changes in v5: https://lore.kernel.org/lkml/1694290578-17733-1-git-send-email-quic_mojha@quicinc.com/
>   - On suggestion from Pavan.k, to have single function call for minidump collection
>     from remoteproc driver, separated the logic to have separate minidump file called
>     qcom_rproc_minidump.c and also renamed the function from qcom_minidump() to
>     qcom_rproc_minidump(); however, dropped his suggestion about rework on lazy deletion
>     during region unregister in this series, will pursue it in next series.
> 
>   - To simplify the minidump driver, removed the complication for frontend and different
>     backend from Greg suggestion, will pursue this once main driver gets mainlined.
> 
>   - Move the dynamic ramoops region allocation from Device tree approach to command line
>     approch with the introduction command line parsing and memblock reservation during
>     early boot up; Not added documentation about it yet, will add if it gets positive
>     response.
> 
>   - Exporting linux banner from kernel to make minidump build also as module, however,
>     minidump is a debug module and should be kernel built to get most debug information
>     from kernel.
> 
>   - Tried to address comments given on dload patch series.
> 
> Changes in v4: https://lore.kernel.org/lkml/1687955688-20809-1-git-send-email-quic_mojha@quicinc.com/
>   - Redesigned the driver and divided the driver into front end and backend (smem) so
>     that any new backend can be attached easily to avoid code duplication.
>   - Patch reordering as per the driver and subsystem to easier review of the code.
>   - Removed minidump specific code from remoteproc to minidump smem based driver.
>   - Enabled the all the driver as modules.
>   - Address comments made on documentation and yaml and Device tree file [Krzysztof/Konrad]
>   - Address comments made qcom_pstore_minidump driver and given its Device tree
>     same set of properties as ramoops. [Luca/Kees]
>   - Added patch for MAINTAINER file.
>   - Include defconfig change as one patch as per [Krzysztof] suggestion.
>   - Tried to remove the redundant file scope variables from the module as per [Krzysztof] suggestion.
>   - Addressed comments made on dload mode patch v6 version
>     https://lore.kernel.org/lkml/1680076012-10785-1-git-send-email-quic_mojha@quicinc.com/
> 
> Changes in v3: https://lore.kernel.org/lkml/1683133352-10046-1-git-send-email-quic_mojha@quicinc.com/
>   - Addressed most of the comments by Srini on v2 and refactored the minidump driver.
>      - Added platform device support
>      - Unregister region support.
>   - Added update region for clients.
>   - Added pending region support.
>   - Modified the documentation guide accordingly.
>   - Added qcom_pstore_ramdump client driver which happen to add ramoops platform
>     device and also registers ramoops region with minidump.
>   - Added download mode patch series with this minidump series.
>      https://lore.kernel.org/lkml/1680076012-10785-1-git-send-email-quic_mojha@quicinc.com/
> 
> Changes in v2: https://lore.kernel.org/lkml/1679491817-2498-1-git-send-email-quic_mojha@quicinc.com/
>   - Addressed review comment made by [quic_tsoni/bmasney] to add documentation.
>   - Addressed comments made by [srinivas.kandagatla]
>   - Dropped pstore 6/6 from the last series, till i get conclusion to get pstore
>     region in minidump.
>   - Fixed issue reported by kernel test robot.
> 
> Changes in v1: https://lore.kernel.org/lkml/1676978713-7394-1-git-send-email-quic_mojha@quicinc.com/
> 
> Minidump is a best effort mechanism to collect useful and predefined data
> for first level of debugging on end user devices running on Qualcomm SoCs.
> It is built on the premise that System on Chip (SoC) or subsystem part of
> SoC crashes, due to a range of hardware and software bugs.
> 
> Qualcomm devices in engineering mode provides a mechanism for generating
> full system ramdumps for post mortem debugging. But in some cases it's
> however not feasible to capture the entire content of RAM. The minidump
> mechanism provides the means for selecting which snippets should be
> included in the ramdump.
> 
> The core of SMEM based minidump feature is part of Qualcomm's boot
> firmware code. It initializes shared memory (SMEM), which is a part of
> DDR and allocates a small section of SMEM to minidump table i.e also
> called global table of content (G-ToC). Each subsystem (APSS, ADSP, ...)
> has their own table of segments to be included in the minidump and all
> get their reference from G-ToC. Each segment/region has some details
> like name, physical address and it's size etc. and it could be anywhere
> scattered in the DDR.
> 
> Existing upstream Qualcomm remoteproc driver[1] already supports SMEM
> based minidump feature for remoteproc instances like ADSP, MODEM, ...
> where predefined selective segments of subsystem region can be dumped
> as part of coredump collection which generates smaller size artifacts
> compared to complete coredump of subsystem on crash.
> 
> [1]
> https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/drivers/remoteproc/qcom_common.c#n142
> 
> In addition to managing and querying the APSS minidump description,
> the Linux driver maintains a ELF header in a segment. This segment
> gets updated with section/program header whenever a new entry gets
> registered.
> 
> Support for Minidump enablement on Qualcomm SoCs is pursued separately and
> can be done via below series of changes. For testing, these patches can be
> applied
> 
> https://lore.kernel.org/lkml/1704727654-13999-1-git-send-email-quic_mojha@quicinc.com/
> 
> https://lore.kernel.org/lkml/20240109153200.12848-12-quic_mojha@quicinc.com/
> https://lore.kernel.org/lkml/20240109153200.12848-13-quic_mojha@quicinc.com/
> 
> Testing of these patches has been done on sm8450 target after enabling kernel
> config like CONFIG_PSTORE_RAM/CONFIG_PSTORE_CONSOLE and once the device boots
> up. Below command can be executed from sysfs to enable minidump in the firmware.
> 
>   echo mini > /sys/module/qcom_scm/parameters/download_mode
> 
> Try crashing it via devmem2 0xf11c000(this is known command to create xpu violation
> and put the device crash dump mode) on command prompt.
> 
> Default storage type is set to via USB, so Minidump would be downloaded with the
> help of x86_64 machine (running PCAT tool) attached to Qualcomm product which has
> backed Minidump boot firmware support.
> 
> After that we will see a bunch of predefined registered region as binary blobs files
> starts with md_* downloaded on the x86 machine at configured/default location in PCAT
> tool from the product, more about this can be found in qualcomm minidump guide
> patch.
> 
> Mukesh Ojha (10):
>    docs: qcom: Add qualcomm minidump guide
>    soc: qcom: Add qcom_rproc_minidump module
>    remoteproc: qcom_q6v5_pas: Use qcom_rproc_minidump()
>    remoteproc: qcom: Remove minidump related data from qcom_common.c
>    init: export linux_banner data variable
>    soc: qcom: Add Qualcomm APSS minidump kernel driver
>    MAINTAINERS: Add entry for minidump related files
>    pstore/ram: Add dynamic ramoops region support through commandline
>    pstore/ram: Add ramoops information notifier support
>    soc: qcom: register ramoops region with APSS minidump
> 
>   Documentation/admin-guide/index.rst         |   1 +
>   Documentation/admin-guide/qcom_minidump.rst | 318 +++++++++
>   Documentation/admin-guide/ramoops.rst       |  23 +-
>   MAINTAINERS                                 |  10 +
>   drivers/remoteproc/Kconfig                  |   1 +
>   drivers/remoteproc/qcom_common.c            | 160 -----
>   drivers/remoteproc/qcom_q6v5_pas.c          |   3 +-
>   drivers/soc/qcom/Kconfig                    |  23 +
>   drivers/soc/qcom/Makefile                   |   2 +
>   drivers/soc/qcom/qcom_minidump.c            | 690 ++++++++++++++++++++
>   drivers/soc/qcom/qcom_minidump_internal.h   |  74 +++
>   drivers/soc/qcom/qcom_rproc_minidump.c      | 111 ++++
>   drivers/soc/qcom/smem.c                     |  20 +
>   fs/pstore/Kconfig                           |  15 +
>   fs/pstore/ram.c                             | 180 ++++-
>   include/linux/init.h                        |   3 +
>   include/linux/pstore_ram.h                  |  20 +
>   include/linux/soc/qcom/smem.h               |   2 +
>   include/soc/qcom/qcom_minidump.h            |  41 ++
>   init/main.c                                 |   3 +
>   init/version-timestamp.c                    |   3 +
>   21 files changed, 1538 insertions(+), 165 deletions(-)
>   create mode 100644 Documentation/admin-guide/qcom_minidump.rst
>   create mode 100644 drivers/soc/qcom/qcom_minidump.c
>   create mode 100644 drivers/soc/qcom/qcom_minidump_internal.h
>   create mode 100644 drivers/soc/qcom/qcom_rproc_minidump.c
>   create mode 100644 include/soc/qcom/qcom_minidump.h
> 
> 
> base-commit: 41d66f96d0f15a0a2ad6fa2208f6bac1a66cbd52