[GIT,PULL] Qualcomm driver updates for v6.3

Message ID 20230126163008.3676950-1-andersson@kernel.org (mailing list archive)
State Changes Requested

Pull-request

https://git.kernel.org/pub/scm/linux/kernel/git/qcom/linux.git tags/qcom-drivers-for-6.3

Message

Bjorn Andersson Jan. 26, 2023, 4:30 p.m. UTC
The following changes since commit 6049aae52392539e505bfb8ccbcff3c26f1d2f0b:

  PM: AVS: qcom-cpr: Fix an error handling path in cpr_probe() (2023-01-10 09:48:13 -0600)

are available in the Git repository at:

  https://git.kernel.org/pub/scm/linux/kernel/git/qcom/linux.git tags/qcom-drivers-for-6.3

for you to fetch changes up to 6bf32599223634294cdc6efb359ffaab1d68073c:

  firmware: qcom: scm: Add wait-queue handling logic (2023-01-18 21:14:40 -0600)

----------------------------------------------------------------
Qualcomm driver updates for v6.3

This introduces a new driver for the Data Capture and Compare block,
which provides a mechanism for capturing hardware state (by reading
MMIO registers) either upon request or triggered automatically, e.g.
upon a watchdog bite, for post mortem analysis.

The remote filesystem memory share driver gains support for having its
memory bound to more than a single VMID.

The SCM driver gains the minimal support needed to support a new
mechanism where secure world can put calls on hold and later request
them to be retried.

Support for the new SA8775P platform is added to rpmhpd, QDU1000 is
added to the SCM driver and a long list of platforms are added to the
socinfo driver. Support for socinfo data revision 16 is also introduced.

Lastly a driver to program the ramp controller in MSM8976 is introduced.

----------------------------------------------------------------
AngeloGioacchino Del Regno (2):
      dt-bindings: soc: qcom: Add bindings for Qualcomm Ramp Controller
      soc: qcom: Add Qualcomm Ramp Controller driver

Bagas Sanjaya (2):
      soc: qcom: dcc: Fix examples list on /sys/kernel/debug/dcc/.../[list-number]/config documentation
      soc: qcom: dcc: rewrite description of dcc sysfs files

Bartosz Golaszewski (2):
      dt-bindings: power: qcom,rpmpd: document sa8775p
      soc: qcom: rmphpd: add power domains for sa8775p

Bjorn Andersson (4):
      soc: qcom: ramp_controller: Include linux/bitfield.h
      soc: qcom: ramp_controller: Make things static
      Merge branch '20230109130523.298971-3-konrad.dybcio@linaro.org' into drivers-for-6.3
      Merge tag 'qcom-driver-fixes-for-6.2' into drivers-for-6.3

Bryan O'Donoghue (1):
      dt-bindings: soc: qcom: smd-rpm: Exclude MSM8936 from glink-channels

Dawei Li (1):
      soc: qcom: apr: make remove callback of apr driver void returned

Guru Das Srinagesh (2):
      dt-bindings: firmware: qcom,scm: Add optional interrupt
      firmware: qcom: scm: Add wait-queue handling logic

Konrad Dybcio (4):
      dt-bindings: reserved-memory: rmtfs: Make qcom,vmid an array
      dt-bindings: firmware: qcom: scm: Separate VMIDs from header to bindings
      Revert "soc: qcom: rpmpd: Add SM4250 support"
      Revert "dt-bindings: power: rpmpd: Add SM4250 support"

Krzysztof Kozlowski (2):
      dt-bindings: firmware: qcom,scm: document MSM8226 clocks
      dt-bindings: firmware: qcom,scm: narrow clocks and interconnects

Loic Poulain (1):
      soc: qcom: rmtfs: Optionally map RMTFS to more VMs

Melody Olvera (1):
      dt-bindings: firmware: scm: Add QDU1000/QRU1000 compatible

Naman Jain (1):
      soc: qcom: socinfo: Add support for new fields in revision 16

Neil Armstrong (1):
      dt-bindings: soc: qcom: convert non-smd RPM bindings to dt-schema

Souradeep Chowdhury (3):
      dt-bindings: soc: qcom,dcc: Add the dtschema
      soc: qcom: dcc: Add driver support for Data Capture and Compare unit(DCC)
      MAINTAINERS: Add the entry for DCC(Data Capture and Compare) driver support

Stephan Gerhold (4):
      soc: qcom: socinfo: Fix soc_id order
      dt-bindings: arm: qcom,ids: Add QRD board ID
      dt-bindings: arm: qcom,ids: Add a bunch of older SoCs
      soc: qcom: socinfo: Add a bunch of older SoCs

Yang Li (1):
      soc: qcom: dcc: Fix unsigned comparison with less than zero

 Documentation/ABI/testing/debugfs-driver-dcc       |  127 ++
 .../devicetree/bindings/firmware/qcom,scm.yaml     |   64 +-
 Documentation/devicetree/bindings/mfd/qcom-rpm.txt |  283 -----
 .../devicetree/bindings/power/qcom,rpmpd.yaml      |    2 +-
 .../bindings/reserved-memory/qcom,rmtfs-mem.yaml   |    6 +-
 .../devicetree/bindings/soc/qcom/qcom,dcc.yaml     |   44 +
 .../soc/qcom/qcom,msm8976-ramp-controller.yaml     |   36 +
 .../devicetree/bindings/soc/qcom/qcom,rpm.yaml     |  101 ++
 .../devicetree/bindings/soc/qcom/qcom,smd-rpm.yaml |    1 +
 MAINTAINERS                                        |    8 +
 drivers/firmware/qcom_scm-smc.c                    |   86 +-
 drivers/firmware/qcom_scm.c                        |   90 +-
 drivers/firmware/qcom_scm.h                        |    8 +
 drivers/soc/qcom/Kconfig                           |   17 +
 drivers/soc/qcom/Makefile                          |    2 +
 drivers/soc/qcom/dcc.c                             | 1300 ++++++++++++++++++++
 drivers/soc/qcom/ramp_controller.c                 |  343 ++++++
 drivers/soc/qcom/rmtfs_mem.c                       |   29 +-
 drivers/soc/qcom/rpmhpd.c                          |   34 +
 drivers/soc/qcom/rpmpd.c                           |   18 -
 drivers/soc/qcom/socinfo.c                         |   96 +-
 include/dt-bindings/arm/qcom,ids.h                 |   75 ++
 include/dt-bindings/firmware/qcom,scm.h            |   16 +
 include/dt-bindings/power/qcom-rpmpd.h             |   29 +-
 include/linux/qcom_scm.h                           |    6 +-
 include/linux/soc/qcom/apr.h                       |    2 +-
 sound/soc/qcom/qdsp6/q6core.c                      |    4 +-
 27 files changed, 2480 insertions(+), 347 deletions(-)
 create mode 100644 Documentation/ABI/testing/debugfs-driver-dcc
 delete mode 100644 Documentation/devicetree/bindings/mfd/qcom-rpm.txt
 create mode 100644 Documentation/devicetree/bindings/soc/qcom/qcom,dcc.yaml
 create mode 100644 Documentation/devicetree/bindings/soc/qcom/qcom,msm8976-ramp-controller.yaml
 create mode 100644 Documentation/devicetree/bindings/soc/qcom/qcom,rpm.yaml
 create mode 100644 drivers/soc/qcom/dcc.c
 create mode 100644 drivers/soc/qcom/ramp_controller.c
 create mode 100644 include/dt-bindings/firmware/qcom,scm.h

Comments

Arnd Bergmann Jan. 30, 2023, 3:18 p.m. UTC | #1
On Thu, Jan 26, 2023, at 17:30, Bjorn Andersson wrote:
> The following changes since commit 6049aae52392539e505bfb8ccbcff3c26f1d2f0b:
>
> ----------------------------------------------------------------
> Qualcomm driver updates for v6.3
>
> This introduces a new driver for the Data Capture and Compare block,
> which provides a mechanism for capturing hardware state (by reading
> MMIO registers) either upon request or triggered automatically, e.g.
> upon a watchdog bite, for post mortem analysis.
>
> The remote filesystem memory share driver gains support for having its
> memory bound to more than a single VMID.
>
> The SCM driver gains the minimal support needed to support a new
> mechanism where secure world can put calls on hold and later request
> them to be retried.
>
> Support for the new SA8775P platform is added to rpmhpd, QDU1000 is
> added to the SCM driver and a long list of platforms are added to the
> socinfo driver. Support for socinfo data revision 16 is also introduced.
>
> Lastly a driver to program the ramp controller in MSM8976 is introduced.

Hi Bjorn,

I don't feel comfortable merging the DCC driver through drivers/soc/
at this point: This is the first time I see the driver and it introduces
a complex user space ABI that I have no time to review as part of the
merge process.

I usually try to avoid adding any custom user space interfaces
in drivers/soc, as these tend to be things that end up being
similar to those on other chips and need a generic interface.

In particular I don't see an explanation about how the new interface
relates to the established drivers/hwtracing/ subsystem and why it
shouldn't be part of that (adding the hwtracing and coresight
maintainers to Cc in case they have looked at this already).

Can you send an updated pull request that leaves out the
DCC driver until we have clarified these points?

      Arnd
Bjorn Andersson Jan. 30, 2023, 10:24 p.m. UTC | #2
On Mon, Jan 30, 2023 at 04:18:45PM +0100, Arnd Bergmann wrote:
> On Thu, Jan 26, 2023, at 17:30, Bjorn Andersson wrote:
> > The following changes since commit 6049aae52392539e505bfb8ccbcff3c26f1d2f0b:
> >
> > ----------------------------------------------------------------
> > Qualcomm driver updates for v6.3
> >
> > This introduces a new driver for the Data Capture and Compare block,
> > which provides a mechanism for capturing hardware state (by reading
> > MMIO registers) either upon request or triggered automatically, e.g.
> > upon a watchdog bite, for post mortem analysis.
> >
> > The remote filesystem memory share driver gains support for having its
> > memory bound to more than a single VMID.
> >
> > The SCM driver gains the minimal support needed to support a new
> > mechanism where secure world can put calls on hold and later request
> > them to be retried.
> >
> > Support for the new SA8775P platform is added to rpmhpd, QDU1000 is
> > added to the SCM driver and a long list of platforms are added to the
> > socinfo driver. Support for socinfo data revision 16 is also introduced.
> >
> > Lastly a driver to program the ramp controller in MSM8976 is introduced.
> 
> Hi Bjorn,
> 
> I don't feel comfortable merging the DCC driver through drivers/soc/
> at this point: This is the first time I see the driver and it introduces
> a complex user space ABI that I have no time to review as part of the
> merge process.
> 

The DCC driver has gone through 22 versions over the last 23 months,
but now that I look back I do agree that the recipients list has been
too limited.

Furthermore, due to the complexity of the ABI I steered this towards
debugfs, with the explicit mention that we will change the interface
if needed - in particular since not a lot of review interest has
been shown...

> I usually try to avoid adding any custom user space interfaces
> in drivers/soc, as these tend to be things that end up being
> similar to those on other chips and need a generic interface.
> 

I have no concern with that, but I'm not able to suggest an existing
subsystem where this would fit.

> In particular I don't see an explanation about how the new interface
> relates to the established drivers/hwtracing/ subsystem and why it
> shouldn't be part of that (adding the hwtracing and coresight
> maintainers to Cc in case they have looked at this already).
> 

To my knowledge the hwtracing framework is an interface for
enabling/disabling traces and then you get a stream of trace data out of
it.

With DCC you essentially write a small "program" to be run at the time
of an exception (or triggered manually). When the "program" is run it
acquires data from MMIO interfaces and stores it in SRAM, which can
then be retrieved - possibly after the fatal reset of the system.

Perhaps I've misunderstood the hwtracing framework; please help me steer
Souradeep towards a subsystem you find suitable for this functionality.

> Can you send an updated pull request that leaves out the
> DCC driver until we have clarified these points?
> 

I will send a new pull request, with the driver addition reverted. I
don't think there's anything controversial with the DT binding, so let's
keep that and the dts nodes (we can move the yaml if a better home is
found...).

Regards,
Bjorn
Arnd Bergmann Feb. 15, 2023, 3:05 p.m. UTC | #3
On Mon, Jan 30, 2023, at 23:24, Bjorn Andersson wrote:
> On Mon, Jan 30, 2023 at 04:18:45PM +0100, Arnd Bergmann wrote:
>> On Thu, Jan 26, 2023, at 17:30, Bjorn Andersson wrote:
>> 
>> I don't feel comfortable merging the DCC driver through drivers/soc/
>> at this point: This is the first time I see the driver and it introduces
>> a complex user space ABI that I have no time to review as part of the
>> merge process.
>> 
>
> The DCC driver has gone through 22 versions over the last 23 months,
> but now that I look back I do agree that the recipients list has been
> too limited.
>
> Furthermore, due to the complexity of the ABI I steered this towards
> debugfs, with the explicit mention that we will change the interface
> if needed - in particular since not a lot of review interest has
> been shown...

I'm sorry to hear this has already taken so long; I understand it's
frustrating to come up with a good userspace interface for any of
this.

>> I usually try to avoid adding any custom user space interfaces
>> in drivers/soc, as these tend to be things that end up being
>> similar to those on other chips and need a generic interface.
>> 
>
> I have no concern with that, but I'm not able to suggest an existing
> subsystem where this would fit.
>
>> In particular I don't see an explanation about how the new interface
>> relates to the established drivers/hwtracing/ subsystem and why it
>> shouldn't be part of that (adding the hwtracing and coresight
>> maintainers to Cc in case they have looked at this already).
>> 
>
> To my knowledge the hwtracing framework is an interface for
> enabling/disabling traces and then you get a stream of trace data out of
> it.
>
> With DCC you essentially write a small "program" to be run at the time
> of an exception (or triggered manually). When the "program" is run it
> acquires data from MMIO interfaces and stores it in SRAM, which can
> then be retrieved - possibly after the fatal reset of the system.
>
> Perhaps I've misunderstood the hwtracing framework; please help me steer
> Souradeep towards a subsystem you find suitable for this functionality.

I'm also not too familiar with tracing infrastructure and was hoping
that the coresight maintainers (Mathieu, Suzuki, Mike and Leo)
would have some suggestions here. My initial guess was that in
both cases, you have hardware support that is abstracted by the
kernel in order to have a user interface that can be consumed
by the 'perf' tool. I probably misinterpreted the part about the
crash based trigger here, as my original (brief) reading was that
the data snapshot could be triggered by any kind of event in
the machine, which would make this useful as a more general
way of tracing the state of devices at runtime. Can you describe
how the crash trigger works, and if this would be usable with
other random hardware events aside from an explicit software
event?

I've added the perf maintainers to Cc as well now, for reference,
the now reverted commit is at
https://git.kernel.org/pub/scm/linux/kernel/git/qcom/linux.git/commit/?h=drivers-for-6.3&id=4cbe60cf5ad62
and it contains both the implementation and the documentation
of the debugfs interface.

One bit I don't see is the user space side. Is there a patch for
perf as well, or is the idea to use a custom tool for this? How
does userspace know which MMIO addresses are even valid here?

If the possible use is purely for saving some state across
a reboot, as opposed to other events, I wonder if there is
a good way to integrate it into the fs/pstore/ code, which
already has a way to multiplex various kinds of input (log
buffer, ftrace call chain, userspace strings, ...) into
various kinds of persistent buffers (sram, blockdev, mtd,
efivars, ...) with the purpose of helping analyze the
state after a reboot. 
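
As a concrete illustration of the consumer side (assuming the ramoops
backend here; record names vary by backend), the multiplexed records
simply show up as files after the reboot, e.g. the dmesg record holds
the pre-reboot oops/panic log:

  # ls /sys/fs/pstore
  console-ramoops-0  dmesg-ramoops-0  ftrace-ramoops-0  pmsg-ramoops-0
  # cat /sys/fs/pstore/dmesg-ramoops-0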

>> Can you send an updated pull request that leaves out the
>> DCC driver until we have clarified these points?
>> 
>
> I will send a new pull request, with the driver addition reverted. I
> don't think there's anything controversial with the DT binding, so let's
> keep that and the dts nodes (we can move the yaml if a better home is
> found...)

Right, this is fine. I merged the first pull request after I saw the
revert in the second drivers branch, though I did not see a pull request
from you that replaced the first one with just the revert added as
I had expected. Also, I see that patchwork never noticed me merging
the PR, so you did not get the automated email. Maybe you can double
check the contents of the soc/drivers branch to see if the contents
are what you expect them to be.

       Arnd
Leo Yan Feb. 22, 2023, 1:43 a.m. UTC | #4
On Wed, Feb 15, 2023 at 04:05:36PM +0100, Arnd Bergmann wrote:

[...]

> > To my knowledge the hwtracing framework is an interface for
> > enabling/disabling traces and then you get a stream of trace data out of
> > it.
> >
> > With DCC you essentially write a small "program" to be run at the time
> > of an exception (or triggered manually). When the "program" is run it
> > acquires data from MMIO interfaces and stores it in SRAM, which can
> > then be retrieved - possibly after the fatal reset of the system.
> >
> > Perhaps I've misunderstood the hwtracing framework; please help me steer
> > Souradeep towards a subsystem you find suitable for this functionality.
> 
> I'm also not too familiar with tracing infrastructure and was hoping
> that the coresight maintainers (Mathieu, Suzuki, Mike and Leo)
> would have some suggestions here. My initial guess was that in
> both cases, you have hardware support that is abstracted by the
> kernel in order to have a user interface that can be consumed
> by the 'perf' tool.

My understanding is that hwtracing provides a common framework for STM
so that different tracing IPs (like Intel_th and Arm CoreSight) can
register their STM modules into this framework.  The framework code is
placed in: linux/drivers/hwtracing/stm.

Currently the kernel doesn't provide a general framework for all
hardware tracing IPs; e.g. Arm CoreSight has its own framework to
manage tracing components and create links with sinks.

Simply put, we can place the DCC driver in the linux/drivers/hwtracing
folder (like Hisilicon's PTT driver), but we have no common framework
for it to use.

Based on reading the DCC driver, it seems to me it's more like a bus
tracing module than an uncore PMU.  I found the driver does not
support interrupts; I am not sure if this is a hardware limitation or
the software just doesn't implement interrupt handling. Without
interrupts, it would be difficult to use DCC for profiling.

If we register DCC into the perf framework, the good thing is that DCC
can use the perf framework (e.g. perf's configs) as its user space
interface, but it's still not clear to me how to capture the DCC trace
data (no interrupts, and it is not relevant to process task switching).

[...]

> If the possible use is purely for saving some state across
> a reboot, as opposed to other events, I wonder if there is
> a good way to integrate it into the fs/pstore/ code, which
> already has a way to multiplex various kinds of input (log
> buffer, ftrace call chain, userspace strings, ...) into
> various kinds of persistent buffers (sram, blockdev, mtd,
> efivars, ...) with the purpose of helping analyze the
> state after a reboot. 

Good point!

I understand pstore/ramoops is somehow like a sink which routes the
tracing data (software tracing data but not hardware tracing data) to
persistent memory.  This is why we can also route this software
tracing data to STM (a hardware sink!).

Seems to me, Arnd suggests connecting two sinks, DCC and
pstore (to persistent memory).  But I cannot give example code in the
kernel for doing it this way, sorry if I missed something.

Essentially, a good use case is to keep persistent memory for the
tracing data, then after a reboot cycle we can retrieve the tracing
data via a user space interface (like a sysfs node).

Thanks,
Leo
Souradeep Chowdhury Feb. 22, 2023, 11:16 a.m. UTC | #5
On 2/22/2023 7:13 AM, Leo Yan wrote:
> On Wed, Feb 15, 2023 at 04:05:36PM +0100, Arnd Bergmann wrote:
> 
> [...]
> 
>>> To my knowledge the hwtracing framework is an interface for
>>> enabling/disabling traces and then you get a stream of trace data out of
>>> it.
>>>
>>> With DCC you essentially write a small "program" to be run at the time
>>> of an exception (or triggered manually). When the "program" is run it
>>> acquires data from MMIO interfaces and stores it in SRAM, which can
>>> then be retrieved - possibly after the fatal reset of the system.
>>>
>>> Perhaps I've misunderstood the hwtracing framework; please help me steer
>>> Souradeep towards a subsystem you find suitable for this functionality.
>>
>> I'm also not too familiar with tracing infrastructure and was hoping
>> that the coresight maintainers (Mathieu, Suzuki, Mike and Leo)
>> would have some suggestions here. My initial guess was that in
>> both cases, you have hardware support that is abstracted by the
>> kernel in order to have a user interface that can be consumed
>> by the 'perf' tool.
> 
> My understanding is that hwtracing provides a common framework for STM
> so that different tracing IPs (like Intel_th and Arm CoreSight) can
> register their STM modules into this framework.  The framework code is
> placed in: linux/drivers/hwtracing/stm.
>
> Currently the kernel doesn't provide a general framework for all
> hardware tracing IPs; e.g. Arm CoreSight has its own framework to
> manage tracing components and create links with sinks.
>
> Simply put, we can place the DCC driver in the linux/drivers/hwtracing
> folder (like Hisilicon's PTT driver), but we have no common framework
> for it to use.
>
> Based on reading the DCC driver, it seems to me it's more like a bus
> tracing module than an uncore PMU.  I found the driver does not
> support interrupts; I am not sure if this is a hardware limitation or
> the software just doesn't implement interrupt handling. Without
> interrupts, it would be difficult to use DCC for profiling.
>
> If we register DCC into the perf framework, the good thing is that DCC
> can use the perf framework (e.g. perf's configs) as its user space
> interface, but it's still not clear to me how to capture the DCC trace
> data (no interrupts, and it is not relevant to process task switching).
> 
> [...]
> 
>> If the possible use is purely for saving some state across
>> a reboot, as opposed to other events, I wonder if there is
>> a good way to integrate it into the fs/pstore/ code, which
>> already has a way to multiplex various kinds of input (log
>> buffer, ftrace call chain, userspace strings, ...) into
>> various kinds of persistent buffers (sram, blockdev, mtd,
>> efivars, ...) with the purpose of helping analyze the
>> state after a reboot.
> 
> Good point!
> 
> I understand pstore/ramoops is somehow like a sink which routes the
> tracing data (software tracing data but not hardware tracing data) to
> persistent memory.  This is why we can also route this software
> tracing data to STM (a hardware sink!).
>
> Seems to me, Arnd suggests connecting two sinks, DCC and
> pstore (to persistent memory).  But I cannot give example code in the
> kernel for doing it this way, sorry if I missed something.
>
> Essentially, a good use case is to keep persistent memory for the
> tracing data, then after a reboot cycle we can retrieve the tracing
> data via a user space interface (like a sysfs node).

Hi Leo/Arnd,

Just wanted to let you know that the justification for not using pstore
was already given in version 1 of this patch series, as below

https://lore.kernel.org/linux-arm-msm/ab30490c016f906fd9bc5d789198530b@codeaurora.org/#r

Pstore/ramoops only persists across warm reboots, which are supported on
Chrome devices but not on Android ones. Also, the dcc_sram contents can
be collected by issuing a software trigger after loading the kernel, and
the dcc_sram is parsed to get the register values with the open-source
parser below

https://source.codeaurora.org/quic/la/platform/vendor/qcom-opensource/tools/tree/dcc_parser

Pstore data, on the other hand, can only be collected on the next reboot.

Thanks,
Souradeep

> 
> Thanks,
> Leo
Leo Yan Feb. 22, 2023, 12:13 p.m. UTC | #6
Hi Souradeep,

On Wed, Feb 22, 2023 at 04:46:07PM +0530, Souradeep Chowdhury wrote:
> On 2/22/2023 7:13 AM, Leo Yan wrote:
> > On Wed, Feb 15, 2023 at 04:05:36PM +0100, Arnd Bergmann wrote:

[...]

> > > If the possible use is purely for saving some state across
> > > a reboot, as opposed to other events, I wonder if there is
> > > a good way to integrate it into the fs/pstore/ code, which
> > > already has a way to multiplex various kinds of input (log
> > > buffer, ftrace call chain, userspace strings, ...) into
> > > various kinds of persistent buffers (sram, blockdev, mtd,
> > > efivars, ...) with the purpose of helping analyze the
> > > state after a reboot.
> > 
> > Good point!
> > 
> > I understand pstore/ramoops is somehow like a sink which routes the
> > tracing data (software tracing data but not hardware tracing data) to
> > persistent memory.  This is why we can also route this software
> > tracing data to STM (a hardware sink!).
> >
> > Seems to me, Arnd suggests connecting two sinks, DCC and
> > pstore (to persistent memory).  But I cannot give example code in the
> > kernel for doing it this way, sorry if I missed something.
> >
> > Essentially, a good use case is to keep persistent memory for the
> > tracing data, then after a reboot cycle we can retrieve the tracing
> > data via a user space interface (like a sysfs node).
> 
> Hi Leo/Arnd,
> 
> Just wanted to let you know that the justification for not using pstore
> was already given in version 1 of this patch series, as below
>
> https://lore.kernel.org/linux-arm-msm/ab30490c016f906fd9bc5d789198530b@codeaurora.org/#r
>
> Pstore/ramoops only persists across warm reboots, which are supported
> on Chrome devices but not on Android ones.

Thanks for the info.  Just a reminder about a subtle difference between
reboots.

Besides warm reboot, the kernel can reboot the system after a panic (see
the kernel command line option `panic`), and a watchdog can reboot the
system as well.

Even though Android doesn't support warm reboot, the system can still
reboot on panic or via a watchdog (in particular after a bus lockup);
pstore/ramoops can support these cases as well.
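
For example, on such a setup the kernel command line can both enable
reboot-on-panic and hand ramoops a memory region to use (the addresses
below are placeholders; the region must survive the reset):

  panic=3 ramoops.mem_address=0x9ff00000 ramoops.mem_size=0x100000 \
      ramoops.record_size=0x4000 ramoops.console_size=0x20000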

> Also, the dcc_sram contents can be collected by issuing a software
> trigger after loading the kernel, and the dcc_sram is parsed to get
> the register values with the open-source parser below
> 
> https://source.codeaurora.org/quic/la/platform/vendor/qcom-opensource/tools/tree/dcc_parser

To be clear, the current driver is fine with me (TBH, I didn't spend
much time reading it, but it looks very neat after a quick pass); I
just shared some info in case it's helpful for the discussion.

Thanks,
Leo
Souradeep Chowdhury Feb. 27, 2023, 12:43 p.m. UTC | #7
On 2/22/2023 5:43 PM, Leo Yan wrote:
> Hi Souradeep,
> 
> On Wed, Feb 22, 2023 at 04:46:07PM +0530, Souradeep Chowdhury wrote:
>> On 2/22/2023 7:13 AM, Leo Yan wrote:
>>> On Wed, Feb 15, 2023 at 04:05:36PM +0100, Arnd Bergmann wrote:
> 
> [...]
> 
>>>> If the possible use is purely for saving some state across
>>>> a reboot, as opposed to other events, I wonder if there is
>>>> a good way to integrate it into the fs/pstore/ code, which
>>>> already has a way to multiplex various kinds of input (log
>>>> buffer, ftrace call chain, userspace strings, ...) into
>>>> various kinds of persistent buffers (sram, blockdev, mtd,
>>>> efivars, ...) with the purpose of helping analyze the
>>>> state after a reboot.
>>>
>>> Good point!
>>>
>>> I understand pstore/ramoops is somehow like a sink which routes the
>>> tracing data (software tracing data but not hardware tracing data) to
>>> persistent memory.  This is why we can also route this software
>>> tracing data to STM (a hardware sink!).
>>>
>>> Seems to me, Arnd suggests connecting two sinks, DCC and
>>> pstore (to persistent memory).  But I cannot give example code in the
>>> kernel for doing it this way, sorry if I missed something.
>>>
>>> Essentially, a good use case is to keep persistent memory for the
>>> tracing data, then after a reboot cycle we can retrieve the tracing
>>> data via a user space interface (like a sysfs node).
>>
>> Hi Leo/Arnd,
>>
>> Just wanted to let you know that the justification for not using pstore
>> was already given in version 1 of this patch series, as below
>>
>> https://lore.kernel.org/linux-arm-msm/ab30490c016f906fd9bc5d789198530b@codeaurora.org/#r
>>
>> Pstore/ramoops only persists across warm reboots, which are supported
>> on Chrome devices but not on Android ones.
> 
> Thanks for the info.  Just a reminder about a subtle difference between
> reboots.
>
> Besides warm reboot, the kernel can reboot the system after a panic (see
> the kernel command line option `panic`), and a watchdog can reboot the
> system as well.
>
> Even though Android doesn't support warm reboot, the system can still
> reboot on panic or via a watchdog (in particular after a bus lockup);
> pstore/ramoops can support these cases as well.


So for the SoCs that don't support warm reboots, the DDR memory is not
persistent across panics or watchdog bites, in which case
pstore/ramoops cannot be of use.


> 
>> Also, the dcc_sram contents can be collected by issuing a software
>> trigger after loading the kernel, and the dcc_sram is parsed to get
>> the register values with the open-source parser below
>>
>> https://source.codeaurora.org/quic/la/platform/vendor/qcom-opensource/tools/tree/dcc_parser
> 
> To be clear, the current driver is fine with me (TBH, I didn't spend
> much time reading it, but it looks very neat after a quick pass); I
> just shared some info in case it's helpful for the discussion.
> 
> Thanks,
> Leo
Trilok Soni March 13, 2023, 4:55 p.m. UTC | #8
On 2/27/2023 4:43 AM, Souradeep Chowdhury wrote:
> 
> 
> On 2/22/2023 5:43 PM, Leo Yan wrote:
>> Hi Souradeep,
>>
>> On Wed, Feb 22, 2023 at 04:46:07PM +0530, Souradeep Chowdhury wrote:
>>> On 2/22/2023 7:13 AM, Leo Yan wrote:
>>>> On Wed, Feb 15, 2023 at 04:05:36PM +0100, Arnd Bergmann wrote:
>>
>> [...]
>>
>>>>> If the possible use is purely for saving some state across
>>>>> a reboot, as opposed to other events, I wonder if there is
>>>>> a good way to integrate it into the fs/pstore/ code, which
>>>>> already has a way to multiplex various kinds of input (log
>>>>> buffer, ftrace call chain, userspace strings, ...) into
>>>>> various kinds of persistent buffers (sram, blockdev, mtd,
>>>>> efivars, ...) with the purpose of helping analyze the
>>>>> state after a reboot.
>>>>
>>>> Good point!
>>>>
>>>> I understand pstore/ramoops is somehow like a sink which routes the
>>>> tracing data (software tracing data but not hardware tracing data) to
>>>> persistent memory.  This is why we can also route this software
>>>> tracing data to STM (a hardware sink!).
>>>>
>>>> Seems to me, Arnd suggests connecting two sinks, DCC and
>>>> pstore (to persistent memory).  But I cannot give example code in the
>>>> kernel for doing it this way, sorry if I missed something.
>>>>
>>>> Essentially, a good use case is to keep persistent memory for the
>>>> tracing data, then after a reboot cycle we can retrieve the tracing
>>>> data via a user space interface (like a sysfs node).
>>>
>>> Hi Leo/Arnd,
>>>
>>> Just wanted to let you know that the justification for not using
>>> pstore was already given in version 1 of this patch series, as below
>>>
>>> https://lore.kernel.org/linux-arm-msm/ab30490c016f906fd9bc5d789198530b@codeaurora.org/#r
>>>
>>> Pstore/ramoops only persists across warm reboots, which are supported
>>> on Chrome devices but not on Android ones.
>>
>> Thanks for the info.  Just a reminder about a subtle difference between
>> reboots.
>>
>> Besides warm reboot, the kernel can reboot the system after a panic (see
>> the kernel command line option `panic`), and a watchdog can reboot the
>> system as well.
>>
>> Even though Android doesn't support warm reboot, the system can still
>> reboot on panic or via a watchdog (in particular after a bus lockup);
>> pstore/ramoops can support these cases as well.
> 
> 
>> So for the SoCs that don't support warm reboots, the DDR memory is not
>> persistent across panics or watchdog bites, in which case
>> pstore/ramoops cannot be of use.
> 
> 
>>
>>> Also, the dcc_sram contents can be collected by issuing a software
>>> trigger after loading the kernel, and the dcc_sram is parsed to get
>>> the register values with the open-source parser below
>>>
>>> https://source.codeaurora.org/quic/la/platform/vendor/qcom-opensource/tools/tree/dcc_parser
>>
>> To be clear, the current driver is fine with me (TBH, I didn't spend
>> much time reading it, but it looks very neat after a quick pass); I
>> just shared some info in case it's helpful for the discussion.


What is the conclusion here? Can we pick up the DCC now if we rebase to
the latest tree? It seems that so far the response here is that the
driver is fine as-is and can be included without any changes. I want
this driver discussion to be concluded, since we have been trying to
submit it for more than 23 months (as Bjorn counted :) ).

---Trilok Soni
Trilok Soni March 23, 2023, 5:41 p.m. UTC | #9
On 3/13/2023 9:55 AM, Trilok Soni wrote:
> On 2/27/2023 4:43 AM, Souradeep Chowdhury wrote:
>>
>>
>> On 2/22/2023 5:43 PM, Leo Yan wrote:
>>> Hi Souradeep,
>>>
>>> On Wed, Feb 22, 2023 at 04:46:07PM +0530, Souradeep Chowdhury wrote:
>>>> On 2/22/2023 7:13 AM, Leo Yan wrote:
>>>>> On Wed, Feb 15, 2023 at 04:05:36PM +0100, Arnd Bergmann wrote:
>>>
>>> [...]
>>>
>>>>>> If the possible use is purely for saving some state across
>>>>>> a reboot, as opposed to other events, I wonder if there is
>>>>>> a good way to integrate it into the fs/pstore/ code, which
>>>>>> already has a way to multiplex various kinds of input (log
>>>>>> buffer, ftrace call chain, userspace strings, ...) into
>>>>>> various kinds of persistent buffers (sram, blockdev, mtd,
>>>>>> efivars, ...) with the purpose of helping analyze the
>>>>>> state after a reboot.
>>>>>
>>>>> Good point!
>>>>>
>>>>> I understand pstore/ramoops is somehow like a sink which routes the
>>>>> tracing data (software tracing data but not hardware tracing data) to
>>>>> persistent memory.  This is why we can also route this software
>>>>> tracing data to STM (a hardware sink!).
>>>>>
>>>>> Seems to me, Arnd suggests connecting two sinks, DCC and
>>>>> pstore (to persistent memory).  But I cannot give example code in the
>>>>> kernel for doing it this way, sorry if I missed something.
>>>>>
>>>>> Essentially, a good use case is to keep persistent memory for the
>>>>> tracing data, then after a reboot cycle we can retrieve the tracing
>>>>> data via a user space interface (like a sysfs node).
>>>>
>>>> Hi Leo/Arnd,
>>>>
>>>> Just wanted to let you know that the justification for not using
>>>> pstore was already given in version 1 of this patch series, as below
>>>>
>>>> https://lore.kernel.org/linux-arm-msm/ab30490c016f906fd9bc5d789198530b@codeaurora.org/#r
>>>>
>>>> Pstore/ramoops only persists across warm reboots, which are supported
>>>> on Chrome devices but not on Android ones.
>>>
>>> Thanks for the info.  Just a reminder about a subtle difference between
>>> reboots.
>>>
>>> Besides warm reboot, the kernel can reboot the system after a panic (see
>>> the kernel command line option `panic`), and a watchdog can reboot the
>>> system as well.
>>>
>>> Even though Android doesn't support warm reboot, the system can still
>>> reboot on panic or via a watchdog (in particular after a bus lockup);
>>> pstore/ramoops can support these cases as well.
>>
>>
>>> So for the SoCs that don't support warm reboots, the DDR memory is not
>>> persistent across panics or watchdog bites, in which case
>>> pstore/ramoops cannot be of use.
>>
>>
>>>
>>>> Also, the dcc_sram contents can be collected by issuing a software
>>>> trigger after loading the kernel, and the dcc_sram is parsed to get
>>>> the register values with the open-source parser below
>>>>
>>>> https://source.codeaurora.org/quic/la/platform/vendor/qcom-opensource/tools/tree/dcc_parser
>>>
>>> To be clear, the current driver is fine with me (TBH, I didn't spend
>>> much time reading it, but it looks very neat after a quick pass); I
>>> just shared some info in case it's helpful for the discussion.
> 
> 
> What is the conclusion here? Can we pick up the DCC now if we rebase to
> the latest tree? It seems that so far the response here is that the
> driver is fine as-is and can be included without any changes. I want
> this driver discussion to be concluded, since we have been trying to
> submit it for more than 23 months (as Bjorn counted :) ).


Bjorn and Arnd, can you please check and provide feedback? I really want
this driver to get merged or the open issues resolved.

---Trilok Soni
Bjorn Andersson March 24, 2023, 5:14 p.m. UTC | #10
On Wed, Feb 22, 2023 at 08:13:59PM +0800, Leo Yan wrote:
> Hi Souradeep,
> 
> On Wed, Feb 22, 2023 at 04:46:07PM +0530, Souradeep Chowdhury wrote:
> > On 2/22/2023 7:13 AM, Leo Yan wrote:
> > > On Wed, Feb 15, 2023 at 04:05:36PM +0100, Arnd Bergmann wrote:
> 
> [...]
> 
> > > > If the possible use is purely for saving some state across
> > > > a reboot, as opposed to other events, I wonder if there is
> > > > a good way to integrate it into the fs/pstore/ code, which
> > > > already has a way to multiplex various kinds of input (log
> > > > buffer, ftrace call chain, userspace strings, ...) into
> > > > various kinds of persistent buffers (sram, blockdev, mtd,
> > > > efivars, ...) with the purpose of helping analyze the
> > > > state after a reboot.
> > > 
> > > Good point!
> > > 
> > > I understand pstore/ramoops is somehow like a sink which routes the
> > > tracing data (software tracing data but not hardware tracing data) to
> > > persistent memory.  This is why we can also route this software
> > > tracing data to STM (a hardware sink!).
> > >
> > > Seems to me, Arnd suggests connecting two sinks, DCC and
> > > pstore (to persistent memory).  But I cannot give example code in the
> > > kernel for doing it this way, sorry if I missed something.
> > >
> > > Essentially, a good use case is to keep persistent memory for the
> > > tracing data, then after a reboot cycle we can retrieve the tracing
> > > data via a user space interface (like a sysfs node).
> > 
> > Hi Leo/Arnd,
> > 
> > Just wanted to let you know that the justification for not using pstore
> > was already given in version 1 of this patch series, as below
> >
> > https://lore.kernel.org/linux-arm-msm/ab30490c016f906fd9bc5d789198530b@codeaurora.org/#r
> >
> > Pstore/ramoops only persists across warm reboots, which are supported
> > on Chrome devices but not on Android ones.
> 
> Thanks for the info.  Just a reminder about a subtle difference between
> reboots.
>
> Besides warm reboot, the kernel can reboot the system after a panic (see
> the kernel command line option `panic`), and a watchdog can reboot the
> system as well.
>
> Even though Android doesn't support warm reboot, the system can still
> reboot on panic or via a watchdog (in particular after a bus lockup);
> pstore/ramoops can support these cases as well.
> 

The fact that DDR content isn't maintained across a typical reboot is
certainly causing issues for using pstore.

But afaiu, with pstore we capture various system state into the pstore
and then make that available after reboot. DCC provides a mechanism for
capturing hardware state (by reading register content) and storing that
in DCC-specific SRAM. This can be triggered either by a HW reset, or on
request from the user.

As such it doesn't seem to be a conceptual fit; we could perhaps present
the data as if it were pstore data, but I don't see how the reports
created at runtime would fit into that model.

> > Also, the dcc_sram contents can be collected by issuing a software
> > trigger after loading the kernel, and the dcc_sram is parsed to get
> > the register values with the open-source parser below
> > 
> > https://source.codeaurora.org/quic/la/platform/vendor/qcom-opensource/tools/tree/dcc_parser
> 
> To be clear, the current driver is fine with me (TBH, I didn't spend
> much time reading it, but it looks very neat after a quick pass); I
> just shared some info in case it's helpful for the discussion.
> 

My take is still that /dev/dcc looks to provide a reasonable ABI, and
the control mechanism was pushed to debugfs for the purpose of enabling
users to play with the interface, and improve it as necessary.
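
For illustration, a capture session through that interface would look
roughly like the sketch below; the paths follow the proposed debugfs
ABI (Documentation/ABI/testing/debugfs-driver-dcc), but the config
syntax shown here is illustrative rather than authoritative:

  cd /sys/kernel/debug/dcc/<instance>
  echo R 0x01740300 4 > 1/config     # queue reads at an example address
  echo 1 > 1/enable                  # arm linked-list 1
  echo 1 > trigger                   # manual software trigger
  cat /dev/dcc > dcc_sram.bin        # dump the SRAM for the dcc_parser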

Regards,
Bjorn
Bjorn Andersson March 24, 2023, 5:26 p.m. UTC | #11
On Wed, Feb 15, 2023 at 04:05:36PM +0100, Arnd Bergmann wrote:
> On Mon, Jan 30, 2023, at 23:24, Bjorn Andersson wrote:
> > On Mon, Jan 30, 2023 at 04:18:45PM +0100, Arnd Bergmann wrote:
> >> On Thu, Jan 26, 2023, at 17:30, Bjorn Andersson wrote:
> >> 
> >> I don't feel comfortable merging the DCC driver through drivers/soc/
> >> at this point: This is the first time I see the driver and it introduces
> >> a complex user space ABI that I have no time to review as part of the
> >> merge process.
> >> 
> >
> > The DCC driver has gone through 22 versions over the last 23 months,
> > but now that I look back I do agree that the recipients list has been
> > too limited.
> >
> > Furthermore, due to the complexity of the ABI I steered this towards
> > debugfs, with the explicit mention that we will change the interface
> > if needed - in particular since not a lot of review interest has
> > been shown...
> 
> I'm sorry to hear this has already taken so long; I understand it's
> frustrating to come up with a good userspace interface for any of
> this.
> 
> >> I usually try to avoid adding any custom user space interfaces
> >> in drivers/soc, as these tend to be things that end up being
> >> similar to those on other chips and need a generic interface.
> >> 
> >
> > I have no concern with that, but I'm not able to suggest an existing
> > subsystem where this would fit.
> >
> >> In particular I don't see an explanation about how the new interface
> >> relates to the established drivers/hwtracing/ subsystem and why it
> >> shouldn't be part of that (adding the hwtracing and coresight
> >> maintainers to Cc in case they have looked at this already).
> >> 
> >
> > To my knowledge the hwtracing framework is an interface for
> > enabling/disabling traces and then you get a stream of trace data out of
> > it.
> >
> > With DCC you essentially write a small "program" to be run at the time
> > of an exception (or triggered manually). When the "program" is run it
> > acquires data from MMIO interfaces and stores it in SRAM, which can
> > then be retrieved - possibly after the fatal reset of the system.
> >
> > Perhaps I've misunderstood the hwtracing framework; please help me steer
> > Souradeep towards a subsystem you find suitable for this functionality.
> 
> I'm also not too familiar with tracing infrastructure and was hoping
> that the coresight maintainers (Mathieu, Suzuki, Mike and Leo)
> would have some suggestions here. My initial guess was that in
> both cases, you have hardware support that is abstracted by the
> kernel in order to have a user interface that can be consumed
> by the 'perf' tool. I probably misinterpreted the part about the
> crash based trigger here, as my original (brief) reading was that
> the data snapshot could be triggered by any kind of event in
> the machine, which would make this useful as a more general
> way of tracing the state of devices at runtime. Can you describe
> how the crash trigger works, and if this would be usable with
> other random hardware events aside from an explicit software
> event?
> 
> I've added the perf maintainers to Cc as well now, for reference,
> the now reverted commit is at
> https://git.kernel.org/pub/scm/linux/kernel/git/qcom/linux.git/commit/?h=drivers-for-6.3&id=4cbe60cf5ad62
> and it contains both the implementation and the documentation
> of the debugfs interface.
> 
> One bit I don't see is the user space side. Is there a patch for
> perf as well, or is the idea to use a custom tool for this? How
> does userspace know which MMIO addresses are even valid here?
> 
> If the possible use is purely for saving some state across
> a reboot, as opposed to other events, I wonder if there is
> a good way to integrate it into the fs/pstore/ code, which
> already has a way to multiplex various kinds of input (log
> buffer, ftrace call chain, userspace strings, ...) into
> various kinds of persistent buffers (sram, blockdev, mtd,
> efivars, ...) with the purpose of helping analyze the
> state after a reboot. 
> 

Iiuc pstore provides a place to store system state for analysis after a
reboot, but DCC provides essentially register dumps on demand - with the
system reset being a special case trigger.

So I think it would look neat to expose the DCC data alongside other
pstore data (except for the mentioned issues with ramoops not working on
most Qualcomm devices), but when the reboot happens DCC captures data in
the DCC SRAM, not in the pstore (whatever backing might be used). So
post boot, something would need to get the DCC data into the pstore.

To me this sounds in conflict with the pstore design.


Furthermore, with the reboot trigger being the special case, we'd need
to amend the pstore state at runtime to capture the case where the user
requested DCC to capture the state.


One idea that I was looking at was to trigger a devcoredump as a way to
get the data out of the kernel, instead of a new device node. But it
doesn't seem to fit very well with existing use cases, and I haven't
used DCC sufficiently - given that it doesn't exist upstream...

We made significant changes to the control interface through the review
process. I think we have something that looks reasonable now, but I
picked the patches under the premise that it's unstable and in debugfs,
and that exposing the tool to users could lead to more interest in
polishing it.

> >> Can you send an updated pull request that leaves out the
> >> DCC driver until we have clarified these points?
> >> 
> >
> > I will send a new pull request, with the driver addition reverted. I
> > don't think there's anything controversial with the DT binding, so let's
> > keep that and the dts nodes (we can move the yaml if a better home is
> > found...)
> 
> Right, this is fine. I merged the first pull request after I saw the
> revert in the second drivers branch, though I did not see a pull request
> from you that replaced the first one with just the revert added as
> I had expected. Also, I see that patchwork never noticed me merging
> the PR, so you did not get the automated email. Maybe you can double
> check the contents of the soc/drivers branch to see if the contents
> are what you expect them to be.
> 

I've promised the ChromeOS team to try really hard to keep the commits
in my branch stable, so I really try to avoid rebasing commits that have
been present in linux-next for a while.

Regards,
Bjorn
Arnd Bergmann April 4, 2023, 6:23 p.m. UTC | #12
On Fri, Mar 24, 2023, at 18:26, Bjorn Andersson wrote:
> On Wed, Feb 15, 2023 at 04:05:36PM +0100, Arnd Bergmann wrote:
>> On Mon, Jan 30, 2023, at 23:24, Bjorn Andersson wrote:
>
> Iiuc pstore provides a place to store system state for analysis after a
> reboot, but DCC provides essentially register dumps on demand - with the
> system reset being a special case trigger.
>
> So I think it would look neat to expose the DCC data alongside other
> pstore data (except for the mentioned issues with ramoops not working on
> most Qualcomm devices), but when the reboot happens DCC captures data in
> the DCC SRAM, not in the pstore (whatever backing might be used). So
> post boot, something would need to get the DCC data into the pstore.
>
> To me this sounds in conflict with the pstore design.

Sorry for the late reply.

The case I was thinking of is making the DCC SRAM a pstore backend
that might be shared with other pstore users, in place of the
other targets like PSTORE_RAM, PSTORE_BLK, EFI_VARS_PSTORE and
the powerpc64 nvram. This might be a bad idea for other reasons of
course, e.g. if it's impossible to have more than one pstore
backend active and the SRAM is too small to contain the data
that would otherwise be stored here, or if it's not persistent
enough.

> Furthermore, with the reboot trigger being the special case, we'd need
> to amend the pstore state at runtime to capture the case where the user
> requested DCC to capture the state.
>
>
> One idea that I was looking at was to trigger a devcoredump as a way to
> get the data out of the kernel, instead of a new device node. But it
> doesn't seem to fit very well with existing use cases, and I haven't
> used DCC sufficiently - given that it doesn't exist upstream...

You are probably right, but I'm curious about DEV_COREDUMP, as I
haven't actually seen that before, and I can't seem to find any
documentation about what its intention is, though I can see a number
of in-kernel users.
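
From a quick look, a driver hands a buffer to dev_coredumpv() and the
core then exposes it to userspace until it is read and dismissed,
roughly:

  # ls /sys/class/devcoredump/
  devcd1
  # cat /sys/class/devcoredump/devcd1/data > dump.bin
  # echo 1 > /sys/class/devcoredump/devcd1/data

(writing to the data file dismisses the dump, and unread dumps are
dropped automatically after a timeout)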

> We made significant changes to the control interface through the review
> process. I think we have something that looks reasonable now, but I
> picked the patches under the premise that it's unstable and in debugfs,
> and that exposing the tool to users could lead to more interest in
> polishing it.

Ok
 
        Arnd