
[v5,0/5] iommu/arm-smmu: adreno-smmu page fault handling

Message ID 20210610214431.539029-1-robdclark@gmail.com

Message

Rob Clark June 10, 2021, 9:44 p.m. UTC
From: Rob Clark <robdclark@chromium.org>

This picks up an earlier series [1] from Jordan and adds the additional
support needed to generate GPU devcoredumps on IOVA faults.  Original
description:

This is a stack to add an Adreno GPU-specific handler for page faults. The first
patch starts by wiring up report_iommu_fault for arm-smmu. The next patch adds
an adreno-smmu-priv function hook to capture a handful of important debugging
registers such as TTBR0, CONTEXTIDR, FSYNR0 and others. This is used by the
third patch to print more detailed information on a page fault, such as the TTBR0
for the pagetable that caused the fault and the source of the fault as
determined by a combination of the FSYNR1 register and an internal GPU
register.

This code provides a solid base that we can expand on later for even more
extensive GPU-side page fault debugging capabilities.
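A minimal userspace sketch of the fault path described above, assuming a simplified model: the SMMU side offers a context fault to a driver-registered handler first (the role report_iommu_fault() plays in the kernel), and the GPU side uses a "get fault info" hook to pull out the extra debug registers. Every type and function here (struct fault_info, struct smmu_cb, get_fault_info, gpu_fault_handler, smmu_context_fault) is hypothetical and only mirrors the idea; the real code lives in arm-smmu.c, arm-smmu-qcom.c, include/linux/adreno-smmu-priv.h and the msm driver.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical snapshot of the context-bank registers named in the cover letter. */
struct fault_info {
	uint64_t far;        /* faulting IOVA */
	uint64_t ttbr0;      /* pagetable base that was active at fault time */
	uint32_t contextidr;
	uint32_t fsr;        /* fault status */
	uint32_t fsynr0;
	uint32_t fsynr1;     /* fault syndrome; helps identify the fault source */
};

/* Hypothetical model of one SMMU context bank with a driver-installed handler. */
struct smmu_cb {
	struct fault_info regs;
	int (*fault_handler)(struct smmu_cb *cb, unsigned long iova, void *token);
	void *token;
};

/* Last fault seen by the GPU-side handler, kept so callers can inspect it. */
static struct fault_info last_fault;

/* Stand-in for the adreno-smmu-priv "get fault info" hook: copy the registers out. */
static void get_fault_info(const struct smmu_cb *cb, struct fault_info *info)
{
	assert(cb != NULL && info != NULL);
	*info = cb->regs;
}

/* GPU driver side: fetch the extra debug registers and log them. Returns 0 (handled). */
static int gpu_fault_handler(struct smmu_cb *cb, unsigned long iova, void *token)
{
	const char *name = token;

	get_fault_info(cb, &last_fault);
	printf("%s: iova=0x%lx ttbr0=0x%llx contextidr=0x%x fsynr0=0x%x fsynr1=0x%x\n",
	       name, iova, (unsigned long long)last_fault.ttbr0,
	       (unsigned)last_fault.contextidr, (unsigned)last_fault.fsynr0,
	       (unsigned)last_fault.fsynr1);
	return 0;
}

/*
 * SMMU side: offer the fault to the registered handler first, log a generic
 * message if nobody handled it, then clear FSR regardless (compare the v3
 * changelog entry).
 */
static int smmu_context_fault(struct smmu_cb *cb)
{
	int ret = -1;

	if (cb->fault_handler)
		ret = cb->fault_handler(cb, (unsigned long)cb->regs.far, cb->token);
	if (ret)
		printf("unhandled context fault: fsr=0x%x iova=0x%llx\n",
		       (unsigned)cb->regs.fsr, (unsigned long long)cb->regs.far);
	cb->regs.fsr = 0;
	return ret;
}
```

Calling smmu_context_fault() on a context bank whose fault_handler is gpu_fault_handler logs TTBR0, CONTEXTIDR and the FSYNR values and returns 0; with no handler installed it falls back to the generic "unhandled" message. FSR is cleared in both cases.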

v5: [Rob] Use RBBM_STATUS3.SMMU_STALLED_ON_FAULT to detect the case where
    GPU snapshotting needs to avoid the crashdumper, and check
    RBBM_STATUS3.SMMU_STALLED_ON_FAULT in the GPU hang irq paths
v4: [Rob] Add support to stall the SMMU on fault, and let the GPU driver
    resume translation after it has had a chance to snapshot the GPU's
    state
v3: Always clear FSR even if the target driver is going to handle resume
v2: Fix comment wording and function pointer check per Rob Clark
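The v4 and v5 entries describe a stall-then-resume flow. The following userspace sketch models it under stated assumptions: the SMMU is configured to stall on a fault, the GPU driver snapshots state (avoiding the crashdumper while the SMMU is stalled) and then resumes translation, and the hang path checks the same status bit. The struct gpu_sim type, all function names, and the bit position chosen for SMMU_STALLED_ON_FAULT are assumptions for illustration, not the driver's actual definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed bit position for illustration only; check the a6xx register definitions. */
#define RBBM_STATUS3_SMMU_STALLED_ON_FAULT (1u << 24)

/* Hypothetical GPU + SMMU state, reduced to what the changelog talks about. */
struct gpu_sim {
	uint32_t rbbm_status3;       /* mirrors the GPU's RBBM_STATUS3 register */
	bool stall_on_fault;         /* v4: SMMU stalls instead of terminating */
	bool smmu_stalled;
	bool last_resume_terminated;
	int crashdumper_snapshots;
	int cpu_snapshots;
};

/* SMMU side: on a fault, either stall (and flag it to the GPU) or terminate. */
static void smmu_fault(struct gpu_sim *gpu)
{
	if (gpu->stall_on_fault) {
		gpu->smmu_stalled = true;
		gpu->rbbm_status3 |= RBBM_STATUS3_SMMU_STALLED_ON_FAULT;
	}
}

/*
 * v5: the crashdumper is GPU-driven and its memory accesses go through the
 * SMMU, so presumably it cannot make progress while the SMMU is stalled;
 * fall back to reading state directly with the CPU in that case.
 */
static void snapshot_gpu_state(struct gpu_sim *gpu)
{
	if (gpu->rbbm_status3 & RBBM_STATUS3_SMMU_STALLED_ON_FAULT)
		gpu->cpu_snapshots++;
	else
		gpu->crashdumper_snapshots++;
}

/* v4: once the snapshot is taken, let translation continue (terminate = abort the stalled transaction). */
static void resume_translation(struct gpu_sim *gpu, bool terminate)
{
	assert(gpu->smmu_stalled);
	gpu->smmu_stalled = false;
	gpu->rbbm_status3 &= ~RBBM_STATUS3_SMMU_STALLED_ON_FAULT;
	gpu->last_resume_terminated = terminate;
}

/* v5: a "hang" seen while the SMMU is stalled on a fault should defer to the fault path. */
static bool hang_is_really_fault(const struct gpu_sim *gpu)
{
	return (gpu->rbbm_status3 & RBBM_STATUS3_SMMU_STALLED_ON_FAULT) != 0;
}

/* Simplified, sequential fault path; a real driver may defer the snapshot to a worker. */
static void gpu_handle_fault(struct gpu_sim *gpu)
{
	smmu_fault(gpu);
	snapshot_gpu_state(gpu);
	if (gpu->smmu_stalled)
		resume_translation(gpu, true);
}
```

With stall_on_fault set, gpu_handle_fault() takes a CPU-side snapshot and then clears the stall; without it, the snapshot goes through the crashdumper path. Ordering the snapshot before the resume is the point of the v4 change: the faulting state is still intact when it is captured.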

[1] https://lore.kernel.org/dri-devel/20210225175135.91922-1-jcrouse@codeaurora.org/

Jordan Crouse (3):
  iommu/arm-smmu: Add support for driver IOMMU fault handlers
  iommu/arm-smmu-qcom: Add an adreno-smmu-priv callback to get pagefault
    info
  drm/msm: Improve the a6xx page fault handler

Rob Clark (2):
  iommu/arm-smmu-qcom: Add stall support
  drm/msm: devcoredump iommu fault support

 drivers/gpu/drm/msm/adreno/a5xx_gpu.c       |  23 +++-
 drivers/gpu/drm/msm/adreno/a6xx_gpu.c       | 110 +++++++++++++++++++-
 drivers/gpu/drm/msm/adreno/a6xx_gpu_state.c |  42 ++++++--
 drivers/gpu/drm/msm/adreno/adreno_gpu.c     |  15 +++
 drivers/gpu/drm/msm/msm_gem.h               |   1 +
 drivers/gpu/drm/msm/msm_gem_submit.c        |   1 +
 drivers/gpu/drm/msm/msm_gpu.c               |  48 +++++++++
 drivers/gpu/drm/msm/msm_gpu.h               |  17 +++
 drivers/gpu/drm/msm/msm_gpummu.c            |   5 +
 drivers/gpu/drm/msm/msm_iommu.c             |  22 +++-
 drivers/gpu/drm/msm/msm_mmu.h               |   5 +-
 drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c  |  50 +++++++++
 drivers/iommu/arm/arm-smmu/arm-smmu.c       |   9 +-
 drivers/iommu/arm/arm-smmu/arm-smmu.h       |   2 +
 include/linux/adreno-smmu-priv.h            |  38 ++++++-
 15 files changed, 367 insertions(+), 21 deletions(-)

Comments

Dmitry Baryshkov July 4, 2021, 12:53 p.m. UTC | #1
Hi,

I had the splash screen disabled on my RB3. However, once I enabled it,
I got the attached crash during boot on msm/msm-next. It
looks like it is related to this particular set of changes.

Rob Clark July 4, 2021, 6:20 p.m. UTC | #2
I suspect you are getting a DPU fault, and need:

https://lore.kernel.org/linux-arm-msm/CAF6AEGvTjTUQXqom-xhdh456tdLscbVFPQ+iud1H1gHc8A2=hA@mail.gmail.com/

I suppose Bjorn was expecting me to send that patch.

BR,
-R

Bjorn Andersson July 6, 2021, 9:36 p.m. UTC | #3
On Sun 04 Jul 13:20 CDT 2021, Rob Clark wrote:

> I suspect you are getting a dpu fault, and need:
> 
> https://lore.kernel.org/linux-arm-msm/CAF6AEGvTjTUQXqom-xhdh456tdLscbVFPQ+iud1H1gHc8A2=hA@mail.gmail.com/
> 
> I suppose Bjorn was expecting me to send that patch
> 

No, I left that discussion with the same understanding as you... But I
ended up sidetracked by some other craziness.

Did you post this somewhere or would you still like me to test it and
spin a patch?

Regards,
Bjorn

John Stultz July 7, 2021, 5:12 a.m. UTC | #4
On Sun, Jul 4, 2021 at 11:16 AM Rob Clark <robdclark@gmail.com> wrote:
>
> I suspect you are getting a dpu fault, and need:
>
> https://lore.kernel.org/linux-arm-msm/CAF6AEGvTjTUQXqom-xhdh456tdLscbVFPQ+iud1H1gHc8A2=hA@mail.gmail.com/
>
> I suppose Bjorn was expecting me to send that patch

If it's helpful, I applied that and it got the db845c booting mainline
again for me (along with some reverts for a separate ext4 shrinker
crash).
Tested-by: John Stultz <john.stultz@linaro.org>

thanks
-john
Rob Clark July 7, 2021, 5:38 p.m. UTC | #5
On Tue, Jul 6, 2021 at 10:12 PM John Stultz <john.stultz@linaro.org> wrote:
>
> On Sun, Jul 4, 2021 at 11:16 AM Rob Clark <robdclark@gmail.com> wrote:
> >
> > I suspect you are getting a dpu fault, and need:
> >
> > https://lore.kernel.org/linux-arm-msm/CAF6AEGvTjTUQXqom-xhdh456tdLscbVFPQ+iud1H1gHc8A2=hA@mail.gmail.com/
> >
> > I suppose Bjorn was expecting me to send that patch
>
> If it's helpful, I applied that and it got the db845c booting mainline
> again for me (along with some reverts for a separate ext4 shrinker
> crash).
> Tested-by: John Stultz <john.stultz@linaro.org>
>

Thanks, I'll send a patch shortly

BR,
-R