Message ID | 20241217-abhinavk-smmu-fault-handler-v2-5-451377666cad@quicinc.com (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | drm/msm: add a display mmu fault handler | expand |
On Tue, Dec 17, 2024 at 04:27:57PM -0800, Jessica Zhang wrote: > From: Abhinav Kumar <quic_abhinavk@quicinc.com> > > There is no recovery mechanism in place yet to recover from mmu > faults for DPU. We can only prevent the faults by making sure there > is no misconfiguration. > > Rate-limit the snapshot capture for mmu faults to once per > msm_atomic_commit_tail() as that should be sufficient to capture > the snapshot for debugging otherwise there will be a lot of DPU > snapshots getting captured for the same fault which is redundant > and also might affect capturing even one snapshot accurately. > > Signed-off-by: Abhinav Kumar <quic_abhinavk@quicinc.com> > Signed-off-by: Jessica Zhang <quic_jesszhan@quicinc.com> > --- > drivers/gpu/drm/msm/msm_atomic.c | 2 ++ > drivers/gpu/drm/msm/msm_kms.c | 5 ++++- > drivers/gpu/drm/msm/msm_kms.h | 3 +++ > 3 files changed, 9 insertions(+), 1 deletion(-) > > diff --git a/drivers/gpu/drm/msm/msm_atomic.c b/drivers/gpu/drm/msm/msm_atomic.c > index 9c45d641b5212c11078ab38c13a519663d85e10a..9ad7eeb14d4336abd9d8a8eb1382bdddce80508a 100644 > --- a/drivers/gpu/drm/msm/msm_atomic.c > +++ b/drivers/gpu/drm/msm/msm_atomic.c > @@ -228,6 +228,8 @@ void msm_atomic_commit_tail(struct drm_atomic_state *state) > if (kms->funcs->prepare_commit) > kms->funcs->prepare_commit(kms, state); > > + kms->fault_snapshot_capture = 0; > + - Please move it before the prepare_commit(). - You are accessing the same variable from different threads / cores. There should be some kind of a sync barrier. > /* > * Push atomic updates down to hardware: > */ > diff --git a/drivers/gpu/drm/msm/msm_kms.c b/drivers/gpu/drm/msm/msm_kms.c > index 78830e446355f77154fa21a5d107635bc88ba3ed..3327caf396d4fc905dc127f09515559c12666dc8 100644 > --- a/drivers/gpu/drm/msm/msm_kms.c > +++ b/drivers/gpu/drm/msm/msm_kms.c > @@ -168,7 +168,10 @@ static int msm_kms_fault_handler(void *arg, unsigned long iova, int flags, void > { > struct msm_kms *kms = arg; > > - msm_disp_snapshot_state(kms->dev); > + if (!kms->fault_snapshot_capture) { > + msm_disp_snapshot_state(kms->dev); > + kms->fault_snapshot_capture++; > + } > > return -ENOSYS; > } > diff --git a/drivers/gpu/drm/msm/msm_kms.h b/drivers/gpu/drm/msm/msm_kms.h > index e60162744c669773b6e5aef824a173647626ab4e..3ac089e26e14b824567f3cd2c62f82a1b9ea9878 100644 > --- a/drivers/gpu/drm/msm/msm_kms.h > +++ b/drivers/gpu/drm/msm/msm_kms.h > @@ -128,6 +128,9 @@ struct msm_kms { > int irq; > bool irq_requested; > > + /* rate limit the snapshot capture to once per attach */ > + int fault_snapshot_capture; > + > /* mapper-id used to request GEM buffer mapped for scanout: */ > struct msm_gem_address_space *aspace; > > > -- > 2.34.1 >
On 12/18/2024 3:20 AM, Dmitry Baryshkov wrote: > On Tue, Dec 17, 2024 at 04:27:57PM -0800, Jessica Zhang wrote: >> From: Abhinav Kumar <quic_abhinavk@quicinc.com> >> >> There is no recovery mechanism in place yet to recover from mmu >> faults for DPU. We can only prevent the faults by making sure there >> is no misconfiguration. >> >> Rate-limit the snapshot capture for mmu faults to once per >> msm_atomic_commit_tail() as that should be sufficient to capture >> the snapshot for debugging otherwise there will be a lot of DPU >> snapshots getting captured for the same fault which is redundant >> and also might affect capturing even one snapshot accurately. >> >> Signed-off-by: Abhinav Kumar <quic_abhinavk@quicinc.com> >> Signed-off-by: Jessica Zhang <quic_jesszhan@quicinc.com> >> --- >> drivers/gpu/drm/msm/msm_atomic.c | 2 ++ >> drivers/gpu/drm/msm/msm_kms.c | 5 ++++- >> drivers/gpu/drm/msm/msm_kms.h | 3 +++ >> 3 files changed, 9 insertions(+), 1 deletion(-) >> >> diff --git a/drivers/gpu/drm/msm/msm_atomic.c b/drivers/gpu/drm/msm/msm_atomic.c >> index 9c45d641b5212c11078ab38c13a519663d85e10a..9ad7eeb14d4336abd9d8a8eb1382bdddce80508a 100644 >> --- a/drivers/gpu/drm/msm/msm_atomic.c >> +++ b/drivers/gpu/drm/msm/msm_atomic.c >> @@ -228,6 +228,8 @@ void msm_atomic_commit_tail(struct drm_atomic_state *state) >> if (kms->funcs->prepare_commit) >> kms->funcs->prepare_commit(kms, state); >> >> + kms->fault_snapshot_capture = 0; >> + > > - Please move it before the prepare_commit(). > - You are accessing the same variable from different threads / cores. > There should be some kind of a sync barrier. Hi Dmitry, Ack, will add a lock for the snapshot capture counter. Thanks, Jessica Zhang > >> /* >> * Push atomic updates down to hardware: >> */ >> diff --git a/drivers/gpu/drm/msm/msm_kms.c b/drivers/gpu/drm/msm/msm_kms.c >> index 78830e446355f77154fa21a5d107635bc88ba3ed..3327caf396d4fc905dc127f09515559c12666dc8 100644 >> --- a/drivers/gpu/drm/msm/msm_kms.c >> +++ b/drivers/gpu/drm/msm/msm_kms.c >> @@ -168,7 +168,10 @@ static int msm_kms_fault_handler(void *arg, unsigned long iova, int flags, void >> { >> struct msm_kms *kms = arg; >> >> - msm_disp_snapshot_state(kms->dev); >> + if (!kms->fault_snapshot_capture) { >> + msm_disp_snapshot_state(kms->dev); >> + kms->fault_snapshot_capture++; >> + } >> >> return -ENOSYS; >> } >> diff --git a/drivers/gpu/drm/msm/msm_kms.h b/drivers/gpu/drm/msm/msm_kms.h >> index e60162744c669773b6e5aef824a173647626ab4e..3ac089e26e14b824567f3cd2c62f82a1b9ea9878 100644 >> --- a/drivers/gpu/drm/msm/msm_kms.h >> +++ b/drivers/gpu/drm/msm/msm_kms.h >> @@ -128,6 +128,9 @@ struct msm_kms { >> int irq; >> bool irq_requested; >> >> + /* rate limit the snapshot capture to once per attach */ >> + int fault_snapshot_capture; >> + >> /* mapper-id used to request GEM buffer mapped for scanout: */ >> struct msm_gem_address_space *aspace; >> >> >> -- >> 2.34.1 >> > > -- > With best wishes > Dmitry
diff --git a/drivers/gpu/drm/msm/msm_atomic.c b/drivers/gpu/drm/msm/msm_atomic.c index 9c45d641b5212c11078ab38c13a519663d85e10a..9ad7eeb14d4336abd9d8a8eb1382bdddce80508a 100644 --- a/drivers/gpu/drm/msm/msm_atomic.c +++ b/drivers/gpu/drm/msm/msm_atomic.c @@ -228,6 +228,8 @@ void msm_atomic_commit_tail(struct drm_atomic_state *state) if (kms->funcs->prepare_commit) kms->funcs->prepare_commit(kms, state); + kms->fault_snapshot_capture = 0; + /* * Push atomic updates down to hardware: */ diff --git a/drivers/gpu/drm/msm/msm_kms.c b/drivers/gpu/drm/msm/msm_kms.c index 78830e446355f77154fa21a5d107635bc88ba3ed..3327caf396d4fc905dc127f09515559c12666dc8 100644 --- a/drivers/gpu/drm/msm/msm_kms.c +++ b/drivers/gpu/drm/msm/msm_kms.c @@ -168,7 +168,10 @@ static int msm_kms_fault_handler(void *arg, unsigned long iova, int flags, void { struct msm_kms *kms = arg; - msm_disp_snapshot_state(kms->dev); + if (!kms->fault_snapshot_capture) { + msm_disp_snapshot_state(kms->dev); + kms->fault_snapshot_capture++; + } return -ENOSYS; } diff --git a/drivers/gpu/drm/msm/msm_kms.h b/drivers/gpu/drm/msm/msm_kms.h index e60162744c669773b6e5aef824a173647626ab4e..3ac089e26e14b824567f3cd2c62f82a1b9ea9878 100644 --- a/drivers/gpu/drm/msm/msm_kms.h +++ b/drivers/gpu/drm/msm/msm_kms.h @@ -128,6 +128,9 @@ struct msm_kms { int irq; bool irq_requested; + /* rate limit the snapshot capture to once per attach */ + int fault_snapshot_capture; + /* mapper-id used to request GEM buffer mapped for scanout: */ struct msm_gem_address_space *aspace;