diff mbox series

drm/i915/perf: Update handling of MMIO triggered reports

Message ID 20231219000543.1087706-1-umesh.nerlige.ramappa@intel.com (mailing list archive)
State New, archived
Headers show
Series drm/i915/perf: Update handling of MMIO triggered reports | expand

Commit Message

Umesh Nerlige Ramappa Dec. 19, 2023, 12:05 a.m. UTC
On XEHP platforms user is not able to find MMIO triggered reports in the
OA buffer since i915 squashes the context ID fields. These context ID
fields hold the MMIO trigger markers.

Update logic to not squash the context ID fields of MMIO triggered
reports.

Fixes: cba94bbcff08 ("drm/i915/perf: Determine context valid in OA reports")
Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
---
 drivers/gpu/drm/i915/i915_perf.c | 39 ++++++++++++++++++++++++++++----
 1 file changed, 34 insertions(+), 5 deletions(-)

Comments

Ashutosh Dixit Dec. 19, 2023, 5:28 a.m. UTC | #1
On Mon, 18 Dec 2023 16:05:43 -0800, Umesh Nerlige Ramappa wrote:
>

Hi Umesh,

> On XEHP platforms user is not able to find MMIO triggered reports in the
> OA buffer since i915 squashes the context ID fields. These context ID
> fields hold the MMIO trigger markers.
>
> Update logic to not squash the context ID fields of MMIO triggered
> reports.
>
> Fixes: cba94bbcff08 ("drm/i915/perf: Determine context valid in OA reports")
> Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
> ---
>  drivers/gpu/drm/i915/i915_perf.c | 39 ++++++++++++++++++++++++++++----
>  1 file changed, 34 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
> index 7b1c8de2f9cb..2d695818f006 100644
> --- a/drivers/gpu/drm/i915/i915_perf.c
> +++ b/drivers/gpu/drm/i915/i915_perf.c
> @@ -772,10 +772,6 @@ static int gen8_append_oa_reports(struct i915_perf_stream *stream,
>		 * The reason field includes flags identifying what
>		 * triggered this specific report (mostly timer
>		 * triggered or e.g. due to a context switch).
> -		 *
> -		 * In MMIO triggered reports, some platforms do not set the
> -		 * reason bit in this field and it is valid to have a reason
> -		 * field of zero.
>		 */
>		reason = oa_report_reason(stream, report);
>		ctx_id = oa_context_id(stream, report32);
> @@ -787,8 +783,41 @@ static int gen8_append_oa_reports(struct i915_perf_stream *stream,
>		 *
>		 * Note: that we don't clear the valid_ctx_bit so userspace can
>		 * understand that the ID has been squashed by the kernel.
> +		 *
> +		 * Update:
> +		 *
> +		 * On XEHP platforms the behavior of context id valid bit has
> +		 * changed compared to prior platforms. To describe this, we
> +		 * define a few terms:
> +		 *
> +		 * context-switch-report: This is a report with the reason type
> +		 * being context-switch. It is generated when a context switches
> +		 * out.
> +		 *
> +		 * context-valid-bit: A bit that is set in the report ID field
> +		 * to indicate that a valid context has been loaded.
> +		 *
> +		 * gpu-idle: A condition characterized by a
> +		 * context-switch-report with context-valid-bit set to 0.
> +		 *
> +		 * On prior platforms, context-id-valid bit is set to 0 only
> +		 * when GPU goes idle. In all other reports, it is set to 1.
> +		 *
> +		 * On XEHP platforms, context-valid-bit is set to 1 in a context
> +		 * switch report if a new context switched in. For all other
> +		 * reports it is set to 0.
> +		 *
> +		 * This change in behavior causes an issue with MMIO triggered
> +		 * reports. MMIO triggered reports have the markers in the
> +		 * context ID field and the context-valid-bit is 0. The logic
> +		 * below to squash the context ID would render the report
> +		 * useless since the user will not be able to find it in the OA
> +		 * buffer. Since MMIO triggered reports exist only on XEHP,
> +		 * we should avoid squashing these for XEHP platforms.

Hmm I am wondering if this is over-information and this comment should be
made brief. For the record, here's the explanation of what is happening
from Robert Krzemien's email (which at least makes it simpler for me to
understand what is happening):

	For Gen12HP+ (ATS/DG2/PVC/MTL+) platforms, context id valid bit is
	set only for context switch reports and when a context is being
	loaded. When exiting a context, a context switch report is
	generated, ctx id is not zero, but the bit is not set. It allows us
	to distinguish whether context switch reports are generated due to
	entering or exiting GPU contexts. Ctx id field is non-zero for
	context switches and mmio triggers. Other types always have ctx id
	set to 0.

	For previous platforms (like Gen12LP, Gen9/11), the bit is set for
	all types of reports if a context is loaded. But those older
	platforms don’t have mmio triggers. Ctx id field is non-zero for
	all types of reports if a context is loaded.

	I don’t understand why i915 needs to set ctx id to 0xffffffff if
	the flag is not set. It has been removed from XE KMD as I remember
	correctly.

>		 */
> -		if (oa_report_ctx_invalid(stream, report)) {
> +
> +		if (oa_report_ctx_invalid(stream, report) &&
> +		    GRAPHICS_VER_FULL(stream->engine->i915) < IP_VER(12, 50)) {
>			ctx_id = INVALID_CTX_ID;
>			oa_context_id_squash(stream, report32);
>		}

So I am assuming there's some unknown reason (maybe not hearing from Mesa?)
why we can't get rid of the squashing even for legacy platforms. But that's
ok I guess. So this is:

Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Ashutosh Dixit Dec. 19, 2023, 5:48 a.m. UTC | #2
On Mon, 18 Dec 2023 21:28:33 -0800, Dixit, Ashutosh wrote:
>
> On Mon, 18 Dec 2023 16:05:43 -0800, Umesh Nerlige Ramappa wrote:
> >
>
> Hi Umesh,
>
> > On XEHP platforms user is not able to find MMIO triggered reports in the
> > OA buffer since i915 squashes the context ID fields. These context ID
> > fields hold the MMIO trigger markers.
> >
> > Update logic to not squash the context ID fields of MMIO triggered
> > reports.
> >
> > Fixes: cba94bbcff08 ("drm/i915/perf: Determine context valid in OA reports")
> > Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
> > ---
> >  drivers/gpu/drm/i915/i915_perf.c | 39 ++++++++++++++++++++++++++++----
> >  1 file changed, 34 insertions(+), 5 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
> > index 7b1c8de2f9cb..2d695818f006 100644
> > --- a/drivers/gpu/drm/i915/i915_perf.c
> > +++ b/drivers/gpu/drm/i915/i915_perf.c
> > @@ -772,10 +772,6 @@ static int gen8_append_oa_reports(struct i915_perf_stream *stream,
> >		 * The reason field includes flags identifying what
> >		 * triggered this specific report (mostly timer
> >		 * triggered or e.g. due to a context switch).
> > -		 *
> > -		 * In MMIO triggered reports, some platforms do not set the
> > -		 * reason bit in this field and it is valid to have a reason
> > -		 * field of zero.
> >		 */
> >		reason = oa_report_reason(stream, report);
> >		ctx_id = oa_context_id(stream, report32);
> > @@ -787,8 +783,41 @@ static int gen8_append_oa_reports(struct i915_perf_stream *stream,
> >		 *
> >		 * Note: that we don't clear the valid_ctx_bit so userspace can
> >		 * understand that the ID has been squashed by the kernel.
> > +		 *
> > +		 * Update:
> > +		 *
> > +		 * On XEHP platforms the behavior of context id valid bit has
> > +		 * changed compared to prior platforms. To describe this, we
> > +		 * define a few terms:
> > +		 *
> > +		 * context-switch-report: This is a report with the reason type
> > +		 * being context-switch. It is generated when a context switches
> > +		 * out.
> > +		 *
> > +		 * context-valid-bit: A bit that is set in the report ID field
> > +		 * to indicate that a valid context has been loaded.
> > +		 *
> > +		 * gpu-idle: A condition characterized by a
> > +		 * context-switch-report with context-valid-bit set to 0.
> > +		 *
> > +		 * On prior platforms, context-id-valid bit is set to 0 only
> > +		 * when GPU goes idle. In all other reports, it is set to 1.
> > +		 *
> > +		 * On XEHP platforms, context-valid-bit is set to 1 in a context
> > +		 * switch report if a new context switched in. For all other
> > +		 * reports it is set to 0.
> > +		 *
> > +		 * This change in behavior causes an issue with MMIO triggered
> > +		 * reports. MMIO triggered reports have the markers in the
> > +		 * context ID field and the context-valid-bit is 0. The logic
> > +		 * below to squash the context ID would render the report
> > +		 * useless since the user will not be able to find it in the OA
> > +		 * buffer. Since MMIO triggered reports exist only on XEHP,
> > +		 * we should avoid squashing these for XEHP platforms.
>
> Hmm I am wondering if this is over-information and this comment should be
> made brief.

Let me try: "For Gen's >= 12.50, the context id valid bit is reset when a
context switches out, but the context id is still valid. Because of this we
cannot squash the context id in this case".

So this should affect both the regular as well as MMIO triggered cases
afaiu.

Anyway, please do what you think is right with the comment. I just thought
I'll chime in.

> For the record, here's the explanation of what is happening from Robert
> Krzemien's email (which at least makes it simpler for me to understand
> what is happening):
>
>	For Gen12HP+ (ATS/DG2/PVC/MTL+) platforms, context id valid bit is
>	set only for context switch reports and when a context is being
>	loaded. When exiting a context, a context switch report is
>	generated, ctx id is not zero, but the bit is not set. It allows us
>	to distinguish whether context switch reports are generated due to
>	entering or exiting GPU contexts. Ctx id field is non-zero for
>	context switches and mmio triggers. Other types always have ctx id
>	set to 0.
>
>	For previous platforms (like Gen12LP, Gen9/11), the bit is set for
>	all types of reports if a context is loaded. But those older
>	platforms don’t have mmio triggers. Ctx id field is non-zero for
>	all types of reports if a context is loaded.
>
>	I don’t understand why i915 needs to set ctx id to 0xffffffff if
>	the flag is not set. It has been removed from XE KMD as I remember
>	correctly.
>
> >		 */
> > -		if (oa_report_ctx_invalid(stream, report)) {
> > +
> > +		if (oa_report_ctx_invalid(stream, report) &&
> > +		    GRAPHICS_VER_FULL(stream->engine->i915) < IP_VER(12, 50)) {
> >			ctx_id = INVALID_CTX_ID;
> >			oa_context_id_squash(stream, report32);
> >		}
>
> So I am assuming there's some unknown reason (maybe not hearing from Mesa?)
> why we can't get rid of the squashing even for legacy platforms. But that's
> ok I guess. So this is:
>
> Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Umesh Nerlige Ramappa Dec. 19, 2023, 6:07 a.m. UTC | #3
On Mon, Dec 18, 2023 at 09:48:39PM -0800, Dixit, Ashutosh wrote:
>On Mon, 18 Dec 2023 21:28:33 -0800, Dixit, Ashutosh wrote:
>>
>> On Mon, 18 Dec 2023 16:05:43 -0800, Umesh Nerlige Ramappa wrote:
>> >
>>
>> Hi Umesh,
>>
>> > On XEHP platforms user is not able to find MMIO triggered reports in the
>> > OA buffer since i915 squashes the context ID fields. These context ID
>> > fields hold the MMIO trigger markers.
>> >
>> > Update logic to not squash the context ID fields of MMIO triggered
>> > reports.
>> >
>> > Fixes: cba94bbcff08 ("drm/i915/perf: Determine context valid in OA reports")
>> > Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
>> > ---
>> >  drivers/gpu/drm/i915/i915_perf.c | 39 ++++++++++++++++++++++++++++----
>> >  1 file changed, 34 insertions(+), 5 deletions(-)
>> >
>> > diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
>> > index 7b1c8de2f9cb..2d695818f006 100644
>> > --- a/drivers/gpu/drm/i915/i915_perf.c
>> > +++ b/drivers/gpu/drm/i915/i915_perf.c
>> > @@ -772,10 +772,6 @@ static int gen8_append_oa_reports(struct i915_perf_stream *stream,
>> >		 * The reason field includes flags identifying what
>> >		 * triggered this specific report (mostly timer
>> >		 * triggered or e.g. due to a context switch).
>> > -		 *
>> > -		 * In MMIO triggered reports, some platforms do not set the
>> > -		 * reason bit in this field and it is valid to have a reason
>> > -		 * field of zero.
>> >		 */
>> >		reason = oa_report_reason(stream, report);
>> >		ctx_id = oa_context_id(stream, report32);
>> > @@ -787,8 +783,41 @@ static int gen8_append_oa_reports(struct i915_perf_stream *stream,
>> >		 *
>> >		 * Note: that we don't clear the valid_ctx_bit so userspace can
>> >		 * understand that the ID has been squashed by the kernel.
>> > +		 *
>> > +		 * Update:
>> > +		 *
>> > +		 * On XEHP platforms the behavior of context id valid bit has
>> > +		 * changed compared to prior platforms. To describe this, we
>> > +		 * define a few terms:
>> > +		 *
>> > +		 * context-switch-report: This is a report with the reason type
>> > +		 * being context-switch. It is generated when a context switches
>> > +		 * out.
>> > +		 *
>> > +		 * context-valid-bit: A bit that is set in the report ID field
>> > +		 * to indicate that a valid context has been loaded.
>> > +		 *
>> > +		 * gpu-idle: A condition characterized by a
>> > +		 * context-switch-report with context-valid-bit set to 0.
>> > +		 *
>> > +		 * On prior platforms, context-id-valid bit is set to 0 only
>> > +		 * when GPU goes idle. In all other reports, it is set to 1.
>> > +		 *
>> > +		 * On XEHP platforms, context-valid-bit is set to 1 in a context
>> > +		 * switch report if a new context switched in. For all other
>> > +		 * reports it is set to 0.
>> > +		 *
>> > +		 * This change in behavior causes an issue with MMIO triggered
>> > +		 * reports. MMIO triggered reports have the markers in the
>> > +		 * context ID field and the context-valid-bit is 0. The logic
>> > +		 * below to squash the context ID would render the report
>> > +		 * useless since the user will not be able to find it in the OA
>> > +		 * buffer. Since MMIO triggered reports exist only on XEHP,
>> > +		 * we should avoid squashing these for XEHP platforms.
>>
>> Hmm I am wondering if this is over-information and this comment should be
>> made brief.
>
>Let me try: "For Gen's >= 12.50, the context id valid bit is reset when a
>context switches out, but the context id is still valid. Because of this we
>cannot squash the context id in this case".
>
>So this should affect both the regular as well as MMIO triggered cases
>afaiu.
>
>Anyway, please do what you think is right with the comment. I just thought
>I'll chime in.

The long and descriptive comment is entirely for my benefit. There is a 
very good chance I will forget this, so putting it down in the code.  
Also, I don't see this described in the spec, so thinking that we will 
benefit from it by having it here. I can put it in the commit msg 
instead if that helps.

Thanks,
Umesh

>
>> For the record, here's the explanation of what is happening from Robert
>> Krzemien's email (which at least makes it simpler for me to understand
>> what is happening):
>>
>>	For Gen12HP+ (ATS/DG2/PVC/MTL+) platforms, context id valid bit is
>>	set only for context switch reports and when a context is being
>>	loaded. When exiting a context, a context switch report is
>>	generated, ctx id is not zero, but the bit is not set. It allows us
>>	to distinguish whether context switch reports are generated due to
>>	entering or exiting GPU contexts. Ctx id field is non-zero for
>>	context switches and mmio triggers. Other types always have ctx id
>>	set to 0.
>>
>>	For previous platforms (like Gen12LP, Gen9/11), the bit is set for
>>	all types of reports if a context is loaded. But those older
>>	platforms don’t have mmio triggers. Ctx id field is non-zero for
>>	all types of reports if a context is loaded.
>>
>>	I don’t understand why i915 needs to set ctx id to 0xffffffff if
>>	the flag is not set. It has been removed from XE KMD as I remember
>>	correctly.
>>
>> >		 */
>> > -		if (oa_report_ctx_invalid(stream, report)) {
>> > +
>> > +		if (oa_report_ctx_invalid(stream, report) &&
>> > +		    GRAPHICS_VER_FULL(stream->engine->i915) < IP_VER(12, 50)) {
>> >			ctx_id = INVALID_CTX_ID;
>> >			oa_context_id_squash(stream, report32);
>> >		}
>>
>> So I am assuming there's some unknown reason (maybe not hearing from Mesa?)
>> why we can't get rid of the squashing even for legacy platforms. But that's
>> ok I guess. So this is:
>>
>> Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Ashutosh Dixit Dec. 19, 2023, 7:13 a.m. UTC | #4
On Mon, 18 Dec 2023 22:07:38 -0800, Umesh Nerlige Ramappa wrote:
>
> On Mon, Dec 18, 2023 at 09:48:39PM -0800, Dixit, Ashutosh wrote:
> > On Mon, 18 Dec 2023 21:28:33 -0800, Dixit, Ashutosh wrote:
> >>
> >> On Mon, 18 Dec 2023 16:05:43 -0800, Umesh Nerlige Ramappa wrote:
> >> >
> >>
> >> Hi Umesh,
> >>
> >> > On XEHP platforms user is not able to find MMIO triggered reports in the
> >> > OA buffer since i915 squashes the context ID fields. These context ID
> >> > fields hold the MMIO trigger markers.
> >> >
> >> > Update logic to not squash the context ID fields of MMIO triggered
> >> > reports.
> >> >
> >> > Fixes: cba94bbcff08 ("drm/i915/perf: Determine context valid in OA reports")
> >> > Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
> >> > ---
> >> >  drivers/gpu/drm/i915/i915_perf.c | 39 ++++++++++++++++++++++++++++----
> >> >  1 file changed, 34 insertions(+), 5 deletions(-)
> >> >
> >> > diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
> >> > index 7b1c8de2f9cb..2d695818f006 100644
> >> > --- a/drivers/gpu/drm/i915/i915_perf.c
> >> > +++ b/drivers/gpu/drm/i915/i915_perf.c
> >> > @@ -772,10 +772,6 @@ static int gen8_append_oa_reports(struct i915_perf_stream *stream,
> >> >		 * The reason field includes flags identifying what
> >> >		 * triggered this specific report (mostly timer
> >> >		 * triggered or e.g. due to a context switch).
> >> > -		 *
> >> > -		 * In MMIO triggered reports, some platforms do not set the
> >> > -		 * reason bit in this field and it is valid to have a reason
> >> > -		 * field of zero.
> >> >		 */
> >> >		reason = oa_report_reason(stream, report);
> >> >		ctx_id = oa_context_id(stream, report32);
> >> > @@ -787,8 +783,41 @@ static int gen8_append_oa_reports(struct i915_perf_stream *stream,
> >> >		 *
> >> >		 * Note: that we don't clear the valid_ctx_bit so userspace can
> >> >		 * understand that the ID has been squashed by the kernel.
> >> > +		 *
> >> > +		 * Update:
> >> > +		 *
> >> > +		 * On XEHP platforms the behavior of context id valid bit has
> >> > +		 * changed compared to prior platforms. To describe this, we
> >> > +		 * define a few terms:
> >> > +		 *
> >> > +		 * context-switch-report: This is a report with the reason type
> >> > +		 * being context-switch. It is generated when a context switches
> >> > +		 * out.
> >> > +		 *
> >> > +		 * context-valid-bit: A bit that is set in the report ID field
> >> > +		 * to indicate that a valid context has been loaded.
> >> > +		 *
> >> > +		 * gpu-idle: A condition characterized by a
> >> > +		 * context-switch-report with context-valid-bit set to 0.
> >> > +		 *
> >> > +		 * On prior platforms, context-id-valid bit is set to 0 only
> >> > +		 * when GPU goes idle. In all other reports, it is set to 1.
> >> > +		 *
> >> > +		 * On XEHP platforms, context-valid-bit is set to 1 in a context
> >> > +		 * switch report if a new context switched in. For all other
> >> > +		 * reports it is set to 0.
> >> > +		 *
> >> > +		 * This change in behavior causes an issue with MMIO triggered
> >> > +		 * reports. MMIO triggered reports have the markers in the
> >> > +		 * context ID field and the context-valid-bit is 0. The logic
> >> > +		 * below to squash the context ID would render the report
> >> > +		 * useless since the user will not be able to find it in the OA
> >> > +		 * buffer. Since MMIO triggered reports exist only on XEHP,
> >> > +		 * we should avoid squashing these for XEHP platforms.
> >>
> >> Hmm I am wondering if this is over-information and this comment should be
> >> made brief.
> >
> > Let me try: "For Gen's >= 12.50, the context id valid bit is reset when a
> > context switches out, but the context id is still valid. Because of this we
> > cannot squash the context id in this case".
> >
> > So this should affect both the regular as well as MMIO triggered cases
> > afaiu.
> >
> > Anyway, please do what you think is right with the comment. I just thought
> > I'll chime in.
>
> The long and descriptive comment is entirely for my benefit. There is a
> very good chance I will forget this, so putting it down in the code.  Also,
> I don't see this described in the spec, so thinking that we will benefit
> from it by having it here. I can put it in the commit msg instead if that
> helps.

I've already R-b'd the patch, so entirely your call. So do whatever you're
comfortable with :)

Ashutosh

>
> >
> >> For the record, here's the explanation of what is happening from Robert
> >> Krzemien's email (which at least makes it simpler for me to understand
> >> what is happening):
> >>
> >>	For Gen12HP+ (ATS/DG2/PVC/MTL+) platforms, context id valid bit is
> >>	set only for context switch reports and when a context is being
> >>	loaded. When exiting a context, a context switch report is
> >>	generated, ctx id is not zero, but the bit is not set. It allows us
> >>	to distinguish whether context switch reports are generated due to
> >>	entering or exiting GPU contexts. Ctx id field is non-zero for
> >>	context switches and mmio triggers. Other types always have ctx id
> >>	set to 0.
> >>
> >>	For previous platforms (like Gen12LP, Gen9/11), the bit is set for
> >>	all types of reports if a context is loaded. But those older
> >>	platforms don’t have mmio triggers. Ctx id field is non-zero for
> >>	all types of reports if a context is loaded.
> >>
> >>	I don’t understand why i915 needs to set ctx id to 0xffffffff if
> >>	the flag is not set. It has been removed from XE KMD as I remember
> >>	correctly.
> >>
> >> >		 */
> >> > -		if (oa_report_ctx_invalid(stream, report)) {
> >> > +
> >> > +		if (oa_report_ctx_invalid(stream, report) &&
> >> > +		    GRAPHICS_VER_FULL(stream->engine->i915) < IP_VER(12, 50)) {
> >> >			ctx_id = INVALID_CTX_ID;
> >> >			oa_context_id_squash(stream, report32);
> >> >		}
> >>
> >> So I am assuming there's some unknown reason (maybe not hearing from Mesa?)
> >> why we can't get rid of the squashing even for legacy platforms. But that's
> >> ok I guess. So this is:
> >>
> >> Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Umesh Nerlige Ramappa Dec. 20, 2023, 5:31 p.m. UTC | #5
On Tue, Dec 19, 2023 at 04:57:10AM +0000, Patchwork wrote:
>   Patch Details
>
>Series:  drm/i915/perf: Update handling of MMIO triggered reports
>URL:     [1]https://patchwork.freedesktop.org/series/127946/
>State:   failure
>Details: [2]https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_127946v1/index.html
>
>          CI Bug Log - changes from CI_DRM_14041 -> Patchwork_127946v1
>
>Summary
>
>   FAILURE
>
>   Serious unknown changes coming with Patchwork_127946v1 absolutely need to
>   be
>   verified manually.
>
>   If you think the reported changes have nothing to do with the changes
>   introduced in Patchwork_127946v1, please notify your bug team
>   (I915-ci-infra@lists.freedesktop.org) to allow them
>   to document this new failure mode, which will reduce false positives in
>   CI.
>
>   External URL:
>   https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_127946v1/index.html
>
>Participating hosts (36 -> 34)
>
>   Additional (1): bat-mtlp-8
>   Missing (3): fi-bsw-nick fi-snb-2520m fi-pnv-d510
>
>Possible new issues
>
>   Here are the unknown changes that may have been introduced in
>   Patchwork_127946v1:
>
>  IGT changes
>
>    Possible regressions
>
>     * igt@i915_selftest@live@hugepages:
>
>          * bat-jsl-3: [3]PASS -> [4]INCOMPLETE

This is unrelated since the OA specific change is not exercised in BAT 
tests.

Umesh

>
>Known issues
>
Umesh Nerlige Ramappa Dec. 22, 2023, 9:46 p.m. UTC | #6
On Mon, Dec 18, 2023 at 04:05:43PM -0800, Umesh Nerlige Ramappa wrote:
>On XEHP platforms user is not able to find MMIO triggered reports in the
>OA buffer since i915 squashes the context ID fields. These context ID
>fields hold the MMIO trigger markers.
>
>Update logic to not squash the context ID fields of MMIO triggered
>reports.
>
>Fixes: cba94bbcff08 ("drm/i915/perf: Determine context valid in OA reports")
>Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
>---

>Fi.CI.IGT failures.
>
>Possible new issues
>Here are the unknown changes that may have been introduced in Patchwork_127946v2_full:
>
>IGT changes
>Possible regressions
>igt@gem_exec_suspend@basic-s3-devices@smem:
>shard-mtlp: NOTRUN -> ABORT
>
>igt@i915_selftest@live@gt_pm:
>shard-rkl: PASS -> DMESG-FAIL

The above are unrelated and do not exercise the code path that has the fix.

Thanks,
Umesh

>
>Known issues
diff mbox series

Patch

diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
index 7b1c8de2f9cb..2d695818f006 100644
--- a/drivers/gpu/drm/i915/i915_perf.c
+++ b/drivers/gpu/drm/i915/i915_perf.c
@@ -772,10 +772,6 @@  static int gen8_append_oa_reports(struct i915_perf_stream *stream,
 		 * The reason field includes flags identifying what
 		 * triggered this specific report (mostly timer
 		 * triggered or e.g. due to a context switch).
-		 *
-		 * In MMIO triggered reports, some platforms do not set the
-		 * reason bit in this field and it is valid to have a reason
-		 * field of zero.
 		 */
 		reason = oa_report_reason(stream, report);
 		ctx_id = oa_context_id(stream, report32);
@@ -787,8 +783,41 @@  static int gen8_append_oa_reports(struct i915_perf_stream *stream,
 		 *
 		 * Note: that we don't clear the valid_ctx_bit so userspace can
 		 * understand that the ID has been squashed by the kernel.
+		 *
+		 * Update:
+		 *
+		 * On XEHP platforms the behavior of context id valid bit has
+		 * changed compared to prior platforms. To describe this, we
+		 * define a few terms:
+		 *
+		 * context-switch-report: This is a report with the reason type
+		 * being context-switch. It is generated when a context switches
+		 * out.
+		 *
+		 * context-valid-bit: A bit that is set in the report ID field
+		 * to indicate that a valid context has been loaded.
+		 *
+		 * gpu-idle: A condition characterized by a
+		 * context-switch-report with context-valid-bit set to 0.
+		 *
+		 * On prior platforms, context-id-valid bit is set to 0 only
+		 * when GPU goes idle. In all other reports, it is set to 1.
+		 *
+		 * On XEHP platforms, context-valid-bit is set to 1 in a context
+		 * switch report if a new context switched in. For all other
+		 * reports it is set to 0.
+		 *
+		 * This change in behavior causes an issue with MMIO triggered
+		 * reports. MMIO triggered reports have the markers in the
+		 * context ID field and the context-valid-bit is 0. The logic
+		 * below to squash the context ID would render the report
+		 * useless since the user will not be able to find it in the OA
+		 * buffer. Since MMIO triggered reports exist only on XEHP,
+		 * we should avoid squashing these for XEHP platforms.
 		 */
-		if (oa_report_ctx_invalid(stream, report)) {
+
+		if (oa_report_ctx_invalid(stream, report) &&
+		    GRAPHICS_VER_FULL(stream->engine->i915) < IP_VER(12, 50)) {
 			ctx_id = INVALID_CTX_ID;
 			oa_context_id_squash(stream, report32);
 		}