Message ID | 20171019202308.33884-1-michal.wajdeczko@intel.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On Thu, 2017-10-19 at 20:23 +0000, Michal Wajdeczko wrote: > Include GuC and HuC firmware details in captured error state > to provide additional debug information. To reuse existing > uc firmware pretty printer, introduce new drm-printer variant > that works with our i915_error_state_buf output. Also update > uc firmware pretty printer to accept const input. > > v2: don't rely on current caps (Chris) > dump correct fw info (Michal) > v3: simplify capture of custom paths (Chris) > > Suggested-by: Chris Wilson <chris@chris-wilson.co.uk> > Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> > Cc: Chris Wilson <chris@chris-wilson.co.uk> > Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> A few comments below. Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Regards, Joonas > @@ -774,6 +793,11 @@ int i915_error_state_to_str(struct drm_i915_error_state_buf *m, > err_print_capabilities(m, &error->device_info); > err_print_params(m, &error->params); > > + if (error->device_info.has_guc) { > + intel_uc_fw_dump(&error->guc_fw, &p); > + intel_uc_fw_dump(&error->huc_fw, &p); > + } I might use "error->{g,h}uc_fw.path" as the condition, for both individually. We will have DMC here in the future, too. > +static void i915_capture_uc_state(struct drm_i915_private *dev_priv, > + struct i915_gpu_state *error) > +{ > + error->guc_fw = dev_priv->guc.fw; > + error->huc_fw = dev_priv->huc.fw; > + > + /* Make sure to capture custom firmware paths */ This is again a "what" comment, not "why". And they're not necessarily the custom ones. The "why" is generic to all error capture funcs. > + error->guc_fw.path = kstrdup(error->guc_fw.path, GFP_ATOMIC); > + error->huc_fw.path = kstrdup(error->huc_fw.path, GFP_ATOMIC); > +}
Quoting Joonas Lahtinen (2017-10-20 10:41:57) > On Thu, 2017-10-19 at 20:23 +0000, Michal Wajdeczko wrote: > > @@ -774,6 +793,11 @@ int i915_error_state_to_str(struct drm_i915_error_state_buf *m, > > err_print_capabilities(m, &error->device_info); > > err_print_params(m, &error->params); > > > > + if (error->device_info.has_guc) { > > + intel_uc_fw_dump(&error->guc_fw, &p); > > + intel_uc_fw_dump(&error->huc_fw, &p); > > + } > > I might use "error->{g,h}uc_fw.path" as the condition, for both > individually. We will have DMC here in the future, too. That's the type of predicate I was looking for. If we can tell when the fw has been loaded just by looking at the uc_fw struct, please do so. -Chris
On Fri, 20 Oct 2017 13:44:04 +0200, Chris Wilson <chris@chris-wilson.co.uk> wrote: > Quoting Joonas Lahtinen (2017-10-20 10:41:57) >> On Thu, 2017-10-19 at 20:23 +0000, Michal Wajdeczko wrote: >> > @@ -774,6 +793,11 @@ int i915_error_state_to_str(struct >> drm_i915_error_state_buf *m, >> > err_print_capabilities(m, &error->device_info); >> > err_print_params(m, &error->params); >> > >> > + if (error->device_info.has_guc) { >> > + intel_uc_fw_dump(&error->guc_fw, &p); >> > + intel_uc_fw_dump(&error->huc_fw, &p); >> > + } >> >> I might use "error->{g,h}uc_fw.path" as the condition, for both >> individually. We will have DMC here in the future, too. > > That's the type of predicate I was looking for. If we can tell when the > fw has been loaded just by looking at the uc_fw struct, please do so. If you want to include uc fw info only when fw was loaded at capture moment, then we can look directly at its "load_status" field: if (error->guc_fw.load_status == INTEL_UC_FIRMWARE_SUCCESS) But what if fw was loaded earlier, before error capture is started? Don't we want to capture whole driver configuration? Michal
Quoting Michal Wajdeczko (2017-10-20 13:14:45) > On Fri, 20 Oct 2017 13:44:04 +0200, Chris Wilson > <chris@chris-wilson.co.uk> wrote: > > > Quoting Joonas Lahtinen (2017-10-20 10:41:57) > >> On Thu, 2017-10-19 at 20:23 +0000, Michal Wajdeczko wrote: > >> > @@ -774,6 +793,11 @@ int i915_error_state_to_str(struct > >> drm_i915_error_state_buf *m, > >> > err_print_capabilities(m, &error->device_info); > >> > err_print_params(m, &error->params); > >> > > >> > + if (error->device_info.has_guc) { > >> > + intel_uc_fw_dump(&error->guc_fw, &p); > >> > + intel_uc_fw_dump(&error->huc_fw, &p); > >> > + } > >> > >> I might use "error->{g,h}uc_fw.path" as the condition, for both > >> individually. We will have DMC here in the future, too. > > > > That's the type of predicate I was looking for. If we can tell when the > > fw has been loaded just by looking at the uc_fw struct, please do so. > > If you want to include uc fw info only when fw was loaded at capture > moment, > then we can look directly at its "load_status" field: > > if (error->guc_fw.load_status == INTEL_UC_FIRMWARE_SUCCESS) > > But what if fw was loaded earlier, before error capture is started? > Don't we want to capture whole driver configuration? The goal is to know what the driver/hw was doing at the time of the capture (which we presume is still identical to the hang). At this point, we are just asking ourselves what is the most unambiguous way of printing valid data. We already show has_guc in the output, so it boils down to what was the uc_fw state. It doesn't need to be load_status == SUCCESS, as we should show a failed attempt to upload the fw as well. So intel_uc_fw.size? -Chris
On Fri, 20 Oct 2017 14:21:16 +0200, Chris Wilson <chris@chris-wilson.co.uk> wrote: > Quoting Michal Wajdeczko (2017-10-20 13:14:45) >> On Fri, 20 Oct 2017 13:44:04 +0200, Chris Wilson >> <chris@chris-wilson.co.uk> wrote: >> >> > Quoting Joonas Lahtinen (2017-10-20 10:41:57) >> >> On Thu, 2017-10-19 at 20:23 +0000, Michal Wajdeczko wrote: >> >> > @@ -774,6 +793,11 @@ int i915_error_state_to_str(struct >> >> drm_i915_error_state_buf *m, >> >> > err_print_capabilities(m, &error->device_info); >> >> > err_print_params(m, &error->params); >> >> > >> >> > + if (error->device_info.has_guc) { >> >> > + intel_uc_fw_dump(&error->guc_fw, &p); >> >> > + intel_uc_fw_dump(&error->huc_fw, &p); >> >> > + } >> >> >> >> I might use "error->{g,h}uc_fw.path" as the condition, for both >> >> individually. We will have DMC here in the future, too. >> > >> > That's the type of predicate I was looking for. If we can tell when >> the >> > fw has been loaded just by looking at the uc_fw struct, please do so. >> >> If you want to include uc fw info only when fw was loaded at capture >> moment, >> then we can look directly at its "load_status" field: >> >> if (error->guc_fw.load_status == INTEL_UC_FIRMWARE_SUCCESS) >> >> But what if fw was loaded earlier, before error capture is started? >> Don't we want to capture whole driver configuration? > > The goal is to know what the driver/hw was doing at the time of the > capture (which we presume is still identical to the hang). > > At this point, we are just asking ourselves what is the most unambiguous > way of printing valid data. We already show has_guc in the output, so it > boils down to what was the uc_fw state. It doesn't need to be > load_status == SUCCESS, as we should show a failed attempt to upload the > fw as well. > > So intel_uc_fw.size? But then in case of zero size fw you will not get info about load failure ;) What about leaving here "if (error->device_info.has_guc)" as main guard for both GuC and HuC fw dumps (note that upcoming series make it explicit that HuC is always in pair with GuC) but then also add "if (uc_fw->path)" condition to the uc fw pretty printer to stop printing fw details if no path was specified (all fields will be void anyway). Then we may get: has_guc: yes ... GuC firmware: (null) HuC firmware: (null) Michal
Quoting Michal Wajdeczko (2017-10-20 13:36:59) > On Fri, 20 Oct 2017 14:21:16 +0200, Chris Wilson > <chris@chris-wilson.co.uk> wrote: > > > Quoting Michal Wajdeczko (2017-10-20 13:14:45) > >> On Fri, 20 Oct 2017 13:44:04 +0200, Chris Wilson > >> <chris@chris-wilson.co.uk> wrote: > >> > >> > Quoting Joonas Lahtinen (2017-10-20 10:41:57) > >> >> On Thu, 2017-10-19 at 20:23 +0000, Michal Wajdeczko wrote: > >> >> > @@ -774,6 +793,11 @@ int i915_error_state_to_str(struct > >> >> drm_i915_error_state_buf *m, > >> >> > err_print_capabilities(m, &error->device_info); > >> >> > err_print_params(m, &error->params); > >> >> > > >> >> > + if (error->device_info.has_guc) { > >> >> > + intel_uc_fw_dump(&error->guc_fw, &p); > >> >> > + intel_uc_fw_dump(&error->huc_fw, &p); > >> >> > + } > >> >> > >> >> I might use "error->{g,h}uc_fw.path" as the condition, for both > >> >> individually. We will have DMC here in the future, too. > >> > > >> > That's the type of predicate I was looking for. If we can tell when > >> the > >> > fw has been loaded just by looking at the uc_fw struct, please do so. > >> > >> If you want to include uc fw info only when fw was loaded at capture > >> moment, > >> then we can look directly at its "load_status" field: > >> > >> if (error->guc_fw.load_status == INTEL_UC_FIRMWARE_SUCCESS) > >> > >> But what if fw was loaded earlier, before error capture is started? > >> Don't we want to capture whole driver configuration? > > > > The goal is to know what the driver/hw was doing at the time of the > > capture (which we presume is still identical to the hang). > > > > At this point, we are just asking ourselves what is the most unambiguous > > way of printing valid data. We already show has_guc in the output, so it > > boils down to what was the uc_fw state. It doesn't need to be > > load_status == SUCCESS, as we should show a failed attempt to upload the > > fw as well. > > > > So intel_uc_fw.size? > > But then in case of zero size fw you will not get info about load failure > ;) Heh, I would just go back and make sure we had a clear well of deducing when fw was being utilised. Maybe if (uc_fw.load_status != INTEL_UC_FIRMWARE_NONE) ? -Chris
On Fri, 20 Oct 2017 14:43:46 +0200, Chris Wilson <chris@chris-wilson.co.uk> wrote: > Quoting Michal Wajdeczko (2017-10-20 13:36:59) >> On Fri, 20 Oct 2017 14:21:16 +0200, Chris Wilson >> <chris@chris-wilson.co.uk> wrote: >> >> > Quoting Michal Wajdeczko (2017-10-20 13:14:45) >> >> On Fri, 20 Oct 2017 13:44:04 +0200, Chris Wilson >> >> <chris@chris-wilson.co.uk> wrote: >> >> >> >> > Quoting Joonas Lahtinen (2017-10-20 10:41:57) >> >> >> On Thu, 2017-10-19 at 20:23 +0000, Michal Wajdeczko wrote: >> >> >> > @@ -774,6 +793,11 @@ int i915_error_state_to_str(struct >> >> >> drm_i915_error_state_buf *m, >> >> >> > err_print_capabilities(m, &error->device_info); >> >> >> > err_print_params(m, &error->params); >> >> >> > >> >> >> > + if (error->device_info.has_guc) { >> >> >> > + intel_uc_fw_dump(&error->guc_fw, &p); >> >> >> > + intel_uc_fw_dump(&error->huc_fw, &p); >> >> >> > + } >> >> >> >> >> >> I might use "error->{g,h}uc_fw.path" as the condition, for both >> >> >> individually. We will have DMC here in the future, too. >> >> > >> >> > That's the type of predicate I was looking for. If we can tell when >> >> the >> >> > fw has been loaded just by looking at the uc_fw struct, please do >> so. >> >> >> >> If you want to include uc fw info only when fw was loaded at capture >> >> moment, >> >> then we can look directly at its "load_status" field: >> >> >> >> if (error->guc_fw.load_status == INTEL_UC_FIRMWARE_SUCCESS) >> >> >> >> But what if fw was loaded earlier, before error capture is started? >> >> Don't we want to capture whole driver configuration? >> > >> > The goal is to know what the driver/hw was doing at the time of the >> > capture (which we presume is still identical to the hang). >> > >> > At this point, we are just asking ourselves what is the most >> unambiguous >> > way of printing valid data. We already show has_guc in the output, so >> it >> > boils down to what was the uc_fw state. It doesn't need to be >> > load_status == SUCCESS, as we should show a failed attempt to upload >> the >> > fw as well. >> > >> > So intel_uc_fw.size? >> >> But then in case of zero size fw you will not get info about load >> failure >> ;) > > Heh, I would just go back and make sure we had a clear well of deducing > when fw was being utilised. Maybe > > if (uc_fw.load_status != INTEL_UC_FIRMWARE_NONE) > If we don't care about earlier fw fetch attempts/errors, then ok. Michal
Quoting Michal Wajdeczko (2017-10-20 13:50:23) > On Fri, 20 Oct 2017 14:43:46 +0200, Chris Wilson > <chris@chris-wilson.co.uk> wrote: > > Heh, I would just go back and make sure we had a clear well of deducing > > when fw was being utilised. Maybe > > > > if (uc_fw.load_status != INTEL_UC_FIRMWARE_NONE) > > > > If we don't care about earlier fw fetch attempts/errors, then ok. Just since the last reset is about all we aim for. It's a game of trying to work out what information will be required to debug future hangs. Too much and it's a needle-in-a-haystack-problem, too little and we have the round-trip latency in getting a new kernel to where the problem is seen. And then for the next few years being frustrated at seeing hangs with the old error state. I defer what information is required for debugging in the future to those already attempting it; but I do know that recording what firmware and what status we left the uc in will always be useful. :) -Chris
On Fri, 20 Oct 2017 11:41:57 +0200, Joonas Lahtinen <joonas.lahtinen@linux.intel.com> wrote: > On Thu, 2017-10-19 at 20:23 +0000, Michal Wajdeczko wrote: >> Include GuC and HuC firmware details in captured error state >> to provide additional debug information. To reuse existing >> uc firmware pretty printer, introduce new drm-printer variant >> that works with our i915_error_state_buf output. Also update >> uc firmware pretty printer to accept const input. >> >> v2: don't rely on current caps (Chris) >> dump correct fw info (Michal) >> v3: simplify capture of custom paths (Chris) >> >> Suggested-by: Chris Wilson <chris@chris-wilson.co.uk> >> Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> >> Cc: Chris Wilson <chris@chris-wilson.co.uk> >> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> > > A few comments below. > > Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> > > Regards, Joonas > >> @@ -774,6 +793,11 @@ int i915_error_state_to_str(struct >> drm_i915_error_state_buf *m, >> err_print_capabilities(m, &error->device_info); >> err_print_params(m, &error->params); >> >> + if (error->device_info.has_guc) { >> + intel_uc_fw_dump(&error->guc_fw, &p); >> + intel_uc_fw_dump(&error->huc_fw, &p); >> + } > > I might use "error->{g,h}uc_fw.path" as the condition, for both If we use "uc_fw->path" as condition then we have to assume that lack of uc firmware info is for this reason, and not from corrupted or incomplete or legacy format error dump ;) I would prefer to stay with "has_guc" condition here, and trim output from uc fw printer if path is not set. This way we will cover all possible scenario (no guc = no fw, no fw = no fw info) > individually. We will have DMC here in the future, too. Hmm, while not here, info about DMC fw is already included in error dump. But in their case reported DMC fw state (loaded flag, version) is taken directly from dev_priv instead of captured error state. Michal
Quoting Michal Wajdeczko (2017-10-20 15:44:02) > On Fri, 20 Oct 2017 11:41:57 +0200, Joonas Lahtinen > <joonas.lahtinen@linux.intel.com> wrote: > > > On Thu, 2017-10-19 at 20:23 +0000, Michal Wajdeczko wrote: > >> Include GuC and HuC firmware details in captured error state > >> to provide additional debug information. To reuse existing > >> uc firmware pretty printer, introduce new drm-printer variant > >> that works with our i915_error_state_buf output. Also update > >> uc firmware pretty printer to accept const input. > >> > >> v2: don't rely on current caps (Chris) > >> dump correct fw info (Michal) > >> v3: simplify capture of custom paths (Chris) > >> > >> Suggested-by: Chris Wilson <chris@chris-wilson.co.uk> > >> Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> > >> Cc: Chris Wilson <chris@chris-wilson.co.uk> > >> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> > > > > A few comments below. > > > > Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> > > > > Regards, Joonas > > > >> @@ -774,6 +793,11 @@ int i915_error_state_to_str(struct > >> drm_i915_error_state_buf *m, > >> err_print_capabilities(m, &error->device_info); > >> err_print_params(m, &error->params); > >> > >> + if (error->device_info.has_guc) { > >> + intel_uc_fw_dump(&error->guc_fw, &p); > >> + intel_uc_fw_dump(&error->huc_fw, &p); > >> + } > > > > I might use "error->{g,h}uc_fw.path" as the condition, for both > > If we use "uc_fw->path" as condition then we have to assume that > lack of uc firmware info is for this reason, and not from corrupted > or incomplete or legacy format error dump ;) > > I would prefer to stay with "has_guc" condition here, and trim > output from uc fw printer if path is not set. This way we will > cover all possible scenario (no guc = no fw, no fw = no fw info) I still do not understand the insistence on has_guc. no fw is no fw, no guc is no guc. > > individually. We will have DMC here in the future, too. > > Hmm, while not here, info about DMC fw is already included in > error dump. But in their case reported DMC fw state (loaded flag, > version) is taken directly from dev_priv instead of captured error > state. I know. You are not to copy that mistake! Also if you should happen to feel inclined to rectify the heinous actions, preferably by removing the dmc entirely, please do so. -Chris
On Fri, 20 Oct 2017 16:50:10 +0200, Chris Wilson <chris@chris-wilson.co.uk> wrote: > Quoting Michal Wajdeczko (2017-10-20 15:44:02) >> On Fri, 20 Oct 2017 11:41:57 +0200, Joonas Lahtinen >> <joonas.lahtinen@linux.intel.com> wrote: >> >> > On Thu, 2017-10-19 at 20:23 +0000, Michal Wajdeczko wrote: >> >> Include GuC and HuC firmware details in captured error state >> >> to provide additional debug information. To reuse existing >> >> uc firmware pretty printer, introduce new drm-printer variant >> >> that works with our i915_error_state_buf output. Also update >> >> uc firmware pretty printer to accept const input. >> >> >> >> v2: don't rely on current caps (Chris) >> >> dump correct fw info (Michal) >> >> v3: simplify capture of custom paths (Chris) >> >> >> >> Suggested-by: Chris Wilson <chris@chris-wilson.co.uk> >> >> Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> >> >> Cc: Chris Wilson <chris@chris-wilson.co.uk> >> >> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> >> > >> > A few comments below. >> > >> > Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> >> > >> > Regards, Joonas >> > >> >> @@ -774,6 +793,11 @@ int i915_error_state_to_str(struct >> >> drm_i915_error_state_buf *m, >> >> err_print_capabilities(m, &error->device_info); >> >> err_print_params(m, &error->params); >> >> >> >> + if (error->device_info.has_guc) { >> >> + intel_uc_fw_dump(&error->guc_fw, &p); >> >> + intel_uc_fw_dump(&error->huc_fw, &p); >> >> + } >> > >> > I might use "error->{g,h}uc_fw.path" as the condition, for both >> >> If we use "uc_fw->path" as condition then we have to assume that >> lack of uc firmware info is for this reason, and not from corrupted >> or incomplete or legacy format error dump ;) >> >> I would prefer to stay with "has_guc" condition here, and trim >> output from uc fw printer if path is not set. This way we will >> cover all possible scenario (no guc = no fw, no fw = no fw info) > > I still do not understand the insistence on has_guc. no fw is no fw, no > guc is no guc. I just wanted to cover the case where we have guc but without fw (explicitly expressed as null path) while skipping fw info for platforms without guc. No more guesses. No unwanted noise. Michal
Quoting Michal Wajdeczko (2017-10-20 15:59:41) > On Fri, 20 Oct 2017 16:50:10 +0200, Chris Wilson > <chris@chris-wilson.co.uk> wrote: > > > Quoting Michal Wajdeczko (2017-10-20 15:44:02) > >> On Fri, 20 Oct 2017 11:41:57 +0200, Joonas Lahtinen > >> <joonas.lahtinen@linux.intel.com> wrote: > >> > >> > On Thu, 2017-10-19 at 20:23 +0000, Michal Wajdeczko wrote: > >> >> Include GuC and HuC firmware details in captured error state > >> >> to provide additional debug information. To reuse existing > >> >> uc firmware pretty printer, introduce new drm-printer variant > >> >> that works with our i915_error_state_buf output. Also update > >> >> uc firmware pretty printer to accept const input. > >> >> > >> >> v2: don't rely on current caps (Chris) > >> >> dump correct fw info (Michal) > >> >> v3: simplify capture of custom paths (Chris) > >> >> > >> >> Suggested-by: Chris Wilson <chris@chris-wilson.co.uk> > >> >> Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> > >> >> Cc: Chris Wilson <chris@chris-wilson.co.uk> > >> >> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> > >> > > >> > A few comments below. > >> > > >> > Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> > >> > > >> > Regards, Joonas > >> > > >> >> @@ -774,6 +793,11 @@ int i915_error_state_to_str(struct > >> >> drm_i915_error_state_buf *m, > >> >> err_print_capabilities(m, &error->device_info); > >> >> err_print_params(m, &error->params); > >> >> > >> >> + if (error->device_info.has_guc) { > >> >> + intel_uc_fw_dump(&error->guc_fw, &p); > >> >> + intel_uc_fw_dump(&error->huc_fw, &p); > >> >> + } > >> > > >> > I might use "error->{g,h}uc_fw.path" as the condition, for both > >> > >> If we use "uc_fw->path" as condition then we have to assume that > >> lack of uc firmware info is for this reason, and not from corrupted > >> or incomplete or legacy format error dump ;) > >> > >> I would prefer to stay with "has_guc" condition here, and trim > >> output from uc fw printer if path is not set. This way we will > >> cover all possible scenario (no guc = no fw, no fw = no fw info) > > > > I still do not understand the insistence on has_guc. no fw is no fw, no > > guc is no guc. > > I just wanted to cover the case where we have guc but without fw > (explicitly expressed as null path) while skipping fw info for > platforms without guc. No more guesses. No unwanted noise. The first makes sense, but the second? We either used the struct uc_fw or we didn't. If we did and didn't have guc, that raises a question or two -- as it should never be the case, but for the error capture it doesn't have to know, it just prints what it has captured. -Chris
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h index 3c2649c..4d6519b 100644 --- a/drivers/gpu/drm/i915/i915_drv.h +++ b/drivers/gpu/drm/i915/i915_drv.h @@ -911,6 +911,10 @@ struct i915_gpu_state { struct intel_device_info device_info; struct i915_params params; + /* uC state */ + struct intel_uc_fw guc_fw; + struct intel_uc_fw huc_fw; + /* Generic register state */ u32 eir; u32 pgtbl_er; diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c index 653fb69..21d78eb 100644 --- a/drivers/gpu/drm/i915/i915_gpu_error.c +++ b/drivers/gpu/drm/i915/i915_gpu_error.c @@ -30,6 +30,8 @@ #include <generated/utsrelease.h> #include <linux/stop_machine.h> #include <linux/zlib.h> +#include <drm/drm_print.h> + #include "i915_drv.h" static const char *engine_str(int engine) @@ -175,6 +177,21 @@ static void i915_error_puts(struct drm_i915_error_state_buf *e, #define err_printf(e, ...) i915_error_printf(e, __VA_ARGS__) #define err_puts(e, s) i915_error_puts(e, s) +static void __i915_printfn_error(struct drm_printer *p, struct va_format *vaf) +{ + i915_error_vprintf(p->arg, vaf->fmt, *vaf->va); +} + +static inline struct drm_printer +i915_error_printer(struct drm_i915_error_state_buf *e) +{ + struct drm_printer p = { + .printfn = __i915_printfn_error, + .arg = e, + }; + return p; +} + #ifdef CONFIG_DRM_I915_COMPRESS_ERROR struct compress { @@ -592,8 +609,10 @@ static void err_print_pciid(struct drm_i915_error_state_buf *m, int i915_error_state_to_str(struct drm_i915_error_state_buf *m, const struct i915_gpu_state *error) { + struct drm_printer p = i915_error_printer(m); struct drm_i915_private *dev_priv = m->i915; struct drm_i915_error_object *obj; + int i, j; if (!error) { @@ -774,6 +793,11 @@ int i915_error_state_to_str(struct drm_i915_error_state_buf *m, err_print_capabilities(m, &error->device_info); err_print_params(m, &error->params); + if (error->device_info.has_guc) { + intel_uc_fw_dump(&error->guc_fw, &p); + intel_uc_fw_dump(&error->huc_fw, &p); + } + if (m->bytes == 0 && m->err) return m->err; @@ -870,6 +894,9 @@ void __i915_gpu_state_free(struct kref *error_ref) I915_PARAMS_FOR_EACH(FREE); #undef FREE + kfree(error->guc_fw.path); + kfree(error->huc_fw.path); + kfree(error); } @@ -1559,6 +1586,17 @@ static void i915_capture_pinned_buffers(struct drm_i915_private *dev_priv, error->pinned_bo = bo; } +static void i915_capture_uc_state(struct drm_i915_private *dev_priv, + struct i915_gpu_state *error) +{ + error->guc_fw = dev_priv->guc.fw; + error->huc_fw = dev_priv->huc.fw; + + /* Make sure to capture custom firmware paths */ + error->guc_fw.path = kstrdup(error->guc_fw.path, GFP_ATOMIC); + error->huc_fw.path = kstrdup(error->huc_fw.path, GFP_ATOMIC); +} + static void i915_gem_capture_guc_log_buffer(struct drm_i915_private *dev_priv, struct i915_gpu_state *error) { @@ -1710,6 +1748,7 @@ static int capture(void *data) I915_PARAMS_FOR_EACH(DUP); #undef DUP + i915_capture_uc_state(error->i915, error); i915_capture_gen_state(error->i915, error); i915_capture_reg_state(error->i915, error); i915_gem_record_fences(error->i915, error); diff --git a/drivers/gpu/drm/i915/intel_uc_fw.c b/drivers/gpu/drm/i915/intel_uc_fw.c index 973888e..4bc82d3 100644 --- a/drivers/gpu/drm/i915/intel_uc_fw.c +++ b/drivers/gpu/drm/i915/intel_uc_fw.c @@ -299,7 +299,7 @@ void intel_uc_fw_fini(struct intel_uc_fw *uc_fw) * * Pretty printer for uC firmware. */ -void intel_uc_fw_dump(struct intel_uc_fw *uc_fw, struct drm_printer *p) +void intel_uc_fw_dump(const struct intel_uc_fw *uc_fw, struct drm_printer *p) { drm_printf(p, "%s firmware: %s\n", intel_uc_fw_type_repr(uc_fw->type), uc_fw->path); diff --git a/drivers/gpu/drm/i915/intel_uc_fw.h b/drivers/gpu/drm/i915/intel_uc_fw.h index 1329036..5394d9d 100644 --- a/drivers/gpu/drm/i915/intel_uc_fw.h +++ b/drivers/gpu/drm/i915/intel_uc_fw.h @@ -116,6 +116,6 @@ int intel_uc_fw_upload(struct intel_uc_fw *uc_fw, int (*xfer)(struct intel_uc_fw *uc_fw, struct i915_vma *vma)); void intel_uc_fw_fini(struct intel_uc_fw *uc_fw); -void intel_uc_fw_dump(struct intel_uc_fw *uc_fw, struct drm_printer *p); +void intel_uc_fw_dump(const struct intel_uc_fw *uc_fw, struct drm_printer *p); #endif
Include GuC and HuC firmware details in captured error state to provide additional debug information. To reuse existing uc firmware pretty printer, introduce new drm-printer variant that works with our i915_error_state_buf output. Also update uc firmware pretty printer to accept const input. v2: don't rely on current caps (Chris) dump correct fw info (Michal) v3: simplify capture of custom paths (Chris) Suggested-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> --- drivers/gpu/drm/i915/i915_drv.h | 4 ++++ drivers/gpu/drm/i915/i915_gpu_error.c | 39 +++++++++++++++++++++++++++++++++++ drivers/gpu/drm/i915/intel_uc_fw.c | 2 +- drivers/gpu/drm/i915/intel_uc_fw.h | 2 +- 4 files changed, 45 insertions(+), 2 deletions(-)