Message ID | 1434624128-13379-1-git-send-email-chris@chris-wilson.co.uk (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On Thu, Jun 18, 2015 at 11:42:08AM +0100, Chris Wilson wrote: > If the user disables the GPU reset using the i915.reset parameter and > one occurs, report that we failed to reset the GPU. If we return early, > as we currently do, then we leave all state intact (with a hung GPU) > and clients block forever waiting for their requests to complete. > > Testcase: igt/gem_eio > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> > --- > drivers/gpu/drm/i915/i915_dma.c | 1 - > drivers/gpu/drm/i915/i915_drv.c | 3 --- > drivers/gpu/drm/i915/intel_uncore.c | 3 +++ > 3 files changed, 3 insertions(+), 4 deletions(-) > > diff --git a/drivers/gpu/drm/i915/i915_dma.c b/drivers/gpu/drm/i915/i915_dma.c > index 88795d2f1819..c5349fa3fcce 100644 > --- a/drivers/gpu/drm/i915/i915_dma.c > +++ b/drivers/gpu/drm/i915/i915_dma.c > @@ -165,7 +165,6 @@ static int i915_getparam(struct drm_device *dev, void *data, > break; > case I915_PARAM_HAS_GPU_RESET: > value = i915.enable_hangcheck && > - i915.reset && > intel_has_gpu_reset(dev); > break; > default: > diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c > index 78ef0bb53c36..25ffe8afe744 100644 > --- a/drivers/gpu/drm/i915/i915_drv.c > +++ b/drivers/gpu/drm/i915/i915_drv.c > @@ -863,9 +863,6 @@ int i915_reset(struct drm_device *dev) > bool simulated; > int ret; > > - if (!i915.reset) > - return 0; > - > intel_reset_gt_powersave(dev); > > mutex_lock(&dev->struct_mutex); > diff --git a/drivers/gpu/drm/i915/intel_uncore.c b/drivers/gpu/drm/i915/intel_uncore.c > index 4a86cf007aa0..f8e75def1a1d 100644 > --- a/drivers/gpu/drm/i915/intel_uncore.c > +++ b/drivers/gpu/drm/i915/intel_uncore.c > @@ -1457,6 +1457,9 @@ static int gen6_do_reset(struct drm_device *dev) > > static int (*intel_get_gpu_reset(struct drm_device *dev))(struct drm_device *) > { > + if (!i915.reset) > + return NULL; Maybe a special reset function which always returns -EIO and prints something to illuminate the situation to dmesg? Otherwise even more wtf material in bugzilla ... -Daniel > + > if (INTEL_INFO(dev)->gen >= 6) > return gen6_do_reset; > else if (IS_GEN5(dev)) > -- > 2.1.4 > > _______________________________________________ > Intel-gfx mailing list > Intel-gfx@lists.freedesktop.org > http://lists.freedesktop.org/mailman/listinfo/intel-gfx
On Mon, Jun 22, 2015 at 03:44:51PM +0200, Daniel Vetter wrote: > On Thu, Jun 18, 2015 at 11:42:08AM +0100, Chris Wilson wrote: > > If the user disables the GPU reset using the i915.reset parameter and > > one occurs, report that we failed to reset the GPU. If we return early, > > as we currently do, then we leave all state intact (with a hung GPU) > > and clients block forever waiting for their requests to complete. > > > > Testcase: igt/gem_eio > > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> > > --- > > drivers/gpu/drm/i915/i915_dma.c | 1 - > > drivers/gpu/drm/i915/i915_drv.c | 3 --- > > drivers/gpu/drm/i915/intel_uncore.c | 3 +++ > > 3 files changed, 3 insertions(+), 4 deletions(-) > > > > diff --git a/drivers/gpu/drm/i915/i915_dma.c b/drivers/gpu/drm/i915/i915_dma.c > > index 88795d2f1819..c5349fa3fcce 100644 > > --- a/drivers/gpu/drm/i915/i915_dma.c > > +++ b/drivers/gpu/drm/i915/i915_dma.c > > @@ -165,7 +165,6 @@ static int i915_getparam(struct drm_device *dev, void *data, > > break; > > case I915_PARAM_HAS_GPU_RESET: > > value = i915.enable_hangcheck && > > - i915.reset && > > intel_has_gpu_reset(dev); > > break; > > default: > > diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c > > index 78ef0bb53c36..25ffe8afe744 100644 > > --- a/drivers/gpu/drm/i915/i915_drv.c > > +++ b/drivers/gpu/drm/i915/i915_drv.c > > @@ -863,9 +863,6 @@ int i915_reset(struct drm_device *dev) > > bool simulated; > > int ret; > > > > - if (!i915.reset) > > - return 0; > > - > > intel_reset_gt_powersave(dev); > > > > mutex_lock(&dev->struct_mutex); > > diff --git a/drivers/gpu/drm/i915/intel_uncore.c b/drivers/gpu/drm/i915/intel_uncore.c > > index 4a86cf007aa0..f8e75def1a1d 100644 > > --- a/drivers/gpu/drm/i915/intel_uncore.c > > +++ b/drivers/gpu/drm/i915/intel_uncore.c > > @@ -1457,6 +1457,9 @@ static int gen6_do_reset(struct drm_device *dev) > > > > static int (*intel_get_gpu_reset(struct drm_device *dev))(struct drm_device *) > > { > > + if (!i915.reset) > > + return NULL; > > Maybe a special reset function which always returns -EIO and prints > something to illuminate the situation to dmesg? Otherwise even more wtf > material in bugzilla ... It does report a failure (different to reset() itself failing), and reset should be yet another do not touch module option. -Chris
On Mon, Jun 22, 2015 at 02:53:39PM +0100, Chris Wilson wrote: > On Mon, Jun 22, 2015 at 03:44:51PM +0200, Daniel Vetter wrote: > > On Thu, Jun 18, 2015 at 11:42:08AM +0100, Chris Wilson wrote: > > > If the user disables the GPU reset using the i915.reset parameter and > > > one occurs, report that we failed to reset the GPU. If we return early, > > > as we currently do, then we leave all state intact (with a hung GPU) > > > and clients block forever waiting for their requests to complete. > > > > > > Testcase: igt/gem_eio > > > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> > > > --- > > > drivers/gpu/drm/i915/i915_dma.c | 1 - > > > drivers/gpu/drm/i915/i915_drv.c | 3 --- > > > drivers/gpu/drm/i915/intel_uncore.c | 3 +++ > > > 3 files changed, 3 insertions(+), 4 deletions(-) > > > > > > diff --git a/drivers/gpu/drm/i915/i915_dma.c b/drivers/gpu/drm/i915/i915_dma.c > > > index 88795d2f1819..c5349fa3fcce 100644 > > > --- a/drivers/gpu/drm/i915/i915_dma.c > > > +++ b/drivers/gpu/drm/i915/i915_dma.c > > > @@ -165,7 +165,6 @@ static int i915_getparam(struct drm_device *dev, void *data, > > > break; > > > case I915_PARAM_HAS_GPU_RESET: > > > value = i915.enable_hangcheck && > > > - i915.reset && > > > intel_has_gpu_reset(dev); > > > break; > > > default: > > > diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c > > > index 78ef0bb53c36..25ffe8afe744 100644 > > > --- a/drivers/gpu/drm/i915/i915_drv.c > > > +++ b/drivers/gpu/drm/i915/i915_drv.c > > > @@ -863,9 +863,6 @@ int i915_reset(struct drm_device *dev) > > > bool simulated; > > > int ret; > > > > > > - if (!i915.reset) > > > - return 0; > > > - > > > intel_reset_gt_powersave(dev); > > > > > > mutex_lock(&dev->struct_mutex); > > > diff --git a/drivers/gpu/drm/i915/intel_uncore.c b/drivers/gpu/drm/i915/intel_uncore.c > > > index 4a86cf007aa0..f8e75def1a1d 100644 > > > --- a/drivers/gpu/drm/i915/intel_uncore.c > > > +++ b/drivers/gpu/drm/i915/intel_uncore.c > > > @@ -1457,6 +1457,9 @@ static int gen6_do_reset(struct drm_device *dev) > > > > > > static int (*intel_get_gpu_reset(struct drm_device *dev))(struct drm_device *) > > > { > > > + if (!i915.reset) > > > + return NULL; > > > > Maybe a special reset function which always returns -EIO and prints > > something to illuminate the situation to dmesg? Otherwise even more wtf > > material in bugzilla ... > > It does report a failure (different to reset() itself failing), and reset > should be yet another do not touch module option. The only one I could spot is conditional upon the hang being a simulated one. But just marking i915.reset as unsafe works for me too, I'll amend your patch. -Daniel
On Mon, Jun 22, 2015 at 04:48:26PM +0200, Daniel Vetter wrote: > On Mon, Jun 22, 2015 at 02:53:39PM +0100, Chris Wilson wrote: > > On Mon, Jun 22, 2015 at 03:44:51PM +0200, Daniel Vetter wrote: > > > On Thu, Jun 18, 2015 at 11:42:08AM +0100, Chris Wilson wrote: > > > > If the user disables the GPU reset using the i915.reset parameter and > > > > one occurs, report that we failed to reset the GPU. If we return early, > > > > as we currently do, then we leave all state intact (with a hung GPU) > > > > and clients block forever waiting for their requests to complete. > > > > > > > > Testcase: igt/gem_eio > > > > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> > > > > --- > > > > drivers/gpu/drm/i915/i915_dma.c | 1 - > > > > drivers/gpu/drm/i915/i915_drv.c | 3 --- > > > > drivers/gpu/drm/i915/intel_uncore.c | 3 +++ > > > > 3 files changed, 3 insertions(+), 4 deletions(-) > > > > > > > > diff --git a/drivers/gpu/drm/i915/i915_dma.c b/drivers/gpu/drm/i915/i915_dma.c > > > > index 88795d2f1819..c5349fa3fcce 100644 > > > > --- a/drivers/gpu/drm/i915/i915_dma.c > > > > +++ b/drivers/gpu/drm/i915/i915_dma.c > > > > @@ -165,7 +165,6 @@ static int i915_getparam(struct drm_device *dev, void *data, > > > > break; > > > > case I915_PARAM_HAS_GPU_RESET: > > > > value = i915.enable_hangcheck && > > > > - i915.reset && > > > > intel_has_gpu_reset(dev); > > > > break; > > > > default: > > > > diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c > > > > index 78ef0bb53c36..25ffe8afe744 100644 > > > > --- a/drivers/gpu/drm/i915/i915_drv.c > > > > +++ b/drivers/gpu/drm/i915/i915_drv.c > > > > @@ -863,9 +863,6 @@ int i915_reset(struct drm_device *dev) > > > > bool simulated; > > > > int ret; > > > > > > > > - if (!i915.reset) > > > > - return 0; > > > > - > > > > intel_reset_gt_powersave(dev); > > > > > > > > mutex_lock(&dev->struct_mutex); > > > > diff --git a/drivers/gpu/drm/i915/intel_uncore.c b/drivers/gpu/drm/i915/intel_uncore.c > > > > index 4a86cf007aa0..f8e75def1a1d 100644 > > > > --- a/drivers/gpu/drm/i915/intel_uncore.c > > > > +++ b/drivers/gpu/drm/i915/intel_uncore.c > > > > @@ -1457,6 +1457,9 @@ static int gen6_do_reset(struct drm_device *dev) > > > > > > > > static int (*intel_get_gpu_reset(struct drm_device *dev))(struct drm_device *) > > > > { > > > > + if (!i915.reset) > > > > + return NULL; > > > > > > Maybe a special reset function which always returns -EIO and prints > > > something to illuminate the situation to dmesg? Otherwise even more wtf > > > material in bugzilla ... > > > > It does report a failure (different to reset() itself failing), and reset > > should be yet another do not touch module option. > > The only one I could spot is conditional upon the hang being a simulated > one. But just marking i915.reset as unsafe works for me too, I'll amend > your patch. if (ret) { DRM_ERROR("Failed to reset chip: %i\n", ret); goto error; } ENODEV for anything > gen2 is a giveaway. -Chris
diff --git a/drivers/gpu/drm/i915/i915_dma.c b/drivers/gpu/drm/i915/i915_dma.c index 88795d2f1819..c5349fa3fcce 100644 --- a/drivers/gpu/drm/i915/i915_dma.c +++ b/drivers/gpu/drm/i915/i915_dma.c @@ -165,7 +165,6 @@ static int i915_getparam(struct drm_device *dev, void *data, break; case I915_PARAM_HAS_GPU_RESET: value = i915.enable_hangcheck && - i915.reset && intel_has_gpu_reset(dev); break; default: diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c index 78ef0bb53c36..25ffe8afe744 100644 --- a/drivers/gpu/drm/i915/i915_drv.c +++ b/drivers/gpu/drm/i915/i915_drv.c @@ -863,9 +863,6 @@ int i915_reset(struct drm_device *dev) bool simulated; int ret; - if (!i915.reset) - return 0; - intel_reset_gt_powersave(dev); mutex_lock(&dev->struct_mutex); diff --git a/drivers/gpu/drm/i915/intel_uncore.c b/drivers/gpu/drm/i915/intel_uncore.c index 4a86cf007aa0..f8e75def1a1d 100644 --- a/drivers/gpu/drm/i915/intel_uncore.c +++ b/drivers/gpu/drm/i915/intel_uncore.c @@ -1457,6 +1457,9 @@ static int gen6_do_reset(struct drm_device *dev) static int (*intel_get_gpu_reset(struct drm_device *dev))(struct drm_device *) { + if (!i915.reset) + return NULL; + if (INTEL_INFO(dev)->gen >= 6) return gen6_do_reset; else if (IS_GEN5(dev))
If the user disables the GPU reset using the i915.reset parameter and one occurs, report that we failed to reset the GPU. If we return early, as we currently do, then we leave all state intact (with a hung GPU) and clients block forever waiting for their requests to complete. Testcase: igt/gem_eio Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> --- drivers/gpu/drm/i915/i915_dma.c | 1 - drivers/gpu/drm/i915/i915_drv.c | 3 --- drivers/gpu/drm/i915/intel_uncore.c | 3 +++ 3 files changed, 3 insertions(+), 4 deletions(-)