diff mbox

[2/2] drm/i915: Soften error messages in i915_gem_object_create_from_data()

Message ID 1438090036-19235-2-git-send-email-chris@chris-wilson.co.uk (mailing list archive)
State New, archived
Headers show

Commit Message

Chris Wilson July 28, 2015, 1:27 p.m. UTC
Since we already return -EFAULT to the user, emitting an error message
*and* WARN is overkill. If the caller is upset, they can do so, but for
the purposes of debugging we need only log the erroneous values.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc:: Alex Dai <yu.dai@intel.com>
Cc: Dave Gordon <david.s.gordon@intel.com>
Cc: Tom O'Rourke <Tom.O'Rourke@intel.com>
---
 drivers/gpu/drm/i915/i915_gem.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

Comments

Dave Gordon July 28, 2015, 4:29 p.m. UTC | #1
On 28/07/15 14:27, Chris Wilson wrote:
> Since we already return -EFAULT to the user, emitting an error message
> *and* WARN is overkill. If the caller is upset, they can do so, but for
> the purposes of debugging we need only log the erroneous values.
>
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc:: Alex Dai <yu.dai@intel.com>
> Cc: Dave Gordon <david.s.gordon@intel.com>
> Cc: Tom O'Rourke <Tom.O'Rourke@intel.com>
> ---
>   drivers/gpu/drm/i915/i915_gem.c | 4 ++--
>   1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index c1ded76a6eb4..2039798f4403 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -5243,8 +5243,8 @@ i915_gem_object_create_from_data(struct drm_device *dev,
>
>   	i915_gem_object_unpin_pages(obj);
>
> -	if (WARN_ON(bytes != size)) {
> -		DRM_ERROR("Incomplete copy, wrote %zu of %zu", bytes, size);
> +	if (bytes != size) {
> +		DRM_DEBUG_GEM("Incomplete copy, wrote %zu of %zu", bytes, size);
>   		ret = -EFAULT;
>   		goto fail;
>   	}

I agree the WARN_ON() is overkill, but I think maybe the DRM_ERROR() is 
useful. The (one current) caller will report an error, but at that level 
it's just:

         DRM_ERROR("Failed to fetch GuC firmware from %s\n", ...)

with no more detail as to whether that was due to file-not-found, 
bad-file-size, out-of-memory, failed-to-get-pages, or any of the other 
errors that might arise.

At present this code is only called once, and I think this copy failure 
"shouldn't ever happen", so it won't be filling the logfile. But 
emitting the error here for the truly unexpected case (as opposed to the 
commonplace "out-of-memory" and suchlike) helps current and future 
callers avoid doing the detailed failure analysis.

Or maybe it's a good place for your (other) new macro?

	DRM_NOTICE_ON(bytes != size,
		"Incomplete copy, wrote %zu of %zu", bytes, size);

.Dave.
Chris Wilson July 28, 2015, 4:34 p.m. UTC | #2
On Tue, Jul 28, 2015 at 05:29:09PM +0100, Dave Gordon wrote:
> On 28/07/15 14:27, Chris Wilson wrote:
> >Since we already return -EFAULT to the user, emitting an error message
> >*and* WARN is overkill. If the caller is upset, they can do so, but for
> >the purposes of debugging we need only log the erroneous values.
> >
> >Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> >Cc:: Alex Dai <yu.dai@intel.com>
> >Cc: Dave Gordon <david.s.gordon@intel.com>
> >Cc: Tom O'Rourke <Tom.O'Rourke@intel.com>
> >---
> >  drivers/gpu/drm/i915/i915_gem.c | 4 ++--
> >  1 file changed, 2 insertions(+), 2 deletions(-)
> >
> >diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> >index c1ded76a6eb4..2039798f4403 100644
> >--- a/drivers/gpu/drm/i915/i915_gem.c
> >+++ b/drivers/gpu/drm/i915/i915_gem.c
> >@@ -5243,8 +5243,8 @@ i915_gem_object_create_from_data(struct drm_device *dev,
> >
> >  	i915_gem_object_unpin_pages(obj);
> >
> >-	if (WARN_ON(bytes != size)) {
> >-		DRM_ERROR("Incomplete copy, wrote %zu of %zu", bytes, size);
> >+	if (bytes != size) {
> >+		DRM_DEBUG_GEM("Incomplete copy, wrote %zu of %zu", bytes, size);
> >  		ret = -EFAULT;
> >  		goto fail;
> >  	}
> 
> I agree the WARN_ON() is overkill, but I think maybe the DRM_ERROR()
> is useful. The (one current) caller will report an error, but at
> that level it's just:
> 
>         DRM_ERROR("Failed to fetch GuC firmware from %s\n", ...)
> 
> with no more detail as to whether that was due to file-not-found,
> bad-file-size, out-of-memory, failed-to-get-pages, or any of the
> other errors that might arise.
> 
> At present this code is only called once, and I think this copy
> failure "shouldn't ever happen", so it won't be filling the logfile.
> But emitting the error here for the truly unexpected case (as
> opposed to the commonplace "out-of-memory" and suchlike) helps
> current and future callers avoid doing the detailed failure
> analysis.

The message is still there though, it's just a debug message for the
developer. And all it could mean is that the caller passed in a bad
value for the size.
-Chris
Dave Gordon July 28, 2015, 5:09 p.m. UTC | #3
On 28/07/15 17:34, Chris Wilson wrote:
> On Tue, Jul 28, 2015 at 05:29:09PM +0100, Dave Gordon wrote:
>> On 28/07/15 14:27, Chris Wilson wrote:
>>> Since we already return -EFAULT to the user, emitting an error message
>>> *and* WARN is overkill. If the caller is upset, they can do so, but for
>>> the purposes of debugging we need only log the erroneous values.
>>>
>>> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
>>> Cc:: Alex Dai <yu.dai@intel.com>
>>> Cc: Dave Gordon <david.s.gordon@intel.com>
>>> Cc: Tom O'Rourke <Tom.O'Rourke@intel.com>
>>> ---
>>>   drivers/gpu/drm/i915/i915_gem.c | 4 ++--
>>>   1 file changed, 2 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
>>> index c1ded76a6eb4..2039798f4403 100644
>>> --- a/drivers/gpu/drm/i915/i915_gem.c
>>> +++ b/drivers/gpu/drm/i915/i915_gem.c
>>> @@ -5243,8 +5243,8 @@ i915_gem_object_create_from_data(struct drm_device *dev,
>>>
>>>   	i915_gem_object_unpin_pages(obj);
>>>
>>> -	if (WARN_ON(bytes != size)) {
>>> -		DRM_ERROR("Incomplete copy, wrote %zu of %zu", bytes, size);
>>> +	if (bytes != size) {
>>> +		DRM_DEBUG_GEM("Incomplete copy, wrote %zu of %zu", bytes, size);
>>>   		ret = -EFAULT;
>>>   		goto fail;
>>>   	}
>>
>> I agree the WARN_ON() is overkill, but I think maybe the DRM_ERROR()
>> is useful. The (one current) caller will report an error, but at
>> that level it's just:
>>
>>          DRM_ERROR("Failed to fetch GuC firmware from %s\n", ...)
>>
>> with no more detail as to whether that was due to file-not-found,
>> bad-file-size, out-of-memory, failed-to-get-pages, or any of the
>> other errors that might arise.
>>
>> At present this code is only called once, and I think this copy
>> failure "shouldn't ever happen", so it won't be filling the logfile.
>> But emitting the error here for the truly unexpected case (as
>> opposed to the commonplace "out-of-memory" and suchlike) helps
>> current and future callers avoid doing the detailed failure
>> analysis.
>
> The message is still there though, it's just a debug message for the
> developer. And all it could mean is that the caller passed in a bad
> value for the size.
> -Chris

I'm not sure you could trigger it with a bad size. The copy-target has 
just been allocated (in this function) and so is known to be the right 
size. If the size specified is larger than the data source buffer, the 
copy will likely have OOPSed anyway - or just continued with whatever 
happens to be next in memory.

So AFAICT the bytes vs. size mismatch can only happen if something is 
broken in the GEM (or SG-list) framework, hence worth announcing. If you 
reload the driver with debug enabled, it probably won't happen again 
(and you would have to reload, not just change the debug level, because 
at present this is only called during driver load).

If not an actual ERROR, then maybe a NOTICE is the right level:

     DRM_NOTICE_IF(bytes != size,
         "Incomplete copy, wrote %zu of %zu", bytes, size);

so the the support engineer who sees the "failed to fetch" message in 
the log at least gets some hint about what particular weirdness has 
occurred on this occasion. (Other cases are easier: missing firmware 
file is obvious, wrong version gets a specific ERROR in the log; and 
these errors would be persistent. But a copy-fail would be a puzzle.)

.Dave.
Chris Wilson July 28, 2015, 5:18 p.m. UTC | #4
On Tue, Jul 28, 2015 at 06:09:16PM +0100, Dave Gordon wrote:
> On 28/07/15 17:34, Chris Wilson wrote:
> >On Tue, Jul 28, 2015 at 05:29:09PM +0100, Dave Gordon wrote:
> >>On 28/07/15 14:27, Chris Wilson wrote:
> >>>Since we already return -EFAULT to the user, emitting an error message
> >>>*and* WARN is overkill. If the caller is upset, they can do so, but for
> >>>the purposes of debugging we need only log the erroneous values.
> >>>
> >>>Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> >>>Cc:: Alex Dai <yu.dai@intel.com>
> >>>Cc: Dave Gordon <david.s.gordon@intel.com>
> >>>Cc: Tom O'Rourke <Tom.O'Rourke@intel.com>
> >>>---
> >>>  drivers/gpu/drm/i915/i915_gem.c | 4 ++--
> >>>  1 file changed, 2 insertions(+), 2 deletions(-)
> >>>
> >>>diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> >>>index c1ded76a6eb4..2039798f4403 100644
> >>>--- a/drivers/gpu/drm/i915/i915_gem.c
> >>>+++ b/drivers/gpu/drm/i915/i915_gem.c
> >>>@@ -5243,8 +5243,8 @@ i915_gem_object_create_from_data(struct drm_device *dev,
> >>>
> >>>  	i915_gem_object_unpin_pages(obj);
> >>>
> >>>-	if (WARN_ON(bytes != size)) {
> >>>-		DRM_ERROR("Incomplete copy, wrote %zu of %zu", bytes, size);
> >>>+	if (bytes != size) {
> >>>+		DRM_DEBUG_GEM("Incomplete copy, wrote %zu of %zu", bytes, size);
> >>>  		ret = -EFAULT;
> >>>  		goto fail;
> >>>  	}
> >>
> >>I agree the WARN_ON() is overkill, but I think maybe the DRM_ERROR()
> >>is useful. The (one current) caller will report an error, but at
> >>that level it's just:
> >>
> >>         DRM_ERROR("Failed to fetch GuC firmware from %s\n", ...)
> >>
> >>with no more detail as to whether that was due to file-not-found,
> >>bad-file-size, out-of-memory, failed-to-get-pages, or any of the
> >>other errors that might arise.
> >>
> >>At present this code is only called once, and I think this copy
> >>failure "shouldn't ever happen", so it won't be filling the logfile.
> >>But emitting the error here for the truly unexpected case (as
> >>opposed to the commonplace "out-of-memory" and suchlike) helps
> >>current and future callers avoid doing the detailed failure
> >>analysis.
> >
> >The message is still there though, it's just a debug message for the
> >developer. And all it could mean is that the caller passed in a bad
> >value for the size.
> >-Chris
> 
> I'm not sure you could trigger it with a bad size. The copy-target
> has just been allocated (in this function) and so is known to be the
> right size. If the size specified is larger than the data source
> buffer, the copy will likely have OOPSed anyway - or just continued
> with whatever happens to be next in memory.
> 
> So AFAICT the bytes vs. size mismatch can only happen if something
> is broken in the GEM (or SG-list) framework, hence worth announcing.
> If you reload the driver with debug enabled, it probably won't
> happen again (and you would have to reload, not just change the
> debug level, because at present this is only called during driver
> load).

Hmm, I was thinking that the access of data[size] would be checked and
the likely cause of the failure (thinking this was basically just
another copy_from_user check). I disagree that this one failure is any
more significant that any of the other internal failure cases then.

> If not an actual ERROR, then maybe a NOTICE is the right level:

No.
 
>     DRM_NOTICE_IF(bytes != size,
>         "Incomplete copy, wrote %zu of %zu", bytes, size);
> 
> so the the support engineer who sees the "failed to fetch" message
> in the log at least gets some hint about what particular weirdness
> has occurred on this occasion. (Other cases are easier: missing
> firmware file is obvious, wrong version gets a specific ERROR in the
> log; and these errors would be persistent. But a copy-fail would be
> a puzzle.)

That equally applies to a debug message. And even more so since EFAULT
here is unique. Under my assumption bytes would be recognisable in the
context of the caller, however if it ever happens it will be an equally
opaque value.

Given your arguments here, I think we should just remove it.
-Chris
diff mbox

Patch

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index c1ded76a6eb4..2039798f4403 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -5243,8 +5243,8 @@  i915_gem_object_create_from_data(struct drm_device *dev,
 
 	i915_gem_object_unpin_pages(obj);
 
-	if (WARN_ON(bytes != size)) {
-		DRM_ERROR("Incomplete copy, wrote %zu of %zu", bytes, size);
+	if (bytes != size) {
+		DRM_DEBUG_GEM("Incomplete copy, wrote %zu of %zu", bytes, size);
 		ret = -EFAULT;
 		goto fail;
 	}