diff mbox

Kernel Oops on 3.14.66

Message ID 57208E78.4070800@intel.com (mailing list archive)
State New, archived
Headers show

Commit Message

Dave Gordon April 27, 2016, 10:03 a.m. UTC
On 27/04/16 10:19, Andreas Lampersperger wrote:
> Hello,
>
> has anyone here a hint for me, what can cause this error.
> The error occures highly sporadic on different machines with intel hd
> graphics (ivb_gt1).
> I did also some kernel coredumps and found out, that the failed
> paging request came from drm_i915_gem_request->list.prev or ->list.next.
>
> Thank you
> Andreas

Try this patch.

.Dave.

Comments

Andreas Lampersperger April 27, 2016, 12:20 p.m. UTC | #1
Hi Dave,

thank you for the patch, but I'm using longtime kernel 3.14.66 and so the
patch does not fit.
Do you have any suggestions for that kernel?

Andreas

2016-04-27 12:03 GMT+02:00 Dave Gordon <david.s.gordon@intel.com>:

> On 27/04/16 10:19, Andreas Lampersperger wrote:
>
>> Hello,
>>
>> has anyone here a hint for me, what can cause this error.
>> The error occures highly sporadic on different machines with intel hd
>> graphics (ivb_gt1).
>> I did also some kernel coredumps and found out, that the failed
>> paging request came from drm_i915_gem_request->list.prev or ->list.next.
>>
>> Thank you
>> Andreas
>>
>
> Try this patch.
>
> .Dave.
>
>
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/intel-gfx
>
>
Dave Gordon April 27, 2016, 6:28 p.m. UTC | #2
> 2016-04-27 12:03 GMT+02:00 Dave Gordon <david.s.gordon@intel.com>:
>
>> On 27/04/16 10:19, Andreas Lampersperger wrote:
>>
>>> Hello,
>>>
>>> has anyone here a hint for me, what can cause this error.
>>> The error occures highly sporadic on different machines with intel hd
>>> graphics (ivb_gt1).
>>> I did also some kernel coredumps and found out, that the failed
>>> paging request came from drm_i915_gem_request->list.prev or ->list.next.
>>>
>>> Thank you
>>> Andreas
>>
>> Try this patch.
>>
>> .Dave.
>
On 27/04/16 13:20, Andreas Lampersperger wrote:
> Hi Dave,
>
> thank you for the patch, but I'm using longtime kernel 3.14.66 and so the
> patch does not fit.
> Do you have any suggestions for that kernel?
>
> Andreas

I have no idea what the code would look like in that version; do you 
have any way to get the git commit hash or tag of the i915 driver?

But essentially, if you have the kernel sources for that version, you 
could look for a line that allocates a "struct drm_i915_gem_request". It 
might be

     req = kmem_cache_zalloc(dev_priv->requests, GFP_KERNEL);

or in older versions it might be

     request = kzalloc(sizeof(*request), GFP_KERNEL);

or some other variation. It should of course be followed by a test for 
successful allocation, e.g.

     if (request == NULL)
         return -ENOMEM;

If you can find that line (or lines, at one time there were TWO such 
places), then you just have to add

     INIT_LIST_HEAD(&request->list);

after the test-for-NULL, or anywhere inline thereafter. For example:

     request = kzalloc(sizeof(*request), GFP_KERNEL);
     if (request == NULL)
         return -ENOMEM;

     ret = i915_gem_get_seqno(ring->dev, &request->seqno);
     if (ret) {
         kfree(request);
         return ret;
     }

+   INIT_LIST_HEAD(&request->list);
     kref_init(&request->ref);
     request->ring = ring;
     request->uniq = dev_private->request_uniq++;

HTH,
.Dave.
Andreas Lampersperger April 28, 2016, 8:58 a.m. UTC | #3
Hi Dave,

thank you again.

I searched, where the memory for the request came from, it is set in line
2158 of i915_gem.c
<https://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/tree/drivers/gpu/drm/i915/i915_gem.c?id=refs/tags/v3.14.67#n2158>
and the kmalloc is in intel_ring_alloc_seqno() in intel_ringbuffer.c
<https://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/tree/drivers/gpu/drm/i915/intel_ringbuffer.c?id=refs/tags/v3.14.67#n1600>,
right?

So I will test the following patch for intel_ringbuffer.c:


intel_ring_alloc_seqno(struct intel_ring_buffer *ring)
{
    if (ring->outstanding_lazy_seqno)
        return 0;

    if (ring->preallocated_lazy_request == NULL) {
        struct drm_i915_gem_request *request;

        request = kmalloc(sizeof(*request), GFP_KERNEL);
        if (request == NULL)
            return -ENOMEM;
+
+       INIT_LIST_HEAD(&request->list);
        ring->preallocated_lazy_request = request;
    }

    return i915_gem_get_seqno(ring->dev, &ring->outstanding_lazy_seqno);
}
:

2016-04-27 20:28 GMT+02:00 Dave Gordon <david.s.gordon@intel.com>:

> 2016-04-27 12:03 GMT+02:00 Dave Gordon <david.s.gordon@intel.com>:
>>
>> On 27/04/16 10:19, Andreas Lampersperger wrote:
>>>
>>> Hello,
>>>>
>>>> has anyone here a hint for me, what can cause this error.
>>>> The error occures highly sporadic on different machines with intel hd
>>>> graphics (ivb_gt1).
>>>> I did also some kernel coredumps and found out, that the failed
>>>> paging request came from drm_i915_gem_request->list.prev or ->list.next.
>>>>
>>>> Thank you
>>>> Andreas
>>>>
>>>
>>> Try this patch.
>>>
>>> .Dave.
>>>
>>
>> On 27/04/16 13:20, Andreas Lampersperger wrote:
>
>> Hi Dave,
>>
>> thank you for the patch, but I'm using longtime kernel 3.14.66 and so the
>> patch does not fit.
>> Do you have any suggestions for that kernel?
>>
>> Andreas
>>
>
> I have no idea what the code would look like in that version; do you have
> any way to get the git commit hash or tag of the i915 driver?
>
> But essentially, if you have the kernel sources for that version, you
> could look for a line that allocates a "struct drm_i915_gem_request". It
> might be
>
>     req = kmem_cache_zalloc(dev_priv->requests, GFP_KERNEL);
>
> or in older versions it might be
>
>     request = kzalloc(sizeof(*request), GFP_KERNEL);
>
> or some other variation. It should of course be followed by a test for
> successful allocation, e.g.
>
>     if (request == NULL)
>         return -ENOMEM;
>
> If you can find that line (or lines, at one time there were TWO such
> places), then you just have to add
>
>     INIT_LIST_HEAD(&request->list);
>
> after the test-for-NULL, or anywhere inline thereafter. For example:
>
>     request = kzalloc(sizeof(*request), GFP_KERNEL);
>     if (request == NULL)
>         return -ENOMEM;
>
>     ret = i915_gem_get_seqno(ring->dev, &request->seqno);
>     if (ret) {
>         kfree(request);
>         return ret;
>     }
>
> +   INIT_LIST_HEAD(&request->list);
>     kref_init(&request->ref);
>     request->ring = ring;
>     request->uniq = dev_private->request_uniq++;
>
> HTH,
> .Dave.
>
Dave Gordon April 28, 2016, 11:50 a.m. UTC | #4
On 28/04/16 09:58, Andreas Lampersperger wrote:
> Hi Dave,
>
> thank you again.
>
> I searched, where the memory for the request came from, it is set in line 2158
> of i915_gem.c
> <https://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/tree/drivers/gpu/drm/i915/i915_gem.c?id=refs/tags/v3.14.67#n2158>
> and the kmalloc is in intel_ring_alloc_seqno() in intel_ringbuffer.c
> <https://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/tree/drivers/gpu/drm/i915/intel_ringbuffer.c?id=refs/tags/v3.14.67#n1600>,
> right?
>
> So I will test the following patch for intel_ringbuffer.c:
>
>
> intel_ring_alloc_seqno(struct intel_ring_buffer *ring)
> {
>       if (ring->outstanding_lazy_seqno)
>           return 0;
>
>       if (ring->preallocated_lazy_request == NULL) {
>           struct drm_i915_gem_request *request;
>
>           request = kmalloc(sizeof(*request), GFP_KERNEL);
>           if (request == NULL)
>               return -ENOMEM;
> +
> +       INIT_LIST_HEAD(&request->list);
>           ring->preallocated_lazy_request = request;
>       }
>
>       return i915_gem_get_seqno(ring->dev, &ring->outstanding_lazy_seqno);
> }
> :

Yes, that looks sensible. Interestingly, that version uses kmalloc() 
rather than k*z*alloc, so the uninitialised list head will be random 
rather than zeroed. Anything could happen!

.Dave.
diff mbox

Patch

From 6dde6b8b07239c68f32315faffa9d80b6e2d0aec Mon Sep 17 00:00:00 2001
From: Dave Gordon <david.s.gordon@intel.com>
Date: Fri, 22 Apr 2016 19:33:53 +0100
Subject: [PATCH] Request constructor should initialise list head
Organization: Intel Corporation (UK) Ltd. - Co. Reg. #1134945 - Pipers Way, Swindon SN3 1RJ

Users of the Linux kernel linked-list macros should remember that a
zeroed "struct list_head" is not a valid list and in particular is
not the same as an empty list. ALL list heads MUST be initialised!

Without this, any attempt to follow this link (e.g. during teardown)
before the request has been put on the list of pending requests (or
if is not put on the list, e.g. is cancelled instead) will result in
a NULL pointer dereference.

Note: the diff below claims this code goes in i915_gem_request_free(),
but that's just git's failure to parse the function headers correctly.
It obviously belongs in __i915_gem_request_alloc(); it seems to apply
correctly despite the incorrect function tag.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93907 ?
Signed-off-by: Dave Gordon <david.s.gordon@intel.com>
---
 drivers/gpu/drm/i915/i915_gem.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index d493e79..8ce7d4d 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -2751,6 +2751,7 @@  void i915_gem_request_free(struct kref *req_ref)
 	if (ret)
 		goto err;
 
+	INIT_LIST_HEAD(&req->list);
 	kref_init(&req->ref);
 	req->i915 = dev_priv;
 	req->engine = engine;
-- 
1.9.1