drm/sched: Fix a use-after-free when tracing the scheduler.
diff mbox series

Message ID 20181203201453.25773-1-eric@anholt.net
State New
Headers show
Series
  • drm/sched: Fix a use-after-free when tracing the scheduler.
Related show

Commit Message

Eric Anholt Dec. 3, 2018, 8:14 p.m. UTC
With DEBUG_SLAB (poisoning on free) enabled, I could quickly produce
an oops when tracing V3D.

Signed-off-by: Eric Anholt <eric@anholt.net>
---

I think this patch is correct (though maybe a bigger refactor could
avoid the extra get/put?), but I've still got this with "vblank_mode=0
perf record -a -e v3d:.\* -e gpu_scheduler:.\* glxgears".  Any ideas?

[  139.842191] Unable to handle kernel NULL pointer dereference at virtual address 00000020
[  139.850413] pgd = eab7bb57
[  139.854424] [00000020] *pgd=80000040004003, *pmd=00000000
[  139.860523] Internal error: Oops: 206 [#1] SMP ARM
[  139.865340] Modules linked in:
[  139.868404] CPU: 0 PID: 1161 Comm: v3d_render Not tainted 4.20.0-rc4+ #552
[  139.875287] Hardware name: Broadcom STB (Flattened Device Tree)
[  139.881228] PC is at perf_trace_drm_sched_job_wait_dep+0xa8/0xf4
[  139.887243] LR is at 0xe9790274
[  139.890388] pc : [<c06e715c>]    lr : [<e9790274>]    psr: a0050013
[  139.896662] sp : ed21dec0  ip : ed21dec0  fp : ed21df04
[  139.901893] r10: ed267478  r9 : 00000000  r8 : ff7bde04
[  139.907123] r7 : 00000000  r6 : 00000063  r5 : 00000000  r4 : c1208448
[  139.913659] r3 : c1265690  r2 : ff7bf660  r1 : 00000034  r0 : ff7bf660
[  139.920196] Flags: NzCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment user
[  139.927339] Control: 30c5383d  Table: 68fa3b40  DAC: fffffffd
[  139.933095] Process v3d_render (pid: 1161, stack limit = 0xb3c84b1b)
[  139.939457] Stack: (0xed21dec0 to 0xed21e000)
[  139.943821] dec0: 20050013 ffffffff eb9700cc 00000000 ec0e8e80 00000000 eb9700cc e9790274
[  139.952009] dee0: 00000000 e2f59345 eb970078 eba8f680 c12ae00c c1208478 00000000 e8c2b048
[  139.960197] df00: eb9700cc c06e92e4 c06e8f04 00000000 80050013 ed267478 eb970078 00000000
[  139.968385] df20: ed267578 c0e45ae0 e9093080 c06e831c ed267630 c06e8120 c06e77d4 c1208448
[  139.976573] df40: ee2e8acc 00000001 00000000 ee2e8640 c0272ab4 ed21df54 ed21df54 e2f59345
[  139.984762] df60: ed21c000 ed1b4800 ed2d7840 00000000 ed21c000 ed267478 c06e8084 ee935cb0
[  139.992950] df80: ed1b4838 c0249b44 ed21c000 ed2d7840 c02499e4 00000000 00000000 00000000
[  140.001138] dfa0: 00000000 00000000 00000000 c02010ac 00000000 00000000 00000000 00000000
[  140.009326] dfc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[  140.017514] dfe0: 00000000 00000000 00000000 00000000 00000013 00000000 00000000 00000000
[  140.025707] [<c06e715c>] (perf_trace_drm_sched_job_wait_dep) from [<c06e92e4>] (drm_sched_entity_pop_job+0x394/0x438)
[  140.036332] [<c06e92e4>] (drm_sched_entity_pop_job) from [<c06e8120>] (drm_sched_main+0x9c/0x298)
[  140.045221] [<c06e8120>] (drm_sched_main) from [<c0249b44>] (kthread+0x160/0x168)
[  140.052716] [<c0249b44>] (kthread) from [<c02010ac>] (ret_from_fork+0x14/0x28)

 drivers/gpu/drm/scheduler/sched_entity.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

Comments

Koenig, Christian Dec. 4, 2018, 7:25 a.m. UTC | #1
Am 03.12.18 um 21:14 schrieb Eric Anholt:
> With DEBUG_SLAB (poisoning on free) enabled, I could quickly produce
> an oops when tracing V3D.

Good catch, but the solution is a clear NAK.

drm_sched_entity_add_dependency_cb() can result in setting 
entity->dependency to NULL. That in turn can lead to a memory leak 
because we call the _put with a NULL fence.

Instead we should rather call trace_drm_sched_job_wait_dep() before even 
calling drm_sched_entity_add_dependency_cb(). This is also cleaner 
because we want to trace which dependencies the driver gave to the 
scheduler and not which we actually needed to add a callback to.

Regards,
Christian.

>
> Signed-off-by: Eric Anholt <eric@anholt.net>
> ---
>
> I think this patch is correct (though maybe a bigger refactor could
> avoid the extra get/put?), but I've still got this with "vblank_mode=0
> perf record -a -e v3d:.\* -e gpu_scheduler:.\* glxgears".  Any ideas?
>
> [  139.842191] Unable to handle kernel NULL pointer dereference at virtual address 00000020
> [  139.850413] pgd = eab7bb57
> [  139.854424] [00000020] *pgd=80000040004003, *pmd=00000000
> [  139.860523] Internal error: Oops: 206 [#1] SMP ARM
> [  139.865340] Modules linked in:
> [  139.868404] CPU: 0 PID: 1161 Comm: v3d_render Not tainted 4.20.0-rc4+ #552
> [  139.875287] Hardware name: Broadcom STB (Flattened Device Tree)
> [  139.881228] PC is at perf_trace_drm_sched_job_wait_dep+0xa8/0xf4
> [  139.887243] LR is at 0xe9790274
> [  139.890388] pc : [<c06e715c>]    lr : [<e9790274>]    psr: a0050013
> [  139.896662] sp : ed21dec0  ip : ed21dec0  fp : ed21df04
> [  139.901893] r10: ed267478  r9 : 00000000  r8 : ff7bde04
> [  139.907123] r7 : 00000000  r6 : 00000063  r5 : 00000000  r4 : c1208448
> [  139.913659] r3 : c1265690  r2 : ff7bf660  r1 : 00000034  r0 : ff7bf660
> [  139.920196] Flags: NzCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment user
> [  139.927339] Control: 30c5383d  Table: 68fa3b40  DAC: fffffffd
> [  139.933095] Process v3d_render (pid: 1161, stack limit = 0xb3c84b1b)
> [  139.939457] Stack: (0xed21dec0 to 0xed21e000)
> [  139.943821] dec0: 20050013 ffffffff eb9700cc 00000000 ec0e8e80 00000000 eb9700cc e9790274
> [  139.952009] dee0: 00000000 e2f59345 eb970078 eba8f680 c12ae00c c1208478 00000000 e8c2b048
> [  139.960197] df00: eb9700cc c06e92e4 c06e8f04 00000000 80050013 ed267478 eb970078 00000000
> [  139.968385] df20: ed267578 c0e45ae0 e9093080 c06e831c ed267630 c06e8120 c06e77d4 c1208448
> [  139.976573] df40: ee2e8acc 00000001 00000000 ee2e8640 c0272ab4 ed21df54 ed21df54 e2f59345
> [  139.984762] df60: ed21c000 ed1b4800 ed2d7840 00000000 ed21c000 ed267478 c06e8084 ee935cb0
> [  139.992950] df80: ed1b4838 c0249b44 ed21c000 ed2d7840 c02499e4 00000000 00000000 00000000
> [  140.001138] dfa0: 00000000 00000000 00000000 c02010ac 00000000 00000000 00000000 00000000
> [  140.009326] dfc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
> [  140.017514] dfe0: 00000000 00000000 00000000 00000000 00000013 00000000 00000000 00000000
> [  140.025707] [<c06e715c>] (perf_trace_drm_sched_job_wait_dep) from [<c06e92e4>] (drm_sched_entity_pop_job+0x394/0x438)
> [  140.036332] [<c06e92e4>] (drm_sched_entity_pop_job) from [<c06e8120>] (drm_sched_main+0x9c/0x298)
> [  140.045221] [<c06e8120>] (drm_sched_main) from [<c0249b44>] (kthread+0x160/0x168)
> [  140.052716] [<c0249b44>] (kthread) from [<c02010ac>] (ret_from_fork+0x14/0x28)
>
>   drivers/gpu/drm/scheduler/sched_entity.c | 4 +++-
>   1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/scheduler/sched_entity.c b/drivers/gpu/drm/scheduler/sched_entity.c
> index 4463d3826ecb..0d4fc86089cb 100644
> --- a/drivers/gpu/drm/scheduler/sched_entity.c
> +++ b/drivers/gpu/drm/scheduler/sched_entity.c
> @@ -440,13 +440,15 @@ struct drm_sched_job *drm_sched_entity_pop_job(struct drm_sched_entity *entity)
>   
>   	while ((entity->dependency =
>   			sched->ops->dependency(sched_job, entity))) {
> -
> +		dma_fence_get(entity->dependency);
>   		if (drm_sched_entity_add_dependency_cb(entity)) {
>   
>   			trace_drm_sched_job_wait_dep(sched_job,
>   						     entity->dependency);
> +			dma_fence_put(entity->dependency);
>   			return NULL;
>   		}
> +		dma_fence_put(entity->dependency);
>   	}
>   
>   	/* skip jobs from entity that marked guilty */

Patch
diff mbox series

diff --git a/drivers/gpu/drm/scheduler/sched_entity.c b/drivers/gpu/drm/scheduler/sched_entity.c
index 4463d3826ecb..0d4fc86089cb 100644
--- a/drivers/gpu/drm/scheduler/sched_entity.c
+++ b/drivers/gpu/drm/scheduler/sched_entity.c
@@ -440,13 +440,15 @@  struct drm_sched_job *drm_sched_entity_pop_job(struct drm_sched_entity *entity)
 
 	while ((entity->dependency =
 			sched->ops->dependency(sched_job, entity))) {
-
+		dma_fence_get(entity->dependency);
 		if (drm_sched_entity_add_dependency_cb(entity)) {
 
 			trace_drm_sched_job_wait_dep(sched_job,
 						     entity->dependency);
+			dma_fence_put(entity->dependency);
 			return NULL;
 		}
+		dma_fence_put(entity->dependency);
 	}
 
 	/* skip jobs from entity that marked guilty */