diff mbox

[BUG] drm: vc4: refcount_t: increment on 0; use-after-free.

Message ID 20171122185111.0ffa7fca@bbrezillon (mailing list archive)
State New, archived
Headers show

Commit Message

Boris BREZILLON Nov. 22, 2017, 5:51 p.m. UTC
Hi Stefan,

On Wed, 22 Nov 2017 17:43:35 +0100 (CET)
Stefan Wahren <stefan.wahren@i2se.com> wrote:

> Hi Boris,
> if i boot Raspberry Pi 3 (ARM64, defconfig, linux-next-20171122) with sufficient CMA memory (32 MB), i'll get this warning during boot:
> 
> [    7.623510] vc4-drm soc:gpu: bound 3f902000.hdmi (ops vc4_hdmi_ops [vc4])
> [    7.632453] vc4-drm soc:gpu: bound 3f806000.vec (ops vc4_vec_ops [vc4])
> [    7.639707] vc4-drm soc:gpu: bound 3f400000.hvs (ops vc4_hvs_ops [vc4])
> [    7.647364] vc4-drm soc:gpu: bound 3f206000.pixelvalve (ops vc4_crtc_ops [vc4])
> [    7.655451] vc4-drm soc:gpu: bound 3f207000.pixelvalve (ops vc4_crtc_ops [vc4])
> [    7.663415] vc4-drm soc:gpu: bound 3f807000.pixelvalve (ops vc4_crtc_ops [vc4])
> [    7.730580] vc4-drm soc:gpu: bound 3fc00000.v3d (ops vc4_v3d_ops [vc4])
> [    7.743965] [drm] Initialized vc4 0.0.0 20140616 for soc:gpu on minor 0
> [    7.750841] [drm] Supports vblank timestamp caching Rev 2 (21.10.2013).
> [    7.757620] [drm] Driver supports precise vblank timestamp query.
> [    7.811775] ------------[ cut here ]------------
> [    7.811780] refcount_t: increment on 0; use-after-free.
> [    7.811881] WARNING: CPU: 2 PID: 2188 at lib/refcount.c:153 refcount_inc+0x44/0x50
> [    7.811884] Modules linked in: vc4(+) cfg80211 cec drm_kms_helper smsc95xx usbnet drm rfkill brcmutil bcm2835_rng rng_core bcm2835_dma crc32_ce i2c_bcm2835 pwm_bcm2835 ip_tables x_tables ipv6
> [    7.811950] CPU: 2 PID: 2188 Comm: systemd-udevd Not tainted 4.14.0-next-20171122 #1
> [    7.811953] Hardware name: Raspberry Pi 3 Model B Rev 1.2 (DT)
> [    7.811958] task: ffff800036b91c00 task.stack: ffff00000d6f0000
> [    7.811967] pstate: 20000005 (nzCv daif -PAN -UAO)
> [    7.811974] pc : refcount_inc+0x44/0x50
> [    7.811981] lr : refcount_inc+0x44/0x50
> [    7.811984] sp : ffff00000d6f3300
> [    7.811988] x29: ffff00000d6f3300 x28: 0000000000000000 
> [    7.811996] x27: 0000000000000003 x26: ffff800037107800 
> [    7.812004] x25: 0000000000000001 x24: ffff800035afdc00 
> [    7.812012] x23: 0000000000000000 x22: ffff800035dfa600 
> [    7.812020] x21: ffff800035afd9b0 x20: ffff800035afd9a4 
> [    7.812027] x19: 0000000000000000 x18: 0000000000000000 
> [    7.812034] x17: 0000000000000001 x16: 0000000000000019 
> [    7.812042] x15: 0000000000000001 x14: 00000000fffffff0 
> [    7.812049] x13: ffff0000090ae840 x12: ffff000008fa2d50 
> [    7.812057] x11: ffff000008fa2000 x10: ffff0000090ad000 
> [    7.812064] x9 : 0000000000000000 x8 : ffff0000090b5c2f 
> [    7.812072] x7 : 0000000000000000 x6 : 000000000015ee00 
> [    7.812079] x5 : 0000000000000000 x4 : 0000000000000000 
> [    7.812087] x3 : ffffffffffffffff x2 : 0000800030047000 
> [    7.812094] x1 : ffff800036b91c00 x0 : 000000000000002b 
> [    7.812102] Call trace:
> [    7.812109]  refcount_inc+0x44/0x50
> [    7.812226]  vc4_bo_inc_usecnt+0x84/0x88 [vc4]
> [    7.812310]  vc4_prepare_fb+0x40/0xf0 [vc4]
> [    7.812460]  drm_atomic_helper_prepare_planes+0x54/0xf0 [drm_kms_helper]
> [    7.812543]  vc4_atomic_commit+0x88/0x130 [vc4]
> [    7.812868]  drm_atomic_commit+0x48/0x68 [drm]
> [    7.813002]  restore_fbdev_mode_atomic+0x1d8/0x1e8 [drm_kms_helper]
> [    7.813113]  restore_fbdev_mode+0x28/0x160 [drm_kms_helper]
> [    7.813223]  drm_fb_helper_restore_fbdev_mode_unlocked.part.24+0x28/0x90 [drm_kms_helper]
> [    7.813331]  drm_fb_helper_set_par+0x54/0xa8 [drm_kms_helper]
> [    7.813346]  fbcon_init+0x4e8/0x538
> [    7.813357]  visual_init+0xb4/0x108
> [    7.813366]  do_bind_con_driver+0x1b8/0x3a0
> [    7.813373]  do_take_over_console+0x150/0x1d0
> [    7.813380]  do_fbcon_takeover+0x6c/0xf0
> [    7.813387]  fbcon_event_notify+0x8fc/0x928
> [    7.813399]  notifier_call_chain+0x50/0x90
> [    7.813406]  __blocking_notifier_call_chain+0x4c/0x90
> [    7.813413]  blocking_notifier_call_chain+0x14/0x20
> [    7.813420]  fb_notifier_call_chain+0x1c/0x28
> [    7.813426]  register_framebuffer+0x1d0/0x2d8
> [    7.813533]  __drm_fb_helper_initial_config_and_unlock+0x1e8/0x350 [drm_kms_helper]
> [    7.813639]  drm_fb_helper_initial_config+0x40/0x58 [drm_kms_helper]
> [    7.813747]  drm_fbdev_cma_init_with_funcs+0x88/0x158 [drm_kms_helper]
> [    7.813855]  drm_fbdev_cma_init+0x14/0x28 [drm_kms_helper]
> [    7.813943]  vc4_kms_load+0xa4/0xf0 [vc4]
> [    7.814026]  vc4_drm_bind+0x100/0x168 [vc4]
> [    7.814038]  try_to_bring_up_master+0x144/0x1a8
> [    7.814044]  component_master_add_with_match+0x9c/0xe0
> [    7.814128]  vc4_platform_drm_probe+0xb4/0xf0 [vc4]
> [    7.814137]  platform_drv_probe+0x58/0xc0
> [    7.814146]  driver_probe_device+0x224/0x308
> [    7.814153]  __driver_attach+0xac/0xb0
> [    7.814161]  bus_for_each_dev+0x64/0xa0
> [    7.814169]  driver_attach+0x20/0x28
> [    7.814177]  bus_add_driver+0x108/0x228
> [    7.814184]  driver_register+0x60/0xf8
> [    7.814190]  __platform_driver_register+0x40/0x48
> [    7.814274]  vc4_drm_register+0x38/0x1000 [vc4]
> [    7.814283]  do_one_initcall+0x38/0x120
> [    7.814295]  do_init_module+0x58/0x1b8
> [    7.814304]  load_module+0x1fa8/0x2280
> [    7.814311]  SyS_finit_module+0xc0/0xd0
> [    7.814318]  __sys_trace_return+0x0/0x4
> [    7.814325] ---[ end trace 3348554eb91e19a1 ]---

Looks like I didn't test this code with CONFIG_REFCOUNT_FULL enabled :-/.

Anyway, can you try to apply the following diff and let me know if it
fixes the problem?

Thanks,

Boris

--->8---

Comments

Daniel Vetter Nov. 22, 2017, 6:13 p.m. UTC | #1
On Wed, Nov 22, 2017 at 6:51 PM, Boris Brezillon
<boris.brezillon@free-electrons.com> wrote:
> Hi Stefan,
>
> On Wed, 22 Nov 2017 17:43:35 +0100 (CET)
> Stefan Wahren <stefan.wahren@i2se.com> wrote:
>
>> Hi Boris,
>> if i boot Raspberry Pi 3 (ARM64, defconfig, linux-next-20171122) with sufficient CMA memory (32 MB), i'll get this warning during boot:
>>
>> [    7.623510] vc4-drm soc:gpu: bound 3f902000.hdmi (ops vc4_hdmi_ops [vc4])
>> [    7.632453] vc4-drm soc:gpu: bound 3f806000.vec (ops vc4_vec_ops [vc4])
>> [    7.639707] vc4-drm soc:gpu: bound 3f400000.hvs (ops vc4_hvs_ops [vc4])
>> [    7.647364] vc4-drm soc:gpu: bound 3f206000.pixelvalve (ops vc4_crtc_ops [vc4])
>> [    7.655451] vc4-drm soc:gpu: bound 3f207000.pixelvalve (ops vc4_crtc_ops [vc4])
>> [    7.663415] vc4-drm soc:gpu: bound 3f807000.pixelvalve (ops vc4_crtc_ops [vc4])
>> [    7.730580] vc4-drm soc:gpu: bound 3fc00000.v3d (ops vc4_v3d_ops [vc4])
>> [    7.743965] [drm] Initialized vc4 0.0.0 20140616 for soc:gpu on minor 0
>> [    7.750841] [drm] Supports vblank timestamp caching Rev 2 (21.10.2013).
>> [    7.757620] [drm] Driver supports precise vblank timestamp query.
>> [    7.811775] ------------[ cut here ]------------
>> [    7.811780] refcount_t: increment on 0; use-after-free.
>> [    7.811881] WARNING: CPU: 2 PID: 2188 at lib/refcount.c:153 refcount_inc+0x44/0x50
>> [    7.811884] Modules linked in: vc4(+) cfg80211 cec drm_kms_helper smsc95xx usbnet drm rfkill brcmutil bcm2835_rng rng_core bcm2835_dma crc32_ce i2c_bcm2835 pwm_bcm2835 ip_tables x_tables ipv6
>> [    7.811950] CPU: 2 PID: 2188 Comm: systemd-udevd Not tainted 4.14.0-next-20171122 #1
>> [    7.811953] Hardware name: Raspberry Pi 3 Model B Rev 1.2 (DT)
>> [    7.811958] task: ffff800036b91c00 task.stack: ffff00000d6f0000
>> [    7.811967] pstate: 20000005 (nzCv daif -PAN -UAO)
>> [    7.811974] pc : refcount_inc+0x44/0x50
>> [    7.811981] lr : refcount_inc+0x44/0x50
>> [    7.811984] sp : ffff00000d6f3300
>> [    7.811988] x29: ffff00000d6f3300 x28: 0000000000000000
>> [    7.811996] x27: 0000000000000003 x26: ffff800037107800
>> [    7.812004] x25: 0000000000000001 x24: ffff800035afdc00
>> [    7.812012] x23: 0000000000000000 x22: ffff800035dfa600
>> [    7.812020] x21: ffff800035afd9b0 x20: ffff800035afd9a4
>> [    7.812027] x19: 0000000000000000 x18: 0000000000000000
>> [    7.812034] x17: 0000000000000001 x16: 0000000000000019
>> [    7.812042] x15: 0000000000000001 x14: 00000000fffffff0
>> [    7.812049] x13: ffff0000090ae840 x12: ffff000008fa2d50
>> [    7.812057] x11: ffff000008fa2000 x10: ffff0000090ad000
>> [    7.812064] x9 : 0000000000000000 x8 : ffff0000090b5c2f
>> [    7.812072] x7 : 0000000000000000 x6 : 000000000015ee00
>> [    7.812079] x5 : 0000000000000000 x4 : 0000000000000000
>> [    7.812087] x3 : ffffffffffffffff x2 : 0000800030047000
>> [    7.812094] x1 : ffff800036b91c00 x0 : 000000000000002b
>> [    7.812102] Call trace:
>> [    7.812109]  refcount_inc+0x44/0x50
>> [    7.812226]  vc4_bo_inc_usecnt+0x84/0x88 [vc4]
>> [    7.812310]  vc4_prepare_fb+0x40/0xf0 [vc4]
>> [    7.812460]  drm_atomic_helper_prepare_planes+0x54/0xf0 [drm_kms_helper]
>> [    7.812543]  vc4_atomic_commit+0x88/0x130 [vc4]
>> [    7.812868]  drm_atomic_commit+0x48/0x68 [drm]
>> [    7.813002]  restore_fbdev_mode_atomic+0x1d8/0x1e8 [drm_kms_helper]
>> [    7.813113]  restore_fbdev_mode+0x28/0x160 [drm_kms_helper]
>> [    7.813223]  drm_fb_helper_restore_fbdev_mode_unlocked.part.24+0x28/0x90 [drm_kms_helper]
>> [    7.813331]  drm_fb_helper_set_par+0x54/0xa8 [drm_kms_helper]
>> [    7.813346]  fbcon_init+0x4e8/0x538
>> [    7.813357]  visual_init+0xb4/0x108
>> [    7.813366]  do_bind_con_driver+0x1b8/0x3a0
>> [    7.813373]  do_take_over_console+0x150/0x1d0
>> [    7.813380]  do_fbcon_takeover+0x6c/0xf0
>> [    7.813387]  fbcon_event_notify+0x8fc/0x928
>> [    7.813399]  notifier_call_chain+0x50/0x90
>> [    7.813406]  __blocking_notifier_call_chain+0x4c/0x90
>> [    7.813413]  blocking_notifier_call_chain+0x14/0x20
>> [    7.813420]  fb_notifier_call_chain+0x1c/0x28
>> [    7.813426]  register_framebuffer+0x1d0/0x2d8
>> [    7.813533]  __drm_fb_helper_initial_config_and_unlock+0x1e8/0x350 [drm_kms_helper]
>> [    7.813639]  drm_fb_helper_initial_config+0x40/0x58 [drm_kms_helper]
>> [    7.813747]  drm_fbdev_cma_init_with_funcs+0x88/0x158 [drm_kms_helper]
>> [    7.813855]  drm_fbdev_cma_init+0x14/0x28 [drm_kms_helper]
>> [    7.813943]  vc4_kms_load+0xa4/0xf0 [vc4]
>> [    7.814026]  vc4_drm_bind+0x100/0x168 [vc4]
>> [    7.814038]  try_to_bring_up_master+0x144/0x1a8
>> [    7.814044]  component_master_add_with_match+0x9c/0xe0
>> [    7.814128]  vc4_platform_drm_probe+0xb4/0xf0 [vc4]
>> [    7.814137]  platform_drv_probe+0x58/0xc0
>> [    7.814146]  driver_probe_device+0x224/0x308
>> [    7.814153]  __driver_attach+0xac/0xb0
>> [    7.814161]  bus_for_each_dev+0x64/0xa0
>> [    7.814169]  driver_attach+0x20/0x28
>> [    7.814177]  bus_add_driver+0x108/0x228
>> [    7.814184]  driver_register+0x60/0xf8
>> [    7.814190]  __platform_driver_register+0x40/0x48
>> [    7.814274]  vc4_drm_register+0x38/0x1000 [vc4]
>> [    7.814283]  do_one_initcall+0x38/0x120
>> [    7.814295]  do_init_module+0x58/0x1b8
>> [    7.814304]  load_module+0x1fa8/0x2280
>> [    7.814311]  SyS_finit_module+0xc0/0xd0
>> [    7.814318]  __sys_trace_return+0x0/0x4
>> [    7.814325] ---[ end trace 3348554eb91e19a1 ]---
>
> Looks like I didn't test this code with CONFIG_REFCOUNT_FULL enabled :-/.
>
> Anyway, can you try to apply the following diff and let me know if it
> fixes the problem?
>
> Thanks,
>
> Boris
>
> --->8---
> diff --git a/drivers/gpu/drm/vc4/vc4_bo.c b/drivers/gpu/drm/vc4/vc4_bo.c
> index 4ae45d7dac42..2decc8e2c79f 100644
> --- a/drivers/gpu/drm/vc4/vc4_bo.c
> +++ b/drivers/gpu/drm/vc4/vc4_bo.c
> @@ -637,7 +637,8 @@ int vc4_bo_inc_usecnt(struct vc4_bo *bo)
>         mutex_lock(&bo->madv_lock);
>         switch (bo->madv) {
>         case VC4_MADV_WILLNEED:
> -               refcount_inc(&bo->usecnt);
> +               if (!refcount_inc_not_zero(&bo->usecnt))
> +                       refcount_set(&bo->usecnt, 1);

This is racy. You need a refcount_inc_allow_zero (if that exists).
-Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
Stefan Wahren Nov. 22, 2017, 6:17 p.m. UTC | #2
Hi Boris,

> Boris Brezillon <boris.brezillon@free-electrons.com> hat am 22. November 2017 um 18:51 geschrieben:
> 
> 
> Hi Stefan,
> 
> On Wed, 22 Nov 2017 17:43:35 +0100 (CET)
> Stefan Wahren <stefan.wahren@i2se.com> wrote:
> ...
> 
> Looks like I didn't test this code with CONFIG_REFCOUNT_FULL enabled :-/.
> 
> Anyway, can you try to apply the following diff and let me know if it
> fixes the problem?

yes, this fixes the problem.

> 
> Thanks,
> 
> Boris
> 
> --->8---
> diff --git a/drivers/gpu/drm/vc4/vc4_bo.c b/drivers/gpu/drm/vc4/vc4_bo.c
> index 4ae45d7dac42..2decc8e2c79f 100644
> --- a/drivers/gpu/drm/vc4/vc4_bo.c
> +++ b/drivers/gpu/drm/vc4/vc4_bo.c
> @@ -637,7 +637,8 @@ int vc4_bo_inc_usecnt(struct vc4_bo *bo)
>         mutex_lock(&bo->madv_lock);
>         switch (bo->madv) {
>         case VC4_MADV_WILLNEED:
> -               refcount_inc(&bo->usecnt);
> +               if (!refcount_inc_not_zero(&bo->usecnt))
> +                       refcount_set(&bo->usecnt, 1);
>                 ret = 0;
>                 break;
>         case VC4_MADV_DONTNEED:
>
Boris BREZILLON Nov. 22, 2017, 6:21 p.m. UTC | #3
On Wed, 22 Nov 2017 19:13:09 +0100
Daniel Vetter <daniel@ffwll.ch> wrote:

> On Wed, Nov 22, 2017 at 6:51 PM, Boris Brezillon
> <boris.brezillon@free-electrons.com> wrote:
> > Hi Stefan,
> >
> > On Wed, 22 Nov 2017 17:43:35 +0100 (CET)
> > Stefan Wahren <stefan.wahren@i2se.com> wrote:
> >  
> >> Hi Boris,
> >> if i boot Raspberry Pi 3 (ARM64, defconfig, linux-next-20171122) with sufficient CMA memory (32 MB), i'll get this warning during boot:
> >>
> >> [    7.623510] vc4-drm soc:gpu: bound 3f902000.hdmi (ops vc4_hdmi_ops [vc4])
> >> [    7.632453] vc4-drm soc:gpu: bound 3f806000.vec (ops vc4_vec_ops [vc4])
> >> [    7.639707] vc4-drm soc:gpu: bound 3f400000.hvs (ops vc4_hvs_ops [vc4])
> >> [    7.647364] vc4-drm soc:gpu: bound 3f206000.pixelvalve (ops vc4_crtc_ops [vc4])
> >> [    7.655451] vc4-drm soc:gpu: bound 3f207000.pixelvalve (ops vc4_crtc_ops [vc4])
> >> [    7.663415] vc4-drm soc:gpu: bound 3f807000.pixelvalve (ops vc4_crtc_ops [vc4])
> >> [    7.730580] vc4-drm soc:gpu: bound 3fc00000.v3d (ops vc4_v3d_ops [vc4])
> >> [    7.743965] [drm] Initialized vc4 0.0.0 20140616 for soc:gpu on minor 0
> >> [    7.750841] [drm] Supports vblank timestamp caching Rev 2 (21.10.2013).
> >> [    7.757620] [drm] Driver supports precise vblank timestamp query.
> >> [    7.811775] ------------[ cut here ]------------
> >> [    7.811780] refcount_t: increment on 0; use-after-free.
> >> [    7.811881] WARNING: CPU: 2 PID: 2188 at lib/refcount.c:153 refcount_inc+0x44/0x50
> >> [    7.811884] Modules linked in: vc4(+) cfg80211 cec drm_kms_helper smsc95xx usbnet drm rfkill brcmutil bcm2835_rng rng_core bcm2835_dma crc32_ce i2c_bcm2835 pwm_bcm2835 ip_tables x_tables ipv6
> >> [    7.811950] CPU: 2 PID: 2188 Comm: systemd-udevd Not tainted 4.14.0-next-20171122 #1
> >> [    7.811953] Hardware name: Raspberry Pi 3 Model B Rev 1.2 (DT)
> >> [    7.811958] task: ffff800036b91c00 task.stack: ffff00000d6f0000
> >> [    7.811967] pstate: 20000005 (nzCv daif -PAN -UAO)
> >> [    7.811974] pc : refcount_inc+0x44/0x50
> >> [    7.811981] lr : refcount_inc+0x44/0x50
> >> [    7.811984] sp : ffff00000d6f3300
> >> [    7.811988] x29: ffff00000d6f3300 x28: 0000000000000000
> >> [    7.811996] x27: 0000000000000003 x26: ffff800037107800
> >> [    7.812004] x25: 0000000000000001 x24: ffff800035afdc00
> >> [    7.812012] x23: 0000000000000000 x22: ffff800035dfa600
> >> [    7.812020] x21: ffff800035afd9b0 x20: ffff800035afd9a4
> >> [    7.812027] x19: 0000000000000000 x18: 0000000000000000
> >> [    7.812034] x17: 0000000000000001 x16: 0000000000000019
> >> [    7.812042] x15: 0000000000000001 x14: 00000000fffffff0
> >> [    7.812049] x13: ffff0000090ae840 x12: ffff000008fa2d50
> >> [    7.812057] x11: ffff000008fa2000 x10: ffff0000090ad000
> >> [    7.812064] x9 : 0000000000000000 x8 : ffff0000090b5c2f
> >> [    7.812072] x7 : 0000000000000000 x6 : 000000000015ee00
> >> [    7.812079] x5 : 0000000000000000 x4 : 0000000000000000
> >> [    7.812087] x3 : ffffffffffffffff x2 : 0000800030047000
> >> [    7.812094] x1 : ffff800036b91c00 x0 : 000000000000002b
> >> [    7.812102] Call trace:
> >> [    7.812109]  refcount_inc+0x44/0x50
> >> [    7.812226]  vc4_bo_inc_usecnt+0x84/0x88 [vc4]
> >> [    7.812310]  vc4_prepare_fb+0x40/0xf0 [vc4]
> >> [    7.812460]  drm_atomic_helper_prepare_planes+0x54/0xf0 [drm_kms_helper]
> >> [    7.812543]  vc4_atomic_commit+0x88/0x130 [vc4]
> >> [    7.812868]  drm_atomic_commit+0x48/0x68 [drm]
> >> [    7.813002]  restore_fbdev_mode_atomic+0x1d8/0x1e8 [drm_kms_helper]
> >> [    7.813113]  restore_fbdev_mode+0x28/0x160 [drm_kms_helper]
> >> [    7.813223]  drm_fb_helper_restore_fbdev_mode_unlocked.part.24+0x28/0x90 [drm_kms_helper]
> >> [    7.813331]  drm_fb_helper_set_par+0x54/0xa8 [drm_kms_helper]
> >> [    7.813346]  fbcon_init+0x4e8/0x538
> >> [    7.813357]  visual_init+0xb4/0x108
> >> [    7.813366]  do_bind_con_driver+0x1b8/0x3a0
> >> [    7.813373]  do_take_over_console+0x150/0x1d0
> >> [    7.813380]  do_fbcon_takeover+0x6c/0xf0
> >> [    7.813387]  fbcon_event_notify+0x8fc/0x928
> >> [    7.813399]  notifier_call_chain+0x50/0x90
> >> [    7.813406]  __blocking_notifier_call_chain+0x4c/0x90
> >> [    7.813413]  blocking_notifier_call_chain+0x14/0x20
> >> [    7.813420]  fb_notifier_call_chain+0x1c/0x28
> >> [    7.813426]  register_framebuffer+0x1d0/0x2d8
> >> [    7.813533]  __drm_fb_helper_initial_config_and_unlock+0x1e8/0x350 [drm_kms_helper]
> >> [    7.813639]  drm_fb_helper_initial_config+0x40/0x58 [drm_kms_helper]
> >> [    7.813747]  drm_fbdev_cma_init_with_funcs+0x88/0x158 [drm_kms_helper]
> >> [    7.813855]  drm_fbdev_cma_init+0x14/0x28 [drm_kms_helper]
> >> [    7.813943]  vc4_kms_load+0xa4/0xf0 [vc4]
> >> [    7.814026]  vc4_drm_bind+0x100/0x168 [vc4]
> >> [    7.814038]  try_to_bring_up_master+0x144/0x1a8
> >> [    7.814044]  component_master_add_with_match+0x9c/0xe0
> >> [    7.814128]  vc4_platform_drm_probe+0xb4/0xf0 [vc4]
> >> [    7.814137]  platform_drv_probe+0x58/0xc0
> >> [    7.814146]  driver_probe_device+0x224/0x308
> >> [    7.814153]  __driver_attach+0xac/0xb0
> >> [    7.814161]  bus_for_each_dev+0x64/0xa0
> >> [    7.814169]  driver_attach+0x20/0x28
> >> [    7.814177]  bus_add_driver+0x108/0x228
> >> [    7.814184]  driver_register+0x60/0xf8
> >> [    7.814190]  __platform_driver_register+0x40/0x48
> >> [    7.814274]  vc4_drm_register+0x38/0x1000 [vc4]
> >> [    7.814283]  do_one_initcall+0x38/0x120
> >> [    7.814295]  do_init_module+0x58/0x1b8
> >> [    7.814304]  load_module+0x1fa8/0x2280
> >> [    7.814311]  SyS_finit_module+0xc0/0xd0
> >> [    7.814318]  __sys_trace_return+0x0/0x4
> >> [    7.814325] ---[ end trace 3348554eb91e19a1 ]---  
> >
> > Looks like I didn't test this code with CONFIG_REFCOUNT_FULL enabled :-/.
> >
> > Anyway, can you try to apply the following diff and let me know if it
> > fixes the problem?
> >
> > Thanks,
> >
> > Boris
> >  
> > --->8---  
> > diff --git a/drivers/gpu/drm/vc4/vc4_bo.c b/drivers/gpu/drm/vc4/vc4_bo.c
> > index 4ae45d7dac42..2decc8e2c79f 100644
> > --- a/drivers/gpu/drm/vc4/vc4_bo.c
> > +++ b/drivers/gpu/drm/vc4/vc4_bo.c
> > @@ -637,7 +637,8 @@ int vc4_bo_inc_usecnt(struct vc4_bo *bo)
> >         mutex_lock(&bo->madv_lock);
> >         switch (bo->madv) {
> >         case VC4_MADV_WILLNEED:
> > -               refcount_inc(&bo->usecnt);
> > +               if (!refcount_inc_not_zero(&bo->usecnt))
> > +                       refcount_set(&bo->usecnt, 1);  
> 
> This is racy.

Well, in this case it's not, because this section is protected by a
mutex.

> You need a refcount_inc_allow_zero (if that exists).

I searched for something like that, but it seems that there's no such
helper.
Daniel Vetter Nov. 23, 2017, 7:57 a.m. UTC | #4
On Wed, Nov 22, 2017 at 07:21:00PM +0100, Boris Brezillon wrote:
> On Wed, 22 Nov 2017 19:13:09 +0100
> Daniel Vetter <daniel@ffwll.ch> wrote:
> 
> > On Wed, Nov 22, 2017 at 6:51 PM, Boris Brezillon
> > <boris.brezillon@free-electrons.com> wrote:
> > > Hi Stefan,
> > >
> > > On Wed, 22 Nov 2017 17:43:35 +0100 (CET)
> > > Stefan Wahren <stefan.wahren@i2se.com> wrote:
> > >  
> > >> Hi Boris,
> > >> if i boot Raspberry Pi 3 (ARM64, defconfig, linux-next-20171122) with sufficient CMA memory (32 MB), i'll get this warning during boot:
> > >>
> > >> [    7.623510] vc4-drm soc:gpu: bound 3f902000.hdmi (ops vc4_hdmi_ops [vc4])
> > >> [    7.632453] vc4-drm soc:gpu: bound 3f806000.vec (ops vc4_vec_ops [vc4])
> > >> [    7.639707] vc4-drm soc:gpu: bound 3f400000.hvs (ops vc4_hvs_ops [vc4])
> > >> [    7.647364] vc4-drm soc:gpu: bound 3f206000.pixelvalve (ops vc4_crtc_ops [vc4])
> > >> [    7.655451] vc4-drm soc:gpu: bound 3f207000.pixelvalve (ops vc4_crtc_ops [vc4])
> > >> [    7.663415] vc4-drm soc:gpu: bound 3f807000.pixelvalve (ops vc4_crtc_ops [vc4])
> > >> [    7.730580] vc4-drm soc:gpu: bound 3fc00000.v3d (ops vc4_v3d_ops [vc4])
> > >> [    7.743965] [drm] Initialized vc4 0.0.0 20140616 for soc:gpu on minor 0
> > >> [    7.750841] [drm] Supports vblank timestamp caching Rev 2 (21.10.2013).
> > >> [    7.757620] [drm] Driver supports precise vblank timestamp query.
> > >> [    7.811775] ------------[ cut here ]------------
> > >> [    7.811780] refcount_t: increment on 0; use-after-free.
> > >> [    7.811881] WARNING: CPU: 2 PID: 2188 at lib/refcount.c:153 refcount_inc+0x44/0x50
> > >> [    7.811884] Modules linked in: vc4(+) cfg80211 cec drm_kms_helper smsc95xx usbnet drm rfkill brcmutil bcm2835_rng rng_core bcm2835_dma crc32_ce i2c_bcm2835 pwm_bcm2835 ip_tables x_tables ipv6
> > >> [    7.811950] CPU: 2 PID: 2188 Comm: systemd-udevd Not tainted 4.14.0-next-20171122 #1
> > >> [    7.811953] Hardware name: Raspberry Pi 3 Model B Rev 1.2 (DT)
> > >> [    7.811958] task: ffff800036b91c00 task.stack: ffff00000d6f0000
> > >> [    7.811967] pstate: 20000005 (nzCv daif -PAN -UAO)
> > >> [    7.811974] pc : refcount_inc+0x44/0x50
> > >> [    7.811981] lr : refcount_inc+0x44/0x50
> > >> [    7.811984] sp : ffff00000d6f3300
> > >> [    7.811988] x29: ffff00000d6f3300 x28: 0000000000000000
> > >> [    7.811996] x27: 0000000000000003 x26: ffff800037107800
> > >> [    7.812004] x25: 0000000000000001 x24: ffff800035afdc00
> > >> [    7.812012] x23: 0000000000000000 x22: ffff800035dfa600
> > >> [    7.812020] x21: ffff800035afd9b0 x20: ffff800035afd9a4
> > >> [    7.812027] x19: 0000000000000000 x18: 0000000000000000
> > >> [    7.812034] x17: 0000000000000001 x16: 0000000000000019
> > >> [    7.812042] x15: 0000000000000001 x14: 00000000fffffff0
> > >> [    7.812049] x13: ffff0000090ae840 x12: ffff000008fa2d50
> > >> [    7.812057] x11: ffff000008fa2000 x10: ffff0000090ad000
> > >> [    7.812064] x9 : 0000000000000000 x8 : ffff0000090b5c2f
> > >> [    7.812072] x7 : 0000000000000000 x6 : 000000000015ee00
> > >> [    7.812079] x5 : 0000000000000000 x4 : 0000000000000000
> > >> [    7.812087] x3 : ffffffffffffffff x2 : 0000800030047000
> > >> [    7.812094] x1 : ffff800036b91c00 x0 : 000000000000002b
> > >> [    7.812102] Call trace:
> > >> [    7.812109]  refcount_inc+0x44/0x50
> > >> [    7.812226]  vc4_bo_inc_usecnt+0x84/0x88 [vc4]
> > >> [    7.812310]  vc4_prepare_fb+0x40/0xf0 [vc4]
> > >> [    7.812460]  drm_atomic_helper_prepare_planes+0x54/0xf0 [drm_kms_helper]
> > >> [    7.812543]  vc4_atomic_commit+0x88/0x130 [vc4]
> > >> [    7.812868]  drm_atomic_commit+0x48/0x68 [drm]
> > >> [    7.813002]  restore_fbdev_mode_atomic+0x1d8/0x1e8 [drm_kms_helper]
> > >> [    7.813113]  restore_fbdev_mode+0x28/0x160 [drm_kms_helper]
> > >> [    7.813223]  drm_fb_helper_restore_fbdev_mode_unlocked.part.24+0x28/0x90 [drm_kms_helper]
> > >> [    7.813331]  drm_fb_helper_set_par+0x54/0xa8 [drm_kms_helper]
> > >> [    7.813346]  fbcon_init+0x4e8/0x538
> > >> [    7.813357]  visual_init+0xb4/0x108
> > >> [    7.813366]  do_bind_con_driver+0x1b8/0x3a0
> > >> [    7.813373]  do_take_over_console+0x150/0x1d0
> > >> [    7.813380]  do_fbcon_takeover+0x6c/0xf0
> > >> [    7.813387]  fbcon_event_notify+0x8fc/0x928
> > >> [    7.813399]  notifier_call_chain+0x50/0x90
> > >> [    7.813406]  __blocking_notifier_call_chain+0x4c/0x90
> > >> [    7.813413]  blocking_notifier_call_chain+0x14/0x20
> > >> [    7.813420]  fb_notifier_call_chain+0x1c/0x28
> > >> [    7.813426]  register_framebuffer+0x1d0/0x2d8
> > >> [    7.813533]  __drm_fb_helper_initial_config_and_unlock+0x1e8/0x350 [drm_kms_helper]
> > >> [    7.813639]  drm_fb_helper_initial_config+0x40/0x58 [drm_kms_helper]
> > >> [    7.813747]  drm_fbdev_cma_init_with_funcs+0x88/0x158 [drm_kms_helper]
> > >> [    7.813855]  drm_fbdev_cma_init+0x14/0x28 [drm_kms_helper]
> > >> [    7.813943]  vc4_kms_load+0xa4/0xf0 [vc4]
> > >> [    7.814026]  vc4_drm_bind+0x100/0x168 [vc4]
> > >> [    7.814038]  try_to_bring_up_master+0x144/0x1a8
> > >> [    7.814044]  component_master_add_with_match+0x9c/0xe0
> > >> [    7.814128]  vc4_platform_drm_probe+0xb4/0xf0 [vc4]
> > >> [    7.814137]  platform_drv_probe+0x58/0xc0
> > >> [    7.814146]  driver_probe_device+0x224/0x308
> > >> [    7.814153]  __driver_attach+0xac/0xb0
> > >> [    7.814161]  bus_for_each_dev+0x64/0xa0
> > >> [    7.814169]  driver_attach+0x20/0x28
> > >> [    7.814177]  bus_add_driver+0x108/0x228
> > >> [    7.814184]  driver_register+0x60/0xf8
> > >> [    7.814190]  __platform_driver_register+0x40/0x48
> > >> [    7.814274]  vc4_drm_register+0x38/0x1000 [vc4]
> > >> [    7.814283]  do_one_initcall+0x38/0x120
> > >> [    7.814295]  do_init_module+0x58/0x1b8
> > >> [    7.814304]  load_module+0x1fa8/0x2280
> > >> [    7.814311]  SyS_finit_module+0xc0/0xd0
> > >> [    7.814318]  __sys_trace_return+0x0/0x4
> > >> [    7.814325] ---[ end trace 3348554eb91e19a1 ]---  
> > >
> > > Looks like I didn't test this code with CONFIG_REFCOUNT_FULL enabled :-/.
> > >
> > > Anyway, can you try to apply the following diff and let me know if it
> > > fixes the problem?
> > >
> > > Thanks,
> > >
> > > Boris
> > >  
> > > --->8---  
> > > diff --git a/drivers/gpu/drm/vc4/vc4_bo.c b/drivers/gpu/drm/vc4/vc4_bo.c
> > > index 4ae45d7dac42..2decc8e2c79f 100644
> > > --- a/drivers/gpu/drm/vc4/vc4_bo.c
> > > +++ b/drivers/gpu/drm/vc4/vc4_bo.c
> > > @@ -637,7 +637,8 @@ int vc4_bo_inc_usecnt(struct vc4_bo *bo)
> > >         mutex_lock(&bo->madv_lock);
> > >         switch (bo->madv) {
> > >         case VC4_MADV_WILLNEED:
> > > -               refcount_inc(&bo->usecnt);
> > > +               if (!refcount_inc_not_zero(&bo->usecnt))
> > > +                       refcount_set(&bo->usecnt, 1);  
> > 
> > This is racy.
> 
> Well, in this case it's not, because this section is protected by a
> mutex.
> 
> > You need a refcount_inc_allow_zero (if that exists).
> 
> I searched for something like that, but it seems that there's no such
> helper.

In that case I think the right thing is to switch to atomic_t. That means
no overflow protection, but if we don't have a "this can be 0, I know what
we're doing" style refcount, then I don't think it's suitable.

Adding Kees to confirm or clarify or ...
-Daniel
Kees Cook Nov. 25, 2017, 8:57 p.m. UTC | #5
On Wed, Nov 22, 2017 at 11:57 PM, Daniel Vetter <daniel@ffwll.ch> wrote:
> On Wed, Nov 22, 2017 at 07:21:00PM +0100, Boris Brezillon wrote:
>> On Wed, 22 Nov 2017 19:13:09 +0100
>> Daniel Vetter <daniel@ffwll.ch> wrote:
>>
>> > On Wed, Nov 22, 2017 at 6:51 PM, Boris Brezillon
>> > <boris.brezillon@free-electrons.com> wrote:
>> > > Hi Stefan,
>> > >
>> > > On Wed, 22 Nov 2017 17:43:35 +0100 (CET)
>> > > Stefan Wahren <stefan.wahren@i2se.com> wrote:
>> > >
>> > >> Hi Boris,
>> > >> if i boot Raspberry Pi 3 (ARM64, defconfig, linux-next-20171122) with sufficient CMA memory (32 MB), i'll get this warning during boot:
>> > >>
>> > >> [    7.623510] vc4-drm soc:gpu: bound 3f902000.hdmi (ops vc4_hdmi_ops [vc4])
>> > >> [    7.632453] vc4-drm soc:gpu: bound 3f806000.vec (ops vc4_vec_ops [vc4])
>> > >> [    7.639707] vc4-drm soc:gpu: bound 3f400000.hvs (ops vc4_hvs_ops [vc4])
>> > >> [    7.647364] vc4-drm soc:gpu: bound 3f206000.pixelvalve (ops vc4_crtc_ops [vc4])
>> > >> [    7.655451] vc4-drm soc:gpu: bound 3f207000.pixelvalve (ops vc4_crtc_ops [vc4])
>> > >> [    7.663415] vc4-drm soc:gpu: bound 3f807000.pixelvalve (ops vc4_crtc_ops [vc4])
>> > >> [    7.730580] vc4-drm soc:gpu: bound 3fc00000.v3d (ops vc4_v3d_ops [vc4])
>> > >> [    7.743965] [drm] Initialized vc4 0.0.0 20140616 for soc:gpu on minor 0
>> > >> [    7.750841] [drm] Supports vblank timestamp caching Rev 2 (21.10.2013).
>> > >> [    7.757620] [drm] Driver supports precise vblank timestamp query.
>> > >> [    7.811775] ------------[ cut here ]------------
>> > >> [    7.811780] refcount_t: increment on 0; use-after-free.
>> > >> [    7.811881] WARNING: CPU: 2 PID: 2188 at lib/refcount.c:153 refcount_inc+0x44/0x50
>> > >> [    7.811884] Modules linked in: vc4(+) cfg80211 cec drm_kms_helper smsc95xx usbnet drm rfkill brcmutil bcm2835_rng rng_core bcm2835_dma crc32_ce i2c_bcm2835 pwm_bcm2835 ip_tables x_tables ipv6
>> > >> [    7.811950] CPU: 2 PID: 2188 Comm: systemd-udevd Not tainted 4.14.0-next-20171122 #1
>> > >> [    7.811953] Hardware name: Raspberry Pi 3 Model B Rev 1.2 (DT)
>> > >> [    7.811958] task: ffff800036b91c00 task.stack: ffff00000d6f0000
>> > >> [    7.811967] pstate: 20000005 (nzCv daif -PAN -UAO)
>> > >> [    7.811974] pc : refcount_inc+0x44/0x50
>> > >> [    7.811981] lr : refcount_inc+0x44/0x50
>> > >> [    7.811984] sp : ffff00000d6f3300
>> > >> [    7.811988] x29: ffff00000d6f3300 x28: 0000000000000000
>> > >> [    7.811996] x27: 0000000000000003 x26: ffff800037107800
>> > >> [    7.812004] x25: 0000000000000001 x24: ffff800035afdc00
>> > >> [    7.812012] x23: 0000000000000000 x22: ffff800035dfa600
>> > >> [    7.812020] x21: ffff800035afd9b0 x20: ffff800035afd9a4
>> > >> [    7.812027] x19: 0000000000000000 x18: 0000000000000000
>> > >> [    7.812034] x17: 0000000000000001 x16: 0000000000000019
>> > >> [    7.812042] x15: 0000000000000001 x14: 00000000fffffff0
>> > >> [    7.812049] x13: ffff0000090ae840 x12: ffff000008fa2d50
>> > >> [    7.812057] x11: ffff000008fa2000 x10: ffff0000090ad000
>> > >> [    7.812064] x9 : 0000000000000000 x8 : ffff0000090b5c2f
>> > >> [    7.812072] x7 : 0000000000000000 x6 : 000000000015ee00
>> > >> [    7.812079] x5 : 0000000000000000 x4 : 0000000000000000
>> > >> [    7.812087] x3 : ffffffffffffffff x2 : 0000800030047000
>> > >> [    7.812094] x1 : ffff800036b91c00 x0 : 000000000000002b
>> > >> [    7.812102] Call trace:
>> > >> [    7.812109]  refcount_inc+0x44/0x50
>> > >> [    7.812226]  vc4_bo_inc_usecnt+0x84/0x88 [vc4]
>> > >> [    7.812310]  vc4_prepare_fb+0x40/0xf0 [vc4]
>> > >> [    7.812460]  drm_atomic_helper_prepare_planes+0x54/0xf0 [drm_kms_helper]
>> > >> [    7.812543]  vc4_atomic_commit+0x88/0x130 [vc4]
>> > >> [    7.812868]  drm_atomic_commit+0x48/0x68 [drm]
>> > >> [    7.813002]  restore_fbdev_mode_atomic+0x1d8/0x1e8 [drm_kms_helper]
>> > >> [    7.813113]  restore_fbdev_mode+0x28/0x160 [drm_kms_helper]
>> > >> [    7.813223]  drm_fb_helper_restore_fbdev_mode_unlocked.part.24+0x28/0x90 [drm_kms_helper]
>> > >> [    7.813331]  drm_fb_helper_set_par+0x54/0xa8 [drm_kms_helper]
>> > >> [    7.813346]  fbcon_init+0x4e8/0x538
>> > >> [    7.813357]  visual_init+0xb4/0x108
>> > >> [    7.813366]  do_bind_con_driver+0x1b8/0x3a0
>> > >> [    7.813373]  do_take_over_console+0x150/0x1d0
>> > >> [    7.813380]  do_fbcon_takeover+0x6c/0xf0
>> > >> [    7.813387]  fbcon_event_notify+0x8fc/0x928
>> > >> [    7.813399]  notifier_call_chain+0x50/0x90
>> > >> [    7.813406]  __blocking_notifier_call_chain+0x4c/0x90
>> > >> [    7.813413]  blocking_notifier_call_chain+0x14/0x20
>> > >> [    7.813420]  fb_notifier_call_chain+0x1c/0x28
>> > >> [    7.813426]  register_framebuffer+0x1d0/0x2d8
>> > >> [    7.813533]  __drm_fb_helper_initial_config_and_unlock+0x1e8/0x350 [drm_kms_helper]
>> > >> [    7.813639]  drm_fb_helper_initial_config+0x40/0x58 [drm_kms_helper]
>> > >> [    7.813747]  drm_fbdev_cma_init_with_funcs+0x88/0x158 [drm_kms_helper]
>> > >> [    7.813855]  drm_fbdev_cma_init+0x14/0x28 [drm_kms_helper]
>> > >> [    7.813943]  vc4_kms_load+0xa4/0xf0 [vc4]
>> > >> [    7.814026]  vc4_drm_bind+0x100/0x168 [vc4]
>> > >> [    7.814038]  try_to_bring_up_master+0x144/0x1a8
>> > >> [    7.814044]  component_master_add_with_match+0x9c/0xe0
>> > >> [    7.814128]  vc4_platform_drm_probe+0xb4/0xf0 [vc4]
>> > >> [    7.814137]  platform_drv_probe+0x58/0xc0
>> > >> [    7.814146]  driver_probe_device+0x224/0x308
>> > >> [    7.814153]  __driver_attach+0xac/0xb0
>> > >> [    7.814161]  bus_for_each_dev+0x64/0xa0
>> > >> [    7.814169]  driver_attach+0x20/0x28
>> > >> [    7.814177]  bus_add_driver+0x108/0x228
>> > >> [    7.814184]  driver_register+0x60/0xf8
>> > >> [    7.814190]  __platform_driver_register+0x40/0x48
>> > >> [    7.814274]  vc4_drm_register+0x38/0x1000 [vc4]
>> > >> [    7.814283]  do_one_initcall+0x38/0x120
>> > >> [    7.814295]  do_init_module+0x58/0x1b8
>> > >> [    7.814304]  load_module+0x1fa8/0x2280
>> > >> [    7.814311]  SyS_finit_module+0xc0/0xd0
>> > >> [    7.814318]  __sys_trace_return+0x0/0x4
>> > >> [    7.814325] ---[ end trace 3348554eb91e19a1 ]---
>> > >
>> > > Looks like I didn't test this code with CONFIG_REFCOUNT_FULL enabled :-/.
>> > >
>> > > Anyway, can you try to apply the following diff and let me know if it
>> > > fixes the problem?
>> > >
>> > > Thanks,
>> > >
>> > > Boris
>> > >
>> > > --->8---
>> > > diff --git a/drivers/gpu/drm/vc4/vc4_bo.c b/drivers/gpu/drm/vc4/vc4_bo.c
>> > > index 4ae45d7dac42..2decc8e2c79f 100644
>> > > --- a/drivers/gpu/drm/vc4/vc4_bo.c
>> > > +++ b/drivers/gpu/drm/vc4/vc4_bo.c
>> > > @@ -637,7 +637,8 @@ int vc4_bo_inc_usecnt(struct vc4_bo *bo)
>> > >         mutex_lock(&bo->madv_lock);
>> > >         switch (bo->madv) {
>> > >         case VC4_MADV_WILLNEED:
>> > > -               refcount_inc(&bo->usecnt);
>> > > +               if (!refcount_inc_not_zero(&bo->usecnt))
>> > > +                       refcount_set(&bo->usecnt, 1);
>> >
>> > This is racy.
>>
>> Well, in this case it's not, because this section is protected by a
>> mutex.
>>
>> > You need a refcount_inc_allow_zero (if that exists).
>>
>> I searched for something like that, but it seems that there's no such
>> helper.
>
> In that case I think the right thing is to switch to atomic_t. That means
> no overflow protection, but if we don't have a "this can be 0, I know what
> we're doing" style refcount, then I don't think it's suitable.
>
> Adding Kees to confirm or clarify or ...

Adding Elena for more details. IIRC there have been two cases:

1) refcount == 0 doesn't mean the object has been released. The
solution tends to be a +1 to the count everywhere.

2) refcount == 0 means the object was released, but the refcount gets
reused for the next allocation. The solution tends to be to use
refcount_set() in the new initialization instead of refcount_inc().

It's not clear to me if either apply here, though.

-Kees
Boris BREZILLON Nov. 26, 2017, 8:02 p.m. UTC | #6
Hi Kees,

On Sat, 25 Nov 2017 12:57:04 -0800
Kees Cook <keescook@chromium.org> wrote:

> On Wed, Nov 22, 2017 at 11:57 PM, Daniel Vetter <daniel@ffwll.ch> wrote:
> > On Wed, Nov 22, 2017 at 07:21:00PM +0100, Boris Brezillon wrote:  
> >> On Wed, 22 Nov 2017 19:13:09 +0100
> >> Daniel Vetter <daniel@ffwll.ch> wrote:
> >>  
> >> > On Wed, Nov 22, 2017 at 6:51 PM, Boris Brezillon
> >> > <boris.brezillon@free-electrons.com> wrote:  
> >> > > Hi Stefan,
> >> > >
> >> > > On Wed, 22 Nov 2017 17:43:35 +0100 (CET)
> >> > > Stefan Wahren <stefan.wahren@i2se.com> wrote:
> >> > >  
> >> > >> Hi Boris,
> >> > >> if i boot Raspberry Pi 3 (ARM64, defconfig, linux-next-20171122) with sufficient CMA memory (32 MB), i'll get this warning during boot:
> >> > >>
> >> > >> [    7.623510] vc4-drm soc:gpu: bound 3f902000.hdmi (ops vc4_hdmi_ops [vc4])
> >> > >> [    7.632453] vc4-drm soc:gpu: bound 3f806000.vec (ops vc4_vec_ops [vc4])
> >> > >> [    7.639707] vc4-drm soc:gpu: bound 3f400000.hvs (ops vc4_hvs_ops [vc4])
> >> > >> [    7.647364] vc4-drm soc:gpu: bound 3f206000.pixelvalve (ops vc4_crtc_ops [vc4])
> >> > >> [    7.655451] vc4-drm soc:gpu: bound 3f207000.pixelvalve (ops vc4_crtc_ops [vc4])
> >> > >> [    7.663415] vc4-drm soc:gpu: bound 3f807000.pixelvalve (ops vc4_crtc_ops [vc4])
> >> > >> [    7.730580] vc4-drm soc:gpu: bound 3fc00000.v3d (ops vc4_v3d_ops [vc4])
> >> > >> [    7.743965] [drm] Initialized vc4 0.0.0 20140616 for soc:gpu on minor 0
> >> > >> [    7.750841] [drm] Supports vblank timestamp caching Rev 2 (21.10.2013).
> >> > >> [    7.757620] [drm] Driver supports precise vblank timestamp query.
> >> > >> [    7.811775] ------------[ cut here ]------------
> >> > >> [    7.811780] refcount_t: increment on 0; use-after-free.
> >> > >> [    7.811881] WARNING: CPU: 2 PID: 2188 at lib/refcount.c:153 refcount_inc+0x44/0x50
> >> > >> [    7.811884] Modules linked in: vc4(+) cfg80211 cec drm_kms_helper smsc95xx usbnet drm rfkill brcmutil bcm2835_rng rng_core bcm2835_dma crc32_ce i2c_bcm2835 pwm_bcm2835 ip_tables x_tables ipv6
> >> > >> [    7.811950] CPU: 2 PID: 2188 Comm: systemd-udevd Not tainted 4.14.0-next-20171122 #1
> >> > >> [    7.811953] Hardware name: Raspberry Pi 3 Model B Rev 1.2 (DT)
> >> > >> [    7.811958] task: ffff800036b91c00 task.stack: ffff00000d6f0000
> >> > >> [    7.811967] pstate: 20000005 (nzCv daif -PAN -UAO)
> >> > >> [    7.811974] pc : refcount_inc+0x44/0x50
> >> > >> [    7.811981] lr : refcount_inc+0x44/0x50
> >> > >> [    7.811984] sp : ffff00000d6f3300
> >> > >> [    7.811988] x29: ffff00000d6f3300 x28: 0000000000000000
> >> > >> [    7.811996] x27: 0000000000000003 x26: ffff800037107800
> >> > >> [    7.812004] x25: 0000000000000001 x24: ffff800035afdc00
> >> > >> [    7.812012] x23: 0000000000000000 x22: ffff800035dfa600
> >> > >> [    7.812020] x21: ffff800035afd9b0 x20: ffff800035afd9a4
> >> > >> [    7.812027] x19: 0000000000000000 x18: 0000000000000000
> >> > >> [    7.812034] x17: 0000000000000001 x16: 0000000000000019
> >> > >> [    7.812042] x15: 0000000000000001 x14: 00000000fffffff0
> >> > >> [    7.812049] x13: ffff0000090ae840 x12: ffff000008fa2d50
> >> > >> [    7.812057] x11: ffff000008fa2000 x10: ffff0000090ad000
> >> > >> [    7.812064] x9 : 0000000000000000 x8 : ffff0000090b5c2f
> >> > >> [    7.812072] x7 : 0000000000000000 x6 : 000000000015ee00
> >> > >> [    7.812079] x5 : 0000000000000000 x4 : 0000000000000000
> >> > >> [    7.812087] x3 : ffffffffffffffff x2 : 0000800030047000
> >> > >> [    7.812094] x1 : ffff800036b91c00 x0 : 000000000000002b
> >> > >> [    7.812102] Call trace:
> >> > >> [    7.812109]  refcount_inc+0x44/0x50
> >> > >> [    7.812226]  vc4_bo_inc_usecnt+0x84/0x88 [vc4]
> >> > >> [    7.812310]  vc4_prepare_fb+0x40/0xf0 [vc4]
> >> > >> [    7.812460]  drm_atomic_helper_prepare_planes+0x54/0xf0 [drm_kms_helper]
> >> > >> [    7.812543]  vc4_atomic_commit+0x88/0x130 [vc4]
> >> > >> [    7.812868]  drm_atomic_commit+0x48/0x68 [drm]
> >> > >> [    7.813002]  restore_fbdev_mode_atomic+0x1d8/0x1e8 [drm_kms_helper]
> >> > >> [    7.813113]  restore_fbdev_mode+0x28/0x160 [drm_kms_helper]
> >> > >> [    7.813223]  drm_fb_helper_restore_fbdev_mode_unlocked.part.24+0x28/0x90 [drm_kms_helper]
> >> > >> [    7.813331]  drm_fb_helper_set_par+0x54/0xa8 [drm_kms_helper]
> >> > >> [    7.813346]  fbcon_init+0x4e8/0x538
> >> > >> [    7.813357]  visual_init+0xb4/0x108
> >> > >> [    7.813366]  do_bind_con_driver+0x1b8/0x3a0
> >> > >> [    7.813373]  do_take_over_console+0x150/0x1d0
> >> > >> [    7.813380]  do_fbcon_takeover+0x6c/0xf0
> >> > >> [    7.813387]  fbcon_event_notify+0x8fc/0x928
> >> > >> [    7.813399]  notifier_call_chain+0x50/0x90
> >> > >> [    7.813406]  __blocking_notifier_call_chain+0x4c/0x90
> >> > >> [    7.813413]  blocking_notifier_call_chain+0x14/0x20
> >> > >> [    7.813420]  fb_notifier_call_chain+0x1c/0x28
> >> > >> [    7.813426]  register_framebuffer+0x1d0/0x2d8
> >> > >> [    7.813533]  __drm_fb_helper_initial_config_and_unlock+0x1e8/0x350 [drm_kms_helper]
> >> > >> [    7.813639]  drm_fb_helper_initial_config+0x40/0x58 [drm_kms_helper]
> >> > >> [    7.813747]  drm_fbdev_cma_init_with_funcs+0x88/0x158 [drm_kms_helper]
> >> > >> [    7.813855]  drm_fbdev_cma_init+0x14/0x28 [drm_kms_helper]
> >> > >> [    7.813943]  vc4_kms_load+0xa4/0xf0 [vc4]
> >> > >> [    7.814026]  vc4_drm_bind+0x100/0x168 [vc4]
> >> > >> [    7.814038]  try_to_bring_up_master+0x144/0x1a8
> >> > >> [    7.814044]  component_master_add_with_match+0x9c/0xe0
> >> > >> [    7.814128]  vc4_platform_drm_probe+0xb4/0xf0 [vc4]
> >> > >> [    7.814137]  platform_drv_probe+0x58/0xc0
> >> > >> [    7.814146]  driver_probe_device+0x224/0x308
> >> > >> [    7.814153]  __driver_attach+0xac/0xb0
> >> > >> [    7.814161]  bus_for_each_dev+0x64/0xa0
> >> > >> [    7.814169]  driver_attach+0x20/0x28
> >> > >> [    7.814177]  bus_add_driver+0x108/0x228
> >> > >> [    7.814184]  driver_register+0x60/0xf8
> >> > >> [    7.814190]  __platform_driver_register+0x40/0x48
> >> > >> [    7.814274]  vc4_drm_register+0x38/0x1000 [vc4]
> >> > >> [    7.814283]  do_one_initcall+0x38/0x120
> >> > >> [    7.814295]  do_init_module+0x58/0x1b8
> >> > >> [    7.814304]  load_module+0x1fa8/0x2280
> >> > >> [    7.814311]  SyS_finit_module+0xc0/0xd0
> >> > >> [    7.814318]  __sys_trace_return+0x0/0x4
> >> > >> [    7.814325] ---[ end trace 3348554eb91e19a1 ]---  
> >> > >
> >> > > Looks like I didn't test this code with CONFIG_REFCOUNT_FULL enabled :-/.
> >> > >
> >> > > Anyway, can you try to apply the following diff and let me know if it
> >> > > fixes the problem?
> >> > >
> >> > > Thanks,
> >> > >
> >> > > Boris
> >> > >  
> >> > > --->8---  
> >> > > diff --git a/drivers/gpu/drm/vc4/vc4_bo.c b/drivers/gpu/drm/vc4/vc4_bo.c
> >> > > index 4ae45d7dac42..2decc8e2c79f 100644
> >> > > --- a/drivers/gpu/drm/vc4/vc4_bo.c
> >> > > +++ b/drivers/gpu/drm/vc4/vc4_bo.c
> >> > > @@ -637,7 +637,8 @@ int vc4_bo_inc_usecnt(struct vc4_bo *bo)
> >> > >         mutex_lock(&bo->madv_lock);
> >> > >         switch (bo->madv) {
> >> > >         case VC4_MADV_WILLNEED:
> >> > > -               refcount_inc(&bo->usecnt);
> >> > > +               if (!refcount_inc_not_zero(&bo->usecnt))
> >> > > +                       refcount_set(&bo->usecnt, 1);  
> >> >
> >> > This is racy.  
> >>
> >> Well, in this case it's not, because this section is protected by a
> >> mutex.
> >>  
> >> > You need a refcount_inc_allow_zero (if that exists).  
> >>
> >> I searched for something like that, but it seems that there's no such
> >> helper.  
> >
> > In that case I think the right thing is to switch to atomic_t. That means
> > no overflow protection, but if we don't have a "this can be 0, I know what
> > we're doing" style refcount, then I don't think it's suitable.
> >
> > Adding Kees to confirm or clarify or ...  
> 
> Adding Elena for more details. IIRC there have been two cases:
> 
> 1) refcount == 0 doesn't mean the object has been released. The
> solution tends to be a +1 to the count everywhere.
> 
> 2) refcount == 0 means the object was released, but the refcount gets
> reused for the next allocation. The solution tends to be to use
> refcount_set() in the new initialization instead of refcount_inc().
> 
> It's not clear to me if either apply here, though.

I'd say #1, since ->usecnt is not controlling the VC4-BO object
lifetime. This being said, I'm not sure the +1 trick is really
appropriate. Indeed, we're using the
refcount_inc_not_zero()/refcount_dec_not_one() helpers to remove/put
the object from/in a
"mem-pointed-by-obj-can-be-freed-under-mem-pressure" pool (note the
object itself is not freed, just the memory portion that is pointed by
this object). If I apply +1 everywhere then I have no way to handle the
0 -> 1 and 1 -> 0 transitions differently (these become 1 -> 2 and 2 ->
1 transitions, and there's no functions called refcount_dec_not_two()
and refcount_inc_not_one()).

Anyway, if refcount_inc_allow_zero() is not option, maybe I should just
use an atomic_t object (as suggested by Daniel) or a plain unsigned
integer protected by the ->madv.lock (as suggested by Eric). Of course,
that means losing built-in counter-overflow checking and reworking a bit
the VC4-MADV logic, but that's not such a big problem. 

Regards,

Boris
Eric Anholt Nov. 27, 2017, 1:40 a.m. UTC | #7
Boris Brezillon <boris.brezillon@free-electrons.com> writes:

> Hi Kees,
>
> On Sat, 25 Nov 2017 12:57:04 -0800
> Kees Cook <keescook@chromium.org> wrote:
>
>> On Wed, Nov 22, 2017 at 11:57 PM, Daniel Vetter <daniel@ffwll.ch> wrote:
>> > On Wed, Nov 22, 2017 at 07:21:00PM +0100, Boris Brezillon wrote:  
>> >> On Wed, 22 Nov 2017 19:13:09 +0100
>> >> Daniel Vetter <daniel@ffwll.ch> wrote:
>> >>  
>> >> > On Wed, Nov 22, 2017 at 6:51 PM, Boris Brezillon
>> >> > <boris.brezillon@free-electrons.com> wrote:  
>> >> > > Hi Stefan,
>> >> > >
>> >> > > On Wed, 22 Nov 2017 17:43:35 +0100 (CET)
>> >> > > Stefan Wahren <stefan.wahren@i2se.com> wrote:
>> >> > >  
>> >> > >> Hi Boris,
>> >> > >> if i boot Raspberry Pi 3 (ARM64, defconfig, linux-next-20171122) with sufficient CMA memory (32 MB), i'll get this warning during boot:
>> >> > >>
>> >> > >> [    7.623510] vc4-drm soc:gpu: bound 3f902000.hdmi (ops vc4_hdmi_ops [vc4])
>> >> > >> [    7.632453] vc4-drm soc:gpu: bound 3f806000.vec (ops vc4_vec_ops [vc4])
>> >> > >> [    7.639707] vc4-drm soc:gpu: bound 3f400000.hvs (ops vc4_hvs_ops [vc4])
>> >> > >> [    7.647364] vc4-drm soc:gpu: bound 3f206000.pixelvalve (ops vc4_crtc_ops [vc4])
>> >> > >> [    7.655451] vc4-drm soc:gpu: bound 3f207000.pixelvalve (ops vc4_crtc_ops [vc4])
>> >> > >> [    7.663415] vc4-drm soc:gpu: bound 3f807000.pixelvalve (ops vc4_crtc_ops [vc4])
>> >> > >> [    7.730580] vc4-drm soc:gpu: bound 3fc00000.v3d (ops vc4_v3d_ops [vc4])
>> >> > >> [    7.743965] [drm] Initialized vc4 0.0.0 20140616 for soc:gpu on minor 0
>> >> > >> [    7.750841] [drm] Supports vblank timestamp caching Rev 2 (21.10.2013).
>> >> > >> [    7.757620] [drm] Driver supports precise vblank timestamp query.
>> >> > >> [    7.811775] ------------[ cut here ]------------
>> >> > >> [    7.811780] refcount_t: increment on 0; use-after-free.
>> >> > >> [    7.811881] WARNING: CPU: 2 PID: 2188 at lib/refcount.c:153 refcount_inc+0x44/0x50
>> >> > >> [    7.811884] Modules linked in: vc4(+) cfg80211 cec drm_kms_helper smsc95xx usbnet drm rfkill brcmutil bcm2835_rng rng_core bcm2835_dma crc32_ce i2c_bcm2835 pwm_bcm2835 ip_tables x_tables ipv6
>> >> > >> [    7.811950] CPU: 2 PID: 2188 Comm: systemd-udevd Not tainted 4.14.0-next-20171122 #1
>> >> > >> [    7.811953] Hardware name: Raspberry Pi 3 Model B Rev 1.2 (DT)
>> >> > >> [    7.811958] task: ffff800036b91c00 task.stack: ffff00000d6f0000
>> >> > >> [    7.811967] pstate: 20000005 (nzCv daif -PAN -UAO)
>> >> > >> [    7.811974] pc : refcount_inc+0x44/0x50
>> >> > >> [    7.811981] lr : refcount_inc+0x44/0x50
>> >> > >> [    7.811984] sp : ffff00000d6f3300
>> >> > >> [    7.811988] x29: ffff00000d6f3300 x28: 0000000000000000
>> >> > >> [    7.811996] x27: 0000000000000003 x26: ffff800037107800
>> >> > >> [    7.812004] x25: 0000000000000001 x24: ffff800035afdc00
>> >> > >> [    7.812012] x23: 0000000000000000 x22: ffff800035dfa600
>> >> > >> [    7.812020] x21: ffff800035afd9b0 x20: ffff800035afd9a4
>> >> > >> [    7.812027] x19: 0000000000000000 x18: 0000000000000000
>> >> > >> [    7.812034] x17: 0000000000000001 x16: 0000000000000019
>> >> > >> [    7.812042] x15: 0000000000000001 x14: 00000000fffffff0
>> >> > >> [    7.812049] x13: ffff0000090ae840 x12: ffff000008fa2d50
>> >> > >> [    7.812057] x11: ffff000008fa2000 x10: ffff0000090ad000
>> >> > >> [    7.812064] x9 : 0000000000000000 x8 : ffff0000090b5c2f
>> >> > >> [    7.812072] x7 : 0000000000000000 x6 : 000000000015ee00
>> >> > >> [    7.812079] x5 : 0000000000000000 x4 : 0000000000000000
>> >> > >> [    7.812087] x3 : ffffffffffffffff x2 : 0000800030047000
>> >> > >> [    7.812094] x1 : ffff800036b91c00 x0 : 000000000000002b
>> >> > >> [    7.812102] Call trace:
>> >> > >> [    7.812109]  refcount_inc+0x44/0x50
>> >> > >> [    7.812226]  vc4_bo_inc_usecnt+0x84/0x88 [vc4]
>> >> > >> [    7.812310]  vc4_prepare_fb+0x40/0xf0 [vc4]
>> >> > >> [    7.812460]  drm_atomic_helper_prepare_planes+0x54/0xf0 [drm_kms_helper]
>> >> > >> [    7.812543]  vc4_atomic_commit+0x88/0x130 [vc4]
>> >> > >> [    7.812868]  drm_atomic_commit+0x48/0x68 [drm]
>> >> > >> [    7.813002]  restore_fbdev_mode_atomic+0x1d8/0x1e8 [drm_kms_helper]
>> >> > >> [    7.813113]  restore_fbdev_mode+0x28/0x160 [drm_kms_helper]
>> >> > >> [    7.813223]  drm_fb_helper_restore_fbdev_mode_unlocked.part.24+0x28/0x90 [drm_kms_helper]
>> >> > >> [    7.813331]  drm_fb_helper_set_par+0x54/0xa8 [drm_kms_helper]
>> >> > >> [    7.813346]  fbcon_init+0x4e8/0x538
>> >> > >> [    7.813357]  visual_init+0xb4/0x108
>> >> > >> [    7.813366]  do_bind_con_driver+0x1b8/0x3a0
>> >> > >> [    7.813373]  do_take_over_console+0x150/0x1d0
>> >> > >> [    7.813380]  do_fbcon_takeover+0x6c/0xf0
>> >> > >> [    7.813387]  fbcon_event_notify+0x8fc/0x928
>> >> > >> [    7.813399]  notifier_call_chain+0x50/0x90
>> >> > >> [    7.813406]  __blocking_notifier_call_chain+0x4c/0x90
>> >> > >> [    7.813413]  blocking_notifier_call_chain+0x14/0x20
>> >> > >> [    7.813420]  fb_notifier_call_chain+0x1c/0x28
>> >> > >> [    7.813426]  register_framebuffer+0x1d0/0x2d8
>> >> > >> [    7.813533]  __drm_fb_helper_initial_config_and_unlock+0x1e8/0x350 [drm_kms_helper]
>> >> > >> [    7.813639]  drm_fb_helper_initial_config+0x40/0x58 [drm_kms_helper]
>> >> > >> [    7.813747]  drm_fbdev_cma_init_with_funcs+0x88/0x158 [drm_kms_helper]
>> >> > >> [    7.813855]  drm_fbdev_cma_init+0x14/0x28 [drm_kms_helper]
>> >> > >> [    7.813943]  vc4_kms_load+0xa4/0xf0 [vc4]
>> >> > >> [    7.814026]  vc4_drm_bind+0x100/0x168 [vc4]
>> >> > >> [    7.814038]  try_to_bring_up_master+0x144/0x1a8
>> >> > >> [    7.814044]  component_master_add_with_match+0x9c/0xe0
>> >> > >> [    7.814128]  vc4_platform_drm_probe+0xb4/0xf0 [vc4]
>> >> > >> [    7.814137]  platform_drv_probe+0x58/0xc0
>> >> > >> [    7.814146]  driver_probe_device+0x224/0x308
>> >> > >> [    7.814153]  __driver_attach+0xac/0xb0
>> >> > >> [    7.814161]  bus_for_each_dev+0x64/0xa0
>> >> > >> [    7.814169]  driver_attach+0x20/0x28
>> >> > >> [    7.814177]  bus_add_driver+0x108/0x228
>> >> > >> [    7.814184]  driver_register+0x60/0xf8
>> >> > >> [    7.814190]  __platform_driver_register+0x40/0x48
>> >> > >> [    7.814274]  vc4_drm_register+0x38/0x1000 [vc4]
>> >> > >> [    7.814283]  do_one_initcall+0x38/0x120
>> >> > >> [    7.814295]  do_init_module+0x58/0x1b8
>> >> > >> [    7.814304]  load_module+0x1fa8/0x2280
>> >> > >> [    7.814311]  SyS_finit_module+0xc0/0xd0
>> >> > >> [    7.814318]  __sys_trace_return+0x0/0x4
>> >> > >> [    7.814325] ---[ end trace 3348554eb91e19a1 ]---  
>> >> > >
>> >> > > Looks like I didn't test this code with CONFIG_REFCOUNT_FULL enabled :-/.
>> >> > >
>> >> > > Anyway, can you try to apply the following diff and let me know if it
>> >> > > fixes the problem?
>> >> > >
>> >> > > Thanks,
>> >> > >
>> >> > > Boris
>> >> > >  
>> >> > > --->8---  
>> >> > > diff --git a/drivers/gpu/drm/vc4/vc4_bo.c b/drivers/gpu/drm/vc4/vc4_bo.c
>> >> > > index 4ae45d7dac42..2decc8e2c79f 100644
>> >> > > --- a/drivers/gpu/drm/vc4/vc4_bo.c
>> >> > > +++ b/drivers/gpu/drm/vc4/vc4_bo.c
>> >> > > @@ -637,7 +637,8 @@ int vc4_bo_inc_usecnt(struct vc4_bo *bo)
>> >> > >         mutex_lock(&bo->madv_lock);
>> >> > >         switch (bo->madv) {
>> >> > >         case VC4_MADV_WILLNEED:
>> >> > > -               refcount_inc(&bo->usecnt);
>> >> > > +               if (!refcount_inc_not_zero(&bo->usecnt))
>> >> > > +                       refcount_set(&bo->usecnt, 1);  
>> >> >
>> >> > This is racy.  
>> >>
>> >> Well, in this case it's not, because this section is protected by a
>> >> mutex.
>> >>  
>> >> > You need a refcount_inc_allow_zero (if that exists).  
>> >>
>> >> I searched for something like that, but it seems that there's no such
>> >> helper.  
>> >
>> > In that case I think the right thing is to switch to atomic_t. That means
>> > no overflow protection, but if we don't have a "this can be 0, I know what
>> > we're doing" style refcount, then I don't think it's suitable.
>> >
>> > Adding Kees to confirm or clarify or ...  
>> 
>> Adding Elena for more details. IIRC there have been two cases:
>> 
>> 1) refcount == 0 doesn't mean the object has been released. The
>> solution tends to be a +1 to the count everywhere.
>> 
>> 2) refcount == 0 means the object was released, but the refcount gets
>> reused for the next allocation. The solution tends to be to use
>> refcount_set() in the new initialization instead of refcount_inc().
>> 
>> It's not clear to me if either apply here, though.
>
> I'd say #1, since ->usecnt is not controlling the VC4-BO object
> lifetime. This being said, I'm not sure the +1 trick is really
> appropriate. Indeed, we're using the
> refcount_inc_not_zero()/refcount_dec_not_one() helpers to remove/put
> the object from/in a
> "mem-pointed-by-obj-can-be-freed-under-mem-pressure" pool (note the
> object itself is not freed, just the memory portion that is pointed by
> this object). If I apply +1 everywhere then I have no way to handle the
> 0 -> 1 and 1 -> 0 transitions differently (these become 1 -> 2 and 2 ->
> 1 transitions, and there's no functions called refcount_dec_not_two()
> and refcount_inc_not_one()).
>
> Anyway, if refcount_inc_allow_zero() is not option, maybe I should just
> use an atomic_t object (as suggested by Daniel) or a plain unsigned
> integer protected by the ->madv.lock (as suggested by Eric). Of course,
> that means losing built-in counter-overflow checking and reworking a bit
> the VC4-MADV logic, but that's not such a big problem. 

I was profiling minetest last week, and the madv lock was at least 10%
of the kernel overhead, since we need to grab it for each BO that goes
idle->busy and also each BO that goes busy->idle in a job.  Given that
we don't reallocate out of the userspace BO cache until it goes idle,
that's potentially a lot of BOs per job, and we'd probably be better off
with the global bo_lock.
Reshetova, Elena Nov. 28, 2017, 10:25 a.m. UTC | #8
Hi and sorry for the delayed reply!

 
> Hi Kees,
> 
> On Sat, 25 Nov 2017 12:57:04 -0800
> Kees Cook <keescook@chromium.org> wrote:
> 
> > On Wed, Nov 22, 2017 at 11:57 PM, Daniel Vetter <daniel@ffwll.ch> wrote:
> > > On Wed, Nov 22, 2017 at 07:21:00PM +0100, Boris Brezillon wrote:
> > >> On Wed, 22 Nov 2017 19:13:09 +0100
> > >> Daniel Vetter <daniel@ffwll.ch> wrote:
> > >>
> > >> > On Wed, Nov 22, 2017 at 6:51 PM, Boris Brezillon
> > >> > <boris.brezillon@free-electrons.com> wrote:
> > >> > > Hi Stefan,
> > >> > >
> > >> > > On Wed, 22 Nov 2017 17:43:35 +0100 (CET)
> > >> > > Stefan Wahren <stefan.wahren@i2se.com> wrote:
> > >> > >
> > >> > >> Hi Boris,
> > >> > >> if i boot Raspberry Pi 3 (ARM64, defconfig, linux-next-20171122) with
> sufficient CMA memory (32 MB), i'll get this warning during boot:
> > >> > >>
> > >> > >> [    7.623510] vc4-drm soc:gpu: bound 3f902000.hdmi (ops vc4_hdmi_ops
> [vc4])
> > >> > >> [    7.632453] vc4-drm soc:gpu: bound 3f806000.vec (ops vc4_vec_ops
> [vc4])
> > >> > >> [    7.639707] vc4-drm soc:gpu: bound 3f400000.hvs (ops vc4_hvs_ops
> [vc4])
> > >> > >> [    7.647364] vc4-drm soc:gpu: bound 3f206000.pixelvalve (ops
> vc4_crtc_ops [vc4])
> > >> > >> [    7.655451] vc4-drm soc:gpu: bound 3f207000.pixelvalve (ops
> vc4_crtc_ops [vc4])
> > >> > >> [    7.663415] vc4-drm soc:gpu: bound 3f807000.pixelvalve (ops
> vc4_crtc_ops [vc4])
> > >> > >> [    7.730580] vc4-drm soc:gpu: bound 3fc00000.v3d (ops vc4_v3d_ops
> [vc4])
> > >> > >> [    7.743965] [drm] Initialized vc4 0.0.0 20140616 for soc:gpu on minor 0
> > >> > >> [    7.750841] [drm] Supports vblank timestamp caching Rev 2 (21.10.2013).
> > >> > >> [    7.757620] [drm] Driver supports precise vblank timestamp query.
> > >> > >> [    7.811775] ------------[ cut here ]------------
> > >> > >> [    7.811780] refcount_t: increment on 0; use-after-free.
> > >> > >> [    7.811881] WARNING: CPU: 2 PID: 2188 at lib/refcount.c:153
> refcount_inc+0x44/0x50
> > >> > >> [    7.811884] Modules linked in: vc4(+) cfg80211 cec drm_kms_helper
> smsc95xx usbnet drm rfkill brcmutil bcm2835_rng rng_core bcm2835_dma crc32_ce
> i2c_bcm2835 pwm_bcm2835 ip_tables x_tables ipv6
> > >> > >> [    7.811950] CPU: 2 PID: 2188 Comm: systemd-udevd Not tainted 4.14.0-
> next-20171122 #1
> > >> > >> [    7.811953] Hardware name: Raspberry Pi 3 Model B Rev 1.2 (DT)
> > >> > >> [    7.811958] task: ffff800036b91c00 task.stack: ffff00000d6f0000
> > >> > >> [    7.811967] pstate: 20000005 (nzCv daif -PAN -UAO)
> > >> > >> [    7.811974] pc : refcount_inc+0x44/0x50
> > >> > >> [    7.811981] lr : refcount_inc+0x44/0x50
> > >> > >> [    7.811984] sp : ffff00000d6f3300
> > >> > >> [    7.811988] x29: ffff00000d6f3300 x28: 0000000000000000
> > >> > >> [    7.811996] x27: 0000000000000003 x26: ffff800037107800
> > >> > >> [    7.812004] x25: 0000000000000001 x24: ffff800035afdc00
> > >> > >> [    7.812012] x23: 0000000000000000 x22: ffff800035dfa600
> > >> > >> [    7.812020] x21: ffff800035afd9b0 x20: ffff800035afd9a4
> > >> > >> [    7.812027] x19: 0000000000000000 x18: 0000000000000000
> > >> > >> [    7.812034] x17: 0000000000000001 x16: 0000000000000019
> > >> > >> [    7.812042] x15: 0000000000000001 x14: 00000000fffffff0
> > >> > >> [    7.812049] x13: ffff0000090ae840 x12: ffff000008fa2d50
> > >> > >> [    7.812057] x11: ffff000008fa2000 x10: ffff0000090ad000
> > >> > >> [    7.812064] x9 : 0000000000000000 x8 : ffff0000090b5c2f
> > >> > >> [    7.812072] x7 : 0000000000000000 x6 : 000000000015ee00
> > >> > >> [    7.812079] x5 : 0000000000000000 x4 : 0000000000000000
> > >> > >> [    7.812087] x3 : ffffffffffffffff x2 : 0000800030047000
> > >> > >> [    7.812094] x1 : ffff800036b91c00 x0 : 000000000000002b
> > >> > >> [    7.812102] Call trace:
> > >> > >> [    7.812109]  refcount_inc+0x44/0x50
> > >> > >> [    7.812226]  vc4_bo_inc_usecnt+0x84/0x88 [vc4]
> > >> > >> [    7.812310]  vc4_prepare_fb+0x40/0xf0 [vc4]
> > >> > >> [    7.812460]  drm_atomic_helper_prepare_planes+0x54/0xf0
> [drm_kms_helper]
> > >> > >> [    7.812543]  vc4_atomic_commit+0x88/0x130 [vc4]
> > >> > >> [    7.812868]  drm_atomic_commit+0x48/0x68 [drm]
> > >> > >> [    7.813002]  restore_fbdev_mode_atomic+0x1d8/0x1e8
> [drm_kms_helper]
> > >> > >> [    7.813113]  restore_fbdev_mode+0x28/0x160 [drm_kms_helper]
> > >> > >> [    7.813223]
> drm_fb_helper_restore_fbdev_mode_unlocked.part.24+0x28/0x90
> [drm_kms_helper]
> > >> > >> [    7.813331]  drm_fb_helper_set_par+0x54/0xa8 [drm_kms_helper]
> > >> > >> [    7.813346]  fbcon_init+0x4e8/0x538
> > >> > >> [    7.813357]  visual_init+0xb4/0x108
> > >> > >> [    7.813366]  do_bind_con_driver+0x1b8/0x3a0
> > >> > >> [    7.813373]  do_take_over_console+0x150/0x1d0
> > >> > >> [    7.813380]  do_fbcon_takeover+0x6c/0xf0
> > >> > >> [    7.813387]  fbcon_event_notify+0x8fc/0x928
> > >> > >> [    7.813399]  notifier_call_chain+0x50/0x90
> > >> > >> [    7.813406]  __blocking_notifier_call_chain+0x4c/0x90
> > >> > >> [    7.813413]  blocking_notifier_call_chain+0x14/0x20
> > >> > >> [    7.813420]  fb_notifier_call_chain+0x1c/0x28
> > >> > >> [    7.813426]  register_framebuffer+0x1d0/0x2d8
> > >> > >> [    7.813533]  __drm_fb_helper_initial_config_and_unlock+0x1e8/0x350
> [drm_kms_helper]
> > >> > >> [    7.813639]  drm_fb_helper_initial_config+0x40/0x58 [drm_kms_helper]
> > >> > >> [    7.813747]  drm_fbdev_cma_init_with_funcs+0x88/0x158
> [drm_kms_helper]
> > >> > >> [    7.813855]  drm_fbdev_cma_init+0x14/0x28 [drm_kms_helper]
> > >> > >> [    7.813943]  vc4_kms_load+0xa4/0xf0 [vc4]
> > >> > >> [    7.814026]  vc4_drm_bind+0x100/0x168 [vc4]
> > >> > >> [    7.814038]  try_to_bring_up_master+0x144/0x1a8
> > >> > >> [    7.814044]  component_master_add_with_match+0x9c/0xe0
> > >> > >> [    7.814128]  vc4_platform_drm_probe+0xb4/0xf0 [vc4]
> > >> > >> [    7.814137]  platform_drv_probe+0x58/0xc0
> > >> > >> [    7.814146]  driver_probe_device+0x224/0x308
> > >> > >> [    7.814153]  __driver_attach+0xac/0xb0
> > >> > >> [    7.814161]  bus_for_each_dev+0x64/0xa0
> > >> > >> [    7.814169]  driver_attach+0x20/0x28
> > >> > >> [    7.814177]  bus_add_driver+0x108/0x228
> > >> > >> [    7.814184]  driver_register+0x60/0xf8
> > >> > >> [    7.814190]  __platform_driver_register+0x40/0x48
> > >> > >> [    7.814274]  vc4_drm_register+0x38/0x1000 [vc4]
> > >> > >> [    7.814283]  do_one_initcall+0x38/0x120
> > >> > >> [    7.814295]  do_init_module+0x58/0x1b8
> > >> > >> [    7.814304]  load_module+0x1fa8/0x2280
> > >> > >> [    7.814311]  SyS_finit_module+0xc0/0xd0
> > >> > >> [    7.814318]  __sys_trace_return+0x0/0x4
> > >> > >> [    7.814325] ---[ end trace 3348554eb91e19a1 ]---
> > >> > >
> > >> > > Looks like I didn't test this code with CONFIG_REFCOUNT_FULL enabled :-/.
> > >> > >
> > >> > > Anyway, can you try to apply the following diff and let me know if it
> > >> > > fixes the problem?
> > >> > >
> > >> > > Thanks,
> > >> > >
> > >> > > Boris
> > >> > >
> > >> > > --->8---
> > >> > > diff --git a/drivers/gpu/drm/vc4/vc4_bo.c b/drivers/gpu/drm/vc4/vc4_bo.c
> > >> > > index 4ae45d7dac42..2decc8e2c79f 100644
> > >> > > --- a/drivers/gpu/drm/vc4/vc4_bo.c
> > >> > > +++ b/drivers/gpu/drm/vc4/vc4_bo.c
> > >> > > @@ -637,7 +637,8 @@ int vc4_bo_inc_usecnt(struct vc4_bo *bo)
> > >> > >         mutex_lock(&bo->madv_lock);
> > >> > >         switch (bo->madv) {
> > >> > >         case VC4_MADV_WILLNEED:
> > >> > > -               refcount_inc(&bo->usecnt);
> > >> > > +               if (!refcount_inc_not_zero(&bo->usecnt))
> > >> > > +                       refcount_set(&bo->usecnt, 1);
> > >> >
> > >> > This is racy.
> > >>
> > >> Well, in this case it's not, because this section is protected by a
> > >> mutex.
> > >>
> > >> > You need a refcount_inc_allow_zero (if that exists).
> > >>
> > >> I searched for something like that, but it seems that there's no such
> > >> helper.
> > >
> > > In that case I think the right thing is to switch to atomic_t. That means
> > > no overflow protection, but if we don't have a "this can be 0, I know what
> > > we're doing" style refcount, then I don't think it's suitable.
> > >
> > > Adding Kees to confirm or clarify or ...
> >
> > Adding Elena for more details. IIRC there have been two cases:
> >
> > 1) refcount == 0 doesn't mean the object has been released. The
> > solution tends to be a +1 to the count everywhere.
> >
> > 2) refcount == 0 means the object was released, but the refcount gets
> > reused for the next allocation. The solution tends to be to use
> > refcount_set() in the new initialization instead of refcount_inc().
> >
> > It's not clear to me if either apply here, though.
> 
> I'd say #1, since ->usecnt is not controlling the VC4-BO object
> lifetime. This being said, I'm not sure the +1 trick is really
> appropriate. Indeed, we're using the
> refcount_inc_not_zero()/refcount_dec_not_one() helpers to remove/put
> the object from/in a
> "mem-pointed-by-obj-can-be-freed-under-mem-pressure" pool (note the
> object itself is not freed, just the memory portion that is pointed by
> this object). If I apply +1 everywhere then I have no way to handle the
> 0 -> 1 and 1 -> 0 transitions differently (these become 1 -> 2 and 2 ->
> 1 transitions, and there's no functions called refcount_dec_not_two()
> and refcount_inc_not_one()).
> 
> Anyway, if refcount_inc_allow_zero() is not option, maybe I should just
> use an atomic_t object (as suggested by Daniel) or a plain unsigned
> integer protected by the ->madv.lock (as suggested by Eric). Of course,
> that means losing built-in counter-overflow checking and reworking a bit
> the VC4-MADV logic, but that's not such a big problem.


So, yes, refcount_inc_allow_zero() is really not an option since the idea is to
clearly separate cases when a reference counter is initialized and when it is
incremented. 

Kees outlined two common strategies on how we usually handle such cases,
but of course we need to look if it even make sense in each concrete case. 
So, if you say that ->usecnt is not controlling an object lifetime (as in classical
refcounter notion), then maybe the correct way is to really revert the conversion and
go back to atomic_t. Overflow protection will be lost, but if you don't care
about it in this particular case and your usage pattern for this counter is different,
probably no need to make things more complex.

Best Regards,
Elena.


> 
> Regards,
> 
> Boris
diff mbox

Patch

diff --git a/drivers/gpu/drm/vc4/vc4_bo.c b/drivers/gpu/drm/vc4/vc4_bo.c
index 4ae45d7dac42..2decc8e2c79f 100644
--- a/drivers/gpu/drm/vc4/vc4_bo.c
+++ b/drivers/gpu/drm/vc4/vc4_bo.c
@@ -637,7 +637,8 @@  int vc4_bo_inc_usecnt(struct vc4_bo *bo)
        mutex_lock(&bo->madv_lock);
        switch (bo->madv) {
        case VC4_MADV_WILLNEED:
-               refcount_inc(&bo->usecnt);
+               if (!refcount_inc_not_zero(&bo->usecnt))
+                       refcount_set(&bo->usecnt, 1);
                ret = 0;
                break;
        case VC4_MADV_DONTNEED: