diff mbox series

[1/2] drm/i915/gem: Take runtime-pm wakeref prior to unbinding

Message ID 20191203101347.2836057-1-chris@chris-wilson.co.uk (mailing list archive)
State New, archived
Headers show
Series [1/2] drm/i915/gem: Take runtime-pm wakeref prior to unbinding | expand

Commit Message

Chris Wilson Dec. 3, 2019, 10:13 a.m. UTC
Some machines require ACPI for runtime resume, and ACPI is quite kmalloc
happy. We cannot handle kmalloc from inside the vm->mutex, as they are
used by the shrinker, and so we must ensure the global runtime-pm is
awake prior to unbinding to avoid the potential inversion.

<4> [57.121748] ======================================================
<4> [57.121750] WARNING: possible circular locking dependency detected
<4> [57.121753] 5.4.0-rc8-CI-CI_DRM_7466+ #1 Tainted: G     U
<4> [57.121754] ------------------------------------------------------
<4> [57.121756] i915_pm_rpm/1105 is trying to acquire lock:
<4> [57.121758] ffffffff82263a40 (fs_reclaim){+.+.}, at: fs_reclaim_acquire.part.117+0x0/0x30
<4> [57.121766]
but task is already holding lock:
<4> [57.121768] ffff888475a593c0 (&vm->mutex){+.+.}, at: i915_vma_unbind+0x21/0x50 [i915]
<4> [57.121868]
which lock already depends on the new lock.

<4> [57.121869]
the existing dependency chain (in reverse order) is:
<4> [57.121871]
-> #1 (&vm->mutex){+.+.}:
<4> [57.121951]        i915_gem_shrinker_taints_mutex+0xa2/0xd0 [i915]
<4> [57.122028]        i915_address_space_init+0xa9/0x170 [i915]
<4> [57.122104]        i915_ggtt_init_hw+0x47/0x130 [i915]
<4> [57.122150]        i915_driver_probe+0xbb4/0x15f0 [i915]
<4> [57.122197]        i915_pci_probe+0x43/0x1c0 [i915]
<4> [57.122202]        pci_device_probe+0x9e/0x120
<4> [57.122206]        really_probe+0xea/0x420
<4> [57.122209]        driver_probe_device+0x10b/0x120
<4> [57.122212]        device_driver_attach+0x4a/0x50
<4> [57.122214]        __driver_attach+0x97/0x130
<4> [57.122217]        bus_for_each_dev+0x74/0xc0
<4> [57.122220]        bus_add_driver+0x142/0x220
<4> [57.122222]        driver_register+0x56/0xf0
<4> [57.122226]        do_one_initcall+0x58/0x2ff
<4> [57.122230]        do_init_module+0x56/0x1f8
<4> [57.122233]        load_module+0x243e/0x29f0
<4> [57.122236]        __do_sys_finit_module+0xe9/0x110
<4> [57.122239]        do_syscall_64+0x4f/0x210
<4> [57.122242]        entry_SYSCALL_64_after_hwframe+0x49/0xbe
<4> [57.122244]
-> #0 (fs_reclaim){+.+.}:
<4> [57.122249]        __lock_acquire+0x1328/0x15d0
<4> [57.122251]        lock_acquire+0xa7/0x1c0
<4> [57.122254]        fs_reclaim_acquire.part.117+0x24/0x30
<4> [57.122257]        __kmalloc+0x48/0x320
<4> [57.122261]        acpi_ns_internalize_name+0x44/0x9b
<4> [57.122264]        acpi_ns_get_node_unlocked+0x6b/0xd3
<4> [57.122267]        acpi_ns_get_node+0x3b/0x50
<4> [57.122271]        acpi_get_handle+0x8a/0xb4
<4> [57.122274]        acpi_has_method+0x1c/0x40
<4> [57.122278]        acpi_pci_set_power_state+0x40/0xe0
<4> [57.122281]        pci_platform_power_transition+0x3e/0x90
<4> [57.122284]        pci_set_power_state+0x83/0xf0
<4> [57.122287]        pci_restore_standard_config+0x22/0x40
<4> [57.122289]        pci_pm_runtime_resume+0x23/0xc0
<4> [57.122293]        __rpm_callback+0xb1/0x110
<4> [57.122296]        rpm_callback+0x1a/0x70
<4> [57.122299]        rpm_resume+0x50e/0x790
<4> [57.122302]        __pm_runtime_resume+0x42/0x80
<4> [57.122357]        __intel_runtime_pm_get+0x15/0x60 [i915]
<4> [57.122435]        ggtt_unbind_vma+0x24/0x60 [i915]
<4> [57.122514]        __i915_vma_unbind.part.39+0xb5/0x500 [i915]
<4> [57.122593]        i915_vma_unbind+0x2d/0x50 [i915]
<4> [57.122668]        i915_gem_object_unbind+0x11c/0x260 [i915]
<4> [57.122740]        i915_gem_object_set_cache_level+0x32/0x90 [i915]
<4> [57.122810]        i915_gem_set_caching_ioctl+0x1f7/0x2f0 [i915]
<4> [57.122815]        drm_ioctl_kernel+0xa7/0xf0
<4> [57.122818]        drm_ioctl+0x2e1/0x390
<4> [57.122822]        do_vfs_ioctl+0xa0/0x6f0
<4> [57.122825]        ksys_ioctl+0x35/0x60
<4> [57.122828]        __x64_sys_ioctl+0x11/0x20
<4> [57.122830]        do_syscall_64+0x4f/0x210
<4> [57.122833]        entry_SYSCALL_64_after_hwframe+0x49/0xbe

Closes: https://gitlab.freedesktop.org/drm/intel/issues/711
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_gem.c | 17 ++++++++++++++++-
 1 file changed, 16 insertions(+), 1 deletion(-)

Comments

Matthew Auld Dec. 3, 2019, 1 p.m. UTC | #1
On Tue, 3 Dec 2019 at 10:13, Chris Wilson <chris@chris-wilson.co.uk> wrote:
>
> Some machines require ACPI for runtime resume, and ACPI is quite kmalloc
> happy. We cannot handle kmalloc from inside the vm->mutex, as they are
> used by the shrinker, and so we must ensure the global runtime-pm is
> awake prior to unbinding to avoid the potential inversion.
>
> <4> [57.121748] ======================================================
> <4> [57.121750] WARNING: possible circular locking dependency detected
> <4> [57.121753] 5.4.0-rc8-CI-CI_DRM_7466+ #1 Tainted: G     U
> <4> [57.121754] ------------------------------------------------------
> <4> [57.121756] i915_pm_rpm/1105 is trying to acquire lock:
> <4> [57.121758] ffffffff82263a40 (fs_reclaim){+.+.}, at: fs_reclaim_acquire.part.117+0x0/0x30
> <4> [57.121766]
> but task is already holding lock:
> <4> [57.121768] ffff888475a593c0 (&vm->mutex){+.+.}, at: i915_vma_unbind+0x21/0x50 [i915]
> <4> [57.121868]
> which lock already depends on the new lock.
>
> <4> [57.121869]
> the existing dependency chain (in reverse order) is:
> <4> [57.121871]
> -> #1 (&vm->mutex){+.+.}:
> <4> [57.121951]        i915_gem_shrinker_taints_mutex+0xa2/0xd0 [i915]
> <4> [57.122028]        i915_address_space_init+0xa9/0x170 [i915]
> <4> [57.122104]        i915_ggtt_init_hw+0x47/0x130 [i915]
> <4> [57.122150]        i915_driver_probe+0xbb4/0x15f0 [i915]
> <4> [57.122197]        i915_pci_probe+0x43/0x1c0 [i915]
> <4> [57.122202]        pci_device_probe+0x9e/0x120
> <4> [57.122206]        really_probe+0xea/0x420
> <4> [57.122209]        driver_probe_device+0x10b/0x120
> <4> [57.122212]        device_driver_attach+0x4a/0x50
> <4> [57.122214]        __driver_attach+0x97/0x130
> <4> [57.122217]        bus_for_each_dev+0x74/0xc0
> <4> [57.122220]        bus_add_driver+0x142/0x220
> <4> [57.122222]        driver_register+0x56/0xf0
> <4> [57.122226]        do_one_initcall+0x58/0x2ff
> <4> [57.122230]        do_init_module+0x56/0x1f8
> <4> [57.122233]        load_module+0x243e/0x29f0
> <4> [57.122236]        __do_sys_finit_module+0xe9/0x110
> <4> [57.122239]        do_syscall_64+0x4f/0x210
> <4> [57.122242]        entry_SYSCALL_64_after_hwframe+0x49/0xbe
> <4> [57.122244]
> -> #0 (fs_reclaim){+.+.}:
> <4> [57.122249]        __lock_acquire+0x1328/0x15d0
> <4> [57.122251]        lock_acquire+0xa7/0x1c0
> <4> [57.122254]        fs_reclaim_acquire.part.117+0x24/0x30
> <4> [57.122257]        __kmalloc+0x48/0x320
> <4> [57.122261]        acpi_ns_internalize_name+0x44/0x9b
> <4> [57.122264]        acpi_ns_get_node_unlocked+0x6b/0xd3
> <4> [57.122267]        acpi_ns_get_node+0x3b/0x50
> <4> [57.122271]        acpi_get_handle+0x8a/0xb4
> <4> [57.122274]        acpi_has_method+0x1c/0x40
> <4> [57.122278]        acpi_pci_set_power_state+0x40/0xe0
> <4> [57.122281]        pci_platform_power_transition+0x3e/0x90
> <4> [57.122284]        pci_set_power_state+0x83/0xf0
> <4> [57.122287]        pci_restore_standard_config+0x22/0x40
> <4> [57.122289]        pci_pm_runtime_resume+0x23/0xc0
> <4> [57.122293]        __rpm_callback+0xb1/0x110
> <4> [57.122296]        rpm_callback+0x1a/0x70
> <4> [57.122299]        rpm_resume+0x50e/0x790
> <4> [57.122302]        __pm_runtime_resume+0x42/0x80
> <4> [57.122357]        __intel_runtime_pm_get+0x15/0x60 [i915]
> <4> [57.122435]        ggtt_unbind_vma+0x24/0x60 [i915]
> <4> [57.122514]        __i915_vma_unbind.part.39+0xb5/0x500 [i915]
> <4> [57.122593]        i915_vma_unbind+0x2d/0x50 [i915]
> <4> [57.122668]        i915_gem_object_unbind+0x11c/0x260 [i915]
> <4> [57.122740]        i915_gem_object_set_cache_level+0x32/0x90 [i915]
> <4> [57.122810]        i915_gem_set_caching_ioctl+0x1f7/0x2f0 [i915]
> <4> [57.122815]        drm_ioctl_kernel+0xa7/0xf0
> <4> [57.122818]        drm_ioctl+0x2e1/0x390
> <4> [57.122822]        do_vfs_ioctl+0xa0/0x6f0
> <4> [57.122825]        ksys_ioctl+0x35/0x60
> <4> [57.122828]        __x64_sys_ioctl+0x11/0x20
> <4> [57.122830]        do_syscall_64+0x4f/0x210
> <4> [57.122833]        entry_SYSCALL_64_after_hwframe+0x49/0xbe
>
> Closes: https://gitlab.freedesktop.org/drm/intel/issues/711
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
diff mbox series

Patch

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 61395b03443e..f7b5fe0f5424 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -119,10 +119,23 @@  i915_gem_get_aperture_ioctl(struct drm_device *dev, void *data,
 int i915_gem_object_unbind(struct drm_i915_gem_object *obj,
 			   unsigned long flags)
 {
-	struct i915_vma *vma;
+	struct intel_runtime_pm *rpm = &to_i915(obj->base.dev)->runtime_pm;
 	LIST_HEAD(still_in_list);
+	intel_wakeref_t wakeref;
+	struct i915_vma *vma;
 	int ret = 0;
 
+	if (!atomic_read(&obj->bind_count))
+		return 0;
+
+	/*
+	 * As some machines use ACPI to handle runtime-resume callbacks, and
+	 * ACPI is quite kmalloc happy, we cannot resume beneath the vm->mutex
+	 * as they are required by the shrinker. Ergo, we wake the device up
+	 * first just in case.
+	 */
+	wakeref = intel_runtime_pm_get(rpm);
+
 	spin_lock(&obj->vma.lock);
 	while (!ret && (vma = list_first_entry_or_null(&obj->vma.list,
 						       struct i915_vma,
@@ -146,6 +159,8 @@  int i915_gem_object_unbind(struct drm_i915_gem_object *obj,
 	list_splice(&still_in_list, &obj->vma.list);
 	spin_unlock(&obj->vma.lock);
 
+	intel_runtime_pm_put(rpm, wakeref);
+
 	return ret;
 }