From patchwork Sat Apr 5 20:08:02 2014
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Ben Widawsky
X-Patchwork-Id: 3942461
From: Ben Widawsky
To: Intel GFX
Date: Sat, 5 Apr 2014 13:08:02 -0700
Message-Id: <1396728482-25402-1-git-send-email-benjamin.widawsky@intel.com>
X-Mailer: git-send-email 1.9.1
Subject: [Intel-gfx] [PATCH] drm/i915: Make vm eviction uninterruptible
List-Id: Intel graphics driver community testing & development

Our current code cannot handle a failure to evict well. At the very
least you get the following splat, usually followed by much worse
fallout:

[ 134.819441] ------------[ cut here ]------------
[ 134.819467] WARNING: CPU: 3 PID: 442 at drivers/gpu/drm/i915/i915_gem_evict.c:230 i915_gem_evict_vm+0x8a/0x1c0 [i915]()
[ 134.819471] Modules linked in: i915 drm_kms_helper drm intel_gtt agpgart i2c_algo_bit ext4 crc16 mbcache jbd2 x86_pkg_temp_thermal coretemp kvm_intel kvm crc32c_intel ghash_clmulni_intel aesni_intel aes_x86_64 lrw iTCO_wdt gf128mul iTCO_vendor_support glue_helper ablk_helper cryptd microcode serio_raw i2c_i801 fan thermal battery e1000e acpi_cpufreq evdev ptp ac acpi_pad pps_core processor lpc_ich mfd_core snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device snd_timer snd soundcore sd_mod crc_t10dif crct10dif_common ahci libahci libata ehci_pci ehci_hcd usbcore scsi_mod usb_common
[ 134.819565] CPU: 3 PID: 442 Comm: glxgears Not tainted 3.14.0-BEN+ #480
[ 134.819568] Hardware name: Intel Corporation Broadwell Client platform/WhiteTip Mountain 1, BIOS BDW-E1R1.86C.0063.R01.1402110503 02/11/2014
[ 134.819571] 0000000000000009 ffff88009b10fa80 ffffffff8159e6a5 0000000000000000
[ 134.819577] ffff88009b10fab8 ffffffff8104895d ffff880145c353c0 ffff880145f400f8
[ 134.819584] 0000000000000000 ffff8800a274d300 ffff88009b10fb78 ffff88009b10fac8
[ 134.819590] Call Trace:
[ 134.819599] [] dump_stack+0x4e/0x7a
[ 134.819607] [] warn_slowpath_common+0x7d/0xa0
[ 134.819635] [] warn_slowpath_null+0x1a/0x20
[ 134.819656] [] i915_gem_evict_vm+0x8a/0x1c0 [i915]
[ 134.819677] [] ppgtt_release+0x17b/0x1e0 [i915]
[ 134.819693] [] i915_gem_context_free+0x7d/0x180 [i915]
[ 134.819707] [] context_idr_cleanup+0x3c/0x40 [i915]
[ 134.819715] [] idr_for_each+0x104/0x1a0
[ 134.819730] [] ? i915_gem_context_free+0x180/0x180 [i915]
[ 134.819735] [] ? mutex_lock_nested+0x28c/0x3d0
[ 134.819761] [] ? i915_driver_preclose+0x25/0x50 [i915]
[ 134.819778] [] i915_gem_context_close+0x35/0xa0 [i915]
[ 134.819802] [] i915_driver_preclose+0x30/0x50 [i915]
[ 134.819816] [] drm_release+0x5d/0x5f0 [drm]
[ 134.819822] [] __fput+0xea/0x240
[ 134.819827] [] ____fput+0xe/0x10
[ 134.819832] [] task_work_run+0xac/0xe0
[ 134.819837] [] do_exit+0x2cf/0xcf0
[ 134.819844] [] ? _raw_spin_unlock_irq+0x2c/0x60
[ 134.819849] [] do_group_exit+0x4c/0xc0
[ 134.819855] [] get_signal_to_deliver+0x2d1/0x920
[ 134.819861] [] do_signal+0x48/0x620
[ 134.819867] [] ? do_readv_writev+0x169/0x220
[ 134.819873] [] ? trace_hardirqs_on_caller+0xfd/0x1c0
[ 134.819879] [] ? __fget_light+0x13d/0x160
[ 134.819886] [] ? sysret_signal+0x5/0x47
[ 134.819892] [] do_notify_resume+0x65/0x80
[ 134.819897] [] int_signal+0x12/0x17
[ 134.819901] ---[ end trace dbf4da2122c3d683 ]---

At first I was going to call this a band-aid for the problem. However,
upon further thought, I rather like the idea of making evictions atomic
and less prone to failure anyway.

It can still be considered something of a band-aid because of GPU
hangs: it would be nice if we had some way to interrupt the process
when the GPU is hung. I'll leave that for a follow-up patch though.

Cc: Chris Wilson
Signed-off-by: Ben Widawsky
---
 drivers/gpu/drm/i915/i915_gem_evict.c | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_evict.c b/drivers/gpu/drm/i915/i915_gem_evict.c
index 75fca63..91da738 100644
--- a/drivers/gpu/drm/i915/i915_gem_evict.c
+++ b/drivers/gpu/drm/i915/i915_gem_evict.c
@@ -225,10 +225,17 @@ int i915_gem_evict_vm(struct i915_address_space *vm, bool do_idle)
 		i915_gem_retire_requests(vm->dev);
 	}
 
-	list_for_each_entry_safe(vma, next, &vm->inactive_list, mm_list)
+	list_for_each_entry_safe(vma, next, &vm->inactive_list, mm_list) {
+		struct drm_i915_private *dev_priv = vm->dev->dev_private;
+		bool old_intr = dev_priv->mm.interruptible;
+		dev_priv->mm.interruptible = false;
+
 		if (vma->pin_count == 0)
 			WARN_ON(i915_vma_unbind(vma));
 
+		dev_priv->mm.interruptible = old_intr;
+	}
+
 	return 0;
 }
 
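
For anyone who wants to poke at the save/clear/restore pattern outside the
driver, here is a minimal userspace sketch. It is not i915 code: struct
fake_mm, struct fake_vma, fake_unbind() and fake_evict_all() are
hypothetical stand-ins invented for illustration, and the "pending signal"
is just a boolean. The sketch also saves and restores the flag once around
the whole loop rather than once per iteration as the patch does; that is a
variation shown on the assumption that nothing else touches the flag while
the loop runs.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the driver-wide mm state (not drm_i915_private). */
struct fake_mm {
	bool interruptible;	/* may a wait be broken by a signal? */
	bool signal_pending;	/* simulated pending signal */
};

/* Hypothetical stand-in for a vma on the inactive list. */
struct fake_vma {
	int pin_count;
	bool bound;
};

/*
 * Models an unbind that has to wait: it fails with -EINTR only when waits
 * are interruptible and a signal is pending. Callers must not pass a
 * pinned vma.
 */
static int fake_unbind(struct fake_mm *mm, struct fake_vma *vma)
{
	assert(vma->pin_count == 0);

	if (mm->interruptible && mm->signal_pending)
		return -EINTR;

	vma->bound = false;
	return 0;
}

/*
 * Unbind every unpinned vma with interruption disabled, then restore the
 * caller's setting. Returns the number of unbinds that failed.
 */
static int fake_evict_all(struct fake_mm *mm, struct fake_vma *vmas, int n)
{
	bool old_intr = mm->interruptible;	/* remember the caller's setting */
	int failures = 0;
	int i;

	mm->interruptible = false;
	for (i = 0; i < n; i++) {
		if (vmas[i].pin_count == 0 && fake_unbind(mm, &vmas[i]) != 0)
			failures++;
	}
	mm->interruptible = old_intr;		/* restore on the way out */

	return failures;
}
```

In this model, calling fake_unbind() directly with interruptible and
signal_pending both set returns -EINTR, which corresponds to the case the
WARN_ON in the splat catches. Going through fake_evict_all() instead
unbinds every unpinned entry and leaves the flag exactly as the caller had
it.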