Message ID | 6ed945cb-619a-b26d-21b0-9bdaeaa69582@arm.com (mailing list archive)
---|---
State | New, archived
Series | blocking ops in drm_sched_cleanup_jobs()
Hi Steven, the problem seems to be than panfrost is trying to sleep while freeing a job. E.g. it tries to take a mutex. That is not allowed any more since we need to free the jobs from atomic and even interrupt context. Your suggestion wouldn't work because this way jobs are not freed when there isn't a new one to be scheduled. Regards, Christian. Am 13.09.19 um 16:50 schrieb Steven Price: > Hi, > > I hit the below splat randomly with panfrost. From what I can tell this > is a more general issue which would affect other drivers. > > ----8<----- > [58604.913130] ------------[ cut here ]------------ > [58604.918590] WARNING: CPU: 1 PID: 1758 at kernel/sched/core.c:6556 __might_sleep+0x74/0x98 > [58604.927965] do not call blocking ops when !TASK_RUNNING; state=1 set at [<0c590494>] prepare_to_wait_event+0x104/0x164 > [58604.940047] Modules linked in: panfrost gpu_sched > [58604.945370] CPU: 1 PID: 1758 Comm: pan_js Not tainted 5.3.0-rc1+ #13 > [58604.952500] Hardware name: Rockchip (Device Tree) > [58604.957815] [<c0111150>] (unwind_backtrace) from [<c010c99c>] (show_stack+0x10/0x14) > [58604.966521] [<c010c99c>] (show_stack) from [<c07adbb4>] (dump_stack+0x9c/0xd4) > [58604.974639] [<c07adbb4>] (dump_stack) from [<c0121da8>] (__warn+0xe8/0x104) > [58604.982462] [<c0121da8>] (__warn) from [<c0121e08>] (warn_slowpath_fmt+0x44/0x6c) > [58604.990867] [<c0121e08>] (warn_slowpath_fmt) from [<c014eccc>] (__might_sleep+0x74/0x98) > [58604.999973] [<c014eccc>] (__might_sleep) from [<c07c73d8>] (__mutex_lock+0x38/0x948) > [58605.008690] [<c07c73d8>] (__mutex_lock) from [<c07c7d00>] (mutex_lock_nested+0x18/0x20) > [58605.017841] [<c07c7d00>] (mutex_lock_nested) from [<bf00b54c>] (panfrost_gem_free_object+0x60/0x10c [panfrost]) > [58605.029430] [<bf00b54c>] (panfrost_gem_free_object [panfrost]) from [<bf00cecc>] (panfrost_job_put+0x138/0x150 [panfrost]) > [58605.042076] [<bf00cecc>] (panfrost_job_put [panfrost]) from [<bf00121c>] (drm_sched_cleanup_jobs+0xc8/0xe0 [gpu_sched]) > [58605.054417] [<bf00121c>] (drm_sched_cleanup_jobs [gpu_sched]) from [<bf001300>] (drm_sched_main+0xcc/0x26c [gpu_sched]) > [58605.066620] [<bf001300>] (drm_sched_main [gpu_sched]) from [<c0146cfc>] (kthread+0x13c/0x154) > [58605.076226] [<c0146cfc>] (kthread) from [<c01010b4>] (ret_from_fork+0x14/0x20) > [58605.084346] Exception stack(0xe959bfb0 to 0xe959bff8) > [58605.090046] bfa0: 00000000 00000000 00000000 00000000 > [58605.099250] bfc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 > [58605.108480] bfe0: 00000000 00000000 00000000 00000000 00000013 00000000 > [58605.116210] irq event stamp: 179 > [58605.119955] hardirqs last enabled at (187): [<c017f7e4>] console_unlock+0x564/0x5c4 > [58605.128935] hardirqs last disabled at (202): [<c017f308>] console_unlock+0x88/0x5c4 > [58605.137788] softirqs last enabled at (216): [<c0102334>] __do_softirq+0x18c/0x548 > [58605.146543] softirqs last disabled at (227): [<c0129528>] irq_exit+0xc4/0x10c > [58605.154618] ---[ end trace f65bdbd9ea9adfc0 ]--- > ----8<----- > > The problem is that drm_sched_main() calls drm_sched_cleanup_jobs() as > part of the condition of wait_event_interruptible: > >> wait_event_interruptible(sched->wake_up_worker, >> (drm_sched_cleanup_jobs(sched), >> (!drm_sched_blocked(sched) && >> (entity = drm_sched_select_entity(sched))) || >> kthread_should_stop())); > When drm_sched_cleanup_jobs() is called *after* a wait (i.e. after > prepare_to_wait_event() has been called), then any might_sleep() will > moan loudly about it. 
This doesn't seem to happen often (I've only > triggered it once) because usually drm_sched_cleanup_jobs() either > doesn't sleep or does the sleeping during the first call that > wait_event_interruptible() makes (which is before the task state is set). > > I don't really understand why drm_sched_cleanup_jobs() needs to be > called here, a simple change like below 'fixes' it. But I presume > there's some reason for the call being part of the > wait_event_interruptible condition. Can anyone shed light on this? > > The code was introduced in commit 5918045c4ed4 ("drm/scheduler: rework job destruction") > > Steve > > ----8<----- > diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c > index 9a0ee74d82dc..528f295e3a31 100644 > --- a/drivers/gpu/drm/scheduler/sched_main.c > +++ b/drivers/gpu/drm/scheduler/sched_main.c > @@ -699,11 +699,12 @@ static int drm_sched_main(void *param) > struct drm_sched_job *sched_job; > struct dma_fence *fence; > > + drm_sched_cleanup_jobs(sched); > + > wait_event_interruptible(sched->wake_up_worker, > - (drm_sched_cleanup_jobs(sched), > (!drm_sched_blocked(sched) && > (entity = drm_sched_select_entity(sched))) || > - kthread_should_stop())); > + kthread_should_stop()); > > if (!entity) > continue;
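Why the condition expression must not sleep is easiest to see from the shape of the wait loop itself. The snippet below is a simplified, illustrative paraphrase of wait_event_interruptible() (the real macro lives in include/linux/wait.h), with "condition" standing for the whole expression used in drm_sched_main(), including the drm_sched_cleanup_jobs() call. The first evaluation happens while the task is still TASK_RUNNING, which is why the splat only fires occasionally, but every later evaluation runs after prepare_to_wait_event() has already set TASK_INTERRUPTIBLE, so a mutex_lock() reached via a driver's free_job() callback trips might_sleep().

```c
/*
 * Simplified paraphrase of wait_event_interruptible(), illustrative
 * sketch only, not the literal macro from include/linux/wait.h.
 * "condition" stands for the full expression from drm_sched_main(),
 * including the drm_sched_cleanup_jobs() call.
 */
if (condition)		/* first check: task is still TASK_RUNNING,  */
	return 0;	/* so sleeping here goes unnoticed           */

for (;;) {
	long intr = prepare_to_wait_event(&sched->wake_up_worker, &wait,
					  TASK_INTERRUPTIBLE);

	if (condition)	/* re-check with state == TASK_INTERRUPTIBLE: */
		break;	/* any might_sleep() reached here warns       */

	if (intr)	/* signal pending */
		break;

	schedule();	/* actually go to sleep until woken */
}
finish_wait(&sched->wake_up_worker, &wait);
```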
On Mon, Sep 16, 2019 at 10:11 AM Koenig, Christian <Christian.Koenig@amd.com> wrote: > > Hi Steven, > > the problem seems to be than panfrost is trying to sleep while freeing a > job. E.g. it tries to take a mutex. > > That is not allowed any more since we need to free the jobs from atomic > and even interrupt context. > > Your suggestion wouldn't work because this way jobs are not freed when > there isn't a new one to be scheduled. One fix would be to make sure that any that any calls to drm_sched_cleanup_jobs are atomic, by putting preempt_disable/enable or local_irq_disable/enable in there, at least when lockdep or sleep debugging is enabled. That should help catch these reliable, instead of just once every blue moon. -Daniel > > Regards, > Christian. > > Am 13.09.19 um 16:50 schrieb Steven Price: > > Hi, > > > > I hit the below splat randomly with panfrost. From what I can tell this > > is a more general issue which would affect other drivers. > > > > ----8<----- > > [58604.913130] ------------[ cut here ]------------ > > [58604.918590] WARNING: CPU: 1 PID: 1758 at kernel/sched/core.c:6556 __might_sleep+0x74/0x98 > > [58604.927965] do not call blocking ops when !TASK_RUNNING; state=1 set at [<0c590494>] prepare_to_wait_event+0x104/0x164 > > [58604.940047] Modules linked in: panfrost gpu_sched > > [58604.945370] CPU: 1 PID: 1758 Comm: pan_js Not tainted 5.3.0-rc1+ #13 > > [58604.952500] Hardware name: Rockchip (Device Tree) > > [58604.957815] [<c0111150>] (unwind_backtrace) from [<c010c99c>] (show_stack+0x10/0x14) > > [58604.966521] [<c010c99c>] (show_stack) from [<c07adbb4>] (dump_stack+0x9c/0xd4) > > [58604.974639] [<c07adbb4>] (dump_stack) from [<c0121da8>] (__warn+0xe8/0x104) > > [58604.982462] [<c0121da8>] (__warn) from [<c0121e08>] (warn_slowpath_fmt+0x44/0x6c) > > [58604.990867] [<c0121e08>] (warn_slowpath_fmt) from [<c014eccc>] (__might_sleep+0x74/0x98) > > [58604.999973] [<c014eccc>] (__might_sleep) from [<c07c73d8>] (__mutex_lock+0x38/0x948) > > [58605.008690] [<c07c73d8>] (__mutex_lock) from [<c07c7d00>] (mutex_lock_nested+0x18/0x20) > > [58605.017841] [<c07c7d00>] (mutex_lock_nested) from [<bf00b54c>] (panfrost_gem_free_object+0x60/0x10c [panfrost]) > > [58605.029430] [<bf00b54c>] (panfrost_gem_free_object [panfrost]) from [<bf00cecc>] (panfrost_job_put+0x138/0x150 [panfrost]) > > [58605.042076] [<bf00cecc>] (panfrost_job_put [panfrost]) from [<bf00121c>] (drm_sched_cleanup_jobs+0xc8/0xe0 [gpu_sched]) > > [58605.054417] [<bf00121c>] (drm_sched_cleanup_jobs [gpu_sched]) from [<bf001300>] (drm_sched_main+0xcc/0x26c [gpu_sched]) > > [58605.066620] [<bf001300>] (drm_sched_main [gpu_sched]) from [<c0146cfc>] (kthread+0x13c/0x154) > > [58605.076226] [<c0146cfc>] (kthread) from [<c01010b4>] (ret_from_fork+0x14/0x20) > > [58605.084346] Exception stack(0xe959bfb0 to 0xe959bff8) > > [58605.090046] bfa0: 00000000 00000000 00000000 00000000 > > [58605.099250] bfc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 > > [58605.108480] bfe0: 00000000 00000000 00000000 00000000 00000013 00000000 > > [58605.116210] irq event stamp: 179 > > [58605.119955] hardirqs last enabled at (187): [<c017f7e4>] console_unlock+0x564/0x5c4 > > [58605.128935] hardirqs last disabled at (202): [<c017f308>] console_unlock+0x88/0x5c4 > > [58605.137788] softirqs last enabled at (216): [<c0102334>] __do_softirq+0x18c/0x548 > > [58605.146543] softirqs last disabled at (227): [<c0129528>] irq_exit+0xc4/0x10c > > [58605.154618] ---[ end trace f65bdbd9ea9adfc0 ]--- > > ----8<----- > > > > The 
problem is that drm_sched_main() calls drm_sched_cleanup_jobs() as > > part of the condition of wait_event_interruptible: > > > >> wait_event_interruptible(sched->wake_up_worker, > >> (drm_sched_cleanup_jobs(sched), > >> (!drm_sched_blocked(sched) && > >> (entity = drm_sched_select_entity(sched))) || > >> kthread_should_stop())); > > When drm_sched_cleanup_jobs() is called *after* a wait (i.e. after > > prepare_to_wait_event() has been called), then any might_sleep() will > > moan loudly about it. This doesn't seem to happen often (I've only > > triggered it once) because usually drm_sched_cleanup_jobs() either > > doesn't sleep or does the sleeping during the first call that > > wait_event_interruptible() makes (which is before the task state is set). > > > > I don't really understand why drm_sched_cleanup_jobs() needs to be > > called here, a simple change like below 'fixes' it. But I presume > > there's some reason for the call being part of the > > wait_event_interruptible condition. Can anyone shed light on this? > > > > The code was introduced in commit 5918045c4ed4 ("drm/scheduler: rework job destruction") > > > > Steve > > > > ----8<----- > > diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c > > index 9a0ee74d82dc..528f295e3a31 100644 > > --- a/drivers/gpu/drm/scheduler/sched_main.c > > +++ b/drivers/gpu/drm/scheduler/sched_main.c > > @@ -699,11 +699,12 @@ static int drm_sched_main(void *param) > > struct drm_sched_job *sched_job; > > struct dma_fence *fence; > > > > + drm_sched_cleanup_jobs(sched); > > + > > wait_event_interruptible(sched->wake_up_worker, > > - (drm_sched_cleanup_jobs(sched), > > (!drm_sched_blocked(sched) && > > (entity = drm_sched_select_entity(sched))) || > > - kthread_should_stop())); > > + kthread_should_stop()); > > > > if (!entity) > > continue; > > _______________________________________________ > dri-devel mailing list > dri-devel@lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/dri-devel
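A minimal sketch of the debug aid Daniel suggests might look like the following; the wrapper name is made up and this is illustrative rather than proposed scheduler code. With preemption disabled around the cleanup, any might_sleep() reached from a free_job() callback warns on every call (when CONFIG_DEBUG_ATOMIC_SLEEP is enabled), even if the mutex is uncontended, instead of only in the rare case where the wait condition is re-evaluated with the task already marked as sleeping.

```c
/*
 * Hypothetical debug-only wrapper along the lines suggested above,
 * not existing scheduler code.  Disabling preemption makes every
 * sleeping primitive reached from drm_sched_cleanup_jobs(), such as
 * a mutex_lock() in a driver's free_job() callback, trip might_sleep()
 * reliably rather than once in a blue moon.
 */
static void drm_sched_cleanup_jobs_checked(struct drm_gpu_scheduler *sched)
{
	if (IS_ENABLED(CONFIG_DEBUG_ATOMIC_SLEEP))
		preempt_disable();

	drm_sched_cleanup_jobs(sched);

	if (IS_ENABLED(CONFIG_DEBUG_ATOMIC_SLEEP))
		preempt_enable();
}
```

As Christian's reply below points out, this only makes the problem visible every time; it does not remove the need to free jobs from a context where sleeping is allowed.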
Am 16.09.19 um 16:24 schrieb Daniel Vetter: > On Mon, Sep 16, 2019 at 10:11 AM Koenig, Christian > <Christian.Koenig@amd.com> wrote: >> Hi Steven, >> >> the problem seems to be than panfrost is trying to sleep while freeing a >> job. E.g. it tries to take a mutex. >> >> That is not allowed any more since we need to free the jobs from atomic >> and even interrupt context. >> >> Your suggestion wouldn't work because this way jobs are not freed when >> there isn't a new one to be scheduled. > One fix would be to make sure that any that any calls to > drm_sched_cleanup_jobs are atomic, by putting preempt_disable/enable > or local_irq_disable/enable in there, at least when lockdep or sleep > debugging is enabled. That should help catch these reliable, instead > of just once every blue moon. Yeah, thought about that as well. But I would also prefer a solution where drm_sched_cleanup_jobs() is never called in an atomic context. The problem is that I don't see how that can be possible without delaying freeing of jobs. Regards, Christian. > -Daniel > >> Regards, >> Christian. >> >> Am 13.09.19 um 16:50 schrieb Steven Price: >>> Hi, >>> >>> I hit the below splat randomly with panfrost. From what I can tell this >>> is a more general issue which would affect other drivers. >>> >>> ----8<----- >>> [58604.913130] ------------[ cut here ]------------ >>> [58604.918590] WARNING: CPU: 1 PID: 1758 at kernel/sched/core.c:6556 __might_sleep+0x74/0x98 >>> [58604.927965] do not call blocking ops when !TASK_RUNNING; state=1 set at [<0c590494>] prepare_to_wait_event+0x104/0x164 >>> [58604.940047] Modules linked in: panfrost gpu_sched >>> [58604.945370] CPU: 1 PID: 1758 Comm: pan_js Not tainted 5.3.0-rc1+ #13 >>> [58604.952500] Hardware name: Rockchip (Device Tree) >>> [58604.957815] [<c0111150>] (unwind_backtrace) from [<c010c99c>] (show_stack+0x10/0x14) >>> [58604.966521] [<c010c99c>] (show_stack) from [<c07adbb4>] (dump_stack+0x9c/0xd4) >>> [58604.974639] [<c07adbb4>] (dump_stack) from [<c0121da8>] (__warn+0xe8/0x104) >>> [58604.982462] [<c0121da8>] (__warn) from [<c0121e08>] (warn_slowpath_fmt+0x44/0x6c) >>> [58604.990867] [<c0121e08>] (warn_slowpath_fmt) from [<c014eccc>] (__might_sleep+0x74/0x98) >>> [58604.999973] [<c014eccc>] (__might_sleep) from [<c07c73d8>] (__mutex_lock+0x38/0x948) >>> [58605.008690] [<c07c73d8>] (__mutex_lock) from [<c07c7d00>] (mutex_lock_nested+0x18/0x20) >>> [58605.017841] [<c07c7d00>] (mutex_lock_nested) from [<bf00b54c>] (panfrost_gem_free_object+0x60/0x10c [panfrost]) >>> [58605.029430] [<bf00b54c>] (panfrost_gem_free_object [panfrost]) from [<bf00cecc>] (panfrost_job_put+0x138/0x150 [panfrost]) >>> [58605.042076] [<bf00cecc>] (panfrost_job_put [panfrost]) from [<bf00121c>] (drm_sched_cleanup_jobs+0xc8/0xe0 [gpu_sched]) >>> [58605.054417] [<bf00121c>] (drm_sched_cleanup_jobs [gpu_sched]) from [<bf001300>] (drm_sched_main+0xcc/0x26c [gpu_sched]) >>> [58605.066620] [<bf001300>] (drm_sched_main [gpu_sched]) from [<c0146cfc>] (kthread+0x13c/0x154) >>> [58605.076226] [<c0146cfc>] (kthread) from [<c01010b4>] (ret_from_fork+0x14/0x20) >>> [58605.084346] Exception stack(0xe959bfb0 to 0xe959bff8) >>> [58605.090046] bfa0: 00000000 00000000 00000000 00000000 >>> [58605.099250] bfc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 >>> [58605.108480] bfe0: 00000000 00000000 00000000 00000000 00000013 00000000 >>> [58605.116210] irq event stamp: 179 >>> [58605.119955] hardirqs last enabled at (187): [<c017f7e4>] console_unlock+0x564/0x5c4 >>> [58605.128935] hardirqs 
last disabled at (202): [<c017f308>] console_unlock+0x88/0x5c4 >>> [58605.137788] softirqs last enabled at (216): [<c0102334>] __do_softirq+0x18c/0x548 >>> [58605.146543] softirqs last disabled at (227): [<c0129528>] irq_exit+0xc4/0x10c >>> [58605.154618] ---[ end trace f65bdbd9ea9adfc0 ]--- >>> ----8<----- >>> >>> The problem is that drm_sched_main() calls drm_sched_cleanup_jobs() as >>> part of the condition of wait_event_interruptible: >>> >>>> wait_event_interruptible(sched->wake_up_worker, >>>> (drm_sched_cleanup_jobs(sched), >>>> (!drm_sched_blocked(sched) && >>>> (entity = drm_sched_select_entity(sched))) || >>>> kthread_should_stop())); >>> When drm_sched_cleanup_jobs() is called *after* a wait (i.e. after >>> prepare_to_wait_event() has been called), then any might_sleep() will >>> moan loudly about it. This doesn't seem to happen often (I've only >>> triggered it once) because usually drm_sched_cleanup_jobs() either >>> doesn't sleep or does the sleeping during the first call that >>> wait_event_interruptible() makes (which is before the task state is set). >>> >>> I don't really understand why drm_sched_cleanup_jobs() needs to be >>> called here, a simple change like below 'fixes' it. But I presume >>> there's some reason for the call being part of the >>> wait_event_interruptible condition. Can anyone shed light on this? >>> >>> The code was introduced in commit 5918045c4ed4 ("drm/scheduler: rework job destruction") >>> >>> Steve >>> >>> ----8<----- >>> diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c >>> index 9a0ee74d82dc..528f295e3a31 100644 >>> --- a/drivers/gpu/drm/scheduler/sched_main.c >>> +++ b/drivers/gpu/drm/scheduler/sched_main.c >>> @@ -699,11 +699,12 @@ static int drm_sched_main(void *param) >>> struct drm_sched_job *sched_job; >>> struct dma_fence *fence; >>> >>> + drm_sched_cleanup_jobs(sched); >>> + >>> wait_event_interruptible(sched->wake_up_worker, >>> - (drm_sched_cleanup_jobs(sched), >>> (!drm_sched_blocked(sched) && >>> (entity = drm_sched_select_entity(sched))) || >>> - kthread_should_stop())); >>> + kthread_should_stop()); >>> >>> if (!entity) >>> continue; >> _______________________________________________ >> dri-devel mailing list >> dri-devel@lists.freedesktop.org >> https://lists.freedesktop.org/mailman/listinfo/dri-devel > >
Hi Steven, thought about that issue a bit more and I think I came up with a solution. What you could do is to split up drm_sched_cleanup_jobs() into two functions. One that checks if jobs to be cleaned up are present and one which does the actual cleanup. This way we could call drm_sched_cleanup_jobs() outside of the wait_event_interruptible(). Regards, Christian. Am 13.09.19 um 16:50 schrieb Steven Price: > Hi, > > I hit the below splat randomly with panfrost. From what I can tell this > is a more general issue which would affect other drivers. > > ----8<----- > [58604.913130] ------------[ cut here ]------------ > [58604.918590] WARNING: CPU: 1 PID: 1758 at kernel/sched/core.c:6556 __might_sleep+0x74/0x98 > [58604.927965] do not call blocking ops when !TASK_RUNNING; state=1 set at [<0c590494>] prepare_to_wait_event+0x104/0x164 > [58604.940047] Modules linked in: panfrost gpu_sched > [58604.945370] CPU: 1 PID: 1758 Comm: pan_js Not tainted 5.3.0-rc1+ #13 > [58604.952500] Hardware name: Rockchip (Device Tree) > [58604.957815] [<c0111150>] (unwind_backtrace) from [<c010c99c>] (show_stack+0x10/0x14) > [58604.966521] [<c010c99c>] (show_stack) from [<c07adbb4>] (dump_stack+0x9c/0xd4) > [58604.974639] [<c07adbb4>] (dump_stack) from [<c0121da8>] (__warn+0xe8/0x104) > [58604.982462] [<c0121da8>] (__warn) from [<c0121e08>] (warn_slowpath_fmt+0x44/0x6c) > [58604.990867] [<c0121e08>] (warn_slowpath_fmt) from [<c014eccc>] (__might_sleep+0x74/0x98) > [58604.999973] [<c014eccc>] (__might_sleep) from [<c07c73d8>] (__mutex_lock+0x38/0x948) > [58605.008690] [<c07c73d8>] (__mutex_lock) from [<c07c7d00>] (mutex_lock_nested+0x18/0x20) > [58605.017841] [<c07c7d00>] (mutex_lock_nested) from [<bf00b54c>] (panfrost_gem_free_object+0x60/0x10c [panfrost]) > [58605.029430] [<bf00b54c>] (panfrost_gem_free_object [panfrost]) from [<bf00cecc>] (panfrost_job_put+0x138/0x150 [panfrost]) > [58605.042076] [<bf00cecc>] (panfrost_job_put [panfrost]) from [<bf00121c>] (drm_sched_cleanup_jobs+0xc8/0xe0 [gpu_sched]) > [58605.054417] [<bf00121c>] (drm_sched_cleanup_jobs [gpu_sched]) from [<bf001300>] (drm_sched_main+0xcc/0x26c [gpu_sched]) > [58605.066620] [<bf001300>] (drm_sched_main [gpu_sched]) from [<c0146cfc>] (kthread+0x13c/0x154) > [58605.076226] [<c0146cfc>] (kthread) from [<c01010b4>] (ret_from_fork+0x14/0x20) > [58605.084346] Exception stack(0xe959bfb0 to 0xe959bff8) > [58605.090046] bfa0: 00000000 00000000 00000000 00000000 > [58605.099250] bfc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 > [58605.108480] bfe0: 00000000 00000000 00000000 00000000 00000013 00000000 > [58605.116210] irq event stamp: 179 > [58605.119955] hardirqs last enabled at (187): [<c017f7e4>] console_unlock+0x564/0x5c4 > [58605.128935] hardirqs last disabled at (202): [<c017f308>] console_unlock+0x88/0x5c4 > [58605.137788] softirqs last enabled at (216): [<c0102334>] __do_softirq+0x18c/0x548 > [58605.146543] softirqs last disabled at (227): [<c0129528>] irq_exit+0xc4/0x10c > [58605.154618] ---[ end trace f65bdbd9ea9adfc0 ]--- > ----8<----- > > The problem is that drm_sched_main() calls drm_sched_cleanup_jobs() as > part of the condition of wait_event_interruptible: > >> wait_event_interruptible(sched->wake_up_worker, >> (drm_sched_cleanup_jobs(sched), >> (!drm_sched_blocked(sched) && >> (entity = drm_sched_select_entity(sched))) || >> kthread_should_stop())); > When drm_sched_cleanup_jobs() is called *after* a wait (i.e. 
after > prepare_to_wait_event() has been called), then any might_sleep() will > moan loudly about it. This doesn't seem to happen often (I've only > triggered it once) because usually drm_sched_cleanup_jobs() either > doesn't sleep or does the sleeping during the first call that > wait_event_interruptible() makes (which is before the task state is set). > > I don't really understand why drm_sched_cleanup_jobs() needs to be > called here, a simple change like below 'fixes' it. But I presume > there's some reason for the call being part of the > wait_event_interruptible condition. Can anyone shed light on this? > > The code was introduced in commit 5918045c4ed4 ("drm/scheduler: rework job destruction") > > Steve > > ----8<----- > diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c > index 9a0ee74d82dc..528f295e3a31 100644 > --- a/drivers/gpu/drm/scheduler/sched_main.c > +++ b/drivers/gpu/drm/scheduler/sched_main.c > @@ -699,11 +699,12 @@ static int drm_sched_main(void *param) > struct drm_sched_job *sched_job; > struct dma_fence *fence; > > + drm_sched_cleanup_jobs(sched); > + > wait_event_interruptible(sched->wake_up_worker, > - (drm_sched_cleanup_jobs(sched), > (!drm_sched_blocked(sched) && > (entity = drm_sched_select_entity(sched))) || > - kthread_should_stop())); > + kthread_should_stop()); > > if (!entity) > continue;
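In concrete terms, the split Christian describes would leave only a cheap, non-sleeping check inside the wait condition and move the potentially sleeping cleanup out of it. A rough sketch of the resulting drm_sched_main() loop, assuming a hypothetical drm_sched_cleanup_pending() helper for the check half:

```c
/*
 * Sketch only: drm_sched_cleanup_pending() is a hypothetical name for
 * the "is there anything to clean up?" half of the split.  The actual
 * cleanup still happens in drm_sched_cleanup_jobs(), but now with the
 * task back in TASK_RUNNING state, where sleeping is fine.
 */
while (!kthread_should_stop()) {
	struct drm_sched_entity *entity = NULL;

	wait_event_interruptible(sched->wake_up_worker,
				 drm_sched_cleanup_pending(sched) ||
				 (!drm_sched_blocked(sched) &&
				  (entity = drm_sched_select_entity(sched))) ||
				 kthread_should_stop());

	drm_sched_cleanup_jobs(sched);	/* may take mutexes, sleep, etc. */

	if (!entity)
		continue;

	/* ... run the selected entity's job as before ... */
}
```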
On 17/09/2019 09:42, Koenig, Christian wrote: > Hi Steven, > > thought about that issue a bit more and I think I came up with a solution. > > What you could do is to split up drm_sched_cleanup_jobs() into two > functions. > > One that checks if jobs to be cleaned up are present and one which does > the actual cleanup. > > This way we could call drm_sched_cleanup_jobs() outside of the > wait_event_interruptible(). Yes that seems like a good solution - there doesn't seem to be a good reason why the actual job cleanup needs to be done within the wait_event_interruptible() condition. I did briefly attempt that before, but I couldn't work out exactly what the condition is which should cause the wake (my initial attempt caused continuous wake-ups). I'm not back in the office until Monday, but I'll have another go then at spitting the checking and the actual freeing of the jobs. Thanks, Steve > > Regards, > Christian. > > Am 13.09.19 um 16:50 schrieb Steven Price: >> Hi, >> >> I hit the below splat randomly with panfrost. From what I can tell this >> is a more general issue which would affect other drivers. >> >> ----8<----- >> [58604.913130] ------------[ cut here ]------------ >> [58604.918590] WARNING: CPU: 1 PID: 1758 at kernel/sched/core.c:6556 __might_sleep+0x74/0x98 >> [58604.927965] do not call blocking ops when !TASK_RUNNING; state=1 set at [<0c590494>] prepare_to_wait_event+0x104/0x164 >> [58604.940047] Modules linked in: panfrost gpu_sched >> [58604.945370] CPU: 1 PID: 1758 Comm: pan_js Not tainted 5.3.0-rc1+ #13 >> [58604.952500] Hardware name: Rockchip (Device Tree) >> [58604.957815] [<c0111150>] (unwind_backtrace) from [<c010c99c>] (show_stack+0x10/0x14) >> [58604.966521] [<c010c99c>] (show_stack) from [<c07adbb4>] (dump_stack+0x9c/0xd4) >> [58604.974639] [<c07adbb4>] (dump_stack) from [<c0121da8>] (__warn+0xe8/0x104) >> [58604.982462] [<c0121da8>] (__warn) from [<c0121e08>] (warn_slowpath_fmt+0x44/0x6c) >> [58604.990867] [<c0121e08>] (warn_slowpath_fmt) from [<c014eccc>] (__might_sleep+0x74/0x98) >> [58604.999973] [<c014eccc>] (__might_sleep) from [<c07c73d8>] (__mutex_lock+0x38/0x948) >> [58605.008690] [<c07c73d8>] (__mutex_lock) from [<c07c7d00>] (mutex_lock_nested+0x18/0x20) >> [58605.017841] [<c07c7d00>] (mutex_lock_nested) from [<bf00b54c>] (panfrost_gem_free_object+0x60/0x10c [panfrost]) >> [58605.029430] [<bf00b54c>] (panfrost_gem_free_object [panfrost]) from [<bf00cecc>] (panfrost_job_put+0x138/0x150 [panfrost]) >> [58605.042076] [<bf00cecc>] (panfrost_job_put [panfrost]) from [<bf00121c>] (drm_sched_cleanup_jobs+0xc8/0xe0 [gpu_sched]) >> [58605.054417] [<bf00121c>] (drm_sched_cleanup_jobs [gpu_sched]) from [<bf001300>] (drm_sched_main+0xcc/0x26c [gpu_sched]) >> [58605.066620] [<bf001300>] (drm_sched_main [gpu_sched]) from [<c0146cfc>] (kthread+0x13c/0x154) >> [58605.076226] [<c0146cfc>] (kthread) from [<c01010b4>] (ret_from_fork+0x14/0x20) >> [58605.084346] Exception stack(0xe959bfb0 to 0xe959bff8) >> [58605.090046] bfa0: 00000000 00000000 00000000 00000000 >> [58605.099250] bfc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 >> [58605.108480] bfe0: 00000000 00000000 00000000 00000000 00000013 00000000 >> [58605.116210] irq event stamp: 179 >> [58605.119955] hardirqs last enabled at (187): [<c017f7e4>] console_unlock+0x564/0x5c4 >> [58605.128935] hardirqs last disabled at (202): [<c017f308>] console_unlock+0x88/0x5c4 >> [58605.137788] softirqs last enabled at (216): [<c0102334>] __do_softirq+0x18c/0x548 >> [58605.146543] softirqs last disabled at 
(227): [<c0129528>] irq_exit+0xc4/0x10c >> [58605.154618] ---[ end trace f65bdbd9ea9adfc0 ]--- >> ----8<----- >> >> The problem is that drm_sched_main() calls drm_sched_cleanup_jobs() as >> part of the condition of wait_event_interruptible: >> >>> wait_event_interruptible(sched->wake_up_worker, >>> (drm_sched_cleanup_jobs(sched), >>> (!drm_sched_blocked(sched) && >>> (entity = drm_sched_select_entity(sched))) || >>> kthread_should_stop())); >> When drm_sched_cleanup_jobs() is called *after* a wait (i.e. after >> prepare_to_wait_event() has been called), then any might_sleep() will >> moan loudly about it. This doesn't seem to happen often (I've only >> triggered it once) because usually drm_sched_cleanup_jobs() either >> doesn't sleep or does the sleeping during the first call that >> wait_event_interruptible() makes (which is before the task state is set). >> >> I don't really understand why drm_sched_cleanup_jobs() needs to be >> called here, a simple change like below 'fixes' it. But I presume >> there's some reason for the call being part of the >> wait_event_interruptible condition. Can anyone shed light on this? >> >> The code was introduced in commit 5918045c4ed4 ("drm/scheduler: rework job destruction") >> >> Steve >> >> ----8<----- >> diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c >> index 9a0ee74d82dc..528f295e3a31 100644 >> --- a/drivers/gpu/drm/scheduler/sched_main.c >> +++ b/drivers/gpu/drm/scheduler/sched_main.c >> @@ -699,11 +699,12 @@ static int drm_sched_main(void *param) >> struct drm_sched_job *sched_job; >> struct dma_fence *fence; >> >> + drm_sched_cleanup_jobs(sched); >> + >> wait_event_interruptible(sched->wake_up_worker, >> - (drm_sched_cleanup_jobs(sched), >> (!drm_sched_blocked(sched) && >> (entity = drm_sched_select_entity(sched))) || >> - kthread_should_stop())); >> + kthread_should_stop()); >> >> if (!entity) >> continue; > > _______________________________________________ > dri-devel mailing list > dri-devel@lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/dri-devel >
Sorry for the delayed response, have been busy on other stuff last week. Am 17.09.19 um 14:46 schrieb Steven Price: > On 17/09/2019 09:42, Koenig, Christian wrote: >> Hi Steven, >> >> thought about that issue a bit more and I think I came up with a >> solution. >> >> What you could do is to split up drm_sched_cleanup_jobs() into two >> functions. >> >> One that checks if jobs to be cleaned up are present and one which does >> the actual cleanup. >> >> This way we could call drm_sched_cleanup_jobs() outside of the >> wait_event_interruptible(). > > Yes that seems like a good solution - there doesn't seem to be a good > reason why the actual job cleanup needs to be done within the > wait_event_interruptible() condition. I did briefly attempt that > before, but I couldn't work out exactly what the condition is which > should cause the wake (my initial attempt caused continuous wake-ups). Basically you need something like the following: 1. Test is timeout worker is running: if (sched->timeout != MAX_SCHEDULE_TIMEOUT && !cancel_delayed_work(&sched->work_tdr)) return false; 2. Test if there is any job ready to be cleaned up. job = list_first_entry_or_null(&sched->ring_mirror_list, struct drm_sched_job, node); if (!job || !dma_fence_is_signaled(&job->s_fence->finished)) return false; That should basically do it. Regards, Christian. > > I'm not back in the office until Monday, but I'll have another go then > at spitting the checking and the actual freeing of the jobs. > > Thanks, > > Steve > >> >> Regards, >> Christian. >> >> Am 13.09.19 um 16:50 schrieb Steven Price: >>> Hi, >>> >>> I hit the below splat randomly with panfrost. From what I can tell this >>> is a more general issue which would affect other drivers. >>> >>> ----8<----- >>> [58604.913130] ------------[ cut here ]------------ >>> [58604.918590] WARNING: CPU: 1 PID: 1758 at kernel/sched/core.c:6556 >>> __might_sleep+0x74/0x98 >>> [58604.927965] do not call blocking ops when !TASK_RUNNING; state=1 >>> set at [<0c590494>] prepare_to_wait_event+0x104/0x164 >>> [58604.940047] Modules linked in: panfrost gpu_sched >>> [58604.945370] CPU: 1 PID: 1758 Comm: pan_js Not tainted 5.3.0-rc1+ #13 >>> [58604.952500] Hardware name: Rockchip (Device Tree) >>> [58604.957815] [<c0111150>] (unwind_backtrace) from [<c010c99c>] >>> (show_stack+0x10/0x14) >>> [58604.966521] [<c010c99c>] (show_stack) from [<c07adbb4>] >>> (dump_stack+0x9c/0xd4) >>> [58604.974639] [<c07adbb4>] (dump_stack) from [<c0121da8>] >>> (__warn+0xe8/0x104) >>> [58604.982462] [<c0121da8>] (__warn) from [<c0121e08>] >>> (warn_slowpath_fmt+0x44/0x6c) >>> [58604.990867] [<c0121e08>] (warn_slowpath_fmt) from [<c014eccc>] >>> (__might_sleep+0x74/0x98) >>> [58604.999973] [<c014eccc>] (__might_sleep) from [<c07c73d8>] >>> (__mutex_lock+0x38/0x948) >>> [58605.008690] [<c07c73d8>] (__mutex_lock) from [<c07c7d00>] >>> (mutex_lock_nested+0x18/0x20) >>> [58605.017841] [<c07c7d00>] (mutex_lock_nested) from [<bf00b54c>] >>> (panfrost_gem_free_object+0x60/0x10c [panfrost]) >>> [58605.029430] [<bf00b54c>] (panfrost_gem_free_object [panfrost]) >>> from [<bf00cecc>] (panfrost_job_put+0x138/0x150 [panfrost]) >>> [58605.042076] [<bf00cecc>] (panfrost_job_put [panfrost]) from >>> [<bf00121c>] (drm_sched_cleanup_jobs+0xc8/0xe0 [gpu_sched]) >>> [58605.054417] [<bf00121c>] (drm_sched_cleanup_jobs [gpu_sched]) >>> from [<bf001300>] (drm_sched_main+0xcc/0x26c [gpu_sched]) >>> [58605.066620] [<bf001300>] (drm_sched_main [gpu_sched]) from >>> [<c0146cfc>] (kthread+0x13c/0x154) >>> [58605.076226] [<c0146cfc>] 
(kthread) from [<c01010b4>] >>> (ret_from_fork+0x14/0x20) >>> [58605.084346] Exception stack(0xe959bfb0 to 0xe959bff8) >>> [58605.090046] bfa0: 00000000 00000000 00000000 00000000 >>> [58605.099250] bfc0: 00000000 00000000 00000000 00000000 00000000 >>> 00000000 00000000 00000000 >>> [58605.108480] bfe0: 00000000 00000000 00000000 00000000 00000013 >>> 00000000 >>> [58605.116210] irq event stamp: 179 >>> [58605.119955] hardirqs last enabled at (187): [<c017f7e4>] >>> console_unlock+0x564/0x5c4 >>> [58605.128935] hardirqs last disabled at (202): [<c017f308>] >>> console_unlock+0x88/0x5c4 >>> [58605.137788] softirqs last enabled at (216): [<c0102334>] >>> __do_softirq+0x18c/0x548 >>> [58605.146543] softirqs last disabled at (227): [<c0129528>] >>> irq_exit+0xc4/0x10c >>> [58605.154618] ---[ end trace f65bdbd9ea9adfc0 ]--- >>> ----8<----- >>> >>> The problem is that drm_sched_main() calls drm_sched_cleanup_jobs() as >>> part of the condition of wait_event_interruptible: >>> >>>> wait_event_interruptible(sched->wake_up_worker, >>>> (drm_sched_cleanup_jobs(sched), >>>> (!drm_sched_blocked(sched) && >>>> (entity = drm_sched_select_entity(sched))) || >>>> kthread_should_stop())); >>> When drm_sched_cleanup_jobs() is called *after* a wait (i.e. after >>> prepare_to_wait_event() has been called), then any might_sleep() will >>> moan loudly about it. This doesn't seem to happen often (I've only >>> triggered it once) because usually drm_sched_cleanup_jobs() either >>> doesn't sleep or does the sleeping during the first call that >>> wait_event_interruptible() makes (which is before the task state is >>> set). >>> >>> I don't really understand why drm_sched_cleanup_jobs() needs to be >>> called here, a simple change like below 'fixes' it. But I presume >>> there's some reason for the call being part of the >>> wait_event_interruptible condition. Can anyone shed light on this? >>> >>> The code was introduced in commit 5918045c4ed4 ("drm/scheduler: >>> rework job destruction") >>> >>> Steve >>> >>> ----8<----- >>> diff --git a/drivers/gpu/drm/scheduler/sched_main.c >>> b/drivers/gpu/drm/scheduler/sched_main.c >>> index 9a0ee74d82dc..528f295e3a31 100644 >>> --- a/drivers/gpu/drm/scheduler/sched_main.c >>> +++ b/drivers/gpu/drm/scheduler/sched_main.c >>> @@ -699,11 +699,12 @@ static int drm_sched_main(void *param) >>> struct drm_sched_job *sched_job; >>> struct dma_fence *fence; >>> + drm_sched_cleanup_jobs(sched); >>> + >>> wait_event_interruptible(sched->wake_up_worker, >>> - (drm_sched_cleanup_jobs(sched), >>> (!drm_sched_blocked(sched) && >>> (entity = drm_sched_select_entity(sched))) || >>> - kthread_should_stop())); >>> + kthread_should_stop()); >>> if (!entity) >>> continue; >> >> _______________________________________________ >> dri-devel mailing list >> dri-devel@lists.freedesktop.org >> https://lists.freedesktop.org/mailman/listinfo/dri-devel >> >
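Assembled into one helper, the two tests above might look something like the sketch below (illustrative only; the helper name is hypothetical and matches the drm_sched_cleanup_pending() used in the earlier loop sketch). Note that cancel_delayed_work() has the side effect of stopping the pending timeout work, which is why the cleanup path also has to re-arm it via drm_sched_start_timeout(), the "queue timeout for next job" point Steven raises below.

```c
/*
 * Sketch of the "anything to clean up?" check assembled from the two
 * tests above.  Illustrative only, not actual scheduler code.
 */
static bool drm_sched_cleanup_pending(struct drm_gpu_scheduler *sched)
{
	struct drm_sched_job *job;

	/* 1. Don't destroy jobs while the timeout worker is running. */
	if (sched->timeout != MAX_SCHEDULE_TIMEOUT &&
	    !cancel_delayed_work(&sched->work_tdr))
		return false;

	/* 2. Is the oldest job on the mirror list actually finished? */
	job = list_first_entry_or_null(&sched->ring_mirror_list,
				       struct drm_sched_job, node);

	return job && dma_fence_is_signaled(&job->s_fence->finished);
}
```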
On 24/09/2019 10:55, Koenig, Christian wrote: > Sorry for the delayed response, have been busy on other stuff last week. > > Am 17.09.19 um 14:46 schrieb Steven Price: >> On 17/09/2019 09:42, Koenig, Christian wrote: >>> Hi Steven, >>> >>> thought about that issue a bit more and I think I came up with a >>> solution. >>> >>> What you could do is to split up drm_sched_cleanup_jobs() into two >>> functions. >>> >>> One that checks if jobs to be cleaned up are present and one which does >>> the actual cleanup. >>> >>> This way we could call drm_sched_cleanup_jobs() outside of the >>> wait_event_interruptible(). >> >> Yes that seems like a good solution - there doesn't seem to be a good >> reason why the actual job cleanup needs to be done within the >> wait_event_interruptible() condition. I did briefly attempt that >> before, but I couldn't work out exactly what the condition is which >> should cause the wake (my initial attempt caused continuous wake-ups). > > Basically you need something like the following: > > 1. Test is timeout worker is running: > > if (sched->timeout != MAX_SCHEDULE_TIMEOUT && > !cancel_delayed_work(&sched->work_tdr)) > return false; > > 2. Test if there is any job ready to be cleaned up. > > job = list_first_entry_or_null(&sched->ring_mirror_list, struct > drm_sched_job, node); > if (!job || !dma_fence_is_signaled(&job->s_fence->finished)) > return false; > > That should basically do it. Thanks for the pointers. I wasn't sure if the "queue timeout for next job" part was necessary or not if step 2 above returns false. I've been testing the following patch which simply pulls the sched->ops->free_job() out of the wait_event_interruptible(). I'll try with just the tests you've described. ----8<----- From 873c1816394beee72904e64aa2ee0f169e768d76 Mon Sep 17 00:00:00 2001 From: Steven Price <steven.price@arm.com> Date: Mon, 23 Sep 2019 11:08:50 +0100 Subject: [PATCH] drm: Don't free jobs in wait_event_interruptible() drm_sched_cleanup_jobs() attempts to free finished jobs, however because it is called as the condition of wait_event_interruptible() it must not sleep. Unfortunately some free callbacks (notably for Panfrost) do sleep. Instead let's rename drm_sched_cleanup_jobs() to drm_sched_get_cleanup_job() and simply return a job for processing if there is one. The caller can then call the free_job() callback outside the wait_event_interruptible() where sleeping is possible before re-checking and returning to sleep if necessary. Signed-off-by: Steven Price <steven.price@arm.com> --- drivers/gpu/drm/scheduler/sched_main.c | 22 +++++++++++++++------- 1 file changed, 15 insertions(+), 7 deletions(-) diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c index 9a0ee74d82dc..bf9b4931ddfd 100644 --- a/drivers/gpu/drm/scheduler/sched_main.c +++ b/drivers/gpu/drm/scheduler/sched_main.c @@ -622,20 +622,21 @@ static void drm_sched_process_job(struct dma_fence *f, struct dma_fence_cb *cb) } /** - * drm_sched_cleanup_jobs - destroy finished jobs + * drm_sched_get_cleanup_job - fetch the next finished job to be destroyed * * @sched: scheduler instance * - * Remove all finished jobs from the mirror list and destroy them. + * Returns the next finished job from the mirror list (if there is one) + * ready for it to be destroyed. 
*/ -static void drm_sched_cleanup_jobs(struct drm_gpu_scheduler *sched) +static struct drm_sched_job *drm_sched_get_cleanup_job(struct drm_gpu_scheduler *sched) { unsigned long flags; /* Don't destroy jobs while the timeout worker is running */ if (sched->timeout != MAX_SCHEDULE_TIMEOUT && !cancel_delayed_work(&sched->work_tdr)) - return; + return NULL; while (!list_empty(&sched->ring_mirror_list)) { @@ -651,7 +652,7 @@ static void drm_sched_cleanup_jobs(struct drm_gpu_scheduler *sched) list_del_init(&job->node); spin_unlock_irqrestore(&sched->job_list_lock, flags); - sched->ops->free_job(job); + return job; } /* queue timeout for next job */ @@ -659,6 +660,7 @@ static void drm_sched_cleanup_jobs(struct drm_gpu_scheduler *sched) drm_sched_start_timeout(sched); spin_unlock_irqrestore(&sched->job_list_lock, flags); + return NULL; } /** @@ -698,12 +700,18 @@ static int drm_sched_main(void *param) struct drm_sched_fence *s_fence; struct drm_sched_job *sched_job; struct dma_fence *fence; + struct drm_sched_job *cleanup_job = NULL; wait_event_interruptible(sched->wake_up_worker, - (drm_sched_cleanup_jobs(sched), + (cleanup_job = drm_sched_get_cleanup_job(sched)) || (!drm_sched_blocked(sched) && (entity = drm_sched_select_entity(sched))) || - kthread_should_stop())); + kthread_should_stop()); + + while (cleanup_job) { + sched->ops->free_job(cleanup_job); + cleanup_job = drm_sched_get_cleanup_job(sched); + } if (!entity) continue;
Am 25.09.19 um 16:06 schrieb Steven Price: > On 24/09/2019 10:55, Koenig, Christian wrote: >> Sorry for the delayed response, have been busy on other stuff last week. >> >> Am 17.09.19 um 14:46 schrieb Steven Price: >>> On 17/09/2019 09:42, Koenig, Christian wrote: >>>> Hi Steven, >>>> >>>> thought about that issue a bit more and I think I came up with a >>>> solution. >>>> >>>> What you could do is to split up drm_sched_cleanup_jobs() into two >>>> functions. >>>> >>>> One that checks if jobs to be cleaned up are present and one which does >>>> the actual cleanup. >>>> >>>> This way we could call drm_sched_cleanup_jobs() outside of the >>>> wait_event_interruptible(). >>> Yes that seems like a good solution - there doesn't seem to be a good >>> reason why the actual job cleanup needs to be done within the >>> wait_event_interruptible() condition. I did briefly attempt that >>> before, but I couldn't work out exactly what the condition is which >>> should cause the wake (my initial attempt caused continuous wake-ups). >> Basically you need something like the following: >> >> 1. Test is timeout worker is running: >> >> if (sched->timeout != MAX_SCHEDULE_TIMEOUT && >> !cancel_delayed_work(&sched->work_tdr)) >> return false; >> >> 2. Test if there is any job ready to be cleaned up. >> >> job = list_first_entry_or_null(&sched->ring_mirror_list, struct >> drm_sched_job, node); >> if (!job || !dma_fence_is_signaled(&job->s_fence->finished)) >> return false; >> >> That should basically do it. > Thanks for the pointers. I wasn't sure if the "queue timeout for next > job" part was necessary or not if step 2 above returns false. > > I've been testing the following patch which simply pulls the > sched->ops->free_job() out of the wait_event_interruptible(). > > I'll try with just the tests you've described. > > ----8<----- > From 873c1816394beee72904e64aa2ee0f169e768d76 Mon Sep 17 00:00:00 2001 > From: Steven Price <steven.price@arm.com> > Date: Mon, 23 Sep 2019 11:08:50 +0100 > Subject: [PATCH] drm: Don't free jobs in wait_event_interruptible() > > drm_sched_cleanup_jobs() attempts to free finished jobs, however because > it is called as the condition of wait_event_interruptible() it must not > sleep. Unfortuantly some free callbacks (notibly for Panfrost) do sleep. > > Instead let's rename drm_sched_cleanup_jobs() to > drm_sched_get_cleanup_job() and simply return a job for processing if > there is one. The caller can then call the free_job() callback outside > the wait_event_interruptible() where sleeping is possible before > re-checking and returning to sleep if necessary. > > Signed-off-by: Steven Price <steven.price@arm.com> > --- > drivers/gpu/drm/scheduler/sched_main.c | 22 +++++++++++++++------- > 1 file changed, 15 insertions(+), 7 deletions(-) > > diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c > index 9a0ee74d82dc..bf9b4931ddfd 100644 > --- a/drivers/gpu/drm/scheduler/sched_main.c > +++ b/drivers/gpu/drm/scheduler/sched_main.c > @@ -622,20 +622,21 @@ static void drm_sched_process_job(struct dma_fence *f, struct dma_fence_cb *cb) > } > > /** > - * drm_sched_cleanup_jobs - destroy finished jobs > + * drm_sched_get_cleanup_job - fetch the next finished job to be destroyed > * > * @sched: scheduler instance > * > - * Remove all finished jobs from the mirror list and destroy them. > + * Returns the next finished job from the mirror list (if there is one) > + * ready for it to be destroyed. 
> */ > -static void drm_sched_cleanup_jobs(struct drm_gpu_scheduler *sched) > +static struct drm_sched_job *drm_sched_get_cleanup_job(struct drm_gpu_scheduler *sched) > { > unsigned long flags; > > /* Don't destroy jobs while the timeout worker is running */ > if (sched->timeout != MAX_SCHEDULE_TIMEOUT && > !cancel_delayed_work(&sched->work_tdr)) > - return; > + return NULL; > > > while (!list_empty(&sched->ring_mirror_list)) { Yeah, you should probably clean that up a bit here, but apart from that this should work as well. Regards, Christian. > @@ -651,7 +652,7 @@ static void drm_sched_cleanup_jobs(struct drm_gpu_scheduler *sched) > list_del_init(&job->node); > spin_unlock_irqrestore(&sched->job_list_lock, flags); > > - sched->ops->free_job(job); > + return job; > } > > /* queue timeout for next job */ > @@ -659,6 +660,7 @@ static void drm_sched_cleanup_jobs(struct drm_gpu_scheduler *sched) > drm_sched_start_timeout(sched); > spin_unlock_irqrestore(&sched->job_list_lock, flags); > > + return NULL; > } > > /** > @@ -698,12 +700,18 @@ static int drm_sched_main(void *param) > struct drm_sched_fence *s_fence; > struct drm_sched_job *sched_job; > struct dma_fence *fence; > + struct drm_sched_job *cleanup_job = NULL; > > wait_event_interruptible(sched->wake_up_worker, > - (drm_sched_cleanup_jobs(sched), > + (cleanup_job = drm_sched_get_cleanup_job(sched)) || > (!drm_sched_blocked(sched) && > (entity = drm_sched_select_entity(sched))) || > - kthread_should_stop())); > + kthread_should_stop()); > + > + while (cleanup_job) { > + sched->ops->free_job(cleanup_job); > + cleanup_job = drm_sched_get_cleanup_job(sched); > + } > > if (!entity) > continue;
On 25/09/2019 15:12, Koenig, Christian wrote: > Am 25.09.19 um 16:06 schrieb Steven Price: >> On 24/09/2019 10:55, Koenig, Christian wrote: >>> Sorry for the delayed response, have been busy on other stuff last week. >>> >>> Am 17.09.19 um 14:46 schrieb Steven Price: >>>> On 17/09/2019 09:42, Koenig, Christian wrote: >>>>> Hi Steven, >>>>> >>>>> thought about that issue a bit more and I think I came up with a >>>>> solution. >>>>> >>>>> What you could do is to split up drm_sched_cleanup_jobs() into two >>>>> functions. >>>>> >>>>> One that checks if jobs to be cleaned up are present and one which does >>>>> the actual cleanup. >>>>> >>>>> This way we could call drm_sched_cleanup_jobs() outside of the >>>>> wait_event_interruptible(). >>>> Yes that seems like a good solution - there doesn't seem to be a good >>>> reason why the actual job cleanup needs to be done within the >>>> wait_event_interruptible() condition. I did briefly attempt that >>>> before, but I couldn't work out exactly what the condition is which >>>> should cause the wake (my initial attempt caused continuous wake-ups). >>> Basically you need something like the following: >>> >>> 1. Test is timeout worker is running: >>> >>> if (sched->timeout != MAX_SCHEDULE_TIMEOUT && >>> !cancel_delayed_work(&sched->work_tdr)) >>> return false; >>> >>> 2. Test if there is any job ready to be cleaned up. >>> >>> job = list_first_entry_or_null(&sched->ring_mirror_list, struct >>> drm_sched_job, node); >>> if (!job || !dma_fence_is_signaled(&job->s_fence->finished)) >>> return false; >>> >>> That should basically do it. >> Thanks for the pointers. I wasn't sure if the "queue timeout for next >> job" part was necessary or not if step 2 above returns false. >> >> I've been testing the following patch which simply pulls the >> sched->ops->free_job() out of the wait_event_interruptible(). >> >> I'll try with just the tests you've described. It looks like it is necessary to queue the timeout for the next job even if there isn't a job to be cleaned up (i.e. even if we wouldn't exit from wait_event_interruptible(). So I'll go with a patch like below. >> ----8<----- >> From 873c1816394beee72904e64aa2ee0f169e768d76 Mon Sep 17 00:00:00 2001 >> From: Steven Price <steven.price@arm.com> >> Date: Mon, 23 Sep 2019 11:08:50 +0100 >> Subject: [PATCH] drm: Don't free jobs in wait_event_interruptible() >> >> drm_sched_cleanup_jobs() attempts to free finished jobs, however because >> it is called as the condition of wait_event_interruptible() it must not >> sleep. Unfortuantly some free callbacks (notibly for Panfrost) do sleep. >> >> Instead let's rename drm_sched_cleanup_jobs() to >> drm_sched_get_cleanup_job() and simply return a job for processing if >> there is one. The caller can then call the free_job() callback outside >> the wait_event_interruptible() where sleeping is possible before >> re-checking and returning to sleep if necessary. 
>> >> Signed-off-by: Steven Price <steven.price@arm.com> >> --- >> drivers/gpu/drm/scheduler/sched_main.c | 22 +++++++++++++++------- >> 1 file changed, 15 insertions(+), 7 deletions(-) >> >> diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c >> index 9a0ee74d82dc..bf9b4931ddfd 100644 >> --- a/drivers/gpu/drm/scheduler/sched_main.c >> +++ b/drivers/gpu/drm/scheduler/sched_main.c >> @@ -622,20 +622,21 @@ static void drm_sched_process_job(struct dma_fence *f, struct dma_fence_cb *cb) >> } >> >> /** >> - * drm_sched_cleanup_jobs - destroy finished jobs >> + * drm_sched_get_cleanup_job - fetch the next finished job to be destroyed >> * >> * @sched: scheduler instance >> * >> - * Remove all finished jobs from the mirror list and destroy them. >> + * Returns the next finished job from the mirror list (if there is one) >> + * ready for it to be destroyed. >> */ >> -static void drm_sched_cleanup_jobs(struct drm_gpu_scheduler *sched) >> +static struct drm_sched_job *drm_sched_get_cleanup_job(struct drm_gpu_scheduler *sched) >> { >> unsigned long flags; >> >> /* Don't destroy jobs while the timeout worker is running */ >> if (sched->timeout != MAX_SCHEDULE_TIMEOUT && >> !cancel_delayed_work(&sched->work_tdr)) >> - return; >> + return NULL; >> >> >> while (!list_empty(&sched->ring_mirror_list)) { > > Yeah, you should probably clean that up a bit here, but apart from that > this should work as well. Good point - this function can be tidied up quite a bit, I'll post a new patch shortly. Thanks, Steve > Regards, > Christian. > >> @@ -651,7 +652,7 @@ static void drm_sched_cleanup_jobs(struct drm_gpu_scheduler *sched) >> list_del_init(&job->node); >> spin_unlock_irqrestore(&sched->job_list_lock, flags); >> >> - sched->ops->free_job(job); >> + return job; >> } >> >> /* queue timeout for next job */ >> @@ -659,6 +660,7 @@ static void drm_sched_cleanup_jobs(struct drm_gpu_scheduler *sched) >> drm_sched_start_timeout(sched); >> spin_unlock_irqrestore(&sched->job_list_lock, flags); >> >> + return NULL; >> } >> >> /** >> @@ -698,12 +700,18 @@ static int drm_sched_main(void *param) >> struct drm_sched_fence *s_fence; >> struct drm_sched_job *sched_job; >> struct dma_fence *fence; >> + struct drm_sched_job *cleanup_job = NULL; >> >> wait_event_interruptible(sched->wake_up_worker, >> - (drm_sched_cleanup_jobs(sched), >> + (cleanup_job = drm_sched_get_cleanup_job(sched)) || >> (!drm_sched_blocked(sched) && >> (entity = drm_sched_select_entity(sched))) || >> - kthread_should_stop())); >> + kthread_should_stop()); >> + >> + while (cleanup_job) { >> + sched->ops->free_job(cleanup_job); >> + cleanup_job = drm_sched_get_cleanup_job(sched); >> + } >> >> if (!entity) >> continue; > > _______________________________________________ > dri-devel mailing list > dri-devel@lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/dri-devel >
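Folding Christian's "clean that up a bit" comment and the observation above about always re-arming the timeout into drm_sched_get_cleanup_job() could end up looking roughly like the sketch below (one possible shape only, using the 5.3-era field and helper names from the patch above):

```c
static struct drm_sched_job *
drm_sched_get_cleanup_job(struct drm_gpu_scheduler *sched)
{
	struct drm_sched_job *job;
	unsigned long flags;

	/* Don't destroy jobs while the timeout worker is running */
	if (sched->timeout != MAX_SCHEDULE_TIMEOUT &&
	    !cancel_delayed_work(&sched->work_tdr))
		return NULL;

	spin_lock_irqsave(&sched->job_list_lock, flags);

	job = list_first_entry_or_null(&sched->ring_mirror_list,
				       struct drm_sched_job, node);

	if (job && dma_fence_is_signaled(&job->s_fence->finished)) {
		/* remove job from ring_mirror_list */
		list_del_init(&job->node);
	} else {
		job = NULL;
		/* queue timeout for next job */
		drm_sched_start_timeout(sched);
	}

	spin_unlock_irqrestore(&sched->job_list_lock, flags);

	return job;
}
```

The while (!list_empty()) loop then disappears entirely, since drm_sched_main() in the patch above already loops by freeing the returned job and fetching the next one until NULL comes back.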
diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
index 9a0ee74d82dc..528f295e3a31 100644
--- a/drivers/gpu/drm/scheduler/sched_main.c
+++ b/drivers/gpu/drm/scheduler/sched_main.c
@@ -699,11 +699,12 @@ static int drm_sched_main(void *param)
 		struct drm_sched_job *sched_job;
 		struct dma_fence *fence;
 
+		drm_sched_cleanup_jobs(sched);
+
 		wait_event_interruptible(sched->wake_up_worker,
-					 (drm_sched_cleanup_jobs(sched),
 					 (!drm_sched_blocked(sched) &&
 					  (entity = drm_sched_select_entity(sched))) ||
-					 kthread_should_stop()));
+					 kthread_should_stop());
 
 		if (!entity)
 			continue;