From patchwork Mon Aug 17 15:46:36 2015 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Grygorii Strashko X-Patchwork-Id: 7025571 Return-Path: X-Original-To: patchwork-linux-arm@patchwork.kernel.org Delivered-To: patchwork-parsemail@patchwork1.web.kernel.org Received: from mail.kernel.org (mail.kernel.org [198.145.29.136]) by patchwork1.web.kernel.org (Postfix) with ESMTP id CDC869F358 for ; Mon, 17 Aug 2015 15:49:45 +0000 (UTC) Received: from mail.kernel.org (localhost [127.0.0.1]) by mail.kernel.org (Postfix) with ESMTP id A87C320690 for ; Mon, 17 Aug 2015 15:49:44 +0000 (UTC) Received: from bombadil.infradead.org (bombadil.infradead.org [198.137.202.9]) (using TLSv1.2 with cipher AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 32D2C20681 for ; Mon, 17 Aug 2015 15:49:43 +0000 (UTC) Received: from localhost ([127.0.0.1] helo=bombadil.infradead.org) by bombadil.infradead.org with esmtp (Exim 4.80.1 #2 (Red Hat Linux)) id 1ZRMd4-0002Nj-LJ; Mon, 17 Aug 2015 15:47:18 +0000 Received: from comal.ext.ti.com ([198.47.26.152]) by bombadil.infradead.org with esmtps (Exim 4.80.1 #2 (Red Hat Linux)) id 1ZRMd1-0002Ll-1z for linux-arm-kernel@lists.infradead.org; Mon, 17 Aug 2015 15:47:16 +0000 Received: from dflxv15.itg.ti.com ([128.247.5.124]) by comal.ext.ti.com (8.13.7/8.13.7) with ESMTP id t7HFkd3c011437; Mon, 17 Aug 2015 10:46:39 -0500 Received: from DLEE71.ent.ti.com (dlee71.ent.ti.com [157.170.170.114]) by dflxv15.itg.ti.com (8.14.3/8.13.8) with ESMTP id t7HFkduD021918; Mon, 17 Aug 2015 10:46:39 -0500 Received: from dlep33.itg.ti.com (157.170.170.75) by DLEE71.ent.ti.com (157.170.170.114) with Microsoft SMTP Server id 14.3.224.2; Mon, 17 Aug 2015 10:46:39 -0500 Received: from [172.22.235.224] (ileax41-snat.itg.ti.com [10.172.224.153]) by dlep33.itg.ti.com (8.14.3/8.13.8) with ESMTP id t7HFkbwP005909; Mon, 17 Aug 2015 10:46:38 -0500 Message-ID: <55D201DC.5070604@ti.com> Date: Mon, 17 Aug 2015 18:46:36 +0300 From: Grygorii Strashko User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.8.0 MIME-Version: 1.0 To: Sebastian Andrzej Siewior Subject: Re: [4.1.3-rt3] [report][suspend to ram] BUG: sleeping function called from invalid context at kernel/locking/rtmutex.c:917 References: <55CB8FAE.9020309@ti.com> <20150816114241.GC7004@linutronix.de> In-Reply-To: <20150816114241.GC7004@linutronix.de> X-CRM114-Version: 20100106-BlameMichelson ( TRE 0.8.0 (BSD) ) MR-646709E3 X-CRM114-CacheID: sfid-20150817_084715_244989_719A2E0A X-CRM114-Status: GOOD ( 21.93 ) X-Spam-Score: -7.5 (-------) X-BeenThere: linux-arm-kernel@lists.infradead.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Sekhar Nori , Felipe Balbi , linux-rt-users@vger.kernel.org, linux-arm Sender: "linux-arm-kernel" Errors-To: linux-arm-kernel-bounces+patchwork-linux-arm=patchwork.kernel.org@lists.infradead.org X-Spam-Status: No, score=-4.8 required=5.0 tests=BAYES_00, RCVD_IN_DNSWL_MED, RP_MATCHES_RCVD, UNPARSEABLE_RELAY autolearn=unavailable version=3.3.1 X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on mail.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP Hi Sebastian, On 08/16/2015 02:42 PM, Sebastian Andrzej Siewior wrote: > * Grygorii Strashko | 2015-08-12 21:25:50 [+0300]: >> >> I can constantly see below error report with RT-kernel on TI ARM dra7-evm >> if I'm trying to do Suspend to RAM. > > do you see the same problem on x86 with -RT? Unfortunately, I'm not working with x86 now, and I'm not sure I'll be able to try it in the nearest future. Below, I've tried to analyze involved code path and it seems expected to call clear_tasks_mm_cpumask() in atomic context, as the whole take_cpu_down() call chain expected to be atomic. > >> Disabling non-boot CPUs ... >> PM: noirq suspend of devices complete after 7.295 msecs >> [ 100.285729] Disabling non-boot CPUs ... >> [ 100.287575] BUG: sleeping function called from invalid context at kernel/locking/rtmutex.c:917 >> [ 100.287580] in_atomic(): 1, irqs_disabled(): 128, pid: 18, name: migration/1 >> [ 100.287583] INFO: lockdep is turned off. >> [ 100.287586] irq event stamp: 122 >> [ 100.287600] hardirqs last enabled at (121): [] _raw_spin_unlock_irqrestore+0x88/0x90 >> [ 100.287609] hardirqs last disabled at (122): [] _raw_spin_lock_irq+0x28/0x5c >> [ 100.287620] softirqs last enabled at (0): [] copy_process.part.52+0x410/0x19d8 >> [ 100.287625] softirqs last disabled at (0): [< (null)>] (null) >> [ 100.287630] Preemption disabled at:[< (null)>] (null) >> [ 100.287631] >> [ 100.287639] CPU: 1 PID: 18 Comm: migration/1 Tainted: G W 4.1.4-rt3-01046-g96ac8da #204 >> [ 100.287642] Hardware name: Generic DRA74X (Flattened Device Tree) >> [ 100.287659] [] (unwind_backtrace) from [] (show_stack+0x20/0x24) >> [ 100.287671] [] (show_stack) from [] (dump_stack+0x88/0xdc) >> [ 100.287681] [] (dump_stack) from [] (___might_sleep+0x198/0x2a8) >> [ 100.287689] [] (___might_sleep) from [] (rt_spin_lock+0x30/0x70) >> [ 100.287699] [] (rt_spin_lock) from [] (find_lock_task_mm+0x9c/0x174) > > this is task_lock() which takes a sleeping lock. > >> [ 100.287710] [] (find_lock_task_mm) from [] (clear_tasks_mm_cpumask+0xb4/0x1ac) >> [ 100.287720] [] (clear_tasks_mm_cpumask) from [] (__cpu_disable+0x98/0xbc) >> [ 100.287728] [] (__cpu_disable) from [] (take_cpu_down+0x1c/0x50) >> [ 100.287742] [] (take_cpu_down) from [] (multi_cpu_stop+0x11c/0x158) >> [ 100.287754] [] (multi_cpu_stop) from [] (cpu_stopper_thread+0xc4/0x184) > > this function contains local_save_flags(). multi_cpu_stop: local_save_flags(flags); if (!msdata->active_cpus) is_active = cpu == cpumask_first(cpu_online_mask); else is_active = cpumask_test_cpu(cpu, msdata->active_cpus); /* Simple state machine */ do { /* Chill out and ensure we re-read multi_stop_state. */ cpu_relax(); if (msdata->state != curstate) { curstate = msdata->state; switch (curstate) { case MULTI_STOP_DISABLE_IRQ: local_irq_disable(); hard_irq_disable(); <==== Step 2 disable IRQs break; case MULTI_STOP_RUN: if (is_active) err = msdata->fn(msdata->data); ===> Step 3 take_cpu_down() break; default: break; } ack_state(msdata); } } while (curstate != MULTI_STOP_EXIT); local_irq_restore(flags); > >> [ 100.287767] [] (cpu_stopper_thread) from [] (smpboot_thread_fn+0x18c/0x324) cpu_stopper_thread: preempt_disable(); ret = fn(arg); ===> multi_cpu_stop() if (ret) done->ret = ret; preempt_enable(); >> [ 100.287779] [] (smpboot_thread_fn) from [] (kthread+0xe8/0x104) >> [ 100.287791] [] (kthread) from [] (ret_from_fork+0x14/0x3c) >> [ 100.288114] CPU1: shutdown > > The local_save_flags() should be probably replaced with something else. > Below is list of clear_tasks_mm_cpumask() users: arch/arm/kernel/smp.c int __cpu_disable(void) 216: clear_tasks_mm_cpumask(cpu); arch/arm64/kernel/smp.c int __cpu_disable(void) 238: clear_tasks_mm_cpumask(cpu); arch/metag/kernel/smp.c int __cpu_disable(void) 290: clear_tasks_mm_cpumask(cpu); arch/powerpc/mm/mmu_context_nohash.c _cpu_down() + cpu_notify_nofail(CPU_DEAD | mod, hcpu); + mmu_context_cpu_notify() 400: clear_tasks_mm_cpumask(cpu); arch/sh/kernel/smp.c int __cpu_disable(void) 155: clear_tasks_mm_cpumask(cpu); arch/xtensa/kernel/smp.c int __cpu_disable(void) 279: clear_tasks_mm_cpumask(cpu); As per above, in all case, except *powerpc*, the clear_tasks_mm_cpumask() is called from __cpu_disable() (atomic context). While in powerpc case it's called from _cpu_down() (if I understand right, in non-atomic context). Commit which adds clear_tasks_mm_cpumask() call for arm is below: commit 3eaa73bde2fb475b731a0fde7dd11c3ecfb8679c Author: Anton Vorontsov Date: Thu May 31 16:26:22 2012 -0700 arm: use clear_tasks_mm_cpumask() Checking for process->mm is not enough because process' main thread may exit or detach its mm via use_mm(), but other threads may still have a valid mm. To fix this we would need to use find_lock_task_mm(), which would walk up all threads and returns an appropriate task (with task lock held). clear_tasks_mm_cpumask() has this issue fixed, so let's use it. Suggested-by: Oleg Nesterov Signed-off-by: Anton Vorontsov Cc: Russell King Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Thanks for your reply. diff --git a/arch/arm/kernel/smp.c b/arch/arm/kernel/smp.c index b735521..2c7217d 100644 --- a/arch/arm/kernel/smp.c +++ b/arch/arm/kernel/smp.c @@ -109,7 +109,6 @@ static void percpu_timer_stop(void); int __cpu_disable(void) { unsigned int cpu = smp_processor_id(); - struct task_struct *p; int ret; ret = platform_cpu_disable(cpu); @@ -139,12 +138,7 @@ int __cpu_disable(void) flush_cache_all(); local_flush_tlb_all(); - read_lock(&tasklist_lock); - for_each_process(p) { - if (p->mm) - cpumask_clear_cpu(cpu, mm_cpumask(p->mm)); - } - read_unlock(&tasklist_lock); + clear_tasks_mm_cpumask(cpu); return 0; }