From patchwork Wed Mar 28 09:33:32 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ji Zhang X-Patchwork-Id: 10312651 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork.web.codeaurora.org (Postfix) with ESMTP id B7E3A60467 for ; Wed, 28 Mar 2018 09:34:39 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id B10DA2976D for ; Wed, 28 Mar 2018 09:34:39 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id A57F729B32; Wed, 28 Mar 2018 09:34:39 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-1.9 required=2.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,UNPARSEABLE_RELAY autolearn=ham version=3.3.1 Received: from bombadil.infradead.org (bombadil.infradead.org [198.137.202.133]) (using TLSv1.2 with cipher AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.wl.linuxfoundation.org (Postfix) with ESMTPS id 297DF2976D for ; Wed, 28 Mar 2018 09:34:38 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=lists.infradead.org; s=bombadil.20170209; h=Sender: Content-Transfer-Encoding:Content-Type:Cc:List-Subscribe:List-Help:List-Post: List-Archive:List-Unsubscribe:List-Id:MIME-Version:References:In-Reply-To: Date:To:From:Subject:Message-ID:Reply-To:Content-ID:Content-Description: Resent-Date:Resent-From:Resent-Sender:Resent-To:Resent-Cc:Resent-Message-ID: List-Owner; bh=XqjRFIjMS2sMcOlqG/oP7bPnvVwGVK+gPwyUtCNsc7A=; b=bHjF5D8I9/bGQm Sxw1gcPPuk9OQSDquvVa1k8OzicstB+Nl584ykyNXSIKKrw6A5O9tkU9so6frPWvU0lFfs1lBxQRX 67TYY+2nsnM3Er0DRPJfizCWbaOaNgkVaF6osQYdBMcIaEgacNCm7jzLmBj5ucyqM1Jy8QOlv0xz4 X7dgP5+N1neDBJCAJyw2RWblKXALiNWLc8T9zSnqZG//f2qe9e2qOGfHRuDWvgUyO00R5YU5BmOVd +DhpaCWkkE5+9v4ubbJwoJR2vmeovozlUWBlpPMMIogmMXRCbBm4/1IClHEc51BZBJy3tv/Bl08Hj a0v1B6SnfkSUhkhHwuKA==; Received: from localhost ([127.0.0.1] helo=bombadil.infradead.org) by bombadil.infradead.org with esmtp (Exim 4.90_1 #2 (Red Hat Linux)) id 1f17TX-0005SJ-OU; Wed, 28 Mar 2018 09:34:35 +0000 Received: from [210.61.82.183] (helo=mailgw01.mediatek.com) by bombadil.infradead.org with esmtps (Exim 4.90_1 #2 (Red Hat Linux)) id 1f17TU-0005NH-Uc; Wed, 28 Mar 2018 09:34:34 +0000 X-UUID: 00d17a8fe31844af9cb7a5761cf24af9-20180328 Received: from mtkexhb01.mediatek.inc [(172.21.101.102)] by mailgw01.mediatek.com (envelope-from ) (mhqrelay.mediatek.com ESMTP with TLS) with ESMTP id 433048685; Wed, 28 Mar 2018 17:33:34 +0800 Received: from mtkcas08.mediatek.inc (172.21.101.126) by mtkmbs03n1.mediatek.inc (172.21.101.181) with Microsoft SMTP Server (TLS) id 15.0.1210.3; Wed, 28 Mar 2018 17:33:32 +0800 Received: from [172.21.84.99] (172.21.84.99) by mtkcas08.mediatek.inc (172.21.101.73) with Microsoft SMTP Server id 15.0.1210.3 via Frontend Transport; Wed, 28 Mar 2018 17:33:32 +0800 Message-ID: <1522229612.26617.47.camel@mtksdccf07> Subject: Re: [PATCH] arm64: avoid race condition issue in dump_backtrace From: Ji.Zhang To: Mark Rutland Date: Wed, 28 Mar 2018 17:33:32 +0800 In-Reply-To: <20180326113932.2i6qp3776jtmcqk4@lakrids.cambridge.arm.com> References: <1521687960-3744-1-git-send-email-ji.zhang@mediatek.com> <20180322055929.z25brvwlmdighz66@salmiak> <1521711329.26617.31.camel@mtksdccf07> <20180326113932.2i6qp3776jtmcqk4@lakrids.cambridge.arm.com> X-Mailer: Evolution 3.2.3-0ubuntu6 MIME-Version: 1.0 X-TM-AS-Product-Ver: SMEX-12.5.0.5042-8.2.9001-23746.005 X-TM-AS-Result: No-14.168500-8.000000-10 X-TMASE-MatchedRID: u6ojmU07PKz4OiVTWoD8RCa1MaKuob8Pt3aeg7g/usCe9toQ6h6LE/Rm 0kmqtH+DZ+lAC6HJ+w6Nl5Y6EH/oPFUlrVpxj7w04RtSDjG+z7C2McZY43zJ45lWFHP5R0I9ALW CBWhEf5c1mAq5eAuCky4MLpy8aLHP1OA+2Akm/pQHtOpEBhWiFlAI6wCVrE3vM/nq/fCVL9nMmG d7OOWkWpZ06ea5nm3veZ/YmcuV1Bi0ZvsM41ZxLb2xWbKjBfWPkT7cMJfe6Jttw+n+iKWyyC2en oBnmNIzMKnSGWbtofSDZ7J7xj+KAQ13FovSJ/os4NAJeq9IDSyvFlDTfVnoWstgH5JIEplzGmeB xwnW/ghtrvuWe9tiH8YIQyFXySOXD0VXqQ1iI8cNEGOVZ0MgDUGtrAxy5ENOv+26ZYWzwkJ077/ 202UZaGoNPFEfW/rHoKnU+0CGYDY+tQ+rplz0wyyEakGwrofuVG1q2Q/KGL6en0qBdy7fjGdykZ CztL0ZLJJcxtb4BJ6sBs95l0gCucrbSfoXari9lUgQqGVMqmy/zKpacmFSwaj5v7I4/SgYyJiVH sxfXPjAF9SUcVeiRiwfhrUBKKO91ddezVny+QJswYo64ufkVTlIA4KS6pW3GNAPebYwJ/vwtZWt kkk1NK3DbA13lHYGhnEXlA7xFpw3KXWd30Ii3Tl/1fD/GopdWQy9YC5qGvz6APa9i04WGCq2rl3 dzGQ1OD/gr58H3eE6TppRPHTnDTrmJ4euG06KTiG/zTezPqrWmerVz+XN0A== X-TM-AS-User-Approved-Sender: No X-TM-AS-User-Blocked-Sender: No X-TMASE-Result: 10--14.168500-8.000000 X-TMASE-Version: SMEX-12.5.0.5042-8.2.9001-23746.005 X-TMASE-POSTMAN: 2-d; X-MTK: N X-CRM114-Version: 20100106-BlameMichelson ( TRE 0.8.0 (BSD) ) MR-646709E3 X-CRM114-CacheID: sfid-20180328_023433_250492_082AF72B X-CRM114-Status: GOOD ( 39.55 ) X-BeenThere: linux-mediatek@lists.infradead.org X-Mailman-Version: 2.1.21 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: wsd_upstream@mediatek.com, Xie XiuQi , Ard Biesheuvel , Marc Zyngier , Catalin Marinas , Julien Thierry , Will Deacon , linux-kernel@vger.kernel.org, shadanji@163.com, James Morse , Matthias Brugger , linux-mediatek@lists.infradead.org, Michael Weiser , Dave Martin , linux-arm-kernel@lists.infradead.org Sender: "Linux-mediatek" Errors-To: linux-mediatek-bounces+patchwork-linux-mediatek=patchwork.kernel.org@lists.infradead.org X-Virus-Scanned: ClamAV using ClamSMTP On Mon, 2018-03-26 at 12:39 +0100, Mark Rutland wrote: > On Thu, Mar 22, 2018 at 05:35:29PM +0800, Ji.Zhang wrote: > > On Thu, 2018-03-22 at 05:59 +0000, Mark Rutland wrote: > > > On Thu, Mar 22, 2018 at 11:06:00AM +0800, Ji Zhang wrote: > > > > When we dump the backtrace of some specific task, there is a potential race > > > > condition due to the task may be running on other cores if SMP enabled. > > > > That is because for current implementation, if the task is not the current > > > > task, we will get the registers used for unwind from cpu_context saved in > > > > thread_info, which is the snapshot before context switch, but if the task > > > > is running on other cores, the registers and the content of stack are > > > > changed. > > > > This may cause that we get the wrong backtrace or incomplete backtrace or > > > > even crash the kernel. > > > > > > When do we call dump_backtrace() on a running task that is not current? > > > > > > AFAICT, we don't do that in the arm64-specific callers of dump_backtrace(), and > > > this would have to be some caller of show_stack() in generic code. > > Yes, show_stack() can make caller specify a task and dump its backtrace. > > For example, SysRq-T (echo t > /proc/sysrq-trigger) will use this to > > dump the backtrace of specific tasks. > > Ok. I see that this eventually calls show_state_filter(0), where we call > sched_show_task() for every task. > > > > We pin the task's stack via try_get_task_stack(), so this cannot be unmapped > > > while we walk it. In unwind_frame() we check that the frame record falls > > > entirely within the task's stack. So AFAICT, we cannot crash the kernel here, > > > though the backtrace may be misleading (and we could potentially get stuck in > > > an infinite loop). > > You are right, I have checked the code and it seems that the check for > > fp in unwind_frame() is strong enough to handle the case which stack > > being changed due to task running. And as you mentioned, if > > unfortunately fp is point to the address of itself, the unwind will be > > an infinite loop, but it is a very small probability event, so we can > > ignore this, is that right? > > I think that it would be preferable to try to avoid the inifinite loop > case. We could hit that by accident if we're tracing a live task. > > It's a little tricky to ensure that we don't loop, since we can have > traces that span several stacks, e.g. overflow -> irq -> task, so we > need to know where the last frame was, and we need to defnie a strict > order for stack nesting. Can we consider this through an easier way? According to AArch64 PCS, stack should be full-descending, which means we can add validation on fp by comparing the fp and previous fp, if they are equal means there is an exactly loop, while if current fp is smaller than previous means the uwnind is rollback, which is also unexpected. The only concern is how to handle the unwind from one stack span to another (eg. overflow->irq, or irq->task, etc) Below diff is a proposal that we check if stack spans, and if yes, a tricky is used to bypass the fp check. @@ -127,6 +128,20 @@ void dump_backtrace(struct pt_regs *regs, struct task_struct *tsk) skip = !!regs; printk("Call trace:\n"); do { + unsigned long stack; + if (fp) { + if (in_entry_text(frame.pc)) { + stack = frame.fp - offsetof(struct pt_regs, stackframe); + + if (on_accessible_stack(tsk, stack)) + fp = frame.fp + 0x8; //tricky to bypass the fp check + } + if (fp <= frame->fp) { + pr_notice("fp invalid, stop unwind\n"); + break; + } + } + fp = frame.fp; /* skip until specified stack frame */ if (!skip) { dump_backtrace_entry(frame.pc); > > > > To avoid this case, do not dump the backtrace of the tasks which are > > > > running on other cores. > > > > This patch cannot solve the issue completely but can shrink the window of > > > > race condition. > > > > > > > @@ -113,6 +113,9 @@ void dump_backtrace(struct pt_regs *regs, struct task_struct *tsk) > > > > if (tsk == current) { > > > > frame.fp = (unsigned long)__builtin_frame_address(0); > > > > frame.pc = (unsigned long)dump_backtrace; > > > > + else if (tsk->state == TASK_RUNNING) { > > > > + pr_notice("Do not dump other running tasks\n"); > > > > + return; > > > > > > As you note, if we can race with the task being scheduled, this doesn't help. > > > > > > Can we rule this out at a higher level? > > > Thanks, > > > Mark. > > Actually, according to my previous understanding, the low level function > > should be transparent to callers and should provide the right result and > > handle some unexpected cases, which means that if the result may be > > misleading, we should drop it. That is why I bypass all TASK_RUNNING > > tasks. I am not sure if this understanding is reasonable for this case. > > Given that this can change under our feet, I think this only provides a > false sense of security and complicates the code. > Agreed > > And as you mentioned that rule this out at a higher level, is there any > > suggestions, handle this in the caller of show_stack()? > > Unfortunately, it doesn't look like we can do this in general given > cases like sysrq-t. > > Thanks, > Mark. Best Regards, Ji diff --git a/arch/arm64/kernel/traps.c b/arch/arm64/kernel/traps.c index eb2d151..760ea59 100644 --- a/arch/arm64/kernel/traps.c +++ b/arch/arm64/kernel/traps.c @@ -101,6 +101,7 @@ void dump_backtrace(struct pt_regs *regs, struct task_struct *tsk) { struct stackframe frame; int skip; + unsigned long fp = 0x0; pr_debug("%s(regs = %p tsk = %p)\n", __func__, regs, tsk);