From patchwork Fri Jan 13 19:57:36 2017 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Grygorii Strashko X-Patchwork-Id: 9516321 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork.web.codeaurora.org (Postfix) with ESMTP id C51E86077E for ; Fri, 13 Jan 2017 19:57:56 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id B5E5F28781 for ; Fri, 13 Jan 2017 19:57:56 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id AA54528783; Fri, 13 Jan 2017 19:57:56 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-6.9 required=2.0 tests=BAYES_00,RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id C897A28781 for ; Fri, 13 Jan 2017 19:57:55 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751137AbdAMT5z (ORCPT ); Fri, 13 Jan 2017 14:57:55 -0500 Received: from lelnx193.ext.ti.com ([198.47.27.77]:47096 "EHLO lelnx193.ext.ti.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1750969AbdAMT5y (ORCPT ); Fri, 13 Jan 2017 14:57:54 -0500 Received: from dflxv15.itg.ti.com ([128.247.5.124]) by lelnx193.ext.ti.com (8.15.1/8.15.1) with ESMTP id v0DJvbpJ030132; Fri, 13 Jan 2017 13:57:37 -0600 Received: from DLEE70.ent.ti.com (dlemailx.itg.ti.com [157.170.170.113]) by dflxv15.itg.ti.com (8.14.3/8.13.8) with ESMTP id v0DJvbYi016184; Fri, 13 Jan 2017 13:57:37 -0600 Received: from [128.247.83.96] (128.247.83.96) by DLEE70.ent.ti.com (157.170.170.113) with Microsoft SMTP Server id 14.3.294.0; Fri, 13 Jan 2017 13:57:36 -0600 Subject: Re: [PATCHv2] dmaengine: cppi41: Fix oops in cppi41_runtime_resume To: Tony Lindgren References: <20170113180132.9188-1-tony@atomide.com> <20170113190126.GE2630@atomide.com> CC: Dan Williams , Vinod Koul , Bin Liu , Daniel Mack , Felipe Balbi , Johan Hovold , Peter Ujfalusi , Sekhar Nori , Sebastian Andrzej Siewior , , , , Andy Shevchenko , Kevin Hilman , Patrick Titiano , Sergei Shtylyov From: Grygorii Strashko Message-ID: Date: Fri, 13 Jan 2017 13:57:36 -0600 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Thunderbird/45.5.1 MIME-Version: 1.0 In-Reply-To: X-Originating-IP: [128.247.83.96] Sender: dmaengine-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: dmaengine@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP On 01/13/2017 01:53 PM, Grygorii Strashko wrote: > > > On 01/13/2017 01:01 PM, Tony Lindgren wrote: >> * Grygorii Strashko [170113 10:37]: >>> >>> >>> On 01/13/2017 12:01 PM, Tony Lindgren wrote: >>>> Commit fdea2d09b997 ("dmaengine: cppi41: Add basic PM runtime support") >>>> together with recent MUSB changes allowed USB and DMA on BeagleBone to idle >>>> when no cable is connected. But looks like few corner case issues still >>>> remain. >>>> >>>> Looks like just by re-plugging USB cable about ten or so times on BeagleBone >>>> when configured in USB peripheral mode we can get warnings and eventually >>>> trigger an oops in cppi41 DMA: >>>> >>>> WARNING: CPU: 0 PID: 14 at drivers/dma/cppi41.c:1154 cppi41_runtime_suspend+ >>>> x28/0x38 [cppi41] >>>> ... >>>> >>>> WARNING: CPU: 0 PID: 14 at drivers/dma/cppi41.c:452 >>>> push_desc_queue+0x94/0x9c [cppi41] >>>> ... >>>> >>>> Unable to handle kernel NULL pointer dereference at virtual >>>> address 00000104 >>>> pgd = c0004000 >>>> [00000104] *pgd=00000000 >>>> Internal error: Oops: 805 [#1] SMP ARM >>>> ... >>>> [] (cppi41_runtime_resume [cppi41]) from [] >>>> (__rpm_callback+0xc0/0x214) >>>> [] (__rpm_callback) from [] (rpm_callback+0x20/0x80) >>>> [] (rpm_callback) from [] (rpm_resume+0x504/0x78c) >>>> [] (rpm_resume) from [] (pm_runtime_work+0x60/0xa8) >>>> [] (pm_runtime_work) from [] (process_one_work+0x2b4/0x808) >>>> >>>> This is because of a race with runtime PM and cppi41_dma_issue_pending() >>>> as reported by Alexandre Bailon in earlier >>>> set of patches. Based on mailing list discussions we however came to the >>>> conclusion that a different fix from Alexandre's fix is needed in order >>>> to guarantee that DMA is really active when we try to use it. >>>> >>>> To fix the issue, we need to add a driver specific flag as we otherwise >>>> can have -EINPROGRESS state set by runtime PM and can't rely on >>>> pm_runtime_active() to tell us when we can use the DMA. >>>> >>>> And we need to make sure the DMA transfers get triggered in the queued >>>> order. So let's always queue the transfers, then flush the queue >>>> from both cppi41_dma_issue_pending() and cppi41_runtime_resume() >>>> as suggested by Grygorii Strashko in an >>>> earlier example patch. >>>> >>>> For reference, this is also documented in Documentation/power/runtime_pm.txt >>>> in the example at the end of the file as pointed out by Grygorii Strashko >>>> . >>>> >>>> Based on earlier patches from Alexandre Bailon >>>> and Grygorii Strashko modified based on >>>> testing and what was discussed on the mailing lists. >>>> >>>> Fixes: fdea2d09b997 ("dmaengine: cppi41: Add basic PM runtime support") >>>> Cc: Andy Shevchenko >>>> Cc: Bin Liu >>>> Cc: Grygorii Strashko >>>> Cc: Kevin Hilman >>>> Cc: Patrick Titiano >>>> Cc: Sergei Shtylyov >>>> Reported-by: Alexandre Bailon >>>> Signed-off-by: Tony Lindgren >>>> --- >>>> >>>> Alexandre & Grygorii, can you guys please review and test? >>> >>> I've just tested it on BBB. And it triggers warning from IRQ >>> >>> root@am335x-evm:~# [ 242.280546] ------------[ cut here ]------------ >>> [ 242.285532] WARNING: CPU: 0 PID: 0 at drivers/dma/cppi41.c:330 cppi41_irq+0x228/0x260 [cppi41] >>> [ 242.335164] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G W 4.10.0-rc3-00560-g51afc29 #133 >>> [ 242.344601] Hardware name: Generic AM33XX (Flattened Device Tree) >>> [ 242.351084] [] (unwind_backtrace) from [] (show_stack+0x10/0x14) >>> [ 242.359269] [] (show_stack) from [] (dump_stack+0xac/0xe0) >>> [ 242.366919] [] (dump_stack) from [] (__warn+0xd8/0x104) >>> [ 242.374282] [] (__warn) from [] (warn_slowpath_null+0x20/0x28) >>> [ 242.382299] [] (warn_slowpath_null) from [] (cppi41_irq+0x228/0x260 [cppi41]) >>> [ 242.391714] [] (cppi41_irq [cppi41]) from [] (__handle_irq_event_percpu+0x48/0x3b4) >>> [ 242.401727] [] (__handle_irq_event_percpu) from [] (handle_irq_event_percpu+0x1c/0x58) >>> [ 242.412017] [] (handle_irq_event_percpu) from [] (handle_irq_event+0x38/0x5c) >>> [ 242.421477] [] (handle_irq_event) from [] (handle_level_irq+0xc0/0x154) >>> [ 242.430378] [] (handle_level_irq) from [] (generic_handle_irq+0x20/0x34) >>> [ 242.439378] [] (generic_handle_irq) from [] (__handle_domain_irq+0x64/0xe0) >>> [ 242.448660] [] (__handle_domain_irq) from [] (__irq_svc+0x70/0x98) >>> [ 242.457117] [] (__irq_svc) from [] (arch_cpu_idle+0x20/0x3c) >>> [ 242.464935] [] (arch_cpu_idle) from [] (do_idle+0x164/0x218) >>> [ 242.472748] [] (do_idle) from [] (cpu_startup_entry+0x18/0x1c) >>> [ 242.480758] [] (cpu_startup_entry) from [] (start_kernel+0x344/0x3bc) >>> [ 242.489376] ---[ end trace 12c5b6488c1e8c75 ]--- >>> [ 242.496525] usb 1-1.3: USB disconnect, device number 4 >>> >>> test sequence: >>> - plug usb hub + cd card reader >>> - plug usb stick in hub >>> 37 mount /dev/sdb /media >>> 38 ls /media/ >>> 39 rm /media/data >>> 40 time dd if=/dev/zero of=/media/data bs=128k count=100 >>> 41 umount /media >>> - unplug USB stick - > Warning >>> >>> ^ The same sequence without Hub -- > no warnings >>> >>> When I plug-unplug USB stick from Hub I can reproduce it pretty often :( >>> but it can be different issue :( >> >> Hmm interesting, I'll try it here too. I wonder if adding spin_lock_irqsave/ >> restore around the WARN_ON() in the interrupt handler is enough to make it >> go away? >> > > I think not :) Most probably the PM runtime autosuspend is too small for the > USB hub case. > > I've applied below diff and can see that PM runtime suspend happens before > IRQ is triggered and there is one pending descs always. > > Then instead of trying to adjust autosuspend delay I tried to > maintain proper PM runtime counter: > - push desc -> pm_runtime_get -> counter ++ > - irq -> pop desc -> pm_runtime_put_autosuspend -> counter-- > which means "Keep cppi active until as least on desc is in progress"/ > > As result with above change I do not see any warning any more - i've tried two different hubs. > > I saw your 098de42 "dmaengine: cppi41: Fix unpaired pm runtime when only a USB hub is connected", > but, to be honest, I did not get how is it possible to get unpaired get/put from your description: > - descriptor pushed in to the Q should be returned back as part of normal operation > or as part of cppi41_tear_down_chan. May be you've hit an issue with tear_down? > > Simplified diff with fix on top of your patch: diff --git a/drivers/dma/cppi41.c b/drivers/dma/cppi41.c index ce37a1a..9e9403a 100644 --- a/drivers/dma/cppi41.c +++ b/drivers/dma/cppi41.c @@ -319,12 +319,6 @@ static irqreturn_t cppi41_irq(int irq, void *data) while (val) { u32 desc, len; - int error; - - error = pm_runtime_get(cdd->ddev.dev); - if ((error != -EINPROGRESS) && (error < 0)) - dev_err(cdd->ddev.dev, "%s pm runtime get: %i\n", - __func__, error); /* This warning should never trigger */ WARN_ON(cdd->is_suspended); @@ -500,7 +494,6 @@ static void cppi41_dma_issue_pending(struct dma_chan *chan) spin_unlock_irqrestore(&cdd->lock, flags); pm_runtime_mark_last_busy(cdd->ddev.dev); - pm_runtime_put_autosuspend(cdd->ddev.dev); } static u32 get_host_pd0(u32 length)