From patchwork Thu Sep 11 22:03:08 2014 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Stephen Warren X-Patchwork-Id: 4890731 Return-Path: X-Original-To: patchwork-linux-mmc@patchwork.kernel.org Delivered-To: patchwork-parsemail@patchwork1.web.kernel.org Received: from mail.kernel.org (mail.kernel.org [198.145.19.201]) by patchwork1.web.kernel.org (Postfix) with ESMTP id BDAB59F1D3 for ; Thu, 11 Sep 2014 21:57:35 +0000 (UTC) Received: from mail.kernel.org (localhost [127.0.0.1]) by mail.kernel.org (Postfix) with ESMTP id 74CD320212 for ; Thu, 11 Sep 2014 22:03:01 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 3C085201F7 for ; Thu, 11 Sep 2014 22:03:00 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752430AbaIKWC6 (ORCPT ); Thu, 11 Sep 2014 18:02:58 -0400 Received: from avon.wwwdotorg.org ([70.85.31.133]:54366 "EHLO avon.wwwdotorg.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751865AbaIKWC5 (ORCPT ); Thu, 11 Sep 2014 18:02:57 -0400 Received: from severn.wwwdotorg.org (unknown [192.168.65.5]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by avon.wwwdotorg.org (Postfix) with ESMTPS id 0A78F625C; Thu, 11 Sep 2014 16:02:57 -0600 (MDT) Received: from [127.0.0.1] (localhost [127.0.0.1]) (using TLSv1 with cipher DHE-RSA-AES128-SHA (128/128 bits)) (No client certificate requested) by severn.wwwdotorg.org (Postfix) with ESMTPSA id 62C30E460D; Thu, 11 Sep 2014 16:02:55 -0600 (MDT) Message-ID: <54121C1C.8080408@wwwdotorg.org> Date: Thu, 11 Sep 2014 16:03:08 -0600 From: Stephen Warren User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.0 MIME-Version: 1.0 To: Chris Ball , Ulf Hansson CC: Russell King , Adrian Hunter , Alexandre Courbot , "linux-mmc@vger.kernel.org" , "linux-kernel@vger.kernel.org" Subject: mmc: freeing host while host->detect work queue is still active X-Virus-Scanned: clamav-milter 0.97.8 at avon.wwwdotorg.org X-Virus-Status: Clean Sender: linux-mmc-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-mmc@vger.kernel.org X-Spam-Status: No, score=-9.4 required=5.0 tests=BAYES_00, RCVD_IN_DNSWL_HI, RP_MATCHES_RCVD, UNPARSEABLE_RELAY autolearn=ham version=3.3.1 X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on mail.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP Running Fedora rawhides's 3.17.0-0.rc4.git2.1.fc22.armv7hl kernel on Jetson TK1 (an ARM board containing Tegra SoC), I see the following during boot most times the Tegra SDHCI driver defers probe for the SD slot: > [ 8.377719] sdhci-tegra 700b0400.sdhci: Got CD GPIO #170. > [ 8.377780] sdhci-tegra 700b0400.sdhci: Got WP GPIO #132. > [ 8.377898] mmc1: Unknown controller version (3). You may experience problems. > [ 8.379796] sdhci-tegra 700b0400.sdhci: No vmmc regulator found > [ 8.380225] ------------[ cut here ]------------ > [ 8.380243] WARNING: CPU: 2 PID: 6 at lib/debugobjects.c:263 debug_print_object+0x8c/0xb4() > [ 8.380261] ODEBUG: free active (active state 0) object type: timer_list hint: delayed_work_timer_fn+0x0/0x18 > [ 8.380308] Modules linked in: ehci_tegra(+) sdhci_tegra sdhci_pltfm sdhci phy_tegra_usb mmc_core i2c_tegra tegra_drm drm_kms_helper drm host1x > [ 8.380319] CPU: 2 PID: 6 Comm: kworker/u8:0 Not tainted 3.17.0-0.rc4.git2.1.fc22.armv7hl #1 > [ 8.380336] Workqueue: deferwq deferred_probe_work_func > [ 8.380358] [] (unwind_backtrace) from [] (show_stack+0x18/0x1c) > [ 8.380374] [] (show_stack) from [] (dump_stack+0x84/0xb0) > [ 8.380395] [] (dump_stack) from [] (warn_slowpath_common+0x70/0x94) > [ 8.380413] [] (warn_slowpath_common) from [] (warn_slowpath_fmt+0x34/0x44) > [ 8.380426] [] (warn_slowpath_fmt) from [] (debug_print_object+0x8c/0xb4) > [ 8.380439] [] (debug_print_object) from [] (debug_check_no_obj_freed+0xf4/0x1dc) > [ 8.380453] [] (debug_check_no_obj_freed) from [] (kfree+0x160/0x308) > [ 8.380465] [] (kfree) from [] (device_release+0x64/0x98) > [ 8.380479] [] (device_release) from [] (kobject_release+0x12c/0x19c) > [ 8.380498] [] (kobject_release) from [] (sdhci_tegra_probe+0x190/0x1bc [sdhci_tegra]) > [ 8.380810] [] (sdhci_tegra_probe [sdhci_tegra]) from [] (platform_drv_probe+0x34/0x68) > [ 8.380825] [] (platform_drv_probe) from [] (driver_probe_device+0x13c/0x33c) > [ 8.380838] [] (driver_probe_device) from [] (bus_for_each_drv+0x8c/0x9c) > [ 8.380852] [] (bus_for_each_drv) from [] (device_attach+0x70/0x94) > [ 8.380865] [] (device_attach) from [] (bus_probe_device+0x30/0xa4) > [ 8.380878] [] (bus_probe_device) from [] (deferred_probe_work_func+0x88/0xb8) > [ 8.380894] [] (deferred_probe_work_func) from [] (process_one_work+0x2a0/0x5f8) > [ 8.380908] [] (process_one_work) from [] (worker_thread+0x2d0/0x40c) > [ 8.380922] [] (worker_thread) from [] (kthread+0xd4/0xe8) > [ 8.380936] [] (kthread) from [] (ret_from_fork+0x14/0x20) > [ 8.380941] ---[ end trace b1f4b0fe632eb3a4 ]--- The problem is as follows: The following sequence of calls requests the CD (Change Detect) IRQ for the SD slot: sdhci_tegra_probe() sdhci_tegra_parse_dt() mmc_of_parse() mmc_gpio_request_cd() devm_request_threaded_irq() The IRQ is triggered by a pin on the SD slot, so could be in any state, and might fire immediately. This causes the IRQ handler mmc_gpio_cd_irqt() to run, which calls mmc_detect_change() which calls mmc_schedule_delayed_work(&host->detect, ...); Thus, the work queue can be immediately active. However, if later part of sdhci_tegra_probe() fails, e.g. sdhci_add_host() -> mmc_regulator_get_supply() -> devm_regulator_get_optional() -> -EPROBE_DEFER, then sdhci_tegra_probe() needs to tear everything down, and so calls sdhci_pltfm_free() -> sdhci_free_host() -> mmc_free_host() -> put_device(&host->class_dev) -> mmc_host_classdev_release() -> kfree(host). However, host->detect is part of host and may still be active at this point, which is what triggers the ODETECT log spew mentioned above. This doesn't happen in linux-next, because mmc_gpiod_request_cd() doesn't set up the IRQ handler. Rather this happens inside mmc_gpiod_request_cd_irq() which is called from mmc_start_host(), which I believe is well after deferred probe could occur. Some possible options for fixing this in 3.17 are: a) Back-port whatever changes in mainline stopped mmc_of_parse()() from requesting the IRQ. That is commit 740a221ef0e5 "mmc: slot-gpio: Add GPIO descriptor based CD GPIO API". I haven't investigated how many other commits would be required for that commit to apply, but I'm nervous it'd be too many to apply to 3.17. b) Fix the -EPROBE_DEFER cleanup path to explicitly free the IRQ (so it can't schedule the work queue any more), and then cancel the work queue (so it isn't active and doesn't trigger the ODETECT message). A naive patch might be like: However, that's problematic, since I think mmc_free_host() would have to somehow determine whether it needs to call mmc_gpio_free_cd() or mmc_gpiod_free_cd()? Perhaps we should *just* call devm_free_irq() there, rather than tearing down all the GPIO stuff too, i.e. split out a separate mmc_gpio_free_cd_irq() function rather like there's a separate IRQ request function? What approach do people prefer and/or does anyone have any other ideas for a fix? --- To unsubscribe from this list: send the line "unsubscribe linux-mmc" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html diff --git a/drivers/mmc/core/host.c b/drivers/mmc/core/host.c index 31969436d77c..8125f916be4a 100644 --- a/drivers/mmc/core/host.c +++ b/drivers/mmc/core/host.c @@ -586,6 +586,9 @@ void mmc_free_host(struct mmc_host *host) idr_remove(&mmc_host_idr, host->index); spin_unlock(&mmc_host_lock); + mmc_gpiod_free_cd(host); + cancel_delayed_work_sync(&host->detect); + put_device(&host->class_dev); }