From patchwork Thu Sep 26 19:22:59 2013 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Grant Grundler X-Patchwork-Id: 2951051 Return-Path: X-Original-To: patchwork-linux-mmc@patchwork.kernel.org Delivered-To: patchwork-parsemail@patchwork2.web.kernel.org Received: from mail.kernel.org (mail.kernel.org [198.145.19.201]) by patchwork2.web.kernel.org (Postfix) with ESMTP id B44E1BFF0B for ; Thu, 26 Sep 2013 19:24:53 +0000 (UTC) Received: from mail.kernel.org (localhost [127.0.0.1]) by mail.kernel.org (Postfix) with ESMTP id 7A580201ED for ; Thu, 26 Sep 2013 19:24:52 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 43FD5202C8 for ; Thu, 26 Sep 2013 19:24:51 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1754447Ab3IZTYd (ORCPT ); Thu, 26 Sep 2013 15:24:33 -0400 Received: from mail-pb0-f42.google.com ([209.85.160.42]:40058 "EHLO mail-pb0-f42.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1754271Ab3IZTXs (ORCPT ); Thu, 26 Sep 2013 15:23:48 -0400 Received: by mail-pb0-f42.google.com with SMTP id un15so1556012pbc.15 for ; Thu, 26 Sep 2013 12:23:48 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=chromium.org; s=google; h=from:to:cc:subject:date:message-id:in-reply-to:references; bh=74svHBhBetjMapV2TPvj7b2nLfTxSEzH6EUra2Dsx8s=; b=LPKxEUMZX1pLgDyo3vM7KPeWo6gc3QQYnxGUaF767tHs0xQoeEan0xCGmoJpX/dTRM qfc9R+xYmhbANPhfVS24nYQGTdNsqOSvSPq/ayHpdxUvDhaBJaeCuhCV2fmNDlGMQllH CFQaYUzz/lbBpOA8C+FkMkbh0Rutotaasi4HE= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references; bh=74svHBhBetjMapV2TPvj7b2nLfTxSEzH6EUra2Dsx8s=; b=YuNiyXthmM37fQptoYAq/sJQV7LhhgNX7U1xIuQJ2CYviPfVD7ODzyfs+lHV/cY53b WPPyAOMRH8wDqn+4RQ3jpp1UvVg5n+N/eUFh9speNqMh3JuA4jnk45j7paKxvEWLqVqu YFoTNQusw4PNdnj70qOhsnJqmFFEgIxcdB+Rlbv615lHxx6gVxtNYGhJZ/Txl5ZkU1rV 7fj0UOo8Nts5iAlF297XlN/THv5Otr4omgk7kM/cjkFSQPVon8T8OybnaPBgn7FFmSCw SkjzsXXhpn95x4Ced9LcA7hJNNvHnvo2y9spHLZH0WWJIEOl+wgdesZxHYPTMGz6PJRb SXAA== X-Gm-Message-State: ALoCoQkpMlQnKWH/LXEj8gHUBOMlei5/FzH2V0oowIpdNmyBibbeZ3Jn0L/pjKpiM/UL/9Vgt/UX X-Received: by 10.66.217.166 with SMTP id oz6mr7341944pac.22.1380223428297; Thu, 26 Sep 2013 12:23:48 -0700 (PDT) Received: from firesword.mtv.corp.google.com (firesword.mtv.corp.google.com [172.22.73.90]) by mx.google.com with ESMTPSA id ar1sm3900253pbc.34.1969.12.31.16.00.00 (version=TLSv1.1 cipher=ECDHE-RSA-RC4-SHA bits=128/128); Thu, 26 Sep 2013 12:23:47 -0700 (PDT) From: Grant Grundler To: Chris Ball , Ulf Hansson , Seungwon Jeon Cc: linux-mmc@vger.kernel.org, linux-kernel@vger.kernel.org, Grant Grundler Subject: [PATCH 6/7] mmc: core: protect references to host->areq with host->lock Date: Thu, 26 Sep 2013 12:22:59 -0700 Message-Id: <1380223380-22451-7-git-send-email-grundler@chromium.org> X-Mailer: git-send-email 1.8.4 In-Reply-To: <1380223380-22451-1-git-send-email-grundler@chromium.org> References: <1380223380-22451-1-git-send-email-grundler@chromium.org> Sender: linux-mmc-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-mmc@vger.kernel.org X-Spam-Status: No, score=-9.2 required=5.0 tests=BAYES_00,DKIM_SIGNED, RCVD_IN_DNSWL_HI,RP_MATCHES_RCVD,T_DKIM_INVALID,UNPARSEABLE_RELAY autolearn=unavailable version=3.3.1 X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on mail.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP Races between host->areq being NULL or not are resulting in mmcqd hung_task panics. Like this one: <3>[ 240.501202] INFO: task mmcqd/1:85 blocked for more than 120 seconds. <3>[ 240.501213] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. <6>[ 240.501223] mmcqd/1 D 80528020 0 85 2 0x00000000 <5>[ 240.501254] [<80528020>] (__schedule+0x614/0x780) from [<80528550>] (schedule+0x94/0x98) <5>[ 240.501269] [<80528550>] (schedule+0x94/0x98) from [<80526270>] (schedule_timeout+0x38/0x2d0) <5>[ 240.501284] [<80526270>] (schedule_timeout+0x38/0x2d0) from [<805283a4>] (wait_for_common+0x164/0x1a0) <5>[ 240.501298] [<805283a4>] (wait_for_common+0x164/0x1a0) from [<805284b8>] (wait_for_completion+0x20/0x24) <5>[ 240.501313] [<805284b8>] (wait_for_completion+0x20/0x24) from [<803d7068>] (mmc_wait_for_req_done+0x2c/0x84) <5>[ 240.501327] [<803d7068>] (mmc_wait_for_req_done+0x2c/0x84) from [<803d81c0>] (mmc_start_req+0x60/0x120) <5>[ 240.501342] [<803d81c0>] (mmc_start_req+0x60/0x120) from [<803e402c>] (mmc_blk_issue_rw_rq+0xa0/0x3a8) <5>[ 240.501355] [<803e402c>] (mmc_blk_issue_rw_rq+0xa0/0x3a8) from [<803e4758>] (mmc_blk_issue_rq+0x424/0x478) <5>[ 240.501368] [<803e4758>] (mmc_blk_issue_rq+0x424/0x478) from [<803e587c>] (mmc_queue_thread+0xb0/0x118) <5>[ 240.501382] [<803e587c>] (mmc_queue_thread+0xb0/0x118) from [<8004d61c>] (kthread+0xa8/0xbc) <5>[ 240.501396] [<8004d61c>] (kthread+0xa8/0xbc) from [<8000f1c8>] (kernel_thread_exit+0x0/0x8) <0>[ 240.501407] Kernel panic - not syncing: hung_task: blocked tasks <5>[ 240.501421] [<800150a4>] (unwind_backtrace+0x0/0x114) from [<80521920>] (dump_stack+0x20/0x24) <5>[ 240.501434] [<80521920>] (dump_stack+0x20/0x24) from [<80521a90>] (panic+0xa8/0x1f4) <5>[ 240.501447] [<80521a90>] (panic+0xa8/0x1f4) from [<80086d3c>] (watchdog+0x1f4/0x25c) <5>[ 240.501459] [<80086d3c>] (watchdog+0x1f4/0x25c) from [<8004d61c>] (kthread+0xa8/0xbc) <5>[ 240.501471] [<8004d61c>] (kthread+0xa8/0xbc) from [<8000f1c8>] (kernel_thread_exit+0x0/0x8) I was able to reproduce the mmcqd "hung task" timeout consistently with this fio command line on an Exynos5250 system with Toshiba HS200 eMMC running in HS200 mode: fio --name=short_randwrite --size=2G --time_based --runtime=3m \ --readwrite=randwrite --numjobs=2 --bs=4k --norandommap \ --ioengine=psync --direct=0 --filename=/dev/mmcblk0p5 I believe the key parameters are "--numjobs=2" (or more) and "randwrite" workload. Then the completions are happening around the same time as mmc_start_req() is referencing and/or updating host->areq. I was NOT able to consistently reproduce the problem on a similar Exynos 5250 system which had "engineering samples" of Samsung HS200 capable eMMC installed. Just my clue that the timing is different (and the fio performance numbers are different too). Signed-off-by: Grant Grundler --- drivers/mmc/core/core.c | 34 +++++++++++++++++++++++----------- 1 file changed, 23 insertions(+), 11 deletions(-) diff --git a/drivers/mmc/core/core.c b/drivers/mmc/core/core.c index 36cfe91..e5a9599 100644 --- a/drivers/mmc/core/core.c +++ b/drivers/mmc/core/core.c @@ -529,29 +529,40 @@ struct mmc_async_req *mmc_start_req(struct mmc_host *host, { int saved_err = 0; int start_err = 0; - struct mmc_async_req *saved_areq = host->areq; + struct mmc_async_req *saved_areq; + unsigned long flags; - if (!saved_areq && !areq) - /* Nothing to do...some code is polling. */ + spin_lock_irqsave(&host->lock, flags); + saved_areq = host->areq; + if (!saved_areq && !areq) { + /* Nothing? Code is racing to harvest a completion. */ + spin_unlock_irqrestore(&host->lock, flags); goto set_error; + } /* Prepare a new request */ if (areq) mmc_pre_req(host, areq->mrq, !saved_areq); if (saved_areq) { + /* This thread owns this IO (saved_areq) for now. */ + host->areq = NULL; + spin_unlock_irqrestore(&host->lock, flags); + saved_err = mmc_wait_for_data_req_done(host, saved_areq->mrq, areq); if (saved_err == MMC_BLK_NEW_REQUEST) { - /* - * The previous request was not completed, - * nothing to return - */ + spin_lock_irqsave(&host->lock, flags); + BUG_ON(host->areq != NULL); + + /* Not completed. Don't report it. */ + host->areq = saved_areq; saved_areq = NULL; + saved_err = 0; + spin_unlock_irqrestore(&host->lock, flags); goto set_error; } - /* - * Check BKOPS urgency for each R1 response - */ + + /* Check BKOPS urgency for each R1 response */ if (host->card && mmc_card_mmc(host->card) && ((mmc_resp_type(saved_areq->mrq->cmd) == MMC_RSP_R1) || (mmc_resp_type(saved_areq->mrq->cmd) == MMC_RSP_R1B)) && @@ -562,11 +573,12 @@ struct mmc_async_req *mmc_start_req(struct mmc_host *host, /* Don't start something new if previous one failed. */ if (!saved_err && areq) { start_err = __mmc_start_data_req(host, areq->mrq); + /* Cancel a prepared request if it was not started. */ if (start_err) { mmc_post_req(host, areq->mrq, -EINVAL); host->areq = NULL; - } else + } else host->areq = areq; }