From patchwork Wed Feb 28 19:12:17 2018
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Ben Greear
X-Patchwork-Id: 10249365
From: greearb@candelatech.com
To: linux-wireless@vger.kernel.org
Cc: Ben Greear, ath10k@lists.infradead.org
Subject: [PATCH] ath10k: Attempt to work around napi_synchronize hang.
Date: Wed, 28 Feb 2018 11:12:17 -0800
Message-Id: <1519845137-15365-1-git-send-email-greearb@candelatech.com>
X-Mailer: git-send-email 2.4.11

From: Ben Greear

Calling napi_disable twice in a row (w/out starting it in between)
and/or without having NAPI active leads to deadlock because
napi_disable leaves NAPI_STATE_SCHED and NAPI_STATE_NPSVC set when it
returns, as far as I can tell. So, guard this call to napi_disable.

I believe the failure case is something like this:

  rmmod ath10k_pci ath10k_core
  Firmware crashes before hif_stop is called by the rmmod path
  The crash handling logic calls hif_stop
  Then rmmod gets around to calling hif_stop, but spins endlessly
    in napi_synchronize.

I think one way this could happen is that ath10k_stop checks
for state != ATH10K_STATE_OFF, but STATE_RESTARTING is also
a possibility. That might be how we can have hif_stop called twice
without a hif_start in between. --Ben

Signed-off-by: Ben Greear
---

* since RFC: Added similar code to ahb

This seems needed back to at least 4.9 kernels.

 drivers/net/wireless/ath/ath10k/ahb.c  |  9 +++++++--
 drivers/net/wireless/ath/ath10k/core.h |  1 +
 drivers/net/wireless/ath/ath10k/pci.c  | 25 +++++++++++++++++++++++--
 3 files changed, 31 insertions(+), 4 deletions(-)

diff --git a/drivers/net/wireless/ath/ath10k/ahb.c b/drivers/net/wireless/ath/ath10k/ahb.c
index da770af..f826c59 100644
--- a/drivers/net/wireless/ath/ath10k/ahb.c
+++ b/drivers/net/wireless/ath/ath10k/ahb.c
@@ -641,6 +641,8 @@ static int ath10k_ahb_hif_start(struct ath10k *ar)
 	ath10k_dbg(ar, ATH10K_DBG_BOOT, "boot ahb hif start\n");
 
 	napi_enable(&ar->napi);
+	ar->napi_enabled = true;
+
 	ath10k_ce_enable_interrupts(ar);
 	ath10k_pci_enable_legacy_irq(ar);
 
@@ -660,8 +662,11 @@ static void ath10k_ahb_hif_stop(struct ath10k *ar)
 
 	ath10k_pci_flush(ar);
 
-	napi_synchronize(&ar->napi);
-	napi_disable(&ar->napi);
+	if (ar->napi_enabled) {
+		napi_synchronize(&ar->napi);
+		napi_disable(&ar->napi);
+		ar->napi_enabled = false;
+	}
 }
 
 static int ath10k_ahb_hif_power_up(struct ath10k *ar)
diff --git a/drivers/net/wireless/ath/ath10k/core.h b/drivers/net/wireless/ath/ath10k/core.h
index 72b4495..c7ba49f 100644
--- a/drivers/net/wireless/ath/ath10k/core.h
+++ b/drivers/net/wireless/ath/ath10k/core.h
@@ -1205,6 +1205,7 @@ struct ath10k {
 	/* NAPI */
 	struct net_device napi_dev;
 	struct napi_struct napi;
+	bool napi_enabled;
 
 	struct work_struct stop_scan_work;
 
diff --git a/drivers/net/wireless/ath/ath10k/pci.c b/drivers/net/wireless/ath/ath10k/pci.c
index 398e413..9131e15 100644
--- a/drivers/net/wireless/ath/ath10k/pci.c
+++ b/drivers/net/wireless/ath/ath10k/pci.c
@@ -1956,6 +1956,7 @@ static int ath10k_pci_hif_start(struct ath10k *ar)
 	ath10k_dbg(ar, ATH10K_DBG_BOOT, "boot hif start\n");
 
 	napi_enable(&ar->napi);
+	ar->napi_enabled = true;
 
 	ath10k_pci_irq_enable(ar);
 	ath10k_pci_rx_post(ar);
@@ -2086,8 +2087,28 @@ static void ath10k_pci_hif_stop(struct ath10k *ar)
 	ath10k_pci_irq_disable(ar);
 	ath10k_pci_irq_sync(ar);
 	ath10k_pci_flush(ar);
-	napi_synchronize(&ar->napi);
-	napi_disable(&ar->napi);
+
+	/* Calling napi_disable twice in a row (w/out starting it in between)
+	 * and/or without having NAPI active leads to deadlock because
+	 * napi_disable sets NAPI_STATE_SCHED and NAPI_STATE_NPSVC when it
+	 * returns, as far as I can tell. So, guard this call to napi_disable.
+	 * I believe the failure case is something like this:
+	 *   rmmod ath10k_pci ath10k_core
+	 *   Firmware crashes before hif_stop is called by the rmmod path
+	 *   The crash handling logic calls hif_stop
+	 *   Then rmmod gets around to calling hif_stop, but spins endlessly
+	 *     in napi_synchronize.
+	 *
+	 * I think one way this could happen is that ath10k_stop checks
+	 * for state != ATH10K_STATE_OFF, but STATE_RESTARTING is also
+	 * a possibility. That might be how we can have hif_stop called twice
+	 * without a hif_start in between. --Ben
+	 */
+	if (ar->napi_enabled) {
+		napi_synchronize(&ar->napi);
+		napi_disable(&ar->napi);
+		ar->napi_enabled = false;
+	}
 
 	spin_lock_irqsave(&ar_pci->ps_lock, flags);
 	WARN_ON(ar_pci->ps_wake_refcount > 0);
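
For anyone who wants to reason about the double hif_stop case without a crashing firmware at hand, here is a minimal user-space sketch of the state handling as I understand it. It is not driver code. Every model_* name and the MODEL_STATE_* bits are hypothetical stand-ins, the model is single-threaded, and where the kernel would sleep in a loop waiting for a bit to clear, the model just returns false to mean "would hang".

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for NAPI_STATE_SCHED and NAPI_STATE_NPSVC. */
#define MODEL_STATE_SCHED (1u << 0)
#define MODEL_STATE_NPSVC (1u << 1)

struct model_napi {
	unsigned int state;
};

struct model_dev {
	struct model_napi napi;
	bool napi_enabled;	/* models the flag this patch adds */
};

/* Models napi_enable(): clear both bits so polling can be scheduled. */
static void model_napi_enable(struct model_napi *n)
{
	n->state &= ~(MODEL_STATE_SCHED | MODEL_STATE_NPSVC);
}

/*
 * Models napi_synchronize() followed by napi_disable(). Nothing else
 * runs in this model, so a bit that is already set is never cleared and
 * the real calls would wait forever. Report that case as false.
 */
static bool model_napi_sync_and_disable(struct model_napi *n)
{
	if (n->state & (MODEL_STATE_SCHED | MODEL_STATE_NPSVC))
		return false;
	/* A completed disable leaves both bits set. */
	n->state |= MODEL_STATE_SCHED | MODEL_STATE_NPSVC;
	return true;
}

static void model_hif_start(struct model_dev *d)
{
	model_napi_enable(&d->napi);
	d->napi_enabled = true;
}

/* Pre-patch behaviour: always synchronize and disable. */
static bool model_hif_stop_unguarded(struct model_dev *d)
{
	return model_napi_sync_and_disable(&d->napi);
}

/* Patched behaviour: skip the NAPI calls when NAPI is not enabled. */
static bool model_hif_stop_guarded(struct model_dev *d)
{
	if (!d->napi_enabled)
		return true;
	if (!model_napi_sync_and_disable(&d->napi))
		return false;
	d->napi_enabled = false;
	return true;
}

/* Runs start, stop, stop. Returns true if the second stop would hang. */
static bool model_second_stop_hangs(bool guarded)
{
	struct model_dev d = { .napi = { .state = 0 }, .napi_enabled = false };
	bool ok;

	model_hif_start(&d);
	ok = guarded ? model_hif_stop_guarded(&d)
		     : model_hif_stop_unguarded(&d);
	assert(ok);	/* the first stop after a start completes */
	(void)ok;

	return guarded ? !model_hif_stop_guarded(&d)
		       : !model_hif_stop_unguarded(&d);
}
```

In this model, model_second_stop_hangs(false) reports a hang and model_second_stop_hangs(true) does not, which matches the rmmod sequence described in the commit message. The flag is a plain bool here, as in the patch; the sketch says nothing about whether two hif_stop callers can race on it.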