From patchwork Tue Mar 29 08:14:22 2016
X-Patchwork-Submitter: Michal Kazior
X-Patchwork-Id: 8683781
Date: Tue, 29 Mar 2016 10:14:22 +0200
From: Michal Kazior
To: Ben Greear
Cc: ath10k
Subject: Re: Deadlock on (faked) firmware crash, CUS239, modified 10.4.3 firmware.
In-Reply-To: <56F5F38A.2070602@candelatech.com>

On 26 March 2016 at 03:27, Ben Greear wrote:
> I've been seeing this for a while now.
> When firmware crashes, often the OS at least partially locks up.
>
> This is a modified 4.4.6 driver/kernel with modified 10.4.3 firmware. I had
> 35 stations associated and reset one. Flush fails (maybe because nothing
> stops tx on other vdevs while flushing one?), and I added a fake firmware
> crash event in case flush fails.
>
> Then I get a deadlock. I've seen other similar deadlocks when the firmware
> crashed due to 'natural' causes when adding vdevs.
>
> It looks like the same process is not actually stuck in one place: each time
> the kernel splats, it is in a different place, spinning and spinning. Maybe
> it needs a bail-out on firmware crash?

[...]

> [ 316.477677] NMI watchdog: BUG: soft lockup - CPU#1 stuck for 22s! [kworker/u8:3:257]
> [ 316.477720] Modules linked in: nf_conntrack_netlink nf_conntrack
> nfnetlink nf_defrag_ipv4 8021q garp mrp stp llc bnep bluetooth fuse macvlan
> wanlink(O) pktgen rpcsec_gss_krb5 nfsv4 nfs fscache iTCO_wdt
> iTCO_vendor_support coretemp ath9k ath10k_pci hwmon ath9k_common ath10k_core
> ath9k_hw intel_rapl iosf_mbi ath x86_pkg_temp_thermal intel_powerclamp
> mac80211 kvm_intel kvm joydev irqbypass pcspkr serio_raw cfg80211
> snd_hda_codec_hdmi lpc_ich i2c_i801 snd_hda_codec_realtek
> snd_hda_codec_generic snd_hda_intel snd_hda_codec snd_hda_core snd_hwdep
> snd_seq snd_seq_device snd_pcm 8250_fintek snd_timer snd shpchp soundcore
> tpm_tis tpm nfsd auth_rpcgss nfs_acl lockd grace sunrpc ata_generic
> pata_acpi i915 e1000e ptp pps_core i2c_algo_bit drm_kms_helper drm i2c_core
> fjes video ipv6 [last unloaded: nf_conntrack]
>
> [ 316.477721] irq event stamp: 2111179
> [ 316.477727] hardirqs last enabled at (2111179): [] vprintk_emit+0x3ab/0x46a
> [ 316.477730] hardirqs last disabled at (2111178): [] vprintk_emit+0x5c/0x46a
> [ 316.477742] softirqs last enabled at (2111014): [] ath10k_set_key+0x136/0x602 [ath10k_core]
> [ 316.477749] softirqs last disabled at (2111012): [] ath10k_set_key+0x117/0x602 [ath10k_core]
> [ 316.477751] CPU: 1 PID: 257 Comm: kworker/u8:3 Tainted: G W O 4.4.6+ #21
> [ 316.477752] Hardware name: To be filled by O.E.M. To be filled by O.E.M./HURONRIVER, BIOS 4.6.5 05/02/2012
> [ 316.477780] Workqueue: wiphy3 ieee80211_iface_work [mac80211]
> [ 316.477781] task: ffff880212d225c0 ti: ffff880212d50000 task.ti: ffff880212d50000
> [ 316.477790] RIP: 0010:[] [] ath10k_mac_tx_push_pending+0xc1/0x12d [ath10k_core]

Just in case, do you have these applied?

  750eeed89cf3 ath10k: fix pull-push tx threshold handling
  9d71d47eed20 ath10k: fix tx hang

Hmm... If it still reproduces, can you try the following diff?


Michał

--- a/drivers/net/wireless/ath/ath10k/mac.c
+++ b/drivers/net/wireless/ath/ath10k/mac.c
@@ -3780,6 +3780,8 @@ void ath10k_mac_tx_push_pending(struct ath10k *ar)
 		list_del_init(&artxq->list);
 		if (ret != -ENOENT)
 			list_add_tail(&artxq->list, &ar->txqs);
+		else if (artxq == last)
+			last = list_last_entry(&ar->txqs, struct ath10k_txq, list);
 
 		ath10k_htt_tx_txq_update(hw, txq);