From patchwork Tue Nov 20 21:20:01 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Daniel Santos X-Patchwork-Id: 10691229 X-Patchwork-Delegate: kvalo@adurom.com Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 6E7FC17FE for ; Tue, 20 Nov 2018 21:21:36 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 5BC3C2A9AC for ; Tue, 20 Nov 2018 21:21:36 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 4FC1F2A9AF; Tue, 20 Nov 2018 21:21:36 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-8.0 required=2.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,MAILING_LIST_MULTI,RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id A92AE2A9AC for ; Tue, 20 Nov 2018 21:21:35 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726553AbeKUHwo (ORCPT ); Wed, 21 Nov 2018 02:52:44 -0500 Received: from pb-smtp2.pobox.com ([64.147.108.71]:53059 "EHLO pb-smtp2.pobox.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1725828AbeKUHwn (ORCPT ); Wed, 21 Nov 2018 02:52:43 -0500 Received: from pb-smtp2.pobox.com (unknown [127.0.0.1]) by pb-smtp2.pobox.com (Postfix) with ESMTP id E6DE512BF29; Tue, 20 Nov 2018 16:21:31 -0500 (EST) DKIM-Signature: v=1; a=rsa-sha1; c=relaxed; d=pobox.com; h=to:from :subject:message-id:date:mime-version:content-type :content-transfer-encoding; s=sasl; bh=2lzVvbDFxTyFby8oZWB4vObKj GQ=; b=E+j/GWRPTbeTMIoC1zvpH3Cmrk1pofNLFGyHkVfF5y7zz//ZA5Tc/JN83 wGXiOFTVUpAh6cNLlJ6c5IVMEHBNgqNgh68JyuMO03Uq81yNsdmYA7PE7Dk0O8q7 u7SYq8JXcIgt+4Ct/TJQlmvruFzDERVRD57KL1uCcyifPrk23I= DomainKey-Signature: a=rsa-sha1; c=nofws; d=pobox.com; h=to:from:subject :message-id:date:mime-version:content-type :content-transfer-encoding; q=dns; s=sasl; b=upyZ3NV28EVpHFNy+Fo yIouyQql6pwETTpB2LXz3tvjEEQA4zFh/ZOo/jrMcKHhrCC+CUeaHrgJOuAXzHcP V69Dq7XTQVuxwhfJ7hwHMUZdgc3RbnLSYssDUP8A6aRQ6CCKYR2o2mf9RWnybo2w nUoNNBB1JbMQr4qqcW5YmO/A= Received: from pb-smtp2.nyi.icgroup.com (unknown [127.0.0.1]) by pb-smtp2.pobox.com (Postfix) with ESMTP id E00ED12BF28; Tue, 20 Nov 2018 16:21:31 -0500 (EST) Received: from [10.69.183.138] (unknown [108.91.94.48]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by pb-smtp2.pobox.com (Postfix) with ESMTPSA id 2E77F12BF27; Tue, 20 Nov 2018 16:21:31 -0500 (EST) To: Stanislaw Gruszka , linux-wireless@vger.kernel.org From: Daniel Santos Subject: rt2800 tx frame dropping issue. Message-ID: Date: Tue, 20 Nov 2018 15:20:01 -0600 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.9.1 MIME-Version: 1.0 Content-Language: en-US X-Pobox-Relay-ID: 4001D12E-ED0A-11E8-84E2-BFB3E64BB12D-06139138!pb-smtp2.pobox.com Sender: linux-wireless-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-wireless@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP Hello, I'm coming to this issue rather late and I realize there has been much ado about it.  I want to follow up on a thread from 3 months ago https://marc.info/?l=linux-wireless&m=153511575812945 > I get testing results from T-Bone user in bugzilla: > https://bugzilla.kernel.org/show_bug.cgi?id=82751 > > And get this: > > [ 781.644185] ieee80211 phy0: rt2x00queue_write_tx_frame: Error - Dropping frame due \ > to full tx queue 2 [ 781.662620] 2 d1 s1 p1 ms1 > > Looks like rt2x00 correctly stop queue in mac80211, but sill get skb's. > > So we can do 3 things: increase queue size to 128, increase threshold > to 16 and make the massage debug one instead of error (I checked > some other drivers and looks most of them silently drop the frame > in case queue is full). Especially removing the message can be > useful since printk can somehow make mt7620 router unresponsive > for some time resulting in connection drops. > > Thoughts ? > > Regards > Stanislaw I believe I have the answer as to why we're getting frames after we stop the queue.  I had a little chat about this in #kernelnewbies and some other devs believe it is intentional. There is a race in ieee80211_tx_frags (net/mac80211/tx.c) between releasing local->queue_stop_reason_lock and calling the driver's tx until we lock queue->tx_lock in rt2x00queue_write_tx_frame -- in between these two points the thread can be preempted.  So while we stop the queue in one thread, there can be 20 other threads in between that have already checked that the queue was running and have a frame buffer sitting on their stacks.  I think it could be fixed with the below patch, but I'm being told that it doesn't need it and that the driver should just *quietly* drop the frame: Could anybody illuminate for me why this isn't done?  I know that there has to be a point where we realize there are too many items in the queue and we can't keep up, but this would seem to be a terrible way to do that.  Is it one of those things where it isn't worth the performance degradation?  Any insights would be most appreciated!! :) Like Stanislaw, I have not been able to reproduce the problem for testing, but somebody who has the problem (in production) is going to get a build where the only change is to remove the rt2x00_err message.  It makes sense that it would be the problem as you mentioned.  I suppose the proper thing to do is to just quietly drop it and increment the struct net_device stats.tx_dropped? Thanks, Daniel diff --git a/net/mac80211/ieee80211_i.h b/net/mac80211/ieee80211_i.h index d43dab4..29009e0 100644 --- a/net/mac80211/ieee80211_i.h +++ b/net/mac80211/ieee80211_i.h @@ -1117,6 +1117,7 @@ struct ieee80211_local { int q_stop_reasons[IEEE80211_MAX_QUEUES][IEEE80211_QUEUE_STOP_REASONS]; /* also used to protect ampdu_ac_queue and amdpu_ac_stop_refcnt */ spinlock_t queue_stop_reason_lock; + spinlock_t tx_path_lock; int open_count; int monitors, cooked_mntrs; diff --git a/net/mac80211/main.c b/net/mac80211/main.c index 4f8eceb..2312ac9 100644 --- a/net/mac80211/main.c +++ b/net/mac80211/main.c @@ -609,6 +609,7 @@ struct ieee80211_hw *ieee80211_alloc_hw_nm(size_t priv_data_len, spin_lock_init(&local->filter_lock); spin_lock_init(&local->rx_path_lock); spin_lock_init(&local->queue_stop_reason_lock); + spin_lock_init(&local->tx_path_lock); INIT_LIST_HEAD(&local->chanctx_list); mutex_init(&local->chanctx_mtx); diff --git a/net/mac80211/tx.c b/net/mac80211/tx.c index 87d807f..8d43e2a 100644 --- a/net/mac80211/tx.c +++ b/net/mac80211/tx.c @@ -1311,6 +1311,7 @@ static bool ieee80211_tx_frags(struct ieee80211_local *local, } #endif + spin_lock_bh(&local->tx_path_lock); spin_lock_irqsave(&local->queue_stop_reason_lock, flags); if (local->queue_stop_reasons[q] || (!txpending && !skb_queue_empty(&local->pending[q]))) { @@ -1327,6 +1328,7 @@ static bool ieee80211_tx_frags(struct ieee80211_local *local, spin_unlock_irqrestore( &local->queue_stop_reason_lock, flags); + spin_unlock_bh(&local->tx_path_lock); ieee80211_purge_tx_queue(&local->hw, skbs); return true; @@ -1347,6 +1349,7 @@ static bool ieee80211_tx_frags(struct ieee80211_local *local, spin_unlock_irqrestore(&local->queue_stop_reason_lock, flags); + spin_unlock_bh(&local->tx_path_lock); return false; } } @@ -1356,6 +1359,7 @@ static bool ieee80211_tx_frags(struct ieee80211_local *local, __skb_unlink(skb, skbs); ieee80211_drv_tx(local, vif, sta, skb); + spin_unlock_bh(&local->tx_path_lock); } return true;