Message ID | 20230609103651.313194-1-treapking@chromium.org
State | Changes Requested
Delegated to | Kalle Valo
Series | wifi: mwifiex: Replace RX workqueues with kthreads
Pin-yen Lin <treapking@chromium.org> writes:

> This improves the RX throughput, likely due to better locality for the
> workloads.
>
> We tested this patch on Mediatek MT8173 Chromebooks, and it shows ~80%
> Rx throughput improvement on high data rate test cases.

80%? That's a huge gain from such a small patch! What are the before and
after numbers?
Hi Kalle,

On Fri, Jun 9, 2023 at 6:41 PM Kalle Valo <kvalo@kernel.org> wrote:
>
> Pin-yen Lin <treapking@chromium.org> writes:
>
> > This improves the RX throughput, likely due to better locality for the
> > workloads.
> >
> > We tested this patch on Mediatek MT8173 Chromebooks, and it shows ~80%
> > Rx throughput improvement on high data rate test cases.
>
> 80%? That's a huge gain from such a small patch! What are the before and
> after numbers?

I realized that I might have over-simplified the background and the
impact of this patch.

The short answer to the question is that the throughput improved from
100 Mbps to 180 Mbps. The test was run on ChromeOS's v5.15 kernel fork;
a more detailed description of the test setup is in [1].

However, the throughput of the same test case on our v4.19 kernel is
320 Mbps. That is, we observed a 320 Mbps --> 100 Mbps regression when
we tried to update the kernel version. This patch is more of a
mitigation of that regression: it improves the throughput, even though
it is still not as good as on the older kernel.

That being said, this patch does improve the throughput, so we think it
can land in the mainline kernel.

Best regards,
Pin-yen

[1]: https://lore.kernel.org/all/ZFvpJb9Dh0FCkLQA@google.com/

> --
> https://patchwork.kernel.org/project/linux-wireless/list/
>
> https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches
Hi,

Thanks Pin-yen for most of the investigation here and for pushing the
patch. With some additional information though, I might suggest *not*
landing this patch at the moment. More details appended:

On Sat, Jun 10, 2023 at 01:41:51AM +0800, Pin-yen Lin wrote:
> I realized that I might have over-simplified the background and the
> impact of this patch.
>
> The short answer to the question is that the throughput improved from
> 100 Mbps to 180 Mbps. The test was run on ChromeOS's v5.15 kernel fork;
> a more detailed description of the test setup is in [1].
>
> However, the throughput of the same test case on our v4.19 kernel is
> 320 Mbps. That is, we observed a 320 Mbps --> 100 Mbps regression when
> we tried to update the kernel version. This patch is more of a
> mitigation of that regression: it improves the throughput, even though
> it is still not as good as on the older kernel.
>
> That being said, this patch does improve the throughput, so we think it
> can land in the mainline kernel.
>
> Best regards,
> Pin-yen
>
> [1]: https://lore.kernel.org/all/ZFvpJb9Dh0FCkLQA@google.com/

I was (we were?) optimistic this would be an improvement (or at least no
worse) due to some of the reasoning at [1]. And the work here is just a
single work item, queued repeatedly to the same unbound workqueue, so
conceptually it shouldn't be much different from a kthread_worker, except
for scheduler details -- where again, we'd expect an improvement, since
the scheduler can now better track load for the task (mwifiex RX) in
question.

But additional testing on other mwifiex-based systems (RK3399 + PCIe
8997) showed the inverse: throughput drops on similar benchmarks, from
110 Mbps to 80 Mbps. (Frankly, both numbers are significantly below where
we might like them to be...)

Considering we still don't have a full explanation for all the
performance differences we've been seeing (on either test platform), and
that at least one of our platforms showed a (smaller) regression, I think
we might want to do more research before committing to this.

Brian
Brian Norris <briannorris@chromium.org> writes:

> Hi,
>
> Thanks Pin-yen for most of the investigation here and for pushing the
> patch. With some additional information though, I might suggest *not*
> landing this patch at the moment. More details appended:
>
> On Sat, Jun 10, 2023 at 01:41:51AM +0800, Pin-yen Lin wrote:
>> I realized that I might have over-simplified the background and the
>> impact of this patch.
>>
>> The short answer to the question is that the throughput improved from
>> 100 Mbps to 180 Mbps. The test was run on ChromeOS's v5.15 kernel fork;
>> a more detailed description of the test setup is in [1].
>>
>> However, the throughput of the same test case on our v4.19 kernel is
>> 320 Mbps. That is, we observed a 320 Mbps --> 100 Mbps regression when
>> we tried to update the kernel version. This patch is more of a
>> mitigation of that regression: it improves the throughput, even though
>> it is still not as good as on the older kernel.
>>
>> That being said, this patch does improve the throughput, so we think it
>> can land in the mainline kernel.
>>
>> Best regards,
>> Pin-yen
>>
>> [1]: https://lore.kernel.org/all/ZFvpJb9Dh0FCkLQA@google.com/
>
> I was (we were?) optimistic this would be an improvement (or at least no
> worse) due to some of the reasoning at [1]. And the work here is just a
> single work item, queued repeatedly to the same unbound workqueue, so
> conceptually it shouldn't be much different from a kthread_worker, except
> for scheduler details -- where again, we'd expect an improvement, since
> the scheduler can now better track load for the task (mwifiex RX) in
> question.
>
> But additional testing on other mwifiex-based systems (RK3399 + PCIe
> 8997) showed the inverse: throughput drops on similar benchmarks, from
> 110 Mbps to 80 Mbps. (Frankly, both numbers are significantly below where
> we might like them to be...)
>
> Considering we still don't have a full explanation for all the
> performance differences we've been seeing (on either test platform), and
> that at least one of our platforms showed a (smaller) regression, I think
> we might want to do more research before committing to this.

Yeah, I agree, and I'll drop this. This is a really weird problem; I hope
you can get to the bottom of it.
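For readers less familiar with the two APIs being compared, the difference boils down to the driver-agnostic sketch below. The my_* names are placeholders rather than mwifiex code; the workqueue flags mirror the ones the driver currently passes to alloc_workqueue().

#include <linux/err.h>
#include <linux/kthread.h>
#include <linux/workqueue.h>

/* Pattern 1 (current driver): one work item, repeatedly queued onto a
 * dedicated unbound workqueue.  Each execution runs on whichever pool
 * worker the workqueue subsystem picks.
 */
static struct workqueue_struct *my_wq;
static struct work_struct my_wq_work;

static void my_rx_wq_fn(struct work_struct *work)
{
        /* ... process received frames ... */
}

static int my_rx_setup_workqueue(void)
{
        my_wq = alloc_workqueue("my_rx_wq",
                                WQ_HIGHPRI | WQ_MEM_RECLAIM | WQ_UNBOUND, 1);
        if (!my_wq)
                return -ENOMEM;

        INIT_WORK(&my_wq_work, my_rx_wq_fn);
        queue_work(my_wq, &my_wq_work);         /* any CPU, any pool worker */
        return 0;
}

/* Pattern 2 (this patch): the same work item, but bound to one long-lived
 * kthread, so the scheduler sees a single task whose load it can track.
 */
static struct kthread_worker *my_worker;
static struct kthread_work my_kt_work;

static void my_rx_kt_fn(struct kthread_work *work)
{
        /* ... process received frames ... */
}

static int my_rx_setup_kthread(void)
{
        my_worker = kthread_create_worker(0, "my_rx");
        if (IS_ERR(my_worker))
                return PTR_ERR(my_worker);

        kthread_init_work(&my_kt_work, my_rx_kt_fn);
        kthread_queue_work(my_worker, &my_kt_work);     /* always that thread */
        return 0;
}

Functionally the two patterns are interchangeable here; the bet behind the patch was that pinning the work to one task gives the scheduler better load information, which the mixed MT8173/RK3399 results above suggest is not universally a win.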
diff --git a/drivers/net/wireless/marvell/mwifiex/11n_rxreorder.c b/drivers/net/wireless/marvell/mwifiex/11n_rxreorder.c
index 391793a16adc..431f6dc61f5b 100644
--- a/drivers/net/wireless/marvell/mwifiex/11n_rxreorder.c
+++ b/drivers/net/wireless/marvell/mwifiex/11n_rxreorder.c
@@ -198,7 +198,7 @@ mwifiex_del_rx_reorder_entry(struct mwifiex_private *priv,
 	priv->adapter->rx_locked = true;
 	if (priv->adapter->rx_processing) {
 		spin_unlock_bh(&priv->adapter->rx_proc_lock);
-		flush_workqueue(priv->adapter->rx_workqueue);
+		kthread_flush_worker(priv->adapter->rx_thread);
 	} else {
 		spin_unlock_bh(&priv->adapter->rx_proc_lock);
 	}
diff --git a/drivers/net/wireless/marvell/mwifiex/cfg80211.c b/drivers/net/wireless/marvell/mwifiex/cfg80211.c
index bcd564dc3554..f985bff4e52c 100644
--- a/drivers/net/wireless/marvell/mwifiex/cfg80211.c
+++ b/drivers/net/wireless/marvell/mwifiex/cfg80211.c
@@ -872,7 +872,7 @@ static int mwifiex_deinit_priv_params(struct mwifiex_private *priv)
 	adapter->rx_locked = true;
 	if (adapter->rx_processing) {
 		spin_unlock_bh(&adapter->rx_proc_lock);
-		flush_workqueue(adapter->rx_workqueue);
+		kthread_flush_worker(adapter->rx_thread);
 	} else {
 		spin_unlock_bh(&adapter->rx_proc_lock);
 	}
diff --git a/drivers/net/wireless/marvell/mwifiex/main.c b/drivers/net/wireless/marvell/mwifiex/main.c
index ea22a08e6c08..bab963f3db83 100644
--- a/drivers/net/wireless/marvell/mwifiex/main.c
+++ b/drivers/net/wireless/marvell/mwifiex/main.c
@@ -168,7 +168,7 @@ static void mwifiex_queue_rx_work(struct mwifiex_adapter *adapter)
 		spin_unlock_bh(&adapter->rx_proc_lock);
 	} else {
 		spin_unlock_bh(&adapter->rx_proc_lock);
-		queue_work(adapter->rx_workqueue, &adapter->rx_work);
+		kthread_queue_work(adapter->rx_thread, &adapter->rx_work);
 	}
 }
 
@@ -526,9 +526,10 @@ static void mwifiex_terminate_workqueue(struct mwifiex_adapter *adapter)
 		adapter->workqueue = NULL;
 	}
 
-	if (adapter->rx_workqueue) {
-		destroy_workqueue(adapter->rx_workqueue);
-		adapter->rx_workqueue = NULL;
+	if (adapter->rx_thread) {
+		kthread_flush_worker(adapter->rx_thread);
+		kthread_destroy_worker(adapter->rx_thread);
+		adapter->rx_thread = NULL;
 	}
 }
 
@@ -1394,7 +1395,7 @@ int is_command_pending(struct mwifiex_adapter *adapter)
  *
  * It handles the RX operations.
  */
-static void mwifiex_rx_work_queue(struct work_struct *work)
+static void mwifiex_rx_work_queue(struct kthread_work *work)
 {
 	struct mwifiex_adapter *adapter =
 		container_of(work, struct mwifiex_adapter, rx_work);
@@ -1554,13 +1555,10 @@ mwifiex_reinit_sw(struct mwifiex_adapter *adapter)
 	INIT_WORK(&adapter->main_work, mwifiex_main_work_queue);
 
 	if (adapter->rx_work_enabled) {
-		adapter->rx_workqueue = alloc_workqueue("MWIFIEX_RX_WORK_QUEUE",
-							WQ_HIGHPRI |
-							WQ_MEM_RECLAIM |
-							WQ_UNBOUND, 1);
-		if (!adapter->rx_workqueue)
+		adapter->rx_thread = kthread_create_worker(0, "MWIFIEX_RX");
+		if (IS_ERR(adapter->rx_thread))
 			goto err_kmalloc;
 
-		INIT_WORK(&adapter->rx_work, mwifiex_rx_work_queue);
+		kthread_init_work(&adapter->rx_work, mwifiex_rx_work_queue);
 	}
 
 	/* Register the device. Fill up the private data structure with
@@ -1709,14 +1707,11 @@ mwifiex_add_card(void *card, struct completion *fw_done,
 	INIT_WORK(&adapter->main_work, mwifiex_main_work_queue);
 
 	if (adapter->rx_work_enabled) {
-		adapter->rx_workqueue = alloc_workqueue("MWIFIEX_RX_WORK_QUEUE",
-							WQ_HIGHPRI |
-							WQ_MEM_RECLAIM |
-							WQ_UNBOUND, 1);
-		if (!adapter->rx_workqueue)
+		adapter->rx_thread = kthread_create_worker(0, "MWIFIEX_RX");
+		if (IS_ERR(adapter->rx_thread))
 			goto err_kmalloc;
 
-		INIT_WORK(&adapter->rx_work, mwifiex_rx_work_queue);
+		kthread_init_work(&adapter->rx_work, mwifiex_rx_work_queue);
 	}
 
 	/* Register the device. Fill up the private data structure with relevant
diff --git a/drivers/net/wireless/marvell/mwifiex/main.h b/drivers/net/wireless/marvell/mwifiex/main.h
index b95886e1413e..3255f9a5c2d4 100644
--- a/drivers/net/wireless/marvell/mwifiex/main.h
+++ b/drivers/net/wireless/marvell/mwifiex/main.h
@@ -32,6 +32,7 @@
 #include <linux/gfp.h>
 #include <linux/interrupt.h>
 #include <linux/io.h>
+#include <linux/kthread.h>
 #include <linux/of_gpio.h>
 #include <linux/of_platform.h>
 #include <linux/platform_device.h>
@@ -886,8 +887,8 @@ struct mwifiex_adapter {
 	atomic_t tx_hw_pending;
 	struct workqueue_struct *workqueue;
 	struct work_struct main_work;
-	struct workqueue_struct *rx_workqueue;
-	struct work_struct rx_work;
+	struct kthread_worker *rx_thread;
+	struct kthread_work rx_work;
 	struct workqueue_struct *dfs_workqueue;
 	struct work_struct dfs_work;
 	bool rx_work_enabled;
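Two details of the conversion above are worth calling out. First, kthread_create_worker() reports failure with ERR_PTR() rather than NULL, so IS_ERR() is the appropriate check after creating the worker. Second, the worker owns a real thread that has to be flushed and destroyed on teardown, which is why mwifiex_terminate_workqueue() gains the flush/destroy pair. A minimal fragment showing the same handler and teardown shape, with placeholder my_* names rather than mwifiex code:

#include <linux/kernel.h>
#include <linux/kthread.h>

struct my_ctx {
        struct kthread_worker *rx_worker;
        struct kthread_work rx_work;
};

/* Work handler: recover the owning context from the embedded work item,
 * the same container_of() pattern the patch keeps in mwifiex_rx_work_queue().
 */
static void my_rx_work_fn(struct kthread_work *work)
{
        struct my_ctx *ctx = container_of(work, struct my_ctx, rx_work);

        pr_debug("processing RX for %p\n", ctx);
        /* ... drain the RX queue here ... */
}

/* Teardown: wait for any queued work, then stop and free the thread. */
static void my_rx_stop(struct my_ctx *ctx)
{
        kthread_flush_worker(ctx->rx_worker);
        kthread_destroy_worker(ctx->rx_worker);
        ctx->rx_worker = NULL;
}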
This improves the RX throughput, likely due to better locality for the
workloads.

We tested this patch on Mediatek MT8173 Chromebooks, and it shows ~80%
Rx throughput improvement on high data rate test cases.

Signed-off-by: Pin-yen Lin <treapking@chromium.org>
---
 .../wireless/marvell/mwifiex/11n_rxreorder.c  |  2 +-
 .../net/wireless/marvell/mwifiex/cfg80211.c   |  2 +-
 drivers/net/wireless/marvell/mwifiex/main.c   | 29 ++++++++-----------
 drivers/net/wireless/marvell/mwifiex/main.h   |  5 ++--
 4 files changed, 17 insertions(+), 21 deletions(-)