diff mbox

[01/12] net: mediatek: fix DQL support

Message ID 1465108385-38286-2-git-send-email-john@phrozen.org (mailing list archive)
State New, archived
Headers show

Commit Message

John Crispin June 5, 2016, 6:32 a.m. UTC
The MTK ethernet core has 2 MACs both sitting on the same DMA ring. For
DQL to be deterministic it needs to track the amount of data in the DMA
ring and not the amount of data enqueued on each device. The current code
is incorrect, fix it by making it each device track its own traffic aswell
as the traffic of the other device.

Signed-off-by: John Crispin <john@phrozen.org>
---
 drivers/net/ethernet/mediatek/mtk_eth_soc.c |   33 ++++++++++++++++-----------
 1 file changed, 20 insertions(+), 13 deletions(-)

Comments

David Miller June 5, 2016, 7:32 a.m. UTC | #1
From: John Crispin <john@phrozen.org>
Date: Sun,  5 Jun 2016 08:32:54 +0200

> @@ -625,7 +625,16 @@ static int mtk_tx_map(struct sk_buff *skb, struct net_device *dev,
>  	WRITE_ONCE(itxd->txd3, (TX_DMA_SWC | TX_DMA_PLEN0(skb_headlen(skb)) |
>  				(!nr_frags * TX_DMA_LS0)));
>  
> -	netdev_sent_queue(dev, skb->len);
> +	/* we have a single DMA ring so BQL needs to be updated for all devices
> +	 * sitting on this ring
> +	 */
> +	for (i = 0; i < MTK_MAC_COUNT; i++) {
> +		if (!eth->netdev[i])
> +			continue;
> +
> +		netdev_sent_queue(eth->netdev[i], skb->len);
> +	}
> +
>  	skb_tx_timestamp(skb);

Sorry, this is very far from working.

You cannot asynchronously touch the DQL state of another netdevice.

You have to hold the TX lock of a queue while changing it's DQL state,
otherwise you'll corrupt the state.

This "loop over all possible devices on this DMA ring" is pretty
expensive for the problem you're trying to solve.

You'll have to find another way to fix this bug, which BTW I'm not too
clear about.  The commit message doesn't explain sufficiently what the
actual problem is.  "not deterministic" doesn't give enough details.
John Crispin June 6, 2016, 6:43 a.m. UTC | #2
On 05/06/2016 09:32, David Miller wrote:
> From: John Crispin <john@phrozen.org>
> Date: Sun,  5 Jun 2016 08:32:54 +0200
> 
>> @@ -625,7 +625,16 @@ static int mtk_tx_map(struct sk_buff *skb, struct net_device *dev,
>>  	WRITE_ONCE(itxd->txd3, (TX_DMA_SWC | TX_DMA_PLEN0(skb_headlen(skb)) |
>>  				(!nr_frags * TX_DMA_LS0)));
>>  
>> -	netdev_sent_queue(dev, skb->len);
>> +	/* we have a single DMA ring so BQL needs to be updated for all devices
>> +	 * sitting on this ring
>> +	 */
>> +	for (i = 0; i < MTK_MAC_COUNT; i++) {
>> +		if (!eth->netdev[i])
>> +			continue;
>> +
>> +		netdev_sent_queue(eth->netdev[i], skb->len);
>> +	}
>> +
>>  	skb_tx_timestamp(skb);
> 
> Sorry, this is very far from working.
> 
> You cannot asynchronously touch the DQL state of another netdevice.
> 
> You have to hold the TX lock of a queue while changing it's DQL state,
> otherwise you'll corrupt the state.
> 
> This "loop over all possible devices on this DMA ring" is pretty
> expensive for the problem you're trying to solve.
> 
> You'll have to find another way to fix this bug, which BTW I'm not too
> clear about.  The commit message doesn't explain sufficiently what the
> actual problem is.  "not deterministic" doesn't give enough details.
> 

Hi David,

DQL is supposed to measure how much data is enqueued on a netdev. the
problem here is that two devices share the same hardware queue. fq_codel
for example uses the values from dql to base its QoS judgement on. if we
track the dql of the 2 devices separately then the values will not take
the actual amount of data enqueued into account but only parts of it.
this will make the queue length used as a basis for fq_codel
calculations non deterministic thus breaking qos. dql needs to track the
amount of data in the physical queue underlying the netdev to be useful.
hope that explanation is better to understand.

i think one solution would be to add some code to have 2 devices share
the same dql instance. would that be an acceptable solution ?

anyhow, i will resend the series without the dql patch today and then
worry about it afterwards.

	John
David Miller June 7, 2016, 11:01 p.m. UTC | #3
From: John Crispin <john@phrozen.org>
Date: Mon, 6 Jun 2016 08:43:13 +0200

> i think one solution would be to add some code to have 2 devices share
> the same dql instance. would that be an acceptable solution ?

You still need to address the issue of synchronization.

dql purposefully doesn't use locking, always because a higher level
object (in this case the netdev TX queue) it is contained within
provides the synchronization.

That breaks apart once you share the dql between two netdevs, as you
are proposing here.  You'll have to add locking, which is expensive.

That's why I'm trying to encourage you to think out of the box and
find some way to solve the issue without having to access shared
state shared between multiple devices.

Thanks.
Tom Herbert June 7, 2016, 11:20 p.m. UTC | #4
On Tue, Jun 7, 2016 at 4:01 PM, David Miller <davem@davemloft.net> wrote:
> From: John Crispin <john@phrozen.org>
> Date: Mon, 6 Jun 2016 08:43:13 +0200
>
>> i think one solution would be to add some code to have 2 devices share
>> the same dql instance. would that be an acceptable solution ?
>
> You still need to address the issue of synchronization.
>
> dql purposefully doesn't use locking, always because a higher level
> object (in this case the netdev TX queue) it is contained within
> provides the synchronization.
>
> That breaks apart once you share the dql between two netdevs, as you
> are proposing here.  You'll have to add locking, which is expensive.
>
> That's why I'm trying to encourage you to think out of the box and
> find some way to solve the issue without having to access shared
> state shared between multiple devices.
>

I think you guys mean mean BQL not DQL :-)

If two netdevs share the same DMA ring then is using two netdevs the
right approach. Seems like this would have other consequences beyond
BQL...

Tom

> Thanks.
>
diff mbox

Patch

diff --git a/drivers/net/ethernet/mediatek/mtk_eth_soc.c b/drivers/net/ethernet/mediatek/mtk_eth_soc.c
index c984462..45f8dbf 100644
--- a/drivers/net/ethernet/mediatek/mtk_eth_soc.c
+++ b/drivers/net/ethernet/mediatek/mtk_eth_soc.c
@@ -625,7 +625,16 @@  static int mtk_tx_map(struct sk_buff *skb, struct net_device *dev,
 	WRITE_ONCE(itxd->txd3, (TX_DMA_SWC | TX_DMA_PLEN0(skb_headlen(skb)) |
 				(!nr_frags * TX_DMA_LS0)));
 
-	netdev_sent_queue(dev, skb->len);
+	/* we have a single DMA ring so BQL needs to be updated for all devices
+	 * sitting on this ring
+	 */
+	for (i = 0; i < MTK_MAC_COUNT; i++) {
+		if (!eth->netdev[i])
+			continue;
+
+		netdev_sent_queue(eth->netdev[i], skb->len);
+	}
+
 	skb_tx_timestamp(skb);
 
 	ring->next_free = mtk_qdma_phys_to_virt(ring, txd->txd2);
@@ -853,21 +862,18 @@  static int mtk_poll_tx(struct mtk_eth *eth, int budget, bool *tx_again)
 	struct mtk_tx_dma *desc;
 	struct sk_buff *skb;
 	struct mtk_tx_buf *tx_buf;
-	int total = 0, done[MTK_MAX_DEVS];
-	unsigned int bytes[MTK_MAX_DEVS];
+	int total = 0, done = 0;
+	unsigned int bytes = 0;
 	u32 cpu, dma;
 	static int condition;
 	int i;
 
-	memset(done, 0, sizeof(done));
-	memset(bytes, 0, sizeof(bytes));
-
 	cpu = mtk_r32(eth, MTK_QTX_CRX_PTR);
 	dma = mtk_r32(eth, MTK_QTX_DRX_PTR);
 
 	desc = mtk_qdma_phys_to_virt(ring, cpu);
 
-	while ((cpu != dma) && budget) {
+	while ((cpu != dma) && done < budget) {
 		u32 next_cpu = desc->txd2;
 		int mac;
 
@@ -887,9 +893,8 @@  static int mtk_poll_tx(struct mtk_eth *eth, int budget, bool *tx_again)
 		}
 
 		if (skb != (struct sk_buff *)MTK_DMA_DUMMY_DESC) {
-			bytes[mac] += skb->len;
-			done[mac]++;
-			budget--;
+			bytes += skb->len;
+			done++;
 		}
 		mtk_tx_unmap(eth->dev, tx_buf);
 
@@ -902,11 +907,13 @@  static int mtk_poll_tx(struct mtk_eth *eth, int budget, bool *tx_again)
 
 	mtk_w32(eth, cpu, MTK_QTX_CRX_PTR);
 
+	/* we have a single DMA ring so BQL needs to be updated for all devices
+	 * sitting on this ring
+	 */
 	for (i = 0; i < MTK_MAC_COUNT; i++) {
-		if (!eth->netdev[i] || !done[i])
+		if (!eth->netdev[i])
 			continue;
-		netdev_completed_queue(eth->netdev[i], done[i], bytes[i]);
-		total += done[i];
+		netdev_completed_queue(eth->netdev[i], done, bytes);
 	}
 
 	/* read hw index again make sure no new tx packet */