From patchwork Sun May 22 11:30:27 2016 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Lino Sanfilippo X-Patchwork-Id: 9130903 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork.web.codeaurora.org (Postfix) with ESMTP id CEEAD60459 for ; Sun, 22 May 2016 11:31:24 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 7743B2819E for ; Sun, 22 May 2016 11:31:22 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 5955C281AC; Sun, 22 May 2016 11:31:22 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-4.2 required=2.0 tests=BAYES_00,FREEMAIL_FROM, RCVD_IN_DNSWL_MED autolearn=ham version=3.3.1 Received: from bombadil.infradead.org (bombadil.infradead.org [198.137.202.9]) (using TLSv1.2 with cipher AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by mail.wl.linuxfoundation.org (Postfix) with ESMTPS id 62A2E2819E for ; Sun, 22 May 2016 11:31:19 +0000 (UTC) Received: from localhost ([127.0.0.1] helo=bombadil.infradead.org) by bombadil.infradead.org with esmtp (Exim 4.80.1 #2 (Red Hat Linux)) id 1b4RbJ-0000Mo-MV; Sun, 22 May 2016 11:31:17 +0000 Received: from mout.gmx.net ([212.227.15.18]) by bombadil.infradead.org with esmtps (Exim 4.80.1 #2 (Red Hat Linux)) id 1b4RbH-0000LT-4Q for linux-rockchip@lists.infradead.org; Sun, 22 May 2016 11:31:16 +0000 Received: from [192.168.178.39] ([78.43.221.200]) by mail.gmx.com (mrgmx002) with ESMTPSA (Nemesis) id 0MPUlV-1b8YJr40Th-004gBv; Sun, 22 May 2016 13:30:31 +0200 Subject: Re: [PATCH v2] ethernet:arc: Fix racing of TX ring buffer To: Shuyu Wei References: <20160517152520.GA2750@debian-dorm> <20160517.142456.2247845107325931733.davem@davemloft.net> <20160518000153.GA21757@electric-eye.fr.zoreil.com> <573CD09D.1060307@gmx.de> <20160518225529.GA18671@electric-eye.fr.zoreil.com> <573E2D0C.604@gmx.de> <20160520003145.GA22420@electric-eye.fr.zoreil.com> <20160521160910.GA14945@debian-dorm> <5740E82F.8040903@gmx.de> <20160522091742.GA8681@debian-dorm> From: Lino Sanfilippo Message-ID: <57419853.9050701@gmx.de> Date: Sun, 22 May 2016 13:30:27 +0200 User-Agent: Mozilla/5.0 (X11; Linux i686; rv:38.0) Gecko/20100101 Thunderbird/38.8.0 MIME-Version: 1.0 In-Reply-To: <20160522091742.GA8681@debian-dorm> X-Provags-ID: V03:K0:6FYAT4iRglvtueQQczdngP6wQ1GnvT9KBNGsDvcP/M+bRzQjMZ2 iPNo7Whz1Ap2A+GZsKn+qx483vOaLNKEaE2zEKPZ3uUGX0QSwog6Vt1R3aIt9KyNHeonuB3 o2aJfdgxeJSsE3UJAozvq+nItmlX05zC1BL8Zr6VPNZH1yhzY+qLlbOV9Bb4YbDPyXKz3w4 HeqDRQ2Z9bNbGpBxg60fQ== X-UI-Out-Filterresults: notjunk:1; V01:K0:Myhs+n84TYc=:G3Md2fNYALUu/U1N7d39I6 UL6u5wMgA2cJSt1vBZalVH5/p218I9uqrFH9k2jMc/10kZ+y085/5/Pi/qmkAIZqzRtmdQz9Y gjjygX0r7e8m40U5xCP6rFZuHgly4DVPYd8CebJarR0UNUAsNxjK1Pagdl110Pbasc7Y1ro/A +6K8mpKpfvIdGIuA4HF9M6udWIWJ1meIVYxnhSdlDHYcyDkhBOf5ohouvfbwwou5cXgkty8n5 52rav/KzhkIhCpL3kVm3tpZ2fRHol/1pn88dIFY7+YfMpv/OadIrCFPWamNsARlAryxEAcj0X /tyE5/7D7GACsJjYRiSdeNXZW1AmP2Q0a9GRYcg0FBhbMLoqJMZaRhO7addqPJzDTFnCE2z4H NYEd0BWTlSuLtGOqfncJn8KhzaQQw8iXVwPlo1Gj6i5jKfHkg45p628Eqdsayc0PGvOsPDQka 6p/ncOGfFS5+DZKHwDmgcjInH0YvA0GX2LmGO5HlL79tehKVXnLUncM3kxs3BZzbbBQ83iI07 kagUenmC6fHqSWknNNDB9o+DO+hJcHls81tjs5dXmzsQF8eBch/l5nBBMd/AR+af2S5n4uBuG wcmyxyJ1WcPJ0ewFgtHhlFGKqHvGYi29aqBxqgRURbvzRe3T2rxVlJ/w1wZRNEp23Uwtytu4Z 2xSs7cM3zja/VvEnt9Zm5ZpDv4VS8MvZEZLKObChMQvRqj4OzdRcsBQxd32FcOcGfzzO0u8p5 fFcvQfLfoQ25h94wQJwWhSkWlgrDGPU7MdiHjN3jx7eaKRg4qR7Wkk1w6dRC9BJDBlKs+z3lr Y2vrFPj X-CRM114-Version: 20100106-BlameMichelson ( TRE 0.8.0 (BSD) ) MR-646709E3 X-CRM114-CacheID: sfid-20160522_043115_523387_EB0EB6CF X-CRM114-Status: GOOD ( 23.02 ) X-BeenThere: linux-rockchip@lists.infradead.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Upstream kernel work for Rockchip platforms List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: heiko@sntech.de, al.kochet@gmail.com, netdev@vger.kernel.org, linux-rockchip@lists.infradead.org, Francois Romieu , David Miller , wxt@rock-chips.com Sender: "Linux-rockchip" Errors-To: linux-rockchip-bounces+patchwork-linux-rockchip=patchwork.kernel.org@lists.infradead.org X-Virus-Scanned: ClamAV using ClamSMTP On 22.05.2016 11:17, Shuyu Wei wrote: > Hi Lino, > > I tested this patch, it still got panic under stress. > Just wget 2 large files simultaneously and it failed. > > Looks like the problem comes from the if statement in tx_clean(). > I changed your patch to use > > - if (info & FOR_EMAC) > + if ((info & FOR_EMAC) || !txbd->data || !skb) > > and it worked. Thanks for testing. However that extra check for skb not being NULL should not be necessary if the code were correct. The changes I suggested were all about having skb and info consistent with txbd_curr. But I just realized that there is still a big flaw in the last changes. While tx() looks correct now (we first set up the descriptor and assign the skb and _then_ advance txbd_curr) tx_clean still is not: We _first_ have to read tx_curr and _then_ read the corresponding descriptor and its skb. (The last patch implemented just the reverse - and thus wrong - order, first get skb and descriptor and then read tx_curr). So the patch below hopefully handles also tx_clean correctly. Could you please do once more a test with this one? > > After further test, my patch to barrier timestamp() didn't work. > Just like the original code in the tree, the emac still got stuck under > high load, even if I changed the smp_wmb() to dma_wmb(). So the original > code do have race somewhere. So to make this clear: with the current code in net-next you still see a problem (lockup), right? > > I'm new to kernel development, and still trying to understand how memory > barrier works Its an interresting topic and thats what I am trying to understand, too :) > ... and why Francois' fix worked. Please be patient with me :-). So which fix(es) exactly work for you and solve your lockup issue? --- a/drivers/net/ethernet/arc/emac_main.c +++ b/drivers/net/ethernet/arc/emac_main.c @@ -159,12 +159,22 @@ static void arc_emac_tx_clean(struct net_device *ndev) unsigned int *txbd_dirty = &priv->txbd_dirty; struct arc_emac_bd *txbd = &priv->txbd[*txbd_dirty]; struct buffer_state *tx_buff = &priv->tx_buff[*txbd_dirty]; - struct sk_buff *skb = tx_buff->skb; unsigned int info = le32_to_cpu(txbd->info); + struct sk_buff *skb; - if ((info & FOR_EMAC) || !txbd->data || !skb) + if (*txbd_dirty == priv->txbd_curr) break; + /* Make sure curr pointer is consistent with info */ + rmb(); + + info = le32_to_cpu(txbd->info); + + if (info & FOR_EMAC) + break; + + skb = tx_buff->skb; + if (unlikely(info & (DROP | DEFR | LTCL | UFLO))) { stats->tx_errors++; stats->tx_dropped++; @@ -195,8 +205,8 @@ static void arc_emac_tx_clean(struct net_device *ndev) *txbd_dirty = (*txbd_dirty + 1) % TX_BD_NUM; } - /* Ensure that txbd_dirty is visible to tx() before checking - * for queue stopped. + /* Ensure that txbd_dirty is visible to tx() and we see the most recent + * value for txbd_curr. */ smp_mb(); @@ -680,35 +690,29 @@ static int arc_emac_tx(struct sk_buff *skb, struct net_device *ndev) dma_unmap_len_set(&priv->tx_buff[*txbd_curr], len, len); priv->txbd[*txbd_curr].data = cpu_to_le32(addr); - - /* Make sure pointer to data buffer is set */ - wmb(); + priv->tx_buff[*txbd_curr].skb = skb; skb_tx_timestamp(skb); *info = cpu_to_le32(FOR_EMAC | FIRST_OR_LAST_MASK | len); - /* Make sure info word is set */ + /* 1. Make sure that with respect to tx_clean everything is set up + * properly before we advance txbd_curr. + * 2. Make sure writes to DMA descriptors are completed before we inform + * the hardware. + */ wmb(); - priv->tx_buff[*txbd_curr].skb = skb; - /* Increment index to point to the next BD */ *txbd_curr = (*txbd_curr + 1) % TX_BD_NUM; - /* Ensure that tx_clean() sees the new txbd_curr before - * checking the queue status. This prevents an unneeded wake - * of the queue in tx_clean(). + /* Ensure we see the most recent value of txbd_dirty and tx_clean() sees + * the updated value of txbd_curr. */ smp_mb(); - if (!arc_emac_tx_avail(priv)) { + if (!arc_emac_tx_avail(priv)) netif_stop_queue(ndev); - /* Refresh tx_dirty */ - smp_mb(); - if (arc_emac_tx_avail(priv)) - netif_start_queue(ndev); - } arc_reg_set(priv, R_STATUS, TXPL_MASK);