From patchwork Mon May 23 11:36:09 2016
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Shuyu Wei
X-Patchwork-Id: 9131607
Return-Path:
Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125])
 by pdx-korg-patchwork.web.codeaurora.org (Postfix) with ESMTP id 1D4096075F
 for ; Mon, 23 May 2016 11:36:45 +0000 (UTC)
Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1])
 by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 1B37828230
 for ; Mon, 23 May 2016 11:36:45 +0000 (UTC)
Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486)
 id 0FDD828233; Mon, 23 May 2016 11:36:45 +0000 (UTC)
X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org
X-Spam-Level:
X-Spam-Status: No, score=-4.1 required=2.0 tests=BAYES_00,
 DKIM_ADSP_CUSTOM_MED, DKIM_SIGNED, FREEMAIL_FROM, RCVD_IN_DNSWL_MED,
 T_DKIM_INVALID autolearn=ham version=3.3.1
Received: from bombadil.infradead.org (bombadil.infradead.org [198.137.202.9])
 (using TLSv1.2 with cipher AES128-GCM-SHA256 (128/128 bits))
 (No client certificate requested)
 by mail.wl.linuxfoundation.org (Postfix) with ESMTPS id AA4CE28230
 for ; Mon, 23 May 2016 11:36:44 +0000 (UTC)
Received: from localhost ([127.0.0.1] helo=bombadil.infradead.org)
 by bombadil.infradead.org with esmtp (Exim 4.80.1 #2 (Red Hat Linux))
 id 1b4oA7-0003Sm-5U; Mon, 23 May 2016 11:36:43 +0000
Received: from mail-qg0-x241.google.com ([2607:f8b0:400d:c04::241])
 by bombadil.infradead.org with esmtps (Exim 4.80.1 #2 (Red Hat Linux))
 id 1b4oA4-0003Jn-IV for linux-rockchip@lists.infradead.org;
 Mon, 23 May 2016 11:36:41 +0000
Received: by mail-qg0-x241.google.com with SMTP id 90so14639783qgz.0
 for ; Mon, 23 May 2016 04:36:18 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=gmail.com; s=20120113;
 h=date:from:to:cc:subject:message-id:references:mime-version
 :content-disposition:in-reply-to:user-agent;
 bh=VgnVbTpx1tiVhOOR64OlpDMMXnsVB7WsayZ11Bf2JnM=;
 b=R7FI8Vo/RnMuTXRJl+xuoA6ZqmrDKYxj4rMQ/WgSD8zINlJEmlQNWXI/1Y187FAqM9
 Qe0WVdrIhxRylddcNy4XRsIadVs8YzSF8hU+rWCbqj2TebVfS3WDr81BTdx6bhKF1wDO
 mHkTFdCRADaO2NA65Q8zEgCs0mZVHCYlRJEK0VNFBkXpl6udmwWQ5RrPnj9JKENmFka4
 IF87QYh5XuVsaNcIxx9sWlGbYorsdz/Bv89yLNAFrJ9QbE3/kFi38e8uSsn+ubAg+Oth
 EQmWDn1DCC9zm636G2DfVW7Yc0FYkSM54JzaSG1fDGa/EogFd/xX/03OZzZonqD2S0Ih
 UzjA==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=1e100.net; s=20130820;
 h=x-gm-message-state:date:from:to:cc:subject:message-id:references
 :mime-version:content-disposition:in-reply-to:user-agent;
 bh=VgnVbTpx1tiVhOOR64OlpDMMXnsVB7WsayZ11Bf2JnM=;
 b=AlU1CZ7xc7xOHSM0K0t12WcgTYzbSyta5qajSxAAXbpDdgE3Wees4h3fswXUoRdKZM
 CXm601a+eSz44OaCAKur4qQw4w9Kt9D+GjkAU5xdULPdvIkBg5FsJo3M2hybq5TES1tq
 WWFUcwYgRmUwFWtVhCWBSzkjJDG9jHi5Ok1Mgr5S2itkN1G2paMaIqkWRgdh7XCSsPok
 N85JoJxZ7oVldKr9+40vMX7aoZL1spMKFLd5TP3osdy6ByMhvHv74lyPtCFzpg8dwPgQ
 S6kYSPgp7lMB0pwqDhkamLfV2QUCFVdbFgxkMl9vJSysMBfOgzxbLn+t1/g4Qc1+4Ixd
 jY6w==
X-Gm-Message-State: AOPr4FVU9xvu+5vLE56x9DIDvgPDFkj7ohS4oyFqebi8xpO8BL27vYlFZCHdjy0CTtadCw==
X-Received: by 10.140.230.20 with SMTP id a20mr15898569qhc.68.1464003378090;
 Mon, 23 May 2016 04:36:18 -0700 (PDT)
Received: from debian-dorm ([159.203.126.36])
 by smtp.gmail.com with ESMTPSA id l3sm4505850qtb.11.2016.05.23.04.36.14
 (version=TLS1_2 cipher=ECDHE-RSA-CHACHA20-POLY1305 bits=256/256);
 Mon, 23 May 2016 04:36:17 -0700 (PDT)
Date: Mon, 23 May 2016 19:36:09 +0800
From: Shuyu Wei
To: Lino Sanfilippo
Subject: Re: [PATCH v2] ethernet:arc: Fix racing of TX ring buffer
Message-ID: <20160523113609.GA21019@debian-dorm>
References: <20160517.142456.2247845107325931733.davem@davemloft.net>
 <20160518000153.GA21757@electric-eye.fr.zoreil.com>
 <573CD09D.1060307@gmx.de>
 <20160518225529.GA18671@electric-eye.fr.zoreil.com>
 <573E2D0C.604@gmx.de>
 <20160520003145.GA22420@electric-eye.fr.zoreil.com>
 <20160521160910.GA14945@debian-dorm>
 <5740E82F.8040903@gmx.de>
 <20160522091742.GA8681@debian-dorm> <57419853.9050701@gmx.de>
MIME-Version: 1.0
Content-Disposition: inline
In-Reply-To: <57419853.9050701@gmx.de>
User-Agent: Mutt/1.6.0 (2016-04-01)
X-CRM114-Version: 20100106-BlameMichelson ( TRE 0.8.0 (BSD) ) MR-646709E3
X-CRM114-CacheID: sfid-20160523_043640_712069_49DBFB6B
X-CRM114-Status: GOOD ( 22.83 )
X-BeenThere: linux-rockchip@lists.infradead.org
X-Mailman-Version: 2.1.20
Precedence: list
List-Id: Upstream kernel work for Rockchip platforms
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
Cc: heiko@sntech.de, al.kochet@gmail.com, netdev@vger.kernel.org,
 linux-rockchip@lists.infradead.org, Francois Romieu , David Miller ,
 wxt@rock-chips.com
Sender: "Linux-rockchip"
Errors-To: linux-rockchip-bounces+patchwork-linux-rockchip=patchwork.kernel.org@lists.infradead.org
X-Virus-Scanned: ClamAV using ClamSMTP

On Sun, May 22, 2016 at 01:30:27PM +0200, Lino Sanfilippo wrote:
>
> Thanks for testing. However that extra check for skb not being NULL should not be
> necessary if the code were correct. The changes I suggested were all about having
> skb and info consistent with txbd_curr.
> But I just realized that there is still a big flaw in the last changes. While
> tx() looks correct now (we first set up the descriptor and assign the skb and _then_
> advance txbd_curr) tx_clean still is not:
>
> We _first_ have to read tx_curr and _then_ read the corresponding descriptor and its skb.
> (The last patch implemented just the reverse - and thus wrong - order, first get skb and
> descriptor and then read tx_curr).
>
> So the patch below hopefully handles also tx_clean correctly. Could you please do once more a test
> with this one?

Hi Lino,
This patch worked after a whole night of stress testing.

> >
> > After further test, my patch to barrier timestamp() didn't work.
> > Just like the original code in the tree, the emac still got stuck under
> > high load, even if I changed the smp_wmb() to dma_wmb(). So the original
> > code do have race somewhere.
>
> So to make this clear: with the current code in net-next you still see a problem (lockup), right?

Yes, I mean the mainline kernel, which should be the same as net-next.

> > ... and why Francois' fix worked. Please be patient with me :-).
>
> So which fix(es) exactly work for you and solve your lockup issue?

I mean the patch below, starting this thread.

diff --git a/drivers/net/ethernet/arc/emac_main.c b/drivers/net/ethernet/arc/emac_main.c
index a3a9392..df3dfef 100644
--- a/drivers/net/ethernet/arc/emac_main.c
+++ b/drivers/net/ethernet/arc/emac_main.c
@@ -153,9 +153,8 @@ static void arc_emac_tx_clean(struct net_device *ndev)
 {
 	struct arc_emac_priv *priv = netdev_priv(ndev);
 	struct net_device_stats *stats = &ndev->stats;
-	unsigned int i;
 
-	for (i = 0; i < TX_BD_NUM; i++) {
+	while (priv->txbd_dirty != priv->txbd_curr) {
 		unsigned int *txbd_dirty = &priv->txbd_dirty;
 		struct arc_emac_bd *txbd = &priv->txbd[*txbd_dirty];
 		struct buffer_state *tx_buff = &priv->tx_buff[*txbd_dirty];
@@ -685,13 +684,15 @@ static int arc_emac_tx(struct sk_buff *skb, struct net_device *ndev)
 	wmb();
 
 	skb_tx_timestamp(skb);
+	priv->tx_buff[*txbd_curr].skb = skb;
+
+	dma_wmb();
 
 	*info = cpu_to_le32(FOR_EMAC | FIRST_OR_LAST_MASK | len);
 
 	/* Make sure info word is set */
 	wmb();
 
-	priv->tx_buff[*txbd_curr].skb = skb;
 
 	/* Increment index to point to the next BD */
 	*txbd_curr = (*txbd_curr + 1) % TX_BD_NUM;
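
For reference, the ordering rule discussed in the quoted text (tx() fills the slot and only then advances the index; tx_clean reads the index first and only then the slot) can be sketched in userspace C11. This is only an illustration under stated assumptions, not the driver code: struct ring, ring_tx(), ring_clean() and RING_SIZE are hypothetical names, release/acquire atomics stand in for the kernel barriers, and the hardware-owned info word that the real tx_clean also has to check is left out.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define RING_SIZE 8u	/* hypothetical; plays the role of TX_BD_NUM */

/* Hypothetical stand-in for the driver's private TX state. */
struct ring {
	void *skb[RING_SIZE];	/* like tx_buff[].skb */
	atomic_uint curr;	/* producer index, like txbd_curr */
	atomic_uint dirty;	/* consumer index, like txbd_dirty */
};

/* Producer side: fill the slot first, publish the index last. */
static int ring_tx(struct ring *r, void *skb)
{
	unsigned int curr = atomic_load_explicit(&r->curr, memory_order_relaxed);
	unsigned int next = (curr + 1) % RING_SIZE;

	/* One slot is kept free so that "full" and "empty" differ. */
	if (next == atomic_load_explicit(&r->dirty, memory_order_acquire))
		return -1;

	r->skb[curr] = skb;
	/* Release: the slot write above is visible before the new index. */
	atomic_store_explicit(&r->curr, next, memory_order_release);
	return 0;
}

/* Consumer side: read the index first, then the slots it covers. */
static unsigned int ring_clean(struct ring *r)
{
	/* Acquire pairs with the release store in ring_tx(). */
	unsigned int curr = atomic_load_explicit(&r->curr, memory_order_acquire);
	unsigned int dirty = atomic_load_explicit(&r->dirty, memory_order_relaxed);
	unsigned int freed = 0;

	while (dirty != curr) {
		/* Every slot before curr was filled before curr was published. */
		assert(r->skb[dirty] != NULL);
		r->skb[dirty] = NULL;
		dirty = (dirty + 1) % RING_SIZE;
		freed++;
	}

	/* Release: the slot clears are visible before the slots are reused. */
	atomic_store_explicit(&r->dirty, dirty, memory_order_release);
	return freed;
}
```

Unlike the while loop in the patch, which rereads txbd_curr on every iteration, this sketch takes one snapshot of the producer index per call; either way the consumer only touches slots that lie before an index it has already read.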