From patchwork Sun May 19 09:17:36 2013
X-Patchwork-Submitter: Jinpu Wang
X-Patchwork-Id: 2590051
Message-ID: <519898B0.1000901@profitbricks.com>
Date: Sun, 19 May 2013 11:17:36 +0200
From: Jack Wang
To: Or Gerlitz
CC: Shlomo Pongratz, "linux-rdma@vger.kernel.org", Dongsu Park
Subject: Re: list corruption in IPOIB
References: <519686B4.7010300@profitbricks.com> <5197F447.5020702@profitbricks.com> <51986A8B.9030806@mellanox.com>
In-Reply-To: <51986A8B.9030806@mellanox.com>
X-Mailing-List: linux-rdma@vger.kernel.org

On 2013-05-19 08:00, Or Gerlitz wrote:
> On 19/05/2013 00:36, Jack Wang wrote:
>> I tried 3.4.23, and mainline kernel from Roland's rdma-for-linus, we
>> added bug injection interface, run multithread iperf, and switched ib
>> mode between connected and datagram in sync on each side as Shlomo
>> suggested.
>
> Can you be more specific re the bug injection interface, is that
> existing kernel mechanism or something you added? so the bug triggers
> when you run iperf in multi-threaded mode AND in parallel inject errors
> AND in parallel switch between datagram and connected mode? bee --- I
> assume this isn't something you do just for the fun of it... so some
> problem X hits you in production and this problem Y you get with the
> above juggling, any known or empiric relation between the two?
>
> Or.

We added an inject_bug sysfs node that forces the function into its
error path, along the lines of the patch below.

Yes, you are right: we want to speed up reproducing the bug. We saw the
warning and came to the conclusion that neigh->list gets corrupted
somewhere.

What's your opinion?

Regards,
Jack

---
--- a/drivers/infiniband/ulp/ipoib/ipoib_cm.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_cm.c
@@ -797,10 +797,12 @@ void ipoib_cm_handle_tx_wc(struct net_device *dev, struct ib_wc *wc)
 		    test_bit(IPOIB_FLAG_ADMIN_UP, &priv->flags))
 			netif_wake_queue(dev);
 
-	if (wc->status != IB_WC_SUCCESS &&
-	    wc->status != IB_WC_WR_FLUSH_ERR) {
+	if (priv->inject_bug ||
+	    (wc->status != IB_WC_SUCCESS &&
+	     wc->status != IB_WC_WR_FLUSH_ERR)) {
 		struct ipoib_neigh *neigh;
 
+		priv->inject_bug = 0;
 		ipoib_dbg(priv, "failed cm send event "
 			   "(status=%d, wrid=%d vend_err %x)\n",
 			   wc->status, wr_id, wc->vendor_err);
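
For readers who want to see the one-shot behaviour of the flag in isolation, here is a minimal userspace sketch of the condition in the hunk above. It is not kernel code and not the actual sysfs handler, which the patch does not show. The names fake_priv, wc_status, arm_injection and handle_tx_wc are hypothetical stand-ins, and arm_injection only assumes that a write to the inject_bug node ends up setting the flag to a non-zero value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the IB_WC_* completion status codes. */
enum wc_status { WC_SUCCESS, WC_WR_FLUSH_ERR, WC_GENERAL_ERR };

/* Hypothetical stand-in for the driver's private data; only the flag matters. */
struct fake_priv {
	int inject_bug;       /* armed by the test knob, consumed by the handler */
	int error_path_hits;  /* counts how often the error branch ran */
};

/* Assumption: models what a write to the inject_bug sysfs node would do. */
static void arm_injection(struct fake_priv *priv)
{
	assert(priv != NULL);
	priv->inject_bug = 1;
}

/*
 * Mirrors the condition from the diff: take the error path when the flag is
 * armed, or when the status is a real failure (neither success nor flush).
 * Returns true if the error path ran.
 */
static bool handle_tx_wc(struct fake_priv *priv, enum wc_status status)
{
	assert(priv != NULL);
	if (priv->inject_bug ||
	    (status != WC_SUCCESS && status != WC_WR_FLUSH_ERR)) {
		priv->inject_bug = 0;  /* one-shot: cleared as soon as it is used */
		priv->error_path_hits++;
		return true;
	}
	return false;
}
```

With this model, arming the flag once makes exactly one otherwise successful completion take the error branch, and later completions behave normally until the flag is armed again. The sketch is single-threaded, so it says nothing about how the separate check and clear of the flag behave when completions and mode switches run concurrently.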