From patchwork Thu May 1 14:42:00 2014
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Hal Rosenstock
X-Patchwork-Id: 4099261
X-Patchwork-Delegate: hal@mellanox.com
Message-ID: <53625D38.6020602@dev.mellanox.co.il>
Date: Thu, 01 May 2014 10:42:00 -0400
From: Hal Rosenstock
To: "linux-rdma (linux-rdma@vger.kernel.org)"
CC: Vladimir Koushnir
Subject: [PATCH opensm] SM should resweep the fabric if vl15_send_mad fails
X-Mailing-List: linux-rdma@vger.kernel.org

From: Vladimir Koushnir

If osm_vendor_send fails to send a resp_expected MAD in vl15_send_mad,
opensm needs to resweep the fabric to recover from this error.

Signed-off-by: Vladimir Koushnir
Signed-off-by: Hal Rosenstock
---
 include/opensm/osm_vl15intf.h | 11 ++++++++++-
 opensm/osm_opensm.c           |  2 +-
 opensm/osm_vl15intf.c         | 18 +++++++++++++++++-
 3 files changed, 28 insertions(+), 3 deletions(-)

diff --git a/include/opensm/osm_vl15intf.h b/include/opensm/osm_vl15intf.h
index e621c68..b024b23 100644
--- a/include/opensm/osm_vl15intf.h
+++ b/include/opensm/osm_vl15intf.h
@@ -53,6 +53,7 @@
 #include
 #include
 #include
+#include
 
 #ifdef __cplusplus
 # define BEGIN_C_DECLS extern "C" {
@@ -127,6 +128,7 @@ typedef struct osm_vl15 {
 	osm_vendor_t *p_vend;
 	osm_log_t *p_log;
 	osm_stats_t *p_stats;
+	osm_subn_t *p_subn;
 } osm_vl15_t;
 /*
 * FIELDS
@@ -171,6 +173,9 @@ typedef struct osm_vl15 {
 *	p_stats
 *		Pointer to the OpenSM statistics block.
 *
+*	p_subn
+*		Pointer to the OpenSM subnet object.
+*
 * SEE ALSO
 *	VL15 object
 *********/
@@ -251,6 +256,7 @@ void osm_vl15_destroy(IN osm_vl15_t * p_vl15, IN struct osm_mad_pool *p_pool);
 */
 ib_api_status_t osm_vl15_init(IN osm_vl15_t * p_vl15, IN osm_vendor_t * p_vend,
 			      IN osm_log_t * p_log, IN osm_stats_t * p_stats,
+			      IN osm_subn_t * p_subn,
 			      IN int32_t max_wire_smps,
 			      IN int32_t max_wire_smps2,
 			      IN uint32_t max_smps_timeout);
@@ -266,7 +272,10 @@ ib_api_status_t osm_vl15_init(IN osm_vl15_t * p_vl15, IN osm_vendor_t * p_vend,
 *		[in] Pointer to the log object.
 *
 *	p_stats
-*		[in] Pointer to the OpenSM stastics block.
+*		[in] Pointer to the OpenSM statistics block.
+*
+*	p_subn
+*		[in] Pointer to the OpenSM subnet object.
 *
 *	max_wire_smps
 *		[in] Maximum number of SMPs allowed on the wire at one time.
diff --git a/opensm/osm_opensm.c b/opensm/osm_opensm.c
index f702c80..69d2ba6 100644
--- a/opensm/osm_opensm.c
+++ b/opensm/osm_opensm.c
@@ -465,7 +465,7 @@ ib_api_status_t osm_opensm_init_finish(IN osm_opensm_t * p_osm,
 		goto Exit;
 
 	status = osm_vl15_init(&p_osm->vl15, p_osm->p_vendor,
-			       &p_osm->log, &p_osm->stats,
+			       &p_osm->log, &p_osm->stats, &p_osm->subn,
 			       p_opt->max_wire_smps, p_opt->max_wire_smps2,
 			       p_opt->max_smps_timeout);
 	if (status != IB_SUCCESS)
diff --git a/opensm/osm_vl15intf.c b/opensm/osm_vl15intf.c
index f85252c..d00ecda 100644
--- a/opensm/osm_vl15intf.c
+++ b/opensm/osm_vl15intf.c
@@ -60,6 +60,7 @@ static void vl15_send_mad(osm_vl15_t * p_vl, osm_madw_t * p_madw)
 {
 	ib_api_status_t status;
 	boolean_t resp_expected = p_madw->resp_expected;
+	ib_smp_t * p_smp;
 
 	/*
 	   Non-response-expected mads are not throttled on the wire
@@ -106,8 +107,21 @@ static void vl15_send_mad(osm_vl15_t * p_vl, osm_madw_t * p_madw)
 	   qp0_mads_outstanding will be decremented by send error callback
 	   (called by osm_vendor_send() */
 	cl_atomic_dec(&p_vl->p_stats->qp0_mads_sent);
-	if (!resp_expected)
+	if (!resp_expected) {
 		cl_atomic_dec(&p_vl->p_stats->qp0_unicasts_sent);
+		return;
+	}
+
+	/* need to cause heavy-sweep if resp_expected MAD sending failed */
+	p_smp = osm_madw_get_smp_ptr(p_madw);
+	OSM_LOG(p_vl->p_log, OSM_LOG_ERROR, "ERR 3E04: "
+		"%s method failed for attribute 0x%X (%s)\n",
+		p_smp->method == IB_MAD_METHOD_SET ? "SET" : "GET",
+		cl_ntoh16(p_smp->attr_id),
+		ib_get_sm_attr_str(p_smp->attr_id));
+
+	p_vl->p_subn->subnet_initialization_error = TRUE;
+
 }
 
 static void vl15_poller(IN void *p_ptr)
@@ -246,6 +260,7 @@ void osm_vl15_destroy(IN osm_vl15_t * p_vl, IN struct osm_mad_pool *p_pool)
 
 ib_api_status_t osm_vl15_init(IN osm_vl15_t * p_vl, IN osm_vendor_t * p_vend,
 			      IN osm_log_t * p_log, IN osm_stats_t * p_stats,
+			      IN osm_subn_t * p_subn,
 			      IN int32_t max_wire_smps,
 			      IN int32_t max_wire_smps2,
 			      IN uint32_t max_smps_timeout)
@@ -257,6 +272,7 @@ ib_api_status_t osm_vl15_init(IN osm_vl15_t * p_vl, IN osm_vendor_t * p_vend,
 	p_vl->p_vend = p_vend;
 	p_vl->p_log = p_log;
 	p_vl->p_stats = p_stats;
+	p_vl->p_subn = p_subn;
 	p_vl->max_wire_smps = max_wire_smps;
 	p_vl->max_wire_smps2 = max_wire_smps2;
 	p_vl->max_smps_timeout = max_wire_smps < max_wire_smps2 ?