From patchwork Tue Mar 4 11:07:48 2014
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Alex Netes
X-Patchwork-Id: 3762301
X-Patchwork-Delegate: hal@mellanox.com
From: Alex Netes
To: linux-rdma@vger.kernel.org, Hal Rosenstock
Cc: Alex Netes
Subject: [PATCH] opensm: Fix crash during handover
Date: Tue, 4 Mar 2014 13:07:48 +0200
Message-Id: <1393931268-15316-1-git-send-email-alexne@mellanox.com>
X-Mailer: git-send-email 1.7.1
X-Mailing-List: linux-rdma@vger.kernel.org

Another MASTER SM with lower priority may send HANDOVER to our SM
before our SM *starts* polling it. sm_state_mgr_start_polling() does
not check whether p_polling_sm is valid in that case.

Track the polled SM by its GUID (polling_sm_guid) instead of by a
pointer to the remote SM object.

Signed-off-by: Alex Netes
---
 include/opensm/osm_sm.h   |  2 +-
 opensm/osm_drop_mgr.c     |  6 +++---
 opensm/osm_sm_state_mgr.c | 12 ++++++------
 opensm/osm_sminfo_rcv.c   |  2 +-
 opensm/osm_state_mgr.c    |  2 +-
 5 files changed, 12 insertions(+), 12 deletions(-)

diff --git a/include/opensm/osm_sm.h b/include/opensm/osm_sm.h
index e48c549..94d1831 100644
--- a/include/opensm/osm_sm.h
+++ b/include/opensm/osm_sm.h
@@ -116,7 +116,7 @@ typedef struct osm_sm {
 	unsigned master_sm_found;
 	uint32_t retry_number;
 	ib_net64_t master_sm_guid;
-	osm_remote_sm_t *p_polling_sm;
+	ib_net64_t polling_sm_guid;
 	osm_subn_t *p_subn;
 	osm_db_t *p_db;
 	osm_vendor_t *p_vendor;
diff --git a/opensm/osm_drop_mgr.c b/opensm/osm_drop_mgr.c
index ff6a81b..c1cdc0d 100644
--- a/opensm/osm_drop_mgr.c
+++ b/opensm/osm_drop_mgr.c
@@ -257,9 +257,9 @@ static void drop_mgr_remove_port(osm_sm_t * sm, IN osm_port_t * p_port)
 		OSM_LOG(sm->p_log, OSM_LOG_VERBOSE,
 			"Cleaned SM for port guid 0x%016" PRIx64 "\n",
 			cl_ntoh64(port_guid));
-		/* clean up the polling_sm pointer */
-		if (sm->p_polling_sm == p_sm)
-			sm->p_polling_sm = NULL;
+		/* clean up the polling_sm_guid */
+		if (sm->polling_sm_guid == p_sm->smi.guid)
+			sm->polling_sm_guid = 0;
 		free(p_sm);
 	}
 
diff --git a/opensm/osm_sm_state_mgr.c b/opensm/osm_sm_state_mgr.c
index 0660fb9..e5a11da 100644
--- a/opensm/osm_sm_state_mgr.c
+++ b/opensm/osm_sm_state_mgr.c
@@ -97,11 +97,11 @@ static boolean_t sm_state_mgr_send_master_sm_info_req(osm_sm_t * sm, uint8_t sm_
 	} else {
 		/*
 		 * We are not in STANDBY - this means we are in MASTER state -
-		 * so we need to poll the SM that is saved in p_polling_sm
+		 * so we need to poll the SM that is saved in polling_sm_guid
 		 * under sm.
 		 * Send a query of SubnGet(SMInfo) to that SM.
 		 */
-		guid = sm->p_polling_sm->smi.guid;
+		guid = sm->polling_sm_guid;
 	}
 
 	/* Verify that SM is not polling itself */
@@ -198,7 +198,7 @@ void osm_sm_state_mgr_polling_callback(IN void *context)
 	 * If we are not in one of these cases - don't need to restart the poller.
 	 */
 	if (!((sm_state == IB_SMINFO_STATE_MASTER &&
-	       sm->p_polling_sm != NULL) ||
+	       sm->polling_sm_guid != 0) ||
 	      sm_state == IB_SMINFO_STATE_STANDBY)) {
 		CL_PLOCK_RELEASE(sm->p_lock);
 		goto Exit;
@@ -426,7 +426,7 @@ ib_api_status_t osm_sm_state_mgr_process(osm_sm_t * sm,
 			 * We want to force a heavy sweep - hopefully this
 			 * occurred because the remote sm died, and we'll find
 			 * this out and configure the subnet after a heavy sweep.
-			 * We also want to clear the p_polling_sm object - since
+			 * We also want to clear the polling_sm_guid - since
 			 * we are done polling on that remote sm - we are
 			 * sweeping again.
 			 */
@@ -438,7 +438,7 @@ ib_api_status_t osm_sm_state_mgr_process(osm_sm_t * sm,
 			 * change, or we are in idle state - since we
 			 * recognized a master SM before - so we want to make a
 			 * heavy sweep and reconfigure the new subnet.
-			 * We also want to clear the p_polling_sm object - since
+			 * We also want to clear the polling_sm_guid - since
 			 * we are done polling on that remote sm - we got a
 			 * handover from it.
 			 */
@@ -449,7 +449,7 @@ ib_api_status_t osm_sm_state_mgr_process(osm_sm_t * sm,
 			 * SM may have configure/done on the fabric.
 			 */
 			sm->p_subn->set_client_rereg_on_sweep = TRUE;
-			sm->p_polling_sm = NULL;
+			sm->polling_sm_guid = 0;
 			sm->p_subn->force_heavy_sweep = TRUE;
 			osm_sm_signal(sm, OSM_SIGNAL_SWEEP);
 			break;
diff --git a/opensm/osm_sminfo_rcv.c b/opensm/osm_sminfo_rcv.c
index 66ad410..9f62f9f 100644
--- a/opensm/osm_sminfo_rcv.c
+++ b/opensm/osm_sminfo_rcv.c
@@ -392,7 +392,7 @@ static void smi_rcv_process_get_sm(IN osm_sm_t * sm,
 		 * as it might not get it and we don't want to wait for a HANDOVER
 		 * forever.
 		 */
-		if (sm->p_polling_sm) {
+		if (sm->polling_sm_guid) {
 			if (smi_rcv_remote_sm_is_higher(sm, p_smi))
 				sm->p_subn->force_heavy_sweep = TRUE;
 			else
diff --git a/opensm/osm_state_mgr.c b/opensm/osm_state_mgr.c
index f9b20e2..c4f4978 100644
--- a/opensm/osm_state_mgr.c
+++ b/opensm/osm_state_mgr.c
@@ -1386,7 +1386,7 @@ repeat_discovery:
 			 * need to wait for that SM to relinquish control
 			 * of its portion of the subnet. C14-60.2.1.
 			 * Also - need to start polling on that SM. */
-			sm->p_polling_sm = p_remote_sm;
+			sm->polling_sm_guid = p_remote_sm->smi.guid;
 			osm_sm_state_mgr_process(sm,
 						 OSM_SM_SIGNAL_WAIT_FOR_HANDOVER);
 			return;
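
Note for readers: the idea behind the patch above (remember which SM is
being polled by GUID, with 0 meaning "none", rather than by a pointer to
the remote SM object) can be sketched in isolation roughly as follows.
This is only an illustration, not OpenSM code: every sketch_* name, the
fixed-size table and SKETCH_MAX_SMS are hypothetical stand-ins, and plain
uint64_t is used in place of ib_net64_t.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for illustration only; not the OpenSM types. */
#define SKETCH_MAX_SMS 4

typedef struct sketch_remote_sm {
	uint64_t guid;		/* 0 marks an unused slot */
} sketch_remote_sm_t;

typedef struct sketch_sm {
	sketch_remote_sm_t sm_tbl[SKETCH_MAX_SMS];	/* known remote SMs */
	uint64_t polling_sm_guid;	/* 0 means "not polling anyone" */
} sketch_sm_t;

/* Record a remote SM; returns 0 on success, -1 if the table is full. */
static int sketch_add_remote(sketch_sm_t *sm, uint64_t guid)
{
	assert(guid != 0);	/* 0 is reserved as the "none" sentinel */
	for (size_t i = 0; i < SKETCH_MAX_SMS; i++) {
		if (sm->sm_tbl[i].guid == 0) {
			sm->sm_tbl[i].guid = guid;
			return 0;
		}
	}
	return -1;
}

/* Drop a remote SM and clear the polling target if it was that SM. */
static void sketch_drop_remote(sketch_sm_t *sm, uint64_t guid)
{
	if (guid == 0)
		return;
	for (size_t i = 0; i < SKETCH_MAX_SMS; i++) {
		if (sm->sm_tbl[i].guid == guid) {
			if (sm->polling_sm_guid == guid)
				sm->polling_sm_guid = 0;
			sm->sm_tbl[i].guid = 0;
		}
	}
}

/* HANDOVER received: stop polling. Harmless if polling never started. */
static void sketch_handover(sketch_sm_t *sm)
{
	sm->polling_sm_guid = 0;
}

/* Returns the GUID to poll, or 0 when there is nothing to poll. */
static uint64_t sketch_poll_target(const sketch_sm_t *sm)
{
	return sm->polling_sm_guid;
}
```

With a value instead of a pointer, a HANDOVER that arrives before polling
has started, or after the remote SM record has been freed, leaves nothing
to dereference: the poller only ever compares the GUID against 0.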