From patchwork Tue Dec 3 13:25:06 2013 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Hal Rosenstock X-Patchwork-Id: 3276011 X-Patchwork-Delegate: hal@mellanox.com Return-Path: X-Original-To: patchwork-linux-rdma@patchwork.kernel.org Delivered-To: patchwork-parsemail@patchwork2.web.kernel.org Received: from mail.kernel.org (mail.kernel.org [198.145.19.201]) by patchwork2.web.kernel.org (Postfix) with ESMTP id 6E6A8C0D4A for ; Tue, 3 Dec 2013 13:25:22 +0000 (UTC) Received: from mail.kernel.org (localhost [127.0.0.1]) by mail.kernel.org (Postfix) with ESMTP id A6F262037E for ; Tue, 3 Dec 2013 13:25:17 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 6C3352037D for ; Tue, 3 Dec 2013 13:25:12 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753262Ab3LCNZL (ORCPT ); Tue, 3 Dec 2013 08:25:11 -0500 Received: from mail-ee0-f51.google.com ([74.125.83.51]:52414 "EHLO mail-ee0-f51.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752485Ab3LCNZJ (ORCPT ); Tue, 3 Dec 2013 08:25:09 -0500 Received: by mail-ee0-f51.google.com with SMTP id b15so1299587eek.24 for ; Tue, 03 Dec 2013 05:25:08 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:message-id:date:from:user-agent:mime-version:to :cc:subject:content-type:content-transfer-encoding; bh=zIBfo0ngtU/b7NGq1OAmnx8XZhisIpRPQzKNWsHOBAM=; b=QJuW72MzPYtC7ZrG8WFkBrHyLz9Jmnu/6k8z4udPzEqxBDR1PwkAv8+QolvEntxXqf Q+pMNUeVoh3Tmvv9qICxUee09e9kNS5PqZXyDcaN9GA33g1wtw04OBRcdMtKxSKeaGQy y67WFCulT+Q7mUujNefqHFMNbzAKBTuUvkbw9c0g4lDMoCzs3lm5IJd1abf52v/82GdB L3FFJ5yFAxXw+XJESA5pgsj8OPISbdb6HubyelTMluXHxH6dWQj7wL0d4SupJQaMZOMF /QVxA0na1/fKY2sZF1e63x8kEMxf6/rzs0NWYidkEQkuslHHeq8Kh6u/Gq/2ZccZobzQ gBsA== X-Gm-Message-State: ALoCoQl+tD6qeTkZvzt8QOTFqQKRRG1i/bid20PQxwaf1rEP9od1lTpQHaRjpceIjrtp47cgxzSf X-Received: by 10.15.91.3 with SMTP id r3mr70862236eez.18.1386077108412; Tue, 03 Dec 2013 05:25:08 -0800 (PST) Received: from [192.168.1.102] (c-98-229-118-119.hsd1.ma.comcast.net. [98.229.118.119]) by mx.google.com with ESMTPSA id 44sm82427711eek.5.2013.12.03.05.25.07 for (version=TLSv1 cipher=ECDHE-RSA-RC4-SHA bits=128/128); Tue, 03 Dec 2013 05:25:08 -0800 (PST) Message-ID: <529DDBB2.80309@dev.mellanox.co.il> Date: Tue, 03 Dec 2013 08:25:06 -0500 From: Hal Rosenstock User-Agent: Mozilla/5.0 (Windows NT 6.1; rv:9.0) Gecko/20111222 Thunderbird/9.0.1 MIME-Version: 1.0 To: "linux-rdma (linux-rdma@vger.kernel.org)" CC: Line Holen , Daniel Klein Subject: [PATCH opensm] osm_sm_state_mgr.c: Fix race condition during sm_state_mgr_send_master_sm_info_req Sender: linux-rdma-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-rdma@vger.kernel.org X-Spam-Status: No, score=-6.9 required=5.0 tests=BAYES_00, RCVD_IN_DNSWL_HI, RP_MATCHES_RCVD, UNPARSEABLE_RELAY autolearn=ham version=3.3.1 X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on mail.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP From: Daniel Klein When OpenSM changes SM state from standby to master, it sets m->p_polling_sm to NULL. When this occurs while calling sm_state_mgr_send_master_sm_info_req, it can result in a segmentation fault. Signed-off-by: Daniel Klein Signed-off-by: Hal Rosenstock Acked-by: Line Holen --- -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html diff --git a/opensm/osm_sm_state_mgr.c b/opensm/osm_sm_state_mgr.c index 596ad8f..770c5f9 100644 --- a/opensm/osm_sm_state_mgr.c +++ b/opensm/osm_sm_state_mgr.c @@ -74,7 +74,7 @@ void osm_report_sm_state(osm_sm_t * sm) OSM_LOG_MSG_BOX(sm->p_log, OSM_LOG_VERBOSE, buf); } -static void sm_state_mgr_send_master_sm_info_req(osm_sm_t * sm) +static void sm_state_mgr_send_master_sm_info_req(osm_sm_t * sm, uint8_t sm_state) { osm_madw_context_t context; const osm_port_t *p_port; @@ -85,8 +85,7 @@ static void sm_state_mgr_send_master_sm_info_req(osm_sm_t * sm) OSM_LOG_ENTER(sm->p_log); memset(&context, 0, sizeof(context)); - CL_PLOCK_ACQUIRE(sm->p_lock); - if (sm->p_subn->sm_state == IB_SMINFO_STATE_STANDBY) { + if (sm_state == IB_SMINFO_STATE_STANDBY) { /* * We are in STANDBY state - this means we need to poll the * master SM (according to master_guid). @@ -104,13 +103,19 @@ static void sm_state_mgr_send_master_sm_info_req(osm_sm_t * sm) guid = sm->p_polling_sm->smi.guid; } + /* Verify that SM is not polling itself */ + if (guid == sm->p_subn->sm_port_guid) { + OSM_LOG(sm->p_log, OSM_LOG_VERBOSE, + "OpenSM doesn't poll itself\n"); + goto Exit; + } + p_port = osm_get_port_by_guid(sm->p_subn, guid); if (p_port == NULL) { OSM_LOG(sm->p_log, OSM_LOG_ERROR, "ERR 3203: " "No port object for GUID 0x%016" PRIx64 "\n", cl_ntoh64(guid)); - CL_PLOCK_RELEASE(sm->p_lock); goto Exit; } @@ -122,7 +127,6 @@ static void sm_state_mgr_send_master_sm_info_req(osm_sm_t * sm) IB_MAD_ATTR_SM_INFO, 0, FALSE, ib_port_info_get_m_key(&p_port->p_physp->port_info), CL_DISP_MSGID_NONE, &context); - CL_PLOCK_RELEASE(sm->p_lock); if (status != IB_SUCCESS) OSM_LOG(sm->p_log, OSM_LOG_ERROR, "ERR 3204: " @@ -135,7 +139,7 @@ Exit: static void sm_state_mgr_start_polling(osm_sm_t * sm) { - uint32_t timeout = sm->p_subn->opt.sminfo_polling_timeout; + uint32_t timeout; cl_status_t cl_status; OSM_LOG_ENTER(sm->p_log); @@ -148,7 +152,10 @@ static void sm_state_mgr_start_polling(osm_sm_t * sm) /* * Send a SubnGet(SMInfo) query to the current (or new) master found. */ - sm_state_mgr_send_master_sm_info_req(sm); + CL_PLOCK_ACQUIRE(sm->p_lock); + timeout = sm->p_subn->opt.sminfo_polling_timeout; + sm_state_mgr_send_master_sm_info_req(sm, sm->p_subn->sm_state); + CL_PLOCK_RELEASE(sm->p_lock); /* * Start a timer that will wake up every sminfo_polling_timeout milliseconds. @@ -166,21 +173,31 @@ static void sm_state_mgr_start_polling(osm_sm_t * sm) void osm_sm_state_mgr_polling_callback(IN void *context) { osm_sm_t *sm = context; - uint32_t timeout = sm->p_subn->opt.sminfo_polling_timeout; + uint32_t timeout; cl_status_t cl_status; + uint8_t sm_state; OSM_LOG_ENTER(sm->p_log); + cl_spinlock_acquire(&sm->state_lock); + sm_state = sm->p_subn->sm_state; + cl_spinlock_release(&sm->state_lock); + + CL_PLOCK_ACQUIRE(sm->p_lock); + timeout = sm->p_subn->opt.sminfo_polling_timeout; + /* * We can be here in one of two cases: * 1. We are a STANDBY sm polling on the master SM. * 2. We are a MASTER sm, waiting for a handover from a remote master sm. * If we are not in one of these cases - don't need to restart the poller. */ - if (!((sm->p_subn->sm_state == IB_SMINFO_STATE_MASTER && + if (!((sm_state == IB_SMINFO_STATE_MASTER && sm->p_polling_sm != NULL) || - sm->p_subn->sm_state == IB_SMINFO_STATE_STANDBY)) + sm_state == IB_SMINFO_STATE_STANDBY)) { + CL_PLOCK_RELEASE(sm->p_lock); goto Exit; + } /* * If we are a STANDBY sm and the osm_exit_flag is set, then let's @@ -189,7 +206,8 @@ void osm_sm_state_mgr_polling_callback(IN void *context) * received. In other cases - it is not relevant whether or not the * signal is on - since we are currently in exit flow */ - if (sm->p_subn->sm_state == IB_SMINFO_STATE_STANDBY && osm_exit_flag) { + if (sm_state == IB_SMINFO_STATE_STANDBY && osm_exit_flag) { + CL_PLOCK_RELEASE(sm->p_lock); OSM_LOG(sm->p_log, OSM_LOG_VERBOSE, "Signalling subnet_up_event\n"); cl_event_signal(&sm->subnet_up_event); @@ -207,6 +225,7 @@ void osm_sm_state_mgr_polling_callback(IN void *context) sm->retry_number); if (sm->retry_number >= sm->p_subn->opt.polling_retry_number) { + CL_PLOCK_RELEASE(sm->p_lock); OSM_LOG(sm->p_log, OSM_LOG_DEBUG, "Reached polling_retry_number value in retry_number. " "Go to DISCOVERY state\n"); @@ -215,7 +234,9 @@ void osm_sm_state_mgr_polling_callback(IN void *context) } /* Send a SubnGet(SMInfo) request to the remote sm (depends on our state) */ - sm_state_mgr_send_master_sm_info_req(sm); + sm_state_mgr_send_master_sm_info_req(sm, sm_state); + + CL_PLOCK_RELEASE(sm->p_lock); /* restart the timer */ cl_status = cl_timer_start(&sm->polling_timer, timeout);