From patchwork Tue May 17 22:32:19 2011
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Jim Schutt <jaschut@sandia.gov>
X-Patchwork-Id: 792572
Received: from vger.kernel.org (vger.kernel.org [209.132.180.67])
	by demeter1.kernel.org (8.14.4/8.14.3) with ESMTP id p4HMWs8i030875
	for <patchwork-ceph-devel@patchwork.kernel.org>;
	Tue, 17 May 2011 22:32:56 GMT
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1756706Ab1EQWcx (ORCPT
	<rfc822;patchwork-ceph-devel@patchwork.kernel.org>);
	Tue, 17 May 2011 18:32:53 -0400
Received: from sentry-two.sandia.gov ([132.175.109.14]:33132 "EHLO
	sentry-two.sandia.gov" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1756695Ab1EQWcw (ORCPT
	<rfc822; ceph-devel@vger.kernel.org>); Tue, 17 May 2011 18:32:52 -0400
X-WSS-ID: 0LLD2MQ-0B-1VH-02
X-M-MSG: 
Received: from interceptor1.sandia.gov (interceptor1.sandia.gov
	[132.175.109.5])
	(using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits))
	(No client certificate requested)
	by sentry-two.sandia.gov (Postfix) with ESMTP id 14C8F5686BD;
	Tue, 17 May 2011 16:32:50 -0600 (MDT)
Received: from sentry.sandia.gov (sentry.sandia.gov [132.175.109.21]) by
	interceptor1.sandia.gov (RSA Interceptor);
	Tue, 17 May 2011 16:25:11 -0600
Received: from [132.175.109.4] by sentry.sandia.gov with ESMTP (SMTP
	Relay 01 (Email Firewall v6.3.2)); Tue, 17 May 2011 16:32:26 -0600
X-Server-Uuid: 6BFC7783-7E22-49B4-B610-66D6BE496C0E
Received: from mail.sandia.gov (exch03.sandia.gov [134.253.103.3] (may
	be forged)) by mailgate2.sandia.gov (8.14.4/8.14.4) with ESMTP id
	p4HMWP3O013681; Tue, 17 May 2011 16:32:25 -0600
Received: from [134.253.4.20] (134.253.95.179) by mail.sandia.gov (
	134.253.103.3) with Microsoft SMTP Server (TLS) id 14.1.270.1; Tue, 17
	May 2011 16:32:25 -0600
Message-ID: <4DD2F773.30508@sandia.gov>
Date: Tue, 17 May 2011 16:32:19 -0600
From: "Jim Schutt" <jaschut@sandia.gov>
User-Agent: Thunderbird 2.0.0.24 (X11/20110128)
MIME-Version: 1.0
To: "Sage Weil" <sage@newdream.net>
cc: ceph-devel@vger.kernel.org
Subject: Re: [PATCH 2/2] libceph: fix handle_timeout() racing with
	con_work()/try_write()
References: <Pine.LNX.4.64.1105051215050.26811@cobra.newdream.net>
	<1305235954-9860-2-git-send-email-jaschut@sandia.gov>
	<Pine.LNX.4.64.1105161033150.12367@cobra.newdream.net>
In-Reply-To: <Pine.LNX.4.64.1105161033150.12367@cobra.newdream.net>
X-Originating-IP: [134.253.95.179]
X-PMX-Version: 5.6.0.2009776, Antispam-Engine: 2.7.2.376379,
	Antispam-Data: 2011.5.17.222116
X-PMX-Spam: Gauge=IIIIIIII, Probability=8%, Report=' BODY_SIZE_3000_3999
	0, BODY_SIZE_5000_LESS 0, BODY_SIZE_7000_LESS 0, DATE_TZ_NA 0,
	NO_URI_FOUND 0, WEBMAIL_SOURCE 0, WEBMAIL_XOIP 0, WEBMAIL_X_IP_HDR 0,
	__BOUNCE_CHALLENGE_SUBJ 0, __BOUNCE_NDR_SUBJ_EXEMPT 0, __CT 0, __CTE 0,
	__CT_TEXT_PLAIN 0, __HAS_MSGID 0, __HAS_XOIP 0, __MIME_TEXT_ONLY 0,
	__MIME_VERSION 0, __MOZILLA_MSGID 0, __RATWARE_X_MAILER_CS_B 0,
	__SANE_MSGID 0, __TO_MALFORMED_2 0, __USER_AGENT 0'
X-TMWD-Spam-Summary: TS=20110517223229; ID=1; SEV=2.3.1;
	DFV=B2011051723; IFV=NA; AIF=B2011051723; RPD=5.03.0010; ENG=NA;
	RPDID=7374723D303030312E30413031303230342E34444432463737442E303033343A534346535441543838363133332C73733D312C6667733D30;
	CAT=NONE; CON=NONE; SIG=AAABAJsKIgAAAAAAAAAAAAAAAAAAAH0=
X-MMS-Spam-Filter-ID: B2011051723_5.03.0010
X-WSS-ID: 61CC28F02K44030446-01-01
X-RSA-Inspected: yes
X-RSA-Classifications: public
X-RSA-Action: allow
Sender: ceph-devel-owner@vger.kernel.org
Precedence: bulk
List-ID: <ceph-devel.vger.kernel.org>
X-Mailing-List: ceph-devel@vger.kernel.org
X-Greylist: IP, sender and recipient auto-whitelisted, not delayed by
	milter-greylist-4.2.6 (demeter1.kernel.org [140.211.167.41]);
	Tue, 17 May 2011 22:32:58 +0000 (UTC)

Hi Sage,

Sage Weil wrote:
> Hi Jim,

> This picks up the second resent message.  But and OPENING is set.. so how 
> is it that we were in try_write but we didn't pass through the OPENING 
> test in con_work?
> 
>>>> con_work()/try_write() start again on new message.
>>>> Note that con_work() did not notice that the connection was marked to
>>>> be closed/reopened!  This is where things went bad.
>>>> Note the connection state: 8220 = 0x201c, and 0x2000 is OPENING.
>> [ 2412.643178] libceph:   messenger.c:1969 : con_work ffff8801955c0030 OPENING
>> [ 2412.643182] libceph:   messenger.c:292  : con_close_socket on ffff8801955c0030 sock ffff8801825a8c00
>> [ 2412.643197] libceph:   messenger.c:162  : ceph_state_change ffff8801955c0030 state = 2076 sk_state = 4
> 
> Ok, now we see it, but this should have executed before we ever got to 
> try_write!
> 
> There are a handful of places that drop and re-take con->mutex... I think 
> it has to be one of those.  They are:
> 
> - prepare_connect_authorizer (before calling ->prepare_authorizer 
> callback)
> - fail_protocol (before calling ->bad_proto callback)
> - process_connect RESETSESSION case (before calling ->reset callback)
> - process_message (before calling ->dispatch callback)
> 
> I think after retaking the lock in each of these cases we need to test for 
> the CLOSED or OPENING bits and, if set, bail out so that con_work can 
> restart.
> 
> Does that make sense?  Can you confirm which case of lock droppage inside 
> con_work is at fault?  (Maybe adding a printk at the top of con_work after 
> taking the lock too would make it more clear what's going on; currently it 
> only prints in all the random test_bit cases).

So I've got logs now that show that connection state changes
across a call to process_message.  That seems like it proves
the concept.

> 
> I'm thinking something like the below... does that seem reasonable?  I 
> only did the prepare_authorizer case to start, but the others should be 
> fixed up too.  Compile tested only.
> 

Is there any reason not to fix all the cases at once, in try_read?
It's less code, but far away from where the mutex dropping occurs,
so we add a comment:


What am I missing?

-- Jim
---
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

diff --git a/net/ceph/messenger.c b/net/ceph/messenger.c
index a9bd424..11d6a61 100644
--- a/net/ceph/messenger.c
+++ b/net/ceph/messenger.c
@@ -1816,10 +1820,19 @@ static int try_read(struct ceph_connection *con)
  	dout("try_read start on %p state %lu\n", con, con->state);

  more:
  	dout("try_read %p state %lu tag %d in_base_pos %d\n", con,
  	     con->state, (int)con->in_tag, con->in_base_pos);
+
+	/* process_connect, process_message need to drop con->mutex,
+	 * so connection state might have changed on us.  Check it,
+	 * leave if necessary so new state can be processed.
+	 */
+	if (test_bit(CLOSED, &con->state) ||
+	    test_bit(OPENING, &con->state))
+		goto out;
+
  	if (test_bit(CONNECTING, &con->state)) {
  		if (!test_bit(NEGOTIATING, &con->state)) {
  			dout("try_read %p connecting\n", con);
  			ret = read_partial_banner(con);
  			if (ret <= 0)