From patchwork Wed Jun 22 21:22:58 2011
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Sage Weil
X-Patchwork-Id: 907322
Date: Wed, 22 Jun 2011 14:22:58 -0700 (PDT)
From: Sage Weil
To: Jim Schutt
cc: ceph-devel@vger.kernel.org
Subject: Re: [PATCH 0/3] RFC: Enable clients to distinguish busy
 and unreachable OSDs
References: <1308767187-10376-1-git-send-email-jaschut@sandia.gov>
X-Mailing-List: ceph-devel@vger.kernel.org

Hey Jim-

I wonder if the below is sufficient, actually.  This avoids any change on
the server side, and just changes the client to start the per-message
timeout "clock" when the message is actually received by the server...

This way we still time out if the request gets stuck in cosd's request
queues somewhere, or if the disk blocks up, or something.  Any requests
that didn't get received don't time out, though.

What do you think?
sage


From e129e4f3f500f4e77cd1a7c64ff64edc54a9a9ea Mon Sep 17 00:00:00 2001
From: Sage Weil
Date: Wed, 22 Jun 2011 13:43:06 -0700
Subject: [PATCH] libceph: don't time out osd requests that haven't been received

Keep track of when an outgoing message is ACKed (i.e., the server fully
received it and, presumably, queued it for processing).  Time out OSD
requests only if it's been too long since they've been received.  This
prevents timeouts and connection thrashing when the OSDs are simply busy
and are throttling the requests they read off the network.

Signed-off-by: Sage Weil
---
 include/linux/ceph/messenger.h |    1 +
 net/ceph/messenger.c           |   12 +++++-------
 net/ceph/osd_client.c          |    6 ++++++
 3 files changed, 12 insertions(+), 7 deletions(-)

diff --git a/include/linux/ceph/messenger.h b/include/linux/ceph/messenger.h
index 31d91a6..d7adf15 100644
--- a/include/linux/ceph/messenger.h
+++ b/include/linux/ceph/messenger.h
@@ -94,6 +94,7 @@ struct ceph_msg {
 	bool more_to_follow;
 	bool needs_out_seq;
 	int front_max;
+	unsigned long ack_stamp;        /* tx: when we were acked */
 
 	struct ceph_msgpool *pool;
 };
diff --git a/net/ceph/messenger.c b/net/ceph/messenger.c
index 78b55f4..c340e2e 100644
--- a/net/ceph/messenger.c
+++ b/net/ceph/messenger.c
@@ -486,13 +486,10 @@ static void prepare_write_message(struct ceph_connection *con)
 	m = list_first_entry(&con->out_queue,
 		       struct ceph_msg, list_head);
 	con->out_msg = m;
-	if (test_bit(LOSSYTX, &con->state)) {
-		list_del_init(&m->list_head);
-	} else {
-		/* put message on sent list */
-		ceph_msg_get(m);
-		list_move_tail(&m->list_head, &con->out_sent);
-	}
+
+	/* put message on sent list */
+	ceph_msg_get(m);
+	list_move_tail(&m->list_head, &con->out_sent);
 
 	/*
 	 * only assign outgoing seq # if we haven't sent this message
@@ -1399,6 +1396,7 @@ static void process_ack(struct ceph_connection *con)
 			break;
 		dout("got ack for seq %llu type %d at %p\n", seq,
 		     le16_to_cpu(m->hdr.type), m);
+		m->ack_stamp = jiffies;
 		ceph_msg_remove(m);
 	}
 	prepare_read_tag(con);
diff --git a/net/ceph/osd_client.c b/net/ceph/osd_client.c
index 7330c27..ce310ee 100644
--- a/net/ceph/osd_client.c
+++ b/net/ceph/osd_client.c
@@ -1085,9 +1085,15 @@ static void handle_timeout(struct work_struct *work)
 		req = list_entry(osdc->req_lru.next, struct ceph_osd_request,
 				 r_req_lru_item);
 
+		/* hasn't been long enough since we sent it? */
 		if (time_before(jiffies, req->r_stamp + timeout))
 			break;
 
+		/* hasn't been long enough since it was acked? */
+		if (req->r_request->ack_stamp == 0 ||
+		    time_before(jiffies, req->r_request->ack_stamp + timeout))
+			break;
+
 		BUG_ON(req == last_req && req->r_stamp == last_stamp);
 		last_req = req;
 		last_stamp = req->r_stamp;
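
For illustration only, here is a minimal userspace sketch of the check that
the handle_timeout() hunk adds. It is not kernel code: sketch_req,
sketch_timed_out() and tick_before() are hypothetical names, tick_before()
only mimics the kernel's time_before() macro, and plain unsigned long tick
counts stand in for jiffies.

```c
#include <stdbool.h>

/*
 * Wraparound-tolerant "a is earlier than b", modeled on the kernel's
 * time_before(). Hypothetical helper, not the kernel macro itself.
 */
static bool tick_before(unsigned long a, unsigned long b)
{
	return (long)(a - b) < 0;
}

/* Hypothetical stand-in for the two timestamps the patch consults. */
struct sketch_req {
	unsigned long sent_stamp;	/* when the request was queued for sending */
	unsigned long ack_stamp;	/* when the server acked it; 0 = not acked yet */
};

/*
 * Report a timeout only if the request has been acked and at least
 * 'timeout' ticks have passed since both the send and the ack.
 */
static bool sketch_timed_out(const struct sketch_req *req,
			     unsigned long now, unsigned long timeout)
{
	/* hasn't been long enough since we sent it? */
	if (tick_before(now, req->sent_stamp + timeout))
		return false;

	/* never acked, or hasn't been long enough since it was acked? */
	if (req->ack_stamp == 0 ||
	    tick_before(now, req->ack_stamp + timeout))
		return false;

	return true;
}
```

With a timeout of 60 ticks, a request sent at tick 100 and never acked is
not reported as timed out no matter how late it gets, while one acked at
tick 150 is reported from tick 210 onward.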