From patchwork Tue Mar 26 02:27:42 2013
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Alex Elder
X-Patchwork-Id: 2334451
Return-Path:
X-Original-To: patchwork-ceph-devel@patchwork.kernel.org
Delivered-To: patchwork-process-083081@patchwork1.kernel.org
Received: from vger.kernel.org (vger.kernel.org [209.132.180.67])
	by patchwork1.kernel.org (Postfix) with ESMTP id 746F83FC54
	for ; Tue, 26 Mar 2013 02:27:46 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1759216Ab3CZC1p (ORCPT );
	Mon, 25 Mar 2013 22:27:45 -0400
Received: from mail-ia0-f173.google.com ([209.85.210.173]:51750 "EHLO
	mail-ia0-f173.google.com" rhost-flags-OK-OK-OK-OK)
	by vger.kernel.org with ESMTP id S1756064Ab3CZC1p (ORCPT );
	Mon, 25 Mar 2013 22:27:45 -0400
Received: by mail-ia0-f173.google.com with SMTP id h37so6081502iak.4
	for ; Mon, 25 Mar 2013 19:27:44 -0700 (PDT)
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
	d=google.com; s=20120113;
	h=x-received:message-id:date:from:user-agent:mime-version:to:subject
	 :references:in-reply-to:content-type:content-transfer-encoding
	 :x-gm-message-state;
	bh=3uP0V2IRCKuL9xr9PLlkj3ORIZKvVjCu/76HsHaKlJU=;
	b=OdG2hlpiVxy3ceVYMdB9nHFOhfpML1CmoBmwV3GRL+CN/Fa5KtZ/81ceaP8O02hmlB
	 o8eY9MEgM4y1rA6/PRqisEPzVOCWbJh3ny5IV2MlYo9MVNjbCa3z2bCfpMGCo2LiRtCM
	 JxkVNjzb1rW2oMoMEM6JLVYwEzfMxicUIv5HHojrq9nzF98RcpwornHdsbmwdacs2zJg
	 RuIMMfnjR0Wu3EFUQ+h59H7P2EX0ZGlNzmDw4FS5Vp5YJFGFO836pMpHIxcLVTtb0zD3
	 VQxVjwQDaPR+ujHlOu5Oxw9kOvjI8nfdydzkz+2XiU3hcgBwBu4xkaAchVlJaPzPFucU
	 yJ1w==
X-Received: by 10.50.153.232 with SMTP id vj8mr185828igb.2.1364264864518;
	Mon, 25 Mar 2013 19:27:44 -0700 (PDT)
Received: from [172.22.22.4] (c-71-195-31-37.hsd1.mn.comcast.net. [71.195.31.37])
	by mx.google.com with ESMTPS id s8sm764262igs.0.2013.03.25.19.27.43
	(version=TLSv1 cipher=ECDHE-RSA-RC4-SHA bits=128/128);
	Mon, 25 Mar 2013 19:27:43 -0700 (PDT)
Message-ID: <5151079E.3020108@inktank.com>
Date: Mon, 25 Mar 2013 21:27:42 -0500
From: Alex Elder
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:17.0) Gecko/20130308 Thunderbird/17.0.4
MIME-Version: 1.0
To: ceph-devel@vger.kernel.org
Subject: [PATCH 3/6] libceph: prepend requests in order when kicking
References: <5151071C.3000309@inktank.com>
In-Reply-To: <5151071C.3000309@inktank.com>
X-Gm-Message-State: ALoCoQl6B/J93uD+g95L24WTYHRJw8VZsoZErWudxUTEuX0Nn8ghUrrhRGqlv/ZPJBnoiZnruyxB
Sender: ceph-devel-owner@vger.kernel.org
Precedence: bulk
List-ID:
X-Mailing-List: ceph-devel@vger.kernel.org

The osd expects incoming requests from a given client to arrive in
order, with the tid for each request being greater than the tid for
requests that have already arrived.  This patch fixes one place
the osd client might not maintain that ordering.

For the osd client, the connection fault method is osd_reset().
That function calls __reset_osd() to close and re-open the
connection, then calls __kick_osd_requests() to cause all
outstanding requests for the affected osd to be re-sent after
the connection has been re-established.

When an osd is reset, both in-flight and unsent messages will need
to be re-sent.  An osd client maintains distinct lists for unsent
and in-flight messages.  Meanwhile, an osd maintains a single list
of all its requests (both sent and unsent).  (Each message is
linked into two lists--one for the osd client and one list for the
osd.)

To process an osd "kick" operation, the osd's request list is
traversed, and each request is moved off whichever osd *client* list
it was on (unsent or sent) and placed onto the osd client's unsent
list.  (It remains where it is on the osd's request list.)

When that is done, osd_reset() calls __send_queued() to cause each
of the osd client's unsent messages to be sent.

OK, with that background...

As the osd request list is traversed, each request is prepended to
the osd client's unsent list in the order it is seen.  The effect
of this is to reverse the order of these requests as they are put
(back) onto the unsent list.

Instead, traverse the osd request list in reverse, so their order is
preserved when prepending them to the unsent list.  We still want to
prepend these requests, because they will have lower tids than any
previously-sent request.

Just below that, traverse the linger list in forward order as
before, but add them to the *tail* of the list rather than the head.
These requests get re-registered, and in the process are given a new
(higher) tid, so they should go at the end.

This partially resolves:
    http://tracker.ceph.com/issues/4392

Signed-off-by: Alex Elder
---
 net/ceph/osd_client.c |   16 ++++++++++++++--
 1 file changed, 14 insertions(+), 2 deletions(-)

diff --git a/net/ceph/osd_client.c b/net/ceph/osd_client.c
index 3723a7f..707d632 100644
--- a/net/ceph/osd_client.c
+++ b/net/ceph/osd_client.c
@@ -577,7 +577,14 @@ static void __kick_osd_requests(struct ceph_osd_client *osdc,
 	if (err)
 		return;
 
-	list_for_each_entry(req, &osd->o_requests, r_osd_item) {
+	/*
+	 * Traverse the osd's list of requests in reverse, moving
+	 * each entry from whatever osd client list it's on (unsent
+	 * or in-flight/lru) to the front of the osd client's unsent
+	 * list.  When we're done all the osd's requests will all be
+	 * in the osd client unsent list in increasing order of tid.
+	 */
+	list_for_each_entry_reverse(req, &osd->o_requests, r_osd_item) {
 		list_move(&req->r_req_lru_item, &osdc->req_unsent);
 		dout("requeued %p tid %llu osd%d\n", req, req->r_tid,
 		     osd->o_osd);
@@ -585,6 +592,11 @@ static void __kick_osd_requests(struct ceph_osd_client *osdc,
 			req->r_flags |= CEPH_OSD_FLAG_RETRY;
 	}
 
+	/*
+	 * Linger requests are re-registered before sending, which
+	 * sets up a new tid for each.  We add them to the unsent
+	 * list at the end to keep things in tid order.
+	 */
 	list_for_each_entry_safe(req, nreq, &osd->o_linger_requests,
 				 r_linger_osd) {
 		/*
@@ -593,7 +605,7 @@ static void __kick_osd_requests(struct ceph_osd_client *osdc,
 		 */
 		BUG_ON(!list_empty(&req->r_req_lru_item));
 		__register_request(osdc, req);
-		list_add(&req->r_req_lru_item, &osdc->req_unsent);
+		list_add_tail(&req->r_req_lru_item, &osdc->req_unsent);
 		list_add(&req->r_osd_item, &req->r_osd->o_requests);
 		__unregister_linger_request(osdc, req);
 		dout("requeued lingering %p tid %llu osd%d\n", req, req->r_tid,
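
Illustration only, not part of the patch: the ordering argument in the
description can be checked with a small user-space sketch.  It assumes a
hand-rolled stand-in for the kernel's struct list_head, and the names
struct request, request_of(), kick_forward(), kick_reverse() and
collect_tids() are hypothetical, chosen for this example.  The osd's
request list is modelled as a plain array in tid order, and each request
is added to the unsent list once rather than moved with list_move().

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for the kernel's circular doubly-linked list (assumption). */
struct list_head {
	struct list_head *next, *prev;
};

static void list_init(struct list_head *head)
{
	head->next = head->prev = head;
}

static void insert_between(struct list_head *item,
			   struct list_head *prev, struct list_head *next)
{
	next->prev = item;
	item->next = next;
	item->prev = prev;
	prev->next = item;
}

/* Prepend: item becomes the first entry after head. */
static void list_add(struct list_head *item, struct list_head *head)
{
	insert_between(item, head, head->next);
}

/* Append: item becomes the last entry before head. */
static void list_add_tail(struct list_head *item, struct list_head *head)
{
	insert_between(item, head->prev, head);
}

/* Hypothetical, heavily reduced request: a tid and one list link. */
struct request {
	unsigned long long tid;
	struct list_head unsent_item;
};

static const struct request *request_of(const struct list_head *item)
{
	return (const struct request *)
		((const char *)item - offsetof(struct request, unsent_item));
}

/* Old behaviour: walk forward and prepend, which reverses the order. */
static void kick_forward(struct request *reqs, size_t n,
			 struct list_head *unsent)
{
	for (size_t i = 0; i < n; i++)
		list_add(&reqs[i].unsent_item, unsent);
}

/* Patched behaviour: walk in reverse and prepend, preserving the order. */
static void kick_reverse(struct request *reqs, size_t n,
			 struct list_head *unsent)
{
	for (size_t i = n; i-- > 0; )
		list_add(&reqs[i].unsent_item, unsent);
}

/* Copy the tids found on the unsent list, head to tail, into out[]. */
static size_t collect_tids(const struct list_head *unsent,
			   unsigned long long *out, size_t max)
{
	size_t n = 0;

	for (const struct list_head *p = unsent->next; p != unsent; p = p->next) {
		assert(n < max);	/* caller must size out[] adequately */
		out[n++] = request_of(p)->tid;
	}
	return n;
}
```

Given requests with tids 1, 2, 3 and an empty unsent list,
kick_forward() leaves the list as 3, 2, 1, while kick_reverse() leaves
it as 1, 2, 3.  Appending a request with tid 4 via list_add_tail()
after kick_reverse() gives 1, 2, 3, 4, which mirrors why the
re-registered linger requests go to the tail.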