From patchwork Fri Aug 23 00:35:22 2013
X-Patchwork-Submitter: "Hefty, Sean"
X-Patchwork-Id: 2848480
From: "Hefty, Sean"
To: Matthew Anderson
Cc: Andreas Bluemle, ceph-devel@vger.kernel.org, linux-rdma@vger.kernel.org
Subject: RE: [ceph-users] Help needed porting Ceph to RSockets
Date: Fri, 23 Aug 2013 00:35:22 +0000
Message-ID: <1828884A29C6694DAF28B7E6B8A8237388CA7C88@ORSMSX109.amr.corp.intel.com>

> I tested out the patch and unfortunately had the same results as
> Andreas.  About 50% of the time, the rpoll() thread in Ceph still hangs
> when rshutdown() is called.  I saw similar behaviour when increasing
> the poll time on the pre-patched version, if that's of any relevance.

I'm not optimistic, but here's an updated patch.  I attempted to handle
more shutdown conditions, though I can't say whether any of them would
prevent the hang that you see.

I have a couple of questions.  Is there any chance that the code would
call rclose() while rpoll() is still running?  Also, can you verify that
the thread is in the real poll() call when the hang occurs?
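To make the second question concrete, here is a minimal sketch of the
scenario in question, assuming librdmacm's <rdma/rsocket.h> and an
already connected rsocket.  Connection setup, error handling, and the
Ceph-side wiring are elided, and the helper names are illustrative:

#include <poll.h>
#include <pthread.h>
#include <rdma/rsocket.h>

static void *poller(void *arg)
{
	int fd = *(int *) arg;
	struct pollfd fds = { .fd = fd, .events = POLLIN };

	/* Blocks until the rsocket is readable or disconnects; with the
	 * reported hang, this call never returns after rshutdown(). */
	rpoll(&fds, 1, -1);
	return NULL;
}

static void shutdown_while_polling(int fd)
{
	pthread_t t;

	pthread_create(&t, NULL, poller, &fd);
	rshutdown(fd, SHUT_RDWR);	/* should wake the blocked poller */
	pthread_join(t, NULL);		/* hangs here if rpoll() never returns */
	rclose(fd);			/* rclose() racing with rpoll() is the other suspect */
}

If the join hangs and gdb shows the poller thread sitting in the real
poll(), that would confirm the second case above.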
Signed-off-by: Sean Hefty
---
 src/rsocket.c |   35 +++++++++++++++++++++++++----------
 1 files changed, 25 insertions(+), 10 deletions(-)

diff --git a/src/rsocket.c b/src/rsocket.c
index d544dd0..f94ddf3 100644
--- a/src/rsocket.c
+++ b/src/rsocket.c
@@ -1822,7 +1822,12 @@ static int rs_poll_cq(struct rsocket *rs)
 				rs->state = rs_disconnected;
 				return 0;
 			} else if (rs_msg_data(msg) == RS_CTRL_SHUTDOWN) {
-				rs->state &= ~rs_readable;
+				if (rs->state & rs_writable) {
+					rs->state &= ~rs_readable;
+				} else {
+					rs->state = rs_disconnected;
+					return 0;
+				}
 			}
 			break;
 		case RS_OP_WRITE:
@@ -2948,10 +2953,12 @@ static int rs_poll_events(struct pollfd *rfds, struct pollfd *fds, nfds_t nfds)
 
 		rs = idm_lookup(&idm, fds[i].fd);
 		if (rs) {
+			fastlock_acquire(&rs->cq_wait_lock);
 			if (rs->type == SOCK_STREAM)
 				rs_get_cq_event(rs);
 			else
 				ds_get_cq_event(rs);
+			fastlock_release(&rs->cq_wait_lock);
 			fds[i].revents = rs_poll_rs(rs, fds[i].events, 1, rs_poll_all);
 		} else {
 			fds[i].revents = rfds[i].revents;
@@ -3098,7 +3105,8 @@ int rselect(int nfds, fd_set *readfds, fd_set *writefds,
 
 /*
  * For graceful disconnect, notify the remote side that we're
- * disconnecting and wait until all outstanding sends complete.
+ * disconnecting and wait until all outstanding sends complete, provided
+ * that the remote side has not sent a disconnect message.
  */
 int rshutdown(int socket, int how)
 {
@@ -3106,11 +3114,6 @@ int rshutdown(int socket, int how)
 	int ctrl, ret = 0;
 
 	rs = idm_at(&idm, socket);
-	if (how == SHUT_RD) {
-		rs->state &= ~rs_readable;
-		return 0;
-	}
-
 	if (rs->fd_flags & O_NONBLOCK)
 		rs_set_nonblocking(rs, 0);
 
@@ -3118,15 +3121,20 @@ int rshutdown(int socket, int how)
 	if (how == SHUT_RDWR) {
 		ctrl = RS_CTRL_DISCONNECT;
 		rs->state &= ~(rs_readable | rs_writable);
-	} else {
+	} else if (how == SHUT_WR) {
 		rs->state &= ~rs_writable;
 		ctrl = (rs->state & rs_readable) ?
 		       RS_CTRL_SHUTDOWN : RS_CTRL_DISCONNECT;
+	} else {
+		rs->state &= ~rs_readable;
+		if (rs->state & rs_writable)
+			goto out;
+		ctrl = RS_CTRL_DISCONNECT;
 	}
 	if (!rs->ctrl_avail) {
 		ret = rs_process_cq(rs, 0, rs_conn_can_send_ctrl);
 		if (ret)
-			return ret;
+			goto out;
 	}
 
 	if ((rs->state & rs_connected) && rs->ctrl_avail) {
@@ -3138,10 +3146,17 @@ int rshutdown(int socket, int how)
 	if (rs->state & rs_connected)
 		rs_process_cq(rs, 0, rs_conn_all_sends_done);
 
+out:
 	if ((rs->fd_flags & O_NONBLOCK) && (rs->state & rs_connected))
 		rs_set_nonblocking(rs, rs->fd_flags);
 
-	return 0;
+	if (rs->state & rs_disconnected) {
+		/* Generate event by flushing receives to unblock rpoll */
+		ibv_req_notify_cq(rs->cm_id->recv_cq, 0);
+		rdma_disconnect(rs->cm_id);
+	}
+
+	return ret;
 }
 
 static void ds_shutdown(struct rsocket *rs)
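For background on the last hunk: the unblock relies on the standard verbs
completion-channel pattern.  rdma_disconnect() transitions the QP to the
error state, which flushes any posted receives; the flush completions,
together with the notification requested via ibv_req_notify_cq(), make
the completion channel's fd readable, which is what a thread sleeping in
the real poll() underneath rpoll() is waiting on.  A rough sketch of that
consumer-side pattern, where wait_for_cq_event and chan are illustrative
placeholders rather than names from rsocket.c:

#include <poll.h>
#include <infiniband/verbs.h>

static int wait_for_cq_event(struct ibv_comp_channel *chan)
{
	struct ibv_cq *cq;
	void *ctx;
	struct pollfd fds = { .fd = chan->fd, .events = POLLIN };

	/* Sleeps until a completion event arrives; flushed receive
	 * completions generated by rdma_disconnect() count as events. */
	if (poll(&fds, 1, -1) <= 0)
		return -1;
	if (ibv_get_cq_event(chan, &cq, &ctx))
		return -1;
	ibv_ack_cq_events(cq, 1);
	return ibv_req_notify_cq(cq, 0);	/* re-arm for the next event */
}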