From patchwork Wed Mar 21 11:09:55 2018
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Rishabh Dave
X-Patchwork-Id: 10299147
From: Rishabh Dave
Date: Wed, 21 Mar 2018 16:39:55 +0530
Subject: bug #10915 client: hangs on umount if it had an MDS session evicted
To: ceph-devel@vger.kernel.org
X-Mailing-List: ceph-devel@vger.kernel.org

Hi,

I am trying to fix bug #10915 (http://tracker.ceph.com/issues/10915).
Patrick helped me get started with it. I was able to reproduce it
locally on a vstart cluster, and I am currently trying to fix it by
getting the client unmounted on eviction. Once that works, I will (as
Patrick suggested) add a new option like "client_unmount_on_blacklist"
and modify my code accordingly.

The information about the blacklist seems to be available to the client
code [1] but, as far as I can see, that line (i.e. the if-block
containing it) never gets executed. The MDS blacklists the client and
evicts the session, but the client fails to notice. I suppose this
mismatch is what causes the hang.

The client fails to notice because it never actually looks at the
blacklist after the session is evicted: handle_osd_map() never gets
called after MDSRank::evict_session() is called. I wrote a patch that
makes the client check for its own address in the blacklist by calling
a (new) function from ms_handle_reset(), but it did not help.
It looks like not only does the client not check the blacklist, but
even if it did, it would find an outdated version. To verify this, I
wrote some debug code that iterates over the blacklist and prints it
towards the end of, and after, MDSRank::evict_session(). The blacklist
turned out to be empty in both locations. Shouldn't the blacklist be
updated at least in, or right after, MDSRank::evict_session()? I think
that before fixing the client, I need some sort of fix somewhere
here [2].

Also, how can I get a stack trace for commands like
"bin/ceph tell mds.a client evict id=xxxx"?

I have attached the patch containing the modifications I have used so
far.

Thanks,
Rishabh

[1] https://github.com/ceph/ceph/blob/master/src/client/Client.cc#L2420
[2] https://github.com/ceph/ceph/blob/master/src/mds/MDSRank.cc#L2737

diff --git a/src/client/Client.cc b/src/client/Client.cc
index 6c464d5a36..4e1f0b442c 100644
--- a/src/client/Client.cc
+++ b/src/client/Client.cc
@@ -2489,7 +2489,6 @@ void Client::handle_osd_map(MOSDMap *m)
 
 // ------------------------
 // incoming messages
-
 bool Client::ms_dispatch(Message *m)
 {
   Mutex::Locker l(client_lock);
@@ -13489,9 +13488,22 @@ void Client::ms_handle_connect(Connection *con)
   ldout(cct, 10) << __func__ << " on " << con->get_peer_addr() << dendl;
 }
 
+void Client::unmount_if_blacklisted()
+{
+  std::set<entity_addr_t> new_blacklists;
+  objecter->consume_blacklist_events(&new_blacklists);
+
+  const auto myaddr = messenger->get_myaddr();
+  if (new_blacklists.count(myaddr)) {
+    cout << "UNMOUNTING!!!" << std::endl;
+    this->unmount();
+  }
+}
+
 bool Client::ms_handle_reset(Connection *con)
 {
   ldout(cct, 0) << __func__ << " on " << con->get_peer_addr() << dendl;
+  this->unmount_if_blacklisted();
   return false;
 }
 
diff --git a/src/client/Client.h b/src/client/Client.h
index ae5b188538..fd7b1f50da 100644
--- a/src/client/Client.h
+++ b/src/client/Client.h
@@ -558,6 +558,7 @@ protected:
 
   // friends
   friend class SyntheticClient;
+  void unmount_if_blacklisted();
   bool ms_dispatch(Message *m) override;
 
   void ms_handle_connect(Connection *con) override;
diff --git a/src/mds/MDSRank.cc b/src/mds/MDSRank.cc
index d36d680d57..3ec16791e1 100644
--- a/src/mds/MDSRank.cc
+++ b/src/mds/MDSRank.cc
@@ -2717,6 +2717,9 @@ bool MDSRank::evict_client(int64_t session_id,
     bool wait, bool blacklist, std::stringstream& err_ss,
     Context *on_killed)
 {
+  FILE *fq = fopen("time", "a+");
+  fprintf(fq, "mds: MDSRank::evict_client()\n");
+
   assert(mds_lock.is_locked_by_me());
 
   // Mutually exclusive args
@@ -2823,6 +2826,17 @@ bool MDSRank::evict_client(int64_t session_id,
     }
   }
 
+  fprintf(fq, "will print the blacklist -\n");
+  std::set<entity_addr_t> blacklist2;
+  objecter->consume_blacklist_events(&blacklist2);
+  int j = 0;
+  for (std::set<entity_addr_t>::iterator i = blacklist2.begin();
+       i != blacklist2.end(); ++i, ++j) {
+    stringstream ss; ss << *i;
+    fprintf(fq, "blacklist[%d] = %s", j, ss.str().c_str());
+  }
+  fprintf(fq, "blacklist ends\n");
+  fclose(fq);
   return true;
 }
 
@@ -2900,6 +2914,20 @@ bool MDSRankDispatcher::handle_command(
     evict_clients(filter, m);
 
     *need_reply = false;
+    FILE *fq = fopen("time", "a+");
+    fprintf(fq, "mds: MDSRank::ms_dispatch\n");
+    fprintf(fq, "mds: will print the blacklist -\n");
+    std::set<entity_addr_t> blacklist2;
+    objecter->consume_blacklist_events(&blacklist2);
+    int j = 0;
+    for (std::set<entity_addr_t>::iterator i = blacklist2.begin();
+         i != blacklist2.end(); ++i, ++j) {
+      stringstream ss; ss << *i;
+      fprintf(fq, "blacklist[%d] = %s", j, ss.str().c_str());
+    }
+    fprintf(fq, "mds: blacklist ends\n");
+    fclose(fq);
+
     return true;
   } else if (prefix == "damage ls") {
     Formatter *f = new JSONFormatter(true);
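
A note on the debug code in the patch above: it reads the blacklist through objecter->consume_blacklist_events(), the same call the client-side check uses. The sketch below is a toy model of why that could matter, not Ceph code. It assumes two things that should be verified against src/osdc/Objecter.cc: that events are only queued when a new OSD map is processed, and that consuming them drains the queue. FakeObjecter, Addr, on_new_osd_map() and is_blacklisted() are hypothetical stand-ins invented for this illustration.

```cpp
#include <cassert>
#include <set>
#include <string>

// Hypothetical stand-in for entity_addr_t; a plain string is enough here.
using Addr = std::string;

// Hypothetical model of an objecter that queues blacklist events.
// Assumption: events are queued only when a new OSD map arrives, and
// consume_blacklist_events() drains the queue.
class FakeObjecter {
public:
  // Models processing a new OSD map that blacklists the given addresses.
  void on_new_osd_map(const std::set<Addr>& newly_blacklisted) {
    pending_.insert(newly_blacklisted.begin(), newly_blacklisted.end());
  }

  // Copies all pending events into *events, then clears the queue.
  void consume_blacklist_events(std::set<Addr>* events) {
    assert(events != nullptr);
    events->insert(pending_.begin(), pending_.end());
    pending_.clear();
  }

private:
  std::set<Addr> pending_;
};

// Returns true if my_addr is among the events not yet consumed.
// Note that this call itself drains the queue.
bool is_blacklisted(FakeObjecter& objecter, const Addr& my_addr) {
  std::set<Addr> events;
  objecter.consume_blacklist_events(&events);
  return events.count(my_addr) > 0;
}
```

In this model, a check made before any new OSD map has been processed sees an empty set, which would match the empty dumps described in the mail. A debug dump that consumes the events first also leaves nothing for a later check to find. Whether the real Objecter behaves this way is an assumption here, so it is worth confirming before relying on either call site.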