| Message ID | CA+jMu4S5waxnQLOjbHesO64hUU2n=GknqLE2uXdzDGhwq+w6rA@mail.gmail.com (mailing list archive) |
|---|---|
| State | New, archived |
On Wed, Mar 21, 2018 at 7:09 PM, Rishabh Dave <ridave@redhat.com> wrote:
> Hi,
>
> I am trying to fix the bug - http://tracker.ceph.com/issues/10915.
> Patrick helped me get started with it. I was able to reproduce it
> locally on a vstart cluster and I am currently trying to fix it by
> getting the client unmounted on eviction. Once I can do this, I will
> (as Patrick suggested) add a new option like
> "client_unmount_on_blacklist" and modify my code accordingly.
>
> The information about the blacklist seems to be available to the
> client code [1] but, as far as I can see, that line (i.e. the if-block
> containing it) never gets executed. The MDS blacklists the client and
> evicts the session, but the client fails to notice that. I suppose this
> incongruence causes it to hang.
>
> The reason the client fails to notice is that it never actually
> looks at the blacklist after the session is evicted --
> handle_osd_map() never gets called after MDSRank::evict_session() is
> called. I did write a patch that makes the client check its
> address against the blacklist by calling a (new) function in
> ms_handle_reset(), but it did not help. It looks like not only does the
> client not check the blacklist, but even if it did, it would find an
> outdated version.
>
> To verify this, I wrote some debug code to iterate over and display the
> blacklist towards the end of, and right after, MDSRank::evict_session().
> The blacklist turned out to be empty in both locations. Shouldn't the
> blacklist be updated at least in, or right after,
> MDSRank::evict_session()? I think that before fixing the client,
> I need some sort of fix somewhere around here [2].

The client only gets osdmap updates when it tries to communicate with
an OSD, and the OSD tells it that its current map epoch is too old.

In the case that the client isn't doing any data operations (i.e. no
OSD ops), the client doesn't find out that it's blacklisted. But
that's okay, because the client's awareness of its own
blacklisted-ness should only be needed in the case that there is some
dirty data that needs to be thrown away in the special if (blacklisted)
paths.

So if it's not hanging on any OSD operations (those operations would
have resulted in an updated osdmap), the question is: what is it
hanging on? Is it trying to open a new session with the MDS?

John

> And how can I get a stacktrace for commands like "bin/ceph tell mds.a
> client evict id=xxxx"?
>
> Also, I have attached the patch containing the modifications I have
> used so far.
>
> Thanks,
> Rishabh
>
> [1] https://github.com/ceph/ceph/blob/master/src/client/Client.cc#L2420
> [2] https://github.com/ceph/ceph/blob/master/src/mds/MDSRank.cc#L2737
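The "special if (blacklisted) paths" mentioned here only run once a newer osdmap actually reaches the client. A rough sketch of that shape, assuming the Objecter::with_osdmap() and OSDMap::is_blacklisted() interfaces as they exist in the tree (an illustration only, not the literal Client::handle_osd_map() code):

    void Client::handle_osd_map(MOSDMap *m)
    {
      // Only reached when a newer osdmap is delivered, i.e. when OSD
      // traffic (or an explicit map fetch) has pulled the new epoch in.
      const entity_addr_t myaddr = messenger->get_myaddr();
      bool now_blacklisted = objecter->with_osdmap(
          [&](const OSDMap &o) { return o.is_blacklisted(myaddr); });

      if (!blacklisted && now_blacklisted) {
        blacklisted = true;
        // ...abort in-flight OSD ops and throw away dirty data that can
        // no longer be written back (the if (blacklisted) paths)...
      }
    }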
On 22 March 2018 at 17:40, John Spray <jspray@redhat.com> wrote:
> The client only gets osdmap updates when it tries to communicate with
> an OSD, and the OSD tells it that its current map epoch is too old.
>
> In the case that the client isn't doing any data operations (i.e. no
> OSD ops), the client doesn't find out that it's blacklisted. But
> that's okay, because the client's awareness of its own
> blacklisted-ness should only be needed in the case that there is some
> dirty data that needs to be thrown away in the special if (blacklisted)
> paths.
>
> So if it's not hanging on any OSD operations (those operations would
> have resulted in an updated osdmap), the question is: what is it
> hanging on? Is it trying to open a new session with the MDS?

It looks like the client still "thinks" that it has a session open
(since the condition at [1] was false when I checked it myself) and
then it waits for a reply [2]. This is exactly where it hangs. I have
written a fix and raised a PR for it [3]. Basically, it replaces
caller_cond.Wait(client_lock) with
caller_cond.WaitInterval(client_lock, utime_t(10, 0)).

By the way, would it be correct for the client to realize that it is
blacklisted here [4]? When I checked, it wasn't so -- the (copy of the)
blacklist didn't have the client's address.

Also, I think it would be better if an evicted/blacklisted client
received some sort of reply from the MDS on making a request (like
here [2] in this bug's case), conveying that it can't access CephFS
anymore. I don't know whether that would be appropriate, but it would
stop the client from waiting indefinitely. Though this might be risky,
considering that the client could then flood the MDS with requests. In
that case, maybe the MDS should send such a reply to the client only
once to avoid that.

[1] https://github.com/ceph/ceph/blob/master/src/client/Client.cc#L1689
[2] https://github.com/ceph/ceph/blob/master/src/client/Client.cc#L1719
[3] https://github.com/ceph/ceph/pull/21065
[4] https://github.com/ceph/ceph/blob/master/src/client/Client.cc#L1658
[5] https://github.com/ceph/ceph/blob/master/src/client/Client.cc#L2410
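To make the change concrete, the PR's idea is roughly the following simplification of the wait loop in Client::make_request() (a sketch of the described change, not the exact diff):

    // Before: block until the MDS replies -- which never happens once the
    // session has been evicted, so the caller hangs forever.
    while (!request->reply)
      caller_cond.Wait(client_lock);

    // After: wake up every 10 seconds so the waiter periodically regains
    // control instead of sleeping indefinitely.
    while (!request->reply)
      caller_cond.WaitInterval(client_lock, utime_t(10, 0));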
On Tue, Mar 27, 2018 at 2:24 PM, Rishabh Dave <ridave@redhat.com> wrote:
> On 22 March 2018 at 17:40, John Spray <jspray@redhat.com> wrote:
>> The client only gets osdmap updates when it tries to communicate with
>> an OSD, and the OSD tells it that its current map epoch is too old.
>>
>> In the case that the client isn't doing any data operations (i.e. no
>> OSD ops), the client doesn't find out that it's blacklisted. But
>> that's okay, because the client's awareness of its own
>> blacklisted-ness should only be needed in the case that there is some
>> dirty data that needs to be thrown away in the special if (blacklisted)
>> paths.
>>
>> So if it's not hanging on any OSD operations (those operations would
>> have resulted in an updated osdmap), the question is: what is it
>> hanging on? Is it trying to open a new session with the MDS?
>
> It looks like the client still "thinks" that it has a session open
> (since the condition at [1] was false when I checked it myself) and
> then it waits for a reply [2]. This is exactly where it hangs. I have
> written a fix and raised a PR for it [3]. Basically, it replaces
> caller_cond.Wait(client_lock) with
> caller_cond.WaitInterval(client_lock, utime_t(10, 0)).

So that patch is now auto-unmounting a client if any request takes
longer than 10 seconds, which is not quite right. A request can take
that long for other reasons, such as an ongoing MDS failover, or just
a very slow/busy MDS, and we wouldn't want to start aborting requests
because of that.

Instead, you could do a check in Client::tick that looks at
mds_requests.begin() (it doesn't matter which request we check), and if
it has been stuck for a long time then call into
objecter->get_latest_version to ensure we have the latest OSDMap (and
blacklist) even if there's no data IO going on.

I don't think the client_unmount_on_blacklist behaviour is going to be
a good idea in practice -- if someone has a workload writing files out
to a mounted /mnt/cephfs, and it gets auto-unmounted, they'll start
writing data out to their root filesystem instead! We need to leave
the actual unmounting to the administrator so that they can stop
whatever workloads were using the mount point as well.

John

> By the way, would it be correct for the client to realize that it is
> blacklisted here [4]? When I checked, it wasn't so -- the (copy of the)
> blacklist didn't have the client's address.
>
> Also, I think it would be better if an evicted/blacklisted client
> received some sort of reply from the MDS on making a request (like
> here [2] in this bug's case), conveying that it can't access CephFS
> anymore. I don't know whether that would be appropriate, but it would
> stop the client from waiting indefinitely. Though this might be risky,
> considering that the client could then flood the MDS with requests. In
> that case, maybe the MDS should send such a reply to the client only
> once to avoid that.
>
> [1] https://github.com/ceph/ceph/blob/master/src/client/Client.cc#L1689
> [2] https://github.com/ceph/ceph/blob/master/src/client/Client.cc#L1719
> [3] https://github.com/ceph/ceph/pull/21065
> [4] https://github.com/ceph/ceph/blob/master/src/client/Client.cc#L1658
> [5] https://github.com/ceph/ceph/blob/master/src/client/Client.cc#L2410
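Concretely, the tick-based approach described here could look something like the sketch below. This is an illustration only: the 60-second threshold is arbitrary, op_stamp is assumed to hold the request's start time, and the exact parameters passed to get_latest_version() are a guess.

    void Client::tick()
    {
      // ...existing periodic work...

      if (!mds_requests.empty()) {
        // Any stuck request will do; the oldest is easiest to grab.
        MetaRequest *req = mds_requests.begin()->second;
        if (ceph_clock_now() - req->op_stamp > utime_t(60, 0)) {
          // Explicitly fetch the newest OSDMap so the blacklist check can
          // fire even though no data I/O is driving osdmap updates.
          objecter->get_latest_version(0, 0,
              new FunctionContext([this](int r) {
                // on completion the new map (and blacklist) is visible;
                // re-check blacklisted-ness and kick stuck requests here
              }));
        }
      }
    }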
On 27 March 2018 at 19:13, John Spray <jspray@redhat.com> wrote:
> So that patch is now auto-unmounting a client if any request takes
> longer than 10 seconds, which is not quite right. A request can take
> that long for other reasons, such as an ongoing MDS failover, or just
> a very slow/busy MDS, and we wouldn't want to start aborting requests
> because of that.

I was going to ask about the very situation you mention here on the
PR's page.

> Instead, you could do a check in Client::tick that looks at
> mds_requests.begin() (it doesn't matter which request we check), and if
> it has been stuck for a long time then call into
> objecter->get_latest_version to ensure we have the latest OSDMap (and
> blacklist) even if there's no data IO going on.

OK, I'll make the changes accordingly.

> I don't think the client_unmount_on_blacklist behaviour is going to be
> a good idea in practice -- if someone has a workload writing files out
> to a mounted /mnt/cephfs, and it gets auto-unmounted, they'll start
> writing data out to their root filesystem instead! We need to leave
> the actual unmounting to the administrator so that they can stop
> whatever workloads were using the mount point as well.

So, would it be preferable to just get rid of it, or to set it to false
by default?
On Tue, Mar 27, 2018 at 6:54 AM, Rishabh Dave <ridave@redhat.com> wrote:
> On 27 March 2018 at 19:13, John Spray <jspray@redhat.com> wrote:
>> I don't think the client_unmount_on_blacklist behaviour is going to be
>> a good idea in practice -- if someone has a workload writing files out
>> to a mounted /mnt/cephfs, and it gets auto-unmounted, they'll start
>> writing data out to their root filesystem instead! We need to leave
>> the actual unmounting to the administrator so that they can stop
>> whatever workloads were using the mount point as well.
>
> So, would it be preferable to just get rid of it, or to set it to false
> by default?

John raises a good point. I think instead just have the CephFS client
clean up its state such that umount(2) succeeds.
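In other words, instead of calling unmount() from inside the library, the client would mark itself blacklisted and fail its outstanding work, so the mount stops hanging and a later umount(2) issued by the administrator can complete. A purely illustrative sketch of that direction -- the helper name and the MetaRequest::abort()/caller_cond details are assumptions about the surrounding code, not existing Client API:

    void Client::fail_requests_after_blacklist()   // hypothetical helper
    {
      blacklisted = true;
      for (auto &p : mds_requests) {
        MetaRequest *req = p.second;
        req->abort(-EBLACKLISTED);      // fail rather than wait forever
        if (req->caller_cond)
          req->caller_cond->Signal();   // wake the thread in make_request()
      }
      // Dirty data that can no longer be flushed is dropped in the
      // existing if (blacklisted) paths; the mount point itself is left
      // in place for the administrator to clean up.
    }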
diff --git a/src/client/Client.cc b/src/client/Client.cc
index 6c464d5a36..4e1f0b442c 100644
--- a/src/client/Client.cc
+++ b/src/client/Client.cc
@@ -2489,7 +2489,6 @@ void Client::handle_osd_map(MOSDMap *m)
 
 // ------------------------
 // incoming messages
-
 bool Client::ms_dispatch(Message *m)
 {
   Mutex::Locker l(client_lock);
@@ -13489,9 +13488,22 @@ void Client::ms_handle_connect(Connection *con)
   ldout(cct, 10) << __func__ << " on " << con->get_peer_addr() << dendl;
 }
 
+void Client::unmount_if_blacklisted()
+{
+  std::set<entity_addr_t> new_blacklists;
+  objecter->consume_blacklist_events(&new_blacklists);
+
+  const auto myaddr = messenger->get_myaddr();
+  if (new_blacklists.count(myaddr)) {
+    cout << "UNMOUNTING!!!" << std::endl;
+    this->unmount();
+  }
+}
+
 bool Client::ms_handle_reset(Connection *con)
 {
   ldout(cct, 0) << __func__ << " on " << con->get_peer_addr() << dendl;
+  this->unmount_if_blacklisted();
   return false;
 }
 
diff --git a/src/client/Client.h b/src/client/Client.h
index ae5b188538..fd7b1f50da 100644
--- a/src/client/Client.h
+++ b/src/client/Client.h
@@ -558,6 +558,7 @@ protected:
 
   // friends
   friend class SyntheticClient;
+  void unmount_if_blacklisted();
 
   bool ms_dispatch(Message *m) override;
   void ms_handle_connect(Connection *con) override;
diff --git a/src/mds/MDSRank.cc b/src/mds/MDSRank.cc
index d36d680d57..3ec16791e1 100644
--- a/src/mds/MDSRank.cc
+++ b/src/mds/MDSRank.cc
@@ -2717,6 +2717,9 @@ bool MDSRank::evict_client(int64_t session_id,
     bool wait, bool blacklist, std::stringstream& err_ss,
     Context *on_killed)
 {
+  FILE *fq = fopen("time", "a+");
+  fprintf(fq, "mds: MDSRank::evict_client()\n");
+
   assert(mds_lock.is_locked_by_me());
 
   // Mutually exclusive args
@@ -2823,6 +2826,17 @@ bool MDSRank::evict_client(int64_t session_id,
     }
   }
 
+  fprintf(fq, "will print the blacklist -\n");
+  std::set<entity_addr_t> blacklist2;
+  objecter->consume_blacklist_events(&blacklist2);
+  int j = 0;
+  for (std::set<entity_addr_t>::iterator i = blacklist2.begin();
+       i != blacklist2.end(); ++i, ++j) {
+    stringstream ss; ss << *i;
+    fprintf(fq, "blacklist[%d] = %s", j, ss.str().c_str());
+  }
+  fprintf(fq, "blacklist ends\n");
+  fclose(fq);
   return true;
 }
 
@@ -2900,6 +2914,20 @@ bool MDSRankDispatcher::handle_command(
     evict_clients(filter, m);
 
     *need_reply = false;
+    FILE *fq = fopen("time", "a+");
+    fprintf(fq, "mds: MDSRank::ms_dispatch\n");
+    fprintf(fq, "mds: will print the blacklist -\n");
+    std::set<entity_addr_t> blacklist2;
+    objecter->consume_blacklist_events(&blacklist2);
+    int j = 0;
+    for (std::set<entity_addr_t>::iterator i = blacklist2.begin();
+         i != blacklist2.end(); ++i, ++j) {
+      stringstream ss; ss << *i;
+      fprintf(fq, "blacklist[%d] = %s", j, ss.str().c_str());
+    }
+    fprintf(fq, "mds: blacklist ends\n");
+    fclose(fq);
+
     return true;
   } else if (prefix == "damage ls") {
     Formatter *f = new JSONFormatter(true);