From patchwork Thu Aug 28 07:36:05 2014 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Alexandre Oliva X-Patchwork-Id: 4794901 Return-Path: X-Original-To: patchwork-ceph-devel@patchwork.kernel.org Delivered-To: patchwork-parsemail@patchwork1.web.kernel.org Received: from mail.kernel.org (mail.kernel.org [198.145.19.201]) by patchwork1.web.kernel.org (Postfix) with ESMTP id E2DCC9F38D for ; Thu, 28 Aug 2014 07:36:22 +0000 (UTC) Received: from mail.kernel.org (localhost [127.0.0.1]) by mail.kernel.org (Postfix) with ESMTP id 052832016C for ; Thu, 28 Aug 2014 07:36:22 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id CF15D2017D for ; Thu, 28 Aug 2014 07:36:20 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S936265AbaH1HgT (ORCPT ); Thu, 28 Aug 2014 03:36:19 -0400 Received: from linux-libre.fsfla.org ([208.118.235.54]:59924 "EHLO linux-libre.fsfla.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S934221AbaH1HgS (ORCPT ); Thu, 28 Aug 2014 03:36:18 -0400 Received: from freie.home (home.lxoliva.fsfla.org [172.31.160.22]) by linux-libre.fsfla.org (8.14.4/8.14.4/Debian-2ubuntu2.1) with ESMTP id s7S7aHmo026753 for ; Thu, 28 Aug 2014 07:36:17 GMT Received: from free.home (free.home [172.31.160.1]) by freie.home (8.14.8/8.14.7) with ESMTP id s7S7a5wc022956; Thu, 28 Aug 2014 04:36:05 -0300 From: Alexandre Oliva To: ceph-devel@vger.kernel.org Subject: Re: [PATCH] osd: requeue replays we couldn't start in this tick Organization: Free thinker, not speaking for the GNU Project References: Date: Thu, 28 Aug 2014 04:36:05 -0300 In-Reply-To: (Alexandre Oliva's message of "Wed, 30 Jul 2014 23:07:47 -0300") Message-ID: User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/23.2 (gnu/linux) MIME-Version: 1.0 Sender: ceph-devel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: ceph-devel@vger.kernel.org X-Spam-Status: No, score=-6.9 required=5.0 tests=BAYES_00, RCVD_IN_DNSWL_HI, RP_MATCHES_RCVD, UNPARSEABLE_RELAY autolearn=ham version=3.3.1 X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on mail.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP Ping? If a PG is in replay but it's not active yet by the time the OSD tick gets to running the replay queue, we'll drop the PG from the replay queue without clearing its replay state. This patch arranges for PGs that weren't ready for replaying (or that weren't replayed for any other reason) to be requeued, so that they will be retried in the next tick. I have confirmed, with this patch, that my memory-starved nodes would often trigger the new log message, indicating the PGs were not active yet. I have not observed instances of non-primary or replay_until mismatch, and I'm not sure the latter should justify a requeue, an assertion failure or whatever, but simply dropping the PG on the floor leaving it indefinitely in replay state is not something that should be done, since it prevents other operations from being processed. This patch does NOT fix the other problem reported in Issue #8758, that restarting any of the osds holding a PG is enough to clear the replay state. Signed-off-by: Alexandre Oliva --- src/osd/OSD.cc | 22 ++++++++++++++++++++++ 1 file changed, 22 insertions(+) diff --git a/src/osd/OSD.cc b/src/osd/OSD.cc index 1e08c28..40d1121 100644 --- a/src/osd/OSD.cc +++ b/src/osd/OSD.cc @@ -7228,6 +7228,7 @@ void OSD::check_replay_queue() } replay_queue_lock.Unlock(); + list< pair > requeue; for (list< pair >::iterator p = pgids.begin(); p != pgids.end(); ++p) { spg_t pgid = p->first; if (pg_map.count(pgid)) { @@ -7239,12 +7240,33 @@ void OSD::check_replay_queue() pg->replay_until == p->second) { pg->replay_queued_ops(); } + if (pg->is_replay()) { + dout(12) << "check_replay_queue requeueing " << pgid + << (pg->is_active() ? "" : ", not active") + << (pg->is_primary() ? "" : ", not primary") + << (pg->replay_until == p->second ? "" + : ", mismatched replay_until") + << ", pg->until " << pg->replay_until + << ", queued " << p->second << dendl; + requeue.push_back(*p); + } pg->unlock(); } else { dout(10) << "check_replay_queue pgid " << pgid << " (not found)" << dendl; } } + if (!requeue.empty()) { + replay_queue_lock.Lock(); + // reverse order to preserve it; push to the front of the queue, + // where these entries came from + while (!requeue.empty()) { + replay_queue.push_front(requeue.back()); + requeue.pop_back(); + } + replay_queue_lock.Unlock(); + } + // wake up _all_ pg waiters; raw pg -> actual pg mapping may have shifted wake_all_pg_waiters(); }