From patchwork Wed Nov 22 10:21:53 2017
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Rui Hua
X-Patchwork-Id: 10070159
In-Reply-To: <1511340585-24848-1-git-send-email-tang.junhui@zte.com.cn>
References: <1511340585-24848-1-git-send-email-tang.junhui@zte.com.cn>
From: Rui Hua
Date: Wed, 22 Nov 2017 18:21:53 +0800
Subject: Re: [bug report] bcache stucked when writting jounrnal
To: tang.junhui@zte.com.cn
Cc: Coly Li, Michael Lyle, linux-bcache, linux-block@vger.kernel.org
X-Mailing-List: linux-block@vger.kernel.org

Hi Junhui,

I ran into a similar problem once. It looks like a deadlock between the
cache device register thread and the bcache_allocator thread.

The trace info tells us the journal is full. Probably the allocator
thread is waiting in bch_prio_write()->prio_io()->bch_journal_meta(),
but at that point there are no RESERVE_BTREE buckets available for
journal replay, so the register thread is waiting in
bch_journal_replay()->bch_btree_insert().

The path on which your register command is possibly blocked:

run_cache_set()
  -> bch_journal_replay()
    -> bch_btree_insert()
      -> btree_insert_fn()
        -> bch_btree_insert_node()
          -> btree_split()
            -> btree_check_reserve()  <-- RESERVE_BTREE is empty here,
                                          so the thread schedules out

bch_allocator_thread()
  -> bch_prio_write()
    -> bch_journal_meta()

You can apply this patch to your code and try to register again.
It is for your reference only: the patch has not been verified, because
my environment was damaged before I dug into the code and wrote it.
I hope it resolves your problem :-)

Signed-off-by: Hua Rui
---
 drivers/md/bcache/btree.c | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/drivers/md/bcache/btree.c b/drivers/md/bcache/btree.c
index 11c5503..211be35 100644
--- a/drivers/md/bcache/btree.c
+++ b/drivers/md/bcache/btree.c
@@ -1868,14 +1868,16 @@ void bch_initial_gc_finish(struct cache_set *c)
 	 */
 	for_each_cache(ca, c, i) {
 		for_each_bucket(b, ca) {
-			if (fifo_full(&ca->free[RESERVE_PRIO]))
+			if (fifo_full(&ca->free[RESERVE_PRIO]) &&
+			    fifo_full(&ca->free[RESERVE_BTREE]))
 				break;
 
 			if (bch_can_invalidate_bucket(ca, b) &&
 			    !GC_MARK(b)) {
 				__bch_invalidate_one_bucket(ca, b);
-				fifo_push(&ca->free[RESERVE_PRIO],
-					  b - ca->buckets);
+				if (!fifo_push(&ca->free[RESERVE_PRIO], b - ca->buckets))
+					fifo_push(&ca->free[RESERVE_BTREE],
+						  b - ca->buckets);
 			}
 		}
 	}
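
For anyone who wants to check the fill order outside the kernel, here is
a minimal user-space sketch of the loop above. It is not bcache code:
toy_fifo, toy_fifo_push(), toy_fill_reserves() and the TOY_* names are
hypothetical stand-ins, and the "invalidatable" array replaces the
bch_can_invalidate_bucket()/GC_MARK() check. The only property it is
meant to show is that the PRIO list is filled first and the remaining
unused buckets then spill into the BTREE list, so that btree_split()
would find a non-empty reserve during journal replay.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins for bcache's per-cache free lists; all names are hypothetical. */
enum toy_reserve { TOY_RESERVE_BTREE, TOY_RESERVE_PRIO, TOY_RESERVE_NR };

#define TOY_FIFO_MAX 8

struct toy_fifo {
	size_t used;               /* number of entries currently stored */
	size_t size;               /* capacity, at most TOY_FIFO_MAX */
	long   data[TOY_FIFO_MAX]; /* bucket indices */
};

static bool toy_fifo_full(const struct toy_fifo *f)
{
	return f->used == f->size;
}

/* Returns true if the value was stored, false if the FIFO was already full. */
static bool toy_fifo_push(struct toy_fifo *f, long v)
{
	assert(f->size <= TOY_FIFO_MAX);
	if (toy_fifo_full(f))
		return false;
	f->data[f->used++] = v;
	return true;
}

/*
 * Same shape as the patched loop: stop only when both reserves are full,
 * try the PRIO list first, and fall back to the BTREE list when the PRIO
 * push fails. Returns how many buckets were placed on a free list.
 */
static size_t toy_fill_reserves(struct toy_fifo lists[TOY_RESERVE_NR],
				const bool *invalidatable, size_t nbuckets)
{
	size_t taken = 0;

	for (size_t b = 0; b < nbuckets; b++) {
		if (toy_fifo_full(&lists[TOY_RESERVE_PRIO]) &&
		    toy_fifo_full(&lists[TOY_RESERVE_BTREE]))
			break;

		if (!invalidatable[b])
			continue;

		if (!toy_fifo_push(&lists[TOY_RESERVE_PRIO], (long)b))
			toy_fifo_push(&lists[TOY_RESERVE_BTREE], (long)b);
		taken++;
	}
	return taken;
}
```

Calling toy_fill_reserves() with a PRIO list of size 3 and a BTREE list
of size 2 should leave both lists full as long as at least five buckets
are marked invalidatable. With the unpatched condition (break as soon as
the PRIO list is full) the BTREE list would stay empty.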