From patchwork Wed Feb 19 11:24:17 2014 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Wang Shilong X-Patchwork-Id: 3680431 Return-Path: X-Original-To: patchwork-linux-btrfs@patchwork.kernel.org Delivered-To: patchwork-parsemail@patchwork2.web.kernel.org Received: from mail.kernel.org (mail.kernel.org [198.145.19.201]) by patchwork2.web.kernel.org (Postfix) with ESMTP id 0486CBF13A for ; Wed, 19 Feb 2014 12:00:25 +0000 (UTC) Received: from mail.kernel.org (localhost [127.0.0.1]) by mail.kernel.org (Postfix) with ESMTP id 2D45720127 for ; Wed, 19 Feb 2014 12:00:24 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 3CB4320122 for ; Wed, 19 Feb 2014 12:00:23 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752114AbaBSMAM (ORCPT ); Wed, 19 Feb 2014 07:00:12 -0500 Received: from cn.fujitsu.com ([222.73.24.84]:63340 "EHLO song.cn.fujitsu.com" rhost-flags-OK-FAIL-OK-OK) by vger.kernel.org with ESMTP id S1751606AbaBSMAI (ORCPT ); Wed, 19 Feb 2014 07:00:08 -0500 X-IronPort-AV: E=Sophos;i="4.97,504,1389715200"; d="scan'208";a="9560242" Received: from unknown (HELO tang.cn.fujitsu.com) ([10.167.250.3]) by song.cn.fujitsu.com with ESMTP; 19 Feb 2014 19:56:14 +0800 Received: from fnstmail02.fnst.cn.fujitsu.com (tang.cn.fujitsu.com [127.0.0.1]) by tang.cn.fujitsu.com (8.14.3/8.13.1) with ESMTP id s1JBQ2ut026056 for ; Wed, 19 Feb 2014 19:26:22 +0800 Received: from wangs.fnst.cn.fujitsu.com ([10.167.226.104]) by fnstmail02.fnst.cn.fujitsu.com (Lotus Domino Release 8.5.3) with ESMTP id 2014021919235279-43108 ; Wed, 19 Feb 2014 19:23:52 +0800 From: Wang Shilong To: linux-btrfs@vger.kernel.org Subject: [PATCH 2/4] Btrfs: device_replace: fix deadlock for nocow case Date: Wed, 19 Feb 2014 19:24:17 +0800 Message-Id: <1392809059-22319-2-git-send-email-wangsl.fnst@cn.fujitsu.com> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1392809059-22319-1-git-send-email-wangsl.fnst@cn.fujitsu.com> References: <1392809059-22319-1-git-send-email-wangsl.fnst@cn.fujitsu.com> X-MIMETrack: Itemize by SMTP Server on mailserver/fnst(Release 8.5.3|September 15, 2011) at 2014/02/19 19:23:52, Serialize by Router on mailserver/fnst(Release 8.5.3|September 15, 2011) at 2014/02/19 19:24:13, Serialize complete at 2014/02/19 19:24:13 Sender: linux-btrfs-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org X-Spam-Status: No, score=-7.5 required=5.0 tests=BAYES_00, RCVD_IN_DNSWL_HI, RP_MATCHES_RCVD, UNPARSEABLE_RELAY autolearn=unavailable version=3.3.1 X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on mail.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP commit cb7ab02156e4 cause a following deadlock found by xfstests,btrfs/011: Thread1 is commiting transaction which is blocked at btrfs_scrub_pause(). Thread2 is calling btrfs_file_aio_write() which has held inode's @i_mutex and commit transaction(blocked because Thread1 is committing transaction). Thread3 is copy_nocow_page worker which will also try to hold inode @i_mutex, so thread3 will wait Thread1 finished. Thread4 is waiting pending workers finished which will wait Thread3 finished. So the problem is like this: Thread1--->Thread4--->Thread3--->Thread2---->Thread1 Deadlock happens! we fix it by letting Thread1 go firstly, which means we won't block transaction commit while we are waiting pending workers finished. Reported-by: Qu Wenruo Signed-off-by: Wang Shilong --- fs/btrfs/scrub.c | 17 +++++++++++++++-- 1 file changed, 15 insertions(+), 2 deletions(-) diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c index 51c342b..f2f8803 100644 --- a/fs/btrfs/scrub.c +++ b/fs/btrfs/scrub.c @@ -2686,10 +2686,23 @@ int scrub_enumerate_chunks(struct scrub_ctx *sctx, wait_event(sctx->list_wait, atomic_read(&sctx->bios_in_flight) == 0); - atomic_set(&sctx->wr_ctx.flush_all_writes, 0); + atomic_inc(&fs_info->scrubs_paused); + wake_up(&fs_info->scrub_pause_wait); + + /* + * must be called before we decrease @scrub_paused. + * make sure we don't block transaction commit while + * we are waiting pending workers finished. + */ wait_event(sctx->list_wait, atomic_read(&sctx->workers_pending) == 0); - scrub_blocked_if_needed(fs_info); + atomic_set(&sctx->wr_ctx.flush_all_writes, 0); + + mutex_lock(&fs_info->scrub_lock); + __scrub_blocked_if_needed(fs_info); + atomic_dec(&fs_info->scrubs_paused); + mutex_unlock(&fs_info->scrub_lock); + wake_up(&fs_info->scrub_pause_wait); btrfs_put_block_group(cache); if (ret)