From patchwork Fri Jan 11 17:17:59 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Filipe Manana X-Patchwork-Id: 10760351 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 2DE091390 for ; Fri, 11 Jan 2019 17:18:23 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 0D70F285C4 for ; Fri, 11 Jan 2019 17:18:23 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 015FC297E5; Fri, 11 Jan 2019 17:18:22 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-8.0 required=2.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,MAILING_LIST_MULTI,RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id DCEA8285C4 for ; Fri, 11 Jan 2019 17:18:21 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1732530AbfAKRSM (ORCPT ); Fri, 11 Jan 2019 12:18:12 -0500 Received: from mail.kernel.org ([198.145.29.99]:45376 "EHLO mail.kernel.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1731663AbfAKRSM (ORCPT ); Fri, 11 Jan 2019 12:18:12 -0500 Received: from localhost.localdomain (bl8-197-74.dsl.telepac.pt [85.241.197.74]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPSA id A3F1F20872 for ; Fri, 11 Jan 2019 17:18:10 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=default; t=1547227091; bh=wLB5HgnjutDzopgHLDLFb7p5OKpLw1v/KGBrlXy4NHw=; h=From:To:Subject:Date:In-Reply-To:References:From; b=zH1rV+CmpjUJpADjk0QB5JlXYgOqLzqto+78LUITMJ83NKmFFt5RY9RlvTpGl3tBz CxLGLfkzZ4la5/DkDH4H+Feg6U9il0jIuJKMNTP1qpOEyLwKbfICjTqRhW4MiTnyQh zkl4DyJ//XqFERKXsjtFvkaARZU7Vfy/VC5SCVBg= From: fdmanana@kernel.org To: linux-btrfs@vger.kernel.org Subject: [PATCH v2] Btrfs: avoid deadlock with memory reclaim due to allocation of devices Date: Fri, 11 Jan 2019 17:17:59 +0000 Message-Id: <20190111171759.19920-1-fdmanana@kernel.org> X-Mailer: git-send-email 2.11.0 In-Reply-To: <20181213211725.14832-1-fdmanana@kernel.org> References: <20181213211725.14832-1-fdmanana@kernel.org> Sender: linux-btrfs-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP From: Filipe Manana In a few places we are allocating a device using the GFP_KERNEL flag when it is not safe to do so, because if reclaim is triggered it can cause a transaction commit while we are holding the device list mutex. This mutex is required in the transaction commit path (at write_all_supers() and btrfs_update_commit_device_size()). So fix this by setting up a nofs memory allocation context in those cases. Fixes: 78f2c9e6dbb14 ("btrfs: device add and remove: use GFP_KERNEL") Fixes: e0ae999414238 ("btrfs: preallocate device flush bio") Signed-off-by: Filipe Manana --- V2: Change the approach to fix the problem by setting up nofs contextes where needed. fs/btrfs/volumes.c | 33 ++++++++++++++++++++++++++++++--- 1 file changed, 30 insertions(+), 3 deletions(-) diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c index 2576b1a379c9..663566baae78 100644 --- a/fs/btrfs/volumes.c +++ b/fs/btrfs/volumes.c @@ -14,6 +14,7 @@ #include #include #include +#include #include "ctree.h" #include "extent_map.h" #include "disk-io.h" @@ -988,20 +989,29 @@ static noinline struct btrfs_device *device_list_add(const char *path, } if (!device) { + unsigned int nofs_flag; + if (fs_devices->opened) { mutex_unlock(&fs_devices->device_list_mutex); return ERR_PTR(-EBUSY); } + /* + * Setup nofs context because we are holding the device list + * mutex, which is required for a transaction commit. + */ + nofs_flag = memalloc_nofs_save(); device = btrfs_alloc_device(NULL, &devid, disk_super->dev_item.uuid); if (IS_ERR(device)) { + memalloc_nofs_restore(nofs_flag); mutex_unlock(&fs_devices->device_list_mutex); /* we can safely leave the fs_devices entry around */ return device; } - name = rcu_string_strdup(path, GFP_NOFS); + name = rcu_string_strdup(path, GFP_KERNEL); + memalloc_nofs_restore(nofs_flag); if (!name) { btrfs_free_device(device); mutex_unlock(&fs_devices->device_list_mutex); @@ -1137,11 +1147,19 @@ static struct btrfs_fs_devices *clone_fs_devices(struct btrfs_fs_devices *orig) /* We have held the volume lock, it is safe to get the devices. */ list_for_each_entry(orig_dev, &orig->devices, dev_list) { struct rcu_string *name; + unsigned int nofs_flag; + /* + * Setup nofs context because we are holding the device list + * mutex, which is required for a transaction commit. + */ + nofs_flag = memalloc_nofs_save(); device = btrfs_alloc_device(NULL, &orig_dev->devid, orig_dev->uuid); - if (IS_ERR(device)) + if (IS_ERR(device)) { + memalloc_nofs_restore(nofs_flag); goto error; + } /* * This is ok to do without rcu read locked because we hold the @@ -1151,12 +1169,14 @@ static struct btrfs_fs_devices *clone_fs_devices(struct btrfs_fs_devices *orig) name = rcu_string_strdup(orig_dev->name->str, GFP_KERNEL); if (!name) { + memalloc_nofs_restore(nofs_flag); btrfs_free_device(device); goto error; } rcu_assign_pointer(device->name, name); } + memalloc_nofs_restore(nofs_flag); list_add(&device->dev_list, &fs_devices->devices); device->fs_devices = fs_devices; fs_devices->num_devices++; @@ -1262,6 +1282,7 @@ static void btrfs_close_one_device(struct btrfs_device *device) struct btrfs_fs_devices *fs_devices = device->fs_devices; struct btrfs_device *new_device; struct rcu_string *name; + unsigned int nofs_flag; if (device->bdev) fs_devices->open_devices--; @@ -1277,17 +1298,23 @@ static void btrfs_close_one_device(struct btrfs_device *device) btrfs_close_bdev(device); + /* + * Setup nofs context because we are holding the device list + * mutex, which is required for a transaction commit. + */ + nofs_flag = memalloc_nofs_save(); new_device = btrfs_alloc_device(NULL, &device->devid, device->uuid); BUG_ON(IS_ERR(new_device)); /* -ENOMEM */ /* Safe because we are under uuid_mutex */ if (device->name) { - name = rcu_string_strdup(device->name->str, GFP_NOFS); + name = rcu_string_strdup(device->name->str, GFP_KERNEL); BUG_ON(!name); /* -ENOMEM */ rcu_assign_pointer(new_device->name, name); } + memalloc_nofs_restore(nofs_flag); list_replace_rcu(&device->dev_list, &new_device->dev_list); new_device->fs_devices = device->fs_devices;