From patchwork Tue Sep 11 22:34:49 2018
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Omar Sandoval
X-Patchwork-Id: 10596435
From: Omar Sandoval
To: linux-btrfs@vger.kernel.org
Cc: kernel-team@fb.com, linux-mm@kvack.org
Subject: [PATCH v7 6/6] Btrfs: support swap files
Date: Tue, 11 Sep 2018 15:34:49 -0700
Message-Id: <61def3687f0309c9b846677c8d112afc4d6d90f1.1536704650.git.osandov@fb.com>
X-Mailer: git-send-email 2.18.0

From: Omar Sandoval

Implement the swap file a_ops on Btrfs. Activation needs to make sure
that the file can be used as a swap file, which currently means it must
be fully allocated as nocow with no compression on one device. It must
also do the proper tracking so that ioctls will not interfere with the
swap file. Deactivation clears this tracking.

Signed-off-by: Omar Sandoval
---
 fs/btrfs/inode.c | 317 +++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 317 insertions(+)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 3ea5339603cf..0586285b1d9f 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -27,6 +27,7 @@
 #include
 #include
 #include
+#include
 #include
 #include "ctree.h"
 #include "disk-io.h"
@@ -10488,6 +10489,320 @@ void btrfs_set_range_writeback(struct extent_io_tree *tree, u64 start, u64 end)
 	}
 }
 
+/*
+ * Add an entry indicating a block group or device which is pinned by a
+ * swapfile. Returns 0 on success, 1 if there is already an entry for it, or a
+ * negative errno on failure.
+ */
+static int btrfs_add_swapfile_pin(struct inode *inode, void *ptr,
+				  bool is_block_group)
+{
+	struct btrfs_fs_info *fs_info = BTRFS_I(inode)->root->fs_info;
+	struct btrfs_swapfile_pin *sp, *entry;
+	struct rb_node **p;
+	struct rb_node *parent = NULL;
+
+	sp = kmalloc(sizeof(*sp), GFP_NOFS);
+	if (!sp)
+		return -ENOMEM;
+	sp->ptr = ptr;
+	sp->inode = inode;
+	sp->is_block_group = is_block_group;
+
+	spin_lock(&fs_info->swapfile_pins_lock);
+	p = &fs_info->swapfile_pins.rb_node;
+	while (*p) {
+		parent = *p;
+		entry = rb_entry(parent, struct btrfs_swapfile_pin, node);
+		if (sp->ptr < entry->ptr ||
+		    (sp->ptr == entry->ptr && sp->inode < entry->inode)) {
+			p = &(*p)->rb_left;
+		} else if (sp->ptr > entry->ptr ||
+			   (sp->ptr == entry->ptr && sp->inode > entry->inode)) {
+			p = &(*p)->rb_right;
+		} else {
+			spin_unlock(&fs_info->swapfile_pins_lock);
+			kfree(sp);
+			return 1;
+		}
+	}
+	rb_link_node(&sp->node, parent, p);
+	rb_insert_color(&sp->node, &fs_info->swapfile_pins);
+	spin_unlock(&fs_info->swapfile_pins_lock);
+	return 0;
+}
+
+/* Free all of the entries pinned by this swapfile. */
+static void btrfs_free_swapfile_pins(struct inode *inode)
+{
+	struct btrfs_fs_info *fs_info = BTRFS_I(inode)->root->fs_info;
+	struct btrfs_swapfile_pin *sp;
+	struct rb_node *node, *next;
+
+	spin_lock(&fs_info->swapfile_pins_lock);
+	node = rb_first(&fs_info->swapfile_pins);
+	while (node) {
+		next = rb_next(node);
+		sp = rb_entry(node, struct btrfs_swapfile_pin, node);
+		if (sp->inode == inode) {
+			rb_erase(&sp->node, &fs_info->swapfile_pins);
+			if (sp->is_block_group)
+				btrfs_put_block_group(sp->ptr);
+			kfree(sp);
+		}
+		node = next;
+	}
+	spin_unlock(&fs_info->swapfile_pins_lock);
+}
+
+struct btrfs_swap_info {
+	u64 start;
+	u64 block_start;
+	u64 block_len;
+	u64 lowest_ppage;
+	u64 highest_ppage;
+	unsigned long nr_pages;
+	int nr_extents;
+};
+
+static int btrfs_add_swap_extent(struct swap_info_struct *sis,
+				 struct btrfs_swap_info *bsi)
+{
+	unsigned long nr_pages;
+	u64 first_ppage, first_ppage_reported, next_ppage;
+	int ret;
+
+	first_ppage = ALIGN(bsi->block_start, PAGE_SIZE) >> PAGE_SHIFT;
+	next_ppage = ALIGN_DOWN(bsi->block_start + bsi->block_len,
+				PAGE_SIZE) >> PAGE_SHIFT;
+
+	if (first_ppage >= next_ppage)
+		return 0;
+	nr_pages = next_ppage - first_ppage;
+
+	first_ppage_reported = first_ppage;
+	if (bsi->start == 0)
+		first_ppage_reported++;
+	if (bsi->lowest_ppage > first_ppage_reported)
+		bsi->lowest_ppage = first_ppage_reported;
+	if (bsi->highest_ppage < (next_ppage - 1))
+		bsi->highest_ppage = next_ppage - 1;
+
+	ret = add_swap_extent(sis, bsi->nr_pages, nr_pages, first_ppage);
+	if (ret < 0)
+		return ret;
+	bsi->nr_extents += ret;
+	bsi->nr_pages += nr_pages;
+	return 0;
+}
+
+static void btrfs_swap_deactivate(struct file *file)
+{
+	struct inode *inode = file_inode(file);
+
+	btrfs_free_swapfile_pins(inode);
+	atomic_dec(&BTRFS_I(inode)->root->nr_swapfiles);
+}
+
+static int btrfs_swap_activate(struct swap_info_struct *sis, struct file *file,
+			       sector_t *span)
+{
+	struct inode *inode = file_inode(file);
+	struct btrfs_fs_info *fs_info = BTRFS_I(inode)->root->fs_info;
+	struct extent_io_tree *io_tree = &BTRFS_I(inode)->io_tree;
+	struct extent_state *cached_state = NULL;
+	struct extent_map *em = NULL;
+	struct btrfs_device *device = NULL;
+	struct btrfs_swap_info bsi = {
+		.lowest_ppage = (sector_t)-1ULL,
+	};
+	int ret = 0;
+	u64 isize = inode->i_size;
+	u64 start;
+
+	/*
+	 * If the swap file was just created, make sure delalloc is done. If the
+	 * file changes again after this, the user is doing something stupid and
+	 * we don't really care.
+	 */
+	ret = btrfs_wait_ordered_range(inode, 0, (u64)-1);
+	if (ret)
+		return ret;
+
+	/*
+	 * The inode is locked, so these flags won't change after we check them.
+	 */
+	if (BTRFS_I(inode)->flags & BTRFS_INODE_COMPRESS) {
+		btrfs_info(fs_info, "swapfile must not be compressed");
+		return -EINVAL;
+	}
+	if (!(BTRFS_I(inode)->flags & BTRFS_INODE_NODATACOW)) {
+		btrfs_info(fs_info, "swapfile must not be copy-on-write");
+		return -EINVAL;
+	}
+
+	/*
+	 * Balance or device remove/replace/resize can move stuff around from
+	 * under us. The EXCL_OP flag makes sure they aren't running/won't run
+	 * concurrently while we are mapping the swap extents, and
+	 * fs_info->swapfile_pins prevents them from running while the swap file
+	 * is active and moving the extents. Note that this also prevents a
+	 * concurrent device add which isn't actually necessary, but it's not
+	 * really worth the trouble to allow it.
+	 */
+	if (test_and_set_bit(BTRFS_FS_EXCL_OP, &fs_info->flags))
+		return -EBUSY;
+	/*
+	 * Snapshots can create extents which require COW even if NODATACOW is
+	 * set. We use this counter to prevent snapshots. We must increment it
+	 * before walking the extents because we don't want a concurrent
+	 * snapshot to run after we've already checked the extents.
+	 */
+	atomic_inc(&BTRFS_I(inode)->root->nr_swapfiles);
+
+	lock_extent_bits(io_tree, 0, isize - 1, &cached_state);
+	start = 0;
+	while (start < isize) {
+		u64 end, logical_block_start, physical_block_start;
+		struct btrfs_block_group_cache *bg;
+		u64 len = isize - start;
+
+		em = btrfs_get_extent(BTRFS_I(inode), NULL, 0, start, len, 0);
+		if (IS_ERR(em)) {
+			ret = PTR_ERR(em);
+			goto out;
+		}
+		end = extent_map_end(em);
+
+		if (em->block_start == EXTENT_MAP_HOLE) {
+			btrfs_info(fs_info, "swapfile must not have holes");
+			ret = -EINVAL;
+			goto out;
+		}
+		if (em->block_start == EXTENT_MAP_INLINE) {
+			/*
+			 * It's unlikely we'll ever actually find ourselves
+			 * here, as a file small enough to fit inline won't be
+			 * big enough to store more than the swap header, but in
+			 * case something changes in the future, let's catch it
+			 * here rather than later.
+			 */
+			btrfs_info(fs_info, "swapfile must not be inline");
+			ret = -EINVAL;
+			goto out;
+		}
+		if (test_bit(EXTENT_FLAG_COMPRESSED, &em->flags)) {
+			btrfs_info(fs_info, "swapfile must not be compressed");
+			ret = -EINVAL;
+			goto out;
+		}
+
+		logical_block_start = em->block_start + (start - em->start);
+		len = min(len, em->len - (start - em->start));
+		free_extent_map(em);
+		em = NULL;
+
+		ret = can_nocow_extent(inode, start, &len, NULL, NULL, NULL);
+		if (ret < 0) {
+			goto out;
+		} else if (ret) {
+			ret = 0;
+		} else {
+			btrfs_info(fs_info, "swapfile must not be copy-on-write");
+			ret = -EINVAL;
+			goto out;
+		}
+
+		em = btrfs_get_chunk_map(fs_info, logical_block_start, len);
+		if (IS_ERR(em)) {
+			ret = PTR_ERR(em);
+			goto out;
+		}
+
+		if (em->map_lookup->type & BTRFS_BLOCK_GROUP_PROFILE_MASK) {
+			btrfs_info(fs_info, "swapfile must have single data profile");
+			ret = -EINVAL;
+			goto out;
+		}
+
+		if (device == NULL) {
+			device = em->map_lookup->stripes[0].dev;
+			ret = btrfs_add_swapfile_pin(inode, device, false);
+			if (ret == 1)
+				ret = 0;
+			else if (ret)
+				goto out;
+		} else if (device != em->map_lookup->stripes[0].dev) {
+			btrfs_info(fs_info, "swapfile must be on one device");
+			ret = -EINVAL;
+			goto out;
+		}
+
+		physical_block_start = (em->map_lookup->stripes[0].physical +
+					(logical_block_start - em->start));
+		len = min(len, em->len - (logical_block_start - em->start));
+		free_extent_map(em);
+		em = NULL;
+
+		bg = btrfs_lookup_block_group(fs_info, logical_block_start);
+		if (!bg) {
+			btrfs_info(fs_info, "could not find block group containing swapfile");
+			ret = -EINVAL;
+			goto out;
+		}
+
+		ret = btrfs_add_swapfile_pin(inode, bg, true);
+		if (ret) {
+			btrfs_put_block_group(bg);
+			if (ret == 1)
+				ret = 0;
+			else
+				goto out;
+		}
+
+		if (bsi.block_len &&
+		    bsi.block_start + bsi.block_len == physical_block_start) {
+			bsi.block_len += len;
+		} else {
+			if (bsi.block_len) {
+				ret = btrfs_add_swap_extent(sis, &bsi);
+				if (ret)
+					goto out;
+			}
+			bsi.start = start;
+			bsi.block_start = physical_block_start;
+			bsi.block_len = len;
+		}
+
+		start = end;
+	}
+
+	if (bsi.block_len)
+		ret = btrfs_add_swap_extent(sis, &bsi);
+
+out:
+	if (!IS_ERR_OR_NULL(em))
+		free_extent_map(em);
+
+	unlock_extent_cached(io_tree, 0, isize - 1, &cached_state);
+
+	if (ret)
+		btrfs_swap_deactivate(file);
+
+	clear_bit(BTRFS_FS_EXCL_OP, &fs_info->flags);
+
+	if (ret)
+		return ret;
+
+	if (device)
+		sis->bdev = device->bdev;
+	*span = bsi.highest_ppage - bsi.lowest_ppage + 1;
+	sis->max = bsi.nr_pages;
+	sis->pages = bsi.nr_pages - 1;
+	sis->highest_bit = bsi.nr_pages - 1;
+	return bsi.nr_extents;
+}
+
 static const struct inode_operations btrfs_dir_inode_operations = {
 	.getattr = btrfs_getattr,
 	.lookup = btrfs_lookup,
@@ -10565,6 +10880,8 @@ static const struct address_space_operations btrfs_aops = {
 	.releasepage = btrfs_releasepage,
 	.set_page_dirty = btrfs_set_page_dirty,
 	.error_remove_page = generic_error_remove_page,
+	.swap_activate = btrfs_swap_activate,
+	.swap_deactivate = btrfs_swap_deactivate,
 };
 
 static const struct address_space_operations btrfs_symlink_aops = {
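
Note on the page arithmetic in btrfs_add_swap_extent(): the function rounds the start of a physical extent up and its end down to page boundaries, so only pages lying wholly inside the extent are handed to add_swap_extent(). The following is a minimal user-space sketch of that rounding and is not part of the patch. It assumes a 4 KiB page size, and align_up(), align_down() and whole_pages() are hypothetical helper names standing in for the kernel's ALIGN(), ALIGN_DOWN() and the first_ppage/next_ppage computation.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption for this sketch: 4 KiB pages. The kernel uses the arch's values. */
#define SKETCH_PAGE_SHIFT 12
#define SKETCH_PAGE_SIZE  (UINT64_C(1) << SKETCH_PAGE_SHIFT)

/* Round x up to a multiple of a; a must be a power of two. */
static uint64_t align_up(uint64_t x, uint64_t a)
{
	assert(a != 0 && (a & (a - 1)) == 0);
	return (x + a - 1) & ~(a - 1);
}

/* Round x down to a multiple of a; a must be a power of two. */
static uint64_t align_down(uint64_t x, uint64_t a)
{
	assert(a != 0 && (a & (a - 1)) == 0);
	return x & ~(a - 1);
}

/*
 * Count the pages that lie wholly inside the byte range
 * [block_start, block_start + block_len). On a non-zero result the
 * index of the first such page is stored in *first_ppage; on zero,
 * *first_ppage is left untouched.
 */
static uint64_t whole_pages(uint64_t block_start, uint64_t block_len,
			    uint64_t *first_ppage)
{
	uint64_t first = align_up(block_start, SKETCH_PAGE_SIZE) >> SKETCH_PAGE_SHIFT;
	uint64_t next = align_down(block_start + block_len,
				   SKETCH_PAGE_SIZE) >> SKETCH_PAGE_SHIFT;

	if (first >= next)
		return 0;
	*first_ppage = first;
	return next - first;
}
```

For example, an extent covering bytes 0x1800 to 0x4800 loses the partial page at each end, which leaves pages 2 and 3. An extent shorter than one aligned page contributes nothing, matching the early "return 0" in the patch.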