From patchwork Fri Sep 7 07:39:20 2018
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Omar Sandoval
X-Patchwork-Id: 10591801
From: Omar Sandoval
To: linux-btrfs@vger.kernel.org
Cc: kernel-team@fb.com, linux-mm@kvack.org
Subject: [PATCH v6 6/6] Btrfs: support swap files
Date: Fri, 7 Sep 2018 00:39:20 -0700
Message-Id: <77442bbbad9ebc37f3b72a47ca983a3a805e0718.1536305017.git.osandov@fb.com>
X-Mailer: git-send-email 2.18.0

From: Omar Sandoval

Implement the swap file a_ops on Btrfs. Activation needs to make sure
that the file can be used as a swap file, which currently means it must
be fully allocated as nocow with no compression on one device. It must
also do the proper tracking so that ioctls will not interfere with the
swap file. Deactivation clears this tracking.

Signed-off-by: Omar Sandoval
---
 fs/btrfs/inode.c | 316 +++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 316 insertions(+)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 9357a19d2bff..55aba2d7074c 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -27,6 +27,7 @@
 #include
 #include
 #include
+#include
 #include
 #include "ctree.h"
 #include "disk-io.h"
@@ -10437,6 +10438,319 @@ void btrfs_set_range_writeback(struct extent_io_tree *tree, u64 start, u64 end)
 	}
 }
 
+/*
+ * Add an entry indicating a block group or device which is pinned by a
+ * swapfile. Returns 0 on success, 1 if there is already an entry for it, or a
+ * negative errno on failure.
+ */
+static int btrfs_add_swapfile_pin(struct inode *inode, void *ptr,
+				  bool is_block_group)
+{
+	struct btrfs_fs_info *fs_info = BTRFS_I(inode)->root->fs_info;
+	struct btrfs_swapfile_pin *sp, *entry;
+	struct rb_node **p;
+	struct rb_node *parent = NULL;
+
+	sp = kmalloc(sizeof(*sp), GFP_NOFS);
+	if (!sp)
+		return -ENOMEM;
+	sp->ptr = ptr;
+	sp->inode = inode;
+	sp->is_block_group = is_block_group;
+
+	spin_lock(&fs_info->swapfile_pins_lock);
+	p = &fs_info->swapfile_pins.rb_node;
+	while (*p) {
+		parent = *p;
+		entry = rb_entry(parent, struct btrfs_swapfile_pin, node);
+		if (sp->ptr < entry->ptr ||
+		    (sp->ptr == entry->ptr && sp->inode < entry->inode)) {
+			p = &(*p)->rb_left;
+		} else if (sp->ptr > entry->ptr ||
+			   (sp->ptr == entry->ptr && sp->inode > entry->inode)) {
+			p = &(*p)->rb_right;
+		} else {
+			spin_unlock(&fs_info->swapfile_pins_lock);
+			kfree(sp);
+			return 1;
+		}
+	}
+	rb_link_node(&sp->node, parent, p);
+	rb_insert_color(&sp->node, &fs_info->swapfile_pins);
+	spin_unlock(&fs_info->swapfile_pins_lock);
+	return 0;
+}
+
+static void btrfs_free_swapfile_pins(struct inode *inode)
+{
+	struct btrfs_fs_info *fs_info = BTRFS_I(inode)->root->fs_info;
+	struct btrfs_swapfile_pin *sp;
+	struct rb_node *node, *next;
+
+	spin_lock(&fs_info->swapfile_pins_lock);
+	node = rb_first(&fs_info->swapfile_pins);
+	while (node) {
+		next = rb_next(node);
+		sp = rb_entry(node, struct btrfs_swapfile_pin, node);
+		if (sp->inode == inode) {
+			rb_erase(&sp->node, &fs_info->swapfile_pins);
+			if (sp->is_block_group)
+				btrfs_put_block_group(sp->ptr);
+			kfree(sp);
+		}
+		node = next;
+	}
+	spin_unlock(&fs_info->swapfile_pins_lock);
+}
+
+struct btrfs_swap_info {
+	u64 start;
+	u64 block_start;
+	u64 block_len;
+	u64 lowest_ppage;
+	u64 highest_ppage;
+	unsigned long nr_pages;
+	int nr_extents;
+};
+
+static int btrfs_add_swap_extent(struct swap_info_struct *sis,
+				 struct btrfs_swap_info *bsi)
+{
+	unsigned long nr_pages;
+	u64 first_ppage, first_ppage_reported, next_ppage;
+	int ret;
+
+	first_ppage = ALIGN(bsi->block_start, PAGE_SIZE) >> PAGE_SHIFT;
+	next_ppage = ALIGN_DOWN(bsi->block_start + bsi->block_len,
+				PAGE_SIZE) >> PAGE_SHIFT;
+
+	if (first_ppage >= next_ppage)
+		return 0;
+	nr_pages = next_ppage - first_ppage;
+
+	first_ppage_reported = first_ppage;
+	if (bsi->start == 0)
+		first_ppage_reported++;
+	if (bsi->lowest_ppage > first_ppage_reported)
+		bsi->lowest_ppage = first_ppage_reported;
+	if (bsi->highest_ppage < (next_ppage - 1))
+		bsi->highest_ppage = next_ppage - 1;
+
+	ret = add_swap_extent(sis, bsi->nr_pages, nr_pages, first_ppage);
+	if (ret < 0)
+		return ret;
+	bsi->nr_extents += ret;
+	bsi->nr_pages += nr_pages;
+	return 0;
+}
+
+static void btrfs_swap_deactivate(struct file *file)
+{
+	struct inode *inode = file_inode(file);
+
+	btrfs_free_swapfile_pins(inode);
+	atomic_dec(&BTRFS_I(inode)->root->nr_swapfiles);
+}
+
+static int btrfs_swap_activate(struct swap_info_struct *sis, struct file *file,
+			       sector_t *span)
+{
+	struct inode *inode = file_inode(file);
+	struct btrfs_fs_info *fs_info = BTRFS_I(inode)->root->fs_info;
+	struct extent_io_tree *io_tree = &BTRFS_I(inode)->io_tree;
+	struct extent_state *cached_state = NULL;
+	struct extent_map *em = NULL;
+	struct btrfs_device *device = NULL;
+	struct btrfs_swap_info bsi = {
+		.lowest_ppage = (sector_t)-1ULL,
+	};
+	int ret = 0;
+	u64 isize = inode->i_size;
+	u64 start;
+
+	/*
+	 * If the swap file was just created, make sure delalloc is done. If the
+	 * file changes again after this, the user is doing something stupid and
+	 * we don't really care.
+	 */
+	ret = btrfs_wait_ordered_range(inode, 0, (u64)-1);
+	if (ret)
+		return ret;
+
+	/*
+	 * The inode is locked, so these flags won't change after we check them.
+	 */
+	if (BTRFS_I(inode)->flags & BTRFS_INODE_COMPRESS) {
+		btrfs_info(fs_info, "swapfile must not be compressed");
+		return -EINVAL;
+	}
+	if (!(BTRFS_I(inode)->flags & BTRFS_INODE_NODATACOW)) {
+		btrfs_info(fs_info, "swapfile must not be copy-on-write");
+		return -EINVAL;
+	}
+
+	/*
+	 * Balance or device remove/replace/resize can move stuff around from
+	 * under us. The EXCL_OP flag makes sure they aren't running/won't run
+	 * concurrently while we are mapping the swap extents, and
+	 * fs_info->swapfile_pins prevents them from running while the swap file
+	 * is active and moving the extents. Note that this also prevents a
+	 * concurrent device add which isn't actually necessary, but it's not
+	 * really worth the trouble to allow it.
+	 */
+	if (test_and_set_bit(BTRFS_FS_EXCL_OP, &fs_info->flags))
+		return -EBUSY;
+	/*
+	 * Snapshots can create extents which require COW even if NODATACOW is
+	 * set. We use this counter to prevent snapshots. We must increment it
+	 * before walking the extents because we don't want a concurrent
+	 * snapshot to run after we've already checked the extents.
+	 */
+	atomic_inc(&BTRFS_I(inode)->root->nr_swapfiles);
+
+	lock_extent_bits(io_tree, 0, isize - 1, &cached_state);
+	start = 0;
+	while (start < isize) {
+		u64 end, logical_block_start, physical_block_start;
+		struct btrfs_block_group_cache *bg;
+		u64 len = isize - start;
+
+		em = btrfs_get_extent(BTRFS_I(inode), NULL, 0, start, len, 0);
+		if (IS_ERR(em)) {
+			ret = PTR_ERR(em);
+			goto out;
+		}
+		end = extent_map_end(em);
+
+		if (em->block_start == EXTENT_MAP_HOLE) {
+			btrfs_info(fs_info, "swapfile must not have holes");
+			ret = -EINVAL;
+			goto out;
+		}
+		if (em->block_start == EXTENT_MAP_INLINE) {
+			/*
+			 * It's unlikely we'll ever actually find ourselves
+			 * here, as a file small enough to fit inline won't be
+			 * big enough to store more than the swap header, but in
+			 * case something changes in the future, let's catch it
+			 * here rather than later.
+			 */
+			btrfs_info(fs_info, "swapfile must not be inline");
+			ret = -EINVAL;
+			goto out;
+		}
+		if (test_bit(EXTENT_FLAG_COMPRESSED, &em->flags)) {
+			btrfs_info(fs_info, "swapfile must not be compressed");
+			ret = -EINVAL;
+			goto out;
+		}
+
+		logical_block_start = em->block_start + (start - em->start);
+		len = min(len, em->len - (start - em->start));
+		free_extent_map(em);
+		em = NULL;
+
+		ret = can_nocow_extent(inode, start, &len, NULL, NULL, NULL);
+		if (ret < 0) {
+			goto out;
+		} else if (ret) {
+			ret = 0;
+		} else {
+			btrfs_info(fs_info, "swapfile must not be copy-on-write");
+			ret = -EINVAL;
+			goto out;
+		}
+
+		em = btrfs_get_chunk_map(fs_info, logical_block_start, len);
+		if (IS_ERR(em)) {
+			ret = PTR_ERR(em);
+			goto out;
+		}
+
+		if (em->map_lookup->type & BTRFS_BLOCK_GROUP_PROFILE_MASK) {
+			btrfs_info(fs_info, "swapfile must have single data profile");
+			ret = -EINVAL;
+			goto out;
+		}
+
+		if (device == NULL) {
+			device = em->map_lookup->stripes[0].dev;
+			ret = btrfs_add_swapfile_pin(inode, device, false);
+			if (ret == 1)
+				ret = 0;
+			else if (ret)
+				goto out;
+		} else if (device != em->map_lookup->stripes[0].dev) {
+			btrfs_info(fs_info, "swapfile must be on one device");
+			ret = -EINVAL;
+			goto out;
+		}
+
+		physical_block_start = (em->map_lookup->stripes[0].physical +
+					(logical_block_start - em->start));
+		len = min(len, em->len - (logical_block_start - em->start));
+		free_extent_map(em);
+		em = NULL;
+
+		bg = btrfs_lookup_block_group(fs_info, logical_block_start);
+		if (!bg) {
+			btrfs_info(fs_info, "could not find block group containing swapfile");
+			ret = -EINVAL;
+			goto out;
+		}
+
+		ret = btrfs_add_swapfile_pin(inode, bg, true);
+		if (ret) {
+			btrfs_put_block_group(bg);
+			if (ret == 1)
+				ret = 0;
+			else
+				goto out;
+		}
+
+		if (bsi.block_len &&
+		    bsi.block_start + bsi.block_len == physical_block_start) {
+			bsi.block_len += len;
+		} else {
+			if (bsi.block_len) {
+				ret = btrfs_add_swap_extent(sis, &bsi);
+				if (ret)
+					goto out;
+			}
+			bsi.start = start;
+			bsi.block_start = physical_block_start;
+			bsi.block_len = len;
+		}
+
+		start = end;
+	}
+
+	if (bsi.block_len)
+		ret = btrfs_add_swap_extent(sis, &bsi);
+
+out:
+	if (!IS_ERR_OR_NULL(em))
+		free_extent_map(em);
+
+	unlock_extent_cached(io_tree, 0, isize - 1, &cached_state);
+
+	if (ret)
+		btrfs_swap_deactivate(file);
+
+	clear_bit(BTRFS_FS_EXCL_OP, &fs_info->flags);
+
+	if (ret)
+		return ret;
+
+	if (device)
+		sis->bdev = device->bdev;
+	*span = bsi.highest_ppage - bsi.lowest_ppage + 1;
+	sis->max = bsi.nr_pages;
+	sis->pages = bsi.nr_pages - 1;
+	sis->highest_bit = bsi.nr_pages - 1;
+	return bsi.nr_extents;
+}
+
 static const struct inode_operations btrfs_dir_inode_operations = {
 	.getattr	= btrfs_getattr,
 	.lookup		= btrfs_lookup,
@@ -10514,6 +10828,8 @@ static const struct address_space_operations btrfs_aops = {
 	.releasepage	= btrfs_releasepage,
 	.set_page_dirty	= btrfs_set_page_dirty,
 	.error_remove_page = generic_error_remove_page,
+	.swap_activate	= btrfs_swap_activate,
+	.swap_deactivate = btrfs_swap_deactivate,
 };
 
 static const struct address_space_operations btrfs_symlink_aops = {