From patchwork Thu Feb 10 19:09:51 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Omar Sandoval X-Patchwork-Id: 12742333 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 636D0C433EF for ; Thu, 10 Feb 2022 19:10:34 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1343690AbiBJTKb (ORCPT ); Thu, 10 Feb 2022 14:10:31 -0500 Received: from mxb-00190b01.gslb.pphosted.com ([23.128.96.19]:48142 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S233016AbiBJTKb (ORCPT ); Thu, 10 Feb 2022 14:10:31 -0500 Received: from mail-pf1-x42a.google.com (mail-pf1-x42a.google.com [IPv6:2607:f8b0:4864:20::42a]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id CA162110C for ; Thu, 10 Feb 2022 11:10:31 -0800 (PST) Received: by mail-pf1-x42a.google.com with SMTP id y8so9221897pfa.11 for ; Thu, 10 Feb 2022 11:10:31 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=osandov-com.20210112.gappssmtp.com; s=20210112; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=8m2h/Layt/oGDX9jxdwkU6c4xbJMzjOpl2iC5A6mjQo=; b=ZCBvEhvR9jFaSN62oKt9ev1GvVrSt63z/C7xcEqh/KFeRZnwxEv+MjPHUUq9bW/kWi uHYrnpAgyPpEAbjcy92R19/YZWiOy6GIoth/OedcghpeMl0NK5FuasZt66wKOOORhk1R NTaP9IyZJRuUdX2e3gY7FUzjnBZMbVhLnTViwlpYgoN8nCVGAupUwDiIjpsKU3Bz2O1/ kr3I6EB/FR3FQyPdwVvlq2fY7jzyt0PrNGN+deg5sWrXN7ezKDPSzpeIhgf35XICzc7R vAnj3SR2UMTSHZQaWF4wdGa0AYODVsNwWmnNp9hdI/ynz3yv3yyljY4IQoFsbN9Rtys6 otJg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=8m2h/Layt/oGDX9jxdwkU6c4xbJMzjOpl2iC5A6mjQo=; b=rSJyPAGsmOI8kd2QnvPqFMn97OZ327xCZwmcraCW65i5WkKxEnyNKiX1jqD90FJQV5 nHQYrPEcqZ3TelaNIm1nEj7hBYeFae6r2qX0lqNuadHJp/dzx7i+8aECdMoAgnWA5C2E j7Nm4uYcP1sME9dllcA8VG9fCkMD7dyMc6RbEITAhFQCIN/KMMWeNMnRkgamT43uM3kn IH3FkaozT41xNWCgPaKMx7GDV1kUR/SNp90zkBbe/TJG9oJQ34i0PlFavjaXmap2j6vh d7owP1FwVCT95nE5c0GRlMOgAQEFJweLQJ/0yOgnHFA12lb2xaRSMEnC3QLmn7foBhLH HF5w== X-Gm-Message-State: AOAM531wXjQWi9BZOwX1KXR72Oh86OVyP3QJOtpebG5Vl+D5xgKIwvu3 x/hYqjdsbx0Ma6rXZw5e3wfSAIZpGrQ3uw== X-Google-Smtp-Source: ABdhPJzQ5kzrVEpBOiOTKvltHpgdPufkCuqlmP1Q/JpB4X77pqqGhtz4OXScIm32hxjLNpQrrqvxxw== X-Received: by 2002:a05:6a00:16d3:: with SMTP id l19mr8915359pfc.7.1644520230998; Thu, 10 Feb 2022 11:10:30 -0800 (PST) Received: from relinquished.tfbnw.net ([2620:10d:c090:400::5:1975]) by smtp.gmail.com with ESMTPSA id n5sm8315898pgt.22.2022.02.10.11.10.30 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 10 Feb 2022 11:10:30 -0800 (PST) From: Omar Sandoval To: linux-btrfs@vger.kernel.org Cc: kernel-team@fb.com Subject: [PATCH v13 01/17] fs: export rw_verify_area() Date: Thu, 10 Feb 2022 11:09:51 -0800 Message-Id: X-Mailer: git-send-email 2.35.1 In-Reply-To: References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org From: Omar Sandoval I'm adding Btrfs ioctls to read and write compressed data, and rather than duplicating the checks in rw_verify_area(), let's just export it. Reviewed-by: Josef Bacik Signed-off-by: Omar Sandoval --- fs/internal.h | 5 ----- fs/read_write.c | 1 + include/linux/fs.h | 1 + 3 files changed, 2 insertions(+), 5 deletions(-) diff --git a/fs/internal.h b/fs/internal.h index 8590c973c2f4..711bdc00ec7c 100644 --- a/fs/internal.h +++ b/fs/internal.h @@ -157,11 +157,6 @@ extern char *simple_dname(struct dentry *, char *, int); extern void dput_to_list(struct dentry *, struct list_head *); extern void shrink_dentry_list(struct list_head *); -/* - * read_write.c - */ -extern int rw_verify_area(int, struct file *, const loff_t *, size_t); - /* * pipe.c */ diff --git a/fs/read_write.c b/fs/read_write.c index 0074afa7ecb3..4d60146243df 100644 --- a/fs/read_write.c +++ b/fs/read_write.c @@ -385,6 +385,7 @@ int rw_verify_area(int read_write, struct file *file, const loff_t *ppos, size_t return security_file_permission(file, read_write == READ ? MAY_READ : MAY_WRITE); } +EXPORT_SYMBOL(rw_verify_area); static ssize_t new_sync_read(struct file *filp, char __user *buf, size_t len, loff_t *ppos) { diff --git a/include/linux/fs.h b/include/linux/fs.h index e2d892b201b0..4deb42c326ba 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -3173,6 +3173,7 @@ extern loff_t fixed_size_llseek(struct file *file, loff_t offset, int whence, loff_t size); extern loff_t no_seek_end_llseek_size(struct file *, loff_t, int, loff_t); extern loff_t no_seek_end_llseek(struct file *, loff_t, int); +extern int rw_verify_area(int, struct file *, const loff_t *, size_t); extern int generic_file_open(struct inode * inode, struct file * filp); extern int nonseekable_open(struct inode * inode, struct file * filp); extern int stream_open(struct inode * inode, struct file * filp); From patchwork Thu Feb 10 19:09:52 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Omar Sandoval X-Patchwork-Id: 12742334 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 2FE0AC433FE for ; Thu, 10 Feb 2022 19:10:35 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1343721AbiBJTKc (ORCPT ); Thu, 10 Feb 2022 14:10:32 -0500 Received: from mxb-00190b01.gslb.pphosted.com ([23.128.96.19]:48148 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1343710AbiBJTKb (ORCPT ); Thu, 10 Feb 2022 14:10:31 -0500 Received: from mail-pj1-x1032.google.com (mail-pj1-x1032.google.com [IPv6:2607:f8b0:4864:20::1032]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id D61F210BA for ; Thu, 10 Feb 2022 11:10:32 -0800 (PST) Received: by mail-pj1-x1032.google.com with SMTP id c5-20020a17090a1d0500b001b904a7046dso8012377pjd.1 for ; Thu, 10 Feb 2022 11:10:32 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=osandov-com.20210112.gappssmtp.com; s=20210112; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=uOABk8mNzBKfgcWHyLCqTK+OXP9+oV+Aq9izhbHg64Q=; b=V0p1ePfjpZ7FSs49PS6VmQzcdHMQoEK0IXwS6Yjox3GLNfNM8Hzil+TvU5a6JTmBjd Ygo5B6xmzYd0yO89/z4p+yR8Ge+DFwTD+C9mBaks2yx12erVU5PFqqvKCeLsKXJxpYRX 6U3IYt4pT16djU3ZzrvgEVE8KURXLbN3cm1dsBNSOCV3ZDzRtPOLjjk0EeyYppAF+dK3 +WeecVq8FnpUTEYEK0qcBO01JCagsmVehBDI7HL/TzImXWwcHNYyovKLd922G+e2FYKQ 3ESV4D6Ft+2z2nqibN6BgWCjufs1BRI3T7uHTvgP7P5qI80WYTL+t25BzTiY6u4bCTE1 IN6w== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=uOABk8mNzBKfgcWHyLCqTK+OXP9+oV+Aq9izhbHg64Q=; b=EHlEiYRnBHulJfNVwOakWIqSypBwMFMjNJvH5ukoodFQ5yyIfPhOFUvWmTwqqRaDPw AktVbqOdP6hcy9R51QneiM8C1YwtGurGJBewWgCwK+cudrnIAH80uPHIFWTiUR419xKf KrhUu5IDy3Ovc69056cRlMOD5qcbEUB3p1bMPBIZ+kUKTYIXakCOlUss/lpr/1zEqsBb SOiEllhln8vuxgEbRLBNKgQjYiNwKMOLvBh++iB6tr8bn2xLwz6bL7W9mP+w4GwdPOs9 8Ue0mQ8nZKjbU8YK9g7ZWewIGDLLdF9bzIB3BBKPCPuw2E8D09QIkTnAI0CQ0Xe32FYn 6XgQ== X-Gm-Message-State: AOAM530GMv78UUf+G5LxzNEKtF9JKtiCJH6kPYQ/YVmJQqLV43hptbCe 9JBnb2CUHaFCGLhZIZxouy+ghdU6nfUGvQ== X-Google-Smtp-Source: ABdhPJw0oMLy3q5Y5HJpwHL0N5ip3O/txBx8ukVa2+JFPlbV6SPwf0bkSFylYQD1772PRWonC8RwDA== X-Received: by 2002:a17:902:ce02:: with SMTP id k2mr5463858plg.93.1644520231898; Thu, 10 Feb 2022 11:10:31 -0800 (PST) Received: from relinquished.tfbnw.net ([2620:10d:c090:400::5:1975]) by smtp.gmail.com with ESMTPSA id n5sm8315898pgt.22.2022.02.10.11.10.31 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 10 Feb 2022 11:10:31 -0800 (PST) From: Omar Sandoval To: linux-btrfs@vger.kernel.org Cc: kernel-team@fb.com Subject: [PATCH v13 02/17] fs: export variant of generic_write_checks without iov_iter Date: Thu, 10 Feb 2022 11:09:52 -0800 Message-Id: X-Mailer: git-send-email 2.35.1 In-Reply-To: References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org From: Omar Sandoval Encoded I/O in Btrfs needs to check a write with a given logical size without an iov_iter that matches that size (because the iov_iter we have is for the compressed data). So, factor out the parts of generic_write_check() that don't need an iov_iter into a new generic_write_checks_count() function and export that. Reviewed-by: Nikolay Borisov Signed-off-by: Omar Sandoval --- fs/read_write.c | 33 ++++++++++++++++++++------------- include/linux/fs.h | 1 + 2 files changed, 21 insertions(+), 13 deletions(-) diff --git a/fs/read_write.c b/fs/read_write.c index 4d60146243df..dc5000173b80 100644 --- a/fs/read_write.c +++ b/fs/read_write.c @@ -1618,24 +1618,16 @@ int generic_write_check_limits(struct file *file, loff_t pos, loff_t *count) return 0; } -/* - * Performs necessary checks before doing a write - * - * Can adjust writing position or amount of bytes to write. - * Returns appropriate error code that caller should return or - * zero in case that write should be allowed. - */ -ssize_t generic_write_checks(struct kiocb *iocb, struct iov_iter *from) +/* Like generic_write_checks(), but takes size of write instead of iter. */ +int generic_write_checks_count(struct kiocb *iocb, loff_t *count) { struct file *file = iocb->ki_filp; struct inode *inode = file->f_mapping->host; - loff_t count; - int ret; if (IS_SWAPFILE(inode)) return -ETXTBSY; - if (!iov_iter_count(from)) + if (!*count) return 0; /* FIXME: this is for backwards compatibility with 2.4 */ @@ -1645,8 +1637,23 @@ ssize_t generic_write_checks(struct kiocb *iocb, struct iov_iter *from) if ((iocb->ki_flags & IOCB_NOWAIT) && !(iocb->ki_flags & IOCB_DIRECT)) return -EINVAL; - count = iov_iter_count(from); - ret = generic_write_check_limits(file, iocb->ki_pos, &count); + return generic_write_check_limits(iocb->ki_filp, iocb->ki_pos, count); +} +EXPORT_SYMBOL(generic_write_checks_count); + +/* + * Performs necessary checks before doing a write + * + * Can adjust writing position or amount of bytes to write. + * Returns appropriate error code that caller should return or + * zero in case that write should be allowed. + */ +ssize_t generic_write_checks(struct kiocb *iocb, struct iov_iter *from) +{ + loff_t count = iov_iter_count(from); + int ret; + + ret = generic_write_checks_count(iocb, &count); if (ret) return ret; diff --git a/include/linux/fs.h b/include/linux/fs.h index 4deb42c326ba..ce5aabb609cd 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -3130,6 +3130,7 @@ extern int sb_min_blocksize(struct super_block *, int); extern int generic_file_mmap(struct file *, struct vm_area_struct *); extern int generic_file_readonly_mmap(struct file *, struct vm_area_struct *); extern ssize_t generic_write_checks(struct kiocb *, struct iov_iter *); +extern int generic_write_checks_count(struct kiocb *iocb, loff_t *count); extern int generic_write_check_limits(struct file *file, loff_t pos, loff_t *count); extern int generic_file_rw_checks(struct file *file_in, struct file *file_out); From patchwork Thu Feb 10 19:09:53 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Omar Sandoval X-Patchwork-Id: 12742335 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 43AF3C4332F for ; Thu, 10 Feb 2022 19:10:36 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1343723AbiBJTKd (ORCPT ); Thu, 10 Feb 2022 14:10:33 -0500 Received: from mxb-00190b01.gslb.pphosted.com ([23.128.96.19]:48158 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1343722AbiBJTKd (ORCPT ); Thu, 10 Feb 2022 14:10:33 -0500 Received: from mail-pj1-x1035.google.com (mail-pj1-x1035.google.com [IPv6:2607:f8b0:4864:20::1035]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 8FCEE10BA for ; Thu, 10 Feb 2022 11:10:33 -0800 (PST) Received: by mail-pj1-x1035.google.com with SMTP id v5-20020a17090a4ec500b001b8b702df57so9563818pjl.2 for ; Thu, 10 Feb 2022 11:10:33 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=osandov-com.20210112.gappssmtp.com; s=20210112; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=3Zh2+pk+XRnuuNmR/Zh4xLoNRRCKPCTCf8vQ0e4Mxpk=; b=cl/RCPCNfKsPCRLAXYDy5uhuxYJomsAVSMPqTSmra6kVAsQCcghFXJMBkSLgKx2CuL TmE6AZDAv/ec5Z82WOwswCb2ji9y+gWwalDV6ihNAYMoK/bv0xL4Ojn/HXmZbkzcCkGh dnZCb5ALs5Snb0Q8A6xh13Y1KuLyK30oXy1lyu+sizF1rMW7Dajqbo9gic4TgdyUAVWv ZzSunbdQ9/PacK6m8QeysTYL+c0Z8hRADGincWGq+kIKGELIUPvEoJAOM6bT4mk2tlem /VWSulZyCtaSHSqXres3rY8N8fjMjXhnMaRTpSlavxbbj0hwGoHxZ7yZKusxAFw2j8OO QslQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=3Zh2+pk+XRnuuNmR/Zh4xLoNRRCKPCTCf8vQ0e4Mxpk=; b=8HEXFVF6YbW+SMy5vjqcb6rQKWXAkWrfntPIzaOt04UZSDr4A0uFSUB11Wq+i07Cs5 BdS/9MPPYvi987ZNs2SpgmQ0Q0UWAu/LNt+kmL2E4HB/2/9UZPhqJdbr9HAeUJKdFe8t EPT1oHrtAn+pXCg4tVr8f9SXzx6dt/HNccnQTJINqj5JOxYfrURXhi+kGPZxob1L1Qp4 YQ2OVDeydno2l7tlLwO6OJT9ozW6/p4QPzFhP4EeT4THiduE1kqqXyqH5xTPlzHBA+dB UJQXdHRsiuj+yx5YwUii6dFAcQBfchBi6i/XwaUZhySWT3Sloph9EZWyyOF86z0yCN20 HT4g== X-Gm-Message-State: AOAM530GX7PmyiQeX6d3MaNNbsSlvS0FcgXp1kvyhMog+JP3XQHigOdv SYeZEngp16erD9bKqhry05jdGAJ/jnxAiA== X-Google-Smtp-Source: ABdhPJwe1MpluiPgXslawYXVlKamMS5uR0WXGkARAt/Sq6sGScXayg/vkh6MocKthGNpW02r19kyLw== X-Received: by 2002:a17:902:e84a:: with SMTP id t10mr8768698plg.25.1644520232751; Thu, 10 Feb 2022 11:10:32 -0800 (PST) Received: from relinquished.tfbnw.net ([2620:10d:c090:400::5:1975]) by smtp.gmail.com with ESMTPSA id n5sm8315898pgt.22.2022.02.10.11.10.32 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 10 Feb 2022 11:10:32 -0800 (PST) From: Omar Sandoval To: linux-btrfs@vger.kernel.org Cc: kernel-team@fb.com Subject: [PATCH v13 03/17] btrfs: don't advance offset for compressed bios in btrfs_csum_one_bio() Date: Thu, 10 Feb 2022 11:09:53 -0800 Message-Id: <723131ef561f7dcf676c1ca30fd1afa6c73048d3.1644519257.git.osandov@fb.com> X-Mailer: git-send-email 2.35.1 In-Reply-To: References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org From: Omar Sandoval btrfs_csum_one_bio() loops over each filesystem block in the bio while keeping a cursor of its current logical position in the file in order to look up the ordered extent to add the checksums to. However, this doesn't make much sense for compressed extents, as a sector on disk does not correspond to a sector of decompressed file data. It happens to work because 1) the compressed bio always covers one ordered extent and 2) the size of the bio is always less than the size of the ordered extent. However, the second point will not always be true for encoded writes. Let's add a boolean parameter to btrfs_csum_one_bio() to indicate that it can assume that the bio only covers one ordered extent. Since we're already changing the signature, let's get rid of the contig parameter and make it implied by the offset parameter, similar to the change we recently made to btrfs_lookup_bio_sums(). Additionally, let's rename nr_sectors to blockcount to make it clear that it's the number of filesystem blocks, not the number of 512-byte sectors. Reviewed-by: Josef Bacik Reviewed-by: Nikolay Borisov Signed-off-by: Omar Sandoval --- fs/btrfs/compression.c | 3 ++- fs/btrfs/ctree.h | 2 +- fs/btrfs/file-item.c | 32 ++++++++++++++------------------ fs/btrfs/inode.c | 8 ++++---- 4 files changed, 21 insertions(+), 24 deletions(-) diff --git a/fs/btrfs/compression.c b/fs/btrfs/compression.c index 71e5b2e9a1ba..4bd4cdf866aa 100644 --- a/fs/btrfs/compression.c +++ b/fs/btrfs/compression.c @@ -591,7 +591,8 @@ blk_status_t btrfs_submit_compressed_write(struct btrfs_inode *inode, u64 start, if (submit) { if (!skip_sum) { - ret = btrfs_csum_one_bio(inode, bio, start, 1); + ret = btrfs_csum_one_bio(inode, bio, start, + true); if (ret) goto finish_cb; } diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h index 9f7a950b8a69..65c0d67cd286 100644 --- a/fs/btrfs/ctree.h +++ b/fs/btrfs/ctree.h @@ -3193,7 +3193,7 @@ int btrfs_csum_file_blocks(struct btrfs_trans_handle *trans, struct btrfs_root *root, struct btrfs_ordered_sum *sums); blk_status_t btrfs_csum_one_bio(struct btrfs_inode *inode, struct bio *bio, - u64 file_start, int contig); + u64 offset, bool one_ordered); int btrfs_lookup_csums_range(struct btrfs_root *root, u64 start, u64 end, struct list_head *list, int search_commit); void btrfs_extent_item_to_extent_map(struct btrfs_inode *inode, diff --git a/fs/btrfs/file-item.c b/fs/btrfs/file-item.c index 9a3de652ada8..cda113727bd2 100644 --- a/fs/btrfs/file-item.c +++ b/fs/btrfs/file-item.c @@ -616,28 +616,28 @@ int btrfs_lookup_csums_range(struct btrfs_root *root, u64 start, u64 end, * btrfs_csum_one_bio - Calculates checksums of the data contained inside a bio * @inode: Owner of the data inside the bio * @bio: Contains the data to be checksummed - * @file_start: offset in file this bio begins to describe - * @contig: Boolean. If true/1 means all bio vecs in this bio are - * contiguous and they begin at @file_start in the file. False/0 - * means this bio can contain potentially discontiguous bio vecs - * so the logical offset of each should be calculated separately. + * @offset: If (u64)-1, @bio may contain discontiguous bio vecs, so the + * file offsets are determined from the page offsets in the bio. + * Otherwise, this is the starting file offset of the bio vecs in + * @bio, which must be contiguous. + * @one_ordered: If true, @bio only refers to one ordered extent. */ blk_status_t btrfs_csum_one_bio(struct btrfs_inode *inode, struct bio *bio, - u64 file_start, int contig) + u64 offset, bool one_ordered) { struct btrfs_fs_info *fs_info = inode->root->fs_info; SHASH_DESC_ON_STACK(shash, fs_info->csum_shash); struct btrfs_ordered_sum *sums; struct btrfs_ordered_extent *ordered = NULL; + const bool use_page_offsets = (offset == (u64)-1); char *data; struct bvec_iter iter; struct bio_vec bvec; int index; - int nr_sectors; + int blockcount; unsigned long total_bytes = 0; unsigned long this_sum_bytes = 0; int i; - u64 offset; unsigned nofs_flag; nofs_flag = memalloc_nofs_save(); @@ -651,18 +651,13 @@ blk_status_t btrfs_csum_one_bio(struct btrfs_inode *inode, struct bio *bio, sums->len = bio->bi_iter.bi_size; INIT_LIST_HEAD(&sums->list); - if (contig) - offset = file_start; - else - offset = 0; /* shut up gcc */ - sums->bytenr = bio->bi_iter.bi_sector << 9; index = 0; shash->tfm = fs_info->csum_shash; bio_for_each_segment(bvec, bio, iter) { - if (!contig) + if (use_page_offsets) offset = page_offset(bvec.bv_page) + bvec.bv_offset; if (!ordered) { @@ -681,13 +676,14 @@ blk_status_t btrfs_csum_one_bio(struct btrfs_inode *inode, struct bio *bio, } } - nr_sectors = BTRFS_BYTES_TO_BLKS(fs_info, + blockcount = BTRFS_BYTES_TO_BLKS(fs_info, bvec.bv_len + fs_info->sectorsize - 1); - for (i = 0; i < nr_sectors; i++) { - if (offset >= ordered->file_offset + ordered->num_bytes || - offset < ordered->file_offset) { + for (i = 0; i < blockcount; i++) { + if (!one_ordered && + !in_range(offset, ordered->file_offset, + ordered->num_bytes)) { unsigned long bytes_left; sums->len = this_sum_bytes; diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index 24099fe9e120..1fadbdc25168 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -2314,7 +2314,7 @@ void btrfs_clear_delalloc_extent(struct inode *vfs_inode, static blk_status_t btrfs_submit_bio_start(struct inode *inode, struct bio *bio, u64 dio_file_offset) { - return btrfs_csum_one_bio(BTRFS_I(inode), bio, 0, 0); + return btrfs_csum_one_bio(BTRFS_I(inode), bio, (u64)-1, false); } /* @@ -2566,7 +2566,7 @@ blk_status_t btrfs_submit_data_bio(struct inode *inode, struct bio *bio, 0, btrfs_submit_bio_start); goto out; } else if (!skip_sum) { - ret = btrfs_csum_one_bio(BTRFS_I(inode), bio, 0, 0); + ret = btrfs_csum_one_bio(BTRFS_I(inode), bio, (u64)-1, false); if (ret) goto out; } @@ -7817,7 +7817,7 @@ static blk_status_t btrfs_submit_bio_start_direct_io(struct inode *inode, struct bio *bio, u64 dio_file_offset) { - return btrfs_csum_one_bio(BTRFS_I(inode), bio, dio_file_offset, 1); + return btrfs_csum_one_bio(BTRFS_I(inode), bio, dio_file_offset, false); } static void btrfs_end_dio_bio(struct bio *bio) @@ -7874,7 +7874,7 @@ static inline blk_status_t btrfs_submit_dio_bio(struct bio *bio, * If we aren't doing async submit, calculate the csum of the * bio now. */ - ret = btrfs_csum_one_bio(BTRFS_I(inode), bio, file_offset, 1); + ret = btrfs_csum_one_bio(BTRFS_I(inode), bio, file_offset, false); if (ret) goto err; } else { From patchwork Thu Feb 10 19:09:54 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Omar Sandoval X-Patchwork-Id: 12742336 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id D92DEC433EF for ; Thu, 10 Feb 2022 19:10:37 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1343728AbiBJTKf (ORCPT ); Thu, 10 Feb 2022 14:10:35 -0500 Received: from mxb-00190b01.gslb.pphosted.com ([23.128.96.19]:48176 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1343722AbiBJTKe (ORCPT ); Thu, 10 Feb 2022 14:10:34 -0500 Received: from mail-pf1-x434.google.com (mail-pf1-x434.google.com [IPv6:2607:f8b0:4864:20::434]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 94EEE110C for ; Thu, 10 Feb 2022 11:10:34 -0800 (PST) Received: by mail-pf1-x434.google.com with SMTP id a39so11107029pfx.7 for ; Thu, 10 Feb 2022 11:10:34 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=osandov-com.20210112.gappssmtp.com; s=20210112; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=q8YOMMz5NNSiy3+dYMc4caCrdF6tu4FlTUaiG0w5KUk=; b=RmBGSB1b4GGHLQhsV4U1VaLTWrmpVFNDF0Rs5wc/609YJx1Tbx13oIs1UwFr6br3HT ASZ+xDD+8dTvbCuuiPC1mrwERzHG8u6KRQeDMlhKQFM2DJsfBfhzLbcRkSZGYDjXYYGi tSs8tHfUhx1uiq7uMSS1xrWD2BOrdKqR8hmZlqFXXV0dVlpQ+7MdDoTkayRZONet9iOr Qz3dr0J5sMO0rMczkBRJdODIoUhZgNs6B1SaqZepVzntciJkzm6m18U6gywdeJ5oMOY5 c4buqBOtART1HTdY6A0KmZh2tm/z/OF2UYWyi+vb9ADI2Lj5ROQgjyzIfMUN1n0pytxq GnAQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=q8YOMMz5NNSiy3+dYMc4caCrdF6tu4FlTUaiG0w5KUk=; b=nChPuGKmCZgR6i0pzQLnaDzrhZcRWomh0cD/cS1PxhHRo2tKfRAuPath8P8SgXZh0t qZ5c4F2Neb28aPAG8EfMAyVFW77stQFesq5E+ptKUNiivoF5JCCClRTmrhyc4qpnOOI7 25b+Lwj0fNgtBHa1gjHVCJLl+RtKyELdF/5eLbClx168Kc5bfHlyTzOtLWdIJpKe49Rn FajExTtYkvutyFjp9ZfO4EKGfaLxLZLdqBcDn0rSYudz7ta0IT+E1R421EKM/cdFuuGw eADexxpsIzh32+L5zieoExaM2yFZ+hSRuLuvk2XCNZh+xwNTUouxfGL4GmDLhGnDtPds Mjwg== X-Gm-Message-State: AOAM5303YonxYuFjeEY1jYPXvz7LycNMJA+gViAn1kVbGsu5c3bO4rln 1MyLoyX1WzK42PZChutXgDgsW9UyV1LnRA== X-Google-Smtp-Source: ABdhPJxucEf5MDuJn8BTLZTK1EtEDbfu29IOEoHXi9KRBfPU/EpyWAZzlVCtF59KQd6SO0sUsxvlDQ== X-Received: by 2002:a05:6a00:21cb:: with SMTP id t11mr8907523pfj.16.1644520233683; Thu, 10 Feb 2022 11:10:33 -0800 (PST) Received: from relinquished.tfbnw.net ([2620:10d:c090:400::5:1975]) by smtp.gmail.com with ESMTPSA id n5sm8315898pgt.22.2022.02.10.11.10.32 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 10 Feb 2022 11:10:33 -0800 (PST) From: Omar Sandoval To: linux-btrfs@vger.kernel.org Cc: kernel-team@fb.com Subject: [PATCH v13 04/17] btrfs: add ram_bytes and offset to btrfs_ordered_extent Date: Thu, 10 Feb 2022 11:09:54 -0800 Message-Id: <0b3f7ed73c87963a3ab5513494e9a3d4d248c59c.1644519257.git.osandov@fb.com> X-Mailer: git-send-email 2.35.1 In-Reply-To: References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org From: Omar Sandoval Currently, we only create ordered extents when ram_bytes == num_bytes and offset == 0. However, BTRFS_IOC_ENCODED_WRITE writes may create extents which only refer to a subset of the full unencoded extent, so we need to plumb these fields through the ordered extent infrastructure and pass them down to insert_reserved_file_extent(). Since we're changing the btrfs_add_ordered_extent* signature, let's get rid of the trivial wrappers and add a kernel-doc. Reviewed-by: Nikolay Borisov Reviewed-by: Josef Bacik Signed-off-by: Omar Sandoval --- fs/btrfs/inode.c | 58 ++++++++++++-------- fs/btrfs/ordered-data.c | 119 ++++++++++++---------------------------- fs/btrfs/ordered-data.h | 22 +++++--- 3 files changed, 82 insertions(+), 117 deletions(-) diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index 1fadbdc25168..55fa13221f5c 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -985,11 +985,14 @@ static int submit_one_async_extent(struct btrfs_inode *inode, } free_extent_map(em); - ret = btrfs_add_ordered_extent_compress(inode, start, /* file_offset */ - ins.objectid, /* disk_bytenr */ - async_extent->ram_size, /* num_bytes */ - ins.offset, /* disk_num_bytes */ - async_extent->compress_type); + ret = btrfs_add_ordered_extent(inode, start, /* file_offset */ + async_extent->ram_size, /* num_bytes */ + async_extent->ram_size, /* ram_bytes */ + ins.objectid, /* disk_bytenr */ + ins.offset, /* disk_num_bytes */ + 0, /* offset */ + 1 << BTRFS_ORDERED_COMPRESSED, + async_extent->compress_type); if (ret) { btrfs_drop_extent_cache(inode, start, end, 0); goto out_free_reserve; @@ -1238,9 +1241,10 @@ static noinline int cow_file_range(struct btrfs_inode *inode, } free_extent_map(em); - ret = btrfs_add_ordered_extent(inode, start, ins.objectid, - ram_size, cur_alloc_size, - BTRFS_ORDERED_REGULAR); + ret = btrfs_add_ordered_extent(inode, start, ram_size, ram_size, + ins.objectid, cur_alloc_size, 0, + 1 << BTRFS_ORDERED_REGULAR, + BTRFS_COMPRESS_NONE); if (ret) goto out_drop_extent_cache; @@ -1899,10 +1903,11 @@ static noinline int run_delalloc_nocow(struct btrfs_inode *inode, goto error; } free_extent_map(em); - ret = btrfs_add_ordered_extent(inode, cur_offset, - disk_bytenr, num_bytes, - num_bytes, - BTRFS_ORDERED_PREALLOC); + ret = btrfs_add_ordered_extent(inode, + cur_offset, num_bytes, num_bytes, + disk_bytenr, num_bytes, 0, + 1 << BTRFS_ORDERED_PREALLOC, + BTRFS_COMPRESS_NONE); if (ret) { btrfs_drop_extent_cache(inode, cur_offset, cur_offset + num_bytes - 1, @@ -1911,9 +1916,11 @@ static noinline int run_delalloc_nocow(struct btrfs_inode *inode, } } else { ret = btrfs_add_ordered_extent(inode, cur_offset, + num_bytes, num_bytes, disk_bytenr, num_bytes, - num_bytes, - BTRFS_ORDERED_NOCOW); + 0, + 1 << BTRFS_ORDERED_NOCOW, + BTRFS_COMPRESS_NONE); if (ret) goto error; } @@ -2874,6 +2881,7 @@ static int insert_reserved_file_extent(struct btrfs_trans_handle *trans, struct btrfs_key ins; u64 disk_num_bytes = btrfs_stack_file_extent_disk_num_bytes(stack_fi); u64 disk_bytenr = btrfs_stack_file_extent_disk_bytenr(stack_fi); + u64 offset = btrfs_stack_file_extent_offset(stack_fi); u64 num_bytes = btrfs_stack_file_extent_num_bytes(stack_fi); u64 ram_bytes = btrfs_stack_file_extent_ram_bytes(stack_fi); struct btrfs_drop_extents_args drop_args = { 0 }; @@ -2948,7 +2956,8 @@ static int insert_reserved_file_extent(struct btrfs_trans_handle *trans, goto out; ret = btrfs_alloc_reserved_file_extent(trans, root, btrfs_ino(inode), - file_pos, qgroup_reserved, &ins); + file_pos - offset, + qgroup_reserved, &ins); out: btrfs_free_path(path); @@ -2974,20 +2983,20 @@ static int insert_ordered_extent_file_extent(struct btrfs_trans_handle *trans, struct btrfs_ordered_extent *oe) { struct btrfs_file_extent_item stack_fi; - u64 logical_len; bool update_inode_bytes; + u64 num_bytes = oe->num_bytes; + u64 ram_bytes = oe->ram_bytes; memset(&stack_fi, 0, sizeof(stack_fi)); btrfs_set_stack_file_extent_type(&stack_fi, BTRFS_FILE_EXTENT_REG); btrfs_set_stack_file_extent_disk_bytenr(&stack_fi, oe->disk_bytenr); btrfs_set_stack_file_extent_disk_num_bytes(&stack_fi, oe->disk_num_bytes); + btrfs_set_stack_file_extent_offset(&stack_fi, oe->offset); if (test_bit(BTRFS_ORDERED_TRUNCATED, &oe->flags)) - logical_len = oe->truncated_len; - else - logical_len = oe->num_bytes; - btrfs_set_stack_file_extent_num_bytes(&stack_fi, logical_len); - btrfs_set_stack_file_extent_ram_bytes(&stack_fi, logical_len); + num_bytes = ram_bytes = oe->truncated_len; + btrfs_set_stack_file_extent_num_bytes(&stack_fi, num_bytes); + btrfs_set_stack_file_extent_ram_bytes(&stack_fi, ram_bytes); btrfs_set_stack_file_extent_compression(&stack_fi, oe->compress_type); /* Encryption and other encoding is reserved and all 0 */ @@ -7054,8 +7063,11 @@ static struct extent_map *btrfs_create_dio_extent(struct btrfs_inode *inode, if (IS_ERR(em)) goto out; } - ret = btrfs_add_ordered_extent_dio(inode, start, block_start, len, - block_len, type); + ret = btrfs_add_ordered_extent(inode, start, len, len, block_start, + block_len, 0, + (1 << type) | + (1 << BTRFS_ORDERED_DIRECT), + BTRFS_COMPRESS_NONE); if (ret) { if (em) { free_extent_map(em); diff --git a/fs/btrfs/ordered-data.c b/fs/btrfs/ordered-data.c index 6b51fd2ec5ac..5e4c59b00b01 100644 --- a/fs/btrfs/ordered-data.c +++ b/fs/btrfs/ordered-data.c @@ -143,16 +143,27 @@ static inline struct rb_node *tree_search(struct btrfs_ordered_inode_tree *tree, return ret; } -/* - * Allocate and add a new ordered_extent into the per-inode tree. +/** + * btrfs_add_ordered_extent - Add an ordered extent to the per-inode tree. + * @inode: inode that this extent is for. + * @file_offset: Logical offset in file where the extent starts. + * @num_bytes: Logical length of extent in file. + * @ram_bytes: Full length of unencoded data. + * @disk_bytenr: Offset of extent on disk. + * @disk_num_bytes: Size of extent on disk. + * @offset: Offset into unencoded data where file data starts. + * @flags: Flags specifying type of extent (1 << BTRFS_ORDERED_*). + * @compress_type: Compression algorithm used for data. * - * The tree is given a single reference on the ordered extent that was - * inserted. + * Most of these parameters correspond to &struct btrfs_file_extent_item. The + * tree is given a single reference on the ordered extent that was inserted. + * + * Return: 0 or -ENOMEM. */ -static int __btrfs_add_ordered_extent(struct btrfs_inode *inode, u64 file_offset, - u64 disk_bytenr, u64 num_bytes, - u64 disk_num_bytes, int type, int dio, - int compress_type) +int btrfs_add_ordered_extent(struct btrfs_inode *inode, u64 file_offset, + u64 num_bytes, u64 ram_bytes, u64 disk_bytenr, + u64 disk_num_bytes, u64 offset, int flags, + int compress_type) { struct btrfs_root *root = inode->root; struct btrfs_fs_info *fs_info = root->fs_info; @@ -161,7 +172,8 @@ static int __btrfs_add_ordered_extent(struct btrfs_inode *inode, u64 file_offset struct btrfs_ordered_extent *entry; int ret; - if (type == BTRFS_ORDERED_NOCOW || type == BTRFS_ORDERED_PREALLOC) { + if (flags & + ((1 << BTRFS_ORDERED_NOCOW) | (1 << BTRFS_ORDERED_PREALLOC))) { /* For nocow write, we can release the qgroup rsv right now */ ret = btrfs_qgroup_free_data(inode, NULL, file_offset, num_bytes); if (ret < 0) @@ -181,9 +193,11 @@ static int __btrfs_add_ordered_extent(struct btrfs_inode *inode, u64 file_offset return -ENOMEM; entry->file_offset = file_offset; - entry->disk_bytenr = disk_bytenr; entry->num_bytes = num_bytes; + entry->ram_bytes = ram_bytes; + entry->disk_bytenr = disk_bytenr; entry->disk_num_bytes = disk_num_bytes; + entry->offset = offset; entry->bytes_left = num_bytes; entry->inode = igrab(&inode->vfs_inode); entry->compress_type = compress_type; @@ -191,18 +205,12 @@ static int __btrfs_add_ordered_extent(struct btrfs_inode *inode, u64 file_offset entry->qgroup_rsv = ret; entry->physical = (u64)-1; - ASSERT(type == BTRFS_ORDERED_REGULAR || - type == BTRFS_ORDERED_NOCOW || - type == BTRFS_ORDERED_PREALLOC || - type == BTRFS_ORDERED_COMPRESSED); - set_bit(type, &entry->flags); + ASSERT((flags & ~BTRFS_ORDERED_TYPE_FLAGS) == 0); + entry->flags = flags; percpu_counter_add_batch(&fs_info->ordered_bytes, num_bytes, fs_info->delalloc_batch); - if (dio) - set_bit(BTRFS_ORDERED_DIRECT, &entry->flags); - /* one ref for the tree */ refcount_set(&entry->refs, 1); init_waitqueue_head(&entry->wait); @@ -247,41 +255,6 @@ static int __btrfs_add_ordered_extent(struct btrfs_inode *inode, u64 file_offset return 0; } -int btrfs_add_ordered_extent(struct btrfs_inode *inode, u64 file_offset, - u64 disk_bytenr, u64 num_bytes, u64 disk_num_bytes, - int type) -{ - ASSERT(type == BTRFS_ORDERED_REGULAR || - type == BTRFS_ORDERED_NOCOW || - type == BTRFS_ORDERED_PREALLOC); - return __btrfs_add_ordered_extent(inode, file_offset, disk_bytenr, - num_bytes, disk_num_bytes, type, 0, - BTRFS_COMPRESS_NONE); -} - -int btrfs_add_ordered_extent_dio(struct btrfs_inode *inode, u64 file_offset, - u64 disk_bytenr, u64 num_bytes, - u64 disk_num_bytes, int type) -{ - ASSERT(type == BTRFS_ORDERED_REGULAR || - type == BTRFS_ORDERED_NOCOW || - type == BTRFS_ORDERED_PREALLOC); - return __btrfs_add_ordered_extent(inode, file_offset, disk_bytenr, - num_bytes, disk_num_bytes, type, 1, - BTRFS_COMPRESS_NONE); -} - -int btrfs_add_ordered_extent_compress(struct btrfs_inode *inode, u64 file_offset, - u64 disk_bytenr, u64 num_bytes, - u64 disk_num_bytes, int compress_type) -{ - ASSERT(compress_type != BTRFS_COMPRESS_NONE); - return __btrfs_add_ordered_extent(inode, file_offset, disk_bytenr, - num_bytes, disk_num_bytes, - BTRFS_ORDERED_COMPRESSED, 0, - compress_type); -} - /* * Add a struct btrfs_ordered_sum into the list of checksums to be inserted * when an ordered extent is finished. If the list covers more than one @@ -1052,42 +1025,18 @@ static int clone_ordered_extent(struct btrfs_ordered_extent *ordered, u64 pos, struct btrfs_fs_info *fs_info = BTRFS_I(inode)->root->fs_info; u64 file_offset = ordered->file_offset + pos; u64 disk_bytenr = ordered->disk_bytenr + pos; - u64 num_bytes = len; - u64 disk_num_bytes = len; - int type; - unsigned long flags_masked = ordered->flags & ~(1 << BTRFS_ORDERED_DIRECT); - int compress_type = ordered->compress_type; - unsigned long weight; - int ret; - - weight = hweight_long(flags_masked); - WARN_ON_ONCE(weight > 1); - if (!weight) - type = 0; - else - type = __ffs(flags_masked); + unsigned long flags = ordered->flags & BTRFS_ORDERED_TYPE_FLAGS; /* - * The splitting extent is already counted and will be added again - * in btrfs_add_ordered_extent_*(). Subtract num_bytes to avoid - * double counting. + * The splitting extent is already counted and will be added again in + * btrfs_add_ordered_extent_*(). Subtract len to avoid double counting. */ - percpu_counter_add_batch(&fs_info->ordered_bytes, -num_bytes, + percpu_counter_add_batch(&fs_info->ordered_bytes, -len, fs_info->delalloc_batch); - if (test_bit(BTRFS_ORDERED_COMPRESSED, &ordered->flags)) { - WARN_ON_ONCE(1); - ret = btrfs_add_ordered_extent_compress(BTRFS_I(inode), - file_offset, disk_bytenr, num_bytes, - disk_num_bytes, compress_type); - } else if (test_bit(BTRFS_ORDERED_DIRECT, &ordered->flags)) { - ret = btrfs_add_ordered_extent_dio(BTRFS_I(inode), file_offset, - disk_bytenr, num_bytes, disk_num_bytes, type); - } else { - ret = btrfs_add_ordered_extent(BTRFS_I(inode), file_offset, - disk_bytenr, num_bytes, disk_num_bytes, type); - } - - return ret; + WARN_ON_ONCE(flags & (1 << BTRFS_ORDERED_COMPRESSED)); + return btrfs_add_ordered_extent(BTRFS_I(inode), file_offset, len, len, + disk_bytenr, len, 0, flags, + ordered->compress_type); } int btrfs_split_ordered_extent(struct btrfs_ordered_extent *ordered, u64 pre, diff --git a/fs/btrfs/ordered-data.h b/fs/btrfs/ordered-data.h index 4194e960ff61..0feb0c29839e 100644 --- a/fs/btrfs/ordered-data.h +++ b/fs/btrfs/ordered-data.h @@ -76,6 +76,13 @@ enum { BTRFS_ORDERED_PENDING, }; +/* BTRFS_ORDERED_* flags that specify the type of the extent. */ +#define BTRFS_ORDERED_TYPE_FLAGS ((1UL << BTRFS_ORDERED_REGULAR) | \ + (1UL << BTRFS_ORDERED_NOCOW) | \ + (1UL << BTRFS_ORDERED_PREALLOC) | \ + (1UL << BTRFS_ORDERED_COMPRESSED) | \ + (1UL << BTRFS_ORDERED_DIRECT)) + struct btrfs_ordered_extent { /* logical offset in the file */ u64 file_offset; @@ -84,9 +91,11 @@ struct btrfs_ordered_extent { * These fields directly correspond to the same fields in * btrfs_file_extent_item. */ - u64 disk_bytenr; u64 num_bytes; + u64 ram_bytes; + u64 disk_bytenr; u64 disk_num_bytes; + u64 offset; /* number of bytes that still need writing */ u64 bytes_left; @@ -179,14 +188,9 @@ bool btrfs_dec_test_ordered_pending(struct btrfs_inode *inode, struct btrfs_ordered_extent **cached, u64 file_offset, u64 io_size); int btrfs_add_ordered_extent(struct btrfs_inode *inode, u64 file_offset, - u64 disk_bytenr, u64 num_bytes, u64 disk_num_bytes, - int type); -int btrfs_add_ordered_extent_dio(struct btrfs_inode *inode, u64 file_offset, - u64 disk_bytenr, u64 num_bytes, - u64 disk_num_bytes, int type); -int btrfs_add_ordered_extent_compress(struct btrfs_inode *inode, u64 file_offset, - u64 disk_bytenr, u64 num_bytes, - u64 disk_num_bytes, int compress_type); + u64 num_bytes, u64 ram_bytes, u64 disk_bytenr, + u64 disk_num_bytes, u64 offset, int flags, + int compress_type); void btrfs_add_ordered_sum(struct btrfs_ordered_extent *entry, struct btrfs_ordered_sum *sum); struct btrfs_ordered_extent *btrfs_lookup_ordered_extent(struct btrfs_inode *inode, From patchwork Thu Feb 10 19:09:55 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Omar Sandoval X-Patchwork-Id: 12742338 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id CCC31C433F5 for ; Thu, 10 Feb 2022 19:10:38 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1343730AbiBJTKg (ORCPT ); Thu, 10 Feb 2022 14:10:36 -0500 Received: from mxb-00190b01.gslb.pphosted.com ([23.128.96.19]:48182 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1343724AbiBJTKe (ORCPT ); Thu, 10 Feb 2022 14:10:34 -0500 Received: from mail-pf1-x42a.google.com (mail-pf1-x42a.google.com [IPv6:2607:f8b0:4864:20::42a]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 59DDA10BA for ; Thu, 10 Feb 2022 11:10:35 -0800 (PST) Received: by mail-pf1-x42a.google.com with SMTP id y8so9222155pfa.11 for ; Thu, 10 Feb 2022 11:10:35 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=osandov-com.20210112.gappssmtp.com; s=20210112; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=Rw1biYAnRjYUCnV9Pn4BbD09yYt/EIi9jnt7jmf9VOk=; b=HtXT6kbtolQBYg1YnlqMdzF8b5onpVlZH1+LJFgZoIm41yay0Ig+cOSXPm1t9Mvj+p R9myaH42Qj8A5xGbBlZIYqCN68WIgAHfZxjMxjvjE+yLjchsmgeh6TDYUvy7c04lMnYV fhxE5Xw03G8YoBdW4XhfUZ1CckeqZcaqT6rxONHhfXxoYjaYIj0i0S+LUol0Fclmsci0 1AIM4X08/FJkT2tHjEwkpT7Yy47cPACJj9EYv9SjrSYQcab1ECcKH7eYlgf6pGz5bYvW dyZK0NDrATJPOurKwYX42T+dglG86UgCBIriECmIYklhBmCS2PGp0XA0k0avIP2m04eA t29Q== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=Rw1biYAnRjYUCnV9Pn4BbD09yYt/EIi9jnt7jmf9VOk=; b=fVNW/CLQcCtAJW6IX2x4kvmf/WWdd0SHlN1In1yyX1ovD2Vs5w8ZBFRvfTd6D5ckHj NrwhxzpZZzKNOwOgACslRaPmUa9FziyvYbgRABuvjBhdxJAhmDBW0Il52rHZz/Fkpcjx bhYQ5HEIbQb3w9IPrGLdlz8Vaa4R8stQYwE4Bn6FscFGXn66zgy0EM+53yz7UgJxLucm OR56SSUtGmeuyxLkGEuWNU5h5O8XHouG0PsHNG9kr4igH/ADugVsRcyb1P02ACpPWCcp irr4wtIJFacHRqIEAcju8mEhs9IkhVELeUvX1TBHIZ/dR2pZHlZSjqZ4zLV2nrCg1Jj8 VYiw== X-Gm-Message-State: AOAM531DjWcASy0cxWBbZL0RPYm7EyFHoAbPyExc+XlA94cbm6+P6Oww YHURGk4hR8P4IC9ToeN3DqBQQpCZ4ZPTtw== X-Google-Smtp-Source: ABdhPJxDDbNaUerYJD5R4eVtw+hPN/vnHXeySVkJVvnN6tvrx9WRTuJ5tpS5yU60gyACpS+fb1zOUw== X-Received: by 2002:a63:fc21:: with SMTP id j33mr962530pgi.557.1644520234611; Thu, 10 Feb 2022 11:10:34 -0800 (PST) Received: from relinquished.tfbnw.net ([2620:10d:c090:400::5:1975]) by smtp.gmail.com with ESMTPSA id n5sm8315898pgt.22.2022.02.10.11.10.33 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 10 Feb 2022 11:10:34 -0800 (PST) From: Omar Sandoval To: linux-btrfs@vger.kernel.org Cc: kernel-team@fb.com Subject: [PATCH v13 05/17] btrfs: support different disk extent size for delalloc Date: Thu, 10 Feb 2022 11:09:55 -0800 Message-Id: X-Mailer: git-send-email 2.35.1 In-Reply-To: References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org From: Omar Sandoval Currently, we always reserve the same extent size in the file and extent size on disk for delalloc because the former is the worst case for the latter. For BTRFS_IOC_ENCODED_WRITE writes, we know the exact size of the extent on disk, which may be less than or greater than (for bookends) the size in the file. Add a disk_num_bytes parameter to btrfs_delalloc_reserve_metadata() so that we can reserve the correct amount of csum bytes. No functional change. Reviewed-by: Nikolay Borisov Reviewed-by: Josef Bacik Signed-off-by: Omar Sandoval --- fs/btrfs/ctree.h | 3 ++- fs/btrfs/delalloc-space.c | 18 ++++++++++-------- fs/btrfs/file.c | 3 ++- fs/btrfs/inode.c | 4 ++-- fs/btrfs/relocation.c | 2 +- 5 files changed, 17 insertions(+), 13 deletions(-) diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h index 65c0d67cd286..a1660f5a8ec6 100644 --- a/fs/btrfs/ctree.h +++ b/fs/btrfs/ctree.h @@ -2877,7 +2877,8 @@ void btrfs_subvolume_release_metadata(struct btrfs_root *root, struct btrfs_block_rsv *rsv); void btrfs_delalloc_release_extents(struct btrfs_inode *inode, u64 num_bytes); -int btrfs_delalloc_reserve_metadata(struct btrfs_inode *inode, u64 num_bytes); +int btrfs_delalloc_reserve_metadata(struct btrfs_inode *inode, u64 num_bytes, + u64 disk_num_bytes); u64 btrfs_account_ro_block_groups_free_space(struct btrfs_space_info *sinfo); int btrfs_error_unpin_extent_range(struct btrfs_fs_info *fs_info, u64 start, u64 end); diff --git a/fs/btrfs/delalloc-space.c b/fs/btrfs/delalloc-space.c index fb46a28f5065..bd8267c4687d 100644 --- a/fs/btrfs/delalloc-space.c +++ b/fs/btrfs/delalloc-space.c @@ -270,11 +270,11 @@ static void btrfs_calculate_inode_block_rsv_size(struct btrfs_fs_info *fs_info, } static void calc_inode_reservations(struct btrfs_fs_info *fs_info, - u64 num_bytes, u64 *meta_reserve, - u64 *qgroup_reserve) + u64 num_bytes, u64 disk_num_bytes, + u64 *meta_reserve, u64 *qgroup_reserve) { u64 nr_extents = count_max_extents(num_bytes); - u64 csum_leaves = btrfs_csum_bytes_to_leaves(fs_info, num_bytes); + u64 csum_leaves = btrfs_csum_bytes_to_leaves(fs_info, disk_num_bytes); u64 inode_update = btrfs_calc_metadata_size(fs_info, 1); *meta_reserve = btrfs_calc_insert_metadata_size(fs_info, @@ -288,7 +288,8 @@ static void calc_inode_reservations(struct btrfs_fs_info *fs_info, *qgroup_reserve = nr_extents * fs_info->nodesize; } -int btrfs_delalloc_reserve_metadata(struct btrfs_inode *inode, u64 num_bytes) +int btrfs_delalloc_reserve_metadata(struct btrfs_inode *inode, u64 num_bytes, + u64 disk_num_bytes) { struct btrfs_root *root = inode->root; struct btrfs_fs_info *fs_info = root->fs_info; @@ -318,6 +319,7 @@ int btrfs_delalloc_reserve_metadata(struct btrfs_inode *inode, u64 num_bytes) } num_bytes = ALIGN(num_bytes, fs_info->sectorsize); + disk_num_bytes = ALIGN(disk_num_bytes, fs_info->sectorsize); /* * We always want to do it this way, every other way is wrong and ends @@ -329,8 +331,8 @@ int btrfs_delalloc_reserve_metadata(struct btrfs_inode *inode, u64 num_bytes) * everything out and try again, which is bad. This way we just * over-reserve slightly, and clean up the mess when we are done. */ - calc_inode_reservations(fs_info, num_bytes, &meta_reserve, - &qgroup_reserve); + calc_inode_reservations(fs_info, num_bytes, disk_num_bytes, + &meta_reserve, &qgroup_reserve); ret = btrfs_qgroup_reserve_meta_prealloc(root, qgroup_reserve, true); if (ret) return ret; @@ -349,7 +351,7 @@ int btrfs_delalloc_reserve_metadata(struct btrfs_inode *inode, u64 num_bytes) spin_lock(&inode->lock); nr_extents = count_max_extents(num_bytes); btrfs_mod_outstanding_extents(inode, nr_extents); - inode->csum_bytes += num_bytes; + inode->csum_bytes += disk_num_bytes; btrfs_calculate_inode_block_rsv_size(fs_info, inode); spin_unlock(&inode->lock); @@ -454,7 +456,7 @@ int btrfs_delalloc_reserve_space(struct btrfs_inode *inode, ret = btrfs_check_data_free_space(inode, reserved, start, len); if (ret < 0) return ret; - ret = btrfs_delalloc_reserve_metadata(inode, len); + ret = btrfs_delalloc_reserve_metadata(inode, len, len); if (ret < 0) { btrfs_free_reserved_data_space(inode, *reserved, start, len); extent_changeset_free(*reserved); diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c index c462896122dc..1b3e083adae0 100644 --- a/fs/btrfs/file.c +++ b/fs/btrfs/file.c @@ -1746,7 +1746,8 @@ static noinline ssize_t btrfs_buffered_write(struct kiocb *iocb, fs_info->sectorsize); WARN_ON(reserve_bytes == 0); ret = btrfs_delalloc_reserve_metadata(BTRFS_I(inode), - reserve_bytes); + reserve_bytes, + reserve_bytes); if (ret) { if (!only_release_metadata) btrfs_free_reserved_data_space(BTRFS_I(inode), diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index 55fa13221f5c..5acdbde17574 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -4697,7 +4697,7 @@ int btrfs_truncate_block(struct btrfs_inode *inode, loff_t from, loff_t len, goto out; } } - ret = btrfs_delalloc_reserve_metadata(inode, blocksize); + ret = btrfs_delalloc_reserve_metadata(inode, blocksize, blocksize); if (ret < 0) { if (!only_release_metadata) btrfs_free_reserved_data_space(inode, data_reserved, @@ -7467,7 +7467,7 @@ static int btrfs_get_blocks_direct_write(struct extent_map **map, struct extent_map *em2; /* We can NOCOW, so only need to reserve metadata space. */ - ret = btrfs_delalloc_reserve_metadata(BTRFS_I(inode), len); + ret = btrfs_delalloc_reserve_metadata(BTRFS_I(inode), len, len); if (ret < 0) { /* Our caller expects us to free the input extent map. */ free_extent_map(em); diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c index f5465197996d..9a6fb5c37022 100644 --- a/fs/btrfs/relocation.c +++ b/fs/btrfs/relocation.c @@ -2997,7 +2997,7 @@ static int relocate_one_page(struct inode *inode, struct file_ra_state *ra, /* Reserve metadata for this range */ ret = btrfs_delalloc_reserve_metadata(BTRFS_I(inode), - clamped_len); + clamped_len, clamped_len); if (ret) goto release_page; From patchwork Thu Feb 10 19:09:56 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Omar Sandoval X-Patchwork-Id: 12742337 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 03188C433FE for ; Thu, 10 Feb 2022 19:10:39 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1343732AbiBJTKg (ORCPT ); Thu, 10 Feb 2022 14:10:36 -0500 Received: from mxb-00190b01.gslb.pphosted.com ([23.128.96.19]:48190 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1343720AbiBJTKf (ORCPT ); Thu, 10 Feb 2022 14:10:35 -0500 Received: from mail-pf1-x42b.google.com (mail-pf1-x42b.google.com [IPv6:2607:f8b0:4864:20::42b]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 7EB9010BA for ; Thu, 10 Feb 2022 11:10:36 -0800 (PST) Received: by mail-pf1-x42b.google.com with SMTP id l19so6140869pfu.2 for ; Thu, 10 Feb 2022 11:10:36 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=osandov-com.20210112.gappssmtp.com; s=20210112; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=Xt3rAE30yMLKYSibbekxlucGm1OKnp1pcr2vGpv56tM=; b=YpCELcdqNEFNgMhOBif/zmSz3hhpMXH3Oon7VznmQziTxhOIT8ctd8Ykwfll0Ofl7n jwysEk8zzw8t5jrVYibsxBKw4gHhxeb4ZAzpX6rDNyJgg1lTFSPBSfemm6PHZPwdl3dd Qj3byuUa4tCH+2RclXUEjou07i6QPr8ThQYV866IQHOJUdw1fO5Fteu5AE0rS8pY3EH6 hzS/MnNNpyXG1HaQdCflj8UYmc3m2UxpBXgb4KDoVGfZ4hdhKasVnuitJKdbUyKleaxB iXpDTnrmbR5YvEE20yEhlY8xhskEuhQrPWH6H3NzZK3sm/U6oDGPWi+OK6waZ5k83dI5 fI4w== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=Xt3rAE30yMLKYSibbekxlucGm1OKnp1pcr2vGpv56tM=; b=7Og35QR8gBalCtXgOh7wxYWa6mk8j6qnPQTf8YouJk91pUN8o08L+bi3lNIKfJQiE+ CZAsX38iIdhpMStdUjHC0rKaGFn6LAON7B+ucX06VgFjfK89uTByCu6YankTSftjLoIE hNu9LF0+Be7pmRI7Mej5YXnPPEeTtRM1S/CfI2WbtB/95I87GtDpZaNItF+iBadKU6rN 6J3UIwXP+XTgdD3BAGYK8jdhJWkLP8IrlM6sxMWJJw8vj96UgruKcdX0KR1+mg7RD5HD LO4fOLEE30w2rkcJMj9PGA1FANK+v6Gj8lKGZchqmPjlFvX4jJaO7ghZgwJyEOdWZsIO 2Xqw== X-Gm-Message-State: AOAM530NZL+KaFar8s6f/cJdqXvkdGrAUg2ydrztHP/C2JLuGivMRcki sAsApQ5jbLk/dRRhtPiX3UT/6RbEpsMrJg== X-Google-Smtp-Source: ABdhPJydyURTB5Kg/l/XLdih9bUMIVBiiViU58DpYkAxIDvvJwpyEwzeKOHEJwcbIwnZgXW1osJauA== X-Received: by 2002:aa7:88c6:: with SMTP id k6mr8815676pff.68.1644520235694; Thu, 10 Feb 2022 11:10:35 -0800 (PST) Received: from relinquished.tfbnw.net ([2620:10d:c090:400::5:1975]) by smtp.gmail.com with ESMTPSA id n5sm8315898pgt.22.2022.02.10.11.10.34 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 10 Feb 2022 11:10:35 -0800 (PST) From: Omar Sandoval To: linux-btrfs@vger.kernel.org Cc: kernel-team@fb.com Subject: [PATCH v13 06/17] btrfs: clean up cow_file_range_inline() Date: Thu, 10 Feb 2022 11:09:56 -0800 Message-Id: X-Mailer: git-send-email 2.35.1 In-Reply-To: References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org From: Omar Sandoval The start parameter to cow_file_range_inline() (and insert_inline_extent()) is always 0, so get rid of it and simplify the logic in those two functions. Pass btrfs_inode to insert_inline_extent() and remove the redundant root parameter. Also document the requirements for creating an inline extent. No functional change. Signed-off-by: Omar Sandoval --- fs/btrfs/inode.c | 88 +++++++++++++++++++++--------------------------- 1 file changed, 38 insertions(+), 50 deletions(-) diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index 5acdbde17574..bd6588c7f8eb 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -239,12 +239,13 @@ static int btrfs_init_inode_security(struct btrfs_trans_handle *trans, * no overlapping inline items exist in the btree */ static int insert_inline_extent(struct btrfs_trans_handle *trans, - struct btrfs_path *path, bool extent_inserted, - struct btrfs_root *root, struct inode *inode, - u64 start, size_t size, size_t compressed_size, + struct btrfs_path *path, + struct btrfs_inode *inode, bool extent_inserted, + size_t size, size_t compressed_size, int compress_type, struct page **compressed_pages) { + struct btrfs_root *root = inode->root; struct extent_buffer *leaf; struct page *page = NULL; char *kaddr; @@ -252,7 +253,6 @@ static int insert_inline_extent(struct btrfs_trans_handle *trans, struct btrfs_file_extent_item *ei; int ret; size_t cur_size = size; - unsigned long offset; ASSERT((compressed_size > 0 && compressed_pages) || (compressed_size == 0 && !compressed_pages)); @@ -264,8 +264,8 @@ static int insert_inline_extent(struct btrfs_trans_handle *trans, struct btrfs_key key; size_t datasize; - key.objectid = btrfs_ino(BTRFS_I(inode)); - key.offset = start; + key.objectid = btrfs_ino(inode); + key.offset = 0; key.type = BTRFS_EXTENT_DATA_KEY; datasize = btrfs_file_extent_calc_inline_size(cur_size); @@ -303,12 +303,10 @@ static int insert_inline_extent(struct btrfs_trans_handle *trans, btrfs_set_file_extent_compression(leaf, ei, compress_type); } else { - page = find_get_page(inode->i_mapping, - start >> PAGE_SHIFT); + page = find_get_page(inode->vfs_inode.i_mapping, 0); btrfs_set_file_extent_compression(leaf, ei, 0); kaddr = kmap_atomic(page); - offset = offset_in_page(start); - write_extent_buffer(leaf, kaddr + offset, ptr, size); + write_extent_buffer(leaf, kaddr, ptr, size); kunmap_atomic(kaddr); put_page(page); } @@ -319,8 +317,8 @@ static int insert_inline_extent(struct btrfs_trans_handle *trans, * We align size to sectorsize for inline extents just for simplicity * sake. */ - size = ALIGN(size, root->fs_info->sectorsize); - ret = btrfs_inode_set_file_extent_range(BTRFS_I(inode), start, size); + ret = btrfs_inode_set_file_extent_range(inode, 0, + ALIGN(size, root->fs_info->sectorsize)); if (ret) goto fail; @@ -333,7 +331,8 @@ static int insert_inline_extent(struct btrfs_trans_handle *trans, * before we unlock the pages. Otherwise we * could end up racing with unlink. */ - BTRFS_I(inode)->disk_i_size = inode->i_size; + inode->disk_i_size = i_size_read(&inode->vfs_inode); + fail: return ret; } @@ -344,8 +343,8 @@ static int insert_inline_extent(struct btrfs_trans_handle *trans, * does the checks required to make sure the data is small enough * to fit as an inline extent. */ -static noinline int cow_file_range_inline(struct btrfs_inode *inode, u64 start, - u64 end, size_t compressed_size, +static noinline int cow_file_range_inline(struct btrfs_inode *inode, u64 size, + size_t compressed_size, int compress_type, struct page **compressed_pages) { @@ -353,26 +352,21 @@ static noinline int cow_file_range_inline(struct btrfs_inode *inode, u64 start, struct btrfs_root *root = inode->root; struct btrfs_fs_info *fs_info = root->fs_info; struct btrfs_trans_handle *trans; - u64 isize = i_size_read(&inode->vfs_inode); - u64 actual_end = min(end + 1, isize); - u64 inline_len = actual_end - start; - u64 aligned_end = ALIGN(end, fs_info->sectorsize); - u64 data_len = inline_len; + u64 data_len = compressed_size ? compressed_size : size; int ret; struct btrfs_path *path; - if (compressed_size) - data_len = compressed_size; - - if (start > 0 || - actual_end > fs_info->sectorsize || + /* + * We can create an inline extent if it ends at or beyond the current + * i_size, is no larger than a sector (decompressed), and the (possibly + * compressed) data fits in a leaf and the configured maximum inline + * size. + */ + if (size < i_size_read(&inode->vfs_inode) || + size > fs_info->sectorsize || data_len > BTRFS_MAX_INLINE_DATA_SIZE(fs_info) || - (!compressed_size && - (actual_end & (fs_info->sectorsize - 1)) == 0) || - end + 1 < isize || - data_len > fs_info->max_inline) { + data_len > fs_info->max_inline) return 1; - } path = btrfs_alloc_path(); if (!path) @@ -386,30 +380,21 @@ static noinline int cow_file_range_inline(struct btrfs_inode *inode, u64 start, trans->block_rsv = &inode->block_rsv; drop_args.path = path; - drop_args.start = start; - drop_args.end = aligned_end; + drop_args.start = 0; + drop_args.end = fs_info->sectorsize; drop_args.drop_cache = true; drop_args.replace_extent = true; - - if (compressed_size && compressed_pages) - drop_args.extent_item_size = btrfs_file_extent_calc_inline_size( - compressed_size); - else - drop_args.extent_item_size = btrfs_file_extent_calc_inline_size( - inline_len); - + drop_args.extent_item_size = btrfs_file_extent_calc_inline_size(data_len); ret = btrfs_drop_extents(trans, root, inode, &drop_args); if (ret) { btrfs_abort_transaction(trans, ret); goto out; } - if (isize > actual_end) - inline_len = min_t(u64, isize, actual_end); - ret = insert_inline_extent(trans, path, drop_args.extent_inserted, - root, &inode->vfs_inode, start, - inline_len, compressed_size, - compress_type, compressed_pages); + ret = insert_inline_extent(trans, path, inode, + drop_args.extent_inserted, size, + compressed_size, compress_type, + compressed_pages); if (ret && ret != -ENOSPC) { btrfs_abort_transaction(trans, ret); goto out; @@ -418,7 +403,7 @@ static noinline int cow_file_range_inline(struct btrfs_inode *inode, u64 start, goto out; } - btrfs_update_inode_bytes(inode, inline_len, drop_args.bytes_found); + btrfs_update_inode_bytes(inode, size, drop_args.bytes_found); ret = btrfs_update_inode(trans, root, inode); if (ret && ret != -ENOSPC) { btrfs_abort_transaction(trans, ret); @@ -739,12 +724,12 @@ static noinline int compress_file_range(struct async_chunk *async_chunk) /* we didn't compress the entire range, try * to make an uncompressed inline extent. */ - ret = cow_file_range_inline(BTRFS_I(inode), start, end, + ret = cow_file_range_inline(BTRFS_I(inode), actual_end, 0, BTRFS_COMPRESS_NONE, NULL); } else { /* try making a compressed inline extent */ - ret = cow_file_range_inline(BTRFS_I(inode), start, end, + ret = cow_file_range_inline(BTRFS_I(inode), actual_end, total_compressed, compress_type, pages); } @@ -1159,8 +1144,11 @@ static noinline int cow_file_range(struct btrfs_inode *inode, * So here we skip inline extent creation completely. */ if (start == 0 && fs_info->sectorsize == PAGE_SIZE) { + u64 actual_end = min_t(u64, i_size_read(&inode->vfs_inode), + end + 1); + /* lets try to make an inline extent */ - ret = cow_file_range_inline(inode, start, end, 0, + ret = cow_file_range_inline(inode, actual_end, 0, BTRFS_COMPRESS_NONE, NULL); if (ret == 0) { /* From patchwork Thu Feb 10 19:09:57 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Omar Sandoval X-Patchwork-Id: 12742339 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 15FAEC4332F for ; Thu, 10 Feb 2022 19:10:40 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1343734AbiBJTKh (ORCPT ); Thu, 10 Feb 2022 14:10:37 -0500 Received: from mxb-00190b01.gslb.pphosted.com ([23.128.96.19]:48200 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1343731AbiBJTKg (ORCPT ); Thu, 10 Feb 2022 14:10:36 -0500 Received: from mail-pf1-x42d.google.com (mail-pf1-x42d.google.com [IPv6:2607:f8b0:4864:20::42d]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 56019110C for ; Thu, 10 Feb 2022 11:10:37 -0800 (PST) Received: by mail-pf1-x42d.google.com with SMTP id t36so1116442pfg.0 for ; Thu, 10 Feb 2022 11:10:37 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=osandov-com.20210112.gappssmtp.com; s=20210112; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=EpG212sjbwihiGa7491SZqW25yRnSlsMv9ZhY3K1bvI=; b=yRpg3qxzcgcDx8+HUn55FJ29A5w7Wdh+Ojk85u4fFspjObISXOlgjdVg6QYdNBRWjj QzzkSoTihOZh+EGDkwi2yr5rETPQFuTpGpjPyajiOVH9NhZZAV4CGQbOJNFptcEnuKgp ijOmsLg8rKqTI0wgj//YsVOM8pOXQoRPTDkgIwxi53QEm7NhMeO28zmEPBSrQbxX/FA7 9LD0Jb3HHCiiAHI7xf0g66I4qB2joTeIJRQuhLWOTql8wPM+ybKABn4TinFTdgiGHCtz 49PVqH1UpNhBLeiyMqtKEUMdf1nRg801lwl/uIcSj2TBgswRwm3Q5wqnnwgm7M+88mGe m0HA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=EpG212sjbwihiGa7491SZqW25yRnSlsMv9ZhY3K1bvI=; b=uG3WgF/bDDkRSKspHQp5cmY7tTo6lNcNNtfpRJrjUxnYFoLg9lhL4de35Up8U/XePX 7NbDC2s4fdMrMIv6u11nL+3v1QR7aJV5bwOFWEdl5KkwHtlUZ2LU+JfU1nRJ70q7RQxd uSi8EsNYVmQPWAcygSMj3KbL+ENE7RIBOelu9Zmj88ls1YxdBdfr9KT+fhMrshVzGPof gpx3OWQkQdB4N/sj9BUQtCifQ3MLEhxpNDTjfSbZddVy2xU98lYy6e8X8SV8nm5Z3aPt NSqqWMJ6QveUlX5dP+f7JQvEUKC86DEYFPmKdZftn5KD8jl6rp4MkLCshilkxBwjk/Pa vyzg== X-Gm-Message-State: AOAM532fl4YK7kJZMGZoQatZj40E4TRe21Zqh8kkJ/BMquTk2sRLKRRP In6v+MDfizyyGWjwCtIc7wBGCCFAs8dj2g== X-Google-Smtp-Source: ABdhPJwGbLh3bzRnRsB0DgAgSdw89s/8TrSuMub71RbTVaE/5K3txu4Z3UwCU5Ek1ULsNxPGSon9Vg== X-Received: by 2002:a05:6a00:9a9:: with SMTP id u41mr8863943pfg.75.1644520236567; Thu, 10 Feb 2022 11:10:36 -0800 (PST) Received: from relinquished.tfbnw.net ([2620:10d:c090:400::5:1975]) by smtp.gmail.com with ESMTPSA id n5sm8315898pgt.22.2022.02.10.11.10.35 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 10 Feb 2022 11:10:36 -0800 (PST) From: Omar Sandoval To: linux-btrfs@vger.kernel.org Cc: kernel-team@fb.com Subject: [PATCH v13 07/17] btrfs: optionally extend i_size in cow_file_range_inline() Date: Thu, 10 Feb 2022 11:09:57 -0800 Message-Id: X-Mailer: git-send-email 2.35.1 In-Reply-To: References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org From: Omar Sandoval Currently, an inline extent is always created after i_size is extended from btrfs_dirty_pages(). However, for encoded writes, we only want to update i_size after we successfully created the inline extent. Add an update_i_size parameter to cow_file_range_inline() and insert_inline_extent() and pass in the size of the extent rather than determining it from i_size. Reviewed-by: Josef Bacik Signed-off-by: Omar Sandoval --- fs/btrfs/inode.c | 23 ++++++++++++++++------- 1 file changed, 16 insertions(+), 7 deletions(-) diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index bd6588c7f8eb..f77850e5a682 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -243,7 +243,8 @@ static int insert_inline_extent(struct btrfs_trans_handle *trans, struct btrfs_inode *inode, bool extent_inserted, size_t size, size_t compressed_size, int compress_type, - struct page **compressed_pages) + struct page **compressed_pages, + bool update_i_size) { struct btrfs_root *root = inode->root; struct extent_buffer *leaf; @@ -253,6 +254,7 @@ static int insert_inline_extent(struct btrfs_trans_handle *trans, struct btrfs_file_extent_item *ei; int ret; size_t cur_size = size; + u64 i_size; ASSERT((compressed_size > 0 && compressed_pages) || (compressed_size == 0 && !compressed_pages)); @@ -331,7 +333,12 @@ static int insert_inline_extent(struct btrfs_trans_handle *trans, * before we unlock the pages. Otherwise we * could end up racing with unlink. */ - inode->disk_i_size = i_size_read(&inode->vfs_inode); + i_size = i_size_read(&inode->vfs_inode); + if (update_i_size && size > i_size) { + i_size_write(&inode->vfs_inode, size); + i_size = size; + } + inode->disk_i_size = i_size; fail: return ret; @@ -346,7 +353,8 @@ static int insert_inline_extent(struct btrfs_trans_handle *trans, static noinline int cow_file_range_inline(struct btrfs_inode *inode, u64 size, size_t compressed_size, int compress_type, - struct page **compressed_pages) + struct page **compressed_pages, + bool update_i_size) { struct btrfs_drop_extents_args drop_args = { 0 }; struct btrfs_root *root = inode->root; @@ -394,7 +402,7 @@ static noinline int cow_file_range_inline(struct btrfs_inode *inode, u64 size, ret = insert_inline_extent(trans, path, inode, drop_args.extent_inserted, size, compressed_size, compress_type, - compressed_pages); + compressed_pages, update_i_size); if (ret && ret != -ENOSPC) { btrfs_abort_transaction(trans, ret); goto out; @@ -726,12 +734,13 @@ static noinline int compress_file_range(struct async_chunk *async_chunk) */ ret = cow_file_range_inline(BTRFS_I(inode), actual_end, 0, BTRFS_COMPRESS_NONE, - NULL); + NULL, false); } else { /* try making a compressed inline extent */ ret = cow_file_range_inline(BTRFS_I(inode), actual_end, total_compressed, - compress_type, pages); + compress_type, pages, + false); } if (ret <= 0) { unsigned long clear_flags = EXTENT_DELALLOC | @@ -1149,7 +1158,7 @@ static noinline int cow_file_range(struct btrfs_inode *inode, /* lets try to make an inline extent */ ret = cow_file_range_inline(inode, actual_end, 0, - BTRFS_COMPRESS_NONE, NULL); + BTRFS_COMPRESS_NONE, NULL, false); if (ret == 0) { /* * We use DO_ACCOUNTING here because we need the From patchwork Thu Feb 10 19:09:58 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Omar Sandoval X-Patchwork-Id: 12742340 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id D72CEC43217 for ; Thu, 10 Feb 2022 19:10:40 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1343736AbiBJTKi (ORCPT ); Thu, 10 Feb 2022 14:10:38 -0500 Received: from mxb-00190b01.gslb.pphosted.com ([23.128.96.19]:48210 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1343733AbiBJTKh (ORCPT ); Thu, 10 Feb 2022 14:10:37 -0500 Received: from mail-pf1-x42f.google.com (mail-pf1-x42f.google.com [IPv6:2607:f8b0:4864:20::42f]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 5536010BA for ; Thu, 10 Feb 2022 11:10:38 -0800 (PST) Received: by mail-pf1-x42f.google.com with SMTP id r19so11948321pfh.6 for ; Thu, 10 Feb 2022 11:10:38 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=osandov-com.20210112.gappssmtp.com; s=20210112; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=XGpxFGVFUzjgSC60uqsV6E7m0Hr+yTpZZ8E54p81lNs=; b=mshaAYj+oXfqAAwFgC6nysic5MOsJkfgC/U5JrJZf8pNN2b/Z6QMk3nnhlFLs9CNCe UVwfdWjEXIb0b2+DwJAzBbaP+1uTUHfWsW6ch41+GDHHqO7sHfzca+lVmzjrc+JVY0RS rJkcOBXMBj4N1GTQ/0WmJTmQQ+E8du7dJh19pz2DqHuLRdGL91oeO3dOIGKUvuZjboEv 38c/0UrF6/vGsklSFDx+3rVFfOcbdwQ4rqec1XfEtfttO/FAZ4ptOXT7oiIjQif5rJee mRIU3zad4eEKf8sKXBjPvzZ7WO6HfifIWODGteH6nfV8UeF/hyfpHr6JnO5g6TtrQimL gLkQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=XGpxFGVFUzjgSC60uqsV6E7m0Hr+yTpZZ8E54p81lNs=; b=BQNnD9GghZa3frA1/HL+Bw66cbGuL803GEuqw2XuItrSCXrjXG4/McoLB8U7xgp0nC ZFYnwnvF17WMi1eZRZOwYuB9hG/R7mLdEaXAuopLe2QU1h/4/uPjCflSHPIjHNU8tV21 EegVO/LuzIICpKypkpMRthdbSs5vHqGbnFADJ6CDAGq92UZ9pMC55IFtGDV9yVDqhWlF 7O9mkBgvQkDelIYhVOEKg+YjoeIcmwaYRooWhJlNbzXON6ooXtd37X2ACqD08DdwiHed xS7YYFFinTg1HCrFgPCJiu2UoW+0grWC9ZJbeM5EXLW9FakXrFkQKBLyUvQVucohD+oM 59eg== X-Gm-Message-State: AOAM5339pSNyiBXhChN1JuPEMrgIZmEsABWlE9SoCJR1TmCcfkgMq1GH 9f30vtNUAaMQg/KpkiH/Um0JfYl/ml0O9A== X-Google-Smtp-Source: ABdhPJz4te0IjYCIFoHPk7ySw0YAj+B6w//RXAH6FJ1OeYxwdqGPqO35gXMGWBuiyrxd+bMUlx3Y4w== X-Received: by 2002:a63:2b48:: with SMTP id r69mr7258656pgr.177.1644520237417; Thu, 10 Feb 2022 11:10:37 -0800 (PST) Received: from relinquished.tfbnw.net ([2620:10d:c090:400::5:1975]) by smtp.gmail.com with ESMTPSA id n5sm8315898pgt.22.2022.02.10.11.10.36 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 10 Feb 2022 11:10:37 -0800 (PST) From: Omar Sandoval To: linux-btrfs@vger.kernel.org Cc: kernel-team@fb.com Subject: [PATCH v13 08/17] btrfs: add definitions + documentation for encoded I/O ioctls Date: Thu, 10 Feb 2022 11:09:58 -0800 Message-Id: X-Mailer: git-send-email 2.35.1 In-Reply-To: References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org From: Omar Sandoval In order to allow sending and receiving compressed data without decompressing it, we need an interface to write pre-compressed data directly to the filesystem and the matching interface to read compressed data without decompressing it. This adds the definitions for ioctls to do that and detailed explanations of how to use them. Reviewed-by: Nikolay Borisov Signed-off-by: Omar Sandoval --- include/uapi/linux/btrfs.h | 132 +++++++++++++++++++++++++++++++++++++ 1 file changed, 132 insertions(+) diff --git a/include/uapi/linux/btrfs.h b/include/uapi/linux/btrfs.h index 1cb1a3860f1d..1a96645243e0 100644 --- a/include/uapi/linux/btrfs.h +++ b/include/uapi/linux/btrfs.h @@ -869,6 +869,134 @@ struct btrfs_ioctl_get_subvol_rootref_args { __u8 align[7]; }; +/* + * Data and metadata for an encoded read or write. + * + * Encoded I/O bypasses any encoding automatically done by the filesystem (e.g., + * compression). This can be used to read the compressed contents of a file or + * write pre-compressed data directly to a file. + * + * BTRFS_IOC_ENCODED_READ and BTRFS_IOC_ENCODED_WRITE are essentially + * preadv/pwritev with additional metadata about how the data is encoded and the + * size of the unencoded data. + * + * BTRFS_IOC_ENCODED_READ fills the given iovecs with the encoded data, fills + * the metadata fields, and returns the size of the encoded data. It reads one + * extent per call. It can also read data which is not encoded. + * + * BTRFS_IOC_ENCODED_WRITE uses the metadata fields, writes the encoded data + * from the iovecs, and returns the size of the encoded data. Note that the + * encoded data is not validated when it is written; if it is not valid (e.g., + * it cannot be decompressed), then a subsequent read may return an error. + * + * Since the filesystem page cache contains decoded data, encoded I/O bypasses + * the page cache. Encoded I/O requires CAP_SYS_ADMIN. + */ +struct btrfs_ioctl_encoded_io_args { + /* Input parameters for both reads and writes. */ + + /* + * iovecs containing encoded data. + * + * For reads, if the size of the encoded data is larger than the sum of + * iov[n].iov_len for 0 <= n < iovcnt, then the ioctl fails with + * ENOBUFS. + * + * For writes, the size of the encoded data is the sum of iov[n].iov_len + * for 0 <= n < iovcnt. This must be less than 128 KiB (this limit may + * increase in the future). This must also be less than or equal to + * unencoded_len. + */ + const struct iovec __user *iov; + /* Number of iovecs. */ + unsigned long iovcnt; + /* + * Offset in file. + * + * For writes, must be aligned to the sector size of the filesystem. + */ + __s64 offset; + /* Currently must be zero. */ + __u64 flags; + + /* + * For reads, the following members are output parameters that will + * contain the returned metadata for the encoded data. + * For writes, the following members must be set to the metadata for the + * encoded data. + */ + + /* + * Length of the data in the file. + * + * Must be less than or equal to unencoded_len - unencoded_offset. For + * writes, must be aligned to the sector size of the filesystem unless + * the data ends at or beyond the current end of the file. + */ + __u64 len; + /* + * Length of the unencoded (i.e., decrypted and decompressed) data. + * + * For writes, must be no more than 128 KiB (this limit may increase in + * the future). If the unencoded data is actually longer than + * unencoded_len, then it is truncated; if it is shorter, then it is + * extended with zeroes. + */ + __u64 unencoded_len; + /* + * Offset from the first byte of the unencoded data to the first byte of + * logical data in the file. + * + * Must be less than unencoded_len. + */ + __u64 unencoded_offset; + /* + * BTRFS_ENCODED_IO_COMPRESSION_* type. + * + * For writes, must not be BTRFS_ENCODED_IO_COMPRESSION_NONE. + */ + __u32 compression; + /* Currently always BTRFS_ENCODED_IO_ENCRYPTION_NONE. */ + __u32 encryption; + /* + * Reserved for future expansion. + * + * For reads, always returned as zero. Users should check for non-zero + * bytes. If there are any, then the kernel has a newer version of this + * structure with additional information that the user definition is + * missing. + * + * For writes, must be zeroed. + */ + __u8 reserved[32]; +}; + +/* Data is not compressed. */ +#define BTRFS_ENCODED_IO_COMPRESSION_NONE 0 +/* Data is compressed as a single zlib stream. */ +#define BTRFS_ENCODED_IO_COMPRESSION_ZLIB 1 +/* + * Data is compressed as a single zstd frame with the windowLog compression + * parameter set to no more than 17. + */ +#define BTRFS_ENCODED_IO_COMPRESSION_ZSTD 2 +/* + * Data is compressed sector by sector (using the sector size indicated by the + * name of the constant) with LZO1X and wrapped in the format documented in + * fs/btrfs/lzo.c. For writes, the compression sector size must match the + * filesystem sector size. + */ +#define BTRFS_ENCODED_IO_COMPRESSION_LZO_4K 3 +#define BTRFS_ENCODED_IO_COMPRESSION_LZO_8K 4 +#define BTRFS_ENCODED_IO_COMPRESSION_LZO_16K 5 +#define BTRFS_ENCODED_IO_COMPRESSION_LZO_32K 6 +#define BTRFS_ENCODED_IO_COMPRESSION_LZO_64K 7 +#define BTRFS_ENCODED_IO_COMPRESSION_TYPES 8 + +/* Data is not encrypted. */ +#define BTRFS_ENCODED_IO_ENCRYPTION_NONE 0 +#define BTRFS_ENCODED_IO_ENCRYPTION_TYPES 1 + /* Error codes as returned by the kernel */ enum btrfs_err_code { BTRFS_ERROR_DEV_RAID1_MIN_NOT_MET = 1, @@ -997,5 +1125,9 @@ enum btrfs_err_code { struct btrfs_ioctl_ino_lookup_user_args) #define BTRFS_IOC_SNAP_DESTROY_V2 _IOW(BTRFS_IOCTL_MAGIC, 63, \ struct btrfs_ioctl_vol_args_v2) +#define BTRFS_IOC_ENCODED_READ _IOR(BTRFS_IOCTL_MAGIC, 64, \ + struct btrfs_ioctl_encoded_io_args) +#define BTRFS_IOC_ENCODED_WRITE _IOW(BTRFS_IOCTL_MAGIC, 64, \ + struct btrfs_ioctl_encoded_io_args) #endif /* _UAPI_LINUX_BTRFS_H */ From patchwork Thu Feb 10 19:09:59 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Omar Sandoval X-Patchwork-Id: 12742342 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 78F8DC433F5 for ; Thu, 10 Feb 2022 19:10:42 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1343739AbiBJTKk (ORCPT ); Thu, 10 Feb 2022 14:10:40 -0500 Received: from mxb-00190b01.gslb.pphosted.com ([23.128.96.19]:48224 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1343733AbiBJTKj (ORCPT ); Thu, 10 Feb 2022 14:10:39 -0500 Received: from mail-pf1-x432.google.com (mail-pf1-x432.google.com [IPv6:2607:f8b0:4864:20::432]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 5CC98110C for ; Thu, 10 Feb 2022 11:10:39 -0800 (PST) Received: by mail-pf1-x432.google.com with SMTP id i30so11928368pfk.8 for ; Thu, 10 Feb 2022 11:10:39 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=osandov-com.20210112.gappssmtp.com; s=20210112; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=GzSS1AY1oY5vmQKMYFo825qbKh+LPftxzufdab3qjoE=; b=roC93po4FlpjrNwd7N651NTS47t3P7LM+ctrzPx6vYmIeB5u1sUP3CyIzUCUYoC4oj 5frRm8zMXwlk2mJScr33+FHy0osJGRnDCxUv1P4i04xEjWkS3OApjLsm2DoxP6dBML0Y cWw0dKrBbyTv3Kv69W/OlDpqlZ3brRbMbiRjL5jHuKOBh0SOP6DfVLI9sZhj0ZiDWsIF Yy98bNi49eD/okSeD7dI9t5aZA/4ULNokJAi+IHL1faXJW3fcTzNfpCrSeAhTo3Caf/y cZfbIlcp433MwzVjzG5GIcf9ynDs96BQpNvoKmnvTXdjvS9xsI6y8EbtrYq3GktDuh0N iAOg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=GzSS1AY1oY5vmQKMYFo825qbKh+LPftxzufdab3qjoE=; b=QY6CKoUGAELFrOUfxk2eedmlQVGxUUJBRRgoNVfvyxtOR4ZDoqMRDXAYWGEbfBuviQ 8BIiBfBiDXczPi63fplGWKse4E7SZ6PnL0gIBX3h0m3w0YJIdfmerEtTjD6pwG/tpt2b PqAerXZUWx3SfVZdPNESCq2uwhJtQLVw2r6DoINti19M2B/SyZWEU9q5oUKp2qf8n0Am SC9DBVDD6Hpxh35uHKJE7I+dScWYfLVY5Yqs2/Lhh3gyD/RO3ZNC8ImHn5DYqxcz1e7k fqFHqeDf04ItI9QVbT0LdAvA3Qu1SRo/VO1Lt1xDpmgTLJwMcHL/un9u6rkfwA23ebkb GBEQ== X-Gm-Message-State: AOAM5335PhD0NMXUPn3M/zc91nrmfF2zhNRlaIumWsMPgxIwicP1+y/J s7L+DMB/ZV4C5+Bz+Sbpp6NGGumQaEIZ3A== X-Google-Smtp-Source: ABdhPJwe5poDtqVOo6LycPyz/PsVAchStEZ0N2vKa0hymb8EKuLQOzdlQJwfCmAR5WPn0QLeZHzr3w== X-Received: by 2002:a62:7ec3:: with SMTP id z186mr8622620pfc.85.1644520238376; Thu, 10 Feb 2022 11:10:38 -0800 (PST) Received: from relinquished.tfbnw.net ([2620:10d:c090:400::5:1975]) by smtp.gmail.com with ESMTPSA id n5sm8315898pgt.22.2022.02.10.11.10.37 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 10 Feb 2022 11:10:37 -0800 (PST) From: Omar Sandoval To: linux-btrfs@vger.kernel.org Cc: kernel-team@fb.com Subject: [PATCH v13 09/17] btrfs: add BTRFS_IOC_ENCODED_READ Date: Thu, 10 Feb 2022 11:09:59 -0800 Message-Id: X-Mailer: git-send-email 2.35.1 In-Reply-To: References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org From: Omar Sandoval There are 4 main cases: 1. Inline extents: we copy the data straight out of the extent buffer. 2. Hole/preallocated extents: we fill in zeroes. 3. Regular, uncompressed extents: we read the sectors we need directly from disk. 4. Regular, compressed extents: we read the entire compressed extent from disk and indicate what subset of the decompressed extent is in the file. This initial implementation simplifies a few things that can be improved in the future: - Cases 1, 3, and 4 allocate temporary memory to read into before copying out to userspace. - We don't do read repair, because it turns out that read repair is currently broken for compressed data. - We hold the inode lock during the operation. Note that we don't need to hold the mmap lock. We may race with btrfs_page_mkwrite() and read the old data from before the page was dirtied: btrfs_page_mkwrite btrfs_encoded_read --------------------------------------------------- (enter) (enter) btrfs_wait_ordered_range lock_extent_bits btrfs_page_set_dirty unlock_extent_cached (exit) lock_extent_bits read extent (dirty page hasn't been flushed, so this is the old data) unlock_extent_cached (exit) we read the old data from before the page was dirtied. But, that's true even if we were to hold the mmap lock: btrfs_page_mkwrite btrfs_encoded_read ------------------------------------------------------------------- (enter) (enter) btrfs_inode_lock(BTRFS_ILOCK_MMAP) down_read(i_mmap_lock) (blocked) btrfs_wait_ordered_range lock_extent_bits read extent (page hasn't been dirtied, so this is the old data) unlock_extent_cached btrfs_inode_unlock(BTRFS_ILOCK_MMAP) down_read(i_mmap_lock) returns lock_extent_bits btrfs_page_set_dirty unlock_extent_cached In other words, this is inherently racy, so it's fine that we return the old data in this tiny window. Signed-off-by: Omar Sandoval --- fs/btrfs/ctree.h | 4 + fs/btrfs/inode.c | 501 +++++++++++++++++++++++++++++++++++++++++++++++ fs/btrfs/ioctl.c | 106 ++++++++++ 3 files changed, 611 insertions(+) diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h index a1660f5a8ec6..ac43f7196825 100644 --- a/fs/btrfs/ctree.h +++ b/fs/btrfs/ctree.h @@ -3295,6 +3295,10 @@ int btrfs_writepage_cow_fixup(struct page *page); void btrfs_writepage_endio_finish_ordered(struct btrfs_inode *inode, struct page *page, u64 start, u64 end, bool uptodate); +struct btrfs_ioctl_encoded_io_args; +ssize_t btrfs_encoded_read(struct kiocb *iocb, struct iov_iter *iter, + struct btrfs_ioctl_encoded_io_args *encoded); + extern const struct dentry_operations btrfs_dentry_operations; extern const struct iomap_ops btrfs_dio_iomap_ops; extern const struct iomap_dio_ops btrfs_dio_ops; diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index f77850e5a682..e9bebad7bfcf 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -10131,6 +10131,507 @@ void btrfs_set_range_writeback(struct btrfs_inode *inode, u64 start, u64 end) } } +static int btrfs_encoded_io_compression_from_extent( + struct btrfs_fs_info *fs_info, + int compress_type) +{ + switch (compress_type) { + case BTRFS_COMPRESS_NONE: + return BTRFS_ENCODED_IO_COMPRESSION_NONE; + case BTRFS_COMPRESS_ZLIB: + return BTRFS_ENCODED_IO_COMPRESSION_ZLIB; + case BTRFS_COMPRESS_LZO: + /* + * The LZO format depends on the sector size. 64k is the maximum + * sector size that we support. + */ + if (fs_info->sectorsize < SZ_4K || fs_info->sectorsize > SZ_64K) + return -EINVAL; + return BTRFS_ENCODED_IO_COMPRESSION_LZO_4K + + (fs_info->sectorsize_bits - 12); + case BTRFS_COMPRESS_ZSTD: + return BTRFS_ENCODED_IO_COMPRESSION_ZSTD; + default: + return -EUCLEAN; + } +} + +static ssize_t btrfs_encoded_read_inline( + struct kiocb *iocb, + struct iov_iter *iter, u64 start, + u64 lockend, + struct extent_state **cached_state, + u64 extent_start, size_t count, + struct btrfs_ioctl_encoded_io_args *encoded, + bool *unlocked) +{ + struct btrfs_inode *inode = BTRFS_I(file_inode(iocb->ki_filp)); + struct btrfs_root *root = inode->root; + struct btrfs_fs_info *fs_info = root->fs_info; + struct extent_io_tree *io_tree = &inode->io_tree; + struct btrfs_path *path; + struct extent_buffer *leaf; + struct btrfs_file_extent_item *item; + u64 ram_bytes; + unsigned long ptr; + void *tmp; + ssize_t ret; + + path = btrfs_alloc_path(); + if (!path) { + ret = -ENOMEM; + goto out; + } + ret = btrfs_lookup_file_extent(NULL, root, path, btrfs_ino(inode), + extent_start, 0); + if (ret) { + if (ret > 0) { + /* The extent item disappeared? */ + ret = -EIO; + } + goto out; + } + leaf = path->nodes[0]; + item = btrfs_item_ptr(leaf, path->slots[0], + struct btrfs_file_extent_item); + + ram_bytes = btrfs_file_extent_ram_bytes(leaf, item); + ptr = btrfs_file_extent_inline_start(item); + + encoded->len = min_t(u64, extent_start + ram_bytes, + inode->vfs_inode.i_size) - + iocb->ki_pos; + ret = btrfs_encoded_io_compression_from_extent(fs_info, + btrfs_file_extent_compression(leaf, item)); + if (ret < 0) + goto out; + encoded->compression = ret; + if (encoded->compression) { + size_t inline_size; + + inline_size = btrfs_file_extent_inline_item_len(leaf, + path->slots[0]); + if (inline_size > count) { + ret = -ENOBUFS; + goto out; + } + count = inline_size; + encoded->unencoded_len = ram_bytes; + encoded->unencoded_offset = iocb->ki_pos - extent_start; + } else { + count = min_t(u64, count, encoded->len); + encoded->len = count; + encoded->unencoded_len = count; + ptr += iocb->ki_pos - extent_start; + } + + tmp = kmalloc(count, GFP_NOFS); + if (!tmp) { + ret = -ENOMEM; + goto out; + } + read_extent_buffer(leaf, tmp, ptr, count); + btrfs_release_path(path); + unlock_extent_cached(io_tree, start, lockend, cached_state); + btrfs_inode_unlock(&inode->vfs_inode, BTRFS_ILOCK_SHARED); + *unlocked = true; + + ret = copy_to_iter(tmp, count, iter); + if (ret != count) + ret = -EFAULT; + kfree(tmp); +out: + btrfs_free_path(path); + return ret; +} + +struct btrfs_encoded_read_private { + struct btrfs_inode *inode; + u64 file_offset; + wait_queue_head_t wait; + atomic_t pending; + blk_status_t status; + bool skip_csum; +}; + +static blk_status_t submit_encoded_read_bio(struct btrfs_inode *inode, + struct bio *bio, int mirror_num) +{ + struct btrfs_encoded_read_private *priv = bio->bi_private; + struct btrfs_bio *bbio = btrfs_bio(bio); + struct btrfs_fs_info *fs_info = inode->root->fs_info; + blk_status_t ret; + + if (!priv->skip_csum) { + ret = btrfs_lookup_bio_sums(&inode->vfs_inode, bio, NULL); + if (ret) + return ret; + } + + ret = btrfs_bio_wq_end_io(fs_info, bio, BTRFS_WQ_ENDIO_DATA); + if (ret) { + btrfs_bio_free_csum(bbio); + return ret; + } + + atomic_inc(&priv->pending); + ret = btrfs_map_bio(fs_info, bio, mirror_num); + if (ret) { + atomic_dec(&priv->pending); + btrfs_bio_free_csum(bbio); + } + return ret; +} + +static blk_status_t btrfs_encoded_read_verify_csum(struct btrfs_bio *bbio) +{ + const bool uptodate = (bbio->bio.bi_status == BLK_STS_OK); + struct btrfs_encoded_read_private *priv = bbio->bio.bi_private; + struct btrfs_inode *inode = priv->inode; + struct btrfs_fs_info *fs_info = inode->root->fs_info; + u32 sectorsize = fs_info->sectorsize; + struct bio_vec *bvec; + struct bvec_iter_all iter_all; + u64 start = priv->file_offset; + u32 bio_offset = 0; + + if (priv->skip_csum || !uptodate) + return bbio->bio.bi_status; + + bio_for_each_segment_all(bvec, &bbio->bio, iter_all) { + unsigned int i, nr_sectors, pgoff; + + nr_sectors = BTRFS_BYTES_TO_BLKS(fs_info, bvec->bv_len); + pgoff = bvec->bv_offset; + for (i = 0; i < nr_sectors; i++) { + ASSERT(pgoff < PAGE_SIZE); + if (check_data_csum(&inode->vfs_inode, bbio, bio_offset, + bvec->bv_page, pgoff, start)) + return BLK_STS_IOERR; + start += sectorsize; + bio_offset += sectorsize; + pgoff += sectorsize; + } + } + return BLK_STS_OK; +} + +static void btrfs_encoded_read_endio(struct bio *bio) +{ + struct btrfs_encoded_read_private *priv = bio->bi_private; + struct btrfs_bio *bbio = btrfs_bio(bio); + blk_status_t status; + + status = btrfs_encoded_read_verify_csum(bbio); + if (status) { + /* + * The memory barrier implied by the atomic_dec_return() here + * pairs with the memory barrier implied by the + * atomic_dec_return() or io_wait_event() in + * btrfs_encoded_read_regular_fill_pages() to ensure that this + * write is observed before the load of status in + * btrfs_encoded_read_regular_fill_pages(). + */ + WRITE_ONCE(priv->status, status); + } + if (!atomic_dec_return(&priv->pending)) + wake_up(&priv->wait); + btrfs_bio_free_csum(bbio); + bio_put(bio); +} + +static int btrfs_encoded_read_regular_fill_pages(struct btrfs_inode *inode, + u64 file_offset, + u64 disk_bytenr, + u64 disk_io_size, + struct page **pages) +{ + struct btrfs_fs_info *fs_info = inode->root->fs_info; + struct btrfs_encoded_read_private priv = { + .inode = inode, + .file_offset = file_offset, + .pending = ATOMIC_INIT(1), + .skip_csum = inode->flags & BTRFS_INODE_NODATASUM, + }; + unsigned long i = 0; + u64 cur = 0; + int ret; + + init_waitqueue_head(&priv.wait); + /* + * Submit bios for the extent, splitting due to bio or stripe limits as + * necessary. + */ + while (cur < disk_io_size) { + struct extent_map *em; + struct btrfs_io_geometry geom; + struct bio *bio = NULL; + u64 remaining; + + em = btrfs_get_chunk_map(fs_info, disk_bytenr + cur, + disk_io_size - cur); + if (IS_ERR(em)) { + ret = PTR_ERR(em); + } else { + ret = btrfs_get_io_geometry(fs_info, em, BTRFS_MAP_READ, + disk_bytenr + cur, &geom); + free_extent_map(em); + } + if (ret) { + WRITE_ONCE(priv.status, errno_to_blk_status(ret)); + break; + } + remaining = min(geom.len, disk_io_size - cur); + while (bio || remaining) { + size_t bytes = min_t(u64, remaining, PAGE_SIZE); + + if (!bio) { + bio = btrfs_bio_alloc(BIO_MAX_VECS); + bio->bi_iter.bi_sector = + (disk_bytenr + cur) >> SECTOR_SHIFT; + bio->bi_end_io = btrfs_encoded_read_endio; + bio->bi_private = &priv; + bio->bi_opf = REQ_OP_READ; + } + + if (!bytes || + bio_add_page(bio, pages[i], bytes, 0) < bytes) { + blk_status_t status; + + status = submit_encoded_read_bio(inode, bio, 0); + if (status) { + WRITE_ONCE(priv.status, status); + bio_put(bio); + goto out; + } + bio = NULL; + continue; + } + + i++; + cur += bytes; + remaining -= bytes; + } + } + +out: + if (atomic_dec_return(&priv.pending)) + io_wait_event(priv.wait, !atomic_read(&priv.pending)); + /* See btrfs_encoded_read_endio() for ordering. */ + return blk_status_to_errno(READ_ONCE(priv.status)); +} + +static ssize_t btrfs_encoded_read_regular(struct kiocb *iocb, + struct iov_iter *iter, + u64 start, u64 lockend, + struct extent_state **cached_state, + u64 disk_bytenr, u64 disk_io_size, + size_t count, bool compressed, + bool *unlocked) +{ + struct btrfs_inode *inode = BTRFS_I(file_inode(iocb->ki_filp)); + struct extent_io_tree *io_tree = &inode->io_tree; + struct page **pages; + unsigned long nr_pages, i; + u64 cur; + size_t page_offset; + ssize_t ret; + + nr_pages = DIV_ROUND_UP(disk_io_size, PAGE_SIZE); + pages = kcalloc(nr_pages, sizeof(struct page *), GFP_NOFS); + if (!pages) + return -ENOMEM; + for (i = 0; i < nr_pages; i++) { + pages[i] = alloc_page(GFP_NOFS); + if (!pages[i]) { + ret = -ENOMEM; + goto out; + } + } + + ret = btrfs_encoded_read_regular_fill_pages(inode, start, disk_bytenr, + disk_io_size, pages); + if (ret) + goto out; + + unlock_extent_cached(io_tree, start, lockend, cached_state); + btrfs_inode_unlock(&inode->vfs_inode, BTRFS_ILOCK_SHARED); + *unlocked = true; + + if (compressed) { + i = 0; + page_offset = 0; + } else { + i = (iocb->ki_pos - start) >> PAGE_SHIFT; + page_offset = (iocb->ki_pos - start) & (PAGE_SIZE - 1); + } + cur = 0; + while (cur < count) { + size_t bytes = min_t(size_t, count - cur, + PAGE_SIZE - page_offset); + + if (copy_page_to_iter(pages[i], page_offset, bytes, + iter) != bytes) { + ret = -EFAULT; + goto out; + } + i++; + cur += bytes; + page_offset = 0; + } + ret = count; +out: + for (i = 0; i < nr_pages; i++) { + if (pages[i]) + __free_page(pages[i]); + } + kfree(pages); + return ret; +} + +ssize_t btrfs_encoded_read(struct kiocb *iocb, struct iov_iter *iter, + struct btrfs_ioctl_encoded_io_args *encoded) +{ + struct btrfs_inode *inode = BTRFS_I(file_inode(iocb->ki_filp)); + struct btrfs_fs_info *fs_info = inode->root->fs_info; + struct extent_io_tree *io_tree = &inode->io_tree; + ssize_t ret; + size_t count = iov_iter_count(iter); + u64 start, lockend, disk_bytenr, disk_io_size; + struct extent_state *cached_state = NULL; + struct extent_map *em; + bool unlocked = false; + + file_accessed(iocb->ki_filp); + + btrfs_inode_lock(&inode->vfs_inode, BTRFS_ILOCK_SHARED); + + if (iocb->ki_pos >= inode->vfs_inode.i_size) { + btrfs_inode_unlock(&inode->vfs_inode, BTRFS_ILOCK_SHARED); + return 0; + } + start = ALIGN_DOWN(iocb->ki_pos, fs_info->sectorsize); + /* + * We don't know how long the extent containing iocb->ki_pos is, but if + * it's compressed we know that it won't be longer than this. + */ + lockend = start + BTRFS_MAX_UNCOMPRESSED - 1; + + for (;;) { + struct btrfs_ordered_extent *ordered; + + ret = btrfs_wait_ordered_range(&inode->vfs_inode, start, + lockend - start + 1); + if (ret) + goto out_unlock_inode; + lock_extent_bits(io_tree, start, lockend, &cached_state); + ordered = btrfs_lookup_ordered_range(inode, start, + lockend - start + 1); + if (!ordered) + break; + btrfs_put_ordered_extent(ordered); + unlock_extent_cached(io_tree, start, lockend, &cached_state); + cond_resched(); + } + + em = btrfs_get_extent(inode, NULL, 0, start, lockend - start + 1); + if (IS_ERR(em)) { + ret = PTR_ERR(em); + goto out_unlock_extent; + } + + if (em->block_start == EXTENT_MAP_INLINE) { + u64 extent_start = em->start; + + /* + * For inline extents we get everything we need out of the + * extent item. + */ + free_extent_map(em); + em = NULL; + ret = btrfs_encoded_read_inline(iocb, iter, start, lockend, + &cached_state, extent_start, + count, encoded, &unlocked); + goto out; + } + + /* + * We only want to return up to EOF even if the extent extends beyond + * that. + */ + encoded->len = min_t(u64, extent_map_end(em), inode->vfs_inode.i_size) - + iocb->ki_pos; + if (em->block_start == EXTENT_MAP_HOLE || + test_bit(EXTENT_FLAG_PREALLOC, &em->flags)) { + disk_bytenr = EXTENT_MAP_HOLE; + count = min_t(u64, count, encoded->len); + encoded->len = count; + encoded->unencoded_len = count; + } else if (test_bit(EXTENT_FLAG_COMPRESSED, &em->flags)) { + disk_bytenr = em->block_start; + /* + * Bail if the buffer isn't large enough to return the whole + * compressed extent. + */ + if (em->block_len > count) { + ret = -ENOBUFS; + goto out_em; + } + disk_io_size = count = em->block_len; + encoded->unencoded_len = em->ram_bytes; + encoded->unencoded_offset = iocb->ki_pos - em->orig_start; + ret = btrfs_encoded_io_compression_from_extent(fs_info, + em->compress_type); + if (ret < 0) + goto out_em; + encoded->compression = ret; + } else { + disk_bytenr = em->block_start + (start - em->start); + if (encoded->len > count) + encoded->len = count; + /* + * Don't read beyond what we locked. This also limits the page + * allocations that we'll do. + */ + disk_io_size = min(lockend + 1, + iocb->ki_pos + encoded->len) - start; + count = start + disk_io_size - iocb->ki_pos; + encoded->len = count; + encoded->unencoded_len = count; + disk_io_size = ALIGN(disk_io_size, fs_info->sectorsize); + } + free_extent_map(em); + em = NULL; + + if (disk_bytenr == EXTENT_MAP_HOLE) { + unlock_extent_cached(io_tree, start, lockend, &cached_state); + btrfs_inode_unlock(&inode->vfs_inode, BTRFS_ILOCK_SHARED); + unlocked = true; + ret = iov_iter_zero(count, iter); + if (ret != count) + ret = -EFAULT; + } else { + ret = btrfs_encoded_read_regular(iocb, iter, start, lockend, + &cached_state, disk_bytenr, + disk_io_size, count, + encoded->compression, + &unlocked); + } + +out: + if (ret >= 0) + iocb->ki_pos += encoded->len; +out_em: + free_extent_map(em); +out_unlock_extent: + if (!unlocked) + unlock_extent_cached(io_tree, start, lockend, &cached_state); +out_unlock_inode: + if (!unlocked) + btrfs_inode_unlock(&inode->vfs_inode, BTRFS_ILOCK_SHARED); + return ret; +} + #ifdef CONFIG_SWAP /* * Add an entry indicating a block group or device which is pinned by a diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c index 294288a5dc23..855ab63cc9a0 100644 --- a/fs/btrfs/ioctl.c +++ b/fs/btrfs/ioctl.c @@ -28,6 +28,7 @@ #include #include #include +#include #include "ctree.h" #include "disk-io.h" #include "export.h" @@ -88,6 +89,22 @@ struct btrfs_ioctl_send_args_32 { #define BTRFS_IOC_SEND_32 _IOW(BTRFS_IOCTL_MAGIC, 38, \ struct btrfs_ioctl_send_args_32) + +struct btrfs_ioctl_encoded_io_args_32 { + compat_uptr_t iov; + compat_ulong_t iovcnt; + __s64 offset; + __u64 flags; + __u64 len; + __u64 unencoded_len; + __u64 unencoded_offset; + __u32 compression; + __u32 encryption; + __u32 reserved[8]; +}; + +#define BTRFS_IOC_ENCODED_READ_32 _IOR(BTRFS_IOCTL_MAGIC, 64, \ + struct btrfs_ioctl_encoded_io_args_32) #endif /* Mask out flags that are inappropriate for the given type of inode. */ @@ -4977,6 +4994,89 @@ static int _btrfs_ioctl_send(struct inode *inode, void __user *argp, bool compat return ret; } +static int btrfs_ioctl_encoded_read(struct file *file, void __user *argp, + bool compat) +{ + struct btrfs_ioctl_encoded_io_args args = {}; + size_t copy_end_kernel = offsetofend(struct btrfs_ioctl_encoded_io_args, + flags); + size_t copy_end; + struct iovec iovstack[UIO_FASTIOV]; + struct iovec *iov = iovstack; + struct iov_iter iter; + loff_t pos; + struct kiocb kiocb; + ssize_t ret; + + if (!capable(CAP_SYS_ADMIN)) { + ret = -EPERM; + goto out_acct; + } + + if (compat) { +#if defined(CONFIG_64BIT) && defined(CONFIG_COMPAT) + struct btrfs_ioctl_encoded_io_args_32 args32; + + copy_end = offsetofend(struct btrfs_ioctl_encoded_io_args_32, + flags); + if (copy_from_user(&args32, argp, copy_end)) { + ret = -EFAULT; + goto out_acct; + } + args.iov = compat_ptr(args32.iov); + args.iovcnt = args32.iovcnt; + args.offset = args32.offset; + args.flags = args32.flags; +#else + return -ENOTTY; +#endif + } else { + copy_end = copy_end_kernel; + if (copy_from_user(&args, argp, copy_end)) { + ret = -EFAULT; + goto out_acct; + } + } + if (args.flags != 0) { + ret = -EINVAL; + goto out_acct; + } + + ret = import_iovec(READ, args.iov, args.iovcnt, ARRAY_SIZE(iovstack), + &iov, &iter); + if (ret < 0) + goto out_acct; + + if (iov_iter_count(&iter) == 0) { + ret = 0; + goto out_iov; + } + pos = args.offset; + ret = rw_verify_area(READ, file, &pos, args.len); + if (ret < 0) + goto out_iov; + + init_sync_kiocb(&kiocb, file); + kiocb.ki_pos = pos; + + ret = btrfs_encoded_read(&kiocb, &iter, &args); + if (ret >= 0) { + fsnotify_access(file); + if (copy_to_user(argp + copy_end, + (char *)&args + copy_end_kernel, + sizeof(args) - copy_end_kernel)) + ret = -EFAULT; + } + +out_iov: + kfree(iov); +out_acct: + if (ret > 0) + add_rchar(current, ret); + inc_syscr(current); + return ret; +} + long btrfs_ioctl(struct file *file, unsigned int cmd, unsigned long arg) { @@ -5121,6 +5221,12 @@ long btrfs_ioctl(struct file *file, unsigned int return fsverity_ioctl_enable(file, (const void __user *)argp); case FS_IOC_MEASURE_VERITY: return fsverity_ioctl_measure(file, argp); + case BTRFS_IOC_ENCODED_READ: + return btrfs_ioctl_encoded_read(file, argp, false); +#if defined(CONFIG_64BIT) && defined(CONFIG_COMPAT) + case BTRFS_IOC_ENCODED_READ_32: + return btrfs_ioctl_encoded_read(file, argp, true); +#endif } return -ENOTTY; From patchwork Thu Feb 10 19:10:00 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Omar Sandoval X-Patchwork-Id: 12742345 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 50DD3C433FE for ; Thu, 10 Feb 2022 19:10:45 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1343741AbiBJTKn (ORCPT ); Thu, 10 Feb 2022 14:10:43 -0500 Received: from mxb-00190b01.gslb.pphosted.com ([23.128.96.19]:48248 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1343738AbiBJTKl (ORCPT ); Thu, 10 Feb 2022 14:10:41 -0500 Received: from mail-pl1-x633.google.com (mail-pl1-x633.google.com [IPv6:2607:f8b0:4864:20::633]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 2E3C010BA for ; Thu, 10 Feb 2022 11:10:41 -0800 (PST) Received: by mail-pl1-x633.google.com with SMTP id c3so2680552pls.5 for ; Thu, 10 Feb 2022 11:10:41 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=osandov-com.20210112.gappssmtp.com; s=20210112; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=GC3qEKJuXiTY1ZFCLoHxfx1KjczmRYYsMlTl5KzB9ZU=; b=qmT0MkVHR5esexCy7ILSJ+pUDgY6R2LVmbrfv4Ai+Rsr3NEE2/09g/1rYGanKSGyEN LDC0Of9bb+Vpnuz3A1YbII0e6fbac5I8DDBlUAG8pOEUtLPyOZyxXNTxqfHFl5vWP9YZ cT8Z2Sotdu+7CAZYittNU5sJkyAIzun/KcaO7rEiEdxZiqtbLKV2iNmfDhDxKZBajDiu ll9iQnefRza5G5pOvMspPPysSPeYe8m4PutHpV2zZdAPwK4HDR4yjaKI/Yuak19eHh5G 1NTD5glLdf9sW8UObbrVIPbNo365c3XZANqG6PAmM/yOnw3Q7G5hAroxf1E4kEEt+sUA Eo3Q== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=GC3qEKJuXiTY1ZFCLoHxfx1KjczmRYYsMlTl5KzB9ZU=; b=UcXfY/FpACOvGswkBNRhQVnsJsKGw7PDihMdZ+X/ekHVW5j7C9370u2o+NBN2MmWTp axgvzEQMCKnlnA1z08pUfWFNstkYb40WNnMpCUzXLPA12QHA5Sbw2+x9U2ognDsjtYzU ZVb9kca0kJdSBeSgKQJTAepVfstU6p6TSIJjIFP6y55PONeks+VfXZOGHl3ft81wT4ys FfLko/0fQN9x5GO5+ElZ+Gb2/eQ7guC7nXOrxQlZbpJpV6yKfYyHY2Ft3jWUuQtbsUgn jGACXMf/si1rtUtePpxDK5vIBlG8tAozstfsB/u4RBptStDC4la8yPuCXTT9fP3TeGyf NflA== X-Gm-Message-State: AOAM532udJluyZx7YgPLovtmOO7qsbtJsxn6IJUlSlIhvlUYUhpby+XI aIat6GP/d0AUbsR4Kr1kOVjZ8+E6G+VySg== X-Google-Smtp-Source: ABdhPJympv6RdiB757VZvlh2JXWoOR89ggHiQPIWZvkQNWTe9J5cyDbap8JdXJqX5uB/p6Rno1opVw== X-Received: by 2002:a17:90b:3808:: with SMTP id mq8mr4330173pjb.203.1644520239375; Thu, 10 Feb 2022 11:10:39 -0800 (PST) Received: from relinquished.tfbnw.net ([2620:10d:c090:400::5:1975]) by smtp.gmail.com with ESMTPSA id n5sm8315898pgt.22.2022.02.10.11.10.38 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 10 Feb 2022 11:10:39 -0800 (PST) From: Omar Sandoval To: linux-btrfs@vger.kernel.org Cc: kernel-team@fb.com Subject: [PATCH v13 10/17] btrfs: add BTRFS_IOC_ENCODED_WRITE Date: Thu, 10 Feb 2022 11:10:00 -0800 Message-Id: <36e79f61d78e8ce4692df013a112cbc7cb5c9fd3.1644519257.git.osandov@fb.com> X-Mailer: git-send-email 2.35.1 In-Reply-To: References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org From: Omar Sandoval The implementation resembles direct I/O: we have to flush any ordered extents, invalidate the page cache, and do the io tree/delalloc/extent map/ordered extent dance. From there, we can reuse the compression code with a minor modification to distinguish the write from writeback. This also creates inline extents when possible. Signed-off-by: Omar Sandoval --- fs/btrfs/compression.c | 7 +- fs/btrfs/compression.h | 6 +- fs/btrfs/ctree.h | 4 + fs/btrfs/file.c | 65 ++++++++-- fs/btrfs/inode.c | 255 +++++++++++++++++++++++++++++++++++++++- fs/btrfs/ioctl.c | 102 ++++++++++++++++ fs/btrfs/ordered-data.c | 12 +- fs/btrfs/ordered-data.h | 5 +- 8 files changed, 436 insertions(+), 20 deletions(-) diff --git a/fs/btrfs/compression.c b/fs/btrfs/compression.c index 4bd4cdf866aa..00faf6d092d5 100644 --- a/fs/btrfs/compression.c +++ b/fs/btrfs/compression.c @@ -383,7 +383,8 @@ static void finish_compressed_bio_write(struct compressed_bio *cb) cb->start, cb->start + cb->len - 1, !cb->errors); - end_compressed_writeback(inode, cb); + if (cb->writeback) + end_compressed_writeback(inode, cb); /* Note, our inode could be gone now */ /* @@ -506,7 +507,8 @@ blk_status_t btrfs_submit_compressed_write(struct btrfs_inode *inode, u64 start, struct page **compressed_pages, unsigned int nr_pages, unsigned int write_flags, - struct cgroup_subsys_state *blkcg_css) + struct cgroup_subsys_state *blkcg_css, + bool writeback) { struct btrfs_fs_info *fs_info = inode->root->fs_info; struct bio *bio = NULL; @@ -531,6 +533,7 @@ blk_status_t btrfs_submit_compressed_write(struct btrfs_inode *inode, u64 start, cb->mirror_num = 0; cb->compressed_pages = compressed_pages; cb->compressed_len = compressed_len; + cb->writeback = writeback; cb->orig_bio = NULL; cb->nr_pages = nr_pages; diff --git a/fs/btrfs/compression.h b/fs/btrfs/compression.h index 7dbd14caab01..7773953470e3 100644 --- a/fs/btrfs/compression.h +++ b/fs/btrfs/compression.h @@ -54,6 +54,9 @@ struct compressed_bio { /* The compression algorithm for this bio */ u8 compress_type; + /* Whether this is a write for writeback. */ + bool writeback; + /* IO errors */ u8 errors; int mirror_num; @@ -97,7 +100,8 @@ blk_status_t btrfs_submit_compressed_write(struct btrfs_inode *inode, u64 start, struct page **compressed_pages, unsigned int nr_pages, unsigned int write_flags, - struct cgroup_subsys_state *blkcg_css); + struct cgroup_subsys_state *blkcg_css, + bool writeback); blk_status_t btrfs_submit_compressed_read(struct inode *inode, struct bio *bio, int mirror_num, unsigned long bio_flags); diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h index ac43f7196825..54f0bf10e885 100644 --- a/fs/btrfs/ctree.h +++ b/fs/btrfs/ctree.h @@ -3298,6 +3298,8 @@ void btrfs_writepage_endio_finish_ordered(struct btrfs_inode *inode, struct btrfs_ioctl_encoded_io_args; ssize_t btrfs_encoded_read(struct kiocb *iocb, struct iov_iter *iter, struct btrfs_ioctl_encoded_io_args *encoded); +ssize_t btrfs_do_encoded_write(struct kiocb *iocb, struct iov_iter *from, + const struct btrfs_ioctl_encoded_io_args *encoded); extern const struct dentry_operations btrfs_dentry_operations; extern const struct iomap_ops btrfs_dio_iomap_ops; @@ -3361,6 +3363,8 @@ int btrfs_replace_file_extents(struct btrfs_inode *inode, struct btrfs_trans_handle **trans_out); int btrfs_mark_extent_written(struct btrfs_trans_handle *trans, struct btrfs_inode *inode, u64 start, u64 end); +ssize_t btrfs_do_write_iter(struct kiocb *iocb, struct iov_iter *from, + const struct btrfs_ioctl_encoded_io_args *encoded); int btrfs_release_file(struct inode *inode, struct file *file); int btrfs_dirty_pages(struct btrfs_inode *inode, struct page **pages, size_t num_pages, loff_t pos, size_t write_bytes, diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c index 1b3e083adae0..f74f5eadab9b 100644 --- a/fs/btrfs/file.c +++ b/fs/btrfs/file.c @@ -2064,12 +2064,43 @@ static ssize_t btrfs_direct_write(struct kiocb *iocb, struct iov_iter *from) return err < 0 ? err : written; } -static ssize_t btrfs_file_write_iter(struct kiocb *iocb, - struct iov_iter *from) +static ssize_t btrfs_encoded_write(struct kiocb *iocb, struct iov_iter *from, + const struct btrfs_ioctl_encoded_io_args *encoded) +{ + struct file *file = iocb->ki_filp; + struct inode *inode = file_inode(file); + loff_t count; + ssize_t ret; + + btrfs_inode_lock(inode, 0); + count = encoded->len; + ret = generic_write_checks_count(iocb, &count); + if (ret == 0 && count != encoded->len) { + /* + * The write got truncated by generic_write_checks_count(). We + * can't do a partial encoded write. + */ + ret = -EFBIG; + } + if (ret || encoded->len == 0) + goto out; + + ret = btrfs_write_check(iocb, from, encoded->len); + if (ret < 0) + goto out; + + ret = btrfs_do_encoded_write(iocb, from, encoded); +out: + btrfs_inode_unlock(inode, 0); + return ret; +} + +ssize_t btrfs_do_write_iter(struct kiocb *iocb, struct iov_iter *from, + const struct btrfs_ioctl_encoded_io_args *encoded) { struct file *file = iocb->ki_filp; struct btrfs_inode *inode = BTRFS_I(file_inode(file)); - ssize_t num_written = 0; + ssize_t num_written, num_sync; const bool sync = iocb->ki_flags & IOCB_DSYNC; /* @@ -2080,22 +2111,28 @@ static ssize_t btrfs_file_write_iter(struct kiocb *iocb, if (BTRFS_FS_ERROR(inode->root->fs_info)) return -EROFS; - if (!(iocb->ki_flags & IOCB_DIRECT) && - (iocb->ki_flags & IOCB_NOWAIT)) + if ((iocb->ki_flags & IOCB_NOWAIT) && !(iocb->ki_flags & IOCB_DIRECT)) return -EOPNOTSUPP; if (sync) atomic_inc(&inode->sync_writers); - if (iocb->ki_flags & IOCB_DIRECT) - num_written = btrfs_direct_write(iocb, from); - else - num_written = btrfs_buffered_write(iocb, from); + if (encoded) { + num_written = btrfs_encoded_write(iocb, from, encoded); + num_sync = encoded->len; + } else if (iocb->ki_flags & IOCB_DIRECT) { + num_written = num_sync = btrfs_direct_write(iocb, from); + } else { + num_written = num_sync = btrfs_buffered_write(iocb, from); + } btrfs_set_inode_last_sub_trans(inode); - if (num_written > 0) - num_written = generic_write_sync(iocb, num_written); + if (num_sync > 0) { + num_sync = generic_write_sync(iocb, num_sync); + if (num_sync < 0) + num_written = num_sync; + } if (sync) atomic_dec(&inode->sync_writers); @@ -2104,6 +2141,12 @@ static ssize_t btrfs_file_write_iter(struct kiocb *iocb, return num_written; } +static ssize_t btrfs_file_write_iter(struct kiocb *iocb, + struct iov_iter *from) +{ + return btrfs_do_write_iter(iocb, from, NULL); +} + int btrfs_release_file(struct inode *inode, struct file *filp) { struct btrfs_file_private *private = filp->private_data; diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index e9bebad7bfcf..c8c1a4bab06b 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -1004,7 +1004,7 @@ static int submit_one_async_extent(struct btrfs_inode *inode, async_extent->pages, /* compressed_pages */ async_extent->nr_pages, async_chunk->write_flags, - async_chunk->blkcg_css)) { + async_chunk->blkcg_css, true)) { const u64 start = async_extent->start; const u64 end = start + async_extent->ram_size - 1; @@ -3004,6 +3004,7 @@ static int insert_ordered_extent_file_extent(struct btrfs_trans_handle *trans, * except if the ordered extent was truncated. */ update_inode_bytes = test_bit(BTRFS_ORDERED_DIRECT, &oe->flags) || + test_bit(BTRFS_ORDERED_ENCODED, &oe->flags) || test_bit(BTRFS_ORDERED_TRUNCATED, &oe->flags); return insert_reserved_file_extent(trans, BTRFS_I(oe->inode), @@ -3038,7 +3039,8 @@ static int btrfs_finish_ordered_io(struct btrfs_ordered_extent *ordered_extent) if (!test_bit(BTRFS_ORDERED_NOCOW, &ordered_extent->flags) && !test_bit(BTRFS_ORDERED_PREALLOC, &ordered_extent->flags) && - !test_bit(BTRFS_ORDERED_DIRECT, &ordered_extent->flags)) + !test_bit(BTRFS_ORDERED_DIRECT, &ordered_extent->flags) && + !test_bit(BTRFS_ORDERED_ENCODED, &ordered_extent->flags)) clear_bits |= EXTENT_DELALLOC_NEW; freespace_inode = btrfs_is_free_space_inode(inode); @@ -10632,6 +10634,255 @@ ssize_t btrfs_encoded_read(struct kiocb *iocb, struct iov_iter *iter, return ret; } +ssize_t btrfs_do_encoded_write(struct kiocb *iocb, struct iov_iter *from, + const struct btrfs_ioctl_encoded_io_args *encoded) +{ + struct btrfs_inode *inode = BTRFS_I(file_inode(iocb->ki_filp)); + struct btrfs_root *root = inode->root; + struct btrfs_fs_info *fs_info = root->fs_info; + struct extent_io_tree *io_tree = &inode->io_tree; + struct extent_changeset *data_reserved = NULL; + struct extent_state *cached_state = NULL; + int compression; + size_t orig_count; + u64 start, end; + u64 num_bytes, ram_bytes, disk_num_bytes; + unsigned long nr_pages, i; + struct page **pages; + struct btrfs_key ins; + bool extent_reserved = false; + struct extent_map *em; + ssize_t ret; + + switch (encoded->compression) { + case BTRFS_ENCODED_IO_COMPRESSION_ZLIB: + compression = BTRFS_COMPRESS_ZLIB; + break; + case BTRFS_ENCODED_IO_COMPRESSION_ZSTD: + compression = BTRFS_COMPRESS_ZSTD; + break; + case BTRFS_ENCODED_IO_COMPRESSION_LZO_4K: + case BTRFS_ENCODED_IO_COMPRESSION_LZO_8K: + case BTRFS_ENCODED_IO_COMPRESSION_LZO_16K: + case BTRFS_ENCODED_IO_COMPRESSION_LZO_32K: + case BTRFS_ENCODED_IO_COMPRESSION_LZO_64K: + /* The sector size must match for LZO. */ + if (encoded->compression - + BTRFS_ENCODED_IO_COMPRESSION_LZO_4K + 12 != + fs_info->sectorsize_bits) + return -EINVAL; + compression = BTRFS_COMPRESS_LZO; + break; + default: + return -EINVAL; + } + if (encoded->encryption != BTRFS_ENCODED_IO_ENCRYPTION_NONE) + return -EINVAL; + + orig_count = iov_iter_count(from); + + /* The extent size must be sane. */ + if (encoded->unencoded_len > BTRFS_MAX_UNCOMPRESSED || + orig_count > BTRFS_MAX_COMPRESSED || orig_count == 0) + return -EINVAL; + + /* + * The compressed data must be smaller than the decompressed data. + * + * It's of course possible for data to compress to larger or the same + * size, but the buffered I/O path falls back to no compression for such + * data, and we don't want to break any assumptions by creating these + * extents. + * + * Note that this is less strict than the current check we have that the + * compressed data must be at least one sector smaller than the + * decompressed data. We only want to enforce the weaker requirement + * from old kernels that it is at least one byte smaller. + */ + if (orig_count >= encoded->unencoded_len) + return -EINVAL; + + /* The extent must start on a sector boundary. */ + start = iocb->ki_pos; + if (!IS_ALIGNED(start, fs_info->sectorsize)) + return -EINVAL; + + /* + * The extent must end on a sector boundary. However, we allow a write + * which ends at or extends i_size to have an unaligned length; we round + * up the extent size and set i_size to the unaligned end. + */ + if (start + encoded->len < inode->vfs_inode.i_size && + !IS_ALIGNED(start + encoded->len, fs_info->sectorsize)) + return -EINVAL; + + /* Finally, the offset in the unencoded data must be sector-aligned. */ + if (!IS_ALIGNED(encoded->unencoded_offset, fs_info->sectorsize)) + return -EINVAL; + + num_bytes = ALIGN(encoded->len, fs_info->sectorsize); + ram_bytes = ALIGN(encoded->unencoded_len, fs_info->sectorsize); + end = start + num_bytes - 1; + + /* + * If the extent cannot be inline, the compressed data on disk must be + * sector-aligned. For convenience, we extend it with zeroes if it + * isn't. + */ + disk_num_bytes = ALIGN(orig_count, fs_info->sectorsize); + nr_pages = DIV_ROUND_UP(disk_num_bytes, PAGE_SIZE); + pages = kvcalloc(nr_pages, sizeof(struct page *), GFP_KERNEL_ACCOUNT); + if (!pages) + return -ENOMEM; + for (i = 0; i < nr_pages; i++) { + size_t bytes = min_t(size_t, PAGE_SIZE, iov_iter_count(from)); + char *kaddr; + + pages[i] = alloc_page(GFP_KERNEL_ACCOUNT); + if (!pages[i]) { + ret = -ENOMEM; + goto out_pages; + } + kaddr = kmap(pages[i]); + if (copy_from_iter(kaddr, bytes, from) != bytes) { + kunmap(pages[i]); + ret = -EFAULT; + goto out_pages; + } + if (bytes < PAGE_SIZE) + memset(kaddr + bytes, 0, PAGE_SIZE - bytes); + kunmap(pages[i]); + } + + for (;;) { + struct btrfs_ordered_extent *ordered; + + ret = btrfs_wait_ordered_range(&inode->vfs_inode, start, + num_bytes); + if (ret) + goto out_pages; + ret = invalidate_inode_pages2_range(inode->vfs_inode.i_mapping, + start >> PAGE_SHIFT, + end >> PAGE_SHIFT); + if (ret) + goto out_pages; + lock_extent_bits(io_tree, start, end, &cached_state); + ordered = btrfs_lookup_ordered_range(inode, start, + num_bytes); + if (!ordered && + !filemap_range_has_page(inode->vfs_inode.i_mapping, start, + end)) + break; + if (ordered) + btrfs_put_ordered_extent(ordered); + unlock_extent_cached(io_tree, start, end, &cached_state); + cond_resched(); + } + + /* + * We don't use the higher-level delalloc space functions because our + * num_bytes and disk_num_bytes are different. + */ + ret = btrfs_alloc_data_chunk_ondemand(inode, disk_num_bytes); + if (ret) + goto out_unlock; + ret = btrfs_qgroup_reserve_data(inode, &data_reserved, start, + num_bytes); + if (ret) + goto out_free_data_space; + ret = btrfs_delalloc_reserve_metadata(inode, num_bytes, disk_num_bytes); + if (ret) + goto out_qgroup_free_data; + + /* Try an inline extent first. */ + if (start == 0 && encoded->unencoded_len == encoded->len && + encoded->unencoded_offset == 0) { + ret = cow_file_range_inline(inode, encoded->len, orig_count, + compression, pages, true); + if (ret <= 0) { + if (ret == 0) + ret = orig_count; + goto out_delalloc_release; + } + } + + ret = btrfs_reserve_extent(root, disk_num_bytes, disk_num_bytes, + disk_num_bytes, 0, 0, &ins, 1, 1); + if (ret) + goto out_delalloc_release; + extent_reserved = true; + + em = create_io_em(inode, start, num_bytes, + start - encoded->unencoded_offset, ins.objectid, + ins.offset, ins.offset, ram_bytes, compression, + BTRFS_ORDERED_COMPRESSED); + if (IS_ERR(em)) { + ret = PTR_ERR(em); + goto out_free_reserved; + } + free_extent_map(em); + + ret = btrfs_add_ordered_extent(inode, start, num_bytes, ram_bytes, + ins.objectid, ins.offset, + encoded->unencoded_offset, + (1 << BTRFS_ORDERED_ENCODED) | + (1 << BTRFS_ORDERED_COMPRESSED), + compression); + if (ret) { + btrfs_drop_extent_cache(inode, start, end, 0); + goto out_free_reserved; + } + btrfs_dec_block_group_reservations(fs_info, ins.objectid); + + if (start + encoded->len > inode->vfs_inode.i_size) + i_size_write(&inode->vfs_inode, start + encoded->len); + + unlock_extent_cached(io_tree, start, end, &cached_state); + + btrfs_delalloc_release_extents(inode, num_bytes); + + if (btrfs_submit_compressed_write(inode, start, num_bytes, ins.objectid, + ins.offset, pages, nr_pages, 0, NULL, + false)) { + btrfs_writepage_endio_finish_ordered(inode, pages[0], + start, end, 0); + ret = -EIO; + goto out_pages; + } + ret = orig_count; + goto out; + +out_free_reserved: + btrfs_dec_block_group_reservations(fs_info, ins.objectid); + btrfs_free_reserved_extent(fs_info, ins.objectid, ins.offset, 1); +out_delalloc_release: + btrfs_delalloc_release_extents(inode, num_bytes); + btrfs_delalloc_release_metadata(inode, disk_num_bytes, ret < 0); +out_qgroup_free_data: + if (ret < 0) { + btrfs_qgroup_free_data(inode, data_reserved, start, num_bytes); + } +out_free_data_space: + /* + * If btrfs_reserve_extent() succeeded, then we already decremented + * bytes_may_use. + */ + if (!extent_reserved) + btrfs_free_reserved_data_space_noquota(fs_info, disk_num_bytes); +out_unlock: + unlock_extent_cached(io_tree, start, end, &cached_state); +out_pages: + for (i = 0; i < nr_pages; i++) { + if (pages[i]) + __free_page(pages[i]); + } + kvfree(pages); +out: + if (ret >= 0) + iocb->ki_pos += encoded->len; + return ret; +} + #ifdef CONFIG_SWAP /* * Add an entry indicating a block group or device which is pinned by a diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c index 855ab63cc9a0..2b1cfc906fd3 100644 --- a/fs/btrfs/ioctl.c +++ b/fs/btrfs/ioctl.c @@ -105,6 +105,8 @@ struct btrfs_ioctl_encoded_io_args_32 { #define BTRFS_IOC_ENCODED_READ_32 _IOR(BTRFS_IOCTL_MAGIC, 64, \ struct btrfs_ioctl_encoded_io_args_32) +#define BTRFS_IOC_ENCODED_WRITE_32 _IOW(BTRFS_IOCTL_MAGIC, 64, \ + struct btrfs_ioctl_encoded_io_args_32) #endif /* Mask out flags that are inappropriate for the given type of inode. */ @@ -5077,6 +5079,102 @@ static int btrfs_ioctl_encoded_read(struct file *file, void __user *argp, return ret; } +static int btrfs_ioctl_encoded_write(struct file *file, void __user *argp, + bool compat) +{ + struct btrfs_ioctl_encoded_io_args args; + struct iovec iovstack[UIO_FASTIOV]; + struct iovec *iov = iovstack; + struct iov_iter iter; + loff_t pos; + struct kiocb kiocb; + ssize_t ret; + + if (!capable(CAP_SYS_ADMIN)) { + ret = -EPERM; + goto out_acct; + } + + if (!(file->f_mode & FMODE_WRITE)) { + ret = -EBADF; + goto out_acct; + } + + if (compat) { +#if defined(CONFIG_64BIT) && defined(CONFIG_COMPAT) + struct btrfs_ioctl_encoded_io_args_32 args32; + + if (copy_from_user(&args32, argp, sizeof(args32))) { + ret = -EFAULT; + goto out_acct; + } + args.iov = compat_ptr(args32.iov); + args.iovcnt = args32.iovcnt; + memcpy(&args.offset, &args32.offset, + sizeof(args) - + offsetof(struct btrfs_ioctl_encoded_io_args, offset)); +#else + return -ENOTTY; +#endif + } else { + if (copy_from_user(&args, argp, sizeof(args))) { + ret = -EFAULT; + goto out_acct; + } + } + + ret = -EINVAL; + if (args.flags != 0) + goto out_acct; + if (memchr_inv(args.reserved, 0, sizeof(args.reserved))) + goto out_acct; + if (args.compression == BTRFS_ENCODED_IO_COMPRESSION_NONE && + args.encryption == BTRFS_ENCODED_IO_ENCRYPTION_NONE) + goto out_acct; + if (args.compression >= BTRFS_ENCODED_IO_COMPRESSION_TYPES || + args.encryption >= BTRFS_ENCODED_IO_ENCRYPTION_TYPES) + goto out_acct; + if (args.unencoded_offset > args.unencoded_len) + goto out_acct; + if (args.len > args.unencoded_len - args.unencoded_offset) + goto out_acct; + + ret = import_iovec(WRITE, args.iov, args.iovcnt, ARRAY_SIZE(iovstack), + &iov, &iter); + if (ret < 0) + goto out_acct; + + file_start_write(file); + + if (iov_iter_count(&iter) == 0) { + ret = 0; + goto out_end_write; + } + pos = args.offset; + ret = rw_verify_area(WRITE, file, &pos, args.len); + if (ret < 0) + goto out_end_write; + + init_sync_kiocb(&kiocb, file); + ret = kiocb_set_rw_flags(&kiocb, 0); + if (ret) + goto out_end_write; + kiocb.ki_pos = pos; + + ret = btrfs_do_write_iter(&kiocb, &iter, &args); + if (ret > 0) + fsnotify_modify(file); + +out_end_write: + file_end_write(file); + kfree(iov); +out_acct: + if (ret > 0) + add_wchar(current, ret); + inc_syscw(current); + return ret; +} + long btrfs_ioctl(struct file *file, unsigned int cmd, unsigned long arg) { @@ -5223,9 +5321,13 @@ long btrfs_ioctl(struct file *file, unsigned int return fsverity_ioctl_measure(file, argp); case BTRFS_IOC_ENCODED_READ: return btrfs_ioctl_encoded_read(file, argp, false); + case BTRFS_IOC_ENCODED_WRITE: + return btrfs_ioctl_encoded_write(file, argp, false); #if defined(CONFIG_64BIT) && defined(CONFIG_COMPAT) case BTRFS_IOC_ENCODED_READ_32: return btrfs_ioctl_encoded_read(file, argp, true); + case BTRFS_IOC_ENCODED_WRITE_32: + return btrfs_ioctl_encoded_write(file, argp, true); #endif } diff --git a/fs/btrfs/ordered-data.c b/fs/btrfs/ordered-data.c index 5e4c59b00b01..7837336ab4b9 100644 --- a/fs/btrfs/ordered-data.c +++ b/fs/btrfs/ordered-data.c @@ -521,9 +521,15 @@ void btrfs_remove_ordered_extent(struct btrfs_inode *btrfs_inode, spin_lock(&btrfs_inode->lock); btrfs_mod_outstanding_extents(btrfs_inode, -1); spin_unlock(&btrfs_inode->lock); - if (root != fs_info->tree_root) - btrfs_delalloc_release_metadata(btrfs_inode, entry->num_bytes, - false); + if (root != fs_info->tree_root) { + u64 release; + + if (test_bit(BTRFS_ORDERED_ENCODED, &entry->flags)) + release = entry->disk_num_bytes; + else + release = entry->num_bytes; + btrfs_delalloc_release_metadata(btrfs_inode, release, false); + } percpu_counter_add_batch(&fs_info->ordered_bytes, -entry->num_bytes, fs_info->delalloc_batch); diff --git a/fs/btrfs/ordered-data.h b/fs/btrfs/ordered-data.h index 0feb0c29839e..aeb17714ba5a 100644 --- a/fs/btrfs/ordered-data.h +++ b/fs/btrfs/ordered-data.h @@ -74,6 +74,8 @@ enum { BTRFS_ORDERED_LOGGED_CSUM, /* We wait for this extent to complete in the current transaction */ BTRFS_ORDERED_PENDING, + /* BTRFS_IOC_ENCODED_WRITE */ + BTRFS_ORDERED_ENCODED, }; /* BTRFS_ORDERED_* flags that specify the type of the extent. */ @@ -81,7 +83,8 @@ enum { (1UL << BTRFS_ORDERED_NOCOW) | \ (1UL << BTRFS_ORDERED_PREALLOC) | \ (1UL << BTRFS_ORDERED_COMPRESSED) | \ - (1UL << BTRFS_ORDERED_DIRECT)) + (1UL << BTRFS_ORDERED_DIRECT) | \ + (1UL << BTRFS_ORDERED_ENCODED)) struct btrfs_ordered_extent { /* logical offset in the file */ From patchwork Thu Feb 10 19:10:01 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Omar Sandoval X-Patchwork-Id: 12742341 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 0657DC433EF for ; Thu, 10 Feb 2022 19:10:43 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1343740AbiBJTKk (ORCPT ); Thu, 10 Feb 2022 14:10:40 -0500 Received: from mxb-00190b01.gslb.pphosted.com ([23.128.96.19]:48238 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1343738AbiBJTKk (ORCPT ); Thu, 10 Feb 2022 14:10:40 -0500 Received: from mail-pj1-x1030.google.com (mail-pj1-x1030.google.com [IPv6:2607:f8b0:4864:20::1030]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 368AE1167 for ; Thu, 10 Feb 2022 11:10:41 -0800 (PST) Received: by mail-pj1-x1030.google.com with SMTP id h7-20020a17090a648700b001b927560c2bso5329561pjj.1 for ; Thu, 10 Feb 2022 11:10:41 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=osandov-com.20210112.gappssmtp.com; s=20210112; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=gqizGt0Qx9qi5ldlK0GqPvkGDdm7qRBFk1/irnafi8A=; b=vj37FVMVRcDRkcJpW+G2yV9foeddiUPDRMqKTh4YZotc9kVgcjJ14OUVt3p0PmRWKi sM3DZ9AjOTVGeIi2CNMP0hf2y/6auJFLRJa42Rh3BCSRAwE7uuVnOwUzYhRKVUFwjEZe KpA9utY+ArPRAAao+5CaviMYrD0CQLd5xv08k0xOZnW9OqFYqO66xHfkNYPSmyFhymb5 uxqf1CwF/uo7EKg6zvi7DnCZrygQGGDa98d/KNIoqO0GkBlc2PlFzHi1B9CX4a+YUAFo ksTxhKv6HnREgnyE07RAMR8XAvne/j/g6v12kbgWn4wRVYYRIS1cVGehe5IG6o7SL3IP X0/A== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=gqizGt0Qx9qi5ldlK0GqPvkGDdm7qRBFk1/irnafi8A=; b=xNWK3ZYNMB/JR6MUdeky9yLLBc51/MXY/0D7Pz6q5UxqtEzd11Yyd3l/hDaoh4UXYA 02/5YsDHzpJr+iUrLzE6lzWd9lg0hrmrdmI1mMa+rjZhD3N09H21BxGr8x/wTn72R3E6 LY/G8jWcDHCbF40qY1Bp4Fbea1UaSwbAri8GVNwfI31JEVTp+iF3pE6A3IaPKoK/h/t2 EglnxFBUdKeEpLGZu5K0/RezDYLJDAmCUETYpmUeV4fFZ7KW6lVX8c+ml3tLlAEG4EX5 ayUROFQuF3aYOQEwydRapjOwj894/9dKc8NLUPOhe64qM7NFVZ3X4FjRKWXsPvZ+i1lp /QlQ== X-Gm-Message-State: AOAM530LaJBUdbtxWpo5F3FwMjDDgV/y+KPdULVuifhBEquuDBFgZzNy PhBFezUGs+TJJ/J4F0Z9XK/pd8Ro7FSvFQ== X-Google-Smtp-Source: ABdhPJzVkuDKW7MwAtt9R/DPF5tu7/xnzvcQucYgBbIIHPbKRwIXTlso+Z8ndLl8FUGFjOGsOZ6JuQ== X-Received: by 2002:a17:903:18a:: with SMTP id z10mr8999515plg.119.1644520240364; Thu, 10 Feb 2022 11:10:40 -0800 (PST) Received: from relinquished.tfbnw.net ([2620:10d:c090:400::5:1975]) by smtp.gmail.com with ESMTPSA id n5sm8315898pgt.22.2022.02.10.11.10.39 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 10 Feb 2022 11:10:39 -0800 (PST) From: Omar Sandoval To: linux-btrfs@vger.kernel.org Cc: kernel-team@fb.com Subject: [PATCH v13 11/17] btrfs: send: remove unused send_ctx::{total,cmd}_send_size Date: Thu, 10 Feb 2022 11:10:01 -0800 Message-Id: <2e6872831ec1d89ddea0bc56f0ad4aed49cb3c51.1644519257.git.osandov@fb.com> X-Mailer: git-send-email 2.35.1 In-Reply-To: References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org From: Omar Sandoval We collect these statistics but have never used them for anything. Signed-off-by: Omar Sandoval --- fs/btrfs/send.c | 4 ---- 1 file changed, 4 deletions(-) diff --git a/fs/btrfs/send.c b/fs/btrfs/send.c index 7d1642937274..97e97aa7f4d7 100644 --- a/fs/btrfs/send.c +++ b/fs/btrfs/send.c @@ -82,8 +82,6 @@ struct send_ctx { char *send_buf; u32 send_size; u32 send_max_size; - u64 total_send_size; - u64 cmd_send_size[BTRFS_SEND_C_MAX + 1]; u64 flags; /* 'flags' member of btrfs_ioctl_send_args is u64 */ /* Protocol version compatibility requested */ u32 proto; @@ -727,8 +725,6 @@ static int send_cmd(struct send_ctx *sctx) ret = write_buf(sctx->send_filp, sctx->send_buf, sctx->send_size, &sctx->send_off); - sctx->total_send_size += sctx->send_size; - sctx->cmd_send_size[get_unaligned_le16(&hdr->cmd)] += sctx->send_size; sctx->send_size = 0; return ret; From patchwork Thu Feb 10 19:10:02 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Omar Sandoval X-Patchwork-Id: 12742343 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id C98A4C4332F for ; Thu, 10 Feb 2022 19:10:44 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1343743AbiBJTKm (ORCPT ); Thu, 10 Feb 2022 14:10:42 -0500 Received: from mxb-00190b01.gslb.pphosted.com ([23.128.96.19]:48246 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1343741AbiBJTKl (ORCPT ); Thu, 10 Feb 2022 14:10:41 -0500 Received: from mail-pj1-x102d.google.com (mail-pj1-x102d.google.com [IPv6:2607:f8b0:4864:20::102d]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id EE7FB110C for ; Thu, 10 Feb 2022 11:10:41 -0800 (PST) Received: by mail-pj1-x102d.google.com with SMTP id on2so6024113pjb.4 for ; Thu, 10 Feb 2022 11:10:41 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=osandov-com.20210112.gappssmtp.com; s=20210112; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=oUUCIYm5W31hLoWrGj3mWOGa1Y94atwoF4AGBZVVvjY=; b=wP87yrorEcSWjq+e14UiLqoCfHRVJDtUNsvXUqU63cH0t0VVOguzS1Qn4z0gt2UiDd 9j5p+9xLPD43lZ2pWyzIFEJFj+FgqtPGElUEzxg4VPXlusax4LhDLpFq49ciKEbU0Kw+ DPbAlzGngUXMxvvk24nskyXvAouXVPx9rjQ+OqoAVr1X27k9tlLWwipSY3R0+y0N6zj2 F/LAYk2u5CIdlecoUYokBUa1Irz1qyVPeBVaKO2fjCOEckmzHgKDNEjSPIdEwmhl5eBg sOGI6d1teF2OquezFhXa7LaS25LnLZXZ4Onyp5PQrdDeyRF7Tu0DEBUp/W/Fvd8QrMmU evbg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=oUUCIYm5W31hLoWrGj3mWOGa1Y94atwoF4AGBZVVvjY=; b=ZRxrQdDuqd3l0vXcU3e8B2axfbcv8BL/Xqjv6ho79u/K6K6Kv5atyQhPgniU/Yi5pw m8fGqSKgPDlXHd8o5BvXE0zvgg15+7UyJNTyoRp/dfRNslWft61jcbapUEPq20czt+FZ Nd9RgyieFEXUEPRxtEg5A8MM18QJn25KQI8mVvDtF0gXj6oeJ96KK9Hhbh0I63DzuUiy L7EKED16SbWsI/0pRbuwXMUtCw4TcBOYPjxICZn5As3X8irp4u6V5VZqQDL8pIpVdyuZ 42VCHoIDGfeM4/nVea4ov3dCAn/SZVjIrKZrL0DszV4k+6JVv37XRFS/Y2fqF0V8fErb r9KA== X-Gm-Message-State: AOAM533R4J/Tn3BfbdZ5V6qqGsNBiRPeGGCd+bJ9BaT8r7GWaXZ43X6J LNpbbff1kzLrrgMe/XzBWUFr2AKhJM4Ahw== X-Google-Smtp-Source: ABdhPJyDUypIV15aVvlSg9IQ6IY+0cAFX99Pm+stFRbOQtjyRnFUSkmNSQ+hd6VwQzWeF4UiYWF2BA== X-Received: by 2002:a17:90b:354:: with SMTP id fh20mr4267345pjb.134.1644520241231; Thu, 10 Feb 2022 11:10:41 -0800 (PST) Received: from relinquished.tfbnw.net ([2620:10d:c090:400::5:1975]) by smtp.gmail.com with ESMTPSA id n5sm8315898pgt.22.2022.02.10.11.10.40 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 10 Feb 2022 11:10:40 -0800 (PST) From: Omar Sandoval To: linux-btrfs@vger.kernel.org Cc: kernel-team@fb.com Subject: [PATCH v13 12/17] btrfs: send: explicitly number commands and attributes Date: Thu, 10 Feb 2022 11:10:02 -0800 Message-Id: <23d929398ec3abee3b5b3e140b28470e26ce8bc1.1644519257.git.osandov@fb.com> X-Mailer: git-send-email 2.35.1 In-Reply-To: References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org From: Omar Sandoval Commit e77fbf990316 ("btrfs: send: prepare for v2 protocol") added _BTRFS_SEND_C_MAX_V* macros equal to the maximum command number for the version plus 1, but as written this creates gaps in the number space. The maximum command number is currently 22, and __BTRFS_SEND_C_MAX_V1 is accordingly 23. But then __BTRFS_SEND_C_MAX_V2 is 24, suggesting that v2 has a command numbered 23, and __BTRFS_SEND_C_MAX is 25, suggesting that 23 and 24 are valid commands. Instead, let's explicitly number all of the commands, attributes, and sentinel MAX constants. Signed-off-by: Omar Sandoval --- fs/btrfs/send.c | 4 +- fs/btrfs/send.h | 106 ++++++++++++++++++++++++------------------------ 2 files changed, 54 insertions(+), 56 deletions(-) diff --git a/fs/btrfs/send.c b/fs/btrfs/send.c index 97e97aa7f4d7..4fcfd81ade5e 100644 --- a/fs/btrfs/send.c +++ b/fs/btrfs/send.c @@ -326,8 +326,8 @@ __maybe_unused static bool proto_cmd_ok(const struct send_ctx *sctx, int cmd) { switch (sctx->proto) { - case 1: return cmd < __BTRFS_SEND_C_MAX_V1; - case 2: return cmd < __BTRFS_SEND_C_MAX_V2; + case 1: return cmd <= BTRFS_SEND_C_MAX_V1; + case 2: return cmd <= BTRFS_SEND_C_MAX_V2; default: return false; } } diff --git a/fs/btrfs/send.h b/fs/btrfs/send.h index 08602fdd600a..67721e0281ba 100644 --- a/fs/btrfs/send.h +++ b/fs/btrfs/send.h @@ -46,84 +46,82 @@ struct btrfs_tlv_header { /* commands */ enum btrfs_send_cmd { - BTRFS_SEND_C_UNSPEC, + BTRFS_SEND_C_UNSPEC = 0, /* Version 1 */ - BTRFS_SEND_C_SUBVOL, - BTRFS_SEND_C_SNAPSHOT, + BTRFS_SEND_C_SUBVOL = 1, + BTRFS_SEND_C_SNAPSHOT = 2, - BTRFS_SEND_C_MKFILE, - BTRFS_SEND_C_MKDIR, - BTRFS_SEND_C_MKNOD, - BTRFS_SEND_C_MKFIFO, - BTRFS_SEND_C_MKSOCK, - BTRFS_SEND_C_SYMLINK, + BTRFS_SEND_C_MKFILE = 3, + BTRFS_SEND_C_MKDIR = 4, + BTRFS_SEND_C_MKNOD = 5, + BTRFS_SEND_C_MKFIFO = 6, + BTRFS_SEND_C_MKSOCK = 7, + BTRFS_SEND_C_SYMLINK = 8, - BTRFS_SEND_C_RENAME, - BTRFS_SEND_C_LINK, - BTRFS_SEND_C_UNLINK, - BTRFS_SEND_C_RMDIR, + BTRFS_SEND_C_RENAME = 9, + BTRFS_SEND_C_LINK = 10, + BTRFS_SEND_C_UNLINK = 11, + BTRFS_SEND_C_RMDIR = 12, - BTRFS_SEND_C_SET_XATTR, - BTRFS_SEND_C_REMOVE_XATTR, + BTRFS_SEND_C_SET_XATTR = 13, + BTRFS_SEND_C_REMOVE_XATTR = 14, - BTRFS_SEND_C_WRITE, - BTRFS_SEND_C_CLONE, + BTRFS_SEND_C_WRITE = 15, + BTRFS_SEND_C_CLONE = 16, - BTRFS_SEND_C_TRUNCATE, - BTRFS_SEND_C_CHMOD, - BTRFS_SEND_C_CHOWN, - BTRFS_SEND_C_UTIMES, + BTRFS_SEND_C_TRUNCATE = 17, + BTRFS_SEND_C_CHMOD = 18, + BTRFS_SEND_C_CHOWN = 19, + BTRFS_SEND_C_UTIMES = 20, - BTRFS_SEND_C_END, - BTRFS_SEND_C_UPDATE_EXTENT, - __BTRFS_SEND_C_MAX_V1, + BTRFS_SEND_C_END = 21, + BTRFS_SEND_C_UPDATE_EXTENT = 22, + BTRFS_SEND_C_MAX_V1 = 22, /* Version 2 */ - __BTRFS_SEND_C_MAX_V2, + BTRFS_SEND_C_MAX_V2 = 22, /* End */ - __BTRFS_SEND_C_MAX, + BTRFS_SEND_C_MAX = 22, }; -#define BTRFS_SEND_C_MAX (__BTRFS_SEND_C_MAX - 1) /* attributes in send stream */ enum { - BTRFS_SEND_A_UNSPEC, + BTRFS_SEND_A_UNSPEC = 0, - BTRFS_SEND_A_UUID, - BTRFS_SEND_A_CTRANSID, + BTRFS_SEND_A_UUID = 1, + BTRFS_SEND_A_CTRANSID = 2, - BTRFS_SEND_A_INO, - BTRFS_SEND_A_SIZE, - BTRFS_SEND_A_MODE, - BTRFS_SEND_A_UID, - BTRFS_SEND_A_GID, - BTRFS_SEND_A_RDEV, - BTRFS_SEND_A_CTIME, - BTRFS_SEND_A_MTIME, - BTRFS_SEND_A_ATIME, - BTRFS_SEND_A_OTIME, + BTRFS_SEND_A_INO = 3, + BTRFS_SEND_A_SIZE = 4, + BTRFS_SEND_A_MODE = 5, + BTRFS_SEND_A_UID = 6, + BTRFS_SEND_A_GID = 7, + BTRFS_SEND_A_RDEV = 8, + BTRFS_SEND_A_CTIME = 9, + BTRFS_SEND_A_MTIME = 10, + BTRFS_SEND_A_ATIME = 11, + BTRFS_SEND_A_OTIME = 12, - BTRFS_SEND_A_XATTR_NAME, - BTRFS_SEND_A_XATTR_DATA, + BTRFS_SEND_A_XATTR_NAME = 13, + BTRFS_SEND_A_XATTR_DATA = 14, - BTRFS_SEND_A_PATH, - BTRFS_SEND_A_PATH_TO, - BTRFS_SEND_A_PATH_LINK, + BTRFS_SEND_A_PATH = 15, + BTRFS_SEND_A_PATH_TO = 16, + BTRFS_SEND_A_PATH_LINK = 17, - BTRFS_SEND_A_FILE_OFFSET, - BTRFS_SEND_A_DATA, + BTRFS_SEND_A_FILE_OFFSET = 18, + BTRFS_SEND_A_DATA = 19, - BTRFS_SEND_A_CLONE_UUID, - BTRFS_SEND_A_CLONE_CTRANSID, - BTRFS_SEND_A_CLONE_PATH, - BTRFS_SEND_A_CLONE_OFFSET, - BTRFS_SEND_A_CLONE_LEN, + BTRFS_SEND_A_CLONE_UUID = 20, + BTRFS_SEND_A_CLONE_CTRANSID = 21, + BTRFS_SEND_A_CLONE_PATH = 22, + BTRFS_SEND_A_CLONE_OFFSET = 23, + BTRFS_SEND_A_CLONE_LEN = 24, - __BTRFS_SEND_A_MAX, + BTRFS_SEND_A_MAX = 24, }; -#define BTRFS_SEND_A_MAX (__BTRFS_SEND_A_MAX - 1) #ifdef __KERNEL__ long btrfs_ioctl_send(struct inode *inode, struct btrfs_ioctl_send_args *arg); From patchwork Thu Feb 10 19:10:03 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Omar Sandoval X-Patchwork-Id: 12742344 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id DD679C433EF for ; Thu, 10 Feb 2022 19:10:45 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1343746AbiBJTKn (ORCPT ); Thu, 10 Feb 2022 14:10:43 -0500 Received: from mxb-00190b01.gslb.pphosted.com ([23.128.96.19]:48270 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1343745AbiBJTKm (ORCPT ); Thu, 10 Feb 2022 14:10:42 -0500 Received: from mail-pl1-x62d.google.com (mail-pl1-x62d.google.com [IPv6:2607:f8b0:4864:20::62d]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 1DDA8110C for ; Thu, 10 Feb 2022 11:10:43 -0800 (PST) Received: by mail-pl1-x62d.google.com with SMTP id l9so1068292plg.0 for ; Thu, 10 Feb 2022 11:10:43 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=osandov-com.20210112.gappssmtp.com; s=20210112; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=BzdNK/xxnsl8agLtteSaV3bTofU+kl6waZ9fLxf1ucQ=; b=IoFbHLJMGmXdYs7kMJM5x9e0VFBVyRLYFANlA0CrrcOmoFOA32jvlrr5U2/FyfoGei 4m3qSL43J3sg4zOkWTy6h65Nql5dBg+g1+xZOerMzUQ0hZCvzN0MziiVJqt8KMHKeIe6 G/XVYptj5/fOIUKriMeWm70u7oKb7vXvTykwwFl+YAPaSncQutNBqOzn/f9+evpBOGWF WbkzDQQzQkxDgVR4lNCMdi4+RiWVJab/YBKWTeVNfGPz4J7ZWnXo/gdFXaI0JPFKGhGc z10CLtkf9SQupZfPJKz4oqwMqitOK0oJxS1z2ujG1gpM+IfKdwR2fAT/bhWwTBPtXFec RINg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=BzdNK/xxnsl8agLtteSaV3bTofU+kl6waZ9fLxf1ucQ=; b=QvO4mhTGG+MjIt/dbomWgw2WZOVsnMqOxszhicNtllnDT27B5ojrJFY9oY1CP5rD0R sRgOaUJzLz8thfMY7H3uvOvzn4TW9odqVyBAf+OAeyefsgfPBsLB9LZGYE8hhrpabsFP GuHEFMoXEeAPLzIa9vNfKXEKfxsyNEO7ahGY4rKHaFCBmHyNP2uuK17Cdqe3veRY/l+r HDt3ebrwnfT+uViTsZrtStwYUn9uSQSo8BMe6DI9bV/teE7/t6zhaH8kVLhjrTxsSjtk Z58jG/ch67YueXWg5aUbeM56Cf8t3lv9Kca4tSasi4RNcl51dy/k/eAPGtdTr6v2alpA 3mNQ== X-Gm-Message-State: AOAM532HFIrF8n3pxBiVYeGEnkBuvqt6S7+kLiYX9WQOi4bajRNfNCtB VYdesnaevbaP76q1ajJAQkiZ/TA8+sHU2g== X-Google-Smtp-Source: ABdhPJzA++uEFFdWT/rkX2vJtxrfdfOI5Th9QnIMmUdNI8S2r+0ljuBYIo8N7eA3wG9MrT1hGPG+2Q== X-Received: by 2002:a17:902:b215:: with SMTP id t21mr8641409plr.69.1644520242332; Thu, 10 Feb 2022 11:10:42 -0800 (PST) Received: from relinquished.tfbnw.net ([2620:10d:c090:400::5:1975]) by smtp.gmail.com with ESMTPSA id n5sm8315898pgt.22.2022.02.10.11.10.41 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 10 Feb 2022 11:10:41 -0800 (PST) From: Omar Sandoval To: linux-btrfs@vger.kernel.org Cc: kernel-team@fb.com Subject: [PATCH v13 13/17] btrfs: add send stream v2 definitions Date: Thu, 10 Feb 2022 11:10:03 -0800 Message-Id: X-Mailer: git-send-email 2.35.1 In-Reply-To: References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org From: Omar Sandoval This adds the definitions of the new commands for send stream version 2 and their respective attributes: fallocate, FS_IOC_SETFLAGS (a.k.a. chattr), and encoded writes. It also documents two changes to the send stream format in v2: the receiver shouldn't assume a maximum command size, and the DATA attribute is encoded differently to allow for writes larger than 64k. These will be implemented in subsequent changes, and then the ioctl will accept the new version and flag. Reviewed-by: Josef Bacik Signed-off-by: Omar Sandoval --- fs/btrfs/send.c | 2 +- fs/btrfs/send.h | 40 ++++++++++++++++++++++++++++++++++---- include/uapi/linux/btrfs.h | 7 +++++++ 3 files changed, 44 insertions(+), 5 deletions(-) diff --git a/fs/btrfs/send.c b/fs/btrfs/send.c index 4fcfd81ade5e..9c9be0983cb3 100644 --- a/fs/btrfs/send.c +++ b/fs/btrfs/send.c @@ -7564,7 +7564,7 @@ long btrfs_ioctl_send(struct inode *inode, struct btrfs_ioctl_send_args *arg) sctx->clone_roots_cnt = arg->clone_sources_count; - sctx->send_max_size = BTRFS_SEND_BUF_SIZE; + sctx->send_max_size = BTRFS_SEND_BUF_SIZE_V1; sctx->send_buf = kvmalloc(sctx->send_max_size, GFP_KERNEL); if (!sctx->send_buf) { ret = -ENOMEM; diff --git a/fs/btrfs/send.h b/fs/btrfs/send.h index 67721e0281ba..805d8095209a 100644 --- a/fs/btrfs/send.h +++ b/fs/btrfs/send.h @@ -12,7 +12,11 @@ #define BTRFS_SEND_STREAM_MAGIC "btrfs-stream" #define BTRFS_SEND_STREAM_VERSION 1 -#define BTRFS_SEND_BUF_SIZE SZ_64K +/* + * In send stream v1, no command is larger than 64k. In send stream v2, no limit + * should be assumed. + */ +#define BTRFS_SEND_BUF_SIZE_V1 SZ_64K enum btrfs_tlv_type { BTRFS_TLV_U8, @@ -80,16 +84,20 @@ enum btrfs_send_cmd { BTRFS_SEND_C_MAX_V1 = 22, /* Version 2 */ - BTRFS_SEND_C_MAX_V2 = 22, + BTRFS_SEND_C_FALLOCATE = 23, + BTRFS_SEND_C_SETFLAGS = 24, + BTRFS_SEND_C_ENCODED_WRITE = 25, + BTRFS_SEND_C_MAX_V2 = 25, /* End */ - BTRFS_SEND_C_MAX = 22, + BTRFS_SEND_C_MAX = 25, }; /* attributes in send stream */ enum { BTRFS_SEND_A_UNSPEC = 0, + /* Version 1 */ BTRFS_SEND_A_UUID = 1, BTRFS_SEND_A_CTRANSID = 2, @@ -112,6 +120,11 @@ enum { BTRFS_SEND_A_PATH_LINK = 17, BTRFS_SEND_A_FILE_OFFSET = 18, + /* + * As of send stream v2, this attribute is special: it must be the last + * attribute in a command, its header contains only the type, and its + * length is implicitly the remaining length of the command. + */ BTRFS_SEND_A_DATA = 19, BTRFS_SEND_A_CLONE_UUID = 20, @@ -120,7 +133,26 @@ enum { BTRFS_SEND_A_CLONE_OFFSET = 23, BTRFS_SEND_A_CLONE_LEN = 24, - BTRFS_SEND_A_MAX = 24, + BTRFS_SEND_A_MAX_V1 = 24, + + /* Version 2 */ + BTRFS_SEND_A_FALLOCATE_MODE = 25, + + BTRFS_SEND_A_SETFLAGS_FLAGS = 26, + + BTRFS_SEND_A_UNENCODED_FILE_LEN = 27, + BTRFS_SEND_A_UNENCODED_LEN = 28, + BTRFS_SEND_A_UNENCODED_OFFSET = 29, + /* + * COMPRESSION and ENCRYPTION default to NONE (0) if omitted from + * BTRFS_SEND_C_ENCODED_WRITE. + */ + BTRFS_SEND_A_COMPRESSION = 30, + BTRFS_SEND_A_ENCRYPTION = 31, + BTRFS_SEND_A_MAX_V2 = 31, + + /* End */ + BTRFS_SEND_A_MAX = 31, }; #ifdef __KERNEL__ diff --git a/include/uapi/linux/btrfs.h b/include/uapi/linux/btrfs.h index 1a96645243e0..38578b1ce0a3 100644 --- a/include/uapi/linux/btrfs.h +++ b/include/uapi/linux/btrfs.h @@ -777,6 +777,13 @@ struct btrfs_ioctl_received_subvol_args { */ #define BTRFS_SEND_FLAG_VERSION 0x8 +/* + * Send compressed data using the ENCODED_WRITE command instead of decompressing + * the data and sending it with the WRITE command. This requires protocol + * version >= 2. + */ +#define BTRFS_SEND_FLAG_COMPRESSED 0x10 + #define BTRFS_SEND_FLAG_MASK \ (BTRFS_SEND_FLAG_NO_FILE_DATA | \ BTRFS_SEND_FLAG_OMIT_STREAM_HEADER | \ From patchwork Thu Feb 10 19:10:04 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Omar Sandoval X-Patchwork-Id: 12742346 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 5982AC433F5 for ; Thu, 10 Feb 2022 19:10:46 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1343749AbiBJTKo (ORCPT ); Thu, 10 Feb 2022 14:10:44 -0500 Received: from mxb-00190b01.gslb.pphosted.com ([23.128.96.19]:48284 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1343738AbiBJTKn (ORCPT ); Thu, 10 Feb 2022 14:10:43 -0500 Received: from mail-pl1-x630.google.com (mail-pl1-x630.google.com [IPv6:2607:f8b0:4864:20::630]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 065152188 for ; Thu, 10 Feb 2022 11:10:44 -0800 (PST) Received: by mail-pl1-x630.google.com with SMTP id p6so2480682plf.10 for ; Thu, 10 Feb 2022 11:10:44 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=osandov-com.20210112.gappssmtp.com; s=20210112; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=oktzhnVhg73ak7/Gcb3l9iKfEFB0FYYIUdGjJPrwc2s=; b=Cs1YeznxTPN/dFj5PAIIjnjeQs30bstETfA7ZIXzgTrd265xhRF2JDNT/kMhwO7JFW WMOtwn89U35KPHDRPTHiJgjqAoFVjp78qcLujxoQ3c6GcoRziNkEMy/Sqmv+AmoqG5w9 Zq4sYRoQYOpBE6vZ52h7ercBYWq/52ALbe9eOEU4AH5gRI2O9psxBz2LjzXrqo1kp5al kBO002zWK/NUXsBnOb2+ExOCWJAEzTUNVzSCzudX2L2W4NI+9YPYNJGWIua46nCfYy39 ewvSlP1JO/ecx3Kg2Y7eZMju5odrSV5e0IPH8x3oreFdpvJYEtRgQUmsvvLDr97j9lBR Hd5g== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=oktzhnVhg73ak7/Gcb3l9iKfEFB0FYYIUdGjJPrwc2s=; b=sBLAIgNiv1sFUMBNPdaQJn3hlA67JriChqP+Tnqc3T7/fgqXd1dPF0Qauyio18c8Dx voz4pMch3i0NGWJAfXo87f54psqYpYj9RVRQ1o5+KLuSi1nq3ZYTvmU/BxjHX1g+XUMD qAjwac1e+TXAPoI0XEIpII/BbbcbdtdaMOzgzHqjPqGZPn4+U0aAaImcSLOFl/li5I7V kQxuH1Ln+CO5HkpgKodmdvlxIoNIVgTLcedZzQzkrglJgaHTXm+r2meiA681HyfVG1N8 PTatzK6gJum3z3gZPoCapdyLBByNpNMhJGFfNb4TlacjJBRiQj6v4gb1mK1h2NYV+M2x D8bg== X-Gm-Message-State: AOAM530kBfAhDRCxQC8bSzjtPfCXSSAohV3Y+B4TnyStIqxypsxMklAI TjlhTbAxftGKImdRC6r24nXiGdvD0Nia2w== X-Google-Smtp-Source: ABdhPJwwDFQ94lkK6JokG7apjzMx6JnGmkqDwhzbixvcU4abAy5g1h4+ExJ54CTGpxLbl9VAJrc+zw== X-Received: by 2002:a17:902:ab88:: with SMTP id f8mr9163764plr.27.1644520243159; Thu, 10 Feb 2022 11:10:43 -0800 (PST) Received: from relinquished.tfbnw.net ([2620:10d:c090:400::5:1975]) by smtp.gmail.com with ESMTPSA id n5sm8315898pgt.22.2022.02.10.11.10.42 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 10 Feb 2022 11:10:42 -0800 (PST) From: Omar Sandoval To: linux-btrfs@vger.kernel.org Cc: kernel-team@fb.com Subject: [PATCH v13 14/17] btrfs: send: write larger chunks when using stream v2 Date: Thu, 10 Feb 2022 11:10:04 -0800 Message-Id: <0235c8a8aeb00b8b0f2d784e82bb0eb470e848d8.1644519257.git.osandov@fb.com> X-Mailer: git-send-email 2.35.1 In-Reply-To: References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org From: Omar Sandoval The length field of the send stream TLV header is 16 bits. This means that the maximum amount of data that can be sent for one write is 64k minus one. However, encoded writes must be able to send the maximum compressed extent (128k) in one command. To support this, send stream version 2 encodes the DATA attribute differently: it has no length field, and the length is implicitly up to the end of containing command (which has a 32-bit length field). Although this is necessary for encoded writes, normal writes can benefit from it, too. Also add a check to enforce that the DATA attribute is last. It is only strictly necessary for v2, but we might as well make v1 consistent with it. For v2, let's bump up the send buffer to the maximum compressed extent size plus 16k for the other metadata (144k total). Since this will most likely be vmalloc'd (and always will be after the next commit), we round it up to the next page since we might as well use the rest of the page on systems with >16k pages. Reviewed-by: Nikolay Borisov Signed-off-by: Omar Sandoval --- fs/btrfs/send.c | 42 ++++++++++++++++++++++++++++++++++-------- 1 file changed, 34 insertions(+), 8 deletions(-) diff --git a/fs/btrfs/send.c b/fs/btrfs/send.c index 9c9be0983cb3..6d0686c51f80 100644 --- a/fs/btrfs/send.c +++ b/fs/btrfs/send.c @@ -82,6 +82,7 @@ struct send_ctx { char *send_buf; u32 send_size; u32 send_max_size; + bool put_data; u64 flags; /* 'flags' member of btrfs_ioctl_send_args is u64 */ /* Protocol version compatibility requested */ u32 proto; @@ -589,6 +590,9 @@ static int tlv_put(struct send_ctx *sctx, u16 attr, const void *data, int len) int total_len = sizeof(*hdr) + len; int left = sctx->send_max_size - sctx->send_size; + if (WARN_ON_ONCE(sctx->put_data)) + return -EINVAL; + if (unlikely(left < total_len)) return -EOVERFLOW; @@ -726,6 +730,7 @@ static int send_cmd(struct send_ctx *sctx) &sctx->send_off); sctx->send_size = 0; + sctx->put_data = false; return ret; } @@ -4927,14 +4932,30 @@ static inline u64 max_send_read_size(const struct send_ctx *sctx) static int put_data_header(struct send_ctx *sctx, u32 len) { - struct btrfs_tlv_header *hdr; + if (WARN_ON_ONCE(sctx->put_data)) + return -EINVAL; + sctx->put_data = true; + if (sctx->proto >= 2) { + /* + * In v2, the data attribute header doesn't include a length; it + * is implicitly to the end of the command. + */ + if (sctx->send_max_size - sctx->send_size < 2 + len) + return -EOVERFLOW; + put_unaligned_le16(BTRFS_SEND_A_DATA, + sctx->send_buf + sctx->send_size); + sctx->send_size += 2; + } else { + struct btrfs_tlv_header *hdr; - if (sctx->send_max_size - sctx->send_size < sizeof(*hdr) + len) - return -EOVERFLOW; - hdr = (struct btrfs_tlv_header *)(sctx->send_buf + sctx->send_size); - put_unaligned_le16(BTRFS_SEND_A_DATA, &hdr->tlv_type); - put_unaligned_le16(len, &hdr->tlv_len); - sctx->send_size += sizeof(*hdr); + if (sctx->send_max_size - sctx->send_size < sizeof(*hdr) + len) + return -EOVERFLOW; + hdr = (struct btrfs_tlv_header *)(sctx->send_buf + + sctx->send_size); + put_unaligned_le16(BTRFS_SEND_A_DATA, &hdr->tlv_type); + put_unaligned_le16(len, &hdr->tlv_len); + sctx->send_size += sizeof(*hdr); + } return 0; } @@ -7564,7 +7585,12 @@ long btrfs_ioctl_send(struct inode *inode, struct btrfs_ioctl_send_args *arg) sctx->clone_roots_cnt = arg->clone_sources_count; - sctx->send_max_size = BTRFS_SEND_BUF_SIZE_V1; + if (sctx->proto >= 2) { + sctx->send_max_size = ALIGN(SZ_16K + BTRFS_MAX_COMPRESSED, + PAGE_SIZE); + } else { + sctx->send_max_size = BTRFS_SEND_BUF_SIZE_V1; + } sctx->send_buf = kvmalloc(sctx->send_max_size, GFP_KERNEL); if (!sctx->send_buf) { ret = -ENOMEM; From patchwork Thu Feb 10 19:10:05 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Omar Sandoval X-Patchwork-Id: 12742347 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 7DC6FC43219 for ; Thu, 10 Feb 2022 19:10:47 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1343754AbiBJTKo (ORCPT ); Thu, 10 Feb 2022 14:10:44 -0500 Received: from mxb-00190b01.gslb.pphosted.com ([23.128.96.19]:48302 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1343750AbiBJTKo (ORCPT ); Thu, 10 Feb 2022 14:10:44 -0500 Received: from mail-pf1-x432.google.com (mail-pf1-x432.google.com [IPv6:2607:f8b0:4864:20::432]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id D4035110C for ; Thu, 10 Feb 2022 11:10:44 -0800 (PST) Received: by mail-pf1-x432.google.com with SMTP id i6so10040030pfc.9 for ; Thu, 10 Feb 2022 11:10:44 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=osandov-com.20210112.gappssmtp.com; s=20210112; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=zBHnkiFwQgGlRibgIKmNQj+ejfFIM03MUokocJ326cE=; b=3mR/vACmT2E5sICY843LKKHw4xRZ2IWyxcVLwMSlAQW6tOIuyKKCmbcoMz6Pu/zWET OmmSauyJzsGDB85zxsuuMQY64WZg7Q4hp1zwKNNXhW1pFK2wR+hktIfzyHHC+hG2lzB2 cESBHgjbuIretqgpzrbP+6qxPtDrQH+g4NeSj95RJ4oEkKwebGfyEGrRMnKwrNXRFUyJ J6somegPucYQ/kGZSdoipvrmxuGPmjoYwHUHnPJ9x8gCb3vsrh7ursTJsho/LleV+uxv zgpLzjQD4m1OXF6eJc5LFLUx3Ea8ZNz9oOdNeFjK3NeiCz/ShugMhRdkUxLZh8Y7V73c Y8lg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=zBHnkiFwQgGlRibgIKmNQj+ejfFIM03MUokocJ326cE=; b=jBsN2ax4yvgEMnH5fAK3LSlkg9lclb6eTckuTXsp3SClhVQ3xfagyBb17agWvn7K59 O3nk9VQpWI/6rZNbxT5hUYYe3MLlHrVoOxUHAYOWqGkEm6VMSJRKIS0mag2atCCNYvST j9795s7A4cMKIOGBAM71LTE9dGEwOODC+Nlxj0z0fai8z2xN/4DXLxnAo7iuTxYezFsH QOy4tFOCr6Meod7HNcEX1nkoTenT14p1cKFInafWKMpH1ejHONPds0f+9JR8zE806ddm fGXaePBNaF/l4rWkxny4KuuRUsElAsMW9Y72l/3YChceXQ6rs5f7HAczb2lNt5IF7DMe G4JQ== X-Gm-Message-State: AOAM5321lJyBomAMC7nmf2tKwGTs7hhii6adopMF8tSPJeLc8kI+3DOf A15HaeYh+GxEk9yIfMQxX9TG3Ma0SvvAsA== X-Google-Smtp-Source: ABdhPJyptkJzZqTLcqdllOgI6jNIvWJvLtgQ0EYLuEWiAUTUbE/rUFOWNevnoFME3zrXowXCM7kODA== X-Received: by 2002:a63:1d56:: with SMTP id d22mr7270992pgm.149.1644520244066; Thu, 10 Feb 2022 11:10:44 -0800 (PST) Received: from relinquished.tfbnw.net ([2620:10d:c090:400::5:1975]) by smtp.gmail.com with ESMTPSA id n5sm8315898pgt.22.2022.02.10.11.10.43 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 10 Feb 2022 11:10:43 -0800 (PST) From: Omar Sandoval To: linux-btrfs@vger.kernel.org Cc: kernel-team@fb.com Subject: [PATCH v13 15/17] btrfs: send: allocate send buffer with alloc_page() and vmap() for v2 Date: Thu, 10 Feb 2022 11:10:05 -0800 Message-Id: X-Mailer: git-send-email 2.35.1 In-Reply-To: References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org From: Omar Sandoval For encoded writes, we need the raw pages for reading compressed data directly via a bio. So, replace kvmalloc() with vmap() so we have access to the raw pages. 144k is large enough that it usually gets allocated with vmalloc(), anyways. Reviewed-by: Nikolay Borisov Signed-off-by: Omar Sandoval --- fs/btrfs/send.c | 33 +++++++++++++++++++++++++++++++-- 1 file changed, 31 insertions(+), 2 deletions(-) diff --git a/fs/btrfs/send.c b/fs/btrfs/send.c index 6d0686c51f80..2b0bb8b8f972 100644 --- a/fs/btrfs/send.c +++ b/fs/btrfs/send.c @@ -83,6 +83,7 @@ struct send_ctx { u32 send_size; u32 send_max_size; bool put_data; + struct page **send_buf_pages; u64 flags; /* 'flags' member of btrfs_ioctl_send_args is u64 */ /* Protocol version compatibility requested */ u32 proto; @@ -7497,6 +7498,7 @@ long btrfs_ioctl_send(struct inode *inode, struct btrfs_ioctl_send_args *arg) struct btrfs_root *clone_root; struct send_ctx *sctx = NULL; u32 i; + u32 send_buf_num_pages = 0; u64 *clone_sources_tmp = NULL; int clone_sources_to_rollback = 0; size_t alloc_size; @@ -7588,10 +7590,28 @@ long btrfs_ioctl_send(struct inode *inode, struct btrfs_ioctl_send_args *arg) if (sctx->proto >= 2) { sctx->send_max_size = ALIGN(SZ_16K + BTRFS_MAX_COMPRESSED, PAGE_SIZE); + send_buf_num_pages = sctx->send_max_size >> PAGE_SHIFT; + sctx->send_buf_pages = kcalloc(send_buf_num_pages, + sizeof(*sctx->send_buf_pages), + GFP_KERNEL); + if (!sctx->send_buf_pages) { + send_buf_num_pages = 0; + ret = -ENOMEM; + goto out; + } + for (i = 0; i < send_buf_num_pages; i++) { + sctx->send_buf_pages[i] = alloc_page(GFP_KERNEL); + if (!sctx->send_buf_pages[i]) { + ret = -ENOMEM; + goto out; + } + } + sctx->send_buf = vmap(sctx->send_buf_pages, send_buf_num_pages, + VM_MAP, PAGE_KERNEL); } else { sctx->send_max_size = BTRFS_SEND_BUF_SIZE_V1; + sctx->send_buf = kvmalloc(sctx->send_max_size, GFP_KERNEL); } - sctx->send_buf = kvmalloc(sctx->send_max_size, GFP_KERNEL); if (!sctx->send_buf) { ret = -ENOMEM; goto out; @@ -7784,7 +7804,16 @@ long btrfs_ioctl_send(struct inode *inode, struct btrfs_ioctl_send_args *arg) fput(sctx->send_filp); kvfree(sctx->clone_roots); - kvfree(sctx->send_buf); + if (sctx->proto >= 2) { + vunmap(sctx->send_buf); + for (i = 0; i < send_buf_num_pages; i++) { + if (sctx->send_buf_pages[i]) + __free_page(sctx->send_buf_pages[i]); + } + kfree(sctx->send_buf_pages); + } else { + kvfree(sctx->send_buf); + } name_cache_free(sctx); From patchwork Thu Feb 10 19:10:06 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Omar Sandoval X-Patchwork-Id: 12742348 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 3F3EFC433EF for ; Thu, 10 Feb 2022 19:10:50 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1343757AbiBJTKr (ORCPT ); Thu, 10 Feb 2022 14:10:47 -0500 Received: from mxb-00190b01.gslb.pphosted.com ([23.128.96.19]:48310 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1343747AbiBJTKp (ORCPT ); Thu, 10 Feb 2022 14:10:45 -0500 Received: from mail-pj1-x102f.google.com (mail-pj1-x102f.google.com [IPv6:2607:f8b0:4864:20::102f]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id D0B9710BA for ; Thu, 10 Feb 2022 11:10:45 -0800 (PST) Received: by mail-pj1-x102f.google.com with SMTP id t4-20020a17090a510400b001b8c4a6cd5dso6498150pjh.5 for ; Thu, 10 Feb 2022 11:10:45 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=osandov-com.20210112.gappssmtp.com; s=20210112; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=i08ylIvYePgvaN25EivFMBo8SJ6Co9ZswL+ivbeCEwI=; b=Jr8u/eyioRQhnFE9+I++va7uZodoAuV4CHHxlKDf/PTKpTLjfP7Um2xuI3oV/DGvMg EGaE3IfBA9tqIFeEPKl1jWol4K1jeHm8OBcJmF/rbIjZoN6S7OAXTPhKrILsggISwio8 RehPj5408yCio971ZMv8Znc6MNiV5WHK55kzGPH4loJ88X8twSI2lc4MK28R8buz6L+n qkzOG7M8b6KcTVi8VY+QZxptVFMXYZZG4HsiJzm5WrEVsT0QmpptuHXIBlsH5rxRH6b+ hSMwcYGVjcQPnqQqxxcaUwYVQZdOGDyxA3WzVxmGobgJKqPAO/8haB40ki9RawSM6FOi 3zFg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=i08ylIvYePgvaN25EivFMBo8SJ6Co9ZswL+ivbeCEwI=; b=6FnIKxdkFBiutCfEQgw+vMEcUFyWCj9TqoUXLjzYc8abrJ3q7DfB71YY4/HMloQ5NA puXRasBYZFRPVc1yM0h+i/znfaENL7C2dHkashWM8OtY9RAlEHGQTNEJLvnI6CIEz5LO cOjuT2U1/+zEMJZ8L6GbIjB9O4hU+5Z1aGlHjFB2MQtKfcjeZ6UEfcHngw8PQsBqSlW3 kmmps5kUEVd3ifRNruwWuHGKsaSC6c95eGKfLrsropiDNKm4ej16rdaXQRs9+vvPrhYs /5/j9/21Glt3+No5k248FkJ0QJknR8YIQ9JzfOiyqMLi3+o3hqzdNu/eViUX/iqzlMjb Cc7w== X-Gm-Message-State: AOAM5329WU/IdFTqyRAmaxY76dd0NSfXbz5BCyFJmOlgNuHTQGmN9iQR 0J+oykFLLgsN/VVUvBRyiI8cVJviCJH35g== X-Google-Smtp-Source: ABdhPJzqQ5jCvJJc/NRqogcfqLBltQZkD33xIeLvY0mq9uU1w9L0U1GeZcvRJzqCXX9tzgzdM5/29g== X-Received: by 2002:a17:902:bd09:: with SMTP id p9mr9033477pls.52.1644520244994; Thu, 10 Feb 2022 11:10:44 -0800 (PST) Received: from relinquished.tfbnw.net ([2620:10d:c090:400::5:1975]) by smtp.gmail.com with ESMTPSA id n5sm8315898pgt.22.2022.02.10.11.10.44 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 10 Feb 2022 11:10:44 -0800 (PST) From: Omar Sandoval To: linux-btrfs@vger.kernel.org Cc: kernel-team@fb.com Subject: [PATCH v13 16/17] btrfs: send: send compressed extents with encoded writes Date: Thu, 10 Feb 2022 11:10:06 -0800 Message-Id: <640f9ffbfc6af75b782a4142f06723808cd5ebb5.1644519258.git.osandov@fb.com> X-Mailer: git-send-email 2.35.1 In-Reply-To: References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org From: Omar Sandoval Now that all of the pieces are in place, we can use the ENCODED_WRITE command to send compressed extents when appropriate. Signed-off-by: Omar Sandoval --- fs/btrfs/ctree.h | 6 ++ fs/btrfs/inode.c | 13 +-- fs/btrfs/send.c | 234 +++++++++++++++++++++++++++++++++++++++++++---- 3 files changed, 228 insertions(+), 25 deletions(-) diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h index 54f0bf10e885..a2da402163bf 100644 --- a/fs/btrfs/ctree.h +++ b/fs/btrfs/ctree.h @@ -3295,6 +3295,12 @@ int btrfs_writepage_cow_fixup(struct page *page); void btrfs_writepage_endio_finish_ordered(struct btrfs_inode *inode, struct page *page, u64 start, u64 end, bool uptodate); +int btrfs_encoded_io_compression_from_extent(struct btrfs_fs_info *fs_info, + int compress_type); +int btrfs_encoded_read_regular_fill_pages(struct btrfs_inode *inode, + u64 file_offset, u64 disk_bytenr, + u64 disk_io_size, + struct page **pages); struct btrfs_ioctl_encoded_io_args; ssize_t btrfs_encoded_read(struct kiocb *iocb, struct iov_iter *iter, struct btrfs_ioctl_encoded_io_args *encoded); diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index c8c1a4bab06b..0dbf995dc75b 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -10133,9 +10133,8 @@ void btrfs_set_range_writeback(struct btrfs_inode *inode, u64 start, u64 end) } } -static int btrfs_encoded_io_compression_from_extent( - struct btrfs_fs_info *fs_info, - int compress_type) +int btrfs_encoded_io_compression_from_extent(struct btrfs_fs_info *fs_info, + int compress_type) { switch (compress_type) { case BTRFS_COMPRESS_NONE: @@ -10342,11 +10341,9 @@ static void btrfs_encoded_read_endio(struct bio *bio) bio_put(bio); } -static int btrfs_encoded_read_regular_fill_pages(struct btrfs_inode *inode, - u64 file_offset, - u64 disk_bytenr, - u64 disk_io_size, - struct page **pages) +int btrfs_encoded_read_regular_fill_pages(struct btrfs_inode *inode, + u64 file_offset, u64 disk_bytenr, + u64 disk_io_size, struct page **pages) { struct btrfs_fs_info *fs_info = inode->root->fs_info; struct btrfs_encoded_read_private priv = { diff --git a/fs/btrfs/send.c b/fs/btrfs/send.c index 2b0bb8b8f972..5192ba58459e 100644 --- a/fs/btrfs/send.c +++ b/fs/btrfs/send.c @@ -614,6 +614,7 @@ static int tlv_put(struct send_ctx *sctx, u16 attr, const void *data, int len) return tlv_put(sctx, attr, &__tmp, sizeof(__tmp)); \ } +TLV_PUT_DEFINE_INT(32) TLV_PUT_DEFINE_INT(64) static int tlv_put_string(struct send_ctx *sctx, u16 attr, @@ -5234,16 +5235,215 @@ static int send_hole(struct send_ctx *sctx, u64 end) return ret; } -static int send_extent_data(struct send_ctx *sctx, - const u64 offset, - const u64 len) +static int send_encoded_inline_extent(struct send_ctx *sctx, + struct btrfs_path *path, u64 offset, + u64 len) { + struct btrfs_root *root = sctx->send_root; + struct btrfs_fs_info *fs_info = root->fs_info; + struct inode *inode; + struct fs_path *p; + struct extent_buffer *leaf = path->nodes[0]; + struct btrfs_key key; + struct btrfs_file_extent_item *ei; + u64 ram_bytes; + size_t inline_size; + int ret; + + inode = btrfs_iget(fs_info->sb, sctx->cur_ino, root); + if (IS_ERR(inode)) + return PTR_ERR(inode); + + p = fs_path_alloc(); + if (!p) { + ret = -ENOMEM; + goto out; + } + + ret = begin_cmd(sctx, BTRFS_SEND_C_ENCODED_WRITE); + if (ret < 0) + goto out; + + ret = get_cur_path(sctx, sctx->cur_ino, sctx->cur_inode_gen, p); + if (ret < 0) + goto out; + + btrfs_item_key_to_cpu(leaf, &key, path->slots[0]); + ei = btrfs_item_ptr(leaf, path->slots[0], + struct btrfs_file_extent_item); + ram_bytes = btrfs_file_extent_ram_bytes(leaf, ei); + inline_size = btrfs_file_extent_inline_item_len(leaf, path->slots[0]); + + TLV_PUT_PATH(sctx, BTRFS_SEND_A_PATH, p); + TLV_PUT_U64(sctx, BTRFS_SEND_A_FILE_OFFSET, offset); + TLV_PUT_U64(sctx, BTRFS_SEND_A_UNENCODED_FILE_LEN, + min(key.offset + ram_bytes - offset, len)); + TLV_PUT_U64(sctx, BTRFS_SEND_A_UNENCODED_LEN, ram_bytes); + TLV_PUT_U64(sctx, BTRFS_SEND_A_UNENCODED_OFFSET, offset - key.offset); + ret = btrfs_encoded_io_compression_from_extent(fs_info, + btrfs_file_extent_compression(leaf, ei)); + if (ret < 0) + goto out; + TLV_PUT_U32(sctx, BTRFS_SEND_A_COMPRESSION, ret); + + ret = put_data_header(sctx, inline_size); + if (ret < 0) + goto out; + read_extent_buffer(leaf, sctx->send_buf + sctx->send_size, + btrfs_file_extent_inline_start(ei), inline_size); + sctx->send_size += inline_size; + + ret = send_cmd(sctx); + +tlv_put_failure: +out: + fs_path_free(p); + iput(inode); + return ret; +} + +static int send_encoded_extent(struct send_ctx *sctx, struct btrfs_path *path, + u64 offset, u64 len) +{ + struct btrfs_root *root = sctx->send_root; + struct btrfs_fs_info *fs_info = root->fs_info; + struct inode *inode; + struct fs_path *p; + struct extent_buffer *leaf = path->nodes[0]; + struct btrfs_key key; + struct btrfs_file_extent_item *ei; + u64 disk_bytenr, disk_num_bytes; + u32 data_offset; + struct btrfs_cmd_header *hdr; + u32 crc; + int ret; + + inode = btrfs_iget(fs_info->sb, sctx->cur_ino, root); + if (IS_ERR(inode)) + return PTR_ERR(inode); + + p = fs_path_alloc(); + if (!p) { + ret = -ENOMEM; + goto out; + } + + ret = begin_cmd(sctx, BTRFS_SEND_C_ENCODED_WRITE); + if (ret < 0) + goto out; + + ret = get_cur_path(sctx, sctx->cur_ino, sctx->cur_inode_gen, p); + if (ret < 0) + goto out; + + btrfs_item_key_to_cpu(leaf, &key, path->slots[0]); + ei = btrfs_item_ptr(leaf, path->slots[0], + struct btrfs_file_extent_item); + disk_bytenr = btrfs_file_extent_disk_bytenr(leaf, ei); + disk_num_bytes = btrfs_file_extent_disk_num_bytes(leaf, ei); + + TLV_PUT_PATH(sctx, BTRFS_SEND_A_PATH, p); + TLV_PUT_U64(sctx, BTRFS_SEND_A_FILE_OFFSET, offset); + TLV_PUT_U64(sctx, BTRFS_SEND_A_UNENCODED_FILE_LEN, + min(key.offset + btrfs_file_extent_num_bytes(leaf, ei) - offset, + len)); + TLV_PUT_U64(sctx, BTRFS_SEND_A_UNENCODED_LEN, + btrfs_file_extent_ram_bytes(leaf, ei)); + TLV_PUT_U64(sctx, BTRFS_SEND_A_UNENCODED_OFFSET, + offset - key.offset + btrfs_file_extent_offset(leaf, ei)); + ret = btrfs_encoded_io_compression_from_extent(fs_info, + btrfs_file_extent_compression(leaf, ei)); + if (ret < 0) + goto out; + TLV_PUT_U32(sctx, BTRFS_SEND_A_COMPRESSION, ret); + TLV_PUT_U32(sctx, BTRFS_SEND_A_ENCRYPTION, 0); + + ret = put_data_header(sctx, disk_num_bytes); + if (ret < 0) + goto out; + + /* + * We want to do I/O directly into the send buffer, so get the next page + * boundary in the send buffer. This means that there may be a gap + * between the beginning of the command and the file data. + */ + data_offset = ALIGN(sctx->send_size, PAGE_SIZE); + if (data_offset > sctx->send_max_size || + sctx->send_max_size - data_offset < disk_num_bytes) { + ret = -EOVERFLOW; + goto out; + } + + /* + * Note that send_buf is a mapping of send_buf_pages, so this is really + * reading into send_buf. + */ + ret = btrfs_encoded_read_regular_fill_pages(BTRFS_I(inode), offset, + disk_bytenr, disk_num_bytes, + sctx->send_buf_pages + + (data_offset >> PAGE_SHIFT)); + if (ret) + goto out; + + hdr = (struct btrfs_cmd_header *)sctx->send_buf; + hdr->len = cpu_to_le32(sctx->send_size + disk_num_bytes - sizeof(*hdr)); + hdr->crc = 0; + crc = btrfs_crc32c(0, sctx->send_buf, sctx->send_size); + crc = btrfs_crc32c(crc, sctx->send_buf + data_offset, disk_num_bytes); + hdr->crc = cpu_to_le32(crc); + + ret = write_buf(sctx->send_filp, sctx->send_buf, sctx->send_size, + &sctx->send_off); + if (!ret) { + ret = write_buf(sctx->send_filp, sctx->send_buf + data_offset, + disk_num_bytes, &sctx->send_off); + } + sctx->send_size = 0; + sctx->put_data = false; + +tlv_put_failure: +out: + fs_path_free(p); + iput(inode); + return ret; +} + +static int send_extent_data(struct send_ctx *sctx, struct btrfs_path *path, + const u64 offset, const u64 len) +{ + struct extent_buffer *leaf = path->nodes[0]; + struct btrfs_file_extent_item *ei; u64 read_size = max_send_read_size(sctx); u64 sent = 0; if (sctx->flags & BTRFS_SEND_FLAG_NO_FILE_DATA) return send_update_extent(sctx, offset, len); + ei = btrfs_item_ptr(leaf, path->slots[0], + struct btrfs_file_extent_item); + if ((sctx->flags & BTRFS_SEND_FLAG_COMPRESSED) && + btrfs_file_extent_compression(leaf, ei) != BTRFS_COMPRESS_NONE) { + bool is_inline = (btrfs_file_extent_type(leaf, ei) == + BTRFS_FILE_EXTENT_INLINE); + + /* + * Send the compressed extent unless the compressed data is + * larger than the decompressed data. This can happen if we're + * not sending the entire extent, either because it has been + * partially overwritten/truncated or because this is a part of + * the extent that we couldn't clone in clone_range(). + */ + if (is_inline && + btrfs_file_extent_inline_item_len(leaf, + path->slots[0]) <= len) { + return send_encoded_inline_extent(sctx, path, offset, + len); + } else if (!is_inline && + btrfs_file_extent_disk_num_bytes(leaf, ei) <= len) { + return send_encoded_extent(sctx, path, offset, len); + } + } + while (sent < len) { u64 size = min(len - sent, read_size); int ret; @@ -5314,12 +5514,9 @@ static int send_capabilities(struct send_ctx *sctx) return ret; } -static int clone_range(struct send_ctx *sctx, - struct clone_root *clone_root, - const u64 disk_byte, - u64 data_offset, - u64 offset, - u64 len) +static int clone_range(struct send_ctx *sctx, struct btrfs_path *dst_path, + struct clone_root *clone_root, const u64 disk_byte, + u64 data_offset, u64 offset, u64 len) { struct btrfs_path *path; struct btrfs_key key; @@ -5343,7 +5540,7 @@ static int clone_range(struct send_ctx *sctx, */ if (clone_root->offset == 0 && len == sctx->send_root->fs_info->sectorsize) - return send_extent_data(sctx, offset, len); + return send_extent_data(sctx, dst_path, offset, len); path = alloc_path_for_send(); if (!path) @@ -5440,7 +5637,8 @@ static int clone_range(struct send_ctx *sctx, if (hole_len > len) hole_len = len; - ret = send_extent_data(sctx, offset, hole_len); + ret = send_extent_data(sctx, dst_path, offset, + hole_len); if (ret < 0) goto out; @@ -5513,14 +5711,16 @@ static int clone_range(struct send_ctx *sctx, if (ret < 0) goto out; } - ret = send_extent_data(sctx, offset + slen, + ret = send_extent_data(sctx, dst_path, + offset + slen, clone_len - slen); } else { ret = send_clone(sctx, offset, clone_len, clone_root); } } else { - ret = send_extent_data(sctx, offset, clone_len); + ret = send_extent_data(sctx, dst_path, offset, + clone_len); } if (ret < 0) @@ -5552,7 +5752,7 @@ static int clone_range(struct send_ctx *sctx, } if (len > 0) - ret = send_extent_data(sctx, offset, len); + ret = send_extent_data(sctx, dst_path, offset, len); else ret = 0; out: @@ -5583,10 +5783,10 @@ static int send_write_or_clone(struct send_ctx *sctx, struct btrfs_file_extent_item); disk_byte = btrfs_file_extent_disk_bytenr(path->nodes[0], ei); data_offset = btrfs_file_extent_offset(path->nodes[0], ei); - ret = clone_range(sctx, clone_root, disk_byte, data_offset, - offset, end - offset); + ret = clone_range(sctx, path, clone_root, disk_byte, + data_offset, offset, end - offset); } else { - ret = send_extent_data(sctx, offset, end - offset); + ret = send_extent_data(sctx, path, offset, end - offset); } sctx->cur_inode_next_write_offset = end; return ret; From patchwork Thu Feb 10 19:10:07 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Omar Sandoval X-Patchwork-Id: 12742349 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 48A37C433FE for ; Thu, 10 Feb 2022 19:10:51 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1343759AbiBJTKs (ORCPT ); Thu, 10 Feb 2022 14:10:48 -0500 Received: from mxb-00190b01.gslb.pphosted.com ([23.128.96.19]:48316 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1343750AbiBJTKp (ORCPT ); Thu, 10 Feb 2022 14:10:45 -0500 Received: from mail-pl1-x62e.google.com (mail-pl1-x62e.google.com [IPv6:2607:f8b0:4864:20::62e]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id C7762110C for ; Thu, 10 Feb 2022 11:10:46 -0800 (PST) Received: by mail-pl1-x62e.google.com with SMTP id w20so2655646plq.12 for ; Thu, 10 Feb 2022 11:10:46 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=osandov-com.20210112.gappssmtp.com; s=20210112; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=xb84Ra7GCRaPQur5sHxFQDfHS0V1ba52DoxX2Ew8cyM=; b=ycklHt9x8XnNWNKA2/VJbCEXvr1e6SMxnYIQ+yI5zglf4wFoQO8/gR1ngdyrDRHrF6 JhDpY+GAayGkmc0Z9UAfcujpBJf+alNthLMXnXJWZDlP6GaIROpywOdaYVaJMiCY0n7u yqKiSLCACwQ30rm9XmTWIK3vW2h3OjBFVYlZBO1VT7fv/k/Bmrkw1OAqFGD9ukvRQAfF 9m0WsB0QJL9gWIwx8fu6sMxXqK3eJCExHPRWsKdFqx9P88ocT7mPzpFvQrOdJVwmlFjI R+VxE5D1qiufCk7UGYaB2XE9Fa2/GG+Uz08kHGMfbIVlQIRr/lYT/IgPRTjv5fiWrTI8 bTHw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=xb84Ra7GCRaPQur5sHxFQDfHS0V1ba52DoxX2Ew8cyM=; b=J1IQMM66nBSVMe378rGlaTYPJYQkX44dfkKKVbfv+TFD5Xe6iPLG/jUUYLYfrQt43w Pzu89YLNtp38VCEQ1f0YvuSm1OgXO7UKu+EebFLxY0mi+y7cCjbtKQiKKXlRXo/yrQw2 hGNVLnmzICJJ4IMXVxsROYNwK+nqxaNqcO6aULWvOKeK5jlraQXSObwO6Rx+dVjzVh9w 4pQPVu3EgxdJcrTWznxznSWpX95lnOGK8DpjPLXMN9rQMFMmeBdbTJnh419XkjhHE2QL uBCVTjOwOeUBDo+CkXqcsPwAJ4aNsyA8pFUk6tU+AzXmPJAuc7gBuRICDYeDL7YBSnyc lLQw== X-Gm-Message-State: AOAM531wiw77Fc/meCVINx+57d/Iuky5AWAKxqnG6lr7fdiQXsZjj1NO PblnVUaWG5mVW+Oh2vuXJtVwwYC9hu/u+Q== X-Google-Smtp-Source: ABdhPJy5yYWl2epF3H2wnEPLrZRALksW+VJ4Li5Yp/RiHAZfDKZS8kEeHP0sBlh+M8ya9g9b3pXM7A== X-Received: by 2002:a17:90a:141:: with SMTP id z1mr4330375pje.87.1644520245926; Thu, 10 Feb 2022 11:10:45 -0800 (PST) Received: from relinquished.tfbnw.net ([2620:10d:c090:400::5:1975]) by smtp.gmail.com with ESMTPSA id n5sm8315898pgt.22.2022.02.10.11.10.45 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 10 Feb 2022 11:10:45 -0800 (PST) From: Omar Sandoval To: linux-btrfs@vger.kernel.org Cc: kernel-team@fb.com Subject: [PATCH v13 17/17] btrfs: send: enable support for stream v2 and compressed writes Date: Thu, 10 Feb 2022 11:10:07 -0800 Message-Id: <71ec89beca3ab00460b345e58d031680e61d22d2.1644519258.git.osandov@fb.com> X-Mailer: git-send-email 2.35.1 In-Reply-To: References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org From: Omar Sandoval Now that the new support is implemented, allow the ioctl to accept v2 and the compressed flag, and update the version in sysfs. Signed-off-by: Omar Sandoval --- fs/btrfs/send.c | 7 +++++-- fs/btrfs/send.h | 2 +- include/uapi/linux/btrfs.h | 3 ++- 3 files changed, 8 insertions(+), 4 deletions(-) diff --git a/fs/btrfs/send.c b/fs/btrfs/send.c index 5192ba58459e..7053cb2cfec3 100644 --- a/fs/btrfs/send.c +++ b/fs/btrfs/send.c @@ -690,8 +690,7 @@ static int send_header(struct send_ctx *sctx) struct btrfs_stream_header hdr; strcpy(hdr.magic, BTRFS_SEND_STREAM_MAGIC); - hdr.version = cpu_to_le32(BTRFS_SEND_STREAM_VERSION); - + hdr.version = cpu_to_le32(sctx->proto); return write_buf(sctx->send_filp, &hdr, sizeof(hdr), &sctx->send_off); } @@ -7768,6 +7767,10 @@ long btrfs_ioctl_send(struct inode *inode, struct btrfs_ioctl_send_args *arg) } else { sctx->proto = 1; } + if ((arg->flags & BTRFS_SEND_FLAG_COMPRESSED) && sctx->proto < 2) { + ret = -EINVAL; + goto out; + } sctx->send_filp = fget(arg->send_fd); if (!sctx->send_filp) { diff --git a/fs/btrfs/send.h b/fs/btrfs/send.h index 805d8095209a..50a2aceae929 100644 --- a/fs/btrfs/send.h +++ b/fs/btrfs/send.h @@ -10,7 +10,7 @@ #include "ctree.h" #define BTRFS_SEND_STREAM_MAGIC "btrfs-stream" -#define BTRFS_SEND_STREAM_VERSION 1 +#define BTRFS_SEND_STREAM_VERSION 2 /* * In send stream v1, no command is larger than 64k. In send stream v2, no limit diff --git a/include/uapi/linux/btrfs.h b/include/uapi/linux/btrfs.h index 38578b1ce0a3..c749a5b7a4a3 100644 --- a/include/uapi/linux/btrfs.h +++ b/include/uapi/linux/btrfs.h @@ -788,7 +788,8 @@ struct btrfs_ioctl_received_subvol_args { (BTRFS_SEND_FLAG_NO_FILE_DATA | \ BTRFS_SEND_FLAG_OMIT_STREAM_HEADER | \ BTRFS_SEND_FLAG_OMIT_END_CMD | \ - BTRFS_SEND_FLAG_VERSION) + BTRFS_SEND_FLAG_VERSION | \ + BTRFS_SEND_FLAG_COMPRESSED) struct btrfs_ioctl_send_args { __s64 send_fd; /* in */