From patchwork Wed Apr 29 02:44:20 2020
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "Darrick J. Wong"
X-Patchwork-Id: 11515827
Subject: [PATCH 01/18] xfs: clean up the error handling in xfs_swap_extent_rmap
From: "Darrick J. Wong"
To: darrick.wong@oracle.com
Cc: linux-xfs@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-api@vger.kernel.org
Date: Tue, 28 Apr 2020 19:44:20 -0700
Message-ID: <158812826049.168506.1665433119534581837.stgit@magnolia>
In-Reply-To: <158812825316.168506.932540609191384366.stgit@magnolia>
References: <158812825316.168506.932540609191384366.stgit@magnolia>
User-Agent: StGit/0.17.1-dirty

From: Darrick J. Wong

Clean up the error handling and make sure we actually bail out if
there's something not right with either file's fork mappings or we
couldn't clear all the COW extents.

Signed-off-by: Darrick J. Wong
---
 fs/xfs/xfs_bmap_util.c |   33 ++++++++++++++++++++++++---------
 1 file changed, 24 insertions(+), 9 deletions(-)

diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c
index cfd6e64661ba..746bb0c8271c 100644
--- a/fs/xfs/xfs_bmap_util.c
+++ b/fs/xfs/xfs_bmap_util.c
@@ -1393,8 +1393,16 @@ xfs_swap_extent_rmap(
 				&nimaps, 0);
 		if (error)
 			goto out;
-		ASSERT(nimaps == 1);
-		ASSERT(tirec.br_startblock != DELAYSTARTBLOCK);
+		if (nimaps != 1 || tirec.br_startblock == DELAYSTARTBLOCK) {
+			/*
+			 * We should never get no mapping or a delalloc extent
+			 * since the donor file should have been flushed by the
+			 * caller.
+			 */
+			ASSERT(0);
+			error = -EINVAL;
+			goto out;
+		}
 
 		trace_xfs_swap_extent_rmap_remap(tip, &tirec);
 		ilen = tirec.br_blockcount;
@@ -1411,8 +1419,17 @@ xfs_swap_extent_rmap(
 					&nimaps, 0);
 			if (error)
 				goto out;
-			ASSERT(nimaps == 1);
-			ASSERT(tirec.br_startoff == irec.br_startoff);
+			if (nimaps != 1 ||
+			    tirec.br_startoff != irec.br_startoff) {
+				/*
+				 * We should never get no mapping or a mapping
+				 * for another offset, but bail out if that
+				 * ever happens.
+				 */
+				ASSERT(0);
+				error = -EFSCORRUPTED;
+				goto out;
+			}
 			trace_xfs_swap_extent_rmap_remap_piece(ip, &irec);
 
 			/* Trim the extent. */
@@ -1451,11 +1468,9 @@ xfs_swap_extent_rmap(
 		offset_fsb += ilen;
 	}
 
-	tip->i_d.di_flags2 = tip_flags2;
-	return 0;
-
 out:
-	trace_xfs_swap_extent_rmap_error(ip, error, _RET_IP_);
+	if (error)
+		trace_xfs_swap_extent_rmap_error(ip, error, _RET_IP_);
 	tip->i_d.di_flags2 = tip_flags2;
 	return error;
 }
@@ -1657,7 +1672,7 @@ xfs_swap_extents(
 	if (xfs_inode_has_cow_data(tip)) {
 		error = xfs_reflink_cancel_cow_range(tip, 0, NULLFILEOFF, true);
 		if (error)
-			return error;
+			goto out_unlock;
 	}
 
 	/*

From patchwork Wed Apr 29 02:44:26 2020
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "Darrick J. Wong"
X-Patchwork-Id: 11515893
Subject: [PATCH 02/18] xfs: fix xfs_reflink_remap_prep calling conventions
From: "Darrick J. Wong"
To: darrick.wong@oracle.com
Cc: linux-xfs@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-api@vger.kernel.org
Date: Tue, 28 Apr 2020 19:44:26 -0700
Message-ID: <158812826681.168506.8309047158870409011.stgit@magnolia>
In-Reply-To: <158812825316.168506.932540609191384366.stgit@magnolia>
References: <158812825316.168506.932540609191384366.stgit@magnolia>
User-Agent: StGit/0.17.1-dirty

From: Darrick J. Wong

Fix the return value of xfs_reflink_remap_prep so that its calling
conventions match the rest of xfs.

Signed-off-by: Darrick J. Wong
Reviewed-by: Allison Collins
---
 fs/xfs/xfs_file.c    |    2 +-
 fs/xfs/xfs_reflink.c |    6 +++---
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
index 994fd3d59872..1759fbcbcd46 100644
--- a/fs/xfs/xfs_file.c
+++ b/fs/xfs/xfs_file.c
@@ -1029,7 +1029,7 @@ xfs_file_remap_range(
 	/* Prepare and then clone file data. */
 	ret = xfs_reflink_remap_prep(file_in, pos_in, file_out, pos_out,
 			&len, remap_flags);
-	if (ret < 0 || len == 0)
+	if (ret || len == 0)
 		return ret;
 
 	trace_xfs_reflink_remap_range(src, pos_in, len, dest, pos_out);
diff --git a/fs/xfs/xfs_reflink.c b/fs/xfs/xfs_reflink.c
index d8c8b299cb1f..5e978d1f169d 100644
--- a/fs/xfs/xfs_reflink.c
+++ b/fs/xfs/xfs_reflink.c
@@ -1375,7 +1375,7 @@ xfs_reflink_remap_prep(
 	struct inode		*inode_out = file_inode(file_out);
 	struct xfs_inode	*dest = XFS_I(inode_out);
 	bool			same_inode = (inode_in == inode_out);
-	ssize_t			ret;
+	int			ret;
 
 	/* Lock both files against IO */
 	ret = xfs_iolock_two_inodes_and_break_layout(inode_in, inode_out);
@@ -1399,7 +1399,7 @@ xfs_reflink_remap_prep(
 
 	ret = generic_remap_file_range_prep(file_in, pos_in, file_out, pos_out,
 			len, remap_flags);
-	if (ret < 0 || *len == 0)
+	if (ret || *len == 0)
 		goto out_unlock;
 
 	/* Attach dquots to dest inode before changing block map */
@@ -1434,7 +1434,7 @@ xfs_reflink_remap_prep(
 	if (ret)
 		goto out_unlock;
 
-	return 1;
+	return 0;
 out_unlock:
 	xfs_reflink_remap_unlock(file_in, file_out);
 	return ret;

From patchwork Wed Apr 29 02:44:33 2020
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "Darrick J.
 Wong"
X-Patchwork-Id: 11515833
Subject: [PATCH 03/18] vfs: introduce new file extent swap ioctl
From: "Darrick J. Wong"
To: darrick.wong@oracle.com
Cc: linux-xfs@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-api@vger.kernel.org
Date: Tue, 28 Apr 2020 19:44:33 -0700
Message-ID: <158812827320.168506.17255602633619684843.stgit@magnolia>
In-Reply-To: <158812825316.168506.932540609191384366.stgit@magnolia>
References: <158812825316.168506.932540609191384366.stgit@magnolia>
User-Agent: StGit/0.17.1-dirty

From: Darrick J. Wong

Introduce a new ioctl to handle swapping extents between two files.

Signed-off-by: Darrick J. Wong
---
 fs/ioctl.c              |   32 ++++++++
 fs/read_write.c         |  188 +++++++++++++++++++++++++++++++++++++++++++++++
 fs/xfs/libxfs/xfs_fs.h  |    1 
 include/linux/fs.h      |   15 ++++
 include/uapi/linux/fs.h |   55 ++++++++++++++
 mm/filemap.c            |   77 +++++++++++++++++++
 6 files changed, 367 insertions(+), 1 deletion(-)

diff --git a/fs/ioctl.c b/fs/ioctl.c
index 282d45be6f45..f564e6f2fad5 100644
--- a/fs/ioctl.c
+++ b/fs/ioctl.c
@@ -268,6 +268,35 @@ static long ioctl_file_clone_range(struct file *file,
 				args.src_length, args.dest_offset);
 }
 
+static long ioctl_file_swap_range(struct file *file2,
+				  struct file_swap_range __user *argp)
+{
+	struct file_swap_range args;
+	struct fd file1;
+	int ret;
+
+	if (copy_from_user(&args, argp, sizeof(args)))
+		return -EFAULT;
+
+	file1 = fdget(args.file1_fd);
+	if (!file1.file)
+		return -EBADF;
+
+	ret = -EXDEV;
+	if (file1.file->f_path.mnt != file2->f_path.mnt)
+		goto fdput;
+
+	ret = vfs_swap_file_range(file1.file, file2, &args);
+	if (ret)
+		goto fdput;
+
+	if (copy_to_user(argp, &args, sizeof(args)))
+		ret = -EFAULT;
+fdput:
+	fdput(file1);
+	return ret;
+}
+
 #ifdef CONFIG_BLOCK
 
 static inline sector_t logical_to_blk(struct inode *inode, loff_t offset)
@@ -730,6 +759,9 @@ static int do_vfs_ioctl(struct file *filp, unsigned int fd,
 	case FIDEDUPERANGE:
 		return ioctl_file_dedupe_range(filp, argp);
 
+	case FISWAPRANGE:
+		return ioctl_file_swap_range(filp, argp);
+
 	case FIONREAD:
 		if (!S_ISREG(inode->i_mode))
 			return vfs_ioctl(filp, cmd, arg);
diff --git a/fs/read_write.c b/fs/read_write.c
index bbfa9b12b15e..2b5116f129de 100644
--- a/fs/read_write.c
+++ b/fs/read_write.c
@@ -2081,6 +2081,92 @@ int generic_remap_file_range_prep(struct file *file_in, loff_t pos_in,
 }
 EXPORT_SYMBOL(generic_remap_file_range_prep);
 
+/*
+ * Check that the two inodes are eligible for range swapping, the ranges make
+ * sense, and then flush all dirty data.  Caller must ensure that the inodes
+ * have been locked against any other modifications.
+ */
+int generic_swap_file_range_prep(struct file *file1, struct file *file2,
+				 struct file_swap_range *fsr)
+{
+	struct inode *inode1 = file_inode(file1);
+	struct inode *inode2 = file_inode(file2);
+	u64 blkmask = i_blocksize(inode1) - 1;
+	bool same_inode = (inode1 == inode2);
+	int ret;
+
+	/* Don't touch certain kinds of inodes */
+	if (IS_IMMUTABLE(inode2))
+		return -EPERM;
+
+	if (IS_SWAPFILE(inode1) || IS_SWAPFILE(inode2))
+		return -ETXTBSY;
+
+	/* Don't reflink dirs, pipes, sockets... */
+	if (S_ISDIR(inode1->i_mode) || S_ISDIR(inode2->i_mode))
+		return -EISDIR;
+	if (!S_ISREG(inode1->i_mode) || !S_ISREG(inode2->i_mode))
+		return -EINVAL;
+
+	/* Ranges cannot start after EOF. */
+	if (fsr->file1_offset > i_size_read(inode1) ||
+	    fsr->file2_offset > i_size_read(inode2))
+		return -EINVAL;
+
+	/*
+	 * If the caller said to swap to EOF, we set the length of the request
+	 * large enough to cover everything to the end of both files.
+	 */
+	if (fsr->flags & FILE_SWAP_RANGE_TO_EOF)
+		fsr->length = max_t(int64_t,
+				    i_size_read(inode1) - fsr->file1_offset,
+				    i_size_read(inode2) - fsr->file2_offset);
+
+	/* Zero length swapext exits immediately. */
+	if (fsr->length == 0)
+		return 0;
+
+	/* Check that we don't violate system file offset limits. */
+	ret = generic_swap_file_range_checks(file1, file2, fsr);
+	if (ret)
+		return ret;
+
+	/*
+	 * Ensure that we don't swap a partial EOF block into the middle of
+	 * another file.
+	 */
+	if (fsr->length & blkmask) {
+		loff_t new_length = fsr->length;
+
+		if (fsr->file2_offset + new_length < i_size_read(inode2))
+			new_length &= ~blkmask;
+
+		if (fsr->file1_offset + new_length < i_size_read(inode1))
+			new_length &= ~blkmask;
+
+		if (new_length != fsr->length)
+			return -EINVAL;
+	}
+
+	/* Wait for the completion of any pending IOs on both files */
+	inode_dio_wait(inode1);
+	if (!same_inode)
+		inode_dio_wait(inode2);
+
+	ret = filemap_write_and_wait_range(inode1->i_mapping, fsr->file1_offset,
+			fsr->file1_offset + fsr->length - 1);
+	if (ret)
+		return ret;
+
+	ret = filemap_write_and_wait_range(inode2->i_mapping, fsr->file2_offset,
+			fsr->file2_offset + fsr->length - 1);
+	if (ret)
+		return ret;
+
+	return 0;
+}
+EXPORT_SYMBOL(generic_swap_file_range_prep);
+
 loff_t do_clone_file_range(struct file *file_in, loff_t pos_in,
 			   struct file *file_out, loff_t pos_out,
 			   loff_t len, unsigned int remap_flags)
@@ -2278,3 +2364,105 @@ int vfs_dedupe_file_range(struct file *file, struct file_dedupe_range *same)
 	return ret;
 }
 EXPORT_SYMBOL(vfs_dedupe_file_range);
+
+/*
+ * Check that both files' metadata agree with the snapshot that we took for
+ * the range swap request.
+
+ * This should be called after the filesystem has locked /all/ inode metadata
+ * against modification.
+ */
+int generic_swap_file_range_check_fresh(struct inode *inode1,
+					struct inode *inode2,
+					const struct file_swap_range *fsr)
+{
+	/* Check that the offset/length values cover all of both files */
+	if ((fsr->flags & FILE_SWAP_RANGE_FULL_FILES) &&
+	    (fsr->file1_offset != 0 ||
+	     fsr->file2_offset != 0 ||
+	     fsr->length != i_size_read(inode1) ||
+	     fsr->length != i_size_read(inode2)))
+		return -EDOM;
+
+	/* Check that file2 hasn't otherwise been modified. */
+	if ((fsr->flags & FILE_SWAP_RANGE_FILE2_FRESH) &&
+	    (fsr->file2_ino != inode2->i_ino ||
+	     fsr->file2_ctime != inode2->i_ctime.tv_sec ||
+	     fsr->file2_ctime_nsec != inode2->i_ctime.tv_nsec ||
+	     fsr->file2_mtime != inode2->i_mtime.tv_sec ||
+	     fsr->file2_mtime_nsec != inode2->i_mtime.tv_nsec))
+		return -EBUSY;
+
+	return 0;
+}
+EXPORT_SYMBOL(generic_swap_file_range_check_fresh);
+
+static inline int swap_range_verify_area(struct file *file, loff_t pos,
+					 struct file_swap_range *fsr)
+{
+	int64_t len = fsr->length;
+
+	if (fsr->flags & FILE_SWAP_RANGE_TO_EOF)
+		len = min_t(int64_t, len, i_size_read(file_inode(file)) - pos);
+	return remap_verify_area(file, pos, len, true);
+}
+
+int do_swap_file_range(struct file *file1, struct file *file2,
+		       struct file_swap_range *fsr)
+{
+	int ret;
+
+	if ((fsr->flags & ~FILE_SWAP_RANGE_ALL_FLAGS) ||
+	    memchr_inv(&fsr->pad, 0, sizeof(fsr->pad)))
+		return -EINVAL;
+
+	if ((fsr->flags & FILE_SWAP_RANGE_FULL_FILES) &&
+	    (fsr->flags & FILE_SWAP_RANGE_TO_EOF))
+		return -EINVAL;
+
+	/*
+	 * FISWAPRANGE ioctl enforces that src and dest files are on the same
+	 * mount.  Practically, they only need to be on the same file system.
+	 */
+	if (file_inode(file1)->i_sb != file_inode(file2)->i_sb)
+		return -EXDEV;
+
+	ret = generic_file_rw_checks(file1, file2);
+	if (ret < 0)
+		return ret;
+
+	if (!file1->f_op->swap_file_range)
+		return -EOPNOTSUPP;
+
+	ret = swap_range_verify_area(file1, fsr->file1_offset, fsr);
+	if (ret)
+		return ret;
+
+	ret = swap_range_verify_area(file2, fsr->file2_offset, fsr);
+	if (ret)
+		return ret;
+
+	ret = file2->f_op->swap_file_range(file1, file2, fsr);
+	if (ret)
+		return ret;
+
+	file_modified(file1);
+	file_modified(file2);
+	fsnotify_modify(file1);
+	fsnotify_modify(file2);
+	return ret;
+}
+EXPORT_SYMBOL(do_swap_file_range);
+
+int vfs_swap_file_range(struct file *file1, struct file *file2,
+			struct file_swap_range *fsr)
+{
+	int ret;
+
+	file_start_write(file2);
+	ret = do_swap_file_range(file1, file2, fsr);
+	file_end_write(file2);
+
+	return ret;
+}
+EXPORT_SYMBOL(vfs_swap_file_range);
diff --git a/fs/xfs/libxfs/xfs_fs.h b/fs/xfs/libxfs/xfs_fs.h
index 18054120074e..c5b75082b9db 100644
--- a/fs/xfs/libxfs/xfs_fs.h
+++ b/fs/xfs/libxfs/xfs_fs.h
@@ -844,6 +844,7 @@ struct xfs_scrub_metadata {
 #define XFS_IOC_FSGEOMETRY	     _IOR ('X', 126, struct xfs_fsop_geom)
 #define XFS_IOC_BULKSTAT	     _IOR ('X', 127, struct xfs_bulkstat_req)
 #define XFS_IOC_INUMBERS	     _IOR ('X', 128, struct xfs_inumbers_req)
+/*	FISWAPRANGE ---------------- hoisted 129	 */
 /*	XFS_IOC_GETFSUUID ---------- deprecated 140	 */
 
 
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 4f6f59b4f22a..63acc11d0804 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1862,6 +1862,8 @@ struct file_operations {
 	loff_t (*remap_file_range)(struct file *file_in, loff_t pos_in,
 				   struct file *file_out, loff_t pos_out,
 				   loff_t len, unsigned int remap_flags);
+	int (*swap_file_range)(struct file *file_in, struct file *file_out,
+			       struct file_swap_range *fsr);
 	int (*fadvise)(struct file *, loff_t, loff_t, int);
 } __randomize_layout;
 
@@ -1931,6 +1933,8 @@ extern int generic_remap_file_range_prep(struct file *file_in, loff_t pos_in,
 					 struct file *file_out, loff_t pos_out,
 					 loff_t *count,
 					 unsigned int remap_flags);
+extern int generic_swap_file_range_prep(struct file *file1, struct file *file2,
+					struct file_swap_range *fsr);
 extern loff_t do_clone_file_range(struct file *file_in, loff_t pos_in,
 				  struct file *file_out, loff_t pos_out,
 				  loff_t len, unsigned int remap_flags);
@@ -1942,7 +1946,13 @@ extern int vfs_dedupe_file_range(struct file *file,
 extern loff_t vfs_dedupe_file_range_one(struct file *src_file, loff_t src_pos,
 					struct file *dst_file, loff_t dst_pos,
 					loff_t len, unsigned int remap_flags);
-
+extern int do_swap_file_range(struct file *file1, struct file *file2,
+			      struct file_swap_range *fsr);
+extern int vfs_swap_file_range(struct file *file1, struct file *file2,
+			       struct file_swap_range *fsr);
+extern int generic_swap_file_range_check_fresh(struct inode *inode1,
+					       struct inode *inode2,
+					       const struct file_swap_range *fsr);
 
 struct super_operations {
 	struct inode *(*alloc_inode)(struct super_block *sb);
@@ -3120,6 +3130,9 @@ extern ssize_t generic_write_checks(struct kiocb *, struct iov_iter *);
 extern int generic_remap_checks(struct file *file_in, loff_t pos_in,
 				struct file *file_out, loff_t pos_out,
 				loff_t *count, unsigned int remap_flags);
+extern int generic_swap_file_range_checks(struct file *file1,
+					  struct file *file2,
+					  const struct file_swap_range *fsr);
 extern int generic_file_rw_checks(struct file *file_in, struct file *file_out);
 extern int generic_copy_file_checks(struct file *file_in, loff_t pos_in,
 				    struct file *file_out, loff_t pos_out,
diff --git a/include/uapi/linux/fs.h b/include/uapi/linux/fs.h
index 379a612f8f1d..a74b49b02e75 100644
--- a/include/uapi/linux/fs.h
+++ b/include/uapi/linux/fs.h
@@ -93,6 +93,60 @@ struct file_dedupe_range {
 	struct file_dedupe_range_info info[0];
 };
 
+/*
+ * Swap part of file1 with part of the file that this ioctl is being
+ * called against (which we'll call file2).  Filesystems must be able to
+ * complete the operation even if the system goes down.
+ */
+struct file_swap_range {
+	__s64		file1_fd;
+	__s64		file1_offset;	/* file1 offset, bytes */
+	__s64		file2_offset;	/* file2 offset, bytes */
+	__s64		length;		/* bytes to swap */
+
+	__u64		flags;		/* see FILE_SWAP_RANGE_* below */
+
+	/* file2 metadata for optional freshness checks */
+	__s64		file2_ino;	/* inode number */
+	__s64		file2_mtime;	/* modification time */
+	__s64		file2_ctime;	/* change time */
+	__s32		file2_mtime_nsec; /* mod time, nsec */
+	__s32		file2_ctime_nsec; /* change time, nsec */
+
+	__u64		pad[6];		/* must be zeroes */
+};
+
+/*
+ * Atomic swap operations are not required.  This relaxes the requirement that
+ * the filesystem must be able to complete the operation after a crash.
+ */
+#define FILE_SWAP_RANGE_NONATOMIC	(1 << 0)
+
+/*
+ * Check file2's inode number, mtime, and ctime against the values
+ * provided, and return -EBUSY if there isn't an exact match.
+ */
+#define FILE_SWAP_RANGE_FILE2_FRESH	(1 << 1)
+
+/*
+ * Check that file1's length is equal to file1_offset + length, and that
+ * file2's length is equal to file2_offset + length.  Returns -EDOM if there
+ * isn't an exact match.
+ */
+#define FILE_SWAP_RANGE_FULL_FILES	(1 << 2)
+
+/*
+ * Swap file data all the way to the ends of both files, and then swap the file
+ * sizes.  This flag can be used to replace a file's contents with a different
+ * amount of data.  length will be ignored.
+ */
+#define FILE_SWAP_RANGE_TO_EOF		(1 << 3)
+
+#define FILE_SWAP_RANGE_ALL_FLAGS	(FILE_SWAP_RANGE_NONATOMIC | \
+					 FILE_SWAP_RANGE_FILE2_FRESH | \
+					 FILE_SWAP_RANGE_FULL_FILES | \
+					 FILE_SWAP_RANGE_TO_EOF)
+
 /* And dynamically-tunable limits and defaults: */
 struct files_stat_struct {
 	unsigned long nr_files;		/* read only */
@@ -198,6 +252,7 @@ struct fsxattr {
 #define FICLONE		_IOW(0x94, 9, int)
 #define FICLONERANGE	_IOW(0x94, 13, struct file_clone_range)
 #define FIDEDUPERANGE	_IOWR(0x94, 54, struct file_dedupe_range)
+#define FISWAPRANGE	_IOWR('X', 129, struct file_swap_range)
 
 #define FSLABEL_MAX 256	/* Max chars for the interface; each fs may differ */
 
diff --git a/mm/filemap.c b/mm/filemap.c
index 23a051a7ef0f..e21b63654767 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -3035,6 +3035,83 @@ int generic_remap_checks(struct file *file_in, loff_t pos_in,
 	return 0;
 }
 
+/* Performs necessary checks before doing a range swap. */
+int generic_swap_file_range_checks(struct file *file1, struct file *file2,
+				   const struct file_swap_range *fsr)
+{
+	struct inode *inode1 = file1->f_mapping->host;
+	struct inode *inode2 = file2->f_mapping->host;
+	int64_t test_len;
+	uint64_t blen;
+	loff_t size1, size2;
+	loff_t bs = inode2->i_sb->s_blocksize;
+	int ret;
+
+	if (fsr->length < 0)
+		return -EINVAL;
+
+	/* The start of both ranges must be aligned to an fs block. */
+	if (!IS_ALIGNED(fsr->file1_offset, bs) ||
+	    !IS_ALIGNED(fsr->file2_offset, bs))
+		return -EINVAL;
+
+	/* Ensure offsets don't wrap. */
+	if (fsr->file1_offset + fsr->length < fsr->file1_offset ||
+	    fsr->file2_offset + fsr->length < fsr->file2_offset)
+		return -EINVAL;
+
+	size1 = i_size_read(inode1);
+	size2 = i_size_read(inode2);
+
+	/*
+	 * Swapext requires both ranges to be within EOF, unless we're swapping
+	 * to EOF.  generic_swap_file_range_prep already checked that both
+	 * fsr->file1_offset and fsr->file2_offset are within EOF.
+	 */
+	if (!(fsr->flags & FILE_SWAP_RANGE_TO_EOF) &&
+	    (fsr->file1_offset + fsr->length > size1 ||
+	     fsr->file2_offset + fsr->length > size2))
+		return -EINVAL;
+
+	/*
+	 * Make sure we don't hit any file size limits.  If we hit any size
+	 * limits such that test_len was adjusted, we abort the whole
+	 * operation.
+	 */
+	test_len = fsr->length;
+	ret = generic_write_check_limits(file2, fsr->file2_offset, &test_len);
+	if (ret)
+		return ret;
+	ret = generic_write_check_limits(file1, fsr->file1_offset, &test_len);
+	if (ret)
+		return ret;
+	if (test_len != fsr->length)
+		return -EINVAL;
+
+	/*
+	 * If the user wanted us to swap to the infile's EOF, round up to the
+	 * next block boundary for this check.  Do the same for the outfile.
+	 *
+	 * Otherwise, reject the range length if it's not block aligned.  We
+	 * already confirmed the starting offsets' block alignment.
+	 */
+	if (fsr->file1_offset + fsr->length == size1)
+		blen = ALIGN(size1, bs) - fsr->file1_offset;
+	else if (fsr->file2_offset + fsr->length == size2)
+		blen = ALIGN(size2, bs) - fsr->file2_offset;
+	else if (!IS_ALIGNED(fsr->length, bs))
+		return -EINVAL;
+	else
+		blen = fsr->length;
+
+	/* Don't allow overlapped swapping within the same file. */
+	if (inode1 == inode2 &&
+	    fsr->file2_offset + blen > fsr->file1_offset &&
+	    fsr->file1_offset + blen > fsr->file2_offset)
+		return -EINVAL;
+
+	return 0;
+}
 
 /*
  * Performs common checks before doing a file copy/clone

From patchwork Wed Apr 29 02:44:39 2020
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "Darrick J.
 Wong"
X-Patchwork-Id: 11515837
Subject: [PATCH 04/18] xfs: support deferred bmap updates on the attr fork
From: "Darrick J. Wong"
To: darrick.wong@oracle.com
Cc: linux-xfs@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-api@vger.kernel.org
Date: Tue, 28 Apr 2020 19:44:39 -0700
Message-ID: <158812827961.168506.8394664032648525321.stgit@magnolia>
In-Reply-To: <158812825316.168506.932540609191384366.stgit@magnolia>
References: <158812825316.168506.932540609191384366.stgit@magnolia>
User-Agent: StGit/0.17.1-dirty

From: Darrick J. Wong

The deferred bmap update log item has always supported the attr fork, so
plumb this in so that higher layers can access this.

Signed-off-by: Darrick J. Wong
---
 fs/xfs/libxfs/xfs_bmap.c |   42 ++++++++++++++++--------------------------
 fs/xfs/libxfs/xfs_bmap.h |    4 ++--
 fs/xfs/xfs_bmap_item.c   |    2 +-
 fs/xfs/xfs_bmap_util.c   |    8 ++++----
 fs/xfs/xfs_reflink.c     |    4 ++--
 5 files changed, 25 insertions(+), 35 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
index 33dbae784463..2752df4f4e69 100644
--- a/fs/xfs/libxfs/xfs_bmap.c
+++ b/fs/xfs/libxfs/xfs_bmap.c
@@ -6238,17 +6238,8 @@ xfs_bmap_split_extent(
 	return error;
 }
 
-/* Deferred mapping is only for real extents in the data fork. */
-static bool
-xfs_bmap_is_update_needed(
-	struct xfs_bmbt_irec	*bmap)
-{
-	return	bmap->br_startblock != HOLESTARTBLOCK &&
-		bmap->br_startblock != DELAYSTARTBLOCK;
-}
-
 /* Record a bmap intent. */
-static int
+static void
 __xfs_bmap_add(
 	struct xfs_trans		*tp,
 	enum xfs_bmap_intent_type	type,
@@ -6258,6 +6249,11 @@ __xfs_bmap_add(
 {
 	struct xfs_bmap_intent		*bi;
 
+	if ((whichfork != XFS_DATA_FORK && whichfork != XFS_ATTR_FORK) ||
+	    bmap->br_startblock == HOLESTARTBLOCK ||
+	    bmap->br_startblock == DELAYSTARTBLOCK)
+		return;
+
 	trace_xfs_bmap_defer(tp->t_mountp,
 			XFS_FSB_TO_AGNO(tp->t_mountp, bmap->br_startblock),
 			type,
@@ -6275,7 +6271,6 @@ __xfs_bmap_add(
 	bi->bi_bmap = *bmap;
 
 	xfs_defer_add(tp, XFS_DEFER_OPS_TYPE_BMAP, &bi->bi_list);
-	return 0;
 }
 
 /* Map an extent into a file. */
@@ -6283,12 +6278,10 @@ void
 xfs_bmap_map_extent(
 	struct xfs_trans	*tp,
 	struct xfs_inode	*ip,
+	int			whichfork,
 	struct xfs_bmbt_irec	*PREV)
 {
-	if (!xfs_bmap_is_update_needed(PREV))
-		return;
-
-	__xfs_bmap_add(tp, XFS_BMAP_MAP, ip, XFS_DATA_FORK, PREV);
+	__xfs_bmap_add(tp, XFS_BMAP_MAP, ip, whichfork, PREV);
 }
 
 /* Unmap an extent out of a file.
*/ @@ -6296,12 +6289,10 @@ void xfs_bmap_unmap_extent( struct xfs_trans *tp, struct xfs_inode *ip, + int whichfork, struct xfs_bmbt_irec *PREV) { - if (!xfs_bmap_is_update_needed(PREV)) - return; - - __xfs_bmap_add(tp, XFS_BMAP_UNMAP, ip, XFS_DATA_FORK, PREV); + __xfs_bmap_add(tp, XFS_BMAP_UNMAP, ip, whichfork, PREV); } /* @@ -6320,6 +6311,10 @@ xfs_bmap_finish_one( xfs_exntst_t state) { int error = 0; + int flags = 0; + + if (whichfork == XFS_ATTR_FORK) + flags |= XFS_BMAPI_ATTRFORK; ASSERT(tp->t_firstblock == NULLFSBLOCK); @@ -6328,11 +6323,6 @@ xfs_bmap_finish_one( XFS_FSB_TO_AGBNO(tp->t_mountp, startblock), ip->i_ino, whichfork, startoff, *blockcount, state); - if (WARN_ON_ONCE(whichfork != XFS_DATA_FORK)) { - xfs_bmap_mark_sick(ip, whichfork); - return -EFSCORRUPTED; - } - if (XFS_TEST_ERROR(false, tp->t_mountp, XFS_ERRTAG_BMAP_FINISH_ONE)) return -EIO; @@ -6340,12 +6330,12 @@ xfs_bmap_finish_one( switch (type) { case XFS_BMAP_MAP: error = xfs_bmapi_remap(tp, ip, startoff, *blockcount, - startblock, 0); + startblock, flags); *blockcount = 0; break; case XFS_BMAP_UNMAP: error = __xfs_bunmapi(tp, ip, startoff, blockcount, - XFS_BMAPI_REMAP, 1); + flags | XFS_BMAPI_REMAP, 1); break; default: ASSERT(0); diff --git a/fs/xfs/libxfs/xfs_bmap.h b/fs/xfs/libxfs/xfs_bmap.h index bbd8ccdecffa..3367df499ac8 100644 --- a/fs/xfs/libxfs/xfs_bmap.h +++ b/fs/xfs/libxfs/xfs_bmap.h @@ -266,9 +266,9 @@ int xfs_bmap_finish_one(struct xfs_trans *tp, struct xfs_inode *ip, xfs_fileoff_t startoff, xfs_fsblock_t startblock, xfs_filblks_t *blockcount, xfs_exntst_t state); void xfs_bmap_map_extent(struct xfs_trans *tp, struct xfs_inode *ip, - struct xfs_bmbt_irec *imap); + int whichfork, struct xfs_bmbt_irec *imap); void xfs_bmap_unmap_extent(struct xfs_trans *tp, struct xfs_inode *ip, - struct xfs_bmbt_irec *imap); + int whichfork, struct xfs_bmbt_irec *imap); static inline int xfs_bmap_fork_to_state(int whichfork) { diff --git a/fs/xfs/xfs_bmap_item.c b/fs/xfs/xfs_bmap_item.c index 
267351fbea67..7ad803a06634 100644 --- a/fs/xfs/xfs_bmap_item.c +++ b/fs/xfs/xfs_bmap_item.c @@ -581,7 +581,7 @@ xfs_bui_recover( irec.br_blockcount = count; irec.br_startoff = bmap->me_startoff; irec.br_state = state; - xfs_bmap_unmap_extent(tp, ip, &irec); + xfs_bmap_unmap_extent(tp, ip, whichfork, &irec); } set_bit(XFS_BUI_RECOVERED, &buip->bui_flags); diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c index 746bb0c8271c..070f657241a1 100644 --- a/fs/xfs/xfs_bmap_util.c +++ b/fs/xfs/xfs_bmap_util.c @@ -1440,16 +1440,16 @@ xfs_swap_extent_rmap( trace_xfs_swap_extent_rmap_remap_piece(tip, &uirec); /* Remove the mapping from the donor file. */ - xfs_bmap_unmap_extent(tp, tip, &uirec); + xfs_bmap_unmap_extent(tp, tip, XFS_DATA_FORK, &uirec); /* Remove the mapping from the source file. */ - xfs_bmap_unmap_extent(tp, ip, &irec); + xfs_bmap_unmap_extent(tp, ip, XFS_DATA_FORK, &irec); /* Map the donor file's blocks into the source file. */ - xfs_bmap_map_extent(tp, ip, &uirec); + xfs_bmap_map_extent(tp, ip, XFS_DATA_FORK, &uirec); /* Map the source file's blocks into the donor file. */ - xfs_bmap_map_extent(tp, tip, &irec); + xfs_bmap_map_extent(tp, tip, XFS_DATA_FORK, &irec); error = xfs_defer_finish(tpp); tp = *tpp; diff --git a/fs/xfs/xfs_reflink.c b/fs/xfs/xfs_reflink.c index 5e978d1f169d..f206f6637daf 100644 --- a/fs/xfs/xfs_reflink.c +++ b/fs/xfs/xfs_reflink.c @@ -706,7 +706,7 @@ xfs_reflink_end_cow_extent( xfs_refcount_free_cow_extent(tp, del.br_startblock, del.br_blockcount); /* Map the new blocks into the data fork. */ - xfs_bmap_map_extent(tp, ip, &del); + xfs_bmap_map_extent(tp, ip, XFS_DATA_FORK, &del); /* Charge this new data fork mapping to the on-disk quota. */ xfs_trans_mod_dquot_byino(tp, ip, XFS_TRANS_DQ_DELBCOUNT, @@ -1125,7 +1125,7 @@ xfs_reflink_remap_extent( xfs_refcount_increase_extent(tp, &uirec); /* Map the new blocks into the data fork. 
*/ - xfs_bmap_map_extent(tp, ip, &uirec); + xfs_bmap_map_extent(tp, ip, XFS_DATA_FORK, &uirec); /* Update quota accounting. */ xfs_trans_mod_dquot_byino(tp, ip, XFS_TRANS_DQ_BCOUNT, From patchwork Wed Apr 29 02:44:46 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Darrick J. Wong" X-Patchwork-Id: 11515901 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 3877A17EF for ; Wed, 29 Apr 2020 02:46:51 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 1EC5A20787 for ; Wed, 29 Apr 2020 02:46:51 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=oracle.com header.i=@oracle.com header.b="zQp/MZ+C" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726755AbgD2Cqu (ORCPT ); Tue, 28 Apr 2020 22:46:50 -0400 Received: from aserp2120.oracle.com ([141.146.126.78]:49744 "EHLO aserp2120.oracle.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726498AbgD2Cqu (ORCPT ); Tue, 28 Apr 2020 22:46:50 -0400 Received: from pps.filterd (aserp2120.oracle.com [127.0.0.1]) by aserp2120.oracle.com (8.16.0.42/8.16.0.42) with SMTP id 03T2hMZb072961; Wed, 29 Apr 2020 02:46:49 GMT DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=oracle.com; h=subject : from : to : cc : date : message-id : in-reply-to : references : mime-version : content-type : content-transfer-encoding; s=corp-2020-01-29; bh=9ab5a2MnK6x57LlE+3OzKnaJvtKwKr/k/I7QkylaSwU=; b=zQp/MZ+CzOl5KNar1lmJ8Ym6B+NlukxYcizomoUmkJPY9pKgeixibOERJTGDor60cgZS aAEnP6eVezB11X0WnMMWGr9BYmR9uTHzivm6JDX1iefZ6FN0zZvDG7M//F2Rr/ZpWwly rX7gX+DmcQdXemy+Be1OqgpojDbGw+BbKL5kdVo8WfLiLPU5P02w9Djq9Bch3STXphve dMvDgeOvZpTgr7aAoZDXDnlZc8iZqLiflUZkd2PbyIX57WGZpfGYZ4YCIJJWJH7RZeJH 
f66eTA3+2dMvTb+2AC0nWfq5N/tbZx78RdzWZnakGeDe/r9nn9NuBBtWt4PzZ7tNLoIx bA== Received: from userp3030.oracle.com (userp3030.oracle.com [156.151.31.80]) by aserp2120.oracle.com with ESMTP id 30nucg39s1-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Wed, 29 Apr 2020 02:46:48 +0000 Received: from pps.filterd (userp3030.oracle.com [127.0.0.1]) by userp3030.oracle.com (8.16.0.42/8.16.0.42) with SMTP id 03T2ggT2071521; Wed, 29 Apr 2020 02:44:48 GMT Received: from userv0122.oracle.com (userv0122.oracle.com [156.151.31.75]) by userp3030.oracle.com with ESMTP id 30mxphp1vy-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Wed, 29 Apr 2020 02:44:48 +0000 Received: from abhmp0001.oracle.com (abhmp0001.oracle.com [141.146.116.7]) by userv0122.oracle.com (8.14.4/8.14.4) with ESMTP id 03T2ilfE022537; Wed, 29 Apr 2020 02:44:47 GMT Received: from localhost (/67.169.218.210) by default (Oracle Beehive Gateway v4.0) with ESMTP ; Tue, 28 Apr 2020 19:44:47 -0700 Subject: [PATCH 05/18] xfs: xfs_bmap_finish_one should map unwritten extents properly From: "Darrick J. 
Wong" To: darrick.wong@oracle.com Cc: linux-xfs@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-api@vger.kernel.org Date: Tue, 28 Apr 2020 19:44:46 -0700 Message-ID: <158812828594.168506.12916087752534925204.stgit@magnolia> In-Reply-To: <158812825316.168506.932540609191384366.stgit@magnolia> References: <158812825316.168506.932540609191384366.stgit@magnolia> User-Agent: StGit/0.17.1-dirty MIME-Version: 1.0 X-Proofpoint-Virus-Version: vendor=nai engine=6000 definitions=9605 signatures=668686 X-Proofpoint-Spam-Details: rule=notspam policy=default score=0 mlxlogscore=931 malwarescore=0 mlxscore=0 bulkscore=0 adultscore=0 phishscore=0 suspectscore=1 spamscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2003020000 definitions=main-2004290020 X-Proofpoint-Virus-Version: vendor=nai engine=6000 definitions=9605 signatures=668686 X-Proofpoint-Spam-Details: rule=notspam policy=default score=0 clxscore=1015 priorityscore=1501 mlxlogscore=986 impostorscore=0 suspectscore=1 malwarescore=0 lowpriorityscore=0 mlxscore=0 spamscore=0 adultscore=0 phishscore=0 bulkscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2003020000 definitions=main-2004290020 Sender: linux-fsdevel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org From: Darrick J. Wong The deferred bmap work state and the log item can transmit unwritten state, so the XFS_BMAP_MAP handler must map in extents with that unwritten state. Signed-off-by: Darrick J. 
Wong --- fs/xfs/libxfs/xfs_bmap.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c index 2752df4f4e69..81e03461312b 100644 --- a/fs/xfs/libxfs/xfs_bmap.c +++ b/fs/xfs/libxfs/xfs_bmap.c @@ -6329,6 +6329,8 @@ xfs_bmap_finish_one( switch (type) { case XFS_BMAP_MAP: + if (state == XFS_EXT_UNWRITTEN) + flags |= XFS_BMAPI_PREALLOC; error = xfs_bmapi_remap(tp, ip, startoff, *blockcount, startblock, flags); *blockcount = 0; From patchwork Wed Apr 29 02:44:52 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Darrick J. Wong" X-Patchwork-Id: 11515905 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 491CE92C for ; Wed, 29 Apr 2020 02:47:01 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 31E4D20787 for ; Wed, 29 Apr 2020 02:47:01 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=oracle.com header.i=@oracle.com header.b="Hs5zWGvT" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726787AbgD2CrA (ORCPT ); Tue, 28 Apr 2020 22:47:00 -0400 Received: from aserp2120.oracle.com ([141.146.126.78]:49830 "EHLO aserp2120.oracle.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726778AbgD2Cq6 (ORCPT ); Tue, 28 Apr 2020 22:46:58 -0400 Received: from pps.filterd (aserp2120.oracle.com [127.0.0.1]) by aserp2120.oracle.com (8.16.0.42/8.16.0.42) with SMTP id 03T2he7u073293; Wed, 29 Apr 2020 02:46:57 GMT DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=oracle.com; h=subject : from : to : cc : date : message-id : in-reply-to : references : mime-version : content-type : content-transfer-encoding; s=corp-2020-01-29; bh=3YiNEmjLje5pTnDILa7dhgmHUAljMFp/iVyWS1P2I/g=; 
b=Hs5zWGvTMi0OHn9FGpU7UweyHoVAVJbr8twbMI+ygQ6INGnmugUcZYCkTsiGNPoyDU8i zIVID11kvhqEQ6S6mtSP6jfRhTIOoFgoQKwo2OA4zZoedISk4Byv6x0mUITHaTaJhASA 1GnfVfK1V2cxJaWr7YKrKW3P7v1+uIqi2ipP4zWggJNapZQVDKh/+53s+GWZqJPgJ1j9 sUAlPp75JiwqM7kZlW5kwIcBG6cCw/7UN2auIMBf4dJg+NzVNdIodl2PZvkCSDKXfmBs UbZ2lNpsovNAicpDeFQndffb0d2cT/40xgGOQra6dmKN+KpYP6c3V7AtW+meGqUEIreu Xg== Received: from userp3030.oracle.com (userp3030.oracle.com [156.151.31.80]) by aserp2120.oracle.com with ESMTP id 30nucg39s9-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Wed, 29 Apr 2020 02:46:57 +0000 Received: from pps.filterd (userp3030.oracle.com [127.0.0.1]) by userp3030.oracle.com (8.16.0.42/8.16.0.42) with SMTP id 03T2ggFX071513; Wed, 29 Apr 2020 02:44:56 GMT Received: from userv0122.oracle.com (userv0122.oracle.com [156.151.31.75]) by userp3030.oracle.com with ESMTP id 30mxphp247-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Wed, 29 Apr 2020 02:44:56 +0000 Received: from abhmp0015.oracle.com (abhmp0015.oracle.com [141.146.116.21]) by userv0122.oracle.com (8.14.4/8.14.4) with ESMTP id 03T2it6S022586; Wed, 29 Apr 2020 02:44:55 GMT Received: from localhost (/67.169.218.210) by default (Oracle Beehive Gateway v4.0) with ESMTP ; Tue, 28 Apr 2020 19:44:55 -0700 Subject: [PATCH 06/18] xfs: create a log incompat flag for atomic extent swapping From: "Darrick J. 
Wong" To: darrick.wong@oracle.com Cc: linux-xfs@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-api@vger.kernel.org Date: Tue, 28 Apr 2020 19:44:52 -0700 Message-ID: <158812829237.168506.10231967263101202625.stgit@magnolia> In-Reply-To: <158812825316.168506.932540609191384366.stgit@magnolia> References: <158812825316.168506.932540609191384366.stgit@magnolia> User-Agent: StGit/0.17.1-dirty MIME-Version: 1.0 X-Proofpoint-Virus-Version: vendor=nai engine=6000 definitions=9605 signatures=668686 X-Proofpoint-Spam-Details: rule=notspam policy=default score=0 mlxlogscore=999 malwarescore=0 mlxscore=0 bulkscore=0 adultscore=0 phishscore=0 suspectscore=1 spamscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2003020000 definitions=main-2004290020 X-Proofpoint-Virus-Version: vendor=nai engine=6000 definitions=9605 signatures=668686 X-Proofpoint-Spam-Details: rule=notspam policy=default score=0 clxscore=1015 priorityscore=1501 mlxlogscore=999 impostorscore=0 suspectscore=1 malwarescore=0 lowpriorityscore=0 mlxscore=0 spamscore=0 adultscore=0 phishscore=0 bulkscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2003020000 definitions=main-2004290020 Sender: linux-fsdevel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org From: Darrick J. Wong Create a log incompat flag so that we only attempt to process swap extent log items if the filesystem supports it. Signed-off-by: Darrick J. 
Wong --- fs/xfs/libxfs/xfs_format.h | 11 ++++++++++- fs/xfs/libxfs/xfs_fs.h | 1 + fs/xfs/libxfs/xfs_sb.c | 2 ++ fs/xfs/xfs_super.c | 4 ++++ 4 files changed, 17 insertions(+), 1 deletion(-) diff --git a/fs/xfs/libxfs/xfs_format.h b/fs/xfs/libxfs/xfs_format.h index 34babf402e14..63ed62a92c9c 100644 --- a/fs/xfs/libxfs/xfs_format.h +++ b/fs/xfs/libxfs/xfs_format.h @@ -500,7 +500,10 @@ xfs_sb_has_incompat_feature( return (sbp->sb_features_incompat & feature) != 0; } -#define XFS_SB_FEAT_INCOMPAT_LOG_ALL 0 +#define XFS_SB_FEAT_INCOMPAT_LOG_ATOMIC_SWAP (1 << 0) +#define XFS_SB_FEAT_INCOMPAT_LOG_ALL \ + (XFS_SB_FEAT_INCOMPAT_LOG_ATOMIC_SWAP) + #define XFS_SB_FEAT_INCOMPAT_LOG_UNKNOWN ~XFS_SB_FEAT_INCOMPAT_LOG_ALL static inline bool xfs_sb_has_incompat_log_feature( @@ -614,6 +617,12 @@ static inline bool xfs_sb_version_hasrtrmapbt(struct xfs_sb *sbp) xfs_sb_version_hasrmapbt(sbp); } +static inline bool xfs_sb_version_hasatomicswap(struct xfs_sb *sbp) +{ + return XFS_SB_VERSION_NUM(sbp) == XFS_SB_VERSION_5 && + (sbp->sb_features_log_incompat & XFS_SB_FEAT_INCOMPAT_LOG_ATOMIC_SWAP); +} + /* * end of superblock version macros */ diff --git a/fs/xfs/libxfs/xfs_fs.h b/fs/xfs/libxfs/xfs_fs.h index c5b75082b9db..d278ca5731e4 100644 --- a/fs/xfs/libxfs/xfs_fs.h +++ b/fs/xfs/libxfs/xfs_fs.h @@ -251,6 +251,7 @@ typedef struct xfs_fsop_resblks { #define XFS_FSOP_GEOM_FLAGS_RMAPBT (1 << 19) /* reverse mapping btree */ #define XFS_FSOP_GEOM_FLAGS_REFLINK (1 << 20) /* files can share blocks */ #define XFS_FSOP_GEOM_FLAGS_BIGTIME (1 << 21) /* 64-bit nsec timestamps */ +#define XFS_FSOP_GEOM_FLAGS_ATOMIC_SWAP (1 << 22) /* atomic swapext */ /* * Minimum and maximum sizes need for growth checks. 
diff --git a/fs/xfs/libxfs/xfs_sb.c b/fs/xfs/libxfs/xfs_sb.c index c50f589824f0..16094fd1a75e 100644 --- a/fs/xfs/libxfs/xfs_sb.c +++ b/fs/xfs/libxfs/xfs_sb.c @@ -1205,6 +1205,8 @@ xfs_fs_geometry( geo->flags |= XFS_FSOP_GEOM_FLAGS_REFLINK; if (xfs_sb_version_hasbigtime(sbp)) geo->flags |= XFS_FSOP_GEOM_FLAGS_BIGTIME; + if (xfs_sb_version_hasatomicswap(sbp)) + geo->flags |= XFS_FSOP_GEOM_FLAGS_ATOMIC_SWAP; if (xfs_sb_version_hassector(sbp)) geo->logsectsize = sbp->sb_logsectsize; else diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c index a17e824c6084..42d82c9d2a1d 100644 --- a/fs/xfs/xfs_super.c +++ b/fs/xfs/xfs_super.c @@ -1647,6 +1647,10 @@ xfs_fc_fill_super( xfs_warn(mp, "EXPERIMENTAL inode btree counters feature in use. Use at your own risk!"); + if (xfs_sb_version_hasatomicswap(&mp->m_sb)) + xfs_warn(mp, + "EXPERIMENTAL atomic file range swap feature in use. Use at your own risk!"); + error = xfs_mountfs(mp); if (error) goto out_filestream_unmount; From patchwork Wed Apr 29 02:45:00 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Darrick J. 
Wong" X-Patchwork-Id: 11515841 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 29A9B17EF for ; Wed, 29 Apr 2020 02:45:05 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 0D8642082E for ; Wed, 29 Apr 2020 02:45:05 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=oracle.com header.i=@oracle.com header.b="WcTrs97V" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726662AbgD2CpE (ORCPT ); Tue, 28 Apr 2020 22:45:04 -0400 Received: from userp2130.oracle.com ([156.151.31.86]:51158 "EHLO userp2130.oracle.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726536AbgD2CpD (ORCPT ); Tue, 28 Apr 2020 22:45:03 -0400 Received: from pps.filterd (userp2130.oracle.com [127.0.0.1]) by userp2130.oracle.com (8.16.0.42/8.16.0.42) with SMTP id 03T2hc3B155889; Wed, 29 Apr 2020 02:45:03 GMT DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=oracle.com; h=subject : from : to : cc : date : message-id : in-reply-to : references : mime-version : content-type : content-transfer-encoding; s=corp-2020-01-29; bh=HzxgoLgrQNxMCxmiaUI1HbQLBUqMyDyOBwJ0LWxYPCQ=; b=WcTrs97Vho/OLOXwhVU+K24oByfK1javS281zht/FuMmIHNsUnW1XQiJEystvt4f+09+ T2dleHBsltlycpqAOTHqF5Qb95F8rNtW1VwS+ubqtptmhlFOMZcNVZMTNO93ZSwJOslD F6A57x1KhQmzD6T6Ghtd7YE5w2d3C2pUEBrrm7B5Y+n1BSA0fYThoQMiLMIDfVmdC5Du d/WlI9PJpns4t5If8mvPu+/mVyy40JtnJgoSVU+x0HSSdsh9Stvw55n13ZB0ju++vqfY h78dZCBPz2gOQJGnfFRONReDXiTP4+9DL32aaffZuDdKu+WBw03uifoWF2FaAo6U2R2k aA== Received: from aserp3030.oracle.com (aserp3030.oracle.com [141.146.126.71]) by userp2130.oracle.com with ESMTP id 30p01nsthf-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Wed, 29 Apr 2020 02:45:03 +0000 Received: from pps.filterd (aserp3030.oracle.com [127.0.0.1]) by aserp3030.oracle.com 
(8.16.0.42/8.16.0.42) with SMTP id 03T2g3wB039228; Wed, 29 Apr 2020 02:45:02 GMT Received: from aserv0121.oracle.com (aserv0121.oracle.com [141.146.126.235]) by aserp3030.oracle.com with ESMTP id 30mxru049u-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Wed, 29 Apr 2020 02:45:02 +0000 Received: from abhmp0014.oracle.com (abhmp0014.oracle.com [141.146.116.20]) by aserv0121.oracle.com (8.14.4/8.13.8) with ESMTP id 03T2j1vJ015858; Wed, 29 Apr 2020 02:45:01 GMT Received: from localhost (/67.169.218.210) by default (Oracle Beehive Gateway v4.0) with ESMTP ; Tue, 28 Apr 2020 19:45:01 -0700 Subject: [PATCH 07/18] xfs: allow deferred ops items to put themselves at the end of the pending queue From: "Darrick J. Wong" To: darrick.wong@oracle.com Cc: linux-xfs@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-api@vger.kernel.org Date: Tue, 28 Apr 2020 19:45:00 -0700 Message-ID: <158812830045.168506.2200063100219298803.stgit@magnolia> In-Reply-To: <158812825316.168506.932540609191384366.stgit@magnolia> References: <158812825316.168506.932540609191384366.stgit@magnolia> User-Agent: StGit/0.17.1-dirty MIME-Version: 1.0 X-Proofpoint-Virus-Version: vendor=nai engine=6000 definitions=9605 signatures=668686 X-Proofpoint-Spam-Details: rule=notspam policy=default score=0 mlxscore=0 phishscore=0 suspectscore=3 mlxlogscore=999 malwarescore=0 bulkscore=0 spamscore=0 adultscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2003020000 definitions=main-2004290020 X-Proofpoint-Virus-Version: vendor=nai engine=6000 definitions=9605 signatures=668686 X-Proofpoint-Spam-Details: rule=notspam policy=default score=0 bulkscore=0 spamscore=0 clxscore=1015 phishscore=0 mlxlogscore=999 adultscore=0 priorityscore=1501 mlxscore=0 suspectscore=3 malwarescore=0 lowpriorityscore=0 impostorscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2003020000 definitions=main-2004290020 Sender: linux-fsdevel-owner@vger.kernel.org Precedence: bulk 
List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org From: Darrick J. Wong Allow individual deferred op ->finish_item functions to decide that they want to yield to all other deferred ops that might need processing. Signed-off-by: Darrick J. Wong --- fs/xfs/libxfs/xfs_defer.c | 29 +++++++++++++++++++++++------ 1 file changed, 23 insertions(+), 6 deletions(-) diff --git a/fs/xfs/libxfs/xfs_defer.c b/fs/xfs/libxfs/xfs_defer.c index 1cab95cef399..f53e3ce858eb 100644 --- a/fs/xfs/libxfs/xfs_defer.c +++ b/fs/xfs/libxfs/xfs_defer.c @@ -69,10 +69,10 @@ * - For each work item attached to the log intent item, * * Perform the described action. * * Attach the work item to the log done item. - * * If the result of doing the work was -EAGAIN, ->finish work - * wants a new transaction. See the "Requesting a Fresh - * Transaction while Finishing Deferred Work" section below for - * details. + * * If the result of doing the work was -EAGAIN or -EMULTIHOP, + * ->finish work wants a new transaction. See the "Requesting a + * Fresh Transaction while Finishing Deferred Work" section below + * for details. * * The key here is that we must log an intent item for all pending * work items every time we roll the transaction, and that we must log @@ -108,6 +108,13 @@ * required that ->finish_item must be careful to leave enough * transaction reservation to fit the new log intent item. * + * If ->finish_item returns -EMULTIHOP, defer_finish will log the new + * intent item with the remaining work items but it will move the + * xfs_defer_pending item to a separate queue. The separate queue + * will be put back into the pending list at the very end of processing + * after all other pending items (including ones that were created as + * part of finishing other items) have been processed. 
+ * * This is an example of remapping the extent (E, E+B) into file X at * offset A and dealing with the extent (C, C+B) already being mapped * there: @@ -365,12 +372,14 @@ xfs_defer_finish_noroll( int error = 0; const struct xfs_defer_op_type *ops; LIST_HEAD(dop_pending); + LIST_HEAD(dop_endofline); ASSERT((*tp)->t_flags & XFS_TRANS_PERM_LOG_RES); trace_xfs_defer_finish(*tp, _RET_IP_); /* Until we run out of pending work to finish... */ +again: while (!list_empty(&dop_pending) || !list_empty(&(*tp)->t_dfops)) { /* log intents and pull in intake items */ xfs_defer_create_intents(*tp); @@ -398,7 +407,7 @@ xfs_defer_finish_noroll( dfp->dfp_count--; error = ops->finish_item(*tp, li, dfp->dfp_done, &state); - if (error == -EAGAIN) { + if (error == -EAGAIN || error == -EMULTIHOP) { /* * Caller wants a fresh transaction; * put the work item back on the list @@ -418,7 +427,7 @@ xfs_defer_finish_noroll( goto out; } } - if (error == -EAGAIN) { + if (error == -EAGAIN || error == -EMULTIHOP) { /* * Caller wants a fresh transaction, so log a * new log intent item to replace the old one @@ -431,6 +440,8 @@ xfs_defer_finish_noroll( dfp->dfp_done = NULL; list_for_each(li, &dfp->dfp_work) ops->log_item(*tp, dfp->dfp_intent, li); + if (error == -EMULTIHOP) + list_move_tail(&dfp->dfp_list, &dop_endofline); } else { /* Done with the dfp, free it. */ list_del(&dfp->dfp_list); @@ -441,8 +452,14 @@ xfs_defer_finish_noroll( ops->finish_cleanup(*tp, state, error); } + if (!list_empty(&dop_endofline)) { + list_splice_tail_init(&dop_endofline, &dop_pending); + goto again; + } + out: if (error) { + list_splice_tail_init(&dop_endofline, &dop_pending); xfs_defer_trans_abort(*tp, &dop_pending); xfs_force_shutdown((*tp)->t_mountp, SHUTDOWN_CORRUPT_INCORE); trace_xfs_defer_finish_error(*tp, error); From patchwork Wed Apr 29 02:45:06 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Darrick J. 
Wong" X-Patchwork-Id: 11515843 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id A4A3092C for ; Wed, 29 Apr 2020 02:45:15 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 7158020784 for ; Wed, 29 Apr 2020 02:45:15 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=oracle.com header.i=@oracle.com header.b="bngsgZs8" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726560AbgD2CpO (ORCPT ); Tue, 28 Apr 2020 22:45:14 -0400 Received: from userp2130.oracle.com ([156.151.31.86]:51248 "EHLO userp2130.oracle.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726536AbgD2CpO (ORCPT ); Tue, 28 Apr 2020 22:45:14 -0400 Received: from pps.filterd (userp2130.oracle.com [127.0.0.1]) by userp2130.oracle.com (8.16.0.42/8.16.0.42) with SMTP id 03T2iAZ9156399; Wed, 29 Apr 2020 02:45:10 GMT DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=oracle.com; h=subject : from : to : cc : date : message-id : in-reply-to : references : mime-version : content-type : content-transfer-encoding; s=corp-2020-01-29; bh=EnWcOENY45vRma4eOt7J2rnoMq67iuOnNc2LJyn0OHY=; b=bngsgZs8v4AfwRwfa0FEwuW9snXCwUNNO3JtBh8uBQalRL3H6SDcdbYVn6cjxDUpIsCC PrjtIl1/DOHKeewWnpUVBZOmrxRX1OKkof5KJuBzN35rxb8eo4TWS9mH0HqLSnt2HviG KXeXMj2JQfajVHXRHdULZFPV4ByMPHHUqvOuHxGTnBTy8GA8eaVkxY+ABMUDDocXMFG1 xnBj6mRI7JjmvZpjGhHJ0HwAgC2lz835UpIavai/8IiBDu4rP1C5onh60vbU+vo5fwH5 AIRNxZlWpWktM+4hndi7lhNdmFsEbpMQ3t/3TM7rJ7eQh0rrkb7EtJMbwV1fx94Prrcv GQ== Received: from userp3030.oracle.com (userp3030.oracle.com [156.151.31.80]) by userp2130.oracle.com with ESMTP id 30p01nsthw-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Wed, 29 Apr 2020 02:45:09 +0000 Received: from pps.filterd (userp3030.oracle.com [127.0.0.1]) by userp3030.oracle.com 
(8.16.0.42/8.16.0.42) with SMTP id 03T2ggFe071513; Wed, 29 Apr 2020 02:45:09 GMT Received: from aserv0122.oracle.com (aserv0122.oracle.com [141.146.126.236]) by userp3030.oracle.com with ESMTP id 30mxphp2rw-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Wed, 29 Apr 2020 02:45:09 +0000 Received: from abhmp0012.oracle.com (abhmp0012.oracle.com [141.146.116.18]) by aserv0122.oracle.com (8.14.4/8.14.4) with ESMTP id 03T2j8XC003599; Wed, 29 Apr 2020 02:45:08 GMT Received: from localhost (/67.169.218.210) by default (Oracle Beehive Gateway v4.0) with ESMTP ; Tue, 28 Apr 2020 19:45:07 -0700 Subject: [PATCH 08/18] xfs: introduce a swap-extent log intent item From: "Darrick J. Wong" To: darrick.wong@oracle.com Cc: linux-xfs@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-api@vger.kernel.org Date: Tue, 28 Apr 2020 19:45:06 -0700 Message-ID: <158812830680.168506.10239099532005921334.stgit@magnolia> In-Reply-To: <158812825316.168506.932540609191384366.stgit@magnolia> References: <158812825316.168506.932540609191384366.stgit@magnolia> User-Agent: StGit/0.17.1-dirty MIME-Version: 1.0 X-Proofpoint-Virus-Version: vendor=nai engine=6000 definitions=9605 signatures=668686 X-Proofpoint-Spam-Details: rule=notspam policy=default score=0 mlxlogscore=999 malwarescore=0 mlxscore=0 bulkscore=0 adultscore=0 phishscore=0 suspectscore=3 spamscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2003020000 definitions=main-2004290020 X-Proofpoint-Virus-Version: vendor=nai engine=6000 definitions=9605 signatures=668686 X-Proofpoint-Spam-Details: rule=notspam policy=default score=0 bulkscore=0 spamscore=0 clxscore=1015 phishscore=0 mlxlogscore=999 adultscore=0 priorityscore=1501 mlxscore=0 suspectscore=3 malwarescore=0 lowpriorityscore=0 impostorscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2003020000 definitions=main-2004290020 Sender: linux-fsdevel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: 
linux-fsdevel@vger.kernel.org From: Darrick J. Wong Introduce a new intent log item to handle swapping extents. Signed-off-by: Darrick J. Wong --- fs/xfs/Makefile | 1 fs/xfs/libxfs/xfs_log_format.h | 55 ++++++ fs/xfs/libxfs/xfs_log_recover.h | 1 fs/xfs/xfs_log.c | 2 fs/xfs/xfs_log_recover.c | 6 + fs/xfs/xfs_super.c | 17 ++ fs/xfs/xfs_swapext_item.c | 365 +++++++++++++++++++++++++++++++++++++++ fs/xfs/xfs_swapext_item.h | 67 +++++++ 8 files changed, 512 insertions(+), 2 deletions(-) create mode 100644 fs/xfs/xfs_swapext_item.c create mode 100644 fs/xfs/xfs_swapext_item.h diff --git a/fs/xfs/Makefile b/fs/xfs/Makefile index 2bd822c784cb..27b4bd5c8ffe 100644 --- a/fs/xfs/Makefile +++ b/fs/xfs/Makefile @@ -109,6 +109,7 @@ xfs-y += xfs_log.o \ xfs_inode_item.o \ xfs_refcount_item.o \ xfs_rmap_item.o \ + xfs_swapext_item.o \ xfs_log_recover.o \ xfs_trans_ail.o \ xfs_trans_buf.o diff --git a/fs/xfs/libxfs/xfs_log_format.h b/fs/xfs/libxfs/xfs_log_format.h index 382b7cd6ba82..ceb67213df64 100644 --- a/fs/xfs/libxfs/xfs_log_format.h +++ b/fs/xfs/libxfs/xfs_log_format.h @@ -117,7 +117,9 @@ struct xfs_unmount_log_format { #define XLOG_REG_TYPE_CUD_FORMAT 24 #define XLOG_REG_TYPE_BUI_FORMAT 25 #define XLOG_REG_TYPE_BUD_FORMAT 26 -#define XLOG_REG_TYPE_MAX 26 +#define XLOG_REG_TYPE_SXI_FORMAT 27 +#define XLOG_REG_TYPE_SXD_FORMAT 28 +#define XLOG_REG_TYPE_MAX 28 /* * Flags to log operation header @@ -240,6 +242,8 @@ typedef struct xfs_trans_header { #define XFS_LI_CUD 0x1243 #define XFS_LI_BUI 0x1244 /* bmbt update intent */ #define XFS_LI_BUD 0x1245 +#define XFS_LI_SXI 0x1246 +#define XFS_LI_SXD 0x1247 #define XFS_LI_TYPE_DESC \ { XFS_LI_EFI, "XFS_LI_EFI" }, \ @@ -255,7 +259,9 @@ typedef struct xfs_trans_header { { XFS_LI_CUI, "XFS_LI_CUI" }, \ { XFS_LI_CUD, "XFS_LI_CUD" }, \ { XFS_LI_BUI, "XFS_LI_BUI" }, \ - { XFS_LI_BUD, "XFS_LI_BUD" } + { XFS_LI_BUD, "XFS_LI_BUD" }, \ + { XFS_LI_SXI, "XFS_LI_SXI" }, \ + { XFS_LI_SXD, "XFS_LI_SXD" } /* * Inode Log Item Format definitions. 
@@ -786,6 +792,51 @@ struct xfs_bud_log_format { uint64_t bud_bui_id; /* id of corresponding bui */ }; +/* + * SXI/SXD (extent swapping) log format definitions + */ + +struct xfs_swap_extent { + uint64_t se_inode1; + uint64_t se_inode2; + uint64_t se_startoff1; + uint64_t se_startoff2; + uint64_t se_blockcount; + uint64_t se_flags; + int64_t se_isize1; + int64_t se_isize2; +}; + +/* Swap extents between extended attribute forks. */ +#define XFS_SWAP_EXTENT_ATTR_FORK (1ULL << 0) + +/* Set the file sizes when finished. */ +#define XFS_SWAP_EXTENT_SET_SIZES (1ULL << 1) + +#define XFS_SWAP_EXTENT_FLAGS (XFS_SWAP_EXTENT_ATTR_FORK | \ + XFS_SWAP_EXTENT_SET_SIZES) + +/* This is the structure used to lay out an sxi log item in the log. */ +struct xfs_sxi_log_format { + uint16_t sxi_type; /* sxi log item type */ + uint16_t sxi_size; /* size of this item */ + uint32_t __pad; /* must be zero */ + uint64_t sxi_id; /* sxi identifier */ + struct xfs_swap_extent sxi_extent; /* extent to swap */ +}; + +/* + * This is the structure used to lay out an sxd log item in the + * log. It records only the id of the sxi that it completes; + * it carries no extent array. + */ +struct xfs_sxd_log_format { + uint16_t sxd_type; /* sxd log item type */ + uint16_t sxd_size; /* size of this item */ + uint32_t __pad; + uint64_t sxd_sxi_id; /* id of corresponding sxi */ +}; + /* * Dquot Log format definitions. 
* diff --git a/fs/xfs/libxfs/xfs_log_recover.h b/fs/xfs/libxfs/xfs_log_recover.h index b36ccaa5465b..c9cd6775f50c 100644 --- a/fs/xfs/libxfs/xfs_log_recover.h +++ b/fs/xfs/libxfs/xfs_log_recover.h @@ -169,6 +169,7 @@ extern const struct xlog_recover_intent_type xlog_recover_extfree_type; extern const struct xlog_recover_intent_type xlog_recover_rmap_type; extern const struct xlog_recover_intent_type xlog_recover_refcount_type; extern const struct xlog_recover_intent_type xlog_recover_bmap_type; +extern const struct xlog_recover_intent_type xlog_recover_swapext_type; typedef bool (*xlog_recover_release_intent_fn)(struct xlog *log, struct xfs_log_item *item, uint64_t intent_id); diff --git a/fs/xfs/xfs_log.c b/fs/xfs/xfs_log.c index 00fda2e8e738..f589157059d2 100644 --- a/fs/xfs/xfs_log.c +++ b/fs/xfs/xfs_log.c @@ -1975,6 +1975,8 @@ xlog_print_tic_res( REG_TYPE_STR(CUD_FORMAT, "cud_format"), REG_TYPE_STR(BUI_FORMAT, "bui_format"), REG_TYPE_STR(BUD_FORMAT, "bud_format"), + REG_TYPE_STR(SXI_FORMAT, "sxi_format"), + REG_TYPE_STR(SXD_FORMAT, "sxd_format"), }; BUILD_BUG_ON(ARRAY_SIZE(res_type_str) != XLOG_REG_TYPE_MAX + 1); #undef REG_TYPE_STR diff --git a/fs/xfs/xfs_log_recover.c b/fs/xfs/xfs_log_recover.c index 1c836dcf3e3e..4f990a45291b 100644 --- a/fs/xfs/xfs_log_recover.c +++ b/fs/xfs/xfs_log_recover.c @@ -1851,6 +1851,9 @@ xlog_intent_for_type( case XFS_LI_BUI: case XFS_LI_BUD: return &xlog_recover_bmap_type; + case XFS_LI_SXI: + case XFS_LI_SXD: + return &xlog_recover_swapext_type; default: return NULL; } @@ -1865,6 +1868,7 @@ xlog_is_intent_done_item( case XFS_LI_RUD: case XFS_LI_CUD: case XFS_LI_BUD: + case XFS_LI_SXD: return true; default: return false; @@ -1917,6 +1921,8 @@ xlog_item_for_type( case XFS_LI_CUD: case XFS_LI_BUI: case XFS_LI_BUD: + case XFS_LI_SXI: + case XFS_LI_SXD: return &xlog_intent_item_type; case XFS_LI_INODE: return &xlog_inode_item_type; diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c index 42d82c9d2a1d..206db91d113f 100644 --- 
a/fs/xfs/xfs_super.c +++ b/fs/xfs/xfs_super.c @@ -35,6 +35,7 @@ #include "xfs_refcount_item.h" #include "xfs_bmap_item.h" #include "xfs_reflink.h" +#include "xfs_swapext_item.h" #include #include @@ -2075,8 +2076,24 @@ xfs_init_zones(void) if (!xfs_bui_zone) goto out_destroy_bud_zone; + xfs_sxd_zone = kmem_cache_create("xfs_sxd_item", + sizeof(struct xfs_sxd_log_item), + 0, 0, NULL); + if (!xfs_sxd_zone) + goto out_destroy_bui_zone; + + xfs_sxi_zone = kmem_cache_create("xfs_sxi_item", + sizeof(struct xfs_sxi_log_item), + 0, 0, NULL); + if (!xfs_sxi_zone) + goto out_destroy_sxd_zone; + return 0; + out_destroy_sxd_zone: + kmem_cache_destroy(xfs_sxd_zone); + out_destroy_bui_zone: + kmem_cache_destroy(xfs_bui_zone); out_destroy_bud_zone: kmem_cache_destroy(xfs_bud_zone); out_destroy_cui_zone: diff --git a/fs/xfs/xfs_swapext_item.c b/fs/xfs/xfs_swapext_item.c new file mode 100644 index 000000000000..63ba43e5c3bb --- /dev/null +++ b/fs/xfs/xfs_swapext_item.c @@ -0,0 +1,365 @@ +// SPDX-License-Identifier: GPL-2.0-or-later +/* + * Copyright (C) 2020 Oracle. All Rights Reserved. + * Author: Darrick J. 
Wong + */ +#include "xfs.h" +#include "xfs_fs.h" +#include "xfs_format.h" +#include "xfs_log_format.h" +#include "xfs_trans_resv.h" +#include "xfs_bit.h" +#include "xfs_shared.h" +#include "xfs_mount.h" +#include "xfs_defer.h" +#include "xfs_inode.h" +#include "xfs_trans.h" +#include "xfs_trans_priv.h" +#include "xfs_swapext_item.h" +#include "xfs_log.h" +#include "xfs_bmap.h" +#include "xfs_icache.h" +#include "xfs_trans_space.h" +#include "xfs_error.h" +#include "xfs_log_priv.h" +#include "xfs_log_recover.h" + +kmem_zone_t *xfs_sxi_zone; +kmem_zone_t *xfs_sxd_zone; + +static inline struct xfs_sxi_log_item *SXI_ITEM(struct xfs_log_item *lip) +{ + return container_of(lip, struct xfs_sxi_log_item, sxi_item); +} + +STATIC void +xfs_sxi_item_free( + struct xfs_sxi_log_item *ilip) +{ + kmem_cache_free(xfs_sxi_zone, ilip); +} + +/* + * Freeing the SXI requires that we remove it from the AIL if it has already + * been placed there. However, the SXI may not yet have been placed in the AIL + * when called by xfs_sxi_release() from SXD processing due to the ordering of + * committed vs unpin operations in bulk insert operations. Hence the reference + * count to ensure only the last caller frees the SXI. + */ +STATIC void +xfs_sxi_release( + struct xfs_sxi_log_item *ilip) +{ + ASSERT(atomic_read(&ilip->sxi_refcount) > 0); + if (atomic_dec_and_test(&ilip->sxi_refcount)) { + xfs_trans_ail_remove(&ilip->sxi_item, SHUTDOWN_LOG_IO_ERROR); + xfs_sxi_item_free(ilip); + } +} + + +STATIC void +xfs_sxi_item_size( + struct xfs_log_item *lip, + int *nvecs, + int *nbytes) +{ + *nvecs += 1; + *nbytes += sizeof(struct xfs_sxi_log_format); +} + +/* + * This is called to fill in the vector of log iovecs for the + * given sxi log item. We use only 1 iovec, and we point that + * at the sxi_log_format structure embedded in the sxi item. + * It is at this point that we assert that all of the extent + * slots in the sxi item have been filled. 
+ */ +STATIC void +xfs_sxi_item_format( + struct xfs_log_item *lip, + struct xfs_log_vec *lv) +{ + struct xfs_sxi_log_item *ilip = SXI_ITEM(lip); + struct xfs_log_iovec *vecp = NULL; + + ilip->sxi_format.sxi_type = XFS_LI_SXI; + ilip->sxi_format.sxi_size = 1; + + xlog_copy_iovec(lv, &vecp, XLOG_REG_TYPE_SXI_FORMAT, &ilip->sxi_format, + sizeof(struct xfs_sxi_log_format)); +} + +/* + * The unpin operation is the last place an SXI is manipulated in the log. It is + * either inserted in the AIL or aborted in the event of a log I/O error. In + * either case, the SXI transaction has been successfully committed to make it + * this far. Therefore, we expect whoever committed the SXI to either construct + * and commit the SXD or drop the SXD's reference in the event of error. Simply + * drop the log's SXI reference now that the log is done with it. + */ +STATIC void +xfs_sxi_item_unpin( + struct xfs_log_item *lip, + int remove) +{ + struct xfs_sxi_log_item *ilip = SXI_ITEM(lip); + + xfs_sxi_release(ilip); +} + +/* + * The SXI has been either committed or aborted if the transaction has been + * cancelled. If the transaction was cancelled, an SXD isn't going to be + * constructed and thus we free the SXI here directly. + */ +STATIC void +xfs_sxi_item_release( + struct xfs_log_item *lip) +{ + xfs_sxi_release(SXI_ITEM(lip)); +} + +static const struct xfs_item_ops xfs_sxi_item_ops = { + .iop_size = xfs_sxi_item_size, + .iop_format = xfs_sxi_item_format, + .iop_unpin = xfs_sxi_item_unpin, + .iop_release = xfs_sxi_item_release, +}; + +/* + * Allocate and initialize an sxi item. 
+ */ +STATIC struct xfs_sxi_log_item * +xfs_sxi_init( + struct xfs_mount *mp) + +{ + struct xfs_sxi_log_item *ilip; + + ilip = kmem_zone_zalloc(xfs_sxi_zone, 0); + + xfs_log_item_init(mp, &ilip->sxi_item, XFS_LI_SXI, &xfs_sxi_item_ops); + ilip->sxi_format.sxi_id = (uintptr_t)(void *)ilip; + atomic_set(&ilip->sxi_refcount, 2); + + return ilip; +} + +static inline struct xfs_sxd_log_item *SXD_ITEM(struct xfs_log_item *lip) +{ + return container_of(lip, struct xfs_sxd_log_item, sxd_item); +} + +STATIC void +xfs_sxd_item_size( + struct xfs_log_item *lip, + int *nvecs, + int *nbytes) +{ + *nvecs += 1; + *nbytes += sizeof(struct xfs_sxd_log_format); +} + +/* + * This is called to fill in the vector of log iovecs for the + * given sxd log item. We use only 1 iovec, and we point that + * at the sxd_log_format structure embedded in the sxd item. + * It is at this point that we assert that all of the extent + * slots in the sxd item have been filled. + */ +STATIC void +xfs_sxd_item_format( + struct xfs_log_item *lip, + struct xfs_log_vec *lv) +{ + struct xfs_sxd_log_item *dlip = SXD_ITEM(lip); + struct xfs_log_iovec *vecp = NULL; + + dlip->sxd_format.sxd_type = XFS_LI_SXD; + dlip->sxd_format.sxd_size = 1; + + xlog_copy_iovec(lv, &vecp, XLOG_REG_TYPE_SXD_FORMAT, &dlip->sxd_format, + sizeof(struct xfs_sxd_log_format)); +} + +/* + * The SXD is either committed or aborted if the transaction is cancelled. If + * the transaction is cancelled, drop our reference to the SXI and free the + * SXD. 
+ */ +STATIC void +xfs_sxd_item_release( + struct xfs_log_item *lip) +{ + struct xfs_sxd_log_item *dlip = SXD_ITEM(lip); + + xfs_sxi_release(dlip->sxd_intent_log_item); + kmem_cache_free(xfs_sxd_zone, dlip); +} + +static const struct xfs_item_ops xfs_sxd_item_ops = { + .flags = XFS_ITEM_RELEASE_WHEN_COMMITTED, + .iop_size = xfs_sxd_item_size, + .iop_format = xfs_sxd_item_format, + .iop_release = xfs_sxd_item_release, +}; + +/* + * Process a swapext update intent item that was recovered from the log. + * We need to update some inode's bmbt. + */ +STATIC int +xfs_sxi_recover( + struct xfs_mount *mp, + struct xfs_defer_freezer **dffp, + struct xfs_sxi_log_item *ilip) +{ + return -EFSCORRUPTED; +} + +/* + * Copy an SXI format buffer from the given buf, and into the destination + * SXI format structure. The SXI/SXD items were designed not to need any + * special alignment handling. + */ +static int +xfs_sxi_copy_format( + struct xfs_log_iovec *buf, + struct xfs_sxi_log_format *dst_sxi_fmt) +{ + struct xfs_sxi_log_format *src_sxi_fmt; + size_t len; + + src_sxi_fmt = buf->i_addr; + len = sizeof(struct xfs_sxi_log_format); + + if (buf->i_len == len) { + memcpy(dst_sxi_fmt, src_sxi_fmt, len); + return 0; + } + XFS_ERROR_REPORT(__func__, XFS_ERRLEVEL_LOW, NULL); + return -EFSCORRUPTED; +} + +/* + * This routine is called to create an in-core extent swapext update + * item from the sxi format structure which was logged on disk. + * It allocates an in-core sxi, copies the extents from the format + * structure into it, and adds the sxi to the AIL with the given + * LSN. 
+ */ +STATIC int +xlog_recover_sxi( + struct xlog *log, + struct xlog_recover_item *item, + xfs_lsn_t lsn) +{ + int error; + struct xfs_mount *mp = log->l_mp; + struct xfs_sxi_log_item *ilip; + struct xfs_sxi_log_format *sxi_formatp; + + sxi_formatp = item->ri_buf[0].i_addr; + + if (sxi_formatp->__pad != 0) { + XFS_ERROR_REPORT(__func__, XFS_ERRLEVEL_LOW, log->l_mp); + return -EFSCORRUPTED; + } + ilip = xfs_sxi_init(mp); + error = xfs_sxi_copy_format(&item->ri_buf[0], &ilip->sxi_format); + if (error) { + xfs_sxi_item_free(ilip); + return error; + } + xlog_recover_insert_ail(log, &ilip->sxi_item, lsn); + xfs_sxi_release(ilip); + return 0; +} + +STATIC bool +xlog_release_sxi( + struct xlog *log, + struct xfs_log_item *lip, + uint64_t intent_id) +{ + struct xfs_sxi_log_item *ilip = SXI_ITEM(lip); + struct xfs_ail *ailp = log->l_ailp; + + if (ilip->sxi_format.sxi_id == intent_id) { + /* + * Drop the SXD reference to the SXI. This + * removes the SXI from the AIL and frees it. + */ + spin_unlock(&ailp->ail_lock); + xfs_sxi_release(ilip); + spin_lock(&ailp->ail_lock); + return true; + } + + return false; +} + +/* + * This routine is called when an SXD format structure is found in a committed + * transaction in the log. Its purpose is to cancel the corresponding SXI if it + * was still in the log. To do this it searches the AIL for the SXI with an id + * equal to that in the SXD format structure. If we find it we drop the SXD + * reference, which removes the SXI from the AIL and frees it. + */ +STATIC int +xlog_recover_sxd( + struct xlog *log, + struct xlog_recover_item *item) +{ + struct xfs_sxd_log_format *sxd_formatp; + + sxd_formatp = item->ri_buf[0].i_addr; + if (item->ri_buf[0].i_len != sizeof(struct xfs_sxd_log_format)) { + XFS_ERROR_REPORT(__func__, XFS_ERRLEVEL_LOW, log->l_mp); + return -EFSCORRUPTED; + } + + xlog_recover_release_intent(log, XFS_LI_SXI, sxd_formatp->sxd_sxi_id, + xlog_release_sxi); + return 0; +} + +/* Recover the SXI if necessary. 
 */ +STATIC int +xlog_recover_process_sxi( + struct xlog *log, + struct xfs_defer_freezer **dffp, + struct xfs_log_item *lip) +{ + struct xfs_ail *ailp = log->l_ailp; + struct xfs_sxi_log_item *ilip = SXI_ITEM(lip); + int error; + + /* + * Skip SXIs that we've already processed. + */ + if (test_bit(XFS_SXI_RECOVERED, &ilip->sxi_flags)) + return 0; + + spin_unlock(&ailp->ail_lock); + error = xfs_sxi_recover(log->l_mp, dffp, ilip); + spin_lock(&ailp->ail_lock); + + return error; +} + +/* Release the SXI since we're cancelling everything. */ +STATIC void +xlog_recover_cancel_sxi( + struct xfs_log_item *lip) +{ + xfs_sxi_release(SXI_ITEM(lip)); +} + +const struct xlog_recover_intent_type xlog_recover_swapext_type = { + .recover_intent = xlog_recover_sxi, + .recover_done = xlog_recover_sxd, + .process_intent = xlog_recover_process_sxi, + .cancel_intent = xlog_recover_cancel_sxi, +}; diff --git a/fs/xfs/xfs_swapext_item.h b/fs/xfs/xfs_swapext_item.h new file mode 100644 index 000000000000..63e2c15d117d --- /dev/null +++ b/fs/xfs/xfs_swapext_item.h @@ -0,0 +1,67 @@ +// SPDX-License-Identifier: GPL-2.0-or-later +/* + * Copyright (C) 2020 Oracle. All Rights Reserved. + * Author: Darrick J. Wong + */ +#ifndef __XFS_SWAPEXT_ITEM_H__ +#define __XFS_SWAPEXT_ITEM_H__ + +/* + * The extent swapping intent item helps us perform atomic extent swaps between + * two inode forks. It does this by tracking the range of logical offsets that + * still need to be swapped, and relogs as progress happens. + * + * *I items should be recorded in the *first* of a series of rolled + * transactions, and the *D items should be recorded in the same transaction + * that records the associated bmbt updates. + * + * Should the system crash after the commit of the first transaction but + * before the commit of the final transaction in a series, log recovery will + * use the redo information recorded by the intent items to replay the + * rest of the extent swaps. 
+ */ + +/* kernel only SXI/SXD definitions */ + +struct xfs_mount; +struct kmem_zone; + +/* + * Max number of extents in fast allocation path. + */ +#define XFS_SXI_MAX_FAST_EXTENTS 1 + +/* + * Define SXI flag bits. Manipulated by set/clear/test_bit operators. + */ +#define XFS_SXI_RECOVERED 1 + +/* + * This is the "swapext update intent" log item. It is used to log the fact + * that we are swapping extents between two files. It is used in conjunction + * with the "swapext update done" log item described below. + * + * These log items follow the same rules as struct xfs_efi_log_item; see the + * comments about that structure (in xfs_extfree_item.h) for more details. + */ +struct xfs_sxi_log_item { + struct xfs_log_item sxi_item; + atomic_t sxi_refcount; + unsigned long sxi_flags; + struct xfs_sxi_log_format sxi_format; +}; + +/* + * This is the "swapext update done" log item. It is used to log the fact that + * some extent swapping mentioned in an earlier sxi item has been performed. + */ +struct xfs_sxd_log_item { + struct xfs_log_item sxd_item; + struct xfs_sxi_log_item *sxd_intent_log_item; + struct xfs_sxd_log_format sxd_format; +}; + +extern struct kmem_zone *xfs_sxi_zone; +extern struct kmem_zone *xfs_sxd_zone; + +#endif /* __XFS_SWAPEXT_ITEM_H__ */ From patchwork Wed Apr 29 02:45:13 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Darrick J. 
Wong" X-Patchwork-Id: 11515911 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 8D42C17EF for ; Wed, 29 Apr 2020 02:47:26 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 6CB4920775 for ; Wed, 29 Apr 2020 02:47:26 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=oracle.com header.i=@oracle.com header.b="UqFQdsYO" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726567AbgD2CrZ (ORCPT ); Tue, 28 Apr 2020 22:47:25 -0400 Received: from userp2130.oracle.com ([156.151.31.86]:52950 "EHLO userp2130.oracle.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726522AbgD2CrZ (ORCPT ); Tue, 28 Apr 2020 22:47:25 -0400 Received: from pps.filterd (userp2130.oracle.com [127.0.0.1]) by userp2130.oracle.com (8.16.0.42/8.16.0.42) with SMTP id 03T2lKcC159037; Wed, 29 Apr 2020 02:47:20 GMT DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=oracle.com; h=subject : from : to : cc : date : message-id : in-reply-to : references : mime-version : content-type : content-transfer-encoding; s=corp-2020-01-29; bh=YbxyQRYNMLnUlXv61jZPEj1CmrGgoU6OledFJg0xMZg=; b=UqFQdsYOCFsv+RWGyERFJAinyEA8t583dyM6SoYou2f5okzCP08nfA/p2H023Z0xNlx5 fkjJPu1fIPpSt1h3QUZ4nktQOSBFeXQTUb5cLjAWSvLuRUpeBtw9/OdELO2kv7Fw5m61 zvljL+i20EEHJoO9Jm8qeKHMs/Y0LvBO53asqZJwdOE3xbYSG8e2drT/jwf+7mg0OJAq HmzgniThtaEbVuv8QwlW9frR8778ERBc9GVO1RGqjiWQMy9CsxccAIHjXYQjNwN1yQaa s6MyURYdUtISnJIiMJKLabvUDR6ZJ3g1IarBoSY1Uy6RRl1c8EK7UBTFXgQu8v1TssqS tQ== Received: from userp3030.oracle.com (userp3030.oracle.com [156.151.31.80]) by userp2130.oracle.com with ESMTP id 30p01nstpf-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Wed, 29 Apr 2020 02:47:20 +0000 Received: from pps.filterd (userp3030.oracle.com [127.0.0.1]) by userp3030.oracle.com 
(8.16.0.42/8.16.0.42) with SMTP id 03T2gi1S071666; Wed, 29 Apr 2020 02:45:16 GMT Received: from aserv0121.oracle.com (aserv0121.oracle.com [141.146.126.235]) by userp3030.oracle.com with ESMTP id 30mxphp33g-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Wed, 29 Apr 2020 02:45:15 +0000 Received: from abhmp0014.oracle.com (abhmp0014.oracle.com [141.146.116.20]) by aserv0121.oracle.com (8.14.4/8.13.8) with ESMTP id 03T2jE42015941; Wed, 29 Apr 2020 02:45:14 GMT Received: from localhost (/67.169.218.210) by default (Oracle Beehive Gateway v4.0) with ESMTP ; Tue, 28 Apr 2020 19:45:14 -0700 Subject: [PATCH 09/18] xfs: create deferred log items for extent swapping From: "Darrick J. Wong" To: darrick.wong@oracle.com Cc: linux-xfs@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-api@vger.kernel.org Date: Tue, 28 Apr 2020 19:45:13 -0700 Message-ID: <158812831335.168506.4177678044971007213.stgit@magnolia> In-Reply-To: <158812825316.168506.932540609191384366.stgit@magnolia> References: <158812825316.168506.932540609191384366.stgit@magnolia> User-Agent: StGit/0.17.1-dirty MIME-Version: 1.0 X-Proofpoint-Virus-Version: vendor=nai engine=6000 definitions=9605 signatures=668686 X-Proofpoint-Spam-Details: rule=notspam policy=default score=0 mlxlogscore=999 malwarescore=0 mlxscore=0 bulkscore=0 adultscore=0 phishscore=0 suspectscore=3 spamscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2003020000 definitions=main-2004290020 X-Proofpoint-Virus-Version: vendor=nai engine=6000 definitions=9605 signatures=668686 X-Proofpoint-Spam-Details: rule=notspam policy=default score=0 bulkscore=0 spamscore=0 clxscore=1015 phishscore=0 mlxlogscore=999 adultscore=0 priorityscore=1501 mlxscore=0 suspectscore=3 malwarescore=0 lowpriorityscore=0 impostorscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2003020000 definitions=main-2004290020 Sender: linux-fsdevel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: 
linux-fsdevel@vger.kernel.org From: Darrick J. Wong Now that we've created the skeleton of a log intent item to track and restart extent swap operations, add the upper level logic to commit intent items and turn them into concrete work recorded in the log. We use the deferred item "multihop" feature that was introduced a few patches ago to constrain the number of active swap operations to one per thread. Signed-off-by: Darrick J. Wong --- fs/xfs/Makefile | 1 fs/xfs/libxfs/xfs_bmap.h | 13 + fs/xfs/libxfs/xfs_defer.c | 1 fs/xfs/libxfs/xfs_defer.h | 2 fs/xfs/libxfs/xfs_swapext.c | 430 +++++++++++++++++++++++++++++++++++++++++++ fs/xfs/libxfs/xfs_swapext.h | 57 ++++++ fs/xfs/xfs_swapext_item.c | 336 ++++++++++++++++++++++++++++++++++ fs/xfs/xfs_trace.c | 1 fs/xfs/xfs_trace.h | 49 +++++ 9 files changed, 885 insertions(+), 5 deletions(-) create mode 100644 fs/xfs/libxfs/xfs_swapext.c create mode 100644 fs/xfs/libxfs/xfs_swapext.h diff --git a/fs/xfs/Makefile b/fs/xfs/Makefile index 27b4bd5c8ffe..6f8d8f2f8a8c 100644 --- a/fs/xfs/Makefile +++ b/fs/xfs/Makefile @@ -51,6 +51,7 @@ xfs-y += $(addprefix libxfs/, \ xfs_refcount.o \ xfs_refcount_btree.o \ xfs_sb.o \ + xfs_swapext.o \ xfs_symlink_remote.o \ xfs_trans_inode.o \ xfs_trans_resv.o \ diff --git a/fs/xfs/libxfs/xfs_bmap.h b/fs/xfs/libxfs/xfs_bmap.h index 3367df499ac8..215ce1b8c736 100644 --- a/fs/xfs/libxfs/xfs_bmap.h +++ b/fs/xfs/libxfs/xfs_bmap.h @@ -158,6 +158,13 @@ static inline int xfs_bmapi_whichfork(int bmapi_flags) { BMAP_ATTRFORK, "ATTR" }, \ { BMAP_COWFORK, "COW" } +/* Return true if the extent is an allocated extent, written or not. 
*/ +static inline bool xfs_bmap_is_mapped_extent(struct xfs_bmbt_irec *irec) +{ + return irec->br_startblock != HOLESTARTBLOCK && + irec->br_startblock != DELAYSTARTBLOCK && + !isnullstartblock(irec->br_startblock); +} /* * Return true if the extent is a real, allocated extent, or false if it is a @@ -165,10 +172,8 @@ static inline int xfs_bmapi_whichfork(int bmapi_flags) */ static inline bool xfs_bmap_is_real_extent(struct xfs_bmbt_irec *irec) { - return irec->br_state != XFS_EXT_UNWRITTEN && - irec->br_startblock != HOLESTARTBLOCK && - irec->br_startblock != DELAYSTARTBLOCK && - !isnullstartblock(irec->br_startblock); + return xfs_bmap_is_mapped_extent(irec) && + irec->br_state != XFS_EXT_UNWRITTEN; } /* diff --git a/fs/xfs/libxfs/xfs_defer.c b/fs/xfs/libxfs/xfs_defer.c index f53e3ce858eb..00bd0e478829 100644 --- a/fs/xfs/libxfs/xfs_defer.c +++ b/fs/xfs/libxfs/xfs_defer.c @@ -184,6 +184,7 @@ static const struct xfs_defer_op_type *defer_op_types[] = { [XFS_DEFER_OPS_TYPE_RMAP] = &xfs_rmap_update_defer_type, [XFS_DEFER_OPS_TYPE_FREE] = &xfs_extent_free_defer_type, [XFS_DEFER_OPS_TYPE_AGFL_FREE] = &xfs_agfl_free_defer_type, + [XFS_DEFER_OPS_TYPE_SWAPEXT] = &xfs_swapext_defer_type, }; /* diff --git a/fs/xfs/libxfs/xfs_defer.h b/fs/xfs/libxfs/xfs_defer.h index e64b577a9b95..226db6e5a1b0 100644 --- a/fs/xfs/libxfs/xfs_defer.h +++ b/fs/xfs/libxfs/xfs_defer.h @@ -18,6 +18,7 @@ enum xfs_defer_ops_type { XFS_DEFER_OPS_TYPE_RMAP, XFS_DEFER_OPS_TYPE_FREE, XFS_DEFER_OPS_TYPE_AGFL_FREE, + XFS_DEFER_OPS_TYPE_SWAPEXT, XFS_DEFER_OPS_TYPE_MAX, }; @@ -65,6 +66,7 @@ extern const struct xfs_defer_op_type xfs_refcount_update_defer_type; extern const struct xfs_defer_op_type xfs_rmap_update_defer_type; extern const struct xfs_defer_op_type xfs_extent_free_defer_type; extern const struct xfs_defer_op_type xfs_agfl_free_defer_type; +extern const struct xfs_defer_op_type xfs_swapext_defer_type; /* * Deferred operation freezer. 
This structure enables a dfops user to detach diff --git a/fs/xfs/libxfs/xfs_swapext.c b/fs/xfs/libxfs/xfs_swapext.c new file mode 100644 index 000000000000..2eff48453070 --- /dev/null +++ b/fs/xfs/libxfs/xfs_swapext.c @@ -0,0 +1,430 @@ +// SPDX-License-Identifier: GPL-2.0-or-later +/* + * Copyright (C) 2020 Oracle. All Rights Reserved. + * Author: Darrick J. Wong + */ +#include "xfs.h" +#include "xfs_fs.h" +#include "xfs_shared.h" +#include "xfs_format.h" +#include "xfs_log_format.h" +#include "xfs_trans_resv.h" +#include "xfs_mount.h" +#include "xfs_defer.h" +#include "xfs_inode.h" +#include "xfs_trans.h" +#include "xfs_bmap.h" +#include "xfs_icache.h" +#include "xfs_quota.h" +#include "xfs_swapext.h" +#include "xfs_trace.h" + +/* Information to help us reset reflink flag / CoW fork state after a swap. */ + +/* Are we swapping the data fork? */ +#define XFS_SX_REFLINK_DATAFORK (1U << 0) + +/* Can we swap the flags? */ +#define XFS_SX_REFLINK_SWAPFLAGS (1U << 1) + +/* Previous state of the two inodes' reflink flags. */ +#define XFS_SX_REFLINK_IP1_REFLINK (1U << 2) +#define XFS_SX_REFLINK_IP2_REFLINK (1U << 3) + + +/* + * Prepare both inodes' reflink state for an extent swap, and return our + * findings so that xfs_swapext_reflink_finish can deal with the aftermath. + */ +unsigned int +xfs_swapext_reflink_prep( + struct xfs_inode *ip1, + struct xfs_inode *ip2, + int whichfork, + xfs_fileoff_t startoff1, + xfs_fileoff_t startoff2, + xfs_filblks_t blockcount) +{ + struct xfs_mount *mp = ip1->i_mount; + unsigned int rs = 0; + + if (whichfork != XFS_DATA_FORK) + return 0; + + /* + * If either file has shared blocks and we're swapping data forks, we + * must flag the other file as having shared blocks so that we get the + * shared-block rmap functions if we need to fix up the rmaps. The + * flags will be switched for real by xfs_swapext_reflink_finish. 
+ */ + if (xfs_is_reflink_inode(ip1)) + rs |= XFS_SX_REFLINK_IP1_REFLINK; + if (xfs_is_reflink_inode(ip2)) + rs |= XFS_SX_REFLINK_IP2_REFLINK; + + if (rs & XFS_SX_REFLINK_IP1_REFLINK) + ip2->i_d.di_flags2 |= XFS_DIFLAG2_REFLINK; + if (rs & XFS_SX_REFLINK_IP2_REFLINK) + ip1->i_d.di_flags2 |= XFS_DIFLAG2_REFLINK; + + /* + * If either file had the reflink flag set before; and the two files' + * reflink state was different; and we're swapping the entirety of both + * files, then we can exchange the reflink flags at the end. + * Otherwise, we propagate the reflink flag from either file to the + * other file. + * + * Note that we've only set the _REFLINK flags of the reflink state, so + * we can cheat and use hweight32 for the reflink flag test. + * + */ + if (hweight32(rs) == 1 && startoff1 == 0 && startoff2 == 0 && + blockcount == XFS_B_TO_FSB(mp, ip1->i_d.di_size) && + blockcount == XFS_B_TO_FSB(mp, ip2->i_d.di_size)) + rs |= XFS_SX_REFLINK_SWAPFLAGS; + + rs |= XFS_SX_REFLINK_DATAFORK; + return rs; +} + +/* + * If the reflink flag is set on either inode, make sure it has an incore CoW + * fork, since all reflink inodes must have them. If there's a CoW fork and it + * has extents in it, make sure the inodes are tagged appropriately so that + * speculative preallocations can be GC'd if we run low on space. + */ +static inline void +xfs_swapext_ensure_cowfork( + struct xfs_inode *ip) +{ + struct xfs_ifork *cfork; + + if (xfs_is_reflink_inode(ip)) + xfs_ifork_init_cow(ip); + + cfork = XFS_IFORK_PTR(ip, XFS_COW_FORK); + if (!cfork) + return; + if (cfork->if_bytes > 0) + xfs_inode_set_cowblocks_tag(ip); + else + xfs_inode_clear_cowblocks_tag(ip); +} + +/* + * Set both inodes' ondisk reflink flags to their final state and ensure that + * the incore state is ready to go. 
+ */ +void +xfs_swapext_reflink_finish( + struct xfs_trans *tp, + struct xfs_inode *ip1, + struct xfs_inode *ip2, + unsigned int rs) +{ + if (!(rs & XFS_SX_REFLINK_DATAFORK)) + return; + + if (rs & XFS_SX_REFLINK_SWAPFLAGS) { + /* Exchange the reflink inode flags and log them. */ + ip1->i_d.di_flags2 &= ~XFS_DIFLAG2_REFLINK; + if (rs & XFS_SX_REFLINK_IP2_REFLINK) + ip1->i_d.di_flags2 |= XFS_DIFLAG2_REFLINK; + + ip2->i_d.di_flags2 &= ~XFS_DIFLAG2_REFLINK; + if (rs & XFS_SX_REFLINK_IP1_REFLINK) + ip2->i_d.di_flags2 |= XFS_DIFLAG2_REFLINK; + + xfs_trans_log_inode(tp, ip1, XFS_ILOG_CORE); + xfs_trans_log_inode(tp, ip2, XFS_ILOG_CORE); + } + + xfs_swapext_ensure_cowfork(ip1); + xfs_swapext_ensure_cowfork(ip2); +} + +/* Schedule an atomic extent swap. */ +static inline void +xfs_swapext_schedule( + struct xfs_trans *tp, + struct xfs_swapext_intent *sxi) +{ + trace_xfs_swapext_defer(tp->t_mountp, sxi); + xfs_defer_add(tp, XFS_DEFER_OPS_TYPE_SWAPEXT, &sxi->si_list); +} + +/* Reschedule an atomic extent swap on behalf of log recovery. */ +void +xfs_swapext_reschedule( + struct xfs_trans *tp, + const struct xfs_swapext_intent *sxi) +{ + struct xfs_swapext_intent *new_sxi; + + new_sxi = kmem_alloc(sizeof(struct xfs_swapext_intent), KM_NOFS); + memcpy(new_sxi, sxi, sizeof(*new_sxi)); + INIT_LIST_HEAD(&new_sxi->si_list); + + xfs_swapext_schedule(tp, new_sxi); +} + +/* + * Adjust the on-disk inode size upwards if needed so that we never map extents + * into the file past EOF. This is crucial so that log recovery won't get + * confused by the sudden appearance of post-eof extents. 
+ */ +STATIC void +xfs_swapext_update_size( + struct xfs_trans *tp, + struct xfs_inode *ip, + struct xfs_bmbt_irec *imap, + xfs_fsize_t new_isize) +{ + struct xfs_mount *mp = tp->t_mountp; + xfs_fsize_t len; + + if (new_isize < 0) + return; + + len = min(XFS_FSB_TO_B(mp, imap->br_startoff + imap->br_blockcount), + new_isize); + + if (len <= ip->i_d.di_size) + return; + + trace_xfs_swapext_update_inode_size(ip, len); + + ip->i_d.di_size = len; + xfs_trans_log_inode(tp, ip, XFS_ILOG_CORE); +} + +/* Do we have more work to do to finish this operation? */ +bool +xfs_swapext_has_more_work( + struct xfs_swapext_intent *sxi) +{ + return sxi->si_blockcount > 0; +} + +/* Finish one extent swap, possibly log more. */ +int +xfs_swapext_finish_one( + struct xfs_trans *tp, + struct xfs_swapext_intent *sxi) +{ + struct xfs_bmbt_irec irec1, irec2; + int whichfork; + int nimaps; + int bmap_flags; + int error; + + whichfork = (sxi->si_flags & XFS_SWAP_EXTENT_ATTR_FORK) ? + XFS_ATTR_FORK : XFS_DATA_FORK; + bmap_flags = xfs_bmapi_aflag(whichfork); + + while (sxi->si_blockcount > 0) { + int64_t ip1_delta = 0, ip2_delta = 0; + + /* Read extent from the first file */ + nimaps = 1; + error = xfs_bmapi_read(sxi->si_ip1, sxi->si_startoff1, + sxi->si_blockcount, &irec1, &nimaps, + bmap_flags); + if (error) + return error; + if (nimaps != 1 || + irec1.br_startblock == DELAYSTARTBLOCK || + irec1.br_startoff != sxi->si_startoff1) { + /* + * We should never get no mapping or a delalloc extent + * or something that doesn't match what we asked for, + * since the caller flushed both inodes and we hold the + * ILOCKs for both inodes. 
+ */ + ASSERT(0); + return -EINVAL; + } + + /* Read extent from the second file */ + nimaps = 1; + error = xfs_bmapi_read(sxi->si_ip2, sxi->si_startoff2, + irec1.br_blockcount, &irec2, &nimaps, + bmap_flags); + if (error) + return error; + if (nimaps != 1 || + irec2.br_startblock == DELAYSTARTBLOCK || + irec2.br_startoff != sxi->si_startoff2) { + /* + * We should never get no mapping or a delalloc extent + * or something that doesn't match what we asked for, + * since the caller flushed both inodes and we hold the + * ILOCKs for both inodes. + */ + ASSERT(0); + return -EINVAL; + } + + /* + * We can only swap as many blocks as the smaller of the two + * extent maps. + */ + irec1.br_blockcount = min(irec1.br_blockcount, + irec2.br_blockcount); + + trace_xfs_swapext_extent1(sxi->si_ip1, &irec1); + trace_xfs_swapext_extent2(sxi->si_ip2, &irec2); + + /* + * Two extents mapped to the same physical block must not have + * different states; that's filesystem corruption. Move on to + * the next extent if they're both holes or both the same + * physical extent. + */ + if (irec1.br_startblock == irec2.br_startblock) { + if (irec1.br_state != irec2.br_state) + return -EFSCORRUPTED; + + sxi->si_startoff1 += irec1.br_blockcount; + sxi->si_startoff2 += irec1.br_blockcount; + sxi->si_blockcount -= irec1.br_blockcount; + continue; + } + + /* Update quota accounting. */ + if (xfs_bmap_is_mapped_extent(&irec1)) { + ip1_delta -= irec1.br_blockcount; + ip2_delta += irec1.br_blockcount; + } + if (xfs_bmap_is_mapped_extent(&irec2)) { + ip1_delta += irec2.br_blockcount; + ip2_delta -= irec2.br_blockcount; + } + + if (ip1_delta) + xfs_trans_mod_dquot_byino(tp, sxi->si_ip1, + XFS_TRANS_DQ_BCOUNT, ip1_delta); + if (ip2_delta) + xfs_trans_mod_dquot_byino(tp, sxi->si_ip2, + XFS_TRANS_DQ_BCOUNT, ip2_delta); + + /* Remove both mappings. */ + xfs_bmap_unmap_extent(tp, sxi->si_ip1, whichfork, &irec1); + xfs_bmap_unmap_extent(tp, sxi->si_ip2, whichfork, &irec2); + + /* + * Re-add both mappings. 
We swap the file offsets between the + * two maps and add the opposite map, which has the effect of + * filling the logical offsets we just unmapped, but with + * the physical mapping information swapped. + */ + swap(irec1.br_startoff, irec2.br_startoff); + xfs_bmap_map_extent(tp, sxi->si_ip1, whichfork, &irec2); + xfs_bmap_map_extent(tp, sxi->si_ip2, whichfork, &irec1); + + /* Make sure we're not mapping extents past EOF. */ + if (whichfork == XFS_DATA_FORK) { + xfs_swapext_update_size(tp, sxi->si_ip1, &irec2, + sxi->si_isize1); + xfs_swapext_update_size(tp, sxi->si_ip2, &irec1, + sxi->si_isize2); + } + + /* + * Advance our cursor and exit. The caller (either defer ops + * or log recovery) will log the SXD item, and if *blockcount + * is nonzero, it will log a new SXI item for the remainder + * and call us back. + */ + sxi->si_startoff1 += irec1.br_blockcount; + sxi->si_startoff2 += irec1.br_blockcount; + sxi->si_blockcount -= irec1.br_blockcount; + break; + } + + /* + * If we've reached the end of the remap operation and the caller + * wanted us to exchange the sizes, do that now. 
+ */ + if (sxi->si_blockcount == 0 && + (sxi->si_flags & XFS_SWAP_EXTENT_SET_SIZES)) { + sxi->si_ip1->i_d.di_size = sxi->si_isize1; + sxi->si_ip2->i_d.di_size = sxi->si_isize2; + xfs_trans_log_inode(tp, sxi->si_ip1, XFS_ILOG_CORE); + xfs_trans_log_inode(tp, sxi->si_ip2, XFS_ILOG_CORE); + } + + if (xfs_swapext_has_more_work(sxi)) + trace_xfs_swapext_defer(tp->t_mountp, sxi); + return 0; +} + +static void +xfs_swapext_init_intent( + struct xfs_swapext_intent *sxi, + struct xfs_inode *ip1, + struct xfs_inode *ip2, + int whichfork, + xfs_fileoff_t startoff1, + xfs_fileoff_t startoff2, + xfs_filblks_t blockcount, + unsigned int flags) +{ + INIT_LIST_HEAD(&sxi->si_list); + sxi->si_flags = 0; + if (whichfork == XFS_ATTR_FORK) + sxi->si_flags |= XFS_SWAP_EXTENT_ATTR_FORK; + sxi->si_isize1 = sxi->si_isize2 = -1; + if (whichfork == XFS_DATA_FORK && (flags & XFS_SWAPEXT_SET_SIZES)) { + sxi->si_flags |= XFS_SWAP_EXTENT_SET_SIZES; + sxi->si_isize1 = ip2->i_d.di_size; + sxi->si_isize2 = ip1->i_d.di_size; + } + sxi->si_ip1 = ip1; + sxi->si_ip2 = ip2; + sxi->si_startoff1 = startoff1; + sxi->si_startoff2 = startoff2; + sxi->si_blockcount = blockcount; +} + +/* + * Atomically swap a range of extents from one inode to another. + * + * The caller must ensure the inodes are joined to the transaction and + * ILOCKd; they will still be joined to the transaction at exit. 
+ */ +int +xfs_swapext_atomic( + struct xfs_trans **tpp, + struct xfs_inode *ip1, + struct xfs_inode *ip2, + int whichfork, + xfs_fileoff_t startoff1, + xfs_fileoff_t startoff2, + xfs_filblks_t blockcount, + unsigned int flags) +{ + struct xfs_swapext_intent *sxi; + unsigned int state; + int error; + + ASSERT(xfs_isilocked(ip1, XFS_ILOCK_EXCL)); + ASSERT(xfs_isilocked(ip2, XFS_ILOCK_EXCL)); + ASSERT(whichfork != XFS_COW_FORK); + ASSERT(whichfork == XFS_DATA_FORK || !(flags & XFS_SWAPEXT_SET_SIZES)); + + state = xfs_swapext_reflink_prep(ip1, ip2, whichfork, startoff1, + startoff2, blockcount); + + sxi = kmem_alloc(sizeof(struct xfs_swapext_intent), KM_NOFS); + xfs_swapext_init_intent(sxi, ip1, ip2, whichfork, startoff1, startoff2, + blockcount, flags); + xfs_swapext_schedule(*tpp, sxi); + + error = xfs_defer_finish(tpp); + if (error) + return error; + + xfs_swapext_reflink_finish(*tpp, ip1, ip2, state); + return 0; +} diff --git a/fs/xfs/libxfs/xfs_swapext.h b/fs/xfs/libxfs/xfs_swapext.h new file mode 100644 index 000000000000..af1893f37d39 --- /dev/null +++ b/fs/xfs/libxfs/xfs_swapext.h @@ -0,0 +1,57 @@ +// SPDX-License-Identifier: GPL-2.0-or-later +/* + * Copyright (C) 2020 Oracle. All Rights Reserved. + * Author: Darrick J. Wong + */ +#ifndef __XFS_SWAPEXT_H_ +#define __XFS_SWAPEXT_H_ 1 + +/* + * In-core information about an extent swap request between ranges of two + * inodes. + */ +struct xfs_swapext_intent { + /* List of other incore deferred work. */ + struct list_head si_list; + + /* The two inodes we're swapping. */ + union { + struct xfs_inode *si_ip1; + xfs_ino_t si_ino1; + }; + union { + struct xfs_inode *si_ip2; + xfs_ino_t si_ino2; + }; + + /* File offset range information. */ + xfs_fileoff_t si_startoff1; + xfs_fileoff_t si_startoff2; + xfs_filblks_t si_blockcount; + uint64_t si_flags; + + /* Set these file sizes after the operation, unless negative. 
*/ + xfs_fsize_t si_isize1; + xfs_fsize_t si_isize2; +}; + +bool xfs_swapext_has_more_work(struct xfs_swapext_intent *sxi); + +unsigned int xfs_swapext_reflink_prep(struct xfs_inode *ip1, + struct xfs_inode *ip2, int whichfork, xfs_fileoff_t startoff1, + xfs_fileoff_t startoff2, xfs_filblks_t blockcount); +void xfs_swapext_reflink_finish(struct xfs_trans *tp, struct xfs_inode *ip1, + struct xfs_inode *ip2, unsigned int reflink_state); + +void xfs_swapext_reschedule(struct xfs_trans *tpp, + const struct xfs_swapext_intent *sxi_state); +int xfs_swapext_finish_one(struct xfs_trans *tp, + struct xfs_swapext_intent *sxi_state); + +#define XFS_SWAPEXT_SET_SIZES (1U << 0) +int xfs_swapext_atomic(struct xfs_trans **tpp, struct xfs_inode *ip1, + struct xfs_inode *ip2, int whichfork, xfs_fileoff_t startoff1, + xfs_fileoff_t startoff2, xfs_filblks_t blockcount, + unsigned int flags); + +#endif /* __XFS_SWAPEXT_H_ */ diff --git a/fs/xfs/xfs_swapext_item.c b/fs/xfs/xfs_swapext_item.c index 63ba43e5c3bb..fadd522c6841 100644 --- a/fs/xfs/xfs_swapext_item.c +++ b/fs/xfs/xfs_swapext_item.c @@ -16,9 +16,11 @@ #include "xfs_trans.h" #include "xfs_trans_priv.h" #include "xfs_swapext_item.h" +#include "xfs_swapext.h" #include "xfs_log.h" #include "xfs_bmap.h" #include "xfs_icache.h" +#include "xfs_bmap_btree.h" #include "xfs_trans_space.h" #include "xfs_error.h" #include "xfs_log_priv.h" @@ -205,6 +207,240 @@ static const struct xfs_item_ops xfs_sxd_item_ops = { .iop_release = xfs_sxd_item_release, }; +static struct xfs_sxd_log_item * +xfs_trans_get_sxd( + struct xfs_trans *tp, + struct xfs_sxi_log_item *ilip) +{ + struct xfs_sxd_log_item *dlip; + + dlip = kmem_zone_zalloc(xfs_sxd_zone, 0); + xfs_log_item_init(tp->t_mountp, &dlip->sxd_item, XFS_LI_SXD, + &xfs_sxd_item_ops); + dlip->sxd_intent_log_item = ilip; + dlip->sxd_format.sxd_sxi_id = ilip->sxi_format.sxi_id; + + xfs_trans_add_item(tp, &dlip->sxd_item); + return dlip; +} + +/* + * Finish an swapext update and log it to the SXD. 
Note that the + * transaction is marked dirty regardless of whether the swapext update + * succeeds or fails to support the SXI/SXD lifecycle rules. + */ +static int +xfs_trans_log_finish_swapext_update( + struct xfs_trans *tp, + struct xfs_sxd_log_item *dlip, + struct xfs_swapext_intent *sxi) +{ + int error; + + error = xfs_swapext_finish_one(tp, sxi); + + /* + * Mark the transaction dirty, even on error. This ensures the + * transaction is aborted, which: + * + * 1.) releases the SXI and frees the SXD + * 2.) shuts down the filesystem + */ + tp->t_flags |= XFS_TRANS_DIRTY; + set_bit(XFS_LI_DIRTY, &dlip->sxd_item.li_flags); + + return error; +} + +/* Sort swapext intents by inode. */ +static int +xfs_swapext_diff_items( + void *priv, + struct list_head *a, + struct list_head *b) +{ + struct xfs_swapext_intent *sa; + struct xfs_swapext_intent *sb; + + sa = container_of(a, struct xfs_swapext_intent, si_list); + sb = container_of(b, struct xfs_swapext_intent, si_list); + return sa->si_ip1->i_ino - sb->si_ip2->i_ino; +} + +/* Get an SXI. */ +STATIC void * +xfs_swapext_create_intent( + struct xfs_trans *tp, + unsigned int count) +{ + struct xfs_sxi_log_item *ilip; + + ASSERT(count == XFS_SXI_MAX_FAST_EXTENTS); + ASSERT(tp != NULL); + + ilip = xfs_sxi_init(tp->t_mountp); + ASSERT(ilip != NULL); + + /* + * Get a log_item_desc to point at the new item. + */ + xfs_trans_add_item(tp, &ilip->sxi_item); + return ilip; +} + +/* Log swapext updates in the intent item. 
*/ +STATIC void +xfs_swapext_log_item( + struct xfs_trans *tp, + void *intent, + struct list_head *item) +{ + struct xfs_sxi_log_item *ilip = intent; + struct xfs_swapext_intent *sxi; + struct xfs_swap_extent *se; + + ASSERT(!test_bit(XFS_LI_DIRTY, &ilip->sxi_item.li_flags)); + + sxi = container_of(item, struct xfs_swapext_intent, si_list); + + tp->t_flags |= XFS_TRANS_DIRTY; + set_bit(XFS_LI_DIRTY, &ilip->sxi_item.li_flags); + + se = &ilip->sxi_format.sxi_extent; + se->se_inode1 = sxi->si_ip1->i_ino; + se->se_inode2 = sxi->si_ip2->i_ino; + se->se_startoff1 = sxi->si_startoff1; + se->se_startoff2 = sxi->si_startoff2; + se->se_blockcount = sxi->si_blockcount; + se->se_isize1 = sxi->si_isize1; + se->se_isize2 = sxi->si_isize2; + se->se_flags = sxi->si_flags; +} + +/* Get an SXD so we can process all the deferred swapext updates. */ +STATIC void * +xfs_swapext_create_done( + struct xfs_trans *tp, + void *intent, + unsigned int count) +{ + return xfs_trans_get_sxd(tp, intent); +} + +/* Process a deferred swapext update. */ +STATIC int +xfs_swapext_finish_item( + struct xfs_trans *tp, + struct list_head *item, + void *done_item, + void **state) +{ + struct xfs_swapext_intent *sxi; + int error; + + sxi = container_of(item, struct xfs_swapext_intent, si_list); + + /* + * Swap one more extent between the two files. If there's still more + * work to do, we want to requeue ourselves after all other pending + * deferred operations have finished. This includes all of the dfops + * that we queued directly as well as any new ones created in the + * process of finishing the others. Doing so prevents us from queuing + * a large number of SXI log items in kernel memory, which in turn + * prevents us from pinning the tail of the log (while logging those + * new SXI items) until the first SXI items can be processed. 
+ */ + error = xfs_trans_log_finish_swapext_update(tp, done_item, sxi); + if (!error && xfs_swapext_has_more_work(sxi)) + return -EMULTIHOP; + + kmem_free(sxi); + return error; +} + +/* Abort all pending SXIs. */ +STATIC void +xfs_swapext_abort_intent( + void *intent) +{ + xfs_sxi_release(intent); +} + +/* Cancel a deferred swapext update. */ +STATIC void +xfs_swapext_cancel_item( + struct list_head *item) +{ + struct xfs_swapext_intent *sxi; + + sxi = container_of(item, struct xfs_swapext_intent, si_list); + kmem_free(sxi); +} + +/* Prepare a deferred swapext item for freezing by detaching the inodes. */ +STATIC int +xfs_swapext_freeze_item( + struct xfs_defer_freezer *freezer, + struct list_head *item) +{ + struct xfs_swapext_intent *sxi; + struct xfs_inode *ip; + int error; + + sxi = container_of(item, struct xfs_swapext_intent, si_list); + + ip = sxi->si_ip1; + error = xfs_defer_freezer_ijoin(freezer, ip); + if (error) + return error; + sxi->si_ino1 = ip->i_ino; + + ip = sxi->si_ip2; + error = xfs_defer_freezer_ijoin(freezer, ip); + if (error) + return error; + sxi->si_ino2 = ip->i_ino; + + return 0; +} + +/* Thaw a deferred swapext item by reattaching the inodes. 
*/ +STATIC int +xfs_swapext_thaw_item( + struct xfs_defer_freezer *freezer, + struct list_head *item) +{ + struct xfs_swapext_intent *sxi; + struct xfs_inode *ip; + + sxi = container_of(item, struct xfs_swapext_intent, si_list); + + ip = xfs_defer_freezer_igrab(freezer, sxi->si_ino1); + if (!ip) + return -EFSCORRUPTED; + sxi->si_ip1 = ip; + + ip = xfs_defer_freezer_igrab(freezer, sxi->si_ino2); + if (!ip) + return -EFSCORRUPTED; + sxi->si_ip2 = ip; + + return 0; +} + +const struct xfs_defer_op_type xfs_swapext_defer_type = { + .max_items = XFS_SXI_MAX_FAST_EXTENTS, + .diff_items = xfs_swapext_diff_items, + .create_intent = xfs_swapext_create_intent, + .abort_intent = xfs_swapext_abort_intent, + .log_item = xfs_swapext_log_item, + .create_done = xfs_swapext_create_done, + .finish_item = xfs_swapext_finish_item, + .cancel_item = xfs_swapext_cancel_item, + .freeze_item = xfs_swapext_freeze_item, + .thaw_item = xfs_swapext_thaw_item, +}; + /* * Process a swapext update intent item that was recovered from the log. * We need to update some inode's bmbt. @@ -215,7 +451,105 @@ xfs_sxi_recover( struct xfs_defer_freezer **dffp, struct xfs_sxi_log_item *ilip) { - return -EFSCORRUPTED; + struct xfs_swapext_intent sxi; + struct xfs_swap_extent *se; + struct xfs_sxd_log_item *dlip; + struct xfs_trans *tp; + int error = 0; + + ASSERT(!test_bit(XFS_SXI_RECOVERED, &ilip->sxi_flags)); + + /* + * First check the validity of the extent described by the + * SXI. If anything is bad, then toss the SXI. + */ + se = &ilip->sxi_format.sxi_extent; + if (se->se_blockcount == 0 || + ilip->sxi_format.__pad != 0 || + !xfs_verify_ino(mp, se->se_inode1) || + !xfs_verify_ino(mp, se->se_inode2) || + (se->se_flags & ~XFS_SWAP_EXTENT_FLAGS) || + ((se->se_flags & XFS_SWAP_EXTENT_SET_SIZES) && + (se->se_isize1 < 0 || se->se_isize2 < 0))) { + /* + * This will pull the SXI from the AIL and + * free the memory associated with it. 
+ */ + set_bit(XFS_SXI_RECOVERED, &ilip->sxi_flags); + xfs_sxi_release(ilip); + return -EFSCORRUPTED; + } + + error = xfs_trans_alloc(mp, &M_RES(mp)->tr_itruncate, + XFS_EXTENTADD_SPACE_RES(mp, XFS_DATA_FORK), 0, 0, &tp); + if (error) + return error; + + dlip = xfs_trans_get_sxd(tp, ilip); + memset(&sxi, 0, sizeof(sxi)); + INIT_LIST_HEAD(&sxi.si_list); + + /* Grab both inodes and lock them. */ + error = xfs_iget(mp, tp, se->se_inode1, 0, 0, &sxi.si_ip1); + if (error) + goto out_fail; + error = xfs_iget(mp, tp, se->se_inode2, 0, 0, &sxi.si_ip2); + if (error) + goto out_fail; + + xfs_lock_two_inodes(sxi.si_ip1, XFS_ILOCK_EXCL, + sxi.si_ip2, XFS_ILOCK_EXCL); + xfs_trans_ijoin(tp, sxi.si_ip1, 0); + xfs_trans_ijoin(tp, sxi.si_ip2, 0); + + /* + * Set IRECOVERY to prevent trimming of post-eof extents and freeing of + * unlinked inodes until we're totally done processing files. + */ + if (VFS_I(sxi.si_ip1)->i_nlink == 0) + xfs_iflags_set(sxi.si_ip1, XFS_IRECOVERY); + if (VFS_I(sxi.si_ip2)->i_nlink == 0) + xfs_iflags_set(sxi.si_ip2, XFS_IRECOVERY); + + /* + * Construct the rest of our in-core swapext intent state so that we + * can call the deferred operation functions to continue the work. + */ + sxi.si_flags = se->se_flags; + sxi.si_startoff1 = se->se_startoff1; + sxi.si_startoff2 = se->se_startoff2; + sxi.si_blockcount = se->se_blockcount; + sxi.si_isize1 = se->se_isize1; + sxi.si_isize2 = se->se_isize2; + error = xfs_trans_log_finish_swapext_update(tp, dlip, &sxi); + if (error) + goto out_fail; + + /* + * If there's more extent swapping to be done, we have to schedule that + * as a separate deferred operation to be run after we've finished + * replaying all of the intents we recovered from the log. 
+ */ + if (xfs_swapext_has_more_work(&sxi)) + xfs_swapext_reschedule(tp, &sxi); + + set_bit(XFS_SXI_RECOVERED, &ilip->sxi_flags); + error = xlog_recover_trans_commit(tp, dffp); + goto out_rele; + +out_fail: + xfs_trans_cancel(tp); +out_rele: + if (sxi.si_ip2) { + xfs_iunlock(sxi.si_ip2, XFS_ILOCK_EXCL); + xfs_irele(sxi.si_ip2); + } + if (sxi.si_ip1) { + xfs_iunlock(sxi.si_ip1, XFS_ILOCK_EXCL); + xfs_irele(sxi.si_ip1); + } + return error; + } /* diff --git a/fs/xfs/xfs_trace.c b/fs/xfs/xfs_trace.c index 9b8d703dc9fd..f8cceacfb51d 100644 --- a/fs/xfs/xfs_trace.c +++ b/fs/xfs/xfs_trace.c @@ -30,6 +30,7 @@ #include "xfs_fsmap.h" #include "xfs_btree_staging.h" #include "xfs_icache.h" +#include "xfs_swapext.h" /* * We include this last to have the helpers above available for the trace diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h index 721e14f5c98b..af9c7bcb7a8a 100644 --- a/fs/xfs/xfs_trace.h +++ b/fs/xfs/xfs_trace.h @@ -37,6 +37,7 @@ struct xfs_trans_res; struct xfs_inobt_rec_incore; union xfs_btree_ptr; struct xfs_eofblocks; +struct xfs_swapext_intent; #define XFS_ATTR_FILTER_FLAGS \ { XFS_ATTR_ROOT, "ROOT" }, \ @@ -3207,6 +3208,9 @@ DEFINE_INODE_IREC_EVENT(xfs_reflink_cancel_cow); DEFINE_INODE_IREC_EVENT(xfs_swap_extent_rmap_remap); DEFINE_INODE_IREC_EVENT(xfs_swap_extent_rmap_remap_piece); DEFINE_INODE_ERROR_EVENT(xfs_swap_extent_rmap_error); +DEFINE_INODE_IREC_EVENT(xfs_swapext_extent1); +DEFINE_INODE_IREC_EVENT(xfs_swapext_extent2); +DEFINE_ITRUNC_EVENT(xfs_swapext_update_inode_size); /* fsmap traces */ DECLARE_EVENT_CLASS(xfs_fsmap_class, @@ -3836,6 +3840,51 @@ DEFINE_NAMESPACE_EVENT(xfs_imeta_dir_created); DEFINE_NAMESPACE_EVENT(xfs_imeta_dir_unlinked); DEFINE_NAMESPACE_EVENT(xfs_imeta_dir_zap); +#define XFS_SWAPEXT_FLAGS \ + { XFS_SWAP_EXTENT_ATTR_FORK, "ATTRFORK" }, \ + { XFS_SWAP_EXTENT_SET_SIZES, "SETSIZES" } + +TRACE_EVENT(xfs_swapext_defer, + TP_PROTO(struct xfs_mount *mp, const struct xfs_swapext_intent *sxi), + TP_ARGS(mp, sxi), + 
TP_STRUCT__entry( + __field(dev_t, dev) + __field(xfs_ino_t, ino1) + __field(xfs_ino_t, ino2) + __field(uint64_t, flags) + __field(xfs_fileoff_t, startoff1) + __field(xfs_fileoff_t, startoff2) + __field(xfs_filblks_t, blockcount) + __field(xfs_fsize_t, isize1) + __field(xfs_fsize_t, isize2) + __field(xfs_fsize_t, new_isize1) + __field(xfs_fsize_t, new_isize2) + ), + TP_fast_assign( + __entry->dev = mp->m_super->s_dev; + __entry->ino1 = sxi->si_ip1->i_ino; + __entry->ino2 = sxi->si_ip2->i_ino; + __entry->flags = sxi->si_flags; + __entry->startoff1 = sxi->si_startoff1; + __entry->startoff2 = sxi->si_startoff2; + __entry->blockcount = sxi->si_blockcount; + __entry->isize1 = sxi->si_ip1->i_d.di_size; + __entry->isize2 = sxi->si_ip2->i_d.di_size; + __entry->new_isize1 = sxi->si_isize1; + __entry->new_isize2 = sxi->si_isize2; + ), + TP_printk("dev %d:%d ino1 0x%llx isize1 %lld ino2 0x%llx isize2 %lld flags (%s) startoff1 %llu startoff2 %llu blockcount %llu newisize1 %lld newisize2 %lld", + MAJOR(__entry->dev), MINOR(__entry->dev), + __entry->ino1, __entry->isize1, + __entry->ino2, __entry->isize2, + __print_flags(__entry->flags, "|", XFS_SWAPEXT_FLAGS), + __entry->startoff1, + __entry->startoff2, + __entry->blockcount, + __entry->new_isize1, __entry->new_isize2) + +); + #endif /* _TRACE_XFS_H */ #undef TRACE_INCLUDE_PATH From patchwork Wed Apr 29 02:45:20 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Darrick J. 
Wong" X-Patchwork-Id: 11515849 Subject: [PATCH 10/18] xfs: refactor locking and unlocking two inodes against userspace IO From: "Darrick J. Wong" To: darrick.wong@oracle.com Cc: linux-xfs@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-api@vger.kernel.org Date: Tue, 28 Apr 2020 19:45:20 -0700 Message-ID: <158812831991.168506.927297614049035671.stgit@magnolia> In-Reply-To: <158812825316.168506.932540609191384366.stgit@magnolia> References: <158812825316.168506.932540609191384366.stgit@magnolia> User-Agent: StGit/0.17.1-dirty MIME-Version: 1.0 Sender: linux-fsdevel-owner@vger.kernel.org Precedence: bulk List-ID: 
X-Mailing-List: linux-fsdevel@vger.kernel.org From: Darrick J. Wong Refactor the two functions that we use to lock and unlock two inodes to block userspace from initiating IO against a file, whether via system calls or mmap activity. Move them to xfs_inode.c since this functionality won't be specific to reflink for much longer. Signed-off-by: Darrick J. Wong --- fs/xfs/xfs_file.c | 2 + fs/xfs/xfs_inode.c | 93 ++++++++++++++++++++++++++++++++++++++++++++++++++ fs/xfs/xfs_inode.h | 3 ++ fs/xfs/xfs_reflink.c | 85 +--------------------------------------------- fs/xfs/xfs_reflink.h | 2 - 5 files changed, 99 insertions(+), 86 deletions(-) diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c index 1759fbcbcd46..9bce98323ca6 100644 --- a/fs/xfs/xfs_file.c +++ b/fs/xfs/xfs_file.c @@ -1059,7 +1059,7 @@ xfs_file_remap_range( if (mp->m_flags & XFS_MOUNT_WSYNC) xfs_log_force_inode(dest); out_unlock: - xfs_reflink_remap_unlock(file_in, file_out); + xfs_iunlock_two_io(src, dest); if (ret) trace_xfs_reflink_remap_range_error(dest, ret, _RET_IP_); return remapped > 0 ? remapped : ret; diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c index a0db7f47826f..080c8838fba5 100644 --- a/fs/xfs/xfs_inode.c +++ b/fs/xfs/xfs_inode.c @@ -3112,3 +3112,96 @@ xfs_is_always_cow_inode( return ip->i_mount->m_always_cow && xfs_sb_version_hasreflink(&ip->i_mount->m_sb); } + +/* + * Grab the exclusive iolock for a data copy from src to dest, making sure to + * abide vfs locking order (lowest pointer value goes first) and breaking the + * layout leases before proceeding. The loop is needed because we cannot call + * the blocking break_layout() with the iolocks held, and therefore have to + * back out both locks. + */ +static int +xfs_iolock_two_inodes_and_break_layout( + struct inode *src, + struct inode *dest) +{ + int error; + + if (src > dest) + swap(src, dest); + +retry: + /* Wait to break both inodes' layouts before we start locking. 
*/ + error = break_layout(src, true); + if (error) + return error; + if (src != dest) { + error = break_layout(dest, true); + if (error) + return error; + } + + /* Lock one inode and make sure nobody got in and leased it. */ + inode_lock(src); + error = break_layout(src, false); + if (error) { + inode_unlock(src); + if (error == -EWOULDBLOCK) + goto retry; + return error; + } + + if (src == dest) + return 0; + + /* Lock the other inode and make sure nobody got in and leased it. */ + inode_lock_nested(dest, I_MUTEX_NONDIR2); + error = break_layout(dest, false); + if (error) { + inode_unlock(src); + inode_unlock(dest); + if (error == -EWOULDBLOCK) + goto retry; + return error; + } + + return 0; +} + +/* + * Lock two files so that userspace cannot initiate I/O via file syscalls or + * mmap activity. + */ +int +xfs_ilock_two_io( + struct xfs_inode *ip1, + struct xfs_inode *ip2) +{ + int ret; + + ret = xfs_iolock_two_inodes_and_break_layout(VFS_I(ip1), VFS_I(ip2)); + if (ret) + return ret; + if (ip1 == ip2) + xfs_ilock(ip1, XFS_MMAPLOCK_EXCL); + else + xfs_lock_two_inodes(ip1, XFS_MMAPLOCK_EXCL, + ip2, XFS_MMAPLOCK_EXCL); + return 0; +} + +/* Unlock both files to allow IO and mmap activity. 
*/ +void +xfs_iunlock_two_io( + struct xfs_inode *ip1, + struct xfs_inode *ip2) +{ + bool same_inode = (ip1 == ip2); + + xfs_iunlock(ip2, XFS_MMAPLOCK_EXCL); + if (!same_inode) + xfs_iunlock(ip1, XFS_MMAPLOCK_EXCL); + inode_unlock(VFS_I(ip2)); + if (!same_inode) + inode_unlock(VFS_I(ip1)); +} diff --git a/fs/xfs/xfs_inode.h b/fs/xfs/xfs_inode.h index df5021cf5d0f..d8cb7bed4dd9 100644 --- a/fs/xfs/xfs_inode.h +++ b/fs/xfs/xfs_inode.h @@ -509,4 +509,7 @@ void xfs_inode_inactivation_cleanup(struct xfs_inode *ip); void xfs_end_io(struct work_struct *work); +int xfs_ilock_two_io(struct xfs_inode *ip1, struct xfs_inode *ip2); +void xfs_iunlock_two_io(struct xfs_inode *ip1, struct xfs_inode *ip2); + #endif /* __XFS_INODE_H__ */ diff --git a/fs/xfs/xfs_reflink.c b/fs/xfs/xfs_reflink.c index f206f6637daf..566a3dee2815 100644 --- a/fs/xfs/xfs_reflink.c +++ b/fs/xfs/xfs_reflink.c @@ -1237,81 +1237,6 @@ xfs_reflink_remap_blocks( return error; } -/* - * Grab the exclusive iolock for a data copy from src to dest, making sure to - * abide vfs locking order (lowest pointer value goes first) and breaking the - * layout leases before proceeding. The loop is needed because we cannot call - * the blocking break_layout() with the iolocks held, and therefore have to - * back out both locks. - */ -static int -xfs_iolock_two_inodes_and_break_layout( - struct inode *src, - struct inode *dest) -{ - int error; - - if (src > dest) - swap(src, dest); - -retry: - /* Wait to break both inodes' layouts before we start locking. */ - error = break_layout(src, true); - if (error) - return error; - if (src != dest) { - error = break_layout(dest, true); - if (error) - return error; - } - - /* Lock one inode and make sure nobody got in and leased it. 
*/ - inode_lock(src); - error = break_layout(src, false); - if (error) { - inode_unlock(src); - if (error == -EWOULDBLOCK) - goto retry; - return error; - } - - if (src == dest) - return 0; - - /* Lock the other inode and make sure nobody got in and leased it. */ - inode_lock_nested(dest, I_MUTEX_NONDIR2); - error = break_layout(dest, false); - if (error) { - inode_unlock(src); - inode_unlock(dest); - if (error == -EWOULDBLOCK) - goto retry; - return error; - } - - return 0; -} - -/* Unlock both inodes after they've been prepped for a range clone. */ -void -xfs_reflink_remap_unlock( - struct file *file_in, - struct file *file_out) -{ - struct inode *inode_in = file_inode(file_in); - struct xfs_inode *src = XFS_I(inode_in); - struct inode *inode_out = file_inode(file_out); - struct xfs_inode *dest = XFS_I(inode_out); - bool same_inode = (inode_in == inode_out); - - xfs_iunlock(dest, XFS_MMAPLOCK_EXCL); - if (!same_inode) - xfs_iunlock(src, XFS_MMAPLOCK_EXCL); - inode_unlock(inode_out); - if (!same_inode) - inode_unlock(inode_in); -} - /* * If we're reflinking to a point past the destination file's EOF, we must * zero any speculative post-EOF preallocations that sit between the old EOF @@ -1374,18 +1299,12 @@ xfs_reflink_remap_prep( struct xfs_inode *src = XFS_I(inode_in); struct inode *inode_out = file_inode(file_out); struct xfs_inode *dest = XFS_I(inode_out); - bool same_inode = (inode_in == inode_out); int ret; /* Lock both files against IO */ - ret = xfs_iolock_two_inodes_and_break_layout(inode_in, inode_out); + ret = xfs_ilock_two_io(src, dest); if (ret) return ret; - if (same_inode) - xfs_ilock(src, XFS_MMAPLOCK_EXCL); - else - xfs_lock_two_inodes(src, XFS_MMAPLOCK_EXCL, dest, - XFS_MMAPLOCK_EXCL); /* Check file eligibility and prepare for block sharing. 
*/ ret = -EINVAL; @@ -1436,7 +1355,7 @@ xfs_reflink_remap_prep( return 0; out_unlock: - xfs_reflink_remap_unlock(file_in, file_out); + xfs_iunlock_two_io(src, dest); return ret; } diff --git a/fs/xfs/xfs_reflink.h b/fs/xfs/xfs_reflink.h index 0879d2e71e11..8ddf1300a982 100644 --- a/fs/xfs/xfs_reflink.h +++ b/fs/xfs/xfs_reflink.h @@ -50,7 +50,5 @@ extern int xfs_reflink_remap_blocks(struct xfs_inode *src, loff_t pos_in, loff_t *remapped); extern int xfs_reflink_update_dest(struct xfs_inode *dest, xfs_off_t newlen, xfs_extlen_t cowextsize, unsigned int remap_flags); -extern void xfs_reflink_remap_unlock(struct file *file_in, - struct file *file_out); #endif /* __XFS_REFLINK_H */ From patchwork Wed Apr 29 02:45:26 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Darrick J. Wong" X-Patchwork-Id: 11515851 Subject: [PATCH 11/18] xfs: add a ->swap_file_range handler From: "Darrick J. Wong" To: darrick.wong@oracle.com Cc: linux-xfs@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-api@vger.kernel.org Date: Tue, 28 Apr 2020 19:45:26 -0700 Message-ID: <158812832621.168506.10248212998434869117.stgit@magnolia> In-Reply-To: <158812825316.168506.932540609191384366.stgit@magnolia> References: <158812825316.168506.932540609191384366.stgit@magnolia> User-Agent: StGit/0.17.1-dirty MIME-Version: 1.0 Sender: linux-fsdevel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org From: Darrick J. Wong Add a function to handle range swap requests from the vfs. Signed-off-by: Darrick J. 
Wong --- fs/xfs/xfs_bmap_util.c | 340 ++++++++++++++++++++++++++++++++++++++++++++++++ fs/xfs/xfs_bmap_util.h | 4 + fs/xfs/xfs_file.c | 39 ++++++ fs/xfs/xfs_trace.h | 4 + 4 files changed, 387 insertions(+) diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c index 070f657241a1..a8bd2627d76e 100644 --- a/fs/xfs/xfs_bmap_util.c +++ b/fs/xfs/xfs_bmap_util.c @@ -29,6 +29,7 @@ #include "xfs_iomap.h" #include "xfs_reflink.h" #include "xfs_sb.h" +#include "xfs_swapext.h" /* Kernel only BMAP related definitions and functions */ @@ -1841,3 +1842,342 @@ xfs_swap_extents( xfs_trans_cancel(tp); goto out_unlock; } + +/* Prepare two files to have their data swapped. */ +int +xfs_swap_range_prep( + struct file *file1, + struct file *file2, + struct file_swap_range *fsr) +{ + struct xfs_inode *ip1 = XFS_I(file_inode(file1)); + struct xfs_inode *ip2 = XFS_I(file_inode(file2)); + int ret; + + /* Verify both files are either real-time or non-realtime */ + if (XFS_IS_REALTIME_INODE(ip1) != XFS_IS_REALTIME_INODE(ip2)) + return -EINVAL; + + ret = generic_swap_file_range_prep(file1, file2, fsr); + if (ret) + return ret; + + /* Attach dquots to both inodes before changing block maps. */ + ret = xfs_qm_dqattach(ip2); + if (ret) + return ret; + ret = xfs_qm_dqattach(ip1); + if (ret) + return ret; + + /* Flush the relevant ranges of both files. */ + ret = xfs_flush_unmap_range(ip2, fsr->file2_offset, fsr->length); + if (ret) + return ret; + return xfs_flush_unmap_range(ip1, fsr->file1_offset, fsr->length); +} + +/* + * Compute the number of blocks and extents mapped to part of a file, and the + * worst case estimate of the number of bmbt blocks required to store those + * mappings. 
+ */ +STATIC int +xfs_bmap_count_range_blocks( + struct xfs_inode *ip, + int whichfork, + xfs_fileoff_t startoff, + xfs_filblks_t blockcount, + xfs_filblks_t *nr_mapped_blocks) +{ + struct xfs_bmbt_irec irec; + xfs_filblks_t nr_blocks = 0; + xfs_extnum_t extents = 0; + int bmapi_flags = xfs_bmapi_aflag(whichfork); + int nimaps; + int error; + + *nr_mapped_blocks = 0; + + /* Count all the extents that map to allocated space. */ + while (blockcount > 0) { + nimaps = 1; + error = xfs_bmapi_read(ip, startoff, blockcount, &irec, + &nimaps, bmapi_flags); + if (error) + return error; + if (nimaps != 1) + return -EINVAL; + if (xfs_bmap_is_mapped_extent(&irec)) { + nr_blocks += irec.br_blockcount; + extents++; + } + startoff += irec.br_blockcount; + blockcount -= irec.br_blockcount; + } + + /* Add in the number of bmbt splits that could happen. */ + nr_blocks += XFS_NEXTENTADD_SPACE_RES(ip->i_mount, nr_blocks, + whichfork); + *nr_mapped_blocks = nr_blocks; + + return 0; +} + +/* + * Compute the number of blocks we need to reserve to handle a log-assisted + * extent swap operation. + */ +static inline unsigned int +xfs_swap_range_calc_resblks( + struct xfs_inode *ip1, + struct xfs_inode *ip2, + int whichfork, + xfs_filblks_t blockcount) +{ + struct xfs_mount *mp = ip1->i_mount; + xfs_extnum_t ip1_nr = XFS_IFORK_NEXTENTS(ip1, whichfork); + xfs_extnum_t ip2_nr = XFS_IFORK_NEXTENTS(ip2, whichfork); + unsigned int resblks; + + /* + * Each file range cannot have more extents than there are blocks in + * that range. + */ + ip1_nr = min_t(xfs_filblks_t, ip1_nr, blockcount); + ip2_nr = min_t(xfs_filblks_t, ip2_nr, blockcount); + + /* + * Conceptually this shouldn't affect the shape of either bmbt, but + * since we atomically move extents one by one, we reserve enough space + * to rebuild both trees. 
+ */ + resblks = XFS_SWAP_RMAP_SPACE_RES(mp, ip1_nr, whichfork); + resblks += XFS_SWAP_RMAP_SPACE_RES(mp, ip2_nr, whichfork); + + /* + * Handle the corner case where either inode might straddle the btree + * format boundary. If so, the inode could bounce between btree <-> + * extent format on unmap -> remap cycles, freeing and allocating a + * bmapbt block each time. + */ + if (ip1_nr == (XFS_IFORK_MAXEXT(ip1, whichfork) + 1)) + resblks += XFS_IFORK_MAXEXT(ip1, whichfork); + if (ip2_nr == (XFS_IFORK_MAXEXT(ip2, whichfork) + 1)) + resblks += XFS_IFORK_MAXEXT(ip2, whichfork); + + return resblks; +} + +/* + * Obtain a quota reservation to make sure we don't hit EDQUOT. We can skip + * this if quota enforcement is disabled or if both inodes' dquots are the + * same. + */ +STATIC int +xfs_swap_range_prep_quota( + struct xfs_trans *tp, + struct xfs_inode *ip1, + struct xfs_inode *ip2, + int whichfork, + xfs_fileoff_t startoff1, + xfs_fileoff_t startoff2, + xfs_filblks_t blockcount) +{ + struct xfs_mount *mp = ip1->i_mount; + xfs_filblks_t ip1_mapped, ip2_mapped; + int error; + + /* + * Don't bother with a quota reservation if we're not enforcing them + * or the two inodes have the same dquots. + */ + if (!(mp->m_qflags & XFS_ALL_QUOTA_ENFD) || ip1 == ip2) + return 0; + + if (ip1->i_udquot == ip2->i_udquot && + ip1->i_gdquot == ip2->i_gdquot && + ip1->i_pdquot == ip2->i_pdquot) + return 0; + + /* Figure out how many blocks we'll move out of each file. */ + error = xfs_bmap_count_range_blocks(ip1, whichfork, startoff1, + blockcount, &ip1_mapped); + if (error) + return error; + error = xfs_bmap_count_range_blocks(ip2, whichfork, startoff2, + blockcount, &ip2_mapped); + if (error) + return error; + + /* + * For each file, compute the net gain in the number of blocks that + * will be mapped into that file and reserve that much quota. The + * quota counts must be able to absorb at least that much space. 
+ */ + if (ip2_mapped > ip1_mapped) { + error = xfs_trans_reserve_quota_nblks(tp, ip1, + ip2_mapped - ip1_mapped, 0, + XFS_QMOPT_RES_REGBLKS); + if (error) + return error; + } + + if (ip1_mapped > ip2_mapped) { + error = xfs_trans_reserve_quota_nblks(tp, ip2, + ip1_mapped - ip2_mapped, 0, + XFS_QMOPT_RES_REGBLKS); + if (error) + return error; + } + + /* + * For each file, forcibly reserve the gross gain in mapped blocks so + * that we don't trip over any quota block reservation assertions. + * We must reserve the gross gain because the quota code subtracts from + * bcount the number of blocks that we unmap; it does not add that + * quantity back to the quota block reservation. + */ + error = xfs_trans_reserve_quota_nblks(tp, ip1, ip1_mapped, 0, + XFS_QMOPT_FORCE_RES | XFS_QMOPT_RES_REGBLKS); + if (error) + return error; + + return xfs_trans_reserve_quota_nblks(tp, ip2, ip2_mapped, 0, + XFS_QMOPT_FORCE_RES | XFS_QMOPT_RES_REGBLKS); +} + +/* Swap parts of two files. */ +int +xfs_swap_range( + struct xfs_inode *ip1, + struct xfs_inode *ip2, + const struct file_swap_range *fsr) +{ + struct xfs_mount *mp = ip1->i_mount; + struct xfs_trans *tp; + xfs_fileoff_t startoff1; + xfs_fileoff_t startoff2; + xfs_filblks_t blockcount = XFS_B_TO_FSB(mp, fsr->length); + unsigned int resblks; + unsigned int sxflags = 0; + int error; + + if (!xfs_sb_version_hasatomicswap(&mp->m_sb)) + return -EOPNOTSUPP; + + startoff1 = XFS_B_TO_FSBT(mp, fsr->file1_offset); + startoff2 = XFS_B_TO_FSBT(mp, fsr->file2_offset); + + /* + * Cancel CoW fork preallocations for the ranges of both files. The + * prep function should have flushed all the dirty data, so the only + * extents remaining should be speculative. 
+ */ + if (xfs_inode_has_cow_data(ip1)) { + error = xfs_reflink_cancel_cow_range(ip1, fsr->file1_offset, + fsr->length, true); + if (error) + return error; + } + + if (xfs_inode_has_cow_data(ip2)) { + error = xfs_reflink_cancel_cow_range(ip2, fsr->file2_offset, + fsr->length, true); + if (error) + return error; + } + + resblks = xfs_swap_range_calc_resblks(ip1, ip2, XFS_DATA_FORK, + blockcount); + error = xfs_trans_alloc(mp, &M_RES(mp)->tr_write, resblks, 0, 0, &tp); + if (error) + return error; + + /* + * Lock and join the inodes to the transaction so that transaction commit + * or cancel will unlock the inodes from this point onwards. + */ + if (ip1 != ip2) { + xfs_lock_two_inodes(ip1, XFS_ILOCK_EXCL, ip2, XFS_ILOCK_EXCL); + xfs_trans_ijoin(tp, ip1, 0); + xfs_trans_ijoin(tp, ip2, 0); + } else { + xfs_ilock(ip1, XFS_ILOCK_EXCL); + xfs_trans_ijoin(tp, ip1, 0); + } + + trace_xfs_swap_extent_before(ip2, 0); + trace_xfs_swap_extent_before(ip1, 1); + + /* + * Do all of the inputs checking that we can only do once we've taken + * both ILOCKs. + */ + error = generic_swap_file_range_check_fresh(VFS_I(ip1), VFS_I(ip2), + fsr); + if (error) + goto out_trans_cancel; + + if (XFS_IFORK_FORMAT(ip1, XFS_DATA_FORK) == XFS_DINODE_FMT_LOCAL || + XFS_IFORK_FORMAT(ip2, XFS_DATA_FORK) == XFS_DINODE_FMT_LOCAL) { + error = -EINVAL; + goto out_trans_cancel; + } + + /* + * Reserve ourselves some quota if any of them are in enforcing mode. + * In theory we only need enough to satisfy the change in the number + * of blocks between the two ranges being remapped. + */ + error = xfs_swap_range_prep_quota(tp, ip1, ip2, XFS_DATA_FORK, + startoff1, startoff2, blockcount); + if (error) + goto out_trans_cancel; + + /* Perform the file range swap.
*/ + if (fsr->flags & FILE_SWAP_RANGE_TO_EOF) + sxflags |= XFS_SWAPEXT_SET_SIZES; + + error = xfs_swapext_atomic(&tp, ip1, ip2, XFS_DATA_FORK, startoff1, + startoff2, blockcount, sxflags); + if (error) + goto out_trans_cancel; + + /* + * If the caller wanted us to swap two complete files of unequal + * length, swap the incore sizes now. This should be safe because we + * flushed both files' page caches and moved all the post-eof extents, + * so there should not be anything to zero. + */ + if (fsr->flags & FILE_SWAP_RANGE_TO_EOF) { + loff_t temp; + + temp = i_size_read(VFS_I(ip2)); + i_size_write(VFS_I(ip2), i_size_read(VFS_I(ip1))); + i_size_write(VFS_I(ip1), temp); + } + + /* + * If this is a synchronous mount, make sure that the + * transaction goes to disk before returning to the user. + */ + if (mp->m_flags & XFS_MOUNT_WSYNC) + xfs_trans_set_sync(tp); + + error = xfs_trans_commit(tp); + + trace_xfs_swap_extent_after(ip2, 0); + trace_xfs_swap_extent_after(ip1, 1); + +out_unlock: + xfs_iunlock(ip1, XFS_ILOCK_EXCL); + if (ip1 != ip2) + xfs_iunlock(ip2, XFS_ILOCK_EXCL); + return error; + +out_trans_cancel: + xfs_trans_cancel(tp); + goto out_unlock; +} + diff --git a/fs/xfs/xfs_bmap_util.h b/fs/xfs/xfs_bmap_util.h index 9f993168b55b..d3444a63bbd7 100644 --- a/fs/xfs/xfs_bmap_util.h +++ b/fs/xfs/xfs_bmap_util.h @@ -68,6 +68,10 @@ int xfs_free_eofblocks(struct xfs_inode *ip); int xfs_swap_extents(struct xfs_inode *ip, struct xfs_inode *tip, struct xfs_swapext *sx); +int xfs_swap_range_prep(struct file *file1, struct file *file2, + struct file_swap_range *fsr); +int xfs_swap_range(struct xfs_inode *ip1, struct xfs_inode *ip2, + const struct file_swap_range *fsr); xfs_daddr_t xfs_fsb_to_db(struct xfs_inode *ip, xfs_fsblock_t fsb); diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c index 9bce98323ca6..d446c16cfc30 100644 --- a/fs/xfs/xfs_file.c +++ b/fs/xfs/xfs_file.c @@ -1065,6 +1065,44 @@ xfs_file_remap_range( return remapped > 0 ? 
remapped : ret; } +STATIC int +xfs_file_swap_range( + struct file *file1, + struct file *file2, + struct file_swap_range *fsr) +{ + struct xfs_inode *ip1 = XFS_I(file_inode(file1)); + struct xfs_inode *ip2 = XFS_I(file_inode(file2)); + struct xfs_mount *mp = ip1->i_mount; + int ret; + + if (XFS_FORCED_SHUTDOWN(mp)) + return -EIO; + + /* Lock both files against IO */ + ret = xfs_ilock_two_io(ip1, ip2); + if (ret) + return ret; + + /* Prepare and then swap file data. */ + ret = xfs_swap_range_prep(file1, file2, fsr); + if (ret) + goto out_unlock; + + trace_xfs_file_swap_range(ip1, fsr->file1_offset, fsr->length, ip2, + fsr->file2_offset); + + ret = xfs_swap_range(ip1, ip2, fsr); + if (ret) + goto out_unlock; + +out_unlock: + xfs_iunlock_two_io(ip1, ip2); + if (ret) + trace_xfs_file_swap_range_error(ip2, ret, _RET_IP_); + return ret; +} + STATIC int xfs_file_open( struct inode *inode, @@ -1307,6 +1345,7 @@ const struct file_operations xfs_file_operations = { .fallocate = xfs_file_fallocate, .fadvise = xfs_file_fadvise, .remap_file_range = xfs_file_remap_range, + .swap_file_range = xfs_file_swap_range, }; const struct file_operations xfs_dir_file_operations = { diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h index af9c7bcb7a8a..7917203e56d4 100644 --- a/fs/xfs/xfs_trace.h +++ b/fs/xfs/xfs_trace.h @@ -3208,6 +3208,10 @@ DEFINE_INODE_IREC_EVENT(xfs_reflink_cancel_cow); DEFINE_INODE_IREC_EVENT(xfs_swap_extent_rmap_remap); DEFINE_INODE_IREC_EVENT(xfs_swap_extent_rmap_remap_piece); DEFINE_INODE_ERROR_EVENT(xfs_swap_extent_rmap_error); + +/* swapext tracepoints */ +DEFINE_DOUBLE_IO_EVENT(xfs_file_swap_range); +DEFINE_INODE_ERROR_EVENT(xfs_file_swap_range_error); DEFINE_INODE_IREC_EVENT(xfs_swapext_extent1); DEFINE_INODE_IREC_EVENT(xfs_swapext_extent2); DEFINE_ITRUNC_EVENT(xfs_swapext_update_inode_size); From patchwork Wed Apr 29 02:45:32 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Darrick J. 
Wong" X-Patchwork-Id: 11515857 Subject: [PATCH 12/18] xfs: add error injection to test swapext recovery From: "Darrick J. Wong" To: darrick.wong@oracle.com Cc: linux-xfs@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-api@vger.kernel.org Date: Tue, 28 Apr 2020 19:45:32 -0700 Message-ID: <158812833282.168506.13350211780610846492.stgit@magnolia> In-Reply-To: <158812825316.168506.932540609191384366.stgit@magnolia> References: <158812825316.168506.932540609191384366.stgit@magnolia> User-Agent: StGit/0.17.1-dirty MIME-Version: 1.0 Sender: linux-fsdevel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List:
linux-fsdevel@vger.kernel.org From: Darrick J. Wong Add an errortag so that we can test recovery of swapext log items. Signed-off-by: Darrick J. Wong --- fs/xfs/libxfs/xfs_errortag.h | 4 +++- fs/xfs/libxfs/xfs_swapext.c | 5 +++++ fs/xfs/xfs_error.c | 3 +++ 3 files changed, 11 insertions(+), 1 deletion(-) diff --git a/fs/xfs/libxfs/xfs_errortag.h b/fs/xfs/libxfs/xfs_errortag.h index 79e6c4fb1d8a..e99683558ccc 100644 --- a/fs/xfs/libxfs/xfs_errortag.h +++ b/fs/xfs/libxfs/xfs_errortag.h @@ -55,7 +55,8 @@ #define XFS_ERRTAG_FORCE_SCRUB_REPAIR 32 #define XFS_ERRTAG_FORCE_SUMMARY_RECALC 33 #define XFS_ERRTAG_IUNLINK_FALLBACK 34 -#define XFS_ERRTAG_MAX 35 +#define XFS_ERRTAG_SWAPEXT_FINISH_ONE 35 +#define XFS_ERRTAG_MAX 36 /* * Random factors for above tags, 1 means always, 2 means 1/2 time, etc. @@ -94,6 +95,7 @@ #define XFS_RANDOM_BUF_LRU_REF 2 #define XFS_RANDOM_FORCE_SCRUB_REPAIR 1 #define XFS_RANDOM_FORCE_SUMMARY_RECALC 1 +#define XFS_RANDOM_SWAPEXT_FINISH_ONE 1 #define XFS_RANDOM_IUNLINK_FALLBACK (XFS_RANDOM_DEFAULT/10) #endif /* __XFS_ERRORTAG_H_ */ diff --git a/fs/xfs/libxfs/xfs_swapext.c b/fs/xfs/libxfs/xfs_swapext.c index 2eff48453070..6597c613fa3e 100644 --- a/fs/xfs/libxfs/xfs_swapext.c +++ b/fs/xfs/libxfs/xfs_swapext.c @@ -18,6 +18,8 @@ #include "xfs_quota.h" #include "xfs_swapext.h" #include "xfs_trace.h" +#include "xfs_errortag.h" +#include "xfs_error.h" /* Information to help us reset reflink flag / CoW fork state after a swap. 
*/ @@ -354,6 +356,9 @@ xfs_swapext_finish_one( xfs_trans_log_inode(tp, sxi->si_ip2, XFS_ILOG_CORE); } + if (XFS_TEST_ERROR(false, tp->t_mountp, XFS_ERRTAG_SWAPEXT_FINISH_ONE)) + return -EIO; + if (xfs_swapext_has_more_work(sxi)) trace_xfs_swapext_defer(tp->t_mountp, sxi); return 0; diff --git a/fs/xfs/xfs_error.c b/fs/xfs/xfs_error.c index a21e9cc6516a..d818497afa2c 100644 --- a/fs/xfs/xfs_error.c +++ b/fs/xfs/xfs_error.c @@ -53,6 +53,7 @@ static unsigned int xfs_errortag_random_default[] = { XFS_RANDOM_FORCE_SCRUB_REPAIR, XFS_RANDOM_FORCE_SUMMARY_RECALC, XFS_RANDOM_IUNLINK_FALLBACK, + XFS_RANDOM_SWAPEXT_FINISH_ONE, }; struct xfs_errortag_attr { @@ -162,6 +163,7 @@ XFS_ERRORTAG_ATTR_RW(buf_lru_ref, XFS_ERRTAG_BUF_LRU_REF); XFS_ERRORTAG_ATTR_RW(force_repair, XFS_ERRTAG_FORCE_SCRUB_REPAIR); XFS_ERRORTAG_ATTR_RW(bad_summary, XFS_ERRTAG_FORCE_SUMMARY_RECALC); XFS_ERRORTAG_ATTR_RW(iunlink_fallback, XFS_ERRTAG_IUNLINK_FALLBACK); +XFS_ERRORTAG_ATTR_RW(swapext_finish_one, XFS_ERRTAG_SWAPEXT_FINISH_ONE); static struct attribute *xfs_errortag_attrs[] = { XFS_ERRORTAG_ATTR_LIST(noerror), @@ -199,6 +201,7 @@ static struct attribute *xfs_errortag_attrs[] = { XFS_ERRORTAG_ATTR_LIST(force_repair), XFS_ERRORTAG_ATTR_LIST(bad_summary), XFS_ERRORTAG_ATTR_LIST(iunlink_fallback), + XFS_ERRORTAG_ATTR_LIST(swapext_finish_one), NULL, }; From patchwork Wed Apr 29 02:45:39 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Darrick J.
Wong" X-Patchwork-Id: 11515861 Subject: [PATCH 13/18] xfs: allow xfs_swap_range to use older extent swap algorithms From: "Darrick J. Wong" To: darrick.wong@oracle.com Cc: linux-xfs@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-api@vger.kernel.org Date: Tue, 28 Apr 2020 19:45:39 -0700 Message-ID: <158812833911.168506.9347356534527509263.stgit@magnolia> In-Reply-To: <158812825316.168506.932540609191384366.stgit@magnolia> References: <158812825316.168506.932540609191384366.stgit@magnolia> User-Agent: StGit/0.17.1-dirty MIME-Version: 1.0 Sender: linux-fsdevel-owner@vger.kernel.org Precedence: bulk List-ID:
X-Mailing-List: linux-fsdevel@vger.kernel.org From: Darrick J. Wong If userspace permits non-atomic swap operations, use the older code paths to implement the same functionality. Signed-off-by: Darrick J. Wong --- fs/xfs/xfs_bmap_util.c | 42 ++++++++++++++++++++++++++++++++++++------ 1 file changed, 36 insertions(+), 6 deletions(-) diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c index a8bd2627d76e..72aebf7ed42d 100644 --- a/fs/xfs/xfs_bmap_util.c +++ b/fs/xfs/xfs_bmap_util.c @@ -2063,9 +2063,6 @@ xfs_swap_range( unsigned int sxflags = 0; int error; - if (!xfs_sb_version_hasatomicswap(&mp->m_sb)) - return -EOPNOTSUPP; - startoff1 = XFS_B_TO_FSBT(mp, fsr->file1_offset); startoff2 = XFS_B_TO_FSBT(mp, fsr->file2_offset); @@ -2135,12 +2132,45 @@ xfs_swap_range( if (error) goto out_trans_cancel; - /* Perform the file range swap. */ if (fsr->flags & FILE_SWAP_RANGE_TO_EOF) sxflags |= XFS_SWAPEXT_SET_SIZES; - error = xfs_swapext_atomic(&tp, ip1, ip2, XFS_DATA_FORK, startoff1, - startoff2, blockcount, sxflags); + /* Perform the file range swap... */ + if (xfs_sb_version_hasatomicswap(&mp->m_sb)) { + /* ...by using the atomic swap, since it's available. */ + error = xfs_swapext_atomic(&tp, ip1, ip2, XFS_DATA_FORK, + startoff1, startoff2, blockcount, sxflags); + } else if ((fsr->flags & FILE_SWAP_RANGE_NONATOMIC) && + (xfs_sb_version_hasreflink(&mp->m_sb) || + xfs_sb_version_hasrmapbt(&mp->m_sb))) { + /* + * ...by using deferred bmap operations, which are only + * supported if userspace is ok with a non-atomic swap + * (e.g. xfs_fsr) and the log supports deferred bmap. 
+ */ + error = xfs_swapext_deferred_bmap(&tp, ip1, ip2, XFS_DATA_FORK, + startoff1, startoff2, blockcount, sxflags); + } else if ((fsr->flags & FILE_SWAP_RANGE_NONATOMIC) && + !(fsr->flags & FILE_SWAP_RANGE_TO_EOF) && + fsr->file1_offset == 0 && fsr->file2_offset == 0 && + fsr->length == ip1->i_d.di_size && + fsr->length == ip2->i_d.di_size) { + /* + * ...by using the old bmap owner change code, if we're doing + * a full file swap and we're ok with non-atomic mode. + */ + error = xfs_swap_extents_check_format(ip2, ip1); + if (error) { + xfs_notice(mp, + "%s: inode 0x%llx format is incompatible for exchanging.", + __func__, ip2->i_ino); + goto out_trans_cancel; + } + error = xfs_swap_extent_forks(&tp, ip2, ip1); + } else { + /* ...or not at all, because we cannot do it. */ + error = -EOPNOTSUPP; + } if (error) goto out_trans_cancel; From patchwork Wed Apr 29 02:45:45 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Darrick J. 
Wong" X-Patchwork-Id: 11515863 Subject: [PATCH 14/18] xfs: port xfs_swap_extents_rmap to our new code From: "Darrick J. Wong" To: darrick.wong@oracle.com Cc: linux-xfs@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-api@vger.kernel.org Date: Tue, 28 Apr 2020 19:45:45 -0700 Message-ID: <158812834534.168506.15707098363449442583.stgit@magnolia> In-Reply-To: <158812825316.168506.932540609191384366.stgit@magnolia> References: <158812825316.168506.932540609191384366.stgit@magnolia> User-Agent: StGit/0.17.1-dirty MIME-Version: 1.0 Sender: linux-fsdevel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List:
linux-fsdevel@vger.kernel.org From: Darrick J. Wong The inner loop of xfs_swap_extents_rmap does the same work as xfs_swapext_finish_one, so adapt it to use that. Doing so has the side benefit that the older code path no longer wastes its time remapping shared extents. Signed-off-by: Darrick J. Wong --- fs/xfs/libxfs/xfs_swapext.c | 46 +++++++++++++++ fs/xfs/libxfs/xfs_swapext.h | 5 ++ fs/xfs/xfs_bmap_util.c | 136 +++---------------------------------------- fs/xfs/xfs_trace.h | 5 -- 4 files changed, 60 insertions(+), 132 deletions(-) diff --git a/fs/xfs/libxfs/xfs_swapext.c b/fs/xfs/libxfs/xfs_swapext.c index 6597c613fa3e..64083d48fb7d 100644 --- a/fs/xfs/libxfs/xfs_swapext.c +++ b/fs/xfs/libxfs/xfs_swapext.c @@ -433,3 +433,49 @@ xfs_swapext_atomic( xfs_swapext_reflink_finish(*tpp, ip1, ip2, state); return 0; } + +/* + * Swap a range of extents from one inode to another, non-atomically. + * + * Use deferred bmap log items to swap a range of extents from one inode with + * another. Overall extent swap progress is /not/ tracked through the log, + * which means that while log recovery can finish remapping a single extent, + * it cannot finish the entire operation.
+ */ +int +xfs_swapext_deferred_bmap( + struct xfs_trans **tpp, + struct xfs_inode *ip1, + struct xfs_inode *ip2, + int whichfork, + xfs_fileoff_t startoff1, + xfs_fileoff_t startoff2, + xfs_filblks_t blockcount, + unsigned int flags) +{ + struct xfs_swapext_intent sxi; + unsigned int state; + int error; + + ASSERT(xfs_isilocked(ip1, XFS_ILOCK_EXCL)); + ASSERT(xfs_isilocked(ip2, XFS_ILOCK_EXCL)); + ASSERT(whichfork != XFS_COW_FORK); + + state = xfs_swapext_reflink_prep(ip1, ip2, whichfork, startoff1, + startoff2, blockcount); + + xfs_swapext_init_intent(&sxi, ip1, ip2, whichfork, startoff1, startoff2, + blockcount, flags); + + while (sxi.si_blockcount > 0) { + error = xfs_swapext_finish_one(*tpp, &sxi); + if (error) + return error; + error = xfs_defer_finish(tpp); + if (error) + return error; + } + + xfs_swapext_reflink_finish(*tpp, ip1, ip2, state); + return 0; +} diff --git a/fs/xfs/libxfs/xfs_swapext.h b/fs/xfs/libxfs/xfs_swapext.h index af1893f37d39..f4146f55a4c9 100644 --- a/fs/xfs/libxfs/xfs_swapext.h +++ b/fs/xfs/libxfs/xfs_swapext.h @@ -54,4 +54,9 @@ int xfs_swapext_atomic(struct xfs_trans **tpp, struct xfs_inode *ip1, xfs_fileoff_t startoff2, xfs_filblks_t blockcount, unsigned int flags); +int xfs_swapext_deferred_bmap(struct xfs_trans **tpp, struct xfs_inode *ip1, + struct xfs_inode *ip2, int whichfork, xfs_fileoff_t startoff1, + xfs_fileoff_t startoff2, xfs_filblks_t blockcount, + unsigned int flags); + #endif /* __XFS_SWAPEXT_H_ */ diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c index 72aebf7ed42d..d1351f0176a3 100644 --- a/fs/xfs/xfs_bmap_util.c +++ b/fs/xfs/xfs_bmap_util.c @@ -1351,131 +1351,6 @@ xfs_swap_extent_flush( return 0; } -/* - * Move extents from one file to another, when rmap is enabled. 
- */ -STATIC int -xfs_swap_extent_rmap( - struct xfs_trans **tpp, - struct xfs_inode *ip, - struct xfs_inode *tip) -{ - struct xfs_trans *tp = *tpp; - struct xfs_bmbt_irec irec; - struct xfs_bmbt_irec uirec; - struct xfs_bmbt_irec tirec; - xfs_fileoff_t offset_fsb; - xfs_fileoff_t end_fsb; - xfs_filblks_t count_fsb; - int error; - xfs_filblks_t ilen; - xfs_filblks_t rlen; - int nimaps; - uint64_t tip_flags2; - - /* - * If the source file has shared blocks, we must flag the donor - * file as having shared blocks so that we get the shared-block - * rmap functions when we go to fix up the rmaps. The flags - * will be switch for reals later. - */ - tip_flags2 = tip->i_d.di_flags2; - if (ip->i_d.di_flags2 & XFS_DIFLAG2_REFLINK) - tip->i_d.di_flags2 |= XFS_DIFLAG2_REFLINK; - - offset_fsb = 0; - end_fsb = XFS_B_TO_FSB(ip->i_mount, i_size_read(VFS_I(ip))); - count_fsb = (xfs_filblks_t)(end_fsb - offset_fsb); - - while (count_fsb) { - /* Read extent from the donor file */ - nimaps = 1; - error = xfs_bmapi_read(tip, offset_fsb, count_fsb, &tirec, - &nimaps, 0); - if (error) - goto out; - if (nimaps != 1 || tirec.br_startblock == DELAYSTARTBLOCK) { - /* - * We should never get no mapping or a delalloc extent - * since the donor file should have been flushed by the - * caller. - */ - ASSERT(0); - error = -EINVAL; - goto out; - } - - trace_xfs_swap_extent_rmap_remap(tip, &tirec); - ilen = tirec.br_blockcount; - - /* Unmap the old blocks in the source file. */ - while (tirec.br_blockcount) { - ASSERT(tp->t_firstblock == NULLFSBLOCK); - trace_xfs_swap_extent_rmap_remap_piece(tip, &tirec); - - /* Read extent from the source file */ - nimaps = 1; - error = xfs_bmapi_read(ip, tirec.br_startoff, - tirec.br_blockcount, &irec, - &nimaps, 0); - if (error) - goto out; - if (nimaps != 1 || - tirec.br_startoff != irec.br_startoff) { - /* - * We should never get no mapping or a mapping - * for another offset, but bail out if that - * ever does. 
- */ - ASSERT(0); - error = -EFSCORRUPTED; - goto out; - } - trace_xfs_swap_extent_rmap_remap_piece(ip, &irec); - - /* Trim the extent. */ - uirec = tirec; - uirec.br_blockcount = rlen = min_t(xfs_filblks_t, - tirec.br_blockcount, - irec.br_blockcount); - trace_xfs_swap_extent_rmap_remap_piece(tip, &uirec); - - /* Remove the mapping from the donor file. */ - xfs_bmap_unmap_extent(tp, tip, XFS_DATA_FORK, &uirec); - - /* Remove the mapping from the source file. */ - xfs_bmap_unmap_extent(tp, ip, XFS_DATA_FORK, &irec); - - /* Map the donor file's blocks into the source file. */ - xfs_bmap_map_extent(tp, ip, XFS_DATA_FORK, &uirec); - - /* Map the source file's blocks into the donor file. */ - xfs_bmap_map_extent(tp, tip, XFS_DATA_FORK, &irec); - - error = xfs_defer_finish(tpp); - tp = *tpp; - if (error) - goto out; - - tirec.br_startoff += rlen; - if (tirec.br_startblock != HOLESTARTBLOCK && - tirec.br_startblock != DELAYSTARTBLOCK) - tirec.br_startblock += rlen; - tirec.br_blockcount -= rlen; - } - - /* Roll on... */ - count_fsb -= ilen; - offset_fsb += ilen; - } - -out: - if (error) - trace_xfs_swap_extent_rmap_error(ip, error, _RET_IP_); - tip->i_d.di_flags2 = tip_flags2; - return error; -} - /* Swap the extents of two files by swapping data forks. */ STATIC int xfs_swap_extent_forks( @@ -1765,15 +1640,20 @@ xfs_swap_extents( target_log_flags = XFS_ILOG_CORE; if (xfs_sb_version_hasrmapbt(&mp->m_sb)) - error = xfs_swap_extent_rmap(&tp, ip, tip); + error = xfs_swapext_deferred_bmap(&tp, ip, tip, XFS_DATA_FORK, + 0, 0, XFS_B_TO_FSB(ip->i_mount, + i_size_read(VFS_I(ip))), 0); else error = xfs_swap_extent_forks(tp, ip, tip, &src_log_flags, &target_log_flags); - if (error) + if (error) { + trace_xfs_swap_extent_error(ip, error, _THIS_IP_); goto out_trans_cancel; + } /* Do we have to swap reflink flags? 
*/ - if ((ip->i_d.di_flags2 & XFS_DIFLAG2_REFLINK) ^ + if (!xfs_sb_version_hasrmapbt(&mp->m_sb) && + (ip->i_d.di_flags2 & XFS_DIFLAG2_REFLINK) ^ (tip->i_d.di_flags2 & XFS_DIFLAG2_REFLINK)) { f = ip->i_d.di_flags2 & XFS_DIFLAG2_REFLINK; ip->i_d.di_flags2 &= ~XFS_DIFLAG2_REFLINK; diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h index 7917203e56d4..306cf86c353d 100644 --- a/fs/xfs/xfs_trace.h +++ b/fs/xfs/xfs_trace.h @@ -3204,14 +3204,11 @@ DEFINE_INODE_ERROR_EVENT(xfs_reflink_end_cow_error); DEFINE_INODE_IREC_EVENT(xfs_reflink_cancel_cow); -/* rmap swapext tracepoints */ -DEFINE_INODE_IREC_EVENT(xfs_swap_extent_rmap_remap); -DEFINE_INODE_IREC_EVENT(xfs_swap_extent_rmap_remap_piece); -DEFINE_INODE_ERROR_EVENT(xfs_swap_extent_rmap_error); /* swapext tracepoints */ DEFINE_DOUBLE_IO_EVENT(xfs_file_swap_range); DEFINE_INODE_ERROR_EVENT(xfs_file_swap_range_error); +DEFINE_INODE_ERROR_EVENT(xfs_swap_extent_error); DEFINE_INODE_IREC_EVENT(xfs_swapext_extent1); DEFINE_INODE_IREC_EVENT(xfs_swapext_extent2); DEFINE_ITRUNC_EVENT(xfs_swapext_update_inode_size); From patchwork Wed Apr 29 02:45:53 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Darrick J. 
Wong" X-Patchwork-Id: 11515869 Subject: [PATCH 15/18] xfs: consolidate all of the xfs_swap_extent_forks code From: "Darrick J. Wong" To: darrick.wong@oracle.com Cc: linux-xfs@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-api@vger.kernel.org Date: Tue, 28 Apr 2020 19:45:53 -0700 Message-ID: <158812835295.168506.6384467972863200135.stgit@magnolia> In-Reply-To: <158812825316.168506.932540609191384366.stgit@magnolia> References: <158812825316.168506.932540609191384366.stgit@magnolia> User-Agent: StGit/0.17.1-dirty MIME-Version: 1.0 Sender: linux-fsdevel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List:
linux-fsdevel@vger.kernel.org From: Darrick J. Wong Consolidate the bmbt owner change scan code in xfs_swap_extent_forks, since it's not needed for the deferred bmap log item swapext implementation. The goal is to package up all three implementations into functions that have the same preconditions and leave the system in the same state. Signed-off-by: Darrick J. Wong --- fs/xfs/xfs_bmap_util.c | 211 +++++++++++++++++++++++------------------------- 1 file changed, 103 insertions(+), 108 deletions(-) diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c index d1351f0176a3..1767f1586c46 100644 --- a/fs/xfs/xfs_bmap_util.c +++ b/fs/xfs/xfs_bmap_util.c @@ -1351,19 +1351,61 @@ xfs_swap_extent_flush( return 0; } +/* + * Fix up the owners of the bmbt blocks to refer to the current inode. The + * change owner scan attempts to order all modified buffers in the current + * transaction. In the event of ordered buffer failure, the offending buffer is + * physically logged as a fallback and the scan returns -EAGAIN. We must roll + * the transaction in this case to replenish the fallback log reservation and + * restart the scan. This process repeats until the scan completes. + */ +static int +xfs_swap_change_owner( + struct xfs_trans **tpp, + struct xfs_inode *ip, + struct xfs_inode *tmpip) +{ + int error; + struct xfs_trans *tp = *tpp; + + do { + error = xfs_bmbt_change_owner(tp, ip, XFS_DATA_FORK, ip->i_ino, + NULL); + /* success or fatal error */ + if (error != -EAGAIN) + break; + + error = xfs_trans_roll(tpp); + if (error) + break; + tp = *tpp; + + /* + * Redirty both inodes so they can relog and keep the log tail + * moving forward. + */ + xfs_trans_ijoin(tp, ip, 0); + xfs_trans_ijoin(tp, tmpip, 0); + xfs_trans_log_inode(tp, ip, XFS_ILOG_CORE); + xfs_trans_log_inode(tp, tmpip, XFS_ILOG_CORE); + } while (true); + + return error; +} + /* Swap the extents of two files by swapping data forks. 
*/ STATIC int xfs_swap_extent_forks( - struct xfs_trans *tp, + struct xfs_trans **tpp, struct xfs_inode *ip, - struct xfs_inode *tip, - int *src_log_flags, - int *target_log_flags) + struct xfs_inode *tip) { xfs_filblks_t aforkblks = 0; xfs_filblks_t taforkblks = 0; xfs_extnum_t junk; uint64_t tmp; + int src_log_flags = XFS_ILOG_CORE; + int target_log_flags = XFS_ILOG_CORE; int error; /* @@ -1371,14 +1413,14 @@ xfs_swap_extent_forks( */ if ( ((XFS_IFORK_Q(ip) != 0) && (ip->i_d.di_anextents > 0)) && (ip->i_d.di_aformat != XFS_DINODE_FMT_LOCAL)) { - error = xfs_bmap_count_blocks(tp, ip, XFS_ATTR_FORK, &junk, + error = xfs_bmap_count_blocks(*tpp, ip, XFS_ATTR_FORK, &junk, &aforkblks); if (error) return error; } if ( ((XFS_IFORK_Q(tip) != 0) && (tip->i_d.di_anextents > 0)) && (tip->i_d.di_aformat != XFS_DINODE_FMT_LOCAL)) { - error = xfs_bmap_count_blocks(tp, tip, XFS_ATTR_FORK, &junk, + error = xfs_bmap_count_blocks(*tpp, tip, XFS_ATTR_FORK, &junk, &taforkblks); if (error) return error; @@ -1393,9 +1435,9 @@ xfs_swap_extent_forks( */ if (xfs_sb_version_has_v3inode(&ip->i_mount->m_sb)) { if (ip->i_d.di_format == XFS_DINODE_FMT_BTREE) - (*target_log_flags) |= XFS_ILOG_DOWNER; + target_log_flags |= XFS_ILOG_DOWNER; if (tip->i_d.di_format == XFS_DINODE_FMT_BTREE) - (*src_log_flags) |= XFS_ILOG_DOWNER; + src_log_flags |= XFS_ILOG_DOWNER; } /* @@ -1428,69 +1470,77 @@ xfs_swap_extent_forks( switch (ip->i_d.di_format) { case XFS_DINODE_FMT_EXTENTS: - (*src_log_flags) |= XFS_ILOG_DEXT; + src_log_flags |= XFS_ILOG_DEXT; break; case XFS_DINODE_FMT_BTREE: ASSERT(!xfs_sb_version_has_v3inode(&ip->i_mount->m_sb) || - (*src_log_flags & XFS_ILOG_DOWNER)); - (*src_log_flags) |= XFS_ILOG_DBROOT; + (src_log_flags & XFS_ILOG_DOWNER)); + src_log_flags |= XFS_ILOG_DBROOT; break; } switch (tip->i_d.di_format) { case XFS_DINODE_FMT_EXTENTS: - (*target_log_flags) |= XFS_ILOG_DEXT; + target_log_flags |= XFS_ILOG_DEXT; break; case XFS_DINODE_FMT_BTREE: - (*target_log_flags) |= XFS_ILOG_DBROOT; + 
target_log_flags |= XFS_ILOG_DBROOT; ASSERT(!xfs_sb_version_has_v3inode(&ip->i_mount->m_sb) || - (*target_log_flags & XFS_ILOG_DOWNER)); + (target_log_flags & XFS_ILOG_DOWNER)); break; } - return 0; -} + /* Do we have to swap reflink flags? */ + if ((ip->i_d.di_flags2 & XFS_DIFLAG2_REFLINK) ^ + (tip->i_d.di_flags2 & XFS_DIFLAG2_REFLINK)) { + uint64_t f; -/* - * Fix up the owners of the bmbt blocks to refer to the current inode. The - * change owner scan attempts to order all modified buffers in the current - * transaction. In the event of ordered buffer failure, the offending buffer is - * physically logged as a fallback and the scan returns -EAGAIN. We must roll - * the transaction in this case to replenish the fallback log reservation and - * restart the scan. This process repeats until the scan completes. - */ -static int -xfs_swap_change_owner( - struct xfs_trans **tpp, - struct xfs_inode *ip, - struct xfs_inode *tmpip) -{ - int error; - struct xfs_trans *tp = *tpp; + f = ip->i_d.di_flags2 & XFS_DIFLAG2_REFLINK; + ip->i_d.di_flags2 &= ~XFS_DIFLAG2_REFLINK; + ip->i_d.di_flags2 |= tip->i_d.di_flags2 & XFS_DIFLAG2_REFLINK; + tip->i_d.di_flags2 &= ~XFS_DIFLAG2_REFLINK; + tip->i_d.di_flags2 |= f & XFS_DIFLAG2_REFLINK; + } - do { - error = xfs_bmbt_change_owner(tp, ip, XFS_DATA_FORK, ip->i_ino, - NULL); - /* success or fatal error */ - if (error != -EAGAIN) - break; + /* Swap the cow forks. */ + if (xfs_sb_version_hasreflink(&ip->i_mount->m_sb)) { + ASSERT(ip->i_cformat == XFS_DINODE_FMT_EXTENTS); + ASSERT(tip->i_cformat == XFS_DINODE_FMT_EXTENTS); - error = xfs_trans_roll(tpp); - if (error) - break; - tp = *tpp; + swap(ip->i_cnextents, tip->i_cnextents); + swap(ip->i_cowfp, tip->i_cowfp); - /* - * Redirty both inodes so they can relog and keep the log tail - * moving forward. 
- */ - xfs_trans_ijoin(tp, ip, 0); - xfs_trans_ijoin(tp, tmpip, 0); - xfs_trans_log_inode(tp, ip, XFS_ILOG_CORE); - xfs_trans_log_inode(tp, tmpip, XFS_ILOG_CORE); - } while (true); + if (ip->i_cowfp && ip->i_cowfp->if_bytes) + xfs_inode_set_cowblocks_tag(ip); + else + xfs_inode_clear_cowblocks_tag(ip); + if (tip->i_cowfp && tip->i_cowfp->if_bytes) + xfs_inode_set_cowblocks_tag(tip); + else + xfs_inode_clear_cowblocks_tag(tip); + } - return error; + xfs_trans_log_inode(*tpp, ip, src_log_flags); + xfs_trans_log_inode(*tpp, tip, target_log_flags); + + /* + * The extent forks have been swapped, but crc=1,rmapbt=0 filesystems + * have inode number owner values in the bmbt blocks that still refer to + * the old inode. Scan each bmbt to fix up the owner values with the + * inode number of the current inode. + */ + if (src_log_flags & XFS_ILOG_DOWNER) { + error = xfs_swap_change_owner(tpp, ip, tip); + if (error) + return error; + } + if (target_log_flags & XFS_ILOG_DOWNER) { + error = xfs_swap_change_owner(tpp, tip, ip); + if (error) + return error; + } + + return 0; } int @@ -1502,10 +1552,8 @@ xfs_swap_extents( struct xfs_mount *mp = ip->i_mount; struct xfs_trans *tp; struct xfs_bstat *sbp = &sxp->sx_stat; - int src_log_flags, target_log_flags; int error = 0; int lock_flags; - uint64_t f; int resblks = 0; /* @@ -1636,70 +1684,17 @@ xfs_swap_extents( * recovery is going to see the fork as owned by the swapped inode, * not the pre-swapped inodes. */ - src_log_flags = XFS_ILOG_CORE; - target_log_flags = XFS_ILOG_CORE; - if (xfs_sb_version_hasrmapbt(&mp->m_sb)) error = xfs_swapext_deferred_bmap(&tp, ip, tip, XFS_DATA_FORK, 0, 0, XFS_B_TO_FSB(ip->i_mount, i_size_read(VFS_I(ip))), 0); else - error = xfs_swap_extent_forks(tp, ip, tip, &src_log_flags, - &target_log_flags); + error = xfs_swap_extent_forks(&tp, ip, tip); if (error) { trace_xfs_swap_extent_error(ip, error, _THIS_IP_); goto out_trans_cancel; } - /* Do we have to swap reflink flags? 
*/ - if (!xfs_sb_version_hasrmapbt(&mp->m_sb) && - (ip->i_d.di_flags2 & XFS_DIFLAG2_REFLINK) ^ - (tip->i_d.di_flags2 & XFS_DIFLAG2_REFLINK)) { - f = ip->i_d.di_flags2 & XFS_DIFLAG2_REFLINK; - ip->i_d.di_flags2 &= ~XFS_DIFLAG2_REFLINK; - ip->i_d.di_flags2 |= tip->i_d.di_flags2 & XFS_DIFLAG2_REFLINK; - tip->i_d.di_flags2 &= ~XFS_DIFLAG2_REFLINK; - tip->i_d.di_flags2 |= f & XFS_DIFLAG2_REFLINK; - } - - /* Swap the cow forks. */ - if (xfs_sb_version_hasreflink(&mp->m_sb)) { - ASSERT(ip->i_cformat == XFS_DINODE_FMT_EXTENTS); - ASSERT(tip->i_cformat == XFS_DINODE_FMT_EXTENTS); - - swap(ip->i_cnextents, tip->i_cnextents); - swap(ip->i_cowfp, tip->i_cowfp); - - if (ip->i_cowfp && ip->i_cowfp->if_bytes) - xfs_inode_set_cowblocks_tag(ip); - else - xfs_inode_clear_cowblocks_tag(ip); - if (tip->i_cowfp && tip->i_cowfp->if_bytes) - xfs_inode_set_cowblocks_tag(tip); - else - xfs_inode_clear_cowblocks_tag(tip); - } - - xfs_trans_log_inode(tp, ip, src_log_flags); - xfs_trans_log_inode(tp, tip, target_log_flags); - - /* - * The extent forks have been swapped, but crc=1,rmapbt=0 filesystems - * have inode number owner values in the bmbt blocks that still refer to - * the old inode. Scan each bmbt to fix up the owner values with the - * inode number of the current inode. - */ - if (src_log_flags & XFS_ILOG_DOWNER) { - error = xfs_swap_change_owner(&tp, ip, tip); - if (error) - goto out_trans_cancel; - } - if (target_log_flags & XFS_ILOG_DOWNER) { - error = xfs_swap_change_owner(&tp, tip, ip); - if (error) - goto out_trans_cancel; - } - /* * If this is a synchronous mount, make sure that the * transaction goes to disk before returning to the user. From patchwork Wed Apr 29 02:45:59 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Darrick J. 
Wong" X-Patchwork-Id: 11515873 Subject: [PATCH 16/18] xfs: refactor reflink flag handling in xfs_swap_extent_forks From: "Darrick J. Wong" To: darrick.wong@oracle.com Cc: linux-xfs@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-api@vger.kernel.org Date: Tue, 28 Apr 2020 19:45:59 -0700 Message-ID: <158812835926.168506.1546972667821869337.stgit@magnolia> In-Reply-To: <158812825316.168506.932540609191384366.stgit@magnolia> References: <158812825316.168506.932540609191384366.stgit@magnolia> User-Agent: StGit/0.17.1-dirty MIME-Version: 1.0 Sender: linux-fsdevel-owner@vger.kernel.org Precedence: bulk List-ID:
X-Mailing-List: linux-fsdevel@vger.kernel.org From: Darrick J. Wong Refactor the old data fork swap function to use the new reflink flag helpers to propagate reflink flags between the two files. Signed-off-by: Darrick J. Wong --- fs/xfs/xfs_bmap_util.c | 34 +++++----------------------------- 1 file changed, 5 insertions(+), 29 deletions(-) diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c index 1767f1586c46..639b42b1d568 100644 --- a/fs/xfs/xfs_bmap_util.c +++ b/fs/xfs/xfs_bmap_util.c @@ -1404,10 +1404,14 @@ xfs_swap_extent_forks( xfs_filblks_t taforkblks = 0; xfs_extnum_t junk; uint64_t tmp; + unsigned int state; int src_log_flags = XFS_ILOG_CORE; int target_log_flags = XFS_ILOG_CORE; int error; + state = xfs_swapext_reflink_prep(ip, tip, XFS_DATA_FORK, 0, 0, + XFS_B_TO_FSB(ip->i_mount, i_size_read(VFS_I(ip)))); + /* * Count the number of extended attribute blocks */ @@ -1490,35 +1494,7 @@ xfs_swap_extent_forks( break; } - /* Do we have to swap reflink flags? */ - if ((ip->i_d.di_flags2 & XFS_DIFLAG2_REFLINK) ^ - (tip->i_d.di_flags2 & XFS_DIFLAG2_REFLINK)) { - uint64_t f; - - f = ip->i_d.di_flags2 & XFS_DIFLAG2_REFLINK; - ip->i_d.di_flags2 &= ~XFS_DIFLAG2_REFLINK; - ip->i_d.di_flags2 |= tip->i_d.di_flags2 & XFS_DIFLAG2_REFLINK; - tip->i_d.di_flags2 &= ~XFS_DIFLAG2_REFLINK; - tip->i_d.di_flags2 |= f & XFS_DIFLAG2_REFLINK; - } - - /* Swap the cow forks. 
*/ - if (xfs_sb_version_hasreflink(&ip->i_mount->m_sb)) { - ASSERT(ip->i_cformat == XFS_DINODE_FMT_EXTENTS); - ASSERT(tip->i_cformat == XFS_DINODE_FMT_EXTENTS); - - swap(ip->i_cnextents, tip->i_cnextents); - swap(ip->i_cowfp, tip->i_cowfp); - - if (ip->i_cowfp && ip->i_cowfp->if_bytes) - xfs_inode_set_cowblocks_tag(ip); - else - xfs_inode_clear_cowblocks_tag(ip); - if (tip->i_cowfp && tip->i_cowfp->if_bytes) - xfs_inode_set_cowblocks_tag(tip); - else - xfs_inode_clear_cowblocks_tag(tip); - } + xfs_swapext_reflink_finish(*tpp, ip, tip, state); xfs_trans_log_inode(*tpp, ip, src_log_flags); xfs_trans_log_inode(*tpp, tip, target_log_flags); From patchwork Wed Apr 29 02:46:05 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Darrick J. Wong" X-Patchwork-Id: 11515877 Subject: [PATCH 17/18] xfs: remove old swap extents implementation From: "Darrick J. Wong" To: darrick.wong@oracle.com Cc: linux-xfs@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-api@vger.kernel.org Date: Tue, 28 Apr 2020 19:46:05 -0700 Message-ID: <158812836551.168506.6339048941371563780.stgit@magnolia> In-Reply-To: <158812825316.168506.932540609191384366.stgit@magnolia> References: <158812825316.168506.932540609191384366.stgit@magnolia> User-Agent: StGit/0.17.1-dirty MIME-Version: 1.0 Sender: linux-fsdevel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org From: Darrick J. Wong Migrate the old XFS_IOC_SWAPEXT implementation to use our shiny new one. Signed-off-by: Darrick J.
Wong --- fs/xfs/xfs_bmap_util.c | 193 ------------------------------------------------ fs/xfs/xfs_bmap_util.h | 2 fs/xfs/xfs_ioctl.c | 108 ++++++++------------------- 3 files changed, 32 insertions(+), 271 deletions(-) diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c index 639b42b1d568..df373107e782 100644 --- a/fs/xfs/xfs_bmap_util.c +++ b/fs/xfs/xfs_bmap_util.c @@ -1334,23 +1334,6 @@ xfs_swap_extents_check_format( return 0; } -static int -xfs_swap_extent_flush( - struct xfs_inode *ip) -{ - int error; - - error = filemap_write_and_wait(VFS_I(ip)->i_mapping); - if (error) - return error; - truncate_pagecache_range(VFS_I(ip), 0, -1); - - /* Verify O_DIRECT for ftmp */ - if (VFS_I(ip)->i_mapping->nrpages) - return -EINVAL; - return 0; -} - /* * Fix up the owners of the bmbt blocks to refer to the current inode. The * change owner scan attempts to order all modified buffers in the current @@ -1519,181 +1502,6 @@ xfs_swap_extent_forks( return 0; } -int -xfs_swap_extents( - struct xfs_inode *ip, /* target inode */ - struct xfs_inode *tip, /* tmp inode */ - struct xfs_swapext *sxp) -{ - struct xfs_mount *mp = ip->i_mount; - struct xfs_trans *tp; - struct xfs_bstat *sbp = &sxp->sx_stat; - int error = 0; - int lock_flags; - int resblks = 0; - - /* - * Lock the inodes against other IO, page faults and truncate to - * begin with. Then we can ensure the inodes are flushed and have no - * page cache safely. Once we have done this we can take the ilocks and - * do the rest of the checks. 
- */ - lock_two_nondirectories(VFS_I(ip), VFS_I(tip)); - lock_flags = XFS_MMAPLOCK_EXCL; - xfs_lock_two_inodes(ip, XFS_MMAPLOCK_EXCL, tip, XFS_MMAPLOCK_EXCL); - - /* Verify that both files have the same format */ - if ((VFS_I(ip)->i_mode & S_IFMT) != (VFS_I(tip)->i_mode & S_IFMT)) { - error = -EINVAL; - goto out_unlock; - } - - /* Verify both files are either real-time or non-realtime */ - if (XFS_IS_REALTIME_INODE(ip) != XFS_IS_REALTIME_INODE(tip)) { - error = -EINVAL; - goto out_unlock; - } - - error = xfs_qm_dqattach(ip); - if (error) - goto out_unlock; - - error = xfs_qm_dqattach(tip); - if (error) - goto out_unlock; - - error = xfs_swap_extent_flush(ip); - if (error) - goto out_unlock; - error = xfs_swap_extent_flush(tip); - if (error) - goto out_unlock; - - if (xfs_inode_has_cow_data(tip)) { - error = xfs_reflink_cancel_cow_range(tip, 0, NULLFILEOFF, true); - if (error) - goto out_unlock; - } - - /* - * Extent "swapping" with rmap requires a permanent reservation and - * a block reservation because it's really just a remap operation - * performed with log redo items! - */ - if (xfs_sb_version_hasrmapbt(&mp->m_sb)) { - int w = XFS_DATA_FORK; - uint32_t ipnext = XFS_IFORK_NEXTENTS(ip, w); - uint32_t tipnext = XFS_IFORK_NEXTENTS(tip, w); - - /* - * Conceptually this shouldn't affect the shape of either bmbt, - * but since we atomically move extents one by one, we reserve - * enough space to rebuild both trees. - */ - resblks = XFS_SWAP_RMAP_SPACE_RES(mp, ipnext, w); - resblks += XFS_SWAP_RMAP_SPACE_RES(mp, tipnext, w); - - /* - * Handle the corner case where either inode might straddle the - * btree format boundary. If so, the inode could bounce between - * btree <-> extent format on unmap -> remap cycles, freeing and - * allocating a bmapbt block each time. 
- */ - if (ipnext == (XFS_IFORK_MAXEXT(ip, w) + 1)) - resblks += XFS_IFORK_MAXEXT(ip, w); - if (tipnext == (XFS_IFORK_MAXEXT(tip, w) + 1)) - resblks += XFS_IFORK_MAXEXT(tip, w); - } - error = xfs_trans_alloc(mp, &M_RES(mp)->tr_write, resblks, 0, 0, &tp); - if (error) - goto out_unlock; - - /* - * Lock and join the inodes to the tansaction so that transaction commit - * or cancel will unlock the inodes from this point onwards. - */ - xfs_lock_two_inodes(ip, XFS_ILOCK_EXCL, tip, XFS_ILOCK_EXCL); - lock_flags |= XFS_ILOCK_EXCL; - xfs_trans_ijoin(tp, ip, 0); - xfs_trans_ijoin(tp, tip, 0); - - - /* Verify all data are being swapped */ - if (sxp->sx_offset != 0 || - sxp->sx_length != ip->i_d.di_size || - sxp->sx_length != tip->i_d.di_size) { - error = -EFAULT; - goto out_trans_cancel; - } - - trace_xfs_swap_extent_before(ip, 0); - trace_xfs_swap_extent_before(tip, 1); - - /* check inode formats now that data is flushed */ - error = xfs_swap_extents_check_format(ip, tip); - if (error) { - xfs_notice(mp, - "%s: inode 0x%llx format is incompatible for exchanging.", - __func__, ip->i_ino); - goto out_trans_cancel; - } - - /* - * Compare the current change & modify times with that - * passed in. If they differ, we abort this swap. - * This is the mechanism used to ensure the calling - * process that the file was not changed out from - * under it. - */ - if ((sbp->bs_ctime.tv_sec != VFS_I(ip)->i_ctime.tv_sec) || - (sbp->bs_ctime.tv_nsec != VFS_I(ip)->i_ctime.tv_nsec) || - (sbp->bs_mtime.tv_sec != VFS_I(ip)->i_mtime.tv_sec) || - (sbp->bs_mtime.tv_nsec != VFS_I(ip)->i_mtime.tv_nsec)) { - error = -EBUSY; - goto out_trans_cancel; - } - - /* - * Note the trickiness in setting the log flags - we set the owner log - * flag on the opposite inode (i.e. the inode we are setting the new - * owner to be) because once we swap the forks and log that, log - * recovery is going to see the fork as owned by the swapped inode, - * not the pre-swapped inodes. 
- */ - if (xfs_sb_version_hasrmapbt(&mp->m_sb)) - error = xfs_swapext_deferred_bmap(&tp, ip, tip, XFS_DATA_FORK, - 0, 0, XFS_B_TO_FSB(ip->i_mount, - i_size_read(VFS_I(ip))), 0); - else - error = xfs_swap_extent_forks(&tp, ip, tip); - if (error) { - trace_xfs_swap_extent_error(ip, error, _THIS_IP_); - goto out_trans_cancel; - } - - /* - * If this is a synchronous mount, make sure that the - * transaction goes to disk before returning to the user. - */ - if (mp->m_flags & XFS_MOUNT_WSYNC) - xfs_trans_set_sync(tp); - - error = xfs_trans_commit(tp); - - trace_xfs_swap_extent_after(ip, 0); - trace_xfs_swap_extent_after(tip, 1); - -out_unlock: - xfs_iunlock(ip, lock_flags); - xfs_iunlock(tip, lock_flags); - unlock_two_nondirectories(VFS_I(ip), VFS_I(tip)); - return error; - -out_trans_cancel: - xfs_trans_cancel(tp); - goto out_unlock; -} - /* Prepare two files to have their data swapped. */ int xfs_swap_range_prep( @@ -2061,4 +1869,3 @@ xfs_swap_range( xfs_trans_cancel(tp); goto out_unlock; } - diff --git a/fs/xfs/xfs_bmap_util.h b/fs/xfs/xfs_bmap_util.h index d3444a63bbd7..e0712c274dd2 100644 --- a/fs/xfs/xfs_bmap_util.h +++ b/fs/xfs/xfs_bmap_util.h @@ -66,8 +66,6 @@ int xfs_insert_file_space(struct xfs_inode *, xfs_off_t offset, bool xfs_can_free_eofblocks(struct xfs_inode *ip, bool force); int xfs_free_eofblocks(struct xfs_inode *ip); -int xfs_swap_extents(struct xfs_inode *ip, struct xfs_inode *tip, - struct xfs_swapext *sx); int xfs_swap_range_prep(struct file *file1, struct file *file2, struct file_swap_range *fsr); int xfs_swap_range(struct xfs_inode *ip1, struct xfs_inode *ip2, diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c index 274423ba3bb5..f93de4f7a944 100644 --- a/fs/xfs/xfs_ioctl.c +++ b/fs/xfs/xfs_ioctl.c @@ -1864,81 +1864,47 @@ xfs_ioc_scrub_metadata( int xfs_ioc_swapext( - xfs_swapext_t *sxp) + struct xfs_swapext __user *arg) { - xfs_inode_t *ip, *tip; - struct fd f, tmp; - int error = 0; + struct xfs_swapext sx; + struct file_swap_range fsr = { 0 
}; + struct fd fd2, fd1; + int error = 0; - /* Pull information for the target fd */ - f = fdget((int)sxp->sx_fdtarget); - if (!f.file) { - error = -EINVAL; - goto out; - } + if (copy_from_user(&sx, arg, sizeof(struct xfs_swapext))) + return -EFAULT; - if (!(f.file->f_mode & FMODE_WRITE) || - !(f.file->f_mode & FMODE_READ) || - (f.file->f_flags & O_APPEND)) { - error = -EBADF; - goto out_put_file; - } + fd2 = fdget((int)sx.sx_fdtarget); + if (!fd2.file) + return -EINVAL; - tmp = fdget((int)sxp->sx_fdtmp); - if (!tmp.file) { + fd1 = fdget((int)sx.sx_fdtmp); + if (!fd1.file) { error = -EINVAL; - goto out_put_file; + goto dest_fdput; } - if (!(tmp.file->f_mode & FMODE_WRITE) || - !(tmp.file->f_mode & FMODE_READ) || - (tmp.file->f_flags & O_APPEND)) { - error = -EBADF; - goto out_put_tmp_file; - } + fsr.file1_fd = sx.sx_fdtmp; + fsr.length = sx.sx_length; + fsr.flags = FILE_SWAP_RANGE_NONATOMIC | FILE_SWAP_RANGE_FILE2_FRESH | + FILE_SWAP_RANGE_FULL_FILES; + fsr.file2_ino = sx.sx_stat.bs_ino; + fsr.file2_mtime = sx.sx_stat.bs_mtime.tv_sec; + fsr.file2_ctime = sx.sx_stat.bs_ctime.tv_sec; + fsr.file2_mtime_nsec = sx.sx_stat.bs_mtime.tv_nsec; + fsr.file2_ctime_nsec = sx.sx_stat.bs_ctime.tv_nsec; - if (IS_SWAPFILE(file_inode(f.file)) || - IS_SWAPFILE(file_inode(tmp.file))) { - error = -EINVAL; - goto out_put_tmp_file; - } + error = vfs_swap_file_range(fd1.file, fd2.file, &fsr); /* - * We need to ensure that the fds passed in point to XFS inodes - * before we cast and access them as XFS structures as we have no - * control over what the user passes us here. + * The old implementation returned EFAULT if the swap range was not + * the entirety of both files. 
*/ - if (f.file->f_op != &xfs_file_operations || - tmp.file->f_op != &xfs_file_operations) { - error = -EINVAL; - goto out_put_tmp_file; - } - - ip = XFS_I(file_inode(f.file)); - tip = XFS_I(file_inode(tmp.file)); - - if (ip->i_mount != tip->i_mount) { - error = -EINVAL; - goto out_put_tmp_file; - } - - if (ip->i_ino == tip->i_ino) { - error = -EINVAL; - goto out_put_tmp_file; - } - - if (XFS_FORCED_SHUTDOWN(ip->i_mount)) { - error = -EIO; - goto out_put_tmp_file; - } - - error = xfs_swap_extents(ip, tip, sxp); - - out_put_tmp_file: - fdput(tmp); - out_put_file: - fdput(f); - out: + if (error == -EDOM) + error = -EFAULT; + fdput(fd1); +dest_fdput: + fdput(fd2); return error; } @@ -2183,18 +2149,8 @@ xfs_file_ioctl( case XFS_IOC_ATTRMULTI_BY_HANDLE: return xfs_attrmulti_by_handle(filp, arg); - case XFS_IOC_SWAPEXT: { - struct xfs_swapext sxp; - - if (copy_from_user(&sxp, arg, sizeof(xfs_swapext_t))) - return -EFAULT; - error = mnt_want_write_file(filp); - if (error) - return error; - error = xfs_ioc_swapext(&sxp); - mnt_drop_write_file(filp); - return error; - } + case XFS_IOC_SWAPEXT: + return xfs_ioc_swapext(arg); case XFS_IOC_FSCOUNTS: { xfs_fsop_counts_t out; From patchwork Wed Apr 29 02:46:11 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Darrick J. 
Wong" X-Patchwork-Id: 11515881 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 5017017EF for ; Wed, 29 Apr 2020 02:46:17 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 375D22076A for ; Wed, 29 Apr 2020 02:46:17 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=oracle.com header.i=@oracle.com header.b="xc+PP61x" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726740AbgD2CqQ (ORCPT ); Tue, 28 Apr 2020 22:46:16 -0400 Received: from userp2120.oracle.com ([156.151.31.85]:39362 "EHLO userp2120.oracle.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726422AbgD2CqP (ORCPT ); Tue, 28 Apr 2020 22:46:15 -0400 Received: from pps.filterd (userp2120.oracle.com [127.0.0.1]) by userp2120.oracle.com (8.16.0.42/8.16.0.42) with SMTP id 03T2hjcc121531; Wed, 29 Apr 2020 02:46:14 GMT DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=oracle.com; h=subject : from : to : cc : date : message-id : in-reply-to : references : mime-version : content-type : content-transfer-encoding; s=corp-2020-01-29; bh=gIeAZYkuVVDyweFavOri/H5uxFdFvLW4QIrNOv2+naM=; b=xc+PP61xi/bU+9667oXO9wieCTD0n7tfrhIMeBqPrGaZMc/SdD6nPvwqt+MXPb6LZSjw k5U6SEkwjpqc2+vihk2K+srxcb9AIedEL+LUDySVXfi3deNIgHClwIwFDkRMxSdwwftN kW2P0hO4mpyhw269M8sjBbIslT+DhYA10LM1jFXSJOSySSSRRJogJLx3TTfyXacNzJhh Vo39gBKykL3rhWUIfmKnJNm7tlft9qHpIPAvnQnHnXKvUIeICBOys0+Ehb+V1cCT296u wIDovXlyg/ycd7bWSN4crd+bBENVcdNmcl/7Dr8vUVeBiTGfvgvNUI2P8OzJeRYXmnh8 8A== Received: from userp3020.oracle.com (userp3020.oracle.com [156.151.31.79]) by userp2120.oracle.com with ESMTP id 30p2p08p49-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Wed, 29 Apr 2020 02:46:14 +0000 Received: from pps.filterd (userp3020.oracle.com [127.0.0.1]) by userp3020.oracle.com 
(8.16.0.42/8.16.0.42) with SMTP id 03T2fu9K096183; Wed, 29 Apr 2020 02:46:14 GMT Received: from aserv0122.oracle.com (aserv0122.oracle.com [141.146.126.236]) by userp3020.oracle.com with ESMTP id 30pvcytegs-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Wed, 29 Apr 2020 02:46:14 +0000 Received: from abhmp0016.oracle.com (abhmp0016.oracle.com [141.146.116.22]) by aserv0122.oracle.com (8.14.4/8.14.4) with ESMTP id 03T2kD1S004048; Wed, 29 Apr 2020 02:46:13 GMT Received: from localhost (/67.169.218.210) by default (Oracle Beehive Gateway v4.0) with ESMTP ; Tue, 28 Apr 2020 19:46:12 -0700 Subject: [PATCH 18/18] xfs: fix quota accounting in the old fork swap code From: "Darrick J. Wong" To: darrick.wong@oracle.com Cc: linux-xfs@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-api@vger.kernel.org Date: Tue, 28 Apr 2020 19:46:11 -0700 Message-ID: <158812837189.168506.4738302671249782594.stgit@magnolia> In-Reply-To: <158812825316.168506.932540609191384366.stgit@magnolia> References: <158812825316.168506.932540609191384366.stgit@magnolia> User-Agent: StGit/0.17.1-dirty MIME-Version: 1.0 X-Proofpoint-Virus-Version: vendor=nai engine=6000 definitions=9605 signatures=668686 X-Proofpoint-Spam-Details: rule=notspam policy=default score=0 spamscore=0 mlxlogscore=999 suspectscore=1 malwarescore=0 adultscore=0 bulkscore=0 phishscore=0 mlxscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2003020000 definitions=main-2004290020 X-Proofpoint-Virus-Version: vendor=nai engine=6000 definitions=9605 signatures=668686 X-Proofpoint-Spam-Details: rule=notspam policy=default score=0 spamscore=0 phishscore=0 clxscore=1015 bulkscore=0 adultscore=0 lowpriorityscore=0 impostorscore=0 malwarescore=0 mlxscore=0 suspectscore=1 mlxlogscore=999 priorityscore=1501 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2003020000 definitions=main-2004290020 Sender: linux-fsdevel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: 
linux-fsdevel@vger.kernel.org From: Darrick J. Wong The old fork swapping code doesn't change quota counts when it swaps data forks. Fix it to do that. Signed-off-by: Darrick J. Wong --- fs/xfs/xfs_bmap_util.c | 10 ++++++++++ 1 file changed, 10 insertions(+) diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c index df373107e782..de6d1747a3fa 100644 --- a/fs/xfs/xfs_bmap_util.c +++ b/fs/xfs/xfs_bmap_util.c @@ -1386,6 +1386,7 @@ xfs_swap_extent_forks( xfs_filblks_t aforkblks = 0; xfs_filblks_t taforkblks = 0; xfs_extnum_t junk; + int64_t temp_blks; uint64_t tmp; unsigned int state; int src_log_flags = XFS_ILOG_CORE; @@ -1432,6 +1433,15 @@ xfs_swap_extent_forks( */ swap(ip->i_df, tip->i_df); + /* Update quota accounting. */ + temp_blks = tip->i_d.di_nblocks - taforkblks + aforkblks; + xfs_trans_mod_dquot_byino(*tpp, ip, XFS_TRANS_DQ_BCOUNT, + temp_blks - ip->i_d.di_nblocks); + + temp_blks = ip->i_d.di_nblocks + taforkblks - aforkblks; + xfs_trans_mod_dquot_byino(*tpp, tip, XFS_TRANS_DQ_BCOUNT, + temp_blks - tip->i_d.di_nblocks); + /* * Fix the on-disk inode values */
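The arithmetic in the quota hunk above can be checked in isolation. What follows is a minimal userspace sketch, not kernel code: struct blk_counts, struct quota_deltas and swap_quota_deltas() are hypothetical names invented for illustration, and the sketch assumes that nblocks plays the role of i_d.di_nblocks while attr_blocks plays the role of aforkblks/taforkblks. It computes the two deltas that the patch passes as the last argument of xfs_trans_mod_dquot_byino(), on the reading that only the data forks change hands and each inode keeps its own attr fork blocks.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the per-inode counters the patch reads. */
struct blk_counts {
	int64_t nblocks;	/* all blocks owned, like i_d.di_nblocks */
	int64_t attr_blocks;	/* attr fork blocks, like aforkblks/taforkblks */
};

/* Hypothetical result type: the BCOUNT adjustment for each inode. */
struct quota_deltas {
	int64_t ip_delta;
	int64_t tip_delta;
};

/*
 * Mirror the two temp_blks computations from the hunk: each inode ends up
 * with the other inode's data fork blocks plus its own attr fork blocks,
 * and the delta is that new total minus the current total.
 */
struct quota_deltas
swap_quota_deltas(struct blk_counts ip, struct blk_counts tip)
{
	struct quota_deltas d;
	int64_t temp_blks;

	/* New total for ip: tip's data blocks plus ip's attr blocks. */
	temp_blks = tip.nblocks - tip.attr_blocks + ip.attr_blocks;
	d.ip_delta = temp_blks - ip.nblocks;

	/* New total for tip: ip's data blocks plus tip's attr blocks. */
	temp_blks = ip.nblocks + tip.attr_blocks - ip.attr_blocks;
	d.tip_delta = temp_blks - tip.nblocks;

	return d;
}
```

With ip holding 100 blocks (4 of them attr) and tip holding 60 blocks (1 of them attr), the sketch gives -37 for ip and +37 for tip. The two deltas always sum to zero, which is what one would expect if the swap only moves blocks between the two owners and neither allocates nor frees any.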