From patchwork Fri May 22 02:53:01 2020
Subject: [PATCH 1/4] xfs: don't fail unwritten extent conversion on writeback due to edquot
From: "Darrick J. Wong"
To: darrick.wong@oracle.com
Cc: Brian Foster, Christoph Hellwig, linux-xfs@vger.kernel.org, hch@infradead.org, bfoster@redhat.com
Date: Thu, 21 May 2020 19:53:01 -0700
Message-ID: <159011598100.76931.14459764090448034071.stgit@magnolia>

During writeback, it's possible for the quota block reservation in
xfs_iomap_write_unwritten to fail with EDQUOT because we hit the quota
limit. This causes writeback errors for data that was already written to
disk, when it's not even guaranteed that the bmbt will expand to exceed
the quota limit. Irritatingly, this condition is reported to userspace
as EIO by fsync, which is confusing.

We wrote the data, so allow the reservation. That might put us slightly
above the hard limit, but it's better than losing data after a write.

Signed-off-by: Darrick J. Wong
Reviewed-by: Brian Foster
Reviewed-by: Christoph Hellwig
---
 fs/xfs/xfs_iomap.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
index bb590a267a7f..ac970b13b1f8 100644
--- a/fs/xfs/xfs_iomap.c
+++ b/fs/xfs/xfs_iomap.c
@@ -563,7 +563,7 @@ xfs_iomap_write_unwritten(
 		xfs_trans_ijoin(tp, ip, 0);
 
 		error = xfs_trans_reserve_quota_nblks(tp, ip, resblks, 0,
-				XFS_QMOPT_RES_REGBLKS);
+				XFS_QMOPT_RES_REGBLKS | XFS_QMOPT_FORCE_RES);
 		if (error)
 			goto error_on_bmapi_transaction;
 
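The effect of this one-line change can be modelled in userspace. The sketch below is not XFS code and is not part of the patch: `struct quota_sketch` and `quota_reserve_sketch()` are hypothetical, the model tracks only a single hard block limit (no soft limits, grace timers or per-transaction accounting), and the `force` flag merely stands in for what `XFS_QMOPT_FORCE_RES` asks `xfs_trans_reserve_quota_nblks()` to do. It shows why a forced reservation lets the unwritten extent conversion proceed, at the cost of briefly overshooting the limit.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified quota record; not the kernel's struct xfs_dquot. */
struct quota_sketch {
	uint64_t hard_limit;	/* block hard limit, 0 means no limit */
	uint64_t count;		/* blocks in use plus blocks reserved */
};

/*
 * Reserve nblks blocks. Without force, a reservation that would push
 * count past hard_limit fails with -EDQUOT and changes nothing. With
 * force, the limit check is skipped (the role XFS_QMOPT_FORCE_RES plays
 * here), so the count may end up above the hard limit.
 */
int quota_reserve_sketch(struct quota_sketch *q, uint64_t nblks, bool force)
{
	assert(q != NULL);
	if (!force && q->hard_limit != 0 && q->count + nblks > q->hard_limit)
		return -EDQUOT;
	q->count += nblks;
	return 0;
}
```

With 99 of 100 blocks charged, an unforced 4-block reservation fails and leaves the count unchanged, while a forced one succeeds and leaves the user at 103 blocks, slightly over the hard limit, as the commit message anticipates.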

Subject: [PATCH 2/4] xfs: measure all contiguous previous extents for prealloc size
From: "Darrick J. Wong"
To: darrick.wong@oracle.com
Cc: linux-xfs@vger.kernel.org, hch@infradead.org, bfoster@redhat.com
Date: Thu, 21 May 2020 19:53:09 -0700
Message-ID: <159011598984.76931.15076402801787913960.stgit@magnolia>

When we're estimating a new speculative preallocation length for an
extending write, we should walk backwards through the extent list to
determine the number of blocks that are physically and logically
contiguous with the write offset, and use that as an input to the
preallocation size computation.

This way, preallocation length is truly measured by the effectiveness of
the allocator in giving us contiguous allocations without being
influenced by the state of a given extent. This fixes the problem where
ZERO_RANGE within an EOF can reduce preallocation, and it also prevents
the unnecessary shrinkage of preallocation when delalloc extents are
turned into unwritten extents.

This was found as a regression in xfs/014 after changing delalloc writes
to create unwritten extents during writeback.

Signed-off-by: Darrick J. Wong
Reviewed-by: Christoph Hellwig
---
 fs/xfs/xfs_iomap.c |   37 +++++++++++++++++++++++++------------
 1 file changed, 25 insertions(+), 12 deletions(-)

diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
index ac970b13b1f8..6a308af93893 100644
--- a/fs/xfs/xfs_iomap.c
+++ b/fs/xfs/xfs_iomap.c
@@ -377,15 +377,17 @@ xfs_iomap_prealloc_size(
 	loff_t			count,
 	struct xfs_iext_cursor	*icur)
 {
+	struct xfs_iext_cursor	ncur = *icur; /* struct copy */
+	struct xfs_bmbt_irec	prev, got;
 	struct xfs_mount	*mp = ip->i_mount;
 	struct xfs_ifork	*ifp = XFS_IFORK_PTR(ip, whichfork);
 	xfs_fileoff_t		offset_fsb = XFS_B_TO_FSBT(mp, offset);
-	struct xfs_bmbt_irec	prev;
-	int			shift = 0;
 	int64_t			freesp;
 	xfs_fsblock_t		qblocks;
-	int			qshift = 0;
 	xfs_fsblock_t		alloc_blocks = 0;
+	xfs_extlen_t		plen;
+	int			shift = 0;
+	int			qshift = 0;
 
 	if (offset + count <= XFS_ISIZE(ip))
 		return 0;
@@ -413,16 +415,27 @@ xfs_iomap_prealloc_size(
 	 * preallocation size.
 	 *
 	 * If the extent is a hole, then preallocation is essentially disabled.
-	 * Otherwise we take the size of the preceding data extent as the basis
-	 * for the preallocation size. If the size of the extent is greater than
-	 * half the maximum extent length, then use the current offset as the
-	 * basis. This ensures that for large files the preallocation size
-	 * always extends to MAXEXTLEN rather than falling short due to things
-	 * like stripe unit/width alignment of real extents.
+	 * Otherwise we take the size of the preceding data extents as the basis
+	 * for the preallocation size. Note that we don't care if the previous
+	 * extents are written or not.
+	 *
+	 * If the size of the extents is greater than half the maximum extent
+	 * length, then use the current offset as the basis. This ensures that
+	 * for large files the preallocation size always extends to MAXEXTLEN
+	 * rather than falling short due to things like stripe unit/width
+	 * alignment of real extents.
 	 */
-	if (prev.br_blockcount <= (MAXEXTLEN >> 1))
-		alloc_blocks = prev.br_blockcount << 1;
-	else
+	plen = prev.br_blockcount;
+	while (xfs_iext_prev_extent(ifp, &ncur, &got)) {
+		if (plen > MAXEXTLEN / 2 ||
+		    got.br_startoff + got.br_blockcount != prev.br_startoff ||
+		    got.br_startblock + got.br_blockcount != prev.br_startblock)
+			break;
+		plen += got.br_blockcount;
+		prev = got;
+	}
+	alloc_blocks = plen * 2;
+	if (alloc_blocks > MAXEXTLEN)
 		alloc_blocks = XFS_B_TO_FSB(mp, offset);
 	if (!alloc_blocks)
 		goto check_writeio;
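A userspace sketch of the backwards walk, not part of the patch. It assumes the extent list is a plain array sorted by file offset and that `ext[last]` is the extent ending at the write offset; `struct extent_sketch`, `contiguous_prev_blocks()` and `prealloc_basis()` are hypothetical stand-ins for the iext cursor code. The `unwritten` field is carried along but never consulted, which mirrors the intent described above: merging requires only logical and physical contiguity, not a particular extent state.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified extent record, loosely modelled on struct xfs_bmbt_irec. */
struct extent_sketch {
	uint64_t startoff;	/* logical offset in the file, in blocks */
	uint64_t startblock;	/* physical start on disk, in blocks */
	uint64_t blockcount;	/* length, in blocks */
	bool unwritten;		/* extent state: deliberately ignored below */
};

/*
 * Starting from ext[last], walk backwards and add up every extent that is
 * both logically and physically contiguous with the one after it. Stop
 * once the running total exceeds maxextlen / 2, as the patched loop does.
 */
uint64_t contiguous_prev_blocks(const struct extent_sketch *ext, size_t last,
				uint64_t maxextlen)
{
	const struct extent_sketch *prev;
	uint64_t plen;
	size_t i = last;

	assert(ext != NULL);
	prev = &ext[last];
	plen = prev->blockcount;
	while (i-- > 0) {
		const struct extent_sketch *got = &ext[i];

		if (plen > maxextlen / 2 ||
		    got->startoff + got->blockcount != prev->startoff ||
		    got->startblock + got->blockcount != prev->startblock)
			break;
		plen += got->blockcount;
		prev = got;
	}
	return plen;
}

/*
 * Double the contiguous length; if that overshoots maxextlen, use the
 * write offset (already converted to blocks here) as the basis instead.
 */
uint64_t prealloc_basis(const struct extent_sketch *ext, size_t last,
			uint64_t offset_fsb, uint64_t maxextlen)
{
	uint64_t alloc_blocks = contiguous_prev_blocks(ext, last, maxextlen) * 2;

	if (alloc_blocks > maxextlen)
		alloc_blocks = offset_fsb;
	return alloc_blocks;
}
```

With three adjacent extents of 8, 8 and 12 blocks (the middle one unwritten), the walk returns 28 and the basis doubles to 56. Moving the first extent elsewhere on disk stops the walk at 20, and a small `maxextlen` makes the sketch fall back to the write offset.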

Subject: [PATCH 3/4] xfs: refactor xfs_iomap_prealloc_size
From: "Darrick J. Wong"
To: darrick.wong@oracle.com
Cc: Christoph Hellwig, linux-xfs@vger.kernel.org, hch@infradead.org, bfoster@redhat.com
Date: Thu, 21 May 2020 19:53:16 -0700
Message-ID: <159011599650.76931.9345570053700795571.stgit@magnolia>

Refactor xfs_iomap_prealloc_size to be the function that dynamically
computes the per-file preallocation size by moving the allocsize= case
to the caller. Break up the huge comment preceding the function to
annotate the relevant parts of the code, and remove the impossible
check_writeio case.

Suggested-by: Christoph Hellwig
Signed-off-by: Darrick J. Wong
Reviewed-by: Christoph Hellwig
---
 fs/xfs/xfs_iomap.c |   83 ++++++++++++++++++++++------------------------------
 1 file changed, 35 insertions(+), 48 deletions(-)

diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
index 6a308af93893..d8fa5519c761 100644
--- a/fs/xfs/xfs_iomap.c
+++ b/fs/xfs/xfs_iomap.c
@@ -352,22 +352,10 @@ xfs_quota_calc_throttle(
 }
 
 /*
- * If we are doing a write at the end of the file and there are no allocations
- * past this one, then extend the allocation out to the file system's write
- * iosize.
- *
  * If we don't have a user specified preallocation size, dynamically increase
  * the preallocation size as the size of the file grows. Cap the maximum size
  * at a single extent or less if the filesystem is near full. The closer the
- * filesystem is to full, the smaller the maximum prealocation.
- *
- * As an exception we don't do any preallocation at all if the file is smaller
- * than the minimum preallocation and we are using the default dynamic
- * preallocation scheme, as it is likely this is the only write to the file that
- * is going to be done.
- *
- * We clean up any extra space left over when the file is closed in
- * xfs_inactive().
+ * filesystem is to being full, the smaller the maximum preallocation.
  */
 STATIC xfs_fsblock_t
 xfs_iomap_prealloc_size(
@@ -389,41 +377,28 @@ xfs_iomap_prealloc_size(
 	int			shift = 0;
 	int			qshift = 0;
 
-	if (offset + count <= XFS_ISIZE(ip))
-		return 0;
-
-	if (!(mp->m_flags & XFS_MOUNT_ALLOCSIZE) &&
-	    (XFS_ISIZE(ip) < XFS_FSB_TO_B(mp, mp->m_allocsize_blocks)))
+	/*
+	 * As an exception we don't do any preallocation at all if the file is
+	 * smaller than the minimum preallocation and we are using the default
+	 * dynamic preallocation scheme, as it is likely this is the only write
+	 * to the file that is going to be done.
+	 */
+	if (XFS_ISIZE(ip) < XFS_FSB_TO_B(mp, mp->m_allocsize_blocks))
 		return 0;
 
 	/*
-	 * If an explicit allocsize is set, the file is small, or we
-	 * are writing behind a hole, then use the minimum prealloc:
+	 * Use the minimum preallocation size for small files or if we are
+	 * writing right after a hole.
 	 */
-	if ((mp->m_flags & XFS_MOUNT_ALLOCSIZE) ||
-	    XFS_ISIZE(ip) < XFS_FSB_TO_B(mp, mp->m_dalign) ||
+	if (XFS_ISIZE(ip) < XFS_FSB_TO_B(mp, mp->m_dalign) ||
 	    !xfs_iext_peek_prev_extent(ifp, icur, &prev) ||
 	    prev.br_startoff + prev.br_blockcount < offset_fsb)
 		return mp->m_allocsize_blocks;
 
 	/*
-	 * Determine the initial size of the preallocation. We are beyond the
-	 * current EOF here, but we need to take into account whether this is
-	 * a sparse write or an extending write when determining the
-	 * preallocation size. Hence we need to look up the extent that ends
-	 * at the current write offset and use the result to determine the
-	 * preallocation size.
-	 *
-	 * If the extent is a hole, then preallocation is essentially disabled.
-	 * Otherwise we take the size of the preceding data extents as the basis
-	 * for the preallocation size. Note that we don't care if the previous
-	 * extents are written or not.
-	 *
-	 * If the size of the extents is greater than half the maximum extent
-	 * length, then use the current offset as the basis. This ensures that
-	 * for large files the preallocation size always extends to MAXEXTLEN
-	 * rather than falling short due to things like stripe unit/width
-	 * alignment of real extents.
+	 * Take the size of the preceding data extents as the basis for the
+	 * preallocation size. Note that we don't care if the previous extents
+	 * are written or not.
 	 */
 	plen = prev.br_blockcount;
 	while (xfs_iext_prev_extent(ifp, &ncur, &got)) {
@@ -434,19 +409,25 @@ xfs_iomap_prealloc_size(
 		plen += got.br_blockcount;
 		prev = got;
 	}
+
+	/*
+	 * If the size of the extents is greater than half the maximum extent
+	 * length, then use the current offset as the basis. This ensures that
+	 * for large files the preallocation size always extends to MAXEXTLEN
+	 * rather than falling short due to things like stripe unit/width
+	 * alignment of real extents.
+	 */
 	alloc_blocks = plen * 2;
 	if (alloc_blocks > MAXEXTLEN)
 		alloc_blocks = XFS_B_TO_FSB(mp, offset);
-	if (!alloc_blocks)
-		goto check_writeio;
 	qblocks = alloc_blocks;
 
 	/*
 	 * MAXEXTLEN is not a power of two value but we round the prealloc down
 	 * to the nearest power of two value after throttling. To prevent the
-	 * round down from unconditionally reducing the maximum supported prealloc
-	 * size, we round up first, apply appropriate throttling, round down and
-	 * cap the value to MAXEXTLEN.
+	 * round down from unconditionally reducing the maximum supported
+	 * prealloc size, we round up first, apply appropriate throttling,
+	 * round down and cap the value to MAXEXTLEN.
 	 */
 	alloc_blocks = XFS_FILEOFF_MIN(roundup_pow_of_two(MAXEXTLEN),
 				       alloc_blocks);
@@ -507,7 +488,6 @@ xfs_iomap_prealloc_size(
 	 */
 	while (alloc_blocks && alloc_blocks >= freesp)
 		alloc_blocks >>= 4;
-check_writeio:
 	if (alloc_blocks < mp->m_allocsize_blocks)
 		alloc_blocks = mp->m_allocsize_blocks;
 	trace_xfs_iomap_prealloc_size(ip, alloc_blocks, shift,
@@ -974,9 +954,16 @@ xfs_buffered_write_iomap_begin(
 	if (error)
 		goto out_unlock;
 
-	if (eof) {
-		prealloc_blocks = xfs_iomap_prealloc_size(ip, allocfork, offset,
-				count, &icur);
+	if (eof && offset + count > XFS_ISIZE(ip)) {
+		/*
+		 * Determine the initial size of the preallocation.
+		 * We clean up any extra preallocation when the file is closed.
+		 */
+		if (mp->m_flags & XFS_MOUNT_ALLOCSIZE)
+			prealloc_blocks = mp->m_allocsize_blocks;
+		else
+			prealloc_blocks = xfs_iomap_prealloc_size(ip, allocfork,
+						offset, count, &icur);
 		if (prealloc_blocks) {
 			xfs_extlen_t	align;
 			xfs_off_t	end_offset;
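The decision order after this refactor can be summarised in a small userspace function, not part of the patch. This is a sketch under assumptions: `struct prealloc_inputs` and `choose_prealloc_blocks()` are hypothetical, every input the kernel derives from the inode, mount and extent tree is passed in precomputed, and `dynamic_blocks` stands in for the whole doubling-and-throttling computation inside `xfs_iomap_prealloc_size()`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical inputs to the decision; field names are illustrative only. */
struct prealloc_inputs {
	bool	 at_eof;		/* no mapping beyond this write */
	uint64_t isize;			/* current file size, in bytes */
	uint64_t write_end;		/* offset + count, in bytes */
	uint64_t blocksize;		/* filesystem block size, in bytes */
	bool	 allocsize_mount;	/* allocsize= mount option given */
	uint64_t allocsize_blocks;	/* minimum or fixed prealloc, in blocks */
	uint64_t dalign_blocks;		/* stripe unit in blocks, 0 if none */
	bool	 prev_reaches_offset;	/* a previous extent ends at the write */
	uint64_t dynamic_blocks;	/* stand-in for the dynamic estimate */
};

/* Order of checks after the refactor: caller-side checks first. */
uint64_t choose_prealloc_blocks(const struct prealloc_inputs *in)
{
	assert(in != NULL && in->blocksize > 0);

	/* Caller: only writes at EOF that extend the file get preallocation. */
	if (!in->at_eof || in->write_end <= in->isize)
		return 0;
	/* Caller: an explicit allocsize= bypasses the dynamic heuristic. */
	if (in->allocsize_mount)
		return in->allocsize_blocks;
	/* Files smaller than the minimum preallocation get none at all. */
	if (in->isize < in->allocsize_blocks * in->blocksize)
		return 0;
	/* Files below the stripe unit, or writes right after a hole: minimum. */
	if (in->isize < in->dalign_blocks * in->blocksize ||
	    !in->prev_reaches_offset)
		return in->allocsize_blocks;
	/* Otherwise the dynamic estimate, never below the minimum. */
	if (in->dynamic_blocks < in->allocsize_blocks)
		return in->allocsize_blocks;
	return in->dynamic_blocks;
}
```

Keeping the allocsize= and "is this an extending write" checks outside the helper is what lets the helper deal only with the dynamic size, which is the split the commit message describes.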

Subject: [PATCH 4/4] xfs: force writes to delalloc regions to unwritten
From: "Darrick J. Wong"
To: darrick.wong@oracle.com
Cc: Christoph Hellwig, Brian Foster, linux-xfs@vger.kernel.org, hch@infradead.org, bfoster@redhat.com
Date: Thu, 21 May 2020 19:53:23 -0700
Message-ID: <159011600308.76931.7853207930055232164.stgit@magnolia>

When writing to a delalloc region in the data fork, commit the new
allocations (of the delalloc reservation) as unwritten so that the
mappings are only marked written once writeback completes successfully.
This fixes the problem of stale data exposure if the system goes down
during targeted writeback of a specific region of a file, as tested by
generic/042.

Signed-off-by: Darrick J. Wong
Reviewed-by: Christoph Hellwig
Reviewed-by: Brian Foster
---
 fs/xfs/libxfs/xfs_bmap.c |   28 +++++++++++++++++-----------
 1 file changed, 17 insertions(+), 11 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
index fda13cd7add0..825d170e1503 100644
--- a/fs/xfs/libxfs/xfs_bmap.c
+++ b/fs/xfs/libxfs/xfs_bmap.c
@@ -4193,17 +4193,7 @@ xfs_bmapi_allocate(
 	bma->got.br_blockcount = bma->length;
 	bma->got.br_state = XFS_EXT_NORM;
 
-	/*
-	 * In the data fork, a wasdelay extent has been initialized, so
-	 * shouldn't be flagged as unwritten.
-	 *
-	 * For the cow fork, however, we convert delalloc reservations
-	 * (extents allocated for speculative preallocation) to
-	 * allocated unwritten extents, and only convert the unwritten
-	 * extents to real extents when we're about to write the data.
-	 */
-	if ((!bma->wasdel || (bma->flags & XFS_BMAPI_COWFORK)) &&
-	    (bma->flags & XFS_BMAPI_PREALLOC))
+	if (bma->flags & XFS_BMAPI_PREALLOC)
 		bma->got.br_state = XFS_EXT_UNWRITTEN;
 
 	if (bma->wasdel)
@@ -4611,8 +4601,24 @@ xfs_bmapi_convert_delalloc(
 	bma.offset = bma.got.br_startoff;
 	bma.length = max_t(xfs_filblks_t, bma.got.br_blockcount, MAXEXTLEN);
 	bma.minleft = xfs_bmapi_minleft(tp, ip, whichfork);
+
+	/*
+	 * When we're converting the delalloc reservations backing dirty pages
+	 * in the page cache, we must be careful about how we create the new
+	 * extents:
+	 *
+	 * New CoW fork extents are created unwritten, turned into real extents
+	 * when we're about to write the data to disk, and mapped into the data
+	 * fork after the write finishes. End of story.
+	 *
+	 * New data fork extents must be mapped in as unwritten and converted
+	 * to real extents after the write succeeds to avoid exposing stale
+	 * disk contents if we crash.
+	 */
 	if (whichfork == XFS_COW_FORK)
 		bma.flags = XFS_BMAPI_COWFORK | XFS_BMAPI_PREALLOC;
+	else
+		bma.flags = XFS_BMAPI_PREALLOC;
 
 	if (!xfs_iext_peek_prev_extent(ifp, &bma.icur, &bma.prev))
 		bma.prev.br_startoff = NULLFILEOFF;
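A toy model of the stale-data window this patch closes, not part of the patch. It is not XFS code: `enum map_state`, `struct block_sketch` and the helpers are hypothetical, there is no journal or persistence, and it assumes (as in the targeted-writeback scenario described in the commit message) that one allocation can cover more blocks than the writeback actually writes, and that any block not mapped as written reads back as zeroes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical mapping states for a single file block. */
enum map_state { MAP_DELALLOC, MAP_UNWRITTEN, MAP_WRITTEN };

struct block_sketch {
	enum map_state state;
	char disk;	/* byte currently on disk, possibly stale */
};

/* Writeback allocates space for the whole delalloc reservation at once. */
void convert_delalloc(struct block_sketch *blk, size_t n, bool as_unwritten)
{
	for (size_t i = 0; i < n; i++) {
		assert(blk[i].state == MAP_DELALLOC);
		blk[i].state = as_unwritten ? MAP_UNWRITTEN : MAP_WRITTEN;
	}
}

/* The data write reaches disk, then I/O completion marks the block written. */
void write_and_complete(struct block_sketch *b, char data)
{
	b->disk = data;
	if (b->state == MAP_UNWRITTEN)
		b->state = MAP_WRITTEN;
}

/* Only written mappings expose disk contents; anything else reads as zero. */
char read_block(const struct block_sketch *b)
{
	return b->state == MAP_WRITTEN ? b->disk : 0;
}
```

Both variants allocate two blocks and write only the first before the "crash". Mapping the allocation as written up front exposes the stale byte in the second block, whereas mapping it unwritten until I/O completion makes that block read back as zeroes.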