From patchwork Thu Sep 19 16:07:40 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Brian Foster
X-Patchwork-Id: 13807833
From: Brian Foster
To: linux-ext4@vger.kernel.org, linux-mm@kvack.org
Cc: linux-fsdevel@vger.kernel.org, willy@infradead.org
Subject: [PATCH 1/2] ext4: partial zero eof block on unaligned inode size extension
Date: Thu, 19 Sep 2024 12:07:40 -0400
Message-ID: <20240919160741.208162-2-bfoster@redhat.com>
In-Reply-To: <20240919160741.208162-1-bfoster@redhat.com>
References: <20240919160741.208162-1-bfoster@redhat.com>

Using mapped writes, it's technically possible to expose stale post-eof
data on a truncate up operation. Consider the following example:

$ xfs_io -fc "pwrite 0 2k" -c "mmap 0 4k" -c "mwrite 2k 2k" \
	-c "truncate 8k" -c "pread -v 2k 16"
...
00000800:  58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58  XXXXXXXXXXXXXXXX
...

This shows that the post-eof data written via mwrite lands within EOF
after a truncate up.
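For readers without xfs_io at hand, the same sequence can be sketched in
plain C. This is only an illustration, not part of the patch: the helper
name stale_eof_probe() and the 0xcd fill byte are made up for the
example, and it assumes a page size of at least 4k so that the 2k..4k
range shares a page with the original EOF.

```c
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper mirroring the xfs_io commands above. On success it
 * returns 0 and fills out[] with the 16 bytes read back at offset 2k.
 */
int stale_eof_probe(const char *path, unsigned char out[16])
{
	char buf[2048];
	char *map;
	int fd, ret = -1;

	fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0644);
	if (fd < 0)
		return -1;

	/* pwrite 0 2k: the file size becomes 2048 (fill byte is arbitrary) */
	memset(buf, 0xcd, sizeof(buf));
	if (pwrite(fd, buf, sizeof(buf), 0) != (ssize_t)sizeof(buf))
		goto out;

	/* mmap 0 4k */
	map = mmap(NULL, 4096, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	if (map == MAP_FAILED)
		goto out;

	/* mwrite 2k 2k: dirty the post-eof tail of the EOF page */
	memset(map + 2048, 'X', 2048);

	/* truncate 8k, then pread 2k 16 */
	if (ftruncate(fd, 8192) == 0 && pread(fd, out, 16, 2048) == 16)
		ret = 0;

	munmap(map, 4096);
out:
	close(fd);
	return ret;
}
```

A caller would pass a path on the filesystem under test and inspect
out[]: 0x58 ('X') bytes mean the stale data was exposed, zeroes mean the
tail was zeroed before the read. Either result can occur depending on
the kernel and on writeback timing.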
While the exposure is deliberate in this test case, the behavior is
somewhat unpredictable because writeback does post-eof zeroing, and
writeback can occur at any time in the background. For example, an
fsync inserted between the mwrite and truncate causes the subsequent
read to instead return zeroes. This basically means that there is a
race window in this situation between any subsequent extending
operation and writeback that dictates whether post-eof data is exposed
to the file or zeroed.

To prevent this problem, perform partial block zeroing as part of the
various inode size extending operations that are susceptible to it.
For truncate extension, zero around the original eof similar to how
truncate down does partial zeroing of the new eof. For extension via
writes and fallocate related operations, zero the newly exposed range
of the file to cover any partial zeroing that must occur at the
original and new eof blocks.

Signed-off-by: Brian Foster
---
 fs/ext4/extents.c |  7 ++++++-
 fs/ext4/inode.c   | 51 +++++++++++++++++++++++++++++++++--------------
 2 files changed, 42 insertions(+), 16 deletions(-)

diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
index e067f2dd0335..d43a23abf148 100644
--- a/fs/ext4/extents.c
+++ b/fs/ext4/extents.c
@@ -4457,7 +4457,7 @@ static int ext4_alloc_file_blocks(struct file *file, ext4_lblk_t offset,
 	int depth = 0;
 	struct ext4_map_blocks map;
 	unsigned int credits;
-	loff_t epos;
+	loff_t epos, old_size = i_size_read(inode);
 
 	BUG_ON(!ext4_test_inode_flag(inode, EXT4_INODE_EXTENTS));
 	map.m_lblk = offset;
@@ -4516,6 +4516,11 @@ static int ext4_alloc_file_blocks(struct file *file, ext4_lblk_t offset,
 			if (ext4_update_inode_size(inode, epos) & 0x1)
 				inode_set_mtime_to_ts(inode,
 						      inode_get_ctime(inode));
+			if (epos > old_size) {
+				pagecache_isize_extended(inode, old_size, epos);
+				ext4_zero_partial_blocks(handle, inode,
+						     old_size, epos - old_size);
+			}
 		}
 		ret2 = ext4_mark_inode_dirty(handle, inode);
 		ext4_update_inode_fsync_trans(handle, inode, 1);
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 03374dc215d1..c8d5334cecca 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -1327,8 +1327,10 @@ static int ext4_write_end(struct file *file,
 	folio_unlock(folio);
 	folio_put(folio);
 
-	if (old_size < pos && !verity)
+	if (old_size < pos && !verity) {
 		pagecache_isize_extended(inode, old_size, pos);
+		ext4_zero_partial_blocks(handle, inode, old_size, pos - old_size);
+	}
 	/*
 	 * Don't mark the inode dirty under folio lock. First, it unnecessarily
 	 * makes the holding time of folio lock longer. Second, it forces lock
@@ -1443,8 +1445,10 @@ static int ext4_journalled_write_end(struct file *file,
 	folio_unlock(folio);
 	folio_put(folio);
 
-	if (old_size < pos && !verity)
+	if (old_size < pos && !verity) {
 		pagecache_isize_extended(inode, old_size, pos);
+		ext4_zero_partial_blocks(handle, inode, old_size, pos - old_size);
+	}
 
 	if (size_changed) {
 		ret2 = ext4_mark_inode_dirty(handle, inode);
@@ -3015,7 +3019,8 @@ static int ext4_da_do_write_end(struct address_space *mapping,
 	struct inode *inode = mapping->host;
 	loff_t old_size = inode->i_size;
 	bool disksize_changed = false;
-	loff_t new_i_size;
+	loff_t new_i_size, zero_len = 0;
+	handle_t *handle;
 
 	if (unlikely(!folio_buffers(folio))) {
 		folio_unlock(folio);
@@ -3059,18 +3064,21 @@ static int ext4_da_do_write_end(struct address_space *mapping,
 	folio_unlock(folio);
 	folio_put(folio);
 
-	if (old_size < pos)
+	if (pos > old_size) {
 		pagecache_isize_extended(inode, old_size, pos);
+		zero_len = pos - old_size;
+	}
 
-	if (disksize_changed) {
-		handle_t *handle;
+	if (!disksize_changed && !zero_len)
+		return copied;
 
-		handle = ext4_journal_start(inode, EXT4_HT_INODE, 2);
-		if (IS_ERR(handle))
-			return PTR_ERR(handle);
-		ext4_mark_inode_dirty(handle, inode);
-		ext4_journal_stop(handle);
-	}
+	handle = ext4_journal_start(inode, EXT4_HT_INODE, 2);
+	if (IS_ERR(handle))
+		return PTR_ERR(handle);
+	if (zero_len)
+		ext4_zero_partial_blocks(handle, inode, old_size, zero_len);
+	ext4_mark_inode_dirty(handle, inode);
+	ext4_journal_stop(handle);
 
 	return copied;
 }
@@ -5453,6 +5461,14 @@ int ext4_setattr(struct mnt_idmap *idmap, struct dentry *dentry,
 		}
 
 		if (attr->ia_size != inode->i_size) {
+			/* attach jbd2 jinode for EOF folio tail zeroing */
+			if (attr->ia_size & (inode->i_sb->s_blocksize - 1) ||
+			    oldsize & (inode->i_sb->s_blocksize - 1)) {
+				error = ext4_inode_attach_jinode(inode);
+				if (error)
+					goto err_out;
+			}
+
 			handle = ext4_journal_start(inode, EXT4_HT_INODE, 3);
 			if (IS_ERR(handle)) {
 				error = PTR_ERR(handle);
@@ -5463,12 +5479,17 @@ int ext4_setattr(struct mnt_idmap *idmap, struct dentry *dentry,
 				orphan = 1;
 			}
 			/*
-			 * Update c/mtime on truncate up, ext4_truncate() will
-			 * update c/mtime in shrink case below
+			 * Update c/mtime and tail zero the EOF folio on
+			 * truncate up. ext4_truncate() handles the shrink case
+			 * below.
 			 */
-			if (!shrink)
+			if (!shrink) {
 				inode_set_mtime_to_ts(inode,
 						      inode_set_ctime_current(inode));
+				if (oldsize & (inode->i_sb->s_blocksize - 1))
+					ext4_block_truncate_page(handle,
+							inode->i_mapping, oldsize);
+			}
 
 			if (shrink)
 				ext4_fc_track_range(handle, inode,