From patchwork Sat Dec 2 09:14:30 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Baokun Li
X-Patchwork-Id: 13476818
From: Baokun Li
Subject: [PATCH -RFC 0/2] mm/ext4: avoid data corruption when extending DIO write race with buffered read
Date: Sat, 2 Dec 2023 17:14:30 +0800
Message-ID: <20231202091432.8349-1-libaokun1@huawei.com>
X-Mailer: git-send-email 2.31.1

Hello everyone!

Recently, while running some stress tests on MySQL, I noticed that a
"corrupted data in log event" error was occasionally reported. After
analyzing the error, I found that an extending DIO write was racing with
a buffered read, so that the zero-filled tail of a page was read and
returned to the user.

Since the ext4 buffered read doesn't hold the inode lock, and there is no
field in the page to indicate the size of the valid data, it seems to me
that it is impossible to solve this problem perfectly without changing
one of these two things.

In this series, the first patch reads the inode size twice and takes the
smaller of the two values as the copyout limit, so that data that was
not actually read (the zero padding) is not copied into the user buffer
and does not cause data corruption. This greatly reduces the probability
of the problem with 4k pages. However, the problem is still easily
triggered with 64k pages.

The second patch waits for the existing DIO write to complete and
invalidate the stale page cache before performing a new buffered read in
ext4, which avoids data corruption caused by copying the stale page cache
to the user buffer. This makes the problem much less likely to be
triggered with 64k pages.

Do we have a plan to add a lock to the ext4 buffered read, or a field in
the page that indicates the size of the valid data in the page? Or does
anyone have a better idea?

Comments and questions are, as always, welcome.

Baokun Li (2):
  mm: avoid data corruption when extending DIO write race with buffered
    read
  ext4: avoid data corruption when extending DIO write race with
    buffered read

 fs/ext4/file.c | 3 +++
 mm/filemap.c   | 5 +++--
 2 files changed, 6 insertions(+), 2 deletions(-)