From patchwork Sun Nov 17 21:32:06 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Chuck Lever
X-Patchwork-Id: 13877970
From: cel@kernel.org
To: , , Hugh Dickens
Cc: yukuai3@huawei.com, yangerkun@huaweicloud.com, Chuck Lever
Subject: [RFC PATCH 2/2] libfs: Improve behavior when directory offset values wrap
Date: Sun, 17 Nov 2024 16:32:06 -0500
Message-ID: <20241117213206.1636438-3-cel@kernel.org>
X-Mailer: git-send-email 2.47.0
In-Reply-To: <20241117213206.1636438-1-cel@kernel.org>
References: <20241117213206.1636438-1-cel@kernel.org>

From: Chuck Lever

The fix in commit 64a7ce76fb90 ("libfs: fix infinite directory reads
for offset dir") introduced a fence in offset_iterate_dir() to stop
the loop from
returning child entries created after the directory was opened. This
comparison relies on the strong ordering of DIR_OFFSET_MIN <= largest
child offset <= next_offset to terminate the directory iteration.

However, because simple_offset_add() uses mtree_alloc_cyclic() to
select each next new directory offset, ctx->next_offset is not always
the highest unused offset. Once mtree_alloc_cyclic() allows a new
offset value to wrap, ctx->next_offset will be set to a value less
than the actual largest child offset.

The result is that readdir(3) no longer shows any entries in the
directory because their offsets are above ctx->next_offset, which is
now a small value. This situation is persistent, and the directory
cannot be removed unless all current children are already known and
can be explicitly removed by name first.

In the current Maple tree implementation, there is no practical way
that 63-bit offset values can ever wrap, so this issue is cleverly
avoided. But the ordering dependency is not documented via comments
or code, making the mechanism somewhat brittle. And it makes the
continued use of mtree_alloc_cyclic() somewhat confusing.

Further, if commit 64a7ce76fb90 ("libfs: fix infinite directory reads
for offset dir") were to be backported to a kernel that still uses
xarray to manage simple directory offsets, the directory offset value
range would be limited to 32 bits, which is small enough to allow a
wrap after a few weeks of constant creation of entries in one
directory.

Therefore, replace the use of ctx->next_offset for fencing new
children from appearing in readdir results. A jiffies timestamp marks
the end of each opendir epoch. Entries created after this timestamp
will not be visible to the file descriptor. I chose jiffies so that
the dentry->d_time field can be re-used for storing the entry
creation time.

The new mechanism has its own corner cases. For instance, I think
if jiffies wraps twice while a directory is open, some children might
become invisible.
On 32-bit systems, the jiffies value wraps every 49 days.
Double-wrapping is not a risk on systems with 64-bit jiffies. Unlike
with the next_offset-based mechanism, re-opening the directory will
make invisible children re-appear.

Reported-by: Yu Kuai
Closes: https://lore.kernel.org/stable/20241111005242.34654-1-cel@kernel.org/T/#m1c448e5bd4aae3632a09468affcfe1d1594c6a59
Fixes: 64a7ce76fb90 ("libfs: fix infinite directory reads for offset dir")
Signed-off-by: Chuck Lever
---
 fs/libfs.c | 36 +++++++++++++++++-------------------
 1 file changed, 17 insertions(+), 19 deletions(-)

diff --git a/fs/libfs.c b/fs/libfs.c
index bf67954b525b..862a603fd454 100644
--- a/fs/libfs.c
+++ b/fs/libfs.c
@@ -294,6 +294,7 @@ int simple_offset_add(struct offset_ctx *octx, struct dentry *dentry)
 		return ret;
 
 	offset_set(dentry, offset);
+	WRITE_ONCE(dentry->d_time, jiffies);
 	return 0;
 }
 
@@ -454,9 +455,7 @@ void simple_offset_destroy(struct offset_ctx *octx)
 
 static int offset_dir_open(struct inode *inode, struct file *file)
 {
-	struct offset_ctx *ctx = inode->i_op->get_offset_ctx(inode);
-
-	file->private_data = (void *)ctx->next_offset;
+	file->private_data = (void *)jiffies;
 	return 0;
 }
 
@@ -473,9 +472,6 @@ static int offset_dir_open(struct inode *inode, struct file *file)
  */
 static loff_t offset_dir_llseek(struct file *file, loff_t offset, int whence)
 {
-	struct inode *inode = file->f_inode;
-	struct offset_ctx *ctx = inode->i_op->get_offset_ctx(inode);
-
 	switch (whence) {
 	case SEEK_CUR:
 		offset += file->f_pos;
@@ -490,7 +486,8 @@ static loff_t offset_dir_llseek(struct file *file, loff_t offset, int whence)
 
 	/* In this case, ->private_data is protected by f_pos_lock */
 	if (!offset)
-		file->private_data = (void *)ctx->next_offset;
+		/* Make newer child entries visible */
+		file->private_data = (void *)jiffies;
 	return vfs_setpos(file, offset, LONG_MAX);
 }
 
@@ -521,7 +518,8 @@ static bool offset_dir_emit(struct dir_context *ctx, struct dentry *dentry)
 			  inode->i_ino, fs_umode_to_dtype(inode->i_mode));
 }
 
-static void offset_iterate_dir(struct inode *inode, struct dir_context *ctx, long last_index)
+static void offset_iterate_dir(struct inode *inode, struct dir_context *ctx,
+			       unsigned long fence)
 {
 	struct offset_ctx *octx = inode->i_op->get_offset_ctx(inode);
 	struct dentry *dentry;
@@ -531,14 +529,15 @@ static void offset_iterate_dir(struct inode *inode, struct dir_context *ctx, lon
 		if (!dentry)
 			return;
 
-		if (dentry2offset(dentry) >= last_index) {
-			dput(dentry);
-			return;
-		}
-
-		if (!offset_dir_emit(ctx, dentry)) {
-			dput(dentry);
-			return;
+		/*
+		 * Output only child entries created during or before
+		 * the current opendir epoch.
+		 */
+		if (time_before_eq(dentry->d_time, fence)) {
+			if (!offset_dir_emit(ctx, dentry)) {
+				dput(dentry);
+				return;
+			}
 		}
 
 		ctx->pos = dentry2offset(dentry) + 1;
@@ -569,15 +568,14 @@ static void offset_iterate_dir(struct inode *inode, struct dir_context *ctx, lon
  */
 static int offset_readdir(struct file *file, struct dir_context *ctx)
 {
+	unsigned long fence = (unsigned long)file->private_data;
 	struct dentry *dir = file->f_path.dentry;
-	long last_index = (long)file->private_data;
 
 	lockdep_assert_held(&d_inode(dir)->i_rwsem);
 
 	if (!dir_emit_dots(file, ctx))
 		return 0;
-
-	offset_iterate_dir(d_inode(dir), ctx, last_index);
+	offset_iterate_dir(d_inode(dir), ctx, fence);
 	return 0;
 }
 
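
Illustration only, not part of the patch: the fence test in
offset_iterate_dir() above relies on time_before_eq() judging order by
a signed difference, which tolerates one counter wrap. The sketch below
is a userspace model of that comparison under stated assumptions:
uint32_t stands in for a 32-bit jiffies counter, and every model_* name
is hypothetical rather than taken from fs/libfs.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Userspace model only. The model_* names are hypothetical; uint32_t
 * stands in for a 32-bit jiffies counter.
 */

/*
 * Same shape as the kernel's time_before_eq(a, b): true when a is at or
 * before b, judged by the signed difference so that one counter wrap is
 * tolerated. The unsigned-to-signed conversion is implementation-defined
 * before C23; this sketch assumes the usual two's-complement wrap.
 */
static bool model_time_before_eq(uint32_t a, uint32_t b)
{
	return (int32_t)(b - a) >= 0;
}

/* A child is emitted only if it was created at or before the fence. */
static bool model_child_visible(uint32_t created, uint32_t fence)
{
	return model_time_before_eq(created, fence);
}

/* Walks through the cases discussed in the commit message. */
void model_selfcheck(void)
{
	uint32_t fence = UINT32_MAX - 5;	/* opendir shortly before a wrap */

	assert(model_child_visible(fence - 100, fence));	/* older: shown */
	assert(model_child_visible(fence, fence));		/* same tick: shown */
	assert(!model_child_visible(fence + 1, fence));		/* newer: hidden */
	assert(!model_child_visible(fence + 10, fence));	/* newer, counter wrapped: hidden */

	/*
	 * Corner case in this model: an entry older than the fence by more
	 * than half the counter range compares as "newer" and is hidden.
	 */
	assert(!model_child_visible(fence - 0x80000001u, fence));
}
```

Calling model_selfcheck() from a small test program exercises all five
cases. The last assertion shows, for this model only, the kind of
ambiguity behind the corner case mentioned in the commit message: once
the distance between an entry's timestamp and the fence exceeds half
the counter range, the signed difference no longer reflects the true
order.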