From patchwork Tue Nov 26 15:54:44 2024
X-Patchwork-Submitter: Chuck Lever
X-Patchwork-Id: 13886137
From: cel@kernel.org
To: Hugh Dickens, Christian Brauner, Al Viro
Cc: yukuai3@huawei.com, yangerkun@huaweicloud.com, Chuck Lever
Subject: [RFC PATCH v2 5/5] libfs: Refactor offset_iterate_dir()
Date: Tue, 26 Nov 2024 10:54:44 -0500
Message-ID: <20241126155444.2556-6-cel@kernel.org>
In-Reply-To: <20241126155444.2556-1-cel@kernel.org>
References: <20241126155444.2556-1-cel@kernel.org>

From: Chuck Lever

This line in offset_iterate_dir():

	ctx->pos = dentry2offset(dentry) + 1;

assumes that the next child entry has an offset value that is
greater than that of the current child entry. Since directory offsets
are actually cookies, this heuristic is not always correct.

We have tested the current code with a limited offset range to see if
this is an operational problem. It doesn't seem to be, but doing a
"+ 1" on what is supposed to be an opaque cookie is very likely wrong
and brittle.

Instead of using the maple tree (mtree) to emit entries in the order
of their offset values, use it only to map the initial ctx->pos to a
starting entry. Then use the directory's d_children list, which is
already maintained by the dcache, to find the next child to emit, as
the simple cursor-based implementation still does.

Signed-off-by: Chuck Lever
---
 fs/libfs.c | 89 +++++++++++++++++++++++++++++++++++++++++++-----------
 1 file changed, 71 insertions(+), 18 deletions(-)

diff --git a/fs/libfs.c b/fs/libfs.c
index be641a84047a..862b4203d389 100644
--- a/fs/libfs.c
+++ b/fs/libfs.c
@@ -241,9 +241,9 @@ const struct inode_operations simple_dir_inode_operations = {
 };
 EXPORT_SYMBOL(simple_dir_inode_operations);
 
-/* 0 is '.', 1 is '..', so always start with offset 2 or more */
 enum {
-	DIR_OFFSET_MIN = 2,
+	DIR_OFFSET_FIRST = 2,	/* seek to the first real entry */
+	DIR_OFFSET_MIN = 3,	/* lowest real offset value */
 };
 
 static void offset_set(struct dentry *dentry, long offset)
@@ -267,7 +267,7 @@ void simple_offset_init(struct offset_ctx *octx)
 {
 	mt_init_flags(&octx->mt, MT_FLAGS_ALLOC_RANGE);
 	lockdep_set_class(&octx->mt.ma_lock, &simple_offset_lock_class);
-	octx->next_offset = DIR_OFFSET_MIN;
+	octx->next_offset = 0;
 }
 
 /**
@@ -511,10 +511,30 @@ static loff_t offset_dir_llseek(struct file *file, loff_t offset, int whence)
 	return vfs_setpos(file, offset, LONG_MAX);
 }
 
-static struct dentry *offset_find_next(struct offset_ctx *octx, loff_t offset)
+static noinline_for_stack struct dentry *offset_dir_first(struct file *file)
 {
+	struct dentry *child, *found = NULL, *dir = file->f_path.dentry;
+
+	spin_lock(&dir->d_lock);
+	child = d_first_child(dir);
+	if (child && simple_positive(child)) {
+		spin_lock_nested(&child->d_lock, DENTRY_D_LOCK_NESTED);
+		if (simple_positive(child))
+			found = dget_dlock(child);
+		spin_unlock(&child->d_lock);
+	}
+	spin_unlock(&dir->d_lock);
+	return found;
+}
+
+static noinline_for_stack struct dentry *
+offset_dir_lookup(struct file *file, loff_t offset)
+{
+	struct dentry *child, *found = NULL, *dir = file->f_path.dentry;
+	struct inode *inode = d_inode(dir);
+	struct offset_ctx *octx = inode->i_op->get_offset_ctx(inode);
+
 	MA_STATE(mas, &octx->mt, offset, offset);
-	struct dentry *child, *found = NULL;
 
 	rcu_read_lock();
 	child = mas_find(&mas, LONG_MAX);
@@ -538,29 +558,62 @@ static bool offset_dir_emit(struct dir_context *ctx, struct dentry *dentry)
 			inode->i_ino, fs_umode_to_dtype(inode->i_mode));
 }
 
+/*
+ * This is find_next_child() without the dput() tail. We might
+ * combine offset_dir_next() and find_next_child().
+ */
+static struct dentry *offset_dir_next(struct dentry *dentry)
+{
+	struct dentry *parent = dentry->d_parent;
+	struct dentry *d, *found = NULL;
+
+	spin_lock(&parent->d_lock);
+	d = d_next_sibling(dentry);
+	hlist_for_each_entry_from(d, d_sib) {
+		if (simple_positive(d)) {
+			spin_lock_nested(&d->d_lock, DENTRY_D_LOCK_NESTED);
+			if (simple_positive(d))
+				found = dget_dlock(d);
+			spin_unlock(&d->d_lock);
+			if (likely(found))
+				break;
+		}
+	}
+	spin_unlock(&parent->d_lock);
+	return found;
+}
+
 static void offset_iterate_dir(struct file *file, struct dir_context *ctx)
 {
-	struct dentry *dir = file->f_path.dentry;
-	struct inode *inode = d_inode(dir);
-	struct offset_ctx *octx = inode->i_op->get_offset_ctx(inode);
-	struct dentry *dentry;
+	struct dentry *dentry, *next = NULL;
+
+	if (ctx->pos == DIR_OFFSET_FIRST)
+		dentry = offset_dir_first(file);
+	else
+		dentry = offset_dir_lookup(file, ctx->pos);
+	if (!dentry) {
+		/* ->private_data is protected by f_pos_lock */
+		offset_set_eod(file);
+		return;
+	}
 
 	while (true) {
-		dentry = offset_find_next(octx, ctx->pos);
-		if (!dentry) {
-			/* ->private_data is protected by f_pos_lock */
-			offset_set_eod(file);
-			return;
-		}
-
 		if (!offset_dir_emit(ctx, dentry)) {
-			dput(dentry);
+			ctx->pos = dentry2offset(dentry);
+			break;
+		}
+
+		next = offset_dir_next(dentry);
+		if (!next) {
+			offset_set_eod(file);
 			break;
 		}
 
-		ctx->pos = dentry2offset(dentry) + 1;
 		dput(dentry);
+		dentry = next;
 	}
+
+	dput(dentry);
 }
 
 /**
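A userspace sketch may help reviewers follow the resume logic that the patched offset_iterate_dir() adopts. It is a model under stated assumptions, not kernel code: every name in it (model_child, model_ctx, model_lookup(), model_emit(), model_iterate(), demo_readdir(), MODEL_DIR_OFFSET_FIRST) is hypothetical. An array sorted by cookie stands in for the maple tree, with model_lookup() imitating mas_find()'s "first entry at or above the index" search; a next pointer stands in for the d_children list walked by offset_dir_next(). Locking, dget()/dput() reference counting, and the simple_positive() checks are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model, not kernel code. Cookie 2 means "start at the first
 * child", mirroring DIR_OFFSET_FIRST; real cookies start at 3. */
#define MODEL_DIR_OFFSET_FIRST 2

struct model_child {
	const char *name;
	long cookie;                  /* stands in for dentry2offset() */
	struct model_child *next_sib; /* stands in for the d_children list */
};

struct model_ctx {
	long pos;      /* stands in for ctx->pos */
	int room;      /* entries left in this pass's getdents buffer */
	char *out;     /* names emitted so far, space-separated */
	size_t outlen;
};

/* Stands in for offset_dir_lookup(): return the first child whose cookie
 * is >= pos, scanning an array sorted by cookie (like mas_find()). */
static struct model_child *model_lookup(struct model_child **by_cookie,
					size_t n, long pos)
{
	for (size_t i = 0; i < n; i++)
		if (by_cookie[i]->cookie >= pos)
			return by_cookie[i];
	return NULL;
}

/* Stands in for dir_emit(): fails once this pass's buffer is full. */
static bool model_emit(struct model_ctx *ctx, const struct model_child *c)
{
	if (ctx->room == 0)
		return false;
	assert(strlen(ctx->out) + strlen(c->name) + 2 <= ctx->outlen);
	ctx->room--;
	strcat(ctx->out, c->name);
	strcat(ctx->out, " ");
	return true;
}

/* One pass shaped like the patched offset_iterate_dir(). Returns true if
 * the pass stopped on a full buffer, false at end of directory. */
static bool model_iterate(struct model_child *first,
			  struct model_child **by_cookie, size_t n,
			  struct model_ctx *ctx)
{
	struct model_child *c;

	if (ctx->pos == MODEL_DIR_OFFSET_FIRST)
		c = first;
	else
		c = model_lookup(by_cookie, n, ctx->pos);

	for (; c; c = c->next_sib) {
		if (!model_emit(ctx, c)) {
			/* Save the exact cookie of the entry that did not fit. */
			ctx->pos = c->cookie;
			return true;
		}
	}
	return false;
}

/* Runs readdir passes over a four-entry directory until end of directory;
 * fills out with the emitted names and returns the number of passes. */
int demo_readdir(char *out, size_t outlen, int room_per_pass)
{
	/* Sibling order a -> b -> c -> d; cookies deliberately unordered. */
	struct model_child d = { "d", 5, NULL };
	struct model_child c = { "c", 9, &d };
	struct model_child b = { "b", 3, &c };
	struct model_child a = { "a", 7, &b };
	/* The "tree": the same entries sorted by cookie. */
	struct model_child *by_cookie[] = { &b, &d, &a, &c };
	struct model_ctx ctx = { MODEL_DIR_OFFSET_FIRST, 0, out, outlen };
	int passes = 0;

	/* A room of 0 would make every pass stop on the same entry forever. */
	assert(room_per_pass > 0 && outlen > 0);
	out[0] = '\0';
	do {
		ctx.room = room_per_pass; /* fresh buffer for each pass */
		passes++;
	} while (model_iterate(&a, by_cookie, 4, &ctx));
	return passes;
}
```

With room for two entries per pass, demo_readdir() takes two passes and produces "a b c d ": the first pass stops at "c" and saves its cookie (9), and the second pass looks up 9 and resumes there. Saving the exact cookie of the entry that did not fit is what lets the list walk resume correctly; in this model, saving 9 + 1 instead would make the lookup find nothing and silently drop "c" and "d", because cookies are not ordered along the sibling list.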