From patchwork Tue Oct 1 10:58:55 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Jeff Layton
X-Patchwork-Id: 13817774
From: Jeff Layton
Date: Tue, 01 Oct 2024 06:58:55 -0400
Subject: [PATCH v8 01/12] timekeeping: add interfaces for handling timestamps
 with a floor value
Message-Id: <20241001-mgtime-v8-1-903343d91bc3@kernel.org>
References: <20241001-mgtime-v8-0-903343d91bc3@kernel.org>
In-Reply-To: <20241001-mgtime-v8-0-903343d91bc3@kernel.org>
To: John Stultz, Thomas Gleixner, Stephen Boyd, Alexander Viro,
 Christian Brauner, Jan Kara, Steven Rostedt, Masami Hiramatsu,
 Mathieu Desnoyers, Jonathan Corbet, Randy Dunlap, Chandan Babu R,
 "Darrick J. Wong", Theodore Ts'o, Andreas Dilger, Chris Mason,
 Josef Bacik, David Sterba, Hugh Dickins, Andrew Morton, Chuck Lever,
 Vadim Fedorenko
Cc: linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
 linux-trace-kernel@vger.kernel.org, linux-doc@vger.kernel.org,
 linux-xfs@vger.kernel.org, linux-ext4@vger.kernel.org,
 linux-btrfs@vger.kernel.org, linux-nfs@vger.kernel.org, linux-mm@kvack.org,
 Jeff Layton
X-Mailer: b4 0.14.1

Multigrain timestamps allow the kernel to use fine-grained timestamps
when an inode's attributes are being actively observed via ->getattr().
With this support, it's possible for a file to get a fine-grained
timestamp, and another modified after it to get a coarse-grained stamp
that is earlier than the fine-grained time.
If this happens then the files can appear to have been modified in
reverse order, which breaks VFS ordering guarantees.

To prevent this, maintain a floor value for multigrain timestamps.
Whenever a fine-grained timestamp is handed out, record it, and when
coarse-grained stamps are handed out, ensure they are not earlier than
that value. If the coarse-grained timestamp is earlier than the
fine-grained floor, return the floor value instead.

Add a static singleton atomic64_t into timekeeping.c that we can use to
keep track of the latest fine-grained time ever handed out. This is
tracked as a monotonic ktime_t value to ensure that it isn't affected by
clock jumps. Because it is updated at different times than the rest of
the timekeeper object, the floor value is managed independently of the
timekeeper via a cmpxchg() operation, and sits on its own cacheline.

This patch also adds two new public interfaces:

- ktime_get_coarse_real_ts64_mg() fills a timespec64 with the later of
  the coarse-grained clock and the floor time

- ktime_get_real_ts64_mg() gets the fine-grained clock value, and tries
  to swap it into the floor. A timespec64 is filled with the result.

Since the floor is global, take care to avoid updating it unless it's
absolutely necessary. If we do the cmpxchg and find that the value has
been updated since we fetched it, then we discard the fine-grained time
that was fetched in favor of the recent update.

Note that the VFS ordering guarantees assume that the realtime clock
does not experience a backward jump. POSIX requires that we stamp files
using realtime clock values, so if a backward clock jump occurs, then
files can appear to have been modified in reverse order.

Tested-by: Randy Dunlap # documentation bits
Signed-off-by: Jeff Layton
Acked-by: John Stultz
---
 include/linux/timekeeping.h |  4 ++
 kernel/time/timekeeping.c   | 96 +++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 100 insertions(+)

diff --git a/include/linux/timekeeping.h b/include/linux/timekeeping.h
index fc12a9ba2c88..7aa85246c183 100644
--- a/include/linux/timekeeping.h
+++ b/include/linux/timekeeping.h
@@ -45,6 +45,10 @@ extern void ktime_get_real_ts64(struct timespec64 *tv);
 extern void ktime_get_coarse_ts64(struct timespec64 *ts);
 extern void ktime_get_coarse_real_ts64(struct timespec64 *ts);
 
+/* Multigrain timestamp interfaces */
+extern void ktime_get_coarse_real_ts64_mg(struct timespec64 *ts);
+extern void ktime_get_real_ts64_mg(struct timespec64 *ts);
+
 void getboottime64(struct timespec64 *ts);
 
 /*
diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index 7e6f409bf311..37004a4758cf 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -114,6 +114,22 @@ static struct tk_fast tk_fast_raw ____cacheline_aligned = {
 	.base[1] = FAST_TK_INIT,
 };
 
+/*
+ * Multigrain timestamps require that we keep track of the latest fine-grained
+ * timestamp that has been issued, and never return a coarse-grained timestamp
+ * that is earlier than that value.
+ *
+ * mg_floor represents the latest fine-grained time that we have handed out as
+ * a timestamp on the system. Tracked as a monotonic ktime_t, and converted to
+ * the realtime clock on an as-needed basis.
+ *
+ * This ensures that we never issue a timestamp earlier than one that has
+ * already been issued, as long as the realtime clock never experiences a
+ * backward clock jump. If the realtime clock is set to an earlier time, then
+ * realtime timestamps can appear to go backward.
+ */
+static __cacheline_aligned_in_smp atomic64_t mg_floor;
+
 static inline void tk_normalize_xtime(struct timekeeper *tk)
 {
 	while (tk->tkr_mono.xtime_nsec >= ((u64)NSEC_PER_SEC << tk->tkr_mono.shift)) {
@@ -2394,6 +2410,86 @@ void ktime_get_coarse_real_ts64(struct timespec64 *ts)
 }
 EXPORT_SYMBOL(ktime_get_coarse_real_ts64);
 
+/**
+ * ktime_get_coarse_real_ts64_mg - return latter of coarse grained time or floor
+ * @ts:		timespec64 to be filled
+ *
+ * Fetch the global mg_floor value, convert it to realtime and
+ * compare it to the current coarse-grained time. Fill @ts with
+ * whichever is latest. Note that this is a filesystem-specific
+ * interface and should be avoided outside of that context.
+ */
+void ktime_get_coarse_real_ts64_mg(struct timespec64 *ts)
+{
+	struct timekeeper *tk = &tk_core.timekeeper;
+	u64 floor = atomic64_read(&mg_floor);
+	ktime_t f_real, offset, coarse;
+	unsigned int seq;
+
+	do {
+		seq = read_seqcount_begin(&tk_core.seq);
+		*ts = tk_xtime(tk);
+		offset = tk_core.timekeeper.offs_real;
+	} while (read_seqcount_retry(&tk_core.seq, seq));
+
+	coarse = timespec64_to_ktime(*ts);
+	f_real = ktime_add(floor, offset);
+	if (ktime_after(f_real, coarse))
+		*ts = ktime_to_timespec64(f_real);
+}
+EXPORT_SYMBOL_GPL(ktime_get_coarse_real_ts64_mg);
+
+/**
+ * ktime_get_real_ts64_mg - attempt to update floor value and return result
+ * @ts:		pointer to the timespec to be set
+ *
+ * Get a monotonic fine-grained time value and attempt to swap it into the
+ * floor. If it succeeds then accept the new floor value. If it fails
+ * then another task raced in during the interim time and updated the floor.
+ * That value is just as valid, so accept that value in this case.
+ *
+ * @ts will be filled with the resulting floor value, regardless of
+ * the outcome of the swap. Note that this is a filesystem specific interface
+ * and should be avoided outside of that context.
+ */
+void ktime_get_real_ts64_mg(struct timespec64 *ts)
+{
+	struct timekeeper *tk = &tk_core.timekeeper;
+	ktime_t old = atomic64_read(&mg_floor);
+	ktime_t offset, mono;
+	unsigned int seq;
+	u64 nsecs;
+
+	do {
+		seq = read_seqcount_begin(&tk_core.seq);
+
+		ts->tv_sec = tk->xtime_sec;
+		mono = tk->tkr_mono.base;
+		nsecs = timekeeping_get_ns(&tk->tkr_mono);
+		offset = tk_core.timekeeper.offs_real;
+	} while (read_seqcount_retry(&tk_core.seq, seq));
+
+	mono = ktime_add_ns(mono, nsecs);
+
+	/*
+	 * Attempt to update the floor with the new time value. Accept the
+	 * resulting floor value regardless of the outcome of the swap.
+	 */
+	if (atomic64_try_cmpxchg(&mg_floor, &old, mono)) {
+		ts->tv_nsec = 0;
+		timespec64_add_ns(ts, nsecs);
+	} else {
+		/*
+		 * Something has changed mg_floor since "old" was
+		 * fetched. "old" has now been updated with the
+		 * current value of mg_floor, so use that to return
+		 * the current coarse floor value.
+		 */
+		*ts = ktime_to_timespec64(ktime_add(old, offset));
+	}
+}
+EXPORT_SYMBOL_GPL(ktime_get_real_ts64_mg);
+
 void ktime_get_coarse_ts64(struct timespec64 *ts)
 {
 	struct timekeeper *tk = &tk_core.timekeeper;
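
For anyone who wants to experiment with the floor logic outside the
kernel, here is a minimal userspace sketch of the two paths in the patch
above. It is not the kernel code. It assumes C11 <stdatomic.h> in place
of atomic64_t, and it takes the clock readings as plain nanosecond
arguments instead of sampling the timekeeper under its seqcount. The
names mg_coarse_real(), mg_fine_real() and mg_example() are hypothetical
and exist only for illustration.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical userspace stand-in for mg_floor: monotonic nanoseconds. */
static _Atomic int64_t mg_floor;

/*
 * Coarse path: return the later of the coarse realtime value and the
 * floor converted to realtime. The caller supplies both clock inputs;
 * the kernel reads them from the timekeeper instead.
 */
static int64_t mg_coarse_real(int64_t coarse_real, int64_t offs_real)
{
	int64_t floor_real = atomic_load(&mg_floor) + offs_real;

	return floor_real > coarse_real ? floor_real : coarse_real;
}

/*
 * Fine path: try to install the fine-grained monotonic time as the new
 * floor. "old" is the floor value sampled before the clock was read.
 * If the compare-and-exchange fails, "old" is overwritten with the
 * current floor, and that value is returned instead.
 */
static int64_t mg_fine_real(int64_t old, int64_t fine_mono, int64_t offs_real)
{
	if (atomic_compare_exchange_strong(&mg_floor, &old, fine_mono))
		return fine_mono + offs_real;

	return old + offs_real;
}

/* Walk through one uncontended update and one lost race. */
static void mg_example(void)
{
	const int64_t offs_real = 500;	/* made-up mono-to-real offset */

	atomic_store(&mg_floor, 0);

	/* Floor (0 + 500) is behind the coarse clock: coarse value wins. */
	assert(mg_coarse_real(1000, offs_real) == 1000);

	/* Uncontended fine-grained stamp: the floor moves to 700. */
	assert(mg_fine_real(0, 700, offs_real) == 1200);

	/* The coarse clock is now behind the floor: floor value wins. */
	assert(mg_coarse_real(1000, offs_real) == 1200);

	/* Stale "old" (0): the swap fails and the existing floor is used. */
	assert(mg_fine_real(0, 650, offs_real) == 1200);
	assert(atomic_load(&mg_floor) == 700);
}
```

Calling mg_example() exercises both outcomes of the swap. The last case
shows why the fine-grained reading is discarded after a lost race: the
floor already holds a value that some other caller handed out, and
returning it keeps the issued timestamps ordered.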