From patchwork Sat Sep 14 17:07:14 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jeff Layton X-Patchwork-Id: 13804477 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 10AFE1D094B; Sat, 14 Sep 2024 17:07:24 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1726333647; cv=none; b=q3JVK/f5hNTfuWAao8gKYIbLgjxS7x9Nehi+EyXIWLIN0lG8nM2/M0XMEggpUJraeD+DkGuXqPKu7AB/afHK797hUC1HBt8WE1SgZJQhBlzgtIkCpzH8Dl5qEpUM//I8TPDIjSQuJQ0XYMpLuCl9RvLbbdUkLK5AcUB60z43lHo= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1726333647; c=relaxed/simple; bh=I9/47O7qo7bvUG1tGenasxTgf1bKbVz1h75bJIuETII=; h=From:Date:Subject:MIME-Version:Content-Type:Message-Id:References: In-Reply-To:To:Cc; b=Il/kKVrKldO9ZdmI7BPUvh0svpI3Z0R5k7dyILnJSXIW70et3fXzU3hmDnUhVVGSdDjKpTFhar14RNTGFXUOj8Fn2w2aiDFPKGEwNtU1rRf4TA+8j0Zl/O/HDkrdOErmj0TovADThGYo9eITZb/qI4eEEokUV+8sMIpwF+I4GXQ= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=qgrESo/2; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="qgrESo/2" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 08B09C4CECE; Sat, 14 Sep 2024 17:07:21 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1726333644; bh=I9/47O7qo7bvUG1tGenasxTgf1bKbVz1h75bJIuETII=; h=From:Date:Subject:References:In-Reply-To:To:Cc:From; b=qgrESo/2dKlnGWdwdJt4WBacgChSPGID4vFllkaU0ksLvfh8W0/erR8TcsEd20YuZ 
qy7LG7yQaw1+e/hwVRstO2c5SRLjUDVSP/ai1Epg+DZnh8JrRgfa5TP41U31y2xz/7 +080vYTYI1+3AEo46MOZstgKVgoVQmbrOFSFzlkobnc00Vjxiieldhagjt4wBVp973 XWYxtSPcLauiLWBgbApL12W40KgS+mjYM9Rf0aPgs+ds4DZXA4+rRSTEnRWVCNll2X J3CCfCt44JNNKu6dnZep+GyBvY2msx2KWB6EgyIY2/rZe6t72Qcb7qRfabSY4kKAsr o+N7Q0cpT2x6w== From: Jeff Layton Date: Sat, 14 Sep 2024 13:07:14 -0400 Subject: [PATCH v8 01/11] timekeeping: move multigrain timestamp floor handling into timekeeper Precedence: bulk X-Mailing-List: linux-nfs@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Message-Id: <20240914-mgtime-v8-1-5bd872330bed@kernel.org> References: <20240914-mgtime-v8-0-5bd872330bed@kernel.org> In-Reply-To: <20240914-mgtime-v8-0-5bd872330bed@kernel.org> To: John Stultz , Thomas Gleixner , Stephen Boyd , Alexander Viro , Christian Brauner , Jan Kara , Steven Rostedt , Masami Hiramatsu , Mathieu Desnoyers , Jonathan Corbet , Chandan Babu R , "Darrick J. Wong" , Theodore Ts'o , Andreas Dilger , Chris Mason , Josef Bacik , David Sterba , Hugh Dickins , Andrew Morton , Chuck Lever , Vadim Fedorenko Cc: Randy Dunlap , linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-trace-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-xfs@vger.kernel.org, linux-ext4@vger.kernel.org, linux-btrfs@vger.kernel.org, linux-nfs@vger.kernel.org, linux-mm@kvack.org, Jeff Layton X-Mailer: b4 0.14.1 X-Developer-Signature: v=1; a=openpgp-sha256; l=5475; i=jlayton@kernel.org; h=from:subject:message-id; bh=I9/47O7qo7bvUG1tGenasxTgf1bKbVz1h75bJIuETII=; b=owEBbQKS/ZANAwAIAQAOaEEZVoIVAcsmYgBm5cLFY6lXZvZm8RxN6kOFAGtO8JT2riURAMDHv 52tncP7VVSJAjMEAAEIAB0WIQRLwNeyRHGyoYTq9dMADmhBGVaCFQUCZuXCxQAKCRAADmhBGVaC FVmqD/42vnYgXZg5mzwCTj8STzoscinyiV3K/zfYJZXBKlhsahthN4ivhC4Pb/lyArui3/IxAQq EWZuccCCsliVOR5fGIXxBnGCyGUE/NFyJcgQVEdAjlz/lglEXC0qzAgFyqnBFXLjHIO/vEBN9A+ 3YgW/1rz3+jDt7H0Kfaszw9349+fGDmnzIlYg0fLqOG1l9LIdTFY7d1tCWtOxFJOhGeP+m5EUna +jv01w18PUKFWAnYS5w3rpbXPSkmQt/sw6EeN19nRR3QwlkumZck7VVgKrkWPzk1k1fY1omGGeN 
uu8YXrAhrI48XJ65fOyuUw4jBALv+ImDZwQGejRRPsguzETloqOXZI7HFDS+gpo52sMT22HwmgI XrpgwBa1uVZVVqf02ZnjHftQRwVpHryUcccG4KCDzFW1ZkTKyIjUxpqhXcEAp6jR8/Roa58umQ1 qXMl1cHM2aIpIjH7vLxInGCDw/RpXZMST+q5+QwCvBnKs4WeJBuaI0XLBsBMmbfQhhqiKf3L+HF 7xIK1uDDqzl9gZXVjuD8xpxGxLyhcf6BoT+dbv1E7KV5V8C8OJtRfPiuB/r6fbvaeqXDQLdlNxv vHj65OVIB+mRWvY0NEVU9WSQgxeWZ24UWiflRgY4sxq6+zKOkdbJNn9cAkfgQhI/GS1tqCKU3Mq e6rjZuBfD33jY/w== X-Developer-Key: i=jlayton@kernel.org; a=openpgp; fpr=4BC0D7B24471B2A184EAF5D3000E684119568215

For multigrain timestamps, we must keep track of the latest timestamp that has ever been handed out, and never hand out a coarse time below that value.

Add a static singleton atomic64_t into timekeeping.c that we can use to keep track of the latest fine-grained time ever handed out. This is tracked as a monotonic ktime_t value to ensure that it isn't affected by clock jumps.

Add two new public interfaces:

- ktime_get_coarse_real_ts64_mg() fills a timespec64 with the later of the coarse-grained clock and the floor time

- ktime_get_real_ts64_mg() gets the fine-grained clock value, and tries to swap it into the floor. A timespec64 is filled with the result.

Since the floor is global, we take great pains to avoid updating it unless it's absolutely necessary. If we do the cmpxchg and find that the value has been updated since we fetched it, then we discard the fine-grained time that was fetched in favor of the recent update.

To maximize the window in which such a racing update can be detected, ktime_get_real_ts64_mg() reads the floor before it samples the clock, and uses that earlier value as the "old" value when calling cmpxchg().
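For illustration only (this is not part of the patch), the floor handling described above can be sketched in userspace with C11 atomics. The names demo_floor, demo_coarse_mg() and demo_fine_mg() are hypothetical, plain int64_t nanosecond counts stand in for ktime_t, and the caller passes in clock readings instead of sampling a real clock.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-in for mg_floor: latest fine-grained monotonic ns handed out. */
static _Atomic int64_t demo_floor;

/*
 * Coarse path: convert the floor to realtime and return the later of
 * that value and the coarse realtime clock (all values in ns).
 */
static int64_t demo_coarse_mg(int64_t coarse_real_ns, int64_t real_offset_ns)
{
	int64_t floor_real = atomic_load(&demo_floor) + real_offset_ns;

	return floor_real > coarse_real_ns ? floor_real : coarse_real_ns;
}

/*
 * Fine path: try to swap a fresh monotonic reading into the floor.
 * old_floor is the floor value the caller read earlier. If the swap
 * fails, old_floor is overwritten with the current floor and that
 * value is returned in place of the caller's own reading.
 */
static int64_t demo_fine_mg(int64_t old_floor, int64_t fine_mono_ns,
			    int64_t real_offset_ns)
{
	if (atomic_compare_exchange_strong(&demo_floor, &old_floor, fine_mono_ns))
		return fine_mono_ns + real_offset_ns;

	return old_floor + real_offset_ns;
}
```

Passing a stale old_floor models a task that lost the race: the compare-and-exchange fails and the caller adopts the floor that the winner installed, so the floor is written only once per race.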
Signed-off-by: Jeff Layton --- include/linux/timekeeping.h | 4 +++ kernel/time/timekeeping.c | 82 +++++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 86 insertions(+) diff --git a/include/linux/timekeeping.h b/include/linux/timekeeping.h index fc12a9ba2c88..7aa85246c183 100644 --- a/include/linux/timekeeping.h +++ b/include/linux/timekeeping.h @@ -45,6 +45,10 @@ extern void ktime_get_real_ts64(struct timespec64 *tv); extern void ktime_get_coarse_ts64(struct timespec64 *ts); extern void ktime_get_coarse_real_ts64(struct timespec64 *ts); +/* Multigrain timestamp interfaces */ +extern void ktime_get_coarse_real_ts64_mg(struct timespec64 *ts); +extern void ktime_get_real_ts64_mg(struct timespec64 *ts); + void getboottime64(struct timespec64 *ts); /* diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c index 5391e4167d60..16937242b904 100644 --- a/kernel/time/timekeeping.c +++ b/kernel/time/timekeeping.c @@ -114,6 +114,13 @@ static struct tk_fast tk_fast_raw ____cacheline_aligned = { .base[1] = FAST_TK_INIT, }; +/* + * This represents the latest fine-grained time that we have handed out as a + * timestamp on the system. Tracked as a monotonic ktime_t, and converted to the + * realtime clock on an as-needed basis. + */ +static __cacheline_aligned_in_smp atomic64_t mg_floor; + static inline void tk_normalize_xtime(struct timekeeper *tk) { while (tk->tkr_mono.xtime_nsec >= ((u64)NSEC_PER_SEC << tk->tkr_mono.shift)) { @@ -2394,6 +2401,81 @@ void ktime_get_coarse_real_ts64(struct timespec64 *ts) } EXPORT_SYMBOL(ktime_get_coarse_real_ts64); +/** + * ktime_get_coarse_real_ts64_mg - get later of coarse grained time or floor + * @ts: timespec64 to be filled + * + * Adjust floor to realtime and compare it to the coarse time. Fill + * @ts with the latest one. Note that this is a filesystem-specific + * interface and should be avoided outside of that context. 
+ */ +void ktime_get_coarse_real_ts64_mg(struct timespec64 *ts) +{ + struct timekeeper *tk = &tk_core.timekeeper; + u64 floor = atomic64_read(&mg_floor); + ktime_t f_real, offset, coarse; + unsigned int seq; + + WARN_ON(timekeeping_suspended); + + do { + seq = read_seqcount_begin(&tk_core.seq); + *ts = tk_xtime(tk); + offset = *offsets[TK_OFFS_REAL]; + } while (read_seqcount_retry(&tk_core.seq, seq)); + + coarse = timespec64_to_ktime(*ts); + f_real = ktime_add(floor, offset); + if (ktime_after(f_real, coarse)) + *ts = ktime_to_timespec64(f_real); +} +EXPORT_SYMBOL_GPL(ktime_get_coarse_real_ts64_mg); + +/** + * ktime_get_real_ts64_mg - attempt to update floor value and return result + * @ts: pointer to the timespec to be set + * + * Get a current monotonic fine-grained time value and attempt to swap + * it into the floor. @ts will be filled with the resulting floor value, + * regardless of the outcome of the swap. Note that this is a filesystem + * specific interface and should be avoided outside of that context. + */ +void ktime_get_real_ts64_mg(struct timespec64 *ts) +{ + struct timekeeper *tk = &tk_core.timekeeper; + ktime_t old = atomic64_read(&mg_floor); + ktime_t offset, mono; + unsigned int seq; + u64 nsecs; + + WARN_ON(timekeeping_suspended); + + do { + seq = read_seqcount_begin(&tk_core.seq); + + ts->tv_sec = tk->xtime_sec; + mono = tk->tkr_mono.base; + nsecs = timekeeping_get_ns(&tk->tkr_mono); + offset = *offsets[TK_OFFS_REAL]; + } while (read_seqcount_retry(&tk_core.seq, seq)); + + mono = ktime_add_ns(mono, nsecs); + + if (atomic64_try_cmpxchg(&mg_floor, &old, mono)) { + ts->tv_nsec = 0; + timespec64_add_ns(ts, nsecs); + } else { + /* + * Something has changed mg_floor since "old" was + * fetched. "old" has now been updated with the + * current value of mg_floor, so use that to return + * the current coarse floor value. 
+ */ + *ts = ktime_to_timespec64(ktime_add(old, offset)); + } +} +EXPORT_SYMBOL_GPL(ktime_get_real_ts64_mg); + void ktime_get_coarse_ts64(struct timespec64 *ts) { struct timekeeper *tk = &tk_core.timekeeper; From patchwork Sat Sep 14 17:07:15 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jeff Layton X-Patchwork-Id: 13804478 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id CD1DA1D220E; Sat, 14 Sep 2024 17:07:27 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1726333648; cv=none; b=B+pFAFzvHM3q7B0xlTyvIrxlsOO2szU2m2UXQFNCBq/WmYMdbyp97e3NGmgucBFO5c5oAJhy0B/alEH9KMTwe6boarb1G96Tal/86Y4qeSs+9bxC+CddCnPtN2t7AOWInGDMBcMi+rMMsReyTQWybe5LR52Pb+rRZUtWNC+m93I= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1726333648; c=relaxed/simple; bh=6kb3F02fF3yC3FwnTxgN5Y4EQfLhA8JPkgEVySAXb5A=; h=From:Date:Subject:MIME-Version:Content-Type:Message-Id:References: In-Reply-To:To:Cc; b=MSdM7eQM7stL0tNQOtRHXLLllv1Cp2PhPG8pimRC6JMtHfW3nb83wTBT9HG4rj1shWFcyx8oD2omtxzreOwtHfWLSArYz+rZrGtf9Kz0XqPyGdr60AyUjdiWhz5s/uXuciAiaZYCifGCdSCwcd157+NaD2DUc2CSVSu9ByRc4m4= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=bqyCIWG0; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="bqyCIWG0" Received: by smtp.kernel.org (Postfix) with ESMTPSA id D048FC4CEC0; Sat, 14 Sep 2024 17:07:24 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1726333647; 
bh=6kb3F02fF3yC3FwnTxgN5Y4EQfLhA8JPkgEVySAXb5A=; h=From:Date:Subject:References:In-Reply-To:To:Cc:From; b=bqyCIWG02cE5/h88EEs6kuvJhlA+x+RPzR7r9exRHfe9fz3NpXdiY2wbiWYTVT8sQ GzFJqoPzpscQ03XwTQLvgu7O1wGfcWJlc0tU5tOv/8BEh/ylidITUVI7Vpz9hP54ot 5L4uiM1F8z5HLBAGcZdMob1Ad0wJZLcVzv4z39dPGhyiSti9bHyC1Tu8VgopKQ3ycD ziX+913EeD3y3nBF3XSAOcDyjx80QvsDw2SFTIdN4X6BcCB12bHBpmezUm2YKcLG0i kjfOQCQewIAY/21h2SwbIJ81nEUtelDZr02Dh/W84RPgQ6oCPVsyyRC9jUqHMYQLtX juVvj++9tQqjA== From: Jeff Layton Date: Sat, 14 Sep 2024 13:07:15 -0400 Subject: [PATCH v8 02/11] fs: add infrastructure for multigrain timestamps Precedence: bulk X-Mailing-List: linux-nfs@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Message-Id: <20240914-mgtime-v8-2-5bd872330bed@kernel.org> References: <20240914-mgtime-v8-0-5bd872330bed@kernel.org> In-Reply-To: <20240914-mgtime-v8-0-5bd872330bed@kernel.org> To: John Stultz , Thomas Gleixner , Stephen Boyd , Alexander Viro , Christian Brauner , Jan Kara , Steven Rostedt , Masami Hiramatsu , Mathieu Desnoyers , Jonathan Corbet , Chandan Babu R , "Darrick J. 
Wong" , Theodore Ts'o , Andreas Dilger , Chris Mason , Josef Bacik , David Sterba , Hugh Dickins , Andrew Morton , Chuck Lever , Vadim Fedorenko Cc: Randy Dunlap , linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-trace-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-xfs@vger.kernel.org, linux-ext4@vger.kernel.org, linux-btrfs@vger.kernel.org, linux-nfs@vger.kernel.org, linux-mm@kvack.org, Jeff Layton X-Mailer: b4 0.14.1 X-Developer-Signature: v=1; a=openpgp-sha256; l=13790; i=jlayton@kernel.org; h=from:subject:message-id; bh=6kb3F02fF3yC3FwnTxgN5Y4EQfLhA8JPkgEVySAXb5A=; b=owEBbQKS/ZANAwAIAQAOaEEZVoIVAcsmYgBm5cLFarm4u48HfsabqF5lelJTtzuDvkf47vB0k xXDWBJEVAOJAjMEAAEIAB0WIQRLwNeyRHGyoYTq9dMADmhBGVaCFQUCZuXCxQAKCRAADmhBGVaC FVTnD/9tiqxWQUOghhcwLETn1W5RSDVynLPbNaTtLiTsCGfB7rmh5vCKK753A9Ld4ONwdFdkMNj Knl3QMFTBH4SlyYSV2Pg9tqNYG42kjbbXYIr1uZTcCxUgQuca82sN7ECn6K/KqpBKnSBcUYuguJ 7cg0hdRSo7l5OtGWZbr7ly15ezibrezG2bMLn2WQT3m0J29KKwbVRpXxiT1JRo2tPrMBR0MSBGx rXq9PDpvqLU6l9kOYyKm2FkFLrYK4eZM6jqGfiP9ML6ywkqrMFdKGMrCl2LtQDoUSFICgQ9ofZu bVXce4cRe2T6jqFMrGLnGtLS6G55D7UldNwtS6tVpQb4kyX1QgKJKMS8QckHzgWLyl5ltYxaTeq IXlbnvznVuaQaQ9g+1SkTLNhL/HtM42Yuz0H+F12JuHcuLaSEf6fGNEbOdIE5NDNwuc2srynL82 ce0ryxnBKq9iV2qwyieK2l+PZPNXOhabuWbmTbiF7eLUisaYEdN0c7NFy3Q8lfW7eX73UVuRyWF ayU6DXUe4bqtI+L4iUMxqz7pAkDIheoDj5AMTX9hIgTEl0VNZ+k7MNB/If/hEB4l+3TbEV73/8e y4QQTjKXvIIEHA4zufUljPUsPKYWO2xorB8jE8WsZLyofLMYwOj55DNltfQ07VaajdxvVP1jVAa 2OaOsCYKSlEGsug== X-Developer-Key: i=jlayton@kernel.org; a=openpgp; fpr=4BC0D7B24471B2A184EAF5D3000E684119568215

The VFS has always used coarse-grained timestamps when updating the ctime and mtime after a change. This has the benefit of allowing filesystems to optimize away a lot of metadata updates, down to around 1 per jiffy, even when a file is under heavy writes.

Unfortunately, this has always been an issue when we're exporting via NFSv3, which relies on timestamps to validate caches.
A lot of changes can happen in a jiffy, so timestamps aren't sufficient to help the client decide when to invalidate the cache. Even with NFSv4, a lot of exported filesystems don't properly support a change attribute and are subject to the same problems with timestamp granularity. Other applications have similar issues with timestamps (e.g. backup applications).

If we were to always use fine-grained timestamps, that would improve the situation, but that becomes rather expensive, as the underlying filesystem would have to log a lot more metadata updates.

What we need is a way to only use fine-grained timestamps when they are being actively queried. Use the (unused) top bit in inode->i_ctime_nsec as a flag that indicates whether the current timestamps have been queried via stat() or the like. When it's set, we allow the kernel to use a fine-grained timestamp iff it's necessary to make the ctime show a different value.

This solves the problem of being able to distinguish the timestamp between updates, but introduces a new problem: it's now possible for a file being changed to get a fine-grained timestamp. A file that is altered just a bit later can then get a coarse-grained one that appears older than the earlier fine-grained time. This violates timestamp ordering guarantees.

To remedy this, keep a global monotonic atomic64_t value that acts as a timestamp floor. When we go to stamp a file, we first get the later of the current floor value and the current coarse-grained time. If the inode ctime hasn't been queried then we just attempt to stamp it with that value.

If it has been queried, then first see whether the current coarse time is later than the existing ctime. If it is, then we accept that value. If it isn't, then we get a fine-grained timestamp.

Filesystems can opt into this by setting the FS_MGTIME fstype flag. Others should be unaffected (other than being subject to the same floor value as multigrain filesystems).
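For illustration only (this is not part of the patch), the queried-flag decision described above can be sketched in userspace C. The names DEMO_QUERIED, struct demo_inode, demo_getattr_nsec() and demo_needs_fine() are hypothetical, and the sketch covers only the decision of whether a fine-grained timestamp is needed, not the cmpxchg that stores the new ctime.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag: top bit of the nsec word, unused by valid nsec values. */
#define DEMO_QUERIED (1u << 31)

struct demo_inode {
	int64_t ctime_sec;
	_Atomic uint32_t ctime_nsec;	/* nanoseconds, plus DEMO_QUERIED flag */
};

/* stat()-like path: report the nsec value and mark the ctime as queried. */
static uint32_t demo_getattr_nsec(struct demo_inode *ino)
{
	uint32_t ns = atomic_load(&ino->ctime_nsec);

	/* Only do the atomic read-modify-write if the flag is not yet set. */
	if (!(ns & DEMO_QUERIED))
		ns = atomic_fetch_or(&ino->ctime_nsec, DEMO_QUERIED);
	return ns & ~DEMO_QUERIED;
}

/*
 * Update path: a fine-grained timestamp is needed only if the ctime has
 * been queried and the coarse time is not later than the existing ctime.
 */
static bool demo_needs_fine(struct demo_inode *ino, int64_t coarse_sec,
			    uint32_t coarse_nsec)
{
	uint32_t cns = atomic_load(&ino->ctime_nsec);

	if (!(cns & DEMO_QUERIED))
		return false;
	if (coarse_sec != ino->ctime_sec)
		return coarse_sec < ino->ctime_sec;
	return coarse_nsec <= (cns & ~DEMO_QUERIED);
}
```

An inode whose ctime nobody has looked at keeps taking coarse timestamps, which preserves the metadata-update savings; the more expensive fine-grained clock read happens only when an observer could otherwise see two changes with the same ctime.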
Signed-off-by: Jeff Layton --- fs/inode.c | 137 +++++++++++++++++++++++++++++++++++++++++++---------- fs/stat.c | 39 ++++++++++++++- include/linux/fs.h | 34 +++++++++---- 3 files changed, 175 insertions(+), 35 deletions(-) diff --git a/fs/inode.c b/fs/inode.c index 10c4619faeef..232b474218e6 100644 --- a/fs/inode.c +++ b/fs/inode.c @@ -2172,19 +2172,58 @@ int file_remove_privs(struct file *file) } EXPORT_SYMBOL(file_remove_privs); +/** + * current_time - Return FS time (possibly fine-grained) + * @inode: inode. + * + * Return the current time truncated to the time granularity supported by + * the fs, as suitable for a ctime/mtime change. If the ctime is flagged + * as having been QUERIED, get a fine-grained timestamp, but don't update + * the floor. + * + * For a multigrain inode, this is effectively an estimate of the timestamp + * that a file would receive. An actual update must go through + * inode_set_ctime_current(). + */ +struct timespec64 current_time(struct inode *inode) +{ + struct timespec64 now; + u32 cns; + + ktime_get_coarse_real_ts64_mg(&now); + + if (!is_mgtime(inode)) + goto out; + + /* If nothing has queried it, then coarse time is fine */ + cns = smp_load_acquire(&inode->i_ctime_nsec); + if (cns & I_CTIME_QUERIED) { + /* + * If there is no apparent change, then get a fine-grained + * timestamp. 
+ */ + if (now.tv_nsec == (cns & ~I_CTIME_QUERIED)) + ktime_get_real_ts64(&now); + } +out: + return timestamp_truncate(now, inode); +} +EXPORT_SYMBOL(current_time); + static int inode_needs_update_time(struct inode *inode) { + struct timespec64 now, ts; int sync_it = 0; - struct timespec64 now = current_time(inode); - struct timespec64 ts; /* First try to exhaust all avenues to not sync */ if (IS_NOCMTIME(inode)) return 0; + now = current_time(inode); + ts = inode_get_mtime(inode); if (!timespec64_equal(&ts, &now)) - sync_it = S_MTIME; + sync_it |= S_MTIME; ts = inode_get_ctime(inode); if (!timespec64_equal(&ts, &now)) @@ -2562,6 +2601,15 @@ void inode_nohighmem(struct inode *inode) } EXPORT_SYMBOL(inode_nohighmem); +struct timespec64 inode_set_ctime_to_ts(struct inode *inode, struct timespec64 ts) +{ + set_normalized_timespec64(&ts, ts.tv_sec, ts.tv_nsec); + inode->i_ctime_sec = ts.tv_sec; + inode->i_ctime_nsec = ts.tv_nsec; + return ts; +} +EXPORT_SYMBOL(inode_set_ctime_to_ts); + /** * timestamp_truncate - Truncate timespec to a granularity * @t: Timespec @@ -2594,36 +2642,75 @@ struct timespec64 timestamp_truncate(struct timespec64 t, struct inode *inode) EXPORT_SYMBOL(timestamp_truncate); /** - * current_time - Return FS time - * @inode: inode. + * inode_set_ctime_current - set the ctime to current_time + * @inode: inode * - * Return the current time truncated to the time granularity supported by - * the fs. + * Set the inode's ctime to the current value for the inode. Returns the + * current value that was assigned. If this is not a multigrain inode, then we + * set it to the later of the coarse time and floor value. * - * Note that inode and inode->sb cannot be NULL. - * Otherwise, the function warns and returns time without truncation. + * If it is multigrain, then we first see if the coarse-grained timestamp is + * distinct from what we have. If so, then we'll just use that. 
If we have to + * get a fine-grained timestamp, then do so, and try to swap it into the floor. + * We accept the new floor value regardless of the outcome of the cmpxchg. + * After that, we try to swap the new value into i_ctime_nsec. Again, we take + * the resulting ctime, regardless of the outcome of the swap. */ -struct timespec64 current_time(struct inode *inode) +struct timespec64 inode_set_ctime_current(struct inode *inode) { struct timespec64 now; + u32 cns, cur; - ktime_get_coarse_real_ts64(&now); - return timestamp_truncate(now, inode); -} -EXPORT_SYMBOL(current_time); + ktime_get_coarse_real_ts64_mg(&now); + now = timestamp_truncate(now, inode); -/** - * inode_set_ctime_current - set the ctime to current_time - * @inode: inode - * - * Set the inode->i_ctime to the current value for the inode. Returns - * the current value that was assigned to i_ctime. - */ -struct timespec64 inode_set_ctime_current(struct inode *inode) -{ - struct timespec64 now = current_time(inode); + /* Just return that if this is not a multigrain fs */ + if (!is_mgtime(inode)) { + inode_set_ctime_to_ts(inode, now); + goto out; + } - inode_set_ctime_to_ts(inode, now); + /* + * We only need a fine-grained time if someone has queried it, + * and the current coarse grained time isn't later than what's + * already there. + */ + cns = smp_load_acquire(&inode->i_ctime_nsec); + if (cns & I_CTIME_QUERIED) { + struct timespec64 ctime = { .tv_sec = inode->i_ctime_sec, + .tv_nsec = cns & ~I_CTIME_QUERIED }; + + if (timespec64_compare(&now, &ctime) <= 0) { + ktime_get_real_ts64_mg(&now); + now = timestamp_truncate(now, inode); + } + } + + /* No need to cmpxchg if it's exactly the same */ + if (cns == now.tv_nsec && inode->i_ctime_sec == now.tv_sec) + goto out; + cur = cns; +retry: + /* Try to swap the nsec value into place. 
*/ + if (try_cmpxchg(&inode->i_ctime_nsec, &cur, now.tv_nsec)) { + /* If swap occurred, then we're (mostly) done */ + inode->i_ctime_sec = now.tv_sec; + } else { + /* + * Was the change due to someone marking the old ctime QUERIED? + * If so then retry the swap. This can only happen once since + * the only way to clear I_CTIME_QUERIED is to stamp the inode + * with a new ctime. + */ + if (!(cns & I_CTIME_QUERIED) && (cns | I_CTIME_QUERIED) == cur) { + cns = cur; + goto retry; + } + /* Otherwise, keep the existing ctime */ + now.tv_sec = inode->i_ctime_sec; + now.tv_nsec = cur & ~I_CTIME_QUERIED; + } +out: return now; } EXPORT_SYMBOL(inode_set_ctime_current); diff --git a/fs/stat.c b/fs/stat.c index 89ce1be56310..a449626fd460 100644 --- a/fs/stat.c +++ b/fs/stat.c @@ -26,6 +26,35 @@ #include "internal.h" #include "mount.h" +/** + * fill_mg_cmtime - Fill in the mtime and ctime and flag ctime as QUERIED + * @stat: where to store the resulting values + * @request_mask: STATX_* values requested + * @inode: inode from which to grab the c/mtime + * + * Given @inode, grab the ctime and mtime out of it and store the result + * in @stat. When fetching the value, flag it as QUERIED (if not already) + * so the next write will record a distinct timestamp. 
+ */ +void fill_mg_cmtime(struct kstat *stat, u32 request_mask, struct inode *inode) +{ + atomic_t *pcn = (atomic_t *)&inode->i_ctime_nsec; + + /* If neither time was requested, then don't report them */ + if (!(request_mask & (STATX_CTIME|STATX_MTIME))) { + stat->result_mask &= ~(STATX_CTIME|STATX_MTIME); + return; + } + + stat->mtime = inode_get_mtime(inode); + stat->ctime.tv_sec = inode->i_ctime_sec; + stat->ctime.tv_nsec = (u32)atomic_read(pcn); + if (!(stat->ctime.tv_nsec & I_CTIME_QUERIED)) + stat->ctime.tv_nsec = ((u32)atomic_fetch_or(I_CTIME_QUERIED, pcn)); + stat->ctime.tv_nsec &= ~I_CTIME_QUERIED; +} +EXPORT_SYMBOL(fill_mg_cmtime); + /** * generic_fillattr - Fill in the basic attributes from the inode struct * @idmap: idmap of the mount the inode was found from @@ -58,8 +87,14 @@ void generic_fillattr(struct mnt_idmap *idmap, u32 request_mask, stat->rdev = inode->i_rdev; stat->size = i_size_read(inode); stat->atime = inode_get_atime(inode); - stat->mtime = inode_get_mtime(inode); - stat->ctime = inode_get_ctime(inode); + + if (is_mgtime(inode)) { + fill_mg_cmtime(stat, request_mask, inode); + } else { + stat->ctime = inode_get_ctime(inode); + stat->mtime = inode_get_mtime(inode); + } + stat->blksize = i_blocksize(inode); stat->blocks = inode->i_blocks; diff --git a/include/linux/fs.h b/include/linux/fs.h index 6ca11e241a24..eff688e75f2f 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -1613,6 +1613,17 @@ static inline struct timespec64 inode_set_mtime(struct inode *inode, return inode_set_mtime_to_ts(inode, ts); } +/* + * Multigrain timestamps + * + * Conditionally use fine-grained ctime and mtime timestamps when there + * are users actively observing them via getattr. The primary use-case + * for this is NFS clients that use the ctime to distinguish between + * different states of the file, and that are often fooled by multiple + * operations that occur in the same coarse-grained timer tick. 
+ */ +#define I_CTIME_QUERIED ((u32)BIT(31)) + static inline time64_t inode_get_ctime_sec(const struct inode *inode) { return inode->i_ctime_sec; @@ -1620,7 +1631,7 @@ static inline time64_t inode_get_ctime_sec(const struct inode *inode) static inline long inode_get_ctime_nsec(const struct inode *inode) { - return inode->i_ctime_nsec; + return inode->i_ctime_nsec & ~I_CTIME_QUERIED; } static inline struct timespec64 inode_get_ctime(const struct inode *inode) @@ -1631,13 +1642,7 @@ static inline struct timespec64 inode_get_ctime(const struct inode *inode) return ts; } -static inline struct timespec64 inode_set_ctime_to_ts(struct inode *inode, - struct timespec64 ts) -{ - inode->i_ctime_sec = ts.tv_sec; - inode->i_ctime_nsec = ts.tv_nsec; - return ts; -} +struct timespec64 inode_set_ctime_to_ts(struct inode *inode, struct timespec64 ts); /** * inode_set_ctime - set the ctime in the inode @@ -2500,6 +2505,7 @@ struct file_system_type { #define FS_USERNS_MOUNT 8 /* Can be mounted by userns root */ #define FS_DISALLOW_NOTIFY_PERM 16 /* Disable fanotify permission events */ #define FS_ALLOW_IDMAP 32 /* FS has been updated to handle vfs idmappings. */ +#define FS_MGTIME 64 /* FS uses multigrain timestamps */ #define FS_RENAME_DOES_D_MOVE 32768 /* FS will handle d_move() during rename() internally. */ int (*init_fs_context)(struct fs_context *); const struct fs_parameter_spec *parameters; @@ -2523,6 +2529,17 @@ struct file_system_type { #define MODULE_ALIAS_FS(NAME) MODULE_ALIAS("fs-" NAME) +/** + * is_mgtime: is this inode using multigrain timestamps + * @inode: inode to test for multigrain timestamps + * + * Return true if the inode uses multigrain timestamps, false otherwise. 
+ */ +static inline bool is_mgtime(const struct inode *inode) +{ + return inode->i_sb->s_type->fs_flags & FS_MGTIME; +} + extern struct dentry *mount_bdev(struct file_system_type *fs_type, int flags, const char *dev_name, void *data, int (*fill_super)(struct super_block *, void *, int)); @@ -3262,6 +3279,7 @@ extern void page_put_link(void *); extern int page_symlink(struct inode *inode, const char *symname, int len); extern const struct inode_operations page_symlink_inode_operations; extern void kfree_link(void *); +void fill_mg_cmtime(struct kstat *stat, u32 request_mask, struct inode *inode); void generic_fillattr(struct mnt_idmap *, u32, struct inode *, struct kstat *); void generic_fill_statx_attr(struct inode *inode, struct kstat *stat); void generic_fill_statx_atomic_writes(struct kstat *stat, From patchwork Sat Sep 14 17:07:16 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jeff Layton X-Patchwork-Id: 13804479 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id BE0C71D31AB; Sat, 14 Sep 2024 17:07:30 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1726333652; cv=none; b=RW8OeaykdGsLkKXYXG2ybq2taEpVi4Vxi/+aZsaEnkiu8rOK81CksEizIF/g431MKawHIKbcM/u+z9L3lFcdJC6IOnUxobytaQ4uW0kMl+qqNb2cFxBfqDbSjdJZwXfcsl1G46Xnu7bRoGnoTEUcUU3VvxtcLNmTG6vvcxzLYIQ= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1726333652; c=relaxed/simple; bh=9/XBaGHpEs7172AZ1LJbIYF77AkYkko2J3CyGkwFfPc=; h=From:Date:Subject:MIME-Version:Content-Type:Message-Id:References: In-Reply-To:To:Cc; 
b=Y/AMT9Z8d/OaqoW3/FA1s2Vuqi4t6Vqqq1RXol8b81q1VggO82AGaxPGRMQ8Sh50aLvwVOV0aaj7h4bALf+1bCKq1Z5+QpDd/sFKLCzmVKtJkSvRLSwTtICDCwD1DJrn7L4z4G39d6zdqPk8xsuBA+cuoFT0KAVFS3PNQ6DPJz4= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=ZCOpnIci; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="ZCOpnIci" Received: by smtp.kernel.org (Postfix) with ESMTPSA id A40BFC4CED2; Sat, 14 Sep 2024 17:07:27 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1726333650; bh=9/XBaGHpEs7172AZ1LJbIYF77AkYkko2J3CyGkwFfPc=; h=From:Date:Subject:References:In-Reply-To:To:Cc:From; b=ZCOpnIci9ttfjhsJzdlcwY1nYcNFUbiD1dipumSZt87w3MACEWPzaGdku7J58Ocut 7HnIhiMDQAIoUXSnyL904jY9IAOTXxIxiIR1LWhnuDUgwkupg8+oSupET1iJss0j6j 58zUooisPaBhdB4W13l3tpEKZ/zhOH9LDKDAINhpnCC8r6Ajx/nmf0AhEdr/uUo0Oc CjPVoNLSKEWHCQEGyvzxrzRJEqsldglkPe2BcUgUhDpbf1SwnPrBrDTtNubansxew5 /6dR/EZrDa8YIK8KYySDQSYx4czlItz8HflLG4e7RWbLXVLXgWgfxKuTUczpylZw1Y XmuT7q3oYEnbw== From: Jeff Layton Date: Sat, 14 Sep 2024 13:07:16 -0400 Subject: [PATCH v8 03/11] fs: have setattr_copy handle multigrain timestamps appropriately Precedence: bulk X-Mailing-List: linux-nfs@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Message-Id: <20240914-mgtime-v8-3-5bd872330bed@kernel.org> References: <20240914-mgtime-v8-0-5bd872330bed@kernel.org> In-Reply-To: <20240914-mgtime-v8-0-5bd872330bed@kernel.org> To: John Stultz , Thomas Gleixner , Stephen Boyd , Alexander Viro , Christian Brauner , Jan Kara , Steven Rostedt , Masami Hiramatsu , Mathieu Desnoyers , Jonathan Corbet , Chandan Babu R , "Darrick J. 
Wong" , Theodore Ts'o , Andreas Dilger , Chris Mason , Josef Bacik , David Sterba , Hugh Dickins , Andrew Morton , Chuck Lever , Vadim Fedorenko Cc: Randy Dunlap , linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-trace-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-xfs@vger.kernel.org, linux-ext4@vger.kernel.org, linux-btrfs@vger.kernel.org, linux-nfs@vger.kernel.org, linux-mm@kvack.org, Jeff Layton X-Mailer: b4 0.14.1 X-Developer-Signature: v=1; a=openpgp-sha256; l=3540; i=jlayton@kernel.org; h=from:subject:message-id; bh=9/XBaGHpEs7172AZ1LJbIYF77AkYkko2J3CyGkwFfPc=; b=owEBbQKS/ZANAwAIAQAOaEEZVoIVAcsmYgBm5cLFOqolNl/xZsf6WUTQki64XUYvRpIiwgA6E 8I95y8GTEWJAjMEAAEIAB0WIQRLwNeyRHGyoYTq9dMADmhBGVaCFQUCZuXCxQAKCRAADmhBGVaC FV3xEACKkDWRY2iDTOCIYD0Vc9W6MmuR9SEw96FMPF1lO+jbD26E8dxf0hrkhre9oBUikVhQVAK mso6eiHYIR5pTUeQadzyNEdK8QQwVDVIq95IG5F1sp5NZBdW+myoBZzaVbX6i42BeAmKBFm+EAR uc0WhyS1vxp/5G2TzkAGEot1iMRUtgptkCYA1ksBNkbAQuAUpc28ksyE7gtXE+XTYRjt1+h7lb8 cXQakAWWrU9Q6lvD5Eu3kI30Ml47tKa/aLFyPct/CMVP1wzQT79+5bCbUQeoa3EmPsYZG3i6tgS TXVg1FFqlViABoU1Khb1RT0U4PY2i8jCrcZbncFaS2IJqlBSy2X3Eq618IyGtKF2ZOLoZV80axd hJFet3BIa4P64HQ4Fwjedi/V4OdyKodhAbbAzinWTRT89vH6DMI6nNlFCRSayI4PGlcoqrD4gZK rDvvA59XFLCJ978lHKKngt2iaKf3f90k955si5r2iZahPIjkJNnt1y1GRg1hVrP8NwwI2cyrNxV q0N4yXSUlEdwbb42iJKH5aKp7AADnCwtFkCZ/W3qxfAyhlm3eVtV/60AuQS3lzjIXr+TYqy5CWC BFUrQdbfj9XMZMNWLwf4qTQqvZKxw+Pt2OUbDcndFIk5qB1yYq0dgaRpOrtgUPx9FAtsIGGKb+8 vkCknd986DS7deg== X-Developer-Key: i=jlayton@kernel.org; a=openpgp; fpr=4BC0D7B24471B2A184EAF5D3000E684119568215 The setattr codepath is still using coarse-grained timestamps, even on multigrain filesystems. To fix this, we need to fetch the timestamp for ctime updates later, at the point where the assignment occurs in setattr_copy. On a multigrain inode, ignore the ia_ctime in the attrs, and always update the ctime to the current clock value. 
Update the atime and mtime with the same value (if needed) unless they are being set to other specific values, a la utimes(). Note, however, that we don't want to do this universally, as some filesystems (e.g. most network filesystems) want to do an explicit update elsewhere before updating the local inode. Reviewed-by: Darrick J. Wong Reviewed-by: Josef Bacik Reviewed-by: Jan Kara Signed-off-by: Jeff Layton --- fs/attr.c | 52 ++++++++++++++++++++++++++++++++++++++++++++++------ 1 file changed, 46 insertions(+), 6 deletions(-) diff --git a/fs/attr.c b/fs/attr.c index c04d19b58f12..3bcbc45708a3 100644 --- a/fs/attr.c +++ b/fs/attr.c @@ -271,6 +271,42 @@ int inode_newsize_ok(const struct inode *inode, loff_t offset) } EXPORT_SYMBOL(inode_newsize_ok); +/** + * setattr_copy_mgtime - update timestamps for mgtime inodes + * @inode: inode timestamps to be updated + * @attr: attrs for the update + * + * With multigrain timestamps, we need to take more care to prevent races + * when updating the ctime. Always update the ctime to the very latest + * using the standard mechanism, and use that to populate the atime and + * mtime appropriately (unless we're setting those to specific values). + */ +static void setattr_copy_mgtime(struct inode *inode, const struct iattr *attr) +{ + unsigned int ia_valid = attr->ia_valid; + struct timespec64 now; + + /* + * If the ctime isn't being updated then nothing else should be + * either. 
+ */ + if (!(ia_valid & ATTR_CTIME)) { + WARN_ON_ONCE(ia_valid & (ATTR_ATIME|ATTR_MTIME)); + return; + } + + now = inode_set_ctime_current(inode); + if (ia_valid & ATTR_ATIME_SET) + inode_set_atime_to_ts(inode, attr->ia_atime); + else if (ia_valid & ATTR_ATIME) + inode_set_atime_to_ts(inode, now); + + if (ia_valid & ATTR_MTIME_SET) + inode_set_mtime_to_ts(inode, attr->ia_mtime); + else if (ia_valid & ATTR_MTIME) + inode_set_mtime_to_ts(inode, now); +} + /** * setattr_copy - copy simple metadata updates into the generic inode * @idmap: idmap of the mount the inode was found from @@ -303,12 +339,6 @@ void setattr_copy(struct mnt_idmap *idmap, struct inode *inode, i_uid_update(idmap, attr, inode); i_gid_update(idmap, attr, inode); - if (ia_valid & ATTR_ATIME) - inode_set_atime_to_ts(inode, attr->ia_atime); - if (ia_valid & ATTR_MTIME) - inode_set_mtime_to_ts(inode, attr->ia_mtime); - if (ia_valid & ATTR_CTIME) - inode_set_ctime_to_ts(inode, attr->ia_ctime); if (ia_valid & ATTR_MODE) { umode_t mode = attr->ia_mode; if (!in_group_or_capable(idmap, inode, @@ -316,6 +346,16 @@ void setattr_copy(struct mnt_idmap *idmap, struct inode *inode, mode &= ~S_ISGID; inode->i_mode = mode; } + + if (is_mgtime(inode)) + return setattr_copy_mgtime(inode, attr); + + if (ia_valid & ATTR_ATIME) + inode_set_atime_to_ts(inode, attr->ia_atime); + if (ia_valid & ATTR_MTIME) + inode_set_mtime_to_ts(inode, attr->ia_mtime); + if (ia_valid & ATTR_CTIME) + inode_set_ctime_to_ts(inode, attr->ia_ctime); } EXPORT_SYMBOL(setattr_copy); From patchwork Sat Sep 14 17:07:17 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jeff Layton X-Patchwork-Id: 13804480 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 8563F1D45FF; Sat, 14 Sep 2024 
17:07:33 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1726333653; cv=none; b=Feb7mXNlabbUKzDz4oloBP6Be2GhgzKAYCzT+rEBSLUDHnBTbWIlklziEEC8Yo9ZmZpOcvekPmiYUb2FRy7XyieuEDS1WgnN/dAJLw9VYZHCJTA2X7s72qarqLp/cWhW5ZkT8nxtUynuTjDIc0sM/MS6N0kuERZzmPjMwo+KbsQ= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1726333653; c=relaxed/simple; bh=aSqbB0gVujzisU/y6FV03b+ti8yD2UZ7U4fMZl1YgTo=; h=From:Date:Subject:MIME-Version:Content-Type:Message-Id:References: In-Reply-To:To:Cc; b=FYRKxw4cgeciM0Mzd1fRFbX9mFKoJdJ9/y6WWUJ4IX4rk84ICN6hL+XqJqhBJ4TVpjjB8kE+jCZyB+B+rgk/fRUbKG6/H6zGc4EdOILlDkp7tm6nX+X3DZ8YdeyE5EmhSvoUYRRd88HOhs2vkAMDemzZ/h+tugaqccrV2/72nXM= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=Pl5qlkt/; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="Pl5qlkt/" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 78032C4CEC0; Sat, 14 Sep 2024 17:07:30 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1726333653; bh=aSqbB0gVujzisU/y6FV03b+ti8yD2UZ7U4fMZl1YgTo=; h=From:Date:Subject:References:In-Reply-To:To:Cc:From; b=Pl5qlkt/VRPxlEcECODI0Xew6wnc+fSPdtKQp8YUJed8MYy0e5w2xaPwRcdMBgUEX EpB1GwVipSkYtzT4yrDLNRGOmaESeRR8m1tNAMSKO5hR4QqT1zGET+2Eou0MilSySI /62SgfKM/z80LwjQbkooyLyYx8/YcM4NEjijduCtIUi1nxOwqRJAvxDk0IGFVy489A IizzP7V2GujnZLllrWkxp4cKoqQkdRkijuNJJB3pofH9F6G2JmhIMFY7uTji5J0rxl osd1Y1rOrPAsGiKanTdIrtULRfkQUyOW9EY0W2vuNxYLQ3adHa0xSm1IFwWGFcOOoT 1Y1NG/JInOehw== From: Jeff Layton Date: Sat, 14 Sep 2024 13:07:17 -0400 Subject: [PATCH v8 04/11] fs: handle delegated timestamps in setattr_copy_mgtime Precedence: bulk X-Mailing-List: linux-nfs@vger.kernel.org List-Id: List-Subscribe: 
List-Unsubscribe: MIME-Version: 1.0 Message-Id: <20240914-mgtime-v8-4-5bd872330bed@kernel.org> References: <20240914-mgtime-v8-0-5bd872330bed@kernel.org> In-Reply-To: <20240914-mgtime-v8-0-5bd872330bed@kernel.org> To: John Stultz , Thomas Gleixner , Stephen Boyd , Alexander Viro , Christian Brauner , Jan Kara , Steven Rostedt , Masami Hiramatsu , Mathieu Desnoyers , Jonathan Corbet , Chandan Babu R , "Darrick J. Wong" , Theodore Ts'o , Andreas Dilger , Chris Mason , Josef Bacik , David Sterba , Hugh Dickins , Andrew Morton , Chuck Lever , Vadim Fedorenko Cc: Randy Dunlap , linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-trace-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-xfs@vger.kernel.org, linux-ext4@vger.kernel.org, linux-btrfs@vger.kernel.org, linux-nfs@vger.kernel.org, linux-mm@kvack.org, Jeff Layton X-Mailer: b4 0.14.1 X-Developer-Signature: v=1; a=openpgp-sha256; l=6206; i=jlayton@kernel.org; h=from:subject:message-id; bh=aSqbB0gVujzisU/y6FV03b+ti8yD2UZ7U4fMZl1YgTo=; b=owEBbQKS/ZANAwAIAQAOaEEZVoIVAcsmYgBm5cLFQ8HK1/W2OQTOc42uOSa7MYW8Wq+Whom2H gQ2acWJQ+SJAjMEAAEIAB0WIQRLwNeyRHGyoYTq9dMADmhBGVaCFQUCZuXCxQAKCRAADmhBGVaC Fc5vEADJP9NxRrVNDWeeviXY5a0IMlU18SPHOde4ugVGgDVCrzzVS05AkoXjAn2YVxLbwxN2ls2 Xt0ndOnQIzwg5/eJrvpZdUlgKhEzMwtV2nsiT6qdBMSK0yeQHxgbbeNb1CUF66GiuTafnxCFO9c sccqTqX75ub/SOPK/89OsWUfBPIJtKcBqOzMeOsgJE17qSOBkXRr0ItxwxbxWc0m4ryqmIvth3z NPeW+VHgfvWPnx/eEK0FKiYh/Bkyeb6GdebifCbkRUEtCIYww4s3QkE+jWy1rb8akDFPD94vooX 5sZ77ScCJaYnb0Jas54ze20L1jQI+XoYIo0MLqjUAIY2cKMIRrzCDXCpoEPPnucksVjxFRJTaj2 mGXAC1vlu/7x+xu9rS8qu3xSUaShw/WCDKAyWkL1WSrd9pNgWenfVdu8FSJBc/3cSxXT93VT5Fk 1LgzBtyh74/qerEJ6ie4twa+D+eVUGmNsLVEg2EnSX63Gk5BXeQTn1sdnf56tpqTz8K5QUhjPQV XPkOopyKtxqzlYrpF7CJyJr8AetoNK9xLXniUW5qwbuDR1Hjqr1vm1EBnXDGk8khK5T27zBmxUb LwkTcemBpaEbcbX/PGbRS3tr6NFaZdf4Koe6U0Mu+rN/k0srUBYIxZlJj/WWHmBlKMrRspT/g+j VIHjLzOS8yubbvg== X-Developer-Key: i=jlayton@kernel.org; a=openpgp; fpr=4BC0D7B24471B2A184EAF5D3000E684119568215 When updating the ctime on an 
inode for a SETATTR on a multigrain filesystem, we usually want to take the latest time we can get for the ctime. The exception to this rule is when there is an nfsd write delegation and the server is proxying timestamps from the client. When nfsd gets a CB_GETATTR response, we want to update the timestamps in the inode to the values that the client is tracking. The client doesn't send a ctime value (since that's always determined by the exported filesystem), but it can send an mtime value. When it does, we may need to update the ctime to a value commensurate with that mtime instead of the current time. If ATTR_DELEG is set, then use the ia_ctime value instead of setting the timestamp to the current time. With the addition of delegated timestamps, we can also receive a request to update only the atime, in which case we may not need to set the ctime. Trust the ATTR_CTIME flag in the update and only update the ctime when it's set. Signed-off-by: Jeff Layton --- fs/attr.c | 28 +++++++++++++-------- fs/inode.c | 72 ++++++++++++++++++++++++++++++++++++++++++++++++++++++ include/linux/fs.h | 2 ++ 3 files changed, 92 insertions(+), 10 deletions(-) diff --git a/fs/attr.c b/fs/attr.c index 3bcbc45708a3..392eb62aa609 100644 --- a/fs/attr.c +++ b/fs/attr.c @@ -286,16 +286,20 @@ static void setattr_copy_mgtime(struct inode *inode, const struct iattr *attr) unsigned int ia_valid = attr->ia_valid; struct timespec64 now; - /* - * If the ctime isn't being updated then nothing else should be - * either. - */ - if (!(ia_valid & ATTR_CTIME)) { - WARN_ON_ONCE(ia_valid & (ATTR_ATIME|ATTR_MTIME)); - return; + if (ia_valid & ATTR_CTIME) { + /* + * In the case of an update for a write delegation, we must respect + * the value in ia_ctime and not use the current time. + */ + if (ia_valid & ATTR_DELEG) + now = inode_set_ctime_deleg(inode, attr->ia_ctime); + else + now = inode_set_ctime_current(inode); + } else { + /* If ATTR_CTIME isn't set, then ATTR_MTIME shouldn't be either. 
*/ + WARN_ON_ONCE(ia_valid & ATTR_MTIME); } - now = inode_set_ctime_current(inode); if (ia_valid & ATTR_ATIME_SET) inode_set_atime_to_ts(inode, attr->ia_atime); else if (ia_valid & ATTR_ATIME) @@ -354,8 +358,12 @@ void setattr_copy(struct mnt_idmap *idmap, struct inode *inode, inode_set_atime_to_ts(inode, attr->ia_atime); if (ia_valid & ATTR_MTIME) inode_set_mtime_to_ts(inode, attr->ia_mtime); - if (ia_valid & ATTR_CTIME) - inode_set_ctime_to_ts(inode, attr->ia_ctime); + if (ia_valid & ATTR_CTIME) { + if (ia_valid & ATTR_DELEG) + inode_set_ctime_deleg(inode, attr->ia_ctime); + else + inode_set_ctime_to_ts(inode, attr->ia_ctime); + } } EXPORT_SYMBOL(setattr_copy); diff --git a/fs/inode.c b/fs/inode.c index 232b474218e6..614d0402e9ad 100644 --- a/fs/inode.c +++ b/fs/inode.c @@ -2715,6 +2715,78 @@ struct timespec64 inode_set_ctime_current(struct inode *inode) } EXPORT_SYMBOL(inode_set_ctime_current); +/** + * inode_set_ctime_deleg - try to update the ctime on a delegated inode + * @inode: inode to update + * @update: timespec64 to set the ctime + * + * Attempt to atomically update the ctime on behalf of a delegation holder. + * + * The nfs server can call back the holder of a delegation to get updated + * inode attributes, including the mtime. When updating the mtime we may + * need to update the ctime to a value at least equal to that. + * + * This can race with concurrent updates to the inode, in which + * case we just don't do the update. + * + * Note that this works even when multigrain timestamps are not enabled, + * so use it in either case. + */ +struct timespec64 inode_set_ctime_deleg(struct inode *inode, struct timespec64 update) +{ + struct timespec64 now, cur_ts; + u32 cur, old; + + /* pairs with try_cmpxchg below */ + cur = smp_load_acquire(&inode->i_ctime_nsec); + cur_ts.tv_nsec = cur & ~I_CTIME_QUERIED; + cur_ts.tv_sec = inode->i_ctime_sec; + + /* If the update is older than the existing value, skip it. 
*/ + if (timespec64_compare(&update, &cur_ts) <= 0) + return cur_ts; + + ktime_get_coarse_real_ts64_mg(&now); + + /* Clamp the update to "now" if it's in the future */ + if (timespec64_compare(&update, &now) > 0) + update = now; + + update = timestamp_truncate(update, inode); + + /* No need to update if the values are already the same */ + if (timespec64_equal(&update, &cur_ts)) + return cur_ts; + + /* + * Try to swap the nsec value into place. If it fails, that means + * we raced with an update due to a write or similar activity. That + * stamp takes precedence, so just skip the update. + */ +retry: + old = cur; + if (try_cmpxchg(&inode->i_ctime_nsec, &cur, update.tv_nsec)) { + inode->i_ctime_sec = update.tv_sec; + mgtime_counter_inc(mg_ctime_swaps); + return update; + } + + /* + * Was the change due to someone marking the old ctime QUERIED? + * If so then retry the swap. This can only happen once since + * the only way to clear I_CTIME_QUERIED is to stamp the inode + * with a new ctime. + */ + if (!(old & I_CTIME_QUERIED) && (cur == (old | I_CTIME_QUERIED))) + goto retry; + + /* Otherwise, it was a new timestamp. 
*/ + cur_ts.tv_sec = inode->i_ctime_sec; + cur_ts.tv_nsec = cur & ~I_CTIME_QUERIED; + return cur_ts; +} +EXPORT_SYMBOL(inode_set_ctime_deleg); + /** * in_group_or_capable - check whether caller is CAP_FSETID privileged * @idmap: idmap of the mount @inode was found from diff --git a/include/linux/fs.h b/include/linux/fs.h index eff688e75f2f..ea7ed437d2b1 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -1544,6 +1544,8 @@ static inline bool fsuidgid_has_mapping(struct super_block *sb, struct timespec64 current_time(struct inode *inode); struct timespec64 inode_set_ctime_current(struct inode *inode); +struct timespec64 inode_set_ctime_deleg(struct inode *inode, + struct timespec64 update); static inline time64_t inode_get_atime_sec(const struct inode *inode) { From patchwork Sat Sep 14 17:07:18 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jeff Layton X-Patchwork-Id: 13804481 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 578561D12F8; Sat, 14 Sep 2024 17:07:36 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1726333656; cv=none; b=NNDHClBF6TEWPxWBmj+2MsKqghu9xhNsuI8gnlC54vZdOCeh6T9OKUZe3bkFjCrFFrLiQOfFInDSxwllKEEql4nVo4JVg8N5KJUj12LJH2gbOsrgN8adhQGgnPxput9wnHtxzxn52m5j4+YbX5BYOTyxWFfE28hzQKdddD+Glls= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1726333656; c=relaxed/simple; bh=/dfE47m7K8nZA6bqMBlfXG7BPm8zSfwAyhiBp2lHZvM=; h=From:Date:Subject:MIME-Version:Content-Type:Message-Id:References: In-Reply-To:To:Cc; 
b=YzPInGF12cFRvBa+4m+zMzIPIHsnnvXPF+w/w51hr89BMLg8ug5eiNIx2a3NDAzwkxYgpPrj6w7Lh2x/3pnCJNRjgoZ7YUW/Ezb4f85GTnT9+zotGAnD7w3jWQABlUi/8f53wpOWAp+XuwfEwmTMVtMfZjnql5rvdwt5RPcTyiE= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=VQYAuBes; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="VQYAuBes" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 4BA65C4CECC; Sat, 14 Sep 2024 17:07:33 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1726333655; bh=/dfE47m7K8nZA6bqMBlfXG7BPm8zSfwAyhiBp2lHZvM=; h=From:Date:Subject:References:In-Reply-To:To:Cc:From; b=VQYAuBesq56UQPycOVQYKukq5mJbQ9QK4E7WhdklLL7GlhgbMNKWmgah4QPT7IO2Z AHj8MRiXTcHSGOFaEZ7iTGLMQmiKQMcCJ86KBKkyLGFKQQqXeT3otFDPkGicHlZCfo 6OEyw4ujRtYlCEUbUxTjonLAIYcQHDNz13Rj0JyCKMxwsP6HtherPjsTMDVla2+XXJ lPRKe55XyWLGdPU+Bk6eFQOA4QiZiq4T6fmkCnH4HtdCn4nyCcxLQGNZaQewo01SHe Vtz+KmyeYvomaHHxZSNoNYLu6sM0gV0sVx8xgQCBQZvN4z2pl/A9Ab7LxHqDiznOHt VugKDAtQdg4uA== From: Jeff Layton Date: Sat, 14 Sep 2024 13:07:18 -0400 Subject: [PATCH v8 05/11] fs: tracepoints around multigrain timestamp events Precedence: bulk X-Mailing-List: linux-nfs@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Message-Id: <20240914-mgtime-v8-5-5bd872330bed@kernel.org> References: <20240914-mgtime-v8-0-5bd872330bed@kernel.org> In-Reply-To: <20240914-mgtime-v8-0-5bd872330bed@kernel.org> To: John Stultz , Thomas Gleixner , Stephen Boyd , Alexander Viro , Christian Brauner , Jan Kara , Steven Rostedt , Masami Hiramatsu , Mathieu Desnoyers , Jonathan Corbet , Chandan Babu R , "Darrick J. 
Wong" , Theodore Ts'o , Andreas Dilger , Chris Mason , Josef Bacik , David Sterba , Hugh Dickins , Andrew Morton , Chuck Lever , Vadim Fedorenko Cc: Randy Dunlap , linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-trace-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-xfs@vger.kernel.org, linux-ext4@vger.kernel.org, linux-btrfs@vger.kernel.org, linux-nfs@vger.kernel.org, linux-mm@kvack.org, Jeff Layton X-Mailer: b4 0.14.1 X-Developer-Signature: v=1; a=openpgp-sha256; l=5947; i=jlayton@kernel.org; h=from:subject:message-id; bh=/dfE47m7K8nZA6bqMBlfXG7BPm8zSfwAyhiBp2lHZvM=; b=owEBbQKS/ZANAwAIAQAOaEEZVoIVAcsmYgBm5cLG5tHz6yHM6q5mnvpj1Gu54/Dx+ejg8C/Lo OL6Yjyr7tSJAjMEAAEIAB0WIQRLwNeyRHGyoYTq9dMADmhBGVaCFQUCZuXCxgAKCRAADmhBGVaC FSzSD/9C2MjL8co2O/piD/TweFTojMnHgr0m/BmN/ofQm9TCW/y8BD7awkt7HUezkDb1iYLdyB/ mH1mvdLoddXhzME0s6tZBboBr8tuz/5mdiAewdqeMI3/Y/f2Z7wpdfHIOKHTrOXiD16vaWUlFfd feaAqdR8hjYXciuRBcRJzcJwyvkISuZUJz03+s/w73WvTASj1Iev39akcCFpJg8XN5Tfyt8S8nR xjb2fSfXNYCBu79VlQ7RzjhRLMqMfugbbnq+9SaqYRL4OaLZS5WZVHobfdGzmhAZI8IcP9cKuaY cZKdb5UCzcQtBkl1vKL2Dz4S3HCP0v5jG/1r+5mzJ6RVndUnowSWfi6+WIXoqtWPKSftX+/BtRt QxZNMb3zlh4PRtBuryOdhk1zJUJ/Q3oKglc2vERC23EJV4FN3l7CFYsQ6xYSvFNqcAIJ8PBBRAJ 0zT2vceH51lrHWOfc/H3fWN48m14JtgQfiWogDrV+Bww7I/k7hfjhRMhoBoCssKOH6E+XkTZXIk seUqqYcXooLcs7NaQdxeis5R2B75CBMNIDLJphZy1Ejjw3gD8OJ2Alj8pt5tXONZfYHcIrFqdog UTFuATPPICaN0K1m0yUB4WQzG4e9XQBNd4jbaBDSGxH861ChDaqx4Bi+8sNDxWYxWj3lYqvy5a/ icz9mM/v5aV35AA== X-Developer-Key: i=jlayton@kernel.org; a=openpgp; fpr=4BC0D7B24471B2A184EAF5D3000E684119568215 Add some tracepoints around various multigrain timestamp events. Reviewed-by: Josef Bacik Reviewed-by: Darrick J. 
Wong Reviewed-by: Jan Kara Signed-off-by: Jeff Layton Reviewed-by: Steven Rostedt (Google) --- fs/inode.c | 9 ++- fs/stat.c | 3 + include/trace/events/timestamp.h | 124 +++++++++++++++++++++++++++++++++++++++ 3 files changed, 135 insertions(+), 1 deletion(-) diff --git a/fs/inode.c b/fs/inode.c index 614d0402e9ad..d7da9d06921f 100644 --- a/fs/inode.c +++ b/fs/inode.c @@ -22,6 +22,9 @@ #include #include #include +#define CREATE_TRACE_POINTS +#include + #include "internal.h" /* @@ -2603,6 +2606,7 @@ EXPORT_SYMBOL(inode_nohighmem); struct timespec64 inode_set_ctime_to_ts(struct inode *inode, struct timespec64 ts) { + trace_inode_set_ctime_to_ts(inode, &ts); set_normalized_timespec64(&ts, ts.tv_sec, ts.tv_nsec); inode->i_ctime_sec = ts.tv_sec; inode->i_ctime_nsec = ts.tv_nsec; @@ -2687,14 +2691,17 @@ struct timespec64 inode_set_ctime_current(struct inode *inode) } /* No need to cmpxchg if it's exactly the same */ - if (cns == now.tv_nsec && inode->i_ctime_sec == now.tv_sec) + if (cns == now.tv_nsec && inode->i_ctime_sec == now.tv_sec) { + trace_ctime_xchg_skip(inode, &now); goto out; + } cur = cns; retry: /* Try to swap the nsec value into place. */ if (try_cmpxchg(&inode->i_ctime_nsec, &cur, now.tv_nsec)) { /* If swap occurred, then we're (mostly) done */ inode->i_ctime_sec = now.tv_sec; + trace_ctime_ns_xchg(inode, cns, now.tv_nsec, cur); } else { /* * Was the change due to someone marking the old ctime QUERIED? 
diff --git a/fs/stat.c b/fs/stat.c index a449626fd460..9eb6d9b2d010 100644 --- a/fs/stat.c +++ b/fs/stat.c @@ -23,6 +23,8 @@ #include #include +#include + #include "internal.h" #include "mount.h" @@ -52,6 +54,7 @@ void fill_mg_cmtime(struct kstat *stat, u32 request_mask, struct inode *inode) if (!(stat->ctime.tv_nsec & I_CTIME_QUERIED)) stat->ctime.tv_nsec = ((u32)atomic_fetch_or(I_CTIME_QUERIED, pcn)); stat->ctime.tv_nsec &= ~I_CTIME_QUERIED; + trace_fill_mg_cmtime(inode, &stat->ctime, &stat->mtime); } EXPORT_SYMBOL(fill_mg_cmtime); diff --git a/include/trace/events/timestamp.h b/include/trace/events/timestamp.h new file mode 100644 index 000000000000..c9e5ec930054 --- /dev/null +++ b/include/trace/events/timestamp.h @@ -0,0 +1,124 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +#undef TRACE_SYSTEM +#define TRACE_SYSTEM timestamp + +#if !defined(_TRACE_TIMESTAMP_H) || defined(TRACE_HEADER_MULTI_READ) +#define _TRACE_TIMESTAMP_H + +#include +#include + +#define CTIME_QUERIED_FLAGS \ + { I_CTIME_QUERIED, "Q" } + +DECLARE_EVENT_CLASS(ctime, + TP_PROTO(struct inode *inode, + struct timespec64 *ctime), + + TP_ARGS(inode, ctime), + + TP_STRUCT__entry( + __field(dev_t, dev) + __field(ino_t, ino) + __field(time64_t, ctime_s) + __field(u32, ctime_ns) + __field(u32, gen) + ), + + TP_fast_assign( + __entry->dev = inode->i_sb->s_dev; + __entry->ino = inode->i_ino; + __entry->gen = inode->i_generation; + __entry->ctime_s = ctime->tv_sec; + __entry->ctime_ns = ctime->tv_nsec; + ), + + TP_printk("ino=%d:%d:%ld:%u ctime=%lld.%u", + MAJOR(__entry->dev), MINOR(__entry->dev), __entry->ino, __entry->gen, + __entry->ctime_s, __entry->ctime_ns + ) +); + +DEFINE_EVENT(ctime, inode_set_ctime_to_ts, + TP_PROTO(struct inode *inode, + struct timespec64 *ctime), + TP_ARGS(inode, ctime)); + +DEFINE_EVENT(ctime, ctime_xchg_skip, + TP_PROTO(struct inode *inode, + struct timespec64 *ctime), + TP_ARGS(inode, ctime)); + +TRACE_EVENT(ctime_ns_xchg, + TP_PROTO(struct inode *inode, + u32 old, + u32 new, 
+ u32 cur), + + TP_ARGS(inode, old, new, cur), + + TP_STRUCT__entry( + __field(dev_t, dev) + __field(ino_t, ino) + __field(u32, gen) + __field(u32, old) + __field(u32, new) + __field(u32, cur) + ), + + TP_fast_assign( + __entry->dev = inode->i_sb->s_dev; + __entry->ino = inode->i_ino; + __entry->gen = inode->i_generation; + __entry->old = old; + __entry->new = new; + __entry->cur = cur; + ), + + TP_printk("ino=%d:%d:%ld:%u old=%u:%s new=%u cur=%u:%s", + MAJOR(__entry->dev), MINOR(__entry->dev), __entry->ino, __entry->gen, + __entry->old & ~I_CTIME_QUERIED, + __print_flags(__entry->old & I_CTIME_QUERIED, "|", CTIME_QUERIED_FLAGS), + __entry->new, + __entry->cur & ~I_CTIME_QUERIED, + __print_flags(__entry->cur & I_CTIME_QUERIED, "|", CTIME_QUERIED_FLAGS) + ) +); + +TRACE_EVENT(fill_mg_cmtime, + TP_PROTO(struct inode *inode, + struct timespec64 *ctime, + struct timespec64 *mtime), + + TP_ARGS(inode, ctime, mtime), + + TP_STRUCT__entry( + __field(dev_t, dev) + __field(ino_t, ino) + __field(time64_t, ctime_s) + __field(time64_t, mtime_s) + __field(u32, ctime_ns) + __field(u32, mtime_ns) + __field(u32, gen) + ), + + TP_fast_assign( + __entry->dev = inode->i_sb->s_dev; + __entry->ino = inode->i_ino; + __entry->gen = inode->i_generation; + __entry->ctime_s = ctime->tv_sec; + __entry->mtime_s = mtime->tv_sec; + __entry->ctime_ns = ctime->tv_nsec; + __entry->mtime_ns = mtime->tv_nsec; + ), + + TP_printk("ino=%d:%d:%ld:%u ctime=%lld.%u mtime=%lld.%u", + MAJOR(__entry->dev), MINOR(__entry->dev), __entry->ino, __entry->gen, + __entry->ctime_s, __entry->ctime_ns, + __entry->mtime_s, __entry->mtime_ns + ) +); +#endif /* _TRACE_TIMESTAMP_H */ + +/* This part must be outside protection */ +#include From patchwork Sat Sep 14 17:07:19 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jeff Layton X-Patchwork-Id: 13804482 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) 
(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 2732C1D589A; Sat, 14 Sep 2024 17:07:38 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1726333659; cv=none; b=Qt+sygecCntRf3fH1f5NjXEnG940+XDYMn1HKhEhhnit61GqN4HXzBCvKpOI3f+RffMvjW9Bi7PuC5072rUjFbzxqn5v27bUux5Rv4NiwegzZqrF6cp3jsB7nlmdIq5TMk5gDFPR44tmbadR1m1HsGSBabiDVkaCoAEvlg9gN+Q= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1726333659; c=relaxed/simple; bh=+3a+MvaEmr8jlVEnfUS/6TeXNgPVOZEguly4I5T5qLo=; h=From:Date:Subject:MIME-Version:Content-Type:Message-Id:References: In-Reply-To:To:Cc; b=s6ni0GAUVWKHTvJIBd1nd7tizCg4gbggIjrIxZmW8lvqh9jBP9AAfp6S86RyJ6O6TbFqVeZo0qfRGb9RH1xCDIC2313bpZy7wAVCYYumfoiVuR00P6t17TfA3bvYpAmUTfCiVsYRQP4DFGsWrk+B+CYo6FKqRF/5oWCLtNDSP+g= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=aD+iHFXG; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="aD+iHFXG" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 20F96C4CECF; Sat, 14 Sep 2024 17:07:36 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1726333658; bh=+3a+MvaEmr8jlVEnfUS/6TeXNgPVOZEguly4I5T5qLo=; h=From:Date:Subject:References:In-Reply-To:To:Cc:From; b=aD+iHFXGqA5JgGaTYLcJ4FNvOcGWpB4PW6RtlVSPpR24jd5+bFiFM7Pxh4qJ7JBu/ dYIJMiQoz4IGBxFkWI0inRm2+leKVKhkQPUsfiEYkIMZnGasQL+JLA9x0dgds+ULSU 6UHnPiF4cgMAYIHYTD397RggIVocHkt1rqfLCKDYAdercMlwkBqOrPnh4rMnVH2tv9 i2lBH38WJTEteBvsFxDp6mzkxg6PJ3Z5ju7h/GBfzNQsaAwis4exfZKYq5TQvOO4kz 5RJaBkotGU4z1xjOjBeeY3gXWZ4yC4qeDlfubsn5Bu1zr3Mq4cPE1xE/CqXo+y/PwG uwtpVpVM9zK1g== From: Jeff Layton Date: Sat, 14 Sep 
2024 13:07:19 -0400 Subject: [PATCH v8 06/11] fs: add percpu counters for significant multigrain timestamp events Precedence: bulk X-Mailing-List: linux-nfs@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Message-Id: <20240914-mgtime-v8-6-5bd872330bed@kernel.org> References: <20240914-mgtime-v8-0-5bd872330bed@kernel.org> In-Reply-To: <20240914-mgtime-v8-0-5bd872330bed@kernel.org> To: John Stultz , Thomas Gleixner , Stephen Boyd , Alexander Viro , Christian Brauner , Jan Kara , Steven Rostedt , Masami Hiramatsu , Mathieu Desnoyers , Jonathan Corbet , Chandan Babu R , "Darrick J. Wong" , Theodore Ts'o , Andreas Dilger , Chris Mason , Josef Bacik , David Sterba , Hugh Dickins , Andrew Morton , Chuck Lever , Vadim Fedorenko Cc: Randy Dunlap , linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-trace-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-xfs@vger.kernel.org, linux-ext4@vger.kernel.org, linux-btrfs@vger.kernel.org, linux-nfs@vger.kernel.org, linux-mm@kvack.org, Jeff Layton X-Mailer: b4 0.14.1 X-Developer-Signature: v=1; a=openpgp-sha256; l=7206; i=jlayton@kernel.org; h=from:subject:message-id; bh=+3a+MvaEmr8jlVEnfUS/6TeXNgPVOZEguly4I5T5qLo=; b=owEBbQKS/ZANAwAIAQAOaEEZVoIVAcsmYgBm5cLGQYo9QregMNCy/xaR7pG6H/Plnuv5MCrll yGT7FFAkm2JAjMEAAEIAB0WIQRLwNeyRHGyoYTq9dMADmhBGVaCFQUCZuXCxgAKCRAADmhBGVaC FY4dEACQ30o7dm6/X64AG7zaEOxgddb96T6gDj8TDTuCswZDN4BXXzgQQ3pmW+k7IGGXxgoZv1L Ah6te+pdJPDzaY3XkddSkrAAGklFh7bDRhwz8fN8W003zQIHSMLCDPT1jMdThh8U6VYDLvstq3J fY/+GyscMLHjaven87xQGU7DXJdWqWRdXfEmP4iJzBOEWti/U/YEECZ9XLSs6OMyHAGH0VV2zZM v3RDk5ivfmR4OiifBN+dR6ycULQFIjHh29mfYZQZPpNiYK2ud1uLkRmmoAznFTlt73zas407QZ2 4uiaN9PsVyHOwcraYMC5K0NeXzKPC4bm4fbnkTQ/7tLTUJ2aoNAt2tcV0aXnLNNwZQegh23JyDh HAAnq1W0uM6cjxXT21V03epzC8gmaprBda2pT2QpiNz5yMawOu+8d7M8DqN6HfBbWbCT3JMmSsi qlx05niBKzs0E1YgjmAcUG5S1RsxE+ma0nGcxRXSq9q0NziRwTw82xxPxihZ/yaF8OVPWGZ45B8 lQjwUDqgHSkbJ2XLa1YdE0CeJXhcCqpYQux5mgJw0aWJV2aa1lJOvRxWhoX2MkKyMV268AtM/gd 
Q8nadX2GchrFtBB7Oe906n/rM7CnD+U0z7G0xGfbbZVDWb+4AdPB5Olj+iI9A9YEG7axNm6KMBJ VObT+RUdyFXPSzA== X-Developer-Key: i=jlayton@kernel.org; a=openpgp; fpr=4BC0D7B24471B2A184EAF5D3000E684119568215 Add percpu counters for various stats around mgtimes, and a new debugfs file for displaying them when CONFIG_DEBUG_FS is enabled: - number of attempted ctime updates - number of successful i_ctime_nsec swaps - number of fine-grained timestamp fetches - number of coarse-grained floor swaps Reviewed-by: Josef Bacik Reviewed-by: Darrick J. Wong Reviewed-by: Jan Kara Signed-off-by: Jeff Layton --- fs/inode.c | 76 ++++++++++++++++++++++++++++++++++++-- include/linux/timekeeping.h | 1 + kernel/time/timekeeping.c | 3 +- kernel/time/timekeeping_debug.c | 12 ++++++ kernel/time/timekeeping_internal.h | 3 ++ 5 files changed, 90 insertions(+), 5 deletions(-) diff --git a/fs/inode.c b/fs/inode.c index d7da9d06921f..1f0487104c71 100644 --- a/fs/inode.c +++ b/fs/inode.c @@ -21,6 +21,8 @@ #include #include #include +#include +#include #include #define CREATE_TRACE_POINTS #include @@ -101,6 +103,70 @@ long get_nr_dirty_inodes(void) return nr_dirty > 0 ? nr_dirty : 0; } +#ifdef CONFIG_DEBUG_FS +static DEFINE_PER_CPU(long, mg_ctime_updates); +static DEFINE_PER_CPU(long, mg_fine_stamps); +static DEFINE_PER_CPU(long, mg_ctime_swaps); + +static long get_mg_ctime_updates(void) +{ + int i; + long sum = 0; + + for_each_possible_cpu(i) + sum += per_cpu(mg_ctime_updates, i); + return sum < 0 ? 0 : sum; +} + +static long get_mg_fine_stamps(void) +{ + int i; + long sum = 0; + + for_each_possible_cpu(i) + sum += per_cpu(mg_fine_stamps, i); + return sum < 0 ? 0 : sum; +} + +static long get_mg_ctime_swaps(void) +{ + int i; + long sum = 0; + + for_each_possible_cpu(i) + sum += per_cpu(mg_ctime_swaps, i); + return sum < 0 ? 
0 : sum; +} + +#define mgtime_counter_inc(__var) this_cpu_inc(__var) + +static int mgts_show(struct seq_file *s, void *p) +{ + long ctime_updates = get_mg_ctime_updates(); + long ctime_swaps = get_mg_ctime_swaps(); + long fine_stamps = get_mg_fine_stamps(); + long floor_swaps = get_mg_floor_swaps(); + + seq_printf(s, "%ld %ld %ld %ld\n", + ctime_updates, ctime_swaps, fine_stamps, floor_swaps); + return 0; +} + +DEFINE_SHOW_ATTRIBUTE(mgts); + +static int __init mg_debugfs_init(void) +{ + debugfs_create_file("multigrain_timestamps", S_IFREG | S_IRUGO, NULL, NULL, &mgts_fops); + return 0; +} +late_initcall(mg_debugfs_init); + +#else /* ! CONFIG_DEBUG_FS */ + +#define mgtime_counter_inc(__var) do { } while (0) + +#endif /* CONFIG_DEBUG_FS */ + /* * Handle nr_inode sysctl */ @@ -2655,10 +2721,9 @@ EXPORT_SYMBOL(timestamp_truncate); * * If it is multigrain, then we first see if the coarse-grained timestamp is * distinct from what we have. If so, then we'll just use that. If we have to - * get a fine-grained timestamp, then do so, and try to swap it into the floor. - * We accept the new floor value regardless of the outcome of the cmpxchg. - * After that, we try to swap the new value into i_ctime_nsec. Again, we take - * the resulting ctime, regardless of the outcome of the swap. + * get a fine-grained timestamp, then do so. After that, we try to swap the new + * value into i_ctime_nsec. We take the resulting ctime, regardless of the + * outcome of the swap. 
*/ struct timespec64 inode_set_ctime_current(struct inode *inode) { @@ -2687,8 +2752,10 @@ struct timespec64 inode_set_ctime_current(struct inode *inode) if (timespec64_compare(&now, &ctime) <= 0) { ktime_get_real_ts64_mg(&now); now = timestamp_truncate(now, inode); + mgtime_counter_inc(mg_fine_stamps); } } + mgtime_counter_inc(mg_ctime_updates); /* No need to cmpxchg if it's exactly the same */ if (cns == now.tv_nsec && inode->i_ctime_sec == now.tv_sec) { @@ -2702,6 +2769,7 @@ struct timespec64 inode_set_ctime_current(struct inode *inode) /* If swap occurred, then we're (mostly) done */ inode->i_ctime_sec = now.tv_sec; trace_ctime_ns_xchg(inode, cns, now.tv_nsec, cur); + mgtime_counter_inc(mg_ctime_swaps); } else { /* * Was the change due to someone marking the old ctime QUERIED? diff --git a/include/linux/timekeeping.h b/include/linux/timekeeping.h index 7aa85246c183..b9c8c597a073 100644 --- a/include/linux/timekeeping.h +++ b/include/linux/timekeeping.h @@ -48,6 +48,7 @@ extern void ktime_get_coarse_real_ts64(struct timespec64 *ts); /* Multigrain timestamp interfaces */ extern void ktime_get_coarse_real_ts64_mg(struct timespec64 *ts); extern void ktime_get_real_ts64_mg(struct timespec64 *ts); +extern long get_mg_floor_swaps(void); void getboottime64(struct timespec64 *ts); diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c index 16937242b904..94b0219955a2 100644 --- a/kernel/time/timekeeping.c +++ b/kernel/time/timekeeping.c @@ -2440,7 +2440,7 @@ EXPORT_SYMBOL_GPL(ktime_get_coarse_real_ts64_mg); * regardless of the outcome of the swap. Note that this is a filesystem * specific interface and should be avoided outside of that context. 
  */
-void ktime_get_real_ts64_mg(struct timespec64 *ts, u64 cookie)
+void ktime_get_real_ts64_mg(struct timespec64 *ts)
 {
 	struct timekeeper *tk = &tk_core.timekeeper;
 	ktime_t old = atomic64_read(&mg_floor);
@@ -2464,6 +2464,7 @@ void ktime_get_real_ts64_mg(struct timespec64 *ts, u64 cookie)
 	if (atomic64_try_cmpxchg(&mg_floor, &old, mono)) {
 		ts->tv_nsec = 0;
 		timespec64_add_ns(ts, nsecs);
+		mgtime_counter_inc(mg_floor_swaps);
 	} else {
 		/*
 		 * Something has changed mg_floor since "old" was
diff --git a/kernel/time/timekeeping_debug.c b/kernel/time/timekeeping_debug.c
index b73e8850e58d..9a3792072762 100644
--- a/kernel/time/timekeeping_debug.c
+++ b/kernel/time/timekeeping_debug.c
@@ -17,6 +17,9 @@

 #define NUM_BINS 32

+/* incremented every time mg_floor is updated */
+DEFINE_PER_CPU(long, mg_floor_swaps);
+
 static unsigned int sleep_time_bin[NUM_BINS] = {0};

 static int tk_debug_sleep_time_show(struct seq_file *s, void *data)
@@ -53,3 +56,12 @@ void tk_debug_account_sleep_time(const struct timespec64 *t)
 			(s64)t->tv_sec, t->tv_nsec / NSEC_PER_MSEC);
 }

+long get_mg_floor_swaps(void)
+{
+	int i;
+	long sum = 0;
+
+	for_each_possible_cpu(i)
+		sum += per_cpu(mg_floor_swaps, i);
+	return sum < 0 ? 0 : sum;
+}
diff --git a/kernel/time/timekeeping_internal.h b/kernel/time/timekeeping_internal.h
index 4ca2787d1642..2b49332b45a5 100644
--- a/kernel/time/timekeeping_internal.h
+++ b/kernel/time/timekeeping_internal.h
@@ -11,8 +11,11 @@
  */
 #ifdef CONFIG_DEBUG_FS
 extern void tk_debug_account_sleep_time(const struct timespec64 *t);
+DECLARE_PER_CPU(long, mg_floor_swaps);
+#define mgtime_counter_inc(__var) this_cpu_inc(__var)
 #else
 #define tk_debug_account_sleep_time(x)
+#define mgtime_counter_inc(__var) do { } while (0)
 #endif

 #ifdef CONFIG_CLOCKSOURCE_VALIDATE_LAST_CYCLE

From patchwork Sat Sep 14 17:07:20 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Jeff Layton
X-Patchwork-Id: 13804483
Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id A07D31D5CE9; Sat, 14 Sep 2024 17:07:41 +0000 (UTC)
Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201
ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1726333661; cv=none; b=rNkSJz4lOdq6GuzRWhZhwvB+M/zs/Fgz1CoiX9ZnoqcmmMsuhrxGgL4josAfMxkxQnluJFVevSGBAKtocYBaYignNaiDpjjAE226rOed88X1+QLw+YxtaJ3C2lwRCDY9K4QJEeW8zRuSCTWqN8+DWlPeIkqLt1gZYT4D545hN5E=
ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1726333661; c=relaxed/simple; bh=/XcZHkK5gYpUwCSVg7s3+A6XUqyCAQD/OQvh4jjSlT8=; h=From:Date:Subject:MIME-Version:Content-Type:Message-Id:References: In-Reply-To:To:Cc; b=dztHNkjndRO63MTOzAZuQ6UXAH5PP6cLKGe75eBkuzl40fZE8/9/VezzdJTqZNRSVthNXqXqmnqzpxJtIuFAZ+r7Oo8rvLecFRvQBUVIO0zixIkl+irDOYJoZYynd6TrRwAEEm0xk5v47lBf18KuWcjUM055FSE5bt5KykjCaF4=
ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=I6jXlekN; arc=none
smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="I6jXlekN" Received: by smtp.kernel.org (Postfix) with ESMTPSA id EB481C4CECE; Sat, 14 Sep 2024 17:07:38 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1726333661; bh=/XcZHkK5gYpUwCSVg7s3+A6XUqyCAQD/OQvh4jjSlT8=; h=From:Date:Subject:References:In-Reply-To:To:Cc:From; b=I6jXlekNuZGCsp5NBpRYP0R+xcXpVQbTNaK6HMUBUnTRSpt1zMlwP9l28+OQQsA35 qDoXUcSjO+bpgtb+5Fwv7gteU9Nn3R0dTcshB0a9pD2k3fNQQcCNyEnuNQAPrPXCmS 9EPY65VWshQzrPwqy8HHc2coX7lK5lfDWBwZO4fBj5XPwwC/irwcLnoxz9nT+Ceopb obf3x/9tQr5KsfITxF8pANIEFJblk2cNow6RtJQgL5OpANFICZK2eSs5HLhO3srtMS 82TGmM5omjLC8ZyXKmCqqtVN0z4DPt4tz+4+9qZgV/GdqjganAwnPl2l2av7XZxMN5 Emi4PP3ARb5eg== From: Jeff Layton Date: Sat, 14 Sep 2024 13:07:20 -0400 Subject: [PATCH v8 07/11] Documentation: add a new file documenting multigrain timestamps Precedence: bulk X-Mailing-List: linux-nfs@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Message-Id: <20240914-mgtime-v8-7-5bd872330bed@kernel.org> References: <20240914-mgtime-v8-0-5bd872330bed@kernel.org> In-Reply-To: <20240914-mgtime-v8-0-5bd872330bed@kernel.org> To: John Stultz , Thomas Gleixner , Stephen Boyd , Alexander Viro , Christian Brauner , Jan Kara , Steven Rostedt , Masami Hiramatsu , Mathieu Desnoyers , Jonathan Corbet , Chandan Babu R , "Darrick J. 
Wong" , Theodore Ts'o , Andreas Dilger , Chris Mason , Josef Bacik , David Sterba , Hugh Dickins , Andrew Morton , Chuck Lever , Vadim Fedorenko Cc: Randy Dunlap , linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-trace-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-xfs@vger.kernel.org, linux-ext4@vger.kernel.org, linux-btrfs@vger.kernel.org, linux-nfs@vger.kernel.org, linux-mm@kvack.org, Jeff Layton X-Mailer: b4 0.14.1 X-Developer-Signature: v=1; a=openpgp-sha256; l=6998; i=jlayton@kernel.org; h=from:subject:message-id; bh=/XcZHkK5gYpUwCSVg7s3+A6XUqyCAQD/OQvh4jjSlT8=; b=owEBbQKS/ZANAwAIAQAOaEEZVoIVAcsmYgBm5cLGJuuSb1xltFHpW0q8a+qitvGMUmowicsfj Rg/e3AZpXiJAjMEAAEIAB0WIQRLwNeyRHGyoYTq9dMADmhBGVaCFQUCZuXCxgAKCRAADmhBGVaC FWVQEADCr1Ju8Vg4w9iEuJQYTT5m4eUvccIiH6Wxe4xu7Bjjl7Esfa5BErRe/5Jzro90FFNT424 rEz4iPtm6aigGIR/RWKMg+HHvguntKiMPVorNF2LSPWDdD323OJYfux0/lhljniTqzGjYNxEJDD AyTSwsd+qaP99+L8AtMBC8VbvyBAM9M2d8I+U82Omh/FxBpuybHvs2KUh8Zqnom/ICx7y3131gx BUbpZN4gCpoP8yyUoB1+I8XnGs3Q4e4xU/3f3VMmHve1xpMr47XZ9aEuEpVQZ/6dyKTSTEkr5pd BW9VllFp5Xq0daBkYHXOVTJVR+5xX+GPkraB/JNTcJBRl8FQXnUXzbXHFottzpRseuCGkLnlhR5 /Oy1FspdD6sUlKvnyALimjvBu6vKZzqJRuYOxHLNI0HYiY3mnBwq8UTouZR515qw64HylfuCHIh tfNxxCNnfr86gANN4QXomTNmAzzBzvWGfD+V48xHvhRE4ESBjfQ1B2NFJyinNOYkBQfrRtVUUXP aCHXIgF0oFoiDt2e0nCQym4YsEt0Wxh5aSNJDdcA0Zs6srKCiIjbA1U7TgGCF+UXQ9QhBLeeAaw iGt7V2f/qfXuyfXskvDFLtFTHw84eXq55hkiuqqgZJ+i4cvgaOQ6/d6+K8qD98nF0fWuHreU9yx EBLPusAlWUknZNQ== X-Developer-Key: i=jlayton@kernel.org; a=openpgp; fpr=4BC0D7B24471B2A184EAF5D3000E684119568215 Add a high-level document that describes how multigrain timestamps work, rationale for them, and some info about implementation and tradeoffs. Reviewed-by: Josef Bacik Reviewed-by: Darrick J. 
Wong
Reviewed-by: Randy Dunlap
Reviewed-by: Jan Kara
Signed-off-by: Jeff Layton
---
 Documentation/filesystems/index.rst         |   1 +
 Documentation/filesystems/multigrain-ts.rst | 121 ++++++++++++++++++++++++++++
 2 files changed, 122 insertions(+)

diff --git a/Documentation/filesystems/index.rst b/Documentation/filesystems/index.rst
index e8e496d23e1d..44e9e77ffe0d 100644
--- a/Documentation/filesystems/index.rst
+++ b/Documentation/filesystems/index.rst
@@ -29,6 +29,7 @@ algorithms work.
    fiemap
    files
    locks
+   multigrain-ts
    mount_api
    quota
    seq_file
diff --git a/Documentation/filesystems/multigrain-ts.rst b/Documentation/filesystems/multigrain-ts.rst
new file mode 100644
index 000000000000..97877ab3d933
--- /dev/null
+++ b/Documentation/filesystems/multigrain-ts.rst
@@ -0,0 +1,121 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+=====================
+Multigrain Timestamps
+=====================
+
+Introduction
+============
+Historically, the kernel has always used coarse time values to stamp inodes.
+This value is updated every jiffy, so any change that happens within that jiffy
+will end up with the same timestamp.
+
+When the kernel goes to stamp an inode (due to a read or write), it first gets
+the current time and then compares it to the existing timestamp(s) to see
+whether anything will change. If nothing changed, then it can avoid updating
+the inode's metadata.
+
+Coarse timestamps are therefore good from a performance standpoint, since they
+reduce the need for metadata updates, but bad from the standpoint of
+determining whether anything has changed, since a lot of things can happen in a
+jiffy.
+
+They are particularly troublesome with NFSv3, where unchanging timestamps can
+make it difficult to tell whether to invalidate caches. NFSv4 provides a
+dedicated change attribute that should always show a visible change, but not
+all filesystems implement this properly, causing the NFS server to substitute
+the ctime in many cases.
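As a rough illustration of the trade-off the introduction describes, here is a minimal userspace sketch, not kernel code. The names `sketch_inode` and `stamp_inode()` are hypothetical, and times are plain nanosecond counts. Stamping with a coarse clock skips the metadata update whenever the timestamp would not change, which is also why a second change within the same jiffy leaves no visible trace.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for an inode; only the fields the sketch needs. */
struct sketch_inode {
	int64_t ctime_ns;		/* last stamped time, in nanoseconds */
	unsigned long metadata_updates;	/* how often the inode was dirtied */
};

/*
 * Stamp the inode with the coarse clock value. Returns true if the
 * timestamp changed and the inode metadata had to be updated.
 */
static bool stamp_inode(struct sketch_inode *inode, int64_t coarse_now_ns)
{
	if (inode->ctime_ns == coarse_now_ns)
		return false;	/* same coarse value: nothing to write */

	inode->ctime_ns = coarse_now_ns;
	inode->metadata_updates++;
	return true;
}
```

Two changes that both see the same coarse value produce one metadata update and one timestamp, so an observer who sampled the timestamp between them cannot tell that the second change happened.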
+
+Multigrain timestamps aim to remedy this by selectively using fine-grained
+timestamps when a file has had its timestamps queried recently, and the current
+coarse-grained time does not cause a change.
+
+Inode Timestamps
+================
+There are currently 3 timestamps in the inode that are updated to the current
+wallclock time on different activity:
+
+ctime:
+  The inode change time. This is stamped with the current time whenever
+  the inode's metadata is changed. Note that this value is not settable
+  from userland.
+
+mtime:
+  The inode modification time. This is stamped with the current time
+  any time a file's contents change.
+
+atime:
+  The inode access time. This is stamped whenever an inode's contents are
+  read. Widely considered to be a terrible mistake. Usually avoided with
+  options like noatime or relatime.
+
+Updating the mtime always implies a change to the ctime, but updating the
+atime due to a read request does not.
+
+Multigrain timestamps are only tracked for the ctime and the mtime. atimes are
+not affected and always use the coarse-grained value (subject to the floor).
+
+Inode Timestamp Ordering
+========================
+
+In addition to just providing info about changes to individual files, file
+timestamps also serve an important purpose in applications like "make". These
+programs measure timestamps in order to determine whether source files might be
+newer than cached objects.
+
+Userland applications like make can only determine ordering based on
+operational boundaries. For a syscall those are the syscall entry and exit
+points. For io_uring or nfsd operations, that's the request submission and
+response. In the case of concurrent operations, userland can make no
+determination about the order in which things will occur.
+
+For instance, if a single thread modifies one file, and then another file in
+sequence, the second file must show an equal or later mtime than the first. The
+same is true if two threads are issuing similar operations that do not overlap
+in time.
+
+If, however, two threads have racing syscalls that overlap in time, then there
+is no such guarantee, and the second file may appear to have been modified
+before, after or at the same time as the first, regardless of which one was
+submitted first.
+
+Multigrain Timestamp Implementation
+===================================
+Multigrain timestamps are aimed at ensuring that changes to a single file are
+always recognizable, without violating the ordering guarantees when multiple
+different files are modified. This affects the mtime and the ctime, but the
+atime will always use coarse-grained timestamps.
+
+It uses an unused bit in the i_ctime_nsec field to indicate whether the mtime
+or ctime has been queried. If either or both have, then the kernel takes
+special care to ensure the next timestamp update will display a visible change.
+This ensures tight cache coherency for use-cases like NFS, without sacrificing
+the benefits of reduced metadata updates when files aren't being watched.
+
+The Ctime Floor Value
+=====================
+It's not sufficient to simply use fine or coarse-grained timestamps based on
+whether the mtime or ctime has been queried. A file could get a fine-grained
+timestamp, and then a second file modified later could get a coarse-grained one
+that appears earlier than the first, which would break the kernel's timestamp
+ordering guarantees.
+
+To mitigate this problem, we maintain a global floor value that ensures that
+this can't happen. The two files in the above example may appear to have been
+modified at the same time in such a case, but they will never show the reverse
+order. To avoid problems with realtime clock jumps, the floor is managed as a
+monotonic ktime_t, and the values are converted to realtime clock values as
+needed.
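The interaction between the queried flag and the floor can be modelled in a few lines of userspace C. This is only a sketch under stated assumptions, not the kernel implementation: `floor_ns`, `sketch_inode`, `coarse_stamp()`, `fine_stamp()`, `query_ctime()` and `set_ctime()` are hypothetical names, times are plain nanosecond counts supplied by the caller, and the realtime/monotonic conversion and timestamp truncation are left out.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the global floor; times are nanosecond counts. */
static _Atomic int64_t floor_ns;

/* Hypothetical inode: a ctime plus a flag modelling the "queried" bit. */
struct sketch_inode {
	int64_t ctime_ns;
	bool queried;
};

/* Coarse stamp: the coarse clock value, but never earlier than the floor. */
static int64_t coarse_stamp(int64_t coarse_now_ns)
{
	int64_t f = atomic_load(&floor_ns);

	return coarse_now_ns > f ? coarse_now_ns : f;
}

/*
 * Fine stamp: try to advance the floor to the fine-grained value. If another
 * updater won the race, "old" is reloaded with the newer floor and that value
 * is used instead. Assumes fine_now_ns never runs backwards.
 */
static int64_t fine_stamp(int64_t fine_now_ns)
{
	int64_t old = atomic_load(&floor_ns);

	if (atomic_compare_exchange_strong(&floor_ns, &old, fine_now_ns))
		return fine_now_ns;
	return old;
}

/* Reading the ctime marks it as queried. */
static int64_t query_ctime(struct sketch_inode *inode)
{
	inode->queried = true;
	return inode->ctime_ns;
}

/* Update the ctime, going fine-grained only when a watcher would miss it. */
static int64_t set_ctime(struct sketch_inode *inode,
			 int64_t coarse_now_ns, int64_t fine_now_ns)
{
	int64_t now = coarse_stamp(coarse_now_ns);

	if (inode->queried && now <= inode->ctime_ns)
		now = fine_stamp(fine_now_ns);

	inode->ctime_ns = now;
	inode->queried = false;
	return now;
}
```

With the coarse clock stuck at 1000, a file that is stamped, queried and stamped again receives the fine value (say 1300), which also becomes the floor. A second, unqueried file stamped afterwards gets 1300 rather than 1000, so the two files may tie but never appear in reverse order.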
+
+Implementation Notes
+====================
+Multigrain timestamps are intended for use by local filesystems that get
+ctime values from the local clock. This is in contrast to network filesystems
+and the like that just mirror timestamp values from a server.
+
+For most filesystems, it's sufficient to just set the FS_MGTIME flag in the
+fstype->fs_flags in order to opt in, provided the ctime is only ever set via
+inode_set_ctime_current(). If the filesystem has a ->getattr routine that
+doesn't call generic_fillattr, then you should have it call fill_mg_cmtime to
+fill those values. For setattr, it should use setattr_copy() to update the
+timestamps, or otherwise mimic its behavior.

From patchwork Sat Sep 14 17:07:21 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Jeff Layton
X-Patchwork-Id: 13804484
Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id C91731D1317; Sat, 14 Sep 2024 17:07:44 +0000 (UTC)
Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201
ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1726333664; cv=none; b=A9DZNQz7ESYinLOULW4tUHdMpTFJXq7ndUwHGmCzCHbEPOKW48SbTgxPXt1aGV4GUuc99pXOwoGEn/u8C1Etotd9tKteMYT4fEPcR9jO/i3AHebHndx0/5/aAc5RPy1kJbgdUx4NRmCwWeVEJaEEnVj8uqXr4hKOpnXDZ+p6nZQ=
ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1726333664; c=relaxed/simple; bh=bIaGrKHNkJGKwsIp2/9hx+nM5n7ktsZpZECPjECvfD4=; h=From:Date:Subject:MIME-Version:Content-Type:Message-Id:References: In-Reply-To:To:Cc; b=sp03rjI22y7EttMQJwD/R3yaCMyogdpWCWu/Rl70oZzDx1mJii+WFzWRm90KXY2PJPGxFCBmm5LmbYwJOLI7glopqIMBgQGQG/JAZXp8mmWWOiOyHMDCvezNuGiLkoe/DG84ZNN7lRp9moblEdLYTtgnVcKzYDZiy9xSJ0JAy7E=
ARC-Authentication-Results:
i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=WWxcOHhP; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="WWxcOHhP" Received: by smtp.kernel.org (Postfix) with ESMTPSA id BD957C4CECF; Sat, 14 Sep 2024 17:07:41 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1726333664; bh=bIaGrKHNkJGKwsIp2/9hx+nM5n7ktsZpZECPjECvfD4=; h=From:Date:Subject:References:In-Reply-To:To:Cc:From; b=WWxcOHhPNBbvQOZoVEbWbSYGiCVIBwokWxYhCZncCHYiJQcTFpwERaEcR4Wh0Xbn2 3xgKHnCdve6t5zipyFCFIIBbKMWVFc9FsCdvQlwXt0VN1K59PGcfWb0gtf66zVh9ku 4gWzO/Exl3YH/sNl8Z55iJKZqKHzfWBD/8d75G3/rPfif7e0K/HOcWPYvxm/rvB6z7 ynGJN/c23nHNOBswZWBCa0GrR4QQylZNz9jykfRjCvUNB3lAR92/VFZcvrBWVHvkFG cgl7qUvXIQhMBLNMBiIOTqlN2+HOEptvI6n7QQwrcqDYZie6g2r0gfXk8R9Qa2FNtn LNjg2zn+ZKxEQ== From: Jeff Layton Date: Sat, 14 Sep 2024 13:07:21 -0400 Subject: [PATCH v8 08/11] xfs: switch to multigrain timestamps Precedence: bulk X-Mailing-List: linux-nfs@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Message-Id: <20240914-mgtime-v8-8-5bd872330bed@kernel.org> References: <20240914-mgtime-v8-0-5bd872330bed@kernel.org> In-Reply-To: <20240914-mgtime-v8-0-5bd872330bed@kernel.org> To: John Stultz , Thomas Gleixner , Stephen Boyd , Alexander Viro , Christian Brauner , Jan Kara , Steven Rostedt , Masami Hiramatsu , Mathieu Desnoyers , Jonathan Corbet , Chandan Babu R , "Darrick J. 
Wong" , Theodore Ts'o , Andreas Dilger , Chris Mason , Josef Bacik , David Sterba , Hugh Dickins , Andrew Morton , Chuck Lever , Vadim Fedorenko Cc: Randy Dunlap , linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-trace-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-xfs@vger.kernel.org, linux-ext4@vger.kernel.org, linux-btrfs@vger.kernel.org, linux-nfs@vger.kernel.org, linux-mm@kvack.org, Jeff Layton X-Mailer: b4 0.14.1 X-Developer-Signature: v=1; a=openpgp-sha256; l=2942; i=jlayton@kernel.org; h=from:subject:message-id; bh=bIaGrKHNkJGKwsIp2/9hx+nM5n7ktsZpZECPjECvfD4=; b=owEBbQKS/ZANAwAIAQAOaEEZVoIVAcsmYgBm5cLGuWV9a+ING+mQgmsUefPyvxOtCZHhgbDyr h3WPzWE+SWJAjMEAAEIAB0WIQRLwNeyRHGyoYTq9dMADmhBGVaCFQUCZuXCxgAKCRAADmhBGVaC Ff0ID/9dUQxrrNbqHpnaVnHsEUNWAcwDvm/Uzqz8JRH/iN4YRdGjThpIu7yEpdW72rihVUgzhdp MDDhOlQO5YqQqkgHr2CFKMZk6mr8pRtn85S+8NTVanoaK5CcKu+J3cM9B8uAhojNlwwLA08TIEg GarAOa/asTvsuyly/mkIWdl+B86IGn5U069ELYgegsSia8FaUxhSDP2aMn0ZXoP7yKsKU3iEGnY OKDmWJaL4kfYP5m+nQQcjCvXkSJY9Ek9EIZIErc5POHxebkpCVmS2EYzTeC7jcJsCpY0mK4euhj 05JoZnB5tDPT+Ct7GxZMu+oFKFfHk9kOG7cpt4wbXXB0V/eTid0mZdkeFy3Q6hzrgRxg+GoIJML nS4LFtcRuOl/hBIkH+gAMvbPxqQMHWj5xvEiFoJGKo1X48NfNJwzOids5R/ru8ztisjIxOWUIQ8 loAuC5LGF8DixJKgHNRS9S7Pf5fC/zOAJwTWbHc1WsnJ8jCSraBKi6yhlJaY2ky5i0OSdZJ6KAK tWQg7IgZtrxbNJuibvrXaOj8KBKky232I5TcB9IwV28ZFZiflP8071Ut5Gt8VEuoMan75ri60n2 R+g/WaoBLU5fySDEGF02y91KTtNV09R9oXBtC6oLh2HAY++rt9IZ6JzLzFbgWZCMyyI8/yQJwzq yRfbZfa50FD1bMA== X-Developer-Key: i=jlayton@kernel.org; a=openpgp; fpr=4BC0D7B24471B2A184EAF5D3000E684119568215 Enable multigrain timestamps, which should ensure that there is an apparent change to the timestamp whenever it has been written after being actively observed via getattr. Also, anytime the mtime changes, the ctime must also change, and those are now the only two options for xfs_trans_ichgtime. Have that function unconditionally bump the ctime, and ASSERT that XFS_ICHGTIME_CHG is always set. 
Finally, stop setting STATX_CHANGE_COOKIE in getattr, since the ctime should give us better semantics now. Reviewed-by: Josef Bacik Reviewed-by: Darrick J. Wong Signed-off-by: Jeff Layton --- fs/xfs/libxfs/xfs_trans_inode.c | 6 +++--- fs/xfs/xfs_iops.c | 10 +++------- fs/xfs/xfs_super.c | 2 +- 3 files changed, 7 insertions(+), 11 deletions(-) diff --git a/fs/xfs/libxfs/xfs_trans_inode.c b/fs/xfs/libxfs/xfs_trans_inode.c index 3c40f37e82c7..c962ad64b0c1 100644 --- a/fs/xfs/libxfs/xfs_trans_inode.c +++ b/fs/xfs/libxfs/xfs_trans_inode.c @@ -62,12 +62,12 @@ xfs_trans_ichgtime( ASSERT(tp); xfs_assert_ilocked(ip, XFS_ILOCK_EXCL); - tv = current_time(inode); + /* If the mtime changes, then ctime must also change */ + ASSERT(flags & XFS_ICHGTIME_CHG); + tv = inode_set_ctime_current(inode); if (flags & XFS_ICHGTIME_MOD) inode_set_mtime_to_ts(inode, tv); - if (flags & XFS_ICHGTIME_CHG) - inode_set_ctime_to_ts(inode, tv); if (flags & XFS_ICHGTIME_ACCESS) inode_set_atime_to_ts(inode, tv); if (flags & XFS_ICHGTIME_CREATE) diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c index 1cdc8034f54d..a1c4a350a6db 100644 --- a/fs/xfs/xfs_iops.c +++ b/fs/xfs/xfs_iops.c @@ -597,8 +597,9 @@ xfs_vn_getattr( stat->gid = vfsgid_into_kgid(vfsgid); stat->ino = ip->i_ino; stat->atime = inode_get_atime(inode); - stat->mtime = inode_get_mtime(inode); - stat->ctime = inode_get_ctime(inode); + + fill_mg_cmtime(stat, request_mask, inode); + stat->blocks = XFS_FSB_TO_BB(mp, ip->i_nblocks + ip->i_delayed_blks); if (xfs_has_v3inodes(mp)) { @@ -608,11 +609,6 @@ xfs_vn_getattr( } } - if ((request_mask & STATX_CHANGE_COOKIE) && IS_I_VERSION(inode)) { - stat->change_cookie = inode_query_iversion(inode); - stat->result_mask |= STATX_CHANGE_COOKIE; - } - /* * Note: If you add another clause to set an attribute flag, please * update attributes_mask below. 
diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c index 27e9f749c4c7..210481b03fdb 100644 --- a/fs/xfs/xfs_super.c +++ b/fs/xfs/xfs_super.c @@ -2052,7 +2052,7 @@ static struct file_system_type xfs_fs_type = { .init_fs_context = xfs_init_fs_context, .parameters = xfs_fs_parameters, .kill_sb = xfs_kill_sb, - .fs_flags = FS_REQUIRES_DEV | FS_ALLOW_IDMAP, + .fs_flags = FS_REQUIRES_DEV | FS_ALLOW_IDMAP | FS_MGTIME, }; MODULE_ALIAS_FS("xfs"); From patchwork Sat Sep 14 17:07:22 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jeff Layton X-Patchwork-Id: 13804485 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 45BF61D131E; Sat, 14 Sep 2024 17:07:47 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1726333667; cv=none; b=tgjVp4RgcWClfUVilXexWJhILVho0gWV1jtH9QatQPs1K6nM2OQFM7WdQ+ita41TCQz4NWoYI/gEgxfOIa69TksM3MzsXk1K0jYRzBLXNqmBEKzGmAs34foWNObpFelLd0fkAce/4zNmHMAz6Prq+i+2P+0xv7z4nolRNrE60HM= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1726333667; c=relaxed/simple; bh=Fd5IxgHtKcN7b8Mow1VZm20amle3gCRV/bljD+jCFMs=; h=From:Date:Subject:MIME-Version:Content-Type:Message-Id:References: In-Reply-To:To:Cc; b=dHgNpq0FbkaBQ2aaQT/mrWnsWQ8O0vsovzoks2YkSov0lEzAOT/7WbIh6jZlrpbARX1LItWCaN5EH2FjB1lesF0Hj3R6cAj7Ns5Sm6opvTDcOwNlx/sb964xuIdksMI9oAHTv0Aq3URjDGijpVOIXtdRf1Prgn7eLL/2tb64z/s= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=lG7KWITq; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org 
header.i=@kernel.org header.b="lG7KWITq" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 91458C4CECE; Sat, 14 Sep 2024 17:07:44 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1726333667; bh=Fd5IxgHtKcN7b8Mow1VZm20amle3gCRV/bljD+jCFMs=; h=From:Date:Subject:References:In-Reply-To:To:Cc:From; b=lG7KWITqZ0JTULqD7K9VNOzVeIcKW5pDUATUsN0srkR/FaTU34rCalh4fQLd0NiDz ayuWAUjH/b0F6+2dSPyidYV2vGekaNEFlst+DGGLDZJp9KDjfwEp7fFvmRbLLGpgXQ VKp25sywWngfmZlWl3znmMEZJ2xwbKrYa3+nQL+VkAIDTbthpmqHHGPh0xozWOzWA3 tdQLKrY/X3SFm2oYyzfnF6oETy80+BgJ1JlbjHE2pKOJGAY5EBgFE2vN7/MpyNIKib DxwIgODNV99Racaj1tVRZSo5DgF94J49k5zNoIAj9ZI+9aj+TWw8u/SluzXG2HZjNI dCovHd/yHMRbA== From: Jeff Layton Date: Sat, 14 Sep 2024 13:07:22 -0400 Subject: [PATCH v8 09/11] ext4: switch to multigrain timestamps Precedence: bulk X-Mailing-List: linux-nfs@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Message-Id: <20240914-mgtime-v8-9-5bd872330bed@kernel.org> References: <20240914-mgtime-v8-0-5bd872330bed@kernel.org> In-Reply-To: <20240914-mgtime-v8-0-5bd872330bed@kernel.org> To: John Stultz , Thomas Gleixner , Stephen Boyd , Alexander Viro , Christian Brauner , Jan Kara , Steven Rostedt , Masami Hiramatsu , Mathieu Desnoyers , Jonathan Corbet , Chandan Babu R , "Darrick J. 
Wong" , Theodore Ts'o , Andreas Dilger , Chris Mason , Josef Bacik , David Sterba , Hugh Dickins , Andrew Morton , Chuck Lever , Vadim Fedorenko Cc: Randy Dunlap , linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-trace-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-xfs@vger.kernel.org, linux-ext4@vger.kernel.org, linux-btrfs@vger.kernel.org, linux-nfs@vger.kernel.org, linux-mm@kvack.org, Jeff Layton X-Mailer: b4 0.14.1 X-Developer-Signature: v=1; a=openpgp-sha256; l=926; i=jlayton@kernel.org; h=from:subject:message-id; bh=Fd5IxgHtKcN7b8Mow1VZm20amle3gCRV/bljD+jCFMs=; b=owEBbQKS/ZANAwAIAQAOaEEZVoIVAcsmYgBm5cLGaLrdQdloS9SHdXwjs/lx9Z+m43owb/WYh Ced8DtcfP2JAjMEAAEIAB0WIQRLwNeyRHGyoYTq9dMADmhBGVaCFQUCZuXCxgAKCRAADmhBGVaC Fdf/D/43c/84XJBxc2MJzrsOgo8VKaf8OhXukqOqa4RnPa4E5mj1ttGDtErs8lIQmzt45brcimw cxGf4byRL5N91O6Cv5PFmSlZiKV/MGMucjoc+YCRT6PPP+L14YxaP0mnxjK7DSxNPiEXtPytkk/ pMnbw41G5nHWCvJXaZnH2iV628zZgovKt9rGi05T+YvxjfYZhRRxZBc2r5q5OHsFYRlvlCFWnYh Uiy4soYIg6KZNopRh28NC75kVeKzbJm9OjYuSO8Eklbhbn1ew6n1TEudO25zyMSL1gimXSzU28o WXRf+DYyfpaWR1ZSCFRhYUvhslFN+gJ23qo7BwI3R1HmQmObfhuRUFPQGRRGq3YWfZkWTRoL+h8 ktSVEFwBZZ8jJw3ZyF18d2WM5rUZXtQTYj6ZjBfLq0viDaj+whmVXP5WKcyjqHNc8+9tf3IjRNM QICRarciWhavol197vxmJ9xwMm/whTgX6Zn90+/2JwlxsNNey+3aFRKKTJLC4rVZcwNnfPaOTbD 8D9w21RW913ovXTTJz61yn+MWHnrTlcjJ1Q3nWPY/XbvNQyXJoT5xrMBZ6ew2mhamHbgTC+jZvi +ZzebH+GhoqU0MBAOmPVX6nNE3BiL1nAq1UPunG3Vivc/vxhkn/LxCuFW/8esGLfkACeXqVNf7X aVUfw8gUCfn5HNw== X-Developer-Key: i=jlayton@kernel.org; a=openpgp; fpr=4BC0D7B24471B2A184EAF5D3000E684119568215 Enable multigrain timestamps, which should ensure that there is an apparent change to the timestamp whenever it has been written after being actively observed via getattr. For ext4, we only need to enable the FS_MGTIME flag. 
Reviewed-by: Josef Bacik Reviewed-by: Jan Kara Signed-off-by: Jeff Layton --- fs/ext4/super.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/fs/ext4/super.c b/fs/ext4/super.c index e72145c4ae5a..a125d9435b8a 100644 --- a/fs/ext4/super.c +++ b/fs/ext4/super.c @@ -7298,7 +7298,7 @@ static struct file_system_type ext4_fs_type = { .init_fs_context = ext4_init_fs_context, .parameters = ext4_param_specs, .kill_sb = ext4_kill_sb, - .fs_flags = FS_REQUIRES_DEV | FS_ALLOW_IDMAP, + .fs_flags = FS_REQUIRES_DEV | FS_ALLOW_IDMAP | FS_MGTIME, }; MODULE_ALIAS_FS("ext4"); From patchwork Sat Sep 14 17:07:23 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jeff Layton X-Patchwork-Id: 13804486 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 6E7111D86C0; Sat, 14 Sep 2024 17:07:50 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1726333670; cv=none; b=FZnsdqrc2Wify/skLbi5btvBEGe2I7drwd5qr44KYy5M6akAgoBHxEKtRq7D1iUagM8LPTUyq5+29hC6Qz8TROa+cqIQky97lpKEFhCUfwNd2E+t1MEhwtob8FEcCxlIz031WDoj0570SX9i86ewRzbhpBOc2ayHkWaE/zhcGBA= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1726333670; c=relaxed/simple; bh=gD0wNq7Gu5j1JVptVjT60GA5HV2tmuyZD0KoREfuUWw=; h=From:Date:Subject:MIME-Version:Content-Type:Message-Id:References: In-Reply-To:To:Cc; b=qhMsg9hedwIPubjHF3dm3t+Rhjd+GvxXteoXlqZNmlz/o4x3zNCpqY1yJmxEKiAalcrbTzG2zy2t6v2pIAGpqUgEzg4bNgLIeQ39cueSrj7LQo1Ghn8J4MiDNZVNBcVsKLq8hvitfacB+EqZ+gdEosN+jrQDArXUBd7pYvPYPak= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=OnsaXc9Y; 
arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="OnsaXc9Y" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 64294C4CECF; Sat, 14 Sep 2024 17:07:47 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1726333670; bh=gD0wNq7Gu5j1JVptVjT60GA5HV2tmuyZD0KoREfuUWw=; h=From:Date:Subject:References:In-Reply-To:To:Cc:From; b=OnsaXc9Ymk3RYfHMoFlLfXvNdkNLPJT/nCCxRvBL04+CscVmYlVBuqxqN52CHdmZq rS++6tGhQj3jHFzyPbPjWZz60BE3gbH55x26rwO5tlfsF/Oy6ua//id7CsKu8XobYh XsyTGeKDPE9GQOa6pebLQ5hcvMqGTv743a/9EA7n6F578YF5ADhWllJWaCFs3ahd94 4G7wC+3BmpbRo179YYT9jFBP1j7K283dTrDdRUb8SKm+hEJdqDm9Iw7LZTIXza9b4Y Hy2/y7j2vFxVnXRqpTxnXQF3KFr34hNAELx6wKUpGn5vJFSPI1OcEq4FIAOYgFVVja DUQfrqF54ZGDw== From: Jeff Layton Date: Sat, 14 Sep 2024 13:07:23 -0400 Subject: [PATCH v8 10/11] btrfs: convert to multigrain timestamps Precedence: bulk X-Mailing-List: linux-nfs@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Message-Id: <20240914-mgtime-v8-10-5bd872330bed@kernel.org> References: <20240914-mgtime-v8-0-5bd872330bed@kernel.org> In-Reply-To: <20240914-mgtime-v8-0-5bd872330bed@kernel.org> To: John Stultz , Thomas Gleixner , Stephen Boyd , Alexander Viro , Christian Brauner , Jan Kara , Steven Rostedt , Masami Hiramatsu , Mathieu Desnoyers , Jonathan Corbet , Chandan Babu R , "Darrick J. 
Wong" , Theodore Ts'o , Andreas Dilger , Chris Mason , Josef Bacik , David Sterba , Hugh Dickins , Andrew Morton , Chuck Lever , Vadim Fedorenko Cc: Randy Dunlap , linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-trace-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-xfs@vger.kernel.org, linux-ext4@vger.kernel.org, linux-btrfs@vger.kernel.org, linux-nfs@vger.kernel.org, linux-mm@kvack.org, Jeff Layton X-Mailer: b4 0.14.1 X-Developer-Signature: v=1; a=openpgp-sha256; l=2660; i=jlayton@kernel.org; h=from:subject:message-id; bh=gD0wNq7Gu5j1JVptVjT60GA5HV2tmuyZD0KoREfuUWw=; b=owEBbQKS/ZANAwAIAQAOaEEZVoIVAcsmYgBm5cLGDFBDcopNy/21OHrXrCajnq3tvWydZqq9f oWLLLJG7hyJAjMEAAEIAB0WIQRLwNeyRHGyoYTq9dMADmhBGVaCFQUCZuXCxgAKCRAADmhBGVaC FbCsEADDSDCuLOkoBu17JJLTtSg9T7tYQSu30N5o6tKAz5hfYb3oqONtgrosynqevuMTPya4ygi j2u7OXW8FNsPhJbFFfD7RYKB1bha7dDz9BXy1qJep8KB06FA4/ZZKqGo9Rtfo/S564yYOmrMtD1 LCCfM9ZMexmEjLRbE/6qQk9LKHL4RUkw/Uus8ZQw0m4pti5eQVEsZMvIyqDEKhzySyvg0Xgaqds k4SDLA4lRPxKpMm+kTdJ3nhEGNPL6AL8lzF0lzeT8rtq6HnV951MCwIbBPnAcGzXZH11GFmSwGh /r/HxQsBLNwqwXFzB+lqbp9403WElBb+E0J9Sj/6zQf7jA3CicYASW4ojXLpgxzzZGeSAfs2lkz RrQ8mEhezpMeVrn+HaorHAqEH7FzMvYSGJNE+Mxss0+hqBH/sJ7kja46tMW+5rzZ0SwjmNuue2W 1WyYbsohvdObP6z6hREpQ1OO7D9VQ6QoH0lOnOWGF4re20/p1yp7uclcGu7qe6UgMDtWCpjetRo 2DKRnag68WIlLO/hcUlJpg7HRhzU+kIB0YTG61GKnVdydjW2GQJScOLK8PxJJczGuWiesH9seJk hJB65WmjibjgT8/tq+g5CPdXoKRvGkyDBcSWcFPDeyO7uFLuDqqAFFH5S/OVgK9syKPrMNpx+te xeyJbodV8Mp6iSg== X-Developer-Key: i=jlayton@kernel.org; a=openpgp; fpr=4BC0D7B24471B2A184EAF5D3000E684119568215 Enable multigrain timestamps, which should ensure that there is an apparent change to the timestamp whenever it has been written after being actively observed via getattr. Beyond enabling the FS_MGTIME flag, this patch eliminates update_time_for_write, which goes to great pains to avoid in-memory stores. Just have it overwrite the timestamps unconditionally. 
Note that this also drops the IS_I_VERSION check and unconditionally bumps the change attribute, since SB_I_VERSION is always set on btrfs. Reviewed-by: Josef Bacik Signed-off-by: Jeff Layton --- fs/btrfs/file.c | 25 ++++--------------------- fs/btrfs/super.c | 3 ++- 2 files changed, 6 insertions(+), 22 deletions(-) diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c index 2aeb8116549c..1656ad7498b8 100644 --- a/fs/btrfs/file.c +++ b/fs/btrfs/file.c @@ -1120,26 +1120,6 @@ void btrfs_check_nocow_unlock(struct btrfs_inode *inode) btrfs_drew_write_unlock(&inode->root->snapshot_lock); } -static void update_time_for_write(struct inode *inode) -{ - struct timespec64 now, ts; - - if (IS_NOCMTIME(inode)) - return; - - now = current_time(inode); - ts = inode_get_mtime(inode); - if (!timespec64_equal(&ts, &now)) - inode_set_mtime_to_ts(inode, now); - - ts = inode_get_ctime(inode); - if (!timespec64_equal(&ts, &now)) - inode_set_ctime_to_ts(inode, now); - - if (IS_I_VERSION(inode)) - inode_inc_iversion(inode); -} - int btrfs_write_check(struct kiocb *iocb, struct iov_iter *from, size_t count) { struct file *file = iocb->ki_filp; @@ -1170,7 +1150,10 @@ int btrfs_write_check(struct kiocb *iocb, struct iov_iter *from, size_t count) * need to start yet another transaction to update the inode as we will * update the inode when we finish writing whatever data we write. 
*/ - update_time_for_write(inode); + if (!IS_NOCMTIME(inode)) { + inode_set_mtime_to_ts(inode, inode_set_ctime_current(inode)); + inode_inc_iversion(inode); + } start_pos = round_down(pos, fs_info->sectorsize); oldsize = i_size_read(inode); diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c index 98fa0f382480..d423acfe11d0 100644 --- a/fs/btrfs/super.c +++ b/fs/btrfs/super.c @@ -2198,7 +2198,8 @@ static struct file_system_type btrfs_fs_type = { .init_fs_context = btrfs_init_fs_context, .parameters = btrfs_fs_parameters, .kill_sb = btrfs_kill_super, - .fs_flags = FS_REQUIRES_DEV | FS_BINARY_MOUNTDATA | FS_ALLOW_IDMAP, + .fs_flags = FS_REQUIRES_DEV | FS_BINARY_MOUNTDATA | + FS_ALLOW_IDMAP | FS_MGTIME, }; MODULE_ALIAS_FS("btrfs"); From patchwork Sat Sep 14 17:07:24 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jeff Layton X-Patchwork-Id: 13804487 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 2B4091D9350; Sat, 14 Sep 2024 17:07:52 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1726333673; cv=none; b=jMSoDijNKSbJ4ywCRx1K5F2qhVR4DQpuUovln0O4egVozOsfepLKx1eGORAJhNlrW/6jYp9vlLiFQ+sxOgCe3oYlfsC8oSiCJxqaPuS70nl8XKyCXs52UurRPTXMf4i3cvZ4pMEgMf8PrmZaD3gOeoA3TP/PM/guDsF8j1r5dcU= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1726333673; c=relaxed/simple; bh=gE8r0Qbj4D8u7V9shLCLoJR+0WYqh7zAXrVYnVeHWEs=; h=From:Date:Subject:MIME-Version:Content-Type:Message-Id:References: In-Reply-To:To:Cc; 
 b=JguhdejjYn+ADav1jWIc14IA0+nErUOAZmJFf2+4vzP/75NrPS9JUTvGZj5YW40w0hP2uWzvdN3jnwvEDr0CEwfBEdhYYA2T7unc7K/7PWAn0Gr99N68yCBwr6pbNmL12ZZiN7PkCeG2tqlZLyPWXGdn2jmYn9qU3gbu0PKEOd4=
ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=R0wkJ2rw; arc=none smtp.client-ip=10.30.226.201
Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="R0wkJ2rw"
Received: by smtp.kernel.org (Postfix) with ESMTPSA id 3738DC4CECC; Sat, 14 Sep 2024 17:07:50 +0000 (UTC)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1726333672; bh=gE8r0Qbj4D8u7V9shLCLoJR+0WYqh7zAXrVYnVeHWEs=; h=From:Date:Subject:References:In-Reply-To:To:Cc:From;
 b=R0wkJ2rwtbowlAve1guhZzLO89JBiB6wfkUI9wk3bplGqMCoTw9HGM4KS8fTDOshB
 FuZWd67kwt4rXUsO6CyB2+Ab6PMhi1ARNtC/t9cutBzZelr/O7diYk1RtBKLvbXFwj
 tFYMT7EBiTl4wyF3ptzVe8+gBbE4nobdO7AjF3pR/1ftAychS6wsoIDxrRWGG/ajRJ
 8El2g5E5XmxrAyULWb6SehvjQWXEmUzvBqZFIwqsSMLoOvY0obrig7U5h2VflcZRPB
 hRzANQMSzmpqk6mhxGQh3iMgVDaA7LDTb/Cp6c6d3nZNG5jhWPSAOkjEbJx7IVPYfx
 liPdJfoPZ6GWw==
From: Jeff Layton
Date: Sat, 14 Sep 2024 13:07:24 -0400
Subject: [PATCH v8 11/11] tmpfs: add support for multigrain timestamps
Precedence: bulk
X-Mailing-List: linux-nfs@vger.kernel.org
List-Id:
List-Subscribe:
List-Unsubscribe:
MIME-Version: 1.0
Message-Id: <20240914-mgtime-v8-11-5bd872330bed@kernel.org>
References: <20240914-mgtime-v8-0-5bd872330bed@kernel.org>
In-Reply-To: <20240914-mgtime-v8-0-5bd872330bed@kernel.org>
To: John Stultz, Thomas Gleixner, Stephen Boyd, Alexander Viro, Christian Brauner, Jan Kara, Steven Rostedt, Masami Hiramatsu, Mathieu Desnoyers, Jonathan Corbet, Chandan Babu R, "Darrick J. Wong", Theodore Ts'o, Andreas Dilger, Chris Mason, Josef Bacik, David Sterba, Hugh Dickins, Andrew Morton, Chuck Lever, Vadim Fedorenko
Cc: Randy Dunlap, linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-trace-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-xfs@vger.kernel.org, linux-ext4@vger.kernel.org, linux-btrfs@vger.kernel.org, linux-nfs@vger.kernel.org, linux-mm@kvack.org, Jeff Layton
X-Mailer: b4 0.14.1
X-Developer-Signature: v=1; a=openpgp-sha256; l=862; i=jlayton@kernel.org; h=from:subject:message-id; bh=gE8r0Qbj4D8u7V9shLCLoJR+0WYqh7zAXrVYnVeHWEs=;
 b=owEBbQKS/ZANAwAIAQAOaEEZVoIVAcsmYgBm5cLG77OZfuwOGAG5Ly687wn48EdQFM4UuCpjV
 SX1+Zqe5KSJAjMEAAEIAB0WIQRLwNeyRHGyoYTq9dMADmhBGVaCFQUCZuXCxgAKCRAADmhBGVaC
 FfuLEAC3RtpxYP6q0t0xpB+5OwE31ZZoYfdisPWuC6dQLRBf3JPHoqiJI1tM3i2ks/CU726MZIK
 hrROc4CQsVRZrxvfgB3tCcL+TZFHq1469dvPTsHpldJ9POculR2FqNZaK07nhswJGDtwXp9KryG
 TXSExFmSv3iZAZqDa4RNy6DAuOb4nMchTbC/v+BhHmGES71T+aZMSfGP5kpvjDeTdI9WcIRtbgX
 rC34F9nBnt5pGNWL8wyImBu/GKNQYb5N6eaq7Yyv8eNLqtOlwEc4NRsrqSGcnWOxt32FpzMkP3i
 8vXlw7NOtTM/uv4uG3Dvq8Gatg8/6EsknseTHtcNd77XEDGCf6pqTzZ+vfkg1v9GuomVqwDubWI
 9UUBtVq9VcF59teol7/FyZoXiOigJ7wj/mCK6vzRepGszlCjC7Bi+j500YArHMd+322IG9y5EXe
 O8t6skl5tsxmALcyf4FPYk/fLSj5Fkx/H7fFWygBTxXDCrCAfbNMcmCqbOYyIY6rpY8JueiwWXB
 TTaoMNYhE2yNiwQNLRUg+4lRgBX5G18vQaYIvFbIgMw0Bdg80zssmKfdd6v9hTOlgzD9IoJM+p2
 mvIvWO/A2NlUe2C+lWB3tkbElYUryFmwB6RZZNxpBq/egGr6D3hSxNiFV8m497rAIxKwUeqzIz9
 YmSBRVn00Sho7OQ==
X-Developer-Key: i=jlayton@kernel.org; a=openpgp; fpr=4BC0D7B24471B2A184EAF5D3000E684119568215

Enable multigrain timestamps, which should ensure that there is an
apparent change to the timestamp whenever it has been written after
being actively observed via getattr.

tmpfs only requires the FS_MGTIME flag.

Reviewed-by: Josef Bacik
Reviewed-by: Jan Kara
Signed-off-by: Jeff Layton
---
 mm/shmem.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/shmem.c b/mm/shmem.c
index 5a77acf6ac6a..5f17eaaa32e2 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -4804,7 +4804,7 @@ static struct file_system_type shmem_fs_type = {
 	.parameters	= shmem_fs_parameters,
 #endif
 	.kill_sb	= kill_litter_super,
-	.fs_flags	= FS_USERNS_MOUNT | FS_ALLOW_IDMAP,
+	.fs_flags	= FS_USERNS_MOUNT | FS_ALLOW_IDMAP | FS_MGTIME,
 };
 
 void __init shmem_init(void)
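
Reader's note: the behaviour the commit message describes (a visible
timestamp change whenever the inode is written after its timestamp was
observed via getattr) can be illustrated with a small userspace model.
This is only a sketch under stated assumptions, not the kernel
implementation: every model_* name is hypothetical, the 4 ms coarse
granularity is an arbitrary choice, and the "never move backwards" clamp
merely stands in for the floor handling that the series places in the
timekeeper. model_write() mirrors the btrfs idiom above of reusing the
value returned by the ctime update as the new mtime.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical userspace model; none of these names are kernel APIs. */
#define MODEL_COARSE_NS 4000000ULL	/* assumed coarse clock granularity (4 ms) */

struct model_clock {
	uint64_t fine_ns;	/* fine-grained clock */
	uint64_t coarse_ns;	/* fine clock rounded down to the granularity */
};

struct model_inode {
	uint64_t ctime_ns;
	uint64_t mtime_ns;
	bool queried;		/* ctime was handed out by getattr since last update */
};

/* Advance the fine clock and recompute the coarse clock from it. */
static void model_advance(struct model_clock *clk, uint64_t ns)
{
	clk->fine_ns += ns;
	clk->coarse_ns = clk->fine_ns - (clk->fine_ns % MODEL_COARSE_NS);
}

/* getattr: report ctime and remember that it has been observed. */
static uint64_t model_getattr_ctime(struct model_inode *inode)
{
	inode->queried = true;
	return inode->ctime_ns;
}

/*
 * Stamp ctime with the current time and return the value used. A
 * fine-grained time is taken only if the old ctime was observed;
 * otherwise the cheaper coarse time is enough.
 */
static uint64_t model_set_ctime_current(struct model_inode *inode,
					const struct model_clock *clk)
{
	uint64_t now = inode->queried ? clk->fine_ns : clk->coarse_ns;

	/* Simplified stand-in for the floor: never move backwards. */
	if (now < inode->ctime_ns)
		now = inode->ctime_ns;

	inode->ctime_ns = now;
	inode->queried = false;
	return now;
}

/* Write path: mtime takes whatever value ctime was just given. */
static void model_write(struct model_inode *inode,
			const struct model_clock *clk)
{
	inode->mtime_ns = model_set_ctime_current(inode, clk);
}
```

In this model, two writes within one coarse interval leave ctime
unchanged as long as nobody looked at it, while a write that follows
model_getattr_ctime() always produces a different value, provided the
fine clock has advanced in between.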