Message ID | 20230918-hirte-neuzugang-4c2324e7bae3@brauner (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | [GIT,PULL] timestamp fixes | expand |
On Mon, 18 Sept 2023 at 04:54, Christian Brauner <brauner@kernel.org> wrote: > > * Only update the atime if "now" is later than the current value. This > can happen when the atime gets updated with a fine-grained timestamp > and then later gets updated using a coarse-grained timestamp. I pulled this, and then I unpulled it again. I think this is fundamentally wrong. If somebody has set the time into the future (for whatever reason - maybe the clocks were wrong at some point), afaik accessing a file should reset it, and very much used to do that. Am I missing something? Because this really seems *horribly* broken garbage. Any "go from fine-grained back to coarse-grained" situation needs to explicitly test *that* case. Not some kind of completely broken "don't update to past value" like this. Linus
On Mon, 2023-09-18 at 11:24 -0700, Linus Torvalds wrote: > On Mon, 18 Sept 2023 at 04:54, Christian Brauner <brauner@kernel.org> wrote: > > > > * Only update the atime if "now" is later than the current value. This > > can happen when the atime gets updated with a fine-grained timestamp > > and then later gets updated using a coarse-grained timestamp. > > I pulled this, and then I unpulled it again. > > I think this is fundamentally wrong. > > If somebody has set the time into the future (for whatever reason - > maybe the clocks were wrong at some point), afaik accessing a file > should reset it, and very much used to do that. > > Am I missing something? Because this really seems *horribly* broken garbage. > > Any "go from fine-grained back to coarse-grained" situation needs to > explicitly test *that* case. > > Not some kind of completely broken "don't update to past value" like this. > Fair point. Now that I've considered it some more, I think that commit 7df48e7d99a4 (fs: don't update the atime if existing atime is newer than "now") is not necessary. What prompted this was a bug report from the kernel test robot that showed the atime going backward on a STRICTATIME LTP test, but I think the root cause of that was the missing ctime initialization after allocation that we fixed in 0a22d3ff61b7 (fs: initialize inode->__i_ctime to the epoch). In general, we always update the atime with a coarse-grained timestamp, since atime and ctime updates are never done together during normal read and write operations. As you note, things are a little more murky with utimes() updates but I think we should be safe to overwrite the atime with a coarse-grained timestamp unconditionally. We should be able to just drop that patch from the series. Whether you want to pull the rest now or wait for a bit, I'll leave up to you and Christian to decide. Thanks,
On Mon, 18 Sept 2023 at 12:39, Jeff Layton <jlayton@kernel.org> wrote: > > In general, we always update the atime with a coarse-grained timestamp, > since atime and ctime updates are never done together during normal read > and write operations. As you note, things are a little more murky with > utimes() updates but I think we should be safe to overwrite the atime > with a coarse-grained timestamp unconditionally. I do think utimes() ends up always overwriting, but that's a different code-path entirely (ie it goes through the ->setattr() logic, not this inode_update_timestamps() code). So I *think* that even with your patch, doing a "touch" would end up doing the right thing - it would update atime even if it was in the future before. But doing a plain "read()" would break, and not update atime. That said, I didn't actually ever *test* any of this, so this is purely from reading the patch, and I can easily have missed something. Anyway, I do think that the timespec64_equal() tests are a bit iffy in fs/inode.c now, since the timespecs that are being tested might be of different precision. So I do think there's a *problem* here, I just do not believe that doing that timespec64_equal() -> timespec64_compare() is at all the right thing to do. My *gut* feel is that in both cases, we have this if (timespec64_equal(&inode->i_atime, &now)) and the problem is that *sometimes* 'now' is the coarse time, but sometimes it's the fine-grained one, and so checking for equality is simply nonsensical. I get the feeling that that timespec64_equal() logic for those atime updates should be something like - if 'now' is in the future, we always considering it different, and update the time - if 'now' is further in the past than the coarse granularity, we also update the time ("clearly not equal") - but if 'now' is in the past, but within the coarse time granularity, we consider it equal and do not update anything but it's not like I have really given this a huge amount of thought. It's just that "don't update if in the past" that I am pretty sure can *not* be right. Linus
On Mon, 2023-09-18 at 13:18 -0700, Linus Torvalds wrote: > On Mon, 18 Sept 2023 at 12:39, Jeff Layton <jlayton@kernel.org> wrote: > > > > In general, we always update the atime with a coarse-grained timestamp, > > since atime and ctime updates are never done together during normal read > > and write operations. As you note, things are a little more murky with > > utimes() updates but I think we should be safe to overwrite the atime > > with a coarse-grained timestamp unconditionally. > > I do think utimes() ends up always overwriting, but that's a different > code-path entirely (ie it goes through the ->setattr() logic, not this > inode_update_timestamps() code). > > So I *think* that even with your patch, doing a "touch" would end up > doing the right thing - it would update atime even if it was in the > future before. > > But doing a plain "read()" would break, and not update atime. > > That said, I didn't actually ever *test* any of this, so this is > purely from reading the patch, and I can easily have missed something. > No, you're quite right. That's exactly what would have happened. > Anyway, I do think that the timespec64_equal() tests are a bit iffy in > fs/inode.c now, since the timespecs that are being tested might be of > different precision. > > So I do think there's a *problem* here, I just do not believe that > doing that timespec64_equal() -> timespec64_compare() is at all the > right thing to do. > > My *gut* feel is that in both cases, we have this > > if (timespec64_equal(&inode->i_atime, &now)) > > and the problem is that *sometimes* 'now' is the coarse time, but > sometimes it's the fine-grained one, and so checking for equality is > simply nonsensical. > > I get the feeling that that timespec64_equal() logic for those atime > updates should be something like > > - if 'now' is in the future, we always considering it different, and > update the time > > - if 'now' is further in the past than the coarse granularity, we > also update the time ("clearly not equal") > > - but if 'now' is in the past, but within the coarse time > granularity, we consider it equal and do not update anything > > but it's not like I have really given this a huge amount of thought. > It's just that "don't update if in the past" that I am pretty sure can > *not* be right. > > I think the atime problem is solved by just dropping the patch I mentioned before. The atime is updated for read operations and the m/ctime for write. You only get a fine-grained timestamp when updating the ctime, so for an atime update we always use a coarse timestamp. It should never roll backward. [1] We may have a problem with the ctime update though, since you pointed it out. We have this in inode_set_ctime_current(), in the codepath where the QUERIED bit isn't set: /* * If we've recently updated with a fine-grained timestamp, * then the coarse-grained one may still be earlier than the * existing ctime. Just keep the existing value if so. */ ctime.tv_sec = inode->__i_ctime.tv_sec; if (timespec64_compare(&ctime, &now) > 0) return ctime; The ctime can't be set via utimes(), so that's not an issue here, but we could get a realtime clock jump backward that causes this to not be updated like it should be. I think (like you suggest above) that this needs some bounds-checking where we make sure that the current coarse grained time isn't more than around 1-2 jiffies earlier than the existing ctime. If it is, then we'll go ahead and just update it anyway. Thoughts?
On Mon, 18 Sept 2023 at 13:56, Jeff Layton <jlayton@kernel.org> wrote: > > We may have a problem with the ctime update though, since you pointed it > out. We have this in inode_set_ctime_current(), in the codepath where > the QUERIED bit isn't set: > > /* > * If we've recently updated with a fine-grained timestamp, > * then the coarse-grained one may still be earlier than the > * existing ctime. Just keep the existing value if so. > */ > ctime.tv_sec = inode->__i_ctime.tv_sec; > if (timespec64_compare(&ctime, &now) > 0) > return ctime; > > The ctime can't be set via utimes(), so that's not an issue here, but we > could get a realtime clock jump backward that causes this to not be > updated like it should be. > > I think (like you suggest above) that this needs some bounds-checking > where we make sure that the current coarse grained time isn't more than > around 1-2 jiffies earlier than the existing ctime. If it is, then we'll > go ahead and just update it anyway. > > Thoughts? Ack, that sounds about right to me. Christian - I'm just going to assume that you'll sort this out and I'll get a new pull request at some point. Holler if you think something else is needed, ok? Linus
> Christian - I'm just going to assume that you'll sort this out and > I'll get a new pull request at some point. Holler if you think > something else is needed, ok? I'll take care of it and will send you a new pull request once everything's sorted. Thanks for looking! Christian
On Tue, Sep 19, 2023 at 10:28:11AM +0200, Christian Brauner wrote: > > Christian - I'm just going to assume that you'll sort this out and > > I'll get a new pull request at some point. Holler if you think > > something else is needed, ok? > > I'll take care of it and will send you a new pull request once > everything's sorted. Thanks for looking! In the meantime we had a report that unconditionally enabling multi-grain timestamps causes a regression for some users workloads. So, the honest thing is to revert the five commits that introduced the infrastructure on top of the cleanup we did for v6.6. Jeff and Jan agreed and we can give it another go for v6.7+. The general improvements to timestamp handling and related cleanup stand on their. I'll be putting the reverts into -next now and I'll get you a pull request by the end of the week. In case you'd rather do it right now it's already here: git@gitolite.kernel.org:pub/scm/linux/kernel/git/vfs/vfs tags/v6.6-rc3.vfs.ctime.revert I'd put something like: Users reported regressions due to enabling multi-grained timestamps unconditionally because timestamps appeared stale and could break compilation. As no clear consensus on a solution has come up and the discussion has gone back to the drawing board whether we should even expose this to userspace, revert the infrastructure changes for. If it isn't code that's here to stay, make it go away. Sorry for the inconvenience. Christian
On Wed, 20 Sept 2023 at 09:22, Christian Brauner <brauner@kernel.org> wrote: > > In the meantime we had a report that unconditionally enabling > multi-grain timestamps causes a regression for some users workloads. Ok. > I'll be putting the reverts into -next now and I'll get you a pull > request by the end of the week. In case you'd rather do it right now > it's already here: By the end of the week is fine, I'll wait for the proper pull request. Thanks, Linus