diff mbox

[RESEND] timerfd: Allow TFD_TIMER_CANCEL_ON_SET with relative timeouts

Message ID 20151009082514.GY4919@axis.com (mailing list archive)
State New, archived
Headers show

Commit Message

Jesper Nilsson Oct. 9, 2015, 8:25 a.m. UTC
Allow TFD_TIMER_CANCEL_ON_SET on timerfd_settime() with relative
as well as absolute timeout.

Signed-off-by: Jesper Nilsson <jesper.nilsson@axis.com>
---
Resending after some discussion with Thomas Gleixner at ELCE,
and Cc:ing John Stultz and Michael Kerrisk who may have comments.

Longer background:

One of the uses for TFD_TIMER_CANCEL_ON_SET is to get
an event when the CLOCK_REALTIME changes (as by NTP or user action).
In this case, the timeout irrelevant, and the maximum
available value would be selected to avoid mis-triggers.

However, timerfd uses time_t for configuration, and the maximum
value on a 32bit time_t system is actually a valid time
(near 2038-01-19 03:14) in the 64bit ktime_t used internally in timerfd.

One way of provoking this problem would be to set the time
using "date '2038-01-19 03:14'" and letting the time roll over
a few seconds later.

After this time, a timerfd will continuously fire
when configured with a maximum absolute timeout,
potentially stealing all CPU and stopping the application
from doing what it really should be doing.
Which would be fine, unless the application is systemd
and loops at startup, leaving the system in a state where
the kernel is up, but nothing running in userspace. :-(

This problem was further exposed in kernel v3.19 by
commit a6d6e1c879efa4b77e250c34fe5fe1c34e6ef070
which introduced 64bit time in the RTC subsystem.
On an unconfigured RTC or an RTC with flat/removed battery
the date on could be random, and in some cases past 2038.

Of course, the proposed patch only allows the setting of relative
timeouts with TFD_TIMER_CANCEL_ON_SET, any application using
it would also need to be patched to use the relative timer
for this solve the described problem.

Another solution would be to add a new flag to timerfd_settime()
to indicate that the timer value is irrelevant, but I considered
it an unnecessary waste of a flag-bit.

A third possible solution is to steal the time (0,0)
(which currently gives -EINVAL) to indicate that the timeout
is irrelevant. This would however be a change in behaviour
that a current user wouldn't expect, perhaps making a previously
correctly failing application wait until a time change (or forever)

Tested on a MIPS 34Kc with kernel v4.1.

 fs/timerfd.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

John Stultz Oct. 19, 2015, 6:53 p.m. UTC | #1
On Fri, Oct 9, 2015 at 1:25 AM, Jesper Nilsson <jesper.nilsson@axis.com> wrote:
> Allow TFD_TIMER_CANCEL_ON_SET on timerfd_settime() with relative
> as well as absolute timeout.
>
> Signed-off-by: Jesper Nilsson <jesper.nilsson@axis.com>
> ---
> Resending after some discussion with Thomas Gleixner at ELCE,
> and Cc:ing John Stultz and Michael Kerrisk who may have comments.
>
> Longer background:
>
> One of the uses for TFD_TIMER_CANCEL_ON_SET is to get
> an event when the CLOCK_REALTIME changes (as by NTP or user action).
> In this case, the timeout irrelevant, and the maximum
> available value would be selected to avoid mis-triggers.
>
> However, timerfd uses time_t for configuration, and the maximum
> value on a 32bit time_t system is actually a valid time
> (near 2038-01-19 03:14) in the 64bit ktime_t used internally in timerfd.
>
> One way of provoking this problem would be to set the time
> using "date '2038-01-19 03:14'" and letting the time roll over
> a few seconds later.
>
> After this time, a timerfd will continuously fire
> when configured with a maximum absolute timeout,
> potentially stealing all CPU and stopping the application
> from doing what it really should be doing.
> Which would be fine, unless the application is systemd
> and loops at startup, leaving the system in a state where
> the kernel is up, but nothing running in userspace. :-(
>
> This problem was further exposed in kernel v3.19 by
> commit a6d6e1c879efa4b77e250c34fe5fe1c34e6ef070
> which introduced 64bit time in the RTC subsystem.
> On an unconfigured RTC or an RTC with flat/removed battery
> the date on could be random, and in some cases past 2038.

My first impulse is: Yes, for now,  32bit systems are broken past y2038.  :)


> Of course, the proposed patch only allows the setting of relative
> timeouts with TFD_TIMER_CANCEL_ON_SET, any application using
> it would also need to be patched to use the relative timer
> for this solve the described problem.

So is this not more of a workaround? Basically just allows
applications on systems that can't handle y2038 to be able to use
different interfaces to maybe allow the system to crawl onward to the
next problem?

Miroslav had a patch recently to try to keep 32bit systems from
getting into a week near y2038 to try to stave off some of these ugly
problems. While I'm not totally against such patches (Miroslav's
concern of a DoS vector is reasonable), I also want to avoid giving
folks a false sense of confidence that the problems are resolved.

Arnd and others are working on getting real 64bit time interfaces
piped out to 32bit userspace (once the in-kernel usage is fixed up).
So having some clear motivation to get folks to migrate to the new
interfaces will be needed.

But yea. At the same time I get you want to avoid user-pain like in
the case of the badly initialized RTC, but in that case would
returning 0 for RTC reads greater then y2038 on 32 bit systems be a
more sane fix?

thanks
-john
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Miroslav Lichvar Oct. 20, 2015, 7:36 a.m. UTC | #2
On Mon, Oct 19, 2015 at 11:53:25AM -0700, John Stultz wrote:
> On Fri, Oct 9, 2015 at 1:25 AM, Jesper Nilsson <jesper.nilsson@axis.com> wrote:
> > Of course, the proposed patch only allows the setting of relative
> > timeouts with TFD_TIMER_CANCEL_ON_SET, any application using
> > it would also need to be patched to use the relative timer
> > for this solve the described problem.
> 
> So is this not more of a workaround? Basically just allows
> applications on systems that can't handle y2038 to be able to use
> different interfaces to maybe allow the system to crawl onward to the
> next problem?

Right, any absolute timer set on the system time is a problem when it
overflows.

> Miroslav had a patch recently to try to keep 32bit systems from
> getting into a week near y2038 to try to stave off some of these ugly
> problems. While I'm not totally against such patches (Miroslav's
> concern of a DoS vector is reasonable), I also want to avoid giving
> folks a false sense of confidence that the problems are resolved.

Not all of the problems can be resolved, but current time overflowing
32-bit time_t can be prevented in the kernel, reliably.

Applications using 32-bit time_t shouldn't be expected to handle a
case when it's not really Oct 20 2015, but rather Nov 26 2151, should
they?

That would be like implicitly opening files with O_LARGEFILE, even for
applications not compiled with 64-bit off_t. The difference to the
large file support is that there is only one file (the system time)
and it's shared between all applications.

I think as long as there is anything with 32-bit time_t running on the
machine, the system time must not be allowed to overflow. If you don't
like the idea of stepping the clock back one week sooner to prevent
also time_t overflowing in majority of the user-space code, it could
be moved closer to the actual overflow. I don't see the benefit
though.

Similar applies to KTIME_MAX and the kernel code.
Arnd Bergmann Oct. 20, 2015, 8:18 a.m. UTC | #3
On Monday 19 October 2015 11:53:25 John Stultz wrote:
> 
> But yea. At the same time I get you want to avoid user-pain like in
> the case of the badly initialized RTC, but in that case would
> returning 0 for RTC reads greater then y2038 on 32 bit systems be a
> more sane fix?

I like that idea. In theory we could go further and check that the RTC
is somewhere between 2015 and 2037 (or higher on 64-bit systems) but
return 0 (1970) for anything that is outside of that range. That might
have side-effects for users that have a legitimate reason to backdate
their clocks though.

	Arnd
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jesper Nilsson Oct. 20, 2015, 8:59 a.m. UTC | #4
On Tue, Oct 20, 2015 at 10:18:22AM +0200, Arnd Bergmann wrote:
> On Monday 19 October 2015 11:53:25 John Stultz wrote:
> > 
> > But yea. At the same time I get you want to avoid user-pain like in
> > the case of the badly initialized RTC, but in that case would
> > returning 0 for RTC reads greater then y2038 on 32 bit systems be a
> > more sane fix?
> 
> I like that idea. In theory we could go further and check that the RTC
> is somewhere between 2015 and 2037 (or higher on 64-bit systems) but
> return 0 (1970) for anything that is outside of that range. That might
> have side-effects for users that have a legitimate reason to backdate
> their clocks though.

This is how the RTC framework used to handle it before the referenced
patch in my original mail, so a reversal (conditional on 32bit)
would solve that part of the problem.

It also looks like Miroslav's patch will handle the other cases of a
accidental user initiated set of a bad date or a maliciously set NTP.
Though, from my point of view, a wrap-around to 1970 would be just as valid
as a jump one week in the past.

What's the current status of that patch?

> 	Arnd

/^JN - Jesper Nilsson
Arnd Bergmann Oct. 20, 2015, 11:07 a.m. UTC | #5
On Tuesday 20 October 2015 10:59:34 Jesper Nilsson wrote:
> On Tue, Oct 20, 2015 at 10:18:22AM +0200, Arnd Bergmann wrote:
> > On Monday 19 October 2015 11:53:25 John Stultz wrote:
> > > 
> > > But yea. At the same time I get you want to avoid user-pain like in
> > > the case of the badly initialized RTC, but in that case would
> > > returning 0 for RTC reads greater then y2038 on 32 bit systems be a
> > > more sane fix?
> > 
> > I like that idea. In theory we could go further and check that the RTC
> > is somewhere between 2015 and 2037 (or higher on 64-bit systems) but
> > return 0 (1970) for anything that is outside of that range. That might
> > have side-effects for users that have a legitimate reason to backdate
> > their clocks though.
> 
> This is how the RTC framework used to handle it before the referenced
> patch in my original mail, so a reversal (conditional on 32bit)
> would solve that part of the problem.

I thought about it some more and unfortunately fail to see a long-term
strategy here. In the future, a kernel will support user space with
either 32-bit or 64-bit time_t, in different tasks.

Going back to the wraparound means that the 64-bit time_t based
tasks won't work beyond 2038 either, but as you say the current
code breaks things that used to work.

I would like to introduce a Kconfig symbol like CONFIG_Y2038_UNSAFE
defaults to on, but can be disabled on 32-bit architectures in order
to leave out all known unsafe code from the kernel (including the
existing system calls that pass a time argument). We could possibly
use that as a guard here as well, to only use the 64-bit time
variant if CONFIG_Y2038_UNSAFE is disabled, but it would be nice
to have a kernel that can run the old binaries and that is also
working beyond 2038 when used with new binaries.

Note that the same problem you see now also exists on all 64-bit
architectures running 32-bit user space, and has been like that
since the start.

> It also looks like Miroslav's patch will handle the other cases of a
> accidental user initiated set of a bad date or a maliciously set NTP.
> Though, from my point of view, a wrap-around to 1970 would be just as valid
> as a jump one week in the past.
> 
> What's the current status of that patch?

It's basically NAKed.

	Arnd
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/fs/timerfd.c b/fs/timerfd.c
index b94fa6c..8ec3aeb 100644
--- a/fs/timerfd.c
+++ b/fs/timerfd.c
@@ -134,7 +134,7 @@  static void timerfd_setup_cancel(struct timerfd_ctx *ctx, int flags)
 {
 	if ((ctx->clockid == CLOCK_REALTIME ||
 	     ctx->clockid == CLOCK_REALTIME_ALARM) &&
-	    (flags & TFD_TIMER_ABSTIME) && (flags & TFD_TIMER_CANCEL_ON_SET)) {
+	    (flags & TFD_TIMER_CANCEL_ON_SET)) {
 		if (!ctx->might_cancel) {
 			ctx->might_cancel = true;
 			spin_lock(&cancel_lock);