diff mbox series

[v4,1/6] timer: kasan: record timer stack

Message ID 20200924040335.30934-1-walter-zh.wu@mediatek.com (mailing list archive)
State New, archived
Headers show
Series kasan: add workqueue and timer stack for generic KASAN | expand

Commit Message

Walter Wu Sept. 24, 2020, 4:03 a.m. UTC
When analyze use-after-free or double-free issue, recording the timer
stacks is helpful to preserve usage history which potentially gives
a hint about the affected code.

Record the most recent two timer init calls in KASAN which are printed
on failure in the KASAN report.

For timers it has turned out to be useful to record the stack trace
of the timer init call. Because if the UAF root cause is in timer init,
then user can see KASAN report to get where it is registered and find
out the root cause. It don't need to enable DEBUG_OBJECTS_TIMERS,
but they have a chance to find out the root cause.

Signed-off-by: Walter Wu <walter-zh.wu@mediatek.com>
Suggested-by: Marco Elver <elver@google.com>
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Marco Elver <elver@google.com>
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
Reviewed-by: Andrey Konovalov <andreyknvl@google.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Stephen Boyd <sboyd@kernel.org>
---

v2:
- Thanks for Marco and Thomas suggestion.
- Remove unnecessary code and fix commit log
- reuse kasan_record_aux_stack() and aux_stack
  to record timer and workqueue stack.

---
 kernel/time/timer.c | 3 +++
 1 file changed, 3 insertions(+)

Comments

Thomas Gleixner Sept. 24, 2020, 9:41 p.m. UTC | #1
On Thu, Sep 24 2020 at 12:03, Walter Wu wrote:
> When analyze use-after-free or double-free issue, recording the timer
> stacks is helpful to preserve usage history which potentially gives
> a hint about the affected code.
>
> Record the most recent two timer init calls in KASAN which are printed
> on failure in the KASAN report.
>
> For timers it has turned out to be useful to record the stack trace
> of the timer init call.

In which way? And what kind of bug does it catch which cannot be catched
by existing debug mechanisms already?

> Because if the UAF root cause is in timer init, then user can see
> KASAN report to get where it is registered and find out the root
> cause.

What? If the UAF root cause is in timer init, then registering it after
using it in that very same function is pretty pointless.

> It don't need to enable DEBUG_OBJECTS_TIMERS, but they have a chance
> to find out the root cause.

There is a lot of handwaving how useful this is, but TBH I don't see the
value at all.

DEBUG_OBJECTS_TIMERS does a lot more than crashing on UAF. If KASAN
provides additional value over DEBUG_OBJECTS_TIMERS then spell it out,
but just saying that you don't need to enable DEBUG_OBJECTS_TIMERS is
not making an argument for that change.

Try again please.

Thanks,

        tglx
Walter Wu Sept. 25, 2020, 7:18 a.m. UTC | #2
On Thu, 2020-09-24 at 23:41 +0200, Thomas Gleixner wrote:
> On Thu, Sep 24 2020 at 12:03, Walter Wu wrote:
> > When analyze use-after-free or double-free issue, recording the timer
> > stacks is helpful to preserve usage history which potentially gives
> > a hint about the affected code.
> >
> > Record the most recent two timer init calls in KASAN which are printed
> > on failure in the KASAN report.
> >
> > For timers it has turned out to be useful to record the stack trace
> > of the timer init call.
> 
> In which way? And what kind of bug does it catch which cannot be catched
> by existing debug mechanisms already?
> 

We only provide another debug mechanisms to debug use-after-free or
double-free, it can be displayed together in KASAN report and have a
chance to debug, and it doesn't need to enable existing debug mechanisms
at the same time. then it has a chance to resolve issue.

> > Because if the UAF root cause is in timer init, then user can see
> > KASAN report to get where it is registered and find out the root
> > cause.
> 
> What? If the UAF root cause is in timer init, then registering it after
> using it in that very same function is pretty pointless.
> 

See [1], the call stack shows UAF happen at dummy_timer(), it is the
callback function and set by timer_setup(), if KASAN report shows the
timer call stack, it should be useful for programmer.

[1]
https://syzkaller.appspot.com/bug?id=34e69b7c8c0165658cbc987da0b61dadec644b6b


> > It don't need to enable DEBUG_OBJECTS_TIMERS, but they have a chance
> > to find out the root cause.
> 
> There is a lot of handwaving how useful this is, but TBH I don't see the
> value at all.
> 
> DEBUG_OBJECTS_TIMERS does a lot more than crashing on UAF. If KASAN
> provides additional value over DEBUG_OBJECTS_TIMERS then spell it out,
> but just saying that you don't need to enable DEBUG_OBJECTS_TIMERS is
> not making an argument for that change.
> 

We don't want to replace DEBUG_OBJECTS_TIMERS with this patches, only
hope to use low overhead(compare with DEBUG_OBJECTS_TIMERS) to debug
use-after-free/double-free issue. If you have some concerns, we can add
those message into commit log.

Thanks.

Walter
Thomas Gleixner Sept. 25, 2020, 8:55 a.m. UTC | #3
Walter,

On Fri, Sep 25 2020 at 15:18, Walter Wu wrote:
> On Thu, 2020-09-24 at 23:41 +0200, Thomas Gleixner wrote:
>> > For timers it has turned out to be useful to record the stack trace
>> > of the timer init call.
>> 
>> In which way? And what kind of bug does it catch which cannot be catched
>> by existing debug mechanisms already?
>> 
> We only provide another debug mechanisms to debug use-after-free or
> double-free, it can be displayed together in KASAN report and have a
> chance to debug, and it doesn't need to enable existing debug mechanisms
> at the same time. then it has a chance to resolve issue.

Again. KASAN can only cover UAF, but there are a dozen other ways to
wreck the system with wrong usage of timers which can't be caught by
KASAN.

>> > Because if the UAF root cause is in timer init, then user can see
>> > KASAN report to get where it is registered and find out the root
>> > cause.
>> 
>> What? If the UAF root cause is in timer init, then registering it after
>> using it in that very same function is pretty pointless.
>> 
> See [1], the call stack shows UAF happen at dummy_timer(), it is the
> callback function and set by timer_setup(), if KASAN report shows the
> timer call stack, it should be useful for programmer.

The report you linked to has absolutely nothing to do with a timer
related UAF. The timer callback calls kfree_skb() on something which is
already freed. So the root cause of this is NOT in timer init as you
claimed above. The timer callback is just exposing a problem in the URB
management of this driver. IOW the recording of the timer init stack is
completely useless for decoding this problem.

>> There is a lot of handwaving how useful this is, but TBH I don't see the
>> value at all.
>> 
>> DEBUG_OBJECTS_TIMERS does a lot more than crashing on UAF. If KASAN
>> provides additional value over DEBUG_OBJECTS_TIMERS then spell it out,
>> but just saying that you don't need to enable DEBUG_OBJECTS_TIMERS is
>> not making an argument for that change.
>> 
> We don't want to replace DEBUG_OBJECTS_TIMERS with this patches, only
> hope to use low overhead(compare with DEBUG_OBJECTS_TIMERS) to debug

KASAN has lower overhead than DEBUG_OBJECTS_TIMERS? Maybe in a different
universe.

That said, I'm not opposed to the change per se, but without a sensible
justification this is just pointless.

Sprinkling kasan_foo() all over the place and claiming it's useful
without a valid example does not provide any value.

Quite the contrary it gives the completely wrong sense what KASAN can do
and what not.

Thanks,

        tglx
Walter Wu Sept. 25, 2020, 9:15 a.m. UTC | #4
On Fri, 2020-09-25 at 10:55 +0200, Thomas Gleixner wrote:
> Walter,
> 
> On Fri, Sep 25 2020 at 15:18, Walter Wu wrote:
> > On Thu, 2020-09-24 at 23:41 +0200, Thomas Gleixner wrote:
> >> > For timers it has turned out to be useful to record the stack trace
> >> > of the timer init call.
> >> 
> >> In which way? And what kind of bug does it catch which cannot be catched
> >> by existing debug mechanisms already?
> >> 
> > We only provide another debug mechanisms to debug use-after-free or
> > double-free, it can be displayed together in KASAN report and have a
> > chance to debug, and it doesn't need to enable existing debug mechanisms
> > at the same time. then it has a chance to resolve issue.
> 
> Again. KASAN can only cover UAF, but there are a dozen other ways to
> wreck the system with wrong usage of timers which can't be caught by
> KASAN.
> 
> >> > Because if the UAF root cause is in timer init, then user can see
> >> > KASAN report to get where it is registered and find out the root
> >> > cause.
> >> 
> >> What? If the UAF root cause is in timer init, then registering it after
> >> using it in that very same function is pretty pointless.
> >> 
> > See [1], the call stack shows UAF happen at dummy_timer(), it is the
> > callback function and set by timer_setup(), if KASAN report shows the
> > timer call stack, it should be useful for programmer.
> 
> The report you linked to has absolutely nothing to do with a timer
> related UAF. The timer callback calls kfree_skb() on something which is
> already freed. So the root cause of this is NOT in timer init as you
> claimed above. The timer callback is just exposing a problem in the URB
> management of this driver. IOW the recording of the timer init stack is
> completely useless for decoding this problem.
> 
> >> There is a lot of handwaving how useful this is, but TBH I don't see the
> >> value at all.
> >> 
> >> DEBUG_OBJECTS_TIMERS does a lot more than crashing on UAF. If KASAN
> >> provides additional value over DEBUG_OBJECTS_TIMERS then spell it out,
> >> but just saying that you don't need to enable DEBUG_OBJECTS_TIMERS is
> >> not making an argument for that change.
> >> 
> > We don't want to replace DEBUG_OBJECTS_TIMERS with this patches, only
> > hope to use low overhead(compare with DEBUG_OBJECTS_TIMERS) to debug
> 
> KASAN has lower overhead than DEBUG_OBJECTS_TIMERS? Maybe in a different
> universe.
> 

I mean KASAN + our patch vs KASAN + DEBUG_OBJECTS_TIMERS. The front one
have the information to the original caller and help to debug. It is
smaller overhead than the one behind.

> That said, I'm not opposed to the change per se, but without a sensible
> justification this is just pointless.
> 
> Sprinkling kasan_foo() all over the place and claiming it's useful
> without a valid example does not provide any value.
> 
> Quite the contrary it gives the completely wrong sense what KASAN can do
> and what not.
> 

I agree your saying, so that I need to find out a use case to explain to
you.

Thanks

Walter

> Thanks,
> 
>         tglx
>
Thomas Gleixner Sept. 25, 2020, 10:59 p.m. UTC | #5
On Fri, Sep 25 2020 at 17:15, Walter Wu wrote:
> On Fri, 2020-09-25 at 10:55 +0200, Thomas Gleixner wrote:
>> > We don't want to replace DEBUG_OBJECTS_TIMERS with this patches, only
>> > hope to use low overhead(compare with DEBUG_OBJECTS_TIMERS) to debug
>> 
>> KASAN has lower overhead than DEBUG_OBJECTS_TIMERS? Maybe in a different
>> universe.
>> 
> I mean KASAN + our patch vs KASAN + DEBUG_OBJECTS_TIMERS. The front one
> have the information to the original caller and help to debug. It is
> smaller overhead than the one behind.

For ONE specific problem related to timers and you have still not shown
a single useful debug output where this information helps to debug
anything.

> I agree your saying, so that I need to find out a use case to explain to
> you.

Indeed.

Thanks,

        tglx
Walter Wu Sept. 26, 2020, 5:11 p.m. UTC | #6
Hi Thomas,

On Sat, 2020-09-26 at 00:59 +0200, Thomas Gleixner wrote:
> On Fri, Sep 25 2020 at 17:15, Walter Wu wrote:
> > On Fri, 2020-09-25 at 10:55 +0200, Thomas Gleixner wrote:
> >> > We don't want to replace DEBUG_OBJECTS_TIMERS with this patches, only
> >> > hope to use low overhead(compare with DEBUG_OBJECTS_TIMERS) to debug
> >> 
> >> KASAN has lower overhead than DEBUG_OBJECTS_TIMERS? Maybe in a different
> >> universe.
> >> 
> > I mean KASAN + our patch vs KASAN + DEBUG_OBJECTS_TIMERS. The front one
> > have the information to the original caller and help to debug. It is
> > smaller overhead than the one behind.
> 
> For ONE specific problem related to timers and you have still not shown
> a single useful debug output where this information helps to debug
> anything.
> 
> > I agree your saying, so that I need to find out a use case to explain to
> > you.
> 
> Indeed.
> 

First, I think the commit log “Because if the UAF root cause is in timer
init …” needs to be removed, this patch hopes to help programmer gets
timer callback is where is registered. It is useful only if free stack
is called from timer callback, because programmer can see why & where
register this function.

Second, see [1], it should satisfies first point. The free stack is from
timer callback, if we know where register this function, then it should
be useful to solve UAF.

[1]https://lore.kernel.org/linux-usb/000000000000590f6b05a1c05d15@google.com/



Thanks

Walter





> Thanks,
> 
>         tglx
>
Thomas Gleixner Sept. 30, 2020, 7:18 a.m. UTC | #7
Walter,

On Sun, Sep 27 2020 at 01:11, Walter Wu wrote:
> First, I think the commit log “Because if the UAF root cause is in timer
> init …” needs to be removed, this patch hopes to help programmer gets
> timer callback is where is registered. It is useful only if free stack
> is called from timer callback, because programmer can see why & where
> register this function.
>
> Second, see [1], it should satisfies first point. The free stack is from
> timer callback, if we know where register this function, then it should
> be useful to solve UAF.

No. It's completely useless.

The problem has absolutely nothing to do with the timer callback and the
timer_init() invocation which set the timer's callback to 'dummy_timer'.

The timer callback happens to free the object, but the worker thread has
still a reference of some sort.

So the problem is either missing refcounting which allows the timer
callback to free the object or some missing serialization.

Knowing the place which initialized the timer is absolutely not helping
to figure out what's missing here.

Aside of that it's trivial enough to do:

  git grep dummy_timer drivers/usb/gadget/udc/dummy_hcd.c

if you really want to know what initialized it:

 dummy_timer+0x1258/0x32ae drivers/usb/gadget/udc/dummy_hcd.c:1966
 call_timer_fn+0x195/0x6f0 kernel/time/timer.c:1404
 expire_timers kernel/time/timer.c:1449 [inline]

That said, I'm all for adding useful information to KASAN or whatever
reports, but I'm not agreeing with the approach of 'Let's sprinkle
kasan_foo() all over tha place and claim it is useful to decode an UAF'.
Adding irrelevant information to a report is actually counter productive
because it makes people look at the wrong place.

Again: Provide an analysis of such a dump where the timer_init()
function is a key element of solving the problem.

Thanks,

        tglx
diff mbox series

Patch

diff --git a/kernel/time/timer.c b/kernel/time/timer.c
index a16764b0116e..1ed8f8aca7f5 100644
--- a/kernel/time/timer.c
+++ b/kernel/time/timer.c
@@ -796,6 +796,9 @@  static void do_init_timer(struct timer_list *timer,
 	timer->function = func;
 	timer->flags = flags | raw_smp_processor_id();
 	lockdep_init_map(&timer->lockdep_map, name, key, 0);
+
+	/* record the timer stack in order to print it in KASAN report */
+	kasan_record_aux_stack(timer);
 }
 
 /**