
[v3,1/2] ftrace: disable preemption between ftrace_test_recursion_trylock/unlock()

Message ID 7e4738b5-21d4-c4d0-3136-a096bbb5cd2c@linux.alibaba.com (mailing list archive)
State New, archived
Series fix & prevent the missing preemption disabling

Commit Message

王贇 Oct. 13, 2021, 8:51 a.m. UTC
As the documentation explains, ftrace_test_recursion_trylock()
and ftrace_test_recursion_unlock() are supposed to disable and
enable preemption properly; currently, however, this work is left
to the callers, where it can easily be missed by mistake.

This patch makes sure preemption is disabled when trylock()
succeeds, and that unlock() re-enables preemption if it was
previously enabled.
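
With this change a callback only needs the trylock()/unlock() pair; the
explicit preempt_disable_notrace()/preempt_enable_notrace() calls removed
below become unnecessary. A minimal sketch, modelled on the kprobe handlers
converted by this patch (my_ftrace_callback() is a made-up name, purely for
illustration):

#include <linux/ftrace.h>
#include <linux/trace_recursion.h>

static void notrace my_ftrace_callback(unsigned long ip, unsigned long parent_ip,
				       struct ftrace_ops *op, struct ftrace_regs *fregs)
{
	int bit;

	bit = ftrace_test_recursion_trylock(ip, parent_ip);
	if (bit < 0)
		return;

	/* Preemption is already disabled here by the trylock() above. */

	/* ... do the actual callback work, e.g. handle a kprobe ... */

	/* Re-enables preemption, if it was enabled before the trylock(). */
	ftrace_test_recursion_unlock(bit);
}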

CC: Steven Rostedt <rostedt@goodmis.org>
CC: Miroslav Benes <mbenes@suse.cz>
Reported-by: Abaci <abaci@linux.alibaba.com>
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Michael Wang <yun.wang@linux.alibaba.com>
---
 arch/csky/kernel/probes/ftrace.c     |  2 --
 arch/parisc/kernel/ftrace.c          |  2 --
 arch/powerpc/kernel/kprobes-ftrace.c |  2 --
 arch/riscv/kernel/probes/ftrace.c    |  2 --
 arch/x86/kernel/kprobes/ftrace.c     |  2 --
 include/linux/trace_recursion.h      | 17 ++++++++++++++++-
 kernel/livepatch/patch.c             | 13 +++++++------
 kernel/trace/trace_functions.c       |  5 -----
 8 files changed, 23 insertions(+), 22 deletions(-)

Comments

Miroslav Benes Oct. 14, 2021, 9:13 a.m. UTC | #1
> diff --git a/kernel/livepatch/patch.c b/kernel/livepatch/patch.c
> index e8029ae..b8d75fb 100644
> --- a/kernel/livepatch/patch.c
> +++ b/kernel/livepatch/patch.c
> @@ -49,14 +49,16 @@ static void notrace klp_ftrace_handler(unsigned long ip,
> 
>  	ops = container_of(fops, struct klp_ops, fops);
> 
> +	/*
> +	 *

This empty line is not useful.

> +	 * The ftrace_test_recursion_trylock() will disable preemption,
> +	 * which is required for the variant of synchronize_rcu() that is
> +	 * used to allow patching functions where RCU is not watching.
> +	 * See klp_synchronize_transition() for more details.
> +	 */
>  	bit = ftrace_test_recursion_trylock(ip, parent_ip);
>  	if (WARN_ON_ONCE(bit < 0))
>  		return;
> -	/*
> -	 * A variant of synchronize_rcu() is used to allow patching functions
> -	 * where RCU is not watching, see klp_synchronize_transition().
> -	 */
> -	preempt_disable_notrace();
> 
>  	func = list_first_or_null_rcu(&ops->func_stack, struct klp_func,
>  				      stack_node);
> @@ -120,7 +122,6 @@ static void notrace klp_ftrace_handler(unsigned long ip,
>  	klp_arch_set_pc(fregs, (unsigned long)func->new_func);
> 
>  unlock:
> -	preempt_enable_notrace();
>  	ftrace_test_recursion_unlock(bit);
>  }

Acked-by: Miroslav Benes <mbenes@suse.cz>

for the livepatch part of the patch.

I would also ask you not to submit new versions so often, so that the 
other reviewers have time to actually review the patch set.

Quoting from Documentation/process/submitting-patches.rst:

"Wait for a minimum of one week before resubmitting or pinging reviewers - 
possibly longer during busy times like merge windows."

Thanks

Miroslav
王贇 Oct. 14, 2021, 9:22 a.m. UTC | #2
On 2021/10/14 at 5:13 PM, Miroslav Benes wrote:
>> diff --git a/kernel/livepatch/patch.c b/kernel/livepatch/patch.c
>> index e8029ae..b8d75fb 100644
>> --- a/kernel/livepatch/patch.c
>> +++ b/kernel/livepatch/patch.c
>> @@ -49,14 +49,16 @@ static void notrace klp_ftrace_handler(unsigned long ip,
>>
>>  	ops = container_of(fops, struct klp_ops, fops);
>>
>> +	/*
>> +	 *
> 
> This empty line is not useful.
> 
>> +	 * The ftrace_test_recursion_trylock() will disable preemption,
>> +	 * which is required for the variant of synchronize_rcu() that is
>> +	 * used to allow patching functions where RCU is not watching.
>> +	 * See klp_synchronize_transition() for more details.
>> +	 */
>>  	bit = ftrace_test_recursion_trylock(ip, parent_ip);
>>  	if (WARN_ON_ONCE(bit < 0))
>>  		return;
>> -	/*
>> -	 * A variant of synchronize_rcu() is used to allow patching functions
>> -	 * where RCU is not watching, see klp_synchronize_transition().
>> -	 */
>> -	preempt_disable_notrace();
>>
>>  	func = list_first_or_null_rcu(&ops->func_stack, struct klp_func,
>>  				      stack_node);
>> @@ -120,7 +122,6 @@ static void notrace klp_ftrace_handler(unsigned long ip,
>>  	klp_arch_set_pc(fregs, (unsigned long)func->new_func);
>>
>>  unlock:
>> -	preempt_enable_notrace();
>>  	ftrace_test_recursion_unlock(bit);
>>  }
> 
> Acked-by: Miroslav Benes <mbenes@suse.cz>
> 
> for the livepatch part of the patch.
> 
> I would also ask you not to submit new versions so often, so that the 
> other reviewers have time to actually review the patch set.
> 
> Quoting from Documentation/process/submitting-patches.rst:
> 
> "Wait for a minimum of one week before resubmitting or pinging reviewers - 
> possibly longer during busy times like merge windows."

Thanks for the Ack and the suggestion, I will take care of that from now on :-)

Regards,
Michael Wang

> 
> Thanks
> 
> Miroslav
>
Steven Rostedt Oct. 14, 2021, 1:28 p.m. UTC | #3
On Thu, 14 Oct 2021 11:13:13 +0200 (CEST)
Miroslav Benes <mbenes@suse.cz> wrote:

> for the livepatch part of the patch.
> 
> I would also ask you not to submit new versions so often, so that the 
> other reviewers have time to actually review the patch set.
> 
> Quoting from Documentation/process/submitting-patches.rst:
> 
> "Wait for a minimum of one week before resubmitting or pinging reviewers - 
> possibly longer during busy times like merge windows."

Although, for updates to a small patch set, I would say just a couple of
days instead of a week. It's annoying when you have a 15-patch series
that gets updated on a daily basis. But for something with only 2 patches,
wait just two days. At least, that's how I feel.

-- Steve
Petr Mladek Oct. 14, 2021, 3:14 p.m. UTC | #4
On Wed 2021-10-13 16:51:46, 王贇 wrote:
> As the documentation explained, ftrace_test_recursion_trylock()
> and ftrace_test_recursion_unlock() were supposed to disable and
> enable preemption properly, however currently this work is done
> outside of the function, which could be missing by mistake.
> 
> This path will make sure the preemption was disabled when trylock()
> succeed, and the unlock() will enable the preemption if previously
> enabled.
> 
> diff --git a/include/linux/trace_recursion.h b/include/linux/trace_recursion.h
> index a9f9c57..58e474c 100644
> --- a/include/linux/trace_recursion.h
> +++ b/include/linux/trace_recursion.h
> @@ -214,7 +214,18 @@ static __always_inline void trace_clear_recursion(int bit)

We should also update the description above the function, for example:

  /**
   * ftrace_test_recursion_trylock - tests for recursion in same context
   *
   * Use this for ftrace callbacks. This will detect if the function
   * tracing recursed in the same context (normal vs interrupt),
   *
   * Returns: -1 if a recursion happened.
-  *           >= 0 if no recursion
+  *           >= 0 if no recursion (success)
+  *
+  * Disables preemption on success. It is just a convenience.
+  * Current users need to disable preemption for various reasons.
   */


>  static __always_inline int ftrace_test_recursion_trylock(unsigned long ip,
>  							 unsigned long parent_ip)
>  {
> -	return trace_test_and_set_recursion(ip, parent_ip, TRACE_FTRACE_START, TRACE_FTRACE_MAX);
> +	int bit;
> +
> +	bit = trace_test_and_set_recursion(ip, parent_ip, TRACE_FTRACE_START, TRACE_FTRACE_MAX);
> +	/*
> +	 * The zero bit indicate we are nested
> +	 * in another trylock(), which means the
> +	 * preemption already disabled.
> +	 */
> +	if (bit > 0)
> +		preempt_disable_notrace();

Is this safe? The preemption is disabled only when
trace_test_and_set_recursion() is called by ftrace_test_recursion_trylock().

We must either always disable preemption when bit >= 0,
or we have to disable preemption already in
trace_test_and_set_recursion().
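
Such a change could look roughly like this (just a sketch of the idea, not a
tested patch; __trace_test_and_set_recursion() and __trace_clear_recursion()
are made-up names standing for the existing, unchanged logic):

static __always_inline int trace_test_and_set_recursion(unsigned long ip, unsigned long pip,
							 int start, int max)
{
	int bit;

	/* Existing recursion check, unchanged. */
	bit = __trace_test_and_set_recursion(ip, pip, start, max);

	/*
	 * Disable preemption on every path that reports "no unsafe
	 * recursion", so that no caller can forget to do it.
	 */
	if (bit >= 0)
		preempt_disable_notrace();

	return bit;
}

static __always_inline void trace_clear_recursion(int bit)
{
	if (bit < 0)
		return;

	preempt_enable_notrace();
	__trace_clear_recursion(bit);	/* existing clearing of the bit */
}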


Finally, the comment confused me a lot. The difference between nesting and
recursion is far from clear. And the code is tricky like hell :-)
I propose adding some comments; see below for a proposal.

> +
> +	return bit;
>  }
>  /**
> @@ -222,9 +233,13 @@ static __always_inline int ftrace_test_recursion_trylock(unsigned long ip,
>   * @bit: The return of a successful ftrace_test_recursion_trylock()
>   *
>   * This is used at the end of a ftrace callback.
> + *
> + * Preemption will be enabled (if it was previously enabled).
>   */
>  static __always_inline void ftrace_test_recursion_unlock(int bit)
>  {
> +	if (bit)

This is not symmetric with trylock(). It should be:

	if (bit > 0)

Anyway, trace_clear_recursion() quietly ignores bit != 0.


> +		preempt_enable_notrace();
>  	trace_clear_recursion(bit);
>  }


Below is my proposed patch that tries to better explain the recursion
check:

From 20d69f11e2683262fa0043b803999061cbac543f Mon Sep 17 00:00:00 2001
From: Petr Mladek <pmladek@suse.com>
Date: Thu, 14 Oct 2021 16:57:39 +0200
Subject: [PATCH] trace: Better describe the recursion check return values

The trace recursion check might be called recursively by different
layers of the tracing code. It is safe recursion and the check
is even optimized for this case.

The problematic recursion is when the traced function is called
by the tracing code. This is properly detected.

Try to explain this difference in a better way.

Signed-off-by: Petr Mladek <pmladek@suse.com>
---
 include/linux/trace_recursion.h | 16 +++++++++++++++-
 1 file changed, 15 insertions(+), 1 deletion(-)

diff --git a/include/linux/trace_recursion.h b/include/linux/trace_recursion.h
index a9f9c5714e65..b5efb804efdf 100644
--- a/include/linux/trace_recursion.h
+++ b/include/linux/trace_recursion.h
@@ -159,13 +159,27 @@ extern void ftrace_record_recursion(unsigned long ip, unsigned long parent_ip);
 # define do_ftrace_record_recursion(ip, pip)	do { } while (0)
 #endif
 
+/*
+ * trace_test_and_set_recursion() is called on several layers
+ * of the ftrace code when handling the same ftrace entry.
+ * These calls might be nested/recursive.
+ *
+ * It uses TRACE_LIST_*BITs to distinguish between this
+ * internal recursion and recursion caused by calling
+ * the traced function by the ftrace code.
+ *
+ * Returns: > 0 when no recursion
+ *          0 when called recursively internally (safe)
+ *	    -1 when the traced function was called recursively from
+ *             the ftrace handler (unsafe)
+ */
 static __always_inline int trace_test_and_set_recursion(unsigned long ip, unsigned long pip,
 							int start, int max)
 {
 	unsigned int val = READ_ONCE(current->trace_recursion);
 	int bit;
 
-	/* A previous recursion check was made */
+	/* Called recursively internally by different ftrace code layers? */
 	if ((val & TRACE_CONTEXT_MASK) > max)
 		return 0;
王贇 Oct. 15, 2021, 3:13 a.m. UTC | #5
On 2021/10/14 at 11:14 PM, Petr Mladek wrote:
[snip]
>> -	return trace_test_and_set_recursion(ip, parent_ip, TRACE_FTRACE_START, TRACE_FTRACE_MAX);
>> +	int bit;
>> +
>> +	bit = trace_test_and_set_recursion(ip, parent_ip, TRACE_FTRACE_START, TRACE_FTRACE_MAX);
>> +	/*
>> +	 * The zero bit indicate we are nested
>> +	 * in another trylock(), which means the
>> +	 * preemption already disabled.
>> +	 */
>> +	if (bit > 0)
>> +		preempt_disable_notrace();
> 
> Is this safe? The preemption is disabled only when
> trace_test_and_set_recursion() was called by ftrace_test_recursion_trylock().
> 
> We must either always disable the preemtion when bit >= 0.
> Or we have to disable the preemtion already in
> trace_test_and_set_recursion().

Internal callers of trace_test_and_set_recursion() disable preemption
on success, so it should be safe.

We could also consider moving the logic into trace_test_and_set_recursion()
and trace_clear_recursion(), but I'm not very sure about that... ftrace
internally already makes sure preemption is disabled; the uncovered ones are
only those users who call the trylock/unlock API, aren't they?

> 
> 
> Finally, the comment confused me a lot. The difference between nesting and
> recursion is far from clear. And the code is tricky liky like hell :-)
> I propose to add some comments, see below for a proposal.
The comments are indeed confusing; I'll change them to something like:

The zero bit indicates that trace recursion happened; whether the
recursive call was made by an ftrace handler or by ftrace itself,
preemption is already disabled.

Does this one look better to you?

> 
>> +
>> +	return bit;
>>  }
>>  /**
>> @@ -222,9 +233,13 @@ static __always_inline int ftrace_test_recursion_trylock(unsigned long ip,
>>   * @bit: The return of a successful ftrace_test_recursion_trylock()
>>   *
>>   * This is used at the end of a ftrace callback.
>> + *
>> + * Preemption will be enabled (if it was previously enabled).
>>   */
>>  static __always_inline void ftrace_test_recursion_unlock(int bit)
>>  {
>> +	if (bit)
> 
> This is not symetric with trylock(). It should be:
> 
> 	if (bit > 0)
> 
> Anyway, trace_clear_recursion() quiently ignores bit != 0

Yes, bit == 0 should not happen here.

> 
> 
>> +		preempt_enable_notrace();
>>  	trace_clear_recursion(bit);
>>  }
> 
> 
> Below is my proposed patch that tries to better explain the recursion
> check:
> 
> From 20d69f11e2683262fa0043b803999061cbac543f Mon Sep 17 00:00:00 2001
> From: Petr Mladek <pmladek@suse.com>
> Date: Thu, 14 Oct 2021 16:57:39 +0200
> Subject: [PATCH] trace: Better describe the recursion check return values
> 
> The trace recursion check might be called recursively by different
> layers of the tracing code. It is safe recursion and the check
> is even optimized for this case.
> 
> The problematic recursion is when the traced function is called
> by the tracing code. This is properly detected.
> 
> Try to explain this difference a better way.
> 
> Signed-off-by: Petr Mladek <pmladek@suse.com>
> ---
>  include/linux/trace_recursion.h | 16 +++++++++++++++-
>  1 file changed, 15 insertions(+), 1 deletion(-)
> 
> diff --git a/include/linux/trace_recursion.h b/include/linux/trace_recursion.h
> index a9f9c5714e65..b5efb804efdf 100644
> --- a/include/linux/trace_recursion.h
> +++ b/include/linux/trace_recursion.h
> @@ -159,13 +159,27 @@ extern void ftrace_record_recursion(unsigned long ip, unsigned long parent_ip);
>  # define do_ftrace_record_recursion(ip, pip)	do { } while (0)
>  #endif
>  
> +/*
> + * trace_test_and_set_recursion() is called on several layers
> + * of the ftrace code when handling the same ftrace entry.
> + * These calls might be nested/recursive.
> + *
> + * It uses TRACE_LIST_*BITs to distinguish between this
> + * internal recursion and recursion caused by calling
> + * the traced function by the ftrace code.
> + *
> + * Returns: > 0 when no recursion
> + *          0 when called recursively internally (safe)

The 0 can also happen when an ftrace handler recursively calls trylock()
in the same context, can't it?

Regards,
Michael Wang

> + *	    -1 when the traced function was called recursively from
> + *             the ftrace handler (unsafe)
> + */
>  static __always_inline int trace_test_and_set_recursion(unsigned long ip, unsigned long pip,
>  							int start, int max)
>  {
>  	unsigned int val = READ_ONCE(current->trace_recursion);
>  	int bit;
>  
> -	/* A previous recursion check was made */
> +	/* Called recursively internally by different ftrace code layers? */
>  	if ((val & TRACE_CONTEXT_MASK) > max)
>  		return 0;

>  
>
Steven Rostedt Oct. 15, 2021, 3:39 a.m. UTC | #6
On Thu, 14 Oct 2021 17:14:07 +0200
Petr Mladek <pmladek@suse.com> wrote:

>   /**
>    * ftrace_test_recursion_trylock - tests for recursion in same context
>    *
>    * Use this for ftrace callbacks. This will detect if the function
>    * tracing recursed in the same context (normal vs interrupt),
>    *
>    * Returns: -1 if a recursion happened.
> -  *           >= 0 if no recursion
> +  *           >= 0 if no recursion (success)
> +  *
> +  * Disables the preemption on success. It is just for a convenience.
> +  * Current users needed to disable the preemtion for some reasons.
>    */

I started replying to this to explain the difference between a non-zero
bit and a zero bit, and I think I found a design flaw that can allow
unwanted recursion.

It's late and I'm about to go to bed, but I may have a new patch to fix
this before this gets added, as the fix will conflict with this patch,
and the fix will likely need to go to stable.

Stay tuned.

-- Steve
王贇 Oct. 15, 2021, 4:45 a.m. UTC | #7
On 2021/10/15 at 11:13 AM, 王贇 wrote:
[snip]
>>  # define do_ftrace_record_recursion(ip, pip)	do { } while (0)
>>  #endif
>>  
>> +/*
>> + * trace_test_and_set_recursion() is called on several layers
>> + * of the ftrace code when handling the same ftrace entry.
>> + * These calls might be nested/recursive.
>> + *
>> + * It uses TRACE_LIST_*BITs to distinguish between this
>> + * internal recursion and recursion caused by calling
>> + * the traced function by the ftrace code.
>> + *
>> + * Returns: > 0 when no recursion
>> + *          0 when called recursively internally (safe)
> 
> The 0 can also happened when ftrace handler recursively called trylock()
> under the same context, or not?
> 

Never mind... you're right about this.

Regards,
Michael Wang

> Regards,
> Michael Wang
> 
>> + *	    -1 when the traced function was called recursively from
>> + *             the ftrace handler (unsafe)
>> + */
>>  static __always_inline int trace_test_and_set_recursion(unsigned long ip, unsigned long pip,
>>  							int start, int max)
>>  {
>>  	unsigned int val = READ_ONCE(current->trace_recursion);
>>  	int bit;
>>  
>> -	/* A previous recursion check was made */
>> +	/* Called recursively internally by different ftrace code layers? */
>>  	if ((val & TRACE_CONTEXT_MASK) > max)
>>  		return 0;
> 
>>  
>>
Petr Mladek Oct. 15, 2021, 7:28 a.m. UTC | #8
On Fri 2021-10-15 11:13:08, 王贇 wrote:
> 
> 
> On 2021/10/14 at 11:14 PM, Petr Mladek wrote:
> [snip]
> >> -	return trace_test_and_set_recursion(ip, parent_ip, TRACE_FTRACE_START, TRACE_FTRACE_MAX);
> >> +	int bit;
> >> +
> >> +	bit = trace_test_and_set_recursion(ip, parent_ip, TRACE_FTRACE_START, TRACE_FTRACE_MAX);
> >> +	/*
> >> +	 * The zero bit indicate we are nested
> >> +	 * in another trylock(), which means the
> >> +	 * preemption already disabled.
> >> +	 */
> >> +	if (bit > 0)
> >> +		preempt_disable_notrace();
> > 
> > Is this safe? The preemption is disabled only when
> > trace_test_and_set_recursion() was called by ftrace_test_recursion_trylock().
> > 
> > We must either always disable the preemtion when bit >= 0.
> > Or we have to disable the preemtion already in
> > trace_test_and_set_recursion().
> 
> Internal calling of trace_test_and_set_recursion() will disable preemption
> on succeed, it should be safe.

trace_test_and_set_recursion() does _not_ disable preemption!
It works only because all callers disable the preemption.

It means that the comment is wrong. It is not guaranteed that the
preemption will be disabled. It works only by chance.


> We can also consider move the logical into trace_test_and_set_recursion()
> and trace_clear_recursion(), but I'm not very sure about that... ftrace
> internally already make sure preemption disabled

How? Because it calls trace_test_and_set_recursion() via the trylock() API?


> , what uncovered is those users who call API trylock/unlock, isn't
> it?

And this is exactly the problem. trace_test_and_set_recursion() is in
a public header. Anyone could use it. And if anyone uses it in the
future without the trylock() and does not disable preemption
explicitly, then we are lost again.

And it is even more dangerous. The original code disabled
preemption at various layers. As a result, preemption was disabled
several times for sure. It means that the deeper layers were
always on the safe side.

With this patch, if the first trace_test_and_set_recursion() caller
does not disable preemption, then trylock() will not disable it either,
and the entire code proceeds with preemption enabled.
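
For example, a hypothetical future caller (not in the tree) could then look
like this and would run fully preemptible (careless_callback() is a made-up
name, only to illustrate the problem):

/*
 * Uses the low-level API directly: it never goes through
 * ftrace_test_recursion_trylock() and never disables preemption itself.
 */
static void notrace careless_callback(unsigned long ip, unsigned long parent_ip,
				      struct ftrace_ops *op, struct ftrace_regs *fregs)
{
	int bit;

	bit = trace_test_and_set_recursion(ip, parent_ip,
					   TRACE_LIST_START, TRACE_LIST_MAX);
	if (bit < 0)
		return;

	/*
	 * Preemption is still enabled here.  And a nested
	 * ftrace_test_recursion_trylock() in the same context returns 0,
	 * so with this patch it skips preempt_disable_notrace() as well.
	 */

	trace_clear_recursion(bit);
}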


> > Finally, the comment confused me a lot. The difference between nesting and
> > recursion is far from clear. And the code is tricky liky like hell :-)
> > I propose to add some comments, see below for a proposal.
> The comments do confusing, I'll make it something like:
> 
> The zero bit indicate trace recursion happened, whatever
> the recursively call was made by ftrace handler or ftrace
> itself, the preemption already disabled.

I am sorry, but it is still confusing. We need to find a better way
to clearly explain the difference between the safe and
the unsafe recursion.

My understanding is that the recursion is:

  + "unsafe" when the trace code recursively enters the same trace point.

  + "safe" when ftrace_test_recursion_trylock() is called recursivelly
    while still processing the same trace entry.

> >> +
> >> +	return bit;
> >>  }
> >>  /**
> >> @@ -222,9 +233,13 @@ static __always_inline int ftrace_test_recursion_trylock(unsigned long ip,
> >>   * @bit: The return of a successful ftrace_test_recursion_trylock()
> >>   *
> >>   * This is used at the end of a ftrace callback.
> >> + *
> >> + * Preemption will be enabled (if it was previously enabled).
> >>   */
> >>  static __always_inline void ftrace_test_recursion_unlock(int bit)
> >>  {
> >> +	if (bit)
> > 
> > This is not symetric with trylock(). It should be:
> > 
> > 	if (bit > 0)
> > 
> > Anyway, trace_clear_recursion() quiently ignores bit != 0
> 
> Yes, bit == 0 should not happen in here.

Yes, it "should" not happen. My point is that we could make the API
safer. We could do the right thing when
ftrace_test_recursion_unlock() is called with a negative @bit.
Ideally, we should also warn about the misuse.


Anyway, let's wait for Steven. It seems that he found another problem
with the API that should be solved first. The fix will probably
also help to better understand the "safe" vs "unsafe" recursion.

Best Regards,
Petr
王贇 Oct. 15, 2021, 9:12 a.m. UTC | #9
On 2021/10/15 at 3:28 PM, Petr Mladek wrote:
> On Fri 2021-10-15 11:13:08, 王贇 wrote:
>>
>>
>> On 2021/10/14 at 11:14 PM, Petr Mladek wrote:
>> [snip]
>>>> -	return trace_test_and_set_recursion(ip, parent_ip, TRACE_FTRACE_START, TRACE_FTRACE_MAX);
>>>> +	int bit;
>>>> +
>>>> +	bit = trace_test_and_set_recursion(ip, parent_ip, TRACE_FTRACE_START, TRACE_FTRACE_MAX);
>>>> +	/*
>>>> +	 * The zero bit indicate we are nested
>>>> +	 * in another trylock(), which means the
>>>> +	 * preemption already disabled.
>>>> +	 */
>>>> +	if (bit > 0)
>>>> +		preempt_disable_notrace();
>>>
>>> Is this safe? The preemption is disabled only when
>>> trace_test_and_set_recursion() was called by ftrace_test_recursion_trylock().
>>>
>>> We must either always disable the preemtion when bit >= 0.
>>> Or we have to disable the preemtion already in
>>> trace_test_and_set_recursion().
>>
>> Internal calling of trace_test_and_set_recursion() will disable preemption
>> on succeed, it should be safe.
> 
> trace_test_and_set_recursion() does _not_ disable preemtion!
> It works only because all callers disable the preemption.

Yup.

> 
> It means that the comment is wrong. It is not guarantted that the
> preemption will be disabled. It works only by chance.
> 
> 
>> We can also consider move the logical into trace_test_and_set_recursion()
>> and trace_clear_recursion(), but I'm not very sure about that... ftrace
>> internally already make sure preemption disabled
> 
> How? Because it calls trace_test_and_set_recursion() via the trylock() API?

I mean that currently all the direct callers of trace_test_and_set_recursion()
have preemption disabled, as you mentioned above; but yes, if anyone later
writes kernel code that calls trace_test_and_set_recursion() without
disabling preemption afterwards, then the promise is broken.

> 
> 
>> , what uncovered is those users who call API trylock/unlock, isn't
>> it?
> 
> And this is exactly the problem. trace_test_and_set_recursion() is in
> a public header. Anyone could use it. And if anyone uses it in the
> future without the trylock() and does not disable the preemtion
> explicitely then we are lost again.
> 
> And it is even more dangerous. The original code disabled the
> preemtion on various layers. As a result, the preemtion was disabled
> several times for sure. It means that the deeper layers were
> always on the safe side.
> 
> With this patch, if the first trace_test_and_set_recursion() caller
> does not disable preemtion then trylock() will not disable it either
> and the entire code is procceed with preemtion enabled.

Yes, what confused me at first is that I thought people who call
trace_test_and_set_recursion() without trylock() could only be
ftrace hackers, not ordinary users; but if anyone can use it without
respecting the preemption requirements, then I think the logic should be
inside trace_test_and_set_recursion() rather than in trylock().

> 
> 
>>> Finally, the comment confused me a lot. The difference between nesting and
>>> recursion is far from clear. And the code is tricky liky like hell :-)
>>> I propose to add some comments, see below for a proposal.
>> The comments do confusing, I'll make it something like:
>>
>> The zero bit indicate trace recursion happened, whatever
>> the recursively call was made by ftrace handler or ftrace
>> itself, the preemption already disabled.
> 
> I am sorry but it is still confusing. We need to find a better way
> how to clearly explain the difference between the safe and
> unsafe recursion.
> 
> My understanding is that the recursion is:
> 
>   + "unsafe" when the trace code recursively enters the same trace point.
> 
>   + "safe" when ftrace_test_recursion_trylock() is called recursivelly
>     while still processing the same trace entry.

Maybe some examples would make this easier to understand...

Currently there are two ways of using ftrace_test_recursion_trylock():
one with TRACE_FTRACE_XXX, which we mark as A, and one with TRACE_LIST_XXX,
which we mark as B. Then:

A followed by B on same context got bit > 0
B followed by A on any context got bit 0
A followed by A on same context got bit > 0
A followed by A followed by A on same context got bit -1
B followed by B on same context got bit > 0
B followed by B followed by B on same context got bit -1

If we get rid of the TRACE_TRANSITION_BIT, which allows recursion
one time, then it would be:

A followed by B on same context got bit > 0
B followed by A on any context got bit 0
A followed by A on same context got bit -1
B followed by B on same context got bit -1

So as long as there is no consecutive AAA, it's safe?

> 
>>>> +
>>>> +	return bit;
>>>>  }
>>>>  /**
>>>> @@ -222,9 +233,13 @@ static __always_inline int ftrace_test_recursion_trylock(unsigned long ip,
>>>>   * @bit: The return of a successful ftrace_test_recursion_trylock()
>>>>   *
>>>>   * This is used at the end of a ftrace callback.
>>>> + *
>>>> + * Preemption will be enabled (if it was previously enabled).
>>>>   */
>>>>  static __always_inline void ftrace_test_recursion_unlock(int bit)
>>>>  {
>>>> +	if (bit)
>>>
>>> This is not symetric with trylock(). It should be:
>>>
>>> 	if (bit > 0)
>>>
>>> Anyway, trace_clear_recursion() quiently ignores bit != 0
>>
>> Yes, bit == 0 should not happen in here.
> 
> Yes, it "should" not happen. My point is that we could make the API
> more safe. We could do the right thing when
> ftrace_test_recursion_unlock() is called with negative @bit.
> Ideally, we should also warn about the mis-use.

Agree with a WARN here on bit 0.

> 
> 
> Anyway, let's wait for Steven. It seems that he found another problem
> with the API that should be solved first. The fix will probably
> also help to better understand the "safe" vs "unsafe" recursion.

Cool~

Regards,
Michael Wang

> 
> Best Regards,
> Petr
>
Steven Rostedt Oct. 15, 2021, 1:35 p.m. UTC | #10
On Fri, 15 Oct 2021 17:12:26 +0800
王贇 <yun.wang@linux.alibaba.com> wrote:

> Maybe take some example would be easier to understand...
> 
> Currently there are two way of using ftrace_test_recursion_trylock(),
> one with TRACE_FTRACE_XXX we mark as A, one with TRACE_LIST_XXX we mark
> as B, then:
> 
> A followed by B on same context got bit > 0
> B followed by A on any context got bit 0
> A followed by A on same context got bit > 0
> A followed by A followed by A on same context got bit -1
> B followed by B on same context got bit > 0
> B followed by B followed by B on same context got bit -1
> 
> If we get rid of the TRACE_TRANSITION_BIT which allowed recursion for
> onetime, then it would be:

We can't get rid of the transition bit. It's there to fix a bug. Or at
least not until we finish the "noinstr" work, which may be the case now? I
have to go revisit it.

The reason for the transition bit is that we were dropping function
traces that people relied on being there. The problem is that the
recursion protection allows for nested contexts. That is, it will not detect
recursion if an interrupt triggers during a trace (while the recursion
lock is held) and that interrupt then does a trace. It is allowed to call
ftrace_test_recursion_trylock() again.

But what happens if the trace occurs after the interrupt triggers, but
before preempt_count states that we are now in interrupt context? As
preempt_count is used by this code to determine what context we are in, if
a trace happens in this "transition" state without the transition bit, a
recursion is detected (false positive) and the event is dropped. People are
then confused about how an interrupt happened, but the entry of the interrupt
never triggered (according to the trace).

The transition bit allows one level of recursion to handle this case.
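
In code terms, the transition handling inside trace_test_and_set_recursion()
looks roughly like this (paraphrased sketch, not a verbatim copy of
include/linux/trace_recursion.h):

	bit = trace_get_context_bit() + start;
	if (unlikely(val & (1 << bit))) {
		/*
		 * It could be that preempt_count has not been updated yet
		 * after an interrupt triggered, so allow a single level of
		 * recursion instead of reporting a false positive.
		 */
		bit = TRACE_TRANSITION_BIT;
		if (val & (1 << TRACE_TRANSITION_BIT)) {
			/* Recursed a second time: this is the real thing. */
			do_ftrace_record_recursion(ip, pip);
			return -1;
		}
	}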

The "noinstr" work may also remove all locations that can be traced before
the context is updated. I need to see if that is now the case, and if it
is, we can remove the transition bit.

This is not the design issue I mentioned earlier. I'm still working on that
one.

-- Steve


> 
> A followed by B on same context got bit > 0
> B followed by A on any context got bit 0
> A followed by A on same context got bit -1
> B followed by B on same context got bit -1
> 
> So as long as no continuously AAA it's safe?

Patch

diff --git a/arch/csky/kernel/probes/ftrace.c b/arch/csky/kernel/probes/ftrace.c
index b388228..834cffc 100644
--- a/arch/csky/kernel/probes/ftrace.c
+++ b/arch/csky/kernel/probes/ftrace.c
@@ -17,7 +17,6 @@  void kprobe_ftrace_handler(unsigned long ip, unsigned long parent_ip,
 		return;

 	regs = ftrace_get_regs(fregs);
-	preempt_disable_notrace();
 	p = get_kprobe((kprobe_opcode_t *)ip);
 	if (!p) {
 		p = get_kprobe((kprobe_opcode_t *)(ip - MCOUNT_INSN_SIZE));
@@ -57,7 +56,6 @@  void kprobe_ftrace_handler(unsigned long ip, unsigned long parent_ip,
 		__this_cpu_write(current_kprobe, NULL);
 	}
 out:
-	preempt_enable_notrace();
 	ftrace_test_recursion_unlock(bit);
 }
 NOKPROBE_SYMBOL(kprobe_ftrace_handler);
diff --git a/arch/parisc/kernel/ftrace.c b/arch/parisc/kernel/ftrace.c
index 0a1e75a..3543496 100644
--- a/arch/parisc/kernel/ftrace.c
+++ b/arch/parisc/kernel/ftrace.c
@@ -216,7 +216,6 @@  void kprobe_ftrace_handler(unsigned long ip, unsigned long parent_ip,
 		return;

 	regs = ftrace_get_regs(fregs);
-	preempt_disable_notrace();
 	p = get_kprobe((kprobe_opcode_t *)ip);
 	if (unlikely(!p) || kprobe_disabled(p))
 		goto out;
@@ -245,7 +244,6 @@  void kprobe_ftrace_handler(unsigned long ip, unsigned long parent_ip,
 	}
 	__this_cpu_write(current_kprobe, NULL);
 out:
-	preempt_enable_notrace();
 	ftrace_test_recursion_unlock(bit);
 }
 NOKPROBE_SYMBOL(kprobe_ftrace_handler);
diff --git a/arch/powerpc/kernel/kprobes-ftrace.c b/arch/powerpc/kernel/kprobes-ftrace.c
index 7154d58..072ebe7 100644
--- a/arch/powerpc/kernel/kprobes-ftrace.c
+++ b/arch/powerpc/kernel/kprobes-ftrace.c
@@ -26,7 +26,6 @@  void kprobe_ftrace_handler(unsigned long nip, unsigned long parent_nip,
 		return;

 	regs = ftrace_get_regs(fregs);
-	preempt_disable_notrace();
 	p = get_kprobe((kprobe_opcode_t *)nip);
 	if (unlikely(!p) || kprobe_disabled(p))
 		goto out;
@@ -61,7 +60,6 @@  void kprobe_ftrace_handler(unsigned long nip, unsigned long parent_nip,
 		__this_cpu_write(current_kprobe, NULL);
 	}
 out:
-	preempt_enable_notrace();
 	ftrace_test_recursion_unlock(bit);
 }
 NOKPROBE_SYMBOL(kprobe_ftrace_handler);
diff --git a/arch/riscv/kernel/probes/ftrace.c b/arch/riscv/kernel/probes/ftrace.c
index aab85a8..7142ec4 100644
--- a/arch/riscv/kernel/probes/ftrace.c
+++ b/arch/riscv/kernel/probes/ftrace.c
@@ -15,7 +15,6 @@  void kprobe_ftrace_handler(unsigned long ip, unsigned long parent_ip,
 	if (bit < 0)
 		return;

-	preempt_disable_notrace();
 	p = get_kprobe((kprobe_opcode_t *)ip);
 	if (unlikely(!p) || kprobe_disabled(p))
 		goto out;
@@ -52,7 +51,6 @@  void kprobe_ftrace_handler(unsigned long ip, unsigned long parent_ip,
 		__this_cpu_write(current_kprobe, NULL);
 	}
 out:
-	preempt_enable_notrace();
 	ftrace_test_recursion_unlock(bit);
 }
 NOKPROBE_SYMBOL(kprobe_ftrace_handler);
diff --git a/arch/x86/kernel/kprobes/ftrace.c b/arch/x86/kernel/kprobes/ftrace.c
index 596de2f..dd2ec14 100644
--- a/arch/x86/kernel/kprobes/ftrace.c
+++ b/arch/x86/kernel/kprobes/ftrace.c
@@ -25,7 +25,6 @@  void kprobe_ftrace_handler(unsigned long ip, unsigned long parent_ip,
 	if (bit < 0)
 		return;

-	preempt_disable_notrace();
 	p = get_kprobe((kprobe_opcode_t *)ip);
 	if (unlikely(!p) || kprobe_disabled(p))
 		goto out;
@@ -59,7 +58,6 @@  void kprobe_ftrace_handler(unsigned long ip, unsigned long parent_ip,
 		__this_cpu_write(current_kprobe, NULL);
 	}
 out:
-	preempt_enable_notrace();
 	ftrace_test_recursion_unlock(bit);
 }
 NOKPROBE_SYMBOL(kprobe_ftrace_handler);
diff --git a/include/linux/trace_recursion.h b/include/linux/trace_recursion.h
index a9f9c57..58e474c 100644
--- a/include/linux/trace_recursion.h
+++ b/include/linux/trace_recursion.h
@@ -214,7 +214,18 @@  static __always_inline void trace_clear_recursion(int bit)
 static __always_inline int ftrace_test_recursion_trylock(unsigned long ip,
 							 unsigned long parent_ip)
 {
-	return trace_test_and_set_recursion(ip, parent_ip, TRACE_FTRACE_START, TRACE_FTRACE_MAX);
+	int bit;
+
+	bit = trace_test_and_set_recursion(ip, parent_ip, TRACE_FTRACE_START, TRACE_FTRACE_MAX);
+	/*
+	 * The zero bit indicate we are nested
+	 * in another trylock(), which means the
+	 * preemption already disabled.
+	 */
+	if (bit > 0)
+		preempt_disable_notrace();
+
+	return bit;
 }

 /**
@@ -222,9 +233,13 @@  static __always_inline int ftrace_test_recursion_trylock(unsigned long ip,
  * @bit: The return of a successful ftrace_test_recursion_trylock()
  *
  * This is used at the end of a ftrace callback.
+ *
+ * Preemption will be enabled (if it was previously enabled).
  */
 static __always_inline void ftrace_test_recursion_unlock(int bit)
 {
+	if (bit)
+		preempt_enable_notrace();
 	trace_clear_recursion(bit);
 }

diff --git a/kernel/livepatch/patch.c b/kernel/livepatch/patch.c
index e8029ae..b8d75fb 100644
--- a/kernel/livepatch/patch.c
+++ b/kernel/livepatch/patch.c
@@ -49,14 +49,16 @@  static void notrace klp_ftrace_handler(unsigned long ip,

 	ops = container_of(fops, struct klp_ops, fops);

+	/*
+	 *
+	 * The ftrace_test_recursion_trylock() will disable preemption,
+	 * which is required for the variant of synchronize_rcu() that is
+	 * used to allow patching functions where RCU is not watching.
+	 * See klp_synchronize_transition() for more details.
+	 */
 	bit = ftrace_test_recursion_trylock(ip, parent_ip);
 	if (WARN_ON_ONCE(bit < 0))
 		return;
-	/*
-	 * A variant of synchronize_rcu() is used to allow patching functions
-	 * where RCU is not watching, see klp_synchronize_transition().
-	 */
-	preempt_disable_notrace();

 	func = list_first_or_null_rcu(&ops->func_stack, struct klp_func,
 				      stack_node);
@@ -120,7 +122,6 @@  static void notrace klp_ftrace_handler(unsigned long ip,
 	klp_arch_set_pc(fregs, (unsigned long)func->new_func);

 unlock:
-	preempt_enable_notrace();
 	ftrace_test_recursion_unlock(bit);
 }

diff --git a/kernel/trace/trace_functions.c b/kernel/trace/trace_functions.c
index 1f0e63f..9f1bfbe 100644
--- a/kernel/trace/trace_functions.c
+++ b/kernel/trace/trace_functions.c
@@ -186,7 +186,6 @@  static void function_trace_start(struct trace_array *tr)
 		return;

 	trace_ctx = tracing_gen_ctx();
-	preempt_disable_notrace();

 	cpu = smp_processor_id();
 	data = per_cpu_ptr(tr->array_buffer.data, cpu);
@@ -194,7 +193,6 @@  static void function_trace_start(struct trace_array *tr)
 		trace_function(tr, ip, parent_ip, trace_ctx);

 	ftrace_test_recursion_unlock(bit);
-	preempt_enable_notrace();
 }

 #ifdef CONFIG_UNWINDER_ORC
@@ -298,8 +296,6 @@  static inline void process_repeats(struct trace_array *tr,
 	if (bit < 0)
 		return;

-	preempt_disable_notrace();
-
 	cpu = smp_processor_id();
 	data = per_cpu_ptr(tr->array_buffer.data, cpu);
 	if (atomic_read(&data->disabled))
@@ -324,7 +320,6 @@  static inline void process_repeats(struct trace_array *tr,

 out:
 	ftrace_test_recursion_unlock(bit);
-	preempt_enable_notrace();
 }

 static void