diff mbox

[v9,4/7] tracepoint: Make rcuidle tracepoint callers use SRCU

Message ID 20180628182149.226164-5-joel@joelfernandes.org (mailing list archive)
State New
Headers show

Commit Message

Joel Fernandes June 28, 2018, 6:21 p.m. UTC
From: "Joel Fernandes (Google)" <joel@joelfernandes.org>

In recent tests with IRQ on/off tracepoints, a large performance
overhead ~10% is noticed when running hackbench. This is root caused to
calls to rcu_irq_enter_irqson and rcu_irq_exit_irqson from the
tracepoint code. Following a long discussion on the list [1] about this,
we concluded that srcu is a better alternative for use during rcu idle.
Although it does involve extra barriers, its lighter than the sched-rcu
version which has to do additional RCU calls to notify RCU idle about
entry into RCU sections.

In this patch, we change the underlying implementation of the
trace_*_rcuidle API to use SRCU. This has shown to improve performance
alot for the high frequency irq enable/disable tracepoints.

Test: Tested idle and preempt/irq tracepoints.

Here are some performance numbers:

With a run of the following 30 times on a single core x86 Qemu instance
with 1GB memory:
hackbench -g 4 -f 2 -l 3000

Completion times in seconds. CONFIG_PROVE_LOCKING=y.

No patches (without this series)
Mean: 3.048
Median: 3.025
Std Dev: 0.064

With Lockdep using irq tracepoints with RCU implementation:
Mean: 3.451   (-11.66 %)
Median: 3.447 (-12.22%)
Std Dev: 0.049

With Lockdep using irq tracepoints with SRCU implementation (this series):
Mean: 3.020   (I would consider the improvement against the "without
	       this series" case as just noise).
Median: 3.013
Std Dev: 0.033

[1] https://patchwork.kernel.org/patch/10344297/

Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
---
 include/linux/tracepoint.h | 49 +++++++++++++++++++++++++++++++-------
 kernel/tracepoint.c        | 16 ++++++++++++-
 2 files changed, 56 insertions(+), 9 deletions(-)

Comments

Peter Zijlstra July 11, 2018, 12:49 p.m. UTC | #1
On Thu, Jun 28, 2018 at 11:21:46AM -0700, Joel Fernandes wrote:
> -		it_func_ptr = rcu_dereference_sched((tp)->funcs);	\

I would convert to rcu_dereference_raw() to appease sparse. The fancy
stuff below is pointless if you then turn off all checking.

> +									\
> +		/*							\
> +		 * For rcuidle callers, use srcu since sched-rcu	\
> +		 * doesn't work from the idle path.			\
> +		 */							\
> +		if (rcuidle) {						\
> +			if (in_nmi()) {					\
> +				WARN_ON_ONCE(1);			\
> +				return; /* no srcu from nmi */		\
> +			}						\
> +									\
> +			idx = srcu_read_lock_notrace(&tracepoint_srcu);	\
> +			it_func_ptr =					\
> +				srcu_dereference_notrace((tp)->funcs,	\
> +						&tracepoint_srcu);	\
> +			/* To keep it consistent with !rcuidle path */	\
> +			preempt_disable_notrace();			\
> +		} else {						\
> +			rcu_read_lock_sched_notrace();			\
> +			it_func_ptr =					\
> +				rcu_dereference_sched((tp)->funcs);	\
> +		}							\
--
To unsubscribe from this list: send the line "unsubscribe linux-kselftest" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Peter Zijlstra July 11, 2018, 12:53 p.m. UTC | #2
On Thu, Jun 28, 2018 at 11:21:46AM -0700, Joel Fernandes wrote:
> -		rcu_read_lock_sched_notrace();				\
> -		it_func_ptr = rcu_dereference_sched((tp)->funcs);	\
> +									\
> +		/*							\
> +		 * For rcuidle callers, use srcu since sched-rcu	\
> +		 * doesn't work from the idle path.			\
> +		 */							\
> +		if (rcuidle) {						\
> +			if (in_nmi()) {					\
> +				WARN_ON_ONCE(1);			\
> +				return; /* no srcu from nmi */		\
> +			}						\
> +									\
> +			idx = srcu_read_lock_notrace(&tracepoint_srcu);	\
> +			it_func_ptr =					\
> +				srcu_dereference_notrace((tp)->funcs,	\
> +						&tracepoint_srcu);	\
> +			/* To keep it consistent with !rcuidle path */	\
> +			preempt_disable_notrace();			\
> +		} else {						\
> +			rcu_read_lock_sched_notrace();			\
> +			it_func_ptr =					\
> +				rcu_dereference_sched((tp)->funcs);	\
> +		}							\
> +									\
>  		if (it_func_ptr) {					\
>  			do {						\
>  				it_func = (it_func_ptr)->func;		\
> @@ -148,9 +177,13 @@ extern void syscall_unregfunc(void);
>  				((void(*)(proto))(it_func))(args);	\
>  			} while ((++it_func_ptr)->func);		\
>  		}							\
> -		rcu_read_unlock_sched_notrace();			\
> -		if (rcucheck)						\
> -			rcu_irq_exit_irqson();				\
> +									\
> +		if (rcuidle) {						\
> +			preempt_enable_notrace();			\
> +			srcu_read_unlock_notrace(&tracepoint_srcu, idx);\
> +		} else {						\
> +			rcu_read_unlock_sched_notrace();		\
> +		}							\
>  	} while (0)

In fact, I would write the thing like:

		preempt_disable_notrace();
		if (rcuidle)
			idx = srcu_read_lock_notrace(&tracepoint_srcu);

		it_func_ptr = rcu_dereference_raw((tp)->funcs);

		/* ... */

		if (rcu_idle)
			srcu_read_unlock_notrace(&tracepoint_srcu, idx);
		preempt_enable_notrace();

Much simpler and very much the same.
--
To unsubscribe from this list: send the line "unsubscribe linux-kselftest" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Peter Zijlstra July 11, 2018, 12:56 p.m. UTC | #3
On Thu, Jun 28, 2018 at 11:21:46AM -0700, Joel Fernandes wrote:
>  static inline void tracepoint_synchronize_unregister(void)
>  {
> +	synchronize_srcu(&tracepoint_srcu);
>  	synchronize_sched();
>  }

Given you below do call_rcu_sched() and then call_srcu(), isn't the
above the wrong way around?

Also, does the above want to be barrier instead of synchronize, so as to
guarantee completion of the callbacks.

> +static void srcu_free_old_probes(struct rcu_head *head)
>  {
>  	kfree(container_of(head, struct tp_probes, rcu));
>  }
>  
> +static void rcu_free_old_probes(struct rcu_head *head)
> +{
> +	call_srcu(&tracepoint_srcu, head, srcu_free_old_probes);
> +}
> +
>  static inline void release_probes(struct tracepoint_func *old)
>  {
>  	if (old) {
>  		struct tp_probes *tp_probes = container_of(old,
>  			struct tp_probes, probes[0]);
> +		/*
> +		 * Tracepoint probes are protected by both sched RCU and SRCU,
> +		 * by calling the SRCU callback in the sched RCU callback we
> +		 * cover both cases. So let us chain the SRCU and sched RCU
> +		 * callbacks to wait for both grace periods.
> +		 */
>  		call_rcu_sched(&tp_probes->rcu, rcu_free_old_probes);
>  	}
>  }
--
To unsubscribe from this list: send the line "unsubscribe linux-kselftest" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Steven Rostedt July 11, 2018, 1 p.m. UTC | #4
On Wed, 11 Jul 2018 14:49:54 +0200
Peter Zijlstra <peterz@infradead.org> wrote:

> On Thu, Jun 28, 2018 at 11:21:46AM -0700, Joel Fernandes wrote:
> > -		it_func_ptr = rcu_dereference_sched((tp)->funcs);	\  
> 
> I would convert to rcu_dereference_raw() to appease sparse. The fancy
> stuff below is pointless if you then turn off all checking.

The problem with doing this is if we use a trace event without the
proper _idle() or whatever, we wont get a warning that it is used
incorrectly with lockdep. Or does lockdep still check if "rcu is
watching" with rcu_dereference_raw()?

-- Steve

> 
> > +									\
> > +		/*							\
> > +		 * For rcuidle callers, use srcu since sched-rcu	\
> > +		 * doesn't work from the idle path.			\
> > +		 */							\
> > +		if (rcuidle) {						\
> > +			if (in_nmi()) {					\
> > +				WARN_ON_ONCE(1);			\
> > +				return; /* no srcu from nmi */		\
> > +			}						\
> > +									\
> > +			idx = srcu_read_lock_notrace(&tracepoint_srcu);	\
> > +			it_func_ptr =					\
> > +				srcu_dereference_notrace((tp)->funcs,	\
> > +						&tracepoint_srcu);	\
> > +			/* To keep it consistent with !rcuidle path */	\
> > +			preempt_disable_notrace();			\
> > +		} else {						\
> > +			rcu_read_lock_sched_notrace();			\
> > +			it_func_ptr =					\
> > +				rcu_dereference_sched((tp)->funcs);	\
> > +		}							\  

--
To unsubscribe from this list: send the line "unsubscribe linux-kselftest" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Steven Rostedt July 11, 2018, 1:06 p.m. UTC | #5
On Wed, 11 Jul 2018 14:56:47 +0200
Peter Zijlstra <peterz@infradead.org> wrote:

> On Thu, Jun 28, 2018 at 11:21:46AM -0700, Joel Fernandes wrote:
> >  static inline void tracepoint_synchronize_unregister(void)
> >  {
> > +	synchronize_srcu(&tracepoint_srcu);
> >  	synchronize_sched();
> >  }  
> 
> Given you below do call_rcu_sched() and then call_srcu(), isn't the
> above the wrong way around?

Good catch!

	release_probes()
		call_rcu_sched()
			---> rcu_free_old_probes() queued

	tracepoint_synchronize_unregister()
		synchronize_srcu(&tracepoint_srcu);
			< finishes right away >
		synchronize_sched()
			--> rcu_free_old_probes()
				--> srcu_free_old_probes() queued
	
Here tracepoint_synchronize_unregister() returned before the srcu
portion ran.


> 
> Also, does the above want to be barrier instead of synchronize, so as to
> guarantee completion of the callbacks.

Not sure what you mean here.

-- Steve

> 
> > +static void srcu_free_old_probes(struct rcu_head *head)
> >  {
> >  	kfree(container_of(head, struct tp_probes, rcu));
> >  }
> >  
> > +static void rcu_free_old_probes(struct rcu_head *head)
> > +{
> > +	call_srcu(&tracepoint_srcu, head, srcu_free_old_probes);
> > +}
> > +
> >  static inline void release_probes(struct tracepoint_func *old)
> >  {
> >  	if (old) {
> >  		struct tp_probes *tp_probes = container_of(old,
> >  			struct tp_probes, probes[0]);
> > +		/*
> > +		 * Tracepoint probes are protected by both sched RCU and SRCU,
> > +		 * by calling the SRCU callback in the sched RCU callback we
> > +		 * cover both cases. So let us chain the SRCU and sched RCU
> > +		 * callbacks to wait for both grace periods.
> > +		 */
> >  		call_rcu_sched(&tp_probes->rcu, rcu_free_old_probes);
> >  	}
> >  }  

--
To unsubscribe from this list: send the line "unsubscribe linux-kselftest" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Paul E. McKenney July 11, 2018, 2:27 p.m. UTC | #6
On Wed, Jul 11, 2018 at 09:00:03AM -0400, Steven Rostedt wrote:
> On Wed, 11 Jul 2018 14:49:54 +0200
> Peter Zijlstra <peterz@infradead.org> wrote:
> 
> > On Thu, Jun 28, 2018 at 11:21:46AM -0700, Joel Fernandes wrote:
> > > -		it_func_ptr = rcu_dereference_sched((tp)->funcs);	\  
> > 
> > I would convert to rcu_dereference_raw() to appease sparse. The fancy
> > stuff below is pointless if you then turn off all checking.
> 
> The problem with doing this is if we use a trace event without the
> proper _idle() or whatever, we wont get a warning that it is used
> incorrectly with lockdep. Or does lockdep still check if "rcu is
> watching" with rcu_dereference_raw()?

No lockdep checking is done by rcu_dereference_raw().

							Thanx, Paul

> -- Steve
> 
> > 
> > > +									\
> > > +		/*							\
> > > +		 * For rcuidle callers, use srcu since sched-rcu	\
> > > +		 * doesn't work from the idle path.			\
> > > +		 */							\
> > > +		if (rcuidle) {						\
> > > +			if (in_nmi()) {					\
> > > +				WARN_ON_ONCE(1);			\
> > > +				return; /* no srcu from nmi */		\
> > > +			}						\
> > > +									\
> > > +			idx = srcu_read_lock_notrace(&tracepoint_srcu);	\
> > > +			it_func_ptr =					\
> > > +				srcu_dereference_notrace((tp)->funcs,	\
> > > +						&tracepoint_srcu);	\
> > > +			/* To keep it consistent with !rcuidle path */	\
> > > +			preempt_disable_notrace();			\
> > > +		} else {						\
> > > +			rcu_read_lock_sched_notrace();			\
> > > +			it_func_ptr =					\
> > > +				rcu_dereference_sched((tp)->funcs);	\
> > > +		}							\  
> 

--
To unsubscribe from this list: send the line "unsubscribe linux-kselftest" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Steven Rostedt July 11, 2018, 2:46 p.m. UTC | #7
On Wed, 11 Jul 2018 07:27:44 -0700
"Paul E. McKenney" <paulmck@linux.vnet.ibm.com> wrote:

> On Wed, Jul 11, 2018 at 09:00:03AM -0400, Steven Rostedt wrote:
> > On Wed, 11 Jul 2018 14:49:54 +0200
> > Peter Zijlstra <peterz@infradead.org> wrote:
> >   
> > > On Thu, Jun 28, 2018 at 11:21:46AM -0700, Joel Fernandes wrote:  
> > > > -		it_func_ptr = rcu_dereference_sched((tp)->funcs);	\    
> > > 
> > > I would convert to rcu_dereference_raw() to appease sparse. The fancy
> > > stuff below is pointless if you then turn off all checking.  
> > 
> > The problem with doing this is if we use a trace event without the
> > proper _idle() or whatever, we wont get a warning that it is used
> > incorrectly with lockdep. Or does lockdep still check if "rcu is
> > watching" with rcu_dereference_raw()?  
> 
> No lockdep checking is done by rcu_dereference_raw().

Correct, but I think we can do this regardless. So Joel please resend
with Peter's suggestion.

The reason being is because of this:


#define __DECLARE_TRACE(name, proto, args, cond, data_proto, data_args) \
	extern struct tracepoint __tracepoint_##name;			\
	static inline void trace_##name(proto)				\
	{								\
		if (static_key_false(&__tracepoint_##name.key))		\
			__DO_TRACE(&__tracepoint_##name,		\
				TP_PROTO(data_proto),			\
				TP_ARGS(data_args),			\
				TP_CONDITION(cond), 0);			\
		if (IS_ENABLED(CONFIG_LOCKDEP) && (cond)) {		\
			rcu_read_lock_sched_notrace();			\
			rcu_dereference_sched(__tracepoint_##name.funcs);\
			rcu_read_unlock_sched_notrace();		\
		}							\
	}

Because lockdep would only trigger warnings when the tracepoint was
enabled and used in a place it shouldn't be, we added the above
IS_ENABLED(CONFIG_LOCKDEP) part to test regardless if the the
tracepoint was enabled or not. Because we do this, we don't need to
have the test in the __DO_TRACE() code itself. That means we can clean
up the code as per Peter's suggestion.

-- Steve
--
To unsubscribe from this list: send the line "unsubscribe linux-kselftest" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Paul E. McKenney July 11, 2018, 3:15 p.m. UTC | #8
On Wed, Jul 11, 2018 at 10:46:18AM -0400, Steven Rostedt wrote:
> On Wed, 11 Jul 2018 07:27:44 -0700
> "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> wrote:
> 
> > On Wed, Jul 11, 2018 at 09:00:03AM -0400, Steven Rostedt wrote:
> > > On Wed, 11 Jul 2018 14:49:54 +0200
> > > Peter Zijlstra <peterz@infradead.org> wrote:
> > >   
> > > > On Thu, Jun 28, 2018 at 11:21:46AM -0700, Joel Fernandes wrote:  
> > > > > -		it_func_ptr = rcu_dereference_sched((tp)->funcs);	\    
> > > > 
> > > > I would convert to rcu_dereference_raw() to appease sparse. The fancy
> > > > stuff below is pointless if you then turn off all checking.  
> > > 
> > > The problem with doing this is if we use a trace event without the
> > > proper _idle() or whatever, we wont get a warning that it is used
> > > incorrectly with lockdep. Or does lockdep still check if "rcu is
> > > watching" with rcu_dereference_raw()?  
> > 
> > No lockdep checking is done by rcu_dereference_raw().
> 
> Correct, but I think we can do this regardless. So Joel please resend
> with Peter's suggestion.
> 
> The reason being is because of this:
> 
> 
> #define __DECLARE_TRACE(name, proto, args, cond, data_proto, data_args) \
> 	extern struct tracepoint __tracepoint_##name;			\
> 	static inline void trace_##name(proto)				\
> 	{								\
> 		if (static_key_false(&__tracepoint_##name.key))		\
> 			__DO_TRACE(&__tracepoint_##name,		\
> 				TP_PROTO(data_proto),			\
> 				TP_ARGS(data_args),			\
> 				TP_CONDITION(cond), 0);			\
> 		if (IS_ENABLED(CONFIG_LOCKDEP) && (cond)) {		\
> 			rcu_read_lock_sched_notrace();			\
> 			rcu_dereference_sched(__tracepoint_##name.funcs);\
> 			rcu_read_unlock_sched_notrace();		\
> 		}							\
> 	}
> 
> Because lockdep would only trigger warnings when the tracepoint was
> enabled and used in a place it shouldn't be, we added the above
> IS_ENABLED(CONFIG_LOCKDEP) part to test regardless if the the
> tracepoint was enabled or not. Because we do this, we don't need to
> have the test in the __DO_TRACE() code itself. That means we can clean
> up the code as per Peter's suggestion.

Indeed, the rcu_dereference_sched() would catch it in that case, so
agreed, Peter's suggestion isn't losing any debuggability.

						Thanx, Paul

--
To unsubscribe from this list: send the line "unsubscribe linux-kselftest" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Peter Zijlstra July 11, 2018, 3:17 p.m. UTC | #9
On Wed, Jul 11, 2018 at 09:06:49AM -0400, Steven Rostedt wrote:
> On Wed, 11 Jul 2018 14:56:47 +0200
> Peter Zijlstra <peterz@infradead.org> wrote:
> 
> > On Thu, Jun 28, 2018 at 11:21:46AM -0700, Joel Fernandes wrote:
> > >  static inline void tracepoint_synchronize_unregister(void)
> > >  {
> > > +	synchronize_srcu(&tracepoint_srcu);
> > >  	synchronize_sched();
> > >  }  
> > 
> > Given you below do call_rcu_sched() and then call_srcu(), isn't the
> > above the wrong way around?
> 
> Good catch!
> 
> 	release_probes()
> 		call_rcu_sched()
> 			---> rcu_free_old_probes() queued
> 
> 	tracepoint_synchronize_unregister()
> 		synchronize_srcu(&tracepoint_srcu);
> 			< finishes right away >
> 		synchronize_sched()
> 			--> rcu_free_old_probes()
> 				--> srcu_free_old_probes() queued
> 	
> Here tracepoint_synchronize_unregister() returned before the srcu
> portion ran.

I just read the comment that goes with that function; the order doesn't
matter. All we want to ensure is that the unregistration is visible to
either sched or srcu tracepoint users.
--
To unsubscribe from this list: send the line "unsubscribe linux-kselftest" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Steven Rostedt July 11, 2018, 3:26 p.m. UTC | #10
On Wed, 11 Jul 2018 17:17:51 +0200
Peter Zijlstra <peterz@infradead.org> wrote:

> I just read the comment that goes with that function; the order doesn't
> matter. All we want to ensure is that the unregistration is visible to
> either sched or srcu tracepoint users.

Yeah, but I think it is still good to change the order. It doesn't
hurt, and in my opinion makes the code a bit more robust.

-- Steve
--
To unsubscribe from this list: send the line "unsubscribe linux-kselftest" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Mathieu Desnoyers July 11, 2018, 4:40 p.m. UTC | #11
----- On Jul 11, 2018, at 11:17 AM, Peter Zijlstra peterz@infradead.org wrote:

> On Wed, Jul 11, 2018 at 09:06:49AM -0400, Steven Rostedt wrote:
>> On Wed, 11 Jul 2018 14:56:47 +0200
>> Peter Zijlstra <peterz@infradead.org> wrote:
>> 
>> > On Thu, Jun 28, 2018 at 11:21:46AM -0700, Joel Fernandes wrote:
>> > >  static inline void tracepoint_synchronize_unregister(void)
>> > >  {
>> > > +	synchronize_srcu(&tracepoint_srcu);
>> > >  	synchronize_sched();
>> > >  }
>> > 
>> > Given you below do call_rcu_sched() and then call_srcu(), isn't the
>> > above the wrong way around?
>> 
>> Good catch!
>> 
>> 	release_probes()
>> 		call_rcu_sched()
>> 			---> rcu_free_old_probes() queued
>> 
>> 	tracepoint_synchronize_unregister()
>> 		synchronize_srcu(&tracepoint_srcu);
>> 			< finishes right away >
>> 		synchronize_sched()
>> 			--> rcu_free_old_probes()
>> 				--> srcu_free_old_probes() queued
>> 	
>> Here tracepoint_synchronize_unregister() returned before the srcu
>> portion ran.
> 
> I just read the comment that goes with that function; the order doesn't
> matter. All we want to ensure is that the unregistration is visible to
> either sched or srcu tracepoint users.

Exactly, the order does not matter here.

Thanks,

Mathieu
Mathieu Desnoyers July 11, 2018, 4:46 p.m. UTC | #12
----- On Jul 11, 2018, at 11:26 AM, rostedt rostedt@goodmis.org wrote:

> On Wed, 11 Jul 2018 17:17:51 +0200
> Peter Zijlstra <peterz@infradead.org> wrote:
> 
>> I just read the comment that goes with that function; the order doesn't
>> matter. All we want to ensure is that the unregistration is visible to
>> either sched or srcu tracepoint users.
> 
> Yeah, but I think it is still good to change the order. It doesn't
> hurt, and in my opinion makes the code a bit more robust.

I don't mind. It makes the code more regular. It does not change
anything wrt robustness here though.

Thanks,

Mathieu
Joel Fernandes July 11, 2018, 8:52 p.m. UTC | #13
On Wed, Jul 11, 2018 at 10:46:18AM -0400, Steven Rostedt wrote:
> On Wed, 11 Jul 2018 07:27:44 -0700
> "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> wrote:
> 
> > On Wed, Jul 11, 2018 at 09:00:03AM -0400, Steven Rostedt wrote:
> > > On Wed, 11 Jul 2018 14:49:54 +0200
> > > Peter Zijlstra <peterz@infradead.org> wrote:
> > >   
> > > > On Thu, Jun 28, 2018 at 11:21:46AM -0700, Joel Fernandes wrote:  
> > > > > -		it_func_ptr = rcu_dereference_sched((tp)->funcs);	\    
> > > > 
> > > > I would convert to rcu_dereference_raw() to appease sparse. The fancy
> > > > stuff below is pointless if you then turn off all checking.  
> > > 
> > > The problem with doing this is if we use a trace event without the
> > > proper _idle() or whatever, we wont get a warning that it is used
> > > incorrectly with lockdep. Or does lockdep still check if "rcu is
> > > watching" with rcu_dereference_raw()?  
> > 
> > No lockdep checking is done by rcu_dereference_raw().
> 
> Correct, but I think we can do this regardless. So Joel please resend
> with Peter's suggestion.
> 
> The reason being is because of this:
> 
> 
> #define __DECLARE_TRACE(name, proto, args, cond, data_proto, data_args) \
> 	extern struct tracepoint __tracepoint_##name;			\
> 	static inline void trace_##name(proto)				\
> 	{								\
> 		if (static_key_false(&__tracepoint_##name.key))		\
> 			__DO_TRACE(&__tracepoint_##name,		\
> 				TP_PROTO(data_proto),			\
> 				TP_ARGS(data_args),			\
> 				TP_CONDITION(cond), 0);			\
> 		if (IS_ENABLED(CONFIG_LOCKDEP) && (cond)) {		\
> 			rcu_read_lock_sched_notrace();			\
> 			rcu_dereference_sched(__tracepoint_##name.funcs);\
> 			rcu_read_unlock_sched_notrace();		\
> 		}							\
> 	}
> 
> Because lockdep would only trigger warnings when the tracepoint was
> enabled and used in a place it shouldn't be, we added the above
> IS_ENABLED(CONFIG_LOCKDEP) part to test regardless if the the
> tracepoint was enabled or not. Because we do this, we don't need to
> have the test in the __DO_TRACE() code itself. That means we can clean
> up the code as per Peter's suggestion.

Sounds good, I'm Ok with making this change.

Just to clarify, are you proposing to change the rcu_dereference_sched to
rcu_dereference_raw in both __DECLARE_TRACE and __DO_TRACE?

thanks!

- Joel

--
To unsubscribe from this list: send the line "unsubscribe linux-kselftest" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Joel Fernandes July 11, 2018, 8:56 p.m. UTC | #14
On Wed, Jul 11, 2018 at 08:15:59AM -0700, Paul E. McKenney wrote:
> On Wed, Jul 11, 2018 at 10:46:18AM -0400, Steven Rostedt wrote:
> > On Wed, 11 Jul 2018 07:27:44 -0700
> > "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> wrote:
> > 
> > > On Wed, Jul 11, 2018 at 09:00:03AM -0400, Steven Rostedt wrote:
> > > > On Wed, 11 Jul 2018 14:49:54 +0200
> > > > Peter Zijlstra <peterz@infradead.org> wrote:
> > > >   
> > > > > On Thu, Jun 28, 2018 at 11:21:46AM -0700, Joel Fernandes wrote:  
> > > > > > -		it_func_ptr = rcu_dereference_sched((tp)->funcs);	\    
> > > > > 
> > > > > I would convert to rcu_dereference_raw() to appease sparse. The fancy
> > > > > stuff below is pointless if you then turn off all checking.  
> > > > 
> > > > The problem with doing this is if we use a trace event without the
> > > > proper _idle() or whatever, we wont get a warning that it is used
> > > > incorrectly with lockdep. Or does lockdep still check if "rcu is
> > > > watching" with rcu_dereference_raw()?  
> > > 
> > > No lockdep checking is done by rcu_dereference_raw().
> > 
> > Correct, but I think we can do this regardless. So Joel please resend
> > with Peter's suggestion.
> > 
> > The reason being is because of this:
> > 
> > 
> > #define __DECLARE_TRACE(name, proto, args, cond, data_proto, data_args) \
> > 	extern struct tracepoint __tracepoint_##name;			\
> > 	static inline void trace_##name(proto)				\
> > 	{								\
> > 		if (static_key_false(&__tracepoint_##name.key))		\
> > 			__DO_TRACE(&__tracepoint_##name,		\
> > 				TP_PROTO(data_proto),			\
> > 				TP_ARGS(data_args),			\
> > 				TP_CONDITION(cond), 0);			\
> > 		if (IS_ENABLED(CONFIG_LOCKDEP) && (cond)) {		\
> > 			rcu_read_lock_sched_notrace();			\
> > 			rcu_dereference_sched(__tracepoint_##name.funcs);\
> > 			rcu_read_unlock_sched_notrace();		\
> > 		}							\
> > 	}
> > 
> > Because lockdep would only trigger warnings when the tracepoint was
> > enabled and used in a place it shouldn't be, we added the above
> > IS_ENABLED(CONFIG_LOCKDEP) part to test regardless if the the
> > tracepoint was enabled or not. Because we do this, we don't need to
> > have the test in the __DO_TRACE() code itself. That means we can clean
> > up the code as per Peter's suggestion.
> 
> Indeed, the rcu_dereference_sched() would catch it in that case, so
> agreed, Peter's suggestion isn't losing any debuggability.

Hmm, but if we are doing the check later anyway, then why not do it in
__DO_TRACE itself?

Also I guess we are discussing about changing the rcu_dereference_sched which
I think should go into a separate patch since my patch isn't touching how the
rcuidle==0 paths use the RCU API. So I think this is an existing issue
independent of this series.

thanks!

- Joel





--
To unsubscribe from this list: send the line "unsubscribe linux-kselftest" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Joel Fernandes July 12, 2018, 12:31 a.m. UTC | #15
On Wed, Jul 11, 2018 at 09:06:49AM -0400, Steven Rostedt wrote:
> On Wed, 11 Jul 2018 14:56:47 +0200
> Peter Zijlstra <peterz@infradead.org> wrote:
> 
> > On Thu, Jun 28, 2018 at 11:21:46AM -0700, Joel Fernandes wrote:
> > >  static inline void tracepoint_synchronize_unregister(void)
> > >  {
> > > +	synchronize_srcu(&tracepoint_srcu);
> > >  	synchronize_sched();
> > >  }  
> > 
> > Given you below do call_rcu_sched() and then call_srcu(), isn't the
> > above the wrong way around?
> 
> Good catch!
> 
> 	release_probes()
> 		call_rcu_sched()
> 			---> rcu_free_old_probes() queued
> 
> 	tracepoint_synchronize_unregister()
> 		synchronize_srcu(&tracepoint_srcu);
> 			< finishes right away >
> 		synchronize_sched()
> 			--> rcu_free_old_probes()
> 				--> srcu_free_old_probes() queued
> 	
> Here tracepoint_synchronize_unregister() returned before the srcu
> portion ran.

But isn't the point of synchronize_rcu to make sure that we're no longer in
an RCU read-side section, not that *all* queued callbacks already ran? So in that
case, I think it doesn't matter which order the 2 synchronize functions are
called in. Please let me know if if I missed something!

I believe what we're trying to guarantee here is that no tracepoints using
either flavor of RCU are active after tracepoint_synchronize_unregister
returns.

thanks!

- Joel

--
To unsubscribe from this list: send the line "unsubscribe linux-kselftest" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Steven Rostedt July 12, 2018, 1:22 a.m. UTC | #16
On Wed, 11 Jul 2018 13:56:39 -0700
Joel Fernandes <joel@joelfernandes.org> wrote:

> > > #define __DECLARE_TRACE(name, proto, args, cond, data_proto, data_args) \
> > > 	extern struct tracepoint __tracepoint_##name;			\
> > > 	static inline void trace_##name(proto)				\
> > > 	{								\
> > > 		if (static_key_false(&__tracepoint_##name.key))		\
> > > 			__DO_TRACE(&__tracepoint_##name,		\
> > > 				TP_PROTO(data_proto),			\
> > > 				TP_ARGS(data_args),			\
> > > 				TP_CONDITION(cond), 0);			\
> > > 		if (IS_ENABLED(CONFIG_LOCKDEP) && (cond)) {		\
> > > 			rcu_read_lock_sched_notrace();			\
> > > 			rcu_dereference_sched(__tracepoint_##name.funcs);\
> > > 			rcu_read_unlock_sched_notrace();		\
> > > 		}							\
> > > 	}
> > > 
> > > Because lockdep would only trigger warnings when the tracepoint was
> > > enabled and used in a place it shouldn't be, we added the above
> > > IS_ENABLED(CONFIG_LOCKDEP) part to test regardless if the the
> > > tracepoint was enabled or not. Because we do this, we don't need to
> > > have the test in the __DO_TRACE() code itself. That means we can clean
> > > up the code as per Peter's suggestion.  
> > 
> > Indeed, the rcu_dereference_sched() would catch it in that case, so
> > agreed, Peter's suggestion isn't losing any debuggability.  
> 
> Hmm, but if we are doing the check later anyway, then why not do it in
> __DO_TRACE itself?

Because __DO_TRACE is only called if the trace event is enabled. If we
never enable a trace event, we never know if it has a potential of
doing it wrong. The second part is to trigger the warning immediately
regardless if the trace event is enabled or not.

> 
> Also I guess we are discussing about changing the rcu_dereference_sched which
> I think should go into a separate patch since my patch isn't touching how the
> rcuidle==0 paths use the RCU API. So I think this is an existing issue
> independent of this series.

But the code you added made it much more complex to keep the checks as
is. If we remove the checks then this patch doesn't need to have all
the if statements, and we can do it the way Peter suggested.

But sure, go ahead and make a separate patch first that removes the
checks from __DO_TRACE() first if you want to.

-- Steve
--
To unsubscribe from this list: send the line "unsubscribe linux-kselftest" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Steven Rostedt July 12, 2018, 1:26 a.m. UTC | #17
On Wed, 11 Jul 2018 17:31:00 -0700
Joel Fernandes <joel@joelfernandes.org> wrote:

> On Wed, Jul 11, 2018 at 09:06:49AM -0400, Steven Rostedt wrote:
> > On Wed, 11 Jul 2018 14:56:47 +0200
> > Peter Zijlstra <peterz@infradead.org> wrote:
> >   
> > > On Thu, Jun 28, 2018 at 11:21:46AM -0700, Joel Fernandes wrote:  
> > > >  static inline void tracepoint_synchronize_unregister(void)
> > > >  {
> > > > +	synchronize_srcu(&tracepoint_srcu);
> > > >  	synchronize_sched();
> > > >  }    
> > > 
> > > Given you below do call_rcu_sched() and then call_srcu(), isn't the
> > > above the wrong way around?  
> > 
> > Good catch!
> > 
> > 	release_probes()
> > 		call_rcu_sched()  
> > 			---> rcu_free_old_probes() queued  
> > 
> > 	tracepoint_synchronize_unregister()
> > 		synchronize_srcu(&tracepoint_srcu);
> > 			< finishes right away >
> > 		synchronize_sched()  
> > 			--> rcu_free_old_probes()
> > 				--> srcu_free_old_probes() queued  
> > 	
> > Here tracepoint_synchronize_unregister() returned before the srcu
> > portion ran.  
> 
> But isn't the point of synchronize_rcu to make sure that we're no longer in
> an RCU read-side section, not that *all* queued callbacks already ran? So in that
> case, I think it doesn't matter which order the 2 synchronize functions are
> called in. Please let me know if if I missed something!
> 
> I believe what we're trying to guarantee here is that no tracepoints using
> either flavor of RCU are active after tracepoint_synchronize_unregister
> returns.

Yes you are correct. If tracepoint_synchronize_unregister() is only to
make sure that there is no more trace events using the probes, then
this should work. I was focused on looking at it with release_probes()
too. So the patch is fine as is.

-- Steve
--
To unsubscribe from this list: send the line "unsubscribe linux-kselftest" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Joel Fernandes July 12, 2018, 2:32 a.m. UTC | #18
On Wed, Jul 11, 2018 at 02:53:22PM +0200, Peter Zijlstra wrote:
> On Thu, Jun 28, 2018 at 11:21:46AM -0700, Joel Fernandes wrote:
> > -		rcu_read_lock_sched_notrace();				\
> > -		it_func_ptr = rcu_dereference_sched((tp)->funcs);	\
> > +									\
> > +		/*							\
> > +		 * For rcuidle callers, use srcu since sched-rcu	\
> > +		 * doesn't work from the idle path.			\
> > +		 */							\
> > +		if (rcuidle) {						\
> > +			if (in_nmi()) {					\
> > +				WARN_ON_ONCE(1);			\
> > +				return; /* no srcu from nmi */		\
> > +			}						\
> > +									\
> > +			idx = srcu_read_lock_notrace(&tracepoint_srcu);	\
> > +			it_func_ptr =					\
> > +				srcu_dereference_notrace((tp)->funcs,	\
> > +						&tracepoint_srcu);	\
> > +			/* To keep it consistent with !rcuidle path */	\
> > +			preempt_disable_notrace();			\
> > +		} else {						\
> > +			rcu_read_lock_sched_notrace();			\
> > +			it_func_ptr =					\
> > +				rcu_dereference_sched((tp)->funcs);	\
> > +		}							\
> > +									\
> >  		if (it_func_ptr) {					\
> >  			do {						\
> >  				it_func = (it_func_ptr)->func;		\
> > @@ -148,9 +177,13 @@ extern void syscall_unregfunc(void);
> >  				((void(*)(proto))(it_func))(args);	\
> >  			} while ((++it_func_ptr)->func);		\
> >  		}							\
> > -		rcu_read_unlock_sched_notrace();			\
> > -		if (rcucheck)						\
> > -			rcu_irq_exit_irqson();				\
> > +									\
> > +		if (rcuidle) {						\
> > +			preempt_enable_notrace();			\
> > +			srcu_read_unlock_notrace(&tracepoint_srcu, idx);\
> > +		} else {						\
> > +			rcu_read_unlock_sched_notrace();		\
> > +		}							\
> >  	} while (0)
> 
> In fact, I would write the thing like:
> 
> 		preempt_disable_notrace();
> 		if (rcuidle)
> 			idx = srcu_read_lock_notrace(&tracepoint_srcu);
> 
> 		it_func_ptr = rcu_dereference_raw((tp)->funcs);
> 
> 		/* ... */
> 
> 		if (rcu_idle)
> 			srcu_read_unlock_notrace(&tracepoint_srcu, idx);
> 		preempt_enable_notrace();
> 
> Much simpler and very much the same.

Cool, thanks! I will do it this way and resend.

- Joel

--
To unsubscribe from this list: send the line "unsubscribe linux-kselftest" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Joel Fernandes July 12, 2018, 2:35 a.m. UTC | #19
On Wed, Jul 11, 2018 at 09:22:37PM -0400, Steven Rostedt wrote:
> On Wed, 11 Jul 2018 13:56:39 -0700
> Joel Fernandes <joel@joelfernandes.org> wrote:
> 
> > > > #define __DECLARE_TRACE(name, proto, args, cond, data_proto, data_args) \
> > > > 	extern struct tracepoint __tracepoint_##name;			\
> > > > 	static inline void trace_##name(proto)				\
> > > > 	{								\
> > > > 		if (static_key_false(&__tracepoint_##name.key))		\
> > > > 			__DO_TRACE(&__tracepoint_##name,		\
> > > > 				TP_PROTO(data_proto),			\
> > > > 				TP_ARGS(data_args),			\
> > > > 				TP_CONDITION(cond), 0);			\
> > > > 		if (IS_ENABLED(CONFIG_LOCKDEP) && (cond)) {		\
> > > > 			rcu_read_lock_sched_notrace();			\
> > > > 			rcu_dereference_sched(__tracepoint_##name.funcs);\
> > > > 			rcu_read_unlock_sched_notrace();		\
> > > > 		}							\
> > > > 	}
> > > > 
> > > > Because lockdep would only trigger warnings when the tracepoint was
> > > > enabled and used in a place it shouldn't be, we added the above
> > > > IS_ENABLED(CONFIG_LOCKDEP) part to test regardless if the the
> > > > tracepoint was enabled or not. Because we do this, we don't need to
> > > > have the test in the __DO_TRACE() code itself. That means we can clean
> > > > up the code as per Peter's suggestion.  
> > > 
> > > Indeed, the rcu_dereference_sched() would catch it in that case, so
> > > agreed, Peter's suggestion isn't losing any debuggability.  
> > 
> > Hmm, but if we are doing the check later anyway, then why not do it in
> > __DO_TRACE itself?
> 
> Because __DO_TRACE is only called if the trace event is enabled. If we
> never enable a trace event, we never know if it has a potential of
> doing it wrong. The second part is to trigger the warning immediately
> regardless if the trace event is enabled or not.

I see, thanks for the clarification.

> > 
> > Also I guess we are discussing about changing the rcu_dereference_sched which
> > I think should go into a separate patch since my patch isn't touching how the
> > rcuidle==0 paths use the RCU API. So I think this is an existing issue
> > independent of this series.
> 
> But the code you added made it much more complex to keep the checks as
> is. If we remove the checks then this patch doesn't need to have all
> the if statements, and we can do it the way Peter suggested.

Yes, I agree Peter's suggestion is very clean.

> But sure, go ahead and make a separate patch first that removes the
> checks from __DO_TRACE() first if you want to.

No its ok, no problem, I can just do it in the same patch now that I see the code is much simplified with what Peter is suggesting.

thanks!

- Joel


--
To unsubscribe from this list: send the line "unsubscribe linux-kselftest" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Steven Rostedt July 12, 2018, 3:21 a.m. UTC | #20
On Wed, 11 Jul 2018 13:52:49 -0700
Joel Fernandes <joel@joelfernandes.org> wrote:

> > #define __DECLARE_TRACE(name, proto, args, cond, data_proto, data_args) \
> > 	extern struct tracepoint __tracepoint_##name;			\
> > 	static inline void trace_##name(proto)				\
> > 	{								\
> > 		if (static_key_false(&__tracepoint_##name.key))		\
> > 			__DO_TRACE(&__tracepoint_##name,		\
> > 				TP_PROTO(data_proto),			\
> > 				TP_ARGS(data_args),			\
> > 				TP_CONDITION(cond), 0);			\
> > 		if (IS_ENABLED(CONFIG_LOCKDEP) && (cond)) {		\
> > 			rcu_read_lock_sched_notrace();			\
> > 			rcu_dereference_sched(__tracepoint_##name.funcs);\
> > 			rcu_read_unlock_sched_notrace();		\
> > 		}							\
> > 	}
> > 
> > Because lockdep would only trigger warnings when the tracepoint was
> > enabled and used in a place it shouldn't be, we added the above
> > IS_ENABLED(CONFIG_LOCKDEP) part to test regardless if the the
> > tracepoint was enabled or not. Because we do this, we don't need to
> > have the test in the __DO_TRACE() code itself. That means we can clean
> > up the code as per Peter's suggestion.  
> 
> Sounds good, I'm Ok with making this change.
> 
> Just to clarify, are you proposing to change the rcu_dereference_sched to
> rcu_dereference_raw in both __DECLARE_TRACE and __DO_TRACE?

No, just in __DO_TRACE(). The rcu_dereference_sched() above in
__DECLARE_TRACE() in the if (IS_ENABLED(CONFIG_LOCKDEP) block is
required to show the warnings if trace_##name() is used wrong, and is
the reason we can use rcu_dereference_raw() in __DO_TRACE() in the
first place ;-)

This brings up another point. We should probably add to
__DECLARE_TRACE_RCU() this:

#ifndef MODULE
#define __DECLARE_TRACE_RCU(name, proto, args, cond, data_proto, data_args) \
	static inline void trace_##name##_rcuidle(proto)		\
	{								\
		if (static_key_false(&__tracepoint_##name.key))		\
			__DO_TRACE(&__tracepoint_##name,		\
				TP_PROTO(data_proto),			\
				TP_ARGS(data_args),			\
				TP_CONDITION(cond), 1);			\
+		if (IS_ENABLED(CONFIG_LOCKDEP) && (cond)) {		\
+			int idx;					\
+			idx = srcu_read_lock_notrace(&tracepoint_srcu); \
+			srcu_dereference_notrace(__tracepoint_##name.funcs, \
+						&tracepoint_srcu);	\
+			srcu_read_unlock_notrace(&tracepoint_srcu, idx); \
+		}							\
	}
#else


So that lockdep works with trace_##name##__rcuidle() when the trace
event is not enabled.

But that should be a separate patch and not part of this series. I may
write that up tomorrow.

-- Steve
--
To unsubscribe from this list: send the line "unsubscribe linux-kselftest" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Joel Fernandes July 12, 2018, 4:28 a.m. UTC | #21
On Wed, Jul 11, 2018 at 11:21:20PM -0400, Steven Rostedt wrote:
> On Wed, 11 Jul 2018 13:52:49 -0700
> Joel Fernandes <joel@joelfernandes.org> wrote:
> 
> > > #define __DECLARE_TRACE(name, proto, args, cond, data_proto, data_args) \
> > > 	extern struct tracepoint __tracepoint_##name;			\
> > > 	static inline void trace_##name(proto)				\
> > > 	{								\
> > > 		if (static_key_false(&__tracepoint_##name.key))		\
> > > 			__DO_TRACE(&__tracepoint_##name,		\
> > > 				TP_PROTO(data_proto),			\
> > > 				TP_ARGS(data_args),			\
> > > 				TP_CONDITION(cond), 0);			\
> > > 		if (IS_ENABLED(CONFIG_LOCKDEP) && (cond)) {		\
> > > 			rcu_read_lock_sched_notrace();			\
> > > 			rcu_dereference_sched(__tracepoint_##name.funcs);\
> > > 			rcu_read_unlock_sched_notrace();		\
> > > 		}							\
> > > 	}
> > > 
> > > Because lockdep would only trigger warnings when the tracepoint was
> > > enabled and used in a place it shouldn't be, we added the above
> > > IS_ENABLED(CONFIG_LOCKDEP) part to test regardless if the the
> > > tracepoint was enabled or not. Because we do this, we don't need to
> > > have the test in the __DO_TRACE() code itself. That means we can clean
> > > up the code as per Peter's suggestion.  
> > 
> > Sounds good, I'm Ok with making this change.
> > 
> > Just to clarify, are you proposing to change the rcu_dereference_sched to
> > rcu_dereference_raw in both __DECLARE_TRACE and __DO_TRACE?
> 
> No, just in __DO_TRACE(). The rcu_dereference_sched() above in
> __DECLARE_TRACE() in the if (IS_ENABLED(CONFIG_LOCKDEP) block is
> required to show the warnings if trace_##name() is used wrong, and is
> the reason we can use rcu_dereference_raw() in __DO_TRACE() in the
> first place ;-)
> 
> This brings up another point. We should probably add to
> __DECLARE_TRACE_RCU() this:
> 
> #ifndef MODULE
> #define __DECLARE_TRACE_RCU(name, proto, args, cond, data_proto, data_args) \
> 	static inline void trace_##name##_rcuidle(proto)		\
> 	{								\
> 		if (static_key_false(&__tracepoint_##name.key))		\
> 			__DO_TRACE(&__tracepoint_##name,		\
> 				TP_PROTO(data_proto),			\
> 				TP_ARGS(data_args),			\
> 				TP_CONDITION(cond), 1);			\
> +		if (IS_ENABLED(CONFIG_LOCKDEP) && (cond)) {		\
> +			int idx;					\
> +			idx = srcu_read_lock_notrace(&tracepoint_srcu); \
> +			srcu_dereference_notrace(__tracepoint_##name.funcs, \
> +						&tracepoint_srcu);	\
> +			srcu_read_unlock_notrace(&tracepoint_srcu, idx); \
> +		}							\
> 	}
> #else
> 
> 
> So that lockdep works with trace_##name##__rcuidle() when the trace
> event is not enabled.
> 
> But that should be a separate patch and not part of this series. I may
> write that up tomorrow.

Yes, that sounds good to me and would be good to add the safe guard there.
But you meant srcu_dereference above, not srcu_dereference_notrace right?

Meanwhile I'll drop that lockdep_recursion tomorrow and run some tests and
see how it behaves with Peter's changes.

thanks!

- Joel

--
To unsubscribe from this list: send the line "unsubscribe linux-kselftest" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Steven Rostedt July 12, 2018, 1:35 p.m. UTC | #22
On Wed, 11 Jul 2018 21:28:25 -0700
Joel Fernandes <joel@joelfernandes.org> wrote:

> On Wed, Jul 11, 2018 at 11:21:20PM -0400, Steven Rostedt wrote:
> > On Wed, 11 Jul 2018 13:52:49 -0700
> > Joel Fernandes <joel@joelfernandes.org> wrote:
> >   
> > > > #define __DECLARE_TRACE(name, proto, args, cond, data_proto, data_args) \
> > > > 	extern struct tracepoint __tracepoint_##name;			\
> > > > 	static inline void trace_##name(proto)				\
> > > > 	{								\
> > > > 		if (static_key_false(&__tracepoint_##name.key))		\
> > > > 			__DO_TRACE(&__tracepoint_##name,		\
> > > > 				TP_PROTO(data_proto),			\
> > > > 				TP_ARGS(data_args),			\
> > > > 				TP_CONDITION(cond), 0);			\
> > > > 		if (IS_ENABLED(CONFIG_LOCKDEP) && (cond)) {		\
> > > > 			rcu_read_lock_sched_notrace();			\
> > > > 			rcu_dereference_sched(__tracepoint_##name.funcs);\
> > > > 			rcu_read_unlock_sched_notrace();		\
> > > > 		}							\
> > > > 	}
> > > > 
> > > > Because lockdep would only trigger warnings when the tracepoint was
> > > > enabled and used in a place it shouldn't be, we added the above
> > > > IS_ENABLED(CONFIG_LOCKDEP) part to test regardless if the the
> > > > tracepoint was enabled or not. Because we do this, we don't need to
> > > > have the test in the __DO_TRACE() code itself. That means we can clean
> > > > up the code as per Peter's suggestion.    
> > > 
> > > Sounds good, I'm Ok with making this change.
> > > 
> > > Just to clarify, are you proposing to change the rcu_dereference_sched to
> > > rcu_dereference_raw in both __DECLARE_TRACE and __DO_TRACE?  
> > 
> > No, just in __DO_TRACE(). The rcu_dereference_sched() above in
> > __DECLARE_TRACE() in the if (IS_ENABLED(CONFIG_LOCKDEP) block is
> > required to show the warnings if trace_##name() is used wrong, and is
> > the reason we can use rcu_dereference_raw() in __DO_TRACE() in the
> > first place ;-)
> > 
> > This brings up another point. We should probably add to
> > __DECLARE_TRACE_RCU() this:
> > 
> > #ifndef MODULE
> > #define __DECLARE_TRACE_RCU(name, proto, args, cond, data_proto, data_args) \
> > 	static inline void trace_##name##_rcuidle(proto)		\
> > 	{								\
> > 		if (static_key_false(&__tracepoint_##name.key))		\
> > 			__DO_TRACE(&__tracepoint_##name,		\
> > 				TP_PROTO(data_proto),			\
> > 				TP_ARGS(data_args),			\
> > 				TP_CONDITION(cond), 1);			\
> > +		if (IS_ENABLED(CONFIG_LOCKDEP) && (cond)) {		\
> > +			int idx;					\
> > +			idx = srcu_read_lock_notrace(&tracepoint_srcu); \
> > +			srcu_dereference_notrace(__tracepoint_##name.funcs, \
> > +						&tracepoint_srcu);	\
> > +			srcu_read_unlock_notrace(&tracepoint_srcu, idx); \
> > +		}							\
> > 	}
> > #else
> > 
> > 
> > So that lockdep works with trace_##name##__rcuidle() when the trace
> > event is not enabled.
> > 
> > But that should be a separate patch and not part of this series. I may
> > write that up tomorrow.  
> 
> Yes, that sounds good to me and would be good to add the safe guard there.
> But you meant srcu_dereference above, not srcu_dereference_notrace right?

We don't need to trace them. I believe that the "srcu_*_notrace" still
performs the lockdep checks. That's what we want. If they don't then we
should not use notrace. But I believe they still do lockdep.

-- Steve
--
To unsubscribe from this list: send the line "unsubscribe linux-kselftest" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Joel Fernandes July 12, 2018, 7:17 p.m. UTC | #23
On Thu, Jul 12, 2018 at 09:35:12AM -0400, Steven Rostedt wrote:

> On Wed, 11 Jul 2018 21:28:25 -0700
> Joel Fernandes <joel@joelfernandes.org> wrote:
> 
> > On Wed, Jul 11, 2018 at 11:21:20PM -0400, Steven Rostedt wrote:
> > > On Wed, 11 Jul 2018 13:52:49 -0700
> > > Joel Fernandes <joel@joelfernandes.org> wrote:
> > >   
> > > > > #define __DECLARE_TRACE(name, proto, args, cond, data_proto, data_args) \
> > > > > 	extern struct tracepoint __tracepoint_##name;			\
> > > > > 	static inline void trace_##name(proto)				\
> > > > > 	{								\
> > > > > 		if (static_key_false(&__tracepoint_##name.key))		\
> > > > > 			__DO_TRACE(&__tracepoint_##name,		\
> > > > > 				TP_PROTO(data_proto),			\
> > > > > 				TP_ARGS(data_args),			\
> > > > > 				TP_CONDITION(cond), 0);			\
> > > > > 		if (IS_ENABLED(CONFIG_LOCKDEP) && (cond)) {		\
> > > > > 			rcu_read_lock_sched_notrace();			\
> > > > > 			rcu_dereference_sched(__tracepoint_##name.funcs);\
> > > > > 			rcu_read_unlock_sched_notrace();		\
> > > > > 		}							\
> > > > > 	}
> > > > > 
> > > > > Because lockdep would only trigger warnings when the tracepoint was
> > > > > enabled and used in a place it shouldn't be, we added the above
> > > > > IS_ENABLED(CONFIG_LOCKDEP) part to test regardless if the the
> > > > > tracepoint was enabled or not. Because we do this, we don't need to
> > > > > have the test in the __DO_TRACE() code itself. That means we can clean
> > > > > up the code as per Peter's suggestion.    
> > > > 
> > > > Sounds good, I'm Ok with making this change.
> > > > 
> > > > Just to clarify, are you proposing to change the rcu_dereference_sched to
> > > > rcu_dereference_raw in both __DECLARE_TRACE and __DO_TRACE?  
> > > 
> > > No, just in __DO_TRACE(). The rcu_dereference_sched() above in
> > > __DECLARE_TRACE() in the if (IS_ENABLED(CONFIG_LOCKDEP) block is
> > > required to show the warnings if trace_##name() is used wrong, and is
> > > the reason we can use rcu_dereference_raw() in __DO_TRACE() in the
> > > first place ;-)
> > > 
> > > This brings up another point. We should probably add to
> > > __DECLARE_TRACE_RCU() this:
> > > 
> > > #ifndef MODULE
> > > #define __DECLARE_TRACE_RCU(name, proto, args, cond, data_proto, data_args) \
> > > 	static inline void trace_##name##_rcuidle(proto)		\
> > > 	{								\
> > > 		if (static_key_false(&__tracepoint_##name.key))		\
> > > 			__DO_TRACE(&__tracepoint_##name,		\
> > > 				TP_PROTO(data_proto),			\
> > > 				TP_ARGS(data_args),			\
> > > 				TP_CONDITION(cond), 1);			\
> > > +		if (IS_ENABLED(CONFIG_LOCKDEP) && (cond)) {		\
> > > +			int idx;					\
> > > +			idx = srcu_read_lock_notrace(&tracepoint_srcu); \
> > > +			srcu_dereference_notrace(__tracepoint_##name.funcs, \
> > > +						&tracepoint_srcu);	\
> > > +			srcu_read_unlock_notrace(&tracepoint_srcu, idx); \
> > > +		}							\
> > > 	}
> > > #else
> > > 
> > > 
> > > So that lockdep works with trace_##name##__rcuidle() when the trace
> > > event is not enabled.
> > > 
> > > But that should be a separate patch and not part of this series. I may
> > > write that up tomorrow.  
> > 
> > Yes, that sounds good to me and would be good to add the safe guard there.
> > But you meant srcu_dereference above, not srcu_dereference_notrace right?
> 
> We don't need to trace them. I believe that the "srcu_*_notrace" still
> performs the lockdep checks. That's what we want. If they don't then we
> should not use notrace. But I believe they still do lockdep.

AFAICT, _notrace doesn't call into lockdep or tracing (there's also a comment
that says so):

/**
 * srcu_dereference_notrace - no tracing and no lockdep calls from here
 */

So then, we should use the regular variant for this additional check you're
suggesting.

thanks,

- Joel

--
To unsubscribe from this list: send the line "unsubscribe linux-kselftest" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Steven Rostedt July 12, 2018, 8:15 p.m. UTC | #24
On Thu, 12 Jul 2018 12:17:01 -0700
Joel Fernandes <joel@joelfernandes.org> wrote:

> AFAICT, _notrace doesn't call into lockdep or tracing (there's also a comment
> that says so):
> 
> /**
>  * srcu_dereference_notrace - no tracing and no lockdep calls from here
>  */

Note, I had a different tree checked out, so I didn't have the source
available without digging through my email.

> 
> So then, we should use the regular variant for this additional check you're
> suggesting.

OK, I thought we had a rcu_dereference_notrace() that did checks and
thought that this followed suit, but it appears there is no such call.
That's where my confusion was.

Sure, I'll nuke the notrace() portion, thanks.

Also, I've applied 1-3, since 4 and 5 looks to be getting a remake, I'm
going to remove them from my queue. Please fold the SPDX patch into 5.

Thanks!

-- Steve
--
To unsubscribe from this list: send the line "unsubscribe linux-kselftest" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Joel Fernandes July 12, 2018, 8:29 p.m. UTC | #25
On Thu, Jul 12, 2018 at 1:15 PM, Steven Rostedt <rostedt@goodmis.org> wrote:
> On Thu, 12 Jul 2018 12:17:01 -0700
>> So then, we should use the regular variant for this additional check you're
>> suggesting.
>
> OK, I thought we had a rcu_dereference_notrace() that did checks and
> thought that this followed suit, but it appears there is no such call.
> That's where my confusion was.
>
> Sure, I'll nuke the notrace() portion, thanks.
>
> Also, I've applied 1-3, since 4 and 5 looks to be getting a remake, I'm
> going to remove them from my queue. Please fold the SPDX patch into 5.

Will do, and send out the 4 and 5 shortly with the SPDK folded.

Also the kselftest patches were acked and can be taken independently,
I had reposted them as a separate 2 patch series with some minor
changes based on your suggestions. Could you check them?

thanks!

- Joel
--
To unsubscribe from this list: send the line "unsubscribe linux-kselftest" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Steven Rostedt July 12, 2018, 8:31 p.m. UTC | #26
On Thu, 12 Jul 2018 13:29:32 -0700
Joel Fernandes <joel@joelfernandes.org> wrote:

> Also the kselftest patches were acked and can be taken independently,
> I had reposted them as a separate 2 patch series with some minor
> changes based on your suggestions. Could you check them?
>

Yep, I saw them. I was going to wait till these patches were sent, but
since they are agnostic, I'll look at them now. Thanks for letting me
know.

-- Steve
--
To unsubscribe from this list: send the line "unsubscribe linux-kselftest" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/include/linux/tracepoint.h b/include/linux/tracepoint.h
index 19a690b559ca..beeb01e147f8 100644
--- a/include/linux/tracepoint.h
+++ b/include/linux/tracepoint.h
@@ -15,6 +15,7 @@ 
  */
 
 #include <linux/smp.h>
+#include <linux/srcu.h>
 #include <linux/errno.h>
 #include <linux/types.h>
 #include <linux/cpumask.h>
@@ -33,6 +34,8 @@  struct trace_eval_map {
 
 #define TRACEPOINT_DEFAULT_PRIO	10
 
+extern struct srcu_struct tracepoint_srcu;
+
 extern int
 tracepoint_probe_register(struct tracepoint *tp, void *probe, void *data);
 extern int
@@ -75,10 +78,16 @@  int unregister_tracepoint_module_notifier(struct notifier_block *nb)
  * probe unregistration and the end of module exit to make sure there is no
  * caller executing a probe when it is freed.
  */
+#ifdef CONFIG_TRACEPOINTS
 static inline void tracepoint_synchronize_unregister(void)
 {
+	synchronize_srcu(&tracepoint_srcu);
 	synchronize_sched();
 }
+#else
+static inline void tracepoint_synchronize_unregister(void)
+{ }
+#endif
 
 #ifdef CONFIG_HAVE_SYSCALL_TRACEPOINTS
 extern int syscall_regfunc(void);
@@ -129,18 +138,38 @@  extern void syscall_unregfunc(void);
  * as "(void *, void)". The DECLARE_TRACE_NOARGS() will pass in just
  * "void *data", where as the DECLARE_TRACE() will pass in "void *data, proto".
  */
-#define __DO_TRACE(tp, proto, args, cond, rcucheck)			\
+#define __DO_TRACE(tp, proto, args, cond, rcuidle)			\
 	do {								\
 		struct tracepoint_func *it_func_ptr;			\
 		void *it_func;						\
 		void *__data;						\
+		int __maybe_unused idx = 0;				\
 									\
 		if (!(cond))						\
 			return;						\
-		if (rcucheck)						\
-			rcu_irq_enter_irqson();				\
-		rcu_read_lock_sched_notrace();				\
-		it_func_ptr = rcu_dereference_sched((tp)->funcs);	\
+									\
+		/*							\
+		 * For rcuidle callers, use srcu since sched-rcu	\
+		 * doesn't work from the idle path.			\
+		 */							\
+		if (rcuidle) {						\
+			if (in_nmi()) {					\
+				WARN_ON_ONCE(1);			\
+				return; /* no srcu from nmi */		\
+			}						\
+									\
+			idx = srcu_read_lock_notrace(&tracepoint_srcu);	\
+			it_func_ptr =					\
+				srcu_dereference_notrace((tp)->funcs,	\
+						&tracepoint_srcu);	\
+			/* To keep it consistent with !rcuidle path */	\
+			preempt_disable_notrace();			\
+		} else {						\
+			rcu_read_lock_sched_notrace();			\
+			it_func_ptr =					\
+				rcu_dereference_sched((tp)->funcs);	\
+		}							\
+									\
 		if (it_func_ptr) {					\
 			do {						\
 				it_func = (it_func_ptr)->func;		\
@@ -148,9 +177,13 @@  extern void syscall_unregfunc(void);
 				((void(*)(proto))(it_func))(args);	\
 			} while ((++it_func_ptr)->func);		\
 		}							\
-		rcu_read_unlock_sched_notrace();			\
-		if (rcucheck)						\
-			rcu_irq_exit_irqson();				\
+									\
+		if (rcuidle) {						\
+			preempt_enable_notrace();			\
+			srcu_read_unlock_notrace(&tracepoint_srcu, idx);\
+		} else {						\
+			rcu_read_unlock_sched_notrace();		\
+		}							\
 	} while (0)
 
 #ifndef MODULE
diff --git a/kernel/tracepoint.c b/kernel/tracepoint.c
index 6dc6356c3327..955148d91b74 100644
--- a/kernel/tracepoint.c
+++ b/kernel/tracepoint.c
@@ -31,6 +31,9 @@ 
 extern struct tracepoint * const __start___tracepoints_ptrs[];
 extern struct tracepoint * const __stop___tracepoints_ptrs[];
 
+DEFINE_SRCU(tracepoint_srcu);
+EXPORT_SYMBOL_GPL(tracepoint_srcu);
+
 /* Set to 1 to enable tracepoint debug output */
 static const int tracepoint_debug;
 
@@ -67,16 +70,27 @@  static inline void *allocate_probes(int count)
 	return p == NULL ? NULL : p->probes;
 }
 
-static void rcu_free_old_probes(struct rcu_head *head)
+static void srcu_free_old_probes(struct rcu_head *head)
 {
 	kfree(container_of(head, struct tp_probes, rcu));
 }
 
+static void rcu_free_old_probes(struct rcu_head *head)
+{
+	call_srcu(&tracepoint_srcu, head, srcu_free_old_probes);
+}
+
 static inline void release_probes(struct tracepoint_func *old)
 {
 	if (old) {
 		struct tp_probes *tp_probes = container_of(old,
 			struct tp_probes, probes[0]);
+		/*
+		 * Tracepoint probes are protected by both sched RCU and SRCU,
+		 * by calling the SRCU callback in the sched RCU callback we
+		 * cover both cases. So let us chain the SRCU and sched RCU
+		 * callbacks to wait for both grace periods.
+		 */
 		call_rcu_sched(&tp_probes->rcu, rcu_free_old_probes);
 	}
 }