
[v4,net,1/3] rcu: add a helper to report consolidated flavor QS

Message ID 491d3af6c7d66dfb3b60b2f210f38e843dfe6ed2.1710525524.git.yan@cloudflare.com (mailing list archive)
State Superseded
Delegated to: Netdev Maintainers
Series Report RCU QS for busy network kthreads

Checks

Context Check Description
netdev/series_format success Posting correctly formatted
netdev/tree_selection success Clearly marked for net
netdev/ynl success Generated files up to date; no warnings/errors; no diff in generated;
netdev/fixes_present success Fixes tag present in non-next series
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 15258 this patch: 15258
netdev/build_tools success Errors and warnings before: 0 this patch: 0
netdev/cc_maintainers warning 7 maintainers not CCed: josh@joshtriplett.org boqun.feng@gmail.com jiangshanlai@gmail.com mathieu.desnoyers@efficios.com qiang.zhang1211@gmail.com frederic@kernel.org quic_neeraju@quicinc.com
netdev/build_clang success Errors and warnings before: 3081 this patch: 3081
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success No Fixes tag
netdev/build_allmodconfig_warn success Errors and warnings before: 16175 this patch: 16175
netdev/checkpatch success total: 0 errors, 0 warnings, 0 checks, 30 lines checked
netdev/build_clang_rust success No Rust files in patch. Skipping build
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0
netdev/contest success net-next-2024-03-19--18-00 (tests: 907)

Commit Message

Yan Zhai March 15, 2024, 7:55 p.m. UTC
There are several scenarios in network processing that can run
extensively under heavy traffic. In such situations, RCU synchronization
might not observe desired quiescent states for an indefinitely long
period. Create a helper to safely raise the desired RCU quiescent states
in such scenarios.

Currently the frequency is locked at HZ/10, i.e. 100ms, which is
sufficient to address existing problems around RCU Tasks. It is not yet
clear whether any future scenario will require it to be tuned down further.

Suggested-by: Paul E. McKenney <paulmck@kernel.org>
Reviewed-by: Jesper Dangaard Brouer <hawk@kernel.org>
Signed-off-by: Yan Zhai <yan@cloudflare.com>
---
v3->v4: comment fixup

---
 include/linux/rcupdate.h | 24 ++++++++++++++++++++++++
 1 file changed, 24 insertions(+)
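
For illustration only (not from this patch series): a minimal sketch of how a
busy network kthread might use the new helper. All names other than
rcu_softirq_qs_periodic() are hypothetical.

	#include <linux/jiffies.h>
	#include <linux/kthread.h>
	#include <linux/rcupdate.h>
	#include <linux/sched.h>

	/* Hypothetical busy-polling kthread; only the helper itself comes from this patch. */
	static int example_poll_thread(void *data)
	{
		unsigned long last_qs = jiffies;	/* initialize at the start of processing */

		while (!kthread_should_stop()) {
			/* ... process packets, with softirq and preemption enabled ... */

			/* Reports RCU and RCU-Tasks QS at most once per 100ms on non-RT kernels. */
			rcu_softirq_qs_periodic(last_qs);
			cond_resched();
		}
		return 0;
	}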

Comments

Paul E. McKenney March 16, 2024, 5:40 a.m. UTC | #1
On Fri, Mar 15, 2024 at 12:55:03PM -0700, Yan Zhai wrote:
> There are several scenario in network processing that can run
> extensively under heavy traffic. In such situation, RCU synchronization
> might not observe desired quiescent states for indefinitely long period.
> Create a helper to safely raise the desired RCU quiescent states for
> such scenario.
> 
> Currently the frequency is locked at HZ/10, i.e. 100ms, which is
> sufficient to address existing problems around RCU tasks. It's unclear
> yet if there is any future scenario for it to be further tuned down.

I suggest something like the following for the commit log:

------------------------------------------------------------------------

When under heavy load, network processing can run CPU-bound for many tens
of seconds.  Even in preemptible kernels, this can block RCU Tasks grace
periods, which can cause trace-event removal to take more than a minute,
which is unacceptably long.

This commit therefore creates a new helper function that passes
through both RCU and RCU-Tasks quiescent states every 100 milliseconds.
This hard-coded value suffices for current workloads.

------------------------------------------------------------------------

> Suggested-by: Paul E. McKenney <paulmck@kernel.org>
> Reviewed-by: Jesper Dangaard Brouer <hawk@kernel.org>
> Signed-off-by: Yan Zhai <yan@cloudflare.com>
> ---
> v3->v4: comment fixup
> 
> ---
>  include/linux/rcupdate.h | 24 ++++++++++++++++++++++++
>  1 file changed, 24 insertions(+)
> 
> diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
> index 0746b1b0b663..da224706323e 100644
> --- a/include/linux/rcupdate.h
> +++ b/include/linux/rcupdate.h
> @@ -247,6 +247,30 @@ do { \
>  	cond_resched(); \
>  } while (0)
>  
> +/**
> + * rcu_softirq_qs_periodic - Periodically report consolidated quiescent states
> + * @old_ts: last jiffies when QS was reported. Might be modified in the macro.
> + *
> + * This helper is for network processing in non-RT kernels, where there could
> + * be busy polling threads that block RCU synchronization indefinitely.  In
> + * such context, simply calling cond_resched is insufficient, so give it a
> + * stronger push to eliminate all potential blockage of all RCU types.
> + *
> + * NOTE: unless absolutely sure, this helper should in general be called
> + * outside of bh lock section to avoid reporting a surprising QS to updaters,
> + * who could be expecting RCU read critical section to end at local_bh_enable().
> + */

How about something like this for the kernel-doc comment?

/**
 * rcu_softirq_qs_periodic - Report RCU and RCU-Tasks quiescent states
 * @old_ts: jiffies at start of processing.
 *
 * This helper is for long-running softirq handlers, such as those
 * in networking.  The caller should initialize the variable passed in
 * as @old_ts at the beginning of the softirq handler.  When invoked
 * frequently, this macro will invoke rcu_softirq_qs() every 100
 * milliseconds thereafter, which will provide both RCU and RCU-Tasks
 * quiescent states.  Note that this macro modifies its old_ts argument.
 *
 * Note that although cond_resched() provides RCU quiescent states,
 * it does not provide RCU-Tasks quiescent states.
 *
 * Because regions of code that have disabled softirq act as RCU
 * read-side critical sections, this macro should be invoked with softirq
 * (and preemption) enabled.
 *
 * This macro has no effect in CONFIG_PREEMPT_RT kernels.
 */

							Thanx, Paul

> +#define rcu_softirq_qs_periodic(old_ts) \
> +do { \
> +	if (!IS_ENABLED(CONFIG_PREEMPT_RT) && \
> +	    time_after(jiffies, (old_ts) + HZ / 10)) { \
> +		preempt_disable(); \
> +		rcu_softirq_qs(); \
> +		preempt_enable(); \
> +		(old_ts) = jiffies; \
> +	} \
> +} while (0)
> +
>  /*
>   * Infrastructure to implement the synchronize_() primitives in
>   * TREE_RCU and rcu_barrier_() primitives in TINY_RCU.
> -- 
> 2.30.2
> 
>
Mark Rutland March 18, 2024, 10:58 a.m. UTC | #2
On Fri, Mar 15, 2024 at 10:40:56PM -0700, Paul E. McKenney wrote:
> On Fri, Mar 15, 2024 at 12:55:03PM -0700, Yan Zhai wrote:
> > There are several scenario in network processing that can run
> > extensively under heavy traffic. In such situation, RCU synchronization
> > might not observe desired quiescent states for indefinitely long period.
> > Create a helper to safely raise the desired RCU quiescent states for
> > such scenario.
> > 
> > Currently the frequency is locked at HZ/10, i.e. 100ms, which is
> > sufficient to address existing problems around RCU tasks. It's unclear
> > yet if there is any future scenario for it to be further tuned down.
> 
> I suggest something like the following for the commit log:
> 
> ------------------------------------------------------------------------
> 
> When under heavy load, network processing can run CPU-bound for many tens
> of seconds.  Even in preemptible kernels, this can block RCU Tasks grace
> periods, which can cause trace-event removal to take more than a minute,
> which is unacceptably long.
> 
> This commit therefore creates a new helper function that passes
> through both RCU and RCU-Tasks quiescent states every 100 milliseconds.
> This hard-coded value suffices for current workloads.

FWIW, this sounds good to me.

> 
> ------------------------------------------------------------------------
> 
> > Suggested-by: Paul E. McKenney <paulmck@kernel.org>
> > Reviewed-by: Jesper Dangaard Brouer <hawk@kernel.org>
> > Signed-off-by: Yan Zhai <yan@cloudflare.com>
> > ---
> > v3->v4: comment fixup
> > 
> > ---
> >  include/linux/rcupdate.h | 24 ++++++++++++++++++++++++
> >  1 file changed, 24 insertions(+)
> > 
> > diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
> > index 0746b1b0b663..da224706323e 100644
> > --- a/include/linux/rcupdate.h
> > +++ b/include/linux/rcupdate.h
> > @@ -247,6 +247,30 @@ do { \
> >  	cond_resched(); \
> >  } while (0)
> >  
> > +/**
> > + * rcu_softirq_qs_periodic - Periodically report consolidated quiescent states
> > + * @old_ts: last jiffies when QS was reported. Might be modified in the macro.
> > + *
> > + * This helper is for network processing in non-RT kernels, where there could
> > + * be busy polling threads that block RCU synchronization indefinitely.  In
> > + * such context, simply calling cond_resched is insufficient, so give it a
> > + * stronger push to eliminate all potential blockage of all RCU types.
> > + *
> > + * NOTE: unless absolutely sure, this helper should in general be called
> > + * outside of bh lock section to avoid reporting a surprising QS to updaters,
> > + * who could be expecting RCU read critical section to end at local_bh_enable().
> > + */
> 
> How about something like this for the kernel-doc comment?
> 
> /**
>  * rcu_softirq_qs_periodic - Report RCU and RCU-Tasks quiescent states
>  * @old_ts: jiffies at start of processing.
>  *
>  * This helper is for long-running softirq handlers, such as those
>  * in networking.  The caller should initialize the variable passed in
>  * as @old_ts at the beginning of the softirq handler.  When invoked
>  * frequently, this macro will invoke rcu_softirq_qs() every 100
>  * milliseconds thereafter, which will provide both RCU and RCU-Tasks
>  * quiescent states.  Note that this macro modifies its old_ts argument.
>  *
>  * Note that although cond_resched() provides RCU quiescent states,
>  * it does not provide RCU-Tasks quiescent states.
>  *
>  * Because regions of code that have disabled softirq act as RCU
>  * read-side critical sections, this macro should be invoked with softirq
>  * (and preemption) enabled.
>  *
>  * This macro has no effect in CONFIG_PREEMPT_RT kernels.
>  */

Considering the note about cond_resched(), does cond_resched() actually
provide an RCU quiescent state for fully-preemptible kernels? IIUC for those
cond_resched() expands to:

	__might_resched();
	klp_sched_try_switch()

... and AFAICT neither reports an RCU quiescent state.

So maybe it's worth dropping the note?

Separately, what's the rationale for not doing this on PREEMPT_RT? Does that
avoid the problem through other means, or are people just not running affected
workloads on that?

Mark.

> 
> 							Thanx, Paul
> 
> > +#define rcu_softirq_qs_periodic(old_ts) \
> > +do { \
> > +	if (!IS_ENABLED(CONFIG_PREEMPT_RT) && \
> > +	    time_after(jiffies, (old_ts) + HZ / 10)) { \
> > +		preempt_disable(); \
> > +		rcu_softirq_qs(); \
> > +		preempt_enable(); \
> > +		(old_ts) = jiffies; \
> > +	} \
> > +} while (0)
> > +
> >  /*
> >   * Infrastructure to implement the synchronize_() primitives in
> >   * TREE_RCU and rcu_barrier_() primitives in TINY_RCU.
> > -- 
> > 2.30.2
> > 
> >
Yan Zhai March 19, 2024, 1:26 a.m. UTC | #3
On Sat, Mar 16, 2024 at 12:41 AM Paul E. McKenney <paulmck@kernel.org> wrote:
>
> On Fri, Mar 15, 2024 at 12:55:03PM -0700, Yan Zhai wrote:
> > There are several scenario in network processing that can run
> > extensively under heavy traffic. In such situation, RCU synchronization
> > might not observe desired quiescent states for indefinitely long period.
> > Create a helper to safely raise the desired RCU quiescent states for
> > such scenario.
> >
> > Currently the frequency is locked at HZ/10, i.e. 100ms, which is
> > sufficient to address existing problems around RCU tasks. It's unclear
> > yet if there is any future scenario for it to be further tuned down.
>
> I suggest something like the following for the commit log:
>
> ------------------------------------------------------------------------
>
> When under heavy load, network processing can run CPU-bound for many tens
> of seconds.  Even in preemptible kernels, this can block RCU Tasks grace
> periods, which can cause trace-event removal to take more than a minute,
> which is unacceptably long.
>
> This commit therefore creates a new helper function that passes
> through both RCU and RCU-Tasks quiescent states every 100 milliseconds.
> This hard-coded value suffices for current workloads.
>
> ------------------------------------------------------------------------
>
> > Suggested-by: Paul E. McKenney <paulmck@kernel.org>
> > Reviewed-by: Jesper Dangaard Brouer <hawk@kernel.org>
> > Signed-off-by: Yan Zhai <yan@cloudflare.com>
> > ---
> > v3->v4: comment fixup
> >
> > ---
> >  include/linux/rcupdate.h | 24 ++++++++++++++++++++++++
> >  1 file changed, 24 insertions(+)
> >
> > diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
> > index 0746b1b0b663..da224706323e 100644
> > --- a/include/linux/rcupdate.h
> > +++ b/include/linux/rcupdate.h
> > @@ -247,6 +247,30 @@ do { \
> >       cond_resched(); \
> >  } while (0)
> >
> > +/**
> > + * rcu_softirq_qs_periodic - Periodically report consolidated quiescent states
> > + * @old_ts: last jiffies when QS was reported. Might be modified in the macro.
> > + *
> > + * This helper is for network processing in non-RT kernels, where there could
> > + * be busy polling threads that block RCU synchronization indefinitely.  In
> > + * such context, simply calling cond_resched is insufficient, so give it a
> > + * stronger push to eliminate all potential blockage of all RCU types.
> > + *
> > + * NOTE: unless absolutely sure, this helper should in general be called
> > + * outside of bh lock section to avoid reporting a surprising QS to updaters,
> > + * who could be expecting RCU read critical section to end at local_bh_enable().
> > + */
>
> How about something like this for the kernel-doc comment?
>
> /**
>  * rcu_softirq_qs_periodic - Report RCU and RCU-Tasks quiescent states
>  * @old_ts: jiffies at start of processing.
>  *
>  * This helper is for long-running softirq handlers, such as those
>  * in networking.  The caller should initialize the variable passed in
>  * as @old_ts at the beginning of the softirq handler.  When invoked
>  * frequently, this macro will invoke rcu_softirq_qs() every 100
>  * milliseconds thereafter, which will provide both RCU and RCU-Tasks
>  * quiescent states.  Note that this macro modifies its old_ts argument.
>  *
>  * Note that although cond_resched() provides RCU quiescent states,
>  * it does not provide RCU-Tasks quiescent states.
>  *
>  * Because regions of code that have disabled softirq act as RCU
>  * read-side critical sections, this macro should be invoked with softirq
>  * (and preemption) enabled.
>  *
>  * This macro has no effect in CONFIG_PREEMPT_RT kernels.
>  */
>
It would be more accurate this way, I like it. Thanks!

Yan

>                                                         Thanx, Paul
>
> > +#define rcu_softirq_qs_periodic(old_ts) \
> > +do { \
> > +     if (!IS_ENABLED(CONFIG_PREEMPT_RT) && \
> > +         time_after(jiffies, (old_ts) + HZ / 10)) { \
> > +             preempt_disable(); \
> > +             rcu_softirq_qs(); \
> > +             preempt_enable(); \
> > +             (old_ts) = jiffies; \
> > +     } \
> > +} while (0)
> > +
> >  /*
> >   * Infrastructure to implement the synchronize_() primitives in
> >   * TREE_RCU and rcu_barrier_() primitives in TINY_RCU.
> > --
> > 2.30.2
> >
> >
Yan Zhai March 19, 2024, 2:32 a.m. UTC | #4
On Mon, Mar 18, 2024 at 5:59 AM Mark Rutland <mark.rutland@arm.com> wrote:
>
> On Fri, Mar 15, 2024 at 10:40:56PM -0700, Paul E. McKenney wrote:
> > On Fri, Mar 15, 2024 at 12:55:03PM -0700, Yan Zhai wrote:
> > > There are several scenario in network processing that can run
> > > extensively under heavy traffic. In such situation, RCU synchronization
> > > might not observe desired quiescent states for indefinitely long period.
> > > Create a helper to safely raise the desired RCU quiescent states for
> > > such scenario.
> > >
> > > Currently the frequency is locked at HZ/10, i.e. 100ms, which is
> > > sufficient to address existing problems around RCU tasks. It's unclear
> > > yet if there is any future scenario for it to be further tuned down.
> >
> > I suggest something like the following for the commit log:
> >
> > ------------------------------------------------------------------------
> >
> > When under heavy load, network processing can run CPU-bound for many tens
> > of seconds.  Even in preemptible kernels, this can block RCU Tasks grace
> > periods, which can cause trace-event removal to take more than a minute,
> > which is unacceptably long.
> >
> > This commit therefore creates a new helper function that passes
> > through both RCU and RCU-Tasks quiescent states every 100 milliseconds.
> > This hard-coded value suffices for current workloads.
>
> FWIW, this sounds good to me.
>
> >
> > ------------------------------------------------------------------------
> >
> > > Suggested-by: Paul E. McKenney <paulmck@kernel.org>
> > > Reviewed-by: Jesper Dangaard Brouer <hawk@kernel.org>
> > > Signed-off-by: Yan Zhai <yan@cloudflare.com>
> > > ---
> > > v3->v4: comment fixup
> > >
> > > ---
> > >  include/linux/rcupdate.h | 24 ++++++++++++++++++++++++
> > >  1 file changed, 24 insertions(+)
> > >
> > > diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
> > > index 0746b1b0b663..da224706323e 100644
> > > --- a/include/linux/rcupdate.h
> > > +++ b/include/linux/rcupdate.h
> > > @@ -247,6 +247,30 @@ do { \
> > >     cond_resched(); \
> > >  } while (0)
> > >
> > > +/**
> > > + * rcu_softirq_qs_periodic - Periodically report consolidated quiescent states
> > > + * @old_ts: last jiffies when QS was reported. Might be modified in the macro.
> > > + *
> > > + * This helper is for network processing in non-RT kernels, where there could
> > > + * be busy polling threads that block RCU synchronization indefinitely.  In
> > > + * such context, simply calling cond_resched is insufficient, so give it a
> > > + * stronger push to eliminate all potential blockage of all RCU types.
> > > + *
> > > + * NOTE: unless absolutely sure, this helper should in general be called
> > > + * outside of bh lock section to avoid reporting a surprising QS to updaters,
> > > + * who could be expecting RCU read critical section to end at local_bh_enable().
> > > + */
> >
> > How about something like this for the kernel-doc comment?
> >
> > /**
> >  * rcu_softirq_qs_periodic - Report RCU and RCU-Tasks quiescent states
> >  * @old_ts: jiffies at start of processing.
> >  *
> >  * This helper is for long-running softirq handlers, such as those
> >  * in networking.  The caller should initialize the variable passed in
> >  * as @old_ts at the beginning of the softirq handler.  When invoked
> >  * frequently, this macro will invoke rcu_softirq_qs() every 100
> >  * milliseconds thereafter, which will provide both RCU and RCU-Tasks
> >  * quiescent states.  Note that this macro modifies its old_ts argument.
> >  *
> >  * Note that although cond_resched() provides RCU quiescent states,
> >  * it does not provide RCU-Tasks quiescent states.
> >  *
> >  * Because regions of code that have disabled softirq act as RCU
> >  * read-side critical sections, this macro should be invoked with softirq
> >  * (and preemption) enabled.
> >  *
> >  * This macro has no effect in CONFIG_PREEMPT_RT kernels.
> >  */
>
> Considering the note about cond_resched(), does does cond_resched() actually
> provide an RCU quiescent state for fully-preemptible kernels? IIUC for those
> cond_resched() expands to:
>
>         __might_resched();
>         klp_sched_try_switch()
>
> ... and AFAICT neither reports an RCU quiescent state.
>
> So maybe it's worth dropping the note?
>
> Seperately, what's the rationale for not doing this on PREEMPT_RT? Does that
> avoid the problem through other means, or are people just not running effected
> workloads on that?
>
It's a bit counterintuitive, but yes, the RT kernel avoids the problem.
This is because "schedule()" actually reports a task RCU QS, and on an RT
kernel a cond_resched() call won't call "__cond_resched()" or
"__schedule(PREEMPT)", as you already pointed out, which would otherwise
clear the need-resched flag. This then allows "schedule()" to be called
on hard IRQ exit from time to time.

Yan

> Mark.
>
> >
> >                                                       Thanx, Paul
> >
> > > +#define rcu_softirq_qs_periodic(old_ts) \
> > > +do { \
> > > +   if (!IS_ENABLED(CONFIG_PREEMPT_RT) && \
> > > +       time_after(jiffies, (old_ts) + HZ / 10)) { \
> > > +           preempt_disable(); \
> > > +           rcu_softirq_qs(); \
> > > +           preempt_enable(); \
> > > +           (old_ts) = jiffies; \
> > > +   } \
> > > +} while (0)
> > > +
> > >  /*
> > >   * Infrastructure to implement the synchronize_() primitives in
> > >   * TREE_RCU and rcu_barrier_() primitives in TINY_RCU.
> > > --
> > > 2.30.2
> > >
> > >
Yan Zhai March 19, 2024, 2:39 a.m. UTC | #5
On Mon, Mar 18, 2024 at 9:32 PM Yan Zhai <yan@cloudflare.com> wrote:
>
> On Mon, Mar 18, 2024 at 5:59 AM Mark Rutland <mark.rutland@arm.com> wrote:
> >
> > On Fri, Mar 15, 2024 at 10:40:56PM -0700, Paul E. McKenney wrote:
> > > On Fri, Mar 15, 2024 at 12:55:03PM -0700, Yan Zhai wrote:
> > > > There are several scenario in network processing that can run
> > > > extensively under heavy traffic. In such situation, RCU synchronization
> > > > might not observe desired quiescent states for indefinitely long period.
> > > > Create a helper to safely raise the desired RCU quiescent states for
> > > > such scenario.
> > > >
> > > > Currently the frequency is locked at HZ/10, i.e. 100ms, which is
> > > > sufficient to address existing problems around RCU tasks. It's unclear
> > > > yet if there is any future scenario for it to be further tuned down.
> > >
> > > I suggest something like the following for the commit log:
> > >
> > > ------------------------------------------------------------------------
> > >
> > > When under heavy load, network processing can run CPU-bound for many tens
> > > of seconds.  Even in preemptible kernels, this can block RCU Tasks grace
> > > periods, which can cause trace-event removal to take more than a minute,
> > > which is unacceptably long.
> > >
> > > This commit therefore creates a new helper function that passes
> > > through both RCU and RCU-Tasks quiescent states every 100 milliseconds.
> > > This hard-coded value suffices for current workloads.
> >
> > FWIW, this sounds good to me.
> >
> > >
> > > ------------------------------------------------------------------------
> > >
> > > > Suggested-by: Paul E. McKenney <paulmck@kernel.org>
> > > > Reviewed-by: Jesper Dangaard Brouer <hawk@kernel.org>
> > > > Signed-off-by: Yan Zhai <yan@cloudflare.com>
> > > > ---
> > > > v3->v4: comment fixup
> > > >
> > > > ---
> > > >  include/linux/rcupdate.h | 24 ++++++++++++++++++++++++
> > > >  1 file changed, 24 insertions(+)
> > > >
> > > > diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
> > > > index 0746b1b0b663..da224706323e 100644
> > > > --- a/include/linux/rcupdate.h
> > > > +++ b/include/linux/rcupdate.h
> > > > @@ -247,6 +247,30 @@ do { \
> > > >     cond_resched(); \
> > > >  } while (0)
> > > >
> > > > +/**
> > > > + * rcu_softirq_qs_periodic - Periodically report consolidated quiescent states
> > > > + * @old_ts: last jiffies when QS was reported. Might be modified in the macro.
> > > > + *
> > > > + * This helper is for network processing in non-RT kernels, where there could
> > > > + * be busy polling threads that block RCU synchronization indefinitely.  In
> > > > + * such context, simply calling cond_resched is insufficient, so give it a
> > > > + * stronger push to eliminate all potential blockage of all RCU types.
> > > > + *
> > > > + * NOTE: unless absolutely sure, this helper should in general be called
> > > > + * outside of bh lock section to avoid reporting a surprising QS to updaters,
> > > > + * who could be expecting RCU read critical section to end at local_bh_enable().
> > > > + */
> > >
> > > How about something like this for the kernel-doc comment?
> > >
> > > /**
> > >  * rcu_softirq_qs_periodic - Report RCU and RCU-Tasks quiescent states
> > >  * @old_ts: jiffies at start of processing.
> > >  *
> > >  * This helper is for long-running softirq handlers, such as those
> > >  * in networking.  The caller should initialize the variable passed in
> > >  * as @old_ts at the beginning of the softirq handler.  When invoked
> > >  * frequently, this macro will invoke rcu_softirq_qs() every 100
> > >  * milliseconds thereafter, which will provide both RCU and RCU-Tasks
> > >  * quiescent states.  Note that this macro modifies its old_ts argument.
> > >  *
> > >  * Note that although cond_resched() provides RCU quiescent states,
> > >  * it does not provide RCU-Tasks quiescent states.
> > >  *
> > >  * Because regions of code that have disabled softirq act as RCU
> > >  * read-side critical sections, this macro should be invoked with softirq
> > >  * (and preemption) enabled.
> > >  *
> > >  * This macro has no effect in CONFIG_PREEMPT_RT kernels.
> > >  */
> >
> > Considering the note about cond_resched(), does does cond_resched() actually
> > provide an RCU quiescent state for fully-preemptible kernels? IIUC for those
> > cond_resched() expands to:
> >
> >         __might_resched();
> >         klp_sched_try_switch()
> >
> > ... and AFAICT neither reports an RCU quiescent state.
> >
> > So maybe it's worth dropping the note?
> >
> > Seperately, what's the rationale for not doing this on PREEMPT_RT? Does that
> > avoid the problem through other means, or are people just not running effected
> > workloads on that?
> >
> It's a bit anti-intuition but yes the RT kernel avoids the problem.
> This is because "schedule()" reports task RCU QS actually, and on RT
> kernel cond_resched() call won't call "__cond_resched()" or
> "__schedule(PREEMPT)" as you already pointed out, which would clear
> need-resched flag. This then allows "schedule()" to be called on hard
> IRQ exit time by time.
>

And these are excellent questions that I should have originally included
in the comment. Thanks for bringing them up.
Let me send another version tomorrow, to allow more thoughts on this if any.

thanks
Yan

> Yan
>
> > Mark.
> >
> > >
> > >                                                       Thanx, Paul
> > >
> > > > +#define rcu_softirq_qs_periodic(old_ts) \
> > > > +do { \
> > > > +   if (!IS_ENABLED(CONFIG_PREEMPT_RT) && \
> > > > +       time_after(jiffies, (old_ts) + HZ / 10)) { \
> > > > +           preempt_disable(); \
> > > > +           rcu_softirq_qs(); \
> > > > +           preempt_enable(); \
> > > > +           (old_ts) = jiffies; \
> > > > +   } \
> > > > +} while (0)
> > > > +
> > > >  /*
> > > >   * Infrastructure to implement the synchronize_() primitives in
> > > >   * TREE_RCU and rcu_barrier_() primitives in TINY_RCU.
> > > > --
> > > > 2.30.2
> > > >
> > > >

Patch

diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
index 0746b1b0b663..da224706323e 100644
--- a/include/linux/rcupdate.h
+++ b/include/linux/rcupdate.h
@@ -247,6 +247,30 @@  do { \
 	cond_resched(); \
 } while (0)
 
+/**
+ * rcu_softirq_qs_periodic - Periodically report consolidated quiescent states
+ * @old_ts: last jiffies when QS was reported. Might be modified in the macro.
+ *
+ * This helper is for network processing in non-RT kernels, where there could
+ * be busy polling threads that block RCU synchronization indefinitely.  In
+ * such context, simply calling cond_resched is insufficient, so give it a
+ * stronger push to eliminate all potential blockage of all RCU types.
+ *
+ * NOTE: unless absolutely sure, this helper should in general be called
+ * outside of bh lock section to avoid reporting a surprising QS to updaters,
+ * who could be expecting RCU read critical section to end at local_bh_enable().
+ */
+#define rcu_softirq_qs_periodic(old_ts) \
+do { \
+	if (!IS_ENABLED(CONFIG_PREEMPT_RT) && \
+	    time_after(jiffies, (old_ts) + HZ / 10)) { \
+		preempt_disable(); \
+		rcu_softirq_qs(); \
+		preempt_enable(); \
+		(old_ts) = jiffies; \
+	} \
+} while (0)
+
 /*
  * Infrastructure to implement the synchronize_() primitives in
  * TREE_RCU and rcu_barrier_() primitives in TINY_RCU.
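
For illustration only (not from this series): per the NOTE in the kernel-doc
above, the helper is meant to be invoked outside bh-disabled sections, since
updaters may expect read-side critical sections to end at local_bh_enable().
A sketch of such placement; example_process_rx() and the loop are hypothetical.

	#include <linux/bottom_half.h>
	#include <linux/jiffies.h>
	#include <linux/rcupdate.h>

	int example_process_rx(void);	/* hypothetical; returns whether more work remains */

	static void example_rx_loop(void)
	{
		unsigned long last_qs = jiffies;
		int more;

		do {
			local_bh_disable();
			more = example_process_rx();	/* readers may rely on bh being disabled */
			local_bh_enable();

			/* Report the QS only after local_bh_enable(), never inside the bh section. */
			rcu_softirq_qs_periodic(last_qs);
		} while (more);
	}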