Message ID | 20250318135619.4300-2-frederic@kernel.org (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | rcu: Random updates | expand |
On Tue, Mar 18, 2025 at 02:56:18PM +0100, Frederic Weisbecker wrote: > The numbers used in rcu_seq_done_exact() lack some explanation behind > their magic. Especially after the commit: > > 85aad7cc4178 ("rcu: Fix get_state_synchronize_rcu_full() GP-start detection") > > which reported a subtle issue where a new GP sequence snapshot was taken > on the root node state while a grace period had already been started and > reflected on the global state sequence but not yet on the root node > sequence, making a polling user waiting on a wrong already started grace > period that would ignore freshly online CPUs. > > The fix involved taking the snaphot on the global state sequence and > waiting on the root node sequence. And since a grace period is first > started on the global state and only afterward reflected on the root > node, a snapshot taken on the global state sequence might be two full > grace periods ahead of the root node as in the following example: > > rnp->gp_seq = rcu_state.gp_seq = 0 > > CPU 0 CPU 1 > ----- ----- > // rcu_state.gp_seq = 1 > rcu_seq_start(&rcu_state.gp_seq) > // snap = 8 > snap = rcu_seq_snap(&rcu_state.gp_seq) > // Two full GP differences > rcu_seq_done_exact(&rnp->gp_seq, snap) > // rnp->gp_seq = 1 > WRITE_ONCE(rnp->gp_seq, rcu_state.gp_seq); > > Add a comment about those expectations and to clarify the magic within > the relevant function. > > Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Reviewed-by: Paul E. McKenney <paulmck@kernel.org> But it would of course be good to get reviews from the others. > --- > kernel/rcu/rcu.h | 7 +++++++ > 1 file changed, 7 insertions(+) > > diff --git a/kernel/rcu/rcu.h b/kernel/rcu/rcu.h > index eed2951a4962..7acf1f36dd6c 100644 > --- a/kernel/rcu/rcu.h > +++ b/kernel/rcu/rcu.h > @@ -157,6 +157,13 @@ static inline bool rcu_seq_done(unsigned long *sp, unsigned long s) > * Given a snapshot from rcu_seq_snap(), determine whether or not a > * full update-side operation has occurred, but do not allow the > * (ULONG_MAX / 2) safety-factor/guard-band. > + * > + * The token returned by get_state_synchronize_rcu_full() is based on > + * rcu_state.gp_seq but it is tested in poll_state_synchronize_rcu_full() > + * against the root rnp->gp_seq. Since rcu_seq_start() is first called > + * on rcu_state.gp_seq and only later reflected on the root rnp->gp_seq, > + * it is possible that rcu_seq_snap(rcu_state.gp_seq) returns 2 full grace > + * periods ahead of the root rnp->gp_seq. > */ > static inline bool rcu_seq_done_exact(unsigned long *sp, unsigned long s) > { > -- > 2.48.1 >
On Tue, Mar 18, 2025 at 11:37:38AM -0700, Paul E. McKenney wrote: > On Tue, Mar 18, 2025 at 02:56:18PM +0100, Frederic Weisbecker wrote: > > The numbers used in rcu_seq_done_exact() lack some explanation behind > > their magic. Especially after the commit: > > > > 85aad7cc4178 ("rcu: Fix get_state_synchronize_rcu_full() GP-start detection") > > > > which reported a subtle issue where a new GP sequence snapshot was taken > > on the root node state while a grace period had already been started and > > reflected on the global state sequence but not yet on the root node > > sequence, making a polling user waiting on a wrong already started grace > > period that would ignore freshly online CPUs. > > > > The fix involved taking the snaphot on the global state sequence and > > waiting on the root node sequence. And since a grace period is first > > started on the global state and only afterward reflected on the root > > node, a snapshot taken on the global state sequence might be two full > > grace periods ahead of the root node as in the following example: > > > > rnp->gp_seq = rcu_state.gp_seq = 0 > > > > CPU 0 CPU 1 > > ----- ----- > > // rcu_state.gp_seq = 1 > > rcu_seq_start(&rcu_state.gp_seq) > > // snap = 8 > > snap = rcu_seq_snap(&rcu_state.gp_seq) > > // Two full GP differences > > rcu_seq_done_exact(&rnp->gp_seq, snap) > > // rnp->gp_seq = 1 > > WRITE_ONCE(rnp->gp_seq, rcu_state.gp_seq); > > > > Add a comment about those expectations and to clarify the magic within > > the relevant function. > > > > Signed-off-by: Frederic Weisbecker <frederic@kernel.org> > > Reviewed-by: Paul E. McKenney <paulmck@kernel.org> > > But it would of course be good to get reviews from the others. I actually don't agree that the magic in the rcu_seq_done_exact() function about the ~2 GPs is related to the lag between rcu_state.gp_seq and root rnp->gp_seq, because the small lag can just as well survive with the rcu_seq_done() function in the above sequence right? The rcu_seq_done_exact() function on the other hand is more about not being stuck in the ULONG_MAX/2 guard band, but to actually get to that, you need a wrap around to happen and the delta between "rnp->gp_seq" and "snap" to be at least ULONG_MAX/2 AFAIU. So the only time this magic will matter is if you have a huge delta between what is being compared, not just 2 GPs. Or, did I miss something? (Also sorry about slow email responses this week as I had my presentation today and was busy preparing this week and attending other presentations at OSPM, I'll provide an update on that soon!). thanks, - Joel > > > --- > > kernel/rcu/rcu.h | 7 +++++++ > > 1 file changed, 7 insertions(+) > > > > diff --git a/kernel/rcu/rcu.h b/kernel/rcu/rcu.h > > index eed2951a4962..7acf1f36dd6c 100644 > > --- a/kernel/rcu/rcu.h > > +++ b/kernel/rcu/rcu.h > > @@ -157,6 +157,13 @@ static inline bool rcu_seq_done(unsigned long *sp, unsigned long s) > > * Given a snapshot from rcu_seq_snap(), determine whether or not a > > * full update-side operation has occurred, but do not allow the > > * (ULONG_MAX / 2) safety-factor/guard-band. > > + * > > + * The token returned by get_state_synchronize_rcu_full() is based on > > + * rcu_state.gp_seq but it is tested in poll_state_synchronize_rcu_full() > > + * against the root rnp->gp_seq. Since rcu_seq_start() is first called > > + * on rcu_state.gp_seq and only later reflected on the root rnp->gp_seq, > > + * it is possible that rcu_seq_snap(rcu_state.gp_seq) returns 2 full grace > > + * periods ahead of the root rnp->gp_seq. > > */ > > static inline bool rcu_seq_done_exact(unsigned long *sp, unsigned long s) > > { > > -- > > 2.48.1 > >
Le Wed, Mar 19, 2025 at 03:38:31PM -0400, Joel Fernandes a écrit : > On Tue, Mar 18, 2025 at 11:37:38AM -0700, Paul E. McKenney wrote: > > On Tue, Mar 18, 2025 at 02:56:18PM +0100, Frederic Weisbecker wrote: > > > The numbers used in rcu_seq_done_exact() lack some explanation behind > > > their magic. Especially after the commit: > > > > > > 85aad7cc4178 ("rcu: Fix get_state_synchronize_rcu_full() GP-start detection") > > > > > > which reported a subtle issue where a new GP sequence snapshot was taken > > > on the root node state while a grace period had already been started and > > > reflected on the global state sequence but not yet on the root node > > > sequence, making a polling user waiting on a wrong already started grace > > > period that would ignore freshly online CPUs. > > > > > > The fix involved taking the snaphot on the global state sequence and > > > waiting on the root node sequence. And since a grace period is first > > > started on the global state and only afterward reflected on the root > > > node, a snapshot taken on the global state sequence might be two full > > > grace periods ahead of the root node as in the following example: > > > > > > rnp->gp_seq = rcu_state.gp_seq = 0 > > > > > > CPU 0 CPU 1 > > > ----- ----- > > > // rcu_state.gp_seq = 1 > > > rcu_seq_start(&rcu_state.gp_seq) > > > // snap = 8 > > > snap = rcu_seq_snap(&rcu_state.gp_seq) > > > // Two full GP differences > > > rcu_seq_done_exact(&rnp->gp_seq, snap) > > > // rnp->gp_seq = 1 > > > WRITE_ONCE(rnp->gp_seq, rcu_state.gp_seq); > > > > > > Add a comment about those expectations and to clarify the magic within > > > the relevant function. > > > > > > Signed-off-by: Frederic Weisbecker <frederic@kernel.org> > > > > Reviewed-by: Paul E. McKenney <paulmck@kernel.org> > > > > But it would of course be good to get reviews from the others. > > I actually don't agree that the magic in the rcu_seq_done_exact() function about the > ~2 GPs is related to the lag between rcu_state.gp_seq and root rnp->gp_seq, > because the small lag can just as well survive with the rcu_seq_done() > function in the above sequence right? > > The rcu_seq_done_exact() function on the other hand is more about not being > stuck in the ULONG_MAX/2 guard band, but to actually get to that, you need a > wrap around to happen and the delta between "rnp->gp_seq" and "snap" to be at > least ULONG_MAX/2 AFAIU. > > So the only time this magic will matter is if you have a huge delta between > what is being compared, not just 2 GPs. You're right, and perhaps I should have made it more specific that my comment only explains the magic "3" number here, in that if it were "2" instead, there could be accidents with 2 full GPs difference (which is possible) spuriously accounted as a wrap around. Thanks. > > Or, did I miss something? > > (Also sorry about slow email responses this week as I had my presentation > today and was busy preparing this week and attending other presentations at > OSPM, I'll provide an update on that soon!). > > thanks, > > - Joel > > > > > > > > > > > > > > --- > > > kernel/rcu/rcu.h | 7 +++++++ > > > 1 file changed, 7 insertions(+) > > > > > > diff --git a/kernel/rcu/rcu.h b/kernel/rcu/rcu.h > > > index eed2951a4962..7acf1f36dd6c 100644 > > > --- a/kernel/rcu/rcu.h > > > +++ b/kernel/rcu/rcu.h > > > @@ -157,6 +157,13 @@ static inline bool rcu_seq_done(unsigned long *sp, unsigned long s) > > > * Given a snapshot from rcu_seq_snap(), determine whether or not a > > > * full update-side operation has occurred, but do not allow the > > > * (ULONG_MAX / 2) safety-factor/guard-band. > > > + * > > > + * The token returned by get_state_synchronize_rcu_full() is based on > > > + * rcu_state.gp_seq but it is tested in poll_state_synchronize_rcu_full() > > > + * against the root rnp->gp_seq. Since rcu_seq_start() is first called > > > + * on rcu_state.gp_seq and only later reflected on the root rnp->gp_seq, > > > + * it is possible that rcu_seq_snap(rcu_state.gp_seq) returns 2 full grace > > > + * periods ahead of the root rnp->gp_seq. > > > */ > > > static inline bool rcu_seq_done_exact(unsigned long *sp, unsigned long s) > > > { > > > -- > > > 2.48.1 > > >
Insomnia kicked in, so 3 am reply here (Zurich local time) ;-): On 3/20/2025 3:15 PM, Frederic Weisbecker wrote: > Le Wed, Mar 19, 2025 at 03:38:31PM -0400, Joel Fernandes a écrit : >> On Tue, Mar 18, 2025 at 11:37:38AM -0700, Paul E. McKenney wrote: >>> On Tue, Mar 18, 2025 at 02:56:18PM +0100, Frederic Weisbecker wrote: >>>> The numbers used in rcu_seq_done_exact() lack some explanation behind >>>> their magic. Especially after the commit: >>>> >>>> 85aad7cc4178 ("rcu: Fix get_state_synchronize_rcu_full() GP-start detection") >>>> >>>> which reported a subtle issue where a new GP sequence snapshot was taken >>>> on the root node state while a grace period had already been started and >>>> reflected on the global state sequence but not yet on the root node >>>> sequence, making a polling user waiting on a wrong already started grace >>>> period that would ignore freshly online CPUs. >>>> >>>> The fix involved taking the snaphot on the global state sequence and >>>> waiting on the root node sequence. And since a grace period is first >>>> started on the global state and only afterward reflected on the root >>>> node, a snapshot taken on the global state sequence might be two full >>>> grace periods ahead of the root node as in the following example: >>>> >>>> rnp->gp_seq = rcu_state.gp_seq = 0 >>>> >>>> CPU 0 CPU 1 >>>> ----- ----- >>>> // rcu_state.gp_seq = 1 >>>> rcu_seq_start(&rcu_state.gp_seq) >>>> // snap = 8 >>>> snap = rcu_seq_snap(&rcu_state.gp_seq) >>>> // Two full GP differences >>>> rcu_seq_done_exact(&rnp->gp_seq, snap) >>>> // rnp->gp_seq = 1 >>>> WRITE_ONCE(rnp->gp_seq, rcu_state.gp_seq); >>>> >>>> Add a comment about those expectations and to clarify the magic within >>>> the relevant function. >>>> >>>> Signed-off-by: Frederic Weisbecker <frederic@kernel.org> >>> Reviewed-by: Paul E. McKenney <paulmck@kernel.org> >>> >>> But it would of course be good to get reviews from the others. >> I actually don't agree that the magic in the rcu_seq_done_exact() function about the >> ~2 GPs is related to the lag between rcu_state.gp_seq and root rnp->gp_seq, >> because the small lag can just as well survive with the rcu_seq_done() >> function in the above sequence right? >> >> The rcu_seq_done_exact() function on the other hand is more about not being >> stuck in the ULONG_MAX/2 guard band, but to actually get to that, you need a >> wrap around to happen and the delta between "rnp->gp_seq" and "snap" to be at >> least ULONG_MAX/2 AFAIU. >> >> So the only time this magic will matter is if you have a huge delta between >> what is being compared, not just 2 GPs. > You're right, and perhaps I should have made it more specific that my comment > only explains the magic "3" number here, in that if it were "2" instead, there > could be accidents with 2 full GPs difference (which is possible) spuriously > accounted as a wrap around. Ahh, so I guess I get it now and we are both right. The complete picture is - We are trying to handle the case of "very large wrap" around but as a part of that, we don't want to create false-positives for this "snap" case. A "snap" can be atmost (2 * RCU_SEQ_STATE_MASK + 1) away from a gp_seq. That's within "2 GPs" worth of counts (about 8 counts) Taking some numbers: cur_s s delta (s - cur_s) 0 4 4 1 8 7 2 8 6 3 8 5 4 8 4 5 12 7 The maximum delta of a snap from actual gp_seq can be (2 * RCU_SEQ_STATE_MASK + 1) which in this case is 7. So we adjust the comparison by adding the ULONG_CMP_LT(cur_s, s - (2 * RCU_SEQ_STATE_MASK + 1)). i.e. after a snap, if we blindly do ULONG_CMP_LT without adjustment, we'll falsely conclude that the GP has completed thinking it was due to wrap around, where as it is possible we just snapped and got a false positive. So I think your comment is mostly correct then. But I think it may be better to clarify that the reason we need rcu_seq_done_exact() and that ULONG_CMP_LT is because we want handle very large wrap around not being stuck in "false negative" territory as we would with rcu_seq_done(). But that also means we can't break the "snap" usecase to the introduction of ULONG_CMP_LT. Unless you beat me to it, I may modify your patch for v6.16 augmented with this reasoning ;) (Also since I am also working on adding that forced wrap around test to rcutorture). Also it is still not fully clear to me what the root node has to do with all this in your example, because the rcu_seq_done_exact() needs to be what it is (that is having that 2 GP adjustment) even if the rnp->gp_seq and rcu_state.gp_seq were in sync? thanks, - Joel
Le Sat, Mar 22, 2025 at 03:06:08AM +0100, Joel Fernandes a écrit : > Insomnia kicked in, so 3 am reply here (Zurich local time) ;-): > > On 3/20/2025 3:15 PM, Frederic Weisbecker wrote: > > Le Wed, Mar 19, 2025 at 03:38:31PM -0400, Joel Fernandes a écrit : > >> On Tue, Mar 18, 2025 at 11:37:38AM -0700, Paul E. McKenney wrote: > >>> On Tue, Mar 18, 2025 at 02:56:18PM +0100, Frederic Weisbecker wrote: > >>>> The numbers used in rcu_seq_done_exact() lack some explanation behind > >>>> their magic. Especially after the commit: > >>>> > >>>> 85aad7cc4178 ("rcu: Fix get_state_synchronize_rcu_full() GP-start detection") > >>>> > >>>> which reported a subtle issue where a new GP sequence snapshot was taken > >>>> on the root node state while a grace period had already been started and > >>>> reflected on the global state sequence but not yet on the root node > >>>> sequence, making a polling user waiting on a wrong already started grace > >>>> period that would ignore freshly online CPUs. > >>>> > >>>> The fix involved taking the snaphot on the global state sequence and > >>>> waiting on the root node sequence. And since a grace period is first > >>>> started on the global state and only afterward reflected on the root > >>>> node, a snapshot taken on the global state sequence might be two full > >>>> grace periods ahead of the root node as in the following example: > >>>> > >>>> rnp->gp_seq = rcu_state.gp_seq = 0 > >>>> > >>>> CPU 0 CPU 1 > >>>> ----- ----- > >>>> // rcu_state.gp_seq = 1 > >>>> rcu_seq_start(&rcu_state.gp_seq) > >>>> // snap = 8 > >>>> snap = rcu_seq_snap(&rcu_state.gp_seq) > >>>> // Two full GP differences > >>>> rcu_seq_done_exact(&rnp->gp_seq, snap) > >>>> // rnp->gp_seq = 1 > >>>> WRITE_ONCE(rnp->gp_seq, rcu_state.gp_seq); > >>>> > >>>> Add a comment about those expectations and to clarify the magic within > >>>> the relevant function. > >>>> > >>>> Signed-off-by: Frederic Weisbecker <frederic@kernel.org> > >>> Reviewed-by: Paul E. McKenney <paulmck@kernel.org> > >>> > >>> But it would of course be good to get reviews from the others. > >> I actually don't agree that the magic in the rcu_seq_done_exact() function about the > >> ~2 GPs is related to the lag between rcu_state.gp_seq and root rnp->gp_seq, > >> because the small lag can just as well survive with the rcu_seq_done() > >> function in the above sequence right? > >> > >> The rcu_seq_done_exact() function on the other hand is more about not being > >> stuck in the ULONG_MAX/2 guard band, but to actually get to that, you need a > >> wrap around to happen and the delta between "rnp->gp_seq" and "snap" to be at > >> least ULONG_MAX/2 AFAIU. > >> > >> So the only time this magic will matter is if you have a huge delta between > >> what is being compared, not just 2 GPs. > > You're right, and perhaps I should have made it more specific that my comment > > only explains the magic "3" number here, in that if it were "2" instead, there > > could be accidents with 2 full GPs difference (which is possible) spuriously > > accounted as a wrap around. > > Ahh, so I guess I get it now and we are both right. The complete picture is - We > are trying to handle the case of "very large wrap" around but as a part of that, > we don't want to create false-positives for this "snap" case. > > A "snap" can be atmost (2 * RCU_SEQ_STATE_MASK + 1) away from a gp_seq. > > That's within "2 GPs" worth of counts (about 8 counts) > > Taking some numbers: > > cur_s s delta (s - cur_s) > 0 4 4 > 1 8 7 > 2 8 6 > 3 8 5 > 4 8 4 > 5 12 7 > > The maximum delta of a snap from actual gp_seq can be (2 * RCU_SEQ_STATE_MASK + > 1) which in this case is 7. > > So we adjust the comparison by adding the ULONG_CMP_LT(cur_s, s - (2 * > RCU_SEQ_STATE_MASK + 1)). i.e. 3, right? > > after a snap, if we blindly do ULONG_CMP_LT without adjustment, we'll falsely > conclude that the GP has completed thinking it was due to wrap around, where as > it is possible we just snapped and got a false positive. > > So I think your comment is mostly correct then. But I think it may be better to > clarify that the reason we need rcu_seq_done_exact() and that ULONG_CMP_LT is > because we want handle very large wrap around not being stuck in "false > negative" territory as we would with rcu_seq_done(). But that also means we > can't break the "snap" usecase to the introduction of ULONG_CMP_LT. Indeed. > > Unless you beat me to it, I may modify your patch for v6.16 augmented with this > reasoning ;) (Also since I am also working on adding that forced wrap around > test to rcutorture). Please do, I appreciate! > > Also it is still not fully clear to me what the root node has to do with all > this in your example, because the rcu_seq_done_exact() needs to be what it is > (that is having that 2 GP adjustment) even if the rnp->gp_seq and > rcu_state.gp_seq were in sync? Yes, this is only to explain that the maximum drift between the snap on rsp and the current state on root rnp can be at most 2 full GP. And that explain the "3" magic in the function. But if they were in sync it's all fine. Thanks. > > thanks, > > - Joel >
On 3/22/2025 11:25 AM, Frederic Weisbecker wrote: > Le Sat, Mar 22, 2025 at 03:06:08AM +0100, Joel Fernandes a écrit : >> Insomnia kicked in, so 3 am reply here (Zurich local time) ;-): >> >> On 3/20/2025 3:15 PM, Frederic Weisbecker wrote: >>> Le Wed, Mar 19, 2025 at 03:38:31PM -0400, Joel Fernandes a écrit : >>>> On Tue, Mar 18, 2025 at 11:37:38AM -0700, Paul E. McKenney wrote: >>>>> On Tue, Mar 18, 2025 at 02:56:18PM +0100, Frederic Weisbecker wrote: >>>>>> The numbers used in rcu_seq_done_exact() lack some explanation behind >>>>>> their magic. Especially after the commit: >>>>>> >>>>>> 85aad7cc4178 ("rcu: Fix get_state_synchronize_rcu_full() GP-start detection") >>>>>> >>>>>> which reported a subtle issue where a new GP sequence snapshot was taken >>>>>> on the root node state while a grace period had already been started and >>>>>> reflected on the global state sequence but not yet on the root node >>>>>> sequence, making a polling user waiting on a wrong already started grace >>>>>> period that would ignore freshly online CPUs. >>>>>> >>>>>> The fix involved taking the snaphot on the global state sequence and >>>>>> waiting on the root node sequence. And since a grace period is first >>>>>> started on the global state and only afterward reflected on the root >>>>>> node, a snapshot taken on the global state sequence might be two full >>>>>> grace periods ahead of the root node as in the following example: >>>>>> >>>>>> rnp->gp_seq = rcu_state.gp_seq = 0 >>>>>> >>>>>> CPU 0 CPU 1 >>>>>> ----- ----- >>>>>> // rcu_state.gp_seq = 1 >>>>>> rcu_seq_start(&rcu_state.gp_seq) >>>>>> // snap = 8 >>>>>> snap = rcu_seq_snap(&rcu_state.gp_seq) >>>>>> // Two full GP differences >>>>>> rcu_seq_done_exact(&rnp->gp_seq, snap) >>>>>> // rnp->gp_seq = 1 >>>>>> WRITE_ONCE(rnp->gp_seq, rcu_state.gp_seq); >>>>>> >>>>>> Add a comment about those expectations and to clarify the magic within >>>>>> the relevant function. >>>>>> >>>>>> Signed-off-by: Frederic Weisbecker <frederic@kernel.org> >>>>> Reviewed-by: Paul E. McKenney <paulmck@kernel.org> >>>>> >>>>> But it would of course be good to get reviews from the others. >>>> I actually don't agree that the magic in the rcu_seq_done_exact() function about the >>>> ~2 GPs is related to the lag between rcu_state.gp_seq and root rnp->gp_seq, >>>> because the small lag can just as well survive with the rcu_seq_done() >>>> function in the above sequence right? >>>> >>>> The rcu_seq_done_exact() function on the other hand is more about not being >>>> stuck in the ULONG_MAX/2 guard band, but to actually get to that, you need a >>>> wrap around to happen and the delta between "rnp->gp_seq" and "snap" to be at >>>> least ULONG_MAX/2 AFAIU. >>>> >>>> So the only time this magic will matter is if you have a huge delta between >>>> what is being compared, not just 2 GPs. >>> You're right, and perhaps I should have made it more specific that my comment >>> only explains the magic "3" number here, in that if it were "2" instead, there >>> could be accidents with 2 full GPs difference (which is possible) spuriously >>> accounted as a wrap around. >> >> Ahh, so I guess I get it now and we are both right. The complete picture is - We >> are trying to handle the case of "very large wrap" around but as a part of that, >> we don't want to create false-positives for this "snap" case. >> >> A "snap" can be atmost (2 * RCU_SEQ_STATE_MASK + 1) away from a gp_seq. >> >> That's within "2 GPs" worth of counts (about 8 counts) >> >> Taking some numbers: >> >> cur_s s delta (s - cur_s) >> 0 4 4 >> 1 8 7 >> 2 8 6 >> 3 8 5 >> 4 8 4 >> 5 12 7 >> >> The maximum delta of a snap from actual gp_seq can be (2 * RCU_SEQ_STATE_MASK + >> 1) which in this case is 7. >> >> So we adjust the comparison by adding the ULONG_CMP_LT(cur_s, s - (2 * >> RCU_SEQ_STATE_MASK + 1)). i.e. > > 3, right? Just to be absolutely sure, are you talking about the value of RCU_SEQ_STATE_MASK ? That is indeed 3 (RCU_SEQ_STATE_MASK). But if we're talking about number of GPs, my understanding is a count of 4 is one GP worth. Per the above table, the delta between gp_seq and is snap is always a count of 7 (hence less than 2 GPs). Agreed? If RCU_SEQ_STATE_MASK was 0x1 instead of 0x11, that is a single bit (or a count of 2 instead of 4, for a GP), then the table would be: cur_s s (snap) delta (s - cur_s) 0 2 2 1 4 3 2 4 2 3 6 3 4 6 2 5 8 3 So delta is always <= 3, Or more generally: <= (RCU_SEQ_STATE_MASK * 2) + 1 >> Unless you beat me to it, I may modify your patch for v6.16 augmented with this >> reasoning ;) (Also since I am also working on adding that forced wrap around >> test to rcutorture). > > Please do, I appreciate! Great, will do. >> Also it is still not fully clear to me what the root node has to do with all >> this in your example, because the rcu_seq_done_exact() needs to be what it is >> (that is having that 2 GP adjustment) even if the rnp->gp_seq and >> rcu_state.gp_seq were in sync? > > Yes, this is only to explain that the maximum drift between the snap on rsp > and the current state on root rnp can be at most 2 full GP. And that explain the "3" > magic in the function. But if they were in sync it's all fine. Right. But I would argue that, even if things were in sync, its not fine to drop the magic number. Because you cannot take a gp_seq and its corresponding "snap" and compare them instantly without the "ULONG_CMP_LT". If you do, you'd get a false positive. But as you did, we could say that the lag between rnp and rsp's gp_seq values is also covered by this "ULONG_CMP_LT" expression. Glad we are discussing this in detail so that once we forget all this in 24 hours, we can go back to the archives ;-) ;-) thanks, - Joel
On 3/22/2025 3:20 PM, Joel Fernandes wrote: > > On 3/22/2025 11:25 AM, Frederic Weisbecker wrote: >> Le Sat, Mar 22, 2025 at 03:06:08AM +0100, Joel Fernandes a écrit : >>> Insomnia kicked in, so 3 am reply here (Zurich local time) ;-): >>> >>> On 3/20/2025 3:15 PM, Frederic Weisbecker wrote: >>>> Le Wed, Mar 19, 2025 at 03:38:31PM -0400, Joel Fernandes a écrit : >>>>> On Tue, Mar 18, 2025 at 11:37:38AM -0700, Paul E. McKenney wrote: >>>>>> On Tue, Mar 18, 2025 at 02:56:18PM +0100, Frederic Weisbecker wrote: >>>>>>> The numbers used in rcu_seq_done_exact() lack some explanation behind >>>>>>> their magic. Especially after the commit: >>>>>>> >>>>>>> 85aad7cc4178 ("rcu: Fix get_state_synchronize_rcu_full() GP-start detection") >>>>>>> >>>>>>> which reported a subtle issue where a new GP sequence snapshot was taken >>>>>>> on the root node state while a grace period had already been started and >>>>>>> reflected on the global state sequence but not yet on the root node >>>>>>> sequence, making a polling user waiting on a wrong already started grace >>>>>>> period that would ignore freshly online CPUs. >>>>>>> >>>>>>> The fix involved taking the snaphot on the global state sequence and >>>>>>> waiting on the root node sequence. And since a grace period is first >>>>>>> started on the global state and only afterward reflected on the root >>>>>>> node, a snapshot taken on the global state sequence might be two full >>>>>>> grace periods ahead of the root node as in the following example: >>>>>>> >>>>>>> rnp->gp_seq = rcu_state.gp_seq = 0 >>>>>>> >>>>>>> CPU 0 CPU 1 >>>>>>> ----- ----- >>>>>>> // rcu_state.gp_seq = 1 >>>>>>> rcu_seq_start(&rcu_state.gp_seq) >>>>>>> // snap = 8 >>>>>>> snap = rcu_seq_snap(&rcu_state.gp_seq) >>>>>>> // Two full GP differences >>>>>>> rcu_seq_done_exact(&rnp->gp_seq, snap) >>>>>>> // rnp->gp_seq = 1 >>>>>>> WRITE_ONCE(rnp->gp_seq, rcu_state.gp_seq); >>>>>>> >>>>>>> Add a comment about those expectations and to clarify the magic within >>>>>>> the relevant function. >>>>>>> >>>>>>> Signed-off-by: Frederic Weisbecker <frederic@kernel.org> >>>>>> Reviewed-by: Paul E. McKenney <paulmck@kernel.org> >>>>>> >>>>>> But it would of course be good to get reviews from the others. >>>>> I actually don't agree that the magic in the rcu_seq_done_exact() function about the >>>>> ~2 GPs is related to the lag between rcu_state.gp_seq and root rnp->gp_seq, >>>>> because the small lag can just as well survive with the rcu_seq_done() >>>>> function in the above sequence right? >>>>> >>>>> The rcu_seq_done_exact() function on the other hand is more about not being >>>>> stuck in the ULONG_MAX/2 guard band, but to actually get to that, you need a >>>>> wrap around to happen and the delta between "rnp->gp_seq" and "snap" to be at >>>>> least ULONG_MAX/2 AFAIU. >>>>> >>>>> So the only time this magic will matter is if you have a huge delta between >>>>> what is being compared, not just 2 GPs. >>>> You're right, and perhaps I should have made it more specific that my comment >>>> only explains the magic "3" number here, in that if it were "2" instead, there >>>> could be accidents with 2 full GPs difference (which is possible) spuriously >>>> accounted as a wrap around. >>> Ahh, so I guess I get it now and we are both right. The complete picture is - We >>> are trying to handle the case of "very large wrap" around but as a part of that, >>> we don't want to create false-positives for this "snap" case. >>> >>> A "snap" can be atmost (2 * RCU_SEQ_STATE_MASK + 1) away from a gp_seq. >>> >>> That's within "2 GPs" worth of counts (about 8 counts) >>> >>> Taking some numbers: >>> >>> cur_s s delta (s - cur_s) >>> 0 4 4 >>> 1 8 7 >>> 2 8 6 >>> 3 8 5 >>> 4 8 4 >>> 5 12 7 >>> >>> The maximum delta of a snap from actual gp_seq can be (2 * RCU_SEQ_STATE_MASK + >>> 1) which in this case is 7. >>> >>> So we adjust the comparison by adding the ULONG_CMP_LT(cur_s, s - (2 * >>> RCU_SEQ_STATE_MASK + 1)). i.e. >> 3, right? > Just to be absolutely sure, are you talking about the value of RCU_SEQ_STATE_MASK ? > > That is indeed 3 (RCU_SEQ_STATE_MASK). > > But if we're talking about number of GPs, my understanding is a count of 4 is > one GP worth. Per the above table, the delta between gp_seq and is snap is > always a count of 7 (hence less than 2 GPs). > > Agreed? > > If RCU_SEQ_STATE_MASK was 0x1 instead of 0x11, that is a single bit (or a count > of 2 instead of 4, for a GP), then the table would be: > > cur_s s (snap) delta (s - cur_s) > 0 2 2 > 1 4 3 > 2 4 2 > 3 6 3 > 4 6 2 > 5 8 3 > > So delta is always <= 3, Or more generally: <= (RCU_SEQ_STATE_MASK * 2) + 1 Oh man, I am wondering if we are on to a bug here: From your example: CPU 0 CPU 1 ----- ----- // rcu_state.gp_seq = 1 rcu_seq_start(&rcu_state.gp_seq) // snap = 8 snap = rcu_seq_snap(&rcu_state.gp_seq) // Two full GP rcu_seq_done_exact(&rnp->gp_seq, snap) Here, the ULONG_CMP_LT(cur_s, s - (2 * RCU_SEQ_STATE_MASK + 1)); Will be ULONG_CMP_LT(0, 8 - (2 * RCU_SEQ_STATE_MASK + 1)); = ULONG_CMP_LT(0, 8 - 7) = TRUE. Which means rcu_seq_done_exact() will return a false positive saying the GP has completed even though it has not. I think rcu_seq_done_exact() is off by one and should be doing: ULONG_CMP_LT(cur_s, s - (2 * RCU_SEQ_STATE_MASK + 2)); ? Oh and I see from an old email that Frederic did ask about this: "Should it be ULONG_CMP_LT(cur_s, s - (2 * (RCU_SEQ_STATE_MASK + 1))) ?" thanks, - Joel
Le Sat, Mar 22, 2025 at 03:40:53PM +0100, Joel Fernandes a écrit : > > > On 3/22/2025 3:20 PM, Joel Fernandes wrote: > > > > On 3/22/2025 11:25 AM, Frederic Weisbecker wrote: > >> Le Sat, Mar 22, 2025 at 03:06:08AM +0100, Joel Fernandes a écrit : > >>> Insomnia kicked in, so 3 am reply here (Zurich local time) ;-): > >>> > >>> On 3/20/2025 3:15 PM, Frederic Weisbecker wrote: > >>>> Le Wed, Mar 19, 2025 at 03:38:31PM -0400, Joel Fernandes a écrit : > >>>>> On Tue, Mar 18, 2025 at 11:37:38AM -0700, Paul E. McKenney wrote: > >>>>>> On Tue, Mar 18, 2025 at 02:56:18PM +0100, Frederic Weisbecker wrote: > >>>>>>> The numbers used in rcu_seq_done_exact() lack some explanation behind > >>>>>>> their magic. Especially after the commit: > >>>>>>> > >>>>>>> 85aad7cc4178 ("rcu: Fix get_state_synchronize_rcu_full() GP-start detection") > >>>>>>> > >>>>>>> which reported a subtle issue where a new GP sequence snapshot was taken > >>>>>>> on the root node state while a grace period had already been started and > >>>>>>> reflected on the global state sequence but not yet on the root node > >>>>>>> sequence, making a polling user waiting on a wrong already started grace > >>>>>>> period that would ignore freshly online CPUs. > >>>>>>> > >>>>>>> The fix involved taking the snaphot on the global state sequence and > >>>>>>> waiting on the root node sequence. And since a grace period is first > >>>>>>> started on the global state and only afterward reflected on the root > >>>>>>> node, a snapshot taken on the global state sequence might be two full > >>>>>>> grace periods ahead of the root node as in the following example: > >>>>>>> > >>>>>>> rnp->gp_seq = rcu_state.gp_seq = 0 > >>>>>>> > >>>>>>> CPU 0 CPU 1 > >>>>>>> ----- ----- > >>>>>>> // rcu_state.gp_seq = 1 > >>>>>>> rcu_seq_start(&rcu_state.gp_seq) > >>>>>>> // snap = 8 > >>>>>>> snap = rcu_seq_snap(&rcu_state.gp_seq) > >>>>>>> // Two full GP differences > >>>>>>> rcu_seq_done_exact(&rnp->gp_seq, snap) > >>>>>>> // rnp->gp_seq = 1 > >>>>>>> WRITE_ONCE(rnp->gp_seq, rcu_state.gp_seq); > >>>>>>> > >>>>>>> Add a comment about those expectations and to clarify the magic within > >>>>>>> the relevant function. > >>>>>>> > >>>>>>> Signed-off-by: Frederic Weisbecker <frederic@kernel.org> > >>>>>> Reviewed-by: Paul E. McKenney <paulmck@kernel.org> > >>>>>> > >>>>>> But it would of course be good to get reviews from the others. > >>>>> I actually don't agree that the magic in the rcu_seq_done_exact() function about the > >>>>> ~2 GPs is related to the lag between rcu_state.gp_seq and root rnp->gp_seq, > >>>>> because the small lag can just as well survive with the rcu_seq_done() > >>>>> function in the above sequence right? > >>>>> > >>>>> The rcu_seq_done_exact() function on the other hand is more about not being > >>>>> stuck in the ULONG_MAX/2 guard band, but to actually get to that, you need a > >>>>> wrap around to happen and the delta between "rnp->gp_seq" and "snap" to be at > >>>>> least ULONG_MAX/2 AFAIU. > >>>>> > >>>>> So the only time this magic will matter is if you have a huge delta between > >>>>> what is being compared, not just 2 GPs. > >>>> You're right, and perhaps I should have made it more specific that my comment > >>>> only explains the magic "3" number here, in that if it were "2" instead, there > >>>> could be accidents with 2 full GPs difference (which is possible) spuriously > >>>> accounted as a wrap around. > >>> Ahh, so I guess I get it now and we are both right. The complete picture is - We > >>> are trying to handle the case of "very large wrap" around but as a part of that, > >>> we don't want to create false-positives for this "snap" case. > >>> > >>> A "snap" can be atmost (2 * RCU_SEQ_STATE_MASK + 1) away from a gp_seq. > >>> > >>> That's within "2 GPs" worth of counts (about 8 counts) > >>> > >>> Taking some numbers: > >>> > >>> cur_s s delta (s - cur_s) > >>> 0 4 4 > >>> 1 8 7 > >>> 2 8 6 > >>> 3 8 5 > >>> 4 8 4 > >>> 5 12 7 > >>> > >>> The maximum delta of a snap from actual gp_seq can be (2 * RCU_SEQ_STATE_MASK + > >>> 1) which in this case is 7. > >>> > >>> So we adjust the comparison by adding the ULONG_CMP_LT(cur_s, s - (2 * > >>> RCU_SEQ_STATE_MASK + 1)). i.e. > >> 3, right? > > Just to be absolutely sure, are you talking about the value of RCU_SEQ_STATE_MASK ? > > > > That is indeed 3 (RCU_SEQ_STATE_MASK). > > > > But if we're talking about number of GPs, my understanding is a count of 4 is > > one GP worth. Per the above table, the delta between gp_seq and is snap is > > always a count of 7 (hence less than 2 GPs). > > > > Agreed? > > > > If RCU_SEQ_STATE_MASK was 0x1 instead of 0x11, that is a single bit (or a count > > of 2 instead of 4, for a GP), then the table would be: > > > > cur_s s (snap) delta (s - cur_s) > > 0 2 2 > > 1 4 3 > > 2 4 2 > > 3 6 3 > > 4 6 2 > > 5 8 3 > > > > So delta is always <= 3, Or more generally: <= (RCU_SEQ_STATE_MASK * 2) + 1 > > Oh man, I am wondering if we are on to a bug here: > > From your example: > > CPU 0 CPU 1 > ----- ----- > // rcu_state.gp_seq = 1 > rcu_seq_start(&rcu_state.gp_seq) > // snap = 8 > snap = rcu_seq_snap(&rcu_state.gp_seq) > // Two full GP > rcu_seq_done_exact(&rnp->gp_seq, snap) > > > Here, the > ULONG_CMP_LT(cur_s, s - (2 * RCU_SEQ_STATE_MASK + 1)); > > Will be > ULONG_CMP_LT(0, 8 - (2 * RCU_SEQ_STATE_MASK + 1)); > > = ULONG_CMP_LT(0, 8 - 7) > > = TRUE. > > Which means rcu_seq_done_exact() will return a false positive saying the GP has > completed even though it has not. > > I think rcu_seq_done_exact() is off by one and should be doing: > > ULONG_CMP_LT(cur_s, s - (2 * RCU_SEQ_STATE_MASK + 2)); > > ? But it's ULONG_CMP_LT(cur_s, s - (3 * RCU_SEQ_STATE_MASK + 1) now since: 85aad7cc4178 ("rcu: Fix get_state_synchronize_rcu_full() GP-start detection") That's 10 so we are good. However that magic value is arbitrary and doesn't mean much. It should be like you said. Or rather for clarity: diff --git a/kernel/rcu/rcu.h b/kernel/rcu/rcu.h index 7acf1f36dd6c..e53f0b687a83 100644 --- a/kernel/rcu/rcu.h +++ b/kernel/rcu/rcu.h @@ -57,6 +57,10 @@ /* Low-order bit definition for polled grace-period APIs. */ #define RCU_GET_STATE_COMPLETED 0x1 +/* A complete grace period count */ +#define RCU_SEQ_GP (RCU_SEQ_STATE_MASK + 1) + + extern int sysctl_sched_rt_runtime; /* @@ -169,7 +173,7 @@ static inline bool rcu_seq_done_exact(unsigned long *sp, unsigned long s) { unsigned long cur_s = READ_ONCE(*sp); - return ULONG_CMP_GE(cur_s, s) || ULONG_CMP_LT(cur_s, s - (3 * RCU_SEQ_STATE_MASK + 1)); + return ULONG_CMP_GE(cur_s, s) || ULONG_CMP_LT(cur_s, s - (2 * RCU_SEQ_GP)); } /*
> On Mar 23, 2025, at 11:05 PM, Frederic Weisbecker <frederic@kernel.org> wrote: > > Le Sat, Mar 22, 2025 at 03:40:53PM +0100, Joel Fernandes a écrit : >> >> >>> On 3/22/2025 3:20 PM, Joel Fernandes wrote: >>> >>> On 3/22/2025 11:25 AM, Frederic Weisbecker wrote: >>>> Le Sat, Mar 22, 2025 at 03:06:08AM +0100, Joel Fernandes a écrit : >>>>> Insomnia kicked in, so 3 am reply here (Zurich local time) ;-): >>>>> >>>>> On 3/20/2025 3:15 PM, Frederic Weisbecker wrote: >>>>>> Le Wed, Mar 19, 2025 at 03:38:31PM -0400, Joel Fernandes a écrit : >>>>>>> On Tue, Mar 18, 2025 at 11:37:38AM -0700, Paul E. McKenney wrote: >>>>>>>> On Tue, Mar 18, 2025 at 02:56:18PM +0100, Frederic Weisbecker wrote: >>>>>>>>> The numbers used in rcu_seq_done_exact() lack some explanation behind >>>>>>>>> their magic. Especially after the commit: >>>>>>>>> >>>>>>>>> 85aad7cc4178 ("rcu: Fix get_state_synchronize_rcu_full() GP-start detection") >>>>>>>>> >>>>>>>>> which reported a subtle issue where a new GP sequence snapshot was taken >>>>>>>>> on the root node state while a grace period had already been started and >>>>>>>>> reflected on the global state sequence but not yet on the root node >>>>>>>>> sequence, making a polling user waiting on a wrong already started grace >>>>>>>>> period that would ignore freshly online CPUs. >>>>>>>>> >>>>>>>>> The fix involved taking the snaphot on the global state sequence and >>>>>>>>> waiting on the root node sequence. And since a grace period is first >>>>>>>>> started on the global state and only afterward reflected on the root >>>>>>>>> node, a snapshot taken on the global state sequence might be two full >>>>>>>>> grace periods ahead of the root node as in the following example: >>>>>>>>> >>>>>>>>> rnp->gp_seq = rcu_state.gp_seq = 0 >>>>>>>>> >>>>>>>>> CPU 0 CPU 1 >>>>>>>>> ----- ----- >>>>>>>>> // rcu_state.gp_seq = 1 >>>>>>>>> rcu_seq_start(&rcu_state.gp_seq) >>>>>>>>> // snap = 8 >>>>>>>>> snap = rcu_seq_snap(&rcu_state.gp_seq) >>>>>>>>> // Two full GP differences >>>>>>>>> rcu_seq_done_exact(&rnp->gp_seq, snap) >>>>>>>>> // rnp->gp_seq = 1 >>>>>>>>> WRITE_ONCE(rnp->gp_seq, rcu_state.gp_seq); >>>>>>>>> >>>>>>>>> Add a comment about those expectations and to clarify the magic within >>>>>>>>> the relevant function. >>>>>>>>> >>>>>>>>> Signed-off-by: Frederic Weisbecker <frederic@kernel.org> >>>>>>>> Reviewed-by: Paul E. McKenney <paulmck@kernel.org> >>>>>>>> >>>>>>>> But it would of course be good to get reviews from the others. >>>>>>> I actually don't agree that the magic in the rcu_seq_done_exact() function about the >>>>>>> ~2 GPs is related to the lag between rcu_state.gp_seq and root rnp->gp_seq, >>>>>>> because the small lag can just as well survive with the rcu_seq_done() >>>>>>> function in the above sequence right? >>>>>>> >>>>>>> The rcu_seq_done_exact() function on the other hand is more about not being >>>>>>> stuck in the ULONG_MAX/2 guard band, but to actually get to that, you need a >>>>>>> wrap around to happen and the delta between "rnp->gp_seq" and "snap" to be at >>>>>>> least ULONG_MAX/2 AFAIU. >>>>>>> >>>>>>> So the only time this magic will matter is if you have a huge delta between >>>>>>> what is being compared, not just 2 GPs. >>>>>> You're right, and perhaps I should have made it more specific that my comment >>>>>> only explains the magic "3" number here, in that if it were "2" instead, there >>>>>> could be accidents with 2 full GPs difference (which is possible) spuriously >>>>>> accounted as a wrap around. >>>>> Ahh, so I guess I get it now and we are both right. The complete picture is - We >>>>> are trying to handle the case of "very large wrap" around but as a part of that, >>>>> we don't want to create false-positives for this "snap" case. >>>>> >>>>> A "snap" can be atmost (2 * RCU_SEQ_STATE_MASK + 1) away from a gp_seq. >>>>> >>>>> That's within "2 GPs" worth of counts (about 8 counts) >>>>> >>>>> Taking some numbers: >>>>> >>>>> cur_s s delta (s - cur_s) >>>>> 0 4 4 >>>>> 1 8 7 >>>>> 2 8 6 >>>>> 3 8 5 >>>>> 4 8 4 >>>>> 5 12 7 >>>>> >>>>> The maximum delta of a snap from actual gp_seq can be (2 * RCU_SEQ_STATE_MASK + >>>>> 1) which in this case is 7. >>>>> >>>>> So we adjust the comparison by adding the ULONG_CMP_LT(cur_s, s - (2 * >>>>> RCU_SEQ_STATE_MASK + 1)). i.e. >>>> 3, right? >>> Just to be absolutely sure, are you talking about the value of RCU_SEQ_STATE_MASK ? >>> >>> That is indeed 3 (RCU_SEQ_STATE_MASK). >>> >>> But if we're talking about number of GPs, my understanding is a count of 4 is >>> one GP worth. Per the above table, the delta between gp_seq and is snap is >>> always a count of 7 (hence less than 2 GPs). >>> >>> Agreed? >>> >>> If RCU_SEQ_STATE_MASK was 0x1 instead of 0x11, that is a single bit (or a count >>> of 2 instead of 4, for a GP), then the table would be: >>> >>> cur_s s (snap) delta (s - cur_s) >>> 0 2 2 >>> 1 4 3 >>> 2 4 2 >>> 3 6 3 >>> 4 6 2 >>> 5 8 3 >>> >>> So delta is always <= 3, Or more generally: <= (RCU_SEQ_STATE_MASK * 2) + 1 >> >> Oh man, I am wondering if we are on to a bug here: >> >> From your example: >> >> CPU 0 CPU 1 >> ----- ----- >> // rcu_state.gp_seq = 1 >> rcu_seq_start(&rcu_state.gp_seq) >> // snap = 8 >> snap = rcu_seq_snap(&rcu_state.gp_seq) >> // Two full GP >> rcu_seq_done_exact(&rnp->gp_seq, snap) >> >> >> Here, the >> ULONG_CMP_LT(cur_s, s - (2 * RCU_SEQ_STATE_MASK + 1)); >> >> Will be >> ULONG_CMP_LT(0, 8 - (2 * RCU_SEQ_STATE_MASK + 1)); >> >> = ULONG_CMP_LT(0, 8 - 7) >> >> = TRUE. >> >> Which means rcu_seq_done_exact() will return a false positive saying the GP has >> completed even though it has not. >> >> I think rcu_seq_done_exact() is off by one and should be doing: >> >> ULONG_CMP_LT(cur_s, s - (2 * RCU_SEQ_STATE_MASK + 2)); >> >> ? > > But it's ULONG_CMP_LT(cur_s, s - (3 * RCU_SEQ_STATE_MASK + 1) now since: > > 85aad7cc4178 ("rcu: Fix get_state_synchronize_rcu_full() GP-start detection") > > That's 10 so we are good. > > However that magic value is arbitrary and doesn't mean much. It should be > like you said. Or rather for clarity: > > diff --git a/kernel/rcu/rcu.h b/kernel/rcu/rcu.h > index 7acf1f36dd6c..e53f0b687a83 100644 > --- a/kernel/rcu/rcu.h > +++ b/kernel/rcu/rcu.h > @@ -57,6 +57,10 @@ > /* Low-order bit definition for polled grace-period APIs. */ > #define RCU_GET_STATE_COMPLETED 0x1 > > +/* A complete grace period count */ > +#define RCU_SEQ_GP (RCU_SEQ_STATE_MASK + 1) > + > + > extern int sysctl_sched_rt_runtime; > > /* > @@ -169,7 +173,7 @@ static inline bool rcu_seq_done_exact(unsigned long *sp, unsigned long s) > { > unsigned long cur_s = READ_ONCE(*sp); > > - return ULONG_CMP_GE(cur_s, s) || ULONG_CMP_LT(cur_s, s - (3 * RCU_SEQ_STATE_MASK + 1)); > + return ULONG_CMP_GE(cur_s, s) || ULONG_CMP_LT(cur_s, s - (2 * RCU_SEQ_GP)); > } Ah, my kernel did not have the change at the time I commented, sorry. I agree that this is a much better and meaningful expression than the existing one. I shall create a patch with it and send it out with my other series on forcing the wrap for testing (along with the modification for the new comments you added). Thanks! > > /* > >
diff --git a/kernel/rcu/rcu.h b/kernel/rcu/rcu.h index eed2951a4962..7acf1f36dd6c 100644 --- a/kernel/rcu/rcu.h +++ b/kernel/rcu/rcu.h @@ -157,6 +157,13 @@ static inline bool rcu_seq_done(unsigned long *sp, unsigned long s) * Given a snapshot from rcu_seq_snap(), determine whether or not a * full update-side operation has occurred, but do not allow the * (ULONG_MAX / 2) safety-factor/guard-band. + * + * The token returned by get_state_synchronize_rcu_full() is based on + * rcu_state.gp_seq but it is tested in poll_state_synchronize_rcu_full() + * against the root rnp->gp_seq. Since rcu_seq_start() is first called + * on rcu_state.gp_seq and only later reflected on the root rnp->gp_seq, + * it is possible that rcu_seq_snap(rcu_state.gp_seq) returns 2 full grace + * periods ahead of the root rnp->gp_seq. */ static inline bool rcu_seq_done_exact(unsigned long *sp, unsigned long s) {
The numbers used in rcu_seq_done_exact() lack some explanation behind their magic. Especially after the commit: 85aad7cc4178 ("rcu: Fix get_state_synchronize_rcu_full() GP-start detection") which reported a subtle issue where a new GP sequence snapshot was taken on the root node state while a grace period had already been started and reflected on the global state sequence but not yet on the root node sequence, making a polling user waiting on a wrong already started grace period that would ignore freshly online CPUs. The fix involved taking the snaphot on the global state sequence and waiting on the root node sequence. And since a grace period is first started on the global state and only afterward reflected on the root node, a snapshot taken on the global state sequence might be two full grace periods ahead of the root node as in the following example: rnp->gp_seq = rcu_state.gp_seq = 0 CPU 0 CPU 1 ----- ----- // rcu_state.gp_seq = 1 rcu_seq_start(&rcu_state.gp_seq) // snap = 8 snap = rcu_seq_snap(&rcu_state.gp_seq) // Two full GP differences rcu_seq_done_exact(&rnp->gp_seq, snap) // rnp->gp_seq = 1 WRITE_ONCE(rnp->gp_seq, rcu_state.gp_seq); Add a comment about those expectations and to clarify the magic within the relevant function. Signed-off-by: Frederic Weisbecker <frederic@kernel.org> --- kernel/rcu/rcu.h | 7 +++++++ 1 file changed, 7 insertions(+)