Message ID | 20221130181325.1012760-15-paulmck@kernel.org (mailing list archive) |
---|---|
State | Not Applicable |
Delegated to: | Netdev Maintainers |
Series | None |
Hi Eric,

Could you give your ACK for this patch as well? This is the other
networking one.

The networking testing passed on ChromeOS and it has been in -next for
some time, so it has gotten testing there. The CONFIG option is default
disabled.

Thanks a lot,

 - Joel

On Wed, Nov 30, 2022 at 6:14 PM Paul E. McKenney <paulmck@kernel.org> wrote:
>
> From: "Joel Fernandes (Google)" <joel@joelfernandes.org>
>
> In a networking test on ChromeOS, kernels built with the new
> CONFIG_RCU_LAZY=y Kconfig option fail a networking test in the teardown
> phase.
>
> This failure may be reproduced as follows: ip netns del <name>
>
> The CONFIG_RCU_LAZY=y Kconfig option was introduced by earlier commits
> in this series for the benefit of certain battery-powered systems.
> This Kconfig option causes call_rcu() to delay its callbacks in order
> to batch them. This means that a given RCU grace period covers more
> callbacks, thus reducing the number of grace periods, in turn reducing
> the amount of energy consumed, which increases battery lifetime, which
> can be a very good thing. This is not a subtle effect: In some important
> use cases, the battery lifetime is increased by more than 10%.
>
> This CONFIG_RCU_LAZY=y option is available only for CPUs that offload
> callbacks, for example, CPUs mentioned in the rcu_nocbs kernel boot
> parameter passed to kernels built with CONFIG_RCU_NOCB_CPU=y.
>
> Delaying callbacks is normally not a problem because most callbacks do
> nothing but free memory. If the system is short on memory, a shrinker
> will kick all currently queued lazy callbacks out of their laziness,
> thus freeing their memory in short order. Similarly, the rcu_barrier()
> function, which blocks until all currently queued callbacks are invoked,
> will also kick lazy callbacks, thus enabling rcu_barrier() to complete
> in a timely manner.
>
> However, there are some cases where laziness is not a good option.
> For example, synchronize_rcu() invokes call_rcu(), and blocks until
> the newly queued callback is invoked. It would not be good for
> synchronize_rcu() to block for ten seconds, even on an idle system.
> Therefore, synchronize_rcu() invokes call_rcu_hurry() instead of
> call_rcu(). The arrival of a non-lazy call_rcu_hurry() callback on a
> given CPU kicks any lazy callbacks that might be already queued on that
> CPU. After all, if there is going to be a grace period, all callbacks
> might as well get full benefit from it.
>
> Yes, this could be done the other way around by creating a
> call_rcu_lazy(), but earlier experience with this approach and
> feedback at the 2022 Linux Plumbers Conference shifted the approach
> to call_rcu() being lazy with call_rcu_hurry() for the few places
> where laziness is inappropriate.
>
> Returning to the test failure, use of ftrace showed that this failure
> was caused by the added delays due to this new lazy behavior of
> call_rcu() in kernels built with CONFIG_RCU_LAZY=y.
>
> Therefore, make dst_release() use call_rcu_hurry() in order to revert
> to the old test-failure-free behavior.
>
> [ paulmck: Apply s/call_rcu_flush/call_rcu_hurry/ feedback from Tejun Heo. ]
>
> Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> Cc: David Ahern <dsahern@kernel.org>
> Cc: "David S. Miller" <davem@davemloft.net>
> Cc: Eric Dumazet <edumazet@google.com>
> Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
> Cc: Jakub Kicinski <kuba@kernel.org>
> Cc: Paolo Abeni <pabeni@redhat.com>
> Cc: <netdev@vger.kernel.org>
> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
> ---
>  net/core/dst.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/net/core/dst.c b/net/core/dst.c
> index bc9c9be4e0801..a4e738d321ba2 100644
> --- a/net/core/dst.c
> +++ b/net/core/dst.c
> @@ -174,7 +174,7 @@ void dst_release(struct dst_entry *dst)
>  			net_warn_ratelimited("%s: dst:%p refcnt:%d\n",
>  					     __func__, dst, newrefcnt);
>  		if (!newrefcnt)
> -			call_rcu(&dst->rcu_head, dst_destroy_rcu);
> +			call_rcu_hurry(&dst->rcu_head, dst_destroy_rcu);
>  	}
>  }
>  EXPORT_SYMBOL(dst_release);
> --
> 2.31.1.189.g2e36527f23
>
Sure, thanks.

Reviewed-by: Eric Dumazet <edumazet@google.com>

I think we can work later to change how dst are freed/released to
avoid using call_rcu_hurry()
Hi Eric,

On Wed, Nov 30, 2022 at 6:39 PM Eric Dumazet <edumazet@google.com> wrote:
>
> Sure, thanks.
>
> Reviewed-by: Eric Dumazet <edumazet@google.com>
>
> I think we can work later to change how dst are freed/released to
> avoid using call_rcu_hurry()

That sounds great, if you can give me any high-level guidance (in the
future) on that and what to look for, I can give it a try as well. I
have been wanting to learn more about the networking code :-)

Thanks,

 - Joel
On Wed, Nov 30, 2022 at 07:39:02PM +0100, Eric Dumazet wrote:
> Sure, thanks.
>
> Reviewed-by: Eric Dumazet <edumazet@google.com>

Applied, thank you!!!

> I think we can work later to change how dst are freed/released to
> avoid using call_rcu_hurry()

Thank you for being willing to look into that!

							Thanx, Paul
diff --git a/net/core/dst.c b/net/core/dst.c
index bc9c9be4e0801..a4e738d321ba2 100644
--- a/net/core/dst.c
+++ b/net/core/dst.c
@@ -174,7 +174,7 @@ void dst_release(struct dst_entry *dst)
 			net_warn_ratelimited("%s: dst:%p refcnt:%d\n",
 					     __func__, dst, newrefcnt);
 		if (!newrefcnt)
-			call_rcu(&dst->rcu_head, dst_destroy_rcu);
+			call_rcu_hurry(&dst->rcu_head, dst_destroy_rcu);
 	}
 }
 EXPORT_SYMBOL(dst_release);
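
The commit log says that a lazy call_rcu() callback waits to be batched, and that the arrival of a call_rcu_hurry() callback on the same CPU kicks whatever lazy callbacks are already queued there. The following is only a toy userspace sketch of that queueing behavior, not the kernel's implementation: every name in it (toy_cpu, toy_call_rcu(), toy_call_rcu_hurry(), toy_flush(), toy_demo()) is hypothetical, and it assumes a "grace period" can be modelled as a synchronous loop over one fixed-size per-CPU queue, which real RCU does not do.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Toy userspace model of lazy versus hurried callbacks.
 * Every name here is hypothetical; this is not kernel code.
 */
#define TOY_MAX_CBS 16

typedef void (*toy_cb_fn)(int id);

struct toy_cb {
	toy_cb_fn fn;
	int id;
};

struct toy_cpu {
	struct toy_cb queue[TOY_MAX_CBS];	/* callbacks awaiting a grace period */
	size_t nr_queued;
	unsigned int grace_periods;		/* simulated grace periods so far */
	unsigned int invoked;			/* callbacks invoked so far */
};

/* Example callback: stands in for something like freeing an object. */
static void toy_free_cb(int id)
{
	printf("callback %d invoked\n", id);
}

/* One simulated grace period covers everything queued at this point. */
static void toy_flush(struct toy_cpu *cpu)
{
	size_t i;

	if (cpu->nr_queued == 0)
		return;
	cpu->grace_periods++;
	for (i = 0; i < cpu->nr_queued; i++) {
		cpu->queue[i].fn(cpu->queue[i].id);
		cpu->invoked++;
	}
	cpu->nr_queued = 0;
}

/* Lazy variant: only queue; a later flush (timeout, hurry call) runs it. */
static void toy_call_rcu(struct toy_cpu *cpu, toy_cb_fn fn, int id)
{
	assert(cpu->nr_queued < TOY_MAX_CBS);
	cpu->queue[cpu->nr_queued].fn = fn;
	cpu->queue[cpu->nr_queued].id = id;
	cpu->nr_queued++;
}

/* Hurry variant: queue, then kick everything already waiting on this CPU. */
static void toy_call_rcu_hurry(struct toy_cpu *cpu, toy_cb_fn fn, int id)
{
	toy_call_rcu(cpu, fn, id);
	toy_flush(cpu);
}

/* Queue three lazy callbacks, then one hurried one; return grace periods used. */
static unsigned int toy_demo(void)
{
	struct toy_cpu cpu = { .nr_queued = 0 };
	int id;

	for (id = 1; id <= 3; id++)
		toy_call_rcu(&cpu, toy_free_cb, id);
	/* Nothing has been invoked yet; three callbacks are still queued. */
	toy_call_rcu_hurry(&cpu, toy_free_cb, 4);
	return cpu.grace_periods;
}
```

Calling toy_demo() from a main() runs all four callbacks under a single simulated grace period. In this model, that batching is what the lazy path buys, and the immediate flush is what a caller such as dst_release() gets back by using the hurry variant.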