Message ID | 20221117031551.1142289-1-joel@joelfernandes.org (mailing list archive) |
---|---|
State | Not Applicable |
Delegated to: | Netdev Maintainers |
Series | [rcu/dev,1/3] net: Use call_rcu_flush() for qdisc_free_cb |
On Wed, Nov 16, 2022 at 7:16 PM Joel Fernandes (Google)
<joel@joelfernandes.org> wrote:
>
> In a networking test on ChromeOS, we find that using the new CONFIG_RCU_LAZY
> causes a networking test to fail in the teardown phase.
>
> The failure happens during: ip netns del <name>
>
> Using ftrace, I found the callbacks it was queuing which this series fixes. Use
> call_rcu_flush() to revert to the old behavior. With that, the test passes.
>
> Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> ---
>  net/sched/sch_generic.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
> index a9aadc4e6858..63fbf640d3b2 100644
> --- a/net/sched/sch_generic.c
> +++ b/net/sched/sch_generic.c
> @@ -1067,7 +1067,7 @@ static void qdisc_destroy(struct Qdisc *qdisc)
>
>  	trace_qdisc_destroy(qdisc);
>
> -	call_rcu(&qdisc->rcu, qdisc_free_cb);
> +	call_rcu_flush(&qdisc->rcu, qdisc_free_cb);
>  }

I took a look at this one.

qdisc_free_cb() is essentially freeing : Some per-cpu memory, and the
'struct Qdisc'

I do not see why we need to force a flush for this (small ?) piece of memory.
> On Nov 17, 2022, at 4:44 PM, Eric Dumazet <edumazet@google.com> wrote:
>
> On Wed, Nov 16, 2022 at 7:16 PM Joel Fernandes (Google)
> <joel@joelfernandes.org> wrote:
>>
>> In a networking test on ChromeOS, we find that using the new CONFIG_RCU_LAZY
>> causes a networking test to fail in the teardown phase.
>>
>> The failure happens during: ip netns del <name>
>>
>> Using ftrace, I found the callbacks it was queuing which this series fixes. Use
>> call_rcu_flush() to revert to the old behavior. With that, the test passes.
>>
>> Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
>> ---
>>  net/sched/sch_generic.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
>> index a9aadc4e6858..63fbf640d3b2 100644
>> --- a/net/sched/sch_generic.c
>> +++ b/net/sched/sch_generic.c
>> @@ -1067,7 +1067,7 @@ static void qdisc_destroy(struct Qdisc *qdisc)
>>
>>  	trace_qdisc_destroy(qdisc);
>>
>> -	call_rcu(&qdisc->rcu, qdisc_free_cb);
>> +	call_rcu_flush(&qdisc->rcu, qdisc_free_cb);
>>  }
>
> I took a look at this one.
>
> qdisc_free_cb() is essentially freeing : Some per-cpu memory, and the
> 'struct Qdisc'
>
> I do not see why we need to force a flush for this (small ?) piece of memory.

I’ll try to drop that and rerun the test, and get back to you. It could be that
there is a different callback that this flush() is compensating for, or
something. I am pretty sure at one point, dropping this patch made the test fail
most of the time. Now it passes 100%.

I’ll also attempt to collect a complete trace, maybe I’ll learn some networking
code in the process..

Thanks!
On Thu, Nov 17, 2022 at 01:44:12PM -0800, Eric Dumazet wrote:
> On Wed, Nov 16, 2022 at 7:16 PM Joel Fernandes (Google)
> <joel@joelfernandes.org> wrote:
> >
> > In a networking test on ChromeOS, we find that using the new CONFIG_RCU_LAZY
> > causes a networking test to fail in the teardown phase.
> >
> > The failure happens during: ip netns del <name>
> >
> > Using ftrace, I found the callbacks it was queuing which this series fixes. Use
> > call_rcu_flush() to revert to the old behavior. With that, the test passes.
> >
> > Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> > ---
> >  net/sched/sch_generic.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
> > index a9aadc4e6858..63fbf640d3b2 100644
> > --- a/net/sched/sch_generic.c
> > +++ b/net/sched/sch_generic.c
> > @@ -1067,7 +1067,7 @@ static void qdisc_destroy(struct Qdisc *qdisc)
> >
> >  	trace_qdisc_destroy(qdisc);
> >
> > -	call_rcu(&qdisc->rcu, qdisc_free_cb);
> > +	call_rcu_flush(&qdisc->rcu, qdisc_free_cb);
> >  }
>
> I took a look at this one.
>
> qdisc_free_cb() is essentially freeing : Some per-cpu memory, and the
> 'struct Qdisc'
>
> I do not see why we need to force a flush for this (small ?) piece of memory.

Indeed! Just tested and dropping this one still makes the test pass. I believe
this patch was papering over the issues fixed by the other patches, so it stuck.
I will drop this one and move over to trying your suggestions for 2/3.

Thanks for taking a look,

 - Joel
diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
index a9aadc4e6858..63fbf640d3b2 100644
--- a/net/sched/sch_generic.c
+++ b/net/sched/sch_generic.c
@@ -1067,7 +1067,7 @@ static void qdisc_destroy(struct Qdisc *qdisc)
 
 	trace_qdisc_destroy(qdisc);
 
-	call_rcu(&qdisc->rcu, qdisc_free_cb);
+	call_rcu_flush(&qdisc->rcu, qdisc_free_cb);
 }
 
 void qdisc_put(struct Qdisc *qdisc)
In a networking test on ChromeOS, we find that using the new CONFIG_RCU_LAZY
causes a networking test to fail in the teardown phase.

The failure happens during: ip netns del <name>

Using ftrace, I found the callbacks it was queuing which this series fixes. Use
call_rcu_flush() to revert to the old behavior. With that, the test passes.

Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
---
 net/sched/sch_generic.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
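
For readers unfamiliar with the distinction the commit message relies on, here is a rough userspace sketch of lazy queuing versus a flushing enqueue. It is a toy model under loud assumptions, not kernel code: every name (toy_queue, toy_call_lazy(), toy_call_flush(), toy_count_cb()) is hypothetical, grace periods are ignored entirely, and "flush" here simply runs the parked callbacks at once, which is a simplification of what the kernel does.

```c
#include <assert.h>
#include <stddef.h>

/* Toy userspace model only: none of these names exist in the kernel,
 * and grace periods are deliberately not modelled. */

typedef void (*toy_cb_fn)(void *arg);

struct toy_cb {
	toy_cb_fn func;
	void *arg;
};

#define TOY_MAX_PENDING 16

struct toy_queue {
	struct toy_cb pending[TOY_MAX_PENDING]; /* callbacks parked lazily */
	size_t npending;                        /* number currently parked */
	unsigned long invoked;                  /* callbacks run so far */
};

/* Run everything that is parked, in queueing order, then empty the queue. */
static void toy_flush(struct toy_queue *q)
{
	for (size_t i = 0; i < q->npending; i++) {
		q->pending[i].func(q->pending[i].arg);
		q->invoked++;
	}
	q->npending = 0;
}

/* Lazy enqueue: park the callback; nothing runs until a later flush. */
static void toy_call_lazy(struct toy_queue *q, toy_cb_fn func, void *arg)
{
	assert(q->npending < TOY_MAX_PENDING);
	q->pending[q->npending].func = func;
	q->pending[q->npending].arg = arg;
	q->npending++;
}

/* Flushing enqueue: park the callback, then drain the whole queue now,
 * so earlier lazy callbacks run too and ordering is preserved. */
static void toy_call_flush(struct toy_queue *q, toy_cb_fn func, void *arg)
{
	toy_call_lazy(q, func, arg);
	toy_flush(q);
}

/* Example callback standing in for a free routine: counts invocations. */
static void toy_count_cb(void *arg)
{
	int *count = arg;

	(*count)++;
}
```

In this model, a teardown path that waits for a lazily parked callback makes no progress until something else drains the queue. That is also why a single flushing call can mask a problem elsewhere: it drains callbacks queued by unrelated call sites, which matches the observation in the thread that this patch was papering over issues fixed by the other patches.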