Message ID | 20230505113315.3307723-3-liujian56@huawei.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | fix softlockup in run_timer_softirq | expand |
On Fri, May 5, 2023 at 7:25 PM Liu Jian <liujian56@huawei.com> wrote: > > From: Peter Zijlstra <peterz@infradead.org> > > Replace the jiffies based timeout with a sched_clock() based one. > > Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> > Signed-off-by: Liu Jian <liujian56@huawei.com> > --- > kernel/softirq.c | 7 ++++--- > 1 file changed, 4 insertions(+), 3 deletions(-) > > diff --git a/kernel/softirq.c b/kernel/softirq.c > index bff5debf6ce6..59f16a9af5d1 100644 > --- a/kernel/softirq.c > +++ b/kernel/softirq.c > @@ -27,6 +27,7 @@ > #include <linux/tick.h> > #include <linux/irq.h> > #include <linux/wait_bit.h> > +#include <linux/sched/clock.h> > > #include <asm/softirq_stack.h> > > @@ -489,7 +490,7 @@ asmlinkage __visible void do_softirq(void) > * we want to handle softirqs as soon as possible, but they > * should not be able to lock up the box. > */ > -#define MAX_SOFTIRQ_TIME msecs_to_jiffies(2) > +#define MAX_SOFTIRQ_TIME (2 * NSEC_PER_MSEC) I wonder if it affects those servers that set HZ to some different values rather than 1000 as default. Thanks, Jason > #define MAX_SOFTIRQ_RESTART 10 > > #ifdef CONFIG_TRACE_IRQFLAGS > @@ -527,9 +528,9 @@ static inline void lockdep_softirq_end(bool in_hardirq) { } > > asmlinkage __visible void __softirq_entry __do_softirq(void) > { > - unsigned long end = jiffies + MAX_SOFTIRQ_TIME; > unsigned long old_flags = current->flags; > int max_restart = MAX_SOFTIRQ_RESTART; > + u64 start = sched_clock(); > struct softirq_action *h; > unsigned long pending; > unsigned int vec_nr; > @@ -584,7 +585,7 @@ asmlinkage __visible void __softirq_entry __do_softirq(void) > > pending = local_softirq_pending(); > if (pending) { > - if (time_before(jiffies, end) && !need_resched() && > + if (sched_clock() - start < MAX_SOFTIRQ_TIME && !need_resched() && > --max_restart) > goto restart; > > -- > 2.34.1 > >
On Mon, May 08 2023 at 12:08, Jason Xing wrote: > On Fri, May 5, 2023 at 7:25 PM Liu Jian <liujian56@huawei.com> wrote: >> @@ -489,7 +490,7 @@ asmlinkage __visible void do_softirq(void) >> * we want to handle softirqs as soon as possible, but they >> * should not be able to lock up the box. >> */ >> -#define MAX_SOFTIRQ_TIME msecs_to_jiffies(2) >> +#define MAX_SOFTIRQ_TIME (2 * NSEC_PER_MSEC) > > I wonder if it affects those servers that set HZ to some different > values rather than 1000 as default. The result of msecs_to_jiffies(2) for different HZ values: HZ=100 1 HZ=250 1 HZ=1000 2 So depending on when the softirq processing starts, this gives the following ranges in which the timeout ends: HZ=100 0 - 10ms HZ=250 0 - 4ms HZ=1000 1 - 2ms But as the various softirq handlers have their own notion of timeouts, loop limits etc. and the timeout is only checked after _all_ pending bits of each iteration have been processed, the outcome of this is all lottery. Due to that the sched_clock() change per se won't have too much impact, but if the overall changes to consolidate the break conditions are in place, I think it will have observable effects. Making this consistent is definitely a good thing, but it won't solve the underlying problem of soft interrupt processing at all. We definitely need to spend more thoughts on pulling things out of soft interrupt context so that these functionalities get under proper resource control by the scheduler. Thanks, tglx
diff --git a/kernel/softirq.c b/kernel/softirq.c index bff5debf6ce6..59f16a9af5d1 100644 --- a/kernel/softirq.c +++ b/kernel/softirq.c @@ -27,6 +27,7 @@ #include <linux/tick.h> #include <linux/irq.h> #include <linux/wait_bit.h> +#include <linux/sched/clock.h> #include <asm/softirq_stack.h> @@ -489,7 +490,7 @@ asmlinkage __visible void do_softirq(void) * we want to handle softirqs as soon as possible, but they * should not be able to lock up the box. */ -#define MAX_SOFTIRQ_TIME msecs_to_jiffies(2) +#define MAX_SOFTIRQ_TIME (2 * NSEC_PER_MSEC) #define MAX_SOFTIRQ_RESTART 10 #ifdef CONFIG_TRACE_IRQFLAGS @@ -527,9 +528,9 @@ static inline void lockdep_softirq_end(bool in_hardirq) { } asmlinkage __visible void __softirq_entry __do_softirq(void) { - unsigned long end = jiffies + MAX_SOFTIRQ_TIME; unsigned long old_flags = current->flags; int max_restart = MAX_SOFTIRQ_RESTART; + u64 start = sched_clock(); struct softirq_action *h; unsigned long pending; unsigned int vec_nr; @@ -584,7 +585,7 @@ asmlinkage __visible void __softirq_entry __do_softirq(void) pending = local_softirq_pending(); if (pending) { - if (time_before(jiffies, end) && !need_resched() && + if (sched_clock() - start < MAX_SOFTIRQ_TIME && !need_resched() && --max_restart) goto restart;