Message ID | 20210924132837.45994-1-kernelfans@gmail.com (mailing list archive) |
---|---|
Series | arm64/irqentry: remove duplicate housekeeping of |
[Adding Paul for RCU, s390 folk for entry code RCU semantics] On Fri, Sep 24, 2021 at 09:28:32PM +0800, Pingfan Liu wrote: > After introducing arm64/kernel/entry_common.c which is akin to > kernel/entry/common.c , the housekeeping of rcu/trace are done twice as > the following: > enter_from_kernel_mode()->rcu_irq_enter(). > And > gic_handle_irq()->...->handle_domain_irq()->irq_enter()->rcu_irq_enter() > > Besides redundance, based on code analysis, the redundance also raise > some mistake, e.g. rcu_data->dynticks_nmi_nesting inc 2, which causes > rcu_is_cpu_rrupt_from_idle() unexpected. Hmmm... The fundamental questions are: 1) Who is supposed to be responsible for doing the rcu entry/exit? 2) Is it supposed to matter if this happens multiple times? For (1), I'd generally expect that this is supposed to happen in the arch/common entry code, since that itself (or the irqchip driver) could depend on RCU, and if that's the case then handle_domain_irq() shouldn't need to call rcu_irq_enter(). That would be consistent with the way we handle all other exceptions. For (2) I don't know whether the level of nesting is supposed to matter. I was under the impression it wasn't meant to matter in general, so I'm a little surprised that rcu_is_cpu_rrupt_from_idle() depends on a specific level of nesting. From a glance it looks like this would cause rcu_sched_clock_irq() to skip setting TIF_NEED_RESCHED, and to not call invoke_rcu_core(), which doesn't sound right, at least... Thomas, Paul, thoughts? AFAICT, s390 will have a similar flow on its IRQ handling path, so if this is a real issue they'll be affected too. Thanks, Mark. > Nmi also faces duplicate accounts. This series aims to address these > duplicate issues. > [1-2/5]: address nmi account duplicate > [3-4/5]: address rcu housekeeping duplicate in irq > [5/5]: as a natural result of [3-4/5], address a history issue. [1] > > > History: > v1 -> v2: > change the subject as the motivation varies. > add the fix for nmi account duplicate > > The subject of v1 is "[PATCH 1/3] kernel/irq: __handle_domain_irq() > makes irq_enter/exit arch optional". [2] It is brought up to fix [1]. > > There have been some tries to enable crash-stop-NMI on arm64, one by me, > the other by Yuichi's [4].
I hope after this series, they can advance, > as Marc said in [3] "No additional NMI patches will make it until we > have resolved the issues" > > [1] https://lore.kernel.org/linux-arm-kernel/87lfewnmdz.fsf@nanos.tec.linutronix.de/ > [2] https://lore.kernel.org/linux-arm-kernel/1607912752-12481-1-git-send-email-kernelfans@gmail.com > [3] https://lore.kernel.org/linux-arm-kernel/afd82be798cb55fd2f96940db7be78c0@kernel.org > [4] https://lore.kernel.org/linux-arm-kernel/20201104080539.3205889-1-ito-yuichi@fujitsu.com > > Cc: Catalin Marinas <catalin.marinas@arm.com> > Cc: Will Deacon <will@kernel.org> > Cc: Mark Rutland <mark.rutland@arm.com> > Cc: Marc Zyngier <maz@kernel.org> > Cc: Joey Gouly <joey.gouly@arm.com> > Cc: Sami Tolvanen <samitolvanen@google.com> > Cc: Julien Thierry <julien.thierry@arm.com> > Cc: Thomas Gleixner <tglx@linutronix.de> > Cc: Yuichi Ito <ito-yuichi@fujitsu.com> > Cc: linux-kernel@vger.kernel.org > To: linux-arm-kernel@lists.infradead.org > > > Pingfan Liu (5): > arm64/entry-common: push the judgement of nmi ahead > irqchip/GICv3: expose handle_nmi() directly > kernel/irq: make irq_{enter,exit}() in handle_domain_irq() arch > optional > irqchip/GICv3: let gic_handle_irq() utilize irqentry on arm64 > irqchip/GICv3: make reschedule-ipi light weight > > arch/arm64/Kconfig | 1 + > arch/arm64/include/asm/irq.h | 7 ++++ > arch/arm64/kernel/entry-common.c | 45 +++++++++++++++------- > arch/arm64/kernel/irq.c | 29 ++++++++++++++ > drivers/irqchip/irq-gic-v3.c | 66 ++++++++++++++++++++------------ > kernel/irq/Kconfig | 3 ++ > kernel/irq/irqdesc.c | 4 ++ > 7 files changed, 116 insertions(+), 39 deletions(-) > > -- > 2.31.1 >
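The counter imbalance described in the cover letter is easiest to see with a toy model. The following is a minimal user-space sketch (not kernel code; the names only mirror rcu_data.dynticks_nmi_nesting and the entry hooks) showing how two increments per interrupt defeat a "first nesting level" test that expects a count of 1:

```c
#include <stdbool.h>
#include <stdio.h>

/* Stand-in for the per-CPU rcu_data.dynticks_nmi_nesting counter. */
static long dynticks_nmi_nesting;

static void model_rcu_irq_enter(void) { dynticks_nmi_nesting++; }
static void model_rcu_irq_exit(void)  { dynticks_nmi_nesting--; }

/* Mirrors the "nesting > 1" check quoted later in the thread. */
static bool first_interrupt_level(void)
{
	return dynticks_nmi_nesting == 1;
}

int main(void)
{
	/* arch entry code: enter_from_kernel_mode() -> rcu_irq_enter() */
	model_rcu_irq_enter();
	/* core IRQ code: handle_domain_irq() -> irq_enter() -> rcu_irq_enter() */
	model_rcu_irq_enter();

	/*
	 * From the timer tick's point of view this no longer looks like the
	 * first interrupt nesting level, even though architecturally it is.
	 */
	printf("nesting=%ld, first level? %s\n", dynticks_nmi_nesting,
	       first_interrupt_level() ? "yes" : "no");

	model_rcu_irq_exit();
	model_rcu_irq_exit();
	return 0;
}
```

With a single owner of the enter/exit pair, as the series proposes, the count stays at 1 here and the check behaves as intended.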
On Fri, Sep 24, 2021 at 06:36:15PM +0100, Mark Rutland wrote: > [Adding Paul for RCU, s390 folk for entry code RCU semantics] > > On Fri, Sep 24, 2021 at 09:28:32PM +0800, Pingfan Liu wrote: > > After introducing arm64/kernel/entry_common.c which is akin to > > kernel/entry/common.c , the housekeeping of rcu/trace are done twice as > > the following: > > enter_from_kernel_mode()->rcu_irq_enter(). > > And > > gic_handle_irq()->...->handle_domain_irq()->irq_enter()->rcu_irq_enter() > > > > Besides redundance, based on code analysis, the redundance also raise > > some mistake, e.g. rcu_data->dynticks_nmi_nesting inc 2, which causes > > rcu_is_cpu_rrupt_from_idle() unexpected. > > Hmmm... > > The fundamental questionss are: > > 1) Who is supposed to be responsible for doing the rcu entry/exit? > > 2) Is it supposed to matter if this happens multiple times? > > For (1), I'd generally expect that this is supposed to happen in the > arch/common entry code, since that itself (or the irqchip driver) could > depend on RCU, and if that's the case thatn handle_domain_irq() > shouldn't need to call rcu_irq_enter(). That would be consistent with > the way we handle all other exceptions. > > For (2) I don't know whether the level of nesting is suppoosed to > matter. I was under the impression it wasn't meant to matter in general, > so I'm a little surprised that rcu_is_cpu_rrupt_from_idle() depends on a > specific level of nesting. > > >From a glance it looks like this would cause rcu_sched_clock_irq() to > skip setting TIF_NEED_RESCHED, and to not call invoke_rcu_core(), which > doesn't sound right, at least... > > Thomas, Paul, thoughts? It is absolutely required that rcu_irq_enter() and rcu_irq_exit() calls be balanced. Normally, this is taken care of by the fact that irq_enter() invokes rcu_irq_enter() and irq_exit() invokes rcu_irq_exit(). Similarly, nmi_enter() invokes rcu_nmi_enter() and nmi_exit() invokes rcu_nmi_exit(). But if you are doing some special-case exception where the handler needs to use RCU readers, but where the rest of the work is not needed, then the resulting calls to rcu_irq_enter() and rcu_irq_exit() must be in the architecture-specific code and must be properly balanced. So if exception entry invokes rcu_irq_enter() twice, then exception exit also needs to invoke rcu_irq_exit() twice. There are some constraints on where calls to these functions are place. For example, any exception-entry code prior to the call to rcu_irq_enter() must consist solely of functions marked noinstr, but Thomas can tell you more. Or am I missing the point of your question? Thanx, Paul > AFAICT, s390 will have a similar flow on its IRQ handling path, so if > this is a real issue they'll be affected too. > > Thanks, > Mark. > > > Nmi also faces duplicate accounts. This series aims to address these > > duplicate issues. > > [1-2/5]: address nmi account duplicate > > [3-4/5]: address rcu housekeeping duplicate in irq > > [5/5]: as a natural result of [3-4/5], address a history issue. [1] > > > > > > History: > > v1 -> v2: > > change the subject as the motivation varies. > > add the fix for nmi account duplicate > > > > The subject of v1 is "[PATCH 1/3] kernel/irq: __handle_domain_irq() > > makes irq_enter/exit arch optional". [2] It is brought up to fix [1]. > > > > There have been some tries to enable crash-stop-NMI on arm64, one by me, > > the other by Yuichi's [4]. 
I hope after this series, they can advance, > > as Marc said in [3] "No additional NMI patches will make it until we > > have resolved the issues" > > > > [1] https://lore.kernel.org/linux-arm-kernel/87lfewnmdz.fsf@nanos.tec.linutronix.de/ > > [2] https://lore.kernel.org/linux-arm-kernel/1607912752-12481-1-git-send-email-kernelfans@gmail.com > > [3] https://lore.kernel.org/linux-arm-kernel/afd82be798cb55fd2f96940db7be78c0@kernel.org > > [4] https://lore.kernel.org/linux-arm-kernel/20201104080539.3205889-1-ito-yuichi@fujitsu.com > > > > Cc: Catalin Marinas <catalin.marinas@arm.com> > > Cc: Will Deacon <will@kernel.org> > > Cc: Mark Rutland <mark.rutland@arm.com> > > Cc: Marc Zyngier <maz@kernel.org> > > Cc: Joey Gouly <joey.gouly@arm.com> > > Cc: Sami Tolvanen <samitolvanen@google.com> > > Cc: Julien Thierry <julien.thierry@arm.com> > > Cc: Thomas Gleixner <tglx@linutronix.de> > > Cc: Yuichi Ito <ito-yuichi@fujitsu.com> > > Cc: linux-kernel@vger.kernel.org > > To: linux-arm-kernel@lists.infradead.org > > > > > > Pingfan Liu (5): > > arm64/entry-common: push the judgement of nmi ahead > > irqchip/GICv3: expose handle_nmi() directly > > kernel/irq: make irq_{enter,exit}() in handle_domain_irq() arch > > optional > > irqchip/GICv3: let gic_handle_irq() utilize irqentry on arm64 > > irqchip/GICv3: make reschedule-ipi light weight > > > > arch/arm64/Kconfig | 1 + > > arch/arm64/include/asm/irq.h | 7 ++++ > > arch/arm64/kernel/entry-common.c | 45 +++++++++++++++------- > > arch/arm64/kernel/irq.c | 29 ++++++++++++++ > > drivers/irqchip/irq-gic-v3.c | 66 ++++++++++++++++++++------------ > > kernel/irq/Kconfig | 3 ++ > > kernel/irq/irqdesc.c | 4 ++ > > 7 files changed, 116 insertions(+), 39 deletions(-) > > > > -- > > 2.31.1 > >
On Fri, Sep 24, 2021 at 06:36:15PM +0100, Mark Rutland wrote: > [Adding Paul for RCU, s390 folk for entry code RCU semantics] > > On Fri, Sep 24, 2021 at 09:28:32PM +0800, Pingfan Liu wrote: > > After introducing arm64/kernel/entry_common.c which is akin to > > kernel/entry/common.c , the housekeeping of rcu/trace are done twice as > > the following: > > enter_from_kernel_mode()->rcu_irq_enter(). > > And > > gic_handle_irq()->...->handle_domain_irq()->irq_enter()->rcu_irq_enter() > > > > Besides redundance, based on code analysis, the redundance also raise > > some mistake, e.g. rcu_data->dynticks_nmi_nesting inc 2, which causes > > rcu_is_cpu_rrupt_from_idle() unexpected. > > Hmmm... > > The fundamental questionss are: > > 1) Who is supposed to be responsible for doing the rcu entry/exit? > > 2) Is it supposed to matter if this happens multiple times? > > For (1), I'd generally expect that this is supposed to happen in the > arch/common entry code, since that itself (or the irqchip driver) could > depend on RCU, and if that's the case thatn handle_domain_irq() > shouldn't need to call rcu_irq_enter(). That would be consistent with > the way we handle all other exceptions. > In my humble opinion, it had better happen in arch/common entry code as you said. But for many arches which assures that before handle_domain_irq(), no data is involved in rcu updater, it can be done in handle_domain_irq(). And that is a cheap way to integrate with rcu system (at least for the time being). For the (2), it goes deeply into RCU core, hope guides from Paul and Thomas. But at least for the condition if ((user || rcu_is_cpu_rrupt_from_idle()) && rcu_nohz_full_cpu()) in rcu_pending(), it makes sense to tell the nested interrupt from a first level interrupt. Thanks, Pingfan > For (2) I don't know whether the level of nesting is suppoosed to > matter. I was under the impression it wasn't meant to matter in general, > so I'm a little surprised that rcu_is_cpu_rrupt_from_idle() depends on a > specific level of nesting. > > From a glance it looks like this would cause rcu_sched_clock_irq() to > skip setting TIF_NEED_RESCHED, and to not call invoke_rcu_core(), which > doesn't sound right, at least... > > Thomas, Paul, thoughts? > > AFAICT, s390 will have a similar flow on its IRQ handling path, so if > this is a real issue they'll be affected too. > > Thanks, > Mark. > > > Nmi also faces duplicate accounts. This series aims to address these > > duplicate issues. > > [1-2/5]: address nmi account duplicate > > [3-4/5]: address rcu housekeeping duplicate in irq > > [5/5]: as a natural result of [3-4/5], address a history issue. [1] > > > > > > History: > > v1 -> v2: > > change the subject as the motivation varies. > > add the fix for nmi account duplicate > > > > The subject of v1 is "[PATCH 1/3] kernel/irq: __handle_domain_irq() > > makes irq_enter/exit arch optional". [2] It is brought up to fix [1]. > > > > There have been some tries to enable crash-stop-NMI on arm64, one by me, > > the other by Yuichi's [4]. 
I hope after this series, they can advance, > > as Marc said in [3] "No additional NMI patches will make it until we > > have resolved the issues" > > > > [1] https://lore.kernel.org/linux-arm-kernel/87lfewnmdz.fsf@nanos.tec.linutronix.de/ > > [2] https://lore.kernel.org/linux-arm-kernel/1607912752-12481-1-git-send-email-kernelfans@gmail.com > > [3] https://lore.kernel.org/linux-arm-kernel/afd82be798cb55fd2f96940db7be78c0@kernel.org > > [4] https://lore.kernel.org/linux-arm-kernel/20201104080539.3205889-1-ito-yuichi@fujitsu.com > > > > Cc: Catalin Marinas <catalin.marinas@arm.com> > > Cc: Will Deacon <will@kernel.org> > > Cc: Mark Rutland <mark.rutland@arm.com> > > Cc: Marc Zyngier <maz@kernel.org> > > Cc: Joey Gouly <joey.gouly@arm.com> > > Cc: Sami Tolvanen <samitolvanen@google.com> > > Cc: Julien Thierry <julien.thierry@arm.com> > > Cc: Thomas Gleixner <tglx@linutronix.de> > > Cc: Yuichi Ito <ito-yuichi@fujitsu.com> > > Cc: linux-kernel@vger.kernel.org > > To: linux-arm-kernel@lists.infradead.org > > > > > > Pingfan Liu (5): > > arm64/entry-common: push the judgement of nmi ahead > > irqchip/GICv3: expose handle_nmi() directly > > kernel/irq: make irq_{enter,exit}() in handle_domain_irq() arch > > optional > > irqchip/GICv3: let gic_handle_irq() utilize irqentry on arm64 > > irqchip/GICv3: make reschedule-ipi light weight > > > > arch/arm64/Kconfig | 1 + > > arch/arm64/include/asm/irq.h | 7 ++++ > > arch/arm64/kernel/entry-common.c | 45 +++++++++++++++------- > > arch/arm64/kernel/irq.c | 29 ++++++++++++++ > > drivers/irqchip/irq-gic-v3.c | 66 ++++++++++++++++++++------------ > > kernel/irq/Kconfig | 3 ++ > > kernel/irq/irqdesc.c | 4 ++ > > 7 files changed, 116 insertions(+), 39 deletions(-) > > > > -- > > 2.31.1 > > > > _______________________________________________ > linux-arm-kernel mailing list > linux-arm-kernel@lists.infradead.org > http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
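For reference, the condition Pingfan cites appears in rcu_pending() (kernel/rcu/tree.c); a lightly trimmed excerpt of that check, roughly as it stood in the v5.15-era code under discussion:

```c
	/* Is this a nohz_full CPU in userspace or idle?  (Ignore RCU if so.) */
	if ((user || rcu_is_cpu_rrupt_from_idle()) && rcu_nohz_full_cpu())
		return 0;
```

rcu_is_cpu_rrupt_from_idle() is what distinguishes a first-level interrupt taken from idle or userspace from a nested one, which is why the extra nesting level matters here.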
On Fri, Sep 24, 2021 at 03:59:54PM -0700, Paul E. McKenney wrote: > On Fri, Sep 24, 2021 at 06:36:15PM +0100, Mark Rutland wrote: > > [Adding Paul for RCU, s390 folk for entry code RCU semantics] > > > > On Fri, Sep 24, 2021 at 09:28:32PM +0800, Pingfan Liu wrote: > > > After introducing arm64/kernel/entry_common.c which is akin to > > > kernel/entry/common.c , the housekeeping of rcu/trace are done twice as > > > the following: > > > enter_from_kernel_mode()->rcu_irq_enter(). > > > And > > > gic_handle_irq()->...->handle_domain_irq()->irq_enter()->rcu_irq_enter() > > > > > > Besides redundance, based on code analysis, the redundance also raise > > > some mistake, e.g. rcu_data->dynticks_nmi_nesting inc 2, which causes > > > rcu_is_cpu_rrupt_from_idle() unexpected. > > > > Hmmm... > > > > The fundamental questionss are: > > > > 1) Who is supposed to be responsible for doing the rcu entry/exit? > > > > 2) Is it supposed to matter if this happens multiple times? > > > > For (1), I'd generally expect that this is supposed to happen in the > > arch/common entry code, since that itself (or the irqchip driver) could > > depend on RCU, and if that's the case thatn handle_domain_irq() > > shouldn't need to call rcu_irq_enter(). That would be consistent with > > the way we handle all other exceptions. > > > > For (2) I don't know whether the level of nesting is suppoosed to > > matter. I was under the impression it wasn't meant to matter in general, > > so I'm a little surprised that rcu_is_cpu_rrupt_from_idle() depends on a > > specific level of nesting. > > > > >From a glance it looks like this would cause rcu_sched_clock_irq() to > > skip setting TIF_NEED_RESCHED, and to not call invoke_rcu_core(), which > > doesn't sound right, at least... > > > > Thomas, Paul, thoughts? > > It is absolutely required that rcu_irq_enter() and rcu_irq_exit() calls > be balanced. Normally, this is taken care of by the fact that irq_enter() > invokes rcu_irq_enter() and irq_exit() invokes rcu_irq_exit(). Similarly, > nmi_enter() invokes rcu_nmi_enter() and nmi_exit() invokes rcu_nmi_exit(). Sure; I didn't mean to suggest those weren't balanced! The problem here is *nesting*. Due to the structure of our entry code and the core IRQ code, when handling an IRQ we have a sequence: irq_enter() // arch code irq_enter() // irq code < irq handler here > irq_exit() // irq code irq_exit() // arch code ... and if we use something like rcu_is_cpu_rrupt_from_idle() in the middle (e.g. as part of rcu_sched_clock_irq()), this will not give the expected result because of the additional nesting, since rcu_is_cpu_rrupt_from_idle() seems to expect that dynticks_nmi_nesting is only incremented once per exception entry, when it does: /* Are we at first interrupt nesting level? */ nesting = __this_cpu_read(rcu_data.dynticks_nmi_nesting); if (nesting > 1) return false; What I'm trying to figure out is whether that expectation is legitimate, and assuming so, where the entry/exit should happen. Thanks, Mark. > But if you are doing some special-case exception where the handler needs > to use RCU readers, but where the rest of the work is not needed, then > the resulting calls to rcu_irq_enter() and rcu_irq_exit() must be in > the architecture-specific code and must be properly balanced. > > So if exception entry invokes rcu_irq_enter() twice, then exception > exit also needs to invoke rcu_irq_exit() twice. > > There are some constraints on where calls to these functions are place. 
> For example, any exception-entry code prior to the call to rcu_irq_enter() > must consist solely of functions marked noinstr, but Thomas can tell > you more. > > Or am I missing the point of your question? > > Thanx, Paul > > > AFAICT, s390 will have a similar flow on its IRQ handling path, so if > > this is a real issue they'll be affected too. > > > > Thanks, > > Mark. > > > > > Nmi also faces duplicate accounts. This series aims to address these > > > duplicate issues. > > > [1-2/5]: address nmi account duplicate > > > [3-4/5]: address rcu housekeeping duplicate in irq > > > [5/5]: as a natural result of [3-4/5], address a history issue. [1] > > > > > > > > > History: > > > v1 -> v2: > > > change the subject as the motivation varies. > > > add the fix for nmi account duplicate > > > > > > The subject of v1 is "[PATCH 1/3] kernel/irq: __handle_domain_irq() > > > makes irq_enter/exit arch optional". [2] It is brought up to fix [1]. > > > > > > There have been some tries to enable crash-stop-NMI on arm64, one by me, > > > the other by Yuichi's [4]. I hope after this series, they can advance, > > > as Marc said in [3] "No additional NMI patches will make it until we > > > have resolved the issues" > > > > > > [1] https://lore.kernel.org/linux-arm-kernel/87lfewnmdz.fsf@nanos.tec.linutronix.de/ > > > [2] https://lore.kernel.org/linux-arm-kernel/1607912752-12481-1-git-send-email-kernelfans@gmail.com > > > [3] https://lore.kernel.org/linux-arm-kernel/afd82be798cb55fd2f96940db7be78c0@kernel.org > > > [4] https://lore.kernel.org/linux-arm-kernel/20201104080539.3205889-1-ito-yuichi@fujitsu.com > > > > > > Cc: Catalin Marinas <catalin.marinas@arm.com> > > > Cc: Will Deacon <will@kernel.org> > > > Cc: Mark Rutland <mark.rutland@arm.com> > > > Cc: Marc Zyngier <maz@kernel.org> > > > Cc: Joey Gouly <joey.gouly@arm.com> > > > Cc: Sami Tolvanen <samitolvanen@google.com> > > > Cc: Julien Thierry <julien.thierry@arm.com> > > > Cc: Thomas Gleixner <tglx@linutronix.de> > > > Cc: Yuichi Ito <ito-yuichi@fujitsu.com> > > > Cc: linux-kernel@vger.kernel.org > > > To: linux-arm-kernel@lists.infradead.org > > > > > > > > > Pingfan Liu (5): > > > arm64/entry-common: push the judgement of nmi ahead > > > irqchip/GICv3: expose handle_nmi() directly > > > kernel/irq: make irq_{enter,exit}() in handle_domain_irq() arch > > > optional > > > irqchip/GICv3: let gic_handle_irq() utilize irqentry on arm64 > > > irqchip/GICv3: make reschedule-ipi light weight > > > > > > arch/arm64/Kconfig | 1 + > > > arch/arm64/include/asm/irq.h | 7 ++++ > > > arch/arm64/kernel/entry-common.c | 45 +++++++++++++++------- > > > arch/arm64/kernel/irq.c | 29 ++++++++++++++ > > > drivers/irqchip/irq-gic-v3.c | 66 ++++++++++++++++++++------------ > > > kernel/irq/Kconfig | 3 ++ > > > kernel/irq/irqdesc.c | 4 ++ > > > 7 files changed, 116 insertions(+), 39 deletions(-) > > > > > > -- > > > 2.31.1 > > >
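The check Mark quotes lives in rcu_is_cpu_rrupt_from_idle() in kernel/rcu/tree.c. A simplified sketch of that function (the real one also carries lockdep assertions and counter-underflow checks) shows why a second increment per interrupt changes the answer:

```c
/*
 * Simplified sketch of rcu_is_cpu_rrupt_from_idle() (kernel/rcu/tree.c);
 * the assertions and underflow checks of the real function are omitted.
 */
static int rcu_is_cpu_rrupt_from_idle(void)
{
	long nesting;

	/* Are we at first interrupt nesting level? */
	nesting = __this_cpu_read(rcu_data.dynticks_nmi_nesting);
	if (nesting > 1)
		return false;	/* with the doubled irq_enter(), the tick always sees 2 here */

	/* Does CPU appear to be idle from an RCU standpoint? */
	return __this_cpu_read(rcu_data.dynticks_nesting) == 0;
}
```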
On Mon, Sep 27, 2021 at 10:23:18AM +0100, Mark Rutland wrote: > On Fri, Sep 24, 2021 at 03:59:54PM -0700, Paul E. McKenney wrote: > > On Fri, Sep 24, 2021 at 06:36:15PM +0100, Mark Rutland wrote: > > > [Adding Paul for RCU, s390 folk for entry code RCU semantics] > > > > > > On Fri, Sep 24, 2021 at 09:28:32PM +0800, Pingfan Liu wrote: > > > > After introducing arm64/kernel/entry_common.c which is akin to > > > > kernel/entry/common.c , the housekeeping of rcu/trace are done twice as > > > > the following: > > > > enter_from_kernel_mode()->rcu_irq_enter(). > > > > And > > > > gic_handle_irq()->...->handle_domain_irq()->irq_enter()->rcu_irq_enter() > > > > > > > > Besides redundance, based on code analysis, the redundance also raise > > > > some mistake, e.g. rcu_data->dynticks_nmi_nesting inc 2, which causes > > > > rcu_is_cpu_rrupt_from_idle() unexpected. > > > > > > Hmmm... > > > > > > The fundamental questionss are: > > > > > > 1) Who is supposed to be responsible for doing the rcu entry/exit? > > > > > > 2) Is it supposed to matter if this happens multiple times? > > > > > > For (1), I'd generally expect that this is supposed to happen in the > > > arch/common entry code, since that itself (or the irqchip driver) could > > > depend on RCU, and if that's the case thatn handle_domain_irq() > > > shouldn't need to call rcu_irq_enter(). That would be consistent with > > > the way we handle all other exceptions. > > > > > > For (2) I don't know whether the level of nesting is suppoosed to > > > matter. I was under the impression it wasn't meant to matter in general, > > > so I'm a little surprised that rcu_is_cpu_rrupt_from_idle() depends on a > > > specific level of nesting. > > > > > > >From a glance it looks like this would cause rcu_sched_clock_irq() to > > > skip setting TIF_NEED_RESCHED, and to not call invoke_rcu_core(), which > > > doesn't sound right, at least... > > > > > > Thomas, Paul, thoughts? > > > > It is absolutely required that rcu_irq_enter() and rcu_irq_exit() calls > > be balanced. Normally, this is taken care of by the fact that irq_enter() > > invokes rcu_irq_enter() and irq_exit() invokes rcu_irq_exit(). Similarly, > > nmi_enter() invokes rcu_nmi_enter() and nmi_exit() invokes rcu_nmi_exit(). > > Sure; I didn't mean to suggest those weren't balanced! The problem here > is *nesting*. Due to the structure of our entry code and the core IRQ > code, when handling an IRQ we have a sequence: > > irq_enter() // arch code > irq_enter() // irq code > > < irq handler here > > > irq_exit() // irq code > irq_exit() // arch code > > ... and if we use something like rcu_is_cpu_rrupt_from_idle() in the > middle (e.g. as part of rcu_sched_clock_irq()), this will not give the > expected result because of the additional nesting, since > rcu_is_cpu_rrupt_from_idle() seems to expect that dynticks_nmi_nesting > is only incremented once per exception entry, when it does: > > /* Are we at first interrupt nesting level? */ > nesting = __this_cpu_read(rcu_data.dynticks_nmi_nesting); > if (nesting > 1) > return false; > > What I'm trying to figure out is whether that expectation is legitimate, > and assuming so, where the entry/exit should happen. Oooh... The penalty for fooling rcu_is_cpu_rrupt_from_idle() is that RCU will be unable to detect a userspace quiescent state for a non-nohz_full CPU. That could result in RCU CPU stall warnings if a user task runs continuously on a given CPU for more than 21 seconds (60 seconds in some distros). 
And this can easily happen if the user has a CPU-bound thread that is the only runnable task on that CPU. So, yes, this does need some sort of resolution. The traditional approach is (as you surmise) to have only a single call to irq_enter() on exception entry and only a single call to irq_exit() on exception exit. If this is feasible, it is highly recommended. In theory, we could have that "1" in "nesting > 1" be a constant supplied by the architecture (you would want "3" if I remember correctly) but in practice could we please avoid this? For one thing, if there is some other path into the kernel for your architecture that does only a single irq_enter(), then rcu_is_cpu_rrupt_from_idle() just doesn't stand a chance. It would need to compare against a different value depending on what exception showed up. Even if that cannot happen, it would be better if your architecture could remain in blissful ignorance of the colorful details of ->dynticks_nmi_nesting manipulations. Another approach would be for the arch code to supply RCU a function that it calls. If there is such a function (or perhaps better, if some new Kconfig option is enabled), RCU invokes it. Otherwise, it compares to "1" as it does now. But you break it, you buy it! ;-) Thoughts? Other approaches? Thanx, Paul > Thanks, > Mark. > > > But if you are doing some special-case exception where the handler needs > > to use RCU readers, but where the rest of the work is not needed, then > > the resulting calls to rcu_irq_enter() and rcu_irq_exit() must be in > > the architecture-specific code and must be properly balanced. > > > > So if exception entry invokes rcu_irq_enter() twice, then exception > > exit also needs to invoke rcu_irq_exit() twice. > > > > There are some constraints on where calls to these functions are place. > > For example, any exception-entry code prior to the call to rcu_irq_enter() > > must consist solely of functions marked noinstr, but Thomas can tell > > you more. > > > > Or am I missing the point of your question? > > > > Thanx, Paul > > > > > AFAICT, s390 will have a similar flow on its IRQ handling path, so if > > > this is a real issue they'll be affected too. > > > > > > Thanks, > > > Mark. > > > > > > > Nmi also faces duplicate accounts. This series aims to address these > > > > duplicate issues. > > > > [1-2/5]: address nmi account duplicate > > > > [3-4/5]: address rcu housekeeping duplicate in irq > > > > [5/5]: as a natural result of [3-4/5], address a history issue. [1] > > > > > > > > > > > > History: > > > > v1 -> v2: > > > > change the subject as the motivation varies. > > > > add the fix for nmi account duplicate > > > > > > > > The subject of v1 is "[PATCH 1/3] kernel/irq: __handle_domain_irq() > > > > makes irq_enter/exit arch optional". [2] It is brought up to fix [1]. > > > > > > > > There have been some tries to enable crash-stop-NMI on arm64, one by me, > > > > the other by Yuichi's [4]. 
I hope after this series, they can advance, > > > > as Marc said in [3] "No additional NMI patches will make it until we > > > > have resolved the issues" > > > > > > > > [1] https://lore.kernel.org/linux-arm-kernel/87lfewnmdz.fsf@nanos.tec.linutronix.de/ > > > > [2] https://lore.kernel.org/linux-arm-kernel/1607912752-12481-1-git-send-email-kernelfans@gmail.com > > > > [3] https://lore.kernel.org/linux-arm-kernel/afd82be798cb55fd2f96940db7be78c0@kernel.org > > > > [4] https://lore.kernel.org/linux-arm-kernel/20201104080539.3205889-1-ito-yuichi@fujitsu.com > > > > > > > > Cc: Catalin Marinas <catalin.marinas@arm.com> > > > > Cc: Will Deacon <will@kernel.org> > > > > Cc: Mark Rutland <mark.rutland@arm.com> > > > > Cc: Marc Zyngier <maz@kernel.org> > > > > Cc: Joey Gouly <joey.gouly@arm.com> > > > > Cc: Sami Tolvanen <samitolvanen@google.com> > > > > Cc: Julien Thierry <julien.thierry@arm.com> > > > > Cc: Thomas Gleixner <tglx@linutronix.de> > > > > Cc: Yuichi Ito <ito-yuichi@fujitsu.com> > > > > Cc: linux-kernel@vger.kernel.org > > > > To: linux-arm-kernel@lists.infradead.org > > > > > > > > > > > > Pingfan Liu (5): > > > > arm64/entry-common: push the judgement of nmi ahead > > > > irqchip/GICv3: expose handle_nmi() directly > > > > kernel/irq: make irq_{enter,exit}() in handle_domain_irq() arch > > > > optional > > > > irqchip/GICv3: let gic_handle_irq() utilize irqentry on arm64 > > > > irqchip/GICv3: make reschedule-ipi light weight > > > > > > > > arch/arm64/Kconfig | 1 + > > > > arch/arm64/include/asm/irq.h | 7 ++++ > > > > arch/arm64/kernel/entry-common.c | 45 +++++++++++++++------- > > > > arch/arm64/kernel/irq.c | 29 ++++++++++++++ > > > > drivers/irqchip/irq-gic-v3.c | 66 ++++++++++++++++++++------------ > > > > kernel/irq/Kconfig | 3 ++ > > > > kernel/irq/irqdesc.c | 4 ++ > > > > 7 files changed, 116 insertions(+), 39 deletions(-) > > > > > > > > -- > > > > 2.31.1 > > > >
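Purely for illustration, the first alternative Paul mentions (and immediately argues against) would amount to something like the fragment below; the Kconfig symbol and macro names are invented here and do not exist anywhere:

```c
/*
 * Hypothetical sketch only: CONFIG_ARCH_IRQ_NESTING_DEPTH is an invented
 * name standing in for "a constant supplied by the architecture".
 */
#ifdef CONFIG_ARCH_IRQ_NESTING_DEPTH
#define RCU_FIRST_IRQ_NESTING	CONFIG_ARCH_IRQ_NESTING_DEPTH	/* e.g. the "3" Paul mentions */
#else
#define RCU_FIRST_IRQ_NESTING	1
#endif

	/* Are we at first interrupt nesting level? */
	nesting = __this_cpu_read(rcu_data.dynticks_nmi_nesting);
	if (nesting > RCU_FIRST_IRQ_NESTING)
		return false;
```

As Paul notes, this breaks down as soon as any other entry path reaches the same check with a different nesting depth, which is why the thread converges on a single irq_enter()/irq_exit() per architectural exception instead.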
On Mon, Sep 27, 2021 at 05:09:22PM -0700, Paul E. McKenney wrote: > On Mon, Sep 27, 2021 at 10:23:18AM +0100, Mark Rutland wrote: > > On Fri, Sep 24, 2021 at 03:59:54PM -0700, Paul E. McKenney wrote: > > > On Fri, Sep 24, 2021 at 06:36:15PM +0100, Mark Rutland wrote: > > > > [Adding Paul for RCU, s390 folk for entry code RCU semantics] > > > > > > > > On Fri, Sep 24, 2021 at 09:28:32PM +0800, Pingfan Liu wrote: > > > > > After introducing arm64/kernel/entry_common.c which is akin to > > > > > kernel/entry/common.c , the housekeeping of rcu/trace are done twice as > > > > > the following: > > > > > enter_from_kernel_mode()->rcu_irq_enter(). > > > > > And > > > > > gic_handle_irq()->...->handle_domain_irq()->irq_enter()->rcu_irq_enter() > > > > > > > > > > Besides redundance, based on code analysis, the redundance also raise > > > > > some mistake, e.g. rcu_data->dynticks_nmi_nesting inc 2, which causes > > > > > rcu_is_cpu_rrupt_from_idle() unexpected. > > > > > > > > Hmmm... > > > > > > > > The fundamental questionss are: > > > > > > > > 1) Who is supposed to be responsible for doing the rcu entry/exit? > > > > > > > > 2) Is it supposed to matter if this happens multiple times? > > > > > > > > For (1), I'd generally expect that this is supposed to happen in the > > > > arch/common entry code, since that itself (or the irqchip driver) could > > > > depend on RCU, and if that's the case thatn handle_domain_irq() > > > > shouldn't need to call rcu_irq_enter(). That would be consistent with > > > > the way we handle all other exceptions. > > > > > > > > For (2) I don't know whether the level of nesting is suppoosed to > > > > matter. I was under the impression it wasn't meant to matter in general, > > > > so I'm a little surprised that rcu_is_cpu_rrupt_from_idle() depends on a > > > > specific level of nesting. > > > > > > > > >From a glance it looks like this would cause rcu_sched_clock_irq() to > > > > skip setting TIF_NEED_RESCHED, and to not call invoke_rcu_core(), which > > > > doesn't sound right, at least... > > > > > > > > Thomas, Paul, thoughts? > > > > > > It is absolutely required that rcu_irq_enter() and rcu_irq_exit() calls > > > be balanced. Normally, this is taken care of by the fact that irq_enter() > > > invokes rcu_irq_enter() and irq_exit() invokes rcu_irq_exit(). Similarly, > > > nmi_enter() invokes rcu_nmi_enter() and nmi_exit() invokes rcu_nmi_exit(). > > > > Sure; I didn't mean to suggest those weren't balanced! The problem here > > is *nesting*. Due to the structure of our entry code and the core IRQ > > code, when handling an IRQ we have a sequence: > > > > irq_enter() // arch code > > irq_enter() // irq code > > > > < irq handler here > > > > > irq_exit() // irq code > > irq_exit() // arch code > > > > ... and if we use something like rcu_is_cpu_rrupt_from_idle() in the > > middle (e.g. as part of rcu_sched_clock_irq()), this will not give the > > expected result because of the additional nesting, since > > rcu_is_cpu_rrupt_from_idle() seems to expect that dynticks_nmi_nesting > > is only incremented once per exception entry, when it does: > > > > /* Are we at first interrupt nesting level? */ > > nesting = __this_cpu_read(rcu_data.dynticks_nmi_nesting); > > if (nesting > 1) > > return false; > > > > What I'm trying to figure out is whether that expectation is legitimate, > > and assuming so, where the entry/exit should happen. > > Oooh... 
> > The penalty for fooling rcu_is_cpu_rrupt_from_idle() is that RCU will > be unable to detect a userspace quiescent state for a non-nohz_full > CPU. That could result in RCU CPU stall warnings if a user task runs > continuously on a given CPU for more than 21 seconds (60 seconds in > some distros). And this can easily happen if the user has a CPU-bound > thread that is the only runnable task on that CPU. > > So, yes, this does need some sort of resolution. > > The traditional approach is (as you surmise) to have only a single call > to irq_enter() on exception entry and only a single call to irq_exit() > on exception exit. If this is feasible, it is highly recommended. Cool; that's roughly what I was expecting / hoping to hear! > In theory, we could have that "1" in "nesting > 1" be a constant supplied > by the architecture (you would want "3" if I remember correctly) but > in practice could we please avoid this? For one thing, if there is > some other path into the kernel for your architecture that does only a > single irq_enter(), then rcu_is_cpu_rrupt_from_idle() just doesn't stand > a chance. It would need to compare against a different value depending > on what exception showed up. Even if that cannot happen, it would be > better if your architecture could remain in blissful ignorance of the > colorful details of ->dynticks_nmi_nesting manipulations. I completely agree. I think it's much harder to keep that in check than to enforce a "once per architectural exception" policy in the arch code. > Another approach would be for the arch code to supply RCU a function that > it calls. If there is such a function (or perhaps better, if some new > Kconfig option is enabled), RCU invokes it. Otherwise, it compares to > "1" as it does now. But you break it, you buy it! ;-) I guess we could look at the exception regs and inspect the original context, but it sounds overkill... I think the cleanest thing is to leave this to arch code, and have the common IRQ code stay well clear. Unfortunately most architectures (including arch/arm) still need the common IRQ code to handle this, so we'll have to make that conditional on Kconfig, something like the below (build+boot tested only). If there are no objections, I'll go check who else needs the same treatment (IIUC at least s390 will), and spin that as a real patch/series. Thanks, Mark. ---->8---- diff --git a/arch/Kconfig b/arch/Kconfig index 8df1c7102643..c59475e50e4c 100644 --- a/arch/Kconfig +++ b/arch/Kconfig @@ -225,6 +225,12 @@ config GENERIC_SMP_IDLE_THREAD config GENERIC_IDLE_POLL_SETUP bool +config ARCH_ENTERS_IRQ + bool + help + An architecture should select this when it performs irq entry + management itself (e.g. calling irq_enter() and irq_exit()). 
+ config ARCH_HAS_FORTIFY_SOURCE bool help diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig index 5c7ae4c3954b..fa6476bf2b4d 100644 --- a/arch/arm64/Kconfig +++ b/arch/arm64/Kconfig @@ -16,6 +16,7 @@ config ARM64 select ARCH_ENABLE_MEMORY_HOTREMOVE select ARCH_ENABLE_SPLIT_PMD_PTLOCK if PGTABLE_LEVELS > 2 select ARCH_ENABLE_THP_MIGRATION if TRANSPARENT_HUGEPAGE + select ARCH_ENTERS_IRQ select ARCH_HAS_CACHE_LINE_SIZE select ARCH_HAS_DEBUG_VIRTUAL select ARCH_HAS_DEBUG_VM_PGTABLE diff --git a/kernel/irq/irqdesc.c b/kernel/irq/irqdesc.c index 4e3c29bb603c..6affa12222e0 100644 --- a/kernel/irq/irqdesc.c +++ b/kernel/irq/irqdesc.c @@ -677,6 +677,15 @@ int generic_handle_domain_irq(struct irq_domain *domain, unsigned int hwirq) EXPORT_SYMBOL_GPL(generic_handle_domain_irq); #ifdef CONFIG_HANDLE_DOMAIN_IRQ + +#ifdef ARCH_ENTERS_IRQ +#define handle_irq_enter() +#define handle_irq_exit() +#else +#define handle_irq_enter() irq_enter() +#define handle_irq_exit() irq_exit() +#endif + /** * handle_domain_irq - Invoke the handler for a HW irq belonging to a domain, * usually for a root interrupt controller @@ -693,7 +702,7 @@ int handle_domain_irq(struct irq_domain *domain, struct irq_desc *desc; int ret = 0; - irq_enter(); + handle_irq_enter(); /* The irqdomain code provides boundary checks */ desc = irq_resolve_mapping(domain, hwirq); @@ -702,7 +711,7 @@ int handle_domain_irq(struct irq_domain *domain, else ret = -EINVAL; - irq_exit(); + handle_irq_exit(); set_irq_regs(old_regs); return ret; }
On Tue, Sep 28, 2021 at 09:32:22AM +0100, Mark Rutland wrote: > On Mon, Sep 27, 2021 at 05:09:22PM -0700, Paul E. McKenney wrote: > > On Mon, Sep 27, 2021 at 10:23:18AM +0100, Mark Rutland wrote: > > > On Fri, Sep 24, 2021 at 03:59:54PM -0700, Paul E. McKenney wrote: > > > > On Fri, Sep 24, 2021 at 06:36:15PM +0100, Mark Rutland wrote: > > > > > [Adding Paul for RCU, s390 folk for entry code RCU semantics] > > > > > > > > > > On Fri, Sep 24, 2021 at 09:28:32PM +0800, Pingfan Liu wrote: > > > > > > After introducing arm64/kernel/entry_common.c which is akin to > > > > > > kernel/entry/common.c , the housekeeping of rcu/trace are done twice as > > > > > > the following: > > > > > > enter_from_kernel_mode()->rcu_irq_enter(). > > > > > > And > > > > > > gic_handle_irq()->...->handle_domain_irq()->irq_enter()->rcu_irq_enter() > > > > > > > > > > > > Besides redundance, based on code analysis, the redundance also raise > > > > > > some mistake, e.g. rcu_data->dynticks_nmi_nesting inc 2, which causes > > > > > > rcu_is_cpu_rrupt_from_idle() unexpected. > > > > > > > > > > Hmmm... > > > > > > > > > > The fundamental questionss are: > > > > > > > > > > 1) Who is supposed to be responsible for doing the rcu entry/exit? > > > > > > > > > > 2) Is it supposed to matter if this happens multiple times? > > > > > > > > > > For (1), I'd generally expect that this is supposed to happen in the > > > > > arch/common entry code, since that itself (or the irqchip driver) could > > > > > depend on RCU, and if that's the case thatn handle_domain_irq() > > > > > shouldn't need to call rcu_irq_enter(). That would be consistent with > > > > > the way we handle all other exceptions. > > > > > > > > > > For (2) I don't know whether the level of nesting is suppoosed to > > > > > matter. I was under the impression it wasn't meant to matter in general, > > > > > so I'm a little surprised that rcu_is_cpu_rrupt_from_idle() depends on a > > > > > specific level of nesting. > > > > > > > > > > >From a glance it looks like this would cause rcu_sched_clock_irq() to > > > > > skip setting TIF_NEED_RESCHED, and to not call invoke_rcu_core(), which > > > > > doesn't sound right, at least... > > > > > > > > > > Thomas, Paul, thoughts? > > > > > > > > It is absolutely required that rcu_irq_enter() and rcu_irq_exit() calls > > > > be balanced. Normally, this is taken care of by the fact that irq_enter() > > > > invokes rcu_irq_enter() and irq_exit() invokes rcu_irq_exit(). Similarly, > > > > nmi_enter() invokes rcu_nmi_enter() and nmi_exit() invokes rcu_nmi_exit(). > > > > > > Sure; I didn't mean to suggest those weren't balanced! The problem here > > > is *nesting*. Due to the structure of our entry code and the core IRQ > > > code, when handling an IRQ we have a sequence: > > > > > > irq_enter() // arch code > > > irq_enter() // irq code > > > > > > < irq handler here > > > > > > > irq_exit() // irq code > > > irq_exit() // arch code > > > > > > ... and if we use something like rcu_is_cpu_rrupt_from_idle() in the > > > middle (e.g. as part of rcu_sched_clock_irq()), this will not give the > > > expected result because of the additional nesting, since > > > rcu_is_cpu_rrupt_from_idle() seems to expect that dynticks_nmi_nesting > > > is only incremented once per exception entry, when it does: > > > > > > /* Are we at first interrupt nesting level? 
*/ > > > nesting = __this_cpu_read(rcu_data.dynticks_nmi_nesting); > > > if (nesting > 1) > > > return false; > > > > > > What I'm trying to figure out is whether that expectation is legitimate, > > > and assuming so, where the entry/exit should happen. > > > > Oooh... > > > > The penalty for fooling rcu_is_cpu_rrupt_from_idle() is that RCU will > > be unable to detect a userspace quiescent state for a non-nohz_full > > CPU. That could result in RCU CPU stall warnings if a user task runs > > continuously on a given CPU for more than 21 seconds (60 seconds in > > some distros). And this can easily happen if the user has a CPU-bound > > thread that is the only runnable task on that CPU. > > > > So, yes, this does need some sort of resolution. > > > > The traditional approach is (as you surmise) to have only a single call > > to irq_enter() on exception entry and only a single call to irq_exit() > > on exception exit. If this is feasible, it is highly recommended. > > Cool; that's roughly what I was expecting / hoping to hear! > > > In theory, we could have that "1" in "nesting > 1" be a constant supplied > > by the architecture (you would want "3" if I remember correctly) but > > in practice could we please avoid this? For one thing, if there is > > some other path into the kernel for your architecture that does only a > > single irq_enter(), then rcu_is_cpu_rrupt_from_idle() just doesn't stand > > a chance. It would need to compare against a different value depending > > on what exception showed up. Even if that cannot happen, it would be > > better if your architecture could remain in blissful ignorance of the > > colorful details of ->dynticks_nmi_nesting manipulations. > > I completely agree. I think it's much harder to keep that in check than > to enforce a "once per architectural exception" policy in the arch code. > > > Another approach would be for the arch code to supply RCU a function that > > it calls. If there is such a function (or perhaps better, if some new > > Kconfig option is enabled), RCU invokes it. Otherwise, it compares to > > "1" as it does now. But you break it, you buy it! ;-) > > I guess we could look at the exception regs and inspect the original > context, but it sounds overkill... > > I think the cleanest thing is to leave this to arch code, and have the > common IRQ code stay well clear. Unfortunately most architectures > (including arch/arm) still need the common IRQ code to handle this, so > we'll have to make that conditional on Kconfig, something like the below > (build+boot tested only). > > If there are no objections, I'll go check who else needs the same > treatment (IIUC at least s390 will), and spin that as a real > patch/series. Ah, looking again this is basically Pinfan's patch 2, so ignore the below, and I'll review Pingfan's patch instead. Thanks, Mark.
Mark Rutland <mark.rutland@arm.com> writes: > On Mon, Sep 27, 2021 at 05:09:22PM -0700, Paul E. McKenney wrote: >> On Mon, Sep 27, 2021 at 10:23:18AM +0100, Mark Rutland wrote: >> > On Fri, Sep 24, 2021 at 03:59:54PM -0700, Paul E. McKenney wrote: >> > > On Fri, Sep 24, 2021 at 06:36:15PM +0100, Mark Rutland wrote: >> > > > [Adding Paul for RCU, s390 folk for entry code RCU semantics] >> > > > >> > > > On Fri, Sep 24, 2021 at 09:28:32PM +0800, Pingfan Liu wrote: >> > > > > After introducing arm64/kernel/entry_common.c which is akin to >> > > > > kernel/entry/common.c , the housekeeping of rcu/trace are done twice as >> > > > > the following: >> > > > > enter_from_kernel_mode()->rcu_irq_enter(). >> > > > > And >> > > > > gic_handle_irq()->...->handle_domain_irq()->irq_enter()->rcu_irq_enter() >> > > > > >> > > > > Besides redundance, based on code analysis, the redundance also raise >> > > > > some mistake, e.g. rcu_data->dynticks_nmi_nesting inc 2, which causes >> > > > > rcu_is_cpu_rrupt_from_idle() unexpected. >> > > > >> > > > Hmmm... >> > > > >> > > > The fundamental questionss are: >> > > > >> > > > 1) Who is supposed to be responsible for doing the rcu entry/exit? >> > > > >> > > > 2) Is it supposed to matter if this happens multiple times? >> > > > >> > > > For (1), I'd generally expect that this is supposed to happen in the >> > > > arch/common entry code, since that itself (or the irqchip driver) could >> > > > depend on RCU, and if that's the case thatn handle_domain_irq() >> > > > shouldn't need to call rcu_irq_enter(). That would be consistent with >> > > > the way we handle all other exceptions. >> > > > >> > > > For (2) I don't know whether the level of nesting is suppoosed to >> > > > matter. I was under the impression it wasn't meant to matter in general, >> > > > so I'm a little surprised that rcu_is_cpu_rrupt_from_idle() depends on a >> > > > specific level of nesting. >> > > > >> > > > >From a glance it looks like this would cause rcu_sched_clock_irq() to >> > > > skip setting TIF_NEED_RESCHED, and to not call invoke_rcu_core(), which >> > > > doesn't sound right, at least... >> > > > >> > > > Thomas, Paul, thoughts? >> > > >> > > It is absolutely required that rcu_irq_enter() and rcu_irq_exit() calls >> > > be balanced. Normally, this is taken care of by the fact that irq_enter() >> > > invokes rcu_irq_enter() and irq_exit() invokes rcu_irq_exit(). Similarly, >> > > nmi_enter() invokes rcu_nmi_enter() and nmi_exit() invokes rcu_nmi_exit(). >> > >> > Sure; I didn't mean to suggest those weren't balanced! The problem here >> > is *nesting*. Due to the structure of our entry code and the core IRQ >> > code, when handling an IRQ we have a sequence: >> > >> > irq_enter() // arch code >> > irq_enter() // irq code >> > >> > < irq handler here > >> > >> > irq_exit() // irq code >> > irq_exit() // arch code >> > >> > ... and if we use something like rcu_is_cpu_rrupt_from_idle() in the >> > middle (e.g. as part of rcu_sched_clock_irq()), this will not give the >> > expected result because of the additional nesting, since >> > rcu_is_cpu_rrupt_from_idle() seems to expect that dynticks_nmi_nesting >> > is only incremented once per exception entry, when it does: >> > >> > /* Are we at first interrupt nesting level? */ >> > nesting = __this_cpu_read(rcu_data.dynticks_nmi_nesting); >> > if (nesting > 1) >> > return false; >> > >> > What I'm trying to figure out is whether that expectation is legitimate, >> > and assuming so, where the entry/exit should happen. >> >> Oooh... 
>> >> The penalty for fooling rcu_is_cpu_rrupt_from_idle() is that RCU will >> be unable to detect a userspace quiescent state for a non-nohz_full >> CPU. That could result in RCU CPU stall warnings if a user task runs >> continuously on a given CPU for more than 21 seconds (60 seconds in >> some distros). And this can easily happen if the user has a CPU-bound >> thread that is the only runnable task on that CPU. >> >> So, yes, this does need some sort of resolution. >> >> The traditional approach is (as you surmise) to have only a single call >> to irq_enter() on exception entry and only a single call to irq_exit() >> on exception exit. If this is feasible, it is highly recommended. > > Cool; that's roughly what I was expecting / hoping to hear! > >> In theory, we could have that "1" in "nesting > 1" be a constant supplied >> by the architecture (you would want "3" if I remember correctly) but >> in practice could we please avoid this? For one thing, if there is >> some other path into the kernel for your architecture that does only a >> single irq_enter(), then rcu_is_cpu_rrupt_from_idle() just doesn't stand >> a chance. It would need to compare against a different value depending >> on what exception showed up. Even if that cannot happen, it would be >> better if your architecture could remain in blissful ignorance of the >> colorful details of ->dynticks_nmi_nesting manipulations. > > I completely agree. I think it's much harder to keep that in check than > to enforce a "once per architectural exception" policy in the arch code. > >> Another approach would be for the arch code to supply RCU a function that >> it calls. If there is such a function (or perhaps better, if some new >> Kconfig option is enabled), RCU invokes it. Otherwise, it compares to >> "1" as it does now. But you break it, you buy it! ;-) > > I guess we could look at the exception regs and inspect the original > context, but it sounds overkill... > > I think the cleanest thing is to leave this to arch code, and have the > common IRQ code stay well clear. Unfortunately most architectures > (including arch/arm) still need the common IRQ code to handle this, so > we'll have to make that conditional on Kconfig, something like the below > (build+boot tested only). > > If there are no objections, I'll go check who else needs the same > treatment (IIUC at least s390 will), and spin that as a real > patch/series. Hmm, s390 doesn't use handle_domain_irq() and doesn't have HANDLE_DOMAIN_IRQ set. So i don't think the patch below applies to s390. However, i'll follow the code to make sure we're not calling irq_enter/irq_exit twice. > Thanks, > Mark. > > ---->8---- > diff --git a/arch/Kconfig b/arch/Kconfig > index 8df1c7102643..c59475e50e4c 100644 > --- a/arch/Kconfig > +++ b/arch/Kconfig > @@ -225,6 +225,12 @@ config GENERIC_SMP_IDLE_THREAD > config GENERIC_IDLE_POLL_SETUP > bool > > +config ARCH_ENTERS_IRQ > + bool > + help > + An architecture should select this when it performs irq entry > + management itself (e.g. calling irq_enter() and irq_exit()). 
> + > config ARCH_HAS_FORTIFY_SOURCE > bool > help > diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig > index 5c7ae4c3954b..fa6476bf2b4d 100644 > --- a/arch/arm64/Kconfig > +++ b/arch/arm64/Kconfig > @@ -16,6 +16,7 @@ config ARM64 > select ARCH_ENABLE_MEMORY_HOTREMOVE > select ARCH_ENABLE_SPLIT_PMD_PTLOCK if PGTABLE_LEVELS > 2 > select ARCH_ENABLE_THP_MIGRATION if TRANSPARENT_HUGEPAGE > + select ARCH_ENTERS_IRQ > select ARCH_HAS_CACHE_LINE_SIZE > select ARCH_HAS_DEBUG_VIRTUAL > select ARCH_HAS_DEBUG_VM_PGTABLE > diff --git a/kernel/irq/irqdesc.c b/kernel/irq/irqdesc.c > index 4e3c29bb603c..6affa12222e0 100644 > --- a/kernel/irq/irqdesc.c > +++ b/kernel/irq/irqdesc.c > @@ -677,6 +677,15 @@ int generic_handle_domain_irq(struct irq_domain *domain, unsigned int hwirq) > EXPORT_SYMBOL_GPL(generic_handle_domain_irq); > > #ifdef CONFIG_HANDLE_DOMAIN_IRQ > + > +#ifdef ARCH_ENTERS_IRQ > +#define handle_irq_enter() > +#define handle_irq_exit() > +#else > +#define handle_irq_enter() irq_enter() > +#define handle_irq_exit() irq_exit() > +#endif > + > /** > * handle_domain_irq - Invoke the handler for a HW irq belonging to a domain, > * usually for a root interrupt controller > @@ -693,7 +702,7 @@ int handle_domain_irq(struct irq_domain *domain, > struct irq_desc *desc; > int ret = 0; > > - irq_enter(); > + handle_irq_enter(); > > /* The irqdomain code provides boundary checks */ > desc = irq_resolve_mapping(domain, hwirq); > @@ -702,7 +711,7 @@ int handle_domain_irq(struct irq_domain *domain, > else > ret = -EINVAL; > > - irq_exit(); > + handle_irq_exit(); > set_irq_regs(old_regs); > return ret; > }
On Tue, Sep 28, 2021 at 11:52:51AM +0200, Sven Schnelle wrote: > Mark Rutland <mark.rutland@arm.com> writes: > > > On Mon, Sep 27, 2021 at 05:09:22PM -0700, Paul E. McKenney wrote: > >> On Mon, Sep 27, 2021 at 10:23:18AM +0100, Mark Rutland wrote: > >> > Sure; I didn't mean to suggest those weren't balanced! The problem here > >> > is *nesting*. Due to the structure of our entry code and the core IRQ > >> > code, when handling an IRQ we have a sequence: > >> > > >> > irq_enter() // arch code > >> > irq_enter() // irq code > >> > > >> > < irq handler here > > >> > > >> > irq_exit() // irq code > >> > irq_exit() // arch code > >> > > >> > ... and if we use something like rcu_is_cpu_rrupt_from_idle() in the > >> > middle (e.g. as part of rcu_sched_clock_irq()), this will not give the > >> > expected result because of the additional nesting, since > >> > rcu_is_cpu_rrupt_from_idle() seems to expect that dynticks_nmi_nesting > >> > is only incremented once per exception entry, when it does: > >> > > >> > /* Are we at first interrupt nesting level? */ > >> > nesting = __this_cpu_read(rcu_data.dynticks_nmi_nesting); > >> > if (nesting > 1) > >> > return false; > >> > > >> > What I'm trying to figure out is whether that expectation is legitimate, > >> > and assuming so, where the entry/exit should happen. > >> > >> Oooh... > >> > >> The penalty for fooling rcu_is_cpu_rrupt_from_idle() is that RCU will > >> be unable to detect a userspace quiescent state for a non-nohz_full > >> CPU. That could result in RCU CPU stall warnings if a user task runs > >> continuously on a given CPU for more than 21 seconds (60 seconds in > >> some distros). And this can easily happen if the user has a CPU-bound > >> thread that is the only runnable task on that CPU. > >> > >> So, yes, this does need some sort of resolution. > >> > >> The traditional approach is (as you surmise) to have only a single call > >> to irq_enter() on exception entry and only a single call to irq_exit() > >> on exception exit. If this is feasible, it is highly recommended. > > > > Cool; that's roughly what I was expecting / hoping to hear! > > > >> In theory, we could have that "1" in "nesting > 1" be a constant supplied > >> by the architecture (you would want "3" if I remember correctly) but > >> in practice could we please avoid this? For one thing, if there is > >> some other path into the kernel for your architecture that does only a > >> single irq_enter(), then rcu_is_cpu_rrupt_from_idle() just doesn't stand > >> a chance. It would need to compare against a different value depending > >> on what exception showed up. Even if that cannot happen, it would be > >> better if your architecture could remain in blissful ignorance of the > >> colorful details of ->dynticks_nmi_nesting manipulations. > > > > I completely agree. I think it's much harder to keep that in check than > > to enforce a "once per architectural exception" policy in the arch code. > > > >> Another approach would be for the arch code to supply RCU a function that > >> it calls. If there is such a function (or perhaps better, if some new > >> Kconfig option is enabled), RCU invokes it. Otherwise, it compares to > >> "1" as it does now. But you break it, you buy it! ;-) > > > > I guess we could look at the exception regs and inspect the original > > context, but it sounds overkill... > > > > I think the cleanest thing is to leave this to arch code, and have the > > common IRQ code stay well clear. 
Unfortunately most architectures > > (including arch/arm) still need the common IRQ code to handle this, so > > we'll have to make that conditional on Kconfig, something like the below > > (build+boot tested only). > > > > If there are no objections, I'll go check who else needs the same > > treatment (IIUC at least s390 will), and spin that as a real > > patch/series. > > Hmm, s390 doesn't use handle_domain_irq() and doesn't have > HANDLE_DOMAIN_IRQ set. So i don't think the patch below applies to s390. > However, i'll follow the code to make sure we're not calling > irq_enter/irq_exit twice. I wasn't clear, but for s390, my concern was that in do_io_irq() and do_ext_irq() you have the sequence: irqentry_enter() // calls rcu_irq_enter() irq_enter(); // calls rcu_irq_enter() then irq_enter_rcu(); < handler> irq_exit(); // calls __irq_exit_rcu then rcu_irq_exit(); irqentry_exit(); // calls rcu_irq_exit() ... and so IIUC you call rcu_irq_enter() and rcu_irq_exit() twice, getting the same double-increment of `dynticks_nmi_nesting` per interrupt, and the same potential problem with rcu_is_cpu_rrupt_from_idle(). Thanks, Mark.
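Laid out as a call sequence, the s390 flow Mark describes looks roughly like the sketch below (the wrapper function name is invented and this is not verbatim arch/s390 code):

```c
/*
 * Sketch of the do_io_irq()/do_ext_irq() ordering described above: the
 * generic-entry wrapper and the core IRQ helpers each poke RCU once, so
 * dynticks_nmi_nesting is bumped twice per interrupt, as on arm64.
 */
static void do_io_irq_sketch(struct pt_regs *regs)
{
	irqentry_state_t state = irqentry_enter(regs);	/* -> rcu_irq_enter() */

	irq_enter();		/* -> rcu_irq_enter() again, then irq_enter_rcu() */

	/* ... interrupt handler runs here ... */

	irq_exit();		/* -> __irq_exit_rcu(), then rcu_irq_exit() */
	irqentry_exit(regs, state);	/* -> rcu_irq_exit() again */
}
```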
On Tue, Sep 28, 2021 at 09:32:22AM +0100, Mark Rutland wrote: > On Mon, Sep 27, 2021 at 05:09:22PM -0700, Paul E. McKenney wrote: > > On Mon, Sep 27, 2021 at 10:23:18AM +0100, Mark Rutland wrote: > > > On Fri, Sep 24, 2021 at 03:59:54PM -0700, Paul E. McKenney wrote: > > > > On Fri, Sep 24, 2021 at 06:36:15PM +0100, Mark Rutland wrote: > > > > > [Adding Paul for RCU, s390 folk for entry code RCU semantics] > > > > > > > > > > On Fri, Sep 24, 2021 at 09:28:32PM +0800, Pingfan Liu wrote: > > > > > > After introducing arm64/kernel/entry_common.c which is akin to > > > > > > kernel/entry/common.c , the housekeeping of rcu/trace are done twice as > > > > > > the following: > > > > > > enter_from_kernel_mode()->rcu_irq_enter(). > > > > > > And > > > > > > gic_handle_irq()->...->handle_domain_irq()->irq_enter()->rcu_irq_enter() > > > > > > > > > > > > Besides redundance, based on code analysis, the redundance also raise > > > > > > some mistake, e.g. rcu_data->dynticks_nmi_nesting inc 2, which causes > > > > > > rcu_is_cpu_rrupt_from_idle() unexpected. > > > > > > > > > > Hmmm... > > > > > > > > > > The fundamental questionss are: > > > > > > > > > > 1) Who is supposed to be responsible for doing the rcu entry/exit? > > > > > > > > > > 2) Is it supposed to matter if this happens multiple times? > > > > > > > > > > For (1), I'd generally expect that this is supposed to happen in the > > > > > arch/common entry code, since that itself (or the irqchip driver) could > > > > > depend on RCU, and if that's the case thatn handle_domain_irq() > > > > > shouldn't need to call rcu_irq_enter(). That would be consistent with > > > > > the way we handle all other exceptions. > > > > > > > > > > For (2) I don't know whether the level of nesting is suppoosed to > > > > > matter. I was under the impression it wasn't meant to matter in general, > > > > > so I'm a little surprised that rcu_is_cpu_rrupt_from_idle() depends on a > > > > > specific level of nesting. > > > > > > > > > > >From a glance it looks like this would cause rcu_sched_clock_irq() to > > > > > skip setting TIF_NEED_RESCHED, and to not call invoke_rcu_core(), which > > > > > doesn't sound right, at least... > > > > > > > > > > Thomas, Paul, thoughts? > > > > > > > > It is absolutely required that rcu_irq_enter() and rcu_irq_exit() calls > > > > be balanced. Normally, this is taken care of by the fact that irq_enter() > > > > invokes rcu_irq_enter() and irq_exit() invokes rcu_irq_exit(). Similarly, > > > > nmi_enter() invokes rcu_nmi_enter() and nmi_exit() invokes rcu_nmi_exit(). > > > > > > Sure; I didn't mean to suggest those weren't balanced! The problem here > > > is *nesting*. Due to the structure of our entry code and the core IRQ > > > code, when handling an IRQ we have a sequence: > > > > > > irq_enter() // arch code > > > irq_enter() // irq code > > > > > > < irq handler here > > > > > > > irq_exit() // irq code > > > irq_exit() // arch code > > > > > > ... and if we use something like rcu_is_cpu_rrupt_from_idle() in the > > > middle (e.g. as part of rcu_sched_clock_irq()), this will not give the > > > expected result because of the additional nesting, since > > > rcu_is_cpu_rrupt_from_idle() seems to expect that dynticks_nmi_nesting > > > is only incremented once per exception entry, when it does: > > > > > > /* Are we at first interrupt nesting level? 
*/ > > > nesting = __this_cpu_read(rcu_data.dynticks_nmi_nesting); > > > if (nesting > 1) > > > return false; > > > > > > What I'm trying to figure out is whether that expectation is legitimate, > > > and assuming so, where the entry/exit should happen. > > > > Oooh... > > > > The penalty for fooling rcu_is_cpu_rrupt_from_idle() is that RCU will > > be unable to detect a userspace quiescent state for a non-nohz_full > > CPU. That could result in RCU CPU stall warnings if a user task runs > > continuously on a given CPU for more than 21 seconds (60 seconds in > > some distros). And this can easily happen if the user has a CPU-bound > > thread that is the only runnable task on that CPU. > > > > So, yes, this does need some sort of resolution. > > > > The traditional approach is (as you surmise) to have only a single call > > to irq_enter() on exception entry and only a single call to irq_exit() > > on exception exit. If this is feasible, it is highly recommended. > > Cool; that's roughly what I was expecting / hoping to hear! > > > In theory, we could have that "1" in "nesting > 1" be a constant supplied > > by the architecture (you would want "3" if I remember correctly) but > > in practice could we please avoid this? For one thing, if there is > > some other path into the kernel for your architecture that does only a > > single irq_enter(), then rcu_is_cpu_rrupt_from_idle() just doesn't stand > > a chance. It would need to compare against a different value depending > > on what exception showed up. Even if that cannot happen, it would be > > better if your architecture could remain in blissful ignorance of the > > colorful details of ->dynticks_nmi_nesting manipulations. > > I completely agree. I think it's much harder to keep that in check than > to enforce a "once per architectural exception" policy in the arch code. > > > Another approach would be for the arch code to supply RCU a function that > > it calls. If there is such a function (or perhaps better, if some new > > Kconfig option is enabled), RCU invokes it. Otherwise, it compares to > > "1" as it does now. But you break it, you buy it! ;-) > > I guess we could look at the exception regs and inspect the original > context, but it sounds overkill... > > I think the cleanest thing is to leave this to arch code, and have the > common IRQ code stay well clear. Unfortunately most architectures > (including arch/arm) still need the common IRQ code to handle this, so > we'll have to make that conditional on Kconfig, something like the below > (build+boot tested only). > > If there are no objections, I'll go check who else needs the same > treatment (IIUC at least s390 will), and spin that as a real > patch/series. This approach (whether from you or Pinfan) looks good to me! Thanx, Paul > Thanks, > Mark. > > ---->8---- > diff --git a/arch/Kconfig b/arch/Kconfig > index 8df1c7102643..c59475e50e4c 100644 > --- a/arch/Kconfig > +++ b/arch/Kconfig > @@ -225,6 +225,12 @@ config GENERIC_SMP_IDLE_THREAD > config GENERIC_IDLE_POLL_SETUP > bool > > +config ARCH_ENTERS_IRQ > + bool > + help > + An architecture should select this when it performs irq entry > + management itself (e.g. calling irq_enter() and irq_exit()). 
> + > config ARCH_HAS_FORTIFY_SOURCE > bool > help > diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig > index 5c7ae4c3954b..fa6476bf2b4d 100644 > --- a/arch/arm64/Kconfig > +++ b/arch/arm64/Kconfig > @@ -16,6 +16,7 @@ config ARM64 > select ARCH_ENABLE_MEMORY_HOTREMOVE > select ARCH_ENABLE_SPLIT_PMD_PTLOCK if PGTABLE_LEVELS > 2 > select ARCH_ENABLE_THP_MIGRATION if TRANSPARENT_HUGEPAGE > + select ARCH_ENTERS_IRQ > select ARCH_HAS_CACHE_LINE_SIZE > select ARCH_HAS_DEBUG_VIRTUAL > select ARCH_HAS_DEBUG_VM_PGTABLE > diff --git a/kernel/irq/irqdesc.c b/kernel/irq/irqdesc.c > index 4e3c29bb603c..6affa12222e0 100644 > --- a/kernel/irq/irqdesc.c > +++ b/kernel/irq/irqdesc.c > @@ -677,6 +677,15 @@ int generic_handle_domain_irq(struct irq_domain *domain, unsigned int hwirq) > EXPORT_SYMBOL_GPL(generic_handle_domain_irq); > > #ifdef CONFIG_HANDLE_DOMAIN_IRQ > + > +#ifdef ARCH_ENTERS_IRQ > +#define handle_irq_enter() > +#define handle_irq_exit() > +#else > +#define handle_irq_enter() irq_enter() > +#define handle_irq_exit() irq_exit() > +#endif > + > /** > * handle_domain_irq - Invoke the handler for a HW irq belonging to a domain, > * usually for a root interrupt controller > @@ -693,7 +702,7 @@ int handle_domain_irq(struct irq_domain *domain, > struct irq_desc *desc; > int ret = 0; > > - irq_enter(); > + handle_irq_enter(); > > /* The irqdomain code provides boundary checks */ > desc = irq_resolve_mapping(domain, hwirq); > @@ -702,7 +711,7 @@ int handle_domain_irq(struct irq_domain *domain, > else > ret = -EINVAL; > > - irq_exit(); > + handle_irq_exit(); > set_irq_regs(old_regs); > return ret; > } >