Message ID: 20240629085601.470241-1-ruanjinjie@huawei.com (mailing list archive)
Series: arm64: entry: Convert to generic entry
On Sat, Jun 29, 2024 at 04:55:58PM +0800, Jinjie Ruan wrote:
> Changes in v3:
> - Test the MTE test cases.
> - Handle forget_syscall() in arch_post_report_syscall_entry()
> - Make the arch funcs not use __weak as Thomas suggested, so move
>   the arch funcs to entry-common.h, and make arch_forget_syscall() folded
>   in arch_post_report_syscall_entry() as suggested.
> - Move report_single_step() to thread_info.h for arm64
> - Change __always_inline() to inline, add inline for the other arch funcs.
> - Remove unused signal.h for entry-common.h.
> - Add Suggested-by.
> - Update the commit message.

I didn't see the mentioned feedback from tglx in any of the threads. Is
lore busted or is there discussion on this series happening somewhere
else?
On 2024/7/1 23:40, Kees Cook wrote:
> On Sat, Jun 29, 2024 at 04:55:58PM +0800, Jinjie Ruan wrote:
>> Changes in v3:
>> [...]
>
> I didn't see the mentioned feedback from tglx in any of the threads. Is
> lore busted or is there discussion on this series happening somewhere
> else?

It seems Thomas only sent it to me without a public email.
On Tue, Jul 02, 2024 at 09:01:42AM +0800, Jinjie Ruan wrote:
> On 2024/7/1 23:40, Kees Cook wrote:
> > I didn't see the mentioned feedback from tglx in any of the threads.
> > Is lore busted or is there discussion on this series happening
> > somewhere else?
>
> It seems Thomas only sent it to me without a public email.

Ah ha! Okay, thanks. I thought I was missing some discussion somewhere. :)
On Tue, Jul 02 2024 at 13:59, Kees Cook wrote:
> On Tue, Jul 02, 2024 at 09:01:42AM +0800, Jinjie Ruan wrote:
>> It seems Thomas only sent it to me without a public email.
>
> Ah ha! Okay, thanks. I thought I was missing some discussion somewhere. :)

No idea why though ....
Hi,

On Sat, Jun 29, 2024 at 04:55:58PM +0800, Jinjie Ruan wrote:
> Currently, x86, Riscv, Loongarch use the generic entry. Convert arm64
> to use the generic entry infrastructure from kernel/entry/*. The generic
> entry makes maintainers' work easier and codes more elegant, which also
> removed a lot of duplicate code.
>
> arch/arm64/Kconfig                    |   1 +
> arch/arm64/include/asm/entry-common.h | 172 ++++++++++++
> arch/arm64/include/asm/ptrace.h       |   5 +
> arch/arm64/include/asm/stacktrace.h   |   5 +-
> arch/arm64/include/asm/syscall.h      |   6 +-
> arch/arm64/include/asm/thread_info.h  |  23 +-
> arch/arm64/kernel/entry-common.c      | 368 +++++---------------------
> arch/arm64/kernel/ptrace.c            |  90 -------
> arch/arm64/kernel/signal.c            |   3 +-
> arch/arm64/kernel/syscall.c           |  18 +-
> include/linux/entry-common.h          |  90 +++++++
> include/linux/thread_info.h           |  13 +
> kernel/entry/common.c                 |  37 +--
> 13 files changed, 395 insertions(+), 436 deletions(-)
> create mode 100644 arch/arm64/include/asm/entry-common.h

Looking at this I have a few concerns, which I've tried to explain
below.

Firstly, this is difficult to review (and will be difficult to test,
queue, and debug in future) because lots of independent changes are made
all at once. I think that needs to be split out more.

It would be good if preparatory rework/cleanup could be split into a few
patches that we could consider queueing before the rest of the series,
or even if we decide to not pick the rest of the series. For example,
patch 2 should be split into:

* One patch that replaces arm64's interrupts_enabled() with
  regs_irqs_disabled(), removing interrupts_enabled() entirely rather
  than implementing regs_irqs_disabled() using interrupts_enabled().

  That'll require updating existing users, but the end result will be
  more consistent and have fewer lines of code.

* One patch that changes on_thread_stack() from a macro to a function.
  The commit message for patch 2 currently says:

  > Make on_thread_stack() compatible with generic entry.

  ...
but it's not clear to me *what* that incompatibility is, and that should
be explained in the commit message.

* One patch that splits report_syscall() into report_syscall_enter() and
  report_syscall_exit(). This should have no functional change.

Patch 3 in particular is very hard to follow because several unrelated
complex systems are updated simultaneously. It would be really nice if
we could move to the generic syscall code separately from moving the
rest of the entry code, as the syscall handling code is a particularly
important ABI concern, and it's difficult to see whether we're making
ABI changes (accidentally or knowingly).

Can we split that up (e.g. splitting the generic code first into
separate entry and syscall files), or are those too tightly coupled for
that to be possible?

At the end of the series, pt_regs::{lockdep_hardirqs,exit_rcu} still
exist, though they're unused. It would be nicer if we could get rid of
those in a preparatory patch, e.g. have enter_from_kernel_mode() and
exit_to_kernel_mode() use an irqentry_state_t (or a temporary
arm64-specific version). That would make the subsequent changes clearer
since we'd already have the same structure.

In the end result, there's a lot of bouncing between noinstr functions
where things are inlined today. For example, el0_da() calls
irqentry_enter_from_user_mode(), which is an out-of-line noinstr wrapper
for enter_from_user_mode(), which is an __always_inline function in a
header. It would be nice to avoid unnecessary bouncing through
out-of-line functions. I see s390 and x86 use enter_from_user_mode()
directly.

There's also some indirection that I don't think is necessary *and*
hides important ordering concerns and results in mistakes.
In particular, note that before this series, enter_from_kernel_mode()
calls the (instrumentable) MTE checks *after* all the necessary
lockdep+RCU management is performed by __enter_from_kernel_mode():

	static void noinstr enter_from_kernel_mode(struct pt_regs *regs)
	{
		__enter_from_kernel_mode(regs);
		mte_check_tfsr_entry();
		mte_disable_tco_entry(current);
	}

... whereas after this series is applied, those MTE checks are placed in
arch_enter_from_kernel_mode(), which irqentry_enter() calls *before* the
necessary lockdep+RCU management. That is broken.

It would be better to keep that explicit in the arm64 entry code with
arm64-specific wrappers, e.g.

	static noinstr irqentry_state_t enter_from_kernel_mode(struct pt_regs *regs)
	{
		irqentry_state_t state = irqentry_enter(regs);

		mte_check_tfsr_entry();
		mte_disable_tco_entry(current);

		return state;
	}

... which would avoid the need for arch_enter_from_kernel_mode(), make
that ordering obvious, and would remove the need to modify all the
callers.

Likewise for the user entry/exit paths, which would avoid the visual
imbalance of:

	irqentry_enter_from_user_mode();
	...
	exit_to_user_mode_wrapper()

Thanks,
Mark.
On 2024/10/17 23:25, Mark Rutland wrote:
> Hi,
>
> On Sat, Jun 29, 2024 at 04:55:58PM +0800, Jinjie Ruan wrote:
>> Currently, x86, Riscv, Loongarch use the generic entry. Convert arm64
>> to use the generic entry infrastructure from kernel/entry/*. The generic
>> entry makes maintainers' work easier and codes more elegant, which also
>> removed a lot of duplicate code.
>>
>> [...]
>
> Looking at this I have a few concerns, which I've tried to explain
> below.
>
> Firstly, this is difficult to review (and will be difficult to test,
> queue, and debug in future) because lots of independent changes are made
> all at once. I think that needs to be split out more.
>
> It would be good if preparatory rework/cleanup could be split into a few
> patches that we could consider queueing before the rest of the series,
> or even if we decide to not pick the rest of the series. For example,
> patch 2 should be split into:
>
> * One patch that replaces arm64's interrupts_enabled() with
>   regs_irqs_disabled(), removing interrupts_enabled() entirely rather
>   than implementing regs_irqs_disabled() using interrupts_enabled().

Yes, only the new version is needed; the old interrupts_enabled() should
be removed.
> That'll require updating existing users, but the end result will be
> more consistent and have fewer lines of code.
>
> * One patch that changes on_thread_stack() from a macro to a function.
>   The commit message for patch 2 currently says:
>
>   > Make on_thread_stack() compatible with generic entry.
>
>   ... but it's not clear to me *what* that incompatibility is, and that
>   should be explained in the commit message.

This change is not needed, I'll remove it.

> * One patch that splits report_syscall() into report_syscall_enter() and
>   report_syscall_exit(). This should have no functional change.

Yes, that will be clearer.

> Patch 3 in particular is very hard to follow because several unrelated
> complex systems are updated simultaneously. It would be really nice if
> we could move to the generic syscall code separately from moving the
> rest of the entry code, as the syscall handling code is a particularly
> important ABI concern, and it's difficult to see whether we're making
> ABI changes (accidentally or knowingly).
>
> Can we split that up (e.g. splitting the generic code first into
> separate entry and syscall files), or are those too tightly coupled for
> that to be possible?

It will be hard, but I will try to split it. They are surely tightly
coupled, which made the 3rd patch too big when I tried to switch to the
generic entry.

> At the end of the series, pt_regs::{lockdep_hardirqs,exit_rcu} still
> exist, though they're unused. It would be nicer if we could get rid of
> those in a preparatory patch, e.g. have enter_from_kernel_mode() and
> exit_to_kernel_mode() use an irqentry_state_t (or a temporary
> arm64-specific version). That would make the subsequent changes clearer
> since we'd already have the same structure.

You are totally right; when I did as you said, the third (switch) patch
became smoother and more comprehensible.

> In the end result, there's a lot of bouncing between noinstr functions
> where things are inlined today.
> For example, el0_da() calls
> irqentry_enter_from_user_mode(), which is an out-of-line noinstr wrapper
> for enter_from_user_mode(), which is an __always_inline function in a
> header. It would be nice to avoid unnecessary bouncing through
> out-of-line functions. I see s390 and x86 use enter_from_user_mode()
> directly.

Yes, enter_from_user_mode() is enough; there is no need to use the
wrapper irqentry_enter_from_user_mode().

> There's also some indirection that I don't think is necessary *and*
> hides important ordering concerns and results in mistakes. In
> particular, note that before this series, enter_from_kernel_mode() calls
> the (instrumentable) MTE checks *after* all the necessary lockdep+RCU
> management is performed by __enter_from_kernel_mode():
>
> 	static void noinstr enter_from_kernel_mode(struct pt_regs *regs)
> 	{
> 		__enter_from_kernel_mode(regs);
> 		mte_check_tfsr_entry();
> 		mte_disable_tco_entry(current);
> 	}
>
> ... whereas after this series is applied, those MTE checks are placed in
> arch_enter_from_kernel_mode(), which irqentry_enter() calls *before* the
> necessary lockdep+RCU management. That is broken.

Yes, these MTE checks can be wrapped in the arm64 version of
enter_from_kernel_mode(), and the newly defined arch functions can be
removed.

> It would be better to keep that explicit in the arm64 entry code with
> arm64-specific wrappers, e.g.
>
> 	static noinstr irqentry_state_t enter_from_kernel_mode(struct pt_regs *regs)
> 	{
> 		irqentry_state_t state = irqentry_enter(regs);
>
> 		mte_check_tfsr_entry();
> 		mte_disable_tco_entry(current);
>
> 		return state;
> 	}
>
> ... which would avoid the need for arch_enter_from_kernel_mode(), make
> that ordering obvious, and would remove the need to modify all the
> callers.
>
> Likewise for the user entry/exit paths, which would avoid the visual
> imbalance of:
>
> 	irqentry_enter_from_user_mode();
> 	...
> 	exit_to_user_mode_wrapper()

Yes, it is not clear; we can eliminate this sense of imbalance by
renaming and wrapping them.

> Thanks,
> Mark.
On Mon, Oct 21, 2024 at 04:30:51PM +0800, Jinjie Ruan wrote:
> On 2024/10/17 23:25, Mark Rutland wrote:
> > On Sat, Jun 29, 2024 at 04:55:58PM +0800, Jinjie Ruan wrote:
> > Patch 3 in particular is very hard to follow because several unrelated
> > complex systems are updated simultaneously. It would be really nice if
> > we could move to the generic syscall code separately from moving the
> > rest of the entry code, as the syscall handling code is a particularly
> > important ABI concern, and it's difficult to see whether we're making
> > ABI changes (accidentally or knowingly).
> >
> > Can we split that up (e.g. splitting the generic code first into
> > separate entry and syscall files), or are those too tightly coupled for
> > that to be possible?
>
> It will be hard, but I will try to split it. They are surely tightly
> coupled, which made the 3rd patch too big when I tried to switch to the
> generic entry.

I'm confused. The point I'm making is don't try to switch to *all* the
generic entry code in one go: split the 3rd patch into smaller,
logically-distinct separate steps. The 3rd patch shouldn't get larger,
as you should be changing fewer lines in any individual patch.

The regular entry state management (e.g. enter_from_user_mode() and
exit_to_user_mode()) is largely separate from the syscall state
management, which is pretty clear given
syscall_enter_from_user_mode_prepare() and syscall_exit_to_user_mode()
wrap the regular entry logic:

| noinstr void syscall_enter_from_user_mode_prepare(struct pt_regs *regs)
| {
| 	enter_from_user_mode(regs);
| 	instrumentation_begin();
| 	local_irq_enable();
| 	instrumentation_end();
| }

| __visible noinstr void syscall_exit_to_user_mode(struct pt_regs *regs)
| {
| 	instrumentation_begin();
| 	__syscall_exit_to_user_mode_work(regs);
| 	instrumentation_end();
| 	exit_to_user_mode();
| }

...
and while exit_to_user_mode_prepare() is called by
irqentry_exit_to_user_mode(), that's also just a wrapper around
exit_to_user_mode():

| noinstr void irqentry_exit_to_user_mode(struct pt_regs *regs)
| {
| 	instrumentation_begin();
| 	exit_to_user_mode_prepare(regs);
| 	instrumentation_end();
| 	exit_to_user_mode();
| }

... so AFAICT we could move arm64 over to enter_from_user_mode() and
exit_to_user_mode() without needing to use any of the generic syscall
logic. Doing that first, *then* moving over to the generic syscall
handling would be much easier to review/test/bisect, and if there are
any ABI issues with the syscall handling in particular (which I think is
likely), it will be easier to handle those in isolation.

Thanks,
Mark.
On 2024/10/17 23:25, Mark Rutland wrote:
> Hi,
>
> On Sat, Jun 29, 2024 at 04:55:58PM +0800, Jinjie Ruan wrote:
[...]
> In particular, note that before this series, enter_from_kernel_mode()
> calls the (instrumentable) MTE checks *after* all the necessary
> lockdep+RCU management is performed by __enter_from_kernel_mode():
>
> 	static void noinstr enter_from_kernel_mode(struct pt_regs *regs)
> 	{
> 		__enter_from_kernel_mode(regs);
> 		mte_check_tfsr_entry();
> 		mte_disable_tco_entry(current);
> 	}
>
> ... whereas after this series is applied, those MTE checks are placed in
> arch_enter_from_kernel_mode(), which irqentry_enter() calls *before* the
> necessary lockdep+RCU management. That is broken.
>
> It would be better to keep that explicit in the arm64 entry code with
> arm64-specific wrappers, e.g.
>
> 	static noinstr irqentry_state_t enter_from_kernel_mode(struct pt_regs *regs)
> 	{
> 		irqentry_state_t state = irqentry_enter(regs);
>
> 		mte_check_tfsr_entry();
> 		mte_disable_tco_entry(current);
>
> 		return state;
> 	}

Hi, Mark, it seems that there is a problem with
arm64_preempt_schedule_irq() when wrapping irqentry_exit() with
exit_to_kernel_mode().

arm64_preempt_schedule_irq() is about PREEMPT_DYNAMIC and preempt irq,
which is the raw_irqentry_exit_cond_resched() in the generic code called
by irqentry_exit().

Only __el1_irq() calls arm64_preempt_schedule_irq(), but when we switch
all exit_to_kernel_mode() calls to the arm64-specific one that wraps
irqentry_exit(), not only __el1_irq() but also el1_abort(), el1_pc(),
el1_undef() etc. will try to reschedule by calling
arm64_preempt_schedule_irq() (or similar logic).

> ... which would avoid the need for arch_enter_from_kernel_mode(), make
> that ordering obvious, and would remove the need to modify all the
> callers.
>
> Likewise for the user entry/exit paths, which would avoid the visual
> imbalance of:
>
> 	irqentry_enter_from_user_mode();
> 	...
> 	exit_to_user_mode_wrapper()
>
> Thanks,
> Mark.
On Tue, Oct 22, 2024 at 08:07:54PM +0800, Jinjie Ruan wrote:
> On 2024/10/17 23:25, Mark Rutland wrote:
[...]
> Hi, Mark, it seems that there is a problem with
> arm64_preempt_schedule_irq() when wrapping irqentry_exit() with
> exit_to_kernel_mode().
>
> arm64_preempt_schedule_irq() is about PREEMPT_DYNAMIC and preempt irq,
> which is the raw_irqentry_exit_cond_resched() in the generic code called
> by irqentry_exit().
>
> Only __el1_irq() calls arm64_preempt_schedule_irq(), but when we switch
> all exit_to_kernel_mode() calls to the arm64-specific one that wraps
> irqentry_exit(), not only __el1_irq() but also el1_abort(), el1_pc(),
> el1_undef() etc. will try to reschedule by calling
> arm64_preempt_schedule_irq() (or similar logic).

Yes, the generic entry code will preempt any context where an interrupt
*could* have been taken from.
I'm not sure it actually matters either way; I believe that the generic
entry code was written this way because that's what x86 did, and
checking for whether interrupts are enabled in the interrupted context
is cheap.

I'd suggest you first write a patch to align arm64's entry code with the
generic code, by removing the call to arm64_preempt_schedule_irq() from
__el1_irq(), and adding a call to arm64_preempt_schedule_irq() in
__exit_to_kernel_mode(), e.g.

| static __always_inline void __exit_to_kernel_mode(struct pt_regs *regs)
| {
| 	...
| 	if (interrupts_enabled(regs)) {
| 		...
| 		if (regs->exit_rcu) {
| 			...
| 		}
| 		...
| 		arm64_preempt_schedule_irq();
| 		...
| 	} else {
| 		...
| 	}
| }

That way the behaviour and structure will be more aligned with the
generic code, and with that as an independent patch it will be easier to
review/test/bisect/etc.

This change will have at least two key impacts:

(1) We'll preempt even without taking a "real" interrupt. That
    shouldn't result in preemption that wasn't possible before,
    but it does change the probability of preempting at certain points,
    and might have a performance impact, so probably warrants a
    benchmark.

(2) We will not preempt when taking interrupts from a region of kernel
    code where IRQs are enabled but RCU is not watching, matching the
    behaviour of the generic entry code.

    This has the potential to introduce livelock if we can ever have a
    screaming interrupt in such a region, so we'll need to go figure out
    whether that's actually a problem.

    Having this as a separate patch will make it easier to test/bisect
    for that specifically.

Mark.
On Tue, Oct 22, 2024 at 02:08:12PM +0100, Mark Rutland wrote:
> On Tue, Oct 22, 2024 at 08:07:54PM +0800, Jinjie Ruan wrote:
[...]
> Yes, the generic entry code will preempt any context where an interrupt
> *could* have been taken from.
>
> I'd suggest you first write a patch to align arm64's entry code with
> the generic code, by removing the call to arm64_preempt_schedule_irq()
> from __el1_irq(), and adding a call to arm64_preempt_schedule_irq() in
> __exit_to_kernel_mode().
[...]
> Having this as a separate patch will make it easier to test/bisect
> for that specifically.

Looking some more, the pseudo-NMI DAIF check in
arm64_preempt_schedule_irq() is going to be a problem, because the
IRQ/NMI path manages DAIF distinctly from all other exception handlers.
I suspect we'll need to factor that out more in the generic entry code.

Mark.
On Thu, Oct 17, 2024 at 04:25:36PM +0100, Mark Rutland wrote:
> Hi,
>
> On Sat, Jun 29, 2024 at 04:55:58PM +0800, Jinjie Ruan wrote:
> > Currently, x86, Riscv, Loongarch use the generic entry. Convert arm64
> > to use the generic entry infrastructure from kernel/entry/*. The generic
> > entry makes maintainers' work easier and codes more elegant, which also
> > removed a lot of duplicate code.
> >
> > [...]
>
> Looking at this I have a few concerns, which I've tried to explain
> below.

One concern I notice Mark didn't mention (and I've made this same point
in response to LinusW's series doing similar for 32-bit ARM) is... we
need to quantify what the impact is of making these changes.

Historically, people have worked hard to make the overhead from syscalls
as low as possible - if one looks at the x86 architecture, Linux first
started off using int $0x80 to deliver syscalls to the kernel. Later
CPUs invented sysenter and syscall instructions, which I guess result in
increased performance over the old int $0x80 approach.
syscall entry/exit times are one of those trivial things to measure,
most commonly done by seeing how many getpid() calls one can make to the
kernel in a defined period of time - and it's one of those measurements
people consider (rightly or wrongly) to be a measure of the system
performance.

When one considers that the majority of interactions userspace has with
the kernel involve syscalls, one can see why this is an important path
to be as performant as possible.

So, I think it's important to always quantify what the impact is of any
major change to the kernel entry/exit paths - it's all part of properly
evaluating whether the change makes sense.

If switching to generic code causes a significant degradation in
performance, then that needs to be investigated and fixed - not waited
for until someone pops up and eventually says "hey, that change to use
the generic code resulted in our systems becoming much less performant!"
six or twelve months down the road.
On Tue, Oct 22, 2024 at 3:59 PM Russell King (Oracle)
<linux@armlinux.org.uk> wrote:

> So, I think it's important to always quantify what the impact is of
> any major change to the kernel entry/exit paths - it's all part of
> properly evaluating whether the change makes sense.
>
> If switching to generic code causes a significant degradation in
> performance, then that needs to be investigated and fixed - not
> waited for until someone pops up and eventually says "hey, that
> change to use the generic code resulted in our systems becoming
> much less performant!" six or twelve months down the road.

I agree. I did some measurements for the ARM32 patch which is in RFC v2.

I ran:

  perf bench syscall all

This issues 10 million getppid calls and measures how much time that
takes (on a system with some background IRQ noise). I issued this three
times (which takes some time to execute) before and after the patch set,
then compared the best performance before the patch with the worst
performance after the patch.

On ARM32, generic entry costs some 3-4% of performance overhead
(increased execution time).

By chance I discovered that PAN (privileged access never,
CPU_SW_DOMAIN_PAN) costs around 400% increased syscall execution time
overhead (two orders of magnitude more than generic entry). Debian
apparently turns PAN off, maybe by mistake, maybe for this reason. I was
a bit surprised by it; I don't know if I made some error in the
measurement.

Yours,
Linus Walleij