
[v3,0/3] arm64: entry: Convert to generic entry

Message ID 20240629085601.470241-1-ruanjinjie@huawei.com (mailing list archive)

Message

Jinjie Ruan June 29, 2024, 8:55 a.m. UTC
Currently, x86, RISC-V and LoongArch use the generic entry. Convert arm64
to use the generic entry infrastructure from kernel/entry/*. The generic
entry makes maintainers' work easier and the code more elegant, and also
removes a lot of duplicate code.

Changes in v3:
- Test the MTE test cases.
- Handle forget_syscall() in arch_post_report_syscall_entry()
- Make the arch funcs not use __weak as Thomas suggested, so move
  the arch funcs to entry-common.h, and make arch_forget_syscall() folded
  in arch_post_report_syscall_entry() as suggested.
- Move report_single_step() to thread_info.h for arm64
- Change __always_inline() to inline, add inline for the other arch funcs.
- Remove unused signal.h for entry-common.h.
- Add Suggested-by.
- Update the commit message.

Changes in v2:
- Add tested-by.
- Fix a bug that not call arch_post_report_syscall_entry() in
  syscall_trace_enter() if ptrace_report_syscall_entry() return not zero.
- Refactor report_syscall().
- Add comment for arch_prepare_report_syscall_exit().
- Adjust entry-common.h header file inclusion to alphabetical order.
- Update the commit message.

Jinjie Ruan (3):
  entry: Add some arch funcs to support arm64 to use generic entry
  arm64: Prepare to switch to generic entry
  arm64: entry: Convert to generic entry

 arch/arm64/Kconfig                    |   1 +
 arch/arm64/include/asm/entry-common.h | 172 ++++++++++++
 arch/arm64/include/asm/ptrace.h       |   5 +
 arch/arm64/include/asm/stacktrace.h   |   5 +-
 arch/arm64/include/asm/syscall.h      |   6 +-
 arch/arm64/include/asm/thread_info.h  |  23 +-
 arch/arm64/kernel/entry-common.c      | 368 +++++---------------------
 arch/arm64/kernel/ptrace.c            |  90 -------
 arch/arm64/kernel/signal.c            |   3 +-
 arch/arm64/kernel/syscall.c           |  18 +-
 include/linux/entry-common.h          |  90 +++++++
 include/linux/thread_info.h           |  13 +
 kernel/entry/common.c                 |  37 +--
 13 files changed, 395 insertions(+), 436 deletions(-)
 create mode 100644 arch/arm64/include/asm/entry-common.h

Comments

Kees Cook July 1, 2024, 3:40 p.m. UTC | #1
On Sat, Jun 29, 2024 at 04:55:58PM +0800, Jinjie Ruan wrote:
> Changes in v3:
> - Test the MTE test cases.
> - Handle forget_syscall() in arch_post_report_syscall_entry()
> - Make the arch funcs not use __weak as Thomas suggested, so move
>   the arch funcs to entry-common.h, and make arch_forget_syscall() folded
>   in arch_post_report_syscall_entry() as suggested.
> - Move report_single_step() to thread_info.h for arm64
> - Change __always_inline() to inline, add inline for the other arch funcs.
> - Remove unused signal.h for entry-common.h.
> - Add Suggested-by.
> - Update the commit message.

I didn't see the mentioned feedback from tglx in any of the threads. Is
lore busted or is there discussion on this series happening somewhere
else?
Jinjie Ruan July 2, 2024, 1:01 a.m. UTC | #2
On 2024/7/1 23:40, Kees Cook wrote:
> On Sat, Jun 29, 2024 at 04:55:58PM +0800, Jinjie Ruan wrote:
>> Changes in v3:
>> - Test the MTE test cases.
>> - Handle forget_syscall() in arch_post_report_syscall_entry()
>> - Make the arch funcs not use __weak as Thomas suggested, so move
>>   the arch funcs to entry-common.h, and make arch_forget_syscall() folded
>>   in arch_post_report_syscall_entry() as suggested.
>> - Move report_single_step() to thread_info.h for arm64
>> - Change __always_inline() to inline, add inline for the other arch funcs.
>> - Remove unused signal.h for entry-common.h.
>> - Add Suggested-by.
>> - Update the commit message.
> 
> I didn't see the mentioned feedback from tglx in any of the threads. Is
> lore busted or is there discussion on this series happening somewhere
> else?

It seems Thomas only sent it to me without a public email.

>
Kees Cook July 2, 2024, 8:59 p.m. UTC | #3
On Tue, Jul 02, 2024 at 09:01:42AM +0800, Jinjie Ruan wrote:
> 
> 
> On 2024/7/1 23:40, Kees Cook wrote:
> > On Sat, Jun 29, 2024 at 04:55:58PM +0800, Jinjie Ruan wrote:
> >> Changes in v3:
> >> - Test the MTE test cases.
> >> - Handle forget_syscall() in arch_post_report_syscall_entry()
> >> - Make the arch funcs not use __weak as Thomas suggested, so move
> >>   the arch funcs to entry-common.h, and make arch_forget_syscall() folded
> >>   in arch_post_report_syscall_entry() as suggested.
> >> - Move report_single_step() to thread_info.h for arm64
> >> - Change __always_inline() to inline, add inline for the other arch funcs.
> >> - Remove unused signal.h for entry-common.h.
> >> - Add Suggested-by.
> >> - Update the commit message.
> > 
> > I didn't see the mentioned feedback from tglx in any of the threads. Is
> > lore busted or is there discussion on this series happening somewhere
> > else?
> 
> It Seems Thomas only sent it to me without a public email.

Ah ha! Okay, thanks. I thought I was missing some discussion somewhere. :)
Thomas Gleixner July 7, 2024, 7:20 p.m. UTC | #4
On Tue, Jul 02 2024 at 13:59, Kees Cook wrote:
> On Tue, Jul 02, 2024 at 09:01:42AM +0800, Jinjie Ruan wrote:
>> It Seems Thomas only sent it to me without a public email.
>
> Ah ha! Okay, thanks. I thought I was missing some discussion somewhere. :)

No idea why though ....
Mark Rutland Oct. 17, 2024, 3:25 p.m. UTC | #5
Hi,

On Sat, Jun 29, 2024 at 04:55:58PM +0800, Jinjie Ruan wrote:
> Currently, x86, Riscv, Loongarch use the generic entry. Convert arm64
> to use the generic entry infrastructure from kernel/entry/*. The generic
> entry makes maintainers' work easier and codes more elegant, which aslo
> removed a lot of duplicate code.

>  arch/arm64/Kconfig                    |   1 +
>  arch/arm64/include/asm/entry-common.h | 172 ++++++++++++
>  arch/arm64/include/asm/ptrace.h       |   5 +
>  arch/arm64/include/asm/stacktrace.h   |   5 +-
>  arch/arm64/include/asm/syscall.h      |   6 +-
>  arch/arm64/include/asm/thread_info.h  |  23 +-
>  arch/arm64/kernel/entry-common.c      | 368 +++++---------------------
>  arch/arm64/kernel/ptrace.c            |  90 -------
>  arch/arm64/kernel/signal.c            |   3 +-
>  arch/arm64/kernel/syscall.c           |  18 +-
>  include/linux/entry-common.h          |  90 +++++++
>  include/linux/thread_info.h           |  13 +
>  kernel/entry/common.c                 |  37 +--
>  13 files changed, 395 insertions(+), 436 deletions(-)
>  create mode 100644 arch/arm64/include/asm/entry-common.h

Looking at this I have a few concerns, which I've tried to explain
below.

Firstly, this is difficult to review (and will be difficult to test,
queue, and debug in future) because lots of independent changes are made
all at once. I think that needs to be split out more.

It would be good if preparatory rework/cleanup could be split into a few
patches that we could consider queueing before the rest of the series,
or even if we decide to not pick the rest of the series. For example,
patch 2 should be split into:

* One patch that replaces arm64's interrupts_enabled() with
  regs_irqs_disabled(), removing interrupts_enabled() entirely rather
  than implementing regs_irqs_disabled() using interrupts_enabled().

  That'll require updating existing users, but the end result will be
  more consistent and have fewer lines of code.

* One patch that changes on_thread_stack() from a macro to a function.
  The commit message for patch 2 currently says:

  >  Make on_thread_stack() compatible with generic entry.   
 
  ... but it's not clear to me *what* that incompatibility is, and that
  should be explained in the commit message.

* One patch that splits report_syscall() into report_syscall_enter() and
  report_syscall_exit(). This should have no functional change.

Patch 3 in particular is very hard to follow because several unrelated
complex systems are updated simultaneously. It would be really nice if
we could move to the generic syscall code separately from moving the rest
of the entry code, as the syscall handling code is a particularly
important ABI concern, and it's difficult to see whether we're making
ABI changes (accidentally or knowingly).

Can we split that up (e.g. splitting the generic code first into
separate entry and syscall files), or are those too tightly coupled for
that to be possible?

At the end of the series, pt_regs::{lockdep_hardirqs,exit_rcu} still
exist, though they're unused. It would be nicer if we could get rid of
those in a preparatory patch, e.g. have enter_from_kernel_mode() and
exit_to_kernel_mode() use an irqentry_state_t (or a temporary
arm64-specific version). That would make the subsequent changes clearer
since we'd already have the same structure.

In the end result, there's a lot of bouncing between noinstr functions
where things are inlined today. For example, el0_da() calls
irqentry_enter_from_user_mode(), which is an out-of-line noinstr wrapper
for enter_from_user_mode(), which is an __always_inline function in a
header. It would be nice to avoid unnecessary bouncing through
out-of-line functions. I see s390 and x86 use enter_from_user_mode()
directly.

There's also some indirection that I don't think is necessary *and*
hides important ordering concerns and results in mistakes. In
particular, note that before this series, enter_from_kernel_mode() calls
the (instrumentable) MTE checks *after* all the necessary lockdep+RCU
management is performed by __enter_from_kernel_mode():

	static void noinstr enter_from_kernel_mode(struct pt_regs *regs)
	{
	        __enter_from_kernel_mode(regs);
		mte_check_tfsr_entry();
		mte_disable_tco_entry(current);
	}

... whereas after this series is applied, those MTE checks are placed in
arch_enter_from_kernel_mode(), which irqentry_enter() calls *before* the
necessary lockdep+RCU management. That is broken.

It would be better to keep that explicit in the arm64 entry code with
arm64-specific wrappers, e.g.

	static noinstr irqentry_state_t enter_from_kernel_mode(struct pt_regs *regs)
	{
		irqentry_state_t state = irqentry_enter(regs);
		mte_check_tfsr_entry();
		mte_disable_tco_entry(current);

		return state;
	}

... which would avoid the need for arch_enter_from_kernel_mode(), make
that ordering obvious, and would remove the need to modify all the
callers.

Likewise for the user entry/exit paths, which would avoid the visual
imbalance of:
	
	irqentry_enter_from_user_mode();
	...
	exit_to_user_mode_wrapper()

Thanks,
Mark.
Jinjie Ruan Oct. 21, 2024, 8:30 a.m. UTC | #6
On 2024/10/17 23:25, Mark Rutland wrote:
> Hi,
> 
> On Sat, Jun 29, 2024 at 04:55:58PM +0800, Jinjie Ruan wrote:
>> Currently, x86, Riscv, Loongarch use the generic entry. Convert arm64
>> to use the generic entry infrastructure from kernel/entry/*. The generic
>> entry makes maintainers' work easier and codes more elegant, which aslo
>> removed a lot of duplicate code.
> 
>>  arch/arm64/Kconfig                    |   1 +
>>  arch/arm64/include/asm/entry-common.h | 172 ++++++++++++
>>  arch/arm64/include/asm/ptrace.h       |   5 +
>>  arch/arm64/include/asm/stacktrace.h   |   5 +-
>>  arch/arm64/include/asm/syscall.h      |   6 +-
>>  arch/arm64/include/asm/thread_info.h  |  23 +-
>>  arch/arm64/kernel/entry-common.c      | 368 +++++---------------------
>>  arch/arm64/kernel/ptrace.c            |  90 -------
>>  arch/arm64/kernel/signal.c            |   3 +-
>>  arch/arm64/kernel/syscall.c           |  18 +-
>>  include/linux/entry-common.h          |  90 +++++++
>>  include/linux/thread_info.h           |  13 +
>>  kernel/entry/common.c                 |  37 +--
>>  13 files changed, 395 insertions(+), 436 deletions(-)
>>  create mode 100644 arch/arm64/include/asm/entry-common.h
> 
> Looking at this I have a few concerns, which I've tried to explain
> below.
> 
> Firstly, this is difficult to review (and will be difficult to test,
> queue. and debug in future) because lots of independent changes are made
> all at once. I think that needs to be split out more.
> 
> It would be good if preparatory rework/cleanup could be split into a few
> patches that we could consider queueing before the rest of the series,
> or even if we decide to not pick the rest of the series. For example,
> patch 2 should be split into:
> 
> * One patch that replaces arm64's interrupts_enabled() with
>   regs_irqs_disabled(), removing interrupts_enabled() entirely rather
>   than implementing regs_irqs_disabled() using interrupts_enabled().

Yes, only the new version is needed; the old interrupts_enabled() should
be removed.

> 
>   That'll require updating existing users, but the end result will be
>   more consistent and have fewer lines of code.
> 
> * One patch that changes on_thread_stack() from a macro to a function.
>   The commit message for patch 2 currently says:
> 
>   >  Make on_thread_stack() compatible with generic entry.   
>  
>   ... but it's not clear to me *what* that incompatibility is, and that
>   should be explained in the commit message.

This change is not needed, I'll remove it.

> 
> * One patch that splits report_syscall() into report_syscall_enter() and
>   report_syscall_exit(). This should have no functional change.

Yes, that will be more clear.

> 
> Patch 3 in particular is very hard to follow because several unrelated
> complex systems are updated simultaneously. It would be really nice if
> we could move to the generic sycall code separately from moving the rest
> of the entry code, as the sycall handling code is a particularly
> important ABI concern, and it's difficult to see whether we're making
> ABI changes (accidentaly or knowingly).
> 
> Can we split that up (e.g. splitting the generic code first into
> separate entry and syscall files), or are those too tightly coupled for
> that to be possible?

It will be hard, but I will try to split it. They are surely tightly
coupled, which makes the third patch too big when I try to switch to the
generic entry.

> 
> At the end of the series, pt_regs::{lockdep_hardirqs,exit_rcu} still
> exist, though they're unused. It would be nicer if we could get rid of
> those in a preparatory patch, e.g. have enter_from_kernel_mode() and
> exit_to_kernel_mode() use an irqentry_state_t (or a temporary
> arm64-specific version). That would make the subsequent changes clearer
> since we'd already have the same structure.

You are totally right. When I do as you said, the third switch patch is
smoother and more comprehensible.

> 
> In the end result, there's a lot of bouncing between noinstr functions
> where things are inlined today. For example, el0_da() calls
> irqentry_enter_from_user_mode(), which is an out-of-line noinstr wrapper
> for enter_from_user_mode(), which is an __always_inline function in a
> header. It would be nice to avoid unnecessary bouncing through
> out-of-line functions. I see s390 and x86 use enter_from_user_mode()
> directly.

Yes, enter_from_user_mode() is enough; there is no need to use the
wrapper irqentry_enter_from_user_mode().

> 
> There's also some indirection that I don't think is necessary *and*
> hides important ordering concerns and results in mistakes. In
> particular, note that before this series, enter_from_kernel_mode() calls
> the (instrumentable) MTE checks *after* all the necessary lockdep+RCU
> management is performed by __enter_from_kernel_mode():
> 
> 	static void noinstr enter_from_kernel_mode(struct pt_regs *regs)
> 	{
> 	        __enter_from_kernel_mode(regs);
> 		mte_check_tfsr_entry();
> 		mte_disable_tco_entry(current);
> 	}
> 
> ... whereas after this series is applied, those MTE checks are placed in
> arch_enter_from_kernel_mode(), which irqentry_enter() calls *before* the
> necessary lockdep+RCU management. That is broken.

Yes, these MTE checks can be wrapped in arm64 version
enter_from_kernel_mode() code, and the new defined arch functions can be
removed.

> 
> It would be better to keep that explicit in the arm64 entry code with
> arm64-specific wrappers, e.g.
> 
> 	static noinstr irqentry_state_t enter_from_kernel_mode(struct pt_regs *regs)
> 	{
> 		irqentry_state_t state = irqentry_enter(regs);
> 		mte_check_tfsr_entry();
> 		mte_disable_tco_entry(current);
> 
> 		return state;
> 	}
> 
> ... which would avoid the need for arch_enter_from_kernel_mode(), make
> that ordering obvious, and would remove the need to modify all the
> callers.
> 
> Likewise for the user entry/exit paths, which would avoid the visual
> imbalance of:
> 	
> 	irqentry_enter_from_user_mode();
> 	...
> 	exit_to_user_mode_wrapper()
> 

Yes, it is not clear; we can eliminate this sense of imbalance by
renaming and wrapping them.

> Thanks,
> Mark.
>
Mark Rutland Oct. 21, 2024, 9:45 a.m. UTC | #7
On Mon, Oct 21, 2024 at 04:30:51PM +0800, Jinjie Ruan wrote:
> On 2024/10/17 23:25, Mark Rutland wrote:
> > On Sat, Jun 29, 2024 at 04:55:58PM +0800, Jinjie Ruan wrote:

> > Patch 3 in particular is very hard to follow because several unrelated
> > complex systems are updated simultaneously. It would be really nice if
> > we could move to the generic sycall code separately from moving the rest
> > of the entry code, as the sycall handling code is a particularly
> > important ABI concern, and it's difficult to see whether we're making
> > ABI changes (accidentaly or knowingly).
> > 
> > Can we split that up (e.g. splitting the generic code first into
> > separate entry and syscall files), or are those too tightly coupled for
> > that to be possible?
> 
> It will be hard, but I will try to split it, they are surely tightly
> coupled which make the 3th patch too big when I try to switch to generic
> entry.

I'm confused. The point I'm making is don't try to switch to *all* the
generic entry code in one go: split the 3rd patch into smaller,
logically-distinct separate steps. The 3rd patch shouldn't get larger as
you should be changing fewer lines in any individual patch.

The regular entry state management (e.g. enter_from_user_mode() and
exit_to_user_mode()) is largely separate from the syscall state
management, which is pretty clear given
syscall_enter_from_user_mode_prepare() and syscall_exit_to_user_mode()
wrap the regular entry logic:

| noinstr void syscall_enter_from_user_mode_prepare(struct pt_regs *regs)
| {
|         enter_from_user_mode(regs);
|         instrumentation_begin();
|         local_irq_enable();
|         instrumentation_end();
| }

| __visible noinstr void syscall_exit_to_user_mode(struct pt_regs *regs)
| {
|         instrumentation_begin();
|         __syscall_exit_to_user_mode_work(regs);
|         instrumentation_end();
|         exit_to_user_mode();
| }

... and while exit_to_user_mode_prepare() is called by
irqentry_exit_to_user_mode(), that's also just a wrapper around
exit_to_user_mode():

| noinstr void irqentry_exit_to_user_mode(struct pt_regs *regs)
| {
|         instrumentation_begin();
|         exit_to_user_mode_prepare(regs);
|         instrumentation_end();
|         exit_to_user_mode();
| }

... so AFAICT we could move arm64 over to enter_from_user_mode() and
exit_to_user_mode() without needing to use any of the generic syscall
logic.

Doing that first, *then* moving over to the generic syscall handling
would be much easier to review/test/bisect, and if there are any ABI
issues with the syscall handling in particular (which I think is
likely), it will be easier to handle those in isolation.

Thanks,
Mark.
Jinjie Ruan Oct. 22, 2024, 12:07 p.m. UTC | #8
On 2024/10/17 23:25, Mark Rutland wrote:
> Hi,
> 
> On Sat, Jun 29, 2024 at 04:55:58PM +0800, Jinjie Ruan wrote:
>> Currently, x86, Riscv, Loongarch use the generic entry. Convert arm64
>> to use the generic entry infrastructure from kernel/entry/*. The generic
>> entry makes maintainers' work easier and codes more elegant, which aslo
>> removed a lot of duplicate code.
> 
>>  arch/arm64/Kconfig                    |   1 +
>>  arch/arm64/include/asm/entry-common.h | 172 ++++++++++++
>>  arch/arm64/include/asm/ptrace.h       |   5 +
>>  arch/arm64/include/asm/stacktrace.h   |   5 +-
>>  arch/arm64/include/asm/syscall.h      |   6 +-
>>  arch/arm64/include/asm/thread_info.h  |  23 +-
>>  arch/arm64/kernel/entry-common.c      | 368 +++++---------------------
>>  arch/arm64/kernel/ptrace.c            |  90 -------
>>  arch/arm64/kernel/signal.c            |   3 +-
>>  arch/arm64/kernel/syscall.c           |  18 +-
>>  include/linux/entry-common.h          |  90 +++++++
>>  include/linux/thread_info.h           |  13 +
>>  kernel/entry/common.c                 |  37 +--
>>  13 files changed, 395 insertions(+), 436 deletions(-)
>>  create mode 100644 arch/arm64/include/asm/entry-common.h
> 
> Looking at this I have a few concerns, which I've tried to explain
> below.
> 
> Firstly, this is difficult to review (and will be difficult to test,
> queue. and debug in future) because lots of independent changes are made
> all at once. I think that needs to be split out more.
> 
> It would be good if preparatory rework/cleanup could be split into a few
> patches that we could consider queueing before the rest of the series,
> or even if we decide to not pick the rest of the series. For example,
> patch 2 should be split into:
> 
> * One patch that replaces arm64's interrupts_enabled() with
>   regs_irqs_disabled(), removing interrupts_enabled() entirely rather
>   than implementing regs_irqs_disabled() using interrupts_enabled().
> 
>   That'll require updating existing users, but the end result will be
>   more consistent and have fewer lines of code.
> 
> * One patch that changes on_thread_stack() from a macro to a function.
>   The commit message for patch 2 currently says:
> 
>   >  Make on_thread_stack() compatible with generic entry.   
>  
>   ... but it's not clear to me *what* that incompatibility is, and that
>   should be explained in the commit message.
> 
> * One patch that splits report_syscall() into report_syscall_enter() and
>   report_syscall_exit(). This should have no functional change.
> 
> Patch 3 in particular is very hard to follow because several unrelated
> complex systems are updated simultaneously. It would be really nice if
> we could move to the generic sycall code separately from moving the rest
> of the entry code, as the sycall handling code is a particularly
> important ABI concern, and it's difficult to see whether we're making
> ABI changes (accidentaly or knowingly).
> 
> Can we split that up (e.g. splitting the generic code first into
> separate entry and syscall files), or are those too tightly coupled for
> that to be possible?
> 
> At the end of the series, pt_regs::{lockdep_hardirqs,exit_rcu} still
> exist, though they're unused. It would be nicer if we could get rid of
> those in a preparatory patch, e.g. have enter_from_kernel_mode() and
> exit_to_kernel_mode() use an irqentry_state_t (or a temporary
> arm64-specific version). That would make the subsequent changes clearer
> since we'd already have the same structure.
> 
> In the end result, there's a lot of bouncing between noinstr functions
> where things are inlined today. For example, el0_da() calls
> irqentry_enter_from_user_mode(), which is an out-of-line noinstr wrapper
> for enter_from_user_mode(), which is an __always_inline function in a
> header. It would be nice to avoid unnecessary bouncing through
> out-of-line functions. I see s390 and x86 use enter_from_user_mode()
> directly.
> 
> There's also some indirection that I don't think is necessary *and*
> hides important ordering concerns and results in mistakes. In
> particular, note that before this series, enter_from_kernel_mode() calls
> the (instrumentable) MTE checks *after* all the necessary lockdep+RCU
> management is performed by __enter_from_kernel_mode():
> 
> 	static void noinstr enter_from_kernel_mode(struct pt_regs *regs)
> 	{
> 	        __enter_from_kernel_mode(regs);
> 		mte_check_tfsr_entry();
> 		mte_disable_tco_entry(current);
> 	}
> 
> ... whereas after this series is applied, those MTE checks are placed in
> arch_enter_from_kernel_mode(), which irqentry_enter() calls *before* the
> necessary lockdep+RCU management. That is broken.
> 
> It would be better to keep that explicit in the arm64 entry code with
> arm64-specific wrappers, e.g.
> 
> 	static noinstr irqentry_state_t enter_from_kernel_mode(struct pt_regs *regs)
> 	{
> 		irqentry_state_t state = irqentry_enter(regs);
> 		mte_check_tfsr_entry();
> 		mte_disable_tco_entry(current);
> 
> 		return state;
> 	}

Hi Mark, it seems there is a problem with arm64_preempt_schedule_irq()
when wrapping irqentry_exit() with exit_to_kernel_mode().

arm64_preempt_schedule_irq() handles PREEMPT_DYNAMIC and IRQ
preemption, which corresponds to raw_irqentry_exit_cond_resched() in the
generic code, called by irqentry_exit().

Currently only __el1_irq() calls arm64_preempt_schedule_irq(), but when
we switch all exit_to_kernel_mode() callers to an arm64-specific wrapper
around irqentry_exit(), not only __el1_irq() but also el1_abort(),
el1_pc(), el1_undef() etc. will try to reschedule via logic similar to
arm64_preempt_schedule_irq().


> 
> ... which would avoid the need for arch_enter_from_kernel_mode(), make
> that ordering obvious, and would remove the need to modify all the
> callers.
> 
> Likewise for the user entry/exit paths, which would avoid the visual
> imbalance of:
> 	
> 	irqentry_enter_from_user_mode();
> 	...
> 	exit_to_user_mode_wrapper()
> 
> Thanks,
> Mark.
>
Mark Rutland Oct. 22, 2024, 1:08 p.m. UTC | #9
On Tue, Oct 22, 2024 at 08:07:54PM +0800, Jinjie Ruan wrote:
> On 2024/10/17 23:25, Mark Rutland wrote:
> > There's also some indirection that I don't think is necessary *and*
> > hides important ordering concerns and results in mistakes. In
> > particular, note that before this series, enter_from_kernel_mode() calls
> > the (instrumentable) MTE checks *after* all the necessary lockdep+RCU
> > management is performed by __enter_from_kernel_mode():
> > 
> > 	static void noinstr enter_from_kernel_mode(struct pt_regs *regs)
> > 	{
> > 	        __enter_from_kernel_mode(regs);
> > 		mte_check_tfsr_entry();
> > 		mte_disable_tco_entry(current);
> > 	}
> > 
> > ... whereas after this series is applied, those MTE checks are placed in
> > arch_enter_from_kernel_mode(), which irqentry_enter() calls *before* the
> > necessary lockdep+RCU management. That is broken.
> > 
> > It would be better to keep that explicit in the arm64 entry code with
> > arm64-specific wrappers, e.g.
> > 
> > 	static noinstr irqentry_state_t enter_from_kernel_mode(struct pt_regs *regs)
> > 	{
> > 		irqentry_state_t state = irqentry_enter(regs);
> > 		mte_check_tfsr_entry();
> > 		mte_disable_tco_entry(current);
> > 
> > 		return state;
> > 	}
> 
> Hi, Mark, It seems that there is a problem for
> arm64_preempt_schedule_irq() when wrap irqentry_exit() with
> exit_to_kernel_mode().
> 
> The arm64_preempt_schedule_irq() is about PREEMPT_DYNAMIC and preempt
> irq which is the raw_irqentry_exit_cond_resched() in generic code called
> by irqentry_exit().
> 
> Only __el1_irq() call arm64_preempt_schedule_irq(), but when we switch
> all exit_to_kernel_mode() to arm64-specific one that wrap
> irqentry_exit(), not only __el1_irq() but also el1_abort(), el1_pc(),
> el1_undef() etc. will try to reschedule by calling
> arm64_preempt_schedule_irq() similar logic.

Yes, the generic entry code will preempt any context where an interrupt
*could* have been taken from.

I'm not sure it actually matters either way; I believe that the generic
entry code was written this way because that's what x86 did, and
checking for whether interrupts are enabled in the interrupted context
is cheap.

I'd suggest you first write a patch to align arm64's entry code with the
generic code, by removing the call to arm64_preempt_schedule_irq() from
__el1_irq() and adding a call to arm64_preempt_schedule_irq() in
__exit_to_kernel_mode(), e.g.

| static __always_inline void __exit_to_kernel_mode(struct pt_regs *regs)
| {
| 	...
| 	if (interrupts_enabled(regs)) {
| 		...
| 		if (regs->exit_rcu) {
| 			...
| 		}
| 		...
| 		arm64_preempt_schedule_irq();
| 		...
| 	} else {
| 		...
| 	}
| }

That way the behaviour and structure will be more aligned with the
generic code, and with that as an independent patch it will be easier to
review/test/bisect/etc.

This change will have at least two key impacts:

(1) We'll preempt even without taking a "real" interrupt. That
    shouldn't result in preemption that wasn't possible before, 
    but it does change the probability of preempting at certain points,
    and might have a performance impact, so probably warrants a
    benchmark.

(2) We will not preempt when taking interrupts from a region of kernel
    code where IRQs are enabled but RCU is not watching, matching the
    behaviour of the generic entry code.

    This has the potential to introduce livelock if we can ever have a
    screaming interrupt in such a region, so we'll need to go figure out
    whether that's actually a problem.

    Having this as a separate patch will make it easier to test/bisect
    for that specifically.

Mark.
Mark Rutland Oct. 22, 2024, 1:37 p.m. UTC | #10
On Tue, Oct 22, 2024 at 02:08:12PM +0100, Mark Rutland wrote:
> On Tue, Oct 22, 2024 at 08:07:54PM +0800, Jinjie Ruan wrote:
> > On 2024/10/17 23:25, Mark Rutland wrote:
> > > There's also some indirection that I don't think is necessary *and*
> > > hides important ordering concerns and results in mistakes. In
> > > particular, note that before this series, enter_from_kernel_mode() calls
> > > the (instrumentable) MTE checks *after* all the necessary lockdep+RCU
> > > management is performed by __enter_from_kernel_mode():
> > > 
> > > 	static void noinstr enter_from_kernel_mode(struct pt_regs *regs)
> > > 	{
> > > 	        __enter_from_kernel_mode(regs);
> > > 		mte_check_tfsr_entry();
> > > 		mte_disable_tco_entry(current);
> > > 	}
> > > 
> > > ... whereas after this series is applied, those MTE checks are placed in
> > > arch_enter_from_kernel_mode(), which irqentry_enter() calls *before* the
> > > necessary lockdep+RCU management. That is broken.
> > > 
> > > It would be better to keep that explicit in the arm64 entry code with
> > > arm64-specific wrappers, e.g.
> > > 
> > > 	static noinstr irqentry_state_t enter_from_kernel_mode(struct pt_regs *regs)
> > > 	{
> > > 		irqentry_state_t state = irqentry_enter(regs);
> > > 		mte_check_tfsr_entry();
> > > 		mte_disable_tco_entry(current);
> > > 
> > > 		return state;
> > > 	}
> > 
> > Hi, Mark, It seems that there is a problem for
> > arm64_preempt_schedule_irq() when wrap irqentry_exit() with
> > exit_to_kernel_mode().
> > 
> > The arm64_preempt_schedule_irq() is about PREEMPT_DYNAMIC and preempt
> > irq which is the raw_irqentry_exit_cond_resched() in generic code called
> > by irqentry_exit().
> > 
> > Only __el1_irq() call arm64_preempt_schedule_irq(), but when we switch
> > all exit_to_kernel_mode() to arm64-specific one that wrap
> > irqentry_exit(), not only __el1_irq() but also el1_abort(), el1_pc(),
> > el1_undef() etc. will try to reschedule by calling
> > arm64_preempt_schedule_irq() similar logic.
> 
> Yes, the generic entry code will preempt any context where an interrupt
> *could* have been taken from.
> 
> I'm not sure it actually matters either way; I believe that the generic
> entry code was written this way because that's what x86 did, and
> checking for whether interrupts are enabled in the interrupted context
> is cheap.
> 
> I's suggest you first write a patch to align arm64's entry code with the
> generic code, by removing the call to arm64_preempt_schedule_irq() from
> __el1_irq(), and adding a call to arm64_preempt_schedule_irq() in
> __exit_to_kernel_mode(), e.g.
> 
> | static __always_inline void __exit_to_kernel_mode(struct pt_regs *regs)
> | {
> | 	...
> | 	if (interrupts_enabled(regs)) {
> | 		...
> | 		if (regs->exit_rcu) {
> | 			...
> | 		}
> | 		...
> | 		arm64_preempt_schedule_irq();
> | 		...
> | 	} else {
> | 		...
> | 	}
> | }
> 
> That way the behaviour and structure will be more aligned with the
> generic code, and with that as an independent patch it will be easier to
> review/test/bisect/etc.
> 
> This change will have at least two key impacts:
> 
> (1) We'll preempt even without taking a "real" interrupt. That
>     shouldn't result in preemption that wasn't possible before, 
>     but it does change the probability of preempting at certain points,
>     and might have a performance impact, so probably warrants a
>     benchmark.
> 
> (2) We will not preempt when taking interrupts from a region of kernel
>     code where IRQs are enabled but RCU is not watching, matching the
>     behaviour of the generic entry code.
> 
>     This has the potential to introduce livelock if we can ever have a
>     screaming interrupt in such a region, so we'll need to go figure out
>     whether that's actually a problem.
> 
>     Having this as a separate patch will make it easier to test/bisect
>     for that specifically.

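To make the restructuring suggested above concrete, here is a rough sketch of what the combined __exit_to_kernel_mode() could look like. This is an illustration only, not a tested patch: the helper names (ct_irq_exit(), lockdep_hardirqs_on_prepare(), etc.) and the placement of the preemption check are my assumptions from reading the current arch/arm64/kernel/entry-common.c, and may not match the tree this series is based on.

```c
/*
 * Sketch only: preemption moved from __el1_irq() into the common
 * kernel-exit path, mirroring the structure of the generic entry code.
 */
static __always_inline void __exit_to_kernel_mode(struct pt_regs *regs)
{
	lockdep_assert_irqs_disabled();

	if (interrupts_enabled(regs)) {
		if (regs->exit_rcu) {
			/* RCU was not watching; don't try to preempt here. */
			trace_hardirqs_on_prepare();
			lockdep_hardirqs_on_prepare();
			ct_irq_exit();
			lockdep_hardirqs_on(CALLER_ADDR0);
			return;
		}

		/*
		 * Preempt on exit from *any* exception taken from a
		 * context with IRQs enabled, not just from __el1_irq().
		 */
		if (IS_ENABLED(CONFIG_PREEMPTION))
			arm64_preempt_schedule_irq();

		trace_hardirqs_on();
	} else {
		if (regs->exit_rcu)
			ct_irq_exit();
	}
}
```

With something like this in place, the explicit arm64_preempt_schedule_irq() call in __el1_irq() would be dropped, which is exactly the behaviour change whose two impacts are discussed above.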
Looking some more, the pseudo-NMI DAIF check in
arm64_preempt_schedule_irq() is going to be a problem, because the
IRQ/NMI path manages DAIF distinctly from all other exception handlers.

I suspect we'll need to factor that out more in the generic entry code.

Mark.
Russell King (Oracle) Oct. 22, 2024, 1:58 p.m. UTC | #11
On Thu, Oct 17, 2024 at 04:25:36PM +0100, Mark Rutland wrote:
> Hi,
> 
> On Sat, Jun 29, 2024 at 04:55:58PM +0800, Jinjie Ruan wrote:
> > Currently, x86, Riscv, Loongarch use the generic entry. Convert arm64
> > to use the generic entry infrastructure from kernel/entry/*. The generic
> > entry makes maintainers' work easier and the code more elegant, which also
> > removes a lot of duplicate code.
> 
> >  arch/arm64/Kconfig                    |   1 +
> >  arch/arm64/include/asm/entry-common.h | 172 ++++++++++++
> >  arch/arm64/include/asm/ptrace.h       |   5 +
> >  arch/arm64/include/asm/stacktrace.h   |   5 +-
> >  arch/arm64/include/asm/syscall.h      |   6 +-
> >  arch/arm64/include/asm/thread_info.h  |  23 +-
> >  arch/arm64/kernel/entry-common.c      | 368 +++++---------------------
> >  arch/arm64/kernel/ptrace.c            |  90 -------
> >  arch/arm64/kernel/signal.c            |   3 +-
> >  arch/arm64/kernel/syscall.c           |  18 +-
> >  include/linux/entry-common.h          |  90 +++++++
> >  include/linux/thread_info.h           |  13 +
> >  kernel/entry/common.c                 |  37 +--
> >  13 files changed, 395 insertions(+), 436 deletions(-)
> >  create mode 100644 arch/arm64/include/asm/entry-common.h
> 
> Looking at this I have a few concerns, which I've tried to explain
> below.

One concern I notice Mark didn't mention (and I've made this same
point in response to LinusW's series doing similar for 32-bit ARM)
is... we need to quantify what the impact is of making these changes.

Historically, people have worked hard to make the overhead from
syscalls as low as possible - if one looks at the x86 architecture,
Linux first started off using int $0x80 to deliver syscalls to the
kernel. Later CPUs invented sysenter and syscall instructions which
I guess result in increased performance over the old int $0x80
approach.

syscall entry/exit times are one of those trivial things to measure,
most commonly done by seeing how many getpid() calls one can make
to the kernel in a defined period of time - and it's one of those
measurements people consider (rightly or wrongly) to be a measure
of the system performance.

When one considers that the majority of interactions userspace has
with the kernel involve syscalls, one can see why this is an
important path to be as performant as possible.

So, I think it's important to always quantify what the impact is of
any major change to the kernel entry/exit paths - it's all part of
properly evaluating whether the change makes sense.

If switching to generic code causes a significant degradation in
performance, then that needs to be investigated and fixed - not
waited for until someone pops up and eventually says "hey, that
change to use the generic code resulted in our systems becoming
much less performant!" six or twelve months down the road.
Linus Walleij Nov. 1, 2024, 11:01 p.m. UTC | #12
On Tue, Oct 22, 2024 at 3:59 PM Russell King (Oracle)
<linux@armlinux.org.uk> wrote:

> So, I think it's important to always quantify what the impact is of
> any major change to the kernel entry/exit paths - it's all part of
> properly evaluating whether the change makes sense.
>
> If switching to generic code causes a significant degradation in
> performance, then that needs to be investigated and fixed - not
> waited for until someone pops up and eventually says "hey, that
> change to use the generic code resulted in our systems becoming
> much less performant!" six or twelve months down the road.

I agree, I did some measurements for the ARM32 patch which is
in RFC v2. I ran:

perf bench syscall all

This issues 10 million getppid calls and measures how much time
that takes (on a system with some background IRQ noise). I issued
this three times (which takes some time to execute) before and
after the patch set, then compared the best performance before the
patch with the worst performance after the patch.

On ARM32 generic entry costs some 3-4% of performance overhead
(increased execution time).

By chance I discovered that PAN (privileged access never,
CPU_SW_DOMAIN_PAN) costs around 400% increased syscall
execution time overhead (two orders of magnitude more than generic
entry). Debian apparently turns PAN off, maybe by mistake, maybe
for this reason. I was a bit surprised by it; I don't know if I made
some error in the measurement.

Yours,
Linus Walleij