
[1/3] ptrace,syscall_user_dispatch: Implement Syscall User Dispatch Suspension

Message ID 20230109153348.5625-2-gregory.price@memverge.com
State New
Series Checkpoint Support for Syscall User Dispatch

Commit Message

Gregory Price Jan. 9, 2023, 3:33 p.m. UTC
Add PTRACE_O_SUSPEND_SYSCALL_USER_DISPATCH to the ptrace options, and
modify Syscall User Dispatch to suspend interception while it is set.

This is modeled after the PTRACE_O_SUSPEND_SECCOMP feature, which
suspends SECCOMP interposition.  Without it, system calls that software
like CRIU injects into a process are intercepted by Syscall User
Dispatch, causing either a crash (due to blocked signals) or the
delivery of the resulting SIGSYS to a ptracer (not the intended behavior).

Since Syscall User Dispatch is not a privileged feature, no permission
check is required; however, attempting to set this option when
CONFIG_CHECKPOINT_RESTORE is not enabled should be disallowed, as its
intended use is checkpoint/restore.
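From the tracer's side, the option would be requested like any other ptrace option: through PTRACE_SETOPTIONS on an attached, stopped tracee. The sketch below assumes the bit value (1 << 22) proposed in the uapi hunk of this patch; because libc headers may not define the name, it uses a hypothetical local macro, SKETCH_PTRACE_O_SUSPEND_SUD. The helper names are illustrative only and are not taken from CRIU or any library.

```c
#include <errno.h>
#include <stdio.h>
#include <sys/ptrace.h>
#include <sys/types.h>

/* Hypothetical local name for the bit proposed in this patch's uapi hunk. */
#define SKETCH_PTRACE_O_SUSPEND_SUD (1UL << 22)

/* Build the PTRACE_SETOPTIONS word a checkpointing tracer might use:
 * kill the tracee if the tracer exits, and optionally suspend SUD. */
unsigned long sketch_tracer_options(int suspend_sud)
{
	unsigned long opts = PTRACE_O_EXITKILL;

	if (suspend_sud)
		opts |= SKETCH_PTRACE_O_SUSPEND_SUD;
	return opts;
}

/* Apply the options to an already attached and stopped tracee.
 * Per this patch, the kernel answers EINVAL when CONFIG_CHECKPOINT_RESTORE
 * is disabled; a kernel that does not know the bit at all also rejects it
 * with EINVAL, because it falls outside PTRACE_O_MASK. */
int sketch_apply_options(pid_t pid, int suspend_sud)
{
	if (ptrace(PTRACE_SETOPTIONS, pid, NULL,
		   (void *)sketch_tracer_options(suspend_sud)) == -1) {
		int err = errno;	/* save before perror() can touch it */

		perror("PTRACE_SETOPTIONS");
		return -err;
	}
	return 0;
}
```

Keeping the option word in a pure helper lets the tracer decide per tracee whether suspension is wanted, while the ptrace() call itself stays a thin wrapper.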

Signed-off-by: Gregory Price <gregory.price@memverge.com>
---
 include/linux/ptrace.h               | 2 ++
 include/uapi/linux/ptrace.h          | 6 +++++-
 kernel/entry/syscall_user_dispatch.c | 5 +++++
 kernel/ptrace.c                      | 5 +++++
 4 files changed, 17 insertions(+), 1 deletion(-)

Comments

Peter Zijlstra Jan. 18, 2023, 5:16 p.m. UTC | #1
On Mon, Jan 09, 2023 at 10:33:46AM -0500, Gregory Price wrote:
> @@ -36,6 +37,10 @@ bool syscall_user_dispatch(struct pt_regs *regs)
>  	struct syscall_user_dispatch *sd = &current->syscall_dispatch;
>  	char state;
>  
> +	if (IS_ENABLED(CONFIG_CHECKPOINT_RESTORE) &&
> +			unlikely(current->ptrace & PT_SUSPEND_SYSCALL_USER_DISPATCH))
> +		return false;
> +
>  	if (likely(instruction_pointer(regs) - sd->offset < sd->len))
>  		return false;
>  

So by making syscall_user_dispatch() return false, we'll make
syscall_trace_enter() continue to handle things, and supposedly you want
to land in ptrace_report_syscall_entry(), right?
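The two early exits in the quoted syscall_user_dispatch() hunk can be modeled in user space as below. This is a simplified sketch: the struct and function names are hypothetical, and the real function also reads the selector byte from user memory, handles the vDSO sigreturn case, and queues SIGSYS, whereas here the selector is reduced to a boolean.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical user-space stand-in for struct syscall_user_dispatch. */
struct sketch_sud {
	uintptr_t offset;	/* start of the always-allowed code region */
	uintptr_t len;		/* length of that region */
	bool selector_blocks;	/* models the selector being set to "block" */
};

/* Returns true when SUD would intercept the syscall (the kernel would skip
 * it and deliver SIGSYS), false when entry processing continues. */
bool sketch_sud_intercepts(const struct sketch_sud *sd, uintptr_t ip,
			   bool suspended)
{
	/* The check added by this patch: a tracer-set suspension wins. */
	if (suspended)
		return false;

	/* Unsigned wrap-around: an ip below offset becomes a huge value, so
	 * this single compare tests offset <= ip < offset + len. */
	if (ip - sd->offset < sd->len)
		return false;

	return sd->selector_blocks;
}
```

Returning false here is what hands control back to syscall_trace_enter(), which is the point Peter raises next.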

> diff --git a/kernel/ptrace.c b/kernel/ptrace.c
> index 54482193e1ed..a6ad815bd4be 100644
> --- a/kernel/ptrace.c
> +++ b/kernel/ptrace.c
> @@ -370,6 +370,11 @@ static int check_ptrace_options(unsigned long data)
>  	if (data & ~(unsigned long)PTRACE_O_MASK)
>  		return -EINVAL;
>  
> +	if (unlikely(data & PTRACE_O_SUSPEND_SYSCALL_USER_DISPATCH)) {
> +		if (!IS_ENABLED(CONFIG_CHECKPOINT_RESTORE))
> +			return -EINVAL;
> +	}

Should setting this then not also depend on having
SYSCALL_WORK_SYSCALL_TRACE set? Because without that, you get 'funny'
things.
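The option handling in the quoted ptrace.c hunk can be sketched in user space as follows: validate the requested bits against the extended mask, gate the new bit on checkpoint/restore support, then store the options in the task's ptrace flags shifted by PT_OPT_FLAG_SHIFT (3 in include/linux/ptrace.h). The SK_* macros and function names are hypothetical; the boolean parameter stands in for IS_ENABLED(CONFIG_CHECKPOINT_RESTORE), and the PTRACE_O_SUSPEND_SECCOMP capability check is omitted.

```c
#include <errno.h>
#include <stdbool.h>

/* Option bits as defined in include/uapi/linux/ptrace.h after this patch. */
#define SK_O_EXITKILL		(1UL << 20)
#define SK_O_SUSPEND_SECCOMP	(1UL << 21)
#define SK_O_SUSPEND_SUD	(1UL << 22)
#define SK_O_MASK		(0x000000ffUL | SK_O_EXITKILL | \
				 SK_O_SUSPEND_SECCOMP | SK_O_SUSPEND_SUD)
#define SK_PT_OPT_FLAG_SHIFT	3	/* kernel's PT_OPT_FLAG_SHIFT */

/* Simplified model of check_ptrace_options() with the new gate. */
int sketch_check_options(unsigned long data, bool checkpoint_restore)
{
	if (data & ~SK_O_MASK)
		return -EINVAL;	/* unknown option bits */
	if ((data & SK_O_SUSPEND_SUD) && !checkpoint_restore)
		return -EINVAL;	/* only meaningful for checkpoint/restore */
	return 0;
}

/* Model of how the options land in task->ptrace: clear all option bits,
 * then OR in the new ones, so PT_SUSPEND_SYSCALL_USER_DISPATCH is set. */
unsigned long sketch_store_options(unsigned long ptrace_flags,
				   unsigned long data)
{
	ptrace_flags &= ~(SK_O_MASK << SK_PT_OPT_FLAG_SHIFT);
	return ptrace_flags | (data << SK_PT_OPT_FLAG_SHIFT);
}
```

Nothing in this path looks at SYSCALL_WORK_SYSCALL_TRACE, which is exactly the dependency Peter asks about.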
Gregory Price Jan. 18, 2023, 7:49 p.m. UTC | #2
On Wed, Jan 18, 2023 at 02:41:00PM -0500, Gregory Price wrote:
> ---------- Forwarded message ---------
> From: Peter Zijlstra <peterz@infradead.org>
> Date: Wed, Jan 18, 2023 at 12:16 PM
> Subject: Re: [PATCH 1/3] ptrace,syscall_user_dispatch: Implement Syscall
> User Dispatch Suspension
> To: Gregory Price <gourry.memverge@gmail.com>
> 
> 
> On Mon, Jan 09, 2023 at 10:33:46AM -0500, Gregory Price wrote:
> > @@ -36,6 +37,10 @@ bool syscall_user_dispatch(struct pt_regs *regs)
> >       struct syscall_user_dispatch *sd = &current->syscall_dispatch;
> >       char state;
> >
> > +     if (IS_ENABLED(CONFIG_CHECKPOINT_RESTORE) &&
> > +                     unlikely(current->ptrace & PT_SUSPEND_SYSCALL_USER_DISPATCH))
> > +             return false;
> > +
> >       if (likely(instruction_pointer(regs) - sd->offset < sd->len))
> >               return false;
> >
> 
> So by making syscall_user_dispatch() return false, we'll make
> syscall_trace_enter() continue to handle things, and supposedly you want
> to land in ptrace_report_syscall_entry(), right?
>
> ... snip ...
> 
> Should setting this then not also depend on having
> SYSCALL_WORK_SYSCALL_TRACE set? Because without that, you get 'funny'
> things.

Hm, this is an interesting question.  My thoughts are that I want the
process to handle the syscall as if Syscall User Dispatch were not
present at all, regardless of SYSCALL_TRACE.

This is because some software, like CRIU, actually injects syscalls to
run in the context of the traced process in an effort to collect its
resources.

So I actually *want* those 'funny' things to occur, because they're most
likely intentional.  I don't necessarily want to intercept system calls
that subsequently occur (although I might).

So if this feature required SYSCALL_TRACE, you would no longer be able
to inject system calls à la CRIU.

That's also my understanding of the PTRACE_O_SUSPEND_SECCOMP feature:
it's intended specifically to allow *otherwise disallowed* syscalls to
be injected into the process with SECCOMP bypassed (which is exactly
why PTRACE_O_SUSPEND_SECCOMP requires CAP_SYS_ADMIN).
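The trade-off discussed here follows from the order of checks in syscall_trace_enter(): Syscall User Dispatch is consulted first, and the ptrace syscall-entry report only happens when SYSCALL_WORK_SYSCALL_TRACE is set. The model below is a simplified sketch with hypothetical names; seccomp, audit, tracepoints and the tracer's ability to rewrite or skip the syscall at the entry stop are all omitted. Here `sud_active` means SUD is enabled, the call site is outside the allowed region, and the selector is set to block.

```c
#include <stdbool.h>

/* Hypothetical outcomes of the (simplified) syscall entry path. */
enum sketch_outcome {
	SKETCH_SIGSYS,		/* SUD intercepts: syscall skipped, SIGSYS sent */
	SKETCH_TRACE_STOP,	/* tracer gets a syscall-entry stop first */
	SKETCH_RUN,		/* syscall runs with no further interception */
};

enum sketch_outcome sketch_entry(bool sud_active, bool sud_suspended,
				 bool syscall_trace)
{
	/* SUD runs first; suspension makes syscall_user_dispatch() return false. */
	if (sud_active && !sud_suspended)
		return SKETCH_SIGSYS;
	/* Only reached when SUD did not intercept. */
	if (syscall_trace)
		return SKETCH_TRACE_STOP;
	return SKETCH_RUN;
}
```

With suspension set and no SYSCALL_TRACE, an injected syscall takes the SKETCH_RUN path: it executes natively, which is the "sharp instrument" behavior the thread settles on.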
Peter Zijlstra Jan. 18, 2023, 8:40 p.m. UTC | #3
On Wed, Jan 18, 2023 at 02:49:31PM -0500, Gregory Price wrote:
> On Wed, Jan 18, 2023 at 02:41:00PM -0500, Gregory Price wrote:
> > ---------- Forwarded message ---------
> > From: Peter Zijlstra <peterz@infradead.org>
> > Date: Wed, Jan 18, 2023 at 12:16 PM
> > Subject: Re: [PATCH 1/3] ptrace,syscall_user_dispatch: Implement Syscall
> > User Dispatch Suspension
> > To: Gregory Price <gourry.memverge@gmail.com>
> > 
> > 
> > On Mon, Jan 09, 2023 at 10:33:46AM -0500, Gregory Price wrote:
> > > @@ -36,6 +37,10 @@ bool syscall_user_dispatch(struct pt_regs *regs)
> > >       struct syscall_user_dispatch *sd = &current->syscall_dispatch;
> > >       char state;
> > >
> > > +     if (IS_ENABLED(CONFIG_CHECKPOINT_RESTORE) &&
> > > +                     unlikely(current->ptrace & PT_SUSPEND_SYSCALL_USER_DISPATCH))
> > > +             return false;
> > > +
> > >       if (likely(instruction_pointer(regs) - sd->offset < sd->len))
> > >               return false;
> > >
> > 
> > So by making syscall_user_dispatch() return false, we'll make
> > syscall_trace_enter() continue to handle things, and supposedly you want
> > to land in ptrace_report_syscall_entry(), right?
> >
> > ... snip ...
> > 
> > Should setting this then not also depend on having
> > SYSCALL_WORK_SYSCALL_TRACE set? Because without that, you get 'funny'
> > things.
> 
> Hm, this is an interesting question.  My thoughts are that I want the
> process to handle the syscall as if syscall user dispatch were not
> present at all, regardless of SYSCALL_TRACE.
> 
> This is because some software, like CRIU, actually injects syscalls to
> run in the context of the software in an effort to collect resources.

Oh, right. I used to know that.

> So I actually *want* those 'funny' things to occur, because they're most
> likely intentional.  I don't necessarily want to intercept system calls
> that subsequently occur (although I might).
> 
> So if this feature required SYSCALL_TRACE, you would no longer be able
> to inject system calls à la CRIU.

Yeah, I suppose you're right. It makes it a very sharp instrument, but I
suppose you get what you asked for.

Patch

diff --git a/include/linux/ptrace.h b/include/linux/ptrace.h
index eaaef3ffec22..461ae5c99d57 100644
--- a/include/linux/ptrace.h
+++ b/include/linux/ptrace.h
@@ -45,6 +45,8 @@  extern int ptrace_access_vm(struct task_struct *tsk, unsigned long addr,
 
 #define PT_EXITKILL		(PTRACE_O_EXITKILL << PT_OPT_FLAG_SHIFT)
 #define PT_SUSPEND_SECCOMP	(PTRACE_O_SUSPEND_SECCOMP << PT_OPT_FLAG_SHIFT)
+#define PT_SUSPEND_SYSCALL_USER_DISPATCH \
+	(PTRACE_O_SUSPEND_SYSCALL_USER_DISPATCH << PT_OPT_FLAG_SHIFT)
 
 extern long arch_ptrace(struct task_struct *child, long request,
 			unsigned long addr, unsigned long data);
diff --git a/include/uapi/linux/ptrace.h b/include/uapi/linux/ptrace.h
index 195ae64a8c87..ba9e3f19a22c 100644
--- a/include/uapi/linux/ptrace.h
+++ b/include/uapi/linux/ptrace.h
@@ -146,9 +146,13 @@  struct ptrace_rseq_configuration {
 /* eventless options */
 #define PTRACE_O_EXITKILL		(1 << 20)
 #define PTRACE_O_SUSPEND_SECCOMP	(1 << 21)
+#define PTRACE_O_SUSPEND_SYSCALL_USER_DISPATCH	(1 << 22)
 
 #define PTRACE_O_MASK		(\
-	0x000000ff | PTRACE_O_EXITKILL | PTRACE_O_SUSPEND_SECCOMP)
+	0x000000ff | \
+	PTRACE_O_EXITKILL | \
+	PTRACE_O_SUSPEND_SECCOMP | \
+	PTRACE_O_SUSPEND_SYSCALL_USER_DISPATCH)
 
 #include <asm/ptrace.h>
 
diff --git a/kernel/entry/syscall_user_dispatch.c b/kernel/entry/syscall_user_dispatch.c
index 0b6379adff6b..f097c06224c9 100644
--- a/kernel/entry/syscall_user_dispatch.c
+++ b/kernel/entry/syscall_user_dispatch.c
@@ -8,6 +8,7 @@ 
 #include <linux/uaccess.h>
 #include <linux/signal.h>
 #include <linux/elf.h>
+#include <linux/ptrace.h>
 
 #include <linux/sched/signal.h>
 #include <linux/sched/task_stack.h>
@@ -36,6 +37,10 @@  bool syscall_user_dispatch(struct pt_regs *regs)
 	struct syscall_user_dispatch *sd = &current->syscall_dispatch;
 	char state;
 
+	if (IS_ENABLED(CONFIG_CHECKPOINT_RESTORE) &&
+			unlikely(current->ptrace & PT_SUSPEND_SYSCALL_USER_DISPATCH))
+		return false;
+
 	if (likely(instruction_pointer(regs) - sd->offset < sd->len))
 		return false;
 
diff --git a/kernel/ptrace.c b/kernel/ptrace.c
index 54482193e1ed..a6ad815bd4be 100644
--- a/kernel/ptrace.c
+++ b/kernel/ptrace.c
@@ -370,6 +370,11 @@  static int check_ptrace_options(unsigned long data)
 	if (data & ~(unsigned long)PTRACE_O_MASK)
 		return -EINVAL;
 
+	if (unlikely(data & PTRACE_O_SUSPEND_SYSCALL_USER_DISPATCH)) {
+		if (!IS_ENABLED(CONFIG_CHECKPOINT_RESTORE))
+			return -EINVAL;
+	}
+
 	if (unlikely(data & PTRACE_O_SUSPEND_SECCOMP)) {
 		if (!IS_ENABLED(CONFIG_CHECKPOINT_RESTORE) ||
 		    !IS_ENABLED(CONFIG_SECCOMP))