diff mbox

[RFC] ptrace: add generic SET_SYSCALL request

Message ID 1415346443-28915-1-git-send-email-takahiro.akashi@linaro.org (mailing list archive)
State New, archived
Headers show

Commit Message

AKASHI Takahiro Nov. 7, 2014, 7:47 a.m. UTC
This patch adds a new generic ptrace request, PTRACE_SET_SYSCALL.
It can be used to change a system call number as follows:
    ret = ptrace(pid, PTRACE_SET_SYSCALL, null, new_syscall_no);
'new_syscall_no' can be -1 to skip this system call, you need to modify
a register's value, in arch-specific way, as return value though.

Please note that we can't define PTRACE_SET_SYSCALL macro in
uapi/linux/ptrace.h partly because its value on arm, 23, is used as another
request on sparc.

This patch also contains an example of change on arch side, arm.
Only syscall_set_nr() is required to be defined in asm/syscall.h.

Currently only arm has this request, while arm64 would also have it
once my patch series of seccomp for arm64 is merged. It will also be
usable for most of other arches.
See the discussions in lak-ml:
http://lists.infradead.org/pipermail/linux-arm-kernel/2014-November/300167.html

Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org>
---
 arch/arm/include/asm/syscall.h |    7 +++++++
 arch/arm/kernel/ptrace.c       |    5 -----
 kernel/ptrace.c                |    6 ++++++
 3 files changed, 13 insertions(+), 5 deletions(-)

Comments

Arnd Bergmann Nov. 7, 2014, 9:30 a.m. UTC | #1
On Friday 07 November 2014 16:47:23 AKASHI Takahiro wrote:
> This patch adds a new generic ptrace request, PTRACE_SET_SYSCALL.
> It can be used to change a system call number as follows:
>     ret = ptrace(pid, PTRACE_SET_SYSCALL, null, new_syscall_no);
> 'new_syscall_no' can be -1 to skip this system call, you need to modify
> a register's value, in arch-specific way, as return value though.
> 
> Please note that we can't define PTRACE_SET_SYSCALL macro in
> uapi/linux/ptrace.h partly because its value on arm, 23, is used as another
> request on sparc.
> 
> This patch also contains an example of change on arch side, arm.
> Only syscall_set_nr() is required to be defined in asm/syscall.h.
> 
> Currently only arm has this request, while arm64 would also have it
> once my patch series of seccomp for arm64 is merged. It will also be
> usable for most of other arches.
> See the discussions in lak-ml:
> http://lists.infradead.org/pipermail/linux-arm-kernel/2014-November/300167.html
> 
> Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org>
> 

Can you describe why you are moving the implementation? Is this a feature
that we want to have on all architectures in the future? As you say,
only arm32 implements is at the moment.

	Arnd
Will Deacon Nov. 7, 2014, 11:55 a.m. UTC | #2
On Fri, Nov 07, 2014 at 09:30:53AM +0000, Arnd Bergmann wrote:
> On Friday 07 November 2014 16:47:23 AKASHI Takahiro wrote:
> > This patch adds a new generic ptrace request, PTRACE_SET_SYSCALL.
> > It can be used to change a system call number as follows:
> >     ret = ptrace(pid, PTRACE_SET_SYSCALL, null, new_syscall_no);
> > 'new_syscall_no' can be -1 to skip this system call, you need to modify
> > a register's value, in arch-specific way, as return value though.
> > 
> > Please note that we can't define PTRACE_SET_SYSCALL macro in
> > uapi/linux/ptrace.h partly because its value on arm, 23, is used as another
> > request on sparc.
> > 
> > This patch also contains an example of change on arch side, arm.
> > Only syscall_set_nr() is required to be defined in asm/syscall.h.
> > 
> > Currently only arm has this request, while arm64 would also have it
> > once my patch series of seccomp for arm64 is merged. It will also be
> > usable for most of other arches.
> > See the discussions in lak-ml:
> > http://lists.infradead.org/pipermail/linux-arm-kernel/2014-November/300167.html
> > 
> > Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org>
> > 
> 
> Can you describe why you are moving the implementation? Is this a feature
> that we want to have on all architectures in the future? As you say,
> only arm32 implements is at the moment.

We need this for arm64 and, since all architectures seem to have a mechanism
for setting a system call via ptrace, moving it to generic code should make
sense for new architectures too, no?

We don't have any arch-specific ptrace requests on arm64, so it would be
a shame if we had to add one now, especially since there's nothing
conceptually arch-specific about setting a syscall number.

Will
Arnd Bergmann Nov. 7, 2014, 12:03 p.m. UTC | #3
On Friday 07 November 2014 11:55:51 Will Deacon wrote:
> On Fri, Nov 07, 2014 at 09:30:53AM +0000, Arnd Bergmann wrote:
> > On Friday 07 November 2014 16:47:23 AKASHI Takahiro wrote:
> > > This patch adds a new generic ptrace request, PTRACE_SET_SYSCALL.
> > > It can be used to change a system call number as follows:
> > >     ret = ptrace(pid, PTRACE_SET_SYSCALL, null, new_syscall_no);
> > > 'new_syscall_no' can be -1 to skip this system call, you need to modify
> > > a register's value, in arch-specific way, as return value though.
> > > 
> > > Please note that we can't define PTRACE_SET_SYSCALL macro in
> > > uapi/linux/ptrace.h partly because its value on arm, 23, is used as another
> > > request on sparc.
> > > 
> > > This patch also contains an example of change on arch side, arm.
> > > Only syscall_set_nr() is required to be defined in asm/syscall.h.
> > > 
> > > Currently only arm has this request, while arm64 would also have it
> > > once my patch series of seccomp for arm64 is merged. It will also be
> > > usable for most of other arches.
> > > See the discussions in lak-ml:
> > > http://lists.infradead.org/pipermail/linux-arm-kernel/2014-November/300167.html
> > > 
> > > Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org>
> > > 
> > 
> > Can you describe why you are moving the implementation? Is this a feature
> > that we want to have on all architectures in the future? As you say,
> > only arm32 implements is at the moment.
> 
> We need this for arm64 and, since all architectures seem to have a mechanism
> for setting a system call via ptrace, moving it to generic code should make
> sense for new architectures too, no?

It makes a little more sense now, but I still don't understand why you
need to set the system call number via ptrace. What is this used for,
and why doesn't any other architecture have this?

	Arnd
Russell King - ARM Linux Nov. 7, 2014, 12:11 p.m. UTC | #4
On Fri, Nov 07, 2014 at 01:03:00PM +0100, Arnd Bergmann wrote:
> On Friday 07 November 2014 11:55:51 Will Deacon wrote:
> > We need this for arm64 and, since all architectures seem to have a mechanism
> > for setting a system call via ptrace, moving it to generic code should make
> > sense for new architectures too, no?
> 
> It makes a little more sense now, but I still don't understand why you
> need to set the system call number via ptrace. What is this used for,
> and why doesn't any other architecture have this?

All other architectures have a way.  x86, for example, you set orig_eax
(or orig_rax) to change the syscall number.  On ARM, that doesn't work
because we don't always pass the syscall number in a register.
Will Deacon Nov. 7, 2014, 12:27 p.m. UTC | #5
On Fri, Nov 07, 2014 at 12:03:00PM +0000, Arnd Bergmann wrote:
> On Friday 07 November 2014 11:55:51 Will Deacon wrote:
> > On Fri, Nov 07, 2014 at 09:30:53AM +0000, Arnd Bergmann wrote:
> > > On Friday 07 November 2014 16:47:23 AKASHI Takahiro wrote:
> > > > This patch adds a new generic ptrace request, PTRACE_SET_SYSCALL.
> > > > It can be used to change a system call number as follows:
> > > >     ret = ptrace(pid, PTRACE_SET_SYSCALL, null, new_syscall_no);
> > > > 'new_syscall_no' can be -1 to skip this system call, you need to modify
> > > > a register's value, in arch-specific way, as return value though.
> > > > 
> > > > Please note that we can't define PTRACE_SET_SYSCALL macro in
> > > > uapi/linux/ptrace.h partly because its value on arm, 23, is used as another
> > > > request on sparc.
> > > > 
> > > > This patch also contains an example of change on arch side, arm.
> > > > Only syscall_set_nr() is required to be defined in asm/syscall.h.
> > > > 
> > > > Currently only arm has this request, while arm64 would also have it
> > > > once my patch series of seccomp for arm64 is merged. It will also be
> > > > usable for most of other arches.
> > > > See the discussions in lak-ml:
> > > > http://lists.infradead.org/pipermail/linux-arm-kernel/2014-November/300167.html
> > > > 
> > > > Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org>
> > > > 
> > > 
> > > Can you describe why you are moving the implementation? Is this a feature
> > > that we want to have on all architectures in the future? As you say,
> > > only arm32 implements is at the moment.
> > 
> > We need this for arm64 and, since all architectures seem to have a mechanism
> > for setting a system call via ptrace, moving it to generic code should make
> > sense for new architectures too, no?
> 
> It makes a little more sense now, but I still don't understand why you
> need to set the system call number via ptrace. What is this used for,
> and why doesn't any other architecture have this?

I went through the same thought process back in August, and Akashi
eventually convinced me that this was the best thing to do:

  http://lists.infradead.org/pipermail/linux-arm-kernel/2014-August/278692.html

It comes down to a debugger (which could be GDB, seccomp, tracer ...)
wanting to change the system call number. This is also used as a mechanism
to skip a system call by setting it to '-1' (yeah, it's gross, and the
interaction between all of these syscall hooks is horrible too).

If we update w8 directly instead, we run into a couple of issues:

  - Needing to restore the original w8 if the value is set to '-1' for
    skip, but continuing to return -ENOSYS for syscall(-1) when not on a
    tracer path

  - seccomp assumes that syscall_get_nr will return the version set by
    the most recent tracer, so then we need hacks in ptrace to route
    register writes to w8 to syscallno in pt_regs, but again, only in the
    case that we're tracing.

Akashi might be able to elaborate on other problems, since this was a
couple of months ago and I take every opportunity I can to avoid looking
at this part of the kernel.

Will
Arnd Bergmann Nov. 7, 2014, 12:44 p.m. UTC | #6
On Friday 07 November 2014 12:11:19 Russell King - ARM Linux wrote:
> On Fri, Nov 07, 2014 at 01:03:00PM +0100, Arnd Bergmann wrote:
> > On Friday 07 November 2014 11:55:51 Will Deacon wrote:
> > > We need this for arm64 and, since all architectures seem to have a mechanism
> > > for setting a system call via ptrace, moving it to generic code should make
> > > sense for new architectures too, no?
> > 
> > It makes a little more sense now, but I still don't understand why you
> > need to set the system call number via ptrace. What is this used for,
> > and why doesn't any other architecture have this?
> 
> All other architectures have a way.  x86, for example, you set orig_eax
> (or orig_rax) to change the syscall number.  On ARM, that doesn't work
> because we don't always pass the syscall number in a register.
> 

Sorry for being slow today, but why can't we use the same interface that
s390 has on arm64:

static int s390_system_call_get(struct task_struct *target,
                                const struct user_regset *regset,
                                unsigned int pos, unsigned int count,
                                void *kbuf, void __user *ubuf)
{
        unsigned int *data = &task_thread_info(target)->system_call;
        return user_regset_copyout(&pos, &count, &kbuf, &ubuf,
                                   data, 0, sizeof(unsigned int));
}

static int s390_system_call_set(struct task_struct *target,
                                const struct user_regset *regset,
                                unsigned int pos, unsigned int count,
                                const void *kbuf, const void __user *ubuf)
{
        unsigned int *data = &task_thread_info(target)->system_call;
        return user_regset_copyin(&pos, &count, &kbuf, &ubuf,
                                  data, 0, sizeof(unsigned int));
}

static const struct user_regset s390_regsets[] = {
	...
        {
                .core_note_type = NT_S390_SYSTEM_CALL,
                .n = 1,
                .size = sizeof(unsigned int),
                .align = sizeof(unsigned int),
                .get = s390_system_call_get,
                .set = s390_system_call_set,
        },
	...
};

Is it just preference for being consistent with ARM32, or is there a
reason this won't work?

It's not that I care strongly about the interface, my main point is
that the changelog doesn't describe why one interface was used instead
the other.

	Arnd
Will Deacon Nov. 7, 2014, 1:11 p.m. UTC | #7
On Fri, Nov 07, 2014 at 12:44:07PM +0000, Arnd Bergmann wrote:
> On Friday 07 November 2014 12:11:19 Russell King - ARM Linux wrote:
> > On Fri, Nov 07, 2014 at 01:03:00PM +0100, Arnd Bergmann wrote:
> > > On Friday 07 November 2014 11:55:51 Will Deacon wrote:
> > > > We need this for arm64 and, since all architectures seem to have a mechanism
> > > > for setting a system call via ptrace, moving it to generic code should make
> > > > sense for new architectures too, no?
> > > 
> > > It makes a little more sense now, but I still don't understand why you
> > > need to set the system call number via ptrace. What is this used for,
> > > and why doesn't any other architecture have this?
> > 
> > All other architectures have a way.  x86, for example, you set orig_eax
> > (or orig_rax) to change the syscall number.  On ARM, that doesn't work
> > because we don't always pass the syscall number in a register.
> > 
> 
> Sorry for being slow today, but why can't we use the same interface that
> s390 has on arm64:
> 
> static int s390_system_call_get(struct task_struct *target,
>                                 const struct user_regset *regset,
>                                 unsigned int pos, unsigned int count,
>                                 void *kbuf, void __user *ubuf)
> {
>         unsigned int *data = &task_thread_info(target)->system_call;
>         return user_regset_copyout(&pos, &count, &kbuf, &ubuf,
>                                    data, 0, sizeof(unsigned int));
> }
> 
> static int s390_system_call_set(struct task_struct *target,
>                                 const struct user_regset *regset,
>                                 unsigned int pos, unsigned int count,
>                                 const void *kbuf, const void __user *ubuf)
> {
>         unsigned int *data = &task_thread_info(target)->system_call;
>         return user_regset_copyin(&pos, &count, &kbuf, &ubuf,
>                                   data, 0, sizeof(unsigned int));
> }
> 
> static const struct user_regset s390_regsets[] = {
> 	...
>         {
>                 .core_note_type = NT_S390_SYSTEM_CALL,
>                 .n = 1,
>                 .size = sizeof(unsigned int),
>                 .align = sizeof(unsigned int),
>                 .get = s390_system_call_get,
>                 .set = s390_system_call_set,
>         },
> 	...
> };
> 
> Is it just preference for being consistent with ARM32, or is there a
> reason this won't work?

Interesting, I hadn't considered a unit-length regset.

> It's not that I care strongly about the interface, my main point is
> that the changelog doesn't describe why one interface was used instead
> the other.

I suspect the current approach was taken because it follows the same scheme
as 32-bit ARM. If both methods are sufficient (Kees would have a better idea
than me on that), then I don't have a strong preference.

Will
Oleg Nesterov Nov. 7, 2014, 2:04 p.m. UTC | #8
On 11/07, AKASHI Takahiro wrote:
>
> --- a/arch/arm/kernel/ptrace.c
> +++ b/arch/arm/kernel/ptrace.c
> @@ -853,11 +853,6 @@ long arch_ptrace(struct task_struct *child, long request,
>  				       datap);
>  			break;
>  
> -		case PTRACE_SET_SYSCALL:
> -			task_thread_info(child)->syscall = data;
> -			ret = 0;
> -			break;
> -
>  #ifdef CONFIG_CRUNCH
>  		case PTRACE_GETCRUNCHREGS:
>  			ret = ptrace_getcrunchregs(child, datap);
> diff --git a/kernel/ptrace.c b/kernel/ptrace.c
> index 54e7522..d7048fa 100644
> --- a/kernel/ptrace.c
> +++ b/kernel/ptrace.c
> @@ -1001,6 +1001,12 @@ int ptrace_request(struct task_struct *child, long request,
>  		break;
>  	}
>  #endif
> +
> +#ifdef PTRACE_SET_SYSCALL
> +	case PTRACE_SET_SYSCALL:
> +		ret = syscall_set_nr(child, task_pt_regs(child), data);
> +		break;
> +#endif

I too do not understand why it makes sense to move PTRACE_SET_SYSCALL into
the common kernel/ptrace.c.

To me the fact that PTRACE_SET_SYSCALL can be undefined and syscall_set_nr()
is very much arch-dependant (but most probably trivial) means that this code
should live in arch_ptrace().

In any case, I think it doesn't make sense to pass task_pt_regs(child), this
helper can do this itself if it needs struct pt_regs.

Oleg.
Arnd Bergmann Nov. 7, 2014, 2:30 p.m. UTC | #9
On Friday 07 November 2014 13:11:30 Will Deacon wrote:
> 
> > It's not that I care strongly about the interface, my main point is
> > that the changelog doesn't describe why one interface was used instead
> > the other.
> 
> I suspect the current approach was taken because it follows the same scheme
> as 32-bit ARM. If both methods are sufficient (Kees would have a better idea
> than me on that), then I don't have a strong preference.

Using the regset would probably address Oleg's comment, and would keep the
implementation architecture specific. You could even share the NT_S390_SYSTEM_CALL
number, but I don't know if there any downsides to doing that.

	Arnd
Kees Cook Nov. 7, 2014, 4:44 p.m. UTC | #10
On Fri, Nov 7, 2014 at 6:30 AM, Arnd Bergmann <arnd@arndb.de> wrote:
> On Friday 07 November 2014 13:11:30 Will Deacon wrote:
>>
>> > It's not that I care strongly about the interface, my main point is
>> > that the changelog doesn't describe why one interface was used instead
>> > the other.
>>
>> I suspect the current approach was taken because it follows the same scheme
>> as 32-bit ARM. If both methods are sufficient (Kees would have a better idea
>> than me on that), then I don't have a strong preference.
>
> Using the regset would probably address Oleg's comment, and would keep the
> implementation architecture specific. You could even share the NT_S390_SYSTEM_CALL
> number, but I don't know if there any downsides to doing that.

That's fine by me -- I only want an interface. :) I think it'd be nice
to keep it the same between arm32 and arm64, but using a specific
regset does seem to be the better approach.

-Kees
Roland McGrath Nov. 7, 2014, 11:05 p.m. UTC | #11
Not that I'm actually involved any more, but I'd endorse the user_regset
approach and not the new request.  On many (most?) machines, it's already
part of the main integer regset (orig_rax et al) and adding another
mechanism is redundant.  Using user_regset also means there won't be a word
of hidden state that is not visible in core dumps.
AKASHI Takahiro Nov. 10, 2014, 6:36 a.m. UTC | #12
On 11/07/2014 09:27 PM, Will Deacon wrote:
> On Fri, Nov 07, 2014 at 12:03:00PM +0000, Arnd Bergmann wrote:
>> On Friday 07 November 2014 11:55:51 Will Deacon wrote:
>>> On Fri, Nov 07, 2014 at 09:30:53AM +0000, Arnd Bergmann wrote:
>>>> On Friday 07 November 2014 16:47:23 AKASHI Takahiro wrote:
>>>>> This patch adds a new generic ptrace request, PTRACE_SET_SYSCALL.
>>>>> It can be used to change a system call number as follows:
>>>>>      ret = ptrace(pid, PTRACE_SET_SYSCALL, null, new_syscall_no);
>>>>> 'new_syscall_no' can be -1 to skip this system call, you need to modify
>>>>> a register's value, in arch-specific way, as return value though.
>>>>>
>>>>> Please note that we can't define PTRACE_SET_SYSCALL macro in
>>>>> uapi/linux/ptrace.h partly because its value on arm, 23, is used as another
>>>>> request on sparc.
>>>>>
>>>>> This patch also contains an example of change on arch side, arm.
>>>>> Only syscall_set_nr() is required to be defined in asm/syscall.h.
>>>>>
>>>>> Currently only arm has this request, while arm64 would also have it
>>>>> once my patch series of seccomp for arm64 is merged. It will also be
>>>>> usable for most of other arches.
>>>>> See the discussions in lak-ml:
>>>>> http://lists.infradead.org/pipermail/linux-arm-kernel/2014-November/300167.html
>>>>>
>>>>> Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org>
>>>>>
>>>>
>>>> Can you describe why you are moving the implementation? Is this a feature
>>>> that we want to have on all architectures in the future? As you say,
>>>> only arm32 implements is at the moment.
>>>
>>> We need this for arm64 and, since all architectures seem to have a mechanism
>>> for setting a system call via ptrace, moving it to generic code should make
>>> sense for new architectures too, no?
>>
>> It makes a little more sense now, but I still don't understand why you
>> need to set the system call number via ptrace. What is this used for,
>> and why doesn't any other architecture have this?
>
> I went through the same thought process back in August, and Akashi
> eventually convinced me that this was the best thing to do:
>
>    http://lists.infradead.org/pipermail/linux-arm-kernel/2014-August/278692.html
>
> It comes down to a debugger (which could be GDB, seccomp, tracer ...)
> wanting to change the system call number. This is also used as a mechanism
> to skip a system call by setting it to '-1' (yeah, it's gross, and the
> interaction between all of these syscall hooks is horrible too).
>
> If we update w8 directly instead, we run into a couple of issues:
>
>    - Needing to restore the original w8 if the value is set to '-1' for
>      skip, but continuing to return -ENOSYS for syscall(-1) when not on a
>      tracer path

Yeah, this restriction still exists on my recent patch, v7.
(this is because arm64 uses the same register, x0, as the first argument
and a return value.)

>    - seccomp assumes that syscall_get_nr will return the version set by
>      the most recent tracer, so then we need hacks in ptrace to route
>      register writes to w8 to syscallno in pt_regs, but again, only in the
>      case that we're tracing.

The problem here is that, if we had a hack of replacinging syscallno with w8
in ptrace (ptrace_syscall_enter()), secure_computing() (actually, seccomp_phase2()
on v3.18-rc) would have no chance of seeing a modified syscall number because
the hack would be executed after secure_computing().
(Please note that a tracer simply modifies w8, not syscallno directly).

This eventually results in missing a special case of -1 (skipping this system call).
     http://lists.infradead.org/pipermail/linux-arm-kernel/2014-August/280846.html

That is why we needed to have a dedicated new interface.

-Takahiro AKASHI

> Akashi might be able to elaborate on other problems, since this was a
> couple of months ago and I take every opportunity I can to avoid looking
> at this part of the kernel.
>
> Will
>
AKASHI Takahiro Nov. 12, 2014, 10:46 a.m. UTC | #13
Will,

On 11/07/2014 11:04 PM, Oleg Nesterov wrote:
> On 11/07, AKASHI Takahiro wrote:
>>
>> --- a/arch/arm/kernel/ptrace.c
>> +++ b/arch/arm/kernel/ptrace.c
>> @@ -853,11 +853,6 @@ long arch_ptrace(struct task_struct *child, long request,
>>   				       datap);
>>   			break;
>>
>> -		case PTRACE_SET_SYSCALL:
>> -			task_thread_info(child)->syscall = data;
>> -			ret = 0;
>> -			break;
>> -
>>   #ifdef CONFIG_CRUNCH
>>   		case PTRACE_GETCRUNCHREGS:
>>   			ret = ptrace_getcrunchregs(child, datap);
>> diff --git a/kernel/ptrace.c b/kernel/ptrace.c
>> index 54e7522..d7048fa 100644
>> --- a/kernel/ptrace.c
>> +++ b/kernel/ptrace.c
>> @@ -1001,6 +1001,12 @@ int ptrace_request(struct task_struct *child, long request,
>>   		break;
>>   	}
>>   #endif
>> +
>> +#ifdef PTRACE_SET_SYSCALL
>> +	case PTRACE_SET_SYSCALL:
>> +		ret = syscall_set_nr(child, task_pt_regs(child), data);
>> +		break;
>> +#endif
>
> I too do not understand why it makes sense to move PTRACE_SET_SYSCALL into
> the common kernel/ptrace.c.

I think I explained why we need a new (atomic) interface of changing a system
call number while tracing with ptrace. But I don't have a strong preference,
either ptrace(SET_SYSCALL) or ptrace(SETREGSET, NT_SYSTEM_CALL).

> To me the fact that PTRACE_SET_SYSCALL can be undefined and syscall_set_nr()
> is very much arch-dependant (but most probably trivial) means that this  code
> should live in arch_ptrace().

Thinking of Oleg's comment above, it doesn't make sense neither to define generic
NT_SYSTEM_CALL (user_regset) in uapi/linux/elf.h and implement it in ptrace_regset()
in kernel/ptrace.c with arch-defined syscall_(g)set_nr().

Since we should have the same interface on arm and arm64, we'd better implement
ptrace(PTRACE_SET_SYSCALL) locally on arm64 for now (as I originally submitted).

-Takahiro AKASHI

> In any case, I think it doesn't make sense to pass task_pt_regs(child), this
> helper can do this itself if it needs struct pt_regs.
>
> Oleg.
>
Will Deacon Nov. 12, 2014, 11 a.m. UTC | #14
On Wed, Nov 12, 2014 at 10:46:01AM +0000, AKASHI Takahiro wrote:
> On 11/07/2014 11:04 PM, Oleg Nesterov wrote:
> > To me the fact that PTRACE_SET_SYSCALL can be undefined and syscall_set_nr()
> > is very much arch-dependant (but most probably trivial) means that this  code
> > should live in arch_ptrace().
> 
> Thinking of Oleg's comment above, it doesn't make sense neither to define generic
> NT_SYSTEM_CALL (user_regset) in uapi/linux/elf.h and implement it in ptrace_regset()
> in kernel/ptrace.c with arch-defined syscall_(g)set_nr().
> 
> Since we should have the same interface on arm and arm64, we'd better implement
> ptrace(PTRACE_SET_SYSCALL) locally on arm64 for now (as I originally submitted).

I think the regset approach is cleaner. We already do something similar for
TLS. That would be implemented under arch/arm64/ with it's own NT type.

Will
AKASHI Takahiro Nov. 12, 2014, 11:06 a.m. UTC | #15
On 11/12/2014 08:00 PM, Will Deacon wrote:
> On Wed, Nov 12, 2014 at 10:46:01AM +0000, AKASHI Takahiro wrote:
>> On 11/07/2014 11:04 PM, Oleg Nesterov wrote:
>>> To me the fact that PTRACE_SET_SYSCALL can be undefined and syscall_set_nr()
>>> is very much arch-dependant (but most probably trivial) means that this  code
>>> should live in arch_ptrace().
>>
>> Thinking of Oleg's comment above, it doesn't make sense neither to define generic
>> NT_SYSTEM_CALL (user_regset) in uapi/linux/elf.h and implement it in ptrace_regset()
>> in kernel/ptrace.c with arch-defined syscall_(g)set_nr().
>>
>> Since we should have the same interface on arm and arm64, we'd better implement
>> ptrace(PTRACE_SET_SYSCALL) locally on arm64 for now (as I originally submitted).
>
> I think the regset approach is cleaner. We already do something similar for
> TLS. That would be implemented under arch/arm64/ with it's own NT type.

Okey, so arm64 goes its own way :)
Or do you want to have a similar regset on arm, too?
(In this case, NT_ARM_SYSTEM_CALL can be shared in uapi/linux/elf.h)

-Takahiro AKASHI

> Will
>
Will Deacon Nov. 12, 2014, 11:13 a.m. UTC | #16
On Wed, Nov 12, 2014 at 11:06:59AM +0000, AKASHI Takahiro wrote:
> On 11/12/2014 08:00 PM, Will Deacon wrote:
> > On Wed, Nov 12, 2014 at 10:46:01AM +0000, AKASHI Takahiro wrote:
> >> On 11/07/2014 11:04 PM, Oleg Nesterov wrote:
> >>> To me the fact that PTRACE_SET_SYSCALL can be undefined and syscall_set_nr()
> >>> is very much arch-dependant (but most probably trivial) means that this  code
> >>> should live in arch_ptrace().
> >>
> >> Thinking of Oleg's comment above, it doesn't make sense neither to define generic
> >> NT_SYSTEM_CALL (user_regset) in uapi/linux/elf.h and implement it in ptrace_regset()
> >> in kernel/ptrace.c with arch-defined syscall_(g)set_nr().
> >>
> >> Since we should have the same interface on arm and arm64, we'd better implement
> >> ptrace(PTRACE_SET_SYSCALL) locally on arm64 for now (as I originally submitted).
> >
> > I think the regset approach is cleaner. We already do something similar for
> > TLS. That would be implemented under arch/arm64/ with it's own NT type.
> 
> Okey, so arm64 goes its own way :)
> Or do you want to have a similar regset on arm, too?
> (In this case, NT_ARM_SYSTEM_CALL can be shared in uapi/linux/elf.h)

Just do arm64. We already have the dedicated request for arch/arm/.

Will
Arnd Bergmann Nov. 12, 2014, 11:19 a.m. UTC | #17
On Wednesday 12 November 2014 11:13:52 Will Deacon wrote:
> On Wed, Nov 12, 2014 at 11:06:59AM +0000, AKASHI Takahiro wrote:
> > On 11/12/2014 08:00 PM, Will Deacon wrote:
> > > On Wed, Nov 12, 2014 at 10:46:01AM +0000, AKASHI Takahiro wrote:
> > >> On 11/07/2014 11:04 PM, Oleg Nesterov wrote:
> > >>> To me the fact that PTRACE_SET_SYSCALL can be undefined and syscall_set_nr()
> > >>> is very much arch-dependant (but most probably trivial) means that this  code
> > >>> should live in arch_ptrace().
> > >>
> > >> Thinking of Oleg's comment above, it doesn't make sense neither to define generic
> > >> NT_SYSTEM_CALL (user_regset) in uapi/linux/elf.h and implement it in ptrace_regset()
> > >> in kernel/ptrace.c with arch-defined syscall_(g)set_nr().
> > >>
> > >> Since we should have the same interface on arm and arm64, we'd better implement
> > >> ptrace(PTRACE_SET_SYSCALL) locally on arm64 for now (as I originally submitted).
> > >
> > > I think the regset approach is cleaner. We already do something similar for
> > > TLS. That would be implemented under arch/arm64/ with it's own NT type.
> > 
> > Okey, so arm64 goes its own way 
> > Or do you want to have a similar regset on arm, too?
> > (In this case, NT_ARM_SYSTEM_CALL can be shared in uapi/linux/elf.h)
> 
> Just do arm64. We already have the dedicated request for arch/arm/.

I wonder if we should define NT_ARM64_SYSTEM_CALL to the same value
as NT_S390_SYSTEM_CALL (0x307), or even define it as an architecture-
independent NT_SYSTEM_CALL number with that value, so other architectures
don't have to introduce new types when they also want to implement it.

	Arnd
Russell King - ARM Linux Nov. 12, 2014, 12:05 p.m. UTC | #18
On Wed, Nov 12, 2014 at 12:19:26PM +0100, Arnd Bergmann wrote:
> On Wednesday 12 November 2014 11:13:52 Will Deacon wrote:
> > Just do arm64. We already have the dedicated request for arch/arm/.
> 
> I wonder if we should define NT_ARM64_SYSTEM_CALL to the same value
> as NT_S390_SYSTEM_CALL (0x307), or even define it as an architecture-
> independent NT_SYSTEM_CALL number with that value, so other architectures
> don't have to introduce new types when they also want to implement it.

That would be a sane thing to do, so that tools which want to get at
this information can do in an almost standardised way for architectures
implementing this method.
AKASHI Takahiro Nov. 13, 2014, 7:02 a.m. UTC | #19
On 11/12/2014 08:19 PM, Arnd Bergmann wrote:
> On Wednesday 12 November 2014 11:13:52 Will Deacon wrote:
>> On Wed, Nov 12, 2014 at 11:06:59AM +0000, AKASHI Takahiro wrote:
>>> On 11/12/2014 08:00 PM, Will Deacon wrote:
>>>> On Wed, Nov 12, 2014 at 10:46:01AM +0000, AKASHI Takahiro wrote:
>>>>> On 11/07/2014 11:04 PM, Oleg Nesterov wrote:
>>>>>> To me the fact that PTRACE_SET_SYSCALL can be undefined and syscall_set_nr()
>>>>>> is very much arch-dependant (but most probably trivial) means that this  code
>>>>>> should live in arch_ptrace().
>>>>>
>>>>> Thinking of Oleg's comment above, it doesn't make sense neither to define generic
>>>>> NT_SYSTEM_CALL (user_regset) in uapi/linux/elf.h and implement it in ptrace_regset()
>>>>> in kernel/ptrace.c with arch-defined syscall_(g)set_nr().
>>>>>
>>>>> Since we should have the same interface on arm and arm64, we'd better implement
>>>>> ptrace(PTRACE_SET_SYSCALL) locally on arm64 for now (as I originally submitted).
>>>>
>>>> I think the regset approach is cleaner. We already do something similar for
>>>> TLS. That would be implemented under arch/arm64/ with it's own NT type.
>>>
>>> Okey, so arm64 goes its own way
>>> Or do you want to have a similar regset on arm, too?
>>> (In this case, NT_ARM_SYSTEM_CALL can be shared in uapi/linux/elf.h)
>>
>> Just do arm64. We already have the dedicated request for arch/arm/.
>
> I wonder if we should define NT_ARM64_SYSTEM_CALL to the same value
> as NT_S390_SYSTEM_CALL (0x307), or even define it as an architecture-
> independent NT_SYSTEM_CALL number with that value, so other architectures
> don't have to introduce new types when they also want to implement it.

I digged into gdb code (gdb/bfd/elf.c):
https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;a=blob;f=bfd/elf.c;h=8b207ad872a3992381e93bdfa0a75ef444651613;hb=HEAD
   elf_parse_notes()->elfcore_grok_note()->elfcore_grok_s390_system_call()

It seems to me that NT_S390_SYSTEM_CALL(=0x307) is recognized as a s390 specific
value (without checking for machine type). So thinking of potential conflict, it might not be
a good idea to use this value as a common number (of NT_SYSTEM_CALL).
It's very unlikely that a "note" section for NT_(S390_)SYSTEM_CALL appears in a coredump file, though.

What do you think?

-Takahiro AKASHI

>
> 	Arnd
>
Arnd Bergmann Nov. 13, 2014, 10:21 a.m. UTC | #20
On Thursday 13 November 2014 16:02:49 AKASHI Takahiro wrote:
> On 11/12/2014 08:19 PM, Arnd Bergmann wrote:
> > On Wednesday 12 November 2014 11:13:52 Will Deacon wrote:
> >> On Wed, Nov 12, 2014 at 11:06:59AM +0000, AKASHI Takahiro wrote:
> >>> On 11/12/2014 08:00 PM, Will Deacon wrote:
> >>>> On Wed, Nov 12, 2014 at 10:46:01AM +0000, AKASHI Takahiro wrote:
> >>>>> On 11/07/2014 11:04 PM, Oleg Nesterov wrote:
> >>>>>> To me the fact that PTRACE_SET_SYSCALL can be undefined and syscall_set_nr()
> >>>>>> is very much arch-dependant (but most probably trivial) means that this  code
> >>>>>> should live in arch_ptrace().
> >>>>>
> >>>>> Thinking of Oleg's comment above, it doesn't make sense neither to define generic
> >>>>> NT_SYSTEM_CALL (user_regset) in uapi/linux/elf.h and implement it in ptrace_regset()
> >>>>> in kernel/ptrace.c with arch-defined syscall_(g)set_nr().
> >>>>>
> >>>>> Since we should have the same interface on arm and arm64, we'd better implement
> >>>>> ptrace(PTRACE_SET_SYSCALL) locally on arm64 for now (as I originally submitted).
> >>>>
> >>>> I think the regset approach is cleaner. We already do something similar for
> >>>> TLS. That would be implemented under arch/arm64/ with it's own NT type.
> >>>
> >>> Okey, so arm64 goes its own way
> >>> Or do you want to have a similar regset on arm, too?
> >>> (In this case, NT_ARM_SYSTEM_CALL can be shared in uapi/linux/elf.h)
> >>
> >> Just do arm64. We already have the dedicated request for arch/arm/.
> >
> > I wonder if we should define NT_ARM64_SYSTEM_CALL to the same value
> > as NT_S390_SYSTEM_CALL (0x307), or even define it as an architecture-
> > independent NT_SYSTEM_CALL number with that value, so other architectures
> > don't have to introduce new types when they also want to implement it.
> 
> I digged into gdb code (gdb/bfd/elf.c):
> https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;a=blob;f=bfd/elf.c;h=8b207ad872a3992381e93bdfa0a75ef444651613;hb=HEAD
>    elf_parse_notes()->elfcore_grok_note()->elfcore_grok_s390_system_call()
> 
> It seems to me that NT_S390_SYSTEM_CALL(=0x307) is recognized as a s390 specific
> value (without checking for machine type). So thinking of potential conflict, it might not be
> a good idea to use this value as a common number (of NT_SYSTEM_CALL).
> It's very unlikely that a "note" section for NT_(S390_)SYSTEM_CALL appears in a coredump file, though.
>
> What do you think?

(adding Ulrich and Andreas)

This code was introduced by http://sourceware-org.1504.n7.nabble.com/rfa-s390-bfd-part-Support-extended-register-sets-td50072.html

I have to admit that I don't really understand gdb internals, but from
a first look I get the impression that it will just do the right thing
if you reuse NT_S390_SYSTEM_CALL on ARM64 with the same semantics.

If not, we should indeed have a different number for it and duplicate that
code.

	Arnd
Ulrich Weigand Nov. 13, 2014, 2:49 p.m. UTC | #21
Arnd Bergmann <arnd@arndb.de> wrote on 13.11.2014 11:21:28:

> I have to admit that I don't really understand gdb internals, but from
> a first look I get the impression that it will just do the right thing
> if you reuse NT_S390_SYSTEM_CALL on ARM64 with the same semantics.

There's an interface between BFD and GDB proper involved here.  BFD will
detect the presence of register set notes in the core dump, and will
translate them into virtual sections; GDB will then simply look up such
sections under well-known names.

In particular, the NT_S390_SYSTEM_CALL note will be translated by BFD
into a virtual section named ".reg-s390-system-call"; GDB platform-
specific code will look for sections of this particular name.

So if you were to create notes using the same note type, by default it
would do nothing on ARM64.  You might add code to the ARM64 back-end
to also look for a section ".reg-s390-system-call", but that would be
somewhat confusing.  Using a new, platform-specific note type for ARM64
would appear to fit better with existing precedent.

Bye,
Ulrich
Arnd Bergmann Nov. 13, 2014, 10:25 p.m. UTC | #22
On Thursday 13 November 2014 15:49:20 Ulrich Weigand wrote:
> Arnd Bergmann <arnd@arndb.de> wrote on 13.11.2014 11:21:28:
> 
> > I have to admit that I don't really understand gdb internals, but from
> > a first look I get the impression that it will just do the right thing
> > if you reuse NT_S390_SYSTEM_CALL on ARM64 with the same semantics.
> 
> There's an interface between BFD and GDB proper involved here.  BFD will
> detect the presence of register set notes in the core dump, and will
> translate them into virtual sections; GDB will then simply look up such
> sections under well-known names.
> 
> In particular, the NT_S390_SYSTEM_CALL note will be translated by BFD
> into a virtual section named ".reg-s390-system-call"; GDB platform-
> specific code will look for sections of this particular name.
> 
> So if you were to create notes using the same note type, by default it
> would do nothing on ARM64.  You might add code to the ARM64 back-end
> to also look for a section ".reg-s390-system-call", but that would be
> somewhat confusing.  Using a new, platform-specific note type for ARM64
> would appear to fit better with existing precedent.

Ok, thanks a lot for your insight and for confirming what Takahiro AKASHI
said. Let's use a new NT_ARM64_SYSTEM_CALL type with a different
number then.

	Arnd
AKASHI Takahiro Nov. 14, 2014, 1:40 a.m. UTC | #23
Ulrich, Arnd, thank you for your discussions:

On 11/14/2014 07:25 AM, Arnd Bergmann wrote:
> On Thursday 13 November 2014 15:49:20 Ulrich Weigand wrote:
>> Arnd Bergmann <arnd@arndb.de> wrote on 13.11.2014 11:21:28:
>>
>>> I have to admit that I don't really understand gdb internals, but from
>>> a first look I get the impression that it will just do the right thing
>>> if you reuse NT_S390_SYSTEM_CALL on ARM64 with the same semantics.
>>
>> There's an interface between BFD and GDB proper involved here.  BFD will
>> detect the presence of register set notes in the core dump, and will
>> translate them into virtual sections; GDB will then simply look up such
>> sections under well-known names.
>>
>> In particular, the NT_S390_SYSTEM_CALL note will be translated by BFD
>> into a virtual section named ".reg-s390-system-call"; GDB platform-
>> specific code will look for sections of this particular name.
>>
>> So if you were to create notes using the same note type, by default it
>> would do nothing on ARM64.  You might add code to the ARM64 back-end
>> to also look for a section ".reg-s390-system-call", but that would be
>> somewhat confusing.  Using a new, platform-specific note type for ARM64
>> would appear to fit better with existing precedent.

I implemented a regset of NT_SYSTEM_CALL(=NT_S390_SYSTEM_CALL) experimentally,
and checked a generated core file:

 >$ aarch64-linux-gnu-readelf -Wn <...>/tmp/nulltest/core
 >
 >Displaying notes found at file offset 0x000003c0 with length 0x00000a68:
 >  Owner                 Data size    Description
 >  CORE                 0x00000188    NT_PRSTATUS (prstatus structure)
 >  CORE                 0x00000088    NT_PRPSINFO (prpsinfo structure)
 >  CORE                 0x00000080    NT_SIGINFO (siginfo_t data)
 >  CORE                 0x00000130    NT_AUXV (auxiliary vector)
 >  CORE                 0x000001b4    NT_FILE (mapped files)
 >    Page size: 4096
 >                 Start                 End         Page Offset
 >[snip]...
 >  CORE                 0x00000210    NT_FPREGSET (floating point registers)
 >  LINUX                0x00000008    NT_ARM_TLS (AArch TLS registers)
 >  LINUX                0x00000108    NT_ARM_HW_BREAK (AArch hardware breakpoint registers)
 >  LINUX                0x00000108    NT_ARM_HW_WATCH (AArch hardware watchpoint registers)
 >  LINUX                0x00000004    NT_S390_SYSTEM_CALL (s390 system call restart data)

Looks funny:)

> Ok, thanks a lot for your insight and for confirming what Takahiro AKASHI
> said. Let's use a new NT_ARM64_SYSTEM_CALL type with a different
> number then.

We will use NT_ARM_SYSTEM_CALL(=0x404) as other NT_ARM_*, except NT_ARM_VFP,
are also shared by arch/arm and arch/arm64.

Anyhow, gdb (and/or binutils?) should be updated as well once my coming patch is merged.
My next question is who should know this?

Thanks,
-Takahiro AKASHI

> 	Arnd
>
diff mbox

Patch

diff --git a/arch/arm/include/asm/syscall.h b/arch/arm/include/asm/syscall.h
index e86c985..3e1d9c0 100644
--- a/arch/arm/include/asm/syscall.h
+++ b/arch/arm/include/asm/syscall.h
@@ -24,6 +24,13 @@  static inline int syscall_get_nr(struct task_struct *task,
 	return task_thread_info(task)->syscall;
 }
 
+static inline int syscall_set_nr(struct task_struct *task,
+				 struct pt_regs *regs, int syscall)
+{
+	task_thread_info(task)->syscall = syscall;
+	return 0;
+}
+
 static inline void syscall_rollback(struct task_struct *task,
 				    struct pt_regs *regs)
 {
diff --git a/arch/arm/kernel/ptrace.c b/arch/arm/kernel/ptrace.c
index ef9119f..908bae8 100644
--- a/arch/arm/kernel/ptrace.c
+++ b/arch/arm/kernel/ptrace.c
@@ -853,11 +853,6 @@  long arch_ptrace(struct task_struct *child, long request,
 				       datap);
 			break;
 
-		case PTRACE_SET_SYSCALL:
-			task_thread_info(child)->syscall = data;
-			ret = 0;
-			break;
-
 #ifdef CONFIG_CRUNCH
 		case PTRACE_GETCRUNCHREGS:
 			ret = ptrace_getcrunchregs(child, datap);
diff --git a/kernel/ptrace.c b/kernel/ptrace.c
index 54e7522..d7048fa 100644
--- a/kernel/ptrace.c
+++ b/kernel/ptrace.c
@@ -1001,6 +1001,12 @@  int ptrace_request(struct task_struct *child, long request,
 		break;
 	}
 #endif
+
+#ifdef PTRACE_SET_SYSCALL
+	case PTRACE_SET_SYSCALL:
+		ret = syscall_set_nr(child, task_pt_regs(child), data);
+		break;
+#endif
 	default:
 		break;
 	}