diff mbox

[v6,0/6] arm64: Add kernel probes (kprobes) support

Message ID 5551F693.8050100@redhat.com (mailing list archive)
State New, archived
Headers show

Commit Message

William Cohen May 12, 2015, 12:48 p.m. UTC
On 05/12/2015 01:54 AM, David Long wrote:
> On 05/05/15 11:48, Will Deacon wrote:
>> On Tue, May 05, 2015 at 06:14:51AM +0100, David Long wrote:
>>> On 05/01/15 21:44, William Cohen wrote:
>>>> Dave Long and I did some additional experimentation to better
>>>> understand what is condition causes the kernel to sometimes spew:
>>>>
>>>> Unexpected kernel single-step exception at EL1
>>>>
>>>> The functioncallcount.stp test instruments the entry and return of
>>>> every function in the mm files, including kfree.  In most cases the
>>>> arm64 trampoline_probe_handler just determines which return probe
>>>> instance matches the current conditions, runs the associated handler,
>>>> and recycles the return probe instance for another use by placing it
>>>> on a hlist.  However, it is possible that a return probe instance has
>>>> been set up on function entry and the return probe is unregistered
>>>> before the return probe instance fires.  In this case kfree is called
>>>> by the trampoline handler to remove the return probe instances related
>>>> to the unregistered kretprobe.  This case where the the kprobed kfree
>>>> is called within the arm64 trampoline_probe_handler function trigger
>>>> the problem.
>>>>
>>>> The kprobe breakpoint for the kfree call from within the
>>>> trampoline_probe_handler is encountered and started, but things go
>>>> wrong when attempting the single step on the instruction.
>>>>
>>>> It took a while to trigger this problem with the sytemtap testsuite.
>>>> Dave Long came up with steps that reproduce this more quickly with a
>>>> probed function that is always called within the trampoline handler.
>>>> Trying the same on x86_64 doesn't trigger the problem.  It appears
>>>> that the x86_64 code can handle a single step from within the
>>>> trampoline_handler.
>>>>
>>>
>>> I'm assuming there are no plans for supporting software breakpoint debug
>>> exceptions during processing of single-step exceptions, any time soon on
>>> arm64.  Given that the only solution that I can come with for this is
>>> instead of making this orphaned kretprobe instance list exist only
>>> temporarily (in the scope of the kretprobe trampoline handler), make it
>>> always exist and kfree any items found on it as part of a periodic
>>> cleanup running outside of the handler context.  I think these changes
>>> would still all be in archiecture-specific code.  This doesn't feel to
>>> me like a bad solution.  Does anyone think there is a simpler way out of
>>> this?
>>
>> Just to clarify, is the problem here the software breakpoint exception,
>> or trying to step the faulting instruction whilst we were already handling
>> a step?
>>
> 
> Sorry for the delay, I got tripped up with some global optimizations that happened when I made more testing changes.  When the kprobes software breakpoint handler for kretprobes is reentered it sets up the single-step and that ends up hitting inside entry.S, apparently in el1_undef.
> 
>> I think I'd be inclined to keep the code run in debug context to a minimum.
>> We already can't block there, and the more code we add the more black spots
>> we end up with in the kernel itself. The alternative would be to make your
>> kprobes code re-entrant, but that sounds like a nightmare.
>>
>> You say this works on x86. How do they handle it? Is the nested probe
>> on kfree ignored or handled?
>>
> 
> Will Cohen's email pointing out x86 does not use a breakpoint for the trampoline handler explains a lot.  I'm experimenting starting with his proposed new trampoline code.  I can't see a reason this can't be made to work and so given everything it doesn't seem interesting to try and understand the failure in reentering the kprobe break handler in any more detail.
> 
> -dave long
> 
> 

Hi Dave,

In some of the previous diagnostic output it looked like things would go wrong in the entry.S when the D bit was cleared and the debug interrupts were unmasksed.  I wonder if some of the issue might be due to the starting the kprobe for the trampoline, but leaving things in an odd state when another set of krpobe/kretporbes are hit when the trampoline is running.  As Dave mentioned the proposed trampoline patch avoids using a kprobe in the trampoline and directly calls the trampoline handler.  Attached is the current version of the patch which was able to run the systemtap testsuite.  Systemtap does exercise the kprobe/kretprobe infrastructure, but it would be good to have additional raw kprobe tests to check that kprobe reentry works as expected.

-Will Cohen

Comments

Masami Hiramatsu May 13, 2015, 9:22 a.m. UTC | #1
On 2015/05/12 21:48, William Cohen wrote:
> On 05/12/2015 01:54 AM, David Long wrote:
>> On 05/05/15 11:48, Will Deacon wrote:
>>> On Tue, May 05, 2015 at 06:14:51AM +0100, David Long wrote:
>>>> On 05/01/15 21:44, William Cohen wrote:
>>>>> Dave Long and I did some additional experimentation to better
>>>>> understand what is condition causes the kernel to sometimes spew:
>>>>>
>>>>> Unexpected kernel single-step exception at EL1
>>>>>
>>>>> The functioncallcount.stp test instruments the entry and return of
>>>>> every function in the mm files, including kfree.  In most cases the
>>>>> arm64 trampoline_probe_handler just determines which return probe
>>>>> instance matches the current conditions, runs the associated handler,
>>>>> and recycles the return probe instance for another use by placing it
>>>>> on a hlist.  However, it is possible that a return probe instance has
>>>>> been set up on function entry and the return probe is unregistered
>>>>> before the return probe instance fires.  In this case kfree is called
>>>>> by the trampoline handler to remove the return probe instances related
>>>>> to the unregistered kretprobe.  This case where the the kprobed kfree
>>>>> is called within the arm64 trampoline_probe_handler function trigger
>>>>> the problem.
>>>>>
>>>>> The kprobe breakpoint for the kfree call from within the
>>>>> trampoline_probe_handler is encountered and started, but things go
>>>>> wrong when attempting the single step on the instruction.
>>>>>
>>>>> It took a while to trigger this problem with the sytemtap testsuite.
>>>>> Dave Long came up with steps that reproduce this more quickly with a
>>>>> probed function that is always called within the trampoline handler.
>>>>> Trying the same on x86_64 doesn't trigger the problem.  It appears
>>>>> that the x86_64 code can handle a single step from within the
>>>>> trampoline_handler.
>>>>>
>>>>
>>>> I'm assuming there are no plans for supporting software breakpoint debug
>>>> exceptions during processing of single-step exceptions, any time soon on
>>>> arm64.  Given that the only solution that I can come with for this is
>>>> instead of making this orphaned kretprobe instance list exist only
>>>> temporarily (in the scope of the kretprobe trampoline handler), make it
>>>> always exist and kfree any items found on it as part of a periodic
>>>> cleanup running outside of the handler context.  I think these changes
>>>> would still all be in archiecture-specific code.  This doesn't feel to
>>>> me like a bad solution.  Does anyone think there is a simpler way out of
>>>> this?
>>>
>>> Just to clarify, is the problem here the software breakpoint exception,
>>> or trying to step the faulting instruction whilst we were already handling
>>> a step?
>>>
>>
>> Sorry for the delay, I got tripped up with some global optimizations that happened when I made more testing changes.  When the kprobes software breakpoint handler for kretprobes is reentered it sets up the single-step and that ends up hitting inside entry.S, apparently in el1_undef.
>>
>>> I think I'd be inclined to keep the code run in debug context to a minimum.
>>> We already can't block there, and the more code we add the more black spots
>>> we end up with in the kernel itself. The alternative would be to make your
>>> kprobes code re-entrant, but that sounds like a nightmare.
>>>
>>> You say this works on x86. How do they handle it? Is the nested probe
>>> on kfree ignored or handled?
>>>
>>
>> Will Cohen's email pointing out x86 does not use a breakpoint for the trampoline handler explains a lot.  I'm experimenting starting with his proposed new trampoline code.  I can't see a reason this can't be made to work and so given everything it doesn't seem interesting to try and understand the failure in reentering the kprobe break handler in any more detail.
>>
>> -dave long
>>
>>
> 
> Hi Dave,
> 
> In some of the previous diagnostic output it looked like things would go wrong
> in the entry.S when the D bit was cleared and the debug interrupts were 
> unmasksed.  I wonder if some of the issue might be due to the starting the 
> kprobe for the trampoline, but leaving things in an odd state when another
> set of krpobe/kretporbes are hit when the trampoline is running.

Hmm, does this mean we have a trouble if a user kprobe handler calls the
function which is probed by other kprobe? Or, is this just a problem
only for kretprobes?

>  As Dave
> mentioned the proposed trampoline patch avoids using a kprobe in the
> trampoline and directly calls the trampoline handler.  Attached is the
> current version of the patch which was able to run the systemtap testsuite.
>  Systemtap does exercise the kprobe/kretprobe infrastructure, but it would
> be good to have additional raw kprobe tests to check that kprobe reentry
> works as expected.

Actually, Will's patch looks like the same thing what I did on x86,
as the kretprobe-booster. So I'm OK for that. But if the above problem
is not solved, we need to fix that, since kprobes can be used from
different sources.

Thank you,
William Cohen May 13, 2015, 3:41 p.m. UTC | #2
On 05/13/2015 05:22 AM, Masami Hiramatsu wrote:
> On 2015/05/12 21:48, William Cohen wrote:

>> Hi Dave,
>>
>> In some of the previous diagnostic output it looked like things would go wrong
>> in the entry.S when the D bit was cleared and the debug interrupts were 
>> unmasksed.  I wonder if some of the issue might be due to the starting the 
>> kprobe for the trampoline, but leaving things in an odd state when another
>> set of krpobe/kretporbes are hit when the trampoline is running.
> 
> Hmm, does this mean we have a trouble if a user kprobe handler calls the
> function which is probed by other kprobe? Or, is this just a problem
> only for kretprobes?

Hi Masami,

I wrote an example based off of sample/kprobes/kprobes_sample.c to force the reentry issue for kprobes (the attached kprobe_rentry_example.c). That seemed to run fine.  I think the reason that the trampoline handler got into trouble is because of the reset_current_kprobe() before the possible call to kfree, but I haven't verified it. It seems like that should be at the end of trampoline handler just before the return.  Other architectures have similar trampoline handlers, so I am surprised that the other architectures haven't encountered this issue with kretprobes.  Maybe this is due to specific of arm64 exception handling.

# modprobe kprobe_reentry_example
[  909.617295] Planted kprobe at fffffe00000b7b34
[  909.623873] Planted kprobe at fffffe000032d34c
# rmmod kprobe_reentry_example
[ 1482.647504] kprobe at fffffe00000b7b34 unregistered
[ 1482.687506] kprobe at fffffe000032d34c unregistered
[ 1482.692361] y = 42
[ 1482.694361] z = 0
# grep \ int_sqrt$ /proc/kallsyms 
fffffe000032d34c T int_sqrt
# grep \ do_fork$ /proc/kallsyms 
fffffe00000b7b34 T do_fork

> 
>>  As Dave
>> mentioned the proposed trampoline patch avoids using a kprobe in the
>> trampoline and directly calls the trampoline handler.  Attached is the
>> current version of the patch which was able to run the systemtap testsuite.
>>  Systemtap does exercise the kprobe/kretprobe infrastructure, but it would
>> be good to have additional raw kprobe tests to check that kprobe reentry
>> works as expected.
> 
> Actually, Will's patch looks like the same thing what I did on x86,
> as the kretprobe-booster. So I'm OK for that. But if the above problem
> is not solved, we need to fix that, since kprobes can be used from
> different sources.

The patch should look similar to the x86 code. The x86 code was used as a model.

-Will
David Long May 13, 2015, 7:58 p.m. UTC | #3
On 05/13/15 11:41, William Cohen wrote:
> On 05/13/2015 05:22 AM, Masami Hiramatsu wrote:
>> On 2015/05/12 21:48, William Cohen wrote:
>
>>> Hi Dave,
>>>
>>> In some of the previous diagnostic output it looked like things would go wrong
>>> in the entry.S when the D bit was cleared and the debug interrupts were
>>> unmasksed.  I wonder if some of the issue might be due to the starting the
>>> kprobe for the trampoline, but leaving things in an odd state when another
>>> set of krpobe/kretporbes are hit when the trampoline is running.
>>
>> Hmm, does this mean we have a trouble if a user kprobe handler calls the
>> function which is probed by other kprobe? Or, is this just a problem
>> only for kretprobes?
>
> Hi Masami,
>
> I wrote an example based off of sample/kprobes/kprobes_sample.c to force the reentry issue for kprobes (the attached kprobe_rentry_example.c). That seemed to run fine.  I think the reason that the trampoline handler got into trouble is because of the reset_current_kprobe() before the possible call to kfree, but I haven't verified it. It seems like that should be at the end of trampoline handler just before the return.  Other architectures have similar trampoline handlers, so I am surprised that the other architectures haven't encountered this issue with kretprobes.  Maybe this is due to specific of arm64 exception handling.
>
> # modprobe kprobe_reentry_example
> [  909.617295] Planted kprobe at fffffe00000b7b34
> [  909.623873] Planted kprobe at fffffe000032d34c
> # rmmod kprobe_reentry_example
> [ 1482.647504] kprobe at fffffe00000b7b34 unregistered
> [ 1482.687506] kprobe at fffffe000032d34c unregistered
> [ 1482.692361] y = 42
> [ 1482.694361] z = 0
> # grep \ int_sqrt$ /proc/kallsyms
> fffffe000032d34c T int_sqrt
> # grep \ do_fork$ /proc/kallsyms
> fffffe00000b7b34 T do_fork
>

I actually have been doing exactly the same thing.  I was not able to 
provoke a failure in either the pre or post handler when having them 
call to another kprobe'd (dummy) function.  However it does look to me 
like it's reentering the exception handler firstly at the second 
software breakpoint and then (shortly after returning from that) with a 
single-step.   That's what you would expect but I *thought* neither of 
those cases was allowed.  It does not call either the pre or post 
handler in the dummy function.  I would expect this to be an issue for a 
kretprobe pre handler too, although I've yet to test this.

>>
>>>   As Dave
>>> mentioned the proposed trampoline patch avoids using a kprobe in the
>>> trampoline and directly calls the trampoline handler.  Attached is the
>>> current version of the patch which was able to run the systemtap testsuite.
>>>   Systemtap does exercise the kprobe/kretprobe infrastructure, but it would
>>> be good to have additional raw kprobe tests to check that kprobe reentry
>>> works as expected.
>>
>> Actually, Will's patch looks like the same thing what I did on x86,
>> as the kretprobe-booster. So I'm OK for that. But if the above problem
>> is not solved, we need to fix that, since kprobes can be used from
>> different sources.
>
> The patch should look similar to the x86 code. The x86 code was used as a model.
>
> -Will
>
William Cohen May 13, 2015, 8:35 p.m. UTC | #4
On 05/13/2015 03:58 PM, David Long wrote:
> On 05/13/15 11:41, William Cohen wrote:
>> On 05/13/2015 05:22 AM, Masami Hiramatsu wrote:
>>> On 2015/05/12 21:48, William Cohen wrote:
>>
>>>> Hi Dave,
>>>>
>>>> In some of the previous diagnostic output it looked like things would go wrong
>>>> in the entry.S when the D bit was cleared and the debug interrupts were
>>>> unmasksed.  I wonder if some of the issue might be due to the starting the
>>>> kprobe for the trampoline, but leaving things in an odd state when another
>>>> set of krpobe/kretporbes are hit when the trampoline is running.
>>>
>>> Hmm, does this mean we have a trouble if a user kprobe handler calls the
>>> function which is probed by other kprobe? Or, is this just a problem
>>> only for kretprobes?
>>
>> Hi Masami,
>>
>> I wrote an example based off of sample/kprobes/kprobes_sample.c to force the reentry issue for kprobes (the attached kprobe_rentry_example.c). That seemed to run fine.  I think the reason that the trampoline handler got into trouble is because of the reset_current_kprobe() before the possible call to kfree, but I haven't verified it. It seems like that should be at the end of trampoline handler just before the return.  Other architectures have similar trampoline handlers, so I am surprised that the other architectures haven't encountered this issue with kretprobes.  Maybe this is due to specific of arm64 exception handling.
>>
>> # modprobe kprobe_reentry_example
>> [  909.617295] Planted kprobe at fffffe00000b7b34
>> [  909.623873] Planted kprobe at fffffe000032d34c
>> # rmmod kprobe_reentry_example
>> [ 1482.647504] kprobe at fffffe00000b7b34 unregistered
>> [ 1482.687506] kprobe at fffffe000032d34c unregistered
>> [ 1482.692361] y = 42
>> [ 1482.694361] z = 0
>> # grep \ int_sqrt$ /proc/kallsyms
>> fffffe000032d34c T int_sqrt
>> # grep \ do_fork$ /proc/kallsyms
>> fffffe00000b7b34 T do_fork
>>
> 
> I actually have been doing exactly the same thing.  I was not able to provoke a failure in either the pre or post handler when having them call to another kprobe'd (dummy) function.  However it does look to me like it's reentering the exception handler firstly at the second software breakpoint and then (shortly after returning from that) with a single-step.   That's what you would expect but I *thought* neither of those cases was allowed.  It does not call either the pre or post handler in the dummy function.  I would expect this to be an issue for a kretprobe pre handler too, although I've yet to test this.
> 

Hi Dave,

That is a good point. If the kprobe machinery disables the kretprobe handler because of reentry, that would be problem.  The kretprobe handler always has to fix up the return address, no exceptions.  The patch with the direct call to the trampoline handler would avoid these possible kprobe handler disables.  Could the skipping of the kretprobe trampoline handler be triggered by putting a kprobe on a "ret" instruction that is in a kretprobe'd function?  Is there any functions in the arm64 kernel memory that end up being a stub with just a return instruction in the memory management code?  Something similar to the following routines:

fffffe0000090e10 <calibration_delay_done>:
fffffe0000090e10:       d65f03c0        ret

or

fffffe0000094aac <exit_thread>:
fffffe0000094aac:       d65f03c0        ret



-Will


>>>
>>>>   As Dave
>>>> mentioned the proposed trampoline patch avoids using a kprobe in the
>>>> trampoline and directly calls the trampoline handler.  Attached is the
>>>> current version of the patch which was able to run the systemtap testsuite.
>>>>   Systemtap does exercise the kprobe/kretprobe infrastructure, but it would
>>>> be good to have additional raw kprobe tests to check that kprobe reentry
>>>> works as expected.
>>>
>>> Actually, Will's patch looks like the same thing what I did on x86,
>>> as the kretprobe-booster. So I'm OK for that. But if the above problem
>>> is not solved, we need to fix that, since kprobes can be used from
>>> different sources.
>>
>> The patch should look similar to the x86 code. The x86 code was used as a model.
>>
>> -Will
>>
>
Masami Hiramatsu May 14, 2015, 12:01 a.m. UTC | #5
On 2015/05/14 0:41, William Cohen wrote:
> On 05/13/2015 05:22 AM, Masami Hiramatsu wrote:
>> On 2015/05/12 21:48, William Cohen wrote:
> 
>>> Hi Dave,
>>>
>>> In some of the previous diagnostic output it looked like things would go wrong
>>> in the entry.S when the D bit was cleared and the debug interrupts were 
>>> unmasksed.  I wonder if some of the issue might be due to the starting the 
>>> kprobe for the trampoline, but leaving things in an odd state when another
>>> set of krpobe/kretporbes are hit when the trampoline is running.
>>
>> Hmm, does this mean we have a trouble if a user kprobe handler calls the
>> function which is probed by other kprobe? Or, is this just a problem
>> only for kretprobes?
> 
> Hi Masami,
> 
> I wrote an example based off of sample/kprobes/kprobes_sample.c to force 
> the reentry issue for kprobes (the attached kprobe_rentry_example.c). That
> seemed to run fine.  I think the reason that the trampoline handler got 
> into trouble is because of the reset_current_kprobe() before the possible
> call to kfree, but I haven't verified it.

I still doubt about kfree() reentrant call, since kretprobe handler only
calls recycle_rp_inst() which doesn't free the instance but just push it back
to the unused instance list.

> It seems like that should be at the end of trampoline handler just before
> the return.  Other architectures have similar trampoline handlers,
> so I am surprised that the other architectures haven't encountered this
> issue with kretprobes.  Maybe this is due to specific of arm64 exception
> handling.

Ah, indeed the reset_current_kprobe() here seems not good since some
interruption or some other reason, another kprobes can be hit before
returning.

If kprobes can handle reentered probes correctly, I think your patch
(directly branch from trampoline) looks good to fix this problem.
Actually, arm32 and x86 already has same method.

It seems that powerpc will have similar issue, does someone checked
that? Ananth?

Thank you,

> 
> # modprobe kprobe_reentry_example
> [  909.617295] Planted kprobe at fffffe00000b7b34
> [  909.623873] Planted kprobe at fffffe000032d34c
> # rmmod kprobe_reentry_example
> [ 1482.647504] kprobe at fffffe00000b7b34 unregistered
> [ 1482.687506] kprobe at fffffe000032d34c unregistered
> [ 1482.692361] y = 42
> [ 1482.694361] z = 0
> # grep \ int_sqrt$ /proc/kallsyms 
> fffffe000032d34c T int_sqrt
> # grep \ do_fork$ /proc/kallsyms 
> fffffe00000b7b34 T do_fork
> 
>>
>>>  As Dave
>>> mentioned the proposed trampoline patch avoids using a kprobe in the
>>> trampoline and directly calls the trampoline handler.  Attached is the
>>> current version of the patch which was able to run the systemtap testsuite.
>>>  Systemtap does exercise the kprobe/kretprobe infrastructure, but it would
>>> be good to have additional raw kprobe tests to check that kprobe reentry
>>> works as expected.
>>
>> Actually, Will's patch looks like the same thing what I did on x86,
>> as the kretprobe-booster. So I'm OK for that. But if the above problem
>> is not solved, we need to fix that, since kprobes can be used from
>> different sources.
> 
> The patch should look similar to the x86 code. The x86 code was used as a model.
> 
> -Will
>
Ananth N Mavinakayanahalli May 14, 2015, 3:48 a.m. UTC | #6
On Thu, May 14, 2015 at 09:01:03AM +0900, Masami Hiramatsu wrote:
> On 2015/05/14 0:41, William Cohen wrote:
> > On 05/13/2015 05:22 AM, Masami Hiramatsu wrote:
> >> On 2015/05/12 21:48, William Cohen wrote:
> > 
> >>> Hi Dave,
> >>>
> >>> In some of the previous diagnostic output it looked like things would go wrong
> >>> in the entry.S when the D bit was cleared and the debug interrupts were 
> >>> unmasksed.  I wonder if some of the issue might be due to the starting the 
> >>> kprobe for the trampoline, but leaving things in an odd state when another
> >>> set of krpobe/kretporbes are hit when the trampoline is running.
> >>
> >> Hmm, does this mean we have a trouble if a user kprobe handler calls the
> >> function which is probed by other kprobe? Or, is this just a problem
> >> only for kretprobes?
> > 
> > Hi Masami,
> > 
> > I wrote an example based off of sample/kprobes/kprobes_sample.c to force 
> > the reentry issue for kprobes (the attached kprobe_rentry_example.c). That
> > seemed to run fine.  I think the reason that the trampoline handler got 
> > into trouble is because of the reset_current_kprobe() before the possible
> > call to kfree, but I haven't verified it.
> 
> I still doubt about kfree() reentrant call, since kretprobe handler only
> calls recycle_rp_inst() which doesn't free the instance but just push it back
> to the unused instance list.
> 
> > It seems like that should be at the end of trampoline handler just before
> > the return.  Other architectures have similar trampoline handlers,
> > so I am surprised that the other architectures haven't encountered this
> > issue with kretprobes.  Maybe this is due to specific of arm64 exception
> > handling.
> 
> Ah, indeed the reset_current_kprobe() here seems not good since some
> interruption or some other reason, another kprobes can be hit before
> returning.
> 
> If kprobes can handle reentered probes correctly, I think your patch
> (directly branch from trampoline) looks good to fix this problem.
> Actually, arm32 and x86 already has same method.
> 
> It seems that powerpc will have similar issue, does someone checked
> that? Ananth?

Yes, there is a window where this can happen on powerpc. I haven't
however been able to trigger it thus far -- will try with a different
sample and test.

Thanks for the heads-up.

Ananth
diff mbox

Patch

diff --git a/arch/arm64/kernel/kprobes-arm64.h b/arch/arm64/kernel/kprobes-arm64.h
index ff8a55f..46dab24 100644
--- a/arch/arm64/kernel/kprobes-arm64.h
+++ b/arch/arm64/kernel/kprobes-arm64.h
@@ -27,4 +27,47 @@  extern kprobes_pstate_check_t * const kprobe_condition_checks[16];
 enum kprobe_insn __kprobes
 arm_kprobe_decode_insn(kprobe_opcode_t insn, struct arch_specific_insn *asi);
 
+#define SAVE_REGS_STRING\
+	"	stp x0, x1, [sp, #16 * 0]\n"	\
+	"	stp x2, x3, [sp, #16 * 1]\n"	\
+	"	stp x4, x5, [sp, #16 * 2]\n"	\
+	"	stp x6, x7, [sp, #16 * 3]\n"	\
+	"	stp x8, x9, [sp, #16 * 4]\n"	\
+	"	stp x10, x11, [sp, #16 * 5]\n"	\
+	"	stp x12, x13, [sp, #16 * 6]\n"	\
+	"	stp x14, x15, [sp, #16 * 7]\n"	\
+	"	stp x16, x17, [sp, #16 * 8]\n"	\
+	"	stp x18, x19, [sp, #16 * 9]\n"	\
+	"	stp x20, x21, [sp, #16 * 10]\n"	\
+	"	stp x22, x23, [sp, #16 * 11]\n"	\
+	"	stp x24, x25, [sp, #16 * 12]\n"	\
+	"	stp x26, x27, [sp, #16 * 13]\n"	\
+	"	stp x28, x29, [sp, #16 * 14]\n"	\
+	"	str x30,   [sp, #16 * 15]\n"    \
+	"	mrs x0, nzcv\n"			\
+	"	str x0, [sp, #8 * 33]\n"
+	
+
+#define RESTORE_REGS_STRING\
+	"	ldr x0, [sp, #8 * 33]\n"	\
+	"	msr nzcv, x0\n"			\
+	"	ldp x0, x1, [sp, #16 * 0]\n"	\
+	"	ldp x2, x3, [sp, #16 * 1]\n"	\
+	"	ldp x4, x5, [sp, #16 * 2]\n"	\
+	"	ldp x6, x7, [sp, #16 * 3]\n"	\
+	"	ldp x8, x9, [sp, #16 * 4]\n"	\
+	"	ldp x10, x11, [sp, #16 * 5]\n"	\
+	"	ldp x12, x13, [sp, #16 * 6]\n"	\
+	"	ldp x14, x15, [sp, #16 * 7]\n"	\
+	"	ldp x16, x17, [sp, #16 * 8]\n"	\
+	"	ldp x18, x19, [sp, #16 * 9]\n"	\
+	"	ldp x20, x21, [sp, #16 * 10]\n"	\
+	"	ldp x22, x23, [sp, #16 * 11]\n"	\
+	"	ldp x24, x25, [sp, #16 * 12]\n"	\
+	"	ldp x26, x27, [sp, #16 * 13]\n"	\
+	"	ldp x28, x29, [sp, #16 * 14]\n"	\
+	"	ldr x30,   [sp, #16 * 15]\n"	\
+
+
+
 #endif /* _ARM_KERNEL_KPROBES_ARM64_H */
diff --git a/arch/arm64/kernel/kprobes.c b/arch/arm64/kernel/kprobes.c
index d2aa4bc..af55e37 100644
--- a/arch/arm64/kernel/kprobes.c
+++ b/arch/arm64/kernel/kprobes.c
@@ -565,32 +565,27 @@  int __kprobes longjmp_break_handler(struct kprobe *p, struct pt_regs *regs)
 }
 
 /*
- * Kretprobes: kernel return probes handling
- *
- * AArch64 mode does not support popping the PC value from the
- * stack like on ARM 32-bit (ldmia {..,pc}), so atleast one
- * register need to be used to achieve branching/return.
- * It means return probes cannot return back to the original
- * return address directly without modifying the register context.
- *
- * So like other architectures, we prepare a global routine
- * with NOPs, which serve as trampoline address that hack away the
- * function return, with the exact register context.
- * Placing a kprobe on trampoline routine entry will trap again to
- * execute return probe handlers and restore original return address
- * in ELR_EL1, this way saved pt_regs still hold the original
- * register values to be carried back to the caller.
+ * When a retprobed function returns, this code saves registers and
+ * calls trampoline_handler() runs, which calls the kretprobe's handler.
  */
-static void __used kretprobe_trampoline_holder(void)
+static void __used __kprobes kretprobe_trampoline_holder(void)
 {
 	asm volatile (".global kretprobe_trampoline\n"
 			"kretprobe_trampoline:\n"
-			"NOP\n\t"
-			"NOP\n\t");
+		        "sub sp, sp, %0\n"
+			SAVE_REGS_STRING
+			"mov x0, sp\n"
+			"bl trampoline_probe_handler\n"
+			/* Replace trampoline address in lr with actual
+			   orig_ret_addr return address. */
+			"str x0, [sp, #16 * 15]\n"
+			RESTORE_REGS_STRING
+		        "add sp, sp, %0\n"
+			"ret\n"
+		      : : "I"(sizeof(struct pt_regs)) : "memory");
 }
 
-static int __kprobes
-trampoline_probe_handler(struct kprobe *p, struct pt_regs *regs)
+static void __kprobes __used *trampoline_probe_handler(struct pt_regs *regs)
 {
 	struct kretprobe_instance *ri = NULL;
 	struct hlist_head *head, empty_rp;
@@ -651,7 +646,7 @@  trampoline_probe_handler(struct kprobe *p, struct pt_regs *regs)
 	}
 
 	/* return 1 so that post handlers not called */
-	return 1;
+	return (void *) orig_ret_addr;
 }
 
 void __kprobes arch_prepare_kretprobe(struct kretprobe_instance *ri,
@@ -663,18 +658,12 @@  void __kprobes arch_prepare_kretprobe(struct kretprobe_instance *ri,
 	regs->regs[30] = (long)&kretprobe_trampoline;
 }
 
-static struct kprobe trampoline = {
-	.addr = (kprobe_opcode_t *) &kretprobe_trampoline,
-	.pre_handler = trampoline_probe_handler
-};
-
-int __kprobes arch_trampoline_kprobe(struct kprobe *p)
+int __init arch_init_kprobes(void)
 {
-	return p->addr == (kprobe_opcode_t *) &kretprobe_trampoline;
+	return 0;
 }
 
-int __init arch_init_kprobes(void)
+int arch_trampoline_kprobe(struct kprobe *p)
 {
-	/* register trampoline for kret probe */
-	return register_kprobe(&trampoline);
+	return 0;
 }