
[2/2] arm64: Fix watchpoint recursion when single-step is wrongly triggered in irq

Message ID 1458549470-124791-2-git-send-email-hekuang@huawei.com (mailing list archive)
State New, archived

Commit Message

He Kuang March 21, 2016, 8:37 a.m. UTC
On arm64, the watchpoint handler enables single-step so that the next
instruction is stepped over without recursively re-entering the
handler. If an irq is triggered right after the watchpoint, the
single-step exception is wrongly taken in the irq handler, so the
watchpoint address is never stepped over and the system hangs.

Problem can be found at the following URL:
"http://thread.gmane.org/gmane.linux.kernel/2167918"

This patch saves the watchpoint state and disables single-step when the
step is taken in the irq handler, and restores both after the irq has
been handled.

Signed-off-by: Wang Nan <wangnan0@huawei.com>
Signed-off-by: He Kuang <hekuang@huawei.com>
---
 arch/arm64/include/asm/debug-monitors.h |  9 +++++++
 arch/arm64/kernel/debug-monitors.c      | 13 ++++++++++
 arch/arm64/kernel/entry.S               |  6 +++++
 arch/arm64/kernel/hw_breakpoint.c       | 44 +++++++++++++++++++++++++++++++--
 4 files changed, 70 insertions(+), 2 deletions(-)

Comments

Pratyush Anand March 21, 2016, 10:24 a.m. UTC | #1
On 21/03/2016:08:37:50 AM, He Kuang wrote:
> On arm64, watchpoint handler enables single-step to bypass the next
> instruction for not recursive enter. If an irq is triggered right
> after the watchpoint, a single-step will be wrongly triggered in irq
> handler, which causes the watchpoint address not stepped over and
> system hang.

Does patch [1] resolve this issue as well? I hope it does. Patch [1] has still
not been sent for review, so your test result will be helpful.

~Pratyush

[1] https://github.com/pratyushanand/linux/commit/7623c8099ac22eaa00e7e0f52430f7a4bd154652
Wang Nan March 21, 2016, 10:38 a.m. UTC | #2
On 2016/3/21 18:24, Pratyush Anand wrote:
> On 21/03/2016:08:37:50 AM, He Kuang wrote:
>> On arm64, watchpoint handler enables single-step to bypass the next
>> instruction for not recursive enter. If an irq is triggered right
>> after the watchpoint, a single-step will be wrongly triggered in irq
>> handler, which causes the watchpoint address not stepped over and
>> system hang.
> Does patch [1] resolves this issue as well? I hope it should. Patch[1] has still
> not been sent for review. Your test result will be helpful.
>
> ~Pratyush
>
> [1] https://github.com/pratyushanand/linux/commit/7623c8099ac22eaa00e7e0f52430f7a4bd154652

Could you please provide a test program for your case so we can test
it on our devices? I guess setting a breakpoint on copy_from_user()
while it accesses an invalid address can trigger this problem?

Thank you.
Pratyush Anand March 21, 2016, 11:05 a.m. UTC | #3
On 21/03/2016:06:38:31 PM, Wangnan (F) wrote:
> 
> 
> On 2016/3/21 18:24, Pratyush Anand wrote:
> >On 21/03/2016:08:37:50 AM, He Kuang wrote:
> >>On arm64, watchpoint handler enables single-step to bypass the next
> >>instruction for not recursive enter. If an irq is triggered right
> >>after the watchpoint, a single-step will be wrongly triggered in irq
> >>handler, which causes the watchpoint address not stepped over and
> >>system hang.
> >Does patch [1] resolves this issue as well? I hope it should. Patch[1] has still
> >not been sent for review. Your test result will be helpful.
> >
> >~Pratyush
> >
> >[1] https://github.com/pratyushanand/linux/commit/7623c8099ac22eaa00e7e0f52430f7a4bd154652
> 
> Could you please provide a test program for your case so we can test
> it on our devices? I guess setting breakpoint on a "copy_from_user()"
> accessing an invalid address can trigger this problem?

My test case was to test kprobing of copy_from_user(). I used kprobe64-v11.

I reverted "patch v11 3/9" and used the following script for __copy_to_user(),
which instruments a kprobe at every instruction of a given function. I can
easily see "Unexpected kernel single-step exception at EL1".
-------------------------------------------------------------
#kprobe_at_function_all_inst.sh  
-------------------------------------------------------------
#!/bin/bash
#$1: function name
echo 0 > /sys/kernel/debug/tracing/events/kprobes/enable
echo >  /sys/kernel/debug/tracing/trace
echo > /sys/kernel/debug/tracing/kprobe_events
func=$(cat /proc/kallsyms | grep -A 1 -w $1 | cut -d ' ' -f 1)
func_start=$((0x$(echo $func | cut -d ' ' -f 1)))
func_end=$((0x$(echo $func | cut -d ' ' -f 2)))
offset=0
while [ $(($func_start + $offset)) -lt $func_end ]
  do
          printf -v cmd "p:probe_%x $1+0x%x" $offset $offset
          echo $cmd >> /sys/kernel/debug/tracing/kprobe_events
          offset=$((offset + 4))
  done
echo 1 > /sys/kernel/debug/tracing/events/kprobes/enable
-------------------------------------------------------------

# ./kprobe_at_function_all_inst.sh __copy_to_user

Now, if I apply the patch I referred to in [1], I can no longer see any
"Unexpected kernel single-step exception at EL1" with the above test script.

If I understood correctly, the problem you describe in your patch is that
an irq (el1_irq) is raised while the kernel is handling a watchpoint
(specifically, before the kernel can call reinstall_suspended_bps() to disable
single stepping). Since I disable single stepping on entry to all el1 exception
modes whenever kernel_enable_single_step() has been called but
kernel_disable_single_step() hasn't, your test case could be another good test
for my patch.

~Pratyush
Li Bin March 31, 2016, 12:45 p.m. UTC | #4
Hi Pratyush,

on 2016/3/21 18:24, Pratyush Anand wrote:
> On 21/03/2016:08:37:50 AM, He Kuang wrote:
>> On arm64, watchpoint handler enables single-step to bypass the next
>> instruction for not recursive enter. If an irq is triggered right
>> after the watchpoint, a single-step will be wrongly triggered in irq
>> handler, which causes the watchpoint address not stepped over and
>> system hang.
> 
> Does patch [1] resolves this issue as well? I hope it should. Patch[1] has still
> not been sent for review. Your test result will be helpful.
> 
> ~Pratyush
> 
> [1] https://github.com/pratyushanand/linux/commit/7623c8099ac22eaa00e7e0f52430f7a4bd154652

This patch did not consider that, on exception return, the single-step flag
should be restored; otherwise the expected single step will not be triggered.
Right?

Thanks,
Li Bin

> 
> _______________________________________________
> linux-arm-kernel mailing list
> linux-arm-kernel@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
Pratyush Anand April 4, 2016, 5:17 a.m. UTC | #5
Hi Li,

On 31/03/2016:08:45:05 PM, Li Bin wrote:
> Hi Pratyush,
> 
> on 2016/3/21 18:24, Pratyush Anand wrote:
> > On 21/03/2016:08:37:50 AM, He Kuang wrote:
> >> On arm64, watchpoint handler enables single-step to bypass the next
> >> instruction for not recursive enter. If an irq is triggered right
> >> after the watchpoint, a single-step will be wrongly triggered in irq
> >> handler, which causes the watchpoint address not stepped over and
> >> system hang.
> > 
> > Does patch [1] resolves this issue as well? I hope it should. Patch[1] has still
> > not been sent for review. Your test result will be helpful.
> > 
> > ~Pratyush
> > 
> > [1] https://github.com/pratyushanand/linux/commit/7623c8099ac22eaa00e7e0f52430f7a4bd154652
> 
> This patch did not consider that, when excetpion return, the singlestep flag
> should be restored, otherwise the right singlestep will not triggered.
> Right?

Yes, you are right, and there are other problems as well. Will Deacon pointed
out [1] that kernel debugging is per-cpu rather than per-task. So I did think
of a per-cpu implementation by introducing a new "flags" element in struct
pt_regs. But even with that I see issues. For example:
- While executing the single-step instruction, we get a data abort.
- In the kernel_entry of the data abort we disable single stepping based on the
  "flags" bit field.
- While handling the data abort we receive another interrupt, so we are again
  in kernel_entry (for el1_irq). Single stepping will be disabled again
  (although it does not matter).

Now the issue is: what condition should be checked in kernel_exit to enable
single stepping again? In the above scenario, the kernel_exit for el1_irq
should not enable single stepping, but how to prevent that elegantly?

[1] http://www.spinics.net/lists/arm-kernel/msg491844.html

Patch

diff --git a/arch/arm64/include/asm/debug-monitors.h b/arch/arm64/include/asm/debug-monitors.h
index b5902e8..fe6939e 100644
--- a/arch/arm64/include/asm/debug-monitors.h
+++ b/arch/arm64/include/asm/debug-monitors.h
@@ -133,7 +133,10 @@  int kernel_active_single_step(void);
 #ifdef CONFIG_HAVE_HW_BREAKPOINT
 int reinstall_suspended_bps(struct pt_regs *regs);
 u64 signal_single_step_enable_bps(void);
+u64 irq_single_step_enable_bps(void);
+
 void signal_reinstall_single_step(u64 pstate);
+void irq_reinstall_single_step(struct pt_regs *regs);
 #else
 static inline int reinstall_suspended_bps(struct pt_regs *regs)
 {
@@ -145,7 +148,13 @@  static inline u64 signal_single_step_enable_bps(void)
 	return 0;
 }
 
+static inline u64 irq_single_step_enable_bps(void)
+{
+	return 0;
+}
+
 static inline void signal_reinstall_single_step(u64 pstate) { }
+static inline void irq_reinstall_single_step(struct pt_regs *regs) { }
 #endif
 
 int aarch32_break_handler(struct pt_regs *regs);
diff --git a/arch/arm64/kernel/debug-monitors.c b/arch/arm64/kernel/debug-monitors.c
index c536c9e..fab1faa 100644
--- a/arch/arm64/kernel/debug-monitors.c
+++ b/arch/arm64/kernel/debug-monitors.c
@@ -245,9 +245,22 @@  static void send_user_sigtrap(int si_code)
 	force_sig_info(SIGTRAP, &info, current);
 }
 
+extern unsigned long el1_irq_ss_entry[];
+
 static int single_step_handler(unsigned long addr, unsigned int esr,
 			       struct pt_regs *regs)
 {
+	void *pc = (void *)instruction_pointer(regs);
+
+	if (pc == &el1_irq_ss_entry) {
+		struct pt_regs *irq_regs = (struct pt_regs *)(regs->sp);
+
+		irq_regs->pstate |= irq_single_step_enable_bps();
+		kernel_disable_single_step();
+
+		return 0;
+	}
+
 	/*
 	 * If we are stepping a pending breakpoint, call the hw_breakpoint
 	 * handler first.
diff --git a/arch/arm64/kernel/entry.S b/arch/arm64/kernel/entry.S
index 1f7f5a2..836d98e 100644
--- a/arch/arm64/kernel/entry.S
+++ b/arch/arm64/kernel/entry.S
@@ -402,12 +402,18 @@  ENDPROC(el1_sync)
 el1_irq:
 	kernel_entry 1
 	enable_dbg
+	.global el1_irq_ss_entry
+el1_irq_ss_entry:
 #ifdef CONFIG_TRACE_IRQFLAGS
 	bl	trace_hardirqs_off
 #endif
 
 	get_thread_info tsk
 	irq_handler
+#ifdef CONFIG_HAVE_HW_BREAKPOINT
+	mov	x0, sp
+	bl	irq_reinstall_single_step
+#endif
 
 #ifdef CONFIG_PREEMPT
 	ldr	w24, [tsk, #TI_PREEMPT]		// get preempt count
diff --git a/arch/arm64/kernel/hw_breakpoint.c b/arch/arm64/kernel/hw_breakpoint.c
index 18fd3d3..0cf13ee 100644
--- a/arch/arm64/kernel/hw_breakpoint.c
+++ b/arch/arm64/kernel/hw_breakpoint.c
@@ -540,11 +540,12 @@  int arch_validate_hwbkpt_settings(struct perf_event *bp)
  * exception level at the register level.
  * This is used when single-stepping after a breakpoint exception.
  */
-static void toggle_bp_registers(int reg, enum dbg_active_el el, int enable)
+static bool toggle_bp_registers(int reg, enum dbg_active_el el, int enable)
 {
 	int i, max_slots, privilege;
 	u32 ctrl;
 	struct perf_event **slots;
+	bool origin_state = false;
 
 	switch (reg) {
 	case AARCH64_DBG_REG_BCR:
@@ -556,7 +557,7 @@  static void toggle_bp_registers(int reg, enum dbg_active_el el, int enable)
 		max_slots = core_num_wrps;
 		break;
 	default:
-		return;
+		return false;
 	}
 
 	for (i = 0; i < max_slots; ++i) {
@@ -568,12 +569,16 @@  static void toggle_bp_registers(int reg, enum dbg_active_el el, int enable)
 			continue;
 
 		ctrl = read_wb_reg(reg, i);
+		if (ctrl & 0x1)
+			origin_state = true;
 		if (enable)
 			ctrl |= 0x1;
 		else
 			ctrl &= ~0x1;
 		write_wb_reg(reg, i, ctrl);
 	}
+
+	return origin_state;
 }
 
 /*
@@ -982,6 +987,41 @@  u64 signal_single_step_enable_bps(void)
 	return retval;
 }
 
+u64 irq_single_step_enable_bps(void)
+{
+	u64 retval = 0;
+
+	if (!toggle_bp_registers(AARCH64_DBG_REG_WCR, DBG_ACTIVE_EL1, 1))
+		retval |= PSR_LINUX_HW_WP_SS;
+
+	if (!toggle_bp_registers(AARCH64_DBG_REG_BCR, DBG_ACTIVE_EL1, 1))
+		retval |= PSR_LINUX_HW_BP_SS;
+
+	return retval;
+}
+
+void irq_reinstall_single_step(struct pt_regs *regs)
+{
+	u64 pstate = regs->pstate;
+
+	if (likely(!(regs->pstate & PSR_LINUX_HW_SS)))
+		return;
+
+	if (!user_mode(regs)) {
+		if (pstate & PSR_LINUX_HW_BP_SS)
+			toggle_bp_registers(AARCH64_DBG_REG_BCR,
+					    DBG_ACTIVE_EL1, 0);
+		if (pstate & PSR_LINUX_HW_WP_SS)
+			toggle_bp_registers(AARCH64_DBG_REG_WCR,
+					    DBG_ACTIVE_EL1, 0);
+
+		if (!kernel_active_single_step()) {
+			asm volatile ("msr     daifset, #8\n");
+			kernel_enable_single_step(regs);
+		}
+	}
+}
+
 void signal_reinstall_single_step(u64 pstate)
 {
 	struct debug_info *debug_info = &current->thread.debug;