diff mbox series

[2/2] arm64/arch_timer: replace arch_counter_enforce_ordering() with isb

Message ID 20210330105719.47760-3-kernelfans@gmail.com (mailing list archive)
State New, archived
Headers show
Series arm64/timer: trival fix for sync inside counter | expand

Commit Message

Pingfan Liu March 30, 2021, 10:57 a.m. UTC
The description of getting counter value is not very clear. [1]
'mrs Xt, cntpct' may execute out of program order, either forward or
backward.

Now taking a look at this group of getting counter routines. All of them
are called from sched_clock(). And there is an isb to protect forward
speculation. But there is no isb for the backward speculation.

The current code enforces read dependency instructions anchored on
getting counter. But it is not enough to protect against other no
dependency instructions, and even function call can not prevent the
speculation between getting counter and them.

Replacing arch_counter_enforce_ordering() with isb to achieve the aim.

[1]: AArch64 Programmer's Guides Generic Timer:  3.1. Count and frequency

Signed-off-by: Pingfan Liu <kernelfans@gmail.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Vincenzo Frascino <vincenzo.frascino@arm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Andrei Vagin <avagin@gmail.com>
Cc: Marc Zyngier <maz@kernel.org>
To: linux-arm-kernel@lists.infradead.org
---
 arch/arm64/include/asm/arch_timer.h | 34 ++++++++---------------------
 1 file changed, 9 insertions(+), 25 deletions(-)

Comments

Will Deacon March 30, 2021, 11:05 a.m. UTC | #1
On Tue, Mar 30, 2021 at 06:57:19PM +0800, Pingfan Liu wrote:
> The description of getting counter value is not very clear. [1]
> 'mrs Xt, cntpct' may execute out of program order, either forward or
> backward.
> 
> Now taking a look at this group of getting counter routines. All of them
> are called from sched_clock(). And there is an isb to protect forward
> speculation. But there is no isb for the backward speculation.
> 
> The current code enforces read dependency instructions anchored on
> getting counter. But it is not enough to protect against other no
> dependency instructions, and even function call can not prevent the
> speculation between getting counter and them.

Which "no dependency instructions"?

> 
> Replacing arch_counter_enforce_ordering() with isb to achieve the aim.
> 
> [1]: AArch64 Programmer's Guides Generic Timer:  3.1. Count and frequency

I wouldn't trust that guide as far as I can throw it.

Please describe the problem you're trying to solve, and hopefully I can
help. ISB is an expensive instruction so we need a good justification to
add it here (i.e. an example of why the current scheme is not correct).

Thanks,

Will
Pingfan Liu March 31, 2021, 9:20 a.m. UTC | #2
On Tue, Mar 30, 2021 at 12:05:53PM +0100, Will Deacon wrote:
Hi Will,

Thank you for kindly review.

I have several questions haunting on this issue. Hope further help from you.

> On Tue, Mar 30, 2021 at 06:57:19PM +0800, Pingfan Liu wrote:
> > The description of getting counter value is not very clear. [1]
> > 'mrs Xt, cntpct' may execute out of program order, either forward or
> > backward.
> > 
> > Now taking a look at this group of getting counter routines. All of them
> > are called from sched_clock(). And there is an isb to protect forward
> > speculation. But there is no isb for the backward speculation.
> > 
> > The current code enforces read dependency instructions anchored on
> > getting counter. But it is not enough to protect against other no
> > dependency instructions, and even function call can not prevent the
> > speculation between getting counter and them.
> 
> Which "no dependency instructions"?
> 
In fact, I have two questions here. 

-1. let me start with the backward speculation.
No such instructions inside this function, but combining the caller and callee,
it will look like:
00000000000002e0 <sched_clock>:
...
 318:   f9400e60        ldr     x0, [x19, #24]
 31c:   d63f0000        blr     x0
					0000000000000030 <arch_counter_get_cntpct>:
					      30:       d503233f        paciasp
					      34:       d5033fdf        isb
					      38:       d53be020        mrs     x0, cntpct_el0
					      3c:       ca000001        eor     x1, x0, x0
					      40:       8b2163e1        add     x1, sp, x1
					      44:       f940003f        ldr     xzr, [x1]
					      48:       d50323bf        autiasp
					      4c:       d65f03c0        ret
		
 320:   29441261        ldp     w1, w4, [x19, #32]
...
 33c:   cb060000        sub     x0, x0, x6

There is no 'instruction sync semantics' applied on 'ret', so
arch_counter_get_cntpct() may return to 320 before its instrutions are finished
in pipeline, and then new instructions are fetched into pipeline.

In theory (no idea about better material than [1]), any instruction touching
x0, but have no data dependency anchored on getting cntpct will take the risk
of speculation with getting cntpct. Can this happen?  (I admit there is no such
instruction in disasemble, but does it vary due to compiler?)

Making an analogy to isb at the entry. If this speculation is not possible, could
it be eliminated?


-2. Is arch_counter_enforce_ordering() needed?
Since the return value of arch_counter_get_cntpct() is always read, and
naturally, this read op will observe the read-barrier.
Here this instruction is "33c:   cb060000        sub     x0, x0, x6"


> > 
> > Replacing arch_counter_enforce_ordering() with isb to achieve the aim.
> > 
> > [1]: AArch64 Programmer's Guides Generic Timer:  3.1. Count and frequency
> 
> I wouldn't trust that guide as far as I can throw it.
> 
> Please describe the problem you're trying to solve, and hopefully I can
> help. ISB is an expensive instruction so we need a good justification to
> add it here (i.e. an example of why the current scheme is not correct).
> 
Yes, ISB is expensive, and original, I consider whether it can be eliminated
totally, but finally run to this reverse conclusion.

On the contrary, if this kind of speculation is not real, plus the fact that
there is already a natural read on getting cntvct. Can the second isb in
__arch_get_hw_counter() just be replaced with compiler barrier,


Thanks,

Pingfan

---
In case of reference

00000000000002e0 <sched_clock>:
 2e0:   d503233f        paciasp
 2e4:   a9bc7bfd        stp     x29, x30, [sp, #-64]!
 2e8:   910003fd        mov     x29, sp
 2ec:   a90153f3        stp     x19, x20, [sp, #16]
 2f0:   90000014        adrp    x20, 0 <jiffy_sched_clock_read>
 2f4:   91000294        add     x20, x20, #0x0
 2f8:   a90363f7        stp     x23, x24, [sp, #48]
 2fc:   91002297        add     x23, x20, #0x8
 300:   52800518        mov     w24, #0x28                      // #40
 304:   a9025bf5        stp     x21, x22, [sp, #32]
 308:   b9400296        ldr     w22, [x20]
 30c:   120002d5        and     w21, w22, #0x1
 310:   9bb87eb5        umull   x21, w21, w24
 314:   8b1502f3        add     x19, x23, x21
 318:   f9400e60        ldr     x0, [x19, #24]
 31c:   d63f0000        blr     x0
 320:   29441261        ldp     w1, w4, [x19, #32]
 324:   f8756ae3        ldr     x3, [x23, x21]
 328:   a9409666        ldp     x6, x5, [x19, #8]
 32c:   d50339bf        dmb     ishld
 330:   b9400282        ldr     w2, [x20]
 334:   6b16005f        cmp     w2, w22
 338:   54fffe81        b.ne    308 <sched_clock+0x28>  // b.any
 33c:   cb060000        sub     x0, x0, x6
 340:   2a0103e1        mov     w1, w1
 344:   8a050000        and     x0, x0, x5
 348:   a94153f3        ldp     x19, x20, [sp, #16]
 34c:   9b017c00        mul     x0, x0, x1
 350:   a9425bf5        ldp     x21, x22, [sp, #32]
 354:   9ac42400        lsr     x0, x0, x4
 358:   8b030000        add     x0, x0, x3
 35c:   a94363f7        ldp     x23, x24, [sp, #48]
 360:   a8c47bfd        ldp     x29, x30, [sp], #64
 364:   d50323bf        autiasp
 368:   d65f03c0        ret
diff mbox series

Patch

diff --git a/arch/arm64/include/asm/arch_timer.h b/arch/arm64/include/asm/arch_timer.h
index 9f0ec21d6327..233ade46390f 100644
--- a/arch/arm64/include/asm/arch_timer.h
+++ b/arch/arm64/include/asm/arch_timer.h
@@ -165,32 +165,18 @@  static inline void arch_timer_set_cntkctl(u32 cntkctl)
 	isb();
 }
 
-/*
- * Ensure that reads of the counter are treated the same as memory reads
- * for the purposes of ordering by subsequent memory barriers.
- *
- * This insanity brought to you by speculative system register reads,
- * out-of-order memory accesses, sequence locks and Thomas Gleixner.
- *
- * http://lists.infradead.org/pipermail/linux-arm-kernel/2019-February/631195.html
- */
-#define arch_counter_enforce_ordering(val) do {				\
-	u64 tmp, _val = (val);						\
-									\
-	asm volatile(							\
-	"	eor	%0, %1, %1\n"					\
-	"	add	%0, sp, %0\n"					\
-	"	ldr	xzr, [%0]"					\
-	: "=r" (tmp) : "r" (_val));					\
-} while (0)
-
 static __always_inline u64 __arch_counter_get_cntpct_stable(void)
 {
 	u64 cnt;
 
 	isb();
 	cnt = arch_timer_reg_read_stable(cntpct_el0);
-	arch_counter_enforce_ordering(cnt);
+	/*
+	 * read dependency does not help against the sepculation between getting counter
+	 * and no dependency instructions, which cause the overwritten of register value.
+	 * So here needs isb.
+	 */
+	isb();
 	return cnt;
 }
 
@@ -200,7 +186,7 @@  static __always_inline u64 __arch_counter_get_cntpct(void)
 
 	isb();
 	cnt = read_sysreg(cntpct_el0);
-	arch_counter_enforce_ordering(cnt);
+	isb();
 	return cnt;
 }
 
@@ -210,7 +196,7 @@  static __always_inline u64 __arch_counter_get_cntvct_stable(void)
 
 	isb();
 	cnt = arch_timer_reg_read_stable(cntvct_el0);
-	arch_counter_enforce_ordering(cnt);
+	isb();
 	return cnt;
 }
 
@@ -220,12 +206,10 @@  static __always_inline u64 __arch_counter_get_cntvct(void)
 
 	isb();
 	cnt = read_sysreg(cntvct_el0);
-	arch_counter_enforce_ordering(cnt);
+	isb();
 	return cnt;
 }
 
-#undef arch_counter_enforce_ordering
-
 static inline int arch_timer_arch_init(void)
 {
 	return 0;