Message ID | 20210108062126.2335873-2-wei.chen@arm.com (mailing list archive)
---|---
State | New, archived
Series | [v2,1/2] xen/arm: Correct the coding style of get_cycles
Hi Wei,

On 08/01/2021 06:21, Wei Chen wrote:
> Per the discussion [1] on the mailing list, we'd better
> have a barrier after reading CNTPCT in get_cycles. If there
> is no barrier there, then when get_cycles is used in some
> seqlock critical context in the future, the seqlock can
> potentially be speculated.
>
> We import Linux commit 75a19a0202db21638a1c2b424afb867e1f9a2376:
> arm64: arch_timer: Ensure counter register reads occur with seqlock held
>
> When executing clock_gettime(), either in the vDSO or via a system call,
> we need to ensure that the read of the counter register occurs within
> the seqlock reader critical section. This ensures that updates to the
> clocksource parameters (e.g. the multiplier) are consistent with the
> counter value and therefore avoids the situation where time appears to
> go backwards across multiple reads.
>
> Extend the vDSO logic so that the seqlock critical section covers the
> read of the counter register as well as accesses to the data page. Since
> reads of the counter system registers are not ordered by memory barrier
> instructions, introduce dependency ordering from the counter read to a
> subsequent memory access so that the seqlock memory barriers apply to
> the counter access in both the vDSO and the system call paths.
>
> Cc: <stable@vger.kernel.org>
> Cc: Marc Zyngier <marc.zyngier@arm.com>
> Tested-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
> Link: https://lore.kernel.org/linux-arm-kernel/alpine.DEB.2.21.1902081950260.1662@nanos.tec.linutronix.de/
> Reported-by: Thomas Gleixner <tglx@linutronix.de>
> Signed-off-by: Will Deacon <will.deacon@arm.com>
>
> While we are not aware of such use in Xen, it would be best to add the
> barrier to avoid any surprise.
>
> In order to reduce the impact of the new barrier, we prefer to
> enforce ordering instead of using an ISB [2].
>
> Currently, the ordering enforcement is not applied to arm32 as this is
> not done in Linux at the date of this patch. If this is done
> in Linux it will need to be also done in Xen.
>
> [1] https://lists.xenproject.org/archives/html/xen-devel/2020-12/msg00181.html
> [2] https://lkml.org/lkml/2020/3/13/645
>
> Signed-off-by: Wei Chen <wei.chen@arm.com>
> ---
> v1 -> v2:
> 1. Update commit message to refer to the Linux commit.
> 2. Change u64 to uint64_t
> ---
>  xen/include/asm-arm/time.h | 43 ++++++++++++++++++++++++++++++++++++--
>  1 file changed, 41 insertions(+), 2 deletions(-)
>
> diff --git a/xen/include/asm-arm/time.h b/xen/include/asm-arm/time.h
> index 5c4529ebb5..6b8fd839dd 100644
> --- a/xen/include/asm-arm/time.h
> +++ b/xen/include/asm-arm/time.h
> @@ -11,9 +11,26 @@
>
>  typedef uint64_t cycles_t;
>
> -static inline cycles_t get_cycles(void)
> +/*
> + * Ensure that reads of the counter are treated the same as memory reads
> + * for the purposes of ordering by subsequent memory barriers.
> + */
> +#if defined(CONFIG_ARM_64)
> +#define read_cntpct_enforce_ordering(val) do { \
> +    uint64_t tmp, _val = (val); \
> + \
> +    asm volatile( \
> +        "eor %0, %1, %1\n" \
> +        "add %0, sp, %0\n" \
> +        "ldr xzr, [%0]" \
> +        : "=r" (tmp) : "r" (_val)); \
> +} while (0)
> +#else
> +#define read_cntpct_enforce_ordering(val) do {} while (0)
> +#endif
> +
> +static inline cycles_t read_cntpct_stable(void)

OOI, is there any particular reason to create a new helper?

> {
> -    isb();
>      /*
>       * ARM_WORKAROUND_858921: Cortex-A73 (all versions) counter read
>       * can return a wrong value when the counter crosses a 32bit boundary.
> @@ -36,6 +53,28 @@ static inline cycles_t get_cycles(void)
>      }
> }
>
> +static inline cycles_t get_cycles(void)
> +{
> +    cycles_t cnt;
> +
> +    isb();
> +    cnt = read_cntpct_stable();
> +
> +    /*
> +     * If there is not any barrier here. When get_cycles being used in
> +     * some seqlock critical context in the future, the seqlock can be
> +     * speculated potentially.
> +     *
> +     * To prevent seqlock from being speculated silently, we add a barrier
> +     * here defensively. Normally, we just need an ISB here is enough, but
> +     * considering the minimum performance cost. We prefer to use enforce
> +     * order here.
> +     */

I thought we agreed to remove the comment. Did I miss anything?

I can fix this one on commit if there is no need for a new revision.

Cheers,

> +    read_cntpct_enforce_ordering(cnt);
> +
> +    return cnt;
> +}
> +
> /* List of timer's IRQ */
> enum timer_ppi
> {
>
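[To illustrate the hazard the thread is discussing: a seqlock reader retries until the writer's sequence number is stable, and every value sampled inside the loop must actually be ordered within it. Below is a minimal, portable C sketch of the read side — all names, the stand-in counter, and the single-threaded setup are illustrative, not Xen's actual seqlock code.]

```c
#include <stdatomic.h>
#include <stdint.h>

/* Illustrative clocksource state protected by a seqlock. */
static _Atomic unsigned int seq;   /* even = stable, odd = writer active */
static uint64_t mult = 2;          /* e.g. a clocksource multiplier */

/* Stand-in for a CNTPCT read: a monotonically increasing counter. */
static uint64_t read_counter(void)
{
    static uint64_t ticks;
    return ++ticks;
}

/*
 * Seqlock read side. If the counter read could be speculated outside
 * the retry loop, it could pair with a stale multiplier and time would
 * appear to jump. The barrier the patch adds after the CNTPCT read
 * (read_cntpct_enforce_ordering) is what makes the counter read obey
 * the seqlock's memory barriers on arm64.
 */
static uint64_t get_time(void)
{
    unsigned int s;
    uint64_t cnt, m;

    do {
        s = atomic_load_explicit(&seq, memory_order_acquire);
        cnt = read_counter();  /* must be ordered inside this section */
        m = mult;
        atomic_thread_fence(memory_order_acquire);
    } while (atomic_load_explicit(&seq, memory_order_relaxed) != s);

    return cnt * m;
}
```

[With no writer running, consecutive calls simply return increasing values; the retry loop only matters once a writer updates `seq` and `mult` concurrently.]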
Hi Julien,

> -----Original Message-----
> From: Julien Grall <julien@xen.org>
> Sent: January 8, 2021 19:56
> To: Wei Chen <Wei.Chen@arm.com>; xen-devel@lists.xenproject.org;
> sstabellini@kernel.org
> Cc: Bertrand Marquis <Bertrand.Marquis@arm.com>; Penny Zheng
> <Penny.Zheng@arm.com>; Jiamei Xie <Jiamei.Xie@arm.com>; nd
> <nd@arm.com>
> Subject: Re: [PATCH v2 2/2] xen/arm: Add defensive barrier in get_cycles for
> Arm64
>
> Hi Wei,
>
> On 08/01/2021 06:21, Wei Chen wrote:
> > Per the discussion [1] on the mailing list, we'd better
> > have a barrier after reading CNTPCT in get_cycles. If there
> > is no barrier there, then when get_cycles is used in some
> > seqlock critical context in the future, the seqlock can
> > potentially be speculated.
> >
> > We import Linux commit 75a19a0202db21638a1c2b424afb867e1f9a2376:
> > arm64: arch_timer: Ensure counter register reads occur with seqlock held
> >
> > When executing clock_gettime(), either in the vDSO or via a system call,
> > we need to ensure that the read of the counter register occurs within
> > the seqlock reader critical section. This ensures that updates to the
> > clocksource parameters (e.g. the multiplier) are consistent with the
> > counter value and therefore avoids the situation where time appears to
> > go backwards across multiple reads.
> >
> > Extend the vDSO logic so that the seqlock critical section covers the
> > read of the counter register as well as accesses to the data page. Since
> > reads of the counter system registers are not ordered by memory barrier
> > instructions, introduce dependency ordering from the counter read to a
> > subsequent memory access so that the seqlock memory barriers apply to
> > the counter access in both the vDSO and the system call paths.
> >
> > Cc: <stable@vger.kernel.org>
> > Cc: Marc Zyngier <marc.zyngier@arm.com>
> > Tested-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
> > Link: https://lore.kernel.org/linux-arm-kernel/alpine.DEB.2.21.1902081950260.1662@nanos.tec.linutronix.de/
> > Reported-by: Thomas Gleixner <tglx@linutronix.de>
> > Signed-off-by: Will Deacon <will.deacon@arm.com>
> >
> > While we are not aware of such use in Xen, it would be best to add the
> > barrier to avoid any surprise.
> >
> > In order to reduce the impact of the new barrier, we prefer to
> > enforce ordering instead of using an ISB [2].
> >
> > Currently, the ordering enforcement is not applied to arm32 as this is
> > not done in Linux at the date of this patch. If this is done
> > in Linux it will need to be also done in Xen.
> >
> > [1] https://lists.xenproject.org/archives/html/xen-devel/2020-12/msg00181.html
> > [2] https://lkml.org/lkml/2020/3/13/645
> >
> > Signed-off-by: Wei Chen <wei.chen@arm.com>
> > ---
> > v1 -> v2:
> > 1. Update commit message to refer to the Linux commit.
> > 2. Change u64 to uint64_t
> > ---
> >  xen/include/asm-arm/time.h | 43 ++++++++++++++++++++++++++++++++++++--
> >  1 file changed, 41 insertions(+), 2 deletions(-)
> >
> > diff --git a/xen/include/asm-arm/time.h b/xen/include/asm-arm/time.h
> > index 5c4529ebb5..6b8fd839dd 100644
> > --- a/xen/include/asm-arm/time.h
> > +++ b/xen/include/asm-arm/time.h
> > @@ -11,9 +11,26 @@
> >
> >  typedef uint64_t cycles_t;
> >
> > -static inline cycles_t get_cycles(void)
> > +/*
> > + * Ensure that reads of the counter are treated the same as memory reads
> > + * for the purposes of ordering by subsequent memory barriers.
> > + */
> > +#if defined(CONFIG_ARM_64)
> > +#define read_cntpct_enforce_ordering(val) do { \
> > +    uint64_t tmp, _val = (val); \
> > + \
> > +    asm volatile( \
> > +        "eor %0, %1, %1\n" \
> > +        "add %0, sp, %0\n" \
> > +        "ldr xzr, [%0]" \
> > +        : "=r" (tmp) : "r" (_val)); \
> > +} while (0)
> > +#else
> > +#define read_cntpct_enforce_ordering(val) do {} while (0)
> > +#endif
> > +
> > +static inline cycles_t read_cntpct_stable(void)
>
> OOI, is there any particular reason to create a new helper?
>

Yes, I was trying to reduce the #if defined(CONFIG_ARM_64) chunks. I think
if we introduce an empty helper for Arm32, we can reduce the other
chunk inside get_cycles. In addition, if we need to do the same work
for Arm32 in the future, we may not need to modify get_cycles.

> > {
> > -    isb();
> >      /*
> >       * ARM_WORKAROUND_858921: Cortex-A73 (all versions) counter read
> >       * can return a wrong value when the counter crosses a 32bit boundary.
> > @@ -36,6 +53,28 @@ static inline cycles_t get_cycles(void)
> >      }
> > }
> >
> > +static inline cycles_t get_cycles(void)
> > +{
> > +    cycles_t cnt;
> > +
> > +    isb();
> > +    cnt = read_cntpct_stable();
> > +
> > +    /*
> > +     * If there is not any barrier here. When get_cycles being used in
> > +     * some seqlock critical context in the future, the seqlock can be
> > +     * speculated potentially.
> > +     *
> > +     * To prevent seqlock from being speculated silently, we add a barrier
> > +     * here defensively. Normally, we just need an ISB here is enough, but
> > +     * considering the minimum performance cost. We prefer to use enforce
> > +     * order here.
> > +     */
>
> I thought we agreed to remove the comment. Did I miss anything?
>
> I can fix this one on commit if there is no need for a new revision.
>

Ah, sorry, I forgot that. If we don't need a new revision, please help me
to remove it. Thanks.

> Cheers,
>
> > +    read_cntpct_enforce_ordering(cnt);
> > +
> > +    return cnt;
> > +}
> > +
> > /* List of timer's IRQ */
> > enum timer_ppi
> > {
> >
>
> --
> Julien Grall
On 09/01/2021 12:16, Wei Chen wrote:
> Hi Julien,

Hi Wei,

>> -----Original Message-----
>> From: Julien Grall <julien@xen.org>
>> Sent: January 8, 2021 19:56
>> To: Wei Chen <Wei.Chen@arm.com>; xen-devel@lists.xenproject.org;
>> sstabellini@kernel.org
>> Cc: Bertrand Marquis <Bertrand.Marquis@arm.com>; Penny Zheng
>> <Penny.Zheng@arm.com>; Jiamei Xie <Jiamei.Xie@arm.com>; nd
>> <nd@arm.com>
>> Subject: Re: [PATCH v2 2/2] xen/arm: Add defensive barrier in get_cycles for
>> Arm64
>>
>> Hi Wei,
>>
>> On 08/01/2021 06:21, Wei Chen wrote:
>>> Per the discussion [1] on the mailing list, we'd better
>>> have a barrier after reading CNTPCT in get_cycles. If there
>>> is no barrier there, then when get_cycles is used in some
>>> seqlock critical context in the future, the seqlock can
>>> potentially be speculated.
>>>
>>> We import Linux commit 75a19a0202db21638a1c2b424afb867e1f9a2376:
>>> arm64: arch_timer: Ensure counter register reads occur with seqlock held
>>>
>>> When executing clock_gettime(), either in the vDSO or via a system call,
>>> we need to ensure that the read of the counter register occurs within
>>> the seqlock reader critical section. This ensures that updates to the
>>> clocksource parameters (e.g. the multiplier) are consistent with the
>>> counter value and therefore avoids the situation where time appears to
>>> go backwards across multiple reads.
>>>
>>> Extend the vDSO logic so that the seqlock critical section covers the
>>> read of the counter register as well as accesses to the data page. Since
>>> reads of the counter system registers are not ordered by memory barrier
>>> instructions, introduce dependency ordering from the counter read to a
>>> subsequent memory access so that the seqlock memory barriers apply to
>>> the counter access in both the vDSO and the system call paths.
>>>
>>> Cc: <stable@vger.kernel.org>
>>> Cc: Marc Zyngier <marc.zyngier@arm.com>
>>> Tested-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
>>> Link: https://lore.kernel.org/linux-arm-kernel/alpine.DEB.2.21.1902081950260.1662@nanos.tec.linutronix.de/
>>> Reported-by: Thomas Gleixner <tglx@linutronix.de>
>>> Signed-off-by: Will Deacon <will.deacon@arm.com>
>>>
>>> While we are not aware of such use in Xen, it would be best to add the
>>> barrier to avoid any surprise.
>>>
>>> In order to reduce the impact of the new barrier, we prefer to
>>> enforce ordering instead of using an ISB [2].
>>>
>>> Currently, the ordering enforcement is not applied to arm32 as this is
>>> not done in Linux at the date of this patch. If this is done
>>> in Linux it will need to be also done in Xen.
>>>
>>> [1] https://lists.xenproject.org/archives/html/xen-devel/2020-12/msg00181.html
>>> [2] https://lkml.org/lkml/2020/3/13/645
>>>
>>> Signed-off-by: Wei Chen <wei.chen@arm.com>
>>> ---
>>> v1 -> v2:
>>> 1. Update commit message to refer to the Linux commit.
>>> 2. Change u64 to uint64_t
>>> ---
>>>  xen/include/asm-arm/time.h | 43 ++++++++++++++++++++++++++++++++++++--
>>>  1 file changed, 41 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/xen/include/asm-arm/time.h b/xen/include/asm-arm/time.h
>>> index 5c4529ebb5..6b8fd839dd 100644
>>> --- a/xen/include/asm-arm/time.h
>>> +++ b/xen/include/asm-arm/time.h
>>> @@ -11,9 +11,26 @@
>>>
>>>  typedef uint64_t cycles_t;
>>>
>>> -static inline cycles_t get_cycles(void)
>>> +/*
>>> + * Ensure that reads of the counter are treated the same as memory reads
>>> + * for the purposes of ordering by subsequent memory barriers.
>>> + */
>>> +#if defined(CONFIG_ARM_64)
>>> +#define read_cntpct_enforce_ordering(val) do { \
>>> +    uint64_t tmp, _val = (val); \
>>> + \
>>> +    asm volatile( \
>>> +        "eor %0, %1, %1\n" \
>>> +        "add %0, sp, %0\n" \
>>> +        "ldr xzr, [%0]" \
>>> +        : "=r" (tmp) : "r" (_val)); \
>>> +} while (0)
>>> +#else
>>> +#define read_cntpct_enforce_ordering(val) do {} while (0)
>>> +#endif
>>> +
>>> +static inline cycles_t read_cntpct_stable(void)
>>
>> OOI, is there any particular reason to create a new helper?
>>
>
> Yes, I try to reduce the #if defined(CONFIG_ARM_64) chunks.

Hmmm... There is no #ifdef chunk in read_cntpct_stable(). Did I miss
anything?

> I think
> if we introduce an empty helper for Arm32, we can reduce the other
> chunk inside get_cycles. In addition, if we need to do the same work
> for Arm32 in the future, we may not need to modify get_cycles.

I don't really follow this. I wasn't asking about
read_cntpct_enforce_ordering(). Instead I was asking about
read_cntpct_stable(), which looks like you just split out of get_cycles().

This change looks completely unrelated to the purpose of this series. I
am not going to ask you to split it, but I think this should be explained
in the commit message.

Cheers,
Hi Julien,

> -----Original Message-----
> From: Julien Grall <julien@xen.org>
> Sent: January 14, 2021 2:30
> To: Wei Chen <Wei.Chen@arm.com>; xen-devel@lists.xenproject.org;
> sstabellini@kernel.org
> Cc: Bertrand Marquis <Bertrand.Marquis@arm.com>; Penny Zheng
> <Penny.Zheng@arm.com>; Jiamei Xie <Jiamei.Xie@arm.com>; nd
> <nd@arm.com>
> Subject: Re: [PATCH v2 2/2] xen/arm: Add defensive barrier in get_cycles for
> Arm64
>
> On 09/01/2021 12:16, Wei Chen wrote:
> > Hi Julien,
>
> Hi Wei,
>
> >> -----Original Message-----
> >> From: Julien Grall <julien@xen.org>
> >> Sent: January 8, 2021 19:56
> >> To: Wei Chen <Wei.Chen@arm.com>; xen-devel@lists.xenproject.org;
> >> sstabellini@kernel.org
> >> Cc: Bertrand Marquis <Bertrand.Marquis@arm.com>; Penny Zheng
> >> <Penny.Zheng@arm.com>; Jiamei Xie <Jiamei.Xie@arm.com>; nd
> >> <nd@arm.com>
> >> Subject: Re: [PATCH v2 2/2] xen/arm: Add defensive barrier in get_cycles for
> >> Arm64
> >>
> >> Hi Wei,
> >>
> >> On 08/01/2021 06:21, Wei Chen wrote:
> >>> Per the discussion [1] on the mailing list, we'd better
> >>> have a barrier after reading CNTPCT in get_cycles. If there
> >>> is no barrier there, then when get_cycles is used in some
> >>> seqlock critical context in the future, the seqlock can
> >>> potentially be speculated.
> >>>
> >>> We import Linux commit 75a19a0202db21638a1c2b424afb867e1f9a2376:
> >>> arm64: arch_timer: Ensure counter register reads occur with seqlock held
> >>>
> >>> When executing clock_gettime(), either in the vDSO or via a system call,
> >>> we need to ensure that the read of the counter register occurs within
> >>> the seqlock reader critical section. This ensures that updates to the
> >>> clocksource parameters (e.g. the multiplier) are consistent with the
> >>> counter value and therefore avoids the situation where time appears to
> >>> go backwards across multiple reads.
> >>>
> >>> Extend the vDSO logic so that the seqlock critical section covers the
> >>> read of the counter register as well as accesses to the data page. Since
> >>> reads of the counter system registers are not ordered by memory barrier
> >>> instructions, introduce dependency ordering from the counter read to a
> >>> subsequent memory access so that the seqlock memory barriers apply to
> >>> the counter access in both the vDSO and the system call paths.
> >>>
> >>> Cc: <stable@vger.kernel.org>
> >>> Cc: Marc Zyngier <marc.zyngier@arm.com>
> >>> Tested-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
> >>> Link: https://lore.kernel.org/linux-arm-kernel/alpine.DEB.2.21.1902081950260.1662@nanos.tec.linutronix.de/
> >>> Reported-by: Thomas Gleixner <tglx@linutronix.de>
> >>> Signed-off-by: Will Deacon <will.deacon@arm.com>
> >>>
> >>> While we are not aware of such use in Xen, it would be best to add the
> >>> barrier to avoid any surprise.
> >>>
> >>> In order to reduce the impact of the new barrier, we prefer to
> >>> enforce ordering instead of using an ISB [2].
> >>>
> >>> Currently, the ordering enforcement is not applied to arm32 as this is
> >>> not done in Linux at the date of this patch. If this is done
> >>> in Linux it will need to be also done in Xen.
> >>>
> >>> [1] https://lists.xenproject.org/archives/html/xen-devel/2020-12/msg00181.html
> >>> [2] https://lkml.org/lkml/2020/3/13/645
> >>>
> >>> Signed-off-by: Wei Chen <wei.chen@arm.com>
> >>> ---
> >>> v1 -> v2:
> >>> 1. Update commit message to refer to the Linux commit.
> >>> 2. Change u64 to uint64_t
> >>> ---
> >>>  xen/include/asm-arm/time.h | 43 ++++++++++++++++++++++++++++++++++++--
> >>>  1 file changed, 41 insertions(+), 2 deletions(-)
> >>>
> >>> diff --git a/xen/include/asm-arm/time.h b/xen/include/asm-arm/time.h
> >>> index 5c4529ebb5..6b8fd839dd 100644
> >>> --- a/xen/include/asm-arm/time.h
> >>> +++ b/xen/include/asm-arm/time.h
> >>> @@ -11,9 +11,26 @@
> >>>
> >>>  typedef uint64_t cycles_t;
> >>>
> >>> -static inline cycles_t get_cycles(void)
> >>> +/*
> >>> + * Ensure that reads of the counter are treated the same as memory reads
> >>> + * for the purposes of ordering by subsequent memory barriers.
> >>> + */
> >>> +#if defined(CONFIG_ARM_64)
> >>> +#define read_cntpct_enforce_ordering(val) do { \
> >>> +    uint64_t tmp, _val = (val); \
> >>> + \
> >>> +    asm volatile( \
> >>> +        "eor %0, %1, %1\n" \
> >>> +        "add %0, sp, %0\n" \
> >>> +        "ldr xzr, [%0]" \
> >>> +        : "=r" (tmp) : "r" (_val)); \
> >>> +} while (0)
> >>> +#else
> >>> +#define read_cntpct_enforce_ordering(val) do {} while (0)
> >>> +#endif
> >>> +
> >>> +static inline cycles_t read_cntpct_stable(void)
> >>
> >> OOI, is there any particular reason to create a new helper?
> >>
> >
> > Yes, I try to reduce the #if defined(CONFIG_ARM_64) chunks.
>
> Hmmm... There is no #ifdef chunk in read_cntpct_stable(). Did I miss
> anything?

No, it was I who misunderstood your comments.

>
> > I think
> > if we introduce an empty helper for Arm32, we can reduce the other
> > chunk inside get_cycles. In addition, if we need to do the same work
> > for Arm32 in the future, we may not need to modify get_cycles.
>
> I don't really follow this. I wasn't asking about
> read_cntpct_enforce_ordering(). Instead I was asking about
> read_cntpct_stable(), which looks like you just split out of get_cycles().
>
> This change looks completely unrelated to the purpose of this series. I
> am not going to ask you to split it, but I think this should be explained
> in the commit message.
>

Yes, I forgot to explain these changes in the commit message.

When I was trying to add read_cntpct_enforce_ordering into get_cycles,
there were three ways I could do it:
1. Add read_cntpct_enforce_ordering to the end of each branch.
2. Add a new cycles_t variable and record the value of each branch, using
   the function end as a unique return point, and then add
   read_cntpct_enforce_ordering to the end of get_cycles.
3. Don't touch the get_cycles function body; just rename it and create
   another helper named get_cycles that wraps the original function and
   read_cntpct_enforce_ordering.

Personally, I prefer #3, so I changed the function like this.

How about adding the explanation at the end of the commit message like this:
"
.... If this is done in Linux it will need to be also done in Xen.

To avoid adding read_cntpct_enforce_ordering everywhere, we introduce
a new helper read_cntpct_stable to replace the original get_cycles, and
turn get_cycles into a wrapper to which we can add
read_cntpct_enforce_ordering easily.
"

Thanks,
Wei Chen

> Cheers,
>
> --
> Julien Grall
Hi Wei,

On 14/01/2021 00:18, Wei Chen wrote:
>>> I think
>>> if we introduce an empty helper for Arm32, we can reduce the other
>>> chunk inside get_cycles. In addition, if we need to do the same work
>>> for Arm32 in the future, we may not need to modify get_cycles.
>> I don't really follow this. I wasn't asking about
>> read_cntpct_enforce_ordering(). Instead I was asking about
>> read_cntpct_stable(), which looks like you just split out of get_cycles().
>>
>> This change looks completely unrelated to the purpose of this series. I
>> am not going to ask you to split it, but I think this should be explained
>> in the commit message.
>>
>
> Yes, I forgot to explain these changes in the commit message.
>
> When I was trying to add read_cntpct_enforce_ordering into get_cycles,
> there were three ways I could do it:
> 1. Add read_cntpct_enforce_ordering to the end of each branch.
> 2. Add a new cycles_t variable and record the value of each branch, using
>    the function end as a unique return point, and then add
>    read_cntpct_enforce_ordering to the end of get_cycles.
> 3. Don't touch the get_cycles function body; just rename it and create
>    another helper named get_cycles that wraps the original function and
>    read_cntpct_enforce_ordering.
>
> Personally, I prefer #3, so I changed the function like this.

I agree. Thanks for the explanation! In which case:

Reviewed-by: Julien Grall <jgrall@amazon.com>

>
> How about adding the explanation at the end of the commit message like this:
> "
> .... If this is done in Linux it will need to be also done in Xen.
>
> To avoid adding read_cntpct_enforce_ordering everywhere, we introduce
> a new helper read_cntpct_stable to replace the original get_cycles, and
> turn get_cycles into a wrapper to which we can add
> read_cntpct_enforce_ordering easily.
> "

I have updated the commit message and committed the patch. Thanks for
your contribution.

Cheers,
diff --git a/xen/include/asm-arm/time.h b/xen/include/asm-arm/time.h
index 5c4529ebb5..6b8fd839dd 100644
--- a/xen/include/asm-arm/time.h
+++ b/xen/include/asm-arm/time.h
@@ -11,9 +11,26 @@

 typedef uint64_t cycles_t;

-static inline cycles_t get_cycles(void)
+/*
+ * Ensure that reads of the counter are treated the same as memory reads
+ * for the purposes of ordering by subsequent memory barriers.
+ */
+#if defined(CONFIG_ARM_64)
+#define read_cntpct_enforce_ordering(val) do { \
+    uint64_t tmp, _val = (val); \
+ \
+    asm volatile( \
+        "eor %0, %1, %1\n" \
+        "add %0, sp, %0\n" \
+        "ldr xzr, [%0]" \
+        : "=r" (tmp) : "r" (_val)); \
+} while (0)
+#else
+#define read_cntpct_enforce_ordering(val) do {} while (0)
+#endif
+
+static inline cycles_t read_cntpct_stable(void)
 {
-    isb();
     /*
      * ARM_WORKAROUND_858921: Cortex-A73 (all versions) counter read
      * can return a wrong value when the counter crosses a 32bit boundary.
@@ -36,6 +53,28 @@ static inline cycles_t get_cycles(void)
     }
 }

+static inline cycles_t get_cycles(void)
+{
+    cycles_t cnt;
+
+    isb();
+    cnt = read_cntpct_stable();
+
+    /*
+     * If there is not any barrier here. When get_cycles being used in
+     * some seqlock critical context in the future, the seqlock can be
+     * speculated potentially.
+     *
+     * To prevent seqlock from being speculated silently, we add a barrier
+     * here defensively. Normally, we just need an ISB here is enough, but
+     * considering the minimum performance cost. We prefer to use enforce
+     * order here.
+     */
+    read_cntpct_enforce_ordering(cnt);
+
+    return cnt;
+}
+
 /* List of timer's IRQ */
 enum timer_ppi
 {
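[How the three instructions in read_cntpct_enforce_ordering work: `eor %0, %1, %1` produces zero but carries a data dependency on the counter value; `add %0, sp, %0` turns that zero into a valid address (the stack pointer); and `ldr xzr, [%0]` performs a dummy load from it into the zero register. The counter read is thereby ordered like a memory read by any subsequent memory barrier. The data flow can be mimicked in portable C purely as an illustration — this sketch does NOT provide the same ordering guarantee, since a compiler is free to fold `val ^ val` to a constant zero, which is exactly why the real version needs inline assembly:]

```c
#include <stdint.h>

/* Illustrative only: mirrors the data flow of the arm64 asm idiom. */
static uint64_t enforce_ordering_sketch(uint64_t val)
{
    uint64_t zero = val ^ val;  /* like "eor %0, %1, %1": zero, data-dependent on val */
    volatile uint8_t *p = (volatile uint8_t *)&zero + zero;  /* like "add %0, sp, %0" */
    (void)*p;                   /* like "ldr xzr, [%0]": dummy load, result discarded */
    return val;                 /* the counter value itself is unchanged */
}
```

The attraction over an ISB is cost: the dummy load is nearly free, while an ISB flushes the pipeline on every get_cycles() call.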