diff mbox series

[v2,2/2] xen/arm: Add defensive barrier in get_cycles for Arm64

Message ID 20210108062126.2335873-2-wei.chen@arm.com (mailing list archive)
State New, archived
Headers show
Series [v2,1/2] xen/arm: Correct the coding style of get_cycles | expand

Commit Message

Wei Chen Jan. 8, 2021, 6:21 a.m. UTC
Per the discussion [1] on the mailing list, we'd better to
have a barrier after reading CNTPCT in get_cycles. If there
is not any barrier there. When get_cycles being used in some
seqlock critical context in the future, the seqlock can be
speculated potentially.

We import Linux commit 75a19a0202db21638a1c2b424afb867e1f9a2376:
    arm64: arch_timer: Ensure counter register reads occur with seqlock held

    When executing clock_gettime(), either in the vDSO or via a system call,
    we need to ensure that the read of the counter register occurs within
    the seqlock reader critical section. This ensures that updates to the
    clocksource parameters (e.g. the multiplier) are consistent with the
    counter value and therefore avoids the situation where time appears to
    go backwards across multiple reads.

    Extend the vDSO logic so that the seqlock critical section covers the
    read of the counter register as well as accesses to the data page. Since
    reads of the counter system registers are not ordered by memory barrier
    instructions, introduce dependency ordering from the counter read to a
    subsequent memory access so that the seqlock memory barriers apply to
    the counter access in both the vDSO and the system call paths.

    Cc: <stable@vger.kernel.org>
    Cc: Marc Zyngier <marc.zyngier@arm.com>
    Tested-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
    Link: https://lore.kernel.org/linux-arm-kernel/alpine.DEB.2.21.1902081950260.1662@nanos.tec.linutronix.de/
    Reported-by: Thomas Gleixner <tglx@linutronix.de>
    Signed-off-by: Will Deacon <will.deacon@arm.com>

While we are not aware of such use in Xen, it would be best to add the
barrier to avoid any suprise.

In order to reduce the impact of new barrier, we perfer to
use enforce order instead of ISB [2].

Currently, enforce order is not applied to arm32 as this is
not done in Linux at the date of this patch. If this is done
in Linux it will need to be also done in Xen.

[1] https://lists.xenproject.org/archives/html/xen-devel/2020-12/msg00181.html
[2] https://lkml.org/lkml/2020/3/13/645

Signed-off-by: Wei Chen <wei.chen@arm.com>
---
v1 -> v2:
1. Update commit message to refer Linux commit.
2. Change u64 to uint64_t
---
 xen/include/asm-arm/time.h | 43 ++++++++++++++++++++++++++++++++++++--
 1 file changed, 41 insertions(+), 2 deletions(-)

Comments

Julien Grall Jan. 8, 2021, 11:55 a.m. UTC | #1
Hi Wei,

On 08/01/2021 06:21, Wei Chen wrote:
> Per the discussion [1] on the mailing list, we'd better to
> have a barrier after reading CNTPCT in get_cycles. If there
> is not any barrier there. When get_cycles being used in some
> seqlock critical context in the future, the seqlock can be
> speculated potentially.
> 
> We import Linux commit 75a19a0202db21638a1c2b424afb867e1f9a2376:
>      arm64: arch_timer: Ensure counter register reads occur with seqlock held
> 
>      When executing clock_gettime(), either in the vDSO or via a system call,
>      we need to ensure that the read of the counter register occurs within
>      the seqlock reader critical section. This ensures that updates to the
>      clocksource parameters (e.g. the multiplier) are consistent with the
>      counter value and therefore avoids the situation where time appears to
>      go backwards across multiple reads.
> 
>      Extend the vDSO logic so that the seqlock critical section covers the
>      read of the counter register as well as accesses to the data page. Since
>      reads of the counter system registers are not ordered by memory barrier
>      instructions, introduce dependency ordering from the counter read to a
>      subsequent memory access so that the seqlock memory barriers apply to
>      the counter access in both the vDSO and the system call paths.
> 
>      Cc: <stable@vger.kernel.org>
>      Cc: Marc Zyngier <marc.zyngier@arm.com>
>      Tested-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
>      Link: https://lore.kernel.org/linux-arm-kernel/alpine.DEB.2.21.1902081950260.1662@nanos.tec.linutronix.de/
>      Reported-by: Thomas Gleixner <tglx@linutronix.de>
>      Signed-off-by: Will Deacon <will.deacon@arm.com>
> 
> While we are not aware of such use in Xen, it would be best to add the
> barrier to avoid any suprise.
> 
> In order to reduce the impact of new barrier, we perfer to
> use enforce order instead of ISB [2].
> 
> Currently, enforce order is not applied to arm32 as this is
> not done in Linux at the date of this patch. If this is done
> in Linux it will need to be also done in Xen.
> 
> [1] https://lists.xenproject.org/archives/html/xen-devel/2020-12/msg00181.html
> [2] https://lkml.org/lkml/2020/3/13/645
> 
> Signed-off-by: Wei Chen <wei.chen@arm.com>
> ---
> v1 -> v2:
> 1. Update commit message to refer Linux commit.
> 2. Change u64 to uint64_t
> ---
>   xen/include/asm-arm/time.h | 43 ++++++++++++++++++++++++++++++++++++--
>   1 file changed, 41 insertions(+), 2 deletions(-)
> 
> diff --git a/xen/include/asm-arm/time.h b/xen/include/asm-arm/time.h
> index 5c4529ebb5..6b8fd839dd 100644
> --- a/xen/include/asm-arm/time.h
> +++ b/xen/include/asm-arm/time.h
> @@ -11,9 +11,26 @@
>   
>   typedef uint64_t cycles_t;
>   
> -static inline cycles_t get_cycles(void)
> +/*
> + * Ensure that reads of the counter are treated the same as memory reads
> + * for the purposes of ordering by subsequent memory barriers.
> + */
> +#if defined(CONFIG_ARM_64)
> +#define read_cntpct_enforce_ordering(val) do { \
> +    uint64_t tmp, _val = (val);                \
> +                                               \
> +    asm volatile(                              \
> +    "eor %0, %1, %1\n"                         \
> +    "add %0, sp, %0\n"                         \
> +    "ldr xzr, [%0]"                            \
> +    : "=r" (tmp) : "r" (_val));                \
> +} while (0)
> +#else
> +#define read_cntpct_enforce_ordering(val) do {} while (0)
> +#endif
> +
> +static inline cycles_t read_cntpct_stable(void)

OOI, is there any particular reason to create a new helper?

>   {
> -    isb();
>       /*
>        * ARM_WORKAROUND_858921: Cortex-A73 (all versions) counter read
>        * can return a wrong value when the counter crosses a 32bit boundary.
> @@ -36,6 +53,28 @@ static inline cycles_t get_cycles(void)
>       }
>   }
>   
> +static inline cycles_t get_cycles(void)
> +{
> +    cycles_t cnt;
> +
> +    isb();
> +    cnt = read_cntpct_stable();
> +
> +    /*
> +     * If there is not any barrier here. When get_cycles being used in
> +     * some seqlock critical context in the future, the seqlock can be
> +     * speculated potentially.
> +     *
> +     * To prevent seqlock from being speculated silently, we add a barrier
> +     * here defensively. Normally, we just need an ISB here is enough, but
> +     * considering the minimum performance cost. We prefer to use enforce
> +     * order here.
> +     */

I thought, we agreed to remove the comment. Did I miss anything?

I can fix this one on commit if there is no need for a new revision.

Cheers,

> +    read_cntpct_enforce_ordering(cnt);
> +
> +    return cnt;
> +}
> +
>   /* List of timer's IRQ */
>   enum timer_ppi
>   {
>
Wei Chen Jan. 9, 2021, 12:16 p.m. UTC | #2
HI Julien,

> -----Original Message-----
> From: Julien Grall <julien@xen.org>
> Sent: 2021年1月8日 19:56
> To: Wei Chen <Wei.Chen@arm.com>; xen-devel@lists.xenproject.org;
> sstabellini@kernel.org
> Cc: Bertrand Marquis <Bertrand.Marquis@arm.com>; Penny Zheng
> <Penny.Zheng@arm.com>; Jiamei Xie <Jiamei.Xie@arm.com>; nd
> <nd@arm.com>
> Subject: Re: [PATCH v2 2/2] xen/arm: Add defensive barrier in get_cycles for
> Arm64
> 
> Hi Wei,
> 
> On 08/01/2021 06:21, Wei Chen wrote:
> > Per the discussion [1] on the mailing list, we'd better to
> > have a barrier after reading CNTPCT in get_cycles. If there
> > is not any barrier there. When get_cycles being used in some
> > seqlock critical context in the future, the seqlock can be
> > speculated potentially.
> >
> > We import Linux commit 75a19a0202db21638a1c2b424afb867e1f9a2376:
> >      arm64: arch_timer: Ensure counter register reads occur with seqlock held
> >
> >      When executing clock_gettime(), either in the vDSO or via a system call,
> >      we need to ensure that the read of the counter register occurs within
> >      the seqlock reader critical section. This ensures that updates to the
> >      clocksource parameters (e.g. the multiplier) are consistent with the
> >      counter value and therefore avoids the situation where time appears to
> >      go backwards across multiple reads.
> >
> >      Extend the vDSO logic so that the seqlock critical section covers the
> >      read of the counter register as well as accesses to the data page. Since
> >      reads of the counter system registers are not ordered by memory barrier
> >      instructions, introduce dependency ordering from the counter read to a
> >      subsequent memory access so that the seqlock memory barriers apply to
> >      the counter access in both the vDSO and the system call paths.
> >
> >      Cc: <stable@vger.kernel.org>
> >      Cc: Marc Zyngier <marc.zyngier@arm.com>
> >      Tested-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
> >      Link: https://lore.kernel.org/linux-arm-
> kernel/alpine.DEB.2.21.1902081950260.1662@nanos.tec.linutronix.de/
> >      Reported-by: Thomas Gleixner <tglx@linutronix.de>
> >      Signed-off-by: Will Deacon <will.deacon@arm.com>
> >
> > While we are not aware of such use in Xen, it would be best to add the
> > barrier to avoid any suprise.
> >
> > In order to reduce the impact of new barrier, we perfer to
> > use enforce order instead of ISB [2].
> >
> > Currently, enforce order is not applied to arm32 as this is
> > not done in Linux at the date of this patch. If this is done
> > in Linux it will need to be also done in Xen.
> >
> > [1] https://lists.xenproject.org/archives/html/xen-devel/2020-
> 12/msg00181.html
> > [2] https://lkml.org/lkml/2020/3/13/645
> >
> > Signed-off-by: Wei Chen <wei.chen@arm.com>
> > ---
> > v1 -> v2:
> > 1. Update commit message to refer Linux commit.
> > 2. Change u64 to uint64_t
> > ---
> >   xen/include/asm-arm/time.h | 43
> ++++++++++++++++++++++++++++++++++++--
> >   1 file changed, 41 insertions(+), 2 deletions(-)
> >
> > diff --git a/xen/include/asm-arm/time.h b/xen/include/asm-arm/time.h
> > index 5c4529ebb5..6b8fd839dd 100644
> > --- a/xen/include/asm-arm/time.h
> > +++ b/xen/include/asm-arm/time.h
> > @@ -11,9 +11,26 @@
> >
> >   typedef uint64_t cycles_t;
> >
> > -static inline cycles_t get_cycles(void)
> > +/*
> > + * Ensure that reads of the counter are treated the same as memory reads
> > + * for the purposes of ordering by subsequent memory barriers.
> > + */
> > +#if defined(CONFIG_ARM_64)
> > +#define read_cntpct_enforce_ordering(val) do { \
> > +    uint64_t tmp, _val = (val);                \
> > +                                               \
> > +    asm volatile(                              \
> > +    "eor %0, %1, %1\n"                         \
> > +    "add %0, sp, %0\n"                         \
> > +    "ldr xzr, [%0]"                            \
> > +    : "=r" (tmp) : "r" (_val));                \
> > +} while (0)
> > +#else
> > +#define read_cntpct_enforce_ordering(val) do {} while (0)
> > +#endif
> > +
> > +static inline cycles_t read_cntpct_stable(void)
> 
> OOI, is there any particular reason to create a new helper?
> 

Yes, I try to reduce the #if defined(CONFIG_ARM_64) chunks. I think
if we introduce an empty helper for Arm32, we can reduce the other
chunk inside get_cycles. In addition, if we need to do the same work
for Arm32 in the future, we may not need to modify get_cycles.

> >   {
> > -    isb();
> >       /*
> >        * ARM_WORKAROUND_858921: Cortex-A73 (all versions) counter read
> >        * can return a wrong value when the counter crosses a 32bit boundary.
> > @@ -36,6 +53,28 @@ static inline cycles_t get_cycles(void)
> >       }
> >   }
> >
> > +static inline cycles_t get_cycles(void)
> > +{
> > +    cycles_t cnt;
> > +
> > +    isb();
> > +    cnt = read_cntpct_stable();
> > +
> > +    /*
> > +     * If there is not any barrier here. When get_cycles being used in
> > +     * some seqlock critical context in the future, the seqlock can be
> > +     * speculated potentially.
> > +     *
> > +     * To prevent seqlock from being speculated silently, we add a barrier
> > +     * here defensively. Normally, we just need an ISB here is enough, but
> > +     * considering the minimum performance cost. We prefer to use enforce
> > +     * order here.
> > +     */
> 
> I thought, we agreed to remove the comment. Did I miss anything?
> 
> I can fix this one on commit if there is no need for a new revision.
> 

Ah, sorry I forget that. If we don't need a new revision, please help me to
remove it.

Thanks.

> Cheers,
> 
> > +    read_cntpct_enforce_ordering(cnt);
> > +
> > +    return cnt;
> > +}
> > +
> >   /* List of timer's IRQ */
> >   enum timer_ppi
> >   {
> >
> 
> --
> Julien Grall
Julien Grall Jan. 13, 2021, 6:30 p.m. UTC | #3
On 09/01/2021 12:16, Wei Chen wrote:
> HI Julien,

Hi Wei,

>> -----Original Message-----
>> From: Julien Grall <julien@xen.org>
>> Sent: 2021年1月8日 19:56
>> To: Wei Chen <Wei.Chen@arm.com>; xen-devel@lists.xenproject.org;
>> sstabellini@kernel.org
>> Cc: Bertrand Marquis <Bertrand.Marquis@arm.com>; Penny Zheng
>> <Penny.Zheng@arm.com>; Jiamei Xie <Jiamei.Xie@arm.com>; nd
>> <nd@arm.com>
>> Subject: Re: [PATCH v2 2/2] xen/arm: Add defensive barrier in get_cycles for
>> Arm64
>>
>> Hi Wei,
>>
>> On 08/01/2021 06:21, Wei Chen wrote:
>>> Per the discussion [1] on the mailing list, we'd better to
>>> have a barrier after reading CNTPCT in get_cycles. If there
>>> is not any barrier there. When get_cycles being used in some
>>> seqlock critical context in the future, the seqlock can be
>>> speculated potentially.
>>>
>>> We import Linux commit 75a19a0202db21638a1c2b424afb867e1f9a2376:
>>>       arm64: arch_timer: Ensure counter register reads occur with seqlock held
>>>
>>>       When executing clock_gettime(), either in the vDSO or via a system call,
>>>       we need to ensure that the read of the counter register occurs within
>>>       the seqlock reader critical section. This ensures that updates to the
>>>       clocksource parameters (e.g. the multiplier) are consistent with the
>>>       counter value and therefore avoids the situation where time appears to
>>>       go backwards across multiple reads.
>>>
>>>       Extend the vDSO logic so that the seqlock critical section covers the
>>>       read of the counter register as well as accesses to the data page. Since
>>>       reads of the counter system registers are not ordered by memory barrier
>>>       instructions, introduce dependency ordering from the counter read to a
>>>       subsequent memory access so that the seqlock memory barriers apply to
>>>       the counter access in both the vDSO and the system call paths.
>>>
>>>       Cc: <stable@vger.kernel.org>
>>>       Cc: Marc Zyngier <marc.zyngier@arm.com>
>>>       Tested-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
>>>       Link: https://lore.kernel.org/linux-arm-
>> kernel/alpine.DEB.2.21.1902081950260.1662@nanos.tec.linutronix.de/
>>>       Reported-by: Thomas Gleixner <tglx@linutronix.de>
>>>       Signed-off-by: Will Deacon <will.deacon@arm.com>
>>>
>>> While we are not aware of such use in Xen, it would be best to add the
>>> barrier to avoid any suprise.
>>>
>>> In order to reduce the impact of new barrier, we perfer to
>>> use enforce order instead of ISB [2].
>>>
>>> Currently, enforce order is not applied to arm32 as this is
>>> not done in Linux at the date of this patch. If this is done
>>> in Linux it will need to be also done in Xen.
>>>
>>> [1] https://lists.xenproject.org/archives/html/xen-devel/2020-
>> 12/msg00181.html
>>> [2] https://lkml.org/lkml/2020/3/13/645
>>>
>>> Signed-off-by: Wei Chen <wei.chen@arm.com>
>>> ---
>>> v1 -> v2:
>>> 1. Update commit message to refer Linux commit.
>>> 2. Change u64 to uint64_t
>>> ---
>>>    xen/include/asm-arm/time.h | 43
>> ++++++++++++++++++++++++++++++++++++--
>>>    1 file changed, 41 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/xen/include/asm-arm/time.h b/xen/include/asm-arm/time.h
>>> index 5c4529ebb5..6b8fd839dd 100644
>>> --- a/xen/include/asm-arm/time.h
>>> +++ b/xen/include/asm-arm/time.h
>>> @@ -11,9 +11,26 @@
>>>
>>>    typedef uint64_t cycles_t;
>>>
>>> -static inline cycles_t get_cycles(void)
>>> +/*
>>> + * Ensure that reads of the counter are treated the same as memory reads
>>> + * for the purposes of ordering by subsequent memory barriers.
>>> + */
>>> +#if defined(CONFIG_ARM_64)
>>> +#define read_cntpct_enforce_ordering(val) do { \
>>> +    uint64_t tmp, _val = (val);                \
>>> +                                               \
>>> +    asm volatile(                              \
>>> +    "eor %0, %1, %1\n"                         \
>>> +    "add %0, sp, %0\n"                         \
>>> +    "ldr xzr, [%0]"                            \
>>> +    : "=r" (tmp) : "r" (_val));                \
>>> +} while (0)
>>> +#else
>>> +#define read_cntpct_enforce_ordering(val) do {} while (0)
>>> +#endif
>>> +
>>> +static inline cycles_t read_cntpct_stable(void)
>>
>> OOI, is there any particular reason to create a new helper?
>>
> 
> Yes, I try to reduce the #if defined(CONFIG_ARM_64) chunks.

Hmmm... There is no #ifdef chunk in read_cntpct_stable(). Did I miss 
anything?

> I think
> if we introduce an empty helper for Arm32, we can reduce the other
> chunk inside get_cycles. In addition, if we need to do the same work
> for Arm32 in the future, we may not need to modify get_cycles.
I don't really follow this. I wasn't asking about 
read_cntpct_enforce_ordering(). Instead I was asking about 
read_cntpct_stable() which looks like you just split get_cycles().

This change looks completely unrelated to the purpose of this series. I 
am not going to ask to split it, but I think this should be explained in 
the commit message.

Cheers,
Wei Chen Jan. 14, 2021, 12:18 a.m. UTC | #4
Hi Julien,

> -----Original Message-----
> From: Julien Grall <julien@xen.org>
> Sent: 2021年1月14日 2:30
> To: Wei Chen <Wei.Chen@arm.com>; xen-devel@lists.xenproject.org;
> sstabellini@kernel.org
> Cc: Bertrand Marquis <Bertrand.Marquis@arm.com>; Penny Zheng
> <Penny.Zheng@arm.com>; Jiamei Xie <Jiamei.Xie@arm.com>; nd
> <nd@arm.com>
> Subject: Re: [PATCH v2 2/2] xen/arm: Add defensive barrier in get_cycles for
> Arm64
> 
> 
> 
> On 09/01/2021 12:16, Wei Chen wrote:
> > HI Julien,
> 
> Hi Wei,
> 
> >> -----Original Message-----
> >> From: Julien Grall <julien@xen.org>
> >> Sent: 2021年1月8日 19:56
> >> To: Wei Chen <Wei.Chen@arm.com>; xen-devel@lists.xenproject.org;
> >> sstabellini@kernel.org
> >> Cc: Bertrand Marquis <Bertrand.Marquis@arm.com>; Penny Zheng
> >> <Penny.Zheng@arm.com>; Jiamei Xie <Jiamei.Xie@arm.com>; nd
> >> <nd@arm.com>
> >> Subject: Re: [PATCH v2 2/2] xen/arm: Add defensive barrier in get_cycles for
> >> Arm64
> >>
> >> Hi Wei,
> >>
> >> On 08/01/2021 06:21, Wei Chen wrote:
> >>> Per the discussion [1] on the mailing list, we'd better to
> >>> have a barrier after reading CNTPCT in get_cycles. If there
> >>> is not any barrier there. When get_cycles being used in some
> >>> seqlock critical context in the future, the seqlock can be
> >>> speculated potentially.
> >>>
> >>> We import Linux commit 75a19a0202db21638a1c2b424afb867e1f9a2376:
> >>>       arm64: arch_timer: Ensure counter register reads occur with seqlock
> held
> >>>
> >>>       When executing clock_gettime(), either in the vDSO or via a system call,
> >>>       we need to ensure that the read of the counter register occurs within
> >>>       the seqlock reader critical section. This ensures that updates to the
> >>>       clocksource parameters (e.g. the multiplier) are consistent with the
> >>>       counter value and therefore avoids the situation where time appears to
> >>>       go backwards across multiple reads.
> >>>
> >>>       Extend the vDSO logic so that the seqlock critical section covers the
> >>>       read of the counter register as well as accesses to the data page. Since
> >>>       reads of the counter system registers are not ordered by memory barrier
> >>>       instructions, introduce dependency ordering from the counter read to a
> >>>       subsequent memory access so that the seqlock memory barriers apply to
> >>>       the counter access in both the vDSO and the system call paths.
> >>>
> >>>       Cc: <stable@vger.kernel.org>
> >>>       Cc: Marc Zyngier <marc.zyngier@arm.com>
> >>>       Tested-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
> >>>       Link: https://lore.kernel.org/linux-arm-
> >> kernel/alpine.DEB.2.21.1902081950260.1662@nanos.tec.linutronix.de/
> >>>       Reported-by: Thomas Gleixner <tglx@linutronix.de>
> >>>       Signed-off-by: Will Deacon <will.deacon@arm.com>
> >>>
> >>> While we are not aware of such use in Xen, it would be best to add the
> >>> barrier to avoid any suprise.
> >>>
> >>> In order to reduce the impact of new barrier, we perfer to
> >>> use enforce order instead of ISB [2].
> >>>
> >>> Currently, enforce order is not applied to arm32 as this is
> >>> not done in Linux at the date of this patch. If this is done
> >>> in Linux it will need to be also done in Xen.
> >>>
> >>> [1] https://lists.xenproject.org/archives/html/xen-devel/2020-
> >> 12/msg00181.html
> >>> [2] https://lkml.org/lkml/2020/3/13/645
> >>>
> >>> Signed-off-by: Wei Chen <wei.chen@arm.com>
> >>> ---
> >>> v1 -> v2:
> >>> 1. Update commit message to refer Linux commit.
> >>> 2. Change u64 to uint64_t
> >>> ---
> >>>    xen/include/asm-arm/time.h | 43
> >> ++++++++++++++++++++++++++++++++++++--
> >>>    1 file changed, 41 insertions(+), 2 deletions(-)
> >>>
> >>> diff --git a/xen/include/asm-arm/time.h b/xen/include/asm-arm/time.h
> >>> index 5c4529ebb5..6b8fd839dd 100644
> >>> --- a/xen/include/asm-arm/time.h
> >>> +++ b/xen/include/asm-arm/time.h
> >>> @@ -11,9 +11,26 @@
> >>>
> >>>    typedef uint64_t cycles_t;
> >>>
> >>> -static inline cycles_t get_cycles(void)
> >>> +/*
> >>> + * Ensure that reads of the counter are treated the same as memory reads
> >>> + * for the purposes of ordering by subsequent memory barriers.
> >>> + */
> >>> +#if defined(CONFIG_ARM_64)
> >>> +#define read_cntpct_enforce_ordering(val) do { \
> >>> +    uint64_t tmp, _val = (val);                \
> >>> +                                               \
> >>> +    asm volatile(                              \
> >>> +    "eor %0, %1, %1\n"                         \
> >>> +    "add %0, sp, %0\n"                         \
> >>> +    "ldr xzr, [%0]"                            \
> >>> +    : "=r" (tmp) : "r" (_val));                \
> >>> +} while (0)
> >>> +#else
> >>> +#define read_cntpct_enforce_ordering(val) do {} while (0)
> >>> +#endif
> >>> +
> >>> +static inline cycles_t read_cntpct_stable(void)
> >>
> >> OOI, is there any particular reason to create a new helper?
> >>
> >
> > Yes, I try to reduce the #if defined(CONFIG_ARM_64) chunks.
> 
> Hmmm... There is no #ifdef chunk in read_cntpct_stable(). Did I miss
> anything?

No, It was I who misunderstood your comments.

> 
> > I think
> > if we introduce an empty helper for Arm32, we can reduce the other
> > chunk inside get_cycles. In addition, if we need to do the same work
> > for Arm32 in the future, we may not need to modify get_cycles.
> I don't really follow this. I wasn't asking about
> read_cntpct_enforce_ordering(). Instead I was asking about
> read_cntpct_stable() which looks like you just split get_cycles().
> 
> This change looks completely unrelated to the purpose of this series. I
> am not going to ask to split it, but I think this should be explained in
> the commit message.
> 

Yes, I forgot to explain this changes in the commit message.

When I was trying to add read_cntpct_enforce_ordering into get_cycles,
there were three ways I can do:
1. Add read_cntpct_enforce_ordering to the end of each branch
2. Add a new cycles_t variable and record value of each branch. Using
    the function end as unique return point. And then we can add
    read_cntpct_enforce_ordering to the end of get_cycles.
3. Don't touch the get_cycles function body, just rename it and create
    another helper named get_cycles to include original function and
    read_cntpct_enforce_ordering

Personally, I prefer the #3, so I changed the function like this.

How about adding the explanation in the end of commit message like this:
"
.... If this is done in Linux it will need to be also done in Xen.

To avoid adding read_cntpct_enforce_ordering everywhere, we introduced
a new helper read_cntpct_stable to replace original get_cycles, and turn
get_cycles to a wrapper which we can add read_cntpct_enforce_ordering
easily.
"

Thanks,
Wei Chen

> Cheers,
> 
> --
> Julien Grall
Julien Grall Jan. 20, 2021, 5:27 p.m. UTC | #5
Hi Wei,

On 14/01/2021 00:18, Wei Chen wrote:
>>> I think
>>> if we introduce an empty helper for Arm32, we can reduce the other
>>> chunk inside get_cycles. In addition, if we need to do the same work
>>> for Arm32 in the future, we may not need to modify get_cycles.
>> I don't really follow this. I wasn't asking about
>> read_cntpct_enforce_ordering(). Instead I was asking about
>> read_cntpct_stable() which looks like you just split get_cycles().
>>
>> This change looks completely unrelated to the purpose of this series. I
>> am not going to ask to split it, but I think this should be explained in
>> the commit message.
>>
> 
> Yes, I forgot to explain this changes in the commit message.
> 
> When I was trying to add read_cntpct_enforce_ordering into get_cycles,
> there were three ways I can do:
> 1. Add read_cntpct_enforce_ordering to the end of each branch
> 2. Add a new cycles_t variable and record value of each branch. Using
>      the function end as unique return point. And then we can add
>      read_cntpct_enforce_ordering to the end of get_cycles.
> 3. Don't touch the get_cycles function body, just rename it and create
>      another helper named get_cycles to include original function and
>      read_cntpct_enforce_ordering
> 
> Personally, I prefer the #3, so I changed the function like this.

I agree. Thanks for the explanation! In which case:

Reviewed-by: Julien Grall <jgrall@amazon.com>


> 
> How about adding the explanation in the end of commit message like this:
> "
> .... If this is done in Linux it will need to be also done in Xen.
> 
> To avoid adding read_cntpct_enforce_ordering everywhere, we introduced
> a new helper read_cntpct_stable to replace original get_cycles, and turn
> get_cycles to a wrapper which we can add read_cntpct_enforce_ordering
> easily.
> "

I have updated the commit message and committed the patch.

Thanks for your contribution.

Cheers,
diff mbox series

Patch

diff --git a/xen/include/asm-arm/time.h b/xen/include/asm-arm/time.h
index 5c4529ebb5..6b8fd839dd 100644
--- a/xen/include/asm-arm/time.h
+++ b/xen/include/asm-arm/time.h
@@ -11,9 +11,26 @@ 
 
 typedef uint64_t cycles_t;
 
-static inline cycles_t get_cycles(void)
+/*
+ * Ensure that reads of the counter are treated the same as memory reads
+ * for the purposes of ordering by subsequent memory barriers.
+ */
+#if defined(CONFIG_ARM_64)
+#define read_cntpct_enforce_ordering(val) do { \
+    uint64_t tmp, _val = (val);                \
+                                               \
+    asm volatile(                              \
+    "eor %0, %1, %1\n"                         \
+    "add %0, sp, %0\n"                         \
+    "ldr xzr, [%0]"                            \
+    : "=r" (tmp) : "r" (_val));                \
+} while (0)
+#else
+#define read_cntpct_enforce_ordering(val) do {} while (0)
+#endif
+
+static inline cycles_t read_cntpct_stable(void)
 {
-    isb();
     /*
      * ARM_WORKAROUND_858921: Cortex-A73 (all versions) counter read
      * can return a wrong value when the counter crosses a 32bit boundary.
@@ -36,6 +53,28 @@  static inline cycles_t get_cycles(void)
     }
 }
 
+static inline cycles_t get_cycles(void)
+{
+    cycles_t cnt;
+
+    isb();
+    cnt = read_cntpct_stable();
+
+    /*
+     * If there is not any barrier here. When get_cycles being used in
+     * some seqlock critical context in the future, the seqlock can be
+     * speculated potentially.
+     *
+     * To prevent seqlock from being speculated silently, we add a barrier
+     * here defensively. Normally, we just need an ISB here is enough, but
+     * considering the minimum performance cost. We prefer to use enforce
+     * order here.
+     */
+    read_cntpct_enforce_ordering(cnt);
+
+    return cnt;
+}
+
 /* List of timer's IRQ */
 enum timer_ppi
 {