x86/time: Adjust init-time handling of pit0_ticks

Message ID 1482165504-26423-1-git-send-email-andrew.cooper3@citrix.com (mailing list archive)
State New, archived

Commit Message

Andrew Cooper Dec. 19, 2016, 4:38 p.m. UTC
There is no need for the volatile cast in the timer interrupt.  pit0_ticks has
external linkage, preventing the compiler from eliding the update.  This
reduces the generated assembly from a read, local modify, write to a single
add instruction.

Drop the memory barriers from timer_irq_works(), as they are not needed.
pit0_ticks is only modified by timer_interrupt() running on the same CPU, so
all that is required is a volatile reference to prevent the compiler from
eliding the second read.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
CC: Jan Beulich <JBeulich@suse.com>
---
 xen/arch/x86/io_apic.c | 6 ++----
 xen/arch/x86/time.c    | 2 +-
 xen/include/xen/lib.h  | 2 ++
 3 files changed, 5 insertions(+), 5 deletions(-)
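
For illustration, the two forms of the increment described in the commit
message can be compared in a standalone program. This is a minimal sketch, not
Xen code: `counter`, `tick_volatile()`, `tick_plain()` and `run_ticks()` are
hypothetical names. Both forms leave the same value in memory; per the commit
message, the difference lies only in the instructions the compiler emits for
each.

```c
/* Hypothetical stand-in for pit0_ticks; not the Xen variable. */
static unsigned long counter;

/* Old form: the volatile cast forces a distinct load and a distinct store. */
static void tick_volatile(void)
{
    (*(volatile unsigned long *)&counter)++;
}

/* New form: a plain increment, which the compiler is free to emit as a
 * single read-modify-write instruction. */
static void tick_plain(void)
{
    counter++;
}

/* Reset the counter, call `tick` n times, and return the final count. */
static unsigned long run_ticks(void (*tick)(void), unsigned int n)
{
    counter = 0;
    for (unsigned int i = 0; i < n; i++)
        tick();
    return counter;
}
```

Comparing the output of `gcc -O2 -S` for the two tick functions is one way to
see the difference in generated code on a given compiler.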

Comments

Jan Beulich Dec. 19, 2016, 4:51 p.m. UTC | #1
>>> On 19.12.16 at 17:38, <andrew.cooper3@citrix.com> wrote:
> There is no need for the volatile cast in the timer interrupt.  pit0_ticks has
> external linkage, preventing the compiler from eliding the update.  This
> reduces the generated assembly from a read, local modify, write to a single
> add instruction.

I don't think external linkage is the reason here, considering the
effects of whole-program-optimization.

> --- a/xen/arch/x86/io_apic.c
> +++ b/xen/arch/x86/io_apic.c
> @@ -1485,8 +1485,7 @@ static int __init timer_irq_works(void)
>  {
>      unsigned long t1, flags;
>  
> -    t1 = pit0_ticks;
> -    mb();
> +    t1 = ACCESS_ONCE(pit0_ticks);

Any reason not to use the available read_atomic() here?

Jan
Andrew Cooper Dec. 19, 2016, 4:58 p.m. UTC | #2
On 19/12/16 16:51, Jan Beulich wrote:
>>>> On 19.12.16 at 17:38, <andrew.cooper3@citrix.com> wrote:
>> There is no need for the volatile cast in the timer interrupt.  pit0_ticks has
>> external linkage, preventing the compiler from eliding the update.  This
>> reduces the generated assembly from a read, local modify, write to a single
>> add instruction.
> I don't think external linkage is the reason here, considering the
> effects of whole-program-optimization.

In the case of whole-program-optimisation, the compiler would observe
that one function wrote to the variable, and one function read from it. 
I presume that is also sufficient to prevent the eliding?

>
>> --- a/xen/arch/x86/io_apic.c
>> +++ b/xen/arch/x86/io_apic.c
>> @@ -1485,8 +1485,7 @@ static int __init timer_irq_works(void)
>>  {
>>      unsigned long t1, flags;
>>  
>> -    t1 = pit0_ticks;
>> -    mb();
>> +    t1 = ACCESS_ONCE(pit0_ticks);
> Any reason not to use the available read_atomic() here?

ACCESS_ONCE() doesn't force an explicit reg/mem mov instruction,
although in practice it doesn't make any difference in this case.

~Andrew
Jan Beulich Dec. 20, 2016, 7:25 a.m. UTC | #3
>>> On 19.12.16 at 17:58, <andrew.cooper3@citrix.com> wrote:
> On 19/12/16 16:51, Jan Beulich wrote:
>>>>> On 19.12.16 at 17:38, <andrew.cooper3@citrix.com> wrote:
>>> There is no need for the volatile cast in the timer interrupt.  pit0_ticks has
>>> external linkage, preventing the compiler from eliding the update.  This
>>> reduces the generated assembly from a read, local modify, write to a single
>>> add instruction.
>> I don't think external linkage is the reason here, considering the
>> effects of whole-program-optimization.
> 
> In the case of whole-program-optimisation, the compiler would observe
> that one function wrote to the variable, and one function read from it. 
> I presume that is also sufficient to prevent the eliding?

I would think so, yes (albeit the end result of that process may
be that everything which isn't recursive and doesn't serve as
independent entry point ends up as a few huge functions); I
merely wanted to point out that linkage isn't really relevant here.

Jan
Andrew Cooper Dec. 20, 2016, 12:17 p.m. UTC | #4
On 20/12/2016 07:25, Jan Beulich wrote:
>>>> On 19.12.16 at 17:58, <andrew.cooper3@citrix.com> wrote:
>> On 19/12/16 16:51, Jan Beulich wrote:
>>>>>> On 19.12.16 at 17:38, <andrew.cooper3@citrix.com> wrote:
>>>> There is no need for the volatile cast in the timer interrupt.  pit0_ticks has
>>>> external linkage, preventing the compiler from eliding the update.  This
>>>> reduces the generated assembly from a read, local modify, write to a single
>>>> add instruction.
>>> I don't think external linkage is the reason here, considering the
>>> effects of whole-program-optimization.
>> In the case of whole-program-optimisation, the compiler would observe
>> that one function wrote to the variable, and one function read from it. 
>> I presume that is also sufficient to prevent the eliding?
> I would think so, yes (albeit the end result of that process may
> be that everything which isn't recursive and doesn't serve as
> independent entry point ends up as a few huge functions); I
> merely wanted to point out that linkage isn't really relevant here.

What about this?

There is no need for the volatile cast in the timer interrupt; the compiler
may not elide the update.  This reduces the generated assembly from a read,
local modify, write to a single add instruction.

~Andrew
Jan Beulich Dec. 20, 2016, 12:56 p.m. UTC | #5
>>> On 20.12.16 at 13:17, <andrew.cooper3@citrix.com> wrote:
> On 20/12/2016 07:25, Jan Beulich wrote:
>>>>> On 19.12.16 at 17:58, <andrew.cooper3@citrix.com> wrote:
>>> On 19/12/16 16:51, Jan Beulich wrote:
>>>>>>> On 19.12.16 at 17:38, <andrew.cooper3@citrix.com> wrote:
>>>>> There is no need for the volatile cast in the timer interrupt.  pit0_ticks has
>>>>> external linkage, preventing the compiler from eliding the update.  This
>>>>> reduces the generated assembly from a read, local modify, write to a single
>>>>> add instruction.
>>>> I don't think external linkage is the reason here, considering the
>>>> effects of whole-program-optimization.
>>> In the case of whole-program-optimisation, the compiler would observe
>>> that one function wrote to the variable, and one function read from it. 
>>> I presume that is also sufficient to prevent the eliding?
>> I would think so, yes (albeit the end result of that process may
>> be that everything which isn't recursive and doesn't serve as
>> independent entry point ends up as a few huge functions); I
>> merely wanted to point out that linkage isn't really relevant here.
> 
> What about this?
> 
> There is no need for the volatile cast in the timer interrupt; the compiler
> may not elide the update.  This reduces the generated assembly from a read,
> local modify, write to a single add instruction.

Sounds fine.

Jan
Patch

diff --git a/xen/arch/x86/io_apic.c b/xen/arch/x86/io_apic.c
index 33e5927..f989978 100644
--- a/xen/arch/x86/io_apic.c
+++ b/xen/arch/x86/io_apic.c
@@ -1485,8 +1485,7 @@  static int __init timer_irq_works(void)
 {
     unsigned long t1, flags;
 
-    t1 = pit0_ticks;
-    mb();
+    t1 = ACCESS_ONCE(pit0_ticks);
 
     local_save_flags(flags);
     local_irq_enable();
@@ -1501,8 +1500,7 @@  static int __init timer_irq_works(void)
      * might have cached one ExtINT interrupt.  Finally, at
      * least one tick may be lost due to delays.
      */
-    mb();
-    if (pit0_ticks - t1 > 4)
+    if ( (ACCESS_ONCE(pit0_ticks) - t1) > 4 )
         return 1;
 
     return 0;
diff --git a/xen/arch/x86/time.c b/xen/arch/x86/time.c
index cb6939e..f160c01 100644
--- a/xen/arch/x86/time.c
+++ b/xen/arch/x86/time.c
@@ -197,7 +197,7 @@  static void timer_interrupt(int irq, void *dev_id, struct cpu_user_regs *regs)
         return;
 
     /* Only for start-of-day interruopt tests in io_apic.c. */
-    (*(volatile unsigned long *)&pit0_ticks)++;
+    pit0_ticks++;
 
     /* Rough hack to allow accurate timers to sort-of-work with no APIC. */
     if ( !cpu_has_apic )
diff --git a/xen/include/xen/lib.h b/xen/include/xen/lib.h
index d1171b7..1976e4b 100644
--- a/xen/include/xen/lib.h
+++ b/xen/include/xen/lib.h
@@ -56,6 +56,8 @@ 
 
 #define ARRAY_SIZE(x) (sizeof(x) / sizeof((x)[0]) + __must_be_array(x))
 
+#define ACCESS_ONCE(x) (*(volatile typeof(x) *)&(x))
+
 #define MASK_EXTR(v, m) (((v) & (m)) / ((m) & -(m)))
 #define MASK_INSR(v, m) (((v) * ((m) & -(m))) & (m))
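
The read pattern that timer_irq_works() uses after this patch can be sketched
outside of Xen as follows. This is an illustration under stated assumptions:
`ticks`, `fake_tick()` and `ticks_advanced()` are hypothetical names, the tick
source is an ordinary function call rather than a real interrupt, and the
macro is spelled with `__typeof__` (a GNU C extension) so that it also builds
in strict ISO modes, whereas the patch itself uses `typeof`.

```c
/* Mirrors the ACCESS_ONCE() definition the patch adds to xen/include/xen/lib.h. */
#define ACCESS_ONCE(x) (*(volatile __typeof__(x) *)&(x))

/* Hypothetical stand-in for pit0_ticks. */
static unsigned long ticks;

/* Hypothetical stand-in for timer_interrupt(): a plain increment. */
static void fake_tick(void)
{
    ticks++;
}

/*
 * Take a first snapshot, let n ticks happen, then take a second one.
 * The volatile accesses stop the compiler from reusing the first value
 * for the second read. Returns 1 if more than `threshold` ticks elapsed.
 */
static int ticks_advanced(unsigned int n, unsigned long threshold)
{
    unsigned long t1 = ACCESS_ONCE(ticks);

    for (unsigned int i = 0; i < n; i++)
        fake_tick();

    return (ACCESS_ONCE(ticks) - t1) > threshold;
}
```

With a threshold of 4, as in the patch, `ticks_advanced(5, 4)` returns 1 and
`ticks_advanced(4, 4)` returns 0.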