Message ID | 1482165504-26423-1-git-send-email-andrew.cooper3@citrix.com (mailing list archive) |
---|---|
State | New, archived |
>>> On 19.12.16 at 17:38, <andrew.cooper3@citrix.com> wrote:
> There is no need for the volatile cast in the timer interrupt. pit0_ticks has
> external linkage, preventing the compiler from eliding the update. This
> reduces the generated assembly from a read, local modify, write to a single
> add instruction.

I don't think external linkage is the reason here, considering the
effects of whole-program-optimization.

> --- a/xen/arch/x86/io_apic.c
> +++ b/xen/arch/x86/io_apic.c
> @@ -1485,8 +1485,7 @@ static int __init timer_irq_works(void)
>  {
>      unsigned long t1, flags;
>
> -    t1 = pit0_ticks;
> -    mb();
> +    t1 = ACCESS_ONCE(pit0_ticks);

Any reason not to use the available read_atomic() here?

Jan
On 19/12/16 16:51, Jan Beulich wrote:
>>>> On 19.12.16 at 17:38, <andrew.cooper3@citrix.com> wrote:
>> There is no need for the volatile cast in the timer interrupt. pit0_ticks has
>> external linkage, preventing the compiler from eliding the update. This
>> reduces the generated assembly from a read, local modify, write to a single
>> add instruction.
> I don't think external linkage is the reason here, considering the
> effects of whole-program-optimization.

In the case of whole-program-optimisation, the compiler would observe
that one function wrote to the variable, and one function read from it.
I presume that is also sufficient to prevent the eliding?

>
>> --- a/xen/arch/x86/io_apic.c
>> +++ b/xen/arch/x86/io_apic.c
>> @@ -1485,8 +1485,7 @@ static int __init timer_irq_works(void)
>>  {
>>      unsigned long t1, flags;
>>
>> -    t1 = pit0_ticks;
>> -    mb();
>> +    t1 = ACCESS_ONCE(pit0_ticks);
> Any reason not to use the available read_atomic() here?

ACCESS_ONCE() doesn't force an explicit reg/mem mov instruction,
although in practice it doesn't make any difference in this case.

~Andrew
>>> On 19.12.16 at 17:58, <andrew.cooper3@citrix.com> wrote:
> On 19/12/16 16:51, Jan Beulich wrote:
>>>>> On 19.12.16 at 17:38, <andrew.cooper3@citrix.com> wrote:
>>> There is no need for the volatile cast in the timer interrupt. pit0_ticks has
>>> external linkage, preventing the compiler from eliding the update. This
>>> reduces the generated assembly from a read, local modify, write to a single
>>> add instruction.
>> I don't think external linkage is the reason here, considering the
>> effects of whole-program-optimization.
>
> In the case of whole-program-optimisation, the compiler would observe
> that one function wrote to the variable, and one function read from it.
> I presume that is also sufficient to prevent the eliding?

I would think so, yes (albeit the end result of that process may
be that everything which isn't recursive and doesn't serve as
independent entry point ends up as a few huge functions); I
merely wanted to point out that linkage isn't really relevant here.

Jan
On 20/12/2016 07:25, Jan Beulich wrote:
>>>> On 19.12.16 at 17:58, <andrew.cooper3@citrix.com> wrote:
>> On 19/12/16 16:51, Jan Beulich wrote:
>>>>>> On 19.12.16 at 17:38, <andrew.cooper3@citrix.com> wrote:
>>>> There is no need for the volatile cast in the timer interrupt. pit0_ticks has
>>>> external linkage, preventing the compiler from eliding the update. This
>>>> reduces the generated assembly from a read, local modify, write to a single
>>>> add instruction.
>>> I don't think external linkage is the reason here, considering the
>>> effects of whole-program-optimization.
>> In the case of whole-program-optimisation, the compiler would observe
>> that one function wrote to the variable, and one function read from it.
>> I presume that is also sufficient to prevent the eliding?
> I would think so, yes (albeit the end result of that process may
> be that everything which isn't recursive and doesn't serve as
> independent entry point ends up as a few huge functions); I
> merely wanted to point out that linkage isn't really relevant here.

What about this?

There is no need for the volatile cast in the timer interrupt; the compiler
may not elide the update. This reduces the generated assembly from a read,
local modify, write to a single add instruction.

~Andrew
>>> On 20.12.16 at 13:17, <andrew.cooper3@citrix.com> wrote:
> On 20/12/2016 07:25, Jan Beulich wrote:
>>>>> On 19.12.16 at 17:58, <andrew.cooper3@citrix.com> wrote:
>>> On 19/12/16 16:51, Jan Beulich wrote:
>>>>>>> On 19.12.16 at 17:38, <andrew.cooper3@citrix.com> wrote:
>>>>> There is no need for the volatile cast in the timer interrupt. pit0_ticks has
>>>>> external linkage, preventing the compiler from eliding the update. This
>>>>> reduces the generated assembly from a read, local modify, write to a single
>>>>> add instruction.
>>>> I don't think external linkage is the reason here, considering the
>>>> effects of whole-program-optimization.
>>> In the case of whole-program-optimisation, the compiler would observe
>>> that one function wrote to the variable, and one function read from it.
>>> I presume that is also sufficient to prevent the eliding?
>> I would think so, yes (albeit the end result of that process may
>> be that everything which isn't recursive and doesn't serve as
>> independent entry point ends up as a few huge functions); I
>> merely wanted to point out that linkage isn't really relevant here.
>
> What about this?
>
> There is no need for the volatile cast in the timer interrupt; the compiler
> may not elide the update. This reduces the generated assembly from a read,
> local modify, write to a single add instruction.

Sounds fine.

Jan
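For illustration only, the two increment forms discussed in this thread can be compared in a small user-space sketch. The names demo_ticks, tick_volatile(), tick_plain() and demo_self_check() are hypothetical stand-ins, not Xen code. Both forms update the counter identically; the difference reported in the patch description is only in the instructions the compiler may choose, which depends on the compiler and optimisation level.

```c
#include <assert.h>

/* Hypothetical stand-in for Xen's pit0_ticks counter. */
unsigned long demo_ticks;

/* Old form: the volatile cast makes the read and the write separate
 * volatile accesses, which the compiler must both perform. */
void tick_volatile(void)
{
    (*(volatile unsigned long *)&demo_ticks)++;
}

/* New form: a plain increment, which the compiler is free to emit as
 * a single add to memory. */
void tick_plain(void)
{
    demo_ticks++;
}

/* Both forms have the same observable effect on the counter. */
void demo_self_check(void)
{
    unsigned long before = demo_ticks;

    tick_volatile();
    tick_plain();
    assert(demo_ticks - before == 2);
}
```

Compiling this with optimisation enabled and inspecting the assembly (for example with the compiler's -S option) is one way to check the claim on a given toolchain.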
diff --git a/xen/arch/x86/io_apic.c b/xen/arch/x86/io_apic.c
index 33e5927..f989978 100644
--- a/xen/arch/x86/io_apic.c
+++ b/xen/arch/x86/io_apic.c
@@ -1485,8 +1485,7 @@ static int __init timer_irq_works(void)
 {
     unsigned long t1, flags;
 
-    t1 = pit0_ticks;
-    mb();
+    t1 = ACCESS_ONCE(pit0_ticks);
 
     local_save_flags(flags);
     local_irq_enable();
@@ -1501,8 +1500,7 @@ static int __init timer_irq_works(void)
      * might have cached one ExtINT interrupt. Finally, at
      * least one tick may be lost due to delays.
      */
-    mb();
-    if (pit0_ticks - t1 > 4)
+    if ( (ACCESS_ONCE(pit0_ticks) - t1) > 4 )
         return 1;
 
     return 0;
diff --git a/xen/arch/x86/time.c b/xen/arch/x86/time.c
index cb6939e..f160c01 100644
--- a/xen/arch/x86/time.c
+++ b/xen/arch/x86/time.c
@@ -197,7 +197,7 @@ static void timer_interrupt(int irq, void *dev_id, struct cpu_user_regs *regs)
         return;
 
     /* Only for start-of-day interruopt tests in io_apic.c. */
-    (*(volatile unsigned long *)&pit0_ticks)++;
+    pit0_ticks++;
 
     /* Rough hack to allow accurate timers to sort-of-work with no APIC. */
     if ( !cpu_has_apic )
diff --git a/xen/include/xen/lib.h b/xen/include/xen/lib.h
index d1171b7..1976e4b 100644
--- a/xen/include/xen/lib.h
+++ b/xen/include/xen/lib.h
@@ -56,6 +56,8 @@
 
 #define ARRAY_SIZE(x) (sizeof(x) / sizeof((x)[0]) + __must_be_array(x))
 
+#define ACCESS_ONCE(x) (*(volatile typeof(x) *)&(x))
+
 #define MASK_EXTR(v, m) (((v) & (m)) / ((m) & -(m)))
 #define MASK_INSR(v, m) (((v) * ((m) & -(m))) & (m))
 
There is no need for the volatile cast in the timer interrupt. pit0_ticks has
external linkage, preventing the compiler from eliding the update. This
reduces the generated assembly from a read, local modify, write to a single
add instruction.

Drop the memory barriers from timer_irq_works(), as they are not needed.
pit0_ticks is only modified by timer_interrupt() running on the same CPU, so
all that is required is a volatile reference to prevent the compiler from
eliding the second read.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
CC: Jan Beulich <JBeulich@suse.com>
---
 xen/arch/x86/io_apic.c | 6 ++----
 xen/arch/x86/time.c    | 2 +-
 xen/include/xen/lib.h  | 2 ++
 3 files changed, 5 insertions(+), 5 deletions(-)
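As a rough sketch of the read pattern described above, the following user-space example takes two ACCESS_ONCE() reads around a stretch of simulated timer ticks. The macro mirrors the one the patch adds to xen/include/xen/lib.h, but is spelled with __typeof__ (an assumption made here so that it also builds in strict -std= modes). sim_ticks, sim_timer_interrupt(), sim_timer_irq_works() and sim_self_check() are hypothetical names; in this sketch the "interrupt" is an ordinary function call, so the volatile reads are not strictly needed here and only illustrate the shape of the code.

```c
#include <assert.h>

/* Mirrors the macro added by the patch; __typeof__ is a GNU C extension. */
#define ACCESS_ONCE(x) (*(volatile __typeof__(x) *)&(x))

/* Hypothetical stand-in for pit0_ticks. */
static unsigned long sim_ticks;

/* Hypothetical stand-in for timer_interrupt(): plain increment. */
static void sim_timer_interrupt(void)
{
    sim_ticks++;
}

/* Returns 1 if more than 4 ticks were observed between the two reads,
 * following the check in timer_irq_works(). */
int sim_timer_irq_works(unsigned int nr_ticks)
{
    /* First read: forced to be a real load of the variable. */
    unsigned long t1 = ACCESS_ONCE(sim_ticks);
    unsigned int i;

    /* Stand-in for the window in which real interrupts would fire. */
    for ( i = 0; i < nr_ticks; i++ )
        sim_timer_interrupt();

    /* Second read: the volatile access stops the compiler from reusing
     * the value it loaded for t1. */
    if ( (ACCESS_ONCE(sim_ticks) - t1) > 4 )
        return 1;

    return 0;
}

/* Five ticks pass the check; four do not, because the test is "> 4". */
void sim_self_check(void)
{
    assert(sim_timer_irq_works(5) == 1);
    assert(sim_timer_irq_works(4) == 0);
}
```

In the real code the counter is updated from an interrupt handler that the compiler cannot see being called, which is why the second read needs the volatile access.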