diff mbox series

x86/hvm/rtc: preserved guest RTC offset during suspend/resume/migrate

Message ID 20191218144133.16089-1-pdurrant@amazon.com (mailing list archive)
State New, archived
Headers show
Series x86/hvm/rtc: preserved guest RTC offset during suspend/resume/migrate | expand

Commit Message

Paul Durrant Dec. 18, 2019, 2:41 p.m. UTC
The emulated RTC is synchronized with the PV wallclock; any write to the
RTC will update struct domain's 'time_offset_seconds' field and call
update_domain_wallclock().

However, the value of 'time_offset_seconds' is not preserved in any save
record and indeed, when the RTC save record is loaded, the CMOS values
will be updated based on an offset value which may or may not have been
set by the toolstack [1]. This may result in making bogus values available
to the guest and messing up any calculations done in the call to
alarm_timer_update() at the end of rtc_load().

This patch extends the RTC save record to contain an offset value, which
will be zero filled on load of an older record. The 'time_offset_secoonds'
field in struct domain is also modified into a 'time_offset' struct,
containing a 'seconds' field and a boolean 'set' field.

The code in rtc_load() then uses the new value in the save record to
update the value of struct domain's 'time_offset.seconds' unless
'time_offset.set' is true, which will only be the case if the toolstack has
already performed a XEN_DOMCTL_settimeoffset.

[1] There is currently no way for a toolstack to read the value of
    'time_offset_seconds' from struct domain. In the past, any hope of
    preservation of the value across a guest life-cycle operation was based
    on relying on qemu-dm to write a value into xenstore whenever the RTC
    was updated, in response to an IOREQ with type IOREQ_TYPE_TIMEOFFSET
    being sent by Xen; see:

    https://xenbits.xen.org/gitweb/?p=qemu-xen-traditional.git;a=blob;f=i386-dm/helper2.c#l457

    but this behaviour was never forward-ported into upstream QEMU, which
    completely ignores that IOREQ type.
    In either case, nothing in xl or libxl ever samples the value of
    RTC offset from xenstore so any offset adjustment to a non-zero value
    performed by the guest (which in the case of Windows is highly likely
    as it normally writes RTC in local time, whereas Xen maintains time in
    UTC) is completely lost with the de-facto toolstack, and always has
    been. Instead, PV drivers are relied upon to paper over this gaping
    hole.

Signed-off-by: Paul Durrant <pdurrant@amazon.com>
---
Cc: Stefano Stabellini <sstabellini@kernel.org>
Cc: Julien Grall <julien@xen.org>
Cc: Volodymyr Babchuk <Volodymyr_Babchuk@epam.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: George Dunlap <George.Dunlap@eu.citrix.com>
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Wei Liu <wl@xen.org>
Cc: "Roger Pau Monné" <roger.pau@citrix.com>
---
 xen/arch/arm/platform_hypercall.c      |  2 +-
 xen/arch/arm/time.c                    |  3 ++-
 xen/arch/arm/vtimer.c                  |  4 ++--
 xen/arch/x86/hvm/rtc.c                 | 12 ++++++++++--
 xen/arch/x86/time.c                    |  3 ++-
 xen/common/time.c                      |  6 +++---
 xen/include/public/arch-x86/hvm/save.h |  2 ++
 xen/include/xen/sched.h                |  5 ++++-
 8 files changed, 26 insertions(+), 11 deletions(-)

Comments

Jan Beulich Dec. 19, 2019, 11:29 a.m. UTC | #1
On 18.12.2019 15:41, Paul Durrant wrote:
> The emulated RTC is synchronized with the PV wallclock; any write to the
> RTC will update struct domain's 'time_offset_seconds' field and call
> update_domain_wallclock().
> 
> However, the value of 'time_offset_seconds' is not preserved in any save
> record and indeed, when the RTC save record is loaded, the CMOS values
> will be updated based on an offset value which may or may not have been
> set by the toolstack [1]. This may result in making bogus values available
> to the guest and messing up any calculations done in the call to
> alarm_timer_update() at the end of rtc_load().
> 
> This patch extends the RTC save record to contain an offset value, which
> will be zero filled on load of an older record. The 'time_offset_secoonds'
> field in struct domain is also modified into a 'time_offset' struct,
> containing a 'seconds' field and a boolean 'set' field.
> 
> The code in rtc_load() then uses the new value in the save record to
> update the value of struct domain's 'time_offset.seconds' unless
> 'time_offset.set' is true, which will only be the case if the toolstack has
> already performed a XEN_DOMCTL_settimeoffset.
> 
> [1] There is currently no way for a toolstack to read the value of
>     'time_offset_seconds' from struct domain. In the past, any hope of
>     preservation of the value across a guest life-cycle operation was based
>     on relying on qemu-dm to write a value into xenstore whenever the RTC
>     was updated, in response to an IOREQ with type IOREQ_TYPE_TIMEOFFSET
>     being sent by Xen; see:
> 
>     https://xenbits.xen.org/gitweb/?p=qemu-xen-traditional.git;a=blob;f=i386-dm/helper2.c#l457
> 
>     but this behaviour was never forward-ported into upstream QEMU, which
>     completely ignores that IOREQ type.
>     In either case, nothing in xl or libxl ever samples the value of
>     RTC offset from xenstore so any offset adjustment to a non-zero value
>     performed by the guest (which in the case of Windows is highly likely
>     as it normally writes RTC in local time, whereas Xen maintains time in
>     UTC) is completely lost with the de-facto toolstack, and always has
>     been. Instead, PV drivers are relied upon to paper over this gaping
>     hole.
> 
> Signed-off-by: Paul Durrant <pdurrant@amazon.com>

Reviewed-by: Jan Beulich <jbeulich@suse.com>
with one remark:

> @@ -771,6 +773,12 @@ static int rtc_load(struct domain *d, hvm_domain_context_t *h)
>  
>      /* Reset the wall-clock time.  In normal running, this runs with host 
>       * time, so let's keep doing that. */
> +    if ( !d->time_offset.set )
> +    {
> +        d->time_offset.seconds = s->hw.rtc_offset;
> +        update_domain_wallclock_time(d);
> +    }

It's not really clear to me which of the possible behaviors is the
better one - overriding a tool stack set value with what the save
record says, or (like you do) the other way around.

Jan
Paul Durrant Dec. 19, 2019, 11:41 a.m. UTC | #2
> -----Original Message-----
> From: Jan Beulich <jbeulich@suse.com>
> Sent: 19 December 2019 11:30
> To: Durrant, Paul <pdurrant@amazon.com>
> Cc: xen-devel@lists.xenproject.org; Stefano Stabellini
> <sstabellini@kernel.org>; Julien Grall <julien@xen.org>; Volodymyr Babchuk
> <Volodymyr_Babchuk@epam.com>; Andrew Cooper <andrew.cooper3@citrix.com>;
> George Dunlap <George.Dunlap@eu.citrix.com>; Ian Jackson
> <ian.jackson@eu.citrix.com>; Konrad Rzeszutek Wilk
> <konrad.wilk@oracle.com>; Wei Liu <wl@xen.org>; Roger Pau Monné
> <roger.pau@citrix.com>
> Subject: Re: [PATCH] x86/hvm/rtc: preserved guest RTC offset during
> suspend/resume/migrate
> 
> On 18.12.2019 15:41, Paul Durrant wrote:
> > The emulated RTC is synchronized with the PV wallclock; any write to the
> > RTC will update struct domain's 'time_offset_seconds' field and call
> > update_domain_wallclock().
> >
> > However, the value of 'time_offset_seconds' is not preserved in any save
> > record and indeed, when the RTC save record is loaded, the CMOS values
> > will be updated based on an offset value which may or may not have been
> > set by the toolstack [1]. This may result in making bogus values
> available
> > to the guest and messing up any calculations done in the call to
> > alarm_timer_update() at the end of rtc_load().
> >
> > This patch extends the RTC save record to contain an offset value, which
> > will be zero filled on load of an older record. The
> 'time_offset_secoonds'
> > field in struct domain is also modified into a 'time_offset' struct,
> > containing a 'seconds' field and a boolean 'set' field.
> >
> > The code in rtc_load() then uses the new value in the save record to
> > update the value of struct domain's 'time_offset.seconds' unless
> > 'time_offset.set' is true, which will only be the case if the toolstack
> has
> > already performed a XEN_DOMCTL_settimeoffset.
> >
> > [1] There is currently no way for a toolstack to read the value of
> >     'time_offset_seconds' from struct domain. In the past, any hope of
> >     preservation of the value across a guest life-cycle operation was
> based
> >     on relying on qemu-dm to write a value into xenstore whenever the
> RTC
> >     was updated, in response to an IOREQ with type IOREQ_TYPE_TIMEOFFSET
> >     being sent by Xen; see:
> >
> >     https://xenbits.xen.org/gitweb/?p=qemu-xen-
> traditional.git;a=blob;f=i386-dm/helper2.c#l457
> >
> >     but this behaviour was never forward-ported into upstream QEMU,
> which
> >     completely ignores that IOREQ type.
> >     In either case, nothing in xl or libxl ever samples the value of
> >     RTC offset from xenstore so any offset adjustment to a non-zero
> value
> >     performed by the guest (which in the case of Windows is highly
> likely
> >     as it normally writes RTC in local time, whereas Xen maintains time
> in
> >     UTC) is completely lost with the de-facto toolstack, and always has
> >     been. Instead, PV drivers are relied upon to paper over this gaping
> >     hole.
> >
> > Signed-off-by: Paul Durrant <pdurrant@amazon.com>
> 
> Reviewed-by: Jan Beulich <jbeulich@suse.com>

Thanks.

> with one remark:
> 
> > @@ -771,6 +773,12 @@ static int rtc_load(struct domain *d,
> hvm_domain_context_t *h)
> >
> >      /* Reset the wall-clock time.  In normal running, this runs with
> host
> >       * time, so let's keep doing that. */
> > +    if ( !d->time_offset.set )
> > +    {
> > +        d->time_offset.seconds = s->hw.rtc_offset;
> > +        update_domain_wallclock_time(d);
> > +    }
> 
> It's not really clear to me which of the possible behaviors is the
> better one - overriding a tool stack set value with what the save
> record says, or (like you do) the other way around.

It's not clear to me either, which is why I erred on the side of caution. I didn't want to break any out-of-tree toolstacks that might be overriding the offset early in the restore phase from a value acquired via QEMU hackery. I guess the boolean could be dispensed with if we retire the IOREQ and silently ignore the DOMCTL for HVM guests (so overrides from the aforementioned toolstacks simply have no effect).

  Paul

> 
> Jan
Julien Grall Dec. 23, 2019, 5:50 p.m. UTC | #3
Hi Paul,

On 18/12/2019 15:41, Paul Durrant wrote:
> The emulated RTC is synchronized with the PV wallclock; any write to the
> RTC will update struct domain's 'time_offset_seconds' field and call
> update_domain_wallclock().
> 
> However, the value of 'time_offset_seconds' is not preserved in any save
> record and indeed, when the RTC save record is loaded, the CMOS values
> will be updated based on an offset value which may or may not have been
> set by the toolstack [1]. This may result in making bogus values available
> to the guest and messing up any calculations done in the call to
> alarm_timer_update() at the end of rtc_load().
> 
> This patch extends the RTC save record to contain an offset value, which
> will be zero filled on load of an older record. The 'time_offset_secoonds'
> field in struct domain is also modified into a 'time_offset' struct,
> containing a 'seconds' field and a boolean 'set' field.
> 
> The code in rtc_load() then uses the new value in the save record to
> update the value of struct domain's 'time_offset.seconds' unless
> 'time_offset.set' is true, which will only be the case if the toolstack has
> already performed a XEN_DOMCTL_settimeoffset.
> 
> [1] There is currently no way for a toolstack to read the value of
>      'time_offset_seconds' from struct domain. In the past, any hope of
>      preservation of the value across a guest life-cycle operation was based
>      on relying on qemu-dm to write a value into xenstore whenever the RTC
>      was updated, in response to an IOREQ with type IOREQ_TYPE_TIMEOFFSET
>      being sent by Xen; see:
> 
>      https://xenbits.xen.org/gitweb/?p=qemu-xen-traditional.git;a=blob;f=i386-dm/helper2.c#l457
> 
>      but this behaviour was never forward-ported into upstream QEMU, which
>      completely ignores that IOREQ type.
>      In either case, nothing in xl or libxl ever samples the value of
>      RTC offset from xenstore so any offset adjustment to a non-zero value
>      performed by the guest (which in the case of Windows is highly likely
>      as it normally writes RTC in local time, whereas Xen maintains time in
>      UTC) is completely lost with the de-facto toolstack, and always has
>      been. Instead, PV drivers are relied upon to paper over this gaping
>      hole.
> 
> Signed-off-by: Paul Durrant <pdurrant@amazon.com>
> ---
> Cc: Stefano Stabellini <sstabellini@kernel.org>
> Cc: Julien Grall <julien@xen.org>
> Cc: Volodymyr Babchuk <Volodymyr_Babchuk@epam.com>
> Cc: Andrew Cooper <andrew.cooper3@citrix.com>
> Cc: George Dunlap <George.Dunlap@eu.citrix.com>
> Cc: Ian Jackson <ian.jackson@eu.citrix.com>
> Cc: Jan Beulich <jbeulich@suse.com>
> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> Cc: Wei Liu <wl@xen.org>
> Cc: "Roger Pau Monné" <roger.pau@citrix.com>
> ---
>   xen/arch/arm/platform_hypercall.c      |  2 +-
>   xen/arch/arm/time.c                    |  3 ++-
>   xen/arch/arm/vtimer.c                  |  4 ++--

The Arm changes looks mostly mechanical:

Acked-by: Julien Grall <julien@xen.org>

Cheers,
diff mbox series

Patch

diff --git a/xen/arch/arm/platform_hypercall.c b/xen/arch/arm/platform_hypercall.c
index 5aab856ce7..8efac7ee60 100644
--- a/xen/arch/arm/platform_hypercall.c
+++ b/xen/arch/arm/platform_hypercall.c
@@ -53,7 +53,7 @@  long do_platform_op(XEN_GUEST_HANDLE_PARAM(xen_platform_op_t) u_xenpf_op)
         if ( likely(!op->u.settime64.mbz) )
             do_settime(op->u.settime64.secs,
                        op->u.settime64.nsecs,
-                       op->u.settime64.system_time + SECONDS(d->time_offset_seconds));
+                       op->u.settime64.system_time + SECONDS(d->time_offset.seconds));
         else
             ret = -EINVAL;
         break;
diff --git a/xen/arch/arm/time.c b/xen/arch/arm/time.c
index 739bcf186c..b0021c2c69 100644
--- a/xen/arch/arm/time.c
+++ b/xen/arch/arm/time.c
@@ -353,7 +353,8 @@  void update_vcpu_system_time(struct vcpu *v)
 
 void domain_set_time_offset(struct domain *d, int64_t time_offset_seconds)
 {
-    d->time_offset_seconds = time_offset_seconds;
+    d->time_offset.seconds = time_offset_seconds;
+    d->time_offset.set = true;
     /* XXX update guest visible wallclock time */
 }
 
diff --git a/xen/arch/arm/vtimer.c b/xen/arch/arm/vtimer.c
index e6aebdac9e..240a850b6e 100644
--- a/xen/arch/arm/vtimer.c
+++ b/xen/arch/arm/vtimer.c
@@ -64,8 +64,8 @@  int domain_vtimer_init(struct domain *d, struct xen_arch_domainconfig *config)
 {
     d->arch.phys_timer_base.offset = NOW();
     d->arch.virt_timer_base.offset = READ_SYSREG64(CNTPCT_EL0);
-    d->time_offset_seconds = ticks_to_ns(d->arch.virt_timer_base.offset - boot_count);
-    do_div(d->time_offset_seconds, 1000000000);
+    d->time_offset.seconds = ticks_to_ns(d->arch.virt_timer_base.offset - boot_count);
+    do_div(d->time_offset.seconds, 1000000000);
 
     config->clock_frequency = timer_dt_clock_frequency;
 
diff --git a/xen/arch/x86/hvm/rtc.c b/xen/arch/x86/hvm/rtc.c
index 42339682e8..bb41efe84a 100644
--- a/xen/arch/x86/hvm/rtc.c
+++ b/xen/arch/x86/hvm/rtc.c
@@ -594,7 +594,7 @@  static void rtc_set_time(RTCState *s)
 
     /* We use the guest's setting of the RTC to define the local-time 
      * offset for this domain. */
-    d->time_offset_seconds += (after - before);
+    d->time_offset.seconds += (after - before);
     update_domain_wallclock_time(d);
     /* Also tell qemu-dm about it so it will be remembered for next boot. */
     send_timeoffset_req(after - before);
@@ -747,8 +747,10 @@  static int rtc_save(struct vcpu *v, hvm_domain_context_t *h)
         return 0;
 
     spin_lock(&s->lock);
+    s->hw.rtc_offset = d->time_offset.seconds;
     rc = hvm_save_entry(RTC, 0, h, &s->hw);
     spin_unlock(&s->lock);
+
     return rc;
 }
 
@@ -763,7 +765,7 @@  static int rtc_load(struct domain *d, hvm_domain_context_t *h)
     spin_lock(&s->lock);
 
     /* Restore the registers */
-    if ( hvm_load_entry(RTC, h, &s->hw) != 0 )
+    if ( hvm_load_entry_zeroextend(RTC, h, &s->hw) != 0 )
     {
         spin_unlock(&s->lock);
         return -EINVAL;
@@ -771,6 +773,12 @@  static int rtc_load(struct domain *d, hvm_domain_context_t *h)
 
     /* Reset the wall-clock time.  In normal running, this runs with host 
      * time, so let's keep doing that. */
+    if ( !d->time_offset.set )
+    {
+        d->time_offset.seconds = s->hw.rtc_offset;
+        update_domain_wallclock_time(d);
+    }
+
     s->current_tm = gmtime(get_localtime(d));
     rtc_copy_date(s);
 
diff --git a/xen/arch/x86/time.c b/xen/arch/x86/time.c
index 216169a025..f2d03933e4 100644
--- a/xen/arch/x86/time.c
+++ b/xen/arch/x86/time.c
@@ -1190,7 +1190,8 @@  static void update_domain_rtc(void)
 
 void domain_set_time_offset(struct domain *d, int64_t time_offset_seconds)
 {
-    d->time_offset_seconds = time_offset_seconds;
+    d->time_offset.seconds = time_offset_seconds;
+    d->time_offset.set = true;
     if ( is_hvm_domain(d) )
         rtc_update_clock(d);
     update_domain_wallclock_time(d);
diff --git a/xen/common/time.c b/xen/common/time.c
index a7caea99e0..82336e2d5a 100644
--- a/xen/common/time.c
+++ b/xen/common/time.c
@@ -105,7 +105,7 @@  void update_domain_wallclock_time(struct domain *d)
     *wc_version = version_update_begin(*wc_version);
     smp_wmb();
 
-    sec = wc_sec + d->time_offset_seconds;
+    sec = wc_sec + d->time_offset.seconds;
     shared_info(d, wc_sec)    = sec;
     shared_info(d, wc_nsec)   = wc_nsec;
 #ifdef CONFIG_X86
@@ -148,13 +148,13 @@  void do_settime(u64 secs, unsigned int nsecs, u64 system_time_base)
 unsigned long get_localtime(struct domain *d)
 {
     return wc_sec + (wc_nsec + NOW()) / 1000000000ULL
-        + d->time_offset_seconds;
+        + d->time_offset.seconds;
 }
 
 /* Return microsecs after 00:00:00 localtime, 1 January, 1970. */
 uint64_t get_localtime_us(struct domain *d)
 {
-    return (SECONDS(wc_sec + d->time_offset_seconds) + wc_nsec + NOW())
+    return (SECONDS(wc_sec + d->time_offset.seconds) + wc_nsec + NOW())
            / 1000UL;
 }
 
diff --git a/xen/include/public/arch-x86/hvm/save.h b/xen/include/public/arch-x86/hvm/save.h
index bb8fa7c12f..b2ad3fcd74 100644
--- a/xen/include/public/arch-x86/hvm/save.h
+++ b/xen/include/public/arch-x86/hvm/save.h
@@ -500,6 +500,8 @@  struct hvm_hw_rtc {
     /* Index register for 2-part operations */
     uint8_t cmos_index;
     uint8_t pad0;
+    /* RTC offset from host time */
+    int64_t rtc_offset;
 };
 
 DECLARE_HVM_SAVE_TYPE(RTC, 11, struct hvm_hw_rtc);
diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h
index 9f7bc69293..94add37990 100644
--- a/xen/include/xen/sched.h
+++ b/xen/include/xen/sched.h
@@ -406,7 +406,10 @@  struct domain
     /* Domain is paused by controller software? */
     int              controller_pause_count;
 
-    int64_t          time_offset_seconds;
+    struct {
+        int64_t seconds;
+        bool set;
+    } time_offset;
 
 #ifdef CONFIG_HAS_PCI
     struct list_head pdev_list;