remove static declaration from wall clock version

Message ID 1235677340-3139-1-git-send-email-glommer@redhat.com (mailing list archive)
State New, archived

Commit Message

Glauber Costa Feb. 26, 2009, 7:42 p.m. UTC
Matt T. Yourst noted that we're currently having a dumb
race for no reason in paravirtual wall clock. This is due
to the use of a static variable to hold the counting.

This can race with multiple guests reading wallclock
at the same time, since the static variable value would
then be accessible to all callers. This wasn't noted
before because it is a rather rare scenario.

Instead, just use a normal stack variable. This will
mean that each caller will have its version written
separately. No need for a global counter.

Signed-off-by: Glauber Costa <glommer@redhat.com>
---
 arch/x86/kvm/x86.c |    4 +---
 1 files changed, 1 insertions(+), 3 deletions(-)
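
For illustration only, the difference between the two declarations can be sketched in plain userspace C. This is not the kernel code: next_version_static(), next_version_stack() and check_two_callers() are hypothetical helpers that stand in for the first version write in kvm_write_wall_clock() before and after this patch.

```c
#include <assert.h>

/* Before the patch: one counter shared by every caller, i.e. by every guest. */
static int next_version_static(void)
{
	static int version;	/* zero-initialized, lives for the whole program */

	version++;
	return version;
}

/* After the patch: each call gets its own variable, always starting at 1. */
static int next_version_stack(void)
{
	int version = 1;

	return version;
}

/* Two back-to-back callers, standing in for two guests. */
static void check_two_callers(void)
{
	int a = next_version_static();
	int b = next_version_static();

	assert(b == a + 1);	/* shared counter: second caller sees the first */
	assert(next_version_stack() == next_version_stack());	/* independent */
}
```

The sketch is single-threaded, so it only shows that the static counter is shared state; the race itself needs two concurrent callers doing the unlocked increment.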

Comments

Arnd Bergmann Feb. 26, 2009, 7:50 p.m. UTC | #1
On Thursday 26 February 2009, Glauber Costa wrote:
> @@ -548,15 +548,13 @@ static int do_set_msr(struct kvm_vcpu *vcpu, unsigned index, u64 *data)
>  
>  static void kvm_write_wall_clock(struct kvm *kvm, gpa_t wall_clock)
>  {
> -       static int version;
> +       int version = 1;
>         struct pvclock_wall_clock wc;
>         struct timespec now, sys, boot;
>  
>         if (!wall_clock)
>                 return;
>  
> -       version++;
> -
>         kvm_write_guest(kvm, wall_clock, &version, sizeof(version));
>  
>         /*

Doesn't this mean that kvm_write_guest now writes an uninitialized value
to the guest?

I think what you need here is a 'static atomic_t version;' so you can
do an atomic_inc instead of the ++.

	Arnd <><
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Glauber Costa Feb. 27, 2009, 1:22 a.m. UTC | #2
On Thu, Feb 26, 2009 at 08:50:26PM +0100, Arnd Bergmann wrote:
> On Thursday 26 February 2009, Glauber Costa wrote:
> > @@ -548,15 +548,13 @@ static int do_set_msr(struct kvm_vcpu *vcpu, unsigned index, u64 *data)
> >  
> >  static void kvm_write_wall_clock(struct kvm *kvm, gpa_t wall_clock)
> >  {
> > -       static int version;
> > +       int version = 1;
> >         struct pvclock_wall_clock wc;
> >         struct timespec now, sys, boot;
> >  
> >         if (!wall_clock)
> >                 return;
> >  
> > -       version++;
> > -
> >         kvm_write_guest(kvm, wall_clock, &version, sizeof(version));
> >  
> >         /*
> 
> Doesn't this mean that kvm_write_guest now writes an uninitialized value
> to the guest?
No. If you look closely, it's now initialized to 1.

> 
> I think what you need here is a 'static atomic_t version;' so you can
> do an atomic_inc instead of the ++.

I don't see the need for atomicity. This is just called once, at boot time.
The only thing we're protecting here is one guest from another. The stack
will do fine for this.
Arnd Bergmann Feb. 27, 2009, 1:28 a.m. UTC | #3
On Friday 27 February 2009, Glauber Costa wrote:
> 
> > Doesn't this mean that kvm_write_guest now writes an uninitialized value
> > to the guest?
> No. If you look closely, it's now initialized to 1.

Right, I didn't see that change at first.

	Arnd <><
Marcelo Tosatti March 3, 2009, 11:21 a.m. UTC | #4
On Thu, Feb 26, 2009 at 02:42:20PM -0500, Glauber Costa wrote:
> Matt T. Yourst noted that we're currently having a dumb
> race for no reason in paravirtual wall clock. This is due
> to the use of a static variable to hold the counting.
> 
> This can race with multiple guests reading wallclock
> at the same time, since the static variable value would
> then be accessible to all callers. This wasn't noted
> before because it is a rather rare scenario.
> 
> Instead, just use a normal stack variable. This will
> mean that each caller will have its version written
> separately. No need for a global counter.

Applied, thanks.

Avi Kivity March 8, 2009, 12:15 p.m. UTC | #5
Glauber Costa wrote:
> Matt T. Yourst noted that we're currently having a dumb
> race for no reason in paravirtual wall clock. This is due
> to the use of a static variable to hold the counting.
>
> This can race with multiple guests reading wallclock
> at the same time, since the static variable value would
> then be accessible to all callers. This wasn't noted
> before because it is a rather rare scenario.
>
> Instead, just use a normal stack variable. This will
> mean that each caller will have its version written
> separately. No need for a global counter.
>
> Signed-off-by: Glauber Costa <glommer@redhat.com>
> ---
>  arch/x86/kvm/x86.c |    4 +---
>  1 files changed, 1 insertions(+), 3 deletions(-)
>
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 2511708..d7236f6 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -548,15 +548,13 @@ static int do_set_msr(struct kvm_vcpu *vcpu, unsigned index, u64 *data)
>  
>  static void kvm_write_wall_clock(struct kvm *kvm, gpa_t wall_clock)
>  {
> -	static int version;
> +	int version = 1;
>  	struct pvclock_wall_clock wc;
>  	struct timespec now, sys, boot;
>  
>  	if (!wall_clock)
>  		return;
>  
> -	version++;
> -
>  	kvm_write_guest(kvm, wall_clock, &version, sizeof(version));
>  

Suppose currently version == 2.

guest: read version (2)
guest: read sec
    host: write version (1)
    host: write sec
    host: write nsec
    host: write version (2)
guest: read nsec
guest: read version (2)

So now we have inconsistent time (sec from old data, nsec from new data).

We need to make version a per-vm value.  Best to read it from guest 
memory, so nothing special needs to be done for live migration.  Also 
use mutual exclusion in kvm_write_wall_clock() - sequence locks don't 
support multiple writers.
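
The interleaving above can be replayed deterministically in userspace C. This is a rough sketch under stated assumptions, not the kernel implementation: struct wall_clock only mirrors the version/sec/nsec layout, and host_update_fixed(), host_update_from_guest() and torn_read_accepted() are hypothetical names. The second writer illustrates the "read it from guest memory" idea; the mutual exclusion and memory barriers a real writer would need are left out because the replay is single-threaded.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Same field order as struct pvclock_wall_clock. */
struct wall_clock {
	uint32_t version;
	uint32_t sec;
	uint32_t nsec;
};

/* Host update as patched: the version is always written as 1, then 2. */
static void host_update_fixed(struct wall_clock *wc, uint32_t sec, uint32_t nsec)
{
	wc->version = 1;
	wc->sec = sec;
	wc->nsec = nsec;
	wc->version = 2;
}

/* Host update that continues from whatever version the guest page holds. */
static void host_update_from_guest(struct wall_clock *wc, uint32_t sec, uint32_t nsec)
{
	uint32_t version = wc->version;

	if (version & 1)
		version++;		/* stale odd value: round up to even */
	wc->version = version + 1;	/* odd: update in progress */
	wc->sec = sec;
	wc->nsec = nsec;
	wc->version = version + 2;	/* even again, and different from before */
}

/*
 * Replay the interleaving: the guest reads version and sec, the host
 * performs a full update, then the guest reads nsec and version again.
 * Returns true if the guest accepts the mixed (old sec, new nsec) sample.
 */
static bool torn_read_accepted(bool fixed_version)
{
	struct wall_clock wc = { .version = 2, .sec = 100, .nsec = 900 };
	uint32_t v1, v2, sec, nsec;

	v1 = wc.version;
	sec = wc.sec;

	if (fixed_version)
		host_update_fixed(&wc, 101, 5);
	else
		host_update_from_guest(&wc, 101, 5);

	nsec = wc.nsec;
	v2 = wc.version;

	assert(sec == 100 && nsec == 5);	/* the sample is torn either way */
	return v1 == v2 && !(v1 & 1);		/* guest's seqlock-style check */
}
```

With the fixed 1-then-2 writer, torn_read_accepted(true) returns true: the guest sees version 2 before and after and keeps the inconsistent sample. With the writer that derives the version from guest memory, the version moves from 2 to 4, so torn_read_accepted(false) returns false and the guest would retry.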

Patch

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 2511708..d7236f6 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -548,15 +548,13 @@  static int do_set_msr(struct kvm_vcpu *vcpu, unsigned index, u64 *data)
 
 static void kvm_write_wall_clock(struct kvm *kvm, gpa_t wall_clock)
 {
-	static int version;
+	int version = 1;
 	struct pvclock_wall_clock wc;
 	struct timespec now, sys, boot;
 
 	if (!wall_clock)
 		return;
 
-	version++;
-
 	kvm_write_guest(kvm, wall_clock, &version, sizeof(version));
 
 	/*