diff mbox

x86: vdso: fix pvclock races with task migration

Message ID 1428000263-11892-1-git-send-email-rkrcmar@redhat.com (mailing list archive)
State New, archived
Headers show

Commit Message

Radim Krčmář April 2, 2015, 6:44 p.m. UTC
If we were migrated right after __getcpu, but before reading the
migration_count, we wouldn't notice that we read TSC of a different
VCPU, nor that KVM's bug made pvti invalid, as only migration_count
on source VCPU is increased.

Change vdso instead of updating migration_count on destination.

Fixes: 0a4e6be9ca17 ("x86: kvm: Revert "remove sched notifier for cross-cpu migrations"")
Cc: stable@vger.kernel.org
Signed-off-by: Radim Kr?má? <rkrcmar@redhat.com>
---
 Because it we'll get a complete rewrite, this series does not
 - remove the outdated 'TODO: We can put [...]' comment
 - use a proper encapsulation for the inner do-while loop
 - optimize the outer do-while loop
   (no need to re-read cpu id on version mismatch)

 arch/x86/vdso/vclock_gettime.c | 20 ++++++++++++--------
 1 file changed, 12 insertions(+), 8 deletions(-)

Comments

Andy Lutomirski April 2, 2015, 6:59 p.m. UTC | #1
On Thu, Apr 2, 2015 at 11:44 AM, Radim Kr?má? <rkrcmar@redhat.com> wrote:
> If we were migrated right after __getcpu, but before reading the
> migration_count, we wouldn't notice that we read TSC of a different
> VCPU, nor that KVM's bug made pvti invalid, as only migration_count
> on source VCPU is increased.
>
> Change vdso instead of updating migration_count on destination.

Looks good to me.

--Andy

>
> Fixes: 0a4e6be9ca17 ("x86: kvm: Revert "remove sched notifier for cross-cpu migrations"")
> Cc: stable@vger.kernel.org
> Signed-off-by: Radim Kr?má? <rkrcmar@redhat.com>
> ---
>  Because it we'll get a complete rewrite, this series does not
>  - remove the outdated 'TODO: We can put [...]' comment
>  - use a proper encapsulation for the inner do-while loop
>  - optimize the outer do-while loop
>    (no need to re-read cpu id on version mismatch)
>
>  arch/x86/vdso/vclock_gettime.c | 20 ++++++++++++--------
>  1 file changed, 12 insertions(+), 8 deletions(-)
>
> diff --git a/arch/x86/vdso/vclock_gettime.c b/arch/x86/vdso/vclock_gettime.c
> index 30933760ee5f..40d2473836c9 100644
> --- a/arch/x86/vdso/vclock_gettime.c
> +++ b/arch/x86/vdso/vclock_gettime.c
> @@ -99,21 +99,25 @@ static notrace cycle_t vread_pvclock(int *mode)
>                  * __getcpu() calls (Gleb).
>                  */
>
> -               pvti = get_pvti(cpu);
> +               /* Make sure migrate_count will change if we leave the VCPU. */
> +               do {
> +                       pvti = get_pvti(cpu);
> +                       migrate_count = pvti->migrate_count;
>
> -               migrate_count = pvti->migrate_count;
> +                       cpu1 = cpu;
> +                       cpu = __getcpu() & VGETCPU_CPU_MASK;
> +               } while (unlikely(cpu != cpu1));
>
>                 version = __pvclock_read_cycles(&pvti->pvti, &ret, &flags);
>
>                 /*
>                  * Test we're still on the cpu as well as the version.
> -                * We could have been migrated just after the first
> -                * vgetcpu but before fetching the version, so we
> -                * wouldn't notice a version change.
> +                * - We must read TSC of pvti's VCPU.
> +                * - KVM doesn't follow the versioning protocol, so data could
> +                *   change before version if we left the VCPU.
>                  */
> -               cpu1 = __getcpu() & VGETCPU_CPU_MASK;
> -       } while (unlikely(cpu != cpu1 ||
> -                         (pvti->pvti.version & 1) ||
> +               smp_rmb();
> +       } while (unlikely((pvti->pvti.version & 1) ||
>                           pvti->pvti.version != version ||
>                           pvti->migrate_count != migrate_count));
>
> --
> 2.3.4
>
Andy Lutomirski April 6, 2015, 8:07 p.m. UTC | #2
On 04/02/2015 11:59 AM, Andy Lutomirski wrote:
> On Thu, Apr 2, 2015 at 11:44 AM, Radim Kr?má? <rkrcmar@redhat.com> wrote:
>> If we were migrated right after __getcpu, but before reading the
>> migration_count, we wouldn't notice that we read TSC of a different
>> VCPU, nor that KVM's bug made pvti invalid, as only migration_count
>> on source VCPU is increased.
>>
>> Change vdso instead of updating migration_count on destination.
>
> Looks good to me.

Just to check: what tree is this intended to go through?  I can take it, 
but not until the previous patch makes it into Linus' tree or -tip.  Or 
I can take both patches.

Marcelo, Paolo?

--Andy

>
> --Andy
>
>>
>> Fixes: 0a4e6be9ca17 ("x86: kvm: Revert "remove sched notifier for cross-cpu migrations"")
>> Cc: stable@vger.kernel.org
>> Signed-off-by: Radim Kr?má? <rkrcmar@redhat.com>
>> ---
>>   Because it we'll get a complete rewrite, this series does not
>>   - remove the outdated 'TODO: We can put [...]' comment
>>   - use a proper encapsulation for the inner do-while loop
>>   - optimize the outer do-while loop
>>     (no need to re-read cpu id on version mismatch)
>>
>>   arch/x86/vdso/vclock_gettime.c | 20 ++++++++++++--------
>>   1 file changed, 12 insertions(+), 8 deletions(-)
>>
>> diff --git a/arch/x86/vdso/vclock_gettime.c b/arch/x86/vdso/vclock_gettime.c
>> index 30933760ee5f..40d2473836c9 100644
>> --- a/arch/x86/vdso/vclock_gettime.c
>> +++ b/arch/x86/vdso/vclock_gettime.c
>> @@ -99,21 +99,25 @@ static notrace cycle_t vread_pvclock(int *mode)
>>                   * __getcpu() calls (Gleb).
>>                   */
>>
>> -               pvti = get_pvti(cpu);
>> +               /* Make sure migrate_count will change if we leave the VCPU. */
>> +               do {
>> +                       pvti = get_pvti(cpu);
>> +                       migrate_count = pvti->migrate_count;
>>
>> -               migrate_count = pvti->migrate_count;
>> +                       cpu1 = cpu;
>> +                       cpu = __getcpu() & VGETCPU_CPU_MASK;
>> +               } while (unlikely(cpu != cpu1));
>>
>>                  version = __pvclock_read_cycles(&pvti->pvti, &ret, &flags);
>>
>>                  /*
>>                   * Test we're still on the cpu as well as the version.
>> -                * We could have been migrated just after the first
>> -                * vgetcpu but before fetching the version, so we
>> -                * wouldn't notice a version change.
>> +                * - We must read TSC of pvti's VCPU.
>> +                * - KVM doesn't follow the versioning protocol, so data could
>> +                *   change before version if we left the VCPU.
>>                   */
>> -               cpu1 = __getcpu() & VGETCPU_CPU_MASK;
>> -       } while (unlikely(cpu != cpu1 ||
>> -                         (pvti->pvti.version & 1) ||
>> +               smp_rmb();
>> +       } while (unlikely((pvti->pvti.version & 1) ||
>>                            pvti->pvti.version != version ||
>>                            pvti->migrate_count != migrate_count));
>>
>> --
>> 2.3.4
>>
>
>
>

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Marcelo Tosatti April 6, 2015, 8:57 p.m. UTC | #3
On Thu, Apr 02, 2015 at 08:44:23PM +0200, Radim Kr?má? wrote:
> If we were migrated right after __getcpu, but before reading the
> migration_count, we wouldn't notice that we read TSC of a different
> VCPU, nor that KVM's bug made pvti invalid, as only migration_count
> on source VCPU is increased.
> 
> Change vdso instead of updating migration_count on destination.
> 
> Fixes: 0a4e6be9ca17 ("x86: kvm: Revert "remove sched notifier for cross-cpu migrations"")
> Cc: stable@vger.kernel.org
> Signed-off-by: Radim Kr?má? <rkrcmar@redhat.com>
> ---
>  Because it we'll get a complete rewrite, this series does not
>  - remove the outdated 'TODO: We can put [...]' comment
>  - use a proper encapsulation for the inner do-while loop
>  - optimize the outer do-while loop
>    (no need to re-read cpu id on version mismatch)
> 
>  arch/x86/vdso/vclock_gettime.c | 20 ++++++++++++--------
>  1 file changed, 12 insertions(+), 8 deletions(-)
> 
> diff --git a/arch/x86/vdso/vclock_gettime.c b/arch/x86/vdso/vclock_gettime.c
> index 30933760ee5f..40d2473836c9 100644
> --- a/arch/x86/vdso/vclock_gettime.c
> +++ b/arch/x86/vdso/vclock_gettime.c
> @@ -99,21 +99,25 @@ static notrace cycle_t vread_pvclock(int *mode)
>  		 * __getcpu() calls (Gleb).
>  		 */
>  
> -		pvti = get_pvti(cpu);
> +		/* Make sure migrate_count will change if we leave the VCPU. */
> +		do {
> +			pvti = get_pvti(cpu);
> +			migrate_count = pvti->migrate_count;
>  
> -		migrate_count = pvti->migrate_count;
> +			cpu1 = cpu;
> +			cpu = __getcpu() & VGETCPU_CPU_MASK;
> +		} while (unlikely(cpu != cpu1));
>  
>  		version = __pvclock_read_cycles(&pvti->pvti, &ret, &flags);
>  
>  		/*
>  		 * Test we're still on the cpu as well as the version.
> -		 * We could have been migrated just after the first
> -		 * vgetcpu but before fetching the version, so we
> -		 * wouldn't notice a version change.
> +		 * - We must read TSC of pvti's VCPU.
> +		 * - KVM doesn't follow the versioning protocol, so data could
> +		 *   change before version if we left the VCPU.
>  		 */
> -		cpu1 = __getcpu() & VGETCPU_CPU_MASK;
> -	} while (unlikely(cpu != cpu1 ||
> -			  (pvti->pvti.version & 1) ||
> +		smp_rmb();
> +	} while (unlikely((pvti->pvti.version & 1) ||
>  			  pvti->pvti.version != version ||
>  			  pvti->migrate_count != migrate_count));
>  
> -- 
> 2.3.4
> 
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com>

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Paolo Bonzini April 6, 2015, 10:04 p.m. UTC | #4
On 06/04/2015 22:07, Andy Lutomirski wrote:
> On 04/02/2015 11:59 AM, Andy Lutomirski wrote:
>> On Thu, Apr 2, 2015 at 11:44 AM, Radim Kr?má? <rkrcmar@redhat.com> wrote:
>>> If we were migrated right after __getcpu, but before reading the
>>> migration_count, we wouldn't notice that we read TSC of a different
>>> VCPU, nor that KVM's bug made pvti invalid, as only migration_count
>>> on source VCPU is increased.
>>>
>>> Change vdso instead of updating migration_count on destination.
>>
>> Looks good to me.
> 
> Just to check: what tree is this intended to go through?  I can take it,
> but not until the previous patch makes it into Linus' tree or -tip.  Or
> I can take both patches.

I'll take it for 4.1.

Paolo
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Paolo Bonzini April 7, 2015, 11:11 a.m. UTC | #5
On 02/04/2015 20:44, Radim Kr?má? wrote:
> If we were migrated right after __getcpu, but before reading the
> migration_count, we wouldn't notice that we read TSC of a different
> VCPU, nor that KVM's bug made pvti invalid, as only migration_count
> on source VCPU is increased.
> 
> Change vdso instead of updating migration_count on destination.
> 
> Fixes: 0a4e6be9ca17 ("x86: kvm: Revert "remove sched notifier for cross-cpu migrations"")
> Cc: stable@vger.kernel.org
> Signed-off-by: Radim Kr?má? <rkrcmar@redhat.com>

Applying this, but removing the "Fixes" tag because a guest patch cannot
fix a host patch (it can work around it or complement it).

Paolo
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Radim Krčmář April 7, 2015, 12:47 p.m. UTC | #6
2015-04-07 13:11+0200, Paolo Bonzini:
> On 02/04/2015 20:44, Radim Kr?má? wrote:
> > If we were migrated right after __getcpu, but before reading the
> > migration_count, we wouldn't notice that we read TSC of a different
> > VCPU, nor that KVM's bug made pvti invalid, as only migration_count
> > on source VCPU is increased.
> > 
> > Change vdso instead of updating migration_count on destination.
> > 
> > Fixes: 0a4e6be9ca17 ("x86: kvm: Revert "remove sched notifier for cross-cpu migrations"")
> > Cc: stable@vger.kernel.org
> > Signed-off-by: Radim Kr?má? <rkrcmar@redhat.com>
> 
> Applying this, but removing the "Fixes" tag because a guest patch cannot
> fix a host patch (it can work around it or complement it).

I think it was correct.  Both are guest only, the revert just missed
some races.  (0a4e6be9ca17 has misleading commit message ...)
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Paolo Bonzini April 7, 2015, 1:57 p.m. UTC | #7
On 07/04/2015 14:47, Radim Kr?má? wrote:
> I think it was correct.  Both are guest only, the revert just missed
> some races.  (0a4e6be9ca17 has misleading commit message ...)

Oops.  You're right.

Paolo
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/arch/x86/vdso/vclock_gettime.c b/arch/x86/vdso/vclock_gettime.c
index 30933760ee5f..40d2473836c9 100644
--- a/arch/x86/vdso/vclock_gettime.c
+++ b/arch/x86/vdso/vclock_gettime.c
@@ -99,21 +99,25 @@  static notrace cycle_t vread_pvclock(int *mode)
 		 * __getcpu() calls (Gleb).
 		 */
 
-		pvti = get_pvti(cpu);
+		/* Make sure migrate_count will change if we leave the VCPU. */
+		do {
+			pvti = get_pvti(cpu);
+			migrate_count = pvti->migrate_count;
 
-		migrate_count = pvti->migrate_count;
+			cpu1 = cpu;
+			cpu = __getcpu() & VGETCPU_CPU_MASK;
+		} while (unlikely(cpu != cpu1));
 
 		version = __pvclock_read_cycles(&pvti->pvti, &ret, &flags);
 
 		/*
 		 * Test we're still on the cpu as well as the version.
-		 * We could have been migrated just after the first
-		 * vgetcpu but before fetching the version, so we
-		 * wouldn't notice a version change.
+		 * - We must read TSC of pvti's VCPU.
+		 * - KVM doesn't follow the versioning protocol, so data could
+		 *   change before version if we left the VCPU.
 		 */
-		cpu1 = __getcpu() & VGETCPU_CPU_MASK;
-	} while (unlikely(cpu != cpu1 ||
-			  (pvti->pvti.version & 1) ||
+		smp_rmb();
+	} while (unlikely((pvti->pvti.version & 1) ||
 			  pvti->pvti.version != version ||
 			  pvti->migrate_count != migrate_count));