x86/vdso/pvclock: Protect STABLE check with the seqcount

Message ID	755dcedb17269e1d7ce12a9a713dea303835137e.1451949191.git.luto@kernel.org (mailing list archive)
State	New, archived
Headers	show Return-Path: <kvm-owner@kernel.org> From: Andy Lutomirski <luto@kernel.org> To: X86 ML <x86@kernel.org>, Radim Krcmar <rkrcmar@redhat.com>, Paolo Bonzini <pbonzini@redhat.com>, linux-kernel@vger.kernel.org, kvm@vger.kernel.org, Alexander Graf <agraf@suse.de> Cc: Andy Lutomirski <luto@kernel.org> Subject: [PATCH] x86/vdso/pvclock: Protect STABLE check with the seqcount Date: Mon, 4 Jan 2016 15:14:28 -0800 Message-Id: <755dcedb17269e1d7ce12a9a713dea303835137e.1451949191.git.luto@kernel.org> In-Reply-To: <20160104225904.GA4947@amt.cnet> References: <20160104225904.GA4947@amt.cnet> Sender: kvm-owner@vger.kernel.org Precedence: bulk

Andy Lutomirski Jan. 4, 2016, 11:14 p.m. UTC

If the clock becomes unstable while we're reading it, we need to
bail.  We can do this by simply moving the check into the seqcount
loop.

Reported-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
---

Marcelo, how's this?

arch/x86/entry/vdso/vclock_gettime.c | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

Marcelo Tosatti Jan. 7, 2016, 9:02 p.m. UTC | #1

On Mon, Jan 04, 2016 at 03:14:28PM -0800, Andy Lutomirski wrote:
> If the clock becomes unstable while we're reading it, we need to
> bail.  We can do this by simply moving the check into the seqcount
> loop.
> 
> Reported-by: Marcelo Tosatti <mtosatti@redhat.com>
> Signed-off-by: Andy Lutomirski <luto@kernel.org>
> ---
> 
> Marcelo, how's this?
> 
> arch/x86/entry/vdso/vclock_gettime.c | 12 ++++++------
>  1 file changed, 6 insertions(+), 6 deletions(-)
> 
> diff --git a/arch/x86/entry/vdso/vclock_gettime.c b/arch/x86/entry/vdso/vclock_gettime.c
> index 8602f06c759f..1a50e09c945b 100644
> --- a/arch/x86/entry/vdso/vclock_gettime.c
> +++ b/arch/x86/entry/vdso/vclock_gettime.c
> @@ -126,23 +126,23 @@ static notrace cycle_t vread_pvclock(int *mode)
>  	 *
>  	 * On Xen, we don't appear to have that guarantee, but Xen still
>  	 * supplies a valid seqlock using the version field.
> -
> +	 *
>  	 * We only do pvclock vdso timing at all if
>  	 * PVCLOCK_TSC_STABLE_BIT is set, and we interpret that bit to
>  	 * mean that all vCPUs have matching pvti and that the TSC is
>  	 * synced, so we can just look at vCPU 0's pvti.
>  	 */
>  
> -	if (unlikely(!(pvti->flags & PVCLOCK_TSC_STABLE_BIT))) {
> -		*mode = VCLOCK_NONE;
> -		return 0;
> -	}
> -
>  	do {
>  		version = pvti->version;
>  
>  		smp_rmb();
>  
> +		if (unlikely(!(pvti->flags & PVCLOCK_TSC_STABLE_BIT))) {
> +			*mode = VCLOCK_NONE;
> +			return 0;
> +		}
> +
>  		tsc = rdtsc_ordered();
>  		pvti_tsc_to_system_mul = pvti->tsc_to_system_mul;
>  		pvti_tsc_shift = pvti->tsc_shift;
> -- 
> 2.4.3

Check it before returning the value (once cleared, it can't be set back 
to 1), similarly to what was in place before.


--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Andy Lutomirski Jan. 7, 2016, 9:13 p.m. UTC | #2

On Thu, Jan 7, 2016 at 1:02 PM, Marcelo Tosatti <mtosatti@redhat.com> wrote:
> On Mon, Jan 04, 2016 at 03:14:28PM -0800, Andy Lutomirski wrote:
>> If the clock becomes unstable while we're reading it, we need to
>> bail.  We can do this by simply moving the check into the seqcount
>> loop.
>>
>> Reported-by: Marcelo Tosatti <mtosatti@redhat.com>
>> Signed-off-by: Andy Lutomirski <luto@kernel.org>
>> ---
>>
>> Marcelo, how's this?
>>
>> arch/x86/entry/vdso/vclock_gettime.c | 12 ++++++------
>>  1 file changed, 6 insertions(+), 6 deletions(-)
>>
>> diff --git a/arch/x86/entry/vdso/vclock_gettime.c b/arch/x86/entry/vdso/vclock_gettime.c
>> index 8602f06c759f..1a50e09c945b 100644
>> --- a/arch/x86/entry/vdso/vclock_gettime.c
>> +++ b/arch/x86/entry/vdso/vclock_gettime.c
>> @@ -126,23 +126,23 @@ static notrace cycle_t vread_pvclock(int *mode)
>>        *
>>        * On Xen, we don't appear to have that guarantee, but Xen still
>>        * supplies a valid seqlock using the version field.
>> -
>> +      *
>>        * We only do pvclock vdso timing at all if
>>        * PVCLOCK_TSC_STABLE_BIT is set, and we interpret that bit to
>>        * mean that all vCPUs have matching pvti and that the TSC is
>>        * synced, so we can just look at vCPU 0's pvti.
>>        */
>>
>> -     if (unlikely(!(pvti->flags & PVCLOCK_TSC_STABLE_BIT))) {
>> -             *mode = VCLOCK_NONE;
>> -             return 0;
>> -     }
>> -
>>       do {
>>               version = pvti->version;
>>
>>               smp_rmb();
>>
>> +             if (unlikely(!(pvti->flags & PVCLOCK_TSC_STABLE_BIT))) {
>> +                     *mode = VCLOCK_NONE;
>> +                     return 0;
>> +             }
>> +
>>               tsc = rdtsc_ordered();
>>               pvti_tsc_to_system_mul = pvti->tsc_to_system_mul;
>>               pvti_tsc_shift = pvti->tsc_shift;
>> --
>> 2.4.3
>
> Check it before returning the value (once cleared, it can't be set back
> to 1), similarly to what was in place before.
>
>

I don't understand what you mean.

In the old code (4.3 and 4.4), the vdso checks STABLE_BIT at the end,
which is correct as long as STABLE_BIT can never change from 0 to 1.

In the -tip code, it's clearly wrong.

In the code in this patch, it should be correct regardless of how
STABLE_BIT changes as long as the seqcount works.  Given that the
performance cost of doing that is zero, I'd rather keep it that way.
If we're really paranoid, we could move it after the rest of the pvti
reads and add a barrier, but is there really any host on which that
matters?

--Andy

Paolo Bonzini Jan. 7, 2016, 9:47 p.m. UTC | #3

On 07/01/2016 22:13, Andy Lutomirski wrote:
> I don't understand what you mean.
> 
> In the old code (4.3 and 4.4), the vdso checks STABLE_BIT at the end,
> which is correct as long as STABLE_BIT can never change from 0 to 1.
> 
> In the -tip code, it's clearly wrong.
> 
> In the code in this patch, it should be correct regardless of how
> STABLE_BIT changes as long as the seqcount works.  Given that the
> performance cost of doing that is zero, I'd rather keep it that way.
> If we're really paranoid, we could move it after the rest of the pvti
> reads and add a barrier, but is there really any host on which that
> matters?

I agree that your patch is fine.

Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>

Paolo
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Marcelo Tosatti Jan. 8, 2016, 2:04 p.m. UTC | #4

On Thu, Jan 07, 2016 at 01:13:41PM -0800, Andy Lutomirski wrote:
> On Thu, Jan 7, 2016 at 1:02 PM, Marcelo Tosatti <mtosatti@redhat.com> wrote:
> > On Mon, Jan 04, 2016 at 03:14:28PM -0800, Andy Lutomirski wrote:
> >> If the clock becomes unstable while we're reading it, we need to
> >> bail.  We can do this by simply moving the check into the seqcount
> >> loop.
> >>
> >> Reported-by: Marcelo Tosatti <mtosatti@redhat.com>
> >> Signed-off-by: Andy Lutomirski <luto@kernel.org>
> >> ---
> >>
> >> Marcelo, how's this?
> >>
> >> arch/x86/entry/vdso/vclock_gettime.c | 12 ++++++------
> >>  1 file changed, 6 insertions(+), 6 deletions(-)
> >>
> >> diff --git a/arch/x86/entry/vdso/vclock_gettime.c b/arch/x86/entry/vdso/vclock_gettime.c
> >> index 8602f06c759f..1a50e09c945b 100644
> >> --- a/arch/x86/entry/vdso/vclock_gettime.c
> >> +++ b/arch/x86/entry/vdso/vclock_gettime.c
> >> @@ -126,23 +126,23 @@ static notrace cycle_t vread_pvclock(int *mode)
> >>        *
> >>        * On Xen, we don't appear to have that guarantee, but Xen still
> >>        * supplies a valid seqlock using the version field.
> >> -
> >> +      *
> >>        * We only do pvclock vdso timing at all if
> >>        * PVCLOCK_TSC_STABLE_BIT is set, and we interpret that bit to
> >>        * mean that all vCPUs have matching pvti and that the TSC is
> >>        * synced, so we can just look at vCPU 0's pvti.
> >>        */
> >>
> >> -     if (unlikely(!(pvti->flags & PVCLOCK_TSC_STABLE_BIT))) {
> >> -             *mode = VCLOCK_NONE;
> >> -             return 0;
> >> -     }
> >> -
> >>       do {
> >>               version = pvti->version;
> >>
> >>               smp_rmb();
> >>
> >> +             if (unlikely(!(pvti->flags & PVCLOCK_TSC_STABLE_BIT))) {
> >> +                     *mode = VCLOCK_NONE;
> >> +                     return 0;
> >> +             }
> >> +
> >>               tsc = rdtsc_ordered();
> >>               pvti_tsc_to_system_mul = pvti->tsc_to_system_mul;
> >>               pvti_tsc_shift = pvti->tsc_shift;
> >> --
> >> 2.4.3
> >
> > Check it before returning the value (once cleared, it can't be set back
> > to 1), similarly to what was in place before.
> >
> >
> 
> I don't understand what you mean.
> 
> In the old code (4.3 and 4.4), the vdso checks STABLE_BIT at the end,
> which is correct as long as STABLE_BIT can never change from 0 to 1.
> 
> In the -tip code, it's clearly wrong.
> 
> In the code in this patch, it should be correct regardless of how
> STABLE_BIT changes as long as the seqcount works.  Given that the
> performance cost of doing that is zero, I'd rather keep it that way.
> If we're really paranoid, we could move it after the rest of the pvti
> reads and add a barrier, but is there really any host on which that
> matters?
> 
> --Andy
> 
> -- 
> Andy Lutomirski
> AMA Capital Management, LLC

Right, its OK due to version check, thanks.

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Andy Lutomirski Jan. 12, 2016, 7:48 p.m. UTC | #5

Hi Ingo-

Can you apply this before the tip:x86/asm pull request goes out?  It
fixes a regression in tip:x86/asm.

--Andy

On Fri, Jan 8, 2016 at 6:04 AM, Marcelo Tosatti <mtosatti@redhat.com> wrote:
> On Thu, Jan 07, 2016 at 01:13:41PM -0800, Andy Lutomirski wrote:
>> On Thu, Jan 7, 2016 at 1:02 PM, Marcelo Tosatti <mtosatti@redhat.com> wrote:
>> > On Mon, Jan 04, 2016 at 03:14:28PM -0800, Andy Lutomirski wrote:
>> >> If the clock becomes unstable while we're reading it, we need to
>> >> bail.  We can do this by simply moving the check into the seqcount
>> >> loop.
>> >>
>> >> Reported-by: Marcelo Tosatti <mtosatti@redhat.com>
>> >> Signed-off-by: Andy Lutomirski <luto@kernel.org>
>> >> ---
>> >>
>> >> Marcelo, how's this?
>> >>
>> >> arch/x86/entry/vdso/vclock_gettime.c | 12 ++++++------
>> >>  1 file changed, 6 insertions(+), 6 deletions(-)
>> >>
>> >> diff --git a/arch/x86/entry/vdso/vclock_gettime.c b/arch/x86/entry/vdso/vclock_gettime.c
>> >> index 8602f06c759f..1a50e09c945b 100644
>> >> --- a/arch/x86/entry/vdso/vclock_gettime.c
>> >> +++ b/arch/x86/entry/vdso/vclock_gettime.c
>> >> @@ -126,23 +126,23 @@ static notrace cycle_t vread_pvclock(int *mode)
>> >>        *
>> >>        * On Xen, we don't appear to have that guarantee, but Xen still
>> >>        * supplies a valid seqlock using the version field.
>> >> -
>> >> +      *
>> >>        * We only do pvclock vdso timing at all if
>> >>        * PVCLOCK_TSC_STABLE_BIT is set, and we interpret that bit to
>> >>        * mean that all vCPUs have matching pvti and that the TSC is
>> >>        * synced, so we can just look at vCPU 0's pvti.
>> >>        */
>> >>
>> >> -     if (unlikely(!(pvti->flags & PVCLOCK_TSC_STABLE_BIT))) {
>> >> -             *mode = VCLOCK_NONE;
>> >> -             return 0;
>> >> -     }
>> >> -
>> >>       do {
>> >>               version = pvti->version;
>> >>
>> >>               smp_rmb();
>> >>
>> >> +             if (unlikely(!(pvti->flags & PVCLOCK_TSC_STABLE_BIT))) {
>> >> +                     *mode = VCLOCK_NONE;
>> >> +                     return 0;
>> >> +             }
>> >> +
>> >>               tsc = rdtsc_ordered();
>> >>               pvti_tsc_to_system_mul = pvti->tsc_to_system_mul;
>> >>               pvti_tsc_shift = pvti->tsc_shift;
>> >> --
>> >> 2.4.3
>> >
>> > Check it before returning the value (once cleared, it can't be set back
>> > to 1), similarly to what was in place before.
>> >
>> >
>>
>> I don't understand what you mean.
>>
>> In the old code (4.3 and 4.4), the vdso checks STABLE_BIT at the end,
>> which is correct as long as STABLE_BIT can never change from 0 to 1.
>>
>> In the -tip code, it's clearly wrong.
>>
>> In the code in this patch, it should be correct regardless of how
>> STABLE_BIT changes as long as the seqcount works.  Given that the
>> performance cost of doing that is zero, I'd rather keep it that way.
>> If we're really paranoid, we could move it after the rest of the pvti
>> reads and add a barrier, but is there really any host on which that
>> matters?
>>
>> --Andy
>>
>> --
>> Andy Lutomirski
>> AMA Capital Management, LLC
>
> Right, its OK due to version check, thanks.
>

Ingo Molnar Jan. 13, 2016, 10:46 a.m. UTC | #6

* Andy Lutomirski <luto@amacapital.net> wrote:

> Hi Ingo-
> 
> Can you apply this before the tip:x86/asm pull request goes out?  It
> fixes a regression in tip:x86/asm.

Ooops, saw this mail too late - I'll merge this up into x86/urgent right now and 
send all pending fixes to Linus.

Thanks,

	Ingo
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

x86/vdso/pvclock: Protect STABLE check with the seqcount

Commit Message

Comments

Patch