
[v7,5/6] KVM: x86: Refactor tsc synchronization code

Message ID 20210816001130.3059564-6-oupton@google.com (mailing list archive)
State New, archived
Series KVM: x86: Add idempotent controls for migrating system counter state

Commit Message

Oliver Upton Aug. 16, 2021, 12:11 a.m. UTC
Refactor kvm_synchronize_tsc to make a new function that allows callers
to specify TSC parameters (offset, value, nanoseconds, etc.) explicitly
for the sake of participating in TSC synchronization.

Signed-off-by: Oliver Upton <oupton@google.com>
---
 arch/x86/kvm/x86.c | 105 ++++++++++++++++++++++++++-------------------
 1 file changed, 61 insertions(+), 44 deletions(-)

Comments

Sean Christopherson Sept. 2, 2021, 7:21 p.m. UTC | #1
On Mon, Aug 16, 2021, Oliver Upton wrote:
> Refactor kvm_synchronize_tsc to make a new function that allows callers
> to specify TSC parameters (offset, value, nanoseconds, etc.) explicitly
> for the sake of participating in TSC synchronization.
> 
> Signed-off-by: Oliver Upton <oupton@google.com>
> ---
> +	struct kvm *kvm = vcpu->kvm;
> +	bool already_matched;
> +
> +	lockdep_assert_held(&kvm->arch.tsc_write_lock);
> +
> +	already_matched =
> +	       (vcpu->arch.this_tsc_generation == kvm->arch.cur_tsc_generation);
> +

...

> +	if (!matched) {
> +		/*
> +		 * We split periods of matched TSC writes into generations.
> +		 * For each generation, we track the original measured
> +		 * nanosecond time, offset, and write, so if TSCs are in
> +		 * sync, we can match exact offset, and if not, we can match
> +		 * exact software computation in compute_guest_tsc()
> +		 *
> +		 * These values are tracked in kvm->arch.cur_xxx variables.
> +		 */
> +		kvm->arch.cur_tsc_generation++;
> +		kvm->arch.cur_tsc_nsec = ns;
> +		kvm->arch.cur_tsc_write = tsc;
> +		kvm->arch.cur_tsc_offset = offset;
> +
> +		spin_lock(&kvm->arch.pvclock_gtod_sync_lock);
> +		kvm->arch.nr_vcpus_matched_tsc = 0;
> +	} else if (!already_matched) {
> +		spin_lock(&kvm->arch.pvclock_gtod_sync_lock);
> +		kvm->arch.nr_vcpus_matched_tsc++;
> +	}
> +
> +	kvm_track_tsc_matching(vcpu);
> +	spin_unlock(&kvm->arch.pvclock_gtod_sync_lock);

This unlock is imbalanced if matched and already_matched are both true.  It's not
immediately obvious that that _can't_ happen, and if it truly can't happen then
the conditional locking is pointless (because it's not actually conditional).

The previous code took the lock unconditionally; I don't see a strong argument
to change that, e.g. holding it for a few extra cycles while kvm->arch.cur_tsc_*
are updated is unlikely to be noticeable.
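
For reference, the code being removed (see the second hunk below) takes the
lock unconditionally around the counter update; trimmed to just the
locking-relevant lines, it is roughly:

	spin_lock_irqsave(&kvm->arch.pvclock_gtod_sync_lock, flags);
	if (!matched)
		kvm->arch.nr_vcpus_matched_tsc = 0;
	else if (!already_matched)
		kvm->arch.nr_vcpus_matched_tsc++;

	kvm_track_tsc_matching(vcpu);
	spin_unlock_irqrestore(&kvm->arch.pvclock_gtod_sync_lock, flags);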

If you really want to delay taking the lock, you could do

	if (!matched) {
		kvm->arch.cur_tsc_generation++;
		kvm->arch.cur_tsc_nsec = ns;
		kvm->arch.cur_tsc_write = data;
		kvm->arch.cur_tsc_offset = offset;
	}

	spin_lock(&kvm->arch.pvclock_gtod_sync_lock);
	if (!matched)
		kvm->arch.nr_vcpus_matched_tsc = 0;
	else if (!already_matched)
		kvm->arch.nr_vcpus_matched_tsc++;
	spin_unlock(&kvm->arch.pvclock_gtod_sync_lock);

or if you want to get greedy

	if (!matched || !already_matched) {
		spin_lock(&kvm->arch.pvclock_gtod_sync_lock);
		if (!matched)
			kvm->arch.nr_vcpus_matched_tsc = 0;
		else
			kvm->arch.nr_vcpus_matched_tsc++;
		spin_unlock(&kvm->arch.pvclock_gtod_sync_lock);
	}

Though I'm not sure the minor complexity is worth it just to avoid spinlock contention.
Oliver Upton Sept. 2, 2021, 7:41 p.m. UTC | #2
On Thu, Sep 2, 2021 at 12:21 PM Sean Christopherson <seanjc@google.com> wrote:
>
> On Mon, Aug 16, 2021, Oliver Upton wrote:
> > Refactor kvm_synchronize_tsc to make a new function that allows callers
> > to specify TSC parameters (offset, value, nanoseconds, etc.) explicitly
> > for the sake of participating in TSC synchronization.
> >
> > Signed-off-by: Oliver Upton <oupton@google.com>
> > ---
> > +     struct kvm *kvm = vcpu->kvm;
> > +     bool already_matched;
> > +
> > +     lockdep_assert_held(&kvm->arch.tsc_write_lock);
> > +
> > +     already_matched =
> > +            (vcpu->arch.this_tsc_generation == kvm->arch.cur_tsc_generation);
> > +
>
> ...
>
> > +     if (!matched) {
> > +             /*
> > +              * We split periods of matched TSC writes into generations.
> > +              * For each generation, we track the original measured
> > +              * nanosecond time, offset, and write, so if TSCs are in
> > +              * sync, we can match exact offset, and if not, we can match
> > +              * exact software computation in compute_guest_tsc()
> > +              *
> > +              * These values are tracked in kvm->arch.cur_xxx variables.
> > +              */
> > +             kvm->arch.cur_tsc_generation++;
> > +             kvm->arch.cur_tsc_nsec = ns;
> > +             kvm->arch.cur_tsc_write = tsc;
> > +             kvm->arch.cur_tsc_offset = offset;
> > +
> > +             spin_lock(&kvm->arch.pvclock_gtod_sync_lock);
> > +             kvm->arch.nr_vcpus_matched_tsc = 0;
> > +     } else if (!already_matched) {
> > +             spin_lock(&kvm->arch.pvclock_gtod_sync_lock);
> > +             kvm->arch.nr_vcpus_matched_tsc++;
> > +     }
> > +
> > +     kvm_track_tsc_matching(vcpu);
> > +     spin_unlock(&kvm->arch.pvclock_gtod_sync_lock);
>
> This unlock is imbalanced if matched and already_matched are both true.  It's not
> immediately obvious that that _can't_ happen, and if it truly can't happen then
> the conditional locking is pointless (because it's not actually conditional).
>
> The previous code took the lock unconditionally; I don't see a strong argument
> to change that, e.g. holding it for a few extra cycles while kvm->arch.cur_tsc_*
> are updated is unlikely to be noticeable.

We may have gone full circle here :-) A while back you had said it was
confusing to hold the lock while updating kvm->arch.cur_tsc_*. I still
agree with that sentiment, but the conditional locking is odd.

> If you really want to delay taking the lock, you could do
>
>         if (!matched) {
>                 kvm->arch.cur_tsc_generation++;
>                 kvm->arch.cur_tsc_nsec = ns;
>                 kvm->arch.cur_tsc_write = data;
>                 kvm->arch.cur_tsc_offset = offset;
>         }
>
>         spin_lock(&kvm->arch.pvclock_gtod_sync_lock);
>         if (!matched)
>                 kvm->arch.nr_vcpus_matched_tsc = 0;
>         else if (!already_matched)
>                 kvm->arch.nr_vcpus_matched_tsc++;
>         spin_unlock(&kvm->arch.pvclock_gtod_sync_lock);

This seems the most readable, making it clear what is guarded and what
is not. I'll probably go this route.
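
Concretely, I think the tail of __kvm_synchronize_tsc would then look
something like this (untested sketch, using the helper's 'tsc' parameter
where your snippet had 'data'):

	if (!matched) {
		/* Start of a new generation; full comment kept as in the patch. */
		kvm->arch.cur_tsc_generation++;
		kvm->arch.cur_tsc_nsec = ns;
		kvm->arch.cur_tsc_write = tsc;
		kvm->arch.cur_tsc_offset = offset;
	}

	spin_lock(&kvm->arch.pvclock_gtod_sync_lock);
	if (!matched)
		kvm->arch.nr_vcpus_matched_tsc = 0;
	else if (!already_matched)
		kvm->arch.nr_vcpus_matched_tsc++;

	kvm_track_tsc_matching(vcpu);
	spin_unlock(&kvm->arch.pvclock_gtod_sync_lock);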

--
Thanks,
Oliver
Paolo Bonzini Sept. 24, 2021, 9:28 a.m. UTC | #3
On 02/09/21 21:21, Sean Christopherson wrote:
> 
>> +	if (!matched) {
>> +		...
>> +		spin_lock(&kvm->arch.pvclock_gtod_sync_lock);
>> +		kvm->arch.nr_vcpus_matched_tsc = 0;
>> +	} else if (!already_matched) {
>> +		spin_lock(&kvm->arch.pvclock_gtod_sync_lock);
>> +		kvm->arch.nr_vcpus_matched_tsc++;
>> +	}
>> +
>> +	kvm_track_tsc_matching(vcpu);
>> +	spin_unlock(&kvm->arch.pvclock_gtod_sync_lock);
>
> This unlock is imbalanced if matched and already_matched are both true.  It's not
> immediately obvious that that _can't_ happen, and if it truly can't happen then
> the conditional locking is pointless (because it's not actually conditional).

This is IMO another reason to unify tsc_write_lock and 
pvclock_gtod_sync_lock.  The chances of contention are pretty slim.  As 
soon as I sort out the next -rc3 pull request I'll send out my version 
of Oliver's patches.
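
Just to illustrate the direction (writer side only, and assuming
nr_vcpus_matched_tsc ends up guarded by tsc_write_lock, which
__kvm_synchronize_tsc already asserts is held; not necessarily what my
version will look like), the tail could then reduce to:

	if (!matched)
		kvm->arch.nr_vcpus_matched_tsc = 0;	/* now under tsc_write_lock */
	else if (!already_matched)
		kvm->arch.nr_vcpus_matched_tsc++;

	kvm_track_tsc_matching(vcpu);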

Thanks,

Paolo

Patch

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index f1434cd388b9..9d0445527dad 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2447,13 +2447,71 @@  static inline bool kvm_check_tsc_unstable(void)
 	return check_tsc_unstable();
 }
 
+/*
+ * Infers attempts to synchronize the guest's tsc from host writes. Sets the
+ * offset for the vcpu and tracks the TSC matching generation that the vcpu
+ * participates in.
+ */
+static void __kvm_synchronize_tsc(struct kvm_vcpu *vcpu, u64 offset, u64 tsc,
+				  u64 ns, bool matched)
+{
+	struct kvm *kvm = vcpu->kvm;
+	bool already_matched;
+
+	lockdep_assert_held(&kvm->arch.tsc_write_lock);
+
+	already_matched =
+	       (vcpu->arch.this_tsc_generation == kvm->arch.cur_tsc_generation);
+
+	/*
+	 * We track the most recent recorded KHZ, write and time to
+	 * allow the matching interval to be extended at each write.
+	 */
+	kvm->arch.last_tsc_nsec = ns;
+	kvm->arch.last_tsc_write = tsc;
+	kvm->arch.last_tsc_khz = vcpu->arch.virtual_tsc_khz;
+
+	vcpu->arch.last_guest_tsc = tsc;
+
+	/* Keep track of which generation this VCPU has synchronized to */
+	vcpu->arch.this_tsc_generation = kvm->arch.cur_tsc_generation;
+	vcpu->arch.this_tsc_nsec = kvm->arch.cur_tsc_nsec;
+	vcpu->arch.this_tsc_write = kvm->arch.cur_tsc_write;
+
+	kvm_vcpu_write_tsc_offset(vcpu, offset);
+
+	if (!matched) {
+		/*
+		 * We split periods of matched TSC writes into generations.
+		 * For each generation, we track the original measured
+		 * nanosecond time, offset, and write, so if TSCs are in
+		 * sync, we can match exact offset, and if not, we can match
+		 * exact software computation in compute_guest_tsc()
+		 *
+		 * These values are tracked in kvm->arch.cur_xxx variables.
+		 */
+		kvm->arch.cur_tsc_generation++;
+		kvm->arch.cur_tsc_nsec = ns;
+		kvm->arch.cur_tsc_write = tsc;
+		kvm->arch.cur_tsc_offset = offset;
+
+		spin_lock(&kvm->arch.pvclock_gtod_sync_lock);
+		kvm->arch.nr_vcpus_matched_tsc = 0;
+	} else if (!already_matched) {
+		spin_lock(&kvm->arch.pvclock_gtod_sync_lock);
+		kvm->arch.nr_vcpus_matched_tsc++;
+	}
+
+	kvm_track_tsc_matching(vcpu);
+	spin_unlock(&kvm->arch.pvclock_gtod_sync_lock);
+}
+
 static void kvm_synchronize_tsc(struct kvm_vcpu *vcpu, u64 data)
 {
 	struct kvm *kvm = vcpu->kvm;
 	u64 offset, ns, elapsed;
 	unsigned long flags;
-	bool matched;
-	bool already_matched;
+	bool matched = false;
 	bool synchronizing = false;
 
 	raw_spin_lock_irqsave(&kvm->arch.tsc_write_lock, flags);
@@ -2499,50 +2557,9 @@  static void kvm_synchronize_tsc(struct kvm_vcpu *vcpu, u64 data)
 			offset = kvm_compute_l1_tsc_offset(vcpu, data);
 		}
 		matched = true;
-		already_matched = (vcpu->arch.this_tsc_generation == kvm->arch.cur_tsc_generation);
-	} else {
-		/*
-		 * We split periods of matched TSC writes into generations.
-		 * For each generation, we track the original measured
-		 * nanosecond time, offset, and write, so if TSCs are in
-		 * sync, we can match exact offset, and if not, we can match
-		 * exact software computation in compute_guest_tsc()
-		 *
-		 * These values are tracked in kvm->arch.cur_xxx variables.
-		 */
-		kvm->arch.cur_tsc_generation++;
-		kvm->arch.cur_tsc_nsec = ns;
-		kvm->arch.cur_tsc_write = data;
-		kvm->arch.cur_tsc_offset = offset;
-		matched = false;
 	}
 
-	/*
-	 * We also track th most recent recorded KHZ, write and time to
-	 * allow the matching interval to be extended at each write.
-	 */
-	kvm->arch.last_tsc_nsec = ns;
-	kvm->arch.last_tsc_write = data;
-	kvm->arch.last_tsc_khz = vcpu->arch.virtual_tsc_khz;
-
-	vcpu->arch.last_guest_tsc = data;
-
-	/* Keep track of which generation this VCPU has synchronized to */
-	vcpu->arch.this_tsc_generation = kvm->arch.cur_tsc_generation;
-	vcpu->arch.this_tsc_nsec = kvm->arch.cur_tsc_nsec;
-	vcpu->arch.this_tsc_write = kvm->arch.cur_tsc_write;
-
-	kvm_vcpu_write_tsc_offset(vcpu, offset);
-
-	spin_lock_irqsave(&kvm->arch.pvclock_gtod_sync_lock, flags);
-	if (!matched) {
-		kvm->arch.nr_vcpus_matched_tsc = 0;
-	} else if (!already_matched) {
-		kvm->arch.nr_vcpus_matched_tsc++;
-	}
-
-	kvm_track_tsc_matching(vcpu);
-	spin_unlock_irqrestore(&kvm->arch.pvclock_gtod_sync_lock, flags);
+	__kvm_synchronize_tsc(vcpu, offset, data, ns, matched);
 	raw_spin_unlock_irqrestore(&kvm->arch.tsc_write_lock, flags);
 }