[v5,02/13] KVM: x86: Refactor tsc synchronization code

Message ID 20210729173300.181775-3-oupton@google.com (mailing list archive)
State New, archived
Series KVM: Add idempotent controls for migrating system counter state

Commit Message

Oliver Upton July 29, 2021, 5:32 p.m. UTC
Refactor kvm_synchronize_tsc to make a new function that allows callers
to specify TSC parameters (offset, value, nanoseconds, etc.) explicitly
for the sake of participating in TSC synchronization.

This changes the locking semantics around TSC writes. Writes to the TSC
will now take the pvclock gtod lock while holding the tsc write lock,
whereas before these locks were disjoint.

Reviewed-by: David Matlack <dmatlack@google.com>
Signed-off-by: Oliver Upton <oupton@google.com>
---
 Documentation/virt/kvm/locking.rst |  11 +++
 arch/x86/kvm/x86.c                 | 106 +++++++++++++++++------------
 2 files changed, 74 insertions(+), 43 deletions(-)

Comments

Sean Christopherson July 30, 2021, 6:08 p.m. UTC | #1
On Thu, Jul 29, 2021, Oliver Upton wrote:
> Refactor kvm_synchronize_tsc to make a new function that allows callers
> to specify TSC parameters (offset, value, nanoseconds, etc.) explicitly
> for the sake of participating in TSC synchronization.
> 
> This changes the locking semantics around TSC writes.

"refactor" and "changes the locking semantics" are somewhat contradictory.  The
correct way to do this is to first change the locking semantics, then extract the
helper you want.  That makes review and archaeology easier, and isolates the
locking change in case it isn't so safe after all.

> Writes to the TSC will now take the pvclock gtod lock while holding the tsc
> write lock, whereas before these locks were disjoint.
> 
> Reviewed-by: David Matlack <dmatlack@google.com>
> Signed-off-by: Oliver Upton <oupton@google.com>
> ---
> +/*
> + * Infers attempts to synchronize the guest's tsc from host writes. Sets the
> + * offset for the vcpu and tracks the TSC matching generation that the vcpu
> + * participates in.
> + *
> + * Must hold kvm->arch.tsc_write_lock to call this function.

Drop this blurb, lockdep assertions exist for a reason :-)

> + */
> +static void __kvm_synchronize_tsc(struct kvm_vcpu *vcpu, u64 offset, u64 tsc,
> +				  u64 ns, bool matched)
> +{
> +	struct kvm *kvm = vcpu->kvm;
> +	bool already_matched;

Eww, not your code, but "matched" and "already_matched" are not helpful names,
e.g. they don't provide a clue as to _what_ matched, and thus don't explain why
there are two separate variables.  And I would expect an "already" variant to
come in from the caller, not the other way 'round.

  matched         => freq_matched
  already_matched => gen_matched
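
I.e., something along these lines (illustrative only; "freq_matched" and
"gen_matched" are just the names suggested above, not what the posted patch
uses):

  static void __kvm_synchronize_tsc(struct kvm_vcpu *vcpu, u64 offset, u64 tsc,
                                    u64 ns, bool freq_matched)
  {
          struct kvm *kvm = vcpu->kvm;
          bool gen_matched;

          lockdep_assert_held(&kvm->arch.tsc_write_lock);

          /* Did this vCPU already sync to the current matching generation? */
          gen_matched = (vcpu->arch.this_tsc_generation ==
                         kvm->arch.cur_tsc_generation);
          ...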

> +	unsigned long flags;
> +
> +	lockdep_assert_held(&kvm->arch.tsc_write_lock);
> +
> +	already_matched =
> +	       (vcpu->arch.this_tsc_generation == kvm->arch.cur_tsc_generation);
> +
> +	/*
> +	 * We track the most recent recorded KHZ, write and time to
> +	 * allow the matching interval to be extended at each write.
> +	 */
> +	kvm->arch.last_tsc_nsec = ns;
> +	kvm->arch.last_tsc_write = tsc;
> +	kvm->arch.last_tsc_khz = vcpu->arch.virtual_tsc_khz;
> +
> +	vcpu->arch.last_guest_tsc = tsc;
> +
> +	/* Keep track of which generation this VCPU has synchronized to */
> +	vcpu->arch.this_tsc_generation = kvm->arch.cur_tsc_generation;
> +	vcpu->arch.this_tsc_nsec = kvm->arch.cur_tsc_nsec;
> +	vcpu->arch.this_tsc_write = kvm->arch.cur_tsc_write;
> +
> +	kvm_vcpu_write_tsc_offset(vcpu, offset);
> +
> +	spin_lock_irqsave(&kvm->arch.pvclock_gtod_sync_lock, flags);

I believe this can be spin_lock(), since AFAICT the caller _must_ disable IRQs
when taking tsc_write_lock, i.e. we know IRQs are disabled at this point.
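
Something like this (untested sketch, valid only on the assumption that every
caller really does hold tsc_write_lock with IRQs disabled):

          /* IRQs are already off: the caller took tsc_write_lock via irqsave. */
          spin_lock(&kvm->arch.pvclock_gtod_sync_lock);
          ...
          kvm_track_tsc_matching(vcpu);
          spin_unlock(&kvm->arch.pvclock_gtod_sync_lock);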

> +	if (!matched) {
> +		/*
> +		 * We split periods of matched TSC writes into generations.
> +		 * For each generation, we track the original measured
> +		 * nanosecond time, offset, and write, so if TSCs are in
> +		 * sync, we can match exact offset, and if not, we can match
> +		 * exact software computation in compute_guest_tsc()
> +		 *
> +		 * These values are tracked in kvm->arch.cur_xxx variables.
> +		 */
> +		kvm->arch.nr_vcpus_matched_tsc = 0;
> +		kvm->arch.cur_tsc_generation++;
> +		kvm->arch.cur_tsc_nsec = ns;
> +		kvm->arch.cur_tsc_write = tsc;
> +		kvm->arch.cur_tsc_offset = offset;

IMO, adjusting kvm->arch.cur_tsc_* belongs outside of pvclock_gtod_sync_lock.
Based on the existing code, it is protected by tsc_write_lock.  I don't care
about the extra work while holding pvclock_gtod_sync_lock, but it's very confusing
to see code that reads variables outside of a lock, then takes a lock and writes
those same variables without first rechecking.
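
E.g. (untested, purely to illustrate the suggested split: cur_tsc_* stays under
tsc_write_lock, only the matched-vCPU accounting goes under the gtod lock):

          /* Generation tracking is protected by tsc_write_lock. */
          if (!matched) {
                  kvm->arch.cur_tsc_generation++;
                  kvm->arch.cur_tsc_nsec = ns;
                  kvm->arch.cur_tsc_write = tsc;
                  kvm->arch.cur_tsc_offset = offset;
          }

          spin_lock(&kvm->arch.pvclock_gtod_sync_lock);
          if (!matched)
                  kvm->arch.nr_vcpus_matched_tsc = 0;
          else if (!already_matched)
                  kvm->arch.nr_vcpus_matched_tsc++;

          kvm_track_tsc_matching(vcpu);
          spin_unlock(&kvm->arch.pvclock_gtod_sync_lock);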

> +		matched = false;

What's the point of clearing "matched"?  It's already false...

> +	} else if (!already_matched) {
> +		kvm->arch.nr_vcpus_matched_tsc++;
> +	}
> +
> +	kvm_track_tsc_matching(vcpu);
> +	spin_unlock_irqrestore(&kvm->arch.pvclock_gtod_sync_lock, flags);
> +}
> +
Oliver Upton Aug. 3, 2021, 9:18 p.m. UTC | #2
On Fri, Jul 30, 2021 at 11:08 AM Sean Christopherson <seanjc@google.com> wrote:
>
> On Thu, Jul 29, 2021, Oliver Upton wrote:
> > Refactor kvm_synchronize_tsc to make a new function that allows callers
> > to specify TSC parameters (offset, value, nanoseconds, etc.) explicitly
> > for the sake of participating in TSC synchronization.
> >
> > This changes the locking semantics around TSC writes.
>
> "refactor" and "changes the locking semantics" are somewhat contradictory.  The
> correct way to do this is to first change the locking semantics, then extract the
> helper you want.  That makes review and archaeology easier, and isolates the
> locking change in case it isn't so safe after all.

Indeed, that was mere laziness on my part :)

> > Writes to the TSC will now take the pvclock gtod lock while holding the tsc
> > write lock, whereas before these locks were disjoint.
> >
> > Reviewed-by: David Matlack <dmatlack@google.com>
> > Signed-off-by: Oliver Upton <oupton@google.com>
> > ---
> > +/*
> > + * Infers attempts to synchronize the guest's tsc from host writes. Sets the
> > + * offset for the vcpu and tracks the TSC matching generation that the vcpu
> > + * participates in.
> > + *
> > + * Must hold kvm->arch.tsc_write_lock to call this function.
>
> Drop this blurb, lockdep assertions exist for a reason :-)
>

Ack.

> > + */
> > +static void __kvm_synchronize_tsc(struct kvm_vcpu *vcpu, u64 offset, u64 tsc,
> > +                               u64 ns, bool matched)
> > +{
> > +     struct kvm *kvm = vcpu->kvm;
> > +     bool already_matched;
>
> Eww, not your code, but "matched" and "already_matched" are not helpful names,
> e.g. they don't provide a clue as to _what_ matched, and thus don't explain why
> there are two separate variables.  And I would expect an "already" variant to
> come in from the caller, not the other way 'round.
>
>   matched         => freq_matched
>   already_matched => gen_matched

Yeah, everything this series touches is a bit messy. I greedily
avoided the pile of cleanups that are needed, but alas...

> > +     spin_lock_irqsave(&kvm->arch.pvclock_gtod_sync_lock, flags);
>
> I believe this can be spin_lock(), since AFAICT the caller _must_ disable IRQs
> when taking tsc_write_lock, i.e. we know IRQs are disabled at this point.

Definitely.


>
> > +     if (!matched) {
> > +             /*
> > +              * We split periods of matched TSC writes into generations.
> > +              * For each generation, we track the original measured
> > +              * nanosecond time, offset, and write, so if TSCs are in
> > +              * sync, we can match exact offset, and if not, we can match
> > +              * exact software computation in compute_guest_tsc()
> > +              *
> > +              * These values are tracked in kvm->arch.cur_xxx variables.
> > +              */
> > +             kvm->arch.nr_vcpus_matched_tsc = 0;
> > +             kvm->arch.cur_tsc_generation++;
> > +             kvm->arch.cur_tsc_nsec = ns;
> > +             kvm->arch.cur_tsc_write = tsc;
> > +             kvm->arch.cur_tsc_offset = offset;
>
> IMO, adjusting kvm->arch.cur_tsc_* belongs outside of pvclock_gtod_sync_lock.
> Based on the existing code, it is protected by tsc_write_lock.  I don't care
> about the extra work while holding pvclock_gtod_sync_lock, but it's very confusing
> to see code that reads variables outside of a lock, then takes a lock and writes
> those same variables without first rechecking.
>
> > +             matched = false;
>
> What's the point of clearing "matched"?  It's already false...

None, besides just yanking the old chunk of code :)

>
> > +     } else if (!already_matched) {
> > +             kvm->arch.nr_vcpus_matched_tsc++;
> > +     }
> > +
> > +     kvm_track_tsc_matching(vcpu);
> > +     spin_unlock_irqrestore(&kvm->arch.pvclock_gtod_sync_lock, flags);
> > +}
> > +

--
Thanks,
Oliver

Patch

diff --git a/Documentation/virt/kvm/locking.rst b/Documentation/virt/kvm/locking.rst
index 8138201efb09..0bf346adac2a 100644
--- a/Documentation/virt/kvm/locking.rst
+++ b/Documentation/virt/kvm/locking.rst
@@ -36,6 +36,9 @@  On x86:
   holding kvm->arch.mmu_lock (typically with ``read_lock``, otherwise
   there's no need to take kvm->arch.tdp_mmu_pages_lock at all).
 
+- kvm->arch.tsc_write_lock is taken outside
+  kvm->arch.pvclock_gtod_sync_lock
+
 Everything else is a leaf: no other lock is taken inside the critical
 sections.
 
@@ -222,6 +225,14 @@  time it will be set using the Dirty tracking mechanism described above.
 :Comment:	'raw' because hardware enabling/disabling must be atomic /wrt
 		migration.
 
+:Name:		kvm_arch::pvclock_gtod_sync_lock
+:Type:		raw_spinlock_t
+:Arch:		x86
+:Protects:	kvm_arch::{cur_tsc_generation,cur_tsc_nsec,cur_tsc_write,
+			cur_tsc_offset,nr_vcpus_matched_tsc}
+:Comment:	'raw' because updating the kvm master clock must not be
+		preempted.
+
 :Name:		kvm_arch::tsc_write_lock
 :Type:		raw_spinlock
 :Arch:		x86
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index e052c7afaac4..27435a07fb46 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2443,13 +2443,73 @@  static inline bool kvm_check_tsc_unstable(void)
 	return check_tsc_unstable();
 }
 
+/*
+ * Infers attempts to synchronize the guest's tsc from host writes. Sets the
+ * offset for the vcpu and tracks the TSC matching generation that the vcpu
+ * participates in.
+ *
+ * Must hold kvm->arch.tsc_write_lock to call this function.
+ */
+static void __kvm_synchronize_tsc(struct kvm_vcpu *vcpu, u64 offset, u64 tsc,
+				  u64 ns, bool matched)
+{
+	struct kvm *kvm = vcpu->kvm;
+	bool already_matched;
+	unsigned long flags;
+
+	lockdep_assert_held(&kvm->arch.tsc_write_lock);
+
+	already_matched =
+	       (vcpu->arch.this_tsc_generation == kvm->arch.cur_tsc_generation);
+
+	/*
+	 * We track the most recent recorded KHZ, write and time to
+	 * allow the matching interval to be extended at each write.
+	 */
+	kvm->arch.last_tsc_nsec = ns;
+	kvm->arch.last_tsc_write = tsc;
+	kvm->arch.last_tsc_khz = vcpu->arch.virtual_tsc_khz;
+
+	vcpu->arch.last_guest_tsc = tsc;
+
+	/* Keep track of which generation this VCPU has synchronized to */
+	vcpu->arch.this_tsc_generation = kvm->arch.cur_tsc_generation;
+	vcpu->arch.this_tsc_nsec = kvm->arch.cur_tsc_nsec;
+	vcpu->arch.this_tsc_write = kvm->arch.cur_tsc_write;
+
+	kvm_vcpu_write_tsc_offset(vcpu, offset);
+
+	spin_lock_irqsave(&kvm->arch.pvclock_gtod_sync_lock, flags);
+	if (!matched) {
+		/*
+		 * We split periods of matched TSC writes into generations.
+		 * For each generation, we track the original measured
+		 * nanosecond time, offset, and write, so if TSCs are in
+		 * sync, we can match exact offset, and if not, we can match
+		 * exact software computation in compute_guest_tsc()
+		 *
+		 * These values are tracked in kvm->arch.cur_xxx variables.
+		 */
+		kvm->arch.nr_vcpus_matched_tsc = 0;
+		kvm->arch.cur_tsc_generation++;
+		kvm->arch.cur_tsc_nsec = ns;
+		kvm->arch.cur_tsc_write = tsc;
+		kvm->arch.cur_tsc_offset = offset;
+		matched = false;
+	} else if (!already_matched) {
+		kvm->arch.nr_vcpus_matched_tsc++;
+	}
+
+	kvm_track_tsc_matching(vcpu);
+	spin_unlock_irqrestore(&kvm->arch.pvclock_gtod_sync_lock, flags);
+}
+
 static void kvm_synchronize_tsc(struct kvm_vcpu *vcpu, u64 data)
 {
 	struct kvm *kvm = vcpu->kvm;
 	u64 offset, ns, elapsed;
 	unsigned long flags;
-	bool matched;
-	bool already_matched;
+	bool matched = false;
 	bool synchronizing = false;
 
 	raw_spin_lock_irqsave(&kvm->arch.tsc_write_lock, flags);
@@ -2495,51 +2555,11 @@  static void kvm_synchronize_tsc(struct kvm_vcpu *vcpu, u64 data)
 			offset = kvm_compute_l1_tsc_offset(vcpu, data);
 		}
 		matched = true;
-		already_matched = (vcpu->arch.this_tsc_generation == kvm->arch.cur_tsc_generation);
-	} else {
-		/*
-		 * We split periods of matched TSC writes into generations.
-		 * For each generation, we track the original measured
-		 * nanosecond time, offset, and write, so if TSCs are in
-		 * sync, we can match exact offset, and if not, we can match
-		 * exact software computation in compute_guest_tsc()
-		 *
-		 * These values are tracked in kvm->arch.cur_xxx variables.
-		 */
-		kvm->arch.cur_tsc_generation++;
-		kvm->arch.cur_tsc_nsec = ns;
-		kvm->arch.cur_tsc_write = data;
-		kvm->arch.cur_tsc_offset = offset;
-		matched = false;
 	}
 
-	/*
-	 * We also track th most recent recorded KHZ, write and time to
-	 * allow the matching interval to be extended at each write.
-	 */
-	kvm->arch.last_tsc_nsec = ns;
-	kvm->arch.last_tsc_write = data;
-	kvm->arch.last_tsc_khz = vcpu->arch.virtual_tsc_khz;
-
-	vcpu->arch.last_guest_tsc = data;
+	__kvm_synchronize_tsc(vcpu, offset, data, ns, matched);
 
-	/* Keep track of which generation this VCPU has synchronized to */
-	vcpu->arch.this_tsc_generation = kvm->arch.cur_tsc_generation;
-	vcpu->arch.this_tsc_nsec = kvm->arch.cur_tsc_nsec;
-	vcpu->arch.this_tsc_write = kvm->arch.cur_tsc_write;
-
-	kvm_vcpu_write_tsc_offset(vcpu, offset);
 	raw_spin_unlock_irqrestore(&kvm->arch.tsc_write_lock, flags);
-
-	spin_lock_irqsave(&kvm->arch.pvclock_gtod_sync_lock, flags);
-	if (!matched) {
-		kvm->arch.nr_vcpus_matched_tsc = 0;
-	} else if (!already_matched) {
-		kvm->arch.nr_vcpus_matched_tsc++;
-	}
-
-	kvm_track_tsc_matching(vcpu);
-	spin_unlock_irqrestore(&kvm->arch.pvclock_gtod_sync_lock, flags);
 }
 
 static inline void adjust_tsc_offset_guest(struct kvm_vcpu *vcpu,