Message ID | 20221020152404.283980-17-mlevitsk@redhat.com (mailing list archive) |
---|---|
State | New, archived |
Series | kvm-unit-tests: set of fixes and new tests |
On Thu, Oct 20, 2022, Maxim Levitsky wrote:
> +u64 num_iterations = -1;

"Run indefinitely" is an odd default.  Why not set the default number of
iterations to something reasonable and then let the user override that if the
user wants to run for an absurdly long time?

> +
> +volatile u64 *isr_counts;
> +bool use_svm;
> +int hlt_allowed = -1;

These can all be static.

> +
> +static int get_random(int min, int max)
> +{
> +        /* TODO : use rdrand to seed an PRNG instead */
> +        u64 random_value = rdtsc() >> 4;
> +
> +        return min + random_value % (max - min + 1);
> +}
> +
> +static void ipi_interrupt_handler(isr_regs_t *r)
> +{
> +        isr_counts[smp_id()]++;
> +        eoi();
> +}
> +
> +static void wait_for_ipi(volatile u64 *count)
> +{
> +        u64 old_count = *count;
> +        bool use_halt;
> +
> +        switch (hlt_allowed) {
> +        case -1:
> +                use_halt = get_random(0,10000) == 0;

Randomly doing "halt" is going to be annoying to debug.  What about tying this
decision to the iteration and then providing a knob to let the user specify the
frequency?  It seems unlikely that this test will expose a bug that occurs if
and only if the halt path is truly random.

> +                break;
> +        case 0:
> +                use_halt = false;
> +                break;
> +        case 1:
> +                use_halt = true;
> +                break;
> +        default:
> +                use_halt = false;
> +                break;
> +        }
> +
> +        do {
> +                if (use_halt)
> +                        asm volatile ("sti;hlt;cli\n");

		safe_halt();

> +                else
> +                        asm volatile ("sti;nop;cli");

		sti_nop_cli();

> +
> +        } while (old_count == *count);

There's no need to loop in the use_halt case.  If KVM spuriously wakes the vCPU
from halt, then that's a KVM bug.  Kinda ugly, but it does provide meaningful
coverage for the HLT case.

	if (use_halt) {
		safe_halt();
		cli();
	} else {
		do {
			sti_nop_cli();
		} while (old_count == *count);
	}

	assert(*count == old_count + 1);

> +}
> +
> +/******************************************************************************************************/
> +
> +#ifdef __x86_64__
> +
> +static void l2_guest_wait_for_ipi(volatile u64 *count)
> +{
> +        wait_for_ipi(count);
> +        asm volatile("vmmcall");
> +}
> +
> +static void l2_guest_dummy(void)
> +{
> +        asm volatile("vmmcall");
> +}
> +
> +static void wait_for_ipi_in_l2(volatile u64 *count, struct svm_vcpu *vcpu)
> +{
> +        u64 old_count = *count;
> +        bool irq_on_vmentry = get_random(0,1) == 0;

Same concerns about using random numbers.

> +
> +        vcpu->vmcb->save.rip = (ulong)l2_guest_wait_for_ipi;
> +        vcpu->regs.rdi = (u64)count;
> +
> +        vcpu->vmcb->save.rip = irq_on_vmentry ? (ulong)l2_guest_dummy : (ulong)l2_guest_wait_for_ipi;
> +
> +        do {
> +                if (irq_on_vmentry)
> +                        vcpu->vmcb->save.rflags |= X86_EFLAGS_IF;
> +                else
> +                        vcpu->vmcb->save.rflags &= ~X86_EFLAGS_IF;
> +
> +                asm volatile("clgi;nop;sti");

Why a NOP between CLGI and STI?  And why re-enable GIF on each iteration?

> +                // GIF is set by VMRUN
> +                SVM_VMRUN(vcpu->vmcb, &vcpu->regs);
> +                // GIF is cleared by VMEXIT
> +                asm volatile("cli;nop;stgi");

Why re-enable GIF on every exit?

> +
> +                assert(vcpu->vmcb->control.exit_code == SVM_EXIT_VMMCALL);
> +
> +        } while (old_count == *count);

Isn't the loop only necessary in the irq_on_vmentry case?

	static void run_svm_l2(...)
	{
		SVM_VMRUN(vcpu->vmcb, &vcpu->regs);
		assert(vcpu->vmcb->control.exit_code == SVM_EXIT_VMMCALL);
	}

E.g. can't this be:

	bool irq_on_vmentry = ???;
	u64 old_count = *count;

	clgi();
	sti();

	vcpu->regs.rdi = (u64)count;

	if (!irq_on_vmentry) {
		vcpu->vmcb->save.rip = (ulong)l2_guest_wait_for_ipi;
		vcpu->vmcb->save.rflags &= ~X86_EFLAGS_IF;
		run_svm_l2(...);
	} else {
		vcpu->vmcb->save.rip = (ulong)l2_guest_dummy
		vcpu->vmcb->save.rflags |= X86_EFLAGS_IF;
		do {
			run_svm_l2(...);
		} while (old_count == *count);
	}

	assert(*count == old_count + 1);
	cli();
	stgi();

> +}
> +#endif
> +
> +/******************************************************************************************************/
> +
> +#define FIRST_TEST_VCPU 1
> +
> +static void vcpu_init(void *data)
> +{
> +        /* To make it easier to see iteration number in the trace */
> +        handle_irq(0x40, ipi_interrupt_handler);
> +        handle_irq(0x50, ipi_interrupt_handler);

Why not make it even more granular?  E.g. do vector == 32 + (iteration % ???)

Regardless, a #define for the (base) vector would be helpful, the usage in
vcpu_code() is a bit magical.

> +}
> +
> +static void vcpu_code(void *data)
> +{
> +        int ncpus = cpu_count();
> +        int cpu = (long)data;
> +#ifdef __x86_64__
> +        struct svm_vcpu vcpu;
> +#endif
> +
> +        u64 i;
> +
> +#ifdef __x86_64__
> +        if (cpu == 2 && use_svm)

Why only CPU2?

> +                svm_vcpu_init(&vcpu);
> +#endif
> +
> +        assert(cpu != 0);
> +
> +        if (cpu != FIRST_TEST_VCPU)
> +                wait_for_ipi(&isr_counts[cpu]);
> +
> +        for (i = 0; i < num_iterations; i++)
> +        {
> +                u8 physical_dst = cpu == ncpus -1 ? 1 : cpu + 1;

Space after the '-'.

> +
> +                // send IPI to a next vCPU in a circular fashion
> +                apic_icr_write(APIC_INT_ASSERT |
> +                                APIC_DEST_PHYSICAL |
> +                                APIC_DM_FIXED |
> +                                (i % 2 ? 0x40 : 0x50),
> +                                physical_dst);
> +
> +                if (i == (num_iterations - 1) && cpu != FIRST_TEST_VCPU)
> +                        break;
> +
> +#ifdef __x86_64__
> +                // wait for the IPI interrupt chain to come back to us
> +                if (cpu == 2 && use_svm) {
> +                                wait_for_ipi_in_l2(&isr_counts[cpu], &vcpu);

Indentation is funky.

> +                                continue;
> +                }
> +#endif
> +                wait_for_ipi(&isr_counts[cpu]);
> +        }
> +}
> +
> +int main(int argc, void** argv)
> +{
> +        int cpu, ncpus = cpu_count);

> +        int cpu, ncpus = cpu_count();
> +
> +        assert(ncpus > 2);
> +
> +        if (argc > 1)
> +                hlt_allowed = atol(argv[1]);
> +
> +        if (argc > 2)
> +                num_iterations = atol(argv[2]);
> +
> +        setup_vm();
> +
> +#ifdef __x86_64__
> +        if (svm_supported()) {
> +                use_svm = true;
> +                setup_svm();
> +        }
> +#endif
> +
> +        isr_counts = (volatile u64 *)calloc(ncpus, sizeof(u64));
> +
> +        printf("found %d cpus\n", ncpus);
> +        printf("running for %lld iterations - test\n",
> +                (long long unsigned int)num_iterations);
> +
> +
> +        for (cpu = 0; cpu < ncpus; ++cpu)
> +                on_cpu_async(cpu, vcpu_init, (void *)(long)cpu);
> +
> +        /* now let all the vCPUs end the IPI function*/
> +        while (cpus_active() > 1)
> +                pause();
> +
> +        printf("starting test on all cpus but 0...\n");
> +
> +        for (cpu = ncpus-1; cpu >= FIRST_TEST_VCPU; cpu--)

Spaces around the '-'.

> +                on_cpu_async(cpu, vcpu_code, (void *)(long)cpu);

Why not use smp_id() in vcpu_code()?  ipi_interrupt_handler() already relies on
that being correct.

> +
> +        printf("test started, waiting to end...\n");
> +
> +        while (cpus_active() > 1) {
> +
> +                unsigned long isr_count1, isr_count2;
> +
> +                isr_count1 = isr_counts[1];
> +                delay(5ULL*1000*1000*1000);

Please add a macro or two for nanoseconds/milliseconds/seconds or whatever this
expands to.

> +                isr_count2 = isr_counts[1];
> +
> +                if (isr_count1 == isr_count2) {
> +                        printf("\n");
> +                        printf("hang detected!!\n");
> +                        break;
> +                } else {
> +                        printf("made %ld IPIs \n", (isr_count2 - isr_count1)*(ncpus-1));
> +                }
> +        }
> +
> +        printf("\n");
> +
> +        for (cpu = 1; cpu < ncpus; ++cpu)
> +                report(isr_counts[cpu] == num_iterations,
> +                           "Number of IPIs match (%lld)",

Indentation.

> +                           (long long unsigned int)isr_counts[cpu]);

Print num_iterations, i.e. expected vs. actual?

> +
> +        free((void*)isr_counts);
> +        return report_summary();
> +}
> diff --git a/x86/unittests.cfg b/x86/unittests.cfg
> index ebb3fdfc..7655d2ba 100644
> --- a/x86/unittests.cfg
> +++ b/x86/unittests.cfg
> @@ -61,6 +61,11 @@ smp = 2
>  file = smptest.flat
>  smp = 3
>
> +[ipi_stress]
> +file = ipi_stress.flat
> +extra_params = -cpu host,-x2apic,-svm,-hypervisor -global kvm-pit.lost_tick_policy=discard -machine kernel-irqchip=on -append '0 50000'

Why add all the SVM and HLT stuff and then effectively turn them off by default?
There's basically zero chance any other configuration will get regular testing.

And why not have multi configs, e.g. to run with and without x2APIC?

> +smp = 4
> +
>  [vmexit_cpuid]
>  file = vmexit.flat
>  extra_params = -append 'cpuid'
> --
> 2.26.3
>
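For illustration, the iteration-tied knob suggested above could look roughly
like the sketch below. The names are hypothetical (hlt_frequency would be a new
command-line parameter); this is not code from the posted patch:

static int hlt_frequency = 16;

/*
 * Hypothetical knob: 0 = never use HLT, 1 = always use HLT, N > 1 = use HLT
 * on every Nth wait, so a failing run can be reproduced with the exact same
 * halt cadence.
 */
static bool should_halt(u64 iteration)
{
        if (hlt_frequency <= 1)
                return hlt_frequency == 1;

        return (iteration % hlt_frequency) == 0;
}

wait_for_ipi() would then take the current iteration number and call
should_halt() instead of get_random().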
On Thu, 2022-10-20 at 20:23 +0000, Sean Christopherson wrote:
> On Thu, Oct 20, 2022, Maxim Levitsky wrote:
> > +u64 num_iterations = -1;
>
> "Run indefinitely" is an odd default.  Why not set the default number of
> iterations to something reasonable and then let the user override that if the
> user wants to run for an absurdly long time?
>
> > +
> > +volatile u64 *isr_counts;
> > +bool use_svm;
> > +int hlt_allowed = -1;
>
> These can all be static.
>
> > +
> > +static int get_random(int min, int max)
> > +{
> > +        /* TODO : use rdrand to seed an PRNG instead */
> > +        u64 random_value = rdtsc() >> 4;
> > +
> > +        return min + random_value % (max - min + 1);
> > +}
> > +
> > +static void ipi_interrupt_handler(isr_regs_t *r)
> > +{
> > +        isr_counts[smp_id()]++;
> > +        eoi();
> > +}
> > +
> > +static void wait_for_ipi(volatile u64 *count)
> > +{
> > +        u64 old_count = *count;
> > +        bool use_halt;
> > +
> > +        switch (hlt_allowed) {
> > +        case -1:
> > +                use_halt = get_random(0,10000) == 0;
>
> Randomly doing "halt" is going to be annoying to debug.  What about tying this
> decision to the iteration and then providing a knob to let the user specify the
> frequency?  It seems unlikely that this test will expose a bug that occurs if
> and only if the halt path is truly random.

This is a stress test, it is pretty much impossible to debug, it is more like a
pass/fail test.

In addition to that, as you see in the switch below, hlt_allowed is a ternary
value: it can be 0, 1 and -1, where -1 means random, while 0 and 1 mean that it
will never or always use halt.

>
> > +                break;
> > +        case 0:
> > +                use_halt = false;
> > +                break;
> > +        case 1:
> > +                use_halt = true;
> > +                break;
> > +        default:
> > +                use_halt = false;
> > +                break;
> > +        }
> > +
> > +        do {
> > +                if (use_halt)
> > +                        asm volatile ("sti;hlt;cli\n");
>
> 	safe_halt();

OK.

>
> > +                else
> > +                        asm volatile ("sti;nop;cli");
>
> 	sti_nop_cli();

I think you mean sti_nop(); cli();

>
> > +
> > +        } while (old_count == *count);
>
> There's no need to loop in the use_halt case.  If KVM spuriously wakes the vCPU
> from halt, then that's a KVM bug.  Kinda ugly, but it does provide meaningful
> coverage for the HLT case.

Nope - KVM does spuriously wake up the CPU, for example when the vCPU thread
receives a signal, and anything else that makes kvm_vcpu_check_block return
-EINTR.

>
> 	if (use_halt) {
> 		safe_halt();
> 		cli();
> 	} else {
> 		do {
> 			sti_nop_cli();
> 		} while (old_count == *count);
> 	}
>
> 	assert(*count == old_count + 1);
>
> > +}
> > +
> > +/******************************************************************************************************/
> > +
> > +#ifdef __x86_64__
> > +
> > +static void l2_guest_wait_for_ipi(volatile u64 *count)
> > +{
> > +        wait_for_ipi(count);
> > +        asm volatile("vmmcall");
> > +}
> > +
> > +static void l2_guest_dummy(void)
> > +{
> > +        asm volatile("vmmcall");
> > +}
> > +
> > +static void wait_for_ipi_in_l2(volatile u64 *count, struct svm_vcpu *vcpu)
> > +{
> > +        u64 old_count = *count;
> > +        bool irq_on_vmentry = get_random(0,1) == 0;
>
> Same concerns about using random numbers.

I can also add a parameter to force this to true/false, or, better long term,
provide a PRNG and just seed it with either RDRAND or a userspace-given number.
The RDRAND-retrieved value can even be printed so that the test can be
replayed.

You know, just like the tools we both worked on at Intel did....

In fact I'll just do it - just need to pick some open source PRNG code.
Do you happen to know a good one? Mersenne Twister?

>
> > +
> > +        vcpu->vmcb->save.rip = (ulong)l2_guest_wait_for_ipi;
> > +        vcpu->regs.rdi = (u64)count;
> > +
> > +        vcpu->vmcb->save.rip = irq_on_vmentry ? (ulong)l2_guest_dummy : (ulong)l2_guest_wait_for_ipi;
> > +
> > +        do {
> > +                if (irq_on_vmentry)
> > +                        vcpu->vmcb->save.rflags |= X86_EFLAGS_IF;
> > +                else
> > +                        vcpu->vmcb->save.rflags &= ~X86_EFLAGS_IF;
> > +
> > +                asm volatile("clgi;nop;sti");
>
> Why a NOP between CLGI and STI?  And why re-enable GIF on each iteration?

It's a remnant from the days when I was too lazy to check which instructions
have an interrupt window, and it's also still using a comma here. I'll fix
this.

>
> > +                // GIF is set by VMRUN
> > +                SVM_VMRUN(vcpu->vmcb, &vcpu->regs);
> > +                // GIF is cleared by VMEXIT
> > +                asm volatile("cli;nop;stgi");
>
> Why re-enable GIF on every exit?

And why not? KVM does this on each VMRUN.

>
> > +
> > +                assert(vcpu->vmcb->control.exit_code == SVM_EXIT_VMMCALL);
> > +
> > +        } while (old_count == *count);
>
> Isn't the loop only necessary in the irq_on_vmentry case?

Yes it is - the interrupts come from a different vCPU, so entering the guest
with IF set doesn't guarantee that it will get an interrupt instantly, but the
other way around is true: with IF clear it will always get the interrupt only
after it sets it later in wait_for_ipi().

I need to rename irq_on_vmentry to IF_set_on_vmentry, or something.

>
> 	static void run_svm_l2(...)
> 	{
> 		SVM_VMRUN(vcpu->vmcb, &vcpu->regs);
> 		assert(vcpu->vmcb->control.exit_code == SVM_EXIT_VMMCALL);
> 	}
>
> E.g. can't this be:
>
> 	bool irq_on_vmentry = ???;
> 	u64 old_count = *count;
>
> 	clgi();
> 	sti();
>
> 	vcpu->regs.rdi = (u64)count;
>
> 	if (!irq_on_vmentry) {
> 		vcpu->vmcb->save.rip = (ulong)l2_guest_wait_for_ipi;
> 		vcpu->vmcb->save.rflags &= ~X86_EFLAGS_IF;
> 		run_svm_l2(...);
> 	} else {
> 		vcpu->vmcb->save.rip = (ulong)l2_guest_dummy
> 		vcpu->vmcb->save.rflags |= X86_EFLAGS_IF;
> 		do {
> 			run_svm_l2(...);
> 		} while (old_count == *count);
> 	}
>
> 	assert(*count == old_count + 1);
> 	cli();
> 	stgi();
>
> > +}
> > +#endif
> > +
> > +/******************************************************************************************************/
> > +
> > +#define FIRST_TEST_VCPU 1
> > +
> > +static void vcpu_init(void *data)
> > +{
> > +        /* To make it easier to see iteration number in the trace */
> > +        handle_irq(0x40, ipi_interrupt_handler);
> > +        handle_irq(0x50, ipi_interrupt_handler);
>
> Why not make it even more granular?  E.g. do vector == 32 + (iteration % ???)
>
> Regardless, a #define for the (base) vector would be helpful, the usage in
> vcpu_code() is a bit magical.

Don't see why not, but usually two vectors are enough. I can replace the magic
numbers with #defines.

>
> > +}
> > +
> > +static void vcpu_code(void *data)
> > +{
> > +        int ncpus = cpu_count();
> > +        int cpu = (long)data;
> > +#ifdef __x86_64__
> > +        struct svm_vcpu vcpu;
> > +#endif
> > +
> > +        u64 i;
> > +
> > +#ifdef __x86_64__
> > +        if (cpu == 2 && use_svm)
>
> Why only CPU2?

A remnant from the days when I had no code to run multiple guests.

>
> > +                svm_vcpu_init(&vcpu);
> > +#endif
> > +
> > +        assert(cpu != 0);
> > +
> > +        if (cpu != FIRST_TEST_VCPU)
> > +                wait_for_ipi(&isr_counts[cpu]);
> > +
> > +        for (i = 0; i < num_iterations; i++)
> > +        {
> > +                u8 physical_dst = cpu == ncpus -1 ? 1 : cpu + 1;
>
> Space after the '-'.

OK.

>
> > +
> > +                // send IPI to a next vCPU in a circular fashion
> > +                apic_icr_write(APIC_INT_ASSERT |
> > +                                APIC_DEST_PHYSICAL |
> > +                                APIC_DM_FIXED |
> > +                                (i % 2 ? 0x40 : 0x50),
> > +                                physical_dst);
> > +
> > +                if (i == (num_iterations - 1) && cpu != FIRST_TEST_VCPU)
> > +                        break;
> > +
> > +#ifdef __x86_64__
> > +                // wait for the IPI interrupt chain to come back to us
> > +                if (cpu == 2 && use_svm) {
> > +                                wait_for_ipi_in_l2(&isr_counts[cpu], &vcpu);
>
> Indentation is funky.

OK.

>
> > +                                continue;
> > +                }
> > +#endif
> > +                wait_for_ipi(&isr_counts[cpu]);
> > +        }
> > +}
> > +
> > +int main(int argc, void** argv)
> > +{
> > +        int cpu, ncpus = cpu_count();
> > +
> > +        assert(ncpus > 2);
> > +
> > +        if (argc > 1)
> > +                hlt_allowed = atol(argv[1]);
> > +
> > +        if (argc > 2)
> > +                num_iterations = atol(argv[2]);
> > +
> > +        setup_vm();
> > +
> > +#ifdef __x86_64__
> > +        if (svm_supported()) {
> > +                use_svm = true;
> > +                setup_svm();
> > +        }
> > +#endif
> > +
> > +        isr_counts = (volatile u64 *)calloc(ncpus, sizeof(u64));
> > +
> > +        printf("found %d cpus\n", ncpus);
> > +        printf("running for %lld iterations - test\n",
> > +                (long long unsigned int)num_iterations);
> > +
> > +
> > +        for (cpu = 0; cpu < ncpus; ++cpu)
> > +                on_cpu_async(cpu, vcpu_init, (void *)(long)cpu);
> > +
> > +        /* now let all the vCPUs end the IPI function*/
> > +        while (cpus_active() > 1)
> > +                pause();
> > +
> > +        printf("starting test on all cpus but 0...\n");
> > +
> > +        for (cpu = ncpus-1; cpu >= FIRST_TEST_VCPU; cpu--)
>
> Spaces around the '-'.

I will find a way to run checkpatch.pl on the patches...

>
> > +                on_cpu_async(cpu, vcpu_code, (void *)(long)cpu);
>
> Why not use smp_id() in vcpu_code()?  ipi_interrupt_handler() already relies on
> that being correct.
>
> > +
> > +        printf("test started, waiting to end...\n");
> > +
> > +        while (cpus_active() > 1) {
> > +
> > +                unsigned long isr_count1, isr_count2;
> > +
> > +                isr_count1 = isr_counts[1];
> > +                delay(5ULL*1000*1000*1000);
>
> Please add a macro or two for nanoseconds/milliseconds/seconds or whatever this
> expands to.

That is the problem - the delay is just in TSC freq units, and knowing the TSC
freq on x86 is for some reason next to impossible on AMD. (If someone from AMD
listens, please add a CPUID for this!)

>
> > +                isr_count2 = isr_counts[1];
> > +
> > +                if (isr_count1 == isr_count2) {
> > +                        printf("\n");
> > +                        printf("hang detected!!\n");
> > +                        break;
> > +                } else {
> > +                        printf("made %ld IPIs \n", (isr_count2 - isr_count1)*(ncpus-1));
> > +                }
> > +        }
> > +
> > +        printf("\n");
> > +
> > +        for (cpu = 1; cpu < ncpus; ++cpu)
> > +                report(isr_counts[cpu] == num_iterations,
> > +                           "Number of IPIs match (%lld)",
>
> Indentation.
>
> > +                           (long long unsigned int)isr_counts[cpu]);
>
> Print num_iterations, i.e. expected vs. actual?
>
> > +
> > +        free((void*)isr_counts);
> > +        return report_summary();
> > +}
> > diff --git a/x86/unittests.cfg b/x86/unittests.cfg
> > index ebb3fdfc..7655d2ba 100644
> > --- a/x86/unittests.cfg
> > +++ b/x86/unittests.cfg
> > @@ -61,6 +61,11 @@ smp = 2
> >  file = smptest.flat
> >  smp = 3
> >
> > +[ipi_stress]
> > +file = ipi_stress.flat
> > +extra_params = -cpu host,-x2apic,-svm,-hypervisor -global kvm-pit.lost_tick_policy=discard -machine kernel-irqchip=on -append '0 50000'
>
> Why add all the SVM and HLT stuff and then effectively turn them off by default?
> There's basically zero chance any other configuration will get regular testing.

This is because this is a stress test, and it is mostly useful to run manually
for some time. The svm and hlt should be enabled though; it is a leftover from
the fact that I almost never run the test from the kvm unit test main script.
I'll enable these.

>
> And why not have multi configs, e.g. to run with and without x2APIC?

Good idea as well, although I don't know if I want to slow down the kvm unit
tests run too much.

No x2apic is mostly because AVIC used to not work without it. I can drop it
now. '-hypervisor' is also some leftover, I don't know why it is there.

Best regards,
	Maxim Levitsky

>
> > +smp = 4
> > +
> >  [vmexit_cpuid]
> >  file = vmexit.flat
> >  extra_params = -append 'cpuid'
> > --
> > 2.26.3
> >
>
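To make the multi-config idea concrete, the entries in x86/unittests.cfg could
be split along these lines. This is only a sketch; the group names and the
final flag set here are illustrative assumptions, not something that has been
posted:

[ipi_stress]
file = ipi_stress.flat
extra_params = -cpu host -global kvm-pit.lost_tick_policy=discard -machine kernel-irqchip=on -append '1 50000'
smp = 4

[ipi_stress_no_x2apic]
file = ipi_stress.flat
extra_params = -cpu host,-x2apic -global kvm-pit.lost_tick_policy=discard -machine kernel-irqchip=on -append '1 50000'
smp = 4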
On Mon, Oct 24, 2022, Maxim Levitsky wrote:
> On Thu, 2022-10-20 at 20:23 +0000, Sean Christopherson wrote:
> > On Thu, Oct 20, 2022, Maxim Levitsky wrote:
> > > +static void wait_for_ipi(volatile u64 *count)
> > > +{
> > > +        u64 old_count = *count;
> > > +        bool use_halt;
> > > +
> > > +        switch (hlt_allowed) {
> > > +        case -1:
> > > +                use_halt = get_random(0,10000) == 0;
> >
> > Randomly doing "halt" is going to be annoying to debug.  What about tying this
> > decision to the iteration and then providing a knob to let the user specify the
> > frequency?  It seems unlikely that this test will expose a bug that occurs if
> > and only if the halt path is truly random.
>
> This is a stress test, it is pretty much impossible to debug, it is more like a
> pass/fail test.

There's a big difference between "hard to debug because there's a lot going on"
and "hard to debug because failures are intermittent due to use of random
numbers with no way to ensure a deterministic sequence".  I completely
understand that this type of test is going to be really hard to debug, but
that's an argument for making the test as deterministic as possible, i.e. do
what we can to make it slightly less awful to debug.

> > > +                asm volatile ("sti;nop;cli");
> >
> > 	sti_nop_cli();
>
> I think you mean sti_nop(); cli();

I was thinking we could add another helper since it's such a common pattern.

> > > +
> > > +        } while (old_count == *count);
> >
> > There's no need to loop in the use_halt case.  If KVM spuriously wakes the vCPU
> > from halt, then that's a KVM bug.  Kinda ugly, but it does provide meaningful
> > coverage for the HLT case.
>
> Nope - KVM does spuriously wake up the CPU, for example when the vCPU thread
> receives a signal, and anything else that makes kvm_vcpu_check_block return
> -EINTR.

That doesn't (and shouldn't) wake the vCPU from the guest's perspective.  If/when
userspace calls KVM_RUN again, the vCPU's state should still be KVM_MP_STATE_HALTED
and thus KVM will invoke vcpu_block() until there is an actual wake event.  This
is something that KVM _must_ get correct.

> > > +static void wait_for_ipi_in_l2(volatile u64 *count, struct svm_vcpu *vcpu)
> > > +{
> > > +        u64 old_count = *count;
> > > +        bool irq_on_vmentry = get_random(0,1) == 0;
> >
> > Same concerns about using random numbers.
>
> I can also add a parameter to force this to true/false, or, better long term,
> provide a PRNG and just seed it with either RDRAND or a userspace-given number.
> The RDRAND-retrieved value can even be printed so that the test can be
> replayed.
>
> You know, just like the tools we both worked on at Intel did....
>
> In fact I'll just do it - just need to pick some open source PRNG code.
> Do you happen to know a good one? Mersenne Twister?

It probably makes sense to use whatever we end up using for selftests[*] in order
to minimize the code we have to maintain.

[*] https://lore.kernel.org/all/20221019221321.3033920-2-coltonlewis@google.com

> > > +                // GIF is set by VMRUN
> > > +                SVM_VMRUN(vcpu->vmcb, &vcpu->regs);
> > > +                // GIF is cleared by VMEXIT
> > > +                asm volatile("cli;nop;stgi");
> >
> > Why re-enable GIF on every exit?
>
> And why not? KVM does this on each VMRUN.

Because doing work for no discernible reason is confusing.  E.g. if this were a
"real" hypervisor, it should also context switch CR2.

KVM enables STGI because GIF=0 blocks _all_ interrupts, i.e. KVM needs to recognize
NMI, SMI, #MC, etc... asap and even if KVM stays in its tight run loop.  For KUT,
there should never be an NMI, SMI, #MC, etc... and so no need to enable GIF.

I suppose you could make the argument that the test should set GIF when running on
bare metal, but that's tenuous at best as SMI is the only event that isn't fatal to
the test.

> > > +
> > > +        printf("test started, waiting to end...\n");
> > > +
> > > +        while (cpus_active() > 1) {
> > > +
> > > +                unsigned long isr_count1, isr_count2;
> > > +
> > > +                isr_count1 = isr_counts[1];
> > > +                delay(5ULL*1000*1000*1000);
> >
> > Please add a macro or two for nanoseconds/milliseconds/seconds or whatever this
> > expands to.
>
> That is the problem - the delay is just in TSC freq units, and knowing the TSC
> freq on x86 is for some reason next to impossible on AMD

Ah, delay() takes the number of cycles.  Ugh.

We should fix that, e.g. use the CPUID-provided frequency when possible (KVM should
emulate this if it doesn't already), and then #define an arbitrary TSC frequency as
a fallback so that we can write readable code, e.g. 2.4GHz is probably close enough
to work.

> > And why not have multi configs, e.g. to run with and without x2APIC?
>
> Good idea as well, although I don't know if I want to slow down the kvm unit
> tests run too much.

We should add a way to flag and omit all "slow" tests, e.g. vmx_vmcs_shadow_test
takes an absurd amount of time and is uninteresting for the vast majority of changes.
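For reference, the helpers discussed above would presumably be thin inline-asm
wrappers along these lines (a sketch; kvm-unit-tests may end up spelling them
differently):

static inline void cli(void)
{
        asm volatile("cli" ::: "memory");
}

/*
 * STI keeps interrupts blocked until the *next* instruction retires (the STI
 * shadow), so pad it with a NOP to create a one-instruction window in which
 * pending interrupts can be delivered.
 */
static inline void sti_nop(void)
{
        asm volatile("sti; nop" ::: "memory");
}

/* Open an interrupt window for a single instruction, then close it. */
static inline void sti_nop_cli(void)
{
        sti_nop();
        cli();
}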
On Mon, 2022-10-24 at 17:19 +0000, Sean Christopherson wrote:
> On Mon, Oct 24, 2022, Maxim Levitsky wrote:
> > On Thu, 2022-10-20 at 20:23 +0000, Sean Christopherson wrote:
> > > On Thu, Oct 20, 2022, Maxim Levitsky wrote:
> > > > +static void wait_for_ipi(volatile u64 *count)
> > > > +{
> > > > +        u64 old_count = *count;
> > > > +        bool use_halt;
> > > > +
> > > > +        switch (hlt_allowed) {
> > > > +        case -1:
> > > > +                use_halt = get_random(0,10000) == 0;
> > >
> > > Randomly doing "halt" is going to be annoying to debug.  What about tying this
> > > decision to the iteration and then providing a knob to let the user specify the
> > > frequency?  It seems unlikely that this test will expose a bug that occurs if
> > > and only if the halt path is truly random.
> >
> > This is a stress test, it is pretty much impossible to debug, it is more like a
> > pass/fail test.
>
> There's a big difference between "hard to debug because there's a lot going on"
> and "hard to debug because failures are intermittent due to use of random
> numbers with no way to ensure a deterministic sequence".  I completely
> understand that this type of test is going to be really hard to debug, but
> that's an argument for making the test as deterministic as possible, i.e. do
> what we can to make it slightly less awful to debug.

I agree with you mostly, but I think that using a PRNG and a seed is the best
way to achieve both randomness and determinism at the same time.

>
> > > > +                asm volatile ("sti;nop;cli");
> > >
> > > 	sti_nop_cli();
> >
> > I think you mean sti_nop(); cli();
>
> I was thinking we could add another helper since it's such a common pattern.

This is a good overall idea. I would call it process_pending_interrupts(),
with a comment that it enables interrupts for one CPU cycle.

And BTW I also do use that semicolon, because I either forgot about the gcc
rule or didn't know about it. I'll fix it.

>
> > > > +
> > > > +        } while (old_count == *count);
> > >
> > > There's no need to loop in the use_halt case.  If KVM spuriously wakes the vCPU
> > > from halt, then that's a KVM bug.  Kinda ugly, but it does provide meaningful
> > > coverage for the HLT case.
> >
> > Nope - KVM does spuriously wake up the CPU, for example when the vCPU thread
> > receives a signal, and anything else that makes kvm_vcpu_check_block return
> > -EINTR.
>
> That doesn't (and shouldn't) wake the vCPU from the guest's perspective.  If/when
> userspace calls KVM_RUN again, the vCPU's state should still be KVM_MP_STATE_HALTED
> and thus KVM will invoke vcpu_block() until there is an actual wake event.

Well, HLT is allowed to do spurious wakeups, so KVM is allowed to not do it
correctly; I thought that KVM doesn't do this correctly, but I'm glad to hear
that it does. Thanks for the explanation!

I'll test if my test passes if I remove the loop in the halt case.

>
> This is something that KVM _must_ get correct.
>
> > > > +static void wait_for_ipi_in_l2(volatile u64 *count, struct svm_vcpu *vcpu)
> > > > +{
> > > > +        u64 old_count = *count;
> > > > +        bool irq_on_vmentry = get_random(0,1) == 0;
> > >
> > > Same concerns about using random numbers.
> >
> > I can also add a parameter to force this to true/false, or, better long term,
> > provide a PRNG and just seed it with either RDRAND or a userspace-given number.
> > The RDRAND-retrieved value can even be printed so that the test can be
> > replayed.
> >
> > You know, just like the tools we both worked on at Intel did....
> >
> > In fact I'll just do it - just need to pick some open source PRNG code.
> > Do you happen to know a good one? Mersenne Twister?
>
> It probably makes sense to use whatever we end up using for selftests[*] in order
> to minimize the code we have to maintain.
>
> [*] https://lore.kernel.org/all/20221019221321.3033920-2-coltonlewis@google.com

Makes sense. I'll then just take this generator and adapt it to the kvm unit
tests. Or do you want to actually share the code? Via a kernel header or
something?

>
> > > > +                // GIF is set by VMRUN
> > > > +                SVM_VMRUN(vcpu->vmcb, &vcpu->regs);
> > > > +                // GIF is cleared by VMEXIT
> > > > +                asm volatile("cli;nop;stgi");
> > >
> > > Why re-enable GIF on every exit?
> >
> > And why not? KVM does this on each VMRUN.
>
> Because doing work for no discernible reason is confusing.  E.g. if this were a
> "real" hypervisor, it should also context switch CR2.

I agree that my justification for this was not correct, but I might still want
to have the GIF toggling here, because I think it might add some value to the
test. I'll think about it.

>
> KVM enables STGI because GIF=0 blocks _all_ interrupts, i.e. KVM needs to recognize
> NMI, SMI, #MC, etc... asap and even if KVM stays in its tight run loop.  For KUT,
> there should never be an NMI, SMI, #MC, etc... and so no need to enable GIF.
>
> I suppose you could make the argument that the test should set GIF when running on
> bare metal, but that's tenuous at best as SMI is the only event that isn't fatal to
> the test.
>
> > > > +
> > > > +        printf("test started, waiting to end...\n");
> > > > +
> > > > +        while (cpus_active() > 1) {
> > > > +
> > > > +                unsigned long isr_count1, isr_count2;
> > > > +
> > > > +                isr_count1 = isr_counts[1];
> > > > +                delay(5ULL*1000*1000*1000);
> > >
> > > Please add a macro or two for nanoseconds/milliseconds/seconds or whatever this
> > > expands to.
> >
> > That is the problem - the delay is just in TSC freq units, and knowing the TSC
> > freq on x86 is for some reason next to impossible on AMD
>
> Ah, delay() takes the number of cycles.  Ugh.
>
> We should fix that, e.g. use the CPUID-provided frequency when possible (KVM should
> emulate this if it doesn't already), and then #define an arbitrary TSC frequency as
> a fallback so that we can write readable code, e.g. 2.4GHz is probably close enough
> to work.

KVM doesn't emulate the Intel-specific way of reporting the TSC freq on AMD.
In some sense this is wrong to do, as it is Intel specific.

I do think that it is a great idea to report the TSC freq via some KVM-specific
MSR. That might, though, open a Pandora's box in regard to migration.

I don't like the 2.4GHz idea at all - it is once again corner cutting. It's
true that most code doesn't need an exact delay, not to mention that a delay
is never going to be exact, but once you expose a (nano)second-based interface,
test writers will start to use it, and then wonder why someone hardcoded it to
2.4 GHz.

>
> > > And why not have multi configs, e.g. to run with and without x2APIC?
> >
> > Good idea as well, although I don't know if I want to slow down the kvm unit
> > tests run too much.
>
> We should add a way to flag and omit all "slow" tests, e.g. vmx_vmcs_shadow_test
> takes an absurd amount of time and is uninteresting for the vast majority of changes.

This is a good idea as well.

Best regards,
	Maxim Levitsky
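For illustration, a minimal seedable generator in the spirit of this exchange
could look like the sketch below. It is not the actual selftests code; the
names and the choice of a linear congruential generator are assumptions:

struct prng_state {
        u32 state;
};

static struct prng_state prng_init(u32 seed)
{
        struct prng_state s = { .state = seed };
        return s;
}

static u32 prng_next(struct prng_state *s)
{
        /* Classic LCG constants; any decent full-period PRNG would do. */
        s->state = s->state * 1664525U + 1013904223U;
        return s->state;
}

The seed would come from a command-line argument when given, from RDRAND or
RDTSC otherwise, and would be printed at startup so that a failing run can be
replayed with the same sequence.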
On Thu, Oct 27, 2022, Maxim Levitsky wrote:
> On Mon, 2022-10-24 at 17:19 +0000, Sean Christopherson wrote:
> > That doesn't (and shouldn't) wake the vCPU from the guest's perspective.  If/when
> > userspace calls KVM_RUN again, the vCPU's state should still be KVM_MP_STATE_HALTED
> > and thus KVM will invoke vcpu_block() until there is an actual wake event.
>
> Well, HLT is allowed to do spurious wakeups, so KVM is allowed to not do it
> correctly,

I suspect the above "HLT is allowed to do spurious wakeups" is a typo, but in
case it's not, the SDM says:

  An enabled interrupt (including NMI and SMI), a debug exception, the BINIT#
  signal, the INIT# signal, or the RESET# signal will resume execution.

and the APM says:

  Execution resumes when an unmasked hardware interrupt (INTR), non-maskable
  interrupt (NMI), system management interrupt (SMI), RESET, or INIT occurs.

I.e. resuming from HLT without a valid wake event is a violation of the x86
architecture.

> > > In fact I'll just do it - just need to pick some open source PRNG code.
> > > Do you happen to know a good one? Mersenne Twister?
> >
> > It probably makes sense to use whatever we end up using for selftests[*] in order
> > to minimize the code we have to maintain.
> >
> > [*] https://lore.kernel.org/all/20221019221321.3033920-2-coltonlewis@google.com
>
> Makes sense. I'll then just take this generator and adapt it to the kvm unit
> tests. Or do you want to actually share the code? Via a kernel header or
> something?

Sadly, just copy+paste for now.  It'd be nice to share code, e.g. for the myriad
X86_FEATURE_* flags, but that's a separate problem.

> > > That is the problem - the delay is just in TSC freq units, and knowing the TSC
> > > freq on x86 is for some reason next to impossible on AMD
> >
> > Ah, delay() takes the number of cycles.  Ugh.
> >
> > We should fix that, e.g. use the CPUID-provided frequency when possible (KVM should
> > emulate this if it doesn't already), and then #define an arbitrary TSC frequency as
> > a fallback so that we can write readable code, e.g. 2.4GHz is probably close enough
> > to work.
>
> KVM doesn't emulate the Intel-specific way of reporting the TSC freq on AMD.
> In some sense this is wrong to do, as it is Intel specific.
>
> I do think that it is a great idea to report the TSC freq via some KVM-specific
> MSR. That might, though, open a Pandora's box in regard to migration.

Heh, yeah, the Hyper-V TSC stuff is rather ugly.

> I don't like the 2.4GHz idea at all - it is once again corner cutting. It's
> true that most code doesn't need an exact delay, not to mention that a delay
> is never going to be exact, but once you expose a (nano)second-based interface,
> test writers will start to use it, and then wonder why someone hardcoded it to
> 2.4 GHz.

True.  A really crazy/bad idea would be to get APERF/MPERF from /dev/cpu/0/msr
in the run script and somehow feed the host TSC into KUT :-)
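To make the readable-units idea concrete, a thin wrapper over delay() could
look like this sketch. TSC_FALLBACK_FREQ is the arbitrary assumed frequency
discussed above, and all names here are hypothetical:

/* Assumed fallback when the real TSC frequency cannot be discovered. */
#define TSC_FALLBACK_FREQ       2400000000ULL   /* 2.4 GHz */
#define NSEC_PER_SEC            1000000000ULL

/* delay() takes TSC cycles; beware of u64 overflow for very long delays. */
static inline u64 ns_to_cycles(u64 ns)
{
        return ns * TSC_FALLBACK_FREQ / NSEC_PER_SEC;
}

#define delay_ns(ns)    delay(ns_to_cycles(ns))
#define delay_ms(ms)    delay_ns((ms) * 1000000ULL)
#define delay_sec(s)    delay_ms((s) * 1000ULL)

The five-second poll in main() would then read delay_sec(5) instead of a raw
cycle count like 5ULL*1000*1000*1000.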
diff --git a/x86/Makefile.common b/x86/Makefile.common
index ed5e5c76..8e0b6661 100644
--- a/x86/Makefile.common
+++ b/x86/Makefile.common
@@ -86,7 +86,8 @@ tests-common = $(TEST_DIR)/vmexit.$(exe) $(TEST_DIR)/tsc.$(exe) \
 	$(TEST_DIR)/eventinj.$(exe) \
 	$(TEST_DIR)/smap.$(exe) \
 	$(TEST_DIR)/umip.$(exe) \
-	$(TEST_DIR)/smm_int_window.$(exe)
+	$(TEST_DIR)/smm_int_window.$(exe) \
+	$(TEST_DIR)/ipi_stress.$(exe)
 
 # The following test cases are disabled when building EFI tests because they
 # use absolute addresses in their inline assembly code, which cannot compile
diff --git a/x86/ipi_stress.c b/x86/ipi_stress.c
new file mode 100644
index 00000000..185f791e
--- /dev/null
+++ b/x86/ipi_stress.c
@@ -0,0 +1,235 @@
+#include "libcflat.h"
+#include "smp.h"
+#include "alloc.h"
+#include "apic.h"
+#include "processor.h"
+#include "isr.h"
+#include "asm/barrier.h"
+#include "delay.h"
+#include "svm.h"
+#include "desc.h"
+#include "msr.h"
+#include "vm.h"
+#include "types.h"
+#include "alloc_page.h"
+#include "vmalloc.h"
+#include "svm_lib.h"
+
+u64 num_iterations = -1;
+
+volatile u64 *isr_counts;
+bool use_svm;
+int hlt_allowed = -1;
+
+
+static int get_random(int min, int max)
+{
+        /* TODO : use rdrand to seed an PRNG instead */
+        u64 random_value = rdtsc() >> 4;
+
+        return min + random_value % (max - min + 1);
+}
+
+static void ipi_interrupt_handler(isr_regs_t *r)
+{
+        isr_counts[smp_id()]++;
+        eoi();
+}
+
+static void wait_for_ipi(volatile u64 *count)
+{
+        u64 old_count = *count;
+        bool use_halt;
+
+        switch (hlt_allowed) {
+        case -1:
+                use_halt = get_random(0,10000) == 0;
+                break;
+        case 0:
+                use_halt = false;
+                break;
+        case 1:
+                use_halt = true;
+                break;
+        default:
+                use_halt = false;
+                break;
+        }
+
+        do {
+                if (use_halt)
+                        asm volatile ("sti;hlt;cli\n");
+                else
+                        asm volatile ("sti;nop;cli");
+
+        } while (old_count == *count);
+}
+
+/******************************************************************************************************/
+
+#ifdef __x86_64__
+
+static void l2_guest_wait_for_ipi(volatile u64 *count)
+{
+        wait_for_ipi(count);
+        asm volatile("vmmcall");
+}
+
+static void l2_guest_dummy(void)
+{
+        asm volatile("vmmcall");
+}
+
+static void wait_for_ipi_in_l2(volatile u64 *count, struct svm_vcpu *vcpu)
+{
+        u64 old_count = *count;
+        bool irq_on_vmentry = get_random(0,1) == 0;
+
+        vcpu->vmcb->save.rip = (ulong)l2_guest_wait_for_ipi;
+        vcpu->regs.rdi = (u64)count;
+
+        vcpu->vmcb->save.rip = irq_on_vmentry ? (ulong)l2_guest_dummy : (ulong)l2_guest_wait_for_ipi;
+
+        do {
+                if (irq_on_vmentry)
+                        vcpu->vmcb->save.rflags |= X86_EFLAGS_IF;
+                else
+                        vcpu->vmcb->save.rflags &= ~X86_EFLAGS_IF;
+
+                asm volatile("clgi;nop;sti");
+                // GIF is set by VMRUN
+                SVM_VMRUN(vcpu->vmcb, &vcpu->regs);
+                // GIF is cleared by VMEXIT
+                asm volatile("cli;nop;stgi");
+
+                assert(vcpu->vmcb->control.exit_code == SVM_EXIT_VMMCALL);
+
+        } while (old_count == *count);
+}
+#endif
+
+/******************************************************************************************************/
+
+#define FIRST_TEST_VCPU 1
+
+static void vcpu_init(void *data)
+{
+        /* To make it easier to see iteration number in the trace */
+        handle_irq(0x40, ipi_interrupt_handler);
+        handle_irq(0x50, ipi_interrupt_handler);
+}
+
+static void vcpu_code(void *data)
+{
+        int ncpus = cpu_count();
+        int cpu = (long)data;
+#ifdef __x86_64__
+        struct svm_vcpu vcpu;
+#endif
+
+        u64 i;
+
+#ifdef __x86_64__
+        if (cpu == 2 && use_svm)
+                svm_vcpu_init(&vcpu);
+#endif
+
+        assert(cpu != 0);
+
+        if (cpu != FIRST_TEST_VCPU)
+                wait_for_ipi(&isr_counts[cpu]);
+
+        for (i = 0; i < num_iterations; i++)
+        {
+                u8 physical_dst = cpu == ncpus -1 ? 1 : cpu + 1;
+
+                // send IPI to a next vCPU in a circular fashion
+                apic_icr_write(APIC_INT_ASSERT |
+                                APIC_DEST_PHYSICAL |
+                                APIC_DM_FIXED |
+                                (i % 2 ? 0x40 : 0x50),
+                                physical_dst);
+
+                if (i == (num_iterations - 1) && cpu != FIRST_TEST_VCPU)
+                        break;
+
+#ifdef __x86_64__
+                // wait for the IPI interrupt chain to come back to us
+                if (cpu == 2 && use_svm) {
+                                wait_for_ipi_in_l2(&isr_counts[cpu], &vcpu);
+                                continue;
+                }
+#endif
+                wait_for_ipi(&isr_counts[cpu]);
+        }
+}
+
+int main(int argc, void** argv)
+{
+        int cpu, ncpus = cpu_count();
+
+        assert(ncpus > 2);
+
+        if (argc > 1)
+                hlt_allowed = atol(argv[1]);
+
+        if (argc > 2)
+                num_iterations = atol(argv[2]);
+
+        setup_vm();
+
+#ifdef __x86_64__
+        if (svm_supported()) {
+                use_svm = true;
+                setup_svm();
+        }
+#endif
+
+        isr_counts = (volatile u64 *)calloc(ncpus, sizeof(u64));
+
+        printf("found %d cpus\n", ncpus);
+        printf("running for %lld iterations - test\n",
+                (long long unsigned int)num_iterations);
+
+
+        for (cpu = 0; cpu < ncpus; ++cpu)
+                on_cpu_async(cpu, vcpu_init, (void *)(long)cpu);
+
+        /* now let all the vCPUs end the IPI function*/
+        while (cpus_active() > 1)
+                pause();
+
+        printf("starting test on all cpus but 0...\n");
+
+        for (cpu = ncpus-1; cpu >= FIRST_TEST_VCPU; cpu--)
+                on_cpu_async(cpu, vcpu_code, (void *)(long)cpu);
+
+        printf("test started, waiting to end...\n");
+
+        while (cpus_active() > 1) {
+
+                unsigned long isr_count1, isr_count2;
+
+                isr_count1 = isr_counts[1];
+                delay(5ULL*1000*1000*1000);
+                isr_count2 = isr_counts[1];
+
+                if (isr_count1 == isr_count2) {
+                        printf("\n");
+                        printf("hang detected!!\n");
+                        break;
+                } else {
+                        printf("made %ld IPIs \n", (isr_count2 - isr_count1)*(ncpus-1));
+                }
+        }
+
+        printf("\n");
+
+        for (cpu = 1; cpu < ncpus; ++cpu)
+                report(isr_counts[cpu] == num_iterations,
+                           "Number of IPIs match (%lld)",
+                           (long long unsigned int)isr_counts[cpu]);
+
+        free((void*)isr_counts);
+        return report_summary();
+}
diff --git a/x86/unittests.cfg b/x86/unittests.cfg
index ebb3fdfc..7655d2ba 100644
--- a/x86/unittests.cfg
+++ b/x86/unittests.cfg
@@ -61,6 +61,11 @@ smp = 2
 file = smptest.flat
 smp = 3
 
+[ipi_stress]
+file = ipi_stress.flat
+extra_params = -cpu host,-x2apic,-svm,-hypervisor -global kvm-pit.lost_tick_policy=discard -machine kernel-irqchip=on -append '0 50000'
+smp = 4
+
 [vmexit_cpuid]
 file = vmexit.flat
 extra_params = -append 'cpuid'
Adds a test that sends IPIs between vCPUs and detects missing IPIs.

Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
---
 x86/Makefile.common |   3 +-
 x86/ipi_stress.c    | 235 ++++++++++++++++++++++++++++++++++++++++++++
 x86/unittests.cfg   |   5 +
 3 files changed, 242 insertions(+), 1 deletion(-)
 create mode 100644 x86/ipi_stress.c