KVM: Defer remote tlb flushes on invlpg (v4)

On Sun, Mar 29, 2009 at 01:36:01PM +0300, Avi Kivity wrote:
> Marcelo, Andrea?

Had to read the code a bit more to understand the reason of the
unsync_mmu flush in cr3 overwrite.

> Avi Kivity wrote:
>> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
>> index 2a36f7f..f0ea56c 100644
>> --- a/arch/x86/kvm/mmu.c
>> +++ b/arch/x86/kvm/mmu.c
>> @@ -1184,8 +1184,7 @@ static void mmu_sync_children(struct kvm_vcpu *vcpu,
>>  		for_each_sp(pages, sp, parents, i)
>>  			protected |= rmap_write_protect(vcpu->kvm, sp->gfn);
>>  -		if (protected)
>> -			kvm_flush_remote_tlbs(vcpu->kvm);
>> +		kvm_flush_remote_tlbs_cond(vcpu->kvm, protected);
>>   		for_each_sp(pages, sp, parents, i) {
>>  			kvm_sync_page(vcpu, sp);

Ok so because we didn't flush the tlb on the other vcpus when invlpg
run, if cr3 overwrite needs to re-sync sptes wrprotecting them, we've
to flush the tlb in all vcpus to be sure the possibly writable tlb
entry reflecting the old writable spte instantiated before invlpg run,
is removed from the physical cpus. We wouldn't find it in for_each_sp
because it was rmap_removed, but we'll find something in
mmu_unsync_walk (right? we definitely have to find something in
mmu_unsync_walk for this to work, the parent sp have to leave
child->unsync set even after rmap_remove run in invlpg without
flushing the other vcpus tlbs).

>>  @@ -465,7 +464,7 @@ static void FNAME(invlpg)(struct kvm_vcpu *vcpu, 
>> gva_t gva)
>>  				rmap_remove(vcpu->kvm, sptep);
>>  				if (is_large_pte(*sptep))
>>  					--vcpu->kvm->stat.lpages;
>> -				need_flush = 1;
>> +				vcpu->kvm->remote_tlbs_dirty = true;
>>  			}
>>  			set_shadow_pte(sptep, shadow_trap_nonpresent_pte);
>>  			break;
>> @@ -475,8 +474,6 @@ static void FNAME(invlpg)(struct kvm_vcpu *vcpu, gva_t 
>> gva)
>>  			break;
>>  	}
>>  -	if (need_flush)
>> -		kvm_flush_remote_tlbs(vcpu->kvm);
>>  	spin_unlock(&vcpu->kvm->mmu_lock);

AFIK to be compliant with lowlevel archs (without ASN it doesn't
matter I think as vmx always flush on exit), we have to flush the
local tlb here, with set_bit(KVM_REQ_TLB_FLUSH, &vcpu->requests). I
don't see why it's missing. Or am I wrong?

>> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
>> index 68b217e..12afa50 100644
>> --- a/virt/kvm/kvm_main.c
>> +++ b/virt/kvm/kvm_main.c
>> @@ -758,10 +758,18 @@ static bool make_all_cpus_request(struct kvm *kvm, 
>> unsigned int req)
>>   void kvm_flush_remote_tlbs(struct kvm *kvm)
>>  {
>> +	kvm->remote_tlbs_dirty = false;
>> +	smp_wmb();

Still no lock prefix to the asm insn and here it runs outside the
mmu_lock, but ok, I tend to agree smp_wmb should be enough to be sure
the write is fully finished by the time smb_wmb returns. There's
another problem though.

CPU0				CPU1
-----------			-------------
remote_tlbs_dirty = false
				remote_tlbs_dirty = true
smp_tlb_flush
				set_shadow_pte(sptep, shadow_trap_nonpresent_pte);

The flush for the sptep will be lost.

>> @@ -907,8 +913,7 @@ static int kvm_mmu_notifier_clear_flush_young(struct 
>> mmu_notifier *mn,
>>  	young = kvm_age_hva(kvm, address);
>>  	spin_unlock(&kvm->mmu_lock);
>>  -	if (young)
>> -		kvm_flush_remote_tlbs(kvm);
>> +	kvm_flush_remote_tlbs_cond(kvm, young);
>>   	return young;
>>  }

No need to flush for clear_flush_young method, pages can't be freed
there.

I mangled over the patch a bit, plus fixed the above smp race, let me
know what you think.

The the best workload to exercise this is running a VM with lots of
VCPUs and 8G of ram with a 32bit guest kernel and then just malloc and
touch a byte for each 4096 page allocated by malloc. That will run a
flood of invlpg. Then push the system to swap. while :; do cp /dev/hda
/dev/null; done, also works without O_DIRECT so the host cache make it
fast at the second run (not so much faster with host swapping though).

I only tested it so far with 12 VM on swap with 64bit kernels with
heavy I/O so it's not good test as I doubt any invlpg runs, not even
munmap(addr, 4k) uses invlpg.

Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
---

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

KVM: Defer remote tlb flushes on invlpg (v4)

Commit Message

Comments

Patch