[2/2] kvm: x86: reduce collisions in mmu_page_hash

Message ID 1482184705-127401-2-git-send-email-dmatlack@google.com (mailing list archive)
State New, archived

Commit Message

David Matlack Dec. 19, 2016, 9:58 p.m. UTC
When using two-dimensional paging, the mmu_page_hash (which provides
lookups for existing kvm_mmu_page structs) becomes imbalanced, with
too many collisions in buckets 0 and 512. This has been seen to cause
mmu_lock to be held for multiple milliseconds in kvm_mmu_get_page on
VMs with a large amount of RAM mapped with 4K pages.

The current hash function uses the lower 10 bits of gfn to index into
mmu_page_hash. When doing shadow paging, gfn is the address of the
guest page table being shadowed. These tables are 4K-aligned, which
makes the low bits of gfn a good hash. However, with two-dimensional
paging, no guest page tables are being shadowed, so gfn is the base
address that is mapped by the table. Thus page tables (level=1) have
a 2MB-aligned gfn, page directories (level=2) have a 1GB-aligned gfn,
etc. This means the hashes only differ in their 10th bit, which is why
the collisions pile up in buckets 0 and 512.

hash_64() provides a better hash. For example, on a VM with ~200G
(99458 direct=1 kvm_mmu_page structs):

hash            max_mmu_page_hash_collisions
--------------------------------------------
low 10 bits     49847
hash_64         105
perfect         97
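
To make the effect easy to reproduce, here is a minimal userspace
sketch (an illustration, not part of the patch) that pushes 2MB-aligned
gfns through both hash functions. The multiplier is assumed to match
GOLDEN_RATIO_64 from include/linux/hash.h after ef703f49a6c5, and the
page count roughly matches the 200G VM above:

/*
 * Userspace sketch: bucket distribution of the old hash (low bits of
 * gfn) vs. a hash_64()-style multiplicative hash, for the 2MB-aligned
 * gfns that direct level=1 kvm_mmu_pages have under TDP.
 */
#include <stdio.h>
#include <stdint.h>

#define OLD_SHIFT 10
#define NEW_SHIFT 12
/* Assumed to match GOLDEN_RATIO_64 in include/linux/hash.h. */
#define GOLDEN_RATIO_64 0x61C8864680B583EBull

static unsigned old_hashfn(uint64_t gfn)
{
	return gfn & ((1 << OLD_SHIFT) - 1);	/* old: low 10 bits of gfn */
}

static unsigned new_hashfn(uint64_t gfn)
{
	return (gfn * GOLDEN_RATIO_64) >> (64 - NEW_SHIFT);	/* hash_64-style */
}

int main(void)
{
	/* ~200G of guest RAM in 4K EPT pages: one direct level=1 page
	 * per 2MB, i.e. gfns 0, 512, 1024, ... */
	static unsigned old_buckets[1 << OLD_SHIFT];
	static unsigned new_buckets[1 << NEW_SHIFT];
	unsigned old_max = 0, new_max = 0;

	for (uint64_t i = 0; i < 100000; i++) {
		uint64_t gfn = i * 512;
		unsigned ob = old_hashfn(gfn), nb = new_hashfn(gfn);

		if (++old_buckets[ob] > old_max)
			old_max = old_buckets[ob];
		if (++new_buckets[nb] > new_max)
			new_max = new_buckets[nb];
	}
	printf("old hash: worst bucket holds %u entries\n", old_max);
	printf("new hash: worst bucket holds %u entries\n", new_max);
	return 0;
}

With the old hash every one of these gfns lands in bucket 0 or 512; the
multiplicative hash spreads the same gfns close to evenly across all
4096 buckets.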

While we're changing the hash, increase the table size by 4x to better
support large VMs (further reduces number of collisions in 200G VM to
29).
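
For scale: 99458 structs over the old 2^10 = 1024 buckets average ~97
per bucket even with a perfect hash (matching the "perfect" row above);
over 2^12 = 4096 buckets the average drops to ~24, so a worst case of
29 is close to ideal.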

Note that hash_64() does not provide a good distribution prior to commit
ef703f49a6c5 ("Eliminate bad hash multipliers from hash_32() and
hash_64()").

Signed-off-by: David Matlack <dmatlack@google.com>
Change-Id: I5aa6b13c834722813c6cca46b8b1ed6f53368ade
---
 arch/x86/include/asm/kvm_host.h | 2 +-
 arch/x86/kvm/mmu.c              | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

Comments

David Hildenbrand Dec. 20, 2016, 10 a.m. UTC | #1
> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
> index 58995fd9..de55653 100644
> --- a/arch/x86/kvm/mmu.c
> +++ b/arch/x86/kvm/mmu.c
> @@ -1713,7 +1713,7 @@ static void kvm_mmu_free_page(struct kvm_mmu_page *sp)
>
>  static unsigned kvm_page_table_hashfn(gfn_t gfn)
>  {
> -	return gfn & ((1 << KVM_MMU_HASH_SHIFT) - 1);
> +	return hash_64(gfn, KVM_MMU_HASH_SHIFT);
>  }
>
>  static void mmu_page_add_parent_pte(struct kvm_vcpu *vcpu,
>

hash_64() might be more expensive to calculate, as it involves one
multiplication. Not sure if that might be a problem.

Looks good to me!
Paolo Bonzini Dec. 20, 2016, 4:57 p.m. UTC | #2
On 19/12/2016 22:58, David Matlack wrote:
> When using two-dimensional paging, the mmu_page_hash (which provides
> lookups for existing kvm_mmu_page structs) becomes imbalanced, with
> too many collisions in buckets 0 and 512.

Ouch, nice catch.  I'll apply this patch tomorrow since it's pretty much
independent of patch 1.  If you want to send out a revision of the
statistics patch, please do.

Thanks,

Paolo

David Matlack Dec. 20, 2016, 11:01 p.m. UTC | #3
On Tue, Dec 20, 2016 at 8:57 AM, Paolo Bonzini <pbonzini@redhat.com> wrote:
> On 19/12/2016 22:58, David Matlack wrote:
>> When using two-dimensional paging, the mmu_page_hash (which provides
>> lookups for existing kvm_mmu_page structs) becomes imbalanced, with
>> too many collisions in buckets 0 and 512.
>
> Ouch, nice catch.  I'll apply this patch tomorrow since it's pretty much
> independent of patch 1.  If you want to send out a revision of the
> statistics patch, please do.

Sounds good. I'll send v2 of the stats patch by itself. Thanks!

Patch

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 8ba0d64..5962e4bc 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -115,7 +115,7 @@  static inline gfn_t gfn_to_index(gfn_t gfn, gfn_t base_gfn, int level)
 
 #define KVM_PERMILLE_MMU_PAGES 20
 #define KVM_MIN_ALLOC_MMU_PAGES 64
-#define KVM_MMU_HASH_SHIFT 10
+#define KVM_MMU_HASH_SHIFT 12
 #define KVM_NUM_MMU_PAGES (1 << KVM_MMU_HASH_SHIFT)
 #define KVM_MIN_FREE_MMU_PAGES 5
 #define KVM_REFILL_PAGES 25
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 58995fd9..de55653 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -1713,7 +1713,7 @@  static void kvm_mmu_free_page(struct kvm_mmu_page *sp)
 
 static unsigned kvm_page_table_hashfn(gfn_t gfn)
 {
-	return gfn & ((1 << KVM_MMU_HASH_SHIFT) - 1);
+	return hash_64(gfn, KVM_MMU_HASH_SHIFT);
 }
 
 static void mmu_page_add_parent_pte(struct kvm_vcpu *vcpu,
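
For context, the hash output indexes kvm->arch.mmu_page_hash, an array
of hlist heads sized by KVM_NUM_MMU_PAGES, and lookups walk a single
bucket under mmu_lock. A simplified sketch of that lookup pattern (not
the exact KVM code, which uses a for_each_gfn_sp-style macro) is:

/* Sketch only: shows why long buckets translate into long walks under
 * mmu_lock. lookup_sp() is a hypothetical helper, not a KVM function. */
static struct kvm_mmu_page *lookup_sp(struct kvm *kvm, gfn_t gfn)
{
	struct kvm_mmu_page *sp;

	hlist_for_each_entry(sp,
			     &kvm->arch.mmu_page_hash[kvm_page_table_hashfn(gfn)],
			     hash_link)
		if (sp->gfn == gfn)
			return sp;	/* collisions make this walk longer */
	return NULL;
}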