
[v4,11/18] KVM: x86/mmu: Add documentation of NUMA aware page table capability

Message ID 20230306224127.1689967-12-vipinsh@google.com (mailing list archive)
State New, archived
Series NUMA aware page table allocation

Commit Message

Vipin Sharma March 6, 2023, 10:41 p.m. UTC
Add documentation for KVM_CAP_NUMA_AWARE_PAGE_TABLE capability and
explain why it is needed.

Signed-off-by: Vipin Sharma <vipinsh@google.com>
---
 Documentation/virt/kvm/api.rst | 29 +++++++++++++++++++++++++++++
 1 file changed, 29 insertions(+)

Comments

David Matlack March 23, 2023, 9:59 p.m. UTC | #1
On Mon, Mar 06, 2023 at 02:41:20PM -0800, Vipin Sharma wrote:
> Add documentation for KVM_CAP_NUMA_AWARE_PAGE_TABLE capability and
> explain why it is needed.
> 
> Signed-off-by: Vipin Sharma <vipinsh@google.com>
> ---
>  Documentation/virt/kvm/api.rst | 29 +++++++++++++++++++++++++++++
>  1 file changed, 29 insertions(+)
> 
> diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
> index 62de0768d6aa..7e3a1299ca8e 100644
> --- a/Documentation/virt/kvm/api.rst
> +++ b/Documentation/virt/kvm/api.rst
> @@ -7669,6 +7669,35 @@ This capability is aimed to mitigate the threat that malicious VMs can
>  cause CPU stuck (due to event windows don't open up) and make the CPU
>  unavailable to host or other VMs.
>  
> +7.34 KVM_CAP_NUMA_AWARE_PAGE_TABLE
> +----------------------------------
> +
> +:Architectures: x86
> +:Target: VM
> +:Returns: 0 on success, -EINVAL if vCPUs are already created.
> +
> +This capability allows userspace to enable NUMA aware page tables allocations.

Call out that this capability overrides task mempolicies. e.g.

  This capability causes KVM to use a custom NUMA memory policy when
  allocating page tables. Specifically, KVM will attempt to co-locate
  page tables pages with the memory that they map, rather than following
  the mempolicy of the current task.
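
To make "co-locate" concrete, the gist on the kernel side is something
like the following sketch (illustrative only, not the actual code in
this series):

  /*
   * Prefer the NUMA node of the pfn that the new page table page will
   * map. The real code must first check that the pfn is backed by a
   * struct page at all. alloc_pages_node() falls back to other nodes
   * if @nid is exhausted, since __GFP_THISNODE is not set.
   */
  int nid = page_to_nid(pfn_to_page(fault->pfn));
  struct page *sp = alloc_pages_node(nid, GFP_KERNEL_ACCOUNT | __GFP_ZERO, 0);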

> +NUMA aware page tables are disabled by default. Once enabled, prior to vCPU
> +creation, any page table allocated during the life of a VM will be allocated

The "prior to vCPU creation" part here is confusing because it sounds
like you're talking about any page tables allocated before vCPU
creation. Just delete that part and put it in a separate paragraph.

 KVM_CAP_NUMA_AWARE_PAGE_TABLE must be enabled before any vCPU is
 created, otherwise KVM will return -EINVAL.
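
Purely as an illustration of the ordering (assuming the uapi constant
added earlier in this series; not text for the doc itself):

  #include <err.h>
  #include <fcntl.h>
  #include <sys/ioctl.h>
  #include <linux/kvm.h>

  int main(void)
  {
          int kvm = open("/dev/kvm", O_RDWR);
          int vm = ioctl(kvm, KVM_CREATE_VM, 0);
          struct kvm_enable_cap cap = { .cap = KVM_CAP_NUMA_AWARE_PAGE_TABLE };

          /* Enable before the first KVM_CREATE_VCPU, else -EINVAL. */
          if (ioctl(vm, KVM_ENABLE_CAP, &cap))
                  err(1, "KVM_ENABLE_CAP");
          return ioctl(vm, KVM_CREATE_VCPU, 0) < 0;
  }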

> +preferably from the NUMA node of the leaf page.
> +
> +Without this capability, default feature is to use current thread mempolicy and

s/default feature is to/KVM will/

> +allocate page table based on that.

s/and allocate page table based on that./to allocate page tables./

> +
> +This capability is useful to improve page accesses by a guest. For example, an

nit: Be more specific about how.

 This capability aims to minimize the cost of TLB misses when a vCPU is
 accessing NUMA-local memory, by reducing the number of remote memory
 accesses needed to walk KVM's page tables.

> +initialization thread which access lots of remote memory and ends up creating
> +page tables on local NUMA node, or some service thread allocates memory on
> +remote NUMA nodes and later worker/background threads accessing that memory
> +will end up accessing remote NUMA node page tables.

It's not clear if these examples are talking about what happens when
KVM_CAP_NUMA_AWARE_PAGE_TABLE is enabled or disabled.

Also it's important to distinguish virtual NUMA nodes from physical NUMA
nodes and where these "threads" are running. How about this:

 For example, when KVM_CAP_NUMA_AWARE_PAGE_TABLE is disabled and a vCPU
 accesses memory on a remote NUMA node and triggers a KVM page fault,
 KVM will allocate page tables to handle that fault on the node where
 the vCPU is running rather than the node where the memory is allocated.
 When KVM_CAP_NUMA_AWARE_PAGE_TABLE is enabled, KVM will allocate the
 page tables on the node where the memory is located.

 This is intended to be used in VM configurations that properly
 virtualize NUMA. i.e. VMs with one or more virtual NUMA nodes, each of
 which is mapped to a physical NUMA node. With this capability enabled
 on such VMs, any guest memory access to virtually-local memory will be
 translated through mostly[*] physically-local page tables, regardless
 of how the memory was faulted in.

 [*] KVM will fallback to allocating from remote NUMA nodes if the
 preferred node is out of memory. Also, in VMs with 2 or more NUMA
 nodes, higher level page tables will necessarily map memory across
 multiple physical nodes.
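
 And for completeness, "mapped to a physical NUMA node" is something
 userspace sets up itself, e.g. by binding each virtual node's backing
 memory before registering it as a memslot. A sketch, where 'backing'
 and 'size' are placeholders for one virtual node's memory:

  #include <numaif.h>  /* mbind(), MPOL_BIND; link with -lnuma */

  /* Pin this virtual node's guest memory to physical node 1. */
  unsigned long nodemask = 1UL << 1;
  if (mbind(backing, size, MPOL_BIND, &nodemask, 8 * sizeof(nodemask), 0))
          err(1, "mbind");
  /* ...then hand 'backing' to KVM via KVM_SET_USER_MEMORY_REGION. */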

> So, a multi NUMA node
> +guest, can with high confidence access local memory faster instead of going
> +through remote page tables first.
> +
> +This capability is also helpful for host to reduce live migration impact when
> +splitting huge pages during dirty log operations. If the thread splitting huge
> +page is on remote NUMA node it will create page tables on remote node. Even if
> +guest is careful in making sure that it only access local memory they will end
> +up accessing remote page tables.

Please also cover the limitations of this feature:

 - Impact on remote memory accesses (more expensive).
 - How KVM handles NUMA node exhaustion.
 - How high-level page tables can span multiple nodes.
 - What KVM does if it can't determine the NUMA node of the pfn.
 - What KVM does for faults on GPAs that aren't backed by a pfn.

> +
>  8. Other capabilities.
>  ======================
>  
> -- 
> 2.40.0.rc0.216.gc4246ad0f0-goog
>
Vipin Sharma March 28, 2023, 4:47 p.m. UTC | #2
On Thu, Mar 23, 2023 at 2:59 PM David Matlack <dmatlack@google.com> wrote:
>
> On Mon, Mar 06, 2023 at 02:41:20PM -0800, Vipin Sharma wrote:
> > Add documentation for KVM_CAP_NUMA_AWARE_PAGE_TABLE capability and
> > explain why it is needed.
>
> [...]
>
> Please also cover the limitations of this feature:
>
>  - Impact on remote memory accesses (more expensive).
>  - How KVM handles NUMA node exhaustion.
>  - How high-level page tables can span multiple nodes.
>  - What KVM does if it can't determine the NUMA node of the pfn.
>  - What KVM does for faults on GPAs that aren't backed by a pfn.
>
Thanks for the suggestions, I will incorporate them in the next version.

Patch

diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index 62de0768d6aa..7e3a1299ca8e 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -7669,6 +7669,35 @@ This capability is aimed to mitigate the threat that malicious VMs can
 cause CPU stuck (due to event windows don't open up) and make the CPU
 unavailable to host or other VMs.
 
+7.34 KVM_CAP_NUMA_AWARE_PAGE_TABLE
+----------------------------------
+
+:Architectures: x86
+:Target: VM
+:Returns: 0 on success, -EINVAL if vCPUs are already created.
+
+This capability allows userspace to enable NUMA aware page tables allocations.
+NUMA aware page tables are disabled by default. Once enabled, prior to vCPU
+creation, any page table allocated during the life of a VM will be allocated
+preferably from the NUMA node of the leaf page.
+
+Without this capability, default feature is to use current thread mempolicy and
+allocate page table based on that.
+
+This capability is useful to improve page accesses by a guest. For example, an
+initialization thread which access lots of remote memory and ends up creating
+page tables on local NUMA node, or some service thread allocates memory on
+remote NUMA nodes and later worker/background threads accessing that memory
+will end up accessing remote NUMA node page tables. So, a multi NUMA node
+guest, can with high confidence access local memory faster instead of going
+through remote page tables first.
+
+This capability is also helpful for host to reduce live migration impact when
+splitting huge pages during dirty log operations. If the thread splitting huge
+page is on remote NUMA node it will create page tables on remote node. Even if
+guest is careful in making sure that it only access local memory they will end
+up accessing remote page tables.
+
 8. Other capabilities.
 ======================