
[v2,01/10] KVM: Document KVM_MAP_MEMORY ioctl

Message ID 9a060293c9ad9a78f1d8994cfe1311e818e99257.1712785629.git.isaku.yamahata@intel.com (mailing list archive)
State New
Series KVM: Guest Memory Pre-Population API

Commit Message

Isaku Yamahata April 10, 2024, 10:07 p.m. UTC
From: Isaku Yamahata <isaku.yamahata@intel.com>

Add documentation for the KVM_MAP_MEMORY ioctl. [1]

The ioctl populates guest memory.  It does not perform any additional
technology-specific initialization of the underlying memory [2].  For
example, CoCo-related operations are not performed.  Concretely, for TDX,
this API does not invoke TDH.MEM.PAGE.ADD() or TDH.MR.EXTEND();
vendor-specific APIs are required for such operations.

The key design choice is to make this a vCPU ioctl rather than a VM ioctl.
First, populating guest memory requires a vCPU; with a VM ioctl, KVM would
have to pick one vCPU somehow.  Second, a vCPU ioctl allows each vCPU to
invoke the ioctl in parallel, which helps the operation scale with guest
memory size, e.g., hundreds of GB.

[1] https://lore.kernel.org/kvm/Zbrj5WKVgMsUFDtb@google.com/
[2] https://lore.kernel.org/kvm/Ze-TJh0BBOWm9spT@google.com/

Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
---
v2:
- Make flags reserved for future use. (Sean, Michael)
- Clarified the supposed use case. (Kai)
- Dropped source member of struct kvm_memory_mapping. (Michael)
- Change the unit from pages to bytes. (Michael)
---
 Documentation/virt/kvm/api.rst | 52 ++++++++++++++++++++++++++++++++++
 1 file changed, 52 insertions(+)
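The commit message argues that a vCPU ioctl lets several vCPUs populate guest memory in parallel. The sketch below illustrates one way a VMM might prepare for that. It splits a GPA range into page-aligned, non-overlapping chunks, one per vCPU thread. Each thread would then issue the ioctl on its own vCPU file descriptor.

This is an illustration under stated assumptions, not code from this series:

- `split_for_vcpus()` is a hypothetical helper.
- The struct mirrors the layout documented in the patch.
- The range is assumed to be page-aligned.

Handing each vCPU a disjoint chunk may reduce, though not necessarily eliminate, the EAGAIN races mentioned in the documentation.

```c
#include <assert.h>
#include <stdint.h>

/* Layout as documented in this patch (v2); not taken from a released header. */
struct kvm_memory_mapping {
	uint64_t base_address;	/* (L1) guest physical address */
	uint64_t size;		/* length in bytes */
	uint64_t flags;		/* reserved, must be zero */
};

/*
 * Hypothetical helper: split [base, base + size) into at most nr_vcpus
 * non-overlapping, page-aligned chunks, one per vCPU thread.  Returns the
 * number of chunks written to out[], which must have room for nr_vcpus.
 */
static unsigned int split_for_vcpus(uint64_t base, uint64_t size,
				    unsigned int nr_vcpus, uint64_t page_size,
				    struct kvm_memory_mapping *out)
{
	uint64_t pages, per_vcpu, extra;
	unsigned int i, n = 0;

	assert(page_size > 0 && base % page_size == 0 && size % page_size == 0);
	if (nr_vcpus == 0)
		return 0;

	pages = size / page_size;
	per_vcpu = pages / nr_vcpus;
	extra = pages % nr_vcpus;

	for (i = 0; i < nr_vcpus; i++) {
		/* The first `extra` vCPUs each take one additional page. */
		uint64_t n_pages = per_vcpu + (i < extra ? 1 : 0);

		if (n_pages == 0)
			break;	/* fewer pages than vCPUs: stop early */
		out[n].base_address = base;
		out[n].size = n_pages * page_size;
		out[n].flags = 0;
		base += out[n].size;
		n++;
	}
	return n;
}
```

For example, 10 pages of 4 KiB at GPA 0x100000 split across 4 vCPUs yields chunks of 3, 3, 2 and 2 pages.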

Comments

Edgecombe, Rick P April 15, 2024, 11:27 p.m. UTC | #1
Nits only...

On Wed, 2024-04-10 at 15:07 -0700, isaku.yamahata@intel.com wrote:
> From: Isaku Yamahata <isaku.yamahata@intel.com>
> 
> Adds documentation of KVM_MAP_MEMORY ioctl. [1]
> 
> It populates guest memory.  It doesn't do extra operations on the
> underlying technology-specific initialization [2].  For example,
> CoCo-related operations won't be performed.  Concretely for TDX, this API
> won't invoke TDH.MEM.PAGE.ADD() or TDH.MR.EXTEND().  Vendor-specific APIs
> are required for such operations.
> 
> The key point is to adapt of vcpu ioctl instead of VM ioctl.

Not sure what you are trying to say here.

>   First,
> populating guest memory requires vcpu.  If it is VM ioctl, we need to pick
> one vcpu somehow.  Secondly, vcpu ioctl allows each vcpu to invoke this
> ioctl in parallel.  It helps to scale regarding guest memory size, e.g.,
> hundreds of GB.

I guess you are explaining why this is a vCPU ioctl instead of a KVM ioctl. Is
this clearer:

Although the operation is sort of a VM operation, make the ioctl a vCPU ioctl
instead of KVM ioctl. Do this because a vCPU is needed internally for the fault
path anyway, and because... (I don't follow the second point).

> [...]
> +This population is restricted to the "pure" population without triggering
> +underlying technology-specific initialization.  For example, CoCo-related
> +operations won't perform.  In the case of TDX, this API won't invoke
> +TDH.MEM.PAGE.ADD() or TDH.MR.EXTEND().  Vendor-specific uAPIs are required
> for
> +such operations.

Probably don't want to have TDX bits in here yet. Since it's talking about what
KVM_MAP_MEMORY is *not* doing, it can just be dropped.

> +
> +
>  5. The kvm_run structure
>  ========================
>
Isaku Yamahata April 15, 2024, 11:47 p.m. UTC | #2
On Mon, Apr 15, 2024 at 11:27:20PM +0000,
"Edgecombe, Rick P" <rick.p.edgecombe@intel.com> wrote:

> Nits only...
> 
> On Wed, 2024-04-10 at 15:07 -0700, isaku.yamahata@intel.com wrote:
> > From: Isaku Yamahata <isaku.yamahata@intel.com>
> > 
> > Adds documentation of KVM_MAP_MEMORY ioctl. [1]
> > 
> > It populates guest memory.  It doesn't do extra operations on the
> > underlying technology-specific initialization [2].  For example,
> > CoCo-related operations won't be performed.  Concretely for TDX, this API
> > won't invoke TDH.MEM.PAGE.ADD() or TDH.MR.EXTEND().  Vendor-specific APIs
> > are required for such operations.
> > 
> > The key point is to adapt of vcpu ioctl instead of VM ioctl.
> 
> Not sure what you are trying to say here.
> 
> >   First,
> > populating guest memory requires vcpu.  If it is VM ioctl, we need to pick
> > one vcpu somehow.  Secondly, vcpu ioctl allows each vcpu to invoke this
> > ioctl in parallel.  It helps to scale regarding guest memory size, e.g.,
> > hundreds of GB.
> 
> I guess you are explaining why this is a vCPU ioctl instead of a KVM ioctl. Is
> this clearer:

Right, I wanted to explain why I chose a vCPU ioctl.  Let me update the commit
message.


> Although the operation is sort of a VM operation, make the ioctl a vCPU ioctl
> instead of KVM ioctl. Do this because a vCPU is needed internally for the fault
> path anyway, and because... (I don't follow the second point).
> 
> > [...]
> > +This population is restricted to the "pure" population without triggering
> > +underlying technology-specific initialization.  For example, CoCo-related
> > +operations won't perform.  In the case of TDX, this API won't invoke
> > +TDH.MEM.PAGE.ADD() or TDH.MR.EXTEND().  Vendor-specific uAPIs are required for
> > +such operations.
> 
> Probably don't want to have TDX bits in here yet. Since it's talking about what
> KVM_MAP_MEMORY is *not* doing, it can just be dropped.

Ok.  Will drop it.
Paolo Bonzini April 17, 2024, 11:56 a.m. UTC | #3
On Tue, Apr 16, 2024 at 1:27 AM Edgecombe, Rick P
<rick.p.edgecombe@intel.com> wrote:
> > +  EAGAIN     The region is only processed partially.  The caller should
> > +             issue the ioctl with the updated parameters when `size` > 0.
> > +  EINTR      An unmasked signal is pending.  The region may be processed
> > +             partially.

The common convention is to only return errno if no page was processed.

> > +KVM_MAP_MEMORY populates guest memory with the range, `base_address` in (L1)
> > +guest physical address(GPA) and `size` in bytes.  `flags` must be zero.  It's
> > +reserved for future use.  When the ioctl returns, the input values are updated
> > +to point to the remaining range.  If `size` > 0 on return, the caller should
> > +issue the ioctl with the updated parameters.
> > +
> > +Multiple vcpus are allowed to call this ioctl simultaneously.  It's not
> > +mandatory for all vcpus to issue this ioctl.  A single vcpu can suffice.
> > +Multiple vcpus invocations are utilized for scalability to process the
> > +population in parallel.  If multiple vcpus call this ioctl in parallel, it may
> > +result in the error of EAGAIN due to race conditions.
> > +
> > +This population is restricted to the "pure" population without triggering
> > +underlying technology-specific initialization.  For example, CoCo-related
> > +operations won't perform.  In the case of TDX, this API won't invoke
> > +TDH.MEM.PAGE.ADD() or TDH.MR.EXTEND().  Vendor-specific uAPIs are required for
> > +such operations.
>
> Probably don't want to have TDX bits in here yet. Since it's talking about what
> KVM_MAP_MEMORY is *not* doing, it can just be dropped.

Let's rewrite everything to be more generic:

+KVM_MAP_MEMORY populates guest memory in the page tables of a vCPU.
+When the ioctl returns, the input values are updated to point to the
+remaining range.  If `size` > 0 on return, the caller should
+issue the ioctl again with updated parameters.
+
+In some cases, multiple vCPUs might share the page tables.  In this
+case, if this ioctl is called in parallel for multiple vCPUs, the
+ioctl might return with `size > 0`.
+
+The ioctl may not be supported for all VMs.  You may use
+`KVM_CHECK_EXTENSION` on the VM file descriptor to check if it is
+supported.
+
+`flags` must currently be zero.


Paolo
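Paolo's convention above is to return an errno only if no page was processed. This matches how short reads and writes are usually reported. The sketch below applies that rule under an explicit assumption: a hypothetical per-page callback stands in for KVM's real fault-in path.

The in/out range is advanced page by page. A failure after partial progress is reported only through the updated `base_address`/`size`, never as an error.

The following are illustrative assumptions, not KVM functions or documented behavior:

- `map_memory_range()`, `map_page_fn` and the test stand-in `map_page_budget()`.
- Rejecting non-zero `flags` with -EINVAL.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

struct kvm_memory_mapping {
	uint64_t base_address;
	uint64_t size;
	uint64_t flags;
};

/* Hypothetical per-page step: 0 on success, negative errno on failure. */
typedef int (*map_page_fn)(uint64_t gpa, void *ctx);

/*
 * Populate *m page by page.  Returns 0 if at least one page was processed
 * (the caller sees the remaining range in *m), else the first error.
 */
static int map_memory_range(struct kvm_memory_mapping *m, uint64_t page_size,
			    map_page_fn map_page, void *ctx)
{
	uint64_t done = 0;
	int ret = 0;

	if (m->flags)
		return -EINVAL;		/* assumption: reserved flags rejected */
	assert(page_size > 0 && m->size % page_size == 0);

	while (m->size) {
		ret = map_page(m->base_address, ctx);
		if (ret)
			break;		/* e.g. -EINTR, -EAGAIN, -EFAULT */
		m->base_address += page_size;
		m->size -= page_size;
		done++;
	}
	/* Only report an errno if no page was processed at all. */
	return done ? 0 : ret;
}

/* Test stand-in: maps *budget pages, then fails with -EAGAIN. */
static int map_page_budget(uint64_t gpa, void *ctx)
{
	unsigned int *budget = ctx;

	(void)gpa;
	if (*budget == 0)
		return -EAGAIN;
	(*budget)--;
	return 0;
}
```

With this rule, a caller that sees 0 must still check `size`. Only a call that makes no progress at all surfaces the underlying errno.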

Patch

diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index f0b76ff5030d..6ee3d2b51a2b 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -6352,6 +6352,58 @@  a single guest_memfd file, but the bound ranges must not overlap).
 
 See KVM_SET_USER_MEMORY_REGION2 for additional details.
 
+4.143 KVM_MAP_MEMORY
+------------------------
+
+:Capability: KVM_CAP_MAP_MEMORY
+:Architectures: none
+:Type: vcpu ioctl
+:Parameters: struct kvm_memory_mapping (in/out)
+:Returns: 0 on success, < 0 on error
+
+Errors:
+
+  ========== =============================================================
+  EINVAL     Invalid parameters.
+  EAGAIN     The region was only partially processed.  The caller should
+             reissue the ioctl with the updated parameters if `size` > 0.
+  EINTR      An unmasked signal is pending.  The region may have been
+             partially processed.
+  EFAULT     The parameter address was invalid, or the region specified by
+             `base_address` and `size` is invalid or is not covered by a
+             KVM memory slot.
+  EOPNOTSUPP The operation is not supported.  On x86 it is supported
+             with two-dimensional paging (TDP) but not with the shadow
+             MMU; other architectures do not support it.
+  ========== =============================================================
+
+::
+
+  struct kvm_memory_mapping {
+	__u64 base_address;
+	__u64 size;
+	__u64 flags;
+  };
+
+KVM_MAP_MEMORY populates guest memory for the range that starts at the (L1)
+guest physical address (GPA) `base_address` and spans `size` bytes.  `flags`
+must be zero; it is reserved for future use.  When the ioctl returns, the input
+values are updated to point to the remaining range.  If `size` > 0 on return,
+the caller should issue the ioctl again with the updated parameters.
+
+Multiple vCPUs may call this ioctl simultaneously, but this is not required:
+a single vCPU is sufficient to populate the whole range.  Invoking the ioctl
+from multiple vCPUs lets the population proceed in parallel, which helps it
+scale with the guest memory size.  When multiple vCPUs call this ioctl in
+parallel, racing calls may fail with EAGAIN.
+
+This is a "pure" population that does not trigger any underlying
+technology-specific initialization.  For example, CoCo-related operations are
+not performed.  In the case of TDX, this API does not invoke
+TDH.MEM.PAGE.ADD() or TDH.MR.EXTEND().  Vendor-specific uAPIs are required
+for such operations.
+
+
 5. The kvm_run structure
 ========================