Message ID | 9a060293c9ad9a78f1d8994cfe1311e818e99257.1712785629.git.isaku.yamahata@intel.com (mailing list archive) |
---|---|
State | New, archived |
Series | KVM: Guest Memory Pre-Population API |
Nits only...

On Wed, 2024-04-10 at 15:07 -0700, isaku.yamahata@intel.com wrote:
> From: Isaku Yamahata <isaku.yamahata@intel.com>
>
> Adds documentation of KVM_MAP_MEMORY ioctl. [1]
>
> It populates guest memory. It doesn't do extra operations on the
> underlying technology-specific initialization [2]. For example,
> CoCo-related operations won't be performed. Concretely for TDX, this API
> won't invoke TDH.MEM.PAGE.ADD() or TDH.MR.EXTEND(). Vendor-specific APIs
> are required for such operations.
>
> The key point is to adapt of vcpu ioctl instead of VM ioctl.

Not sure what you are trying to say here.

> First,
> populating guest memory requires vcpu. If it is VM ioctl, we need to pick
> one vcpu somehow. Secondly, vcpu ioctl allows each vcpu to invoke this
> ioctl in parallel. It helps to scale regarding guest memory size, e.g.,
> hundreds of GB.

I guess you are explaining why this is a vCPU ioctl instead of a KVM ioctl. Is
this clearer:

Although the operation is sort of a VM operation, make the ioctl a vCPU ioctl
instead of KVM ioctl. Do this because a vCPU is needed internally for the fault
path anyway, and because... (I don't follow the second point).

>
> [1] https://lore.kernel.org/kvm/Zbrj5WKVgMsUFDtb@google.com/
> [2] https://lore.kernel.org/kvm/Ze-TJh0BBOWm9spT@google.com/
>
> Suggested-by: Sean Christopherson <seanjc@google.com>
> Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
> ---
> v2:
> - Make flags reserved for future use. (Sean, Michael)
> - Clarified the supposed use case. (Kai)
> - Dropped source member of struct kvm_memory_mapping. (Michael)
> - Change the unit from pages to bytes. (Michael)
> ---
>  Documentation/virt/kvm/api.rst | 52 ++++++++++++++++++++++++++++++++++
>  1 file changed, 52 insertions(+)
>
> diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
> index f0b76ff5030d..6ee3d2b51a2b 100644
> --- a/Documentation/virt/kvm/api.rst
> +++ b/Documentation/virt/kvm/api.rst
> @@ -6352,6 +6352,58 @@ a single guest_memfd file, but the bound ranges must
> not overlap).
>
>  See KVM_SET_USER_MEMORY_REGION2 for additional details.
>
> +4.143 KVM_MAP_MEMORY
> +------------------------
> +
> +:Capability: KVM_CAP_MAP_MEMORY
> +:Architectures: none
> +:Type: vcpu ioctl
> +:Parameters: struct kvm_memory_mapping (in/out)
> +:Returns: 0 on success, < 0 on error
> +
> +Errors:
> +
> +  ========== =============================================================
> +  EINVAL     invalid parameters
> +  EAGAIN     The region is only processed partially. The caller should
> +             issue the ioctl with the updated parameters when `size` > 0.
> +  EINTR      An unmasked signal is pending. The region may be processed
> +             partially.
> +  EFAULT     The parameter address was invalid. The specified region
> +             `base_address` and `size` was invalid. The region isn't
> +             covered by KVM memory slot.
> +  EOPNOTSUPP The architecture doesn't support this operation. The x86 two
> +             dimensional paging supports this API. the x86 kvm shadow mmu
> +             doesn't support it. The other arch KVM doesn't support it.
> +  ========== =============================================================
> +
> +::
> +
> +  struct kvm_memory_mapping {
> +        __u64 base_address;
> +        __u64 size;
> +        __u64 flags;
> +  };
> +
> +KVM_MAP_MEMORY populates guest memory with the range, `base_address` in (L1)
> +guest physical address(GPA) and `size` in bytes. `flags` must be zero. It's
> +reserved for future use. When the ioctl returns, the input values are
> updated
> +to point to the remaining range. If `size` > 0 on return, the caller should
> +issue the ioctl with the updated parameters.
> +
> +Multiple vcpus are allowed to call this ioctl simultaneously. It's not
> +mandatory for all vcpus to issue this ioctl. A single vcpu can suffice.
> +Multiple vcpus invocations are utilized for scalability to process the
> +population in parallel. If multiple vcpus call this ioctl in parallel, it
> may
> +result in the error of EAGAIN due to race conditions.
> +
> +This population is restricted to the "pure" population without triggering
> +underlying technology-specific initialization. For example, CoCo-related
> +operations won't perform. In the case of TDX, this API won't invoke
> +TDH.MEM.PAGE.ADD() or TDH.MR.EXTEND(). Vendor-specific uAPIs are required
> for
> +such operations.

Probably don't want to have TDX bits in here yet. Since it's talking about what
KVM_MAP_MEMORY is *not* doing, it can just be dropped.

> +
> +
>  5. The kvm_run structure
>  ========================
>

On Mon, Apr 15, 2024 at 11:27:20PM +0000,
"Edgecombe, Rick P" <rick.p.edgecombe@intel.com> wrote:

> Nits only...
>
> On Wed, 2024-04-10 at 15:07 -0700, isaku.yamahata@intel.com wrote:
> > From: Isaku Yamahata <isaku.yamahata@intel.com>
> >
> > Adds documentation of KVM_MAP_MEMORY ioctl. [1]
> >
> > It populates guest memory. It doesn't do extra operations on the
> > underlying technology-specific initialization [2]. For example,
> > CoCo-related operations won't be performed. Concretely for TDX, this API
> > won't invoke TDH.MEM.PAGE.ADD() or TDH.MR.EXTEND(). Vendor-specific APIs
> > are required for such operations.
> >
> > The key point is to adapt of vcpu ioctl instead of VM ioctl.
>
> Not sure what you are trying to say here.
>
> > First,
> > populating guest memory requires vcpu. If it is VM ioctl, we need to pick
> > one vcpu somehow. Secondly, vcpu ioctl allows each vcpu to invoke this
> > ioctl in parallel. It helps to scale regarding guest memory size, e.g.,
> > hundreds of GB.
>
> I guess you are explaining why this is a vCPU ioctl instead of a KVM ioctl. Is
> this clearer:

Right, I wanted to explain why I chose vCPU ioctl. Let me update the commit
message.

> Although the operation is sort of a VM operation, make the ioctl a vCPU ioctl
> instead of KVM ioctl. Do this because a vCPU is needed internally for the fault
> path anyway, and because... (I don't follow the second point).
>
> >
> > [1] https://lore.kernel.org/kvm/Zbrj5WKVgMsUFDtb@google.com/
> > [2] https://lore.kernel.org/kvm/Ze-TJh0BBOWm9spT@google.com/
> >
> > Suggested-by: Sean Christopherson <seanjc@google.com>
> > Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
> > ---
> > v2:
> > - Make flags reserved for future use. (Sean, Michael)
> > - Clarified the supposed use case. (Kai)
> > - Dropped source member of struct kvm_memory_mapping. (Michael)
> > - Change the unit from pages to bytes. (Michael)
> > ---
> >  Documentation/virt/kvm/api.rst | 52 ++++++++++++++++++++++++++++++++++
> >  1 file changed, 52 insertions(+)
> >
> > diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
> > index f0b76ff5030d..6ee3d2b51a2b 100644
> > --- a/Documentation/virt/kvm/api.rst
> > +++ b/Documentation/virt/kvm/api.rst
> > @@ -6352,6 +6352,58 @@ a single guest_memfd file, but the bound ranges must
> > not overlap).
> >
> >  See KVM_SET_USER_MEMORY_REGION2 for additional details.
> >
> > +4.143 KVM_MAP_MEMORY
> > +------------------------
> > +
> > +:Capability: KVM_CAP_MAP_MEMORY
> > +:Architectures: none
> > +:Type: vcpu ioctl
> > +:Parameters: struct kvm_memory_mapping (in/out)
> > +:Returns: 0 on success, < 0 on error
> > +
> > +Errors:
> > +
> > +  ========== =============================================================
> > +  EINVAL     invalid parameters
> > +  EAGAIN     The region is only processed partially. The caller should
> > +             issue the ioctl with the updated parameters when `size` > 0.
> > +  EINTR      An unmasked signal is pending. The region may be processed
> > +             partially.
> > +  EFAULT     The parameter address was invalid. The specified region
> > +             `base_address` and `size` was invalid. The region isn't
> > +             covered by KVM memory slot.
> > +  EOPNOTSUPP The architecture doesn't support this operation. The x86 two
> > +             dimensional paging supports this API. the x86 kvm shadow mmu
> > +             doesn't support it. The other arch KVM doesn't support it.
> > +  ========== =============================================================
> > +
> > +::
> > +
> > +  struct kvm_memory_mapping {
> > +        __u64 base_address;
> > +        __u64 size;
> > +        __u64 flags;
> > +  };
> > +
> > +KVM_MAP_MEMORY populates guest memory with the range, `base_address` in (L1)
> > +guest physical address(GPA) and `size` in bytes. `flags` must be zero. It's
> > +reserved for future use. When the ioctl returns, the input values are
> > updated
> > +to point to the remaining range. If `size` > 0 on return, the caller should
> > +issue the ioctl with the updated parameters.
> > +
> > +Multiple vcpus are allowed to call this ioctl simultaneously. It's not
> > +mandatory for all vcpus to issue this ioctl. A single vcpu can suffice.
> > +Multiple vcpus invocations are utilized for scalability to process the
> > +population in parallel. If multiple vcpus call this ioctl in parallel, it
> > may
> > +result in the error of EAGAIN due to race conditions.
> > +
> > +This population is restricted to the "pure" population without triggering
> > +underlying technology-specific initialization. For example, CoCo-related
> > +operations won't perform. In the case of TDX, this API won't invoke
> > +TDH.MEM.PAGE.ADD() or TDH.MR.EXTEND(). Vendor-specific uAPIs are required
> > for
> > +such operations.
>
> Probably don't want to have TDX bits in here yet. Since it's talking about what
> KVM_MAP_MEMORY is *not* doing, it can just be dropped.

Ok. Will drop it.

On Tue, Apr 16, 2024 at 1:27 AM Edgecombe, Rick P
<rick.p.edgecombe@intel.com> wrote:
> > +  EAGAIN     The region is only processed partially. The caller should
> > +             issue the ioctl with the updated parameters when `size` > 0.
> > +  EINTR      An unmasked signal is pending. The region may be processed
> > +             partially.

The common convention is to only return errno if no page was processed.

> > +KVM_MAP_MEMORY populates guest memory with the range, `base_address` in (L1)
> > +guest physical address(GPA) and `size` in bytes. `flags` must be zero. It's
> > +reserved for future use. When the ioctl returns, the input values are
> > updated
> > +to point to the remaining range. If `size` > 0 on return, the caller should
> > +issue the ioctl with the updated parameters.
> > +
> > +Multiple vcpus are allowed to call this ioctl simultaneously. It's not
> > +mandatory for all vcpus to issue this ioctl. A single vcpu can suffice.
> > +Multiple vcpus invocations are utilized for scalability to process the
> > +population in parallel. If multiple vcpus call this ioctl in parallel, it
> > may
> > +result in the error of EAGAIN due to race conditions.
> > +
> > +This population is restricted to the "pure" population without triggering
> > +underlying technology-specific initialization. For example, CoCo-related
> > +operations won't perform. In the case of TDX, this API won't invoke
> > +TDH.MEM.PAGE.ADD() or TDH.MR.EXTEND(). Vendor-specific uAPIs are required
> > for
> > +such operations.
>
> Probably don't want to have TDX bits in here yet. Since it's talking about what
> KVM_MAP_MEMORY is *not* doing, it can just be dropped.

Let's rewrite everything to be more generic:

+KVM_MAP_MEMORY populates guest memory in the page tables of a vCPU.
+When the ioctl returns, the input values are updated to point to the
+remaining range. If `size` > 0 on return, the caller should
+issue the ioctl again with updated parameters.
+
+In some cases, multiple vCPUs might share the page tables. In this
+case, if this ioctl is called in parallel for multiple vCPUs the
+ioctl might return with `size > 0`.
+
+The ioctl may not be supported for all VMs. You may use
+`KVM_CHECK_EXTENSION` on the VM file descriptor to check if it is
+supported.
+
+`flags` must currently be zero.

Paolo

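The caller-side contract discussed in this thread (re-issue the call with the updated parameters while `size` > 0) can be sketched in C as below. This is only an illustration under stated assumptions: the struct layout is copied from the quoted patch, while `map_memory_fn`, `populate_range()` and `fake_map_memory()` are hypothetical names introduced here. The ioctl request number is not part of the quoted text, so the actual call is hidden behind a function pointer, and a stub stands in for the kernel so the loop can be exercised without /dev/kvm. The stub models the v2 wording (EAGAIN on partial progress); the loop also works if only `size` is left non-zero on a successful return.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Layout copied from the patch under review; not taken from a released header. */
struct kvm_memory_mapping {
	uint64_t base_address;	/* GPA; advanced past the populated part on return */
	uint64_t size;		/* bytes still to populate; updated on return */
	uint64_t flags;		/* must be zero */
};

/*
 * Hypothetical wrapper type. A real caller would pass a function that issues
 * the vCPU ioctl on vcpu_fd with a pointer to m.
 */
typedef int (*map_memory_fn)(int vcpu_fd, struct kvm_memory_mapping *m);

/* Re-issue the call until the range is fully populated or a hard error occurs. */
int populate_range(int vcpu_fd, uint64_t gpa, uint64_t size,
		   map_memory_fn do_map)
{
	struct kvm_memory_mapping m = {
		.base_address = gpa,
		.size = size,
		.flags = 0,
	};

	assert(do_map != NULL);
	while (m.size > 0) {
		if (do_map(vcpu_fd, &m) < 0 &&
		    errno != EAGAIN && errno != EINTR)
			return -errno;
		/* On success, EAGAIN or EINTR, m describes the remaining range. */
	}
	return 0;
}

/*
 * Stand-in for the ioctl (assumption, for illustration only): populates at
 * most 2 MiB per call and reports EAGAIN while part of the range remains.
 */
int fake_map_memory(int vcpu_fd, struct kvm_memory_mapping *m)
{
	const uint64_t chunk = 2ull << 20;
	uint64_t done = m->size < chunk ? m->size : chunk;

	(void)vcpu_fd;
	if (m->flags != 0) {
		errno = EINVAL;
		return -1;
	}
	m->base_address += done;
	m->size -= done;
	if (m->size > 0) {
		errno = EAGAIN;
		return -1;
	}
	return 0;
}
```

For example, `populate_range(vcpu_fd, 0x100000000ull, 5ull << 20, fake_map_memory)` takes three calls to the stub before returning 0. A production loop would probably also bound the number of retries, since this sketch spins for as long as EINTR or EAGAIN keeps being reported.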
diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index f0b76ff5030d..6ee3d2b51a2b 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -6352,6 +6352,58 @@ a single guest_memfd file, but the bound ranges must not overlap).
 
 See KVM_SET_USER_MEMORY_REGION2 for additional details.
 
+4.143 KVM_MAP_MEMORY
+------------------------
+
+:Capability: KVM_CAP_MAP_MEMORY
+:Architectures: none
+:Type: vcpu ioctl
+:Parameters: struct kvm_memory_mapping (in/out)
+:Returns: 0 on success, < 0 on error
+
+Errors:
+
+  ========== =============================================================
+  EINVAL     invalid parameters
+  EAGAIN     The region is only processed partially. The caller should
+             issue the ioctl with the updated parameters when `size` > 0.
+  EINTR      An unmasked signal is pending. The region may be processed
+             partially.
+  EFAULT     The parameter address was invalid. The specified region
+             `base_address` and `size` was invalid. The region isn't
+             covered by KVM memory slot.
+  EOPNOTSUPP The architecture doesn't support this operation. The x86 two
+             dimensional paging supports this API. the x86 kvm shadow mmu
+             doesn't support it. The other arch KVM doesn't support it.
+  ========== =============================================================
+
+::
+
+  struct kvm_memory_mapping {
+        __u64 base_address;
+        __u64 size;
+        __u64 flags;
+  };
+
+KVM_MAP_MEMORY populates guest memory with the range, `base_address` in (L1)
+guest physical address(GPA) and `size` in bytes. `flags` must be zero. It's
+reserved for future use. When the ioctl returns, the input values are updated
+to point to the remaining range. If `size` > 0 on return, the caller should
+issue the ioctl with the updated parameters.
+
+Multiple vcpus are allowed to call this ioctl simultaneously. It's not
+mandatory for all vcpus to issue this ioctl. A single vcpu can suffice.
+Multiple vcpus invocations are utilized for scalability to process the
+population in parallel. If multiple vcpus call this ioctl in parallel, it may
+result in the error of EAGAIN due to race conditions.
+
+This population is restricted to the "pure" population without triggering
+underlying technology-specific initialization. For example, CoCo-related
+operations won't perform. In the case of TDX, this API won't invoke
+TDH.MEM.PAGE.ADD() or TDH.MR.EXTEND(). Vendor-specific uAPIs are required for
+such operations.
+
+
 5. The kvm_run structure
 ========================
 
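The patch text says multiple vCPUs may issue the ioctl in parallel for scalability but does not say how userspace should divide the work. One possible approach, sketched here in C purely as an assumption, is to give each vCPU thread a disjoint, page-aligned slice of the GPA range so the threads do not contend for the same pages. The names `gpa_range`, `slice_for_vcpu()` and `SKETCH_PAGE_SIZE` are hypothetical, and the 4 KiB page size is an assumption; none of them come from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 4 KiB base pages; adjust for the host in use. */
#define SKETCH_PAGE_SIZE 4096ull

struct gpa_range {
	uint64_t base;	/* first GPA of the slice */
	uint64_t size;	/* slice length in bytes, possibly 0 */
};

/*
 * Return the idx-th of nr_vcpus disjoint, page-aligned slices that together
 * cover [base, base + size). Trailing slices may be empty when there are
 * fewer pages than vCPUs.
 */
struct gpa_range slice_for_vcpu(uint64_t base, uint64_t size,
				unsigned int idx, unsigned int nr_vcpus)
{
	uint64_t pages, per_vcpu, first, last;
	struct gpa_range r;

	assert(nr_vcpus > 0 && idx < nr_vcpus);
	assert(base % SKETCH_PAGE_SIZE == 0 && size % SKETCH_PAGE_SIZE == 0);

	pages = size / SKETCH_PAGE_SIZE;
	per_vcpu = (pages + nr_vcpus - 1) / nr_vcpus;	/* round up */

	first = per_vcpu * idx;
	if (first > pages)
		first = pages;
	last = first + per_vcpu;
	if (last > pages)
		last = pages;

	r.base = base + first * SKETCH_PAGE_SIZE;
	r.size = (last - first) * SKETCH_PAGE_SIZE;
	return r;
}
```

Each vCPU thread would then populate only its own slice and skip the call when the slice is empty. Per the review comments above, a call may still return with `size` > 0 when vCPUs share page tables, so every thread still needs the re-issue loop.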