Message ID | 20241212032628.475976-1-binbin.wu@linux.intel.com (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | i386/kvm: Set return value after handling KVM_EXIT_HYPERCALL | expand |
On Thu, Dec 12, 2024 at 11:26:28AM +0800, Binbin Wu wrote: > Userspace should set the ret field of hypercall after handling > KVM_EXIT_HYPERCALL. Otherwise, a stale value could be returned to KVM. > > Fixes: 47e76d03b15 ("i386/kvm: Add KVM_EXIT_HYPERCALL handling for KVM_HC_MAP_GPA_RANGE") > Reported-by: Farrah Chen <farrah.chen@intel.com> > Signed-off-by: Binbin Wu <binbin.wu@linux.intel.com> > Tested-by: Farrah Chen <farrah.chen@intel.com> > --- > To test the TDX code in kvm-coco-queue, please apply the patch to the QEMU, > otherwise, TDX guest boot could fail. > A matching QEMU tree including this patch is here: > https://github.com/intel-staging/qemu-tdx/releases/tag/tdx-qemu-upstream-v6.1-fix_kvm_hypercall_return_value > > Previously, the issue was not triggered because no one would modify the ret > value. But with the refactor patch for __kvm_emulate_hypercall() in KVM, > https://lore.kernel.org/kvm/20241128004344.4072099-7-seanjc@google.com/, the > value could be modified. > --- > target/i386/kvm/kvm.c | 8 ++++++-- > 1 file changed, 6 insertions(+), 2 deletions(-) > > diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c > index 8e17942c3b..4bcccb48d1 100644 > --- a/target/i386/kvm/kvm.c > +++ b/target/i386/kvm/kvm.c > @@ -6005,10 +6005,14 @@ static int kvm_handle_hc_map_gpa_range(struct kvm_run *run) > > static int kvm_handle_hypercall(struct kvm_run *run) > { > + int ret = -EINVAL; > + > if (run->hypercall.nr == KVM_HC_MAP_GPA_RANGE) > - return kvm_handle_hc_map_gpa_range(run); > + ret = kvm_handle_hc_map_gpa_range(run); LGTM to the issue it tries to fix :-) > + > + run->hypercall.ret = ret; > > - return -EINVAL; > + return ret; > } > > #define VMX_INVALID_GUEST_STATE 0x80000021 > > base-commit: ae35f033b874c627d81d51070187fbf55f0bf1a7 > -- > 2.46.0 >
On 12/12/2024 11:26 AM, Binbin Wu wrote: > Userspace should set the ret field of hypercall after handling > KVM_EXIT_HYPERCALL. Otherwise, a stale value could be returned to KVM. > > Fixes: 47e76d03b15 ("i386/kvm: Add KVM_EXIT_HYPERCALL handling for KVM_HC_MAP_GPA_RANGE") > Reported-by: Farrah Chen <farrah.chen@intel.com> > Signed-off-by: Binbin Wu <binbin.wu@linux.intel.com> > Tested-by: Farrah Chen <farrah.chen@intel.com> > --- > To test the TDX code in kvm-coco-queue, please apply the patch to the QEMU, > otherwise, TDX guest boot could fail. > A matching QEMU tree including this patch is here: > https://github.com/intel-staging/qemu-tdx/releases/tag/tdx-qemu-upstream-v6.1-fix_kvm_hypercall_return_value > > Previously, the issue was not triggered because no one would modify the ret > value. But with the refactor patch for __kvm_emulate_hypercall() in KVM, > https://lore.kernel.org/kvm/20241128004344.4072099-7-seanjc@google.com/, the > value could be modified. > --- > target/i386/kvm/kvm.c | 8 ++++++-- > 1 file changed, 6 insertions(+), 2 deletions(-) > > diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c > index 8e17942c3b..4bcccb48d1 100644 > --- a/target/i386/kvm/kvm.c > +++ b/target/i386/kvm/kvm.c > @@ -6005,10 +6005,14 @@ static int kvm_handle_hc_map_gpa_range(struct kvm_run *run) > > static int kvm_handle_hypercall(struct kvm_run *run) > { > + int ret = -EINVAL; > + > if (run->hypercall.nr == KVM_HC_MAP_GPA_RANGE) > - return kvm_handle_hc_map_gpa_range(run); > + ret = kvm_handle_hc_map_gpa_range(run); > + > + run->hypercall.ret = ret; Updating run->hypercall.ret is useful only when QEMU needs to re-enter the guest. For the case of ret < 0, QEMU will stop the vcpu. I think we might need re-think on the handling of KVM_EXIT_HYPERCALL. E.g., in what error case should QEMU stop the vcpu, and in what case can QEMU return the error back to the guest via run->hypercall.ret. > - return -EINVAL; > + return ret; > } > > #define VMX_INVALID_GUEST_STATE 0x80000021 > > base-commit: ae35f033b874c627d81d51070187fbf55f0bf1a7
On 12/12/2024 11:44 AM, Xiaoyao Li wrote: > On 12/12/2024 11:26 AM, Binbin Wu wrote: >> Userspace should set the ret field of hypercall after handling >> KVM_EXIT_HYPERCALL. Otherwise, a stale value could be returned to KVM. >> >> Fixes: 47e76d03b15 ("i386/kvm: Add KVM_EXIT_HYPERCALL handling for KVM_HC_MAP_GPA_RANGE") >> Reported-by: Farrah Chen <farrah.chen@intel.com> >> Signed-off-by: Binbin Wu <binbin.wu@linux.intel.com> >> Tested-by: Farrah Chen <farrah.chen@intel.com> >> --- >> To test the TDX code in kvm-coco-queue, please apply the patch to the QEMU, >> otherwise, TDX guest boot could fail. >> A matching QEMU tree including this patch is here: >> https://github.com/intel-staging/qemu-tdx/releases/tag/tdx-qemu-upstream-v6.1-fix_kvm_hypercall_return_value >> >> Previously, the issue was not triggered because no one would modify the ret >> value. But with the refactor patch for __kvm_emulate_hypercall() in KVM, >> https://lore.kernel.org/kvm/20241128004344.4072099-7-seanjc@google.com/, the >> value could be modified. >> --- >> target/i386/kvm/kvm.c | 8 ++++++-- >> 1 file changed, 6 insertions(+), 2 deletions(-) >> >> diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c >> index 8e17942c3b..4bcccb48d1 100644 >> --- a/target/i386/kvm/kvm.c >> +++ b/target/i386/kvm/kvm.c >> @@ -6005,10 +6005,14 @@ static int kvm_handle_hc_map_gpa_range(struct kvm_run *run) >> static int kvm_handle_hypercall(struct kvm_run *run) >> { >> + int ret = -EINVAL; >> + >> if (run->hypercall.nr == KVM_HC_MAP_GPA_RANGE) >> - return kvm_handle_hc_map_gpa_range(run); >> + ret = kvm_handle_hc_map_gpa_range(run); >> + >> + run->hypercall.ret = ret; > > Updating run->hypercall.ret is useful only when QEMU needs to re-enter the guest. For the case of ret < 0, QEMU will stop the vcpu. IMHO, assign run->hypercall.ret anyway should be OK, no need to add a per-condition on ret, although the value is not used when ret < 0. Currently, since QEMU will stop the vcpu when ret < 0, this patch doesn't convert ret to -Exxx that the ABI expects. > > I think we might need re-think on the handling of KVM_EXIT_HYPERCALL. E.g., in what error case should QEMU stop the vcpu, and in what case can QEMU return the error back to the guest via run->hypercall.ret. Actually, I had the similar question before. https://lore.kernel.org/kvm/d25cc62c-0f56-4be2-968a-63c8b1d63b5a@linux.intel.com/ It might depends on the hypercall number? Another option is QEMU always sets run->hypercall.ret appropriately and continues the vcpu thread. > >> - return -EINVAL; >> + return ret; >> } >> #define VMX_INVALID_GUEST_STATE 0x80000021 >> >> base-commit: ae35f033b874c627d81d51070187fbf55f0bf1a7 >
On 12/12/2024 1:18 PM, Binbin Wu wrote: > > > > On 12/12/2024 11:44 AM, Xiaoyao Li wrote: >> On 12/12/2024 11:26 AM, Binbin Wu wrote: >>> Userspace should set the ret field of hypercall after handling >>> KVM_EXIT_HYPERCALL. Otherwise, a stale value could be returned to KVM. >>> >>> Fixes: 47e76d03b15 ("i386/kvm: Add KVM_EXIT_HYPERCALL handling for >>> KVM_HC_MAP_GPA_RANGE") >>> Reported-by: Farrah Chen <farrah.chen@intel.com> >>> Signed-off-by: Binbin Wu <binbin.wu@linux.intel.com> >>> Tested-by: Farrah Chen <farrah.chen@intel.com> >>> --- >>> To test the TDX code in kvm-coco-queue, please apply the patch to the >>> QEMU, >>> otherwise, TDX guest boot could fail. >>> A matching QEMU tree including this patch is here: >>> https://github.com/intel-staging/qemu-tdx/releases/tag/tdx-qemu- >>> upstream-v6.1-fix_kvm_hypercall_return_value >>> >>> Previously, the issue was not triggered because no one would modify >>> the ret >>> value. But with the refactor patch for __kvm_emulate_hypercall() in KVM, >>> https://lore.kernel.org/kvm/20241128004344.4072099-7- >>> seanjc@google.com/, the >>> value could be modified. >>> --- >>> target/i386/kvm/kvm.c | 8 ++++++-- >>> 1 file changed, 6 insertions(+), 2 deletions(-) >>> >>> diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c >>> index 8e17942c3b..4bcccb48d1 100644 >>> --- a/target/i386/kvm/kvm.c >>> +++ b/target/i386/kvm/kvm.c >>> @@ -6005,10 +6005,14 @@ static int kvm_handle_hc_map_gpa_range(struct >>> kvm_run *run) >>> static int kvm_handle_hypercall(struct kvm_run *run) >>> { >>> + int ret = -EINVAL; >>> + >>> if (run->hypercall.nr == KVM_HC_MAP_GPA_RANGE) >>> - return kvm_handle_hc_map_gpa_range(run); >>> + ret = kvm_handle_hc_map_gpa_range(run); >>> + >>> + run->hypercall.ret = ret; >> >> Updating run->hypercall.ret is useful only when QEMU needs to re-enter >> the guest. For the case of ret < 0, QEMU will stop the vcpu. > > IMHO, assign run->hypercall.ret anyway should be OK, no need to add a > per-condition on ret, although the value is not used when ret < 0. > > Currently, since QEMU will stop the vcpu when ret < 0, this patch doesn't > convert ret to -Exxx that the ABI expects. I was thinking if it is better to let each specific handling function to update run->hypercall.ret with its own logic. E.g., for this case, let kvm_handle_hc_map_gpa_range() to update the run->hypercall.ret. Reusing the return value of the handling function to update run->hypercall.ret seems not logically correct to me. >> >> I think we might need re-think on the handling of KVM_EXIT_HYPERCALL. >> E.g., in what error case should QEMU stop the vcpu, and in what case >> can QEMU return the error back to the guest via run->hypercall.ret. > > Actually, I had the similar question before. > https://lore.kernel.org/kvm/ > d25cc62c-0f56-4be2-968a-63c8b1d63b5a@linux.intel.com/ > > It might depends on the hypercall number? > Another option is QEMU always sets run->hypercall.ret appropriately and > continues the vcpu thread. > > >> >>> - return -EINVAL; >>> + return ret; >>> } >>> #define VMX_INVALID_GUEST_STATE 0x80000021 >>> >>> base-commit: ae35f033b874c627d81d51070187fbf55f0bf1a7 >> >
On 12/12/2024 3:09 PM, Xiaoyao Li wrote: > On 12/12/2024 1:18 PM, Binbin Wu wrote: >> >> >> >> On 12/12/2024 11:44 AM, Xiaoyao Li wrote: >>> On 12/12/2024 11:26 AM, Binbin Wu wrote: >>>> Userspace should set the ret field of hypercall after handling >>>> KVM_EXIT_HYPERCALL. Otherwise, a stale value could be returned to KVM. >>>> >>>> Fixes: 47e76d03b15 ("i386/kvm: Add KVM_EXIT_HYPERCALL handling for KVM_HC_MAP_GPA_RANGE") >>>> Reported-by: Farrah Chen <farrah.chen@intel.com> >>>> Signed-off-by: Binbin Wu <binbin.wu@linux.intel.com> >>>> Tested-by: Farrah Chen <farrah.chen@intel.com> >>>> --- >>>> To test the TDX code in kvm-coco-queue, please apply the patch to the QEMU, >>>> otherwise, TDX guest boot could fail. >>>> A matching QEMU tree including this patch is here: >>>> https://github.com/intel-staging/qemu-tdx/releases/tag/tdx-qemu- upstream-v6.1-fix_kvm_hypercall_return_value >>>> >>>> Previously, the issue was not triggered because no one would modify the ret >>>> value. But with the refactor patch for __kvm_emulate_hypercall() in KVM, >>>> https://lore.kernel.org/kvm/20241128004344.4072099-7- seanjc@google.com/, the >>>> value could be modified. >>>> --- >>>> target/i386/kvm/kvm.c | 8 ++++++-- >>>> 1 file changed, 6 insertions(+), 2 deletions(-) >>>> >>>> diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c >>>> index 8e17942c3b..4bcccb48d1 100644 >>>> --- a/target/i386/kvm/kvm.c >>>> +++ b/target/i386/kvm/kvm.c >>>> @@ -6005,10 +6005,14 @@ static int kvm_handle_hc_map_gpa_range(struct kvm_run *run) >>>> static int kvm_handle_hypercall(struct kvm_run *run) >>>> { >>>> + int ret = -EINVAL; >>>> + >>>> if (run->hypercall.nr == KVM_HC_MAP_GPA_RANGE) >>>> - return kvm_handle_hc_map_gpa_range(run); >>>> + ret = kvm_handle_hc_map_gpa_range(run); >>>> + >>>> + run->hypercall.ret = ret; >>> >>> Updating run->hypercall.ret is useful only when QEMU needs to re-enter the guest. For the case of ret < 0, QEMU will stop the vcpu. >> >> IMHO, assign run->hypercall.ret anyway should be OK, no need to add a >> per-condition on ret, although the value is not used when ret < 0. >> >> Currently, since QEMU will stop the vcpu when ret < 0, this patch doesn't >> convert ret to -Exxx that the ABI expects. > > I was thinking if it is better to let each specific handling function to update run->hypercall.ret with its own logic. E.g., for this case, let kvm_handle_hc_map_gpa_range() to update the run->hypercall.ret. I think it makes sense. Also, each handling function can decide whether the vcpu should continue if the handling failed. - Return 0 and set the error code ( 0 or -Exxx) to run->hypercall.ret if it want to continue. - Return negative value if it want to stop the vcpu thread. > > Reusing the return value of the handling function to update > run->hypercall.ret seems not logically correct to me. > >>> >>> I think we might need re-think on the handling of KVM_EXIT_HYPERCALL. E.g., in what error case should QEMU stop the vcpu, and in what case can QEMU return the error back to the guest via run->hypercall.ret. >> >> Actually, I had the similar question before. >> https://lore.kernel.org/kvm/ d25cc62c-0f56-4be2-968a-63c8b1d63b5a@linux.intel.com/ >> >> It might depends on the hypercall number? >> Another option is QEMU always sets run->hypercall.ret appropriately and continues the vcpu thread. >> >> >>> >>>> - return -EINVAL; >>>> + return ret; >>>> } >>>> #define VMX_INVALID_GUEST_STATE 0x80000021 >>>> >>>> base-commit: ae35f033b874c627d81d51070187fbf55f0bf1a7 >>> >> >
On Thu, Dec 12, 2024 at 11:26:28AM +0800, Binbin Wu wrote: > Date: Thu, 12 Dec 2024 11:26:28 +0800 > From: Binbin Wu <binbin.wu@linux.intel.com> > Subject: [PATCH] i386/kvm: Set return value after handling > KVM_EXIT_HYPERCALL > X-Mailer: git-send-email 2.46.0 > > Userspace should set the ret field of hypercall after handling > KVM_EXIT_HYPERCALL. Otherwise, a stale value could be returned to KVM. > > Fixes: 47e76d03b15 ("i386/kvm: Add KVM_EXIT_HYPERCALL handling for KVM_HC_MAP_GPA_RANGE") > Reported-by: Farrah Chen <farrah.chen@intel.com> > Signed-off-by: Binbin Wu <binbin.wu@linux.intel.com> > Tested-by: Farrah Chen <farrah.chen@intel.com> > --- > To test the TDX code in kvm-coco-queue, please apply the patch to the QEMU, > otherwise, TDX guest boot could fail. > A matching QEMU tree including this patch is here: > https://github.com/intel-staging/qemu-tdx/releases/tag/tdx-qemu-upstream-v6.1-fix_kvm_hypercall_return_value > > Previously, the issue was not triggered because no one would modify the ret > value. But with the refactor patch for __kvm_emulate_hypercall() in KVM, > https://lore.kernel.org/kvm/20241128004344.4072099-7-seanjc@google.com/, the > value could be modified. Could you explain the specific reasons here in detail? It would be helpful with debugging or reproducing the issue. > --- > target/i386/kvm/kvm.c | 8 ++++++-- > 1 file changed, 6 insertions(+), 2 deletions(-) > > diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c > index 8e17942c3b..4bcccb48d1 100644 > --- a/target/i386/kvm/kvm.c > +++ b/target/i386/kvm/kvm.c > @@ -6005,10 +6005,14 @@ static int kvm_handle_hc_map_gpa_range(struct kvm_run *run) > > static int kvm_handle_hypercall(struct kvm_run *run) > { > + int ret = -EINVAL; > + > if (run->hypercall.nr == KVM_HC_MAP_GPA_RANGE) > - return kvm_handle_hc_map_gpa_range(run); > + ret = kvm_handle_hc_map_gpa_range(run); > + > + run->hypercall.ret = ret; ret may be negative but hypercall.ret is u64. Do we need to set it to -ret? > - return -EINVAL; > + return ret; > } > > #define VMX_INVALID_GUEST_STATE 0x80000021 > > base-commit: ae35f033b874c627d81d51070187fbf55f0bf1a7 > -- > 2.46.0 > >
On 12/12/24 09:07, Zhao Liu wrote: > On Thu, Dec 12, 2024 at 11:26:28AM +0800, Binbin Wu wrote: >> Date: Thu, 12 Dec 2024 11:26:28 +0800 >> From: Binbin Wu <binbin.wu@linux.intel.com> >> Subject: [PATCH] i386/kvm: Set return value after handling >> KVM_EXIT_HYPERCALL >> X-Mailer: git-send-email 2.46.0 >> >> Userspace should set the ret field of hypercall after handling >> KVM_EXIT_HYPERCALL. Otherwise, a stale value could be returned to KVM. >> >> Fixes: 47e76d03b15 ("i386/kvm: Add KVM_EXIT_HYPERCALL handling for KVM_HC_MAP_GPA_RANGE") >> Reported-by: Farrah Chen <farrah.chen@intel.com> >> Signed-off-by: Binbin Wu <binbin.wu@linux.intel.com> >> Tested-by: Farrah Chen <farrah.chen@intel.com> >> --- >> To test the TDX code in kvm-coco-queue, please apply the patch to the QEMU, >> otherwise, TDX guest boot could fail. >> A matching QEMU tree including this patch is here: >> https://github.com/intel-staging/qemu-tdx/releases/tag/tdx-qemu-upstream-v6.1-fix_kvm_hypercall_return_value >> >> Previously, the issue was not triggered because no one would modify the ret >> value. But with the refactor patch for __kvm_emulate_hypercall() in KVM, >> https://lore.kernel.org/kvm/20241128004344.4072099-7-seanjc@google.com/, the >> value could be modified. > > Could you explain the specific reasons here in detail? It would be > helpful with debugging or reproducing the issue. > >> --- >> target/i386/kvm/kvm.c | 8 ++++++-- >> 1 file changed, 6 insertions(+), 2 deletions(-) >> >> diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c >> index 8e17942c3b..4bcccb48d1 100644 >> --- a/target/i386/kvm/kvm.c >> +++ b/target/i386/kvm/kvm.c >> @@ -6005,10 +6005,14 @@ static int kvm_handle_hc_map_gpa_range(struct kvm_run *run) >> >> static int kvm_handle_hypercall(struct kvm_run *run) >> { >> + int ret = -EINVAL; >> + >> if (run->hypercall.nr == KVM_HC_MAP_GPA_RANGE) >> - return kvm_handle_hc_map_gpa_range(run); >> + ret = kvm_handle_hc_map_gpa_range(run); >> + >> + run->hypercall.ret = ret; > > ret may be negative but hypercall.ret is u64. Do we need to set it to > -ret? If ret is less than zero, will stop the VM anyway as RUN_STATE_INTERNAL_ERROR. If this has to be fixed in QEMU, I think there's no need to set anything if ret != 0; also because kvm_convert_memory() returns -1 on error and that's not how the error would be passed to the guest. However, I think the right fix should simply be this in KVM: diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 83fe0a78146f..e2118ba93ef6 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -10066,6 +10066,7 @@ unsigned long __kvm_emulate_hypercall(struct kvm_vcpu *vcpu, unsigned long nr, } vcpu->run->exit_reason = KVM_EXIT_HYPERCALL; + vcpu->run->ret = 0; vcpu->run->hypercall.nr = KVM_HC_MAP_GPA_RANGE; vcpu->run->hypercall.args[0] = gpa; vcpu->run->hypercall.args[1] = npages; While there is arguably a change in behavior of the kernel both with the patches in kvm-coco-queue and with the above one, _in practice_ the above change is one that userspace will not notice. Paolo >> - return -EINVAL; >> + return ret; >> } >> >> #define VMX_INVALID_GUEST_STATE 0x80000021 >> >> base-commit: ae35f033b874c627d81d51070187fbf55f0bf1a7 >> -- >> 2.46.0 >> >>
On Thu, Dec 12, 2024, Paolo Bonzini wrote: > On 12/12/24 09:07, Zhao Liu wrote: > > On Thu, Dec 12, 2024 at 11:26:28AM +0800, Binbin Wu wrote: > > > Date: Thu, 12 Dec 2024 11:26:28 +0800 > > > From: Binbin Wu <binbin.wu@linux.intel.com> > > > Subject: [PATCH] i386/kvm: Set return value after handling > > > KVM_EXIT_HYPERCALL > > > X-Mailer: git-send-email 2.46.0 > > > > > > Userspace should set the ret field of hypercall after handling > > > KVM_EXIT_HYPERCALL. Otherwise, a stale value could be returned to KVM. > > > > > > Fixes: 47e76d03b15 ("i386/kvm: Add KVM_EXIT_HYPERCALL handling for KVM_HC_MAP_GPA_RANGE") > > > Reported-by: Farrah Chen <farrah.chen@intel.com> > > > Signed-off-by: Binbin Wu <binbin.wu@linux.intel.com> > > > Tested-by: Farrah Chen <farrah.chen@intel.com> > > > --- > > > To test the TDX code in kvm-coco-queue, please apply the patch to the QEMU, > > > otherwise, TDX guest boot could fail. > > > A matching QEMU tree including this patch is here: > > > https://github.com/intel-staging/qemu-tdx/releases/tag/tdx-qemu-upstream-v6.1-fix_kvm_hypercall_return_value > > > > > > Previously, the issue was not triggered because no one would modify the ret > > > value. But with the refactor patch for __kvm_emulate_hypercall() in KVM, > > > https://lore.kernel.org/kvm/20241128004344.4072099-7-seanjc@google.com/, the > > > value could be modified. > > > > Could you explain the specific reasons here in detail? It would be > > helpful with debugging or reproducing the issue. > > > > > --- > > > target/i386/kvm/kvm.c | 8 ++++++-- > > > 1 file changed, 6 insertions(+), 2 deletions(-) > > > > > > diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c > > > index 8e17942c3b..4bcccb48d1 100644 > > > --- a/target/i386/kvm/kvm.c > > > +++ b/target/i386/kvm/kvm.c > > > @@ -6005,10 +6005,14 @@ static int kvm_handle_hc_map_gpa_range(struct kvm_run *run) > > > static int kvm_handle_hypercall(struct kvm_run *run) > > > { > > > + int ret = -EINVAL; > > > + > > > if (run->hypercall.nr == KVM_HC_MAP_GPA_RANGE) > > > - return kvm_handle_hc_map_gpa_range(run); > > > + ret = kvm_handle_hc_map_gpa_range(run); > > > + > > > + run->hypercall.ret = ret; > > > > ret may be negative but hypercall.ret is u64. Do we need to set it to > > -ret? > > If ret is less than zero, will stop the VM anyway as > RUN_STATE_INTERNAL_ERROR. > > If this has to be fixed in QEMU, I think there's no need to set anything > if ret != 0; also because kvm_convert_memory() returns -1 on error and > that's not how the error would be passed to the guest. > > However, I think the right fix should simply be this in KVM: > > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c > index 83fe0a78146f..e2118ba93ef6 100644 > --- a/arch/x86/kvm/x86.c > +++ b/arch/x86/kvm/x86.c > @@ -10066,6 +10066,7 @@ unsigned long __kvm_emulate_hypercall(struct kvm_vcpu *vcpu, unsigned long nr, > } > vcpu->run->exit_reason = KVM_EXIT_HYPERCALL; > + vcpu->run->ret = 0; vcpu->run->hypercall.ret > vcpu->run->hypercall.nr = KVM_HC_MAP_GPA_RANGE; > vcpu->run->hypercall.args[0] = gpa; > vcpu->run->hypercall.args[1] = npages; > > While there is arguably a change in behavior of the kernel both with > the patches in kvm-coco-queue and with the above one, _in practice_ > the above change is one that userspace will not notice. I agree that KVM should initialize "ret", but I don't think '0' is the right value. KVM shouldn't assume userspace will successfully handle the hypercall. What happens if KVM sets vcpu->run->hypercall.ret to a non-zero value, e.g. -KVM_ENOSYS?
On 12/12/24 20:13, Sean Christopherson wrote: > On Thu, Dec 12, 2024, Paolo Bonzini wrote: >> On 12/12/24 09:07, Zhao Liu wrote: >>> On Thu, Dec 12, 2024 at 11:26:28AM +0800, Binbin Wu wrote: >>>> Date: Thu, 12 Dec 2024 11:26:28 +0800 >>>> From: Binbin Wu <binbin.wu@linux.intel.com> >>>> Subject: [PATCH] i386/kvm: Set return value after handling >>>> KVM_EXIT_HYPERCALL >>>> X-Mailer: git-send-email 2.46.0 >>>> >>>> Userspace should set the ret field of hypercall after handling >>>> KVM_EXIT_HYPERCALL. Otherwise, a stale value could be returned to KVM. >>>> >>>> Fixes: 47e76d03b15 ("i386/kvm: Add KVM_EXIT_HYPERCALL handling for KVM_HC_MAP_GPA_RANGE") >>>> Reported-by: Farrah Chen <farrah.chen@intel.com> >>>> Signed-off-by: Binbin Wu <binbin.wu@linux.intel.com> >>>> Tested-by: Farrah Chen <farrah.chen@intel.com> >>>> --- >>>> To test the TDX code in kvm-coco-queue, please apply the patch to the QEMU, >>>> otherwise, TDX guest boot could fail. >>>> A matching QEMU tree including this patch is here: >>>> https://github.com/intel-staging/qemu-tdx/releases/tag/tdx-qemu-upstream-v6.1-fix_kvm_hypercall_return_value >>>> >>>> Previously, the issue was not triggered because no one would modify the ret >>>> value. But with the refactor patch for __kvm_emulate_hypercall() in KVM, >>>> https://lore.kernel.org/kvm/20241128004344.4072099-7-seanjc@google.com/, the >>>> value could be modified. >>> >>> Could you explain the specific reasons here in detail? It would be >>> helpful with debugging or reproducing the issue. >>> >>>> --- >>>> target/i386/kvm/kvm.c | 8 ++++++-- >>>> 1 file changed, 6 insertions(+), 2 deletions(-) >>>> >>>> diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c >>>> index 8e17942c3b..4bcccb48d1 100644 >>>> --- a/target/i386/kvm/kvm.c >>>> +++ b/target/i386/kvm/kvm.c >>>> @@ -6005,10 +6005,14 @@ static int kvm_handle_hc_map_gpa_range(struct kvm_run *run) >>>> static int kvm_handle_hypercall(struct kvm_run *run) >>>> { >>>> + int ret = -EINVAL; >>>> + >>>> if (run->hypercall.nr == KVM_HC_MAP_GPA_RANGE) >>>> - return kvm_handle_hc_map_gpa_range(run); >>>> + ret = kvm_handle_hc_map_gpa_range(run); >>>> + >>>> + run->hypercall.ret = ret; >>> >>> ret may be negative but hypercall.ret is u64. Do we need to set it to >>> -ret? >> >> If ret is less than zero, will stop the VM anyway as >> RUN_STATE_INTERNAL_ERROR. >> >> If this has to be fixed in QEMU, I think there's no need to set anything >> if ret != 0; also because kvm_convert_memory() returns -1 on error and >> that's not how the error would be passed to the guest. >> >> However, I think the right fix should simply be this in KVM: >> >> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c >> index 83fe0a78146f..e2118ba93ef6 100644 >> --- a/arch/x86/kvm/x86.c >> +++ b/arch/x86/kvm/x86.c >> @@ -10066,6 +10066,7 @@ unsigned long __kvm_emulate_hypercall(struct kvm_vcpu *vcpu, unsigned long nr, >> } >> vcpu->run->exit_reason = KVM_EXIT_HYPERCALL; >> + vcpu->run->ret = 0; > > vcpu->run->hypercall.ret > >> vcpu->run->hypercall.nr = KVM_HC_MAP_GPA_RANGE; >> vcpu->run->hypercall.args[0] = gpa; >> vcpu->run->hypercall.args[1] = npages; >> >> While there is arguably a change in behavior of the kernel both with >> the patches in kvm-coco-queue and with the above one, _in practice_ >> the above change is one that userspace will not notice. > > I agree that KVM should initialize "ret", but I don't think '0' is the right > value. KVM shouldn't assume userspace will successfully handle the hypercall. > What happens if KVM sets vcpu->run->hypercall.ret to a non-zero value, e.g. -KVM_ENOSYS? Unfortunately QEMU is never writing vcpu->run->hypercall.ret, so the guest sees -KVM_ENOSYS; this is basically the same bug that Binbin is fixing, just with a different value passed to the guest. In other words, the above one-liner is pulling the "don't break userspace" card. Paolo
On Thu, Dec 12, 2024, Paolo Bonzini wrote: > On 12/12/24 20:13, Sean Christopherson wrote: > > On Thu, Dec 12, 2024, Paolo Bonzini wrote: > > > If ret is less than zero, will stop the VM anyway as > > > RUN_STATE_INTERNAL_ERROR. > > > > > > If this has to be fixed in QEMU, I think there's no need to set anything > > > if ret != 0; also because kvm_convert_memory() returns -1 on error and > > > that's not how the error would be passed to the guest. > > > > > > However, I think the right fix should simply be this in KVM: > > > > > > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c > > > index 83fe0a78146f..e2118ba93ef6 100644 > > > --- a/arch/x86/kvm/x86.c > > > +++ b/arch/x86/kvm/x86.c > > > @@ -10066,6 +10066,7 @@ unsigned long __kvm_emulate_hypercall(struct kvm_vcpu *vcpu, unsigned long nr, > > > } > > > vcpu->run->exit_reason = KVM_EXIT_HYPERCALL; > > > + vcpu->run->ret = 0; > > > > vcpu->run->hypercall.ret > > > > > vcpu->run->hypercall.nr = KVM_HC_MAP_GPA_RANGE; > > > vcpu->run->hypercall.args[0] = gpa; > > > vcpu->run->hypercall.args[1] = npages; > > > > > > While there is arguably a change in behavior of the kernel both with > > > the patches in kvm-coco-queue and with the above one, _in practice_ > > > the above change is one that userspace will not notice. > > > > I agree that KVM should initialize "ret", but I don't think '0' is the right > > value. KVM shouldn't assume userspace will successfully handle the hypercall. > > What happens if KVM sets vcpu->run->hypercall.ret to a non-zero value, e.g. -KVM_ENOSYS? > > Unfortunately QEMU is never writing vcpu->run->hypercall.ret, so the guest > sees -KVM_ENOSYS; this is basically the same bug that Binbin is fixing, just > with a different value passed to the guest. > > In other words, the above one-liner is pulling the "don't break userspace" > card. But how is anything breaking userspace? QEMU needs to opt-in to intercepting KVM_HC_MAP_GPA_RANGE, and this has been KVM's behavior since commit 0dbb11230437 ("KVM: X86: Introduce KVM_HC_MAP_GPA_RANGE hypercall"). Ah, "ret" happens to be deep in the union and KVM zero allocates vcpu->run, so QEMU gets lucky and "ret" happens to be zero because no other non-fatal userspace exit on x86 happens to need as many bytes. Hilarious. FWIW, if TDX marshalls hypercall state into KVM's "normal" registers, then KVM's shenanigans with vcpu->run->hypercall.ret might go away? Though regardless of what happens on that front, I think it makes to explicitly initialize "ret" to *something*. I checked our VMM, and it does the right thing, so I don't have any objection to explicitly zeroing "ret". Though it needs a comment explaining that it's a terrible hack for broken userspace ;-)
On 12/13/2024 5:28 AM, Paolo Bonzini wrote: > On 12/12/24 20:13, Sean Christopherson wrote: >> On Thu, Dec 12, 2024, Paolo Bonzini wrote: >>> On 12/12/24 09:07, Zhao Liu wrote: >>>> On Thu, Dec 12, 2024 at 11:26:28AM +0800, Binbin Wu wrote: >>>>> Date: Thu, 12 Dec 2024 11:26:28 +0800 >>>>> From: Binbin Wu <binbin.wu@linux.intel.com> >>>>> Subject: [PATCH] i386/kvm: Set return value after handling >>>>> KVM_EXIT_HYPERCALL >>>>> X-Mailer: git-send-email 2.46.0 >>>>> >>>>> Userspace should set the ret field of hypercall after handling >>>>> KVM_EXIT_HYPERCALL. Otherwise, a stale value could be returned to KVM. >>>>> >>>>> Fixes: 47e76d03b15 ("i386/kvm: Add KVM_EXIT_HYPERCALL handling for KVM_HC_MAP_GPA_RANGE") >>>>> Reported-by: Farrah Chen <farrah.chen@intel.com> >>>>> Signed-off-by: Binbin Wu <binbin.wu@linux.intel.com> >>>>> Tested-by: Farrah Chen <farrah.chen@intel.com> >>>>> --- >>>>> To test the TDX code in kvm-coco-queue, please apply the patch to the QEMU, >>>>> otherwise, TDX guest boot could fail. >>>>> A matching QEMU tree including this patch is here: >>>>> https://github.com/intel-staging/qemu-tdx/releases/tag/tdx-qemu-upstream-v6.1-fix_kvm_hypercall_return_value >>>>> >>>>> Previously, the issue was not triggered because no one would modify the ret >>>>> value. But with the refactor patch for __kvm_emulate_hypercall() in KVM, >>>>> https://lore.kernel.org/kvm/20241128004344.4072099-7-seanjc@google.com/, the >>>>> value could be modified. >>>> >>>> Could you explain the specific reasons here in detail? It would be >>>> helpful with debugging or reproducing the issue. >>>> >>>>> --- >>>>> target/i386/kvm/kvm.c | 8 ++++++-- >>>>> 1 file changed, 6 insertions(+), 2 deletions(-) >>>>> >>>>> diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c >>>>> index 8e17942c3b..4bcccb48d1 100644 >>>>> --- a/target/i386/kvm/kvm.c >>>>> +++ b/target/i386/kvm/kvm.c >>>>> @@ -6005,10 +6005,14 @@ static int kvm_handle_hc_map_gpa_range(struct kvm_run *run) >>>>> static int kvm_handle_hypercall(struct kvm_run *run) >>>>> { >>>>> + int ret = -EINVAL; >>>>> + >>>>> if (run->hypercall.nr == KVM_HC_MAP_GPA_RANGE) >>>>> - return kvm_handle_hc_map_gpa_range(run); >>>>> + ret = kvm_handle_hc_map_gpa_range(run); >>>>> + >>>>> + run->hypercall.ret = ret; >>>> >>>> ret may be negative but hypercall.ret is u64. Do we need to set it to >>>> -ret? >>> >>> If ret is less than zero, will stop the VM anyway as >>> RUN_STATE_INTERNAL_ERROR. >>> >>> If this has to be fixed in QEMU, I think there's no need to set anything >>> if ret != 0; also because kvm_convert_memory() returns -1 on error and >>> that's not how the error would be passed to the guest. >>> >>> However, I think the right fix should simply be this in KVM: >>> >>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c >>> index 83fe0a78146f..e2118ba93ef6 100644 >>> --- a/arch/x86/kvm/x86.c >>> +++ b/arch/x86/kvm/x86.c >>> @@ -10066,6 +10066,7 @@ unsigned long __kvm_emulate_hypercall(struct kvm_vcpu *vcpu, unsigned long nr, >>> } >>> vcpu->run->exit_reason = KVM_EXIT_HYPERCALL; >>> + vcpu->run->ret = 0; >> >> vcpu->run->hypercall.ret >> >>> vcpu->run->hypercall.nr = KVM_HC_MAP_GPA_RANGE; >>> vcpu->run->hypercall.args[0] = gpa; >>> vcpu->run->hypercall.args[1] = npages; >>> >>> While there is arguably a change in behavior of the kernel both with >>> the patches in kvm-coco-queue and with the above one, _in practice_ >>> the above change is one that userspace will not notice. >> >> I agree that KVM should initialize "ret", but I don't think '0' is the right >> value. KVM shouldn't assume userspace will successfully handle the hypercall. >> What happens if KVM sets vcpu->run->hypercall.ret to a non-zero value, e.g. -KVM_ENOSYS? > > Unfortunately QEMU is never writing vcpu->run->hypercall.ret, so the guest sees -KVM_ENOSYS; this is basically the same bug that Binbin is fixing, just with a different value passed to the guest. > > In other words, the above one-liner is pulling the "don't break userspace" card. > > Paolo > > If the change need to be done in KVM, there are other 3 functions that use KVM_EXIT_HYPERCALL based on the code in kvm-coco-queue. diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c index 40fe7258843e..a624f7289282 100644 --- a/arch/x86/kvm/svm/sev.c +++ b/arch/x86/kvm/svm/sev.c @@ -3633,6 +3633,7 @@ static int snp_begin_psc_msr(struct vcpu_svm *svm, u64 ghcb_msr) } vcpu->run->exit_reason = KVM_EXIT_HYPERCALL; + vcpu->run->ret = 0; vcpu->run->hypercall.nr = KVM_HC_MAP_GPA_RANGE; vcpu->run->hypercall.args[0] = gpa; vcpu->run->hypercall.args[1] = 1; @@ -3796,6 +3797,7 @@ static int snp_begin_psc(struct vcpu_svm *svm, struct psc_buffer *psc) case VMGEXIT_PSC_OP_PRIVATE: case VMGEXIT_PSC_OP_SHARED: vcpu->run->exit_reason = KVM_EXIT_HYPERCALL; + vcpu->run->ret = 0; vcpu->run->hypercall.nr = KVM_HC_MAP_GPA_RANGE; vcpu->run->hypercall.args[0] = gfn_to_gpa(gfn); vcpu->run->hypercall.args[1] = npages; diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c index 85c8aee263c1..c50c2edc8c56 100644 --- a/arch/x86/kvm/vmx/tdx.c +++ b/arch/x86/kvm/vmx/tdx.c @@ -1161,6 +1161,7 @@ static void __tdx_map_gpa(struct vcpu_tdx * tdx) pr_err("%s: gpa = 0x%llx, size = 0x%llx", __func__, gpa, size); tdx->vcpu.run->exit_reason = KVM_EXIT_HYPERCALL; + tdx->vcpu->run->ret = 0; tdx->vcpu.run->hypercall.nr = KVM_HC_MAP_GPA_RANGE; tdx->vcpu.run->hypercall.args[0] = gpa & ~gfn_to_gpa(kvm_gfn_direct_bits(tdx->vcpu.kvm)); tdx->vcpu.run->hypercall.args[1] = size / PAGE_SIZE; diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 4f94b1e24eae..3f82bb2357e3 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -10070,6 +10070,7 @@ int ____kvm_emulate_hypercall(struct kvm_vcpu *vcpu, unsigned long nr, } vcpu->run->exit_reason = KVM_EXIT_HYPERCALL; + vcpu->run->ret = 0; vcpu->run->hypercall.nr = KVM_HC_MAP_GPA_RANGE; vcpu->run->hypercall.args[0] = gpa; vcpu->run->hypercall.args[1] = npages;
On 12/13/2024 9:46 AM, Binbin Wu wrote: > > > > On 12/13/2024 5:28 AM, Paolo Bonzini wrote: >> On 12/12/24 20:13, Sean Christopherson wrote: >>> On Thu, Dec 12, 2024, Paolo Bonzini wrote: >>>> On 12/12/24 09:07, Zhao Liu wrote: >>>>> On Thu, Dec 12, 2024 at 11:26:28AM +0800, Binbin Wu wrote: >>>>>> Date: Thu, 12 Dec 2024 11:26:28 +0800 >>>>>> From: Binbin Wu <binbin.wu@linux.intel.com> >>>>>> Subject: [PATCH] i386/kvm: Set return value after handling >>>>>> KVM_EXIT_HYPERCALL >>>>>> X-Mailer: git-send-email 2.46.0 >>>>>> >>>>>> Userspace should set the ret field of hypercall after handling >>>>>> KVM_EXIT_HYPERCALL. Otherwise, a stale value could be returned to KVM. >>>>>> >>>>>> Fixes: 47e76d03b15 ("i386/kvm: Add KVM_EXIT_HYPERCALL handling for KVM_HC_MAP_GPA_RANGE") >>>>>> Reported-by: Farrah Chen <farrah.chen@intel.com> >>>>>> Signed-off-by: Binbin Wu <binbin.wu@linux.intel.com> >>>>>> Tested-by: Farrah Chen <farrah.chen@intel.com> >>>>>> --- >>>>>> To test the TDX code in kvm-coco-queue, please apply the patch to the QEMU, >>>>>> otherwise, TDX guest boot could fail. >>>>>> A matching QEMU tree including this patch is here: >>>>>> https://github.com/intel-staging/qemu-tdx/releases/tag/tdx-qemu-upstream-v6.1-fix_kvm_hypercall_return_value >>>>>> >>>>>> Previously, the issue was not triggered because no one would modify the ret >>>>>> value. But with the refactor patch for __kvm_emulate_hypercall() in KVM, >>>>>> https://lore.kernel.org/kvm/20241128004344.4072099-7-seanjc@google.com/, the >>>>>> value could be modified. >>>>> >>>>> Could you explain the specific reasons here in detail? It would be >>>>> helpful with debugging or reproducing the issue. >>>>> >>>>>> --- >>>>>> target/i386/kvm/kvm.c | 8 ++++++-- >>>>>> 1 file changed, 6 insertions(+), 2 deletions(-) >>>>>> >>>>>> diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c >>>>>> index 8e17942c3b..4bcccb48d1 100644 >>>>>> --- a/target/i386/kvm/kvm.c >>>>>> +++ b/target/i386/kvm/kvm.c >>>>>> @@ -6005,10 +6005,14 @@ static int kvm_handle_hc_map_gpa_range(struct kvm_run *run) >>>>>> static int kvm_handle_hypercall(struct kvm_run *run) >>>>>> { >>>>>> + int ret = -EINVAL; >>>>>> + >>>>>> if (run->hypercall.nr == KVM_HC_MAP_GPA_RANGE) >>>>>> - return kvm_handle_hc_map_gpa_range(run); >>>>>> + ret = kvm_handle_hc_map_gpa_range(run); >>>>>> + >>>>>> + run->hypercall.ret = ret; >>>>> >>>>> ret may be negative but hypercall.ret is u64. Do we need to set it to >>>>> -ret? >>>> >>>> If ret is less than zero, will stop the VM anyway as >>>> RUN_STATE_INTERNAL_ERROR. >>>> >>>> If this has to be fixed in QEMU, I think there's no need to set anything >>>> if ret != 0; also because kvm_convert_memory() returns -1 on error and >>>> that's not how the error would be passed to the guest. >>>> >>>> However, I think the right fix should simply be this in KVM: >>>> >>>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c >>>> index 83fe0a78146f..e2118ba93ef6 100644 >>>> --- a/arch/x86/kvm/x86.c >>>> +++ b/arch/x86/kvm/x86.c >>>> @@ -10066,6 +10066,7 @@ unsigned long __kvm_emulate_hypercall(struct kvm_vcpu *vcpu, unsigned long nr, >>>> } >>>> vcpu->run->exit_reason = KVM_EXIT_HYPERCALL; >>>> + vcpu->run->ret = 0; >>> >>> vcpu->run->hypercall.ret >>> >>>> vcpu->run->hypercall.nr = KVM_HC_MAP_GPA_RANGE; >>>> vcpu->run->hypercall.args[0] = gpa; >>>> vcpu->run->hypercall.args[1] = npages; >>>> >>>> While there is arguably a change in behavior of the kernel both with >>>> the patches in kvm-coco-queue and with the above one, _in practice_ >>>> the above change is one that userspace will not notice. >>> >>> I agree that KVM should initialize "ret", but I don't think '0' is the right >>> value. KVM shouldn't assume userspace will successfully handle the hypercall. >>> What happens if KVM sets vcpu->run->hypercall.ret to a non-zero value, e.g. -KVM_ENOSYS? >> >> Unfortunately QEMU is never writing vcpu->run->hypercall.ret, so the guest sees -KVM_ENOSYS; this is basically the same bug that Binbin is fixing, just with a different value passed to the guest. >> >> In other words, the above one-liner is pulling the "don't break userspace" card. >> >> Paolo >> >> > If the change need to be done in KVM, there are other 3 functions that use > KVM_EXIT_HYPERCALL based on the code in kvm-coco-queue. > > diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c > index 40fe7258843e..a624f7289282 100644 > --- a/arch/x86/kvm/svm/sev.c > +++ b/arch/x86/kvm/svm/sev.c > @@ -3633,6 +3633,7 @@ static int snp_begin_psc_msr(struct vcpu_svm *svm, u64 ghcb_msr) > } > > vcpu->run->exit_reason = KVM_EXIT_HYPERCALL; > + vcpu->run->ret = 0; > vcpu->run->hypercall.nr = KVM_HC_MAP_GPA_RANGE; > vcpu->run->hypercall.args[0] = gpa; > vcpu->run->hypercall.args[1] = 1; > @@ -3796,6 +3797,7 @@ static int snp_begin_psc(struct vcpu_svm *svm, struct psc_buffer *psc) > case VMGEXIT_PSC_OP_PRIVATE: > case VMGEXIT_PSC_OP_SHARED: > vcpu->run->exit_reason = KVM_EXIT_HYPERCALL; > + vcpu->run->ret = 0; > vcpu->run->hypercall.nr = KVM_HC_MAP_GPA_RANGE; > vcpu->run->hypercall.args[0] = gfn_to_gpa(gfn); > vcpu->run->hypercall.args[1] = npages; > diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c > index 85c8aee263c1..c50c2edc8c56 100644 > --- a/arch/x86/kvm/vmx/tdx.c > +++ b/arch/x86/kvm/vmx/tdx.c > @@ -1161,6 +1161,7 @@ static void __tdx_map_gpa(struct vcpu_tdx * tdx) > pr_err("%s: gpa = 0x%llx, size = 0x%llx", __func__, gpa, size); > > tdx->vcpu.run->exit_reason = KVM_EXIT_HYPERCALL; > + tdx->vcpu->run->ret = 0; > tdx->vcpu.run->hypercall.nr = KVM_HC_MAP_GPA_RANGE; > tdx->vcpu.run->hypercall.args[0] = gpa & ~gfn_to_gpa(kvm_gfn_direct_bits(tdx->vcpu.kvm)); > tdx->vcpu.run->hypercall.args[1] = size / PAGE_SIZE; > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c > index 4f94b1e24eae..3f82bb2357e3 100644 > --- a/arch/x86/kvm/x86.c > +++ b/arch/x86/kvm/x86.c > @@ -10070,6 +10070,7 @@ int ____kvm_emulate_hypercall(struct kvm_vcpu *vcpu, unsigned long nr, > } > > vcpu->run->exit_reason = KVM_EXIT_HYPERCALL; > + vcpu->run->ret = 0; > vcpu->run->hypercall.nr = KVM_HC_MAP_GPA_RANGE; > vcpu->run->hypercall.args[0] = gpa; > vcpu->run->hypercall.args[1] = npages; > Maybe we could add a helper to fill the vcpu->run?
On 12/13/2024 9:46 AM, Binbin Wu wrote: > > > > On 12/13/2024 5:28 AM, Paolo Bonzini wrote: >> On 12/12/24 20:13, Sean Christopherson wrote: >>> On Thu, Dec 12, 2024, Paolo Bonzini wrote: >>>> On 12/12/24 09:07, Zhao Liu wrote: >>>>> On Thu, Dec 12, 2024 at 11:26:28AM +0800, Binbin Wu wrote: >>>>>> Date: Thu, 12 Dec 2024 11:26:28 +0800 >>>>>> From: Binbin Wu <binbin.wu@linux.intel.com> >>>>>> Subject: [PATCH] i386/kvm: Set return value after handling >>>>>> KVM_EXIT_HYPERCALL >>>>>> X-Mailer: git-send-email 2.46.0 >>>>>> >>>>>> Userspace should set the ret field of hypercall after handling >>>>>> KVM_EXIT_HYPERCALL. Otherwise, a stale value could be returned to KVM. >>>>>> >>>>>> Fixes: 47e76d03b15 ("i386/kvm: Add KVM_EXIT_HYPERCALL handling for KVM_HC_MAP_GPA_RANGE") >>>>>> Reported-by: Farrah Chen <farrah.chen@intel.com> >>>>>> Signed-off-by: Binbin Wu <binbin.wu@linux.intel.com> >>>>>> Tested-by: Farrah Chen <farrah.chen@intel.com> >>>>>> --- >>>>>> To test the TDX code in kvm-coco-queue, please apply the patch to the QEMU, >>>>>> otherwise, TDX guest boot could fail. >>>>>> A matching QEMU tree including this patch is here: >>>>>> https://github.com/intel-staging/qemu-tdx/releases/tag/tdx-qemu-upstream-v6.1-fix_kvm_hypercall_return_value >>>>>> >>>>>> Previously, the issue was not triggered because no one would modify the ret >>>>>> value. But with the refactor patch for __kvm_emulate_hypercall() in KVM, >>>>>> https://lore.kernel.org/kvm/20241128004344.4072099-7-seanjc@google.com/, the >>>>>> value could be modified. >>>>> >>>>> Could you explain the specific reasons here in detail? It would be >>>>> helpful with debugging or reproducing the issue. >>>>> >>>>>> --- >>>>>> target/i386/kvm/kvm.c | 8 ++++++-- >>>>>> 1 file changed, 6 insertions(+), 2 deletions(-) >>>>>> >>>>>> diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c >>>>>> index 8e17942c3b..4bcccb48d1 100644 >>>>>> --- a/target/i386/kvm/kvm.c >>>>>> +++ b/target/i386/kvm/kvm.c >>>>>> @@ -6005,10 +6005,14 @@ static int kvm_handle_hc_map_gpa_range(struct kvm_run *run) >>>>>> static int kvm_handle_hypercall(struct kvm_run *run) >>>>>> { >>>>>> + int ret = -EINVAL; >>>>>> + >>>>>> if (run->hypercall.nr == KVM_HC_MAP_GPA_RANGE) >>>>>> - return kvm_handle_hc_map_gpa_range(run); >>>>>> + ret = kvm_handle_hc_map_gpa_range(run); >>>>>> + >>>>>> + run->hypercall.ret = ret; >>>>> >>>>> ret may be negative but hypercall.ret is u64. Do we need to set it to >>>>> -ret? >>>> >>>> If ret is less than zero, will stop the VM anyway as >>>> RUN_STATE_INTERNAL_ERROR. >>>> >>>> If this has to be fixed in QEMU, I think there's no need to set anything >>>> if ret != 0; also because kvm_convert_memory() returns -1 on error and >>>> that's not how the error would be passed to the guest. >>>> >>>> However, I think the right fix should simply be this in KVM: >>>> >>>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c >>>> index 83fe0a78146f..e2118ba93ef6 100644 >>>> --- a/arch/x86/kvm/x86.c >>>> +++ b/arch/x86/kvm/x86.c >>>> @@ -10066,6 +10066,7 @@ unsigned long __kvm_emulate_hypercall(struct kvm_vcpu *vcpu, unsigned long nr, >>>> } >>>> vcpu->run->exit_reason = KVM_EXIT_HYPERCALL; >>>> + vcpu->run->ret = 0; >>> >>> vcpu->run->hypercall.ret >>> >>>> vcpu->run->hypercall.nr = KVM_HC_MAP_GPA_RANGE; >>>> vcpu->run->hypercall.args[0] = gpa; >>>> vcpu->run->hypercall.args[1] = npages; >>>> >>>> While there is arguably a change in behavior of the kernel both with >>>> the patches in kvm-coco-queue and with the above one, _in practice_ >>>> the above change is one that userspace will not notice. >>> >>> I agree that KVM should initialize "ret", but I don't think '0' is the right >>> value. KVM shouldn't assume userspace will successfully handle the hypercall. >>> What happens if KVM sets vcpu->run->hypercall.ret to a non-zero value, e.g. -KVM_ENOSYS? >> >> Unfortunately QEMU is never writing vcpu->run->hypercall.ret, so the guest sees -KVM_ENOSYS; this is basically the same bug that Binbin is fixing, just with a different value passed to the guest. >> >> In other words, the above one-liner is pulling the "don't break userspace" card. >> >> Paolo >> >> > If the change need to be done in KVM, there are other 3 functions that use > KVM_EXIT_HYPERCALL based on the code in kvm-coco-queue. > > diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c > index 40fe7258843e..a624f7289282 100644 > --- a/arch/x86/kvm/svm/sev.c > +++ b/arch/x86/kvm/svm/sev.c > @@ -3633,6 +3633,7 @@ static int snp_begin_psc_msr(struct vcpu_svm *svm, u64 ghcb_msr) > } > > vcpu->run->exit_reason = KVM_EXIT_HYPERCALL; > + vcpu->run->ret = 0; > vcpu->run->hypercall.nr = KVM_HC_MAP_GPA_RANGE; > vcpu->run->hypercall.args[0] = gpa; > vcpu->run->hypercall.args[1] = 1; > @@ -3796,6 +3797,7 @@ static int snp_begin_psc(struct vcpu_svm *svm, struct psc_buffer *psc) > case VMGEXIT_PSC_OP_PRIVATE: > case VMGEXIT_PSC_OP_SHARED: > vcpu->run->exit_reason = KVM_EXIT_HYPERCALL; > + vcpu->run->ret = 0; > vcpu->run->hypercall.nr = KVM_HC_MAP_GPA_RANGE; > vcpu->run->hypercall.args[0] = gfn_to_gpa(gfn); > vcpu->run->hypercall.args[1] = npages; > diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c > index 85c8aee263c1..c50c2edc8c56 100644 > --- a/arch/x86/kvm/vmx/tdx.c > +++ b/arch/x86/kvm/vmx/tdx.c > @@ -1161,6 +1161,7 @@ static void __tdx_map_gpa(struct vcpu_tdx * tdx) > pr_err("%s: gpa = 0x%llx, size = 0x%llx", __func__, gpa, size); > > tdx->vcpu.run->exit_reason = KVM_EXIT_HYPERCALL; > + tdx->vcpu->run->ret = 0; Sorry, this should be " tdx->vcpu.run->ret = 0;" > tdx->vcpu.run->hypercall.nr = KVM_HC_MAP_GPA_RANGE; > tdx->vcpu.run->hypercall.args[0] = gpa & ~gfn_to_gpa(kvm_gfn_direct_bits(tdx->vcpu.kvm)); > tdx->vcpu.run->hypercall.args[1] = size / PAGE_SIZE; > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c > index 4f94b1e24eae..3f82bb2357e3 100644 > --- a/arch/x86/kvm/x86.c > +++ b/arch/x86/kvm/x86.c > @@ -10070,6 +10070,7 @@ int ____kvm_emulate_hypercall(struct kvm_vcpu *vcpu, unsigned long nr, > } > > vcpu->run->exit_reason = KVM_EXIT_HYPERCALL; > + vcpu->run->ret = 0; > vcpu->run->hypercall.nr = KVM_HC_MAP_GPA_RANGE; > vcpu->run->hypercall.args[0] = gpa; > vcpu->run->hypercall.args[1] = npages; >
diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c index 8e17942c3b..4bcccb48d1 100644 --- a/target/i386/kvm/kvm.c +++ b/target/i386/kvm/kvm.c @@ -6005,10 +6005,14 @@ static int kvm_handle_hc_map_gpa_range(struct kvm_run *run) static int kvm_handle_hypercall(struct kvm_run *run) { + int ret = -EINVAL; + if (run->hypercall.nr == KVM_HC_MAP_GPA_RANGE) - return kvm_handle_hc_map_gpa_range(run); + ret = kvm_handle_hc_map_gpa_range(run); + + run->hypercall.ret = ret; - return -EINVAL; + return ret; } #define VMX_INVALID_GUEST_STATE 0x80000021