diff mbox series

i386/kvm: Set return value after handling KVM_EXIT_HYPERCALL

Message ID 20241212032628.475976-1-binbin.wu@linux.intel.com (mailing list archive)
State New
Headers show
Series i386/kvm: Set return value after handling KVM_EXIT_HYPERCALL | expand

Commit Message

Binbin Wu Dec. 12, 2024, 3:26 a.m. UTC
Userspace should set the ret field of hypercall after handling
KVM_EXIT_HYPERCALL.  Otherwise, a stale value could be returned to KVM.

Fixes: 47e76d03b15 ("i386/kvm: Add KVM_EXIT_HYPERCALL handling for KVM_HC_MAP_GPA_RANGE")
Reported-by: Farrah Chen <farrah.chen@intel.com>
Signed-off-by: Binbin Wu <binbin.wu@linux.intel.com>
Tested-by: Farrah Chen <farrah.chen@intel.com>
---
To test the TDX code in kvm-coco-queue, please apply the patch to the QEMU,
otherwise, TDX guest boot could fail.
A matching QEMU tree including this patch is here:
https://github.com/intel-staging/qemu-tdx/releases/tag/tdx-qemu-upstream-v6.1-fix_kvm_hypercall_return_value

Previously, the issue was not triggered because no one would modify the ret
value. But with the refactor patch for __kvm_emulate_hypercall() in KVM,
https://lore.kernel.org/kvm/20241128004344.4072099-7-seanjc@google.com/, the
value could be modified.
---
 target/i386/kvm/kvm.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)


base-commit: ae35f033b874c627d81d51070187fbf55f0bf1a7

Comments

Yao Yuan Dec. 12, 2024, 3:41 a.m. UTC | #1
On Thu, Dec 12, 2024 at 11:26:28AM +0800, Binbin Wu wrote:
> Userspace should set the ret field of hypercall after handling
> KVM_EXIT_HYPERCALL.  Otherwise, a stale value could be returned to KVM.
>
> Fixes: 47e76d03b15 ("i386/kvm: Add KVM_EXIT_HYPERCALL handling for KVM_HC_MAP_GPA_RANGE")
> Reported-by: Farrah Chen <farrah.chen@intel.com>
> Signed-off-by: Binbin Wu <binbin.wu@linux.intel.com>
> Tested-by: Farrah Chen <farrah.chen@intel.com>
> ---
> To test the TDX code in kvm-coco-queue, please apply the patch to the QEMU,
> otherwise, TDX guest boot could fail.
> A matching QEMU tree including this patch is here:
> https://github.com/intel-staging/qemu-tdx/releases/tag/tdx-qemu-upstream-v6.1-fix_kvm_hypercall_return_value
>
> Previously, the issue was not triggered because no one would modify the ret
> value. But with the refactor patch for __kvm_emulate_hypercall() in KVM,
> https://lore.kernel.org/kvm/20241128004344.4072099-7-seanjc@google.com/, the
> value could be modified.
> ---
>  target/i386/kvm/kvm.c | 8 ++++++--
>  1 file changed, 6 insertions(+), 2 deletions(-)
>
> diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
> index 8e17942c3b..4bcccb48d1 100644
> --- a/target/i386/kvm/kvm.c
> +++ b/target/i386/kvm/kvm.c
> @@ -6005,10 +6005,14 @@ static int kvm_handle_hc_map_gpa_range(struct kvm_run *run)
>
>  static int kvm_handle_hypercall(struct kvm_run *run)
>  {
> +    int ret = -EINVAL;
> +
>      if (run->hypercall.nr == KVM_HC_MAP_GPA_RANGE)
> -        return kvm_handle_hc_map_gpa_range(run);
> +        ret = kvm_handle_hc_map_gpa_range(run);

LGTM to the issue it tries to fix :-)

> +
> +    run->hypercall.ret = ret;
>
> -    return -EINVAL;
> +    return ret;
>  }
>
>  #define VMX_INVALID_GUEST_STATE 0x80000021
>
> base-commit: ae35f033b874c627d81d51070187fbf55f0bf1a7
> --
> 2.46.0
>
Xiaoyao Li Dec. 12, 2024, 3:44 a.m. UTC | #2
On 12/12/2024 11:26 AM, Binbin Wu wrote:
> Userspace should set the ret field of hypercall after handling
> KVM_EXIT_HYPERCALL.  Otherwise, a stale value could be returned to KVM.
> 
> Fixes: 47e76d03b15 ("i386/kvm: Add KVM_EXIT_HYPERCALL handling for KVM_HC_MAP_GPA_RANGE")
> Reported-by: Farrah Chen <farrah.chen@intel.com>
> Signed-off-by: Binbin Wu <binbin.wu@linux.intel.com>
> Tested-by: Farrah Chen <farrah.chen@intel.com>
> ---
> To test the TDX code in kvm-coco-queue, please apply the patch to the QEMU,
> otherwise, TDX guest boot could fail.
> A matching QEMU tree including this patch is here:
> https://github.com/intel-staging/qemu-tdx/releases/tag/tdx-qemu-upstream-v6.1-fix_kvm_hypercall_return_value
> 
> Previously, the issue was not triggered because no one would modify the ret
> value. But with the refactor patch for __kvm_emulate_hypercall() in KVM,
> https://lore.kernel.org/kvm/20241128004344.4072099-7-seanjc@google.com/, the
> value could be modified.
> ---
>   target/i386/kvm/kvm.c | 8 ++++++--
>   1 file changed, 6 insertions(+), 2 deletions(-)
> 
> diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
> index 8e17942c3b..4bcccb48d1 100644
> --- a/target/i386/kvm/kvm.c
> +++ b/target/i386/kvm/kvm.c
> @@ -6005,10 +6005,14 @@ static int kvm_handle_hc_map_gpa_range(struct kvm_run *run)
>   
>   static int kvm_handle_hypercall(struct kvm_run *run)
>   {
> +    int ret = -EINVAL;
> +
>       if (run->hypercall.nr == KVM_HC_MAP_GPA_RANGE)
> -        return kvm_handle_hc_map_gpa_range(run);
> +        ret = kvm_handle_hc_map_gpa_range(run);
> +
> +    run->hypercall.ret = ret;

Updating run->hypercall.ret is useful only when QEMU needs to re-enter 
the guest. For the case of ret < 0, QEMU will stop the vcpu.

I think we might need re-think on the handling of KVM_EXIT_HYPERCALL. 
E.g., in what error case should QEMU stop the vcpu, and in what case can 
QEMU return the error back to the guest via run->hypercall.ret.

> -    return -EINVAL;
> +    return ret;
>   }
>   
>   #define VMX_INVALID_GUEST_STATE 0x80000021
> 
> base-commit: ae35f033b874c627d81d51070187fbf55f0bf1a7
Binbin Wu Dec. 12, 2024, 5:18 a.m. UTC | #3
On 12/12/2024 11:44 AM, Xiaoyao Li wrote:
> On 12/12/2024 11:26 AM, Binbin Wu wrote:
>> Userspace should set the ret field of hypercall after handling
>> KVM_EXIT_HYPERCALL.  Otherwise, a stale value could be returned to KVM.
>>
>> Fixes: 47e76d03b15 ("i386/kvm: Add KVM_EXIT_HYPERCALL handling for KVM_HC_MAP_GPA_RANGE")
>> Reported-by: Farrah Chen <farrah.chen@intel.com>
>> Signed-off-by: Binbin Wu <binbin.wu@linux.intel.com>
>> Tested-by: Farrah Chen <farrah.chen@intel.com>
>> ---
>> To test the TDX code in kvm-coco-queue, please apply the patch to the QEMU,
>> otherwise, TDX guest boot could fail.
>> A matching QEMU tree including this patch is here:
>> https://github.com/intel-staging/qemu-tdx/releases/tag/tdx-qemu-upstream-v6.1-fix_kvm_hypercall_return_value
>>
>> Previously, the issue was not triggered because no one would modify the ret
>> value. But with the refactor patch for __kvm_emulate_hypercall() in KVM,
>> https://lore.kernel.org/kvm/20241128004344.4072099-7-seanjc@google.com/, the
>> value could be modified.
>> ---
>>   target/i386/kvm/kvm.c | 8 ++++++--
>>   1 file changed, 6 insertions(+), 2 deletions(-)
>>
>> diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
>> index 8e17942c3b..4bcccb48d1 100644
>> --- a/target/i386/kvm/kvm.c
>> +++ b/target/i386/kvm/kvm.c
>> @@ -6005,10 +6005,14 @@ static int kvm_handle_hc_map_gpa_range(struct kvm_run *run)
>>     static int kvm_handle_hypercall(struct kvm_run *run)
>>   {
>> +    int ret = -EINVAL;
>> +
>>       if (run->hypercall.nr == KVM_HC_MAP_GPA_RANGE)
>> -        return kvm_handle_hc_map_gpa_range(run);
>> +        ret = kvm_handle_hc_map_gpa_range(run);
>> +
>> +    run->hypercall.ret = ret;
>
> Updating run->hypercall.ret is useful only when QEMU needs to re-enter the guest. For the case of ret < 0, QEMU will stop the vcpu.

IMHO, assign run->hypercall.ret anyway should be OK, no need to add a
per-condition on ret, although the value is not used when ret < 0.

Currently, since QEMU will stop the vcpu when ret < 0, this patch doesn't
convert ret to -Exxx that the ABI expects.

>
> I think we might need re-think on the handling of KVM_EXIT_HYPERCALL. E.g., in what error case should QEMU stop the vcpu, and in what case can QEMU return the error back to the guest via run->hypercall.ret.

Actually, I had the similar question before.
https://lore.kernel.org/kvm/d25cc62c-0f56-4be2-968a-63c8b1d63b5a@linux.intel.com/

It might depends on the hypercall number?
Another option is QEMU always sets run->hypercall.ret appropriately and continues the vcpu thread.


>
>> -    return -EINVAL;
>> +    return ret;
>>   }
>>     #define VMX_INVALID_GUEST_STATE 0x80000021
>>
>> base-commit: ae35f033b874c627d81d51070187fbf55f0bf1a7
>
Xiaoyao Li Dec. 12, 2024, 7:09 a.m. UTC | #4
On 12/12/2024 1:18 PM, Binbin Wu wrote:
> 
> 
> 
> On 12/12/2024 11:44 AM, Xiaoyao Li wrote:
>> On 12/12/2024 11:26 AM, Binbin Wu wrote:
>>> Userspace should set the ret field of hypercall after handling
>>> KVM_EXIT_HYPERCALL.  Otherwise, a stale value could be returned to KVM.
>>>
>>> Fixes: 47e76d03b15 ("i386/kvm: Add KVM_EXIT_HYPERCALL handling for 
>>> KVM_HC_MAP_GPA_RANGE")
>>> Reported-by: Farrah Chen <farrah.chen@intel.com>
>>> Signed-off-by: Binbin Wu <binbin.wu@linux.intel.com>
>>> Tested-by: Farrah Chen <farrah.chen@intel.com>
>>> ---
>>> To test the TDX code in kvm-coco-queue, please apply the patch to the 
>>> QEMU,
>>> otherwise, TDX guest boot could fail.
>>> A matching QEMU tree including this patch is here:
>>> https://github.com/intel-staging/qemu-tdx/releases/tag/tdx-qemu- 
>>> upstream-v6.1-fix_kvm_hypercall_return_value
>>>
>>> Previously, the issue was not triggered because no one would modify 
>>> the ret
>>> value. But with the refactor patch for __kvm_emulate_hypercall() in KVM,
>>> https://lore.kernel.org/kvm/20241128004344.4072099-7- 
>>> seanjc@google.com/, the
>>> value could be modified.
>>> ---
>>>   target/i386/kvm/kvm.c | 8 ++++++--
>>>   1 file changed, 6 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
>>> index 8e17942c3b..4bcccb48d1 100644
>>> --- a/target/i386/kvm/kvm.c
>>> +++ b/target/i386/kvm/kvm.c
>>> @@ -6005,10 +6005,14 @@ static int kvm_handle_hc_map_gpa_range(struct 
>>> kvm_run *run)
>>>     static int kvm_handle_hypercall(struct kvm_run *run)
>>>   {
>>> +    int ret = -EINVAL;
>>> +
>>>       if (run->hypercall.nr == KVM_HC_MAP_GPA_RANGE)
>>> -        return kvm_handle_hc_map_gpa_range(run);
>>> +        ret = kvm_handle_hc_map_gpa_range(run);
>>> +
>>> +    run->hypercall.ret = ret;
>>
>> Updating run->hypercall.ret is useful only when QEMU needs to re-enter 
>> the guest. For the case of ret < 0, QEMU will stop the vcpu.
> 
> IMHO, assign run->hypercall.ret anyway should be OK, no need to add a
> per-condition on ret, although the value is not used when ret < 0.
> 
> Currently, since QEMU will stop the vcpu when ret < 0, this patch doesn't
> convert ret to -Exxx that the ABI expects.

I was thinking if it is better to let each specific handling function to 
   update run->hypercall.ret with its own logic. E.g., for this case, 
let kvm_handle_hc_map_gpa_range() to update the run->hypercall.ret.

Reusing the return value of the handling function to update
run->hypercall.ret seems not logically correct to me.

>>
>> I think we might need re-think on the handling of KVM_EXIT_HYPERCALL. 
>> E.g., in what error case should QEMU stop the vcpu, and in what case 
>> can QEMU return the error back to the guest via run->hypercall.ret.
> 
> Actually, I had the similar question before.
> https://lore.kernel.org/kvm/ 
> d25cc62c-0f56-4be2-968a-63c8b1d63b5a@linux.intel.com/
> 
> It might depends on the hypercall number?
> Another option is QEMU always sets run->hypercall.ret appropriately and 
> continues the vcpu thread.
> 
> 
>>
>>> -    return -EINVAL;
>>> +    return ret;
>>>   }
>>>     #define VMX_INVALID_GUEST_STATE 0x80000021
>>>
>>> base-commit: ae35f033b874c627d81d51070187fbf55f0bf1a7
>>
>
Binbin Wu Dec. 12, 2024, 7:24 a.m. UTC | #5
On 12/12/2024 3:09 PM, Xiaoyao Li wrote:
> On 12/12/2024 1:18 PM, Binbin Wu wrote:
>>
>>
>>
>> On 12/12/2024 11:44 AM, Xiaoyao Li wrote:
>>> On 12/12/2024 11:26 AM, Binbin Wu wrote:
>>>> Userspace should set the ret field of hypercall after handling
>>>> KVM_EXIT_HYPERCALL.  Otherwise, a stale value could be returned to KVM.
>>>>
>>>> Fixes: 47e76d03b15 ("i386/kvm: Add KVM_EXIT_HYPERCALL handling for KVM_HC_MAP_GPA_RANGE")
>>>> Reported-by: Farrah Chen <farrah.chen@intel.com>
>>>> Signed-off-by: Binbin Wu <binbin.wu@linux.intel.com>
>>>> Tested-by: Farrah Chen <farrah.chen@intel.com>
>>>> ---
>>>> To test the TDX code in kvm-coco-queue, please apply the patch to the QEMU,
>>>> otherwise, TDX guest boot could fail.
>>>> A matching QEMU tree including this patch is here:
>>>> https://github.com/intel-staging/qemu-tdx/releases/tag/tdx-qemu- upstream-v6.1-fix_kvm_hypercall_return_value
>>>>
>>>> Previously, the issue was not triggered because no one would modify the ret
>>>> value. But with the refactor patch for __kvm_emulate_hypercall() in KVM,
>>>> https://lore.kernel.org/kvm/20241128004344.4072099-7- seanjc@google.com/, the
>>>> value could be modified.
>>>> ---
>>>>   target/i386/kvm/kvm.c | 8 ++++++--
>>>>   1 file changed, 6 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
>>>> index 8e17942c3b..4bcccb48d1 100644
>>>> --- a/target/i386/kvm/kvm.c
>>>> +++ b/target/i386/kvm/kvm.c
>>>> @@ -6005,10 +6005,14 @@ static int kvm_handle_hc_map_gpa_range(struct kvm_run *run)
>>>>     static int kvm_handle_hypercall(struct kvm_run *run)
>>>>   {
>>>> +    int ret = -EINVAL;
>>>> +
>>>>       if (run->hypercall.nr == KVM_HC_MAP_GPA_RANGE)
>>>> -        return kvm_handle_hc_map_gpa_range(run);
>>>> +        ret = kvm_handle_hc_map_gpa_range(run);
>>>> +
>>>> +    run->hypercall.ret = ret;
>>>
>>> Updating run->hypercall.ret is useful only when QEMU needs to re-enter the guest. For the case of ret < 0, QEMU will stop the vcpu.
>>
>> IMHO, assign run->hypercall.ret anyway should be OK, no need to add a
>> per-condition on ret, although the value is not used when ret < 0.
>>
>> Currently, since QEMU will stop the vcpu when ret < 0, this patch doesn't
>> convert ret to -Exxx that the ABI expects.
>
> I was thinking if it is better to let each specific handling function to   update run->hypercall.ret with its own logic. E.g., for this case, let kvm_handle_hc_map_gpa_range() to update the run->hypercall.ret.
I think it makes sense.

Also, each handling function can decide whether the vcpu should continue if the handling failed.
- Return 0 and set the error code ( 0 or -Exxx) to run->hypercall.ret if it want to continue.
- Return negative value if it want to stop the vcpu thread.

>
> Reusing the return value of the handling function to update
> run->hypercall.ret seems not logically correct to me.
>
>>>
>>> I think we might need re-think on the handling of KVM_EXIT_HYPERCALL. E.g., in what error case should QEMU stop the vcpu, and in what case can QEMU return the error back to the guest via run->hypercall.ret.
>>
>> Actually, I had the similar question before.
>> https://lore.kernel.org/kvm/ d25cc62c-0f56-4be2-968a-63c8b1d63b5a@linux.intel.com/
>>
>> It might depends on the hypercall number?
>> Another option is QEMU always sets run->hypercall.ret appropriately and continues the vcpu thread.
>>
>>
>>>
>>>> -    return -EINVAL;
>>>> +    return ret;
>>>>   }
>>>>     #define VMX_INVALID_GUEST_STATE 0x80000021
>>>>
>>>> base-commit: ae35f033b874c627d81d51070187fbf55f0bf1a7
>>>
>>
>
Zhao Liu Dec. 12, 2024, 8:07 a.m. UTC | #6
On Thu, Dec 12, 2024 at 11:26:28AM +0800, Binbin Wu wrote:
> Date: Thu, 12 Dec 2024 11:26:28 +0800
> From: Binbin Wu <binbin.wu@linux.intel.com>
> Subject: [PATCH] i386/kvm: Set return value after handling
>  KVM_EXIT_HYPERCALL
> X-Mailer: git-send-email 2.46.0
> 
> Userspace should set the ret field of hypercall after handling
> KVM_EXIT_HYPERCALL.  Otherwise, a stale value could be returned to KVM.
> 
> Fixes: 47e76d03b15 ("i386/kvm: Add KVM_EXIT_HYPERCALL handling for KVM_HC_MAP_GPA_RANGE")
> Reported-by: Farrah Chen <farrah.chen@intel.com>
> Signed-off-by: Binbin Wu <binbin.wu@linux.intel.com>
> Tested-by: Farrah Chen <farrah.chen@intel.com>
> ---
> To test the TDX code in kvm-coco-queue, please apply the patch to the QEMU,
> otherwise, TDX guest boot could fail.
> A matching QEMU tree including this patch is here:
> https://github.com/intel-staging/qemu-tdx/releases/tag/tdx-qemu-upstream-v6.1-fix_kvm_hypercall_return_value
> 
> Previously, the issue was not triggered because no one would modify the ret
> value. But with the refactor patch for __kvm_emulate_hypercall() in KVM,
> https://lore.kernel.org/kvm/20241128004344.4072099-7-seanjc@google.com/, the
> value could be modified.

Could you explain the specific reasons here in detail? It would be
helpful with debugging or reproducing the issue.

> ---
>  target/i386/kvm/kvm.c | 8 ++++++--
>  1 file changed, 6 insertions(+), 2 deletions(-)
> 
> diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
> index 8e17942c3b..4bcccb48d1 100644
> --- a/target/i386/kvm/kvm.c
> +++ b/target/i386/kvm/kvm.c
> @@ -6005,10 +6005,14 @@ static int kvm_handle_hc_map_gpa_range(struct kvm_run *run)
>  
>  static int kvm_handle_hypercall(struct kvm_run *run)
>  {
> +    int ret = -EINVAL;
> +
>      if (run->hypercall.nr == KVM_HC_MAP_GPA_RANGE)
> -        return kvm_handle_hc_map_gpa_range(run);
> +        ret = kvm_handle_hc_map_gpa_range(run);
> +
> +    run->hypercall.ret = ret;

ret may be negative but hypercall.ret is u64. Do we need to set it to
-ret?

> -    return -EINVAL;
> +    return ret;
>  }
>  
>  #define VMX_INVALID_GUEST_STATE 0x80000021
> 
> base-commit: ae35f033b874c627d81d51070187fbf55f0bf1a7
> -- 
> 2.46.0
> 
>
Paolo Bonzini Dec. 12, 2024, 4:03 p.m. UTC | #7
On 12/12/24 09:07, Zhao Liu wrote:
> On Thu, Dec 12, 2024 at 11:26:28AM +0800, Binbin Wu wrote:
>> Date: Thu, 12 Dec 2024 11:26:28 +0800
>> From: Binbin Wu <binbin.wu@linux.intel.com>
>> Subject: [PATCH] i386/kvm: Set return value after handling
>>   KVM_EXIT_HYPERCALL
>> X-Mailer: git-send-email 2.46.0
>>
>> Userspace should set the ret field of hypercall after handling
>> KVM_EXIT_HYPERCALL.  Otherwise, a stale value could be returned to KVM.
>>
>> Fixes: 47e76d03b15 ("i386/kvm: Add KVM_EXIT_HYPERCALL handling for KVM_HC_MAP_GPA_RANGE")
>> Reported-by: Farrah Chen <farrah.chen@intel.com>
>> Signed-off-by: Binbin Wu <binbin.wu@linux.intel.com>
>> Tested-by: Farrah Chen <farrah.chen@intel.com>
>> ---
>> To test the TDX code in kvm-coco-queue, please apply the patch to the QEMU,
>> otherwise, TDX guest boot could fail.
>> A matching QEMU tree including this patch is here:
>> https://github.com/intel-staging/qemu-tdx/releases/tag/tdx-qemu-upstream-v6.1-fix_kvm_hypercall_return_value
>>
>> Previously, the issue was not triggered because no one would modify the ret
>> value. But with the refactor patch for __kvm_emulate_hypercall() in KVM,
>> https://lore.kernel.org/kvm/20241128004344.4072099-7-seanjc@google.com/, the
>> value could be modified.
> 
> Could you explain the specific reasons here in detail? It would be
> helpful with debugging or reproducing the issue.
> 
>> ---
>>   target/i386/kvm/kvm.c | 8 ++++++--
>>   1 file changed, 6 insertions(+), 2 deletions(-)
>>
>> diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
>> index 8e17942c3b..4bcccb48d1 100644
>> --- a/target/i386/kvm/kvm.c
>> +++ b/target/i386/kvm/kvm.c
>> @@ -6005,10 +6005,14 @@ static int kvm_handle_hc_map_gpa_range(struct kvm_run *run)
>>   
>>   static int kvm_handle_hypercall(struct kvm_run *run)
>>   {
>> +    int ret = -EINVAL;
>> +
>>       if (run->hypercall.nr == KVM_HC_MAP_GPA_RANGE)
>> -        return kvm_handle_hc_map_gpa_range(run);
>> +        ret = kvm_handle_hc_map_gpa_range(run);
>> +
>> +    run->hypercall.ret = ret;
> 
> ret may be negative but hypercall.ret is u64. Do we need to set it to
> -ret?

If ret is less than zero, will stop the VM anyway as
RUN_STATE_INTERNAL_ERROR.

If this has to be fixed in QEMU, I think there's no need to set anything
if ret != 0; also because kvm_convert_memory() returns -1 on error and
that's not how the error would be passed to the guest.

However, I think the right fix should simply be this in KVM:

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 83fe0a78146f..e2118ba93ef6 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -10066,6 +10066,7 @@ unsigned long __kvm_emulate_hypercall(struct kvm_vcpu *vcpu, unsigned long nr,
  		}
  
  		vcpu->run->exit_reason        = KVM_EXIT_HYPERCALL;
+		vcpu->run->ret                = 0;
  		vcpu->run->hypercall.nr       = KVM_HC_MAP_GPA_RANGE;
  		vcpu->run->hypercall.args[0]  = gpa;
  		vcpu->run->hypercall.args[1]  = npages;

While there is arguably a change in behavior of the kernel both with
the patches in kvm-coco-queue and with the above one, _in practice_
the above change is one that userspace will not notice.

Paolo

>> -    return -EINVAL;
>> +    return ret;
>>   }
>>   
>>   #define VMX_INVALID_GUEST_STATE 0x80000021
>>
>> base-commit: ae35f033b874c627d81d51070187fbf55f0bf1a7
>> -- 
>> 2.46.0
>>
>>
Sean Christopherson Dec. 12, 2024, 7:13 p.m. UTC | #8
On Thu, Dec 12, 2024, Paolo Bonzini wrote:
> On 12/12/24 09:07, Zhao Liu wrote:
> > On Thu, Dec 12, 2024 at 11:26:28AM +0800, Binbin Wu wrote:
> > > Date: Thu, 12 Dec 2024 11:26:28 +0800
> > > From: Binbin Wu <binbin.wu@linux.intel.com>
> > > Subject: [PATCH] i386/kvm: Set return value after handling
> > >   KVM_EXIT_HYPERCALL
> > > X-Mailer: git-send-email 2.46.0
> > > 
> > > Userspace should set the ret field of hypercall after handling
> > > KVM_EXIT_HYPERCALL.  Otherwise, a stale value could be returned to KVM.
> > > 
> > > Fixes: 47e76d03b15 ("i386/kvm: Add KVM_EXIT_HYPERCALL handling for KVM_HC_MAP_GPA_RANGE")
> > > Reported-by: Farrah Chen <farrah.chen@intel.com>
> > > Signed-off-by: Binbin Wu <binbin.wu@linux.intel.com>
> > > Tested-by: Farrah Chen <farrah.chen@intel.com>
> > > ---
> > > To test the TDX code in kvm-coco-queue, please apply the patch to the QEMU,
> > > otherwise, TDX guest boot could fail.
> > > A matching QEMU tree including this patch is here:
> > > https://github.com/intel-staging/qemu-tdx/releases/tag/tdx-qemu-upstream-v6.1-fix_kvm_hypercall_return_value
> > > 
> > > Previously, the issue was not triggered because no one would modify the ret
> > > value. But with the refactor patch for __kvm_emulate_hypercall() in KVM,
> > > https://lore.kernel.org/kvm/20241128004344.4072099-7-seanjc@google.com/, the
> > > value could be modified.
> > 
> > Could you explain the specific reasons here in detail? It would be
> > helpful with debugging or reproducing the issue.
> > 
> > > ---
> > >   target/i386/kvm/kvm.c | 8 ++++++--
> > >   1 file changed, 6 insertions(+), 2 deletions(-)
> > > 
> > > diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
> > > index 8e17942c3b..4bcccb48d1 100644
> > > --- a/target/i386/kvm/kvm.c
> > > +++ b/target/i386/kvm/kvm.c
> > > @@ -6005,10 +6005,14 @@ static int kvm_handle_hc_map_gpa_range(struct kvm_run *run)
> > >   static int kvm_handle_hypercall(struct kvm_run *run)
> > >   {
> > > +    int ret = -EINVAL;
> > > +
> > >       if (run->hypercall.nr == KVM_HC_MAP_GPA_RANGE)
> > > -        return kvm_handle_hc_map_gpa_range(run);
> > > +        ret = kvm_handle_hc_map_gpa_range(run);
> > > +
> > > +    run->hypercall.ret = ret;
> > 
> > ret may be negative but hypercall.ret is u64. Do we need to set it to
> > -ret?
> 
> If ret is less than zero, will stop the VM anyway as
> RUN_STATE_INTERNAL_ERROR.
> 
> If this has to be fixed in QEMU, I think there's no need to set anything
> if ret != 0; also because kvm_convert_memory() returns -1 on error and
> that's not how the error would be passed to the guest.
> 
> However, I think the right fix should simply be this in KVM:
> 
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 83fe0a78146f..e2118ba93ef6 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -10066,6 +10066,7 @@ unsigned long __kvm_emulate_hypercall(struct kvm_vcpu *vcpu, unsigned long nr,
>  		}
>  		vcpu->run->exit_reason        = KVM_EXIT_HYPERCALL;
> +		vcpu->run->ret                = 0;

		vcpu->run->hypercall.ret

>  		vcpu->run->hypercall.nr       = KVM_HC_MAP_GPA_RANGE;
>  		vcpu->run->hypercall.args[0]  = gpa;
>  		vcpu->run->hypercall.args[1]  = npages;
> 
> While there is arguably a change in behavior of the kernel both with
> the patches in kvm-coco-queue and with the above one, _in practice_
> the above change is one that userspace will not notice.

I agree that KVM should initialize "ret", but I don't think '0' is the right
value.  KVM shouldn't assume userspace will successfully handle the hypercall.
What happens if KVM sets vcpu->run->hypercall.ret to a non-zero value, e.g. -KVM_ENOSYS?
Paolo Bonzini Dec. 12, 2024, 9:28 p.m. UTC | #9
On 12/12/24 20:13, Sean Christopherson wrote:
> On Thu, Dec 12, 2024, Paolo Bonzini wrote:
>> On 12/12/24 09:07, Zhao Liu wrote:
>>> On Thu, Dec 12, 2024 at 11:26:28AM +0800, Binbin Wu wrote:
>>>> Date: Thu, 12 Dec 2024 11:26:28 +0800
>>>> From: Binbin Wu <binbin.wu@linux.intel.com>
>>>> Subject: [PATCH] i386/kvm: Set return value after handling
>>>>    KVM_EXIT_HYPERCALL
>>>> X-Mailer: git-send-email 2.46.0
>>>>
>>>> Userspace should set the ret field of hypercall after handling
>>>> KVM_EXIT_HYPERCALL.  Otherwise, a stale value could be returned to KVM.
>>>>
>>>> Fixes: 47e76d03b15 ("i386/kvm: Add KVM_EXIT_HYPERCALL handling for KVM_HC_MAP_GPA_RANGE")
>>>> Reported-by: Farrah Chen <farrah.chen@intel.com>
>>>> Signed-off-by: Binbin Wu <binbin.wu@linux.intel.com>
>>>> Tested-by: Farrah Chen <farrah.chen@intel.com>
>>>> ---
>>>> To test the TDX code in kvm-coco-queue, please apply the patch to the QEMU,
>>>> otherwise, TDX guest boot could fail.
>>>> A matching QEMU tree including this patch is here:
>>>> https://github.com/intel-staging/qemu-tdx/releases/tag/tdx-qemu-upstream-v6.1-fix_kvm_hypercall_return_value
>>>>
>>>> Previously, the issue was not triggered because no one would modify the ret
>>>> value. But with the refactor patch for __kvm_emulate_hypercall() in KVM,
>>>> https://lore.kernel.org/kvm/20241128004344.4072099-7-seanjc@google.com/, the
>>>> value could be modified.
>>>
>>> Could you explain the specific reasons here in detail? It would be
>>> helpful with debugging or reproducing the issue.
>>>
>>>> ---
>>>>    target/i386/kvm/kvm.c | 8 ++++++--
>>>>    1 file changed, 6 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
>>>> index 8e17942c3b..4bcccb48d1 100644
>>>> --- a/target/i386/kvm/kvm.c
>>>> +++ b/target/i386/kvm/kvm.c
>>>> @@ -6005,10 +6005,14 @@ static int kvm_handle_hc_map_gpa_range(struct kvm_run *run)
>>>>    static int kvm_handle_hypercall(struct kvm_run *run)
>>>>    {
>>>> +    int ret = -EINVAL;
>>>> +
>>>>        if (run->hypercall.nr == KVM_HC_MAP_GPA_RANGE)
>>>> -        return kvm_handle_hc_map_gpa_range(run);
>>>> +        ret = kvm_handle_hc_map_gpa_range(run);
>>>> +
>>>> +    run->hypercall.ret = ret;
>>>
>>> ret may be negative but hypercall.ret is u64. Do we need to set it to
>>> -ret?
>>
>> If ret is less than zero, will stop the VM anyway as
>> RUN_STATE_INTERNAL_ERROR.
>>
>> If this has to be fixed in QEMU, I think there's no need to set anything
>> if ret != 0; also because kvm_convert_memory() returns -1 on error and
>> that's not how the error would be passed to the guest.
>>
>> However, I think the right fix should simply be this in KVM:
>>
>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>> index 83fe0a78146f..e2118ba93ef6 100644
>> --- a/arch/x86/kvm/x86.c
>> +++ b/arch/x86/kvm/x86.c
>> @@ -10066,6 +10066,7 @@ unsigned long __kvm_emulate_hypercall(struct kvm_vcpu *vcpu, unsigned long nr,
>>   		}
>>   		vcpu->run->exit_reason        = KVM_EXIT_HYPERCALL;
>> +		vcpu->run->ret                = 0;
> 
> 		vcpu->run->hypercall.ret
> 
>>   		vcpu->run->hypercall.nr       = KVM_HC_MAP_GPA_RANGE;
>>   		vcpu->run->hypercall.args[0]  = gpa;
>>   		vcpu->run->hypercall.args[1]  = npages;
>>
>> While there is arguably a change in behavior of the kernel both with
>> the patches in kvm-coco-queue and with the above one, _in practice_
>> the above change is one that userspace will not notice.
> 
> I agree that KVM should initialize "ret", but I don't think '0' is the right
> value.  KVM shouldn't assume userspace will successfully handle the hypercall.
> What happens if KVM sets vcpu->run->hypercall.ret to a non-zero value, e.g. -KVM_ENOSYS?

Unfortunately QEMU is never writing vcpu->run->hypercall.ret, so the 
guest sees -KVM_ENOSYS; this is basically the same bug that Binbin is 
fixing, just with a different value passed to the guest.

In other words, the above one-liner is pulling the "don't break 
userspace" card.

Paolo
Sean Christopherson Dec. 12, 2024, 10:11 p.m. UTC | #10
On Thu, Dec 12, 2024, Paolo Bonzini wrote:
> On 12/12/24 20:13, Sean Christopherson wrote:
> > On Thu, Dec 12, 2024, Paolo Bonzini wrote:
> > > If ret is less than zero, will stop the VM anyway as
> > > RUN_STATE_INTERNAL_ERROR.
> > > 
> > > If this has to be fixed in QEMU, I think there's no need to set anything
> > > if ret != 0; also because kvm_convert_memory() returns -1 on error and
> > > that's not how the error would be passed to the guest.
> > > 
> > > However, I think the right fix should simply be this in KVM:
> > > 
> > > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > > index 83fe0a78146f..e2118ba93ef6 100644
> > > --- a/arch/x86/kvm/x86.c
> > > +++ b/arch/x86/kvm/x86.c
> > > @@ -10066,6 +10066,7 @@ unsigned long __kvm_emulate_hypercall(struct kvm_vcpu *vcpu, unsigned long nr,
> > >   		}
> > >   		vcpu->run->exit_reason        = KVM_EXIT_HYPERCALL;
> > > +		vcpu->run->ret                = 0;
> > 
> > 		vcpu->run->hypercall.ret
> > 
> > >   		vcpu->run->hypercall.nr       = KVM_HC_MAP_GPA_RANGE;
> > >   		vcpu->run->hypercall.args[0]  = gpa;
> > >   		vcpu->run->hypercall.args[1]  = npages;
> > > 
> > > While there is arguably a change in behavior of the kernel both with
> > > the patches in kvm-coco-queue and with the above one, _in practice_
> > > the above change is one that userspace will not notice.
> > 
> > I agree that KVM should initialize "ret", but I don't think '0' is the right
> > value.  KVM shouldn't assume userspace will successfully handle the hypercall.
> > What happens if KVM sets vcpu->run->hypercall.ret to a non-zero value, e.g. -KVM_ENOSYS?
> 
> Unfortunately QEMU is never writing vcpu->run->hypercall.ret, so the guest
> sees -KVM_ENOSYS; this is basically the same bug that Binbin is fixing, just
> with a different value passed to the guest.
> 
> In other words, the above one-liner is pulling the "don't break userspace"
> card.

But how is anything breaking userspace?  QEMU needs to opt-in to intercepting
KVM_HC_MAP_GPA_RANGE, and this has been KVM's behavior since commit 0dbb11230437
("KVM: X86: Introduce KVM_HC_MAP_GPA_RANGE hypercall").

Ah, "ret" happens to be deep in the union and KVM zero allocates vcpu->run, so
QEMU gets lucky and "ret" happens to be zero because no other non-fatal userspace
exit on x86 happens to need as many bytes.  Hilarious.

FWIW, if TDX marshalls hypercall state into KVM's "normal" registers, then KVM's
shenanigans with vcpu->run->hypercall.ret might go away?  Though regardless of
what happens on that front, I think it makes to explicitly initialize "ret" to
*something*.

I checked our VMM, and it does the right thing, so I don't have any objection
to explicitly zeroing "ret".  Though it needs a comment explaining that it's a
terrible hack for broken userspace ;-)
Binbin Wu Dec. 13, 2024, 1:46 a.m. UTC | #11
On 12/13/2024 5:28 AM, Paolo Bonzini wrote:
> On 12/12/24 20:13, Sean Christopherson wrote:
>> On Thu, Dec 12, 2024, Paolo Bonzini wrote:
>>> On 12/12/24 09:07, Zhao Liu wrote:
>>>> On Thu, Dec 12, 2024 at 11:26:28AM +0800, Binbin Wu wrote:
>>>>> Date: Thu, 12 Dec 2024 11:26:28 +0800
>>>>> From: Binbin Wu <binbin.wu@linux.intel.com>
>>>>> Subject: [PATCH] i386/kvm: Set return value after handling
>>>>>    KVM_EXIT_HYPERCALL
>>>>> X-Mailer: git-send-email 2.46.0
>>>>>
>>>>> Userspace should set the ret field of hypercall after handling
>>>>> KVM_EXIT_HYPERCALL.  Otherwise, a stale value could be returned to KVM.
>>>>>
>>>>> Fixes: 47e76d03b15 ("i386/kvm: Add KVM_EXIT_HYPERCALL handling for KVM_HC_MAP_GPA_RANGE")
>>>>> Reported-by: Farrah Chen <farrah.chen@intel.com>
>>>>> Signed-off-by: Binbin Wu <binbin.wu@linux.intel.com>
>>>>> Tested-by: Farrah Chen <farrah.chen@intel.com>
>>>>> ---
>>>>> To test the TDX code in kvm-coco-queue, please apply the patch to the QEMU,
>>>>> otherwise, TDX guest boot could fail.
>>>>> A matching QEMU tree including this patch is here:
>>>>> https://github.com/intel-staging/qemu-tdx/releases/tag/tdx-qemu-upstream-v6.1-fix_kvm_hypercall_return_value
>>>>>
>>>>> Previously, the issue was not triggered because no one would modify the ret
>>>>> value. But with the refactor patch for __kvm_emulate_hypercall() in KVM,
>>>>> https://lore.kernel.org/kvm/20241128004344.4072099-7-seanjc@google.com/, the
>>>>> value could be modified.
>>>>
>>>> Could you explain the specific reasons here in detail? It would be
>>>> helpful with debugging or reproducing the issue.
>>>>
>>>>> ---
>>>>>    target/i386/kvm/kvm.c | 8 ++++++--
>>>>>    1 file changed, 6 insertions(+), 2 deletions(-)
>>>>>
>>>>> diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
>>>>> index 8e17942c3b..4bcccb48d1 100644
>>>>> --- a/target/i386/kvm/kvm.c
>>>>> +++ b/target/i386/kvm/kvm.c
>>>>> @@ -6005,10 +6005,14 @@ static int kvm_handle_hc_map_gpa_range(struct kvm_run *run)
>>>>>    static int kvm_handle_hypercall(struct kvm_run *run)
>>>>>    {
>>>>> +    int ret = -EINVAL;
>>>>> +
>>>>>        if (run->hypercall.nr == KVM_HC_MAP_GPA_RANGE)
>>>>> -        return kvm_handle_hc_map_gpa_range(run);
>>>>> +        ret = kvm_handle_hc_map_gpa_range(run);
>>>>> +
>>>>> +    run->hypercall.ret = ret;
>>>>
>>>> ret may be negative but hypercall.ret is u64. Do we need to set it to
>>>> -ret?
>>>
>>> If ret is less than zero, will stop the VM anyway as
>>> RUN_STATE_INTERNAL_ERROR.
>>>
>>> If this has to be fixed in QEMU, I think there's no need to set anything
>>> if ret != 0; also because kvm_convert_memory() returns -1 on error and
>>> that's not how the error would be passed to the guest.
>>>
>>> However, I think the right fix should simply be this in KVM:
>>>
>>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>>> index 83fe0a78146f..e2118ba93ef6 100644
>>> --- a/arch/x86/kvm/x86.c
>>> +++ b/arch/x86/kvm/x86.c
>>> @@ -10066,6 +10066,7 @@ unsigned long __kvm_emulate_hypercall(struct kvm_vcpu *vcpu, unsigned long nr,
>>>           }
>>>           vcpu->run->exit_reason        = KVM_EXIT_HYPERCALL;
>>> +        vcpu->run->ret                = 0;
>>
>>         vcpu->run->hypercall.ret
>>
>>> vcpu->run->hypercall.nr       = KVM_HC_MAP_GPA_RANGE;
>>>           vcpu->run->hypercall.args[0]  = gpa;
>>>           vcpu->run->hypercall.args[1]  = npages;
>>>
>>> While there is arguably a change in behavior of the kernel both with
>>> the patches in kvm-coco-queue and with the above one, _in practice_
>>> the above change is one that userspace will not notice.
>>
>> I agree that KVM should initialize "ret", but I don't think '0' is the right
>> value.  KVM shouldn't assume userspace will successfully handle the hypercall.
>> What happens if KVM sets vcpu->run->hypercall.ret to a non-zero value, e.g. -KVM_ENOSYS?
>
> Unfortunately QEMU is never writing vcpu->run->hypercall.ret, so the guest sees -KVM_ENOSYS; this is basically the same bug that Binbin is fixing, just with a different value passed to the guest.
>
> In other words, the above one-liner is pulling the "don't break userspace" card.
>
> Paolo
>
>
If the change need to be done in KVM, there are other 3 functions that use
KVM_EXIT_HYPERCALL based on the code in kvm-coco-queue.

diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 40fe7258843e..a624f7289282 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -3633,6 +3633,7 @@ static int snp_begin_psc_msr(struct vcpu_svm *svm, u64 ghcb_msr)
         }

         vcpu->run->exit_reason = KVM_EXIT_HYPERCALL;
+       vcpu->run->ret         = 0;
         vcpu->run->hypercall.nr = KVM_HC_MAP_GPA_RANGE;
         vcpu->run->hypercall.args[0] = gpa;
         vcpu->run->hypercall.args[1] = 1;
@@ -3796,6 +3797,7 @@ static int snp_begin_psc(struct vcpu_svm *svm, struct psc_buffer *psc)
         case VMGEXIT_PSC_OP_PRIVATE:
         case VMGEXIT_PSC_OP_SHARED:
                 vcpu->run->exit_reason = KVM_EXIT_HYPERCALL;
+               vcpu->run->ret         = 0;
                 vcpu->run->hypercall.nr = KVM_HC_MAP_GPA_RANGE;
                 vcpu->run->hypercall.args[0] = gfn_to_gpa(gfn);
                 vcpu->run->hypercall.args[1] = npages;
diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
index 85c8aee263c1..c50c2edc8c56 100644
--- a/arch/x86/kvm/vmx/tdx.c
+++ b/arch/x86/kvm/vmx/tdx.c
@@ -1161,6 +1161,7 @@ static void __tdx_map_gpa(struct vcpu_tdx * tdx)
         pr_err("%s: gpa = 0x%llx, size = 0x%llx", __func__, gpa, size);

         tdx->vcpu.run->exit_reason       = KVM_EXIT_HYPERCALL;
+       tdx->vcpu->run->ret              = 0;
         tdx->vcpu.run->hypercall.nr      = KVM_HC_MAP_GPA_RANGE;
         tdx->vcpu.run->hypercall.args[0] = gpa & ~gfn_to_gpa(kvm_gfn_direct_bits(tdx->vcpu.kvm));
         tdx->vcpu.run->hypercall.args[1] = size / PAGE_SIZE;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 4f94b1e24eae..3f82bb2357e3 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -10070,6 +10070,7 @@ int ____kvm_emulate_hypercall(struct kvm_vcpu *vcpu, unsigned long nr,
                 }

                 vcpu->run->exit_reason        = KVM_EXIT_HYPERCALL;
+               vcpu->run->ret                = 0;
                 vcpu->run->hypercall.nr       = KVM_HC_MAP_GPA_RANGE;
                 vcpu->run->hypercall.args[0]  = gpa;
                 vcpu->run->hypercall.args[1]  = npages;
Binbin Wu Dec. 13, 2024, 1:52 a.m. UTC | #12
On 12/13/2024 9:46 AM, Binbin Wu wrote:
>
>
>
> On 12/13/2024 5:28 AM, Paolo Bonzini wrote:
>> On 12/12/24 20:13, Sean Christopherson wrote:
>>> On Thu, Dec 12, 2024, Paolo Bonzini wrote:
>>>> On 12/12/24 09:07, Zhao Liu wrote:
>>>>> On Thu, Dec 12, 2024 at 11:26:28AM +0800, Binbin Wu wrote:
>>>>>> Date: Thu, 12 Dec 2024 11:26:28 +0800
>>>>>> From: Binbin Wu <binbin.wu@linux.intel.com>
>>>>>> Subject: [PATCH] i386/kvm: Set return value after handling
>>>>>>    KVM_EXIT_HYPERCALL
>>>>>> X-Mailer: git-send-email 2.46.0
>>>>>>
>>>>>> Userspace should set the ret field of hypercall after handling
>>>>>> KVM_EXIT_HYPERCALL.  Otherwise, a stale value could be returned to KVM.
>>>>>>
>>>>>> Fixes: 47e76d03b15 ("i386/kvm: Add KVM_EXIT_HYPERCALL handling for KVM_HC_MAP_GPA_RANGE")
>>>>>> Reported-by: Farrah Chen <farrah.chen@intel.com>
>>>>>> Signed-off-by: Binbin Wu <binbin.wu@linux.intel.com>
>>>>>> Tested-by: Farrah Chen <farrah.chen@intel.com>
>>>>>> ---
>>>>>> To test the TDX code in kvm-coco-queue, please apply the patch to the QEMU,
>>>>>> otherwise, TDX guest boot could fail.
>>>>>> A matching QEMU tree including this patch is here:
>>>>>> https://github.com/intel-staging/qemu-tdx/releases/tag/tdx-qemu-upstream-v6.1-fix_kvm_hypercall_return_value
>>>>>>
>>>>>> Previously, the issue was not triggered because no one would modify the ret
>>>>>> value. But with the refactor patch for __kvm_emulate_hypercall() in KVM,
>>>>>> https://lore.kernel.org/kvm/20241128004344.4072099-7-seanjc@google.com/, the
>>>>>> value could be modified.
>>>>>
>>>>> Could you explain the specific reasons here in detail? It would be
>>>>> helpful with debugging or reproducing the issue.
>>>>>
>>>>>> ---
>>>>>>    target/i386/kvm/kvm.c | 8 ++++++--
>>>>>>    1 file changed, 6 insertions(+), 2 deletions(-)
>>>>>>
>>>>>> diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
>>>>>> index 8e17942c3b..4bcccb48d1 100644
>>>>>> --- a/target/i386/kvm/kvm.c
>>>>>> +++ b/target/i386/kvm/kvm.c
>>>>>> @@ -6005,10 +6005,14 @@ static int kvm_handle_hc_map_gpa_range(struct kvm_run *run)
>>>>>>    static int kvm_handle_hypercall(struct kvm_run *run)
>>>>>>    {
>>>>>> +    int ret = -EINVAL;
>>>>>> +
>>>>>>        if (run->hypercall.nr == KVM_HC_MAP_GPA_RANGE)
>>>>>> -        return kvm_handle_hc_map_gpa_range(run);
>>>>>> +        ret = kvm_handle_hc_map_gpa_range(run);
>>>>>> +
>>>>>> +    run->hypercall.ret = ret;
>>>>>
>>>>> ret may be negative but hypercall.ret is u64. Do we need to set it to
>>>>> -ret?
>>>>
>>>> If ret is less than zero, will stop the VM anyway as
>>>> RUN_STATE_INTERNAL_ERROR.
>>>>
>>>> If this has to be fixed in QEMU, I think there's no need to set anything
>>>> if ret != 0; also because kvm_convert_memory() returns -1 on error and
>>>> that's not how the error would be passed to the guest.
>>>>
>>>> However, I think the right fix should simply be this in KVM:
>>>>
>>>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>>>> index 83fe0a78146f..e2118ba93ef6 100644
>>>> --- a/arch/x86/kvm/x86.c
>>>> +++ b/arch/x86/kvm/x86.c
>>>> @@ -10066,6 +10066,7 @@ unsigned long __kvm_emulate_hypercall(struct kvm_vcpu *vcpu, unsigned long nr,
>>>>           }
>>>>           vcpu->run->exit_reason        = KVM_EXIT_HYPERCALL;
>>>> +        vcpu->run->ret                = 0;
>>>
>>>         vcpu->run->hypercall.ret
>>>
>>>> vcpu->run->hypercall.nr       = KVM_HC_MAP_GPA_RANGE;
>>>>           vcpu->run->hypercall.args[0]  = gpa;
>>>>           vcpu->run->hypercall.args[1]  = npages;
>>>>
>>>> While there is arguably a change in behavior of the kernel both with
>>>> the patches in kvm-coco-queue and with the above one, _in practice_
>>>> the above change is one that userspace will not notice.
>>>
>>> I agree that KVM should initialize "ret", but I don't think '0' is the right
>>> value.  KVM shouldn't assume userspace will successfully handle the hypercall.
>>> What happens if KVM sets vcpu->run->hypercall.ret to a non-zero value, e.g. -KVM_ENOSYS?
>>
>> Unfortunately QEMU is never writing vcpu->run->hypercall.ret, so the guest sees -KVM_ENOSYS; this is basically the same bug that Binbin is fixing, just with a different value passed to the guest.
>>
>> In other words, the above one-liner is pulling the "don't break userspace" card.
>>
>> Paolo
>>
>>
> If the change need to be done in KVM, there are other 3 functions that use
> KVM_EXIT_HYPERCALL based on the code in kvm-coco-queue.
>
> diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
> index 40fe7258843e..a624f7289282 100644
> --- a/arch/x86/kvm/svm/sev.c
> +++ b/arch/x86/kvm/svm/sev.c
> @@ -3633,6 +3633,7 @@ static int snp_begin_psc_msr(struct vcpu_svm *svm, u64 ghcb_msr)
>         }
>
>         vcpu->run->exit_reason = KVM_EXIT_HYPERCALL;
> +       vcpu->run->ret         = 0;
>         vcpu->run->hypercall.nr = KVM_HC_MAP_GPA_RANGE;
>         vcpu->run->hypercall.args[0] = gpa;
>         vcpu->run->hypercall.args[1] = 1;
> @@ -3796,6 +3797,7 @@ static int snp_begin_psc(struct vcpu_svm *svm, struct psc_buffer *psc)
>         case VMGEXIT_PSC_OP_PRIVATE:
>         case VMGEXIT_PSC_OP_SHARED:
>                 vcpu->run->exit_reason = KVM_EXIT_HYPERCALL;
> +               vcpu->run->ret         = 0;
>                 vcpu->run->hypercall.nr = KVM_HC_MAP_GPA_RANGE;
>                 vcpu->run->hypercall.args[0] = gfn_to_gpa(gfn);
>                 vcpu->run->hypercall.args[1] = npages;
> diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
> index 85c8aee263c1..c50c2edc8c56 100644
> --- a/arch/x86/kvm/vmx/tdx.c
> +++ b/arch/x86/kvm/vmx/tdx.c
> @@ -1161,6 +1161,7 @@ static void __tdx_map_gpa(struct vcpu_tdx * tdx)
>         pr_err("%s: gpa = 0x%llx, size = 0x%llx", __func__, gpa, size);
>
>         tdx->vcpu.run->exit_reason       = KVM_EXIT_HYPERCALL;
> +       tdx->vcpu->run->ret              = 0;
>         tdx->vcpu.run->hypercall.nr      = KVM_HC_MAP_GPA_RANGE;
>         tdx->vcpu.run->hypercall.args[0] = gpa & ~gfn_to_gpa(kvm_gfn_direct_bits(tdx->vcpu.kvm));
>         tdx->vcpu.run->hypercall.args[1] = size / PAGE_SIZE;
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 4f94b1e24eae..3f82bb2357e3 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -10070,6 +10070,7 @@ int ____kvm_emulate_hypercall(struct kvm_vcpu *vcpu, unsigned long nr,
>                 }
>
>                 vcpu->run->exit_reason        = KVM_EXIT_HYPERCALL;
> +               vcpu->run->ret                = 0;
>                 vcpu->run->hypercall.nr       = KVM_HC_MAP_GPA_RANGE;
>                 vcpu->run->hypercall.args[0]  = gpa;
>                 vcpu->run->hypercall.args[1]  = npages;
>
Maybe we could add a helper to fill the vcpu->run?
Binbin Wu Dec. 13, 2024, 1:56 a.m. UTC | #13
On 12/13/2024 9:46 AM, Binbin Wu wrote:
>
>
>
> On 12/13/2024 5:28 AM, Paolo Bonzini wrote:
>> On 12/12/24 20:13, Sean Christopherson wrote:
>>> On Thu, Dec 12, 2024, Paolo Bonzini wrote:
>>>> On 12/12/24 09:07, Zhao Liu wrote:
>>>>> On Thu, Dec 12, 2024 at 11:26:28AM +0800, Binbin Wu wrote:
>>>>>> Date: Thu, 12 Dec 2024 11:26:28 +0800
>>>>>> From: Binbin Wu <binbin.wu@linux.intel.com>
>>>>>> Subject: [PATCH] i386/kvm: Set return value after handling
>>>>>>    KVM_EXIT_HYPERCALL
>>>>>> X-Mailer: git-send-email 2.46.0
>>>>>>
>>>>>> Userspace should set the ret field of hypercall after handling
>>>>>> KVM_EXIT_HYPERCALL.  Otherwise, a stale value could be returned to KVM.
>>>>>>
>>>>>> Fixes: 47e76d03b15 ("i386/kvm: Add KVM_EXIT_HYPERCALL handling for KVM_HC_MAP_GPA_RANGE")
>>>>>> Reported-by: Farrah Chen <farrah.chen@intel.com>
>>>>>> Signed-off-by: Binbin Wu <binbin.wu@linux.intel.com>
>>>>>> Tested-by: Farrah Chen <farrah.chen@intel.com>
>>>>>> ---
>>>>>> To test the TDX code in kvm-coco-queue, please apply the patch to the QEMU,
>>>>>> otherwise, TDX guest boot could fail.
>>>>>> A matching QEMU tree including this patch is here:
>>>>>> https://github.com/intel-staging/qemu-tdx/releases/tag/tdx-qemu-upstream-v6.1-fix_kvm_hypercall_return_value
>>>>>>
>>>>>> Previously, the issue was not triggered because no one would modify the ret
>>>>>> value. But with the refactor patch for __kvm_emulate_hypercall() in KVM,
>>>>>> https://lore.kernel.org/kvm/20241128004344.4072099-7-seanjc@google.com/, the
>>>>>> value could be modified.
>>>>>
>>>>> Could you explain the specific reasons here in detail? It would be
>>>>> helpful with debugging or reproducing the issue.
>>>>>
>>>>>> ---
>>>>>>    target/i386/kvm/kvm.c | 8 ++++++--
>>>>>>    1 file changed, 6 insertions(+), 2 deletions(-)
>>>>>>
>>>>>> diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
>>>>>> index 8e17942c3b..4bcccb48d1 100644
>>>>>> --- a/target/i386/kvm/kvm.c
>>>>>> +++ b/target/i386/kvm/kvm.c
>>>>>> @@ -6005,10 +6005,14 @@ static int kvm_handle_hc_map_gpa_range(struct kvm_run *run)
>>>>>>    static int kvm_handle_hypercall(struct kvm_run *run)
>>>>>>    {
>>>>>> +    int ret = -EINVAL;
>>>>>> +
>>>>>>        if (run->hypercall.nr == KVM_HC_MAP_GPA_RANGE)
>>>>>> -        return kvm_handle_hc_map_gpa_range(run);
>>>>>> +        ret = kvm_handle_hc_map_gpa_range(run);
>>>>>> +
>>>>>> +    run->hypercall.ret = ret;
>>>>>
>>>>> ret may be negative but hypercall.ret is u64. Do we need to set it to
>>>>> -ret?
>>>>
>>>> If ret is less than zero, will stop the VM anyway as
>>>> RUN_STATE_INTERNAL_ERROR.
>>>>
>>>> If this has to be fixed in QEMU, I think there's no need to set anything
>>>> if ret != 0; also because kvm_convert_memory() returns -1 on error and
>>>> that's not how the error would be passed to the guest.
>>>>
>>>> However, I think the right fix should simply be this in KVM:
>>>>
>>>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>>>> index 83fe0a78146f..e2118ba93ef6 100644
>>>> --- a/arch/x86/kvm/x86.c
>>>> +++ b/arch/x86/kvm/x86.c
>>>> @@ -10066,6 +10066,7 @@ unsigned long __kvm_emulate_hypercall(struct kvm_vcpu *vcpu, unsigned long nr,
>>>>           }
>>>>           vcpu->run->exit_reason        = KVM_EXIT_HYPERCALL;
>>>> +        vcpu->run->ret                = 0;
>>>
>>>         vcpu->run->hypercall.ret
>>>
>>>> vcpu->run->hypercall.nr       = KVM_HC_MAP_GPA_RANGE;
>>>>           vcpu->run->hypercall.args[0]  = gpa;
>>>>           vcpu->run->hypercall.args[1]  = npages;
>>>>
>>>> While there is arguably a change in behavior of the kernel both with
>>>> the patches in kvm-coco-queue and with the above one, _in practice_
>>>> the above change is one that userspace will not notice.
>>>
>>> I agree that KVM should initialize "ret", but I don't think '0' is the right
>>> value.  KVM shouldn't assume userspace will successfully handle the hypercall.
>>> What happens if KVM sets vcpu->run->hypercall.ret to a non-zero value, e.g. -KVM_ENOSYS?
>>
>> Unfortunately QEMU is never writing vcpu->run->hypercall.ret, so the guest sees -KVM_ENOSYS; this is basically the same bug that Binbin is fixing, just with a different value passed to the guest.
>>
>> In other words, the above one-liner is pulling the "don't break userspace" card.
>>
>> Paolo
>>
>>
> If the change need to be done in KVM, there are other 3 functions that use
> KVM_EXIT_HYPERCALL based on the code in kvm-coco-queue.
>
> diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
> index 40fe7258843e..a624f7289282 100644
> --- a/arch/x86/kvm/svm/sev.c
> +++ b/arch/x86/kvm/svm/sev.c
> @@ -3633,6 +3633,7 @@ static int snp_begin_psc_msr(struct vcpu_svm *svm, u64 ghcb_msr)
>         }
>
>         vcpu->run->exit_reason = KVM_EXIT_HYPERCALL;
> +       vcpu->run->ret         = 0;
>         vcpu->run->hypercall.nr = KVM_HC_MAP_GPA_RANGE;
>         vcpu->run->hypercall.args[0] = gpa;
>         vcpu->run->hypercall.args[1] = 1;
> @@ -3796,6 +3797,7 @@ static int snp_begin_psc(struct vcpu_svm *svm, struct psc_buffer *psc)
>         case VMGEXIT_PSC_OP_PRIVATE:
>         case VMGEXIT_PSC_OP_SHARED:
>                 vcpu->run->exit_reason = KVM_EXIT_HYPERCALL;
> +               vcpu->run->ret         = 0;
>                 vcpu->run->hypercall.nr = KVM_HC_MAP_GPA_RANGE;
>                 vcpu->run->hypercall.args[0] = gfn_to_gpa(gfn);
>                 vcpu->run->hypercall.args[1] = npages;
> diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
> index 85c8aee263c1..c50c2edc8c56 100644
> --- a/arch/x86/kvm/vmx/tdx.c
> +++ b/arch/x86/kvm/vmx/tdx.c
> @@ -1161,6 +1161,7 @@ static void __tdx_map_gpa(struct vcpu_tdx * tdx)
>         pr_err("%s: gpa = 0x%llx, size = 0x%llx", __func__, gpa, size);
>
>         tdx->vcpu.run->exit_reason       = KVM_EXIT_HYPERCALL;
> +       tdx->vcpu->run->ret              = 0;
Sorry, this should be  " tdx->vcpu.run->ret              = 0;"

> tdx->vcpu.run->hypercall.nr      = KVM_HC_MAP_GPA_RANGE;
>         tdx->vcpu.run->hypercall.args[0] = gpa & ~gfn_to_gpa(kvm_gfn_direct_bits(tdx->vcpu.kvm));
>         tdx->vcpu.run->hypercall.args[1] = size / PAGE_SIZE;
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 4f94b1e24eae..3f82bb2357e3 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -10070,6 +10070,7 @@ int ____kvm_emulate_hypercall(struct kvm_vcpu *vcpu, unsigned long nr,
>                 }
>
>                 vcpu->run->exit_reason        = KVM_EXIT_HYPERCALL;
> +               vcpu->run->ret                = 0;
>                 vcpu->run->hypercall.nr       = KVM_HC_MAP_GPA_RANGE;
>                 vcpu->run->hypercall.args[0]  = gpa;
>                 vcpu->run->hypercall.args[1]  = npages;
>
diff mbox series

Patch

diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index 8e17942c3b..4bcccb48d1 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -6005,10 +6005,14 @@  static int kvm_handle_hc_map_gpa_range(struct kvm_run *run)
 
 static int kvm_handle_hypercall(struct kvm_run *run)
 {
+    int ret = -EINVAL;
+
     if (run->hypercall.nr == KVM_HC_MAP_GPA_RANGE)
-        return kvm_handle_hc_map_gpa_range(run);
+        ret = kvm_handle_hc_map_gpa_range(run);
+
+    run->hypercall.ret = ret;
 
-    return -EINVAL;
+    return ret;
 }
 
 #define VMX_INVALID_GUEST_STATE 0x80000021