diff mbox series

[v2,6/8] KVM: TDX: Handle TDG.VP.VMCALL<ReportFatalError>

Message ID 20250211025442.3071607-7-binbin.wu@linux.intel.com (mailing list archive)
State New
Headers show
Series KVM: TDX: TDX hypercalls may exit to userspace | expand

Commit Message

Binbin Wu Feb. 11, 2025, 2:54 a.m. UTC
Convert TDG.VP.VMCALL<ReportFatalError> to KVM_EXIT_SYSTEM_EVENT with
a new type KVM_SYSTEM_EVENT_TDX_FATAL and forward it to userspace for
handling.

TD guest can use TDG.VP.VMCALL<ReportFatalError> to report the fatal
error it has experienced.  This hypercall is special because TD guest
is requesting a termination with the error information, KVM needs to
forward the hypercall to userspace anyway, KVM doesn't do sanity checks
and let userspace decide what to do.

Signed-off-by: Binbin Wu <binbin.wu@linux.intel.com>
---
Hypercalls exit to userspace v2:
- Use vp_enter_args instead of x86 registers.
- vcpu->run->system_event.ndata is not hardcoded to 10. (Xiaoyao)
- Undefine COPY_REG after use. (Yilun)
- Updated the document about KVM_SYSTEM_EVENT_TDX_FATAL. (Chao)

Hypercalls exit to userspace v1:
- New added.
  Implement one of the hypercalls need to exit to userspace for handling after
  reverting "KVM: TDX: Add KVM Exit for TDX TDG.VP.VMCALL", which tries to resolve
  Sean's comment.
  https://lore.kernel.org/kvm/Zg18ul8Q4PGQMWam@google.com/
- Use TDVMCALL_STATUS prefix for TDX call status codes (Binbin)
---
 Documentation/virt/kvm/api.rst |  9 +++++++
 arch/x86/kvm/vmx/tdx.c         | 45 ++++++++++++++++++++++++++++++++++
 include/uapi/linux/kvm.h       |  1 +
 3 files changed, 55 insertions(+)

Comments

Sean Christopherson Feb. 12, 2025, 12:18 a.m. UTC | #1
On Tue, Feb 11, 2025, Binbin Wu wrote:
> +static int tdx_report_fatal_error(struct kvm_vcpu *vcpu)
> +{
> +	struct vcpu_tdx *tdx = to_tdx(vcpu);
> +	u64 reg_mask = tdx->vp_enter_args.rcx;
> +	u64 *opt_regs;
> +
> +	/*
> +	 * Skip sanity checks and let userspace decide what to do if sanity
> +	 * checks fail.
> +	 */
> +	vcpu->run->exit_reason = KVM_EXIT_SYSTEM_EVENT;
> +	vcpu->run->system_event.type = KVM_SYSTEM_EVENT_TDX_FATAL;
> +	/* Error codes. */
> +	vcpu->run->system_event.data[0] = tdx->vp_enter_args.r12;
> +	/* GPA of additional information page. */
> +	vcpu->run->system_event.data[1] = tdx->vp_enter_args.r13;
> +	/* Information passed via registers (up to 64 bytes). */
> +	opt_regs = &vcpu->run->system_event.data[2];
> +
> +#define COPY_REG(REG, MASK)						\
> +	do {								\
> +		if (reg_mask & MASK) {					\

Based on past experience with conditionally filling kvm_run fields, I think KVM
should copy all registers and let userspace sort out the reg_mask.  Unless the
guest passes an ASCII byte stream exactly as the GHCI suggests, the information
is quite useless because userspace doesn't have reg_mask and so can't know what's
in data[4], data[5], etc...  And I won't be the least bit surprised if guests
deviate from the GHCI.

> +			*opt_regs = tdx->vp_enter_args.REG;		\
> +			opt_regs++;					\
> +		}							\
> +	} while (0)
> +
> +	/* The order is defined in GHCI. */

Assuming I haven't missed something, to hell with the GCHI, just dump *all*
registers, sorted by their index (ascending).  Including RAX (TDCALL), RBP, and
RSP.
Binbin Wu Feb. 12, 2025, 5:37 a.m. UTC | #2
On 2/12/2025 8:18 AM, Sean Christopherson wrote:
> On Tue, Feb 11, 2025, Binbin Wu wrote:
>> +static int tdx_report_fatal_error(struct kvm_vcpu *vcpu)
>> +{
>> +	struct vcpu_tdx *tdx = to_tdx(vcpu);
>> +	u64 reg_mask = tdx->vp_enter_args.rcx;
>> +	u64 *opt_regs;
>> +
>> +	/*
>> +	 * Skip sanity checks and let userspace decide what to do if sanity
>> +	 * checks fail.
>> +	 */
>> +	vcpu->run->exit_reason = KVM_EXIT_SYSTEM_EVENT;
>> +	vcpu->run->system_event.type = KVM_SYSTEM_EVENT_TDX_FATAL;
>> +	/* Error codes. */
>> +	vcpu->run->system_event.data[0] = tdx->vp_enter_args.r12;
>> +	/* GPA of additional information page. */
>> +	vcpu->run->system_event.data[1] = tdx->vp_enter_args.r13;
>> +	/* Information passed via registers (up to 64 bytes). */
>> +	opt_regs = &vcpu->run->system_event.data[2];
>> +
>> +#define COPY_REG(REG, MASK)						\
>> +	do {								\
>> +		if (reg_mask & MASK) {					\
> Based on past experience with conditionally filling kvm_run fields, I think KVM
> should copy all registers and let userspace sort out the reg_mask.  Unless the
> guest passes an ASCII byte stream exactly as the GHCI suggests,
Yea, GHCI doesn't enforce it to be ASCII byte stream.

> the information
> is quite useless because userspace doesn't have reg_mask and so can't know what's
> in data[4], data[5], etc...  And I won't be the least bit surprised if guests
> deviate from the GHCI.

But it also confuses userspace if guests uses special protocol to pass
information other than ASCII byte stream.

Anyway, dumping all registers to userspace and let userspace to have all
the information passed from guest for parsing is definitely workable.

>
>> +			*opt_regs = tdx->vp_enter_args.REG;		\
>> +			opt_regs++;					\
>> +		}							\
>> +	} while (0)
>> +
>> +	/* The order is defined in GHCI. */
> Assuming I haven't missed something, to hell with the GCHI, just dump *all*
> registers, sorted by their index (ascending).  Including RAX (TDCALL), RBP, and
> RSP.
>
Sean Christopherson Feb. 12, 2025, 1:53 p.m. UTC | #3
On Wed, Feb 12, 2025, Binbin Wu wrote:
> On 2/12/2025 8:18 AM, Sean Christopherson wrote:
> > On Tue, Feb 11, 2025, Binbin Wu wrote:
> > the information is quite useless because userspace doesn't have reg_mask
> > and so can't know what's in data[4], data[5], etc...  And I won't be the
> > least bit surprised if guests deviate from the GHCI.
> 
> But it also confuses userspace if guests uses special protocol to pass
> information other than ASCII byte stream.

Yes, but only if userspace and the guest aren't in cahoots.  There are use cases
where the VMM and the VM are managed/owned by the same entity.
diff mbox series

Patch

diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index 2ba70c1fad51..5e415b312ab0 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -6823,6 +6823,7 @@  should put the acknowledged interrupt vector into the 'epr' field.
   #define KVM_SYSTEM_EVENT_WAKEUP         4
   #define KVM_SYSTEM_EVENT_SUSPEND        5
   #define KVM_SYSTEM_EVENT_SEV_TERM       6
+  #define KVM_SYSTEM_EVENT_TDX_FATAL      7
 			__u32 type;
                         __u32 ndata;
                         __u64 data[16];
@@ -6849,6 +6850,14 @@  Valid values for 'type' are:
    reset/shutdown of the VM.
  - KVM_SYSTEM_EVENT_SEV_TERM -- an AMD SEV guest requested termination.
    The guest physical address of the guest's GHCB is stored in `data[0]`.
+ - KVM_SYSTEM_EVENT_TDX_FATAL -- a TDX guest reported a fatal error state.
+   The error code reported by the TDX guest is stored in `data[0]`, the error
+   code format is defined in TDX GHCI specification.
+   If the bit 63 of `data[0]` is set, it indicates there is TD specified
+   additional information provided in a page, which is shared memory. The
+   guest physical address of the information page is stored in `data[1]`.
+   An optional error message is provided by `data[2]` ~ `data[9]`, which is
+   byte sequence, LSB filled first. Typically, ASCII code(0x20-0x7e) is filled.
  - KVM_SYSTEM_EVENT_WAKEUP -- the exiting vCPU is in a suspended state and
    KVM has recognized a wakeup event. Userspace may honor this event by
    marking the exiting vCPU as runnable, or deny it and call KVM_RUN again.
diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
index 8b51b4c937e9..85956768c515 100644
--- a/arch/x86/kvm/vmx/tdx.c
+++ b/arch/x86/kvm/vmx/tdx.c
@@ -1092,11 +1092,56 @@  static int tdx_map_gpa(struct kvm_vcpu *vcpu)
 	return 1;
 }
 
+static int tdx_report_fatal_error(struct kvm_vcpu *vcpu)
+{
+	struct vcpu_tdx *tdx = to_tdx(vcpu);
+	u64 reg_mask = tdx->vp_enter_args.rcx;
+	u64 *opt_regs;
+
+	/*
+	 * Skip sanity checks and let userspace decide what to do if sanity
+	 * checks fail.
+	 */
+	vcpu->run->exit_reason = KVM_EXIT_SYSTEM_EVENT;
+	vcpu->run->system_event.type = KVM_SYSTEM_EVENT_TDX_FATAL;
+	/* Error codes. */
+	vcpu->run->system_event.data[0] = tdx->vp_enter_args.r12;
+	/* GPA of additional information page. */
+	vcpu->run->system_event.data[1] = tdx->vp_enter_args.r13;
+	/* Information passed via registers (up to 64 bytes). */
+	opt_regs = &vcpu->run->system_event.data[2];
+
+#define COPY_REG(REG, MASK)						\
+	do {								\
+		if (reg_mask & MASK) {					\
+			*opt_regs = tdx->vp_enter_args.REG;		\
+			opt_regs++;					\
+		}							\
+	} while (0)
+
+	/* The order is defined in GHCI. */
+	COPY_REG(r14, BIT_ULL(14));
+	COPY_REG(r15, BIT_ULL(15));
+	COPY_REG(rbx, BIT_ULL(3));
+	COPY_REG(rdi, BIT_ULL(7));
+	COPY_REG(rsi, BIT_ULL(6));
+	COPY_REG(r8, BIT_ULL(8));
+	COPY_REG(r9, BIT_ULL(9));
+	COPY_REG(rdx, BIT_ULL(2));
+#undef COPY_REG
+
+	vcpu->run->system_event.ndata = opt_regs - vcpu->run->system_event.data;
+
+	return 0;
+}
+
 static int handle_tdvmcall(struct kvm_vcpu *vcpu)
 {
 	switch (tdvmcall_leaf(vcpu)) {
 	case TDVMCALL_MAP_GPA:
 		return tdx_map_gpa(vcpu);
+	case TDVMCALL_REPORT_FATAL_ERROR:
+		return tdx_report_fatal_error(vcpu);
 	default:
 		break;
 	}
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 45e6d8fca9b9..937400350317 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -375,6 +375,7 @@  struct kvm_run {
 #define KVM_SYSTEM_EVENT_WAKEUP         4
 #define KVM_SYSTEM_EVENT_SUSPEND        5
 #define KVM_SYSTEM_EVENT_SEV_TERM       6
+#define KVM_SYSTEM_EVENT_TDX_FATAL      7
 			__u32 type;
 			__u32 ndata;
 			union {