diff mbox series

[for-5.18] KVM: fix bad user ABI for KVM_EXIT_SYSTEM_EVENT

Message ID 20220422103013.34832-1-pbonzini@redhat.com (mailing list archive)
State New, archived
Headers show
Series [for-5.18] KVM: fix bad user ABI for KVM_EXIT_SYSTEM_EVENT | expand

Commit Message

Paolo Bonzini April 22, 2022, 10:30 a.m. UTC
When KVM_EXIT_SYSTEM_EVENT was introduced, it included a flags
member that at the time was unused.  Unfortunately this extensibility
mechanism has several issues:

- x86 is not writing the member, so it is not possible to use it
  on x86 except for new events

- the member is not aligned to 64 bits, so the definition of the
  uAPI struct is incorrect for 32-bit userspace.  This is a problem
  for RISC-V, which supports CONFIG_KVM_COMPAT.

Since padding has to be introduced, place a new field in there
that tells if the flags field is valid.  To allow further extensibility,
in fact, change flags to an array of 16 values, and store how many
of the values are valid.  The availability of the new ndata field
is tied to a system capability; all architectures are changed to
fill in the field.

For compatibility with userspace that was using the flags field,
a union overlaps flags with data[0].

Supersedes: <20220421180443.1465634-1-pbonzini@redhat.com>
Cc: Will Deacon <will@kernel.org>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Peter Gonda <pgonda@google.com>
Cc: Sean Christopherson <seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 Documentation/virt/kvm/api.rst | 24 +++++++++++++++++-------
 arch/arm64/kvm/psci.c          |  3 ++-
 arch/riscv/kvm/vcpu_sbi.c      |  3 ++-
 arch/x86/kvm/x86.c             |  2 ++
 include/uapi/linux/kvm.h       |  8 +++++++-
 virt/kvm/kvm_main.c            |  1 +
 6 files changed, 31 insertions(+), 10 deletions(-)

Comments

kernel test robot April 22, 2022, 7:57 p.m. UTC | #1
Hi Paolo,

I love your patch! Yet something to improve:

[auto build test ERROR on kvm/master]
[also build test ERROR on linus/master v5.18-rc3]
[cannot apply to kvmarm/next mst-vhost/linux-next linux/master next-20220422]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/intel-lab-lkp/linux/commits/Paolo-Bonzini/KVM-fix-bad-user-ABI-for-KVM_EXIT_SYSTEM_EVENT/20220422-184535
base:   https://git.kernel.org/pub/scm/virt/kvm/kvm.git master
config: riscv-randconfig-r014-20220422 (https://download.01.org/0day-ci/archive/20220423/202204230312.8EOM8DHM-lkp@intel.com/config)
compiler: clang version 15.0.0 (https://github.com/llvm/llvm-project 5bd87350a5ae429baf8f373cb226a57b62f87280)
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # install riscv cross compiling tool for clang build
        # apt-get install binutils-riscv64-linux-gnu
        # https://github.com/intel-lab-lkp/linux/commit/71af2c55b700878ad9d73c90c6b75f327f36d0a9
        git remote add linux-review https://github.com/intel-lab-lkp/linux
        git fetch --no-tags linux-review Paolo-Bonzini/KVM-fix-bad-user-ABI-for-KVM_EXIT_SYSTEM_EVENT/20220422-184535
        git checkout 71af2c55b700878ad9d73c90c6b75f327f36d0a9
        # save the config file
        mkdir build_dir && cp config build_dir/.config
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 O=build_dir ARCH=riscv SHELL=/bin/bash

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

>> arch/riscv/kvm/vcpu_sbi.c:98:30: error: use of undeclared identifier 'reason'
           run->system_event.data[0] = reason;
                                       ^
   1 error generated.


vim +/reason +98 arch/riscv/kvm/vcpu_sbi.c

    83	
    84	void kvm_riscv_vcpu_sbi_system_reset(struct kvm_vcpu *vcpu,
    85					     struct kvm_run *run,
    86					     u32 type, u64 flags)
    87	{
    88		unsigned long i;
    89		struct kvm_vcpu *tmp;
    90	
    91		kvm_for_each_vcpu(i, tmp, vcpu->kvm)
    92			tmp->arch.power_off = true;
    93		kvm_make_all_cpus_request(vcpu->kvm, KVM_REQ_SLEEP);
    94	
    95		memset(&run->system_event, 0, sizeof(run->system_event));
    96		run->system_event.type = type;
    97		run->system_event.ndata = 1;
  > 98		run->system_event.data[0] = reason;
    99		run->exit_reason = KVM_EXIT_SYSTEM_EVENT;
   100	}
   101
Oliver Upton April 29, 2022, 4:38 a.m. UTC | #2
Hi Paolo,

On Fri, Apr 22, 2022 at 12:30:13PM +0200, Paolo Bonzini wrote:
> When KVM_EXIT_SYSTEM_EVENT was introduced, it included a flags
> member that at the time was unused.  Unfortunately this extensibility
> mechanism has several issues:
> 
> - x86 is not writing the member, so it is not possible to use it
>   on x86 except for new events
> 
> - the member is not aligned to 64 bits, so the definition of the
>   uAPI struct is incorrect for 32-bit userspace.  This is a problem
>   for RISC-V, which supports CONFIG_KVM_COMPAT.
> 
> Since padding has to be introduced, place a new field in there
> that tells if the flags field is valid.  To allow further extensibility,
> in fact, change flags to an array of 16 values, and store how many
> of the values are valid.  The availability of the new ndata field
> is tied to a system capability; all architectures are changed to
> fill in the field.
> 
> For compatibility with userspace that was using the flags field,
> a union overlaps flags with data[0].
> 
> Supersedes: <20220421180443.1465634-1-pbonzini@redhat.com>
> Cc: Will Deacon <will@kernel.org>
> Cc: Marc Zyngier <maz@kernel.org>
> Cc: Peter Gonda <pgonda@google.com>
> Cc: Sean Christopherson <seanjc@google.com>
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>  Documentation/virt/kvm/api.rst | 24 +++++++++++++++++-------
>  arch/arm64/kvm/psci.c          |  3 ++-
>  arch/riscv/kvm/vcpu_sbi.c      |  3 ++-
>  arch/x86/kvm/x86.c             |  2 ++
>  include/uapi/linux/kvm.h       |  8 +++++++-
>  virt/kvm/kvm_main.c            |  1 +
>  6 files changed, 31 insertions(+), 10 deletions(-)
> 
> diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
> index 85c7abc51af5..4a900cdbc62e 100644
> --- a/Documentation/virt/kvm/api.rst
> +++ b/Documentation/virt/kvm/api.rst
> @@ -5986,16 +5986,16 @@ should put the acknowledged interrupt vector into the 'epr' field.
>    #define KVM_SYSTEM_EVENT_RESET          2
>    #define KVM_SYSTEM_EVENT_CRASH          3
>  			__u32 type;
> -			__u64 flags;
> +                        __u32 ndata;
> +                        __u64 data[16];

This is out of sync with the union { flags; data; } now.

IMO, we should put a giant disclaimer on all of this to *not* use the
flags field and instead only use data. I imagine we wont want to persist
the union forever as it is quite ugly, but necessary.

[...]

> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> index 91a6fe4e02c0..f903ab0c8d7a 100644
> --- a/include/uapi/linux/kvm.h
> +++ b/include/uapi/linux/kvm.h
> @@ -445,7 +445,11 @@ struct kvm_run {
>  #define KVM_SYSTEM_EVENT_RESET          2
>  #define KVM_SYSTEM_EVENT_CRASH          3
>  			__u32 type;
> -			__u64 flags;
> +			__u32 ndata;
> +			union {
> +				__u64 flags;
> +				__u64 data[16];
> +			};
>  		} system_event;
>  		/* KVM_EXIT_S390_STSI */
>  		struct {
> @@ -1144,6 +1148,8 @@ struct kvm_ppc_resize_hpt {
>  #define KVM_CAP_S390_MEM_OP_EXTENSION 211
>  #define KVM_CAP_PMU_CAPABILITY 212
>  #define KVM_CAP_DISABLE_QUIRKS2 213
> +/* #define KVM_CAP_VM_TSC_CONTROL 214 */

This sticks out a bit. Couldn't the VM TSC control patch just use a
different number? It seems that there will be a conflict anyway, if only to
delete this comment.

How do we go about getting CAP numbers for features coming in from other
architectures? An eager backport (such as the Android case that made us
look at a union) would wind up using the wrong capability for a feature.

--
Thanks,
Oliver
Paolo Bonzini April 29, 2022, 1:52 p.m. UTC | #3
On 4/29/22 06:38, Oliver Upton wrote:
>> +                        __u64 data[16];
> This is out of sync with the union { flags; data; } now.

Yes, that's intentional.  The flags member is mentioned below:

+Previous versions of Linux defined a `flags` member in this struct.  The
+field is now aliased to `data[0]`.  Userspace can assume that it is only
+written if ndata is greater than 0.

but I don't want projects to believe it is different in any way from
`data[0]`.  In particular, `flags` should also be considered valid only
if the cap is present (unless crosvm wants ARM to be grandfathered in).

> IMO, we should put a giant disclaimer on all of this to*not*  use the
> flags field and instead only use data. I imagine we wont want to persist
> the union forever as it is quite ugly, but necessary.


>> +/* #define KVM_CAP_VM_TSC_CONTROL 214 */
> 
> This sticks out a bit. Couldn't the VM TSC control patch just use a
> different number? It seems that there will be a conflict anyway, if only to
> delete this comment.

I don't want to change cap numbers once things have landed in
kvm/next, because that's when userspace projects pick them.

Paolo
Sean Christopherson April 29, 2022, 2:03 p.m. UTC | #4
On Fri, Apr 22, 2022, Paolo Bonzini wrote:
> For compatibility with userspace that was using the flags field,
> a union overlaps flags with data[0].

I think "compatibility" is slightly misleading, e.g. the offset of the field is
changing for 32-bit userspace. 

  To avoid breaking compilation of userspace that was using the flags
  field, provide a userspace-only union to overlap flags with data[0].

> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> index 91a6fe4e02c0..f903ab0c8d7a 100644
> --- a/include/uapi/linux/kvm.h
> +++ b/include/uapi/linux/kvm.h
> @@ -445,7 +445,11 @@ struct kvm_run {
>  #define KVM_SYSTEM_EVENT_RESET          2
>  #define KVM_SYSTEM_EVENT_CRASH          3
>  			__u32 type;
> -			__u64 flags;
> +			__u32 ndata;
> +			union {
> +				__u64 flags;

As alluded to above, what about wrapping flags in 

#ifndef __KERNEL__
				__u64 flags;
#endif

so that KVM doesn't try to use flags?

> +				__u64 data[16];
> +			};
>  		} system_event;
>  		/* KVM_EXIT_S390_STSI */
>  		struct {
Paolo Bonzini April 29, 2022, 2:05 p.m. UTC | #5
On 4/29/22 16:03, Sean Christopherson wrote:
> On Fri, Apr 22, 2022, Paolo Bonzini wrote:
>> For compatibility with userspace that was using the flags field,
>> a union overlaps flags with data[0].
> 
> I think "compatibility" is slightly misleading, e.g. the offset of the field is
> changing for 32-bit userspace.

Well, the only such userspace AFAIK is crosvm on ARM and there's no 
compat ABI for ARM.  But yeah, your wording below sounds good.

>    To avoid breaking compilation of userspace that was using the flags
>    field, provide a userspace-only union to overlap flags with data[0].
> 
>> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
>> index 91a6fe4e02c0..f903ab0c8d7a 100644
>> --- a/include/uapi/linux/kvm.h
>> +++ b/include/uapi/linux/kvm.h
>> @@ -445,7 +445,11 @@ struct kvm_run {
>>   #define KVM_SYSTEM_EVENT_RESET          2
>>   #define KVM_SYSTEM_EVENT_CRASH          3
>>   			__u32 type;
>> -			__u64 flags;
>> +			__u32 ndata;
>> +			union {
>> +				__u64 flags;
> 
> As alluded to above, what about wrapping flags in
> 
> #ifndef __KERNEL__
> 				__u64 flags;
> #endif
> 
> so that KVM doesn't try to use flags?

Interesting idea.  I'll apply it and push the patch.

Thanks for the review!

Paolo

>> +				__u64 data[16];
>> +			};
>>   		} system_event;
>>   		/* KVM_EXIT_S390_STSI */
>>   		struct {
>
diff mbox series

Patch

diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index 85c7abc51af5..4a900cdbc62e 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -5986,16 +5986,16 @@  should put the acknowledged interrupt vector into the 'epr' field.
   #define KVM_SYSTEM_EVENT_RESET          2
   #define KVM_SYSTEM_EVENT_CRASH          3
 			__u32 type;
-			__u64 flags;
+                        __u32 ndata;
+                        __u64 data[16];
 		} system_event;
 
 If exit_reason is KVM_EXIT_SYSTEM_EVENT then the vcpu has triggered
 a system-level event using some architecture specific mechanism (hypercall
 or some special instruction). In case of ARM64, this is triggered using
-HVC instruction based PSCI call from the vcpu. The 'type' field describes
-the system-level event type. The 'flags' field describes architecture
-specific flags for the system-level event.
+HVC instruction based PSCI call from the vcpu.
 
+The 'type' field describes the system-level event type.
 Valid values for 'type' are:
 
  - KVM_SYSTEM_EVENT_SHUTDOWN -- the guest has requested a shutdown of the
@@ -6010,10 +6010,20 @@  Valid values for 'type' are:
    to ignore the request, or to gather VM memory core dump and/or
    reset/shutdown of the VM.
 
-Valid flags are:
+If KVM_CAP_SYSTEM_EVENT_DATA is present, the 'data' field can contain
+architecture specific information for the system-level event.  Only
+the first `ndata` items (possibly zero) of the data array are valid.
 
- - KVM_SYSTEM_EVENT_RESET_FLAG_PSCI_RESET2 (arm64 only) -- the guest issued
-   a SYSTEM_RESET2 call according to v1.1 of the PSCI specification.
+ - for arm64, data[0] is set to KVM_SYSTEM_EVENT_RESET_FLAG_PSCI_RESET2 if
+   the guest issued a SYSTEM_RESET2 call according to v1.1 of the PSCI
+   specification.
+
+ - for RISC-V, data[0] is set to the value of the second argument of the
+   ``sbi_system_reset`` call.
+
+Previous versions of Linux defined a `flags` member in this struct.  The
+field is now aliased to `data[0]`.  Userspace can assume that it is only
+written if ndata is greater than 0.
 
 ::
 
diff --git a/arch/arm64/kvm/psci.c b/arch/arm64/kvm/psci.c
index baac2b405f23..708d80e8e60d 100644
--- a/arch/arm64/kvm/psci.c
+++ b/arch/arm64/kvm/psci.c
@@ -181,7 +181,8 @@  static void kvm_prepare_system_event(struct kvm_vcpu *vcpu, u32 type, u64 flags)
 
 	memset(&vcpu->run->system_event, 0, sizeof(vcpu->run->system_event));
 	vcpu->run->system_event.type = type;
-	vcpu->run->system_event.flags = flags;
+	vcpu->run->system_event.ndata = 1;
+	vcpu->run->system_event.data[0] = flags;
 	vcpu->run->exit_reason = KVM_EXIT_SYSTEM_EVENT;
 }
 
diff --git a/arch/riscv/kvm/vcpu_sbi.c b/arch/riscv/kvm/vcpu_sbi.c
index a09ecb97b890..a145f4356453 100644
--- a/arch/riscv/kvm/vcpu_sbi.c
+++ b/arch/riscv/kvm/vcpu_sbi.c
@@ -94,7 +94,8 @@  void kvm_riscv_vcpu_sbi_system_reset(struct kvm_vcpu *vcpu,
 
 	memset(&run->system_event, 0, sizeof(run->system_event));
 	run->system_event.type = type;
-	run->system_event.flags = flags;
+	run->system_event.ndata = 1;
+	run->system_event.data[0] = reason;
 	run->exit_reason = KVM_EXIT_SYSTEM_EVENT;
 }
 
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index a6ab19afc638..7f21d9fe816f 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -10020,12 +10020,14 @@  static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
 		if (kvm_check_request(KVM_REQ_HV_CRASH, vcpu)) {
 			vcpu->run->exit_reason = KVM_EXIT_SYSTEM_EVENT;
 			vcpu->run->system_event.type = KVM_SYSTEM_EVENT_CRASH;
+			vcpu->run->system_event.ndata = 0;
 			r = 0;
 			goto out;
 		}
 		if (kvm_check_request(KVM_REQ_HV_RESET, vcpu)) {
 			vcpu->run->exit_reason = KVM_EXIT_SYSTEM_EVENT;
 			vcpu->run->system_event.type = KVM_SYSTEM_EVENT_RESET;
+			vcpu->run->system_event.ndata = 0;
 			r = 0;
 			goto out;
 		}
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 91a6fe4e02c0..f903ab0c8d7a 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -445,7 +445,11 @@  struct kvm_run {
 #define KVM_SYSTEM_EVENT_RESET          2
 #define KVM_SYSTEM_EVENT_CRASH          3
 			__u32 type;
-			__u64 flags;
+			__u32 ndata;
+			union {
+				__u64 flags;
+				__u64 data[16];
+			};
 		} system_event;
 		/* KVM_EXIT_S390_STSI */
 		struct {
@@ -1144,6 +1148,8 @@  struct kvm_ppc_resize_hpt {
 #define KVM_CAP_S390_MEM_OP_EXTENSION 211
 #define KVM_CAP_PMU_CAPABILITY 212
 #define KVM_CAP_DISABLE_QUIRKS2 213
+/* #define KVM_CAP_VM_TSC_CONTROL 214 */
+#define KVM_CAP_SYSTEM_EVENT_DATA 215
 
 #ifdef KVM_CAP_IRQ_ROUTING
 
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index f30bb8c16f26..6d971fb1b08d 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -4354,6 +4354,7 @@  static long kvm_vm_ioctl_check_extension_generic(struct kvm *kvm, long arg)
 		return 0;
 #endif
 	case KVM_CAP_BINARY_STATS_FD:
+	case KVM_CAP_SYSTEM_EVENT_DATA:
 		return 1;
 	default:
 		break;