Message ID | 20200818211533.849501-3-aaronlewis@google.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | Allow userspace to manage MSRs | expand |
On 18.08.20 23:15, Aaron Lewis wrote: > > It's not desireable to have all MSRs always handled by KVM kernel space. Some > MSRs would be useful to handle in user space to either emulate behavior (like > uCode updates) or differentiate whether they are valid based on the CPU model. > > To allow user space to specify which MSRs it wants to see handled by KVM, > this patch introduces a new ioctl to push allow lists of bitmaps into > KVM. Based on these bitmaps, KVM can then decide whether to reject MSR access. > With the addition of KVM_CAP_X86_USER_SPACE_MSR it can also deflect the > denied MSR events to user space to operate on. > > If no allowlist is populated, MSR handling stays identical to before. > > Signed-off-by: KarimAllah Ahmed <karahmed@amazon.de> > Signed-off-by: Alexander Graf <graf@amazon.com> Same here, SoB line is missing. I also see that you didn't address the nits you had on this patch: [...] >> + Filter booth read and write accesses to MSRs using the given bitmap. A 0 >> + in the bitmap indicates that both reads and writes should immediately fail, >> + while a 1 indicates that reads and writes should be handled by the normal >> + KVM MSR emulation logic. > > nit: Filter both [...] >> +/* Maximum size of the of the bitmap in bytes */ > > nit: "of the" is repeated twice Feel free to change them in your patch setand add a note between the SoB lines: Signed-off-by: KarimAllah Ahmed <karahmed@amazon.de Signed-off-by: Alexander Graf <graf@amazon.com> [aaronlewis: s/of the of the/of the/, s/booth/both/] Signed-off-by: Aaron Lewis <aaronlewis@google.com> Alex Amazon Development Center Germany GmbH Krausenstr. 38 10117 Berlin Geschaeftsfuehrung: Christian Schlaeger, Jonathan Weiss Eingetragen am Amtsgericht Charlottenburg unter HRB 149173 B Sitz: Berlin Ust-ID: DE 289 237 879
Hi Aaron, url: https://github.com/0day-ci/linux/commits/Aaron-Lewis/Allow-userspace-to-manage-MSRs/20200819-051903 base: https://git.kernel.org/pub/scm/virt/kvm/kvm.git linux-next config: x86_64-randconfig-m001-20200827 (attached as .config) compiler: gcc-9 (Debian 9.3.0-15) 9.3.0 If you fix the issue, kindly add following tag as appropriate Reported-by: kernel test robot <lkp@intel.com> Reported-by: Dan Carpenter <dan.carpenter@oracle.com> smatch warnings: arch/x86/kvm/x86.c:5248 kvm_vm_ioctl_add_msr_allowlist() error: 'bitmap' dereferencing possible ERR_PTR() # https://github.com/0day-ci/linux/commit/107c87325cf461b7b1bd07bb6ddbaf808a8d8a2a git remote add linux-review https://github.com/0day-ci/linux git fetch --no-tags linux-review Aaron-Lewis/Allow-userspace-to-manage-MSRs/20200819-051903 git checkout 107c87325cf461b7b1bd07bb6ddbaf808a8d8a2a vim +/bitmap +5248 arch/x86/kvm/x86.c 107c87325cf461 Aaron Lewis 2020-08-18 5181 static int kvm_vm_ioctl_add_msr_allowlist(struct kvm *kvm, void __user *argp) 107c87325cf461 Aaron Lewis 2020-08-18 5182 { 107c87325cf461 Aaron Lewis 2020-08-18 5183 struct msr_bitmap_range *ranges = kvm->arch.msr_allowlist_ranges; 107c87325cf461 Aaron Lewis 2020-08-18 5184 struct kvm_msr_allowlist __user *user_msr_allowlist = argp; 107c87325cf461 Aaron Lewis 2020-08-18 5185 struct msr_bitmap_range range; 107c87325cf461 Aaron Lewis 2020-08-18 5186 struct kvm_msr_allowlist kernel_msr_allowlist; 107c87325cf461 Aaron Lewis 2020-08-18 5187 unsigned long *bitmap = NULL; 107c87325cf461 Aaron Lewis 2020-08-18 5188 size_t bitmap_size; 107c87325cf461 Aaron Lewis 2020-08-18 5189 int r = 0; 107c87325cf461 Aaron Lewis 2020-08-18 5190 107c87325cf461 Aaron Lewis 2020-08-18 5191 if (copy_from_user(&kernel_msr_allowlist, user_msr_allowlist, 107c87325cf461 Aaron Lewis 2020-08-18 5192 sizeof(kernel_msr_allowlist))) { 107c87325cf461 Aaron Lewis 2020-08-18 5193 r = -EFAULT; 107c87325cf461 Aaron Lewis 2020-08-18 5194 goto out; 107c87325cf461 Aaron Lewis 2020-08-18 5195 } 107c87325cf461 Aaron Lewis 2020-08-18 5196 107c87325cf461 Aaron Lewis 2020-08-18 5197 bitmap_size = BITS_TO_LONGS(kernel_msr_allowlist.nmsrs) * sizeof(long); ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^n On 32 bit systems the BITS_TO_LONGS() can integer overflow if kernel_msr_allowlist.nmsrs is larger than ULONG_MAX - bits_per_long. In that case bitmap_size is zero. 107c87325cf461 Aaron Lewis 2020-08-18 5198 if (bitmap_size > KVM_MSR_ALLOWLIST_MAX_LEN) { 107c87325cf461 Aaron Lewis 2020-08-18 5199 r = -EINVAL; 107c87325cf461 Aaron Lewis 2020-08-18 5200 goto out; 107c87325cf461 Aaron Lewis 2020-08-18 5201 } 107c87325cf461 Aaron Lewis 2020-08-18 5202 107c87325cf461 Aaron Lewis 2020-08-18 5203 bitmap = memdup_user(user_msr_allowlist->bitmap, bitmap_size); 107c87325cf461 Aaron Lewis 2020-08-18 5204 if (IS_ERR(bitmap)) { 107c87325cf461 Aaron Lewis 2020-08-18 5205 r = PTR_ERR(bitmap); 107c87325cf461 Aaron Lewis 2020-08-18 5206 goto out; ^^^^^^^^ "out" is always a vague label name. It's better style to return directly instead of doing a complicated no-op. if (IS_ERR(bitmap)) return PTR_ERR(bitmap); 107c87325cf461 Aaron Lewis 2020-08-18 5207 } 107c87325cf461 Aaron Lewis 2020-08-18 5208 107c87325cf461 Aaron Lewis 2020-08-18 5209 range = (struct msr_bitmap_range) { 107c87325cf461 Aaron Lewis 2020-08-18 5210 .flags = kernel_msr_allowlist.flags, 107c87325cf461 Aaron Lewis 2020-08-18 5211 .base = kernel_msr_allowlist.base, 107c87325cf461 Aaron Lewis 2020-08-18 5212 .nmsrs = kernel_msr_allowlist.nmsrs, 107c87325cf461 Aaron Lewis 2020-08-18 5213 .bitmap = bitmap, In case of overflow then "bitmap" is 0x16 and .nmsrs is a very high number. 107c87325cf461 Aaron Lewis 2020-08-18 5214 }; 107c87325cf461 Aaron Lewis 2020-08-18 5215 107c87325cf461 Aaron Lewis 2020-08-18 5216 if (range.flags & ~(KVM_MSR_ALLOW_READ | KVM_MSR_ALLOW_WRITE)) { 107c87325cf461 Aaron Lewis 2020-08-18 5217 r = -EINVAL; 107c87325cf461 Aaron Lewis 2020-08-18 5218 goto out; 107c87325cf461 Aaron Lewis 2020-08-18 5219 } 107c87325cf461 Aaron Lewis 2020-08-18 5220 107c87325cf461 Aaron Lewis 2020-08-18 5221 /* 107c87325cf461 Aaron Lewis 2020-08-18 5222 * Protect from concurrent calls to this function that could trigger 107c87325cf461 Aaron Lewis 2020-08-18 5223 * a TOCTOU violation on kvm->arch.msr_allowlist_ranges_count. 107c87325cf461 Aaron Lewis 2020-08-18 5224 */ 107c87325cf461 Aaron Lewis 2020-08-18 5225 mutex_lock(&kvm->lock); 107c87325cf461 Aaron Lewis 2020-08-18 5226 107c87325cf461 Aaron Lewis 2020-08-18 5227 if (kvm->arch.msr_allowlist_ranges_count >= 107c87325cf461 Aaron Lewis 2020-08-18 5228 ARRAY_SIZE(kvm->arch.msr_allowlist_ranges)) { 107c87325cf461 Aaron Lewis 2020-08-18 5229 r = -E2BIG; 107c87325cf461 Aaron Lewis 2020-08-18 5230 goto out_locked; 107c87325cf461 Aaron Lewis 2020-08-18 5231 } 107c87325cf461 Aaron Lewis 2020-08-18 5232 107c87325cf461 Aaron Lewis 2020-08-18 5233 if (msr_range_overlaps(kvm, &range)) { 107c87325cf461 Aaron Lewis 2020-08-18 5234 r = -EINVAL; 107c87325cf461 Aaron Lewis 2020-08-18 5235 goto out_locked; 107c87325cf461 Aaron Lewis 2020-08-18 5236 } 107c87325cf461 Aaron Lewis 2020-08-18 5237 107c87325cf461 Aaron Lewis 2020-08-18 5238 /* Everything ok, add this range identifier to our global pool */ 107c87325cf461 Aaron Lewis 2020-08-18 5239 ranges[kvm->arch.msr_allowlist_ranges_count] = range; 107c87325cf461 Aaron Lewis 2020-08-18 5240 /* Make sure we filled the array before we tell anyone to walk it */ 107c87325cf461 Aaron Lewis 2020-08-18 5241 smp_wmb(); 107c87325cf461 Aaron Lewis 2020-08-18 5242 kvm->arch.msr_allowlist_ranges_count++; 107c87325cf461 Aaron Lewis 2020-08-18 5243 107c87325cf461 Aaron Lewis 2020-08-18 5244 out_locked: 107c87325cf461 Aaron Lewis 2020-08-18 5245 mutex_unlock(&kvm->lock); 107c87325cf461 Aaron Lewis 2020-08-18 5246 out: 107c87325cf461 Aaron Lewis 2020-08-18 5247 if (r) 107c87325cf461 Aaron Lewis 2020-08-18 @5248 kfree(bitmap); 107c87325cf461 Aaron Lewis 2020-08-18 5249 107c87325cf461 Aaron Lewis 2020-08-18 5250 return r; 107c87325cf461 Aaron Lewis 2020-08-18 5251 } --- 0-DAY CI Kernel Test Service, Intel Corporation https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org
On 31.08.20 12:39, Dan Carpenter wrote: > > Hi Aaron, > > url: https://github.com/0day-ci/linux/commits/Aaron-Lewis/Allow-userspace-to-manage-MSRs/20200819-051903 > base: https://git.kernel.org/pub/scm/virt/kvm/kvm.git linux-next > config: x86_64-randconfig-m001-20200827 (attached as .config) > compiler: gcc-9 (Debian 9.3.0-15) 9.3.0 > > If you fix the issue, kindly add following tag as appropriate > Reported-by: kernel test robot <lkp@intel.com> > Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Thanks a bunch for looking at this! I'd squash in the change with the actual patch as it's tiny, so I'm not sure how attribution would work in that case. > > smatch warnings: > arch/x86/kvm/x86.c:5248 kvm_vm_ioctl_add_msr_allowlist() error: 'bitmap' dereferencing possible ERR_PTR() > > # https://github.com/0day-ci/linux/commit/107c87325cf461b7b1bd07bb6ddbaf808a8d8a2a > git remote add linux-review https://github.com/0day-ci/linux > git fetch --no-tags linux-review Aaron-Lewis/Allow-userspace-to-manage-MSRs/20200819-051903 > git checkout 107c87325cf461b7b1bd07bb6ddbaf808a8d8a2a > vim +/bitmap +5248 arch/x86/kvm/x86.c > > 107c87325cf461 Aaron Lewis 2020-08-18 5181 static int kvm_vm_ioctl_add_msr_allowlist(struct kvm *kvm, void __user *argp) > 107c87325cf461 Aaron Lewis 2020-08-18 5182 { > 107c87325cf461 Aaron Lewis 2020-08-18 5183 struct msr_bitmap_range *ranges = kvm->arch.msr_allowlist_ranges; > 107c87325cf461 Aaron Lewis 2020-08-18 5184 struct kvm_msr_allowlist __user *user_msr_allowlist = argp; > 107c87325cf461 Aaron Lewis 2020-08-18 5185 struct msr_bitmap_range range; > 107c87325cf461 Aaron Lewis 2020-08-18 5186 struct kvm_msr_allowlist kernel_msr_allowlist; > 107c87325cf461 Aaron Lewis 2020-08-18 5187 unsigned long *bitmap = NULL; > 107c87325cf461 Aaron Lewis 2020-08-18 5188 size_t bitmap_size; > 107c87325cf461 Aaron Lewis 2020-08-18 5189 int r = 0; > 107c87325cf461 Aaron Lewis 2020-08-18 5190 > 107c87325cf461 Aaron Lewis 2020-08-18 5191 if (copy_from_user(&kernel_msr_allowlist, user_msr_allowlist, > 107c87325cf461 Aaron Lewis 2020-08-18 5192 sizeof(kernel_msr_allowlist))) { > 107c87325cf461 Aaron Lewis 2020-08-18 5193 r = -EFAULT; > 107c87325cf461 Aaron Lewis 2020-08-18 5194 goto out; > 107c87325cf461 Aaron Lewis 2020-08-18 5195 } > 107c87325cf461 Aaron Lewis 2020-08-18 5196 > 107c87325cf461 Aaron Lewis 2020-08-18 5197 bitmap_size = BITS_TO_LONGS(kernel_msr_allowlist.nmsrs) * sizeof(long); > ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^n > On 32 bit systems the BITS_TO_LONGS() can integer overflow if > kernel_msr_allowlist.nmsrs is larger than ULONG_MAX - bits_per_long. In > that case bitmap_size is zero. Nice catch! It should be enough to ... > > 107c87325cf461 Aaron Lewis 2020-08-18 5198 if (bitmap_size > KVM_MSR_ALLOWLIST_MAX_LEN) { ... add a check for !bitmap_size here as well then, right? > 107c87325cf461 Aaron Lewis 2020-08-18 5199 r = -EINVAL; > 107c87325cf461 Aaron Lewis 2020-08-18 5200 goto out; > 107c87325cf461 Aaron Lewis 2020-08-18 5201 } > 107c87325cf461 Aaron Lewis 2020-08-18 5202 > 107c87325cf461 Aaron Lewis 2020-08-18 5203 bitmap = memdup_user(user_msr_allowlist->bitmap, bitmap_size); > 107c87325cf461 Aaron Lewis 2020-08-18 5204 if (IS_ERR(bitmap)) { > 107c87325cf461 Aaron Lewis 2020-08-18 5205 r = PTR_ERR(bitmap); > 107c87325cf461 Aaron Lewis 2020-08-18 5206 goto out; > ^^^^^^^^ > "out" is always a vague label name. It's better style to return > directly instead of doing a complicated no-op. > > if (IS_ERR(bitmap)) > return PTR_ERR(bitmap); I agree 100% :). In fact, I agree so much that I already did change it for v6 last week, just did not send it out yet. > > 107c87325cf461 Aaron Lewis 2020-08-18 5207 } > 107c87325cf461 Aaron Lewis 2020-08-18 5208 > 107c87325cf461 Aaron Lewis 2020-08-18 5209 range = (struct msr_bitmap_range) { > 107c87325cf461 Aaron Lewis 2020-08-18 5210 .flags = kernel_msr_allowlist.flags, > 107c87325cf461 Aaron Lewis 2020-08-18 5211 .base = kernel_msr_allowlist.base, > 107c87325cf461 Aaron Lewis 2020-08-18 5212 .nmsrs = kernel_msr_allowlist.nmsrs, > 107c87325cf461 Aaron Lewis 2020-08-18 5213 .bitmap = bitmap, > > In case of overflow then "bitmap" is 0x16 and .nmsrs is a very high > number. The overflow case should disappear with the additional check above, right? Alex Amazon Development Center Germany GmbH Krausenstr. 38 10117 Berlin Geschaeftsfuehrung: Christian Schlaeger, Jonathan Weiss Eingetragen am Amtsgericht Charlottenburg unter HRB 149173 B Sitz: Berlin Ust-ID: DE 289 237 879
On Tue, Sep 01, 2020 at 09:13:10PM +0200, Alexander Graf wrote: > > > On 31.08.20 12:39, Dan Carpenter wrote: > > > > Hi Aaron, > > > > url: https://github.com/0day-ci/linux/commits/Aaron-Lewis/Allow-userspace-to-manage-MSRs/20200819-051903 > > base: https://git.kernel.org/pub/scm/virt/kvm/kvm.git linux-next > > config: x86_64-randconfig-m001-20200827 (attached as .config) > > compiler: gcc-9 (Debian 9.3.0-15) 9.3.0 > > > > If you fix the issue, kindly add following tag as appropriate > > Reported-by: kernel test robot <lkp@intel.com> > > Reported-by: Dan Carpenter <dan.carpenter@oracle.com> > > Thanks a bunch for looking at this! I'd squash in the change with the actual > patch as it's tiny, so I'm not sure how attribution would work in that case. Yep. No problem. These are just a template that gets sent to everyone. > > > > > smatch warnings: > > arch/x86/kvm/x86.c:5248 kvm_vm_ioctl_add_msr_allowlist() error: 'bitmap' dereferencing possible ERR_PTR() > > > > # https://github.com/0day-ci/linux/commit/107c87325cf461b7b1bd07bb6ddbaf808a8d8a2a > > git remote add linux-review https://github.com/0day-ci/linux git fetch > > --no-tags linux-review > > Aaron-Lewis/Allow-userspace-to-manage-MSRs/20200819-051903 > > git checkout 107c87325cf461b7b1bd07bb6ddbaf808a8d8a2a > > vim +/bitmap +5248 arch/x86/kvm/x86.c > > > > 107c87325cf461 Aaron Lewis 2020-08-18 5181 static int kvm_vm_ioctl_add_msr_allowlist(struct kvm *kvm, void __user *argp) > > 107c87325cf461 Aaron Lewis 2020-08-18 5182 { > > 107c87325cf461 Aaron Lewis 2020-08-18 5183 struct msr_bitmap_range *ranges = kvm->arch.msr_allowlist_ranges; > > 107c87325cf461 Aaron Lewis 2020-08-18 5184 struct kvm_msr_allowlist __user *user_msr_allowlist = argp; > > 107c87325cf461 Aaron Lewis 2020-08-18 5185 struct msr_bitmap_range range; > > 107c87325cf461 Aaron Lewis 2020-08-18 5186 struct kvm_msr_allowlist kernel_msr_allowlist; > > 107c87325cf461 Aaron Lewis 2020-08-18 5187 unsigned long *bitmap = NULL; > > 107c87325cf461 Aaron Lewis 2020-08-18 5188 size_t bitmap_size; > > 107c87325cf461 Aaron Lewis 2020-08-18 5189 int r = 0; > > 107c87325cf461 Aaron Lewis 2020-08-18 5190 > > 107c87325cf461 Aaron Lewis 2020-08-18 5191 if (copy_from_user(&kernel_msr_allowlist, user_msr_allowlist, > > 107c87325cf461 Aaron Lewis 2020-08-18 5192 sizeof(kernel_msr_allowlist))) { > > 107c87325cf461 Aaron Lewis 2020-08-18 5193 r = -EFAULT; > > 107c87325cf461 Aaron Lewis 2020-08-18 5194 goto out; > > 107c87325cf461 Aaron Lewis 2020-08-18 5195 } > > 107c87325cf461 Aaron Lewis 2020-08-18 5196 > > 107c87325cf461 Aaron Lewis 2020-08-18 5197 bitmap_size = BITS_TO_LONGS(kernel_msr_allowlist.nmsrs) * sizeof(long); > > ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^n > > On 32 bit systems the BITS_TO_LONGS() can integer overflow if > > kernel_msr_allowlist.nmsrs is larger than ULONG_MAX - bits_per_long. In > > that case bitmap_size is zero. > > Nice catch! It should be enough to ... > > > > > 107c87325cf461 Aaron Lewis 2020-08-18 5198 if (bitmap_size > KVM_MSR_ALLOWLIST_MAX_LEN) { > > ... add a check for !bitmap_size here as well then, right? Yup. regards, dan carpenter
diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst index aad51c33fcae..91ce3e4b5b2e 100644 --- a/Documentation/virt/kvm/api.rst +++ b/Documentation/virt/kvm/api.rst @@ -4704,6 +4704,82 @@ KVM_PV_VM_VERIFY Verify the integrity of the unpacked image. Only if this succeeds, KVM is allowed to start protected VCPUs. +4.126 KVM_X86_ADD_MSR_ALLOWLIST +------------------------------- + +:Capability: KVM_CAP_X86_MSR_ALLOWLIST +:Architectures: x86 +:Type: vm ioctl +:Parameters: struct kvm_msr_allowlist +:Returns: 0 on success, < 0 on error + +:: + + struct kvm_msr_allowlist { + __u32 flags; + __u32 nmsrs; /* number of msrs in bitmap */ + __u32 base; /* base address for the MSRs bitmap */ + __u32 pad; + + __u8 bitmap[0]; /* a set bit allows that the operation set in flags */ + }; + +flags values: + +KVM_MSR_ALLOW_READ + + Filter read accesses to MSRs using the given bitmap. A 0 in the bitmap + indicates that a read should immediately fail, while a 1 indicates that + a read should be handled by the normal KVM MSR emulation logic. + +KVM_MSR_ALLOW_WRITE + + Filter write accesses to MSRs using the given bitmap. A 0 in the bitmap + indicates that a write should immediately fail, while a 1 indicates that + a write should be handled by the normal KVM MSR emulation logic. + +KVM_MSR_ALLOW_READ | KVM_MSR_ALLOW_WRITE + + Filter booth read and write accesses to MSRs using the given bitmap. A 0 + in the bitmap indicates that both reads and writes should immediately fail, + while a 1 indicates that reads and writes should be handled by the normal + KVM MSR emulation logic. + +This ioctl allows user space to define a set of bitmaps of MSR ranges to +specify whether a certain MSR access is allowed or not. + +If this ioctl has never been invoked, MSR accesses are not guarded and the +old KVM in-kernel emulation behavior is fully preserved. + +As soon as the first allow list was specified, only allowed MSR accesses +are permitted inside of KVM's MSR code. + +Each allowlist specifies a range of MSRs to potentially allow access on. +The range goes from MSR index [base .. base+nmsrs]. The flags field +indicates whether reads, writes or both reads and writes are permitted +by setting a 1 bit in the bitmap for the corresponding MSR index. + +If an MSR access is not permitted through the allow list, it generates a +#GP inside the guest. When combined with KVM_CAP_X86_USER_SPACE_MSR, that +allows user space to deflect and potentially handle various MSR accesses +into user space. + +4.124 KVM_X86_CLEAR_MSR_ALLOWLIST +--------------------------------- + +:Capability: KVM_CAP_X86_MSR_ALLOWLIST +:Architectures: x86 +:Type: vcpu ioctl +:Parameters: none +:Returns: 0 + +This ioctl resets all internal MSR allow lists. After this call, no allow +list is present and the guest would execute as if no allow lists were set, +so all MSRs are considered allowed and thus handled by the in-kernel MSR +emulation logic. + +No vCPU may be in running state when calling this ioctl. + 5. The kvm_run structure ======================== @@ -6221,3 +6297,18 @@ writes to user space. It can be enabled on a VM level. If enabled, MSR accesses that would usually trigger a #GP by KVM into the guest will instead get bounced to user space through the KVM_EXIT_X86_RDMSR and KVM_EXIT_X86_WRMSR exit notifications. + +8.25 KVM_CAP_X86_MSR_ALLOWLIST +------------------------------ + +:Architectures: x86 + +This capability indicates that KVM supports emulation of only select MSR +registers. With this capability exposed, KVM exports two new VM ioctls: +KVM_X86_ADD_MSR_ALLOWLIST which user space can call to specify bitmaps of MSR +ranges that KVM should emulate in kernel space and KVM_X86_CLEAR_MSR_ALLOWLIST +which user space can call to remove all MSR allow lists from the VM context. + +In combination with KVM_CAP_X86_USER_SPACE_MSR, this allows user space to +trap and emulate MSRs that are outside of the scope of KVM as well as +limit the attack surface on KVM's MSR emulation code. diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h index 02a102c60dff..1ee8468c913c 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -860,6 +860,13 @@ struct kvm_hv { struct kvm_hv_syndbg hv_syndbg; }; +struct msr_bitmap_range { + u32 flags; + u32 nmsrs; + u32 base; + unsigned long *bitmap; +}; + enum kvm_irqchip_mode { KVM_IRQCHIP_NONE, KVM_IRQCHIP_KERNEL, /* created with KVM_CREATE_IRQCHIP */ @@ -964,6 +971,9 @@ struct kvm_arch { /* Deflect RDMSR and WRMSR to user space when they trigger a #GP */ bool user_space_msr_enabled; + struct msr_bitmap_range msr_allowlist_ranges[10]; + int msr_allowlist_ranges_count; + struct kvm_pmu_event_filter *pmu_event_filter; struct task_struct *nx_lpage_recovery_thread; }; diff --git a/arch/x86/include/uapi/asm/kvm.h b/arch/x86/include/uapi/asm/kvm.h index 0780f97c1850..c33fb1d72d52 100644 --- a/arch/x86/include/uapi/asm/kvm.h +++ b/arch/x86/include/uapi/asm/kvm.h @@ -192,6 +192,21 @@ struct kvm_msr_list { __u32 indices[0]; }; +#define KVM_MSR_ALLOW_READ (1 << 0) +#define KVM_MSR_ALLOW_WRITE (1 << 1) + +/* Maximum size of the of the bitmap in bytes */ +#define KVM_MSR_ALLOWLIST_MAX_LEN 0x600 + +/* for KVM_X86_ADD_MSR_ALLOWLIST */ +struct kvm_msr_allowlist { + __u32 flags; + __u32 nmsrs; /* number of msrs in bitmap */ + __u32 base; /* base address for the MSRs bitmap */ + __u32 pad; + + __u8 bitmap[0]; /* a set bit allows that the operation set in flags */ +}; struct kvm_cpuid_entry { __u32 function; diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 5d94a95fb66b..c46a709be532 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -1486,6 +1486,39 @@ void kvm_enable_efer_bits(u64 mask) } EXPORT_SYMBOL_GPL(kvm_enable_efer_bits); +static bool kvm_msr_allowed(struct kvm_vcpu *vcpu, u32 index, u32 type) +{ + struct kvm *kvm = vcpu->kvm; + struct msr_bitmap_range *ranges = kvm->arch.msr_allowlist_ranges; + u32 count = kvm->arch.msr_allowlist_ranges_count; + u32 i; + bool r = false; + int idx; + + /* MSR allowlist not set up, allow everything */ + if (!count) + return true; + + /* Prevent collision with clear_msr_allowlist */ + idx = srcu_read_lock(&kvm->srcu); + + for (i = 0; i < count; i++) { + u32 start = ranges[i].base; + u32 end = start + ranges[i].nmsrs; + u32 flags = ranges[i].flags; + unsigned long *bitmap = ranges[i].bitmap; + + if ((index >= start) && (index < end) && (flags & type)) { + r = !!test_bit(index - start, bitmap); + break; + } + } + + srcu_read_unlock(&kvm->srcu, idx); + + return r; +} + /* * Write @data into the MSR specified by @index. Select MSR specific fault * checks are bypassed if @host_initiated is %true. @@ -1497,6 +1530,9 @@ static int __kvm_set_msr(struct kvm_vcpu *vcpu, u32 index, u64 data, { struct msr_data msr; + if (!host_initiated && !kvm_msr_allowed(vcpu, index, KVM_MSR_ALLOW_WRITE)) + return -ENOENT; + switch (index) { case MSR_FS_BASE: case MSR_GS_BASE: @@ -1553,6 +1589,9 @@ int __kvm_get_msr(struct kvm_vcpu *vcpu, u32 index, u64 *data, struct msr_data msr; int ret; + if (!host_initiated && !kvm_msr_allowed(vcpu, index, KVM_MSR_ALLOW_READ)) + return -ENOENT; + msr.index = index; msr.host_initiated = host_initiated; @@ -3590,6 +3629,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext) case KVM_CAP_SET_GUEST_DEBUG: case KVM_CAP_LAST_CPU: case KVM_CAP_X86_USER_SPACE_MSR: + case KVM_CAP_X86_MSR_ALLOWLIST: r = 1; break; case KVM_CAP_SYNC_REGS: @@ -5118,6 +5158,116 @@ int kvm_vm_ioctl_enable_cap(struct kvm *kvm, return r; } +static bool msr_range_overlaps(struct kvm *kvm, struct msr_bitmap_range *range) +{ + struct msr_bitmap_range *ranges = kvm->arch.msr_allowlist_ranges; + u32 i, count = kvm->arch.msr_allowlist_ranges_count; + bool r = false; + + for (i = 0; i < count; i++) { + u32 start = max(range->base, ranges[i].base); + u32 end = min(range->base + range->nmsrs, + ranges[i].base + ranges[i].nmsrs); + + if ((start < end) && (range->flags & ranges[i].flags)) { + r = true; + break; + } + } + + return r; +} + +static int kvm_vm_ioctl_add_msr_allowlist(struct kvm *kvm, void __user *argp) +{ + struct msr_bitmap_range *ranges = kvm->arch.msr_allowlist_ranges; + struct kvm_msr_allowlist __user *user_msr_allowlist = argp; + struct msr_bitmap_range range; + struct kvm_msr_allowlist kernel_msr_allowlist; + unsigned long *bitmap = NULL; + size_t bitmap_size; + int r = 0; + + if (copy_from_user(&kernel_msr_allowlist, user_msr_allowlist, + sizeof(kernel_msr_allowlist))) { + r = -EFAULT; + goto out; + } + + bitmap_size = BITS_TO_LONGS(kernel_msr_allowlist.nmsrs) * sizeof(long); + if (bitmap_size > KVM_MSR_ALLOWLIST_MAX_LEN) { + r = -EINVAL; + goto out; + } + + bitmap = memdup_user(user_msr_allowlist->bitmap, bitmap_size); + if (IS_ERR(bitmap)) { + r = PTR_ERR(bitmap); + goto out; + } + + range = (struct msr_bitmap_range) { + .flags = kernel_msr_allowlist.flags, + .base = kernel_msr_allowlist.base, + .nmsrs = kernel_msr_allowlist.nmsrs, + .bitmap = bitmap, + }; + + if (range.flags & ~(KVM_MSR_ALLOW_READ | KVM_MSR_ALLOW_WRITE)) { + r = -EINVAL; + goto out; + } + + /* + * Protect from concurrent calls to this function that could trigger + * a TOCTOU violation on kvm->arch.msr_allowlist_ranges_count. + */ + mutex_lock(&kvm->lock); + + if (kvm->arch.msr_allowlist_ranges_count >= + ARRAY_SIZE(kvm->arch.msr_allowlist_ranges)) { + r = -E2BIG; + goto out_locked; + } + + if (msr_range_overlaps(kvm, &range)) { + r = -EINVAL; + goto out_locked; + } + + /* Everything ok, add this range identifier to our global pool */ + ranges[kvm->arch.msr_allowlist_ranges_count] = range; + /* Make sure we filled the array before we tell anyone to walk it */ + smp_wmb(); + kvm->arch.msr_allowlist_ranges_count++; + +out_locked: + mutex_unlock(&kvm->lock); +out: + if (r) + kfree(bitmap); + + return r; +} + +static int kvm_vm_ioctl_clear_msr_allowlist(struct kvm *kvm) +{ + int i; + u32 count = kvm->arch.msr_allowlist_ranges_count; + struct msr_bitmap_range ranges[10]; + + mutex_lock(&kvm->lock); + kvm->arch.msr_allowlist_ranges_count = 0; + memcpy(ranges, kvm->arch.msr_allowlist_ranges, count * sizeof(ranges[0])); + mutex_unlock(&kvm->lock); + synchronize_srcu(&kvm->srcu); + + for (i = 0; i < count; i++) + kfree(ranges[i].bitmap); + + return 0; +} + long kvm_arch_vm_ioctl(struct file *filp, unsigned int ioctl, unsigned long arg) { @@ -5424,6 +5574,12 @@ long kvm_arch_vm_ioctl(struct file *filp, case KVM_SET_PMU_EVENT_FILTER: r = kvm_vm_ioctl_set_pmu_event_filter(kvm, argp); break; + case KVM_X86_ADD_MSR_ALLOWLIST: + r = kvm_vm_ioctl_add_msr_allowlist(kvm, argp); + break; + case KVM_X86_CLEAR_MSR_ALLOWLIST: + r = kvm_vm_ioctl_clear_msr_allowlist(kvm); + break; default: r = -ENOTTY; } @@ -10123,6 +10279,8 @@ void kvm_arch_pre_destroy_vm(struct kvm *kvm) void kvm_arch_destroy_vm(struct kvm *kvm) { + int i; + if (current->mm == kvm->mm) { /* * Free memory regions allocated on behalf of userspace, @@ -10139,6 +10297,8 @@ void kvm_arch_destroy_vm(struct kvm *kvm) } if (kvm_x86_ops.vm_destroy) kvm_x86_ops.vm_destroy(kvm); + for (i = 0; i < kvm->arch.msr_allowlist_ranges_count; i++) + kfree(kvm->arch.msr_allowlist_ranges[i].bitmap); kvm_pic_destroy(kvm); kvm_ioapic_destroy(kvm); kvm_free_vcpus(kvm); diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h index 6470c0c1e77a..374021dc4e61 100644 --- a/include/uapi/linux/kvm.h +++ b/include/uapi/linux/kvm.h @@ -1045,6 +1045,7 @@ struct kvm_ppc_resize_hpt { #define KVM_CAP_SMALLER_MAXPHYADDR 185 #define KVM_CAP_S390_DIAG318 186 #define KVM_CAP_X86_USER_SPACE_MSR 187 +#define KVM_CAP_X86_MSR_ALLOWLIST 188 #ifdef KVM_CAP_IRQ_ROUTING @@ -1546,6 +1547,10 @@ struct kvm_pv_cmd { /* Available with KVM_CAP_S390_PROTECTED */ #define KVM_S390_PV_COMMAND _IOWR(KVMIO, 0xc5, struct kvm_pv_cmd) +/* Available with KVM_CAP_X86_MSR_ALLOWLIST */ +#define KVM_X86_ADD_MSR_ALLOWLIST _IOW(KVMIO, 0xc6, struct kvm_msr_allowlist) +#define KVM_X86_CLEAR_MSR_ALLOWLIST _IO(KVMIO, 0xc7) + /* Secure Encrypted Virtualization command */ enum sev_cmd_id { /* Guest initialization commands */