Message ID | 20221005004154.83502-4-gshan@redhat.com (mailing list archive)
---|---
State | New, archived
Series | KVM: arm64: Enable ring-based dirty memory tracking
On Wed, Oct 05, 2022 at 08:41:50AM +0800, Gavin Shan wrote:
> -8.29 KVM_CAP_DIRTY_LOG_RING/KVM_CAP_DIRTY_LOG_RING_ACQ_REL
> -----------------------------------------------------------
> +8.29 KVM_CAP_DIRTY_LOG_{RING, RING_ACQ_REL, RING_ALLOW_BITMAP}
> +--------------------------------------------------------------

Shall we make it a standalone cap, just to rely on DIRTY_RING[_ACQ_REL]
being enabled first, instead of making the three caps at the same level?

E.g. we can skip creating the bitmap for DIRTY_RING[_ACQ_REL] &&
!_ALLOW_BITMAP (x86).

> @@ -2060,10 +2060,6 @@ int kvm_get_dirty_log(struct kvm *kvm, struct kvm_dirty_log *log,
>  	unsigned long n;
>  	unsigned long any = 0;
>
> -	/* Dirty ring tracking is exclusive to dirty log tracking */
> -	if (kvm->dirty_ring_size)
> -		return -ENXIO;

Then we can also have one dirty_ring_exclusive(), with something like:

bool dirty_ring_exclusive(struct kvm *kvm)
{
        return kvm->dirty_ring_size && !kvm->dirty_ring_allow_bitmap;
}

Does it make sense?  Thanks,
Hi Peter,

On 10/7/22 4:28 AM, Peter Xu wrote:
> On Wed, Oct 05, 2022 at 08:41:50AM +0800, Gavin Shan wrote:
>> -8.29 KVM_CAP_DIRTY_LOG_RING/KVM_CAP_DIRTY_LOG_RING_ACQ_REL
>> -----------------------------------------------------------
>> +8.29 KVM_CAP_DIRTY_LOG_{RING, RING_ACQ_REL, RING_ALLOW_BITMAP}
>> +--------------------------------------------------------------
>
> Shall we make it a standalone cap, just to rely on DIRTY_RING[_ACQ_REL]
> being enabled first, instead of making the three caps at the same level?
>
> E.g. we can skip creating the bitmap for DIRTY_RING[_ACQ_REL] &&
> !_ALLOW_BITMAP (x86).
>

Both KVM_CAP_DIRTY_LOG_RING and KVM_CAP_DIRTY_LOG_RING_ACQ_REL are available
on x86, so KVM_CAP_DIRTY_LOG_RING_ACQ_REL can be enabled on x86 in theory.
However, the idea of disallowing the bitmap for KVM_CAP_DIRTY_LOG_RING (x86)
makes sense to me. I think you may be suggesting something like below.

- bool struct kvm::dirty_ring_allow_bitmap

- In kvm_vm_ioctl_enable_dirty_log_ring(), set 'dirty_ring_allow_bitmap' to
  true when the capability is KVM_CAP_DIRTY_LOG_RING_ACQ_REL

  static int kvm_vm_ioctl_enable_dirty_log_ring(struct kvm *kvm, u32 cap, u32 size)
  {
          :
          mutex_lock(&kvm->lock);

          if (kvm->created_vcpus) {
                  /* We don't allow to change this value after vcpu created */
                  r = -EINVAL;
          } else {
                  kvm->dirty_ring_size = size;
                  kvm->dirty_ring_allow_bitmap = (cap == KVM_CAP_DIRTY_LOG_RING_ACQ_REL);
                  r = 0;
          }

          mutex_unlock(&kvm->lock);
          return r;
  }

- In kvm_vm_ioctl_check_extension_generic(), KVM_CAP_DIRTY_LOG_RING_ALLOW_BITMAP
  is always false until KVM_CAP_DIRTY_LOG_RING_ACQ_REL is enabled.

  static long kvm_vm_ioctl_check_extension_generic(...)
  {
          :
          case KVM_CAP_DIRTY_LOG_RING_ALLOW_BITMAP:
                  return kvm->dirty_ring_allow_bitmap ? 1 : 0;
  }

- The suggested dirty_ring_exclusive() is used.

>> @@ -2060,10 +2060,6 @@ int kvm_get_dirty_log(struct kvm *kvm, struct kvm_dirty_log *log,
>>  	unsigned long n;
>>  	unsigned long any = 0;
>>
>> -	/* Dirty ring tracking is exclusive to dirty log tracking */
>> -	if (kvm->dirty_ring_size)
>> -		return -ENXIO;
>
> Then we can also have one dirty_ring_exclusive(), with something like:
>
> bool dirty_ring_exclusive(struct kvm *kvm)
> {
>         return kvm->dirty_ring_size && !kvm->dirty_ring_allow_bitmap;
> }
>
> Does it make sense?  Thanks,
>

Thanks,
Gavin
On Fri, Oct 07, 2022 at 07:38:19AM +0800, Gavin Shan wrote:
> Hi Peter,

Hi, Gavin,

> On 10/7/22 4:28 AM, Peter Xu wrote:
> > On Wed, Oct 05, 2022 at 08:41:50AM +0800, Gavin Shan wrote:
> > > -8.29 KVM_CAP_DIRTY_LOG_RING/KVM_CAP_DIRTY_LOG_RING_ACQ_REL
> > > -----------------------------------------------------------
> > > +8.29 KVM_CAP_DIRTY_LOG_{RING, RING_ACQ_REL, RING_ALLOW_BITMAP}
> > > +--------------------------------------------------------------
> >
> > Shall we make it a standalone cap, just to rely on DIRTY_RING[_ACQ_REL]
> > being enabled first, instead of making the three caps at the same level?
> >
> > E.g. we can skip creating the bitmap for DIRTY_RING[_ACQ_REL] &&
> > !_ALLOW_BITMAP (x86).
> >
>
> Both KVM_CAP_DIRTY_LOG_RING and KVM_CAP_DIRTY_LOG_RING_ACQ_REL are available
> on x86, so KVM_CAP_DIRTY_LOG_RING_ACQ_REL can be enabled on x86 in theory.
> However, the idea of disallowing the bitmap for KVM_CAP_DIRTY_LOG_RING (x86)
> makes sense to me. I think you may be suggesting something like below.
>
> - bool struct kvm::dirty_ring_allow_bitmap
>
> - In kvm_vm_ioctl_enable_dirty_log_ring(), set 'dirty_ring_allow_bitmap' to
>   true when the capability is KVM_CAP_DIRTY_LOG_RING_ACQ_REL

What I wanted to do is to decouple the ACQ_REL with ALLOW_BITMAP, so mostly
as what you suggested, except..

>   static int kvm_vm_ioctl_enable_dirty_log_ring(struct kvm *kvm, u32 cap, u32 size)
>   {
>           :
>           mutex_lock(&kvm->lock);
>
>           if (kvm->created_vcpus) {
>                   /* We don't allow to change this value after vcpu created */
>                   r = -EINVAL;
>           } else {
>                   kvm->dirty_ring_size = size;

.. here I'd not set dirty_ring_allow_bitmap at all, so I'd drop the below
line; instead..

>                   kvm->dirty_ring_allow_bitmap = (cap == KVM_CAP_DIRTY_LOG_RING_ACQ_REL);
>                   r = 0;
>           }
>
>           mutex_unlock(&kvm->lock);
>           return r;
>   }
>
> - In kvm_vm_ioctl_check_extension_generic(), KVM_CAP_DIRTY_LOG_RING_ALLOW_BITMAP
>   is always false until KVM_CAP_DIRTY_LOG_RING_ACQ_REL is enabled.
>
>   static long kvm_vm_ioctl_check_extension_generic(...)
>   {
>           :
>           case KVM_CAP_DIRTY_LOG_RING_ALLOW_BITMAP:
>                   return kvm->dirty_ring_allow_bitmap ? 1 : 0;

... here we always return 1, OTOH in kvm_vm_ioctl_enable_cap_generic():

        case KVM_CAP_DIRTY_LOG_RING_ALLOW_BITMAP:
                if (kvm->dirty_ring_size)
                        return -EINVAL;
                kvm->dirty_ring_allow_bitmap = true;
                return 0;

A side effect of checking dirty_ring_size is then we'll be sure to have no
vcpu created too.  Maybe we should also check that no memslot was created,
to make sure the bitmaps are not created.

Then if the userspace wants to use the bitmap altogether with the ring, it
needs to first detect KVM_CAP_DIRTY_LOG_RING_ALLOW_BITMAP and enable it
before it enables KVM_CAP_DIRTY_LOG_RING.

One trick on ALLOW_BITMAP is in mark_page_dirty_in_slot() - after we allow
the !vcpu case we'll need to make sure it won't accidentally try to set
the bitmap for !ALLOW_BITMAP, because in that case the bitmap pointer is
NULL, so set_bit_le() will directly crash the kernel.

We could keep the old flavor of having a WARN_ON_ONCE(!vcpu &&
!ALLOW_BITMAP) then return, but since now the userspace can easily trigger
this (e.g. on ARM, a malicious user app can have DIRTY_RING &&
!ALLOW_BITMAP, then it can simply trigger the gic ioctl to trigger a host
warning), I think the better approach is that we can kill the process in
that case.  Not sure whether there's anything better we can do.

> }
>
> - The suggested dirty_ring_exclusive() is used.
>
> > > @@ -2060,10 +2060,6 @@ int kvm_get_dirty_log(struct kvm *kvm, struct kvm_dirty_log *log,
> > >  	unsigned long n;
> > >  	unsigned long any = 0;
> > >
> > > -	/* Dirty ring tracking is exclusive to dirty log tracking */
> > > -	if (kvm->dirty_ring_size)
> > > -		return -ENXIO;
> >
> > Then we can also have one dirty_ring_exclusive(), with something like:
> >
> > bool dirty_ring_exclusive(struct kvm *kvm)
> > {
> >         return kvm->dirty_ring_size && !kvm->dirty_ring_allow_bitmap;
> > }
> >
> > Does it make sense?  Thanks,
> >
>
> Thanks,
> Gavin
On Fri, Oct 07, 2022 at 10:31:49AM -0400, Peter Xu wrote:

[...]

> > - In kvm_vm_ioctl_enable_dirty_log_ring(), set 'dirty_ring_allow_bitmap' to
> >   true when the capability is KVM_CAP_DIRTY_LOG_RING_ACQ_REL
>
> What I wanted to do is to decouple the ACQ_REL with ALLOW_BITMAP, so mostly
> as what you suggested, except..

+1

> > static int kvm_vm_ioctl_enable_dirty_log_ring(struct kvm *kvm, u32 cap, u32 size)
> > {
> >         :
> >         mutex_lock(&kvm->lock);
> >
> >         if (kvm->created_vcpus) {
> >                 /* We don't allow to change this value after vcpu created */
> >                 r = -EINVAL;
> >         } else {
> >                 kvm->dirty_ring_size = size;
>
> .. here I'd not set dirty_ring_allow_bitmap at all, so I'd drop the below
> line; instead..
>
> >                 kvm->dirty_ring_allow_bitmap = (cap == KVM_CAP_DIRTY_LOG_RING_ACQ_REL);
> >                 r = 0;
> >         }
> >
> >         mutex_unlock(&kvm->lock);
> >         return r;
> > }
> >
> > - In kvm_vm_ioctl_check_extension_generic(), KVM_CAP_DIRTY_LOG_RING_ALLOW_BITMAP
> >   is always false until KVM_CAP_DIRTY_LOG_RING_ACQ_REL is enabled.
> >
> > static long kvm_vm_ioctl_check_extension_generic(...)
> > {
> >         :
> >         case KVM_CAP_DIRTY_LOG_RING_ALLOW_BITMAP:
> >                 return kvm->dirty_ring_allow_bitmap ? 1 : 0;
>
> ... here we always return 1, OTOH in kvm_vm_ioctl_enable_cap_generic():
>
>         case KVM_CAP_DIRTY_LOG_RING_ALLOW_BITMAP:
>                 if (kvm->dirty_ring_size)
>                         return -EINVAL;
>                 kvm->dirty_ring_allow_bitmap = true;
>                 return 0;
>
> A side effect of checking dirty_ring_size is then we'll be sure to have no
> vcpu created too.  Maybe we should also check that no memslot was created,
> to make sure the bitmaps are not created.

I'm not sure I follow... What prevents userspace from creating a vCPU
between enabling the two caps?

> Then if the userspace wants to use the bitmap altogether with the ring, it
> needs to first detect KVM_CAP_DIRTY_LOG_RING_ALLOW_BITMAP and enable it
> before it enables KVM_CAP_DIRTY_LOG_RING.
>
> One trick on ALLOW_BITMAP is in mark_page_dirty_in_slot() - after we allow
> the !vcpu case we'll need to make sure it won't accidentally try to set
> the bitmap for !ALLOW_BITMAP, because in that case the bitmap pointer is
> NULL, so set_bit_le() will directly crash the kernel.
>
> We could keep the old flavor of having a WARN_ON_ONCE(!vcpu &&
> !ALLOW_BITMAP) then return, but since now the userspace can easily trigger
> this (e.g. on ARM, a malicious user app can have DIRTY_RING &&
> !ALLOW_BITMAP, then it can simply trigger the gic ioctl to trigger a host
> warning), I think the better approach is that we can kill the process in
> that case.  Not sure whether there's anything better we can do.

I don't believe !ALLOW_BITMAP && DIRTY_RING is a valid configuration for
arm64 given the fact that we'll dirty memory outside of a vCPU context.

Could ALLOW_BITMAP be a requirement of DIRTY_RING, thereby making
userspace fail fast? Otherwise (at least on arm64) your VM is DOA on the
target. With that the old WARN() could be preserved, as you suggest. On
top of that there would no longer be a need to test for memslot creation
when userspace attempts to enable KVM_CAP_DIRTY_LOG_RING_ALLOW_BITMAP.

--
Thanks,
Oliver
On Mon, Oct 10, 2022 at 11:18:55PM +0000, Oliver Upton wrote:
> On Fri, Oct 07, 2022 at 10:31:49AM -0400, Peter Xu wrote:
>
> [...]
>
> > > - In kvm_vm_ioctl_enable_dirty_log_ring(), set 'dirty_ring_allow_bitmap' to
> > >   true when the capability is KVM_CAP_DIRTY_LOG_RING_ACQ_REL
> >
> > What I wanted to do is to decouple the ACQ_REL with ALLOW_BITMAP, so mostly
> > as what you suggested, except..
>
> +1
>
> > > static int kvm_vm_ioctl_enable_dirty_log_ring(struct kvm *kvm, u32 cap, u32 size)
> > > {
> > >         :
> > >         mutex_lock(&kvm->lock);
> > >
> > >         if (kvm->created_vcpus) {
> > >                 /* We don't allow to change this value after vcpu created */
> > >                 r = -EINVAL;
> > >         } else {
> > >                 kvm->dirty_ring_size = size;
> >
> > .. here I'd not set dirty_ring_allow_bitmap at all, so I'd drop the below
> > line; instead..
> >
> > >                 kvm->dirty_ring_allow_bitmap = (cap == KVM_CAP_DIRTY_LOG_RING_ACQ_REL);
> > >                 r = 0;
> > >         }
> > >
> > >         mutex_unlock(&kvm->lock);
> > >         return r;
> > > }
> > >
> > > - In kvm_vm_ioctl_check_extension_generic(), KVM_CAP_DIRTY_LOG_RING_ALLOW_BITMAP
> > >   is always false until KVM_CAP_DIRTY_LOG_RING_ACQ_REL is enabled.
> > >
> > > static long kvm_vm_ioctl_check_extension_generic(...)
> > > {
> > >         :
> > >         case KVM_CAP_DIRTY_LOG_RING_ALLOW_BITMAP:
> > >                 return kvm->dirty_ring_allow_bitmap ? 1 : 0;
> >
> > ... here we always return 1, OTOH in kvm_vm_ioctl_enable_cap_generic():
> >
> >         case KVM_CAP_DIRTY_LOG_RING_ALLOW_BITMAP:
> >                 if (kvm->dirty_ring_size)
> >                         return -EINVAL;
> >                 kvm->dirty_ring_allow_bitmap = true;
> >                 return 0;
> >
> > A side effect of checking dirty_ring_size is then we'll be sure to have no
> > vcpu created too.  Maybe we should also check that no memslot was created,
> > to make sure the bitmaps are not created.
>
> I'm not sure I follow... What prevents userspace from creating a vCPU
> between enabling the two caps?
>
> > Then if the userspace wants to use the bitmap altogether with the ring, it
> > needs to first detect KVM_CAP_DIRTY_LOG_RING_ALLOW_BITMAP and enable it
> > before it enables KVM_CAP_DIRTY_LOG_RING.
> >
> > One trick on ALLOW_BITMAP is in mark_page_dirty_in_slot() - after we allow
> > the !vcpu case we'll need to make sure it won't accidentally try to set
> > the bitmap for !ALLOW_BITMAP, because in that case the bitmap pointer is
> > NULL, so set_bit_le() will directly crash the kernel.
> >
> > We could keep the old flavor of having a WARN_ON_ONCE(!vcpu &&
> > !ALLOW_BITMAP) then return, but since now the userspace can easily trigger
> > this (e.g. on ARM, a malicious user app can have DIRTY_RING &&
> > !ALLOW_BITMAP, then it can simply trigger the gic ioctl to trigger a host
> > warning), I think the better approach is that we can kill the process in
> > that case.  Not sure whether there's anything better we can do.
>
> I don't believe !ALLOW_BITMAP && DIRTY_RING is a valid configuration for
> arm64 given the fact that we'll dirty memory outside of a vCPU context.
>
> Could ALLOW_BITMAP be a requirement of DIRTY_RING, thereby making
> userspace fail fast? Otherwise (at least on arm64) your VM is DOA on the
> target. With that the old WARN() could be preserved, as you suggest. On
> top of that there would no longer be a need to test for memslot creation
> when userspace attempts to enable KVM_CAP_DIRTY_LOG_RING_ALLOW_BITMAP.

Just to be explicit... I don't believe ALLOW_BITMAP needs to be generally
advertised on architectures that select DIRTY_RING. Instead, architectures
(just arm64 right now) should select ALLOW_BITMAP if they need to dirty
memory outside of a vCPU context.

When ALLOW_BITMAP is selected, KVM_CAP_DIRTY_LOG_RING[_ACQ_REL] has the
additional restriction that KVM_CAP_DIRTY_LOG_RING_ALLOW_BITMAP has been
enabled first.

--
Thanks,
Oliver
On Mon, Oct 10, 2022 at 11:18:55PM +0000, Oliver Upton wrote:
> On Fri, Oct 07, 2022 at 10:31:49AM -0400, Peter Xu wrote:
>
> [...]
>
> > > - In kvm_vm_ioctl_enable_dirty_log_ring(), set 'dirty_ring_allow_bitmap' to
> > >   true when the capability is KVM_CAP_DIRTY_LOG_RING_ACQ_REL
> >
> > What I wanted to do is to decouple the ACQ_REL with ALLOW_BITMAP, so mostly
> > as what you suggested, except..
>
> +1
>
> > > static int kvm_vm_ioctl_enable_dirty_log_ring(struct kvm *kvm, u32 cap, u32 size)
> > > {
> > >         :
> > >         mutex_lock(&kvm->lock);
> > >
> > >         if (kvm->created_vcpus) {
> > >                 /* We don't allow to change this value after vcpu created */
> > >                 r = -EINVAL;
> > >         } else {
> > >                 kvm->dirty_ring_size = size;
> >
> > .. here I'd not set dirty_ring_allow_bitmap at all, so I'd drop the below
> > line; instead..
> >
> > >                 kvm->dirty_ring_allow_bitmap = (cap == KVM_CAP_DIRTY_LOG_RING_ACQ_REL);
> > >                 r = 0;
> > >         }
> > >
> > >         mutex_unlock(&kvm->lock);
> > >         return r;
> > > }
> > >
> > > - In kvm_vm_ioctl_check_extension_generic(), KVM_CAP_DIRTY_LOG_RING_ALLOW_BITMAP
> > >   is always false until KVM_CAP_DIRTY_LOG_RING_ACQ_REL is enabled.
> > >
> > > static long kvm_vm_ioctl_check_extension_generic(...)
> > > {
> > >         :
> > >         case KVM_CAP_DIRTY_LOG_RING_ALLOW_BITMAP:
> > >                 return kvm->dirty_ring_allow_bitmap ? 1 : 0;
> >
> > ... here we always return 1, OTOH in kvm_vm_ioctl_enable_cap_generic():
> >
> >         case KVM_CAP_DIRTY_LOG_RING_ALLOW_BITMAP:
> >                 if (kvm->dirty_ring_size)
> >                         return -EINVAL;
> >                 kvm->dirty_ring_allow_bitmap = true;
> >                 return 0;
> >
> > A side effect of checking dirty_ring_size is then we'll be sure to have no
> > vcpu created too.  Maybe we should also check that no memslot was created,
> > to make sure the bitmaps are not created.
>
> I'm not sure I follow... What prevents userspace from creating a vCPU
> between enabling the two caps?

Enabling of the dirty ring requires no vcpu created, so as to make sure all
the vcpus will have the ring structures allocated as long as the ring is
enabled for the vm.  Done in kvm_vm_ioctl_enable_dirty_log_ring():

        if (kvm->created_vcpus) {
                /* We don't allow to change this value after vcpu created */
                r = -EINVAL;
        } else {
                kvm->dirty_ring_size = size;
                r = 0;
        }

Then if we have KVM_CAP_DIRTY_LOG_RING_ALLOW_BITMAP check dirty_ring_size
first, we make sure both ALLOW_BITMAP and DIRTY_RING need to be configured
before any vcpu creation.

> > Then if the userspace wants to use the bitmap altogether with the ring, it
> > needs to first detect KVM_CAP_DIRTY_LOG_RING_ALLOW_BITMAP and enable it
> > before it enables KVM_CAP_DIRTY_LOG_RING.
> >
> > One trick on ALLOW_BITMAP is in mark_page_dirty_in_slot() - after we allow
> > the !vcpu case we'll need to make sure it won't accidentally try to set
> > the bitmap for !ALLOW_BITMAP, because in that case the bitmap pointer is
> > NULL, so set_bit_le() will directly crash the kernel.
> >
> > We could keep the old flavor of having a WARN_ON_ONCE(!vcpu &&
> > !ALLOW_BITMAP) then return, but since now the userspace can easily trigger
> > this (e.g. on ARM, a malicious user app can have DIRTY_RING &&
> > !ALLOW_BITMAP, then it can simply trigger the gic ioctl to trigger a host
> > warning), I think the better approach is that we can kill the process in
> > that case.  Not sure whether there's anything better we can do.
>
> I don't believe !ALLOW_BITMAP && DIRTY_RING is a valid configuration for
> arm64 given the fact that we'll dirty memory outside of a vCPU context.

Yes it's not, but after Gavin's current series it'll be possible, IOW a
malicious app can leverage this to trigger a host warning, which is IMHO
not wanted.

> Could ALLOW_BITMAP be a requirement of DIRTY_RING, thereby making
> userspace fail fast? Otherwise (at least on arm64) your VM is DOA on the
> target. With that the old WARN() could be preserved, as you suggest.

It's just that x86 doesn't need the bitmap, so it'll be a pure waste there
otherwise.  It's not only about the memory that will be wasted (that's
guest mem size / 32k), but also the sync() process for x86 will be all
zeros and totally meaningless - note that the sync() of the bitmap will be
part of the VM downtime in this case (we need to sync() after turning the
VM off), so it will make x86 downtime larger but without any benefit.

> On top of that there would no longer be a need to test for memslot
> creation when userspace attempts to enable
> KVM_CAP_DIRTY_LOG_RING_ALLOW_BITMAP.
Hi Peter and Oliver,

On 10/11/22 7:49 AM, Peter Xu wrote:
> On Mon, Oct 10, 2022 at 11:18:55PM +0000, Oliver Upton wrote:
>> On Fri, Oct 07, 2022 at 10:31:49AM -0400, Peter Xu wrote:
>>
>> [...]
>>
>>>> - In kvm_vm_ioctl_enable_dirty_log_ring(), set 'dirty_ring_allow_bitmap' to
>>>>   true when the capability is KVM_CAP_DIRTY_LOG_RING_ACQ_REL
>>>
>>> What I wanted to do is to decouple the ACQ_REL with ALLOW_BITMAP, so mostly
>>> as what you suggested, except..
>>
>> +1
>>

Agreed.

[...]

>>
>>> Then if the userspace wants to use the bitmap altogether with the ring, it
>>> needs to first detect KVM_CAP_DIRTY_LOG_RING_ALLOW_BITMAP and enable it
>>> before it enables KVM_CAP_DIRTY_LOG_RING.
>>>
>>> One trick on ALLOW_BITMAP is in mark_page_dirty_in_slot() - after we allow
>>> the !vcpu case we'll need to make sure it won't accidentally try to set
>>> the bitmap for !ALLOW_BITMAP, because in that case the bitmap pointer is
>>> NULL, so set_bit_le() will directly crash the kernel.
>>>
>>> We could keep the old flavor of having a WARN_ON_ONCE(!vcpu &&
>>> !ALLOW_BITMAP) then return, but since now the userspace can easily trigger
>>> this (e.g. on ARM, a malicious user app can have DIRTY_RING &&
>>> !ALLOW_BITMAP, then it can simply trigger the gic ioctl to trigger a host
>>> warning), I think the better approach is that we can kill the process in
>>> that case.  Not sure whether there's anything better we can do.
>>
>> I don't believe !ALLOW_BITMAP && DIRTY_RING is a valid configuration for
>> arm64 given the fact that we'll dirty memory outside of a vCPU context.
>
> Yes it's not, but after Gavin's current series it'll be possible, IOW a
> malicious app can leverage this to trigger a host warning, which is IMHO
> not wanted.
>
>> Could ALLOW_BITMAP be a requirement of DIRTY_RING, thereby making
>> userspace fail fast? Otherwise (at least on arm64) your VM is DOA on the
>> target. With that the old WARN() could be preserved, as you suggest.
>
> It's just that x86 doesn't need the bitmap, so it'll be a pure waste there
> otherwise.  It's not only about the memory that will be wasted (that's
> guest mem size / 32k), but also the sync() process for x86 will be all
> zeros and totally meaningless - note that the sync() of the bitmap will be
> part of the VM downtime in this case (we need to sync() after turning the
> VM off), so it will make x86 downtime larger but without any benefit.
>

Besides, old QEMU won't work if ALLOW_BITMAP is required to enable
DIRTY_RING, if I'm correct.

>> On top of that there would no longer be a need to test for memslot
>> creation when userspace attempts to enable
>> KVM_CAP_DIRTY_LOG_RING_ALLOW_BITMAP.
>

Thanks,
Gavin
On Mon, Oct 10, 2022 at 07:49:15PM -0400, Peter Xu wrote:
> On Mon, Oct 10, 2022 at 11:18:55PM +0000, Oliver Upton wrote:
> > On Fri, Oct 07, 2022 at 10:31:49AM -0400, Peter Xu wrote:
> >
> > [...]
> >
> > > > - In kvm_vm_ioctl_enable_dirty_log_ring(), set 'dirty_ring_allow_bitmap' to
> > > >   true when the capability is KVM_CAP_DIRTY_LOG_RING_ACQ_REL
> > >
> > > What I wanted to do is to decouple the ACQ_REL with ALLOW_BITMAP, so mostly
> > > as what you suggested, except..
> >
> > +1
> >
> > > > static int kvm_vm_ioctl_enable_dirty_log_ring(struct kvm *kvm, u32 cap, u32 size)
> > > > {
> > > >         :
> > > >         mutex_lock(&kvm->lock);
> > > >
> > > >         if (kvm->created_vcpus) {
> > > >                 /* We don't allow to change this value after vcpu created */
> > > >                 r = -EINVAL;
> > > >         } else {
> > > >                 kvm->dirty_ring_size = size;
> > >
> > > .. here I'd not set dirty_ring_allow_bitmap at all, so I'd drop the below
> > > line; instead..
> > >
> > > >                 kvm->dirty_ring_allow_bitmap = (cap == KVM_CAP_DIRTY_LOG_RING_ACQ_REL);
> > > >                 r = 0;
> > > >         }
> > > >
> > > >         mutex_unlock(&kvm->lock);
> > > >         return r;
> > > > }
> > > >
> > > > - In kvm_vm_ioctl_check_extension_generic(), KVM_CAP_DIRTY_LOG_RING_ALLOW_BITMAP
> > > >   is always false until KVM_CAP_DIRTY_LOG_RING_ACQ_REL is enabled.
> > > >
> > > > static long kvm_vm_ioctl_check_extension_generic(...)
> > > > {
> > > >         :
> > > >         case KVM_CAP_DIRTY_LOG_RING_ALLOW_BITMAP:
> > > >                 return kvm->dirty_ring_allow_bitmap ? 1 : 0;
> > >
> > > ... here we always return 1, OTOH in kvm_vm_ioctl_enable_cap_generic():
> > >
> > >         case KVM_CAP_DIRTY_LOG_RING_ALLOW_BITMAP:
> > >                 if (kvm->dirty_ring_size)
> > >                         return -EINVAL;
> > >                 kvm->dirty_ring_allow_bitmap = true;
> > >                 return 0;
> > >
> > > A side effect of checking dirty_ring_size is then we'll be sure to have no
> > > vcpu created too.  Maybe we should also check that no memslot was created,
> > > to make sure the bitmaps are not created.
> >
> > I'm not sure I follow... What prevents userspace from creating a vCPU
> > between enabling the two caps?
>
> Enabling of the dirty ring requires no vcpu created, so as to make sure all
> the vcpus will have the ring structures allocated as long as the ring is
> enabled for the vm.  Done in kvm_vm_ioctl_enable_dirty_log_ring():
>
>         if (kvm->created_vcpus) {
>                 /* We don't allow to change this value after vcpu created */
>                 r = -EINVAL;
>         } else {
>                 kvm->dirty_ring_size = size;
>                 r = 0;
>         }
>
> Then if we have KVM_CAP_DIRTY_LOG_RING_ALLOW_BITMAP check dirty_ring_size
> first, we make sure both ALLOW_BITMAP and DIRTY_RING need to be configured
> before any vcpu creation.

Ah, right. Sorry, I had the 'if' condition inverted in my head.

> > > Then if the userspace wants to use the bitmap altogether with the ring, it
> > > needs to first detect KVM_CAP_DIRTY_LOG_RING_ALLOW_BITMAP and enable it
> > > before it enables KVM_CAP_DIRTY_LOG_RING.
> > >
> > > One trick on ALLOW_BITMAP is in mark_page_dirty_in_slot() - after we allow
> > > the !vcpu case we'll need to make sure it won't accidentally try to set
> > > the bitmap for !ALLOW_BITMAP, because in that case the bitmap pointer is
> > > NULL, so set_bit_le() will directly crash the kernel.
> > >
> > > We could keep the old flavor of having a WARN_ON_ONCE(!vcpu &&
> > > !ALLOW_BITMAP) then return, but since now the userspace can easily trigger
> > > this (e.g. on ARM, a malicious user app can have DIRTY_RING &&
> > > !ALLOW_BITMAP, then it can simply trigger the gic ioctl to trigger a host
> > > warning), I think the better approach is that we can kill the process in
> > > that case.  Not sure whether there's anything better we can do.
> >
> > I don't believe !ALLOW_BITMAP && DIRTY_RING is a valid configuration for
> > arm64 given the fact that we'll dirty memory outside of a vCPU context.
>
> Yes it's not, but after Gavin's current series it'll be possible, IOW a
> malicious app can leverage this to trigger a host warning, which is IMHO
> not wanted.
>
> > Could ALLOW_BITMAP be a requirement of DIRTY_RING, thereby making
> > userspace fail fast? Otherwise (at least on arm64) your VM is DOA on the
> > target. With that the old WARN() could be preserved, as you suggest.
>
> It's just that x86 doesn't need the bitmap, so it'll be a pure waste there
> otherwise.  It's not only about the memory that will be wasted (that's
> guest mem size / 32k), but also the sync() process for x86 will be all
> zeros and totally meaningless - note that the sync() of the bitmap will be
> part of the VM downtime in this case (we need to sync() after turning the
> VM off), so it will make x86 downtime larger but without any benefit.

Ah, my follow-up [1] missed by just a few minutes :)

I think this further drives the point home -- there's zero need for the
bitmap with dirty ring on x86, so why even support it? The proposal of
ALLOW_BITMAP && DIRTY_RING should be arm64-specific. Any other arch that
needs to dirty memory outside of a vCPU context can opt-in to the
behavior.

[1]: https://lore.kernel.org/kvm/Y0SuJee3oWL2QCqM@google.com/

--
Thanks,
Oliver
On Mon, Oct 10, 2022 at 11:58:22PM +0000, Oliver Upton wrote:
> I think this further drives the point home -- there's zero need for the
> bitmap with dirty ring on x86, so why even support it? The proposal of
> ALLOW_BITMAP && DIRTY_RING should be arm64-specific. Any other arch that
> needs to dirty memory outside of a vCPU context can opt-in to the
> behavior.

Yeah that sounds workable too, but it'll be slightly hackish as then the
user app will need some "#ifdef ARM64" blocks for e.g. syncing the dirty
bitmap.  With the new cap the user app can implement the whole ring with
generic code.

Also, isn't it more flexible to expose it as a generic cap?  E.g., one day
x86 can enable this too for whatever reason (even though I don't think
so..).
On Mon, Oct 10, 2022 at 08:20:29PM -0400, Peter Xu wrote:
> On Mon, Oct 10, 2022 at 11:58:22PM +0000, Oliver Upton wrote:
> > I think this further drives the point home -- there's zero need for the
> > bitmap with dirty ring on x86, so why even support it? The proposal of
> > ALLOW_BITMAP && DIRTY_RING should be arm64-specific. Any other arch that
> > needs to dirty memory outside of a vCPU context can opt-in to the
> > behavior.
>
> Yeah that sounds workable too, but it'll be slightly hackish as then the
> user app will need some "#ifdef ARM64" blocks for e.g. syncing the dirty
> bitmap.  With the new cap the user app can implement the whole ring with
> generic code.

Isn't the current route of exposing ALLOW_BITMAP on other arches for no
reason headed in exactly that direction? Userspace would need to know if
it _really_ needs the dirty bitmap in addition to the dirty ring, which
could take the form of architecture ifdeffery.

OTOH, if the cap is only exposed when it is absolutely necessary, an
arch-generic live migration implementation could enable the cap whenever
it is advertised and scan the bitmap accordingly.

The VMM must know something about the architecture it is running on, as
it calls KVM_DEV_ARM_ITS_SAVE_TABLES after all...

> Also, isn't it more flexible to expose it as a generic cap?  E.g., one day
> x86 can enable this too for whatever reason (even though I don't think
> so..).

I had imagined something like this patch where the arch opts-in to some
generic construct if it *requires* the use of both the ring and bitmap
(very rough sketch).

--
Thanks,
Oliver
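[Editorial note: the arch opt-in construct Oliver describes could look roughly like the Kconfig sketch below. This is only a sketch: the generic symbol follows the CONFIG_HAVE_KVM_DIRTY_RING_WITH_BITMAP spelling used later in the thread, and the file locations are assumptions.]

```
# virt/kvm/Kconfig (sketch): a generic symbol an arch selects when it
# must dirty guest memory outside of a vCPU context, i.e. when the ring
# alone cannot capture all dirty pages.
config HAVE_KVM_DIRTY_RING_WITH_BITMAP
	bool
	depends on HAVE_KVM_DIRTY_RING

# arch/arm64/kvm/Kconfig (sketch): arm64 opts in because saving the ITS
# tables dirties guest memory with no running vCPU.
config KVM
	select HAVE_KVM_DIRTY_RING_ACQ_REL
	select HAVE_KVM_DIRTY_RING_WITH_BITMAP
```

With this shape, KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP would only be advertised on architectures that select the symbol, so a generic VMM can simply enable the cap whenever it is reported.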
On 10/11/22 9:12 AM, Oliver Upton wrote:
> On Mon, Oct 10, 2022 at 08:20:29PM -0400, Peter Xu wrote:
>> On Mon, Oct 10, 2022 at 11:58:22PM +0000, Oliver Upton wrote:
>>> I think this further drives the point home -- there's zero need for the
>>> bitmap with dirty ring on x86, so why even support it? The proposal of
>>> ALLOW_BITMAP && DIRTY_RING should be arm64-specific. Any other arch that
>>> needs to dirty memory outside of a vCPU context can opt-in to the
>>> behavior.
>>
>> Yeah that sounds workable too, but it'll be slightly hackish as then the
>> user app will need some "#ifdef ARM64" blocks for e.g. syncing the dirty
>> bitmap.  With the new cap the user app can implement the whole ring with
>> generic code.
>
> Isn't the current route of exposing ALLOW_BITMAP on other arches for no
> reason headed in exactly that direction? Userspace would need to know if
> it _really_ needs the dirty bitmap in addition to the dirty ring, which
> could take the form of architecture ifdeffery.
>
> OTOH, if the cap is only exposed when it is absolutely necessary, an
> arch-generic live migration implementation could enable the cap whenever
> it is advertised and scan the bitmap accordingly.
>
> The VMM must know something about the architecture it is running on, as
> it calls KVM_DEV_ARM_ITS_SAVE_TABLES after all...
>

It looks good to me to use CONFIG_HAVE_KVM_DIRTY_RING_WITH_BITMAP to
opt-in to KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP. The most important point
is to ensure 'kvm->dirty_ring_with_bitmap == true' when the dirty ring
capability is enabled. In this way, we can fail early when
KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP hasn't been enabled on an attempt to
enable the dirty ring capability.

If both of you agree, I will integrate the suggested code changes in the
next respin, with the necessary tweaks.

- In kvm_vm_ioctl_enable_cap_generic(), 'kvm->dirty_ring_with_bitmap' is
  updated to 'true' unconditionally.

  static int kvm_vm_ioctl_enable_cap_generic(struct kvm *kvm,
                                             struct kvm_enable_cap *cap)
  {
          :
          case KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP:
                  kvm->dirty_ring_with_bitmap = true;
                  return 0;
  }

- In mark_page_dirty_in_slot(), we need comprehensive checks like below.

  void mark_page_dirty_in_slot(struct kvm *kvm,
                               const struct kvm_memory_slot *memslot,
                               gfn_t gfn)
  {
          struct kvm_vcpu *vcpu = kvm_get_running_vcpu();

  #ifdef CONFIG_HAVE_KVM_DIRTY_RING
          if (WARN_ON_ONCE(vcpu && vcpu->kvm != kvm))
                  return;

  #ifndef CONFIG_HAVE_KVM_DIRTY_RING_WITH_BITMAP
          if (WARN_ON_ONCE(!vcpu))
                  return;
  #endif
  #endif
          :
  }

- Use kvm_dirty_ring_exclusive(), which was suggested by Peter before.
  The function is used in various spots to decide whether the dirty
  bitmap can be created and accessed.

  bool kvm_dirty_ring_exclusive(struct kvm *kvm)
  {
          return kvm->dirty_ring_size && !kvm->dirty_ring_with_bitmap;
  }

>> Also, isn't it more flexible to expose it as a generic cap?  E.g., one day
>> x86 can enable this too for whatever reason (even though I don't think
>> so..).
>
> I had imagined something like this patch where the arch opts-in to some
> generic construct if it *requires* the use of both the ring and bitmap
> (very rough sketch).
>

Thanks,
Gavin
Hi Peter/Oliver, On 10/11/22 11:56 AM, Gavin Shan wrote: > On 10/11/22 9:12 AM, Oliver Upton wrote: >> On Mon, Oct 10, 2022 at 08:20:29PM -0400, Peter Xu wrote: >>> On Mon, Oct 10, 2022 at 11:58:22PM +0000, Oliver Upton wrote: >>>> I think this further drives the point home -- there's zero need for the >>>> bitmap with dirty ring on x86, so why even support it? The proposal of >>>> ALLOW_BITMAP && DIRTY_RING should be arm64-specific. Any other arch that >>>> needs to dirty memory outside of a vCPU context can opt-in to the >>>> behavior. >>> >>> Yeah that sounds working too, but it'll be slightly hackish as then the >>> user app will need some "#ifdef ARM64" blocks for e.g. sync dirty bitmap. >>> With the new cap the user app can implement the whole ring with generic >>> code. >> >> Isn't the current route of exposing ALLOW_BITMAP on other arches for no >> reason headed in exactly that direction? Userspace would need to know if >> it _really_ needs the dirty bitmap in addition to the dirty ring, which >> could take the form of architecture ifdeffery. >> >> OTOH, if the cap is only exposed when it is absolutely necessary, an >> arch-generic live migration implementation could enable the cap whenever >> it is advertized and scan the bitmap accordingly. >> >> The VMM must know something about the architecture it is running on, as >> it calls KVM_DEV_ARM_ITS_SAVE_TABLES after all... >> > > It looks good to me by using CONFIG_HAVE_KVM_DIRTY_RING_USE_BITMAP to > opt-in KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP. The most important point is > to ensure 'kvm->dirty_ring_with_bitmap == true' when dirty ring capability > is enabled. In this way, we can fail early when KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP > isn't enabled on attempt to enable dirty ring capability. > > If both of you agree, I will integrate the suggested code changes in > next respin, with necessary tweak. > > - In kvm_vm_ioctl_enable_cap_generic(), 'kvm->dirty_ring_with_bitmap' is > updated to 'true' unconditionally. 
>
>   static int kvm_vm_ioctl_enable_cap_generic(struct kvm *kvm,
>                                              struct kvm_enable_cap *cap)
>   {
>      :
>      case KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP:
>         kvm->dirty_ring_with_bitmap = true;
>         return 0;
>   }
>
> - In mark_page_dirty_in_slot(), we need comprehensive checks like below.
>
>   void mark_page_dirty_in_slot(struct kvm *kvm,
>                                const struct kvm_memory_slot *memslot,
>                                gfn_t gfn)
>   {
>   #ifdef CONFIG_HAVE_KVM_DIRTY_RING
>      if (WARN_ON_ONCE(vcpu && vcpu->kvm != kvm))
>         return;
>
>   #ifndef CONFIG_HAVE_KVM_DIRTY_RING_WITH_BITMAP
>      if (WARN_ON_ONCE(!vcpu))
>         return;
>   #endif
>   #endif
>   }
>
> - Use kvm_dirty_ring_exclusive(), which was suggested by Peter before.
>   The function is used in various spots to allow the dirty bitmap to be
>   created and accessed.
>
>   bool kvm_dirty_ring_exclusive(struct kvm *kvm)
>   {
>      return kvm->dirty_ring_size && !kvm->dirty_ring_with_bitmap;
>   }
>

I've included Oliver's suggested changes into v6, which was just posted:

https://lore.kernel.org/kvmarm/3123a04f-a674-782b-9e9b-0baf3db49ebc@redhat.com/

Please review v6 directly when you find time, thanks!

>>> Also more flexible to expose it as generic cap? E.g., one day x86 can
>>> enable this too for whatever reason (even though I don't think so..).
>>
>> I had imagined something like this patch where the arch opts-in to some
>> generic construct if it *requires* the use of both the ring and bitmap
>> (very rough sketch).
>>

Thanks,
Gavin
On Tue, Oct 11, 2022 at 01:12:43AM +0000, Oliver Upton wrote:
> The VMM must know something about the architecture it is running on, as
> it calls KVM_DEV_ARM_ITS_SAVE_TABLES after all...

IIUC this is still a kernel impl detail to flush data into guest pages
within this ioctl, or am I wrong?

For example, I'm assuming it's safe to change KVM_DEV_ARM_ITS_SAVE_TABLES
impl one day to not flush data to guest memories, then the kernel should
also disable the ALLOW_BITMAP cap in the same patch, so that any old qemu
binary that supports arm64 dirty ring will naturally skip all the bitmap
ops and becoming the same as what it does with x86 when running on that new
kernel. With implicit approach suggested, we need to modify QEMU.

Changing impl of KVM_DEV_ARM_ITS_SAVE_TABLES is probably not a good
example.. but just want to show what I meant. Fundamentally it sounds
cleaner if it's the kernel that tells the user "okay you collected the
ring, but that's not enough; you need to collect the bitmap too", rather
than assuming the user app will always know what kvm did in details. No
strong opinion though, as I could also have misunderstood how arm works.
On Fri, Oct 14, 2022 at 12:55:35PM -0400, Peter Xu wrote:
> On Tue, Oct 11, 2022 at 01:12:43AM +0000, Oliver Upton wrote:
> > The VMM must know something about the architecture it is running on, as
> > it calls KVM_DEV_ARM_ITS_SAVE_TABLES after all...
>
> IIUC this is still a kernel impl detail to flush data into guest pages
> within this ioctl, or am I wrong?

Somewhat...

The guest is assigning memory from the IPA space to back the ITS tables,
but KVM maintains its own internal representation. It just so happens
that we've conditioned userspace to be aware that ITS emulation is
incoherent w.r.t. the guest memory that backs the tables.

> For example, I'm assuming it's safe to change KVM_DEV_ARM_ITS_SAVE_TABLES
> impl one day to not flush data to guest memories, then the kernel should
> also disable the ALLOW_BITMAP cap in the same patch, so that any old qemu
> binary that supports arm64 dirty ring will naturally skip all the bitmap
> ops and becoming the same as what it does with x86 when running on that new
> kernel. With implicit approach suggested, we need to modify QEMU.
>
> Changing impl of KVM_DEV_ARM_ITS_SAVE_TABLES is probably not a good
> example.. but just want to show what I meant. Fundamentally it sounds
> cleaner if it's the kernel that tells the user "okay you collected the
> ring, but that's not enough; you need to collect the bitmap too", rather
> than assuming the user app will always know what kvm did in details. No
> strong opinion though, as I could also have misunderstood how arm works.

I think the SAVE_TABLES ioctl is likely here to stay given the odd quirk
that it really is guest memory, so we'll probably need the bitmap on
arm64 for a long time. Even if we were to kill it, userspace would need
to take a change anyway to switch to a new ITS migration mechanism.
If we ever get to the point that we can relax this restriction I think a
flag on the BITMAP_WITH_TABLE cap that says "I don't actually set any
bits in the bitmap" would do. We shouldn't hide the cap entirely, as
that would be ABI breakage for VMMs that expect bitmap+ring.

Thoughts?

--
Thanks,
Oliver
On Tue, Oct 18, 2022 at 10:38:10AM +0300, Oliver Upton wrote:
> On Fri, Oct 14, 2022 at 12:55:35PM -0400, Peter Xu wrote:
> > On Tue, Oct 11, 2022 at 01:12:43AM +0000, Oliver Upton wrote:
> > > The VMM must know something about the architecture it is running on, as
> > > it calls KVM_DEV_ARM_ITS_SAVE_TABLES after all...
> >
> > IIUC this is still a kernel impl detail to flush data into guest pages
> > within this ioctl, or am I wrong?
>
> Somewhat...
>
> The guest is assigning memory from the IPA space to back the ITS tables,
> but KVM maintains its own internal representation. It just so happens
> that we've conditioned userspace to be aware that ITS emulation is
> incoherent w.r.t. the guest memory that backs the tables.
>
> > For example, I'm assuming it's safe to change KVM_DEV_ARM_ITS_SAVE_TABLES
> > impl one day to not flush data to guest memories, then the kernel should
> > also disable the ALLOW_BITMAP cap in the same patch, so that any old qemu
> > binary that supports arm64 dirty ring will naturally skip all the bitmap
> > ops and becoming the same as what it does with x86 when running on that new
> > kernel. With implicit approach suggested, we need to modify QEMU.
> >
> > Changing impl of KVM_DEV_ARM_ITS_SAVE_TABLES is probably not a good
> > example.. but just want to show what I meant. Fundamentally it sounds
> > cleaner if it's the kernel that tells the user "okay you collected the
> > ring, but that's not enough; you need to collect the bitmap too", rather
> > than assuming the user app will always know what kvm did in details. No
> > strong opinion though, as I could also have misunderstood how arm works.
>
> I think the SAVE_TABLES ioctl is likely here to stay given the odd quirk
> that it really is guest memory, so we'll probably need the bitmap on
> arm64 for a long time. Even if we were to kill it, userspace would need
> to take a change anyway to switch to a new ITS migration mechanism.
>
> If we ever get to the point that we can relax this restriction i think a
> flag on the BITMAP_WITH_TABLE cap that says "I don't actually set any

BITMAP_WITH_RING

> bits in the bitmap" would do. We shouldn't hide the cap entirely, as
> that would be ABI breakage for VMMs that expect bitmap+ring.
>
> Thoughts?
>
> --
> Thanks,
> Oliver
> _______________________________________________
> kvmarm mailing list
> kvmarm@lists.cs.columbia.edu
> https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
On Tue, Oct 18, 2022 at 10:38:10AM +0300, Oliver Upton wrote:
> If we ever get to the point that we can relax this restriction i think a
> flag on the BITMAP_WITH_TABLE cap that says "I don't actually set any
> bits in the bitmap" would do. We shouldn't hide the cap entirely, as
> that would be ABI breakage for VMMs that expect bitmap+ring.

I'd rather drop the cap directly if it's just a boolean that tells us
"whether we need bitmap to back rings".

Btw, when I said "dropping it" I meant "don't return 1 for the cap
anymore" - we definitely need to keep the cap macro stable as part of
the kvm API..

But I think I misunderstood the proposal previously, sorry. I assumed we
were discussing an internal HAVE_DIRTY_RING_WITH_BITMAP only. I noticed
this only after I had a closer look at Gavin's patch. Having a cap
exposed always sounds good to me.

I'll comment on Gavin's patch directly, thanks.
diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index 32427ea160df..4f5e09042f8a 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -8019,8 +8019,8 @@ guest according to the bits in the KVM_CPUID_FEATURES CPUID leaf
 (0x40000001). Otherwise, a guest may use the paravirtual features
 regardless of what has actually been exposed through the CPUID leaf.
 
-8.29 KVM_CAP_DIRTY_LOG_RING/KVM_CAP_DIRTY_LOG_RING_ACQ_REL
-----------------------------------------------------------
+8.29 KVM_CAP_DIRTY_LOG_{RING, RING_ACQ_REL, RING_ALLOW_BITMAP}
+--------------------------------------------------------------
 
 :Architectures: x86
 :Parameters: args[0] - size of the dirty log ring
@@ -8104,13 +8104,6 @@ flushing is done by the KVM_GET_DIRTY_LOG ioctl). To achieve that, one
 needs to kick the vcpu out of KVM_RUN using a signal. The resulting
 vmexit ensures that all dirty GFNs are flushed to the dirty rings.
 
-NOTE: the capability KVM_CAP_DIRTY_LOG_RING and the corresponding
-ioctl KVM_RESET_DIRTY_RINGS are mutual exclusive to the existing ioctls
-KVM_GET_DIRTY_LOG and KVM_CLEAR_DIRTY_LOG. After enabling
-KVM_CAP_DIRTY_LOG_RING with an acceptable dirty ring size, the virtual
-machine will switch to ring-buffer dirty page tracking and further
-KVM_GET_DIRTY_LOG or KVM_CLEAR_DIRTY_LOG ioctls will fail.
-
 NOTE: KVM_CAP_DIRTY_LOG_RING_ACQ_REL is the only capability that
 should be exposed by weakly ordered architecture, in order to indicate
 the additional memory ordering requirements imposed on userspace when
@@ -8119,6 +8112,12 @@ Architecture with TSO-like ordering (such as x86) are allowed to
 expose both KVM_CAP_DIRTY_LOG_RING and
 KVM_CAP_DIRTY_LOG_RING_ACQ_REL to userspace.
 
+NOTE: There is no running vcpu and available vcpu dirty ring when pages
+become dirty in some cases. One example is to save arm64's vgic/its
+tables during migration. The dirty bitmap is still used to track those
+dirty pages, indicated by KVM_CAP_DIRTY_LOG_RING_ALLOW_BITMAP. The dirty
+bitmap is visited by KVM_GET_DIRTY_LOG and KVM_CLEAR_DIRTY_LOG ioctls.
+
 8.30 KVM_CAP_XEN_HVM
 --------------------
 
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 0d5d4419139a..a1f10bc5fe8b 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -1178,6 +1178,7 @@ struct kvm_ppc_resize_hpt {
 #define KVM_CAP_S390_ZPCI_OP 221
 #define KVM_CAP_S390_CPU_TOPOLOGY 222
 #define KVM_CAP_DIRTY_LOG_RING_ACQ_REL 223
+#define KVM_CAP_DIRTY_LOG_RING_ALLOW_BITMAP 224
 
 #ifdef KVM_CAP_IRQ_ROUTING
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 5b064dbadaf4..505e3840cf0c 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -1617,7 +1617,7 @@ static int kvm_prepare_memory_region(struct kvm *kvm,
 			new->dirty_bitmap = NULL;
 		else if (old && old->dirty_bitmap)
 			new->dirty_bitmap = old->dirty_bitmap;
-		else if (!kvm->dirty_ring_size) {
+		else {
 			r = kvm_alloc_dirty_bitmap(new);
 			if (r)
 				return r;
@@ -2060,10 +2060,6 @@ int kvm_get_dirty_log(struct kvm *kvm, struct kvm_dirty_log *log,
 	unsigned long n;
 	unsigned long any = 0;
 
-	/* Dirty ring tracking is exclusive to dirty log tracking */
-	if (kvm->dirty_ring_size)
-		return -ENXIO;
-
 	*memslot = NULL;
 	*is_dirty = 0;
 
@@ -2125,10 +2121,6 @@ static int kvm_get_dirty_log_protect(struct kvm *kvm, struct kvm_dirty_log *log)
 	unsigned long *dirty_bitmap_buffer;
 	bool flush;
 
-	/* Dirty ring tracking is exclusive to dirty log tracking */
-	if (kvm->dirty_ring_size)
-		return -ENXIO;
-
 	as_id = log->slot >> 16;
 	id = (u16)log->slot;
 	if (as_id >= KVM_ADDRESS_SPACE_NUM || id >= KVM_USER_MEM_SLOTS)
@@ -2237,10 +2229,6 @@ static int kvm_clear_dirty_log_protect(struct kvm *kvm,
 	unsigned long *dirty_bitmap_buffer;
 	bool flush;
 
-	/* Dirty ring tracking is exclusive to dirty log tracking */
-	if (kvm->dirty_ring_size)
-		return -ENXIO;
-
 	as_id = log->slot >> 16;
 	id = (u16)log->slot;
 	if (as_id >= KVM_ADDRESS_SPACE_NUM || id >= KVM_USER_MEM_SLOTS)
@@ -3305,7 +3293,7 @@ void mark_page_dirty_in_slot(struct kvm *kvm,
 	struct kvm_vcpu *vcpu = kvm_get_running_vcpu();
 
 #ifdef CONFIG_HAVE_KVM_DIRTY_RING
-	if (WARN_ON_ONCE(!vcpu) || WARN_ON_ONCE(vcpu->kvm != kvm))
+	if (WARN_ON_ONCE(vcpu && vcpu->kvm != kvm))
 		return;
 #endif
 
@@ -3313,7 +3301,7 @@ void mark_page_dirty_in_slot(struct kvm *kvm,
 		unsigned long rel_gfn = gfn - memslot->base_gfn;
 		u32 slot = (memslot->as_id << 16) | memslot->id;
 
-		if (kvm->dirty_ring_size)
+		if (kvm->dirty_ring_size && vcpu)
 			kvm_dirty_ring_push(&vcpu->dirty_ring,
 					    slot, rel_gfn);
 		else
@@ -4485,6 +4473,12 @@ static long kvm_vm_ioctl_check_extension_generic(struct kvm *kvm, long arg)
 		return KVM_DIRTY_RING_MAX_ENTRIES * sizeof(struct kvm_dirty_gfn);
 #else
 		return 0;
+#endif
+	case KVM_CAP_DIRTY_LOG_RING_ALLOW_BITMAP:
+#ifdef CONFIG_HAVE_KVM_DIRTY_RING
+		return 1;
+#else
+		return 0;
 #endif
 	case KVM_CAP_BINARY_STATS_FD:
 	case KVM_CAP_SYSTEM_EVENT_DATA:
There is no running vcpu and no available per-vcpu dirty ring when pages
become dirty in some cases. One example is saving arm64's vgic/its
tables during migration. This causes us to lose track of those dirty
pages.

Fix the issue by reusing the bitmap to track those dirty pages. The
bitmap is visited by the KVM_GET_DIRTY_LOG and KVM_CLEAR_DIRTY_LOG
ioctls. KVM_CAP_DIRTY_LOG_RING_ALLOW_BITMAP is used to advertise the new
capability.

Suggested-by: Marc Zyngier <maz@kernel.org>
Suggested-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Gavin Shan <gshan@redhat.com>
---
 Documentation/virt/kvm/api.rst | 17 ++++++++---------
 include/uapi/linux/kvm.h       |  1 +
 virt/kvm/kvm_main.c            | 24 +++++++++---------------
 3 files changed, 18 insertions(+), 24 deletions(-)