Message ID | 20190904180736.29009-1-xypron.glpk@gmx.de (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | [1/1] KVM: inject data abort if instruction cannot be decoded | expand |
That's interesting; we also met this issue yesterday :) HG On 2019/9/5 2:07, Heinrich Schuchardt wrote: > If an application tries to access memory that is not mapped, an error > ENOSYS, "load/store instruction decoding not implemented" may occur. > QEMU will hang with a register dump. > > Instead create a data abort that can be handled gracefully by the > application running in the virtual environment. > > Now the virtual machine can react to the event in the most appropriate > way - by recovering, by writing an informative log, or by rebooting. > > Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de> > --- > virt/kvm/arm/mmio.c | 4 ++-- > 1 file changed, 2 insertions(+), 2 deletions(-) > > diff --git a/virt/kvm/arm/mmio.c b/virt/kvm/arm/mmio.c > index a8a6a0c883f1..0cbed7d6a0f4 100644 > --- a/virt/kvm/arm/mmio.c > +++ b/virt/kvm/arm/mmio.c > @@ -161,8 +161,8 @@ int io_mem_abort(struct kvm_vcpu *vcpu, struct kvm_run *run, > if (ret) > return ret; > } else { > - kvm_err("load/store instruction decoding not implemented\n"); > - return -ENOSYS; > + kvm_inject_dabt(vcpu, kvm_vcpu_get_hfar(vcpu)); > + return 1; > } > > rt = vcpu->arch.mmio_decode.rt; > -- > 2.23.0.rc1 > > > _______________________________________________ > linux-arm-kernel mailing list > linux-arm-kernel@lists.infradead.org > http://lists.infradead.org/mailman/listinfo/linux-arm-kernel > >
[Please use my kernel.org address. My arm.com address will disappear shortly] On Wed, 04 Sep 2019 19:07:36 +0100, Heinrich Schuchardt <xypron.glpk@gmx.de> wrote: > > If an application tries to access memory that is not mapped, an error > ENOSYS, "load/store instruction decoding not implemented" may occur. > QEMU will hang with a register dump. > > Instead create a data abort that can be handled gracefully by the > application running in the virtual environment. > > Now the virtual machine can react to the event in the most appropriate > way - by recovering, by writing an informative log, or by rebooting. > > Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de> > --- > virt/kvm/arm/mmio.c | 4 ++-- > 1 file changed, 2 insertions(+), 2 deletions(-) > > diff --git a/virt/kvm/arm/mmio.c b/virt/kvm/arm/mmio.c > index a8a6a0c883f1..0cbed7d6a0f4 100644 > --- a/virt/kvm/arm/mmio.c > +++ b/virt/kvm/arm/mmio.c > @@ -161,8 +161,8 @@ int io_mem_abort(struct kvm_vcpu *vcpu, struct kvm_run *run, > if (ret) > return ret; > } else { > - kvm_err("load/store instruction decoding not implemented\n"); > - return -ENOSYS; > + kvm_inject_dabt(vcpu, kvm_vcpu_get_hfar(vcpu)); > + return 1; How can you tell that the access would fault? You have no idea at that stage (the kernel doesn't know about the MMIO ranges that userspace handles). All you know is that you're faced with a memory access that you cannot emulate in the kernel. Injecting a data abort at that stage is not something that the architecture allows. If you want to address this, consider forwarding the access to userspace. You'll only need an instruction decoder (supporting T1, T2, A32 and A64) and a S1 page table walker (one per page table format, all three of them) to emulate the access (having taken care of stopping all the other vcpus to make sure there is no concurrent modification of the page tables). You'll then be able to return the result of the access back to the kernel. Of course, the best thing would be to actually fix the guest so that it doesn't use non-emulatable MMIO accesses. In general, that the sign of a bug in low-level accessors. M.
On Thu, 5 Sep 2019 at 09:04, Marc Zyngier <maz@kernel.org> wrote: > How can you tell that the access would fault? You have no idea at that > stage (the kernel doesn't know about the MMIO ranges that userspace > handles). All you know is that you're faced with a memory access that > you cannot emulate in the kernel. Injecting a data abort at that stage > is not something that the architecture allows. To be fair, locking up the whole CPU (which is effectively what the kvm_err/ENOSYS is going to do to the VM) isn't something the architecture allows either :-) > Of course, the best thing would be to actually fix the guest so that > it doesn't use non-emulatable MMIO accesses. In general, that the sign > of a bug in low-level accessors. This is true, but the problem is that barfing out to userspace makes it harder to debug the guest because it means that the VM is immediately destroyed, whereas AIUI if we inject some kind of exception then (assuming you're set up to do kernel-debug via gdbstub) you can actually examine the offending guest code with a debugger because at least your VM is still around to inspect... thanks -- PMM
On Thu, Sep 05, 2019 at 09:16:54AM +0100, Peter Maydell wrote: > On Thu, 5 Sep 2019 at 09:04, Marc Zyngier <maz@kernel.org> wrote: > > How can you tell that the access would fault? You have no idea at that > > stage (the kernel doesn't know about the MMIO ranges that userspace > > handles). All you know is that you're faced with a memory access that > > you cannot emulate in the kernel. Injecting a data abort at that stage > > is not something that the architecture allows. > > To be fair, locking up the whole CPU (which is effectively > what the kvm_err/ENOSYS is going to do to the VM) isn't > something the architecture allows either :-) > > > Of course, the best thing would be to actually fix the guest so that > > it doesn't use non-emulatable MMIO accesses. In general, that the sign > > of a bug in low-level accessors. > > This is true, but the problem is that barfing out to userspace > makes it harder to debug the guest because it means that > the VM is immediately destroyed, whereas AIUI if we > inject some kind of exception then (assuming you're set up > to do kernel-debug via gdbstub) you can actually examine > the offending guest code with a debugger because at least > your VM is still around to inspect... > Is it really going to be easier to debug a guest that sees behavior which may not be architecturally correct? For example, seeing a data abort on an access to an MMIO region because the guest used a strange instruction? I appreaciate that the current way we handle this is confusing and has led many people down a rabbit hole, so we should do better. Would a better approach not be to return to userspace saying, "we can't handle this in the kernel, you decide", without printing the dubious kernel error message. Then user space could suspend the VM and print a lenghty explanation of all the possible problems there could be, or re-inject something back into the guest, or whatever, for a particular environment. Thoughts? Thanks, Christoffer
On 9/5/19 10:03 AM, Marc Zyngier wrote: > [Please use my kernel.org address. My arm.com address will disappear shortly] > > On Wed, 04 Sep 2019 19:07:36 +0100, > Heinrich Schuchardt <xypron.glpk@gmx.de> wrote: >> >> If an application tries to access memory that is not mapped, an error >> ENOSYS, "load/store instruction decoding not implemented" may occur. >> QEMU will hang with a register dump. >> >> Instead create a data abort that can be handled gracefully by the >> application running in the virtual environment. >> >> Now the virtual machine can react to the event in the most appropriate >> way - by recovering, by writing an informative log, or by rebooting. >> >> Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de> >> --- >> virt/kvm/arm/mmio.c | 4 ++-- >> 1 file changed, 2 insertions(+), 2 deletions(-) >> >> diff --git a/virt/kvm/arm/mmio.c b/virt/kvm/arm/mmio.c >> index a8a6a0c883f1..0cbed7d6a0f4 100644 >> --- a/virt/kvm/arm/mmio.c >> +++ b/virt/kvm/arm/mmio.c >> @@ -161,8 +161,8 @@ int io_mem_abort(struct kvm_vcpu *vcpu, struct kvm_run *run, >> if (ret) >> return ret; >> } else { >> - kvm_err("load/store instruction decoding not implemented\n"); >> - return -ENOSYS; >> + kvm_inject_dabt(vcpu, kvm_vcpu_get_hfar(vcpu)); >> + return 1; > > How can you tell that the access would fault? You have no idea at that > stage (the kernel doesn't know about the MMIO ranges that userspace > handles). All you know is that you're faced with a memory access that > you cannot emulate in the kernel. Injecting a data abort at that stage > is not something that the architecture allows. > > If you want to address this, consider forwarding the access to > userspace. You'll only need an instruction decoder (supporting T1, T2, > A32 and A64) and a S1 page table walker (one per page table format, > all three of them) to emulate the access (having taken care of > stopping all the other vcpus to make sure there is no concurrent > modification of the page tables). You'll then be able to return the > result of the access back to the kernel. If I understand you right, the problem has to be fixed in QEMU and not in KVM. Is there an example of such a forwarding already available in QEMU? > > Of course, the best thing would be to actually fix the guest so that > it doesn't use non-emulatable MMIO accesses. In general, that the sign > of a bug in low-level accessors. My problem was to find out where in my guest (U-Boot running UEFI SCT) the problem occurred. With this patch U-Boot gave me the relative address in Shell.efi and I was able to locate what was wrong in U-Boot's UEFI implementation. Thanks for reviewing. Heinrich
On Thu, 5 Sep 2019 at 09:25, Christoffer Dall <christoffer.dall@arm.com> wrote: > > On Thu, Sep 05, 2019 at 09:16:54AM +0100, Peter Maydell wrote: > > This is true, but the problem is that barfing out to userspace > > makes it harder to debug the guest because it means that > > the VM is immediately destroyed, whereas AIUI if we > > inject some kind of exception then (assuming you're set up > > to do kernel-debug via gdbstub) you can actually examine > > the offending guest code with a debugger because at least > > your VM is still around to inspect... > > > > Is it really going to be easier to debug a guest that sees behavior > which may not be architecturally correct? For example, seeing a data > abort on an access to an MMIO region because the guest used a strange > instruction? Yeah, a data abort is not ideal. You could UNDEF the insn, which probably is more likely to result in getting control in the debugger I suppose. As for whether it's going to be easier to debug, for the user who reported this in the first place it certainly was. (Consider even a simple Linux guest not under a debugger -- if we UNDEF the insn the guest kernel will print a helpful backtrace so you can tell where the problem is; at the moment we just print a register dump from the host kernel, which is a lot less informative.) > I appreaciate that the current way we handle this is confusing and has > led many people down a rabbit hole, so we should do better. > > Would a better approach not be to return to userspace saying, "we can't > handle this in the kernel, you decide", without printing the dubious > kernel error message. Printing the message in the kernel is the best clue we give the user at the moment that they've run into this problem; I would be wary of removing it (even if we decide to also do something else). > Then user space could suspend the VM and print a > lenghty explanation of all the possible problems there could be, or > re-inject something back into the guest, or whatever, for a particular > environment. In theory I guess so. In practice that's not what userspace currently in the wild does, and injecting an exception from userspace is a bit awkward (I dunno if kvmtool does it, QEMU only needs to in really obscure circumstances and was buggy in how it tried to do it until very recently)... thanks -- PMM
On 9/5/19 10:16 AM, Peter Maydell wrote: > On Thu, 5 Sep 2019 at 09:04, Marc Zyngier <maz@kernel.org> wrote: >> How can you tell that the access would fault? You have no idea at that >> stage (the kernel doesn't know about the MMIO ranges that userspace >> handles). All you know is that you're faced with a memory access that >> you cannot emulate in the kernel. Injecting a data abort at that stage >> is not something that the architecture allows. > > To be fair, locking up the whole CPU (which is effectively > what the kvm_err/ENOSYS is going to do to the VM) isn't > something the architecture allows either :-) > >> Of course, the best thing would be to actually fix the guest so that >> it doesn't use non-emulatable MMIO accesses. In general, that the sign >> of a bug in low-level accessors. > > This is true, but the problem is that barfing out to userspace > makes it harder to debug the guest because it means that > the VM is immediately destroyed, whereas AIUI if we > inject some kind of exception then (assuming you're set up > to do kernel-debug via gdbstub) you can actually examine > the offending guest code with a debugger because at least > your VM is still around to inspect... Stopping the CPU and debugging is not what I am interested in. I want the QEMU guest to be able to react to an incorrect memory access. Imagine Apollo 11's computer not restarting when hitting an exception. They would never have reached the moon. - I think allowing an emulation guest to react to an exception, e.g. by resetting, is a necessity. In my case U-Boot as a guest creates an output like the one below when a data abort occurs: "Synchronous Abort" handler, esr 0x02000000 elr: fffffffffdeac19c lr : fffffffffdeac19c (reloc) elr: 000000007ddd719c lr : 000000007ddd719c x0 : 0000000000000000 x1 : 000000007ffbc000 x2 : 000000000000000a x3 : 000000007ffbcd80 x4 : 0000000000002800 x5 : 000000007ffbcdb0 x6 : 0000000000000001 x7 : 000000007eef8b80 x8 : 000000000000003f x9 : 0000000000000004 x10: 0000000000000001 x11: 000000000000000d x12: 0000000000000006 x13: 000000000001869f x14: 0000000047f00000 x15: 0000000000000000 x16: 000000007ff5b194 x17: 0000000000000000 x18: 0000000000000000 x19: 000000007ffbcd30 x20: 0000000000000000 x21: 000000007ffeb000 x22: 0000000000000009 x23: 000000007eef5cf0 x24: 0000000000000000 x25: 000000007ffa7806 x26: 000000007ffa7834 x27: 0000000000000024 x28: 000000007dddd040 x29: 000000007ede9990 UEFI image [0x000000007ddd7000:0x000000007ddd749f] pc=0x19c '/bug.efi' Resetting CPU ... With this information I see that the problem occurred at 0x019C from the start of the loaded binary bug.efi. Next thing is to look at the map file of bug.efi to find out in which instruction the problem occurred. After providing the dump U-Boot continues to reset the system. When U-Boot is running the EDK II SCT (a test suite for UEFI firmware), SCT will log that a restart occurred (indicating that a test failed) and continue to run the next test. Best regards Heinrich
On Thu, 05 Sep 2019 09:16:54 +0100, Peter Maydell <peter.maydell@linaro.org> wrote: > > On Thu, 5 Sep 2019 at 09:04, Marc Zyngier <maz@kernel.org> wrote: > > How can you tell that the access would fault? You have no idea at that > > stage (the kernel doesn't know about the MMIO ranges that userspace > > handles). All you know is that you're faced with a memory access that > > you cannot emulate in the kernel. Injecting a data abort at that stage > > is not something that the architecture allows. > > To be fair, locking up the whole CPU (which is effectively > what the kvm_err/ENOSYS is going to do to the VM) isn't > something the architecture allows either :-) Hey, I didn't say things were good as they are now! ;-) I'm definitely willing to change things in that area, but I also don't want anyone to start relying on things that are not specified anywhere. > > > Of course, the best thing would be to actually fix the guest so that > > it doesn't use non-emulatable MMIO accesses. In general, that the sign > > of a bug in low-level accessors. > > This is true, but the problem is that barfing out to userspace > makes it harder to debug the guest because it means that > the VM is immediately destroyed, whereas AIUI if we > inject some kind of exception then (assuming you're set up > to do kernel-debug via gdbstub) you can actually examine > the offending guest code with a debugger because at least > your VM is still around to inspect... To Christoffer's point, I find the benefit a bit dubious. Yes, you get an exception, but the instruction that caused it may be completely legal (store with post-increment, for example), leading to an even more puzzled developer (that exception should never have been delivered the first place). I'm far more in favour of dumping the state of the access in the run structure (much like we do for a MMIO access) and let userspace do something about it (such as dumping information on the console or breaking). It could even inject an exception *if* the user has asked for it. Thanks, M.
On Thu, 5 Sep 2019 at 09:52, Marc Zyngier <maz@kernel.org> wrote: > > On Thu, 05 Sep 2019 09:16:54 +0100, > Peter Maydell <peter.maydell@linaro.org> wrote: > > This is true, but the problem is that barfing out to userspace > > makes it harder to debug the guest because it means that > > the VM is immediately destroyed, whereas AIUI if we > > inject some kind of exception then (assuming you're set up > > to do kernel-debug via gdbstub) you can actually examine > > the offending guest code with a debugger because at least > > your VM is still around to inspect... > > To Christoffer's point, I find the benefit a bit dubious. Yes, you get > an exception, but the instruction that caused it may be completely > legal (store with post-increment, for example), leading to an even > more puzzled developer (that exception should never have been > delivered the first place). Right, but the combination of "host kernel prints a message about an unsupported load/store insn" and "within-guest debug dump/stack trace/etc" is much more useful than just having "host kernel prints message" and "QEMU exits"; and it requires about 3 lines of code change... > I'm far more in favour of dumping the state of the access in the run > structure (much like we do for a MMIO access) and let userspace do > something about it (such as dumping information on the console or > breaking). It could even inject an exception *if* the user has asked > for it. ...whereas this requires agreement on a kernel-userspace API, larger changes in the kernel, somebody to implement the userspace side of things, and the user to update both the kernel and QEMU. It's hard for me to see that the benefit here over the 3-line approach really outweighs the extra effort needed. In practice saying "we should do this" is saying "we're going to do nothing", based on the historical record. thanks -- PMM
On Thu, 05 Sep 2019 09:28:42 +0100, Heinrich Schuchardt <xypron.glpk@gmx.de> wrote: > > On 9/5/19 10:03 AM, Marc Zyngier wrote: > > [Please use my kernel.org address. My arm.com address will disappear shortly] > > > > On Wed, 04 Sep 2019 19:07:36 +0100, > > Heinrich Schuchardt <xypron.glpk@gmx.de> wrote: > >> > >> If an application tries to access memory that is not mapped, an error > >> ENOSYS, "load/store instruction decoding not implemented" may occur. > >> QEMU will hang with a register dump. > >> > >> Instead create a data abort that can be handled gracefully by the > >> application running in the virtual environment. > >> > >> Now the virtual machine can react to the event in the most appropriate > >> way - by recovering, by writing an informative log, or by rebooting. > >> > >> Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de> > >> --- > >> virt/kvm/arm/mmio.c | 4 ++-- > >> 1 file changed, 2 insertions(+), 2 deletions(-) > >> > >> diff --git a/virt/kvm/arm/mmio.c b/virt/kvm/arm/mmio.c > >> index a8a6a0c883f1..0cbed7d6a0f4 100644 > >> --- a/virt/kvm/arm/mmio.c > >> +++ b/virt/kvm/arm/mmio.c > >> @@ -161,8 +161,8 @@ int io_mem_abort(struct kvm_vcpu *vcpu, struct kvm_run *run, > >> if (ret) > >> return ret; > >> } else { > >> - kvm_err("load/store instruction decoding not implemented\n"); > >> - return -ENOSYS; > >> + kvm_inject_dabt(vcpu, kvm_vcpu_get_hfar(vcpu)); > >> + return 1; > > > > How can you tell that the access would fault? You have no idea at that > > stage (the kernel doesn't know about the MMIO ranges that userspace > > handles). All you know is that you're faced with a memory access that > > you cannot emulate in the kernel. Injecting a data abort at that stage > > is not something that the architecture allows. > > > > If you want to address this, consider forwarding the access to > > userspace. You'll only need an instruction decoder (supporting T1, T2, > > A32 and A64) and a S1 page table walker (one per page table format, > > all three of them) to emulate the access (having taken care of > > stopping all the other vcpus to make sure there is no concurrent > > modification of the page tables). You'll then be able to return the > > result of the access back to the kernel. > > If I understand you right, the problem has to be fixed in QEMU and not > in KVM. It is a shared responsibility. > Is there an example of such a forwarding already available in QEMU? Yes. That's how we ask userspace (QEMU) to perform a MMIO access on behalf of the guest (see how the run structure is populated and the vcpu thread returns to userspace). > > > > Of course, the best thing would be to actually fix the guest so that > > it doesn't use non-emulatable MMIO accesses. In general, that the sign > > of a bug in low-level accessors. > > My problem was to find out where in my guest (U-Boot running UEFI SCT) > the problem occurred. With this patch U-Boot gave me the relative > address in Shell.efi and I was able to locate what was wrong in U-Boot's > UEFI implementation. I understand that there is a need for more precise information. I'm just wary of adding debug features that become something that people rely on, despite being in contradiction with the architecture. I can help with the kernel side of the reporting, should someone want to tackle the userspace side of it. Thanks, M.
On Thu, 05 Sep 2019 09:56:44 +0100, Peter Maydell <peter.maydell@linaro.org> wrote: > > On Thu, 5 Sep 2019 at 09:52, Marc Zyngier <maz@kernel.org> wrote: > > > > On Thu, 05 Sep 2019 09:16:54 +0100, > > Peter Maydell <peter.maydell@linaro.org> wrote: > > > This is true, but the problem is that barfing out to userspace > > > makes it harder to debug the guest because it means that > > > the VM is immediately destroyed, whereas AIUI if we > > > inject some kind of exception then (assuming you're set up > > > to do kernel-debug via gdbstub) you can actually examine > > > the offending guest code with a debugger because at least > > > your VM is still around to inspect... > > > > To Christoffer's point, I find the benefit a bit dubious. Yes, you get > > an exception, but the instruction that caused it may be completely > > legal (store with post-increment, for example), leading to an even > > more puzzled developer (that exception should never have been > > delivered the first place). > > Right, but the combination of "host kernel prints a message > about an unsupported load/store insn" and "within-guest debug > dump/stack trace/etc" is much more useful than just having > "host kernel prints message" and "QEMU exits"; and it requires > about 3 lines of code change... Which is wrong, and creates a new behaviour that isn't specified anywhere. > > > I'm far more in favour of dumping the state of the access in the run > > structure (much like we do for a MMIO access) and let userspace do > > something about it (such as dumping information on the console or > > breaking). It could even inject an exception *if* the user has asked > > for it. > > ...whereas this requires agreement on a kernel-userspace API, > larger changes in the kernel, somebody to implement the userspace > side of things, and the user to update both the kernel and QEMU. > It's hard for me to see that the benefit here over the 3-line > approach really outweighs the extra effort needed. 3 lines that already require the host kernel to be updated, and create a legacy that we'll never be able to get rid of. > In practice saying "we should do this" is saying "we're going to do > nothing", based on the historical record. Thanks for the vote of confidence... M.
On Wed, Sep 04, 2019 at 08:07:36PM +0200, Heinrich Schuchardt wrote: > If an application tries to access memory that is not mapped, an error > ENOSYS, "load/store instruction decoding not implemented" may occur. > QEMU will hang with a register dump. > > Instead create a data abort that can be handled gracefully by the > application running in the virtual environment. > > Now the virtual machine can react to the event in the most appropriate > way - by recovering, by writing an informative log, or by rebooting. > > Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de> > --- > virt/kvm/arm/mmio.c | 4 ++-- > 1 file changed, 2 insertions(+), 2 deletions(-) > > diff --git a/virt/kvm/arm/mmio.c b/virt/kvm/arm/mmio.c > index a8a6a0c883f1..0cbed7d6a0f4 100644 > --- a/virt/kvm/arm/mmio.c > +++ b/virt/kvm/arm/mmio.c > @@ -161,8 +161,8 @@ int io_mem_abort(struct kvm_vcpu *vcpu, struct kvm_run *run, > if (ret) > return ret; > } else { > - kvm_err("load/store instruction decoding not implemented\n"); > - return -ENOSYS; > + kvm_inject_dabt(vcpu, kvm_vcpu_get_hfar(vcpu)); > + return 1; I see this more as a temporary debugging hack than something to merge. It sounds like in your case the guest environment provided good debugging information and you preferred it over debugging this from the host side. That's fine, but allowing the guest to continue running in the general case makes it much harder to track down the root cause of a problem because many guest CPU instructions may be executed after the original problem occurs. Other guest software may fail silently in weird ways. IMO it's best to fail early. Stefan
On Thu, Sep 05, 2019 at 09:56:44AM +0100, Peter Maydell wrote: > On Thu, 5 Sep 2019 at 09:52, Marc Zyngier <maz@kernel.org> wrote: > > > > On Thu, 05 Sep 2019 09:16:54 +0100, > > Peter Maydell <peter.maydell@linaro.org> wrote: > > > This is true, but the problem is that barfing out to userspace > > > makes it harder to debug the guest because it means that > > > the VM is immediately destroyed, whereas AIUI if we > > > inject some kind of exception then (assuming you're set up > > > to do kernel-debug via gdbstub) you can actually examine > > > the offending guest code with a debugger because at least > > > your VM is still around to inspect... > > > > To Christoffer's point, I find the benefit a bit dubious. Yes, you get > > an exception, but the instruction that caused it may be completely > > legal (store with post-increment, for example), leading to an even > > more puzzled developer (that exception should never have been > > delivered the first place). > > Right, but the combination of "host kernel prints a message > about an unsupported load/store insn" and "within-guest debug > dump/stack trace/etc" is much more useful than just having > "host kernel prints message" and "QEMU exits"; and it requires > about 3 lines of code change... > > > I'm far more in favour of dumping the state of the access in the run > > structure (much like we do for a MMIO access) and let userspace do > > something about it (such as dumping information on the console or > > breaking). It could even inject an exception *if* the user has asked > > for it. > > ...whereas this requires agreement on a kernel-userspace API, > larger changes in the kernel, somebody to implement the userspace > side of things, and the user to update both the kernel and QEMU. > It's hard for me to see that the benefit here over the 3-line > approach really outweighs the extra effort needed. In practice > saying "we should do this" is saying "we're going to do nothing", > based on the historical record. > How about something like the following (completely untested, liable for ABI discussions etc. etc., but for illustration purposes). I think it raises the question (and likely many other) of whether we can break the existing 'ABI' and change behavior for missing ISV retrospectively for legacy user space when the issue has occurred? Someone might have written code that reacts to the -ENOSYS, so I've taken the conservative approach for this for the time being. diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h index 8a37c8e89777..19a92c49039c 100644 --- a/arch/arm/include/asm/kvm_host.h +++ b/arch/arm/include/asm/kvm_host.h @@ -76,6 +76,14 @@ struct kvm_arch { /* Mandated version of PSCI */ u32 psci_version; + + /* + * If we encounter a data abort without valid instruction syndrome + * information, report this to user space. User space can (and + * should) opt in to this feature if KVM_CAP_ARM_NISV_TO_USER is + * supported. + */ + bool return_nisv_io_abort_to_user; }; #define KVM_NR_MEM_OBJS 40 diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h index f656169db8c3..019bc560edc1 100644 --- a/arch/arm64/include/asm/kvm_host.h +++ b/arch/arm64/include/asm/kvm_host.h @@ -83,6 +83,14 @@ struct kvm_arch { /* Mandated version of PSCI */ u32 psci_version; + + /* + * If we encounter a data abort without valid instruction syndrome + * information, report this to user space. User space can (and + * should) opt in to this feature if KVM_CAP_ARM_NISV_TO_USER is + * supported. + */ + bool return_nisv_io_abort_to_user; }; #define KVM_NR_MEM_OBJS 40 diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h index 5e3f12d5359e..a4dd004d0db9 100644 --- a/include/uapi/linux/kvm.h +++ b/include/uapi/linux/kvm.h @@ -235,6 +235,7 @@ struct kvm_hyperv_exit { #define KVM_EXIT_S390_STSI 25 #define KVM_EXIT_IOAPIC_EOI 26 #define KVM_EXIT_HYPERV 27 +#define KVM_EXIT_ARM_NISV 28 /* For KVM_EXIT_INTERNAL_ERROR */ /* Emulate instruction failed. */ @@ -996,6 +997,7 @@ struct kvm_ppc_resize_hpt { #define KVM_CAP_ARM_PTRAUTH_ADDRESS 171 #define KVM_CAP_ARM_PTRAUTH_GENERIC 172 #define KVM_CAP_PMU_EVENT_FILTER 173 +#define KVM_CAP_ARM_NISV_TO_USER 174 #ifdef KVM_CAP_IRQ_ROUTING diff --git a/virt/kvm/arm/arm.c b/virt/kvm/arm/arm.c index 35a069815baf..2ce94bd9d4a9 100644 --- a/virt/kvm/arm/arm.c +++ b/virt/kvm/arm/arm.c @@ -98,6 +98,26 @@ int kvm_arch_check_processor_compat(void) return 0; } +int kvm_vm_ioctl_enable_cap(struct kvm *kvm, + struct kvm_enable_cap *cap) +{ + int r; + + if (cap->flags) + return -EINVAL; + + switch (cap->cap) { + case KVM_CAP_ARM_NISV_TO_USER: + r = 0; + kvm->arch.return_nisv_io_abort_to_user = true; + break; + default: + r = -EINVAL; + break; + } + + return r; +} /** * kvm_arch_init_vm - initializes a VM data structure @@ -196,6 +216,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext) case KVM_CAP_MP_STATE: case KVM_CAP_IMMEDIATE_EXIT: case KVM_CAP_VCPU_EVENTS: + case KVM_CAP_ARM_NISV_TO_USER: r = 1; break; case KVM_CAP_ARM_SET_DEVICE_ADDR: @@ -673,6 +694,8 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run) ret = kvm_handle_mmio_return(vcpu, vcpu->run); if (ret) return ret; + } else if (run->exit_reason == KVM_EXIT_ARM_NISV) { + kvm_inject_undefined(vcpu); } if (run->immediate_exit) diff --git a/virt/kvm/arm/mmio.c b/virt/kvm/arm/mmio.c index 6af5c91337f2..62e6ef47a6de 100644 --- a/virt/kvm/arm/mmio.c +++ b/virt/kvm/arm/mmio.c @@ -167,8 +167,15 @@ int io_mem_abort(struct kvm_vcpu *vcpu, struct kvm_run *run, if (ret) return ret; } else { - kvm_err("load/store instruction decoding not implemented\n"); - return -ENOSYS; + if (vcpu->kvm->arch.return_nisv_io_abort_to_user) { + run->exit_reason = KVM_EXIT_ARM_NISV; + run->mmio.phys_addr = fault_ipa; + vcpu->stat.mmio_exit_user++; + return 0; + } else { + kvm_info("encountered data abort without syndrome info\n"); + return -ENOSYS; + } } rt = vcpu->arch.mmio_decode.rt; Thanks, Christoffer
On Thu, Sep 05, 2019 at 10:20:39AM +0100, Stefan Hajnoczi wrote: > On Wed, Sep 04, 2019 at 08:07:36PM +0200, Heinrich Schuchardt wrote: > > If an application tries to access memory that is not mapped, an error > > ENOSYS, "load/store instruction decoding not implemented" may occur. > > QEMU will hang with a register dump. > > > > Instead create a data abort that can be handled gracefully by the > > application running in the virtual environment. > > > > Now the virtual machine can react to the event in the most appropriate > > way - by recovering, by writing an informative log, or by rebooting. > > > > Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de> > > --- > > virt/kvm/arm/mmio.c | 4 ++-- > > 1 file changed, 2 insertions(+), 2 deletions(-) > > > > diff --git a/virt/kvm/arm/mmio.c b/virt/kvm/arm/mmio.c > > index a8a6a0c883f1..0cbed7d6a0f4 100644 > > --- a/virt/kvm/arm/mmio.c > > +++ b/virt/kvm/arm/mmio.c > > @@ -161,8 +161,8 @@ int io_mem_abort(struct kvm_vcpu *vcpu, struct kvm_run *run, > > if (ret) > > return ret; > > } else { > > - kvm_err("load/store instruction decoding not implemented\n"); > > - return -ENOSYS; > > + kvm_inject_dabt(vcpu, kvm_vcpu_get_hfar(vcpu)); > > + return 1; > > I see this more as a temporary debugging hack than something to merge. > > It sounds like in your case the guest environment provided good > debugging information and you preferred it over debugging this from the > host side. That's fine, but allowing the guest to continue running in > the general case makes it much harder to track down the root cause of a > problem because many guest CPU instructions may be executed after the > original problem occurs. Other guest software may fail silently in > weird ways. IMO it's best to fail early. The current error message is quite limited in its usefulness - mostly you have to be able to google the message and hope to hit a previous report which explains the problem, or find someone on IRC who remembers the problem, etc. Could we put a text doc in the kernel tree explaining the problem in enough detail that people can identify their next steps to resolve it, and then make this error message tell people to read that text doc. Regards, Daniel
On 9/5/19 11:20 AM, Stefan Hajnoczi wrote: > On Wed, Sep 04, 2019 at 08:07:36PM +0200, Heinrich Schuchardt wrote: >> If an application tries to access memory that is not mapped, an error >> ENOSYS, "load/store instruction decoding not implemented" may occur. >> QEMU will hang with a register dump. >> >> Instead create a data abort that can be handled gracefully by the >> application running in the virtual environment. >> >> Now the virtual machine can react to the event in the most appropriate >> way - by recovering, by writing an informative log, or by rebooting. >> >> Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de> >> --- >> virt/kvm/arm/mmio.c | 4 ++-- >> 1 file changed, 2 insertions(+), 2 deletions(-) >> >> diff --git a/virt/kvm/arm/mmio.c b/virt/kvm/arm/mmio.c >> index a8a6a0c883f1..0cbed7d6a0f4 100644 >> --- a/virt/kvm/arm/mmio.c >> +++ b/virt/kvm/arm/mmio.c >> @@ -161,8 +161,8 @@ int io_mem_abort(struct kvm_vcpu *vcpu, struct kvm_run *run, >> if (ret) >> return ret; >> } else { >> - kvm_err("load/store instruction decoding not implemented\n"); >> - return -ENOSYS; >> + kvm_inject_dabt(vcpu, kvm_vcpu_get_hfar(vcpu)); >> + return 1; > > I see this more as a temporary debugging hack than something to merge. > > It sounds like in your case the guest environment provided good > debugging information and you preferred it over debugging this from the > host side. That's fine, but allowing the guest to continue running in > the general case makes it much harder to track down the root cause of a > problem because many guest CPU instructions may be executed after the > original problem occurs. Other guest software may fail silently in > weird ways. IMO it's best to fail early. > > Stefan > As virtual machine are ubiquitous, expect also mission critical system to run on them. At development time halting a machine may be a good idea. In production this is often the worst solution. Rebooting may be essential for survival. For an anecdotal example see: https://www.hq.nasa.gov/alsj/a11/a11.1201-pa.html I am convinced that leaving it to the guest to decide how to react is the best choice. Best regards Heinrich
Hi Heinrich, On Thu, Sep 05, 2019 at 02:01:36PM +0200, Heinrich Schuchardt wrote: > On 9/5/19 11:20 AM, Stefan Hajnoczi wrote: > > On Wed, Sep 04, 2019 at 08:07:36PM +0200, Heinrich Schuchardt wrote: > > > If an application tries to access memory that is not mapped, an error > > > ENOSYS, "load/store instruction decoding not implemented" may occur. > > > QEMU will hang with a register dump. > > > > > > Instead create a data abort that can be handled gracefully by the > > > application running in the virtual environment. > > > > > > Now the virtual machine can react to the event in the most appropriate > > > way - by recovering, by writing an informative log, or by rebooting. > > > > > > Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de> > > > --- > > > virt/kvm/arm/mmio.c | 4 ++-- > > > 1 file changed, 2 insertions(+), 2 deletions(-) > > > > > > diff --git a/virt/kvm/arm/mmio.c b/virt/kvm/arm/mmio.c > > > index a8a6a0c883f1..0cbed7d6a0f4 100644 > > > --- a/virt/kvm/arm/mmio.c > > > +++ b/virt/kvm/arm/mmio.c > > > @@ -161,8 +161,8 @@ int io_mem_abort(struct kvm_vcpu *vcpu, struct kvm_run *run, > > > if (ret) > > > return ret; > > > } else { > > > - kvm_err("load/store instruction decoding not implemented\n"); > > > - return -ENOSYS; > > > + kvm_inject_dabt(vcpu, kvm_vcpu_get_hfar(vcpu)); > > > + return 1; > > > > I see this more as a temporary debugging hack than something to merge. > > > > It sounds like in your case the guest environment provided good > > debugging information and you preferred it over debugging this from the > > host side. That's fine, but allowing the guest to continue running in > > the general case makes it much harder to track down the root cause of a > > problem because many guest CPU instructions may be executed after the > > original problem occurs. Other guest software may fail silently in > > weird ways. IMO it's best to fail early. > > > > Stefan > > > > As virtual machine are ubiquitous, expect also mission critical system > to run on them. At development time halting a machine may be a good > idea. In production this is often the worst solution. Rebooting may be > essential for survival. > > For an anecdotal example see: > https://www.hq.nasa.gov/alsj/a11/a11.1201-pa.html > > I am convinced that leaving it to the guest to decide how to react is > the best choice. > Maintaining strong adherence to the architecture is equally important, and I'm sure we can find anecdotes to support how not doing the expected, can also lead to disastrous outcomes. Have you had a look at the suggested patch I sent? The idea is that we can preserve existing legacy ABI, allow for a better debugging experience, allow userspace to do emulation if it so wishes, and provide a better error message if userspace doesn't handle this properly. One thing we could change from my proposed patch would be to have KVM inject the access as an external abort if the target address also doesn't hit an MMIO device, which is by far the common scenario reported here on the list. Hopefully, a mission critical deployment based on KVM/Arm (scary as that sounds), would use a recent and patched VMM (QEMU) that either causes the external abort, or reboots the VM, as per the configuration of the particular system in question. Thanks, Christoffer
On 05/09/2019 10:22, Christoffer Dall wrote: > On Thu, Sep 05, 2019 at 09:56:44AM +0100, Peter Maydell wrote: >> On Thu, 5 Sep 2019 at 09:52, Marc Zyngier <maz@kernel.org> wrote: >>> >>> On Thu, 05 Sep 2019 09:16:54 +0100, >>> Peter Maydell <peter.maydell@linaro.org> wrote: >>>> This is true, but the problem is that barfing out to userspace >>>> makes it harder to debug the guest because it means that >>>> the VM is immediately destroyed, whereas AIUI if we >>>> inject some kind of exception then (assuming you're set up >>>> to do kernel-debug via gdbstub) you can actually examine >>>> the offending guest code with a debugger because at least >>>> your VM is still around to inspect... >>> >>> To Christoffer's point, I find the benefit a bit dubious. Yes, you get >>> an exception, but the instruction that caused it may be completely >>> legal (store with post-increment, for example), leading to an even >>> more puzzled developer (that exception should never have been >>> delivered the first place). >> >> Right, but the combination of "host kernel prints a message >> about an unsupported load/store insn" and "within-guest debug >> dump/stack trace/etc" is much more useful than just having >> "host kernel prints message" and "QEMU exits"; and it requires >> about 3 lines of code change... >> >>> I'm far more in favour of dumping the state of the access in the run >>> structure (much like we do for a MMIO access) and let userspace do >>> something about it (such as dumping information on the console or >>> breaking). It could even inject an exception *if* the user has asked >>> for it. >> >> ...whereas this requires agreement on a kernel-userspace API, >> larger changes in the kernel, somebody to implement the userspace >> side of things, and the user to update both the kernel and QEMU. >> It's hard for me to see that the benefit here over the 3-line >> approach really outweighs the extra effort needed. In practice >> saying "we should do this" is saying "we're going to do nothing", >> based on the historical record. >> > > How about something like the following (completely untested, liable for > ABI discussions etc. etc., but for illustration purposes). > > I think it raises the question (and likely many other) of whether we can > break the existing 'ABI' and change behavior for missing ISV > retrospectively for legacy user space when the issue has occurred? > > Someone might have written code that reacts to the -ENOSYS, so I've > taken the conservative approach for this for the time being. > > > diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h > index 8a37c8e89777..19a92c49039c 100644 > --- a/arch/arm/include/asm/kvm_host.h > +++ b/arch/arm/include/asm/kvm_host.h > @@ -76,6 +76,14 @@ struct kvm_arch { > > /* Mandated version of PSCI */ > u32 psci_version; > + > + /* > + * If we encounter a data abort without valid instruction syndrome > + * information, report this to user space. User space can (and > + * should) opt in to this feature if KVM_CAP_ARM_NISV_TO_USER is > + * supported. > + */ > + bool return_nisv_io_abort_to_user; > }; > > #define KVM_NR_MEM_OBJS 40 > diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h > index f656169db8c3..019bc560edc1 100644 > --- a/arch/arm64/include/asm/kvm_host.h > +++ b/arch/arm64/include/asm/kvm_host.h > @@ -83,6 +83,14 @@ struct kvm_arch { > > /* Mandated version of PSCI */ > u32 psci_version; > + > + /* > + * If we encounter a data abort without valid instruction syndrome > + * information, report this to user space. User space can (and > + * should) opt in to this feature if KVM_CAP_ARM_NISV_TO_USER is > + * supported. > + */ > + bool return_nisv_io_abort_to_user; > }; > > #define KVM_NR_MEM_OBJS 40 > diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h > index 5e3f12d5359e..a4dd004d0db9 100644 > --- a/include/uapi/linux/kvm.h > +++ b/include/uapi/linux/kvm.h > @@ -235,6 +235,7 @@ struct kvm_hyperv_exit { > #define KVM_EXIT_S390_STSI 25 > #define KVM_EXIT_IOAPIC_EOI 26 > #define KVM_EXIT_HYPERV 27 > +#define KVM_EXIT_ARM_NISV 28 > > /* For KVM_EXIT_INTERNAL_ERROR */ > /* Emulate instruction failed. */ > @@ -996,6 +997,7 @@ struct kvm_ppc_resize_hpt { > #define KVM_CAP_ARM_PTRAUTH_ADDRESS 171 > #define KVM_CAP_ARM_PTRAUTH_GENERIC 172 > #define KVM_CAP_PMU_EVENT_FILTER 173 > +#define KVM_CAP_ARM_NISV_TO_USER 174 > > #ifdef KVM_CAP_IRQ_ROUTING > > diff --git a/virt/kvm/arm/arm.c b/virt/kvm/arm/arm.c > index 35a069815baf..2ce94bd9d4a9 100644 > --- a/virt/kvm/arm/arm.c > +++ b/virt/kvm/arm/arm.c > @@ -98,6 +98,26 @@ int kvm_arch_check_processor_compat(void) > return 0; > } > > +int kvm_vm_ioctl_enable_cap(struct kvm *kvm, > + struct kvm_enable_cap *cap) > +{ > + int r; > + > + if (cap->flags) > + return -EINVAL; > + > + switch (cap->cap) { > + case KVM_CAP_ARM_NISV_TO_USER: > + r = 0; > + kvm->arch.return_nisv_io_abort_to_user = true; > + break; > + default: > + r = -EINVAL; > + break; > + } > + > + return r; > +} > > /** > * kvm_arch_init_vm - initializes a VM data structure > @@ -196,6 +216,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext) > case KVM_CAP_MP_STATE: > case KVM_CAP_IMMEDIATE_EXIT: > case KVM_CAP_VCPU_EVENTS: > + case KVM_CAP_ARM_NISV_TO_USER: > r = 1; > break; > case KVM_CAP_ARM_SET_DEVICE_ADDR: > @@ -673,6 +694,8 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run) > ret = kvm_handle_mmio_return(vcpu, vcpu->run); > if (ret) > return ret; > + } else if (run->exit_reason == KVM_EXIT_ARM_NISV) { > + kvm_inject_undefined(vcpu); Just to make sure I understand: Is the expectation here that userspace could clear the exit reason if it managed to handle the exit? And otherwise we'd inject an UNDEF on reentry? > } > > if (run->immediate_exit) > diff --git a/virt/kvm/arm/mmio.c b/virt/kvm/arm/mmio.c > index 6af5c91337f2..62e6ef47a6de 100644 > --- a/virt/kvm/arm/mmio.c > +++ b/virt/kvm/arm/mmio.c > @@ -167,8 +167,15 @@ int io_mem_abort(struct kvm_vcpu *vcpu, struct kvm_run *run, > if (ret) > return ret; > } else { > - kvm_err("load/store instruction decoding not implemented\n"); > - return -ENOSYS; > + if (vcpu->kvm->arch.return_nisv_io_abort_to_user) { > + run->exit_reason = KVM_EXIT_ARM_NISV; > + run->mmio.phys_addr = fault_ipa; We could also record whether that's a read or a write (WnR should still be valid). Actually, we could store a sanitized version of the ESR. > + vcpu->stat.mmio_exit_user++; > + return 0; > + } else { > + kvm_info("encountered data abort without syndrome info\n"); My only issue with this is that the previous message has been sort of documented... Thanks, M.
On 9/5/19 11:22 AM, Christoffer Dall wrote: > On Thu, Sep 05, 2019 at 09:56:44AM +0100, Peter Maydell wrote: >> On Thu, 5 Sep 2019 at 09:52, Marc Zyngier <maz@kernel.org> wrote: >>> >>> On Thu, 05 Sep 2019 09:16:54 +0100, >>> Peter Maydell <peter.maydell@linaro.org> wrote: >>>> This is true, but the problem is that barfing out to userspace >>>> makes it harder to debug the guest because it means that >>>> the VM is immediately destroyed, whereas AIUI if we >>>> inject some kind of exception then (assuming you're set up >>>> to do kernel-debug via gdbstub) you can actually examine >>>> the offending guest code with a debugger because at least >>>> your VM is still around to inspect... >>> >>> To Christoffer's point, I find the benefit a bit dubious. Yes, you get >>> an exception, but the instruction that caused it may be completely >>> legal (store with post-increment, for example), leading to an even >>> more puzzled developer (that exception should never have been >>> delivered the first place). >> >> Right, but the combination of "host kernel prints a message >> about an unsupported load/store insn" and "within-guest debug >> dump/stack trace/etc" is much more useful than just having >> "host kernel prints message" and "QEMU exits"; and it requires >> about 3 lines of code change... >> >>> I'm far more in favour of dumping the state of the access in the run >>> structure (much like we do for a MMIO access) and let userspace do >>> something about it (such as dumping information on the console or >>> breaking). It could even inject an exception *if* the user has asked >>> for it. >> >> ...whereas this requires agreement on a kernel-userspace API, >> larger changes in the kernel, somebody to implement the userspace >> side of things, and the user to update both the kernel and QEMU. >> It's hard for me to see that the benefit here over the 3-line >> approach really outweighs the extra effort needed. In practice >> saying "we should do this" is saying "we're going to do nothing", >> based on the historical record. >> > > How about something like the following (completely untested, liable for > ABI discussions etc. etc., but for illustration purposes). > > I think it raises the question (and likely many other) of whether we can > break the existing 'ABI' and change behavior for missing ISV > retrospectively for legacy user space when the issue has occurred? > > Someone might have written code that reacts to the -ENOSYS, so I've > taken the conservative approach for this for the time being. > > > diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h > index 8a37c8e89777..19a92c49039c 100644 > --- a/arch/arm/include/asm/kvm_host.h > +++ b/arch/arm/include/asm/kvm_host.h > @@ -76,6 +76,14 @@ struct kvm_arch { > > /* Mandated version of PSCI */ > u32 psci_version; > + > + /* > + * If we encounter a data abort without valid instruction syndrome > + * information, report this to user space. User space can (and > + * should) opt in to this feature if KVM_CAP_ARM_NISV_TO_USER is > + * supported. > + */ > + bool return_nisv_io_abort_to_user; > }; > > #define KVM_NR_MEM_OBJS 40 > diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h > index f656169db8c3..019bc560edc1 100644 > --- a/arch/arm64/include/asm/kvm_host.h > +++ b/arch/arm64/include/asm/kvm_host.h > @@ -83,6 +83,14 @@ struct kvm_arch { > > /* Mandated version of PSCI */ > u32 psci_version; > + > + /* > + * If we encounter a data abort without valid instruction syndrome > + * information, report this to user space. User space can (and > + * should) opt in to this feature if KVM_CAP_ARM_NISV_TO_USER is > + * supported. > + */ > + bool return_nisv_io_abort_to_user; How about 32bit ARM? > }; > > #define KVM_NR_MEM_OBJS 40 > diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h > index 5e3f12d5359e..a4dd004d0db9 100644 > --- a/include/uapi/linux/kvm.h > +++ b/include/uapi/linux/kvm.h > @@ -235,6 +235,7 @@ struct kvm_hyperv_exit { > #define KVM_EXIT_S390_STSI 25 > #define KVM_EXIT_IOAPIC_EOI 26 > #define KVM_EXIT_HYPERV 27 > +#define KVM_EXIT_ARM_NISV 28 > > /* For KVM_EXIT_INTERNAL_ERROR */ > /* Emulate instruction failed. */ > @@ -996,6 +997,7 @@ struct kvm_ppc_resize_hpt { > #define KVM_CAP_ARM_PTRAUTH_ADDRESS 171 > #define KVM_CAP_ARM_PTRAUTH_GENERIC 172 > #define KVM_CAP_PMU_EVENT_FILTER 173 > +#define KVM_CAP_ARM_NISV_TO_USER 174 > > #ifdef KVM_CAP_IRQ_ROUTING > > diff --git a/virt/kvm/arm/arm.c b/virt/kvm/arm/arm.c > index 35a069815baf..2ce94bd9d4a9 100644 > --- a/virt/kvm/arm/arm.c > +++ b/virt/kvm/arm/arm.c > @@ -98,6 +98,26 @@ int kvm_arch_check_processor_compat(void) > return 0; > } > > +int kvm_vm_ioctl_enable_cap(struct kvm *kvm, > + struct kvm_enable_cap *cap) This overrides the weak implementation in virt/kvm/kvm_main.c. OK. > +{ > + int r; > + > + if (cap->flags) > + return -EINVAL; > + > + switch (cap->cap) { > + case KVM_CAP_ARM_NISV_TO_USER: > + r = 0; > + kvm->arch.return_nisv_io_abort_to_user = true; > + break; > + default: > + r = -EINVAL; > + break; > + } > + > + return r; > +} > > /** > * kvm_arch_init_vm - initializes a VM data structure > @@ -196,6 +216,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext) > case KVM_CAP_MP_STATE: > case KVM_CAP_IMMEDIATE_EXIT: > case KVM_CAP_VCPU_EVENTS: > + case KVM_CAP_ARM_NISV_TO_USER: > r = 1; > break; > case KVM_CAP_ARM_SET_DEVICE_ADDR: > @@ -673,6 +694,8 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run) > ret = kvm_handle_mmio_return(vcpu, vcpu->run); > if (ret) > return ret; > + } else if (run->exit_reason == KVM_EXIT_ARM_NISV) { > + kvm_inject_undefined(vcpu); So QEMU can try to enable the feature via IOCTL. And here you would raise the 'undefined instruction' exception which QEMU will have to handle in the loop calling KVM either by trying to make sense of the instruction or by passing it on to the guest. Conceptually this looks good to me and meets the requirements of my application. Thanks a lot for your suggestion. Regards Heinrich > } > > if (run->immediate_exit) > diff --git a/virt/kvm/arm/mmio.c b/virt/kvm/arm/mmio.c > index 6af5c91337f2..62e6ef47a6de 100644 > --- a/virt/kvm/arm/mmio.c > +++ b/virt/kvm/arm/mmio.c > @@ -167,8 +167,15 @@ int io_mem_abort(struct kvm_vcpu *vcpu, struct kvm_run *run, > if (ret) > return ret; > } else { > - kvm_err("load/store instruction decoding not implemented\n"); > - return -ENOSYS; > + if (vcpu->kvm->arch.return_nisv_io_abort_to_user) { > + run->exit_reason = KVM_EXIT_ARM_NISV; > + run->mmio.phys_addr = fault_ipa; > + vcpu->stat.mmio_exit_user++; > + return 0; > + } else { > + kvm_info("encountered data abort without syndrome info\n"); > + return -ENOSYS; > + } > } > > rt = vcpu->arch.mmio_decode.rt; > > > Thanks, > > Christoffer >
On Thu, Sep 05, 2019 at 03:25:47PM +0200, Heinrich Schuchardt wrote: > On 9/5/19 11:22 AM, Christoffer Dall wrote: > > On Thu, Sep 05, 2019 at 09:56:44AM +0100, Peter Maydell wrote: > > > On Thu, 5 Sep 2019 at 09:52, Marc Zyngier <maz@kernel.org> wrote: > > > > > > > > On Thu, 05 Sep 2019 09:16:54 +0100, > > > > Peter Maydell <peter.maydell@linaro.org> wrote: > > > > > This is true, but the problem is that barfing out to userspace > > > > > makes it harder to debug the guest because it means that > > > > > the VM is immediately destroyed, whereas AIUI if we > > > > > inject some kind of exception then (assuming you're set up > > > > > to do kernel-debug via gdbstub) you can actually examine > > > > > the offending guest code with a debugger because at least > > > > > your VM is still around to inspect... > > > > > > > > To Christoffer's point, I find the benefit a bit dubious. Yes, you get > > > > an exception, but the instruction that caused it may be completely > > > > legal (store with post-increment, for example), leading to an even > > > > more puzzled developer (that exception should never have been > > > > delivered the first place). > > > > > > Right, but the combination of "host kernel prints a message > > > about an unsupported load/store insn" and "within-guest debug > > > dump/stack trace/etc" is much more useful than just having > > > "host kernel prints message" and "QEMU exits"; and it requires > > > about 3 lines of code change... > > > > > > > I'm far more in favour of dumping the state of the access in the run > > > > structure (much like we do for a MMIO access) and let userspace do > > > > something about it (such as dumping information on the console or > > > > breaking). It could even inject an exception *if* the user has asked > > > > for it. > > > > > > ...whereas this requires agreement on a kernel-userspace API, > > > larger changes in the kernel, somebody to implement the userspace > > > side of things, and the user to update both the kernel and QEMU. > > > It's hard for me to see that the benefit here over the 3-line > > > approach really outweighs the extra effort needed. In practice > > > saying "we should do this" is saying "we're going to do nothing", > > > based on the historical record. > > > > > > > How about something like the following (completely untested, liable for > > ABI discussions etc. etc., but for illustration purposes). > > > > I think it raises the question (and likely many other) of whether we can > > break the existing 'ABI' and change behavior for missing ISV > > retrospectively for legacy user space when the issue has occurred? > > > > Someone might have written code that reacts to the -ENOSYS, so I've > > taken the conservative approach for this for the time being. > > > > > > diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h > > index 8a37c8e89777..19a92c49039c 100644 > > --- a/arch/arm/include/asm/kvm_host.h > > +++ b/arch/arm/include/asm/kvm_host.h > > @@ -76,6 +76,14 @@ struct kvm_arch { > > > > /* Mandated version of PSCI */ > > u32 psci_version; > > + > > + /* > > + * If we encounter a data abort without valid instruction syndrome > > + * information, report this to user space. User space can (and > > + * should) opt in to this feature if KVM_CAP_ARM_NISV_TO_USER is > > + * supported. > > + */ > > + bool return_nisv_io_abort_to_user; > > }; > > > > #define KVM_NR_MEM_OBJS 40 > > diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h > > index f656169db8c3..019bc560edc1 100644 > > --- a/arch/arm64/include/asm/kvm_host.h > > +++ b/arch/arm64/include/asm/kvm_host.h > > @@ -83,6 +83,14 @@ struct kvm_arch { > > > > /* Mandated version of PSCI */ > > u32 psci_version; > > + > > + /* > > + * If we encounter a data abort without valid instruction syndrome > > + * information, report this to user space. User space can (and > > + * should) opt in to this feature if KVM_CAP_ARM_NISV_TO_USER is > > + * supported. > > + */ > > + bool return_nisv_io_abort_to_user; > > How about 32bit ARM? > What about it? Not sure I understand the comment. > > }; > > > > #define KVM_NR_MEM_OBJS 40 > > diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h > > index 5e3f12d5359e..a4dd004d0db9 100644 > > --- a/include/uapi/linux/kvm.h > > +++ b/include/uapi/linux/kvm.h > > @@ -235,6 +235,7 @@ struct kvm_hyperv_exit { > > #define KVM_EXIT_S390_STSI 25 > > #define KVM_EXIT_IOAPIC_EOI 26 > > #define KVM_EXIT_HYPERV 27 > > +#define KVM_EXIT_ARM_NISV 28 > > > > /* For KVM_EXIT_INTERNAL_ERROR */ > > /* Emulate instruction failed. */ > > @@ -996,6 +997,7 @@ struct kvm_ppc_resize_hpt { > > #define KVM_CAP_ARM_PTRAUTH_ADDRESS 171 > > #define KVM_CAP_ARM_PTRAUTH_GENERIC 172 > > #define KVM_CAP_PMU_EVENT_FILTER 173 > > +#define KVM_CAP_ARM_NISV_TO_USER 174 > > > > #ifdef KVM_CAP_IRQ_ROUTING > > > > diff --git a/virt/kvm/arm/arm.c b/virt/kvm/arm/arm.c > > index 35a069815baf..2ce94bd9d4a9 100644 > > --- a/virt/kvm/arm/arm.c > > +++ b/virt/kvm/arm/arm.c > > @@ -98,6 +98,26 @@ int kvm_arch_check_processor_compat(void) > > return 0; > > } > > > > +int kvm_vm_ioctl_enable_cap(struct kvm *kvm, > > + struct kvm_enable_cap *cap) > > This overrides the weak implementation in virt/kvm/kvm_main.c. OK. > Yes. > > +{ > > + int r; > > + > > + if (cap->flags) > > + return -EINVAL; > > + > > + switch (cap->cap) { > > + case KVM_CAP_ARM_NISV_TO_USER: > > + r = 0; > > + kvm->arch.return_nisv_io_abort_to_user = true; > > + break; > > + default: > > + r = -EINVAL; > > + break; > > + } > > + > > + return r; > > +} > > > > /** > > * kvm_arch_init_vm - initializes a VM data structure > > @@ -196,6 +216,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext) > > case KVM_CAP_MP_STATE: > > case KVM_CAP_IMMEDIATE_EXIT: > > case KVM_CAP_VCPU_EVENTS: > > + case KVM_CAP_ARM_NISV_TO_USER: > > r = 1; > > break; > > case KVM_CAP_ARM_SET_DEVICE_ADDR: > > @@ -673,6 +694,8 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run) > > ret = kvm_handle_mmio_return(vcpu, vcpu->run); > > if (ret) > > return ret; > > + } else if (run->exit_reason == KVM_EXIT_ARM_NISV) { > > + kvm_inject_undefined(vcpu); > > So QEMU can try to enable the feature via IOCTL. And here you would > raise the 'undefined instruction' exception which QEMU will have to > handle in the loop calling KVM either by trying to make sense of the > instruction or by passing it on to the guest. > > Conceptually this looks good to me and meets the requirements of my > application. > > Thanks a lot for your suggestion. > I will change the undef to an external abort as I think that's more in line with the architecture, and document this, test and send out as a proper patch. Thanks, Christoffer
On Thu, Sep 05, 2019 at 02:09:18PM +0100, Marc Zyngier wrote: > On 05/09/2019 10:22, Christoffer Dall wrote: > > On Thu, Sep 05, 2019 at 09:56:44AM +0100, Peter Maydell wrote: > >> On Thu, 5 Sep 2019 at 09:52, Marc Zyngier <maz@kernel.org> wrote: > >>> > >>> On Thu, 05 Sep 2019 09:16:54 +0100, > >>> Peter Maydell <peter.maydell@linaro.org> wrote: > >>>> This is true, but the problem is that barfing out to userspace > >>>> makes it harder to debug the guest because it means that > >>>> the VM is immediately destroyed, whereas AIUI if we > >>>> inject some kind of exception then (assuming you're set up > >>>> to do kernel-debug via gdbstub) you can actually examine > >>>> the offending guest code with a debugger because at least > >>>> your VM is still around to inspect... > >>> > >>> To Christoffer's point, I find the benefit a bit dubious. Yes, you get > >>> an exception, but the instruction that caused it may be completely > >>> legal (store with post-increment, for example), leading to an even > >>> more puzzled developer (that exception should never have been > >>> delivered the first place). > >> > >> Right, but the combination of "host kernel prints a message > >> about an unsupported load/store insn" and "within-guest debug > >> dump/stack trace/etc" is much more useful than just having > >> "host kernel prints message" and "QEMU exits"; and it requires > >> about 3 lines of code change... > >> > >>> I'm far more in favour of dumping the state of the access in the run > >>> structure (much like we do for a MMIO access) and let userspace do > >>> something about it (such as dumping information on the console or > >>> breaking). It could even inject an exception *if* the user has asked > >>> for it. > >> > >> ...whereas this requires agreement on a kernel-userspace API, > >> larger changes in the kernel, somebody to implement the userspace > >> side of things, and the user to update both the kernel and QEMU. > >> It's hard for me to see that the benefit here over the 3-line > >> approach really outweighs the extra effort needed. In practice > >> saying "we should do this" is saying "we're going to do nothing", > >> based on the historical record. > >> > > > > How about something like the following (completely untested, liable for > > ABI discussions etc. etc., but for illustration purposes). > > > > I think it raises the question (and likely many other) of whether we can > > break the existing 'ABI' and change behavior for missing ISV > > retrospectively for legacy user space when the issue has occurred? > > > > Someone might have written code that reacts to the -ENOSYS, so I've > > taken the conservative approach for this for the time being. > > > > > > diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h > > index 8a37c8e89777..19a92c49039c 100644 > > --- a/arch/arm/include/asm/kvm_host.h > > +++ b/arch/arm/include/asm/kvm_host.h > > @@ -76,6 +76,14 @@ struct kvm_arch { > > > > /* Mandated version of PSCI */ > > u32 psci_version; > > + > > + /* > > + * If we encounter a data abort without valid instruction syndrome > > + * information, report this to user space. User space can (and > > + * should) opt in to this feature if KVM_CAP_ARM_NISV_TO_USER is > > + * supported. > > + */ > > + bool return_nisv_io_abort_to_user; > > }; > > > > #define KVM_NR_MEM_OBJS 40 > > diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h > > index f656169db8c3..019bc560edc1 100644 > > --- a/arch/arm64/include/asm/kvm_host.h > > +++ b/arch/arm64/include/asm/kvm_host.h > > @@ -83,6 +83,14 @@ struct kvm_arch { > > > > /* Mandated version of PSCI */ > > u32 psci_version; > > + > > + /* > > + * If we encounter a data abort without valid instruction syndrome > > + * information, report this to user space. User space can (and > > + * should) opt in to this feature if KVM_CAP_ARM_NISV_TO_USER is > > + * supported. > > + */ > > + bool return_nisv_io_abort_to_user; > > }; > > > > #define KVM_NR_MEM_OBJS 40 > > diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h > > index 5e3f12d5359e..a4dd004d0db9 100644 > > --- a/include/uapi/linux/kvm.h > > +++ b/include/uapi/linux/kvm.h > > @@ -235,6 +235,7 @@ struct kvm_hyperv_exit { > > #define KVM_EXIT_S390_STSI 25 > > #define KVM_EXIT_IOAPIC_EOI 26 > > #define KVM_EXIT_HYPERV 27 > > +#define KVM_EXIT_ARM_NISV 28 > > > > /* For KVM_EXIT_INTERNAL_ERROR */ > > /* Emulate instruction failed. */ > > @@ -996,6 +997,7 @@ struct kvm_ppc_resize_hpt { > > #define KVM_CAP_ARM_PTRAUTH_ADDRESS 171 > > #define KVM_CAP_ARM_PTRAUTH_GENERIC 172 > > #define KVM_CAP_PMU_EVENT_FILTER 173 > > +#define KVM_CAP_ARM_NISV_TO_USER 174 > > > > #ifdef KVM_CAP_IRQ_ROUTING > > > > diff --git a/virt/kvm/arm/arm.c b/virt/kvm/arm/arm.c > > index 35a069815baf..2ce94bd9d4a9 100644 > > --- a/virt/kvm/arm/arm.c > > +++ b/virt/kvm/arm/arm.c > > @@ -98,6 +98,26 @@ int kvm_arch_check_processor_compat(void) > > return 0; > > } > > > > +int kvm_vm_ioctl_enable_cap(struct kvm *kvm, > > + struct kvm_enable_cap *cap) > > +{ > > + int r; > > + > > + if (cap->flags) > > + return -EINVAL; > > + > > + switch (cap->cap) { > > + case KVM_CAP_ARM_NISV_TO_USER: > > + r = 0; > > + kvm->arch.return_nisv_io_abort_to_user = true; > > + break; > > + default: > > + r = -EINVAL; > > + break; > > + } > > + > > + return r; > > +} > > > > /** > > * kvm_arch_init_vm - initializes a VM data structure > > @@ -196,6 +216,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext) > > case KVM_CAP_MP_STATE: > > case KVM_CAP_IMMEDIATE_EXIT: > > case KVM_CAP_VCPU_EVENTS: > > + case KVM_CAP_ARM_NISV_TO_USER: > > r = 1; > > break; > > case KVM_CAP_ARM_SET_DEVICE_ADDR: > > @@ -673,6 +694,8 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run) > > ret = kvm_handle_mmio_return(vcpu, vcpu->run); > > if (ret) > > return ret; > > + } else if (run->exit_reason == KVM_EXIT_ARM_NISV) { > > + kvm_inject_undefined(vcpu); > > Just to make sure I understand: Is the expectation here that userspace > could clear the exit reason if it managed to handle the exit? And > otherwise we'd inject an UNDEF on reentry? > Yes, but I think we should change that to an external abort. I'll test something and send a proper patch with more clear documentation. > > } > > > > if (run->immediate_exit) > > diff --git a/virt/kvm/arm/mmio.c b/virt/kvm/arm/mmio.c > > index 6af5c91337f2..62e6ef47a6de 100644 > > --- a/virt/kvm/arm/mmio.c > > +++ b/virt/kvm/arm/mmio.c > > @@ -167,8 +167,15 @@ int io_mem_abort(struct kvm_vcpu *vcpu, struct kvm_run *run, > > if (ret) > > return ret; > > } else { > > - kvm_err("load/store instruction decoding not implemented\n"); > > - return -ENOSYS; > > + if (vcpu->kvm->arch.return_nisv_io_abort_to_user) { > > + run->exit_reason = KVM_EXIT_ARM_NISV; > > + run->mmio.phys_addr = fault_ipa; > > We could also record whether that's a read or a write (WnR should still > be valid). Actually, we could store a sanitized version of the ESR. > Ah yes, I'll incorporate that. > > + vcpu->stat.mmio_exit_user++; > > + return 0; > > + } else { > > + kvm_info("encountered data abort without syndrome info\n"); > > My only issue with this is that the previous message has been sort of > documented... Well, my main gripe with the current code is that the error message is massively misleading because it explains one possible case, which is very "kernel part of a KVM VM centric" and is actually not the common scenario that people encounter. Let me work on the particular wording of the error message and see if I can achieve something self-documenting. Thanks, Christoffer
On 06.09.19 10:00, Christoffer Dall wrote: > On Thu, Sep 05, 2019 at 02:09:18PM +0100, Marc Zyngier wrote: >> On 05/09/2019 10:22, Christoffer Dall wrote: >>> On Thu, Sep 05, 2019 at 09:56:44AM +0100, Peter Maydell wrote: >>>> On Thu, 5 Sep 2019 at 09:52, Marc Zyngier <maz@kernel.org> wrote: >>>>> >>>>> On Thu, 05 Sep 2019 09:16:54 +0100, >>>>> Peter Maydell <peter.maydell@linaro.org> wrote: >>>>>> This is true, but the problem is that barfing out to userspace >>>>>> makes it harder to debug the guest because it means that >>>>>> the VM is immediately destroyed, whereas AIUI if we >>>>>> inject some kind of exception then (assuming you're set up >>>>>> to do kernel-debug via gdbstub) you can actually examine >>>>>> the offending guest code with a debugger because at least >>>>>> your VM is still around to inspect... >>>>> >>>>> To Christoffer's point, I find the benefit a bit dubious. Yes, you get >>>>> an exception, but the instruction that caused it may be completely >>>>> legal (store with post-increment, for example), leading to an even >>>>> more puzzled developer (that exception should never have been >>>>> delivered the first place). >>>> >>>> Right, but the combination of "host kernel prints a message >>>> about an unsupported load/store insn" and "within-guest debug >>>> dump/stack trace/etc" is much more useful than just having >>>> "host kernel prints message" and "QEMU exits"; and it requires >>>> about 3 lines of code change... >>>> >>>>> I'm far more in favour of dumping the state of the access in the run >>>>> structure (much like we do for a MMIO access) and let userspace do >>>>> something about it (such as dumping information on the console or >>>>> breaking). It could even inject an exception *if* the user has asked >>>>> for it. >>>> >>>> ...whereas this requires agreement on a kernel-userspace API, >>>> larger changes in the kernel, somebody to implement the userspace >>>> side of things, and the user to update both the kernel and QEMU. >>>> It's hard for me to see that the benefit here over the 3-line >>>> approach really outweighs the extra effort needed. In practice >>>> saying "we should do this" is saying "we're going to do nothing", >>>> based on the historical record. >>>> >>> >>> How about something like the following (completely untested, liable for >>> ABI discussions etc. etc., but for illustration purposes). >>> >>> I think it raises the question (and likely many other) of whether we can >>> break the existing 'ABI' and change behavior for missing ISV >>> retrospectively for legacy user space when the issue has occurred? >>> >>> Someone might have written code that reacts to the -ENOSYS, so I've >>> taken the conservative approach for this for the time being. >>> >>> >>> diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h >>> index 8a37c8e89777..19a92c49039c 100644 >>> --- a/arch/arm/include/asm/kvm_host.h >>> +++ b/arch/arm/include/asm/kvm_host.h >>> @@ -76,6 +76,14 @@ struct kvm_arch { >>> >>> /* Mandated version of PSCI */ >>> u32 psci_version; >>> + >>> + /* >>> + * If we encounter a data abort without valid instruction syndrome >>> + * information, report this to user space. User space can (and >>> + * should) opt in to this feature if KVM_CAP_ARM_NISV_TO_USER is >>> + * supported. >>> + */ >>> + bool return_nisv_io_abort_to_user; >>> }; >>> >>> #define KVM_NR_MEM_OBJS 40 >>> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h >>> index f656169db8c3..019bc560edc1 100644 >>> --- a/arch/arm64/include/asm/kvm_host.h >>> +++ b/arch/arm64/include/asm/kvm_host.h >>> @@ -83,6 +83,14 @@ struct kvm_arch { >>> >>> /* Mandated version of PSCI */ >>> u32 psci_version; >>> + >>> + /* >>> + * If we encounter a data abort without valid instruction syndrome >>> + * information, report this to user space. User space can (and >>> + * should) opt in to this feature if KVM_CAP_ARM_NISV_TO_USER is >>> + * supported. >>> + */ >>> + bool return_nisv_io_abort_to_user; >>> }; >>> >>> #define KVM_NR_MEM_OBJS 40 >>> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h >>> index 5e3f12d5359e..a4dd004d0db9 100644 >>> --- a/include/uapi/linux/kvm.h >>> +++ b/include/uapi/linux/kvm.h >>> @@ -235,6 +235,7 @@ struct kvm_hyperv_exit { >>> #define KVM_EXIT_S390_STSI 25 >>> #define KVM_EXIT_IOAPIC_EOI 26 >>> #define KVM_EXIT_HYPERV 27 >>> +#define KVM_EXIT_ARM_NISV 28 >>> >>> /* For KVM_EXIT_INTERNAL_ERROR */ >>> /* Emulate instruction failed. */ >>> @@ -996,6 +997,7 @@ struct kvm_ppc_resize_hpt { >>> #define KVM_CAP_ARM_PTRAUTH_ADDRESS 171 >>> #define KVM_CAP_ARM_PTRAUTH_GENERIC 172 >>> #define KVM_CAP_PMU_EVENT_FILTER 173 >>> +#define KVM_CAP_ARM_NISV_TO_USER 174 >>> >>> #ifdef KVM_CAP_IRQ_ROUTING >>> >>> diff --git a/virt/kvm/arm/arm.c b/virt/kvm/arm/arm.c >>> index 35a069815baf..2ce94bd9d4a9 100644 >>> --- a/virt/kvm/arm/arm.c >>> +++ b/virt/kvm/arm/arm.c >>> @@ -98,6 +98,26 @@ int kvm_arch_check_processor_compat(void) >>> return 0; >>> } >>> >>> +int kvm_vm_ioctl_enable_cap(struct kvm *kvm, >>> + struct kvm_enable_cap *cap) >>> +{ >>> + int r; >>> + >>> + if (cap->flags) >>> + return -EINVAL; >>> + >>> + switch (cap->cap) { >>> + case KVM_CAP_ARM_NISV_TO_USER: >>> + r = 0; >>> + kvm->arch.return_nisv_io_abort_to_user = true; >>> + break; >>> + default: >>> + r = -EINVAL; >>> + break; >>> + } >>> + >>> + return r; >>> +} >>> >>> /** >>> * kvm_arch_init_vm - initializes a VM data structure >>> @@ -196,6 +216,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext) >>> case KVM_CAP_MP_STATE: >>> case KVM_CAP_IMMEDIATE_EXIT: >>> case KVM_CAP_VCPU_EVENTS: >>> + case KVM_CAP_ARM_NISV_TO_USER: >>> r = 1; >>> break; >>> case KVM_CAP_ARM_SET_DEVICE_ADDR: >>> @@ -673,6 +694,8 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run) >>> ret = kvm_handle_mmio_return(vcpu, vcpu->run); >>> if (ret) >>> return ret; >>> + } else if (run->exit_reason == KVM_EXIT_ARM_NISV) { >>> + kvm_inject_undefined(vcpu); >> >> Just to make sure I understand: Is the expectation here that userspace >> could clear the exit reason if it managed to handle the exit? And >> otherwise we'd inject an UNDEF on reentry? >> > > Yes, but I think we should change that to an external abort. I'll test > something and send a proper patch with more clear documentation. Why not leave the injection to user space in any case? API wise there is no need to be backwards compatible, as we require the CAP to be enabled, right? IMHO it should be 100% a policy decision in user space whether to emulate and what type of exception to inject, if anything. Alex Amazon Development Center Germany GmbH Krausenstr. 38 10117 Berlin Geschaeftsfuehrung: Christian Schlaeger, Ralf Herbrich Eingetragen am Amtsgericht Charlottenburg unter HRB 149173 B Sitz: Berlin Ust-ID: DE 289 237 879
On 06/09/2019 13:08, Alexander Graf wrote: > > > On 06.09.19 10:00, Christoffer Dall wrote: >> On Thu, Sep 05, 2019 at 02:09:18PM +0100, Marc Zyngier wrote: [...] >>>> @@ -673,6 +694,8 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run) >>>> ret = kvm_handle_mmio_return(vcpu, vcpu->run); >>>> if (ret) >>>> return ret; >>>> + } else if (run->exit_reason == KVM_EXIT_ARM_NISV) { >>>> + kvm_inject_undefined(vcpu); >>> >>> Just to make sure I understand: Is the expectation here that userspace >>> could clear the exit reason if it managed to handle the exit? And >>> otherwise we'd inject an UNDEF on reentry? >>> >> >> Yes, but I think we should change that to an external abort. I'll test >> something and send a proper patch with more clear documentation. > > Why not leave the injection to user space in any case? API wise there is > no need to be backwards compatible, as we require the CAP to be enabled, > right? > > IMHO it should be 100% a policy decision in user space whether to > emulate and what type of exception to inject, if anything. The exception has to be something that the trapped instruction can actually generate. An UNDEF is definitely wrong, as the guest would have otherwise UNDEF'd at EL1, and KVM would have never seen it. You cannot deviate from the rule of architecture, and userspace feels like the wrong place to enforce it. M.
On 06.09.19 14:34, Marc Zyngier wrote: > On 06/09/2019 13:08, Alexander Graf wrote: >> >> >> On 06.09.19 10:00, Christoffer Dall wrote: >>> On Thu, Sep 05, 2019 at 02:09:18PM +0100, Marc Zyngier wrote: > > [...] > >>>>> @@ -673,6 +694,8 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run) >>>>> ret = kvm_handle_mmio_return(vcpu, vcpu->run); >>>>> if (ret) >>>>> return ret; >>>>> + } else if (run->exit_reason == KVM_EXIT_ARM_NISV) { >>>>> + kvm_inject_undefined(vcpu); >>>> >>>> Just to make sure I understand: Is the expectation here that userspace >>>> could clear the exit reason if it managed to handle the exit? And >>>> otherwise we'd inject an UNDEF on reentry? >>>> >>> >>> Yes, but I think we should change that to an external abort. I'll test >>> something and send a proper patch with more clear documentation. >> >> Why not leave the injection to user space in any case? API wise there is >> no need to be backwards compatible, as we require the CAP to be enabled, >> right? >> >> IMHO it should be 100% a policy decision in user space whether to >> emulate and what type of exception to inject, if anything. > > The exception has to be something that the trapped instruction can > actually generate. An UNDEF is definitely wrong, as the guest would have > otherwise UNDEF'd at EL1, and KVM would have never seen it. You cannot > deviate from the rule of architecture, and userspace feels like the > wrong place to enforce it. There are multiple viable options user space has: 1) Trigger an external abort 2) Emulate the instruction in user space 3) Inject a PV mechanism into the guest to emulate the insn inside guest space Why should we treat 1) any different from 2) or 3)? Why is there a "fast path" for the external abort, on an exit that is not performance critical or has any other reason to get special attention from kernel space. All we're doing is add more code in a privileged layer for a case that realistically should never occur in the first place. Alex Amazon Development Center Germany GmbH Krausenstr. 38 10117 Berlin Geschaeftsfuehrung: Christian Schlaeger, Ralf Herbrich Eingetragen am Amtsgericht Charlottenburg unter HRB 149173 B Sitz: Berlin Ust-ID: DE 289 237 879
On Fri, Sep 06, 2019 at 02:08:15PM +0200, Alexander Graf wrote: > > > On 06.09.19 10:00, Christoffer Dall wrote: > > On Thu, Sep 05, 2019 at 02:09:18PM +0100, Marc Zyngier wrote: > > > On 05/09/2019 10:22, Christoffer Dall wrote: > > > > On Thu, Sep 05, 2019 at 09:56:44AM +0100, Peter Maydell wrote: > > > > > On Thu, 5 Sep 2019 at 09:52, Marc Zyngier <maz@kernel.org> wrote: > > > > > > > > > > > > On Thu, 05 Sep 2019 09:16:54 +0100, > > > > > > Peter Maydell <peter.maydell@linaro.org> wrote: > > > > > > > This is true, but the problem is that barfing out to userspace > > > > > > > makes it harder to debug the guest because it means that > > > > > > > the VM is immediately destroyed, whereas AIUI if we > > > > > > > inject some kind of exception then (assuming you're set up > > > > > > > to do kernel-debug via gdbstub) you can actually examine > > > > > > > the offending guest code with a debugger because at least > > > > > > > your VM is still around to inspect... > > > > > > > > > > > > To Christoffer's point, I find the benefit a bit dubious. Yes, you get > > > > > > an exception, but the instruction that caused it may be completely > > > > > > legal (store with post-increment, for example), leading to an even > > > > > > more puzzled developer (that exception should never have been > > > > > > delivered the first place). > > > > > > > > > > Right, but the combination of "host kernel prints a message > > > > > about an unsupported load/store insn" and "within-guest debug > > > > > dump/stack trace/etc" is much more useful than just having > > > > > "host kernel prints message" and "QEMU exits"; and it requires > > > > > about 3 lines of code change... > > > > > > > > > > > I'm far more in favour of dumping the state of the access in the run > > > > > > structure (much like we do for a MMIO access) and let userspace do > > > > > > something about it (such as dumping information on the console or > > > > > > breaking). It could even inject an exception *if* the user has asked > > > > > > for it. > > > > > > > > > > ...whereas this requires agreement on a kernel-userspace API, > > > > > larger changes in the kernel, somebody to implement the userspace > > > > > side of things, and the user to update both the kernel and QEMU. > > > > > It's hard for me to see that the benefit here over the 3-line > > > > > approach really outweighs the extra effort needed. In practice > > > > > saying "we should do this" is saying "we're going to do nothing", > > > > > based on the historical record. > > > > > > > > > > > > > How about something like the following (completely untested, liable for > > > > ABI discussions etc. etc., but for illustration purposes). > > > > > > > > I think it raises the question (and likely many other) of whether we can > > > > break the existing 'ABI' and change behavior for missing ISV > > > > retrospectively for legacy user space when the issue has occurred? > > > > Someone might have written code that reacts to the -ENOSYS, so I've > > > > taken the conservative approach for this for the time being. > > > > > > > > > > > > diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h > > > > index 8a37c8e89777..19a92c49039c 100644 > > > > --- a/arch/arm/include/asm/kvm_host.h > > > > +++ b/arch/arm/include/asm/kvm_host.h > > > > @@ -76,6 +76,14 @@ struct kvm_arch { > > > > /* Mandated version of PSCI */ > > > > u32 psci_version; > > > > + > > > > + /* > > > > + * If we encounter a data abort without valid instruction syndrome > > > > + * information, report this to user space. User space can (and > > > > + * should) opt in to this feature if KVM_CAP_ARM_NISV_TO_USER is > > > > + * supported. > > > > + */ > > > > + bool return_nisv_io_abort_to_user; > > > > }; > > > > #define KVM_NR_MEM_OBJS 40 > > > > diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h > > > > index f656169db8c3..019bc560edc1 100644 > > > > --- a/arch/arm64/include/asm/kvm_host.h > > > > +++ b/arch/arm64/include/asm/kvm_host.h > > > > @@ -83,6 +83,14 @@ struct kvm_arch { > > > > /* Mandated version of PSCI */ > > > > u32 psci_version; > > > > + > > > > + /* > > > > + * If we encounter a data abort without valid instruction syndrome > > > > + * information, report this to user space. User space can (and > > > > + * should) opt in to this feature if KVM_CAP_ARM_NISV_TO_USER is > > > > + * supported. > > > > + */ > > > > + bool return_nisv_io_abort_to_user; > > > > }; > > > > #define KVM_NR_MEM_OBJS 40 > > > > diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h > > > > index 5e3f12d5359e..a4dd004d0db9 100644 > > > > --- a/include/uapi/linux/kvm.h > > > > +++ b/include/uapi/linux/kvm.h > > > > @@ -235,6 +235,7 @@ struct kvm_hyperv_exit { > > > > #define KVM_EXIT_S390_STSI 25 > > > > #define KVM_EXIT_IOAPIC_EOI 26 > > > > #define KVM_EXIT_HYPERV 27 > > > > +#define KVM_EXIT_ARM_NISV 28 > > > > /* For KVM_EXIT_INTERNAL_ERROR */ > > > > /* Emulate instruction failed. */ > > > > @@ -996,6 +997,7 @@ struct kvm_ppc_resize_hpt { > > > > #define KVM_CAP_ARM_PTRAUTH_ADDRESS 171 > > > > #define KVM_CAP_ARM_PTRAUTH_GENERIC 172 > > > > #define KVM_CAP_PMU_EVENT_FILTER 173 > > > > +#define KVM_CAP_ARM_NISV_TO_USER 174 > > > > #ifdef KVM_CAP_IRQ_ROUTING > > > > diff --git a/virt/kvm/arm/arm.c b/virt/kvm/arm/arm.c > > > > index 35a069815baf..2ce94bd9d4a9 100644 > > > > --- a/virt/kvm/arm/arm.c > > > > +++ b/virt/kvm/arm/arm.c > > > > @@ -98,6 +98,26 @@ int kvm_arch_check_processor_compat(void) > > > > return 0; > > > > } > > > > +int kvm_vm_ioctl_enable_cap(struct kvm *kvm, > > > > + struct kvm_enable_cap *cap) > > > > +{ > > > > + int r; > > > > + > > > > + if (cap->flags) > > > > + return -EINVAL; > > > > + > > > > + switch (cap->cap) { > > > > + case KVM_CAP_ARM_NISV_TO_USER: > > > > + r = 0; > > > > + kvm->arch.return_nisv_io_abort_to_user = true; > > > > + break; > > > > + default: > > > > + r = -EINVAL; > > > > + break; > > > > + } > > > > + > > > > + return r; > > > > +} > > > > /** > > > > * kvm_arch_init_vm - initializes a VM data structure > > > > @@ -196,6 +216,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext) > > > > case KVM_CAP_MP_STATE: > > > > case KVM_CAP_IMMEDIATE_EXIT: > > > > case KVM_CAP_VCPU_EVENTS: > > > > + case KVM_CAP_ARM_NISV_TO_USER: > > > > r = 1; > > > > break; > > > > case KVM_CAP_ARM_SET_DEVICE_ADDR: > > > > @@ -673,6 +694,8 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run) > > > > ret = kvm_handle_mmio_return(vcpu, vcpu->run); > > > > if (ret) > > > > return ret; > > > > + } else if (run->exit_reason == KVM_EXIT_ARM_NISV) { > > > > + kvm_inject_undefined(vcpu); > > > > > > Just to make sure I understand: Is the expectation here that userspace > > > could clear the exit reason if it managed to handle the exit? And > > > otherwise we'd inject an UNDEF on reentry? > > > > > > > Yes, but I think we should change that to an external abort. I'll test > > something and send a proper patch with more clear documentation. > > Why not leave the injection to user space in any case? API wise there is no > need to be backwards compatible, as we require the CAP to be enabled, right? > I'd prefer leaving it to userspace to worry about, but I thought Peter said that had been problematic historically, which I took at face value, but I could have misunderstood. If QEMU, kvmtool, and whatever the crazy^H cool kids are using in userspace these days are happy emulating the exception, then that's a viable approach. The main concern I have with that is whether they'll all get it right, and since we already have the code in the kernel to do this, it might make sense to re-use the kernel logic for it. I'll leave it in for v1 of the patch, and if based on how that code and interface looks like, we agree it's better to leave it to userspace, I can remove it in v2. Thanks, Christoffer
On 06.09.19 15:12, Christoffer Dall wrote: > On Fri, Sep 06, 2019 at 02:08:15PM +0200, Alexander Graf wrote: >> >> >> On 06.09.19 10:00, Christoffer Dall wrote: >>> On Thu, Sep 05, 2019 at 02:09:18PM +0100, Marc Zyngier wrote: >>>> On 05/09/2019 10:22, Christoffer Dall wrote: >>>>> On Thu, Sep 05, 2019 at 09:56:44AM +0100, Peter Maydell wrote: >>>>>> On Thu, 5 Sep 2019 at 09:52, Marc Zyngier <maz@kernel.org> wrote: >>>>>>> >>>>>>> On Thu, 05 Sep 2019 09:16:54 +0100, >>>>>>> Peter Maydell <peter.maydell@linaro.org> wrote: >>>>>>>> This is true, but the problem is that barfing out to userspace >>>>>>>> makes it harder to debug the guest because it means that >>>>>>>> the VM is immediately destroyed, whereas AIUI if we >>>>>>>> inject some kind of exception then (assuming you're set up >>>>>>>> to do kernel-debug via gdbstub) you can actually examine >>>>>>>> the offending guest code with a debugger because at least >>>>>>>> your VM is still around to inspect... >>>>>>> >>>>>>> To Christoffer's point, I find the benefit a bit dubious. Yes, you get >>>>>>> an exception, but the instruction that caused it may be completely >>>>>>> legal (store with post-increment, for example), leading to an even >>>>>>> more puzzled developer (that exception should never have been >>>>>>> delivered the first place). >>>>>> >>>>>> Right, but the combination of "host kernel prints a message >>>>>> about an unsupported load/store insn" and "within-guest debug >>>>>> dump/stack trace/etc" is much more useful than just having >>>>>> "host kernel prints message" and "QEMU exits"; and it requires >>>>>> about 3 lines of code change... >>>>>> >>>>>>> I'm far more in favour of dumping the state of the access in the run >>>>>>> structure (much like we do for a MMIO access) and let userspace do >>>>>>> something about it (such as dumping information on the console or >>>>>>> breaking). It could even inject an exception *if* the user has asked >>>>>>> for it. >>>>>> >>>>>> ...whereas this requires agreement on a kernel-userspace API, >>>>>> larger changes in the kernel, somebody to implement the userspace >>>>>> side of things, and the user to update both the kernel and QEMU. >>>>>> It's hard for me to see that the benefit here over the 3-line >>>>>> approach really outweighs the extra effort needed. In practice >>>>>> saying "we should do this" is saying "we're going to do nothing", >>>>>> based on the historical record. >>>>>> >>>>> >>>>> How about something like the following (completely untested, liable for >>>>> ABI discussions etc. etc., but for illustration purposes). >>>>> >>>>> I think it raises the question (and likely many other) of whether we can >>>>> break the existing 'ABI' and change behavior for missing ISV >>>>> retrospectively for legacy user space when the issue has occurred? >>>>> Someone might have written code that reacts to the -ENOSYS, so I've >>>>> taken the conservative approach for this for the time being. >>>>> >>>>> >>>>> diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h >>>>> index 8a37c8e89777..19a92c49039c 100644 >>>>> --- a/arch/arm/include/asm/kvm_host.h >>>>> +++ b/arch/arm/include/asm/kvm_host.h >>>>> @@ -76,6 +76,14 @@ struct kvm_arch { >>>>> /* Mandated version of PSCI */ >>>>> u32 psci_version; >>>>> + >>>>> + /* >>>>> + * If we encounter a data abort without valid instruction syndrome >>>>> + * information, report this to user space. User space can (and >>>>> + * should) opt in to this feature if KVM_CAP_ARM_NISV_TO_USER is >>>>> + * supported. >>>>> + */ >>>>> + bool return_nisv_io_abort_to_user; >>>>> }; >>>>> #define KVM_NR_MEM_OBJS 40 >>>>> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h >>>>> index f656169db8c3..019bc560edc1 100644 >>>>> --- a/arch/arm64/include/asm/kvm_host.h >>>>> +++ b/arch/arm64/include/asm/kvm_host.h >>>>> @@ -83,6 +83,14 @@ struct kvm_arch { >>>>> /* Mandated version of PSCI */ >>>>> u32 psci_version; >>>>> + >>>>> + /* >>>>> + * If we encounter a data abort without valid instruction syndrome >>>>> + * information, report this to user space. User space can (and >>>>> + * should) opt in to this feature if KVM_CAP_ARM_NISV_TO_USER is >>>>> + * supported. >>>>> + */ >>>>> + bool return_nisv_io_abort_to_user; >>>>> }; >>>>> #define KVM_NR_MEM_OBJS 40 >>>>> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h >>>>> index 5e3f12d5359e..a4dd004d0db9 100644 >>>>> --- a/include/uapi/linux/kvm.h >>>>> +++ b/include/uapi/linux/kvm.h >>>>> @@ -235,6 +235,7 @@ struct kvm_hyperv_exit { >>>>> #define KVM_EXIT_S390_STSI 25 >>>>> #define KVM_EXIT_IOAPIC_EOI 26 >>>>> #define KVM_EXIT_HYPERV 27 >>>>> +#define KVM_EXIT_ARM_NISV 28 >>>>> /* For KVM_EXIT_INTERNAL_ERROR */ >>>>> /* Emulate instruction failed. */ >>>>> @@ -996,6 +997,7 @@ struct kvm_ppc_resize_hpt { >>>>> #define KVM_CAP_ARM_PTRAUTH_ADDRESS 171 >>>>> #define KVM_CAP_ARM_PTRAUTH_GENERIC 172 >>>>> #define KVM_CAP_PMU_EVENT_FILTER 173 >>>>> +#define KVM_CAP_ARM_NISV_TO_USER 174 >>>>> #ifdef KVM_CAP_IRQ_ROUTING >>>>> diff --git a/virt/kvm/arm/arm.c b/virt/kvm/arm/arm.c >>>>> index 35a069815baf..2ce94bd9d4a9 100644 >>>>> --- a/virt/kvm/arm/arm.c >>>>> +++ b/virt/kvm/arm/arm.c >>>>> @@ -98,6 +98,26 @@ int kvm_arch_check_processor_compat(void) >>>>> return 0; >>>>> } >>>>> +int kvm_vm_ioctl_enable_cap(struct kvm *kvm, >>>>> + struct kvm_enable_cap *cap) >>>>> +{ >>>>> + int r; >>>>> + >>>>> + if (cap->flags) >>>>> + return -EINVAL; >>>>> + >>>>> + switch (cap->cap) { >>>>> + case KVM_CAP_ARM_NISV_TO_USER: >>>>> + r = 0; >>>>> + kvm->arch.return_nisv_io_abort_to_user = true; >>>>> + break; >>>>> + default: >>>>> + r = -EINVAL; >>>>> + break; >>>>> + } >>>>> + >>>>> + return r; >>>>> +} >>>>> /** >>>>> * kvm_arch_init_vm - initializes a VM data structure >>>>> @@ -196,6 +216,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext) >>>>> case KVM_CAP_MP_STATE: >>>>> case KVM_CAP_IMMEDIATE_EXIT: >>>>> case KVM_CAP_VCPU_EVENTS: >>>>> + case KVM_CAP_ARM_NISV_TO_USER: >>>>> r = 1; >>>>> break; >>>>> case KVM_CAP_ARM_SET_DEVICE_ADDR: >>>>> @@ -673,6 +694,8 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run) >>>>> ret = kvm_handle_mmio_return(vcpu, vcpu->run); >>>>> if (ret) >>>>> return ret; >>>>> + } else if (run->exit_reason == KVM_EXIT_ARM_NISV) { >>>>> + kvm_inject_undefined(vcpu); >>>> >>>> Just to make sure I understand: Is the expectation here that userspace >>>> could clear the exit reason if it managed to handle the exit? And >>>> otherwise we'd inject an UNDEF on reentry? >>>> >>> >>> Yes, but I think we should change that to an external abort. I'll test >>> something and send a proper patch with more clear documentation. >> >> Why not leave the injection to user space in any case? API wise there is no >> need to be backwards compatible, as we require the CAP to be enabled, right? >> > > I'd prefer leaving it to userspace to worry about, but I thought Peter > said that had been problematic historically, which I took at face value, > but I could have misunderstood. > > If QEMU, kvmtool, and whatever the crazy^H cool kids are using in > userspace these days are happy emulating the exception, then that's a > viable approach. The main concern I have with that is whether they'll > all get it right, and since we already have the code in the kernel to do > this, it might make sense to re-use the kernel logic for it. You could make the same argument about injecting an #SError on an out of bounds access to MMIO. If injecting a fault is too complicated, we should fix that rather than create an unbalanced user space interface :). > I'll leave it in for v1 of the patch, and if based on how that code and > interface looks like, we agree it's better to leave it to userspace, I > can remove it in v2. Sure, works for me :). Please CC me on v1 so I can comment on it ;) Alex Amazon Development Center Germany GmbH Krausenstr. 38 10117 Berlin Geschaeftsfuehrung: Christian Schlaeger, Ralf Herbrich Eingetragen am Amtsgericht Charlottenburg unter HRB 149173 B Sitz: Berlin Ust-ID: DE 289 237 879
On Fri, 6 Sep 2019 at 14:13, Christoffer Dall <christoffer.dall@arm.com> wrote: > I'd prefer leaving it to userspace to worry about, but I thought Peter > said that had been problematic historically, which I took at face value, > but I could have misunderstood. > > If QEMU, kvmtool, and whatever the crazy^H cool kids are using in > userspace these days are happy emulating the exception, then that's a > viable approach. The main concern I have with that is whether they'll > all get it right, and since we already have the code in the kernel to do > this, it might make sense to re-use the kernel logic for it. Well, for QEMU we've had code that in theory might do this but in practice it's never been tested. Essentially the problem is that nobody ever wants to inject an exception from userspace except in incredibly rare cases like "trying to use h/w breakpoints simultaneously inside the VM and also to debug the VM from outside" or "we're running on RAS hardware that just told us that the VM's RAM was faulty". There's no even vaguely commonly-used usecase for it today; and this ABI suggestion adds another "this is in practice almost never going to happen" case to the pile. The codepath is unreliable in QEMU because (a) it requires getting syncing of sysreg values to and from the kernel right -- this is about the only situation where userspace wants to modify sysregs during execution of the VM, as opposed to just reading them -- and (b) we try to reuse the code we already have that does TCG exception injection, which might or might not be a design mistake, and (c) as noted above it's a never-actually-used untested codepath... So I think if I were you I wouldn't commit to the kernel ABI until somebody had at least written some RFC-quality patches for QEMU and tested that they work and the ABI is OK in that sense. (For the avoidance of doubt, I'm not volunteering to do that myself.) I don't object to the idea in principle, though. PS: the other "injecting exceptions to the guest" oddity is that if the kernel *does* find the ISV information and returns to userspace for it to handle the MMIO, there's no way for userspace to say "actually that address is supposed to generate a data abort". thanks -- PMM
On 06.09.19 15:31, Peter Maydell wrote: > On Fri, 6 Sep 2019 at 14:13, Christoffer Dall <christoffer.dall@arm.com> wrote: >> I'd prefer leaving it to userspace to worry about, but I thought Peter >> said that had been problematic historically, which I took at face value, >> but I could have misunderstood. >> >> If QEMU, kvmtool, and whatever the crazy^H cool kids are using in >> userspace these days are happy emulating the exception, then that's a >> viable approach. The main concern I have with that is whether they'll >> all get it right, and since we already have the code in the kernel to do >> this, it might make sense to re-use the kernel logic for it. > > Well, for QEMU we've had code that in theory might do this but > in practice it's never been tested. Essentially the problem is > that nobody ever wants to inject an exception from userspace > except in incredibly rare cases like "trying to use h/w breakpoints > simultaneously inside the VM and also to debug the VM from outside" > or "we're running on RAS hardware that just told us that the VM's > RAM was faulty". There's no even vaguely commonly-used usecase > for it today; and this ABI suggestion adds another "this is in > practice almost never going to happen" case to the pile. The > codepath is unreliable in QEMU because (a) it requires getting > syncing of sysreg values to and from the kernel right -- this is > about the only situation where userspace wants to modify sysregs > during execution of the VM, as opposed to just reading them -- and > (b) we try to reuse the code we already have that does TCG exception > injection, which might or might not be a design mistake, and That's probably a design mistake, correct :) > (c) as noted above it's a never-actually-used untested codepath... Sounds like an easy thing to resolve using kvm-unit-tests? > So I think if I were you I wouldn't commit to the kernel ABI until > somebody had at least written some RFC-quality patches for QEMU and > tested that they work and the ABI is OK in that sense. (For the > avoidance of doubt, I'm not volunteering to do that myself.) > I don't object to the idea in principle, though. > > PS: the other "injecting exceptions to the guest" oddity is that > if the kernel *does* find the ISV information and returns to userspace > for it to handle the MMIO, there's no way for userspace to say > "actually that address is supposed to generate a data abort". I think we're converging here. My proposal is that "inject a fault" should not be something special cased for the "I can't decode the instruction" case, but rather that we need a more generic mechanism. Whether that's a new ioctl, a flag we set on entry or something else is an implementation detail I'll be happy to leave for discussion. The only thing I'd like to avoid seeing is that we create a new user space ABI that makes it easy to inject a single, particular exception, but not solve all of the other cases while creating extra work to just implement instruction decoding in user space. Alex Amazon Development Center Germany GmbH Krausenstr. 38 10117 Berlin Geschaeftsfuehrung: Christian Schlaeger, Ralf Herbrich Eingetragen am Amtsgericht Charlottenburg unter HRB 149173 B Sitz: Berlin Ust-ID: DE 289 237 879
On Fri, Sep 06, 2019 at 02:31:42PM +0100, Peter Maydell wrote: > On Fri, 6 Sep 2019 at 14:13, Christoffer Dall <christoffer.dall@arm.com> wrote: > > I'd prefer leaving it to userspace to worry about, but I thought Peter > > said that had been problematic historically, which I took at face value, > > but I could have misunderstood. > > > > If QEMU, kvmtool, and whatever the crazy^H cool kids are using in > > userspace these days are happy emulating the exception, then that's a > > viable approach. The main concern I have with that is whether they'll > > all get it right, and since we already have the code in the kernel to do > > this, it might make sense to re-use the kernel logic for it. > > Well, for QEMU we've had code that in theory might do this but > in practice it's never been tested. Essentially the problem is > that nobody ever wants to inject an exception from userspace > except in incredibly rare cases like "trying to use h/w breakpoints > simultaneously inside the VM and also to debug the VM from outside" > or "we're running on RAS hardware that just told us that the VM's > RAM was faulty". There's no even vaguely commonly-used usecase > for it today; and this ABI suggestion adds another "this is in > practice almost never going to happen" case to the pile. The > codepath is unreliable in QEMU because (a) it requires getting > syncing of sysreg values to and from the kernel right -- this is > about the only situation where userspace wants to modify sysregs > during execution of the VM, as opposed to just reading them -- and > (b) we try to reuse the code we already have that does TCG exception > injection, which might or might not be a design mistake, and > (c) as noted above it's a never-actually-used untested codepath... > > So I think if I were you I wouldn't commit to the kernel ABI until > somebody had at least written some RFC-quality patches for QEMU and > tested that they work and the ABI is OK in that sense. (For the > avoidance of doubt, I'm not volunteering to do that myself.) > I don't object to the idea in principle, though. > > PS: the other "injecting exceptions to the guest" oddity is that > if the kernel *does* find the ISV information and returns to userspace > for it to handle the MMIO, there's no way for userspace to say > "actually that address is supposed to generate a data abort". > That's a good point. A synchronous interface with a separate mechanism to ask the kernel to inject an exception might be a good solution, if there's an elegant way to do the latter. I'll have a look at that. Thanks, Christoffer
On Fri, 6 Sep 2019 at 14:41, Alexander Graf <graf@amazon.com> wrote: > On 06.09.19 15:31, Peter Maydell wrote: > > (b) we try to reuse the code we already have that does TCG exception > > injection, which might or might not be a design mistake, and > > That's probably a design mistake, correct :) Well, conceptually it's not necessarily a bad idea, because in both cases what we're doing is "change the system register state (PC, ESR_EL1, ELR_EL1 etc) so that the CPU looks like it's just taken an exception"; but some of what the TCG code needs to do isn't necessary for KVM and all of it was not written with the idea of KVM in mind at all... thanks -- PMM
On 06.09.19 15:50, Peter Maydell wrote: > On Fri, 6 Sep 2019 at 14:41, Alexander Graf <graf@amazon.com> wrote: >> On 06.09.19 15:31, Peter Maydell wrote: >>> (b) we try to reuse the code we already have that does TCG exception >>> injection, which might or might not be a design mistake, and >> >> That's probably a design mistake, correct :) > > Well, conceptually it's not necessarily a bad idea, because > in both cases what we're doing is "change the system register > state (PC, ESR_EL1, ELR_EL1 etc) so that the CPU looks like > it's just taken an exception"; but some of what the > TCG code needs to do isn't necessary for KVM and all of it > was not written with the idea of KVM in mind at all... Yes, and it probably makes sense to model the QEMU internal API similarly, so that exception generating code does not have to distinguish. However, it's much easier for KVM to ensure exception prioritization as well as internal state synchronization. Conceptually, as you've seen, it really doesn't make a lot of sense to pull register state from KVM, wiggle it and then push it down when KVM has awareness of the same transformation anyway. So I guess the path is clear: Create a generic interface for exception injection and a separate patch similar to what Christoffer already posted with the new CAP to route illegal MMIO traps into user space. Connecting the two dots in user space really should be trivial then. (famous last words) Alex Amazon Development Center Germany GmbH Krausenstr. 38 10117 Berlin Geschaeftsfuehrung: Christian Schlaeger, Ralf Herbrich Eingetragen am Amtsgericht Charlottenburg unter HRB 149173 B Sitz: Berlin Ust-ID: DE 289 237 879
diff --git a/virt/kvm/arm/mmio.c b/virt/kvm/arm/mmio.c index a8a6a0c883f1..0cbed7d6a0f4 100644 --- a/virt/kvm/arm/mmio.c +++ b/virt/kvm/arm/mmio.c @@ -161,8 +161,8 @@ int io_mem_abort(struct kvm_vcpu *vcpu, struct kvm_run *run, if (ret) return ret; } else { - kvm_err("load/store instruction decoding not implemented\n"); - return -ENOSYS; + kvm_inject_dabt(vcpu, kvm_vcpu_get_hfar(vcpu)); + return 1; } rt = vcpu->arch.mmio_decode.rt;
If an application tries to access memory that is not mapped, an error ENOSYS, "load/store instruction decoding not implemented" may occur. QEMU will hang with a register dump. Instead create a data abort that can be handled gracefully by the application running in the virtual environment. Now the virtual machine can react to the event in the most appropriate way - by recovering, by writing an informative log, or by rebooting. Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de> --- virt/kvm/arm/mmio.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) -- 2.23.0.rc1