Message ID | 20200724143506.17772-7-will@kernel.org (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | KVM: arm64: Fixes to early stage-2 fault handling | expand |
On Fri, 24 Jul 2020 15:35:05 +0100, Will Deacon <will@kernel.org> wrote: > > Stage-2 faults on stage-1 page-table walks can occur on both the I-side > and the D-side. It is IMPLEMENTATATION DEFINED whether they are reported > as reads or writes and, in the case that they are generated by an AT > instruction, they are reported with the CM bit set. > > All of this deeply confuses the logic in kvm_handle_guest_abort(); > userspace may or may not see the fault, depending on whether it occurs > on the data or the instruction side, and an AT instruction may be skipped > if the translation tables are held in a read-only memslot. Yuk, that's indeed ugly. Well spotted. I guess the saving grace is that a S2 trap caused by an ATS1 instruction will be reported as S1PTW+CM, while the fault caused by a CMO is reported as *either* S1PTW *or* CM, but never both. > > Move the handling of stage-2 faults on stage-1 page-table walks earlier > so that they consistently result in either a data or an instruction abort > being re-injected back to the guest. The instruction abort seems to be happening as the side effect of executing outside of a memslot, not really because of a S1PTW. I wonder whether these S1PTW faults should be classified as external aborts instead (because putting your page tables outside of a memslot seems a bit bonkers). > > Cc: Marc Zyngier <maz@kernel.org> > Cc: Quentin Perret <qperret@google.com> > Signed-off-by: Will Deacon <will@kernel.org> > --- > arch/arm64/kvm/mmu.c | 10 +++++----- > 1 file changed, 5 insertions(+), 5 deletions(-) > > diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c > index adb933ecd177..9e72e7f4a2c2 100644 > --- a/arch/arm64/kvm/mmu.c > +++ b/arch/arm64/kvm/mmu.c > @@ -2124,6 +2124,11 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu, struct kvm_run *run) > goto out; > } > > + if (kvm_vcpu_dabt_iss1tw(vcpu)) { > + ret = -ENXIO; > + goto out; > + } > + > /* > * Check for a cache maintenance operation. Since we > * ended-up here, we know it is outside of any memory > @@ -2157,11 +2162,6 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu, struct kvm_run *run) > goto out_unlock; > } > > - if (kvm_vcpu_dabt_iss1tw(vcpu)) { > - ret = -ENXIO; > - goto out; > - } > - > ret = io_mem_abort(vcpu, run, fault_ipa); > goto out_unlock; > } > -- > 2.28.0.rc0.142.g3c755180ce-goog > > Thanks, M.
On Sun, Jul 26, 2020 at 02:38:38PM +0100, Marc Zyngier wrote: > On Fri, 24 Jul 2020 15:35:05 +0100, > Will Deacon <will@kernel.org> wrote: > > > > Stage-2 faults on stage-1 page-table walks can occur on both the I-side > > and the D-side. It is IMPLEMENTATATION DEFINED whether they are reported > > as reads or writes and, in the case that they are generated by an AT > > instruction, they are reported with the CM bit set. > > > > All of this deeply confuses the logic in kvm_handle_guest_abort(); > > userspace may or may not see the fault, depending on whether it occurs > > on the data or the instruction side, and an AT instruction may be skipped > > if the translation tables are held in a read-only memslot. > > Yuk, that's indeed ugly. Well spotted. I guess the saving grace is > that a S2 trap caused by an ATS1 instruction will be reported as > S1PTW+CM, while the fault caused by a CMO is reported as *either* > S1PTW *or* CM, but never both. Hmm, is that right? If the translation faults at S2 for a CM instruction, wouldn't it have both bits set? > > Move the handling of stage-2 faults on stage-1 page-table walks earlier > > so that they consistently result in either a data or an instruction abort > > being re-injected back to the guest. > > The instruction abort seems to be happening as the side effect of > executing outside of a memslot, not really because of a S1PTW. Not sure about that. If the instruction fetch generates an S2 abort during translation, then we could be executing from within a memslot; it's the location of the page-tables that matters. However, I think that means things still aren't quite right with my patches because we can end up calling io_mem_abort() from an instruction abort, which won't have enough syndrome information to do anything useful. Hmm. Stepping back, here's what I _think_ we want, although see the '(?)' bits where I'm particularly unsure: S2 instruction abort: * Not in memslot: inject external iabt to guest * In R/O memslot: - S2 fault on S1 walk: either EXIT_NISV or inject external iabt to guest (?) S2 data abort: * Not in memslot: - S2 fault on S1 walk: inject external dabt to guest - Cache maintenance: skip instr - Syndrome valid EXIT_MMIO - Syndrome invalid EXIT_NISV * In R/O memslot: - S2 fault on S1 walk: either EXIT_NISV or inject external dabt to guest (?) - Access is write (including cache maintenance (?)): - Syndrome valid EXIT_MMIO - Syndrome invalid EXIT_NISV Everything else gets handled by handle_access_fault()/user_mem_abort(). What do you think? > I wonder whether these S1PTW faults should be classified as external > aborts instead (because putting your page tables outside of a memslot > seems a bit bonkers). I think that's what this patch does, since we end up in kvm_inject_dabt(). Will
diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c index adb933ecd177..9e72e7f4a2c2 100644 --- a/arch/arm64/kvm/mmu.c +++ b/arch/arm64/kvm/mmu.c @@ -2124,6 +2124,11 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu, struct kvm_run *run) goto out; } + if (kvm_vcpu_dabt_iss1tw(vcpu)) { + ret = -ENXIO; + goto out; + } + /* * Check for a cache maintenance operation. Since we * ended-up here, we know it is outside of any memory @@ -2157,11 +2162,6 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu, struct kvm_run *run) goto out_unlock; } - if (kvm_vcpu_dabt_iss1tw(vcpu)) { - ret = -ENXIO; - goto out; - } - ret = io_mem_abort(vcpu, run, fault_ipa); goto out_unlock; }
Stage-2 faults on stage-1 page-table walks can occur on both the I-side and the D-side. It is IMPLEMENTATATION DEFINED whether they are reported as reads or writes and, in the case that they are generated by an AT instruction, they are reported with the CM bit set. All of this deeply confuses the logic in kvm_handle_guest_abort(); userspace may or may not see the fault, depending on whether it occurs on the data or the instruction side, and an AT instruction may be skipped if the translation tables are held in a read-only memslot. Move the handling of stage-2 faults on stage-1 page-table walks earlier so that they consistently result in either a data or an instruction abort being re-injected back to the guest. Cc: Marc Zyngier <maz@kernel.org> Cc: Quentin Perret <qperret@google.com> Signed-off-by: Will Deacon <will@kernel.org> --- arch/arm64/kvm/mmu.c | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-)