Message ID | 20220512131019.2594948-2-scgl@linux.ibm.com (mailing list archive) |
---|---|
State | New, archived |
Series | Dirtying, failing memop: don't indicate suppression |
On 12.05.22 15:10, Janis Schoetterl-Glausch wrote:
> If user space uses a memop to emulate an instruction and that
> memop fails, the execution of the instruction ends.
> Instruction execution can end in different ways, one of which is
> suppression, which requires that the instruction execute like a no-op.
> A writing memop that spans multiple pages and fails due to key
> protection may have modified guest memory, as a result, the likely
> correct ending is termination. Therefore, do not indicate a
> suppressing instruction ending in this case.

I think that is possibly problematic handling.

In TCG we stumbled over similar issues in the past for MVC when crossing
page boundaries. Failing after modifying the first page already
seriously broke some user space, because the guest would retry the
instruction after fixing up the fault reason on the second page: if
source and destination operands overlap, you'll be in trouble because
the input parameters already changed.

For this reason, in TCG we make sure that all accesses are valid before
starting modifications.

See target/s390x/tcg/mem_helper.c:do_helper_mvc with access_prepare()
and friends as an example.

Now, I don't know how to tackle that for KVM, I just wanted to raise
awareness that injecting an interrupt after modifying page content is
possibly dodgy and dangerous.
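To illustrate the "validate everything, then modify" idea described above, here is a minimal C sketch. It is not the QEMU code: `struct guest`, `range_writable()` and `guest_store_all_or_nothing()` are hypothetical names, and the per-page `writable` flag is an assumed stand-in for whatever check (DAT, key protection, ...) could fault.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define PAGE_SIZE 4096u
#define NUM_PAGES 4u

/* Hypothetical guest model: flat memory plus one "writable" flag per page. */
struct guest {
	unsigned char mem[NUM_PAGES * PAGE_SIZE];
	bool writable[NUM_PAGES];
};

/* Pass 1: check every page touched by [addr, addr + len) without writing. */
static bool range_writable(const struct guest *g, size_t addr, size_t len)
{
	if (len == 0)
		return true;
	if (addr >= sizeof(g->mem) || len > sizeof(g->mem) - addr)
		return false;
	for (size_t page = addr / PAGE_SIZE;
	     page <= (addr + len - 1) / PAGE_SIZE; page++)
		if (!g->writable[page])
			return false;
	return true;
}

/*
 * Pass 2 only runs if pass 1 succeeded, so a failing store leaves guest
 * memory untouched and the instruction can be suppressed and retried.
 * Returns 0 on success, -1 if any destination page is not writable.
 */
static int guest_store_all_or_nothing(struct guest *g, size_t addr,
				      const void *src, size_t len)
{
	assert(g && (src || len == 0));
	if (!range_writable(g, addr, len))
		return -1;
	memcpy(&g->mem[addr], src, len);
	return 0;
}
```

With this shape, a store that straddles a writable and a non-writable page fails before the first byte is written, which is the property that makes a retry with overlapping operands safe.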
On 12.05.22 at 15:22, David Hildenbrand wrote:
> On 12.05.22 15:10, Janis Schoetterl-Glausch wrote:
>> If user space uses a memop to emulate an instruction and that
>> memop fails, the execution of the instruction ends.
>> Instruction execution can end in different ways, one of which is
>> suppression, which requires that the instruction execute like a no-op.
>> A writing memop that spans multiple pages and fails due to key
>> protection may have modified guest memory, as a result, the likely
>> correct ending is termination. Therefore, do not indicate a
>> suppressing instruction ending in this case.
>
> I think that is possibly problematic handling.
>
> In TCG we stumbled over similar issues in the past for MVC when crossing
> page boundaries. Failing after modifying the first page already
> seriously broke some user space, because the guest would retry the
> instruction after fixing up the fault reason on the second page: if
> source and destination operands overlap, you'll be in trouble because
> the input parameters already changed.
>
> For this reason, in TCG we make sure that all accesses are valid before
> starting modifications.
>
> See target/s390x/tcg/mem_helper.c:do_helper_mvc with access_prepare()
> and friends as an example.
>
> Now, I don't know how to tackle that for KVM, I just wanted to raise
> awareness that injecting an interrupt after modifying page content is
> possibly dodgy and dangerous.

This is really special and only for key protection crossing pages.
It's been done that way since the 70s on z/VM. The architecture
is and always was written in a way to allow termination for this
case for hypervisors.
On 12.05.22 15:51, Christian Borntraeger wrote:
>
>
> On 12.05.22 at 15:22, David Hildenbrand wrote:
>> On 12.05.22 15:10, Janis Schoetterl-Glausch wrote:
>>> If user space uses a memop to emulate an instruction and that
>>> memop fails, the execution of the instruction ends.
>>> Instruction execution can end in different ways, one of which is
>>> suppression, which requires that the instruction execute like a no-op.
>>> A writing memop that spans multiple pages and fails due to key
>>> protection may have modified guest memory, as a result, the likely
>>> correct ending is termination. Therefore, do not indicate a
>>> suppressing instruction ending in this case.
>>
>> I think that is possibly problematic handling.
>>
>> In TCG we stumbled over similar issues in the past for MVC when crossing
>> page boundaries. Failing after modifying the first page already
>> seriously broke some user space, because the guest would retry the
>> instruction after fixing up the fault reason on the second page: if
>> source and destination operands overlap, you'll be in trouble because
>> the input parameters already changed.
>>
>> For this reason, in TCG we make sure that all accesses are valid before
>> starting modifications.
>>
>> See target/s390x/tcg/mem_helper.c:do_helper_mvc with access_prepare()
>> and friends as an example.
>>
>> Now, I don't know how to tackle that for KVM, I just wanted to raise
>> awareness that injecting an interrupt after modifying page content is
>> possibly dodgy and dangerous.
>
> This is really special and only for key protection crossing pages.
> It's been done that way since the 70s on z/VM. The architecture
> is and always was written in a way to allow termination for this
> case for hypervisors.

Just so I understand correctly: all instructions that a hypervisor with
hardware virtualization is supposed to emulate are "written in a way to
allow termination", correct? That makes things a lot easier.
On 12.05.22 at 17:50, David Hildenbrand wrote:
> On 12.05.22 15:51, Christian Borntraeger wrote:
>>
>>
>> On 12.05.22 at 15:22, David Hildenbrand wrote:
>>> On 12.05.22 15:10, Janis Schoetterl-Glausch wrote:
>>>> If user space uses a memop to emulate an instruction and that
>>>> memop fails, the execution of the instruction ends.
>>>> Instruction execution can end in different ways, one of which is
>>>> suppression, which requires that the instruction execute like a no-op.
>>>> A writing memop that spans multiple pages and fails due to key
>>>> protection may have modified guest memory, as a result, the likely
>>>> correct ending is termination. Therefore, do not indicate a
>>>> suppressing instruction ending in this case.
>>>
>>> I think that is possibly problematic handling.
>>>
>>> In TCG we stumbled over similar issues in the past for MVC when crossing
>>> page boundaries. Failing after modifying the first page already
>>> seriously broke some user space, because the guest would retry the
>>> instruction after fixing up the fault reason on the second page: if
>>> source and destination operands overlap, you'll be in trouble because
>>> the input parameters already changed.
>>>
>>> For this reason, in TCG we make sure that all accesses are valid before
>>> starting modifications.
>>>
>>> See target/s390x/tcg/mem_helper.c:do_helper_mvc with access_prepare()
>>> and friends as an example.
>>>
>>> Now, I don't know how to tackle that for KVM, I just wanted to raise
>>> awareness that injecting an interrupt after modifying page content is
>>> possibly dodgy and dangerous.
>>
>> This is really special and only for key protection crossing pages.
>> It's been done that way since the 70s on z/VM. The architecture
>> is and always was written in a way to allow termination for this
>> case for hypervisors.
>
> Just so I understand correctly: all instructions that a hypervisor with
> hardware virtualization is supposed to emulate are "written in a way to
> allow termination", correct? That makes things a lot easier.

Only for key protection. Key protection can always be terminating no matter
what the instruction says. This is historical baggage - key protection used
to result in abends - killing the process. So it does not matter whether we
provide the extra info as in enhanced suppression on protection, as nobody
is making use of that (apart from debuggers maybe).
On 12.05.22 18:26, Christian Borntraeger wrote:
>
>
> On 12.05.22 at 17:50, David Hildenbrand wrote:
>> On 12.05.22 15:51, Christian Borntraeger wrote:
>>>
>>>
>>> On 12.05.22 at 15:22, David Hildenbrand wrote:
>>>> On 12.05.22 15:10, Janis Schoetterl-Glausch wrote:
>>>>> If user space uses a memop to emulate an instruction and that
>>>>> memop fails, the execution of the instruction ends.
>>>>> Instruction execution can end in different ways, one of which is
>>>>> suppression, which requires that the instruction execute like a no-op.
>>>>> A writing memop that spans multiple pages and fails due to key
>>>>> protection may have modified guest memory, as a result, the likely
>>>>> correct ending is termination. Therefore, do not indicate a
>>>>> suppressing instruction ending in this case.
>>>>
>>>> I think that is possibly problematic handling.
>>>>
>>>> In TCG we stumbled over similar issues in the past for MVC when crossing
>>>> page boundaries. Failing after modifying the first page already
>>>> seriously broke some user space, because the guest would retry the
>>>> instruction after fixing up the fault reason on the second page: if
>>>> source and destination operands overlap, you'll be in trouble because
>>>> the input parameters already changed.
>>>>
>>>> For this reason, in TCG we make sure that all accesses are valid before
>>>> starting modifications.
>>>>
>>>> See target/s390x/tcg/mem_helper.c:do_helper_mvc with access_prepare()
>>>> and friends as an example.
>>>>
>>>> Now, I don't know how to tackle that for KVM, I just wanted to raise
>>>> awareness that injecting an interrupt after modifying page content is
>>>> possibly dodgy and dangerous.
>>>
>>> This is really special and only for key protection crossing pages.
>>> It's been done that way since the 70s on z/VM. The architecture
>>> is and always was written in a way to allow termination for this
>>> case for hypervisors.
>>
>> Just so I understand correctly: all instructions that a hypervisor with
>> hardware virtualization is supposed to emulate are "written in a way to
>> allow termination", correct? That makes things a lot easier.
>
> Only for key protection. Key protection can always be terminating no matter
> what the instruction says. This is historical baggage - key protection used
> to result in abends - killing the process. So it does not matter whether we
> provide the extra info as in enhanced suppression on protection, as nobody
> is making use of that (apart from debuggers maybe).

Got it, makes sense then. Thanks for clarifying!
On 12.05.22 at 15:10, Janis Schoetterl-Glausch wrote:
> If user space uses a memop to emulate an instruction and that
> memop fails, the execution of the instruction ends.
> Instruction execution can end in different ways, one of which is
> suppression, which requires that the instruction execute like a no-op.
> A writing memop that spans multiple pages and fails due to key
> protection may have modified guest memory, as a result, the likely
> correct ending is termination. Therefore, do not indicate a
> suppressing instruction ending in this case.
>
> Signed-off-by: Janis Schoetterl-Glausch <scgl@linux.ibm.com>

Reviewed-by: Christian Borntraeger <borntraeger@linux.ibm.com>

> ---
>  Documentation/virt/kvm/api.rst |  6 ++++++
>  arch/s390/kvm/gaccess.c        | 22 ++++++++++++++++++----
>  2 files changed, 24 insertions(+), 4 deletions(-)
>
> diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
> index 4a900cdbc62e..b6aba4f50db7 100644
> --- a/Documentation/virt/kvm/api.rst
> +++ b/Documentation/virt/kvm/api.rst
> @@ -3754,12 +3754,18 @@ in case of KVM_S390_MEMOP_F_CHECK_ONLY), the ioctl returns a positive
>  error number indicating the type of exception. This exception is also
>  raised directly at the corresponding VCPU if the flag
>  KVM_S390_MEMOP_F_INJECT_EXCEPTION is set.
> +On protection exceptions, unless specified otherwise, the injected
> +translation-exception identifier (TEID) indicates suppression.
>
>  If the KVM_S390_MEMOP_F_SKEY_PROTECTION flag is set, storage key
>  protection is also in effect and may cause exceptions if accesses are
>  prohibited given the access key designated by "key"; the valid range is 0..15.
>  KVM_S390_MEMOP_F_SKEY_PROTECTION is available if KVM_CAP_S390_MEM_OP_EXTENSION
>  is > 0.
> +Since the accessed memory may span multiple pages and those pages might have
> +different storage keys, it is possible that a protection exception occurs
> +after memory has been modified. In this case, if the exception is injected,
> +the TEID does not indicate suppression.
>
>  Absolute read/write:
>  ^^^^^^^^^^^^^^^^^^^^
> diff --git a/arch/s390/kvm/gaccess.c b/arch/s390/kvm/gaccess.c
> index d53a183c2005..227ed0009354 100644
> --- a/arch/s390/kvm/gaccess.c
> +++ b/arch/s390/kvm/gaccess.c
> @@ -491,8 +491,8 @@ enum prot_type {
>  	PROT_TYPE_IEP = 4,
>  };
>
> -static int trans_exc(struct kvm_vcpu *vcpu, int code, unsigned long gva,
> -		     u8 ar, enum gacc_mode mode, enum prot_type prot)
> +static int trans_exc_ending(struct kvm_vcpu *vcpu, int code, unsigned long gva, u8 ar,
> +			    enum gacc_mode mode, enum prot_type prot, bool terminate)
>  {
>  	struct kvm_s390_pgm_info *pgm = &vcpu->arch.pgm;
>  	struct trans_exc_code_bits *tec;
> @@ -520,6 +520,11 @@ static int trans_exc(struct kvm_vcpu *vcpu, int code, unsigned long gva,
>  			tec->b61 = 1;
>  			break;
>  		}
> +		if (terminate) {
> +			tec->b56 = 0;
> +			tec->b60 = 0;
> +			tec->b61 = 0;
> +		}
>  		fallthrough;
>  	case PGM_ASCE_TYPE:
>  	case PGM_PAGE_TRANSLATION:
> @@ -552,6 +557,12 @@ static int trans_exc(struct kvm_vcpu *vcpu, int code, unsigned long gva,
>  	return code;
>  }
>
> +static int trans_exc(struct kvm_vcpu *vcpu, int code, unsigned long gva, u8 ar,
> +		     enum gacc_mode mode, enum prot_type prot)
> +{
> +	return trans_exc_ending(vcpu, code, gva, ar, mode, prot, false);
> +}
> +
>  static int get_vcpu_asce(struct kvm_vcpu *vcpu, union asce *asce,
>  			 unsigned long ga, u8 ar, enum gacc_mode mode)
>  {
> @@ -1109,8 +1120,11 @@ int access_guest_with_key(struct kvm_vcpu *vcpu, unsigned long ga, u8 ar,
>  		data += fragment_len;
>  		ga = kvm_s390_logical_to_effective(vcpu, ga + fragment_len);
>  	}
> -	if (rc > 0)
> -		rc = trans_exc(vcpu, rc, ga, ar, mode, prot);
> +	if (rc > 0) {
> +		bool terminate = (mode == GACC_STORE) && (idx > 0);
> +
> +		rc = trans_exc_ending(vcpu, rc, ga, ar, mode, prot, terminate);
> +	}
>  out_unlock:
>  	if (need_ipte_lock)
>  		ipte_unlock(vcpu);
On Thu, 12 May 2022 15:10:17 +0200
Janis Schoetterl-Glausch <scgl@linux.ibm.com> wrote:

> If user space uses a memop to emulate an instruction and that
> memop fails, the execution of the instruction ends.
> Instruction execution can end in different ways, one of which is
> suppression, which requires that the instruction execute like a no-op.
> A writing memop that spans multiple pages and fails due to key
> protection may have modified guest memory, as a result, the likely
> correct ending is termination. Therefore, do not indicate a
> suppressing instruction ending in this case.
>
> Signed-off-by: Janis Schoetterl-Glausch <scgl@linux.ibm.com>

Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com>

> ---
>  Documentation/virt/kvm/api.rst |  6 ++++++
>  arch/s390/kvm/gaccess.c        | 22 ++++++++++++++++++----
>  2 files changed, 24 insertions(+), 4 deletions(-)
>
> diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
> index 4a900cdbc62e..b6aba4f50db7 100644
> --- a/Documentation/virt/kvm/api.rst
> +++ b/Documentation/virt/kvm/api.rst
> @@ -3754,12 +3754,18 @@ in case of KVM_S390_MEMOP_F_CHECK_ONLY), the ioctl returns a positive
>  error number indicating the type of exception. This exception is also
>  raised directly at the corresponding VCPU if the flag
>  KVM_S390_MEMOP_F_INJECT_EXCEPTION is set.
> +On protection exceptions, unless specified otherwise, the injected
> +translation-exception identifier (TEID) indicates suppression.
>
>  If the KVM_S390_MEMOP_F_SKEY_PROTECTION flag is set, storage key
>  protection is also in effect and may cause exceptions if accesses are
>  prohibited given the access key designated by "key"; the valid range is 0..15.
>  KVM_S390_MEMOP_F_SKEY_PROTECTION is available if KVM_CAP_S390_MEM_OP_EXTENSION
>  is > 0.
> +Since the accessed memory may span multiple pages and those pages might have
> +different storage keys, it is possible that a protection exception occurs
> +after memory has been modified. In this case, if the exception is injected,
> +the TEID does not indicate suppression.
>
>  Absolute read/write:
>  ^^^^^^^^^^^^^^^^^^^^
> diff --git a/arch/s390/kvm/gaccess.c b/arch/s390/kvm/gaccess.c
> index d53a183c2005..227ed0009354 100644
> --- a/arch/s390/kvm/gaccess.c
> +++ b/arch/s390/kvm/gaccess.c
> @@ -491,8 +491,8 @@ enum prot_type {
>  	PROT_TYPE_IEP = 4,
>  };
>
> -static int trans_exc(struct kvm_vcpu *vcpu, int code, unsigned long gva,
> -		     u8 ar, enum gacc_mode mode, enum prot_type prot)
> +static int trans_exc_ending(struct kvm_vcpu *vcpu, int code, unsigned long gva, u8 ar,
> +			    enum gacc_mode mode, enum prot_type prot, bool terminate)
>  {
>  	struct kvm_s390_pgm_info *pgm = &vcpu->arch.pgm;
>  	struct trans_exc_code_bits *tec;
> @@ -520,6 +520,11 @@ static int trans_exc(struct kvm_vcpu *vcpu, int code, unsigned long gva,
>  			tec->b61 = 1;
>  			break;
>  		}
> +		if (terminate) {
> +			tec->b56 = 0;
> +			tec->b60 = 0;
> +			tec->b61 = 0;
> +		}
>  		fallthrough;
>  	case PGM_ASCE_TYPE:
>  	case PGM_PAGE_TRANSLATION:
> @@ -552,6 +557,12 @@ static int trans_exc(struct kvm_vcpu *vcpu, int code, unsigned long gva,
>  	return code;
>  }
>
> +static int trans_exc(struct kvm_vcpu *vcpu, int code, unsigned long gva, u8 ar,
> +		     enum gacc_mode mode, enum prot_type prot)
> +{
> +	return trans_exc_ending(vcpu, code, gva, ar, mode, prot, false);
> +}
> +
>  static int get_vcpu_asce(struct kvm_vcpu *vcpu, union asce *asce,
>  			 unsigned long ga, u8 ar, enum gacc_mode mode)
>  {
> @@ -1109,8 +1120,11 @@ int access_guest_with_key(struct kvm_vcpu *vcpu, unsigned long ga, u8 ar,
>  		data += fragment_len;
>  		ga = kvm_s390_logical_to_effective(vcpu, ga + fragment_len);
>  	}
> -	if (rc > 0)
> -		rc = trans_exc(vcpu, rc, ga, ar, mode, prot);
> +	if (rc > 0) {
> +		bool terminate = (mode == GACC_STORE) && (idx > 0);
> +
> +		rc = trans_exc_ending(vcpu, rc, ga, ar, mode, prot, terminate);
> +	}
>  out_unlock:
>  	if (need_ipte_lock)
>  		ipte_unlock(vcpu);
diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index 4a900cdbc62e..b6aba4f50db7 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -3754,12 +3754,18 @@ in case of KVM_S390_MEMOP_F_CHECK_ONLY), the ioctl returns a positive
 error number indicating the type of exception. This exception is also
 raised directly at the corresponding VCPU if the flag
 KVM_S390_MEMOP_F_INJECT_EXCEPTION is set.
+On protection exceptions, unless specified otherwise, the injected
+translation-exception identifier (TEID) indicates suppression.

 If the KVM_S390_MEMOP_F_SKEY_PROTECTION flag is set, storage key
 protection is also in effect and may cause exceptions if accesses are
 prohibited given the access key designated by "key"; the valid range is 0..15.
 KVM_S390_MEMOP_F_SKEY_PROTECTION is available if KVM_CAP_S390_MEM_OP_EXTENSION
 is > 0.
+Since the accessed memory may span multiple pages and those pages might have
+different storage keys, it is possible that a protection exception occurs
+after memory has been modified. In this case, if the exception is injected,
+the TEID does not indicate suppression.

 Absolute read/write:
 ^^^^^^^^^^^^^^^^^^^^
diff --git a/arch/s390/kvm/gaccess.c b/arch/s390/kvm/gaccess.c
index d53a183c2005..227ed0009354 100644
--- a/arch/s390/kvm/gaccess.c
+++ b/arch/s390/kvm/gaccess.c
@@ -491,8 +491,8 @@ enum prot_type {
 	PROT_TYPE_IEP = 4,
 };

-static int trans_exc(struct kvm_vcpu *vcpu, int code, unsigned long gva,
-		     u8 ar, enum gacc_mode mode, enum prot_type prot)
+static int trans_exc_ending(struct kvm_vcpu *vcpu, int code, unsigned long gva, u8 ar,
+			    enum gacc_mode mode, enum prot_type prot, bool terminate)
 {
 	struct kvm_s390_pgm_info *pgm = &vcpu->arch.pgm;
 	struct trans_exc_code_bits *tec;
@@ -520,6 +520,11 @@ static int trans_exc(struct kvm_vcpu *vcpu, int code, unsigned long gva,
 			tec->b61 = 1;
 			break;
 		}
+		if (terminate) {
+			tec->b56 = 0;
+			tec->b60 = 0;
+			tec->b61 = 0;
+		}
 		fallthrough;
 	case PGM_ASCE_TYPE:
 	case PGM_PAGE_TRANSLATION:
@@ -552,6 +557,12 @@ static int trans_exc(struct kvm_vcpu *vcpu, int code, unsigned long gva,
 	return code;
 }

+static int trans_exc(struct kvm_vcpu *vcpu, int code, unsigned long gva, u8 ar,
+		     enum gacc_mode mode, enum prot_type prot)
+{
+	return trans_exc_ending(vcpu, code, gva, ar, mode, prot, false);
+}
+
 static int get_vcpu_asce(struct kvm_vcpu *vcpu, union asce *asce,
 			 unsigned long ga, u8 ar, enum gacc_mode mode)
 {
@@ -1109,8 +1120,11 @@ int access_guest_with_key(struct kvm_vcpu *vcpu, unsigned long ga, u8 ar,
 		data += fragment_len;
 		ga = kvm_s390_logical_to_effective(vcpu, ga + fragment_len);
 	}
-	if (rc > 0)
-		rc = trans_exc(vcpu, rc, ga, ar, mode, prot);
+	if (rc > 0) {
+		bool terminate = (mode == GACC_STORE) && (idx > 0);
+
+		rc = trans_exc_ending(vcpu, rc, ga, ar, mode, prot, terminate);
+	}
 out_unlock:
 	if (need_ipte_lock)
 		ipte_unlock(vcpu);
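As a reading aid for the `terminate` hunk above, the following standalone C sketch models only the three TEID bits the patch clears. It is not kernel code: `struct teid_bits`, `enum prot_cause`, `protection_teid()` and `indicates_suppression()` are hypothetical names, the cause-to-bit mapping is an assumption, and treating "any of the three bits set" as "indicates suppression" is a simplification based on the documentation text in the patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-in for the three TEID bits the patch touches (names follow the diff). */
struct teid_bits {
	unsigned int b56 : 1;
	unsigned int b60 : 1;
	unsigned int b61 : 1;
};

/* Hypothetical subset of protection causes; the bit mapping is an assumption. */
enum prot_cause {
	CAUSE_KEY_CONTROLLED,	/* assumed to set b60 */
	CAUSE_DAT,		/* assumed to set b61 */
};

static struct teid_bits protection_teid(enum prot_cause cause, bool terminate)
{
	struct teid_bits tec = { 0, 0, 0 };

	assert(cause == CAUSE_KEY_CONTROLLED || cause == CAUSE_DAT);
	switch (cause) {
	case CAUSE_KEY_CONTROLLED:
		tec.b60 = 1;
		break;
	case CAUSE_DAT:
		tec.b61 = 1;
		break;
	}
	/* As in the patch: a terminating ending clears all three bits. */
	if (terminate) {
		tec.b56 = 0;
		tec.b60 = 0;
		tec.b61 = 0;
	}
	return tec;
}

/* Simplified reading: any of the three bits set means "suppression indicated". */
static bool indicates_suppression(struct teid_bits tec)
{
	return tec.b56 || tec.b60 || tec.b61;
}
```

In this model, `protection_teid(CAUSE_KEY_CONTROLLED, false)` keeps b60 set, while passing `terminate = true` yields an all-zero result regardless of the cause.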
If user space uses a memop to emulate an instruction and that
memop fails, the execution of the instruction ends.
Instruction execution can end in different ways, one of which is
suppression, which requires that the instruction execute like a no-op.
A writing memop that spans multiple pages and fails due to key
protection may have modified guest memory, as a result, the likely
correct ending is termination. Therefore, do not indicate a
suppressing instruction ending in this case.

Signed-off-by: Janis Schoetterl-Glausch <scgl@linux.ibm.com>
---
 Documentation/virt/kvm/api.rst |  6 ++++++
 arch/s390/kvm/gaccess.c        | 22 ++++++++++++++++++----
 2 files changed, 24 insertions(+), 4 deletions(-)
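The situation the commit message describes can be reproduced in a small user-space model. This is only a sketch under stated assumptions: `struct guest`, `struct store_result` and `guest_store_with_key()` are hypothetical, the key check is heavily simplified (access key 0 matches everything, otherwise the key must equal the page's key), and only stores are modelled, so the `mode == GACC_STORE` part of the real condition is implicit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define PAGE_SIZE 4096u
#define NUM_PAGES 4u

/* Hypothetical guest model: flat memory and one storage key per page. */
struct guest {
	unsigned char mem[NUM_PAGES * PAGE_SIZE];
	unsigned char skey[NUM_PAGES];
};

struct store_result {
	int rc;		/* 0 on success, 1 for a modelled protection exception */
	bool terminate;	/* true if memory was modified before the failure */
};

/* Store page fragment by page fragment, checking the key of each page. */
static struct store_result guest_store_with_key(struct guest *g, size_t addr,
						const unsigned char *src,
						size_t len,
						unsigned char access_key)
{
	struct store_result res = { 0, false };
	size_t idx = 0;

	assert(addr <= sizeof(g->mem) && len <= sizeof(g->mem) - addr);
	while (len > 0) {
		size_t page = addr / PAGE_SIZE;
		size_t fragment_len = PAGE_SIZE - addr % PAGE_SIZE;

		if (fragment_len > len)
			fragment_len = len;
		/* Simplified key check: access key 0 matches everything. */
		if (access_key != 0 && access_key != g->skey[page]) {
			res.rc = 1;
			/* Earlier fragments were already written. */
			res.terminate = idx > 0;
			return res;
		}
		memcpy(&g->mem[addr], src, fragment_len);
		addr += fragment_len;
		src += fragment_len;
		len -= fragment_len;
		idx++;
	}
	return res;
}
```

A store that fails on its first page reports `terminate == false` because nothing was written, while a store that fails on a later page has already dirtied the earlier ones and therefore reports `terminate == true`.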