[v3,1/2] KVM: s390: Don't indicate suppression on dirtying, failing memop

Message ID 20220512131019.2594948-2-scgl@linux.ibm.com (mailing list archive)
State New, archived
Series Dirtying, failing memop: don't indicate suppression

Commit Message

Janis Schoetterl-Glausch May 12, 2022, 1:10 p.m. UTC
If user space uses a memop to emulate an instruction and that
memop fails, the execution of the instruction ends.
Instruction execution can end in different ways, one of which is
suppression, which requires that the instruction execute like a no-op.
A writing memop that spans multiple pages and fails due to key
protection may have modified guest memory; as a result, the likely
correct ending is termination. Therefore, do not indicate a
suppressing instruction ending in this case.

Signed-off-by: Janis Schoetterl-Glausch <scgl@linux.ibm.com>
---
 Documentation/virt/kvm/api.rst |  6 ++++++
 arch/s390/kvm/gaccess.c        | 22 ++++++++++++++++++----
 2 files changed, 24 insertions(+), 4 deletions(-)
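
To make the failure mode from the commit message concrete, here is a minimal user-space sketch; it is not kernel code. It models a write that proceeds one page fragment at a time and checks a per-page key before each fragment. The toy_* names, the two-page guest, and the reduced key check (key 0 or an exact match permits the store) are assumptions made for illustration only; real storage key protection has more cases.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define TOY_PAGE_SIZE 4096u
#define TOY_NR_PAGES  2u

/* Toy guest (hypothetical model): flat memory plus one key per page. */
struct toy_guest {
	unsigned char mem[TOY_NR_PAGES * TOY_PAGE_SIZE];
	unsigned char skey[TOY_NR_PAGES];
};

/* Simplified key check (assumption): key 0 or an exact match permits the store. */
static bool toy_key_allows_store(unsigned char access_key, unsigned char storage_key)
{
	return access_key == 0 || access_key == storage_key;
}

/*
 * Write len bytes at guest address ga, one page fragment at a time.
 * Returns 0 on success, -1 on a simulated protection exception.
 * *dirtied reports whether any fragment was written before the failure.
 */
static int toy_write(struct toy_guest *g, size_t ga, const void *data,
		     size_t len, unsigned char access_key, bool *dirtied)
{
	const unsigned char *src = data;

	assert(ga + len <= sizeof(g->mem));
	*dirtied = false;
	while (len > 0) {
		size_t page = ga / TOY_PAGE_SIZE;
		size_t frag = TOY_PAGE_SIZE - ga % TOY_PAGE_SIZE;

		if (frag > len)
			frag = len;
		if (!toy_key_allows_store(access_key, g->skey[page]))
			return -1;	/* earlier fragments stay written */
		memcpy(&g->mem[ga], src, frag);
		*dirtied = true;
		ga += frag;
		src += frag;
		len -= frag;
	}
	return 0;
}
```

With page keys {1, 2} and access key 1, an 8-byte write that starts 4 bytes before the page boundary fails on the second page after the first 4 bytes have already landed. *dirtied is then true, so reporting the instruction as suppressed would be inaccurate.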

Comments

David Hildenbrand May 12, 2022, 1:22 p.m. UTC | #1
On 12.05.22 15:10, Janis Schoetterl-Glausch wrote:
> If user space uses a memop to emulate an instruction and that
> memop fails, the execution of the instruction ends.
> Instruction execution can end in different ways, one of which is
> suppression, which requires that the instruction execute like a no-op.
> A writing memop that spans multiple pages and fails due to key
> protection may have modified guest memory, as a result, the likely
> correct ending is termination. Therefore, do not indicate a
> suppressing instruction ending in this case.

I think that is possibly problematic handling.

In TCG we stumbled over similar issues in the past for MVC when crossing
page boundaries. Failing after modifying the first page already
seriously broke some user space, because the guest would retry the
instruction after fixing up the fault reason on the second page: if
source and destination operands overlap, you'll be in trouble because
the input parameters already changed.
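
The retry hazard described above can be reproduced with a small stand-alone sketch. It assumes a strictly left-to-right, byte-by-byte copy with overlapping operands and an arbitrary interruption point; bytewise_copy and its limit parameter are hypothetical and are not taken from QEMU or KVM.

```c
#include <assert.h>
#include <stddef.h>

/*
 * MVC-style copy inside one buffer: byte by byte, left to right.
 * Stops after at most `limit` bytes to simulate a fault in the middle
 * of the operation. Returns the number of bytes actually copied.
 */
static size_t bytewise_copy(unsigned char *mem, size_t mem_len, size_t dst,
			    size_t src, size_t len, size_t limit)
{
	size_t i;

	assert(dst + len <= mem_len && src + len <= mem_len);
	for (i = 0; i < len && i < limit; i++)
		mem[dst + i] = mem[src + i];
	return i;
}
```

Copying 4 bytes from offset 2 to offset 0 of "ABCDEFGH" yields "CDEFEFGH" when the copy runs once to completion. Interrupting it after 3 bytes and then retrying the whole copy yields "EDEFEFGH", because the retry reads a source byte that the first attempt already overwrote.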

For this reason, in TCG we make sure that all accesses are valid before
starting modifications.

See target/s390x/tcg/mem_helper.c:do_helper_mvc with access_prepare()
and friends as an example.
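
A minimal sketch of the validate-first idea follows. It is not how access_prepare() is implemented in QEMU; it only illustrates probing every page of the range before the first byte is written. The sketch_* names and the reduced key check (key 0 or an exact match permits the store) are assumptions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 4096u

/* Simplified key check (assumption): key 0 or an exact match permits the store. */
static bool sketch_key_ok(unsigned char access_key, unsigned char storage_key)
{
	return access_key == 0 || access_key == storage_key;
}

/*
 * All-or-nothing write: probe every page the range touches, then copy.
 * skeys[] holds one key per page of mem[]. Returns 0 on success and -1 if
 * any page would fault; on failure mem[] is left untouched.
 */
static int sketch_write_checked(unsigned char *mem, const unsigned char *skeys,
				size_t mem_len, size_t ga, const void *data,
				size_t len, unsigned char access_key)
{
	size_t first, last, page;

	assert(len > 0 && ga + len <= mem_len);
	first = ga / SKETCH_PAGE_SIZE;
	last = (ga + len - 1) / SKETCH_PAGE_SIZE;

	/* Phase 1: validate all pages before modifying anything. */
	for (page = first; page <= last; page++)
		if (!sketch_key_ok(access_key, skeys[page]))
			return -1;

	/* Phase 2: every page passed, so the copy cannot fail halfway. */
	memcpy(mem + ga, data, len);
	return 0;
}
```

Because the failure is detected before any store, a retried instruction sees its original operands. The sketch ignores concurrency: in a real implementation the checked state would have to remain valid until the copy completes.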

Now, I don't know how to tackle that for KVM, I just wanted to raise
awareness that injecting an interrupt after modifying page content is
possibly dodgy and dangerous.

Christian Borntraeger May 12, 2022, 1:51 p.m. UTC | #2
Am 12.05.22 um 15:22 schrieb David Hildenbrand:
> On 12.05.22 15:10, Janis Schoetterl-Glausch wrote:
>> If user space uses a memop to emulate an instruction and that
>> memop fails, the execution of the instruction ends.
>> Instruction execution can end in different ways, one of which is
>> suppression, which requires that the instruction execute like a no-op.
>> A writing memop that spans multiple pages and fails due to key
>> protection may have modified guest memory, as a result, the likely
>> correct ending is termination. Therefore, do not indicate a
>> suppressing instruction ending in this case.
> 
> I think that is possibly problematic handling.
> 
> In TCG we stumbled over similar issues in the past for MVC when crossing
> page boundaries. Failing after modifying the first page already
> seriously broke some user space, because the guest would retry the
> instruction after fixing up the fault reason on the second page: if
> source and destination operands overlap, you'll be in trouble because
> the input parameters already changed.
> 
> For this reason, in TCG we make sure that all accesses are valid before
> starting modifications.
> 
> See target/s390x/tcg/mem_helper.c:do_helper_mvc with access_prepare()
> and friends as an example.
> 
> Now, I don't know how to tackle that for KVM, I just wanted to raise
> awareness that injecting an interrupt after modifying page content is
> possibly dodgy and dangerous.

This is really special and applies only to key protection when crossing pages.
It's been done that way on z/VM since the 70s. The architecture
is and was always written in a way to allow termination for this
case for hypervisors.

David Hildenbrand May 12, 2022, 3:50 p.m. UTC | #3
On 12.05.22 15:51, Christian Borntraeger wrote:
> 
> 
> Am 12.05.22 um 15:22 schrieb David Hildenbrand:
>> On 12.05.22 15:10, Janis Schoetterl-Glausch wrote:
>>> If user space uses a memop to emulate an instruction and that
>>> memop fails, the execution of the instruction ends.
>>> Instruction execution can end in different ways, one of which is
>>> suppression, which requires that the instruction execute like a no-op.
>>> A writing memop that spans multiple pages and fails due to key
>>> protection may have modified guest memory, as a result, the likely
>>> correct ending is termination. Therefore, do not indicate a
>>> suppressing instruction ending in this case.
>>
>> I think that is possibly problematic handling.
>>
>> In TCG we stumbled over similar issues in the past for MVC when crossing
>> page boundaries. Failing after modifying the first page already
>> seriously broke some user space, because the guest would retry the
>> instruction after fixing up the fault reason on the second page: if
>> source and destination operands overlap, you'll be in trouble because
>> the input parameters already changed.
>>
>> For this reason, in TCG we make sure that all accesses are valid before
>> starting modifications.
>>
>> See target/s390x/tcg/mem_helper.c:do_helper_mvc with access_prepare()
>> and friends as an example.
>>
>> Now, I don't know how to tackle that for KVM, I just wanted to raise
>> awareness that injecting an interrupt after modifying page content is
>> possibly dodgy and dangerous.
> 
> This is really special and applies only to key protection when crossing pages.
> It's been done that way on z/VM since the 70s. The architecture
> is and was always written in a way to allow termination for this
> case for hypervisors.

Just so I understand correctly: all instructions that a hypervisor with
hardware virtualization is supposed to emulate are "written in a way to
allow termination", correct? That makes things a lot easier.

Christian Borntraeger May 12, 2022, 4:26 p.m. UTC | #4
Am 12.05.22 um 17:50 schrieb David Hildenbrand:
> On 12.05.22 15:51, Christian Borntraeger wrote:
>>
>>
>> Am 12.05.22 um 15:22 schrieb David Hildenbrand:
>>> On 12.05.22 15:10, Janis Schoetterl-Glausch wrote:
>>>> If user space uses a memop to emulate an instruction and that
>>>> memop fails, the execution of the instruction ends.
>>>> Instruction execution can end in different ways, one of which is
>>>> suppression, which requires that the instruction execute like a no-op.
>>>> A writing memop that spans multiple pages and fails due to key
>>>> protection may have modified guest memory, as a result, the likely
>>>> correct ending is termination. Therefore, do not indicate a
>>>> suppressing instruction ending in this case.
>>>
>>> I think that is possibly problematic handling.
>>>
>>> In TCG we stumbled over similar issues in the past for MVC when crossing
>>> page boundaries. Failing after modifying the first page already
>>> seriously broke some user space, because the guest would retry the
>>> instruction after fixing up the fault reason on the second page: if
>>> source and destination operands overlap, you'll be in trouble because
>>> the input parameters already changed.
>>>
>>> For this reason, in TCG we make sure that all accesses are valid before
>>> starting modifications.
>>>
>>> See target/s390x/tcg/mem_helper.c:do_helper_mvc with access_prepare()
>>> and friends as an example.
>>>
>>> Now, I don't know how to tackle that for KVM, I just wanted to raise
>>> awareness that injecting an interrupt after modifying page content is
>>> possibly dodgy and dangerous.
>>
>> This is really special and applies only to key protection when crossing pages.
>> It's been done that way on z/VM since the 70s. The architecture
>> is and was always written in a way to allow termination for this
>> case for hypervisors.
> 
> Just so I understand correctly: all instructions that a hypervisor with
> hardware virtualization is supposed to emulate are "written in a way to
> allow termination", correct? That makes things a lot easier.

Only for key protection. Key protection can always be terminating, no matter
what the instruction says. This is historical baggage: key protection used to
result in abends, killing the process. So it does not matter whether we
provide the extra info, as in enhanced suppression on protection, since nobody
is making use of that (apart from debuggers, maybe).

David Hildenbrand May 12, 2022, 4:40 p.m. UTC | #5
On 12.05.22 18:26, Christian Borntraeger wrote:
> 
> 
> Am 12.05.22 um 17:50 schrieb David Hildenbrand:
>> On 12.05.22 15:51, Christian Borntraeger wrote:
>>>
>>>
>>> Am 12.05.22 um 15:22 schrieb David Hildenbrand:
>>>> On 12.05.22 15:10, Janis Schoetterl-Glausch wrote:
>>>>> If user space uses a memop to emulate an instruction and that
>>>>> memop fails, the execution of the instruction ends.
>>>>> Instruction execution can end in different ways, one of which is
>>>>> suppression, which requires that the instruction execute like a no-op.
>>>>> A writing memop that spans multiple pages and fails due to key
>>>>> protection may have modified guest memory, as a result, the likely
>>>>> correct ending is termination. Therefore, do not indicate a
>>>>> suppressing instruction ending in this case.
>>>>
>>>> I think that is possibly problematic handling.
>>>>
>>>> In TCG we stumbled over similar issues in the past for MVC when crossing
>>>> page boundaries. Failing after modifying the first page already
>>>> seriously broke some user space, because the guest would retry the
>>>> instruction after fixing up the fault reason on the second page: if
>>>> source and destination operands overlap, you'll be in trouble because
>>>> the input parameters already changed.
>>>>
>>>> For this reason, in TCG we make sure that all accesses are valid before
>>>> starting modifications.
>>>>
>>>> See target/s390x/tcg/mem_helper.c:do_helper_mvc with access_prepare()
>>>> and friends as an example.
>>>>
>>>> Now, I don't know how to tackle that for KVM, I just wanted to raise
>>>> awareness that injecting an interrupt after modifying page content is
>>>> possibly dodgy and dangerous.
>>>
>>> This is really special and applies only to key protection when crossing pages.
>>> It's been done that way on z/VM since the 70s. The architecture
>>> is and was always written in a way to allow termination for this
>>> case for hypervisors.
>>
>> Just so I understand correctly: all instructions that a hypervisor with
>> hardware virtualization is supposed to emulate are "written in a way to
>> allow termination", correct? That makes things a lot easier.
> 
> Only for key protection. Key protection can always be terminating, no matter
> what the instruction says. This is historical baggage: key protection used to
> result in abends, killing the process. So it does not matter whether we
> provide the extra info, as in enhanced suppression on protection, since nobody
> is making use of that (apart from debuggers, maybe).

Got it, makes sense then. Thanks for clarifying!

Christian Borntraeger May 17, 2022, 12:25 p.m. UTC | #6
Am 12.05.22 um 15:10 schrieb Janis Schoetterl-Glausch:
> If user space uses a memop to emulate an instruction and that
> memop fails, the execution of the instruction ends.
> Instruction execution can end in different ways, one of which is
> suppression, which requires that the instruction execute like a no-op.
> A writing memop that spans multiple pages and fails due to key
> protection may have modified guest memory, as a result, the likely
> correct ending is termination. Therefore, do not indicate a
> suppressing instruction ending in this case.
> 
> Signed-off-by: Janis Schoetterl-Glausch <scgl@linux.ibm.com>

Reviewed-by: Christian Borntraeger <borntraeger@linux.ibm.com>

> ---
>   Documentation/virt/kvm/api.rst |  6 ++++++
>   arch/s390/kvm/gaccess.c        | 22 ++++++++++++++++++----
>   2 files changed, 24 insertions(+), 4 deletions(-)
> 
> diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
> index 4a900cdbc62e..b6aba4f50db7 100644
> --- a/Documentation/virt/kvm/api.rst
> +++ b/Documentation/virt/kvm/api.rst
> @@ -3754,12 +3754,18 @@ in case of KVM_S390_MEMOP_F_CHECK_ONLY), the ioctl returns a positive
>   error number indicating the type of exception. This exception is also
>   raised directly at the corresponding VCPU if the flag
>   KVM_S390_MEMOP_F_INJECT_EXCEPTION is set.
> +On protection exceptions, unless specified otherwise, the injected
> +translation-exception identifier (TEID) indicates suppression.
>   
>   If the KVM_S390_MEMOP_F_SKEY_PROTECTION flag is set, storage key
>   protection is also in effect and may cause exceptions if accesses are
>   prohibited given the access key designated by "key"; the valid range is 0..15.
>   KVM_S390_MEMOP_F_SKEY_PROTECTION is available if KVM_CAP_S390_MEM_OP_EXTENSION
>   is > 0.
> +Since the accessed memory may span multiple pages and those pages might have
> +different storage keys, it is possible that a protection exception occurs
> +after memory has been modified. In this case, if the exception is injected,
> +the TEID does not indicate suppression.
>   
>   Absolute read/write:
>   ^^^^^^^^^^^^^^^^^^^^
> diff --git a/arch/s390/kvm/gaccess.c b/arch/s390/kvm/gaccess.c
> index d53a183c2005..227ed0009354 100644
> --- a/arch/s390/kvm/gaccess.c
> +++ b/arch/s390/kvm/gaccess.c
> @@ -491,8 +491,8 @@ enum prot_type {
>   	PROT_TYPE_IEP  = 4,
>   };
>   
> -static int trans_exc(struct kvm_vcpu *vcpu, int code, unsigned long gva,
> -		     u8 ar, enum gacc_mode mode, enum prot_type prot)
> +static int trans_exc_ending(struct kvm_vcpu *vcpu, int code, unsigned long gva, u8 ar,
> +			    enum gacc_mode mode, enum prot_type prot, bool terminate)
>   {
>   	struct kvm_s390_pgm_info *pgm = &vcpu->arch.pgm;
>   	struct trans_exc_code_bits *tec;
> @@ -520,6 +520,11 @@ static int trans_exc(struct kvm_vcpu *vcpu, int code, unsigned long gva,
>   			tec->b61 = 1;
>   			break;
>   		}
> +		if (terminate) {
> +			tec->b56 = 0;
> +			tec->b60 = 0;
> +			tec->b61 = 0;
> +		}
>   		fallthrough;
>   	case PGM_ASCE_TYPE:
>   	case PGM_PAGE_TRANSLATION:
> @@ -552,6 +557,12 @@ static int trans_exc(struct kvm_vcpu *vcpu, int code, unsigned long gva,
>   	return code;
>   }
>   
> +static int trans_exc(struct kvm_vcpu *vcpu, int code, unsigned long gva, u8 ar,
> +		     enum gacc_mode mode, enum prot_type prot)
> +{
> +	return trans_exc_ending(vcpu, code, gva, ar, mode, prot, false);
> +}
> +
>   static int get_vcpu_asce(struct kvm_vcpu *vcpu, union asce *asce,
>   			 unsigned long ga, u8 ar, enum gacc_mode mode)
>   {
> @@ -1109,8 +1120,11 @@ int access_guest_with_key(struct kvm_vcpu *vcpu, unsigned long ga, u8 ar,
>   		data += fragment_len;
>   		ga = kvm_s390_logical_to_effective(vcpu, ga + fragment_len);
>   	}
> -	if (rc > 0)
> -		rc = trans_exc(vcpu, rc, ga, ar, mode, prot);
> +	if (rc > 0) {
> +		bool terminate = (mode == GACC_STORE) && (idx > 0);
> +
> +		rc = trans_exc_ending(vcpu, rc, ga, ar, mode, prot, terminate);
> +	}
>   out_unlock:
>   	if (need_ipte_lock)
>   		ipte_unlock(vcpu);

Claudio Imbrenda May 17, 2022, 2:45 p.m. UTC | #7
On Thu, 12 May 2022 15:10:17 +0200
Janis Schoetterl-Glausch <scgl@linux.ibm.com> wrote:

> If user space uses a memop to emulate an instruction and that
> memop fails, the execution of the instruction ends.
> Instruction execution can end in different ways, one of which is
> suppression, which requires that the instruction execute like a no-op.
> A writing memop that spans multiple pages and fails due to key
> protection may have modified guest memory, as a result, the likely
> correct ending is termination. Therefore, do not indicate a
> suppressing instruction ending in this case.
> 
> Signed-off-by: Janis Schoetterl-Glausch <scgl@linux.ibm.com>

Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com>

> ---
>  Documentation/virt/kvm/api.rst |  6 ++++++
>  arch/s390/kvm/gaccess.c        | 22 ++++++++++++++++++----
>  2 files changed, 24 insertions(+), 4 deletions(-)
> 
> diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
> index 4a900cdbc62e..b6aba4f50db7 100644
> --- a/Documentation/virt/kvm/api.rst
> +++ b/Documentation/virt/kvm/api.rst
> @@ -3754,12 +3754,18 @@ in case of KVM_S390_MEMOP_F_CHECK_ONLY), the ioctl returns a positive
>  error number indicating the type of exception. This exception is also
>  raised directly at the corresponding VCPU if the flag
>  KVM_S390_MEMOP_F_INJECT_EXCEPTION is set.
> +On protection exceptions, unless specified otherwise, the injected
> +translation-exception identifier (TEID) indicates suppression.
>  
>  If the KVM_S390_MEMOP_F_SKEY_PROTECTION flag is set, storage key
>  protection is also in effect and may cause exceptions if accesses are
>  prohibited given the access key designated by "key"; the valid range is 0..15.
>  KVM_S390_MEMOP_F_SKEY_PROTECTION is available if KVM_CAP_S390_MEM_OP_EXTENSION
>  is > 0.
> +Since the accessed memory may span multiple pages and those pages might have
> +different storage keys, it is possible that a protection exception occurs
> +after memory has been modified. In this case, if the exception is injected,
> +the TEID does not indicate suppression.
>  
>  Absolute read/write:
>  ^^^^^^^^^^^^^^^^^^^^
> diff --git a/arch/s390/kvm/gaccess.c b/arch/s390/kvm/gaccess.c
> index d53a183c2005..227ed0009354 100644
> --- a/arch/s390/kvm/gaccess.c
> +++ b/arch/s390/kvm/gaccess.c
> @@ -491,8 +491,8 @@ enum prot_type {
>  	PROT_TYPE_IEP  = 4,
>  };
>  
> -static int trans_exc(struct kvm_vcpu *vcpu, int code, unsigned long gva,
> -		     u8 ar, enum gacc_mode mode, enum prot_type prot)
> +static int trans_exc_ending(struct kvm_vcpu *vcpu, int code, unsigned long gva, u8 ar,
> +			    enum gacc_mode mode, enum prot_type prot, bool terminate)
>  {
>  	struct kvm_s390_pgm_info *pgm = &vcpu->arch.pgm;
>  	struct trans_exc_code_bits *tec;
> @@ -520,6 +520,11 @@ static int trans_exc(struct kvm_vcpu *vcpu, int code, unsigned long gva,
>  			tec->b61 = 1;
>  			break;
>  		}
> +		if (terminate) {
> +			tec->b56 = 0;
> +			tec->b60 = 0;
> +			tec->b61 = 0;
> +		}
>  		fallthrough;
>  	case PGM_ASCE_TYPE:
>  	case PGM_PAGE_TRANSLATION:
> @@ -552,6 +557,12 @@ static int trans_exc(struct kvm_vcpu *vcpu, int code, unsigned long gva,
>  	return code;
>  }
>  
> +static int trans_exc(struct kvm_vcpu *vcpu, int code, unsigned long gva, u8 ar,
> +		     enum gacc_mode mode, enum prot_type prot)
> +{
> +	return trans_exc_ending(vcpu, code, gva, ar, mode, prot, false);
> +}
> +
>  static int get_vcpu_asce(struct kvm_vcpu *vcpu, union asce *asce,
>  			 unsigned long ga, u8 ar, enum gacc_mode mode)
>  {
> @@ -1109,8 +1120,11 @@ int access_guest_with_key(struct kvm_vcpu *vcpu, unsigned long ga, u8 ar,
>  		data += fragment_len;
>  		ga = kvm_s390_logical_to_effective(vcpu, ga + fragment_len);
>  	}
> -	if (rc > 0)
> -		rc = trans_exc(vcpu, rc, ga, ar, mode, prot);
> +	if (rc > 0) {
> +		bool terminate = (mode == GACC_STORE) && (idx > 0);
> +
> +		rc = trans_exc_ending(vcpu, rc, ga, ar, mode, prot, terminate);
> +	}
>  out_unlock:
>  	if (need_ipte_lock)
>  		ipte_unlock(vcpu);
Patch

diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index 4a900cdbc62e..b6aba4f50db7 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -3754,12 +3754,18 @@  in case of KVM_S390_MEMOP_F_CHECK_ONLY), the ioctl returns a positive
 error number indicating the type of exception. This exception is also
 raised directly at the corresponding VCPU if the flag
 KVM_S390_MEMOP_F_INJECT_EXCEPTION is set.
+On protection exceptions, unless specified otherwise, the injected
+translation-exception identifier (TEID) indicates suppression.
 
 If the KVM_S390_MEMOP_F_SKEY_PROTECTION flag is set, storage key
 protection is also in effect and may cause exceptions if accesses are
 prohibited given the access key designated by "key"; the valid range is 0..15.
 KVM_S390_MEMOP_F_SKEY_PROTECTION is available if KVM_CAP_S390_MEM_OP_EXTENSION
 is > 0.
+Since the accessed memory may span multiple pages and those pages might have
+different storage keys, it is possible that a protection exception occurs
+after memory has been modified. In this case, if the exception is injected,
+the TEID does not indicate suppression.
 
 Absolute read/write:
 ^^^^^^^^^^^^^^^^^^^^
diff --git a/arch/s390/kvm/gaccess.c b/arch/s390/kvm/gaccess.c
index d53a183c2005..227ed0009354 100644
--- a/arch/s390/kvm/gaccess.c
+++ b/arch/s390/kvm/gaccess.c
@@ -491,8 +491,8 @@  enum prot_type {
 	PROT_TYPE_IEP  = 4,
 };
 
-static int trans_exc(struct kvm_vcpu *vcpu, int code, unsigned long gva,
-		     u8 ar, enum gacc_mode mode, enum prot_type prot)
+static int trans_exc_ending(struct kvm_vcpu *vcpu, int code, unsigned long gva, u8 ar,
+			    enum gacc_mode mode, enum prot_type prot, bool terminate)
 {
 	struct kvm_s390_pgm_info *pgm = &vcpu->arch.pgm;
 	struct trans_exc_code_bits *tec;
@@ -520,6 +520,11 @@  static int trans_exc(struct kvm_vcpu *vcpu, int code, unsigned long gva,
 			tec->b61 = 1;
 			break;
 		}
+		if (terminate) {
+			tec->b56 = 0;
+			tec->b60 = 0;
+			tec->b61 = 0;
+		}
 		fallthrough;
 	case PGM_ASCE_TYPE:
 	case PGM_PAGE_TRANSLATION:
@@ -552,6 +557,12 @@  static int trans_exc(struct kvm_vcpu *vcpu, int code, unsigned long gva,
 	return code;
 }
 
+static int trans_exc(struct kvm_vcpu *vcpu, int code, unsigned long gva, u8 ar,
+		     enum gacc_mode mode, enum prot_type prot)
+{
+	return trans_exc_ending(vcpu, code, gva, ar, mode, prot, false);
+}
+
 static int get_vcpu_asce(struct kvm_vcpu *vcpu, union asce *asce,
 			 unsigned long ga, u8 ar, enum gacc_mode mode)
 {
@@ -1109,8 +1120,11 @@  int access_guest_with_key(struct kvm_vcpu *vcpu, unsigned long ga, u8 ar,
 		data += fragment_len;
 		ga = kvm_s390_logical_to_effective(vcpu, ga + fragment_len);
 	}
-	if (rc > 0)
-		rc = trans_exc(vcpu, rc, ga, ar, mode, prot);
+	if (rc > 0) {
+		bool terminate = (mode == GACC_STORE) && (idx > 0);
+
+		rc = trans_exc_ending(vcpu, rc, ga, ar, mode, prot, terminate);
+	}
 out_unlock:
 	if (need_ipte_lock)
 		ipte_unlock(vcpu);
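
The decision the patch adds can be restated as a stand-alone sketch. This is not the kernel implementation: the sketch_* names are hypothetical stand-ins, the bit-field holds only the three TEID bits the patch touches, and idx is assumed to count the page fragments that were processed before the failure.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel's enum gacc_mode values. */
enum sketch_mode { SKETCH_FETCH, SKETCH_STORE };

/* Only the three TEID bits the patch touches; field names follow the patch. */
struct sketch_tec {
	unsigned int b56 : 1;
	unsigned int b60 : 1;
	unsigned int b61 : 1;
};

/* Mirrors "terminate = (mode == GACC_STORE) && (idx > 0)" from the patch. */
static bool sketch_should_terminate(enum sketch_mode mode, unsigned int idx)
{
	return mode == SKETCH_STORE && idx > 0;
}

/* Mirrors the new "if (terminate)" branch: clear the three bits. */
static void sketch_apply_ending(struct sketch_tec *tec, bool terminate)
{
	assert(tec != NULL);
	if (terminate) {
		tec->b56 = 0;
		tec->b60 = 0;
		tec->b61 = 0;
	}
}
```

A store that fails on its first fragment (idx == 0) has not modified memory, so the bits are left as they were and suppression is still indicated. Only a store that fails on a later fragment has the three bits cleared; fetches never do.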