
[GIT,PULL,7/9] KVM: s390: Backup the guest's machine check info

Message ID 1498671044-81240-8-git-send-email-borntraeger@de.ibm.com (mailing list archive)
State New, archived

Commit Message

Christian Borntraeger June 28, 2017, 5:30 p.m. UTC
From: QingFeng Hao <haoqf@linux.vnet.ibm.com>

When a machine check happens in the guest, related mcck info (mcic,
external damage code, ...) is stored in the vcpu's lowcore on the host.
Then the machine check handler's low-level part is executed, followed
by the high-level part.

If the high-level part's execution is interrupted by a new machine check
happening on the same vcpu on the host, the mcck info in the lowcore is
overwritten with the new machine check's data.

If the high-level part's execution is scheduled to a different cpu,
the mcck info in the lowcore is uncertain.

Therefore, in both cases, a later reinjection into the guest would use
the wrong data.
Let's back up the mcck info from the lowcore to the sie page
for that later reinjection, so that the right data is used.

Add a new member to struct sie_page to store the machine check's
mcic, failing storage address and external damage code.

Signed-off-by: QingFeng Hao <haoqf@linux.vnet.ibm.com>
Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
---
 arch/s390/include/asm/kvm_host.h | 17 ++++++++++++++++-
 arch/s390/kernel/nmi.c           | 34 ++++++++++++++++++++++++++++++++++
 arch/s390/kvm/kvm-s390.c         |  1 +
 3 files changed, 51 insertions(+), 1 deletion(-)

Comments

David Hildenbrand June 28, 2017, 6:20 p.m. UTC | #1
On 28.06.2017 19:30, Christian Borntraeger wrote:
> From: QingFeng Hao <haoqf@linux.vnet.ibm.com>
> 
> When a machine check happens in the guest, related mcck info (mcic,
> external damage code, ...) is stored in the vcpu's lowcore on the host.
> Then the machine check handler's low-level part is executed, followed
> by the high-level part.
> 
> If the high-level part's execution is interrupted by a new machine check
> happening on the same vcpu on the host, the mcck info in the lowcore is
> overwritten with the new machine check's data.
> 
> If the high-level part's execution is scheduled to a different cpu,
> the mcck info in the lowcore is uncertain.
> 
> Therefore, for both cases, the further reinjection to the guest will use
> the wrong data.
> Let's backup the mcck info in the lowcore to the sie page
> for further reinjection, so that the right data will be used.
> 
> Add new member into struct sie_page to store related machine check's
> info of mcic, failing storage address and external damage code.
> 


When this happens while the guest is running, there will be some
registers written into the low core save area (gprs, cr etc.) during the
machine check. Are these always host registers? Or can these be guest
registers?

Also, do the "valid" flags always refer to guest or host bits?

If they can be guest bits, I think we would have to do more translation.
And most likely treat vSIE special.

Or will something like that always lead to a host crash, i.e. will real
machine errors still take the host down rather than the guest?
Christian Borntraeger June 28, 2017, 6:37 p.m. UTC | #2
On 06/28/2017 08:20 PM, David Hildenbrand wrote:
> On 28.06.2017 19:30, Christian Borntraeger wrote:
>> From: QingFeng Hao <haoqf@linux.vnet.ibm.com>
>>
>> When a machine check happens in the guest, related mcck info (mcic,
>> external damage code, ...) is stored in the vcpu's lowcore on the host.
>> Then the machine check handler's low-level part is executed, followed
>> by the high-level part.
>>
>> If the high-level part's execution is interrupted by a new machine check
>> happening on the same vcpu on the host, the mcck info in the lowcore is
>> overwritten with the new machine check's data.
>>
>> If the high-level part's execution is scheduled to a different cpu,
>> the mcck info in the lowcore is uncertain.
>>
>> Therefore, for both cases, the further reinjection to the guest will use
>> the wrong data.
>> Let's backup the mcck info in the lowcore to the sie page
>> for further reinjection, so that the right data will be used.
>>
>> Add new member into struct sie_page to store related machine check's
>> info of mcic, failing storage address and external damage code.
>>
> 
> 
> When this happens while the guest is running, there will be some
> registers written into the low core save area (gprs, cr etc.) during the
> machine check. Are these always host registers? Or can these be guest
> registers?

Always host registers (the machine will exit SIE before presenting the machine
check).
> 
> Also, do the "valid" flags always refer to guest or host bits?

The host bits.
> 
> If they can be guest bits, I think we would have to do more translation.
> And most likely treat vSIE special.
> 
> Or will something like that always lead to a host crash and real machine
> errors will still take the host down, and not the guest?
> 
>
David Hildenbrand June 28, 2017, 6:56 p.m. UTC | #3
On 28.06.2017 20:37, Christian Borntraeger wrote:
> On 06/28/2017 08:20 PM, David Hildenbrand wrote:
>> On 28.06.2017 19:30, Christian Borntraeger wrote:
>>> From: QingFeng Hao <haoqf@linux.vnet.ibm.com>
>>>
>>> When a machine check happens in the guest, related mcck info (mcic,
>>> external damage code, ...) is stored in the vcpu's lowcore on the host.
>>> Then the machine check handler's low-level part is executed, followed
>>> by the high-level part.
>>>
>>> If the high-level part's execution is interrupted by a new machine check
>>> happening on the same vcpu on the host, the mcck info in the lowcore is
>>> overwritten with the new machine check's data.
>>>
>>> If the high-level part's execution is scheduled to a different cpu,
>>> the mcck info in the lowcore is uncertain.
>>>
>>> Therefore, for both cases, the further reinjection to the guest will use
>>> the wrong data.
>>> Let's backup the mcck info in the lowcore to the sie page
>>> for further reinjection, so that the right data will be used.
>>>
>>> Add new member into struct sie_page to store related machine check's
>>> info of mcic, failing storage address and external damage code.
>>>
>>
>>
>> When this happens while the guest is running, there will be some
>> registers written into the low core save area (gprs, cr etc.) during the
>> machine check. Are these always host registers? Or can these be guest
>> registers?
> 
> Always host registers (the machine will exit SIE before presenting the machine
> check).
>>
>> Also, do the "valid" flags always refer to guest or host bits?
> 
> The host bits.

Okay, then why is mcic extracted from the real machine check and passed
on to the guest (almost unmodified)?

Wouldn't all guest registers then always have to be valid?

Also, if the guest does not have vector registers enabled, but the host
does, and there is a machine check, the guest would suddenly see the
vector-register validity flag set, which is strange.

For ordinary crw machine checks, we properly take care of that (QEMU
build_channel_report_mcic()).
Christian Borntraeger June 28, 2017, 7:14 p.m. UTC | #4
On 06/28/2017 08:56 PM, David Hildenbrand wrote:
> On 28.06.2017 20:37, Christian Borntraeger wrote:
>> On 06/28/2017 08:20 PM, David Hildenbrand wrote:
>>> On 28.06.2017 19:30, Christian Borntraeger wrote:
>>>> From: QingFeng Hao <haoqf@linux.vnet.ibm.com>
>>>>
>>>> When a machine check happens in the guest, related mcck info (mcic,
>>>> external damage code, ...) is stored in the vcpu's lowcore on the host.
>>>> Then the machine check handler's low-level part is executed, followed
>>>> by the high-level part.
>>>>
>>>> If the high-level part's execution is interrupted by a new machine check
>>>> happening on the same vcpu on the host, the mcck info in the lowcore is
>>>> overwritten with the new machine check's data.
>>>>
>>>> If the high-level part's execution is scheduled to a different cpu,
>>>> the mcck info in the lowcore is uncertain.
>>>>
>>>> Therefore, for both cases, the further reinjection to the guest will use
>>>> the wrong data.
>>>> Let's backup the mcck info in the lowcore to the sie page
>>>> for further reinjection, so that the right data will be used.
>>>>
>>>> Add new member into struct sie_page to store related machine check's
>>>> info of mcic, failing storage address and external damage code.
>>>>
>>>
>>>
>>> When this happens while the guest is running, there will be some
>>> registers written into the low core save area (gprs, cr etc.) during the
>>> machine check. Are these always host registers? Or can these be guest
>>> registers?
>>
>> Always host registers (the machine will exit SIE before presenting the machine
>> check).
>>>
>>> Also, do the "valid" flags always refer to guest or host bits?
>>
>> The host bits.
> 
> Okay, then why is mcic extracted from the real machine check and passed
> on to the guest (almost unmodified)?
> 
> Wouldn't all guest registers than have to be always valid?

No. Several guest registers are shared with host registers and reloaded lazily.
For example if the floating point registers are invalid, they will be invalid 
for the host (but the registers contain the guest values).
From a KVM perspective we can always turn off valid bits in case of doubt
(overindicate errors). So the current variant is certainly not perfect, but it
at least will not claim registers are correct when they are not.
We can certainly improve the handling in a future patch (e.g. turn on validity
for the control registers); I will have a look.

> 
> Also, if the guest does not have vector registers enabled, but the host
> does, and there is a machine check, the guest would suddenly have the
> vector registers valid flag indicated, which is strange.

Yes, this should be masked away. Will provide a fixup patch.
> 
> For ordinary crw machine checks, we properly take care of that (QEMU
> build_channel_report_mcic()).
> 
>
David Hildenbrand June 28, 2017, 7:46 p.m. UTC | #5
> No. Several guest registers are shared with host registers and reloaded lazily.
> For example if the floating point registers are invalid, they will be invalid 
> for the host (but the registers contain the guest values).
> From a KVM perspective we can always turn off valid bits in case of doubts 
> (overindicate errors). So the current variant is certainly not perfect, but it
> at least will not claim correct registers when they are incorrect.
> We can certainly improve the handling in a future patch. (e.g. turn on validity
> for control registers), I will have a look.
> 

As far as I understand, that would then mean enabling all validity
flags for registers that are not shared. Makes sense.
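
For illustration, the masking discussed in this thread could look roughly
like the standalone sketch below. The bit position MCIC_VR_VALID_BITNR, the
helper name sanitize_guest_mcic() and the guest_has_vx parameter are
assumptions made for this sketch; the authoritative MCIC layout is union mci
in arch/s390/include/asm/nmi.h, and the fixup patch Christian mentions may
implement this differently.

#include <stdbool.h>
#include <stdint.h>

/*
 * Illustrative sketch only (not part of this patch): before reinjecting a
 * host mcic into a guest that has no vector facility, clear the
 * vector-register validity bit so the guest never sees a validity
 * indication for a register set it does not have.  s390 numbers MCIC bits
 * from the leftmost (most significant) bit; the position used here is an
 * assumption.
 */
#define MCIC_VR_VALID_BITNR	25	/* assumed vector-register validity bit */

static uint64_t sanitize_guest_mcic(uint64_t mcic, bool guest_has_vx)
{
	if (!guest_has_vx)
		mcic &= ~(1ULL << (63 - MCIC_VR_VALID_BITNR));
	return mcic;
}

The same pattern would apply to any other validity bit that refers to state
the guest cannot see.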

Patch

diff --git a/arch/s390/include/asm/kvm_host.h b/arch/s390/include/asm/kvm_host.h
index 426614a..c6e1d5f 100644
--- a/arch/s390/include/asm/kvm_host.h
+++ b/arch/s390/include/asm/kvm_host.h
@@ -107,6 +107,20 @@  struct esca_block {
 	struct esca_entry cpu[KVM_S390_ESCA_CPU_SLOTS];
 } __packed;
 
+/*
+ * This struct is used to store some machine check info from lowcore
+ * for machine checks that happen while the guest is running.
+ * This info in host's lowcore might be overwritten by a second machine
+ * check from host when host is in the machine check's high-level handling.
+ * The size is 24 bytes.
+ */
+struct mcck_volatile_info {
+	__u64 mcic;
+	__u64 failing_storage_address;
+	__u32 ext_damage_code;
+	__u32 reserved;
+};
+
 #define CPUSTAT_STOPPED    0x80000000
 #define CPUSTAT_WAIT       0x10000000
 #define CPUSTAT_ECALL_PEND 0x08000000
@@ -264,7 +278,8 @@  struct kvm_s390_itdb {
 
 struct sie_page {
 	struct kvm_s390_sie_block sie_block;
-	__u8 reserved200[1024];		/* 0x0200 */
+	struct mcck_volatile_info mcck_info;	/* 0x0200 */
+	__u8 reserved218[1000];		/* 0x0218 */
 	struct kvm_s390_itdb itdb;	/* 0x0600 */
 	__u8 reserved700[2304];		/* 0x0700 */
 } __packed;
diff --git a/arch/s390/kernel/nmi.c b/arch/s390/kernel/nmi.c
index 958cc33..31d03a8 100644
--- a/arch/s390/kernel/nmi.c
+++ b/arch/s390/kernel/nmi.c
@@ -26,6 +26,7 @@ 
 #include <asm/switch_to.h>
 #include <asm/ctl_reg.h>
 #include <asm/asm-offsets.h>
+#include <linux/kvm_host.h>
 
 struct mcck_struct {
 	unsigned int kill_task : 1;
@@ -275,6 +276,31 @@  static int notrace s390_validate_registers(union mci mci, int umode)
 	return kill_task;
 }
 
+/*
+ * Backup the guest's machine check info to its description block
+ */
+static void notrace s390_backup_mcck_info(struct pt_regs *regs)
+{
+	struct mcck_volatile_info *mcck_backup;
+	struct sie_page *sie_page;
+
+	/* r14 contains the sie block, which was set in sie64a */
+	struct kvm_s390_sie_block *sie_block =
+			(struct kvm_s390_sie_block *) regs->gprs[14];
+
+	if (sie_block == NULL)
+		/* Something's seriously wrong, stop system. */
+		s390_handle_damage();
+
+	sie_page = container_of(sie_block, struct sie_page, sie_block);
+	mcck_backup = &sie_page->mcck_info;
+	mcck_backup->mcic = S390_lowcore.mcck_interruption_code &
+				~(MCCK_CODE_CP | MCCK_CODE_EXT_DAMAGE);
+	mcck_backup->ext_damage_code = S390_lowcore.external_damage_code;
+	mcck_backup->failing_storage_address
+			= S390_lowcore.failing_storage_address;
+}
+
 #define MAX_IPD_COUNT	29
 #define MAX_IPD_TIME	(5 * 60 * USEC_PER_SEC) /* 5 minutes */
 
@@ -355,6 +381,14 @@  void notrace s390_do_machine_check(struct pt_regs *regs)
 		mcck->mcck_code = mci.val;
 		set_cpu_flag(CIF_MCCK_PENDING);
 	}
+
+	/*
+	 * Backup the machine check's info if it happens when the guest
+	 * is running.
+	 */
+	if (test_cpu_flag(CIF_MCCK_GUEST))
+		s390_backup_mcck_info(regs);
+
 	if (mci.cd) {
 		/* Timing facility damage */
 		s390_handle_damage();
diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
index 689ac48..0457e03 100644
--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -2069,6 +2069,7 @@  struct kvm_vcpu *kvm_arch_vcpu_create(struct kvm *kvm,
 	if (!vcpu)
 		goto out;
 
+	BUILD_BUG_ON(sizeof(struct sie_page) != 4096);
 	sie_page = (struct sie_page *) get_zeroed_page(GFP_KERNEL);
 	if (!sie_page)
 		goto out_free_cpu;
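
Not shown in this patch is the consumer side: the backed-up mcck_info only
helps once something reads it back after SIE exits because of a machine
check. A rough sketch of what that could look like follows; the wrapper
consume_backed_up_mcck() and the helper reinject_mcck_into_guest() are
hypothetical names, not code from this series.

#include <linux/kvm_host.h>

/* hypothetical reinjection helper, assumed to come from a follow-up patch */
void reinject_mcck_into_guest(struct kvm_vcpu *vcpu,
			      struct mcck_volatile_info *mcck_info);

/*
 * Sketch only: after SIE exits because a machine check interrupted the
 * guest, fetch the info that s390_backup_mcck_info() saved into the sie
 * page and hand it to the reinjection helper.
 */
static void consume_backed_up_mcck(struct kvm_vcpu *vcpu)
{
	struct sie_page *sie_page;
	struct mcck_volatile_info *mcck_info;

	sie_page = container_of(vcpu->arch.sie_block,
				struct sie_page, sie_block);
	mcck_info = &sie_page->mcck_info;
	reinject_mcck_into_guest(vcpu, mcck_info);
}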