[v2,2/2] s390/kvm: handle diagnose 318 instruction call

Message ID	1544135405-22385-3-git-send-email-walling@linux.ibm.com (mailing list archive)
State	New, archived
Headers	show Return-Path: <kvm-owner@kernel.org> Gateway: Authorized Use Only! Violators will be prosecuted for <kvm@vger.kernel.org> from <walling@linux.ibm.com>; Thu, 6 Dec 2018 22:30:31 -0000 Gateway: Authorized Use Only! Violators will be prosecuted; (version=TLSv1/SSLv3 cipher=AES256-GCM-SHA384 bits=256/256) Thu, 6 Dec 2018 22:30:29 -0000 From: Collin Walling <walling@linux.ibm.com> To: linux-s390@vger.kernel.org, kvm@vger.kernel.org, borntraeger@de.ibm.com, david@redhat.com, cohuck@redhat.com, heiko.carstens@de.ibm.com, schwidefsky@de.ibm.com, frankja@linux.ibm.com Subject: [PATCH v2 2/2] s390/kvm: handle diagnose 318 instruction call Date: Thu, 6 Dec 2018 17:30:05 -0500 In-Reply-To: <1544135405-22385-1-git-send-email-walling@linux.ibm.com> References: <1544135405-22385-1-git-send-email-walling@linux.ibm.com> Message-Id: <1544135405-22385-3-git-send-email-walling@linux.ibm.com> Sender: kvm-owner@vger.kernel.org Precedence: bulk
Series	Use DIAG318 to set Control Program Name & Version Codes \| expand [v2,0/2] Use DIAG318 to set Control Program Name & Version Codes [v2,1/2] s390/setup: set control program code via diag 318 [v2,2/2] s390/kvm: handle diagnose 318 instruction call

Collin Walling Dec. 6, 2018, 10:30 p.m. UTC

The diagnose 318 instruction is a privileged instruction that must be
interpreted by SIE and handled via KVM.

The control program name and version codes (CPNC and CPVC) set by this
instruction are saved to the kvm->arch struct. The CPNC is also set in
the SIE control block of all VCPUs. The new kvm_s390_set_machine
interface is introduced for migration.

Signed-off-by: Collin Walling <walling@linux.ibm.com>
Reviewed-by: Janosch Frank <frankja@linux.ibm.com>
---
 arch/s390/include/asm/kvm_host.h | 13 +++++-
 arch/s390/include/uapi/asm/kvm.h |  5 +++
 arch/s390/kvm/diag.c             | 12 ++++++
 arch/s390/kvm/kvm-s390.c         | 92 ++++++++++++++++++++++++++++++++++++++++
 arch/s390/kvm/kvm-s390.h         |  1 +
 arch/s390/kvm/vsie.c             |  7 +++
 6 files changed, 129 insertions(+), 1 deletion(-)

Cornelia Huck Dec. 7, 2018, 10:56 a.m. UTC | #1

On Thu,  6 Dec 2018 17:30:05 -0500
Collin Walling <walling@linux.ibm.com> wrote:

> The diagnose 318 instruction is a privileged instruction that must be
> interpreted by SIE and handled via KVM.
> 
> The control program name and version codes (CPNC and CPVC) set by this
> instruction are saved to the kvm->arch struct. The CPNC is also set in
> the SIE control block of all VCPUs. The new kvm_s390_set_machine
> interface is introduced for migration.
> 
> Signed-off-by: Collin Walling <walling@linux.ibm.com>
> Reviewed-by: Janosch Frank <frankja@linux.ibm.com>
> ---
>  arch/s390/include/asm/kvm_host.h | 13 +++++-
>  arch/s390/include/uapi/asm/kvm.h |  5 +++
>  arch/s390/kvm/diag.c             | 12 ++++++
>  arch/s390/kvm/kvm-s390.c         | 92 ++++++++++++++++++++++++++++++++++++++++
>  arch/s390/kvm/kvm-s390.h         |  1 +
>  arch/s390/kvm/vsie.c             |  7 +++
>  6 files changed, 129 insertions(+), 1 deletion(-)
> 

(...)

> diff --git a/arch/s390/include/uapi/asm/kvm.h b/arch/s390/include/uapi/asm/kvm.h
> index 16511d9..6420aad 100644
> --- a/arch/s390/include/uapi/asm/kvm.h
> +++ b/arch/s390/include/uapi/asm/kvm.h
> @@ -74,6 +74,7 @@ struct kvm_s390_io_adapter_req {
>  #define KVM_S390_VM_CRYPTO		2
>  #define KVM_S390_VM_CPU_MODEL		3
>  #define KVM_S390_VM_MIGRATION		4
> +#define KVM_S390_VM_MACHINE		5

Please update Documentation/virtual/kvm/api.txt for this as well.

[Looking at the documentation, I also notice that
KVM_S390_VM_CRYPTO_{EN,DIS}ABLE_APIE seem to lack an entry in that
file. Obviously not within the scope of this patch :)]

>  
>  /* kvm attributes for mem_ctrl */
>  #define KVM_S390_VM_MEM_ENABLE_CMMA	0
> @@ -130,6 +131,7 @@ struct kvm_s390_vm_cpu_machine {
>  #define KVM_S390_VM_CPU_FEAT_PFMFI	11
>  #define KVM_S390_VM_CPU_FEAT_SIGPIF	12
>  #define KVM_S390_VM_CPU_FEAT_KSS	13
> +#define KVM_S390_VM_CPU_FEAT_DIAG318	14
>  struct kvm_s390_vm_cpu_feat {
>  	__u64 feat[16];
>  };
> @@ -168,6 +170,9 @@ struct kvm_s390_vm_cpu_subfunc {
>  #define KVM_S390_VM_MIGRATION_START	1
>  #define KVM_S390_VM_MIGRATION_STATUS	2
>  
> +/* kvm attributes for KVM_S390_VM_MACHINE */
> +#define KVM_S390_VM_MACHINE_CPC	0
> +
>  /* for KVM_GET_REGS and KVM_SET_REGS */
>  struct kvm_regs {
>  	/* general purpose regs for s390 */

David Hildenbrand Dec. 7, 2018, 4:33 p.m. UTC | #2

On 06.12.18 23:30, Collin Walling wrote:
> The diagnose 318 instruction is a privileged instruction that must be
> interpreted by SIE and handled via KVM.
> 
> The control program name and version codes (CPNC and CPVC) set by this
> instruction are saved to the kvm->arch struct. The CPNC is also set in
> the SIE control block of all VCPUs. The new kvm_s390_set_machine
> interface is introduced for migration.
> 
> Signed-off-by: Collin Walling <walling@linux.ibm.com>
> Reviewed-by: Janosch Frank <frankja@linux.ibm.com>
> ---
>  arch/s390/include/asm/kvm_host.h | 13 +++++-
>  arch/s390/include/uapi/asm/kvm.h |  5 +++
>  arch/s390/kvm/diag.c             | 12 ++++++
>  arch/s390/kvm/kvm-s390.c         | 92 ++++++++++++++++++++++++++++++++++++++++
>  arch/s390/kvm/kvm-s390.h         |  1 +
>  arch/s390/kvm/vsie.c             |  7 +++
>  6 files changed, 129 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/s390/include/asm/kvm_host.h b/arch/s390/include/asm/kvm_host.h
> index d5d2488..ad42949 100644
> --- a/arch/s390/include/asm/kvm_host.h
> +++ b/arch/s390/include/asm/kvm_host.h
> @@ -229,7 +229,8 @@ struct kvm_s390_sie_block {
>  	__u32	scaol;			/* 0x0064 */
>  	__u8	reserved68;		/* 0x0068 */
>  	__u8    epdx;			/* 0x0069 */
> -	__u8    reserved6a[2];		/* 0x006a */
> +	__u8    cpnc;			/* 0x006a */

So, no field for cpvc? Can you give me a short summary how these values
are to be used. E.g. who will use the cpvc? Who can actually read both
values. A guest VCPU? Firmware? (well firmware obviously has no access
to cpvc)

> +	__u8	reserved6b;		/* 0x006b */
>  	__u32	todpr;			/* 0x006c */
>  #define GISA_FORMAT1 0x00000001
>  	__u32	gd;			/* 0x0070 */
> @@ -391,6 +392,7 @@ struct kvm_vcpu_stat {
>  	u64 diagnose_9c;
>  	u64 diagnose_258;
>  	u64 diagnose_308;
> +	u64 diagnose_318;
>  	u64 diagnose_500;
>  	u64 diagnose_other;
>  };
> @@ -804,6 +806,14 @@ struct kvm_s390_vsie {
>  	struct page *pages[KVM_MAX_VCPUS];
>  };
>  
> +union kvm_s390_diag318_info {
> +	u64 cpc;
> +	struct {
> +		u64 cpnc : 8;
> +		u64 cpvc : 56;
> +	};
> +};
> +
>  struct kvm_arch{
>  	void *sca;
>  	int use_esca;
> @@ -838,6 +848,7 @@ struct kvm_arch{
>  	/* subset of available cpu features enabled by user space */
>  	DECLARE_BITMAP(cpu_feat, KVM_S390_VM_CPU_FEAT_NR_BITS);
>  	struct kvm_s390_gisa *gisa;
> +	union kvm_s390_diag318_info diag318_info;
>  };
>  
>  #define KVM_HVA_ERR_BAD		(-1UL)
> diff --git a/arch/s390/include/uapi/asm/kvm.h b/arch/s390/include/uapi/asm/kvm.h
> index 16511d9..6420aad 100644
> --- a/arch/s390/include/uapi/asm/kvm.h
> +++ b/arch/s390/include/uapi/asm/kvm.h
> @@ -74,6 +74,7 @@ struct kvm_s390_io_adapter_req {
>  #define KVM_S390_VM_CRYPTO		2
>  #define KVM_S390_VM_CPU_MODEL		3
>  #define KVM_S390_VM_MIGRATION		4
> +#define KVM_S390_VM_MACHINE		5
>  
>  /* kvm attributes for mem_ctrl */
>  #define KVM_S390_VM_MEM_ENABLE_CMMA	0
> @@ -130,6 +131,7 @@ struct kvm_s390_vm_cpu_machine {
>  #define KVM_S390_VM_CPU_FEAT_PFMFI	11
>  #define KVM_S390_VM_CPU_FEAT_SIGPIF	12
>  #define KVM_S390_VM_CPU_FEAT_KSS	13
> +#define KVM_S390_VM_CPU_FEAT_DIAG318	14
>  struct kvm_s390_vm_cpu_feat {
>  	__u64 feat[16];
>  };
> @@ -168,6 +170,9 @@ struct kvm_s390_vm_cpu_subfunc {
>  #define KVM_S390_VM_MIGRATION_START	1
>  #define KVM_S390_VM_MIGRATION_STATUS	2
>  
> +/* kvm attributes for KVM_S390_VM_MACHINE */
> +#define KVM_S390_VM_MACHINE_CPC	0
> +
>  /* for KVM_GET_REGS and KVM_SET_REGS */
>  struct kvm_regs {
>  	/* general purpose regs for s390 */
> diff --git a/arch/s390/kvm/diag.c b/arch/s390/kvm/diag.c
> index 45634b3d..b61ffd2 100644
> --- a/arch/s390/kvm/diag.c
> +++ b/arch/s390/kvm/diag.c
> @@ -235,6 +235,16 @@ static int __diag_virtio_hypercall(struct kvm_vcpu *vcpu)
>  	return ret < 0 ? ret : 0;
>  }
>  
> +static int __diag_set_control_prog_name(struct kvm_vcpu *vcpu)
> +{
> +	unsigned int reg = (vcpu->arch.sie_block->ipa & 0xf0) >> 4;
> +	unsigned long cpc = vcpu->run->s.regs.gprs[reg];

1. This should be a u64.

2. What if a !KVM_S390_VM_CPU_FEAT_DIAG318 ? Shouldn't there be an error
reported to the guest? (diag subcode not installed)

> +
> +	vcpu->stat.diagnose_318++;
> +	kvm_s390_set_cpc(vcpu->kvm, cpc);
> +	return 0;
> +}
> +
>  int kvm_s390_handle_diag(struct kvm_vcpu *vcpu)
>  {
>  	int code = kvm_s390_get_base_disp_rs(vcpu, NULL) & 0xffff;
> @@ -254,6 +264,8 @@ int kvm_s390_handle_diag(struct kvm_vcpu *vcpu)
>  		return __diag_page_ref_service(vcpu);
>  	case 0x308:
>  		return __diag_ipl_functions(vcpu);
> +	case 0x318:
> +		return __diag_set_control_prog_name(vcpu);
>  	case 0x500:
>  		return __diag_virtio_hypercall(vcpu);
>  	default:
> diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
> index fe24150..67aa34a 100644
> --- a/arch/s390/kvm/kvm-s390.c
> +++ b/arch/s390/kvm/kvm-s390.c
> @@ -157,6 +157,7 @@ struct kvm_stats_debugfs_item debugfs_entries[] = {
>  	{ "instruction_diag_9c", VCPU_STAT(diagnose_9c) },
>  	{ "instruction_diag_258", VCPU_STAT(diagnose_258) },
>  	{ "instruction_diag_308", VCPU_STAT(diagnose_308) },
> +	{ "instruction_diag_318", VCPU_STAT(diagnose_318) },
>  	{ "instruction_diag_500", VCPU_STAT(diagnose_500) },
>  	{ "instruction_diag_other", VCPU_STAT(diagnose_other) },
>  	{ NULL }
> @@ -371,6 +372,10 @@ static void kvm_s390_cpu_feat_init(void)
>  
>  	if (MACHINE_HAS_ESOP)
>  		allow_cpu_feat(KVM_S390_VM_CPU_FEAT_ESOP);
> +
> +	/* Enable DIAG318 guest support unconditionally */
> +	allow_cpu_feat(KVM_S390_VM_CPU_FEAT_DIAG318);

Interesting. Is that the right thing to do?

While somebody can always write (__diag_set_control_prog_name emualtion
handler), I don't see a read handler.

How is that supposed to work e.g. on older hardware?

(I was expecting HW to handle reading, that's why we add it to the SCB-
Can you elaborate what the value stored in the SCB is actually used for?
Is a guest CPU not actually able to read that value somehow?)

> +
>  	/*
>  	 * We need SIE support, ESOP (PROT_READ protection for gmap_shadow),
>  	 * 64bit SCAO (SCA passthrough) and IDTE (for gmap_shadow unshadowing).
> @@ -1173,6 +1178,75 @@ static int kvm_s390_get_tod(struct kvm *kvm, struct kvm_device_attr *attr)
>  	return ret;
>  }
>  
> +void kvm_s390_set_cpc(struct kvm *kvm, u64 cpc)
> +{
> +	struct kvm_vcpu *vcpu;
> +	int i;
> +
> +	mutex_lock(&kvm->lock);
> +	kvm_s390_vcpu_block_all(kvm);

> +
> +	kvm->arch.diag318_info.cpc = cpc;
> +
> +	VM_EVENT(kvm, 3, "SET: cpnc: 0x%x cpvc: 0x%llx",
> +		 (u8)kvm->arch.diag318_info.cpnc, (u64)kvm->arch.diag318_info.cpvc);

The second cast (u64) might not be necessary.

> +
> +	kvm_for_each_vcpu(i, vcpu, kvm) {
> +		vcpu->arch.sie_block->cpnc = kvm->arch.diag318_info.cpnc;
> +	}
> +
> +	kvm_s390_vcpu_unblock_all(kvm);

Looks sane to me

> +	mutex_unlock(&kvm->lock);
> +}
> +
> +static int kvm_s390_vm_set_machine(struct kvm *kvm, struct kvm_device_attr *attr)
> +{
> +	int ret;
> +	u64 cpc;
> +
> +	switch (attr->attr) {
> +	case KVM_S390_VM_MACHINE_CPC:
> +		ret = -EFAULT;
> +		if (get_user(cpc, (u64 __user *)attr->addr))
> +			break;
> +		kvm_s390_set_cpc(kvm, cpc);
> +		ret = 0;
> +		break;
> +	default:
> +		ret = -ENXIO;
> +		break;
> +	}
> +	return ret;
> +}
> +
> +static int kvm_s390_get_cpc(struct kvm *kvm, struct kvm_device_attr *attr)
> +{
> +	u64 cpc = kvm->arch.diag318_info.cpc;
> +
> +	if (put_user(cpc, (u64 __user *)attr->addr))
> +		return -EFAULT;
> +
> +	VM_EVENT(kvm, 3, "QUERY: cpnc: 0x%x cpvc: 0x%llx",
> +		 (u8)kvm->arch.diag318_info.cpnc, (u64)kvm->arch.diag318_info.cpvc);
> +
> +	return 0;
> +}
> +
> +static int kvm_s390_get_misc(struct kvm *kvm, struct kvm_device_attr *attr)

get_machine

> +{
> +	int ret;
> +
> +	switch (attr->attr) {
> +	case KVM_S390_VM_MACHINE_CPC:
> +		ret = kvm_s390_get_cpc(kvm, attr);
> +		break;
> +	default:
> +		ret = -ENXIO;
> +		break;
> +	}
> +	return ret;
> +}
> +
>  static int kvm_s390_set_processor(struct kvm *kvm, struct kvm_device_attr *attr)
>  {
>  	struct kvm_s390_vm_cpu_processor *proc;
> @@ -1435,6 +1509,9 @@ static int kvm_s390_vm_set_attr(struct kvm *kvm, struct kvm_device_attr *attr)
>  	case KVM_S390_VM_MIGRATION:
>  		ret = kvm_s390_vm_set_migration(kvm, attr);
>  		break;
> +	case KVM_S390_VM_MACHINE:
> +		ret = kvm_s390_vm_set_machine(kvm, attr);
> +		break;
>  	default:
>  		ret = -ENXIO;
>  		break;
> @@ -1460,6 +1537,9 @@ static int kvm_s390_vm_get_attr(struct kvm *kvm, struct kvm_device_attr *attr)
>  	case KVM_S390_VM_MIGRATION:
>  		ret = kvm_s390_vm_get_migration(kvm, attr);
>  		break;
> +	case KVM_S390_VM_MACHINE:
> +		ret = kvm_s390_get_misc(kvm, attr);
> +		break;
>  	default:
>  		ret = -ENXIO;
>  		break;
> @@ -1534,6 +1614,16 @@ static int kvm_s390_vm_has_attr(struct kvm *kvm, struct kvm_device_attr *attr)
>  	case KVM_S390_VM_MIGRATION:
>  		ret = 0;
>  		break;
> +	case KVM_S390_VM_MACHINE:
> +		switch (attr->attr) {
> +		case KVM_S390_VM_MACHINE_CPC:
> +			ret = 0;
> +			break;
> +		default:
> +			ret = -ENXIO;
> +			break;
> +		}
> +		break;
>  	default:
>  		ret = -ENXIO;
>  		break;
> @@ -2650,7 +2740,9 @@ void kvm_arch_vcpu_postcreate(struct kvm_vcpu *vcpu)
>  	preempt_disable();
>  	vcpu->arch.sie_block->epoch = vcpu->kvm->arch.epoch;
>  	vcpu->arch.sie_block->epdx = vcpu->kvm->arch.epdx;
> +	vcpu->arch.sie_block->cpnc = vcpu->kvm->arch.diag318_info.cpnc;

Move that out of the preempt section. That is only needed for the
epoch/epdx. cpc is fully covered by the kvm lock.

>  	preempt_enable();
> +
>  	mutex_unlock(&vcpu->kvm->lock);
>  	if (!kvm_is_ucontrol(vcpu->kvm)) {
>  		vcpu->arch.gmap = vcpu->kvm->arch.gmap;
> diff --git a/arch/s390/kvm/kvm-s390.h b/arch/s390/kvm/kvm-s390.h
> index 1f6e36c..8137f15 100644
> --- a/arch/s390/kvm/kvm-s390.h
> +++ b/arch/s390/kvm/kvm-s390.h
> @@ -281,6 +281,7 @@ int kvm_s390_handle_sigp(struct kvm_vcpu *vcpu);
>  int kvm_s390_handle_sigp_pei(struct kvm_vcpu *vcpu);
>  
>  /* implemented in kvm-s390.c */
> +void kvm_s390_set_cpc(struct kvm *kvm, u64 cpc);
>  void kvm_s390_set_tod_clock(struct kvm *kvm,
>  			    const struct kvm_s390_vm_tod_clock *gtod);
>  long kvm_arch_fault_in_page(struct kvm_vcpu *vcpu, gpa_t gpa, int writable);
> diff --git a/arch/s390/kvm/vsie.c b/arch/s390/kvm/vsie.c
> index a153257..1a6a9b6 100644
> --- a/arch/s390/kvm/vsie.c
> +++ b/arch/s390/kvm/vsie.c
> @@ -397,6 +397,9 @@ static void unshadow_scb(struct kvm_vcpu *vcpu, struct vsie_page *vsie_page)
>  	memcpy(scb_o->gcr, scb_s->gcr, 128);
>  	scb_o->pp = scb_s->pp;
>  
> +	/* machine information */
> +	scb_o->cpnc = scb_s->cpnc;

Is unshadow needed? I think we have diag318 for that.

> +
>  	/* branch prediction */
>  	if (test_kvm_facility(vcpu->kvm, 82)) {
>  		scb_o->fpf &= ~FPF_BPBC;
> @@ -473,6 +476,10 @@ static int shadow_scb(struct kvm_vcpu *vcpu, struct vsie_page *vsie_page)
>  	scb_s->lctl = scb_o->lctl;
>  	scb_s->svcc = scb_o->svcc;
>  	scb_s->ictl = scb_o->ictl;
> +
> +	/* machine information */

Please use as the comment the full name (e.g. control program name and
version)

> +	scb_s->cpnc = scb_o->cpnc;
> +
>  	/*
>  	 * SKEY handling functions can't deal with false setting of PTE invalid
>  	 * bits. Therefore we cannot provide interpretation and would later
>

Collin Walling Dec. 11, 2018, 3:27 p.m. UTC | #3

On 12/7/18 5:56 AM, Cornelia Huck wrote:
> On Thu,  6 Dec 2018 17:30:05 -0500
> Collin Walling <walling@linux.ibm.com> wrote:
> 
>> The diagnose 318 instruction is a privileged instruction that must be
>> interpreted by SIE and handled via KVM.
>>
>> The control program name and version codes (CPNC and CPVC) set by this
>> instruction are saved to the kvm->arch struct. The CPNC is also set in
>> the SIE control block of all VCPUs. The new kvm_s390_set_machine
>> interface is introduced for migration.
>>
>> Signed-off-by: Collin Walling <walling@linux.ibm.com>
>> Reviewed-by: Janosch Frank <frankja@linux.ibm.com>
>> ---
>>  arch/s390/include/asm/kvm_host.h | 13 +++++-
>>  arch/s390/include/uapi/asm/kvm.h |  5 +++
>>  arch/s390/kvm/diag.c             | 12 ++++++
>>  arch/s390/kvm/kvm-s390.c         | 92 ++++++++++++++++++++++++++++++++++++++++
>>  arch/s390/kvm/kvm-s390.h         |  1 +
>>  arch/s390/kvm/vsie.c             |  7 +++
>>  6 files changed, 129 insertions(+), 1 deletion(-)
>>
> 
> (...)
> 
>> diff --git a/arch/s390/include/uapi/asm/kvm.h b/arch/s390/include/uapi/asm/kvm.h
>> index 16511d9..6420aad 100644
>> --- a/arch/s390/include/uapi/asm/kvm.h
>> +++ b/arch/s390/include/uapi/asm/kvm.h
>> @@ -74,6 +74,7 @@ struct kvm_s390_io_adapter_req {
>>  #define KVM_S390_VM_CRYPTO		2
>>  #define KVM_S390_VM_CPU_MODEL		3
>>  #define KVM_S390_VM_MIGRATION		4
>> +#define KVM_S390_VM_MACHINE		5
> 
> Please update Documentation/virtual/kvm/api.txt for this as well.
> 

Will do.

> [Looking at the documentation, I also notice that
> KVM_S390_VM_CRYPTO_{EN,DIS}ABLE_APIE seem to lack an entry in that
> file. Obviously not within the scope of this patch :)]
> 
>>  
>>  /* kvm attributes for mem_ctrl */
>>  #define KVM_S390_VM_MEM_ENABLE_CMMA	0
>> @@ -130,6 +131,7 @@ struct kvm_s390_vm_cpu_machine {
>>  #define KVM_S390_VM_CPU_FEAT_PFMFI	11
>>  #define KVM_S390_VM_CPU_FEAT_SIGPIF	12
>>  #define KVM_S390_VM_CPU_FEAT_KSS	13
>> +#define KVM_S390_VM_CPU_FEAT_DIAG318	14
>>  struct kvm_s390_vm_cpu_feat {
>>  	__u64 feat[16];
>>  };
>> @@ -168,6 +170,9 @@ struct kvm_s390_vm_cpu_subfunc {
>>  #define KVM_S390_VM_MIGRATION_START	1
>>  #define KVM_S390_VM_MIGRATION_STATUS	2
>>  
>> +/* kvm attributes for KVM_S390_VM_MACHINE */
>> +#define KVM_S390_VM_MACHINE_CPC	0
>> +
>>  /* for KVM_GET_REGS and KVM_SET_REGS */
>>  struct kvm_regs {
>>  	/* general purpose regs for s390 */
>

Collin Walling Dec. 11, 2018, 4:45 p.m. UTC | #4

On 12/7/18 11:33 AM, David Hildenbrand wrote:
> On 06.12.18 23:30, Collin Walling wrote:
>> The diagnose 318 instruction is a privileged instruction that must be
>> interpreted by SIE and handled via KVM.
>>
>> The control program name and version codes (CPNC and CPVC) set by this
>> instruction are saved to the kvm->arch struct. The CPNC is also set in
>> the SIE control block of all VCPUs. The new kvm_s390_set_machine
>> interface is introduced for migration.
>>
>> Signed-off-by: Collin Walling <walling@linux.ibm.com>
>> Reviewed-by: Janosch Frank <frankja@linux.ibm.com>
>> ---
>>  arch/s390/include/asm/kvm_host.h | 13 +++++-
>>  arch/s390/include/uapi/asm/kvm.h |  5 +++
>>  arch/s390/kvm/diag.c             | 12 ++++++
>>  arch/s390/kvm/kvm-s390.c         | 92 ++++++++++++++++++++++++++++++++++++++++
>>  arch/s390/kvm/kvm-s390.h         |  1 +
>>  arch/s390/kvm/vsie.c             |  7 +++
>>  6 files changed, 129 insertions(+), 1 deletion(-)
>>
>> diff --git a/arch/s390/include/asm/kvm_host.h b/arch/s390/include/asm/kvm_host.h
>> index d5d2488..ad42949 100644
>> --- a/arch/s390/include/asm/kvm_host.h
>> +++ b/arch/s390/include/asm/kvm_host.h
>> @@ -229,7 +229,8 @@ struct kvm_s390_sie_block {
>>  	__u32	scaol;			/* 0x0064 */
>>  	__u8	reserved68;		/* 0x0068 */
>>  	__u8    epdx;			/* 0x0069 */
>> -	__u8    reserved6a[2];		/* 0x006a */
>> +	__u8    cpnc;			/* 0x006a */
> 
> So, no field for cpvc? Can you give me a short summary how these values
> are to be used. E.g. who will use the cpvc? Who can actually read both
> values. A guest VCPU? Firmware? (well firmware obviously has no access
> to cpvc)
> 

Only the CPNC is architected to be stored in the SIE block... CPVC has to be held elsewhere.

The CPNC can be seen during a ring dump, which is helpful when a problem causes a CPU to
enter a check-stop state and we can't get a CEC dump.

We can really only see the CPVC from the host's perspective, perhaps during a guest dump 
or via VM_EVENT messages in the guest debug logs.

>> +	__u8	reserved6b;		/* 0x006b */
>>  	__u32	todpr;			/* 0x006c */
>>  #define GISA_FORMAT1 0x00000001
>>  	__u32	gd;			/* 0x0070 */
>> @@ -391,6 +392,7 @@ struct kvm_vcpu_stat {
>>  	u64 diagnose_9c;
>>  	u64 diagnose_258;
>>  	u64 diagnose_308;
>> +	u64 diagnose_318;
>>  	u64 diagnose_500;
>>  	u64 diagnose_other;
>>  };
>> @@ -804,6 +806,14 @@ struct kvm_s390_vsie {
>>  	struct page *pages[KVM_MAX_VCPUS];
>>  };
>>  
>> +union kvm_s390_diag318_info {
>> +	u64 cpc;
>> +	struct {
>> +		u64 cpnc : 8;
>> +		u64 cpvc : 56;
>> +	};
>> +};
>> +
>>  struct kvm_arch{
>>  	void *sca;
>>  	int use_esca;
>> @@ -838,6 +848,7 @@ struct kvm_arch{
>>  	/* subset of available cpu features enabled by user space */
>>  	DECLARE_BITMAP(cpu_feat, KVM_S390_VM_CPU_FEAT_NR_BITS);
>>  	struct kvm_s390_gisa *gisa;
>> +	union kvm_s390_diag318_info diag318_info;
>>  };
>>  
>>  #define KVM_HVA_ERR_BAD		(-1UL)
>> diff --git a/arch/s390/include/uapi/asm/kvm.h b/arch/s390/include/uapi/asm/kvm.h
>> index 16511d9..6420aad 100644
>> --- a/arch/s390/include/uapi/asm/kvm.h
>> +++ b/arch/s390/include/uapi/asm/kvm.h
>> @@ -74,6 +74,7 @@ struct kvm_s390_io_adapter_req {
>>  #define KVM_S390_VM_CRYPTO		2
>>  #define KVM_S390_VM_CPU_MODEL		3
>>  #define KVM_S390_VM_MIGRATION		4
>> +#define KVM_S390_VM_MACHINE		5
>>  
>>  /* kvm attributes for mem_ctrl */
>>  #define KVM_S390_VM_MEM_ENABLE_CMMA	0
>> @@ -130,6 +131,7 @@ struct kvm_s390_vm_cpu_machine {
>>  #define KVM_S390_VM_CPU_FEAT_PFMFI	11
>>  #define KVM_S390_VM_CPU_FEAT_SIGPIF	12
>>  #define KVM_S390_VM_CPU_FEAT_KSS	13
>> +#define KVM_S390_VM_CPU_FEAT_DIAG318	14
>>  struct kvm_s390_vm_cpu_feat {
>>  	__u64 feat[16];
>>  };
>> @@ -168,6 +170,9 @@ struct kvm_s390_vm_cpu_subfunc {
>>  #define KVM_S390_VM_MIGRATION_START	1
>>  #define KVM_S390_VM_MIGRATION_STATUS	2
>>  
>> +/* kvm attributes for KVM_S390_VM_MACHINE */
>> +#define KVM_S390_VM_MACHINE_CPC	0
>> +
>>  /* for KVM_GET_REGS and KVM_SET_REGS */
>>  struct kvm_regs {
>>  	/* general purpose regs for s390 */
>> diff --git a/arch/s390/kvm/diag.c b/arch/s390/kvm/diag.c
>> index 45634b3d..b61ffd2 100644
>> --- a/arch/s390/kvm/diag.c
>> +++ b/arch/s390/kvm/diag.c
>> @@ -235,6 +235,16 @@ static int __diag_virtio_hypercall(struct kvm_vcpu *vcpu)
>>  	return ret < 0 ? ret : 0;
>>  }
>>  
>> +static int __diag_set_control_prog_name(struct kvm_vcpu *vcpu)
>> +{
>> +	unsigned int reg = (vcpu->arch.sie_block->ipa & 0xf0) >> 4;
>> +	unsigned long cpc = vcpu->run->s.regs.gprs[reg];
> 
> 1. This should be a u64.
> 
> 2. What if a !KVM_S390_VM_CPU_FEAT_DIAG318 ? Shouldn't there be an error
> reported to the guest? (diag subcode not installed)
> 

Since we're emulating the diag318 feature entirely for a guest, and we're able
to fence its availability via CPU models in QEMU (which is not obvious from the
KVM code, admittedly), we can safely allow this feature if KVM can support it.

With both KVM and QEMU support, the guest kernel will be able to pass the 
if (sclp.has_diag318) check in patch #1 and execute the diag instruction (with 
KVM handling) safely.

The whole implementation of these values is a bit tricky, I admit. I find myself
going back-and-forth to the docs to make sure I have this info correct.

Please let me know if there's a better way to convey this information to someone
who might have to look back on this code. Perhaps some documentation on the new 
VM attribute (suggested by Conny) would suffice.

>> +
>> +	vcpu->stat.diagnose_318++;
>> +	kvm_s390_set_cpc(vcpu->kvm, cpc);
>> +	return 0;
>> +}
>> +
>>  int kvm_s390_handle_diag(struct kvm_vcpu *vcpu)
>>  {
>>  	int code = kvm_s390_get_base_disp_rs(vcpu, NULL) & 0xffff;
>> @@ -254,6 +264,8 @@ int kvm_s390_handle_diag(struct kvm_vcpu *vcpu)
>>  		return __diag_page_ref_service(vcpu);
>>  	case 0x308:
>>  		return __diag_ipl_functions(vcpu);
>> +	case 0x318:
>> +		return __diag_set_control_prog_name(vcpu);
>>  	case 0x500:
>>  		return __diag_virtio_hypercall(vcpu);
>>  	default:
>> diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
>> index fe24150..67aa34a 100644
>> --- a/arch/s390/kvm/kvm-s390.c
>> +++ b/arch/s390/kvm/kvm-s390.c
>> @@ -157,6 +157,7 @@ struct kvm_stats_debugfs_item debugfs_entries[] = {
>>  	{ "instruction_diag_9c", VCPU_STAT(diagnose_9c) },
>>  	{ "instruction_diag_258", VCPU_STAT(diagnose_258) },
>>  	{ "instruction_diag_308", VCPU_STAT(diagnose_308) },
>> +	{ "instruction_diag_318", VCPU_STAT(diagnose_318) },
>>  	{ "instruction_diag_500", VCPU_STAT(diagnose_500) },
>>  	{ "instruction_diag_other", VCPU_STAT(diagnose_other) },
>>  	{ NULL }
>> @@ -371,6 +372,10 @@ static void kvm_s390_cpu_feat_init(void)
>>  
>>  	if (MACHINE_HAS_ESOP)
>>  		allow_cpu_feat(KVM_S390_VM_CPU_FEAT_ESOP);
>> +
>> +	/* Enable DIAG318 guest support unconditionally */
>> +	allow_cpu_feat(KVM_S390_VM_CPU_FEAT_DIAG318);
> 
> Interesting. Is that the right thing to do?
> 
> While somebody can always write (__diag_set_control_prog_name emualtion
> handler), I don't see a read handler.
> 
> How is that supposed to work e.g. on older hardware?
> 
> (I was expecting HW to handle reading, that's why we add it to the SCB-
> Can you elaborate what the value stored in the SCB is actually used for?
> Is a guest CPU not actually able to read that value somehow?)
> 

A CPU (or VCPU) should have a copy of the CPNC stored in a local register. 
I'm sorry I can't be super helpful in my explanation, as I am not 100%
familiar with the ring dump process, but perhaps the CPU retrieves the
value from the SCB during a ring dump? I'll have to phone-a-friend on this
one.

Please see my explanation to your questions above and let me know if that
covers this question as well.

I think a more informative comment in the code would be helpful here.

>> +
>>  	/*
>>  	 * We need SIE support, ESOP (PROT_READ protection for gmap_shadow),
>>  	 * 64bit SCAO (SCA passthrough) and IDTE (for gmap_shadow unshadowing).
>> @@ -1173,6 +1178,75 @@ static int kvm_s390_get_tod(struct kvm *kvm, struct kvm_device_attr *attr)
>>  	return ret;
>>  }
>>  
>> +void kvm_s390_set_cpc(struct kvm *kvm, u64 cpc)
>> +{
>> +	struct kvm_vcpu *vcpu;
>> +	int i;
>> +
>> +	mutex_lock(&kvm->lock);
>> +	kvm_s390_vcpu_block_all(kvm);
> 
>> +
>> +	kvm->arch.diag318_info.cpc = cpc;
>> +
>> +	VM_EVENT(kvm, 3, "SET: cpnc: 0x%x cpvc: 0x%llx",
>> +		 (u8)kvm->arch.diag318_info.cpnc, (u64)kvm->arch.diag318_info.cpvc);
> 
> The second cast (u64) might not be necessary.
> 

You might be right. I'll double-check if the compiler yells at me.

>> +
>> +	kvm_for_each_vcpu(i, vcpu, kvm) {
>> +		vcpu->arch.sie_block->cpnc = kvm->arch.diag318_info.cpnc;
>> +	}
>> +
>> +	kvm_s390_vcpu_unblock_all(kvm);
> 
> Looks sane to me
> 

:)

>> +	mutex_unlock(&kvm->lock);
>> +}
>> +
>> +static int kvm_s390_vm_set_machine(struct kvm *kvm, struct kvm_device_attr *attr)
>> +{
>> +	int ret;
>> +	u64 cpc;
>> +
>> +	switch (attr->attr) {
>> +	case KVM_S390_VM_MACHINE_CPC:
>> +		ret = -EFAULT;
>> +		if (get_user(cpc, (u64 __user *)attr->addr))
>> +			break;
>> +		kvm_s390_set_cpc(kvm, cpc);
>> +		ret = 0;
>> +		break;
>> +	default:
>> +		ret = -ENXIO;
>> +		break;
>> +	}
>> +	return ret;
>> +}
>> +
>> +static int kvm_s390_get_cpc(struct kvm *kvm, struct kvm_device_attr *attr)
>> +{
>> +	u64 cpc = kvm->arch.diag318_info.cpc;
>> +
>> +	if (put_user(cpc, (u64 __user *)attr->addr))
>> +		return -EFAULT;
>> +
>> +	VM_EVENT(kvm, 3, "QUERY: cpnc: 0x%x cpvc: 0x%llx",
>> +		 (u8)kvm->arch.diag318_info.cpnc, (u64)kvm->arch.diag318_info.cpvc);
>> +
>> +	return 0;
>> +}
>> +
>> +static int kvm_s390_get_misc(struct kvm *kvm, struct kvm_device_attr *attr)
> 
> get_machine
> 

Whoops, thanks!

>> +{
>> +	int ret;
>> +
>> +	switch (attr->attr) {
>> +	case KVM_S390_VM_MACHINE_CPC:
>> +		ret = kvm_s390_get_cpc(kvm, attr);
>> +		break;
>> +	default:
>> +		ret = -ENXIO;
>> +		break;
>> +	}
>> +	return ret;
>> +}
>> +
>>  static int kvm_s390_set_processor(struct kvm *kvm, struct kvm_device_attr *attr)
>>  {
>>  	struct kvm_s390_vm_cpu_processor *proc;
>> @@ -1435,6 +1509,9 @@ static int kvm_s390_vm_set_attr(struct kvm *kvm, struct kvm_device_attr *attr)
>>  	case KVM_S390_VM_MIGRATION:
>>  		ret = kvm_s390_vm_set_migration(kvm, attr);
>>  		break;
>> +	case KVM_S390_VM_MACHINE:
>> +		ret = kvm_s390_vm_set_machine(kvm, attr);
>> +		break;
>>  	default:
>>  		ret = -ENXIO;
>>  		break;
>> @@ -1460,6 +1537,9 @@ static int kvm_s390_vm_get_attr(struct kvm *kvm, struct kvm_device_attr *attr)
>>  	case KVM_S390_VM_MIGRATION:
>>  		ret = kvm_s390_vm_get_migration(kvm, attr);
>>  		break;
>> +	case KVM_S390_VM_MACHINE:
>> +		ret = kvm_s390_get_misc(kvm, attr);
>> +		break;
>>  	default:
>>  		ret = -ENXIO;
>>  		break;
>> @@ -1534,6 +1614,16 @@ static int kvm_s390_vm_has_attr(struct kvm *kvm, struct kvm_device_attr *attr)
>>  	case KVM_S390_VM_MIGRATION:
>>  		ret = 0;
>>  		break;
>> +	case KVM_S390_VM_MACHINE:
>> +		switch (attr->attr) {
>> +		case KVM_S390_VM_MACHINE_CPC:
>> +			ret = 0;
>> +			break;
>> +		default:
>> +			ret = -ENXIO;
>> +			break;
>> +		}
>> +		break;
>>  	default:
>>  		ret = -ENXIO;
>>  		break;
>> @@ -2650,7 +2740,9 @@ void kvm_arch_vcpu_postcreate(struct kvm_vcpu *vcpu)
>>  	preempt_disable();
>>  	vcpu->arch.sie_block->epoch = vcpu->kvm->arch.epoch;
>>  	vcpu->arch.sie_block->epdx = vcpu->kvm->arch.epdx;
>> +	vcpu->arch.sie_block->cpnc = vcpu->kvm->arch.diag318_info.cpnc;
> 
> Move that out of the preempt section. That is only needed for the
> epoch/epdx. cpc is fully covered by the kvm lock.
> 

Will do.

>>  	preempt_enable();
>> +
>>  	mutex_unlock(&vcpu->kvm->lock);
>>  	if (!kvm_is_ucontrol(vcpu->kvm)) {
>>  		vcpu->arch.gmap = vcpu->kvm->arch.gmap;
>> diff --git a/arch/s390/kvm/kvm-s390.h b/arch/s390/kvm/kvm-s390.h
>> index 1f6e36c..8137f15 100644
>> --- a/arch/s390/kvm/kvm-s390.h
>> +++ b/arch/s390/kvm/kvm-s390.h
>> @@ -281,6 +281,7 @@ int kvm_s390_handle_sigp(struct kvm_vcpu *vcpu);
>>  int kvm_s390_handle_sigp_pei(struct kvm_vcpu *vcpu);
>>  
>>  /* implemented in kvm-s390.c */
>> +void kvm_s390_set_cpc(struct kvm *kvm, u64 cpc);
>>  void kvm_s390_set_tod_clock(struct kvm *kvm,
>>  			    const struct kvm_s390_vm_tod_clock *gtod);
>>  long kvm_arch_fault_in_page(struct kvm_vcpu *vcpu, gpa_t gpa, int writable);
>> diff --git a/arch/s390/kvm/vsie.c b/arch/s390/kvm/vsie.c
>> index a153257..1a6a9b6 100644
>> --- a/arch/s390/kvm/vsie.c
>> +++ b/arch/s390/kvm/vsie.c
>> @@ -397,6 +397,9 @@ static void unshadow_scb(struct kvm_vcpu *vcpu, struct vsie_page *vsie_page)
>>  	memcpy(scb_o->gcr, scb_s->gcr, 128);
>>  	scb_o->pp = scb_s->pp;
>>  
>> +	/* machine information */
>> +	scb_o->cpnc = scb_s->cpnc;
> 
> Is unshadow needed? I think we have diag318 for that.
> 

Don't we need to make sure the CPNC in the SCB for a level 3+ guest is set correctly? I don't
know if we support anything other than "linux on linux on linux" so-to-speak, but if we do
then we need to make sure that when diag318 is called we correctly convey that value to the
correct SCB.

Correct me if I have my understanding of SIE / vSIE handling wrong here.

>> +
>>  	/* branch prediction */
>>  	if (test_kvm_facility(vcpu->kvm, 82)) {
>>  		scb_o->fpf &= ~FPF_BPBC;
>> @@ -473,6 +476,10 @@ static int shadow_scb(struct kvm_vcpu *vcpu, struct vsie_page *vsie_page)
>>  	scb_s->lctl = scb_o->lctl;
>>  	scb_s->svcc = scb_o->svcc;
>>  	scb_s->ictl = scb_o->ictl;
>> +
>> +	/* machine information */
> 
> Please use as the comment the full name (e.g. control program name and
> version)
> 

Okay.

>> +	scb_s->cpnc = scb_o->cpnc;
>> +
>>  	/*
>>  	 * SKEY handling functions can't deal with false setting of PTE invalid
>>  	 * bits. Therefore we cannot provide interpretation and would later
>>
> 
> 

Thanks for the review.

David Hildenbrand Dec. 12, 2018, 10:49 a.m. UTC | #5

@Christian see suggestion below

>>
>> So, no field for cpvc? Can you give me a short summary how these values
>> are to be used. E.g. who will use the cpvc? Who can actually read both
>> values. A guest VCPU? Firmware? (well firmware obviously has no access
>> to cpvc)
>>
> 
> Only the CPNC is architected to be stored in the SIE block... CPVC has to be held elsewhere.
> 
> The CPNC can be seen during a ring dump, which is helpful when a problem causes a CPU to
> enter a check-stop state and we can't get a CEC dump.
> 
> We can really only see the CPVC from the host's perspective, perhaps during a guest dump 
> or via VM_EVENT messages in the guest debug logs.

Okay, so I assume we'll e.g. want to include that information in guest
dumps e.g. in QEMU when dumping in ELF format? And/or allow other ways
to access this information.

Thanks for the information!

> 
>>> +	__u8	reserved6b;		/* 0x006b */
>>>  	__u32	todpr;			/* 0x006c */
>>>  #define GISA_FORMAT1 0x00000001
>>>  	__u32	gd;			/* 0x0070 */
>>> @@ -391,6 +392,7 @@ struct kvm_vcpu_stat {
>>>  	u64 diagnose_9c;
>>>  	u64 diagnose_258;
>>>  	u64 diagnose_308;
>>> +	u64 diagnose_318;
>>>  	u64 diagnose_500;
>>>  	u64 diagnose_other;
>>>  };
>>> @@ -804,6 +806,14 @@ struct kvm_s390_vsie {
>>>  	struct page *pages[KVM_MAX_VCPUS];
>>>  };
>>>  
>>> +union kvm_s390_diag318_info {
>>> +	u64 cpc;
>>> +	struct {
>>> +		u64 cpnc : 8;
>>> +		u64 cpvc : 56;
>>> +	};
>>> +};
>>> +
>>>  struct kvm_arch{
>>>  	void *sca;
>>>  	int use_esca;
>>> @@ -838,6 +848,7 @@ struct kvm_arch{
>>>  	/* subset of available cpu features enabled by user space */
>>>  	DECLARE_BITMAP(cpu_feat, KVM_S390_VM_CPU_FEAT_NR_BITS);
>>>  	struct kvm_s390_gisa *gisa;
>>> +	union kvm_s390_diag318_info diag318_info;
>>>  };
>>>  
>>>  #define KVM_HVA_ERR_BAD		(-1UL)
>>> diff --git a/arch/s390/include/uapi/asm/kvm.h b/arch/s390/include/uapi/asm/kvm.h
>>> index 16511d9..6420aad 100644
>>> --- a/arch/s390/include/uapi/asm/kvm.h
>>> +++ b/arch/s390/include/uapi/asm/kvm.h
>>> @@ -74,6 +74,7 @@ struct kvm_s390_io_adapter_req {
>>>  #define KVM_S390_VM_CRYPTO		2
>>>  #define KVM_S390_VM_CPU_MODEL		3
>>>  #define KVM_S390_VM_MIGRATION		4
>>> +#define KVM_S390_VM_MACHINE		5
>>>  
>>>  /* kvm attributes for mem_ctrl */
>>>  #define KVM_S390_VM_MEM_ENABLE_CMMA	0
>>> @@ -130,6 +131,7 @@ struct kvm_s390_vm_cpu_machine {
>>>  #define KVM_S390_VM_CPU_FEAT_PFMFI	11
>>>  #define KVM_S390_VM_CPU_FEAT_SIGPIF	12
>>>  #define KVM_S390_VM_CPU_FEAT_KSS	13
>>> +#define KVM_S390_VM_CPU_FEAT_DIAG318	14
>>>  struct kvm_s390_vm_cpu_feat {
>>>  	__u64 feat[16];
>>>  };
>>> @@ -168,6 +170,9 @@ struct kvm_s390_vm_cpu_subfunc {
>>>  #define KVM_S390_VM_MIGRATION_START	1
>>>  #define KVM_S390_VM_MIGRATION_STATUS	2
>>>  
>>> +/* kvm attributes for KVM_S390_VM_MACHINE */
>>> +#define KVM_S390_VM_MACHINE_CPC	0
>>> +
>>>  /* for KVM_GET_REGS and KVM_SET_REGS */
>>>  struct kvm_regs {
>>>  	/* general purpose regs for s390 */
>>> diff --git a/arch/s390/kvm/diag.c b/arch/s390/kvm/diag.c
>>> index 45634b3d..b61ffd2 100644
>>> --- a/arch/s390/kvm/diag.c
>>> +++ b/arch/s390/kvm/diag.c
>>> @@ -235,6 +235,16 @@ static int __diag_virtio_hypercall(struct kvm_vcpu *vcpu)
>>>  	return ret < 0 ? ret : 0;
>>>  }
>>>  
>>> +static int __diag_set_control_prog_name(struct kvm_vcpu *vcpu)
>>> +{
>>> +	unsigned int reg = (vcpu->arch.sie_block->ipa & 0xf0) >> 4;
>>> +	unsigned long cpc = vcpu->run->s.regs.gprs[reg];
>>
>> 1. This should be a u64.
>>
>> 2. What if a !KVM_S390_VM_CPU_FEAT_DIAG318 ? Shouldn't there be an error
>> reported to the guest? (diag subcode not installed)
>>
> 
> Since we're emulating the diag318 feature entirely for a guest, and we're able
> to fence its availability via CPU models in QEMU (which is not obvious from the
> KVM code, admittedly), we can safely allow this feature if KVM can support it.

Yes, while we in KVM "can" allow it, QEMU can do whatever it wants. e.g.
disable KVM_S390_VM_CPU_FEAT_DIAG318. Which can easily happen (old QEMU
on new HW).

In case QEMU disables KVM_S390_VM_CPU_FEAT_DIAG318, we should keep old
behavior -> Send it to user space and let QEMU handle it! (-EOPNOTSUPP)

A guest will in this case not see sclp.has_diag318, ideally not execute
diag318, but if it is still done, diag318 should also properly fail as
it used to. But see below.

> 
> With both KVM and QEMU support, the guest kernel will be able to pass the 
> if (sclp.has_diag318) check in patch #1 and execute the diag instruction (with 
> KVM handling) safely.
> 
> The whole implementation of these values is a bit tricky, I admit. I find myself
> going back-and-forth to the docs to make sure I have this info correct.
> 
> Please let me know if there's a better way to convey this information to someone
> who might have to look back on this code. Perhaps some documentation on the new 
> VM attribute (suggested by Conny) would suffice.

I guess the comment above
"allow_cpu_feat(KVM_S390_VM_CPU_FEAT_DIAG318);" is sufficient. But here,
we should really respect what QEMU asked for. If qemu asks for
!KVM_S390_VM_CPU_FEAT_DIAG318, diag318 should go to user space just as
it used to before.

> 
>>> +
>>> +	vcpu->stat.diagnose_318++;
>>> +	kvm_s390_set_cpc(vcpu->kvm, cpc);
>>> +	return 0;
>>> +}
>>> +
>>>  int kvm_s390_handle_diag(struct kvm_vcpu *vcpu)
>>>  {
>>>  	int code = kvm_s390_get_base_disp_rs(vcpu, NULL) & 0xffff;
>>> @@ -254,6 +264,8 @@ int kvm_s390_handle_diag(struct kvm_vcpu *vcpu)
>>>  		return __diag_page_ref_service(vcpu);
>>>  	case 0x308:
>>>  		return __diag_ipl_functions(vcpu);
>>> +	case 0x318:
>>> +		return __diag_set_control_prog_name(vcpu);
>>>  	case 0x500:
>>>  		return __diag_virtio_hypercall(vcpu);
>>>  	default:
>>> diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
>>> index fe24150..67aa34a 100644
>>> --- a/arch/s390/kvm/kvm-s390.c
>>> +++ b/arch/s390/kvm/kvm-s390.c
>>> @@ -157,6 +157,7 @@ struct kvm_stats_debugfs_item debugfs_entries[] = {
>>>  	{ "instruction_diag_9c", VCPU_STAT(diagnose_9c) },
>>>  	{ "instruction_diag_258", VCPU_STAT(diagnose_258) },
>>>  	{ "instruction_diag_308", VCPU_STAT(diagnose_308) },
>>> +	{ "instruction_diag_318", VCPU_STAT(diagnose_318) },
>>>  	{ "instruction_diag_500", VCPU_STAT(diagnose_500) },
>>>  	{ "instruction_diag_other", VCPU_STAT(diagnose_other) },
>>>  	{ NULL }
>>> @@ -371,6 +372,10 @@ static void kvm_s390_cpu_feat_init(void)
>>>  
>>>  	if (MACHINE_HAS_ESOP)
>>>  		allow_cpu_feat(KVM_S390_VM_CPU_FEAT_ESOP);
>>> +
>>> +	/* Enable DIAG318 guest support unconditionally */
>>> +	allow_cpu_feat(KVM_S390_VM_CPU_FEAT_DIAG318);
>>
>> Interesting. Is that the right thing to do?
>>
>> While somebody can always write (__diag_set_control_prog_name emualtion
>> handler), I don't see a read handler.
>>
>> How is that supposed to work e.g. on older hardware?
>>
>> (I was expecting HW to handle reading, that's why we add it to the SCB-
>> Can you elaborate what the value stored in the SCB is actually used for?
>> Is a guest CPU not actually able to read that value somehow?)
>>
> 
> A CPU (or VCPU) should have a copy of the CPNC stored in a local register. 
> I'm sorry I can't be super helpful in my explanation, as I am not 100%
> familiar with the ring dump process, but perhaps the CPU retrieves the
> value from the SCB during a ring dump? I'll have to phone-a-friend on this
> one.

That makes sense. So a guest cannot read the value, however the actual
CPU can read the value (e.g. when in check-stop) via the SCB. So one can
identify the workload triggering the crash (maybe).

So allowing this feature unconditionally will not actually allow HW to
use it, however we could get that information in QEMU, to e.g. include
it in a guest dump.

I am wondering if another way to handle this would maybe be even better.
It would allow also new QEMU on old KVM to get access to these values.
It will simply not be written to HW then (bad luck, there is no HW
support eventually)


It goes like this:

1. don't implement diag318 in the kernel, implement it in user space.
diag318 is already forwarded to QEMU as of now. (I don't consider
diag318 peformance relevant)

2. in user space, we can always support diag318, even without KVM/HW
support. (could glue to machines if really needed)

3. If we get a diag318 and the CPU feature is not enabled, handle it if
diag318 is not available (exception, just as we do now).

3. If we get a diag318 and the CPU feature is enabled, store both values
in QEMU (for migration and e.g. DUMPs) AND ...

4. ... if KVM supports KVM_S390_VM_MACHINE_CPC, also write the new
values (or even only the CPNC! ) to KVM.

5. During migration, if KVM supports KVM_S390_VM_MACHINE_CPC, also write
the migrated value to KVM.

KVM code is minimized. KVM_S390_VM_CPU_FEAT_DIAG318 in KVM is not
needed. We might even only need writing of KVM_S390_VM_MACHINE_CPC.

@Christian what do you think?

> 
> Please see my explanation to your questions above and let me know if that
> covers this question as well.
> 
> I think a more informative comment in the code would be helpful here.
> 
>>> +
>>>  	/*
>>>  	 * We need SIE support, ESOP (PROT_READ protection for gmap_shadow),
>>>  	 * 64bit SCAO (SCA passthrough) and IDTE (for gmap_shadow unshadowing).
>>> @@ -1173,6 +1178,75 @@ static int kvm_s390_get_tod(struct kvm *kvm, struct kvm_device_attr *attr)
>>>  	return ret;
>>>  }
>>>  

[...]
>>> +++ b/arch/s390/kvm/vsie.c
>>> @@ -397,6 +397,9 @@ static void unshadow_scb(struct kvm_vcpu *vcpu, struct vsie_page *vsie_page)
>>>  	memcpy(scb_o->gcr, scb_s->gcr, 128);
>>>  	scb_o->pp = scb_s->pp;
>>>  
>>> +	/* machine information */
>>> +	scb_o->cpnc = scb_s->cpnc;
>>
>> Is unshadow needed? I think we have diag318 for that.
>>
> 
> Don't we need to make sure the CPNC in the SCB for a level 3+ guest is set correctly? I don't
> know if we support anything other than "linux on linux on linux" so-to-speak, but if we do
> then we need to make sure that when diag318 is called we correctly convey that value to the
> correct SCB.
> 
> Correct me if I have my understanding of SIE / vSIE handling wrong here.

If HW(SIE) is not able to modify the CPNC, then we don't need an
unshadow path.

Assume our nested guests (g3) wants to set the CPNC.

1. g3 triggers diag318.
2. g3 intercepts into g1, leaving the vSIE.
3. g1 runs g2 to handle the SIE exit.
4. g2 processes the diag318, changing the value in the g3 SCB.
5. g2 wants to continue running g3 by executing the SIE.
6. g1 creates a shadow g3 SCB, using the new value from the g3 SCB.
7. The new CBNC is active when g3 is executed.

So I don't think the unshadow code is necessary, only the shadow code.

Cornelia Huck Dec. 12, 2018, 1:08 p.m. UTC | #6

On Wed, 12 Dec 2018 11:49:56 +0100
David Hildenbrand <david@redhat.com> wrote:

> I am wondering if another way to handle this would maybe be even better.
> It would allow also new QEMU on old KVM to get access to these values.
> It will simply not be written to HW then (bad luck, there is no HW
> support eventually)
> 
> 
> It goes like this:
> 
> 1. don't implement diag318 in the kernel, implement it in user space.
> diag318 is already forwarded to QEMU as of now. (I don't consider
> diag318 peformance relevant)
> 
> 2. in user space, we can always support diag318, even without KVM/HW
> support. (could glue to machines if really needed)
> 
> 3. If we get a diag318 and the CPU feature is not enabled, handle it if
> diag318 is not available (exception, just as we do now).
> 
> 3. If we get a diag318 and the CPU feature is enabled, store both values
> in QEMU (for migration and e.g. DUMPs) AND ...
> 
> 4. ... if KVM supports KVM_S390_VM_MACHINE_CPC, also write the new
> values (or even only the CPNC! ) to KVM.
> 
> 5. During migration, if KVM supports KVM_S390_VM_MACHINE_CPC, also write
> the migrated value to KVM.

What about that sclp thingie (I'm not quite sure what it signifies)?

> 
> KVM code is minimized. KVM_S390_VM_CPU_FEAT_DIAG318 in KVM is not
> needed. We might even only need writing of KVM_S390_VM_MACHINE_CPC.

In general, I like doing most of the implementation in QEMU. Especially
if we consider the compatibility/migration implications.

David Hildenbrand Dec. 13, 2018, 9:12 a.m. UTC | #7

On 12.12.18 14:08, Cornelia Huck wrote:
> On Wed, 12 Dec 2018 11:49:56 +0100
> David Hildenbrand <david@redhat.com> wrote:
> 
>> I am wondering if another way to handle this would maybe be even better.
>> It would allow also new QEMU on old KVM to get access to these values.
>> It will simply not be written to HW then (bad luck, there is no HW
>> support eventually)
>>
>>
>> It goes like this:
>>
>> 1. don't implement diag318 in the kernel, implement it in user space.
>> diag318 is already forwarded to QEMU as of now. (I don't consider
>> diag318 peformance relevant)
>>
>> 2. in user space, we can always support diag318, even without KVM/HW
>> support. (could glue to machines if really needed)
>>
>> 3. If we get a diag318 and the CPU feature is not enabled, handle it if
>> diag318 is not available (exception, just as we do now).
>>
>> 3. If we get a diag318 and the CPU feature is enabled, store both values
>> in QEMU (for migration and e.g. DUMPs) AND ...
>>
>> 4. ... if KVM supports KVM_S390_VM_MACHINE_CPC, also write the new
>> values (or even only the CPNC! ) to KVM.
>>
>> 5. During migration, if KVM supports KVM_S390_VM_MACHINE_CPC, also write
>> the migrated value to KVM.
> 
> What about that sclp thingie (I'm not quite sure what it signifies)?

You mean the indication of diag318? Well we might want to check in KVM
if the host has it (and therefore the CPNC in the SCB actually gets
used) and only then allow to set the CPNC. Or we could simply always
allow to write it to the SCB even though the value won't get used by HW.

Cornelia Huck Dec. 13, 2018, 10:21 a.m. UTC | #8

On Thu, 13 Dec 2018 10:12:49 +0100
David Hildenbrand <david@redhat.com> wrote:

> On 12.12.18 14:08, Cornelia Huck wrote:
> > On Wed, 12 Dec 2018 11:49:56 +0100
> > David Hildenbrand <david@redhat.com> wrote:
> >   
> >> I am wondering if another way to handle this would maybe be even better.
> >> It would allow also new QEMU on old KVM to get access to these values.
> >> It will simply not be written to HW then (bad luck, there is no HW
> >> support eventually)
> >>
> >>
> >> It goes like this:
> >>
> >> 1. don't implement diag318 in the kernel, implement it in user space.
> >> diag318 is already forwarded to QEMU as of now. (I don't consider
> >> diag318 peformance relevant)
> >>
> >> 2. in user space, we can always support diag318, even without KVM/HW
> >> support. (could glue to machines if really needed)
> >>
> >> 3. If we get a diag318 and the CPU feature is not enabled, handle it if
> >> diag318 is not available (exception, just as we do now).
> >>
> >> 3. If we get a diag318 and the CPU feature is enabled, store both values
> >> in QEMU (for migration and e.g. DUMPs) AND ...
> >>
> >> 4. ... if KVM supports KVM_S390_VM_MACHINE_CPC, also write the new
> >> values (or even only the CPNC! ) to KVM.
> >>
> >> 5. During migration, if KVM supports KVM_S390_VM_MACHINE_CPC, also write
> >> the migrated value to KVM.  
> > 
> > What about that sclp thingie (I'm not quite sure what it signifies)?  
> 
> You mean the indication of diag318? Well we might want to check in KVM
> if the host has it (and therefore the CPNC in the SCB actually gets
> used) and only then allow to set the CPNC. Or we could simply always
> allow to write it to the SCB even though the value won't get used by HW.

Maybe always allow to write it, for simplicity's sake? Does it matter
much whether it will get used by HW?

David Hildenbrand Dec. 13, 2018, 10:27 a.m. UTC | #9

On 13.12.18 11:21, Cornelia Huck wrote:
> On Thu, 13 Dec 2018 10:12:49 +0100
> David Hildenbrand <david@redhat.com> wrote:
> 
>> On 12.12.18 14:08, Cornelia Huck wrote:
>>> On Wed, 12 Dec 2018 11:49:56 +0100
>>> David Hildenbrand <david@redhat.com> wrote:
>>>   
>>>> I am wondering if another way to handle this would maybe be even better.
>>>> It would allow also new QEMU on old KVM to get access to these values.
>>>> It will simply not be written to HW then (bad luck, there is no HW
>>>> support eventually)
>>>>
>>>>
>>>> It goes like this:
>>>>
>>>> 1. don't implement diag318 in the kernel, implement it in user space.
>>>> diag318 is already forwarded to QEMU as of now. (I don't consider
>>>> diag318 peformance relevant)
>>>>
>>>> 2. in user space, we can always support diag318, even without KVM/HW
>>>> support. (could glue to machines if really needed)
>>>>
>>>> 3. If we get a diag318 and the CPU feature is not enabled, handle it if
>>>> diag318 is not available (exception, just as we do now).
>>>>
>>>> 3. If we get a diag318 and the CPU feature is enabled, store both values
>>>> in QEMU (for migration and e.g. DUMPs) AND ...
>>>>
>>>> 4. ... if KVM supports KVM_S390_VM_MACHINE_CPC, also write the new
>>>> values (or even only the CPNC! ) to KVM.
>>>>
>>>> 5. During migration, if KVM supports KVM_S390_VM_MACHINE_CPC, also write
>>>> the migrated value to KVM.  
>>>
>>> What about that sclp thingie (I'm not quite sure what it signifies)?  
>>
>> You mean the indication of diag318? Well we might want to check in KVM
>> if the host has it (and therefore the CPNC in the SCB actually gets
>> used) and only then allow to set the CPNC. Or we could simply always
>> allow to write it to the SCB even though the value won't get used by HW.
> 
> Maybe always allow to write it, for simplicity's sake? Does it matter
> much whether it will get used by HW?
> 

Usually it does not matter. I remember that there were some fields that
should not be touched if the feature is not available (otherwise you
might get undefined behavior).

Janosch Frank Dec. 13, 2018, 11:19 a.m. UTC | #10

On 13.12.18 11:27, David Hildenbrand wrote:
> On 13.12.18 11:21, Cornelia Huck wrote:
>> On Thu, 13 Dec 2018 10:12:49 +0100
>> David Hildenbrand <david@redhat.com> wrote:
>>
>>> On 12.12.18 14:08, Cornelia Huck wrote:
>>>> On Wed, 12 Dec 2018 11:49:56 +0100
>>>> David Hildenbrand <david@redhat.com> wrote:
>>>>   
>>>>> I am wondering if another way to handle this would maybe be even better.
>>>>> It would allow also new QEMU on old KVM to get access to these values.
>>>>> It will simply not be written to HW then (bad luck, there is no HW
>>>>> support eventually)
>>>>>
>>>>>
>>>>> It goes like this:
>>>>>
>>>>> 1. don't implement diag318 in the kernel, implement it in user space.
>>>>> diag318 is already forwarded to QEMU as of now. (I don't consider
>>>>> diag318 peformance relevant)
>>>>>
>>>>> 2. in user space, we can always support diag318, even without KVM/HW
>>>>> support. (could glue to machines if really needed)
>>>>>
>>>>> 3. If we get a diag318 and the CPU feature is not enabled, handle it if
>>>>> diag318 is not available (exception, just as we do now).
>>>>>
>>>>> 3. If we get a diag318 and the CPU feature is enabled, store both values
>>>>> in QEMU (for migration and e.g. DUMPs) AND ...
>>>>>
>>>>> 4. ... if KVM supports KVM_S390_VM_MACHINE_CPC, also write the new
>>>>> values (or even only the CPNC! ) to KVM.
>>>>>
>>>>> 5. During migration, if KVM supports KVM_S390_VM_MACHINE_CPC, also write
>>>>> the migrated value to KVM.  
>>>>
>>>> What about that sclp thingie (I'm not quite sure what it signifies)?  
>>>
>>> You mean the indication of diag318? Well we might want to check in KVM
>>> if the host has it (and therefore the CPNC in the SCB actually gets
>>> used) and only then allow to set the CPNC. Or we could simply always
>>> allow to write it to the SCB even though the value won't get used by HW.
>>
>> Maybe always allow to write it, for simplicity's sake? Does it matter
>> much whether it will get used by HW?
>>
> 
> Usually it does not matter. I remember that there were some fields that
> should not be touched if the feature is not available (otherwise you
> might get undefined behavior).
> 

@Collin: Could you clear that up?
I'd prefer to always have it if possible, if it would result in
undefined behavior we could store it in the KVM struct and not write it
to the SCB if the sclp bit is off.

It would be nice to have a consolidated sysfs or debugfs interface to
read that (and other things) from userspace, not having to parse the VM
messages. Well, maybe some other time.

Collin Walling Dec. 13, 2018, 3:54 p.m. UTC | #11

On 12/13/18 6:19 AM, Janosch Frank wrote:
> On 13.12.18 11:27, David Hildenbrand wrote:
>> On 13.12.18 11:21, Cornelia Huck wrote:
>>> On Thu, 13 Dec 2018 10:12:49 +0100
>>> David Hildenbrand <david@redhat.com> wrote:
>>>
>>>> On 12.12.18 14:08, Cornelia Huck wrote:
>>>>> On Wed, 12 Dec 2018 11:49:56 +0100
>>>>> David Hildenbrand <david@redhat.com> wrote:
>>>>>   
>>>>>> I am wondering if another way to handle this would maybe be even better.
>>>>>> It would allow also new QEMU on old KVM to get access to these values.
>>>>>> It will simply not be written to HW then (bad luck, there is no HW
>>>>>> support eventually)
>>>>>>
>>>>>>
>>>>>> It goes like this:
>>>>>>
>>>>>> 1. don't implement diag318 in the kernel, implement it in user space.
>>>>>> diag318 is already forwarded to QEMU as of now. (I don't consider
>>>>>> diag318 peformance relevant)
>>>>>>
>>>>>> 2. in user space, we can always support diag318, even without KVM/HW
>>>>>> support. (could glue to machines if really needed)
>>>>>>
>>>>>> 3. If we get a diag318 and the CPU feature is not enabled, handle it if
>>>>>> diag318 is not available (exception, just as we do now).
>>>>>>
>>>>>> 3. If we get a diag318 and the CPU feature is enabled, store both values
>>>>>> in QEMU (for migration and e.g. DUMPs) AND ...
>>>>>>
>>>>>> 4. ... if KVM supports KVM_S390_VM_MACHINE_CPC, also write the new
>>>>>> values (or even only the CPNC! ) to KVM.
>>>>>>
>>>>>> 5. During migration, if KVM supports KVM_S390_VM_MACHINE_CPC, also write
>>>>>> the migrated value to KVM.  
>>>>>
>>>>> What about that sclp thingie (I'm not quite sure what it signifies)?  
>>>>
>>>> You mean the indication of diag318? Well we might want to check in KVM
>>>> if the host has it (and therefore the CPNC in the SCB actually gets
>>>> used) and only then allow to set the CPNC. Or we could simply always
>>>> allow to write it to the SCB even though the value won't get used by HW.
>>>
>>> Maybe always allow to write it, for simplicity's sake? Does it matter
>>> much whether it will get used by HW?
>>>
>>
>> Usually it does not matter. I remember that there were some fields that
>> should not be touched if the feature is not available (otherwise you
>> might get undefined behavior).
>>
> 
> @Collin: Could you clear that up?
> I'd prefer to always have it if possible, if it would result in
> undefined behavior we could store it in the KVM struct and not write it
> to the SCB if the sclp bit is off.
> 

It should be completely safe. The CPNC takes up a previously reserved location
and firmware treats the CPNC as a read-only value. If we have it in the SCB and
firmware doesn't support reading it, then no harm is done.

> It would be nice to have a consolidated sysfs or debugfs interface to
> read that (and other things) from userspace, not having to parse the VM
> messages. Well, maybe some other time.
> 

Hm. Might be a useful feature for the host to get a quick at-a-glance
view of what guests are running. We should keep this in mind.

[v2,2/2] s390/kvm: handle diagnose 318 instruction call

Commit Message

Comments

Patch