diff mbox series

KVM: s390: Remove false WARN_ON_ONCE for the PQAP instruction

Message ID 20200505073525.2287-1-borntraeger@de.ibm.com (mailing list archive)
State New, archived
Headers show
Series KVM: s390: Remove false WARN_ON_ONCE for the PQAP instruction | expand

Commit Message

Christian Borntraeger May 5, 2020, 7:35 a.m. UTC
In LPAR we will only get an intercept for FC==3 for the PQAP
instruction. Running nested under z/VM can result in other intercepts as
well, for example PQAP(QCI). So the WARN_ON_ONCE is not right. Let
us simply remove it.

Cc: Pierre Morel <pmorel@linux.ibm.com>
Cc: Tony Krowiak <akrowiak@linux.ibm.com>
Reported-by: Qian Cai <cailca@icloud.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
---
 arch/s390/kvm/priv.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

Comments

Cornelia Huck May 5, 2020, 7:53 a.m. UTC | #1
On Tue,  5 May 2020 09:35:25 +0200
Christian Borntraeger <borntraeger@de.ibm.com> wrote:

> In LPAR we will only get an intercept for FC==3 for the PQAP
> instruction. Running nested under z/VM can result in other intercepts as
> well, for example PQAP(QCI). So the WARN_ON_ONCE is not right. Let
> us simply remove it.

While I agree with removing the WARN_ON_ONCE, I'm wondering why z/VM
gives us intercepts for those fcs... is that just a result of nesting
(or the z/VM implementation), or is there anything we might want to do?

> 
> Cc: Pierre Morel <pmorel@linux.ibm.com>
> Cc: Tony Krowiak <akrowiak@linux.ibm.com>
> Reported-by: Qian Cai <cailca@icloud.com>
> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
> ---
>  arch/s390/kvm/priv.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/s390/kvm/priv.c b/arch/s390/kvm/priv.c
> index 69a824f9ef0b..bbe46c6aedbf 100644
> --- a/arch/s390/kvm/priv.c
> +++ b/arch/s390/kvm/priv.c
> @@ -626,10 +626,12 @@ static int handle_pqap(struct kvm_vcpu *vcpu)
>  	 * available for the guest are AQIC and TAPQ with the t bit set
>  	 * since we do not set IC.3 (FIII) we currently will only intercept
>  	 * the AQIC function code.
> +	 * Note: running nested under z/VM can result in intercepts, e.g.

s/intercepts/intercepts for other function codes/

> +	 * for PQAP(QCI). We do not support this and bail out.
>  	 */
>  	reg0 = vcpu->run->s.regs.gprs[0];
>  	fc = (reg0 >> 24) & 0xff;
> -	if (WARN_ON_ONCE(fc != 0x03))
> +	if (fc != 0x03)
>  		return -EOPNOTSUPP;
>  
>  	/* PQAP instruction is allowed for guest kernel only */
Christian Borntraeger May 5, 2020, 7:55 a.m. UTC | #2
On 05.05.20 09:53, Cornelia Huck wrote:
> On Tue,  5 May 2020 09:35:25 +0200
> Christian Borntraeger <borntraeger@de.ibm.com> wrote:
> 
>> In LPAR we will only get an intercept for FC==3 for the PQAP
>> instruction. Running nested under z/VM can result in other intercepts as
>> well, for example PQAP(QCI). So the WARN_ON_ONCE is not right. Let
>> us simply remove it.
> 
> While I agree with removing the WARN_ON_ONCE, I'm wondering why z/VM
> gives us intercepts for those fcs... is that just a result of nesting
> (or the z/VM implementation), or is there anything we might want to do?

Yes nesting. 
The ECA bit for interpretion is an effective one. So if the ECA bit is off
in z/VM (no crypto cards) our ECA bit is basically ignored as these bits
are ANDed.
I asked Tony to ask the z/VM team if that is the case here.

> 
>>
>> Cc: Pierre Morel <pmorel@linux.ibm.com>
>> Cc: Tony Krowiak <akrowiak@linux.ibm.com>
>> Reported-by: Qian Cai <cailca@icloud.com>
>> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
>> ---
>>  arch/s390/kvm/priv.c | 4 +++-
>>  1 file changed, 3 insertions(+), 1 deletion(-)
>>
>> diff --git a/arch/s390/kvm/priv.c b/arch/s390/kvm/priv.c
>> index 69a824f9ef0b..bbe46c6aedbf 100644
>> --- a/arch/s390/kvm/priv.c
>> +++ b/arch/s390/kvm/priv.c
>> @@ -626,10 +626,12 @@ static int handle_pqap(struct kvm_vcpu *vcpu)
>>  	 * available for the guest are AQIC and TAPQ with the t bit set
>>  	 * since we do not set IC.3 (FIII) we currently will only intercept
>>  	 * the AQIC function code.
>> +	 * Note: running nested under z/VM can result in intercepts, e.g.
> 
> s/intercepts/intercepts for other function codes/
> 
>> +	 * for PQAP(QCI). We do not support this and bail out.
>>  	 */
>>  	reg0 = vcpu->run->s.regs.gprs[0];
>>  	fc = (reg0 >> 24) & 0xff;
>> -	if (WARN_ON_ONCE(fc != 0x03))
>> +	if (fc != 0x03)
>>  		return -EOPNOTSUPP;
>>  
>>  	/* PQAP instruction is allowed for guest kernel only */
>
David Hildenbrand May 5, 2020, 8:04 a.m. UTC | #3
On 05.05.20 09:55, Christian Borntraeger wrote:
> 
> 
> On 05.05.20 09:53, Cornelia Huck wrote:
>> On Tue,  5 May 2020 09:35:25 +0200
>> Christian Borntraeger <borntraeger@de.ibm.com> wrote:
>>
>>> In LPAR we will only get an intercept for FC==3 for the PQAP
>>> instruction. Running nested under z/VM can result in other intercepts as
>>> well, for example PQAP(QCI). So the WARN_ON_ONCE is not right. Let
>>> us simply remove it.
>>
>> While I agree with removing the WARN_ON_ONCE, I'm wondering why z/VM
>> gives us intercepts for those fcs... is that just a result of nesting
>> (or the z/VM implementation), or is there anything we might want to do?
> 
> Yes nesting. 
> The ECA bit for interpretion is an effective one. So if the ECA bit is off
> in z/VM (no crypto cards) our ECA bit is basically ignored as these bits
> are ANDed.
> I asked Tony to ask the z/VM team if that is the case here.
> 

So we can't detect if we have support for ECA_APIE, because there is no
explicit feature bit, right? Rings a bell. Still an ugly
hardware/firmware specification.

Seems to be the right thing to do

Reviewed-by: David Hildenbrand <david@redhat.com>
Cornelia Huck May 5, 2020, 8:21 a.m. UTC | #4
On Tue, 5 May 2020 09:55:36 +0200
Christian Borntraeger <borntraeger@de.ibm.com> wrote:

> On 05.05.20 09:53, Cornelia Huck wrote:
> > On Tue,  5 May 2020 09:35:25 +0200
> > Christian Borntraeger <borntraeger@de.ibm.com> wrote:
> >   
> >> In LPAR we will only get an intercept for FC==3 for the PQAP
> >> instruction. Running nested under z/VM can result in other intercepts as
> >> well, for example PQAP(QCI). So the WARN_ON_ONCE is not right. Let
> >> us simply remove it.  
> > 
> > While I agree with removing the WARN_ON_ONCE, I'm wondering why z/VM
> > gives us intercepts for those fcs... is that just a result of nesting
> > (or the z/VM implementation), or is there anything we might want to do?  
> 
> Yes nesting. 
> The ECA bit for interpretion is an effective one. So if the ECA bit is off
> in z/VM (no crypto cards) our ECA bit is basically ignored as these bits
> are ANDed.

Ok, that makes sense, even if it is rather ugly.

With the comment tweaked,
Reviewed-by: Cornelia Huck <cohuck@redhat.com>

> I asked Tony to ask the z/VM team if that is the case here.
> 
> >   
> >>
> >> Cc: Pierre Morel <pmorel@linux.ibm.com>
> >> Cc: Tony Krowiak <akrowiak@linux.ibm.com>
> >> Reported-by: Qian Cai <cailca@icloud.com>
> >> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
> >> ---
> >>  arch/s390/kvm/priv.c | 4 +++-
> >>  1 file changed, 3 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/arch/s390/kvm/priv.c b/arch/s390/kvm/priv.c
> >> index 69a824f9ef0b..bbe46c6aedbf 100644
> >> --- a/arch/s390/kvm/priv.c
> >> +++ b/arch/s390/kvm/priv.c
> >> @@ -626,10 +626,12 @@ static int handle_pqap(struct kvm_vcpu *vcpu)
> >>  	 * available for the guest are AQIC and TAPQ with the t bit set
> >>  	 * since we do not set IC.3 (FIII) we currently will only intercept
> >>  	 * the AQIC function code.
> >> +	 * Note: running nested under z/VM can result in intercepts, e.g.  
> > 
> > s/intercepts/intercepts for other function codes/
> >   
> >> +	 * for PQAP(QCI). We do not support this and bail out.
> >>  	 */
> >>  	reg0 = vcpu->run->s.regs.gprs[0];
> >>  	fc = (reg0 >> 24) & 0xff;
> >> -	if (WARN_ON_ONCE(fc != 0x03))
> >> +	if (fc != 0x03)
> >>  		return -EOPNOTSUPP;
> >>  
> >>  	/* PQAP instruction is allowed for guest kernel only */  
> >   
>
Christian Borntraeger May 5, 2020, 8:27 a.m. UTC | #5
On 05.05.20 10:04, David Hildenbrand wrote:
> On 05.05.20 09:55, Christian Borntraeger wrote:
>>
>>
>> On 05.05.20 09:53, Cornelia Huck wrote:
>>> On Tue,  5 May 2020 09:35:25 +0200
>>> Christian Borntraeger <borntraeger@de.ibm.com> wrote:
>>>
>>>> In LPAR we will only get an intercept for FC==3 for the PQAP
>>>> instruction. Running nested under z/VM can result in other intercepts as
>>>> well, for example PQAP(QCI). So the WARN_ON_ONCE is not right. Let
>>>> us simply remove it.
>>>
>>> While I agree with removing the WARN_ON_ONCE, I'm wondering why z/VM
>>> gives us intercepts for those fcs... is that just a result of nesting
>>> (or the z/VM implementation), or is there anything we might want to do?
>>
>> Yes nesting. 
>> The ECA bit for interpretion is an effective one. So if the ECA bit is off
>> in z/VM (no crypto cards) our ECA bit is basically ignored as these bits
>> are ANDed.
>> I asked Tony to ask the z/VM team if that is the case here.
>>
> 
> So we can't detect if we have support for ECA_APIE, because there is no
> explicit feature bit, right? Rings a bell. Still an ugly
> hardware/firmware specification.

Yes, no matter if this is the case here, we cannot rely on ECA_APIE to not
trigger intercepts. So we must remove the WARN_ON. 

cc stable?

> 
> Seems to be the right thing to do
> 
> Reviewed-by: David Hildenbrand <david@redhat.com>
>
David Hildenbrand May 5, 2020, 8:28 a.m. UTC | #6
On 05.05.20 10:27, Christian Borntraeger wrote:
> 
> 
> On 05.05.20 10:04, David Hildenbrand wrote:
>> On 05.05.20 09:55, Christian Borntraeger wrote:
>>>
>>>
>>> On 05.05.20 09:53, Cornelia Huck wrote:
>>>> On Tue,  5 May 2020 09:35:25 +0200
>>>> Christian Borntraeger <borntraeger@de.ibm.com> wrote:
>>>>
>>>>> In LPAR we will only get an intercept for FC==3 for the PQAP
>>>>> instruction. Running nested under z/VM can result in other intercepts as
>>>>> well, for example PQAP(QCI). So the WARN_ON_ONCE is not right. Let
>>>>> us simply remove it.
>>>>
>>>> While I agree with removing the WARN_ON_ONCE, I'm wondering why z/VM
>>>> gives us intercepts for those fcs... is that just a result of nesting
>>>> (or the z/VM implementation), or is there anything we might want to do?
>>>
>>> Yes nesting. 
>>> The ECA bit for interpretion is an effective one. So if the ECA bit is off
>>> in z/VM (no crypto cards) our ECA bit is basically ignored as these bits
>>> are ANDed.
>>> I asked Tony to ask the z/VM team if that is the case here.
>>>
>>
>> So we can't detect if we have support for ECA_APIE, because there is no
>> explicit feature bit, right? Rings a bell. Still an ugly
>> hardware/firmware specification.
> 
> Yes, no matter if this is the case here, we cannot rely on ECA_APIE to not
> trigger intercepts. So we must remove the WARN_ON. 
> 
> cc stable?

Yes, I'd say so.
Cornelia Huck May 5, 2020, 8:33 a.m. UTC | #7
On Tue, 5 May 2020 10:27:16 +0200
Christian Borntraeger <borntraeger@de.ibm.com> wrote:

> On 05.05.20 10:04, David Hildenbrand wrote:
> > On 05.05.20 09:55, Christian Borntraeger wrote:  
> >>
> >>
> >> On 05.05.20 09:53, Cornelia Huck wrote:  
> >>> On Tue,  5 May 2020 09:35:25 +0200
> >>> Christian Borntraeger <borntraeger@de.ibm.com> wrote:
> >>>  
> >>>> In LPAR we will only get an intercept for FC==3 for the PQAP
> >>>> instruction. Running nested under z/VM can result in other intercepts as
> >>>> well, for example PQAP(QCI). So the WARN_ON_ONCE is not right. Let
> >>>> us simply remove it.  
> >>>
> >>> While I agree with removing the WARN_ON_ONCE, I'm wondering why z/VM
> >>> gives us intercepts for those fcs... is that just a result of nesting
> >>> (or the z/VM implementation), or is there anything we might want to do?  
> >>
> >> Yes nesting. 
> >> The ECA bit for interpretion is an effective one. So if the ECA bit is off
> >> in z/VM (no crypto cards) our ECA bit is basically ignored as these bits
> >> are ANDed.
> >> I asked Tony to ask the z/VM team if that is the case here.
> >>  
> > 
> > So we can't detect if we have support for ECA_APIE, because there is no
> > explicit feature bit, right? Rings a bell. Still an ugly
> > hardware/firmware specification.  
> 
> Yes, no matter if this is the case here, we cannot rely on ECA_APIE to not
> trigger intercepts. So we must remove the WARN_ON. 
> 
> cc stable?

Agreed.

> 
> > 
> > Seems to be the right thing to do
> > 
> > Reviewed-by: David Hildenbrand <david@redhat.com>
> >   
>
Pierre Morel May 5, 2020, 12:18 p.m. UTC | #8
On 2020-05-05 10:27, Christian Borntraeger wrote:
> 
> 
> On 05.05.20 10:04, David Hildenbrand wrote:
>> On 05.05.20 09:55, Christian Borntraeger wrote:
>>>
>>>
>>> On 05.05.20 09:53, Cornelia Huck wrote:
>>>> On Tue,  5 May 2020 09:35:25 +0200
>>>> Christian Borntraeger <borntraeger@de.ibm.com> wrote:
>>>>
>>>>> In LPAR we will only get an intercept for FC==3 for the PQAP
>>>>> instruction. Running nested under z/VM can result in other intercepts as
>>>>> well, for example PQAP(QCI). So the WARN_ON_ONCE is not right. Let
>>>>> us simply remove it.
>>>>
>>>> While I agree with removing the WARN_ON_ONCE, I'm wondering why z/VM
>>>> gives us intercepts for those fcs... is that just a result of nesting
>>>> (or the z/VM implementation), or is there anything we might want to do?
>>>
>>> Yes nesting.
>>> The ECA bit for interpretion is an effective one. So if the ECA bit is off
>>> in z/VM (no crypto cards) our ECA bit is basically ignored as these bits
>>> are ANDed.
>>> I asked Tony to ask the z/VM team if that is the case here.
>>>
>>
>> So we can't detect if we have support for ECA_APIE, because there is no
>> explicit feature bit, right? Rings a bell. Still an ugly
>> hardware/firmware specification.

Sorry to be late but you were really too fast for me. :)

AFAIK we detect if we have AP instructions enabled by ECA_APIE for the 
host by probing with a PQAP(TESTQ) during the boot.
If the hypervizor accept this instruction it is supposed to work as if 
it has set APIE present for the Linux host.
If the instruction is rejected we do not enable AP instructions for the 
guest

We also detect if we can use QCI by testing the facility bit and 
propagate only the facility bits we have earned or emulate don't we?

So here I am curious why we got an interception.

Did we give false information to the guest?
Is the guest right to issue the instruction intercepted?
Did z/VM provide the host with false facility information?
Did z/VM dynamically change the virtualization scheme after the boot?

I did not find evidence of the first assumption which would have been a 
legitimate warning.
The next 3 are, IMHO, misbehavior from the guest or z/VM, and do not 
justify a warning there so I find right to remove it.

consider it as a "late" reviewed-by.

Regards,
Pierre


> 
> Yes, no matter if this is the case here, we cannot rely on ECA_APIE to not
> trigger intercepts. So we must remove the WARN_ON.




> 
> cc stable?
> 
>>
>> Seems to be the right thing to do
>>
>> Reviewed-by: David Hildenbrand <david@redhat.com>
>>
Christian Borntraeger May 5, 2020, 12:31 p.m. UTC | #9
On 05.05.20 14:18, Pierre Morel wrote:
> 
> 
> On 2020-05-05 10:27, Christian Borntraeger wrote:
>>
>>
>> On 05.05.20 10:04, David Hildenbrand wrote:
>>> On 05.05.20 09:55, Christian Borntraeger wrote:
>>>>
>>>>
>>>> On 05.05.20 09:53, Cornelia Huck wrote:
>>>>> On Tue,  5 May 2020 09:35:25 +0200
>>>>> Christian Borntraeger <borntraeger@de.ibm.com> wrote:
>>>>>
>>>>>> In LPAR we will only get an intercept for FC==3 for the PQAP
>>>>>> instruction. Running nested under z/VM can result in other intercepts as
>>>>>> well, for example PQAP(QCI). So the WARN_ON_ONCE is not right. Let
>>>>>> us simply remove it.
>>>>>
>>>>> While I agree with removing the WARN_ON_ONCE, I'm wondering why z/VM
>>>>> gives us intercepts for those fcs... is that just a result of nesting
>>>>> (or the z/VM implementation), or is there anything we might want to do?
>>>>
>>>> Yes nesting.
>>>> The ECA bit for interpretion is an effective one. So if the ECA bit is off
>>>> in z/VM (no crypto cards) our ECA bit is basically ignored as these bits
>>>> are ANDed.
>>>> I asked Tony to ask the z/VM team if that is the case here.
>>>>
>>>
>>> So we can't detect if we have support for ECA_APIE, because there is no
>>> explicit feature bit, right? Rings a bell. Still an ugly
>>> hardware/firmware specification.
> 
> Sorry to be late but you were really too fast for me. :)
> 
> AFAIK we detect if we have AP instructions enabled by ECA_APIE for the host by probing with a PQAP(TESTQ) during the boot.
> If the hypervizor accept this instruction it is supposed to work as if it has set APIE present for the Linux host.
> If the instruction is rejected we do not enable AP instructions for the guest

Yes, we do have the AP instruction in the KVM host (z/VM guest). It seems that this is implemented without ECA_APIE on the z/VM side
as there is no domain assigned yet (and it is still possible to attach a virtual crypto device/domain).
It seems to me that z/VM implements all of this with a software implementation. This then is not used for VSIE.
Instead it relies on the architecture saying that ECA_APIE is an effective control.

> 
> We also detect if we can use QCI by testing the facility bit and propagate only the facility bits we have earned or emulate don't we?
> 
> So here I am curious why we got an interception.
> 
> Did we give false information to the guest?
> Is the guest right to issue the instruction intercepted?
> Did z/VM provide the host with false facility information?
> Did z/VM dynamically change the virtualization scheme after the boot?
> 
> I did not find evidence of the first assumption which would have been a legitimate warning.
> The next 3 are, IMHO, misbehavior from the guest or z/VM, and do not justify a warning there so I find right to remove it.
> 
> consider it as a "late" reviewed-by.
> 
> Regards,
>
Anthony Krowiak May 5, 2020, 10:34 p.m. UTC | #10
On 5/5/20 3:55 AM, Christian Borntraeger wrote:
>
> On 05.05.20 09:53, Cornelia Huck wrote:
>> On Tue,  5 May 2020 09:35:25 +0200
>> Christian Borntraeger <borntraeger@de.ibm.com> wrote:
>>
>>> In LPAR we will only get an intercept for FC==3 for the PQAP
>>> instruction. Running nested under z/VM can result in other intercepts as
>>> well, for example PQAP(QCI). So the WARN_ON_ONCE is not right. Let
>>> us simply remove it.
>> While I agree with removing the WARN_ON_ONCE, I'm wondering why z/VM
>> gives us intercepts for those fcs... is that just a result of nesting
>> (or the z/VM implementation), or is there anything we might want to do?
> Yes nesting.
> The ECA bit for interpretion is an effective one. So if the ECA bit is off
> in z/VM (no crypto cards) our ECA bit is basically ignored as these bits
> are ANDed.
> I asked Tony to ask the z/VM team if that is the case here.

I apologize, but I was on vacation yesterday and did not have a
chance to look at this until today. I left a slack message for
my z/VM contact, but have not yet gotten a response.

The only AP virtualization support we currently provide with
Linux on Z relies on AP interpretive execution. The ECA.28
bit in the SIE state description determines whether AP
instructions executed on a guest are intercepted (0) or
interpreted (1). The problem here is that ECA.28 is an
effective control meaning that ECA.28 for the guest is
logically ANDed with the host's. If linux is running as a
guest of z/VM and z/VM is sets ECA.28 to zero,
then ECA.28 for the guest will also be zero, in which case
every AP instruction executed on the guest will be intercepted.

The only AP instruction that has an interception handler is
PQAP with function code 0x03 (AP-queue interruption control), so
this warning is being issued for all other AP instructions being
intercepted; so, maybe this is the right thing to do? After all,
running a linux as a guest of z/VM that is setting ECA.28 to zero is not
a supported configuration.

Having said that, the root of the problem is the fact that
a guest is allowed to start without AP interpretive execution
turned on because that is the only currently supported configuration.
If there is a way to determine the effective value of ECA.28 for a
KVM guest, when KVM could respond appropriately when QEMU
queries whether the KVM_S390_VM_CRYPTO_ENABLE_APIE attribute
is available in the CPU model. If it is not, then QEMU does not set the
S390_FEAT_AP (AP instructions installed) feature and a guest started
with cpu model feature ap='on' will fail to start.


>
>>> Cc: Pierre Morel <pmorel@linux.ibm.com>
>>> Cc: Tony Krowiak <akrowiak@linux.ibm.com>
>>> Reported-by: Qian Cai <cailca@icloud.com>
>>> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
>>> ---
>>>   arch/s390/kvm/priv.c | 4 +++-
>>>   1 file changed, 3 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/arch/s390/kvm/priv.c b/arch/s390/kvm/priv.c
>>> index 69a824f9ef0b..bbe46c6aedbf 100644
>>> --- a/arch/s390/kvm/priv.c
>>> +++ b/arch/s390/kvm/priv.c
>>> @@ -626,10 +626,12 @@ static int handle_pqap(struct kvm_vcpu *vcpu)
>>>   	 * available for the guest are AQIC and TAPQ with the t bit set
>>>   	 * since we do not set IC.3 (FIII) we currently will only intercept
>>>   	 * the AQIC function code.
>>> +	 * Note: running nested under z/VM can result in intercepts, e.g.
>> s/intercepts/intercepts for other function codes/
>>
>>> +	 * for PQAP(QCI). We do not support this and bail out.
>>>   	 */
>>>   	reg0 = vcpu->run->s.regs.gprs[0];
>>>   	fc = (reg0 >> 24) & 0xff;
>>> -	if (WARN_ON_ONCE(fc != 0x03))
>>> +	if (fc != 0x03)
>>>   		return -EOPNOTSUPP;
>>>   
>>>   	/* PQAP instruction is allowed for guest kernel only */
Christian Borntraeger May 6, 2020, 6:08 a.m. UTC | #11
On 06.05.20 00:34, Tony Krowiak wrote:
> 
> 
> On 5/5/20 3:55 AM, Christian Borntraeger wrote:
>>
>> On 05.05.20 09:53, Cornelia Huck wrote:
>>> On Tue,  5 May 2020 09:35:25 +0200
>>> Christian Borntraeger <borntraeger@de.ibm.com> wrote:
>>>
>>>> In LPAR we will only get an intercept for FC==3 for the PQAP
>>>> instruction. Running nested under z/VM can result in other intercepts as
>>>> well, for example PQAP(QCI). So the WARN_ON_ONCE is not right. Let
>>>> us simply remove it.
>>> While I agree with removing the WARN_ON_ONCE, I'm wondering why z/VM
>>> gives us intercepts for those fcs... is that just a result of nesting
>>> (or the z/VM implementation), or is there anything we might want to do?
>> Yes nesting.
>> The ECA bit for interpretion is an effective one. So if the ECA bit is off
>> in z/VM (no crypto cards) our ECA bit is basically ignored as these bits
>> are ANDed.
>> I asked Tony to ask the z/VM team if that is the case here.
> 
> I apologize, but I was on vacation yesterday and did not have a
> chance to look at this until today. I left a slack message for
> my z/VM contact, but have not yet gotten a response.
> 
> The only AP virtualization support we currently provide with
> Linux on Z relies on AP interpretive execution. The ECA.28
> bit in the SIE state description determines whether AP
> instructions executed on a guest are intercepted (0) or
> interpreted (1). The problem here is that ECA.28 is an
> effective control meaning that ECA.28 for the guest is
> logically ANDed with the host's. If linux is running as a
> guest of z/VM and z/VM is sets ECA.28 to zero,
> then ECA.28 for the guest will also be zero, in which case
> every AP instruction executed on the guest will be intercepted.
> 
> The only AP instruction that has an interception handler is
> PQAP with function code 0x03 (AP-queue interruption control), so
> this warning is being issued for all other AP instructions being
> intercepted; so, maybe this is the right thing to do? After all,
> running a linux as a guest of z/VM that is setting ECA.28 to zero is not
> a supported configuration.
> 
> Having said that, the root of the problem is the fact that
> a guest is allowed to start without AP interpretive execution
> turned on because that is the only currently supported configuration.
> If there is a way to determine the effective value of ECA.28 for a
> KVM guest, when KVM could respond appropriately when QEMU
> queries whether the KVM_S390_VM_CRYPTO_ENABLE_APIE attribute
> is available in the CPU model. If it is not, then QEMU does not set the
> S390_FEAT_AP (AP instructions installed) feature and a guest started
> with cpu model feature ap='on' will fail to start.

I think the problem is that there is no way to find out the effective value
of ECA_APIE (and I think this could even change during the lifetime of a
guest). So I see 3 options:
1. check for z/VM and do not advertise crypto (ap=on will always fail). This
will disable the possibility to pass through a pass through device. (I think
if the zVM guest has an adapter APDEDICATED then z/VM does set ECA_APIE)
2. reject all crypto instructions that do exit and are not fc==3 (PQAP ACIQ).
This is kind of not ideal as we will pass through facility stfle.12 (PQAP QCI)
but then fail to handle it. Linux does handle faults on these instructions just
fine, though. 
3. Implement emulation of crypto. I think this not what we want to do, especially
as this only makes sense for acceleration but not for any of the secure key
or protected key schemes.

The WARN_ON is certainly something that must go as long as we construct cases
that can trigger this.

So I would suggest to go with the miminal patch (variant 2) which basically just
removes the WARN_ON.
We can then think about doing a nicer variant on top.
e.g. implement PQAP QCI that just returns an empty config. I will look into this.
Anthony Krowiak May 6, 2020, 11:29 p.m. UTC | #12
On 5/6/20 2:08 AM, Christian Borntraeger wrote:
>
> On 06.05.20 00:34, Tony Krowiak wrote:
>>
>> On 5/5/20 3:55 AM, Christian Borntraeger wrote:
>>> On 05.05.20 09:53, Cornelia Huck wrote:
>>>> On Tue,  5 May 2020 09:35:25 +0200
>>>> Christian Borntraeger <borntraeger@de.ibm.com> wrote:
>>>>
>>>>> In LPAR we will only get an intercept for FC==3 for the PQAP
>>>>> instruction. Running nested under z/VM can result in other intercepts as
>>>>> well, for example PQAP(QCI). So the WARN_ON_ONCE is not right. Let
>>>>> us simply remove it.
>>>> While I agree with removing the WARN_ON_ONCE, I'm wondering why z/VM
>>>> gives us intercepts for those fcs... is that just a result of nesting
>>>> (or the z/VM implementation), or is there anything we might want to do?
>>> Yes nesting.
>>> The ECA bit for interpretion is an effective one. So if the ECA bit is off
>>> in z/VM (no crypto cards) our ECA bit is basically ignored as these bits
>>> are ANDed.
>>> I asked Tony to ask the z/VM team if that is the case here.
>> I apologize, but I was on vacation yesterday and did not have a
>> chance to look at this until today. I left a slack message for
>> my z/VM contact, but have not yet gotten a response.
>>
>> The only AP virtualization support we currently provide with
>> Linux on Z relies on AP interpretive execution. The ECA.28
>> bit in the SIE state description determines whether AP
>> instructions executed on a guest are intercepted (0) or
>> interpreted (1). The problem here is that ECA.28 is an
>> effective control meaning that ECA.28 for the guest is
>> logically ANDed with the host's. If linux is running as a
>> guest of z/VM and z/VM is sets ECA.28 to zero,
>> then ECA.28 for the guest will also be zero, in which case
>> every AP instruction executed on the guest will be intercepted.
>>
>> The only AP instruction that has an interception handler is
>> PQAP with function code 0x03 (AP-queue interruption control), so
>> this warning is being issued for all other AP instructions being
>> intercepted; so, maybe this is the right thing to do? After all,
>> running a linux as a guest of z/VM that is setting ECA.28 to zero is not
>> a supported configuration.
>>
>> Having said that, the root of the problem is the fact that
>> a guest is allowed to start without AP interpretive execution
>> turned on because that is the only currently supported configuration.
>> If there is a way to determine the effective value of ECA.28 for a
>> KVM guest, when KVM could respond appropriately when QEMU
>> queries whether the KVM_S390_VM_CRYPTO_ENABLE_APIE attribute
>> is available in the CPU model. If it is not, then QEMU does not set the
>> S390_FEAT_AP (AP instructions installed) feature and a guest started
>> with cpu model feature ap='on' will fail to start.
> I think the problem is that there is no way to find out the effective value
> of ECA_APIE (and I think this could even change during the lifetime of a
> guest). So I see 3 options:
> 1. check for z/VM and do not advertise crypto (ap=on will always fail). This
> will disable the possibility to pass through a pass through device. (I think
> if the zVM guest has an adapter APDEDICATED then z/VM does set ECA_APIE)
> 2. reject all crypto instructions that do exit and are not fc==3 (PQAP ACIQ).
> This is kind of not ideal as we will pass through facility stfle.12 (PQAP QCI)
> but then fail to handle it. Linux does handle faults on these instructions just
> fine, though.
> 3. Implement emulation of crypto. I think this not what we want to do, especially
> as this only makes sense for acceleration but not for any of the secure key
> or protected key schemes.
>
> The WARN_ON is certainly something that must go as long as we construct cases
> that can trigger this.
>
> So I would suggest to go with the miminal patch (variant 2) which basically just
> removes the WARN_ON.
> We can then think about doing a nicer variant on top.
> e.g. implement PQAP QCI that just returns an empty config. I will look into this.

Agree that the WARN_ON shall be removed with the caveat that a better
solution will be pursued in the future.

Reviewed-by: Tony Krowiak <akrowiak@linux.ibm.com>
diff mbox series

Patch

diff --git a/arch/s390/kvm/priv.c b/arch/s390/kvm/priv.c
index 69a824f9ef0b..bbe46c6aedbf 100644
--- a/arch/s390/kvm/priv.c
+++ b/arch/s390/kvm/priv.c
@@ -626,10 +626,12 @@  static int handle_pqap(struct kvm_vcpu *vcpu)
 	 * available for the guest are AQIC and TAPQ with the t bit set
 	 * since we do not set IC.3 (FIII) we currently will only intercept
 	 * the AQIC function code.
+	 * Note: running nested under z/VM can result in intercepts, e.g.
+	 * for PQAP(QCI). We do not support this and bail out.
 	 */
 	reg0 = vcpu->run->s.regs.gprs[0];
 	fc = (reg0 >> 24) & 0xff;
-	if (WARN_ON_ONCE(fc != 0x03))
+	if (fc != 0x03)
 		return -EOPNOTSUPP;
 
 	/* PQAP instruction is allowed for guest kernel only */