[v1,4/4] s390/kvm: VSIE: correctly handle MVPG when in VSIE

Message ID	20201218141811.310267-5-imbrenda@linux.ibm.com (mailing list archive)
State	New, archived
Headers	show Return-Path: <kvm-owner@kernel.org> From: Claudio Imbrenda <imbrenda@linux.ibm.com> To: linux-kernel@vger.kernel.org Cc: borntraeger@de.ibm.com, frankja@linux.ibm.com, david@redhat.com, kvm@vger.kernel.org, linux-s390@vger.kernel.org, stable@vger.kernel.org Subject: [PATCH v1 4/4] s390/kvm: VSIE: correctly handle MVPG when in VSIE Date: Fri, 18 Dec 2020 15:18:11 +0100 Message-Id: <20201218141811.310267-5-imbrenda@linux.ibm.com> In-Reply-To: <20201218141811.310267-1-imbrenda@linux.ibm.com> References: <20201218141811.310267-1-imbrenda@linux.ibm.com> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit Precedence: bulk
Series	s390/kvm: fix MVPG when in VSIE \| expand [v1,0/4] s390/kvm: fix MVPG when in VSIE [v1,1/4] s390/kvm: VSIE: stop leaking host addresses [v1,2/4] s390/kvm: extend guest_translate for MVPG interpretation [v1,3/4] s390/kvm: add kvm_s390_vsie_mvpg_check needed for VSIE MVPG [v1,4/4] s390/kvm: VSIE: correctly handle MVPG when in VSIE

Claudio Imbrenda Dec. 18, 2020, 2:18 p.m. UTC

Correctly handle the MVPG instruction when issued by a VSIE guest.

Fixes: a3508fbe9dc6d ("KVM: s390: vsie: initial support for nested virtualization")
Cc: stable@vger.kernel.org
Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
---
 arch/s390/kvm/vsie.c | 73 ++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 73 insertions(+)

David Hildenbrand Dec. 20, 2020, 10:13 a.m. UTC | #1

On 18.12.20 15:18, Claudio Imbrenda wrote:
> Correctly handle the MVPG instruction when issued by a VSIE guest.
> 

I remember that MVPG SIE documentation was completely crazy and full of
corner cases. :)

Looking at arch/s390/kvm/intercept.c:handle_mvpg_pei(), I can spot that

1. "This interception can only happen for guests with DAT disabled ..."
2. KVM does not make use of any mvpg state inside the SCB.

Can this be observed with Linux guests?

Can I get some information on what information is stored at [0xc0, 0xd)
inside the SCB? I assume it's:

0xc0: guest physical address of source PTE
0xc8: guest physical address of target PTE

Also, which conditions have to be met such that we get a ICPT_PARTEXEC:

a) State of guest DAT (I assume off?)
b) State of PTEs: What happens if there is no PTE (I assume we need two
PTEs, otherwise no such intercept)? I assume we get an intercept if one
of both PTEs is not present or the destination PTE is protected. Correct?

So, when we (g1) get an intercept for g3, can we be sure 0xc0 and 0xc8
in the scb are both valid g1 addresses pointing at our PTE, and what do
we know about these PTEs (one not present or destination protected)?

[...]
>  /*
>   * Run the vsie on a shadow scb and a shadow gmap, without any further
>   * sanity checks, handling SIE faults.
> @@ -1063,6 +1132,10 @@ static int do_vsie_run(struct kvm_vcpu *vcpu, struct vsie_page *vsie_page)
>  		if ((scb_s->ipa & 0xf000) != 0xf000)
>  			scb_s->ipa += 0x1000;
>  		break;
> +	case ICPT_PARTEXEC:
> +		if (scb_s->ipa == 0xb254)

Old code hat "/* MVPG only */" - why is this condition now necessary?

> +			rc = vsie_handle_mvpg(vcpu, vsie_page);
> +		break;
>  	}
>  	return rc;
>  }
>

Claudio Imbrenda Jan. 4, 2021, 3:22 p.m. UTC | #2

On Sun, 20 Dec 2020 11:13:57 +0100
David Hildenbrand <david@redhat.com> wrote:

> On 18.12.20 15:18, Claudio Imbrenda wrote:
> > Correctly handle the MVPG instruction when issued by a VSIE guest.
> >   
> 
> I remember that MVPG SIE documentation was completely crazy and full
> of corner cases. :)

you remember correctly

> Looking at arch/s390/kvm/intercept.c:handle_mvpg_pei(), I can spot
> that
> 
> 1. "This interception can only happen for guests with DAT disabled
> ..." 2. KVM does not make use of any mvpg state inside the SCB.
> 
> Can this be observed with Linux guests?

a Linux guest will typically not run with DAT disabled

> Can I get some information on what information is stored at [0xc0,
> 0xd) inside the SCB? I assume it's:
> 
> 0xc0: guest physical address of source PTE
> 0xc8: guest physical address of target PTE

yes (plus 3 flags in the lower bits of each)

> 
> Also, which conditions have to be met such that we get a
> ICPT_PARTEXEC:
> 
> a) State of guest DAT (I assume off?)
> b) State of PTEs: What happens if there is no PTE (I assume we need
> two PTEs, otherwise no such intercept)? I assume we get an intercept
> if one of both PTEs is not present or the destination PTE is
> protected. Correct?

we get the intercept if the guest has DAT off, and at least one of the
host PTEs is invalid (and I think if the destination is valid but
protected)

> So, when we (g1) get an intercept for g3, can we be sure 0xc0 and 0xc8
> in the scb are both valid g1 addresses pointing at our PTE, and what
> do we know about these PTEs (one not present or destination
> protected)?

we only know that at least one of the following holds true:
* source invalid
* destination invalid
* destination protected

there is a bit that tells us if the destination was protected (bit 62),
but that does not exclude an invalid source

> [...]
> >  /*
> >   * Run the vsie on a shadow scb and a shadow gmap, without any
> > further
> >   * sanity checks, handling SIE faults.
> > @@ -1063,6 +1132,10 @@ static int do_vsie_run(struct kvm_vcpu
> > *vcpu, struct vsie_page *vsie_page) if ((scb_s->ipa & 0xf000) !=
> > 0xf000) scb_s->ipa += 0x1000;
> >  		break;
> > +	case ICPT_PARTEXEC:
> > +		if (scb_s->ipa == 0xb254)  
> 
> Old code hat "/* MVPG only */" - why is this condition now necessary?

old code was wrong ;)

> > +			rc = vsie_handle_mvpg(vcpu, vsie_page);
> > +		break;
> >  	}
> >  	return rc;
> >  }
> >   
> 
>

David Hildenbrand Jan. 4, 2021, 4:08 p.m. UTC | #3

On 04.01.21 16:22, Claudio Imbrenda wrote:
> On Sun, 20 Dec 2020 11:13:57 +0100
> David Hildenbrand <david@redhat.com> wrote:
> 
>> On 18.12.20 15:18, Claudio Imbrenda wrote:
>>> Correctly handle the MVPG instruction when issued by a VSIE guest.
>>>   
>>
>> I remember that MVPG SIE documentation was completely crazy and full
>> of corner cases. :)
> 
> you remember correctly
> 
>> Looking at arch/s390/kvm/intercept.c:handle_mvpg_pei(), I can spot
>> that
>>
>> 1. "This interception can only happen for guests with DAT disabled
>> ..." 2. KVM does not make use of any mvpg state inside the SCB.
>>
>> Can this be observed with Linux guests?
> 
> a Linux guest will typically not run with DAT disabled
> 
>> Can I get some information on what information is stored at [0xc0,
>> 0xd) inside the SCB? I assume it's:
>>
>> 0xc0: guest physical address of source PTE
>> 0xc8: guest physical address of target PTE
> 
> yes (plus 3 flags in the lower bits of each)

Thanks! Do the flags tell us what the deal with the PTE was? If yes,
what's the meaning of the separate flags?

I assume something like "invalid, proteced, ??"

I'm asking because I think we can handle this a little easier.

> 
>> [...]
>>>  /*
>>>   * Run the vsie on a shadow scb and a shadow gmap, without any
>>> further
>>>   * sanity checks, handling SIE faults.
>>> @@ -1063,6 +1132,10 @@ static int do_vsie_run(struct kvm_vcpu
>>> *vcpu, struct vsie_page *vsie_page) if ((scb_s->ipa & 0xf000) !=
>>> 0xf000) scb_s->ipa += 0x1000;
>>>  		break;
>>> +	case ICPT_PARTEXEC:
>>> +		if (scb_s->ipa == 0xb254)  
>>
>> Old code hat "/* MVPG only */" - why is this condition now necessary?
> 
> old code was wrong ;)


arch/s390/kvm/intercept.c:handle_partial_execution() we only seem to handle

1. MVPG
2. SIGP PEI

The latter is only relevant for external calls. IIRC, this is only active
with sigp interpretation - which is never active under vsie (ECA_SIGPI).

Claudio Imbrenda Jan. 4, 2021, 4:36 p.m. UTC | #4

On Mon, 4 Jan 2021 17:08:15 +0100
David Hildenbrand <david@redhat.com> wrote:

> On 04.01.21 16:22, Claudio Imbrenda wrote:
> > On Sun, 20 Dec 2020 11:13:57 +0100
> > David Hildenbrand <david@redhat.com> wrote:
> >   
> >> On 18.12.20 15:18, Claudio Imbrenda wrote:  
> >>> Correctly handle the MVPG instruction when issued by a VSIE guest.
> >>>     
> >>
> >> I remember that MVPG SIE documentation was completely crazy and
> >> full of corner cases. :)  
> > 
> > you remember correctly
> >   
> >> Looking at arch/s390/kvm/intercept.c:handle_mvpg_pei(), I can spot
> >> that
> >>
> >> 1. "This interception can only happen for guests with DAT disabled
> >> ..." 2. KVM does not make use of any mvpg state inside the SCB.
> >>
> >> Can this be observed with Linux guests?  
> > 
> > a Linux guest will typically not run with DAT disabled
> >   
> >> Can I get some information on what information is stored at [0xc0,
> >> 0xd) inside the SCB? I assume it's:
> >>
> >> 0xc0: guest physical address of source PTE
> >> 0xc8: guest physical address of target PTE  
> > 
> > yes (plus 3 flags in the lower bits of each)  
> 
> Thanks! Do the flags tell us what the deal with the PTE was? If yes,
> what's the meaning of the separate flags?
> 
> I assume something like "invalid, proteced, ??"

bit 61 indicates that the address is a region or segment table entry,
when EDAT applies
bit 62 is "protected" when the protected bit is set in the segment
table entry (or region, if EDAT applies) 
bit 63 is set when the operand was translated with a real-space ASCE

but you can check if the PTE is valid just by dereferencing the
pointers...

> I'm asking because I think we can handle this a little easier.

what is your idea?

> >   
> >> [...]  
> >>>  /*
> >>>   * Run the vsie on a shadow scb and a shadow gmap, without any
> >>> further
> >>>   * sanity checks, handling SIE faults.
> >>> @@ -1063,6 +1132,10 @@ static int do_vsie_run(struct kvm_vcpu
> >>> *vcpu, struct vsie_page *vsie_page) if ((scb_s->ipa & 0xf000) !=
> >>> 0xf000) scb_s->ipa += 0x1000;
> >>>  		break;
> >>> +	case ICPT_PARTEXEC:
> >>> +		if (scb_s->ipa == 0xb254)    
> >>
> >> Old code hat "/* MVPG only */" - why is this condition now
> >> necessary?  
> > 
> > old code was wrong ;)  
> 
> 
> arch/s390/kvm/intercept.c:handle_partial_execution() we only seem to
> handle
> 
> 1. MVPG
> 2. SIGP PEI
> 
> The latter is only relevant for external calls. IIRC, this is only
> active with sigp interpretation - which is never active under vsie
> (ECA_SIGPI).

I think putting an explicit check is better than just a jump in the
dark.

David Hildenbrand Jan. 5, 2021, 10:17 a.m. UTC | #5

On 04.01.21 17:36, Claudio Imbrenda wrote:
> On Mon, 4 Jan 2021 17:08:15 +0100
> David Hildenbrand <david@redhat.com> wrote:
> 
>> On 04.01.21 16:22, Claudio Imbrenda wrote:
>>> On Sun, 20 Dec 2020 11:13:57 +0100
>>> David Hildenbrand <david@redhat.com> wrote:
>>>   
>>>> On 18.12.20 15:18, Claudio Imbrenda wrote:  
>>>>> Correctly handle the MVPG instruction when issued by a VSIE guest.
>>>>>     
>>>>
>>>> I remember that MVPG SIE documentation was completely crazy and
>>>> full of corner cases. :)  
>>>
>>> you remember correctly
>>>   
>>>> Looking at arch/s390/kvm/intercept.c:handle_mvpg_pei(), I can spot
>>>> that
>>>>
>>>> 1. "This interception can only happen for guests with DAT disabled
>>>> ..." 2. KVM does not make use of any mvpg state inside the SCB.
>>>>
>>>> Can this be observed with Linux guests?  
>>>
>>> a Linux guest will typically not run with DAT disabled
>>>   
>>>> Can I get some information on what information is stored at [0xc0,
>>>> 0xd) inside the SCB? I assume it's:
>>>>
>>>> 0xc0: guest physical address of source PTE
>>>> 0xc8: guest physical address of target PTE  
>>>
>>> yes (plus 3 flags in the lower bits of each)  
>>
>> Thanks! Do the flags tell us what the deal with the PTE was? If yes,
>> what's the meaning of the separate flags?
>>
>> I assume something like "invalid, proteced, ??"
> 
> bit 61 indicates that the address is a region or segment table entry,
> when EDAT applies
> bit 62 is "protected" when the protected bit is set in the segment
> table entry (or region, if EDAT applies) 
> bit 63 is set when the operand was translated with a real-space ASCE

Thanks!

> but you can check if the PTE is valid just by dereferencing the
> pointers...

The pgtable might already have been unshadowed and repurposed I think.
So for vSIE, the PTE content, therefore, is a little unreliable.

We could, of course, try using them to make a guess.

"Likely valid"
"Likely invalid"

A rerun of the vSIE will fixup any wrong guess.

> 
>> I'm asking because I think we can handle this a little easier.
> 
> what is your idea?

I was wondering if we can

1. avoid essentially two translations per PTE, obtaining the information
we need while tying to shadow. kvm_s390_shadow_fault() on steroids that

a) gives us the last guest pte address (tricky for segment.region table
I think ... will have to think about this)
b) the final protection

2. avoid faulting/shadowing in case we know an entry is not problematic.
E.g., no need to shadow/fault the source in case the PTE is there and
not invalid. "likely valid" case above.

The idea would be to call the new kvm_s390_shadow_fault() two times (or
only once due to our guesses) and either rerun the vsie, inject an
interrupt, or create the partial intercept.

Essentially avoiding kvm_s390_vsie_mvpg_check(). Will have to think
about this.

[...]
>>
>> arch/s390/kvm/intercept.c:handle_partial_execution() we only seem to
>> handle
>>
>> 1. MVPG
>> 2. SIGP PEI
>>
>> The latter is only relevant for external calls. IIRC, this is only
>> active with sigp interpretation - which is never active under vsie
>> (ECA_SIGPI).
> 
> I think putting an explicit check is better than just a jump in the
> dark.

Agreed, but that should then be called out somewhere why the change as
done. (e.g., separate cleanup patch)

[v1,4/4] s390/kvm: VSIE: correctly handle MVPG when in VSIE

Commit Message

Comments

Patch