diff mbox series

[kvm-unit-tests,v1,1/4] s390x: sie: switch to home space mode before entering SIE

Message ID 20230327082118.2177-2-nrb@linux.ibm.com (mailing list archive)
State New, archived
Headers show
Series s390x: Add support for running guests without MSO/MSL | expand

Commit Message

Nico Boehr March 27, 2023, 8:21 a.m. UTC
This is to prepare for running guests without MSO/MSL, which is
currently not possible.

We already have code in sie64a to setup a guest primary ASCE before
entering SIE, so we can in theory switch to the page tables which
translate gpa to hpa.

But the host is running in primary space mode already, so changing the
primary ASCE before entering SIE will also affect the host's code and
data.

To make this switch useful, the host should run in a different address
space mode. Hence, set up and change to home address space mode before
installing the guest ASCE.

The home space ASCE is just copied over from the primary space ASCE, so
no functional change is intended, also for tests that want to use
MSO/MSL. If a test intends to use a different primary space ASCE, it can
now just set the guest.asce in the save_area.

Signed-off-by: Nico Boehr <nrb@linux.ibm.com>
---
 lib/s390x/asm/arch_def.h |  2 ++
 lib/s390x/sie.c          | 26 ++++++++++++++++++++++++++
 lib/s390x/sie.h          |  1 +
 3 files changed, 29 insertions(+)

Comments

Janosch Frank March 28, 2023, 2:13 p.m. UTC | #1
On 3/27/23 10:21, Nico Boehr wrote:
> This is to prepare for running guests without MSO/MSL, which is
> currently not possible.
> 
> We already have code in sie64a to setup a guest primary ASCE before
> entering SIE, so we can in theory switch to the page tables which
> translate gpa to hpa.
> 
> But the host is running in primary space mode already, so changing the
> primary ASCE before entering SIE will also affect the host's code and
> data.
> 
> To make this switch useful, the host should run in a different address
> space mode. Hence, set up and change to home address space mode before
> installing the guest ASCE.
> 
> The home space ASCE is just copied over from the primary space ASCE, so
> no functional change is intended, also for tests that want to use
> MSO/MSL. If a test intends to use a different primary space ASCE, it can
> now just set the guest.asce in the save_area.
> 
[...]
> +	/* set up home address space to match primary space */
> +	old_cr13 = stctg(13);
> +	lctlg(13, stctg(1));
> +
> +	/* switch to home space so guest tables can be different from host */
> +	psw_mask_set_bits(PSW_MASK_HOME);
> +
> +	/* also handle all interruptions in home space while in SIE */
> +	lowcore.pgm_new_psw.mask |= PSW_MASK_DAT_HOME;

> +	lowcore.ext_new_psw.mask |= PSW_MASK_DAT_HOME;
> +	lowcore.io_new_psw.mask |= PSW_MASK_DAT_HOME;
We didn't enable DAT in these two cases as far as I can see so this is 
superfluous or we should change the mmu code. Also it's missing the svc 
and machine check.

The whole bit manipulation thing looks a bit crude. It might make more 
sense to drop into real mode for a few instructions and have a dedicated 
storage location for an extended PSW mask and an interrupt ASCE as part 
of the interrupt call code instead.

Opinions?

> +	mb();
> +
>   	while (vm->sblk->icptcode == 0) {
>   		sie64a(vm->sblk, &vm->save_area);
>   		sie_handle_validity(vm);
> @@ -60,6 +75,17 @@ void sie(struct vm *vm)
>   	vm->save_area.guest.grs[14] = vm->sblk->gg14;
>   	vm->save_area.guest.grs[15] = vm->sblk->gg15;
>   
> +	lowcore.pgm_new_psw.mask &= ~PSW_MASK_HOME;
> +	lowcore.ext_new_psw.mask &= ~PSW_MASK_HOME;
> +	lowcore.io_new_psw.mask &= ~PSW_MASK_HOME;
> +	mb();
> +
> +	psw_mask_clear_bits(PSW_MASK_HOME);
> +
> +	/* restore the old CR 13 */
> +	lctlg(13, old_cr13);
> +
> +
>   	if (vm->sblk->sdf == 2)
>   		memcpy(vm->save_area.guest.grs, vm->sblk->pv_grregs,
>   		       sizeof(vm->save_area.guest.grs));
> diff --git a/lib/s390x/sie.h b/lib/s390x/sie.h
> index 147cb0f2a556..0b00fb709776 100644
> --- a/lib/s390x/sie.h
> +++ b/lib/s390x/sie.h
> @@ -284,5 +284,6 @@ void sie_handle_validity(struct vm *vm);
>   void sie_guest_sca_create(struct vm *vm);
>   void sie_guest_create(struct vm *vm, uint64_t guest_mem, uint64_t guest_mem_len);
>   void sie_guest_destroy(struct vm *vm);
> +bool sie_had_pgm_int(struct vm *vm);
>   
>   #endif /* _S390X_SIE_H_ */
Nico Boehr March 29, 2023, 12:50 p.m. UTC | #2
Quoting Janosch Frank (2023-03-28 16:13:04)
> On 3/27/23 10:21, Nico Boehr wrote:
> > This is to prepare for running guests without MSO/MSL, which is
> > currently not possible.
> > 
> > We already have code in sie64a to setup a guest primary ASCE before
> > entering SIE, so we can in theory switch to the page tables which
> > translate gpa to hpa.
> > 
> > But the host is running in primary space mode already, so changing the
> > primary ASCE before entering SIE will also affect the host's code and
> > data.
> > 
> > To make this switch useful, the host should run in a different address
> > space mode. Hence, set up and change to home address space mode before
> > installing the guest ASCE.
> > 
> > The home space ASCE is just copied over from the primary space ASCE, so
> > no functional change is intended, also for tests that want to use
> > MSO/MSL. If a test intends to use a different primary space ASCE, it can
> > now just set the guest.asce in the save_area.
> > 
> [...]
> > +     /* set up home address space to match primary space */
> > +     old_cr13 = stctg(13);
> > +     lctlg(13, stctg(1));
> > +
> > +     /* switch to home space so guest tables can be different from host */
> > +     psw_mask_set_bits(PSW_MASK_HOME);
> > +
> > +     /* also handle all interruptions in home space while in SIE */
> > +     lowcore.pgm_new_psw.mask |= PSW_MASK_DAT_HOME;
> 
> > +     lowcore.ext_new_psw.mask |= PSW_MASK_DAT_HOME;
> > +     lowcore.io_new_psw.mask |= PSW_MASK_DAT_HOME;
> We didn't enable DAT in these two cases as far as I can see so this is 
> superfluous or we should change the mmu code. Also it's missing the svc 
> and machine check.

Right. Is there a particular reason why we only run DAT on for PGM ints?

> The whole bit manipulation thing looks a bit crude. It might make more 
> sense to drop into real mode for a few instructions and have a dedicated 
> storage location for an extended PSW mask and an interrupt ASCE as part 
> of the interrupt call code instead.
> 
> Opinions?

Maybe I don't get it, but I personally don't quite see the advantage. It seems
to me this would make things much more complicated just to avoid a few simple
bitops.

It maybe also depends on how many new_psws we have to touch. If it's really just
the PGM, the current solution seems simple enough.

But if others also prefer Janosch's suggestion, I am happy to implement it.
Claudio Imbrenda March 29, 2023, 1 p.m. UTC | #3
On Wed, 29 Mar 2023 14:50:50 +0200
Nico Boehr <nrb@linux.ibm.com> wrote:

> Quoting Janosch Frank (2023-03-28 16:13:04)
> > On 3/27/23 10:21, Nico Boehr wrote:  
> > > This is to prepare for running guests without MSO/MSL, which is
> > > currently not possible.
> > > 
> > > We already have code in sie64a to setup a guest primary ASCE before
> > > entering SIE, so we can in theory switch to the page tables which
> > > translate gpa to hpa.
> > > 
> > > But the host is running in primary space mode already, so changing the
> > > primary ASCE before entering SIE will also affect the host's code and
> > > data.
> > > 
> > > To make this switch useful, the host should run in a different address
> > > space mode. Hence, set up and change to home address space mode before
> > > installing the guest ASCE.
> > > 
> > > The home space ASCE is just copied over from the primary space ASCE, so
> > > no functional change is intended, also for tests that want to use
> > > MSO/MSL. If a test intends to use a different primary space ASCE, it can
> > > now just set the guest.asce in the save_area.
> > >   
> > [...]  
> > > +     /* set up home address space to match primary space */
> > > +     old_cr13 = stctg(13);
> > > +     lctlg(13, stctg(1));
> > > +
> > > +     /* switch to home space so guest tables can be different from host */
> > > +     psw_mask_set_bits(PSW_MASK_HOME);
> > > +
> > > +     /* also handle all interruptions in home space while in SIE */
> > > +     lowcore.pgm_new_psw.mask |= PSW_MASK_DAT_HOME;  
> >   
> > > +     lowcore.ext_new_psw.mask |= PSW_MASK_DAT_HOME;
> > > +     lowcore.io_new_psw.mask |= PSW_MASK_DAT_HOME;  
> > We didn't enable DAT in these two cases as far as I can see so this is 
> > superfluous or we should change the mmu code. Also it's missing the svc 
> > and machine check.  
> 
> Right. Is there a particular reason why we only run DAT on for PGM ints?

a fixup handler for PGM it might need to run with DAT on (e.g. to
access data that is not identity mapped), whereas for other interrupts
it's not needed (at least not yet ;) )

> 
> > The whole bit manipulation thing looks a bit crude. It might make more 
> > sense to drop into real mode for a few instructions and have a dedicated 
> > storage location for an extended PSW mask and an interrupt ASCE as part 
> > of the interrupt call code instead.
> > 
> > Opinions?  
> 
> Maybe I don't get it, but I personally don't quite see the advantage. It seems
> to me this would make things much more complicated just to avoid a few simple
> bitops.
> 
> It maybe also depends on how many new_psws we have to touch. If it's really just
> the PGM, the current solution seems simple enough.
> 
> But if others also prefer Janosch's suggestion, I am happy to implement it.
Janosch Frank March 29, 2023, 1:42 p.m. UTC | #4
On 3/29/23 15:00, Claudio Imbrenda wrote:
> On Wed, 29 Mar 2023 14:50:50 +0200
> Nico Boehr <nrb@linux.ibm.com> wrote:
> 
>> Quoting Janosch Frank (2023-03-28 16:13:04)
>>> On 3/27/23 10:21, Nico Boehr wrote:
>>>> This is to prepare for running guests without MSO/MSL, which is
>>>> currently not possible.
>>>>
>>>> We already have code in sie64a to setup a guest primary ASCE before
>>>> entering SIE, so we can in theory switch to the page tables which
>>>> translate gpa to hpa.
>>>>
>>>> But the host is running in primary space mode already, so changing the
>>>> primary ASCE before entering SIE will also affect the host's code and
>>>> data.
>>>>
>>>> To make this switch useful, the host should run in a different address
>>>> space mode. Hence, set up and change to home address space mode before
>>>> installing the guest ASCE.
>>>>
>>>> The home space ASCE is just copied over from the primary space ASCE, so
>>>> no functional change is intended, also for tests that want to use
>>>> MSO/MSL. If a test intends to use a different primary space ASCE, it can
>>>> now just set the guest.asce in the save_area.
>>>>    
>>> [...]
>>>> +     /* set up home address space to match primary space */
>>>> +     old_cr13 = stctg(13);
>>>> +     lctlg(13, stctg(1));
>>>> +
>>>> +     /* switch to home space so guest tables can be different from host */
>>>> +     psw_mask_set_bits(PSW_MASK_HOME);
>>>> +
>>>> +     /* also handle all interruptions in home space while in SIE */
>>>> +     lowcore.pgm_new_psw.mask |= PSW_MASK_DAT_HOME;
>>>    
>>>> +     lowcore.ext_new_psw.mask |= PSW_MASK_DAT_HOME;
>>>> +     lowcore.io_new_psw.mask |= PSW_MASK_DAT_HOME;
>>> We didn't enable DAT in these two cases as far as I can see so this is
>>> superfluous or we should change the mmu code. Also it's missing the svc
>>> and machine check.
>>
>> Right. Is there a particular reason why we only run DAT on for PGM ints?
> 
> a fixup handler for PGM it might need to run with DAT on (e.g. to
> access data that is not identity mapped), whereas for other interrupts
> it's not needed (at least not yet ;) )

At the time where the mmu code was written, the other handlers were very 
basic or a direct abort (like the IO IRQ that was introduced by Pierre). 
But I'd rather have them all behave the same so they are at least 
consistent.

If we don't introduce a location where we load the PSW from, then add a 
function that sets the masks for all IRQs and also convert the mmu 
enablement to use it.

Something to the likes of:

void irq_new_set_mask(bool dat, uint8_t as)
{
	loop over psws {
		- Remove dat and as bits from new PSW
		- Or in the new dat + as bits
	}
}
Nico Boehr March 29, 2023, 2:58 p.m. UTC | #5
Quoting Claudio Imbrenda (2023-03-29 15:00:32)
> On Wed, 29 Mar 2023 14:50:50 +0200
> Nico Boehr <nrb@linux.ibm.com> wrote:
> 
> > Quoting Janosch Frank (2023-03-28 16:13:04)
> > > On 3/27/23 10:21, Nico Boehr wrote:  
> > > > This is to prepare for running guests without MSO/MSL, which is
> > > > currently not possible.
> > > > 
> > > > We already have code in sie64a to setup a guest primary ASCE before
> > > > entering SIE, so we can in theory switch to the page tables which
> > > > translate gpa to hpa.
> > > > 
> > > > But the host is running in primary space mode already, so changing the
> > > > primary ASCE before entering SIE will also affect the host's code and
> > > > data.
> > > > 
> > > > To make this switch useful, the host should run in a different address
> > > > space mode. Hence, set up and change to home address space mode before
> > > > installing the guest ASCE.
> > > > 
> > > > The home space ASCE is just copied over from the primary space ASCE, so
> > > > no functional change is intended, also for tests that want to use
> > > > MSO/MSL. If a test intends to use a different primary space ASCE, it can
> > > > now just set the guest.asce in the save_area.
> > > >   
> > > [...]  
> > > > +     /* set up home address space to match primary space */
> > > > +     old_cr13 = stctg(13);
> > > > +     lctlg(13, stctg(1));
> > > > +
> > > > +     /* switch to home space so guest tables can be different from host */
> > > > +     psw_mask_set_bits(PSW_MASK_HOME);
> > > > +
> > > > +     /* also handle all interruptions in home space while in SIE */
> > > > +     lowcore.pgm_new_psw.mask |= PSW_MASK_DAT_HOME;  
> > >   
> > > > +     lowcore.ext_new_psw.mask |= PSW_MASK_DAT_HOME;
> > > > +     lowcore.io_new_psw.mask |= PSW_MASK_DAT_HOME;  
> > > We didn't enable DAT in these two cases as far as I can see so this is 
> > > superfluous or we should change the mmu code. Also it's missing the svc 
> > > and machine check.  
> > 
> > Right. Is there a particular reason why we only run DAT on for PGM ints?
> 
> a fixup handler for PGM it might need to run with DAT on (e.g. to
> access data that is not identity mapped), whereas for other interrupts
> it's not needed (at least not yet ;) )

Makes sense.

Since one can register a cleanup for io and ext, too, I think we need to fix the
mmu init for these cases while at it.

Will do that in v2.
diff mbox series

Patch

diff --git a/lib/s390x/asm/arch_def.h b/lib/s390x/asm/arch_def.h
index bb26e008cc68..dd95e27abf48 100644
--- a/lib/s390x/asm/arch_def.h
+++ b/lib/s390x/asm/arch_def.h
@@ -67,6 +67,8 @@  struct cpu {
 #define AS_HOME				3
 
 #define PSW_MASK_DAT			0x0400000000000000UL
+#define PSW_MASK_DAT_HOME		0x0400C00000000000UL
+#define PSW_MASK_HOME			0x0000C00000000000UL
 #define PSW_MASK_IO			0x0200000000000000UL
 #define PSW_MASK_EXT			0x0100000000000000UL
 #define PSW_MASK_KEY			0x00F0000000000000UL
diff --git a/lib/s390x/sie.c b/lib/s390x/sie.c
index 9241b4b4a512..22141ded1a90 100644
--- a/lib/s390x/sie.c
+++ b/lib/s390x/sie.c
@@ -46,6 +46,8 @@  void sie_handle_validity(struct vm *vm)
 
 void sie(struct vm *vm)
 {
+	uint64_t old_cr13;
+
 	if (vm->sblk->sdf == 2)
 		memcpy(vm->sblk->pv_grregs, vm->save_area.guest.grs,
 		       sizeof(vm->save_area.guest.grs));
@@ -53,6 +55,19 @@  void sie(struct vm *vm)
 	/* Reset icptcode so we don't trip over it below */
 	vm->sblk->icptcode = 0;
 
+	/* set up home address space to match primary space */
+	old_cr13 = stctg(13);
+	lctlg(13, stctg(1));
+
+	/* switch to home space so guest tables can be different from host */
+	psw_mask_set_bits(PSW_MASK_HOME);
+
+	/* also handle all interruptions in home space while in SIE */
+	lowcore.pgm_new_psw.mask |= PSW_MASK_DAT_HOME;
+	lowcore.ext_new_psw.mask |= PSW_MASK_DAT_HOME;
+	lowcore.io_new_psw.mask |= PSW_MASK_DAT_HOME;
+	mb();
+
 	while (vm->sblk->icptcode == 0) {
 		sie64a(vm->sblk, &vm->save_area);
 		sie_handle_validity(vm);
@@ -60,6 +75,17 @@  void sie(struct vm *vm)
 	vm->save_area.guest.grs[14] = vm->sblk->gg14;
 	vm->save_area.guest.grs[15] = vm->sblk->gg15;
 
+	lowcore.pgm_new_psw.mask &= ~PSW_MASK_HOME;
+	lowcore.ext_new_psw.mask &= ~PSW_MASK_HOME;
+	lowcore.io_new_psw.mask &= ~PSW_MASK_HOME;
+	mb();
+
+	psw_mask_clear_bits(PSW_MASK_HOME);
+
+	/* restore the old CR 13 */
+	lctlg(13, old_cr13);
+
+
 	if (vm->sblk->sdf == 2)
 		memcpy(vm->save_area.guest.grs, vm->sblk->pv_grregs,
 		       sizeof(vm->save_area.guest.grs));
diff --git a/lib/s390x/sie.h b/lib/s390x/sie.h
index 147cb0f2a556..0b00fb709776 100644
--- a/lib/s390x/sie.h
+++ b/lib/s390x/sie.h
@@ -284,5 +284,6 @@  void sie_handle_validity(struct vm *vm);
 void sie_guest_sca_create(struct vm *vm);
 void sie_guest_create(struct vm *vm, uint64_t guest_mem, uint64_t guest_mem_len);
 void sie_guest_destroy(struct vm *vm);
+bool sie_had_pgm_int(struct vm *vm);
 
 #endif /* _S390X_SIE_H_ */