
KVM: arm64: Make the exposed feature bits in AA64DFR0_EL1 writable from userspace

Message ID 20240813142835.77180-1-shameerali.kolothum.thodi@huawei.com (mailing list archive)
State New, archived
Series KVM: arm64: Make the exposed feature bits in AA64DFR0_EL1 writable from userspace

Commit Message

Shameerali Kolothum Thodi Aug. 13, 2024, 2:28 p.m. UTC
KVM exposes the OS Double Lock feature bit to guests but returns
RAZ/WI on guest OSDLR_EL1 accesses. This breaks guest migration between
systems that differ in support for this feature. Make this feature
writable from userspace by setting its mask bit. While at it, also set
the mask bits for the other exposed features in the AA64DFR0_EL1
register.

Also update the selftest to cover these fields.

Signed-off-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
---
   This is based on the discussion here (thanks to Oliver):
   https://lore.kernel.org/all/ZrVSlbVwnaMDShah@linux.dev/
---
 arch/arm64/kvm/sys_regs.c                         | 6 +++++-
 tools/testing/selftests/kvm/aarch64/set_id_regs.c | 4 ++++
 2 files changed, 9 insertions(+), 1 deletion(-)

Comments

Marc Zyngier Aug. 13, 2024, 6:20 p.m. UTC | #1
On Tue, 13 Aug 2024 15:28:35 +0100,
Shameer Kolothum <shameerali.kolothum.thodi@huawei.com> wrote:
> 
> KVM exposes the OS double lock feature bit to Guests but returns
> RAZ/WI on Guest OSDLR_EL1 access. This breaks Guest migration between
> systems where this feature support differ. Add support to make this
> feature writable from userspace by setting the mask bit. While at it,
> set the mask bits for other exposed features in the AA64DFR0_EL1
> register as well.
> 
> Also update the selftest to cover these fields.
> 
> Signed-off-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
> ---
>    This is based on the discussion here(Thanks to Oliver),
>    https://lore.kernel.org/all/ZrVSlbVwnaMDShah@linux.dev/
> ---
>  arch/arm64/kvm/sys_regs.c                         | 6 +++++-
>  tools/testing/selftests/kvm/aarch64/set_id_regs.c | 4 ++++
>  2 files changed, 9 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
> index c90324060436..adb49d681052 100644
> --- a/arch/arm64/kvm/sys_regs.c
> +++ b/arch/arm64/kvm/sys_regs.c
> @@ -2376,7 +2376,11 @@ static const struct sys_reg_desc sys_reg_descs[] = {
>  	  .get_user = get_id_reg,
>  	  .set_user = set_id_aa64dfr0_el1,
>  	  .reset = read_sanitised_id_aa64dfr0_el1,
> -	  .val = ID_AA64DFR0_EL1_PMUVer_MASK |
> +	  .val = ID_AA64DFR0_EL1_DoubleLock_MASK |
> +		 ID_AA64DFR0_EL1_CTX_CMPs_MASK |
> +		 ID_AA64DFR0_EL1_WRPs_MASK |
> +		 ID_AA64DFR0_EL1_BRPs_MASK |


I think this is going to cause some troubles.

The issue is that context-aware breakpoints are the highest-numbered
breakpoints, right after the normal breakpoints (D2.8.3 "Breakpoint
types and linking of breakpoints"). So if you reduce the number of
normal breakpoints, you shift the context-aware ones down, and
everything breaks.

I really don't see how you can safely do that without completely
changing the way we handle the debug registers.

Thanks,

	M.
Shameerali Kolothum Thodi Aug. 14, 2024, 9:17 a.m. UTC | #2
> -----Original Message-----
> From: Marc Zyngier <maz@kernel.org>
> Sent: Tuesday, August 13, 2024 7:21 PM
> 
> I think this is going to cause some troubles.
> 
> The issue is that context-aware breakpoints are the highest-numbered
> breakpoints, right after the normal breakpoints (D2.8.3 "Breakpoint
> types and linking of breakpoints"). So if you reduce the number of
> normal breakpoints, you shift the context-aware ones down, and
> everything breaks.

Thanks Marc for explaining this. I was not aware of this one.

> I really don't see how you can safely do that without completely
> changing the way we handle the debug registers.

Looks like Reiji attempted to do this a while back:
https://lore.kernel.org/kvm/20220419065544.3616948-13-reijiw@google.com/

I guess that one is trying to address the problem you described above, right?
Though it's not clear to me what happened to those patches in the series afterwards.

Coming back to this patch, we don't currently have a requirement to make the
breakpoints writable for migration. The only concern is the OS Double Lock feature.
I'm not sure anyone has a high-priority requirement to make the other features
writable. Would it be acceptable if I resend this patch with just OS Double Lock
being writable? (Sorry if I sound selfish, but at least some progress can be made soon.)

Thanks,
Shameer
Marc Zyngier Aug. 15, 2024, 8:32 a.m. UTC | #3
On Wed, 14 Aug 2024 10:17:10 +0100,
Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com> wrote:
> 
> > I think this is going to cause some troubles.
> > 
> > The issue is that context-aware breakpoints are the highest-numbered
> > breakpoints, right after the normal breakpoints (D2.8.3 "Breakpoint
> > types and linking of breakpoints"). So if you reduce the number of
> > normal breakpoints, you shift the context-aware ones down, and
> > everything breaks.
> 
> Thanks Marc for explaining this. I was not aware of this one.

Yeah, that's a pretty annoying shortcoming of the architecture.  There
is an effort to try and address it, but not sure when that will be
fixed.

> 
> > I really don't see how you can safely do that without completely
> > changing the way we handle the debug registers.
> 
> Looks like Reji has attempted to do this a while back, 
> https://lore.kernel.org/kvm/20220419065544.3616948-13-reijiw@google.com/
> 
> I guess that one is trying to address the problem you described
> above, right?  Though, not clear to me what happened afterwards to
> these patches in the series.
> 
> Coming back to this patch, we don't have a requirement now to make
> the breakpoints writable for migration. The only concern is OS
> Double lock feature.  Not sure anyone has a high priority
> requirement to make the other features writable or not. Will it be
> acceptable if I resent this patch with just OS Double Lock being
> writable?(Sorry If I sound selfish, but at least some progress can
> be made soon).

I think you can keep the other two fields, as they are
independent. You could add a comment indicating why we can't just let
userspace change this field.

Thanks,

	M.
Sebastian Ott Nov. 26, 2024, 5 p.m. UTC | #4
Hi,

On Wed, 14 Aug 2024, Shameerali Kolothum Thodi wrote:
>>
>> I think this is going to cause some troubles.
>>
>> The issue is that context-aware breakpoints are the highest-numbered
>> breakpoints, right after the normal breakpoints (D2.8.3 "Breakpoint
>> types and linking of breakpoints"). So if you reduce the number of
>> normal breakpoints, you shift the context-aware ones down, and
>> everything breaks.
>
> Thanks Marc for explaining this. I was not aware of this one.
>
>> I really don't see how you can safely do that without completely
>> changing the way we handle the debug registers.
>
> Looks like Reji has attempted to do this a while back,
> https://lore.kernel.org/kvm/20220419065544.3616948-13-reijiw@google.com/
>

I've got two machines that differ in the number of breakpoints, and
it would be nice to be able to migrate between them. Is anything
preventing us from trapping the accesses and making sure the correct
breakpoint is used? Is anyone working on this? If not, I'd like to
give it a shot.

Thanks,
Sebastian
Marc Zyngier Nov. 26, 2024, 7:29 p.m. UTC | #5
On Tue, 26 Nov 2024 17:00:35 +0000,
Sebastian Ott <sebott@redhat.com> wrote:
> 
> Hi,
> 
> On Wed, 14 Aug 2024, Shameerali Kolothum Thodi wrote:
> >> 
> >> I think this is going to cause some troubles.
> >> 
> >> The issue is that context-aware breakpoints are the highest-numbered
> >> breakpoints, right after the normal breakpoints (D2.8.3 "Breakpoint
> >> types and linking of breakpoints"). So if you reduce the number of
> >> normal breakpoints, you shift the context-aware ones down, and
> >> everything breaks.
> > 
> > Thanks Marc for explaining this. I was not aware of this one.
> > 
> >> I really don't see how you can safely do that without completely
> >> changing the way we handle the debug registers.
> > 
> > Looks like Reji has attempted to do this a while back,
> > https://lore.kernel.org/kvm/20220419065544.3616948-13-reijiw@google.com/
> > 
> 
> I've got two machines that differ in the number of breakpoints and
> it would be nice to be able to migrate between these. Is anything

Is that the *only* thing that differs? Do they have the same number of
context-aware breakpoints?

> preventing us from trapping the access and make sure the correct
> breakpoint is used? Is anyone working on this? If not I'd like to
> give it a shot.

Not only trapping. You also need to handle some interesting parts of
the architecture, such as the breakpoint linking fun.

But if we are to go down that road, I really want to restrict that to
implementations that have FEAT_FGT. Because otherwise we need to trap
and emulate *everything*, instead of just the breakpoint registers.
And that would be pretty bad from a performance perspective.

Another thing is that this only works because there is no report of
the breakpoint number in ESR_ELx. The moment we offer this
migration "feature", we are painting ourselves into a corner, should the
architecture ever evolve to something less... bizarre.

Finally, who is going to ensure this keeps working in the foreseeable
future? Because while this is nice, that's not what gets deployed in
production, as it leads to unpredictable performance. My take is that
this thing will eventually bitrot and die.

So, do we *really* want to go down that road?

	M.

Patch

diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index c90324060436..adb49d681052 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -2376,7 +2376,11 @@ static const struct sys_reg_desc sys_reg_descs[] = {
 	  .get_user = get_id_reg,
 	  .set_user = set_id_aa64dfr0_el1,
 	  .reset = read_sanitised_id_aa64dfr0_el1,
-	  .val = ID_AA64DFR0_EL1_PMUVer_MASK |
+	  .val = ID_AA64DFR0_EL1_DoubleLock_MASK |
+		 ID_AA64DFR0_EL1_CTX_CMPs_MASK |
+		 ID_AA64DFR0_EL1_WRPs_MASK |
+		 ID_AA64DFR0_EL1_BRPs_MASK |
+		 ID_AA64DFR0_EL1_PMUVer_MASK |
 		 ID_AA64DFR0_EL1_DebugVer_MASK, },
 	ID_SANITISED(ID_AA64DFR1_EL1),
 	ID_UNALLOCATED(5,2),
diff --git a/tools/testing/selftests/kvm/aarch64/set_id_regs.c b/tools/testing/selftests/kvm/aarch64/set_id_regs.c
index d20981663831..1e6b9594daf8 100644
--- a/tools/testing/selftests/kvm/aarch64/set_id_regs.c
+++ b/tools/testing/selftests/kvm/aarch64/set_id_regs.c
@@ -68,6 +68,10 @@ struct test_feature_reg {
 	}
 
 static const struct reg_ftr_bits ftr_id_aa64dfr0_el1[] = {
+	S_REG_FTR_BITS(FTR_LOWER_SAFE, ID_AA64DFR0_EL1, DoubleLock, 0),
+	REG_FTR_BITS(FTR_LOWER_SAFE, ID_AA64DFR0_EL1, CTX_CMPs, 0),
+	REG_FTR_BITS(FTR_LOWER_SAFE, ID_AA64DFR0_EL1, WRPs, 0),
+	REG_FTR_BITS(FTR_LOWER_SAFE, ID_AA64DFR0_EL1, BRPs, 0),
 	S_REG_FTR_BITS(FTR_LOWER_SAFE, ID_AA64DFR0_EL1, PMUVer, 0),
 	REG_FTR_BITS(FTR_LOWER_SAFE, ID_AA64DFR0_EL1, DebugVer, ID_AA64DFR0_EL1_DebugVer_IMP),
 	REG_FTR_END,