diff mbox series

[v9,05/11] KVM: arm64: Enable writable for ID_AA64DFR0_EL1 and ID_DFR0_EL1

Message ID 20230821212243.491660-6-jingzhangos@google.com (mailing list archive)
State New, archived
Headers show
Series Enable writable for idregs DFR0,PFR0, MMFR{0,1,2,3} | expand

Commit Message

Jing Zhang Aug. 21, 2023, 9:22 p.m. UTC
All valid fields in ID_AA64DFR0_EL1 and ID_DFR0_EL1 are writable
from userspace with this change.
RES0 fields and those fields hidden by KVM are not writable.

Signed-off-by: Jing Zhang <jingzhangos@google.com>
---
 arch/arm64/kvm/sys_regs.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

Comments

Marc Zyngier Aug. 22, 2023, 7:06 a.m. UTC | #1
On Mon, 21 Aug 2023 22:22:37 +0100,
Jing Zhang <jingzhangos@google.com> wrote:
> 
> All valid fields in ID_AA64DFR0_EL1 and ID_DFR0_EL1 are writable
> from userspace with this change.
> RES0 fields and those fields hidden by KVM are not writable.
> 
> Signed-off-by: Jing Zhang <jingzhangos@google.com>
> ---
>  arch/arm64/kvm/sys_regs.c | 6 ++++--
>  1 file changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
> index afade7186675..20fc38bad4e8 100644
> --- a/arch/arm64/kvm/sys_regs.c
> +++ b/arch/arm64/kvm/sys_regs.c
> @@ -1931,6 +1931,8 @@ static bool access_spsr(struct kvm_vcpu *vcpu,
>  	return true;
>  }
>  
> +#define ID_AA64DFR0_EL1_RES0_MASK (GENMASK(59, 56) | GENMASK(27, 24) | GENMASK(19, 16))
> +
>  /*
>   * Architected system registers.
>   * Important: Must be sorted ascending by Op0, Op1, CRn, CRm, Op2
> @@ -2006,7 +2008,7 @@ static const struct sys_reg_desc sys_reg_descs[] = {
>  	  .set_user = set_id_dfr0_el1,
>  	  .visibility = aa32_id_visibility,
>  	  .reset = read_sanitised_id_dfr0_el1,
> -	  .val = ID_DFR0_EL1_PerfMon_MASK, },
> +	  .val = GENMASK(31, 0), },

Can you *please* look at the register and realise that we *don't*
support writing to the whole of the low 32 bits? What does it mean to
allow selecting the M-profile debug? Or the memory-mapped trace?

You are advertising a lot of crap to userspace, and that's definitely
not on.

>  	ID_HIDDEN(ID_AFR0_EL1),
>  	AA32_ID_SANITISED(ID_MMFR0_EL1),
>  	AA32_ID_SANITISED(ID_MMFR1_EL1),
> @@ -2055,7 +2057,7 @@ static const struct sys_reg_desc sys_reg_descs[] = {
>  	  .get_user = get_id_reg,
>  	  .set_user = set_id_aa64dfr0_el1,
>  	  .reset = read_sanitised_id_aa64dfr0_el1,
> -	  .val = ID_AA64DFR0_EL1_PMUVer_MASK, },
> +	  .val = ~(ID_AA64DFR0_EL1_PMSVer_MASK | ID_AA64DFR0_EL1_RES0_MASK), },

And it is the same thing here. Where is the handling code to deal with
variable breakpoint numbers? Oh wait, there is none. Really, the only
thing we support writing to are the PMU and Debug versions. And
nothing else.

What does it mean for userspace? Either the write will be denied
despite being advertised a writable field (remember the first patch of
the series???), or we'll blindly accept the write and further ignore
the requested values. Do you really think any of this is acceptable?

This is the *9th* version of this series, and we're still battling
over some extremely basic userspace issues... I don't think we can
merge this series as is stands.

	M.
Jing Zhang Aug. 22, 2023, 6:35 p.m. UTC | #2
Hi Marc,

On Tue, Aug 22, 2023 at 12:06 AM Marc Zyngier <maz@kernel.org> wrote:
>
> On Mon, 21 Aug 2023 22:22:37 +0100,
> Jing Zhang <jingzhangos@google.com> wrote:
> >
> > All valid fields in ID_AA64DFR0_EL1 and ID_DFR0_EL1 are writable
> > from userspace with this change.
> > RES0 fields and those fields hidden by KVM are not writable.
> >
> > Signed-off-by: Jing Zhang <jingzhangos@google.com>
> > ---
> >  arch/arm64/kvm/sys_regs.c | 6 ++++--
> >  1 file changed, 4 insertions(+), 2 deletions(-)
> >
> > diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
> > index afade7186675..20fc38bad4e8 100644
> > --- a/arch/arm64/kvm/sys_regs.c
> > +++ b/arch/arm64/kvm/sys_regs.c
> > @@ -1931,6 +1931,8 @@ static bool access_spsr(struct kvm_vcpu *vcpu,
> >       return true;
> >  }
> >
> > +#define ID_AA64DFR0_EL1_RES0_MASK (GENMASK(59, 56) | GENMASK(27, 24) | GENMASK(19, 16))
> > +
> >  /*
> >   * Architected system registers.
> >   * Important: Must be sorted ascending by Op0, Op1, CRn, CRm, Op2
> > @@ -2006,7 +2008,7 @@ static const struct sys_reg_desc sys_reg_descs[] = {
> >         .set_user = set_id_dfr0_el1,
> >         .visibility = aa32_id_visibility,
> >         .reset = read_sanitised_id_dfr0_el1,
> > -       .val = ID_DFR0_EL1_PerfMon_MASK, },
> > +       .val = GENMASK(31, 0), },
>
> Can you *please* look at the register and realise that we *don't*
> support writing to the whole of the low 32 bits? What does it mean to
> allow selecting the M-profile debug? Or the memory-mapped trace?
>
> You are advertising a lot of crap to userspace, and that's definitely
> not on.
>
> >       ID_HIDDEN(ID_AFR0_EL1),
> >       AA32_ID_SANITISED(ID_MMFR0_EL1),
> >       AA32_ID_SANITISED(ID_MMFR1_EL1),
> > @@ -2055,7 +2057,7 @@ static const struct sys_reg_desc sys_reg_descs[] = {
> >         .get_user = get_id_reg,
> >         .set_user = set_id_aa64dfr0_el1,
> >         .reset = read_sanitised_id_aa64dfr0_el1,
> > -       .val = ID_AA64DFR0_EL1_PMUVer_MASK, },
> > +       .val = ~(ID_AA64DFR0_EL1_PMSVer_MASK | ID_AA64DFR0_EL1_RES0_MASK), },
>
> And it is the same thing here. Where is the handling code to deal with
> variable breakpoint numbers? Oh wait, there is none. Really, the only
> thing we support writing to are the PMU and Debug versions. And
> nothing else.
>
> What does it mean for userspace? Either the write will be denied
> despite being advertised a writable field (remember the first patch of
> the series???), or we'll blindly accept the write and further ignore
> the requested values. Do you really think any of this is acceptable?
>
> This is the *9th* version of this series, and we're still battling
> over some extremely basic userspace issues... I don't think we can
> merge this series as is stands.

I removed sanity checks across multiple fields after the discussion
here: https://lore.kernel.org/all/ZKRC80hb4hXwW8WK@thinky-boi/
I might have misunderstood the discussion. I thought we'd let VMM do
more complete sanity checks.

>
>         M.
>
> --
> Without deviation from the norm, progress is not possible.

Jing
Marc Zyngier Aug. 26, 2023, 10:51 a.m. UTC | #3
On Tue, 22 Aug 2023 19:35:12 +0100,
Jing Zhang <jingzhangos@google.com> wrote:
> 
> Hi Marc,
> 
> On Tue, Aug 22, 2023 at 12:06 AM Marc Zyngier <maz@kernel.org> wrote:
> >
> > On Mon, 21 Aug 2023 22:22:37 +0100,
> > Jing Zhang <jingzhangos@google.com> wrote:
> > >
> > > All valid fields in ID_AA64DFR0_EL1 and ID_DFR0_EL1 are writable
> > > from userspace with this change.
> > > RES0 fields and those fields hidden by KVM are not writable.
> > >
> > > Signed-off-by: Jing Zhang <jingzhangos@google.com>
> > > ---
> > >  arch/arm64/kvm/sys_regs.c | 6 ++++--
> > >  1 file changed, 4 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
> > > index afade7186675..20fc38bad4e8 100644
> > > --- a/arch/arm64/kvm/sys_regs.c
> > > +++ b/arch/arm64/kvm/sys_regs.c
> > > @@ -1931,6 +1931,8 @@ static bool access_spsr(struct kvm_vcpu *vcpu,
> > >       return true;
> > >  }
> > >
> > > +#define ID_AA64DFR0_EL1_RES0_MASK (GENMASK(59, 56) | GENMASK(27, 24) | GENMASK(19, 16))
> > > +
> > >  /*
> > >   * Architected system registers.
> > >   * Important: Must be sorted ascending by Op0, Op1, CRn, CRm, Op2
> > > @@ -2006,7 +2008,7 @@ static const struct sys_reg_desc sys_reg_descs[] = {
> > >         .set_user = set_id_dfr0_el1,
> > >         .visibility = aa32_id_visibility,
> > >         .reset = read_sanitised_id_dfr0_el1,
> > > -       .val = ID_DFR0_EL1_PerfMon_MASK, },
> > > +       .val = GENMASK(31, 0), },
> >
> > Can you *please* look at the register and realise that we *don't*
> > support writing to the whole of the low 32 bits? What does it mean to
> > allow selecting the M-profile debug? Or the memory-mapped trace?
> >
> > You are advertising a lot of crap to userspace, and that's definitely
> > not on.
> >
> > >       ID_HIDDEN(ID_AFR0_EL1),
> > >       AA32_ID_SANITISED(ID_MMFR0_EL1),
> > >       AA32_ID_SANITISED(ID_MMFR1_EL1),
> > > @@ -2055,7 +2057,7 @@ static const struct sys_reg_desc sys_reg_descs[] = {
> > >         .get_user = get_id_reg,
> > >         .set_user = set_id_aa64dfr0_el1,
> > >         .reset = read_sanitised_id_aa64dfr0_el1,
> > > -       .val = ID_AA64DFR0_EL1_PMUVer_MASK, },
> > > +       .val = ~(ID_AA64DFR0_EL1_PMSVer_MASK | ID_AA64DFR0_EL1_RES0_MASK), },
> >
> > And it is the same thing here. Where is the handling code to deal with
> > variable breakpoint numbers? Oh wait, there is none. Really, the only
> > thing we support writing to are the PMU and Debug versions. And
> > nothing else.
> >
> > What does it mean for userspace? Either the write will be denied
> > despite being advertised a writable field (remember the first patch of
> > the series???), or we'll blindly accept the write and further ignore
> > the requested values. Do you really think any of this is acceptable?
> >
> > This is the *9th* version of this series, and we're still battling
> > over some extremely basic userspace issues... I don't think we can
> > merge this series as is stands.
> 
> I removed sanity checks across multiple fields after the discussion
> here: https://lore.kernel.org/all/ZKRC80hb4hXwW8WK@thinky-boi/
> I might have misunderstood the discussion. I thought we'd let VMM do
> more complete sanity checks.

The problem is that you don't even have a statement about how this is
supposed to be handled. What are the rules? How can the VMM author
*know*?

That's my real issue with this series: at no point do we state when an
ID register can be written, what are the rules that must be followed,
where is the split in responsibility between KVM and the VMM, and what
is the expected behaviour when the VMM exposes something that is
completely outlandish to the guest (such as the M-profile debug).

Do you see the issue? We can always fix the code. But once we've
exposed that to userspace, there is no going back. And I really want
everybody's buy-in on that front, including the VMM people.

Thanks,

	M.
Jing Zhang Aug. 27, 2023, 7:31 p.m. UTC | #4
Hi Marc,

On Sat, Aug 26, 2023 at 3:51 AM Marc Zyngier <maz@kernel.org> wrote:
>
> On Tue, 22 Aug 2023 19:35:12 +0100,
> Jing Zhang <jingzhangos@google.com> wrote:
> >
> > Hi Marc,
> >
> > On Tue, Aug 22, 2023 at 12:06 AM Marc Zyngier <maz@kernel.org> wrote:
> > >
> > > On Mon, 21 Aug 2023 22:22:37 +0100,
> > > Jing Zhang <jingzhangos@google.com> wrote:
> > > >
> > > > All valid fields in ID_AA64DFR0_EL1 and ID_DFR0_EL1 are writable
> > > > from userspace with this change.
> > > > RES0 fields and those fields hidden by KVM are not writable.
> > > >
> > > > Signed-off-by: Jing Zhang <jingzhangos@google.com>
> > > > ---
> > > >  arch/arm64/kvm/sys_regs.c | 6 ++++--
> > > >  1 file changed, 4 insertions(+), 2 deletions(-)
> > > >
> > > > diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
> > > > index afade7186675..20fc38bad4e8 100644
> > > > --- a/arch/arm64/kvm/sys_regs.c
> > > > +++ b/arch/arm64/kvm/sys_regs.c
> > > > @@ -1931,6 +1931,8 @@ static bool access_spsr(struct kvm_vcpu *vcpu,
> > > >       return true;
> > > >  }
> > > >
> > > > +#define ID_AA64DFR0_EL1_RES0_MASK (GENMASK(59, 56) | GENMASK(27, 24) | GENMASK(19, 16))
> > > > +
> > > >  /*
> > > >   * Architected system registers.
> > > >   * Important: Must be sorted ascending by Op0, Op1, CRn, CRm, Op2
> > > > @@ -2006,7 +2008,7 @@ static const struct sys_reg_desc sys_reg_descs[] = {
> > > >         .set_user = set_id_dfr0_el1,
> > > >         .visibility = aa32_id_visibility,
> > > >         .reset = read_sanitised_id_dfr0_el1,
> > > > -       .val = ID_DFR0_EL1_PerfMon_MASK, },
> > > > +       .val = GENMASK(31, 0), },
> > >
> > > Can you *please* look at the register and realise that we *don't*
> > > support writing to the whole of the low 32 bits? What does it mean to
> > > allow selecting the M-profile debug? Or the memory-mapped trace?
> > >
> > > You are advertising a lot of crap to userspace, and that's definitely
> > > not on.
> > >
> > > >       ID_HIDDEN(ID_AFR0_EL1),
> > > >       AA32_ID_SANITISED(ID_MMFR0_EL1),
> > > >       AA32_ID_SANITISED(ID_MMFR1_EL1),
> > > > @@ -2055,7 +2057,7 @@ static const struct sys_reg_desc sys_reg_descs[] = {
> > > >         .get_user = get_id_reg,
> > > >         .set_user = set_id_aa64dfr0_el1,
> > > >         .reset = read_sanitised_id_aa64dfr0_el1,
> > > > -       .val = ID_AA64DFR0_EL1_PMUVer_MASK, },
> > > > +       .val = ~(ID_AA64DFR0_EL1_PMSVer_MASK | ID_AA64DFR0_EL1_RES0_MASK), },
> > >
> > > And it is the same thing here. Where is the handling code to deal with
> > > variable breakpoint numbers? Oh wait, there is none. Really, the only
> > > thing we support writing to are the PMU and Debug versions. And
> > > nothing else.
> > >
> > > What does it mean for userspace? Either the write will be denied
> > > despite being advertised a writable field (remember the first patch of
> > > the series???), or we'll blindly accept the write and further ignore
> > > the requested values. Do you really think any of this is acceptable?
> > >
> > > This is the *9th* version of this series, and we're still battling
> > > over some extremely basic userspace issues... I don't think we can
> > > merge this series as is stands.
> >
> > I removed sanity checks across multiple fields after the discussion
> > here: https://lore.kernel.org/all/ZKRC80hb4hXwW8WK@thinky-boi/
> > I might have misunderstood the discussion. I thought we'd let VMM do
> > more complete sanity checks.
>
> The problem is that you don't even have a statement about how this is
> supposed to be handled. What are the rules? How can the VMM author
> *know*?
>
> That's my real issue with this series: at no point do we state when an
> ID register can be written, what are the rules that must be followed,
> where is the split in responsibility between KVM and the VMM, and what
> is the expected behaviour when the VMM exposes something that is
> completely outlandish to the guest (such as the M-profile debug).
>
> Do you see the issue? We can always fix the code. But once we've
> exposed that to userspace, there is no going back. And I really want
> everybody's buy-in on that front, including the VMM people.

Understood.
Looks like it would be less complicated to have KVM do all the sanity
checks to determine if a feature field is configured correctly.
The determination can be done by following rules:
1. Architecture constraints from ARM ARM.
2. KVM constraints. (Those features not exposed by KVM should be read-only)
3. VCPU features. (The write mask needs to be determined on-the-fly)

Thanks,
Jing

>
> Thanks,
>
>         M.
>
> --
> Without deviation from the norm, progress is not possible.
Jing Zhang Sept. 6, 2023, 2:13 a.m. UTC | #5
Hi Marc,

On Sun, Aug 27, 2023 at 12:31 PM Jing Zhang <jingzhangos@google.com> wrote:
>
> Hi Marc,
>
> On Sat, Aug 26, 2023 at 3:51 AM Marc Zyngier <maz@kernel.org> wrote:
> >
> > On Tue, 22 Aug 2023 19:35:12 +0100,
> > Jing Zhang <jingzhangos@google.com> wrote:
> > >
> > > Hi Marc,
> > >
> > > On Tue, Aug 22, 2023 at 12:06 AM Marc Zyngier <maz@kernel.org> wrote:
> > > >
> > > > On Mon, 21 Aug 2023 22:22:37 +0100,
> > > > Jing Zhang <jingzhangos@google.com> wrote:
> > > > >
> > > > > All valid fields in ID_AA64DFR0_EL1 and ID_DFR0_EL1 are writable
> > > > > from userspace with this change.
> > > > > RES0 fields and those fields hidden by KVM are not writable.
> > > > >
> > > > > Signed-off-by: Jing Zhang <jingzhangos@google.com>
> > > > > ---
> > > > >  arch/arm64/kvm/sys_regs.c | 6 ++++--
> > > > >  1 file changed, 4 insertions(+), 2 deletions(-)
> > > > >
> > > > > diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
> > > > > index afade7186675..20fc38bad4e8 100644
> > > > > --- a/arch/arm64/kvm/sys_regs.c
> > > > > +++ b/arch/arm64/kvm/sys_regs.c
> > > > > @@ -1931,6 +1931,8 @@ static bool access_spsr(struct kvm_vcpu *vcpu,
> > > > >       return true;
> > > > >  }
> > > > >
> > > > > +#define ID_AA64DFR0_EL1_RES0_MASK (GENMASK(59, 56) | GENMASK(27, 24) | GENMASK(19, 16))
> > > > > +
> > > > >  /*
> > > > >   * Architected system registers.
> > > > >   * Important: Must be sorted ascending by Op0, Op1, CRn, CRm, Op2
> > > > > @@ -2006,7 +2008,7 @@ static const struct sys_reg_desc sys_reg_descs[] = {
> > > > >         .set_user = set_id_dfr0_el1,
> > > > >         .visibility = aa32_id_visibility,
> > > > >         .reset = read_sanitised_id_dfr0_el1,
> > > > > -       .val = ID_DFR0_EL1_PerfMon_MASK, },
> > > > > +       .val = GENMASK(31, 0), },
> > > >
> > > > Can you *please* look at the register and realise that we *don't*
> > > > support writing to the whole of the low 32 bits? What does it mean to
> > > > allow selecting the M-profile debug? Or the memory-mapped trace?
> > > >
> > > > You are advertising a lot of crap to userspace, and that's definitely
> > > > not on.
> > > >
> > > > >       ID_HIDDEN(ID_AFR0_EL1),
> > > > >       AA32_ID_SANITISED(ID_MMFR0_EL1),
> > > > >       AA32_ID_SANITISED(ID_MMFR1_EL1),
> > > > > @@ -2055,7 +2057,7 @@ static const struct sys_reg_desc sys_reg_descs[] = {
> > > > >         .get_user = get_id_reg,
> > > > >         .set_user = set_id_aa64dfr0_el1,
> > > > >         .reset = read_sanitised_id_aa64dfr0_el1,
> > > > > -       .val = ID_AA64DFR0_EL1_PMUVer_MASK, },
> > > > > +       .val = ~(ID_AA64DFR0_EL1_PMSVer_MASK | ID_AA64DFR0_EL1_RES0_MASK), },
> > > >
> > > > And it is the same thing here. Where is the handling code to deal with
> > > > variable breakpoint numbers? Oh wait, there is none. Really, the only
> > > > thing we support writing to are the PMU and Debug versions. And
> > > > nothing else.
> > > >
> > > > What does it mean for userspace? Either the write will be denied
> > > > despite being advertised a writable field (remember the first patch of
> > > > the series???), or we'll blindly accept the write and further ignore
> > > > the requested values. Do you really think any of this is acceptable?
> > > >
> > > > This is the *9th* version of this series, and we're still battling
> > > > over some extremely basic userspace issues... I don't think we can
> > > > merge this series as is stands.
> > >
> > > I removed sanity checks across multiple fields after the discussion
> > > here: https://lore.kernel.org/all/ZKRC80hb4hXwW8WK@thinky-boi/
> > > I might have misunderstood the discussion. I thought we'd let VMM do
> > > more complete sanity checks.
> >
> > The problem is that you don't even have a statement about how this is
> > supposed to be handled. What are the rules? How can the VMM author
> > *know*?
> >
> > That's my real issue with this series: at no point do we state when an
> > ID register can be written, what are the rules that must be followed,
> > where is the split in responsibility between KVM and the VMM, and what
> > is the expected behaviour when the VMM exposes something that is
> > completely outlandish to the guest (such as the M-profile debug).
> >
> > Do you see the issue? We can always fix the code. But once we've
> > exposed that to userspace, there is no going back. And I really want
> > everybody's buy-in on that front, including the VMM people.
>
> Understood.
> Looks like it would be less complicated to have KVM do all the sanity
> checks to determine if a feature field is configured correctly.
> The determination can be done by following rules:
> 1. Architecture constraints from ARM ARM.
> 2. KVM constraints. (Those features not exposed by KVM should be read-only)
> 3. VCPU features. (The write mask needs to be determined on-the-fly)

Does this sound good to you to have all sanity checks in KVM?

Thanks,
Jing

>
> Thanks,
> Jing
>
> >
> > Thanks,
> >
> >         M.
> >
> > --
> > Without deviation from the norm, progress is not possible.
Oliver Upton Sept. 6, 2023, 6:03 a.m. UTC | #6
On Tue, Sep 05, 2023 at 07:13:35PM -0700, Jing Zhang wrote:

[...]

> > > > I removed sanity checks across multiple fields after the discussion
> > > > here: https://lore.kernel.org/all/ZKRC80hb4hXwW8WK@thinky-boi/
> > > > I might have misunderstood the discussion. I thought we'd let VMM do
> > > > more complete sanity checks.
> > >
> > > The problem is that you don't even have a statement about how this is
> > > supposed to be handled. What are the rules? How can the VMM author
> > > *know*?
> > >
> > > That's my real issue with this series: at no point do we state when an
> > > ID register can be written, what are the rules that must be followed,
> > > where is the split in responsibility between KVM and the VMM, and what
> > > is the expected behaviour when the VMM exposes something that is
> > > completely outlandish to the guest (such as the M-profile debug).
> > >
> > > Do you see the issue? We can always fix the code. But once we've
> > > exposed that to userspace, there is no going back. And I really want
> > > everybody's buy-in on that front, including the VMM people.
> >
> > Understood.
> > Looks like it would be less complicated to have KVM do all the sanity
> > checks to determine if a feature field is configured correctly.
> > The determination can be done by following rules:
> > 1. Architecture constraints from ARM ARM.
> > 2. KVM constraints. (Those features not exposed by KVM should be read-only)
> > 3. VCPU features. (The write mask needs to be determined on-the-fly)
> 
> Does this sound good to you to have all sanity checks in KVM?

I would rather we not implement exhaustive checks in KVM, because we
*will* get them wrong. I don't believe Marc is asking for exhaustive
sanity checks in KVM either, just that we prevent userspace from
selecting features we will _never_ emulate (like the MProfDbg example).
You need very clear documentation for the usage pattern and what the
VMM's responsibilities are (like obeying the ARM ARM).

While we're here, I'll subject the both of you to one of the ongoing
thoughts I've had with the whole configurable CPU model UAPI. Ideally
we should get to the point where all emulation and trap configuration is
solely determined from the ID register values of the VM.

I'm a bit worried that this mixes poorly with userspace system register
accesses, though. As implemented, nothing stops userspce from
interleaving ID register writes with other system registers that might
be conditional on a particular CPU feature. For example, disabling the
PMU via DFR0 and then writing to the PMCs. Sure, this could be hacked
around by making PMCs visible to userspace only when the ID register
hides the feature, but we may be painting ourselves in a corner w.r.t.
the addition of new features.

One way out would be to make the ID registers immutable after
the first system register write outside of the ID register range.
Changes under the hood wouldn't be too terrible; AFAICT it involves
hoisting error checking into kvm_vcpu_init_check_features() and
deferring kvm_vcpu_reset() to the point the ID regs are immutable.

I somewhat like the clear order of operations, and it _shouldn't_ break
existing VMMs since they just save/restore the KVM values verbatim. Of
course, this requires some very clear documentation for VMMs that want
to adopt the new UAPI.
diff mbox series

Patch

diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index afade7186675..20fc38bad4e8 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -1931,6 +1931,8 @@  static bool access_spsr(struct kvm_vcpu *vcpu,
 	return true;
 }
 
+#define ID_AA64DFR0_EL1_RES0_MASK (GENMASK(59, 56) | GENMASK(27, 24) | GENMASK(19, 16))
+
 /*
  * Architected system registers.
  * Important: Must be sorted ascending by Op0, Op1, CRn, CRm, Op2
@@ -2006,7 +2008,7 @@  static const struct sys_reg_desc sys_reg_descs[] = {
 	  .set_user = set_id_dfr0_el1,
 	  .visibility = aa32_id_visibility,
 	  .reset = read_sanitised_id_dfr0_el1,
-	  .val = ID_DFR0_EL1_PerfMon_MASK, },
+	  .val = GENMASK(31, 0), },
 	ID_HIDDEN(ID_AFR0_EL1),
 	AA32_ID_SANITISED(ID_MMFR0_EL1),
 	AA32_ID_SANITISED(ID_MMFR1_EL1),
@@ -2055,7 +2057,7 @@  static const struct sys_reg_desc sys_reg_descs[] = {
 	  .get_user = get_id_reg,
 	  .set_user = set_id_aa64dfr0_el1,
 	  .reset = read_sanitised_id_aa64dfr0_el1,
-	  .val = ID_AA64DFR0_EL1_PMUVer_MASK, },
+	  .val = ~(ID_AA64DFR0_EL1_PMSVer_MASK | ID_AA64DFR0_EL1_RES0_MASK), },
 	ID_SANITISED(ID_AA64DFR1_EL1),
 	ID_UNALLOCATED(5,2),
 	ID_UNALLOCATED(5,3),