diff mbox series

[v2,07/14] target/arm/cpu64: max cpu: Introduce sve<vl-bits> properties

Message ID 20190621163422.6127-8-drjones@redhat.com (mailing list archive)
State New, archived
Headers show
Series target/arm/kvm: enable SVE in guests | expand

Commit Message

Andrew Jones June 21, 2019, 4:34 p.m. UTC
Introduce cpu properties to give fine control over SVE vector lengths.
We introduce a property for each valid length up to the current
maximum supported, which is 2048-bits. The properties are named, e.g.
sve128, sve256, sve512, ..., where the number is the number of bits.

It's now possible to do something like -cpu max,sve-max-vq=4,sve384=off
to provide a guest vector lengths 128, 256, and 512 bits. The resulting
set must conform to the architectural constraint of having all power-of-2
lengths smaller than the maximum length present. It's also possible to
only provide sve<vl-bits> properties, e.g. -cpu max,sve512=on. That
example provides the machine with 128, 256, and 512 bit vector lengths.
It doesn't hurt to explicitly ask for all expected vector lengths,
which is what, for example, libvirt should do.

Note1, it is not possible to use sve<vl-bits> properties before
sve-max-vq, e.g. -cpu max,sve384=off,sve-max-vq=4, as supporting
that overly complicates the user input validation.

Note2, while one might expect -cpu max,sve-max-vq=4,sve512=on to be the
same as -cpu max,sve512=on, they are not. The former enables all vector
lengths 512 bits and smaller, while the latter only sets the 512-bit
length and its smaller power-of-2 lengths. It's probably best not to use
sve-max-vq with sve<vl-bits> properties, but it can't be completely
forbidden as we want qmp_query_cpu_model_expansion to work with guests
launched with e.g. -cpu max,sve-max-vq=8 on their command line.

Signed-off-by: Andrew Jones <drjones@redhat.com>
---
 target/arm/cpu.c         |   6 +
 target/arm/cpu.h         |  14 ++
 target/arm/cpu64.c       | 360 ++++++++++++++++++++++++++++++++++++++-
 target/arm/helper.c      |  11 +-
 target/arm/monitor.c     |  16 ++
 tests/arm-cpu-features.c | 217 +++++++++++++++++++++++
 6 files changed, 620 insertions(+), 4 deletions(-)

Comments

Dave Martin June 24, 2019, 11:05 a.m. UTC | #1
On Fri, Jun 21, 2019 at 05:34:15PM +0100, Andrew Jones wrote:
> Introduce cpu properties to give fine control over SVE vector lengths.
> We introduce a property for each valid length up to the current
> maximum supported, which is 2048-bits. The properties are named, e.g.
> sve128, sve256, sve512, ..., where the number is the number of bits.
> 
> It's now possible to do something like -cpu max,sve-max-vq=4,sve384=off
> to provide a guest vector lengths 128, 256, and 512 bits. The resulting
> set must conform to the architectural constraint of having all power-of-2
> lengths smaller than the maximum length present. It's also possible to
> only provide sve<vl-bits> properties, e.g. -cpu max,sve512=on. That
> example provides the machine with 128, 256, and 512 bit vector lengths.
> It doesn't hurt to explicitly ask for all expected vector lengths,
> which is what, for example, libvirt should do.
> 
> Note1, it is not possible to use sve<vl-bits> properties before
> sve-max-vq, e.g. -cpu max,sve384=off,sve-max-vq=4, as supporting
> that overly complicates the user input validation.
> 
> Note2, while one might expect -cpu max,sve-max-vq=4,sve512=on to be the
> same as -cpu max,sve512=on, they are not. The former enables all vector
> lengths 512 bits and smaller, while the latter only sets the 512-bit
> length and its smaller power-of-2 lengths. It's probably best not to use
> sve-max-vq with sve<vl-bits> properties, but it can't be completely
> forbidden as we want qmp_query_cpu_model_expansion to work with guests
> launched with e.g. -cpu max,sve-max-vq=8 on their command line.

Supporting both options together in a non-idempotent way seems to
complicate things.

Would it be simpler to allow sve-max-vq only when there are no sve<bits>
options?

Alternatively, the two options would be defined so that their meanings
are independent of parse order.

So, way sve-max-vq=<max> means that:

 * all VQs up to max for which there is no corresponding sve<bits>=off,
   are enabled; and

 * VQ max is enabled; and

 * all VQs exceeding max are disabled.

While sve<bits>=(on|off) means

 * the VQ coresponding to <bits> is enabled (for on) or disabled (for
   off).
 
After parsing all the options, you check that the sve-max-vq and
sve<bits> optinos are consistent.  If you disallow duplicate sve-max-vq
or sve<bits> options, then there is no possibility of ambiguity and the
order or options doesn't matter.

(There may be constraints on the way qemu options parsing works that
make this hard, though...)

Having sve-max-vq in 128-bit units while sve<bits> are named in terms of
bit counts also feels a bit odd now.

[...]

Cheers
---Dave
Andrew Jones June 24, 2019, 11:49 a.m. UTC | #2
On Mon, Jun 24, 2019 at 12:05:26PM +0100, Dave Martin wrote:
> On Fri, Jun 21, 2019 at 05:34:15PM +0100, Andrew Jones wrote:
> > Introduce cpu properties to give fine control over SVE vector lengths.
> > We introduce a property for each valid length up to the current
> > maximum supported, which is 2048-bits. The properties are named, e.g.
> > sve128, sve256, sve512, ..., where the number is the number of bits.
> > 
> > It's now possible to do something like -cpu max,sve-max-vq=4,sve384=off
> > to provide a guest vector lengths 128, 256, and 512 bits. The resulting
> > set must conform to the architectural constraint of having all power-of-2
> > lengths smaller than the maximum length present. It's also possible to
> > only provide sve<vl-bits> properties, e.g. -cpu max,sve512=on. That
> > example provides the machine with 128, 256, and 512 bit vector lengths.
> > It doesn't hurt to explicitly ask for all expected vector lengths,
> > which is what, for example, libvirt should do.
> > 
> > Note1, it is not possible to use sve<vl-bits> properties before
> > sve-max-vq, e.g. -cpu max,sve384=off,sve-max-vq=4, as supporting
> > that overly complicates the user input validation.
> > 
> > Note2, while one might expect -cpu max,sve-max-vq=4,sve512=on to be the
> > same as -cpu max,sve512=on, they are not. The former enables all vector
> > lengths 512 bits and smaller, while the latter only sets the 512-bit
> > length and its smaller power-of-2 lengths. It's probably best not to use
> > sve-max-vq with sve<vl-bits> properties, but it can't be completely
> > forbidden as we want qmp_query_cpu_model_expansion to work with guests
> > launched with e.g. -cpu max,sve-max-vq=8 on their command line.
> 
> Supporting both options together in a non-idempotent way seems to
> complicate things.
> 
> Would it be simpler to allow sve-max-vq only when there are no sve<bits>
> options?

Not really. Since we can't simply remove sve-max-vq from the 'max' cpu
type, then we'd still need conditions to check when we can use it. The
restriction that it has to come first reduces the complexity
substantially, and I think restricting to not being allowed
at all, when sve<vl-bits> are used, would only give us a net check
reduction of one or two.

> 
> Alternatively, the two options would be defined so that their meanings
> are independent of parse order.
> 
> So, way sve-max-vq=<max> means that:
> 
>  * all VQs up to max for which there is no corresponding sve<bits>=off,
>    are enabled; and
> 
>  * VQ max is enabled; and
> 
>  * all VQs exceeding max are disabled.
> 
> While sve<bits>=(on|off) means
> 
>  * the VQ coresponding to <bits> is enabled (for on) or disabled (for
>    off).
>  
> After parsing all the options, you check that the sve-max-vq and
> sve<bits> optinos are consistent.  If you disallow duplicate sve-max-vq
> or sve<bits> options, then there is no possibility of ambiguity and the
> order or options doesn't matter.

I don't want to put any final checks at the end of parsing all options,
because that won't work with the QMP query.

> 
> (There may be constraints on the way qemu options parsing works that
> make this hard, though...)

I can't think of any issue with the parsing, but the QMP query only using
the property get accessors, without any finalizing, does put constraints
on what we can do.

> 
> Having sve-max-vq in 128-bit units while sve<bits> are named in terms of
> bit counts also feels a bit odd now.

sve-max-vq already exists, so it'd have to be deprecated if we don't want
it anymore. And I think sve<vl-bits> is the right naming to go with for
that. Of course using sve-max-vq is completely optional. If you don't want
it for backward compatible reasons, or as a shorthand to restrict the
lengths, then you can just use the sve<vl-bits> properties. Indeed, with
KVM, if you use the 'host' cpu type (the default for use with KVM), then
you won't even have the sve-max-vq property. As 'host' never had it, I
didn't have to keep it, nor adopt it. And of course I didn't want to
adopt it.

Thanks,
drew
Andrew Jones June 24, 2019, 12:10 p.m. UTC | #3
On Mon, Jun 24, 2019 at 01:49:11PM +0200, Andrew Jones wrote:
> On Mon, Jun 24, 2019 at 12:05:26PM +0100, Dave Martin wrote:
> > On Fri, Jun 21, 2019 at 05:34:15PM +0100, Andrew Jones wrote:
> > > Introduce cpu properties to give fine control over SVE vector lengths.
> > > We introduce a property for each valid length up to the current
> > > maximum supported, which is 2048-bits. The properties are named, e.g.
> > > sve128, sve256, sve512, ..., where the number is the number of bits.
> > > 
> > > It's now possible to do something like -cpu max,sve-max-vq=4,sve384=off
> > > to provide a guest vector lengths 128, 256, and 512 bits. The resulting
> > > set must conform to the architectural constraint of having all power-of-2
> > > lengths smaller than the maximum length present. It's also possible to
> > > only provide sve<vl-bits> properties, e.g. -cpu max,sve512=on. That
> > > example provides the machine with 128, 256, and 512 bit vector lengths.
> > > It doesn't hurt to explicitly ask for all expected vector lengths,
> > > which is what, for example, libvirt should do.
> > > 
> > > Note1, it is not possible to use sve<vl-bits> properties before
> > > sve-max-vq, e.g. -cpu max,sve384=off,sve-max-vq=4, as supporting
> > > that overly complicates the user input validation.
> > > 
> > > Note2, while one might expect -cpu max,sve-max-vq=4,sve512=on to be the
> > > same as -cpu max,sve512=on, they are not. The former enables all vector
> > > lengths 512 bits and smaller, while the latter only sets the 512-bit
> > > length and its smaller power-of-2 lengths. It's probably best not to use
> > > sve-max-vq with sve<vl-bits> properties, but it can't be completely
> > > forbidden as we want qmp_query_cpu_model_expansion to work with guests
> > > launched with e.g. -cpu max,sve-max-vq=8 on their command line.
> > 
> > Supporting both options together in a non-idempotent way seems to
> > complicate things.
> > 
> > Would it be simpler to allow sve-max-vq only when there are no sve<bits>
> > options?
> 
> Not really. Since we can't simply remove sve-max-vq from the 'max' cpu
> type, then we'd still need conditions to check when we can use it. The
> restriction that it has to come first reduces the complexity
> substantially, and I think restricting to not being allowed
> at all, when sve<vl-bits> are used, would only give us a net check
> reduction of one or two.
> 
> > 
> > Alternatively, the two options would be defined so that their meanings
> > are independent of parse order.
> > 
> > So, way sve-max-vq=<max> means that:
> > 
> >  * all VQs up to max for which there is no corresponding sve<bits>=off,
> >    are enabled; and
> > 
> >  * VQ max is enabled; and
> > 
> >  * all VQs exceeding max are disabled.
> > 
> > While sve<bits>=(on|off) means
> > 
> >  * the VQ coresponding to <bits> is enabled (for on) or disabled (for
> >    off).
> >  
> > After parsing all the options, you check that the sve-max-vq and
> > sve<bits> optinos are consistent.  If you disallow duplicate sve-max-vq
> > or sve<bits> options, then there is no possibility of ambiguity and the
> > order or options doesn't matter.
> 
> I don't want to put any final checks at the end of parsing all options,
> because that won't work with the QMP query.

Actually, I think I can allow sve-max-vq to come after sve<vl-bits>
without adding much code, and without requiring a final check. I
could try that if you'd like, but I'm not sure it's worth it. Also, I
feel like I tried this once already and rejected it for some reason,
but atm I can't remember why.

> 
> > 
> > (There may be constraints on the way qemu options parsing works that
> > make this hard, though...)
> 
> I can't think of any issue with the parsing, but the QMP query only using
> the property get accessors, without any finalizing, does put constraints
> on what we can do.
> 
> > 
> > Having sve-max-vq in 128-bit units while sve<bits> are named in terms of
> > bit counts also feels a bit odd now.
> 
> sve-max-vq already exists, so it'd have to be deprecated if we don't want
> it anymore. And I think sve<vl-bits> is the right naming to go with for
> that. Of course using sve-max-vq is completely optional. If you don't want
> it for backward compatible reasons, or as a shorthand to restrict the
> lengths, then you can just use the sve<vl-bits> properties. Indeed, with
> KVM, if you use the 'host' cpu type (the default for use with KVM), then
> you won't even have the sve-max-vq property. As 'host' never had it, I
> didn't have to keep it, nor adopt it. And of course I didn't want to
> adopt it.
> 
> Thanks,
> drew
>
Dave Martin June 24, 2019, 4:06 p.m. UTC | #4
On Mon, Jun 24, 2019 at 01:10:41PM +0100, Andrew Jones wrote:
> On Mon, Jun 24, 2019 at 01:49:11PM +0200, Andrew Jones wrote:
> > On Mon, Jun 24, 2019 at 12:05:26PM +0100, Dave Martin wrote:
> > > On Fri, Jun 21, 2019 at 05:34:15PM +0100, Andrew Jones wrote:
> > > > Introduce cpu properties to give fine control over SVE vector lengths.
> > > > We introduce a property for each valid length up to the current
> > > > maximum supported, which is 2048-bits. The properties are named, e.g.
> > > > sve128, sve256, sve512, ..., where the number is the number of bits.
> > > > 
> > > > It's now possible to do something like -cpu max,sve-max-vq=4,sve384=off
> > > > to provide a guest vector lengths 128, 256, and 512 bits. The resulting
> > > > set must conform to the architectural constraint of having all power-of-2
> > > > lengths smaller than the maximum length present. It's also possible to
> > > > only provide sve<vl-bits> properties, e.g. -cpu max,sve512=on. That
> > > > example provides the machine with 128, 256, and 512 bit vector lengths.
> > > > It doesn't hurt to explicitly ask for all expected vector lengths,
> > > > which is what, for example, libvirt should do.
> > > > 
> > > > Note1, it is not possible to use sve<vl-bits> properties before
> > > > sve-max-vq, e.g. -cpu max,sve384=off,sve-max-vq=4, as supporting
> > > > that overly complicates the user input validation.
> > > > 
> > > > Note2, while one might expect -cpu max,sve-max-vq=4,sve512=on to be the
> > > > same as -cpu max,sve512=on, they are not. The former enables all vector
> > > > lengths 512 bits and smaller, while the latter only sets the 512-bit
> > > > length and its smaller power-of-2 lengths. It's probably best not to use
> > > > sve-max-vq with sve<vl-bits> properties, but it can't be completely
> > > > forbidden as we want qmp_query_cpu_model_expansion to work with guests
> > > > launched with e.g. -cpu max,sve-max-vq=8 on their command line.
> > > 
> > > Supporting both options together in a non-idempotent way seems to
> > > complicate things.
> > > 
> > > Would it be simpler to allow sve-max-vq only when there are no sve<bits>
> > > options?
> > 
> > Not really. Since we can't simply remove sve-max-vq from the 'max' cpu
> > type, then we'd still need conditions to check when we can use it. The
> > restriction that it has to come first reduces the complexity
> > substantially, and I think restricting to not being allowed
> > at all, when sve<vl-bits> are used, would only give us a net check
> > reduction of one or two.
> > 
> > > 
> > > Alternatively, the two options would be defined so that their meanings
> > > are independent of parse order.
> > > 
> > > So, way sve-max-vq=<max> means that:
> > > 
> > >  * all VQs up to max for which there is no corresponding sve<bits>=off,
> > >    are enabled; and
> > > 
> > >  * VQ max is enabled; and
> > > 
> > >  * all VQs exceeding max are disabled.
> > > 
> > > While sve<bits>=(on|off) means
> > > 
> > >  * the VQ coresponding to <bits> is enabled (for on) or disabled (for
> > >    off).
> > >  
> > > After parsing all the options, you check that the sve-max-vq and
> > > sve<bits> optinos are consistent.  If you disallow duplicate sve-max-vq
> > > or sve<bits> options, then there is no possibility of ambiguity and the
> > > order or options doesn't matter.
> > 
> > I don't want to put any final checks at the end of parsing all options,
> > because that won't work with the QMP query.

Ack, I'm not familiar with the QMP side of things.

> Actually, I think I can allow sve-max-vq to come after sve<vl-bits>
> without adding much code, and without requiring a final check. I
> could try that if you'd like, but I'm not sure it's worth it. Also, I
> feel like I tried this once already and rejected it for some reason,
> but atm I can't remember why.

Probably not.  I was aiming to simplify things, but if my suggestions
don't make anything simpler, then it's obviously not worth it :)

> > > (There may be constraints on the way qemu options parsing works that
> > > make this hard, though...)
> > 
> > I can't think of any issue with the parsing, but the QMP query only using
> > the property get accessors, without any finalizing, does put constraints
> > on what we can do.
> > 
> > > Having sve-max-vq in 128-bit units while sve<bits> are named in terms of
> > > bit counts also feels a bit odd now.
> > 
> > sve-max-vq already exists, so it'd have to be deprecated if we don't want
> > it anymore. And I think sve<vl-bits> is the right naming to go with for
> > that. Of course using sve-max-vq is completely optional. If you don't want
> > it for backward compatible reasons, or as a shorthand to restrict the
> > lengths, then you can just use the sve<vl-bits> properties. Indeed, with
> > KVM, if you use the 'host' cpu type (the default for use with KVM), then
> > you won't even have the sve-max-vq property. As 'host' never had it, I
> > didn't have to keep it, nor adopt it. And of course I didn't want to
> > adopt it.

I hadn't realised the sve-max-vq option was already merged.  I agree
that if there are established semantics for this, we shouldn't mess with
it now.

So, keep it as you proposed, I guess.

Cheers
---Dave
Eric Auger June 26, 2019, 2:58 p.m. UTC | #5
Hi Drew,

On 6/21/19 6:34 PM, Andrew Jones wrote:
> Introduce cpu properties to give fine control over SVE vector lengths.
> We introduce a property for each valid length up to the current
> maximum supported, which is 2048-bits. The properties are named, e.g.
> sve128, sve256, sve512, ..., where the number is the number of bits.
sve384 then in the natural order, otherwise it gives the impression you
can only specify * 128bit pow of 2 sizes at this stage of the reading.

> 
> It's now possible to do something like -cpu max,sve-max-vq=4,sve384=off
Document the fact sve512 cannot be turned to off, which sounds fully
sensible (by reading the code). By the way, I think an actual
documentation should be provided in qemu. Maybe as spec to agree on.
> to provide a guest vector lengths 128, 256, and 512 bits. The resulting
> set must conform to the architectural constraint of having all power-of-2
> lengths smaller than the maximum length present. It's also possible to
> only provide sve<vl-bits> properties, e.g. -cpu max,sve512=on> That example provides the machine with 128, 256, and 512 bit vector
lengths.
> It doesn't hurt to explicitly ask for all expected vector lengths,
> which is what, for example, libvirt should do.>
> Note1, it is not possible to use sve<vl-bits> properties before
> sve-max-vq, e.g. -cpu max,sve384=off,sve-max-vq=4, as supporting
> that overly complicates the user input validation.
> 
> Note2, while one might expect -cpu max,sve-max-vq=4,sve512=on to be the
> same as -cpu max,sve512=on, they are not.
yep it is a bit weird

Didn't you consider -cpu max,sve-max-vq=4,req_only=true removing non
power of 2 values and sve<vl-bits> setting a single VLbit?
> The former enables all vector lengths 512 bits and smaller
( required and permitted)
> while the latter only sets the 512-bit
> length and its smaller power-of-2 lengths. It's probably best not to use
> sve-max-vq with sve<vl-bits> properties, but it can't be completely
> forbidden as we want qmp_query_cpu_model_expansion to work with guests
> launched with e.g. -cpu max,sve-max-vq=8 on their command line.
what does happen if you specify -cpu max,sve384=on? (seems to be allowed
in the code?)

> 
> Signed-off-by: Andrew Jones <drjones@redhat.com>
> ---
>  target/arm/cpu.c         |   6 +
>  target/arm/cpu.h         |  14 ++
>  target/arm/cpu64.c       | 360 ++++++++++++++++++++++++++++++++++++++-
>  target/arm/helper.c      |  11 +-
>  target/arm/monitor.c     |  16 ++
>  tests/arm-cpu-features.c | 217 +++++++++++++++++++++++
>  6 files changed, 620 insertions(+), 4 deletions(-)
> 
> diff --git a/target/arm/cpu.c b/target/arm/cpu.c
> index f08e178fc84b..e060a0d9df0e 100644
> --- a/target/arm/cpu.c
> +++ b/target/arm/cpu.c
> @@ -1019,6 +1019,12 @@ static void arm_cpu_realizefn(DeviceState *dev, Error **errp)
>          return;
>      }
>  
> +    arm_cpu_sve_finalize(cpu, &local_err);
> +    if (local_err) {
> +        error_propagate(errp, local_err);
> +        return;
> +    }
> +
>      if (arm_feature(env, ARM_FEATURE_AARCH64) &&
>          cpu->has_vfp != cpu->has_neon) {
>          /*
> diff --git a/target/arm/cpu.h b/target/arm/cpu.h
> index f9da672be575..cbb155cf72a5 100644
> --- a/target/arm/cpu.h
> +++ b/target/arm/cpu.h
> @@ -184,8 +184,13 @@ typedef struct {
>  
>  #ifdef TARGET_AARCH64
>  # define ARM_MAX_VQ    16
> +void arm_cpu_sve_finalize(ARMCPU *cpu, Error **errp);
> +uint32_t arm_cpu_vq_map_next_smaller(ARMCPU *cpu, uint32_t vq);
>  #else
>  # define ARM_MAX_VQ    1
> +static inline void arm_cpu_sve_finalize(ARMCPU *cpu, Error **errp) { }
> +static inline uint32_t arm_cpu_vq_map_next_smaller(ARMCPU *cpu, uint32_t vq)
> +{ return 0; }
>  #endif
>  
>  typedef struct ARMVectorReg {
> @@ -915,6 +920,15 @@ struct ARMCPU {
>  
>      /* Used to set the maximum vector length the cpu will support.  */
>      uint32_t sve_max_vq;
> +
> +    /*
> +     * In sve_vq_map each set bit is a supported vector length of
> +     * (bit-number + 1) * 16 bytes, i.e. each bit number + 1 is the vector> +     * length in quadwords. We need a map size twice the maximum
> +     * quadword length though because we use two bits for each vector
> +     * length in order to track three states: uninitialized, off, and on.
> +     */
> +    DECLARE_BITMAP(sve_vq_map, ARM_MAX_VQ * 2);
>  };
>  
>  void arm_cpu_post_init(Object *obj);
> diff --git a/target/arm/cpu64.c b/target/arm/cpu64.c
> index 02ada65f240c..5def82111dee 100644
> --- a/target/arm/cpu64.c
> +++ b/target/arm/cpu64.c
> @@ -257,6 +257,149 @@ static void aarch64_a72_initfn(Object *obj)
>      define_arm_cp_regs(cpu, cortex_a72_a57_a53_cp_reginfo);
>  }
>  
> +/*
> + * While we eventually use cpu->sve_vq_map as a typical bitmap, where each vq
> + * has only two states (off/on), until we've finalized the map at realize time
> + * we use an extra bit, at the vq - 1 + ARM_MAX_VQ bit number, to also allow
> + * tracking of the uninitialized state. The arm_vq_state typedef and following
> + * functions allow us to more easily work with the bitmap. Also, while the map
> + * is still initializing, sve-max-vq has an additional three states, bringing
> + * the number of its states to five, which are the following:
> + *
> + * sve-max-vq:
> + *   0:    SVE is disabled. The default value for a vq in the map is 'OFF'.
> + *  -1:    SVE is enabled, but neither sve-max-vq nor sve<vl-bits> properties
> + *         have yet been specified by the user. The default value for a vq in
> + *         the map is 'ON'.
> + *  -2:    SVE is enabled and one or more sve<vl-bits> properties have been
> + *         set to 'OFF' by the user, but no sve<vl-bits> properties have yet
> + *         been set to 'ON'. The user is now blocked from setting sve-max-vq
> + *         and the default value for a vq in the map is 'ON'.
> + *  -3:    SVE is enabled and one or more sve<vl-bits> properties have been
> + *         set to 'ON' by the user. The user is blocked from setting sve-max-vq
> + *         and the default value for a vq in the map is 'OFF'. sve-max-vq never
> + *         transitions back to -2, even if later inputs disable the vector
> + *         lengths that initially transitioned sve-max-vq to this state. This
> + *         avoids the default values from flip-flopping.
> + *  [1-ARM_MAX_VQ]: SVE is enabled and the user has specified a valid
> + *                  sve-max-vq. The sve-max-vq specified vq and all smaller
> + *                  vq's will be initially enabled. All larger vq's will have
> + *                  a default of 'OFF'.
> + */
> +#define ARM_SVE_INIT          -1
> +#define ARM_VQ_DEFAULT_ON     -2
> +#define ARM_VQ_DEFAULT_OFF    -3
> +
> +#define arm_sve_have_max_vq(cpu) ((int32_t)(cpu)->sve_max_vq > 0)
> +
> +typedef enum arm_vq_state {
> +    ARM_VQ_OFF,
> +    ARM_VQ_ON,
> +    ARM_VQ_UNINITIALIZED,
> +} arm_vq_state;
> +
> +static arm_vq_state arm_cpu_vq_map_get(ARMCPU *cpu, int vq)> +{
> +    assert(vq <= ARM_MAX_VQ);
> +
> +    return test_bit(vq - 1, cpu->sve_vq_map) |
> +           test_bit(vq - 1 + ARM_MAX_VQ, cpu->sve_vq_map) << 1;
> +}> +
> +static void arm_cpu_vq_map_set(ARMCPU *cpu, int vq, arm_vq_state state)
> +{
> +    assert(state == ARM_VQ_OFF || state == ARM_VQ_ON);
> +    assert(vq <= ARM_MAX_VQ);
> +
> +    clear_bit(vq - 1 + ARM_MAX_VQ, cpu->sve_vq_map);
> +
> +    if (state == ARM_VQ_ON) {
> +        set_bit(vq - 1, cpu->sve_vq_map);
> +    } else {
> +        clear_bit(vq - 1, cpu->sve_vq_map);
> +    }
> +}
> +
> +static void arm_cpu_vq_map_init(ARMCPU *cpu)
> +{
> +    bitmap_zero(cpu->sve_vq_map, ARM_MAX_VQ * 2);
nit: bitmap_clear(map, 0, ARM_MAX_VQ);
/* all VLs OFF */
> +    bitmap_set(cpu->sve_vq_map, ARM_MAX_VQ, ARM_MAX_VQ);
/* all VLs uninitialized */
> +}
> +
> +static bool arm_cpu_vq_map_is_finalized(ARMCPU *cpu)
> +{
> +    DECLARE_BITMAP(map, ARM_MAX_VQ * 2);
> +
> +    bitmap_zero(map, ARM_MAX_VQ * 2);
same
> +    bitmap_set(map, ARM_MAX_VQ, ARM_MAX_VQ);
> +    bitmap_and(map, map, cpu->sve_vq_map, ARM_MAX_VQ * 2);
> +
> +    return bitmap_empty(map, ARM_MAX_VQ * 2);
> +}
> +
> +static void arm_cpu_vq_map_finalize(ARMCPU *cpu)
> +{
> +    Error *err = NULL;
> +    char name[8];
> +    uint32_t vq;
> +    bool value;
> +
> +    /*
> +     * We use the property get accessor because it knows what default
> +     * values to return for uninitialized vector lengths.
> +     */
> +    for (vq = 1; vq <= ARM_MAX_VQ; ++vq) {
> +        sprintf(name, "sve%d", vq * 128);
> +        value = object_property_get_bool(OBJECT(cpu), name, &err);
> +        assert(!err);
> +        if (value) {
> +            arm_cpu_vq_map_set(cpu, vq, ARM_VQ_ON);
> +        } else {
> +            arm_cpu_vq_map_set(cpu, vq, ARM_VQ_OFF);
> +        }
> +    }
> +
> +    assert(arm_cpu_vq_map_is_finalized(cpu));
this can be removed
> +}
> +
> +void arm_cpu_sve_finalize(ARMCPU *cpu, Error **errp)
> +{
> +    Error *err = NULL;
> +
> +    if (!cpu->sve_max_vq) {
> +        bitmap_zero(cpu->sve_vq_map, ARM_MAX_VQ * 2);
> +        return;
> +    }
> +
> +    if (cpu->sve_max_vq == ARM_SVE_INIT) {
> +        object_property_set_uint(OBJECT(cpu), ARM_MAX_VQ, "sve-max-vq", &err);
> +        if (err) {
> +            error_propagate(errp, err);
> +            return;
> +        }
> +        assert(cpu->sve_max_vq == ARM_MAX_VQ);
I guess those asserts can be removed now?
> +        arm_cpu_vq_map_finalize(cpu);
move the arm_cpu_vq_map_finalize out of the if, at the end.
> +    } else {
> +        arm_cpu_vq_map_finalize(cpu);
> +        if (!arm_sve_have_max_vq(cpu)) {
> +            cpu->sve_max_vq = arm_cpu_vq_map_next_smaller(cpu, ARM_MAX_VQ + 1);
> +        }
> +    }
> +
> +    assert(cpu->sve_max_vq == arm_cpu_vq_map_next_smaller(cpu, ARM_MAX_VQ));
same here
> +}
> +
> +uint32_t arm_cpu_vq_map_next_smaller(ARMCPU *cpu, uint32_t vq)
> +{
> +    uint32_t bitnum;
> +
> +    assert(vq <= ARM_MAX_VQ + 1);
> +    assert(arm_cpu_vq_map_is_finalized(cpu));
> +
> +    bitnum = find_last_bit(cpu->sve_vq_map, vq - 1);
why do you pass ARM_MAX_VQ + 1 and then do vq -1? doesn't
find_last_bit() take the size which is ARM_MAX_VQ in this case?
> +    return bitnum == vq - 1 ? 0 : bitnum + 1;
> +}
> +
>  static void cpu_max_get_sve_max_vq(Object *obj, Visitor *v, const char *name,
>                                     void *opaque, Error **errp)
>  {
> @@ -283,12 +426,203 @@ static void cpu_max_set_sve_max_vq(Object *obj, Visitor *v, const char *name,
>          return;
>      }
>  
> +    /*
> +     * It gets complicated trying to support both sve-max-vq and
> +     * sve<vl-bits> properties together, so we mostly don't. We
> +     * do allow both if sve-max-vq is specified first and only once
> +     * though.
> +     */
> +    if (cpu->sve_max_vq != ARM_SVE_INIT) {
> +        error_setg(errp, "sve<vl-bits> in use or sve-max-vq already "
> +                   "specified");
> +        error_append_hint(errp, "sve-max-vq must come before all "
> +                          "sve<vl-bits> properties and it must only "
> +                          "be specified once.\n");
> +        return;
> +    }
> +
>      cpu->sve_max_vq = value;
>  
>      if (cpu->sve_max_vq == 0 || cpu->sve_max_vq > ARM_MAX_VQ) {
>          error_setg(errp, "unsupported SVE vector length");
>          error_append_hint(errp, "Valid sve-max-vq in range [1-%d]\n",
>                            ARM_MAX_VQ);
> +    } else {
> +        uint32_t vq;
> +
> +        for (vq = 1; vq <= cpu->sve_max_vq; ++vq) {
> +            char name[8];
> +            sprintf(name, "sve%d", vq * 128);
> +            object_property_set_bool(obj, true, name, &err);
> +            if (err) {
> +                error_propagate(errp, err);
> +                return;
> +            }
> +        }
> +    }
> +}
> +
> +static void cpu_arm_get_sve_vq(Object *obj, Visitor *v, const char *name,
> +                               void *opaque, Error **errp)
> +{
> +    ARMCPU *cpu = ARM_CPU(obj);
> +    int vq = atoi(&name[3]) / 128;
> +    arm_vq_state vq_state;
> +    bool value;
> +
> +    vq_state = arm_cpu_vq_map_get(cpu, vq);
> +
> +    if (!cpu->sve_max_vq) {
> +        /* All vector lengths are disabled when SVE is off. */
> +        value = false;
> +    } else if (vq_state == ARM_VQ_ON) {
> +        value = true;
> +    } else if (vq_state == ARM_VQ_OFF) {
> +        value = false;
> +    } else {
> +        /*
> +         * vq is uninitialized. We pick a default here based on the
> +         * the state of sve-max-vq and other sve<vl-bits> properties.
> +         */
> +        if (arm_sve_have_max_vq(cpu)) {
> +            /*
> +             * If we have sve-max-vq, then all remaining uninitialized
> +             * vq's are 'OFF'.
> +             */
> +            value = false;
> +        } else {
> +            switch (cpu->sve_max_vq) {
> +            case ARM_SVE_INIT:
> +            case ARM_VQ_DEFAULT_ON:
> +                value = true;
> +                break;
> +            case ARM_VQ_DEFAULT_OFF:
> +                value = false;
> +                break;
> +            }
> +        }
> +    }
> +
> +    visit_type_bool(v, name, &value, errp);
> +}
> +
> +static void cpu_arm_set_sve_vq(Object *obj, Visitor *v, const char *name,
> +                               void *opaque, Error **errp)
> +{
> +    ARMCPU *cpu = ARM_CPU(obj);
> +    int vq = atoi(&name[3]) / 128;
> +    arm_vq_state vq_state;
> +    Error *err = NULL;
> +    uint32_t max_vq = 0;
> +    bool value;
> +
> +    visit_type_bool(v, name, &value, &err);
> +    if (err) {
> +        error_propagate(errp, err);
> +        return;
> +    }
> +
> +    if (value && !cpu->sve_max_vq) {
> +        error_setg(errp, "cannot enable %s", name);
> +        error_append_hint(errp, "SVE has been disabled with sve=off\n");
> +        return;
> +    } else if (!cpu->sve_max_vq) {
> +        /*
> +         * We don't complain about disabling vector lengths when SVE
> +         * is off, but we don't do anything either.
> +         */
> +        return;
> +    }
> +
> +    if (arm_sve_have_max_vq(cpu)) {
> +        max_vq = cpu->sve_max_vq;
> +    } else {
> +        if (value) {
> +            cpu->sve_max_vq = ARM_VQ_DEFAULT_OFF;
> +        } else if (cpu->sve_max_vq != ARM_VQ_DEFAULT_OFF) {
> +            cpu->sve_max_vq = ARM_VQ_DEFAULT_ON;
> +        }
> +    }
> +
> +    /*
> +     * We need to know the maximum vector length, which may just currently
> +     * be the maximum length, in order to validate the enabling/disabling
> +     * of this vector length. We use the property get accessor in order to
> +     * get the appropriate default value for any uninitialized lengths.
> +     */
> +    if (!max_vq) {
> +        char tmp[8];
> +        bool s;
> +
> +        for (max_vq = ARM_MAX_VQ; max_vq >= 1; --max_vq) {
> +            sprintf(tmp, "sve%d", max_vq * 128);
> +            s = object_property_get_bool(OBJECT(cpu), tmp, &err);
> +            assert(!err);
> +            if (s) {
> +                break;
> +            }
> +        }
> +    }
> +
> +    if (arm_sve_have_max_vq(cpu) && value && vq > cpu->sve_max_vq) {
> +        error_setg(errp, "cannot enable %s", name);
> +        error_append_hint(errp, "vq=%d (%d bits) is larger than the "
> +                          "maximum vector length, sve-max-vq=%d "
> +                          "(%d bits)\n", vq, vq * 128,
> +                          cpu->sve_max_vq, cpu->sve_max_vq * 128);
> +    } else if (arm_sve_have_max_vq(cpu) && !value && vq == cpu->sve_max_vq) {
> +        error_setg(errp, "cannot disable %s", name);
> +        error_append_hint(errp, "The maximum vector length must be "
> +                          "enabled, sve-max-vq=%d (%d bits)\n",
> +                          cpu->sve_max_vq, cpu->sve_max_vq * 128);
> +    } else if (arm_sve_have_max_vq(cpu) && !value && vq < cpu->sve_max_vq &&
> +               is_power_of_2(vq)) {
> +        error_setg(errp, "cannot disable %s", name);
> +        error_append_hint(errp, "vq=%d (%d bits) is required as it is a "
> +                          "power-of-2 length smaller than the maximum, "
> +                          "sve-max-vq=%d (%d bits)\n", vq, vq * 128,
> +                          cpu->sve_max_vq, cpu->sve_max_vq * 128);
> +    } else if (!value && vq < max_vq && is_power_of_2(vq)) {
> +        error_setg(errp, "cannot disable %s", name);
> +        error_append_hint(errp, "Vector length %d-bits is required as it "
> +                          "is a power-of-2 length smaller than another "
> +                          "enabled vector length. Disable all larger vector "
> +                          "lengths first.\n", vq * 128);
> +    } else {
adding return in each if/elsif would allow to avoid this indent.
> +        if (value) {
> +            bool fail = false;
> +            uint32_t s;
> +
> +            /*
> +             * Enabling a vector length automatically enables all
> +             * uninitialized power-of-2 lengths smaller than it, as
> +             * per the architecture.
> +             */
Test we are not attempting to enable a !is_power_of_2
> +            for (s = 1; s < vq; ++s) {
> +                if (is_power_of_2(s)) {
> +                    vq_state = arm_cpu_vq_map_get(cpu, s);
> +                    if (vq_state == ARM_VQ_UNINITIALIZED) {
> +                        arm_cpu_vq_map_set(cpu, s, ARM_VQ_ON);
> +                    } else if (vq_state == ARM_VQ_OFF) {
> +                        fail = true;
> +                        break;
> +                    }
> +                }
> +            }
> +
> +            if (fail) {
> +                error_setg(errp, "cannot enable %s", name);
> +                error_append_hint(errp, "Vector length %d-bits is disabled "
> +                                  "and is a power-of-2 length smaller than "
> +                                  "%s. All power-of-2 vector lengths smaller "
> +                                  "than the maximum length are required.\n",
> +                                  s * 128, name);
> +            } else {
> +                arm_cpu_vq_map_set(cpu, vq, ARM_VQ_ON);
> +            }
> +        } else {
> +            arm_cpu_vq_map_set(cpu, vq, ARM_VQ_OFF);
> +        }
>      }
>  }
>  
> @@ -318,10 +652,11 @@ static void cpu_arm_set_sve(Object *obj, Visitor *v, const char *name,
>          /*
>           * We handle the -cpu <cpu>,sve=off,sve=on case by reinitializing,
>           * but otherwise we don't do anything as an sve=on could come after
> -         * a sve-max-vq setting.
> +         * a sve-max-vq or sve<vl-bits> setting.
>           */
>          if (!cpu->sve_max_vq) {
> -            cpu->sve_max_vq = ARM_MAX_VQ;
> +            cpu->sve_max_vq = ARM_SVE_INIT;
> +            arm_cpu_vq_map_init(cpu);
>          }
>      } else {
>          cpu->sve_max_vq = 0;
> @@ -336,6 +671,7 @@ static void cpu_arm_set_sve(Object *obj, Visitor *v, const char *name,
>  static void aarch64_max_initfn(Object *obj)
>  {
>      ARMCPU *cpu = ARM_CPU(obj);
> +    uint32_t vq;
>  
>      if (kvm_enabled()) {
>          kvm_arm_set_cpu_features_from_host(cpu);
> @@ -420,11 +756,29 @@ static void aarch64_max_initfn(Object *obj)
>          cpu->dcz_blocksize = 7; /*  512 bytes */
>  #endif
>  
> -        cpu->sve_max_vq = ARM_MAX_VQ;
> +        /*
> +         * sve_max_vq is initially unspecified, but must be initialized to a
> +         * non-zero value (ARM_SVE_INIT) to indicate that this cpu type has
> +         * SVE. It will be finalized in arm_cpu_realizefn().
> +         */
> +        cpu->sve_max_vq = ARM_SVE_INIT;
>          object_property_add(obj, "sve-max-vq", "uint32", cpu_max_get_sve_max_vq,
>                              cpu_max_set_sve_max_vq, NULL, NULL, &error_fatal);
>          object_property_add(obj, "sve", "bool", cpu_arm_get_sve,
>                              cpu_arm_set_sve, NULL, NULL, &error_fatal);
> +
> +        /*
> +         * sve_vq_map uses a special state while setting properties, so
> +         * we initialize it here with its init function and finalize it
> +         * in arm_cpu_realizefn().
> +         */
> +        arm_cpu_vq_map_init(cpu);
> +        for (vq = 1; vq <= ARM_MAX_VQ; ++vq) {
> +            char name[8];
> +            sprintf(name, "sve%d", vq * 128);
> +            object_property_add(obj, name, "bool", cpu_arm_get_sve_vq,
> +                                cpu_arm_set_sve_vq, NULL, NULL, &error_fatal);
> +        }
>      }
>  }
>  
> diff --git a/target/arm/helper.c b/target/arm/helper.c
> index f500ccb6d31b..b7b719dba57f 100644
> --- a/target/arm/helper.c
> +++ b/target/arm/helper.c
> @@ -5324,7 +5324,16 @@ static void zcr_write(CPUARMState *env, const ARMCPRegInfo *ri,
>  
>      /* Bits other than [3:0] are RAZ/WI.  */
>      QEMU_BUILD_BUG_ON(ARM_MAX_VQ > 16);
> -    raw_write(env, ri, value & 0xf);
> +    value &= 0xf;
> +
> +    if (value) {> +        /* get next vq that is smaller than or equal to value's vq */
> +        uint32_t vq = value + 1;
ditto
> +        vq = arm_cpu_vq_map_next_smaller(cpu, vq + 1);
> +        value = vq - 1;
> +    }
> +
> +    raw_write(env, ri, value);
>  
>      /*
>       * Because we arrived here, we know both FP and SVE are enabled;
> diff --git a/target/arm/monitor.c b/target/arm/monitor.c
> index 157c487a1551..1e213906fd8f 100644
> --- a/target/arm/monitor.c
> +++ b/target/arm/monitor.c
> @@ -89,8 +89,24 @@ GICCapabilityList *qmp_query_gic_capabilities(Error **errp)
>      return head;
>  }
>  
> +QEMU_BUILD_BUG_ON(ARM_MAX_VQ > 16);
> +
> +/*
> + * These are cpu model features we want to advertise. The order here
> + * matters as this is the order in which qmp_query_cpu_model_expansion
> + * will attempt to set them. If there are dependencies between features,
> + * as there are with the sve<vl-bits> features, then the order that
> + * considers those dependencies must be used.
> + *
> + * The sve<vl-bits> features need to be in reverse order in order to
> + * enable/disable the largest vector lengths first, ensuring all
> + * power-of-2 vector lengths smaller can also be enabled/disabled.
> + */
>  static const char *cpu_model_advertised_features[] = {
>      "aarch64", "pmu", "sve",
> +    "sve2048", "sve1920", "sve1792", "sve1664", "sve1536", "sve1408",
> +    "sve1280", "sve1152", "sve1024", "sve896", "sve768", "sve640",
> +    "sve512", "sve384", "sve256", "sve128",
>      NULL
>  };
>  
> diff --git a/tests/arm-cpu-features.c b/tests/arm-cpu-features.c
Please move all the tests in a separate patch.
Each day has enough trouble of its own ;-) sweat...

Thanks

Eric
> index 509e458e9c2f..a4bf6aec00df 100644
> --- a/tests/arm-cpu-features.c
> +++ b/tests/arm-cpu-features.c
> @@ -13,6 +13,18 @@
>  #include "qapi/qmp/qdict.h"
>  #include "qapi/qmp/qjson.h"
>  
> +#if __SIZEOF_LONG__ == 8
> +#define BIT(n) (1UL << (n))
> +#else
> +#define BIT(n) (1ULL << (n))
> +#endif
> +
> +/*
> + * We expect the SVE max-vq to be 16. Also it must be <= 64
> + * for our test code, otherwise 'vls' can't just be a uint64_t.
> + */
> +#define SVE_MAX_VQ 16
> +
>  #define MACHINE    "-machine virt,gic-version=max "
>  #define QUERY_HEAD "{ 'execute': 'query-cpu-model-expansion', " \
>                       "'arguments': { 'type': 'full', "
> @@ -137,6 +149,201 @@ static void assert_bad_props(QTestState *qts, const char *cpu_type)
>      qobject_unref(resp);
>  }
>  
> +static void resp_get_sve_vls(QDict *resp, uint64_t *vls, uint32_t *max_vq)
> +{
> +    const QDictEntry *e;
> +    QDict *qdict;
> +    int n = 0;
> +
> +    *vls = 0;
> +
> +    qdict = resp_get_props(resp);
> +
> +    for (e = qdict_first(qdict); e; e = qdict_next(qdict, e)) {
> +        if (strlen(e->key) > 3 && !strncmp(e->key, "sve", 3) &&
> +            g_ascii_isdigit(e->key[3])) {
> +            char *endptr;
> +            int bits;
> +
> +            bits = g_ascii_strtoll(&e->key[3], &endptr, 10);
> +            if (!bits || *endptr != '\0') {
> +                continue;
> +            }
> +
> +            if (qdict_get_bool(qdict, e->key)) {
> +                *vls |= BIT((bits / 128) - 1);
> +            }
> +            ++n;
> +        }
> +    }
> +
> +    g_assert(n == SVE_MAX_VQ);
> +
> +    *max_vq = !*vls ? 0 : 64 - __builtin_clzll(*vls);
> +}
> +
> +static uint64_t sve_get_vls(QTestState *qts, const char *cpu_type,
> +                            const char *fmt, ...)
> +{
> +    QDict *resp;
> +    uint64_t vls;
> +    uint32_t max_vq;
> +
> +    if (fmt) {
> +        QDict *args;
> +        va_list ap;
> +
> +        va_start(ap, fmt);
> +        args = qdict_from_vjsonf_nofail(fmt, ap);
> +        va_end(ap);
> +
> +        resp = qtest_qmp(qts, QUERY_HEAD "'model': { 'name': %s, "
> +                                                    "'props': %p }"
> +                              QUERY_TAIL, cpu_type, args);
> +    } else {
> +        resp = do_query_no_props(qts, cpu_type);
> +    }
> +
> +    g_assert(resp);
> +    resp_get_sve_vls(resp, &vls, &max_vq);
> +    qobject_unref(resp);
> +
> +    return vls;
> +}
> +
> +#define assert_sve_vls(qts, cpu_type, expected_vls, fmt, ...) \
> +    g_assert(sve_get_vls(qts, cpu_type, fmt, ##__VA_ARGS__) == expected_vls)
> +
> +static void sve_tests_default(QTestState *qts, const char *cpu_type)
> +{
> +    /*
> +     * With no sve-max-vq or sve<vl-bits> properties on the command line
> +     * the default is to have all vector lengths enabled.
> +     */
> +    assert_sve_vls(qts, cpu_type, BIT(SVE_MAX_VQ) - 1, NULL);
> +
> +    /*
> +     * -------------------------------------------------------------------
> +     *               power-of-2(vq)   all-power-            can      can
> +     *                                of-2(< vq)          enable   disable
> +     * -------------------------------------------------------------------
> +     * vq < max_vq      no            MUST*                yes      yes
> +     * vq < max_vq      yes           MUST*                yes      no
> +     * -------------------------------------------------------------------
> +     * vq == max_vq     n/a           MUST*                yes**    yes**
> +     * -------------------------------------------------------------------
> +     * vq > max_vq      n/a           no                   no       yes
> +     * vq > max_vq      n/a           yes                  yes      yes
> +     * -------------------------------------------------------------------
> +     *
> +     * [*] "MUST" means this requirement must already be satisfied,
> +     *     otherwise 'max_vq' couldn't itself be enabled.
> +     *
> +     * [**] Not testable with the QMP interface, only with the command line.
> +     */
> +
> +    /* max_vq := 8 */
> +    assert_sve_vls(qts, cpu_type, 0x8b, "{ 'sve1024': true }");
> +
> +    /* max_vq := 8, vq < max_vq, !power-of-2(vq) */
> +    assert_sve_vls(qts, cpu_type, 0x8f,
> +                   "{ 'sve1024': true, 'sve384': true }");
> +    assert_sve_vls(qts, cpu_type, 0x8b,
> +                   "{ 'sve1024': true, 'sve384': false }");
> +
> +    /* max_vq := 8, vq < max_vq, power-of-2(vq) */
> +    assert_sve_vls(qts, cpu_type, 0x8b,
> +                   "{ 'sve1024': true, 'sve256': true }");
> +    assert_error(qts, cpu_type, "cannot disable sve256",
> +                 "{ 'sve1024': true, 'sve256': false }");
> +
> +    /*
> +     * max_vq := 3, vq > max_vq, !all-power-of-2(< vq)
> +     *
> +     * If given sve384=on,sve512=off,sve640=on the command line error would be
> +     * "cannot enable sve640", but QMP visits the vector lengths in reverse
> +     * order, so we get "cannot disable sve512" instead. The command line would
> +     * also give that error if given sve384=on,sve640=on,sve512=off, so this is
> +     * all fine. The important thing is that we get an error.
> +     */
> +    assert_error(qts, cpu_type, "cannot disable sve512",
> +                 "{ 'sve384': true, 'sve512': false, 'sve640': true }");
> +
> +    /*
> +     * We can disable power-of-2 vector lengths when all larger lengths
> +     * are also disabled. The shorter, sve384=on,sve512=off,sve640=off
> +     * works on the command line, but QMP doesn't know that all the
> +     * vector lengths larger than 384-bits will be disabled until it
> +     * sees the enabling of sve384, which comes near the end since it
> +     * visits the lengths in reverse order. So we just have to explicitly
> +     * disable them all.
> +     */
> +    assert_sve_vls(qts, cpu_type, 0x7,
> +                   "{ 'sve384': true, 'sve512': false, 'sve640': false, "
> +                   "  'sve768': false, 'sve896': false, 'sve1024': false, "
> +                   "  'sve1152': false, 'sve1280': false, 'sve1408': false, "
> +                   "  'sve1536': false, 'sve1664': false, 'sve1792': false, "
> +                   "  'sve1920': false, 'sve2048': false }");
> +
> +    /* max_vq := 3, vq > max_vq, all-power-of-2(< vq) */
> +    assert_sve_vls(qts, cpu_type, 0x1f,
> +                   "{ 'sve384': true, 'sve512': true, 'sve640': true }");
> +    assert_sve_vls(qts, cpu_type, 0xf,
> +                   "{ 'sve384': true, 'sve512': true, 'sve640': false }");
> +}
> +
> +static void sve_tests_sve_max_vq_8(const void *data)
> +{
> +    QTestState *qts;
> +
> +    qts = qtest_init(MACHINE "-cpu max,sve-max-vq=8");
> +
> +    assert_sve_vls(qts, "max", BIT(8) - 1, NULL);
> +
> +    /*
> +     * Disabling the max-vq set by sve-max-vq is not allowed, but
> +     * of course enabling it is OK.
> +     */
> +    assert_error(qts, "max", "cannot disable sve1024", "{ 'sve1024': false }");
> +    assert_sve_vls(qts, "max", 0xff, "{ 'sve1024': true }");
> +
> +    /*
> +     * Enabling anything larger than max-vq set by sve-max-vq is not
> +     * allowed, but of course disabling everything larger is OK.
> +     */
> +    assert_error(qts, "max", "cannot enable sve1152", "{ 'sve1152': true }");
> +    assert_sve_vls(qts, "max", 0xff, "{ 'sve1152': false }");
> +
> +    /*
> +     * We can disable non power-of-2 lengths smaller than the max-vq
> +     * set by sve-max-vq, but not power-of-2 lengths.
> +     */
> +    assert_sve_vls(qts, "max", 0xfb, "{ 'sve384': false }");
> +    assert_error(qts, "max", "cannot disable sve256", "{ 'sve256': false }");
> +
> +    qtest_quit(qts);
> +}
> +
> +static void sve_tests_sve_off(const void *data)
> +{
> +    QTestState *qts;
> +
> +    qts = qtest_init(MACHINE "-cpu max,sve=off");
> +
> +    /*
> +     * SVE is off, so the map should be empty.
> +     */
> +    assert_sve_vls(qts, "max", 0, NULL);
> +
> +    /*
> +     * We can't turn anything on, but off is OK.
> +     */
> +    assert_error(qts, "max", "cannot enable sve128", "{ 'sve128': true }");
> +    assert_sve_vls(qts, "max", 0, "{ 'sve128': false }");
> +
> +    qtest_quit(qts);
> +}
> +
>  static void test_query_cpu_model_expansion(const void *data)
>  {
>      QTestState *qts;
> @@ -159,9 +366,12 @@ static void test_query_cpu_model_expansion(const void *data)
>      if (g_str_equal(qtest_get_arch(), "aarch64")) {
>          assert_has_feature(qts, "max", "aarch64");
>          assert_has_feature(qts, "max", "sve");
> +        assert_has_feature(qts, "max", "sve128");
>          assert_has_feature(qts, "cortex-a57", "pmu");
>          assert_has_feature(qts, "cortex-a57", "aarch64");
>  
> +        sve_tests_default(qts, "max");
> +
>          /* Test that features that depend on KVM generate errors without. */
>          assert_error(qts, "max",
>                       "'aarch64' feature cannot be disabled "
> @@ -213,6 +423,13 @@ int main(int argc, char **argv)
>      qtest_add_data_func("/arm/query-cpu-model-expansion",
>                          NULL, test_query_cpu_model_expansion);
>  
> +    if (g_str_equal(qtest_get_arch(), "aarch64")) {
> +        qtest_add_data_func("/arm/max/query-cpu-model-expansion/sve-max-vq-8",
> +                            NULL, sve_tests_sve_max_vq_8);
> +        qtest_add_data_func("/arm/max/query-cpu-model-expansion/sve-off",
> +                            NULL, sve_tests_sve_off);
> +    }
> +
>      if (kvm_available) {
>          qtest_add_data_func("/arm/kvm/query-cpu-model-expansion",
>                              NULL, test_query_cpu_model_expansion_kvm);
>
Eric Auger June 26, 2019, 4:56 p.m. UTC | #6
Hi,
On 6/21/19 6:34 PM, Andrew Jones wrote:
> Introduce cpu properties to give fine control over SVE vector lengths.
> We introduce a property for each valid length up to the current
> maximum supported, which is 2048-bits. The properties are named, e.g.
> sve128, sve256, sve512, ..., where the number is the number of bits.
> 
> It's now possible to do something like -cpu max,sve-max-vq=4,sve384=off
> to provide a guest vector lengths 128, 256, and 512 bits. The resulting
> set must conform to the architectural constraint of having all power-of-2
> lengths smaller than the maximum length present. It's also possible to
> only provide sve<vl-bits> properties, e.g. -cpu max,sve512=on. That
> example provides the machine with 128, 256, and 512 bit vector lengths.
> It doesn't hurt to explicitly ask for all expected vector lengths,
> which is what, for example, libvirt should do.
> 
> Note1, it is not possible to use sve<vl-bits> properties before
> sve-max-vq, e.g. -cpu max,sve384=off,sve-max-vq=4, as supporting
> that overly complicates the user input validation.
> 
> Note2, while one might expect -cpu max,sve-max-vq=4,sve512=on to be the
> same as -cpu max,sve512=on, they are not. The former enables all vector
> lengths 512 bits and smaller, while the latter only sets the 512-bit
> length and its smaller power-of-2 lengths. It's probably best not to use
> sve-max-vq with sve<vl-bits> properties, but it can't be completely
> forbidden as we want qmp_query_cpu_model_expansion to work with guests
> launched with e.g. -cpu max,sve-max-vq=8 on their command line.
> 
> Signed-off-by: Andrew Jones <drjones@redhat.com>
> ---
>  target/arm/cpu.c         |   6 +
>  target/arm/cpu.h         |  14 ++
>  target/arm/cpu64.c       | 360 ++++++++++++++++++++++++++++++++++++++-
>  target/arm/helper.c      |  11 +-
>  target/arm/monitor.c     |  16 ++
>  tests/arm-cpu-features.c | 217 +++++++++++++++++++++++
>  6 files changed, 620 insertions(+), 4 deletions(-)
> 
> diff --git a/target/arm/cpu.c b/target/arm/cpu.c
> index f08e178fc84b..e060a0d9df0e 100644
> --- a/target/arm/cpu.c
> +++ b/target/arm/cpu.c
> @@ -1019,6 +1019,12 @@ static void arm_cpu_realizefn(DeviceState *dev, Error **errp)
>          return;
>      }
>  
> +    arm_cpu_sve_finalize(cpu, &local_err);
> +    if (local_err) {
> +        error_propagate(errp, local_err);
> +        return;
> +    }
> +
>      if (arm_feature(env, ARM_FEATURE_AARCH64) &&
>          cpu->has_vfp != cpu->has_neon) {
>          /*
> diff --git a/target/arm/cpu.h b/target/arm/cpu.h
> index f9da672be575..cbb155cf72a5 100644
> --- a/target/arm/cpu.h
> +++ b/target/arm/cpu.h
> @@ -184,8 +184,13 @@ typedef struct {
>  
>  #ifdef TARGET_AARCH64
>  # define ARM_MAX_VQ    16
> +void arm_cpu_sve_finalize(ARMCPU *cpu, Error **errp);
> +uint32_t arm_cpu_vq_map_next_smaller(ARMCPU *cpu, uint32_t vq);
>  #else
>  # define ARM_MAX_VQ    1
> +static inline void arm_cpu_sve_finalize(ARMCPU *cpu, Error **errp) { }
> +static inline uint32_t arm_cpu_vq_map_next_smaller(ARMCPU *cpu, uint32_t vq)
> +{ return 0; }
>  #endif
>  
>  typedef struct ARMVectorReg {
> @@ -915,6 +920,15 @@ struct ARMCPU {
>  
>      /* Used to set the maximum vector length the cpu will support.  */
>      uint32_t sve_max_vq;
> +
> +    /*
> +     * In sve_vq_map each set bit is a supported vector length of
> +     * (bit-number + 1) * 16 bytes, i.e. each bit number + 1 is the vector
> +     * length in quadwords. We need a map size twice the maximum
> +     * quadword length though because we use two bits for each vector
> +     * length in order to track three states: uninitialized, off, and on.
> +     */
> +    DECLARE_BITMAP(sve_vq_map, ARM_MAX_VQ * 2);
>  };
>  
>  void arm_cpu_post_init(Object *obj);
> diff --git a/target/arm/cpu64.c b/target/arm/cpu64.c
> index 02ada65f240c..5def82111dee 100644
> --- a/target/arm/cpu64.c
> +++ b/target/arm/cpu64.c
> @@ -257,6 +257,149 @@ static void aarch64_a72_initfn(Object *obj)
>      define_arm_cp_regs(cpu, cortex_a72_a57_a53_cp_reginfo);
>  }
>  
> +/*
> + * While we eventually use cpu->sve_vq_map as a typical bitmap, where each vq
> + * has only two states (off/on), until we've finalized the map at realize time
> + * we use an extra bit, at the vq - 1 + ARM_MAX_VQ bit number, to also allow
> + * tracking of the uninitialized state. The arm_vq_state typedef and following
> + * functions allow us to more easily work with the bitmap. Also, while the map
> + * is still initializing, sve-max-vq has an additional three states, bringing
> + * the number of its states to five, which are the following:
> + *
> + * sve-max-vq:
> + *   0:    SVE is disabled. The default value for a vq in the map is 'OFF'.
> + *  -1:    SVE is enabled, but neither sve-max-vq nor sve<vl-bits> properties
> + *         have yet been specified by the user. The default value for a vq in
> + *         the map is 'ON'.
> + *  -2:    SVE is enabled and one or more sve<vl-bits> properties have been
> + *         set to 'OFF' by the user, but no sve<vl-bits> properties have yet
> + *         been set to 'ON'. The user is now blocked from setting sve-max-vq
> + *         and the default value for a vq in the map is 'ON'.
> + *  -3:    SVE is enabled and one or more sve<vl-bits> properties have been
> + *         set to 'ON' by the user. The user is blocked from setting sve-max-vq
> + *         and the default value for a vq in the map is 'OFF'. sve-max-vq never
> + *         transitions back to -2, even if later inputs disable the vector
> + *         lengths that initially transitioned sve-max-vq to this state. This
> + *         avoids the default values from flip-flopping.
> + *  [1-ARM_MAX_VQ]: SVE is enabled and the user has specified a valid
> + *                  sve-max-vq. The sve-max-vq specified vq and all smaller
> + *                  vq's will be initially enabled. All larger vq's will have
> + *                  a default of 'OFF'.
> + */
> +#define ARM_SVE_INIT          -1
> +#define ARM_VQ_DEFAULT_ON     -2
> +#define ARM_VQ_DEFAULT_OFF    -3
> +
> +#define arm_sve_have_max_vq(cpu) ((int32_t)(cpu)->sve_max_vq > 0)
> +
> +typedef enum arm_vq_state {
> +    ARM_VQ_OFF,
> +    ARM_VQ_ON,
> +    ARM_VQ_UNINITIALIZED,
> +} arm_vq_state;
> +
> +static arm_vq_state arm_cpu_vq_map_get(ARMCPU *cpu, int vq)
> +{
> +    assert(vq <= ARM_MAX_VQ);
> +
> +    return test_bit(vq - 1, cpu->sve_vq_map) |
> +           test_bit(vq - 1 + ARM_MAX_VQ, cpu->sve_vq_map) << 1;
> +}
> +
> +static void arm_cpu_vq_map_set(ARMCPU *cpu, int vq, arm_vq_state state)
> +{
> +    assert(state == ARM_VQ_OFF || state == ARM_VQ_ON);
> +    assert(vq <= ARM_MAX_VQ);
> +
> +    clear_bit(vq - 1 + ARM_MAX_VQ, cpu->sve_vq_map);
> +
> +    if (state == ARM_VQ_ON) {
> +        set_bit(vq - 1, cpu->sve_vq_map);
> +    } else {
> +        clear_bit(vq - 1, cpu->sve_vq_map);
> +    }
> +}
> +
> +static void arm_cpu_vq_map_init(ARMCPU *cpu)
> +{
> +    bitmap_zero(cpu->sve_vq_map, ARM_MAX_VQ * 2);
> +    bitmap_set(cpu->sve_vq_map, ARM_MAX_VQ, ARM_MAX_VQ);
> +}
> +
> +static bool arm_cpu_vq_map_is_finalized(ARMCPU *cpu)
> +{
> +    DECLARE_BITMAP(map, ARM_MAX_VQ * 2);
> +
> +    bitmap_zero(map, ARM_MAX_VQ * 2);
> +    bitmap_set(map, ARM_MAX_VQ, ARM_MAX_VQ);
> +    bitmap_and(map, map, cpu->sve_vq_map, ARM_MAX_VQ * 2);
> +
> +    return bitmap_empty(map, ARM_MAX_VQ * 2);
> +}
> +
> +static void arm_cpu_vq_map_finalize(ARMCPU *cpu)
> +{
> +    Error *err = NULL;
> +    char name[8];
> +    uint32_t vq;
> +    bool value;
> +
> +    /*
> +     * We use the property get accessor because it knows what default
> +     * values to return for uninitialized vector lengths.
> +     */
> +    for (vq = 1; vq <= ARM_MAX_VQ; ++vq) {
> +        sprintf(name, "sve%d", vq * 128);
> +        value = object_property_get_bool(OBJECT(cpu), name, &err);
> +        assert(!err);
> +        if (value) {
> +            arm_cpu_vq_map_set(cpu, vq, ARM_VQ_ON);
> +        } else {
> +            arm_cpu_vq_map_set(cpu, vq, ARM_VQ_OFF);
> +        }
> +    }
> +
> +    assert(arm_cpu_vq_map_is_finalized(cpu));
> +}
> +
> +void arm_cpu_sve_finalize(ARMCPU *cpu, Error **errp)
> +{
> +    Error *err = NULL;
> +
> +    if (!cpu->sve_max_vq) {
> +        bitmap_zero(cpu->sve_vq_map, ARM_MAX_VQ * 2);
> +        return;
> +    }
> +
> +    if (cpu->sve_max_vq == ARM_SVE_INIT) {
> +        object_property_set_uint(OBJECT(cpu), ARM_MAX_VQ, "sve-max-vq", &err);
> +        if (err) {
> +            error_propagate(errp, err);
> +            return;
> +        }
> +        assert(cpu->sve_max_vq == ARM_MAX_VQ);
> +        arm_cpu_vq_map_finalize(cpu);
> +    } else {
> +        arm_cpu_vq_map_finalize(cpu);
> +        if (!arm_sve_have_max_vq(cpu)) {
> +            cpu->sve_max_vq = arm_cpu_vq_map_next_smaller(cpu, ARM_MAX_VQ + 1);
> +        }
> +    }
> +
> +    assert(cpu->sve_max_vq == arm_cpu_vq_map_next_smaller(cpu, ARM_MAX_VQ + 1));
> +}
> +
> +uint32_t arm_cpu_vq_map_next_smaller(ARMCPU *cpu, uint32_t vq)
> +{
> +    uint32_t bitnum;
> +
> +    assert(vq <= ARM_MAX_VQ + 1);
> +    assert(arm_cpu_vq_map_is_finalized(cpu));
> +
> +    bitnum = find_last_bit(cpu->sve_vq_map, vq - 1);
> +    return bitnum == vq - 1 ? 0 : bitnum + 1;
> +}
> +
>  static void cpu_max_get_sve_max_vq(Object *obj, Visitor *v, const char *name,
>                                     void *opaque, Error **errp)
>  {
> @@ -283,12 +426,203 @@ static void cpu_max_set_sve_max_vq(Object *obj, Visitor *v, const char *name,
>          return;
>      }
>  
> +    /*
> +     * It gets complicated trying to support both sve-max-vq and
> +     * sve<vl-bits> properties together, so we mostly don't. We
> +     * do allow both if sve-max-vq is specified first and only once
> +     * though.
> +     */
> +    if (cpu->sve_max_vq != ARM_SVE_INIT) {
> +        error_setg(errp, "sve<vl-bits> in use or sve-max-vq already "
> +                   "specified");
> +        error_append_hint(errp, "sve-max-vq must come before all "
> +                          "sve<vl-bits> properties and it must only "
> +                          "be specified once.\n");
> +        return;
> +    }
> +
>      cpu->sve_max_vq = value;
>  
>      if (cpu->sve_max_vq == 0 || cpu->sve_max_vq > ARM_MAX_VQ) {
>          error_setg(errp, "unsupported SVE vector length");
>          error_append_hint(errp, "Valid sve-max-vq in range [1-%d]\n",
>                            ARM_MAX_VQ);
> +    } else {
> +        uint32_t vq;
> +
> +        for (vq = 1; vq <= cpu->sve_max_vq; ++vq) {
> +            char name[8];
> +            sprintf(name, "sve%d", vq * 128);
> +            object_property_set_bool(obj, true, name, &err);
> +            if (err) {
> +                error_propagate(errp, err);
> +                return;
> +            }
> +        }
> +    }
> +}
> +
> +static void cpu_arm_get_sve_vq(Object *obj, Visitor *v, const char *name,
> +                               void *opaque, Error **errp)
> +{
> +    ARMCPU *cpu = ARM_CPU(obj);
> +    int vq = atoi(&name[3]) / 128;
> +    arm_vq_state vq_state;
> +    bool value;
> +
> +    vq_state = arm_cpu_vq_map_get(cpu, vq);
> +
> +    if (!cpu->sve_max_vq) {
> +        /* All vector lengths are disabled when SVE is off. */
> +        value = false;
> +    } else if (vq_state == ARM_VQ_ON) {
> +        value = true;
> +    } else if (vq_state == ARM_VQ_OFF) {
> +        value = false;
> +    } else {
> +        /*
> +         * vq is uninitialized. We pick a default here based on the
> +         * the state of sve-max-vq and other sve<vl-bits> properties.
> +         */
> +        if (arm_sve_have_max_vq(cpu)) {
> +            /*
> +             * If we have sve-max-vq, then all remaining uninitialized
> +             * vq's are 'OFF'.
> +             */
> +            value = false;
> +        } else {
> +            switch (cpu->sve_max_vq) {
> +            case ARM_SVE_INIT:
> +            case ARM_VQ_DEFAULT_ON:
> +                value = true;
> +                break;
> +            case ARM_VQ_DEFAULT_OFF:
> +                value = false;
> +                break;
> +            }
> +        }
> +    }
> +
> +    visit_type_bool(v, name, &value, errp);
> +}
> +
> +static void cpu_arm_set_sve_vq(Object *obj, Visitor *v, const char *name,
> +                               void *opaque, Error **errp)
> +{
> +    ARMCPU *cpu = ARM_CPU(obj);
> +    int vq = atoi(&name[3]) / 128;
> +    arm_vq_state vq_state;
> +    Error *err = NULL;
> +    uint32_t max_vq = 0;
> +    bool value;
> +
> +    visit_type_bool(v, name, &value, &err);
> +    if (err) {
> +        error_propagate(errp, err);
> +        return;
> +    }
> +
> +    if (value && !cpu->sve_max_vq) {
> +        error_setg(errp, "cannot enable %s", name);
> +        error_append_hint(errp, "SVE has been disabled with sve=off\n");
> +        return;
> +    } else if (!cpu->sve_max_vq) {
> +        /*
> +         * We don't complain about disabling vector lengths when SVE
> +         * is off, but we don't do anything either.
> +         */
> +        return;
> +    }
> +
> +    if (arm_sve_have_max_vq(cpu)) {
> +        max_vq = cpu->sve_max_vq;
> +    } else {
> +        if (value) {
> +            cpu->sve_max_vq = ARM_VQ_DEFAULT_OFF;
> +        } else if (cpu->sve_max_vq != ARM_VQ_DEFAULT_OFF) {
> +            cpu->sve_max_vq = ARM_VQ_DEFAULT_ON;
> +        }
> +    }
> +
> +    /*
> +     * We need to know the maximum vector length, which may just currently
> +     * be the maximum length, in order to validate the enabling/disabling
> +     * of this vector length. We use the property get accessor in order to
> +     * get the appropriate default value for any uninitialized lengths.
> +     */
> +    if (!max_vq) {
> +        char tmp[8];
> +        bool s;
> +
> +        for (max_vq = ARM_MAX_VQ; max_vq >= 1; --max_vq) {
> +            sprintf(tmp, "sve%d", max_vq * 128);
> +            s = object_property_get_bool(OBJECT(cpu), tmp, &err);
> +            assert(!err);
> +            if (s) {
> +                break;
> +            }
> +        }
> +    }
> +
> +    if (arm_sve_have_max_vq(cpu) && value && vq > cpu->sve_max_vq) {
> +        error_setg(errp, "cannot enable %s", name);
> +        error_append_hint(errp, "vq=%d (%d bits) is larger than the "
> +                          "maximum vector length, sve-max-vq=%d "
> +                          "(%d bits)\n", vq, vq * 128,
> +                          cpu->sve_max_vq, cpu->sve_max_vq * 128);
> +    } else if (arm_sve_have_max_vq(cpu) && !value && vq == cpu->sve_max_vq) {
> +        error_setg(errp, "cannot disable %s", name);
> +        error_append_hint(errp, "The maximum vector length must be "
> +                          "enabled, sve-max-vq=%d (%d bits)\n",
> +                          cpu->sve_max_vq, cpu->sve_max_vq * 128);
> +    } else if (arm_sve_have_max_vq(cpu) && !value && vq < cpu->sve_max_vq &&
> +               is_power_of_2(vq)) {
> +        error_setg(errp, "cannot disable %s", name);
> +        error_append_hint(errp, "vq=%d (%d bits) is required as it is a "
> +                          "power-of-2 length smaller than the maximum, "
> +                          "sve-max-vq=%d (%d bits)\n", vq, vq * 128,
> +                          cpu->sve_max_vq, cpu->sve_max_vq * 128);
> +    } else if (!value && vq < max_vq && is_power_of_2(vq)) {
> +        error_setg(errp, "cannot disable %s", name);
> +        error_append_hint(errp, "Vector length %d-bits is required as it "
> +                          "is a power-of-2 length smaller than another "
> +                          "enabled vector length. Disable all larger vector "
> +                          "lengths first.\n", vq * 128);
> +    } else {
> +        if (value) {
> +            bool fail = false;
> +            uint32_t s;
> +
> +            /*
> +             * Enabling a vector length automatically enables all
> +             * uninitialized power-of-2 lengths smaller than it, as
> +             * per the architecture.
> +             */
> +            for (s = 1; s < vq; ++s) {
> +                if (is_power_of_2(s)) {
> +                    vq_state = arm_cpu_vq_map_get(cpu, s);
> +                    if (vq_state == ARM_VQ_UNINITIALIZED) {
> +                        arm_cpu_vq_map_set(cpu, s, ARM_VQ_ON);
> +                    } else if (vq_state == ARM_VQ_OFF) {
> +                        fail = true;
> +                        break;
> +                    }
> +                }
> +            }
> +
> +            if (fail) {
> +                error_setg(errp, "cannot enable %s", name);
> +                error_append_hint(errp, "Vector length %d-bits is disabled "
> +                                  "and is a power-of-2 length smaller than "
> +                                  "%s. All power-of-2 vector lengths smaller "
> +                                  "than the maximum length are required.\n",
> +                                  s * 128, name);
> +            } else {
> +                arm_cpu_vq_map_set(cpu, vq, ARM_VQ_ON);
> +            }
> +        } else {
> +            arm_cpu_vq_map_set(cpu, vq, ARM_VQ_OFF);
> +        }
>      }
>  }
>  
> @@ -318,10 +652,11 @@ static void cpu_arm_set_sve(Object *obj, Visitor *v, const char *name,
>          /*
>           * We handle the -cpu <cpu>,sve=off,sve=on case by reinitializing,
>           * but otherwise we don't do anything as an sve=on could come after
> -         * a sve-max-vq setting.
> +         * a sve-max-vq or sve<vl-bits> setting.
>           */
>          if (!cpu->sve_max_vq) {
> -            cpu->sve_max_vq = ARM_MAX_VQ;
> +            cpu->sve_max_vq = ARM_SVE_INIT;
> +            arm_cpu_vq_map_init(cpu);
>          }
>      } else {
>          cpu->sve_max_vq = 0;
> @@ -336,6 +671,7 @@ static void cpu_arm_set_sve(Object *obj, Visitor *v, const char *name,
>  static void aarch64_max_initfn(Object *obj)
>  {
>      ARMCPU *cpu = ARM_CPU(obj);
> +    uint32_t vq;
>  
>      if (kvm_enabled()) {
>          kvm_arm_set_cpu_features_from_host(cpu);
> @@ -420,11 +756,29 @@ static void aarch64_max_initfn(Object *obj)
>          cpu->dcz_blocksize = 7; /*  512 bytes */
>  #endif
>  
> -        cpu->sve_max_vq = ARM_MAX_VQ;
> +        /*
> +         * sve_max_vq is initially unspecified, but must be initialized to a
> +         * non-zero value (ARM_SVE_INIT) to indicate that this cpu type has
> +         * SVE. It will be finalized in arm_cpu_realizefn().
> +         */
> +        cpu->sve_max_vq = ARM_SVE_INIT;
>          object_property_add(obj, "sve-max-vq", "uint32", cpu_max_get_sve_max_vq,
>                              cpu_max_set_sve_max_vq, NULL, NULL, &error_fatal);
>          object_property_add(obj, "sve", "bool", cpu_arm_get_sve,
>                              cpu_arm_set_sve, NULL, NULL, &error_fatal);
> +
> +        /*
> +         * sve_vq_map uses a special state while setting properties, so
> +         * we initialize it here with its init function and finalize it
> +         * in arm_cpu_realizefn().
> +         */
> +        arm_cpu_vq_map_init(cpu);
> +        for (vq = 1; vq <= ARM_MAX_VQ; ++vq) {
> +            char name[8];
> +            sprintf(name, "sve%d", vq * 128);
> +            object_property_add(obj, name, "bool", cpu_arm_get_sve_vq,
> +                                cpu_arm_set_sve_vq, NULL, NULL, &error_fatal);
> +        }
>      }
>  }
>  
> diff --git a/target/arm/helper.c b/target/arm/helper.c
> index f500ccb6d31b..b7b719dba57f 100644
> --- a/target/arm/helper.c
> +++ b/target/arm/helper.c
> @@ -5324,7 +5324,16 @@ static void zcr_write(CPUARMState *env, const ARMCPRegInfo *ri,
>  
>      /* Bits other than [3:0] are RAZ/WI.  */
>      QEMU_BUILD_BUG_ON(ARM_MAX_VQ > 16);
> -    raw_write(env, ri, value & 0xf);
> +    value &= 0xf;
> +
> +    if (value) {
> +        /* get next vq that is smaller than or equal to value's vq */
> +        uint32_t vq = value + 1;
> +        vq = arm_cpu_vq_map_next_smaller(cpu, vq + 1);
> +        value = vq - 1;
spec says:

"if an unsupported vector length is requested in ZCR_ELx, the
implementation is required to select the largest
supported vector length that is less than the requested length. This
does not alter the value of ZCR_ELx.LEN.
"

So I understand the value written in the reg should not be unmodified.

Thanks

Eric

> +    }
> +
> +    raw_write(env, ri, value);
>  
>      /*
>       * Because we arrived here, we know both FP and SVE are enabled;
> diff --git a/target/arm/monitor.c b/target/arm/monitor.c
> index 157c487a1551..1e213906fd8f 100644
> --- a/target/arm/monitor.c
> +++ b/target/arm/monitor.c
> @@ -89,8 +89,24 @@ GICCapabilityList *qmp_query_gic_capabilities(Error **errp)
>      return head;
>  }
>  
> +QEMU_BUILD_BUG_ON(ARM_MAX_VQ > 16);
> +
> +/*
> + * These are cpu model features we want to advertise. The order here
> + * matters as this is the order in which qmp_query_cpu_model_expansion
> + * will attempt to set them. If there are dependencies between features,
> + * as there are with the sve<vl-bits> features, then the order that
> + * considers those dependencies must be used.
> + *
> + * The sve<vl-bits> features need to be in reverse order in order to
> + * enable/disable the largest vector lengths first, ensuring all
> + * power-of-2 vector lengths smaller can also be enabled/disabled.
> + */
>  static const char *cpu_model_advertised_features[] = {
>      "aarch64", "pmu", "sve",
> +    "sve2048", "sve1920", "sve1792", "sve1664", "sve1536", "sve1408",
> +    "sve1280", "sve1152", "sve1024", "sve896", "sve768", "sve640",
> +    "sve512", "sve384", "sve256", "sve128",
>      NULL
>  };
>  
> diff --git a/tests/arm-cpu-features.c b/tests/arm-cpu-features.c
> index 509e458e9c2f..a4bf6aec00df 100644
> --- a/tests/arm-cpu-features.c
> +++ b/tests/arm-cpu-features.c
> @@ -13,6 +13,18 @@
>  #include "qapi/qmp/qdict.h"
>  #include "qapi/qmp/qjson.h"
>  
> +#if __SIZEOF_LONG__ == 8
> +#define BIT(n) (1UL << (n))
> +#else
> +#define BIT(n) (1ULL << (n))
> +#endif
> +
> +/*
> + * We expect the SVE max-vq to be 16. Also it must be <= 64
> + * for our test code, otherwise 'vls' can't just be a uint64_t.
> + */
> +#define SVE_MAX_VQ 16
> +
>  #define MACHINE    "-machine virt,gic-version=max "
>  #define QUERY_HEAD "{ 'execute': 'query-cpu-model-expansion', " \
>                       "'arguments': { 'type': 'full', "
> @@ -137,6 +149,201 @@ static void assert_bad_props(QTestState *qts, const char *cpu_type)
>      qobject_unref(resp);
>  }
>  
> +static void resp_get_sve_vls(QDict *resp, uint64_t *vls, uint32_t *max_vq)
> +{
> +    const QDictEntry *e;
> +    QDict *qdict;
> +    int n = 0;
> +
> +    *vls = 0;
> +
> +    qdict = resp_get_props(resp);
> +
> +    for (e = qdict_first(qdict); e; e = qdict_next(qdict, e)) {
> +        if (strlen(e->key) > 3 && !strncmp(e->key, "sve", 3) &&
> +            g_ascii_isdigit(e->key[3])) {
> +            char *endptr;
> +            int bits;
> +
> +            bits = g_ascii_strtoll(&e->key[3], &endptr, 10);
> +            if (!bits || *endptr != '\0') {
> +                continue;
> +            }
> +
> +            if (qdict_get_bool(qdict, e->key)) {
> +                *vls |= BIT((bits / 128) - 1);
> +            }
> +            ++n;
> +        }
> +    }
> +
> +    g_assert(n == SVE_MAX_VQ);
> +
> +    *max_vq = !*vls ? 0 : 64 - __builtin_clzll(*vls);
> +}
> +
> +static uint64_t sve_get_vls(QTestState *qts, const char *cpu_type,
> +                            const char *fmt, ...)
> +{
> +    QDict *resp;
> +    uint64_t vls;
> +    uint32_t max_vq;
> +
> +    if (fmt) {
> +        QDict *args;
> +        va_list ap;
> +
> +        va_start(ap, fmt);
> +        args = qdict_from_vjsonf_nofail(fmt, ap);
> +        va_end(ap);
> +
> +        resp = qtest_qmp(qts, QUERY_HEAD "'model': { 'name': %s, "
> +                                                    "'props': %p }"
> +                              QUERY_TAIL, cpu_type, args);
> +    } else {
> +        resp = do_query_no_props(qts, cpu_type);
> +    }
> +
> +    g_assert(resp);
> +    resp_get_sve_vls(resp, &vls, &max_vq);
> +    qobject_unref(resp);
> +
> +    return vls;
> +}
> +
> +#define assert_sve_vls(qts, cpu_type, expected_vls, fmt, ...) \
> +    g_assert(sve_get_vls(qts, cpu_type, fmt, ##__VA_ARGS__) == expected_vls)
> +
> +static void sve_tests_default(QTestState *qts, const char *cpu_type)
> +{
> +    /*
> +     * With no sve-max-vq or sve<vl-bits> properties on the command line
> +     * the default is to have all vector lengths enabled.
> +     */
> +    assert_sve_vls(qts, cpu_type, BIT(SVE_MAX_VQ) - 1, NULL);
> +
> +    /*
> +     * -------------------------------------------------------------------
> +     *               power-of-2(vq)   all-power-            can      can
> +     *                                of-2(< vq)          enable   disable
> +     * -------------------------------------------------------------------
> +     * vq < max_vq      no            MUST*                yes      yes
> +     * vq < max_vq      yes           MUST*                yes      no
> +     * -------------------------------------------------------------------
> +     * vq == max_vq     n/a           MUST*                yes**    yes**
> +     * -------------------------------------------------------------------
> +     * vq > max_vq      n/a           no                   no       yes
> +     * vq > max_vq      n/a           yes                  yes      yes
> +     * -------------------------------------------------------------------
> +     *
> +     * [*] "MUST" means this requirement must already be satisfied,
> +     *     otherwise 'max_vq' couldn't itself be enabled.
> +     *
> +     * [**] Not testable with the QMP interface, only with the command line.
> +     */
> +
> +    /* max_vq := 8 */
> +    assert_sve_vls(qts, cpu_type, 0x8b, "{ 'sve1024': true }");
> +
> +    /* max_vq := 8, vq < max_vq, !power-of-2(vq) */
> +    assert_sve_vls(qts, cpu_type, 0x8f,
> +                   "{ 'sve1024': true, 'sve384': true }");
> +    assert_sve_vls(qts, cpu_type, 0x8b,
> +                   "{ 'sve1024': true, 'sve384': false }");
> +
> +    /* max_vq := 8, vq < max_vq, power-of-2(vq) */
> +    assert_sve_vls(qts, cpu_type, 0x8b,
> +                   "{ 'sve1024': true, 'sve256': true }");
> +    assert_error(qts, cpu_type, "cannot disable sve256",
> +                 "{ 'sve1024': true, 'sve256': false }");
> +
> +    /*
> +     * max_vq := 3, vq > max_vq, !all-power-of-2(< vq)
> +     *
> +     * If given sve384=on,sve512=off,sve640=on the command line error would be
> +     * "cannot enable sve640", but QMP visits the vector lengths in reverse
> +     * order, so we get "cannot disable sve512" instead. The command line would
> +     * also give that error if given sve384=on,sve640=on,sve512=off, so this is
> +     * all fine. The important thing is that we get an error.
> +     */
> +    assert_error(qts, cpu_type, "cannot disable sve512",
> +                 "{ 'sve384': true, 'sve512': false, 'sve640': true }");
> +
> +    /*
> +     * We can disable power-of-2 vector lengths when all larger lengths
> +     * are also disabled. The shorter, sve384=on,sve512=off,sve640=off
> +     * works on the command line, but QMP doesn't know that all the
> +     * vector lengths larger than 384-bits will be disabled until it
> +     * sees the enabling of sve384, which comes near the end since it
> +     * visits the lengths in reverse order. So we just have to explicitly
> +     * disable them all.
> +     */
> +    assert_sve_vls(qts, cpu_type, 0x7,
> +                   "{ 'sve384': true, 'sve512': false, 'sve640': false, "
> +                   "  'sve768': false, 'sve896': false, 'sve1024': false, "
> +                   "  'sve1152': false, 'sve1280': false, 'sve1408': false, "
> +                   "  'sve1536': false, 'sve1664': false, 'sve1792': false, "
> +                   "  'sve1920': false, 'sve2048': false }");
> +
> +    /* max_vq := 3, vq > max_vq, all-power-of-2(< vq) */
> +    assert_sve_vls(qts, cpu_type, 0x1f,
> +                   "{ 'sve384': true, 'sve512': true, 'sve640': true }");
> +    assert_sve_vls(qts, cpu_type, 0xf,
> +                   "{ 'sve384': true, 'sve512': true, 'sve640': false }");
> +}
> +
> +static void sve_tests_sve_max_vq_8(const void *data)
> +{
> +    QTestState *qts;
> +
> +    qts = qtest_init(MACHINE "-cpu max,sve-max-vq=8");
> +
> +    assert_sve_vls(qts, "max", BIT(8) - 1, NULL);
> +
> +    /*
> +     * Disabling the max-vq set by sve-max-vq is not allowed, but
> +     * of course enabling it is OK.
> +     */
> +    assert_error(qts, "max", "cannot disable sve1024", "{ 'sve1024': false }");
> +    assert_sve_vls(qts, "max", 0xff, "{ 'sve1024': true }");
> +
> +    /*
> +     * Enabling anything larger than max-vq set by sve-max-vq is not
> +     * allowed, but of course disabling everything larger is OK.
> +     */
> +    assert_error(qts, "max", "cannot enable sve1152", "{ 'sve1152': true }");
> +    assert_sve_vls(qts, "max", 0xff, "{ 'sve1152': false }");
> +
> +    /*
> +     * We can disable non power-of-2 lengths smaller than the max-vq
> +     * set by sve-max-vq, but not power-of-2 lengths.
> +     */
> +    assert_sve_vls(qts, "max", 0xfb, "{ 'sve384': false }");
> +    assert_error(qts, "max", "cannot disable sve256", "{ 'sve256': false }");
> +
> +    qtest_quit(qts);
> +}
> +
> +static void sve_tests_sve_off(const void *data)
> +{
> +    QTestState *qts;
> +
> +    qts = qtest_init(MACHINE "-cpu max,sve=off");
> +
> +    /*
> +     * SVE is off, so the map should be empty.
> +     */
> +    assert_sve_vls(qts, "max", 0, NULL);
> +
> +    /*
> +     * We can't turn anything on, but off is OK.
> +     */
> +    assert_error(qts, "max", "cannot enable sve128", "{ 'sve128': true }");
> +    assert_sve_vls(qts, "max", 0, "{ 'sve128': false }");
> +
> +    qtest_quit(qts);
> +}
> +
>  static void test_query_cpu_model_expansion(const void *data)
>  {
>      QTestState *qts;
> @@ -159,9 +366,12 @@ static void test_query_cpu_model_expansion(const void *data)
>      if (g_str_equal(qtest_get_arch(), "aarch64")) {
>          assert_has_feature(qts, "max", "aarch64");
>          assert_has_feature(qts, "max", "sve");
> +        assert_has_feature(qts, "max", "sve128");
>          assert_has_feature(qts, "cortex-a57", "pmu");
>          assert_has_feature(qts, "cortex-a57", "aarch64");
>  
> +        sve_tests_default(qts, "max");
> +
>          /* Test that features that depend on KVM generate errors without. */
>          assert_error(qts, "max",
>                       "'aarch64' feature cannot be disabled "
> @@ -213,6 +423,13 @@ int main(int argc, char **argv)
>      qtest_add_data_func("/arm/query-cpu-model-expansion",
>                          NULL, test_query_cpu_model_expansion);
>  
> +    if (g_str_equal(qtest_get_arch(), "aarch64")) {
> +        qtest_add_data_func("/arm/max/query-cpu-model-expansion/sve-max-vq-8",
> +                            NULL, sve_tests_sve_max_vq_8);
> +        qtest_add_data_func("/arm/max/query-cpu-model-expansion/sve-off",
> +                            NULL, sve_tests_sve_off);
> +    }
> +
>      if (kvm_available) {
>          qtest_add_data_func("/arm/kvm/query-cpu-model-expansion",
>                              NULL, test_query_cpu_model_expansion_kvm);
>
Andrew Jones June 27, 2019, 9:40 a.m. UTC | #7
On Wed, Jun 26, 2019 at 04:58:05PM +0200, Auger Eric wrote:
> Hi Drew,
> 
> On 6/21/19 6:34 PM, Andrew Jones wrote:
> > Introduce cpu properties to give fine control over SVE vector lengths.
> > We introduce a property for each valid length up to the current
> > maximum supported, which is 2048-bits. The properties are named, e.g.
> > sve128, sve256, sve512, ..., where the number is the number of bits.
> sve384 then in the natural order, otherwise it gives the impression you
> can only specify * 128bit pow of 2 sizes at this stage of the reading.

OK

> 
> > 
> > It's now possible to do something like -cpu max,sve-max-vq=4,sve384=off
> Document the fact sve512 cannot be turned to off, which sounds fully
> sensible (by reading the code). By the way, I think an actual
> documentation should be provided in qemu. Maybe as spec to agree on.

Actually, maybe I could just allow it to be disabled. It would be
a strange command line to do '-cpu max,sve-max-vq=4,sve512=off', but if
that's what the user wants, then there's not really any good reason to
block it. As I say below, mixing these types of properties on the command
line isn't really a good idea, but it's not completely blocked because we
want a user that prefers launching their (most likely TCG) guest with,
e.g.  '-cpu max,sve-max-vq=4', to also have a functional QMP CPU model
expansion, but the CPU model expansion for SVE vector lengths depends on
the sve<vl-bits> properties, thus mixing sve-max-vq with sve<vl-bits>,
where sve-max-vq comes first. They're just not mixed on the command line.

> > to provide a guest vector lengths 128, 256, and 512 bits. The resulting
> > set must conform to the architectural constraint of having all power-of-2
> > lengths smaller than the maximum length present. It's also possible to
> > only provide sve<vl-bits> properties, e.g. -cpu max,sve512=on> That example provides the machine with 128, 256, and 512 bit vector
> lengths.
> > It doesn't hurt to explicitly ask for all expected vector lengths,
> > which is what, for example, libvirt should do.>
> > Note1, it is not possible to use sve<vl-bits> properties before
> > sve-max-vq, e.g. -cpu max,sve384=off,sve-max-vq=4, as supporting
> > that overly complicates the user input validation.
> > 
> > Note2, while one might expect -cpu max,sve-max-vq=4,sve512=on to be the
> > same as -cpu max,sve512=on, they are not.
> yep it is a bit weird
> 
> Didn't you consider -cpu max,sve-max-vq=4,req_only=true removing non
> power of 2 values and sve<vl-bits> setting a single VLbit?

Nope. I mostly considered making sure sve-max-vq was supported to the
level necessary to work with the new properties, and to not change its
current semantics, but I don't want to extend it in any way, nor to
require it to be used with the new properties at all. sve-max-vq isn't
even going to be supported by '-cpu host', so we can't rely on it being
part of the SVE vector length selection for normal KVM guests.

Keep in mind that the current semantics of sve-max-vq are to enable all
vector lengths up to and including that specified maximum. That's not
a safe way to select vector lengths on KVM, as it may include vector
lengths that the KVM host does not support, thus it's not something KVM
users should use. It's already there for the 'max' cpu type though, so
we work with it in this series the best we can for 'max', but not at all
for 'host'.

Re: documentation for this. I suppose I could write something up that
clarifies the properties and also suggests best practices regarding
sve-max-vq.

> > The former enables all vector lengths 512 bits and smaller
> ( required and permitted)
> > while the latter only sets the 512-bit
> > length and its smaller power-of-2 lengths. It's probably best not to use
> > sve-max-vq with sve<vl-bits> properties, but it can't be completely
> > forbidden as we want qmp_query_cpu_model_expansion to work with guests
> > launched with e.g. -cpu max,sve-max-vq=8 on their command line.
> what does happen if you specify -cpu max,sve384=on? (seems to be allowed
> in the code?)

That's perfectly valid with tcg, you'll get 128,256,384. It's also valid
with KVM if you're host supports it.

> 
> > 
> > Signed-off-by: Andrew Jones <drjones@redhat.com>
> > ---
> >  target/arm/cpu.c         |   6 +
> >  target/arm/cpu.h         |  14 ++
> >  target/arm/cpu64.c       | 360 ++++++++++++++++++++++++++++++++++++++-
> >  target/arm/helper.c      |  11 +-
> >  target/arm/monitor.c     |  16 ++
> >  tests/arm-cpu-features.c | 217 +++++++++++++++++++++++
> >  6 files changed, 620 insertions(+), 4 deletions(-)
> > 
> > diff --git a/target/arm/cpu.c b/target/arm/cpu.c
> > index f08e178fc84b..e060a0d9df0e 100644
> > --- a/target/arm/cpu.c
> > +++ b/target/arm/cpu.c
> > @@ -1019,6 +1019,12 @@ static void arm_cpu_realizefn(DeviceState *dev, Error **errp)
> >          return;
> >      }
> >  
> > +    arm_cpu_sve_finalize(cpu, &local_err);
> > +    if (local_err) {
> > +        error_propagate(errp, local_err);
> > +        return;
> > +    }
> > +
> >      if (arm_feature(env, ARM_FEATURE_AARCH64) &&
> >          cpu->has_vfp != cpu->has_neon) {
> >          /*
> > diff --git a/target/arm/cpu.h b/target/arm/cpu.h
> > index f9da672be575..cbb155cf72a5 100644
> > --- a/target/arm/cpu.h
> > +++ b/target/arm/cpu.h
> > @@ -184,8 +184,13 @@ typedef struct {
> >  
> >  #ifdef TARGET_AARCH64
> >  # define ARM_MAX_VQ    16
> > +void arm_cpu_sve_finalize(ARMCPU *cpu, Error **errp);
> > +uint32_t arm_cpu_vq_map_next_smaller(ARMCPU *cpu, uint32_t vq);
> >  #else
> >  # define ARM_MAX_VQ    1
> > +static inline void arm_cpu_sve_finalize(ARMCPU *cpu, Error **errp) { }
> > +static inline uint32_t arm_cpu_vq_map_next_smaller(ARMCPU *cpu, uint32_t vq)
> > +{ return 0; }
> >  #endif
> >  
> >  typedef struct ARMVectorReg {
> > @@ -915,6 +920,15 @@ struct ARMCPU {
> >  
> >      /* Used to set the maximum vector length the cpu will support.  */
> >      uint32_t sve_max_vq;
> > +
> > +    /*
> > +     * In sve_vq_map each set bit is a supported vector length of
> > +     * (bit-number + 1) * 16 bytes, i.e. each bit number + 1 is the vector> +     * length in quadwords. We need a map size twice the maximum
> > +     * quadword length though because we use two bits for each vector
> > +     * length in order to track three states: uninitialized, off, and on.
> > +     */
> > +    DECLARE_BITMAP(sve_vq_map, ARM_MAX_VQ * 2);
> >  };
> >  
> >  void arm_cpu_post_init(Object *obj);
> > diff --git a/target/arm/cpu64.c b/target/arm/cpu64.c
> > index 02ada65f240c..5def82111dee 100644
> > --- a/target/arm/cpu64.c
> > +++ b/target/arm/cpu64.c
> > @@ -257,6 +257,149 @@ static void aarch64_a72_initfn(Object *obj)
> >      define_arm_cp_regs(cpu, cortex_a72_a57_a53_cp_reginfo);
> >  }
> >  
> > +/*
> > + * While we eventually use cpu->sve_vq_map as a typical bitmap, where each vq
> > + * has only two states (off/on), until we've finalized the map at realize time
> > + * we use an extra bit, at the vq - 1 + ARM_MAX_VQ bit number, to also allow
> > + * tracking of the uninitialized state. The arm_vq_state typedef and following
> > + * functions allow us to more easily work with the bitmap. Also, while the map
> > + * is still initializing, sve-max-vq has an additional three states, bringing
> > + * the number of its states to five, which are the following:
> > + *
> > + * sve-max-vq:
> > + *   0:    SVE is disabled. The default value for a vq in the map is 'OFF'.
> > + *  -1:    SVE is enabled, but neither sve-max-vq nor sve<vl-bits> properties
> > + *         have yet been specified by the user. The default value for a vq in
> > + *         the map is 'ON'.
> > + *  -2:    SVE is enabled and one or more sve<vl-bits> properties have been
> > + *         set to 'OFF' by the user, but no sve<vl-bits> properties have yet
> > + *         been set to 'ON'. The user is now blocked from setting sve-max-vq
> > + *         and the default value for a vq in the map is 'ON'.
> > + *  -3:    SVE is enabled and one or more sve<vl-bits> properties have been
> > + *         set to 'ON' by the user. The user is blocked from setting sve-max-vq
> > + *         and the default value for a vq in the map is 'OFF'. sve-max-vq never
> > + *         transitions back to -2, even if later inputs disable the vector
> > + *         lengths that initially transitioned sve-max-vq to this state. This
> > + *         avoids the default values from flip-flopping.
> > + *  [1-ARM_MAX_VQ]: SVE is enabled and the user has specified a valid
> > + *                  sve-max-vq. The sve-max-vq specified vq and all smaller
> > + *                  vq's will be initially enabled. All larger vq's will have
> > + *                  a default of 'OFF'.
> > + */
> > +#define ARM_SVE_INIT          -1
> > +#define ARM_VQ_DEFAULT_ON     -2
> > +#define ARM_VQ_DEFAULT_OFF    -3
> > +
> > +#define arm_sve_have_max_vq(cpu) ((int32_t)(cpu)->sve_max_vq > 0)
> > +
> > +typedef enum arm_vq_state {
> > +    ARM_VQ_OFF,
> > +    ARM_VQ_ON,
> > +    ARM_VQ_UNINITIALIZED,
> > +} arm_vq_state;
> > +
> > +static arm_vq_state arm_cpu_vq_map_get(ARMCPU *cpu, int vq)> +{
> > +    assert(vq <= ARM_MAX_VQ);
> > +
> > +    return test_bit(vq - 1, cpu->sve_vq_map) |
> > +           test_bit(vq - 1 + ARM_MAX_VQ, cpu->sve_vq_map) << 1;
> > +}> +
> > +static void arm_cpu_vq_map_set(ARMCPU *cpu, int vq, arm_vq_state state)
> > +{
> > +    assert(state == ARM_VQ_OFF || state == ARM_VQ_ON);
> > +    assert(vq <= ARM_MAX_VQ);
> > +
> > +    clear_bit(vq - 1 + ARM_MAX_VQ, cpu->sve_vq_map);
> > +
> > +    if (state == ARM_VQ_ON) {
> > +        set_bit(vq - 1, cpu->sve_vq_map);
> > +    } else {
> > +        clear_bit(vq - 1, cpu->sve_vq_map);
> > +    }
> > +}
> > +
> > +static void arm_cpu_vq_map_init(ARMCPU *cpu)
> > +{
> > +    bitmap_zero(cpu->sve_vq_map, ARM_MAX_VQ * 2);
> nit: bitmap_clear(map, 0, ARM_MAX_VQ);

bitmap_clear will only clear the specified bits, leaving the upper bits
uninitialized, which could cause problems. It's not a problem here
because we're using a zero initialized cpu state member, but if it was
a bitmap like below, declared on the stack, then we need the zeroing at
least once. I'd prefer to use zero here too, to keep it consistent.

> /* all VLs OFF */
> > +    bitmap_set(cpu->sve_vq_map, ARM_MAX_VQ, ARM_MAX_VQ);
> /* all VLs uninitialized */

I can add one comment that says

"Set all VQ's to ARM_VQ_UNINITIALIZED" above both lines.

> > +}
> > +
> > +static bool arm_cpu_vq_map_is_finalized(ARMCPU *cpu)
> > +{
> > +    DECLARE_BITMAP(map, ARM_MAX_VQ * 2);
> > +
> > +    bitmap_zero(map, ARM_MAX_VQ * 2);
> same

Here me must use bitmap_zero.

> > +    bitmap_set(map, ARM_MAX_VQ, ARM_MAX_VQ);
> > +    bitmap_and(map, map, cpu->sve_vq_map, ARM_MAX_VQ * 2);
> > +
> > +    return bitmap_empty(map, ARM_MAX_VQ * 2);
> > +}
> > +
> > +static void arm_cpu_vq_map_finalize(ARMCPU *cpu)
> > +{
> > +    Error *err = NULL;
> > +    char name[8];
> > +    uint32_t vq;
> > +    bool value;
> > +
> > +    /*
> > +     * We use the property get accessor because it knows what default
> > +     * values to return for uninitialized vector lengths.
> > +     */
> > +    for (vq = 1; vq <= ARM_MAX_VQ; ++vq) {
> > +        sprintf(name, "sve%d", vq * 128);
> > +        value = object_property_get_bool(OBJECT(cpu), name, &err);
> > +        assert(!err);
> > +        if (value) {
> > +            arm_cpu_vq_map_set(cpu, vq, ARM_VQ_ON);
> > +        } else {
> > +            arm_cpu_vq_map_set(cpu, vq, ARM_VQ_OFF);
> > +        }
> > +    }
> > +
> > +    assert(arm_cpu_vq_map_is_finalized(cpu));
> this can be removed

Yes, it can be today, as the code works fine. The idea was that if
somebody came in here and modified this in someway that broke the
finalized state, then we'd catch it here before going off and doing
weird things. This isn't a hot path, so I'd prefer we keep it, but
if QEMU maintainers prefer to limit defensive code, then I can
certainly remove it.

> > +}
> > +
> > +void arm_cpu_sve_finalize(ARMCPU *cpu, Error **errp)
> > +{
> > +    Error *err = NULL;
> > +
> > +    if (!cpu->sve_max_vq) {
> > +        bitmap_zero(cpu->sve_vq_map, ARM_MAX_VQ * 2);
> > +        return;
> > +    }
> > +
> > +    if (cpu->sve_max_vq == ARM_SVE_INIT) {
> > +        object_property_set_uint(OBJECT(cpu), ARM_MAX_VQ, "sve-max-vq", &err);
> > +        if (err) {
> > +            error_propagate(errp, err);
> > +            return;
> > +        }
> > +        assert(cpu->sve_max_vq == ARM_MAX_VQ);
> I guess those asserts can be removed now?

I'd prefer to keep it. It's critical that we get this right, so if
somebody changes something somewhere in the property set/get code
that would break this, then it's best we know right away. Again,
though, if there's some reason to limit defensive code, even on the
init path, then I can.

> > +        arm_cpu_vq_map_finalize(cpu);
> move the arm_cpu_vq_map_finalize out of the if, at the end.

That wouldn't work. In this branch arm_cpu_vq_map_finalize() comes
after setting cpu->sve_max_vq. In the else branch it must come first.

> > +    } else {
> > +        arm_cpu_vq_map_finalize(cpu);
> > +        if (!arm_sve_have_max_vq(cpu)) {
> > +            cpu->sve_max_vq = arm_cpu_vq_map_next_smaller(cpu, ARM_MAX_VQ + 1);
> > +        }
> > +    }
> > +
> > +    assert(cpu->sve_max_vq == arm_cpu_vq_map_next_smaller(cpu, ARM_MAX_VQ));

What happened to my '+ 1' here? I see it's in the patch, but somehow got
removed in your quote of the patch? For a second there I thought something
was really wrong, since that assert should have fired for every full map.

> same here

Anyway my same argument that we don't want to leave arm_cpu_sve_finalize()
without knowing we got this right applies.

> > +}
> > +
> > +uint32_t arm_cpu_vq_map_next_smaller(ARMCPU *cpu, uint32_t vq)
> > +{
> > +    uint32_t bitnum;
> > +
> > +    assert(vq <= ARM_MAX_VQ + 1);
> > +    assert(arm_cpu_vq_map_is_finalized(cpu));
> > +
> > +    bitnum = find_last_bit(cpu->sve_vq_map, vq - 1);
> why do you pass ARM_MAX_VQ + 1 and then do vq -1? doesn't
> find_last_bit() take the size which is ARM_MAX_VQ in this case?

The size is ARM_MAX_VQ + 1, when you want to also check the bitnum
corresponding to ARM_MAX_VQ. The bitnum for a VQ is always VQ - 1.
Recall VQ=1, which is 128-bits. It takes the bit position zero.
VQ=ARM_MAX_VQ=16 takes the bit position 15. As for the 'vq - 1' here
in the find_last_bit() call, that's required because that's what the
function says it does. It finds the next smaller VQ. So if you pass
in, for example, vq=2, it shouldn't search anything but bit position
zero:

 vq=2 => max next smaller is vq=1, bitnum = 0
      => search the bitmap of size 1 for the last set bit

Or, if you want to search the whole map, including the maximum
possibly VQ, as is done above, then you need to pass ARM_MAX_VQ + 1,
since the max next smaller VQ is ARM_MAX_VQ.

> > +    return bitnum == vq - 1 ? 0 : bitnum + 1;
> > +}
> > +
> >  static void cpu_max_get_sve_max_vq(Object *obj, Visitor *v, const char *name,
> >                                     void *opaque, Error **errp)
> >  {
> > @@ -283,12 +426,203 @@ static void cpu_max_set_sve_max_vq(Object *obj, Visitor *v, const char *name,
> >          return;
> >      }
> >  
> > +    /*
> > +     * It gets complicated trying to support both sve-max-vq and
> > +     * sve<vl-bits> properties together, so we mostly don't. We
> > +     * do allow both if sve-max-vq is specified first and only once
> > +     * though.
> > +     */
> > +    if (cpu->sve_max_vq != ARM_SVE_INIT) {
> > +        error_setg(errp, "sve<vl-bits> in use or sve-max-vq already "
> > +                   "specified");
> > +        error_append_hint(errp, "sve-max-vq must come before all "
> > +                          "sve<vl-bits> properties and it must only "
> > +                          "be specified once.\n");
> > +        return;
> > +    }
> > +
> >      cpu->sve_max_vq = value;
> >  
> >      if (cpu->sve_max_vq == 0 || cpu->sve_max_vq > ARM_MAX_VQ) {
> >          error_setg(errp, "unsupported SVE vector length");
> >          error_append_hint(errp, "Valid sve-max-vq in range [1-%d]\n",
> >                            ARM_MAX_VQ);
> > +    } else {
> > +        uint32_t vq;
> > +
> > +        for (vq = 1; vq <= cpu->sve_max_vq; ++vq) {
> > +            char name[8];
> > +            sprintf(name, "sve%d", vq * 128);
> > +            object_property_set_bool(obj, true, name, &err);
> > +            if (err) {
> > +                error_propagate(errp, err);
> > +                return;
> > +            }
> > +        }
> > +    }
> > +}
> > +
> > +static void cpu_arm_get_sve_vq(Object *obj, Visitor *v, const char *name,
> > +                               void *opaque, Error **errp)
> > +{
> > +    ARMCPU *cpu = ARM_CPU(obj);
> > +    int vq = atoi(&name[3]) / 128;
> > +    arm_vq_state vq_state;
> > +    bool value;
> > +
> > +    vq_state = arm_cpu_vq_map_get(cpu, vq);
> > +
> > +    if (!cpu->sve_max_vq) {
> > +        /* All vector lengths are disabled when SVE is off. */
> > +        value = false;
> > +    } else if (vq_state == ARM_VQ_ON) {
> > +        value = true;
> > +    } else if (vq_state == ARM_VQ_OFF) {
> > +        value = false;
> > +    } else {
> > +        /*
> > +         * vq is uninitialized. We pick a default here based on the
> > +         * the state of sve-max-vq and other sve<vl-bits> properties.
> > +         */
> > +        if (arm_sve_have_max_vq(cpu)) {
> > +            /*
> > +             * If we have sve-max-vq, then all remaining uninitialized
> > +             * vq's are 'OFF'.
> > +             */
> > +            value = false;
> > +        } else {
> > +            switch (cpu->sve_max_vq) {
> > +            case ARM_SVE_INIT:
> > +            case ARM_VQ_DEFAULT_ON:
> > +                value = true;
> > +                break;
> > +            case ARM_VQ_DEFAULT_OFF:
> > +                value = false;
> > +                break;
> > +            }
> > +        }
> > +    }
> > +
> > +    visit_type_bool(v, name, &value, errp);
> > +}
> > +
> > +static void cpu_arm_set_sve_vq(Object *obj, Visitor *v, const char *name,
> > +                               void *opaque, Error **errp)
> > +{
> > +    ARMCPU *cpu = ARM_CPU(obj);
> > +    int vq = atoi(&name[3]) / 128;
> > +    arm_vq_state vq_state;
> > +    Error *err = NULL;
> > +    uint32_t max_vq = 0;
> > +    bool value;
> > +
> > +    visit_type_bool(v, name, &value, &err);
> > +    if (err) {
> > +        error_propagate(errp, err);
> > +        return;
> > +    }
> > +
> > +    if (value && !cpu->sve_max_vq) {
> > +        error_setg(errp, "cannot enable %s", name);
> > +        error_append_hint(errp, "SVE has been disabled with sve=off\n");
> > +        return;
> > +    } else if (!cpu->sve_max_vq) {
> > +        /*
> > +         * We don't complain about disabling vector lengths when SVE
> > +         * is off, but we don't do anything either.
> > +         */
> > +        return;
> > +    }
> > +
> > +    if (arm_sve_have_max_vq(cpu)) {
> > +        max_vq = cpu->sve_max_vq;
> > +    } else {
> > +        if (value) {
> > +            cpu->sve_max_vq = ARM_VQ_DEFAULT_OFF;
> > +        } else if (cpu->sve_max_vq != ARM_VQ_DEFAULT_OFF) {
> > +            cpu->sve_max_vq = ARM_VQ_DEFAULT_ON;
> > +        }
> > +    }
> > +
> > +    /*
> > +     * We need to know the maximum vector length, which may just currently
> > +     * be the maximum length, in order to validate the enabling/disabling
> > +     * of this vector length. We use the property get accessor in order to
> > +     * get the appropriate default value for any uninitialized lengths.
> > +     */
> > +    if (!max_vq) {
> > +        char tmp[8];
> > +        bool s;
> > +
> > +        for (max_vq = ARM_MAX_VQ; max_vq >= 1; --max_vq) {
> > +            sprintf(tmp, "sve%d", max_vq * 128);
> > +            s = object_property_get_bool(OBJECT(cpu), tmp, &err);
> > +            assert(!err);
> > +            if (s) {
> > +                break;
> > +            }
> > +        }
> > +    }
> > +
> > +    if (arm_sve_have_max_vq(cpu) && value && vq > cpu->sve_max_vq) {
> > +        error_setg(errp, "cannot enable %s", name);
> > +        error_append_hint(errp, "vq=%d (%d bits) is larger than the "
> > +                          "maximum vector length, sve-max-vq=%d "
> > +                          "(%d bits)\n", vq, vq * 128,
> > +                          cpu->sve_max_vq, cpu->sve_max_vq * 128);
> > +    } else if (arm_sve_have_max_vq(cpu) && !value && vq == cpu->sve_max_vq) {
> > +        error_setg(errp, "cannot disable %s", name);
> > +        error_append_hint(errp, "The maximum vector length must be "
> > +                          "enabled, sve-max-vq=%d (%d bits)\n",
> > +                          cpu->sve_max_vq, cpu->sve_max_vq * 128);
> > +    } else if (arm_sve_have_max_vq(cpu) && !value && vq < cpu->sve_max_vq &&
> > +               is_power_of_2(vq)) {
> > +        error_setg(errp, "cannot disable %s", name);
> > +        error_append_hint(errp, "vq=%d (%d bits) is required as it is a "
> > +                          "power-of-2 length smaller than the maximum, "
> > +                          "sve-max-vq=%d (%d bits)\n", vq, vq * 128,
> > +                          cpu->sve_max_vq, cpu->sve_max_vq * 128);
> > +    } else if (!value && vq < max_vq && is_power_of_2(vq)) {
> > +        error_setg(errp, "cannot disable %s", name);
> > +        error_append_hint(errp, "Vector length %d-bits is required as it "
> > +                          "is a power-of-2 length smaller than another "
> > +                          "enabled vector length. Disable all larger vector "
> > +                          "lengths first.\n", vq * 128);
> > +    } else {
> adding return in each if/elsif would allow to avoid this indent.

Yeah, but I didn't really mind the indent :-)

> > +        if (value) {
> > +            bool fail = false;
> > +            uint32_t s;
> > +
> > +            /*
> > +             * Enabling a vector length automatically enables all
> > +             * uninitialized power-of-2 lengths smaller than it, as
> > +             * per the architecture.
> > +             */
> Test we are not attempting to enable a !is_power_of_2

I'm not sure what this means. In this context 'vq' can be a !power-of-2
if it wants to be, and there's no way for 's' to be a !power-of-2 because
we filter the vq's with is_power_of_2(s).

> > +            for (s = 1; s < vq; ++s) {
> > +                if (is_power_of_2(s)) {
> > +                    vq_state = arm_cpu_vq_map_get(cpu, s);
> > +                    if (vq_state == ARM_VQ_UNINITIALIZED) {
> > +                        arm_cpu_vq_map_set(cpu, s, ARM_VQ_ON);
> > +                    } else if (vq_state == ARM_VQ_OFF) {
> > +                        fail = true;
> > +                        break;
> > +                    }
> > +                }
> > +            }
> > +
> > +            if (fail) {
> > +                error_setg(errp, "cannot enable %s", name);
> > +                error_append_hint(errp, "Vector length %d-bits is disabled "
> > +                                  "and is a power-of-2 length smaller than "
> > +                                  "%s. All power-of-2 vector lengths smaller "
> > +                                  "than the maximum length are required.\n",
> > +                                  s * 128, name);
> > +            } else {
> > +                arm_cpu_vq_map_set(cpu, vq, ARM_VQ_ON);
> > +            }
> > +        } else {
> > +            arm_cpu_vq_map_set(cpu, vq, ARM_VQ_OFF);
> > +        }
> >      }
> >  }
> >  
> > @@ -318,10 +652,11 @@ static void cpu_arm_set_sve(Object *obj, Visitor *v, const char *name,
> >          /*
> >           * We handle the -cpu <cpu>,sve=off,sve=on case by reinitializing,
> >           * but otherwise we don't do anything as an sve=on could come after
> > -         * a sve-max-vq setting.
> > +         * a sve-max-vq or sve<vl-bits> setting.
> >           */
> >          if (!cpu->sve_max_vq) {
> > -            cpu->sve_max_vq = ARM_MAX_VQ;
> > +            cpu->sve_max_vq = ARM_SVE_INIT;
> > +            arm_cpu_vq_map_init(cpu);
> >          }
> >      } else {
> >          cpu->sve_max_vq = 0;
> > @@ -336,6 +671,7 @@ static void cpu_arm_set_sve(Object *obj, Visitor *v, const char *name,
> >  static void aarch64_max_initfn(Object *obj)
> >  {
> >      ARMCPU *cpu = ARM_CPU(obj);
> > +    uint32_t vq;
> >  
> >      if (kvm_enabled()) {
> >          kvm_arm_set_cpu_features_from_host(cpu);
> > @@ -420,11 +756,29 @@ static void aarch64_max_initfn(Object *obj)
> >          cpu->dcz_blocksize = 7; /*  512 bytes */
> >  #endif
> >  
> > -        cpu->sve_max_vq = ARM_MAX_VQ;
> > +        /*
> > +         * sve_max_vq is initially unspecified, but must be initialized to a
> > +         * non-zero value (ARM_SVE_INIT) to indicate that this cpu type has
> > +         * SVE. It will be finalized in arm_cpu_realizefn().
> > +         */
> > +        cpu->sve_max_vq = ARM_SVE_INIT;
> >          object_property_add(obj, "sve-max-vq", "uint32", cpu_max_get_sve_max_vq,
> >                              cpu_max_set_sve_max_vq, NULL, NULL, &error_fatal);
> >          object_property_add(obj, "sve", "bool", cpu_arm_get_sve,
> >                              cpu_arm_set_sve, NULL, NULL, &error_fatal);
> > +
> > +        /*
> > +         * sve_vq_map uses a special state while setting properties, so
> > +         * we initialize it here with its init function and finalize it
> > +         * in arm_cpu_realizefn().
> > +         */
> > +        arm_cpu_vq_map_init(cpu);
> > +        for (vq = 1; vq <= ARM_MAX_VQ; ++vq) {
> > +            char name[8];
> > +            sprintf(name, "sve%d", vq * 128);
> > +            object_property_add(obj, name, "bool", cpu_arm_get_sve_vq,
> > +                                cpu_arm_set_sve_vq, NULL, NULL, &error_fatal);
> > +        }
> >      }
> >  }
> >  
> > diff --git a/target/arm/helper.c b/target/arm/helper.c
> > index f500ccb6d31b..b7b719dba57f 100644
> > --- a/target/arm/helper.c
> > +++ b/target/arm/helper.c
> > @@ -5324,7 +5324,16 @@ static void zcr_write(CPUARMState *env, const ARMCPRegInfo *ri,
> >  
> >      /* Bits other than [3:0] are RAZ/WI.  */
> >      QEMU_BUILD_BUG_ON(ARM_MAX_VQ > 16);
> > -    raw_write(env, ri, value & 0xf);
> > +    value &= 0xf;
> > +
> > +    if (value) {> +        /* get next vq that is smaller than or equal to value's vq */
> > +        uint32_t vq = value + 1;
> ditto

What does this 'ditto' map to? Oh, probably the discussion about vq's
getting +1's and -1's. In this context 'value' is the ZCR_ELx
representation of a VQ, which is VQ - 1, just like in the bitmap. To
translate that to a VQ we need to do a '+ 1'. As I wrote in the comment
here, we want to find the next smaller or equal VQ here, because this is
the input VQ from the guest which may itself be a valid VQ, but if it's
not, we need to step down to the next valid one. So we ask
arm_cpu_vq_map_next_smaller() for 'vq + 1' in order to ensure 'vq' will
be in the searched range. After we get a valid vq from
arm_cpu_vq_map_next_smaller(), which must be at least '1', since that's
required by the architecture and thus will always be present, we need to
translate it back to the ZCR_ELx representation with the subtraction.
Phew... We programmers do love adding and subtracting by one, right :-)

> > +        vq = arm_cpu_vq_map_next_smaller(cpu, vq + 1);
> > +        value = vq - 1;
> > +    }
> > +
> > +    raw_write(env, ri, value);
> >  
> >      /*
> >       * Because we arrived here, we know both FP and SVE are enabled;
> > diff --git a/target/arm/monitor.c b/target/arm/monitor.c
> > index 157c487a1551..1e213906fd8f 100644
> > --- a/target/arm/monitor.c
> > +++ b/target/arm/monitor.c
> > @@ -89,8 +89,24 @@ GICCapabilityList *qmp_query_gic_capabilities(Error **errp)
> >      return head;
> >  }
> >  
> > +QEMU_BUILD_BUG_ON(ARM_MAX_VQ > 16);
> > +
> > +/*
> > + * These are cpu model features we want to advertise. The order here
> > + * matters as this is the order in which qmp_query_cpu_model_expansion
> > + * will attempt to set them. If there are dependencies between features,
> > + * as there are with the sve<vl-bits> features, then the order that
> > + * considers those dependencies must be used.
> > + *
> > + * The sve<vl-bits> features need to be in reverse order in order to
> > + * enable/disable the largest vector lengths first, ensuring all
> > + * power-of-2 vector lengths smaller can also be enabled/disabled.
> > + */
> >  static const char *cpu_model_advertised_features[] = {
> >      "aarch64", "pmu", "sve",
> > +    "sve2048", "sve1920", "sve1792", "sve1664", "sve1536", "sve1408",
> > +    "sve1280", "sve1152", "sve1024", "sve896", "sve768", "sve640",
> > +    "sve512", "sve384", "sve256", "sve128",
> >      NULL
> >  };
> >  
> > diff --git a/tests/arm-cpu-features.c b/tests/arm-cpu-features.c
> Please move all the tests in a separate patch.

I'd prefer not to. I like keeping the test cases that test the new code in
the same patch. It self documents what the test cases map to what code and
allows immediate testing of the patch when bisecting. Also I'm not really
sure how it makes review worse, as another patch would look the same, just
in a different email. 

> Each day has enough trouble of its own ;-) sweat...

I really appreciate the review! I realize it's generating sweat,
especially with the European heat wave! You don't have to finish
a patch before sending comments. I'm fine with multiple replies to
the same patch if you want to break your review up into smaller
bites.

Thanks,
drew

> 
> Thanks
> 
> Eric
> > index 509e458e9c2f..a4bf6aec00df 100644
> > --- a/tests/arm-cpu-features.c
> > +++ b/tests/arm-cpu-features.c
> > @@ -13,6 +13,18 @@
> >  #include "qapi/qmp/qdict.h"
> >  #include "qapi/qmp/qjson.h"
> >  
> > +#if __SIZEOF_LONG__ == 8
> > +#define BIT(n) (1UL << (n))
> > +#else
> > +#define BIT(n) (1ULL << (n))
> > +#endif
> > +
> > +/*
> > + * We expect the SVE max-vq to be 16. Also it must be <= 64
> > + * for our test code, otherwise 'vls' can't just be a uint64_t.
> > + */
> > +#define SVE_MAX_VQ 16
> > +
> >  #define MACHINE    "-machine virt,gic-version=max "
> >  #define QUERY_HEAD "{ 'execute': 'query-cpu-model-expansion', " \
> >                       "'arguments': { 'type': 'full', "
> > @@ -137,6 +149,201 @@ static void assert_bad_props(QTestState *qts, const char *cpu_type)
> >      qobject_unref(resp);
> >  }
> >  
> > +static void resp_get_sve_vls(QDict *resp, uint64_t *vls, uint32_t *max_vq)
> > +{
> > +    const QDictEntry *e;
> > +    QDict *qdict;
> > +    int n = 0;
> > +
> > +    *vls = 0;
> > +
> > +    qdict = resp_get_props(resp);
> > +
> > +    for (e = qdict_first(qdict); e; e = qdict_next(qdict, e)) {
> > +        if (strlen(e->key) > 3 && !strncmp(e->key, "sve", 3) &&
> > +            g_ascii_isdigit(e->key[3])) {
> > +            char *endptr;
> > +            int bits;
> > +
> > +            bits = g_ascii_strtoll(&e->key[3], &endptr, 10);
> > +            if (!bits || *endptr != '\0') {
> > +                continue;
> > +            }
> > +
> > +            if (qdict_get_bool(qdict, e->key)) {
> > +                *vls |= BIT((bits / 128) - 1);
> > +            }
> > +            ++n;
> > +        }
> > +    }
> > +
> > +    g_assert(n == SVE_MAX_VQ);
> > +
> > +    *max_vq = !*vls ? 0 : 64 - __builtin_clzll(*vls);
> > +}
> > +
> > +static uint64_t sve_get_vls(QTestState *qts, const char *cpu_type,
> > +                            const char *fmt, ...)
> > +{
> > +    QDict *resp;
> > +    uint64_t vls;
> > +    uint32_t max_vq;
> > +
> > +    if (fmt) {
> > +        QDict *args;
> > +        va_list ap;
> > +
> > +        va_start(ap, fmt);
> > +        args = qdict_from_vjsonf_nofail(fmt, ap);
> > +        va_end(ap);
> > +
> > +        resp = qtest_qmp(qts, QUERY_HEAD "'model': { 'name': %s, "
> > +                                                    "'props': %p }"
> > +                              QUERY_TAIL, cpu_type, args);
> > +    } else {
> > +        resp = do_query_no_props(qts, cpu_type);
> > +    }
> > +
> > +    g_assert(resp);
> > +    resp_get_sve_vls(resp, &vls, &max_vq);
> > +    qobject_unref(resp);
> > +
> > +    return vls;
> > +}
> > +
> > +#define assert_sve_vls(qts, cpu_type, expected_vls, fmt, ...) \
> > +    g_assert(sve_get_vls(qts, cpu_type, fmt, ##__VA_ARGS__) == expected_vls)
> > +
> > +static void sve_tests_default(QTestState *qts, const char *cpu_type)
> > +{
> > +    /*
> > +     * With no sve-max-vq or sve<vl-bits> properties on the command line
> > +     * the default is to have all vector lengths enabled.
> > +     */
> > +    assert_sve_vls(qts, cpu_type, BIT(SVE_MAX_VQ) - 1, NULL);
> > +
> > +    /*
> > +     * -------------------------------------------------------------------
> > +     *               power-of-2(vq)   all-power-            can      can
> > +     *                                of-2(< vq)          enable   disable
> > +     * -------------------------------------------------------------------
> > +     * vq < max_vq      no            MUST*                yes      yes
> > +     * vq < max_vq      yes           MUST*                yes      no
> > +     * -------------------------------------------------------------------
> > +     * vq == max_vq     n/a           MUST*                yes**    yes**
> > +     * -------------------------------------------------------------------
> > +     * vq > max_vq      n/a           no                   no       yes
> > +     * vq > max_vq      n/a           yes                  yes      yes
> > +     * -------------------------------------------------------------------
> > +     *
> > +     * [*] "MUST" means this requirement must already be satisfied,
> > +     *     otherwise 'max_vq' couldn't itself be enabled.
> > +     *
> > +     * [**] Not testable with the QMP interface, only with the command line.
> > +     */
> > +
> > +    /* max_vq := 8 */
> > +    assert_sve_vls(qts, cpu_type, 0x8b, "{ 'sve1024': true }");
> > +
> > +    /* max_vq := 8, vq < max_vq, !power-of-2(vq) */
> > +    assert_sve_vls(qts, cpu_type, 0x8f,
> > +                   "{ 'sve1024': true, 'sve384': true }");
> > +    assert_sve_vls(qts, cpu_type, 0x8b,
> > +                   "{ 'sve1024': true, 'sve384': false }");
> > +
> > +    /* max_vq := 8, vq < max_vq, power-of-2(vq) */
> > +    assert_sve_vls(qts, cpu_type, 0x8b,
> > +                   "{ 'sve1024': true, 'sve256': true }");
> > +    assert_error(qts, cpu_type, "cannot disable sve256",
> > +                 "{ 'sve1024': true, 'sve256': false }");
> > +
> > +    /*
> > +     * max_vq := 3, vq > max_vq, !all-power-of-2(< vq)
> > +     *
> > +     * If given sve384=on,sve512=off,sve640=on the command line error would be
> > +     * "cannot enable sve640", but QMP visits the vector lengths in reverse
> > +     * order, so we get "cannot disable sve512" instead. The command line would
> > +     * also give that error if given sve384=on,sve640=on,sve512=off, so this is
> > +     * all fine. The important thing is that we get an error.
> > +     */
> > +    assert_error(qts, cpu_type, "cannot disable sve512",
> > +                 "{ 'sve384': true, 'sve512': false, 'sve640': true }");
> > +
> > +    /*
> > +     * We can disable power-of-2 vector lengths when all larger lengths
> > +     * are also disabled. The shorter, sve384=on,sve512=off,sve640=off
> > +     * works on the command line, but QMP doesn't know that all the
> > +     * vector lengths larger than 384-bits will be disabled until it
> > +     * sees the enabling of sve384, which comes near the end since it
> > +     * visits the lengths in reverse order. So we just have to explicitly
> > +     * disable them all.
> > +     */
> > +    assert_sve_vls(qts, cpu_type, 0x7,
> > +                   "{ 'sve384': true, 'sve512': false, 'sve640': false, "
> > +                   "  'sve768': false, 'sve896': false, 'sve1024': false, "
> > +                   "  'sve1152': false, 'sve1280': false, 'sve1408': false, "
> > +                   "  'sve1536': false, 'sve1664': false, 'sve1792': false, "
> > +                   "  'sve1920': false, 'sve2048': false }");
> > +
> > +    /* max_vq := 3, vq > max_vq, all-power-of-2(< vq) */
> > +    assert_sve_vls(qts, cpu_type, 0x1f,
> > +                   "{ 'sve384': true, 'sve512': true, 'sve640': true }");
> > +    assert_sve_vls(qts, cpu_type, 0xf,
> > +                   "{ 'sve384': true, 'sve512': true, 'sve640': false }");
> > +}
> > +
> > +static void sve_tests_sve_max_vq_8(const void *data)
> > +{
> > +    QTestState *qts;
> > +
> > +    qts = qtest_init(MACHINE "-cpu max,sve-max-vq=8");
> > +
> > +    assert_sve_vls(qts, "max", BIT(8) - 1, NULL);
> > +
> > +    /*
> > +     * Disabling the max-vq set by sve-max-vq is not allowed, but
> > +     * of course enabling it is OK.
> > +     */
> > +    assert_error(qts, "max", "cannot disable sve1024", "{ 'sve1024': false }");
> > +    assert_sve_vls(qts, "max", 0xff, "{ 'sve1024': true }");
> > +
> > +    /*
> > +     * Enabling anything larger than max-vq set by sve-max-vq is not
> > +     * allowed, but of course disabling everything larger is OK.
> > +     */
> > +    assert_error(qts, "max", "cannot enable sve1152", "{ 'sve1152': true }");
> > +    assert_sve_vls(qts, "max", 0xff, "{ 'sve1152': false }");
> > +
> > +    /*
> > +     * We can disable non power-of-2 lengths smaller than the max-vq
> > +     * set by sve-max-vq, but not power-of-2 lengths.
> > +     */
> > +    assert_sve_vls(qts, "max", 0xfb, "{ 'sve384': false }");
> > +    assert_error(qts, "max", "cannot disable sve256", "{ 'sve256': false }");
> > +
> > +    qtest_quit(qts);
> > +}
> > +
> > +static void sve_tests_sve_off(const void *data)
> > +{
> > +    QTestState *qts;
> > +
> > +    qts = qtest_init(MACHINE "-cpu max,sve=off");
> > +
> > +    /*
> > +     * SVE is off, so the map should be empty.
> > +     */
> > +    assert_sve_vls(qts, "max", 0, NULL);
> > +
> > +    /*
> > +     * We can't turn anything on, but off is OK.
> > +     */
> > +    assert_error(qts, "max", "cannot enable sve128", "{ 'sve128': true }");
> > +    assert_sve_vls(qts, "max", 0, "{ 'sve128': false }");
> > +
> > +    qtest_quit(qts);
> > +}
> > +
> >  static void test_query_cpu_model_expansion(const void *data)
> >  {
> >      QTestState *qts;
> > @@ -159,9 +366,12 @@ static void test_query_cpu_model_expansion(const void *data)
> >      if (g_str_equal(qtest_get_arch(), "aarch64")) {
> >          assert_has_feature(qts, "max", "aarch64");
> >          assert_has_feature(qts, "max", "sve");
> > +        assert_has_feature(qts, "max", "sve128");
> >          assert_has_feature(qts, "cortex-a57", "pmu");
> >          assert_has_feature(qts, "cortex-a57", "aarch64");
> >  
> > +        sve_tests_default(qts, "max");
> > +
> >          /* Test that features that depend on KVM generate errors without. */
> >          assert_error(qts, "max",
> >                       "'aarch64' feature cannot be disabled "
> > @@ -213,6 +423,13 @@ int main(int argc, char **argv)
> >      qtest_add_data_func("/arm/query-cpu-model-expansion",
> >                          NULL, test_query_cpu_model_expansion);
> >  
> > +    if (g_str_equal(qtest_get_arch(), "aarch64")) {
> > +        qtest_add_data_func("/arm/max/query-cpu-model-expansion/sve-max-vq-8",
> > +                            NULL, sve_tests_sve_max_vq_8);
> > +        qtest_add_data_func("/arm/max/query-cpu-model-expansion/sve-off",
> > +                            NULL, sve_tests_sve_off);
> > +    }
> > +
> >      if (kvm_available) {
> >          qtest_add_data_func("/arm/kvm/query-cpu-model-expansion",
> >                              NULL, test_query_cpu_model_expansion_kvm);
> > 
>
Andrew Jones June 27, 2019, 10:46 a.m. UTC | #8
On Wed, Jun 26, 2019 at 06:56:54PM +0200, Auger Eric wrote:
> > diff --git a/target/arm/helper.c b/target/arm/helper.c
> > index f500ccb6d31b..b7b719dba57f 100644
> > --- a/target/arm/helper.c
> > +++ b/target/arm/helper.c
> > @@ -5324,7 +5324,16 @@ static void zcr_write(CPUARMState *env, const ARMCPRegInfo *ri,
> >  
> >      /* Bits other than [3:0] are RAZ/WI.  */
> >      QEMU_BUILD_BUG_ON(ARM_MAX_VQ > 16);
> > -    raw_write(env, ri, value & 0xf);
> > +    value &= 0xf;
> > +
> > +    if (value) {
> > +        /* get next vq that is smaller than or equal to value's vq */
> > +        uint32_t vq = value + 1;
> > +        vq = arm_cpu_vq_map_next_smaller(cpu, vq + 1);
> > +        value = vq - 1;
> spec says:
> 
> "if an unsupported vector length is requested in ZCR_ELx, the
> implementation is required to select the largest
> supported vector length that is less than the requested length. This
> does not alter the value of ZCR_ELx.LEN.
> "
> 
> So I understand the value written in the reg should not be unmodified.
>

Sorry, I can't parse what you're trying to tell me here. Here we have
to write 'value', because that's what the guest is trying to do. As the
spec says in your quote, we have to pick the length the guest wants, or
the next smaller valid one, so that's what the code above does. So are
you just stating that you agree with this hunk of the code?

Thanks,
drew
Eric Auger June 27, 2019, 10:51 a.m. UTC | #9
Hi Drew,
On 6/27/19 11:40 AM, Andrew Jones wrote:
> On Wed, Jun 26, 2019 at 04:58:05PM +0200, Auger Eric wrote:
>> Hi Drew,
>>
>> On 6/21/19 6:34 PM, Andrew Jones wrote:
>>> Introduce cpu properties to give fine control over SVE vector lengths.
>>> We introduce a property for each valid length up to the current
>>> maximum supported, which is 2048-bits. The properties are named, e.g.
>>> sve128, sve256, sve512, ..., where the number is the number of bits.
>> sve384 then in the natural order, otherwise it gives the impression you
>> can only specify * 128bit pow of 2 sizes at this stage of the reading.
> 
> OK
> 
>>
>>>
>>> It's now possible to do something like -cpu max,sve-max-vq=4,sve384=off
>> Document the fact sve512 cannot be turned to off, which sounds fully
>> sensible (by reading the code). By the way, I think an actual
>> documentation should be provided in qemu. Maybe as spec to agree on.
> 
> Actually, maybe I could just allow it to be disabled. It would be
> a strange command line to do '-cpu max,sve-max-vq=4,sve512=off', but if
> that's what the user wants, then there's not really any good reason to
> block it. As I say below, mixing these types of properties on the command
> line isn't really a good idea, but it's not completely blocked because we
> want a user that prefers launching their (most likely TCG) guest with,
> e.g.  '-cpu max,sve-max-vq=4', to also have a functional QMP CPU model
> expansion, but the CPU model expansion for SVE vector lengths depends on
> the sve<vl-bits> properties, thus mixing sve-max-vq with sve<vl-bits>,
> where sve-max-vq comes first. They're just not mixed on the command line.
OK
> 
>>> to provide a guest vector lengths 128, 256, and 512 bits. The resulting
>>> set must conform to the architectural constraint of having all power-of-2
>>> lengths smaller than the maximum length present. It's also possible to
>>> only provide sve<vl-bits> properties, e.g. -cpu max,sve512=on> That example provides the machine with 128, 256, and 512 bit vector
>> lengths.
>>> It doesn't hurt to explicitly ask for all expected vector lengths,
>>> which is what, for example, libvirt should do.>
>>> Note1, it is not possible to use sve<vl-bits> properties before
>>> sve-max-vq, e.g. -cpu max,sve384=off,sve-max-vq=4, as supporting
>>> that overly complicates the user input validation.
>>>
>>> Note2, while one might expect -cpu max,sve-max-vq=4,sve512=on to be the
>>> same as -cpu max,sve512=on, they are not.
>> yep it is a bit weird
>>
>> Didn't you consider -cpu max,sve-max-vq=4,req_only=true removing non
>> power of 2 values and sve<vl-bits> setting a single VLbit?
> 
> Nope. I mostly considered making sure sve-max-vq was supported to the
> level necessary to work with the new properties, and to not change its
> current semantics, but I don't want to extend it in any way, nor to
> require it to be used with the new properties at all. sve-max-vq isn't
> even going to be supported by '-cpu host', so we can't rely on it being
> part of the SVE vector length selection for normal KVM guests.
OK, as long as it is documented somewhere.
> 
> Keep in mind that the current semantics of sve-max-vq are to enable all
> vector lengths up to and including that specified maximum. That's not
> a safe way to select vector lengths on KVM, as it may include vector
> lengths that the KVM host does not support, thus it's not something KVM
> users should use. It's already there for the 'max' cpu type though, so
> we work with it in this series the best we can for 'max', but not at all
> for 'host'.
Do you expect hosts to expose only a few of the permitted values? I
imagined it would generally expose none or all of them actually. If you
discourage people to use sve-max-vq and sve<>=on only sets 2^n values,
userspace will need to set manually all intermediate !2^n values
> 
> Re: documentation for this. I suppose I could write something up that
> clarifies the properties and also suggests best practices regarding
> sve-max-vq.

To me this is really helpful to have a global understanding
> 
>>> The former enables all vector lengths 512 bits and smaller
>> ( required and permitted)
>>> while the latter only sets the 512-bit
>>> length and its smaller power-of-2 lengths. It's probably best not to use
>>> sve-max-vq with sve<vl-bits> properties, but it can't be completely
>>> forbidden as we want qmp_query_cpu_model_expansion to work with guests
>>> launched with e.g. -cpu max,sve-max-vq=8 on their command line.
>> what does happen if you specify -cpu max,sve384=on? (seems to be allowed
>> in the code?)
> 
> That's perfectly valid with tcg, you'll get 128,256,384. It's also valid
> with KVM if you're host supports it.
OK so it exposes the maximum configurable vector length + all the
corresponding required values.
> 
>>
>>>
>>> Signed-off-by: Andrew Jones <drjones@redhat.com>
>>> ---
>>>  target/arm/cpu.c         |   6 +
>>>  target/arm/cpu.h         |  14 ++
>>>  target/arm/cpu64.c       | 360 ++++++++++++++++++++++++++++++++++++++-
>>>  target/arm/helper.c      |  11 +-
>>>  target/arm/monitor.c     |  16 ++
>>>  tests/arm-cpu-features.c | 217 +++++++++++++++++++++++
>>>  6 files changed, 620 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/target/arm/cpu.c b/target/arm/cpu.c
>>> index f08e178fc84b..e060a0d9df0e 100644
>>> --- a/target/arm/cpu.c
>>> +++ b/target/arm/cpu.c
>>> @@ -1019,6 +1019,12 @@ static void arm_cpu_realizefn(DeviceState *dev, Error **errp)
>>>          return;
>>>      }
>>>  
>>> +    arm_cpu_sve_finalize(cpu, &local_err);
>>> +    if (local_err) {
>>> +        error_propagate(errp, local_err);
>>> +        return;
>>> +    }
>>> +
>>>      if (arm_feature(env, ARM_FEATURE_AARCH64) &&
>>>          cpu->has_vfp != cpu->has_neon) {
>>>          /*
>>> diff --git a/target/arm/cpu.h b/target/arm/cpu.h
>>> index f9da672be575..cbb155cf72a5 100644
>>> --- a/target/arm/cpu.h
>>> +++ b/target/arm/cpu.h
>>> @@ -184,8 +184,13 @@ typedef struct {
>>>  
>>>  #ifdef TARGET_AARCH64
>>>  # define ARM_MAX_VQ    16
>>> +void arm_cpu_sve_finalize(ARMCPU *cpu, Error **errp);
>>> +uint32_t arm_cpu_vq_map_next_smaller(ARMCPU *cpu, uint32_t vq);
>>>  #else
>>>  # define ARM_MAX_VQ    1
>>> +static inline void arm_cpu_sve_finalize(ARMCPU *cpu, Error **errp) { }
>>> +static inline uint32_t arm_cpu_vq_map_next_smaller(ARMCPU *cpu, uint32_t vq)
>>> +{ return 0; }
>>>  #endif
>>>  
>>>  typedef struct ARMVectorReg {
>>> @@ -915,6 +920,15 @@ struct ARMCPU {
>>>  
>>>      /* Used to set the maximum vector length the cpu will support.  */
>>>      uint32_t sve_max_vq;
>>> +
>>> +    /*
>>> +     * In sve_vq_map each set bit is a supported vector length of
>>> +     * (bit-number + 1) * 16 bytes, i.e. each bit number + 1 is the vector> +     * length in quadwords. We need a map size twice the maximum
>>> +     * quadword length though because we use two bits for each vector
>>> +     * length in order to track three states: uninitialized, off, and on.
>>> +     */
>>> +    DECLARE_BITMAP(sve_vq_map, ARM_MAX_VQ * 2);
>>>  };
>>>  
>>>  void arm_cpu_post_init(Object *obj);
>>> diff --git a/target/arm/cpu64.c b/target/arm/cpu64.c
>>> index 02ada65f240c..5def82111dee 100644
>>> --- a/target/arm/cpu64.c
>>> +++ b/target/arm/cpu64.c
>>> @@ -257,6 +257,149 @@ static void aarch64_a72_initfn(Object *obj)
>>>      define_arm_cp_regs(cpu, cortex_a72_a57_a53_cp_reginfo);
>>>  }
>>>  
>>> +/*
>>> + * While we eventually use cpu->sve_vq_map as a typical bitmap, where each vq
>>> + * has only two states (off/on), until we've finalized the map at realize time
>>> + * we use an extra bit, at the vq - 1 + ARM_MAX_VQ bit number, to also allow
>>> + * tracking of the uninitialized state. The arm_vq_state typedef and following
>>> + * functions allow us to more easily work with the bitmap. Also, while the map
>>> + * is still initializing, sve-max-vq has an additional three states, bringing
>>> + * the number of its states to five, which are the following:
>>> + *
>>> + * sve-max-vq:
>>> + *   0:    SVE is disabled. The default value for a vq in the map is 'OFF'.
>>> + *  -1:    SVE is enabled, but neither sve-max-vq nor sve<vl-bits> properties
>>> + *         have yet been specified by the user. The default value for a vq in
>>> + *         the map is 'ON'.
>>> + *  -2:    SVE is enabled and one or more sve<vl-bits> properties have been
>>> + *         set to 'OFF' by the user, but no sve<vl-bits> properties have yet
>>> + *         been set to 'ON'. The user is now blocked from setting sve-max-vq
>>> + *         and the default value for a vq in the map is 'ON'.
>>> + *  -3:    SVE is enabled and one or more sve<vl-bits> properties have been
>>> + *         set to 'ON' by the user. The user is blocked from setting sve-max-vq
>>> + *         and the default value for a vq in the map is 'OFF'. sve-max-vq never
>>> + *         transitions back to -2, even if later inputs disable the vector
>>> + *         lengths that initially transitioned sve-max-vq to this state. This
>>> + *         avoids the default values from flip-flopping.
>>> + *  [1-ARM_MAX_VQ]: SVE is enabled and the user has specified a valid
>>> + *                  sve-max-vq. The sve-max-vq specified vq and all smaller
>>> + *                  vq's will be initially enabled. All larger vq's will have
>>> + *                  a default of 'OFF'.
>>> + */
>>> +#define ARM_SVE_INIT          -1
>>> +#define ARM_VQ_DEFAULT_ON     -2
>>> +#define ARM_VQ_DEFAULT_OFF    -3
>>> +
>>> +#define arm_sve_have_max_vq(cpu) ((int32_t)(cpu)->sve_max_vq > 0)
>>> +
>>> +typedef enum arm_vq_state {
>>> +    ARM_VQ_OFF,
>>> +    ARM_VQ_ON,
>>> +    ARM_VQ_UNINITIALIZED,
>>> +} arm_vq_state;
>>> +
>>> +static arm_vq_state arm_cpu_vq_map_get(ARMCPU *cpu, int vq)> +{
>>> +    assert(vq <= ARM_MAX_VQ);
>>> +
>>> +    return test_bit(vq - 1, cpu->sve_vq_map) |
>>> +           test_bit(vq - 1 + ARM_MAX_VQ, cpu->sve_vq_map) << 1;
>>> +}> +
>>> +static void arm_cpu_vq_map_set(ARMCPU *cpu, int vq, arm_vq_state state)
>>> +{
>>> +    assert(state == ARM_VQ_OFF || state == ARM_VQ_ON);
>>> +    assert(vq <= ARM_MAX_VQ);
>>> +
>>> +    clear_bit(vq - 1 + ARM_MAX_VQ, cpu->sve_vq_map);
>>> +
>>> +    if (state == ARM_VQ_ON) {
>>> +        set_bit(vq - 1, cpu->sve_vq_map);
>>> +    } else {
>>> +        clear_bit(vq - 1, cpu->sve_vq_map);
>>> +    }
>>> +}
>>> +
>>> +static void arm_cpu_vq_map_init(ARMCPU *cpu)
>>> +{
>>> +    bitmap_zero(cpu->sve_vq_map, ARM_MAX_VQ * 2);
>> nit: bitmap_clear(map, 0, ARM_MAX_VQ);
> 
> bitmap_clear will only clear the specified bits, leaving the upper bits
> uninitialized, which could cause problems. It's not a problem here
> because we're using a zero initialized cpu state member, but if it was
> a bitmap like below, declared on the stack, then we need the zeroing at
> least once. I'd prefer to use zero here too, to keep it consistent.
Sorry I don't get it. I would have expected the above bitmap_clear would
0 initialize 0 - ARM_MAX_VQ-1 bits and the bitmap_set below would
initialize ARM_MAX_VQ - 2 * ARM_MAX_VQ -1 with 1's?

> 
>> /* all VLs OFF */
>>> +    bitmap_set(cpu->sve_vq_map, ARM_MAX_VQ, ARM_MAX_VQ);
>> /* all VLs uninitialized */
> 
> I can add one comment that says
> 
> "Set all VQ's to ARM_VQ_UNINITIALIZED" above both lines.
> 
>>> +}
>>> +
>>> +static bool arm_cpu_vq_map_is_finalized(ARMCPU *cpu)
>>> +{
>>> +    DECLARE_BITMAP(map, ARM_MAX_VQ * 2);
>>> +
>>> +    bitmap_zero(map, ARM_MAX_VQ * 2);
>> same
> 
> Here me must use bitmap_zero.
> 
>>> +    bitmap_set(map, ARM_MAX_VQ, ARM_MAX_VQ);
>>> +    bitmap_and(map, map, cpu->sve_vq_map, ARM_MAX_VQ * 2);
>>> +
>>> +    return bitmap_empty(map, ARM_MAX_VQ * 2);
>>> +}
>>> +
>>> +static void arm_cpu_vq_map_finalize(ARMCPU *cpu)
>>> +{
>>> +    Error *err = NULL;
>>> +    char name[8];
>>> +    uint32_t vq;
>>> +    bool value;
>>> +
>>> +    /*
>>> +     * We use the property get accessor because it knows what default
>>> +     * values to return for uninitialized vector lengths.
>>> +     */
>>> +    for (vq = 1; vq <= ARM_MAX_VQ; ++vq) {
>>> +        sprintf(name, "sve%d", vq * 128);
>>> +        value = object_property_get_bool(OBJECT(cpu), name, &err);
>>> +        assert(!err);
>>> +        if (value) {
>>> +            arm_cpu_vq_map_set(cpu, vq, ARM_VQ_ON);
>>> +        } else {
>>> +            arm_cpu_vq_map_set(cpu, vq, ARM_VQ_OFF);
>>> +        }
>>> +    }
>>> +
>>> +    assert(arm_cpu_vq_map_is_finalized(cpu));
>> this can be removed
> 
> Yes, it can be today, as the code works fine. The idea was that if
> somebody came in here and modified this in someway that broke the
> finalized state, then we'd catch it here before going off and doing
> weird things. This isn't a hot path, so I'd prefer we keep it, but
> if QEMU maintainers prefer to limit defensive code, then I can
> certainly remove it.
Well I understand this was useful was developing and to check some
tricky states but some of the asserts correspond to some checks that
look rather obvious, like this one. But well I let others give their
opinions and it is not a big deal either. Then we can wonder when one
needs a trace mesg before asserting ...
> 
>>> +}
>>> +
>>> +void arm_cpu_sve_finalize(ARMCPU *cpu, Error **errp)
>>> +{
>>> +    Error *err = NULL;
>>> +
>>> +    if (!cpu->sve_max_vq) {
>>> +        bitmap_zero(cpu->sve_vq_map, ARM_MAX_VQ * 2);
>>> +        return;
>>> +    }
>>> +
>>> +    if (cpu->sve_max_vq == ARM_SVE_INIT) {
>>> +        object_property_set_uint(OBJECT(cpu), ARM_MAX_VQ, "sve-max-vq", &err);
>>> +        if (err) {
>>> +            error_propagate(errp, err);
>>> +            return;
>>> +        }
>>> +        assert(cpu->sve_max_vq == ARM_MAX_VQ);
>> I guess those asserts can be removed now?
> 
> I'd prefer to keep it. It's critical that we get this right, so if
> somebody changes something somewhere in the property set/get code
> that would break this, then it's best we know right away. Again,
> though, if there's some reason to limit defensive code, even on the
> init path, then I can.
Here I would expect that if the set did not fail, sve-max-vq effectively
if set to the expected value. But maybe I don't remember the cases where
sve_max-vq can be set to some other value without reprting an error.

> 
>>> +        arm_cpu_vq_map_finalize(cpu);
>> move the arm_cpu_vq_map_finalize out of the if, at the end.
> 
> That wouldn't work. In this branch arm_cpu_vq_map_finalize() comes
> after setting cpu->sve_max_vq. In the else branch it must come first.
That's right sorry
> 
>>> +    } else {
>>> +        arm_cpu_vq_map_finalize(cpu);
>>> +        if (!arm_sve_have_max_vq(cpu)) {
>>> +            cpu->sve_max_vq = arm_cpu_vq_map_next_smaller(cpu, ARM_MAX_VQ + 1);
>>> +        }
>>> +    }
>>> +
>>> +    assert(cpu->sve_max_vq == arm_cpu_vq_map_next_smaller(cpu, ARM_MAX_VQ));>
> What happened to my '+ 1' here? I see it's in the patch, but somehow got
> removed in your quote of the patch? For a second there I thought something
> was really wrong, since that assert should have fired for every full map.
Yep sorry for the cut :-(
> 
>> same here
> 
> Anyway my same argument that we don't want to leave arm_cpu_sve_finalize()
> without knowing we got this right applies.

> 
>>> +}
>>> +
>>> +uint32_t arm_cpu_vq_map_next_smaller(ARMCPU *cpu, uint32_t vq)
>>> +{
>>> +    uint32_t bitnum;
>>> +
>>> +    assert(vq <= ARM_MAX_VQ + 1);
>>> +    assert(arm_cpu_vq_map_is_finalized(cpu));
>>> +
>>> +    bitnum = find_last_bit(cpu->sve_vq_map, vq - 1);
>> why do you pass ARM_MAX_VQ + 1 and then do vq -1? doesn't
>> find_last_bit() take the size which is ARM_MAX_VQ in this case?
> 
> The size is ARM_MAX_VQ + 1, when you want to also check the bitnum
> corresponding to ARM_MAX_VQ. The bitnum for a VQ is always VQ - 1.
> Recall VQ=1, which is 128-bits. It takes the bit position zero.
> VQ=ARM_MAX_VQ=16 takes the bit position 15. As for the 'vq - 1' here
> in the find_last_bit() call, that's required because that's what the
> function says it does. It finds the next smaller VQ. So if you pass
> in, for example, vq=2, it shouldn't search anything but bit position
> zero:
> 
>  vq=2 => max next smaller is vq=1, bitnum = 0
>       => search the bitmap of size 1 for the last set bit
> 
> Or, if you want to search the whole map, including the maximum
> possibly VQ, as is done above, then you need to pass ARM_MAX_VQ + 1,
> since the max next smaller VQ is ARM_MAX_VQ.
OK I get it now. Maybe renaming the function into something like
arm_cpu_vq_map_last_bit() would have avoided that. the first assert
looks strange typically.
> 
>>> +    return bitnum == vq - 1 ? 0 : bitnum + 1;
>>> +}
>>> +
>>>  static void cpu_max_get_sve_max_vq(Object *obj, Visitor *v, const char *name,
>>>                                     void *opaque, Error **errp)
>>>  {
>>> @@ -283,12 +426,203 @@ static void cpu_max_set_sve_max_vq(Object *obj, Visitor *v, const char *name,
>>>          return;
>>>      }
>>>  
>>> +    /*
>>> +     * It gets complicated trying to support both sve-max-vq and
>>> +     * sve<vl-bits> properties together, so we mostly don't. We
>>> +     * do allow both if sve-max-vq is specified first and only once
>>> +     * though.
>>> +     */
>>> +    if (cpu->sve_max_vq != ARM_SVE_INIT) {
>>> +        error_setg(errp, "sve<vl-bits> in use or sve-max-vq already "
>>> +                   "specified");
>>> +        error_append_hint(errp, "sve-max-vq must come before all "
>>> +                          "sve<vl-bits> properties and it must only "
>>> +                          "be specified once.\n");
>>> +        return;
>>> +    }
>>> +
>>>      cpu->sve_max_vq = value;
>>>  
>>>      if (cpu->sve_max_vq == 0 || cpu->sve_max_vq > ARM_MAX_VQ) {
>>>          error_setg(errp, "unsupported SVE vector length");
>>>          error_append_hint(errp, "Valid sve-max-vq in range [1-%d]\n",
>>>                            ARM_MAX_VQ);
>>> +    } else {
>>> +        uint32_t vq;
>>> +
>>> +        for (vq = 1; vq <= cpu->sve_max_vq; ++vq) {
>>> +            char name[8];
>>> +            sprintf(name, "sve%d", vq * 128);
>>> +            object_property_set_bool(obj, true, name, &err);
>>> +            if (err) {
>>> +                error_propagate(errp, err);
>>> +                return;
>>> +            }
>>> +        }
>>> +    }
>>> +}
>>> +
>>> +static void cpu_arm_get_sve_vq(Object *obj, Visitor *v, const char *name,
>>> +                               void *opaque, Error **errp)
>>> +{
>>> +    ARMCPU *cpu = ARM_CPU(obj);
>>> +    int vq = atoi(&name[3]) / 128;
>>> +    arm_vq_state vq_state;
>>> +    bool value;
>>> +
>>> +    vq_state = arm_cpu_vq_map_get(cpu, vq);
>>> +
>>> +    if (!cpu->sve_max_vq) {
>>> +        /* All vector lengths are disabled when SVE is off. */
>>> +        value = false;
>>> +    } else if (vq_state == ARM_VQ_ON) {
>>> +        value = true;
>>> +    } else if (vq_state == ARM_VQ_OFF) {
>>> +        value = false;
>>> +    } else {
>>> +        /*
>>> +         * vq is uninitialized. We pick a default here based on the
>>> +         * the state of sve-max-vq and other sve<vl-bits> properties.
>>> +         */
>>> +        if (arm_sve_have_max_vq(cpu)) {
>>> +            /*
>>> +             * If we have sve-max-vq, then all remaining uninitialized
>>> +             * vq's are 'OFF'.
>>> +             */
>>> +            value = false;
>>> +        } else {
>>> +            switch (cpu->sve_max_vq) {
>>> +            case ARM_SVE_INIT:
>>> +            case ARM_VQ_DEFAULT_ON:
>>> +                value = true;
>>> +                break;
>>> +            case ARM_VQ_DEFAULT_OFF:
>>> +                value = false;
>>> +                break;
>>> +            }
>>> +        }
>>> +    }
>>> +
>>> +    visit_type_bool(v, name, &value, errp);
>>> +}
>>> +
>>> +static void cpu_arm_set_sve_vq(Object *obj, Visitor *v, const char *name,
>>> +                               void *opaque, Error **errp)
>>> +{
>>> +    ARMCPU *cpu = ARM_CPU(obj);
>>> +    int vq = atoi(&name[3]) / 128;
>>> +    arm_vq_state vq_state;
>>> +    Error *err = NULL;
>>> +    uint32_t max_vq = 0;
>>> +    bool value;
>>> +
>>> +    visit_type_bool(v, name, &value, &err);
>>> +    if (err) {
>>> +        error_propagate(errp, err);
>>> +        return;
>>> +    }
>>> +
>>> +    if (value && !cpu->sve_max_vq) {
>>> +        error_setg(errp, "cannot enable %s", name);
>>> +        error_append_hint(errp, "SVE has been disabled with sve=off\n");
>>> +        return;
>>> +    } else if (!cpu->sve_max_vq) {
>>> +        /*
>>> +         * We don't complain about disabling vector lengths when SVE
>>> +         * is off, but we don't do anything either.
>>> +         */
>>> +        return;
>>> +    }
>>> +
>>> +    if (arm_sve_have_max_vq(cpu)) {
>>> +        max_vq = cpu->sve_max_vq;
>>> +    } else {
>>> +        if (value) {
>>> +            cpu->sve_max_vq = ARM_VQ_DEFAULT_OFF;
>>> +        } else if (cpu->sve_max_vq != ARM_VQ_DEFAULT_OFF) {
>>> +            cpu->sve_max_vq = ARM_VQ_DEFAULT_ON;
>>> +        }
>>> +    }
>>> +
>>> +    /*
>>> +     * We need to know the maximum vector length, which may just currently
>>> +     * be the maximum length, in order to validate the enabling/disabling
>>> +     * of this vector length. We use the property get accessor in order to
>>> +     * get the appropriate default value for any uninitialized lengths.
>>> +     */
>>> +    if (!max_vq) {
>>> +        char tmp[8];
>>> +        bool s;
>>> +
>>> +        for (max_vq = ARM_MAX_VQ; max_vq >= 1; --max_vq) {
>>> +            sprintf(tmp, "sve%d", max_vq * 128);
>>> +            s = object_property_get_bool(OBJECT(cpu), tmp, &err);
>>> +            assert(!err);
>>> +            if (s) {
>>> +                break;
>>> +            }
>>> +        }
>>> +    }
>>> +
>>> +    if (arm_sve_have_max_vq(cpu) && value && vq > cpu->sve_max_vq) {
>>> +        error_setg(errp, "cannot enable %s", name);
>>> +        error_append_hint(errp, "vq=%d (%d bits) is larger than the "
>>> +                          "maximum vector length, sve-max-vq=%d "
>>> +                          "(%d bits)\n", vq, vq * 128,
>>> +                          cpu->sve_max_vq, cpu->sve_max_vq * 128);
>>> +    } else if (arm_sve_have_max_vq(cpu) && !value && vq == cpu->sve_max_vq) {
>>> +        error_setg(errp, "cannot disable %s", name);
>>> +        error_append_hint(errp, "The maximum vector length must be "
>>> +                          "enabled, sve-max-vq=%d (%d bits)\n",
>>> +                          cpu->sve_max_vq, cpu->sve_max_vq * 128);
>>> +    } else if (arm_sve_have_max_vq(cpu) && !value && vq < cpu->sve_max_vq &&
>>> +               is_power_of_2(vq)) {
>>> +        error_setg(errp, "cannot disable %s", name);
>>> +        error_append_hint(errp, "vq=%d (%d bits) is required as it is a "
>>> +                          "power-of-2 length smaller than the maximum, "
>>> +                          "sve-max-vq=%d (%d bits)\n", vq, vq * 128,
>>> +                          cpu->sve_max_vq, cpu->sve_max_vq * 128);
>>> +    } else if (!value && vq < max_vq && is_power_of_2(vq)) {
>>> +        error_setg(errp, "cannot disable %s", name);
>>> +        error_append_hint(errp, "Vector length %d-bits is required as it "
>>> +                          "is a power-of-2 length smaller than another "
>>> +                          "enabled vector length. Disable all larger vector "
>>> +                          "lengths first.\n", vq * 128);
>>> +    } else {
>> adding return in each if/elsif would allow to avoid this indent.
> 
> Yeah, but I didn't really mind the indent :-)
> 
>>> +        if (value) {
>>> +            bool fail = false;
>>> +            uint32_t s;
>>> +
>>> +            /*
>>> +             * Enabling a vector length automatically enables all
>>> +             * uninitialized power-of-2 lengths smaller than it, as
>>> +             * per the architecture.
>>> +             */
>> Test we are not attempting to enable a !is_power_of_2
> 
> I'm not sure what this means. In this context 'vq' can be a !power-of-2
> if it wants to be, and there's no way for 's' to be a !power-of-2 because
> we filter the vq's with is_power_of_2(s).
Yep got it now
> 
>>> +            for (s = 1; s < vq; ++s) {
>>> +                if (is_power_of_2(s)) {
>>> +                    vq_state = arm_cpu_vq_map_get(cpu, s);
>>> +                    if (vq_state == ARM_VQ_UNINITIALIZED) {
>>> +                        arm_cpu_vq_map_set(cpu, s, ARM_VQ_ON);
>>> +                    } else if (vq_state == ARM_VQ_OFF) {
>>> +                        fail = true;
>>> +                        break;
>>> +                    }
>>> +                }
>>> +            }
>>> +
>>> +            if (fail) {
>>> +                error_setg(errp, "cannot enable %s", name);
>>> +                error_append_hint(errp, "Vector length %d-bits is disabled "
>>> +                                  "and is a power-of-2 length smaller than "
>>> +                                  "%s. All power-of-2 vector lengths smaller "
>>> +                                  "than the maximum length are required.\n",
>>> +                                  s * 128, name);
>>> +            } else {
>>> +                arm_cpu_vq_map_set(cpu, vq, ARM_VQ_ON);
>>> +            }
>>> +        } else {
>>> +            arm_cpu_vq_map_set(cpu, vq, ARM_VQ_OFF);
>>> +        }
>>>      }
>>>  }
>>>  
>>> @@ -318,10 +652,11 @@ static void cpu_arm_set_sve(Object *obj, Visitor *v, const char *name,
>>>          /*
>>>           * We handle the -cpu <cpu>,sve=off,sve=on case by reinitializing,
>>>           * but otherwise we don't do anything as an sve=on could come after
>>> -         * a sve-max-vq setting.
>>> +         * a sve-max-vq or sve<vl-bits> setting.
>>>           */
>>>          if (!cpu->sve_max_vq) {
>>> -            cpu->sve_max_vq = ARM_MAX_VQ;
>>> +            cpu->sve_max_vq = ARM_SVE_INIT;
>>> +            arm_cpu_vq_map_init(cpu);
>>>          }
>>>      } else {
>>>          cpu->sve_max_vq = 0;
>>> @@ -336,6 +671,7 @@ static void cpu_arm_set_sve(Object *obj, Visitor *v, const char *name,
>>>  static void aarch64_max_initfn(Object *obj)
>>>  {
>>>      ARMCPU *cpu = ARM_CPU(obj);
>>> +    uint32_t vq;
>>>  
>>>      if (kvm_enabled()) {
>>>          kvm_arm_set_cpu_features_from_host(cpu);
>>> @@ -420,11 +756,29 @@ static void aarch64_max_initfn(Object *obj)
>>>          cpu->dcz_blocksize = 7; /*  512 bytes */
>>>  #endif
>>>  
>>> -        cpu->sve_max_vq = ARM_MAX_VQ;
>>> +        /*
>>> +         * sve_max_vq is initially unspecified, but must be initialized to a
>>> +         * non-zero value (ARM_SVE_INIT) to indicate that this cpu type has
>>> +         * SVE. It will be finalized in arm_cpu_realizefn().
>>> +         */
>>> +        cpu->sve_max_vq = ARM_SVE_INIT;
>>>          object_property_add(obj, "sve-max-vq", "uint32", cpu_max_get_sve_max_vq,
>>>                              cpu_max_set_sve_max_vq, NULL, NULL, &error_fatal);
>>>          object_property_add(obj, "sve", "bool", cpu_arm_get_sve,
>>>                              cpu_arm_set_sve, NULL, NULL, &error_fatal);
>>> +
>>> +        /*
>>> +         * sve_vq_map uses a special state while setting properties, so
>>> +         * we initialize it here with its init function and finalize it
>>> +         * in arm_cpu_realizefn().
>>> +         */
>>> +        arm_cpu_vq_map_init(cpu);
>>> +        for (vq = 1; vq <= ARM_MAX_VQ; ++vq) {
>>> +            char name[8];
>>> +            sprintf(name, "sve%d", vq * 128);
>>> +            object_property_add(obj, name, "bool", cpu_arm_get_sve_vq,
>>> +                                cpu_arm_set_sve_vq, NULL, NULL, &error_fatal);
>>> +        }
>>>      }
>>>  }
>>>  
>>> diff --git a/target/arm/helper.c b/target/arm/helper.c
>>> index f500ccb6d31b..b7b719dba57f 100644
>>> --- a/target/arm/helper.c
>>> +++ b/target/arm/helper.c
>>> @@ -5324,7 +5324,16 @@ static void zcr_write(CPUARMState *env, const ARMCPRegInfo *ri,
>>>  
>>>      /* Bits other than [3:0] are RAZ/WI.  */
>>>      QEMU_BUILD_BUG_ON(ARM_MAX_VQ > 16);
>>> -    raw_write(env, ri, value & 0xf);
>>> +    value &= 0xf;
>>> +
>>> +    if (value) {> +        /* get next vq that is smaller than or equal to value's vq */
>>> +        uint32_t vq = value + 1;
>> ditto
> 
> What does this 'ditto' map to? Oh, probably the discussion about vq's
> getting +1's and -1's. In this context 'value' is the ZCR_ELx
> representation of a VQ, which is VQ - 1, just like in the bitmap. To
> translate that to a VQ we need to do a '+ 1'.

Until here I follow. By the way in which doc did you find the
description of ZCR_ELx?

 As I wrote in the comment
> here, we want to find the next smaller or equal VQ here, because this is
> the input VQ from the guest which may itself be a valid VQ, but if it's
> not, we need to step down to the next valid one. So we ask
> arm_cpu_vq_map_next_smaller() for 'vq + 1' in order to ensure 'vq' will
> be in the searched range. After we get a valid vq from
> arm_cpu_vq_map_next_smaller(), which must be at least '1', since that's
> required by the architecture and thus will always be present, we need to
> translate it back to the ZCR_ELx representation with the subtraction.
> Phew... We programmers do love adding and subtracting by one, right :-)
Got it now. but well I scratched my head
> 
>>> +        vq = arm_cpu_vq_map_next_smaller(cpu, vq + 1);
>>> +        value = vq - 1;
>>> +    }
>>> +
>>> +    raw_write(env, ri, value);
>>>  
>>>      /*
>>>       * Because we arrived here, we know both FP and SVE are enabled;
>>> diff --git a/target/arm/monitor.c b/target/arm/monitor.c
>>> index 157c487a1551..1e213906fd8f 100644
>>> --- a/target/arm/monitor.c
>>> +++ b/target/arm/monitor.c
>>> @@ -89,8 +89,24 @@ GICCapabilityList *qmp_query_gic_capabilities(Error **errp)
>>>      return head;
>>>  }
>>>  
>>> +QEMU_BUILD_BUG_ON(ARM_MAX_VQ > 16);
>>> +
>>> +/*
>>> + * These are cpu model features we want to advertise. The order here
>>> + * matters as this is the order in which qmp_query_cpu_model_expansion
>>> + * will attempt to set them. If there are dependencies between features,
>>> + * as there are with the sve<vl-bits> features, then the order that
>>> + * considers those dependencies must be used.
>>> + *
>>> + * The sve<vl-bits> features need to be in reverse order in order to
>>> + * enable/disable the largest vector lengths first, ensuring all
>>> + * power-of-2 vector lengths smaller can also be enabled/disabled.
>>> + */
>>>  static const char *cpu_model_advertised_features[] = {
>>>      "aarch64", "pmu", "sve",
>>> +    "sve2048", "sve1920", "sve1792", "sve1664", "sve1536", "sve1408",
>>> +    "sve1280", "sve1152", "sve1024", "sve896", "sve768", "sve640",
>>> +    "sve512", "sve384", "sve256", "sve128",
>>>      NULL
>>>  };
>>>  
>>> diff --git a/tests/arm-cpu-features.c b/tests/arm-cpu-features.c
>> Please move all the tests in a separate patch.
> 
> I'd prefer not to. I like keeping the test cases that test the new code in
> the same patch. It self documents what the test cases map to what code and
> allows immediate testing of the patch when bisecting. Also I'm not really
> sure how it makes review worse, as another patch would look the same, just
> in a different email. 
A reviewier might not be familiar with both standard code and test and
feel reluctant to invest reading boths, hence reducing the chances to
collect R-b's. But that's my humble opinion.
> 
>> Each day has enough trouble of its own ;-) sweat...
> 
> I really appreciate the review! I realize it's generating sweat,
> especially with the European heat wave! You don't have to finish
> a patch before sending comments. I'm fine with multiple replies to
> the same patch if you want to break your review up into smaller
> bites.

Thanks. Yes breathing times are needed due to the heat and digestion of
the code ;-)

Thanks

Eric
> 
> Thanks,
> drew
> 
>>
>> Thanks
>>
>> Eric
>>> index 509e458e9c2f..a4bf6aec00df 100644
>>> --- a/tests/arm-cpu-features.c
>>> +++ b/tests/arm-cpu-features.c
>>> @@ -13,6 +13,18 @@
>>>  #include "qapi/qmp/qdict.h"
>>>  #include "qapi/qmp/qjson.h"
>>>  
>>> +#if __SIZEOF_LONG__ == 8
>>> +#define BIT(n) (1UL << (n))
>>> +#else
>>> +#define BIT(n) (1ULL << (n))
>>> +#endif
>>> +
>>> +/*
>>> + * We expect the SVE max-vq to be 16. Also it must be <= 64
>>> + * for our test code, otherwise 'vls' can't just be a uint64_t.
>>> + */
>>> +#define SVE_MAX_VQ 16
>>> +
>>>  #define MACHINE    "-machine virt,gic-version=max "
>>>  #define QUERY_HEAD "{ 'execute': 'query-cpu-model-expansion', " \
>>>                       "'arguments': { 'type': 'full', "
>>> @@ -137,6 +149,201 @@ static void assert_bad_props(QTestState *qts, const char *cpu_type)
>>>      qobject_unref(resp);
>>>  }
>>>  
>>> +static void resp_get_sve_vls(QDict *resp, uint64_t *vls, uint32_t *max_vq)
>>> +{
>>> +    const QDictEntry *e;
>>> +    QDict *qdict;
>>> +    int n = 0;
>>> +
>>> +    *vls = 0;
>>> +
>>> +    qdict = resp_get_props(resp);
>>> +
>>> +    for (e = qdict_first(qdict); e; e = qdict_next(qdict, e)) {
>>> +        if (strlen(e->key) > 3 && !strncmp(e->key, "sve", 3) &&
>>> +            g_ascii_isdigit(e->key[3])) {
>>> +            char *endptr;
>>> +            int bits;
>>> +
>>> +            bits = g_ascii_strtoll(&e->key[3], &endptr, 10);
>>> +            if (!bits || *endptr != '\0') {
>>> +                continue;
>>> +            }
>>> +
>>> +            if (qdict_get_bool(qdict, e->key)) {
>>> +                *vls |= BIT((bits / 128) - 1);
>>> +            }
>>> +            ++n;
>>> +        }
>>> +    }
>>> +
>>> +    g_assert(n == SVE_MAX_VQ);
>>> +
>>> +    *max_vq = !*vls ? 0 : 64 - __builtin_clzll(*vls);
>>> +}
>>> +
>>> +static uint64_t sve_get_vls(QTestState *qts, const char *cpu_type,
>>> +                            const char *fmt, ...)
>>> +{
>>> +    QDict *resp;
>>> +    uint64_t vls;
>>> +    uint32_t max_vq;
>>> +
>>> +    if (fmt) {
>>> +        QDict *args;
>>> +        va_list ap;
>>> +
>>> +        va_start(ap, fmt);
>>> +        args = qdict_from_vjsonf_nofail(fmt, ap);
>>> +        va_end(ap);
>>> +
>>> +        resp = qtest_qmp(qts, QUERY_HEAD "'model': { 'name': %s, "
>>> +                                                    "'props': %p }"
>>> +                              QUERY_TAIL, cpu_type, args);
>>> +    } else {
>>> +        resp = do_query_no_props(qts, cpu_type);
>>> +    }
>>> +
>>> +    g_assert(resp);
>>> +    resp_get_sve_vls(resp, &vls, &max_vq);
>>> +    qobject_unref(resp);
>>> +
>>> +    return vls;
>>> +}
>>> +
>>> +#define assert_sve_vls(qts, cpu_type, expected_vls, fmt, ...) \
>>> +    g_assert(sve_get_vls(qts, cpu_type, fmt, ##__VA_ARGS__) == expected_vls)
>>> +
>>> +static void sve_tests_default(QTestState *qts, const char *cpu_type)
>>> +{
>>> +    /*
>>> +     * With no sve-max-vq or sve<vl-bits> properties on the command line
>>> +     * the default is to have all vector lengths enabled.
>>> +     */
>>> +    assert_sve_vls(qts, cpu_type, BIT(SVE_MAX_VQ) - 1, NULL);
>>> +
>>> +    /*
>>> +     * -------------------------------------------------------------------
>>> +     *               power-of-2(vq)   all-power-            can      can
>>> +     *                                of-2(< vq)          enable   disable
>>> +     * -------------------------------------------------------------------
>>> +     * vq < max_vq      no            MUST*                yes      yes
>>> +     * vq < max_vq      yes           MUST*                yes      no
>>> +     * -------------------------------------------------------------------
>>> +     * vq == max_vq     n/a           MUST*                yes**    yes**
>>> +     * -------------------------------------------------------------------
>>> +     * vq > max_vq      n/a           no                   no       yes
>>> +     * vq > max_vq      n/a           yes                  yes      yes
>>> +     * -------------------------------------------------------------------
>>> +     *
>>> +     * [*] "MUST" means this requirement must already be satisfied,
>>> +     *     otherwise 'max_vq' couldn't itself be enabled.
>>> +     *
>>> +     * [**] Not testable with the QMP interface, only with the command line.
>>> +     */
>>> +
>>> +    /* max_vq := 8 */
>>> +    assert_sve_vls(qts, cpu_type, 0x8b, "{ 'sve1024': true }");
>>> +
>>> +    /* max_vq := 8, vq < max_vq, !power-of-2(vq) */
>>> +    assert_sve_vls(qts, cpu_type, 0x8f,
>>> +                   "{ 'sve1024': true, 'sve384': true }");
>>> +    assert_sve_vls(qts, cpu_type, 0x8b,
>>> +                   "{ 'sve1024': true, 'sve384': false }");
>>> +
>>> +    /* max_vq := 8, vq < max_vq, power-of-2(vq) */
>>> +    assert_sve_vls(qts, cpu_type, 0x8b,
>>> +                   "{ 'sve1024': true, 'sve256': true }");
>>> +    assert_error(qts, cpu_type, "cannot disable sve256",
>>> +                 "{ 'sve1024': true, 'sve256': false }");
>>> +
>>> +    /*
>>> +     * max_vq := 3, vq > max_vq, !all-power-of-2(< vq)
>>> +     *
>>> +     * If given sve384=on,sve512=off,sve640=on the command line error would be
>>> +     * "cannot enable sve640", but QMP visits the vector lengths in reverse
>>> +     * order, so we get "cannot disable sve512" instead. The command line would
>>> +     * also give that error if given sve384=on,sve640=on,sve512=off, so this is
>>> +     * all fine. The important thing is that we get an error.
>>> +     */
>>> +    assert_error(qts, cpu_type, "cannot disable sve512",
>>> +                 "{ 'sve384': true, 'sve512': false, 'sve640': true }");
>>> +
>>> +    /*
>>> +     * We can disable power-of-2 vector lengths when all larger lengths
>>> +     * are also disabled. The shorter, sve384=on,sve512=off,sve640=off
>>> +     * works on the command line, but QMP doesn't know that all the
>>> +     * vector lengths larger than 384-bits will be disabled until it
>>> +     * sees the enabling of sve384, which comes near the end since it
>>> +     * visits the lengths in reverse order. So we just have to explicitly
>>> +     * disable them all.
>>> +     */
>>> +    assert_sve_vls(qts, cpu_type, 0x7,
>>> +                   "{ 'sve384': true, 'sve512': false, 'sve640': false, "
>>> +                   "  'sve768': false, 'sve896': false, 'sve1024': false, "
>>> +                   "  'sve1152': false, 'sve1280': false, 'sve1408': false, "
>>> +                   "  'sve1536': false, 'sve1664': false, 'sve1792': false, "
>>> +                   "  'sve1920': false, 'sve2048': false }");
>>> +
>>> +    /* max_vq := 3, vq > max_vq, all-power-of-2(< vq) */
>>> +    assert_sve_vls(qts, cpu_type, 0x1f,
>>> +                   "{ 'sve384': true, 'sve512': true, 'sve640': true }");
>>> +    assert_sve_vls(qts, cpu_type, 0xf,
>>> +                   "{ 'sve384': true, 'sve512': true, 'sve640': false }");
>>> +}
>>> +
>>> +static void sve_tests_sve_max_vq_8(const void *data)
>>> +{
>>> +    QTestState *qts;
>>> +
>>> +    qts = qtest_init(MACHINE "-cpu max,sve-max-vq=8");
>>> +
>>> +    assert_sve_vls(qts, "max", BIT(8) - 1, NULL);
>>> +
>>> +    /*
>>> +     * Disabling the max-vq set by sve-max-vq is not allowed, but
>>> +     * of course enabling it is OK.
>>> +     */
>>> +    assert_error(qts, "max", "cannot disable sve1024", "{ 'sve1024': false }");
>>> +    assert_sve_vls(qts, "max", 0xff, "{ 'sve1024': true }");
>>> +
>>> +    /*
>>> +     * Enabling anything larger than max-vq set by sve-max-vq is not
>>> +     * allowed, but of course disabling everything larger is OK.
>>> +     */
>>> +    assert_error(qts, "max", "cannot enable sve1152", "{ 'sve1152': true }");
>>> +    assert_sve_vls(qts, "max", 0xff, "{ 'sve1152': false }");
>>> +
>>> +    /*
>>> +     * We can disable non power-of-2 lengths smaller than the max-vq
>>> +     * set by sve-max-vq, but not power-of-2 lengths.
>>> +     */
>>> +    assert_sve_vls(qts, "max", 0xfb, "{ 'sve384': false }");
>>> +    assert_error(qts, "max", "cannot disable sve256", "{ 'sve256': false }");
>>> +
>>> +    qtest_quit(qts);
>>> +}
>>> +
>>> +static void sve_tests_sve_off(const void *data)
>>> +{
>>> +    QTestState *qts;
>>> +
>>> +    qts = qtest_init(MACHINE "-cpu max,sve=off");
>>> +
>>> +    /*
>>> +     * SVE is off, so the map should be empty.
>>> +     */
>>> +    assert_sve_vls(qts, "max", 0, NULL);
>>> +
>>> +    /*
>>> +     * We can't turn anything on, but off is OK.
>>> +     */
>>> +    assert_error(qts, "max", "cannot enable sve128", "{ 'sve128': true }");
>>> +    assert_sve_vls(qts, "max", 0, "{ 'sve128': false }");
>>> +
>>> +    qtest_quit(qts);
>>> +}
>>> +
>>>  static void test_query_cpu_model_expansion(const void *data)
>>>  {
>>>      QTestState *qts;
>>> @@ -159,9 +366,12 @@ static void test_query_cpu_model_expansion(const void *data)
>>>      if (g_str_equal(qtest_get_arch(), "aarch64")) {
>>>          assert_has_feature(qts, "max", "aarch64");
>>>          assert_has_feature(qts, "max", "sve");
>>> +        assert_has_feature(qts, "max", "sve128");
>>>          assert_has_feature(qts, "cortex-a57", "pmu");
>>>          assert_has_feature(qts, "cortex-a57", "aarch64");
>>>  
>>> +        sve_tests_default(qts, "max");
>>> +
>>>          /* Test that features that depend on KVM generate errors without. */
>>>          assert_error(qts, "max",
>>>                       "'aarch64' feature cannot be disabled "
>>> @@ -213,6 +423,13 @@ int main(int argc, char **argv)
>>>      qtest_add_data_func("/arm/query-cpu-model-expansion",
>>>                          NULL, test_query_cpu_model_expansion);
>>>  
>>> +    if (g_str_equal(qtest_get_arch(), "aarch64")) {
>>> +        qtest_add_data_func("/arm/max/query-cpu-model-expansion/sve-max-vq-8",
>>> +                            NULL, sve_tests_sve_max_vq_8);
>>> +        qtest_add_data_func("/arm/max/query-cpu-model-expansion/sve-off",
>>> +                            NULL, sve_tests_sve_off);
>>> +    }
>>> +
>>>      if (kvm_available) {
>>>          qtest_add_data_func("/arm/kvm/query-cpu-model-expansion",
>>>                              NULL, test_query_cpu_model_expansion_kvm);
>>>
>>
Eric Auger June 27, 2019, 11 a.m. UTC | #10
Hi,

On 6/27/19 12:46 PM, Andrew Jones wrote:
> On Wed, Jun 26, 2019 at 06:56:54PM +0200, Auger Eric wrote:
>>> diff --git a/target/arm/helper.c b/target/arm/helper.c
>>> index f500ccb6d31b..b7b719dba57f 100644
>>> --- a/target/arm/helper.c
>>> +++ b/target/arm/helper.c
>>> @@ -5324,7 +5324,16 @@ static void zcr_write(CPUARMState *env, const ARMCPRegInfo *ri,
>>>  
>>>      /* Bits other than [3:0] are RAZ/WI.  */
>>>      QEMU_BUILD_BUG_ON(ARM_MAX_VQ > 16);
>>> -    raw_write(env, ri, value & 0xf);
>>> +    value &= 0xf;
>>> +
>>> +    if (value) {
>>> +        /* get next vq that is smaller than or equal to value's vq */
>>> +        uint32_t vq = value + 1;
>>> +        vq = arm_cpu_vq_map_next_smaller(cpu, vq + 1);
>>> +        value = vq - 1;
>> spec says:
>>
>> "if an unsupported vector length is requested in ZCR_ELx, the
>> implementation is required to select the largest
>> supported vector length that is less than the requested length. This
>> does not alter the value of ZCR_ELx.LEN.
>> "
>>
>> So I understand the value written in the reg should not be unmodified.
>>
> 
> Sorry, I can't parse what you're trying to tell me here. Here we have
> to write 'value', because that's what the guest is trying to do. As the
> spec says in your quote, we have to pick the length the guest wants, or
> the next smaller valid one, so that's what the code above does. So are
> you just stating that you agree with this hunk of the code?
What we are writing into the reg is arm_cpu_vq_map_next_smaller(cpu, vq
+ 1) -1. Maybe I misunderstand the whole wording but I would have
expected the original unmodified value to be written in the reg instead?

Thanks

Eric
> 
> Thanks,
> drew
>
Andrew Jones June 27, 2019, 11:43 a.m. UTC | #11
On Thu, Jun 27, 2019 at 12:51:10PM +0200, Auger Eric wrote:
> Hi Drew,
> On 6/27/19 11:40 AM, Andrew Jones wrote:
> > On Wed, Jun 26, 2019 at 04:58:05PM +0200, Auger Eric wrote:
> >> Hi Drew,
> >>
> >> On 6/21/19 6:34 PM, Andrew Jones wrote:
> >>> Introduce cpu properties to give fine control over SVE vector lengths.
> >>> We introduce a property for each valid length up to the current
> >>> maximum supported, which is 2048-bits. The properties are named, e.g.
> >>> sve128, sve256, sve512, ..., where the number is the number of bits.
> >> sve384 then in the natural order, otherwise it gives the impression you
> >> can only specify * 128bit pow of 2 sizes at this stage of the reading.
> > 
> > OK
> > 
> >>
> >>>
> >>> It's now possible to do something like -cpu max,sve-max-vq=4,sve384=off
> >> Document the fact sve512 cannot be turned to off, which sounds fully
> >> sensible (by reading the code). By the way, I think an actual
> >> documentation should be provided in qemu. Maybe as spec to agree on.
> > 
> > Actually, maybe I could just allow it to be disabled. It would be
> > a strange command line to do '-cpu max,sve-max-vq=4,sve512=off', but if
> > that's what the user wants, then there's not really any good reason to
> > block it. As I say below, mixing these types of properties on the command
> > line isn't really a good idea, but it's not completely blocked because we
> > want a user that prefers launching their (most likely TCG) guest with,
> > e.g.  '-cpu max,sve-max-vq=4', to also have a functional QMP CPU model
> > expansion, but the CPU model expansion for SVE vector lengths depends on
> > the sve<vl-bits> properties, thus mixing sve-max-vq with sve<vl-bits>,
> > where sve-max-vq comes first. They're just not mixed on the command line.
> OK
> > 
> >>> to provide a guest vector lengths 128, 256, and 512 bits. The resulting
> >>> set must conform to the architectural constraint of having all power-of-2
> >>> lengths smaller than the maximum length present. It's also possible to
> >>> only provide sve<vl-bits> properties, e.g. -cpu max,sve512=on> That example provides the machine with 128, 256, and 512 bit vector
> >> lengths.
> >>> It doesn't hurt to explicitly ask for all expected vector lengths,
> >>> which is what, for example, libvirt should do.>
> >>> Note1, it is not possible to use sve<vl-bits> properties before
> >>> sve-max-vq, e.g. -cpu max,sve384=off,sve-max-vq=4, as supporting
> >>> that overly complicates the user input validation.
> >>>
> >>> Note2, while one might expect -cpu max,sve-max-vq=4,sve512=on to be the
> >>> same as -cpu max,sve512=on, they are not.
> >> yep it is a bit weird
> >>
> >> Didn't you consider -cpu max,sve-max-vq=4,req_only=true removing non
> >> power of 2 values and sve<vl-bits> setting a single VLbit?
> > 
> > Nope. I mostly considered making sure sve-max-vq was supported to the
> > level necessary to work with the new properties, and to not change its
> > current semantics, but I don't want to extend it in any way, nor to
> > require it to be used with the new properties at all. sve-max-vq isn't
> > even going to be supported by '-cpu host', so we can't rely on it being
> > part of the SVE vector length selection for normal KVM guests.
> OK, as long as it is documented somewhere.
> > 
> > Keep in mind that the current semantics of sve-max-vq are to enable all
> > vector lengths up to and including that specified maximum. That's not
> > a safe way to select vector lengths on KVM, as it may include vector
> > lengths that the KVM host does not support, thus it's not something KVM
> > users should use. It's already there for the 'max' cpu type though, so
> > we work with it in this series the best we can for 'max', but not at all
> > for 'host'.
> Do you expect hosts to expose only a few of the permitted values? I
> imagined it would generally expose none or all of them actually. If you

You'd expect wrong then :-) There's already a real host out there that
implements SVE, but they don't implement any of the optional lengths.

> discourage people to use sve-max-vq and sve<>=on only sets 2^n values,
> userspace will need to set manually all intermediate !2^n values

For TCG, yes. But what I recommend userspace to do isn't just to set the
max with sve<max-vl-bits>=on and then all the optional vector lengths,
too, if they want them. I recommend they set *all* vector lengths they
want, e.g. 128,256,384,... even though some of them would be auto-enabled.
If you don't do that, then you don't have a truly migratable command line.
The auto-enabling of required vector lengths makes sense to me, though,
because the code has to detect it anyway, in order to error out when
they're not present. And doing so also gives developers, who may not care
about migration, a more concise command line.

> > 
> > Re: documentation for this. I suppose I could write something up that
> > clarifies the properties and also suggests best practices regarding
> > sve-max-vq.
> 
> To me this is really helpful to have a global understanding
> > 
> >>> The former enables all vector lengths 512 bits and smaller
> >> ( required and permitted)
> >>> while the latter only sets the 512-bit
> >>> length and its smaller power-of-2 lengths. It's probably best not to use
> >>> sve-max-vq with sve<vl-bits> properties, but it can't be completely
> >>> forbidden as we want qmp_query_cpu_model_expansion to work with guests
> >>> launched with e.g. -cpu max,sve-max-vq=8 on their command line.
> >> what does happen if you specify -cpu max,sve384=on? (seems to be allowed
> >> in the code?)
> > 
> > That's perfectly valid with tcg, you'll get 128,256,384. It's also valid
> > with KVM if you're host supports it.
> OK so it exposes the maximum configurable vector length + all the
> corresponding required values.

Correct

> > 
> >>
> >>>
> >>> Signed-off-by: Andrew Jones <drjones@redhat.com>
> >>> ---
> >>>  target/arm/cpu.c         |   6 +
> >>>  target/arm/cpu.h         |  14 ++
> >>>  target/arm/cpu64.c       | 360 ++++++++++++++++++++++++++++++++++++++-
> >>>  target/arm/helper.c      |  11 +-
> >>>  target/arm/monitor.c     |  16 ++
> >>>  tests/arm-cpu-features.c | 217 +++++++++++++++++++++++
> >>>  6 files changed, 620 insertions(+), 4 deletions(-)
> >>>
> >>> diff --git a/target/arm/cpu.c b/target/arm/cpu.c
> >>> index f08e178fc84b..e060a0d9df0e 100644
> >>> --- a/target/arm/cpu.c
> >>> +++ b/target/arm/cpu.c
> >>> @@ -1019,6 +1019,12 @@ static void arm_cpu_realizefn(DeviceState *dev, Error **errp)
> >>>          return;
> >>>      }
> >>>  
> >>> +    arm_cpu_sve_finalize(cpu, &local_err);
> >>> +    if (local_err) {
> >>> +        error_propagate(errp, local_err);
> >>> +        return;
> >>> +    }
> >>> +
> >>>      if (arm_feature(env, ARM_FEATURE_AARCH64) &&
> >>>          cpu->has_vfp != cpu->has_neon) {
> >>>          /*
> >>> diff --git a/target/arm/cpu.h b/target/arm/cpu.h
> >>> index f9da672be575..cbb155cf72a5 100644
> >>> --- a/target/arm/cpu.h
> >>> +++ b/target/arm/cpu.h
> >>> @@ -184,8 +184,13 @@ typedef struct {
> >>>  
> >>>  #ifdef TARGET_AARCH64
> >>>  # define ARM_MAX_VQ    16
> >>> +void arm_cpu_sve_finalize(ARMCPU *cpu, Error **errp);
> >>> +uint32_t arm_cpu_vq_map_next_smaller(ARMCPU *cpu, uint32_t vq);
> >>>  #else
> >>>  # define ARM_MAX_VQ    1
> >>> +static inline void arm_cpu_sve_finalize(ARMCPU *cpu, Error **errp) { }
> >>> +static inline uint32_t arm_cpu_vq_map_next_smaller(ARMCPU *cpu, uint32_t vq)
> >>> +{ return 0; }
> >>>  #endif
> >>>  
> >>>  typedef struct ARMVectorReg {
> >>> @@ -915,6 +920,15 @@ struct ARMCPU {
> >>>  
> >>>      /* Used to set the maximum vector length the cpu will support.  */
> >>>      uint32_t sve_max_vq;
> >>> +
> >>> +    /*
> >>> +     * In sve_vq_map each set bit is a supported vector length of
> >>> +     * (bit-number + 1) * 16 bytes, i.e. each bit number + 1 is the vector> +     * length in quadwords. We need a map size twice the maximum
> >>> +     * quadword length though because we use two bits for each vector
> >>> +     * length in order to track three states: uninitialized, off, and on.
> >>> +     */
> >>> +    DECLARE_BITMAP(sve_vq_map, ARM_MAX_VQ * 2);
> >>>  };
> >>>  
> >>>  void arm_cpu_post_init(Object *obj);
> >>> diff --git a/target/arm/cpu64.c b/target/arm/cpu64.c
> >>> index 02ada65f240c..5def82111dee 100644
> >>> --- a/target/arm/cpu64.c
> >>> +++ b/target/arm/cpu64.c
> >>> @@ -257,6 +257,149 @@ static void aarch64_a72_initfn(Object *obj)
> >>>      define_arm_cp_regs(cpu, cortex_a72_a57_a53_cp_reginfo);
> >>>  }
> >>>  
> >>> +/*
> >>> + * While we eventually use cpu->sve_vq_map as a typical bitmap, where each vq
> >>> + * has only two states (off/on), until we've finalized the map at realize time
> >>> + * we use an extra bit, at the vq - 1 + ARM_MAX_VQ bit number, to also allow
> >>> + * tracking of the uninitialized state. The arm_vq_state typedef and following
> >>> + * functions allow us to more easily work with the bitmap. Also, while the map
> >>> + * is still initializing, sve-max-vq has an additional three states, bringing
> >>> + * the number of its states to five, which are the following:
> >>> + *
> >>> + * sve-max-vq:
> >>> + *   0:    SVE is disabled. The default value for a vq in the map is 'OFF'.
> >>> + *  -1:    SVE is enabled, but neither sve-max-vq nor sve<vl-bits> properties
> >>> + *         have yet been specified by the user. The default value for a vq in
> >>> + *         the map is 'ON'.
> >>> + *  -2:    SVE is enabled and one or more sve<vl-bits> properties have been
> >>> + *         set to 'OFF' by the user, but no sve<vl-bits> properties have yet
> >>> + *         been set to 'ON'. The user is now blocked from setting sve-max-vq
> >>> + *         and the default value for a vq in the map is 'ON'.
> >>> + *  -3:    SVE is enabled and one or more sve<vl-bits> properties have been
> >>> + *         set to 'ON' by the user. The user is blocked from setting sve-max-vq
> >>> + *         and the default value for a vq in the map is 'OFF'. sve-max-vq never
> >>> + *         transitions back to -2, even if later inputs disable the vector
> >>> + *         lengths that initially transitioned sve-max-vq to this state. This
> >>> + *         avoids the default values from flip-flopping.
> >>> + *  [1-ARM_MAX_VQ]: SVE is enabled and the user has specified a valid
> >>> + *                  sve-max-vq. The sve-max-vq specified vq and all smaller
> >>> + *                  vq's will be initially enabled. All larger vq's will have
> >>> + *                  a default of 'OFF'.
> >>> + */
> >>> +#define ARM_SVE_INIT          -1
> >>> +#define ARM_VQ_DEFAULT_ON     -2
> >>> +#define ARM_VQ_DEFAULT_OFF    -3
> >>> +
> >>> +#define arm_sve_have_max_vq(cpu) ((int32_t)(cpu)->sve_max_vq > 0)
> >>> +
> >>> +typedef enum arm_vq_state {
> >>> +    ARM_VQ_OFF,
> >>> +    ARM_VQ_ON,
> >>> +    ARM_VQ_UNINITIALIZED,
> >>> +} arm_vq_state;
> >>> +
> >>> +static arm_vq_state arm_cpu_vq_map_get(ARMCPU *cpu, int vq)> +{
> >>> +    assert(vq <= ARM_MAX_VQ);
> >>> +
> >>> +    return test_bit(vq - 1, cpu->sve_vq_map) |
> >>> +           test_bit(vq - 1 + ARM_MAX_VQ, cpu->sve_vq_map) << 1;
> >>> +}> +
> >>> +static void arm_cpu_vq_map_set(ARMCPU *cpu, int vq, arm_vq_state state)
> >>> +{
> >>> +    assert(state == ARM_VQ_OFF || state == ARM_VQ_ON);
> >>> +    assert(vq <= ARM_MAX_VQ);
> >>> +
> >>> +    clear_bit(vq - 1 + ARM_MAX_VQ, cpu->sve_vq_map);
> >>> +
> >>> +    if (state == ARM_VQ_ON) {
> >>> +        set_bit(vq - 1, cpu->sve_vq_map);
> >>> +    } else {
> >>> +        clear_bit(vq - 1, cpu->sve_vq_map);
> >>> +    }
> >>> +}
> >>> +
> >>> +static void arm_cpu_vq_map_init(ARMCPU *cpu)
> >>> +{
> >>> +    bitmap_zero(cpu->sve_vq_map, ARM_MAX_VQ * 2);
> >> nit: bitmap_clear(map, 0, ARM_MAX_VQ);
> > 
> > bitmap_clear will only clear the specified bits, leaving the upper bits
> > uninitialized, which could cause problems. It's not a problem here
> > because we're using a zero initialized cpu state member, but if it was
> > a bitmap like below, declared on the stack, then we need the zeroing at
> > least once. I'd prefer to use zero here too, to keep it consistent.
> Sorry I don't get it. I would have expected the above bitmap_clear would
> 0 initialize 0 - ARM_MAX_VQ-1 bits and the bitmap_set below would
> initialize ARM_MAX_VQ - 2 * ARM_MAX_VQ -1 with 1's?

That's true, but what about the bits > 2 * ARM_MAX_VQ - 1? The bitmap is
implemented as an array of longs, so if you don't use a multiple of
sizeof(long)*8 bits, then you're going to have uninitialized parts of
your bitmap. It's safest to use bitmap_clear() for the whole thing at
least once. That said, I'll agree it's a bit subtle and maybe not even
the right way to approach it. It might be better just to audit all the
uses to ensure there's no chance of misinterpretation of unused one
bits > 2 * ARM_MAX_VQ - 1. We could possibly sprinkle more asserts around
to maintain that.

> 
> > 
> >> /* all VLs OFF */
> >>> +    bitmap_set(cpu->sve_vq_map, ARM_MAX_VQ, ARM_MAX_VQ);
> >> /* all VLs uninitialized */
> > 
> > I can add one comment that says
> > 
> > "Set all VQ's to ARM_VQ_UNINITIALIZED" above both lines.
> > 
> >>> +}
> >>> +
> >>> +static bool arm_cpu_vq_map_is_finalized(ARMCPU *cpu)
> >>> +{
> >>> +    DECLARE_BITMAP(map, ARM_MAX_VQ * 2);
> >>> +
> >>> +    bitmap_zero(map, ARM_MAX_VQ * 2);
> >> same
> > 
> > Here me must use bitmap_zero.
> > 
> >>> +    bitmap_set(map, ARM_MAX_VQ, ARM_MAX_VQ);
> >>> +    bitmap_and(map, map, cpu->sve_vq_map, ARM_MAX_VQ * 2);
> >>> +
> >>> +    return bitmap_empty(map, ARM_MAX_VQ * 2);
> >>> +}
> >>> +
> >>> +static void arm_cpu_vq_map_finalize(ARMCPU *cpu)
> >>> +{
> >>> +    Error *err = NULL;
> >>> +    char name[8];
> >>> +    uint32_t vq;
> >>> +    bool value;
> >>> +
> >>> +    /*
> >>> +     * We use the property get accessor because it knows what default
> >>> +     * values to return for uninitialized vector lengths.
> >>> +     */
> >>> +    for (vq = 1; vq <= ARM_MAX_VQ; ++vq) {
> >>> +        sprintf(name, "sve%d", vq * 128);
> >>> +        value = object_property_get_bool(OBJECT(cpu), name, &err);
> >>> +        assert(!err);
> >>> +        if (value) {
> >>> +            arm_cpu_vq_map_set(cpu, vq, ARM_VQ_ON);
> >>> +        } else {
> >>> +            arm_cpu_vq_map_set(cpu, vq, ARM_VQ_OFF);
> >>> +        }
> >>> +    }
> >>> +
> >>> +    assert(arm_cpu_vq_map_is_finalized(cpu));
> >> this can be removed
> > 
> > Yes, it can be today, as the code works fine. The idea was that if
> > somebody came in here and modified this in someway that broke the
> > finalized state, then we'd catch it here before going off and doing
> > weird things. This isn't a hot path, so I'd prefer we keep it, but
> > if QEMU maintainers prefer to limit defensive code, then I can
> > certainly remove it.
> Well I understand this was useful was developing and to check some
> tricky states but some of the asserts correspond to some checks that
> look rather obvious, like this one. But well I let others give their
> opinions and it is not a big deal either. Then we can wonder when one
> needs a trace mesg before asserting ...

OK, this one is probably fine to remove.

> > 
> >>> +}
> >>> +
> >>> +void arm_cpu_sve_finalize(ARMCPU *cpu, Error **errp)
> >>> +{
> >>> +    Error *err = NULL;
> >>> +
> >>> +    if (!cpu->sve_max_vq) {
> >>> +        bitmap_zero(cpu->sve_vq_map, ARM_MAX_VQ * 2);
> >>> +        return;
> >>> +    }
> >>> +
> >>> +    if (cpu->sve_max_vq == ARM_SVE_INIT) {
> >>> +        object_property_set_uint(OBJECT(cpu), ARM_MAX_VQ, "sve-max-vq", &err);
> >>> +        if (err) {
> >>> +            error_propagate(errp, err);
> >>> +            return;
> >>> +        }
> >>> +        assert(cpu->sve_max_vq == ARM_MAX_VQ);
> >> I guess those asserts can be removed now?
> > 
> > I'd prefer to keep it. It's critical that we get this right, so if
> > somebody changes something somewhere in the property set/get code
> > that would break this, then it's best we know right away. Again,
> > though, if there's some reason to limit defensive code, even on the
> > init path, then I can.
> Here I would expect that if the set did not fail, sve-max-vq effectively
> if set to the expected value. But maybe I don't remember the cases where
> sve_max-vq can be set to some other value without reprting an error.

No, you're right, no other value can occur if the 'set' doesn't fail. I
suppose it's 100% safe to remove this assert, but then we'd likely want
to state in a comment that sve_max_vq is now ARM_MAX_VQ, just to make
that clear. Besides catching issues early, asserts are good for
documenting.

> 
> > 
> >>> +        arm_cpu_vq_map_finalize(cpu);
> >> move the arm_cpu_vq_map_finalize out of the if, at the end.
> > 
> > That wouldn't work. In this branch arm_cpu_vq_map_finalize() comes
> > after setting cpu->sve_max_vq. In the else branch it must come first.
> That's right sorry
> > 
> >>> +    } else {
> >>> +        arm_cpu_vq_map_finalize(cpu);
> >>> +        if (!arm_sve_have_max_vq(cpu)) {
> >>> +            cpu->sve_max_vq = arm_cpu_vq_map_next_smaller(cpu, ARM_MAX_VQ + 1);
> >>> +        }
> >>> +    }
> >>> +
> >>> +    assert(cpu->sve_max_vq == arm_cpu_vq_map_next_smaller(cpu, ARM_MAX_VQ));>
> > What happened to my '+ 1' here? I see it's in the patch, but somehow got
> > removed in your quote of the patch? For a second there I thought something
> > was really wrong, since that assert should have fired for every full map.
> Yep sorry for the cut :-(
> > 
> >> same here
> > 
> > Anyway my same argument that we don't want to leave arm_cpu_sve_finalize()
> > without knowing we got this right applies.
> 
> > 
> >>> +}
> >>> +
> >>> +uint32_t arm_cpu_vq_map_next_smaller(ARMCPU *cpu, uint32_t vq)
> >>> +{
> >>> +    uint32_t bitnum;
> >>> +
> >>> +    assert(vq <= ARM_MAX_VQ + 1);
> >>> +    assert(arm_cpu_vq_map_is_finalized(cpu));
> >>> +
> >>> +    bitnum = find_last_bit(cpu->sve_vq_map, vq - 1);
> >> why do you pass ARM_MAX_VQ + 1 and then do vq -1? doesn't
> >> find_last_bit() take the size which is ARM_MAX_VQ in this case?
> > 
> > The size is ARM_MAX_VQ + 1, when you want to also check the bitnum
> > corresponding to ARM_MAX_VQ. The bitnum for a VQ is always VQ - 1.
> > Recall VQ=1, which is 128-bits. It takes the bit position zero.
> > VQ=ARM_MAX_VQ=16 takes the bit position 15. As for the 'vq - 1' here
> > in the find_last_bit() call, that's required because that's what the
> > function says it does. It finds the next smaller VQ. So if you pass
> > in, for example, vq=2, it shouldn't search anything but bit position
> > zero:
> > 
> >  vq=2 => max next smaller is vq=1, bitnum = 0
> >       => search the bitmap of size 1 for the last set bit
> > 
> > Or, if you want to search the whole map, including the maximum
> > possibly VQ, as is done above, then you need to pass ARM_MAX_VQ + 1,
> > since the max next smaller VQ is ARM_MAX_VQ.
> OK I get it now. Maybe renaming the function into something like
> arm_cpu_vq_map_last_bit() would have avoided that. the first assert
> looks strange typically.

I'm not sure "last bit" is descriptive enough. Does it mean last bit
not including the VQ passed in? Or does it include the passed in VQ?
I prefer "next smaller". I'll add a comment above the first assert.

> > 
> >>> +    return bitnum == vq - 1 ? 0 : bitnum + 1;
> >>> +}
> >>> +
> >>>  static void cpu_max_get_sve_max_vq(Object *obj, Visitor *v, const char *name,
> >>>                                     void *opaque, Error **errp)
> >>>  {
> >>> @@ -283,12 +426,203 @@ static void cpu_max_set_sve_max_vq(Object *obj, Visitor *v, const char *name,
> >>>          return;
> >>>      }
> >>>  
> >>> +    /*
> >>> +     * It gets complicated trying to support both sve-max-vq and
> >>> +     * sve<vl-bits> properties together, so we mostly don't. We
> >>> +     * do allow both if sve-max-vq is specified first and only once
> >>> +     * though.
> >>> +     */
> >>> +    if (cpu->sve_max_vq != ARM_SVE_INIT) {
> >>> +        error_setg(errp, "sve<vl-bits> in use or sve-max-vq already "
> >>> +                   "specified");
> >>> +        error_append_hint(errp, "sve-max-vq must come before all "
> >>> +                          "sve<vl-bits> properties and it must only "
> >>> +                          "be specified once.\n");
> >>> +        return;
> >>> +    }
> >>> +
> >>>      cpu->sve_max_vq = value;
> >>>  
> >>>      if (cpu->sve_max_vq == 0 || cpu->sve_max_vq > ARM_MAX_VQ) {
> >>>          error_setg(errp, "unsupported SVE vector length");
> >>>          error_append_hint(errp, "Valid sve-max-vq in range [1-%d]\n",
> >>>                            ARM_MAX_VQ);
> >>> +    } else {
> >>> +        uint32_t vq;
> >>> +
> >>> +        for (vq = 1; vq <= cpu->sve_max_vq; ++vq) {
> >>> +            char name[8];
> >>> +            sprintf(name, "sve%d", vq * 128);
> >>> +            object_property_set_bool(obj, true, name, &err);
> >>> +            if (err) {
> >>> +                error_propagate(errp, err);
> >>> +                return;
> >>> +            }
> >>> +        }
> >>> +    }
> >>> +}
> >>> +
> >>> +static void cpu_arm_get_sve_vq(Object *obj, Visitor *v, const char *name,
> >>> +                               void *opaque, Error **errp)
> >>> +{
> >>> +    ARMCPU *cpu = ARM_CPU(obj);
> >>> +    int vq = atoi(&name[3]) / 128;
> >>> +    arm_vq_state vq_state;
> >>> +    bool value;
> >>> +
> >>> +    vq_state = arm_cpu_vq_map_get(cpu, vq);
> >>> +
> >>> +    if (!cpu->sve_max_vq) {
> >>> +        /* All vector lengths are disabled when SVE is off. */
> >>> +        value = false;
> >>> +    } else if (vq_state == ARM_VQ_ON) {
> >>> +        value = true;
> >>> +    } else if (vq_state == ARM_VQ_OFF) {
> >>> +        value = false;
> >>> +    } else {
> >>> +        /*
> >>> +         * vq is uninitialized. We pick a default here based on the
> >>> +         * the state of sve-max-vq and other sve<vl-bits> properties.
> >>> +         */
> >>> +        if (arm_sve_have_max_vq(cpu)) {
> >>> +            /*
> >>> +             * If we have sve-max-vq, then all remaining uninitialized
> >>> +             * vq's are 'OFF'.
> >>> +             */
> >>> +            value = false;
> >>> +        } else {
> >>> +            switch (cpu->sve_max_vq) {
> >>> +            case ARM_SVE_INIT:
> >>> +            case ARM_VQ_DEFAULT_ON:
> >>> +                value = true;
> >>> +                break;
> >>> +            case ARM_VQ_DEFAULT_OFF:
> >>> +                value = false;
> >>> +                break;
> >>> +            }
> >>> +        }
> >>> +    }
> >>> +
> >>> +    visit_type_bool(v, name, &value, errp);
> >>> +}
> >>> +
> >>> +static void cpu_arm_set_sve_vq(Object *obj, Visitor *v, const char *name,
> >>> +                               void *opaque, Error **errp)
> >>> +{
> >>> +    ARMCPU *cpu = ARM_CPU(obj);
> >>> +    int vq = atoi(&name[3]) / 128;
> >>> +    arm_vq_state vq_state;
> >>> +    Error *err = NULL;
> >>> +    uint32_t max_vq = 0;
> >>> +    bool value;
> >>> +
> >>> +    visit_type_bool(v, name, &value, &err);
> >>> +    if (err) {
> >>> +        error_propagate(errp, err);
> >>> +        return;
> >>> +    }
> >>> +
> >>> +    if (value && !cpu->sve_max_vq) {
> >>> +        error_setg(errp, "cannot enable %s", name);
> >>> +        error_append_hint(errp, "SVE has been disabled with sve=off\n");
> >>> +        return;
> >>> +    } else if (!cpu->sve_max_vq) {
> >>> +        /*
> >>> +         * We don't complain about disabling vector lengths when SVE
> >>> +         * is off, but we don't do anything either.
> >>> +         */
> >>> +        return;
> >>> +    }
> >>> +
> >>> +    if (arm_sve_have_max_vq(cpu)) {
> >>> +        max_vq = cpu->sve_max_vq;
> >>> +    } else {
> >>> +        if (value) {
> >>> +            cpu->sve_max_vq = ARM_VQ_DEFAULT_OFF;
> >>> +        } else if (cpu->sve_max_vq != ARM_VQ_DEFAULT_OFF) {
> >>> +            cpu->sve_max_vq = ARM_VQ_DEFAULT_ON;
> >>> +        }
> >>> +    }
> >>> +
> >>> +    /*
> >>> +     * We need to know the maximum vector length, which may just currently
> >>> +     * be the maximum length, in order to validate the enabling/disabling
> >>> +     * of this vector length. We use the property get accessor in order to
> >>> +     * get the appropriate default value for any uninitialized lengths.
> >>> +     */
> >>> +    if (!max_vq) {
> >>> +        char tmp[8];
> >>> +        bool s;
> >>> +
> >>> +        for (max_vq = ARM_MAX_VQ; max_vq >= 1; --max_vq) {
> >>> +            sprintf(tmp, "sve%d", max_vq * 128);
> >>> +            s = object_property_get_bool(OBJECT(cpu), tmp, &err);
> >>> +            assert(!err);
> >>> +            if (s) {
> >>> +                break;
> >>> +            }
> >>> +        }
> >>> +    }
> >>> +
> >>> +    if (arm_sve_have_max_vq(cpu) && value && vq > cpu->sve_max_vq) {
> >>> +        error_setg(errp, "cannot enable %s", name);
> >>> +        error_append_hint(errp, "vq=%d (%d bits) is larger than the "
> >>> +                          "maximum vector length, sve-max-vq=%d "
> >>> +                          "(%d bits)\n", vq, vq * 128,
> >>> +                          cpu->sve_max_vq, cpu->sve_max_vq * 128);
> >>> +    } else if (arm_sve_have_max_vq(cpu) && !value && vq == cpu->sve_max_vq) {
> >>> +        error_setg(errp, "cannot disable %s", name);
> >>> +        error_append_hint(errp, "The maximum vector length must be "
> >>> +                          "enabled, sve-max-vq=%d (%d bits)\n",
> >>> +                          cpu->sve_max_vq, cpu->sve_max_vq * 128);
> >>> +    } else if (arm_sve_have_max_vq(cpu) && !value && vq < cpu->sve_max_vq &&
> >>> +               is_power_of_2(vq)) {
> >>> +        error_setg(errp, "cannot disable %s", name);
> >>> +        error_append_hint(errp, "vq=%d (%d bits) is required as it is a "
> >>> +                          "power-of-2 length smaller than the maximum, "
> >>> +                          "sve-max-vq=%d (%d bits)\n", vq, vq * 128,
> >>> +                          cpu->sve_max_vq, cpu->sve_max_vq * 128);
> >>> +    } else if (!value && vq < max_vq && is_power_of_2(vq)) {
> >>> +        error_setg(errp, "cannot disable %s", name);
> >>> +        error_append_hint(errp, "Vector length %d-bits is required as it "
> >>> +                          "is a power-of-2 length smaller than another "
> >>> +                          "enabled vector length. Disable all larger vector "
> >>> +                          "lengths first.\n", vq * 128);
> >>> +    } else {
> >> adding return in each if/elsif would allow to avoid this indent.
> > 
> > Yeah, but I didn't really mind the indent :-)
> > 
> >>> +        if (value) {
> >>> +            bool fail = false;
> >>> +            uint32_t s;
> >>> +
> >>> +            /*
> >>> +             * Enabling a vector length automatically enables all
> >>> +             * uninitialized power-of-2 lengths smaller than it, as
> >>> +             * per the architecture.
> >>> +             */
> >> Test we are not attempting to enable a !is_power_of_2
> > 
> > I'm not sure what this means. In this context 'vq' can be a !power-of-2
> > if it wants to be, and there's no way for 's' to be a !power-of-2 because
> > we filter the vq's with is_power_of_2(s).
> Yep got it now
> > 
> >>> +            for (s = 1; s < vq; ++s) {
> >>> +                if (is_power_of_2(s)) {
> >>> +                    vq_state = arm_cpu_vq_map_get(cpu, s);
> >>> +                    if (vq_state == ARM_VQ_UNINITIALIZED) {
> >>> +                        arm_cpu_vq_map_set(cpu, s, ARM_VQ_ON);
> >>> +                    } else if (vq_state == ARM_VQ_OFF) {
> >>> +                        fail = true;
> >>> +                        break;
> >>> +                    }
> >>> +                }
> >>> +            }
> >>> +
> >>> +            if (fail) {
> >>> +                error_setg(errp, "cannot enable %s", name);
> >>> +                error_append_hint(errp, "Vector length %d-bits is disabled "
> >>> +                                  "and is a power-of-2 length smaller than "
> >>> +                                  "%s. All power-of-2 vector lengths smaller "
> >>> +                                  "than the maximum length are required.\n",
> >>> +                                  s * 128, name);
> >>> +            } else {
> >>> +                arm_cpu_vq_map_set(cpu, vq, ARM_VQ_ON);
> >>> +            }
> >>> +        } else {
> >>> +            arm_cpu_vq_map_set(cpu, vq, ARM_VQ_OFF);
> >>> +        }
> >>>      }
> >>>  }
> >>>  
> >>> @@ -318,10 +652,11 @@ static void cpu_arm_set_sve(Object *obj, Visitor *v, const char *name,
> >>>          /*
> >>>           * We handle the -cpu <cpu>,sve=off,sve=on case by reinitializing,
> >>>           * but otherwise we don't do anything as an sve=on could come after
> >>> -         * a sve-max-vq setting.
> >>> +         * a sve-max-vq or sve<vl-bits> setting.
> >>>           */
> >>>          if (!cpu->sve_max_vq) {
> >>> -            cpu->sve_max_vq = ARM_MAX_VQ;
> >>> +            cpu->sve_max_vq = ARM_SVE_INIT;
> >>> +            arm_cpu_vq_map_init(cpu);
> >>>          }
> >>>      } else {
> >>>          cpu->sve_max_vq = 0;
> >>> @@ -336,6 +671,7 @@ static void cpu_arm_set_sve(Object *obj, Visitor *v, const char *name,
> >>>  static void aarch64_max_initfn(Object *obj)
> >>>  {
> >>>      ARMCPU *cpu = ARM_CPU(obj);
> >>> +    uint32_t vq;
> >>>  
> >>>      if (kvm_enabled()) {
> >>>          kvm_arm_set_cpu_features_from_host(cpu);
> >>> @@ -420,11 +756,29 @@ static void aarch64_max_initfn(Object *obj)
> >>>          cpu->dcz_blocksize = 7; /*  512 bytes */
> >>>  #endif
> >>>  
> >>> -        cpu->sve_max_vq = ARM_MAX_VQ;
> >>> +        /*
> >>> +         * sve_max_vq is initially unspecified, but must be initialized to a
> >>> +         * non-zero value (ARM_SVE_INIT) to indicate that this cpu type has
> >>> +         * SVE. It will be finalized in arm_cpu_realizefn().
> >>> +         */
> >>> +        cpu->sve_max_vq = ARM_SVE_INIT;
> >>>          object_property_add(obj, "sve-max-vq", "uint32", cpu_max_get_sve_max_vq,
> >>>                              cpu_max_set_sve_max_vq, NULL, NULL, &error_fatal);
> >>>          object_property_add(obj, "sve", "bool", cpu_arm_get_sve,
> >>>                              cpu_arm_set_sve, NULL, NULL, &error_fatal);
> >>> +
> >>> +        /*
> >>> +         * sve_vq_map uses a special state while setting properties, so
> >>> +         * we initialize it here with its init function and finalize it
> >>> +         * in arm_cpu_realizefn().
> >>> +         */
> >>> +        arm_cpu_vq_map_init(cpu);
> >>> +        for (vq = 1; vq <= ARM_MAX_VQ; ++vq) {
> >>> +            char name[8];
> >>> +            sprintf(name, "sve%d", vq * 128);
> >>> +            object_property_add(obj, name, "bool", cpu_arm_get_sve_vq,
> >>> +                                cpu_arm_set_sve_vq, NULL, NULL, &error_fatal);
> >>> +        }
> >>>      }
> >>>  }
> >>>  
> >>> diff --git a/target/arm/helper.c b/target/arm/helper.c
> >>> index f500ccb6d31b..b7b719dba57f 100644
> >>> --- a/target/arm/helper.c
> >>> +++ b/target/arm/helper.c
> >>> @@ -5324,7 +5324,16 @@ static void zcr_write(CPUARMState *env, const ARMCPRegInfo *ri,
> >>>  
> >>>      /* Bits other than [3:0] are RAZ/WI.  */
> >>>      QEMU_BUILD_BUG_ON(ARM_MAX_VQ > 16);
> >>> -    raw_write(env, ri, value & 0xf);
> >>> +    value &= 0xf;
> >>> +
> >>> +    if (value) {> +        /* get next vq that is smaller than or equal to value's vq */
> >>> +        uint32_t vq = value + 1;
> >> ditto
> > 
> > What does this 'ditto' map to? Oh, probably the discussion about vq's
> > getting +1's and -1's. In this context 'value' is the ZCR_ELx
> > representation of a VQ, which is VQ - 1, just like in the bitmap. To
> > translate that to a VQ we need to do a '+ 1'.
> 
> Until here I follow. By the way in which doc did you find the
> description of ZCR_ELx?

It definitely wasn't "ARM Architecture Reference Manual Supplement The
Scalable Vector Extension (SVE), for ARMv8-A". I'm not being sarcastic!
That manual *should* describe the register, and it *does* list the
register in section 6.1.2 SVE System registers, but there is *no* register
description in the document...

My knowledge of ZCR_ELx comes from reading kernel and qemu code. And
actually seeing it work in practice on real hardware.

> 
>  As I wrote in the comment
> > here, we want to find the next smaller or equal VQ here, because this is
> > the input VQ from the guest which may itself be a valid VQ, but if it's
> > not, we need to step down to the next valid one. So we ask
> > arm_cpu_vq_map_next_smaller() for 'vq + 1' in order to ensure 'vq' will
> > be in the searched range. After we get a valid vq from
> > arm_cpu_vq_map_next_smaller(), which must be at least '1', since that's
> > required by the architecture and thus will always be present, we need to
> > translate it back to the ZCR_ELx representation with the subtraction.
> > Phew... We programmers do love adding and subtracting by one, right :-)
> Got it now. but well I scratched my head
> > 
> >>> +        vq = arm_cpu_vq_map_next_smaller(cpu, vq + 1);
> >>> +        value = vq - 1;
> >>> +    }
> >>> +
> >>> +    raw_write(env, ri, value);
> >>>  
> >>>      /*
> >>>       * Because we arrived here, we know both FP and SVE are enabled;
> >>> diff --git a/target/arm/monitor.c b/target/arm/monitor.c
> >>> index 157c487a1551..1e213906fd8f 100644
> >>> --- a/target/arm/monitor.c
> >>> +++ b/target/arm/monitor.c
> >>> @@ -89,8 +89,24 @@ GICCapabilityList *qmp_query_gic_capabilities(Error **errp)
> >>>      return head;
> >>>  }
> >>>  
> >>> +QEMU_BUILD_BUG_ON(ARM_MAX_VQ > 16);
> >>> +
> >>> +/*
> >>> + * These are cpu model features we want to advertise. The order here
> >>> + * matters as this is the order in which qmp_query_cpu_model_expansion
> >>> + * will attempt to set them. If there are dependencies between features,
> >>> + * as there are with the sve<vl-bits> features, then the order that
> >>> + * considers those dependencies must be used.
> >>> + *
> >>> + * The sve<vl-bits> features need to be in reverse order in order to
> >>> + * enable/disable the largest vector lengths first, ensuring all
> >>> + * power-of-2 vector lengths smaller can also be enabled/disabled.
> >>> + */
> >>>  static const char *cpu_model_advertised_features[] = {
> >>>      "aarch64", "pmu", "sve",
> >>> +    "sve2048", "sve1920", "sve1792", "sve1664", "sve1536", "sve1408",
> >>> +    "sve1280", "sve1152", "sve1024", "sve896", "sve768", "sve640",
> >>> +    "sve512", "sve384", "sve256", "sve128",
> >>>      NULL
> >>>  };
> >>>  
> >>> diff --git a/tests/arm-cpu-features.c b/tests/arm-cpu-features.c
> >> Please move all the tests in a separate patch.
> > 
> > I'd prefer not to. I like keeping the test cases that test the new code in
> > the same patch. It self documents what the test cases map to what code and
> > allows immediate testing of the patch when bisecting. Also I'm not really
> > sure how it makes review worse, as another patch would look the same, just
> > in a different email. 
> A reviewier might not be familiar with both standard code and test and
> feel reluctant to invest reading boths, hence reducing the chances to
> collect R-b's. But that's my humble opinion.

That's a valid point. But with you reviewing, I'm not worried :-)

Thanks,
drew

> > 
> >> Each day has enough trouble of its own ;-) sweat...
> > 
> > I really appreciate the review! I realize it's generating sweat,
> > especially with the European heat wave! You don't have to finish
> > a patch before sending comments. I'm fine with multiple replies to
> > the same patch if you want to break your review up into smaller
> > bites.
> 
> Thanks. Yes breathing times are needed due to the heat and digestion of
> the code ;-)
> 
> Thanks
> 
> Eric
> > 
> > Thanks,
> > drew
> > 
> >>
> >> Thanks
> >>
> >> Eric
> >>> index 509e458e9c2f..a4bf6aec00df 100644
> >>> --- a/tests/arm-cpu-features.c
> >>> +++ b/tests/arm-cpu-features.c
> >>> @@ -13,6 +13,18 @@
> >>>  #include "qapi/qmp/qdict.h"
> >>>  #include "qapi/qmp/qjson.h"
> >>>  
> >>> +#if __SIZEOF_LONG__ == 8
> >>> +#define BIT(n) (1UL << (n))
> >>> +#else
> >>> +#define BIT(n) (1ULL << (n))
> >>> +#endif
> >>> +
> >>> +/*
> >>> + * We expect the SVE max-vq to be 16. Also it must be <= 64
> >>> + * for our test code, otherwise 'vls' can't just be a uint64_t.
> >>> + */
> >>> +#define SVE_MAX_VQ 16
> >>> +
> >>>  #define MACHINE    "-machine virt,gic-version=max "
> >>>  #define QUERY_HEAD "{ 'execute': 'query-cpu-model-expansion', " \
> >>>                       "'arguments': { 'type': 'full', "
> >>> @@ -137,6 +149,201 @@ static void assert_bad_props(QTestState *qts, const char *cpu_type)
> >>>      qobject_unref(resp);
> >>>  }
> >>>  
> >>> +static void resp_get_sve_vls(QDict *resp, uint64_t *vls, uint32_t *max_vq)
> >>> +{
> >>> +    const QDictEntry *e;
> >>> +    QDict *qdict;
> >>> +    int n = 0;
> >>> +
> >>> +    *vls = 0;
> >>> +
> >>> +    qdict = resp_get_props(resp);
> >>> +
> >>> +    for (e = qdict_first(qdict); e; e = qdict_next(qdict, e)) {
> >>> +        if (strlen(e->key) > 3 && !strncmp(e->key, "sve", 3) &&
> >>> +            g_ascii_isdigit(e->key[3])) {
> >>> +            char *endptr;
> >>> +            int bits;
> >>> +
> >>> +            bits = g_ascii_strtoll(&e->key[3], &endptr, 10);
> >>> +            if (!bits || *endptr != '\0') {
> >>> +                continue;
> >>> +            }
> >>> +
> >>> +            if (qdict_get_bool(qdict, e->key)) {
> >>> +                *vls |= BIT((bits / 128) - 1);
> >>> +            }
> >>> +            ++n;
> >>> +        }
> >>> +    }
> >>> +
> >>> +    g_assert(n == SVE_MAX_VQ);
> >>> +
> >>> +    *max_vq = !*vls ? 0 : 64 - __builtin_clzll(*vls);
> >>> +}
> >>> +
> >>> +static uint64_t sve_get_vls(QTestState *qts, const char *cpu_type,
> >>> +                            const char *fmt, ...)
> >>> +{
> >>> +    QDict *resp;
> >>> +    uint64_t vls;
> >>> +    uint32_t max_vq;
> >>> +
> >>> +    if (fmt) {
> >>> +        QDict *args;
> >>> +        va_list ap;
> >>> +
> >>> +        va_start(ap, fmt);
> >>> +        args = qdict_from_vjsonf_nofail(fmt, ap);
> >>> +        va_end(ap);
> >>> +
> >>> +        resp = qtest_qmp(qts, QUERY_HEAD "'model': { 'name': %s, "
> >>> +                                                    "'props': %p }"
> >>> +                              QUERY_TAIL, cpu_type, args);
> >>> +    } else {
> >>> +        resp = do_query_no_props(qts, cpu_type);
> >>> +    }
> >>> +
> >>> +    g_assert(resp);
> >>> +    resp_get_sve_vls(resp, &vls, &max_vq);
> >>> +    qobject_unref(resp);
> >>> +
> >>> +    return vls;
> >>> +}
> >>> +
> >>> +#define assert_sve_vls(qts, cpu_type, expected_vls, fmt, ...) \
> >>> +    g_assert(sve_get_vls(qts, cpu_type, fmt, ##__VA_ARGS__) == expected_vls)
> >>> +
> >>> +static void sve_tests_default(QTestState *qts, const char *cpu_type)
> >>> +{
> >>> +    /*
> >>> +     * With no sve-max-vq or sve<vl-bits> properties on the command line
> >>> +     * the default is to have all vector lengths enabled.
> >>> +     */
> >>> +    assert_sve_vls(qts, cpu_type, BIT(SVE_MAX_VQ) - 1, NULL);
> >>> +
> >>> +    /*
> >>> +     * -------------------------------------------------------------------
> >>> +     *               power-of-2(vq)   all-power-            can      can
> >>> +     *                                of-2(< vq)          enable   disable
> >>> +     * -------------------------------------------------------------------
> >>> +     * vq < max_vq      no            MUST*                yes      yes
> >>> +     * vq < max_vq      yes           MUST*                yes      no
> >>> +     * -------------------------------------------------------------------
> >>> +     * vq == max_vq     n/a           MUST*                yes**    yes**
> >>> +     * -------------------------------------------------------------------
> >>> +     * vq > max_vq      n/a           no                   no       yes
> >>> +     * vq > max_vq      n/a           yes                  yes      yes
> >>> +     * -------------------------------------------------------------------
> >>> +     *
> >>> +     * [*] "MUST" means this requirement must already be satisfied,
> >>> +     *     otherwise 'max_vq' couldn't itself be enabled.
> >>> +     *
> >>> +     * [**] Not testable with the QMP interface, only with the command line.
> >>> +     */
> >>> +
> >>> +    /* max_vq := 8 */
> >>> +    assert_sve_vls(qts, cpu_type, 0x8b, "{ 'sve1024': true }");
> >>> +
> >>> +    /* max_vq := 8, vq < max_vq, !power-of-2(vq) */
> >>> +    assert_sve_vls(qts, cpu_type, 0x8f,
> >>> +                   "{ 'sve1024': true, 'sve384': true }");
> >>> +    assert_sve_vls(qts, cpu_type, 0x8b,
> >>> +                   "{ 'sve1024': true, 'sve384': false }");
> >>> +
> >>> +    /* max_vq := 8, vq < max_vq, power-of-2(vq) */
> >>> +    assert_sve_vls(qts, cpu_type, 0x8b,
> >>> +                   "{ 'sve1024': true, 'sve256': true }");
> >>> +    assert_error(qts, cpu_type, "cannot disable sve256",
> >>> +                 "{ 'sve1024': true, 'sve256': false }");
> >>> +
> >>> +    /*
> >>> +     * max_vq := 3, vq > max_vq, !all-power-of-2(< vq)
> >>> +     *
> >>> +     * If given sve384=on,sve512=off,sve640=on the command line error would be
> >>> +     * "cannot enable sve640", but QMP visits the vector lengths in reverse
> >>> +     * order, so we get "cannot disable sve512" instead. The command line would
> >>> +     * also give that error if given sve384=on,sve640=on,sve512=off, so this is
> >>> +     * all fine. The important thing is that we get an error.
> >>> +     */
> >>> +    assert_error(qts, cpu_type, "cannot disable sve512",
> >>> +                 "{ 'sve384': true, 'sve512': false, 'sve640': true }");
> >>> +
> >>> +    /*
> >>> +     * We can disable power-of-2 vector lengths when all larger lengths
> >>> +     * are also disabled. The shorter, sve384=on,sve512=off,sve640=off
> >>> +     * works on the command line, but QMP doesn't know that all the
> >>> +     * vector lengths larger than 384-bits will be disabled until it
> >>> +     * sees the enabling of sve384, which comes near the end since it
> >>> +     * visits the lengths in reverse order. So we just have to explicitly
> >>> +     * disable them all.
> >>> +     */
> >>> +    assert_sve_vls(qts, cpu_type, 0x7,
> >>> +                   "{ 'sve384': true, 'sve512': false, 'sve640': false, "
> >>> +                   "  'sve768': false, 'sve896': false, 'sve1024': false, "
> >>> +                   "  'sve1152': false, 'sve1280': false, 'sve1408': false, "
> >>> +                   "  'sve1536': false, 'sve1664': false, 'sve1792': false, "
> >>> +                   "  'sve1920': false, 'sve2048': false }");
> >>> +
> >>> +    /* max_vq := 3, vq > max_vq, all-power-of-2(< vq) */
> >>> +    assert_sve_vls(qts, cpu_type, 0x1f,
> >>> +                   "{ 'sve384': true, 'sve512': true, 'sve640': true }");
> >>> +    assert_sve_vls(qts, cpu_type, 0xf,
> >>> +                   "{ 'sve384': true, 'sve512': true, 'sve640': false }");
> >>> +}
> >>> +
> >>> +static void sve_tests_sve_max_vq_8(const void *data)
> >>> +{
> >>> +    QTestState *qts;
> >>> +
> >>> +    qts = qtest_init(MACHINE "-cpu max,sve-max-vq=8");
> >>> +
> >>> +    assert_sve_vls(qts, "max", BIT(8) - 1, NULL);
> >>> +
> >>> +    /*
> >>> +     * Disabling the max-vq set by sve-max-vq is not allowed, but
> >>> +     * of course enabling it is OK.
> >>> +     */
> >>> +    assert_error(qts, "max", "cannot disable sve1024", "{ 'sve1024': false }");
> >>> +    assert_sve_vls(qts, "max", 0xff, "{ 'sve1024': true }");
> >>> +
> >>> +    /*
> >>> +     * Enabling anything larger than max-vq set by sve-max-vq is not
> >>> +     * allowed, but of course disabling everything larger is OK.
> >>> +     */
> >>> +    assert_error(qts, "max", "cannot enable sve1152", "{ 'sve1152': true }");
> >>> +    assert_sve_vls(qts, "max", 0xff, "{ 'sve1152': false }");
> >>> +
> >>> +    /*
> >>> +     * We can disable non power-of-2 lengths smaller than the max-vq
> >>> +     * set by sve-max-vq, but not power-of-2 lengths.
> >>> +     */
> >>> +    assert_sve_vls(qts, "max", 0xfb, "{ 'sve384': false }");
> >>> +    assert_error(qts, "max", "cannot disable sve256", "{ 'sve256': false }");
> >>> +
> >>> +    qtest_quit(qts);
> >>> +}
> >>> +
> >>> +static void sve_tests_sve_off(const void *data)
> >>> +{
> >>> +    QTestState *qts;
> >>> +
> >>> +    qts = qtest_init(MACHINE "-cpu max,sve=off");
> >>> +
> >>> +    /*
> >>> +     * SVE is off, so the map should be empty.
> >>> +     */
> >>> +    assert_sve_vls(qts, "max", 0, NULL);
> >>> +
> >>> +    /*
> >>> +     * We can't turn anything on, but off is OK.
> >>> +     */
> >>> +    assert_error(qts, "max", "cannot enable sve128", "{ 'sve128': true }");
> >>> +    assert_sve_vls(qts, "max", 0, "{ 'sve128': false }");
> >>> +
> >>> +    qtest_quit(qts);
> >>> +}
> >>> +
> >>>  static void test_query_cpu_model_expansion(const void *data)
> >>>  {
> >>>      QTestState *qts;
> >>> @@ -159,9 +366,12 @@ static void test_query_cpu_model_expansion(const void *data)
> >>>      if (g_str_equal(qtest_get_arch(), "aarch64")) {
> >>>          assert_has_feature(qts, "max", "aarch64");
> >>>          assert_has_feature(qts, "max", "sve");
> >>> +        assert_has_feature(qts, "max", "sve128");
> >>>          assert_has_feature(qts, "cortex-a57", "pmu");
> >>>          assert_has_feature(qts, "cortex-a57", "aarch64");
> >>>  
> >>> +        sve_tests_default(qts, "max");
> >>> +
> >>>          /* Test that features that depend on KVM generate errors without. */
> >>>          assert_error(qts, "max",
> >>>                       "'aarch64' feature cannot be disabled "
> >>> @@ -213,6 +423,13 @@ int main(int argc, char **argv)
> >>>      qtest_add_data_func("/arm/query-cpu-model-expansion",
> >>>                          NULL, test_query_cpu_model_expansion);
> >>>  
> >>> +    if (g_str_equal(qtest_get_arch(), "aarch64")) {
> >>> +        qtest_add_data_func("/arm/max/query-cpu-model-expansion/sve-max-vq-8",
> >>> +                            NULL, sve_tests_sve_max_vq_8);
> >>> +        qtest_add_data_func("/arm/max/query-cpu-model-expansion/sve-off",
> >>> +                            NULL, sve_tests_sve_off);
> >>> +    }
> >>> +
> >>>      if (kvm_available) {
> >>>          qtest_add_data_func("/arm/kvm/query-cpu-model-expansion",
> >>>                              NULL, test_query_cpu_model_expansion_kvm);
> >>>
> >>
>
Andrew Jones June 27, 2019, 11:47 a.m. UTC | #12
On Thu, Jun 27, 2019 at 01:00:27PM +0200, Auger Eric wrote:
> Hi,
> 
> On 6/27/19 12:46 PM, Andrew Jones wrote:
> > On Wed, Jun 26, 2019 at 06:56:54PM +0200, Auger Eric wrote:
> >>> diff --git a/target/arm/helper.c b/target/arm/helper.c
> >>> index f500ccb6d31b..b7b719dba57f 100644
> >>> --- a/target/arm/helper.c
> >>> +++ b/target/arm/helper.c
> >>> @@ -5324,7 +5324,16 @@ static void zcr_write(CPUARMState *env, const ARMCPRegInfo *ri,
> >>>  
> >>>      /* Bits other than [3:0] are RAZ/WI.  */
> >>>      QEMU_BUILD_BUG_ON(ARM_MAX_VQ > 16);
> >>> -    raw_write(env, ri, value & 0xf);
> >>> +    value &= 0xf;
> >>> +
> >>> +    if (value) {
> >>> +        /* get next vq that is smaller than or equal to value's vq */
> >>> +        uint32_t vq = value + 1;
> >>> +        vq = arm_cpu_vq_map_next_smaller(cpu, vq + 1);
> >>> +        value = vq - 1;
> >> spec says:
> >>
> >> "if an unsupported vector length is requested in ZCR_ELx, the
> >> implementation is required to select the largest
> >> supported vector length that is less than the requested length. This
> >> does not alter the value of ZCR_ELx.LEN.
> >> "
> >>
> >> So I understand the value written in the reg should not be unmodified.
> >>
> > 
> > Sorry, I can't parse what you're trying to tell me here. Here we have
> > to write 'value', because that's what the guest is trying to do. As the
> > spec says in your quote, we have to pick the length the guest wants, or
> > the next smaller valid one, so that's what the code above does. So are
> > you just stating that you agree with this hunk of the code?
> What we are writing into the reg is arm_cpu_vq_map_next_smaller(cpu, vq
> + 1) -1. Maybe I misunderstand the whole wording but I would have
> expected the original unmodified value to be written in the reg instead?

Hmm... So maybe we need more changes to the emulation in order for it to
have an acting value and a register value? Maybe Richard knows what we
should do here.

Thanks,
drew
Dave Martin June 27, 2019, 3:16 p.m. UTC | #13
On Thu, Jun 27, 2019 at 12:47:01PM +0100, Andrew Jones wrote:
> On Thu, Jun 27, 2019 at 01:00:27PM +0200, Auger Eric wrote:
> > Hi,
> > 
> > On 6/27/19 12:46 PM, Andrew Jones wrote:
> > > On Wed, Jun 26, 2019 at 06:56:54PM +0200, Auger Eric wrote:
> > >>> diff --git a/target/arm/helper.c b/target/arm/helper.c
> > >>> index f500ccb6d31b..b7b719dba57f 100644
> > >>> --- a/target/arm/helper.c
> > >>> +++ b/target/arm/helper.c
> > >>> @@ -5324,7 +5324,16 @@ static void zcr_write(CPUARMState *env, const ARMCPRegInfo *ri,
> > >>>  
> > >>>      /* Bits other than [3:0] are RAZ/WI.  */
> > >>>      QEMU_BUILD_BUG_ON(ARM_MAX_VQ > 16);
> > >>> -    raw_write(env, ri, value & 0xf);
> > >>> +    value &= 0xf;
> > >>> +
> > >>> +    if (value) {
> > >>> +        /* get next vq that is smaller than or equal to value's vq */
> > >>> +        uint32_t vq = value + 1;
> > >>> +        vq = arm_cpu_vq_map_next_smaller(cpu, vq + 1);
> > >>> +        value = vq - 1;
> > >> spec says:
> > >>
> > >> "if an unsupported vector length is requested in ZCR_ELx, the
> > >> implementation is required to select the largest
> > >> supported vector length that is less than the requested length. This
> > >> does not alter the value of ZCR_ELx.LEN.
> > >> "
> > >>
> > >> So I understand the value written in the reg should not be unmodified.
> > >>
> > > 
> > > Sorry, I can't parse what you're trying to tell me here. Here we have
> > > to write 'value', because that's what the guest is trying to do. As the
> > > spec says in your quote, we have to pick the length the guest wants, or
> > > the next smaller valid one, so that's what the code above does. So are
> > > you just stating that you agree with this hunk of the code?
> > What we are writing into the reg is arm_cpu_vq_map_next_smaller(cpu, vq
> > + 1) -1. Maybe I misunderstand the whole wording but I would have
> > expected the original unmodified value to be written in the reg instead?
> 
> Hmm... So maybe we need more changes to the emulation in order for it to
> have an acting value and a register value? Maybe Richard knows what we
> should do here.

The "effective" value of an individual ZCR_ELx.LEN field is a bit
nonsensical: the effective vector length comes from the minimum
LEN value at any relevant EL (depending on which ELs are implemented,
which security state you're in, VHE controls etc.).

This is tedious to compute, so I'd expect you want to cache it,
recomputing it only when a ZCR_ELx.LEN changes or when you switch to a
different EL.


The architecture says:

"For all purposes other than returning the result of a direct read of
ZCR_EL1 then this field behaves as if it is set to the minimum of the
stored value and the constrained length inherited from more privileged
Exception levels in the current Security state, rounded down to the
nearest implemented vector length."

I think the behaviour of a direct read is implied: the LEN bits yielded
by an MRS should contain exactly what was last written to them via MSR.

Cheers
---Dave
Richard Henderson June 27, 2019, 4:19 p.m. UTC | #14
On 6/27/19 5:16 PM, Dave Martin wrote:
> The architecture says:
> 
> "For all purposes other than returning the result of a direct read of
> ZCR_EL1 then this field behaves as if it is set to the minimum of the
> stored value and the constrained length inherited from more privileged
> Exception levels in the current Security state, rounded down to the
> nearest implemented vector length."
> 
> I think the behaviour of a direct read is implied: the LEN bits yielded
> by an MRS should contain exactly what was last written to them via MSR.

I agree.

Moreover, the value written to ZCR_ELx.LEN should not be directly adjusted
because the effective value also depends on ZCR_EL(x+1).LEN, and if the
higher-level EL register changes, the lower-level EL must see the effect.

The function that should be modified instead is sve_zcr_len_for_el().


r~
Richard Henderson June 27, 2019, 4:49 p.m. UTC | #15
On 6/21/19 6:34 PM, Andrew Jones wrote:
> +    /*
> +     * In sve_vq_map each set bit is a supported vector length of
> +     * (bit-number + 1) * 16 bytes, i.e. each bit number + 1 is the vector
> +     * length in quadwords. We need a map size twice the maximum
> +     * quadword length though because we use two bits for each vector
> +     * length in order to track three states: uninitialized, off, and on.
> +     */
> +    DECLARE_BITMAP(sve_vq_map, ARM_MAX_VQ * 2);

I don't see that having one continuous bitmap is more convenient than two.
Indeed, there appear to be several places that would be clearer with two.

> +static arm_vq_state arm_cpu_vq_map_get(ARMCPU *cpu, int vq)
> +{
> +    assert(vq <= ARM_MAX_VQ);
> +
> +    return test_bit(vq - 1, cpu->sve_vq_map) |
> +           test_bit(vq - 1 + ARM_MAX_VQ, cpu->sve_vq_map) << 1;
> +}

Neither easier nor more complex w/ one or two bitmaps.

> +static void arm_cpu_vq_map_init(ARMCPU *cpu)
> +{
> +    bitmap_zero(cpu->sve_vq_map, ARM_MAX_VQ * 2);
> +    bitmap_set(cpu->sve_vq_map, ARM_MAX_VQ, ARM_MAX_VQ);
> +}

Clearer with two bitmaps.

	bitmap_zero(cpu->sve_vq_map, ARM_MAX_VQ);
	bitmap_set(cpu->sve_vq_uninit_map, 0, ARM_MAX_VQ);

> +static bool arm_cpu_vq_map_is_finalized(ARMCPU *cpu)
> +{
> +    DECLARE_BITMAP(map, ARM_MAX_VQ * 2);
> +
> +    bitmap_zero(map, ARM_MAX_VQ * 2);
> +    bitmap_set(map, ARM_MAX_VQ, ARM_MAX_VQ);
> +    bitmap_and(map, map, cpu->sve_vq_map, ARM_MAX_VQ * 2);
> +
> +    return bitmap_empty(map, ARM_MAX_VQ * 2);
> +}

Definitely clearer w/ 2 bitmaps,

	return bitmap_empty(cpu->sve_vq_uninit_map);


As for how sve-max-vq=Y and sveX={on,off} interoperate...  I wonder if we can
just remove cpu->sve_max_vq.  That might simplify your code a bit.

What if sve-max-vq=Y "expands" to

	for (X = 1; X <= Y; X++) { sve(X*128)=on }

Then you've got a reasonable in-order definition of how those two sets of
switches interoperate.

The uses of cpu->sve_max_vq within cpu.c and cpu64.c are all initialization
stuff that you're replacing.

The use within sve_zcr_len_for_el can be replaced by AVR_MAX_VQ.  Your "select
supported vector size not larger than" code goes at the end of that function,
such that we select a supported maximum no larger than the raw .LEN values.

The use within aarch64_sve_narrow_vq should in fact assert that the given vq is
set within cpu->sve_vq_map.


r~
Andrew Jones June 28, 2019, 7:27 a.m. UTC | #16
On Thu, Jun 27, 2019 at 06:49:02PM +0200, Richard Henderson wrote:
> On 6/21/19 6:34 PM, Andrew Jones wrote:
> > +    /*
> > +     * In sve_vq_map each set bit is a supported vector length of
> > +     * (bit-number + 1) * 16 bytes, i.e. each bit number + 1 is the vector
> > +     * length in quadwords. We need a map size twice the maximum
> > +     * quadword length though because we use two bits for each vector
> > +     * length in order to track three states: uninitialized, off, and on.
> > +     */
> > +    DECLARE_BITMAP(sve_vq_map, ARM_MAX_VQ * 2);
> 
> I don't see that having one continuous bitmap is more convenient than two.
> Indeed, there appear to be several places that would be clearer with two.
> 
> > +static arm_vq_state arm_cpu_vq_map_get(ARMCPU *cpu, int vq)
> > +{
> > +    assert(vq <= ARM_MAX_VQ);
> > +
> > +    return test_bit(vq - 1, cpu->sve_vq_map) |
> > +           test_bit(vq - 1 + ARM_MAX_VQ, cpu->sve_vq_map) << 1;
> > +}
> 
> Neither easier nor more complex w/ one or two bitmaps.
> 
> > +static void arm_cpu_vq_map_init(ARMCPU *cpu)
> > +{
> > +    bitmap_zero(cpu->sve_vq_map, ARM_MAX_VQ * 2);
> > +    bitmap_set(cpu->sve_vq_map, ARM_MAX_VQ, ARM_MAX_VQ);
> > +}
> 
> Clearer with two bitmaps.
> 
> 	bitmap_zero(cpu->sve_vq_map, ARM_MAX_VQ);
> 	bitmap_set(cpu->sve_vq_uninit_map, 0, ARM_MAX_VQ);
> 
> > +static bool arm_cpu_vq_map_is_finalized(ARMCPU *cpu)
> > +{
> > +    DECLARE_BITMAP(map, ARM_MAX_VQ * 2);
> > +
> > +    bitmap_zero(map, ARM_MAX_VQ * 2);
> > +    bitmap_set(map, ARM_MAX_VQ, ARM_MAX_VQ);
> > +    bitmap_and(map, map, cpu->sve_vq_map, ARM_MAX_VQ * 2);
> > +
> > +    return bitmap_empty(map, ARM_MAX_VQ * 2);
> > +}
> 
> Definitely clearer w/ 2 bitmaps,
> 
> 	return bitmap_empty(cpu->sve_vq_uninit_map);

I guess I don't have a strong opinion on one or two bitmaps. I'm not a big
fan of adding temporary variables to data structures (even if the same
amount of storage is being allocated a different way), but I can change
this for v3.

> 
> 
> As for how sve-max-vq=Y and sveX={on,off} interoperate...  I wonder if we can
> just remove cpu->sve_max_vq.  That might simplify your code a bit.
> 
> What if sve-max-vq=Y "expands" to
> 
> 	for (X = 1; X <= Y; X++) { sve(X*128)=on }
> 
> Then you've got a reasonable in-order definition of how those two sets of
> switches interoperate.
> 
> The uses of cpu->sve_max_vq within cpu.c and cpu64.c are all initialization
> stuff that you're replacing.

Hmm, I can look at removing cpu->sve_max_vq, by keeping the property and
expanding it, as you suggest. However I still need a few initialization
states to track what the default vq value should be, so I'm not sure
how I'd implement that without it or some other cpu state, which would
be another temporary cpu state member. Also, while it's true we can always
get the max vq with next-smaller(ARM_MAX_VQ + 1), having it cached in
cpu->sve_max_vq is convenient. That said, I think we'd rather keep it.

> 
> The use within sve_zcr_len_for_el can be replaced by AVR_MAX_VQ.  Your "select
> supported vector size not larger than" code goes at the end of that function,
> such that we select a supported maximum no larger than the raw .LEN values.
> 
> The use within aarch64_sve_narrow_vq should in fact assert that the given vq is
> set within cpu->sve_vq_map.

Yeah, I'm glad Eric pointed out that I had a bug there in the emulation.
I'll get that fixed for v3.

Thanks,
drew
Andrew Jones June 28, 2019, 8:31 a.m. UTC | #17
On Fri, Jun 28, 2019 at 09:27:39AM +0200, Andrew Jones wrote:
> On Thu, Jun 27, 2019 at 06:49:02PM +0200, Richard Henderson wrote:
> > On 6/21/19 6:34 PM, Andrew Jones wrote:
> > > +    /*
> > > +     * In sve_vq_map each set bit is a supported vector length of
> > > +     * (bit-number + 1) * 16 bytes, i.e. each bit number + 1 is the vector
> > > +     * length in quadwords. We need a map size twice the maximum
> > > +     * quadword length though because we use two bits for each vector
> > > +     * length in order to track three states: uninitialized, off, and on.
> > > +     */
> > > +    DECLARE_BITMAP(sve_vq_map, ARM_MAX_VQ * 2);
> > 
> > I don't see that having one continuous bitmap is more convenient than two.
> > Indeed, there appear to be several places that would be clearer with two.
> > 
> > > +static arm_vq_state arm_cpu_vq_map_get(ARMCPU *cpu, int vq)
> > > +{
> > > +    assert(vq <= ARM_MAX_VQ);
> > > +
> > > +    return test_bit(vq - 1, cpu->sve_vq_map) |
> > > +           test_bit(vq - 1 + ARM_MAX_VQ, cpu->sve_vq_map) << 1;
> > > +}
> > 
> > Neither easier nor more complex w/ one or two bitmaps.
> > 
> > > +static void arm_cpu_vq_map_init(ARMCPU *cpu)
> > > +{
> > > +    bitmap_zero(cpu->sve_vq_map, ARM_MAX_VQ * 2);
> > > +    bitmap_set(cpu->sve_vq_map, ARM_MAX_VQ, ARM_MAX_VQ);
> > > +}
> > 
> > Clearer with two bitmaps.
> > 
> > 	bitmap_zero(cpu->sve_vq_map, ARM_MAX_VQ);
> > 	bitmap_set(cpu->sve_vq_uninit_map, 0, ARM_MAX_VQ);
> > 
> > > +static bool arm_cpu_vq_map_is_finalized(ARMCPU *cpu)
> > > +{
> > > +    DECLARE_BITMAP(map, ARM_MAX_VQ * 2);
> > > +
> > > +    bitmap_zero(map, ARM_MAX_VQ * 2);
> > > +    bitmap_set(map, ARM_MAX_VQ, ARM_MAX_VQ);
> > > +    bitmap_and(map, map, cpu->sve_vq_map, ARM_MAX_VQ * 2);
> > > +
> > > +    return bitmap_empty(map, ARM_MAX_VQ * 2);
> > > +}
> > 
> > Definitely clearer w/ 2 bitmaps,
> > 
> > 	return bitmap_empty(cpu->sve_vq_uninit_map);
> 
> I guess I don't have a strong opinion on one or two bitmaps. I'm not a big
> fan of adding temporary variables to data structures (even if the same
> amount of storage is being allocated a different way), but I can change
> this for v3.

On second thought, since in this case there is storage savings (one less
long), then I'd rather we keep it a single bitmap. Maybe I can just add
some comments to these bitmap operations?

Thanks,
drew
Richard Henderson June 29, 2019, 12:10 a.m. UTC | #18
On 6/28/19 9:27 AM, Andrew Jones wrote:
> Also, while it's true we can always
> get the max vq with next-smaller(ARM_MAX_VQ + 1), having it cached in
> cpu->sve_max_vq is convenient. That said, I think we'd rather keep it.

When is it convenient, and for what?

Certainly the only thing that we check after boot is the largest enabled vq not
larger than x.  And for that I don't see that sve_max_vq is relevant at all.

Oh, something that we should also think about is -cpu foo, where foo is one of
the Fujitsu thingumies.  We should be able to default sve_vq_map to that which
a real bit of hw actually supports.  I, for one, welcome our typedef long
overlords.  ;-)


r~
Andrew Jones July 17, 2019, 8:13 a.m. UTC | #19
On Sat, Jun 29, 2019 at 02:10:28AM +0200, Richard Henderson wrote:
> On 6/28/19 9:27 AM, Andrew Jones wrote:
> > Also, while it's true we can always
> > get the max vq with next-smaller(ARM_MAX_VQ + 1), having it cached in
> > cpu->sve_max_vq is convenient. That said, I think we'd rather keep it.
> 
> When is it convenient, and for what?

It's used for the upper boundary in several loops in target/arm/kvm64.c

> 
> Certainly the only thing that we check after boot is the largest enabled vq not
> larger than x.  And for that I don't see that sve_max_vq is relevant at all.
> 
> Oh, something that we should also think about is -cpu foo, where foo is one of
> the Fujitsu thingumies.  We should be able to default sve_vq_map to that which
> a real bit of hw actually supports.  I, for one, welcome our typedef long
> overlords.  ;-)

I don't think we need to implement an A64FX cpu model with this series,
although that's a nice idea for people that focus on TCG to do as a
follow-up series. This series fully enables the guest to use SVE on the
A64FX when KVM is enabled, without the need to special case it in any way.

Thanks,
drew
diff mbox series

Patch

diff --git a/target/arm/cpu.c b/target/arm/cpu.c
index f08e178fc84b..e060a0d9df0e 100644
--- a/target/arm/cpu.c
+++ b/target/arm/cpu.c
@@ -1019,6 +1019,12 @@  static void arm_cpu_realizefn(DeviceState *dev, Error **errp)
         return;
     }
 
+    arm_cpu_sve_finalize(cpu, &local_err);
+    if (local_err) {
+        error_propagate(errp, local_err);
+        return;
+    }
+
     if (arm_feature(env, ARM_FEATURE_AARCH64) &&
         cpu->has_vfp != cpu->has_neon) {
         /*
diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index f9da672be575..cbb155cf72a5 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -184,8 +184,13 @@  typedef struct {
 
 #ifdef TARGET_AARCH64
 # define ARM_MAX_VQ    16
+void arm_cpu_sve_finalize(ARMCPU *cpu, Error **errp);
+uint32_t arm_cpu_vq_map_next_smaller(ARMCPU *cpu, uint32_t vq);
 #else
 # define ARM_MAX_VQ    1
+static inline void arm_cpu_sve_finalize(ARMCPU *cpu, Error **errp) { }
+static inline uint32_t arm_cpu_vq_map_next_smaller(ARMCPU *cpu, uint32_t vq)
+{ return 0; }
 #endif
 
 typedef struct ARMVectorReg {
@@ -915,6 +920,15 @@  struct ARMCPU {
 
     /* Used to set the maximum vector length the cpu will support.  */
     uint32_t sve_max_vq;
+
+    /*
+     * In sve_vq_map each set bit is a supported vector length of
+     * (bit-number + 1) * 16 bytes, i.e. each bit number + 1 is the vector
+     * length in quadwords. We need a map size twice the maximum
+     * quadword length though because we use two bits for each vector
+     * length in order to track three states: uninitialized, off, and on.
+     */
+    DECLARE_BITMAP(sve_vq_map, ARM_MAX_VQ * 2);
 };
 
 void arm_cpu_post_init(Object *obj);
diff --git a/target/arm/cpu64.c b/target/arm/cpu64.c
index 02ada65f240c..5def82111dee 100644
--- a/target/arm/cpu64.c
+++ b/target/arm/cpu64.c
@@ -257,6 +257,149 @@  static void aarch64_a72_initfn(Object *obj)
     define_arm_cp_regs(cpu, cortex_a72_a57_a53_cp_reginfo);
 }
 
+/*
+ * While we eventually use cpu->sve_vq_map as a typical bitmap, where each vq
+ * has only two states (off/on), until we've finalized the map at realize time
+ * we use an extra bit, at the vq - 1 + ARM_MAX_VQ bit number, to also allow
+ * tracking of the uninitialized state. The arm_vq_state typedef and following
+ * functions allow us to more easily work with the bitmap. Also, while the map
+ * is still initializing, sve-max-vq has an additional three states, bringing
+ * the number of its states to five, which are the following:
+ *
+ * sve-max-vq:
+ *   0:    SVE is disabled. The default value for a vq in the map is 'OFF'.
+ *  -1:    SVE is enabled, but neither sve-max-vq nor sve<vl-bits> properties
+ *         have yet been specified by the user. The default value for a vq in
+ *         the map is 'ON'.
+ *  -2:    SVE is enabled and one or more sve<vl-bits> properties have been
+ *         set to 'OFF' by the user, but no sve<vl-bits> properties have yet
+ *         been set to 'ON'. The user is now blocked from setting sve-max-vq
+ *         and the default value for a vq in the map is 'ON'.
+ *  -3:    SVE is enabled and one or more sve<vl-bits> properties have been
+ *         set to 'ON' by the user. The user is blocked from setting sve-max-vq
+ *         and the default value for a vq in the map is 'OFF'. sve-max-vq never
+ *         transitions back to -2, even if later inputs disable the vector
+ *         lengths that initially transitioned sve-max-vq to this state. This
+ *         avoids the default values from flip-flopping.
+ *  [1-ARM_MAX_VQ]: SVE is enabled and the user has specified a valid
+ *                  sve-max-vq. The sve-max-vq specified vq and all smaller
+ *                  vq's will be initially enabled. All larger vq's will have
+ *                  a default of 'OFF'.
+ */
+#define ARM_SVE_INIT          -1
+#define ARM_VQ_DEFAULT_ON     -2
+#define ARM_VQ_DEFAULT_OFF    -3
+
+#define arm_sve_have_max_vq(cpu) ((int32_t)(cpu)->sve_max_vq > 0)
+
+typedef enum arm_vq_state {
+    ARM_VQ_OFF,
+    ARM_VQ_ON,
+    ARM_VQ_UNINITIALIZED,
+} arm_vq_state;
+
+static arm_vq_state arm_cpu_vq_map_get(ARMCPU *cpu, int vq)
+{
+    assert(vq <= ARM_MAX_VQ);
+
+    return test_bit(vq - 1, cpu->sve_vq_map) |
+           test_bit(vq - 1 + ARM_MAX_VQ, cpu->sve_vq_map) << 1;
+}
+
+static void arm_cpu_vq_map_set(ARMCPU *cpu, int vq, arm_vq_state state)
+{
+    assert(state == ARM_VQ_OFF || state == ARM_VQ_ON);
+    assert(vq <= ARM_MAX_VQ);
+
+    clear_bit(vq - 1 + ARM_MAX_VQ, cpu->sve_vq_map);
+
+    if (state == ARM_VQ_ON) {
+        set_bit(vq - 1, cpu->sve_vq_map);
+    } else {
+        clear_bit(vq - 1, cpu->sve_vq_map);
+    }
+}
+
+static void arm_cpu_vq_map_init(ARMCPU *cpu)
+{
+    bitmap_zero(cpu->sve_vq_map, ARM_MAX_VQ * 2);
+    bitmap_set(cpu->sve_vq_map, ARM_MAX_VQ, ARM_MAX_VQ);
+}
+
+static bool arm_cpu_vq_map_is_finalized(ARMCPU *cpu)
+{
+    DECLARE_BITMAP(map, ARM_MAX_VQ * 2);
+
+    bitmap_zero(map, ARM_MAX_VQ * 2);
+    bitmap_set(map, ARM_MAX_VQ, ARM_MAX_VQ);
+    bitmap_and(map, map, cpu->sve_vq_map, ARM_MAX_VQ * 2);
+
+    return bitmap_empty(map, ARM_MAX_VQ * 2);
+}
+
+static void arm_cpu_vq_map_finalize(ARMCPU *cpu)
+{
+    Error *err = NULL;
+    char name[8];
+    uint32_t vq;
+    bool value;
+
+    /*
+     * We use the property get accessor because it knows what default
+     * values to return for uninitialized vector lengths.
+     */
+    for (vq = 1; vq <= ARM_MAX_VQ; ++vq) {
+        sprintf(name, "sve%d", vq * 128);
+        value = object_property_get_bool(OBJECT(cpu), name, &err);
+        assert(!err);
+        if (value) {
+            arm_cpu_vq_map_set(cpu, vq, ARM_VQ_ON);
+        } else {
+            arm_cpu_vq_map_set(cpu, vq, ARM_VQ_OFF);
+        }
+    }
+
+    assert(arm_cpu_vq_map_is_finalized(cpu));
+}
+
+void arm_cpu_sve_finalize(ARMCPU *cpu, Error **errp)
+{
+    Error *err = NULL;
+
+    if (!cpu->sve_max_vq) {
+        bitmap_zero(cpu->sve_vq_map, ARM_MAX_VQ * 2);
+        return;
+    }
+
+    if (cpu->sve_max_vq == ARM_SVE_INIT) {
+        object_property_set_uint(OBJECT(cpu), ARM_MAX_VQ, "sve-max-vq", &err);
+        if (err) {
+            error_propagate(errp, err);
+            return;
+        }
+        assert(cpu->sve_max_vq == ARM_MAX_VQ);
+        arm_cpu_vq_map_finalize(cpu);
+    } else {
+        arm_cpu_vq_map_finalize(cpu);
+        if (!arm_sve_have_max_vq(cpu)) {
+            cpu->sve_max_vq = arm_cpu_vq_map_next_smaller(cpu, ARM_MAX_VQ + 1);
+        }
+    }
+
+    assert(cpu->sve_max_vq == arm_cpu_vq_map_next_smaller(cpu, ARM_MAX_VQ + 1));
+}
+
+uint32_t arm_cpu_vq_map_next_smaller(ARMCPU *cpu, uint32_t vq)
+{
+    uint32_t bitnum;
+
+    assert(vq <= ARM_MAX_VQ + 1);
+    assert(arm_cpu_vq_map_is_finalized(cpu));
+
+    bitnum = find_last_bit(cpu->sve_vq_map, vq - 1);
+    return bitnum == vq - 1 ? 0 : bitnum + 1;
+}
+
 static void cpu_max_get_sve_max_vq(Object *obj, Visitor *v, const char *name,
                                    void *opaque, Error **errp)
 {
@@ -283,12 +426,203 @@  static void cpu_max_set_sve_max_vq(Object *obj, Visitor *v, const char *name,
         return;
     }
 
+    /*
+     * It gets complicated trying to support both sve-max-vq and
+     * sve<vl-bits> properties together, so we mostly don't. We
+     * do allow both if sve-max-vq is specified first and only once
+     * though.
+     */
+    if (cpu->sve_max_vq != ARM_SVE_INIT) {
+        error_setg(errp, "sve<vl-bits> in use or sve-max-vq already "
+                   "specified");
+        error_append_hint(errp, "sve-max-vq must come before all "
+                          "sve<vl-bits> properties and it must only "
+                          "be specified once.\n");
+        return;
+    }
+
     cpu->sve_max_vq = value;
 
     if (cpu->sve_max_vq == 0 || cpu->sve_max_vq > ARM_MAX_VQ) {
         error_setg(errp, "unsupported SVE vector length");
         error_append_hint(errp, "Valid sve-max-vq in range [1-%d]\n",
                           ARM_MAX_VQ);
+    } else {
+        uint32_t vq;
+
+        for (vq = 1; vq <= cpu->sve_max_vq; ++vq) {
+            char name[8];
+            sprintf(name, "sve%d", vq * 128);
+            object_property_set_bool(obj, true, name, &err);
+            if (err) {
+                error_propagate(errp, err);
+                return;
+            }
+        }
+    }
+}
+
+static void cpu_arm_get_sve_vq(Object *obj, Visitor *v, const char *name,
+                               void *opaque, Error **errp)
+{
+    ARMCPU *cpu = ARM_CPU(obj);
+    int vq = atoi(&name[3]) / 128;
+    arm_vq_state vq_state;
+    bool value;
+
+    vq_state = arm_cpu_vq_map_get(cpu, vq);
+
+    if (!cpu->sve_max_vq) {
+        /* All vector lengths are disabled when SVE is off. */
+        value = false;
+    } else if (vq_state == ARM_VQ_ON) {
+        value = true;
+    } else if (vq_state == ARM_VQ_OFF) {
+        value = false;
+    } else {
+        /*
+         * vq is uninitialized. We pick a default here based on the
+         * the state of sve-max-vq and other sve<vl-bits> properties.
+         */
+        if (arm_sve_have_max_vq(cpu)) {
+            /*
+             * If we have sve-max-vq, then all remaining uninitialized
+             * vq's are 'OFF'.
+             */
+            value = false;
+        } else {
+            switch (cpu->sve_max_vq) {
+            case ARM_SVE_INIT:
+            case ARM_VQ_DEFAULT_ON:
+                value = true;
+                break;
+            case ARM_VQ_DEFAULT_OFF:
+                value = false;
+                break;
+            }
+        }
+    }
+
+    visit_type_bool(v, name, &value, errp);
+}
+
+static void cpu_arm_set_sve_vq(Object *obj, Visitor *v, const char *name,
+                               void *opaque, Error **errp)
+{
+    ARMCPU *cpu = ARM_CPU(obj);
+    int vq = atoi(&name[3]) / 128;
+    arm_vq_state vq_state;
+    Error *err = NULL;
+    uint32_t max_vq = 0;
+    bool value;
+
+    visit_type_bool(v, name, &value, &err);
+    if (err) {
+        error_propagate(errp, err);
+        return;
+    }
+
+    if (value && !cpu->sve_max_vq) {
+        error_setg(errp, "cannot enable %s", name);
+        error_append_hint(errp, "SVE has been disabled with sve=off\n");
+        return;
+    } else if (!cpu->sve_max_vq) {
+        /*
+         * We don't complain about disabling vector lengths when SVE
+         * is off, but we don't do anything either.
+         */
+        return;
+    }
+
+    if (arm_sve_have_max_vq(cpu)) {
+        max_vq = cpu->sve_max_vq;
+    } else {
+        if (value) {
+            cpu->sve_max_vq = ARM_VQ_DEFAULT_OFF;
+        } else if (cpu->sve_max_vq != ARM_VQ_DEFAULT_OFF) {
+            cpu->sve_max_vq = ARM_VQ_DEFAULT_ON;
+        }
+    }
+
+    /*
+     * We need to know the maximum vector length, which may just currently
+     * be the maximum length, in order to validate the enabling/disabling
+     * of this vector length. We use the property get accessor in order to
+     * get the appropriate default value for any uninitialized lengths.
+     */
+    if (!max_vq) {
+        char tmp[8];
+        bool s;
+
+        for (max_vq = ARM_MAX_VQ; max_vq >= 1; --max_vq) {
+            sprintf(tmp, "sve%d", max_vq * 128);
+            s = object_property_get_bool(OBJECT(cpu), tmp, &err);
+            assert(!err);
+            if (s) {
+                break;
+            }
+        }
+    }
+
+    if (arm_sve_have_max_vq(cpu) && value && vq > cpu->sve_max_vq) {
+        error_setg(errp, "cannot enable %s", name);
+        error_append_hint(errp, "vq=%d (%d bits) is larger than the "
+                          "maximum vector length, sve-max-vq=%d "
+                          "(%d bits)\n", vq, vq * 128,
+                          cpu->sve_max_vq, cpu->sve_max_vq * 128);
+    } else if (arm_sve_have_max_vq(cpu) && !value && vq == cpu->sve_max_vq) {
+        error_setg(errp, "cannot disable %s", name);
+        error_append_hint(errp, "The maximum vector length must be "
+                          "enabled, sve-max-vq=%d (%d bits)\n",
+                          cpu->sve_max_vq, cpu->sve_max_vq * 128);
+    } else if (arm_sve_have_max_vq(cpu) && !value && vq < cpu->sve_max_vq &&
+               is_power_of_2(vq)) {
+        error_setg(errp, "cannot disable %s", name);
+        error_append_hint(errp, "vq=%d (%d bits) is required as it is a "
+                          "power-of-2 length smaller than the maximum, "
+                          "sve-max-vq=%d (%d bits)\n", vq, vq * 128,
+                          cpu->sve_max_vq, cpu->sve_max_vq * 128);
+    } else if (!value && vq < max_vq && is_power_of_2(vq)) {
+        error_setg(errp, "cannot disable %s", name);
+        error_append_hint(errp, "Vector length %d-bits is required as it "
+                          "is a power-of-2 length smaller than another "
+                          "enabled vector length. Disable all larger vector "
+                          "lengths first.\n", vq * 128);
+    } else {
+        if (value) {
+            bool fail = false;
+            uint32_t s;
+
+            /*
+             * Enabling a vector length automatically enables all
+             * uninitialized power-of-2 lengths smaller than it, as
+             * per the architecture.
+             */
+            for (s = 1; s < vq; ++s) {
+                if (is_power_of_2(s)) {
+                    vq_state = arm_cpu_vq_map_get(cpu, s);
+                    if (vq_state == ARM_VQ_UNINITIALIZED) {
+                        arm_cpu_vq_map_set(cpu, s, ARM_VQ_ON);
+                    } else if (vq_state == ARM_VQ_OFF) {
+                        fail = true;
+                        break;
+                    }
+                }
+            }
+
+            if (fail) {
+                error_setg(errp, "cannot enable %s", name);
+                error_append_hint(errp, "Vector length %d-bits is disabled "
+                                  "and is a power-of-2 length smaller than "
+                                  "%s. All power-of-2 vector lengths smaller "
+                                  "than the maximum length are required.\n",
+                                  s * 128, name);
+            } else {
+                arm_cpu_vq_map_set(cpu, vq, ARM_VQ_ON);
+            }
+        } else {
+            arm_cpu_vq_map_set(cpu, vq, ARM_VQ_OFF);
+        }
     }
 }
 
@@ -318,10 +652,11 @@  static void cpu_arm_set_sve(Object *obj, Visitor *v, const char *name,
         /*
          * We handle the -cpu <cpu>,sve=off,sve=on case by reinitializing,
          * but otherwise we don't do anything as an sve=on could come after
-         * a sve-max-vq setting.
+         * a sve-max-vq or sve<vl-bits> setting.
          */
         if (!cpu->sve_max_vq) {
-            cpu->sve_max_vq = ARM_MAX_VQ;
+            cpu->sve_max_vq = ARM_SVE_INIT;
+            arm_cpu_vq_map_init(cpu);
         }
     } else {
         cpu->sve_max_vq = 0;
@@ -336,6 +671,7 @@  static void cpu_arm_set_sve(Object *obj, Visitor *v, const char *name,
 static void aarch64_max_initfn(Object *obj)
 {
     ARMCPU *cpu = ARM_CPU(obj);
+    uint32_t vq;
 
     if (kvm_enabled()) {
         kvm_arm_set_cpu_features_from_host(cpu);
@@ -420,11 +756,29 @@  static void aarch64_max_initfn(Object *obj)
         cpu->dcz_blocksize = 7; /*  512 bytes */
 #endif
 
-        cpu->sve_max_vq = ARM_MAX_VQ;
+        /*
+         * sve_max_vq is initially unspecified, but must be initialized to a
+         * non-zero value (ARM_SVE_INIT) to indicate that this cpu type has
+         * SVE. It will be finalized in arm_cpu_realizefn().
+         */
+        cpu->sve_max_vq = ARM_SVE_INIT;
         object_property_add(obj, "sve-max-vq", "uint32", cpu_max_get_sve_max_vq,
                             cpu_max_set_sve_max_vq, NULL, NULL, &error_fatal);
         object_property_add(obj, "sve", "bool", cpu_arm_get_sve,
                             cpu_arm_set_sve, NULL, NULL, &error_fatal);
+
+        /*
+         * sve_vq_map uses a special state while setting properties, so
+         * we initialize it here with its init function and finalize it
+         * in arm_cpu_realizefn().
+         */
+        arm_cpu_vq_map_init(cpu);
+        for (vq = 1; vq <= ARM_MAX_VQ; ++vq) {
+            char name[8];
+            sprintf(name, "sve%d", vq * 128);
+            object_property_add(obj, name, "bool", cpu_arm_get_sve_vq,
+                                cpu_arm_set_sve_vq, NULL, NULL, &error_fatal);
+        }
     }
 }
 
diff --git a/target/arm/helper.c b/target/arm/helper.c
index f500ccb6d31b..b7b719dba57f 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -5324,7 +5324,16 @@  static void zcr_write(CPUARMState *env, const ARMCPRegInfo *ri,
 
     /* Bits other than [3:0] are RAZ/WI.  */
     QEMU_BUILD_BUG_ON(ARM_MAX_VQ > 16);
-    raw_write(env, ri, value & 0xf);
+    value &= 0xf;
+
+    if (value) {
+        /* get next vq that is smaller than or equal to value's vq */
+        uint32_t vq = value + 1;
+        vq = arm_cpu_vq_map_next_smaller(cpu, vq + 1);
+        value = vq - 1;
+    }
+
+    raw_write(env, ri, value);
 
     /*
      * Because we arrived here, we know both FP and SVE are enabled;
diff --git a/target/arm/monitor.c b/target/arm/monitor.c
index 157c487a1551..1e213906fd8f 100644
--- a/target/arm/monitor.c
+++ b/target/arm/monitor.c
@@ -89,8 +89,24 @@  GICCapabilityList *qmp_query_gic_capabilities(Error **errp)
     return head;
 }
 
+QEMU_BUILD_BUG_ON(ARM_MAX_VQ > 16);
+
+/*
+ * These are cpu model features we want to advertise. The order here
+ * matters as this is the order in which qmp_query_cpu_model_expansion
+ * will attempt to set them. If there are dependencies between features,
+ * as there are with the sve<vl-bits> features, then the order that
+ * considers those dependencies must be used.
+ *
+ * The sve<vl-bits> features need to be in reverse order in order to
+ * enable/disable the largest vector lengths first, ensuring all
+ * power-of-2 vector lengths smaller can also be enabled/disabled.
+ */
 static const char *cpu_model_advertised_features[] = {
     "aarch64", "pmu", "sve",
+    "sve2048", "sve1920", "sve1792", "sve1664", "sve1536", "sve1408",
+    "sve1280", "sve1152", "sve1024", "sve896", "sve768", "sve640",
+    "sve512", "sve384", "sve256", "sve128",
     NULL
 };
 
diff --git a/tests/arm-cpu-features.c b/tests/arm-cpu-features.c
index 509e458e9c2f..a4bf6aec00df 100644
--- a/tests/arm-cpu-features.c
+++ b/tests/arm-cpu-features.c
@@ -13,6 +13,18 @@ 
 #include "qapi/qmp/qdict.h"
 #include "qapi/qmp/qjson.h"
 
+#if __SIZEOF_LONG__ == 8
+#define BIT(n) (1UL << (n))
+#else
+#define BIT(n) (1ULL << (n))
+#endif
+
+/*
+ * We expect the SVE max-vq to be 16. Also it must be <= 64
+ * for our test code, otherwise 'vls' can't just be a uint64_t.
+ */
+#define SVE_MAX_VQ 16
+
 #define MACHINE    "-machine virt,gic-version=max "
 #define QUERY_HEAD "{ 'execute': 'query-cpu-model-expansion', " \
                      "'arguments': { 'type': 'full', "
@@ -137,6 +149,201 @@  static void assert_bad_props(QTestState *qts, const char *cpu_type)
     qobject_unref(resp);
 }
 
+static void resp_get_sve_vls(QDict *resp, uint64_t *vls, uint32_t *max_vq)
+{
+    const QDictEntry *e;
+    QDict *qdict;
+    int n = 0;
+
+    *vls = 0;
+
+    qdict = resp_get_props(resp);
+
+    for (e = qdict_first(qdict); e; e = qdict_next(qdict, e)) {
+        if (strlen(e->key) > 3 && !strncmp(e->key, "sve", 3) &&
+            g_ascii_isdigit(e->key[3])) {
+            char *endptr;
+            int bits;
+
+            bits = g_ascii_strtoll(&e->key[3], &endptr, 10);
+            if (!bits || *endptr != '\0') {
+                continue;
+            }
+
+            if (qdict_get_bool(qdict, e->key)) {
+                *vls |= BIT((bits / 128) - 1);
+            }
+            ++n;
+        }
+    }
+
+    g_assert(n == SVE_MAX_VQ);
+
+    *max_vq = !*vls ? 0 : 64 - __builtin_clzll(*vls);
+}
+
+static uint64_t sve_get_vls(QTestState *qts, const char *cpu_type,
+                            const char *fmt, ...)
+{
+    QDict *resp;
+    uint64_t vls;
+    uint32_t max_vq;
+
+    if (fmt) {
+        QDict *args;
+        va_list ap;
+
+        va_start(ap, fmt);
+        args = qdict_from_vjsonf_nofail(fmt, ap);
+        va_end(ap);
+
+        resp = qtest_qmp(qts, QUERY_HEAD "'model': { 'name': %s, "
+                                                    "'props': %p }"
+                              QUERY_TAIL, cpu_type, args);
+    } else {
+        resp = do_query_no_props(qts, cpu_type);
+    }
+
+    g_assert(resp);
+    resp_get_sve_vls(resp, &vls, &max_vq);
+    qobject_unref(resp);
+
+    return vls;
+}
+
+#define assert_sve_vls(qts, cpu_type, expected_vls, fmt, ...) \
+    g_assert(sve_get_vls(qts, cpu_type, fmt, ##__VA_ARGS__) == expected_vls)
+
+static void sve_tests_default(QTestState *qts, const char *cpu_type)
+{
+    /*
+     * With no sve-max-vq or sve<vl-bits> properties on the command line
+     * the default is to have all vector lengths enabled.
+     */
+    assert_sve_vls(qts, cpu_type, BIT(SVE_MAX_VQ) - 1, NULL);
+
+    /*
+     * -------------------------------------------------------------------
+     *               power-of-2(vq)   all-power-            can      can
+     *                                of-2(< vq)          enable   disable
+     * -------------------------------------------------------------------
+     * vq < max_vq      no            MUST*                yes      yes
+     * vq < max_vq      yes           MUST*                yes      no
+     * -------------------------------------------------------------------
+     * vq == max_vq     n/a           MUST*                yes**    yes**
+     * -------------------------------------------------------------------
+     * vq > max_vq      n/a           no                   no       yes
+     * vq > max_vq      n/a           yes                  yes      yes
+     * -------------------------------------------------------------------
+     *
+     * [*] "MUST" means this requirement must already be satisfied,
+     *     otherwise 'max_vq' couldn't itself be enabled.
+     *
+     * [**] Not testable with the QMP interface, only with the command line.
+     */
+
+    /* max_vq := 8 */
+    assert_sve_vls(qts, cpu_type, 0x8b, "{ 'sve1024': true }");
+
+    /* max_vq := 8, vq < max_vq, !power-of-2(vq) */
+    assert_sve_vls(qts, cpu_type, 0x8f,
+                   "{ 'sve1024': true, 'sve384': true }");
+    assert_sve_vls(qts, cpu_type, 0x8b,
+                   "{ 'sve1024': true, 'sve384': false }");
+
+    /* max_vq := 8, vq < max_vq, power-of-2(vq) */
+    assert_sve_vls(qts, cpu_type, 0x8b,
+                   "{ 'sve1024': true, 'sve256': true }");
+    assert_error(qts, cpu_type, "cannot disable sve256",
+                 "{ 'sve1024': true, 'sve256': false }");
+
+    /*
+     * max_vq := 3, vq > max_vq, !all-power-of-2(< vq)
+     *
+     * If given sve384=on,sve512=off,sve640=on the command line error would be
+     * "cannot enable sve640", but QMP visits the vector lengths in reverse
+     * order, so we get "cannot disable sve512" instead. The command line would
+     * also give that error if given sve384=on,sve640=on,sve512=off, so this is
+     * all fine. The important thing is that we get an error.
+     */
+    assert_error(qts, cpu_type, "cannot disable sve512",
+                 "{ 'sve384': true, 'sve512': false, 'sve640': true }");
+
+    /*
+     * We can disable power-of-2 vector lengths when all larger lengths
+     * are also disabled. The shorter, sve384=on,sve512=off,sve640=off
+     * works on the command line, but QMP doesn't know that all the
+     * vector lengths larger than 384-bits will be disabled until it
+     * sees the enabling of sve384, which comes near the end since it
+     * visits the lengths in reverse order. So we just have to explicitly
+     * disable them all.
+     */
+    assert_sve_vls(qts, cpu_type, 0x7,
+                   "{ 'sve384': true, 'sve512': false, 'sve640': false, "
+                   "  'sve768': false, 'sve896': false, 'sve1024': false, "
+                   "  'sve1152': false, 'sve1280': false, 'sve1408': false, "
+                   "  'sve1536': false, 'sve1664': false, 'sve1792': false, "
+                   "  'sve1920': false, 'sve2048': false }");
+
+    /* max_vq := 3, vq > max_vq, all-power-of-2(< vq) */
+    assert_sve_vls(qts, cpu_type, 0x1f,
+                   "{ 'sve384': true, 'sve512': true, 'sve640': true }");
+    assert_sve_vls(qts, cpu_type, 0xf,
+                   "{ 'sve384': true, 'sve512': true, 'sve640': false }");
+}
+
+static void sve_tests_sve_max_vq_8(const void *data)
+{
+    QTestState *qts;
+
+    qts = qtest_init(MACHINE "-cpu max,sve-max-vq=8");
+
+    assert_sve_vls(qts, "max", BIT(8) - 1, NULL);
+
+    /*
+     * Disabling the max-vq set by sve-max-vq is not allowed, but
+     * of course enabling it is OK.
+     */
+    assert_error(qts, "max", "cannot disable sve1024", "{ 'sve1024': false }");
+    assert_sve_vls(qts, "max", 0xff, "{ 'sve1024': true }");
+
+    /*
+     * Enabling anything larger than max-vq set by sve-max-vq is not
+     * allowed, but of course disabling everything larger is OK.
+     */
+    assert_error(qts, "max", "cannot enable sve1152", "{ 'sve1152': true }");
+    assert_sve_vls(qts, "max", 0xff, "{ 'sve1152': false }");
+
+    /*
+     * We can disable non power-of-2 lengths smaller than the max-vq
+     * set by sve-max-vq, but not power-of-2 lengths.
+     */
+    assert_sve_vls(qts, "max", 0xfb, "{ 'sve384': false }");
+    assert_error(qts, "max", "cannot disable sve256", "{ 'sve256': false }");
+
+    qtest_quit(qts);
+}
+
+static void sve_tests_sve_off(const void *data)
+{
+    QTestState *qts;
+
+    qts = qtest_init(MACHINE "-cpu max,sve=off");
+
+    /*
+     * SVE is off, so the map should be empty.
+     */
+    assert_sve_vls(qts, "max", 0, NULL);
+
+    /*
+     * We can't turn anything on, but off is OK.
+     */
+    assert_error(qts, "max", "cannot enable sve128", "{ 'sve128': true }");
+    assert_sve_vls(qts, "max", 0, "{ 'sve128': false }");
+
+    qtest_quit(qts);
+}
+
 static void test_query_cpu_model_expansion(const void *data)
 {
     QTestState *qts;
@@ -159,9 +366,12 @@  static void test_query_cpu_model_expansion(const void *data)
     if (g_str_equal(qtest_get_arch(), "aarch64")) {
         assert_has_feature(qts, "max", "aarch64");
         assert_has_feature(qts, "max", "sve");
+        assert_has_feature(qts, "max", "sve128");
         assert_has_feature(qts, "cortex-a57", "pmu");
         assert_has_feature(qts, "cortex-a57", "aarch64");
 
+        sve_tests_default(qts, "max");
+
         /* Test that features that depend on KVM generate errors without. */
         assert_error(qts, "max",
                      "'aarch64' feature cannot be disabled "
@@ -213,6 +423,13 @@  int main(int argc, char **argv)
     qtest_add_data_func("/arm/query-cpu-model-expansion",
                         NULL, test_query_cpu_model_expansion);
 
+    if (g_str_equal(qtest_get_arch(), "aarch64")) {
+        qtest_add_data_func("/arm/max/query-cpu-model-expansion/sve-max-vq-8",
+                            NULL, sve_tests_sve_max_vq_8);
+        qtest_add_data_func("/arm/max/query-cpu-model-expansion/sve-off",
+                            NULL, sve_tests_sve_off);
+    }
+
     if (kvm_available) {
         qtest_add_data_func("/arm/kvm/query-cpu-model-expansion",
                             NULL, test_query_cpu_model_expansion_kvm);