[v1,06/21] RISC-V FPU Support

Message ID	1514940265-18093-7-git-send-email-mjc@sifive.com (mailing list archive)
State	New, archived
Headers	show Return-Path: <qemu-devel-bounces+patchwork-qemu-devel=patchwork.kernel.org@nongnu.org> From: Michael Clark <mjc@sifive.com> To: qemu-devel@nongnu.org Date: Wed, 3 Jan 2018 13:44:10 +1300 Message-Id: <1514940265-18093-7-git-send-email-mjc@sifive.com> In-Reply-To: <1514940265-18093-1-git-send-email-mjc@sifive.com> References: <1514940265-18093-1-git-send-email-mjc@sifive.com> Subject: [Qemu-devel] [PATCH v1 06/21] RISC-V FPU Support Precedence: list Cc: Bastian Koppelmann <kbastian@mail.uni-paderborn.de>, Michael Clark <mjc@sifive.com>, Sagar Karandikar <sagark@eecs.berkeley.edu> Errors-To: qemu-devel-bounces+patchwork-qemu-devel=patchwork.kernel.org@nongnu.org Sender: "Qemu-devel" <qemu-devel-bounces+patchwork-qemu-devel=patchwork.kernel.org@nongnu.org>

Michael Clark Jan. 3, 2018, 12:44 a.m. UTC

Helper routines for FPU instructions and NaN definitions.

Signed-off-by: Michael Clark <mjc@sifive.com>
---
 fpu/softfloat-specialize.h |   7 +-
 target/riscv/fpu_helper.c  | 591 +++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 595 insertions(+), 3 deletions(-)
 create mode 100644 target/riscv/fpu_helper.c

Richard Henderson Jan. 3, 2018, 8:10 p.m. UTC | #1

On 01/02/2018 04:44 PM, Michael Clark wrote:
> +/* convert RISC-V rounding mode to IEEE library numbers */
> +unsigned int ieee_rm[] = {

static const.

> +/* obtain rm value to use in computation
> + * as the last step, convert rm codes to what the softfloat library expects
> + * Adapted from Spike's decode.h:RM
> + */
> +#define RM ({                                             \
> +if (rm == 7) {                                            \
> +    rm = env->frm;                               \
> +}                                                         \
> +if (rm > 4) {                                             \
> +    helper_raise_exception(env, RISCV_EXCP_ILLEGAL_INST); \
> +}                                                         \
> +ieee_rm[rm]; })

Please use inline functions and not these macros.

Probably this applies to some of the helpers that I've already reviewed, but
you're going to need to use an exception raising function that also performs an
unwind (usually via cpu_loop_exit_restore).  The GETPC() that feeds the unwind
must be placed in the outer-most helper (including inlines).

> +#ifndef CONFIG_USER_ONLY
> +#define require_fp if (!(env->mstatus & MSTATUS_FS)) { \
> +    helper_raise_exception(env, RISCV_EXCP_ILLEGAL_INST); \
> +}

If you included MSTATUS_FS in cpu_get_tb_cpu_state flags, then you could be
checking for this at translation time instead of at run time.

> +/* convert softfloat library flag numbers to RISC-V */
> +unsigned int softfloat_flags_to_riscv(unsigned int flags)
> +{
> +    int rv_flags = 0;
> +    rv_flags |= (flags & float_flag_inexact) ? 1 : 0;
> +    rv_flags |= (flags & float_flag_underflow) ? 2 : 0;
> +    rv_flags |= (flags & float_flag_overflow) ? 4 : 0;
> +    rv_flags |= (flags & float_flag_divbyzero) ? 8 : 0;
> +    rv_flags |= (flags & float_flag_invalid) ? 16 : 0;

FPEXC_NX et al.

> +/* adapted from Spike's decode.h:set_fp_exceptions */
> +#define set_fp_exceptions() do { \
> +    env->fflags |= softfloat_flags_to_riscv(get_float_exception_flags(\
> +                            &env->fp_status)); \
> +    set_float_exception_flags(0, &env->fp_status); \
> +} while (0)

inline function.  Usually written as

  int flags = get_float_exception_flags(&env->fp_status);
  if (flags) {
    set_float_exception_flags(0, &env->fp_status);
    env->fflags |= softfloat_flags_to_riscv(flags);
  }

since we really do expect exceptions to be exceptional.

> +uint64_t helper_fmsub_s(CPURISCVState *env, uint64_t frs1, uint64_t frs2,
> +                        uint64_t frs3, uint64_t rm)
> +{
> +    require_fp;
> +    set_float_rounding_mode(RM, &env->fp_status);
> +    frs1 = float32_muladd(frs1, frs2, frs3 ^ (uint32_t)INT32_MIN, 0,
> +                          &env->fp_status);

Given that RISC-V always returns a default NaN, you obviously do not care about
the sign of a NaN result.  Therefore you should use float_muladd_negate_c as
the fourth argument here and not perform the sign flip manually.

> +uint64_t helper_fnmsub_s(CPURISCVState *env, uint64_t frs1, uint64_t frs2,
> +                         uint64_t frs3, uint64_t rm)
> +{
> +    require_fp;
> +    set_float_rounding_mode(RM, &env->fp_status);
> +    frs1 = float32_muladd(frs1 ^ (uint32_t)INT32_MIN, frs2, frs3, 0,
> +                          &env->fp_status);

float_muladd_negate_product.

> +uint64_t helper_fnmadd_s(CPURISCVState *env, uint64_t frs1, uint64_t frs2,
> +                         uint64_t frs3, uint64_t rm)
> +{
> +    require_fp;
> +    set_float_rounding_mode(RM, &env->fp_status);
> +    frs1 = float32_muladd(frs1 ^ (uint32_t)INT32_MIN, frs2,
> +                          frs3 ^ (uint32_t)INT32_MIN, 0, &env->fp_status);

float_muladd_negate_c | float_muladd_negate_product

> +uint64_t helper_fmin_s(CPURISCVState *env, uint64_t frs1, uint64_t frs2)
> +{
> +    require_fp;
> +    frs1 = float32_minnum(frs1, frs2, &env->fp_status);

If you want minnum and not min, riscv-spec-v2.2.pdf could use some more verbage
to specify that.

> +/* adapted from spike */
> +#define isNaNF32UI(ui) (0xFF000000 < (uint32_t)((uint_fast32_t)ui << 1))

float32_is_any_nan

> +#define signF32UI(a) ((bool)((uint32_t)a >> 31))

float32_is_neg

> +#define expF32UI(a) ((int_fast16_t)(a >> 23) & 0xFF)
> +#define fracF32UI(a) (a & 0x007FFFFF)

Either float32_is_infinity or float32_is_zero_or_denormal.

> +union ui32_f32 { uint32_t ui; uint32_t f; };

This is just silly.

> +uint_fast16_t float32_classify(uint32_t a, float_status *status)

Please drop *_fast*_t for just "int".

> +    return
> +        (sign && infOrNaN && fracF32UI(uiA) == 0)           << 0 |
> +        (sign && !infOrNaN && !subnormalOrZero)             << 1 |
> +        (sign && subnormalOrZero && fracF32UI(uiA))         << 2 |
> +        (sign && subnormalOrZero && fracF32UI(uiA) == 0)    << 3 |
> +        (!sign && infOrNaN && fracF32UI(uiA) == 0)          << 7 |
> +        (!sign && !infOrNaN && !subnormalOrZero)            << 6 |
> +        (!sign && subnormalOrZero && fracF32UI(uiA))        << 5 |
> +        (!sign && subnormalOrZero && fracF32UI(uiA) == 0)   << 4 |
> +        (isNaNF32UI(uiA) &&  float32_is_signaling_nan(uiA, status)) << 8 |
> +        (isNaNF32UI(uiA) && !float32_is_signaling_nan(uiA, status)) << 9;

These conditions are all exclusive.  You do not need to compute all of them.

  if (float32_is_any_nan(f)) {
    if (float32_is_signaling_nan(f, status)) {
      return 1 << 8;
    } else {
      return 1 << 9;
    }
  }
  if (float32_is_neg(f)) {
    if (float32_is_infinity(f)) {
      return 1 << 0;
    } else if (float32_is_zero(f)) {
      return 1 << 3;
    } else if (float32_is_zero_or_denormal(f)) {
      return 1 << 2;
    } else {
      return 1 << 1;
    }
  } else {
    ...
  }

> +    frs1 = (int64_t)((int32_t)float64_to_int32(frs1, &env->fp_status));

Double cast is pointless, as are the extra parenthesis.

> +/* adapted from spike */
> +#define isNaNF64UI(ui) (UINT64_C(0xFFE0000000000000) \
> +                       < (uint64_t)((uint_fast64_t)ui << 1))
> +#define signF64UI(a) ((bool)((uint64_t) a >> 63))
> +#define expF64UI(a) ((int_fast16_t)(a >> 52) & 0x7FF)
> +#define fracF64UI(a) (a & UINT64_C(0x000FFFFFFFFFFFFF))
> +
> +union ui64_f64 { uint64_t ui; uint64_t f; };
> +
> +uint_fast16_t float64_classify(uint64_t a, float_status *status)

Likewise.


r~

Michael Clark Jan. 23, 2018, 9:37 p.m. UTC | #2

On Wed, Jan 3, 2018 at 12:10 PM, Richard Henderson <
richard.henderson@linaro.org> wrote:

> On 01/02/2018 04:44 PM, Michael Clark wrote:
> > +/* convert RISC-V rounding mode to IEEE library numbers */
> > +unsigned int ieee_rm[] = {
>
> static const.


Done.

> +/* obtain rm value to use in computation
> > + * as the last step, convert rm codes to what the softfloat library
> expects
> > + * Adapted from Spike's decode.h:RM
> > + */
> > +#define RM ({                                             \
> > +if (rm == 7) {                                            \
> > +    rm = env->frm;                               \
> > +}                                                         \
> > +if (rm > 4) {                                             \
> > +    helper_raise_exception(env, RISCV_EXCP_ILLEGAL_INST); \
> > +}                                                         \
> > +ieee_rm[rm]; })
>
> Please use inline functions and not these macros.
>

Done.

In fact the previous code would, with normal control flow, dereference
ieee_rm[rm] with an out of bounds round mode so assuming
helper_raise_exception does a longjmp, i've inserted g_assert_not_reached()
after the exception. An analyser could detect that ieee_rm has an out of
bounds access assuming normal control flow. e.g.

static inline void set_fp_round_mode(CPURISCVState *env, uint64_t rm)
{
    if (rm == 7) {
        rm = env->frm;
    } else if (rm > 4) {
        helper_raise_exception(env, RISCV_EXCP_ILLEGAL_INST);
        g_assert_not_reached();
    }
    set_float_rounding_mode(ieee_rm[rm], &env->fp_status);
}



> Probably this applies to some of the helpers that I've already reviewed,
> but
> you're going to need to use an exception raising function that also
> performs an
> unwind (usually via cpu_loop_exit_restore).  The GETPC() that feeds the
> unwind
> must be placed in the outer-most helper (including inlines).
>
> > +#ifndef CONFIG_USER_ONLY
> > +#define require_fp if (!(env->mstatus & MSTATUS_FS)) { \
> > +    helper_raise_exception(env, RISCV_EXCP_ILLEGAL_INST); \
> > +}
>
> If you included MSTATUS_FS in cpu_get_tb_cpu_state flags, then you could be
> checking for this at translation time instead of at run time.


I'll keep a note of this but I might attempt this later. We have some other
changes in this area with repsect to cpu_mmu_index.

Are translations not implicitly indexed by cpu_mmu_index? i.e. do we also
need to add cpu_mmu_index to cpu_get_tb_cpu_state flags to prevent code
translated for one value of mmu_index running in code with another value of
mmu_index (in the case of riscv, we currently return processor mode /
privilege level in cpu_mmu_index).


> > +/* convert softfloat library flag numbers to RISC-V */
> > +unsigned int softfloat_flags_to_riscv(unsigned int flags)
> > +{
> > +    int rv_flags = 0;
> > +    rv_flags |= (flags & float_flag_inexact) ? 1 : 0;
> > +    rv_flags |= (flags & float_flag_underflow) ? 2 : 0;
> > +    rv_flags |= (flags & float_flag_overflow) ? 4 : 0;
> > +    rv_flags |= (flags & float_flag_divbyzero) ? 8 : 0;
> > +    rv_flags |= (flags & float_flag_invalid) ? 16 : 0;
>
> FPEXC_NX et al.
>
> > +/* adapted from Spike's decode.h:set_fp_exceptions */
> > +#define set_fp_exceptions() do { \
> > +    env->fflags |= softfloat_flags_to_riscv(get_float_exception_flags(\
> > +                            &env->fp_status)); \
> > +    set_float_exception_flags(0, &env->fp_status); \
> > +} while (0)
>
> inline function.  Usually written as
>
>   int flags = get_float_exception_flags(&env->fp_status);
>   if (flags) {
>     set_float_exception_flags(0, &env->fp_status);
>     env->fflags |= softfloat_flags_to_riscv(flags);
>   }
>
> since we really do expect exceptions to be exceptional.


Done.

> +uint64_t helper_fmsub_s(CPURISCVState *env, uint64_t frs1, uint64_t frs2,
> > +                        uint64_t frs3, uint64_t rm)
> > +{
> > +    require_fp;
> > +    set_float_rounding_mode(RM, &env->fp_status);
> > +    frs1 = float32_muladd(frs1, frs2, frs3 ^ (uint32_t)INT32_MIN, 0,
> > +                          &env->fp_status);
>
> Given that RISC-V always returns a default NaN, you obviously do not care
> about
> the sign of a NaN result.  Therefore you should use float_muladd_negate_c
> as
> the fourth argument here and not perform the sign flip manually.


Now working on feedback from here on...

Here is the WIP changelog for the v4 spin...

v4

- Move code to set round mode into set_fp_round_mode function
- Convert set_fp_exceptions from a macro to an inline function
- Convert round mode helper into an inline function
- Make fpu_helper ieee_rm array static const
- Include cpu_mmu_index in cpu_get_tb_cpu_state flags
- Eliminate MPRV influence on mmu_index
- Remove unrecoverable do_unassigned_access function
- Only update PTE accessed and dirty bits if necessary
- Remove unnecessary tlb_flush in set_mode as mode is in mmu_idx
- Remove buggy support for misa writes. misa writes are optional
  and are not implemented in any known hardware
- Always set PTE read or execute permissions during page walk
- Reorder helper function declarations to match order in helper.c
- Remove redundant variable declaration in get_physical_address
- Remove duplicated code from get_physical_address
- Use mmu_idx instead of mem_idx in riscv_cpu_get_phys_page_debug
- Use IEEE NaNs instead of default NaNs

The changes being added to the v3 baseline and we'll rebase before we spin
v4 in a week or two...

- https://github.com/riscv/riscv-qemu/tree/qemu-upstream-v3


> > +uint64_t helper_fnmsub_s(CPURISCVState *env, uint64_t frs1, uint64_t
> frs2,
> > +                         uint64_t frs3, uint64_t rm)
> > +{
> > +    require_fp;
> > +    set_float_rounding_mode(RM, &env->fp_status);
> > +    frs1 = float32_muladd(frs1 ^ (uint32_t)INT32_MIN, frs2, frs3, 0,
> > +                          &env->fp_status);
>
> float_muladd_negate_product.
>
> > +uint64_t helper_fnmadd_s(CPURISCVState *env, uint64_t frs1, uint64_t
> frs2,
> > +                         uint64_t frs3, uint64_t rm)
> > +{
> > +    require_fp;
> > +    set_float_rounding_mode(RM, &env->fp_status);
> > +    frs1 = float32_muladd(frs1 ^ (uint32_t)INT32_MIN, frs2,
> > +                          frs3 ^ (uint32_t)INT32_MIN, 0,
> &env->fp_status);
>
> float_muladd_negate_c | float_muladd_negate_product
>
> > +uint64_t helper_fmin_s(CPURISCVState *env, uint64_t frs1, uint64_t frs2)
> > +{
> > +    require_fp;
> > +    frs1 = float32_minnum(frs1, frs2, &env->fp_status);
>
> If you want minnum and not min, riscv-spec-v2.2.pdf could use some more
> verbage
> to specify that.
>
> > +/* adapted from spike */
> > +#define isNaNF32UI(ui) (0xFF000000 < (uint32_t)((uint_fast32_t)ui << 1))
>
> float32_is_any_nan
>
> > +#define signF32UI(a) ((bool)((uint32_t)a >> 31))
>
> float32_is_neg
>
> > +#define expF32UI(a) ((int_fast16_t)(a >> 23) & 0xFF)
> > +#define fracF32UI(a) (a & 0x007FFFFF)
>
> Either float32_is_infinity or float32_is_zero_or_denormal.
>
> > +union ui32_f32 { uint32_t ui; uint32_t f; };
>
> This is just silly.
>
> > +uint_fast16_t float32_classify(uint32_t a, float_status *status)
>
> Please drop *_fast*_t for just "int".
>
> > +    return
> > +        (sign && infOrNaN && fracF32UI(uiA) == 0)           << 0 |
> > +        (sign && !infOrNaN && !subnormalOrZero)             << 1 |
> > +        (sign && subnormalOrZero && fracF32UI(uiA))         << 2 |
> > +        (sign && subnormalOrZero && fracF32UI(uiA) == 0)    << 3 |
> > +        (!sign && infOrNaN && fracF32UI(uiA) == 0)          << 7 |
> > +        (!sign && !infOrNaN && !subnormalOrZero)            << 6 |
> > +        (!sign && subnormalOrZero && fracF32UI(uiA))        << 5 |
> > +        (!sign && subnormalOrZero && fracF32UI(uiA) == 0)   << 4 |
> > +        (isNaNF32UI(uiA) &&  float32_is_signaling_nan(uiA, status)) <<
> 8 |
> > +        (isNaNF32UI(uiA) && !float32_is_signaling_nan(uiA, status)) <<
> 9;
>
> These conditions are all exclusive.  You do not need to compute all of
> them.
>
>   if (float32_is_any_nan(f)) {
>     if (float32_is_signaling_nan(f, status)) {
>       return 1 << 8;
>     } else {
>       return 1 << 9;
>     }
>   }
>   if (float32_is_neg(f)) {
>     if (float32_is_infinity(f)) {
>       return 1 << 0;
>     } else if (float32_is_zero(f)) {
>       return 1 << 3;
>     } else if (float32_is_zero_or_denormal(f)) {
>       return 1 << 2;
>     } else {
>       return 1 << 1;
>     }
>   } else {
>     ...
>   }
>
> > +    frs1 = (int64_t)((int32_t)float64_to_int32(frs1, &env->fp_status));
>
> Double cast is pointless, as are the extra parenthesis.
>
> > +/* adapted from spike */
> > +#define isNaNF64UI(ui) (UINT64_C(0xFFE0000000000000) \
> > +                       < (uint64_t)((uint_fast64_t)ui << 1))
> > +#define signF64UI(a) ((bool)((uint64_t) a >> 63))
> > +#define expF64UI(a) ((int_fast16_t)(a >> 52) & 0x7FF)
> > +#define fracF64UI(a) (a & UINT64_C(0x000FFFFFFFFFFFFF))
> > +
> > +union ui64_f64 { uint64_t ui; uint64_t f; };
> > +
> > +uint_fast16_t float64_classify(uint64_t a, float_status *status)
>
> Likewise.
>
>
> r~
>

Michael Clark Jan. 23, 2018, 11:15 p.m. UTC | #3

On Wed, Jan 3, 2018 at 12:10 PM, Richard Henderson <
richard.henderson@linaro.org> wrote:

> On 01/02/2018 04:44 PM, Michael Clark wrote:
> > +/* convert RISC-V rounding mode to IEEE library numbers */
> > +unsigned int ieee_rm[] = {
>
> static const.
>
> > +/* obtain rm value to use in computation
> > + * as the last step, convert rm codes to what the softfloat library
> expects
> > + * Adapted from Spike's decode.h:RM
> > + */
> > +#define RM ({                                             \
> > +if (rm == 7) {                                            \
> > +    rm = env->frm;                               \
> > +}                                                         \
> > +if (rm > 4) {                                             \
> > +    helper_raise_exception(env, RISCV_EXCP_ILLEGAL_INST); \
> > +}                                                         \
> > +ieee_rm[rm]; })
>
> Please use inline functions and not these macros.
>
> Probably this applies to some of the helpers that I've already reviewed,
> but
> you're going to need to use an exception raising function that also
> performs an
> unwind (usually via cpu_loop_exit_restore).  The GETPC() that feeds the
> unwind
> must be placed in the outer-most helper (including inlines).
>
> > +#ifndef CONFIG_USER_ONLY
> > +#define require_fp if (!(env->mstatus & MSTATUS_FS)) { \
> > +    helper_raise_exception(env, RISCV_EXCP_ILLEGAL_INST); \
> > +}
>
> If you included MSTATUS_FS in cpu_get_tb_cpu_state flags, then you could be
> checking for this at translation time instead of at run time.
>
> > +/* convert softfloat library flag numbers to RISC-V */
> > +unsigned int softfloat_flags_to_riscv(unsigned int flags)
> > +{
> > +    int rv_flags = 0;
> > +    rv_flags |= (flags & float_flag_inexact) ? 1 : 0;
> > +    rv_flags |= (flags & float_flag_underflow) ? 2 : 0;
> > +    rv_flags |= (flags & float_flag_overflow) ? 4 : 0;
> > +    rv_flags |= (flags & float_flag_divbyzero) ? 8 : 0;
> > +    rv_flags |= (flags & float_flag_invalid) ? 16 : 0;
>
> FPEXC_NX et al.
>
> > +/* adapted from Spike's decode.h:set_fp_exceptions */
> > +#define set_fp_exceptions() do { \
> > +    env->fflags |= softfloat_flags_to_riscv(get_float_exception_flags(\
> > +                            &env->fp_status)); \
> > +    set_float_exception_flags(0, &env->fp_status); \
> > +} while (0)
>
> inline function.  Usually written as
>
>   int flags = get_float_exception_flags(&env->fp_status);
>   if (flags) {
>     set_float_exception_flags(0, &env->fp_status);
>     env->fflags |= softfloat_flags_to_riscv(flags);
>   }
>
> since we really do expect exceptions to be exceptional.
>
> > +uint64_t helper_fmsub_s(CPURISCVState *env, uint64_t frs1, uint64_t
> frs2,
> > +                        uint64_t frs3, uint64_t rm)
> > +{
> > +    require_fp;
> > +    set_float_rounding_mode(RM, &env->fp_status);
> > +    frs1 = float32_muladd(frs1, frs2, frs3 ^ (uint32_t)INT32_MIN, 0,
> > +                          &env->fp_status);
>
> Given that RISC-V always returns a default NaN, you obviously do not care
> about
> the sign of a NaN result.  Therefore you should use float_muladd_negate_c
> as
> the fourth argument here and not perform the sign flip manually.


We do care about the sign of NaN results.

Jim Wilson spotted this bug and removed a call to set_default_nan_mode

https://github.com/riscv/riscv-qemu/commit/4223d89b0c5c671332d66bcd649db5c6f46559f5


> > +uint64_t helper_fnmsub_s(CPURISCVState *env, uint64_t frs1, uint64_t
> frs2,
> > +                         uint64_t frs3, uint64_t rm)
> > +{
> > +    require_fp;
> > +    set_float_rounding_mode(RM, &env->fp_status);
> > +    frs1 = float32_muladd(frs1 ^ (uint32_t)INT32_MIN, frs2, frs3, 0,
> > +                          &env->fp_status);
>
> float_muladd_negate_product.
>
> > +uint64_t helper_fnmadd_s(CPURISCVState *env, uint64_t frs1, uint64_t
> frs2,
> > +                         uint64_t frs3, uint64_t rm)
> > +{
> > +    require_fp;
> > +    set_float_rounding_mode(RM, &env->fp_status);
> > +    frs1 = float32_muladd(frs1 ^ (uint32_t)INT32_MIN, frs2,
> > +                          frs3 ^ (uint32_t)INT32_MIN, 0,
> &env->fp_status);
>
> float_muladd_negate_c | float_muladd_negate_product
>
> > +uint64_t helper_fmin_s(CPURISCVState *env, uint64_t frs1, uint64_t frs2)
> > +{
> > +    require_fp;
> > +    frs1 = float32_minnum(frs1, frs2, &env->fp_status);
>
> If you want minnum and not min, riscv-spec-v2.2.pdf could use some more
> verbage
> to specify that.


We'll look at Spike (riscv-isa-sim)... Spike has the correct behavior.


> > +/* adapted from spike */
> > +#define isNaNF32UI(ui) (0xFF000000 < (uint32_t)((uint_fast32_t)ui << 1))
>
> float32_is_any_nan
>
> > +#define signF32UI(a) ((bool)((uint32_t)a >> 31))
>
> float32_is_neg
>
> > +#define expF32UI(a) ((int_fast16_t)(a >> 23) & 0xFF)
> > +#define fracF32UI(a) (a & 0x007FFFFF)
>
> Either float32_is_infinity or float32_is_zero_or_denormal.
>
> > +union ui32_f32 { uint32_t ui; uint32_t f; };
>
> This is just silly.
>
> > +uint_fast16_t float32_classify(uint32_t a, float_status *status)
>
> Please drop *_fast*_t for just "int".
>
> > +    return
> > +        (sign && infOrNaN && fracF32UI(uiA) == 0)           << 0 |
> > +        (sign && !infOrNaN && !subnormalOrZero)             << 1 |
> > +        (sign && subnormalOrZero && fracF32UI(uiA))         << 2 |
> > +        (sign && subnormalOrZero && fracF32UI(uiA) == 0)    << 3 |
> > +        (!sign && infOrNaN && fracF32UI(uiA) == 0)          << 7 |
> > +        (!sign && !infOrNaN && !subnormalOrZero)            << 6 |
> > +        (!sign && subnormalOrZero && fracF32UI(uiA))        << 5 |
> > +        (!sign && subnormalOrZero && fracF32UI(uiA) == 0)   << 4 |
> > +        (isNaNF32UI(uiA) &&  float32_is_signaling_nan(uiA, status)) <<
> 8 |
> > +        (isNaNF32UI(uiA) && !float32_is_signaling_nan(uiA, status)) <<
> 9;
>
> These conditions are all exclusive.  You do not need to compute all of
> them.
>
>   if (float32_is_any_nan(f)) {
>     if (float32_is_signaling_nan(f, status)) {
>       return 1 << 8;
>     } else {
>       return 1 << 9;
>     }
>   }
>   if (float32_is_neg(f)) {
>     if (float32_is_infinity(f)) {
>       return 1 << 0;
>     } else if (float32_is_zero(f)) {
>       return 1 << 3;
>     } else if (float32_is_zero_or_denormal(f)) {
>       return 1 << 2;
>     } else {
>       return 1 << 1;
>     }
>   } else {
>     ...
>   }
>
> > +    frs1 = (int64_t)((int32_t)float64_to_int32(frs1, &env->fp_status));
>
> Double cast is pointless, as are the extra parenthesis.
>
> > +/* adapted from spike */
> > +#define isNaNF64UI(ui) (UINT64_C(0xFFE0000000000000) \
> > +                       < (uint64_t)((uint_fast64_t)ui << 1))
> > +#define signF64UI(a) ((bool)((uint64_t) a >> 63))
> > +#define expF64UI(a) ((int_fast16_t)(a >> 52) & 0x7FF)
> > +#define fracF64UI(a) (a & UINT64_C(0x000FFFFFFFFFFFFF))
> > +
> > +union ui64_f64 { uint64_t ui; uint64_t f; };
> > +
> > +uint_fast16_t float64_classify(uint64_t a, float_status *status)
>
> Likewise.
>
>
> r~
>

Michael Clark Jan. 23, 2018, 11:35 p.m. UTC | #4

On Tue, Jan 23, 2018 at 3:15 PM, Michael Clark <mjc@sifive.com> wrote:

>
>
> On Wed, Jan 3, 2018 at 12:10 PM, Richard Henderson <
> richard.henderson@linaro.org> wrote:
>
>> On 01/02/2018 04:44 PM, Michael Clark wrote:
>> > +/* convert RISC-V rounding mode to IEEE library numbers */
>> > +unsigned int ieee_rm[] = {
>>
>> static const.
>>
>> > +/* obtain rm value to use in computation
>> > + * as the last step, convert rm codes to what the softfloat library
>> expects
>> > + * Adapted from Spike's decode.h:RM
>> > + */
>> > +#define RM ({                                             \
>> > +if (rm == 7) {                                            \
>> > +    rm = env->frm;                               \
>> > +}                                                         \
>> > +if (rm > 4) {                                             \
>> > +    helper_raise_exception(env, RISCV_EXCP_ILLEGAL_INST); \
>> > +}                                                         \
>> > +ieee_rm[rm]; })
>>
>> Please use inline functions and not these macros.
>>
>> Probably this applies to some of the helpers that I've already reviewed,
>> but
>> you're going to need to use an exception raising function that also
>> performs an
>> unwind (usually via cpu_loop_exit_restore).  The GETPC() that feeds the
>> unwind
>> must be placed in the outer-most helper (including inlines).
>>
>> > +#ifndef CONFIG_USER_ONLY
>> > +#define require_fp if (!(env->mstatus & MSTATUS_FS)) { \
>> > +    helper_raise_exception(env, RISCV_EXCP_ILLEGAL_INST); \
>> > +}
>>
>> If you included MSTATUS_FS in cpu_get_tb_cpu_state flags, then you could
>> be
>> checking for this at translation time instead of at run time.
>>
>> > +/* convert softfloat library flag numbers to RISC-V */
>> > +unsigned int softfloat_flags_to_riscv(unsigned int flags)
>> > +{
>> > +    int rv_flags = 0;
>> > +    rv_flags |= (flags & float_flag_inexact) ? 1 : 0;
>> > +    rv_flags |= (flags & float_flag_underflow) ? 2 : 0;
>> > +    rv_flags |= (flags & float_flag_overflow) ? 4 : 0;
>> > +    rv_flags |= (flags & float_flag_divbyzero) ? 8 : 0;
>> > +    rv_flags |= (flags & float_flag_invalid) ? 16 : 0;
>>
>> FPEXC_NX et al.
>>
>> > +/* adapted from Spike's decode.h:set_fp_exceptions */
>> > +#define set_fp_exceptions() do { \
>> > +    env->fflags |= softfloat_flags_to_riscv(get_f
>> loat_exception_flags(\
>> > +                            &env->fp_status)); \
>> > +    set_float_exception_flags(0, &env->fp_status); \
>> > +} while (0)
>>
>> inline function.  Usually written as
>>
>>   int flags = get_float_exception_flags(&env->fp_status);
>>   if (flags) {
>>     set_float_exception_flags(0, &env->fp_status);
>>     env->fflags |= softfloat_flags_to_riscv(flags);
>>   }
>>
>> since we really do expect exceptions to be exceptional.
>>
>> > +uint64_t helper_fmsub_s(CPURISCVState *env, uint64_t frs1, uint64_t
>> frs2,
>> > +                        uint64_t frs3, uint64_t rm)
>> > +{
>> > +    require_fp;
>> > +    set_float_rounding_mode(RM, &env->fp_status);
>> > +    frs1 = float32_muladd(frs1, frs2, frs3 ^ (uint32_t)INT32_MIN, 0,
>> > +                          &env->fp_status);
>>
>> Given that RISC-V always returns a default NaN, you obviously do not care
>> about
>> the sign of a NaN result.  Therefore you should use float_muladd_negate_c
>> as
>> the fourth argument here and not perform the sign flip manually.
>
>
> We do care about the sign of NaN results.
>
> Jim Wilson spotted this bug and removed a call to set_default_nan_mode
>
> https://github.com/riscv/riscv-qemu/commit/4223d89b0c5c671332d66bcd649db5
> c6f46559f5
>
>
>> > +uint64_t helper_fnmsub_s(CPURISCVState *env, uint64_t frs1, uint64_t
>> frs2,
>> > +                         uint64_t frs3, uint64_t rm)
>> > +{
>> > +    require_fp;
>> > +    set_float_rounding_mode(RM, &env->fp_status);
>> > +    frs1 = float32_muladd(frs1 ^ (uint32_t)INT32_MIN, frs2, frs3, 0,
>> > +                          &env->fp_status);
>>
>> float_muladd_negate_product.
>>
>> > +uint64_t helper_fnmadd_s(CPURISCVState *env, uint64_t frs1, uint64_t
>> frs2,
>> > +                         uint64_t frs3, uint64_t rm)
>> > +{
>> > +    require_fp;
>> > +    set_float_rounding_mode(RM, &env->fp_status);
>> > +    frs1 = float32_muladd(frs1 ^ (uint32_t)INT32_MIN, frs2,
>> > +                          frs3 ^ (uint32_t)INT32_MIN, 0,
>> &env->fp_status);
>>
>> float_muladd_negate_c | float_muladd_negate_product
>>
>> > +uint64_t helper_fmin_s(CPURISCVState *env, uint64_t frs1, uint64_t
>> frs2)
>> > +{
>> > +    require_fp;
>> > +    frs1 = float32_minnum(frs1, frs2, &env->fp_status);
>>
>> If you want minnum and not min, riscv-spec-v2.2.pdf could use some more
>> verbage
>> to specify that.
>
>
> We'll look at Spike (riscv-isa-sim)... Spike has the correct behavior.
>

We want minimum number (minnum). It's been added to the draft spec and will
be in riscv-spec-v2.3.pdf

Section 8.6 in the draft spec says:

"
Floating-point minimum-number and maximum-number instructions FMIN.S and
FMAX.S write, respectively, the smaller or larger of rs1 and rs2 to rd. For
the purposes of these instructions only, the value -0:0 is considered to be
less than the value +0:0. If both inputs are NaNs, the result is the
canonical NaN. If only one operand is a NaN, the result is the non-NaN
operand. Signaling NaN inputs raise the invalid operation exception, even
when the result is not NaN.
"


> +/* adapted from spike */
>> > +#define isNaNF32UI(ui) (0xFF000000 < (uint32_t)((uint_fast32_t)ui <<
>> 1))
>>
>> float32_is_any_nan
>>
>> > +#define signF32UI(a) ((bool)((uint32_t)a >> 31))
>>
>> float32_is_neg
>>
>> > +#define expF32UI(a) ((int_fast16_t)(a >> 23) & 0xFF)
>> > +#define fracF32UI(a) (a & 0x007FFFFF)
>>
>> Either float32_is_infinity or float32_is_zero_or_denormal.
>>
>> > +union ui32_f32 { uint32_t ui; uint32_t f; };
>>
>> This is just silly.
>>
>> > +uint_fast16_t float32_classify(uint32_t a, float_status *status)
>>
>> Please drop *_fast*_t for just "int".
>>
>> > +    return
>> > +        (sign && infOrNaN && fracF32UI(uiA) == 0)           << 0 |
>> > +        (sign && !infOrNaN && !subnormalOrZero)             << 1 |
>> > +        (sign && subnormalOrZero && fracF32UI(uiA))         << 2 |
>> > +        (sign && subnormalOrZero && fracF32UI(uiA) == 0)    << 3 |
>> > +        (!sign && infOrNaN && fracF32UI(uiA) == 0)          << 7 |
>> > +        (!sign && !infOrNaN && !subnormalOrZero)            << 6 |
>> > +        (!sign && subnormalOrZero && fracF32UI(uiA))        << 5 |
>> > +        (!sign && subnormalOrZero && fracF32UI(uiA) == 0)   << 4 |
>> > +        (isNaNF32UI(uiA) &&  float32_is_signaling_nan(uiA, status)) <<
>> 8 |
>> > +        (isNaNF32UI(uiA) && !float32_is_signaling_nan(uiA, status)) <<
>> 9;
>>
>> These conditions are all exclusive.  You do not need to compute all of
>> them.
>>
>>   if (float32_is_any_nan(f)) {
>>     if (float32_is_signaling_nan(f, status)) {
>>       return 1 << 8;
>>     } else {
>>       return 1 << 9;
>>     }
>>   }
>>   if (float32_is_neg(f)) {
>>     if (float32_is_infinity(f)) {
>>       return 1 << 0;
>>     } else if (float32_is_zero(f)) {
>>       return 1 << 3;
>>     } else if (float32_is_zero_or_denormal(f)) {
>>       return 1 << 2;
>>     } else {
>>       return 1 << 1;
>>     }
>>   } else {
>>     ...
>>   }
>>
>> > +    frs1 = (int64_t)((int32_t)float64_to_int32(frs1,
>> &env->fp_status));
>>
>> Double cast is pointless, as are the extra parenthesis.
>>
>> > +/* adapted from spike */
>> > +#define isNaNF64UI(ui) (UINT64_C(0xFFE0000000000000) \
>> > +                       < (uint64_t)((uint_fast64_t)ui << 1))
>> > +#define signF64UI(a) ((bool)((uint64_t) a >> 63))
>> > +#define expF64UI(a) ((int_fast16_t)(a >> 52) & 0x7FF)
>> > +#define fracF64UI(a) (a & UINT64_C(0x000FFFFFFFFFFFFF))
>> > +
>> > +union ui64_f64 { uint64_t ui; uint64_t f; };
>> > +
>> > +uint_fast16_t float64_classify(uint64_t a, float_status *status)
>>
>> Likewise.
>>
>>
>> r~
>>
>
>

Richard Henderson Jan. 24, 2018, 12:01 a.m. UTC | #5

On 01/23/2018 01:37 PM, Michael Clark wrote:
> 
> 
> On Wed, Jan 3, 2018 at 12:10 PM, Richard Henderson
> <richard.henderson@linaro.org <mailto:richard.henderson@linaro.org>> wrote:
> 
>     On 01/02/2018 04:44 PM, Michael Clark wrote:
>     > +/* convert RISC-V rounding mode to IEEE library numbers */
>     > +unsigned int ieee_rm[] = {
> 
>     static const.
> 
> 
> Done. 
> 
>     > +/* obtain rm value to use in computation
>     > + * as the last step, convert rm codes to what the softfloat library expects
>     > + * Adapted from Spike's decode.h:RM
>     > + */
>     > +#define RM ({                                             \
>     > +if (rm == 7) {                                            \
>     > +    rm = env->frm;                               \
>     > +}                                                         \
>     > +if (rm > 4) {                                             \
>     > +    helper_raise_exception(env, RISCV_EXCP_ILLEGAL_INST); \
>     > +}                                                         \
>     > +ieee_rm[rm]; })
> 
>     Please use inline functions and not these macros.
> 
> 
> Done.
> 
> In fact the previous code would, with normal control flow, dereference
> ieee_rm[rm] with an out of bounds round mode so assuming helper_raise_exception
> does a longjmp, i've inserted g_assert_not_reached() after the exception. An
> analyser could detect that ieee_rm has an out of bounds access assuming normal
> control flow. e.g.
> 
> static inline void set_fp_round_mode(CPURISCVState *env, uint64_t rm)
> {
>     if (rm == 7) {
>         rm = env->frm;
>     } else if (rm > 4) {
>         helper_raise_exception(env, RISCV_EXCP_ILLEGAL_INST);
>         g_assert_not_reached();

Yes, raise_exception exits via longjmp.  This is a good change.

> Are translations not implicitly indexed by cpu_mmu_index? i.e. do we also need
> to add cpu_mmu_index to cpu_get_tb_cpu_state flags to prevent code translated
> for one value of mmu_index running in code with another value of mmu_index (in
> the case of riscv, we currently return processor mode / privilege level in
> cpu_mmu_index).

No, there is no implicit indexing.  That's why I've mentioned exactly this at
least twice in review so far.

There are two ways to do this correctly.  One is to add all the bits that
specify processor state (e.g. Alpha stores pal_code bit and supervisor bit).
Another is to actually encode the mmu_idx into the flags (e.g. ARM).


r~

Jim Wilson Jan. 24, 2018, 12:03 a.m. UTC | #6

On Tue, Jan 23, 2018 at 3:35 PM, Michael Clark <mjc@sifive.com> wrote:
> We want minimum number (minnum). It's been added to the draft spec and will
> be in riscv-spec-v2.3.pdf

In the preface of the draft, it says
• Defined the signed-zero behavior of FMIN.fmt and FMAX.fmt, and
changed their behavior on
signaling-NaN inputs to conform to the minimumNumber and maximumNumber
operations
in the proposed IEEE 754-201x specification.

But if qemu doesn't have minimumNumber support yet, then yes minNum is correct.

This is discussed a bit here
    https://github.com/riscv/riscv-isa-manual/issues/65

Jim

Richard Henderson Jan. 24, 2018, 12:15 a.m. UTC | #7

On 01/23/2018 03:15 PM, Michael Clark wrote:
>     > +uint64_t helper_fmsub_s(CPURISCVState *env, uint64_t frs1, uint64_t frs2,
>     > +                        uint64_t frs3, uint64_t rm)
>     > +{
>     > +    require_fp;
>     > +    set_float_rounding_mode(RM, &env->fp_status);
>     > +    frs1 = float32_muladd(frs1, frs2, frs3 ^ (uint32_t)INT32_MIN, 0,
>     > +                          &env->fp_status);
> 
>     Given that RISC-V always returns a default NaN, you obviously do not care about
>     the sign of a NaN result.  Therefore you should use float_muladd_negate_c as
>     the fourth argument here and not perform the sign flip manually.
> 
> 
> We do care about the sign of NaN results.
> 
> Jim Wilson spotted this bug and removed a call to set_default_nan_mode
> 
> https://github.com/riscv/riscv-qemu/commit/4223d89b0c5c671332d66bcd649db5c6f46559f5

Ok.  Now it depends on what result you care about for madd specifically.

If, like x86 and Power, fmsub returns the (silenced) original input NaN, you
want the float_muladd_* flags.

If, like ARM, fmsub returns the (silenced) negated input NaN, then you do need
to change sign externally.  If this is the case, please use float32_chs instead
of open-coding it with xor.


r~

Michael Clark Jan. 24, 2018, 1:31 a.m. UTC | #8

On Tue, Jan 23, 2018 at 4:01 PM, Richard Henderson <
richard.henderson@linaro.org> wrote:

> On 01/23/2018 01:37 PM, Michael Clark wrote:
> >
> >
> > On Wed, Jan 3, 2018 at 12:10 PM, Richard Henderson
> > <richard.henderson@linaro.org <mailto:richard.henderson@linaro.org>>
> wrote:
> >
> >     On 01/02/2018 04:44 PM, Michael Clark wrote:
> >     > +/* convert RISC-V rounding mode to IEEE library numbers */
> >     > +unsigned int ieee_rm[] = {
> >
> >     static const.
> >
> >
> > Done.
> >
> >     > +/* obtain rm value to use in computation
> >     > + * as the last step, convert rm codes to what the softfloat
> library expects
> >     > + * Adapted from Spike's decode.h:RM
> >     > + */
> >     > +#define RM ({                                             \
> >     > +if (rm == 7) {                                            \
> >     > +    rm = env->frm;                               \
> >     > +}                                                         \
> >     > +if (rm > 4) {                                             \
> >     > +    helper_raise_exception(env, RISCV_EXCP_ILLEGAL_INST); \
> >     > +}                                                         \
> >     > +ieee_rm[rm]; })
> >
> >     Please use inline functions and not these macros.
> >
> >
> > Done.
> >
> > In fact the previous code would, with normal control flow, dereference
> > ieee_rm[rm] with an out of bounds round mode so assuming
> helper_raise_exception
> > does a longjmp, i've inserted g_assert_not_reached() after the
> exception. An
> > analyser could detect that ieee_rm has an out of bounds access assuming
> normal
> > control flow. e.g.
> >
> > static inline void set_fp_round_mode(CPURISCVState *env, uint64_t rm)
> > {
> >     if (rm == 7) {
> >         rm = env->frm;
> >     } else if (rm > 4) {
> >         helper_raise_exception(env, RISCV_EXCP_ILLEGAL_INST);
> >         g_assert_not_reached();
>
> Yes, raise_exception exits via longjmp.  This is a good change.
>
> > Are translations not implicitly indexed by cpu_mmu_index? i.e. do we
> also need
> > to add cpu_mmu_index to cpu_get_tb_cpu_state flags to prevent code
> translated
> > for one value of mmu_index running in code with another value of
> mmu_index (in
> > the case of riscv, we currently return processor mode / privilege level
> in
> > cpu_mmu_index).
>
> No, there is no implicit indexing.  That's why I've mentioned exactly this
> at
> least twice in review so far.
>
> There are two ways to do this correctly.  One is to add all the bits that
> specify processor state (e.g. Alpha stores pal_code bit and supervisor
> bit).
> Another is to actually encode the mmu_idx into the flags (e.g. ARM).

I reviewed Stefan's patches. Stefan has some code that encoded processor
flags in mmu_index however it removed some well-tested code paths and I was
hesitant to make such a large change to get_physical_address at this point
in time. I read Stefan's code and used some fragments from it but was
relatively cautious with regard to changing relatively well tested code
paths. I tried to make the most minimal change for the sake of correctness.

For the meantime we've greatly simplified cpu_mmu_index to just return the
processor mode as well as adding the processor mode to cpu_get_tb_cpu_state
flags. cpu_mmu_index previously returned a permutation of env->priv
(containing processer mode), mstatus.MPP (previous mode) and mstatus.MPRV.
When MPRV is set, M mode loads and stores operate as per the mode in
mstatus.MPP and the previous cpu_mmu_index function returned the mode the
processor should do loads and stores as, not the current mode. This is
problematic as mstatus.MPRV can be altered from user code via register
values so there was a potential to re-enter a pre-existing trace with a
different mmu_index. I believe we have eliminated that issue.

These two changes should fix the issue and we've performed testing with
these patches:

-
https://github.com/michaeljclark/riscv-qemu/commit/82012aef90e5c4500f926523b3b2ff621b0cd512
-
https://github.com/michaeljclark/riscv-qemu/commit/abdb897a4a607d00cfce577ac37ca6119004658f

There is a possibility to further improve this code and potentially avoid
TLB flushes, by encoding more state in cpu_mmu_index
and cpu_get_tb_cpu_state flags. I would say that mstatus.SUM flag is
altered frequently in Linux's copy_to/from_user and is likely to result in
decent performance improvement if we can eliminate the TLB flush when the
SUM bit is changed (it is in concept similar to SMAP on x86). mstatus.MPRV
is used by the monitor for handling misaligned loads and stores which
should be relatively rare (the occasional use of structs with
__attribute__((packed))). Eliminating the tlb_flush on mstatus.SUM changes
likely will have the most positive performance impact

We have increased performance somewhat already (~2.5X faster on dd
if=/dev/zero of=/dev/null bs=1 count=100000). We eliminated a tlb_flush on
privilege level changes as this is included (now solely) in cpu_mmu_index,
and any helper that changes privilege level terminates translation so
mmu_index should now always be consistent. Possibly not quite as fast as
Stefan's original patch but an improvement nevertheless. The changes are
somewhat smaller.

The code is in a state where the mmu_index invariant is now very easy to
reason about. Any helper that changes mstatus or priv level terminates
traces [1] and cpu_mmu_index is included in cpu_get_tb_cpu_state flags.
Stefan may very well propose some incremental changes to eliminate some
more TLB flushes.

[1]
https://github.com/riscv/riscv-qemu/blob/ef7332fc693d386f572169b4011a84cfc5c02ac0/target/riscv/translate.c#L1420

During the process, we also found some bugs in the accessed/dirty bit
handling in get_physical_address. i.e. when the accessed/dirty bits don't
change, we now elide the store. This allows various use cases such as PTE
entries in ROM. We coalesced some of the flag
setting in get_physical_address based on Stefan's patch
(PAGE_READ+PAGE_EXEC) which likely reduces the number of TLB misses, but we
don't set PAGE_WRITE unless the access_type is MMU_DATA_STORE. The previous
code would only set the TLB prot flag for the specific access type. We set
PAGE_READ+PAGE_EXEC based on the PTE flags for load and fetch but we don't
set PAGE_WRITE on loads or fetches because we want a TLB miss for
subsequent stores so we don't miss updating the dirty bit if the first
access on a RW page is a load or fetch. I saw code in i386 that does
essentially the same thing.

We still need to make PTE updates atomic but given we only have multiplexed
SMP, it should not be too much of an issue right now.

Richard Henderson Jan. 24, 2018, 4:16 p.m. UTC | #9

On 01/23/2018 05:31 PM, Michael Clark wrote:
> For the meantime we've greatly simplified cpu_mmu_index to just return the
> processor mode as well as adding the processor mode to cpu_get_tb_cpu_state
> flags. cpu_mmu_index previously returned a permutation of env->priv (containing
> processer mode), mstatus.MPP (previous mode) and mstatus.MPRV. When MPRV is
> set, M mode loads and stores operate as per the mode in mstatus.MPP and the
> previous cpu_mmu_index function returned the mode the processor should do loads
> and stores as, not the current mode. This is problematic as mstatus.MPRV can be
> altered from user code via register values so there was a potential to re-enter
> a pre-existing trace with a different mmu_index. I believe we have eliminated
> that issue.
> 
> These two changes should fix the issue and we've performed testing with these
> patches:
> 
> -
> https://github.com/michaeljclark/riscv-qemu/commit/82012aef90e5c4500f926523b3b2ff621b0cd512
> - https://github.com/michaeljclark/riscv-qemu/commit/abdb897a4a607d00cfce577ac37ca6119004658f

I had been concerned, because solely within the context of these patches I
didn't see the additional flushing being added.  However, on checking out the
branch I see that they are all there on changing mstatus.

No need for it immediately, but as an incremental improvement you can use
targeted flushing.  E.g. when only MPRV changes, only PRV_M's mmu_idx is
invalidated; PRV_S and PRV_U are unaffected.  For this, use tlb_flush_by_mmuidx.

> During the process, we also found some bugs in the accessed/dirty bit handling
> in get_physical_address. i.e. when the accessed/dirty bits don't change, we now
> elide the store. This allows various use cases such as PTE entries in ROM. We
> coalesced some of the flag setting in get_physical_address based on Stefan's
> patch (PAGE_READ+PAGE_EXEC) which likely reduces the number of TLB misses, but
> we don't set PAGE_WRITE unless the access_type is MMU_DATA_STORE. The previous
> code would only set the TLB prot flag for the specific access type. We set
> PAGE_READ+PAGE_EXEC based on the PTE flags for load and fetch but we don't set
> PAGE_WRITE on loads or fetches because we want a TLB miss for subsequent stores
> so we don't miss updating the dirty bit if the first access on a RW page is a
> load or fetch. I saw code in i386 that does essentially the same thing.

You should still set PAGE_WRITE for MMU_DATA_LOAD if the dirty bit is already
set.  Consider:

Addresses A and B alias in the TLB.  Since our tlb implementation is of
necessity trivial, an access to B will remove A from the table.  Assuming both
pages are initially dirty, the access sequence

	load A
	load B
	store B

will fault 3 times with your code, but only twice if you consider the dirty bit
right away.

> We still need to make PTE updates atomic but given we only have multiplexed
> SMP, it should not be too much of an issue right now.

I'm sure that enabling multi-threading will be high on the list of post-merge
improvements.  There's certainly a lot of bang to be had there.

r~

Michael Clark Jan. 24, 2018, 5:35 p.m. UTC | #10

On Wed, Jan 24, 2018 at 8:16 AM, Richard Henderson <
richard.henderson@linaro.org> wrote:

> On 01/23/2018 05:31 PM, Michael Clark wrote:
> > For the meantime we've greatly simplified cpu_mmu_index to just return
> the
> > processor mode as well as adding the processor mode to
> cpu_get_tb_cpu_state
> > flags. cpu_mmu_index previously returned a permutation of env->priv
> (containing
> > processer mode), mstatus.MPP (previous mode) and mstatus.MPRV. When MPRV
> is
> > set, M mode loads and stores operate as per the mode in mstatus.MPP and
> the
> > previous cpu_mmu_index function returned the mode the processor should
> do loads
> > and stores as, not the current mode. This is problematic as mstatus.MPRV
> can be
> > altered from user code via register values so there was a potential to
> re-enter
> > a pre-existing trace with a different mmu_index. I believe we have
> eliminated
> > that issue.
> >
> > These two changes should fix the issue and we've performed testing with
> these
> > patches:
> >
> > -
> > https://github.com/michaeljclark/riscv-qemu/commit/
> 82012aef90e5c4500f926523b3b2ff621b0cd512
> > - https://github.com/michaeljclark/riscv-qemu/commit/
> abdb897a4a607d00cfce577ac37ca6119004658f
>
>
> I had been concerned, because solely within the context of these patches I
> didn't see the additional flushing being added.  However, on checking out
> the
> branch I see that they are all there on changing mstatus.
>
> No need for it immediately, but as an incremental improvement you can use
> targeted flushing.  E.g. when only MPRV changes, only PRV_M's mmu_idx is
> invalidated; PRV_S and PRV_U are unaffected.  For this, use
> tlb_flush_by_mmuidx.


Right. I saw those APIs. I experimented with a patch that did this but
didn't see a noticeable performance improvement so didn't pursue it.


> > During the process, we also found some bugs in the accessed/dirty bit
> handling
> > in get_physical_address. i.e. when the accessed/dirty bits don't change,
> we now
> > elide the store. This allows various use cases such as PTE entries in
> ROM. We
> > coalesced some of the flag setting in get_physical_address based on
> Stefan's
> > patch (PAGE_READ+PAGE_EXEC) which likely reduces the number of TLB
> misses, but
> > we don't set PAGE_WRITE unless the access_type is MMU_DATA_STORE. The
> previous
> > code would only set the TLB prot flag for the specific access type. We
> set
> > PAGE_READ+PAGE_EXEC based on the PTE flags for load and fetch but we
> don't set
> > PAGE_WRITE on loads or fetches because we want a TLB miss for subsequent
> stores
> > so we don't miss updating the dirty bit if the first access on a RW page
> is a
> > load or fetch. I saw code in i386 that does essentially the same thing.
>
> You should still set PAGE_WRITE for MMU_DATA_LOAD if the dirty bit is
> already
> set.  Consider:
>

Good spotting. I'll fix this case right now...


> Addresses A and B alias in the TLB.  Since our tlb implementation is of
> necessity trivial, an access to B will remove A from the table.  Assuming
> both
> pages are initially dirty, the access sequence
>
>         load A
>         load B
>         store B
>
> will fault 3 times with your code, but only twice if you consider the
> dirty bit
> right away.
>
> > We still need to make PTE updates atomic but given we only have
> multiplexed
> > SMP, it should not be too much of an issue right now.
>
> I'm sure that enabling multi-threading will be high on the list of
> post-merge
> improvements.  There's certainly a lot of bang to be had there.


Agree.

Jim Wilson Jan. 24, 2018, 6:58 p.m. UTC | #11

On Tue, Jan 23, 2018 at 4:15 PM, Richard Henderson
<richard.henderson@linaro.org> wrote:
> Ok.  Now it depends on what result you care about for madd specifically.
>
> If, like x86 and Power, fmsub returns the (silenced) original input NaN, you
> want the float_muladd_* flags.
>
> If, like ARM, fmsub returns the (silenced) negated input NaN, then you do need
> to change sign externally.  If this is the case, please use float32_chs instead
> of open-coding it with xor.

The ISA spec is a little ambiguous here.  There is text that says we
multiply, optionally negate the result, and then add or subtract the
addend.  However, this is followed by a sentence that gives equations,
and for fnmsub it says -rs1*rs2+rs3.  If we assume C semantics, then
they are negating rs1 not the multiply result.  This could potentially
give a different result if rs2 is a +qNaN and rs1/rs3 are not NaNs.
qemu is implementing what the equation says, not what the text says.
The definitive source would be the Spike simulator, and it agrees with
qemu and the equations.  The ISA spec does not specify what happens
when one of the operands is a NaN, except to mention that operations
comply with the IEEE 754 2008 standard.

I think that qemu is correct here, and that you want to use float32_chs.

Although, looking at this again, I see another statement in a
different place that says:

Except when otherwise stated, if the result of a floating-point
operation is NaN, it is the canonical
NaN. The canonical NaN has a positive sign and all significand bits
clear except the MSB, a.k.a.
the quiet bit. For single-precision floating-point, this corresponds
to the pattern 0x7fc00000.

So it sounds like maybe we do want default NaN support.  It appears
that spike is using default NaNs.  Unfortunately, enabling default NaN
support causes gcc and glibc testsuite failures which complicates
upstreaming glibc support, as they won't accept it unless we get
failures below a certain number.  Also, not having hardware to compare
against makes it impossible to determine if the simulator is doing
exactly what the hardware does.  This could take a little time to sort
out.

Jim

Richard Henderson Jan. 24, 2018, 11:47 p.m. UTC | #12

On 01/24/2018 10:58 AM, Jim Wilson wrote:
> I think that qemu is correct here, and that you want to use float32_chs.

Ok.

> Although, looking at this again, I see another statement in a
> different place that says:
> 
> Except when otherwise stated, if the result of a floating-point operation is
> NaN, it is the canonical NaN. The canonical NaN has a positive sign and all
> significand bits clear except the MSB, a.k.a. the quiet bit. For
> single-precision floating-point, this corresponds to the pattern
> 0x7fc00000.
Yes, I had read this before as well.  I had assumed that your patch constituted
an intended change to this text.

> This could take a little time to sort out.
Ok.  I don't see this as a blocking issue for merging.


r~

Jim Wilson Jan. 29, 2018, 8:33 p.m. UTC | #13

On Wed, Jan 24, 2018 at 3:47 PM, Richard Henderson
<richard.henderson@linaro.org> wrote:
> On 01/24/2018 10:58 AM, Jim Wilson wrote:
>> Although, looking at this again, I see another statement in a
>> different place that says:
>>
>> Except when otherwise stated, if the result of a floating-point operation is
>> NaN, it is the canonical NaN. The canonical NaN has a positive sign and all
>> significand bits clear except the MSB, a.k.a. the quiet bit. For
>> single-precision floating-point, this corresponds to the pattern
>> 0x7fc00000.
> Yes, I had read this before as well.  I had assumed that your patch constituted
> an intended change to this text.
>
>> This could take a little time to sort out.
> Ok.  I don't see this as a blocking issue for merging.

So after looking at this a bit more, I've come to the conclusion that
my patch to remove the default/canonical nan support from RISC-V qemu
was wrong.  We will have to fix this on the gcc/glibc side.

Michael, please revert my change
    https://github.com/riscv/riscv-qemu/commit/4223d89b0c5c671332d66bcd649db5c6f46559f5

Jim

Michael Clark Feb. 2, 2018, 5:26 a.m. UTC | #14

On Mon, Jan 29, 2018 at 12:33 PM, Jim Wilson <jimw@sifive.com> wrote:

> On Wed, Jan 24, 2018 at 3:47 PM, Richard Henderson
> <richard.henderson@linaro.org> wrote:
> > On 01/24/2018 10:58 AM, Jim Wilson wrote:
> >> Although, looking at this again, I see another statement in a
> >> different place that says:
> >>
> >> Except when otherwise stated, if the result of a floating-point
> operation is
> >> NaN, it is the canonical NaN. The canonical NaN has a positive sign and
> all
> >> significand bits clear except the MSB, a.k.a. the quiet bit. For
> >> single-precision floating-point, this corresponds to the pattern
> >> 0x7fc00000.
> > Yes, I had read this before as well.  I had assumed that your patch
> constituted
> > an intended change to this text.
> >
> >> This could take a little time to sort out.
> > Ok.  I don't see this as a blocking issue for merging.
>
> So after looking at this a bit more, I've come to the conclusion that
> my patch to remove the default/canonical nan support from RISC-V qemu
> was wrong.  We will have to fix this on the gcc/glibc side.
>
> Michael, please revert my change
>     https://github.com/riscv/riscv-qemu/commit/
> 4223d89b0c5c671332d66bcd649db5c6f46559f5


Done.

[v1,06/21] RISC-V FPU Support

Commit Message

Comments

Patch