diff mbox series

[2/4] target/riscv: Add right functions to set agnostic elements

Message ID 20240306092013.21231-3-eric.huang@linux.alibaba.com (mailing list archive)
State New, archived
Headers show
Series target/riscv: Fix the element agnostic function problem | expand

Commit Message

Huang Tao March 6, 2024, 9:20 a.m. UTC
We add vext_set_elems_1s to set agnostic elements to 1s in both big
and little endian situation.
In the function vext_set_elems_1s. We using esz argument to get the first
element to set. 'cnt' is just idx * esz.

Signed-off-by: Huang Tao <eric.huang@linux.alibaba.com>
---
 target/riscv/vector_internals.c | 53 +++++++++++++++++++++++++++++++++
 target/riscv/vector_internals.h |  2 ++
 2 files changed, 55 insertions(+)

Comments

Daniel Henrique Barboza March 19, 2024, 9:57 p.m. UTC | #1
(--- CCing Richard ---)

On 3/6/24 06:20, Huang Tao wrote:
> We add vext_set_elems_1s to set agnostic elements to 1s in both big
> and little endian situation.
> In the function vext_set_elems_1s. We using esz argument to get the first
> element to set. 'cnt' is just idx * esz.
> 
> Signed-off-by: Huang Tao <eric.huang@linux.alibaba.com>
> ---
>   target/riscv/vector_internals.c | 53 +++++++++++++++++++++++++++++++++
>   target/riscv/vector_internals.h |  2 ++
>   2 files changed, 55 insertions(+)
> 
> diff --git a/target/riscv/vector_internals.c b/target/riscv/vector_internals.c
> index 349b24f4ae..455be96996 100644
> --- a/target/riscv/vector_internals.c
> +++ b/target/riscv/vector_internals.c
> @@ -20,6 +20,59 @@
>   #include "vector_internals.h"
>   
>   /* set agnostic elements to 1s */
> +#if HOST_BIG_ENDIAN
> +void vext_set_elems_1s(void *vd, uint32_t is_agnostic, uint32_t esz,
> +                       uint32_t idx, uint32_t tot)
> +{
> +    if (is_agnostic == 0) {
> +        /* policy undisturbed */
> +        return;
> +    }
> +    void *base = NULL;
> +    switch (esz) {
> +    case 1:
> +        base = ((int8_t *)vd + H1(idx));
> +    break;
> +    case 2:
> +        base = ((int16_t *)vd + H2(idx));
> +    break;
> +    case 4:
> +        base = ((int32_t *)vd + H4(idx));
> +    break;
> +    case 8:
> +        base = ((int64_t *)vd + H8(idx));
> +    break;
> +    default:
> +        g_assert_not_reached();
> +    break;
> +    }
> +    /*
> +     * spilt the elements into 2 parts
> +     * part_begin: the memory need to be set in the first uint64_t unit
> +     * part_allign: the memory need to be set begins from next uint64_t
> +     *              unit and alligned to 8
> +     */
> +    uint32_t cnt = idx * esz;
> +    int part_begin, part_allign;
> +    part_begin = MIN(tot - cnt, 8 - (cnt % 8));
> +    part_allign = ((tot - cnt - part_begin) / 8) * 8;
> +
> +    memset(base - part_begin + 1, -1, part_begin);
> +    memset(QEMU_ALIGN_PTR_UP(base, 8), -1, part_allign);


This seems correct but a bit over complicated at first glance. I wonder if we have
something simpler already done somewhere.

Richard, does ARM (or any other arch) do anything of the sort? Aside from more trivial
byte swaps using bswap64() I didn't find anything similar.

We recently posted a big endian related fix here:

[PATCH for 9.0 v15 03/10] target/riscv/vector_helper.c: fix 'vmvr_v' memcpy endianess

But not sure how to apply it here.



Thanks,

Daniel



> +}
> +#else
> +void vext_set_elems_1s(void *vd, uint32_t is_agnostic, uint32_t esz,
> +                       uint32_t idx, uint32_t tot)
> +{
> +    if (is_agnostic == 0) {
> +        /* policy undisturbed */
> +        return;
> +    }
> +    uint32_t cnt = idx * esz;
> +    memset(vd + cnt, -1, tot - cnt);
> +}
> +#endif
> +
>   void vext_set_elems_1s_le(void *base, uint32_t is_agnostic, uint32_t cnt,
>                          uint32_t tot)
>   {
> diff --git a/target/riscv/vector_internals.h b/target/riscv/vector_internals.h
> index fa599f60ca..c96e52f926 100644
> --- a/target/riscv/vector_internals.h
> +++ b/target/riscv/vector_internals.h
> @@ -114,6 +114,8 @@ static inline uint32_t vext_get_total_elems(CPURISCVState *env, uint32_t desc,
>   }
>   
>   /* set agnostic elements to 1s */
> +void vext_set_elems_1s(void *vd, uint32_t is_agnostic, uint32_t esz,
> +                       uint32_t idx, uint32_t tot);
>   void vext_set_elems_1s_le(void *base, uint32_t is_agnostic, uint32_t cnt,
>                          uint32_t tot);
>
Richard Henderson March 19, 2024, 11:32 p.m. UTC | #2
On 3/19/24 11:57, Daniel Henrique Barboza wrote:
> This seems correct but a bit over complicated at first glance. I wonder if we have
> something simpler already done somewhere.
> 
> Richard, does ARM (or any other arch) do anything of the sort? Aside from more trivial
> byte swaps using bswap64() I didn't find anything similar.

No, nothing quite like.

> We recently posted a big endian related fix here:
> 
> [PATCH for 9.0 v15 03/10] target/riscv/vector_helper.c: fix 'vmvr_v' memcpy endianess
> 
> But not sure how to apply it here.

It's almost exactly the same, only with memset instead of memcpy.

     if (HOST_BIG_ENDIAN && idx % 8 != 0) {
         uint32_t j = ROUND_UP(idx, 8);
         memset(vd + H(j - 1), -1, j - idx);
         idx = j;
     }
     memset(vd + idx, -1, tot - idx);


I'll note that you don't need to change the api of vext_set_elems_1s -- so most of these 
patches are not required.


r~
Huang Tao March 20, 2024, 2:41 a.m. UTC | #3
I will rewrite the patch, and send a new version soon.

Thanks,

Huang Tao

On 2024/3/20 07:32, Richard Henderson wrote:
> On 3/19/24 11:57, Daniel Henrique Barboza wrote:
>> This seems correct but a bit over complicated at first glance. I 
>> wonder if we have
>> something simpler already done somewhere.
>>
>> Richard, does ARM (or any other arch) do anything of the sort? Aside 
>> from more trivial
>> byte swaps using bswap64() I didn't find anything similar.
>
> No, nothing quite like.
>
>> We recently posted a big endian related fix here:
>>
>> [PATCH for 9.0 v15 03/10] target/riscv/vector_helper.c: fix 'vmvr_v' 
>> memcpy endianess
>>
>> But not sure how to apply it here.
>
> It's almost exactly the same, only with memset instead of memcpy.
>
>     if (HOST_BIG_ENDIAN && idx % 8 != 0) {
>         uint32_t j = ROUND_UP(idx, 8);
>         memset(vd + H(j - 1), -1, j - idx);
>         idx = j;
>     }
>     memset(vd + idx, -1, tot - idx);
>
>
> I'll note that you don't need to change the api of vext_set_elems_1s 
> -- so most of these patches are not required.
>
>
> r~
diff mbox series

Patch

diff --git a/target/riscv/vector_internals.c b/target/riscv/vector_internals.c
index 349b24f4ae..455be96996 100644
--- a/target/riscv/vector_internals.c
+++ b/target/riscv/vector_internals.c
@@ -20,6 +20,59 @@ 
 #include "vector_internals.h"
 
 /* set agnostic elements to 1s */
+#if HOST_BIG_ENDIAN
+void vext_set_elems_1s(void *vd, uint32_t is_agnostic, uint32_t esz,
+                       uint32_t idx, uint32_t tot)
+{
+    if (is_agnostic == 0) {
+        /* policy undisturbed */
+        return;
+    }
+    void *base = NULL;
+    switch (esz) {
+    case 1:
+        base = ((int8_t *)vd + H1(idx));
+    break;
+    case 2:
+        base = ((int16_t *)vd + H2(idx));
+    break;
+    case 4:
+        base = ((int32_t *)vd + H4(idx));
+    break;
+    case 8:
+        base = ((int64_t *)vd + H8(idx));
+    break;
+    default:
+        g_assert_not_reached();
+    break;
+    }
+    /*
+     * spilt the elements into 2 parts
+     * part_begin: the memory need to be set in the first uint64_t unit
+     * part_allign: the memory need to be set begins from next uint64_t
+     *              unit and alligned to 8
+     */
+    uint32_t cnt = idx * esz;
+    int part_begin, part_allign;
+    part_begin = MIN(tot - cnt, 8 - (cnt % 8));
+    part_allign = ((tot - cnt - part_begin) / 8) * 8;
+
+    memset(base - part_begin + 1, -1, part_begin);
+    memset(QEMU_ALIGN_PTR_UP(base, 8), -1, part_allign);
+}
+#else
+void vext_set_elems_1s(void *vd, uint32_t is_agnostic, uint32_t esz,
+                       uint32_t idx, uint32_t tot)
+{
+    if (is_agnostic == 0) {
+        /* policy undisturbed */
+        return;
+    }
+    uint32_t cnt = idx * esz;
+    memset(vd + cnt, -1, tot - cnt);
+}
+#endif
+
 void vext_set_elems_1s_le(void *base, uint32_t is_agnostic, uint32_t cnt,
                        uint32_t tot)
 {
diff --git a/target/riscv/vector_internals.h b/target/riscv/vector_internals.h
index fa599f60ca..c96e52f926 100644
--- a/target/riscv/vector_internals.h
+++ b/target/riscv/vector_internals.h
@@ -114,6 +114,8 @@  static inline uint32_t vext_get_total_elems(CPURISCVState *env, uint32_t desc,
 }
 
 /* set agnostic elements to 1s */
+void vext_set_elems_1s(void *vd, uint32_t is_agnostic, uint32_t esz,
+                       uint32_t idx, uint32_t tot);
 void vext_set_elems_1s_le(void *base, uint32_t is_agnostic, uint32_t cnt,
                        uint32_t tot);