Message ID | 1553525566-14913-2-git-send-email-mateja.marjanovic@rt-rk.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | target/mips: Add support for MSA instructions on a big endian host | expand |
> From: Mateja Marjanovic <mateja.marjanovic@rt-rk.com> > Subject: [PATCH v2 1/7] target/mips: Fix <ld|st>.<b|h|w|d> MSA instructions for MIPS big endian host Please split this patch into one for load instructions and another for store instructions. I don't think the variable "big_endian" is needed, you should better resolve all differences wrt endian via "#if defined(HOST_WORDS_BIGENDIAN)", like you did in other patches. It is important to be consistent. Thanks, Aleksandar
On 25.3.19. 22:21, Aleksandar Markovic wrote: >> From: Mateja Marjanovic <mateja.marjanovic@rt-rk.com> >> Subject: [PATCH v2 1/7] target/mips: Fix <ld|st>.<b|h|w|d> MSA instructions for MIPS big endian host > Please split this patch into one for load instructions and another for > store instructions. I will do that. > > I don't think the variable "big_endian" is needed, you should better resolve > all differences wrt endian via "#if defined(HOST_WORDS_BIGENDIAN)", > like you did in other patches. It is important to be consistent. I think you can't have a preprocessing statement inside a macro, so I did the thing that seemed most similar to that. > > Thanks, > Aleksandar Regards, Mateja
> From: Mateja Marjanovic <Mateja.Marjanovic@rt-rk.com> > Subject: Re: [PATCH v2 1/7] target/mips: Fix <ld|st>.<b|h|w|d> MSA instructions for MIPS big endian host > On 25.3.19. 22:21, Aleksandar Markovic wrote: >> From: Mateja Marjanovic <mateja.marjanovic@rt-rk.com> >> Subject: [PATCH v2 1/7] target/mips: Fix <ld|st>.<b|h|w|d> MSA instructions for MIPS big endian host >> Please split this patch into one for load instructions and another for >> store instructions. > > I will do that. > >> I don't think the variable "big_endian" is needed, you should better resolve >> all differences wrt endian via "#if defined(HOST_WORDS_BIGENDIAN)", >> like you did in other patches. It is important to be consistent. > > I think you can't have a preprocessing statement inside a macro, > so I did the thing that seemed most similar to that. > There should be some equivalent way of doing that involving an argument to the macro, however, in the light of your recent work on MSA optimization, let's split/demacro and unroll loops in load and store helpers. After splitting helper for LD.B should look something like this: (the code is just for demonstration purposes, doublecheck it) void helper_msa_ld_b(CPUMIPSState *env, uint32_t wd, target_ulong addr) { wr_t *pwd = &(env->active_fpu.fpr[wd].wr); wr_t wx; int i; MEMOP_IDX(DF_BYTE) for (i = 0; i < DF_ELEMENTS(DF_BYTE); i++) { #if !defined(CONFIG_USER_ONLY) wx.b[i] = helper_ret_ldub_mmu(env, addr + (i << DF_BYTE), oi, GETPC()); #else wx.b[i] = cpu_ldub_data(env, addr + (i << DF_BYTE)); #endif } memcpy(pwd, &wx, sizeof(wr_t)); } After unrolling: void helper_msa_ld_b(CPUMIPSState *env, uint32_t wd, target_ulong addr) { wr_t *pwd = &(env->active_fpu.fpr[wd].wr); wr_t wx; int i; MEMOP_IDX(DF_BYTE) #if !defined(CONFIG_USER_ONLY) wx.b[0] = helper_ret_ldub_mmu(env, addr, oi, GETPC()); wx.b[1] = helper_ret_ldub_mmu(env, addr + 1, oi, GETPC()); wx.b[2] = helper_ret_ldub_mmu(env, addr + 2, oi, GETPC()); wx.b[3] = helper_ret_ldub_mmu(env, addr + 3, oi, GETPC()); wx.b[4] = helper_ret_ldub_mmu(env, addr + 4, oi, GETPC()); wx.b[5] = helper_ret_ldub_mmu(env, addr + 5, oi, GETPC()); wx.b[6] = helper_ret_ldub_mmu(env, addr + 6, oi, GETPC()); wx.b[7] = helper_ret_ldub_mmu(env, addr + 7, oi, GETPC()); wx.b[8] = helper_ret_ldub_mmu(env, addr + 8, oi, GETPC()); wx.b[9] = helper_ret_ldub_mmu(env, addr + 9, oi, GETPC()); wx.b[10] = helper_ret_ldub_mmu(env, addr + 10, oi, GETPC()); wx.b[11] = helper_ret_ldub_mmu(env, addr + 11, oi, GETPC()); wx.b[12] = helper_ret_ldub_mmu(env, addr + 12, oi, GETPC()); wx.b[13] = helper_ret_ldub_mmu(env, addr + 13, oi, GETPC()); wx.b[14] = helper_ret_ldub_mmu(env, addr + 14, oi, GETPC()); wx.b[15] = helper_ret_ldub_mmu(env, addr + 15, oi, GETPC()); #else wx.b[0] = cpu_ldub_data(env, addr); wx.b[1] = cpu_ldub_data(env, addr + 1); wx.b[2] = cpu_ldub_data(env, addr + 2); wx.b[3] = cpu_ldub_data(env, addr + 3); wx.b[4] = cpu_ldub_data(env, addr + 4); wx.b[5] = cpu_ldub_data(env, addr + 5); wx.b[6] = cpu_ldub_data(env, addr + 6); wx.b[7] = cpu_ldub_data(env, addr + 7); wx.b[8] = cpu_ldub_data(env, addr + 8); wx.b[9] = cpu_ldub_data(env, addr + 9); wx.b[10] = cpu_ldub_data(env, addr + 10); wx.b[11] = cpu_ldub_data(env, addr + 11); wx.b[12] = cpu_ldub_data(env, addr + 12); wx.b[13] = cpu_ldub_data(env, addr + 13); wx.b[14] = cpu_ldub_data(env, addr + 14); wx.b[15] = cpu_ldub_data(env, addr + 15); #endif } memcpy(pwd, &wx, sizeof(wr_t)); } And then apply big-endian-related changes. I think that would be more in the spirit of other MSA work. Thanks, Aleksandar
> From: Aleksandar Markovic > Subject: Re: [PATCH v2 1/7] target/mips: Fix <ld|st>.<b|h|w|d> MSA instructions for MIPS big endian host > > > From: Mateja Marjanovic <Mateja.Marjanovic@rt-rk.com> > > Subject: Re: [PATCH v2 1/7] target/mips: Fix <ld|st>.<b|h|w|d> MSA instructions for MIPS big endian host > > On 25.3.19. 22:21, Aleksandar Markovic wrote: > >> From: Mateja Marjanovic <mateja.marjanovic@rt-rk.com> > >> Subject: [PATCH v2 1/7] target/mips: Fix <ld|st>.<b|h|w|d> MSA instructions for MIPS big endian host > >> Please split this patch into one for load instructions and another for > >> store instructions. > > > > I will do that. > > > >> I don't think the variable "big_endian" is needed, you should better resolve > >> all differences wrt endian via "#if defined(HOST_WORDS_BIGENDIAN)", > >> like you did in other patches. It is important to be consistent. > > > > I think you can't have a preprocessing statement inside a macro, > > so I did the thing that seemed most similar to that. > > > > There should be some equivalent way of doing that involving an argument > to the macro, however, in the light of your recent work on MSA optimization, > let's split/demacro and unroll loops in load and store helpers. > > After splitting helper for LD.B should look something like this: > (the code is just for demonstration purposes, doublecheck it) > > void helper_msa_ld_b(CPUMIPSState *env, uint32_t wd, target_ulong addr) > { > wr_t *pwd = &(env->active_fpu.fpr[wd].wr); > wr_t wx; > int i; > > MEMOP_IDX(DF_BYTE) > for (i = 0; i < DF_ELEMENTS(DF_BYTE); i++) { > #if !defined(CONFIG_USER_ONLY) > wx.b[i] = helper_ret_ldub_mmu(env, addr + (i << DF_BYTE), oi, GETPC()); > #else > wx.b[i] = cpu_ldub_data(env, addr + (i << DF_BYTE)); > #endif > } > memcpy(pwd, &wx, sizeof(wr_t)); > } > > After unrolling: > > void helper_msa_ld_b(CPUMIPSState *env, uint32_t wd, target_ulong addr) > { > wr_t *pwd = &(env->active_fpu.fpr[wd].wr); > wr_t wx; > int i; > > MEMOP_IDX(DF_BYTE) > > #if !defined(CONFIG_USER_ONLY) > wx.b[0] = helper_ret_ldub_mmu(env, addr, oi, GETPC()); > wx.b[1] = helper_ret_ldub_mmu(env, addr + 1, oi, GETPC()); > wx.b[2] = helper_ret_ldub_mmu(env, addr + 2, oi, GETPC()); > wx.b[3] = helper_ret_ldub_mmu(env, addr + 3, oi, GETPC()); > wx.b[4] = helper_ret_ldub_mmu(env, addr + 4, oi, GETPC()); > wx.b[5] = helper_ret_ldub_mmu(env, addr + 5, oi, GETPC()); > wx.b[6] = helper_ret_ldub_mmu(env, addr + 6, oi, GETPC()); > wx.b[7] = helper_ret_ldub_mmu(env, addr + 7, oi, GETPC()); > wx.b[8] = helper_ret_ldub_mmu(env, addr + 8, oi, GETPC()); > wx.b[9] = helper_ret_ldub_mmu(env, addr + 9, oi, GETPC()); > wx.b[10] = helper_ret_ldub_mmu(env, addr + 10, oi, GETPC()); > wx.b[11] = helper_ret_ldub_mmu(env, addr + 11, oi, GETPC()); > wx.b[12] = helper_ret_ldub_mmu(env, addr + 12, oi, GETPC()); > wx.b[13] = helper_ret_ldub_mmu(env, addr + 13, oi, GETPC()); > wx.b[14] = helper_ret_ldub_mmu(env, addr + 14, oi, GETPC()); > wx.b[15] = helper_ret_ldub_mmu(env, addr + 15, oi, GETPC()); > #else > wx.b[0] = cpu_ldub_data(env, addr); > wx.b[1] = cpu_ldub_data(env, addr + 1); > wx.b[2] = cpu_ldub_data(env, addr + 2); > wx.b[3] = cpu_ldub_data(env, addr + 3); > wx.b[4] = cpu_ldub_data(env, addr + 4); > wx.b[5] = cpu_ldub_data(env, addr + 5); > wx.b[6] = cpu_ldub_data(env, addr + 6); > wx.b[7] = cpu_ldub_data(env, addr + 7); > wx.b[8] = cpu_ldub_data(env, addr + 8); > wx.b[9] = cpu_ldub_data(env, addr + 9); > wx.b[10] = cpu_ldub_data(env, addr + 10); > wx.b[11] = cpu_ldub_data(env, addr + 11); > wx.b[12] = cpu_ldub_data(env, addr + 12); > wx.b[13] = cpu_ldub_data(env, addr + 13); > wx.b[14] = cpu_ldub_data(env, addr + 14); > wx.b[15] = cpu_ldub_data(env, addr + 15); > #endif > } > memcpy(pwd, &wx, sizeof(wr_t)); > } > > > And then apply big-endian-related changes. > > I think that would be more in the spirit of other MSA work. The same approach (splitting helpers/unrolling loops) should be good for COPY_S, COPY_U, INSERT as well. Thanks, Aleksandar
diff --git a/target/mips/op_helper.c b/target/mips/op_helper.c index 0f272a5..5441ab2 100644 --- a/target/mips/op_helper.c +++ b/target/mips/op_helper.c @@ -4371,18 +4371,37 @@ FOP_CONDN_S(sne, (float32_lt(fst1, fst0, &env->active_fpu.fp_status) #define MEMOP_IDX(DF) #endif -#define MSA_LD_DF(DF, TYPE, LD_INSN, ...) \ -void helper_msa_ld_ ## TYPE(CPUMIPSState *env, uint32_t wd, \ - target_ulong addr) \ -{ \ - wr_t *pwd = &(env->active_fpu.fpr[wd].wr); \ - wr_t wx; \ - int i; \ - MEMOP_IDX(DF) \ - for (i = 0; i < DF_ELEMENTS(DF); i++) { \ - wx.TYPE[i] = LD_INSN(env, addr + (i << DF), ##__VA_ARGS__); \ - } \ - memcpy(pwd, &wx, sizeof(wr_t)); \ +#if defined(HOST_WORDS_BIGENDIAN) + bool big_endian = 1; +#else + bool big_endian = 0; +#endif + +#define MSA_LD_DF(DF, TYPE, LD_INSN, ...) \ +void helper_msa_ld_ ## TYPE(CPUMIPSState *env, uint32_t wd, \ + target_ulong addr) \ +{ \ + wr_t *pwd = &(env->active_fpu.fpr[wd].wr); \ + wr_t wx; \ + int i, k; \ + MEMOP_IDX(DF) \ + if (!big_endian) { \ + for (i = 0; i < DF_ELEMENTS(DF); i++) { \ + wx.TYPE[i] = LD_INSN(env, addr + (i << DF), ##__VA_ARGS__); \ + } \ + } else { \ + for (i = 0; i < DF_ELEMENTS(DF); i++) { \ + if (i < DF_ELEMENTS(DF) / 2) { \ + k = DF_ELEMENTS(DF) / 2 - i - 1; \ + wx.TYPE[i] = LD_INSN(env, addr + (k << DF), ##__VA_ARGS__); \ + } else { \ + k = 3 * DF_ELEMENTS(DF) / 2 - i - 1; \ + wx.TYPE[i] = LD_INSN(env, addr + (k << DF), ##__VA_ARGS__); \ + } \ + } \ + } \ + \ + memcpy(pwd, &wx, sizeof(wr_t)); \ } #if !defined(CONFIG_USER_ONLY) @@ -4417,18 +4436,30 @@ static inline void ensure_writable_pages(CPUMIPSState *env, #endif } -#define MSA_ST_DF(DF, TYPE, ST_INSN, ...) \ -void helper_msa_st_ ## TYPE(CPUMIPSState *env, uint32_t wd, \ - target_ulong addr) \ -{ \ - wr_t *pwd = &(env->active_fpu.fpr[wd].wr); \ - int mmu_idx = cpu_mmu_index(env, false); \ - int i; \ - MEMOP_IDX(DF) \ - ensure_writable_pages(env, addr, mmu_idx, GETPC()); \ - for (i = 0; i < DF_ELEMENTS(DF); i++) { \ - ST_INSN(env, addr + (i << DF), pwd->TYPE[i], ##__VA_ARGS__); \ - } \ +#define MSA_ST_DF(DF, TYPE, ST_INSN, ...) \ +void helper_msa_st_ ## TYPE(CPUMIPSState *env, uint32_t wd, \ + target_ulong addr) \ +{ \ + wr_t *pwd = &(env->active_fpu.fpr[wd].wr); \ + int mmu_idx = cpu_mmu_index(env, false); \ + int i, k; \ + MEMOP_IDX(DF) \ + ensure_writable_pages(env, addr, mmu_idx, GETPC()); \ + if (!big_endian) { \ + for (i = 0; i < DF_ELEMENTS(DF); i++) { \ + ST_INSN(env, addr + (i << DF), pwd->TYPE[i], ##__VA_ARGS__); \ + } \ + } else { \ + for (i = 0; i < DF_ELEMENTS(DF); i++) { \ + if (i < DF_ELEMENTS(DF) / 2) { \ + k = DF_ELEMENTS(DF) / 2 - i - 1; \ + ST_INSN(env, addr + (k << DF), pwd->TYPE[i], ##__VA_ARGS__); \ + } else { \ + k = 3 * DF_ELEMENTS(DF) / 2 - i - 1; \ + ST_INSN(env, addr + (k << DF), pwd->TYPE[i], ##__VA_ARGS__); \ + } \ + } \ + } \ } #if !defined(CONFIG_USER_ONLY)