diff mbox series

[RFC,v3,30/34] Hexagon (target/hexagon) TCG for instructions with multiple definitions

Message ID 1597765847-16637-31-git-send-email-tsimpson@quicinc.com (mailing list archive)
State New, archived
Headers show
Series Hexagon patch series | expand

Commit Message

Taylor Simpson Aug. 18, 2020, 3:50 p.m. UTC
Helpers won't work if there are multiple definitions, so we override these
instructions using #define fGEN_TCG_<tag>.

Signed-off-by: Taylor Simpson <tsimpson@quicinc.com>
---
 target/hexagon/gen_tcg.h | 198 +++++++++++++++++++++++++++++++++++++++++++++++
 target/hexagon/helper.h  |   2 +
 target/hexagon/genptr.c  |   1 +
 3 files changed, 201 insertions(+)
 create mode 100644 target/hexagon/gen_tcg.h

Comments

Richard Henderson Aug. 29, 2020, 2:02 a.m. UTC | #1
On 8/18/20 8:50 AM, Taylor Simpson wrote:
> +++ b/target/hexagon/helper.h
> @@ -15,6 +15,8 @@
>   *  along with this program; if not, see <http://www.gnu.org/licenses/>.
>   */
>  
> +#include "gen_tcg.h"

Why would you need this here?  Definitely looks wrong.


r~
Taylor Simpson Aug. 30, 2020, 7:48 p.m. UTC | #2
> -----Original Message-----
> From: Richard Henderson <richard.henderson@linaro.org>
> Sent: Friday, August 28, 2020 8:03 PM
> To: Taylor Simpson <tsimpson@quicinc.com>; qemu-devel@nongnu.org
> Cc: philmd@redhat.com; laurent@vivier.eu; riku.voipio@iki.fi;
> aleksandar.m.mail@gmail.com; ale@rev.ng
> Subject: Re: [RFC PATCH v3 30/34] Hexagon (target/hexagon) TCG for
> instructions with multiple definitions
>
> On 8/18/20 8:50 AM, Taylor Simpson wrote:
> > +++ b/target/hexagon/helper.h
> > @@ -15,6 +15,8 @@
> >   *  along with this program; if not, see <http://www.gnu.org/licenses/>.
> >   */
> >
> > +#include "gen_tcg.h"
>
> Why would you need this here?  Definitely looks wrong.

I'll add the following comment to indicate what's going on

/*
  * Each of the generated helpers is wrapped with #ifndef fGEN_TCG_<tag>.
  * For example
   *     #ifndef fGEN_TCG_A2_add
   *     DEF_HELPER_3(A2_add, s32, env, s32, s32)
   *     #endif
  * We include gen_tcg.h here to provide definitions of fGEN_TCG for any instructions that
  * are overridden.
  *
  * This prevents unused helpers from taking up space in the executable.
  */
Richard Henderson Aug. 30, 2020, 9:13 p.m. UTC | #3
On 8/30/20 12:48 PM, Taylor Simpson wrote:
> I'll add the following comment to indicate what's going on
> 
> /*
>   * Each of the generated helpers is wrapped with #ifndef fGEN_TCG_<tag>.
>   * For example
>    *     #ifndef fGEN_TCG_A2_add
>    *     DEF_HELPER_3(A2_add, s32, env, s32, s32)
>    *     #endif
>   * We include gen_tcg.h here to provide definitions of fGEN_TCG for any instructions that
>   * are overridden.
>   *
>   * This prevents unused helpers from taking up space in the executable.
>   */

Ah, hum.  Well.

How about we figure out a way to communicate to the generator scripts which
functions have been implemented "directly", and don't need to be generated at all?

I don't see why we have to wait until the c preprocessor to remove this unused
code, and the less we have of it, the better.


r~
Taylor Simpson Aug. 30, 2020, 9:30 p.m. UTC | #4
> -----Original Message-----
> From: Richard Henderson <richard.henderson@linaro.org>
> Sent: Sunday, August 30, 2020 3:13 PM
> To: Taylor Simpson <tsimpson@quicinc.com>; qemu-devel@nongnu.org
> Cc: philmd@redhat.com; laurent@vivier.eu; riku.voipio@iki.fi;
> aleksandar.m.mail@gmail.com; ale@rev.ng
> Subject: Re: [RFC PATCH v3 30/34] Hexagon (target/hexagon) TCG for
> instructions with multiple definitions
>
> On 8/30/20 12:48 PM, Taylor Simpson wrote:
> > I'll add the following comment to indicate what's going on
> >
> > /*
> >   * Each of the generated helpers is wrapped with #ifndef
> fGEN_TCG_<tag>.
> >   * For example
> >    *     #ifndef fGEN_TCG_A2_add
> >    *     DEF_HELPER_3(A2_add, s32, env, s32, s32)
> >    *     #endif
> >   * We include gen_tcg.h here to provide definitions of fGEN_TCG for any
> instructions that
> >   * are overridden.
> >   *
> >   * This prevents unused helpers from taking up space in the executable.
> >   */
>
> Ah, hum.  Well.
>
> How about we figure out a way to communicate to the generator scripts
> which
> functions have been implemented "directly", and don't need to be
> generated at all?
>
> I don't see why we have to wait until the c preprocessor to remove this
> unused
> code, and the less we have of it, the better.
>

A few reasons
- Makes it easy to override instruction helpers.  All one has to do is #define fGEN_TCG_<tag>.
- When debugging the override, sometimes you want to quickly revert back to the helper.  Or if you've written a bunch of overrides in one shot and then find that a test case is failing, you can binary search which one to turn off and get the test to pass.  This is the one with the bug to fix.
- Reduces time for an incremental build.  When we add or delete an override, we don't have to re-run the generator.
Richard Henderson Aug. 30, 2020, 11:26 p.m. UTC | #5
On 8/30/20 2:30 PM, Taylor Simpson wrote:
> 
> 
>> -----Original Message-----
>> From: Richard Henderson <richard.henderson@linaro.org>
>> Sent: Sunday, August 30, 2020 3:13 PM
>> To: Taylor Simpson <tsimpson@quicinc.com>; qemu-devel@nongnu.org
>> Cc: philmd@redhat.com; laurent@vivier.eu; riku.voipio@iki.fi;
>> aleksandar.m.mail@gmail.com; ale@rev.ng
>> Subject: Re: [RFC PATCH v3 30/34] Hexagon (target/hexagon) TCG for
>> instructions with multiple definitions
>>
>> On 8/30/20 12:48 PM, Taylor Simpson wrote:
>>> I'll add the following comment to indicate what's going on
>>>
>>> /*
>>>   * Each of the generated helpers is wrapped with #ifndef
>> fGEN_TCG_<tag>.
>>>   * For example
>>>    *     #ifndef fGEN_TCG_A2_add
>>>    *     DEF_HELPER_3(A2_add, s32, env, s32, s32)
>>>    *     #endif
>>>   * We include gen_tcg.h here to provide definitions of fGEN_TCG for any
>> instructions that
>>>   * are overridden.
>>>   *
>>>   * This prevents unused helpers from taking up space in the executable.
>>>   */
>>
>> Ah, hum.  Well.
>>
>> How about we figure out a way to communicate to the generator scripts
>> which
>> functions have been implemented "directly", and don't need to be
>> generated at all?
>>
>> I don't see why we have to wait until the c preprocessor to remove this
>> unused
>> code, and the less we have of it, the better.
>>
> 
> A few reasons
> - Makes it easy to override instruction helpers.  All one has to do is #define fGEN_TCG_<tag>.

If the generator can examine, say, genptr_override.c.inc, then you don't even
have to add a #define.  Just add the code.

Perhaps something like

DEFINE_FGEN(tag)
{
    // some code
}

where DEFINE_FGEN expands to the appropriate function declaration.  The
generator then need only grep '^DEFINE_FGEN' and extract the list of overridden
tags.


> - When debugging the override, sometimes you want to quickly revert back to the helper.  Or if you've written a bunch of overrides in one shot and then find that a test case is failing, you can binary search which one to turn off and get the test to pass.  This is the one with the bug to fix.

No difference there, just binary searching on different text.

> - Reduces time for an incremental build.  When we add or delete an override, we don't have to re-run the generator.

About this I care not at all.  I can't imagine that more than fractions of a
second are at stake.


r~
Taylor Simpson Aug. 31, 2020, 5:08 p.m. UTC | #6
> >> -----Original Message-----
> >> From: Richard Henderson <richard.henderson@linaro.org>
> >> Sent: Sunday, August 30, 2020 3:13 PM
> >> To: Taylor Simpson <tsimpson@quicinc.com>; qemu-devel@nongnu.org
> >> Cc: philmd@redhat.com; laurent@vivier.eu; riku.voipio@iki.fi;
> >> aleksandar.m.mail@gmail.com; ale@rev.ng
> >> Subject: Re: [RFC PATCH v3 30/34] Hexagon (target/hexagon) TCG for
> >> instructions with multiple definitions
> >>
> >> On 8/30/20 12:48 PM, Taylor Simpson wrote:
> >>> I'll add the following comment to indicate what's going on
> >>>
> >>> /*
> >>>   * Each of the generated helpers is wrapped with #ifndef
> >> fGEN_TCG_<tag>.
> >>>   * For example
> >>>    *     #ifndef fGEN_TCG_A2_add
> >>>    *     DEF_HELPER_3(A2_add, s32, env, s32, s32)
> >>>    *     #endif
> >>>   * We include gen_tcg.h here to provide definitions of fGEN_TCG for
> any
> >> instructions that
> >>>   * are overridden.
> >>>   *
> >>>   * This prevents unused helpers from taking up space in the executable.
> >>>   */
> >>
> >> Ah, hum.  Well.
> >>
> >> How about we figure out a way to communicate to the generator scripts
> >> which
> >> functions have been implemented "directly", and don't need to be
> >> generated at all?
> >>
> >> I don't see why we have to wait until the c preprocessor to remove this
> >> unused
> >> code, and the less we have of it, the better.
> >>
> >
> > A few reasons
> > - Makes it easy to override instruction helpers.  All one has to do is #define
> fGEN_TCG_<tag>.
>
> If the generator can examine, say, genptr_override.c.inc, then you don't
> even
> have to add a #define.  Just add the code.
>
> Perhaps something like
>
> DEFINE_FGEN(tag)
> {
>     // some code
> }
>
> where DEFINE_FGEN expands to the appropriate function declaration.  The
> generator then need only grep '^DEFINE_FGEN' and extract the list of
> overridden
> tags.
>
>
> > - When debugging the override, sometimes you want to quickly revert back
> to the helper.  Or if you've written a bunch of overrides in one shot and then
> find that a test case is failing, you can binary search which one to turn off and
> get the test to pass.  This is the one with the bug to fix.
>
> No difference there, just binary searching on different text.
>
> > - Reduces time for an incremental build.  When we add or delete an
> override, we don't have to re-run the generator.
>
> About this I care not at all.  I can't imagine that more than fractions of a
> second are at stake.

I can modify the generator to skip instructions with overrides.

There are some assumptions here I'd like to clarify.  When I started this discussion, there seemed to be general resistance to the concept of a generator.  Because of this, I made the generator as simple as possible and pushed the complexity and optimization into code that consumes the output of the generator.  Also, I assumed people wouldn't read the output of the generator, so I didn't focus on making it readable.

Now, it sounds like my assumptions were not correct.  You pointed out two things to do in the generator
- Expand the macros inline
- Skip the instructions that have overrides
I addition, there other things I should do if we want the generated files to be more readable
- Indent the code
- Only generate one of the fGEN_TCG_<tag> or gen_helper_<tag>
- Generate the declaration of the generate_<tag> function instead of just the body

Thoughts?

Thanks,
Taylor
Richard Henderson Aug. 31, 2020, 5:29 p.m. UTC | #7
On 8/31/20 10:08 AM, Taylor Simpson wrote:
> There are some assumptions here I'd like to clarify.  When I started this
> discussion, there seemed to be general resistance to the concept of a
> generator.  Because of this, I made the generator as simple as possible and
> pushed the complexity and optimization into code that consumes the output of
> the generator.  Also, I assumed people wouldn't read the output of the
> generator, so I didn't focus on making it readable.

Except, at some point, one has to debug this code.
If the code is unreadable, how do you figure out what's broken?

> Now, it sounds like my assumptions were not correct.  You pointed out two
> things to do in the generator> - Expand the macros inline
> - Skip the instructions that have overrides

Yes please.

> I addition, there other things I should do if we want the generated files to be more readable
> - Indent the code

Helpful, yes.

I wouldn't worry about nested statements within the *.def files too much,
except to put each ";" terminated statement on a new line, so that gdb's step
command goes to the next statement instead of skipping everything all at once.

> - Only generate one of the fGEN_TCG_<tag> or gen_helper_<tag>

That would be part of "skip the instructions that have overrides", would it not?

> - Generate the declaration of the generate_<tag> function instead of just the body

I'm not quite sure what this means.

Aren't the "generate_<tag>" functions private to genptr.c?  Why would we need a
separate declaration of them, as opposed to just a definition?


r~
Taylor Simpson Aug. 31, 2020, 6:14 p.m. UTC | #8
> -----Original Message-----
> From: Richard Henderson <richard.henderson@linaro.org>
> Sent: Monday, August 31, 2020 11:29 AM
> To: Taylor Simpson <tsimpson@quicinc.com>; qemu-devel@nongnu.org
> Cc: philmd@redhat.com; laurent@vivier.eu; riku.voipio@iki.fi;
> aleksandar.m.mail@gmail.com; ale@rev.ng
> Subject: Re: [RFC PATCH v3 30/34] Hexagon (target/hexagon) TCG for
> instructions with multiple definitions
>
> On 8/31/20 10:08 AM, Taylor Simpson wrote:
> > There are some assumptions here I'd like to clarify.  When I started this
> > discussion, there seemed to be general resistance to the concept of a
> > generator.  Because of this, I made the generator as simple as possible and
> > pushed the complexity and optimization into code that consumes the
> output of
> > the generator.  Also, I assumed people wouldn't read the output of the
> > generator, so I didn't focus on making it readable.
>
> Except, at some point, one has to debug this code.
> If the code is unreadable, how do you figure out what's broken?
>
> > Now, it sounds like my assumptions were not correct.  You pointed out two
> > things to do in the generator> - Expand the macros inline
> > - Skip the instructions that have overrides
>
> Yes please.
>
> > I addition, there other things I should do if we want the generated files to
> be more readable
> > - Indent the code
>
> Helpful, yes.
>
> I wouldn't worry about nested statements within the *.def files too much,
> except to put each ";" terminated statement on a new line, so that gdb's step
> command goes to the next statement instead of skipping everything all at
> once.
>
> > - Only generate one of the fGEN_TCG_<tag> or gen_helper_<tag>
>
> That would be part of "skip the instructions that have overrides", would it
> not?

Just to be explicit, the code that generates code is generated as
    #ifdef fGEN_TCG_A2_add
    fGEN_TCG_A2_add({ RdV=RsV+RtV;});
    #else
    do {
    gen_helper_A2_add(RdV, cpu_env, RsV, RtV);
    } while (0);
    #endif
If we're going to have the generator know if there is an override, this would be more readable as either
    fGEN_TCG_A2_add({ RdV=RsV+RtV;});
or
    gen_helper_A2_add(RdV, cpu_env, RsV, RtV);


> > - Generate the declaration of the generate_<tag> function instead of just
> the body
>
> I'm not quite sure what this means.
>
> Aren't the "generate_<tag>" functions private to genptr.c?  Why would we
> need a
> separate declaration of them, as opposed to just a definition?

In genptr.c, there is
    #define DEF_TCG_FUNC(TAG, GENFN) \
    static void generate_##TAG(CPUHexagonState *env, DisasContext *ctx, \
                               insn_t *insn, packet_t *pkt) \
    { \
        GENFN \
    }
    #include "tcg_funcs_generated.h"
    #undef DEF_TCG_FUNC
The generated code only has the body of the function.  It would be more readable to move the static void generate_##TAG ... into the generated code.
Richard Henderson Aug. 31, 2020, 7:20 p.m. UTC | #9
On 8/31/20 11:14 AM, Taylor Simpson wrote:
> Just to be explicit, the code that generates code is generated as
>     #ifdef fGEN_TCG_A2_add
>     fGEN_TCG_A2_add({ RdV=RsV+RtV;});
>     #else
>     do {
>     gen_helper_A2_add(RdV, cpu_env, RsV, RtV);
>     } while (0);
>     #endif
> If we're going to have the generator know if there is an override, this would be more readable as either
>     fGEN_TCG_A2_add({ RdV=RsV+RtV;});
> or
>     gen_helper_A2_add(RdV, cpu_env, RsV, RtV);

Not quite, see below.

> In genptr.c, there is
>     #define DEF_TCG_FUNC(TAG, GENFN) \
>     static void generate_##TAG(CPUHexagonState *env, DisasContext *ctx, \
>                                insn_t *insn, packet_t *pkt) \
>     { \
>         GENFN \
>     }
>     #include "tcg_funcs_generated.h"
>     #undef DEF_TCG_FUNC
> The generated code only has the body of the function.  It would be more
> readable to move the static void generate_##TAG ... into the generated
> code.

Yes.

The fGEN_TCG_A2_add macro does not require nor use that {...} argument.  What
it *does* need are the same arguments as are given to generate_<rtag>.  I
assume you are using those arguments implicitly in your current fGEN_TCG_<rtag>
instances?

It would be cleanest to only have the generate_* functions.

Either they are written by hand (replacing the current fGEN_TCG_*), or they are
generated.  In either case, there's just the one level of indirection from
opcode_genptr[].

I'd imagine

--- genptr.c

#define DEF_TCG_FUNC(TAG) \
static void generate_##TAG(CPUHexagonState *env, \
    DisasContext *ctx, insn_t *insn, packet_t *pkt)

/*
 * All IIDs with an explicit implementation,
 * overriding the auto-generated helper functions.
 */

DEF_TCG_FUNC(A2_add)
{
    /* { RdV=RsV+RtV;} */
    tcg_gen_add_tl(args...);
}

/*
 * Generate calls to the auto-generate helpers,
 * and slot everything into the opcode_genptr table.
 */
#include "genptr_generated.c.inc"

--- genptr_generated.c.inc

DEF_TCG_FUNC(A4_tlbmatch)
{
   gen_helper_A4_tlbmatch(args...);
}

// etc

const SemanticInsn opcode_genptr[] = {
    // All IID's, generated or not.
};

---

This leaves genptr.c as the file to grep for '^DEF_TCG_FUNC'.


r~
Taylor Simpson Aug. 31, 2020, 11:10 p.m. UTC | #10
> -----Original Message-----
> From: Richard Henderson <richard.henderson@linaro.org>
> Sent: Monday, August 31, 2020 1:20 PM
> To: Taylor Simpson <tsimpson@quicinc.com>; qemu-devel@nongnu.org
> Cc: philmd@redhat.com; laurent@vivier.eu; riku.voipio@iki.fi;
> aleksandar.m.mail@gmail.com; ale@rev.ng
> Subject: Re: [RFC PATCH v3 30/34] Hexagon (target/hexagon) TCG for
> instructions with multiple definitions
>
> The fGEN_TCG_A2_add macro does not require nor use that {...} argument.

The fGEN_TCG_A2_add macro does need that argument, but there are cases that do need it.  Here's an example from gen_tcg.h
    #define fGEN_TCG_L2_loadrub_pr(SHORTCODE)      SHORTCODE
This is explained in the README, but basically the argument is useful if we can properly define the macros that it contains to generate TCG.


> What it *does* need are the same arguments as are given to generate_<rtag>.  I
> assume you are using those arguments implicitly in your current
> fGEN_TCG_<rtag>
> instances?

Yes

>
> It would be cleanest to only have the generate_* functions.
>
> Either they are written by hand (replacing the current fGEN_TCG_*), or they
> are
> generated.  In either case, there's just the one level of indirection from
> opcode_genptr[].
>
> I'd imagine
>
> --- genptr.c
>
> #define DEF_TCG_FUNC(TAG) \
> static void generate_##TAG(CPUHexagonState *env, \
>     DisasContext *ctx, insn_t *insn, packet_t *pkt)
>
> /*
>  * All IIDs with an explicit implementation,
>  * overriding the auto-generated helper functions.
>  */
>
> DEF_TCG_FUNC(A2_add)
> {
>     /* { RdV=RsV+RtV;} */
>     tcg_gen_add_tl(args...);
> }

There's additional generated code before and after the tcg_gen_add_tl.  IMO, we don't want the person who writes an override having to reproduce the generated code.  Assuming we have a definition of fGEN_TCG_A2_add and we have the generator intelligently expanding the macros, this is what will be generated.

static void generate_A2_add(CPUHexagonState *env, DisasContext *ctx, insn_t *insn, packet_t *pkt)
{
/* A2_add */
int RdN =insn->regno[0];
TCGv RdV = tcg_temp_local_new();
int RsN = insn->regno[1];
TCGv RsV = hex_gpr[RsN];
int RtN = insn->regno[2];
TCGv RtV = hex_gpr[RtN];

fGEN_TCG_A2_add({ RdV=RsV+RtV;});

gen_log_reg_write(RdN, RdV, insn->slot, 0);
ctx_log_reg_write(ctx, RdN);

tcg_temp_free(RdV);
/* A2_add */
}

If there weren't an override, we'd get this

static void generate_A2_add(CPUHexagonState *env, DisasContext *ctx, insn_t *insn, packet_t *pkt)
{
/* A2_add */
int RdN =insn->regno[0];
TCGv RdV = tcg_temp_local_new();
int RsN = insn->regno[1];
TCGv RsV = hex_gpr[RsN];
int RtN = insn->regno[2];
TCGv RtV = hex_gpr[RtN];

gen_helper_A2_add(RdV, cpu_env, RsV, RtV);                    /* Only difference is this line */

gen_log_reg_write(RdN, RdV, insn->slot, 0);
ctx_log_reg_write(ctx, RdN);

tcg_temp_free(RdV);
/* A2_add */
}

The fGEN_TCG_<tag> macro can also mention the operands of the instruction (RdV, RsV, RtV in this example).

Unlike the generate_<tag> functions that all have the same signature.  The overrides would have different signatures.  This would be more defensive programming because you know exactly where the variables come from but more verbose when writing the overrides by hand.  Also, note that these need to be macros in order to take advantage of the SHORTCODE.

In other words, instead of
#define fGEN_TCG_A2_add(SHORTCODE)    tcg_gen_add_tl(RdV, RsV, RtV)

We would write
#define fGEN_TCG_A2_add(env, ctx, insn, pkt, RdV, RsV, RtV, SHORTCODE)    tcg_gen_add_tl(RdV, RsV, RtV);

Personally, I prefer the former, but will change to the latter if you feel strongly.

I'm not married to the fGEN_TCG_<tag> name.  DEF_TCG_<tag> would also be fine.

>
> /*
>  * Generate calls to the auto-generate helpers,
>  * and slot everything into the opcode_genptr table.
>  */
> #include "genptr_generated.c.inc"
>
> --- genptr_generated.c.inc
>
> DEF_TCG_FUNC(A4_tlbmatch)
> {
>    gen_helper_A4_tlbmatch(args...);
> }
>
> // etc
>
> const SemanticInsn opcode_genptr[] = {
>     // All IID's, generated or not.
> };
>
> ---
>
> This leaves genptr.c as the file to grep for '^DEF_TCG_FUNC'.
>
>
> r~
Richard Henderson Sept. 1, 2020, 2:40 a.m. UTC | #11
On 8/31/20 4:10 PM, Taylor Simpson wrote:
> 
> 
>> -----Original Message-----
>> From: Richard Henderson <richard.henderson@linaro.org>
>> Sent: Monday, August 31, 2020 1:20 PM
>> To: Taylor Simpson <tsimpson@quicinc.com>; qemu-devel@nongnu.org
>> Cc: philmd@redhat.com; laurent@vivier.eu; riku.voipio@iki.fi;
>> aleksandar.m.mail@gmail.com; ale@rev.ng
>> Subject: Re: [RFC PATCH v3 30/34] Hexagon (target/hexagon) TCG for
>> instructions with multiple definitions
>>
>> The fGEN_TCG_A2_add macro does not require nor use that {...} argument.
> 
> The fGEN_TCG_A2_add macro does need that argument, but there are cases that
> do need it.  Here's an example from gen_tcg.h
>     #define fGEN_TCG_L2_loadrub_pr(SHORTCODE)      SHORTCODE
> This is explained in the README, but basically the argument is useful if we
> can properly define the macros that it contains to generate TCG.
We're certainly not going to be able to handle e.g. "+" or "if", so it is going
to work only for the most trivial of SHORTCODE.

Though in fact loadrub_pr makes that grade...

> IMO, we don't want the person who writes an override having to reproduce the 
> generated code.  Assuming we have a definition of fGEN_TCG_A2_add and we
> have the generator intelligently expanding the macros, this is what will be
> generated.
You need to give me a better example that A2_add, then.  Because I see that
being exactly one line, calling a helper that handles all instructions of the
same format, passing tcg_gen_add_tl as a callback.

Have a browse through my recent microblaze decodetree conversion.  Note that
the basic logical operations are implemented with exactly one source line.

> Unlike the generate_<tag> functions that all have the same signature.  The overrides would have different signatures.  This would be more defensive programming because you know exactly where the variables come from but more verbose when writing the overrides by hand.  Also, note that these need to be macros in order to take advantage of the SHORTCODE.
> 
> In other words, instead of
> #define fGEN_TCG_A2_add(SHORTCODE)    tcg_gen_add_tl(RdV, RsV, RtV)
> 
> We would write
> #define fGEN_TCG_A2_add(env, ctx, insn, pkt, RdV, RsV, RtV, SHORTCODE)    tcg_gen_add_tl(RdV, RsV, RtV);
> 
> Personally, I prefer the former, but will change to the latter if you feel strongly.

This comes from trying to handle instructions in different ways, but represent
them all the same.

I guess I see the attraction of the magic non-parameters -- you get a
compilation error if the variable is not present, but are not tied to
positional parameters.

Ho hum.  Maybe I'm trying to overthink this too much before tackling the
ultimate goal of full parsing of the SHORTCODE.

Perhaps the only thing for the short term is to have the generator grep
genptr.c for "#define fGEN", to choose between the two alternatives: inline
generation or out-of-line helper generation.


r~
Taylor Simpson Sept. 1, 2020, 4:17 a.m. UTC | #12
> -----Original Message-----
> From: Richard Henderson <richard.henderson@linaro.org>
> Sent: Monday, August 31, 2020 8:41 PM
> To: Taylor Simpson <tsimpson@quicinc.com>; qemu-devel@nongnu.org
> Cc: philmd@redhat.com; laurent@vivier.eu; riku.voipio@iki.fi;
> aleksandar.m.mail@gmail.com; ale@rev.ng
> Subject: Re: [RFC PATCH v3 30/34] Hexagon (target/hexagon) TCG for
> instructions with multiple definitions
>
> On 8/31/20 4:10 PM, Taylor Simpson wrote:
> >
> >
> >> -----Original Message-----
> >> From: Richard Henderson <richard.henderson@linaro.org>
> >> Sent: Monday, August 31, 2020 1:20 PM
> >> To: Taylor Simpson <tsimpson@quicinc.com>; qemu-devel@nongnu.org
> >> Cc: philmd@redhat.com; laurent@vivier.eu; riku.voipio@iki.fi;
> >> aleksandar.m.mail@gmail.com; ale@rev.ng
> >> Subject: Re: [RFC PATCH v3 30/34] Hexagon (target/hexagon) TCG for
> >> instructions with multiple definitions
> >>
> >> The fGEN_TCG_A2_add macro does not require nor use that {...}
> argument.
> >
> > The fGEN_TCG_A2_add macro does need that argument, but there are
> cases that
> > do need it.  Here's an example from gen_tcg.h
> >     #define fGEN_TCG_L2_loadrub_pr(SHORTCODE)      SHORTCODE
> > This is explained in the README, but basically the argument is useful if we
> > can properly define the macros that it contains to generate TCG.
> We're certainly not going to be able to handle e.g. "+" or "if", so it is going
> to work only for the most trivial of SHORTCODE.
>
> Though in fact loadrub_pr makes that grade...

The prior version of this series included all the overrides I've written to date.  To reduce the size of this version, I removed most of them and only left the ones that are essential for correct execution.  I plan to submit the others in subsequent series.  Anyway, there are >50 overrides of load/store instructions that leverage SHORTCODE.

> > IMO, we don't want the person who writes an override having to
> reproduce the
> > generated code.  Assuming we have a definition of fGEN_TCG_A2_add and
> we
> > have the generator intelligently expanding the macros, this is what will be
> > generated.
> You need to give me a better example that A2_add, then.  Because I see that
> being exactly one line, calling a helper that handles all instructions of the
> same format, passing tcg_gen_add_tl as a callback.

Here's a more complicated example for a predicated post-increment load

Static void generate_L2_ploadrit_pi(CPUHexagonState *env, DisasContext*cts, insn_t *insn, packet_t *pkt)
{
/* L2_ploadrit_pi */
TCGv EA = tcg_temp_local_new();
int PtN = insn->regno[0];
TCGv PtV = hex_pred[PtN];
int RdN = insn->regno[1];
TCGv RdV = tcg_temp_local_new();
if (!is_preloaded(ctx, RdN)) {
    tcg_gen_mov_tl(hex_hew_value[RdN], hex_gpr[RdN]);
}
int RxN = insn->regno[2];
TCGv RxV = tcg_temp_local_new();
if (!is_preloaded(ctx, RxN)) {
    tcg_gen_mov_tl(hex_new_value[RdN], hex_gpr[RxN]);
}
int siV = insn->immed[0];
tcg_gen_mov_tl(RxV, hex_gpr[RxN]);
fGEN_TCG_L2_ploadrit_pi({fEA_REG(RxV); if(fLSBOLD(PtV)){ fPM_I(RxV,siV); fLOAD(1,4,u,EA,RdV);} else {LOAD_CANCEL(EA);}});
gen_log_reg_write(RdN, RdV, insn->slot, 1);
gen_log_reg_write(RxN, RxV, insn->slot, 1);
tcg_temp_free(EA);
tcg_temp_free(RdV);
tcg_temp_free(RxV);
/* L2_ploadrit_pi */
}


> Have a browse through my recent microblaze decodetree conversion.  Note
> that
> the basic logical operations are implemented with exactly one source line.

With a helper function, our compares are all one line also

static inline void gen_compare(TCGCond cond, TCGv res, TCGv arg1, TCGv arg2)
{
    TCGv one = tcg_const_tl(0xff);
    TCGv zero = tcg_const_tl(0);

    tcg_gen_movcond_tl(cond, res, arg1, arg2, one, zero);

    tcg_temp_free(one);
    tcg_temp_free(zero);
}

/* Compare instructions */
#define fGEN_TCG_C2_cmpeq(SHORTCODE) \
    gen_compare(TCG_COND_EQ, PdV, RsV, RtV)
#define fGEN_TCG_C4_cmpneq(SHORTCODE) \
    gen_compare(TCG_COND_NE, PdV, RsV, RtV)
#define fGEN_TCG_C2_cmpgt(SHORTCODE) \
    gen_compare(TCG_COND_GT, PdV, RsV, RtV)
#define fGEN_TCG_C2_cmpgtu(SHORTCODE) \
    gen_compare(TCG_COND_GTU, PdV, RsV, RtV)
...



> > Unlike the generate_<tag> functions that all have the same signature.  The
> overrides would have different signatures.  This would be more defensive
> programming because you know exactly where the variables come from but
> more verbose when writing the overrides by hand.  Also, note that these
> need to be macros in order to take advantage of the SHORTCODE.
> >
> > In other words, instead of
> > #define fGEN_TCG_A2_add(SHORTCODE)    tcg_gen_add_tl(RdV, RsV, RtV)
> >
> > We would write
> > #define fGEN_TCG_A2_add(env, ctx, insn, pkt, RdV, RsV, RtV,
> SHORTCODE)    tcg_gen_add_tl(RdV, RsV, RtV);
> >
> > Personally, I prefer the former, but will change to the latter if you feel
> strongly.
>
> This comes from trying to handle instructions in different ways, but
> represent
> them all the same.
>
> I guess I see the attraction of the magic non-parameters -- you get a
> compilation error if the variable is not present, but are not tied to
> positional parameters.
>
> Ho hum.  Maybe I'm trying to overthink this too much before tackling the
> ultimate goal of full parsing of the SHORTCODE.

Alessandro (ale@rev.ng) and Niccolo (nizzo@rev.ng) are working on this 
Taylor Simpson Sept. 24, 2020, 2:56 a.m. UTC | #13
> > On 8/31/20 4:10 PM, Taylor Simpson wrote:
> > >
> > >
> > >> -----Original Message-----
> > >> From: Richard Henderson <richard.henderson@linaro.org>
> > >> Sent: Monday, August 31, 2020 1:20 PM
> > >> To: Taylor Simpson <tsimpson@quicinc.com>; qemu-devel@nongnu.org
> > >> Cc: philmd@redhat.com; laurent@vivier.eu; riku.voipio@iki.fi;
> > >> aleksandar.m.mail@gmail.com; ale@rev.ng
> > >> Subject: Re: [RFC PATCH v3 30/34] Hexagon (target/hexagon) TCG for
> > >> instructions with multiple definitions
> > >>
> > Ho hum.  Maybe I'm trying to overthink this too much before tackling the
> > ultimate goal of full parsing of the SHORTCODE.
> > Perhaps the only thing for the short term is to have the generator grep
> > genptr.c for "#define fGEN", to choose between the two alternatives:
> inline
> > generation or out-of-line helper generation.
>
> That's certainly doable.  It will also be good to implement some of your other
> ideas
> - Have the generator expand the DECL/READ/WRITE/FREE macros will make
> the generated code more readable and we can specialize them for
> predicated vs non-predicated instructions which will make translation faster.
> - Generate the entire generate_<tag> function instead of just the body will
> make the generated code more readable.

I've made these changes to the generator.  I hope you like the results.  As an example, here is what we generate for the add instruction

DEF_TCG_FUNC(A2_add,
static void generate_A2_add(
                CPUHexagonState *env,
                DisasContext *ctx,
                insn_t *insn,
                packet_t *pkt)
{
    TCGv RdV = tcg_temp_local_new();
    const int RdN = insn->regno[0];
    TCGv RsV = hex_gpr[insn->regno[1]];
    TCGv RtV = hex_gpr[insn->regno[2]];
    gen_helper_A2_add(RdV, cpu_env, RsV, RtV);
    gen_log_reg_write(RdN, RdV);
    ctx_log_reg_write(ctx, RdN);
    tcg_temp_free(RdV);
})

And here is how the generated file gets used in genptr.c

#define DEF_TCG_FUNC(TAG, GENFN) \
    GENFN
#include "tcg_funcs_generated.h"
#undef DEF_TCG_FUNC

/*
 * Not all opcodes have generate_<tag> functions, so initialize
 * the table from the tcg_funcs_generated.h file.
 */
const semantic_insn_t opcode_genptr[XX_LAST_OPCODE] = {
#define DEF_TCG_FUNC(TAG, GENFN) \
    [TAG] = generate_##TAG,
#include "tcg_funcs_generated.h"
#undef DEF_TCG_FUNC
};

I've also addressed several of the items from Richard's review, so I'll resubmit the series once I figure out how to get "make check-tcg" working under meson.

Thanks,
Taylor
Richard Henderson Sept. 24, 2020, 3:03 p.m. UTC | #14
On 9/23/20 7:56 PM, Taylor Simpson wrote:
> 
> 
>>> On 8/31/20 4:10 PM, Taylor Simpson wrote:
>>>>
>>>>
>>>>> -----Original Message-----
>>>>> From: Richard Henderson <richard.henderson@linaro.org>
>>>>> Sent: Monday, August 31, 2020 1:20 PM
>>>>> To: Taylor Simpson <tsimpson@quicinc.com>; qemu-devel@nongnu.org
>>>>> Cc: philmd@redhat.com; laurent@vivier.eu; riku.voipio@iki.fi;
>>>>> aleksandar.m.mail@gmail.com; ale@rev.ng
>>>>> Subject: Re: [RFC PATCH v3 30/34] Hexagon (target/hexagon) TCG for
>>>>> instructions with multiple definitions
>>>>>
>>> Ho hum.  Maybe I'm trying to overthink this too much before tackling the
>>> ultimate goal of full parsing of the SHORTCODE.
>>> Perhaps the only thing for the short term is to have the generator grep
>>> genptr.c for "#define fGEN", to choose between the two alternatives:
>> inline
>>> generation or out-of-line helper generation.
>>
>> That's certainly doable.  It will also be good to implement some of your other
>> ideas
>> - Have the generator expand the DECL/READ/WRITE/FREE macros will make
>> the generated code more readable and we can specialize them for
>> predicated vs non-predicated instructions which will make translation faster.
>> - Generate the entire generate_<tag> function instead of just the body will
>> make the generated code more readable.
> 
> I've made these changes to the generator.  I hope you like the results.  As an example, here is what we generate for the add instruction
> 
> DEF_TCG_FUNC(A2_add,
> static void generate_A2_add(
>                 CPUHexagonState *env,
>                 DisasContext *ctx,
>                 insn_t *insn,
>                 packet_t *pkt)
> {
>     TCGv RdV = tcg_temp_local_new();
>     const int RdN = insn->regno[0];
>     TCGv RsV = hex_gpr[insn->regno[1]];
>     TCGv RtV = hex_gpr[insn->regno[2]];
>     gen_helper_A2_add(RdV, cpu_env, RsV, RtV);
>     gen_log_reg_write(RdN, RdV);
>     ctx_log_reg_write(ctx, RdN);
>     tcg_temp_free(RdV);
> })

I would be happier if the entire function body were not in a macro.  Have you
tried to set a gdb breakpoint in one of these?  Once upon a time at least, this
would have resulted in all lines of the function becoming one "source line" in
the debug info.

I also think the full function prototype is unnecessary, and the replication of
"A2_add" undesirable.

I would prefer the function prototype itself to be macro-ized.

E.g.

DEF_TCG_FUNC(A2_add)
{
    TCGv RdV = tcg_temp_local_new();
    const int RdN = insn->regno[0];
    TCGv RsV = hex_gpr[insn->regno[1]];
    TCGv RtV = hex_gpr[insn->regno[2]];
    gen_helper_A2_add(RdV, cpu_env, RsV, RtV);
    gen_log_reg_write(RdN, RdV);
    ctx_log_reg_write(ctx, RdN);
    tcg_temp_free(RdV);
}

with

#define DEF_TCG_FUNC(TAG)                             \
    static void generate_##TAG(CPUHexagonState *env,  \
                               DisasContext *ctx,     \
                               insn_t *insn,          \
                               packet_t *pkt)

> And here is how the generated file gets used in genptr.c
> 
> #define DEF_TCG_FUNC(TAG, GENFN) \
>     GENFN
> #include "tcg_funcs_generated.h"
> #undef DEF_TCG_FUNC
> 
> /*
>  * Not all opcodes have generate_<tag> functions, so initialize
>  * the table from the tcg_funcs_generated.h file.
>  */
> const semantic_insn_t opcode_genptr[XX_LAST_OPCODE] = {
> #define DEF_TCG_FUNC(TAG, GENFN) \
>     [TAG] = generate_##TAG,
> #include "tcg_funcs_generated.h"
> #undef DEF_TCG_FUNC
> };

Obviously, the macro I propose above cannot be directly reused, as you do here.
 But I also think we should not try to do so.

You've got a script generating stuff.  It can just as easily generate two
different lists.  You're trying to do too much with the C preprocessor and too
little with python.

At some point in the v3 thread, I had suggested grepping for some macro in
order to indicate to the python script which tags are implemented manually.  My
definition above is easy to look for: exactly one thing on the line, easy regexp.

> I've also addressed several of the items from Richard's review, so I'll resubmit the series once I figure out how to get "make check-tcg" working under meson.

Yes, make check-tcg is currently broken, as are a few other check-foo.  I've
not yet had the courage to look into it, hoping that someone else will do it first.


r~
Taylor Simpson Sept. 24, 2020, 5:18 p.m. UTC | #15
> -----Original Message-----
> From: Richard Henderson <richard.henderson@linaro.org>
> Sent: Thursday, September 24, 2020 9:04 AM
> To: Taylor Simpson <tsimpson@quicinc.com>; qemu-devel@nongnu.org
> Cc: philmd@redhat.com; laurent@vivier.eu; riku.voipio@iki.fi;
> aleksandar.m.mail@gmail.com; ale@rev.ng
> Subject: Re: [RFC PATCH v3 30/34] Hexagon (target/hexagon) TCG for
> instructions with multiple definitions
>
> >
> > I've made these changes to the generator.  I hope you like the results.  As
> an example, here is what we generate for the add instruction
> >
> > DEF_TCG_FUNC(A2_add,
> > static void generate_A2_add(
> >                 CPUHexagonState *env,
> >                 DisasContext *ctx,
> >                 insn_t *insn,
> >                 packet_t *pkt)
> > {
> >     TCGv RdV = tcg_temp_local_new();
> >     const int RdN = insn->regno[0];
> >     TCGv RsV = hex_gpr[insn->regno[1]];
> >     TCGv RtV = hex_gpr[insn->regno[2]];
> >     gen_helper_A2_add(RdV, cpu_env, RsV, RtV);
> >     gen_log_reg_write(RdN, RdV);
> >     ctx_log_reg_write(ctx, RdN);
> >     tcg_temp_free(RdV);
> > })
>
> I would be happier if the entire function body were not in a macro.  Have you
> tried to set a gdb breakpoint in one of these?  Once upon a time at least, this
> would have resulted in all lines of the function becoming one "source line" in
> the debug info.

Good point.  It still comes out as a single line.

> I also think the full function prototype is unnecessary, and the replication of
> "A2_add" undesirable.
>
> I would prefer the function prototype itself to be macro-ized.
>
> E.g.
>
> DEF_TCG_FUNC(A2_add)
> {
>     TCGv RdV = tcg_temp_local_new();
>     const int RdN = insn->regno[0];
>     TCGv RsV = hex_gpr[insn->regno[1]];
>     TCGv RtV = hex_gpr[insn->regno[2]];
>     gen_helper_A2_add(RdV, cpu_env, RsV, RtV);
>     gen_log_reg_write(RdN, RdV);
>     ctx_log_reg_write(ctx, RdN);
>     tcg_temp_free(RdV);
> }
>
> with
>
> #define DEF_TCG_FUNC(TAG)                             \
>     static void generate_##TAG(CPUHexagonState *env,  \
>                                DisasContext *ctx,     \
>                                insn_t *insn,          \
>                                packet_t *pkt)
>
> > And here is how the generated file gets used in genptr.c
> >
> > #define DEF_TCG_FUNC(TAG, GENFN) \
> >     GENFN
> > #include "tcg_funcs_generated.h"
> > #undef DEF_TCG_FUNC
> >
> > /*
> >  * Not all opcodes have generate_<tag> functions, so initialize
> >  * the table from the tcg_funcs_generated.h file.
> >  */
> > const semantic_insn_t opcode_genptr[XX_LAST_OPCODE] = {
> > #define DEF_TCG_FUNC(TAG, GENFN) \
> >     [TAG] = generate_##TAG,
> > #include "tcg_funcs_generated.h"
> > #undef DEF_TCG_FUNC
> > };
>
> Obviously, the macro I propose above cannot be directly reused, as you do
> here.
>  But I also think we should not try to do so.
>
> You've got a script generating stuff.  It can just as easily generate two
> different lists.  You're trying to do too much with the C preprocessor and too
> little with python.

Sure, generating two lists was also suggested by Alessandro (ale@rev.ng).  Although doing more in python and less with the C preprocessor would lead to the conclusion we shouldn't define the function prototype in a macro.  If we generate two lists, what is the advantage of putting the function signature in a macro vs generating?

>
> At some point in the v3 thread, I had suggested grepping for some macro in
> order to indicate to the python script which tags are implemented manually.
> My
> definition above is easy to look for: exactly one thing on the line, easy
> regexp.

This is already done as well.  As you may recall, we were previously generating
    #ifdef fGEN_TCG_A2_add
    fGEN_TCG_A2_add({ RdV=RsV+RtV;});
    #else
    do {
    gen_helper_A2_add(RdV, cpu_env, RsV, RtV);
    } while (0);

The generator now searches for "#define fGEN_TCG_<tag>" and generates different code depending on what it finds.  This version of the series only contains overrides that are required for correct execution.  So, A2_add isn't there.  When we do override it, the generator produces this

static void generate_A2_add(
                CPUHexagonState *env,
                DisasContext *ctx,
                insn_t *insn,
                packet_t *pkt)
{
    TCGv RdV = tcg_temp_local_new();
    const int RdN = insn->regno[0];
    TCGv RsV = hex_gpr[insn->regno[1]];
    TCGv RtV = hex_gpr[insn->regno[2]];
    fGEN_TCG_A2_add({ RdV=RsV+RtV;});                  <---- This line is different
    gen_log_reg_write(RdN, RdV);
    ctx_log_reg_write(ctx, RdN);
    tcg_temp_free(RdV);
}

Also, if it finds the override, it doesn't generate the DEF_HELPER or the helper function.  That means we don't include tcg_gen.h in helper.h as you suggested.


Thanks,
Taylor
Richard Henderson Sept. 24, 2020, 7:04 p.m. UTC | #16
On 9/24/20 10:18 AM, Taylor Simpson wrote:
>> You've got a script generating stuff.  It can just as easily generate two
>> different lists.  You're trying to do too much with the C preprocessor and too
>> little with python.
> 
> Sure, generating two lists was also suggested by Alessandro (ale@rev.ng).  Although doing more in python and less with the C preprocessor would lead to the conclusion we shouldn't define the function prototype in a macro.  If we generate two lists, what is the advantage of putting the function signature in a macro vs generating?

None, because...

>> At some point in the v3 thread, I had suggested grepping for some macro in
>> order to indicate to the python script which tags are implemented manually.
>> My
>> definition above is easy to look for: exactly one thing on the line, easy
>> regexp.
> 
> This is already done as well.  As you may recall, we were previously generating
>     #ifdef fGEN_TCG_A2_add
>     fGEN_TCG_A2_add({ RdV=RsV+RtV;});
>     #else
>     do {
>     gen_helper_A2_add(RdV, cpu_env, RsV, RtV);
>     } while (0);
> 
> The generator now searches for "#define fGEN_TCG_<tag>" ...

... I'd forgotten that they were two different macros.

> Also, if it finds the override, it doesn't generate the DEF_HELPER or the helper function.  That means we don't include tcg_gen.h in helper.h as you suggested.

Excellent.


r~
diff mbox series

Patch

diff --git a/target/hexagon/gen_tcg.h b/target/hexagon/gen_tcg.h
new file mode 100644
index 0000000..35568d1
--- /dev/null
+++ b/target/hexagon/gen_tcg.h
@@ -0,0 +1,198 @@ 
+/*
+ *  Copyright(c) 2019-2020 Qualcomm Innovation Center, Inc. All Rights Reserved.
+ *
+ *  This program is free software; you can redistribute it and/or modify
+ *  it under the terms of the GNU General Public License as published by
+ *  the Free Software Foundation; either version 2 of the License, or
+ *  (at your option) any later version.
+ *
+ *  This program is distributed in the hope that it will be useful,
+ *  but WITHOUT ANY WARRANTY; without even the implied warranty of
+ *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ *  GNU General Public License for more details.
+ *
+ *  You should have received a copy of the GNU General Public License
+ *  along with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#ifndef HEXAGON_GEN_TCG_H
+#define HEXAGON_GEN_TCG_H
+
+/*
+ * Here is a primer to understand the tag names for load/store instructions
+ *
+ * Data types
+ *      b        signed byte                       r0 = memb(r2+#0)
+ *     ub        unsigned byte                     r0 = memub(r2+#0)
+ *      h        signed half word (16 bits)        r0 = memh(r2+#0)
+ *     uh        unsigned half word                r0 = memuh(r2+#0)
+ *      i        integer (32 bits)                 r0 = memw(r2+#0)
+ *      d        double word (64 bits)             r1:0 = memd(r2+#0)
+ *
+ * Addressing modes
+ *     _io       indirect with offset              r0 = memw(r1+#4)
+ *     _ur       absolute with register offset     r0 = memw(r1<<#4+##variable)
+ *     _rr       indirect with register offset     r0 = memw(r1+r4<<#2)
+ *     gp        global pointer relative           r0 = memw(gp+#200)
+ *     _sp       stack pointer relative            r0 = memw(r29+#12)
+ *     _ap       absolute set                      r0 = memw(r1=##variable)
+ *     _pr       post increment register           r0 = memw(r1++m1)
+ *     _pi       post increment immediate          r0 = memb(r1++#1)
+ */
+
+/* Macros for complex addressing modes */
+#define GET_EA_ap \
+    do { \
+        fEA_IMM(UiV); \
+        tcg_gen_movi_tl(ReV, UiV); \
+    } while (0)
+#define GET_EA_pr \
+    do { \
+        fEA_REG(RxV); \
+        fPM_M(RxV, MuV); \
+    } while (0)
+#define GET_EA_pi \
+    do { \
+        fEA_REG(RxV); \
+        fPM_I(RxV, siV); \
+    } while (0)
+
+
+/* Instructions with multiple definitions */
+#define fGEN_TCG_LOAD_AP(RES, SIZE, SIGN) \
+    do { \
+        fMUST_IMMEXT(UiV); \
+        fEA_IMM(UiV); \
+        fLOAD(1, SIZE, SIGN, EA, RES); \
+        tcg_gen_movi_tl(ReV, UiV); \
+    } while (0)
+
+#define fGEN_TCG_L4_loadrub_ap(SHORTCODE) \
+    fGEN_TCG_LOAD_AP(RdV, 1, u)
+#define fGEN_TCG_L4_loadrb_ap(SHORTCODE) \
+    fGEN_TCG_LOAD_AP(RdV, 1, s)
+#define fGEN_TCG_L4_loadruh_ap(SHORTCODE) \
+    fGEN_TCG_LOAD_AP(RdV, 2, u)
+#define fGEN_TCG_L4_loadrh_ap(SHORTCODE) \
+    fGEN_TCG_LOAD_AP(RdV, 2, s)
+#define fGEN_TCG_L4_loadri_ap(SHORTCODE) \
+    fGEN_TCG_LOAD_AP(RdV, 4, u)
+#define fGEN_TCG_L4_loadrd_ap(SHORTCODE) \
+    fGEN_TCG_LOAD_AP(RddV, 8, u)
+
+#define fGEN_TCG_L2_loadrub_pr(SHORTCODE)      SHORTCODE
+#define fGEN_TCG_L2_loadrub_pi(SHORTCODE)      SHORTCODE
+#define fGEN_TCG_L2_loadrb_pr(SHORTCODE)       SHORTCODE
+#define fGEN_TCG_L2_loadrb_pi(SHORTCODE)       SHORTCODE;
+#define fGEN_TCG_L2_loadruh_pr(SHORTCODE)      SHORTCODE
+#define fGEN_TCG_L2_loadruh_pi(SHORTCODE)      SHORTCODE;
+#define fGEN_TCG_L2_loadrh_pr(SHORTCODE)       SHORTCODE
+#define fGEN_TCG_L2_loadrh_pi(SHORTCODE)       SHORTCODE
+#define fGEN_TCG_L2_loadri_pr(SHORTCODE)       SHORTCODE
+#define fGEN_TCG_L2_loadri_pi(SHORTCODE)       SHORTCODE
+#define fGEN_TCG_L2_loadrd_pr(SHORTCODE)       SHORTCODE
+#define fGEN_TCG_L2_loadrd_pi(SHORTCODE)       SHORTCODE
+
+/*
+ * Predicated loads
+ * Here is a primer to understand the tag names
+ *
+ * Predicate used
+ *      t        true "old" value                  if (p0) r0 = memb(r2+#0)
+ *      f        false "old" value                 if (!p0) r0 = memb(r2+#0)
+ *      tnew     true "new" value                  if (p0.new) r0 = memb(r2+#0)
+ *      fnew     false "new" value                 if (!p0.new) r0 = memb(r2+#0)
+ */
+#define fGEN_TCG_PRED_LOAD(GET_EA, PRED, SIZE, SIGN) \
+    do { \
+        TCGv LSB = tcg_temp_local_new(); \
+        TCGLabel *label = gen_new_label(); \
+        GET_EA; \
+        PRED;  \
+        PRED_LOAD_CANCEL(LSB, EA); \
+        tcg_gen_movi_tl(RdV, 0); \
+        tcg_gen_brcondi_tl(TCG_COND_EQ, LSB, 0, label); \
+            fLOAD(1, SIZE, SIGN, EA, RdV); \
+        gen_set_label(label); \
+        tcg_temp_free(LSB); \
+    } while (0)
+
+#define fGEN_TCG_L2_ploadrubt_pi(SHORTCODE) \
+    fGEN_TCG_PRED_LOAD(GET_EA_pi, fLSBOLD(PtV), 1, u)
+#define fGEN_TCG_L2_ploadrubf_pi(SHORTCODE) \
+    fGEN_TCG_PRED_LOAD(GET_EA_pi, fLSBOLDNOT(PtV), 1, u)
+#define fGEN_TCG_L2_ploadrubtnew_pi(SHORTCODE) \
+    fGEN_TCG_PRED_LOAD(GET_EA_pi, fLSBNEW(PtN), 1, u)
+#define fGEN_TCG_L2_ploadrubfnew_pi(SHORTCODE) \
+    fGEN_TCG_PRED_LOAD(GET_EA_pi, fLSBNEWNOT(PtN), 1, u)
+#define fGEN_TCG_L2_ploadrbt_pi(SHORTCODE) \
+    fGEN_TCG_PRED_LOAD(GET_EA_pi, fLSBOLD(PtV), 1, s)
+#define fGEN_TCG_L2_ploadrbf_pi(SHORTCODE) \
+    fGEN_TCG_PRED_LOAD(GET_EA_pi, fLSBOLDNOT(PtV), 1, s)
+#define fGEN_TCG_L2_ploadrbtnew_pi(SHORTCODE) \
+    fGEN_TCG_PRED_LOAD(GET_EA_pi, fLSBNEW(PtN), 1, s)
+#define fGEN_TCG_L2_ploadrbfnew_pi(SHORTCODE) \
+    fGEN_TCG_PRED_LOAD({ fEA_REG(RxV); fPM_I(RxV, siV); }, \
+                       fLSBNEWNOT(PtN), 1, s)
+
+#define fGEN_TCG_L2_ploadruht_pi(SHORTCODE) \
+    fGEN_TCG_PRED_LOAD(GET_EA_pi, fLSBOLD(PtV), 2, u)
+#define fGEN_TCG_L2_ploadruhf_pi(SHORTCODE) \
+    fGEN_TCG_PRED_LOAD(GET_EA_pi, fLSBOLDNOT(PtV), 2, u)
+#define fGEN_TCG_L2_ploadruhtnew_pi(SHORTCODE) \
+    fGEN_TCG_PRED_LOAD(GET_EA_pi, fLSBNEW(PtN), 2, u)
+#define fGEN_TCG_L2_ploadruhfnew_pi(SHORTCODE) \
+    fGEN_TCG_PRED_LOAD(GET_EA_pi, fLSBNEWNOT(PtN), 2, u)
+#define fGEN_TCG_L2_ploadrht_pi(SHORTCODE) \
+    fGEN_TCG_PRED_LOAD(GET_EA_pi, fLSBOLD(PtV), 2, s)
+#define fGEN_TCG_L2_ploadrhf_pi(SHORTCODE) \
+    fGEN_TCG_PRED_LOAD(GET_EA_pi, fLSBOLDNOT(PtV), 2, s)
+#define fGEN_TCG_L2_ploadrhtnew_pi(SHORTCODE) \
+    fGEN_TCG_PRED_LOAD(GET_EA_pi, fLSBNEW(PtN), 2, s)
+#define fGEN_TCG_L2_ploadrhfnew_pi(SHORTCODE) \
+    fGEN_TCG_PRED_LOAD(GET_EA_pi, fLSBNEWNOT(PtN), 2, s)
+
+#define fGEN_TCG_L2_ploadrit_pi(SHORTCODE) \
+    fGEN_TCG_PRED_LOAD(GET_EA_pi, fLSBOLD(PtV), 4, u)
+#define fGEN_TCG_L2_ploadrif_pi(SHORTCODE) \
+    fGEN_TCG_PRED_LOAD(GET_EA_pi, fLSBOLDNOT(PtV), 4, u)
+#define fGEN_TCG_L2_ploadritnew_pi(SHORTCODE) \
+    fGEN_TCG_PRED_LOAD(GET_EA_pi, fLSBNEW(PtN), 4, u)
+#define fGEN_TCG_L2_ploadrifnew_pi(SHORTCODE) \
+    fGEN_TCG_PRED_LOAD(GET_EA_pi, fLSBNEWNOT(PtN), 4, u)
+
+/* Predicated loads into a register pair */
+#define fGEN_TCG_PRED_LOAD_PAIR(GET_EA, PRED) \
+    do { \
+        TCGv LSB = tcg_temp_local_new(); \
+        TCGLabel *label = gen_new_label(); \
+        GET_EA; \
+        PRED;  \
+        PRED_LOAD_CANCEL(LSB, EA); \
+        tcg_gen_movi_i64(RddV, 0); \
+        tcg_gen_brcondi_tl(TCG_COND_EQ, LSB, 0, label); \
+            fLOAD(1, 8, u, EA, RddV); \
+        gen_set_label(label); \
+        tcg_temp_free(LSB); \
+    } while (0)
+
+#define fGEN_TCG_L2_ploadrdt_pi(SHORTCODE) \
+    fGEN_TCG_PRED_LOAD_PAIR(GET_EA_pi, fLSBOLD(PtV))
+#define fGEN_TCG_L2_ploadrdf_pi(SHORTCODE) \
+    fGEN_TCG_PRED_LOAD_PAIR(GET_EA_pi, fLSBOLDNOT(PtV))
+#define fGEN_TCG_L2_ploadrdtnew_pi(SHORTCODE) \
+    fGEN_TCG_PRED_LOAD_PAIR(GET_EA_pi, fLSBNEW(PtN))
+#define fGEN_TCG_L2_ploadrdfnew_pi(SHORTCODE) \
+    fGEN_TCG_PRED_LOAD_PAIR(GET_EA_pi, fLSBNEWNOT(PtN))
+
+/* load-locked and store-locked */
+#define fGEN_TCG_L2_loadw_locked(SHORTCODE) \
+    SHORTCODE
+#define fGEN_TCG_L4_loadd_locked(SHORTCODE) \
+    SHORTCODE
+#define fGEN_TCG_S2_storew_locked(SHORTCODE) \
+    do { SHORTCODE; READ_PREG(PdV, PdN); } while (0)
+#define fGEN_TCG_S4_stored_locked(SHORTCODE) \
+    do { SHORTCODE; READ_PREG(PdV, PdN); } while (0)
+
+#endif
diff --git a/target/hexagon/helper.h b/target/hexagon/helper.h
index 48b1917..cee902b 100644
--- a/target/hexagon/helper.h
+++ b/target/hexagon/helper.h
@@ -15,6 +15,8 @@ 
  *  along with this program; if not, see <http://www.gnu.org/licenses/>.
  */
 
+#include "gen_tcg.h"
+
 DEF_HELPER_2(raise_exception, noreturn, env, i32)
 DEF_HELPER_1(debug_start_packet, void, env)
 DEF_HELPER_3(debug_check_store_width, void, env, int, int)
diff --git a/target/hexagon/genptr.c b/target/hexagon/genptr.c
index a85fc14..1808660 100644
--- a/target/hexagon/genptr.c
+++ b/target/hexagon/genptr.c
@@ -26,6 +26,7 @@ 
 #include "translate.h"
 #include "macros.h"
 #include "genptr_helpers.h"
+#include "gen_tcg.h"
 
 #define DEF_TCG_FUNC(TAG, GENFN) \
 static void generate_##TAG(CPUHexagonState *env, DisasContext *ctx, \