Message ID | 20190409135243.12424-4-raphael.gault@arm.com (mailing list archive) |
---|---|
State | RFC |
Headers | show |
Series | objtool: Add support for Arm64 | expand |
I'm just doing my initial read-through,.. however On Tue, Apr 09, 2019 at 02:52:40PM +0100, Raphael Gault wrote: > + if (!(sec->sh.sh_flags & SHF_EXECINSTR) > + && (strcmp(sec->name, ".altinstr_replacement") || !IGNORE_SHF_EXEC_FLAG)) > continue; could you please not format code like that. Operators go at the end of the line, and continuation should match the indentation of the opening paren. So the above would look like: > + if (!(sec->sh.sh_flags & SHF_EXECINSTR) && > + (strcmp(sec->name, ".altinstr_replacement") || !IGNORE_SHF_EXEC_FLAG)) > continue; You appear to be doing that quit consistently, and it is against style.
On Tue, Apr 09, 2019 at 06:12:04PM +0200, Peter Zijlstra wrote: > > I'm just doing my initial read-through,.. however > > On Tue, Apr 09, 2019 at 02:52:40PM +0100, Raphael Gault wrote: > > + if (!(sec->sh.sh_flags & SHF_EXECINSTR) > > + && (strcmp(sec->name, ".altinstr_replacement") || !IGNORE_SHF_EXEC_FLAG)) > > continue; > > could you please not format code like that. Operators go at the end of > the line, and continuation should match the indentation of the opening > paren. So the above would look like: > > > + if (!(sec->sh.sh_flags & SHF_EXECINSTR) && > > + (strcmp(sec->name, ".altinstr_replacement") || !IGNORE_SHF_EXEC_FLAG)) > > continue; > > You appear to be doing that quit consistently, and it is against style. Raphael, as a heads-up, ./scripts/checkpatch.pl can catch issues like this. You can run it over a list of patches, so for a patch series you can run: $ ./scripts/checkpatch.pl *.patch ... and hopefully most of the output will be reasonable. Thanks, Mark.
On 09/04/2019 17:24, Mark Rutland wrote: > On Tue, Apr 09, 2019 at 06:12:04PM +0200, Peter Zijlstra wrote: >> >> I'm just doing my initial read-through,.. however >> >> On Tue, Apr 09, 2019 at 02:52:40PM +0100, Raphael Gault wrote: >>> + if (!(sec->sh.sh_flags & SHF_EXECINSTR) >>> + && (strcmp(sec->name, ".altinstr_replacement") || !IGNORE_SHF_EXEC_FLAG)) >>> continue; >> >> could you please not format code like that. Operators go at the end of >> the line, and continuation should match the indentation of the opening >> paren. So the above would look like: >> >>> + if (!(sec->sh.sh_flags & SHF_EXECINSTR) && >>> + (strcmp(sec->name, ".altinstr_replacement") || !IGNORE_SHF_EXEC_FLAG)) >>> continue; >> >> You appear to be doing that quit consistently, and it is against style. > > Raphael, as a heads-up, ./scripts/checkpatch.pl can catch issues like > this. You can run it over a list of patches, so for a patch series you > can run: > > $ ./scripts/checkpatch.pl *.patch > > ... and hopefully most of the output will be reasonable. > For this particular case, checkpatch only warns about it if you pass it "--strict" option. So in general it might be useful to include this option at least for the first pass at including large pieces of code. Cheers,
Hi, On 4/9/19 5:27 PM, Julien Thierry wrote: > > > On 09/04/2019 17:24, Mark Rutland wrote: >> On Tue, Apr 09, 2019 at 06:12:04PM +0200, Peter Zijlstra wrote: >>> >>> I'm just doing my initial read-through,.. however >>> >>> On Tue, Apr 09, 2019 at 02:52:40PM +0100, Raphael Gault wrote: >>>> +if (!(sec->sh.sh_flags & SHF_EXECINSTR) >>>> +&& (strcmp(sec->name, ".altinstr_replacement") || !IGNORE_SHF_EXEC_FLAG)) >>>> continue; >>> >>> could you please not format code like that. Operators go at the end of >>> the line, and continuation should match the indentation of the opening >>> paren. So the above would look like: >>> >>>> +if (!(sec->sh.sh_flags & SHF_EXECINSTR) && >>>> + (strcmp(sec->name, ".altinstr_replacement") || !IGNORE_SHF_EXEC_FLAG)) >>>> continue; >>> >>> You appear to be doing that quit consistently, and it is against style. Thank you for these remarks, I will correct this! >> >> Raphael, as a heads-up, ./scripts/checkpatch.pl can catch issues like >> this. You can run it over a list of patches, so for a patch series you >> can run: >> >> $ ./scripts/checkpatch.pl *.patch >> >> ... and hopefully most of the output will be reasonable. >> > > For this particular case, checkpatch only warns about it if you pass it > "--strict" option. So in general it might be useful to include this > option at least for the first pass at including large pieces of code. > Indeed that sounds usefull, thanks, > Cheers, > Cheers, -- Raphael Gault IMPORTANT NOTICE: The contents of this email and any attachments are confidential and may also be privileged. If you are not the intended recipient, please notify the sender immediately and do not disclose the contents to any other person, use it for any purpose, or store or copy the information in any medium. Thank you.
On Tue, Apr 09, 2019 at 02:52:40PM +0100, Raphael Gault wrote: > Since the way the initial stack frame when entering a function is different that what is done > in the x86_64 architecture, we need to add some more check to support the different cases. > As opposed as for x86_64, the return address is not stored by the call instruction but is instead > loaded in a register. The initial stack frame is thus empty when entering a function and 2 push > operations are needed to set it up correctly. All the different combinations need to be > taken into account. > > On arm64, the .altinstr_replacement section is not flagged as containing executable instructions > but we still need to process it. > > Switch tables are alse stored in a different way on arm64 than on x86_64 so we need to be able > to identify in which case we are when looking for it. > > Signed-off-by: Raphael Gault <raphael.gault@arm.com> > --- > tools/objtool/arch.h | 2 + > tools/objtool/arch/arm64/decode.c | 27 +++++++++ > tools/objtool/arch/x86/decode.c | 5 ++ > tools/objtool/check.c | 95 +++++++++++++++++++++++++++---- > tools/objtool/elf.c | 3 +- > 5 files changed, 120 insertions(+), 12 deletions(-) > > diff --git a/tools/objtool/arch.h b/tools/objtool/arch.h > index 0eff166ca613..f3bef3f2cef3 100644 > --- a/tools/objtool/arch.h > +++ b/tools/objtool/arch.h > @@ -88,4 +88,6 @@ unsigned long arch_compute_jump_destination(struct instruction *insn); > > unsigned long arch_compute_rela_sym_offset(int addend); > > +bool arch_is_insn_sibling_call(struct instruction *insn); > + > #endif /* _ARCH_H */ > diff --git a/tools/objtool/arch/arm64/decode.c b/tools/objtool/arch/arm64/decode.c > index 0feb3ae3af5d..8b293eae2b38 100644 > --- a/tools/objtool/arch/arm64/decode.c > +++ b/tools/objtool/arch/arm64/decode.c > @@ -105,6 +105,33 @@ unsigned long arch_compute_rela_sym_offset(int addend) > return addend; > } > > +/* > + * In order to know if we are in presence of a sibling > + * call and not in presence of a switch table we look > + * back at the previous instructions and see if we are > + * jumping inside the same function that we are already > + * in. > + */ > +bool arch_is_insn_sibling_call(struct instruction *insn) > +{ > + struct instruction *prev; > + struct list_head *l; > + struct symbol *sym; > + list_for_each_prev(l, &insn->list) { > + prev = (void *)l; > + if (!prev->func > + || prev->func->pfunc != insn->func->pfunc) > + return false; > + if (prev->stack_op.src.reg != ADR_SOURCE) > + continue; > + sym = find_symbol_containing(insn->sec, insn->immediate); > + if (!sym || sym->type != STT_FUNC > + || sym->pfunc != insn->func->pfunc) > + return true; > + break; > + } > + return true; > +} I get the feeling there might be a better way to do this, but I can't figure out what this function is actually doing. It looks like it searches backwards in the function for an instruction which has stack_op.src.reg != ADR_SOURCE -- what does that mean? And why doesn't it do anything with the instruction after it finds it? > static int is_arm64(struct elf *elf) > { > switch(elf->ehdr.e_machine){ > diff --git a/tools/objtool/arch/x86/decode.c b/tools/objtool/arch/x86/decode.c > index 1af7b4996307..88c3d99c76be 100644 > --- a/tools/objtool/arch/x86/decode.c > +++ b/tools/objtool/arch/x86/decode.c > @@ -85,6 +85,11 @@ unsigned long arch_compute_rela_sym_offset(int addend) > return addend + 4; > } > > +bool arch_is_insn_sibling_call(struct instruction *insn) > +{ > + return true; > +} All x86 instructions are sibling calls? > + > int arch_orc_read_unwind_hints(struct objtool_file *file) > { > struct section *sec, *relasec; > diff --git a/tools/objtool/check.c b/tools/objtool/check.c > index 17fcd8c8f9c1..fa6106214318 100644 > --- a/tools/objtool/check.c > +++ b/tools/objtool/check.c > @@ -261,10 +261,12 @@ static int decode_instructions(struct objtool_file *file) > unsigned long offset; > struct instruction *insn; > int ret; > + static int composed_insn = 0; > > for_each_sec(file, sec) { > > - if (!(sec->sh.sh_flags & SHF_EXECINSTR)) > + if (!(sec->sh.sh_flags & SHF_EXECINSTR) > + && (strcmp(sec->name, ".altinstr_replacement") || !IGNORE_SHF_EXEC_FLAG)) > continue; We should just fix the root cause instead, presumably: diff --git a/arch/arm64/include/asm/alternative.h b/arch/arm64/include/asm/alternative.h index b9f8d787eea9..e9e6b81e3eb4 100644 --- a/arch/arm64/include/asm/alternative.h +++ b/arch/arm64/include/asm/alternative.h @@ -71,7 +71,7 @@ static inline void apply_alternatives_module(void *start, size_t length) { } ALTINSTR_ENTRY(feature,cb) \ ".popsection\n" \ " .if " __stringify(cb) " == 0\n" \ - ".pushsection .altinstr_replacement, \"a\"\n" \ + ".pushsection .altinstr_replacement, \"ax\"\n" \ "663:\n\t" \ newinstr "\n" \ "664:\n\t" \ > > if (strcmp(sec->name, ".altinstr_replacement") && > @@ -297,10 +299,22 @@ static int decode_instructions(struct objtool_file *file) > WARN_FUNC("invalid instruction type %d", > insn->sec, insn->offset, insn->type); > ret = -1; > - goto err; > + free(insn); > + continue; What's the purpose of this change? If it's really needed then it looks like it should be a separate patch. > } > - > - hash_add(file->insn_hash, &insn->hash, insn->offset); > + /* > + * For arm64 architecture, we sometime split instructions so that > + * we can track the state evolution (i.e. load/store of pairs of registers). > + * We thus need to take both into account and not erase the previous ones. > + */ Ew... Is this an architectural thing, or just a quirk of the arm64 decoder? > + if (composed_insn > 0) > + hash_add(file->insn_hash, &insn->hash, insn->offset + composed_insn); > + else > + hash_add(file->insn_hash, &insn->hash, insn->offset); > + if (insn->len == 0) > + composed_insn++; > + else > + composed_insn = 0; > list_add_tail(&insn->list, &file->insn_list); > } > > @@ -510,10 +524,10 @@ static int add_jump_destinations(struct objtool_file *file) > dest_off = arch_compute_jump_destination(insn); > } else if (rela->sym->type == STT_SECTION) { > dest_sec = rela->sym->sec; > - dest_off = rela->addend + 4; > + dest_off = arch_compute_rela_sym_offset(rela->addend); > } else if (rela->sym->sec->idx) { > dest_sec = rela->sym->sec; > - dest_off = rela->sym->sym.st_value + rela->addend + 4; > + dest_off = rela->sym->sym.st_value + arch_compute_rela_sym_offset(rela->addend); > } else if (strstr(rela->sym->name, "_indirect_thunk_")) { > /* > * Retpoline jumps are really dynamic jumps in > @@ -663,7 +677,7 @@ static int handle_group_alt(struct objtool_file *file, > last_orig_insn = insn; > } > > - if (next_insn_same_sec(file, last_orig_insn)) { > + if (last_orig_insn && next_insn_same_sec(file, last_orig_insn)) { This might belong in a separate patch which explains the reason for the change. > fake_jump = malloc(sizeof(*fake_jump)); > if (!fake_jump) { > WARN("malloc failed"); > @@ -976,6 +990,17 @@ static struct rela *find_switch_table(struct objtool_file *file, > if (find_symbol_containing(rodata_sec, table_offset)) > continue; > > + /* > + * If we are on arm64 architecture, we now that we "know" > + * are in presence of a switch table thanks to > + * the `br <Xn>` insn. but we can't retrieve it yet. > + * So we just ignore unreachable for this file. > + */ > + if (JUMP_DYNAMIC_IS_SWITCH_TABLE) { > + file->ignore_unreachables = true; > + return NULL; > + } > + I think this means switch table reading is not yet supported? If so then maybe the flag should be called SWITCH_TABLE_NOT_SUPPORTED. But really this needs to be fixed anyway before merging the code. > rodata_rela = find_rela_by_dest(rodata_sec, table_offset); > if (rodata_rela) { > /* > @@ -1258,8 +1283,8 @@ static void save_reg(struct insn_state *state, unsigned char reg, int base, > > static void restore_reg(struct insn_state *state, unsigned char reg) > { > - state->regs[reg].base = CFI_UNDEFINED; > - state->regs[reg].offset = 0; > + state->regs[reg].base = initial_func_cfi.regs[reg].base; > + state->regs[reg].offset = initial_func_cfi.regs[reg].offset; > } > > /* > @@ -1415,8 +1440,33 @@ static int update_insn_state(struct instruction *insn, struct insn_state *state) > > /* add imm, %rsp */ > state->stack_size -= op->src.offset; > - if (cfa->base == CFI_SP) > + if (cfa->base == CFI_SP) { > cfa->offset -= op->src.offset; > + if (state->stack_size == 0 > + && initial_func_cfi.cfa.base == CFI_CFA) { > + cfa->base = CFI_CFA; > + cfa->offset = 0; > + } > + } > + /* > + * on arm64 the save/restore of sp into fp is not automatic > + * and the first one can be done without the other so we > + * need to be careful not to invalidate the stack frame in such > + * cases. > + */ > + else if (cfa->base == CFI_BP) { > + if (state->stack_size == 0 > + && initial_func_cfi.cfa.base == CFI_CFA) { > + cfa->base = CFI_CFA; > + cfa->offset = 0; > + restore_reg(state, CFI_BP); > + } > + } > + else if (cfa->base == CFI_CFA) { > + cfa->base = CFI_SP; > + if (state->stack_size >= 16) > + cfa->offset = 16; > + } > break; > } > > @@ -1427,6 +1477,16 @@ static int update_insn_state(struct instruction *insn, struct insn_state *state) > break; > } > > + if (op->src.reg == CFI_SP && op->dest.reg == CFI_BP && > + cfa->base == CFI_SP && > + regs[CFI_BP].base == CFI_CFA && > + regs[CFI_BP].offset == -cfa->offset) { > + > + /* mov %rsp, %rbp */ > + cfa->base = op->dest.reg; > + state->bp_scratch = false; > + break; > + } > if (op->src.reg == CFI_SP && cfa->base == CFI_SP) { > > /* drap: lea disp(%rsp), %drap */ > @@ -1518,6 +1578,9 @@ static int update_insn_state(struct instruction *insn, struct insn_state *state) > state->stack_size -= 8; > if (cfa->base == CFI_SP) > cfa->offset -= 8; > + if (cfa->base == CFI_SP && cfa->offset == 0 > + && initial_func_cfi.cfa.base == CFI_CFA) > + cfa->base = CFI_CFA; > > break; > > @@ -1557,6 +1620,8 @@ static int update_insn_state(struct instruction *insn, struct insn_state *state) > > case OP_DEST_PUSH: > state->stack_size += 8; > + if (cfa->base == CFI_CFA) > + cfa->base = CFI_SP; > if (cfa->base == CFI_SP) > cfa->offset += 8; > > @@ -1728,7 +1793,7 @@ static int validate_branch(struct objtool_file *file, struct instruction *first, > insn = first; > sec = insn->sec; > > - if (insn->alt_group && list_empty(&insn->alts)) { > + if (!insn->visited && insn->alt_group && list_empty(&insn->alts)) { Why? This looks like another one that might belong in a separate patch. > WARN_FUNC("don't know how to handle branch to middle of alternative instruction group", > sec, insn->offset); > return 1; > @@ -1871,6 +1936,14 @@ static int validate_branch(struct objtool_file *file, struct instruction *first, > case INSN_JUMP_DYNAMIC: > if (func && list_empty(&insn->alts) && > has_modified_stack_frame(&state)) { > + /* > + * On arm64 `br <Xn>` insn can be used for switch-tables > + * but it cannot be distinguished in itself from a sibling > + * call thus we need to have a look at the previous instructions > + * to determine which it is > + */ > + if (!arch_is_insn_sibling_call(insn)) > + break; > WARN_FUNC("sibling call from callable instruction with modified stack frame", > sec, insn->offset); > return 1; > diff --git a/tools/objtool/elf.c b/tools/objtool/elf.c > index b8f3cca8e58b..136f9b9fb1d1 100644 > --- a/tools/objtool/elf.c > +++ b/tools/objtool/elf.c > @@ -74,7 +74,8 @@ struct symbol *find_symbol_by_offset(struct section *sec, unsigned long offset) > struct symbol *sym; > > list_for_each_entry(sym, &sec->symbol_list, list) > - if (sym->type != STT_SECTION && > + if (sym->type != STT_NOTYPE && > + sym->type != STT_SECTION && Why? Another potential separate patch. > sym->offset == offset) > return sym; > > -- > 2.17.1 >
Hi Raphaël, I think you could split the patch in at least 3: - Handling the split instructions - Handling the jump offset - Dynamic jumps/switch table On 09/04/2019 14:52, Raphael Gault wrote: > Since the way the initial stack frame when entering a function is different that what is done Nit: "different from what is done" > in the x86_64 architecture, we need to add some more check to support the different cases. > As opposed as for x86_64, the return address is not stored by the call instruction but is instead > loaded in a register. The initial stack frame is thus empty when entering a function and 2 push > operations are needed to set it up correctly. All the different combinations need to be > taken into account. > > On arm64, the .altinstr_replacement section is not flagged as containing executable instructions > but we still need to process it. > > Switch tables are alse stored in a different way on arm64 than on x86_64 so we need to be able Nit: also > to identify in which case we are when looking for it. > > Signed-off-by: Raphael Gault <raphael.gault@arm.com> > --- > tools/objtool/arch.h | 2 + > tools/objtool/arch/arm64/decode.c | 27 +++++++++ > tools/objtool/arch/x86/decode.c | 5 ++ > tools/objtool/check.c | 95 +++++++++++++++++++++++++++---- > tools/objtool/elf.c | 3 +- > 5 files changed, 120 insertions(+), 12 deletions(-) > > diff --git a/tools/objtool/arch.h b/tools/objtool/arch.h > index 0eff166ca613..f3bef3f2cef3 100644 > --- a/tools/objtool/arch.h > +++ b/tools/objtool/arch.h > @@ -88,4 +88,6 @@ unsigned long arch_compute_jump_destination(struct instruction *insn); > > unsigned long arch_compute_rela_sym_offset(int addend); > > +bool arch_is_insn_sibling_call(struct instruction *insn); > + > #endif /* _ARCH_H */ > diff --git a/tools/objtool/arch/arm64/decode.c b/tools/objtool/arch/arm64/decode.c > index 0feb3ae3af5d..8b293eae2b38 100644 > --- a/tools/objtool/arch/arm64/decode.c > +++ b/tools/objtool/arch/arm64/decode.c > @@ -105,6 +105,33 @@ unsigned long arch_compute_rela_sym_offset(int addend) > return addend; > } > > +/* > + * In order to know if we are in presence of a sibling > + * call and not in presence of a switch table we look > + * back at the previous instructions and see if we are > + * jumping inside the same function that we are already > + * in. > + */ > +bool arch_is_insn_sibling_call(struct instruction *insn) > +{ > + struct instruction *prev; > + struct list_head *l; > + struct symbol *sym; > + list_for_each_prev(l, &insn->list) { > + prev = (void *)l; Please use list_entry() instead of casting, this only happens to work because list is the first member of the struct instruction. > + if (!prev->func > + || prev->func->pfunc != insn->func->pfunc) > + return false; > + if (prev->stack_op.src.reg != ADR_SOURCE) > + continue; > + sym = find_symbol_containing(insn->sec, insn->immediate); > + if (!sym || sym->type != STT_FUNC > + || sym->pfunc != insn->func->pfunc) > + return true; > + break; > + } > + return true; > +} > static int is_arm64(struct elf *elf) > { > switch(elf->ehdr.e_machine){ > diff --git a/tools/objtool/arch/x86/decode.c b/tools/objtool/arch/x86/decode.c > index 1af7b4996307..88c3d99c76be 100644 > --- a/tools/objtool/arch/x86/decode.c > +++ b/tools/objtool/arch/x86/decode.c > @@ -85,6 +85,11 @@ unsigned long arch_compute_rela_sym_offset(int addend) > return addend + 4; > } > > +bool arch_is_insn_sibling_call(struct instruction *insn) > +{ > + return true; The naming and what the function returns does seem to be really fitting. Makes it seem like every x86 instruction is a sibling call, which is unlikely to be the case. > +} > + > int arch_orc_read_unwind_hints(struct objtool_file *file) > { > struct section *sec, *relasec; > diff --git a/tools/objtool/check.c b/tools/objtool/check.c > index 17fcd8c8f9c1..fa6106214318 100644 > --- a/tools/objtool/check.c > +++ b/tools/objtool/check.c > @@ -261,10 +261,12 @@ static int decode_instructions(struct objtool_file *file) > unsigned long offset; > struct instruction *insn; > int ret; > + static int composed_insn = 0; > > for_each_sec(file, sec) { > > - if (!(sec->sh.sh_flags & SHF_EXECINSTR)) > + if (!(sec->sh.sh_flags & SHF_EXECINSTR) > + && (strcmp(sec->name, ".altinstr_replacement") || !IGNORE_SHF_EXEC_FLAG)) > continue; > > if (strcmp(sec->name, ".altinstr_replacement") && > @@ -297,10 +299,22 @@ static int decode_instructions(struct objtool_file *file) > WARN_FUNC("invalid instruction type %d", > insn->sec, insn->offset, insn->type); > ret = -1; > - goto err; > + free(insn); > + continue; > } > - > - hash_add(file->insn_hash, &insn->hash, insn->offset); > + /* > + * For arm64 architecture, we sometime split instructions so that > + * we can track the state evolution (i.e. load/store of pairs of registers). > + * We thus need to take both into account and not erase the previous ones. > + */ > + if (composed_insn > 0) > + hash_add(file->insn_hash, &insn->hash, insn->offset + composed_insn); > + else > + hash_add(file->insn_hash, &insn->hash, insn->offset); composed_insn has no reason to be negative, right? So this if is equivalent to: hash_add(file->insn_hash, &insn->hash, insn->offset + composed_insn); Also, that means that this only works because all arm64 instructions are 4 bytes long. Otherwise you could have an overlap between the "second" part of the composed instruction and the instruction that follows it. It feels a bit too hackish for the arch independent code. If we want to use that trick, maybe it should be the arm64 decode that should return a length of 1 or 2 for composed insn, and when decoding an instruction that isn't 4-bytes aligned, we would know to look 2 bytes before to find what we were decoding, then return (4 - len) so that we end up on the next instruction on the next itertation. This way we don't have to change anything to the decode loop. Also, I've got the impression that this hash table is most often use to find the instruction at the starting offset of a function. Meaning it is unlikely we'll end up looking up that composed instruction. Might be worth checking whether this is the case and if so, maybe we can just add the one real instruction to the hash table and focus on the instruction list for our instruction splitting. > + if (insn->len == 0) > + composed_insn++; > + else > + composed_insn = 0; > list_add_tail(&insn->list, &file->insn_list); > } > Thanks,
Hi, As Julien also pointed out I realise that similarly to the first patch of this series, this patch should be split into smaller ones. I will do this for the next version. On 4/23/19 9:36 PM, Josh Poimboeuf wrote: > On Tue, Apr 09, 2019 at 02:52:40PM +0100, Raphael Gault wrote: >> Since the way the initial stack frame when entering a function is different that what is done >> in the x86_64 architecture, we need to add some more check to support the different cases. >> As opposed as for x86_64, the return address is not stored by the call instruction but is instead >> loaded in a register. The initial stack frame is thus empty when entering a function and 2 push >> operations are needed to set it up correctly. All the different combinations need to be >> taken into account. >> >> On arm64, the .altinstr_replacement section is not flagged as containing executable instructions >> but we still need to process it. >> >> Switch tables are alse stored in a different way on arm64 than on x86_64 so we need to be able >> to identify in which case we are when looking for it. >> >> Signed-off-by: Raphael Gault <raphael.gault@arm.com> >> --- >> tools/objtool/arch.h | 2 + >> tools/objtool/arch/arm64/decode.c | 27 +++++++++ >> tools/objtool/arch/x86/decode.c | 5 ++ >> tools/objtool/check.c | 95 +++++++++++++++++++++++++++---- >> tools/objtool/elf.c | 3 +- >> 5 files changed, 120 insertions(+), 12 deletions(-) >> >> diff --git a/tools/objtool/arch.h b/tools/objtool/arch.h >> index 0eff166ca613..f3bef3f2cef3 100644 >> --- a/tools/objtool/arch.h >> +++ b/tools/objtool/arch.h >> @@ -88,4 +88,6 @@ unsigned long arch_compute_jump_destination(struct instruction *insn); >> >> unsigned long arch_compute_rela_sym_offset(int addend); >> >> +bool arch_is_insn_sibling_call(struct instruction *insn); >> + >> #endif /* _ARCH_H */ >> diff --git a/tools/objtool/arch/arm64/decode.c b/tools/objtool/arch/arm64/decode.c >> index 0feb3ae3af5d..8b293eae2b38 100644 >> --- a/tools/objtool/arch/arm64/decode.c >> +++ b/tools/objtool/arch/arm64/decode.c >> @@ -105,6 +105,33 @@ unsigned long arch_compute_rela_sym_offset(int addend) >> return addend; >> } >> >> +/* >> + * In order to know if we are in presence of a sibling >> + * call and not in presence of a switch table we look >> + * back at the previous instructions and see if we are >> + * jumping inside the same function that we are already >> + * in. >> + */ >> +bool arch_is_insn_sibling_call(struct instruction *insn) >> +{ >> +struct instruction *prev; >> +struct list_head *l; >> +struct symbol *sym; >> +list_for_each_prev(l, &insn->list) { >> +prev = (void *)l; >> +if (!prev->func >> +|| prev->func->pfunc != insn->func->pfunc) >> +return false; >> +if (prev->stack_op.src.reg != ADR_SOURCE) >> +continue; >> +sym = find_symbol_containing(insn->sec, insn->immediate); >> +if (!sym || sym->type != STT_FUNC >> +|| sym->pfunc != insn->func->pfunc) >> +return true; >> +break; >> +} >> +return true; >> +} > > I get the feeling there might be a better way to do this, but I can't > figure out what this function is actually doing. It looks like it > searches backwards in the function for an instruction which has > stack_op.src.reg != ADR_SOURCE -- what does that mean? And why doesn't > it do anything with the instruction after it finds it? > I will indeed try to make it better. >> static int is_arm64(struct elf *elf) >> { >> switch(elf->ehdr.e_machine){ >> diff --git a/tools/objtool/arch/x86/decode.c b/tools/objtool/arch/x86/decode.c >> index 1af7b4996307..88c3d99c76be 100644 >> --- a/tools/objtool/arch/x86/decode.c >> +++ b/tools/objtool/arch/x86/decode.c >> @@ -85,6 +85,11 @@ unsigned long arch_compute_rela_sym_offset(int addend) >> return addend + 4; >> } >> >> +bool arch_is_insn_sibling_call(struct instruction *insn) >> +{ >> +return true; >> +} > > All x86 instructions are sibling calls? > The sementic here is indeed wrong. I implemented it like that on the x86 side because at the place it is called from in check.c we already know that we are in presence of a sibling calls when on x86. I will improve these bits on the next iteration. >> + >> int arch_orc_read_unwind_hints(struct objtool_file *file) >> { >> struct section *sec, *relasec; >> diff --git a/tools/objtool/check.c b/tools/objtool/check.c >> index 17fcd8c8f9c1..fa6106214318 100644 >> --- a/tools/objtool/check.c >> +++ b/tools/objtool/check.c >> @@ -261,10 +261,12 @@ static int decode_instructions(struct objtool_file *file) >> unsigned long offset; >> struct instruction *insn; >> int ret; >> +static int composed_insn = 0; >> >> for_each_sec(file, sec) { >> >> -if (!(sec->sh.sh_flags & SHF_EXECINSTR)) >> +if (!(sec->sh.sh_flags & SHF_EXECINSTR) >> +&& (strcmp(sec->name, ".altinstr_replacement") || !IGNORE_SHF_EXEC_FLAG)) >> continue; > > We should just fix the root cause instead, presumably: > > diff --git a/arch/arm64/include/asm/alternative.h b/arch/arm64/include/asm/alternative.h > index b9f8d787eea9..e9e6b81e3eb4 100644 > --- a/arch/arm64/include/asm/alternative.h > +++ b/arch/arm64/include/asm/alternative.h > @@ -71,7 +71,7 @@ static inline void apply_alternatives_module(void *start, size_t length) { } > ALTINSTR_ENTRY(feature,cb)\ > ".popsection\n"\ > " .if " __stringify(cb) " == 0\n"\ > -".pushsection .altinstr_replacement, \"a\"\n"\ > +".pushsection .altinstr_replacement, \"ax\"\n"\ > "663:\n\t"\ > newinstr "\n"\ > "664:\n\t"\ > > I will take your advice and suggest this in the next version as well. >> >> if (strcmp(sec->name, ".altinstr_replacement") && >> @@ -297,10 +299,22 @@ static int decode_instructions(struct objtool_file *file) >> WARN_FUNC("invalid instruction type %d", >> insn->sec, insn->offset, insn->type); >> ret = -1; >> -goto err; >> +free(insn); >> +continue; > > What's the purpose of this change? If it's really needed then it looks > like it should be a separate patch. > >> } >> - >> -hash_add(file->insn_hash, &insn->hash, insn->offset); >> +/* >> + * For arm64 architecture, we sometime split instructions so that >> + * we can track the state evolution (i.e. load/store of pairs of registers). >> + * We thus need to take both into account and not erase the previous ones. >> + */ > > Ew... Is this an architectural thing, or just a quirk of the arm64 > decoder? > The motivation for this is to simulate the two consecutive operations that would be executed on x86 but are done in one on arm64. This is strictly a decoder related quirk. I don't know if there is a better way to do it without modifying the struct op_src and struct instruction. >> +if (composed_insn > 0) >> +hash_add(file->insn_hash, &insn->hash, insn->offset + composed_insn); >> +else >> +hash_add(file->insn_hash, &insn->hash, insn->offset); >> +if (insn->len == 0) >> +composed_insn++; >> +else >> +composed_insn = 0; >> list_add_tail(&insn->list, &file->insn_list); >> } >> >> @@ -510,10 +524,10 @@ static int add_jump_destinations(struct objtool_file *file) >> dest_off = arch_compute_jump_destination(insn); >> } else if (rela->sym->type == STT_SECTION) { >> dest_sec = rela->sym->sec; >> -dest_off = rela->addend + 4; >> +dest_off = arch_compute_rela_sym_offset(rela->addend); >> } else if (rela->sym->sec->idx) { >> dest_sec = rela->sym->sec; >> -dest_off = rela->sym->sym.st_value + rela->addend + 4; >> +dest_off = rela->sym->sym.st_value + arch_compute_rela_sym_offset(rela->addend); >> } else if (strstr(rela->sym->name, "_indirect_thunk_")) { >> /* >> * Retpoline jumps are really dynamic jumps in >> @@ -663,7 +677,7 @@ static int handle_group_alt(struct objtool_file *file, >> last_orig_insn = insn; >> } >> >> -if (next_insn_same_sec(file, last_orig_insn)) { >> +if (last_orig_insn && next_insn_same_sec(file, last_orig_insn)) { > > This might belong in a separate patch which explains the reason for the > change. > I did this because I encountered situations where the last_orig_insn was NULL because of the offset check perform in the preceding loop. This caused a null dereference to occur. But it definitely should be split up. >> fake_jump = malloc(sizeof(*fake_jump)); >> if (!fake_jump) { >> WARN("malloc failed"); >> @@ -976,6 +990,17 @@ static struct rela *find_switch_table(struct objtool_file *file, >> if (find_symbol_containing(rodata_sec, table_offset)) >> continue; >> >> +/* >> + * If we are on arm64 architecture, we now that we > > "know" > >> + * are in presence of a switch table thanks to >> + * the `br <Xn>` insn. but we can't retrieve it yet. >> + * So we just ignore unreachable for this file. >> + */ >> +if (JUMP_DYNAMIC_IS_SWITCH_TABLE) { >> +file->ignore_unreachables = true; >> +return NULL; >> +} >> + > > I think this means switch table reading is not yet supported? If so > then maybe the flag should be called SWITCH_TABLE_NOT_SUPPORTED. > > But really this needs to be fixed anyway before merging the code. > This was indeed an ugly way to do this. I will propose a better solution for this part in the next iteration. >> rodata_rela = find_rela_by_dest(rodata_sec, table_offset); >> if (rodata_rela) { >> /* >> @@ -1258,8 +1283,8 @@ static void save_reg(struct insn_state *state, unsigned char reg, int base, >> >> static void restore_reg(struct insn_state *state, unsigned char reg) >> { >> -state->regs[reg].base = CFI_UNDEFINED; >> -state->regs[reg].offset = 0; >> +state->regs[reg].base = initial_func_cfi.regs[reg].base; >> +state->regs[reg].offset = initial_func_cfi.regs[reg].offset; >> } >> >> /* >> @@ -1415,8 +1440,33 @@ static int update_insn_state(struct instruction *insn, struct insn_state *state) >> >> /* add imm, %rsp */ >> state->stack_size -= op->src.offset; >> -if (cfa->base == CFI_SP) >> +if (cfa->base == CFI_SP) { >> cfa->offset -= op->src.offset; >> +if (state->stack_size == 0 >> +&& initial_func_cfi.cfa.base == CFI_CFA) { >> +cfa->base = CFI_CFA; >> +cfa->offset = 0; >> +} >> +} >> +/* >> + * on arm64 the save/restore of sp into fp is not automatic >> + * and the first one can be done without the other so we >> + * need to be careful not to invalidate the stack frame in such >> + * cases. >> + */ >> +else if (cfa->base == CFI_BP) { >> +if (state->stack_size == 0 >> +&& initial_func_cfi.cfa.base == CFI_CFA) { >> +cfa->base = CFI_CFA; >> +cfa->offset = 0; >> +restore_reg(state, CFI_BP); >> +} >> +} >> +else if (cfa->base == CFI_CFA) { >> +cfa->base = CFI_SP; >> +if (state->stack_size >= 16) >> +cfa->offset = 16; >> +} >> break; >> } >> >> @@ -1427,6 +1477,16 @@ static int update_insn_state(struct instruction *insn, struct insn_state *state) >> break; >> } >> >> +if (op->src.reg == CFI_SP && op->dest.reg == CFI_BP && >> + cfa->base == CFI_SP && >> + regs[CFI_BP].base == CFI_CFA && >> + regs[CFI_BP].offset == -cfa->offset) { >> + >> +/* mov %rsp, %rbp */ >> +cfa->base = op->dest.reg; >> +state->bp_scratch = false; >> +break; >> +} >> if (op->src.reg == CFI_SP && cfa->base == CFI_SP) { >> >> /* drap: lea disp(%rsp), %drap */ >> @@ -1518,6 +1578,9 @@ static int update_insn_state(struct instruction *insn, struct insn_state *state) >> state->stack_size -= 8; >> if (cfa->base == CFI_SP) >> cfa->offset -= 8; >> +if (cfa->base == CFI_SP && cfa->offset == 0 >> +&& initial_func_cfi.cfa.base == CFI_CFA) >> +cfa->base = CFI_CFA; >> >> break; >> >> @@ -1557,6 +1620,8 @@ static int update_insn_state(struct instruction *insn, struct insn_state *state) >> >> case OP_DEST_PUSH: >> state->stack_size += 8; >> +if (cfa->base == CFI_CFA) >> +cfa->base = CFI_SP; >> if (cfa->base == CFI_SP) >> cfa->offset += 8; >> >> @@ -1728,7 +1793,7 @@ static int validate_branch(struct objtool_file *file, struct instruction *first, >> insn = first; >> sec = insn->sec; >> >> -if (insn->alt_group && list_empty(&insn->alts)) { >> +if (!insn->visited && insn->alt_group && list_empty(&insn->alts)) { > > Why? This looks like another one that might belong in a separate patch. > Indeed it belongs to a separate patch. I did this because I encountered a situation where one of the alternative instructions (in .altinstr_replacement) jumps at the offset of an instruction which is replaced as well. I thus considered that if one of the instruction of the group is replaced then all instructions of the group should be replaced and thus this would never happen at execution. >> WARN_FUNC("don't know how to handle branch to middle of alternative instruction group", >> sec, insn->offset); >> return 1; >> @@ -1871,6 +1936,14 @@ static int validate_branch(struct objtool_file *file, struct instruction *first, >> case INSN_JUMP_DYNAMIC: >> if (func && list_empty(&insn->alts) && >> has_modified_stack_frame(&state)) { >> +/* >> + * On arm64 `br <Xn>` insn can be used for switch-tables >> + * but it cannot be distinguished in itself from a sibling >> + * call thus we need to have a look at the previous instructions >> + * to determine which it is >> + */ >> +if (!arch_is_insn_sibling_call(insn)) >> +break; >> WARN_FUNC("sibling call from callable instruction with modified stack frame", >> sec, insn->offset); >> return 1; >> diff --git a/tools/objtool/elf.c b/tools/objtool/elf.c >> index b8f3cca8e58b..136f9b9fb1d1 100644 >> --- a/tools/objtool/elf.c >> +++ b/tools/objtool/elf.c >> @@ -74,7 +74,8 @@ struct symbol *find_symbol_by_offset(struct section *sec, unsigned long offset) >> struct symbol *sym; >> >> list_for_each_entry(sym, &sec->symbol_list, list) >> -if (sym->type != STT_SECTION && >> +if (sym->type != STT_NOTYPE && >> + sym->type != STT_SECTION && > > Why? Another potential separate patch. > This is again a special case I encountered where several symbols can be found with the same offset so the sole condition of the offset wasn't enough to distinguish the correct destination. I thus added some extra check on the type of the symbol. >> sym->offset == offset) >> return sym; >> >> -- >> 2.17.1 >> > Please fill free to comment back on my answers. I would really like to get you opinion on some of the special cases I mentioned. Thanks, -- Raphael Gault IMPORTANT NOTICE: The contents of this email and any attachments are confidential and may also be privileged. If you are not the intended recipient, please notify the sender immediately and do not disclose the contents to any other person, use it for any purpose, or store or copy the information in any medium. Thank you.
On Wed, Apr 24, 2019 at 04:32:44PM +0000, Raphael Gault wrote: > >> diff --git a/tools/objtool/arch/arm64/decode.c b/tools/objtool/arch/arm64/decode.c > >> index 0feb3ae3af5d..8b293eae2b38 100644 > >> --- a/tools/objtool/arch/arm64/decode.c > >> +++ b/tools/objtool/arch/arm64/decode.c > >> @@ -105,6 +105,33 @@ unsigned long arch_compute_rela_sym_offset(int addend) > >> return addend; > >> } > >> > >> +/* > >> + * In order to know if we are in presence of a sibling > >> + * call and not in presence of a switch table we look > >> + * back at the previous instructions and see if we are > >> + * jumping inside the same function that we are already > >> + * in. > >> + */ > >> +bool arch_is_insn_sibling_call(struct instruction *insn) > >> +{ > >> +struct instruction *prev; > >> +struct list_head *l; > >> +struct symbol *sym; > >> +list_for_each_prev(l, &insn->list) { > >> +prev = (void *)l; > >> +if (!prev->func > >> +|| prev->func->pfunc != insn->func->pfunc) > >> +return false; > >> +if (prev->stack_op.src.reg != ADR_SOURCE) > >> +continue; > >> +sym = find_symbol_containing(insn->sec, insn->immediate); > >> +if (!sym || sym->type != STT_FUNC > >> +|| sym->pfunc != insn->func->pfunc) > >> +return true; > >> +break; > >> +} > >> +return true; > >> +} > > > > I get the feeling there might be a better way to do this, but I can't > > figure out what this function is actually doing. It looks like it > > searches backwards in the function for an instruction which has > > stack_op.src.reg != ADR_SOURCE -- what does that mean? And why doesn't > > it do anything with the instruction after it finds it? > > > > I will indeed try to make it better. I still don't quite get what it's trying to accomplish, but I wonder if there's some kind of tracking you can add in validate_branch() to keep track of whatever you're looking for, leading up to the indirect jump. > >> -hash_add(file->insn_hash, &insn->hash, insn->offset); > >> +/* > >> + * For arm64 architecture, we sometime split instructions so that > >> + * we can track the state evolution (i.e. load/store of pairs of registers). > >> + * We thus need to take both into account and not erase the previous ones. > >> + */ > > > > Ew... Is this an architectural thing, or just a quirk of the arm64 > > decoder? > > > > The motivation for this is to simulate the two consecutive operations > that would be executed on x86 but are done in one on arm64. This is > strictly a decoder related quirk. I don't know if there is a better way > to do it without modifying the struct op_src and struct instruction. Ah. Which ops are those? Hopefully we can find a better way to represent that with a single instruction. Adding fake instructions is fragile.
Hi Josh, On 4/24/19 5:56 PM, Josh Poimboeuf wrote: > On Wed, Apr 24, 2019 at 04:32:44PM +0000, Raphael Gault wrote: >>>> diff --git a/tools/objtool/arch/arm64/decode.c b/tools/objtool/arch/arm64/decode.c >>>> index 0feb3ae3af5d..8b293eae2b38 100644 >>>> --- a/tools/objtool/arch/arm64/decode.c >>>> +++ b/tools/objtool/arch/arm64/decode.c >>>> @@ -105,6 +105,33 @@ unsigned long arch_compute_rela_sym_offset(int addend) >>>> return addend; >>>> } >>>> >>>> +/* >>>> + * In order to know if we are in presence of a sibling >>>> + * call and not in presence of a switch table we look >>>> + * back at the previous instructions and see if we are >>>> + * jumping inside the same function that we are already >>>> + * in. >>>> + */ >>>> +bool arch_is_insn_sibling_call(struct instruction *insn) >>>> +{ >>>> +struct instruction *prev; >>>> +struct list_head *l; >>>> +struct symbol *sym; >>>> +list_for_each_prev(l, &insn->list) { >>>> +prev = (void *)l; >>>> +if (!prev->func >>>> +|| prev->func->pfunc != insn->func->pfunc) >>>> +return false; >>>> +if (prev->stack_op.src.reg != ADR_SOURCE) >>>> +continue; >>>> +sym = find_symbol_containing(insn->sec, insn->immediate); >>>> +if (!sym || sym->type != STT_FUNC >>>> +|| sym->pfunc != insn->func->pfunc) >>>> +return true; >>>> +break; >>>> +} >>>> +return true; >>>> +} >>> >>> I get the feeling there might be a better way to do this, but I can't >>> figure out what this function is actually doing. It looks like it >>> searches backwards in the function for an instruction which has >>> stack_op.src.reg != ADR_SOURCE -- what does that mean? And why doesn't >>> it do anything with the instruction after it finds it? >>> >> >> I will indeed try to make it better. > > I still don't quite get what it's trying to accomplish, but I wonder if > there's some kind of tracking you can add in validate_branch() to keep > track of whatever you're looking for, leading up to the indirect jump. > The motivation behind this is that the `br <Xn>` instruction is a dynamic jump (jump to the address contained in the provided register). This instruction is used for sibling calls but can also be used for switch table. I use this to differentiate these two cases from one another: Generally the `adr/adrp` instruction is used prior to `br` in order to load the address into the register. What I do here is go back throught the instructions and try to identify if the address loaded. I also thought of implementing some sort of tracking in validate branch because it could be useful for identifying the switch tables as well. But it seemed to me like a major change in the sementic of this tool: indeed, from my perspective I would have to track the state of the registers and I don't know if we want to do that. >>>> -hash_add(file->insn_hash, &insn->hash, insn->offset); >>>> +/* >>>> + * For arm64 architecture, we sometime split instructions so that >>>> + * we can track the state evolution (i.e. load/store of pairs of registers). >>>> + * We thus need to take both into account and not erase the previous ones. >>>> + */ >>> >>> Ew... Is this an architectural thing, or just a quirk of the arm64 >>> decoder? >>> >> >> The motivation for this is to simulate the two consecutive operations >> that would be executed on x86 but are done in one on arm64. This is >> strictly a decoder related quirk. I don't know if there is a better way >> to do it without modifying the struct op_src and struct instruction. > > Ah. Which ops are those? Hopefully we can find a better way to > represent that with a single instruction. Adding fake instructions is > fragile. > Those are the load/store of pairs of registers, mainly stp/ldp. Those are often use in the function prologues/epilogues to save/restore the stack pointers and frame pointers however it can be used with any register pair. The idea to add a new instruction could work but I would need to extend the `struct op_src` as well I think. Thanks, -- Raphael Gault IMPORTANT NOTICE: The contents of this email and any attachments are confidential and may also be privileged. If you are not the intended recipient, please notify the sender immediately and do not disclose the contents to any other person, use it for any purpose, or store or copy the information in any medium. Thank you.
On Thu, Apr 25, 2019 at 08:12:24AM +0000, Raphael Gault wrote: > The motivation behind this is that the `br <Xn>` instruction is a > dynamic jump (jump to the address contained in the provided register). > This instruction is used for sibling calls but can also be used for > switch table. I use this to differentiate these two cases from one another: > > Generally the `adr/adrp` instruction is used prior to `br` in order to > load the address into the register. What I do here is go back throught > the instructions and try to identify if the address loaded. Yikes, be very careful with simply going back on the instruction stream. The problem case would be something like: adr ... b 1f ... b ... 1: br ... In that case, simply going backwards from 1 will not yield the desired result. At some point I did a pass storing all the fwd jumps in their destination and used that to 'rewind' the instruction stream, but that wasn't very pretty either. The best way might be to, while doing validate_branch() keep a state table of all most recent ADR(P) instructions and when encountering BR check that state to see what it really is. We currently don't do dynamic insn->type, but it shouldn't be too hard (famous last words of course).
On Thu, Apr 25, 2019 at 08:12:24AM +0000, Raphael Gault wrote: > Hi Josh, > > On 4/24/19 5:56 PM, Josh Poimboeuf wrote: > > On Wed, Apr 24, 2019 at 04:32:44PM +0000, Raphael Gault wrote: > >>>> diff --git a/tools/objtool/arch/arm64/decode.c b/tools/objtool/arch/arm64/decode.c > >>>> index 0feb3ae3af5d..8b293eae2b38 100644 > >>>> --- a/tools/objtool/arch/arm64/decode.c > >>>> +++ b/tools/objtool/arch/arm64/decode.c > >>>> @@ -105,6 +105,33 @@ unsigned long arch_compute_rela_sym_offset(int addend) > >>>> return addend; > >>>> } > >>>> > >>>> +/* > >>>> + * In order to know if we are in presence of a sibling > >>>> + * call and not in presence of a switch table we look > >>>> + * back at the previous instructions and see if we are > >>>> + * jumping inside the same function that we are already > >>>> + * in. > >>>> + */ > >>>> +bool arch_is_insn_sibling_call(struct instruction *insn) > >>>> +{ > >>>> +struct instruction *prev; > >>>> +struct list_head *l; > >>>> +struct symbol *sym; > >>>> +list_for_each_prev(l, &insn->list) { > >>>> +prev = (void *)l; > >>>> +if (!prev->func > >>>> +|| prev->func->pfunc != insn->func->pfunc) > >>>> +return false; > >>>> +if (prev->stack_op.src.reg != ADR_SOURCE) > >>>> +continue; > >>>> +sym = find_symbol_containing(insn->sec, insn->immediate); > >>>> +if (!sym || sym->type != STT_FUNC > >>>> +|| sym->pfunc != insn->func->pfunc) > >>>> +return true; > >>>> +break; > >>>> +} > >>>> +return true; > >>>> +} > >>> > >>> I get the feeling there might be a better way to do this, but I can't > >>> figure out what this function is actually doing. It looks like it > >>> searches backwards in the function for an instruction which has > >>> stack_op.src.reg != ADR_SOURCE -- what does that mean? And why doesn't > >>> it do anything with the instruction after it finds it? > >>> > >> > >> I will indeed try to make it better. > > > > I still don't quite get what it's trying to accomplish, but I wonder if > > there's some kind of tracking you can add in validate_branch() to keep > > track of whatever you're looking for, leading up to the indirect jump. > > > > The motivation behind this is that the `br <Xn>` instruction is a > dynamic jump (jump to the address contained in the provided register). > This instruction is used for sibling calls but can also be used for > switch table. I use this to differentiate these two cases from one another: > > Generally the `adr/adrp` instruction is used prior to `br` in order to > load the address into the register. What I do here is go back throught > the instructions and try to identify if the address loaded. > > I also thought of implementing some sort of tracking in validate branch > because it could be useful for identifying the switch tables as well. > But it seemed to me like a major change in the sementic of this tool: > indeed, from my perspective I would have to track the state of the > registers and I don't know if we want to do that. I don't have much time to look at this today (and I'll be out next week), but we had a similar problem in x86. See the comments above find_switch_table(), particularly #3. Does that function not work for the arm64 case? > >>>> -hash_add(file->insn_hash, &insn->hash, insn->offset); > >>>> +/* > >>>> + * For arm64 architecture, we sometime split instructions so that > >>>> + * we can track the state evolution (i.e. load/store of pairs of registers). > >>>> + * We thus need to take both into account and not erase the previous ones. > >>>> + */ > >>> > >>> Ew... Is this an architectural thing, or just a quirk of the arm64 > >>> decoder? > >>> > >> > >> The motivation for this is to simulate the two consecutive operations > >> that would be executed on x86 but are done in one on arm64. This is > >> strictly a decoder related quirk. I don't know if there is a better way > >> to do it without modifying the struct op_src and struct instruction. > > > > Ah. Which ops are those? Hopefully we can find a better way to > > represent that with a single instruction. Adding fake instructions is > > fragile. > > > > Those are the load/store of pairs of registers, mainly stp/ldp. Those > are often use in the function prologues/epilogues to save/restore the > stack pointers and frame pointers however it can be used with any > register pair. > > The idea to add a new instruction could work but I would need to extend > the `struct op_src` as well I think. Again I don't have much time to look at it, but I do think that changing op_src/dest to allow for the stp/ldp instructions would work better than inserting a fake instruction to emulate x86. Or another idea would be to associate multiple stack_ops with a single instruction.
Hi Josh, On 4/25/19 5:25 PM, Josh Poimboeuf wrote: > On Thu, Apr 25, 2019 at 08:12:24AM +0000, Raphael Gault wrote: >> Hi Josh, >> >> On 4/24/19 5:56 PM, Josh Poimboeuf wrote: >>> On Wed, Apr 24, 2019 at 04:32:44PM +0000, Raphael Gault wrote: >>>>>> diff --git a/tools/objtool/arch/arm64/decode.c b/tools/objtool/arch/arm64/decode.c >>>>>> index 0feb3ae3af5d..8b293eae2b38 100644 >>>>>> --- a/tools/objtool/arch/arm64/decode.c >>>>>> +++ b/tools/objtool/arch/arm64/decode.c >>>>>> @@ -105,6 +105,33 @@ unsigned long arch_compute_rela_sym_offset(int addend) >>>>>> return addend; >>>>>> } >>>>>> >>>>>> +/* >>>>>> + * In order to know if we are in presence of a sibling >>>>>> + * call and not in presence of a switch table we look >>>>>> + * back at the previous instructions and see if we are >>>>>> + * jumping inside the same function that we are already >>>>>> + * in. >>>>>> + */ >>>>>> +bool arch_is_insn_sibling_call(struct instruction *insn) >>>>>> +{ >>>>>> +struct instruction *prev; >>>>>> +struct list_head *l; >>>>>> +struct symbol *sym; >>>>>> +list_for_each_prev(l, &insn->list) { >>>>>> +prev = (void *)l; >>>>>> +if (!prev->func >>>>>> +|| prev->func->pfunc != insn->func->pfunc) >>>>>> +return false; >>>>>> +if (prev->stack_op.src.reg != ADR_SOURCE) >>>>>> +continue; >>>>>> +sym = find_symbol_containing(insn->sec, insn->immediate); >>>>>> +if (!sym || sym->type != STT_FUNC >>>>>> +|| sym->pfunc != insn->func->pfunc) >>>>>> +return true; >>>>>> +break; >>>>>> +} >>>>>> +return true; >>>>>> +} >>>>> >>>>> I get the feeling there might be a better way to do this, but I can't >>>>> figure out what this function is actually doing. It looks like it >>>>> searches backwards in the function for an instruction which has >>>>> stack_op.src.reg != ADR_SOURCE -- what does that mean? And why doesn't >>>>> it do anything with the instruction after it finds it? >>>>> >>>> >>>> I will indeed try to make it better. >>> >>> I still don't quite get what it's trying to accomplish, but I wonder if >>> there's some kind of tracking you can add in validate_branch() to keep >>> track of whatever you're looking for, leading up to the indirect jump. >>> >> >> The motivation behind this is that the `br <Xn>` instruction is a >> dynamic jump (jump to the address contained in the provided register). >> This instruction is used for sibling calls but can also be used for >> switch table. I use this to differentiate these two cases from one another: >> >> Generally the `adr/adrp` instruction is used prior to `br` in order to >> load the address into the register. What I do here is go back throught >> the instructions and try to identify if the address loaded. >> >> I also thought of implementing some sort of tracking in validate branch >> because it could be useful for identifying the switch tables as well. >> But it seemed to me like a major change in the sementic of this tool: >> indeed, from my perspective I would have to track the state of the >> registers and I don't know if we want to do that. > > I don't have much time to look at this today (and I'll be out next > week), but we had a similar problem in x86. See the comments above > find_switch_table(), particularly #3. Does that function not work for > the arm64 case? > Honestly, I don't have a full understanding of how the switch tables are handled on arm64. All I know is that I've identified a case in which it doesn't work (and I get an unreachable instruction warning). When trying to figure out how the switch tables work on arm64 and how objtool is retrieving them (on x86 at least) I realised that you look for 2 relocations : - One from (.rela).text which refers to the .rodata section - One from (.rela).rodata which refers somewhere else. On the case I identified the second relocation doesn't exist thus the function doesn't find the switch table. Again since I do not have a good understanding about this I am not able to say if it is a corner case or not. >>>>>> -hash_add(file->insn_hash, &insn->hash, insn->offset); >>>>>> +/* >>>>>> + * For arm64 architecture, we sometime split instructions so that >>>>>> + * we can track the state evolution (i.e. load/store of pairs of registers). >>>>>> + * We thus need to take both into account and not erase the previous ones. >>>>>> + */ >>>>> >>>>> Ew... Is this an architectural thing, or just a quirk of the arm64 >>>>> decoder? >>>>> >>>> >>>> The motivation for this is to simulate the two consecutive operations >>>> that would be executed on x86 but are done in one on arm64. This is >>>> strictly a decoder related quirk. I don't know if there is a better way >>>> to do it without modifying the struct op_src and struct instruction. >>> >>> Ah. Which ops are those? Hopefully we can find a better way to >>> represent that with a single instruction. Adding fake instructions is >>> fragile. >>> >> >> Those are the load/store of pairs of registers, mainly stp/ldp. Those >> are often use in the function prologues/epilogues to save/restore the >> stack pointers and frame pointers however it can be used with any >> register pair. >> >> The idea to add a new instruction could work but I would need to extend >> the `struct op_src` as well I think. > > Again I don't have much time to look at it, but I do think that changing > op_src/dest to allow for the stp/ldp instructions would work better than > inserting a fake instruction to emulate x86. > > Or another idea would be to associate multiple stack_ops with a single > instruction. > I haven't looked at it in depth yet but I will try to figure out a good way to represent those instructions on a more proper manner. Thanks, -- Raphael Gault IMPORTANT NOTICE: The contents of this email and any attachments are confidential and may also be privileged. If you are not the intended recipient, please notify the sender immediately and do not disclose the contents to any other person, use it for any purpose, or store or copy the information in any medium. Thank you.
Hi Josh, On 4/30/19 1:20 PM, Raphael Gault wrote: > Hi Josh, > > On 4/25/19 5:25 PM, Josh Poimboeuf wrote: >> On Thu, Apr 25, 2019 at 08:12:24AM +0000, Raphael Gault wrote: >>> Hi Josh, >>> >>> On 4/24/19 5:56 PM, Josh Poimboeuf wrote: >>>> On Wed, Apr 24, 2019 at 04:32:44PM +0000, Raphael Gault wrote: >>>>>>> diff --git a/tools/objtool/arch/arm64/decode.c b/tools/objtool/arch/arm64/decode.c >>>>>>> index 0feb3ae3af5d..8b293eae2b38 100644 >>>>>>> --- a/tools/objtool/arch/arm64/decode.c >>>>>>> +++ b/tools/objtool/arch/arm64/decode.c >>>>>>> @@ -105,6 +105,33 @@ unsigned long arch_compute_rela_sym_offset(int addend) >>>>>>> return addend; >>>>>>> } >>>>>>> >>>>>>> +/* >>>>>>> + * In order to know if we are in presence of a sibling >>>>>>> + * call and not in presence of a switch table we look >>>>>>> + * back at the previous instructions and see if we are >>>>>>> + * jumping inside the same function that we are already >>>>>>> + * in. >>>>>>> + */ >>>>>>> +bool arch_is_insn_sibling_call(struct instruction *insn) >>>>>>> +{ >>>>>>> +struct instruction *prev; >>>>>>> +struct list_head *l; >>>>>>> +struct symbol *sym; >>>>>>> +list_for_each_prev(l, &insn->list) { >>>>>>> +prev = (void *)l; >>>>>>> +if (!prev->func >>>>>>> +|| prev->func->pfunc != insn->func->pfunc) >>>>>>> +return false; >>>>>>> +if (prev->stack_op.src.reg != ADR_SOURCE) >>>>>>> +continue; >>>>>>> +sym = find_symbol_containing(insn->sec, insn->immediate); >>>>>>> +if (!sym || sym->type != STT_FUNC >>>>>>> +|| sym->pfunc != insn->func->pfunc) >>>>>>> +return true; >>>>>>> +break; >>>>>>> +} >>>>>>> +return true; >>>>>>> +} >>>>>> >>>>>> I get the feeling there might be a better way to do this, but I can't >>>>>> figure out what this function is actually doing. It looks like it >>>>>> searches backwards in the function for an instruction which has >>>>>> stack_op.src.reg != ADR_SOURCE -- what does that mean? And why doesn't >>>>>> it do anything with the instruction after it finds it? >>>>>> >>>>> >>>>> I will indeed try to make it better. >>>> >>>> I still don't quite get what it's trying to accomplish, but I wonder if >>>> there's some kind of tracking you can add in validate_branch() to keep >>>> track of whatever you're looking for, leading up to the indirect jump. >>>> >>> >>> The motivation behind this is that the `br <Xn>` instruction is a >>> dynamic jump (jump to the address contained in the provided register). >>> This instruction is used for sibling calls but can also be used for >>> switch table. I use this to differentiate these two cases from one another: >>> >>> Generally the `adr/adrp` instruction is used prior to `br` in order to >>> load the address into the register. What I do here is go back throught >>> the instructions and try to identify if the address loaded. >>> >>> I also thought of implementing some sort of tracking in validate branch >>> because it could be useful for identifying the switch tables as well. >>> But it seemed to me like a major change in the sementic of this tool: >>> indeed, from my perspective I would have to track the state of the >>> registers and I don't know if we want to do that. >> >> I don't have much time to look at this today (and I'll be out next >> week), but we had a similar problem in x86. See the comments above >> find_switch_table(), particularly #3. Does that function not work for >> the arm64 case? >> > > Honestly, I don't have a full understanding of how the switch tables are > handled on arm64. All I know is that I've identified a case in which it > doesn't work (and I get an unreachable instruction warning). > When trying to figure out how the switch tables work on arm64 and how > objtool is retrieving them (on x86 at least) I realised that you look > for 2 relocations : > - One from (.rela).text which refers to the .rodata section > - One from (.rela).rodata which refers somewhere else. > On the case I identified the second relocation doesn't exist thus the > function doesn't find the switch table. > > Again since I do not have a good understanding about this I am not able > to say if it is a corner case or not. > >>>>>>> -hash_add(file->insn_hash, &insn->hash, insn->offset); >>>>>>> +/* >>>>>>> + * For arm64 architecture, we sometime split instructions so that >>>>>>> + * we can track the state evolution (i.e. load/store of pairs of registers). >>>>>>> + * We thus need to take both into account and not erase the previous ones. >>>>>>> + */ >>>>>> >>>>>> Ew... Is this an architectural thing, or just a quirk of the arm64 >>>>>> decoder? >>>>>> >>>>> >>>>> The motivation for this is to simulate the two consecutive operations >>>>> that would be executed on x86 but are done in one on arm64. This is >>>>> strictly a decoder related quirk. I don't know if there is a better way >>>>> to do it without modifying the struct op_src and struct instruction. >>>> >>>> Ah. Which ops are those? Hopefully we can find a better way to >>>> represent that with a single instruction. Adding fake instructions is >>>> fragile. >>>> >>> >>> Those are the load/store of pairs of registers, mainly stp/ldp. Those >>> are often use in the function prologues/epilogues to save/restore the >>> stack pointers and frame pointers however it can be used with any >>> register pair. >>> >>> The idea to add a new instruction could work but I would need to extend >>> the `struct op_src` as well I think. >> >> Again I don't have much time to look at it, but I do think that changing >> op_src/dest to allow for the stp/ldp instructions would work better than >> inserting a fake instruction to emulate x86. >> >> Or another idea would be to associate multiple stack_ops with a single >> instruction. >> > > I haven't looked at it in depth yet but I will try to figure out a good > way to represent those instructions on a more proper manner. I wanted to get your thoughts on the solution I found without waiting for v2. If it's too much trouble reviewing it now I'll wait for the v2. I added a field to the struct stack_op in order to have access to an extra register. This way I can provide the extra register to the instruction in order to use it later. diff --git a/tools/objtool/arch.h b/tools/objtool/arch.h index 52599ebd89fb..ae9ae25b3bdc 100644 --- a/tools/objtool/arch.h +++ b/tools/objtool/arch.h @@ -51,6 +51,7 @@ struct op_dest { int offset; }; + enum op_src_type { OP_SRC_REG, OP_SRC_REG_INDIRECT, @@ -66,9 +67,16 @@ struct op_src { int offset; }; +struct op_extra { +unsigned char used; +unsigned char reg; +int offset; +}; + struct stack_op { struct op_dest dest; struct op_src src; +struct op_extra extra; }; struct instruction; diff --git a/tools/objtool/arch/arm64/decode.c b/tools/objtool/arch/arm64/decode.c index 17b5d59f16ad..86ad37c8397c 100644 --- a/tools/objtool/arch/arm64/decode.c +++ b/tools/objtool/arch/arm64/decode.c @@ -1743,23 +1673,26 @@ int arm_decode_ld_st_regs_pair_off(u32 instr, unsigned char *type, op->src.type = OP_SRC_REG_INDIRECT; op->src.reg = CFI_SP; op->src.offset = 0; -state.curr_offset = 8; op->dest.type = OP_DEST_REG; -op->dest.reg = state.regs[0]; +op->dest.reg = rt; op->dest.offset = 0; +op->extra.used = 1; +op->extra.reg = rt2; +op->extra.offset = 8; break; default: op->dest.type = OP_DEST_REG_INDIRECT; op->dest.reg = CFI_SP; op->dest.offset = 8; -state.curr_offset = 0; op->src.type = OP_SRC_REG; -op->src.reg = state.regs[1]; +op->src.reg = rt2; op->src.offset = 0; +op->extra.used = 1; +op->extra.reg = rt; +op->extra.offset = 0; /* store */ } -state.op = *op; -return INSN_COMPOSED; +return 0; } int arm_decode_ld_st_regs_pair_post(u32 instr, unsigned char *type, diff --git a/tools/objtool/check.c b/tools/objtool/check.c index 5ddb25414de5..df1fb6ce1e8f 100644 --- a/tools/objtool/check.c +++ b/tools/objtool/check.c @@ -1580,6 +1569,18 @@ static int update_insn_state(struct instruction *insn, struct insn_state *state) initial_func_cfi.cfa.base == CFI_CFA) cfa->base = CFI_CFA; +if (op->extra.used) { +if (regs[op->extra.reg].offset == -state->stack_size) +restore_reg(state, op->extra.reg); +state->stack_size -= 8; +if (cfa->base == CFI_SP) +cfa->offset -= 8; +if (cfa->base == CFI_SP && + cfa->offset == 0 && + initial_func_cfi.cfa.base == CFI_CFA) +cfa->base = CFI_CFA; +} + break; case OP_SRC_REG_INDIRECT: @@ -1598,12 +1599,22 @@ static int update_insn_state(struct instruction *insn, struct insn_state *state) /* drap: mov disp(%rbp), %reg */ restore_reg(state, op->dest.reg); +if (op->extra.used && + op->src.reg == CFI_BP && + op->extra.offset == regs[op->extra.reg].offset) +restore_reg(state, op->extra.reg); + } else if (op->src.reg == cfa->base && op->src.offset == regs[op->dest.reg].offset + cfa->offset) { /* mov disp(%rbp), %reg */ /* mov disp(%rsp), %reg */ restore_reg(state, op->dest.reg); + +if (op->extra.used && + op->src.reg == cfa->base && + op->extra.offset == regs[op->extra.reg].offset + cfa->offset) +restore_reg(state, op->extra.reg); } break; @@ -1653,6 +1664,21 @@ static int update_insn_state(struct instruction *insn, struct insn_state *state) save_reg(state, op->src.reg, CFI_CFA, -state->stack_size); } +if (op->extra.used) { +state->stack_size += 8; +if (cfa->base == CFI_CFA) +cfa->base = CFI_SP; +if (cfa->base == CFI_SP) +cfa->offset += 8; +if (!state->drap || + (!(op->extra.reg == cfa->base && + op->extra.reg == state->drap_reg) && + !(op->extra.reg == CFI_BP && + cfa->base == state->drap_reg) && + regs[op->extra.reg].base == CFI_UNDEFINED)) +save_reg(state, op->extra.reg, CFI_CFA, + -state->stack_size); +} /* detect when asm code uses rbp as a scratch register */ if (!no_fp && insn->func && op->src.reg == CFI_BP && cfa->base != CFI_BP) @@ -1671,11 +1697,19 @@ static int update_insn_state(struct instruction *insn, struct insn_state *state) /* save drap offset so we know when to restore it */ state->drap_offset = op->dest.offset; } +if (op->extra.used && op->extra.reg == cfa->base && + op->extra.reg == state->drap_reg) { +cfa->base = CFI_BP_INDIRECT; +cfa->offset = op->extra.offset; +} else if (regs[op->src.reg].base == CFI_UNDEFINED) { /* drap: mov reg, disp(%rbp) */ save_reg(state, op->src.reg, CFI_BP, op->dest.offset); +if (op->extra.used) +save_reg(state, op->extra.reg, CFI_BP, + op->extra.offset); } } else if (op->dest.reg == cfa->base) { @@ -1684,8 +1718,12 @@ static int update_insn_state(struct instruction *insn, struct insn_state *state) /* mov reg, disp(%rsp) */ save_reg(state, op->src.reg, CFI_CFA, op->dest.offset - state->cfa.offset); +if (op->extra.used) +save_reg(state, op->extra.reg, CFI_CFA, + op->extra.offset - state->cfa.offset); } + break; case OP_DEST_LEAVE: Thanks, -- Raphael Gault IMPORTANT NOTICE: The contents of this email and any attachments are confidential and may also be privileged. If you are not the intended recipient, please notify the sender immediately and do not disclose the contents to any other person, use it for any purpose, or store or copy the information in any medium. Thank you.
diff --git a/tools/objtool/arch.h b/tools/objtool/arch.h index 0eff166ca613..f3bef3f2cef3 100644 --- a/tools/objtool/arch.h +++ b/tools/objtool/arch.h @@ -88,4 +88,6 @@ unsigned long arch_compute_jump_destination(struct instruction *insn); unsigned long arch_compute_rela_sym_offset(int addend); +bool arch_is_insn_sibling_call(struct instruction *insn); + #endif /* _ARCH_H */ diff --git a/tools/objtool/arch/arm64/decode.c b/tools/objtool/arch/arm64/decode.c index 0feb3ae3af5d..8b293eae2b38 100644 --- a/tools/objtool/arch/arm64/decode.c +++ b/tools/objtool/arch/arm64/decode.c @@ -105,6 +105,33 @@ unsigned long arch_compute_rela_sym_offset(int addend) return addend; } +/* + * In order to know if we are in presence of a sibling + * call and not in presence of a switch table we look + * back at the previous instructions and see if we are + * jumping inside the same function that we are already + * in. + */ +bool arch_is_insn_sibling_call(struct instruction *insn) +{ + struct instruction *prev; + struct list_head *l; + struct symbol *sym; + list_for_each_prev(l, &insn->list) { + prev = (void *)l; + if (!prev->func + || prev->func->pfunc != insn->func->pfunc) + return false; + if (prev->stack_op.src.reg != ADR_SOURCE) + continue; + sym = find_symbol_containing(insn->sec, insn->immediate); + if (!sym || sym->type != STT_FUNC + || sym->pfunc != insn->func->pfunc) + return true; + break; + } + return true; +} static int is_arm64(struct elf *elf) { switch(elf->ehdr.e_machine){ diff --git a/tools/objtool/arch/x86/decode.c b/tools/objtool/arch/x86/decode.c index 1af7b4996307..88c3d99c76be 100644 --- a/tools/objtool/arch/x86/decode.c +++ b/tools/objtool/arch/x86/decode.c @@ -85,6 +85,11 @@ unsigned long arch_compute_rela_sym_offset(int addend) return addend + 4; } +bool arch_is_insn_sibling_call(struct instruction *insn) +{ + return true; +} + int arch_orc_read_unwind_hints(struct objtool_file *file) { struct section *sec, *relasec; diff --git a/tools/objtool/check.c b/tools/objtool/check.c index 17fcd8c8f9c1..fa6106214318 100644 --- a/tools/objtool/check.c +++ b/tools/objtool/check.c @@ -261,10 +261,12 @@ static int decode_instructions(struct objtool_file *file) unsigned long offset; struct instruction *insn; int ret; + static int composed_insn = 0; for_each_sec(file, sec) { - if (!(sec->sh.sh_flags & SHF_EXECINSTR)) + if (!(sec->sh.sh_flags & SHF_EXECINSTR) + && (strcmp(sec->name, ".altinstr_replacement") || !IGNORE_SHF_EXEC_FLAG)) continue; if (strcmp(sec->name, ".altinstr_replacement") && @@ -297,10 +299,22 @@ static int decode_instructions(struct objtool_file *file) WARN_FUNC("invalid instruction type %d", insn->sec, insn->offset, insn->type); ret = -1; - goto err; + free(insn); + continue; } - - hash_add(file->insn_hash, &insn->hash, insn->offset); + /* + * For arm64 architecture, we sometime split instructions so that + * we can track the state evolution (i.e. load/store of pairs of registers). + * We thus need to take both into account and not erase the previous ones. + */ + if (composed_insn > 0) + hash_add(file->insn_hash, &insn->hash, insn->offset + composed_insn); + else + hash_add(file->insn_hash, &insn->hash, insn->offset); + if (insn->len == 0) + composed_insn++; + else + composed_insn = 0; list_add_tail(&insn->list, &file->insn_list); } @@ -510,10 +524,10 @@ static int add_jump_destinations(struct objtool_file *file) dest_off = arch_compute_jump_destination(insn); } else if (rela->sym->type == STT_SECTION) { dest_sec = rela->sym->sec; - dest_off = rela->addend + 4; + dest_off = arch_compute_rela_sym_offset(rela->addend); } else if (rela->sym->sec->idx) { dest_sec = rela->sym->sec; - dest_off = rela->sym->sym.st_value + rela->addend + 4; + dest_off = rela->sym->sym.st_value + arch_compute_rela_sym_offset(rela->addend); } else if (strstr(rela->sym->name, "_indirect_thunk_")) { /* * Retpoline jumps are really dynamic jumps in @@ -663,7 +677,7 @@ static int handle_group_alt(struct objtool_file *file, last_orig_insn = insn; } - if (next_insn_same_sec(file, last_orig_insn)) { + if (last_orig_insn && next_insn_same_sec(file, last_orig_insn)) { fake_jump = malloc(sizeof(*fake_jump)); if (!fake_jump) { WARN("malloc failed"); @@ -976,6 +990,17 @@ static struct rela *find_switch_table(struct objtool_file *file, if (find_symbol_containing(rodata_sec, table_offset)) continue; + /* + * If we are on arm64 architecture, we now that we + * are in presence of a switch table thanks to + * the `br <Xn>` insn. but we can't retrieve it yet. + * So we just ignore unreachable for this file. + */ + if (JUMP_DYNAMIC_IS_SWITCH_TABLE) { + file->ignore_unreachables = true; + return NULL; + } + rodata_rela = find_rela_by_dest(rodata_sec, table_offset); if (rodata_rela) { /* @@ -1258,8 +1283,8 @@ static void save_reg(struct insn_state *state, unsigned char reg, int base, static void restore_reg(struct insn_state *state, unsigned char reg) { - state->regs[reg].base = CFI_UNDEFINED; - state->regs[reg].offset = 0; + state->regs[reg].base = initial_func_cfi.regs[reg].base; + state->regs[reg].offset = initial_func_cfi.regs[reg].offset; } /* @@ -1415,8 +1440,33 @@ static int update_insn_state(struct instruction *insn, struct insn_state *state) /* add imm, %rsp */ state->stack_size -= op->src.offset; - if (cfa->base == CFI_SP) + if (cfa->base == CFI_SP) { cfa->offset -= op->src.offset; + if (state->stack_size == 0 + && initial_func_cfi.cfa.base == CFI_CFA) { + cfa->base = CFI_CFA; + cfa->offset = 0; + } + } + /* + * on arm64 the save/restore of sp into fp is not automatic + * and the first one can be done without the other so we + * need to be careful not to invalidate the stack frame in such + * cases. + */ + else if (cfa->base == CFI_BP) { + if (state->stack_size == 0 + && initial_func_cfi.cfa.base == CFI_CFA) { + cfa->base = CFI_CFA; + cfa->offset = 0; + restore_reg(state, CFI_BP); + } + } + else if (cfa->base == CFI_CFA) { + cfa->base = CFI_SP; + if (state->stack_size >= 16) + cfa->offset = 16; + } break; } @@ -1427,6 +1477,16 @@ static int update_insn_state(struct instruction *insn, struct insn_state *state) break; } + if (op->src.reg == CFI_SP && op->dest.reg == CFI_BP && + cfa->base == CFI_SP && + regs[CFI_BP].base == CFI_CFA && + regs[CFI_BP].offset == -cfa->offset) { + + /* mov %rsp, %rbp */ + cfa->base = op->dest.reg; + state->bp_scratch = false; + break; + } if (op->src.reg == CFI_SP && cfa->base == CFI_SP) { /* drap: lea disp(%rsp), %drap */ @@ -1518,6 +1578,9 @@ static int update_insn_state(struct instruction *insn, struct insn_state *state) state->stack_size -= 8; if (cfa->base == CFI_SP) cfa->offset -= 8; + if (cfa->base == CFI_SP && cfa->offset == 0 + && initial_func_cfi.cfa.base == CFI_CFA) + cfa->base = CFI_CFA; break; @@ -1557,6 +1620,8 @@ static int update_insn_state(struct instruction *insn, struct insn_state *state) case OP_DEST_PUSH: state->stack_size += 8; + if (cfa->base == CFI_CFA) + cfa->base = CFI_SP; if (cfa->base == CFI_SP) cfa->offset += 8; @@ -1728,7 +1793,7 @@ static int validate_branch(struct objtool_file *file, struct instruction *first, insn = first; sec = insn->sec; - if (insn->alt_group && list_empty(&insn->alts)) { + if (!insn->visited && insn->alt_group && list_empty(&insn->alts)) { WARN_FUNC("don't know how to handle branch to middle of alternative instruction group", sec, insn->offset); return 1; @@ -1871,6 +1936,14 @@ static int validate_branch(struct objtool_file *file, struct instruction *first, case INSN_JUMP_DYNAMIC: if (func && list_empty(&insn->alts) && has_modified_stack_frame(&state)) { + /* + * On arm64 `br <Xn>` insn can be used for switch-tables + * but it cannot be distinguished in itself from a sibling + * call thus we need to have a look at the previous instructions + * to determine which it is + */ + if (!arch_is_insn_sibling_call(insn)) + break; WARN_FUNC("sibling call from callable instruction with modified stack frame", sec, insn->offset); return 1; diff --git a/tools/objtool/elf.c b/tools/objtool/elf.c index b8f3cca8e58b..136f9b9fb1d1 100644 --- a/tools/objtool/elf.c +++ b/tools/objtool/elf.c @@ -74,7 +74,8 @@ struct symbol *find_symbol_by_offset(struct section *sec, unsigned long offset) struct symbol *sym; list_for_each_entry(sym, &sec->symbol_list, list) - if (sym->type != STT_SECTION && + if (sym->type != STT_NOTYPE && + sym->type != STT_SECTION && sym->offset == offset) return sym;
Since the way the initial stack frame when entering a function is different that what is done in the x86_64 architecture, we need to add some more check to support the different cases. As opposed as for x86_64, the return address is not stored by the call instruction but is instead loaded in a register. The initial stack frame is thus empty when entering a function and 2 push operations are needed to set it up correctly. All the different combinations need to be taken into account. On arm64, the .altinstr_replacement section is not flagged as containing executable instructions but we still need to process it. Switch tables are alse stored in a different way on arm64 than on x86_64 so we need to be able to identify in which case we are when looking for it. Signed-off-by: Raphael Gault <raphael.gault@arm.com> --- tools/objtool/arch.h | 2 + tools/objtool/arch/arm64/decode.c | 27 +++++++++ tools/objtool/arch/x86/decode.c | 5 ++ tools/objtool/check.c | 95 +++++++++++++++++++++++++++---- tools/objtool/elf.c | 3 +- 5 files changed, 120 insertions(+), 12 deletions(-)