
[RFC,3/6] objtool: arm64: Adapt the stack frame checks and the section analysis for the arm architecture

Message ID 20190409135243.12424-4-raphael.gault@arm.com (mailing list archive)
State RFC
Series objtool: Add support for Arm64

Commit Message

Raphael Gault April 9, 2019, 1:52 p.m. UTC
Since the way the initial stack frame is set up when entering a function is different from what is done
on the x86_64 architecture, we need to add some more checks to support the different cases.
As opposed to x86_64, the return address is not stored on the stack by the call instruction but is instead
loaded into a register. The initial stack frame is thus empty when entering a function and 2 push
operations are needed to set it up correctly. All the different combinations need to be
taken into account.

On arm64, the .altinstr_replacement section is not flagged as containing executable instructions
but we still need to process it.

Switch tables are also stored in a different way on arm64 than on x86_64, so we need to be able
to identify which case we are in when looking for them.

Signed-off-by: Raphael Gault <raphael.gault@arm.com>
---
 tools/objtool/arch.h              |  2 +
 tools/objtool/arch/arm64/decode.c | 27 +++++++++
 tools/objtool/arch/x86/decode.c   |  5 ++
 tools/objtool/check.c             | 95 +++++++++++++++++++++++++++----
 tools/objtool/elf.c               |  3 +-
 5 files changed, 120 insertions(+), 12 deletions(-)
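
For context on the stack-frame handling described above: objtool seeds every function with
initial_func_cfi through the arch_initial_func_cfi_state() hook. On x86_64 the return address is
already on the stack on entry, so the initial CFA is SP + 8; the checks this patch adds against
initial_func_cfi.cfa.base == CFI_CFA suggest an arm64 hook that instead starts from an empty frame.
A minimal sketch of such a hook, assuming the cfi_state layout used elsewhere in objtool (this is
not code from the series):

void arch_initial_func_cfi_state(struct cfi_state *state)
{
        int i;

        for (i = 0; i < CFI_NUM_REGS; i++) {
                state->regs[i].base = CFI_UNDEFINED;
                state->regs[i].offset = 0;
        }

        /* empty initial frame: nothing pushed yet, return address in x30 */
        state->cfa.base = CFI_CFA;
        state->cfa.offset = 0;
}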

Comments

Peter Zijlstra April 9, 2019, 4:12 p.m. UTC | #1
I'm just doing my initial read-through,.. however

On Tue, Apr 09, 2019 at 02:52:40PM +0100, Raphael Gault wrote:
> +		if (!(sec->sh.sh_flags & SHF_EXECINSTR)
> +			&& (strcmp(sec->name, ".altinstr_replacement") || !IGNORE_SHF_EXEC_FLAG))
>  			continue;

could you please not format code like that. Operators go at the end of
the line, and continuation should match the indentation of the opening
paren. So the above would look like:

> +		if (!(sec->sh.sh_flags & SHF_EXECINSTR) &&
> +		    (strcmp(sec->name, ".altinstr_replacement") || !IGNORE_SHF_EXEC_FLAG))
>  			continue;

You appear to be doing that quite consistently, and it is against style.
Mark Rutland April 9, 2019, 4:24 p.m. UTC | #2
On Tue, Apr 09, 2019 at 06:12:04PM +0200, Peter Zijlstra wrote:
> 
> I'm just doing my initial read-through,.. however
> 
> On Tue, Apr 09, 2019 at 02:52:40PM +0100, Raphael Gault wrote:
> > +		if (!(sec->sh.sh_flags & SHF_EXECINSTR)
> > +			&& (strcmp(sec->name, ".altinstr_replacement") || !IGNORE_SHF_EXEC_FLAG))
> >  			continue;
> 
> could you please not format code like that. Operators go at the end of
> the line, and continuation should match the indentation of the opening
> paren. So the above would look like:
> 
> > +		if (!(sec->sh.sh_flags & SHF_EXECINSTR) &&
> > +		    (strcmp(sec->name, ".altinstr_replacement") || !IGNORE_SHF_EXEC_FLAG))
> >  			continue;
> 
> You appear to be doing that quit consistently, and it is against style.

Raphael, as a heads-up, ./scripts/checkpatch.pl can catch issues like
this. You can run it over a list of patches, so for a patch series you
can run:

 $ ./scripts/checkpatch.pl *.patch

... and hopefully most of the output will be reasonable.

Thanks,
Mark.
Julien Thierry April 9, 2019, 4:27 p.m. UTC | #3
On 09/04/2019 17:24, Mark Rutland wrote:
> On Tue, Apr 09, 2019 at 06:12:04PM +0200, Peter Zijlstra wrote:
>>
>> I'm just doing my initial read-through,.. however
>>
>> On Tue, Apr 09, 2019 at 02:52:40PM +0100, Raphael Gault wrote:
>>> +		if (!(sec->sh.sh_flags & SHF_EXECINSTR)
>>> +			&& (strcmp(sec->name, ".altinstr_replacement") || !IGNORE_SHF_EXEC_FLAG))
>>>  			continue;
>>
>> could you please not format code like that. Operators go at the end of
>> the line, and continuation should match the indentation of the opening
>> paren. So the above would look like:
>>
>>> +		if (!(sec->sh.sh_flags & SHF_EXECINSTR) &&
>>> +		    (strcmp(sec->name, ".altinstr_replacement") || !IGNORE_SHF_EXEC_FLAG))
>>>  			continue;
>>
>> You appear to be doing that quit consistently, and it is against style.
> 
> Raphael, as a heads-up, ./scripts/checkpatch.pl can catch issues like
> this. You can run it over a list of patches, so for a patch series you
> can run:
> 
>  $ ./scripts/checkpatch.pl *.patch
> 
> ... and hopefully most of the output will be reasonable.
> 

For this particular case, checkpatch only warns about it if you pass it
the "--strict" option (i.e. ./scripts/checkpatch.pl --strict *.patch). So in
general it might be useful to include this option, at least for a first pass
when introducing large pieces of code.

Cheers,
Raphael Gault April 9, 2019, 4:33 p.m. UTC | #4
Hi,

On 4/9/19 5:27 PM, Julien Thierry wrote:
>
>
> On 09/04/2019 17:24, Mark Rutland wrote:
>> On Tue, Apr 09, 2019 at 06:12:04PM +0200, Peter Zijlstra wrote:
>>>
>>> I'm just doing my initial read-through,.. however
>>>
>>> On Tue, Apr 09, 2019 at 02:52:40PM +0100, Raphael Gault wrote:
>>>> +if (!(sec->sh.sh_flags & SHF_EXECINSTR)
>>>> +&& (strcmp(sec->name, ".altinstr_replacement") || !IGNORE_SHF_EXEC_FLAG))
>>>>   continue;
>>>
>>> could you please not format code like that. Operators go at the end of
>>> the line, and continuation should match the indentation of the opening
>>> paren. So the above would look like:
>>>
>>>> +if (!(sec->sh.sh_flags & SHF_EXECINSTR) &&
>>>> +    (strcmp(sec->name, ".altinstr_replacement") || !IGNORE_SHF_EXEC_FLAG))
>>>>   continue;
>>>
>>> You appear to be doing that quit consistently, and it is against style.

Thank you for these remarks, I will correct this!

>>
>> Raphael, as a heads-up, ./scripts/checkpatch.pl can catch issues like
>> this. You can run it over a list of patches, so for a patch series you
>> can run:
>>
>>   $ ./scripts/checkpatch.pl *.patch
>>
>> ... and hopefully most of the output will be reasonable.
>>
>
> For this particular case, checkpatch only warns about it if you pass it
> "--strict" option. So in general it might be useful to include this
> option at least for the first pass at including large pieces of code.
>

Indeed that sounds useful, thanks,

> Cheers,
>

Cheers,

--
Raphael Gault
Josh Poimboeuf April 23, 2019, 8:36 p.m. UTC | #5
On Tue, Apr 09, 2019 at 02:52:40PM +0100, Raphael Gault wrote:
> Since the way the initial stack frame when entering a function is different that what is done
> in the x86_64 architecture, we need to add some more check to support the different cases.
> As opposed as for x86_64, the return address is not stored by the call instruction but is instead
> loaded in a register. The initial stack frame is thus empty when entering a function and 2 push
> operations are needed to set it up correctly. All the different combinations need to be
> taken into account.
> 
> On arm64, the .altinstr_replacement section is not flagged as containing executable instructions
> but we still need to process it.
> 
> Switch tables are alse stored in a different way on arm64 than on x86_64 so we need to be able
> to identify in which case we are when looking for it.
> 
> Signed-off-by: Raphael Gault <raphael.gault@arm.com>
> ---
>  tools/objtool/arch.h              |  2 +
>  tools/objtool/arch/arm64/decode.c | 27 +++++++++
>  tools/objtool/arch/x86/decode.c   |  5 ++
>  tools/objtool/check.c             | 95 +++++++++++++++++++++++++++----
>  tools/objtool/elf.c               |  3 +-
>  5 files changed, 120 insertions(+), 12 deletions(-)
> 
> diff --git a/tools/objtool/arch.h b/tools/objtool/arch.h
> index 0eff166ca613..f3bef3f2cef3 100644
> --- a/tools/objtool/arch.h
> +++ b/tools/objtool/arch.h
> @@ -88,4 +88,6 @@ unsigned long arch_compute_jump_destination(struct instruction *insn);
>  
>  unsigned long arch_compute_rela_sym_offset(int addend);
>  
> +bool arch_is_insn_sibling_call(struct instruction *insn);
> +
>  #endif /* _ARCH_H */
> diff --git a/tools/objtool/arch/arm64/decode.c b/tools/objtool/arch/arm64/decode.c
> index 0feb3ae3af5d..8b293eae2b38 100644
> --- a/tools/objtool/arch/arm64/decode.c
> +++ b/tools/objtool/arch/arm64/decode.c
> @@ -105,6 +105,33 @@ unsigned long arch_compute_rela_sym_offset(int addend)
>  	return addend;
>  }
>  
> +/*
> + * In order to know if we are in presence of a sibling
> + * call and not in presence of a switch table we look
> + * back at the previous instructions and see if we are
> + * jumping inside the same function that we are already
> + * in.
> + */
> +bool arch_is_insn_sibling_call(struct instruction *insn)
> +{
> +	struct instruction *prev;
> +	struct list_head *l;
> +	struct symbol *sym;
> +	list_for_each_prev(l, &insn->list) {
> +		prev = (void *)l;
> +		if (!prev->func
> +			|| prev->func->pfunc != insn->func->pfunc)
> +			return false;
> +		if (prev->stack_op.src.reg != ADR_SOURCE)
> +			continue;
> +		sym = find_symbol_containing(insn->sec, insn->immediate);
> +		if (!sym || sym->type != STT_FUNC
> +			|| sym->pfunc != insn->func->pfunc)
> +			return true;
> +		break;
> +	}
> +	return true;
> +}

I get the feeling there might be a better way to do this, but I can't
figure out what this function is actually doing.  It looks like it
searches backwards in the function for an instruction which has
stack_op.src.reg != ADR_SOURCE -- what does that mean?  And why doesn't
it do anything with the instruction after it finds it?

>  static int is_arm64(struct elf *elf)
>  {
>  	switch(elf->ehdr.e_machine){
> diff --git a/tools/objtool/arch/x86/decode.c b/tools/objtool/arch/x86/decode.c
> index 1af7b4996307..88c3d99c76be 100644
> --- a/tools/objtool/arch/x86/decode.c
> +++ b/tools/objtool/arch/x86/decode.c
> @@ -85,6 +85,11 @@ unsigned long arch_compute_rela_sym_offset(int addend)
>  	return addend + 4;
>  }
>  
> +bool arch_is_insn_sibling_call(struct instruction *insn)
> +{
> +	return true;
> +}

All x86 instructions are sibling calls?

> +
>  int arch_orc_read_unwind_hints(struct objtool_file *file)
>  {
>  	struct section *sec, *relasec;
> diff --git a/tools/objtool/check.c b/tools/objtool/check.c
> index 17fcd8c8f9c1..fa6106214318 100644
> --- a/tools/objtool/check.c
> +++ b/tools/objtool/check.c
> @@ -261,10 +261,12 @@ static int decode_instructions(struct objtool_file *file)
>  	unsigned long offset;
>  	struct instruction *insn;
>  	int ret;
> +	static int composed_insn = 0;
>  
>  	for_each_sec(file, sec) {
>  
> -		if (!(sec->sh.sh_flags & SHF_EXECINSTR))
> +		if (!(sec->sh.sh_flags & SHF_EXECINSTR)
> +			&& (strcmp(sec->name, ".altinstr_replacement") || !IGNORE_SHF_EXEC_FLAG))
>  			continue;

We should just fix the root cause instead, presumably:

diff --git a/arch/arm64/include/asm/alternative.h b/arch/arm64/include/asm/alternative.h
index b9f8d787eea9..e9e6b81e3eb4 100644
--- a/arch/arm64/include/asm/alternative.h
+++ b/arch/arm64/include/asm/alternative.h
@@ -71,7 +71,7 @@ static inline void apply_alternatives_module(void *start, size_t length) { }
 	ALTINSTR_ENTRY(feature,cb)					\
 	".popsection\n"							\
 	" .if " __stringify(cb) " == 0\n"				\
-	".pushsection .altinstr_replacement, \"a\"\n"			\
+	".pushsection .altinstr_replacement, \"ax\"\n"			\
 	"663:\n\t"							\
 	newinstr "\n"							\
 	"664:\n\t"							\


>  
>  		if (strcmp(sec->name, ".altinstr_replacement") &&
> @@ -297,10 +299,22 @@ static int decode_instructions(struct objtool_file *file)
>  				WARN_FUNC("invalid instruction type %d",
>  					  insn->sec, insn->offset, insn->type);
>  				ret = -1;
> -				goto err;
> +				free(insn);
> +				continue;

What's the purpose of this change?  If it's really needed then it looks
like it should be a separate patch.

>  			}
> -
> -			hash_add(file->insn_hash, &insn->hash, insn->offset);
> +			/*
> +			 * For arm64 architecture, we sometime split instructions so that
> +			 * we can track the state evolution (i.e. load/store of pairs of registers).
> +			 * We thus need to take both into account and not erase the previous ones.
> +			 */

Ew...  Is this an architectural thing, or just a quirk of the arm64
decoder?

> +			if (composed_insn > 0)
> +				hash_add(file->insn_hash, &insn->hash, insn->offset + composed_insn);
> +			else
> +				hash_add(file->insn_hash, &insn->hash, insn->offset);
> +			if (insn->len == 0)
> +				composed_insn++;
> +			else
> +				composed_insn = 0;
>  			list_add_tail(&insn->list, &file->insn_list);
>  		}
>  
> @@ -510,10 +524,10 @@ static int add_jump_destinations(struct objtool_file *file)
>  			dest_off = arch_compute_jump_destination(insn);
>  		} else if (rela->sym->type == STT_SECTION) {
>  			dest_sec = rela->sym->sec;
> -			dest_off = rela->addend + 4;
> +			dest_off = arch_compute_rela_sym_offset(rela->addend);
>  		} else if (rela->sym->sec->idx) {
>  			dest_sec = rela->sym->sec;
> -			dest_off = rela->sym->sym.st_value + rela->addend + 4;
> +			dest_off = rela->sym->sym.st_value + arch_compute_rela_sym_offset(rela->addend);
>  		} else if (strstr(rela->sym->name, "_indirect_thunk_")) {
>  			/*
>  			 * Retpoline jumps are really dynamic jumps in
> @@ -663,7 +677,7 @@ static int handle_group_alt(struct objtool_file *file,
>  		last_orig_insn = insn;
>  	}
>  
> -	if (next_insn_same_sec(file, last_orig_insn)) {
> +	if (last_orig_insn && next_insn_same_sec(file, last_orig_insn)) {

This might belong in a separate patch which explains the reason for the
change.

>  		fake_jump = malloc(sizeof(*fake_jump));
>  		if (!fake_jump) {
>  			WARN("malloc failed");
> @@ -976,6 +990,17 @@ static struct rela *find_switch_table(struct objtool_file *file,
>  		if (find_symbol_containing(rodata_sec, table_offset))
>  			continue;
>  
> +		/*
> +		 * If we are on arm64 architecture, we now that we

"know"

> +		 * are in presence of a switch table thanks to
> +		 * the `br <Xn>` insn. but we can't retrieve it yet.
> +		 * So we just ignore unreachable for this file.
> +		 */
> +		if (JUMP_DYNAMIC_IS_SWITCH_TABLE) {
> +			file->ignore_unreachables = true;
> +			return NULL;
> +		}
> +

I think this means switch table reading is not yet supported?  If so
then maybe the flag should be called SWITCH_TABLE_NOT_SUPPORTED.

But really this needs to be fixed anyway before merging the code.

>  		rodata_rela = find_rela_by_dest(rodata_sec, table_offset);
>  		if (rodata_rela) {
>  			/*
> @@ -1258,8 +1283,8 @@ static void save_reg(struct insn_state *state, unsigned char reg, int base,
>  
>  static void restore_reg(struct insn_state *state, unsigned char reg)
>  {
> -	state->regs[reg].base = CFI_UNDEFINED;
> -	state->regs[reg].offset = 0;
> +	state->regs[reg].base = initial_func_cfi.regs[reg].base;
> +	state->regs[reg].offset = initial_func_cfi.regs[reg].offset;
>  }
>  
>  /*
> @@ -1415,8 +1440,33 @@ static int update_insn_state(struct instruction *insn, struct insn_state *state)
>  
>  				/* add imm, %rsp */
>  				state->stack_size -= op->src.offset;
> -				if (cfa->base == CFI_SP)
> +				if (cfa->base == CFI_SP) {
>  					cfa->offset -= op->src.offset;
> +					if (state->stack_size == 0
> +							&& initial_func_cfi.cfa.base == CFI_CFA) {
> +						cfa->base = CFI_CFA;
> +						cfa->offset = 0;
> +					}
> +				}
> +				/*
> +				 * on arm64 the save/restore of sp into fp is not automatic
> +				 * and the first one can be done without the other so we
> +				 * need to be careful not to invalidate the stack frame in such
> +				 * cases.
> +				 */
> +				else if (cfa->base == CFI_BP) {
> +					if (state->stack_size == 0
> +							&& initial_func_cfi.cfa.base == CFI_CFA) {
> +						cfa->base = CFI_CFA;
> +						cfa->offset = 0;
> +						restore_reg(state, CFI_BP);
> +					}
> +				}
> +				else if (cfa->base == CFI_CFA) {
> +					cfa->base = CFI_SP;
> +					if (state->stack_size >= 16)
> +						cfa->offset = 16;
> +				}
>  				break;
>  			}
>  
> @@ -1427,6 +1477,16 @@ static int update_insn_state(struct instruction *insn, struct insn_state *state)
>  				break;
>  			}
>  
> +			if (op->src.reg == CFI_SP && op->dest.reg == CFI_BP &&
> +			    cfa->base == CFI_SP &&
> +			    regs[CFI_BP].base == CFI_CFA &&
> +			    regs[CFI_BP].offset == -cfa->offset) {
> +
> +				/* mov %rsp, %rbp */
> +				cfa->base = op->dest.reg;
> +				state->bp_scratch = false;
> +				break;
> +			}
>  			if (op->src.reg == CFI_SP && cfa->base == CFI_SP) {
>  
>  				/* drap: lea disp(%rsp), %drap */
> @@ -1518,6 +1578,9 @@ static int update_insn_state(struct instruction *insn, struct insn_state *state)
>  			state->stack_size -= 8;
>  			if (cfa->base == CFI_SP)
>  				cfa->offset -= 8;
> +			if (cfa->base == CFI_SP && cfa->offset == 0
> +					&& initial_func_cfi.cfa.base == CFI_CFA)
> +				cfa->base = CFI_CFA;
>  
>  			break;
>  
> @@ -1557,6 +1620,8 @@ static int update_insn_state(struct instruction *insn, struct insn_state *state)
>  
>  	case OP_DEST_PUSH:
>  		state->stack_size += 8;
> +		if (cfa->base == CFI_CFA)
> +			cfa->base = CFI_SP;
>  		if (cfa->base == CFI_SP)
>  			cfa->offset += 8;
>  
> @@ -1728,7 +1793,7 @@ static int validate_branch(struct objtool_file *file, struct instruction *first,
>  	insn = first;
>  	sec = insn->sec;
>  
> -	if (insn->alt_group && list_empty(&insn->alts)) {
> +	if (!insn->visited && insn->alt_group && list_empty(&insn->alts)) {

Why?  This looks like another one that might belong in a separate patch.

>  		WARN_FUNC("don't know how to handle branch to middle of alternative instruction group",
>  			  sec, insn->offset);
>  		return 1;
> @@ -1871,6 +1936,14 @@ static int validate_branch(struct objtool_file *file, struct instruction *first,
>  		case INSN_JUMP_DYNAMIC:
>  			if (func && list_empty(&insn->alts) &&
>  			    has_modified_stack_frame(&state)) {
> +				/*
> +				 * On arm64 `br <Xn>` insn can be used for switch-tables
> +				 * but it cannot be distinguished in itself from a sibling
> +				 * call thus we need to have a look at the previous instructions
> +				 * to determine which it is
> +				 */
> +				if (!arch_is_insn_sibling_call(insn))
> +					break;
>  				WARN_FUNC("sibling call from callable instruction with modified stack frame",
>  					  sec, insn->offset);
>  				return 1;
> diff --git a/tools/objtool/elf.c b/tools/objtool/elf.c
> index b8f3cca8e58b..136f9b9fb1d1 100644
> --- a/tools/objtool/elf.c
> +++ b/tools/objtool/elf.c
> @@ -74,7 +74,8 @@ struct symbol *find_symbol_by_offset(struct section *sec, unsigned long offset)
>  	struct symbol *sym;
>  
>  	list_for_each_entry(sym, &sec->symbol_list, list)
> -		if (sym->type != STT_SECTION &&
> +		if (sym->type != STT_NOTYPE &&
> +		    sym->type != STT_SECTION &&

Why?  Another potential separate patch.

>  		    sym->offset == offset)
>  			return sym;
>  
> -- 
> 2.17.1
>
Julien Thierry April 24, 2019, 10:36 a.m. UTC | #6
Hi Raphaël,

I think you could split the patch into at least 3:

- Handling the split instructions
- Handling the jump offset
- Dynamic jumps/switch table


On 09/04/2019 14:52, Raphael Gault wrote:
> Since the way the initial stack frame when entering a function is different that what is done

Nit: "different from what is done"

> in the x86_64 architecture, we need to add some more check to support the different cases.
> As opposed as for x86_64, the return address is not stored by the call instruction but is instead
> loaded in a register. The initial stack frame is thus empty when entering a function and 2 push
> operations are needed to set it up correctly. All the different combinations need to be
> taken into account.
> 
> On arm64, the .altinstr_replacement section is not flagged as containing executable instructions
> but we still need to process it.
> 
> Switch tables are alse stored in a different way on arm64 than on x86_64 so we need to be able

Nit: also

> to identify in which case we are when looking for it.
> 
> Signed-off-by: Raphael Gault <raphael.gault@arm.com>
> ---
>  tools/objtool/arch.h              |  2 +
>  tools/objtool/arch/arm64/decode.c | 27 +++++++++
>  tools/objtool/arch/x86/decode.c   |  5 ++
>  tools/objtool/check.c             | 95 +++++++++++++++++++++++++++----
>  tools/objtool/elf.c               |  3 +-
>  5 files changed, 120 insertions(+), 12 deletions(-)
> 
> diff --git a/tools/objtool/arch.h b/tools/objtool/arch.h
> index 0eff166ca613..f3bef3f2cef3 100644
> --- a/tools/objtool/arch.h
> +++ b/tools/objtool/arch.h
> @@ -88,4 +88,6 @@ unsigned long arch_compute_jump_destination(struct instruction *insn);
>  
>  unsigned long arch_compute_rela_sym_offset(int addend);
>  
> +bool arch_is_insn_sibling_call(struct instruction *insn);
> +
>  #endif /* _ARCH_H */
> diff --git a/tools/objtool/arch/arm64/decode.c b/tools/objtool/arch/arm64/decode.c
> index 0feb3ae3af5d..8b293eae2b38 100644
> --- a/tools/objtool/arch/arm64/decode.c
> +++ b/tools/objtool/arch/arm64/decode.c
> @@ -105,6 +105,33 @@ unsigned long arch_compute_rela_sym_offset(int addend)
>  	return addend;
>  }
>  
> +/*
> + * In order to know if we are in presence of a sibling
> + * call and not in presence of a switch table we look
> + * back at the previous instructions and see if we are
> + * jumping inside the same function that we are already
> + * in.
> + */
> +bool arch_is_insn_sibling_call(struct instruction *insn)
> +{
> +	struct instruction *prev;
> +	struct list_head *l;
> +	struct symbol *sym;
> +	list_for_each_prev(l, &insn->list) {
> +		prev = (void *)l;

Please use list_entry() instead of casting; this only happens to work
because 'list' is the first member of struct instruction.

> +		if (!prev->func
> +			|| prev->func->pfunc != insn->func->pfunc)
> +			return false;
> +		if (prev->stack_op.src.reg != ADR_SOURCE)
> +			continue;
> +		sym = find_symbol_containing(insn->sec, insn->immediate);
> +		if (!sym || sym->type != STT_FUNC
> +			|| sym->pfunc != insn->func->pfunc)
> +			return true;
> +		break;
> +	}
> +	return true;
> +}
>  static int is_arm64(struct elf *elf)
>  {
>  	switch(elf->ehdr.e_machine){
> diff --git a/tools/objtool/arch/x86/decode.c b/tools/objtool/arch/x86/decode.c
> index 1af7b4996307..88c3d99c76be 100644
> --- a/tools/objtool/arch/x86/decode.c
> +++ b/tools/objtool/arch/x86/decode.c
> @@ -85,6 +85,11 @@ unsigned long arch_compute_rela_sym_offset(int addend)
>  	return addend + 4;
>  }
>  
> +bool arch_is_insn_sibling_call(struct instruction *insn)
> +{
> +	return true;

The naming and what the function returns don't really seem fitting.
It makes it seem like every x86 instruction is a sibling call, which is
unlikely to be the case.

> +}
> +
>  int arch_orc_read_unwind_hints(struct objtool_file *file)
>  {
>  	struct section *sec, *relasec;
> diff --git a/tools/objtool/check.c b/tools/objtool/check.c
> index 17fcd8c8f9c1..fa6106214318 100644
> --- a/tools/objtool/check.c
> +++ b/tools/objtool/check.c
> @@ -261,10 +261,12 @@ static int decode_instructions(struct objtool_file *file)
>  	unsigned long offset;
>  	struct instruction *insn;
>  	int ret;
> +	static int composed_insn = 0;
>  
>  	for_each_sec(file, sec) {
>  
> -		if (!(sec->sh.sh_flags & SHF_EXECINSTR))
> +		if (!(sec->sh.sh_flags & SHF_EXECINSTR)
> +			&& (strcmp(sec->name, ".altinstr_replacement") || !IGNORE_SHF_EXEC_FLAG))
>  			continue;
>  
>  		if (strcmp(sec->name, ".altinstr_replacement") &&
> @@ -297,10 +299,22 @@ static int decode_instructions(struct objtool_file *file)
>  				WARN_FUNC("invalid instruction type %d",
>  					  insn->sec, insn->offset, insn->type);
>  				ret = -1;
> -				goto err;
> +				free(insn);
> +				continue;
>  			}
> -
> -			hash_add(file->insn_hash, &insn->hash, insn->offset);
> +			/*
> +			 * For arm64 architecture, we sometime split instructions so that
> +			 * we can track the state evolution (i.e. load/store of pairs of registers).
> +			 * We thus need to take both into account and not erase the previous ones.
> +			 */
> +			if (composed_insn > 0)
> +				hash_add(file->insn_hash, &insn->hash, insn->offset + composed_insn);
> +			else
> +				hash_add(file->insn_hash, &insn->hash, insn->offset);

composed_insn has no reason to be negative, right? So this if is
equivalent to:

	hash_add(file->insn_hash, &insn->hash,
		 insn->offset + composed_insn);


Also, that means that this only works because all arm64 instructions are
4 bytes long. Otherwise you could have an overlap between the "second"
part of the composed instruction and the instruction that follows it.

It feels a bit too hackish for the arch independent code.

If we want to use that trick, maybe it should be the arm64 decode that
should return a length of 1 or 2 for composed insn, and when decoding an
instruction that isn't 4-bytes aligned, we would know to look 2 bytes
before to find what we were decoding, then return (4 - len) so that we
end up on the next instruction on the next iteration.

This way we don't have to change anything in the decode loop.
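
A rough sketch of that length trick with a made-up helper (this is not objtool code, just one way
the offset arithmetic could work out if each half is reported as 2 bytes):

static unsigned int arm64_reported_len(unsigned long offset, bool split)
{
        if (offset % 4)         /* second half: offset - 2 held the first half */
                return 2;       /* 4 - len(first half), lands on the next insn */
        if (split)
                return 2;       /* leave room for the second half */
        return 4;               /* ordinary 4-byte instruction */
}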

Also, I've got the impression that this hash table is most often used to
find the instruction at the starting offset of a function. Meaning it is
unlikely we'll end up looking up that composed instruction. Might be
worth checking whether this is the case and if so, maybe we can just add
the one real instruction to the hash table and focus on the instruction
list for our instruction splitting.
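
In code, that last option would amount to something like this (a sketch based on the hunk quoted
above, not a tested change):

                        /*
                         * Only the real (first) half of a split instruction
                         * goes into the hash table; both halves stay on the
                         * ordered instruction list.
                         */
                        if (composed_insn == 0)
                                hash_add(file->insn_hash, &insn->hash, insn->offset);
                        list_add_tail(&insn->list, &file->insn_list);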

> +			if (insn->len == 0)
> +				composed_insn++;
> +			else
> +				composed_insn = 0;
>  			list_add_tail(&insn->list, &file->insn_list);
>  		}
>  

Thanks,
Raphael Gault April 24, 2019, 4:32 p.m. UTC | #7
Hi,

As Julien also pointed out, I realise that, similarly to the first patch
of this series, this patch should be split into smaller ones. I will do
this for the next version.

On 4/23/19 9:36 PM, Josh Poimboeuf wrote:
> On Tue, Apr 09, 2019 at 02:52:40PM +0100, Raphael Gault wrote:
>> Since the way the initial stack frame when entering a function is different that what is done
>> in the x86_64 architecture, we need to add some more check to support the different cases.
>> As opposed as for x86_64, the return address is not stored by the call instruction but is instead
>> loaded in a register. The initial stack frame is thus empty when entering a function and 2 push
>> operations are needed to set it up correctly. All the different combinations need to be
>> taken into account.
>>
>> On arm64, the .altinstr_replacement section is not flagged as containing executable instructions
>> but we still need to process it.
>>
>> Switch tables are alse stored in a different way on arm64 than on x86_64 so we need to be able
>> to identify in which case we are when looking for it.
>>
>> Signed-off-by: Raphael Gault <raphael.gault@arm.com>
>> ---
>>   tools/objtool/arch.h              |  2 +
>>   tools/objtool/arch/arm64/decode.c | 27 +++++++++
>>   tools/objtool/arch/x86/decode.c   |  5 ++
>>   tools/objtool/check.c             | 95 +++++++++++++++++++++++++++----
>>   tools/objtool/elf.c               |  3 +-
>>   5 files changed, 120 insertions(+), 12 deletions(-)
>>
>> diff --git a/tools/objtool/arch.h b/tools/objtool/arch.h
>> index 0eff166ca613..f3bef3f2cef3 100644
>> --- a/tools/objtool/arch.h
>> +++ b/tools/objtool/arch.h
>> @@ -88,4 +88,6 @@ unsigned long arch_compute_jump_destination(struct instruction *insn);
>>
>>   unsigned long arch_compute_rela_sym_offset(int addend);
>>
>> +bool arch_is_insn_sibling_call(struct instruction *insn);
>> +
>>   #endif /* _ARCH_H */
>> diff --git a/tools/objtool/arch/arm64/decode.c b/tools/objtool/arch/arm64/decode.c
>> index 0feb3ae3af5d..8b293eae2b38 100644
>> --- a/tools/objtool/arch/arm64/decode.c
>> +++ b/tools/objtool/arch/arm64/decode.c
>> @@ -105,6 +105,33 @@ unsigned long arch_compute_rela_sym_offset(int addend)
>>   return addend;
>>   }
>>
>> +/*
>> + * In order to know if we are in presence of a sibling
>> + * call and not in presence of a switch table we look
>> + * back at the previous instructions and see if we are
>> + * jumping inside the same function that we are already
>> + * in.
>> + */
>> +bool arch_is_insn_sibling_call(struct instruction *insn)
>> +{
>> +struct instruction *prev;
>> +struct list_head *l;
>> +struct symbol *sym;
>> +list_for_each_prev(l, &insn->list) {
>> +prev = (void *)l;
>> +if (!prev->func
>> +|| prev->func->pfunc != insn->func->pfunc)
>> +return false;
>> +if (prev->stack_op.src.reg != ADR_SOURCE)
>> +continue;
>> +sym = find_symbol_containing(insn->sec, insn->immediate);
>> +if (!sym || sym->type != STT_FUNC
>> +|| sym->pfunc != insn->func->pfunc)
>> +return true;
>> +break;
>> +}
>> +return true;
>> +}
>
> I get the feeling there might be a better way to do this, but I can't
> figure out what this function is actually doing.  It looks like it
> searches backwards in the function for an instruction which has
> stack_op.src.reg != ADR_SOURCE -- what does that mean?  And why doesn't
> it do anything with the instruction after it finds it?
>

I will indeed try to make it better.

>>   static int is_arm64(struct elf *elf)
>>   {
>>   switch(elf->ehdr.e_machine){
>> diff --git a/tools/objtool/arch/x86/decode.c b/tools/objtool/arch/x86/decode.c
>> index 1af7b4996307..88c3d99c76be 100644
>> --- a/tools/objtool/arch/x86/decode.c
>> +++ b/tools/objtool/arch/x86/decode.c
>> @@ -85,6 +85,11 @@ unsigned long arch_compute_rela_sym_offset(int addend)
>>   return addend + 4;
>>   }
>>
>> +bool arch_is_insn_sibling_call(struct instruction *insn)
>> +{
>> +return true;
>> +}
>
> All x86 instructions are sibling calls?
>

The semantics here are indeed wrong. I implemented it like that on the x86
side because, at the place it is called from in check.c, we already know
that we are in the presence of a sibling call on x86. I will improve
these bits in the next iteration.

>> +
>>   int arch_orc_read_unwind_hints(struct objtool_file *file)
>>   {
>>   struct section *sec, *relasec;
>> diff --git a/tools/objtool/check.c b/tools/objtool/check.c
>> index 17fcd8c8f9c1..fa6106214318 100644
>> --- a/tools/objtool/check.c
>> +++ b/tools/objtool/check.c
>> @@ -261,10 +261,12 @@ static int decode_instructions(struct objtool_file *file)
>>   unsigned long offset;
>>   struct instruction *insn;
>>   int ret;
>> +static int composed_insn = 0;
>>
>>   for_each_sec(file, sec) {
>>
>> -if (!(sec->sh.sh_flags & SHF_EXECINSTR))
>> +if (!(sec->sh.sh_flags & SHF_EXECINSTR)
>> +&& (strcmp(sec->name, ".altinstr_replacement") || !IGNORE_SHF_EXEC_FLAG))
>>   continue;
>
> We should just fix the root cause instead, presumably:
>
> diff --git a/arch/arm64/include/asm/alternative.h b/arch/arm64/include/asm/alternative.h
> index b9f8d787eea9..e9e6b81e3eb4 100644
> --- a/arch/arm64/include/asm/alternative.h
> +++ b/arch/arm64/include/asm/alternative.h
> @@ -71,7 +71,7 @@ static inline void apply_alternatives_module(void *start, size_t length) { }
>   ALTINSTR_ENTRY(feature,cb)\
>   ".popsection\n"\
>   " .if " __stringify(cb) " == 0\n"\
> -".pushsection .altinstr_replacement, \"a\"\n"\
> +".pushsection .altinstr_replacement, \"ax\"\n"\
>   "663:\n\t"\
>   newinstr "\n"\
>   "664:\n\t"\
>
>

I will take your advice and suggest this in the next version as well.

>>
>>   if (strcmp(sec->name, ".altinstr_replacement") &&
>> @@ -297,10 +299,22 @@ static int decode_instructions(struct objtool_file *file)
>>   WARN_FUNC("invalid instruction type %d",
>>     insn->sec, insn->offset, insn->type);
>>   ret = -1;
>> -goto err;
>> +free(insn);
>> +continue;
>
> What's the purpose of this change?  If it's really needed then it looks
> like it should be a separate patch.
>
>>   }
>> -
>> -hash_add(file->insn_hash, &insn->hash, insn->offset);
>> +/*
>> + * For arm64 architecture, we sometime split instructions so that
>> + * we can track the state evolution (i.e. load/store of pairs of registers).
>> + * We thus need to take both into account and not erase the previous ones.
>> + */
>
> Ew...  Is this an architectural thing, or just a quirk of the arm64
> decoder?
>

The motivation for this is to simulate the two consecutive operations
that would be executed on x86 but are done in one on arm64. This is
strictly a decoder related quirk. I don't know if there is a better way
to do it without modifying the struct op_src and struct instruction.

>> +if (composed_insn > 0)
>> +hash_add(file->insn_hash, &insn->hash, insn->offset + composed_insn);
>> +else
>> +hash_add(file->insn_hash, &insn->hash, insn->offset);
>> +if (insn->len == 0)
>> +composed_insn++;
>> +else
>> +composed_insn = 0;
>>   list_add_tail(&insn->list, &file->insn_list);
>>   }
>>
>> @@ -510,10 +524,10 @@ static int add_jump_destinations(struct objtool_file *file)
>>   dest_off = arch_compute_jump_destination(insn);
>>   } else if (rela->sym->type == STT_SECTION) {
>>   dest_sec = rela->sym->sec;
>> -dest_off = rela->addend + 4;
>> +dest_off = arch_compute_rela_sym_offset(rela->addend);
>>   } else if (rela->sym->sec->idx) {
>>   dest_sec = rela->sym->sec;
>> -dest_off = rela->sym->sym.st_value + rela->addend + 4;
>> +dest_off = rela->sym->sym.st_value + arch_compute_rela_sym_offset(rela->addend);
>>   } else if (strstr(rela->sym->name, "_indirect_thunk_")) {
>>   /*
>>    * Retpoline jumps are really dynamic jumps in
>> @@ -663,7 +677,7 @@ static int handle_group_alt(struct objtool_file *file,
>>   last_orig_insn = insn;
>>   }
>>
>> -if (next_insn_same_sec(file, last_orig_insn)) {
>> +if (last_orig_insn && next_insn_same_sec(file, last_orig_insn)) {
>
> This might belong in a separate patch which explains the reason for the
> change.
>

I did this because I encountered situations where last_orig_insn was
NULL because of the offset check performed in the preceding loop. This
caused a NULL dereference to occur. But it definitely should be split up.

>>   fake_jump = malloc(sizeof(*fake_jump));
>>   if (!fake_jump) {
>>   WARN("malloc failed");
>> @@ -976,6 +990,17 @@ static struct rela *find_switch_table(struct objtool_file *file,
>>   if (find_symbol_containing(rodata_sec, table_offset))
>>   continue;
>>
>> +/*
>> + * If we are on arm64 architecture, we now that we
>
> "know"
>
>> + * are in presence of a switch table thanks to
>> + * the `br <Xn>` insn. but we can't retrieve it yet.
>> + * So we just ignore unreachable for this file.
>> + */
>> +if (JUMP_DYNAMIC_IS_SWITCH_TABLE) {
>> +file->ignore_unreachables = true;
>> +return NULL;
>> +}
>> +
>
> I think this means switch table reading is not yet supported?  If so
> then maybe the flag should be called SWITCH_TABLE_NOT_SUPPORTED.
>
> But really this needs to be fixed anyway before merging the code.
>

This was indeed an ugly way to do this. I will propose a better solution
for this part in the next iteration.

>>   rodata_rela = find_rela_by_dest(rodata_sec, table_offset);
>>   if (rodata_rela) {
>>   /*
>> @@ -1258,8 +1283,8 @@ static void save_reg(struct insn_state *state, unsigned char reg, int base,
>>
>>   static void restore_reg(struct insn_state *state, unsigned char reg)
>>   {
>> -state->regs[reg].base = CFI_UNDEFINED;
>> -state->regs[reg].offset = 0;
>> +state->regs[reg].base = initial_func_cfi.regs[reg].base;
>> +state->regs[reg].offset = initial_func_cfi.regs[reg].offset;
>>   }
>>
>>   /*
>> @@ -1415,8 +1440,33 @@ static int update_insn_state(struct instruction *insn, struct insn_state *state)
>>
>>   /* add imm, %rsp */
>>   state->stack_size -= op->src.offset;
>> -if (cfa->base == CFI_SP)
>> +if (cfa->base == CFI_SP) {
>>   cfa->offset -= op->src.offset;
>> +if (state->stack_size == 0
>> +&& initial_func_cfi.cfa.base == CFI_CFA) {
>> +cfa->base = CFI_CFA;
>> +cfa->offset = 0;
>> +}
>> +}
>> +/*
>> + * on arm64 the save/restore of sp into fp is not automatic
>> + * and the first one can be done without the other so we
>> + * need to be careful not to invalidate the stack frame in such
>> + * cases.
>> + */
>> +else if (cfa->base == CFI_BP) {
>> +if (state->stack_size == 0
>> +&& initial_func_cfi.cfa.base == CFI_CFA) {
>> +cfa->base = CFI_CFA;
>> +cfa->offset = 0;
>> +restore_reg(state, CFI_BP);
>> +}
>> +}
>> +else if (cfa->base == CFI_CFA) {
>> +cfa->base = CFI_SP;
>> +if (state->stack_size >= 16)
>> +cfa->offset = 16;
>> +}
>>   break;
>>   }
>>
>> @@ -1427,6 +1477,16 @@ static int update_insn_state(struct instruction *insn, struct insn_state *state)
>>   break;
>>   }
>>
>> +if (op->src.reg == CFI_SP && op->dest.reg == CFI_BP &&
>> +    cfa->base == CFI_SP &&
>> +    regs[CFI_BP].base == CFI_CFA &&
>> +    regs[CFI_BP].offset == -cfa->offset) {
>> +
>> +/* mov %rsp, %rbp */
>> +cfa->base = op->dest.reg;
>> +state->bp_scratch = false;
>> +break;
>> +}
>>   if (op->src.reg == CFI_SP && cfa->base == CFI_SP) {
>>
>>   /* drap: lea disp(%rsp), %drap */
>> @@ -1518,6 +1578,9 @@ static int update_insn_state(struct instruction *insn, struct insn_state *state)
>>   state->stack_size -= 8;
>>   if (cfa->base == CFI_SP)
>>   cfa->offset -= 8;
>> +if (cfa->base == CFI_SP && cfa->offset == 0
>> +&& initial_func_cfi.cfa.base == CFI_CFA)
>> +cfa->base = CFI_CFA;
>>
>>   break;
>>
>> @@ -1557,6 +1620,8 @@ static int update_insn_state(struct instruction *insn, struct insn_state *state)
>>
>>   case OP_DEST_PUSH:
>>   state->stack_size += 8;
>> +if (cfa->base == CFI_CFA)
>> +cfa->base = CFI_SP;
>>   if (cfa->base == CFI_SP)
>>   cfa->offset += 8;
>>
>> @@ -1728,7 +1793,7 @@ static int validate_branch(struct objtool_file *file, struct instruction *first,
>>   insn = first;
>>   sec = insn->sec;
>>
>> -if (insn->alt_group && list_empty(&insn->alts)) {
>> +if (!insn->visited && insn->alt_group && list_empty(&insn->alts)) {
>
> Why?  This looks like another one that might belong in a separate patch.
>

Indeed it belongs to a separate patch. I did this because I encountered
a situation where one of the alternative instructions (in
.altinstr_replacement) jumps to the offset of an instruction which is
replaced as well. I thus considered that if one of the instructions of
the group is replaced then all instructions of the group should be
replaced, and thus this would never happen at execution time.

>>   WARN_FUNC("don't know how to handle branch to middle of alternative instruction group",
>>     sec, insn->offset);
>>   return 1;
>> @@ -1871,6 +1936,14 @@ static int validate_branch(struct objtool_file *file, struct instruction *first,
>>   case INSN_JUMP_DYNAMIC:
>>   if (func && list_empty(&insn->alts) &&
>>       has_modified_stack_frame(&state)) {
>> +/*
>> + * On arm64 `br <Xn>` insn can be used for switch-tables
>> + * but it cannot be distinguished in itself from a sibling
>> + * call thus we need to have a look at the previous instructions
>> + * to determine which it is
>> + */
>> +if (!arch_is_insn_sibling_call(insn))
>> +break;
>>   WARN_FUNC("sibling call from callable instruction with modified stack frame",
>>     sec, insn->offset);
>>   return 1;
>> diff --git a/tools/objtool/elf.c b/tools/objtool/elf.c
>> index b8f3cca8e58b..136f9b9fb1d1 100644
>> --- a/tools/objtool/elf.c
>> +++ b/tools/objtool/elf.c
>> @@ -74,7 +74,8 @@ struct symbol *find_symbol_by_offset(struct section *sec, unsigned long offset)
>>   struct symbol *sym;
>>
>>   list_for_each_entry(sym, &sec->symbol_list, list)
>> -if (sym->type != STT_SECTION &&
>> +if (sym->type != STT_NOTYPE &&
>> +    sym->type != STT_SECTION &&
>
> Why?  Another potential separate patch.
>

This is again a special case I encountered where several symbols can be
found at the same offset, so matching on the offset alone wasn't enough
to identify the correct destination. I thus added an extra check on the
type of the symbol.

>>       sym->offset == offset)
>>   return sym;
>>
>> --
>> 2.17.1
>>
>

Please feel free to comment back on my answers. I would really like to
get your opinion on some of the special cases I mentioned.

Thanks,

--
Raphael Gault
Josh Poimboeuf April 24, 2019, 4:56 p.m. UTC | #8
On Wed, Apr 24, 2019 at 04:32:44PM +0000, Raphael Gault wrote:
> >> diff --git a/tools/objtool/arch/arm64/decode.c b/tools/objtool/arch/arm64/decode.c
> >> index 0feb3ae3af5d..8b293eae2b38 100644
> >> --- a/tools/objtool/arch/arm64/decode.c
> >> +++ b/tools/objtool/arch/arm64/decode.c
> >> @@ -105,6 +105,33 @@ unsigned long arch_compute_rela_sym_offset(int addend)
> >>   return addend;
> >>   }
> >>
> >> +/*
> >> + * In order to know if we are in presence of a sibling
> >> + * call and not in presence of a switch table we look
> >> + * back at the previous instructions and see if we are
> >> + * jumping inside the same function that we are already
> >> + * in.
> >> + */
> >> +bool arch_is_insn_sibling_call(struct instruction *insn)
> >> +{
> >> +struct instruction *prev;
> >> +struct list_head *l;
> >> +struct symbol *sym;
> >> +list_for_each_prev(l, &insn->list) {
> >> +prev = (void *)l;
> >> +if (!prev->func
> >> +|| prev->func->pfunc != insn->func->pfunc)
> >> +return false;
> >> +if (prev->stack_op.src.reg != ADR_SOURCE)
> >> +continue;
> >> +sym = find_symbol_containing(insn->sec, insn->immediate);
> >> +if (!sym || sym->type != STT_FUNC
> >> +|| sym->pfunc != insn->func->pfunc)
> >> +return true;
> >> +break;
> >> +}
> >> +return true;
> >> +}
> >
> > I get the feeling there might be a better way to do this, but I can't
> > figure out what this function is actually doing.  It looks like it
> > searches backwards in the function for an instruction which has
> > stack_op.src.reg != ADR_SOURCE -- what does that mean?  And why doesn't
> > it do anything with the instruction after it finds it?
> >
> 
> I will indeed try to make it better.

I still don't quite get what it's trying to accomplish, but I wonder if
there's some kind of tracking you can add in validate_branch() to keep
track of whatever you're looking for, leading up to the indirect jump.

> >> -hash_add(file->insn_hash, &insn->hash, insn->offset);
> >> +/*
> >> + * For arm64 architecture, we sometime split instructions so that
> >> + * we can track the state evolution (i.e. load/store of pairs of registers).
> >> + * We thus need to take both into account and not erase the previous ones.
> >> + */
> >
> > Ew...  Is this an architectural thing, or just a quirk of the arm64
> > decoder?
> >
> 
> The motivation for this is to simulate the two consecutive operations
> that would be executed on x86 but are done in one on arm64. This is
> strictly a decoder related quirk. I don't know if there is a better way
> to do it without modifying the struct op_src and struct instruction.

Ah.  Which ops are those?  Hopefully we can find a better way to
represent that with a single instruction.  Adding fake instructions is
fragile.
Raphael Gault April 25, 2019, 8:12 a.m. UTC | #9
Hi Josh,

On 4/24/19 5:56 PM, Josh Poimboeuf wrote:
> On Wed, Apr 24, 2019 at 04:32:44PM +0000, Raphael Gault wrote:
>>>> diff --git a/tools/objtool/arch/arm64/decode.c b/tools/objtool/arch/arm64/decode.c
>>>> index 0feb3ae3af5d..8b293eae2b38 100644
>>>> --- a/tools/objtool/arch/arm64/decode.c
>>>> +++ b/tools/objtool/arch/arm64/decode.c
>>>> @@ -105,6 +105,33 @@ unsigned long arch_compute_rela_sym_offset(int addend)
>>>>    return addend;
>>>>    }
>>>>
>>>> +/*
>>>> + * In order to know if we are in presence of a sibling
>>>> + * call and not in presence of a switch table we look
>>>> + * back at the previous instructions and see if we are
>>>> + * jumping inside the same function that we are already
>>>> + * in.
>>>> + */
>>>> +bool arch_is_insn_sibling_call(struct instruction *insn)
>>>> +{
>>>> +struct instruction *prev;
>>>> +struct list_head *l;
>>>> +struct symbol *sym;
>>>> +list_for_each_prev(l, &insn->list) {
>>>> +prev = (void *)l;
>>>> +if (!prev->func
>>>> +|| prev->func->pfunc != insn->func->pfunc)
>>>> +return false;
>>>> +if (prev->stack_op.src.reg != ADR_SOURCE)
>>>> +continue;
>>>> +sym = find_symbol_containing(insn->sec, insn->immediate);
>>>> +if (!sym || sym->type != STT_FUNC
>>>> +|| sym->pfunc != insn->func->pfunc)
>>>> +return true;
>>>> +break;
>>>> +}
>>>> +return true;
>>>> +}
>>>
>>> I get the feeling there might be a better way to do this, but I can't
>>> figure out what this function is actually doing.  It looks like it
>>> searches backwards in the function for an instruction which has
>>> stack_op.src.reg != ADR_SOURCE -- what does that mean?  And why doesn't
>>> it do anything with the instruction after it finds it?
>>>
>>
>> I will indeed try to make it better.
>
> I still don't quite get what it's trying to accomplish, but I wonder if
> there's some kind of tracking you can add in validate_branch() to keep
> track of whatever you're looking for, leading up to the indirect jump.
>

The motivation behind this is that the `br <Xn>` instruction is a
dynamic jump (jump to the address contained in the provided register).
This instruction is used for sibling calls but can also be used for
switch tables. I use this to differentiate these two cases from one another:

Generally the `adr/adrp` instruction is used prior to `br` in order to
load the address into the register. What I do here is go back through
the instructions and try to identify whether the address that was loaded
points inside the function we are already in.

I also thought of implementing some sort of tracking in validate_branch()
because it could be useful for identifying the switch tables as well.
But it seemed to me like a major change in the semantics of this tool:
indeed, from my perspective I would have to track the state of the
registers and I don't know if we want to do that.

>>>> -hash_add(file->insn_hash, &insn->hash, insn->offset);
>>>> +/*
>>>> + * For arm64 architecture, we sometime split instructions so that
>>>> + * we can track the state evolution (i.e. load/store of pairs of registers).
>>>> + * We thus need to take both into account and not erase the previous ones.
>>>> + */
>>>
>>> Ew...  Is this an architectural thing, or just a quirk of the arm64
>>> decoder?
>>>
>>
>> The motivation for this is to simulate the two consecutive operations
>> that would be executed on x86 but are done in one on arm64. This is
>> strictly a decoder related quirk. I don't know if there is a better way
>> to do it without modifying the struct op_src and struct instruction.
>
> Ah.  Which ops are those?  Hopefully we can find a better way to
> represent that with a single instruction.  Adding fake instructions is
> fragile.
>

Those are the load/store of pairs of registers, mainly stp/ldp. They
are often used in function prologues/epilogues to save/restore the
stack pointer and frame pointer; however, they can be used with any
register pair.

The idea to add a new instruction could work but I would need to extend
the `struct op_src` as well I think.

Thanks,

--
Raphael Gault
Peter Zijlstra April 25, 2019, 8:33 a.m. UTC | #10
On Thu, Apr 25, 2019 at 08:12:24AM +0000, Raphael Gault wrote:
> The motivation behind this is that the `br <Xn>` instruction is a
> dynamic jump (jump to the address contained in the provided register).
> This instruction is used for sibling calls but can also be used for
> switch table. I use this to differentiate these two cases from one another:
> 
> Generally the `adr/adrp` instruction is used prior to `br` in order to
> load the address into the register. What I do here is go back throught
> the instructions and try to identify if the address loaded.

Yikes, be very careful with simply going back on the instruction stream.

The problem case would be something like:

	adr	...
	b	1f

	...

	b	...
1:	br	...

In that case, simply going backwards from 1 will not yield the desired
result.

At some point I did a pass storing all the fwd jumps at their
destination and used that to 'rewind' the instruction stream, but that
wasn't very pretty either.

The best way might be to, while doing validate_branch(), keep a state
table of the most recent ADR(P) instructions and, when encountering a BR,
check that state to see what it really is.

We currently don't do dynamic insn->type, but it shouldn't be too hard
(famous last words of course).
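
A minimal sketch of what that per-register state could look like; none of the fields or helpers
below exist in objtool, they only illustrate the idea of tracking the last ADR/ADRP destination
along the path walked by validate_branch() and consulting it when a `br <Xn>` is reached:

struct adr_track {
        bool          known[32];        /* Xn was last written by ADR/ADRP */
        unsigned long dest[32];         /* address it was loaded with      */
};

/* update for every instruction walked by validate_branch() */
static void track_adr(struct adr_track *t, struct instruction *insn)
{
        int reg = insn->written_reg;            /* hypothetical field */

        if (reg < 0)
                return;                         /* writes no register */
        if (insn->type == INSN_ADR) {           /* hypothetical type  */
                t->known[reg] = true;
                t->dest[reg] = insn->immediate;
        } else {
                t->known[reg] = false;          /* value clobbered    */
        }
}

/* consult when an INSN_JUMP_DYNAMIC (br <Xn>) is reached */
static bool br_is_switch_table(struct adr_track *t, struct instruction *br)
{
        struct symbol *sym;
        int reg = br->br_reg;                   /* hypothetical field */

        if (!t->known[reg])
                return false;                   /* treat it as a sibling call */
        sym = find_symbol_containing(br->sec, t->dest[reg]);
        return sym && sym->pfunc == br->func->pfunc;
}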
Josh Poimboeuf April 25, 2019, 4:25 p.m. UTC | #11
On Thu, Apr 25, 2019 at 08:12:24AM +0000, Raphael Gault wrote:
> Hi Josh,
> 
> On 4/24/19 5:56 PM, Josh Poimboeuf wrote:
> > On Wed, Apr 24, 2019 at 04:32:44PM +0000, Raphael Gault wrote:
> >>>> diff --git a/tools/objtool/arch/arm64/decode.c b/tools/objtool/arch/arm64/decode.c
> >>>> index 0feb3ae3af5d..8b293eae2b38 100644
> >>>> --- a/tools/objtool/arch/arm64/decode.c
> >>>> +++ b/tools/objtool/arch/arm64/decode.c
> >>>> @@ -105,6 +105,33 @@ unsigned long arch_compute_rela_sym_offset(int addend)
> >>>>    return addend;
> >>>>    }
> >>>>
> >>>> +/*
> >>>> + * In order to know if we are in presence of a sibling
> >>>> + * call and not in presence of a switch table we look
> >>>> + * back at the previous instructions and see if we are
> >>>> + * jumping inside the same function that we are already
> >>>> + * in.
> >>>> + */
> >>>> +bool arch_is_insn_sibling_call(struct instruction *insn)
> >>>> +{
> >>>> +struct instruction *prev;
> >>>> +struct list_head *l;
> >>>> +struct symbol *sym;
> >>>> +list_for_each_prev(l, &insn->list) {
> >>>> +prev = (void *)l;
> >>>> +if (!prev->func
> >>>> +|| prev->func->pfunc != insn->func->pfunc)
> >>>> +return false;
> >>>> +if (prev->stack_op.src.reg != ADR_SOURCE)
> >>>> +continue;
> >>>> +sym = find_symbol_containing(insn->sec, insn->immediate);
> >>>> +if (!sym || sym->type != STT_FUNC
> >>>> +|| sym->pfunc != insn->func->pfunc)
> >>>> +return true;
> >>>> +break;
> >>>> +}
> >>>> +return true;
> >>>> +}
> >>>
> >>> I get the feeling there might be a better way to do this, but I can't
> >>> figure out what this function is actually doing.  It looks like it
> >>> searches backwards in the function for an instruction which has
> >>> stack_op.src.reg != ADR_SOURCE -- what does that mean?  And why doesn't
> >>> it do anything with the instruction after it finds it?
> >>>
> >>
> >> I will indeed try to make it better.
> >
> > I still don't quite get what it's trying to accomplish, but I wonder if
> > there's some kind of tracking you can add in validate_branch() to keep
> > track of whatever you're looking for, leading up to the indirect jump.
> >
> 
> The motivation behind this is that the `br <Xn>` instruction is a
> dynamic jump (jump to the address contained in the provided register).
> This instruction is used for sibling calls but can also be used for
> switch table. I use this to differentiate these two cases from one another:
> 
> Generally the `adr/adrp` instruction is used prior to `br` in order to
> load the address into the register. What I do here is go back throught
> the instructions and try to identify if the address loaded.
> 
> I also thought of implementing some sort of tracking in validate branch
> because it could be useful for identifying the switch tables as well.
> But it seemed to me like a major change in the sementic of this tool:
> indeed, from my perspective I would have to track the state of the
> registers and I don't know if we want to do that.

I don't have much time to look at this today (and I'll be out next
week), but we had a similar problem in x86.  See the comments above
find_switch_table(), particularly #3.  Does that function not work for
the arm64 case?

> >>>> -hash_add(file->insn_hash, &insn->hash, insn->offset);
> >>>> +/*
> >>>> + * For arm64 architecture, we sometime split instructions so that
> >>>> + * we can track the state evolution (i.e. load/store of pairs of registers).
> >>>> + * We thus need to take both into account and not erase the previous ones.
> >>>> + */
> >>>
> >>> Ew...  Is this an architectural thing, or just a quirk of the arm64
> >>> decoder?
> >>>
> >>
> >> The motivation for this is to simulate the two consecutive operations
> >> that would be executed on x86 but are done in one on arm64. This is
> >> strictly a decoder related quirk. I don't know if there is a better way
> >> to do it without modifying the struct op_src and struct instruction.
> >
> > Ah.  Which ops are those?  Hopefully we can find a better way to
> > represent that with a single instruction.  Adding fake instructions is
> > fragile.
> >
> 
> Those are the load/store of pairs of registers, mainly stp/ldp. Those
> are often use in the function prologues/epilogues to save/restore the
> stack pointers and frame pointers however it can be used with any
> register pair.
> 
> The idea to add a new instruction could work but I would need to extend
> the `struct op_src` as well I think.

Again I don't have much time to look at it, but I do think that changing
op_src/dest to allow for the stp/ldp instructions would work better than
inserting a fake instruction to emulate x86.

Or another idea would be to associate multiple stack_ops with a single
instruction.
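
For the second idea, a sketch of what the arm64 decoder could hand back for the usual prologue
store `stp x29, x30, [sp, #-16]!` if an instruction carried an array of stack_ops; the two-element
array and the CFI_RA name for the x30 slot are assumptions here, not existing objtool definitions:

static int arm64_decode_stp_pre_index(struct stack_op ops[2])
{
        /*
         * sp drops by 16, x30 ends up in the higher slot and x29 in the
         * lower one, so in push terms: push x30, then push x29.
         */
        ops[0].dest.type = OP_DEST_PUSH;
        ops[0].src.type  = OP_SRC_REG;
        ops[0].src.reg   = CFI_RA;      /* x30, the link register */

        ops[1].dest.type = OP_DEST_PUSH;
        ops[1].src.type  = OP_SRC_REG;
        ops[1].src.reg   = CFI_BP;      /* x29, the frame pointer */

        return 2;                       /* number of stack_ops produced */
}

update_insn_state() would then loop over the returned ops instead of looking at a single
insn->stack_op.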
Raphael Gault April 30, 2019, 12:20 p.m. UTC | #12
Hi Josh,

On 4/25/19 5:25 PM, Josh Poimboeuf wrote:
> On Thu, Apr 25, 2019 at 08:12:24AM +0000, Raphael Gault wrote:
>> Hi Josh,
>>
>> On 4/24/19 5:56 PM, Josh Poimboeuf wrote:
>>> On Wed, Apr 24, 2019 at 04:32:44PM +0000, Raphael Gault wrote:
>>>>>> diff --git a/tools/objtool/arch/arm64/decode.c b/tools/objtool/arch/arm64/decode.c
>>>>>> index 0feb3ae3af5d..8b293eae2b38 100644
>>>>>> --- a/tools/objtool/arch/arm64/decode.c
>>>>>> +++ b/tools/objtool/arch/arm64/decode.c
>>>>>> @@ -105,6 +105,33 @@ unsigned long arch_compute_rela_sym_offset(int addend)
>>>>>>     return addend;
>>>>>>     }
>>>>>>
>>>>>> +/*
>>>>>> + * In order to know if we are in presence of a sibling
>>>>>> + * call and not in presence of a switch table we look
>>>>>> + * back at the previous instructions and see if we are
>>>>>> + * jumping inside the same function that we are already
>>>>>> + * in.
>>>>>> + */
>>>>>> +bool arch_is_insn_sibling_call(struct instruction *insn)
>>>>>> +{
>>>>>> +struct instruction *prev;
>>>>>> +struct list_head *l;
>>>>>> +struct symbol *sym;
>>>>>> +list_for_each_prev(l, &insn->list) {
>>>>>> +prev = (void *)l;
>>>>>> +if (!prev->func
>>>>>> +|| prev->func->pfunc != insn->func->pfunc)
>>>>>> +return false;
>>>>>> +if (prev->stack_op.src.reg != ADR_SOURCE)
>>>>>> +continue;
>>>>>> +sym = find_symbol_containing(insn->sec, insn->immediate);
>>>>>> +if (!sym || sym->type != STT_FUNC
>>>>>> +|| sym->pfunc != insn->func->pfunc)
>>>>>> +return true;
>>>>>> +break;
>>>>>> +}
>>>>>> +return true;
>>>>>> +}
>>>>>
>>>>> I get the feeling there might be a better way to do this, but I can't
>>>>> figure out what this function is actually doing.  It looks like it
>>>>> searches backwards in the function for an instruction which has
>>>>> stack_op.src.reg != ADR_SOURCE -- what does that mean?  And why doesn't
>>>>> it do anything with the instruction after it finds it?
>>>>>
>>>>
>>>> I will indeed try to make it better.
>>>
>>> I still don't quite get what it's trying to accomplish, but I wonder if
>>> there's some kind of tracking you can add in validate_branch() to keep
>>> track of whatever you're looking for, leading up to the indirect jump.
>>>
>>
>> The motivation behind this is that the `br <Xn>` instruction is a
>> dynamic jump (jump to the address contained in the provided register).
>> This instruction is used for sibling calls but can also be used for
>> switch table. I use this to differentiate these two cases from one another:
>>
>> Generally the `adr/adrp` instruction is used prior to `br` in order to
>> load the address into the register. What I do here is go back throught
>> the instructions and try to identify if the address loaded.
>>
>> I also thought of implementing some sort of tracking in validate branch
>> because it could be useful for identifying the switch tables as well.
>> But it seemed to me like a major change in the sementic of this tool:
>> indeed, from my perspective I would have to track the state of the
>> registers and I don't know if we want to do that.
>
> I don't have much time to look at this today (and I'll be out next
> week), but we had a similar problem in x86.  See the comments above
> find_switch_table(), particularly #3.  Does that function not work for
> the arm64 case?
>

Honestly, I don't have a full understanding of how the switch tables are
handled on arm64. All I know is that I've identified a case in which it
doesn't work (and I get an unreachable instruction warning).
When trying to figure out how the switch tables work on arm64 and how
objtool is retrieving them (on x86 at least) I realised that you look
for 2 relocations:
- One from (.rela).text which refers to the .rodata section
- One from (.rela).rodata which refers somewhere else.
In the case I identified, the second relocation doesn't exist, so the
function doesn't find the switch table.

Again since I do not have a good understanding about this I am not able
to say if it is a corner case or not.
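
To make that concrete, here is a rough sketch of the two-step lookup as
I understand it (the function name is made up, this is not the actual
find_switch_table() code, and I am ignoring the
arch_compute_rela_sym_offset() adjustment and the non-STT_SECTION
cases); it only uses helpers that already appear in this series:

	/* Rough sketch only: 'text_rela' is the .rela.text entry whose
	 * target lands in .rodata (first relocation).  The second lookup
	 * is the one that comes back empty in the arm64 case I hit. */
	static struct rela *lookup_table_rela(struct rela *text_rela)
	{
		struct section *rodata_sec = text_rela->sym->sec;
		unsigned long table_offset = text_rela->addend;

		/* the offset must not land inside another symbol */
		if (find_symbol_containing(rodata_sec, table_offset))
			return NULL;

		/* second relocation: .rela.rodata entry at the table itself */
		return find_rela_by_dest(rodata_sec, table_offset);
	}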

>>>>>> -			hash_add(file->insn_hash, &insn->hash, insn->offset);
>>>>>> +			/*
>>>>>> +			 * For the arm64 architecture, we sometimes split instructions so that
>>>>>> +			 * we can track the state evolution (i.e. load/store of pairs of registers).
>>>>>> +			 * We thus need to take both into account and not erase the previous ones.
>>>>>> +			 */
>>>>>
>>>>> Ew...  Is this an architectural thing, or just a quirk of the arm64
>>>>> decoder?
>>>>>
>>>>
>>>> The motivation for this is to simulate the two consecutive operations
>>>> that would be executed on x86 but are done in one on arm64. This is
>>>> strictly a decoder related quirk. I don't know if there is a better way
>>>> to do it without modifying the struct op_src and struct instruction.
>>>
>>> Ah.  Which ops are those?  Hopefully we can find a better way to
>>> represent that with a single instruction.  Adding fake instructions is
>>> fragile.
>>>
>>
>> Those are the loads/stores of pairs of registers, mainly stp/ldp. They
>> are often used in function prologues/epilogues to save/restore the
>> stack pointer and frame pointer; however, they can be used with any
>> register pair.
>>
>> The idea to add a new instruction could work, but I would need to extend
>> `struct op_src` as well, I think.
>
> Again I don't have much time to look at it, but I do think that changing
> op_src/dest to allow for the stp/ldp instructions would work better than
> inserting a fake instruction to emulate x86.
>
> Or another idea would be to associate multiple stack_ops with a single
> instruction.
>

I haven't looked at it in depth yet, but I will try to figure out a good
way to represent those instructions in a more appropriate manner.
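
For instance, your second suggestion could look something like the rough
sketch below (illustrative names only, none of this is code from the
series): a decoded instruction would carry a small array of stack
operations, so that e.g. "stp x29, x30, [sp, #-16]!" does not need a
fake second instruction.

	/* Illustrative sketch only -- not code from this series. */
	enum sketch_op_kind { SKETCH_ADD_SP, SKETCH_STORE_AT_SP, SKETCH_LOAD_FROM_SP };

	struct sketch_stack_op {
		enum sketch_op_kind kind;
		int reg;	/* architectural register number */
		int offset;	/* immediate / stack offset in bytes */
	};

	#define SKETCH_MAX_STACK_OPS 3

	struct sketch_insn_ops {
		int nr_ops;
		struct sketch_stack_op ops[SKETCH_MAX_STACK_OPS];
	};

	/* How "stp x29, x30, [sp, #-16]!" could decode under such a scheme: */
	static const struct sketch_insn_ops stp_pre_index_example = {
		.nr_ops = 3,
		.ops = {
			{ .kind = SKETCH_ADD_SP,      .reg = 31, .offset = -16 },	/* sp -= 16 */
			{ .kind = SKETCH_STORE_AT_SP, .reg = 29, .offset = 0 },	/* x29 -> [sp] */
			{ .kind = SKETCH_STORE_AT_SP, .reg = 30, .offset = 8 },	/* x30 -> [sp + 8] */
		},
	};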

Thanks,

--
Raphael Gault
Raphael Gault May 1, 2019, 3:09 p.m. UTC | #13
Hi Josh,

On 4/30/19 1:20 PM, Raphael Gault wrote:
> Hi Josh,
>
> On 4/25/19 5:25 PM, Josh Poimboeuf wrote:
>> On Thu, Apr 25, 2019 at 08:12:24AM +0000, Raphael Gault wrote:
>>> Hi Josh,
>>>
>>> On 4/24/19 5:56 PM, Josh Poimboeuf wrote:
>>>> On Wed, Apr 24, 2019 at 04:32:44PM +0000, Raphael Gault wrote:
>>>>>>> diff --git a/tools/objtool/arch/arm64/decode.c b/tools/objtool/arch/arm64/decode.c
>>>>>>> index 0feb3ae3af5d..8b293eae2b38 100644
>>>>>>> --- a/tools/objtool/arch/arm64/decode.c
>>>>>>> +++ b/tools/objtool/arch/arm64/decode.c
>>>>>>> @@ -105,6 +105,33 @@ unsigned long arch_compute_rela_sym_offset(int addend)
>>>>>>>      return addend;
>>>>>>>      }
>>>>>>>
>>>>>>> +/*
>>>>>>> + * In order to know whether we are looking at a sibling
>>>>>>> + * call rather than a switch table, we look back at the
>>>>>>> + * previous instructions and check whether we are
>>>>>>> + * jumping within the same function that we are
>>>>>>> + * already in.
>>>>>>> + */
>>>>>>> +bool arch_is_insn_sibling_call(struct instruction *insn)
>>>>>>> +{
>>>>>>> +	struct instruction *prev;
>>>>>>> +	struct list_head *l;
>>>>>>> +	struct symbol *sym;
>>>>>>> +	list_for_each_prev(l, &insn->list) {
>>>>>>> +		prev = (void *)l;
>>>>>>> +		if (!prev->func
>>>>>>> +			|| prev->func->pfunc != insn->func->pfunc)
>>>>>>> +			return false;
>>>>>>> +		if (prev->stack_op.src.reg != ADR_SOURCE)
>>>>>>> +			continue;
>>>>>>> +		sym = find_symbol_containing(insn->sec, insn->immediate);
>>>>>>> +		if (!sym || sym->type != STT_FUNC
>>>>>>> +			|| sym->pfunc != insn->func->pfunc)
>>>>>>> +			return true;
>>>>>>> +		break;
>>>>>>> +	}
>>>>>>> +	return true;
>>>>>>> +}
>>>>>>
>>>>>> I get the feeling there might be a better way to do this, but I can't
>>>>>> figure out what this function is actually doing.  It looks like it
>>>>>> searches backwards in the function for an instruction which has
>>>>>> stack_op.src.reg != ADR_SOURCE -- what does that mean?  And why doesn't
>>>>>> it do anything with the instruction after it finds it?
>>>>>>
>>>>>
>>>>> I will indeed try to make it better.
>>>>
>>>> I still don't quite get what it's trying to accomplish, but I wonder if
>>>> there's some kind of tracking you can add in validate_branch() to keep
>>>> track of whatever you're looking for, leading up to the indirect jump.
>>>>
>>>
>>> The motivation behind this is that the `br <Xn>` instruction is a
>>> dynamic jump (a jump to the address contained in the provided register).
>>> This instruction is used for sibling calls but can also be used for
>>> switch tables. I use this to differentiate these two cases from one another:
>>>
>>> Generally the `adr/adrp` instruction is used prior to `br` in order to
>>> load the address into the register. What I do here is go back through
>>> the instructions and try to identify the address being loaded.
>>>
>>> I also thought of implementing some sort of tracking in validate_branch()
>>> because it could be useful for identifying the switch tables as well.
>>> But it seemed to me like a major change in the semantics of this tool:
>>> indeed, from my perspective I would have to track the state of the
>>> registers and I don't know if we want to do that.
>>
>> I don't have much time to look at this today (and I'll be out next
>> week), but we had a similar problem in x86.  See the comments above
>> find_switch_table(), particularly #3.  Does that function not work for
>> the arm64 case?
>>
>
> Honestly, I don't have a full understanding of how the switch tables are
> handled on arm64. All I know is that I've identified a case in which it
> doesn't work (and I get an unreachable instruction warning).
> When trying to figure out how the switch tables work on arm64 and how
> objtool retrieves them (on x86 at least), I realised that you look
> for two relocations:
> - One from (.rela).text which refers to the .rodata section
> - One from (.rela).rodata which refers somewhere else.
> In the case I identified, the second relocation doesn't exist, so the
> function doesn't find the switch table.
>
> Again, since I do not have a good understanding of this, I am not able
> to say if it is a corner case or not.
>
>>>>>>> -			hash_add(file->insn_hash, &insn->hash, insn->offset);
>>>>>>> +			/*
>>>>>>> +			 * For the arm64 architecture, we sometimes split instructions so that
>>>>>>> +			 * we can track the state evolution (i.e. load/store of pairs of registers).
>>>>>>> +			 * We thus need to take both into account and not erase the previous ones.
>>>>>>> +			 */
>>>>>>
>>>>>> Ew...  Is this an architectural thing, or just a quirk of the arm64
>>>>>> decoder?
>>>>>>
>>>>>
>>>>> The motivation for this is to simulate the two consecutive operations
>>>>> that would be executed on x86 but are done in one on arm64. This is
>>>>> strictly a decoder related quirk. I don't know if there is a better way
>>>>> to do it without modifying the struct op_src and struct instruction.
>>>>
>>>> Ah.  Which ops are those?  Hopefully we can find a better way to
>>>> represent that with a single instruction.  Adding fake instructions is
>>>> fragile.
>>>>
>>>
>>> Those are the loads/stores of pairs of registers, mainly stp/ldp. They
>>> are often used in function prologues/epilogues to save/restore the
>>> stack pointer and frame pointer; however, they can be used with any
>>> register pair.
>>>
>>> The idea to add a new instruction could work, but I would need to extend
>>> `struct op_src` as well, I think.
>>
>> Again I don't have much time to look at it, but I do think that changing
>> op_src/dest to allow for the stp/ldp instructions would work better than
>> inserting a fake instruction to emulate x86.
>>
>> Or another idea would be to associate multiple stack_ops with a single
>> instruction.
>>
>
> I haven't looked at it in depth yet, but I will try to figure out a good
> way to represent those instructions in a more appropriate manner.

I wanted to get your thoughts on the solution I found without waiting
for v2. If it's too much trouble to review it now, I'll wait for the v2.

I added a field to struct stack_op in order to have access to an
extra register. This way the decoded instruction carries the extra register
so that it can be used later when updating the state.

diff --git a/tools/objtool/arch.h b/tools/objtool/arch.h
index 52599ebd89fb..ae9ae25b3bdc 100644
--- a/tools/objtool/arch.h
+++ b/tools/objtool/arch.h
@@ -51,6 +51,7 @@ struct op_dest {
 	int offset;
 };

+
 enum op_src_type {
 	OP_SRC_REG,
 	OP_SRC_REG_INDIRECT,
@@ -66,9 +67,16 @@ struct op_src {
 	int offset;
 };

+struct op_extra {
+	unsigned char used;
+	unsigned char reg;
+	int offset;
+};
+
 struct stack_op {
 	struct op_dest dest;
 	struct op_src src;
+	struct op_extra extra;
 };

 struct instruction;
diff --git a/tools/objtool/arch/arm64/decode.c b/tools/objtool/arch/arm64/decode.c
index 17b5d59f16ad..86ad37c8397c 100644
--- a/tools/objtool/arch/arm64/decode.c
+++ b/tools/objtool/arch/arm64/decode.c
@@ -1743,23 +1673,26 @@ int arm_decode_ld_st_regs_pair_off(u32 instr, unsigned char *type,
 		op->src.type = OP_SRC_REG_INDIRECT;
 		op->src.reg = CFI_SP;
 		op->src.offset = 0;
-		state.curr_offset = 8;
 		op->dest.type = OP_DEST_REG;
-		op->dest.reg = state.regs[0];
+		op->dest.reg = rt;
 		op->dest.offset = 0;
+		op->extra.used = 1;
+		op->extra.reg = rt2;
+		op->extra.offset = 8;
 		break;
 	default:
 		op->dest.type = OP_DEST_REG_INDIRECT;
 		op->dest.reg = CFI_SP;
 		op->dest.offset = 8;
-		state.curr_offset = 0;
 		op->src.type = OP_SRC_REG;
-		op->src.reg = state.regs[1];
+		op->src.reg = rt2;
 		op->src.offset = 0;
+		op->extra.used = 1;
+		op->extra.reg = rt;
+		op->extra.offset = 0;
 		/* store */
 	}
-	state.op = *op;
-	return INSN_COMPOSED;
+	return 0;
 }

 int arm_decode_ld_st_regs_pair_post(u32 instr, unsigned char *type,
diff --git a/tools/objtool/check.c b/tools/objtool/check.c
index 5ddb25414de5..df1fb6ce1e8f 100644
--- a/tools/objtool/check.c
+++ b/tools/objtool/check.c
@@ -1580,6 +1569,18 @@ static int update_insn_state(struct instruction *insn, struct insn_state *state)
 			    initial_func_cfi.cfa.base == CFI_CFA)
 				cfa->base = CFI_CFA;

+			if (op->extra.used) {
+				if (regs[op->extra.reg].offset == -state->stack_size)
+					restore_reg(state, op->extra.reg);
+				state->stack_size -= 8;
+				if (cfa->base == CFI_SP)
+					cfa->offset -= 8;
+				if (cfa->base == CFI_SP &&
+				    cfa->offset == 0 &&
+				    initial_func_cfi.cfa.base == CFI_CFA)
+					cfa->base = CFI_CFA;
+			}
+
 			break;

 		case OP_SRC_REG_INDIRECT:
@@ -1598,12 +1599,22 @@ static int update_insn_state(struct instruction *insn, struct insn_state *state)
 				/* drap: mov disp(%rbp), %reg */
 				restore_reg(state, op->dest.reg);

+				if (op->extra.used &&
+				    op->src.reg == CFI_BP &&
+				    op->extra.offset == regs[op->extra.reg].offset)
+					restore_reg(state, op->extra.reg);
+
 			} else if (op->src.reg == cfa->base &&
 				   op->src.offset == regs[op->dest.reg].offset + cfa->offset) {

 				/* mov disp(%rbp), %reg */
 				/* mov disp(%rsp), %reg */
 				restore_reg(state, op->dest.reg);
+
+				if (op->extra.used &&
+				    op->src.reg == cfa->base &&
+				    op->extra.offset == regs[op->extra.reg].offset + cfa->offset)
+					restore_reg(state, op->extra.reg);
 			}

 			break;
@@ -1653,6 +1664,21 @@ static int update_insn_state(struct instruction *insn, struct insn_state *state)
 			save_reg(state, op->src.reg, CFI_CFA, -state->stack_size);
 		}

+		if (op->extra.used) {
+			state->stack_size += 8;
+			if (cfa->base == CFI_CFA)
+				cfa->base = CFI_SP;
+			if (cfa->base == CFI_SP)
+				cfa->offset += 8;
+			if (!state->drap ||
+			    (!(op->extra.reg == cfa->base &&
+			       op->extra.reg == state->drap_reg) &&
+			     !(op->extra.reg == CFI_BP &&
+			       cfa->base == state->drap_reg) &&
+			     regs[op->extra.reg].base == CFI_UNDEFINED))
+				save_reg(state, op->extra.reg, CFI_CFA,
+					 -state->stack_size);
+		}
 		/* detect when asm code uses rbp as a scratch register */
 		if (!no_fp && insn->func && op->src.reg == CFI_BP &&
 		    cfa->base != CFI_BP)
@@ -1671,11 +1697,19 @@ static int update_insn_state(struct instruction *insn, struct insn_state *state)
 				/* save drap offset so we know when to restore it */
 				state->drap_offset = op->dest.offset;
 			}
+			if (op->extra.used && op->extra.reg == cfa->base &&
+			    op->extra.reg == state->drap_reg) {
+				cfa->base = CFI_BP_INDIRECT;
+				cfa->offset = op->extra.offset;
+			}

 			else if (regs[op->src.reg].base == CFI_UNDEFINED) {

 				/* drap: mov reg, disp(%rbp) */
 				save_reg(state, op->src.reg, CFI_BP, op->dest.offset);
+				if (op->extra.used)
+					save_reg(state, op->extra.reg, CFI_BP,
+						 op->extra.offset);
 			}

 		} else if (op->dest.reg == cfa->base) {
@@ -1684,8 +1718,12 @@ static int update_insn_state(struct instruction *insn, struct insn_state *state)
 			/* mov reg, disp(%rsp) */
 			save_reg(state, op->src.reg, CFI_CFA,
 				 op->dest.offset - state->cfa.offset);
+			if (op->extra.used)
+				save_reg(state, op->extra.reg, CFI_CFA,
+					 op->extra.offset - state->cfa.offset);
 		}

+
 			break;

 	case OP_DEST_LEAVE:

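As a worked example of how the new field is meant to be consumed
(assuming the hunks above apply as-is; the register numbers are the raw
rt/rt2 values the decoder sees), decoding "ldp x29, x30, [sp]" would
fill in roughly:

	struct stack_op op = {
		.src   = { .type = OP_SRC_REG_INDIRECT, .reg = CFI_SP, .offset = 0 },
		.dest  = { .type = OP_DEST_REG, .reg = 29, .offset = 0 },	/* x29 */
		.extra = { .used = 1, .reg = 30, .offset = 8 },			/* x30, from [sp + 8] */
	};

so that update_insn_state() can restore both registers of the pair from
a single decoded instruction.
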
Thanks,

--
Raphael Gault
diff mbox series

Patch

diff --git a/tools/objtool/arch.h b/tools/objtool/arch.h
index 0eff166ca613..f3bef3f2cef3 100644
--- a/tools/objtool/arch.h
+++ b/tools/objtool/arch.h
@@ -88,4 +88,6 @@  unsigned long arch_compute_jump_destination(struct instruction *insn);
 
 unsigned long arch_compute_rela_sym_offset(int addend);
 
+bool arch_is_insn_sibling_call(struct instruction *insn);
+
 #endif /* _ARCH_H */
diff --git a/tools/objtool/arch/arm64/decode.c b/tools/objtool/arch/arm64/decode.c
index 0feb3ae3af5d..8b293eae2b38 100644
--- a/tools/objtool/arch/arm64/decode.c
+++ b/tools/objtool/arch/arm64/decode.c
@@ -105,6 +105,33 @@  unsigned long arch_compute_rela_sym_offset(int addend)
 	return addend;
 }
 
+/*
+ * In order to know whether we are looking at a sibling
+ * call rather than a switch table, we look back at the
+ * previous instructions and check whether we are
+ * jumping within the same function that we are
+ * already in.
+ */
+bool arch_is_insn_sibling_call(struct instruction *insn)
+{
+	struct instruction *prev;
+	struct list_head *l;
+	struct symbol *sym;
+	list_for_each_prev(l, &insn->list) {
+		prev = (void *)l;
+		if (!prev->func
+			|| prev->func->pfunc != insn->func->pfunc)
+			return false;
+		if (prev->stack_op.src.reg != ADR_SOURCE)
+			continue;
+		sym = find_symbol_containing(insn->sec, insn->immediate);
+		if (!sym || sym->type != STT_FUNC
+			|| sym->pfunc != insn->func->pfunc)
+			return true;
+		break;
+	}
+	return true;
+}
 static int is_arm64(struct elf *elf)
 {
 	switch(elf->ehdr.e_machine){
diff --git a/tools/objtool/arch/x86/decode.c b/tools/objtool/arch/x86/decode.c
index 1af7b4996307..88c3d99c76be 100644
--- a/tools/objtool/arch/x86/decode.c
+++ b/tools/objtool/arch/x86/decode.c
@@ -85,6 +85,11 @@  unsigned long arch_compute_rela_sym_offset(int addend)
 	return addend + 4;
 }
 
+bool arch_is_insn_sibling_call(struct instruction *insn)
+{
+	return true;
+}
+
 int arch_orc_read_unwind_hints(struct objtool_file *file)
 {
 	struct section *sec, *relasec;
diff --git a/tools/objtool/check.c b/tools/objtool/check.c
index 17fcd8c8f9c1..fa6106214318 100644
--- a/tools/objtool/check.c
+++ b/tools/objtool/check.c
@@ -261,10 +261,12 @@  static int decode_instructions(struct objtool_file *file)
 	unsigned long offset;
 	struct instruction *insn;
 	int ret;
+	static int composed_insn = 0;
 
 	for_each_sec(file, sec) {
 
-		if (!(sec->sh.sh_flags & SHF_EXECINSTR))
+		if (!(sec->sh.sh_flags & SHF_EXECINSTR)
+			&& (strcmp(sec->name, ".altinstr_replacement") || !IGNORE_SHF_EXEC_FLAG))
 			continue;
 
 		if (strcmp(sec->name, ".altinstr_replacement") &&
@@ -297,10 +299,22 @@  static int decode_instructions(struct objtool_file *file)
 				WARN_FUNC("invalid instruction type %d",
 					  insn->sec, insn->offset, insn->type);
 				ret = -1;
-				goto err;
+				free(insn);
+				continue;
 			}
-
-			hash_add(file->insn_hash, &insn->hash, insn->offset);
+			/*
+			 * For the arm64 architecture, we sometimes split instructions so that
+			 * we can track the state evolution (i.e. load/store of pairs of registers).
+			 * We thus need to take both into account and not erase the previous ones.
+			 */
+			if (composed_insn > 0)
+				hash_add(file->insn_hash, &insn->hash, insn->offset + composed_insn);
+			else
+				hash_add(file->insn_hash, &insn->hash, insn->offset);
+			if (insn->len == 0)
+				composed_insn++;
+			else
+				composed_insn = 0;
 			list_add_tail(&insn->list, &file->insn_list);
 		}
 
@@ -510,10 +524,10 @@  static int add_jump_destinations(struct objtool_file *file)
 			dest_off = arch_compute_jump_destination(insn);
 		} else if (rela->sym->type == STT_SECTION) {
 			dest_sec = rela->sym->sec;
-			dest_off = rela->addend + 4;
+			dest_off = arch_compute_rela_sym_offset(rela->addend);
 		} else if (rela->sym->sec->idx) {
 			dest_sec = rela->sym->sec;
-			dest_off = rela->sym->sym.st_value + rela->addend + 4;
+			dest_off = rela->sym->sym.st_value + arch_compute_rela_sym_offset(rela->addend);
 		} else if (strstr(rela->sym->name, "_indirect_thunk_")) {
 			/*
 			 * Retpoline jumps are really dynamic jumps in
@@ -663,7 +677,7 @@  static int handle_group_alt(struct objtool_file *file,
 		last_orig_insn = insn;
 	}
 
-	if (next_insn_same_sec(file, last_orig_insn)) {
+	if (last_orig_insn && next_insn_same_sec(file, last_orig_insn)) {
 		fake_jump = malloc(sizeof(*fake_jump));
 		if (!fake_jump) {
 			WARN("malloc failed");
@@ -976,6 +990,17 @@  static struct rela *find_switch_table(struct objtool_file *file,
 		if (find_symbol_containing(rodata_sec, table_offset))
 			continue;
 
+		/*
+		 * If we are on the arm64 architecture, we know that we
+		 * are in the presence of a switch table thanks to
+		 * the `br <Xn>` insn, but we can't retrieve it yet.
+		 * So we just ignore unreachable instructions for this file.
+		 */
+		if (JUMP_DYNAMIC_IS_SWITCH_TABLE) {
+			file->ignore_unreachables = true;
+			return NULL;
+		}
+
 		rodata_rela = find_rela_by_dest(rodata_sec, table_offset);
 		if (rodata_rela) {
 			/*
@@ -1258,8 +1283,8 @@  static void save_reg(struct insn_state *state, unsigned char reg, int base,
 
 static void restore_reg(struct insn_state *state, unsigned char reg)
 {
-	state->regs[reg].base = CFI_UNDEFINED;
-	state->regs[reg].offset = 0;
+	state->regs[reg].base = initial_func_cfi.regs[reg].base;
+	state->regs[reg].offset = initial_func_cfi.regs[reg].offset;
 }
 
 /*
@@ -1415,8 +1440,33 @@  static int update_insn_state(struct instruction *insn, struct insn_state *state)
 
 				/* add imm, %rsp */
 				state->stack_size -= op->src.offset;
-				if (cfa->base == CFI_SP)
+				if (cfa->base == CFI_SP) {
 					cfa->offset -= op->src.offset;
+					if (state->stack_size == 0
+							&& initial_func_cfi.cfa.base == CFI_CFA) {
+						cfa->base = CFI_CFA;
+						cfa->offset = 0;
+					}
+				}
+				/*
+				 * On arm64 the save/restore of sp into fp is not automatic,
+				 * and one can be done without the other, so we
+				 * need to be careful not to invalidate the stack frame in such
+				 * cases.
+				 */
+				else if (cfa->base == CFI_BP) {
+					if (state->stack_size == 0
+							&& initial_func_cfi.cfa.base == CFI_CFA) {
+						cfa->base = CFI_CFA;
+						cfa->offset = 0;
+						restore_reg(state, CFI_BP);
+					}
+				}
+				else if (cfa->base == CFI_CFA) {
+					cfa->base = CFI_SP;
+					if (state->stack_size >= 16)
+						cfa->offset = 16;
+				}
 				break;
 			}
 
@@ -1427,6 +1477,16 @@  static int update_insn_state(struct instruction *insn, struct insn_state *state)
 				break;
 			}
 
+			if (op->src.reg == CFI_SP && op->dest.reg == CFI_BP &&
+			    cfa->base == CFI_SP &&
+			    regs[CFI_BP].base == CFI_CFA &&
+			    regs[CFI_BP].offset == -cfa->offset) {
+
+				/* mov %rsp, %rbp */
+				cfa->base = op->dest.reg;
+				state->bp_scratch = false;
+				break;
+			}
 			if (op->src.reg == CFI_SP && cfa->base == CFI_SP) {
 
 				/* drap: lea disp(%rsp), %drap */
@@ -1518,6 +1578,9 @@  static int update_insn_state(struct instruction *insn, struct insn_state *state)
 			state->stack_size -= 8;
 			if (cfa->base == CFI_SP)
 				cfa->offset -= 8;
+			if (cfa->base == CFI_SP && cfa->offset == 0
+					&& initial_func_cfi.cfa.base == CFI_CFA)
+				cfa->base = CFI_CFA;
 
 			break;
 
@@ -1557,6 +1620,8 @@  static int update_insn_state(struct instruction *insn, struct insn_state *state)
 
 	case OP_DEST_PUSH:
 		state->stack_size += 8;
+		if (cfa->base == CFI_CFA)
+			cfa->base = CFI_SP;
 		if (cfa->base == CFI_SP)
 			cfa->offset += 8;
 
@@ -1728,7 +1793,7 @@  static int validate_branch(struct objtool_file *file, struct instruction *first,
 	insn = first;
 	sec = insn->sec;
 
-	if (insn->alt_group && list_empty(&insn->alts)) {
+	if (!insn->visited && insn->alt_group && list_empty(&insn->alts)) {
 		WARN_FUNC("don't know how to handle branch to middle of alternative instruction group",
 			  sec, insn->offset);
 		return 1;
@@ -1871,6 +1936,14 @@  static int validate_branch(struct objtool_file *file, struct instruction *first,
 		case INSN_JUMP_DYNAMIC:
 			if (func && list_empty(&insn->alts) &&
 			    has_modified_stack_frame(&state)) {
+				/*
+				 * On arm64 the `br <Xn>` insn can be used for switch tables,
+				 * but it cannot be distinguished by itself from a sibling
+				 * call, thus we need to have a look at the previous instructions
+				 * to determine which one it is.
+				 */
+				if (!arch_is_insn_sibling_call(insn))
+					break;
 				WARN_FUNC("sibling call from callable instruction with modified stack frame",
 					  sec, insn->offset);
 				return 1;
diff --git a/tools/objtool/elf.c b/tools/objtool/elf.c
index b8f3cca8e58b..136f9b9fb1d1 100644
--- a/tools/objtool/elf.c
+++ b/tools/objtool/elf.c
@@ -74,7 +74,8 @@  struct symbol *find_symbol_by_offset(struct section *sec, unsigned long offset)
 	struct symbol *sym;
 
 	list_for_each_entry(sym, &sec->symbol_list, list)
-		if (sym->type != STT_SECTION &&
+		if (sym->type != STT_NOTYPE &&
+		    sym->type != STT_SECTION &&
 		    sym->offset == offset)
 			return sym;