diff mbox series

[v2,07/70] x86: Build check for embedded endbr64 instructions

Message ID 20220214125127.17985-8-andrew.cooper3@citrix.com (mailing list archive)
State New, archived
Series x86: Support for CET Indirect Branch Tracking

Commit Message

Andrew Cooper Feb. 14, 2022, 12:50 p.m. UTC
From: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>

Embedded endbr64 instructions mark legal indirect branches as far as the CPU
is concerned, which aren't legal as far as the logic is concerned.

When CET-IBT is active, check for embedded byte sequences.  Example failures
look like:

  Fail: Found 2 embedded endbr64 instructions
  0xffff82d040325677: test_endbr64 at /local/xen.git/xen/arch/x86/x86_64/entry.S:28
  0xffff82d040352da6: init_done at /local/xen.git/xen/arch/x86/setup.c:675

Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
v2:
 * New
---
 xen/arch/x86/Makefile    |  3 ++
 xen/tools/check-endbr.sh | 76 ++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 79 insertions(+)
 create mode 100755 xen/tools/check-endbr.sh

Comments

Jan Beulich Feb. 15, 2022, 3:12 p.m. UTC | #1
On 14.02.2022 13:50, Andrew Cooper wrote:
> From: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
> 
> Embedded endbr64 instructions mark legal indirect branches as far as the CPU
> is concerned, which aren't legal as far as the logic is concerned.

I think it would help if it was clarified what "embedded" actually means
here.

> --- a/xen/arch/x86/Makefile
> +++ b/xen/arch/x86/Makefile
> @@ -155,6 +155,9 @@ $(TARGET)-syms: prelink.o xen.lds
>  	$(MAKE) -f $(BASEDIR)/Rules.mk $(@D)/.$(@F).1.o
>  	$(LD) $(XEN_LDFLAGS) -T xen.lds -N prelink.o $(build_id_linker) \
>  	    $(@D)/.$(@F).1.o -o $@
> +ifeq ($(CONFIG_XEN_IBT),y)
> +	$(SHELL) $(BASEDIR)/tools/check-endbr.sh $@
> +endif
>  	$(NM) -pa --format=sysv $(@D)/$(@F) \
>  		| $(BASEDIR)/tools/symbols --all-symbols --xensyms --sysv --sort \
>  		>$(@D)/$(@F).map

The same wants doing on xen.efi, I guess?

> --- /dev/null
> +++ b/xen/tools/check-endbr.sh
> @@ -0,0 +1,76 @@
> +#!/bin/sh
> +
> +#
> +# Usage ./$0 xen-syms
> +#
> +
> +set -e
> +
> +OBJCOPY="${OBJCOPY:-objcopy} -j .text $1"
> +OBJDUMP="${OBJDUMP:-objdump} -j .text $1"
> +
> +D=$(mktemp -d)
> +trap "rm -rf $D" EXIT
> +
> +TEXT_BIN=$D/xen-syms.text
> +VALID=$D/valid-addrs
> +ALL=$D/all-addrs
> +BAD=$D/bad-addrs
> +
> +#
> +# First, look for all the valid endbr64 instructions.
> +# A worst-case disassembly, viewed through cat -A, may look like:
> +#
> +# ffff82d040337bd4 <endbr64>:$
> +# ffff82d040337bd4:^If3 0f 1e fa          ^Iendbr64 $
> +# ffff82d040337bd8:^Ieb fe                ^Ijmp    ffff82d040337bd8 <endbr64+0x4>$
> +# ffff82d040337bda:^Ib8 f3 0f 1e fa       ^Imov    $0xfa1e0ff3,%eax$
> +#
> +# Want to grab the address of endbr64 instructions only, ignoring function
> +# names/jump labels/etc, so look for 'endbr64' preceded by a tab and with any
> +# number of trailing spaces before the end of the line.
> +#
> +${OBJDUMP} -d | grep '	endbr64 *$' | cut -f 1 -d ':' > $VALID &

Since you look at only .text the risk of the disassembler coming
out of sync with the actual instruction stream is lower than when
32- and 16-bit code was also part of what is disassembled, but it's
not zero. Any zero-padding inserted anywhere by the linker can
result in an immediately following ENDBR to be missed (because
sequences of zeros resemble 2-byte insns). While this risk may be
acceptable, I think it wants mentioning at least in the description,
maybe even at the top of the script (where one would likely look
first after it spitting out an error).

Do you perhaps want to also pass -w to objdump, to eliminate the
risk of getting confused by split lines?

> +#
> +# Second, look for any endbr64 byte sequence
> +# This has a couple of complications:
> +#
> +# 1) Grep binary search isn't VMA aware.  Copy .text out as binary, causing
> +#    the grep offset to be from the start of .text.
> +#
> +# 2) AWK can't add 64bit integers, because internally all numbers are doubles.
> +#    When the upper bits are set, the exponents worth of precision is lost in
> +#    the lower bits, rounding integers to the nearest 4k.
> +#
> +#    Instead, use the fact that Xen's .text is within a 1G aligned region, and
> +#    split the VMA in half so AWK's numeric addition is only working on 32 bit
> +#    numbers, which don't lose precision.
> +#
> +eval $(${OBJDUMP} -h | awk '$2 == ".text" {printf "vma_hi=%s\nvma_lo=%s\n", substr($4, 1, 8), substr($4, 9, 16)}')
> +
> +${OBJCOPY} -O binary $TEXT_BIN
> +grep -aob "$(printf '\363\17\36\372')" $TEXT_BIN |
> +    awk -F':' '{printf "%s%x\n", "'$vma_hi'", strtonum(0x'$vma_lo') + $1}' > $ALL

None of the three options passed to grep look to be standardized.
Is this going to cause problems on non-Linux systems? Should this
checking perhaps be put behind a separate Kconfig option?

> +# Wait for $VALID to become complete
> +wait
> +
> +# Sanity check $VALID and $ALL, in case the string parsing bitrots
> +val_sz=$(stat -c '%s' $VALID)
> +all_sz=$(stat -c '%s' $ALL)
> +[ "$val_sz" -eq 0 ]         && { echo "Error: Empty valid-addrs" >&2; exit 1; }
> +[ "$all_sz" -eq 0 ]         && { echo "Error: Empty all-addrs" >&2; exit 1; }
> +[ "$all_sz" -lt "$val_sz" ] && { echo "Error: More valid-addrs than all-addrs" >&2; exit 1; }
> +
> +# $BAD = $ALL - $VALID
> +join -v 2 $VALID $ALL > $BAD
> +nr_bad=$(wc -l < $BAD)
> +
> +# Success
> +[ "$nr_bad" -eq 0 ] && exit 0
> +
> +# Failure
> +echo "Fail: Found ${nr_bad} embedded endbr64 instructions" >&2
> +addr2line -afip -e $1 < $BAD >&2

There probably also wants to be an ADDR2LINE variable then. If
one overrides objdump and objcopy, one would likely want/need to
override this one as well.

Jan
Andrew Cooper Feb. 15, 2022, 5:52 p.m. UTC | #2
On 15/02/2022 15:12, Jan Beulich wrote:
> On 14.02.2022 13:50, Andrew Cooper wrote:
>> From: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
>>
>> Embedded endbr64 instructions mark legal indirect branches as far as the CPU
>> is concerned, which aren't legal as far as the logic is concerned.
> I think it would help if it was clarified what "embedded" actually means
> here.

Oh yeah, that's lost a bit of context now I've split it out of the patch
introducing endbr.h

>
>> --- a/xen/arch/x86/Makefile
>> +++ b/xen/arch/x86/Makefile
>> @@ -155,6 +155,9 @@ $(TARGET)-syms: prelink.o xen.lds
>>  	$(MAKE) -f $(BASEDIR)/Rules.mk $(@D)/.$(@F).1.o
>>  	$(LD) $(XEN_LDFLAGS) -T xen.lds -N prelink.o $(build_id_linker) \
>>  	    $(@D)/.$(@F).1.o -o $@
>> +ifeq ($(CONFIG_XEN_IBT),y)
>> +	$(SHELL) $(BASEDIR)/tools/check-endbr.sh $@
>> +endif
>>  	$(NM) -pa --format=sysv $(@D)/$(@F) \
>>  		| $(BASEDIR)/tools/symbols --all-symbols --xensyms --sysv --sort \
>>  		>$(@D)/$(@F).map
> The same wants doing on xen.efi, I guess?

Probably.

>
>> --- /dev/null
>> +++ b/xen/tools/check-endbr.sh
>> @@ -0,0 +1,76 @@
>> +#!/bin/sh
>> +
>> +#
>> +# Usage ./$0 xen-syms
>> +#
>> +
>> +set -e
>> +
>> +OBJCOPY="${OBJCOPY:-objcopy} -j .text $1"
>> +OBJDUMP="${OBJDUMP:-objdump} -j .text $1"
>> +
>> +D=$(mktemp -d)
>> +trap "rm -rf $D" EXIT
>> +
>> +TEXT_BIN=$D/xen-syms.text
>> +VALID=$D/valid-addrs
>> +ALL=$D/all-addrs
>> +BAD=$D/bad-addrs
>> +
>> +#
>> +# First, look for all the valid endbr64 instructions.
>> +# A worst-case disassembly, viewed through cat -A, may look like:
>> +#
>> +# ffff82d040337bd4 <endbr64>:$
>> +# ffff82d040337bd4:^If3 0f 1e fa          ^Iendbr64 $
>> +# ffff82d040337bd8:^Ieb fe                ^Ijmp    ffff82d040337bd8 <endbr64+0x4>$
>> +# ffff82d040337bda:^Ib8 f3 0f 1e fa       ^Imov    $0xfa1e0ff3,%eax$
>> +#
>> +# Want to grab the address of endbr64 instructions only, ignoring function
>> +# names/jump labels/etc, so look for 'endbr64' preceded by a tab and with any
>> +# number of trailing spaces before the end of the line.
>> +#
>> +${OBJDUMP} -d | grep '	endbr64 *$' | cut -f 1 -d ':' > $VALID &
> Since you look at only .text the risk of the disassembler coming
> out of sync with the actual instruction stream is lower than when
> 32- and 16-bit code was also part of what is disassembled, but it's
> not zero.

I'm not sure that we have any interesting non-64bit code at all in .text.

_start is technically 32bit but is mode-invariant as far as decoding goes.

The kexec trampoline is here too, but when I dust off my cleanup patch,
there will no longer be data or mode-dependent things to disassemble.

Everything else I can think of is in .init.text.

> Any zero-padding inserted anywhere by the linker can
> result in an immediately following ENDBR to be missed (because
> sequences of zeros resemble 2-byte insns).

I'm not sure this is a problem.  This pass is looking for everything
that objdump thinks is a legal endbr64 instruction, and it splits at labels.

Only the hand-written stubs can legitimately have an endbr64 without a
symbol pointing at it.

We also don't have any 0 padding.  It's specified as 0x90 in the linker
file, although I've been debating switching this to 0xcc for a while now
already.

>  While this risk may be
> acceptable, I think it wants mentioning at least in the description,
> maybe even at the top of the script (where one would likely look
> first after it spitting out an error).
>
> Do you perhaps want to also pass -w to objdump, to eliminate the
> risk of getting confused by split lines?

I think that's probably a good move irrespective.  This particular pipe
is the longest single task in the script which is why I backgrounded it
while the second scan occurs.  -w means fewer lines so hopefully a minor
speedup.

>> +#
>> +# Second, look for any endbr64 byte sequence
>> +# This has a couple of complications:
>> +#
>> +# 1) Grep binary search isn't VMA aware.  Copy .text out as binary, causing
>> +#    the grep offset to be from the start of .text.
>> +#
>> +# 2) AWK can't add 64bit integers, because internally all numbers are doubles.
>> +#    When the upper bits are set, the exponents worth of precision is lost in
>> +#    the lower bits, rounding integers to the nearest 4k.
>> +#
>> +#    Instead, use the fact that Xen's .text is within a 1G aligned region, and
>> +#    split the VMA in half so AWK's numeric addition is only working on 32 bit
>> +#    numbers, which don't lose precision.
>> +#
>> +eval $(${OBJDUMP} -h | awk '$2 == ".text" {printf "vma_hi=%s\nvma_lo=%s\n", substr($4, 1, 8), substr($4, 9, 16)}')
>> +
>> +${OBJCOPY} -O binary $TEXT_BIN
>> +grep -aob "$(printf '\363\17\36\372')" $TEXT_BIN |
>> +    awk -F':' '{printf "%s%x\n", "'$vma_hi'", strtonum(0x'$vma_lo') + $1}' > $ALL
> None of the three options passed to grep look to be standardized.
> Is this going to cause problems on non-Linux systems? Should this
> checking perhaps be put behind a separate Kconfig option?

CI says that FreeBSD is entirely happy, while Alpine Linux isn't.  This
is because Alpine has busybox's grep unless you install the GNU grep
package, and I'm doing a fix to our container.

My plan to fix this is to just declare a "grep capable of binary
searching" a conditional build requirement for Xen.  I don't think this
is onerous, and there are no other plausible alternatives here.

The other option is to detect the absence of support and skip the check.
It is after all a defence in depth scheme, and anything liable to cause
a problem would be caught in CI anyway.

>> +# Wait for $VALID to become complete
>> +wait
>> +
>> +# Sanity check $VALID and $ALL, in case the string parsing bitrots
>> +val_sz=$(stat -c '%s' $VALID)
>> +all_sz=$(stat -c '%s' $ALL)
>> +[ "$val_sz" -eq 0 ]         && { echo "Error: Empty valid-addrs" >&2; exit 1; }
>> +[ "$all_sz" -eq 0 ]         && { echo "Error: Empty all-addrs" >&2; exit 1; }
>> +[ "$all_sz" -lt "$val_sz" ] && { echo "Error: More valid-addrs than all-addrs" >&2; exit 1; }
>> +
>> +# $BAD = $ALL - $VALID
>> +join -v 2 $VALID $ALL > $BAD
>> +nr_bad=$(wc -l < $BAD)
>> +
>> +# Success
>> +[ "$nr_bad" -eq 0 ] && exit 0
>> +
>> +# Failure
>> +echo "Fail: Found ${nr_bad} embedded endbr64 instructions" >&2
>> +addr2line -afip -e $1 < $BAD >&2
> There probably also wants to be an ADDR2LINE variable then. If
> one overrides objdump and objcopy, one would likely want/need to
> override this one as well.

Ah yes.  Will fix.

~Andrew
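
A minimal sketch of the first pass discussed above, run against the sample
disassembly from the script's comment rather than real objdump output.  The
names valid_endbr_addrs and sample_disassembly are hypothetical and exist
only for this illustration; the tab is produced with printf so that it stays
visible in the source.

```sh
#!/bin/sh
# Hypothetical helper: reads "objdump -d" style text on stdin and prints the
# address of every line whose mnemonic is endbr64.
valid_endbr_addrs()
{
    tab=$(printf '\t')
    # A tab, then "endbr64", then optional trailing spaces up to end of line.
    grep "${tab}endbr64 *\$" | cut -f 1 -d ':'
}

# Sample input modelled on the worst-case disassembly in the patch comment.
sample_disassembly()
{
    printf 'ffff82d040337bd4 <endbr64>:\n'
    printf 'ffff82d040337bd4:\tf3 0f 1e fa          \tendbr64 \n'
    printf 'ffff82d040337bd8:\teb fe                \tjmp    ffff82d040337bd8 <endbr64+0x4>\n'
    printf 'ffff82d040337bda:\tb8 f3 0f 1e fa       \tmov    $0xfa1e0ff3,%%eax\n'
}

sample_disassembly | valid_endbr_addrs
```

Only the second sample line is reported: the label and the jump target
mention endbr64 without a preceding tab, and the mov merely carries the byte
sequence as an immediate, which is the embedded case the second pass is
meant to catch.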
Jan Beulich Feb. 16, 2022, 8:41 a.m. UTC | #3
On 15.02.2022 18:52, Andrew Cooper wrote:
> On 15/02/2022 15:12, Jan Beulich wrote:
>> On 14.02.2022 13:50, Andrew Cooper wrote:
>>> --- /dev/null
>>> +++ b/xen/tools/check-endbr.sh
>>> @@ -0,0 +1,76 @@
>>> +#!/bin/sh
>>> +
>>> +#
>>> +# Usage ./$0 xen-syms
>>> +#
>>> +
>>> +set -e
>>> +
>>> +OBJCOPY="${OBJCOPY:-objcopy} -j .text $1"
>>> +OBJDUMP="${OBJDUMP:-objdump} -j .text $1"
>>> +
>>> +D=$(mktemp -d)
>>> +trap "rm -rf $D" EXIT
>>> +
>>> +TEXT_BIN=$D/xen-syms.text
>>> +VALID=$D/valid-addrs
>>> +ALL=$D/all-addrs
>>> +BAD=$D/bad-addrs
>>> +
>>> +#
>>> +# First, look for all the valid endbr64 instructions.
>>> +# A worst-case disassembly, viewed through cat -A, may look like:
>>> +#
>>> +# ffff82d040337bd4 <endbr64>:$
>>> +# ffff82d040337bd4:^If3 0f 1e fa          ^Iendbr64 $
>>> +# ffff82d040337bd8:^Ieb fe                ^Ijmp    ffff82d040337bd8 <endbr64+0x4>$
>>> +# ffff82d040337bda:^Ib8 f3 0f 1e fa       ^Imov    $0xfa1e0ff3,%eax$
>>> +#
>>> +# Want to grab the address of endbr64 instructions only, ignoring function
>>> +# names/jump labels/etc, so look for 'endbr64' preceded by a tab and with any
>>> +# number of trailing spaces before the end of the line.
>>> +#
>>> +${OBJDUMP} -d | grep '	endbr64 *$' | cut -f 1 -d ':' > $VALID &
>> Since you look at only .text the risk of the disassembler coming
>> out of sync with the actual instruction stream is lower than when
>> 32- and 16-bit code was also part of what is disassembled, but it's
>> not zero.
> 
> I'm not sure that we have any interesting non-64bit code at all in .text.
> 
> _start is technically 32bit but is mode-invariant as far as decoding goes.
> 
> The kexec trampoline is here too, but when I dust off my cleanup patch,
> there will no longer be data or mode-dependent things to disassemble.
> 
> Everything else I can think of is in .init.text.
> 
>> Any zero-padding inserted anywhere by the linker can
>> result in an immediately following ENDBR to be missed (because
>> sequences of zeros resemble 2-byte insns).
> 
> I'm not sure this is a problem.  This pass is looking for everything
> that objdump thinks is a legal endbr64 instruction, and it splits at labels.

Oh, right - I did miss the splitting at labels aspect. Hopefully
objdump is really consistent with this.

> Only the hand-written stubs can legitimately have an endbr64 without a
> symbol pointing at it.
> 
> We also don't have any 0 padding.  It's specified as 0x90 in the linker
> file, although I've been debating switching this to 0xcc for a while now
> already.

The linker script comes into play only in the final linking step.
Prior "ld -r" could easily have inserted other padding.

>>> +#
>>> +# Second, look for any endbr64 byte sequence
>>> +# This has a couple of complications:
>>> +#
>>> +# 1) Grep binary search isn't VMA aware.  Copy .text out as binary, causing
>>> +#    the grep offset to be from the start of .text.
>>> +#
>>> +# 2) AWK can't add 64bit integers, because internally all numbers are doubles.
>>> +#    When the upper bits are set, the exponents worth of precision is lost in
>>> +#    the lower bits, rounding integers to the nearest 4k.
>>> +#
>>> +#    Instead, use the fact that Xen's .text is within a 1G aligned region, and
>>> +#    split the VMA in half so AWK's numeric addition is only working on 32 bit
>>> +#    numbers, which don't lose precision.
>>> +#
>>> +eval $(${OBJDUMP} -h | awk '$2 == ".text" {printf "vma_hi=%s\nvma_lo=%s\n", substr($4, 1, 8), substr($4, 9, 16)}')
>>> +
>>> +${OBJCOPY} -O binary $TEXT_BIN
>>> +grep -aob "$(printf '\363\17\36\372')" $TEXT_BIN |
>>> +    awk -F':' '{printf "%s%x\n", "'$vma_hi'", strtonum(0x'$vma_lo') + $1}' > $ALL
>> None of the three options passed to grep look to be standardized.
>> Is this going to cause problems on non-Linux systems? Should this
>> checking perhaps be put behind a separate Kconfig option?
> 
> CI says that FreeBSD is entirely happy, while Alpine Linux isn't.  This
> is because Alpine has busybox's grep unless you install the GNU grep
> package, and I'm doing a fix to our container.
> 
> My plan to fix this is to just declare a "grep capable of binary
> searching" a conditional build requirement for Xen.  I don't think this
> is onerous, and there are no other plausible alternatives here.
> 
> The other option is to detect the absence of support and skip the check.
> It is after all a defence in depth scheme, and anything liable to cause
> a problem would be caught in CI anyway.

I'd favor the latter approach (but I wouldn't mind the conditional build
requirement, if you and others deem that better), with a warning issued
when the check can't be performed. I have to admit that I didn't expect
there would be no simple and standardized binary search tool on Unix-es.

Jan
Andrew Cooper Feb. 16, 2022, 11:55 a.m. UTC | #4
On 16/02/2022 08:41, Jan Beulich wrote:
>>> Any zero-padding inserted anywhere by the linker can
>>> result in an immediately following ENDBR to be missed (because
>>> sequences of zeros resemble 2-byte insns).
>> I'm not sure this is a problem.  This pass is looking for everything
>> that objdump thinks is a legal endbr64 instruction, and it splits at labels.
> Oh, right - I did miss the splitting at labels aspect. Hopefully
> objdump is really consistent with this.

Certainly appears to be in my experience.

>>>> +#
>>>> +# Second, look for any endbr64 byte sequence
>>>> +# This has a couple of complications:
>>>> +#
>>>> +# 1) Grep binary search isn't VMA aware.  Copy .text out as binary, causing
>>>> +#    the grep offset to be from the start of .text.
>>>> +#
>>>> +# 2) AWK can't add 64bit integers, because internally all numbers are doubles.
>>>> +#    When the upper bits are set, the exponents worth of precision is lost in
>>>> +#    the lower bits, rounding integers to the nearest 4k.
>>>> +#
>>>> +#    Instead, use the fact that Xen's .text is within a 1G aligned region, and
>>>> +#    split the VMA in half so AWK's numeric addition is only working on 32 bit
>>>> +#    numbers, which don't lose precision.
>>>> +#
>>>> +eval $(${OBJDUMP} -h | awk '$2 == ".text" {printf "vma_hi=%s\nvma_lo=%s\n", substr($4, 1, 8), substr($4, 9, 16)}')
>>>> +
>>>> +${OBJCOPY} -O binary $TEXT_BIN
>>>> +grep -aob "$(printf '\363\17\36\372')" $TEXT_BIN |
>>>> +    awk -F':' '{printf "%s%x\n", "'$vma_hi'", strtonum(0x'$vma_lo') + $1}' > $ALL
>>> None of the three options passed to grep look to be standardized.
>>> Is this going to cause problems on non-Linux systems? Should this
>>> checking perhaps be put behind a separate Kconfig option?
>> CI says that FreeBSD is entirely happy, while Alpine Linux isn't.  This
>> is because Alpine has busybox's grep unless you install the GNU grep
>> package, and I'm doing a fix to our container.
>>
>> My plan to fix this is to just declare a "grep capable of binary
>> searching" a conditional build requirement for Xen.  I don't think this
>> is onerous, and there are no other plausible alternatives here.
>>
>> The other option is to detect the absence of support and skip the check.
>> It is after all a defence in depth scheme, and anything liable to cause
>> a problem would be caught in CI anyway.
> I'd favor the latter approach (but I wouldn't mind the conditional build
> requirement, if you and others deem that better), with a warning issued
> when the check can't be performed. I have to admit that I didn't expect
> there would be no simple and standardized binary search tool on Unix-es.

Ok, so let's do this:

1) This script gets a check for $(grep -aob) and emits a warning to
stderr but exits 0.  This lets people using IBT know that something was
missing.

2) Optional build dependency of `grep -aob` for Xen.  (just a tweak to
README)

3) Update the alpine containers to not miss out.

~Andrew
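
Item 1 of the plan above might look roughly like the following.  This is a
sketch only, not the code that was posted: the function names
grep_can_binary_search and skip_without_binary_grep are hypothetical, as is
the GREP variable in the usage comment, and forcing LC_ALL=C to keep the
matching byte-oriented is an assumption of this example.

```sh
#!/bin/sh
# Hypothetical probe: succeeds only if the grep named in $1 accepts -a, -o
# and -b and reports the byte offset of a binary match.
grep_can_binary_search()
{
    # endbr64 encoding (f3 0f 1e fa), written in octal as in the patch.
    pat=$(printf '\363\17\36\372')

    # Three filler bytes precede the pattern, so the expected offset is 3.
    # LC_ALL=C is an assumption here, to keep the matching byte-oriented.
    off=$(printf 'xyz\363\17\36\372' |
          LC_ALL=C "$1" -aob "$pat" 2>/dev/null |
          LC_ALL=C cut -d ':' -f 1)

    [ "$off" = 3 ]
}

# Hypothetical wrapper: returns 0 (after warning on stderr) when the check
# has to be skipped, non-zero when it can go ahead.
skip_without_binary_grep()
{
    grep_can_binary_search "$1" && return 1
    echo "Warning: $1 cannot do binary searches, skipping endbr64 check" >&2
    return 0
}

# Intended use near the top of the script:
#   skip_without_binary_grep "${GREP:-grep}" && exit 0
```

Probing with a real match, rather than parsing a version string, means that
any grep implementing the three options passes, whichever project it comes
from.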

Patch

diff --git a/xen/arch/x86/Makefile b/xen/arch/x86/Makefile
index 9fc884813cb5..f15a984aacc2 100644
--- a/xen/arch/x86/Makefile
+++ b/xen/arch/x86/Makefile
@@ -155,6 +155,9 @@  $(TARGET)-syms: prelink.o xen.lds
 	$(MAKE) -f $(BASEDIR)/Rules.mk $(@D)/.$(@F).1.o
 	$(LD) $(XEN_LDFLAGS) -T xen.lds -N prelink.o $(build_id_linker) \
 	    $(@D)/.$(@F).1.o -o $@
+ifeq ($(CONFIG_XEN_IBT),y)
+	$(SHELL) $(BASEDIR)/tools/check-endbr.sh $@
+endif
 	$(NM) -pa --format=sysv $(@D)/$(@F) \
 		| $(BASEDIR)/tools/symbols --all-symbols --xensyms --sysv --sort \
 		>$(@D)/$(@F).map
diff --git a/xen/tools/check-endbr.sh b/xen/tools/check-endbr.sh
new file mode 100755
index 000000000000..3d96e02bdf93
--- /dev/null
+++ b/xen/tools/check-endbr.sh
@@ -0,0 +1,76 @@ 
+#!/bin/sh
+
+#
+# Usage ./$0 xen-syms
+#
+
+set -e
+
+OBJCOPY="${OBJCOPY:-objcopy} -j .text $1"
+OBJDUMP="${OBJDUMP:-objdump} -j .text $1"
+
+D=$(mktemp -d)
+trap "rm -rf $D" EXIT
+
+TEXT_BIN=$D/xen-syms.text
+VALID=$D/valid-addrs
+ALL=$D/all-addrs
+BAD=$D/bad-addrs
+
+#
+# First, look for all the valid endbr64 instructions.
+# A worst-case disassembly, viewed through cat -A, may look like:
+#
+# ffff82d040337bd4 <endbr64>:$
+# ffff82d040337bd4:^If3 0f 1e fa          ^Iendbr64 $
+# ffff82d040337bd8:^Ieb fe                ^Ijmp    ffff82d040337bd8 <endbr64+0x4>$
+# ffff82d040337bda:^Ib8 f3 0f 1e fa       ^Imov    $0xfa1e0ff3,%eax$
+#
+# Want to grab the address of endbr64 instructions only, ignoring function
+# names/jump labels/etc, so look for 'endbr64' preceded by a tab and with any
+# number of trailing spaces before the end of the line.
+#
+${OBJDUMP} -d | grep '	endbr64 *$' | cut -f 1 -d ':' > $VALID &
+
+#
+# Second, look for any endbr64 byte sequence
+# This has a couple of complications:
+#
+# 1) Grep binary search isn't VMA aware.  Copy .text out as binary, causing
+#    the grep offset to be from the start of .text.
+#
+# 2) AWK can't add 64bit integers, because internally all numbers are doubles.
+#    When the upper bits are set, the exponents worth of precision is lost in
+#    the lower bits, rounding integers to the nearest 4k.
+#
+#    Instead, use the fact that Xen's .text is within a 1G aligned region, and
+#    split the VMA in half so AWK's numeric addition is only working on 32 bit
+#    numbers, which don't lose precision.
+#
+eval $(${OBJDUMP} -h | awk '$2 == ".text" {printf "vma_hi=%s\nvma_lo=%s\n", substr($4, 1, 8), substr($4, 9, 16)}')
+
+${OBJCOPY} -O binary $TEXT_BIN
+grep -aob "$(printf '\363\17\36\372')" $TEXT_BIN |
+    awk -F':' '{printf "%s%x\n", "'$vma_hi'", strtonum(0x'$vma_lo') + $1}' > $ALL
+
+# Wait for $VALID to become complete
+wait
+
+# Sanity check $VALID and $ALL, in case the string parsing bitrots
+val_sz=$(stat -c '%s' $VALID)
+all_sz=$(stat -c '%s' $ALL)
+[ "$val_sz" -eq 0 ]         && { echo "Error: Empty valid-addrs" >&2; exit 1; }
+[ "$all_sz" -eq 0 ]         && { echo "Error: Empty all-addrs" >&2; exit 1; }
+[ "$all_sz" -lt "$val_sz" ] && { echo "Error: More valid-addrs than all-addrs" >&2; exit 1; }
+
+# $BAD = $ALL - $VALID
+join -v 2 $VALID $ALL > $BAD
+nr_bad=$(wc -l < $BAD)
+
+# Success
+[ "$nr_bad" -eq 0 ] && exit 0
+
+# Failure
+echo "Fail: Found ${nr_bad} embedded endbr64 instructions" >&2
+addr2line -afip -e $1 < $BAD >&2
+exit 1
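
The address arithmetic described in the script's second comment block can be
illustrated in isolation.  The sketch below is not the patch's code: abs_addr
is a hypothetical helper, and the .text VMA used in the usage line
(ffff82d040200000) is an assumed value chosen for illustration.  It also
deviates from the patch in two ways, both choices of this example: the low
half is converted from hex with the shell's printf instead of strtonum()
(which is a gawk extension), and the low half is zero-padded to 8 digits on
output.

```sh
#!/bin/sh
# Hypothetical helper: abs_addr <16-hex-digit VMA> <decimal byte offset>
# Prints the absolute address as 16 hex digits.
abs_addr()
{
    vma=$1
    off=$2

    # Split the 64-bit VMA into two 32-bit halves, as the patch does.
    hi=$(printf '%s' "$vma" | cut -c 1-8)
    lo=$(printf '%s' "$vma" | cut -c 9-16)

    # Convert the low half to decimal so awk only ever adds small integers,
    # which a double represents exactly.
    lo_dec=$(printf '%d' "0x$lo")

    # The high half is carried through as a string, never as a number.
    awk -v hi="$hi" -v lo="$lo_dec" -v off="$off" \
        'BEGIN { printf "%s%08x\n", hi, lo + off }'
}

# Offset 1201783 (0x125677) into a .text assumed to start at ffff82d040200000
abs_addr ffff82d040200000 1201783
```

With those assumed inputs the result is ffff82d040325677, the first address
in the example failure from the commit message.  As in the patch, this relies
on the sum of the low half and the offset staying within 32 bits.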