[v2,2/2] xen: Fix latent check-endbr.sh bug with 32bit build environments

Message ID 20220715132651.1093-3-andrew.cooper3@citrix.com (mailing list archive)
State New, archived
Series xen: Fixes to check-endbr.sh

Commit Message

Andrew Cooper July 15, 2022, 1:26 p.m. UTC
While Xen's current VMA means it works, the mawk fix (i.e. using $((0xN)) in
the shell) isn't portable in 32bit shells.  See the code comment for the fix.

The fix found a second latent bug.  Recombining $vma_hi/lo should have used
printf "%s%08x" and only worked previously because $vma_lo had bits set in
its top nibble.  Combining with the main fix, %08x becomes %07x.

Fixes: $XXX patch 1
Reported-by: Jan Beulich <JBeulich@suse.com>
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
CC: George Dunlap <George.Dunlap@eu.citrix.com>
CC: Jan Beulich <JBeulich@suse.com>
CC: Stefano Stabellini <sstabellini@kernel.org>
CC: Wei Liu <wl@xen.org>
CC: Julien Grall <julien@xen.org>
CC: Anthony PERARD <anthony.perard@citrix.com>
CC: Luca Fancellu <Luca.Fancellu@arm.com>
CC: Mathieu Tarral <mathieu.tarral@protonmail.com>
CC: Bertrand Marquis <Bertrand.Marquis@arm.com>

v2:
 * New
---
 xen/tools/check-endbr.sh | 25 +++++++++++++++++++++----
 1 file changed, 21 insertions(+), 4 deletions(-)
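
The padding bug described in the commit message can be reproduced with the shell's own printf.  This is only a minimal sketch: the helper names are hypothetical, the VMA digits are illustrative values chosen to resemble a .text that starts 2M into its region, and the real script does the recombination inside an AWK printf rather than in the shell.

```shell
#!/bin/sh
# Hypothetical helpers; check-endbr.sh performs this step in AWK.

# Recombine without padding: only correct when the low part happens to
# occupy every nibble it was split from.
recombine_unpadded() { printf '%s%x\n' "$1" "$2"; }

# Recombine with the low part zero-padded to exactly 7 nibbles.
recombine_padded() { printf '%s%07x\n' "$1" "$2"; }

vma_hi=ffff82d04                  # illustrative upper 9 nibbles
sum=$((0x0200000 + 0x10))         # illustrative low 7 nibbles plus an offset

recombine_unpadded "$vma_hi" "$sum"   # prints ffff82d04200010  (15 digits, wrong)
recombine_padded   "$vma_hi" "$sum"   # prints ffff82d040200010 (16 digits)
```

With the earlier 8/8 split the low half of the same address would be 40200000, whose top nibble is non-zero, which is why the unpadded form went unnoticed.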

Comments

Jan Beulich July 18, 2022, 9:11 a.m. UTC | #1
On 15.07.2022 15:26, Andrew Cooper wrote:
> While Xen's current VMA means it works, the mawk fix (i.e. using $((0xN)) in
> the shell) isn't portable in 32bit shells.  See the code comment for the fix.
> 
> The fix found a second latent bug.  Recombining $vma_hi/lo should have used
> printf "%s%08x" and only worked previously because $vma_lo had bits set in
> its top nibble.  Combining with the main fix, %08x becomes %07x.
> 
> Fixes: $XXX patch 1
> Reported-by: Jan Beulich <JBeulich@suse.com>
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>

Reviewed-by: Jan Beulich <jbeulich@suse.com>
with, I guess, ...

> --- a/xen/tools/check-endbr.sh
> +++ b/xen/tools/check-endbr.sh
> @@ -61,19 +61,36 @@ ${OBJDUMP} -j .text $1 -d -w | grep '	endbr64 *$' | cut -f 1 -d ':' > $VALID &
>  #    the lower bits, rounding integers to the nearest 4k.
>  #
>  #    Instead, use the fact that Xen's .text is within a 1G aligned region, and
> -#    split the VMA in half so AWK's numeric addition is only working on 32 bit
> -#    numbers, which don't lose precision.
> +#    split the VMA so AWK's numeric addition is only working on <32 bit
> +#    numbers, which don't lose precision.  (See point 5)
>  #
>  # 4) MAWK doesn't support plain hex constants (an optional part of the POSIX
>  #    spec), and GAWK and MAWK can't agree on how to work with hex constants in
>  #    a string.  Use the shell to convert $vma_lo to decimal before passing to
>  #    AWK.
>  #
> +# 5) Point 4 isn't fully portable.  POSIX only requires that $((0xN)) be
> +#    evaluated as long, which in 32bit shells turns negative if bit 31 of the
> +#    VMA is set.  AWK then interprets this negative number as a double before
> +#    adding the offsets from the binary grep.
> +#
> +#    Instead of doing an 8/8 split with vma_hi/lo, do a 9/7 split.
> +#
> +#    The consequence of this is that for all offsets, $vma_lo + offset needs
> +#    to be less than 256M (i.e. 7 nibbles) so as to be successfully recombined
> +#    with the 9 nibbles of $vma_hi.  This is fine; .text is at the start of a
> +#    1G aligned region, and Xen is far far smaller than 256M, but leave a
> +#    safety check nevertheless.
> +#
>  eval $(${OBJDUMP} -j .text $1 -h |
> -    $AWK '$2 == ".text" {printf "vma_hi=%s\nvma_lo=%s\n", substr($4, 1, 8), substr($4, 9, 16)}')
> +    $AWK '$2 == ".text" {printf "vma_hi=%s\nvma_lo=%s\n", substr($4, 1, 9), substr($4, 10, 16)}')
>  
>  ${OBJCOPY} -j .text $1 -O binary $TEXT_BIN
>  
> +bin_sz=$(stat -c '%s' $TEXT_BIN)
> +[ "$bin_sz" -ge $(((1 << 28) - $vma_lo)) ] &&
> +    { echo "$MSG_PFX Error: .text offsets can exceed 256M" >&2; exit 1; }

... s/can/cannot/ ?

Jan

Andrew Cooper July 18, 2022, 9:31 a.m. UTC | #2
On 18/07/2022 10:11, Jan Beulich wrote:
> On 15.07.2022 15:26, Andrew Cooper wrote:
>> While Xen's current VMA means it works, the mawk fix (i.e. using $((0xN)) in
>> the shell) isn't portable in 32bit shells.  See the code comment for the fix.
>>
>> The fix found a second latent bug.  Recombining $vma_hi/lo should have used
>> printf "%s%08x" and only worked previously because $vma_lo had bits set in
>> its top nibble.  Combining with the main fix, %08x becomes %07x.
>>
>> Fixes: $XXX patch 1
>> Reported-by: Jan Beulich <JBeulich@suse.com>
>> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
> Reviewed-by: Jan Beulich <jbeulich@suse.com>

Thanks, but...

> with, I guess, ...
>
>> --- a/xen/tools/check-endbr.sh
>> +++ b/xen/tools/check-endbr.sh
>> @@ -61,19 +61,36 @@ ${OBJDUMP} -j .text $1 -d -w | grep '	endbr64 *$' | cut -f 1 -d ':' > $VALID &
>>  #    the lower bits, rounding integers to the nearest 4k.
>>  #
>>  #    Instead, use the fact that Xen's .text is within a 1G aligned region, and
>> -#    split the VMA in half so AWK's numeric addition is only working on 32 bit
>> -#    numbers, which don't lose precision.
>> +#    split the VMA so AWK's numeric addition is only working on <32 bit
>> +#    numbers, which don't lose precision.  (See point 5)
>>  #
>>  # 4) MAWK doesn't support plain hex constants (an optional part of the POSIX
>>  #    spec), and GAWK and MAWK can't agree on how to work with hex constants in
>>  #    a string.  Use the shell to convert $vma_lo to decimal before passing to
>>  #    AWK.
>>  #
>> +# 5) Point 4 isn't fully portable.  POSIX only requires that $((0xN)) be
>> +#    evaluated as long, which in 32bit shells turns negative if bit 31 of the
>> +#    VMA is set.  AWK then interprets this negative number as a double before
>> +#    adding the offsets from the binary grep.
>> +#
>> +#    Instead of doing an 8/8 split with vma_hi/lo, do a 9/7 split.
>> +#
>> +#    The consequence of this is that for all offsets, $vma_lo + offset needs
>> +#    to be less than 256M (i.e. 7 nibbles) so as to be successfully recombined
>> +#    with the 9 nibbles of $vma_hi.  This is fine; .text is at the start of a
>> +#    1G aligned region, and Xen is far far smaller than 256M, but leave a
>> +#    safety check nevertheless.
>> +#
>>  eval $(${OBJDUMP} -j .text $1 -h |
>> -    $AWK '$2 == ".text" {printf "vma_hi=%s\nvma_lo=%s\n", substr($4, 1, 8), substr($4, 9, 16)}')
>> +    $AWK '$2 == ".text" {printf "vma_hi=%s\nvma_lo=%s\n", substr($4, 1, 9), substr($4, 10, 16)}')
>>  
>>  ${OBJCOPY} -j .text $1 -O binary $TEXT_BIN
>>  
>> +bin_sz=$(stat -c '%s' $TEXT_BIN)
>> +[ "$bin_sz" -ge $(((1 << 28) - $vma_lo)) ] &&
>> +    { echo "$MSG_PFX Error: .text offsets can exceed 256M" >&2; exit 1; }
> ... s/can/cannot/ ?

Why?  "Can" is correct here.  If the offsets can't exceed 256M, then
everything is good.

~Andrew

Jan Beulich July 18, 2022, 9:49 a.m. UTC | #3
On 18.07.2022 11:31, Andrew Cooper wrote:
> On 18/07/2022 10:11, Jan Beulich wrote:
>> On 15.07.2022 15:26, Andrew Cooper wrote:
>>> --- a/xen/tools/check-endbr.sh
>>> +++ b/xen/tools/check-endbr.sh
>>> @@ -61,19 +61,36 @@ ${OBJDUMP} -j .text $1 -d -w | grep '	endbr64 *$' | cut -f 1 -d ':' > $VALID &
>>>  #    the lower bits, rounding integers to the nearest 4k.
>>>  #
>>>  #    Instead, use the fact that Xen's .text is within a 1G aligned region, and
>>> -#    split the VMA in half so AWK's numeric addition is only working on 32 bit
>>> -#    numbers, which don't lose precision.
>>> +#    split the VMA so AWK's numeric addition is only working on <32 bit
>>> +#    numbers, which don't lose precision.  (See point 5)
>>>  #
>>>  # 4) MAWK doesn't support plain hex constants (an optional part of the POSIX
>>>  #    spec), and GAWK and MAWK can't agree on how to work with hex constants in
>>>  #    a string.  Use the shell to convert $vma_lo to decimal before passing to
>>>  #    AWK.
>>>  #
>>> +# 5) Point 4 isn't fully portable.  POSIX only requires that $((0xN)) be
>>> +#    evaluated as long, which in 32bit shells turns negative if bit 31 of the
>>> +#    VMA is set.  AWK then interprets this negative number as a double before
>>> +#    adding the offsets from the binary grep.
>>> +#
>>> +#    Instead of doing an 8/8 split with vma_hi/lo, do a 9/7 split.
>>> +#
>>> +#    The consequence of this is that for all offsets, $vma_lo + offset needs
>>> +#    to be less than 256M (i.e. 7 nibbles) so as to be successfully recombined
>>> +#    with the 9 nibbles of $vma_hi.  This is fine; .text is at the start of a
>>> +#    1G aligned region, and Xen is far far smaller than 256M, but leave a
>>> +#    safety check nevertheless.
>>> +#
>>>  eval $(${OBJDUMP} -j .text $1 -h |
>>> -    $AWK '$2 == ".text" {printf "vma_hi=%s\nvma_lo=%s\n", substr($4, 1, 8), substr($4, 9, 16)}')
>>> +    $AWK '$2 == ".text" {printf "vma_hi=%s\nvma_lo=%s\n", substr($4, 1, 9), substr($4, 10, 16)}')
>>>  
>>>  ${OBJCOPY} -j .text $1 -O binary $TEXT_BIN
>>>  
>>> +bin_sz=$(stat -c '%s' $TEXT_BIN)
>>> +[ "$bin_sz" -ge $(((1 << 28) - $vma_lo)) ] &&
>>> +    { echo "$MSG_PFX Error: .text offsets can exceed 256M" >&2; exit 1; }
>> ... s/can/cannot/ ?
> 
> Why?  "Can" is correct here.  If the offsets can't exceed 256M, then
> everything is good.

Hmm, the wording then indeed is ambiguous. I read "can" as "are allowed
to", when we mean "aren't allowed to". Maybe ".text is 256M or more in
size"? If you mention "offsets", then I think the check should be based
on actually observing an offset which is too large (which .text size
alone doesn't guarantee will happen).

Jan

Andrew Cooper July 18, 2022, 12:07 p.m. UTC | #4
On 18/07/2022 10:49, Jan Beulich wrote:
> On 18.07.2022 11:31, Andrew Cooper wrote:
>> On 18/07/2022 10:11, Jan Beulich wrote:
>>> On 15.07.2022 15:26, Andrew Cooper wrote:
>>>> --- a/xen/tools/check-endbr.sh
>>>> +++ b/xen/tools/check-endbr.sh
>>>> @@ -61,19 +61,36 @@ ${OBJDUMP} -j .text $1 -d -w | grep '	endbr64 *$' | cut -f 1 -d ':' > $VALID &
>>>>  #    the lower bits, rounding integers to the nearest 4k.
>>>>  #
>>>>  #    Instead, use the fact that Xen's .text is within a 1G aligned region, and
>>>> -#    split the VMA in half so AWK's numeric addition is only working on 32 bit
>>>> -#    numbers, which don't lose precision.
>>>> +#    split the VMA so AWK's numeric addition is only working on <32 bit
>>>> +#    numbers, which don't lose precision.  (See point 5)
>>>>  #
>>>>  # 4) MAWK doesn't support plain hex constants (an optional part of the POSIX
>>>>  #    spec), and GAWK and MAWK can't agree on how to work with hex constants in
>>>>  #    a string.  Use the shell to convert $vma_lo to decimal before passing to
>>>>  #    AWK.
>>>>  #
>>>> +# 5) Point 4 isn't fully portable.  POSIX only requires that $((0xN)) be
>>>> +#    evaluated as long, which in 32bit shells turns negative if bit 31 of the
>>>> +#    VMA is set.  AWK then interprets this negative number as a double before
>>>> +#    adding the offsets from the binary grep.
>>>> +#
>>>> +#    Instead of doing an 8/8 split with vma_hi/lo, do a 9/7 split.
>>>> +#
>>>> +#    The consequence of this is that for all offsets, $vma_lo + offset needs
>>>> +#    to be less than 256M (i.e. 7 nibbles) so as to be successfully recombined
>>>> +#    with the 9 nibbles of $vma_hi.  This is fine; .text is at the start of a
>>>> +#    1G aligned region, and Xen is far far smaller than 256M, but leave a
>>>> +#    safety check nevertheless.
>>>> +#
>>>>  eval $(${OBJDUMP} -j .text $1 -h |
>>>> -    $AWK '$2 == ".text" {printf "vma_hi=%s\nvma_lo=%s\n", substr($4, 1, 8), substr($4, 9, 16)}')
>>>> +    $AWK '$2 == ".text" {printf "vma_hi=%s\nvma_lo=%s\n", substr($4, 1, 9), substr($4, 10, 16)}')
>>>>  
>>>>  ${OBJCOPY} -j .text $1 -O binary $TEXT_BIN
>>>>  
>>>> +bin_sz=$(stat -c '%s' $TEXT_BIN)
>>>> +[ "$bin_sz" -ge $(((1 << 28) - $vma_lo)) ] &&
>>>> +    { echo "$MSG_PFX Error: .text offsets can exceed 256M" >&2; exit 1; }
>>> ... s/can/cannot/ ?
>> Why?  "Can" is correct here.  If the offsets can't exceed 256M, then
>> everything is good.
> Hmm, the wording then indeed is ambiguous.

I see your point.  In this case it's meant as "are able to", but this is
still clearer than using "can't" because at least the text matches the
check which triggered it.

>  I read "can" as "are allowed
> to", when we mean "aren't allowed to". Maybe ".text is 256M or more in
> size"? If you mention "offsets", then I think the check should be based
> on actually observing an offset which is too large (which .text size
> alone doesn't guarantee will happen).

It's not just .text on its own because the VMA is offset by 2M, hence
the subtraction of $vma_lo in the main calculation.

There's no point searching for offsets.  There will be one near the end,
so all searching for an offset would do is complicate the critical loop.

How about ".text offsets must not exceed 256M" ?

That should be unambiguous.

~Andrew

Jan Beulich July 18, 2022, 2:06 p.m. UTC | #5
On 18.07.2022 14:07, Andrew Cooper wrote:
> On 18/07/2022 10:49, Jan Beulich wrote:
>> On 18.07.2022 11:31, Andrew Cooper wrote:
>>> On 18/07/2022 10:11, Jan Beulich wrote:
>>>> On 15.07.2022 15:26, Andrew Cooper wrote:
>>>>> --- a/xen/tools/check-endbr.sh
>>>>> +++ b/xen/tools/check-endbr.sh
>>>>> @@ -61,19 +61,36 @@ ${OBJDUMP} -j .text $1 -d -w | grep '	endbr64 *$' | cut -f 1 -d ':' > $VALID &
>>>>>  #    the lower bits, rounding integers to the nearest 4k.
>>>>>  #
>>>>>  #    Instead, use the fact that Xen's .text is within a 1G aligned region, and
>>>>> -#    split the VMA in half so AWK's numeric addition is only working on 32 bit
>>>>> -#    numbers, which don't lose precision.
>>>>> +#    split the VMA so AWK's numeric addition is only working on <32 bit
>>>>> +#    numbers, which don't lose precision.  (See point 5)
>>>>>  #
>>>>>  # 4) MAWK doesn't support plain hex constants (an optional part of the POSIX
>>>>>  #    spec), and GAWK and MAWK can't agree on how to work with hex constants in
>>>>>  #    a string.  Use the shell to convert $vma_lo to decimal before passing to
>>>>>  #    AWK.
>>>>>  #
>>>>> +# 5) Point 4 isn't fully portable.  POSIX only requires that $((0xN)) be
>>>>> +#    evaluated as long, which in 32bit shells turns negative if bit 31 of the
>>>>> +#    VMA is set.  AWK then interprets this negative number as a double before
>>>>> +#    adding the offsets from the binary grep.
>>>>> +#
>>>>> +#    Instead of doing an 8/8 split with vma_hi/lo, do a 9/7 split.
>>>>> +#
>>>>> +#    The consequence of this is that for all offsets, $vma_lo + offset needs
>>>>> +#    to be less than 256M (i.e. 7 nibbles) so as to be successfully recombined
>>>>> +#    with the 9 nibbles of $vma_hi.  This is fine; .text is at the start of a
>>>>> +#    1G aligned region, and Xen is far far smaller than 256M, but leave a
>>>>> +#    safety check nevertheless.
>>>>> +#
>>>>>  eval $(${OBJDUMP} -j .text $1 -h |
>>>>> -    $AWK '$2 == ".text" {printf "vma_hi=%s\nvma_lo=%s\n", substr($4, 1, 8), substr($4, 9, 16)}')
>>>>> +    $AWK '$2 == ".text" {printf "vma_hi=%s\nvma_lo=%s\n", substr($4, 1, 9), substr($4, 10, 16)}')
>>>>>  
>>>>>  ${OBJCOPY} -j .text $1 -O binary $TEXT_BIN
>>>>>  
>>>>> +bin_sz=$(stat -c '%s' $TEXT_BIN)
>>>>> +[ "$bin_sz" -ge $(((1 << 28) - $vma_lo)) ] &&
>>>>> +    { echo "$MSG_PFX Error: .text offsets can exceed 256M" >&2; exit 1; }
>>>> ... s/can/cannot/ ?
>>> Why?  "Can" is correct here.  If the offsets can't exceed 256M, then
>>> everything is good.
>> Hmm, the wording then indeed is ambiguous.
> 
> I see your point.  In this case it's meant as "are able to", but this is
> still clearer than using "can't" because at least the text matches the
> check which triggered it.
> 
>>  I read "can" as "are allowed
>> to", when we mean "aren't allowed to". Maybe ".text is 256M or more in
>> size"? If you mention "offsets", then I think the check should be based
>> on actually observing an offset which is too large (which .text size
>> alone doesn't guarantee will happen).
> 
> It's not just .text on its own because the VMA is offset by 2M, hence
> the subtraction of $vma_lo in the main calculation.
> 
> There's no point searching for offsets.  There will be one near the end,
> so all searching for an offset would do is complicate the critical loop.
> 
> How about ".text offsets must not exceed 256M" ?
> 
> That should be unambiguous.

Yes, that reads fine. Thanks.

Jan
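
For readers following the wording discussion, the check under review can be sketched as a standalone function.  This is an assumption-laden illustration, not the committed code: check_text_size is a hypothetical name, the message uses the wording agreed above without the script's $MSG_PFX prefix, and the sample sizes are made up.  The subtraction of $vma_lo is what accounts for .text not starting at the bottom of its 256M window.

```shell
#!/bin/sh
# check_text_size: hypothetical wrapper around the size check in the patch.
#   $1: low 7 nibbles of the .text VMA (hex digits, no 0x prefix)
#   $2: size of the extracted .text binary, in bytes
check_text_size() {
    # Largest size for which $vma_lo + offset still fits in 7 nibbles.
    limit=$(( (1 << 28) - 0x$1 ))
    if [ "$2" -ge "$limit" ]; then
        echo "Error: .text offsets must not exceed 256M" >&2
        return 1
    fi
}

# Illustrative values: .text starting 2M into the window, 4M in size.
check_text_size 0200000 4194304 && echo "size check passed"
```

Both 1 << 28 and any 7 nibble value stay below 2^31, so the arithmetic here is safe even where the shell only provides a 32bit long.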
Patch

diff --git a/xen/tools/check-endbr.sh b/xen/tools/check-endbr.sh
index b3febd6a4ccc..d6aa117de13b 100755
--- a/xen/tools/check-endbr.sh
+++ b/xen/tools/check-endbr.sh
@@ -61,19 +61,36 @@  ${OBJDUMP} -j .text $1 -d -w | grep '	endbr64 *$' | cut -f 1 -d ':' > $VALID &
 #    the lower bits, rounding integers to the nearest 4k.
 #
 #    Instead, use the fact that Xen's .text is within a 1G aligned region, and
-#    split the VMA in half so AWK's numeric addition is only working on 32 bit
-#    numbers, which don't lose precision.
+#    split the VMA so AWK's numeric addition is only working on <32 bit
+#    numbers, which don't lose precision.  (See point 5)
 #
 # 4) MAWK doesn't support plain hex constants (an optional part of the POSIX
 #    spec), and GAWK and MAWK can't agree on how to work with hex constants in
 #    a string.  Use the shell to convert $vma_lo to decimal before passing to
 #    AWK.
 #
+# 5) Point 4 isn't fully portable.  POSIX only requires that $((0xN)) be
+#    evaluated as long, which in 32bit shells turns negative if bit 31 of the
+#    VMA is set.  AWK then interprets this negative number as a double before
+#    adding the offsets from the binary grep.
+#
+#    Instead of doing an 8/8 split with vma_hi/lo, do a 9/7 split.
+#
+#    The consequence of this is that for all offsets, $vma_lo + offset needs
+#    to be less than 256M (i.e. 7 nibbles) so as to be successfully recombined
+#    with the 9 nibbles of $vma_hi.  This is fine; .text is at the start of a
+#    1G aligned region, and Xen is far far smaller than 256M, but leave a
+#    safety check nevertheless.
+#
 eval $(${OBJDUMP} -j .text $1 -h |
-    $AWK '$2 == ".text" {printf "vma_hi=%s\nvma_lo=%s\n", substr($4, 1, 8), substr($4, 9, 16)}')
+    $AWK '$2 == ".text" {printf "vma_hi=%s\nvma_lo=%s\n", substr($4, 1, 9), substr($4, 10, 16)}')
 
 ${OBJCOPY} -j .text $1 -O binary $TEXT_BIN
 
+bin_sz=$(stat -c '%s' $TEXT_BIN)
+[ "$bin_sz" -ge $(((1 << 28) - $vma_lo)) ] &&
+    { echo "$MSG_PFX Error: .text offsets can exceed 256M" >&2; exit 1; }
+
 # instruction:    hex:           oct:
 # endbr64         f3 0f 1e fa    363 017 036 372
 # endbr32         f3 0f 1e fb    363 017 036 373
@@ -84,7 +101,7 @@  then
 else
     grep -aob -e "$(printf '\363\17\36\372')" -e "$(printf '\363\17\36\373')" \
          -e "$(printf '\146\17\37\1')" $TEXT_BIN
-fi | $AWK -F':' '{printf "%s%x\n", "'$vma_hi'", int('$((0x$vma_lo))') + $1}' > $ALL
+fi | $AWK -F':' '{printf "%s%07x\n", "'$vma_hi'", int('$((0x$vma_lo))') + $1}' > $ALL
 
 # Wait for $VALID to become complete
 wait
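
The reasoning in point 5 of the code comment can be illustrated outside the script.  The sketch below is not part of the patch: split_vma and as_s32 are hypothetical helpers, the addresses are illustrative, and as_s32 merely emulates a signed 32bit long on a shell whose arithmetic is assumed to be wider (64bit).

```shell
#!/bin/sh
# split_vma: print the upper 9 and lower 7 nibbles of a 16 digit hex VMA.
split_vma() {
    printf '%s\n' "$1" | awk '{ print substr($0, 1, 9), substr($0, 10, 7) }'
}

# as_s32: emulate how a 32bit signed long would see the value of $1.
# Assumes the running shell itself evaluates $((...)) in 64 bits.
as_s32() {
    v=$(($1))
    if [ "$v" -ge 2147483648 ]; then
        v=$((v - 4294967296))     # bit 31 set: wraps negative
    fi
    echo "$v"
}

split_vma ffff82d040200000        # 9/7 split of an illustrative VMA

# An 8 nibble low half can have bit 31 set and so turn negative ...
as_s32 0xc0200000
# ... whereas 7 nibbles are at most 0xfffffff and always stay positive.
as_s32 0xfffffff
```

Since a 7 nibble value never reaches bit 31, $((0x$vma_lo)) yields the same positive decimal whether the shell's long is 32 or 64 bits wide.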