[v6,03/20] modpost: detect section mismatch for R_ARM_MOVW_ABS_NC and R_ARM_MOVT_ABS

Message ID 20230521160426.1881124-4-masahiroy@kernel.org (mailing list archive)
State New, archived
Series Unify <linux/export.h> and <asm/export.h>, remove EXPORT_DATA_SYMBOL(), faster TRIM_UNUSED_KSYMS

Commit Message

Masahiro Yamada May 21, 2023, 4:04 p.m. UTC
On ARM defconfig, modpost fails to detect some section mismatches.

  [test code]

    #include <linux/init.h>

    int __initdata foo;
    int get_foo(int x) { return foo; }

This is clearly a bad reference, but modpost does not report anything
for ARM defconfig (i.e. multi_v7_defconfig).

The test code above produces the following relocations.

  Relocation section '.rel.text' at offset 0x200 contains 2 entries:
   Offset     Info    Type            Sym.Value  Sym. Name
  00000000  0000062b R_ARM_MOVW_ABS_NC 00000000   .LANCHOR0
  00000004  0000062c R_ARM_MOVT_ABS    00000000   .LANCHOR0

  Relocation section '.rel.ARM.exidx' at offset 0x210 contains 2 entries:
   Offset     Info    Type            Sym.Value  Sym. Name
  00000000  0000022a R_ARM_PREL31      00000000   .text
  00000000  00001000 R_ARM_NONE        00000000   __aeabi_unwind_cpp_pr0

Currently, R_ARM_MOVW_ABS_NC and R_ARM_MOVT_ABS are just skipped.

Add code to handle them. I checked arch/arm/kernel/module.c to learn
how the offset is encoded in the instruction.

The referenced symbol in relocation might be a local anchor.
If is_valid_name() returns false, let's search for a better symbol name.

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
---

 scripts/mod/modpost.c | 12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)

Comments

Nick Desaulniers May 22, 2023, 6:03 p.m. UTC | #1
+ linux-arm-kernel

On Sun, May 21, 2023 at 9:05 AM Masahiro Yamada <masahiroy@kernel.org> wrote:
>
> ARM defconfig misses to detect some section mismatches.
>
>   [test code]
>
>     #include <linux/init.h>
>
>     int __initdata foo;
>     int get_foo(int x) { return foo; }
>
> It is apparently a bad reference, but modpost does not report anything
> for ARM defconfig (i.e. multi_v7_defconfig).
>
> The test code above produces the following relocations.
>
>   Relocation section '.rel.text' at offset 0x200 contains 2 entries:
>    Offset     Info    Type            Sym.Value  Sym. Name
>   00000000  0000062b R_ARM_MOVW_ABS_NC 00000000   .LANCHOR0
>   00000004  0000062c R_ARM_MOVT_ABS    00000000   .LANCHOR0
>
>   Relocation section '.rel.ARM.exidx' at offset 0x210 contains 2 entries:
>    Offset     Info    Type            Sym.Value  Sym. Name
>   00000000  0000022a R_ARM_PREL31      00000000   .text
>   00000000  00001000 R_ARM_NONE        00000000   __aeabi_unwind_cpp_pr0
>
> Currently, R_ARM_MOVW_ABS_NC and R_ARM_MOVT_ABS are just skipped.
>
> Add code to handle them. I checked arch/arm/kernel/module.c to learn
> how the offset is encoded in the instruction.
>
> The referenced symbol in relocation might be a local anchor.
> If is_valid_name() returns false, let's search for a better symbol name.
>
> Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
> ---
>
>  scripts/mod/modpost.c | 12 ++++++++++--
>  1 file changed, 10 insertions(+), 2 deletions(-)
>
> diff --git a/scripts/mod/modpost.c b/scripts/mod/modpost.c
> index 34fbbd85bfde..ed2301e951a9 100644
> --- a/scripts/mod/modpost.c
> +++ b/scripts/mod/modpost.c
> @@ -1108,7 +1108,7 @@ static inline int is_valid_name(struct elf_info *elf, Elf_Sym *sym)
>  /**
>   * Find symbol based on relocation record info.
>   * In some cases the symbol supplied is a valid symbol so
> - * return refsym. If st_name != 0 we assume this is a valid symbol.
> + * return refsym. If is_valid_name() == true, we assume this is a valid symbol.
>   * In other cases the symbol needs to be looked up in the symbol table
>   * based on section and address.
>   *  **/
> @@ -1121,7 +1121,7 @@ static Elf_Sym *find_tosym(struct elf_info *elf, Elf64_Sword addr,
>         Elf64_Sword d;
>         unsigned int relsym_secindex;
>
> -       if (relsym->st_name != 0)
> +       if (is_valid_name(elf, relsym))
>                 return relsym;
>
>         /*
> @@ -1312,11 +1312,19 @@ static int addend_arm_rel(struct elf_info *elf, Elf_Shdr *sechdr, Elf_Rela *r)
>         unsigned int r_typ = ELF_R_TYPE(r->r_info);
>         Elf_Sym *sym = elf->symtab_start + ELF_R_SYM(r->r_info);
>         unsigned int inst = TO_NATIVE(*reloc_location(elf, sechdr, r));
> +       int offset;
>
>         switch (r_typ) {
>         case R_ARM_ABS32:
>                 r->r_addend = inst + sym->st_value;
>                 break;
> +       case R_ARM_MOVW_ABS_NC:
> +       case R_ARM_MOVT_ABS:
> +               offset = ((inst & 0xf0000) >> 4) | (inst & 0xfff);
> +               offset = (offset ^ 0x8000) - 0x8000;

The code in arch/arm/kernel/module.c then right shifts the offset by
16 for R_ARM_MOVT_ABS. Is that necessary?

> +               offset += sym->st_value;
> +               r->r_addend = offset;
> +               break;
>         case R_ARM_PC24:
>         case R_ARM_CALL:
>         case R_ARM_JUMP24:
> --
> 2.39.2
>
Ard Biesheuvel May 22, 2023, 9:50 p.m. UTC | #2
On Mon, 22 May 2023 at 20:03, Nick Desaulniers <ndesaulniers@google.com> wrote:
>
> + linux-arm-kernel
>
> On Sun, May 21, 2023 at 9:05 AM Masahiro Yamada <masahiroy@kernel.org> wrote:
> >
> > ARM defconfig misses to detect some section mismatches.
> >
> >   [test code]
> >
> >     #include <linux/init.h>
> >
> >     int __initdata foo;
> >     int get_foo(int x) { return foo; }
> >
> > It is apparently a bad reference, but modpost does not report anything
> > for ARM defconfig (i.e. multi_v7_defconfig).
> >
> > The test code above produces the following relocations.
> >
> >   Relocation section '.rel.text' at offset 0x200 contains 2 entries:
> >    Offset     Info    Type            Sym.Value  Sym. Name
> >   00000000  0000062b R_ARM_MOVW_ABS_NC 00000000   .LANCHOR0
> >   00000004  0000062c R_ARM_MOVT_ABS    00000000   .LANCHOR0
> >
> >   Relocation section '.rel.ARM.exidx' at offset 0x210 contains 2 entries:
> >    Offset     Info    Type            Sym.Value  Sym. Name
> >   00000000  0000022a R_ARM_PREL31      00000000   .text
> >   00000000  00001000 R_ARM_NONE        00000000   __aeabi_unwind_cpp_pr0
> >
> > Currently, R_ARM_MOVW_ABS_NC and R_ARM_MOVT_ABS are just skipped.
> >
> > Add code to handle them. I checked arch/arm/kernel/module.c to learn
> > how the offset is encoded in the instruction.
> >
> > The referenced symbol in relocation might be a local anchor.
> > If is_valid_name() returns false, let's search for a better symbol name.
> >
> > Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
> > ---
> >
> >  scripts/mod/modpost.c | 12 ++++++++++--
> >  1 file changed, 10 insertions(+), 2 deletions(-)
> >
> > diff --git a/scripts/mod/modpost.c b/scripts/mod/modpost.c
> > index 34fbbd85bfde..ed2301e951a9 100644
> > --- a/scripts/mod/modpost.c
> > +++ b/scripts/mod/modpost.c
> > @@ -1108,7 +1108,7 @@ static inline int is_valid_name(struct elf_info *elf, Elf_Sym *sym)
> >  /**
> >   * Find symbol based on relocation record info.
> >   * In some cases the symbol supplied is a valid symbol so
> > - * return refsym. If st_name != 0 we assume this is a valid symbol.
> > + * return refsym. If is_valid_name() == true, we assume this is a valid symbol.
> >   * In other cases the symbol needs to be looked up in the symbol table
> >   * based on section and address.
> >   *  **/
> > @@ -1121,7 +1121,7 @@ static Elf_Sym *find_tosym(struct elf_info *elf, Elf64_Sword addr,
> >         Elf64_Sword d;
> >         unsigned int relsym_secindex;
> >
> > -       if (relsym->st_name != 0)
> > +       if (is_valid_name(elf, relsym))
> >                 return relsym;
> >
> >         /*
> > @@ -1312,11 +1312,19 @@ static int addend_arm_rel(struct elf_info *elf, Elf_Shdr *sechdr, Elf_Rela *r)
> >         unsigned int r_typ = ELF_R_TYPE(r->r_info);
> >         Elf_Sym *sym = elf->symtab_start + ELF_R_SYM(r->r_info);
> >         unsigned int inst = TO_NATIVE(*reloc_location(elf, sechdr, r));
> > +       int offset;
> >
> >         switch (r_typ) {
> >         case R_ARM_ABS32:
> >                 r->r_addend = inst + sym->st_value;
> >                 break;
> > +       case R_ARM_MOVW_ABS_NC:
> > +       case R_ARM_MOVT_ABS:
> > +               offset = ((inst & 0xf0000) >> 4) | (inst & 0xfff);
> > +               offset = (offset ^ 0x8000) - 0x8000;
>
> The code in arch/arm/kernel/module.c then right shifts the offset by
> 16 for R_ARM_MOVT_ABS. Is that necessary?
>

MOVW/MOVT pairs are limited to an addend of -/+ 32 KiB, and the same
value must be encoded in both instructions.

When constructing the actual immediate value from the symbol value and
the addend, only the top 16 bits are used in MOVT and the bottom 16
bits in MOVW.

However, this code seems to borrow the Elf_Rela::addend field (which
ARM does not use natively) to record the intermediate value, which
would need to be split if it is used to fix up instruction opcodes.
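
As a rough illustration of the decoding under discussion, the C sketch below
extracts the in-place addend from an ARM-mode (A32) MOVW or MOVT instruction
word, using the same masks and sign extension as the patch. The function name
arm_mov_addend() is hypothetical and is not part of modpost or the kernel.

```c
#include <stdint.h>

/*
 * Hypothetical helper, for illustration only.
 *
 * In the A32 encodings of MOVW and MOVT, the 16-bit immediate is split
 * into imm4 (bits 19:16) and imm12 (bits 11:0). With REL relocations
 * this field holds the addend, interpreted as a signed 16-bit value.
 */
int32_t arm_mov_addend(uint32_t inst)
{
	/* Reassemble imm4:imm12 into a 16-bit value. */
	int32_t offset = (int32_t)(((inst & 0xf0000) >> 4) | (inst & 0xfff));

	/* Sign-extend from bit 15. */
	return (offset ^ 0x8000) - 0x8000;
}
```

For example, the word 0xe3010234 (movw r0, #0x1234) decodes to 0x1234, and
0xe30f0ffc decodes to -4. No shift is applied for MOVT, since the field
carries the same addend in both instructions of a pair.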

Btw the Thumb2 encodings of MOVT and MOVW seem to be missing here.


> > +               offset += sym->st_value;
> > +               r->r_addend = offset;
> > +               break;
> >         case R_ARM_PC24:
> >         case R_ARM_CALL:
> >         case R_ARM_JUMP24:
> > --
> > 2.39.2
> >
>
>
> --
> Thanks,
> ~Nick Desaulniers
Masahiro Yamada May 23, 2023, 11:58 a.m. UTC | #3
On Tue, May 23, 2023 at 6:50 AM Ard Biesheuvel <ardb@kernel.org> wrote:
>
> On Mon, 22 May 2023 at 20:03, Nick Desaulniers <ndesaulniers@google.com> wrote:
> >
> > + linux-arm-kernel
> >
> > On Sun, May 21, 2023 at 9:05 AM Masahiro Yamada <masahiroy@kernel.org> wrote:
> > >
> > > ARM defconfig misses to detect some section mismatches.
> > >
> > >   [test code]
> > >
> > >     #include <linux/init.h>
> > >
> > >     int __initdata foo;
> > >     int get_foo(int x) { return foo; }
> > >
> > > It is apparently a bad reference, but modpost does not report anything
> > > for ARM defconfig (i.e. multi_v7_defconfig).
> > >
> > > The test code above produces the following relocations.
> > >
> > >   Relocation section '.rel.text' at offset 0x200 contains 2 entries:
> > >    Offset     Info    Type            Sym.Value  Sym. Name
> > >   00000000  0000062b R_ARM_MOVW_ABS_NC 00000000   .LANCHOR0
> > >   00000004  0000062c R_ARM_MOVT_ABS    00000000   .LANCHOR0
> > >
> > >   Relocation section '.rel.ARM.exidx' at offset 0x210 contains 2 entries:
> > >    Offset     Info    Type            Sym.Value  Sym. Name
> > >   00000000  0000022a R_ARM_PREL31      00000000   .text
> > >   00000000  00001000 R_ARM_NONE        00000000   __aeabi_unwind_cpp_pr0
> > >
> > > Currently, R_ARM_MOVW_ABS_NC and R_ARM_MOVT_ABS are just skipped.
> > >
> > > Add code to handle them. I checked arch/arm/kernel/module.c to learn
> > > how the offset is encoded in the instruction.
> > >
> > > The referenced symbol in relocation might be a local anchor.
> > > If is_valid_name() returns false, let's search for a better symbol name.
> > >
> > > Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
> > > ---
> > >
> > >  scripts/mod/modpost.c | 12 ++++++++++--
> > >  1 file changed, 10 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/scripts/mod/modpost.c b/scripts/mod/modpost.c
> > > index 34fbbd85bfde..ed2301e951a9 100644
> > > --- a/scripts/mod/modpost.c
> > > +++ b/scripts/mod/modpost.c
> > > @@ -1108,7 +1108,7 @@ static inline int is_valid_name(struct elf_info *elf, Elf_Sym *sym)
> > >  /**
> > >   * Find symbol based on relocation record info.
> > >   * In some cases the symbol supplied is a valid symbol so
> > > - * return refsym. If st_name != 0 we assume this is a valid symbol.
> > > + * return refsym. If is_valid_name() == true, we assume this is a valid symbol.
> > >   * In other cases the symbol needs to be looked up in the symbol table
> > >   * based on section and address.
> > >   *  **/
> > > @@ -1121,7 +1121,7 @@ static Elf_Sym *find_tosym(struct elf_info *elf, Elf64_Sword addr,
> > >         Elf64_Sword d;
> > >         unsigned int relsym_secindex;
> > >
> > > -       if (relsym->st_name != 0)
> > > +       if (is_valid_name(elf, relsym))
> > >                 return relsym;
> > >
> > >         /*
> > > @@ -1312,11 +1312,19 @@ static int addend_arm_rel(struct elf_info *elf, Elf_Shdr *sechdr, Elf_Rela *r)
> > >         unsigned int r_typ = ELF_R_TYPE(r->r_info);
> > >         Elf_Sym *sym = elf->symtab_start + ELF_R_SYM(r->r_info);
> > >         unsigned int inst = TO_NATIVE(*reloc_location(elf, sechdr, r));
> > > +       int offset;
> > >
> > >         switch (r_typ) {
> > >         case R_ARM_ABS32:
> > >                 r->r_addend = inst + sym->st_value;
> > >                 break;
> > > +       case R_ARM_MOVW_ABS_NC:
> > > +       case R_ARM_MOVT_ABS:
> > > +               offset = ((inst & 0xf0000) >> 4) | (inst & 0xfff);
> > > +               offset = (offset ^ 0x8000) - 0x8000;
> >
> > The code in arch/arm/kernel/module.c then right shifts the offset by
> > 16 for R_ARM_MOVT_ABS. Is that necessary?
> >
>
> MOVW/MOVT pairs are limited to an addend of -/+ 32 KiB, and the same
> value must be encoded in both instructions.


In my understanding, 'movt' loads the immediate value to
the upper 16-bit of the register.

I am just curious about the code in arch/arm/kernel/module.c.

Please see 'case R_ARM_MOVT_ABS:' part.

  [1] 'offset' is the immediate value encoded in instruction
  [2] Add sym->st_value
  [3] Right-shift 'offset' by 16
  [4] Write it back to the instruction

So, the immediate value encoded in the instruction
is divided by 65536.

I guess we need something like the following?
(left-shift by 16).

  if (ELF32_R_TYPE(rel->r_info) == R_ARM_MOVT_ABS ||
      ELF32_R_TYPE(rel->r_info) == R_ARM_MOVT_PREL)
          offset <<= 16;




>
> When constructing the actual immediate value from the symbol value and
> the addend, only the top 16 bits are used in MOVT and the bottom 16
> bits in MOVW.
>
> However, this code seems to borrow the Elf_Rela::addend field (which
> ARM does not use natively) to record the intermediate value, which
> would need to be split if it is used to fix up instruction opcodes.

At first, modpost supported only RELA for section mismatch checks.

Later, 2c1a51f39d95 ("[PATCH] kbuild: check SHT_REL sections")
added REL support.

But, the common code still used Elf_Rela.


modpost does not need to write back the fixed instruction.
modpost is only interested in the offset address.

Currently, modpost saves the offset address in
r->r_offset even for Rel. I do not like this code.

So, I am trying to reduce the use of Elf_Rela.
For example, this patch.
https://patchwork.kernel.org/project/linux-kbuild/patch/20230521160426.1881124-8-masahiroy@kernel.org/


> Btw the Thumb2 encodings of MOVT and MOVW seem to be missing here.

Right, if CONFIG_THUMB2_KERNEL=y, the section mismatch check does not work.

Several relocation types are just skipped.






>
>
> > > +               offset += sym->st_value;
> > > +               r->r_addend = offset;
> > > +               break;
> > >         case R_ARM_PC24:
> > >         case R_ARM_CALL:
> > >         case R_ARM_JUMP24:
> > > --
> > > 2.39.2
> > >
> >
> >
> > --
> > Thanks,
> > ~Nick Desaulniers
--
Best Regards
Masahiro Yamada
Ard Biesheuvel May 23, 2023, 12:20 p.m. UTC | #4
On Tue, 23 May 2023 at 13:59, Masahiro Yamada <masahiroy@kernel.org> wrote:
>
> On Tue, May 23, 2023 at 6:50 AM Ard Biesheuvel <ardb@kernel.org> wrote:
> >
> > On Mon, 22 May 2023 at 20:03, Nick Desaulniers <ndesaulniers@google.com> wrote:
> > >
> > > + linux-arm-kernel
> > >
> > > On Sun, May 21, 2023 at 9:05 AM Masahiro Yamada <masahiroy@kernel.org> wrote:
> > > >
> > > > ARM defconfig misses to detect some section mismatches.
> > > >
> > > >   [test code]
> > > >
> > > >     #include <linux/init.h>
> > > >
> > > >     int __initdata foo;
> > > >     int get_foo(int x) { return foo; }
> > > >
> > > > It is apparently a bad reference, but modpost does not report anything
> > > > for ARM defconfig (i.e. multi_v7_defconfig).
> > > >
> > > > The test code above produces the following relocations.
> > > >
> > > >   Relocation section '.rel.text' at offset 0x200 contains 2 entries:
> > > >    Offset     Info    Type            Sym.Value  Sym. Name
> > > >   00000000  0000062b R_ARM_MOVW_ABS_NC 00000000   .LANCHOR0
> > > >   00000004  0000062c R_ARM_MOVT_ABS    00000000   .LANCHOR0
> > > >
> > > >   Relocation section '.rel.ARM.exidx' at offset 0x210 contains 2 entries:
> > > >    Offset     Info    Type            Sym.Value  Sym. Name
> > > >   00000000  0000022a R_ARM_PREL31      00000000   .text
> > > >   00000000  00001000 R_ARM_NONE        00000000   __aeabi_unwind_cpp_pr0
> > > >
> > > > Currently, R_ARM_MOVW_ABS_NC and R_ARM_MOVT_ABS are just skipped.
> > > >
> > > > Add code to handle them. I checked arch/arm/kernel/module.c to learn
> > > > how the offset is encoded in the instruction.
> > > >
> > > > The referenced symbol in relocation might be a local anchor.
> > > > If is_valid_name() returns false, let's search for a better symbol name.
> > > >
> > > > Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
> > > > ---
> > > >
> > > >  scripts/mod/modpost.c | 12 ++++++++++--
> > > >  1 file changed, 10 insertions(+), 2 deletions(-)
> > > >
> > > > diff --git a/scripts/mod/modpost.c b/scripts/mod/modpost.c
> > > > index 34fbbd85bfde..ed2301e951a9 100644
> > > > --- a/scripts/mod/modpost.c
> > > > +++ b/scripts/mod/modpost.c
> > > > @@ -1108,7 +1108,7 @@ static inline int is_valid_name(struct elf_info *elf, Elf_Sym *sym)
> > > >  /**
> > > >   * Find symbol based on relocation record info.
> > > >   * In some cases the symbol supplied is a valid symbol so
> > > > - * return refsym. If st_name != 0 we assume this is a valid symbol.
> > > > + * return refsym. If is_valid_name() == true, we assume this is a valid symbol.
> > > >   * In other cases the symbol needs to be looked up in the symbol table
> > > >   * based on section and address.
> > > >   *  **/
> > > > @@ -1121,7 +1121,7 @@ static Elf_Sym *find_tosym(struct elf_info *elf, Elf64_Sword addr,
> > > >         Elf64_Sword d;
> > > >         unsigned int relsym_secindex;
> > > >
> > > > -       if (relsym->st_name != 0)
> > > > +       if (is_valid_name(elf, relsym))
> > > >                 return relsym;
> > > >
> > > >         /*
> > > > @@ -1312,11 +1312,19 @@ static int addend_arm_rel(struct elf_info *elf, Elf_Shdr *sechdr, Elf_Rela *r)
> > > >         unsigned int r_typ = ELF_R_TYPE(r->r_info);
> > > >         Elf_Sym *sym = elf->symtab_start + ELF_R_SYM(r->r_info);
> > > >         unsigned int inst = TO_NATIVE(*reloc_location(elf, sechdr, r));
> > > > +       int offset;
> > > >
> > > >         switch (r_typ) {
> > > >         case R_ARM_ABS32:
> > > >                 r->r_addend = inst + sym->st_value;
> > > >                 break;
> > > > +       case R_ARM_MOVW_ABS_NC:
> > > > +       case R_ARM_MOVT_ABS:
> > > > +               offset = ((inst & 0xf0000) >> 4) | (inst & 0xfff);
> > > > +               offset = (offset ^ 0x8000) - 0x8000;
> > >
> > > The code in arch/arm/kernel/module.c then right shifts the offset by
> > > 16 for R_ARM_MOVT_ABS. Is that necessary?
> > >
> >
> > MOVW/MOVT pairs are limited to an addend of -/+ 32 KiB, and the same
> > value must be encoded in both instructions.
>
>
> In my understanding, 'movt' loads the immediate value to
> the upper 16-bit of the register.
>

Correct. It sets the upper 16 bits of a register without corrupting
the lower 16 bits.

> I am just curious about the code in arch/arm/kernel/module.c.
>
> Please see 'case R_ARM_MOVT_ABS:' part.
>
>   [1] 'offset' is the immediate value encoded in instruction
>   [2] Add sym->st_value
>   [3] Right-shift 'offset' by 16
>   [4] Write it back to the instruction
>
> So, the immediate value encoded in the instruction
> is divided by 65536.
>
> I guess we need something like the following?
> (left-shift by 16).
>
>   if (ELF32_R_TYPE(rel->r_info) == R_ARM_MOVT_ABS ||
>       ELF32_R_TYPE(rel->r_info) == R_ARM_MOVT_PREL)
>           offset <<= 16;
>

No. The addend is not encoded in the same way as the effective immediate value.

The addend is limited to -/+ 32 KiB (range of s16), and the MOVT
instruction must use the same addend value as the MOVW instruction it
is paired with, without shifting.

This is necessary because otherwise, there is no way to handle an
addend/symbol combination that results in a carry between the lower
and upper 16 bit words. This is a consequence of the use of REL format
rather than RELA, where the addend is part of the relocation and not
encoded in the instructions.
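
The carry problem can be made concrete with a small C sketch. The helper
names below are hypothetical and exist only for this illustration; they model
how the final MOVW and MOVT immediates follow from the symbol value and the
shared addend, and what would go wrong if MOVT only saw the upper half of the
addend.

```c
#include <stdint.h>

/* Hypothetical helpers, for illustration only. */

/* Final MOVW immediate: low 16 bits of (symbol + addend). */
uint16_t movw_imm(uint32_t sym, int32_t addend)
{
	return (uint16_t)((sym + (uint32_t)addend) & 0xffff);
}

/* Final MOVT immediate: high 16 bits of (symbol + addend). */
uint16_t movt_imm(uint32_t sym, int32_t addend)
{
	return (uint16_t)((sym + (uint32_t)addend) >> 16);
}

/*
 * What MOVT would get if its field held only the upper 16 bits of the
 * addend: the carry out of the low half is lost.
 */
uint16_t movt_imm_no_carry(uint32_t sym, int32_t addend)
{
	return (uint16_t)((sym >> 16) + ((uint32_t)addend >> 16));
}
```

With sym = 0x1000fff0 and addend = 0x20, the sum is 0x10010010, so MOVW must
load 0x0010 and MOVT must load 0x1001. The variant without the full addend
yields 0x1000 for MOVT, which is off by the carry.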

>
>
>
> >
> > When constructing the actual immediate value from the symbol value and
> > the addend, only the top 16 bits are used in MOVT and the bottom 16
> > bits in MOVW.
> >
> > However, this code seems to borrow the Elf_Rela::addend field (which
> > ARM does not use natively) to record the intermediate value, which
> > would need to be split if it is used to fix up instruction opcodes.
>
> At first, modpost supported only RELA for section mismatch checks.
>
> Later, 2c1a51f39d95 ("[PATCH] kbuild: check SHT_REL sections")
> added REL support.
>
> But, the common code still used Elf_Rela.
>
>
> modpost does not need to write back the fixed instruction.
> modpost is only interested in the offset address.
>
> Currently, modpost saves the offset address in
> r->r_offset even for Rel. I do not like this code.
>
> So, I am trying to reduce the use of Elf_Rela.
> For example, this patch.
> https://patchwork.kernel.org/project/linux-kbuild/patch/20230521160426.1881124-8-masahiroy@kernel.org/
>

Yeah, that looks better to me.

>
> > Btw the Thumb2 encodings of MOVT and MOVW seem to be missing here.
>
> > Right, if CONFIG_THUMB2_KERNEL=y, the section mismatch check does not work.
>
> Several relocation types are just skipped.
>

Skipped entirely? Or only for the diagnostic print that outputs the symbol name?
Masahiro Yamada May 24, 2023, 12:02 a.m. UTC | #5
On Tue, May 23, 2023 at 9:21 PM Ard Biesheuvel <ardb@kernel.org> wrote:
>
> On Tue, 23 May 2023 at 13:59, Masahiro Yamada <masahiroy@kernel.org> wrote:
> >
> > On Tue, May 23, 2023 at 6:50 AM Ard Biesheuvel <ardb@kernel.org> wrote:
> > >
> > > On Mon, 22 May 2023 at 20:03, Nick Desaulniers <ndesaulniers@google.com> wrote:
> > > >
> > > > + linux-arm-kernel
> > > >
> > > > On Sun, May 21, 2023 at 9:05 AM Masahiro Yamada <masahiroy@kernel.org> wrote:
> > > > >
> > > > > ARM defconfig misses to detect some section mismatches.
> > > > >
> > > > >   [test code]
> > > > >
> > > > >     #include <linux/init.h>
> > > > >
> > > > >     int __initdata foo;
> > > > >     int get_foo(int x) { return foo; }
> > > > >
> > > > > It is apparently a bad reference, but modpost does not report anything
> > > > > for ARM defconfig (i.e. multi_v7_defconfig).
> > > > >
> > > > > The test code above produces the following relocations.
> > > > >
> > > > >   Relocation section '.rel.text' at offset 0x200 contains 2 entries:
> > > > >    Offset     Info    Type            Sym.Value  Sym. Name
> > > > >   00000000  0000062b R_ARM_MOVW_ABS_NC 00000000   .LANCHOR0
> > > > >   00000004  0000062c R_ARM_MOVT_ABS    00000000   .LANCHOR0
> > > > >
> > > > >   Relocation section '.rel.ARM.exidx' at offset 0x210 contains 2 entries:
> > > > >    Offset     Info    Type            Sym.Value  Sym. Name
> > > > >   00000000  0000022a R_ARM_PREL31      00000000   .text
> > > > >   00000000  00001000 R_ARM_NONE        00000000   __aeabi_unwind_cpp_pr0
> > > > >
> > > > > Currently, R_ARM_MOVW_ABS_NC and R_ARM_MOVT_ABS are just skipped.
> > > > >
> > > > > Add code to handle them. I checked arch/arm/kernel/module.c to learn
> > > > > how the offset is encoded in the instruction.
> > > > >
> > > > > The referenced symbol in relocation might be a local anchor.
> > > > > If is_valid_name() returns false, let's search for a better symbol name.
> > > > >
> > > > > Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
> > > > > ---
> > > > >
> > > > >  scripts/mod/modpost.c | 12 ++++++++++--
> > > > >  1 file changed, 10 insertions(+), 2 deletions(-)
> > > > >
> > > > > diff --git a/scripts/mod/modpost.c b/scripts/mod/modpost.c
> > > > > index 34fbbd85bfde..ed2301e951a9 100644
> > > > > --- a/scripts/mod/modpost.c
> > > > > +++ b/scripts/mod/modpost.c
> > > > > @@ -1108,7 +1108,7 @@ static inline int is_valid_name(struct elf_info *elf, Elf_Sym *sym)
> > > > >  /**
> > > > >   * Find symbol based on relocation record info.
> > > > >   * In some cases the symbol supplied is a valid symbol so
> > > > > - * return refsym. If st_name != 0 we assume this is a valid symbol.
> > > > > + * return refsym. If is_valid_name() == true, we assume this is a valid symbol.
> > > > >   * In other cases the symbol needs to be looked up in the symbol table
> > > > >   * based on section and address.
> > > > >   *  **/
> > > > > @@ -1121,7 +1121,7 @@ static Elf_Sym *find_tosym(struct elf_info *elf, Elf64_Sword addr,
> > > > >         Elf64_Sword d;
> > > > >         unsigned int relsym_secindex;
> > > > >
> > > > > -       if (relsym->st_name != 0)
> > > > > +       if (is_valid_name(elf, relsym))
> > > > >                 return relsym;
> > > > >
> > > > >         /*
> > > > > @@ -1312,11 +1312,19 @@ static int addend_arm_rel(struct elf_info *elf, Elf_Shdr *sechdr, Elf_Rela *r)
> > > > >         unsigned int r_typ = ELF_R_TYPE(r->r_info);
> > > > >         Elf_Sym *sym = elf->symtab_start + ELF_R_SYM(r->r_info);
> > > > >         unsigned int inst = TO_NATIVE(*reloc_location(elf, sechdr, r));
> > > > > +       int offset;
> > > > >
> > > > >         switch (r_typ) {
> > > > >         case R_ARM_ABS32:
> > > > >                 r->r_addend = inst + sym->st_value;
> > > > >                 break;
> > > > > +       case R_ARM_MOVW_ABS_NC:
> > > > > +       case R_ARM_MOVT_ABS:
> > > > > +               offset = ((inst & 0xf0000) >> 4) | (inst & 0xfff);
> > > > > +               offset = (offset ^ 0x8000) - 0x8000;
> > > >
> > > > The code in arch/arm/kernel/module.c then right shifts the offset by
> > > > 16 for R_ARM_MOVT_ABS. Is that necessary?
> > > >
> > >
> > > MOVW/MOVT pairs are limited to an addend of -/+ 32 KiB, and the same
> > > value must be encoded in both instructions.
> >
> >
> > In my understanding, 'movt' loads the immediate value to
> > the upper 16-bit of the register.
> >
>
> Correct. It sets the upper 16 bits of a register without corrupting
> the lower 16 bits.
>
> > I am just curious about the code in arch/arm/kernel/module.c.
> >
> > Please see 'case R_ARM_MOVT_ABS:' part.
> >
> >   [1] 'offset' is the immediate value encoded in instruction
> >   [2] Add sym->st_value
> >   [3] Right-shift 'offset' by 16
> >   [4] Write it back to the instruction
> >
> > So, the immediate value encoded in the instruction
> > is divided by 65536.
> >
> > I guess we need something like the following?
> > (left-shift by 16).
> >
> >   if (ELF32_R_TYPE(rel->r_info) == R_ARM_MOVT_ABS ||
> >       ELF32_R_TYPE(rel->r_info) == R_ARM_MOVT_PREL)
> >           offset <<= 16;
> >
>
> No. The addend is not encoded in the same way as the effective immediate value.
>
> The addend is limited to -/+ 32 KiB (range of s16), and the MOVT
> instruction must use the same addend value as the MOVW instruction it
> is paired with, without shifting.
>
> This is necessary because otherwise, there is no way to handle an
> addend/symbol combination that results in a carry between the lower
> and upper 16 bit words. This is a consequence of the use of REL format
> rather than RELA, where the addend is part of the relocation and not
> encoded in the instructions.


Ah, OK.
Now I understand.




> >
> >
> >
> > >
> > > When constructing the actual immediate value from the symbol value and
> > > the addend, only the top 16 bits are used in MOVT and the bottom 16
> > > bits in MOVW.
> > >
> > > However, this code seems to borrow the Elf_Rela::addend field (which
> > > ARM does not use natively) to record the intermediate value, which
> > > would need to be split if it is used to fix up instruction opcodes.
> >
> > At first, modpost supported only RELA for section mismatch checks.
> >
> > Later, 2c1a51f39d95 ("[PATCH] kbuild: check SHT_REL sections")
> > added REL support.
> >
> > But, the common code still used Elf_Rela.
> >
> >
> > modpost does not need to write back the fixed instruction.
> > modpost is only interested in the offset address.
> >
> > Currently, modpost saves the offset address in
> > r->r_offset even for Rel. I do not like this code.
> >
> > So, I am trying to reduce the use of Elf_Rela.
> > For example, this patch.
> > https://patchwork.kernel.org/project/linux-kbuild/patch/20230521160426.1881124-8-masahiroy@kernel.org/
> >
>
> Yeah, that looks better to me.
>
> >
> > > Btw the Thumb2 encodings of MOVT and MOVW seem to be missing here.
> >
> > > Right, if CONFIG_THUMB2_KERNEL=y, the section mismatch check does not work.
> >
> > Several relocation types are just skipped.
> >
>
> Skipped entirely? Or only for the diagnostic print that outputs the symbol name?


Skipped entirely.

modpost cannot detect section mismatches
if you enable CONFIG_THUMB2_KERNEL.
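
For reference, a Thumb-2 counterpart of the addend decoding might look like
the sketch below. It is not part of this patch. It assumes the 32-bit T3
encoding of MOVW/MOVT (as used with R_ARM_THM_MOVW_ABS_NC and
R_ARM_THM_MOVT_ABS), read as two 16-bit halfwords, and the function name
thumb2_mov_addend() is hypothetical.

```c
#include <stdint.h>

/*
 * Hypothetical helper, for illustration only.
 *
 * In the Thumb-2 T3 encoding the 16-bit immediate is imm4:i:imm3:imm8:
 *   upper halfword: imm4 in bits 3:0, i in bit 10
 *   lower halfword: imm3 in bits 14:12, imm8 in bits 7:0
 */
int32_t thumb2_mov_addend(uint16_t upper, uint16_t lower)
{
	int32_t offset = ((upper & 0x000f) << 12) |	/* imm4 -> bits 15:12 */
			 ((upper & 0x0400) << 1) |	/* i    -> bit 11     */
			 ((lower & 0x7000) >> 4) |	/* imm3 -> bits 10:8  */
			 (lower & 0x00ff);		/* imm8 -> bits 7:0   */

	/* Sign-extend from bit 15, as in the ARM-mode case. */
	return (offset ^ 0x8000) - 0x8000;
}
```

For example, the halfword pair 0xf241, 0x2034 (movw r0, #0x1234) decodes to
0x1234.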



--
Best Regards
Masahiro Yamada
Masahiro Yamada May 24, 2023, 12:04 a.m. UTC | #6
On Tue, May 23, 2023 at 3:03 AM Nick Desaulniers
<ndesaulniers@google.com> wrote:
>
> + linux-arm-kernel
>
> On Sun, May 21, 2023 at 9:05 AM Masahiro Yamada <masahiroy@kernel.org> wrote:
> >
> > ARM defconfig misses to detect some section mismatches.
> >
> >   [test code]
> >
> >     #include <linux/init.h>
> >
> >     int __initdata foo;
> >     int get_foo(int x) { return foo; }
> >
> > It is apparently a bad reference, but modpost does not report anything
> > for ARM defconfig (i.e. multi_v7_defconfig).
> >
> > The test code above produces the following relocations.
> >
> >   Relocation section '.rel.text' at offset 0x200 contains 2 entries:
> >    Offset     Info    Type            Sym.Value  Sym. Name
> >   00000000  0000062b R_ARM_MOVW_ABS_NC 00000000   .LANCHOR0
> >   00000004  0000062c R_ARM_MOVT_ABS    00000000   .LANCHOR0
> >
> >   Relocation section '.rel.ARM.exidx' at offset 0x210 contains 2 entries:
> >    Offset     Info    Type            Sym.Value  Sym. Name
> >   00000000  0000022a R_ARM_PREL31      00000000   .text
> >   00000000  00001000 R_ARM_NONE        00000000   __aeabi_unwind_cpp_pr0
> >
> > Currently, R_ARM_MOVW_ABS_NC and R_ARM_MOVT_ABS are just skipped.
> >
> > Add code to handle them. I checked arch/arm/kernel/module.c to learn
> > how the offset is encoded in the instruction.
> >
> > The referenced symbol in relocation might be a local anchor.
> > If is_valid_name() returns false, let's search for a better symbol name.
> >
> > Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
> > ---
> >
> >  scripts/mod/modpost.c | 12 ++++++++++--
> >  1 file changed, 10 insertions(+), 2 deletions(-)
> >
> > diff --git a/scripts/mod/modpost.c b/scripts/mod/modpost.c
> > index 34fbbd85bfde..ed2301e951a9 100644
> > --- a/scripts/mod/modpost.c
> > +++ b/scripts/mod/modpost.c
> > @@ -1108,7 +1108,7 @@ static inline int is_valid_name(struct elf_info *elf, Elf_Sym *sym)
> >  /**
> >   * Find symbol based on relocation record info.
> >   * In some cases the symbol supplied is a valid symbol so
> > - * return refsym. If st_name != 0 we assume this is a valid symbol.
> > + * return refsym. If is_valid_name() == true, we assume this is a valid symbol.
> >   * In other cases the symbol needs to be looked up in the symbol table
> >   * based on section and address.
> >   *  **/
> > @@ -1121,7 +1121,7 @@ static Elf_Sym *find_tosym(struct elf_info *elf, Elf64_Sword addr,
> >         Elf64_Sword d;
> >         unsigned int relsym_secindex;
> >
> > -       if (relsym->st_name != 0)
> > +       if (is_valid_name(elf, relsym))
> >                 return relsym;
> >
> >         /*
> > @@ -1312,11 +1312,19 @@ static int addend_arm_rel(struct elf_info *elf, Elf_Shdr *sechdr, Elf_Rela *r)
> >         unsigned int r_typ = ELF_R_TYPE(r->r_info);
> >         Elf_Sym *sym = elf->symtab_start + ELF_R_SYM(r->r_info);
> >         unsigned int inst = TO_NATIVE(*reloc_location(elf, sechdr, r));
> > +       int offset;
> >
> >         switch (r_typ) {
> >         case R_ARM_ABS32:
> >                 r->r_addend = inst + sym->st_value;
> >                 break;
> > +       case R_ARM_MOVW_ABS_NC:
> > +       case R_ARM_MOVT_ABS:
> > +               offset = ((inst & 0xf0000) >> 4) | (inst & 0xfff);
> > +               offset = (offset ^ 0x8000) - 0x8000;
>
> The code in arch/arm/kernel/module.c then right shifts the offset by
> 16 for R_ARM_MOVT_ABS. Is that necessary?


I replied to Ard's email, but just in case.


modpost does not need to write back the fixed instruction.
modpost is only interested in the offset address.

So, the right-shift by 16 is unneeded.

Patch

diff --git a/scripts/mod/modpost.c b/scripts/mod/modpost.c
index 34fbbd85bfde..ed2301e951a9 100644
--- a/scripts/mod/modpost.c
+++ b/scripts/mod/modpost.c
@@ -1108,7 +1108,7 @@  static inline int is_valid_name(struct elf_info *elf, Elf_Sym *sym)
 /**
  * Find symbol based on relocation record info.
  * In some cases the symbol supplied is a valid symbol so
- * return refsym. If st_name != 0 we assume this is a valid symbol.
+ * return refsym. If is_valid_name() == true, we assume this is a valid symbol.
  * In other cases the symbol needs to be looked up in the symbol table
  * based on section and address.
  *  **/
@@ -1121,7 +1121,7 @@  static Elf_Sym *find_tosym(struct elf_info *elf, Elf64_Sword addr,
 	Elf64_Sword d;
 	unsigned int relsym_secindex;
 
-	if (relsym->st_name != 0)
+	if (is_valid_name(elf, relsym))
 		return relsym;
 
 	/*
@@ -1312,11 +1312,19 @@  static int addend_arm_rel(struct elf_info *elf, Elf_Shdr *sechdr, Elf_Rela *r)
 	unsigned int r_typ = ELF_R_TYPE(r->r_info);
 	Elf_Sym *sym = elf->symtab_start + ELF_R_SYM(r->r_info);
 	unsigned int inst = TO_NATIVE(*reloc_location(elf, sechdr, r));
+	int offset;
 
 	switch (r_typ) {
 	case R_ARM_ABS32:
 		r->r_addend = inst + sym->st_value;
 		break;
+	case R_ARM_MOVW_ABS_NC:
+	case R_ARM_MOVT_ABS:
+		offset = ((inst & 0xf0000) >> 4) | (inst & 0xfff);
+		offset = (offset ^ 0x8000) - 0x8000;
+		offset += sym->st_value;
+		r->r_addend = offset;
+		break;
 	case R_ARM_PC24:
 	case R_ARM_CALL:
 	case R_ARM_JUMP24: