diff mbox series

compiler: allow all arches to enable CONFIG_OPTIMIZE_INLINING

Message ID 1553062828-27798-1-git-send-email-yamada.masahiro@socionext.com (mailing list archive)
State New, archived
Headers show
Series compiler: allow all arches to enable CONFIG_OPTIMIZE_INLINING | expand

Commit Message

Masahiro Yamada March 20, 2019, 6:20 a.m. UTC
Commit 60a3cdd06394 ("x86: add optimized inlining") introduced
CONFIG_OPTIMIZE_INLINING, but it has been available only for x86.

The idea is obviously arch-agnostic although we need some code fixups.
This commit moves the config entry from arch/x86/Kconfig.debug to
lib/Kconfig.debug so that all architectures (except MIPS for now) can
benefit from it.

At this moment, I added "depends on !MIPS" because fixing 0day bot reports
for MIPS was complex to me.

I tested this patch on my arm/arm64 boards.

This can make a huge difference in kernel image size especially when
CONFIG_OPTIMIZE_FOR_SIZE is enabled.

For example, I got 3.5% smaller arm64 kernel image for v5.1-rc1.

  dec       file
  18983424  arch/arm64/boot/Image.before
  18321920  arch/arm64/boot/Image.after

This also slightly improves the "Kernel hacking" Kconfig menu.
Commit e61aca5158a8 ("Merge branch 'kconfig-diet' from Dave Hansen')
mentioned this config option would be a good fit in the "compiler option"
menu. I did so.

I fixed up some files to avoid build warnings/errors.

[1] arch/arm64/include/asm/cpufeature.h

In file included from ././include/linux/compiler_types.h:68,
                 from <command-line>:
./arch/arm64/include/asm/jump_label.h: In function 'cpus_have_const_cap':
./include/linux/compiler-gcc.h:120:38: warning: asm operand 0 probably doesn't match constraints
 #define asm_volatile_goto(x...) do { asm goto(x); asm (""); } while (0)
                                      ^~~
./arch/arm64/include/asm/jump_label.h:32:2: note: in expansion of macro 'asm_volatile_goto'
  asm_volatile_goto(
  ^~~~~~~~~~~~~~~~~
./include/linux/compiler-gcc.h:120:38: error: impossible constraint in 'asm'
 #define asm_volatile_goto(x...) do { asm goto(x); asm (""); } while (0)
                                      ^~~
./arch/arm64/include/asm/jump_label.h:32:2: note: in expansion of macro 'asm_volatile_goto'
  asm_volatile_goto(
  ^~~~~~~~~~~~~~~~~

[2] arch/mips/kernel/cpu-bugs64.c

arch/mips/kernel/cpu-bugs64.c: In function 'mult_sh_align_mod.constprop':
arch/mips/kernel/cpu-bugs64.c:33:2: error: asm operand 1 probably doesn't match constraints [-Werror]
  asm volatile(
  ^~~
arch/mips/kernel/cpu-bugs64.c:33:2: error: asm operand 1 probably doesn't match constraints [-Werror]
  asm volatile(
  ^~~
arch/mips/kernel/cpu-bugs64.c:33:2: error: impossible constraint in 'asm'
  asm volatile(
  ^~~
arch/mips/kernel/cpu-bugs64.c:33:2: error: impossible constraint in 'asm'
  asm volatile(
  ^~~

[3] arch/powerpc/mm/tlb-radix.c

arch/powerpc/mm/tlb-radix.c: In function '__radix__flush_tlb_range_psize':
arch/powerpc/mm/tlb-radix.c:104:2: error: asm operand 3 probably doesn't match constraints [-Werror]
  asm volatile(PPC_TLBIEL(%0, %4, %3, %2, %1)
  ^~~
arch/powerpc/mm/tlb-radix.c:104:2: error: impossible constraint in 'asm'
  CC      arch/powerpc/perf/hv-gpci.o

[4] arch/s390/include/asm/cpacf.h

In file included from arch/s390/crypto/des_s390.c:19:
./arch/s390/include/asm/cpacf.h: In function 'cpacf_query':
./arch/s390/include/asm/cpacf.h:170:2: warning: asm operand 3 probably doesn't match constraints
  asm volatile(
  ^~~
./arch/s390/include/asm/cpacf.h:170:2: error: impossible constraint in 'asm'

[5] arch/powerpc/kernel/prom_init.c

WARNING: vmlinux.o(.text.unlikely+0x20): Section mismatch in reference from the function .prom_getprop() to the function .init.text:.call_prom()
The function .prom_getprop() references
the function __init .call_prom().
This is often because .prom_getprop lacks a __init
annotation or the annotation of .call_prom is wrong.

WARNING: vmlinux.o(.text.unlikely+0x3c): Section mismatch in reference from the function .prom_getproplen() to the function .init.text:.call_prom()
The function .prom_getproplen() references
the function __init .call_prom().
This is often because .prom_getproplen lacks a __init
annotation or the annotation of .call_prom is wrong.

[6] drivers/mtd/nand/raw/vf610_nfc.c

drivers/mtd/nand/raw/vf610_nfc.c: In function ‘vf610_nfc_cmd’:
drivers/mtd/nand/raw/vf610_nfc.c:455:3: warning: ‘offset’ may be used uninitialized in this function [-Wmaybe-uninitialized]
   vf610_nfc_rd_from_sram(instr->ctx.data.buf.in + offset,
   ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
            nfc->regs + NFC_MAIN_AREA(0) + offset,
            ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
            trfr_sz, !nfc->data_access);
            ~~~~~~~~~~~~~~~~~~~~~~~~~~~

[7] arch/arm/kernel/smp.c

arch/arm/kernel/smp.c: In function ‘raise_nmi’:
arch/arm/kernel/smp.c:522:2: warning: array subscript is above array bounds [-Warray-bounds]
  trace_ipi_raise_rcuidle(target, ipi_types[ipinr]);
  ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The fixup is not included in this. The patch is available in ML:

http://lists.infradead.org/pipermail/linux-arm-kernel/2016-February/409393.html

Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
---

 arch/arm64/include/asm/cpufeature.h |  4 ++--
 arch/mips/kernel/cpu-bugs64.c       |  4 ++--
 arch/powerpc/kernel/prom_init.c     |  6 +++---
 arch/powerpc/mm/tlb-radix.c         |  2 +-
 arch/s390/include/asm/cpacf.h       |  2 +-
 arch/x86/Kconfig                    |  3 ---
 arch/x86/Kconfig.debug              | 14 --------------
 drivers/mtd/nand/raw/vf610_nfc.c    |  2 +-
 include/linux/compiler_types.h      |  3 +--
 lib/Kconfig.debug                   | 15 +++++++++++++++
 10 files changed, 26 insertions(+), 29 deletions(-)

Comments

Masahiro Yamada March 20, 2019, 6:40 a.m. UTC | #1
On Wed, Mar 20, 2019 at 3:21 PM Masahiro Yamada
<yamada.masahiro@socionext.com> wrote:
>
> Commit 60a3cdd06394 ("x86: add optimized inlining") introduced
> CONFIG_OPTIMIZE_INLINING, but it has been available only for x86.
>
> The idea is obviously arch-agnostic although we need some code fixups.
> This commit moves the config entry from arch/x86/Kconfig.debug to
> lib/Kconfig.debug so that all architectures (except MIPS for now) can
> benefit from it.
>
> At this moment, I added "depends on !MIPS" because fixing 0day bot reports
> for MIPS was complex to me.

BTW, I got the following error if I enabled CONFIG_OPTIMIZE_INLINING for MIPS.

It is unclear to me how to fix it.
That's why I ended up with "depends on !MIPS".


  MODPOST vmlinux.o
arch/mips/mm/sc-mips.o: In function `mips_sc_prefetch_enable.part.2':
sc-mips.c:(.text+0x98): undefined reference to `mips_gcr_base'
sc-mips.c:(.text+0x9c): undefined reference to `mips_gcr_base'
sc-mips.c:(.text+0xbc): undefined reference to `mips_gcr_base'
sc-mips.c:(.text+0xc8): undefined reference to `mips_gcr_base'
sc-mips.c:(.text+0xdc): undefined reference to `mips_gcr_base'
arch/mips/mm/sc-mips.o:sc-mips.c:(.text.unlikely+0x44): more undefined
references to `mips_gcr_base'


Perhaps, MIPS folks may know how to fix it.




> I tested this patch on my arm/arm64 boards.
>
> This can make a huge difference in kernel image size especially when
> CONFIG_OPTIMIZE_FOR_SIZE is enabled.
>
> For example, I got 3.5% smaller arm64 kernel image for v5.1-rc1.
>
>   dec       file
>   18983424  arch/arm64/boot/Image.before
>   18321920  arch/arm64/boot/Image.after
>
> This also slightly improves the "Kernel hacking" Kconfig menu.
> Commit e61aca5158a8 ("Merge branch 'kconfig-diet' from Dave Hansen')
> mentioned this config option would be a good fit in the "compiler option"
> menu. I did so.
>
> I fixed up some files to avoid build warnings/errors.
>
> [1] arch/arm64/include/asm/cpufeature.h
>
> In file included from ././include/linux/compiler_types.h:68,
>                  from <command-line>:
> ./arch/arm64/include/asm/jump_label.h: In function 'cpus_have_const_cap':
> ./include/linux/compiler-gcc.h:120:38: warning: asm operand 0 probably doesn't match constraints
>  #define asm_volatile_goto(x...) do { asm goto(x); asm (""); } while (0)
>                                       ^~~
> ./arch/arm64/include/asm/jump_label.h:32:2: note: in expansion of macro 'asm_volatile_goto'
>   asm_volatile_goto(
>   ^~~~~~~~~~~~~~~~~
> ./include/linux/compiler-gcc.h:120:38: error: impossible constraint in 'asm'
>  #define asm_volatile_goto(x...) do { asm goto(x); asm (""); } while (0)
>                                       ^~~
> ./arch/arm64/include/asm/jump_label.h:32:2: note: in expansion of macro 'asm_volatile_goto'
>   asm_volatile_goto(
>   ^~~~~~~~~~~~~~~~~
>
> [2] arch/mips/kernel/cpu-bugs64.c
>
> arch/mips/kernel/cpu-bugs64.c: In function 'mult_sh_align_mod.constprop':
> arch/mips/kernel/cpu-bugs64.c:33:2: error: asm operand 1 probably doesn't match constraints [-Werror]
>   asm volatile(
>   ^~~
> arch/mips/kernel/cpu-bugs64.c:33:2: error: asm operand 1 probably doesn't match constraints [-Werror]
>   asm volatile(
>   ^~~
> arch/mips/kernel/cpu-bugs64.c:33:2: error: impossible constraint in 'asm'
>   asm volatile(
>   ^~~
> arch/mips/kernel/cpu-bugs64.c:33:2: error: impossible constraint in 'asm'
>   asm volatile(
>   ^~~
>
> [3] arch/powerpc/mm/tlb-radix.c
>
> arch/powerpc/mm/tlb-radix.c: In function '__radix__flush_tlb_range_psize':
> arch/powerpc/mm/tlb-radix.c:104:2: error: asm operand 3 probably doesn't match constraints [-Werror]
>   asm volatile(PPC_TLBIEL(%0, %4, %3, %2, %1)
>   ^~~
> arch/powerpc/mm/tlb-radix.c:104:2: error: impossible constraint in 'asm'
>   CC      arch/powerpc/perf/hv-gpci.o
>
> [4] arch/s390/include/asm/cpacf.h
>
> In file included from arch/s390/crypto/des_s390.c:19:
> ./arch/s390/include/asm/cpacf.h: In function 'cpacf_query':
> ./arch/s390/include/asm/cpacf.h:170:2: warning: asm operand 3 probably doesn't match constraints
>   asm volatile(
>   ^~~
> ./arch/s390/include/asm/cpacf.h:170:2: error: impossible constraint in 'asm'
>
> [5] arch/powerpc/kernel/prom_init.c
>
> WARNING: vmlinux.o(.text.unlikely+0x20): Section mismatch in reference from the function .prom_getprop() to the function .init.text:.call_prom()
> The function .prom_getprop() references
> the function __init .call_prom().
> This is often because .prom_getprop lacks a __init
> annotation or the annotation of .call_prom is wrong.
>
> WARNING: vmlinux.o(.text.unlikely+0x3c): Section mismatch in reference from the function .prom_getproplen() to the function .init.text:.call_prom()
> The function .prom_getproplen() references
> the function __init .call_prom().
> This is often because .prom_getproplen lacks a __init
> annotation or the annotation of .call_prom is wrong.
>
> [6] drivers/mtd/nand/raw/vf610_nfc.c
>
> drivers/mtd/nand/raw/vf610_nfc.c: In function ‘vf610_nfc_cmd’:
> drivers/mtd/nand/raw/vf610_nfc.c:455:3: warning: ‘offset’ may be used uninitialized in this function [-Wmaybe-uninitialized]
>    vf610_nfc_rd_from_sram(instr->ctx.data.buf.in + offset,
>    ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>             nfc->regs + NFC_MAIN_AREA(0) + offset,
>             ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>             trfr_sz, !nfc->data_access);
>             ~~~~~~~~~~~~~~~~~~~~~~~~~~~
>
> [7] arch/arm/kernel/smp.c
>
> arch/arm/kernel/smp.c: In function ‘raise_nmi’:
> arch/arm/kernel/smp.c:522:2: warning: array subscript is above array bounds [-Warray-bounds]
>   trace_ipi_raise_rcuidle(target, ipi_types[ipinr]);
>   ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>
> The fixup is not included in this. The patch is available in ML:
>
> http://lists.infradead.org/pipermail/linux-arm-kernel/2016-February/409393.html
>
> Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
> ---
>
>  arch/arm64/include/asm/cpufeature.h |  4 ++--
>  arch/mips/kernel/cpu-bugs64.c       |  4 ++--
>  arch/powerpc/kernel/prom_init.c     |  6 +++---
>  arch/powerpc/mm/tlb-radix.c         |  2 +-
>  arch/s390/include/asm/cpacf.h       |  2 +-
>  arch/x86/Kconfig                    |  3 ---
>  arch/x86/Kconfig.debug              | 14 --------------
>  drivers/mtd/nand/raw/vf610_nfc.c    |  2 +-
>  include/linux/compiler_types.h      |  3 +--
>  lib/Kconfig.debug                   | 15 +++++++++++++++
>  10 files changed, 26 insertions(+), 29 deletions(-)
>
> diff --git a/arch/arm64/include/asm/cpufeature.h b/arch/arm64/include/asm/cpufeature.h
> index e505e1f..77d1aa5 100644
> --- a/arch/arm64/include/asm/cpufeature.h
> +++ b/arch/arm64/include/asm/cpufeature.h
> @@ -406,7 +406,7 @@ static inline bool cpu_have_feature(unsigned int num)
>  }
>
>  /* System capability check for constant caps */
> -static inline bool __cpus_have_const_cap(int num)
> +static __always_inline bool __cpus_have_const_cap(int num)
>  {
>         if (num >= ARM64_NCAPS)
>                 return false;
> @@ -420,7 +420,7 @@ static inline bool cpus_have_cap(unsigned int num)
>         return test_bit(num, cpu_hwcaps);
>  }
>
> -static inline bool cpus_have_const_cap(int num)
> +static __always_inline bool cpus_have_const_cap(int num)
>  {
>         if (static_branch_likely(&arm64_const_caps_ready))
>                 return __cpus_have_const_cap(num);
> diff --git a/arch/mips/kernel/cpu-bugs64.c b/arch/mips/kernel/cpu-bugs64.c
> index bada74a..c04b97a 100644
> --- a/arch/mips/kernel/cpu-bugs64.c
> +++ b/arch/mips/kernel/cpu-bugs64.c
> @@ -42,8 +42,8 @@ static inline void align_mod(const int align, const int mod)
>                 : "n"(align), "n"(mod));
>  }
>
> -static inline void mult_sh_align_mod(long *v1, long *v2, long *w,
> -                                    const int align, const int mod)
> +static __always_inline void mult_sh_align_mod(long *v1, long *v2, long *w,
> +                                             const int align, const int mod)
>  {
>         unsigned long flags;
>         int m1, m2;
> diff --git a/arch/powerpc/kernel/prom_init.c b/arch/powerpc/kernel/prom_init.c
> index f33ff41..241fe6b 100644
> --- a/arch/powerpc/kernel/prom_init.c
> +++ b/arch/powerpc/kernel/prom_init.c
> @@ -501,14 +501,14 @@ static int __init prom_next_node(phandle *nodep)
>         }
>  }
>
> -static inline int prom_getprop(phandle node, const char *pname,
> -                              void *value, size_t valuelen)
> +static inline int __init prom_getprop(phandle node, const char *pname,
> +                                     void *value, size_t valuelen)
>  {
>         return call_prom("getprop", 4, 1, node, ADDR(pname),
>                          (u32)(unsigned long) value, (u32) valuelen);
>  }
>
> -static inline int prom_getproplen(phandle node, const char *pname)
> +static inline int __init prom_getproplen(phandle node, const char *pname)
>  {
>         return call_prom("getproplen", 2, 1, node, ADDR(pname));
>  }
> diff --git a/arch/powerpc/mm/tlb-radix.c b/arch/powerpc/mm/tlb-radix.c
> index 6a23b9e..a2b2848 100644
> --- a/arch/powerpc/mm/tlb-radix.c
> +++ b/arch/powerpc/mm/tlb-radix.c
> @@ -928,7 +928,7 @@ void radix__tlb_flush(struct mmu_gather *tlb)
>         tlb->need_flush_all = 0;
>  }
>
> -static inline void __radix__flush_tlb_range_psize(struct mm_struct *mm,
> +static __always_inline void __radix__flush_tlb_range_psize(struct mm_struct *mm,
>                                 unsigned long start, unsigned long end,
>                                 int psize, bool also_pwc)
>  {
> diff --git a/arch/s390/include/asm/cpacf.h b/arch/s390/include/asm/cpacf.h
> index 3cc52e3..f316de4 100644
> --- a/arch/s390/include/asm/cpacf.h
> +++ b/arch/s390/include/asm/cpacf.h
> @@ -202,7 +202,7 @@ static inline int __cpacf_check_opcode(unsigned int opcode)
>         }
>  }
>
> -static inline int cpacf_query(unsigned int opcode, cpacf_mask_t *mask)
> +static __always_inline int cpacf_query(unsigned int opcode, cpacf_mask_t *mask)
>  {
>         if (__cpacf_check_opcode(opcode)) {
>                 __cpacf_query(opcode, mask);
> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> index c1f9b3c..1a3e2b5 100644
> --- a/arch/x86/Kconfig
> +++ b/arch/x86/Kconfig
> @@ -310,9 +310,6 @@ config ZONE_DMA32
>  config AUDIT_ARCH
>         def_bool y if X86_64
>
> -config ARCH_SUPPORTS_OPTIMIZED_INLINING
> -       def_bool y
> -
>  config ARCH_SUPPORTS_DEBUG_PAGEALLOC
>         def_bool y
>
> diff --git a/arch/x86/Kconfig.debug b/arch/x86/Kconfig.debug
> index 15d0fbe..f730680 100644
> --- a/arch/x86/Kconfig.debug
> +++ b/arch/x86/Kconfig.debug
> @@ -266,20 +266,6 @@ config CPA_DEBUG
>         ---help---
>           Do change_page_attr() self-tests every 30 seconds.
>
> -config OPTIMIZE_INLINING
> -       bool "Allow gcc to uninline functions marked 'inline'"
> -       ---help---
> -         This option determines if the kernel forces gcc to inline the functions
> -         developers have marked 'inline'. Doing so takes away freedom from gcc to
> -         do what it thinks is best, which is desirable for the gcc 3.x series of
> -         compilers. The gcc 4.x series have a rewritten inlining algorithm and
> -         enabling this option will generate a smaller kernel there. Hopefully
> -         this algorithm is so good that allowing gcc 4.x and above to make the
> -         decision will become the default in the future. Until then this option
> -         is there to test gcc for this.
> -
> -         If unsure, say N.
> -
>  config DEBUG_ENTRY
>         bool "Debug low-level entry code"
>         depends on DEBUG_KERNEL
> diff --git a/drivers/mtd/nand/raw/vf610_nfc.c b/drivers/mtd/nand/raw/vf610_nfc.c
> index a662ca1..19792d7 100644
> --- a/drivers/mtd/nand/raw/vf610_nfc.c
> +++ b/drivers/mtd/nand/raw/vf610_nfc.c
> @@ -364,7 +364,7 @@ static int vf610_nfc_cmd(struct nand_chip *chip,
>  {
>         const struct nand_op_instr *instr;
>         struct vf610_nfc *nfc = chip_to_nfc(chip);
> -       int op_id = -1, trfr_sz = 0, offset;
> +       int op_id = -1, trfr_sz = 0, offset = 0;
>         u32 col = 0, row = 0, cmd1 = 0, cmd2 = 0, code = 0;
>         bool force8bit = false;
>
> diff --git a/include/linux/compiler_types.h b/include/linux/compiler_types.h
> index ba814f1..19e58b9 100644
> --- a/include/linux/compiler_types.h
> +++ b/include/linux/compiler_types.h
> @@ -140,8 +140,7 @@ struct ftrace_likely_data {
>   * Do not use __always_inline here, since currently it expands to inline again
>   * (which would break users of __always_inline).
>   */
> -#if !defined(CONFIG_ARCH_SUPPORTS_OPTIMIZED_INLINING) || \
> -       !defined(CONFIG_OPTIMIZE_INLINING)
> +#if !defined(CONFIG_OPTIMIZE_INLINING)
>  #define inline inline __attribute__((__always_inline__)) __gnu_inline \
>         __maybe_unused notrace
>  #else
> diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
> index 0d9e817..20f3dfc 100644
> --- a/lib/Kconfig.debug
> +++ b/lib/Kconfig.debug
> @@ -310,6 +310,21 @@ config HEADERS_CHECK
>           exported to $(INSTALL_HDR_PATH) (usually 'usr/include' in
>           your build tree), to make sure they're suitable.
>
> +config OPTIMIZE_INLINING
> +       bool "Allow compiler to uninline functions marked 'inline'"
> +       depends on !MIPS  # TODO: look into MIPS code
> +       help
> +         This option determines if the kernel forces gcc to inline the functions
> +         developers have marked 'inline'. Doing so takes away freedom from gcc to
> +         do what it thinks is best, which is desirable for the gcc 3.x series of
> +         compilers. The gcc 4.x series have a rewritten inlining algorithm and
> +         enabling this option will generate a smaller kernel there. Hopefully
> +         this algorithm is so good that allowing gcc 4.x and above to make the
> +         decision will become the default in the future. Until then this option
> +         is there to test gcc for this.
> +
> +         If unsure, say N.
> +
>  config DEBUG_SECTION_MISMATCH
>         bool "Enable full Section mismatch analysis"
>         help
> --
> 2.7.4
>
>
> ______________________________________________________
> Linux MTD discussion mailing list
> http://lists.infradead.org/mailman/listinfo/linux-mtd/
Arnd Bergmann March 20, 2019, 9:38 a.m. UTC | #2
On Wed, Mar 20, 2019 at 7:41 AM Masahiro Yamada
<yamada.masahiro@socionext.com> wrote:

> It is unclear to me how to fix it.
> That's why I ended up with "depends on !MIPS".
>
>
>   MODPOST vmlinux.o
> arch/mips/mm/sc-mips.o: In function `mips_sc_prefetch_enable.part.2':
> sc-mips.c:(.text+0x98): undefined reference to `mips_gcr_base'
> sc-mips.c:(.text+0x9c): undefined reference to `mips_gcr_base'
> sc-mips.c:(.text+0xbc): undefined reference to `mips_gcr_base'
> sc-mips.c:(.text+0xc8): undefined reference to `mips_gcr_base'
> sc-mips.c:(.text+0xdc): undefined reference to `mips_gcr_base'
> arch/mips/mm/sc-mips.o:sc-mips.c:(.text.unlikely+0x44): more undefined
> references to `mips_gcr_base'
>
>
> Perhaps, MIPS folks may know how to fix it.

I would guess like this:

diff --git a/arch/mips/include/asm/mips-cm.h b/arch/mips/include/asm/mips-cm.h
index 8bc5df49b0e1..a27483fedb7d 100644
--- a/arch/mips/include/asm/mips-cm.h
+++ b/arch/mips/include/asm/mips-cm.h
@@ -79,7 +79,7 @@ static inline int mips_cm_probe(void)
  *
  * Returns true if a CM is present in the system, else false.
  */
-static inline bool mips_cm_present(void)
+static __always_inline bool mips_cm_present(void)
 {
 #ifdef CONFIG_MIPS_CM
        return mips_gcr_base != NULL;
@@ -93,7 +93,7 @@ static inline bool mips_cm_present(void)
  *
  * Returns true if the system implements an L2-only sync region, else false.
  */
-static inline bool mips_cm_has_l2sync(void)
+static __always_inline bool mips_cm_has_l2sync(void)
 {
 #ifdef CONFIG_MIPS_CM
        return mips_cm_l2sync_base != NULL;
Arnd Bergmann March 20, 2019, 9:41 a.m. UTC | #3
On Wed, Mar 20, 2019 at 7:21 AM Masahiro Yamada
<yamada.masahiro@socionext.com> wrote:
>
> Commit 60a3cdd06394 ("x86: add optimized inlining") introduced
> CONFIG_OPTIMIZE_INLINING, but it has been available only for x86.
>
> The idea is obviously arch-agnostic although we need some code fixups.
> This commit moves the config entry from arch/x86/Kconfig.debug to
> lib/Kconfig.debug so that all architectures (except MIPS for now) can
> benefit from it.
>
> At this moment, I added "depends on !MIPS" because fixing 0day bot reports
> for MIPS was complex to me.
>
> I tested this patch on my arm/arm64 boards.
>
> This can make a huge difference in kernel image size especially when
> CONFIG_OPTIMIZE_FOR_SIZE is enabled.
>
> For example, I got 3.5% smaller arm64 kernel image for v5.1-rc1.
>
>   dec       file
>   18983424  arch/arm64/boot/Image.before
>   18321920  arch/arm64/boot/Image.after
>
> This also slightly improves the "Kernel hacking" Kconfig menu.
> Commit e61aca5158a8 ("Merge branch 'kconfig-diet' from Dave Hansen')
> mentioned this config option would be a good fit in the "compiler option"
> menu. I did so.

I think this is a good idea in general, but it is likely to cause a lot of
new warnings. Especially the -Wmaybe-uninitialized warnings get
new false positives every time we get substantially different inlining
decisions.

I've added your patch to my randconfig test setup and will let you
know if I see anything noticeable. I'm currently testing clang-arm32,
clang-arm64 and gcc-x86.

      Arnd
Masahiro Yamada March 20, 2019, 10:18 a.m. UTC | #4
Hi Arnd,


On Wed, Mar 20, 2019 at 6:39 PM Arnd Bergmann <arnd@arndb.de> wrote:
>
> On Wed, Mar 20, 2019 at 7:41 AM Masahiro Yamada
> <yamada.masahiro@socionext.com> wrote:
>
> > It is unclear to me how to fix it.
> > That's why I ended up with "depends on !MIPS".
> >
> >
> >   MODPOST vmlinux.o
> > arch/mips/mm/sc-mips.o: In function `mips_sc_prefetch_enable.part.2':
> > sc-mips.c:(.text+0x98): undefined reference to `mips_gcr_base'
> > sc-mips.c:(.text+0x9c): undefined reference to `mips_gcr_base'
> > sc-mips.c:(.text+0xbc): undefined reference to `mips_gcr_base'
> > sc-mips.c:(.text+0xc8): undefined reference to `mips_gcr_base'
> > sc-mips.c:(.text+0xdc): undefined reference to `mips_gcr_base'
> > arch/mips/mm/sc-mips.o:sc-mips.c:(.text.unlikely+0x44): more undefined
> > references to `mips_gcr_base'
> >
> >
> > Perhaps, MIPS folks may know how to fix it.
>
> I would guess like this:
>
> diff --git a/arch/mips/include/asm/mips-cm.h b/arch/mips/include/asm/mips-cm.h
> index 8bc5df49b0e1..a27483fedb7d 100644
> --- a/arch/mips/include/asm/mips-cm.h
> +++ b/arch/mips/include/asm/mips-cm.h
> @@ -79,7 +79,7 @@ static inline int mips_cm_probe(void)
>   *
>   * Returns true if a CM is present in the system, else false.
>   */
> -static inline bool mips_cm_present(void)
> +static __always_inline bool mips_cm_present(void)
>  {
>  #ifdef CONFIG_MIPS_CM
>         return mips_gcr_base != NULL;
> @@ -93,7 +93,7 @@ static inline bool mips_cm_present(void)
>   *
>   * Returns true if the system implements an L2-only sync region, else false.
>   */
> -static inline bool mips_cm_has_l2sync(void)
> +static __always_inline bool mips_cm_has_l2sync(void)
>  {
>  #ifdef CONFIG_MIPS_CM
>         return mips_cm_l2sync_base != NULL;
>


Thanks, I applied the above, but I still see
 undefined reference to `mips_gcr_base'


I attached .config to produce this error.

I use prebuilt mips-linux-gcc from
https://mirrors.edge.kernel.org/pub/tools/crosstool/files/bin/x86_64/8.1.0/
Arnd Bergmann March 20, 2019, 1:04 p.m. UTC | #5
On Wed, Mar 20, 2019 at 11:19 AM Masahiro Yamada
<yamada.masahiro@socionext.com> wrote:
> On Wed, Mar 20, 2019 at 6:39 PM Arnd Bergmann <arnd@arndb.de> wrote:
> >
> > On Wed, Mar 20, 2019 at 7:41 AM Masahiro Yamada
> > <yamada.masahiro@socionext.com> wrote:
> >
> > > It is unclear to me how to fix it.
> > > That's why I ended up with "depends on !MIPS".
> > >
> > >
> > >   MODPOST vmlinux.o
> > > arch/mips/mm/sc-mips.o: In function `mips_sc_prefetch_enable.part.2':
> > > sc-mips.c:(.text+0x98): undefined reference to `mips_gcr_base'
> > > sc-mips.c:(.text+0x9c): undefined reference to `mips_gcr_base'
> > > sc-mips.c:(.text+0xbc): undefined reference to `mips_gcr_base'
> > > sc-mips.c:(.text+0xc8): undefined reference to `mips_gcr_base'
> > > sc-mips.c:(.text+0xdc): undefined reference to `mips_gcr_base'
> > > arch/mips/mm/sc-mips.o:sc-mips.c:(.text.unlikely+0x44): more undefined
> > > references to `mips_gcr_base'
> > >
> > >
> > > Perhaps, MIPS folks may know how to fix it.
> >
> > I would guess like this:
> >
> > diff --git a/arch/mips/include/asm/mips-cm.h b/arch/mips/include/asm/mips-cm.h
> > index 8bc5df49b0e1..a27483fedb7d 100644
> > --- a/arch/mips/include/asm/mips-cm.h
> > +++ b/arch/mips/include/asm/mips-cm.h
> > @@ -79,7 +79,7 @@ static inline int mips_cm_probe(void)
> >   *
> >   * Returns true if a CM is present in the system, else false.
> >   */
> > -static inline bool mips_cm_present(void)
> > +static __always_inline bool mips_cm_present(void)
> >  {
> >  #ifdef CONFIG_MIPS_CM
> >         return mips_gcr_base != NULL;
> > @@ -93,7 +93,7 @@ static inline bool mips_cm_present(void)
> >   *
> >   * Returns true if the system implements an L2-only sync region, else false.
> >   */
> > -static inline bool mips_cm_has_l2sync(void)
> > +static __always_inline bool mips_cm_has_l2sync(void)
> >  {
> >  #ifdef CONFIG_MIPS_CM
> >         return mips_cm_l2sync_base != NULL;
> >
>
>
> Thanks, I applied the above, but I still see
>  undefined reference to `mips_gcr_base'
>
>
> I attached .config to produce this error.
>
> I use prebuilt mips-linux-gcc from
> https://mirrors.edge.kernel.org/pub/tools/crosstool/files/bin/x86_64/8.1.0/

I got to this patch experimentally, it fixes the problem for me:

diff --git a/arch/mips/mm/sc-mips.c b/arch/mips/mm/sc-mips.c
index 394673991bab..d70d02da038b 100644
--- a/arch/mips/mm/sc-mips.c
+++ b/arch/mips/mm/sc-mips.c
@@ -181,7 +181,7 @@ static int __init mips_sc_probe_cm3(void)
        return 0;
 }

-static inline int __init mips_sc_probe(void)
+static __always_inline int __init mips_sc_probe(void)
 {
        struct cpuinfo_mips *c = &current_cpu_data;
        unsigned int config1, config2;
diff --git a/arch/mips/include/asm/bitops.h b/arch/mips/include/asm/bitops.h
index 830c93a010c3..186c28463bf3 100644
--- a/arch/mips/include/asm/bitops.h
+++ b/arch/mips/include/asm/bitops.h
@@ -548,7 +548,7 @@ static inline unsigned long __fls(unsigned long word)
  * Returns 0..SZLONG-1
  * Undefined if no bit exists, so code should check against 0 first.
  */
-static inline unsigned long __ffs(unsigned long word)
+static __always_inline unsigned long __ffs(unsigned long word)
 {
        return __fls(word & -word);
 }


It does look like a gcc bug though, as at least some of the references
are from a function that got split out from an inlined function but that
has no remaining call sites.

       Arnd
Arnd Bergmann March 20, 2019, 1:34 p.m. UTC | #6
On Wed, Mar 20, 2019 at 10:41 AM Arnd Bergmann <arnd@arndb.de> wrote:
>
> I've added your patch to my randconfig test setup and will let you
> know if I see anything noticeable. I'm currently testing clang-arm32,
> clang-arm64 and gcc-x86.

This is the only additional bug that has come up so far:

`.exit.text' referenced in section `.alt.smp.init' of
drivers/char/ipmi/ipmi_msghandler.o: defined in discarded section
`exit.text' of drivers/char/ipmi/ipmi_msghandler.o

diff --git a/arch/arm/kernel/atags.h b/arch/arm/kernel/atags.h
index 201100226301..84b12e33104d 100644
--- a/arch/arm/kernel/atags.h
+++ b/arch/arm/kernel/atags.h
@@ -5,7 +5,7 @@ void convert_to_tag_list(struct tag *tags);
 const struct machine_desc *setup_machine_tags(phys_addr_t __atags_pointer,
        unsigned int machine_nr);
 #else
-static inline const struct machine_desc *
+static __always_inline const struct machine_desc *
 setup_machine_tags(phys_addr_t __atags_pointer, unsigned int machine_nr)
 {
        early_print("no ATAGS support: can't continue\n");
Heiko Carstens March 21, 2019, 8:01 a.m. UTC | #7
On Wed, Mar 20, 2019 at 03:20:27PM +0900, Masahiro Yamada wrote:
> Commit 60a3cdd06394 ("x86: add optimized inlining") introduced
> CONFIG_OPTIMIZE_INLINING, but it has been available only for x86.
> 
> The idea is obviously arch-agnostic although we need some code fixups.
> This commit moves the config entry from arch/x86/Kconfig.debug to
> lib/Kconfig.debug so that all architectures (except MIPS for now) can
> benefit from it.
> 
> At this moment, I added "depends on !MIPS" because fixing 0day bot reports
> for MIPS was complex to me.
> 
> I tested this patch on my arm/arm64 boards.
> 
> This can make a huge difference in kernel image size especially when
> CONFIG_OPTIMIZE_FOR_SIZE is enabled.
> 
> For example, I got 3.5% smaller arm64 kernel image for v5.1-rc1.
> 
>   dec       file
>   18983424  arch/arm64/boot/Image.before
>   18321920  arch/arm64/boot/Image.after

Well, this will change, since now people (have to) start adding
__always_inline annotations on all architectures, most likely until
all have about the same amount of annotations like x86. This will
reduce the benefit.

Not sure if it's really a win that we get the inline vs
__always_inline discussion now on all architectures.
Christophe Leroy March 23, 2019, 8:26 a.m. UTC | #8
Arnd Bergmann <arnd@arndb.de> a écrit :

> On Wed, Mar 20, 2019 at 10:41 AM Arnd Bergmann <arnd@arndb.de> wrote:
>>
>> I've added your patch to my randconfig test setup and will let you
>> know if I see anything noticeable. I'm currently testing clang-arm32,
>> clang-arm64 and gcc-x86.
>
> This is the only additional bug that has come up so far:
>
> `.exit.text' referenced in section `.alt.smp.init' of
> drivers/char/ipmi/ipmi_msghandler.o: defined in discarded section
> `exit.text' of drivers/char/ipmi/ipmi_msghandler.o

Wouldn't it be useful to activate -Winline gcc warning to ease finding  
out problematic usage of inline keyword ?

Christophe

>
> diff --git a/arch/arm/kernel/atags.h b/arch/arm/kernel/atags.h
> index 201100226301..84b12e33104d 100644
> --- a/arch/arm/kernel/atags.h
> +++ b/arch/arm/kernel/atags.h
> @@ -5,7 +5,7 @@ void convert_to_tag_list(struct tag *tags);
>  const struct machine_desc *setup_machine_tags(phys_addr_t __atags_pointer,
>         unsigned int machine_nr);
>  #else
> -static inline const struct machine_desc *
> +static __always_inline const struct machine_desc *
>  setup_machine_tags(phys_addr_t __atags_pointer, unsigned int machine_nr)
>  {
>         early_print("no ATAGS support: can't continue\n");
Christophe Leroy March 23, 2019, 8:37 a.m. UTC | #9
Masahiro Yamada <yamada.masahiro@socionext.com> a écrit :

> Commit 60a3cdd06394 ("x86: add optimized inlining") introduced
> CONFIG_OPTIMIZE_INLINING, but it has been available only for x86.
>
> The idea is obviously arch-agnostic although we need some code fixups.
> This commit moves the config entry from arch/x86/Kconfig.debug to
> lib/Kconfig.debug so that all architectures (except MIPS for now) can
> benefit from it.
>
> At this moment, I added "depends on !MIPS" because fixing 0day bot reports
> for MIPS was complex to me.
>
> I tested this patch on my arm/arm64 boards.
>
> This can make a huge difference in kernel image size especially when
> CONFIG_OPTIMIZE_FOR_SIZE is enabled.
>
> For example, I got 3.5% smaller arm64 kernel image for v5.1-rc1.
>
>   dec       file
>   18983424  arch/arm64/boot/Image.before
>   18321920  arch/arm64/boot/Image.after
>
> This also slightly improves the "Kernel hacking" Kconfig menu.
> Commit e61aca5158a8 ("Merge branch 'kconfig-diet' from Dave Hansen')
> mentioned this config option would be a good fit in the "compiler option"
> menu. I did so.
>
> I fixed up some files to avoid build warnings/errors.
>
> [1] arch/arm64/include/asm/cpufeature.h
>
> In file included from ././include/linux/compiler_types.h:68,
>                  from <command-line>:
> ./arch/arm64/include/asm/jump_label.h: In function 'cpus_have_const_cap':
> ./include/linux/compiler-gcc.h:120:38: warning: asm operand 0  
> probably doesn't match constraints
>  #define asm_volatile_goto(x...) do { asm goto(x); asm (""); } while (0)
>                                       ^~~
> ./arch/arm64/include/asm/jump_label.h:32:2: note: in expansion of  
> macro 'asm_volatile_goto'
>   asm_volatile_goto(
>   ^~~~~~~~~~~~~~~~~
> ./include/linux/compiler-gcc.h:120:38: error: impossible constraint in 'asm'
>  #define asm_volatile_goto(x...) do { asm goto(x); asm (""); } while (0)
>                                       ^~~
> ./arch/arm64/include/asm/jump_label.h:32:2: note: in expansion of  
> macro 'asm_volatile_goto'
>   asm_volatile_goto(
>   ^~~~~~~~~~~~~~~~~
>
> [2] arch/mips/kernel/cpu-bugs64.c
>
> arch/mips/kernel/cpu-bugs64.c: In function 'mult_sh_align_mod.constprop':
> arch/mips/kernel/cpu-bugs64.c:33:2: error: asm operand 1 probably  
> doesn't match constraints [-Werror]
>   asm volatile(
>   ^~~
> arch/mips/kernel/cpu-bugs64.c:33:2: error: asm operand 1 probably  
> doesn't match constraints [-Werror]
>   asm volatile(
>   ^~~
> arch/mips/kernel/cpu-bugs64.c:33:2: error: impossible constraint in 'asm'
>   asm volatile(
>   ^~~
> arch/mips/kernel/cpu-bugs64.c:33:2: error: impossible constraint in 'asm'
>   asm volatile(
>   ^~~
>
> [3] arch/powerpc/mm/tlb-radix.c
>
> arch/powerpc/mm/tlb-radix.c: In function '__radix__flush_tlb_range_psize':
> arch/powerpc/mm/tlb-radix.c:104:2: error: asm operand 3 probably  
> doesn't match constraints [-Werror]
>   asm volatile(PPC_TLBIEL(%0, %4, %3, %2, %1)
>   ^~~
> arch/powerpc/mm/tlb-radix.c:104:2: error: impossible constraint in 'asm'
>   CC      arch/powerpc/perf/hv-gpci.o
>
> [4] arch/s390/include/asm/cpacf.h
>
> In file included from arch/s390/crypto/des_s390.c:19:
> ./arch/s390/include/asm/cpacf.h: In function 'cpacf_query':
> ./arch/s390/include/asm/cpacf.h:170:2: warning: asm operand 3  
> probably doesn't match constraints
>   asm volatile(
>   ^~~
> ./arch/s390/include/asm/cpacf.h:170:2: error: impossible constraint in 'asm'
>
> [5] arch/powerpc/kernel/prom_init.c
>
> WARNING: vmlinux.o(.text.unlikely+0x20): Section mismatch in  
> reference from the function .prom_getprop() to the function  
> .init.text:.call_prom()
> The function .prom_getprop() references
> the function __init .call_prom().
> This is often because .prom_getprop lacks a __init
> annotation or the annotation of .call_prom is wrong.
>
> WARNING: vmlinux.o(.text.unlikely+0x3c): Section mismatch in  
> reference from the function .prom_getproplen() to the function  
> .init.text:.call_prom()
> The function .prom_getproplen() references
> the function __init .call_prom().
> This is often because .prom_getproplen lacks a __init
> annotation or the annotation of .call_prom is wrong.
>
> [6] drivers/mtd/nand/raw/vf610_nfc.c
>
> drivers/mtd/nand/raw/vf610_nfc.c: In function ‘vf610_nfc_cmd’:
> drivers/mtd/nand/raw/vf610_nfc.c:455:3: warning: ‘offset’ may be  
> used uninitialized in this function [-Wmaybe-uninitialized]
>    vf610_nfc_rd_from_sram(instr->ctx.data.buf.in + offset,
>    ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>             nfc->regs + NFC_MAIN_AREA(0) + offset,
>             ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>             trfr_sz, !nfc->data_access);
>             ~~~~~~~~~~~~~~~~~~~~~~~~~~~
>
> [7] arch/arm/kernel/smp.c
>
> arch/arm/kernel/smp.c: In function ‘raise_nmi’:
> arch/arm/kernel/smp.c:522:2: warning: array subscript is above array  
> bounds [-Warray-bounds]
>   trace_ipi_raise_rcuidle(target, ipi_types[ipinr]);
>   ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>
> The fixup is not included in this. The patch is available in ML:
>
> http://lists.infradead.org/pipermail/linux-arm-kernel/2016-February/409393.html
>
> Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
> ---
>
>  arch/arm64/include/asm/cpufeature.h |  4 ++--
>  arch/mips/kernel/cpu-bugs64.c       |  4 ++--
>  arch/powerpc/kernel/prom_init.c     |  6 +++---
>  arch/powerpc/mm/tlb-radix.c         |  2 +-
>  arch/s390/include/asm/cpacf.h       |  2 +-
>  arch/x86/Kconfig                    |  3 ---
>  arch/x86/Kconfig.debug              | 14 --------------
>  drivers/mtd/nand/raw/vf610_nfc.c    |  2 +-
>  include/linux/compiler_types.h      |  3 +--
>  lib/Kconfig.debug                   | 15 +++++++++++++++
>  10 files changed, 26 insertions(+), 29 deletions(-)

The arch fixups can be done regardless of the Kconfig change as far as  
they are done before.

I guess it would then be easier to manage and review with a series of  
small patches addressing each arch independently

Christophe

>
> diff --git a/arch/arm64/include/asm/cpufeature.h  
> b/arch/arm64/include/asm/cpufeature.h
> index e505e1f..77d1aa5 100644
> --- a/arch/arm64/include/asm/cpufeature.h
> +++ b/arch/arm64/include/asm/cpufeature.h
> @@ -406,7 +406,7 @@ static inline bool cpu_have_feature(unsigned int num)
>  }
>
>  /* System capability check for constant caps */
> -static inline bool __cpus_have_const_cap(int num)
> +static __always_inline bool __cpus_have_const_cap(int num)
>  {
>  	if (num >= ARM64_NCAPS)
>  		return false;
> @@ -420,7 +420,7 @@ static inline bool cpus_have_cap(unsigned int num)
>  	return test_bit(num, cpu_hwcaps);
>  }
>
> -static inline bool cpus_have_const_cap(int num)
> +static __always_inline bool cpus_have_const_cap(int num)
>  {
>  	if (static_branch_likely(&arm64_const_caps_ready))
>  		return __cpus_have_const_cap(num);
> diff --git a/arch/mips/kernel/cpu-bugs64.c b/arch/mips/kernel/cpu-bugs64.c
> index bada74a..c04b97a 100644
> --- a/arch/mips/kernel/cpu-bugs64.c
> +++ b/arch/mips/kernel/cpu-bugs64.c
> @@ -42,8 +42,8 @@ static inline void align_mod(const int align,  
> const int mod)
>  		: "n"(align), "n"(mod));
>  }
>
> -static inline void mult_sh_align_mod(long *v1, long *v2, long *w,
> -				     const int align, const int mod)
> +static __always_inline void mult_sh_align_mod(long *v1, long *v2, long *w,
> +					      const int align, const int mod)
>  {
>  	unsigned long flags;
>  	int m1, m2;
> diff --git a/arch/powerpc/kernel/prom_init.c  
> b/arch/powerpc/kernel/prom_init.c
> index f33ff41..241fe6b 100644
> --- a/arch/powerpc/kernel/prom_init.c
> +++ b/arch/powerpc/kernel/prom_init.c
> @@ -501,14 +501,14 @@ static int __init prom_next_node(phandle *nodep)
>  	}
>  }
>
> -static inline int prom_getprop(phandle node, const char *pname,
> -			       void *value, size_t valuelen)
> +static inline int __init prom_getprop(phandle node, const char *pname,
> +				      void *value, size_t valuelen)
>  {
>  	return call_prom("getprop", 4, 1, node, ADDR(pname),
>  			 (u32)(unsigned long) value, (u32) valuelen);
>  }
>
> -static inline int prom_getproplen(phandle node, const char *pname)
> +static inline int __init prom_getproplen(phandle node, const char *pname)
>  {
>  	return call_prom("getproplen", 2, 1, node, ADDR(pname));
>  }
> diff --git a/arch/powerpc/mm/tlb-radix.c b/arch/powerpc/mm/tlb-radix.c
> index 6a23b9e..a2b2848 100644
> --- a/arch/powerpc/mm/tlb-radix.c
> +++ b/arch/powerpc/mm/tlb-radix.c
> @@ -928,7 +928,7 @@ void radix__tlb_flush(struct mmu_gather *tlb)
>  	tlb->need_flush_all = 0;
>  }
>
> -static inline void __radix__flush_tlb_range_psize(struct mm_struct *mm,
> +static __always_inline void __radix__flush_tlb_range_psize(struct  
> mm_struct *mm,
>  				unsigned long start, unsigned long end,
>  				int psize, bool also_pwc)
>  {
> diff --git a/arch/s390/include/asm/cpacf.h b/arch/s390/include/asm/cpacf.h
> index 3cc52e3..f316de4 100644
> --- a/arch/s390/include/asm/cpacf.h
> +++ b/arch/s390/include/asm/cpacf.h
> @@ -202,7 +202,7 @@ static inline int __cpacf_check_opcode(unsigned  
> int opcode)
>  	}
>  }
>
> -static inline int cpacf_query(unsigned int opcode, cpacf_mask_t *mask)
> +static __always_inline int cpacf_query(unsigned int opcode,  
> cpacf_mask_t *mask)
>  {
>  	if (__cpacf_check_opcode(opcode)) {
>  		__cpacf_query(opcode, mask);
> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> index c1f9b3c..1a3e2b5 100644
> --- a/arch/x86/Kconfig
> +++ b/arch/x86/Kconfig
> @@ -310,9 +310,6 @@ config ZONE_DMA32
>  config AUDIT_ARCH
>  	def_bool y if X86_64
>
> -config ARCH_SUPPORTS_OPTIMIZED_INLINING
> -	def_bool y
> -
>  config ARCH_SUPPORTS_DEBUG_PAGEALLOC
>  	def_bool y
>
> diff --git a/arch/x86/Kconfig.debug b/arch/x86/Kconfig.debug
> index 15d0fbe..f730680 100644
> --- a/arch/x86/Kconfig.debug
> +++ b/arch/x86/Kconfig.debug
> @@ -266,20 +266,6 @@ config CPA_DEBUG
>  	---help---
>  	  Do change_page_attr() self-tests every 30 seconds.
>
> -config OPTIMIZE_INLINING
> -	bool "Allow gcc to uninline functions marked 'inline'"
> -	---help---
> -	  This option determines if the kernel forces gcc to inline the functions
> -	  developers have marked 'inline'. Doing so takes away freedom from gcc to
> -	  do what it thinks is best, which is desirable for the gcc 3.x series of
> -	  compilers. The gcc 4.x series have a rewritten inlining algorithm and
> -	  enabling this option will generate a smaller kernel there. Hopefully
> -	  this algorithm is so good that allowing gcc 4.x and above to make the
> -	  decision will become the default in the future. Until then this option
> -	  is there to test gcc for this.
> -
> -	  If unsure, say N.
> -
>  config DEBUG_ENTRY
>  	bool "Debug low-level entry code"
>  	depends on DEBUG_KERNEL
> diff --git a/drivers/mtd/nand/raw/vf610_nfc.c  
> b/drivers/mtd/nand/raw/vf610_nfc.c
> index a662ca1..19792d7 100644
> --- a/drivers/mtd/nand/raw/vf610_nfc.c
> +++ b/drivers/mtd/nand/raw/vf610_nfc.c
> @@ -364,7 +364,7 @@ static int vf610_nfc_cmd(struct nand_chip *chip,
>  {
>  	const struct nand_op_instr *instr;
>  	struct vf610_nfc *nfc = chip_to_nfc(chip);
> -	int op_id = -1, trfr_sz = 0, offset;
> +	int op_id = -1, trfr_sz = 0, offset = 0;
>  	u32 col = 0, row = 0, cmd1 = 0, cmd2 = 0, code = 0;
>  	bool force8bit = false;
>
> diff --git a/include/linux/compiler_types.h b/include/linux/compiler_types.h
> index ba814f1..19e58b9 100644
> --- a/include/linux/compiler_types.h
> +++ b/include/linux/compiler_types.h
> @@ -140,8 +140,7 @@ struct ftrace_likely_data {
>   * Do not use __always_inline here, since currently it expands to  
> inline again
>   * (which would break users of __always_inline).
>   */
> -#if !defined(CONFIG_ARCH_SUPPORTS_OPTIMIZED_INLINING) || \
> -	!defined(CONFIG_OPTIMIZE_INLINING)
> +#if !defined(CONFIG_OPTIMIZE_INLINING)
>  #define inline inline __attribute__((__always_inline__)) __gnu_inline \
>  	__maybe_unused notrace
>  #else
> diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
> index 0d9e817..20f3dfc 100644
> --- a/lib/Kconfig.debug
> +++ b/lib/Kconfig.debug
> @@ -310,6 +310,21 @@ config HEADERS_CHECK
>  	  exported to $(INSTALL_HDR_PATH) (usually 'usr/include' in
>  	  your build tree), to make sure they're suitable.
>
> +config OPTIMIZE_INLINING
> +	bool "Allow compiler to uninline functions marked 'inline'"
> +	depends on !MIPS  # TODO: look into MIPS code
> +	help
> +	  This option determines if the kernel forces gcc to inline the functions
> +	  developers have marked 'inline'. Doing so takes away freedom from gcc to
> +	  do what it thinks is best, which is desirable for the gcc 3.x series of
> +	  compilers. The gcc 4.x series have a rewritten inlining algorithm and
> +	  enabling this option will generate a smaller kernel there. Hopefully
> +	  this algorithm is so good that allowing gcc 4.x and above to make the
> +	  decision will become the default in the future. Until then this option
> +	  is there to test gcc for this.
> +
> +	  If unsure, say N.
> +
>  config DEBUG_SECTION_MISMATCH
>  	bool "Enable full Section mismatch analysis"
>  	help
> --
> 2.7.4
Masahiro Yamada March 25, 2019, 6:04 a.m. UTC | #10
Hi Arnd,




On Wed, Mar 20, 2019 at 10:05 PM Arnd Bergmann <arnd@arndb.de> wrote:
>
> On Wed, Mar 20, 2019 at 11:19 AM Masahiro Yamada
> <yamada.masahiro@socionext.com> wrote:
> > On Wed, Mar 20, 2019 at 6:39 PM Arnd Bergmann <arnd@arndb.de> wrote:
> > >
> > > On Wed, Mar 20, 2019 at 7:41 AM Masahiro Yamada
> > > <yamada.masahiro@socionext.com> wrote:
> > >
> > > > It is unclear to me how to fix it.
> > > > That's why I ended up with "depends on !MIPS".
> > > >
> > > >
> > > >   MODPOST vmlinux.o
> > > > arch/mips/mm/sc-mips.o: In function `mips_sc_prefetch_enable.part.2':
> > > > sc-mips.c:(.text+0x98): undefined reference to `mips_gcr_base'
> > > > sc-mips.c:(.text+0x9c): undefined reference to `mips_gcr_base'
> > > > sc-mips.c:(.text+0xbc): undefined reference to `mips_gcr_base'
> > > > sc-mips.c:(.text+0xc8): undefined reference to `mips_gcr_base'
> > > > sc-mips.c:(.text+0xdc): undefined reference to `mips_gcr_base'
> > > > arch/mips/mm/sc-mips.o:sc-mips.c:(.text.unlikely+0x44): more undefined
> > > > references to `mips_gcr_base'
> > > >
> > > >
> > > > Perhaps, MIPS folks may know how to fix it.
> > >
> > > I would guess like this:
> > >
> > > diff --git a/arch/mips/include/asm/mips-cm.h b/arch/mips/include/asm/mips-cm.h
> > > index 8bc5df49b0e1..a27483fedb7d 100644
> > > --- a/arch/mips/include/asm/mips-cm.h
> > > +++ b/arch/mips/include/asm/mips-cm.h
> > > @@ -79,7 +79,7 @@ static inline int mips_cm_probe(void)
> > >   *
> > >   * Returns true if a CM is present in the system, else false.
> > >   */
> > > -static inline bool mips_cm_present(void)
> > > +static __always_inline bool mips_cm_present(void)
> > >  {
> > >  #ifdef CONFIG_MIPS_CM
> > >         return mips_gcr_base != NULL;
> > > @@ -93,7 +93,7 @@ static inline bool mips_cm_present(void)
> > >   *
> > >   * Returns true if the system implements an L2-only sync region, else false.
> > >   */
> > > -static inline bool mips_cm_has_l2sync(void)
> > > +static __always_inline bool mips_cm_has_l2sync(void)
> > >  {
> > >  #ifdef CONFIG_MIPS_CM
> > >         return mips_cm_l2sync_base != NULL;
> > >
> >
> >
> > Thanks, I applied the above, but I still see
> >  undefined reference to `mips_gcr_base'
> >
> >
> > I attached .config to produce this error.
> >
> > I use prebuilt mips-linux-gcc from
> > https://mirrors.edge.kernel.org/pub/tools/crosstool/files/bin/x86_64/8.1.0/
>
> I got to this patch experimentally, it fixes the problem for me:
>
> diff --git a/arch/mips/mm/sc-mips.c b/arch/mips/mm/sc-mips.c
> index 394673991bab..d70d02da038b 100644
> --- a/arch/mips/mm/sc-mips.c
> +++ b/arch/mips/mm/sc-mips.c
> @@ -181,7 +181,7 @@ static int __init mips_sc_probe_cm3(void)
>         return 0;
>  }
>
> -static inline int __init mips_sc_probe(void)
> +static __always_inline int __init mips_sc_probe(void)
>  {
>         struct cpuinfo_mips *c = &current_cpu_data;
>         unsigned int config1, config2;
> diff --git a/arch/mips/include/asm/bitops.h b/arch/mips/include/asm/bitops.h
> index 830c93a010c3..186c28463bf3 100644
> --- a/arch/mips/include/asm/bitops.h
> +++ b/arch/mips/include/asm/bitops.h
> @@ -548,7 +548,7 @@ static inline unsigned long __fls(unsigned long word)
>   * Returns 0..SZLONG-1
>   * Undefined if no bit exists, so code should check against 0 first.
>   */
> -static inline unsigned long __ffs(unsigned long word)
> +static __always_inline unsigned long __ffs(unsigned long word)
>  {
>         return __fls(word & -word);
>  }
>
>
> It does look like a gcc bug though, as at least some of the references
> are from a function that got split out from an inlined function but that
> has no remaining call sites.
>
>        Arnd


I applied it, but
"undefined reference to `mips_gcr_base'" still remains.


Then, I got a solution. This is it:

diff --git a/arch/mips/include/asm/bitops.h b/arch/mips/include/asm/bitops.h
index 830c93a..6a26ead 100644
--- a/arch/mips/include/asm/bitops.h
+++ b/arch/mips/include/asm/bitops.h
@@ -482,7 +482,7 @@ static inline void __clear_bit_unlock(unsigned
long nr, volatile unsigned long *
  * Return the bit position (0..63) of the most significant 1 bit in a word
  * Returns -1 if no 1 bit exists
  */
-static inline unsigned long __fls(unsigned long word)
+static __always_inline unsigned long __fls(unsigned long word)
 {
        int num;



LEROY Christophe provided me a tip
to find out the cause of the error.

-Winline pin-points which function
was not inlined despite of its inline marker.



$ make -j8  ARCH=mips CROSS_COMPILE=mips-linux- KCFLAGS=-Winline
arch/mips/mm/sc-mips.o
  CC      scripts/mod/devicetable-offsets.s
  CC      scripts/mod/empty.o
  MKELF   scripts/mod/elfconfig.h
  HOSTCC  scripts/mod/modpost.o
  HOSTCC  scripts/mod/sumversion.o
  HOSTCC  scripts/mod/file2alias.o
  HOSTLD  scripts/mod/modpost
  CC      kernel/bounds.s
  CALL    scripts/atomic/check-atomics.sh
  CC      arch/mips/kernel/asm-offsets.s
  CALL    scripts/checksyscalls.sh
<stdin>:1478:2: warning: #warning syscall pidfd_send_signal not
implemented [-Wcpp]
<stdin>:1481:2: warning: #warning syscall io_uring_setup not implemented [-Wcpp]
<stdin>:1484:2: warning: #warning syscall io_uring_enter not implemented [-Wcpp]
<stdin>:1487:2: warning: #warning syscall io_uring_register not
implemented [-Wcpp]
  CC      arch/mips/mm/sc-mips.o
In file included from ./include/linux/bitops.h:19,
                 from ./include/linux/kernel.h:12,
                 from arch/mips/mm/sc-mips.c:6:
./arch/mips/include/asm/bitops.h: In function 'mips_sc_init':
./arch/mips/include/asm/bitops.h:485:29: warning: inlining failed in
call to '__fls': call is unlikely and code size would grow [-Winline]
 static inline unsigned long __fls(unsigned long word)
                             ^~~~~
./arch/mips/include/asm/bitops.h:553:9: note: called from here
  return __fls(word & -word);
         ^~~~~~~~~~~~~~~~~~~
./arch/mips/include/asm/bitops.h:485:29: warning: inlining failed in
call to '__fls': call is unlikely and code size would grow [-Winline]
 static inline unsigned long __fls(unsigned long word)
                             ^~~~~
./arch/mips/include/asm/bitops.h:553:9: note: called from here
  return __fls(word & -word);
         ^~~~~~~~~~~~~~~~~~~
./arch/mips/include/asm/bitops.h:485:29: warning: inlining failed in
call to '__fls': call is unlikely and code size would grow [-Winline]
 static inline unsigned long __fls(unsigned long word)
                             ^~~~~
./arch/mips/include/asm/bitops.h:553:9: note: called from here
  return __fls(word & -word);
         ^~~~~~~~~~~~~~~~~~~
./arch/mips/include/asm/bitops.h:485:29: warning: inlining failed in
call to '__fls': call is unlikely and code size would grow [-Winline]
 static inline unsigned long __fls(unsigned long word)
                             ^~~~~
./arch/mips/include/asm/bitops.h:553:9: note: called from here
  return __fls(word & -word);
         ^~~~~~~~~~~~~~~~~~~
./arch/mips/include/asm/bitops.h:485:29: warning: inlining failed in
call to '__fls': call is unlikely and code size would grow [-Winline]
 static inline unsigned long __fls(unsigned long word)
                             ^~~~~
./arch/mips/include/asm/bitops.h:553:9: note: called from here
  return __fls(word & -word);
         ^~~~~~~~~~~~~~~~~~~
./arch/mips/include/asm/bitops.h:485:29: warning: inlining failed in
call to '__fls': call is unlikely and code size would grow [-Winline]
 static inline unsigned long __fls(unsigned long word)
                             ^~~~~
./arch/mips/include/asm/bitops.h:553:9: note: called from here
  return __fls(word & -word);
         ^~~~~~~~~~~~~~~~~~~
./arch/mips/include/asm/bitops.h:485:29: warning: inlining failed in
call to '__fls': call is unlikely and code size would grow [-Winline]
 static inline unsigned long __fls(unsigned long word)
                             ^~~~~
./arch/mips/include/asm/bitops.h:553:9: note: called from here
  return __fls(word & -word);
         ^~~~~~~~~~~~~~~~~~~
Masahiro Yamada March 25, 2019, 6:10 a.m. UTC | #11
Hi Arnd,




On Wed, Mar 20, 2019 at 10:34 PM Arnd Bergmann <arnd@arndb.de> wrote:
>
> On Wed, Mar 20, 2019 at 10:41 AM Arnd Bergmann <arnd@arndb.de> wrote:
> >
> > I've added your patch to my randconfig test setup and will let you
> > know if I see anything noticeable. I'm currently testing clang-arm32,
> > clang-arm64 and gcc-x86.
>
> This is the only additional bug that has come up so far:
>
> `.exit.text' referenced in section `.alt.smp.init' of
> drivers/char/ipmi/ipmi_msghandler.o: defined in discarded section
> `exit.text' of drivers/char/ipmi/ipmi_msghandler.o
>
> diff --git a/arch/arm/kernel/atags.h b/arch/arm/kernel/atags.h
> index 201100226301..84b12e33104d 100644
> --- a/arch/arm/kernel/atags.h
> +++ b/arch/arm/kernel/atags.h
> @@ -5,7 +5,7 @@ void convert_to_tag_list(struct tag *tags);
>  const struct machine_desc *setup_machine_tags(phys_addr_t __atags_pointer,
>         unsigned int machine_nr);
>  #else
> -static inline const struct machine_desc *
> +static __always_inline const struct machine_desc *
>  setup_machine_tags(phys_addr_t __atags_pointer, unsigned int machine_nr)
>  {
>         early_print("no ATAGS support: can't continue\n");
>


I do not know why to reproduce it,
but is "__init __noreturn" more sensible than
"__always_inline" here?
Masahiro Yamada March 25, 2019, 6:12 a.m. UTC | #12
On Mon, Mar 25, 2019 at 3:10 PM Masahiro Yamada
<yamada.masahiro@socionext.com> wrote:
>
> Hi Arnd,
>
>
>
>
> On Wed, Mar 20, 2019 at 10:34 PM Arnd Bergmann <arnd@arndb.de> wrote:
> >
> > On Wed, Mar 20, 2019 at 10:41 AM Arnd Bergmann <arnd@arndb.de> wrote:
> > >
> > > I've added your patch to my randconfig test setup and will let you
> > > know if I see anything noticeable. I'm currently testing clang-arm32,
> > > clang-arm64 and gcc-x86.
> >
> > This is the only additional bug that has come up so far:
> >
> > `.exit.text' referenced in section `.alt.smp.init' of
> > drivers/char/ipmi/ipmi_msghandler.o: defined in discarded section
> > `exit.text' of drivers/char/ipmi/ipmi_msghandler.o
> >
> > diff --git a/arch/arm/kernel/atags.h b/arch/arm/kernel/atags.h
> > index 201100226301..84b12e33104d 100644
> > --- a/arch/arm/kernel/atags.h
> > +++ b/arch/arm/kernel/atags.h
> > @@ -5,7 +5,7 @@ void convert_to_tag_list(struct tag *tags);
> >  const struct machine_desc *setup_machine_tags(phys_addr_t __atags_pointer,
> >         unsigned int machine_nr);
> >  #else
> > -static inline const struct machine_desc *
> > +static __always_inline const struct machine_desc *
> >  setup_machine_tags(phys_addr_t __atags_pointer, unsigned int machine_nr)
> >  {
> >         early_print("no ATAGS support: can't continue\n");
> >
>
>
> I do not know why to reproduce it,

"how to"


> but is "__init __noreturn" more sensible than
> "__always_inline" here?
>
>
> --
> Best Regards
> Masahiro Yamada
Masahiro Yamada March 25, 2019, 6:41 a.m. UTC | #13
Hi Heiko,


On Thu, Mar 21, 2019 at 5:02 PM Heiko Carstens
<heiko.carstens@de.ibm.com> wrote:
>
> On Wed, Mar 20, 2019 at 03:20:27PM +0900, Masahiro Yamada wrote:
> > Commit 60a3cdd06394 ("x86: add optimized inlining") introduced
> > CONFIG_OPTIMIZE_INLINING, but it has been available only for x86.
> >
> > The idea is obviously arch-agnostic although we need some code fixups.
> > This commit moves the config entry from arch/x86/Kconfig.debug to
> > lib/Kconfig.debug so that all architectures (except MIPS for now) can
> > benefit from it.
> >
> > At this moment, I added "depends on !MIPS" because fixing 0day bot reports
> > for MIPS was complex to me.
> >
> > I tested this patch on my arm/arm64 boards.
> >
> > This can make a huge difference in kernel image size especially when
> > CONFIG_OPTIMIZE_FOR_SIZE is enabled.
> >
> > For example, I got 3.5% smaller arm64 kernel image for v5.1-rc1.
> >
> >   dec       file
> >   18983424  arch/arm64/boot/Image.before
> >   18321920  arch/arm64/boot/Image.after
>
> Well, this will change, since now people (have to) start adding
> __always_inline annotations on all architectures, most likely until
> all have about the same amount of annotations like x86. This will
> reduce the benefit.


If people start to replace inline with __always_inline here and there,
yes, the difference will be reduced.

Perhaps, we might end up with fixing dozens of places or so,
but I guess we would still get benefit.


> Not sure if it's really a win that we get the inline vs
> __always_inline discussion now on all architectures.


This feature is not x86-specific.

I prefer "do it for all arches or don't do it at all"
instead of the half-baked state.

If we force inlining for the 'inline' marker
there is no point of having __always_inline.
Masahiro Yamada March 25, 2019, 6:44 a.m. UTC | #14
Hi Christophe,


On Sat, Mar 23, 2019 at 5:27 PM LEROY Christophe
<christophe.leroy@c-s.fr> wrote:
>
> Arnd Bergmann <arnd@arndb.de> a écrit :
>
> > On Wed, Mar 20, 2019 at 10:41 AM Arnd Bergmann <arnd@arndb.de> wrote:
> >>
> >> I've added your patch to my randconfig test setup and will let you
> >> know if I see anything noticeable. I'm currently testing clang-arm32,
> >> clang-arm64 and gcc-x86.
> >
> > This is the only additional bug that has come up so far:
> >
> > `.exit.text' referenced in section `.alt.smp.init' of
> > drivers/char/ipmi/ipmi_msghandler.o: defined in discarded section
> > `exit.text' of drivers/char/ipmi/ipmi_msghandler.o
>
> Wouldn't it be useful to activate -Winline gcc warning to ease finding
> out problematic usage of inline keyword ?


Yes, it is useful to find out
which function is causing the error.
Thanks for the tip.
Arnd Bergmann March 25, 2019, 7:32 a.m. UTC | #15
On Mon, Mar 25, 2019 at 7:11 AM Masahiro Yamada
<yamada.masahiro@socionext.com> wrote:
> On Wed, Mar 20, 2019 at 10:34 PM Arnd Bergmann <arnd@arndb.de> wrote:
> >
> > On Wed, Mar 20, 2019 at 10:41 AM Arnd Bergmann <arnd@arndb.de> wrote:
> > >
> > > I've added your patch to my randconfig test setup and will let you
> > > know if I see anything noticeable. I'm currently testing clang-arm32,
> > > clang-arm64 and gcc-x86.
> >
> > This is the only additional bug that has come up so far:
> >
> > `.exit.text' referenced in section `.alt.smp.init' of
> > drivers/char/ipmi/ipmi_msghandler.o: defined in discarded section
> > `exit.text' of drivers/char/ipmi/ipmi_msghandler.o
> >
> > diff --git a/arch/arm/kernel/atags.h b/arch/arm/kernel/atags.h
> > index 201100226301..84b12e33104d 100644
> > --- a/arch/arm/kernel/atags.h
> > +++ b/arch/arm/kernel/atags.h
> > @@ -5,7 +5,7 @@ void convert_to_tag_list(struct tag *tags);
> >  const struct machine_desc *setup_machine_tags(phys_addr_t __atags_pointer,
> >         unsigned int machine_nr);
> >  #else
> > -static inline const struct machine_desc *
> > +static __always_inline const struct machine_desc *
> >  setup_machine_tags(phys_addr_t __atags_pointer, unsigned int machine_nr)
> >  {
> >         early_print("no ATAGS support: can't continue\n");
> >
>
>
> I do not know why to reproduce it,
> but is "__init __noreturn" more sensible than
> "__always_inline" here?

It's in a header file, so it has to be 'inline'. We could make it
static inline __init __noreturn, but I don't see an advantage over
__always_inline there.

        Arnd
Masahiro Yamada March 25, 2019, 7:54 a.m. UTC | #16
On Mon, Mar 25, 2019 at 4:33 PM Arnd Bergmann <arnd@arndb.de> wrote:
>
> On Mon, Mar 25, 2019 at 7:11 AM Masahiro Yamada
> <yamada.masahiro@socionext.com> wrote:
> > On Wed, Mar 20, 2019 at 10:34 PM Arnd Bergmann <arnd@arndb.de> wrote:
> > >
> > > On Wed, Mar 20, 2019 at 10:41 AM Arnd Bergmann <arnd@arndb.de> wrote:
> > > >
> > > > I've added your patch to my randconfig test setup and will let you
> > > > know if I see anything noticeable. I'm currently testing clang-arm32,
> > > > clang-arm64 and gcc-x86.
> > >
> > > This is the only additional bug that has come up so far:
> > >
> > > `.exit.text' referenced in section `.alt.smp.init' of
> > > drivers/char/ipmi/ipmi_msghandler.o: defined in discarded section
> > > `exit.text' of drivers/char/ipmi/ipmi_msghandler.o
> > >
> > > diff --git a/arch/arm/kernel/atags.h b/arch/arm/kernel/atags.h
> > > index 201100226301..84b12e33104d 100644
> > > --- a/arch/arm/kernel/atags.h
> > > +++ b/arch/arm/kernel/atags.h
> > > @@ -5,7 +5,7 @@ void convert_to_tag_list(struct tag *tags);
> > >  const struct machine_desc *setup_machine_tags(phys_addr_t __atags_pointer,
> > >         unsigned int machine_nr);
> > >  #else
> > > -static inline const struct machine_desc *
> > > +static __always_inline const struct machine_desc *
> > >  setup_machine_tags(phys_addr_t __atags_pointer, unsigned int machine_nr)
> > >  {
> > >         early_print("no ATAGS support: can't continue\n");
> > >
> >
> >
> > I do not know why to reproduce it,
> > but is "__init __noreturn" more sensible than
> > "__always_inline" here?
>
> It's in a header file, so it has to be 'inline'. We could make it
> static inline __init __noreturn,

Yes, I like 'static inline __init __noreturn'

> but I don't see an advantage over
> __always_inline there.

__always_inline takes away the compiler's freedom.

I'd like to leave it up to the compiler where possible.

The inlining decision may change
depending on -O2, -Os, -Og(which was rejected)
or whatever optimization strategy.


>
>         Arnd
>
> ______________________________________________________
> Linux MTD discussion mailing list
> http://lists.infradead.org/mailman/listinfo/linux-mtd/
Arnd Bergmann March 25, 2019, 10:25 a.m. UTC | #17
On Mon, Mar 25, 2019 at 8:55 AM Masahiro Yamada
<yamada.masahiro@socionext.com> wrote:
>
> On Mon, Mar 25, 2019 at 4:33 PM Arnd Bergmann <arnd@arndb.de> wrote:
> >
> > On Mon, Mar 25, 2019 at 7:11 AM Masahiro Yamada
> > <yamada.masahiro@socionext.com> wrote:
> > > I do not know why to reproduce it,
> > > but is "__init __noreturn" more sensible than
> > > "__always_inline" here?
> >
> > It's in a header file, so it has to be 'inline'. We could make it
> > static inline __init __noreturn,
>
> Yes, I like 'static inline __init __noreturn'
>
> > but I don't see an advantage over
> > __always_inline there.
>
> __always_inline takes away the compiler's freedom.
>
> I'd like to leave it up to the compiler where possible.

Ok, fair enough.

      Arnd
Arnd Bergmann March 25, 2019, 10:27 a.m. UTC | #18
On Wed, Mar 20, 2019 at 2:34 PM Arnd Bergmann <arnd@arndb.de> wrote:
>
> On Wed, Mar 20, 2019 at 10:41 AM Arnd Bergmann <arnd@arndb.de> wrote:
> >
> > I've added your patch to my randconfig test setup and will let you
> > know if I see anything noticeable. I'm currently testing clang-arm32,
> > clang-arm64 and gcc-x86.
>
> This is the only additional bug that has come up so far:
>
> `.exit.text' referenced in section `.alt.smp.init' of
> drivers/char/ipmi/ipmi_msghandler.o: defined in discarded section
> `exit.text' of drivers/char/ipmi/ipmi_msghandler.o

FWIW, the message up there does not match the patch anyway,
that was an unrelated bug.

     Arnd
Christophe Leroy March 26, 2019, 6:02 a.m. UTC | #19
Hi Masahiro,

Le 25/03/2019 à 07:44, Masahiro Yamada a écrit :
> Hi Christophe,
> 
> 
> On Sat, Mar 23, 2019 at 5:27 PM LEROY Christophe
> <christophe.leroy@c-s.fr> wrote:
>>
>> Arnd Bergmann <arnd@arndb.de> a écrit :
>>
>>> On Wed, Mar 20, 2019 at 10:41 AM Arnd Bergmann <arnd@arndb.de> wrote:
>>>>
>>>> I've added your patch to my randconfig test setup and will let you
>>>> know if I see anything noticeable. I'm currently testing clang-arm32,
>>>> clang-arm64 and gcc-x86.
>>>
>>> This is the only additional bug that has come up so far:
>>>
>>> `.exit.text' referenced in section `.alt.smp.init' of
>>> drivers/char/ipmi/ipmi_msghandler.o: defined in discarded section
>>> `exit.text' of drivers/char/ipmi/ipmi_msghandler.o
>>
>> Wouldn't it be useful to activate -Winline gcc warning to ease finding
>> out problematic usage of inline keyword ?
> 
> 
> Yes, it is useful to find out
> which function is causing the error.
> Thanks for the tip.
> 

I did a mass build on kisskb. Almost ok, see results at 
http://kisskb.ellerman.id.au/kisskb/head/203d912edf8dde291977c71aa64024065e197f79/

ps3_defconfig with GCC 5 fails 
(http://kisskb.ellerman.id.au/kisskb/buildresult/13742711/) with:

arch/powerpc/mm/tlb-radix.c:148:2: error: asm operand 3 probably doesn't 
match constraints [-Werror]
arch/powerpc/mm/tlb-radix.c:148:2: error: impossible constraint in 'asm'
arch/powerpc/mm/tlb-radix.c:118:2: error: asm operand 3 probably doesn't 
match constraints [-Werror]
arch/powerpc/mm/tlb-radix.c:104:2: error: asm operand 3 probably doesn't 
match constraints [-Werror]

ps3_defconfig with GCC 4.6 fails 
(http://kisskb.ellerman.id.au/kisskb/buildresult/13742591/) with:

arch/powerpc/mm/tlb-radix.c:148:2: error: asm operand 3 probably doesn't 
match constraints [-Werror]
arch/powerpc/mm/tlb-radix.c:118:2: error: asm operand 3 probably doesn't 
match constraints [-Werror]
arch/powerpc/mm/tlb-radix.c:104:2: error: asm operand 3 probably doesn't 
match constraints [-Werror]

randconfig with GCC 4.6 fails 
(http://kisskb.ellerman.id.au/kisskb/buildresult/13742698/) with:

arch/powerpc/mm/tlb-radix.c:104:2: error: impossible constraint in 'asm'

Christophe
Ingo Molnar April 18, 2019, 1:20 p.m. UTC | #20
* Masahiro Yamada <yamada.masahiro@socionext.com> wrote:

> Commit 60a3cdd06394 ("x86: add optimized inlining") introduced
> CONFIG_OPTIMIZE_INLINING, but it has been available only for x86.
> 
> The idea is obviously arch-agnostic although we need some code fixups.
> This commit moves the config entry from arch/x86/Kconfig.debug to
> lib/Kconfig.debug so that all architectures (except MIPS for now) can
> benefit from it.
> 
> At this moment, I added "depends on !MIPS" because fixing 0day bot reports
> for MIPS was complex to me.
> 
> I tested this patch on my arm/arm64 boards.
> 
> This can make a huge difference in kernel image size especially when
> CONFIG_OPTIMIZE_FOR_SIZE is enabled.
> 
> For example, I got 3.5% smaller arm64 kernel image for v5.1-rc1.
> 
>   dec       file
>   18983424  arch/arm64/boot/Image.before
>   18321920  arch/arm64/boot/Image.after
> 
> This also slightly improves the "Kernel hacking" Kconfig menu.
> Commit e61aca5158a8 ("Merge branch 'kconfig-diet' from Dave Hansen')
> mentioned this config option would be a good fit in the "compiler option"
> menu. I did so.

No objections against moving it from x86 code to generic code.

Thanks,

	Ingo
Masahiro Yamada April 23, 2019, 3:26 a.m. UTC | #21
Hi Christophe,

On Tue, Mar 26, 2019 at 3:03 PM Christophe Leroy
<christophe.leroy@c-s.fr> wrote:
>
> Hi Masahiro,
>
> Le 25/03/2019 à 07:44, Masahiro Yamada a écrit :
> > Hi Christophe,
> >
> >
> > On Sat, Mar 23, 2019 at 5:27 PM LEROY Christophe
> > <christophe.leroy@c-s.fr> wrote:
> >>
> >> Arnd Bergmann <arnd@arndb.de> a écrit :
> >>
> >>> On Wed, Mar 20, 2019 at 10:41 AM Arnd Bergmann <arnd@arndb.de> wrote:
> >>>>
> >>>> I've added your patch to my randconfig test setup and will let you
> >>>> know if I see anything noticeable. I'm currently testing clang-arm32,
> >>>> clang-arm64 and gcc-x86.
> >>>
> >>> This is the only additional bug that has come up so far:
> >>>
> >>> `.exit.text' referenced in section `.alt.smp.init' of
> >>> drivers/char/ipmi/ipmi_msghandler.o: defined in discarded section
> >>> `exit.text' of drivers/char/ipmi/ipmi_msghandler.o
> >>
> >> Wouldn't it be useful to activate -Winline gcc warning to ease finding
> >> out problematic usage of inline keyword ?
> >
> >
> > Yes, it is useful to find out
> > which function is causing the error.
> > Thanks for the tip.
> >
>
> I did a mass build on kisskb. Almost ok, see results at
> http://kisskb.ellerman.id.au/kisskb/head/203d912edf8dde291977c71aa64024065e197f79/
>
> ps3_defconfig with GCC 5 fails
> (http://kisskb.ellerman.id.au/kisskb/buildresult/13742711/) with:
>
> arch/powerpc/mm/tlb-radix.c:148:2: error: asm operand 3 probably doesn't
> match constraints [-Werror]
> arch/powerpc/mm/tlb-radix.c:148:2: error: impossible constraint in 'asm'
> arch/powerpc/mm/tlb-radix.c:118:2: error: asm operand 3 probably doesn't
> match constraints [-Werror]
> arch/powerpc/mm/tlb-radix.c:104:2: error: asm operand 3 probably doesn't
> match constraints [-Werror]
>
> ps3_defconfig with GCC 4.6 fails
> (http://kisskb.ellerman.id.au/kisskb/buildresult/13742591/) with:
>
> arch/powerpc/mm/tlb-radix.c:148:2: error: asm operand 3 probably doesn't
> match constraints [-Werror]
> arch/powerpc/mm/tlb-radix.c:118:2: error: asm operand 3 probably doesn't
> match constraints [-Werror]
> arch/powerpc/mm/tlb-radix.c:104:2: error: asm operand 3 probably doesn't
> match constraints [-Werror]
>
> randconfig with GCC 4.6 fails
> (http://kisskb.ellerman.id.au/kisskb/buildresult/13742698/) with:
>
> arch/powerpc/mm/tlb-radix.c:104:2: error: impossible constraint in 'asm'


Thanks.

I fixed these ppc errors,
and separate out arch-fixes.

Now, the latest version is v3.
diff mbox series

Patch

diff --git a/arch/arm64/include/asm/cpufeature.h b/arch/arm64/include/asm/cpufeature.h
index e505e1f..77d1aa5 100644
--- a/arch/arm64/include/asm/cpufeature.h
+++ b/arch/arm64/include/asm/cpufeature.h
@@ -406,7 +406,7 @@  static inline bool cpu_have_feature(unsigned int num)
 }
 
 /* System capability check for constant caps */
-static inline bool __cpus_have_const_cap(int num)
+static __always_inline bool __cpus_have_const_cap(int num)
 {
 	if (num >= ARM64_NCAPS)
 		return false;
@@ -420,7 +420,7 @@  static inline bool cpus_have_cap(unsigned int num)
 	return test_bit(num, cpu_hwcaps);
 }
 
-static inline bool cpus_have_const_cap(int num)
+static __always_inline bool cpus_have_const_cap(int num)
 {
 	if (static_branch_likely(&arm64_const_caps_ready))
 		return __cpus_have_const_cap(num);
diff --git a/arch/mips/kernel/cpu-bugs64.c b/arch/mips/kernel/cpu-bugs64.c
index bada74a..c04b97a 100644
--- a/arch/mips/kernel/cpu-bugs64.c
+++ b/arch/mips/kernel/cpu-bugs64.c
@@ -42,8 +42,8 @@  static inline void align_mod(const int align, const int mod)
 		: "n"(align), "n"(mod));
 }
 
-static inline void mult_sh_align_mod(long *v1, long *v2, long *w,
-				     const int align, const int mod)
+static __always_inline void mult_sh_align_mod(long *v1, long *v2, long *w,
+					      const int align, const int mod)
 {
 	unsigned long flags;
 	int m1, m2;
diff --git a/arch/powerpc/kernel/prom_init.c b/arch/powerpc/kernel/prom_init.c
index f33ff41..241fe6b 100644
--- a/arch/powerpc/kernel/prom_init.c
+++ b/arch/powerpc/kernel/prom_init.c
@@ -501,14 +501,14 @@  static int __init prom_next_node(phandle *nodep)
 	}
 }
 
-static inline int prom_getprop(phandle node, const char *pname,
-			       void *value, size_t valuelen)
+static inline int __init prom_getprop(phandle node, const char *pname,
+				      void *value, size_t valuelen)
 {
 	return call_prom("getprop", 4, 1, node, ADDR(pname),
 			 (u32)(unsigned long) value, (u32) valuelen);
 }
 
-static inline int prom_getproplen(phandle node, const char *pname)
+static inline int __init prom_getproplen(phandle node, const char *pname)
 {
 	return call_prom("getproplen", 2, 1, node, ADDR(pname));
 }
diff --git a/arch/powerpc/mm/tlb-radix.c b/arch/powerpc/mm/tlb-radix.c
index 6a23b9e..a2b2848 100644
--- a/arch/powerpc/mm/tlb-radix.c
+++ b/arch/powerpc/mm/tlb-radix.c
@@ -928,7 +928,7 @@  void radix__tlb_flush(struct mmu_gather *tlb)
 	tlb->need_flush_all = 0;
 }
 
-static inline void __radix__flush_tlb_range_psize(struct mm_struct *mm,
+static __always_inline void __radix__flush_tlb_range_psize(struct mm_struct *mm,
 				unsigned long start, unsigned long end,
 				int psize, bool also_pwc)
 {
diff --git a/arch/s390/include/asm/cpacf.h b/arch/s390/include/asm/cpacf.h
index 3cc52e3..f316de4 100644
--- a/arch/s390/include/asm/cpacf.h
+++ b/arch/s390/include/asm/cpacf.h
@@ -202,7 +202,7 @@  static inline int __cpacf_check_opcode(unsigned int opcode)
 	}
 }
 
-static inline int cpacf_query(unsigned int opcode, cpacf_mask_t *mask)
+static __always_inline int cpacf_query(unsigned int opcode, cpacf_mask_t *mask)
 {
 	if (__cpacf_check_opcode(opcode)) {
 		__cpacf_query(opcode, mask);
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index c1f9b3c..1a3e2b5 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -310,9 +310,6 @@  config ZONE_DMA32
 config AUDIT_ARCH
 	def_bool y if X86_64
 
-config ARCH_SUPPORTS_OPTIMIZED_INLINING
-	def_bool y
-
 config ARCH_SUPPORTS_DEBUG_PAGEALLOC
 	def_bool y
 
diff --git a/arch/x86/Kconfig.debug b/arch/x86/Kconfig.debug
index 15d0fbe..f730680 100644
--- a/arch/x86/Kconfig.debug
+++ b/arch/x86/Kconfig.debug
@@ -266,20 +266,6 @@  config CPA_DEBUG
 	---help---
 	  Do change_page_attr() self-tests every 30 seconds.
 
-config OPTIMIZE_INLINING
-	bool "Allow gcc to uninline functions marked 'inline'"
-	---help---
-	  This option determines if the kernel forces gcc to inline the functions
-	  developers have marked 'inline'. Doing so takes away freedom from gcc to
-	  do what it thinks is best, which is desirable for the gcc 3.x series of
-	  compilers. The gcc 4.x series have a rewritten inlining algorithm and
-	  enabling this option will generate a smaller kernel there. Hopefully
-	  this algorithm is so good that allowing gcc 4.x and above to make the
-	  decision will become the default in the future. Until then this option
-	  is there to test gcc for this.
-
-	  If unsure, say N.
-
 config DEBUG_ENTRY
 	bool "Debug low-level entry code"
 	depends on DEBUG_KERNEL
diff --git a/drivers/mtd/nand/raw/vf610_nfc.c b/drivers/mtd/nand/raw/vf610_nfc.c
index a662ca1..19792d7 100644
--- a/drivers/mtd/nand/raw/vf610_nfc.c
+++ b/drivers/mtd/nand/raw/vf610_nfc.c
@@ -364,7 +364,7 @@  static int vf610_nfc_cmd(struct nand_chip *chip,
 {
 	const struct nand_op_instr *instr;
 	struct vf610_nfc *nfc = chip_to_nfc(chip);
-	int op_id = -1, trfr_sz = 0, offset;
+	int op_id = -1, trfr_sz = 0, offset = 0;
 	u32 col = 0, row = 0, cmd1 = 0, cmd2 = 0, code = 0;
 	bool force8bit = false;
 
diff --git a/include/linux/compiler_types.h b/include/linux/compiler_types.h
index ba814f1..19e58b9 100644
--- a/include/linux/compiler_types.h
+++ b/include/linux/compiler_types.h
@@ -140,8 +140,7 @@  struct ftrace_likely_data {
  * Do not use __always_inline here, since currently it expands to inline again
  * (which would break users of __always_inline).
  */
-#if !defined(CONFIG_ARCH_SUPPORTS_OPTIMIZED_INLINING) || \
-	!defined(CONFIG_OPTIMIZE_INLINING)
+#if !defined(CONFIG_OPTIMIZE_INLINING)
 #define inline inline __attribute__((__always_inline__)) __gnu_inline \
 	__maybe_unused notrace
 #else
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index 0d9e817..20f3dfc 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -310,6 +310,21 @@  config HEADERS_CHECK
 	  exported to $(INSTALL_HDR_PATH) (usually 'usr/include' in
 	  your build tree), to make sure they're suitable.
 
+config OPTIMIZE_INLINING
+	bool "Allow compiler to uninline functions marked 'inline'"
+	depends on !MIPS  # TODO: look into MIPS code
+	help
+	  This option determines if the kernel forces gcc to inline the functions
+	  developers have marked 'inline'. Doing so takes away freedom from gcc to
+	  do what it thinks is best, which is desirable for the gcc 3.x series of
+	  compilers. The gcc 4.x series have a rewritten inlining algorithm and
+	  enabling this option will generate a smaller kernel there. Hopefully
+	  this algorithm is so good that allowing gcc 4.x and above to make the
+	  decision will become the default in the future. Until then this option
+	  is there to test gcc for this.
+
+	  If unsure, say N.
+
 config DEBUG_SECTION_MISMATCH
 	bool "Enable full Section mismatch analysis"
 	help