scsi: fc: force inlining of wwn conversion functions

Message ID	80200c53ae54f6cb34bd6fb51e9da65fdcc03004.1461073602.git.jpoimboe@redhat.com (mailing list archive)
State	Not Applicable, archived
Headers	show Return-Path: <linux-scsi-owner@kernel.org> From: Josh Poimboeuf <jpoimboe@redhat.com> To: James Bottomley <James.Bottomley@HansenPartnership.com>, "Martin K. Petersen" <martin.petersen@oracle.com> Cc: linux-scsi@vger.kernel.org, linux-kernel@vger.kernel.org, Denys Vlasenko <dvlasenk@redhat.com>, Thomas Graf <tgraf@suug.ch>, Peter Zijlstra <peterz@infradead.org>, David Rientjes <rientjes@google.com>, Andrew Morton <akpm@linux-foundation.org>, Arnd Bergmann <arnd@arndb.de>, jamborm@gcc.gnu.org, Ingo Molnar <mingo@kernel.org>, Himanshu Madhani <himanshu.madhani@qlogic.com>, qla2xxx-upstream@qlogic.com Subject: [PATCH] scsi: fc: force inlining of wwn conversion functions Date: Tue, 19 Apr 2016 08:56:00 -0500 Message-Id: <80200c53ae54f6cb34bd6fb51e9da65fdcc03004.1461073602.git.jpoimboe@redhat.com> In-Reply-To: <20160419085221.GA29087@gmail.com> References: <20160419085221.GA29087@gmail.com> Sender: linux-scsi-owner@vger.kernel.org Precedence: bulk

Josh Poimboeuf April 19, 2016, 1:56 p.m. UTC

objtool reports [1] the following warning:

  drivers/scsi/qla2xxx/qla_attr.o: warning: objtool: qla2x00_get_host_fabric_name() falls through to next function qla2x00_get_starget_port_name()

This warning is due to a gcc bug [2] which causes corrupt code:

  0000000000002f53 <qla2x00_get_host_fabric_name>:
      2f53:       55                      push   %rbp
      2f54:       48 89 e5                mov    %rsp,%rbp

  0000000000002f57 <qla2x00_get_fc_host_stats>:
      2f57:       55                      push   %rbp
      2f58:       b9 e8 00 00 00          mov    $0xe8,%ecx
      2f5d:       48 89 e5                mov    %rsp,%rbp
  ...

Note that qla2x00_get_host_fabric_name() is inexplicably truncated after
setting up the frame pointer.  It falls through to the next function,
which is very bad.

It occurs with the combination of the following two recent commits:

  bc27fb68aaad ("include/uapi/linux/byteorder, swab: force inlining of some byteswap operations")
  ef3fb2422ffe ("scsi: fc: use get/put_unaligned64 for wwn access")

The call chain which appears to trigger the problem is:

  qla2x00_get_host_fabric_name()
    wwn_to_u64()
      get_unaligned_be64()
        be64_to_cpup()
          __be64_to_cpup()

The bug requires very specific conditions to trigger.  According to Martin
Jambor (from the gcc bugzilla):

  "This bug can occur when an inlineable function containing a call to
  __builtin_constant_p, which checks a parameter or a value it
  references and a (possibly indirect) caller of the function actually
  passes a constant, but stores it using a type of a different size."

There's no reliable way to avoid (or even detect) the bug.  Until it
gets fixed in released versions of gcc, the least intrusive workaround
for this particular issue is to force the wwn conversion functions to be
inlined.

[1] https://lists.01.org/pipermail/kbuild-all/2016-April/019579.html
[2] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70646

Reported-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
---
 include/scsi/scsi_transport_fc.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

Quinn Tran April 22, 2016, 11:17 p.m. UTC | #1

Current kernel (4.6.0-rc4+) + GCC 5.3.0 definitely truncated qla2x00_get_host_fabric_name() routine.  Just like Josh indicated, we’re dropping down to the next routine.


root@mars:/sys/class/fc_host/host3  2016-04-22 16:07:30
> cat fabric_name

Killed
——
static void
qla2x00_get_host_fabric_name(struct Scsi_Host *shost)
{
    32d0:       e8 00 00 00 00          callq  32d5 <qla2x00_get_host_fabric_name+0x5>
    32d5:       55                      push   %rbp
    32d6:       48 89 e5                mov    %rsp,%rbp
    32d9:       0f 1f 80 00 00 00 00    nopl   0x0(%rax)

00000000000032e0 <qla2x00_get_starget_node_name>:
qla2x00_get_starget_node_name():
/root/qt/linux.git/drivers/scsi/qla2xxx/qla_attr.c:1756
        fc_host_port_type(shost) = port_type;
}


----

Apr 22 16:07:50 mars kernel: BUG: unable to handle kernel NULL pointer dereference at 0000000000000058
Apr 22 16:07:50 mars kernel: IP: [<ffffffff813f72d7>] scsi_is_host_device+0x7/0x20
Apr 22 16:07:50 mars kernel: PGD 7fe1c8067 PUD 7f5c72067 PMD 0 
Apr 22 16:07:50 mars kernel: Oops: 0000 [#1] SMP 
Apr 22 16:07:50 mars kernel: Modules linked in: qla2xxx scsi_transport_fc ebtable_nat ebtables ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 
...
dm_mirror(E) dm_region_hash(E) dm_log(E) dm_mod(E) [last unloaded: qla2xxx]
Apr 22 16:07:50 mars kernel: CPU: 8 PID: 10452 Comm: cat Tainted: G            E   4.6.0-rc4+ #2
Apr 22 16:07:50 mars kernel: Hardware name: HP ProLiant DL380 G7, BIOS P67 05/05/2011
Apr 22 16:07:50 mars kernel: task: ffff8807fcd1a880 ti: ffff8807ff128000 task.ti: ffff8807ff128000
Apr 22 16:07:50 mars kernel: RIP: 0010:[<ffffffff813f72d7>]  [<ffffffff813f72d7>] scsi_is_host_device+0x7/0x20
Apr 22 16:07:50 mars kernel: RSP: 0018:ffff8807ff12bcf0  EFLAGS: 00010246
Apr 22 16:07:50 mars kernel: RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffff880ffe8ade88
Apr 22 16:07:50 mars kernel: RDX: ffff8807f5db9000 RSI: ffff880ffed43340 RDI: 0000000000000000
Apr 22 16:07:50 mars kernel: RBP: ffff8807ff12bd08 R08: ffff88101f45ac38 R09: ffff8807fccef280
Apr 22 16:07:50 mars kernel: R10: 0000000000000000 R11: 0000000000000000 R12: ffff88101b7e6000
Apr 22 16:07:50 mars kernel: R13: ffff8807fdaf1f00 R14: ffff8800dad379c0 R15: 0000000000000001
Apr 22 16:07:50 mars kernel: FS:  00007fd569512700(0000) GS:ffff88081fc80000(0000) knlGS:0000000000000000
Apr 22 16:07:50 mars kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Apr 22 16:07:50 mars kernel: CR2: 0000000000000058 CR3: 00000007f23f4000 CR4: 00000000000006e0
Apr 22 16:07:50 mars kernel: Stack:
Apr 22 16:07:50 mars kernel: ffffffffa0759db5 ffff88101b7e6000 ffff8807f5db9000 ffff8807ff12bd10
Apr 22 16:07:50 mars kernel: ffff8807ff12bd30 ffffffffa065b1bb ffff880ffed43340 ffffffff8166d950
Apr 22 16:07:50 mars kernel: ffff8807ff12bd50 ffffffff813c9e30 ffffffff815ccfb2 ffff8800dad379c0
Apr 22 16:07:50 mars kernel: Call Trace:
Apr 22 16:07:50 mars kernel: [<ffffffffa0759db5>] ? qla2x00_get_starget_node_name+0x25/0x90 [qla2xxx]
Apr 22 16:07:50 mars kernel: [<ffffffffa065b1bb>] ? show_fc_host_fabric_name+0x4b/0x80 [scsi_transport_fc]
Apr 22 16:07:50 mars kernel: [<ffffffff813c9e30>] ? dev_attr_show+0x20/0x50




Regards,
Quinn Tran






-----Original Message-----
From: <linux-scsi-owner@vger.kernel.org> on behalf of Josh Poimboeuf <jpoimboe@redhat.com>

Date: Tuesday, April 19, 2016 at 6:56 AM
To: James Bottomley <James.Bottomley@HansenPartnership.com>, "Martin K. Petersen" <martin.petersen@oracle.com>
Cc: linux-scsi <linux-scsi@vger.kernel.org>, linux-kernel <linux-kernel@vger.kernel.org>, Denys Vlasenko <dvlasenk@redhat.com>, Thomas Graf <tgraf@suug.ch>, Peter Zijlstra <peterz@infradead.org>, David Rientjes <rientjes@google.com>, Andrew Morton <akpm@linux-foundation.org>, Arnd Bergmann <arnd@arndb.de>, "jamborm@gcc.gnu.org" <jamborm@gcc.gnu.org>, Ingo Molnar <mingo@kernel.org>, Himanshu Madhani <himanshu.madhani@qlogic.com>, Dept-Eng QLA2xxx Upstream <qla2xxx-upstream@qlogic.com>
Subject: [PATCH] scsi: fc: force inlining of wwn conversion functions

>objtool reports [1] the following warning:

>

>  drivers/scsi/qla2xxx/qla_attr.o: warning: objtool: qla2x00_get_host_fabric_name() falls through to next function qla2x00_get_starget_port_name()

>

>This warning is due to a gcc bug [2] which causes corrupt code:

>

>  0000000000002f53 <qla2x00_get_host_fabric_name>:

>      2f53:       55                      push   %rbp

>      2f54:       48 89 e5                mov    %rsp,%rbp

>

>  0000000000002f57 <qla2x00_get_fc_host_stats>:

>      2f57:       55                      push   %rbp

>      2f58:       b9 e8 00 00 00          mov    $0xe8,%ecx

>      2f5d:       48 89 e5                mov    %rsp,%rbp

>  ...

>

>Note that qla2x00_get_host_fabric_name() is inexplicably truncated after

>setting up the frame pointer.  It falls through to the next function,

>which is very bad.

>

>It occurs with the combination of the following two recent commits:

>

>  bc27fb68aaad ("include/uapi/linux/byteorder, swab: force inlining of some byteswap operations")

>  ef3fb2422ffe ("scsi: fc: use get/put_unaligned64 for wwn access")

>

>The call chain which appears to trigger the problem is:

>

>  qla2x00_get_host_fabric_name()

>    wwn_to_u64()

>      get_unaligned_be64()

>        be64_to_cpup()

>          __be64_to_cpup()

>

>The bug requires very specific conditions to trigger.  According to Martin

>Jambor (from the gcc bugzilla):

>

>  "This bug can occur when an inlineable function containing a call to

>  __builtin_constant_p, which checks a parameter or a value it

>  references and a (possibly indirect) caller of the function actually

>  passes a constant, but stores it using a type of a different size."

>

>There's no reliable way to avoid (or even detect) the bug.  Until it

>gets fixed in released versions of gcc, the least intrusive workaround

>for this particular issue is to force the wwn conversion functions to be

>inlined.

>

>[1] https://lists.01.org/pipermail/kbuild-all/2016-April/019579.html

>[2] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70646

>

>Reported-by: kbuild test robot <fengguang.wu@intel.com>

>Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>

>---

> include/scsi/scsi_transport_fc.h | 4 ++--

> 1 file changed, 2 insertions(+), 2 deletions(-)

>

>diff --git a/include/scsi/scsi_transport_fc.h b/include/scsi/scsi_transport_fc.h

>index bf66ea6..1919cd4 100644

>--- a/include/scsi/scsi_transport_fc.h

>+++ b/include/scsi/scsi_transport_fc.h

>@@ -796,12 +796,12 @@ fc_remote_port_chkready(struct fc_rport *rport)

> 	return result;

> }

> 

>-static inline u64 wwn_to_u64(u8 *wwn)

>+static __always_inline u64 wwn_to_u64(u8 *wwn)

> {

> 	return get_unaligned_be64(wwn);

> }

> 

>-static inline void u64_to_wwn(u64 inm, u8 *wwn)

>+static __always_inline void u64_to_wwn(u64 inm, u8 *wwn)

> {

> 	put_unaligned_be64(inm, wwn);

> }

>-- 

>2.4.11

>

>--

>To unsubscribe from this list: send the line "unsubscribe linux-scsi" in

>the body of a message to majordomo@vger.kernel.org

>More majordomo info at  http://vger.kernel.org/majordomo-info.html

Josh Poimboeuf April 25, 2016, 4:07 p.m. UTC | #2

James, 

Can you merge this patch for 4.6?

On Tue, Apr 19, 2016 at 08:56:00AM -0500, Josh Poimboeuf wrote:
> objtool reports [1] the following warning:
> 
>   drivers/scsi/qla2xxx/qla_attr.o: warning: objtool: qla2x00_get_host_fabric_name() falls through to next function qla2x00_get_starget_port_name()
> 
> This warning is due to a gcc bug [2] which causes corrupt code:
> 
>   0000000000002f53 <qla2x00_get_host_fabric_name>:
>       2f53:       55                      push   %rbp
>       2f54:       48 89 e5                mov    %rsp,%rbp
> 
>   0000000000002f57 <qla2x00_get_fc_host_stats>:
>       2f57:       55                      push   %rbp
>       2f58:       b9 e8 00 00 00          mov    $0xe8,%ecx
>       2f5d:       48 89 e5                mov    %rsp,%rbp
>   ...
> 
> Note that qla2x00_get_host_fabric_name() is inexplicably truncated after
> setting up the frame pointer.  It falls through to the next function,
> which is very bad.
> 
> It occurs with the combination of the following two recent commits:
> 
>   bc27fb68aaad ("include/uapi/linux/byteorder, swab: force inlining of some byteswap operations")
>   ef3fb2422ffe ("scsi: fc: use get/put_unaligned64 for wwn access")
> 
> The call chain which appears to trigger the problem is:
> 
>   qla2x00_get_host_fabric_name()
>     wwn_to_u64()
>       get_unaligned_be64()
>         be64_to_cpup()
>           __be64_to_cpup()
> 
> The bug requires very specific conditions to trigger.  According to Martin
> Jambor (from the gcc bugzilla):
> 
>   "This bug can occur when an inlineable function containing a call to
>   __builtin_constant_p, which checks a parameter or a value it
>   references and a (possibly indirect) caller of the function actually
>   passes a constant, but stores it using a type of a different size."
> 
> There's no reliable way to avoid (or even detect) the bug.  Until it
> gets fixed in released versions of gcc, the least intrusive workaround
> for this particular issue is to force the wwn conversion functions to be
> inlined.
> 
> [1] https://lists.01.org/pipermail/kbuild-all/2016-April/019579.html
> [2] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70646
> 
> Reported-by: kbuild test robot <fengguang.wu@intel.com>
> Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
> ---
>  include/scsi/scsi_transport_fc.h | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/include/scsi/scsi_transport_fc.h b/include/scsi/scsi_transport_fc.h
> index bf66ea6..1919cd4 100644
> --- a/include/scsi/scsi_transport_fc.h
> +++ b/include/scsi/scsi_transport_fc.h
> @@ -796,12 +796,12 @@ fc_remote_port_chkready(struct fc_rport *rport)
>  	return result;
>  }
>  
> -static inline u64 wwn_to_u64(u8 *wwn)
> +static __always_inline u64 wwn_to_u64(u8 *wwn)
>  {
>  	return get_unaligned_be64(wwn);
>  }
>  
> -static inline void u64_to_wwn(u64 inm, u8 *wwn)
> +static __always_inline void u64_to_wwn(u64 inm, u8 *wwn)
>  {
>  	put_unaligned_be64(inm, wwn);
>  }
> -- 
> 2.4.11
>

Martin K. Petersen April 26, 2016, 2:40 a.m. UTC | #3

>>>>> "Josh" == Josh Poimboeuf <jpoimboe@redhat.com> writes:

Josh> Can you merge this patch for 4.6?

I am really not a big fan of working around compiler bugs in a device
driver.

Are we sure there are no other get_unaligned_be64() calls in the kernel
that suffer the same fate?

James Bottomley April 26, 2016, 3:37 a.m. UTC | #4

On Mon, 2016-04-25 at 22:40 -0400, Martin K. Petersen wrote:
> > > > > > "Josh" == Josh Poimboeuf <jpoimboe@redhat.com> writes:
> 
> Josh> Can you merge this patch for 4.6?
> 
> I am really not a big fan of working around compiler bugs in a device
> driver.

Me neither

> Are we sure there are no other get_unaligned_be64() calls in the
> kernel that suffer the same fate?

Agree, plus, as I've said before, we have 3-4 weeks before we go final,
so we still have some time before a decision has to be made.  It looks
like the gcc people already have a patch for the compiler, so the
distributions could just push that out through channels.

James

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Arnd Bergmann April 26, 2016, 7:22 a.m. UTC | #5

On Monday 25 April 2016 20:37:31 James Bottomley wrote:
> On Mon, 2016-04-25 at 22:40 -0400, Martin K. Petersen wrote:
> > > > > > > "Josh" == Josh Poimboeuf <jpoimboe@redhat.com> writes:
> > 
> > Josh> Can you merge this patch for 4.6?
> > 
> > I am really not a big fan of working around compiler bugs in a device
> > driver.
> 
> Me neither
> 
> > Are we sure there are no other get_unaligned_be64() calls in the
> > kernel that suffer the same fate?
> 
> Agree, plus, as I've said before, we have 3-4 weeks before we go final,
> so we still have some time before a decision has to be made.  It looks
> like the gcc people already have a patch for the compiler, so the
> distributions could just push that out through channels.

I don't think we can realistically blacklist gcc-4.9.{0,1,2,3},
gcc-5.{0,1,2,3}.* and gcc-6.0 and require everyone to upgrade to compilers
that have not been released yet in order to build a linux-4.6 kernel.

	Arnd
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Christoph Hellwig April 26, 2016, 8:35 a.m. UTC | #6

On Tue, Apr 26, 2016 at 09:22:46AM +0200, Arnd Bergmann wrote:
> > Agree, plus, as I've said before, we have 3-4 weeks before we go final,
> > so we still have some time before a decision has to be made.  It looks
> > like the gcc people already have a patch for the compiler, so the
> > distributions could just push that out through channels.
> 
> I don't think we can realistically blacklist gcc-4.9.{0,1,2,3},
> gcc-5.{0,1,2,3}.* and gcc-6.0 and require everyone to upgrade to compilers
> that have not been released yet in order to build a linux-4.6 kernel.

Agreed.  What about just removing the wrappers?  They seem fairly
pointless to start with.
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Martin K. Petersen April 26, 2016, 1:06 p.m. UTC | #7

>>>>> "Arnd" == Arnd Bergmann <arnd@arndb.de> writes:

Arnd> I don't think we can realistically blacklist gcc-4.9.{0,1,2,3},
Arnd> gcc-5.{0,1,2,3}.* and gcc-6.0 and require everyone to upgrade to
Arnd> compilers that have not been released yet in order to build a
Arnd> linux-4.6 kernel.

I agree that compiler blacklisting is problematic and I'd like to avoid
it. The question is how far we go in the kernel to accommodate various
levels of brokenness.

In any case. Sticking compiler workarounds in device driver code is akin
to putting demolition orders on display on Alpha Centauri. At the very
minimum the patch should put a fat comment in the code stating that
these wrapper functions or #defines should not be changed in the future
because that'll break builds using gcc XYZ. But that does not solve the
problem for anybody else that might be doing something similar.
Converting between u64 and $RANDOM_TYPE in an inline wrapper does not
seem like a rare and unusual programming pattern.

Arnd Bergmann April 26, 2016, 3:58 p.m. UTC | #8

On Tuesday 26 April 2016 09:06:54 Martin K. Petersen wrote:
> >>>>> "Arnd" == Arnd Bergmann <arnd@arndb.de> writes:
> 
> Arnd> I don't think we can realistically blacklist gcc-4.9.{0,1,2,3},
> Arnd> gcc-5.{0,1,2,3}.* and gcc-6.0 and require everyone to upgrade to
> Arnd> compilers that have not been released yet in order to build a
> Arnd> linux-4.6 kernel.
> 
> I agree that compiler blacklisting is problematic and I'd like to avoid
> it. The question is how far we go in the kernel to accommodate various
> levels of brokenness.
> 
> In any case. Sticking compiler workarounds in device driver code is akin
> to putting demolition orders on display on Alpha Centauri. At the very
> minimum the patch should put a fat comment in the code stating that
> these wrapper functions or #defines should not be changed in the future
> because that'll break builds using gcc XYZ. But that does not solve the
> problem for anybody else that might be doing something similar.
> Converting between u64 and $RANDOM_TYPE in an inline wrapper does not
> seem like a rare and unusual programming pattern.

It's not the driver really, it's the core scsi/fc layer, which makes
it a little dangerous that a random driver.

I agree that putting a comment in would also help. What I understand
from the bug report is that to trigger this bug you need these elements:

1. an inline function marked __always_inline
2. another inline function that is automatically inlined (not __always_inline)
3. CONFIG_OPTIMIZE_INLINING=y to guarantee 2
4. __builtin_compatible_p inside that inline function

The last point is what Denys introduced in the kernel with
bc27fb68aaad ("include/uapi/linux/byteorder, swab: force inlining of some
byteswap operations"). So maybe it's better after all to revert that
patch, to have a higher confidence in the same bug not appearing
elsewhere. It's also really a workaround for another quirk of the
compiler, but that one only results in duplicated functions in object
code rather than functions that end in the middle.

	Arnd

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

James Bottomley April 26, 2016, 10:36 p.m. UTC | #9

On Tue, 2016-04-26 at 17:58 +0200, Arnd Bergmann wrote:
> On Tuesday 26 April 2016 09:06:54 Martin K. Petersen wrote:
> > > > > > > "Arnd" == Arnd Bergmann <arnd@arndb.de> writes:
> > 
> > Arnd> I don't think we can realistically blacklist gcc
> > -4.9.{0,1,2,3},
> > Arnd> gcc-5.{0,1,2,3}.* and gcc-6.0 and require everyone to upgrade
> > to
> > Arnd> compilers that have not been released yet in order to build a
> > Arnd> linux-4.6 kernel.
> > 
> > I agree that compiler blacklisting is problematic and I'd like to 
> > avoid it. The question is how far we go in the kernel to 
> > accommodate various levels of brokenness.
> > 
> > In any case. Sticking compiler workarounds in device driver code is 
> > akin to putting demolition orders on display on Alpha Centauri. At 
> > the very minimum the patch should put a fat comment in the code 
> > stating that these wrapper functions or #defines should not be 
> > changed in the future because that'll break builds using gcc XYZ. 
> > But that does not solve the problem for anybody else that might be 
> > doing something similar. Converting between u64 and $RANDOM_TYPE in 
> > an inline wrapper does not seem like a rare and unusual programming
> > pattern.
> 
> It's not the driver really, it's the core scsi/fc layer, which makes
> it a little dangerous that a random driver.

Well, it's libfc; that's a fibre channel transport class mostly used by
FCoE drivers ... there's few enough of those to call it driver only.

> I agree that putting a comment in would also help. What I understand
> from the bug report is that to trigger this bug you need these
> elements:
> 
> 1. an inline function marked __always_inline
> 2. another inline function that is automatically inlined (not
> __always_inline)
> 3. CONFIG_OPTIMIZE_INLINING=y to guarantee 2
> 4. __builtin_compatible_p inside that inline function
> 
> The last point is what Denys introduced in the kernel with
> bc27fb68aaad ("include/uapi/linux/byteorder, swab: force inlining of 
> some byteswap operations"). So maybe it's better after all to revert 
> that patch, to have a higher confidence in the same bug not appearing
> elsewhere. It's also really a workaround for another quirk of the
> compiler, but that one only results in duplicated functions in object
> code rather than functions that end in the middle.

Yes, I think this is my preferred option.  That patch is nothing more
than an attempt to force the compiler to do something it didn't do but
should have.  If we apply the general rule that we shouldn't work
around compiler problems in the kernel code, then that should have been
disallowed first.  Plus, as the root cause of all of this, reverting
that patch will ensure that nothing else picks up this problem (at
least from the route we got it).

James


--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Martin K. Petersen April 27, 2016, 12:44 a.m. UTC | #10

>>>>> "James" == James Bottomley <James.Bottomley@HansenPartnership.com> writes:

>> The last point is what Denys introduced in the kernel with
>> bc27fb68aaad ("include/uapi/linux/byteorder, swab: force inlining of
>> some byteswap operations"). So maybe it's better after all to revert
>> that patch, to have a higher confidence in the same bug not appearing
>> elsewhere. It's also really a workaround for another quirk of the
>> compiler, but that one only results in duplicated functions in object
>> code rather than functions that end in the middle.

James> Yes, I think this is my preferred option.

Same here.

Martin Jambor April 27, 2016, 11:05 a.m. UTC | #11

Hi,

On Tue, Apr 26, 2016 at 05:58:20PM +0200, Arnd Bergmann wrote:
> On Tuesday 26 April 2016 09:06:54 Martin K. Petersen wrote:
> > >>>>> "Arnd" == Arnd Bergmann <arnd@arndb.de> writes:
> > 
> > Arnd> I don't think we can realistically blacklist gcc-4.9.{0,1,2,3},
> > Arnd> gcc-5.{0,1,2,3}.* and gcc-6.0 and require everyone to upgrade to
> > Arnd> compilers that have not been released yet in order to build a
> > Arnd> linux-4.6 kernel.
> > 
> > I agree that compiler blacklisting is problematic and I'd like to avoid
> > it. The question is how far we go in the kernel to accommodate various
> > levels of brokenness.
> > 
> > In any case. Sticking compiler workarounds in device driver code is akin
> > to putting demolition orders on display on Alpha Centauri. At the very
> > minimum the patch should put a fat comment in the code stating that
> > these wrapper functions or #defines should not be changed in the future
> > because that'll break builds using gcc XYZ. But that does not solve the
> > problem for anybody else that might be doing something similar.
> > Converting between u64 and $RANDOM_TYPE in an inline wrapper does not
> > seem like a rare and unusual programming pattern.
> 
> It's not the driver really, it's the core scsi/fc layer, which makes
> it a little dangerous that a random driver.
> 
> I agree that putting a comment in would also help. What I understand
> from the bug report is that to trigger this bug you need these elements:
> 
> 1. an inline function marked __always_inline
> 2. another inline function that is automatically inlined (not __always_inline)
> 3. CONFIG_OPTIMIZE_INLINING=y to guarantee 2
> 4. __builtin_compatible_p inside that inline function

The __always_inline requirement is not true.  In fact, if you look at
the example testcase filed in
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70646#c7 you'll see it
uses __builtin_compatible_p in an __always inline function that is
called from one that is not tagged with that attribute.

And generally speaking, always inline is never a requirement, any call
or chain of calls that the inliner can decide to inline can lead to
the bug (if it complies with the condition below).

What is a requirement, though, is that __builtin_compatible_p is
called on something passed in an argument by reference or in an
aggregate (i.e. struct or array) argument.

So,

  int foo1 (unsigned long *ref)
  {
    if (__builtin_constant (*ref))
      ...
    else
      /* wrongly unreachable code */
  }

can lead to this issue, as can

  int foo2 (struct S s)
  {
    if ((__builtin_constant (s.l))
      ...
    else
      /* wrongly unreachable code */
  }

but

  int foo3 (unsigned long val)
  {
    if (__builtin_constant (val))
      ...
    else
      /* all OK */
  }

cannot, and is fine.  But please note that wrapping a foo[12]-like
function into a dereferencing wrapper might not help if foo[12] would
be early-inlined into such wrapper (GCC has two inliners, a very
simple early-inliner that only handles simple cases and a full-blown
IPA inliner that contains the bug).  I believe this can be ensured by
making the wrapper always_inline and never calling it indirectly (via
a pointer).  Honza (CCed), you know inlining heuristics better, please
correct me if my last statement is somehow inaccurate (or indeed if
you have a better idea how kernel developers can make sure they do not
hit the bug).

Thanks,

Martin

> 
> The last point is what Denys introduced in the kernel with
> bc27fb68aaad ("include/uapi/linux/byteorder, swab: force inlining of some
> byteswap operations"). So maybe it's better after all to revert that
> patch, to have a higher confidence in the same bug not appearing
> elsewhere. It's also really a workaround for another quirk of the
> compiler, but that one only results in duplicated functions in object
> code rather than functions that end in the middle.
> 
> 	Arnd
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Arnd Bergmann April 27, 2016, 9:34 p.m. UTC | #12

On Wednesday 27 April 2016 13:05:03 Martin Jambor wrote:
> On Tue, Apr 26, 2016 at 05:58:20PM +0200, Arnd Bergmann wrote:
> > On Tuesday 26 April 2016 09:06:54 Martin K. Petersen wrote:
> > > >>>>> "Arnd" == Arnd Bergmann <arnd@arndb.de> writes:
> > > 
> > > Arnd> I don't think we can realistically blacklist gcc-4.9.{0,1,2,3},
> > > Arnd> gcc-5.{0,1,2,3}.* and gcc-6.0 and require everyone to upgrade to
> > > Arnd> compilers that have not been released yet in order to build a
> > > Arnd> linux-4.6 kernel.
> > > 
> > > I agree that compiler blacklisting is problematic and I'd like to avoid
> > > it. The question is how far we go in the kernel to accommodate various
> > > levels of brokenness.
> > > 
> > > In any case. Sticking compiler workarounds in device driver code is akin
> > > to putting demolition orders on display on Alpha Centauri. At the very
> > > minimum the patch should put a fat comment in the code stating that
> > > these wrapper functions or #defines should not be changed in the future
> > > because that'll break builds using gcc XYZ. But that does not solve the
> > > problem for anybody else that might be doing something similar.
> > > Converting between u64 and $RANDOM_TYPE in an inline wrapper does not
> > > seem like a rare and unusual programming pattern.
> > 
> > It's not the driver really, it's the core scsi/fc layer, which makes
> > it a little dangerous that a random driver.
> > 
> > I agree that putting a comment in would also help. What I understand
> > from the bug report is that to trigger this bug you need these elements:
> > 
> > 1. an inline function marked __always_inline
> > 2. another inline function that is automatically inlined (not __always_inline)
> > 3. CONFIG_OPTIMIZE_INLINING=y to guarantee 2
> > 4. __builtin_compatible_p inside that inline function
> 
> The __always_inline requirement is not true.  In fact, if you look at
> the example testcase filed in
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70646#c7 you'll see it
> uses __builtin_compatible_p in an __always inline function that is
> called from one that is not tagged with that attribute.
>
> And generally speaking, always inline is never a requirement, any call
> or chain of calls that the inliner can decide to inline can lead to
> the bug (if it complies with the condition below).

Ok, thanks for the clarification, I thought you always had to have both
kinds of inline functions.
 
> What is a requirement, though, is that __builtin_compatible_p is
> called on something passed in an argument by reference or in an
> aggregate (i.e. struct or array) argument.
> 
> So,
> 
>   int foo1 (unsigned long *ref)
>   {
>     if (__builtin_constant (*ref))
>       ...
>     else
>       /* wrongly unreachable code */
>   }
> 
>   }
> 
> cannot, and is fine.  But please note that wrapping a foo[12]-like
> function into a dereferencing wrapper might not help if foo[12] would
> be early-inlined into such wrapper (GCC has two inliners, a very
> simple early-inliner that only handles simple cases and a full-blown
> IPA inliner that contains the bug).  I believe this can be ensured by
> making the wrapper always_inline and never calling it indirectly (via
> a pointer).  Honza (CCed), you know inlining heuristics better, please
> correct me if my last statement is somehow inaccurate (or indeed if
> you have a better idea how kernel developers can make sure they do not
> hit the bug).

I guess that means that any user of this code in the kernel:

static inline __attribute_const__ __u64 __fswab64(__u64 val)
{
#ifdef __HAVE_BUILTIN_BSWAP64__
        return __builtin_bswap64(val);
#elif defined (__arch_swab64)
        return __arch_swab64(val);
#elif defined(__SWAB_64_THRU_32__)
        __u32 h = val >> 32;
        __u32 l = val & ((1ULL << 32) - 1);
        return (((__u64)__fswab32(l)) << 32) | ((__u64)(__fswab32(h)));
#else
        return ___constant_swab64(val);
#endif
}

#define __swab64(x)                             \
        (__builtin_constant_p((__u64)(x)) ?     \
        ___constant_swab64(x) :                 \
        __fswab64(x))

static __always_inline __u64 __swab64p(const __u64 *p)
{       
#ifdef __arch_swab64p
        return __arch_swab64p(p);
#else
        return __swab64(*p);
#endif
}

has a chance of running into the same problem, and we may want to solve
it at the root. For architectures that define __HAVE_BUILTIN_BSWAP64__
(i.e. ARM, MIPS, POWERPC, S390, and x86 with gcc-4.4 or higher, 4.8
for __HAVE_BUILTIN_BSWAP16__), we can probably just change the logic
to avoid __builtin_constant_p() and always use __builtin_bswap64().

This won't help on TILE, which is the one architecture that sets
ARCH_SUPPORTS_OPTIMIZED_INLINING but does not set ARCH_USE_BUILTIN_BSWAP.
Chris Metcalf should be able to figure out whether we can just
set ARCH_USE_BUILTIN_BSWAP for tile as well.

	Arnd
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Chris Metcalf April 28, 2016, 2:58 p.m. UTC | #13

(Resending as text/plain)

On 4/27/2016 5:34 PM, Arnd Bergmann wrote:
> This won't help on TILE, which is the one architecture that sets
> ARCH_SUPPORTS_OPTIMIZED_INLINING but does not set ARCH_USE_BUILTIN_BSWAP.
> Chris Metcalf should be able to figure out whether we can just
> set ARCH_USE_BUILTIN_BSWAP for tile as well.

We certainly could enable ARCH_USE_BUILTIN_BSWAP.  The only problem is
that we never added explicit support for bswap16() in gcc, which is
efficiently done on tilegx via the "revbytes" instruction and a 48-bit
right-shift.  So gcc instead does a generic thing with four
instructions in three bundles, so really not as good as our asm/swab.h.

I'm not sure how to weigh the implications of converting to
builtin_bswap16 (and possibly upstreaming a better implementation to
gcc), vs. disabling ARCH_SUPPORTS_OPTIMIZED_INLINING (which no one
else but x86 uses anyway), vs. just ignoring the compiler bug and
hoping it's not an issue in practice :-)

Arnd Bergmann April 28, 2016, 3:23 p.m. UTC | #14

On Thursday 28 April 2016 10:58:43 Chris Metcalf wrote:
> (Resending as text/plain)
> 
> On 4/27/2016 5:34 PM, Arnd Bergmann wrote:
> > This won't help on TILE, which is the one architecture that sets
> > ARCH_SUPPORTS_OPTIMIZED_INLINING but does not set ARCH_USE_BUILTIN_BSWAP.
> > Chris Metcalf should be able to figure out whether we can just
> > set ARCH_USE_BUILTIN_BSWAP for tile as well.
> 
> We certainly could enable ARCH_USE_BUILTIN_BSWAP.  The only problem is
> that we never added explicit support for bswap16() in gcc, which is
> efficiently done on tilegx via the "revbytes" instruction and a 48-bit
> right-shift.  So gcc instead does a generic thing with four
> instructions in three bundles, so really not as good as our asm/swab.h.
> 
> I'm not sure how to weigh the implications of converting to
> builtin_bswap16 (and possibly upstreaming a better implementation to
> gcc), vs. disabling ARCH_SUPPORTS_OPTIMIZED_INLINING (which no one
> else but x86 uses anyway), vs. just ignoring the compiler bug and
> hoping it's not an issue in practice 

How about figuring out whether you hit the gcc bug on tile as a
first step?

Another idea would be to adapt this section in include/linux/compiler-gcc.h:

#if !defined(CONFIG_ARCH_SUPPORTS_OPTIMIZED_INLINING) ||                \
    !defined(CONFIG_OPTIMIZE_INLINING) || (__GNUC__ < 4)
#define inline          inline          __attribute__((always_inline)) notrace
#define __inline__      __inline__      __attribute__((always_inline)) notrace
#define __inline        __inline        __attribute__((always_inline)) notrace
#else
/* A lot of inline functions can cause havoc with function tracing */
#define inline          inline          notrace
#define __inline__      __inline__      notrace
#define __inline        __inline        notrace
#endif

to work around the issue. We already check for gcc before 4.0, and
we could also check for the affected releases (4.9, 5.x, 6.1) in the
same place, possibly conditional on ARCH_USE_BUILTIN_BSWAP with
a comment pointing to the gcc bug tracker.

	Arnd
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Chris Metcalf April 28, 2016, 3:48 p.m. UTC | #15

On 4/28/2016 11:23 AM, Arnd Bergmann wrote:
> On Thursday 28 April 2016 10:58:43 Chris Metcalf wrote:
>> (Resending as text/plain)
>>
>> On 4/27/2016 5:34 PM, Arnd Bergmann wrote:
>>> This won't help on TILE, which is the one architecture that sets
>>> ARCH_SUPPORTS_OPTIMIZED_INLINING but does not set ARCH_USE_BUILTIN_BSWAP.
>>> Chris Metcalf should be able to figure out whether we can just
>>> set ARCH_USE_BUILTIN_BSWAP for tile as well.
>> We certainly could enable ARCH_USE_BUILTIN_BSWAP.  The only problem is
>> that we never added explicit support for bswap16() in gcc, which is
>> efficiently done on tilegx via the "revbytes" instruction and a 48-bit
>> right-shift.  So gcc instead does a generic thing with four
>> instructions in three bundles, so really not as good as our asm/swab.h.
>>
>> I'm not sure how to weigh the implications of converting to
>> builtin_bswap16 (and possibly upstreaming a better implementation to
>> gcc), vs. disabling ARCH_SUPPORTS_OPTIMIZED_INLINING (which no one
>> else but x86 uses anyway), vs. just ignoring the compiler bug and
>> hoping it's not an issue in practice
> How about figuring out whether you hit the gcc bug on tile as a
> first step?

I don't have an affected build of gcc handy (just 4.8 and 4.4).  I will pass
this to our compiler folks and see what they know.

> Another idea would be to adapt this section in include/linux/compiler-gcc.h:
>
> #if !defined(CONFIG_ARCH_SUPPORTS_OPTIMIZED_INLINING) ||                \
>      !defined(CONFIG_OPTIMIZE_INLINING) || (__GNUC__ < 4)
> #define inline          inline          __attribute__((always_inline)) notrace
> #define __inline__      __inline__      __attribute__((always_inline)) notrace
> #define __inline        __inline        __attribute__((always_inline)) notrace
> #else
> /* A lot of inline functions can cause havoc with function tracing */
> #define inline          inline          notrace
> #define __inline__      __inline__      notrace
> #define __inline        __inline        notrace
> #endif
>
> to work around the issue. We already check for gcc before 4.0, and
> we could also check for the affected releases (4.9, 5.x, 6.1) in the
> same place, possibly conditional on ARCH_USE_BUILTIN_BSWAP with
> a comment pointing to the gcc bug tracker.

This does seem like a more robust solution anyway, since more instances of the
bad inline pattern might get introduced in the future in other places.
I wouldn't make it conditional on ARCH_USE_BUILTIN_BSWAP for the same reason.

scsi: fc: force inlining of wwn conversion functions

Commit Message

Comments

Patch