diff mbox

qla2xxx: rewrite code to avoid hitting gcc bug 70646

Message ID 1460716583-15673-1-git-send-email-dvlasenk@redhat.com (mailing list archive)
State Changes Requested, archived
Headers show

Commit Message

Denys Vlasenko April 15, 2016, 10:36 a.m. UTC
More info here:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70646

Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
CC: Himanshu Madhani <himanshu.madhani@qlogic.com>
CC: James Bottomley <James.Bottomley@HansenPartnership.com>
CC: qla2xxx-upstream@qlogic.com
CC: Josh Poimboeuf <jpoimboe@redhat.com>
CC: linux-scsi@vger.kernel.org
CC: linux-kernel@vger.kernel.org
---
 drivers/scsi/qla2xxx/qla_attr.c | 5 +++++
 1 file changed, 5 insertions(+)

Comments

James Bottomley April 15, 2016, 2:40 p.m. UTC | #1
On Fri, 2016-04-15 at 12:36 +0200, Denys Vlasenko wrote:
> More info here:
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70646

This bug is under investigation, so I'd rather not alter code for a gcc
bug until we know if we can supply options to fix it rather than
changing code.

James


> Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
> CC: Himanshu Madhani <himanshu.madhani@qlogic.com>
> CC: James Bottomley <James.Bottomley@HansenPartnership.com>
> CC: qla2xxx-upstream@qlogic.com
> CC: Josh Poimboeuf <jpoimboe@redhat.com>
> CC: linux-scsi@vger.kernel.org
> CC: linux-kernel@vger.kernel.org
> ---
>  drivers/scsi/qla2xxx/qla_attr.c | 5 +++++
>  1 file changed, 5 insertions(+)
> 
> diff --git a/drivers/scsi/qla2xxx/qla_attr.c
> b/drivers/scsi/qla2xxx/qla_attr.c
> index 4dc06a13..2dd9c72 100644
> --- a/drivers/scsi/qla2xxx/qla_attr.c
> +++ b/drivers/scsi/qla2xxx/qla_attr.c
> @@ -2003,9 +2003,14 @@ static void
>  qla2x00_get_host_fabric_name(struct Scsi_Host *shost)
>  {
>  	scsi_qla_host_t *vha = shost_priv(shost);
> +	/*
> +	 * This can trigger gcc 4.9/5.3 bug.
> +	 * See https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70646
>  	uint8_t node_name[WWN_SIZE] = { 0xFF, 0xFF, 0xFF, 0xFF, \
>  		0xFF, 0xFF, 0xFF, 0xFF};
>  	u64 fabric_name = wwn_to_u64(node_name);
> +	 */
> +	u64 fabric_name = (u64)(s64)-1; /* the same as above */
>  
>  	if (vha->device_flags & SWITCH_FOUND)
>  		fabric_name = wwn_to_u64(vha->fabric_node_name);

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Denys Vlasenko April 15, 2016, 6:56 p.m. UTC | #2
On 04/15/2016 04:40 PM, James Bottomley wrote:
> On Fri, 2016-04-15 at 12:36 +0200, Denys Vlasenko wrote:
>> More info here:
>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70646
> 
> This bug is under investigation, so I'd rather not alter code for a gcc
> bug until we know if we can supply options to fix it rather than
> changing code.


Background. The bug exists in gcc for 2 years, but it is rather
hard to trigger, so nobody noticed.

Unfortunately for kernel, these two commits landed in Linus tree
in March 16 and 17:


On 04/13/2016 05:36 AM, Josh Poimboeuf wrote:
> It occurs with the combination of the following two recent commits:
>
> - bc27fb68aaad ("include/uapi/linux/byteorder, swab: force inlining of some byteswap operations")
> - ef3fb2422ffe ("scsi: fc: use get/put_unaligned64 for wwn access")


and now *many* users of qla2x00 and new-ish gcc are going to
very much notice it, as their kernels will start crashing reliably.

The commits can be reverted, sure, but they per se do not contain
anything unusual. They, together with not very typical construct
in qla2x00_get_host_fabric_name, one
which boils down to "swab64p(constant_array_of_8_bytes)",
just happen to nudge gcc in a right way to finally trigger the bug.

So I came with another idea how to forestall the imminent deluge of
qla2x00 oops reports - this patch.
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
James Bottomley April 15, 2016, 7:05 p.m. UTC | #3
On Fri, 2016-04-15 at 20:56 +0200, Denys Vlasenko wrote:
> On 04/15/2016 04:40 PM, James Bottomley wrote:
> > On Fri, 2016-04-15 at 12:36 +0200, Denys Vlasenko wrote:
> > > More info here:
> > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70646
> > 
> > This bug is under investigation, so I'd rather not alter code for a
> > gcc
> > bug until we know if we can supply options to fix it rather than
> > changing code.
> 
> 
> Background. The bug exists in gcc for 2 years, but it is rather
> hard to trigger, so nobody noticed.

We know this ... linux-scsi is on the cc for the other thread on this.

> Unfortunately for kernel, these two commits landed in Linus tree
> in March 16 and 17:
> 
> 
> On 04/13/2016 05:36 AM, Josh Poimboeuf wrote:
> > It occurs with the combination of the following two recent commits:
> > 
> > - bc27fb68aaad ("include/uapi/linux/byteorder, swab: force inlining
> > of some byteswap operations")
> > - ef3fb2422ffe ("scsi: fc: use get/put_unaligned64 for wwn access")
> 
> 
> and now *many* users of qla2x00 and new-ish gcc are going to
> very much notice it, as their kernels will start crashing reliably.
> 
> The commits can be reverted, sure, but they per se do not contain
> anything unusual. They, together with not very typical construct
> in qla2x00_get_host_fabric_name, one
> which boils down to "swab64p(constant_array_of_8_bytes)",
> just happen to nudge gcc in a right way to finally trigger the bug.
> 
> So I came with another idea how to forestall the imminent deluge of
> qla2x00 oops reports - this patch.

There are actually a raft of checkers that run the upstream code which
aren't seeing any problem; likely because the code is harder to trigger
than you think.  So, lets wait until the resolution of the other thread
before we panic, especially since we're only at -rc3.

James


--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Josh Poimboeuf April 15, 2016, 8:02 p.m. UTC | #4
On Fri, Apr 15, 2016 at 12:05:26PM -0700, James Bottomley wrote:
> On Fri, 2016-04-15 at 20:56 +0200, Denys Vlasenko wrote:
> > and now *many* users of qla2x00 and new-ish gcc are going to
> > very much notice it, as their kernels will start crashing reliably.
> > 
> > The commits can be reverted, sure, but they per se do not contain
> > anything unusual. They, together with not very typical construct
> > in qla2x00_get_host_fabric_name, one
> > which boils down to "swab64p(constant_array_of_8_bytes)",
> > just happen to nudge gcc in a right way to finally trigger the bug.
> > 
> > So I came with another idea how to forestall the imminent deluge of
> > qla2x00 oops reports - this patch.
> 
> There are actually a raft of checkers that run the upstream code which
> aren't seeing any problem; likely because the code is harder to trigger
> than you think.  So, lets wait until the resolution of the other thread
> before we panic, especially since we're only at -rc3.

Regardless of the outcome of the gcc bug, it seems kind of silly to
byteswap a constant value of 0xffffffffffffffff.

	uint8_t node_name[WWN_SIZE] = { 0xFF, 0xFF, 0xFF, 0xFF, \
		0xFF, 0xFF, 0xFF, 0xFF};
	u64 fabric_name = wwn_to_u64(node_name);

Similar to what Denys suggested, it can just be:

	u64 fabric_name = -1;
or
	u64 fabric_name = 0xffffffffffffffff;

Wouldn't that be an improvement to the code regardless?
Denys Vlasenko April 15, 2016, 9:09 p.m. UTC | #5
On 04/15/2016 09:05 PM, James Bottomley wrote:
> On Fri, 2016-04-15 at 20:56 +0200, Denys Vlasenko wrote:
>> On 04/15/2016 04:40 PM, James Bottomley wrote:
>>> On Fri, 2016-04-15 at 12:36 +0200, Denys Vlasenko wrote:
>>>> More info here:
>>>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70646
>>>
>>> This bug is under investigation, so I'd rather not alter code for a
>>> gcc
>>> bug until we know if we can supply options to fix it rather than
>>> changing code.
>>
>>
>> Background. The bug exists in gcc for 2 years, but it is rather
>> hard to trigger, so nobody noticed.
> 
> We know this ... linux-scsi is on the cc for the other thread on this.
> 
>> Unfortunately for kernel, these two commits landed in Linus tree
>> in March 16 and 17:
>>
>>
>> On 04/13/2016 05:36 AM, Josh Poimboeuf wrote:
>>> It occurs with the combination of the following two recent commits:
>>>
>>> - bc27fb68aaad ("include/uapi/linux/byteorder, swab: force inlining
>>> of some byteswap operations")
>>> - ef3fb2422ffe ("scsi: fc: use get/put_unaligned64 for wwn access")
>>
>>
>> and now *many* users of qla2x00 and new-ish gcc are going to
>> very much notice it, as their kernels will start crashing reliably.
>>
>> The commits can be reverted, sure, but they per se do not contain
>> anything unusual. They, together with not very typical construct
>> in qla2x00_get_host_fabric_name, one
>> which boils down to "swab64p(constant_array_of_8_bytes)",
>> just happen to nudge gcc in a right way to finally trigger the bug.
>>
>> So I came with another idea how to forestall the imminent deluge of
>> qla2x00 oops reports - this patch.
> 
> There are actually a raft of checkers that run the upstream code which
> aren't seeing any problem; likely because the code is harder to trigger
> than you think.  So, lets wait until the resolution of the other thread
> before we panic, especially since we're only at -rc3.

I'm not panicking, James.

By sending a workaround, I just want to make sure that *other people*
won't be forced to fix up a problem which surfaced because of *my* patch.


I'm afraid "harder to trigger than you think" is not true.
It is nearly trivial to trigger it now.
I just tried the following on a freshly installed Fedora 21 machine:

$ git clone git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
$ cd linux
$ make defconfig
$ sed '/SCSI_FC_ATTRS/d;/SCSI_LOWLEVEL/d' -i .config
$ make oldconfig    # answer "yes" to everything
$ nice make -j22
$ objdump -dr drivers/scsi/qla2xxx/qla_attr.o | grep -A10 qla2x00_get_host_fabric_name

0000000000001540 <qla2x00_get_host_fabric_name>:
    1540:	55                   	push   %rbp
    1541:	48 89 e5             	mov    %rsp,%rbp
    1544:	66 66 66 2e 0f 1f 84 	data32 data32 nopw %cs:0x0(%rax,%rax,1)
    154b:	00 00 00 00 00

0000000000001550 <qla2x00_fw_state_show>:
    1550:	55                   	push   %rbp
    1551:	48 89 e5             	mov    %rsp,%rbp
    1554:	53                   	push   %rbx
    1555:	48 89 d3             	mov    %rdx,%rbx

See?
I'm sure Fedora 22, 23 and 24 will also do that.

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
James Bottomley April 15, 2016, 9:15 p.m. UTC | #6
On Fri, 2016-04-15 at 15:02 -0500, Josh Poimboeuf wrote:
> On Fri, Apr 15, 2016 at 12:05:26PM -0700, James Bottomley wrote:
> > On Fri, 2016-04-15 at 20:56 +0200, Denys Vlasenko wrote:
> > > and now *many* users of qla2x00 and new-ish gcc are going to
> > > very much notice it, as their kernels will start crashing
> > > reliably.
> > > 
> > > The commits can be reverted, sure, but they per se do not contain
> > > anything unusual. They, together with not very typical construct
> > > in qla2x00_get_host_fabric_name, one
> > > which boils down to "swab64p(constant_array_of_8_bytes)",
> > > just happen to nudge gcc in a right way to finally trigger the
> > > bug.
> > > 
> > > So I came with another idea how to forestall the imminent deluge
> > > of
> > > qla2x00 oops reports - this patch.
> > 
> > There are actually a raft of checkers that run the upstream code
> > which
> > aren't seeing any problem; likely because the code is harder to
> > trigger
> > than you think.  So, lets wait until the resolution of the other
> > thread
> > before we panic, especially since we're only at -rc3.
> 
> Regardless of the outcome of the gcc bug, it seems kind of silly to
> byteswap a constant value of 0xffffffffffffffff.
> 
> 	uint8_t node_name[WWN_SIZE] = { 0xFF, 0xFF, 0xFF, 0xFF, \
> 		0xFF, 0xFF, 0xFF, 0xFF};
> 	u64 fabric_name = wwn_to_u64(node_name);
> 
> Similar to what Denys suggested, it can just be:
> 
> 	u64 fabric_name = -1;
> or
> 	u64 fabric_name = 0xffffffffffffffff;
> 
> Wouldn't that be an improvement to the code regardless?

"Improvement" would be in the eye of the beholder.  Semantically it
would be wrong because we're initialising a CPU local representation of
a big endian structure, so we *should* use the conversion.

James

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
James Bottomley April 15, 2016, 9:25 p.m. UTC | #7
On Fri, 2016-04-15 at 23:09 +0200, Denys Vlasenko wrote:
> On 04/15/2016 09:05 PM, James Bottomley wrote:
> > On Fri, 2016-04-15 at 20:56 +0200, Denys Vlasenko wrote:
> > > On 04/15/2016 04:40 PM, James Bottomley wrote:
> > > > On Fri, 2016-04-15 at 12:36 +0200, Denys Vlasenko wrote:
> > > > > More info here:
> > > > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70646
> > > > 
> > > > This bug is under investigation, so I'd rather not alter code
> > > > for a
> > > > gcc
> > > > bug until we know if we can supply options to fix it rather
> > > > than
> > > > changing code.
> > > 
> > > 
> > > Background. The bug exists in gcc for 2 years, but it is rather
> > > hard to trigger, so nobody noticed.
> > 
> > We know this ... linux-scsi is on the cc for the other thread on
> > this.
> > 
> > > Unfortunately for kernel, these two commits landed in Linus tree
> > > in March 16 and 17:
> > > 
> > > 
> > > On 04/13/2016 05:36 AM, Josh Poimboeuf wrote:
> > > > It occurs with the combination of the following two recent
> > > > commits:
> > > > 
> > > > - bc27fb68aaad ("include/uapi/linux/byteorder, swab: force
> > > > inlining
> > > > of some byteswap operations")
> > > > - ef3fb2422ffe ("scsi: fc: use get/put_unaligned64 for wwn
> > > > access")
> > > 
> > > 
> > > and now *many* users of qla2x00 and new-ish gcc are going to
> > > very much notice it, as their kernels will start crashing
> > > reliably.
> > > 
> > > The commits can be reverted, sure, but they per se do not contain
> > > anything unusual. They, together with not very typical construct
> > > in qla2x00_get_host_fabric_name, one
> > > which boils down to "swab64p(constant_array_of_8_bytes)",
> > > just happen to nudge gcc in a right way to finally trigger the
> > > bug.
> > > 
> > > So I came with another idea how to forestall the imminent deluge
> > > of
> > > qla2x00 oops reports - this patch.
> > 
> > There are actually a raft of checkers that run the upstream code
> > which
> > aren't seeing any problem; likely because the code is harder to
> > trigger
> > than you think.  So, lets wait until the resolution of the other
> > thread
> > before we panic, especially since we're only at -rc3.
> 
> I'm not panicking, James.
> 
> By sending a workaround, I just want to make sure that *other people*
> won't be forced to fix up a problem which surfaced because of *my* 
> patch.

Look, if gcc really proves to be intractable, I think what should
happen is revert the triggering patch, which is

commit e3bde9568d992c5f985e6e30731a5f9f9bef7b13
Author: Denys Vlasenko <dvlasenk@redhat.com>
Date:   Thu Mar 17 14:22:47 2016 -0700

    include/linux/unaligned: force inlining of byteswap operations

But, as I've said a couple of times now, there are no bug reports from
the testers about qla2xxx (yet) so we can afford to wait a bit and see
if there's some other resolution that doesn't involve changing kernel
code to work around a local gcc bug.

James

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/drivers/scsi/qla2xxx/qla_attr.c b/drivers/scsi/qla2xxx/qla_attr.c
index 4dc06a13..2dd9c72 100644
--- a/drivers/scsi/qla2xxx/qla_attr.c
+++ b/drivers/scsi/qla2xxx/qla_attr.c
@@ -2003,9 +2003,14 @@  static void
 qla2x00_get_host_fabric_name(struct Scsi_Host *shost)
 {
 	scsi_qla_host_t *vha = shost_priv(shost);
+	/*
+	 * This can trigger gcc 4.9/5.3 bug.
+	 * See https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70646
 	uint8_t node_name[WWN_SIZE] = { 0xFF, 0xFF, 0xFF, 0xFF, \
 		0xFF, 0xFF, 0xFF, 0xFF};
 	u64 fabric_name = wwn_to_u64(node_name);
+	 */
+	u64 fabric_name = (u64)(s64)-1; /* the same as above */
 
 	if (vha->device_flags & SWITCH_FOUND)
 		fabric_name = wwn_to_u64(vha->fabric_node_name);