Message ID | 1460716583-15673-1-git-send-email-dvlasenk@redhat.com (mailing list archive) |
---|---|
State | Changes Requested, archived |
Headers | show |
On Fri, 2016-04-15 at 12:36 +0200, Denys Vlasenko wrote: > More info here: > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70646 This bug is under investigation, so I'd rather not alter code for a gcc bug until we know if we can supply options to fix it rather than changing code. James > Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com> > CC: Himanshu Madhani <himanshu.madhani@qlogic.com> > CC: James Bottomley <James.Bottomley@HansenPartnership.com> > CC: qla2xxx-upstream@qlogic.com > CC: Josh Poimboeuf <jpoimboe@redhat.com> > CC: linux-scsi@vger.kernel.org > CC: linux-kernel@vger.kernel.org > --- > drivers/scsi/qla2xxx/qla_attr.c | 5 +++++ > 1 file changed, 5 insertions(+) > > diff --git a/drivers/scsi/qla2xxx/qla_attr.c > b/drivers/scsi/qla2xxx/qla_attr.c > index 4dc06a13..2dd9c72 100644 > --- a/drivers/scsi/qla2xxx/qla_attr.c > +++ b/drivers/scsi/qla2xxx/qla_attr.c > @@ -2003,9 +2003,14 @@ static void > qla2x00_get_host_fabric_name(struct Scsi_Host *shost) > { > scsi_qla_host_t *vha = shost_priv(shost); > + /* > + * This can trigger gcc 4.9/5.3 bug. > + * See https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70646 > uint8_t node_name[WWN_SIZE] = { 0xFF, 0xFF, 0xFF, 0xFF, \ > 0xFF, 0xFF, 0xFF, 0xFF}; > u64 fabric_name = wwn_to_u64(node_name); > + */ > + u64 fabric_name = (u64)(s64)-1; /* the same as above */ > > if (vha->device_flags & SWITCH_FOUND) > fabric_name = wwn_to_u64(vha->fabric_node_name); -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 04/15/2016 04:40 PM, James Bottomley wrote: > On Fri, 2016-04-15 at 12:36 +0200, Denys Vlasenko wrote: >> More info here: >> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70646 > > This bug is under investigation, so I'd rather not alter code for a gcc > bug until we know if we can supply options to fix it rather than > changing code. Background. The bug exists in gcc for 2 years, but it is rather hard to trigger, so nobody noticed. Unfortunately for kernel, these two commits landed in Linus tree in March 16 and 17: On 04/13/2016 05:36 AM, Josh Poimboeuf wrote: > It occurs with the combination of the following two recent commits: > > - bc27fb68aaad ("include/uapi/linux/byteorder, swab: force inlining of some byteswap operations") > - ef3fb2422ffe ("scsi: fc: use get/put_unaligned64 for wwn access") and now *many* users of qla2x00 and new-ish gcc are going to very much notice it, as their kernels will start crashing reliably. The commits can be reverted, sure, but they per se do not contain anything unusual. They, together with not very typical construct in qla2x00_get_host_fabric_name, one which boils down to "swab64p(constant_array_of_8_bytes)", just happen to nudge gcc in a right way to finally trigger the bug. So I came with another idea how to forestall the imminent deluge of qla2x00 oops reports - this patch. -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Fri, 2016-04-15 at 20:56 +0200, Denys Vlasenko wrote: > On 04/15/2016 04:40 PM, James Bottomley wrote: > > On Fri, 2016-04-15 at 12:36 +0200, Denys Vlasenko wrote: > > > More info here: > > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70646 > > > > This bug is under investigation, so I'd rather not alter code for a > > gcc > > bug until we know if we can supply options to fix it rather than > > changing code. > > > Background. The bug exists in gcc for 2 years, but it is rather > hard to trigger, so nobody noticed. We know this ... linux-scsi is on the cc for the other thread on this. > Unfortunately for kernel, these two commits landed in Linus tree > in March 16 and 17: > > > On 04/13/2016 05:36 AM, Josh Poimboeuf wrote: > > It occurs with the combination of the following two recent commits: > > > > - bc27fb68aaad ("include/uapi/linux/byteorder, swab: force inlining > > of some byteswap operations") > > - ef3fb2422ffe ("scsi: fc: use get/put_unaligned64 for wwn access") > > > and now *many* users of qla2x00 and new-ish gcc are going to > very much notice it, as their kernels will start crashing reliably. > > The commits can be reverted, sure, but they per se do not contain > anything unusual. They, together with not very typical construct > in qla2x00_get_host_fabric_name, one > which boils down to "swab64p(constant_array_of_8_bytes)", > just happen to nudge gcc in a right way to finally trigger the bug. > > So I came with another idea how to forestall the imminent deluge of > qla2x00 oops reports - this patch. There are actually a raft of checkers that run the upstream code which aren't seeing any problem; likely because the code is harder to trigger than you think. So, lets wait until the resolution of the other thread before we panic, especially since we're only at -rc3. James -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Fri, Apr 15, 2016 at 12:05:26PM -0700, James Bottomley wrote: > On Fri, 2016-04-15 at 20:56 +0200, Denys Vlasenko wrote: > > and now *many* users of qla2x00 and new-ish gcc are going to > > very much notice it, as their kernels will start crashing reliably. > > > > The commits can be reverted, sure, but they per se do not contain > > anything unusual. They, together with not very typical construct > > in qla2x00_get_host_fabric_name, one > > which boils down to "swab64p(constant_array_of_8_bytes)", > > just happen to nudge gcc in a right way to finally trigger the bug. > > > > So I came with another idea how to forestall the imminent deluge of > > qla2x00 oops reports - this patch. > > There are actually a raft of checkers that run the upstream code which > aren't seeing any problem; likely because the code is harder to trigger > than you think. So, lets wait until the resolution of the other thread > before we panic, especially since we're only at -rc3. Regardless of the outcome of the gcc bug, it seems kind of silly to byteswap a constant value of 0xffffffffffffffff. uint8_t node_name[WWN_SIZE] = { 0xFF, 0xFF, 0xFF, 0xFF, \ 0xFF, 0xFF, 0xFF, 0xFF}; u64 fabric_name = wwn_to_u64(node_name); Similar to what Denys suggested, it can just be: u64 fabric_name = -1; or u64 fabric_name = 0xffffffffffffffff; Wouldn't that be an improvement to the code regardless?
On 04/15/2016 09:05 PM, James Bottomley wrote: > On Fri, 2016-04-15 at 20:56 +0200, Denys Vlasenko wrote: >> On 04/15/2016 04:40 PM, James Bottomley wrote: >>> On Fri, 2016-04-15 at 12:36 +0200, Denys Vlasenko wrote: >>>> More info here: >>>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70646 >>> >>> This bug is under investigation, so I'd rather not alter code for a >>> gcc >>> bug until we know if we can supply options to fix it rather than >>> changing code. >> >> >> Background. The bug exists in gcc for 2 years, but it is rather >> hard to trigger, so nobody noticed. > > We know this ... linux-scsi is on the cc for the other thread on this. > >> Unfortunately for kernel, these two commits landed in Linus tree >> in March 16 and 17: >> >> >> On 04/13/2016 05:36 AM, Josh Poimboeuf wrote: >>> It occurs with the combination of the following two recent commits: >>> >>> - bc27fb68aaad ("include/uapi/linux/byteorder, swab: force inlining >>> of some byteswap operations") >>> - ef3fb2422ffe ("scsi: fc: use get/put_unaligned64 for wwn access") >> >> >> and now *many* users of qla2x00 and new-ish gcc are going to >> very much notice it, as their kernels will start crashing reliably. >> >> The commits can be reverted, sure, but they per se do not contain >> anything unusual. They, together with not very typical construct >> in qla2x00_get_host_fabric_name, one >> which boils down to "swab64p(constant_array_of_8_bytes)", >> just happen to nudge gcc in a right way to finally trigger the bug. >> >> So I came with another idea how to forestall the imminent deluge of >> qla2x00 oops reports - this patch. > > There are actually a raft of checkers that run the upstream code which > aren't seeing any problem; likely because the code is harder to trigger > than you think. So, lets wait until the resolution of the other thread > before we panic, especially since we're only at -rc3. I'm not panicking, James. By sending a workaround, I just want to make sure that *other people* won't be forced to fix up a problem which surfaced because of *my* patch. I'm afraid "harder to trigger than you think" is not true. It is nearly trivial to trigger it now. I just tried the following on a freshly installed Fedora 21 machine: $ git clone git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git $ cd linux $ make defconfig $ sed '/SCSI_FC_ATTRS/d;/SCSI_LOWLEVEL/d' -i .config $ make oldconfig # answer "yes" to everything $ nice make -j22 $ objdump -dr drivers/scsi/qla2xxx/qla_attr.o | grep -A10 qla2x00_get_host_fabric_name 0000000000001540 <qla2x00_get_host_fabric_name>: 1540: 55 push %rbp 1541: 48 89 e5 mov %rsp,%rbp 1544: 66 66 66 2e 0f 1f 84 data32 data32 nopw %cs:0x0(%rax,%rax,1) 154b: 00 00 00 00 00 0000000000001550 <qla2x00_fw_state_show>: 1550: 55 push %rbp 1551: 48 89 e5 mov %rsp,%rbp 1554: 53 push %rbx 1555: 48 89 d3 mov %rdx,%rbx See? I'm sure Fedora 22, 23 and 24 will also do that. -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Fri, 2016-04-15 at 15:02 -0500, Josh Poimboeuf wrote: > On Fri, Apr 15, 2016 at 12:05:26PM -0700, James Bottomley wrote: > > On Fri, 2016-04-15 at 20:56 +0200, Denys Vlasenko wrote: > > > and now *many* users of qla2x00 and new-ish gcc are going to > > > very much notice it, as their kernels will start crashing > > > reliably. > > > > > > The commits can be reverted, sure, but they per se do not contain > > > anything unusual. They, together with not very typical construct > > > in qla2x00_get_host_fabric_name, one > > > which boils down to "swab64p(constant_array_of_8_bytes)", > > > just happen to nudge gcc in a right way to finally trigger the > > > bug. > > > > > > So I came with another idea how to forestall the imminent deluge > > > of > > > qla2x00 oops reports - this patch. > > > > There are actually a raft of checkers that run the upstream code > > which > > aren't seeing any problem; likely because the code is harder to > > trigger > > than you think. So, lets wait until the resolution of the other > > thread > > before we panic, especially since we're only at -rc3. > > Regardless of the outcome of the gcc bug, it seems kind of silly to > byteswap a constant value of 0xffffffffffffffff. > > uint8_t node_name[WWN_SIZE] = { 0xFF, 0xFF, 0xFF, 0xFF, \ > 0xFF, 0xFF, 0xFF, 0xFF}; > u64 fabric_name = wwn_to_u64(node_name); > > Similar to what Denys suggested, it can just be: > > u64 fabric_name = -1; > or > u64 fabric_name = 0xffffffffffffffff; > > Wouldn't that be an improvement to the code regardless? "Improvement" would be in the eye of the beholder. Semantically it would be wrong because we're initialising a CPU local representation of a big endian structure, so we *should* use the conversion. James -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Fri, 2016-04-15 at 23:09 +0200, Denys Vlasenko wrote: > On 04/15/2016 09:05 PM, James Bottomley wrote: > > On Fri, 2016-04-15 at 20:56 +0200, Denys Vlasenko wrote: > > > On 04/15/2016 04:40 PM, James Bottomley wrote: > > > > On Fri, 2016-04-15 at 12:36 +0200, Denys Vlasenko wrote: > > > > > More info here: > > > > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70646 > > > > > > > > This bug is under investigation, so I'd rather not alter code > > > > for a > > > > gcc > > > > bug until we know if we can supply options to fix it rather > > > > than > > > > changing code. > > > > > > > > > Background. The bug exists in gcc for 2 years, but it is rather > > > hard to trigger, so nobody noticed. > > > > We know this ... linux-scsi is on the cc for the other thread on > > this. > > > > > Unfortunately for kernel, these two commits landed in Linus tree > > > in March 16 and 17: > > > > > > > > > On 04/13/2016 05:36 AM, Josh Poimboeuf wrote: > > > > It occurs with the combination of the following two recent > > > > commits: > > > > > > > > - bc27fb68aaad ("include/uapi/linux/byteorder, swab: force > > > > inlining > > > > of some byteswap operations") > > > > - ef3fb2422ffe ("scsi: fc: use get/put_unaligned64 for wwn > > > > access") > > > > > > > > > and now *many* users of qla2x00 and new-ish gcc are going to > > > very much notice it, as their kernels will start crashing > > > reliably. > > > > > > The commits can be reverted, sure, but they per se do not contain > > > anything unusual. They, together with not very typical construct > > > in qla2x00_get_host_fabric_name, one > > > which boils down to "swab64p(constant_array_of_8_bytes)", > > > just happen to nudge gcc in a right way to finally trigger the > > > bug. > > > > > > So I came with another idea how to forestall the imminent deluge > > > of > > > qla2x00 oops reports - this patch. > > > > There are actually a raft of checkers that run the upstream code > > which > > aren't seeing any problem; likely because the code is harder to > > trigger > > than you think. So, lets wait until the resolution of the other > > thread > > before we panic, especially since we're only at -rc3. > > I'm not panicking, James. > > By sending a workaround, I just want to make sure that *other people* > won't be forced to fix up a problem which surfaced because of *my* > patch. Look, if gcc really proves to be intractable, I think what should happen is revert the triggering patch, which is commit e3bde9568d992c5f985e6e30731a5f9f9bef7b13 Author: Denys Vlasenko <dvlasenk@redhat.com> Date: Thu Mar 17 14:22:47 2016 -0700 include/linux/unaligned: force inlining of byteswap operations But, as I've said a couple of times now, there are no bug reports from the testers about qla2xxx (yet) so we can afford to wait a bit and see if there's some other resolution that doesn't involve changing kernel code to work around a local gcc bug. James -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/drivers/scsi/qla2xxx/qla_attr.c b/drivers/scsi/qla2xxx/qla_attr.c index 4dc06a13..2dd9c72 100644 --- a/drivers/scsi/qla2xxx/qla_attr.c +++ b/drivers/scsi/qla2xxx/qla_attr.c @@ -2003,9 +2003,14 @@ static void qla2x00_get_host_fabric_name(struct Scsi_Host *shost) { scsi_qla_host_t *vha = shost_priv(shost); + /* + * This can trigger gcc 4.9/5.3 bug. + * See https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70646 uint8_t node_name[WWN_SIZE] = { 0xFF, 0xFF, 0xFF, 0xFF, \ 0xFF, 0xFF, 0xFF, 0xFF}; u64 fabric_name = wwn_to_u64(node_name); + */ + u64 fabric_name = (u64)(s64)-1; /* the same as above */ if (vha->device_flags & SWITCH_FOUND) fabric_name = wwn_to_u64(vha->fabric_node_name);
More info here: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70646 Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com> CC: Himanshu Madhani <himanshu.madhani@qlogic.com> CC: James Bottomley <James.Bottomley@HansenPartnership.com> CC: qla2xxx-upstream@qlogic.com CC: Josh Poimboeuf <jpoimboe@redhat.com> CC: linux-scsi@vger.kernel.org CC: linux-kernel@vger.kernel.org --- drivers/scsi/qla2xxx/qla_attr.c | 5 +++++ 1 file changed, 5 insertions(+)