Message ID | 1476699850-25083-5-git-send-email-sumit.saxena@broadcom.com (mailing list archive) |
---|---|
State | Changes Requested, archived |
Headers | show |
On 10/17/2016 12:24 PM, Sumit Saxena wrote: > megaraid_sas driver returns SYNCHRONIZE_CACHE command back to SCSI layer > without sending it to firmware as firmware takes care of flushing cache. > This patch will change the driver behavior wrt SYNCHRONIZE_CACHE handling. > If underlying firmware has support to handle the SYNCHORNIZE_CACHE, driver > will send it for firmware otherwise complete it back to SCSI layer with > SUCCESS immediately. > If Firmware handle SYNCHORNIZE_CACHE for both VD and JBOD > "canHandleSyncCache" bit in scratch pad register(offset 0x00B4) will be > set. > > This behavior can be controlled via module parameter and user can fallback > to old behavior of returning SYNCHRONIZE_CACHE by driver only without > sending it to firmware. > > Signed-off-by: Sumit Saxena <sumit.saxena@broadcom.com> > Signed-off-by: Kashyap Desai <kashyap.desai@broadcom.com> > --- > drivers/scsi/megaraid/megaraid_sas.h | 3 +++ > drivers/scsi/megaraid/megaraid_sas_base.c | 14 ++++++-------- > drivers/scsi/megaraid/megaraid_sas_fusion.c | 3 +++ > drivers/scsi/megaraid/megaraid_sas_fusion.h | 2 ++ > 4 files changed, 14 insertions(+), 8 deletions(-) > > diff --git a/drivers/scsi/megaraid/megaraid_sas.h b/drivers/scsi/megaraid/megaraid_sas.h > index ca86c88..43fd14f 100644 > --- a/drivers/scsi/megaraid/megaraid_sas.h > +++ b/drivers/scsi/megaraid/megaraid_sas.h > @@ -1429,6 +1429,8 @@ enum FW_BOOT_CONTEXT { > #define MR_MAX_REPLY_QUEUES_EXT_OFFSET_SHIFT 14 > #define MR_MAX_MSIX_REG_ARRAY 16 > #define MR_RDPQ_MODE_OFFSET 0X00800000 > +#define MR_CAN_HANDLE_SYNC_CACHE_OFFSET 0X01000000 > + > /* > * register set for both 1068 and 1078 controllers > * structure extended for 1078 registers > @@ -2140,6 +2142,7 @@ struct megasas_instance { > u8 is_imr; > u8 is_rdpq; > bool dev_handle; > + bool fw_sync_cache_support; > }; > struct MR_LD_VF_MAP { > u32 size; > diff --git a/drivers/scsi/megaraid/megaraid_sas_base.c b/drivers/scsi/megaraid/megaraid_sas_base.c > index 092387f..a4a8e2f 100644 > --- a/drivers/scsi/megaraid/megaraid_sas_base.c > +++ b/drivers/scsi/megaraid/megaraid_sas_base.c > @@ -104,6 +104,10 @@ unsigned int scmd_timeout = MEGASAS_DEFAULT_CMD_TIMEOUT; > module_param(scmd_timeout, int, S_IRUGO); > MODULE_PARM_DESC(scmd_timeout, "scsi command timeout (10-90s), default 90s. See megasas_reset_timer."); > > +bool block_sync_cache; > +module_param(block_sync_cache, bool, S_IRUGO); > +MODULE_PARM_DESC(block_sync_cache, "Block SYNC CACHE by driver Default: 0(Send it to firmware)"); > + > MODULE_LICENSE("GPL"); > MODULE_VERSION(MEGASAS_VERSION); > MODULE_AUTHOR("megaraidlinux.pdl@avagotech.com"); > @@ -1700,16 +1704,10 @@ megasas_queue_command(struct Scsi_Host *shost, struct scsi_cmnd *scmd) > goto out_done; > } > > - switch (scmd->cmnd[0]) { > - case SYNCHRONIZE_CACHE: > - /* > - * FW takes care of flush cache on its own > - * No need to send it down > - */ > + if ((scmd->cmnd[0] == SYNCHRONIZE_CACHE) && > + (!instance->fw_sync_cache_support)) { > scmd->result = DID_OK << 16; > goto out_done; > - default: > - break; > } > > return instance->instancet->build_and_issue_cmd(instance, scmd); > diff --git a/drivers/scsi/megaraid/megaraid_sas_fusion.c b/drivers/scsi/megaraid/megaraid_sas_fusion.c > index 2159f6a..8237580 100644 > --- a/drivers/scsi/megaraid/megaraid_sas_fusion.c > +++ b/drivers/scsi/megaraid/megaraid_sas_fusion.c > @@ -747,6 +747,9 @@ megasas_ioc_init_fusion(struct megasas_instance *instance) > ret = 1; > goto fail_fw_init; > } > + if (!block_sync_cache) > + instance->fw_sync_cache_support = (scratch_pad_2 & > + MR_CAN_HANDLE_SYNC_CACHE_OFFSET) ? 1 : 0; > > IOCInitMessage = > dma_alloc_coherent(&instance->pdev->dev, > diff --git a/drivers/scsi/megaraid/megaraid_sas_fusion.h b/drivers/scsi/megaraid/megaraid_sas_fusion.h > index e3bee04..2857154 100644 > --- a/drivers/scsi/megaraid/megaraid_sas_fusion.h > +++ b/drivers/scsi/megaraid/megaraid_sas_fusion.h > @@ -72,6 +72,8 @@ > #define MPI2_SUP_REPLY_POST_HOST_INDEX_OFFSET (0x0000030C) > #define MPI2_REPLY_POST_HOST_INDEX_OFFSET (0x0000006C) > > +extern bool block_sync_cache; > + > /* > * Raid context flags > */ > Be extra careful with that. SYNCHRONIZE_CACHE has (potentially) a global scope, as there might be an array-wide cache, and a cache flush would affect the entire cache. Linux is using SYNCHRONIZE_CACHE as a per device setting, ie it assumes that the effects of a cache flush is restricted to the device in question. If this is _not_ the case I'd rather not enable it. Have you checked that enabling this functionality doesn't have negative performance impact? Cheers, Hannes
>-----Original Message----- >From: Hannes Reinecke [mailto:hare@suse.de] >Sent: Monday, October 17, 2016 5:04 PM >To: Sumit Saxena; linux-scsi@vger.kernel.org >Cc: martin.petersen@oracle.com; thenzl@redhat.com; jejb@linux.vnet.ibm.com; >kashyap.desai@broadcom.com >Subject: Re: [PATCH 4/7] megaraid_sas: Send SYNCHRONIZE_CACHE command to >firmware > >On 10/17/2016 12:24 PM, Sumit Saxena wrote: >> megaraid_sas driver returns SYNCHRONIZE_CACHE command back to SCSI >> layer without sending it to firmware as firmware takes care of flushing cache. >> This patch will change the driver behavior wrt SYNCHRONIZE_CACHE handling. >> If underlying firmware has support to handle the SYNCHORNIZE_CACHE, >> driver will send it for firmware otherwise complete it back to SCSI >> layer with SUCCESS immediately. >> If Firmware handle SYNCHORNIZE_CACHE for both VD and JBOD >> "canHandleSyncCache" bit in scratch pad register(offset 0x00B4) will >> be set. >> >> This behavior can be controlled via module parameter and user can >> fallback to old behavior of returning SYNCHRONIZE_CACHE by driver only >> without sending it to firmware. >> >> Signed-off-by: Sumit Saxena <sumit.saxena@broadcom.com> >> Signed-off-by: Kashyap Desai <kashyap.desai@broadcom.com> >> --- >> drivers/scsi/megaraid/megaraid_sas.h | 3 +++ >> drivers/scsi/megaraid/megaraid_sas_base.c | 14 ++++++-------- >> drivers/scsi/megaraid/megaraid_sas_fusion.c | 3 +++ >> drivers/scsi/megaraid/megaraid_sas_fusion.h | 2 ++ >> 4 files changed, 14 insertions(+), 8 deletions(-) >> >> diff --git a/drivers/scsi/megaraid/megaraid_sas.h >> b/drivers/scsi/megaraid/megaraid_sas.h >> index ca86c88..43fd14f 100644 >> --- a/drivers/scsi/megaraid/megaraid_sas.h >> +++ b/drivers/scsi/megaraid/megaraid_sas.h >> @@ -1429,6 +1429,8 @@ enum FW_BOOT_CONTEXT { >> #define MR_MAX_REPLY_QUEUES_EXT_OFFSET_SHIFT 14 >> #define MR_MAX_MSIX_REG_ARRAY 16 >> #define MR_RDPQ_MODE_OFFSET 0X00800000 >> +#define MR_CAN_HANDLE_SYNC_CACHE_OFFSET 0X01000000 >> + >> /* >> * register set for both 1068 and 1078 controllers >> * structure extended for 1078 registers @@ -2140,6 +2142,7 @@ struct >> megasas_instance { >> u8 is_imr; >> u8 is_rdpq; >> bool dev_handle; >> + bool fw_sync_cache_support; >> }; >> struct MR_LD_VF_MAP { >> u32 size; >> diff --git a/drivers/scsi/megaraid/megaraid_sas_base.c >> b/drivers/scsi/megaraid/megaraid_sas_base.c >> index 092387f..a4a8e2f 100644 >> --- a/drivers/scsi/megaraid/megaraid_sas_base.c >> +++ b/drivers/scsi/megaraid/megaraid_sas_base.c >> @@ -104,6 +104,10 @@ unsigned int scmd_timeout = >> MEGASAS_DEFAULT_CMD_TIMEOUT; module_param(scmd_timeout, int, >> S_IRUGO); MODULE_PARM_DESC(scmd_timeout, "scsi command timeout >> (10-90s), default 90s. See megasas_reset_timer."); >> >> +bool block_sync_cache; >> +module_param(block_sync_cache, bool, S_IRUGO); >> +MODULE_PARM_DESC(block_sync_cache, "Block SYNC CACHE by driver >> +Default: 0(Send it to firmware)"); >> + >> MODULE_LICENSE("GPL"); >> MODULE_VERSION(MEGASAS_VERSION); >> MODULE_AUTHOR("megaraidlinux.pdl@avagotech.com"); >> @@ -1700,16 +1704,10 @@ megasas_queue_command(struct Scsi_Host >*shost, struct scsi_cmnd *scmd) >> goto out_done; >> } >> >> - switch (scmd->cmnd[0]) { >> - case SYNCHRONIZE_CACHE: >> - /* >> - * FW takes care of flush cache on its own >> - * No need to send it down >> - */ >> + if ((scmd->cmnd[0] == SYNCHRONIZE_CACHE) && >> + (!instance->fw_sync_cache_support)) { >> scmd->result = DID_OK << 16; >> goto out_done; >> - default: >> - break; >> } >> >> return instance->instancet->build_and_issue_cmd(instance, scmd); >> diff --git a/drivers/scsi/megaraid/megaraid_sas_fusion.c >> b/drivers/scsi/megaraid/megaraid_sas_fusion.c >> index 2159f6a..8237580 100644 >> --- a/drivers/scsi/megaraid/megaraid_sas_fusion.c >> +++ b/drivers/scsi/megaraid/megaraid_sas_fusion.c >> @@ -747,6 +747,9 @@ megasas_ioc_init_fusion(struct megasas_instance >*instance) >> ret = 1; >> goto fail_fw_init; >> } >> + if (!block_sync_cache) >> + instance->fw_sync_cache_support = (scratch_pad_2 & >> + MR_CAN_HANDLE_SYNC_CACHE_OFFSET) ? 1 : 0; >> >> IOCInitMessage = >> dma_alloc_coherent(&instance->pdev->dev, >> diff --git a/drivers/scsi/megaraid/megaraid_sas_fusion.h >> b/drivers/scsi/megaraid/megaraid_sas_fusion.h >> index e3bee04..2857154 100644 >> --- a/drivers/scsi/megaraid/megaraid_sas_fusion.h >> +++ b/drivers/scsi/megaraid/megaraid_sas_fusion.h >> @@ -72,6 +72,8 @@ >> #define MPI2_SUP_REPLY_POST_HOST_INDEX_OFFSET (0x0000030C) >> #define MPI2_REPLY_POST_HOST_INDEX_OFFSET (0x0000006C) >> >> +extern bool block_sync_cache; >> + >> /* >> * Raid context flags >> */ >> >Be extra careful with that. > >SYNCHRONIZE_CACHE has (potentially) a global scope, as there might be an >array-wide cache, and a cache flush would affect the entire cache. >Linux is using SYNCHRONIZE_CACHE as a per device setting, ie it assumes that the >effects of a cache flush is restricted to the device in question. Yes, cache flush will be done per device only. > >If this is _not_ the case I'd rather not enable it. >Have you checked that enabling this functionality doesn't have negative >performance impact? Yes, there will be negative performance impact on wrokloads where SYNCHRONIZE_CACHE is fired very frequently. We have provided a module parameter to fallback to older behavior, if anyone sees negative performance impact because of this and there is external power to drive. This change is done for deployment where there is no external power to drive and abrupt shutdown may cause data loss as cache may not be flushed to disk. Thanks, Sumit > >Cheers, > >Hannes >-- >Dr. Hannes Reinecke Teamlead Storage & Networking >hare@suse.de +49 911 74053 688 >SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg >GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton HRB 21284 (AG >Nürnberg) -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 10/17/2016 07:34 AM, Hannes Reinecke wrote: > On 10/17/2016 12:24 PM, Sumit Saxena wrote: >> megaraid_sas driver returns SYNCHRONIZE_CACHE command back to SCSI layer >> without sending it to firmware as firmware takes care of flushing cache. >> This patch will change the driver behavior wrt SYNCHRONIZE_CACHE handling. >> If underlying firmware has support to handle the SYNCHORNIZE_CACHE, driver >> will send it for firmware otherwise complete it back to SCSI layer with >> SUCCESS immediately. >> If Firmware handle SYNCHORNIZE_CACHE for both VD and JBOD >> "canHandleSyncCache" bit in scratch pad register(offset 0x00B4) will be >> set. >> >> This behavior can be controlled via module parameter and user can fallback >> to old behavior of returning SYNCHRONIZE_CACHE by driver only without >> sending it to firmware. >> >> Signed-off-by: Sumit Saxena <sumit.saxena@broadcom.com> >> Signed-off-by: Kashyap Desai <kashyap.desai@broadcom.com> >> --- >> drivers/scsi/megaraid/megaraid_sas.h | 3 +++ >> drivers/scsi/megaraid/megaraid_sas_base.c | 14 ++++++-------- >> drivers/scsi/megaraid/megaraid_sas_fusion.c | 3 +++ >> drivers/scsi/megaraid/megaraid_sas_fusion.h | 2 ++ >> 4 files changed, 14 insertions(+), 8 deletions(-) >> >> diff --git a/drivers/scsi/megaraid/megaraid_sas.h b/drivers/scsi/megaraid/megaraid_sas.h >> index ca86c88..43fd14f 100644 >> --- a/drivers/scsi/megaraid/megaraid_sas.h >> +++ b/drivers/scsi/megaraid/megaraid_sas.h >> @@ -1429,6 +1429,8 @@ enum FW_BOOT_CONTEXT { >> #define MR_MAX_REPLY_QUEUES_EXT_OFFSET_SHIFT 14 >> #define MR_MAX_MSIX_REG_ARRAY 16 >> #define MR_RDPQ_MODE_OFFSET 0X00800000 >> +#define MR_CAN_HANDLE_SYNC_CACHE_OFFSET 0X01000000 >> + >> /* >> * register set for both 1068 and 1078 controllers >> * structure extended for 1078 registers >> @@ -2140,6 +2142,7 @@ struct megasas_instance { >> u8 is_imr; >> u8 is_rdpq; >> bool dev_handle; >> + bool fw_sync_cache_support; >> }; >> struct MR_LD_VF_MAP { >> u32 size; >> diff --git a/drivers/scsi/megaraid/megaraid_sas_base.c b/drivers/scsi/megaraid/megaraid_sas_base.c >> index 092387f..a4a8e2f 100644 >> --- a/drivers/scsi/megaraid/megaraid_sas_base.c >> +++ b/drivers/scsi/megaraid/megaraid_sas_base.c >> @@ -104,6 +104,10 @@ unsigned int scmd_timeout = MEGASAS_DEFAULT_CMD_TIMEOUT; >> module_param(scmd_timeout, int, S_IRUGO); >> MODULE_PARM_DESC(scmd_timeout, "scsi command timeout (10-90s), default 90s. See megasas_reset_timer."); >> >> +bool block_sync_cache; >> +module_param(block_sync_cache, bool, S_IRUGO); >> +MODULE_PARM_DESC(block_sync_cache, "Block SYNC CACHE by driver Default: 0(Send it to firmware)"); >> + >> MODULE_LICENSE("GPL"); >> MODULE_VERSION(MEGASAS_VERSION); >> MODULE_AUTHOR("megaraidlinux.pdl@avagotech.com"); >> @@ -1700,16 +1704,10 @@ megasas_queue_command(struct Scsi_Host *shost, struct scsi_cmnd *scmd) >> goto out_done; >> } >> >> - switch (scmd->cmnd[0]) { >> - case SYNCHRONIZE_CACHE: >> - /* >> - * FW takes care of flush cache on its own >> - * No need to send it down >> - */ >> + if ((scmd->cmnd[0] == SYNCHRONIZE_CACHE) && >> + (!instance->fw_sync_cache_support)) { >> scmd->result = DID_OK << 16; >> goto out_done; >> - default: >> - break; >> } >> >> return instance->instancet->build_and_issue_cmd(instance, scmd); >> diff --git a/drivers/scsi/megaraid/megaraid_sas_fusion.c b/drivers/scsi/megaraid/megaraid_sas_fusion.c >> index 2159f6a..8237580 100644 >> --- a/drivers/scsi/megaraid/megaraid_sas_fusion.c >> +++ b/drivers/scsi/megaraid/megaraid_sas_fusion.c >> @@ -747,6 +747,9 @@ megasas_ioc_init_fusion(struct megasas_instance *instance) >> ret = 1; >> goto fail_fw_init; >> } >> + if (!block_sync_cache) >> + instance->fw_sync_cache_support = (scratch_pad_2 & >> + MR_CAN_HANDLE_SYNC_CACHE_OFFSET) ? 1 : 0; >> >> IOCInitMessage = >> dma_alloc_coherent(&instance->pdev->dev, >> diff --git a/drivers/scsi/megaraid/megaraid_sas_fusion.h b/drivers/scsi/megaraid/megaraid_sas_fusion.h >> index e3bee04..2857154 100644 >> --- a/drivers/scsi/megaraid/megaraid_sas_fusion.h >> +++ b/drivers/scsi/megaraid/megaraid_sas_fusion.h >> @@ -72,6 +72,8 @@ >> #define MPI2_SUP_REPLY_POST_HOST_INDEX_OFFSET (0x0000030C) >> #define MPI2_REPLY_POST_HOST_INDEX_OFFSET (0x0000006C) >> >> +extern bool block_sync_cache; >> + >> /* >> * Raid context flags >> */ >> > Be extra careful with that. > > SYNCHRONIZE_CACHE has (potentially) a global scope, as there might be an > array-wide cache, and a cache flush would affect the entire cache. > Linux is using SYNCHRONIZE_CACHE as a per device setting, ie it assumes > that the effects of a cache flush is restricted to the device in question. > > If this is _not_ the case I'd rather not enable it. > Have you checked that enabling this functionality doesn't have negative > performance impact? > > Cheers, > > Hannes This must go in - without this fix, there is no data integrity for any file system. In effect, this driver by default has been throwing away SYNCHRONIZE_CACHE commands even when acting in JBOD/non-RAID mode. Of course, actually doing a SYNCHRONIZE_CACHE to drives will be slower than throwing it away, but this is not optional. We really need to have some ways to validate that our IO stack is properly and safely configured. I would love to see a couple of things: * having T10 & T13 report the existence of a volatile write cache - this is different than WCE set, some devices have a write cache and are battery/flash backed. * having a robust tool to test over power failure/disconnect that our assumptions are true - any write is durable after a SYNCHRONIZE_CACHE or CACHE_FLUSH_EXT is sent and ack'ed Regards, Ric -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 17.10.2016 12:24, Sumit Saxena wrote: > megaraid_sas driver returns SYNCHRONIZE_CACHE command back to SCSI layer > without sending it to firmware as firmware takes care of flushing cache. > This patch will change the driver behavior wrt SYNCHRONIZE_CACHE handling. > If underlying firmware has support to handle the SYNCHORNIZE_CACHE, driver > will send it for firmware otherwise complete it back to SCSI layer with > SUCCESS immediately. > If Firmware handle SYNCHORNIZE_CACHE for both VD and JBOD > "canHandleSyncCache" bit in scratch pad register(offset 0x00B4) will be > set. > > This behavior can be controlled via module parameter and user can fallback > to old behavior of returning SYNCHRONIZE_CACHE by driver only without > sending it to firmware. > > Signed-off-by: Sumit Saxena <sumit.saxena@broadcom.com> > Signed-off-by: Kashyap Desai <kashyap.desai@broadcom.com> > --- > drivers/scsi/megaraid/megaraid_sas.h | 3 +++ > drivers/scsi/megaraid/megaraid_sas_base.c | 14 ++++++-------- > drivers/scsi/megaraid/megaraid_sas_fusion.c | 3 +++ > drivers/scsi/megaraid/megaraid_sas_fusion.h | 2 ++ > 4 files changed, 14 insertions(+), 8 deletions(-) > > diff --git a/drivers/scsi/megaraid/megaraid_sas.h b/drivers/scsi/megaraid/megaraid_sas.h > index ca86c88..43fd14f 100644 > --- a/drivers/scsi/megaraid/megaraid_sas.h > +++ b/drivers/scsi/megaraid/megaraid_sas.h > @@ -1429,6 +1429,8 @@ enum FW_BOOT_CONTEXT { > #define MR_MAX_REPLY_QUEUES_EXT_OFFSET_SHIFT 14 > #define MR_MAX_MSIX_REG_ARRAY 16 > #define MR_RDPQ_MODE_OFFSET 0X00800000 > +#define MR_CAN_HANDLE_SYNC_CACHE_OFFSET 0X01000000 > + > /* > * register set for both 1068 and 1078 controllers > * structure extended for 1078 registers > @@ -2140,6 +2142,7 @@ struct megasas_instance { > u8 is_imr; > u8 is_rdpq; > bool dev_handle; > + bool fw_sync_cache_support; > }; > struct MR_LD_VF_MAP { > u32 size; > diff --git a/drivers/scsi/megaraid/megaraid_sas_base.c b/drivers/scsi/megaraid/megaraid_sas_base.c > index 092387f..a4a8e2f 100644 > --- a/drivers/scsi/megaraid/megaraid_sas_base.c > +++ b/drivers/scsi/megaraid/megaraid_sas_base.c > @@ -104,6 +104,10 @@ unsigned int scmd_timeout = MEGASAS_DEFAULT_CMD_TIMEOUT; > module_param(scmd_timeout, int, S_IRUGO); > MODULE_PARM_DESC(scmd_timeout, "scsi command timeout (10-90s), default 90s. See megasas_reset_timer."); > > +bool block_sync_cache; > +module_param(block_sync_cache, bool, S_IRUGO); > +MODULE_PARM_DESC(block_sync_cache, "Block SYNC CACHE by driver Default: 0(Send it to firmware)"); > + > MODULE_LICENSE("GPL"); > MODULE_VERSION(MEGASAS_VERSION); > MODULE_AUTHOR("megaraidlinux.pdl@avagotech.com"); > @@ -1700,16 +1704,10 @@ megasas_queue_command(struct Scsi_Host *shost, struct scsi_cmnd *scmd) > goto out_done; > } > > - switch (scmd->cmnd[0]) { > - case SYNCHRONIZE_CACHE: > - /* > - * FW takes care of flush cache on its own > - * No need to send it down > - */ > + if ((scmd->cmnd[0] == SYNCHRONIZE_CACHE) && > + (!instance->fw_sync_cache_support)) { Maybe I'm wrong, but isn't the logic inverted? when fw_sync_cache_support is true it means that there is a flash cache or a battery and we don't have to send the command down ? tomash > scmd->result = DID_OK << 16; > goto out_done; > - default: > - break; > } > > return instance->instancet->build_and_issue_cmd(instance, scmd); > diff --git a/drivers/scsi/megaraid/megaraid_sas_fusion.c b/drivers/scsi/megaraid/megaraid_sas_fusion.c > index 2159f6a..8237580 100644 > --- a/drivers/scsi/megaraid/megaraid_sas_fusion.c > +++ b/drivers/scsi/megaraid/megaraid_sas_fusion.c > @@ -747,6 +747,9 @@ megasas_ioc_init_fusion(struct megasas_instance *instance) > ret = 1; > goto fail_fw_init; > } > + if (!block_sync_cache) > + instance->fw_sync_cache_support = (scratch_pad_2 & > + MR_CAN_HANDLE_SYNC_CACHE_OFFSET) ? 1 : 0; > > IOCInitMessage = > dma_alloc_coherent(&instance->pdev->dev, > diff --git a/drivers/scsi/megaraid/megaraid_sas_fusion.h b/drivers/scsi/megaraid/megaraid_sas_fusion.h > index e3bee04..2857154 100644 > --- a/drivers/scsi/megaraid/megaraid_sas_fusion.h > +++ b/drivers/scsi/megaraid/megaraid_sas_fusion.h > @@ -72,6 +72,8 @@ > #define MPI2_SUP_REPLY_POST_HOST_INDEX_OFFSET (0x0000030C) > #define MPI2_REPLY_POST_HOST_INDEX_OFFSET (0x0000006C) > > +extern bool block_sync_cache; > + > /* > * Raid context flags > */ -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
>-----Original Message----- >From: Tomas Henzl [mailto:thenzl@redhat.com] >Sent: Monday, October 17, 2016 6:44 PM >To: Sumit Saxena; linux-scsi@vger.kernel.org >Cc: martin.petersen@oracle.com; jejb@linux.vnet.ibm.com; >kashyap.desai@broadcom.com >Subject: Re: [PATCH 4/7] megaraid_sas: Send SYNCHRONIZE_CACHE command to >firmware > >On 17.10.2016 12:24, Sumit Saxena wrote: >> megaraid_sas driver returns SYNCHRONIZE_CACHE command back to SCSI >> layer without sending it to firmware as firmware takes care of flushing cache. >> This patch will change the driver behavior wrt SYNCHRONIZE_CACHE handling. >> If underlying firmware has support to handle the SYNCHORNIZE_CACHE, >> driver will send it for firmware otherwise complete it back to SCSI >> layer with SUCCESS immediately. >> If Firmware handle SYNCHORNIZE_CACHE for both VD and JBOD >> "canHandleSyncCache" bit in scratch pad register(offset 0x00B4) will >> be set. >> >> This behavior can be controlled via module parameter and user can >> fallback to old behavior of returning SYNCHRONIZE_CACHE by driver only >> without sending it to firmware. >> >> Signed-off-by: Sumit Saxena <sumit.saxena@broadcom.com> >> Signed-off-by: Kashyap Desai <kashyap.desai@broadcom.com> >> --- >> drivers/scsi/megaraid/megaraid_sas.h | 3 +++ >> drivers/scsi/megaraid/megaraid_sas_base.c | 14 ++++++-------- >> drivers/scsi/megaraid/megaraid_sas_fusion.c | 3 +++ >> drivers/scsi/megaraid/megaraid_sas_fusion.h | 2 ++ >> 4 files changed, 14 insertions(+), 8 deletions(-) >> >> diff --git a/drivers/scsi/megaraid/megaraid_sas.h >> b/drivers/scsi/megaraid/megaraid_sas.h >> index ca86c88..43fd14f 100644 >> --- a/drivers/scsi/megaraid/megaraid_sas.h >> +++ b/drivers/scsi/megaraid/megaraid_sas.h >> @@ -1429,6 +1429,8 @@ enum FW_BOOT_CONTEXT { >> #define MR_MAX_REPLY_QUEUES_EXT_OFFSET_SHIFT 14 >> #define MR_MAX_MSIX_REG_ARRAY 16 >> #define MR_RDPQ_MODE_OFFSET 0X00800000 >> +#define MR_CAN_HANDLE_SYNC_CACHE_OFFSET 0X01000000 >> + >> /* >> * register set for both 1068 and 1078 controllers >> * structure extended for 1078 registers @@ -2140,6 +2142,7 @@ struct >> megasas_instance { >> u8 is_imr; >> u8 is_rdpq; >> bool dev_handle; >> + bool fw_sync_cache_support; >> }; >> struct MR_LD_VF_MAP { >> u32 size; >> diff --git a/drivers/scsi/megaraid/megaraid_sas_base.c >> b/drivers/scsi/megaraid/megaraid_sas_base.c >> index 092387f..a4a8e2f 100644 >> --- a/drivers/scsi/megaraid/megaraid_sas_base.c >> +++ b/drivers/scsi/megaraid/megaraid_sas_base.c >> @@ -104,6 +104,10 @@ unsigned int scmd_timeout = >> MEGASAS_DEFAULT_CMD_TIMEOUT; module_param(scmd_timeout, int, >> S_IRUGO); MODULE_PARM_DESC(scmd_timeout, "scsi command timeout >> (10-90s), default 90s. See megasas_reset_timer."); >> >> +bool block_sync_cache; >> +module_param(block_sync_cache, bool, S_IRUGO); >> +MODULE_PARM_DESC(block_sync_cache, "Block SYNC CACHE by driver >> +Default: 0(Send it to firmware)"); >> + >> MODULE_LICENSE("GPL"); >> MODULE_VERSION(MEGASAS_VERSION); >> MODULE_AUTHOR("megaraidlinux.pdl@avagotech.com"); >> @@ -1700,16 +1704,10 @@ megasas_queue_command(struct Scsi_Host >*shost, struct scsi_cmnd *scmd) >> goto out_done; >> } >> >> - switch (scmd->cmnd[0]) { >> - case SYNCHRONIZE_CACHE: >> - /* >> - * FW takes care of flush cache on its own >> - * No need to send it down >> - */ >> + if ((scmd->cmnd[0] == SYNCHRONIZE_CACHE) && >> + (!instance->fw_sync_cache_support)) { > >Maybe I'm wrong, but isn't the logic inverted? >when fw_sync_cache_support is true it means that there is a flash cache or a >battery and we don't have to send the command down ? > If "instance->fw_sync_cache_support" is set to true, it means driver can send SYNCHRONIZE_CACHE to firmware(firmware has infrastructure to support SYNCHRONIZE_CACHE). If "instance->fw_sync_cache_support" is set to false, it means FW does not support to handle SYNCHRONIZE_CACHE and FW will flush cache during shutdown. >tomash > >> scmd->result = DID_OK << 16; >> goto out_done; >> - default: >> - break; >> } >> >> return instance->instancet->build_and_issue_cmd(instance, scmd); >> diff --git a/drivers/scsi/megaraid/megaraid_sas_fusion.c >> b/drivers/scsi/megaraid/megaraid_sas_fusion.c >> index 2159f6a..8237580 100644 >> --- a/drivers/scsi/megaraid/megaraid_sas_fusion.c >> +++ b/drivers/scsi/megaraid/megaraid_sas_fusion.c >> @@ -747,6 +747,9 @@ megasas_ioc_init_fusion(struct megasas_instance >*instance) >> ret = 1; >> goto fail_fw_init; >> } >> + if (!block_sync_cache) >> + instance->fw_sync_cache_support = (scratch_pad_2 & >> + MR_CAN_HANDLE_SYNC_CACHE_OFFSET) ? 1 : 0; >> >> IOCInitMessage = >> dma_alloc_coherent(&instance->pdev->dev, >> diff --git a/drivers/scsi/megaraid/megaraid_sas_fusion.h >> b/drivers/scsi/megaraid/megaraid_sas_fusion.h >> index e3bee04..2857154 100644 >> --- a/drivers/scsi/megaraid/megaraid_sas_fusion.h >> +++ b/drivers/scsi/megaraid/megaraid_sas_fusion.h >> @@ -72,6 +72,8 @@ >> #define MPI2_SUP_REPLY_POST_HOST_INDEX_OFFSET (0x0000030C) >> #define MPI2_REPLY_POST_HOST_INDEX_OFFSET (0x0000006C) >> >> +extern bool block_sync_cache; >> + >> /* >> * Raid context flags >> */ > -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
>-----Original Message----- >From: Ric Wheeler [mailto:rwheeler@redhat.com] >Sent: Monday, October 17, 2016 6:31 PM >To: Hannes Reinecke; Sumit Saxena; linux-scsi@vger.kernel.org >Cc: martin.petersen@oracle.com; thenzl@redhat.com; jejb@linux.vnet.ibm.com; >kashyap.desai@broadcom.com; Christoph Hellwig; Martin K. Petersen; Jeff >Moyer; Gris Ge; Ewan Milne; Jens Axboe >Subject: Re: [PATCH 4/7] megaraid_sas: Send SYNCHRONIZE_CACHE command to >firmware > >On 10/17/2016 07:34 AM, Hannes Reinecke wrote: >> On 10/17/2016 12:24 PM, Sumit Saxena wrote: >>> megaraid_sas driver returns SYNCHRONIZE_CACHE command back to SCSI >>> layer without sending it to firmware as firmware takes care of flushing cache. >>> This patch will change the driver behavior wrt SYNCHRONIZE_CACHE handling. >>> If underlying firmware has support to handle the SYNCHORNIZE_CACHE, >>> driver will send it for firmware otherwise complete it back to SCSI >>> layer with SUCCESS immediately. >>> If Firmware handle SYNCHORNIZE_CACHE for both VD and JBOD >>> "canHandleSyncCache" bit in scratch pad register(offset 0x00B4) will >>> be set. >>> >>> This behavior can be controlled via module parameter and user can >>> fallback to old behavior of returning SYNCHRONIZE_CACHE by driver >>> only without sending it to firmware. >>> >>> Signed-off-by: Sumit Saxena <sumit.saxena@broadcom.com> >>> Signed-off-by: Kashyap Desai <kashyap.desai@broadcom.com> >>> --- >>> drivers/scsi/megaraid/megaraid_sas.h | 3 +++ >>> drivers/scsi/megaraid/megaraid_sas_base.c | 14 ++++++-------- >>> drivers/scsi/megaraid/megaraid_sas_fusion.c | 3 +++ >>> drivers/scsi/megaraid/megaraid_sas_fusion.h | 2 ++ >>> 4 files changed, 14 insertions(+), 8 deletions(-) >>> >>> diff --git a/drivers/scsi/megaraid/megaraid_sas.h >>> b/drivers/scsi/megaraid/megaraid_sas.h >>> index ca86c88..43fd14f 100644 >>> --- a/drivers/scsi/megaraid/megaraid_sas.h >>> +++ b/drivers/scsi/megaraid/megaraid_sas.h >>> @@ -1429,6 +1429,8 @@ enum FW_BOOT_CONTEXT { >>> #define MR_MAX_REPLY_QUEUES_EXT_OFFSET_SHIFT 14 >>> #define MR_MAX_MSIX_REG_ARRAY 16 >>> #define MR_RDPQ_MODE_OFFSET 0X00800000 >>> +#define MR_CAN_HANDLE_SYNC_CACHE_OFFSET 0X01000000 >>> + >>> /* >>> * register set for both 1068 and 1078 controllers >>> * structure extended for 1078 registers @@ -2140,6 +2142,7 @@ >>> struct megasas_instance { >>> u8 is_imr; >>> u8 is_rdpq; >>> bool dev_handle; >>> + bool fw_sync_cache_support; >>> }; >>> struct MR_LD_VF_MAP { >>> u32 size; >>> diff --git a/drivers/scsi/megaraid/megaraid_sas_base.c >>> b/drivers/scsi/megaraid/megaraid_sas_base.c >>> index 092387f..a4a8e2f 100644 >>> --- a/drivers/scsi/megaraid/megaraid_sas_base.c >>> +++ b/drivers/scsi/megaraid/megaraid_sas_base.c >>> @@ -104,6 +104,10 @@ unsigned int scmd_timeout = >MEGASAS_DEFAULT_CMD_TIMEOUT; >>> module_param(scmd_timeout, int, S_IRUGO); >>> MODULE_PARM_DESC(scmd_timeout, "scsi command timeout (10-90s), >>> default 90s. See megasas_reset_timer."); >>> >>> +bool block_sync_cache; >>> +module_param(block_sync_cache, bool, S_IRUGO); >>> +MODULE_PARM_DESC(block_sync_cache, "Block SYNC CACHE by driver >>> +Default: 0(Send it to firmware)"); >>> + >>> MODULE_LICENSE("GPL"); >>> MODULE_VERSION(MEGASAS_VERSION); >>> MODULE_AUTHOR("megaraidlinux.pdl@avagotech.com"); >>> @@ -1700,16 +1704,10 @@ megasas_queue_command(struct Scsi_Host >*shost, struct scsi_cmnd *scmd) >>> goto out_done; >>> } >>> >>> - switch (scmd->cmnd[0]) { >>> - case SYNCHRONIZE_CACHE: >>> - /* >>> - * FW takes care of flush cache on its own >>> - * No need to send it down >>> - */ >>> + if ((scmd->cmnd[0] == SYNCHRONIZE_CACHE) && >>> + (!instance->fw_sync_cache_support)) { >>> scmd->result = DID_OK << 16; >>> goto out_done; >>> - default: >>> - break; >>> } >>> >>> return instance->instancet->build_and_issue_cmd(instance, scmd); >>> diff --git a/drivers/scsi/megaraid/megaraid_sas_fusion.c >>> b/drivers/scsi/megaraid/megaraid_sas_fusion.c >>> index 2159f6a..8237580 100644 >>> --- a/drivers/scsi/megaraid/megaraid_sas_fusion.c >>> +++ b/drivers/scsi/megaraid/megaraid_sas_fusion.c >>> @@ -747,6 +747,9 @@ megasas_ioc_init_fusion(struct megasas_instance >*instance) >>> ret = 1; >>> goto fail_fw_init; >>> } >>> + if (!block_sync_cache) >>> + instance->fw_sync_cache_support = (scratch_pad_2 & >>> + MR_CAN_HANDLE_SYNC_CACHE_OFFSET) ? 1 : 0; >>> >>> IOCInitMessage = >>> dma_alloc_coherent(&instance->pdev->dev, >>> diff --git a/drivers/scsi/megaraid/megaraid_sas_fusion.h >>> b/drivers/scsi/megaraid/megaraid_sas_fusion.h >>> index e3bee04..2857154 100644 >>> --- a/drivers/scsi/megaraid/megaraid_sas_fusion.h >>> +++ b/drivers/scsi/megaraid/megaraid_sas_fusion.h >>> @@ -72,6 +72,8 @@ >>> #define MPI2_SUP_REPLY_POST_HOST_INDEX_OFFSET (0x0000030C) >>> #define MPI2_REPLY_POST_HOST_INDEX_OFFSET (0x0000006C) >>> >>> +extern bool block_sync_cache; >>> + >>> /* >>> * Raid context flags >>> */ >>> >> Be extra careful with that. >> >> SYNCHRONIZE_CACHE has (potentially) a global scope, as there might be >> an array-wide cache, and a cache flush would affect the entire cache. >> Linux is using SYNCHRONIZE_CACHE as a per device setting, ie it >> assumes that the effects of a cache flush is restricted to the device in question. >> >> If this is _not_ the case I'd rather not enable it. >> Have you checked that enabling this functionality doesn't have >> negative performance impact? >> >> Cheers, >> >> Hannes > >This must go in - without this fix, there is no data integrity for any file system. > >In effect, this driver by default has been throwing away SYNCHRONIZE_CACHE >commands even when acting in JBOD/non-RAID mode. For JBOD mode, we are planning to send SYNCHRONIZE_CACHE unconditionally for all generation Controllers and FW. Since there is single driver for all generation controllers so we are doing some Testing that opening SYNCHRONIZE_CACHE should not break any controller. I will be sending follow up patch for the same as soon as I am done. > >Of course, actually doing a SYNCHRONIZE_CACHE to drives will be slower than >throwing it away, but this is not optional. > >We really need to have some ways to validate that our IO stack is properly and >safely configured. > >I would love to see a couple of things: > >* having T10 & T13 report the existence of a volatile write cache - this is different >than WCE set, some devices have a write cache and are battery/flash backed. > >* having a robust tool to test over power failure/disconnect that our assumptions >are true - any write is durable after a SYNCHRONIZE_CACHE or >CACHE_FLUSH_EXT is sent and ack'ed > >Regards, > >Ric > -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 17.10.2016 15:28, Sumit Saxena wrote: >> -----Original Message----- >> From: Tomas Henzl [mailto:thenzl@redhat.com] >> Sent: Monday, October 17, 2016 6:44 PM >> To: Sumit Saxena; linux-scsi@vger.kernel.org >> Cc: martin.petersen@oracle.com; jejb@linux.vnet.ibm.com; >> kashyap.desai@broadcom.com >> Subject: Re: [PATCH 4/7] megaraid_sas: Send SYNCHRONIZE_CACHE command to >> firmware >> >> On 17.10.2016 12:24, Sumit Saxena wrote: >>> megaraid_sas driver returns SYNCHRONIZE_CACHE command back to SCSI >>> layer without sending it to firmware as firmware takes care of flushing > cache. >>> This patch will change the driver behavior wrt SYNCHRONIZE_CACHE > handling. >>> If underlying firmware has support to handle the SYNCHORNIZE_CACHE, >>> driver will send it for firmware otherwise complete it back to SCSI >>> layer with SUCCESS immediately. >>> If Firmware handle SYNCHORNIZE_CACHE for both VD and JBOD >>> "canHandleSyncCache" bit in scratch pad register(offset 0x00B4) will >>> be set. >>> >>> This behavior can be controlled via module parameter and user can >>> fallback to old behavior of returning SYNCHRONIZE_CACHE by driver only >>> without sending it to firmware. >>> >>> Signed-off-by: Sumit Saxena <sumit.saxena@broadcom.com> >>> Signed-off-by: Kashyap Desai <kashyap.desai@broadcom.com> >>> --- >>> drivers/scsi/megaraid/megaraid_sas.h | 3 +++ >>> drivers/scsi/megaraid/megaraid_sas_base.c | 14 ++++++-------- >>> drivers/scsi/megaraid/megaraid_sas_fusion.c | 3 +++ >>> drivers/scsi/megaraid/megaraid_sas_fusion.h | 2 ++ >>> 4 files changed, 14 insertions(+), 8 deletions(-) >>> >>> diff --git a/drivers/scsi/megaraid/megaraid_sas.h >>> b/drivers/scsi/megaraid/megaraid_sas.h >>> index ca86c88..43fd14f 100644 >>> --- a/drivers/scsi/megaraid/megaraid_sas.h >>> +++ b/drivers/scsi/megaraid/megaraid_sas.h >>> @@ -1429,6 +1429,8 @@ enum FW_BOOT_CONTEXT { >>> #define MR_MAX_REPLY_QUEUES_EXT_OFFSET_SHIFT 14 >>> #define MR_MAX_MSIX_REG_ARRAY 16 >>> #define MR_RDPQ_MODE_OFFSET 0X00800000 >>> +#define MR_CAN_HANDLE_SYNC_CACHE_OFFSET 0X01000000 >>> + >>> /* >>> * register set for both 1068 and 1078 controllers >>> * structure extended for 1078 registers @@ -2140,6 +2142,7 @@ struct >>> megasas_instance { >>> u8 is_imr; >>> u8 is_rdpq; >>> bool dev_handle; >>> + bool fw_sync_cache_support; >>> }; >>> struct MR_LD_VF_MAP { >>> u32 size; >>> diff --git a/drivers/scsi/megaraid/megaraid_sas_base.c >>> b/drivers/scsi/megaraid/megaraid_sas_base.c >>> index 092387f..a4a8e2f 100644 >>> --- a/drivers/scsi/megaraid/megaraid_sas_base.c >>> +++ b/drivers/scsi/megaraid/megaraid_sas_base.c >>> @@ -104,6 +104,10 @@ unsigned int scmd_timeout = >>> MEGASAS_DEFAULT_CMD_TIMEOUT; module_param(scmd_timeout, int, >>> S_IRUGO); MODULE_PARM_DESC(scmd_timeout, "scsi command timeout >>> (10-90s), default 90s. See megasas_reset_timer."); >>> >>> +bool block_sync_cache; >>> +module_param(block_sync_cache, bool, S_IRUGO); >>> +MODULE_PARM_DESC(block_sync_cache, "Block SYNC CACHE by driver >>> +Default: 0(Send it to firmware)"); >>> + >>> MODULE_LICENSE("GPL"); >>> MODULE_VERSION(MEGASAS_VERSION); >>> MODULE_AUTHOR("megaraidlinux.pdl@avagotech.com"); >>> @@ -1700,16 +1704,10 @@ megasas_queue_command(struct Scsi_Host >> *shost, struct scsi_cmnd *scmd) >>> goto out_done; >>> } >>> >>> - switch (scmd->cmnd[0]) { >>> - case SYNCHRONIZE_CACHE: >>> - /* >>> - * FW takes care of flush cache on its own >>> - * No need to send it down >>> - */ >>> + if ((scmd->cmnd[0] == SYNCHRONIZE_CACHE) && >>> + (!instance->fw_sync_cache_support)) { >> Maybe I'm wrong, but isn't the logic inverted? >> when fw_sync_cache_support is true it means that there is a flash cache > or a >> battery and we don't have to send the command down ? >> > If "instance->fw_sync_cache_support" is set to true, it means driver can > send SYNCHRONIZE_CACHE to firmware(firmware has infrastructure to support > SYNCHRONIZE_CACHE). > If "instance->fw_sync_cache_support" is set to false, it means FW does not > support to handle SYNCHRONIZE_CACHE and FW will flush cache during > shutdown. Thanks, in that case it is correct. >>> + if (!block_sync_cache) >>> + instance->fw_sync_cache_support = (scratch_pad_2 & >>> + MR_CAN_HANDLE_SYNC_CACHE_OFFSET) ? 1 : 0; Please add a warning or log the state of the synchronise cache command on the controller. >>> IOCInitMessage = >>> dma_alloc_coherent(&instance->pdev->dev, >>> diff --git a/drivers/scsi/megaraid/megaraid_sas_fusion.h >>> b/drivers/scsi/megaraid/megaraid_sas_fusion.h >>> index e3bee04..2857154 100644 >>> --- a/drivers/scsi/megaraid/megaraid_sas_fusion.h >>> +++ b/drivers/scsi/megaraid/megaraid_sas_fusion.h >>> @@ -72,6 +72,8 @@ >>> #define MPI2_SUP_REPLY_POST_HOST_INDEX_OFFSET (0x0000030C) >>> #define MPI2_REPLY_POST_HOST_INDEX_OFFSET (0x0000006C) >>> >>> +extern bool block_sync_cache; >>> + >>> /* >>> * Raid context flags >>> */ > -- > To unsubscribe from this list: send the line "unsubscribe linux-scsi" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
>-----Original Message----- >From: Tomas Henzl [mailto:thenzl@redhat.com] >Sent: Monday, October 17, 2016 7:27 PM >To: Sumit Saxena; linux-scsi@vger.kernel.org >Cc: martin.petersen@oracle.com; jejb@linux.vnet.ibm.com; Kashyap Desai >Subject: Re: [PATCH 4/7] megaraid_sas: Send SYNCHRONIZE_CACHE command to >firmware > >On 17.10.2016 15:28, Sumit Saxena wrote: >>> -----Original Message----- >>> From: Tomas Henzl [mailto:thenzl@redhat.com] >>> Sent: Monday, October 17, 2016 6:44 PM >>> To: Sumit Saxena; linux-scsi@vger.kernel.org >>> Cc: martin.petersen@oracle.com; jejb@linux.vnet.ibm.com; >>> kashyap.desai@broadcom.com >>> Subject: Re: [PATCH 4/7] megaraid_sas: Send SYNCHRONIZE_CACHE >command >>> to firmware >>> >>> On 17.10.2016 12:24, Sumit Saxena wrote: >>>> megaraid_sas driver returns SYNCHRONIZE_CACHE command back to SCSI >>>> layer without sending it to firmware as firmware takes care of >>>> flushing >> cache. >>>> This patch will change the driver behavior wrt SYNCHRONIZE_CACHE >> handling. >>>> If underlying firmware has support to handle the SYNCHORNIZE_CACHE, >>>> driver will send it for firmware otherwise complete it back to SCSI >>>> layer with SUCCESS immediately. >>>> If Firmware handle SYNCHORNIZE_CACHE for both VD and JBOD >>>> "canHandleSyncCache" bit in scratch pad register(offset 0x00B4) will >>>> be set. >>>> >>>> This behavior can be controlled via module parameter and user can >>>> fallback to old behavior of returning SYNCHRONIZE_CACHE by driver >>>> only without sending it to firmware. >>>> >>>> Signed-off-by: Sumit Saxena <sumit.saxena@broadcom.com> >>>> Signed-off-by: Kashyap Desai <kashyap.desai@broadcom.com> >>>> --- >>>> drivers/scsi/megaraid/megaraid_sas.h | 3 +++ >>>> drivers/scsi/megaraid/megaraid_sas_base.c | 14 ++++++-------- >>>> drivers/scsi/megaraid/megaraid_sas_fusion.c | 3 +++ >>>> drivers/scsi/megaraid/megaraid_sas_fusion.h | 2 ++ >>>> 4 files changed, 14 insertions(+), 8 deletions(-) >>>> >>>> diff --git a/drivers/scsi/megaraid/megaraid_sas.h >>>> b/drivers/scsi/megaraid/megaraid_sas.h >>>> index ca86c88..43fd14f 100644 >>>> --- a/drivers/scsi/megaraid/megaraid_sas.h >>>> +++ b/drivers/scsi/megaraid/megaraid_sas.h >>>> @@ -1429,6 +1429,8 @@ enum FW_BOOT_CONTEXT { >>>> #define MR_MAX_REPLY_QUEUES_EXT_OFFSET_SHIFT 14 >>>> #define MR_MAX_MSIX_REG_ARRAY 16 >>>> #define MR_RDPQ_MODE_OFFSET 0X00800000 >>>> +#define MR_CAN_HANDLE_SYNC_CACHE_OFFSET 0X01000000 >>>> + >>>> /* >>>> * register set for both 1068 and 1078 controllers >>>> * structure extended for 1078 registers @@ -2140,6 +2142,7 @@ >>>> struct megasas_instance { >>>> u8 is_imr; >>>> u8 is_rdpq; >>>> bool dev_handle; >>>> + bool fw_sync_cache_support; >>>> }; >>>> struct MR_LD_VF_MAP { >>>> u32 size; >>>> diff --git a/drivers/scsi/megaraid/megaraid_sas_base.c >>>> b/drivers/scsi/megaraid/megaraid_sas_base.c >>>> index 092387f..a4a8e2f 100644 >>>> --- a/drivers/scsi/megaraid/megaraid_sas_base.c >>>> +++ b/drivers/scsi/megaraid/megaraid_sas_base.c >>>> @@ -104,6 +104,10 @@ unsigned int scmd_timeout = >>>> MEGASAS_DEFAULT_CMD_TIMEOUT; module_param(scmd_timeout, int, >>>> S_IRUGO); MODULE_PARM_DESC(scmd_timeout, "scsi command timeout >>>> (10-90s), default 90s. See megasas_reset_timer."); >>>> >>>> +bool block_sync_cache; >>>> +module_param(block_sync_cache, bool, S_IRUGO); >>>> +MODULE_PARM_DESC(block_sync_cache, "Block SYNC CACHE by driver >>>> +Default: 0(Send it to firmware)"); >>>> + >>>> MODULE_LICENSE("GPL"); >>>> MODULE_VERSION(MEGASAS_VERSION); >>>> MODULE_AUTHOR("megaraidlinux.pdl@avagotech.com"); >>>> @@ -1700,16 +1704,10 @@ megasas_queue_command(struct Scsi_Host >>> *shost, struct scsi_cmnd *scmd) >>>> goto out_done; >>>> } >>>> >>>> - switch (scmd->cmnd[0]) { >>>> - case SYNCHRONIZE_CACHE: >>>> - /* >>>> - * FW takes care of flush cache on its own >>>> - * No need to send it down >>>> - */ >>>> + if ((scmd->cmnd[0] == SYNCHRONIZE_CACHE) && >>>> + (!instance->fw_sync_cache_support)) { >>> Maybe I'm wrong, but isn't the logic inverted? >>> when fw_sync_cache_support is true it means that there is a flash >>> cache >> or a >>> battery and we don't have to send the command down ? >>> >> If "instance->fw_sync_cache_support" is set to true, it means driver >> can send SYNCHRONIZE_CACHE to firmware(firmware has infrastructure to >> support SYNCHRONIZE_CACHE). >> If "instance->fw_sync_cache_support" is set to false, it means FW does >> not support to handle SYNCHRONIZE_CACHE and FW will flush cache during >> shutdown. > >Thanks, in that case it is correct. > >>>> + if (!block_sync_cache) >>>> + instance->fw_sync_cache_support = (scratch_pad_2 & >>>> + MR_CAN_HANDLE_SYNC_CACHE_OFFSET) ? 1 : 0; > >Please add a warning or log the state of the synchronise cache command on >the >controller. Ok. thanks for pointing it out. I will add a print for the same. > > >>>> IOCInitMessage = >>>> dma_alloc_coherent(&instance->pdev->dev, >>>> diff --git a/drivers/scsi/megaraid/megaraid_sas_fusion.h >>>> b/drivers/scsi/megaraid/megaraid_sas_fusion.h >>>> index e3bee04..2857154 100644 >>>> --- a/drivers/scsi/megaraid/megaraid_sas_fusion.h >>>> +++ b/drivers/scsi/megaraid/megaraid_sas_fusion.h >>>> @@ -72,6 +72,8 @@ >>>> #define MPI2_SUP_REPLY_POST_HOST_INDEX_OFFSET (0x0000030C) >>>> #define MPI2_REPLY_POST_HOST_INDEX_OFFSET (0x0000006C) >>>> >>>> +extern bool block_sync_cache; >>>> + >>>> /* >>>> * Raid context flags >>>> */ >> -- >> To unsubscribe from this list: send the line "unsubscribe linux-scsi" >> in the body of a message to majordomo@vger.kernel.org More majordomo >> info at http://vger.kernel.org/majordomo-info.html > -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Mon, Oct 17, 2016 at 09:01:29AM -0400, Ric Wheeler wrote: > This must go in - without this fix, there is no data integrity for any file system. megaraid always had odd ideas on cache flushing, and this might be a opportunity to write down all the assumptions and document them. > In effect, this driver by default has been throwing away SYNCHRONIZE_CACHE > commands even when acting in JBOD/non-RAID mode. That would explain some issues we've seen with megaraid hardware, but it seems a bit too shocking to be true. Looking over the patch I disagree with the module option - we must do the right thing by default, which is sending SYNCHRONIZE_CACHE commands if the WCE bit is set. If there are controllers where this is harmful for RAID mode and we can't fix the firmware in time we'll need to make special exceptions for this case in the driver based on the PCI ID and knowing what we talk to instead of leaving it to the user. > * having T10 & T13 report the existence of a volatile write cache - this is > different than WCE set, some devices have a write cache and are > battery/flash backed. T10 is pretty clear now the WCE should only be set for a non-voltile cache. For a while they had odd NV bits to allow flushing a non-volatile cache, but in the latest revisions all that is gone. -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 10/17/2016 11:55 AM, Christoph Hellwig wrote: > On Mon, Oct 17, 2016 at 09:01:29AM -0400, Ric Wheeler wrote: >> This must go in - without this fix, there is no data integrity for any file system. > megaraid always had odd ideas on cache flushing, and this might be > a opportunity to write down all the assumptions and document them. > >> In effect, this driver by default has been throwing away SYNCHRONIZE_CACHE >> commands even when acting in JBOD/non-RAID mode. > That would explain some issues we've seen with megaraid hardware, but > it seems a bit too shocking to be true. > > Looking over the patch I disagree with the module option - we must do > the right thing by default, which is sending SYNCHRONIZE_CACHE commands > if the WCE bit is set. If there are controllers where this is harmful > for RAID mode and we can't fix the firmware in time we'll need to make > special exceptions for this case in the driver based on the PCI ID > and knowing what we talk to instead of leaving it to the user. I do agree - having users be able to disable this easily is asking for trouble. It will be slower, but slower because the cache flush actually is effective. > >> * having T10 & T13 report the existence of a volatile write cache - this is >> different than WCE set, some devices have a write cache and are >> battery/flash backed. > T10 is pretty clear now the WCE should only be set for a non-voltile > cache. For a while they had odd NV bits to allow flushing a > non-volatile cache, but in the latest revisions all that is gone. If T10 has clarity on this, then the actual fix would be to have the driver advertise WCE enabled only for the pass through/non-RAID luns (assuming the drive's write cache was enabled) and then we would leave WCE disabled for the targets that the firmware handles cache destaging for. -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Mon, 2016-10-17 at 09:01 -0400, Ric Wheeler wrote: > On 10/17/2016 07:34 AM, Hannes Reinecke wrote: > > On 10/17/2016 12:24 PM, Sumit Saxena wrote: > > > megaraid_sas driver returns SYNCHRONIZE_CACHE command back to > > > SCSI layer without sending it to firmware as firmware takes care > > > of flushing cache. This patch will change the driver behavior > > > wrt SYNCHRONIZE_CACHE handling. If underlying firmware has > > > support to handle the SYNCHORNIZE_CACHE, driver will send it for > > > firmware otherwise complete it back to SCSI layer with SUCCESS > > > immediately. If Firmware handle SYNCHORNIZE_CACHE for both VD > > > and JBOD "canHandleSyncCache" bit in scratch pad register(offset > > > 0x00B4) will be set. > > > > > > This behavior can be controlled via module parameter and user can > > > fallback to old behavior of returning SYNCHRONIZE_CACHE by driver > > > only without sending it to firmware. > > > > > > Signed-off-by: Sumit Saxena <sumit.saxena@broadcom.com> > > > Signed-off-by: Kashyap Desai <kashyap.desai@broadcom.com> > > > --- > > > drivers/scsi/megaraid/megaraid_sas.h | 3 +++ > > > drivers/scsi/megaraid/megaraid_sas_base.c | 14 ++++++-------- > > > drivers/scsi/megaraid/megaraid_sas_fusion.c | 3 +++ > > > drivers/scsi/megaraid/megaraid_sas_fusion.h | 2 ++ > > > 4 files changed, 14 insertions(+), 8 deletions(-) > > > > > > diff --git a/drivers/scsi/megaraid/megaraid_sas.h > > > b/drivers/scsi/megaraid/megaraid_sas.h > > > index ca86c88..43fd14f 100644 > > > --- a/drivers/scsi/megaraid/megaraid_sas.h > > > +++ b/drivers/scsi/megaraid/megaraid_sas.h > > > @@ -1429,6 +1429,8 @@ enum FW_BOOT_CONTEXT { > > > #define MR_MAX_REPLY_QUEUES_EXT_OFFSET_SHIFT 14 > > > #define MR_MAX_MSIX_REG_ARRAY 16 > > > #define MR_RDPQ_MODE_OFFSET 0X00800000 > > > +#define MR_CAN_HANDLE_SYNC_CACHE_OFFSET 0X0100000 > > > 0 > > > + > > > /* > > > * register set for both 1068 and 1078 controllers > > > * structure extended for 1078 registers > > > @@ -2140,6 +2142,7 @@ struct megasas_instance { > > > u8 is_imr; > > > u8 is_rdpq; > > > bool dev_handle; > > > + bool fw_sync_cache_support; > > > }; > > > struct MR_LD_VF_MAP { > > > u32 size; > > > diff --git a/drivers/scsi/megaraid/megaraid_sas_base.c > > > b/drivers/scsi/megaraid/megaraid_sas_base.c > > > index 092387f..a4a8e2f 100644 > > > --- a/drivers/scsi/megaraid/megaraid_sas_base.c > > > +++ b/drivers/scsi/megaraid/megaraid_sas_base.c > > > @@ -104,6 +104,10 @@ unsigned int scmd_timeout = > > > MEGASAS_DEFAULT_CMD_TIMEOUT; > > > module_param(scmd_timeout, int, S_IRUGO); > > > MODULE_PARM_DESC(scmd_timeout, "scsi command timeout (10-90s), > > > default 90s. See megasas_reset_timer."); > > > > > > +bool block_sync_cache; > > > +module_param(block_sync_cache, bool, S_IRUGO); > > > +MODULE_PARM_DESC(block_sync_cache, "Block SYNC CACHE by driver > > > Default: 0(Send it to firmware)"); > > > + > > > MODULE_LICENSE("GPL"); > > > MODULE_VERSION(MEGASAS_VERSION); > > > MODULE_AUTHOR("megaraidlinux.pdl@avagotech.com"); > > > @@ -1700,16 +1704,10 @@ megasas_queue_command(struct Scsi_Host > > > *shost, struct scsi_cmnd *scmd) > > > goto out_done; > > > } > > > > > > - switch (scmd->cmnd[0]) { > > > - case SYNCHRONIZE_CACHE: > > > - /* > > > - * FW takes care of flush cache on its own > > > - * No need to send it down > > > - */ > > > + if ((scmd->cmnd[0] == SYNCHRONIZE_CACHE) && > > > + (!instance->fw_sync_cache_support)) { > > > scmd->result = DID_OK << 16; > > > goto out_done; > > > - default: > > > - break; > > > } > > > > > > return instance->instancet > > > ->build_and_issue_cmd(instance, scmd); > > > diff --git a/drivers/scsi/megaraid/megaraid_sas_fusion.c > > > b/drivers/scsi/megaraid/megaraid_sas_fusion.c > > > index 2159f6a..8237580 100644 > > > --- a/drivers/scsi/megaraid/megaraid_sas_fusion.c > > > +++ b/drivers/scsi/megaraid/megaraid_sas_fusion.c > > > @@ -747,6 +747,9 @@ megasas_ioc_init_fusion(struct > > > megasas_instance *instance) > > > ret = 1; > > > goto fail_fw_init; > > > } > > > + if (!block_sync_cache) > > > + instance->fw_sync_cache_support = (scratch_pad_2 > > > & > > > + MR_CAN_HANDLE_SYNC_CACHE_OFFSET) ? 1 : > > > 0; > > > > > > IOCInitMessage = > > > dma_alloc_coherent(&instance->pdev->dev, > > > diff --git a/drivers/scsi/megaraid/megaraid_sas_fusion.h > > > b/drivers/scsi/megaraid/megaraid_sas_fusion.h > > > index e3bee04..2857154 100644 > > > --- a/drivers/scsi/megaraid/megaraid_sas_fusion.h > > > +++ b/drivers/scsi/megaraid/megaraid_sas_fusion.h > > > @@ -72,6 +72,8 @@ > > > #define MPI2_SUP_REPLY_POST_HOST_INDEX_OFFSET (0x0000030C) > > > #define MPI2_REPLY_POST_HOST_INDEX_OFFSET (0x0000006C) > > > > > > +extern bool block_sync_cache; > > > + > > > /* > > > * Raid context flags > > > */ > > > > > Be extra careful with that. > > > > SYNCHRONIZE_CACHE has (potentially) a global scope, as there might > > be an array-wide cache, and a cache flush would affect the entire > > cache. Linux is using SYNCHRONIZE_CACHE as a per device setting, ie > > it assumes that the effects of a cache flush is restricted to the > > device in question. > > > > If this is _not_ the case I'd rather not enable it. > > Have you checked that enabling this functionality doesn't have > > negative performance impact? > > > > Cheers, > > > > Hannes > > This must go in - without this fix, there is no data integrity for > any file system. That's not what I get from the change log. What it says to me is that the caches are currently firmware managed. Barring firmware bugs, that means that we currently don't have any integrity issues. > In effect, this driver by default has been throwing away > SYNCHRONIZE_CACHE commands even when acting in JBOD/non-RAID mode. That's not what the changelog says. It says the cache flushing has been managed by the firmware only. Meaning the firmware decides when to flush the cache. Barring firmware bugs, this should mean that integrity is preserved in either mode. > Of course, actually doing a SYNCHRONIZE_CACHE to drives will be > slower than throwing it away, but this is not optional. What Hannes means is that we need to know that turning over cache management to Linux is a) safe and b) not a performance regression. Given that there aren't any integrity issues, that's a reasonable request. > We really need to have some ways to validate that our IO stack is > properly and safely configured. > > I would love to see a couple of things: > > * having T10 & T13 report the existence of a volatile write cache - > this is different than WCE set, some devices have a write cache and > are battery/flash backed. That's the non-volatile cache log page. James > * having a robust tool to test over power failure/disconnect that our > assumptions are true - any write is durable after a SYNCHRONIZE_CACHE > or CACHE_FLUSH_EXT is sent and ack'ed > > Regards, > > Ric > > > -- > To unsubscribe from this list: send the line "unsubscribe linux-scsi" > in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Mon, 2016-10-17 at 12:08 -0400, Ric Wheeler wrote: > On 10/17/2016 11:55 AM, Christoph Hellwig wrote: > > On Mon, Oct 17, 2016 at 09:01:29AM -0400, Ric Wheeler wrote: > >> This must go in - without this fix, there is no data integrity for any file system. > > megaraid always had odd ideas on cache flushing, and this might be > > a opportunity to write down all the assumptions and document them. > > > >> In effect, this driver by default has been throwing away SYNCHRONIZE_CACHE > >> commands even when acting in JBOD/non-RAID mode. > > That would explain some issues we've seen with megaraid hardware, but > > it seems a bit too shocking to be true. > > > > Looking over the patch I disagree with the module option - we must do > > the right thing by default, which is sending SYNCHRONIZE_CACHE commands > > if the WCE bit is set. If there are controllers where this is harmful > > for RAID mode and we can't fix the firmware in time we'll need to make > > special exceptions for this case in the driver based on the PCI ID > > and knowing what we talk to instead of leaving it to the user. > > I do agree - having users be able to disable this easily is asking for trouble. > It will be slower, but slower because the cache flush actually is effective. Absolutely. It is a bad idea to allow users to disable correct behavior in order to achieve better performance, you will always get people who will do this and then pay the price eventually when they get corruption. > > > > >> * having T10 & T13 report the existence of a volatile write cache - this is > >> different than WCE set, some devices have a write cache and are > >> battery/flash backed. > > T10 is pretty clear now the WCE should only be set for a non-voltile > > cache. For a while they had odd NV bits to allow flushing a > > non-volatile cache, but in the latest revisions all that is gone. > > If T10 has clarity on this, then the actual fix would be to have the driver > advertise WCE enabled only for the pass through/non-RAID luns (assuming the > drive's write cache was enabled) and then we would leave WCE disabled for the > targets that the firmware handles cache destaging for. > > -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 10/17/2016 12:20 PM, James Bottomley wrote: > On Mon, 2016-10-17 at 09:01 -0400, Ric Wheeler wrote: >> On 10/17/2016 07:34 AM, Hannes Reinecke wrote: >>> On 10/17/2016 12:24 PM, Sumit Saxena wrote: >>>> megaraid_sas driver returns SYNCHRONIZE_CACHE command back to >>>> SCSI layer without sending it to firmware as firmware takes care >>>> of flushing cache. This patch will change the driver behavior >>>> wrt SYNCHRONIZE_CACHE handling. If underlying firmware has >>>> support to handle the SYNCHORNIZE_CACHE, driver will send it for >>>> firmware otherwise complete it back to SCSI layer with SUCCESS >>>> immediately. If Firmware handle SYNCHORNIZE_CACHE for both VD >>>> and JBOD "canHandleSyncCache" bit in scratch pad register(offset >>>> 0x00B4) will be set. >>>> >>>> This behavior can be controlled via module parameter and user can >>>> fallback to old behavior of returning SYNCHRONIZE_CACHE by driver >>>> only without sending it to firmware. >>>> >>>> Signed-off-by: Sumit Saxena <sumit.saxena@broadcom.com> >>>> Signed-off-by: Kashyap Desai <kashyap.desai@broadcom.com> >>>> --- >>>> drivers/scsi/megaraid/megaraid_sas.h | 3 +++ >>>> drivers/scsi/megaraid/megaraid_sas_base.c | 14 ++++++-------- >>>> drivers/scsi/megaraid/megaraid_sas_fusion.c | 3 +++ >>>> drivers/scsi/megaraid/megaraid_sas_fusion.h | 2 ++ >>>> 4 files changed, 14 insertions(+), 8 deletions(-) >>>> >>>> diff --git a/drivers/scsi/megaraid/megaraid_sas.h >>>> b/drivers/scsi/megaraid/megaraid_sas.h >>>> index ca86c88..43fd14f 100644 >>>> --- a/drivers/scsi/megaraid/megaraid_sas.h >>>> +++ b/drivers/scsi/megaraid/megaraid_sas.h >>>> @@ -1429,6 +1429,8 @@ enum FW_BOOT_CONTEXT { >>>> #define MR_MAX_REPLY_QUEUES_EXT_OFFSET_SHIFT 14 >>>> #define MR_MAX_MSIX_REG_ARRAY 16 >>>> #define MR_RDPQ_MODE_OFFSET 0X00800000 >>>> +#define MR_CAN_HANDLE_SYNC_CACHE_OFFSET 0X0100000 >>>> 0 >>>> + >>>> /* >>>> * register set for both 1068 and 1078 controllers >>>> * structure extended for 1078 registers >>>> @@ -2140,6 +2142,7 @@ struct megasas_instance { >>>> u8 is_imr; >>>> u8 is_rdpq; >>>> bool dev_handle; >>>> + bool fw_sync_cache_support; >>>> }; >>>> struct MR_LD_VF_MAP { >>>> u32 size; >>>> diff --git a/drivers/scsi/megaraid/megaraid_sas_base.c >>>> b/drivers/scsi/megaraid/megaraid_sas_base.c >>>> index 092387f..a4a8e2f 100644 >>>> --- a/drivers/scsi/megaraid/megaraid_sas_base.c >>>> +++ b/drivers/scsi/megaraid/megaraid_sas_base.c >>>> @@ -104,6 +104,10 @@ unsigned int scmd_timeout = >>>> MEGASAS_DEFAULT_CMD_TIMEOUT; >>>> module_param(scmd_timeout, int, S_IRUGO); >>>> MODULE_PARM_DESC(scmd_timeout, "scsi command timeout (10-90s), >>>> default 90s. See megasas_reset_timer."); >>>> >>>> +bool block_sync_cache; >>>> +module_param(block_sync_cache, bool, S_IRUGO); >>>> +MODULE_PARM_DESC(block_sync_cache, "Block SYNC CACHE by driver >>>> Default: 0(Send it to firmware)"); >>>> + >>>> MODULE_LICENSE("GPL"); >>>> MODULE_VERSION(MEGASAS_VERSION); >>>> MODULE_AUTHOR("megaraidlinux.pdl@avagotech.com"); >>>> @@ -1700,16 +1704,10 @@ megasas_queue_command(struct Scsi_Host >>>> *shost, struct scsi_cmnd *scmd) >>>> goto out_done; >>>> } >>>> >>>> - switch (scmd->cmnd[0]) { >>>> - case SYNCHRONIZE_CACHE: >>>> - /* >>>> - * FW takes care of flush cache on its own >>>> - * No need to send it down >>>> - */ >>>> + if ((scmd->cmnd[0] == SYNCHRONIZE_CACHE) && >>>> + (!instance->fw_sync_cache_support)) { >>>> scmd->result = DID_OK << 16; >>>> goto out_done; >>>> - default: >>>> - break; >>>> } >>>> >>>> return instance->instancet >>>> ->build_and_issue_cmd(instance, scmd); >>>> diff --git a/drivers/scsi/megaraid/megaraid_sas_fusion.c >>>> b/drivers/scsi/megaraid/megaraid_sas_fusion.c >>>> index 2159f6a..8237580 100644 >>>> --- a/drivers/scsi/megaraid/megaraid_sas_fusion.c >>>> +++ b/drivers/scsi/megaraid/megaraid_sas_fusion.c >>>> @@ -747,6 +747,9 @@ megasas_ioc_init_fusion(struct >>>> megasas_instance *instance) >>>> ret = 1; >>>> goto fail_fw_init; >>>> } >>>> + if (!block_sync_cache) >>>> + instance->fw_sync_cache_support = (scratch_pad_2 >>>> & >>>> + MR_CAN_HANDLE_SYNC_CACHE_OFFSET) ? 1 : >>>> 0; >>>> >>>> IOCInitMessage = >>>> dma_alloc_coherent(&instance->pdev->dev, >>>> diff --git a/drivers/scsi/megaraid/megaraid_sas_fusion.h >>>> b/drivers/scsi/megaraid/megaraid_sas_fusion.h >>>> index e3bee04..2857154 100644 >>>> --- a/drivers/scsi/megaraid/megaraid_sas_fusion.h >>>> +++ b/drivers/scsi/megaraid/megaraid_sas_fusion.h >>>> @@ -72,6 +72,8 @@ >>>> #define MPI2_SUP_REPLY_POST_HOST_INDEX_OFFSET (0x0000030C) >>>> #define MPI2_REPLY_POST_HOST_INDEX_OFFSET (0x0000006C) >>>> >>>> +extern bool block_sync_cache; >>>> + >>>> /* >>>> * Raid context flags >>>> */ >>>> >>> Be extra careful with that. >>> >>> SYNCHRONIZE_CACHE has (potentially) a global scope, as there might >>> be an array-wide cache, and a cache flush would affect the entire >>> cache. Linux is using SYNCHRONIZE_CACHE as a per device setting, ie >>> it assumes that the effects of a cache flush is restricted to the >>> device in question. >>> >>> If this is _not_ the case I'd rather not enable it. >>> Have you checked that enabling this functionality doesn't have >>> negative performance impact? >>> >>> Cheers, >>> >>> Hannes >> This must go in - without this fix, there is no data integrity for >> any file system. > That's not what I get from the change log. What it says to me is that > the caches are currently firmware managed. Barring firmware bugs, that > means that we currently don't have any integrity issues. Your understanding (or the change log) is incorrect. The issue here is for devices that are non-RAID or pass through - this is a real issue and has been seen in practice. Ric > >> In effect, this driver by default has been throwing away >> SYNCHRONIZE_CACHE commands even when acting in JBOD/non-RAID mode. > That's not what the changelog says. It says the cache flushing has > been managed by the firmware only. Meaning the firmware decides when > to flush the cache. Barring firmware bugs, this should mean that > integrity is preserved in either mode. > >> Of course, actually doing a SYNCHRONIZE_CACHE to drives will be >> slower than throwing it away, but this is not optional. > What Hannes means is that we need to know that turning over cache > management to Linux is a) safe and b) not a performance regression. > Given that there aren't any integrity issues, that's a reasonable > request. > >> We really need to have some ways to validate that our IO stack is >> properly and safely configured. >> >> I would love to see a couple of things: >> >> * having T10 & T13 report the existence of a volatile write cache - >> this is different than WCE set, some devices have a write cache and >> are battery/flash backed. > That's the non-volatile cache log page. > > James > > >> * having a robust tool to test over power failure/disconnect that our >> assumptions are true - any write is durable after a SYNCHRONIZE_CACHE >> or CACHE_FLUSH_EXT is sent and ack'ed >> >> Regards, >> >> Ric >> >> >> -- >> To unsubscribe from this list: send the line "unsubscribe linux-scsi" >> in >> the body of a message to majordomo@vger.kernel.org >> More majordomo info at http://vger.kernel.org/majordomo-info.html >> -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 10/17/2016 12:20 PM, James Bottomley wrote: >> We really need to have some ways to validate that our IO stack is >> >properly and safely configured. >> > >> >I would love to see a couple of things: >> > >> >* having T10 & T13 report the existence of a volatile write cache - >> >this is different than WCE set, some devices have a write cache and >> >are battery/flash backed. > That's the non-volatile cache log page. > > James > > That is not how the vendors implement this unfortunately. I dug into this non-volatile cache log page with several disk vendors last year and they were consistently not setting this to reflect that state. Ric -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Mon, 2016-10-17 at 12:27 -0400, Ric Wheeler wrote: > On 10/17/2016 12:20 PM, James Bottomley wrote: > > On Mon, 2016-10-17 at 09:01 -0400, Ric Wheeler wrote: > > > On 10/17/2016 07:34 AM, Hannes Reinecke wrote: > > > > On 10/17/2016 12:24 PM, Sumit Saxena wrote: > > > > > megaraid_sas driver returns SYNCHRONIZE_CACHE command back to > > > > > SCSI layer without sending it to firmware as firmware takes > > > > > care > > > > > of flushing cache. This patch will change the driver > > > > > behavior > > > > > wrt SYNCHRONIZE_CACHE handling. If underlying firmware has > > > > > support to handle the SYNCHORNIZE_CACHE, driver will send it > > > > > for > > > > > firmware otherwise complete it back to SCSI layer with > > > > > SUCCESS > > > > > immediately. If Firmware handle SYNCHORNIZE_CACHE for both > > > > > VD > > > > > and JBOD "canHandleSyncCache" bit in scratch pad > > > > > register(offset > > > > > 0x00B4) will be set. > > > > > > > > > > This behavior can be controlled via module parameter and user > > > > > can > > > > > fallback to old behavior of returning SYNCHRONIZE_CACHE by > > > > > driver > > > > > only without sending it to firmware. > > > > > > > > > > Signed-off-by: Sumit Saxena <sumit.saxena@broadcom.com> > > > > > Signed-off-by: Kashyap Desai <kashyap.desai@broadcom.com> > > > > > --- > > > > > drivers/scsi/megaraid/megaraid_sas.h | 3 +++ > > > > > drivers/scsi/megaraid/megaraid_sas_base.c | 14 ++++++--- > > > > > ----- > > > > > drivers/scsi/megaraid/megaraid_sas_fusion.c | 3 +++ > > > > > drivers/scsi/megaraid/megaraid_sas_fusion.h | 2 ++ > > > > > 4 files changed, 14 insertions(+), 8 deletions(-) > > > > > > > > > > diff --git a/drivers/scsi/megaraid/megaraid_sas.h > > > > > b/drivers/scsi/megaraid/megaraid_sas.h > > > > > index ca86c88..43fd14f 100644 > > > > > --- a/drivers/scsi/megaraid/megaraid_sas.h > > > > > +++ b/drivers/scsi/megaraid/megaraid_sas.h > > > > > @@ -1429,6 +1429,8 @@ enum FW_BOOT_CONTEXT { > > > > > #define MR_MAX_REPLY_QUEUES_EXT_OFFSET_SHIFT 14 > > > > > #define MR_MAX_MSIX_REG_ARRAY 16 > > > > > #define MR_RDPQ_MODE_OFFSET 0X00800 > > > > > 000 > > > > > +#define MR_CAN_HANDLE_SYNC_CACHE_OFFSET 0X010 > > > > > 0000 > > > > > 0 > > > > > + > > > > > /* > > > > > * register set for both 1068 and 1078 controllers > > > > > * structure extended for 1078 registers > > > > > @@ -2140,6 +2142,7 @@ struct megasas_instance { > > > > > u8 is_imr; > > > > > u8 is_rdpq; > > > > > bool dev_handle; > > > > > + bool fw_sync_cache_support; > > > > > }; > > > > > struct MR_LD_VF_MAP { > > > > > u32 size; > > > > > diff --git a/drivers/scsi/megaraid/megaraid_sas_base.c > > > > > b/drivers/scsi/megaraid/megaraid_sas_base.c > > > > > index 092387f..a4a8e2f 100644 > > > > > --- a/drivers/scsi/megaraid/megaraid_sas_base.c > > > > > +++ b/drivers/scsi/megaraid/megaraid_sas_base.c > > > > > @@ -104,6 +104,10 @@ unsigned int scmd_timeout = > > > > > MEGASAS_DEFAULT_CMD_TIMEOUT; > > > > > module_param(scmd_timeout, int, S_IRUGO); > > > > > MODULE_PARM_DESC(scmd_timeout, "scsi command timeout (10 > > > > > -90s), > > > > > default 90s. See megasas_reset_timer."); > > > > > > > > > > +bool block_sync_cache; > > > > > +module_param(block_sync_cache, bool, S_IRUGO); > > > > > +MODULE_PARM_DESC(block_sync_cache, "Block SYNC CACHE by > > > > > driver > > > > > Default: 0(Send it to firmware)"); > > > > > + > > > > > MODULE_LICENSE("GPL"); > > > > > MODULE_VERSION(MEGASAS_VERSION); > > > > > MODULE_AUTHOR("megaraidlinux.pdl@avagotech.com"); > > > > > @@ -1700,16 +1704,10 @@ megasas_queue_command(struct > > > > > Scsi_Host > > > > > *shost, struct scsi_cmnd *scmd) > > > > > goto out_done; > > > > > } > > > > > > > > > > - switch (scmd->cmnd[0]) { > > > > > - case SYNCHRONIZE_CACHE: > > > > > - /* > > > > > - * FW takes care of flush cache on its own > > > > > - * No need to send it down > > > > > - */ > > > > > + if ((scmd->cmnd[0] == SYNCHRONIZE_CACHE) && > > > > > + (!instance->fw_sync_cache_support)) { > > > > > scmd->result = DID_OK << 16; > > > > > goto out_done; > > > > > - default: > > > > > - break; > > > > > } > > > > > > > > > > return instance->instancet > > > > > ->build_and_issue_cmd(instance, scmd); > > > > > diff --git a/drivers/scsi/megaraid/megaraid_sas_fusion.c > > > > > b/drivers/scsi/megaraid/megaraid_sas_fusion.c > > > > > index 2159f6a..8237580 100644 > > > > > --- a/drivers/scsi/megaraid/megaraid_sas_fusion.c > > > > > +++ b/drivers/scsi/megaraid/megaraid_sas_fusion.c > > > > > @@ -747,6 +747,9 @@ megasas_ioc_init_fusion(struct > > > > > megasas_instance *instance) > > > > > ret = 1; > > > > > goto fail_fw_init; > > > > > } > > > > > + if (!block_sync_cache) > > > > > + instance->fw_sync_cache_support = > > > > > (scratch_pad_2 > > > > > & > > > > > + MR_CAN_HANDLE_SYNC_CACHE_OFFSET) ? 1 > > > > > : > > > > > 0; > > > > > > > > > > IOCInitMessage = > > > > > dma_alloc_coherent(&instance->pdev->dev, > > > > > diff --git a/drivers/scsi/megaraid/megaraid_sas_fusion.h > > > > > b/drivers/scsi/megaraid/megaraid_sas_fusion.h > > > > > index e3bee04..2857154 100644 > > > > > --- a/drivers/scsi/megaraid/megaraid_sas_fusion.h > > > > > +++ b/drivers/scsi/megaraid/megaraid_sas_fusion.h > > > > > @@ -72,6 +72,8 @@ > > > > > #define MPI2_SUP_REPLY_POST_HOST_INDEX_OFFSET > > > > > (0x0000030C) > > > > > #define MPI2_REPLY_POST_HOST_INDEX_OFFSET (0x000000 > > > > > 6C) > > > > > > > > > > +extern bool block_sync_cache; > > > > > + > > > > > /* > > > > > * Raid context flags > > > > > */ > > > > > > > > > Be extra careful with that. > > > > > > > > SYNCHRONIZE_CACHE has (potentially) a global scope, as there > > > > might > > > > be an array-wide cache, and a cache flush would affect the > > > > entire > > > > cache. Linux is using SYNCHRONIZE_CACHE as a per device > > > > setting, ie > > > > it assumes that the effects of a cache flush is restricted to > > > > the > > > > device in question. > > > > > > > > If this is _not_ the case I'd rather not enable it. > > > > Have you checked that enabling this functionality doesn't have > > > > negative performance impact? > > > > > > > > Cheers, > > > > > > > > Hannes > > > This must go in - without this fix, there is no data integrity > > > for > > > any file system. > > That's not what I get from the change log. What it says to me is > > that the caches are currently firmware managed. Barring firmware > > bugs, that means that we currently don't have any integrity issues. > > Your understanding (or the change log) is incorrect. There's no current way in English of construing "as firmware takes care of flushing cache" to mean its inverse, so the changelog needs updating to explain that firmware does *not* take care of cache flushing, so effectively nothing does and we'll need a cc to stable if the problem is that nothing takes care of flushing the drive caches. James > The issue here is for devices that are non-RAID or pass through - > this is a real issue and has been seen in practice. -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 10/17/2016 01:19 PM, James Bottomley wrote: >>> That's not what I get from the change log. What it says to me is >>> > >that the caches are currently firmware managed. Barring firmware >>> > >bugs, that means that we currently don't have any integrity issues. >> > >> >Your understanding (or the change log) is incorrect. > There's no current way in English of construing "as firmware takes care > of flushing cache" to mean its inverse, so the changelog needs updating > to explain that firmware does*not* take care of cache flushing, so > effectively nothing does and we'll need a cc to stable if the problem > is that nothing takes care of flushing the drive caches. > > James > > I agree that the changelog should be fixed, but the code itself is clearly broken in how is discards synchronize_cache commands. I assume Sumit will update that for use. The specific situation is this: * when using devices in RAID mode, the firmware does handle cache flushes (I assume it disables the back end disk write caches and handles its HBA write cache as you would expect). * when using devices in non-RAID mode or pass through mode where the backend disks have write cache enabled, dropping the "SYNCHRONIZE_CACHE" commands in the driver is *never* correct for any disk with a volatile write cache * there is no standard way to distinguish devices that have WCE set and have non-volatile write caches, so we have to assume the worst. Regards, Ric -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
> -----Original Message----- > From: James Bottomley [mailto:jejb@linux.vnet.ibm.com] > Sent: Monday, October 17, 2016 10:50 PM > To: Ric Wheeler; Hannes Reinecke; Sumit Saxena; linux-scsi@vger.kernel.org > Cc: martin.petersen@oracle.com; thenzl@redhat.com; > kashyap.desai@broadcom.com; Christoph Hellwig; Martin K. Petersen; Jeff > Moyer; Gris Ge; Ewan Milne; Jens Axboe > Subject: Re: [PATCH 4/7] megaraid_sas: Send SYNCHRONIZE_CACHE command > to firmware > > On Mon, 2016-10-17 at 12:27 -0400, Ric Wheeler wrote: > > On 10/17/2016 12:20 PM, James Bottomley wrote: > > > On Mon, 2016-10-17 at 09:01 -0400, Ric Wheeler wrote: > > > > On 10/17/2016 07:34 AM, Hannes Reinecke wrote: > > > > > On 10/17/2016 12:24 PM, Sumit Saxena wrote: > > > > > > megaraid_sas driver returns SYNCHRONIZE_CACHE command back to > > > > > > SCSI layer without sending it to firmware as firmware takes > > > > > > care of flushing cache. This patch will change the driver > > > > > > behavior wrt SYNCHRONIZE_CACHE handling. If underlying > > > > > > firmware has support to handle the SYNCHORNIZE_CACHE, driver > > > > > > will send it for firmware otherwise complete it back to SCSI > > > > > > layer with SUCCESS immediately. If Firmware handle > > > > > > SYNCHORNIZE_CACHE for both VD and JBOD "canHandleSyncCache" > > > > > > bit in scratch pad register(offset > > > > > > 0x00B4) will be set. > > > > > > > > > > > > This behavior can be controlled via module parameter and user > > > > > > can fallback to old behavior of returning SYNCHRONIZE_CACHE by > > > > > > driver only without sending it to firmware. > > > > > > > > > > > > Signed-off-by: Sumit Saxena <sumit.saxena@broadcom.com> > > > > > > Signed-off-by: Kashyap Desai <kashyap.desai@broadcom.com> > > > > > > --- > > > > > > drivers/scsi/megaraid/megaraid_sas.h | 3 +++ > > > > > > drivers/scsi/megaraid/megaraid_sas_base.c | 14 ++++++--- > > > > > > ----- > > > > > > drivers/scsi/megaraid/megaraid_sas_fusion.c | 3 +++ > > > > > > drivers/scsi/megaraid/megaraid_sas_fusion.h | 2 ++ > > > > > > 4 files changed, 14 insertions(+), 8 deletions(-) > > > > > > > > > > > > diff --git a/drivers/scsi/megaraid/megaraid_sas.h > > > > > > b/drivers/scsi/megaraid/megaraid_sas.h > > > > > > index ca86c88..43fd14f 100644 > > > > > > --- a/drivers/scsi/megaraid/megaraid_sas.h > > > > > > +++ b/drivers/scsi/megaraid/megaraid_sas.h > > > > > > @@ -1429,6 +1429,8 @@ enum FW_BOOT_CONTEXT { > > > > > > #define MR_MAX_REPLY_QUEUES_EXT_OFFSET_SHIFT 14 > > > > > > #define MR_MAX_MSIX_REG_ARRAY 16 > > > > > > #define MR_RDPQ_MODE_OFFSET 0X00800 > > > > > > 000 > > > > > > +#define MR_CAN_HANDLE_SYNC_CACHE_OFFSET 0X010 > > > > > > 0000 > > > > > > 0 > > > > > > + > > > > > > /* > > > > > > * register set for both 1068 and 1078 controllers > > > > > > * structure extended for 1078 registers @@ -2140,6 +2142,7 > > > > > > @@ struct megasas_instance { > > > > > > u8 is_imr; > > > > > > u8 is_rdpq; > > > > > > bool dev_handle; > > > > > > + bool fw_sync_cache_support; > > > > > > }; > > > > > > struct MR_LD_VF_MAP { > > > > > > u32 size; > > > > > > diff --git a/drivers/scsi/megaraid/megaraid_sas_base.c > > > > > > b/drivers/scsi/megaraid/megaraid_sas_base.c > > > > > > index 092387f..a4a8e2f 100644 > > > > > > --- a/drivers/scsi/megaraid/megaraid_sas_base.c > > > > > > +++ b/drivers/scsi/megaraid/megaraid_sas_base.c > > > > > > @@ -104,6 +104,10 @@ unsigned int scmd_timeout = > > > > > > MEGASAS_DEFAULT_CMD_TIMEOUT; > > > > > > module_param(scmd_timeout, int, S_IRUGO); > > > > > > MODULE_PARM_DESC(scmd_timeout, "scsi command timeout (10 > > > > > > -90s), default 90s. See megasas_reset_timer."); > > > > > > > > > > > > +bool block_sync_cache; > > > > > > +module_param(block_sync_cache, bool, S_IRUGO); > > > > > > +MODULE_PARM_DESC(block_sync_cache, "Block SYNC CACHE by > > > > > > driver > > > > > > Default: 0(Send it to firmware)"); > > > > > > + > > > > > > MODULE_LICENSE("GPL"); > > > > > > MODULE_VERSION(MEGASAS_VERSION); > > > > > > MODULE_AUTHOR("megaraidlinux.pdl@avagotech.com"); > > > > > > @@ -1700,16 +1704,10 @@ megasas_queue_command(struct > Scsi_Host > > > > > > *shost, struct scsi_cmnd *scmd) > > > > > > goto out_done; > > > > > > } > > > > > > > > > > > > - switch (scmd->cmnd[0]) { > > > > > > - case SYNCHRONIZE_CACHE: > > > > > > - /* > > > > > > - * FW takes care of flush cache on its own > > > > > > - * No need to send it down > > > > > > - */ > > > > > > + if ((scmd->cmnd[0] == SYNCHRONIZE_CACHE) && > > > > > > + (!instance->fw_sync_cache_support)) { > > > > > > scmd->result = DID_OK << 16; > > > > > > goto out_done; > > > > > > - default: > > > > > > - break; > > > > > > } > > > > > > > > > > > > return instance->instancet > > > > > > ->build_and_issue_cmd(instance, scmd); > > > > > > diff --git a/drivers/scsi/megaraid/megaraid_sas_fusion.c > > > > > > b/drivers/scsi/megaraid/megaraid_sas_fusion.c > > > > > > index 2159f6a..8237580 100644 > > > > > > --- a/drivers/scsi/megaraid/megaraid_sas_fusion.c > > > > > > +++ b/drivers/scsi/megaraid/megaraid_sas_fusion.c > > > > > > @@ -747,6 +747,9 @@ megasas_ioc_init_fusion(struct > > > > > > megasas_instance *instance) > > > > > > ret = 1; > > > > > > goto fail_fw_init; > > > > > > } > > > > > > + if (!block_sync_cache) > > > > > > + instance->fw_sync_cache_support = > > > > > > (scratch_pad_2 > > > > > > & > > > > > > + MR_CAN_HANDLE_SYNC_CACHE_OFFSET) ? 1 > > > > > > : > > > > > > 0; > > > > > > > > > > > > IOCInitMessage = > > > > > > dma_alloc_coherent(&instance->pdev->dev, > > > > > > diff --git a/drivers/scsi/megaraid/megaraid_sas_fusion.h > > > > > > b/drivers/scsi/megaraid/megaraid_sas_fusion.h > > > > > > index e3bee04..2857154 100644 > > > > > > --- a/drivers/scsi/megaraid/megaraid_sas_fusion.h > > > > > > +++ b/drivers/scsi/megaraid/megaraid_sas_fusion.h > > > > > > @@ -72,6 +72,8 @@ > > > > > > #define MPI2_SUP_REPLY_POST_HOST_INDEX_OFFSET > > > > > > (0x0000030C) > > > > > > #define MPI2_REPLY_POST_HOST_INDEX_OFFSET (0x000000 > > > > > > 6C) > > > > > > > > > > > > +extern bool block_sync_cache; > > > > > > + > > > > > > /* > > > > > > * Raid context flags > > > > > > */ > > > > > > > > > > > Be extra careful with that. > > > > > > > > > > SYNCHRONIZE_CACHE has (potentially) a global scope, as there > > > > > might be an array-wide cache, and a cache flush would affect the > > > > > entire cache. Linux is using SYNCHRONIZE_CACHE as a per device > > > > > setting, ie it assumes that the effects of a cache flush is > > > > > restricted to the device in question. > > > > > > > > > > If this is _not_ the case I'd rather not enable it. > > > > > Have you checked that enabling this functionality doesn't have > > > > > negative performance impact? > > > > > > > > > > Cheers, > > > > > > > > > > Hannes > > > > This must go in - without this fix, there is no data integrity for > > > > any file system. > > > That's not what I get from the change log. What it says to me is > > > that the caches are currently firmware managed. Barring firmware > > > bugs, that means that we currently don't have any integrity issues. > > > > Your understanding (or the change log) is incorrect. > > There's no current way in English of construing "as firmware takes care of > flushing cache" to mean its inverse, so the changelog needs updating to > explain > that firmware does *not* take care of cache flushing, so effectively > nothing > does and we'll need a cc to stable if the problem is that nothing takes > care of > flushing the drive caches. > > James Sorry for confusion. More accurate to say - Current megaraid_sas driver returns SYNCHRONIZE_CACHE command back to SCSI layer without sending it down to firmware as firmware supposed to takes care of flushing cache for all Virtual Disk at the time of system reboot/shutdown. Because of above FW rule driver coded wrongly and it added bug that SYNCHRONIZE_CACHE is not passed to FW for JBOD as well. (MR Firmware wanted OS to manage SYNCHRONIZE_CACHE and pass the same to the Firmware. ). We figure out that even for VD, why driver should send back SYNC_CACHE command...let's have the same in FW (giving all control in FW) New behavior described - IF application requests SYNCH_CACHE IF any FW which supports new API bit called canHandleSyncCache Driver sends SYNCH_CACHE command to the FW IF 'VirtualDisk' FW does not send SYNCH_CACHE to drives FW returns SUCCESS ELSE IF 'JBOD' FW sends SYNCH_CACHE to drive FW obtains status from drive and returns same status back to driver ENDIF Fixing this is robust if FW and driver changes are inline. See below for more detail. Case - 1 This patch is attempt to fix one level problem where Virtual Disks are not managed correctly in MR FW. There are some MR FW (E.a Gen2 and Gen2.5 FW), which set WCE bit for Virtual Disk but FW is not reply correct for SYNCH_CACHE. This was handled in past and carry forward (in driver + FW ) to all new generation products as well. We tried to collect all possible details why it was handled such way to provide better driver fix for this issue( mainly to avoid this FW checks and module parameters etc..). But now it looks like to make generic fix (Device ID based etc..)is even risky, so went with this protective approach. It is very risky to find out which Device ID and FW has capability to manage SYNC_CACHE, so we managed to figure out which are the new FW using FW capability bit. E.g Liberator/Thunderbolt MR FW SYNC_CACHE is write Unsupported command (this is a firmware bug) and immediately it will be failed to SML. This will cause File system to go Read-only. Case -2 One more thing which we are trying and known driver can send "SYNC_CACHE" unconditionally to all generation of FW. For this we are doing more unit testing on various controllers/FW and planning to provide another patch to fix JBOD path for any FW. (It will not be based on FW capability/module parameter). If we remove module parameter, we can ask customer to disable WCE on drive to get similar impact. ` Kashyap > > > > The issue here is for devices that are non-RAID or pass through - this > > is a real issue and has been seen in practice. > > -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Mon, 2016-10-17 at 23:06 +0530, Kashyap Desai wrote: > > -----Original Message----- > > From: James Bottomley [mailto:jejb@linux.vnet.ibm.com] > > Sent: Monday, October 17, 2016 10:50 PM > > To: Ric Wheeler; Hannes Reinecke; Sumit Saxena; > > linux-scsi@vger.kernel.org > > Cc: martin.petersen@oracle.com; thenzl@redhat.com; > > kashyap.desai@broadcom.com; Christoph Hellwig; Martin K. Petersen; > > Jeff > > Moyer; Gris Ge; Ewan Milne; Jens Axboe > > Subject: Re: [PATCH 4/7] megaraid_sas: Send SYNCHRONIZE_CACHE > > command > > to firmware > > > > On Mon, 2016-10-17 at 12:27 -0400, Ric Wheeler wrote: > > > On 10/17/2016 12:20 PM, James Bottomley wrote: > > > > On Mon, 2016-10-17 at 09:01 -0400, Ric Wheeler wrote: > > > > > On 10/17/2016 07:34 AM, Hannes Reinecke wrote: > > > > > > On 10/17/2016 12:24 PM, Sumit Saxena wrote: > > > > > > > megaraid_sas driver returns SYNCHRONIZE_CACHE command > > > > > > > back to > > > > > > > SCSI layer without sending it to firmware as firmware > > > > > > > takes > > > > > > > care of flushing cache. This patch will change the > > > > > > > driver > > > > > > > behavior wrt SYNCHRONIZE_CACHE handling. If underlying > > > > > > > firmware has support to handle the SYNCHORNIZE_CACHE, > > > > > > > driver > > > > > > > will send it for firmware otherwise complete it back to > > > > > > > SCSI > > > > > > > layer with SUCCESS immediately. If Firmware handle > > > > > > > SYNCHORNIZE_CACHE for both VD and JBOD > > > > > > > "canHandleSyncCache" > > > > > > > bit in scratch pad register(offset > > > > > > > 0x00B4) will be set. > > > > > > > > > > > > > > This behavior can be controlled via module parameter and > > > > > > > user > > > > > > > can fallback to old behavior of returning > > > > > > > SYNCHRONIZE_CACHE by > > > > > > > driver only without sending it to firmware. > > > > > > > > > > > > > > Signed-off-by: Sumit Saxena <sumit.saxena@broadcom.com> > > > > > > > Signed-off-by: Kashyap Desai <kashyap.desai@broadcom.com> > > > > > > > --- > > > > > > > drivers/scsi/megaraid/megaraid_sas.h | 3 +++ > > > > > > > drivers/scsi/megaraid/megaraid_sas_base.c | 14 > > > > > > > ++++++--- > > > > > > > ----- > > > > > > > drivers/scsi/megaraid/megaraid_sas_fusion.c | 3 +++ > > > > > > > drivers/scsi/megaraid/megaraid_sas_fusion.h | 2 ++ > > > > > > > 4 files changed, 14 insertions(+), 8 deletions(-) > > > > > > > > > > > > > > diff --git a/drivers/scsi/megaraid/megaraid_sas.h > > > > > > > b/drivers/scsi/megaraid/megaraid_sas.h > > > > > > > index ca86c88..43fd14f 100644 > > > > > > > --- a/drivers/scsi/megaraid/megaraid_sas.h > > > > > > > +++ b/drivers/scsi/megaraid/megaraid_sas.h > > > > > > > @@ -1429,6 +1429,8 @@ enum FW_BOOT_CONTEXT { > > > > > > > #define MR_MAX_REPLY_QUEUES_EXT_OFFSET_SHIFT 14 > > > > > > > #define MR_MAX_MSIX_REG_ARRAY 16 > > > > > > > #define MR_RDPQ_MODE_OFFSET 0X0 > > > > > > > 0800 > > > > > > > 000 > > > > > > > +#define MR_CAN_HANDLE_SYNC_CACHE_OFFSET 0 > > > > > > > X010 > > > > > > > 0000 > > > > > > > 0 > > > > > > > + > > > > > > > /* > > > > > > > * register set for both 1068 and 1078 controllers > > > > > > > * structure extended for 1078 registers @@ -2140,6 > > > > > > > +2142,7 > > > > > > > @@ struct megasas_instance { > > > > > > > u8 is_imr; > > > > > > > u8 is_rdpq; > > > > > > > bool dev_handle; > > > > > > > + bool fw_sync_cache_support; > > > > > > > }; > > > > > > > struct MR_LD_VF_MAP { > > > > > > > u32 size; > > > > > > > diff --git a/drivers/scsi/megaraid/megaraid_sas_base.c > > > > > > > b/drivers/scsi/megaraid/megaraid_sas_base.c > > > > > > > index 092387f..a4a8e2f 100644 > > > > > > > --- a/drivers/scsi/megaraid/megaraid_sas_base.c > > > > > > > +++ b/drivers/scsi/megaraid/megaraid_sas_base.c > > > > > > > @@ -104,6 +104,10 @@ unsigned int scmd_timeout = > > > > > > > MEGASAS_DEFAULT_CMD_TIMEOUT; > > > > > > > module_param(scmd_timeout, int, S_IRUGO); > > > > > > > MODULE_PARM_DESC(scmd_timeout, "scsi command timeout > > > > > > > (10 > > > > > > > -90s), default 90s. See megasas_reset_timer."); > > > > > > > > > > > > > > +bool block_sync_cache; > > > > > > > +module_param(block_sync_cache, bool, S_IRUGO); > > > > > > > +MODULE_PARM_DESC(block_sync_cache, "Block SYNC CACHE by > > > > > > > driver > > > > > > > Default: 0(Send it to firmware)"); > > > > > > > + > > > > > > > MODULE_LICENSE("GPL"); > > > > > > > MODULE_VERSION(MEGASAS_VERSION); > > > > > > > MODULE_AUTHOR("megaraidlinux.pdl@avagotech.com"); > > > > > > > @@ -1700,16 +1704,10 @@ megasas_queue_command(struct > > Scsi_Host > > > > > > > *shost, struct scsi_cmnd *scmd) > > > > > > > goto out_done; > > > > > > > } > > > > > > > > > > > > > > - switch (scmd->cmnd[0]) { > > > > > > > - case SYNCHRONIZE_CACHE: > > > > > > > - /* > > > > > > > - * FW takes care of flush cache on its > > > > > > > own > > > > > > > - * No need to send it down > > > > > > > - */ > > > > > > > + if ((scmd->cmnd[0] == SYNCHRONIZE_CACHE) && > > > > > > > + (!instance->fw_sync_cache_support)) { > > > > > > > scmd->result = DID_OK << 16; > > > > > > > goto out_done; > > > > > > > - default: > > > > > > > - break; > > > > > > > } > > > > > > > > > > > > > > return instance->instancet > > > > > > > ->build_and_issue_cmd(instance, scmd); > > > > > > > diff --git a/drivers/scsi/megaraid/megaraid_sas_fusion.c > > > > > > > b/drivers/scsi/megaraid/megaraid_sas_fusion.c > > > > > > > index 2159f6a..8237580 100644 > > > > > > > --- a/drivers/scsi/megaraid/megaraid_sas_fusion.c > > > > > > > +++ b/drivers/scsi/megaraid/megaraid_sas_fusion.c > > > > > > > @@ -747,6 +747,9 @@ megasas_ioc_init_fusion(struct > > > > > > > megasas_instance *instance) > > > > > > > ret = 1; > > > > > > > goto fail_fw_init; > > > > > > > } > > > > > > > + if (!block_sync_cache) > > > > > > > + instance->fw_sync_cache_support = > > > > > > > (scratch_pad_2 > > > > > > > & > > > > > > > + MR_CAN_HANDLE_SYNC_CACHE_OFFSET) > > > > > > > ? 1 > > > > > > > : > > > > > > > 0; > > > > > > > > > > > > > > IOCInitMessage = > > > > > > > dma_alloc_coherent(&instance->pdev->dev, > > > > > > > diff --git a/drivers/scsi/megaraid/megaraid_sas_fusion.h > > > > > > > b/drivers/scsi/megaraid/megaraid_sas_fusion.h > > > > > > > index e3bee04..2857154 100644 > > > > > > > --- a/drivers/scsi/megaraid/megaraid_sas_fusion.h > > > > > > > +++ b/drivers/scsi/megaraid/megaraid_sas_fusion.h > > > > > > > @@ -72,6 +72,8 @@ > > > > > > > #define MPI2_SUP_REPLY_POST_HOST_INDEX_OFFSET > > > > > > > (0x0000030C) > > > > > > > #define MPI2_REPLY_POST_HOST_INDEX_OFFSET (0x00 > > > > > > > 0000 > > > > > > > 6C) > > > > > > > > > > > > > > +extern bool block_sync_cache; > > > > > > > + > > > > > > > /* > > > > > > > * Raid context flags > > > > > > > */ > > > > > > > > > > > > > Be extra careful with that. > > > > > > > > > > > > SYNCHRONIZE_CACHE has (potentially) a global scope, as > > > > > > there > > > > > > might be an array-wide cache, and a cache flush would > > > > > > affect the > > > > > > entire cache. Linux is using SYNCHRONIZE_CACHE as a per > > > > > > device > > > > > > setting, ie it assumes that the effects of a cache flush is > > > > > > restricted to the device in question. > > > > > > > > > > > > If this is _not_ the case I'd rather not enable it. > > > > > > Have you checked that enabling this functionality doesn't > > > > > > have > > > > > > negative performance impact? > > > > > > > > > > > > Cheers, > > > > > > > > > > > > Hannes > > > > > This must go in - without this fix, there is no data > > > > > integrity for > > > > > any file system. > > > > That's not what I get from the change log. What it says to me > > > > is > > > > that the caches are currently firmware managed. Barring > > > > firmware > > > > bugs, that means that we currently don't have any integrity > > > > issues. > > > > > > Your understanding (or the change log) is incorrect. > > > > There's no current way in English of construing "as firmware takes > > care of > > flushing cache" to mean its inverse, so the changelog needs > > updating to > > explain > > that firmware does *not* take care of cache flushing, so > > effectively > > nothing > > does and we'll need a cc to stable if the problem is that nothing > > takes > > care of > > flushing the drive caches. > > > > James > > Sorry for confusion. More accurate to say - > > Current megaraid_sas driver returns SYNCHRONIZE_CACHE command back to > SCSI layer without sending it down to firmware as firmware supposed > to takes care of flushing cache for all Virtual Disk at the time of > system reboot/shutdown. Because of above FW rule driver coded wrongly > and it added bug that SYNCHRONIZE_CACHE is not passed to FW for JBOD > as well. (MR Firmware wanted OS to manage SYNCHRONIZE_CACHE and pass > the same to the Firmware. ). We figure out that even for VD, why > driver should send back SYNC_CACHE command...let's have the same in > FW (giving all control in FW) > > New behavior described - > > IF application requests SYNCH_CACHE > IF any FW which supports new API bit called canHandleSyncCache > Driver sends SYNCH_CACHE command to the FW > IF 'VirtualDisk' > FW does not send SYNCH_CACHE to drives > FW returns SUCCESS > ELSE IF 'JBOD' > FW sends SYNCH_CACHE to drive > FW obtains status from drive and returns same status back to > driver > ENDIF > > Fixing this is robust if FW and driver changes are inline. See below > for more detail. > > Case - 1 > This patch is attempt to fix one level problem where Virtual Disks > are not managed correctly in MR FW. There are some MR FW (E.a Gen2 > and Gen2.5 FW), which set WCE bit for Virtual Disk but FW is not > reply correct for SYNCH_CACHE. This was handled in past and carry > forward (in driver + FW ) to all new generation products as well. We > tried to collect all possible details why it was handled such way to > provide better driver fix for this issue(mainly to avoid this FW > checks and module parameters etc..). But now it looks like to make > generic fix (Device ID based etc..)is even risky, so went with this > protective approach. It is very risky to find out which Device ID > and FW has capability to manage SYNC_CACHE, so we managed to figure > out which are the new FW using FW capability bit. > > E.g Liberator/Thunderbolt MR FW SYNC_CACHE is write Unsupported > command (this is a firmware bug) and immediately it will be failed to > SML. This will cause File system to go Read-only. That's a better description ... what you're saying is that the patch doesn't actually fix the bug Ric is worrying about. > Case -2 > One more thing which we are trying and known driver can send > "SYNC_CACHE" unconditionally to all generation of FW. For this we > are doing more unit testing on various controllers/FW and planning to > provide another patch to fix JBOD path for any FW. (It will not be > based on FW capability/module parameter). OK, but we really need this ASAP. As Ric said, data integrity is at risk until this is fixed. Can you also reference the commit that introduced the problem so we know how far to do the sable backports? > If we remove module parameter, we can ask customer to disable WCE on > drive to get similar impact. Thanks, James -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 10/17/2016 09:57 AM, Tomas Henzl wrote: > On 17.10.2016 15:28, Sumit Saxena wrote: >>> -----Original Message----- >>> From: Tomas Henzl [mailto:thenzl@redhat.com] >>> Sent: Monday, October 17, 2016 6:44 PM >>> To: Sumit Saxena; linux-scsi@vger.kernel.org >>> Cc: martin.petersen@oracle.com; jejb@linux.vnet.ibm.com; >>> kashyap.desai@broadcom.com >>> Subject: Re: [PATCH 4/7] megaraid_sas: Send SYNCHRONIZE_CACHE command to >>> firmware >>> >>> On 17.10.2016 12:24, Sumit Saxena wrote: >>>> megaraid_sas driver returns SYNCHRONIZE_CACHE command back to SCSI >>>> layer without sending it to firmware as firmware takes care of flushing >> cache. >>>> This patch will change the driver behavior wrt SYNCHRONIZE_CACHE >> handling. >>>> If underlying firmware has support to handle the SYNCHORNIZE_CACHE, >>>> driver will send it for firmware otherwise complete it back to SCSI >>>> layer with SUCCESS immediately. >>>> If Firmware handle SYNCHORNIZE_CACHE for both VD and JBOD >>>> "canHandleSyncCache" bit in scratch pad register(offset 0x00B4) will >>>> be set. >>>> >>>> This behavior can be controlled via module parameter and user can >>>> fallback to old behavior of returning SYNCHRONIZE_CACHE by driver only >>>> without sending it to firmware. >>>> >>>> Signed-off-by: Sumit Saxena <sumit.saxena@broadcom.com> >>>> Signed-off-by: Kashyap Desai <kashyap.desai@broadcom.com> >>>> --- >>>> drivers/scsi/megaraid/megaraid_sas.h | 3 +++ >>>> drivers/scsi/megaraid/megaraid_sas_base.c | 14 ++++++-------- >>>> drivers/scsi/megaraid/megaraid_sas_fusion.c | 3 +++ >>>> drivers/scsi/megaraid/megaraid_sas_fusion.h | 2 ++ >>>> 4 files changed, 14 insertions(+), 8 deletions(-) >>>> >>>> diff --git a/drivers/scsi/megaraid/megaraid_sas.h >>>> b/drivers/scsi/megaraid/megaraid_sas.h >>>> index ca86c88..43fd14f 100644 >>>> --- a/drivers/scsi/megaraid/megaraid_sas.h >>>> +++ b/drivers/scsi/megaraid/megaraid_sas.h >>>> @@ -1429,6 +1429,8 @@ enum FW_BOOT_CONTEXT { >>>> #define MR_MAX_REPLY_QUEUES_EXT_OFFSET_SHIFT 14 >>>> #define MR_MAX_MSIX_REG_ARRAY 16 >>>> #define MR_RDPQ_MODE_OFFSET 0X00800000 >>>> +#define MR_CAN_HANDLE_SYNC_CACHE_OFFSET 0X01000000 >>>> + >>>> /* >>>> * register set for both 1068 and 1078 controllers >>>> * structure extended for 1078 registers @@ -2140,6 +2142,7 @@ struct >>>> megasas_instance { >>>> u8 is_imr; >>>> u8 is_rdpq; >>>> bool dev_handle; >>>> + bool fw_sync_cache_support; >>>> }; >>>> struct MR_LD_VF_MAP { >>>> u32 size; >>>> diff --git a/drivers/scsi/megaraid/megaraid_sas_base.c >>>> b/drivers/scsi/megaraid/megaraid_sas_base.c >>>> index 092387f..a4a8e2f 100644 >>>> --- a/drivers/scsi/megaraid/megaraid_sas_base.c >>>> +++ b/drivers/scsi/megaraid/megaraid_sas_base.c >>>> @@ -104,6 +104,10 @@ unsigned int scmd_timeout = >>>> MEGASAS_DEFAULT_CMD_TIMEOUT; module_param(scmd_timeout, int, >>>> S_IRUGO); MODULE_PARM_DESC(scmd_timeout, "scsi command timeout >>>> (10-90s), default 90s. See megasas_reset_timer."); >>>> >>>> +bool block_sync_cache; >>>> +module_param(block_sync_cache, bool, S_IRUGO); >>>> +MODULE_PARM_DESC(block_sync_cache, "Block SYNC CACHE by driver >>>> +Default: 0(Send it to firmware)"); >>>> + >>>> MODULE_LICENSE("GPL"); >>>> MODULE_VERSION(MEGASAS_VERSION); >>>> MODULE_AUTHOR("megaraidlinux.pdl@avagotech.com"); >>>> @@ -1700,16 +1704,10 @@ megasas_queue_command(struct Scsi_Host >>> *shost, struct scsi_cmnd *scmd) >>>> goto out_done; >>>> } >>>> >>>> - switch (scmd->cmnd[0]) { >>>> - case SYNCHRONIZE_CACHE: >>>> - /* >>>> - * FW takes care of flush cache on its own >>>> - * No need to send it down >>>> - */ >>>> + if ((scmd->cmnd[0] == SYNCHRONIZE_CACHE) && >>>> + (!instance->fw_sync_cache_support)) { >>> Maybe I'm wrong, but isn't the logic inverted? >>> when fw_sync_cache_support is true it means that there is a flash cache >> or a >>> battery and we don't have to send the command down ? >>> >> If "instance->fw_sync_cache_support" is set to true, it means driver can >> send SYNCHRONIZE_CACHE to firmware(firmware has infrastructure to support >> SYNCHRONIZE_CACHE). >> If "instance->fw_sync_cache_support" is set to false, it means FW does not >> support to handle SYNCHRONIZE_CACHE and FW will flush cache during >> shutdown. > Thanks, in that case it is correct. > >>>> + if (!block_sync_cache) >>>> + instance->fw_sync_cache_support = (scratch_pad_2 & >>>> + MR_CAN_HANDLE_SYNC_CACHE_OFFSET) ? 1 : 0; > Please add a warning or log the state of the synchronise cache command > on the controller. Any logging should be limited to once per device - say when the device is opened. Note that synchronize_cache commands are extremely common, we don't want to fill the log with this. thanks! Ric > > >>>> IOCInitMessage = >>>> dma_alloc_coherent(&instance->pdev->dev, >>>> diff --git a/drivers/scsi/megaraid/megaraid_sas_fusion.h >>>> b/drivers/scsi/megaraid/megaraid_sas_fusion.h >>>> index e3bee04..2857154 100644 >>>> --- a/drivers/scsi/megaraid/megaraid_sas_fusion.h >>>> +++ b/drivers/scsi/megaraid/megaraid_sas_fusion.h >>>> @@ -72,6 +72,8 @@ >>>> #define MPI2_SUP_REPLY_POST_HOST_INDEX_OFFSET (0x0000030C) >>>> #define MPI2_REPLY_POST_HOST_INDEX_OFFSET (0x0000006C) >>>> >>>> +extern bool block_sync_cache; >>>> + >>>> /* >>>> * Raid context flags >>>> */ -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
>-----Original Message----- >From: Ric Wheeler [mailto:ricwheeler@gmail.com] >Sent: Tuesday, October 18, 2016 6:38 PM >To: Tomas Henzl; Sumit Saxena; linux-scsi@vger.kernel.org >Cc: martin.petersen@oracle.com; jejb@linux.vnet.ibm.com; Kashyap Desai >Subject: Re: [PATCH 4/7] megaraid_sas: Send SYNCHRONIZE_CACHE command to >firmware > >On 10/17/2016 09:57 AM, Tomas Henzl wrote: >> On 17.10.2016 15:28, Sumit Saxena wrote: >>>> -----Original Message----- >>>> From: Tomas Henzl [mailto:thenzl@redhat.com] >>>> Sent: Monday, October 17, 2016 6:44 PM >>>> To: Sumit Saxena; linux-scsi@vger.kernel.org >>>> Cc: martin.petersen@oracle.com; jejb@linux.vnet.ibm.com; >>>> kashyap.desai@broadcom.com >>>> Subject: Re: [PATCH 4/7] megaraid_sas: Send SYNCHRONIZE_CACHE >>>> command to firmware >>>> >>>> On 17.10.2016 12:24, Sumit Saxena wrote: >>>>> megaraid_sas driver returns SYNCHRONIZE_CACHE command back to SCSI >>>>> layer without sending it to firmware as firmware takes care of >>>>> flushing >>> cache. >>>>> This patch will change the driver behavior wrt SYNCHRONIZE_CACHE >>> handling. >>>>> If underlying firmware has support to handle the SYNCHORNIZE_CACHE, >>>>> driver will send it for firmware otherwise complete it back to SCSI >>>>> layer with SUCCESS immediately. >>>>> If Firmware handle SYNCHORNIZE_CACHE for both VD and JBOD >>>>> "canHandleSyncCache" bit in scratch pad register(offset 0x00B4) >>>>> will be set. >>>>> >>>>> This behavior can be controlled via module parameter and user can >>>>> fallback to old behavior of returning SYNCHRONIZE_CACHE by driver >>>>> only without sending it to firmware. >>>>> >>>>> Signed-off-by: Sumit Saxena <sumit.saxena@broadcom.com> >>>>> Signed-off-by: Kashyap Desai <kashyap.desai@broadcom.com> >>>>> --- >>>>> drivers/scsi/megaraid/megaraid_sas.h | 3 +++ >>>>> drivers/scsi/megaraid/megaraid_sas_base.c | 14 ++++++-------- >>>>> drivers/scsi/megaraid/megaraid_sas_fusion.c | 3 +++ >>>>> drivers/scsi/megaraid/megaraid_sas_fusion.h | 2 ++ >>>>> 4 files changed, 14 insertions(+), 8 deletions(-) >>>>> >>>>> diff --git a/drivers/scsi/megaraid/megaraid_sas.h >>>>> b/drivers/scsi/megaraid/megaraid_sas.h >>>>> index ca86c88..43fd14f 100644 >>>>> --- a/drivers/scsi/megaraid/megaraid_sas.h >>>>> +++ b/drivers/scsi/megaraid/megaraid_sas.h >>>>> @@ -1429,6 +1429,8 @@ enum FW_BOOT_CONTEXT { >>>>> #define MR_MAX_REPLY_QUEUES_EXT_OFFSET_SHIFT 14 >>>>> #define MR_MAX_MSIX_REG_ARRAY 16 >>>>> #define MR_RDPQ_MODE_OFFSET 0X00800000 >>>>> +#define MR_CAN_HANDLE_SYNC_CACHE_OFFSET 0X01000000 >>>>> + >>>>> /* >>>>> * register set for both 1068 and 1078 controllers >>>>> * structure extended for 1078 registers @@ -2140,6 +2142,7 @@ >>>>> struct megasas_instance { >>>>> u8 is_imr; >>>>> u8 is_rdpq; >>>>> bool dev_handle; >>>>> + bool fw_sync_cache_support; >>>>> }; >>>>> struct MR_LD_VF_MAP { >>>>> u32 size; >>>>> diff --git a/drivers/scsi/megaraid/megaraid_sas_base.c >>>>> b/drivers/scsi/megaraid/megaraid_sas_base.c >>>>> index 092387f..a4a8e2f 100644 >>>>> --- a/drivers/scsi/megaraid/megaraid_sas_base.c >>>>> +++ b/drivers/scsi/megaraid/megaraid_sas_base.c >>>>> @@ -104,6 +104,10 @@ unsigned int scmd_timeout = >>>>> MEGASAS_DEFAULT_CMD_TIMEOUT; module_param(scmd_timeout, int, >>>>> S_IRUGO); MODULE_PARM_DESC(scmd_timeout, "scsi command timeout >>>>> (10-90s), default 90s. See megasas_reset_timer."); >>>>> >>>>> +bool block_sync_cache; >>>>> +module_param(block_sync_cache, bool, S_IRUGO); >>>>> +MODULE_PARM_DESC(block_sync_cache, "Block SYNC CACHE by driver >>>>> +Default: 0(Send it to firmware)"); >>>>> + >>>>> MODULE_LICENSE("GPL"); >>>>> MODULE_VERSION(MEGASAS_VERSION); >>>>> MODULE_AUTHOR("megaraidlinux.pdl@avagotech.com"); >>>>> @@ -1700,16 +1704,10 @@ megasas_queue_command(struct Scsi_Host >>>> *shost, struct scsi_cmnd *scmd) >>>>> goto out_done; >>>>> } >>>>> >>>>> - switch (scmd->cmnd[0]) { >>>>> - case SYNCHRONIZE_CACHE: >>>>> - /* >>>>> - * FW takes care of flush cache on its own >>>>> - * No need to send it down >>>>> - */ >>>>> + if ((scmd->cmnd[0] == SYNCHRONIZE_CACHE) && >>>>> + (!instance->fw_sync_cache_support)) { >>>> Maybe I'm wrong, but isn't the logic inverted? >>>> when fw_sync_cache_support is true it means that there is a flash >>>> cache >>> or a >>>> battery and we don't have to send the command down ? >>>> >>> If "instance->fw_sync_cache_support" is set to true, it means driver >>> can send SYNCHRONIZE_CACHE to firmware(firmware has infrastructure to >>> support SYNCHRONIZE_CACHE). >>> If "instance->fw_sync_cache_support" is set to false, it means FW >>> does not support to handle SYNCHRONIZE_CACHE and FW will flush cache >>> during shutdown. >> Thanks, in that case it is correct. >> >>>>> + if (!block_sync_cache) >>>>> + instance->fw_sync_cache_support = (scratch_pad_2 & >>>>> + MR_CAN_HANDLE_SYNC_CACHE_OFFSET) ? 1 : 0; >> Please add a warning or log the state of the synchronise cache command >> on the controller. > >Any logging should be limited to once per device - say when the device is >opened. >Note that synchronize_cache commands are extremely common, we don't want >to fill the log with this. This print is not per device but per MegaRAID controller. This print will state the controller firmware's capability whether SYNCHRONIZE_CACHE coming from OS for RAID volumes can be handled by firmware or not. For non RAID(JBODs), SYNCHRONIZE_CACHE will be always sent to all MegaRAID firmware irrespective of firmware capability. Thanks, Sumit > >thanks! > >Ric > >> >> >>>>> IOCInitMessage = >>>>> dma_alloc_coherent(&instance->pdev->dev, >>>>> diff --git a/drivers/scsi/megaraid/megaraid_sas_fusion.h >>>>> b/drivers/scsi/megaraid/megaraid_sas_fusion.h >>>>> index e3bee04..2857154 100644 >>>>> --- a/drivers/scsi/megaraid/megaraid_sas_fusion.h >>>>> +++ b/drivers/scsi/megaraid/megaraid_sas_fusion.h >>>>> @@ -72,6 +72,8 @@ >>>>> #define MPI2_SUP_REPLY_POST_HOST_INDEX_OFFSET (0x0000030C) >>>>> #define MPI2_REPLY_POST_HOST_INDEX_OFFSET (0x0000006C) >>>>> >>>>> +extern bool block_sync_cache; >>>>> + >>>>> /* >>>>> * Raid context flags >>>>> */ -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Hi, similar suspicious code path can be found in the queuecommand functions in other drivers too these are - pmcraid.c arcmsr_hba.c cc-ing maintainers - (but both drivers seem to be unmaintained for a while - I've added Ching for arcmsr and Raghava for pmcraid) please read this thread and verify that your driver and firmware is safe. It is expected that a raid card controls the disk write cache (switches it off) but this might not be true for all modes of operation a raid adapter handles - pass through/non-RAID etc. In such case when disk write cache is enabled and your driver skips the SYNCHRONIZE_CACHE command a FS corruption is very much possible during unexpected power loss or even a clean shutdown. tomash On 17.10.2016 19:51, James Bottomley wrote: > On Mon, 2016-10-17 at 23:06 +0530, Kashyap Desai wrote: >>> -----Original Message----- >>> From: James Bottomley [mailto:jejb@linux.vnet.ibm.com] >>> Sent: Monday, October 17, 2016 10:50 PM >>> To: Ric Wheeler; Hannes Reinecke; Sumit Saxena; >>> linux-scsi@vger.kernel.org >>> Cc: martin.petersen@oracle.com; thenzl@redhat.com; >>> kashyap.desai@broadcom.com; Christoph Hellwig; Martin K. Petersen; >>> Jeff >>> Moyer; Gris Ge; Ewan Milne; Jens Axboe >>> Subject: Re: [PATCH 4/7] megaraid_sas: Send SYNCHRONIZE_CACHE >>> command >>> to firmware >>> >>> On Mon, 2016-10-17 at 12:27 -0400, Ric Wheeler wrote: >>>> On 10/17/2016 12:20 PM, James Bottomley wrote: >>>>> On Mon, 2016-10-17 at 09:01 -0400, Ric Wheeler wrote: >>>>>> On 10/17/2016 07:34 AM, Hannes Reinecke wrote: >>>>>>> On 10/17/2016 12:24 PM, Sumit Saxena wrote: >>>>>>>> megaraid_sas driver returns SYNCHRONIZE_CACHE command >>>>>>>> back to >>>>>>>> SCSI layer without sending it to firmware as firmware >>>>>>>> takes >>>>>>>> care of flushing cache. This patch will change the >>>>>>>> driver >>>>>>>> behavior wrt SYNCHRONIZE_CACHE handling. If underlying >>>>>>>> firmware has support to handle the SYNCHORNIZE_CACHE, >>>>>>>> driver >>>>>>>> will send it for firmware otherwise complete it back to >>>>>>>> SCSI >>>>>>>> layer with SUCCESS immediately. If Firmware handle >>>>>>>> SYNCHORNIZE_CACHE for both VD and JBOD >>>>>>>> "canHandleSyncCache" >>>>>>>> bit in scratch pad register(offset >>>>>>>> 0x00B4) will be set. >>>>>>>> >>>>>>>> This behavior can be controlled via module parameter and >>>>>>>> user >>>>>>>> can fallback to old behavior of returning >>>>>>>> SYNCHRONIZE_CACHE by >>>>>>>> driver only without sending it to firmware. >>>>>>>> >>>>>>>> Signed-off-by: Sumit Saxena <sumit.saxena@broadcom.com> >>>>>>>> Signed-off-by: Kashyap Desai <kashyap.desai@broadcom.com> >>>>>>>> --- >>>>>>>> drivers/scsi/megaraid/megaraid_sas.h | 3 +++ >>>>>>>> drivers/scsi/megaraid/megaraid_sas_base.c | 14 >>>>>>>> ++++++--- >>>>>>>> ----- >>>>>>>> drivers/scsi/megaraid/megaraid_sas_fusion.c | 3 +++ >>>>>>>> drivers/scsi/megaraid/megaraid_sas_fusion.h | 2 ++ >>>>>>>> 4 files changed, 14 insertions(+), 8 deletions(-) >>>>>>>> >>>>>>>> diff --git a/drivers/scsi/megaraid/megaraid_sas.h >>>>>>>> b/drivers/scsi/megaraid/megaraid_sas.h >>>>>>>> index ca86c88..43fd14f 100644 >>>>>>>> --- a/drivers/scsi/megaraid/megaraid_sas.h >>>>>>>> +++ b/drivers/scsi/megaraid/megaraid_sas.h >>>>>>>> @@ -1429,6 +1429,8 @@ enum FW_BOOT_CONTEXT { >>>>>>>> #define MR_MAX_REPLY_QUEUES_EXT_OFFSET_SHIFT 14 >>>>>>>> #define MR_MAX_MSIX_REG_ARRAY 16 >>>>>>>> #define MR_RDPQ_MODE_OFFSET 0X0 >>>>>>>> 0800 >>>>>>>> 000 >>>>>>>> +#define MR_CAN_HANDLE_SYNC_CACHE_OFFSET 0 >>>>>>>> X010 >>>>>>>> 0000 >>>>>>>> 0 >>>>>>>> + >>>>>>>> /* >>>>>>>> * register set for both 1068 and 1078 controllers >>>>>>>> * structure extended for 1078 registers @@ -2140,6 >>>>>>>> +2142,7 >>>>>>>> @@ struct megasas_instance { >>>>>>>> u8 is_imr; >>>>>>>> u8 is_rdpq; >>>>>>>> bool dev_handle; >>>>>>>> + bool fw_sync_cache_support; >>>>>>>> }; >>>>>>>> struct MR_LD_VF_MAP { >>>>>>>> u32 size; >>>>>>>> diff --git a/drivers/scsi/megaraid/megaraid_sas_base.c >>>>>>>> b/drivers/scsi/megaraid/megaraid_sas_base.c >>>>>>>> index 092387f..a4a8e2f 100644 >>>>>>>> --- a/drivers/scsi/megaraid/megaraid_sas_base.c >>>>>>>> +++ b/drivers/scsi/megaraid/megaraid_sas_base.c >>>>>>>> @@ -104,6 +104,10 @@ unsigned int scmd_timeout = >>>>>>>> MEGASAS_DEFAULT_CMD_TIMEOUT; >>>>>>>> module_param(scmd_timeout, int, S_IRUGO); >>>>>>>> MODULE_PARM_DESC(scmd_timeout, "scsi command timeout >>>>>>>> (10 >>>>>>>> -90s), default 90s. See megasas_reset_timer."); >>>>>>>> >>>>>>>> +bool block_sync_cache; >>>>>>>> +module_param(block_sync_cache, bool, S_IRUGO); >>>>>>>> +MODULE_PARM_DESC(block_sync_cache, "Block SYNC CACHE by >>>>>>>> driver >>>>>>>> Default: 0(Send it to firmware)"); >>>>>>>> + >>>>>>>> MODULE_LICENSE("GPL"); >>>>>>>> MODULE_VERSION(MEGASAS_VERSION); >>>>>>>> MODULE_AUTHOR("megaraidlinux.pdl@avagotech.com"); >>>>>>>> @@ -1700,16 +1704,10 @@ megasas_queue_command(struct >>> Scsi_Host >>>>>>>> *shost, struct scsi_cmnd *scmd) >>>>>>>> goto out_done; >>>>>>>> } >>>>>>>> >>>>>>>> - switch (scmd->cmnd[0]) { >>>>>>>> - case SYNCHRONIZE_CACHE: >>>>>>>> - /* >>>>>>>> - * FW takes care of flush cache on its >>>>>>>> own >>>>>>>> - * No need to send it down >>>>>>>> - */ >>>>>>>> + if ((scmd->cmnd[0] == SYNCHRONIZE_CACHE) && >>>>>>>> + (!instance->fw_sync_cache_support)) { >>>>>>>> scmd->result = DID_OK << 16; >>>>>>>> goto out_done; >>>>>>>> - default: >>>>>>>> - break; >>>>>>>> } >>>>>>>> >>>>>>>> return instance->instancet >>>>>>>> ->build_and_issue_cmd(instance, scmd); >>>>>>>> diff --git a/drivers/scsi/megaraid/megaraid_sas_fusion.c >>>>>>>> b/drivers/scsi/megaraid/megaraid_sas_fusion.c >>>>>>>> index 2159f6a..8237580 100644 >>>>>>>> --- a/drivers/scsi/megaraid/megaraid_sas_fusion.c >>>>>>>> +++ b/drivers/scsi/megaraid/megaraid_sas_fusion.c >>>>>>>> @@ -747,6 +747,9 @@ megasas_ioc_init_fusion(struct >>>>>>>> megasas_instance *instance) >>>>>>>> ret = 1; >>>>>>>> goto fail_fw_init; >>>>>>>> } >>>>>>>> + if (!block_sync_cache) >>>>>>>> + instance->fw_sync_cache_support = >>>>>>>> (scratch_pad_2 >>>>>>>> & >>>>>>>> + MR_CAN_HANDLE_SYNC_CACHE_OFFSET) >>>>>>>> ? 1 >>>>>>>> : >>>>>>>> 0; >>>>>>>> >>>>>>>> IOCInitMessage = >>>>>>>> dma_alloc_coherent(&instance->pdev->dev, >>>>>>>> diff --git a/drivers/scsi/megaraid/megaraid_sas_fusion.h >>>>>>>> b/drivers/scsi/megaraid/megaraid_sas_fusion.h >>>>>>>> index e3bee04..2857154 100644 >>>>>>>> --- a/drivers/scsi/megaraid/megaraid_sas_fusion.h >>>>>>>> +++ b/drivers/scsi/megaraid/megaraid_sas_fusion.h >>>>>>>> @@ -72,6 +72,8 @@ >>>>>>>> #define MPI2_SUP_REPLY_POST_HOST_INDEX_OFFSET >>>>>>>> (0x0000030C) >>>>>>>> #define MPI2_REPLY_POST_HOST_INDEX_OFFSET (0x00 >>>>>>>> 0000 >>>>>>>> 6C) >>>>>>>> >>>>>>>> +extern bool block_sync_cache; >>>>>>>> + >>>>>>>> /* >>>>>>>> * Raid context flags >>>>>>>> */ >>>>>>>> >>>>>>> Be extra careful with that. >>>>>>> >>>>>>> SYNCHRONIZE_CACHE has (potentially) a global scope, as >>>>>>> there >>>>>>> might be an array-wide cache, and a cache flush would >>>>>>> affect the >>>>>>> entire cache. Linux is using SYNCHRONIZE_CACHE as a per >>>>>>> device >>>>>>> setting, ie it assumes that the effects of a cache flush is >>>>>>> restricted to the device in question. >>>>>>> >>>>>>> If this is _not_ the case I'd rather not enable it. >>>>>>> Have you checked that enabling this functionality doesn't >>>>>>> have >>>>>>> negative performance impact? >>>>>>> >>>>>>> Cheers, >>>>>>> >>>>>>> Hannes >>>>>> This must go in - without this fix, there is no data >>>>>> integrity for >>>>>> any file system. >>>>> That's not what I get from the change log. What it says to me >>>>> is >>>>> that the caches are currently firmware managed. Barring >>>>> firmware >>>>> bugs, that means that we currently don't have any integrity >>>>> issues. >>>> Your understanding (or the change log) is incorrect. >>> There's no current way in English of construing "as firmware takes >>> care of >>> flushing cache" to mean its inverse, so the changelog needs >>> updating to >>> explain >>> that firmware does *not* take care of cache flushing, so >>> effectively >>> nothing >>> does and we'll need a cc to stable if the problem is that nothing >>> takes >>> care of >>> flushing the drive caches. >>> >>> James >> Sorry for confusion. More accurate to say - >> >> Current megaraid_sas driver returns SYNCHRONIZE_CACHE command back to >> SCSI layer without sending it down to firmware as firmware supposed >> to takes care of flushing cache for all Virtual Disk at the time of >> system reboot/shutdown. Because of above FW rule driver coded wrongly >> and it added bug that SYNCHRONIZE_CACHE is not passed to FW for JBOD >> as well. (MR Firmware wanted OS to manage SYNCHRONIZE_CACHE and pass >> the same to the Firmware. ). We figure out that even for VD, why >> driver should send back SYNC_CACHE command...let's have the same in >> FW (giving all control in FW) >> >> New behavior described - >> >> IF application requests SYNCH_CACHE >> IF any FW which supports new API bit called canHandleSyncCache >> Driver sends SYNCH_CACHE command to the FW >> IF 'VirtualDisk' >> FW does not send SYNCH_CACHE to drives >> FW returns SUCCESS >> ELSE IF 'JBOD' >> FW sends SYNCH_CACHE to drive >> FW obtains status from drive and returns same status back to >> driver >> ENDIF >> >> Fixing this is robust if FW and driver changes are inline. See below >> for more detail. >> >> Case - 1 >> This patch is attempt to fix one level problem where Virtual Disks >> are not managed correctly in MR FW. There are some MR FW (E.a Gen2 >> and Gen2.5 FW), which set WCE bit for Virtual Disk but FW is not >> reply correct for SYNCH_CACHE. This was handled in past and carry >> forward (in driver + FW ) to all new generation products as well. We >> tried to collect all possible details why it was handled such way to >> provide better driver fix for this issue(mainly to avoid this FW >> checks and module parameters etc..). But now it looks like to make >> generic fix (Device ID based etc..)is even risky, so went with this >> protective approach. It is very risky to find out which Device ID >> and FW has capability to manage SYNC_CACHE, so we managed to figure >> out which are the new FW using FW capability bit. >> >> E.g Liberator/Thunderbolt MR FW SYNC_CACHE is write Unsupported >> command (this is a firmware bug) and immediately it will be failed to >> SML. This will cause File system to go Read-only. > That's a better description ... what you're saying is that the patch > doesn't actually fix the bug Ric is worrying about. > >> Case -2 >> One more thing which we are trying and known driver can send >> "SYNC_CACHE" unconditionally to all generation of FW. For this we >> are doing more unit testing on various controllers/FW and planning to >> provide another patch to fix JBOD path for any FW. (It will not be >> based on FW capability/module parameter). > OK, but we really need this ASAP. As Ric said, data integrity is at > risk until this is fixed. Can you also reference the commit that > introduced the problem so we know how far to do the sable backports? > >> If we remove module parameter, we can ask customer to disable WCE on >> drive to get similar impact. > Thanks, > > James > > -- > To unsubscribe from this list: send the line "unsubscribe linux-scsi" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
>-----Original Message----- >From: James Bottomley [mailto:jejb@linux.vnet.ibm.com] >Sent: Monday, October 17, 2016 11:22 PM >To: Kashyap Desai; Ric Wheeler; Hannes Reinecke; Sumit Saxena; linux- >scsi@vger.kernel.org >Cc: martin.petersen@oracle.com; thenzl@redhat.com; Christoph Hellwig; >Martin >K. Petersen; Jeff Moyer; Gris Ge; Ewan Milne; Jens Axboe >Subject: Re: [PATCH 4/7] megaraid_sas: Send SYNCHRONIZE_CACHE command to >firmware > >On Mon, 2016-10-17 at 23:06 +0530, Kashyap Desai wrote: >> > -----Original Message----- >> > From: James Bottomley [mailto:jejb@linux.vnet.ibm.com] >> > Sent: Monday, October 17, 2016 10:50 PM >> > To: Ric Wheeler; Hannes Reinecke; Sumit Saxena; >> > linux-scsi@vger.kernel.org >> > Cc: martin.petersen@oracle.com; thenzl@redhat.com; >> > kashyap.desai@broadcom.com; Christoph Hellwig; Martin K. Petersen; >> > Jeff Moyer; Gris Ge; Ewan Milne; Jens Axboe >> > Subject: Re: [PATCH 4/7] megaraid_sas: Send SYNCHRONIZE_CACHE >> > command to firmware >> > >> > On Mon, 2016-10-17 at 12:27 -0400, Ric Wheeler wrote: >> > > On 10/17/2016 12:20 PM, James Bottomley wrote: >> > > > On Mon, 2016-10-17 at 09:01 -0400, Ric Wheeler wrote: >> > > > > On 10/17/2016 07:34 AM, Hannes Reinecke wrote: >> > > > > > On 10/17/2016 12:24 PM, Sumit Saxena wrote: >> > > > > > > megaraid_sas driver returns SYNCHRONIZE_CACHE command back >> > > > > > > to SCSI layer without sending it to firmware as firmware >> > > > > > > takes care of flushing cache. This patch will change the >> > > > > > > driver behavior wrt SYNCHRONIZE_CACHE handling. If >> > > > > > > underlying firmware has support to handle the >> > > > > > > SYNCHORNIZE_CACHE, driver will send it for firmware >> > > > > > > otherwise complete it back to SCSI layer with SUCCESS >> > > > > > > immediately. If Firmware handle SYNCHORNIZE_CACHE for >> > > > > > > both VD and JBOD "canHandleSyncCache" >> > > > > > > bit in scratch pad register(offset >> > > > > > > 0x00B4) will be set. >> > > > > > > >> > > > > > > This behavior can be controlled via module parameter and >> > > > > > > user can fallback to old behavior of returning >> > > > > > > SYNCHRONIZE_CACHE by driver only without sending it to >> > > > > > > firmware. >> > > > > > > >> > > > > > > Signed-off-by: Sumit Saxena <sumit.saxena@broadcom.com> >> > > > > > > Signed-off-by: Kashyap Desai <kashyap.desai@broadcom.com> >> > > > > > > --- >> > > > > > > drivers/scsi/megaraid/megaraid_sas.h | 3 +++ >> > > > > > > drivers/scsi/megaraid/megaraid_sas_base.c | 14 >> > > > > > > ++++++--- >> > > > > > > ----- >> > > > > > > drivers/scsi/megaraid/megaraid_sas_fusion.c | 3 +++ >> > > > > > > drivers/scsi/megaraid/megaraid_sas_fusion.h | 2 ++ >> > > > > > > 4 files changed, 14 insertions(+), 8 deletions(-) >> > > > > > > >> > > > > > > diff --git a/drivers/scsi/megaraid/megaraid_sas.h >> > > > > > > b/drivers/scsi/megaraid/megaraid_sas.h >> > > > > > > index ca86c88..43fd14f 100644 >> > > > > > > --- a/drivers/scsi/megaraid/megaraid_sas.h >> > > > > > > +++ b/drivers/scsi/megaraid/megaraid_sas.h >> > > > > > > @@ -1429,6 +1429,8 @@ enum FW_BOOT_CONTEXT { >> > > > > > > #define MR_MAX_REPLY_QUEUES_EXT_OFFSET_SHIFT 14 >> > > > > > > #define MR_MAX_MSIX_REG_ARRAY 16 >> > > > > > > #define MR_RDPQ_MODE_OFFSET 0X0 >> > > > > > > 0800 >> > > > > > > 000 >> > > > > > > +#define MR_CAN_HANDLE_SYNC_CACHE_OFFSET 0 >> > > > > > > X010 >> > > > > > > 0000 >> > > > > > > 0 >> > > > > > > + >> > > > > > > /* >> > > > > > > * register set for both 1068 and 1078 controllers >> > > > > > > * structure extended for 1078 registers @@ -2140,6 >> > > > > > > +2142,7 >> > > > > > > @@ struct megasas_instance { >> > > > > > > u8 is_imr; >> > > > > > > u8 is_rdpq; >> > > > > > > bool dev_handle; >> > > > > > > + bool fw_sync_cache_support; >> > > > > > > }; >> > > > > > > struct MR_LD_VF_MAP { >> > > > > > > u32 size; >> > > > > > > diff --git a/drivers/scsi/megaraid/megaraid_sas_base.c >> > > > > > > b/drivers/scsi/megaraid/megaraid_sas_base.c >> > > > > > > index 092387f..a4a8e2f 100644 >> > > > > > > --- a/drivers/scsi/megaraid/megaraid_sas_base.c >> > > > > > > +++ b/drivers/scsi/megaraid/megaraid_sas_base.c >> > > > > > > @@ -104,6 +104,10 @@ unsigned int scmd_timeout = >> > > > > > > MEGASAS_DEFAULT_CMD_TIMEOUT; >> > > > > > > module_param(scmd_timeout, int, S_IRUGO); >> > > > > > > MODULE_PARM_DESC(scmd_timeout, "scsi command timeout >> > > > > > > (10 >> > > > > > > -90s), default 90s. See megasas_reset_timer."); >> > > > > > > >> > > > > > > +bool block_sync_cache; >> > > > > > > +module_param(block_sync_cache, bool, S_IRUGO); >> > > > > > > +MODULE_PARM_DESC(block_sync_cache, "Block SYNC CACHE by >> > > > > > > driver >> > > > > > > Default: 0(Send it to firmware)"); >> > > > > > > + >> > > > > > > MODULE_LICENSE("GPL"); >> > > > > > > MODULE_VERSION(MEGASAS_VERSION); >> > > > > > > MODULE_AUTHOR("megaraidlinux.pdl@avagotech.com"); >> > > > > > > @@ -1700,16 +1704,10 @@ megasas_queue_command(struct >> > Scsi_Host >> > > > > > > *shost, struct scsi_cmnd *scmd) >> > > > > > > goto out_done; >> > > > > > > } >> > > > > > > >> > > > > > > - switch (scmd->cmnd[0]) { >> > > > > > > - case SYNCHRONIZE_CACHE: >> > > > > > > - /* >> > > > > > > - * FW takes care of flush cache on its >> > > > > > > own >> > > > > > > - * No need to send it down >> > > > > > > - */ >> > > > > > > + if ((scmd->cmnd[0] == SYNCHRONIZE_CACHE) && >> > > > > > > + (!instance->fw_sync_cache_support)) { >> > > > > > > scmd->result = DID_OK << 16; >> > > > > > > goto out_done; >> > > > > > > - default: >> > > > > > > - break; >> > > > > > > } >> > > > > > > >> > > > > > > return instance->instancet >> > > > > > > ->build_and_issue_cmd(instance, scmd); >> > > > > > > diff --git a/drivers/scsi/megaraid/megaraid_sas_fusion.c >> > > > > > > b/drivers/scsi/megaraid/megaraid_sas_fusion.c >> > > > > > > index 2159f6a..8237580 100644 >> > > > > > > --- a/drivers/scsi/megaraid/megaraid_sas_fusion.c >> > > > > > > +++ b/drivers/scsi/megaraid/megaraid_sas_fusion.c >> > > > > > > @@ -747,6 +747,9 @@ megasas_ioc_init_fusion(struct >> > > > > > > megasas_instance *instance) >> > > > > > > ret = 1; >> > > > > > > goto fail_fw_init; >> > > > > > > } >> > > > > > > + if (!block_sync_cache) >> > > > > > > + instance->fw_sync_cache_support = >> > > > > > > (scratch_pad_2 >> > > > > > > & >> > > > > > > + MR_CAN_HANDLE_SYNC_CACHE_OFFSET) >> > > > > > > ? 1 >> > > > > > > : >> > > > > > > 0; >> > > > > > > >> > > > > > > IOCInitMessage = >> > > > > > > dma_alloc_coherent(&instance->pdev->dev, >> > > > > > > diff --git a/drivers/scsi/megaraid/megaraid_sas_fusion.h >> > > > > > > b/drivers/scsi/megaraid/megaraid_sas_fusion.h >> > > > > > > index e3bee04..2857154 100644 >> > > > > > > --- a/drivers/scsi/megaraid/megaraid_sas_fusion.h >> > > > > > > +++ b/drivers/scsi/megaraid/megaraid_sas_fusion.h >> > > > > > > @@ -72,6 +72,8 @@ >> > > > > > > #define MPI2_SUP_REPLY_POST_HOST_INDEX_OFFSET >> > > > > > > (0x0000030C) >> > > > > > > #define MPI2_REPLY_POST_HOST_INDEX_OFFSET (0x00 >> > > > > > > 0000 >> > > > > > > 6C) >> > > > > > > >> > > > > > > +extern bool block_sync_cache; >> > > > > > > + >> > > > > > > /* >> > > > > > > * Raid context flags >> > > > > > > */ >> > > > > > > >> > > > > > Be extra careful with that. >> > > > > > >> > > > > > SYNCHRONIZE_CACHE has (potentially) a global scope, as there >> > > > > > might be an array-wide cache, and a cache flush would affect >> > > > > > the entire cache. Linux is using SYNCHRONIZE_CACHE as a per >> > > > > > device setting, ie it assumes that the effects of a cache >> > > > > > flush is restricted to the device in question. >> > > > > > >> > > > > > If this is _not_ the case I'd rather not enable it. >> > > > > > Have you checked that enabling this functionality doesn't >> > > > > > have negative performance impact? >> > > > > > >> > > > > > Cheers, >> > > > > > >> > > > > > Hannes >> > > > > This must go in - without this fix, there is no data integrity >> > > > > for any file system. >> > > > That's not what I get from the change log. What it says to me >> > > > is that the caches are currently firmware managed. Barring >> > > > firmware bugs, that means that we currently don't have any >> > > > integrity issues. >> > > >> > > Your understanding (or the change log) is incorrect. >> > >> > There's no current way in English of construing "as firmware takes >> > care of flushing cache" to mean its inverse, so the changelog needs >> > updating to explain that firmware does *not* take care of cache >> > flushing, so effectively nothing does and we'll need a cc to stable >> > if the problem is that nothing takes care of flushing the drive >> > caches. >> > >> > James >> >> Sorry for confusion. More accurate to say - >> >> Current megaraid_sas driver returns SYNCHRONIZE_CACHE command back to >> SCSI layer without sending it down to firmware as firmware supposed to >> takes care of flushing cache for all Virtual Disk at the time of >> system reboot/shutdown. Because of above FW rule driver coded wrongly >> and it added bug that SYNCHRONIZE_CACHE is not passed to FW for JBOD >> as well. (MR Firmware wanted OS to manage SYNCHRONIZE_CACHE and pass >> the same to the Firmware. ). We figure out that even for VD, why >> driver should send back SYNC_CACHE command...let's have the same in FW >> (giving all control in FW) >> >> New behavior described - >> >> IF application requests SYNCH_CACHE >> IF any FW which supports new API bit called canHandleSyncCache >> Driver sends SYNCH_CACHE command to the FW >> IF 'VirtualDisk' >> FW does not send SYNCH_CACHE to drives >> FW returns SUCCESS >> ELSE IF 'JBOD' >> FW sends SYNCH_CACHE to drive >> FW obtains status from drive and returns same status back to >> driver >> ENDIF >> >> Fixing this is robust if FW and driver changes are inline. See below >> for more detail. >> >> Case - 1 >> This patch is attempt to fix one level problem where Virtual Disks are >> not managed correctly in MR FW. There are some MR FW (E.a Gen2 and >> Gen2.5 FW), which set WCE bit for Virtual Disk but FW is not reply >> correct for SYNCH_CACHE. This was handled in past and carry forward >> (in driver + FW ) to all new generation products as well. We tried to >> collect all possible details why it was handled such way to provide >> better driver fix for this issue(mainly to avoid this FW checks and >> module parameters etc..). But now it looks like to make generic fix >> (Device ID based etc..)is even risky, so went with this protective >> approach. It is very risky to find out which Device ID and FW has >> capability to manage SYNC_CACHE, so we managed to figure out which are >> the new FW using FW capability bit. >> >> E.g Liberator/Thunderbolt MR FW SYNC_CACHE is write Unsupported >> command (this is a firmware bug) and immediately it will be failed to >> SML. This will cause File system to go Read-only. > >That's a better description ... what you're saying is that the patch >doesn't actually >fix the bug Ric is worrying about. > >> Case -2 >> One more thing which we are trying and known driver can send >> "SYNC_CACHE" unconditionally to all generation of FW. For this we are >> doing more unit testing on various controllers/FW and planning to >> provide another patch to fix JBOD path for any FW. (It will not be >> based on FW capability/module parameter). > >OK, but we really need this ASAP. As Ric said, data integrity is at risk >until this is >fixed. Can you also reference the commit that introduced the problem so we >know how far to do the sable backports? Commit- " 02b01e0 [SCSI] megaraid_sas: return sync cache call with success" added the code in driver to return SYNCHRONIZE_CACHE without sending it to firmware long back in 2007. This was then added for RAID volumes as then supported controller firmwares did not have support for SYNCHRONIZE_CACHE. This was mistakenly carried forward for non RAID(JBODs) which was wrong. I will be sending a consolidated patch which will address all issues pertaining to SYNCHRONIZE_CACHE for RAID volumes and non RAID(JBOD) disks. > >> If we remove module parameter, we can ask customer to disable WCE on >> drive to get similar impact. > >Thanks, > >James -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 19.10.2016 11:50, Ching Huang wrote: > Hi Tomas, > > SCSI command checking in queuecommand function of arcmsr can be removed safely. > Now driver can pass all scsi command to controller firmware. thanks for your quick reply. Is this (passing the SYNCHRONIZE_CACHE command to firmware) safe for all arcmsr cards, even the oldest? Please start with your patch a new thread and add your 'Signed-off-by:' line. > > diff -uprN a/drivers/scsi/arcmsr/arcmsr_hba.c b/drivers/scsi/arcmsr/arcmsr_hba.c > --- a/drivers/scsi/arcmsr/arcmsr_hba.c 2016-10-19 16:18:45.000000000 +0800 > +++ b/drivers/scsi/arcmsr/arcmsr_hba.c 2016-10-19 16:31:59.665524699 +0800 > @@ -2623,13 +2623,6 @@ static int arcmsr_queue_command_lck(stru > cmd->scsi_done = done; > cmd->host_scribble = NULL; > cmd->result = 0; > - if ((scsicmd == SYNCHRONIZE_CACHE) ||(scsicmd == SEND_DIAGNOSTIC)){ > - if(acb->devstate[target][lun] == ARECA_RAID_GONE) { > - cmd->result = (DID_NO_CONNECT << 16); > - } > - cmd->scsi_done(cmd); > - return 0; > - } > if (target == 16) { > /* virtual device for iop message transfer */ > arcmsr_handle_virtual_command(acb, cmd); > > Thanks > Ching > On Tue, 2016-10-18 at 15:56 +0200, Tomas Henzl wrote: >> Hi, >> >> similar suspicious code path can be found in the queuecommand functions in other drivers too >> these are - >> pmcraid.c >> arcmsr_hba.c >> cc-ing maintainers - >> (but both drivers seem to be unmaintained for a while - >> I've added Ching for arcmsr and Raghava for pmcraid) >> >> please read this thread and verify that your driver and firmware is safe. >> >> It is expected that a raid card controls the disk write cache (switches it off) >> but this might not be true for all modes of operation a raid adapter handles >> - pass through/non-RAID etc. In such case when disk write cache is enabled >> and your driver skips the SYNCHRONIZE_CACHE command a FS corruption >> is very much possible during unexpected power loss or even a clean shutdown. >> >> tomash >> >> On 17.10.2016 19:51, James Bottomley wrote: >>> On Mon, 2016-10-17 at 23:06 +0530, Kashyap Desai wrote: >>>>> -----Original Message----- >>>>> From: James Bottomley [mailto:jejb@linux.vnet.ibm.com] >>>>> Sent: Monday, October 17, 2016 10:50 PM >>>>> To: Ric Wheeler; Hannes Reinecke; Sumit Saxena; >>>>> linux-scsi@vger.kernel.org >>>>> Cc: martin.petersen@oracle.com; thenzl@redhat.com; >>>>> kashyap.desai@broadcom.com; Christoph Hellwig; Martin K. Petersen; >>>>> Jeff >>>>> Moyer; Gris Ge; Ewan Milne; Jens Axboe >>>>> Subject: Re: [PATCH 4/7] megaraid_sas: Send SYNCHRONIZE_CACHE >>>>> command >>>>> to firmware >>>>> >>>>> On Mon, 2016-10-17 at 12:27 -0400, Ric Wheeler wrote: >>>>>> On 10/17/2016 12:20 PM, James Bottomley wrote: >>>>>>> On Mon, 2016-10-17 at 09:01 -0400, Ric Wheeler wrote: >>>>>>>> On 10/17/2016 07:34 AM, Hannes Reinecke wrote: >>>>>>>>> On 10/17/2016 12:24 PM, Sumit Saxena wrote: >>>>>>>>>> megaraid_sas driver returns SYNCHRONIZE_CACHE command >>>>>>>>>> back to >>>>>>>>>> SCSI layer without sending it to firmware as firmware >>>>>>>>>> takes >>>>>>>>>> care of flushing cache. This patch will change the >>>>>>>>>> driver >>>>>>>>>> behavior wrt SYNCHRONIZE_CACHE handling. If underlying >>>>>>>>>> firmware has support to handle the SYNCHORNIZE_CACHE, >>>>>>>>>> driver >>>>>>>>>> will send it for firmware otherwise complete it back to >>>>>>>>>> SCSI >>>>>>>>>> layer with SUCCESS immediately. If Firmware handle >>>>>>>>>> SYNCHORNIZE_CACHE for both VD and JBOD >>>>>>>>>> "canHandleSyncCache" >>>>>>>>>> bit in scratch pad register(offset >>>>>>>>>> 0x00B4) will be set. >>>>>>>>>> >>>>>>>>>> This behavior can be controlled via module parameter and >>>>>>>>>> user >>>>>>>>>> can fallback to old behavior of returning >>>>>>>>>> SYNCHRONIZE_CACHE by >>>>>>>>>> driver only without sending it to firmware. >>>>>>>>>> >>>>>>>>>> Signed-off-by: Sumit Saxena <sumit.saxena@broadcom.com> >>>>>>>>>> Signed-off-by: Kashyap Desai <kashyap.desai@broadcom.com> >>>>>>>>>> --- >>>>>>>>>> drivers/scsi/megaraid/megaraid_sas.h | 3 +++ >>>>>>>>>> drivers/scsi/megaraid/megaraid_sas_base.c | 14 >>>>>>>>>> ++++++--- >>>>>>>>>> ----- >>>>>>>>>> drivers/scsi/megaraid/megaraid_sas_fusion.c | 3 +++ >>>>>>>>>> drivers/scsi/megaraid/megaraid_sas_fusion.h | 2 ++ >>>>>>>>>> 4 files changed, 14 insertions(+), 8 deletions(-) >>>>>>>>>> >>>>>>>>>> diff --git a/drivers/scsi/megaraid/megaraid_sas.h >>>>>>>>>> b/drivers/scsi/megaraid/megaraid_sas.h >>>>>>>>>> index ca86c88..43fd14f 100644 >>>>>>>>>> --- a/drivers/scsi/megaraid/megaraid_sas.h >>>>>>>>>> +++ b/drivers/scsi/megaraid/megaraid_sas.h >>>>>>>>>> @@ -1429,6 +1429,8 @@ enum FW_BOOT_CONTEXT { >>>>>>>>>> #define MR_MAX_REPLY_QUEUES_EXT_OFFSET_SHIFT 14 >>>>>>>>>> #define MR_MAX_MSIX_REG_ARRAY 16 >>>>>>>>>> #define MR_RDPQ_MODE_OFFSET 0X0 >>>>>>>>>> 0800 >>>>>>>>>> 000 >>>>>>>>>> +#define MR_CAN_HANDLE_SYNC_CACHE_OFFSET 0 >>>>>>>>>> X010 >>>>>>>>>> 0000 >>>>>>>>>> 0 >>>>>>>>>> + >>>>>>>>>> /* >>>>>>>>>> * register set for both 1068 and 1078 controllers >>>>>>>>>> * structure extended for 1078 registers @@ -2140,6 >>>>>>>>>> +2142,7 >>>>>>>>>> @@ struct megasas_instance { >>>>>>>>>> u8 is_imr; >>>>>>>>>> u8 is_rdpq; >>>>>>>>>> bool dev_handle; >>>>>>>>>> + bool fw_sync_cache_support; >>>>>>>>>> }; >>>>>>>>>> struct MR_LD_VF_MAP { >>>>>>>>>> u32 size; >>>>>>>>>> diff --git a/drivers/scsi/megaraid/megaraid_sas_base.c >>>>>>>>>> b/drivers/scsi/megaraid/megaraid_sas_base.c >>>>>>>>>> index 092387f..a4a8e2f 100644 >>>>>>>>>> --- a/drivers/scsi/megaraid/megaraid_sas_base.c >>>>>>>>>> +++ b/drivers/scsi/megaraid/megaraid_sas_base.c >>>>>>>>>> @@ -104,6 +104,10 @@ unsigned int scmd_timeout = >>>>>>>>>> MEGASAS_DEFAULT_CMD_TIMEOUT; >>>>>>>>>> module_param(scmd_timeout, int, S_IRUGO); >>>>>>>>>> MODULE_PARM_DESC(scmd_timeout, "scsi command timeout >>>>>>>>>> (10 >>>>>>>>>> -90s), default 90s. See megasas_reset_timer."); >>>>>>>>>> >>>>>>>>>> +bool block_sync_cache; >>>>>>>>>> +module_param(block_sync_cache, bool, S_IRUGO); >>>>>>>>>> +MODULE_PARM_DESC(block_sync_cache, "Block SYNC CACHE by >>>>>>>>>> driver >>>>>>>>>> Default: 0(Send it to firmware)"); >>>>>>>>>> + >>>>>>>>>> MODULE_LICENSE("GPL"); >>>>>>>>>> MODULE_VERSION(MEGASAS_VERSION); >>>>>>>>>> MODULE_AUTHOR("megaraidlinux.pdl@avagotech.com"); >>>>>>>>>> @@ -1700,16 +1704,10 @@ megasas_queue_command(struct >>>>> Scsi_Host >>>>>>>>>> *shost, struct scsi_cmnd *scmd) >>>>>>>>>> goto out_done; >>>>>>>>>> } >>>>>>>>>> >>>>>>>>>> - switch (scmd->cmnd[0]) { >>>>>>>>>> - case SYNCHRONIZE_CACHE: >>>>>>>>>> - /* >>>>>>>>>> - * FW takes care of flush cache on its >>>>>>>>>> own >>>>>>>>>> - * No need to send it down >>>>>>>>>> - */ >>>>>>>>>> + if ((scmd->cmnd[0] == SYNCHRONIZE_CACHE) && >>>>>>>>>> + (!instance->fw_sync_cache_support)) { >>>>>>>>>> scmd->result = DID_OK << 16; >>>>>>>>>> goto out_done; >>>>>>>>>> - default: >>>>>>>>>> - break; >>>>>>>>>> } >>>>>>>>>> >>>>>>>>>> return instance->instancet >>>>>>>>>> ->build_and_issue_cmd(instance, scmd); >>>>>>>>>> diff --git a/drivers/scsi/megaraid/megaraid_sas_fusion.c >>>>>>>>>> b/drivers/scsi/megaraid/megaraid_sas_fusion.c >>>>>>>>>> index 2159f6a..8237580 100644 >>>>>>>>>> --- a/drivers/scsi/megaraid/megaraid_sas_fusion.c >>>>>>>>>> +++ b/drivers/scsi/megaraid/megaraid_sas_fusion.c >>>>>>>>>> @@ -747,6 +747,9 @@ megasas_ioc_init_fusion(struct >>>>>>>>>> megasas_instance *instance) >>>>>>>>>> ret = 1; >>>>>>>>>> goto fail_fw_init; >>>>>>>>>> } >>>>>>>>>> + if (!block_sync_cache) >>>>>>>>>> + instance->fw_sync_cache_support = >>>>>>>>>> (scratch_pad_2 >>>>>>>>>> & >>>>>>>>>> + MR_CAN_HANDLE_SYNC_CACHE_OFFSET) >>>>>>>>>> ? 1 >>>>>>>>>> : >>>>>>>>>> 0; >>>>>>>>>> >>>>>>>>>> IOCInitMessage = >>>>>>>>>> dma_alloc_coherent(&instance->pdev->dev, >>>>>>>>>> diff --git a/drivers/scsi/megaraid/megaraid_sas_fusion.h >>>>>>>>>> b/drivers/scsi/megaraid/megaraid_sas_fusion.h >>>>>>>>>> index e3bee04..2857154 100644 >>>>>>>>>> --- a/drivers/scsi/megaraid/megaraid_sas_fusion.h >>>>>>>>>> +++ b/drivers/scsi/megaraid/megaraid_sas_fusion.h >>>>>>>>>> @@ -72,6 +72,8 @@ >>>>>>>>>> #define MPI2_SUP_REPLY_POST_HOST_INDEX_OFFSET >>>>>>>>>> (0x0000030C) >>>>>>>>>> #define MPI2_REPLY_POST_HOST_INDEX_OFFSET (0x00 >>>>>>>>>> 0000 >>>>>>>>>> 6C) >>>>>>>>>> >>>>>>>>>> +extern bool block_sync_cache; >>>>>>>>>> + >>>>>>>>>> /* >>>>>>>>>> * Raid context flags >>>>>>>>>> */ >>>>>>>>>> >>>>>>>>> Be extra careful with that. >>>>>>>>> >>>>>>>>> SYNCHRONIZE_CACHE has (potentially) a global scope, as >>>>>>>>> there >>>>>>>>> might be an array-wide cache, and a cache flush would >>>>>>>>> affect the >>>>>>>>> entire cache. Linux is using SYNCHRONIZE_CACHE as a per >>>>>>>>> device >>>>>>>>> setting, ie it assumes that the effects of a cache flush is >>>>>>>>> restricted to the device in question. >>>>>>>>> >>>>>>>>> If this is _not_ the case I'd rather not enable it. >>>>>>>>> Have you checked that enabling this functionality doesn't >>>>>>>>> have >>>>>>>>> negative performance impact? >>>>>>>>> >>>>>>>>> Cheers, >>>>>>>>> >>>>>>>>> Hannes >>>>>>>> This must go in - without this fix, there is no data >>>>>>>> integrity for >>>>>>>> any file system. >>>>>>> That's not what I get from the change log. What it says to me >>>>>>> is >>>>>>> that the caches are currently firmware managed. Barring >>>>>>> firmware >>>>>>> bugs, that means that we currently don't have any integrity >>>>>>> issues. >>>>>> Your understanding (or the change log) is incorrect. >>>>> There's no current way in English of construing "as firmware takes >>>>> care of >>>>> flushing cache" to mean its inverse, so the changelog needs >>>>> updating to >>>>> explain >>>>> that firmware does *not* take care of cache flushing, so >>>>> effectively >>>>> nothing >>>>> does and we'll need a cc to stable if the problem is that nothing >>>>> takes >>>>> care of >>>>> flushing the drive caches. >>>>> >>>>> James >>>> Sorry for confusion. More accurate to say - >>>> >>>> Current megaraid_sas driver returns SYNCHRONIZE_CACHE command back to >>>> SCSI layer without sending it down to firmware as firmware supposed >>>> to takes care of flushing cache for all Virtual Disk at the time of >>>> system reboot/shutdown. Because of above FW rule driver coded wrongly >>>> and it added bug that SYNCHRONIZE_CACHE is not passed to FW for JBOD >>>> as well. (MR Firmware wanted OS to manage SYNCHRONIZE_CACHE and pass >>>> the same to the Firmware. ). We figure out that even for VD, why >>>> driver should send back SYNC_CACHE command...let's have the same in >>>> FW (giving all control in FW) >>>> >>>> New behavior described - >>>> >>>> IF application requests SYNCH_CACHE >>>> IF any FW which supports new API bit called canHandleSyncCache >>>> Driver sends SYNCH_CACHE command to the FW >>>> IF 'VirtualDisk' >>>> FW does not send SYNCH_CACHE to drives >>>> FW returns SUCCESS >>>> ELSE IF 'JBOD' >>>> FW sends SYNCH_CACHE to drive >>>> FW obtains status from drive and returns same status back to >>>> driver >>>> ENDIF >>>> >>>> Fixing this is robust if FW and driver changes are inline. See below >>>> for more detail. >>>> >>>> Case - 1 >>>> This patch is attempt to fix one level problem where Virtual Disks >>>> are not managed correctly in MR FW. There are some MR FW (E.a Gen2 >>>> and Gen2.5 FW), which set WCE bit for Virtual Disk but FW is not >>>> reply correct for SYNCH_CACHE. This was handled in past and carry >>>> forward (in driver + FW ) to all new generation products as well. We >>>> tried to collect all possible details why it was handled such way to >>>> provide better driver fix for this issue(mainly to avoid this FW >>>> checks and module parameters etc..). But now it looks like to make >>>> generic fix (Device ID based etc..)is even risky, so went with this >>>> protective approach. It is very risky to find out which Device ID >>>> and FW has capability to manage SYNC_CACHE, so we managed to figure >>>> out which are the new FW using FW capability bit. >>>> >>>> E.g Liberator/Thunderbolt MR FW SYNC_CACHE is write Unsupported >>>> command (this is a firmware bug) and immediately it will be failed to >>>> SML. This will cause File system to go Read-only. >>> That's a better description ... what you're saying is that the patch >>> doesn't actually fix the bug Ric is worrying about. >>> >>>> Case -2 >>>> One more thing which we are trying and known driver can send >>>> "SYNC_CACHE" unconditionally to all generation of FW. For this we >>>> are doing more unit testing on various controllers/FW and planning to >>>> provide another patch to fix JBOD path for any FW. (It will not be >>>> based on FW capability/module parameter). >>> OK, but we really need this ASAP. As Ric said, data integrity is at >>> risk until this is fixed. Can you also reference the commit that >>> introduced the problem so we know how far to do the sable backports? >>> >>>> If we remove module parameter, we can ask customer to disable WCE on >>>> drive to get similar impact. >>> Thanks, >>> >>> James >>> >>> -- >>> To unsubscribe from this list: send the line "unsubscribe linux-scsi" in >>> the body of a message to majordomo@vger.kernel.org >>> More majordomo info at http://vger.kernel.org/majordomo-info.html >> > -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Hi Tomas, > -----Original Message----- > From: Tomas Henzl [mailto:thenzl@redhat.com] > Sent: Wednesday, October 19, 2016 5:56 AM > To: Ching Huang > Cc: James Bottomley; Kashyap Desai; Ric Wheeler; Hannes Reinecke; Sumit > Saxena; linux-scsi@vger.kernel.org; martin.petersen@oracle.com; Christoph > Hellwig; Martin K. Petersen; Jeff Moyer; Gris Ge; Ewan Milne; Jens Axboe; > Raghava Aditya Renukunta > Subject: Re: [PATCH 4/7] megaraid_sas: Send SYNCHRONIZE_CACHE > command to firmware > > EXTERNAL EMAIL > > > On 19.10.2016 11:50, Ching Huang wrote: > > Hi Tomas, > > > > SCSI command checking in queuecommand function of arcmsr can be > removed safely. > > Now driver can pass all scsi command to controller firmware. > > thanks for your quick reply. Is this (passing the SYNCHRONIZE_CACHE > command > to firmware) safe for all arcmsr cards, even the oldest? > > Please start with your patch a new thread and add your 'Signed-off-by:' line. > > > > > diff -uprN a/drivers/scsi/arcmsr/arcmsr_hba.c > b/drivers/scsi/arcmsr/arcmsr_hba.c > > --- a/drivers/scsi/arcmsr/arcmsr_hba.c 2016-10-19 16:18:45.000000000 > +0800 > > +++ b/drivers/scsi/arcmsr/arcmsr_hba.c 2016-10-19 16:31:59.665524699 > +0800 > > @@ -2623,13 +2623,6 @@ static int arcmsr_queue_command_lck(stru > > cmd->scsi_done = done; > > cmd->host_scribble = NULL; > > cmd->result = 0; > > - if ((scsicmd == SYNCHRONIZE_CACHE) ||(scsicmd == > SEND_DIAGNOSTIC)){ > > - if(acb->devstate[target][lun] == ARECA_RAID_GONE) { > > - cmd->result = (DID_NO_CONNECT << 16); > > - } > > - cmd->scsi_done(cmd); > > - return 0; > > - } > > if (target == 16) { > > /* virtual device for iop message transfer */ > > arcmsr_handle_virtual_command(acb, cmd); > > > > Thanks > > Ching > > On Tue, 2016-10-18 at 15:56 +0200, Tomas Henzl wrote: > >> Hi, > >> > >> similar suspicious code path can be found in the queuecommand > functions in other drivers too > >> these are - > >> pmcraid.c > >> arcmsr_hba.c > >> cc-ing maintainers - > >> (but both drivers seem to be unmaintained for a while - > >> I've added Ching for arcmsr and Raghava for pmcraid) We do not support this card anymore nor do we have the hardware to test it out, but let me try and procure the hardware for testing although chances are very slim. Regards, Raghava Aditya > >> please read this thread and verify that your driver and firmware is safe. > >> > >> It is expected that a raid card controls the disk write cache (switches it off) > >> but this might not be true for all modes of operation a raid adapter > handles > >> - pass through/non-RAID etc. In such case when disk write cache is > enabled > >> and your driver skips the SYNCHRONIZE_CACHE command a FS corruption > >> is very much possible during unexpected power loss or even a clean > shutdown. > >> > >> tomash > >> > >> On 17.10.2016 19:51, James Bottomley wrote: > >>> On Mon, 2016-10-17 at 23:06 +0530, Kashyap Desai wrote: > >>>>> -----Original Message----- > >>>>> From: James Bottomley [mailto:jejb@linux.vnet.ibm.com] > >>>>> Sent: Monday, October 17, 2016 10:50 PM > >>>>> To: Ric Wheeler; Hannes Reinecke; Sumit Saxena; > >>>>> linux-scsi@vger.kernel.org > >>>>> Cc: martin.petersen@oracle.com; thenzl@redhat.com; > >>>>> kashyap.desai@broadcom.com; Christoph Hellwig; Martin K. > Petersen; > >>>>> Jeff > >>>>> Moyer; Gris Ge; Ewan Milne; Jens Axboe > >>>>> Subject: Re: [PATCH 4/7] megaraid_sas: Send SYNCHRONIZE_CACHE > >>>>> command > >>>>> to firmware > >>>>> > >>>>> On Mon, 2016-10-17 at 12:27 -0400, Ric Wheeler wrote: > >>>>>> On 10/17/2016 12:20 PM, James Bottomley wrote: > >>>>>>> On Mon, 2016-10-17 at 09:01 -0400, Ric Wheeler wrote: > >>>>>>>> On 10/17/2016 07:34 AM, Hannes Reinecke wrote: > >>>>>>>>> On 10/17/2016 12:24 PM, Sumit Saxena wrote: > >>>>>>>>>> megaraid_sas driver returns SYNCHRONIZE_CACHE command > >>>>>>>>>> back to > >>>>>>>>>> SCSI layer without sending it to firmware as firmware > >>>>>>>>>> takes > >>>>>>>>>> care of flushing cache. This patch will change the > >>>>>>>>>> driver > >>>>>>>>>> behavior wrt SYNCHRONIZE_CACHE handling. If underlying > >>>>>>>>>> firmware has support to handle the SYNCHORNIZE_CACHE, > >>>>>>>>>> driver > >>>>>>>>>> will send it for firmware otherwise complete it back to > >>>>>>>>>> SCSI > >>>>>>>>>> layer with SUCCESS immediately. If Firmware handle > >>>>>>>>>> SYNCHORNIZE_CACHE for both VD and JBOD > >>>>>>>>>> "canHandleSyncCache" > >>>>>>>>>> bit in scratch pad register(offset > >>>>>>>>>> 0x00B4) will be set. > >>>>>>>>>> > >>>>>>>>>> This behavior can be controlled via module parameter and > >>>>>>>>>> user > >>>>>>>>>> can fallback to old behavior of returning > >>>>>>>>>> SYNCHRONIZE_CACHE by > >>>>>>>>>> driver only without sending it to firmware. > >>>>>>>>>> > >>>>>>>>>> Signed-off-by: Sumit Saxena <sumit.saxena@broadcom.com> > >>>>>>>>>> Signed-off-by: Kashyap Desai > <kashyap.desai@broadcom.com> > >>>>>>>>>> --- > >>>>>>>>>> drivers/scsi/megaraid/megaraid_sas.h | 3 +++ > >>>>>>>>>> drivers/scsi/megaraid/megaraid_sas_base.c | 14 > >>>>>>>>>> ++++++--- > >>>>>>>>>> ----- > >>>>>>>>>> drivers/scsi/megaraid/megaraid_sas_fusion.c | 3 +++ > >>>>>>>>>> drivers/scsi/megaraid/megaraid_sas_fusion.h | 2 ++ > >>>>>>>>>> 4 files changed, 14 insertions(+), 8 deletions(-) > >>>>>>>>>> > >>>>>>>>>> diff --git a/drivers/scsi/megaraid/megaraid_sas.h > >>>>>>>>>> b/drivers/scsi/megaraid/megaraid_sas.h > >>>>>>>>>> index ca86c88..43fd14f 100644 > >>>>>>>>>> --- a/drivers/scsi/megaraid/megaraid_sas.h > >>>>>>>>>> +++ b/drivers/scsi/megaraid/megaraid_sas.h > >>>>>>>>>> @@ -1429,6 +1429,8 @@ enum FW_BOOT_CONTEXT { > >>>>>>>>>> #define MR_MAX_REPLY_QUEUES_EXT_OFFSET_SHIFT 14 > >>>>>>>>>> #define MR_MAX_MSIX_REG_ARRAY 16 > >>>>>>>>>> #define MR_RDPQ_MODE_OFFSET 0X0 > >>>>>>>>>> 0800 > >>>>>>>>>> 000 > >>>>>>>>>> +#define MR_CAN_HANDLE_SYNC_CACHE_OFFSET 0 > >>>>>>>>>> X010 > >>>>>>>>>> 0000 > >>>>>>>>>> 0 > >>>>>>>>>> + > >>>>>>>>>> /* > >>>>>>>>>> * register set for both 1068 and 1078 controllers > >>>>>>>>>> * structure extended for 1078 registers @@ -2140,6 > >>>>>>>>>> +2142,7 > >>>>>>>>>> @@ struct megasas_instance { > >>>>>>>>>> u8 is_imr; > >>>>>>>>>> u8 is_rdpq; > >>>>>>>>>> bool dev_handle; > >>>>>>>>>> + bool fw_sync_cache_support; > >>>>>>>>>> }; > >>>>>>>>>> struct MR_LD_VF_MAP { > >>>>>>>>>> u32 size; > >>>>>>>>>> diff --git a/drivers/scsi/megaraid/megaraid_sas_base.c > >>>>>>>>>> b/drivers/scsi/megaraid/megaraid_sas_base.c > >>>>>>>>>> index 092387f..a4a8e2f 100644 > >>>>>>>>>> --- a/drivers/scsi/megaraid/megaraid_sas_base.c > >>>>>>>>>> +++ b/drivers/scsi/megaraid/megaraid_sas_base.c > >>>>>>>>>> @@ -104,6 +104,10 @@ unsigned int scmd_timeout = > >>>>>>>>>> MEGASAS_DEFAULT_CMD_TIMEOUT; > >>>>>>>>>> module_param(scmd_timeout, int, S_IRUGO); > >>>>>>>>>> MODULE_PARM_DESC(scmd_timeout, "scsi command > timeout > >>>>>>>>>> (10 > >>>>>>>>>> -90s), default 90s. See megasas_reset_timer."); > >>>>>>>>>> > >>>>>>>>>> +bool block_sync_cache; > >>>>>>>>>> +module_param(block_sync_cache, bool, S_IRUGO); > >>>>>>>>>> +MODULE_PARM_DESC(block_sync_cache, "Block SYNC CACHE > by > >>>>>>>>>> driver > >>>>>>>>>> Default: 0(Send it to firmware)"); > >>>>>>>>>> + > >>>>>>>>>> MODULE_LICENSE("GPL"); > >>>>>>>>>> MODULE_VERSION(MEGASAS_VERSION); > >>>>>>>>>> MODULE_AUTHOR("megaraidlinux.pdl@avagotech.com"); > >>>>>>>>>> @@ -1700,16 +1704,10 @@ megasas_queue_command(struct > >>>>> Scsi_Host > >>>>>>>>>> *shost, struct scsi_cmnd *scmd) > >>>>>>>>>> goto out_done; > >>>>>>>>>> } > >>>>>>>>>> > >>>>>>>>>> - switch (scmd->cmnd[0]) { > >>>>>>>>>> - case SYNCHRONIZE_CACHE: > >>>>>>>>>> - /* > >>>>>>>>>> - * FW takes care of flush cache on its > >>>>>>>>>> own > >>>>>>>>>> - * No need to send it down > >>>>>>>>>> - */ > >>>>>>>>>> + if ((scmd->cmnd[0] == SYNCHRONIZE_CACHE) && > >>>>>>>>>> + (!instance->fw_sync_cache_support)) { > >>>>>>>>>> scmd->result = DID_OK << 16; > >>>>>>>>>> goto out_done; > >>>>>>>>>> - default: > >>>>>>>>>> - break; > >>>>>>>>>> } > >>>>>>>>>> > >>>>>>>>>> return instance->instancet > >>>>>>>>>> ->build_and_issue_cmd(instance, scmd); > >>>>>>>>>> diff --git a/drivers/scsi/megaraid/megaraid_sas_fusion.c > >>>>>>>>>> b/drivers/scsi/megaraid/megaraid_sas_fusion.c > >>>>>>>>>> index 2159f6a..8237580 100644 > >>>>>>>>>> --- a/drivers/scsi/megaraid/megaraid_sas_fusion.c > >>>>>>>>>> +++ b/drivers/scsi/megaraid/megaraid_sas_fusion.c > >>>>>>>>>> @@ -747,6 +747,9 @@ megasas_ioc_init_fusion(struct > >>>>>>>>>> megasas_instance *instance) > >>>>>>>>>> ret = 1; > >>>>>>>>>> goto fail_fw_init; > >>>>>>>>>> } > >>>>>>>>>> + if (!block_sync_cache) > >>>>>>>>>> + instance->fw_sync_cache_support = > >>>>>>>>>> (scratch_pad_2 > >>>>>>>>>> & > >>>>>>>>>> + MR_CAN_HANDLE_SYNC_CACHE_OFFSET) > >>>>>>>>>> ? 1 > >>>>>>>>>> : > >>>>>>>>>> 0; > >>>>>>>>>> > >>>>>>>>>> IOCInitMessage = > >>>>>>>>>> dma_alloc_coherent(&instance->pdev->dev, > >>>>>>>>>> diff --git a/drivers/scsi/megaraid/megaraid_sas_fusion.h > >>>>>>>>>> b/drivers/scsi/megaraid/megaraid_sas_fusion.h > >>>>>>>>>> index e3bee04..2857154 100644 > >>>>>>>>>> --- a/drivers/scsi/megaraid/megaraid_sas_fusion.h > >>>>>>>>>> +++ b/drivers/scsi/megaraid/megaraid_sas_fusion.h > >>>>>>>>>> @@ -72,6 +72,8 @@ > >>>>>>>>>> #define MPI2_SUP_REPLY_POST_HOST_INDEX_OFFSET > >>>>>>>>>> (0x0000030C) > >>>>>>>>>> #define MPI2_REPLY_POST_HOST_INDEX_OFFSET (0x00 > >>>>>>>>>> 0000 > >>>>>>>>>> 6C) > >>>>>>>>>> > >>>>>>>>>> +extern bool block_sync_cache; > >>>>>>>>>> + > >>>>>>>>>> /* > >>>>>>>>>> * Raid context flags > >>>>>>>>>> */ > >>>>>>>>>> > >>>>>>>>> Be extra careful with that. > >>>>>>>>> > >>>>>>>>> SYNCHRONIZE_CACHE has (potentially) a global scope, as > >>>>>>>>> there > >>>>>>>>> might be an array-wide cache, and a cache flush would > >>>>>>>>> affect the > >>>>>>>>> entire cache. Linux is using SYNCHRONIZE_CACHE as a per > >>>>>>>>> device > >>>>>>>>> setting, ie it assumes that the effects of a cache flush is > >>>>>>>>> restricted to the device in question. > >>>>>>>>> > >>>>>>>>> If this is _not_ the case I'd rather not enable it. > >>>>>>>>> Have you checked that enabling this functionality doesn't > >>>>>>>>> have > >>>>>>>>> negative performance impact? > >>>>>>>>> > >>>>>>>>> Cheers, > >>>>>>>>> > >>>>>>>>> Hannes > >>>>>>>> This must go in - without this fix, there is no data > >>>>>>>> integrity for > >>>>>>>> any file system. > >>>>>>> That's not what I get from the change log. What it says to me > >>>>>>> is > >>>>>>> that the caches are currently firmware managed. Barring > >>>>>>> firmware > >>>>>>> bugs, that means that we currently don't have any integrity > >>>>>>> issues. > >>>>>> Your understanding (or the change log) is incorrect. > >>>>> There's no current way in English of construing "as firmware takes > >>>>> care of > >>>>> flushing cache" to mean its inverse, so the changelog needs > >>>>> updating to > >>>>> explain > >>>>> that firmware does *not* take care of cache flushing, so > >>>>> effectively > >>>>> nothing > >>>>> does and we'll need a cc to stable if the problem is that nothing > >>>>> takes > >>>>> care of > >>>>> flushing the drive caches. > >>>>> > >>>>> James > >>>> Sorry for confusion. More accurate to say - > >>>> > >>>> Current megaraid_sas driver returns SYNCHRONIZE_CACHE command > back to > >>>> SCSI layer without sending it down to firmware as firmware supposed > >>>> to takes care of flushing cache for all Virtual Disk at the time of > >>>> system reboot/shutdown. Because of above FW rule driver coded > wrongly > >>>> and it added bug that SYNCHRONIZE_CACHE is not passed to FW for > JBOD > >>>> as well. (MR Firmware wanted OS to manage SYNCHRONIZE_CACHE > and pass > >>>> the same to the Firmware. ). We figure out that even for VD, why > >>>> driver should send back SYNC_CACHE command...let's have the same > in > >>>> FW (giving all control in FW) > >>>> > >>>> New behavior described - > >>>> > >>>> IF application requests SYNCH_CACHE > >>>> IF any FW which supports new API bit called canHandleSyncCache > >>>> Driver sends SYNCH_CACHE command to the FW > >>>> IF 'VirtualDisk' > >>>> FW does not send SYNCH_CACHE to drives > >>>> FW returns SUCCESS > >>>> ELSE IF 'JBOD' > >>>> FW sends SYNCH_CACHE to drive > >>>> FW obtains status from drive and returns same status back to > >>>> driver > >>>> ENDIF > >>>> > >>>> Fixing this is robust if FW and driver changes are inline. See below > >>>> for more detail. > >>>> > >>>> Case - 1 > >>>> This patch is attempt to fix one level problem where Virtual Disks > >>>> are not managed correctly in MR FW. There are some MR FW (E.a Gen2 > >>>> and Gen2.5 FW), which set WCE bit for Virtual Disk but FW is not > >>>> reply correct for SYNCH_CACHE. This was handled in past and carry > >>>> forward (in driver + FW ) to all new generation products as well. We > >>>> tried to collect all possible details why it was handled such way to > >>>> provide better driver fix for this issue(mainly to avoid this FW > >>>> checks and module parameters etc..). But now it looks like to make > >>>> generic fix (Device ID based etc..)is even risky, so went with this > >>>> protective approach. It is very risky to find out which Device ID > >>>> and FW has capability to manage SYNC_CACHE, so we managed to > figure > >>>> out which are the new FW using FW capability bit. > >>>> > >>>> E.g Liberator/Thunderbolt MR FW SYNC_CACHE is write Unsupported > >>>> command (this is a firmware bug) and immediately it will be failed to > >>>> SML. This will cause File system to go Read-only. > >>> That's a better description ... what you're saying is that the patch > >>> doesn't actually fix the bug Ric is worrying about. > >>> > >>>> Case -2 > >>>> One more thing which we are trying and known driver can send > >>>> "SYNC_CACHE" unconditionally to all generation of FW. For this we > >>>> are doing more unit testing on various controllers/FW and planning to > >>>> provide another patch to fix JBOD path for any FW. (It will not be > >>>> based on FW capability/module parameter). > >>> OK, but we really need this ASAP. As Ric said, data integrity is at > >>> risk until this is fixed. Can you also reference the commit that > >>> introduced the problem so we know how far to do the sable backports? > >>> > >>>> If we remove module parameter, we can ask customer to disable WCE > on > >>>> drive to get similar impact. > >>> Thanks, > >>> > >>> James > >>> > >>> -- > >>> To unsubscribe from this list: send the line "unsubscribe linux-scsi" in > >>> the body of a message to majordomo@vger.kernel.org > >>> More majordomo info at http://vger.kernel.org/majordomo-info.html > >> > >
On 19.10.2016 20:07, Raghava Aditya Renukunta wrote: > Hi Tomas, > >> -----Original Message----- >> From: Tomas Henzl [mailto:thenzl@redhat.com] >> Sent: Wednesday, October 19, 2016 5:56 AM >> To: Ching Huang >> Cc: James Bottomley; Kashyap Desai; Ric Wheeler; Hannes Reinecke; Sumit >> Saxena; linux-scsi@vger.kernel.org; martin.petersen@oracle.com; Christoph >> Hellwig; Martin K. Petersen; Jeff Moyer; Gris Ge; Ewan Milne; Jens Axboe; >> Raghava Aditya Renukunta >> Subject: Re: [PATCH 4/7] megaraid_sas: Send SYNCHRONIZE_CACHE >> command to firmware >> >> EXTERNAL EMAIL >> >> >> On 19.10.2016 11:50, Ching Huang wrote: >>> Hi Tomas, >>> >>> SCSI command checking in queuecommand function of arcmsr can be >> removed safely. >>> Now driver can pass all scsi command to controller firmware. >> thanks for your quick reply. Is this (passing the SYNCHRONIZE_CACHE >> command >> to firmware) safe for all arcmsr cards, even the oldest? >> >> Please start with your patch a new thread and add your 'Signed-off-by:' line. >> >>> diff -uprN a/drivers/scsi/arcmsr/arcmsr_hba.c >> b/drivers/scsi/arcmsr/arcmsr_hba.c >>> --- a/drivers/scsi/arcmsr/arcmsr_hba.c 2016-10-19 16:18:45.000000000 >> +0800 >>> +++ b/drivers/scsi/arcmsr/arcmsr_hba.c 2016-10-19 16:31:59.665524699 >> +0800 >>> @@ -2623,13 +2623,6 @@ static int arcmsr_queue_command_lck(stru >>> cmd->scsi_done = done; >>> cmd->host_scribble = NULL; >>> cmd->result = 0; >>> - if ((scsicmd == SYNCHRONIZE_CACHE) ||(scsicmd == >> SEND_DIAGNOSTIC)){ >>> - if(acb->devstate[target][lun] == ARECA_RAID_GONE) { >>> - cmd->result = (DID_NO_CONNECT << 16); >>> - } >>> - cmd->scsi_done(cmd); >>> - return 0; >>> - } >>> if (target == 16) { >>> /* virtual device for iop message transfer */ >>> arcmsr_handle_virtual_command(acb, cmd); >>> >>> Thanks >>> Ching >>> On Tue, 2016-10-18 at 15:56 +0200, Tomas Henzl wrote: >>>> Hi, >>>> >>>> similar suspicious code path can be found in the queuecommand >> functions in other drivers too >>>> these are - >>>> pmcraid.c >>>> arcmsr_hba.c >>>> cc-ing maintainers - >>>> (but both drivers seem to be unmaintained for a while - >>>> I've added Ching for arcmsr and Raghava for pmcraid) > We do not support this card anymore nor do we have the hardware to test > it out, but let me try and procure the hardware for testing although > chances are very slim. Thanks for looking into this even though the driver is no more supported. > > Regards, > Raghava Aditya > >>>> please read this thread and verify that your driver and firmware is safe. >>>> >>>> It is expected that a raid card controls the disk write cache (switches it off) >>>> but this might not be true for all modes of operation a raid adapter >> handles >>>> - pass through/non-RAID etc. In such case when disk write cache is >> enabled >>>> and your driver skips the SYNCHRONIZE_CACHE command a FS corruption >>>> is very much possible during unexpected power loss or even a clean >> shutdown. >>>> tomash >>>> >>>> On 17.10.2016 19:51, James Bottomley wrote: >>>>> On Mon, 2016-10-17 at 23:06 +0530, Kashyap Desai wrote: >>>>>>> -----Original Message----- >>>>>>> From: James Bottomley [mailto:jejb@linux.vnet.ibm.com] >>>>>>> Sent: Monday, October 17, 2016 10:50 PM >>>>>>> To: Ric Wheeler; Hannes Reinecke; Sumit Saxena; >>>>>>> linux-scsi@vger.kernel.org >>>>>>> Cc: martin.petersen@oracle.com; thenzl@redhat.com; >>>>>>> kashyap.desai@broadcom.com; Christoph Hellwig; Martin K. >> Petersen; >>>>>>> Jeff >>>>>>> Moyer; Gris Ge; Ewan Milne; Jens Axboe >>>>>>> Subject: Re: [PATCH 4/7] megaraid_sas: Send SYNCHRONIZE_CACHE >>>>>>> command >>>>>>> to firmware >>>>>>> >>>>>>> On Mon, 2016-10-17 at 12:27 -0400, Ric Wheeler wrote: >>>>>>>> On 10/17/2016 12:20 PM, James Bottomley wrote: >>>>>>>>> On Mon, 2016-10-17 at 09:01 -0400, Ric Wheeler wrote: >>>>>>>>>> On 10/17/2016 07:34 AM, Hannes Reinecke wrote: >>>>>>>>>>> On 10/17/2016 12:24 PM, Sumit Saxena wrote: >>>>>>>>>>>> megaraid_sas driver returns SYNCHRONIZE_CACHE command >>>>>>>>>>>> back to >>>>>>>>>>>> SCSI layer without sending it to firmware as firmware >>>>>>>>>>>> takes >>>>>>>>>>>> care of flushing cache. This patch will change the >>>>>>>>>>>> driver >>>>>>>>>>>> behavior wrt SYNCHRONIZE_CACHE handling. If underlying >>>>>>>>>>>> firmware has support to handle the SYNCHORNIZE_CACHE, >>>>>>>>>>>> driver >>>>>>>>>>>> will send it for firmware otherwise complete it back to >>>>>>>>>>>> SCSI >>>>>>>>>>>> layer with SUCCESS immediately. If Firmware handle >>>>>>>>>>>> SYNCHORNIZE_CACHE for both VD and JBOD >>>>>>>>>>>> "canHandleSyncCache" >>>>>>>>>>>> bit in scratch pad register(offset >>>>>>>>>>>> 0x00B4) will be set. >>>>>>>>>>>> >>>>>>>>>>>> This behavior can be controlled via module parameter and >>>>>>>>>>>> user >>>>>>>>>>>> can fallback to old behavior of returning >>>>>>>>>>>> SYNCHRONIZE_CACHE by >>>>>>>>>>>> driver only without sending it to firmware. >>>>>>>>>>>> >>>>>>>>>>>> Signed-off-by: Sumit Saxena <sumit.saxena@broadcom.com> >>>>>>>>>>>> Signed-off-by: Kashyap Desai >> <kashyap.desai@broadcom.com> >>>>>>>>>>>> --- >>>>>>>>>>>> drivers/scsi/megaraid/megaraid_sas.h | 3 +++ >>>>>>>>>>>> drivers/scsi/megaraid/megaraid_sas_base.c | 14 >>>>>>>>>>>> ++++++--- >>>>>>>>>>>> ----- >>>>>>>>>>>> drivers/scsi/megaraid/megaraid_sas_fusion.c | 3 +++ >>>>>>>>>>>> drivers/scsi/megaraid/megaraid_sas_fusion.h | 2 ++ >>>>>>>>>>>> 4 files changed, 14 insertions(+), 8 deletions(-) >>>>>>>>>>>> >>>>>>>>>>>> diff --git a/drivers/scsi/megaraid/megaraid_sas.h >>>>>>>>>>>> b/drivers/scsi/megaraid/megaraid_sas.h >>>>>>>>>>>> index ca86c88..43fd14f 100644 >>>>>>>>>>>> --- a/drivers/scsi/megaraid/megaraid_sas.h >>>>>>>>>>>> +++ b/drivers/scsi/megaraid/megaraid_sas.h >>>>>>>>>>>> @@ -1429,6 +1429,8 @@ enum FW_BOOT_CONTEXT { >>>>>>>>>>>> #define MR_MAX_REPLY_QUEUES_EXT_OFFSET_SHIFT 14 >>>>>>>>>>>> #define MR_MAX_MSIX_REG_ARRAY 16 >>>>>>>>>>>> #define MR_RDPQ_MODE_OFFSET 0X0 >>>>>>>>>>>> 0800 >>>>>>>>>>>> 000 >>>>>>>>>>>> +#define MR_CAN_HANDLE_SYNC_CACHE_OFFSET 0 >>>>>>>>>>>> X010 >>>>>>>>>>>> 0000 >>>>>>>>>>>> 0 >>>>>>>>>>>> + >>>>>>>>>>>> /* >>>>>>>>>>>> * register set for both 1068 and 1078 controllers >>>>>>>>>>>> * structure extended for 1078 registers @@ -2140,6 >>>>>>>>>>>> +2142,7 >>>>>>>>>>>> @@ struct megasas_instance { >>>>>>>>>>>> u8 is_imr; >>>>>>>>>>>> u8 is_rdpq; >>>>>>>>>>>> bool dev_handle; >>>>>>>>>>>> + bool fw_sync_cache_support; >>>>>>>>>>>> }; >>>>>>>>>>>> struct MR_LD_VF_MAP { >>>>>>>>>>>> u32 size; >>>>>>>>>>>> diff --git a/drivers/scsi/megaraid/megaraid_sas_base.c >>>>>>>>>>>> b/drivers/scsi/megaraid/megaraid_sas_base.c >>>>>>>>>>>> index 092387f..a4a8e2f 100644 >>>>>>>>>>>> --- a/drivers/scsi/megaraid/megaraid_sas_base.c >>>>>>>>>>>> +++ b/drivers/scsi/megaraid/megaraid_sas_base.c >>>>>>>>>>>> @@ -104,6 +104,10 @@ unsigned int scmd_timeout = >>>>>>>>>>>> MEGASAS_DEFAULT_CMD_TIMEOUT; >>>>>>>>>>>> module_param(scmd_timeout, int, S_IRUGO); >>>>>>>>>>>> MODULE_PARM_DESC(scmd_timeout, "scsi command >> timeout >>>>>>>>>>>> (10 >>>>>>>>>>>> -90s), default 90s. See megasas_reset_timer."); >>>>>>>>>>>> >>>>>>>>>>>> +bool block_sync_cache; >>>>>>>>>>>> +module_param(block_sync_cache, bool, S_IRUGO); >>>>>>>>>>>> +MODULE_PARM_DESC(block_sync_cache, "Block SYNC CACHE >> by >>>>>>>>>>>> driver >>>>>>>>>>>> Default: 0(Send it to firmware)"); >>>>>>>>>>>> + >>>>>>>>>>>> MODULE_LICENSE("GPL"); >>>>>>>>>>>> MODULE_VERSION(MEGASAS_VERSION); >>>>>>>>>>>> MODULE_AUTHOR("megaraidlinux.pdl@avagotech.com"); >>>>>>>>>>>> @@ -1700,16 +1704,10 @@ megasas_queue_command(struct >>>>>>> Scsi_Host >>>>>>>>>>>> *shost, struct scsi_cmnd *scmd) >>>>>>>>>>>> goto out_done; >>>>>>>>>>>> } >>>>>>>>>>>> >>>>>>>>>>>> - switch (scmd->cmnd[0]) { >>>>>>>>>>>> - case SYNCHRONIZE_CACHE: >>>>>>>>>>>> - /* >>>>>>>>>>>> - * FW takes care of flush cache on its >>>>>>>>>>>> own >>>>>>>>>>>> - * No need to send it down >>>>>>>>>>>> - */ >>>>>>>>>>>> + if ((scmd->cmnd[0] == SYNCHRONIZE_CACHE) && >>>>>>>>>>>> + (!instance->fw_sync_cache_support)) { >>>>>>>>>>>> scmd->result = DID_OK << 16; >>>>>>>>>>>> goto out_done; >>>>>>>>>>>> - default: >>>>>>>>>>>> - break; >>>>>>>>>>>> } >>>>>>>>>>>> >>>>>>>>>>>> return instance->instancet >>>>>>>>>>>> ->build_and_issue_cmd(instance, scmd); >>>>>>>>>>>> diff --git a/drivers/scsi/megaraid/megaraid_sas_fusion.c >>>>>>>>>>>> b/drivers/scsi/megaraid/megaraid_sas_fusion.c >>>>>>>>>>>> index 2159f6a..8237580 100644 >>>>>>>>>>>> --- a/drivers/scsi/megaraid/megaraid_sas_fusion.c >>>>>>>>>>>> +++ b/drivers/scsi/megaraid/megaraid_sas_fusion.c >>>>>>>>>>>> @@ -747,6 +747,9 @@ megasas_ioc_init_fusion(struct >>>>>>>>>>>> megasas_instance *instance) >>>>>>>>>>>> ret = 1; >>>>>>>>>>>> goto fail_fw_init; >>>>>>>>>>>> } >>>>>>>>>>>> + if (!block_sync_cache) >>>>>>>>>>>> + instance->fw_sync_cache_support = >>>>>>>>>>>> (scratch_pad_2 >>>>>>>>>>>> & >>>>>>>>>>>> + MR_CAN_HANDLE_SYNC_CACHE_OFFSET) >>>>>>>>>>>> ? 1 >>>>>>>>>>>> : >>>>>>>>>>>> 0; >>>>>>>>>>>> >>>>>>>>>>>> IOCInitMessage = >>>>>>>>>>>> dma_alloc_coherent(&instance->pdev->dev, >>>>>>>>>>>> diff --git a/drivers/scsi/megaraid/megaraid_sas_fusion.h >>>>>>>>>>>> b/drivers/scsi/megaraid/megaraid_sas_fusion.h >>>>>>>>>>>> index e3bee04..2857154 100644 >>>>>>>>>>>> --- a/drivers/scsi/megaraid/megaraid_sas_fusion.h >>>>>>>>>>>> +++ b/drivers/scsi/megaraid/megaraid_sas_fusion.h >>>>>>>>>>>> @@ -72,6 +72,8 @@ >>>>>>>>>>>> #define MPI2_SUP_REPLY_POST_HOST_INDEX_OFFSET >>>>>>>>>>>> (0x0000030C) >>>>>>>>>>>> #define MPI2_REPLY_POST_HOST_INDEX_OFFSET (0x00 >>>>>>>>>>>> 0000 >>>>>>>>>>>> 6C) >>>>>>>>>>>> >>>>>>>>>>>> +extern bool block_sync_cache; >>>>>>>>>>>> + >>>>>>>>>>>> /* >>>>>>>>>>>> * Raid context flags >>>>>>>>>>>> */ >>>>>>>>>>>> >>>>>>>>>>> Be extra careful with that. >>>>>>>>>>> >>>>>>>>>>> SYNCHRONIZE_CACHE has (potentially) a global scope, as >>>>>>>>>>> there >>>>>>>>>>> might be an array-wide cache, and a cache flush would >>>>>>>>>>> affect the >>>>>>>>>>> entire cache. Linux is using SYNCHRONIZE_CACHE as a per >>>>>>>>>>> device >>>>>>>>>>> setting, ie it assumes that the effects of a cache flush is >>>>>>>>>>> restricted to the device in question. >>>>>>>>>>> >>>>>>>>>>> If this is _not_ the case I'd rather not enable it. >>>>>>>>>>> Have you checked that enabling this functionality doesn't >>>>>>>>>>> have >>>>>>>>>>> negative performance impact? >>>>>>>>>>> >>>>>>>>>>> Cheers, >>>>>>>>>>> >>>>>>>>>>> Hannes >>>>>>>>>> This must go in - without this fix, there is no data >>>>>>>>>> integrity for >>>>>>>>>> any file system. >>>>>>>>> That's not what I get from the change log. What it says to me >>>>>>>>> is >>>>>>>>> that the caches are currently firmware managed. Barring >>>>>>>>> firmware >>>>>>>>> bugs, that means that we currently don't have any integrity >>>>>>>>> issues. >>>>>>>> Your understanding (or the change log) is incorrect. >>>>>>> There's no current way in English of construing "as firmware takes >>>>>>> care of >>>>>>> flushing cache" to mean its inverse, so the changelog needs >>>>>>> updating to >>>>>>> explain >>>>>>> that firmware does *not* take care of cache flushing, so >>>>>>> effectively >>>>>>> nothing >>>>>>> does and we'll need a cc to stable if the problem is that nothing >>>>>>> takes >>>>>>> care of >>>>>>> flushing the drive caches. >>>>>>> >>>>>>> James >>>>>> Sorry for confusion. More accurate to say - >>>>>> >>>>>> Current megaraid_sas driver returns SYNCHRONIZE_CACHE command >> back to >>>>>> SCSI layer without sending it down to firmware as firmware supposed >>>>>> to takes care of flushing cache for all Virtual Disk at the time of >>>>>> system reboot/shutdown. Because of above FW rule driver coded >> wrongly >>>>>> and it added bug that SYNCHRONIZE_CACHE is not passed to FW for >> JBOD >>>>>> as well. (MR Firmware wanted OS to manage SYNCHRONIZE_CACHE >> and pass >>>>>> the same to the Firmware. ). We figure out that even for VD, why >>>>>> driver should send back SYNC_CACHE command...let's have the same >> in >>>>>> FW (giving all control in FW) >>>>>> >>>>>> New behavior described - >>>>>> >>>>>> IF application requests SYNCH_CACHE >>>>>> IF any FW which supports new API bit called canHandleSyncCache >>>>>> Driver sends SYNCH_CACHE command to the FW >>>>>> IF 'VirtualDisk' >>>>>> FW does not send SYNCH_CACHE to drives >>>>>> FW returns SUCCESS >>>>>> ELSE IF 'JBOD' >>>>>> FW sends SYNCH_CACHE to drive >>>>>> FW obtains status from drive and returns same status back to >>>>>> driver >>>>>> ENDIF >>>>>> >>>>>> Fixing this is robust if FW and driver changes are inline. See below >>>>>> for more detail. >>>>>> >>>>>> Case - 1 >>>>>> This patch is attempt to fix one level problem where Virtual Disks >>>>>> are not managed correctly in MR FW. There are some MR FW (E.a Gen2 >>>>>> and Gen2.5 FW), which set WCE bit for Virtual Disk but FW is not >>>>>> reply correct for SYNCH_CACHE. This was handled in past and carry >>>>>> forward (in driver + FW ) to all new generation products as well. We >>>>>> tried to collect all possible details why it was handled such way to >>>>>> provide better driver fix for this issue(mainly to avoid this FW >>>>>> checks and module parameters etc..). But now it looks like to make >>>>>> generic fix (Device ID based etc..)is even risky, so went with this >>>>>> protective approach. It is very risky to find out which Device ID >>>>>> and FW has capability to manage SYNC_CACHE, so we managed to >> figure >>>>>> out which are the new FW using FW capability bit. >>>>>> >>>>>> E.g Liberator/Thunderbolt MR FW SYNC_CACHE is write Unsupported >>>>>> command (this is a firmware bug) and immediately it will be failed to >>>>>> SML. This will cause File system to go Read-only. >>>>> That's a better description ... what you're saying is that the patch >>>>> doesn't actually fix the bug Ric is worrying about. >>>>> >>>>>> Case -2 >>>>>> One more thing which we are trying and known driver can send >>>>>> "SYNC_CACHE" unconditionally to all generation of FW. For this we >>>>>> are doing more unit testing on various controllers/FW and planning to >>>>>> provide another patch to fix JBOD path for any FW. (It will not be >>>>>> based on FW capability/module parameter). >>>>> OK, but we really need this ASAP. As Ric said, data integrity is at >>>>> risk until this is fixed. Can you also reference the commit that >>>>> introduced the problem so we know how far to do the sable backports? >>>>> >>>>>> If we remove module parameter, we can ask customer to disable WCE >> on >>>>>> drive to get similar impact. >>>>> Thanks, >>>>> >>>>> James >>>>> >>>>> -- >>>>> To unsubscribe from this list: send the line "unsubscribe linux-scsi" in >>>>> the body of a message to majordomo@vger.kernel.org >>>>> More majordomo info at http://vger.kernel.org/majordomo-info.html > N�����r��y���b�X��ǧv�^�){.n�+����{���"�{ay�ʇڙ�,j��f���h���z��w������j:+v���w�j�m��������zZ+�����ݢj"��!tml= -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
>>>>> "Tomas" == Tomas Henzl <thenzl@redhat.com> writes: >> SCSI command checking in queuecommand function of arcmsr can be >> removed safely. Now driver can pass all scsi command to controller >> firmware. Tomas> Is this (passing the SYNCHRONIZE_CACHE command to firmware) safe Tomas> for all arcmsr cards, even the oldest? Ching?
>>>>> "ching" == 黃清隆 <ching2048@areca.com.tw> writes:
ching,
ching> Yes. Passing SYNCHRONIZE_CACHE command to firmware is safe for
ching> all Areca Raid controllers, even the oldest.
I applied your patch to 4.9/scsi-fixes.
diff --git a/drivers/scsi/megaraid/megaraid_sas.h b/drivers/scsi/megaraid/megaraid_sas.h index ca86c88..43fd14f 100644 --- a/drivers/scsi/megaraid/megaraid_sas.h +++ b/drivers/scsi/megaraid/megaraid_sas.h @@ -1429,6 +1429,8 @@ enum FW_BOOT_CONTEXT { #define MR_MAX_REPLY_QUEUES_EXT_OFFSET_SHIFT 14 #define MR_MAX_MSIX_REG_ARRAY 16 #define MR_RDPQ_MODE_OFFSET 0X00800000 +#define MR_CAN_HANDLE_SYNC_CACHE_OFFSET 0X01000000 + /* * register set for both 1068 and 1078 controllers * structure extended for 1078 registers @@ -2140,6 +2142,7 @@ struct megasas_instance { u8 is_imr; u8 is_rdpq; bool dev_handle; + bool fw_sync_cache_support; }; struct MR_LD_VF_MAP { u32 size; diff --git a/drivers/scsi/megaraid/megaraid_sas_base.c b/drivers/scsi/megaraid/megaraid_sas_base.c index 092387f..a4a8e2f 100644 --- a/drivers/scsi/megaraid/megaraid_sas_base.c +++ b/drivers/scsi/megaraid/megaraid_sas_base.c @@ -104,6 +104,10 @@ unsigned int scmd_timeout = MEGASAS_DEFAULT_CMD_TIMEOUT; module_param(scmd_timeout, int, S_IRUGO); MODULE_PARM_DESC(scmd_timeout, "scsi command timeout (10-90s), default 90s. See megasas_reset_timer."); +bool block_sync_cache; +module_param(block_sync_cache, bool, S_IRUGO); +MODULE_PARM_DESC(block_sync_cache, "Block SYNC CACHE by driver Default: 0(Send it to firmware)"); + MODULE_LICENSE("GPL"); MODULE_VERSION(MEGASAS_VERSION); MODULE_AUTHOR("megaraidlinux.pdl@avagotech.com"); @@ -1700,16 +1704,10 @@ megasas_queue_command(struct Scsi_Host *shost, struct scsi_cmnd *scmd) goto out_done; } - switch (scmd->cmnd[0]) { - case SYNCHRONIZE_CACHE: - /* - * FW takes care of flush cache on its own - * No need to send it down - */ + if ((scmd->cmnd[0] == SYNCHRONIZE_CACHE) && + (!instance->fw_sync_cache_support)) { scmd->result = DID_OK << 16; goto out_done; - default: - break; } return instance->instancet->build_and_issue_cmd(instance, scmd); diff --git a/drivers/scsi/megaraid/megaraid_sas_fusion.c b/drivers/scsi/megaraid/megaraid_sas_fusion.c index 2159f6a..8237580 100644 --- a/drivers/scsi/megaraid/megaraid_sas_fusion.c +++ b/drivers/scsi/megaraid/megaraid_sas_fusion.c @@ -747,6 +747,9 @@ megasas_ioc_init_fusion(struct megasas_instance *instance) ret = 1; goto fail_fw_init; } + if (!block_sync_cache) + instance->fw_sync_cache_support = (scratch_pad_2 & + MR_CAN_HANDLE_SYNC_CACHE_OFFSET) ? 1 : 0; IOCInitMessage = dma_alloc_coherent(&instance->pdev->dev, diff --git a/drivers/scsi/megaraid/megaraid_sas_fusion.h b/drivers/scsi/megaraid/megaraid_sas_fusion.h index e3bee04..2857154 100644 --- a/drivers/scsi/megaraid/megaraid_sas_fusion.h +++ b/drivers/scsi/megaraid/megaraid_sas_fusion.h @@ -72,6 +72,8 @@ #define MPI2_SUP_REPLY_POST_HOST_INDEX_OFFSET (0x0000030C) #define MPI2_REPLY_POST_HOST_INDEX_OFFSET (0x0000006C) +extern bool block_sync_cache; + /* * Raid context flags */