diff mbox series

[v2] hwmon: (drivetemp) Avoid SCT usage on Toshiba DT01ACA family drives

Message ID 8325438e25a3a5a7e6d12ef6ede8f4350e4c65be.1594500029.git.mail@maciej.szmigiero.name (mailing list archive)
State Changes Requested
Headers show
Series [v2] hwmon: (drivetemp) Avoid SCT usage on Toshiba DT01ACA family drives | expand

Commit Message

Maciej S. Szmigiero July 11, 2020, 8:41 p.m. UTC
It has been observed that Toshiba DT01ACA family drives have
WRITE FPDMA QUEUED command timeouts and sometimes just freeze until
power-cycled under heavy write loads when their temperature is getting
polled in SCT mode. The SMART mode seems to be fine, though.

Let's make sure we don't use SCT mode for these drives then.

While only the 3 TB model was actually caught exhibiting the problem let's
play safe here to avoid data corruption and extend the ban to the whole
family.

Fixes: 5b46903d8bf3 ("hwmon: Driver for disk and solid state drives with temperature sensors")
Cc: stable@vger.kernel.org
Signed-off-by: Maciej S. Szmigiero <mail@maciej.szmigiero.name>
---

Notes:
    This behavior was observed on two different DT01ACA3 drives.
    
    Usually, a series of queued WRITE FPDMA QUEUED commands just time out,
    but sometimes the whole drive freezes. Merely disconnecting and
    reconnecting SATA interface cable then does not unfreeze the drive.
    
    One has to disconnect and reconnect the drive power connector for the
    drive to be detected again (suggesting the drive firmware itself has
    crashed).
    
    This only happens when the drive temperature is polled very often (like
    every second), so occasional SCT usage via smartmontools is probably
    safe.
    
    Changes from v1:
    'SCT blacklist' -> 'SCT avoid models'

 drivers/hwmon/drivetemp.c | 37 +++++++++++++++++++++++++++++++++++++
 1 file changed, 37 insertions(+)

Comments

Guenter Roeck July 14, 2020, 2:02 p.m. UTC | #1
On 7/11/20 1:41 PM, Maciej S. Szmigiero wrote:
> It has been observed that Toshiba DT01ACA family drives have
> WRITE FPDMA QUEUED command timeouts and sometimes just freeze until
> power-cycled under heavy write loads when their temperature is getting
> polled in SCT mode. The SMART mode seems to be fine, though.
> 
> Let's make sure we don't use SCT mode for these drives then.
> 
> While only the 3 TB model was actually caught exhibiting the problem let's
> play safe here to avoid data corruption and extend the ban to the whole
> family.
> 
> Fixes: 5b46903d8bf3 ("hwmon: Driver for disk and solid state drives with temperature sensors")
> Cc: stable@vger.kernel.org
> Signed-off-by: Maciej S. Szmigiero <mail@maciej.szmigiero.name>
> ---
> 
> Notes:
>     This behavior was observed on two different DT01ACA3 drives.
>     
>     Usually, a series of queued WRITE FPDMA QUEUED commands just time out,
>     but sometimes the whole drive freezes. Merely disconnecting and
>     reconnecting SATA interface cable then does not unfreeze the drive.
>     
>     One has to disconnect and reconnect the drive power connector for the
>     drive to be detected again (suggesting the drive firmware itself has
>     crashed).
>     
>     This only happens when the drive temperature is polled very often (like
>     every second), so occasional SCT usage via smartmontools is probably
>     safe.
>     
>     Changes from v1:
>     'SCT blacklist' -> 'SCT avoid models'
> 
>  drivers/hwmon/drivetemp.c | 37 +++++++++++++++++++++++++++++++++++++
>  1 file changed, 37 insertions(+)
> 
> diff --git a/drivers/hwmon/drivetemp.c b/drivers/hwmon/drivetemp.c
> index 0d4f3d97ffc6..da9cfcbecc96 100644
> --- a/drivers/hwmon/drivetemp.c
> +++ b/drivers/hwmon/drivetemp.c
> @@ -285,6 +285,36 @@ static int drivetemp_get_scttemp(struct drivetemp_data *st, u32 attr, long *val)
>  	return err;
>  }
>  
> +static const char * const sct_avoid_models[] = {
> +/*
> + * These drives will have WRITE FPDMA QUEUED command timeouts and sometimes just
> + * freeze until power-cycled under heavy write loads when their temperature is
> + * getting polled in SCT mode. The SMART mode seems to be fine, though.
> + *
> + * While only the 3 TB model was actually caught exhibiting the problem
> + * let's play safe here to avoid data corruption and ban the whole family.
> + */
> +	"TOSHIBA DT01ACA0",
> +	"TOSHIBA DT01ACA1",
> +	"TOSHIBA DT01ACA2",
> +	"TOSHIBA DT01ACA3",
> +};
> +
> +static bool drivetemp_sct_avoid(struct drivetemp_data *st)
> +{
> +	struct scsi_device *sdev = st->sdev;
> +	unsigned int ctr;
> +
> +	if (!sdev->model)
> +		return false;
> +
> +	for (ctr = 0; ctr < ARRAY_SIZE(sct_avoid_models); ctr++)
> +		if (strncmp(sdev->model, sct_avoid_models[ctr], 16) == 0)

Why strncmp, and why length 16 ? Both strings are, as far as I can see,
0 terminated. A fixed length only asks for trouble later on as more models
are added to the list.

Also, please use "!" instead of "== 0".

> +			return true;
> +
> +	return false;
> +}
> +
>  static int drivetemp_identify_sata(struct drivetemp_data *st)
>  {
>  	struct scsi_device *sdev = st->sdev;
> @@ -326,6 +356,13 @@ static int drivetemp_identify_sata(struct drivetemp_data *st)
>  	/* bail out if this is not a SATA device */
>  	if (!is_ata || !is_sata)
>  		return -ENODEV;
> +
> +	if (have_sct && drivetemp_sct_avoid(st)) {
> +		dev_notice(&sdev->sdev_gendev,
> +			   "will avoid using SCT for temperature monitoring\n");
> +		have_sct = false;
> +	}
> +
>  	if (!have_sct)
>  		goto skip_sct;
>  
>
Maciej S. Szmigiero July 14, 2020, 2:11 p.m. UTC | #2
On 14.07.2020 16:02, Guenter Roeck wrote:
> On 7/11/20 1:41 PM, Maciej S. Szmigiero wrote:
>> It has been observed that Toshiba DT01ACA family drives have
>> WRITE FPDMA QUEUED command timeouts and sometimes just freeze until
>> power-cycled under heavy write loads when their temperature is getting
>> polled in SCT mode. The SMART mode seems to be fine, though.
>>
>> Let's make sure we don't use SCT mode for these drives then.
>>
>> While only the 3 TB model was actually caught exhibiting the problem let's
>> play safe here to avoid data corruption and extend the ban to the whole
>> family.
>>
>> Fixes: 5b46903d8bf3 ("hwmon: Driver for disk and solid state drives with temperature sensors")
>> Cc: stable@vger.kernel.org
>> Signed-off-by: Maciej S. Szmigiero <mail@maciej.szmigiero.name>
>> ---
>>
>> Notes:
>>     This behavior was observed on two different DT01ACA3 drives.
>>     
>>     Usually, a series of queued WRITE FPDMA QUEUED commands just time out,
>>     but sometimes the whole drive freezes. Merely disconnecting and
>>     reconnecting SATA interface cable then does not unfreeze the drive.
>>     
>>     One has to disconnect and reconnect the drive power connector for the
>>     drive to be detected again (suggesting the drive firmware itself has
>>     crashed).
>>     
>>     This only happens when the drive temperature is polled very often (like
>>     every second), so occasional SCT usage via smartmontools is probably
>>     safe.
>>     
>>     Changes from v1:
>>     'SCT blacklist' -> 'SCT avoid models'
>>
>>  drivers/hwmon/drivetemp.c | 37 +++++++++++++++++++++++++++++++++++++
>>  1 file changed, 37 insertions(+)
>>
>> diff --git a/drivers/hwmon/drivetemp.c b/drivers/hwmon/drivetemp.c
>> index 0d4f3d97ffc6..da9cfcbecc96 100644
>> --- a/drivers/hwmon/drivetemp.c
>> +++ b/drivers/hwmon/drivetemp.c
>> @@ -285,6 +285,36 @@ static int drivetemp_get_scttemp(struct drivetemp_data *st, u32 attr, long *val)
>>  	return err;
>>  }
>>  
>> +static const char * const sct_avoid_models[] = {
>> +/*
>> + * These drives will have WRITE FPDMA QUEUED command timeouts and sometimes just
>> + * freeze until power-cycled under heavy write loads when their temperature is
>> + * getting polled in SCT mode. The SMART mode seems to be fine, though.
>> + *
>> + * While only the 3 TB model was actually caught exhibiting the problem
>> + * let's play safe here to avoid data corruption and ban the whole family.
>> + */
>> +	"TOSHIBA DT01ACA0",
>> +	"TOSHIBA DT01ACA1",
>> +	"TOSHIBA DT01ACA2",
>> +	"TOSHIBA DT01ACA3",
>> +};
>> +
>> +static bool drivetemp_sct_avoid(struct drivetemp_data *st)
>> +{
>> +	struct scsi_device *sdev = st->sdev;
>> +	unsigned int ctr;
>> +
>> +	if (!sdev->model)
>> +		return false;
>> +
>> +	for (ctr = 0; ctr < ARRAY_SIZE(sct_avoid_models); ctr++)
>> +		if (strncmp(sdev->model, sct_avoid_models[ctr], 16) == 0)
> 
> Why strncmp, and why length 16 ? Both strings are, as far as I can see,
> 0 terminated. A fixed length only asks for trouble later on as more models
> are added to the list.

The first 16 bytes of sdev->model contain the model number, the rest
seems to be the drive serial number.
There is no NULL separator between them.

See how the SCSI device model is printed for the sysfs "model"
attribute:
https://elixir.bootlin.com/linux/v5.8-rc4/source/drivers/scsi/scsi_sysfs.c#L654

> Also, please use "!" instead of "== 0".

Will do.

Thanks,
Maciej
Guenter Roeck July 14, 2020, 5:14 p.m. UTC | #3
On 7/14/20 7:11 AM, Maciej S. Szmigiero wrote:
> On 14.07.2020 16:02, Guenter Roeck wrote:
>> On 7/11/20 1:41 PM, Maciej S. Szmigiero wrote:
>>> It has been observed that Toshiba DT01ACA family drives have
>>> WRITE FPDMA QUEUED command timeouts and sometimes just freeze until
>>> power-cycled under heavy write loads when their temperature is getting
>>> polled in SCT mode. The SMART mode seems to be fine, though.
>>>
>>> Let's make sure we don't use SCT mode for these drives then.
>>>
>>> While only the 3 TB model was actually caught exhibiting the problem let's
>>> play safe here to avoid data corruption and extend the ban to the whole
>>> family.
>>>
>>> Fixes: 5b46903d8bf3 ("hwmon: Driver for disk and solid state drives with temperature sensors")
>>> Cc: stable@vger.kernel.org
>>> Signed-off-by: Maciej S. Szmigiero <mail@maciej.szmigiero.name>
>>> ---
>>>
>>> Notes:
>>>     This behavior was observed on two different DT01ACA3 drives.
>>>     
>>>     Usually, a series of queued WRITE FPDMA QUEUED commands just time out,
>>>     but sometimes the whole drive freezes. Merely disconnecting and
>>>     reconnecting SATA interface cable then does not unfreeze the drive.
>>>     
>>>     One has to disconnect and reconnect the drive power connector for the
>>>     drive to be detected again (suggesting the drive firmware itself has
>>>     crashed).
>>>     
>>>     This only happens when the drive temperature is polled very often (like
>>>     every second), so occasional SCT usage via smartmontools is probably
>>>     safe.
>>>     
>>>     Changes from v1:
>>>     'SCT blacklist' -> 'SCT avoid models'
>>>
>>>  drivers/hwmon/drivetemp.c | 37 +++++++++++++++++++++++++++++++++++++
>>>  1 file changed, 37 insertions(+)
>>>
>>> diff --git a/drivers/hwmon/drivetemp.c b/drivers/hwmon/drivetemp.c
>>> index 0d4f3d97ffc6..da9cfcbecc96 100644
>>> --- a/drivers/hwmon/drivetemp.c
>>> +++ b/drivers/hwmon/drivetemp.c
>>> @@ -285,6 +285,36 @@ static int drivetemp_get_scttemp(struct drivetemp_data *st, u32 attr, long *val)
>>>  	return err;
>>>  }
>>>  
>>> +static const char * const sct_avoid_models[] = {
>>> +/*
>>> + * These drives will have WRITE FPDMA QUEUED command timeouts and sometimes just
>>> + * freeze until power-cycled under heavy write loads when their temperature is
>>> + * getting polled in SCT mode. The SMART mode seems to be fine, though.
>>> + *
>>> + * While only the 3 TB model was actually caught exhibiting the problem
>>> + * let's play safe here to avoid data corruption and ban the whole family.
>>> + */
>>> +	"TOSHIBA DT01ACA0",
>>> +	"TOSHIBA DT01ACA1",
>>> +	"TOSHIBA DT01ACA2",
>>> +	"TOSHIBA DT01ACA3",
>>> +};
>>> +
>>> +static bool drivetemp_sct_avoid(struct drivetemp_data *st)
>>> +{
>>> +	struct scsi_device *sdev = st->sdev;
>>> +	unsigned int ctr;
>>> +
>>> +	if (!sdev->model)
>>> +		return false;
>>> +
>>> +	for (ctr = 0; ctr < ARRAY_SIZE(sct_avoid_models); ctr++)
>>> +		if (strncmp(sdev->model, sct_avoid_models[ctr], 16) == 0)
>>
>> Why strncmp, and why length 16 ? Both strings are, as far as I can see,
>> 0 terminated. A fixed length only asks for trouble later on as more models
>> are added to the list.
> 
> The first 16 bytes of sdev->model contain the model number, the rest
> seems to be the drive serial number.
> There is no NULL separator between them.
> 

If the "16" is based on some SCSI standard, there should be
a define for it somewhere. Otherwise, the comparison should
use strlen(sct_avoid_models[ctr]) and explain the reason.
In the latter case, it might possibly make sense to match
"TOSHIBA DT01ACA" to also catch later models.

Thanks,
Guenter

> See how the SCSI device model is printed for the sysfs "model"
> attribute:
> https://elixir.bootlin.com/linux/v5.8-rc4/source/drivers/scsi/scsi_sysfs.c#L654
> 
>> Also, please use "!" instead of "== 0".
> 
> Will do.
> 
> Thanks,
> Maciej
>
Maciej S. Szmigiero July 14, 2020, 5:47 p.m. UTC | #4
On 14.07.2020 19:14, Guenter Roeck wrote:
> On 7/14/20 7:11 AM, Maciej S. Szmigiero wrote:
>> On 14.07.2020 16:02, Guenter Roeck wrote:
>>> On 7/11/20 1:41 PM, Maciej S. Szmigiero wrote:
>>>> It has been observed that Toshiba DT01ACA family drives have
>>>> WRITE FPDMA QUEUED command timeouts and sometimes just freeze until
>>>> power-cycled under heavy write loads when their temperature is getting
>>>> polled in SCT mode. The SMART mode seems to be fine, though.
>>>>
>>>> Let's make sure we don't use SCT mode for these drives then.
>>>>
>>>> While only the 3 TB model was actually caught exhibiting the problem let's
>>>> play safe here to avoid data corruption and extend the ban to the whole
>>>> family.
>>>>
>>>> Fixes: 5b46903d8bf3 ("hwmon: Driver for disk and solid state drives with temperature sensors")
>>>> Cc: stable@vger.kernel.org
>>>> Signed-off-by: Maciej S. Szmigiero <mail@maciej.szmigiero.name>
>>>> ---
>>>>
>>>> Notes:
>>>>     This behavior was observed on two different DT01ACA3 drives.
>>>>     
>>>>     Usually, a series of queued WRITE FPDMA QUEUED commands just time out,
>>>>     but sometimes the whole drive freezes. Merely disconnecting and
>>>>     reconnecting SATA interface cable then does not unfreeze the drive.
>>>>     
>>>>     One has to disconnect and reconnect the drive power connector for the
>>>>     drive to be detected again (suggesting the drive firmware itself has
>>>>     crashed).
>>>>     
>>>>     This only happens when the drive temperature is polled very often (like
>>>>     every second), so occasional SCT usage via smartmontools is probably
>>>>     safe.
>>>>     
>>>>     Changes from v1:
>>>>     'SCT blacklist' -> 'SCT avoid models'
>>>>
>>>>  drivers/hwmon/drivetemp.c | 37 +++++++++++++++++++++++++++++++++++++
>>>>  1 file changed, 37 insertions(+)
>>>>
>>>> diff --git a/drivers/hwmon/drivetemp.c b/drivers/hwmon/drivetemp.c
>>>> index 0d4f3d97ffc6..da9cfcbecc96 100644
>>>> --- a/drivers/hwmon/drivetemp.c
>>>> +++ b/drivers/hwmon/drivetemp.c
>>>> @@ -285,6 +285,36 @@ static int drivetemp_get_scttemp(struct drivetemp_data *st, u32 attr, long *val)
>>>>  	return err;
>>>>  }
>>>>  
>>>> +static const char * const sct_avoid_models[] = {
>>>> +/*
>>>> + * These drives will have WRITE FPDMA QUEUED command timeouts and sometimes just
>>>> + * freeze until power-cycled under heavy write loads when their temperature is
>>>> + * getting polled in SCT mode. The SMART mode seems to be fine, though.
>>>> + *
>>>> + * While only the 3 TB model was actually caught exhibiting the problem
>>>> + * let's play safe here to avoid data corruption and ban the whole family.
>>>> + */
>>>> +	"TOSHIBA DT01ACA0",
>>>> +	"TOSHIBA DT01ACA1",
>>>> +	"TOSHIBA DT01ACA2",
>>>> +	"TOSHIBA DT01ACA3",
>>>> +};
>>>> +
>>>> +static bool drivetemp_sct_avoid(struct drivetemp_data *st)
>>>> +{
>>>> +	struct scsi_device *sdev = st->sdev;
>>>> +	unsigned int ctr;
>>>> +
>>>> +	if (!sdev->model)
>>>> +		return false;
>>>> +
>>>> +	for (ctr = 0; ctr < ARRAY_SIZE(sct_avoid_models); ctr++)
>>>> +		if (strncmp(sdev->model, sct_avoid_models[ctr], 16) == 0)
>>>
>>> Why strncmp, and why length 16 ? Both strings are, as far as I can see,
>>> 0 terminated. A fixed length only asks for trouble later on as more models
>>> are added to the list.
>>
>> The first 16 bytes of sdev->model contain the model number, the rest
>> seems to be the drive serial number.
>> There is no NULL separator between them.
>>
> 
> If the "16" is based on some SCSI standard, there should be
> a define for it somewhere. 

The "model" field seems to contain just the raw SCSI INQUIRY
"product identification" field, which is space-filled to 16 bytes, but
is not NULL-terminated.

The SCSI layer seems to just open-code the number 16 everywhere, can't
see any specific define for this.

> Otherwise, the comparison should
> use strlen(sct_avoid_models[ctr]) and explain the reason.
> In the latter case, it might possibly make sense to match
> "TOSHIBA DT01ACA" to also catch later models.

Will use strlen() for prefix matching and match on the
"TOSHIBA DT01ACA" prefix then.

> Thanks,
> Guenter

Thanks,
Maciej
Guenter Roeck July 14, 2020, 7:39 p.m. UTC | #5
On 7/14/20 10:47 AM, Maciej S. Szmigiero wrote:
> On 14.07.2020 19:14, Guenter Roeck wrote:
>> On 7/14/20 7:11 AM, Maciej S. Szmigiero wrote:
>>> On 14.07.2020 16:02, Guenter Roeck wrote:
>>>> On 7/11/20 1:41 PM, Maciej S. Szmigiero wrote:
>>>>> It has been observed that Toshiba DT01ACA family drives have
>>>>> WRITE FPDMA QUEUED command timeouts and sometimes just freeze until
>>>>> power-cycled under heavy write loads when their temperature is getting
>>>>> polled in SCT mode. The SMART mode seems to be fine, though.
>>>>>
>>>>> Let's make sure we don't use SCT mode for these drives then.
>>>>>
>>>>> While only the 3 TB model was actually caught exhibiting the problem let's
>>>>> play safe here to avoid data corruption and extend the ban to the whole
>>>>> family.
>>>>>
>>>>> Fixes: 5b46903d8bf3 ("hwmon: Driver for disk and solid state drives with temperature sensors")
>>>>> Cc: stable@vger.kernel.org
>>>>> Signed-off-by: Maciej S. Szmigiero <mail@maciej.szmigiero.name>
>>>>> ---
>>>>>
>>>>> Notes:
>>>>>     This behavior was observed on two different DT01ACA3 drives.
>>>>>     
>>>>>     Usually, a series of queued WRITE FPDMA QUEUED commands just time out,
>>>>>     but sometimes the whole drive freezes. Merely disconnecting and
>>>>>     reconnecting SATA interface cable then does not unfreeze the drive.
>>>>>     
>>>>>     One has to disconnect and reconnect the drive power connector for the
>>>>>     drive to be detected again (suggesting the drive firmware itself has
>>>>>     crashed).
>>>>>     
>>>>>     This only happens when the drive temperature is polled very often (like
>>>>>     every second), so occasional SCT usage via smartmontools is probably
>>>>>     safe.
>>>>>     
>>>>>     Changes from v1:
>>>>>     'SCT blacklist' -> 'SCT avoid models'
>>>>>
>>>>>  drivers/hwmon/drivetemp.c | 37 +++++++++++++++++++++++++++++++++++++
>>>>>  1 file changed, 37 insertions(+)
>>>>>
>>>>> diff --git a/drivers/hwmon/drivetemp.c b/drivers/hwmon/drivetemp.c
>>>>> index 0d4f3d97ffc6..da9cfcbecc96 100644
>>>>> --- a/drivers/hwmon/drivetemp.c
>>>>> +++ b/drivers/hwmon/drivetemp.c
>>>>> @@ -285,6 +285,36 @@ static int drivetemp_get_scttemp(struct drivetemp_data *st, u32 attr, long *val)
>>>>>  	return err;
>>>>>  }
>>>>>  
>>>>> +static const char * const sct_avoid_models[] = {
>>>>> +/*
>>>>> + * These drives will have WRITE FPDMA QUEUED command timeouts and sometimes just
>>>>> + * freeze until power-cycled under heavy write loads when their temperature is
>>>>> + * getting polled in SCT mode. The SMART mode seems to be fine, though.
>>>>> + *
>>>>> + * While only the 3 TB model was actually caught exhibiting the problem
>>>>> + * let's play safe here to avoid data corruption and ban the whole family.
>>>>> + */
>>>>> +	"TOSHIBA DT01ACA0",
>>>>> +	"TOSHIBA DT01ACA1",
>>>>> +	"TOSHIBA DT01ACA2",
>>>>> +	"TOSHIBA DT01ACA3",
>>>>> +};
>>>>> +
>>>>> +static bool drivetemp_sct_avoid(struct drivetemp_data *st)
>>>>> +{
>>>>> +	struct scsi_device *sdev = st->sdev;
>>>>> +	unsigned int ctr;
>>>>> +
>>>>> +	if (!sdev->model)
>>>>> +		return false;
>>>>> +
>>>>> +	for (ctr = 0; ctr < ARRAY_SIZE(sct_avoid_models); ctr++)
>>>>> +		if (strncmp(sdev->model, sct_avoid_models[ctr], 16) == 0)
>>>>
>>>> Why strncmp, and why length 16 ? Both strings are, as far as I can see,
>>>> 0 terminated. A fixed length only asks for trouble later on as more models
>>>> are added to the list.
>>>
>>> The first 16 bytes of sdev->model contain the model number, the rest
>>> seems to be the drive serial number.
>>> There is no NULL separator between them.
>>>
>>
>> If the "16" is based on some SCSI standard, there should be
>> a define for it somewhere. 
> 
> The "model" field seems to contain just the raw SCSI INQUIRY
> "product identification" field, which is space-filled to 16 bytes, but
> is not NULL-terminated.
> 
> The SCSI layer seems to just open-code the number 16 everywhere, can't
> see any specific define for this.
> 
>> Otherwise, the comparison should
>> use strlen(sct_avoid_models[ctr]) and explain the reason.
>> In the latter case, it might possibly make sense to match
>> "TOSHIBA DT01ACA" to also catch later models.
> 
> Will use strlen() for prefix matching and match on the
> "TOSHIBA DT01ACA" prefix then.
> 
sgtm

Thanks,
Guenter
diff mbox series

Patch

diff --git a/drivers/hwmon/drivetemp.c b/drivers/hwmon/drivetemp.c
index 0d4f3d97ffc6..da9cfcbecc96 100644
--- a/drivers/hwmon/drivetemp.c
+++ b/drivers/hwmon/drivetemp.c
@@ -285,6 +285,36 @@  static int drivetemp_get_scttemp(struct drivetemp_data *st, u32 attr, long *val)
 	return err;
 }
 
+static const char * const sct_avoid_models[] = {
+/*
+ * These drives will have WRITE FPDMA QUEUED command timeouts and sometimes just
+ * freeze until power-cycled under heavy write loads when their temperature is
+ * getting polled in SCT mode. The SMART mode seems to be fine, though.
+ *
+ * While only the 3 TB model was actually caught exhibiting the problem
+ * let's play safe here to avoid data corruption and ban the whole family.
+ */
+	"TOSHIBA DT01ACA0",
+	"TOSHIBA DT01ACA1",
+	"TOSHIBA DT01ACA2",
+	"TOSHIBA DT01ACA3",
+};
+
+static bool drivetemp_sct_avoid(struct drivetemp_data *st)
+{
+	struct scsi_device *sdev = st->sdev;
+	unsigned int ctr;
+
+	if (!sdev->model)
+		return false;
+
+	for (ctr = 0; ctr < ARRAY_SIZE(sct_avoid_models); ctr++)
+		if (strncmp(sdev->model, sct_avoid_models[ctr], 16) == 0)
+			return true;
+
+	return false;
+}
+
 static int drivetemp_identify_sata(struct drivetemp_data *st)
 {
 	struct scsi_device *sdev = st->sdev;
@@ -326,6 +356,13 @@  static int drivetemp_identify_sata(struct drivetemp_data *st)
 	/* bail out if this is not a SATA device */
 	if (!is_ata || !is_sata)
 		return -ENODEV;
+
+	if (have_sct && drivetemp_sct_avoid(st)) {
+		dev_notice(&sdev->sdev_gendev,
+			   "will avoid using SCT for temperature monitoring\n");
+		have_sct = false;
+	}
+
 	if (!have_sct)
 		goto skip_sct;