block: don't do revalidate zones on invalid devices

Message ID	20200730112517.12816-1-johannes.thumshirn@wdc.com (mailing list archive)
State	New, archived
Headers	show Return-Path: <SRS0=0pY0=BJ=vger.kernel.org=linux-block-owner@kernel.org> IronPort-SDR: aqukEQ1XyKb1m5IjvqDxyOzaNEYRhPks5HMi55WT9j1mUG5HHlpqAqKj6CVqfC4yAk5R7NzSES rD3zr+oyGgitjuUF8GkTGjRJgHFvHWQ8ULVLDfcT/TlCG5832atcYiLBb9MJpEHHRQ4wUgbdcU 2D6Pp5vwqH4IjfqEJmK0/ywG+uoIH8gbcaRoCxddpMZA4y+bfGAvUOxUXT4i6zVOe0m4dguWt6 YjaA7a9s2cyqtRZB+Wq+g/W45GQdUazdyNyusDiPdw8CmZBhf2Zfrhv5mcSEM7357hxIDJ/EBA PqI= IronPort-SDR: 79gEIQYYMU/+ZWqinC2NEAX9plz+g6vcOyBLCkvrLcG56ecXZN8/vkuMf3iLv71CSoqeAAuJRg WUucNfNZ0P0vWkQkz4+BXJB07XBZjsa1Y= IronPort-SDR: HhBEa3CaAKUeKe6KZaa4M96eOXCKn88ZysFezYuiGkQblqIRUnQvvCJwqcEB2Tv5rUdNslBP/d TAMAwWW+FbJw== WDCIronportException: Internal From: Johannes Thumshirn <johannes.thumshirn@wdc.com> To: Jens Axboe <axboe@kernel.dk> Cc: linux-block@vger.kernel.org, Damien Le Moal <Damien.LeMoal@wdc.com>, Johannes Thumshirn <johannes.thumshirn@wdc.com> Subject: [PATCH] block: don't do revalidate zones on invalid devices Date: Thu, 30 Jul 2020 20:25:17 +0900 Message-Id: <20200730112517.12816-1-johannes.thumshirn@wdc.com> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit Sender: linux-block-owner@vger.kernel.org Precedence: bulk
Series	block: don't do revalidate zones on invalid devices \| expand block: don't do revalidate zones on invalid devices

Johannes Thumshirn July 30, 2020, 11:25 a.m. UTC

When we loose a device for whatever reason while (re)scanning zones, we
trip over a NULL pointer in blk_revalidate_zone_cb, like in the following
log:

sd 0:0:0:0: [sda] 3418095616 4096-byte logical blocks: (14.0 TB/12.7 TiB)
sd 0:0:0:0: [sda] 52156 zones of 65536 logical blocks
sd 0:0:0:0: [sda] Write Protect is off
sd 0:0:0:0: [sda] Mode Sense: 37 00 00 08
sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
sd 0:0:0:0: [sda] REPORT ZONES start lba 1065287680 failed
sd 0:0:0:0: [sda] REPORT ZONES: Result: hostbyte=0x00 driverbyte=0x08
sd 0:0:0:0: [sda] Sense Key : 0xb [current]
sd 0:0:0:0: [sda] ASC=0x0 ASCQ=0x6
sda: failed to revalidate zones
sd 0:0:0:0: [sda] 0 4096-byte logical blocks: (0 B/0 B)
sda: detected capacity change from 14000519643136 to 0
==================================================================
BUG: KASAN: null-ptr-deref in blk_revalidate_zone_cb+0x1b7/0x550
Write of size 8 at addr 0000000000000010 by task kworker/u4:1/58

CPU: 1 PID: 58 Comm: kworker/u4:1 Not tainted 5.8.0-rc1 #692
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-0-gf21b5a4-rebuilt.opensuse.org 04/01/2014
Workqueue: events_unbound async_run_entry_fn
Call Trace:
 dump_stack+0x7d/0xb0
 ? blk_revalidate_zone_cb+0x1b7/0x550
 kasan_report.cold+0x5/0x37
 ? blk_revalidate_zone_cb+0x1b7/0x550
 check_memory_region+0x145/0x1a0
 blk_revalidate_zone_cb+0x1b7/0x550
 sd_zbc_parse_report+0x1f1/0x370
 ? blk_req_zone_write_trylock+0x200/0x200
 ? sectors_to_logical+0x60/0x60
 ? blk_req_zone_write_trylock+0x200/0x200
 ? blk_req_zone_write_trylock+0x200/0x200
 sd_zbc_report_zones+0x3c4/0x5e0
 ? sd_dif_config_host+0x500/0x500
 blk_revalidate_disk_zones+0x231/0x44d
 ? _raw_write_lock_irqsave+0xb0/0xb0
 ? blk_queue_free_zone_bitmaps+0xd0/0xd0
 sd_zbc_read_zones+0x8cf/0x11a0
 sd_revalidate_disk+0x305c/0x64e0
 ? __device_add_disk+0x776/0xf20
 ? read_capacity_16.part.0+0x1080/0x1080
 ? blk_alloc_devt+0x250/0x250
 ? create_object.isra.0+0x595/0xa20
 ? kasan_unpoison_shadow+0x33/0x40
 sd_probe+0x8dc/0xcd2
 really_probe+0x20e/0xaf0
 __driver_attach_async_helper+0x249/0x2d0
 async_run_entry_fn+0xbe/0x560
 process_one_work+0x764/0x1290
 ? _raw_read_unlock_irqrestore+0x30/0x30
 worker_thread+0x598/0x12f0
 ? __kthread_parkme+0xc6/0x1b0
 ? schedule+0xed/0x2c0
 ? process_one_work+0x1290/0x1290
 kthread+0x36b/0x440
 ? kthread_create_worker_on_cpu+0xa0/0xa0
 ret_from_fork+0x22/0x30
==================================================================

When the device is already gone we end up with the following scenario:
The device's capacity is 0 and thus the number of zones will be 0 as well. When
allocating the bitmap for the conventional zones, we then trip over a NULL
pointer.

So if we encounter a zoned block device with a 0 capacity, don't dare to
revalidate the zones sizes.

Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
---

Note: This is a hot-fix for 5.8, we're working on something to make a
recoverable error recoverable.


 block/blk-zoned.c | 3 +++
 1 file changed, 3 insertions(+)

Damien Le Moal July 30, 2020, 12:32 p.m. UTC | #1

On 2020/07/30 20:25, Johannes Thumshirn wrote:
> When we loose a device for whatever reason while (re)scanning zones, we
> trip over a NULL pointer in blk_revalidate_zone_cb, like in the following
> log:
> 
> sd 0:0:0:0: [sda] 3418095616 4096-byte logical blocks: (14.0 TB/12.7 TiB)
> sd 0:0:0:0: [sda] 52156 zones of 65536 logical blocks
> sd 0:0:0:0: [sda] Write Protect is off
> sd 0:0:0:0: [sda] Mode Sense: 37 00 00 08
> sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
> sd 0:0:0:0: [sda] REPORT ZONES start lba 1065287680 failed
> sd 0:0:0:0: [sda] REPORT ZONES: Result: hostbyte=0x00 driverbyte=0x08
> sd 0:0:0:0: [sda] Sense Key : 0xb [current]
> sd 0:0:0:0: [sda] ASC=0x0 ASCQ=0x6
> sda: failed to revalidate zones
> sd 0:0:0:0: [sda] 0 4096-byte logical blocks: (0 B/0 B)
> sda: detected capacity change from 14000519643136 to 0
> ==================================================================
> BUG: KASAN: null-ptr-deref in blk_revalidate_zone_cb+0x1b7/0x550
> Write of size 8 at addr 0000000000000010 by task kworker/u4:1/58
> 
> CPU: 1 PID: 58 Comm: kworker/u4:1 Not tainted 5.8.0-rc1 #692
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-0-gf21b5a4-rebuilt.opensuse.org 04/01/2014
> Workqueue: events_unbound async_run_entry_fn
> Call Trace:
>  dump_stack+0x7d/0xb0
>  ? blk_revalidate_zone_cb+0x1b7/0x550
>  kasan_report.cold+0x5/0x37
>  ? blk_revalidate_zone_cb+0x1b7/0x550
>  check_memory_region+0x145/0x1a0
>  blk_revalidate_zone_cb+0x1b7/0x550
>  sd_zbc_parse_report+0x1f1/0x370
>  ? blk_req_zone_write_trylock+0x200/0x200
>  ? sectors_to_logical+0x60/0x60
>  ? blk_req_zone_write_trylock+0x200/0x200
>  ? blk_req_zone_write_trylock+0x200/0x200
>  sd_zbc_report_zones+0x3c4/0x5e0
>  ? sd_dif_config_host+0x500/0x500
>  blk_revalidate_disk_zones+0x231/0x44d
>  ? _raw_write_lock_irqsave+0xb0/0xb0
>  ? blk_queue_free_zone_bitmaps+0xd0/0xd0
>  sd_zbc_read_zones+0x8cf/0x11a0
>  sd_revalidate_disk+0x305c/0x64e0
>  ? __device_add_disk+0x776/0xf20
>  ? read_capacity_16.part.0+0x1080/0x1080
>  ? blk_alloc_devt+0x250/0x250
>  ? create_object.isra.0+0x595/0xa20
>  ? kasan_unpoison_shadow+0x33/0x40
>  sd_probe+0x8dc/0xcd2
>  really_probe+0x20e/0xaf0
>  __driver_attach_async_helper+0x249/0x2d0
>  async_run_entry_fn+0xbe/0x560
>  process_one_work+0x764/0x1290
>  ? _raw_read_unlock_irqrestore+0x30/0x30
>  worker_thread+0x598/0x12f0
>  ? __kthread_parkme+0xc6/0x1b0
>  ? schedule+0xed/0x2c0
>  ? process_one_work+0x1290/0x1290
>  kthread+0x36b/0x440
>  ? kthread_create_worker_on_cpu+0xa0/0xa0
>  ret_from_fork+0x22/0x30
> ==================================================================
> 
> When the device is already gone we end up with the following scenario:
> The device's capacity is 0 and thus the number of zones will be 0 as well. When
> allocating the bitmap for the conventional zones, we then trip over a NULL
> pointer.
> 
> So if we encounter a zoned block device with a 0 capacity, don't dare to
> revalidate the zones sizes.
> 
> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
> ---
> 
> Note: This is a hot-fix for 5.8, we're working on something to make a
> recoverable error recoverable.
> 
> 
>  block/blk-zoned.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/block/blk-zoned.c b/block/blk-zoned.c
> index 23831fa8701d..480dfff69a00 100644
> --- a/block/blk-zoned.c
> +++ b/block/blk-zoned.c
> @@ -497,6 +497,9 @@ int blk_revalidate_disk_zones(struct gendisk *disk,
>  	if (WARN_ON_ONCE(!queue_is_mq(q)))
>  		return -EIO;
>  
> +	if (!get_capacity(disk))
> +		return -EIO;
> +
>  	/*
>  	 * Ensure that all memory allocations in this context are done as if
>  	 * GFP_NOIO was specified.
> 

I reworked sd_zbc_read_zones() and sd_zbc_revalidate_zones() to allow recovering
from simple temporary errors and avoid this problem. Will send the patch
tomorrow or so after some more testing.

But even with that patch applied, I think this patch makes the generic block
code more solid. So:

Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>

Johannes Thumshirn July 31, 2020, 7:43 a.m. UTC | #2

On 30/07/2020 14:33, Damien Le Moal wrote:
> On 2020/07/30 20:25, Johannes Thumshirn wrote:
>> When we loose a device for whatever reason while (re)scanning zones, we
>> trip over a NULL pointer in blk_revalidate_zone_cb, like in the following
>> log:
>>
>> sd 0:0:0:0: [sda] 3418095616 4096-byte logical blocks: (14.0 TB/12.7 TiB)
>> sd 0:0:0:0: [sda] 52156 zones of 65536 logical blocks
>> sd 0:0:0:0: [sda] Write Protect is off
>> sd 0:0:0:0: [sda] Mode Sense: 37 00 00 08
>> sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
>> sd 0:0:0:0: [sda] REPORT ZONES start lba 1065287680 failed
>> sd 0:0:0:0: [sda] REPORT ZONES: Result: hostbyte=0x00 driverbyte=0x08
>> sd 0:0:0:0: [sda] Sense Key : 0xb [current]
>> sd 0:0:0:0: [sda] ASC=0x0 ASCQ=0x6
>> sda: failed to revalidate zones
>> sd 0:0:0:0: [sda] 0 4096-byte logical blocks: (0 B/0 B)
>> sda: detected capacity change from 14000519643136 to 0
>> ==================================================================
>> BUG: KASAN: null-ptr-deref in blk_revalidate_zone_cb+0x1b7/0x550
>> Write of size 8 at addr 0000000000000010 by task kworker/u4:1/58
>>
>> CPU: 1 PID: 58 Comm: kworker/u4:1 Not tainted 5.8.0-rc1 #692
>> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-0-gf21b5a4-rebuilt.opensuse.org 04/01/2014
>> Workqueue: events_unbound async_run_entry_fn
>> Call Trace:
>>  dump_stack+0x7d/0xb0
>>  ? blk_revalidate_zone_cb+0x1b7/0x550
>>  kasan_report.cold+0x5/0x37
>>  ? blk_revalidate_zone_cb+0x1b7/0x550
>>  check_memory_region+0x145/0x1a0
>>  blk_revalidate_zone_cb+0x1b7/0x550
>>  sd_zbc_parse_report+0x1f1/0x370
>>  ? blk_req_zone_write_trylock+0x200/0x200
>>  ? sectors_to_logical+0x60/0x60
>>  ? blk_req_zone_write_trylock+0x200/0x200
>>  ? blk_req_zone_write_trylock+0x200/0x200
>>  sd_zbc_report_zones+0x3c4/0x5e0
>>  ? sd_dif_config_host+0x500/0x500
>>  blk_revalidate_disk_zones+0x231/0x44d
>>  ? _raw_write_lock_irqsave+0xb0/0xb0
>>  ? blk_queue_free_zone_bitmaps+0xd0/0xd0
>>  sd_zbc_read_zones+0x8cf/0x11a0
>>  sd_revalidate_disk+0x305c/0x64e0
>>  ? __device_add_disk+0x776/0xf20
>>  ? read_capacity_16.part.0+0x1080/0x1080
>>  ? blk_alloc_devt+0x250/0x250
>>  ? create_object.isra.0+0x595/0xa20
>>  ? kasan_unpoison_shadow+0x33/0x40
>>  sd_probe+0x8dc/0xcd2
>>  really_probe+0x20e/0xaf0
>>  __driver_attach_async_helper+0x249/0x2d0
>>  async_run_entry_fn+0xbe/0x560
>>  process_one_work+0x764/0x1290
>>  ? _raw_read_unlock_irqrestore+0x30/0x30
>>  worker_thread+0x598/0x12f0
>>  ? __kthread_parkme+0xc6/0x1b0
>>  ? schedule+0xed/0x2c0
>>  ? process_one_work+0x1290/0x1290
>>  kthread+0x36b/0x440
>>  ? kthread_create_worker_on_cpu+0xa0/0xa0
>>  ret_from_fork+0x22/0x30
>> ==================================================================
>>
>> When the device is already gone we end up with the following scenario:
>> The device's capacity is 0 and thus the number of zones will be 0 as well. When
>> allocating the bitmap for the conventional zones, we then trip over a NULL
>> pointer.
>>
>> So if we encounter a zoned block device with a 0 capacity, don't dare to
>> revalidate the zones sizes.
>>
>> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
>> ---
>>
>> Note: This is a hot-fix for 5.8, we're working on something to make a
>> recoverable error recoverable.
>>
>>
>>  block/blk-zoned.c | 3 +++
>>  1 file changed, 3 insertions(+)
>>
>> diff --git a/block/blk-zoned.c b/block/blk-zoned.c
>> index 23831fa8701d..480dfff69a00 100644
>> --- a/block/blk-zoned.c
>> +++ b/block/blk-zoned.c
>> @@ -497,6 +497,9 @@ int blk_revalidate_disk_zones(struct gendisk *disk,
>>  	if (WARN_ON_ONCE(!queue_is_mq(q)))
>>  		return -EIO;
>>  
>> +	if (!get_capacity(disk))
>> +		return -EIO;
>> +
>>  	/*
>>  	 * Ensure that all memory allocations in this context are done as if
>>  	 * GFP_NOIO was specified.
>>
> 
> I reworked sd_zbc_read_zones() and sd_zbc_revalidate_zones() to allow recovering
> from simple temporary errors and avoid this problem. Will send the patch
> tomorrow or so after some more testing.
> 
> But even with that patch applied, I think this patch makes the generic block
> code more solid. So:
> 
> Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>
> 
> 

Jens any chance we can still get this into 5.8?

Damien Le Moal July 31, 2020, 8:01 a.m. UTC | #3

On 2020/07/31 16:43, Johannes Thumshirn wrote:
> On 30/07/2020 14:33, Damien Le Moal wrote:
>> On 2020/07/30 20:25, Johannes Thumshirn wrote:
>>> When we loose a device for whatever reason while (re)scanning zones, we
>>> trip over a NULL pointer in blk_revalidate_zone_cb, like in the following
>>> log:
>>>
>>> sd 0:0:0:0: [sda] 3418095616 4096-byte logical blocks: (14.0 TB/12.7 TiB)
>>> sd 0:0:0:0: [sda] 52156 zones of 65536 logical blocks
>>> sd 0:0:0:0: [sda] Write Protect is off
>>> sd 0:0:0:0: [sda] Mode Sense: 37 00 00 08
>>> sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
>>> sd 0:0:0:0: [sda] REPORT ZONES start lba 1065287680 failed
>>> sd 0:0:0:0: [sda] REPORT ZONES: Result: hostbyte=0x00 driverbyte=0x08
>>> sd 0:0:0:0: [sda] Sense Key : 0xb [current]
>>> sd 0:0:0:0: [sda] ASC=0x0 ASCQ=0x6
>>> sda: failed to revalidate zones
>>> sd 0:0:0:0: [sda] 0 4096-byte logical blocks: (0 B/0 B)
>>> sda: detected capacity change from 14000519643136 to 0
>>> ==================================================================
>>> BUG: KASAN: null-ptr-deref in blk_revalidate_zone_cb+0x1b7/0x550
>>> Write of size 8 at addr 0000000000000010 by task kworker/u4:1/58
>>>
>>> CPU: 1 PID: 58 Comm: kworker/u4:1 Not tainted 5.8.0-rc1 #692
>>> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-0-gf21b5a4-rebuilt.opensuse.org 04/01/2014
>>> Workqueue: events_unbound async_run_entry_fn
>>> Call Trace:
>>>  dump_stack+0x7d/0xb0
>>>  ? blk_revalidate_zone_cb+0x1b7/0x550
>>>  kasan_report.cold+0x5/0x37
>>>  ? blk_revalidate_zone_cb+0x1b7/0x550
>>>  check_memory_region+0x145/0x1a0
>>>  blk_revalidate_zone_cb+0x1b7/0x550
>>>  sd_zbc_parse_report+0x1f1/0x370
>>>  ? blk_req_zone_write_trylock+0x200/0x200
>>>  ? sectors_to_logical+0x60/0x60
>>>  ? blk_req_zone_write_trylock+0x200/0x200
>>>  ? blk_req_zone_write_trylock+0x200/0x200
>>>  sd_zbc_report_zones+0x3c4/0x5e0
>>>  ? sd_dif_config_host+0x500/0x500
>>>  blk_revalidate_disk_zones+0x231/0x44d
>>>  ? _raw_write_lock_irqsave+0xb0/0xb0
>>>  ? blk_queue_free_zone_bitmaps+0xd0/0xd0
>>>  sd_zbc_read_zones+0x8cf/0x11a0
>>>  sd_revalidate_disk+0x305c/0x64e0
>>>  ? __device_add_disk+0x776/0xf20
>>>  ? read_capacity_16.part.0+0x1080/0x1080
>>>  ? blk_alloc_devt+0x250/0x250
>>>  ? create_object.isra.0+0x595/0xa20
>>>  ? kasan_unpoison_shadow+0x33/0x40
>>>  sd_probe+0x8dc/0xcd2
>>>  really_probe+0x20e/0xaf0
>>>  __driver_attach_async_helper+0x249/0x2d0
>>>  async_run_entry_fn+0xbe/0x560
>>>  process_one_work+0x764/0x1290
>>>  ? _raw_read_unlock_irqrestore+0x30/0x30
>>>  worker_thread+0x598/0x12f0
>>>  ? __kthread_parkme+0xc6/0x1b0
>>>  ? schedule+0xed/0x2c0
>>>  ? process_one_work+0x1290/0x1290
>>>  kthread+0x36b/0x440
>>>  ? kthread_create_worker_on_cpu+0xa0/0xa0
>>>  ret_from_fork+0x22/0x30
>>> ==================================================================
>>>
>>> When the device is already gone we end up with the following scenario:
>>> The device's capacity is 0 and thus the number of zones will be 0 as well. When
>>> allocating the bitmap for the conventional zones, we then trip over a NULL
>>> pointer.
>>>
>>> So if we encounter a zoned block device with a 0 capacity, don't dare to
>>> revalidate the zones sizes.
>>>
>>> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
>>> ---
>>>
>>> Note: This is a hot-fix for 5.8, we're working on something to make a
>>> recoverable error recoverable.
>>>
>>>
>>>  block/blk-zoned.c | 3 +++
>>>  1 file changed, 3 insertions(+)
>>>
>>> diff --git a/block/blk-zoned.c b/block/blk-zoned.c
>>> index 23831fa8701d..480dfff69a00 100644
>>> --- a/block/blk-zoned.c
>>> +++ b/block/blk-zoned.c
>>> @@ -497,6 +497,9 @@ int blk_revalidate_disk_zones(struct gendisk *disk,
>>>  	if (WARN_ON_ONCE(!queue_is_mq(q)))
>>>  		return -EIO;
>>>  
>>> +	if (!get_capacity(disk))
>>> +		return -EIO;
>>> +
>>>  	/*
>>>  	 * Ensure that all memory allocations in this context are done as if
>>>  	 * GFP_NOIO was specified.
>>>
>>
>> I reworked sd_zbc_read_zones() and sd_zbc_revalidate_zones() to allow recovering
>> from simple temporary errors and avoid this problem. Will send the patch
>> tomorrow or so after some more testing.
>>
>> But even with that patch applied, I think this patch makes the generic block
>> code more solid. So:
>>
>> Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>
>>
>>
> 
> Jens any chance we can still get this into 5.8?

By the way, this needs a "fixes" tag too. And probably cc stable for 5.7.
Looking at 5.4 LTS, the bug is not present since there is a test on !nr_zones
and the entire revalidation is different anyway (callback was introduced in 5.5
if I remember correctly).

Johannes Thumshirn July 31, 2020, 8:41 a.m. UTC | #4

On 31/07/2020 10:01, Damien Le Moal wrote:
> On 2020/07/31 16:43, Johannes Thumshirn wrote:
>> On 30/07/2020 14:33, Damien Le Moal wrote:
>>> On 2020/07/30 20:25, Johannes Thumshirn wrote:
>>>> When we loose a device for whatever reason while (re)scanning zones, we
>>>> trip over a NULL pointer in blk_revalidate_zone_cb, like in the following
>>>> log:
>>>>
>>>> sd 0:0:0:0: [sda] 3418095616 4096-byte logical blocks: (14.0 TB/12.7 TiB)
>>>> sd 0:0:0:0: [sda] 52156 zones of 65536 logical blocks
>>>> sd 0:0:0:0: [sda] Write Protect is off
>>>> sd 0:0:0:0: [sda] Mode Sense: 37 00 00 08
>>>> sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
>>>> sd 0:0:0:0: [sda] REPORT ZONES start lba 1065287680 failed
>>>> sd 0:0:0:0: [sda] REPORT ZONES: Result: hostbyte=0x00 driverbyte=0x08
>>>> sd 0:0:0:0: [sda] Sense Key : 0xb [current]
>>>> sd 0:0:0:0: [sda] ASC=0x0 ASCQ=0x6
>>>> sda: failed to revalidate zones
>>>> sd 0:0:0:0: [sda] 0 4096-byte logical blocks: (0 B/0 B)
>>>> sda: detected capacity change from 14000519643136 to 0
>>>> ==================================================================
>>>> BUG: KASAN: null-ptr-deref in blk_revalidate_zone_cb+0x1b7/0x550
>>>> Write of size 8 at addr 0000000000000010 by task kworker/u4:1/58
>>>>
>>>> CPU: 1 PID: 58 Comm: kworker/u4:1 Not tainted 5.8.0-rc1 #692
>>>> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-0-gf21b5a4-rebuilt.opensuse.org 04/01/2014
>>>> Workqueue: events_unbound async_run_entry_fn
>>>> Call Trace:
>>>>  dump_stack+0x7d/0xb0
>>>>  ? blk_revalidate_zone_cb+0x1b7/0x550
>>>>  kasan_report.cold+0x5/0x37
>>>>  ? blk_revalidate_zone_cb+0x1b7/0x550
>>>>  check_memory_region+0x145/0x1a0
>>>>  blk_revalidate_zone_cb+0x1b7/0x550
>>>>  sd_zbc_parse_report+0x1f1/0x370
>>>>  ? blk_req_zone_write_trylock+0x200/0x200
>>>>  ? sectors_to_logical+0x60/0x60
>>>>  ? blk_req_zone_write_trylock+0x200/0x200
>>>>  ? blk_req_zone_write_trylock+0x200/0x200
>>>>  sd_zbc_report_zones+0x3c4/0x5e0
>>>>  ? sd_dif_config_host+0x500/0x500
>>>>  blk_revalidate_disk_zones+0x231/0x44d
>>>>  ? _raw_write_lock_irqsave+0xb0/0xb0
>>>>  ? blk_queue_free_zone_bitmaps+0xd0/0xd0
>>>>  sd_zbc_read_zones+0x8cf/0x11a0
>>>>  sd_revalidate_disk+0x305c/0x64e0
>>>>  ? __device_add_disk+0x776/0xf20
>>>>  ? read_capacity_16.part.0+0x1080/0x1080
>>>>  ? blk_alloc_devt+0x250/0x250
>>>>  ? create_object.isra.0+0x595/0xa20
>>>>  ? kasan_unpoison_shadow+0x33/0x40
>>>>  sd_probe+0x8dc/0xcd2
>>>>  really_probe+0x20e/0xaf0
>>>>  __driver_attach_async_helper+0x249/0x2d0
>>>>  async_run_entry_fn+0xbe/0x560
>>>>  process_one_work+0x764/0x1290
>>>>  ? _raw_read_unlock_irqrestore+0x30/0x30
>>>>  worker_thread+0x598/0x12f0
>>>>  ? __kthread_parkme+0xc6/0x1b0
>>>>  ? schedule+0xed/0x2c0
>>>>  ? process_one_work+0x1290/0x1290
>>>>  kthread+0x36b/0x440
>>>>  ? kthread_create_worker_on_cpu+0xa0/0xa0
>>>>  ret_from_fork+0x22/0x30
>>>> ==================================================================
>>>>
>>>> When the device is already gone we end up with the following scenario:
>>>> The device's capacity is 0 and thus the number of zones will be 0 as well. When
>>>> allocating the bitmap for the conventional zones, we then trip over a NULL
>>>> pointer.
>>>>
>>>> So if we encounter a zoned block device with a 0 capacity, don't dare to
>>>> revalidate the zones sizes.
>>>>
>>>> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
>>>> ---
>>>>
>>>> Note: This is a hot-fix for 5.8, we're working on something to make a
>>>> recoverable error recoverable.
>>>>
>>>>
>>>>  block/blk-zoned.c | 3 +++
>>>>  1 file changed, 3 insertions(+)
>>>>
>>>> diff --git a/block/blk-zoned.c b/block/blk-zoned.c
>>>> index 23831fa8701d..480dfff69a00 100644
>>>> --- a/block/blk-zoned.c
>>>> +++ b/block/blk-zoned.c
>>>> @@ -497,6 +497,9 @@ int blk_revalidate_disk_zones(struct gendisk *disk,
>>>>  	if (WARN_ON_ONCE(!queue_is_mq(q)))
>>>>  		return -EIO;
>>>>  
>>>> +	if (!get_capacity(disk))
>>>> +		return -EIO;
>>>> +
>>>>  	/*
>>>>  	 * Ensure that all memory allocations in this context are done as if
>>>>  	 * GFP_NOIO was specified.
>>>>
>>>
>>> I reworked sd_zbc_read_zones() and sd_zbc_revalidate_zones() to allow recovering
>>> from simple temporary errors and avoid this problem. Will send the patch
>>> tomorrow or so after some more testing.
>>>
>>> But even with that patch applied, I think this patch makes the generic block
>>> code more solid. So:
>>>
>>> Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>
>>>
>>>
>>
>> Jens any chance we can still get this into 5.8?
> 
> By the way, this needs a "fixes" tag too. And probably cc stable for 5.7.
> Looking at 5.4 LTS, the bug is not present since there is a test on !nr_zones
> and the entire revalidation is different anyway (callback was introduced in 5.5
> if I remember correctly).
> 
> 

The callback got introduced with commit d41003513e61 ("block: rework zone reporting") in
v5.5-rc1 but looking at blk_revalidate_zone_cb() at this commit I think passing in a 
gendisk with 0 capacity won't do much harm.

The correct fixes tag will be:

Fixes: 6c6b35491422 ("block: set the zone size in blk_revalidate_disk_zones atomically")

Starting with this commit we're calculating the number of zones based on the disk's capacity
unconditional of what the capacity could be.

But that commit landed in v5.5 as well.

Jens Axboe July 31, 2020, 10:33 p.m. UTC | #5

On 7/31/20 1:43 AM, Johannes Thumshirn wrote:
> On 30/07/2020 14:33, Damien Le Moal wrote:
>> On 2020/07/30 20:25, Johannes Thumshirn wrote:
>>> When we loose a device for whatever reason while (re)scanning zones, we
>>> trip over a NULL pointer in blk_revalidate_zone_cb, like in the following
>>> log:
>>>
>>> sd 0:0:0:0: [sda] 3418095616 4096-byte logical blocks: (14.0 TB/12.7 TiB)
>>> sd 0:0:0:0: [sda] 52156 zones of 65536 logical blocks
>>> sd 0:0:0:0: [sda] Write Protect is off
>>> sd 0:0:0:0: [sda] Mode Sense: 37 00 00 08
>>> sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
>>> sd 0:0:0:0: [sda] REPORT ZONES start lba 1065287680 failed
>>> sd 0:0:0:0: [sda] REPORT ZONES: Result: hostbyte=0x00 driverbyte=0x08
>>> sd 0:0:0:0: [sda] Sense Key : 0xb [current]
>>> sd 0:0:0:0: [sda] ASC=0x0 ASCQ=0x6
>>> sda: failed to revalidate zones
>>> sd 0:0:0:0: [sda] 0 4096-byte logical blocks: (0 B/0 B)
>>> sda: detected capacity change from 14000519643136 to 0
>>> ==================================================================
>>> BUG: KASAN: null-ptr-deref in blk_revalidate_zone_cb+0x1b7/0x550
>>> Write of size 8 at addr 0000000000000010 by task kworker/u4:1/58
>>>
>>> CPU: 1 PID: 58 Comm: kworker/u4:1 Not tainted 5.8.0-rc1 #692
>>> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-0-gf21b5a4-rebuilt.opensuse.org 04/01/2014
>>> Workqueue: events_unbound async_run_entry_fn
>>> Call Trace:
>>>  dump_stack+0x7d/0xb0
>>>  ? blk_revalidate_zone_cb+0x1b7/0x550
>>>  kasan_report.cold+0x5/0x37
>>>  ? blk_revalidate_zone_cb+0x1b7/0x550
>>>  check_memory_region+0x145/0x1a0
>>>  blk_revalidate_zone_cb+0x1b7/0x550
>>>  sd_zbc_parse_report+0x1f1/0x370
>>>  ? blk_req_zone_write_trylock+0x200/0x200
>>>  ? sectors_to_logical+0x60/0x60
>>>  ? blk_req_zone_write_trylock+0x200/0x200
>>>  ? blk_req_zone_write_trylock+0x200/0x200
>>>  sd_zbc_report_zones+0x3c4/0x5e0
>>>  ? sd_dif_config_host+0x500/0x500
>>>  blk_revalidate_disk_zones+0x231/0x44d
>>>  ? _raw_write_lock_irqsave+0xb0/0xb0
>>>  ? blk_queue_free_zone_bitmaps+0xd0/0xd0
>>>  sd_zbc_read_zones+0x8cf/0x11a0
>>>  sd_revalidate_disk+0x305c/0x64e0
>>>  ? __device_add_disk+0x776/0xf20
>>>  ? read_capacity_16.part.0+0x1080/0x1080
>>>  ? blk_alloc_devt+0x250/0x250
>>>  ? create_object.isra.0+0x595/0xa20
>>>  ? kasan_unpoison_shadow+0x33/0x40
>>>  sd_probe+0x8dc/0xcd2
>>>  really_probe+0x20e/0xaf0
>>>  __driver_attach_async_helper+0x249/0x2d0
>>>  async_run_entry_fn+0xbe/0x560
>>>  process_one_work+0x764/0x1290
>>>  ? _raw_read_unlock_irqrestore+0x30/0x30
>>>  worker_thread+0x598/0x12f0
>>>  ? __kthread_parkme+0xc6/0x1b0
>>>  ? schedule+0xed/0x2c0
>>>  ? process_one_work+0x1290/0x1290
>>>  kthread+0x36b/0x440
>>>  ? kthread_create_worker_on_cpu+0xa0/0xa0
>>>  ret_from_fork+0x22/0x30
>>> ==================================================================
>>>
>>> When the device is already gone we end up with the following scenario:
>>> The device's capacity is 0 and thus the number of zones will be 0 as well. When
>>> allocating the bitmap for the conventional zones, we then trip over a NULL
>>> pointer.
>>>
>>> So if we encounter a zoned block device with a 0 capacity, don't dare to
>>> revalidate the zones sizes.
>>>
>>> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
>>> ---
>>>
>>> Note: This is a hot-fix for 5.8, we're working on something to make a
>>> recoverable error recoverable.
>>>
>>>
>>>  block/blk-zoned.c | 3 +++
>>>  1 file changed, 3 insertions(+)
>>>
>>> diff --git a/block/blk-zoned.c b/block/blk-zoned.c
>>> index 23831fa8701d..480dfff69a00 100644
>>> --- a/block/blk-zoned.c
>>> +++ b/block/blk-zoned.c
>>> @@ -497,6 +497,9 @@ int blk_revalidate_disk_zones(struct gendisk *disk,
>>>  	if (WARN_ON_ONCE(!queue_is_mq(q)))
>>>  		return -EIO;
>>>  
>>> +	if (!get_capacity(disk))
>>> +		return -EIO;
>>> +
>>>  	/*
>>>  	 * Ensure that all memory allocations in this context are done as if
>>>  	 * GFP_NOIO was specified.
>>>
>>
>> I reworked sd_zbc_read_zones() and sd_zbc_revalidate_zones() to allow recovering
>> from simple temporary errors and avoid this problem. Will send the patch
>> tomorrow or so after some more testing.
>>
>> But even with that patch applied, I think this patch makes the generic block
>> code more solid. So:
>>
>> Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>
>>
>>
> 
> Jens any chance we can still get this into 5.8?

I'm not going to push this out now, if 5.8 is being cut on Sunday. If we
happen to get an -rc8, then it's not impossible.

But this isn't a regression in this merge window as far as I can tell,
so really shouldn't be critical to get in. Marking it for stable etc and
queueing for 5.9 may be the saner approach, imho.

Johannes Thumshirn Aug. 3, 2020, 8:49 a.m. UTC | #6

On 01/08/2020 00:33, Jens Axboe wrote:
> On 7/31/20 1:43 AM, Johannes Thumshirn wrote:
>> On 30/07/2020 14:33, Damien Le Moal wrote:
>>> On 2020/07/30 20:25, Johannes Thumshirn wrote:
>>>> When we loose a device for whatever reason while (re)scanning zones, we
>>>> trip over a NULL pointer in blk_revalidate_zone_cb, like in the following
>>>> log:
>>>>
>>>> sd 0:0:0:0: [sda] 3418095616 4096-byte logical blocks: (14.0 TB/12.7 TiB)
>>>> sd 0:0:0:0: [sda] 52156 zones of 65536 logical blocks
>>>> sd 0:0:0:0: [sda] Write Protect is off
>>>> sd 0:0:0:0: [sda] Mode Sense: 37 00 00 08
>>>> sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
>>>> sd 0:0:0:0: [sda] REPORT ZONES start lba 1065287680 failed
>>>> sd 0:0:0:0: [sda] REPORT ZONES: Result: hostbyte=0x00 driverbyte=0x08
>>>> sd 0:0:0:0: [sda] Sense Key : 0xb [current]
>>>> sd 0:0:0:0: [sda] ASC=0x0 ASCQ=0x6
>>>> sda: failed to revalidate zones
>>>> sd 0:0:0:0: [sda] 0 4096-byte logical blocks: (0 B/0 B)
>>>> sda: detected capacity change from 14000519643136 to 0
>>>> ==================================================================
>>>> BUG: KASAN: null-ptr-deref in blk_revalidate_zone_cb+0x1b7/0x550
>>>> Write of size 8 at addr 0000000000000010 by task kworker/u4:1/58
>>>>
>>>> CPU: 1 PID: 58 Comm: kworker/u4:1 Not tainted 5.8.0-rc1 #692
>>>> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-0-gf21b5a4-rebuilt.opensuse.org 04/01/2014
>>>> Workqueue: events_unbound async_run_entry_fn
>>>> Call Trace:
>>>>  dump_stack+0x7d/0xb0
>>>>  ? blk_revalidate_zone_cb+0x1b7/0x550
>>>>  kasan_report.cold+0x5/0x37
>>>>  ? blk_revalidate_zone_cb+0x1b7/0x550
>>>>  check_memory_region+0x145/0x1a0
>>>>  blk_revalidate_zone_cb+0x1b7/0x550
>>>>  sd_zbc_parse_report+0x1f1/0x370
>>>>  ? blk_req_zone_write_trylock+0x200/0x200
>>>>  ? sectors_to_logical+0x60/0x60
>>>>  ? blk_req_zone_write_trylock+0x200/0x200
>>>>  ? blk_req_zone_write_trylock+0x200/0x200
>>>>  sd_zbc_report_zones+0x3c4/0x5e0
>>>>  ? sd_dif_config_host+0x500/0x500
>>>>  blk_revalidate_disk_zones+0x231/0x44d
>>>>  ? _raw_write_lock_irqsave+0xb0/0xb0
>>>>  ? blk_queue_free_zone_bitmaps+0xd0/0xd0
>>>>  sd_zbc_read_zones+0x8cf/0x11a0
>>>>  sd_revalidate_disk+0x305c/0x64e0
>>>>  ? __device_add_disk+0x776/0xf20
>>>>  ? read_capacity_16.part.0+0x1080/0x1080
>>>>  ? blk_alloc_devt+0x250/0x250
>>>>  ? create_object.isra.0+0x595/0xa20
>>>>  ? kasan_unpoison_shadow+0x33/0x40
>>>>  sd_probe+0x8dc/0xcd2
>>>>  really_probe+0x20e/0xaf0
>>>>  __driver_attach_async_helper+0x249/0x2d0
>>>>  async_run_entry_fn+0xbe/0x560
>>>>  process_one_work+0x764/0x1290
>>>>  ? _raw_read_unlock_irqrestore+0x30/0x30
>>>>  worker_thread+0x598/0x12f0
>>>>  ? __kthread_parkme+0xc6/0x1b0
>>>>  ? schedule+0xed/0x2c0
>>>>  ? process_one_work+0x1290/0x1290
>>>>  kthread+0x36b/0x440
>>>>  ? kthread_create_worker_on_cpu+0xa0/0xa0
>>>>  ret_from_fork+0x22/0x30
>>>> ==================================================================
>>>>
>>>> When the device is already gone we end up with the following scenario:
>>>> The device's capacity is 0 and thus the number of zones will be 0 as well. When
>>>> allocating the bitmap for the conventional zones, we then trip over a NULL
>>>> pointer.
>>>>
>>>> So if we encounter a zoned block device with a 0 capacity, don't dare to
>>>> revalidate the zones sizes.
>>>>
>>>> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
>>>> ---
>>>>
>>>> Note: This is a hot-fix for 5.8, we're working on something to make a
>>>> recoverable error recoverable.
>>>>
>>>>
>>>>  block/blk-zoned.c | 3 +++
>>>>  1 file changed, 3 insertions(+)
>>>>
>>>> diff --git a/block/blk-zoned.c b/block/blk-zoned.c
>>>> index 23831fa8701d..480dfff69a00 100644
>>>> --- a/block/blk-zoned.c
>>>> +++ b/block/blk-zoned.c
>>>> @@ -497,6 +497,9 @@ int blk_revalidate_disk_zones(struct gendisk *disk,
>>>>  	if (WARN_ON_ONCE(!queue_is_mq(q)))
>>>>  		return -EIO;
>>>>  
>>>> +	if (!get_capacity(disk))
>>>> +		return -EIO;
>>>> +
>>>>  	/*
>>>>  	 * Ensure that all memory allocations in this context are done as if
>>>>  	 * GFP_NOIO was specified.
>>>>
>>>
>>> I reworked sd_zbc_read_zones() and sd_zbc_revalidate_zones() to allow recovering
>>> from simple temporary errors and avoid this problem. Will send the patch
>>> tomorrow or so after some more testing.
>>>
>>> But even with that patch applied, I think this patch makes the generic block
>>> code more solid. So:
>>>
>>> Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>
>>>
>>>
>>
>> Jens any chance we can still get this into 5.8?
> 
> I'm not going to push this out now, if 5.8 is being cut on Sunday. If we
> happen to get an -rc8, then it's not impossible.
> 
> But this isn't a regression in this merge window as far as I can tell,
> so really shouldn't be critical to get in. Marking it for stable etc and
> queueing for 5.9 may be the saner approach, imho.
> 

You're right. When I wrote this I was still under the impression it's a 
regression I did introduce with the zone append series.

Anyway, thanks for pulling in.

block: don't do revalidate zones on invalid devices

Commit Message

Comments

Patch