Message ID | 20250115042910.1149966-4-shinichiro.kawasaki@wdc.com (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | null_blk: improve write failure simulation | expand |
Context | Check | Description |
---|---|---|
shin/vmtest-for-next-PR | success | PR summary |
shin/vmtest-for-next-VM_Test-0 | success | Logs for build-kernel |
On 1/15/25 1:29 PM, Shin'ichiro Kawasaki wrote: > When the badblocks parameter is set for zoned null_blk, zone resource > management does not work correctly. This issue arises because > null_zone_write() modifies the zone resource status and then call > null_process_cmd(), which handles the badblocks parameter. When > badblocks cause IO failures and no IO happens, the zone resource status > should not change. However, it has already changed. And that is correct in general. E.g. if an SMR HDD encounters a bad block while executing a write command, the bad block may endup failing the write command but the write operation was already started, meaning that the zone was at least implicitly open first to start writing. So doing zone management first and then handling the bad blocks (eventually failing the write operation) is the correct order to do things. I commented on the previous version that partially advancing the wp for a write that is failed due to a bad block was incorrect because zone resource management needed to be done. But it seems I was mistaken since you are saying here that zone management is already done before handling bad blocks. So I do not think we need this change. Or is it me who is completely confused ? > > To fix the unexpected change in zone resource status, when writes are > requested for sequential write required zones, handle badblocks not in > null_process_cmd() but in null_zone_write(). Modify null_zone_write() to > call null_handle_badblocks() before changing the zone resource status. > Also, call null_handle_memory_backed() instead of null_process_cmd(). > > Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com> > --- > drivers/block/null_blk/main.c | 11 ++++------- > drivers/block/null_blk/null_blk.h | 5 +++++ > drivers/block/null_blk/zoned.c | 15 ++++++++++++--- > 3 files changed, 21 insertions(+), 10 deletions(-) > > diff --git a/drivers/block/null_blk/main.c b/drivers/block/null_blk/main.c > index 2a060a6ea8c0..87037cb375c9 100644 > --- a/drivers/block/null_blk/main.c > +++ b/drivers/block/null_blk/main.c > @@ -1309,9 +1309,8 @@ static inline blk_status_t null_handle_throttled(struct nullb_cmd *cmd) > return sts; > } > > -static inline blk_status_t null_handle_badblocks(struct nullb_cmd *cmd, > - sector_t sector, > - sector_t nr_sectors) > +blk_status_t null_handle_badblocks(struct nullb_cmd *cmd, sector_t sector, > + sector_t nr_sectors) > { > struct badblocks *bb = &cmd->nq->dev->badblocks; > sector_t first_bad; > @@ -1326,10 +1325,8 @@ static inline blk_status_t null_handle_badblocks(struct nullb_cmd *cmd, > return BLK_STS_IOERR; > } > > -static inline blk_status_t null_handle_memory_backed(struct nullb_cmd *cmd, > - enum req_op op, > - sector_t sector, > - sector_t nr_sectors) > +blk_status_t null_handle_memory_backed(struct nullb_cmd *cmd, enum req_op op, > + sector_t sector, sector_t nr_sectors) > { > struct nullb_device *dev = cmd->nq->dev; > > diff --git a/drivers/block/null_blk/null_blk.h b/drivers/block/null_blk/null_blk.h > index 3c4c07f0418b..ee60f3a88796 100644 > --- a/drivers/block/null_blk/null_blk.h > +++ b/drivers/block/null_blk/null_blk.h > @@ -132,6 +132,11 @@ blk_status_t null_handle_discard(struct nullb_device *dev, sector_t sector, > sector_t nr_sectors); > blk_status_t null_process_cmd(struct nullb_cmd *cmd, enum req_op op, > sector_t sector, unsigned int nr_sectors); > +blk_status_t null_handle_badblocks(struct nullb_cmd *cmd, sector_t sector, > + sector_t nr_sectors); > +blk_status_t null_handle_memory_backed(struct nullb_cmd *cmd, enum req_op op, > + sector_t sector, sector_t nr_sectors); > + > > #ifdef CONFIG_BLK_DEV_ZONED > int null_init_zoned_dev(struct nullb_device *dev, struct queue_limits *lim); > diff --git a/drivers/block/null_blk/zoned.c b/drivers/block/null_blk/zoned.c > index 0d5f9bf95229..09dae8d018aa 100644 > --- a/drivers/block/null_blk/zoned.c > +++ b/drivers/block/null_blk/zoned.c > @@ -389,6 +389,12 @@ static blk_status_t null_zone_write(struct nullb_cmd *cmd, sector_t sector, > goto unlock_zone; > } > > + if (dev->badblocks.shift != -1) { > + ret = null_handle_badblocks(cmd, sector, nr_sectors); > + if (ret != BLK_STS_OK) > + goto unlock_zone; > + } > + > if (zone->cond == BLK_ZONE_COND_CLOSED || > zone->cond == BLK_ZONE_COND_EMPTY) { > if (dev->need_zone_res_mgmt) { > @@ -412,9 +418,12 @@ static blk_status_t null_zone_write(struct nullb_cmd *cmd, sector_t sector, > zone->cond = BLK_ZONE_COND_IMP_OPEN; > } > > - ret = null_process_cmd(cmd, REQ_OP_WRITE, sector, nr_sectors); > - if (ret != BLK_STS_OK) > - goto unlock_zone; > + if (dev->memory_backed) { > + ret = null_handle_memory_backed(cmd, REQ_OP_WRITE, sector, > + nr_sectors); > + if (ret != BLK_STS_OK) > + goto unlock_zone; > + } > > zone->wp += nr_sectors; > if (zone->wp == zone->start + zone->capacity) {
diff --git a/drivers/block/null_blk/main.c b/drivers/block/null_blk/main.c index 2a060a6ea8c0..87037cb375c9 100644 --- a/drivers/block/null_blk/main.c +++ b/drivers/block/null_blk/main.c @@ -1309,9 +1309,8 @@ static inline blk_status_t null_handle_throttled(struct nullb_cmd *cmd) return sts; } -static inline blk_status_t null_handle_badblocks(struct nullb_cmd *cmd, - sector_t sector, - sector_t nr_sectors) +blk_status_t null_handle_badblocks(struct nullb_cmd *cmd, sector_t sector, + sector_t nr_sectors) { struct badblocks *bb = &cmd->nq->dev->badblocks; sector_t first_bad; @@ -1326,10 +1325,8 @@ static inline blk_status_t null_handle_badblocks(struct nullb_cmd *cmd, return BLK_STS_IOERR; } -static inline blk_status_t null_handle_memory_backed(struct nullb_cmd *cmd, - enum req_op op, - sector_t sector, - sector_t nr_sectors) +blk_status_t null_handle_memory_backed(struct nullb_cmd *cmd, enum req_op op, + sector_t sector, sector_t nr_sectors) { struct nullb_device *dev = cmd->nq->dev; diff --git a/drivers/block/null_blk/null_blk.h b/drivers/block/null_blk/null_blk.h index 3c4c07f0418b..ee60f3a88796 100644 --- a/drivers/block/null_blk/null_blk.h +++ b/drivers/block/null_blk/null_blk.h @@ -132,6 +132,11 @@ blk_status_t null_handle_discard(struct nullb_device *dev, sector_t sector, sector_t nr_sectors); blk_status_t null_process_cmd(struct nullb_cmd *cmd, enum req_op op, sector_t sector, unsigned int nr_sectors); +blk_status_t null_handle_badblocks(struct nullb_cmd *cmd, sector_t sector, + sector_t nr_sectors); +blk_status_t null_handle_memory_backed(struct nullb_cmd *cmd, enum req_op op, + sector_t sector, sector_t nr_sectors); + #ifdef CONFIG_BLK_DEV_ZONED int null_init_zoned_dev(struct nullb_device *dev, struct queue_limits *lim); diff --git a/drivers/block/null_blk/zoned.c b/drivers/block/null_blk/zoned.c index 0d5f9bf95229..09dae8d018aa 100644 --- a/drivers/block/null_blk/zoned.c +++ b/drivers/block/null_blk/zoned.c @@ -389,6 +389,12 @@ static blk_status_t null_zone_write(struct nullb_cmd *cmd, sector_t sector, goto unlock_zone; } + if (dev->badblocks.shift != -1) { + ret = null_handle_badblocks(cmd, sector, nr_sectors); + if (ret != BLK_STS_OK) + goto unlock_zone; + } + if (zone->cond == BLK_ZONE_COND_CLOSED || zone->cond == BLK_ZONE_COND_EMPTY) { if (dev->need_zone_res_mgmt) { @@ -412,9 +418,12 @@ static blk_status_t null_zone_write(struct nullb_cmd *cmd, sector_t sector, zone->cond = BLK_ZONE_COND_IMP_OPEN; } - ret = null_process_cmd(cmd, REQ_OP_WRITE, sector, nr_sectors); - if (ret != BLK_STS_OK) - goto unlock_zone; + if (dev->memory_backed) { + ret = null_handle_memory_backed(cmd, REQ_OP_WRITE, sector, + nr_sectors); + if (ret != BLK_STS_OK) + goto unlock_zone; + } zone->wp += nr_sectors; if (zone->wp == zone->start + zone->capacity) {
When the badblocks parameter is set for zoned null_blk, zone resource management does not work correctly. This issue arises because null_zone_write() modifies the zone resource status and then call null_process_cmd(), which handles the badblocks parameter. When badblocks cause IO failures and no IO happens, the zone resource status should not change. However, it has already changed. To fix the unexpected change in zone resource status, when writes are requested for sequential write required zones, handle badblocks not in null_process_cmd() but in null_zone_write(). Modify null_zone_write() to call null_handle_badblocks() before changing the zone resource status. Also, call null_handle_memory_backed() instead of null_process_cmd(). Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com> --- drivers/block/null_blk/main.c | 11 ++++------- drivers/block/null_blk/null_blk.h | 5 +++++ drivers/block/null_blk/zoned.c | 15 ++++++++++++--- 3 files changed, 21 insertions(+), 10 deletions(-)