Message ID | 20250115042910.1149966-4-shinichiro.kawasaki@wdc.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | null_blk: improve write failure simulation | expand |
Context | Check | Description |
---|---|---|
shin/vmtest-for-next-PR | success | PR summary |
shin/vmtest-for-next-VM_Test-0 | success | Logs for build-kernel |
On 1/15/25 1:29 PM, Shin'ichiro Kawasaki wrote: > When the badblocks parameter is set for zoned null_blk, zone resource > management does not work correctly. This issue arises because > null_zone_write() modifies the zone resource status and then call > null_process_cmd(), which handles the badblocks parameter. When > badblocks cause IO failures and no IO happens, the zone resource status > should not change. However, it has already changed. And that is correct in general. E.g. if an SMR HDD encounters a bad block while executing a write command, the bad block may endup failing the write command but the write operation was already started, meaning that the zone was at least implicitly open first to start writing. So doing zone management first and then handling the bad blocks (eventually failing the write operation) is the correct order to do things. I commented on the previous version that partially advancing the wp for a write that is failed due to a bad block was incorrect because zone resource management needed to be done. But it seems I was mistaken since you are saying here that zone management is already done before handling bad blocks. So I do not think we need this change. Or is it me who is completely confused ? > > To fix the unexpected change in zone resource status, when writes are > requested for sequential write required zones, handle badblocks not in > null_process_cmd() but in null_zone_write(). Modify null_zone_write() to > call null_handle_badblocks() before changing the zone resource status. > Also, call null_handle_memory_backed() instead of null_process_cmd(). > > Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com> > --- > drivers/block/null_blk/main.c | 11 ++++------- > drivers/block/null_blk/null_blk.h | 5 +++++ > drivers/block/null_blk/zoned.c | 15 ++++++++++++--- > 3 files changed, 21 insertions(+), 10 deletions(-) > > diff --git a/drivers/block/null_blk/main.c b/drivers/block/null_blk/main.c > index 2a060a6ea8c0..87037cb375c9 100644 > --- a/drivers/block/null_blk/main.c > +++ b/drivers/block/null_blk/main.c > @@ -1309,9 +1309,8 @@ static inline blk_status_t null_handle_throttled(struct nullb_cmd *cmd) > return sts; > } > > -static inline blk_status_t null_handle_badblocks(struct nullb_cmd *cmd, > - sector_t sector, > - sector_t nr_sectors) > +blk_status_t null_handle_badblocks(struct nullb_cmd *cmd, sector_t sector, > + sector_t nr_sectors) > { > struct badblocks *bb = &cmd->nq->dev->badblocks; > sector_t first_bad; > @@ -1326,10 +1325,8 @@ static inline blk_status_t null_handle_badblocks(struct nullb_cmd *cmd, > return BLK_STS_IOERR; > } > > -static inline blk_status_t null_handle_memory_backed(struct nullb_cmd *cmd, > - enum req_op op, > - sector_t sector, > - sector_t nr_sectors) > +blk_status_t null_handle_memory_backed(struct nullb_cmd *cmd, enum req_op op, > + sector_t sector, sector_t nr_sectors) > { > struct nullb_device *dev = cmd->nq->dev; > > diff --git a/drivers/block/null_blk/null_blk.h b/drivers/block/null_blk/null_blk.h > index 3c4c07f0418b..ee60f3a88796 100644 > --- a/drivers/block/null_blk/null_blk.h > +++ b/drivers/block/null_blk/null_blk.h > @@ -132,6 +132,11 @@ blk_status_t null_handle_discard(struct nullb_device *dev, sector_t sector, > sector_t nr_sectors); > blk_status_t null_process_cmd(struct nullb_cmd *cmd, enum req_op op, > sector_t sector, unsigned int nr_sectors); > +blk_status_t null_handle_badblocks(struct nullb_cmd *cmd, sector_t sector, > + sector_t nr_sectors); > +blk_status_t null_handle_memory_backed(struct nullb_cmd *cmd, enum req_op op, > + sector_t sector, sector_t nr_sectors); > + > > #ifdef CONFIG_BLK_DEV_ZONED > int null_init_zoned_dev(struct nullb_device *dev, struct queue_limits *lim); > diff --git a/drivers/block/null_blk/zoned.c b/drivers/block/null_blk/zoned.c > index 0d5f9bf95229..09dae8d018aa 100644 > --- a/drivers/block/null_blk/zoned.c > +++ b/drivers/block/null_blk/zoned.c > @@ -389,6 +389,12 @@ static blk_status_t null_zone_write(struct nullb_cmd *cmd, sector_t sector, > goto unlock_zone; > } > > + if (dev->badblocks.shift != -1) { > + ret = null_handle_badblocks(cmd, sector, nr_sectors); > + if (ret != BLK_STS_OK) > + goto unlock_zone; > + } > + > if (zone->cond == BLK_ZONE_COND_CLOSED || > zone->cond == BLK_ZONE_COND_EMPTY) { > if (dev->need_zone_res_mgmt) { > @@ -412,9 +418,12 @@ static blk_status_t null_zone_write(struct nullb_cmd *cmd, sector_t sector, > zone->cond = BLK_ZONE_COND_IMP_OPEN; > } > > - ret = null_process_cmd(cmd, REQ_OP_WRITE, sector, nr_sectors); > - if (ret != BLK_STS_OK) > - goto unlock_zone; > + if (dev->memory_backed) { > + ret = null_handle_memory_backed(cmd, REQ_OP_WRITE, sector, > + nr_sectors); > + if (ret != BLK_STS_OK) > + goto unlock_zone; > + } > > zone->wp += nr_sectors; > if (zone->wp == zone->start + zone->capacity) {
On Jan 18, 2025 / 08:00, Damien Le Moal wrote: > On 1/15/25 1:29 PM, Shin'ichiro Kawasaki wrote: > > When the badblocks parameter is set for zoned null_blk, zone resource > > management does not work correctly. This issue arises because > > null_zone_write() modifies the zone resource status and then call > > null_process_cmd(), which handles the badblocks parameter. When > > badblocks cause IO failures and no IO happens, the zone resource status > > should not change. However, it has already changed. > > And that is correct in general. E.g. if an SMR HDD encounters a bad block while > executing a write command, the bad block may endup failing the write command but > the write operation was already started, meaning that the zone was at least > implicitly open first to start writing. So doing zone management first and then > handling the bad blocks (eventually failing the write operation) is the correct > order to do things. > > I commented on the previous version that partially advancing the wp for a write > that is failed due to a bad block was incorrect because zone resource management > needed to be done. But it seems I was mistaken since you are saying here that > zone management is already done before handling bad blocks. So I do not think we > need this change. Or is it me who is completely confused ? Based on your comment on the previous version, I checked the code and found the call chain was as follows: null_process_zoned_cmd() null_zone_write() do zone resource management: zone resource management is done here, assuming writes always succeed null_process_cmd() null_handle_badblocks() ... v2 3rd patch added wp move for partial writes So, the zone resource management was done before applying the v2 patch series. However, the zone resource management did not care the write failure by badblocks. It assumed that as if the writes fully succeed always. When badblocks causes write failure for zoned null_blk, it leaves wrong zone resource status. I would say, this patch does not respond to your comment for the previous version. It addresses the problem I found when I thought about your comment. With this patch, the function call chain will be as follows: null_process_zoned_cmd() null_zone_write() null_handle_badblocks() ... checks how many sectors to be written do zone resource management ... zone resource management is done reflecting the result of null_handle_badblocks() call null_handle_memory_backed() The zone resource management part will be skipped when badblocks causes no write. The 5th patch in this 3rd series will modify null_handle_badblocks() to support partial writes by badblocks, and the zone resource management will be done for such partial writes.
On 1/20/25 5:33 PM, Shinichiro Kawasaki wrote: > On Jan 18, 2025 / 08:00, Damien Le Moal wrote: >> On 1/15/25 1:29 PM, Shin'ichiro Kawasaki wrote: >>> When the badblocks parameter is set for zoned null_blk, zone resource >>> management does not work correctly. This issue arises because >>> null_zone_write() modifies the zone resource status and then call >>> null_process_cmd(), which handles the badblocks parameter. When >>> badblocks cause IO failures and no IO happens, the zone resource status >>> should not change. However, it has already changed. >> >> And that is correct in general. E.g. if an SMR HDD encounters a bad block while >> executing a write command, the bad block may endup failing the write command but >> the write operation was already started, meaning that the zone was at least >> implicitly open first to start writing. So doing zone management first and then >> handling the bad blocks (eventually failing the write operation) is the correct >> order to do things. >> >> I commented on the previous version that partially advancing the wp for a write >> that is failed due to a bad block was incorrect because zone resource management >> needed to be done. But it seems I was mistaken since you are saying here that >> zone management is already done before handling bad blocks. So I do not think we >> need this change. Or is it me who is completely confused ? > > Based on your comment on the previous version, I checked the code and found > the call chain was as follows: > > null_process_zoned_cmd() > null_zone_write() > do zone resource management: zone resource management is done here, > assuming writes always succeed > null_process_cmd() > null_handle_badblocks() ... v2 3rd patch added wp move for partial writes > > So, the zone resource management was done before applying the v2 patch series. > > However, the zone resource management did not care the write failure by > badblocks. It assumed that as if the writes fully succeed always. When badblocks > causes write failure for zoned null_blk, it leaves wrong zone resource status. Not always. If the write that is being failed by null_handle_badblocks() makes (or keeop) the zone open, that it is still correct. The zone condition is fine as long as the zone does not become full. What is wrong though is that we will also increment the zone write pointer before running null_handle_badblocks(), so if the write is failed by that function, we endup with an incorrect write pointer. But the zone condittion is not necessarily wrong. As commented before, a real device will always open a zone before accessing the media. So an empty or closed zone will always become open even if the write operation fails on the first sector of the write. > I would say, this patch does not respond to your comment for the previous > version. It addresses the problem I found when I thought about your comment. > > With this patch, the function call chain will be as follows: > > null_process_zoned_cmd() > null_zone_write() > null_handle_badblocks() ... checks how many sectors to be written > do zone resource management ... zone resource management is done reflecting > the result of null_handle_badblocks() call > null_handle_memory_backed() > > The zone resource management part will be skipped when badblocks causes no > write. The 5th patch in this 3rd series will modify null_handle_badblocks() to > support partial writes by badblocks, and the zone resource management will be > done for such partial writes. Understood. But to be completely correct with the zone condition handling, we should probably also do the zone resource management even if the write is failed on its first sector, that is, even if the actual number of sectors to write is 0.
diff --git a/drivers/block/null_blk/main.c b/drivers/block/null_blk/main.c index 2a060a6ea8c0..87037cb375c9 100644 --- a/drivers/block/null_blk/main.c +++ b/drivers/block/null_blk/main.c @@ -1309,9 +1309,8 @@ static inline blk_status_t null_handle_throttled(struct nullb_cmd *cmd) return sts; } -static inline blk_status_t null_handle_badblocks(struct nullb_cmd *cmd, - sector_t sector, - sector_t nr_sectors) +blk_status_t null_handle_badblocks(struct nullb_cmd *cmd, sector_t sector, + sector_t nr_sectors) { struct badblocks *bb = &cmd->nq->dev->badblocks; sector_t first_bad; @@ -1326,10 +1325,8 @@ static inline blk_status_t null_handle_badblocks(struct nullb_cmd *cmd, return BLK_STS_IOERR; } -static inline blk_status_t null_handle_memory_backed(struct nullb_cmd *cmd, - enum req_op op, - sector_t sector, - sector_t nr_sectors) +blk_status_t null_handle_memory_backed(struct nullb_cmd *cmd, enum req_op op, + sector_t sector, sector_t nr_sectors) { struct nullb_device *dev = cmd->nq->dev; diff --git a/drivers/block/null_blk/null_blk.h b/drivers/block/null_blk/null_blk.h index 3c4c07f0418b..ee60f3a88796 100644 --- a/drivers/block/null_blk/null_blk.h +++ b/drivers/block/null_blk/null_blk.h @@ -132,6 +132,11 @@ blk_status_t null_handle_discard(struct nullb_device *dev, sector_t sector, sector_t nr_sectors); blk_status_t null_process_cmd(struct nullb_cmd *cmd, enum req_op op, sector_t sector, unsigned int nr_sectors); +blk_status_t null_handle_badblocks(struct nullb_cmd *cmd, sector_t sector, + sector_t nr_sectors); +blk_status_t null_handle_memory_backed(struct nullb_cmd *cmd, enum req_op op, + sector_t sector, sector_t nr_sectors); + #ifdef CONFIG_BLK_DEV_ZONED int null_init_zoned_dev(struct nullb_device *dev, struct queue_limits *lim); diff --git a/drivers/block/null_blk/zoned.c b/drivers/block/null_blk/zoned.c index 0d5f9bf95229..09dae8d018aa 100644 --- a/drivers/block/null_blk/zoned.c +++ b/drivers/block/null_blk/zoned.c @@ -389,6 +389,12 @@ static blk_status_t null_zone_write(struct nullb_cmd *cmd, sector_t sector, goto unlock_zone; } + if (dev->badblocks.shift != -1) { + ret = null_handle_badblocks(cmd, sector, nr_sectors); + if (ret != BLK_STS_OK) + goto unlock_zone; + } + if (zone->cond == BLK_ZONE_COND_CLOSED || zone->cond == BLK_ZONE_COND_EMPTY) { if (dev->need_zone_res_mgmt) { @@ -412,9 +418,12 @@ static blk_status_t null_zone_write(struct nullb_cmd *cmd, sector_t sector, zone->cond = BLK_ZONE_COND_IMP_OPEN; } - ret = null_process_cmd(cmd, REQ_OP_WRITE, sector, nr_sectors); - if (ret != BLK_STS_OK) - goto unlock_zone; + if (dev->memory_backed) { + ret = null_handle_memory_backed(cmd, REQ_OP_WRITE, sector, + nr_sectors); + if (ret != BLK_STS_OK) + goto unlock_zone; + } zone->wp += nr_sectors; if (zone->wp == zone->start + zone->capacity) {
When the badblocks parameter is set for zoned null_blk, zone resource management does not work correctly. This issue arises because null_zone_write() modifies the zone resource status and then call null_process_cmd(), which handles the badblocks parameter. When badblocks cause IO failures and no IO happens, the zone resource status should not change. However, it has already changed. To fix the unexpected change in zone resource status, when writes are requested for sequential write required zones, handle badblocks not in null_process_cmd() but in null_zone_write(). Modify null_zone_write() to call null_handle_badblocks() before changing the zone resource status. Also, call null_handle_memory_backed() instead of null_process_cmd(). Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com> --- drivers/block/null_blk/main.c | 11 ++++------- drivers/block/null_blk/null_blk.h | 5 +++++ drivers/block/null_blk/zoned.c | 15 ++++++++++++--- 3 files changed, 21 insertions(+), 10 deletions(-)