Message ID | 20230418224002.1195163-9-bvanassche@acm.org (mailing list archive)
---|---
State | New, archived
Series | mq-deadline: Improve support for zoned block devices
On Tue, Apr 18, 2023 at 03:39:59PM -0700, Bart Van Assche wrote:
> Let deadline_next_request() only consider the first zoned write per
> zone. This patch fixes a race condition between deadline_next_request()
> and completion of zoned writes.

Can you explain the condition in a bit more detail?

>  * For the specified data direction, return the next request to
> @@ -386,9 +388,25 @@ deadline_next_request(struct deadline_data *dd, struct dd_per_prio *per_prio,
>  	 */
>  	spin_lock_irqsave(&dd->zone_lock, flags);
>  	while (rq) {
> +		unsigned int zno __maybe_unused;
> +
>  		if (blk_req_can_dispatch_to_zone(rq))
>  			break;
> +
> +#ifdef CONFIG_BLK_DEV_ZONED
> +		zno = blk_rq_zone_no(rq);
> +
>  		rq = deadline_skip_seq_writes(dd, rq);
> +
> +		/*
> +		 * Skip all other write requests for the zone with zone number
> +		 * 'zno'. This prevents that this function selects a zoned write
> +		 * that is not the first write for a given zone.
> +		 */
> +		while (rq && blk_rq_zone_no(rq) == zno &&
> +		       blk_rq_is_seq_zoned_write(rq))
> +			rq = deadline_latter_request(rq);
> +#endif

The ifdeffery and extra checks are a bit ugly here.  I'd suggest to:

 - move all the zoned related logic in deadline_next_request into a
   separate helper that is stubbed out for !CONFIG_BLK_DEV_ZONED
 - merge the loop in deadline_skip_seq_writes and the one added here
   into one single loop

Something like:

static struct request *
deadline_zoned_next_request(struct deadline_data *dd, struct request *rq)
{
	unsigned long flags;

	spin_lock_irqsave(&dd->zone_lock, flags);
	while (!blk_req_can_dispatch_to_zone(rq)) {
		unsigned int zone_no = blk_rq_zone_no(rq);
		sector_t pos = blk_rq_pos(rq);

		while (blk_rq_is_seq_zoned_write(rq)) {
			// XXX: good comment explaining the check here
			if (blk_rq_pos(rq) != pos &&
			    blk_rq_zone_no(rq) != zone_no)
				break;
			pos += blk_rq_sectors(rq);
			rq = deadline_latter_request(rq);
			if (!rq)
				goto out_unlock;
		}
	}
out_unlock:
	spin_unlock_irqrestore(&dd->zone_lock, flags);
	return rq;
}
On 4/18/23 22:07, Christoph Hellwig wrote:
> On Tue, Apr 18, 2023 at 03:39:59PM -0700, Bart Van Assche wrote:
>> Let deadline_next_request() only consider the first zoned write per
>> zone. This patch fixes a race condition between deadline_next_request()
>> and completion of zoned writes.
>
> Can you explain the condition in a bit more detail?

Hi Christoph,

I'm concerned about the following scenario:
* deadline_next_request() starts iterating over the writes for a
  particular zone.
* blk_req_can_dispatch_to_zone() returns false for the first zoned write
  examined by deadline_next_request().
* A write for the same zone completes.
* deadline_next_request() continues examining zoned writes and
  blk_req_can_dispatch_to_zone() returns true for a write other than the
  first write for the zone. This would result in an UNALIGNED WRITE
  COMMAND error for zoned SCSI devices.

Bart.
On 4/20/23 03:46, Bart Van Assche wrote:
> On 4/18/23 22:07, Christoph Hellwig wrote:
>> On Tue, Apr 18, 2023 at 03:39:59PM -0700, Bart Van Assche wrote:
>>> Let deadline_next_request() only consider the first zoned write per
>>> zone. This patch fixes a race condition between deadline_next_request()
>>> and completion of zoned writes.
>>
>> Can you explain the condition in a bit more detail?
>
> Hi Christoph,
>
> I'm concerned about the following scenario:
> * deadline_next_request() starts iterating over the writes for a
>   particular zone.
> * blk_req_can_dispatch_to_zone() returns false for the first zoned write
>   examined by deadline_next_request().
> * A write for the same zone completes.
> * deadline_next_request() continues examining zoned writes and
>   blk_req_can_dispatch_to_zone() returns true for another write than the
>   first write for a zone. This would result in an UNALIGNED WRITE
>   COMMAND error for zoned SCSI devices.

Examining zoned writes has to be done with the deadline zone lock held,
exactly to avoid this issue.

> Bart.
diff --git a/block/mq-deadline.c b/block/mq-deadline.c
index 3122c471f473..32a2cc013ed3 100644
--- a/block/mq-deadline.c
+++ b/block/mq-deadline.c
@@ -302,6 +302,7 @@ static bool deadline_is_seq_write(struct deadline_data *dd, struct request *rq)
 	return blk_rq_pos(prev) + blk_rq_sectors(prev) == blk_rq_pos(rq);
 }
 
+#ifdef CONFIG_BLK_DEV_ZONED
 /*
  * Skip all write requests that are sequential from @rq, even if we cross
  * a zone boundary.
  */
@@ -318,6 +319,7 @@ static struct request *deadline_skip_seq_writes(struct deadline_data *dd,
 
 	return rq;
 }
+#endif
 
 /*
  * For the specified data direction, return the next request to
@@ -386,9 +388,25 @@ deadline_next_request(struct deadline_data *dd, struct dd_per_prio *per_prio,
 	 */
 	spin_lock_irqsave(&dd->zone_lock, flags);
 	while (rq) {
+		unsigned int zno __maybe_unused;
+
 		if (blk_req_can_dispatch_to_zone(rq))
 			break;
+
+#ifdef CONFIG_BLK_DEV_ZONED
+		zno = blk_rq_zone_no(rq);
+
 		rq = deadline_skip_seq_writes(dd, rq);
+
+		/*
+		 * Skip all other write requests for the zone with zone number
+		 * 'zno'. This prevents that this function selects a zoned write
+		 * that is not the first write for a given zone.
+		 */
+		while (rq && blk_rq_zone_no(rq) == zno &&
+		       blk_rq_is_seq_zoned_write(rq))
+			rq = deadline_latter_request(rq);
+#endif
 	}
 	spin_unlock_irqrestore(&dd->zone_lock, flags);
Let deadline_next_request() only consider the first zoned write per
zone. This patch fixes a race condition between deadline_next_request()
and completion of zoned writes.

Cc: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
---
 block/mq-deadline.c | 18 ++++++++++++++++++
 1 file changed, 18 insertions(+)