Message ID | 20240614160350.180490-1-fengli@smartx.com (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | scsi: sd: Keep the discard mode stable | expand |
On Sat, Jun 15, 2024 at 12:03:47AM +0800, Li Feng wrote: > + /* > + * When the discard mode has been set to UNMAP, it should not be set to Overly long line here. > + * WRITE SAME with UNMAP. > + */ > + if (!sdkp->max_unmap_blocks) > + sd_config_discard(sdkp, SD_LBP_WS16); But more importantly this doesn't really scale to all the variations of reported / guessed at probe time vs overriden. I think you just need an explicit override flag that skips the discard settings.
> 2024年6月17日 14:17,Christoph Hellwig <hch@infradead.org> 写道: > > On Sat, Jun 15, 2024 at 12:03:47AM +0800, Li Feng wrote: >> + /* >> + * When the discard mode has been set to UNMAP, it should not be set to > > Overly long line here. OK. > >> + * WRITE SAME with UNMAP. >> + */ >> + if (!sdkp->max_unmap_blocks) >> + sd_config_discard(sdkp, SD_LBP_WS16); > > But more importantly this doesn't really scale to all the variations > of reported / guessed at probe time vs overriden. I think you just > need an explicit override flag that skips the discard settings. > I think we only need to prevent the temporary change of discard mode from UNMAP to WS16, and this patch should be enough. Maybe it is a good idea to remove the call to sd_config_discard from read_capacity_16 . Because the unmap_alignment/ unmap_granularity used by sd_config_discard are assigned in sd_read_block_limits. sd_read_block_limits is enough to negotiate the discard parameter. It is redundant for read_capacity to modify the discard parameter. In this way, when the SCSI probe sends read_capacity first and then read block limits, it avoids the change of discard from DISABLE to WS16 to UNMAP. Thanks, Li
Hey, On Sat, Jun 15, 2024 at 12:03:47AM +0800, Li Feng wrote: > There is a scenario where a large number of discard commands > are issued when the iscsi initiator connects to the target > and then performs a session rescan operation. Is this with just one specific target implementation? This sounds like a broken/buggy target, or is there a reason why this happens in general? And broken target sounds like device quirk, rather than impacting every possible target. > There is a time > window, most of the commands are in UNMAP mode, and some > discard commands become WRITE SAME with UNMAP. > > The discard mode has been negotiated during the SCSI probe. If > the mode is temporarily changed from UNMAP to WRITE SAME with > UNMAP, IO ERROR may occur because the target may not implement > WRITE SAME with UNMAP. Keep the discard mode stable to fix this > issue. > > Signed-off-by: Li Feng <fengli@smartx.com> > --- > drivers/scsi/sd.c | 7 ++++++- > 1 file changed, 6 insertions(+), 1 deletion(-) > > diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c > index f6c822c9cbd2..0165dc70a99b 100644 > --- a/drivers/scsi/sd.c > +++ b/drivers/scsi/sd.c > @@ -2598,7 +2598,12 @@ static int read_capacity_16(struct scsi_disk *sdkp, struct scsi_device *sdp, > if (buffer[14] & 0x40) /* LBPRZ */ > sdkp->lbprz = 1; > > - sd_config_discard(sdkp, SD_LBP_WS16); > + /* > + * When the discard mode has been set to UNMAP, it should not be set to > + * WRITE SAME with UNMAP. > + */ > + if (!sdkp->max_unmap_blocks) > + sd_config_discard(sdkp, SD_LBP_WS16); > } > > sdkp->capacity = lba + 1;
> 2024年6月18日 00:26,Benjamin Block <bblock@linux.ibm.com> 写道: > > Hey, > > On Sat, Jun 15, 2024 at 12:03:47AM +0800, Li Feng wrote: >> There is a scenario where a large number of discard commands >> are issued when the iscsi initiator connects to the target >> and then performs a session rescan operation. > > Is this with just one specific target implementation? This sounds like a > broken/buggy target, or is there a reason why this happens in general? > > And broken target sounds like device quirk, rather than impacting every > possible target. (resend due to GMail HTML bounce) This is a common problem. Before sending a rescan, discard has been negotiated to UNMAP. After the rescan, there will be a short window for it to become WS16, and then it will immediately become UNMAP. However, during this period, a small amount of discard commands may become WS16, resulting in a strange problem. > >> There is a time >> window, most of the commands are in UNMAP mode, and some >> discard commands become WRITE SAME with UNMAP. >> >> The discard mode has been negotiated during the SCSI probe. If >> the mode is temporarily changed from UNMAP to WRITE SAME with >> UNMAP, IO ERROR may occur because the target may not implement >> WRITE SAME with UNMAP. Keep the discard mode stable to fix this >> issue. >> >> Signed-off-by: Li Feng <fengli@smartx.com> >> --- >> drivers/scsi/sd.c | 7 ++++++- >> 1 file changed, 6 insertions(+), 1 deletion(-) >> >> diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c >> index f6c822c9cbd2..0165dc70a99b 100644 >> --- a/drivers/scsi/sd.c >> +++ b/drivers/scsi/sd.c >> @@ -2598,7 +2598,12 @@ static int read_capacity_16(struct scsi_disk *sdkp, struct scsi_device *sdp, >> if (buffer[14] & 0x40) /* LBPRZ */ >> sdkp->lbprz = 1; >> >> - sd_config_discard(sdkp, SD_LBP_WS16); >> + /* >> + * When the discard mode has been set to UNMAP, it should not be set to >> + * WRITE SAME with UNMAP. >> + */ >> + if (!sdkp->max_unmap_blocks) >> + sd_config_discard(sdkp, SD_LBP_WS16); >> } >> >> sdkp->capacity = lba + 1; > > -- > Best Regards, Benjamin Block / Linux on IBM Z Kernel Development > IBM Deutschland Research & Development GmbH / https://www.ibm.com/privacy > Vors. Aufs.-R.: Wolfgang Wendt / Gesch?ftsf?hrung: David Faller > Sitz der Ges.: B?blingen / Registergericht: AmtsG Stuttgart, HRB 243294
On Mon, Jun 17, 2024 at 05:03:03PM +0800, Li Feng wrote: > > But more importantly this doesn't really scale to all the variations > > of reported / guessed at probe time vs overriden. I think you just > > need an explicit override flag that skips the discard settings. > > > I think we only need to prevent the temporary change of discard mode > from UNMAP to WS16, and this patch should be enough. > > Maybe it is a good idea to remove the call to sd_config_discard > from read_capacity_16 . Because the unmap_alignment/ unmap_granularity > used by sd_config_discard are assigned in sd_read_block_limits. > > sd_read_block_limits is enough to negotiate the discard parameter. > It is redundant for read_capacity to modify the discard parameter. In this way, > when the SCSI probe sends read_capacity first and then read block limits, > it avoids the change of discard from DISABLE to WS16 to UNMAP. Note that in the linux-next tree for 6.11 we're not only applying the discard choice to the queue_limits structure and not commiting it in read_capacity_16. So it will be overriden before it gets actually applied. Can you check that your issue doesn't show up in linux-next?
On Tue, Jun 18, 2024 at 11:06:13AM +0800, Li Feng wrote: > > 2024年6月18日 00:26,Benjamin Block <bblock@linux.ibm.com> 写道: > > On Sat, Jun 15, 2024 at 12:03:47AM +0800, Li Feng wrote: > >> There is a scenario where a large number of discard commands > >> are issued when the iscsi initiator connects to the target > >> and then performs a session rescan operation. > > > > Is this with just one specific target implementation? This sounds like a > > broken/buggy target, or is there a reason why this happens in general? > > > > And broken target sounds like device quirk, rather than impacting every > > possible target. > > This is a common problem. Before sending a rescan, discard has been > negotiated to UNMAP. After the rescan, there will be a short window for > it to become WS16, and then it will immediately become UNMAP. > However, during this period, a small amount of discard commands > may become WS16, resulting in a strange problem. Ok, interesting. Do you know why this short window happens? > >> There is a time > >> window, most of the commands are in UNMAP mode, and some > >> discard commands become WRITE SAME with UNMAP. > >> > >> The discard mode has been negotiated during the SCSI probe. If > >> the mode is temporarily changed from UNMAP to WRITE SAME with > >> UNMAP, IO ERROR may occur because the target may not implement > >> WRITE SAME with UNMAP. Keep the discard mode stable to fix this > >> issue. > >> > >> Signed-off-by: Li Feng <fengli@smartx.com> > >> ---
diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c index f6c822c9cbd2..0165dc70a99b 100644 --- a/drivers/scsi/sd.c +++ b/drivers/scsi/sd.c @@ -2598,7 +2598,12 @@ static int read_capacity_16(struct scsi_disk *sdkp, struct scsi_device *sdp, if (buffer[14] & 0x40) /* LBPRZ */ sdkp->lbprz = 1; - sd_config_discard(sdkp, SD_LBP_WS16); + /* + * When the discard mode has been set to UNMAP, it should not be set to + * WRITE SAME with UNMAP. + */ + if (!sdkp->max_unmap_blocks) + sd_config_discard(sdkp, SD_LBP_WS16); } sdkp->capacity = lba + 1;
There is a scenario where a large number of discard commands are issued when the iscsi initiator connects to the target and then performs a session rescan operation. There is a time window, most of the commands are in UNMAP mode, and some discard commands become WRITE SAME with UNMAP. The discard mode has been negotiated during the SCSI probe. If the mode is temporarily changed from UNMAP to WRITE SAME with UNMAP, IO ERROR may occur because the target may not implement WRITE SAME with UNMAP. Keep the discard mode stable to fix this issue. Signed-off-by: Li Feng <fengli@smartx.com> --- drivers/scsi/sd.c | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-)