Message ID | 20250115224649.3973718-15-bvanassche@acm.org (mailing list archive) |
---|---|
State | New |
Series | Improve write performance for zoned UFS devices |
On 1/16/2025 6:46 AM, Bart Van Assche wrote:
> From the UFSHCI 4.0 specification, about the legacy (single queue) mode:
> "The host controller always process transfer requests in-order according
> to the order submitted to the list. In case of multiple commands with
> single doorbell register ringing (batch mode), The dispatch order for
> these transfer requests by host controller will base on their index in
> the List. A transfer request with lower index value will be executed
> before a transfer request with higher index value."
>
> From the UFSHCI 4.0 specification, about the MCQ mode:
> "Command Submission
> 1. Host SW writes an Entry to SQ
> 2. Host SW updates SQ doorbell tail pointer
>
> Command Processing
> 3. After fetching the Entry, Host Controller updates SQ doorbell head
> pointer
> 4. Host controller sends COMMAND UPIU to UFS device"
>
> In other words, for both legacy and MCQ mode, UFS controllers are
> required to forward commands to the UFS device in the order these
> commands have been received from the host.
>
> Notes:
> - For legacy mode this is only correct if the host submits one
> command at a time. The UFS driver does this.
> - Also in legacy mode, the command order is not preserved if
> auto-hibernation is enabled in the UFS controller.
>
> This patch improves performance as follows on a test setup with UFSHCI
> 3.0 controller:
> - With the mq-deadline scheduler: 2.5x more IOPS for small writes.
> - When not using an I/O scheduler compared to using mq-deadline with
> zone locking: 4x more IOPS for small writes.
>
> Cc: Bao D. Nguyen <quic_nguyenb@quicinc.com>
> Cc: Can Guo <quic_cang@quicinc.com>
> Cc: Martin K. Petersen <martin.petersen@oracle.com>
> Cc: Avri Altman <avri.altman@wdc.com>
> Signed-off-by: Bart Van Assche <bvanassche@acm.org>
> ---
> drivers/ufs/core/ufshcd.c | 7 +++++++
> 1 file changed, 7 insertions(+)
>
> diff --git a/drivers/ufs/core/ufshcd.c b/drivers/ufs/core/ufshcd.c
> index 3094f3c89e82..08803ba21668 100644
> --- a/drivers/ufs/core/ufshcd.c
> +++ b/drivers/ufs/core/ufshcd.c
> @@ -5255,6 +5255,13 @@ static int ufshcd_device_configure(struct scsi_device *sdev,
>  	struct ufs_hba *hba = shost_priv(sdev->host);
>  	struct request_queue *q = sdev->request_queue;
>
> +	/*
> +	 * With auto-hibernation disabled, the write order is preserved per
> +	 * MCQ. Auto-hibernation may cause write reordering that results in
> +	 * unaligned write errors. The SCSI core will retry the failed writes.
> +	 */
> +	lim->driver_preserves_write_order = true;
> +
>  	lim->dma_pad_mask = PRDT_DATA_BYTE_COUNT_PAD - 1;
>
>  	/*

Reviewed-by: Can Guo <quic_cang@quicinc.com>
On 1/15/2025 2:46 PM, Bart Van Assche wrote:
> This patch improves performance as follows on a test setup with UFSHCI
> 3.0 controller:
> - With the mq-deadline scheduler: 2.5x more IOPS for small writes.
> - When not using an I/O scheduler compared to using mq-deadline with
> zone locking: 4x more IOPS for small writes.

Hi Bart,

Wondering if the change has been tried on 4.x hosts and using different IO schedulers? Any performance improvements?

Thanks,
Bao
diff --git a/drivers/ufs/core/ufshcd.c b/drivers/ufs/core/ufshcd.c
index 3094f3c89e82..08803ba21668 100644
--- a/drivers/ufs/core/ufshcd.c
+++ b/drivers/ufs/core/ufshcd.c
@@ -5255,6 +5255,13 @@ static int ufshcd_device_configure(struct scsi_device *sdev,
 	struct ufs_hba *hba = shost_priv(sdev->host);
 	struct request_queue *q = sdev->request_queue;
 
+	/*
+	 * With auto-hibernation disabled, the write order is preserved per
+	 * MCQ. Auto-hibernation may cause write reordering that results in
+	 * unaligned write errors. The SCSI core will retry the failed writes.
+	 */
+	lim->driver_preserves_write_order = true;
+
 	lim->dma_pad_mask = PRDT_DATA_BYTE_COUNT_PAD - 1;
 
 	/*
From the UFSHCI 4.0 specification, about the legacy (single queue) mode:
"The host controller always process transfer requests in-order according
to the order submitted to the list. In case of multiple commands with
single doorbell register ringing (batch mode), The dispatch order for
these transfer requests by host controller will base on their index in
the List. A transfer request with lower index value will be executed
before a transfer request with higher index value."

From the UFSHCI 4.0 specification, about the MCQ mode:
"Command Submission
1. Host SW writes an Entry to SQ
2. Host SW updates SQ doorbell tail pointer

Command Processing
3. After fetching the Entry, Host Controller updates SQ doorbell head
   pointer
4. Host controller sends COMMAND UPIU to UFS device"

In other words, for both legacy and MCQ mode, UFS controllers are
required to forward commands to the UFS device in the order these
commands have been received from the host.

Notes:
- For legacy mode this is only correct if the host submits one
  command at a time. The UFS driver does this.
- Also in legacy mode, the command order is not preserved if
  auto-hibernation is enabled in the UFS controller.

This patch improves performance as follows on a test setup with UFSHCI
3.0 controller:
- With the mq-deadline scheduler: 2.5x more IOPS for small writes.
- When not using an I/O scheduler compared to using mq-deadline with
  zone locking: 4x more IOPS for small writes.

Cc: Bao D. Nguyen <quic_nguyenb@quicinc.com>
Cc: Can Guo <quic_cang@quicinc.com>
Cc: Martin K. Petersen <martin.petersen@oracle.com>
Cc: Avri Altman <avri.altman@wdc.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
---
 drivers/ufs/core/ufshcd.c | 7 +++++++
 1 file changed, 7 insertions(+)