Message ID | 20241119002815.600608-1-bvanassche@acm.org (mailing list archive) |
---|---|
Headers | show |
Series | Improve write performance for zoned UFS devices | expand |
On 11/19/24 09:27, Bart Van Assche wrote: > Hi Damien and Christoph, > > This patch series improves small write IOPS by a factor of four (+300%) for > zoned UFS devices on my test setup with an UFSHCI 3.0 controller. Although > you are probably busy because the merge window is open, please take a look > at this patch series when you have the time. This patch series is organized > as follows: > - Bug fixes for existing code at the start of the series. > - The write pipelining support implementation comes after the bug fixes. Impressive improvements but the changes are rather invasive. Have you tried simpler solution like forcing unplugging a zone write plug from the driver once a command is passed to the driver and the driver did not reject it ? It seems like this would make everything simpler on the block layer side. But I am not sure if the performance gains would be the same.
On Mon, Nov 18, 2024 at 04:27:49PM -0800, Bart Van Assche wrote: > Hi Damien and Christoph, > > This patch series improves small write IOPS by a factor of four (+300%) for > zoned UFS devices on my test setup with an UFSHCI 3.0 controller. What's your exact test setup? Which upstream kernel support zoned UFS device are using?
On 11/19/24 4:25 AM, Christoph Hellwig wrote: > What's your exact test setup? Pixel 2023 and 2025 development boards with android-mainline kernel (https://android.googlesource.com/kernel/common/+/refs/heads/android-mainline). The android mainline kernel tracks the upstream kernel closely. While the performance of Pixel development boards is identical to that of the corresponding smartphone generation, these development boards have the following advantages: - A UFS socket that allows swapping UFS devices without any soldering. - A USB port that makes it easy to monitor and capture kernel log messages. > Which upstream kernel support zoned UFS device are using? A Micron ZUFS device. Micron zoned UFS devices have a zone size that is a power of two. Thanks, Bart.
On 11/19/24 12:01 AM, Damien Le Moal wrote: > Impressive improvements but the changes are rather invasive. Have you tried > simpler solution like forcing unplugging a zone write plug from the driver once > a command is passed to the driver and the driver did not reject it ? It seems > like this would make everything simpler on the block layer side. But I am not > sure if the performance gains would be the same. Hi Damien, I'm not sure that the approach of submitting a new zoned write if the driver did not reject the previous write would result in a simpler solution. SCSI devices are allowed to respond to any command with a unit attention instead of processing the command. If a unit attention is reported, the SCSI core requeues the command. In other words, even with this approach, proper support for requeued zoned writes in the block layer is required. Additionally, that approach is not compatible with using .queue_rqs(). While the SCSI core does not yet support a .queue_rqs() callback, I think it would be good to have this support in the SCSI core. If we need requeuing support anyway, why to select an approach that probably will result in lower performance than what has been implemented in this patch series? Thanks, Bart.
On 11/20/24 04:08, Bart Van Assche wrote: > On 11/19/24 12:01 AM, Damien Le Moal wrote: >> Impressive improvements but the changes are rather invasive. Have you tried >> simpler solution like forcing unplugging a zone write plug from the driver once >> a command is passed to the driver and the driver did not reject it ? It seems >> like this would make everything simpler on the block layer side. But I am not >> sure if the performance gains would be the same. > > Hi Damien, > > I'm not sure that the approach of submitting a new zoned write if the > driver did not reject the previous write would result in a simpler > solution. SCSI devices are allowed to respond to any command with a unit > attention instead of processing the command. If a unit attention is > reported, the SCSI core requeues the command. In other words, even with > this approach, proper support for requeued zoned writes in the block > layer is required. Yes, but it would be vastly simpler because you would be guaranteed to having only a single write request per zone being in-flight between the write plug and the device at any time. So the requeue would not need reordering, and likely not need any special code at all. nless I am missing something, this would be simpler, no ? The main question though with such approach is: does it give you the same performance improvements as your current (more invasive) approach ? > Additionally, that approach is not compatible with using .queue_rqs(). > While the SCSI core does not yet support a .queue_rqs() callback, I > think it would be good to have this support in the SCSI core. I do not understand why it is not compatible. What is the problem ? > If we need requeuing support anyway, why to select an approach that > probably will result in lower performance than what has been implemented > in this patch series? I am only trying to see if there is not a simpler approach than what you did. The less changes, the better, right ?