Message ID: 963815509.21592879582091.JavaMail.epsvc@epcpadp2
Series: scsi: ufs: Add Host Performance Booster Support
Hi Daejun,

It seems you intentionally ignored my suggestion without commenting on
it, so let me restate my reasoning.

Before submitting your next version of the patch, please check the
logic of your L2P mapping HPB request submission algorithm. I did a
performance comparison test with 4KB reads and saw about a 13%
performance drop. The hit count is also lower. I don't know whether
this is related to your current work-queue scheduling, since you didn't
add a timer for each HPB request.

Thanks,
Bean

On Tue, 2020-06-23 at 10:02 +0900, Daejun Park wrote:
> Changelog:
>
> v2 -> v3
> 1. Add checking of input module parameter values.
> 2. Change base commit from 5.8/scsi-queue to 5.9/scsi-queue.
> 3. Clean up unused variables and labels.
>
> v1 -> v2
> 1. Change the full boilerplate text to SPDX style.
> 2. Adopt dynamic allocation for the sub-region data structure.
> 3. Cleanup.
>
> NAND flash memory-based storage devices use a Flash Translation Layer
> (FTL) to translate logical addresses of I/O requests to the
> corresponding flash memory addresses. Mobile storage devices typically
> have RAM of constrained size and thus lack the memory to keep the
> whole mapping table. Therefore, mapping tables are partially retrieved
> from NAND flash on demand, causing random-read performance degradation.
>
> To improve random read performance, JESD220-3 (HPB v1.0) proposes HPB
> (Host Performance Booster), which uses host system memory as a cache
> for the FTL mapping table. By using HPB, FTL data can be read from
> host memory faster than from NAND flash memory.
>
> The current version only supports the DCM (device control mode).
> This patch set consists of four parts that support the HPB feature:
>
> 1) UFS-feature layer
> 2) HPB probe and initialization process
> 3) READ -> HPB READ using cached map information
> 4) L2P (logical to physical) map management
>
> The UFS-feature is an additional layer that avoids a structure in
> which the UFS-core driver and the UFS-features are entangled with each
> other in a single module. By adding the layer, UFS-features composed
> of various combinations can be supported. Also, even if a new feature
> is added, modification of the UFS-core driver can be minimized.
>
> In the HPB probe and init process, the device information of the UFS
> is queried. After checking supported features, the data structure for
> the HPB is initialized according to the device information.
>
> A read I/O in an active sub-region where the map is cached is changed
> to HPB READ by the HPB module.
>
> The HPB module manages the L2P map using information received from the
> device. For an active sub-region, the HPB module caches the map
> through an ufshpb_map request. For an inactive region, the HPB module
> discards the L2P map. When a write I/O occurs in an active sub-region,
> the associated dirty bitmap is marked dirty to prevent stale reads.
>
> HPB is shown to give a performance improvement of 58 - 67% for a
> random read workload. [1]
>
> This patch series is based on the 5.9/scsi-queue branch.
> [1]:
> https://www.usenix.org/conference/hotstorage17/program/presentation/jeong
>
> Daejun Park (5):
>   scsi: ufs: Add UFS feature related parameter
>   scsi: ufs: Add UFS feature layer
>   scsi: ufs: Introduce HPB module
>   scsi: ufs: L2P map management for HPB read
>   scsi: ufs: Prepare HPB read for cached sub-region
>
>  drivers/scsi/ufs/Kconfig      |    9 +
>  drivers/scsi/ufs/Makefile     |    3 +-
>  drivers/scsi/ufs/ufs.h        |   12 +
>  drivers/scsi/ufs/ufsfeature.c |  148 +++
>  drivers/scsi/ufs/ufsfeature.h |   69 ++
>  drivers/scsi/ufs/ufshcd.c     |   23 +-
>  drivers/scsi/ufs/ufshcd.h     |    3 +
>  drivers/scsi/ufs/ufshpb.c     | 1996 ++++++++++++++++++++++++++++++++++++
>  drivers/scsi/ufs/ufshpb.h     |  234 +++++
>  9 files changed, 2494 insertions(+), 3 deletions(-)
>  create mode 100644 drivers/scsi/ufs/ufsfeature.c
>  create mode 100644 drivers/scsi/ufs/ufsfeature.h
>  create mode 100644 drivers/scsi/ufs/ufshpb.c
>  create mode 100644 drivers/scsi/ufs/ufshpb.h
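To make the stale-read protection described in the cover letter
concrete, here is a minimal sketch of the dirty-bitmap check. All type
and function names below are hypothetical illustrations, not the actual
ufshpb.c code; the point is only that a READ is converted to HPB READ
when the sub-region is active and no overlapping write has dirtied the
cached L2P entries.

#include <linux/bitmap.h>
#include <linux/types.h>

/* Hypothetical per-sub-region state; not the actual ufshpb.c layout. */
struct hpb_subregion {
	bool		activated;	/* L2P map cached from the device */
	unsigned long	*dirty_map;	/* one bit per cached L2P entry */
};

/*
 * A READ may be turned into HPB READ only if the sub-region is active
 * and none of the entries it covers were dirtied by a later write.
 */
static bool hpb_read_can_use_cache(struct hpb_subregion *srgn,
				   unsigned int entry, unsigned int nr)
{
	if (!srgn->activated)
		return false;
	/* no set bit in [entry, entry + nr) => the cached map is clean */
	return find_next_bit(srgn->dirty_map, entry + nr, entry) >= entry + nr;
}

/* A write into an active sub-region marks the covered entries dirty. */
static void hpb_mark_dirty(struct hpb_subregion *srgn,
			   unsigned int entry, unsigned int nr)
{
	bitmap_set(srgn->dirty_map, entry, nr);
}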
If no one else objects, maybe you can submit your patches as non-RFC
for review?

Thanks,
Avri

> -----Original Message-----
> From: Daejun Park <daejun7.park@samsung.com>
> Sent: Tuesday, June 23, 2020 4:02 AM
> Subject: [RFC PATCH v3 0/5] scsi: ufs: Add Host Performance Booster Support
>
> [cover letter quoted in full above - snipped]
Hi Bean,

> Hi Daejun,
>
> It seems you intentionally ignored my suggestion without commenting on
> it, so let me restate my reasoning.
>
> Before submitting your next version of the patch, please check the
> logic of your L2P mapping HPB request submission algorithm. I did a
> performance comparison test with 4KB reads and saw about a 13%
> performance drop. The hit count is also lower. I don't know whether
> this is related to your current work-queue scheduling, since you
> didn't add a timer for each HPB request.

In device control mode, the various decisions, specifically those that
cause repetitive evictions, are made by the device. Is this the issue
you are referring to?

As for the driver, do you see any issue that is causing unnecessary
latency?

Thanks,
Avri
> It seems you intentionally ignored my suggestion without commenting on
> it, so let me restate my reasoning.

Sorry! I replied to your comment (https://lkml.org/lkml/2020/6/15/1492),
but you didn't reply to that. I thought you agreed, because you didn't
send any more comments.

> Before submitting your next version of the patch, please check the
> logic of your L2P mapping HPB request submission algorithm.

We are also reviewing the code that you submitted before. It seems to
be a performance improvement, as it sends map requests directly.

> I did a performance comparison test with 4KB reads and saw about a 13%
> performance drop. The hit count is also lower.

It is interesting that there is actually a performance improvement.
Could you share the test environment, please? However, I think
stability is important for the HPB driver. We have tested our method
with real products, and the HPB 1.0 driver is based on that.

After this patch, could your approach be done as an incremental patch?
I would like to test and verify the patch that you submitted.

> I don't know whether this is related to your current work-queue
> scheduling, since you didn't add a timer for each HPB request.

There was a comment from Bart that it was not good to add an arbitrary
timeout value to the request (please refer to:
https://lkml.org/lkml/2020/6/11/1043). When no timer is added to the
request, the SD timeout is used as the default timeout at the block
layer.

Thanks,
Daejun
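As background for the fallback Daejun describes, a simplified sketch of
the block-layer behavior, modeled on block/blk-timeout.c around v5.9
(not the complete function), looks like this:

#include <linux/blkdev.h>

/* Simplified sketch; the real blk_add_timer() then arms the timer. */
static void sketch_blk_add_timer(struct request *req)
{
	struct request_queue *q = req->q;

	/* no per-request timeout was set, so the queue default applies */
	if (!req->timeout)
		req->timeout = q->rq_timeout; /* for SCSI disks, set by sd */
}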
Hi Avri,

On Mon, 2020-06-29 at 05:24 +0000, Avri Altman wrote:
> Hi Bean,
>
> > [...]
>
> In device control mode, the various decisions, specifically those that
> cause repetitive evictions, are made by the device. Is this the issue
> you are referring to?

For the device control mode, if the HPB mapping table of an active
region becomes dirty on the UFS device side, there are repetitive
inactivation responses, but that is not the reason for the condition I
mentioned here.

> As for the driver, do you see any issue that is causing unnecessary
> latency?

Daejun's patch now uses a work queue: whenever there is a new response
for a sub-region to be activated, the driver queues "work" to this work
queue. This is deferred work; we don't know when it will be scheduled
or finished. We need to optimize it.

Thanks,
Bean
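The pattern Bean is pointing at, sketched with hypothetical names (not
the actual patch code), is roughly:

#include <linux/workqueue.h>
#include <linux/list.h>
#include <linux/spinlock.h>

/* Hypothetical names illustrating the deferral, not the patch itself. */
struct hpb_lu {
	struct workqueue_struct	*map_wq;
	struct work_struct	map_work;	/* drains the to-activate list */
	struct list_head	lh_act_srgn;	/* sub-regions awaiting activation */
	spinlock_t		list_lock;
};

static void hpb_map_work_fn(struct work_struct *work)
{
	struct hpb_lu *hpb = container_of(work, struct hpb_lu, map_work);

	/* build and issue ufshpb_map requests for every queued sub-region */
	(void)hpb;
}

/*
 * Called from the response path: the activation is only *queued* here;
 * when hpb_map_work_fn() actually runs is up to the workqueue.
 */
static void hpb_rsp_activate(struct hpb_lu *hpb, struct list_head *srgn_node)
{
	spin_lock(&hpb->list_lock);
	list_add_tail(srgn_node, &hpb->lh_act_srgn);
	spin_unlock(&hpb->list_lock);

	queue_work(hpb->map_wq, &hpb->map_work);
}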
> Hi Avri,
>
> On Mon, 2020-06-29 at 05:24 +0000, Avri Altman wrote:
> > [...]
> >
> > As for the driver, do you see any issue that is causing unnecessary
> > latency?
>
> Daejun's patch now uses a work queue: whenever there is a new response
> for a sub-region to be activated, the driver queues "work" to this
> work queue. This is deferred work; we don't know when it will be
> scheduled or finished. We need to optimize it.

But those "to-do" lists are checked on every completion interrupt and
on every resume. Do you see any scenario in which the "to-be-activated"
or "to-be-inactivated" work is getting starved?
Hi Daejun,

On Mon, 2020-06-29 at 15:15 +0900, Daejun Park wrote:
> > It seems you intentionally ignored my suggestion without commenting
> > on it, so let me restate my reasoning.
>
> Sorry! I replied to your comment
> (https://lkml.org/lkml/2020/6/15/1492), but you didn't reply to that.
> I thought you agreed, because you didn't send any more comments.
>
> It is interesting that there is actually a performance improvement.
> Could you share the test environment, please? However, I think
> stability is important for the HPB driver. We have tested our method
> with real products, and the HPB 1.0 driver is based on that.

I just ran the fio benchmark tool with --rw=randread, --bs=4kb, and
--size=8G/10G/64G/100G, and compared the performance against the direct
submission approach.

> After this patch, could your approach be done as an incremental patch?
> I would like to test and verify the patch that you submitted.

Taking HPB 2.0 into consideration, can we submit the HPB write request
through the SCSI layer? If not, it will have to be a direct submission.
Why not use the direct way from the start? Or maybe you have a more
advisable approach to work around this; would you please share it with
us? I'd appreciate it.

> There was a comment from Bart that it was not good to add an arbitrary
> timeout value to the request (please refer to:
> https://lkml.org/lkml/2020/6/11/1043). When no timer is added to the
> request, the SD timeout is used as the default timeout at the block
> layer.

I saw that, so I should add a timer in order to optimize HPB request
scheduling/completion. This is OK so far.

Thanks,
Bean
On Mon, 2020-06-29 at 11:06 +0000, Avri Altman wrote:
> > [...]
> >
> > Daejun's patch now uses a work queue: whenever there is a new
> > response for a sub-region to be activated, the driver queues "work"
> > to this work queue. This is deferred work; we don't know when it
> > will be scheduled or finished. We need to optimize it.
>
> But those "to-do" lists are checked on every completion interrupt and
> on every resume. Do you see any scenario in which the
> "to-be-activated" or "to-be-inactivated" work is getting starved?

Let me run more test cases; I will get back to you if there are any new
updates.

Thanks,
Bean
Hi Bean,

> On Mon, 2020-06-29 at 15:15 +0900, Daejun Park wrote:
> > It is interesting that there is actually a performance improvement.
> > Could you share the test environment, please? However, I think
> > stability is important for the HPB driver. We have tested our method
> > with real products, and the HPB 1.0 driver is based on that.
>
> I just ran the fio benchmark tool with --rw=randread, --bs=4kb, and
> --size=8G/10G/64G/100G, and compared the performance against the
> direct submission approach.

Thanks!

> Taking HPB 2.0 into consideration, can we submit the HPB write request
> through the SCSI layer? If not, it will have to be a direct
> submission. Why not use the direct way from the start? Or maybe you
> have a more advisable approach to work around this; would you please
> share it with us? I'd appreciate it.

I am considering a direct submission path for the next version. We will
implement the write buffer command of HPB 2.0 after the HPB 1.0 patches.

As for the direct submission of HPB-related commands, including HPB
write buffer, I think we'd better discuss the right approach in depth
before moving on to the next step.

Thanks,
Daejun
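For reference, a hedged sketch of what such a direct submission could
look like on a ~v5.9 kernel: build a request on the device queue and
hand it straight to the block layer, bypassing the usual SCSI command
setup path. The function name and parameters are hypothetical
illustrations (Bean's RFC patchset is the authoritative example); the
calls shown are the era-appropriate block-layer APIs.

#include <linux/blkdev.h>
#include <linux/string.h>
#include <scsi/scsi_device.h>
#include <scsi/scsi_request.h>

static int hpb_submit_cmd_direct(struct scsi_device *sdev,
				 const unsigned char *cdb, int cdb_len,
				 void *buf, int buf_len, rq_end_io_fn *done)
{
	struct request_queue *q = sdev->request_queue;
	struct scsi_request *sreq;
	struct request *rq;
	int ret;

	rq = blk_get_request(q, REQ_OP_SCSI_IN, 0);
	if (IS_ERR(rq))
		return PTR_ERR(rq);

	/* map the kernel buffer that receives the L2P map data */
	ret = blk_rq_map_kern(q, rq, buf, buf_len, GFP_KERNEL);
	if (ret) {
		blk_put_request(rq);
		return ret;
	}

	sreq = scsi_req(rq);
	sreq->cmd_len = cdb_len;
	memcpy(sreq->cmd, cdb, cdb_len);

	/* 'done' is called on completion; at_head=1 jumps the queue */
	blk_execute_rq_nowait(q, NULL, rq, 1, done);
	return 0;
}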
Hi,

> > Taking HPB 2.0 into consideration, can we submit the HPB write
> > request through the SCSI layer? If not, it will have to be a direct
> > submission. Why not use the direct way from the start?
>
> I am considering a direct submission path for the next version. We
> will implement the write buffer command of HPB 2.0 after the HPB 1.0
> patches.
>
> As for the direct submission of HPB-related commands, including HPB
> write buffer, I think we'd better discuss the right approach in depth
> before moving on to the next step.

I vote to stay with the current implementation because:
1) Bean is probably right about 2.0, but it's out of scope for now -
   there is a long way to go before we'll need to worry about it.
2) For now, we should focus on the functional flows. Performance
   issues, should such issues indeed exist, can be dealt with later.
   And,
3) The current code base has been running in production for more than
   3 years now. I am not eager to dump robust, well-debugged code
   unless it is absolutely necessary.

Thanks,
Avri
On Tue, 2020-06-30 at 06:39 +0000, Avri Altman wrote:
> > [...]
>
> I vote to stay with the current implementation because:
> 1) Bean is probably right about 2.0, but it's out of scope for now -
>    there is a long way to go before we'll need to worry about it.
> 2) For now, we should focus on the functional flows. Performance
>    issues, should such issues indeed exist, can be dealt with later.
>    And,
> 3) The current code base has been running in production for more than
>    3 years now. I am not eager to dump robust, well-debugged code
>    unless it is absolutely necessary.

Hi Avri,

Thanks; I appreciate you sharing your position on this topic. I don't
know how I can convince you to change your opinion, but let me try.

1. HPB 2.0 is not out of scope. HPB 1.0 only supports a 4KB read
length, which is useless; I don't know whether there will be users who
want an HPB driver that only supports a 4KB chunk size. I think we all
know that some smartphone vendors already use HPB 2.0, even though HPB
2.0 has not been released yet; you mentioned this in your earlier
emails. HPB 1.0 is just a transitional (limited) version, so we need to
think about HPB 2.0 support while we develop the HPB 1.0 driver. To say
the least, if we don't think about HPB 2.0 support and just focus on
HPB 1.0, then in the end, after HPB 2.0 is released, we will need to
return to the original point and redo many things. Why can't we fix it
now and think one step further?

2. The major goal of the HPB feature is to improve random read
performance, and the HPB device-mode implementation flow is already
clear enough. I don't know which functional flows you mean. If it is
HPB host mode, no: that is another big topic, and I think we'd better
not add it to the current driver until we all agree on a final
approach.

3. Regarding how long Daejun's HPB driver has been in use, I can't
easily jump to a conclusion. But certainly, before he disclosed his HPB
driver and submitted it to the community, he made lots of changes and
deletions. That means it still needs lots of testing.

I didn't mean to disrupt the upstreaming of Daejun's patches. If Daejun
can consider HPB 2.0 support while developing the HPB 1.0 patches, that
would be super; then we can quickly add HPB 2.0 support once the HPB
2.0 spec is released. Think about it: who is actually using HPB 1.0
now?

Thanks,
Bean
On Tue, 2020-06-30 at 10:05 +0900, Daejun Park wrote:
> Hi Bean,
>
> > [...]
>
> I am considering a direct submission path for the next version. We
> will implement the write buffer command of HPB 2.0 after the HPB 1.0
> patches.
>
> As for the direct submission of HPB-related commands, including HPB
> write buffer, I think we'd better discuss the right approach in depth
> before moving on to the next step.

Hi Daejun,

If you need reference code, you can freely copy my code from my RFC v3
patchset. Or if you need testing support on my side, just let me know;
I can help you test your code.

Thanks,
Bean
> Hi Daejun,
>
> If you need reference code, you can freely copy my code from my RFC v3
> patchset. Or if you need testing support on my side, just let me know;
> I can help you test your code.

It will be good example code for developing HPB 2.0.

Thanks,
Daejun
> -----Original Message-----
> From: Avri Altman <Avri.Altman@wdc.com>
> Sent: 30 June 2020 12:09
> Subject: RE: [RFC PATCH v3 0/5] scsi: ufs: Add Host Performance
> Booster Support
>
> [...]
>
> I vote to stay with the current implementation because:
> 1) Bean is probably right about 2.0, but it's out of scope for now -
>    there is a long way to go before we'll need to worry about it.
> 2) For now, we should focus on the functional flows. Performance
>    issues, should such issues indeed exist, can be dealt with later.
>    And,
> 3) The current code base has been running in production for more than
>    3 years now. I am not eager to dump robust, well-debugged code
>    unless it is absolutely necessary.

Avri and Bean, I think this is a good approach to take; let us add
incremental patches for future specification enhancements.

> Thanks,
> Avri