
[RFC,0/5] scsi: ufs: Add Host Performance Booster Support

Message ID 231786897.01591320001492.JavaMail.epsvc@epcpadp1
Series scsi: ufs: Add Host Performance Booster Support

Message

Daejun Park June 5, 2020, 1:16 a.m. UTC
NAND flash memory-based storage devices use a Flash Translation Layer (FTL)
to translate the logical addresses of I/O requests to the corresponding flash
memory addresses. Mobile storage devices typically have only a small amount
of RAM and therefore cannot keep the whole mapping table in memory.
Consequently, parts of the mapping table are retrieved from NAND flash on
demand, which degrades random-read performance.

To improve random-read performance, we propose HPB (Host Performance
Booster), which uses host system memory as a cache for the FTL mapping
table. With HPB, mapping entries can be read from host memory faster than
from NAND flash memory.

The current version supports only DCM (device control mode).
This series consists of four parts that add support for the HPB feature:

1) UFS-feature layer
2) HPB probe and initialization process
3) READ -> HPB READ using cached map information
4) L2P (logical to physical) map management

The UFS-feature layer is an additional layer that prevents the UFS-core
driver and the UFS features from being entangled with each other in a
single module.
With this layer, various combinations of UFS features can be supported,
and adding a new feature requires only minimal changes to the UFS-core
driver.
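To make the idea of a feature layer concrete, here is a minimal sketch in C of a callback table that the core calls through instead of calling each feature directly. It is not the code from patch 2/5: the names `ufsf_ops`, `ufsf_register`, `ufsf_prep`, `ufsf_suspend`, `toy_hpb_prep` and `struct ufs_req` are hypothetical, and only the callback names (`prep_fn`, `reset`, `suspend`, `resume`) are borrowed from the discussion in this thread.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical request descriptor; a real driver passes its own structures. */
struct ufs_req {
	unsigned long long lba;
	bool is_read;
	bool hpb_read;	/* set by a feature that rewrites the command */
};

/* Hypothetical per-feature callback table; any callback may be NULL. */
struct ufsf_ops {
	const char *name;
	void (*prep_fn)(struct ufs_req *req);
	void (*reset)(void);
	void (*suspend)(void);
	void (*resume)(void);
};

#define UFSF_MAX_FEATURES 4

/* The core only calls the ufsf_* helpers below, never a feature directly. */
static const struct ufsf_ops *ufsf_features[UFSF_MAX_FEATURES];
static size_t ufsf_nr_features;

static int ufsf_register(const struct ufsf_ops *ops)
{
	if (ufsf_nr_features >= UFSF_MAX_FEATURES)
		return -1;
	ufsf_features[ufsf_nr_features++] = ops;
	return 0;
}

/* Hook placed in the core's request-preparation path. */
static void ufsf_prep(struct ufs_req *req)
{
	for (size_t i = 0; i < ufsf_nr_features; i++)
		if (ufsf_features[i]->prep_fn)
			ufsf_features[i]->prep_fn(req);
}

/* Hook placed in the core's suspend path. */
static void ufsf_suspend(void)
{
	for (size_t i = 0; i < ufsf_nr_features; i++)
		if (ufsf_features[i]->suspend)
			ufsf_features[i]->suspend();
}

/* Toy feature: flags every read so it can be turned into an HPB READ. */
static void toy_hpb_prep(struct ufs_req *req)
{
	if (req->is_read)
		req->hpb_read = true;
}

static const struct ufsf_ops toy_hpb_ops = {
	.name = "hpb",
	.prep_fn = toy_hpb_prep,
};
```

With this structure, a new feature supplies its own `ufsf_ops` table instead of adding new call sites to the core driver.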

In the HPB probe and initialization process, the UFS device information is
queried. After the supported features are checked, the HPB data structures
are initialized according to the device information.
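As an illustration of sizing the HPB data structures from queried device information, the sketch below derives the region and sub-region counts for one logical unit. The field and function names are hypothetical, sizes are taken in bytes for simplicity (the actual descriptor encoding is not shown), and one 8-byte L2P entry per 4 KiB logical block is an assumption of this sketch.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical subset of the device information queried at probe time. */
struct hpb_dev_info {
	uint64_t lu_size;	/* logical unit size in bytes */
	uint64_t rgn_size;	/* HPB region size in bytes */
	uint64_t srgn_size;	/* HPB sub-region size in bytes */
};

/* Hypothetical per-LU HPB state derived from the device information. */
struct hpb_lu {
	uint32_t rgns_per_lu;
	uint32_t srgns_per_rgn;
	uint64_t srgn_map_bytes;	/* host memory for one cached sub-region map */
	uint8_t *rgn_active;		/* one active/inactive flag per region */
};

#define HPB_BLOCK_SIZE 4096u	/* assumption: one L2P entry per 4 KiB block */
#define HPB_ENTRY_SIZE 8u	/* assumption: 8-byte L2P entry */

static uint64_t div_round_up(uint64_t n, uint64_t d)
{
	return (n + d - 1) / d;
}

/* Returns 0 on success, -1 on invalid geometry or allocation failure. */
static int hpb_lu_init(struct hpb_lu *lu, const struct hpb_dev_info *info)
{
	if (!info->rgn_size || !info->srgn_size ||
	    info->rgn_size % info->srgn_size ||
	    info->srgn_size % HPB_BLOCK_SIZE)
		return -1;

	lu->rgns_per_lu = (uint32_t)div_round_up(info->lu_size, info->rgn_size);
	lu->srgns_per_rgn = (uint32_t)(info->rgn_size / info->srgn_size);
	lu->srgn_map_bytes = info->srgn_size / HPB_BLOCK_SIZE * HPB_ENTRY_SIZE;

	/* All regions start inactive until activation is requested. */
	lu->rgn_active = calloc(lu->rgns_per_lu, sizeof(*lu->rgn_active));
	return lu->rgn_active ? 0 : -1;
}
```

For a 1 GiB logical unit with 32 MiB regions and 16 MiB sub-regions, this yields 32 regions of two sub-regions each, and each cached sub-region map takes 32 KiB of host memory under the 8-byte-entry assumption.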

The HPB module converts a read I/O targeting an active sub-region, whose map
is cached, into an HPB READ.
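The conversion decision can be sketched as follows: locate the sub-region that contains the request's LBA and, if that sub-region is active and its map is cached, attach the cached physical address and mark the command as an HPB READ. This is a simplified sketch, not the patch's code: it assumes 4 KiB logical blocks, 16 MiB sub-regions and single-block reads, uses hypothetical names throughout, and omits the dirty-bitmap check described in the next paragraph.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical geometry: 4096 blocks of 4 KiB (16 MiB) per sub-region. */
#define BLKS_PER_SRGN 4096u

enum srgn_state { SRGN_INACTIVE, SRGN_ACTIVE };

struct hpb_srgn {
	enum srgn_state state;
	const uint64_t *map;	/* cached L2P entries, one per block; NULL if absent */
};

struct read_cmd {
	uint64_t lba;		/* in 4 KiB blocks */
	bool hpb_read;		/* converted to HPB READ? */
	uint64_t ppn;		/* physical address passed to the device */
};

/* Convert a single-block READ into an HPB READ when its sub-region map
 * is cached; otherwise leave it as a normal READ. */
static void hpb_prep_read(struct read_cmd *cmd, const struct hpb_srgn *srgns,
			  uint64_t nr_srgns)
{
	uint64_t idx = cmd->lba / BLKS_PER_SRGN;
	uint64_t off = cmd->lba % BLKS_PER_SRGN;

	cmd->hpb_read = false;
	if (idx >= nr_srgns)
		return;
	if (srgns[idx].state != SRGN_ACTIVE || !srgns[idx].map)
		return;
	cmd->ppn = srgns[idx].map[off];
	cmd->hpb_read = true;
}
```

A read that falls into an inactive or uncached sub-region is simply left as a normal READ, so the device still resolves it through its own FTL.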

The HPB module manages the L2P map using information received from the
device. For an active sub-region, the HPB module caches the map through a
ufshpb_map request. For an inactive region, the HPB module discards the L2P
map. When a write I/O occurs in an active sub-region, the associated bits in
the dirty bitmap are set to prevent stale reads.
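A per-sub-region dirty bitmap can be sketched with one bit per 4 KiB logical block: writes set bits, and a read whose bit is set should not be converted to an HPB READ, because the cached physical address may be stale. The geometry and names are hypothetical, and clearing the bitmap when the sub-region map is re-fetched is an assumption of this sketch.

```c
#include <limits.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical geometry: 4096 blocks of 4 KiB per sub-region. */
#define BLKS_PER_SRGN 4096u
#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)
#define BITMAP_WORDS ((BLKS_PER_SRGN + BITS_PER_WORD - 1) / BITS_PER_WORD)

struct hpb_srgn_dirty {
	unsigned long bits[BITMAP_WORDS];
};

/* Mark blocks [off, off + cnt) of a sub-region dirty after a write. */
static void hpb_set_dirty(struct hpb_srgn_dirty *d, uint32_t off, uint32_t cnt)
{
	uint32_t end = off + cnt;

	if (end > BLKS_PER_SRGN)
		end = BLKS_PER_SRGN;	/* clamp writes that run past the sub-region */
	for (uint32_t i = off; i < end; i++)
		d->bits[i / BITS_PER_WORD] |= 1UL << (i % BITS_PER_WORD);
}

/* Caller must pass off < BLKS_PER_SRGN. */
static bool hpb_is_dirty(const struct hpb_srgn_dirty *d, uint32_t off)
{
	return (d->bits[off / BITS_PER_WORD] >> (off % BITS_PER_WORD)) & 1UL;
}

/* Assumed to run after the sub-region map has been re-fetched. */
static void hpb_clear_dirty(struct hpb_srgn_dirty *d)
{
	memset(d->bits, 0, sizeof(d->bits));
}
```

A read path would consult `hpb_is_dirty()` before converting a READ and fall back to a normal READ when it returns true.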

HPB has been shown to improve performance by 58-67% for random-read
workloads [1].

This patch series is based on the "5.8/scsi-queue" branch.

[1]:
https://www.usenix.org/conference/hotstorage17/program/presentation/jeong

Daejun Park (5):
 scsi: ufs: Add UFS feature related parameter
 scsi: ufs: Add UFS feature layer
 scsi: ufs: Introduce HPB module
 scsi: ufs: L2P map management for HPB read
 scsi: ufs: Prepare HPB read for cached sub-region
 
 drivers/scsi/ufs/Kconfig      |    8 +
 drivers/scsi/ufs/Makefile     |    3 +-
 drivers/scsi/ufs/ufs.h        |   11 +
 drivers/scsi/ufs/ufsfeature.c |  178 ++++
 drivers/scsi/ufs/ufsfeature.h |   95 ++
 drivers/scsi/ufs/ufshcd.c     |   19 +
 drivers/scsi/ufs/ufshcd.h     |    3 +
 drivers/scsi/ufs/ufshpb.c     | 2029 +++++++++++++++++++++++++++++++++++++++++
 drivers/scsi/ufs/ufshpb.h     |  257 ++++++
 9 files changed, 2602 insertions(+), 1 deletion(-)
 created mode 100644 drivers/scsi/ufs/ufsfeature.c
 created mode 100644 drivers/scsi/ufs/ufsfeature.h
 created mode 100644 drivers/scsi/ufs/ufshpb.c
 created mode 100644 drivers/scsi/ufs/ufshpb.h

Comments

Avri Altman June 6, 2020, 12:02 p.m. UTC | #1
Hi,
> 
> NAND flash memory-based storage devices use a Flash Translation Layer (FTL)
> to translate the logical addresses of I/O requests to the corresponding flash
> memory addresses. Mobile storage devices typically have only a small amount
> of RAM and therefore cannot keep the whole mapping table in memory.
> Consequently, parts of the mapping table are retrieved from NAND flash on
> demand, which degrades random-read performance.
>
> To improve random-read performance, we propose HPB (Host Performance
we propose --> JEDEC spec XXX proposes …
and here you should also disclose which version of the spec you are supporting.

> Booster), which uses host system memory as a cache for the FTL mapping
> table. With HPB, mapping entries can be read from host memory faster than
> from NAND flash memory.
>
> The current version supports only DCM (device control mode).
> This series consists of four parts that add support for the HPB feature:
>
> 1) UFS-feature layer
> 2) HPB probe and initialization process
> 3) READ -> HPB READ using cached map information
> 4) L2P (logical to physical) map management
>
> The UFS-feature layer is an additional layer that prevents the UFS-core
> driver and the UFS features from being entangled with each other in a
> single module.
> With this layer, various combinations of UFS features can be supported,
> and adding a new feature requires only minimal changes to the UFS-core
> driver.
Like Bart, I am not sure that this extra module is needed.
It only makes sense if there are indeed common calls that can be shared by several features.
So far, 10 extended features have been defined, but none of them can share a common API.
What other features can share this additional layer, and how can those ops be reused?
If you have future implementations in mind, you should add this API when you add those features.

> 
> In the HPB probe and initialization process, the UFS device information is
> queried. After the supported features are checked, the HPB data structures
> are initialized according to the device information.
>
> The HPB module converts a read I/O targeting an active sub-region, whose map
> is cached, into an HPB READ.
>
> The HPB module manages the L2P map using information received from the
> device. For an active sub-region, the HPB module caches the map through a
> ufshpb_map request. For an inactive region, the HPB module discards the L2P
> map. When a write I/O occurs in an active sub-region, the associated bits in
> the dirty bitmap are set to prevent stale reads.
>
> HPB has been shown to improve performance by 58-67% for random-read
> workloads [1].
>
> This patch series is based on the "5.8/scsi-queue" branch.
> 
> [1]:
> https://www.usenix.org/conference/hotstorage17/program/presentation/jeong
This 2017 study is cited by everyone, but it does not really describe its test setup in detail.
It does say, however, that they used 16MB subregions over a range of 1GB,
which can be covered by 64 active regions, even with a single subregion per region.
This means that no eviction should take place, so the HPB overhead is minimized.
Are there more recent public studies that support those impressive figures?
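The coverage argument above is simple arithmetic. The helper below, with hypothetical names, counts how many regions are needed so that every sub-region of a test range can be active at the same time; it assumes a region-aligned range and treats 1GB and 16MB as binary units (GiB, MiB).

```c
#include <stdint.h>

/* Regions needed so that every sub-region of the range can be active at once.
 * Preconditions: srgn_bytes > 0 and srgns_per_rgn > 0. */
static uint64_t rgns_to_cover(uint64_t range_bytes, uint64_t srgn_bytes,
			      uint64_t srgns_per_rgn)
{
	uint64_t srgns = (range_bytes + srgn_bytes - 1) / srgn_bytes;

	return (srgns + srgns_per_rgn - 1) / srgns_per_rgn;
}
```

For a 1 GiB range with 16 MiB sub-regions it returns 64 with one sub-region per region; if the device allows at least that many active regions, the whole range stays cached and no eviction is needed.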

Thanks,
Avri
Daejun Park June 9, 2020, 12:49 a.m. UTC | #2
Hi,

I appreciate your insightful comments.
  
> we propose --> JEDEC spec XXX proposes …
> and here you should also disclose which version of the spec you are supporting.
I will change it to "JESD220-3 (HPB v1.0) proposes".
This patch series supports HPB version 1.0.

> Like Bart, I am not sure that this extra module is needed.
> It only makes sense if there are indeed common calls that can be shared by several features.
> So far, 10 extended features have been defined, but none of them can share a common API.
> What other features can share this additional layer, and how can those ops be reused?
> If you have future implementations in mind, you should add this API when you add those features.
We added the UFS feature layer with several callbacks into important parts of the UFS control flow.
Other extended features can also be implemented using the proposed APIs.
For example, for WB (WriteBooster), "prep_fn" can be used to guarantee the lifetime of the UFS device by tracking the amount of write I/O that goes to WB.
In addition, reset/reset_host/suspend/resume can be used to manage a kernel task that checks the lifetime of the UFS device.
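The WB example can be sketched as a prep_fn-style hook that adds up write I/O while WB is enabled and turns WB off once a budget is reached. This only illustrates the idea in the paragraph above: the structure, the budget and the hook signature are hypothetical, and the actual WB lifetime policy is not specified in this thread.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical WriteBooster accounting state. */
struct wb_state {
	bool enabled;
	uint64_t written_bytes;	/* write I/O issued while WB is enabled */
	uint64_t budget_bytes;	/* hypothetical lifetime budget */
};

/* Hypothetical hook called for every request from the feature layer. */
static void wb_prep_fn(struct wb_state *wb, bool is_write, uint64_t bytes)
{
	if (!is_write || !wb->enabled)
		return;
	wb->written_bytes += bytes;
	if (wb->written_bytes >= wb->budget_bytes)
		wb->enabled = false;	/* stop using WB to protect device lifetime */
}
```

Reads and writes issued after WB is disabled leave the counter unchanged, so the accounting reflects only the write I/O that actually went to WB.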

> This 2017 study is cited by everyone, but it does not really describe its test setup in detail.
> It does say, however, that they used 16MB subregions over a range of 1GB,
> which can be covered by 64 active regions, even with a single subregion per region.
> This means that no eviction should take place, so the HPB overhead is minimized.
> Are there more recent public studies that support those impressive figures?
There are currently no other public studies.
However, there is an internal report showing that HPB improves read latency in Android
use-case scenarios as well as in benchmarks.

Thanks,
Daejun
Avri Altman June 9, 2020, 7 a.m. UTC | #3
> > Like Bart, I am not sure that this extra module is needed.
> > It only makes sense if there are indeed common calls that can be shared by several features.
> > So far, 10 extended features have been defined, but none of them can share a common API.
> > What other features can share this additional layer, and how can those ops be reused?
> > If you have future implementations in mind, you should add this API when you add those features.
> We added the UFS feature layer with several callbacks into important parts of the UFS control flow.
> Other extended features can also be implemented using the proposed APIs.
> For example, for WB (WriteBooster), "prep_fn" can be used to guarantee the lifetime of the UFS device by tracking the amount of write I/O that goes to WB.
This is an interesting idea.

> In addition, reset/reset_host/suspend/resume can be used to manage a kernel task that checks the lifetime of the UFS device.
Another interesting idea.

Fair enough. Please share your plans in the commit log of patch 2/5.
Otherwise, just for HPB, it seems excessive.
Bean Huo June 10, 2020, 9:50 a.m. UTC | #4
Hi Daejun

Nice to see your patch. I just ran it in my test environment, and it works.
In the next few days, I can help you review your patch.

Thanks,
Bean