Message ID: 963815509.21592879582091.JavaMail.epsvc@epcpadp2
Series: scsi: ufs: Add Host Performance Booster Support
Hi Daejun,

It seems you intentionally ignored my suggestion without commenting on
it, so let me restate my reasoning.

Before submitting your next version of the patch, please check the
logic of your L2P mapping HPB request submission algorithm. I did a
performance comparison test with 4KB reads and saw about a 13%
performance drop. The hit count is also lower. I don't know whether
this is related to your current work-queue scheduling, since you didn't
add a timer for each HPB request.

Thanks,
Bean

On Tue, 2020-06-23 at 10:02 +0900, Daejun Park wrote:
> Changelog:
>
> v2 -> v3
> 1. Add checking of input module parameter values.
> 2. Change base commit from 5.8/scsi-queue to 5.9/scsi-queue.
> 3. Clean up unused variables and labels.
>
> v1 -> v2
> 1. Change the full boilerplate text to SPDX style.
> 2. Adopt dynamic allocation for the sub-region data structure.
> 3. Cleanup.
>
> NAND flash memory-based storage devices use a Flash Translation Layer
> (FTL) to translate logical addresses of I/O requests to the
> corresponding flash memory addresses. Mobile storage devices typically
> have RAM of constrained size and thus lack the memory to keep the
> whole mapping table. Therefore, mapping tables are partially retrieved
> from NAND flash on demand, causing random-read performance degradation.
>
> To improve random read performance, JESD220-3 (HPB v1.0) proposes HPB
> (Host Performance Booster), which uses host system memory as a cache
> for the FTL mapping table. By using HPB, FTL data can be read from
> host memory faster than from NAND flash memory.
>
> The current version only supports the DCM (device control mode).
> This patch set consists of four parts that support the HPB feature:
>
> 1) UFS-feature layer
> 2) HPB probe and initialization process
> 3) READ -> HPB READ using cached map information
> 4) L2P (logical to physical) map management
>
> The UFS-feature is an additional layer that avoids a structure in
> which the UFS-core driver and the UFS-features are entangled with each
> other in a single module. By adding the layer, UFS-features composed
> of various combinations can be supported. Also, even if a new feature
> is added, modification of the UFS-core driver can be minimized.
>
> In the HPB probe and init process, the device information of the UFS
> is queried. After checking supported features, the data structure for
> the HPB is initialized according to the device information.
>
> A read I/O in an active sub-region where the map is cached is changed
> to HPB READ by the HPB module.
>
> The HPB module manages the L2P map using information received from the
> device. For an active sub-region, the HPB module caches the map
> through an ufshpb_map request. For an inactive region, the HPB module
> discards the L2P map. When a write I/O occurs in an active sub-region,
> the associated dirty bitmap is marked dirty to prevent stale reads.
>
> HPB is shown to give a performance improvement of 58 - 67% for a
> random read workload. [1]
>
> This patch series is based on the 5.9/scsi-queue branch.
> [1]:
> https://www.usenix.org/conference/hotstorage17/program/presentation/jeong
>
> Daejun Park (5):
>   scsi: ufs: Add UFS feature related parameter
>   scsi: ufs: Add UFS feature layer
>   scsi: ufs: Introduce HPB module
>   scsi: ufs: L2P map management for HPB read
>   scsi: ufs: Prepare HPB read for cached sub-region
>
>  drivers/scsi/ufs/Kconfig      |    9 +
>  drivers/scsi/ufs/Makefile     |    3 +-
>  drivers/scsi/ufs/ufs.h        |   12 +
>  drivers/scsi/ufs/ufsfeature.c |  148 +++
>  drivers/scsi/ufs/ufsfeature.h |   69 ++
>  drivers/scsi/ufs/ufshcd.c     |   23 +-
>  drivers/scsi/ufs/ufshcd.h     |    3 +
>  drivers/scsi/ufs/ufshpb.c     | 1996 ++++++++++++++++++++++++++++++++++++
>  drivers/scsi/ufs/ufshpb.h     |  234 +++++
>  9 files changed, 2494 insertions(+), 3 deletions(-)
>  create mode 100644 drivers/scsi/ufs/ufsfeature.c
>  create mode 100644 drivers/scsi/ufs/ufsfeature.h
>  create mode 100644 drivers/scsi/ufs/ufshpb.c
>  create mode 100644 drivers/scsi/ufs/ufshpb.h
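To make the stale-read protection described in the cover letter
concrete, here is a minimal sketch of the dirty-bitmap check. All type
and function names below are hypothetical illustrations, not the actual
ufshpb.c code; the point is only that a READ is converted to HPB READ
when the sub-region is active and no overlapping write has dirtied the
cached L2P entries.

#include <linux/bitmap.h>
#include <linux/types.h>

/* Hypothetical per-sub-region state; not the actual ufshpb.c layout. */
struct hpb_subregion {
	bool		activated;	/* L2P map cached from the device */
	unsigned long	*dirty_map;	/* one bit per cached L2P entry */
};

/*
 * A READ may be turned into HPB READ only if the sub-region is active
 * and none of the entries it covers were dirtied by a later write.
 */
static bool hpb_read_can_use_cache(struct hpb_subregion *srgn,
				   unsigned int entry, unsigned int nr)
{
	if (!srgn->activated)
		return false;
	/* no set bit in [entry, entry + nr) => the cached map is clean */
	return find_next_bit(srgn->dirty_map, entry + nr, entry) >= entry + nr;
}

/* A write into an active sub-region marks the covered entries dirty. */
static void hpb_mark_dirty(struct hpb_subregion *srgn,
			   unsigned int entry, unsigned int nr)
{
	bitmap_set(srgn->dirty_map, entry, nr);
}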
If no one else objects, maybe you can submit your patches as non-RFC
for review?

Thanks,
Avri

> -----Original Message-----
> From: Daejun Park <daejun7.park@samsung.com>
> Sent: Tuesday, June 23, 2020 4:02 AM
> Subject: [RFC PATCH v3 0/5] scsi: ufs: Add Host Performance Booster Support
>
> [cover letter quoted in full above - snipped]
Hi Bean,

> Hi Daejun,
>
> It seems you intentionally ignored my suggestion without commenting on
> it, so let me restate my reasoning.
>
> Before submitting your next version of the patch, please check the
> logic of your L2P mapping HPB request submission algorithm. I did a
> performance comparison test with 4KB reads and saw about a 13%
> performance drop. The hit count is also lower. I don't know whether
> this is related to your current work-queue scheduling, since you
> didn't add a timer for each HPB request.

In device control mode, the various decisions, specifically those that
cause repetitive evictions, are made by the device. Is this the issue
you are referring to?

As for the driver, do you see any issue that is causing unnecessary
latency?

Thanks,
Avri
> It seems you intentionally ignored my suggestion without commenting on
> it, so let me restate my reasoning.

Sorry! I replied to your comment (https://lkml.org/lkml/2020/6/15/1492),
but you didn't reply to that. I thought you agreed, because you didn't
send any more comments.

> Before submitting your next version of the patch, please check the
> logic of your L2P mapping HPB request submission algorithm.

We are also reviewing the code that you submitted before. It seems to
be a performance improvement, as it sends map requests directly.

> I did a performance comparison test with 4KB reads and saw about a 13%
> performance drop. The hit count is also lower.

It is interesting that there is actually a performance improvement.
Could you share the test environment, please? However, I think
stability is important for the HPB driver. We have tested our method
with real products, and the HPB 1.0 driver is based on that.

After this patch, could your approach be done as an incremental patch?
I would like to test and verify the patch that you submitted.

> I don't know whether this is related to your current work-queue
> scheduling, since you didn't add a timer for each HPB request.

There was a comment from Bart that it was not good to add an arbitrary
timeout value to the request (please refer to:
https://lkml.org/lkml/2020/6/11/1043). When no timer is added to the
request, the SD timeout is used as the default timeout at the block
layer.

Thanks,
Daejun
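As background for the fallback Daejun describes, a simplified sketch of
the block-layer behavior, modeled on block/blk-timeout.c around v5.9
(not the complete function), looks like this:

#include <linux/blkdev.h>

/* Simplified sketch; the real blk_add_timer() then arms the timer. */
static void sketch_blk_add_timer(struct request *req)
{
	struct request_queue *q = req->q;

	/* no per-request timeout was set, so the queue default applies */
	if (!req->timeout)
		req->timeout = q->rq_timeout; /* for SCSI disks, set by sd */
}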
Hi Avri,

On Mon, 2020-06-29 at 05:24 +0000, Avri Altman wrote:
> Hi Bean,
>
> > [...]
>
> In device control mode, the various decisions, specifically those that
> cause repetitive evictions, are made by the device. Is this the issue
> you are referring to?

For the device control mode, if the HPB mapping table of an active
region becomes dirty on the UFS device side, there are repetitive
inactivation responses, but that is not the reason for the condition I
mentioned here.

> As for the driver, do you see any issue that is causing unnecessary
> latency?

Daejun's patch now uses a work queue: whenever there is a new response
for a sub-region to be activated, the driver queues "work" to this work
queue. This is deferred work; we don't know when it will be scheduled
or finished. We need to optimize it.

Thanks,
Bean
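The pattern Bean is pointing at, sketched with hypothetical names (not
the actual patch code), is roughly:

#include <linux/workqueue.h>
#include <linux/list.h>
#include <linux/spinlock.h>

/* Hypothetical names illustrating the deferral, not the patch itself. */
struct hpb_lu {
	struct workqueue_struct	*map_wq;
	struct work_struct	map_work;	/* drains the to-activate list */
	struct list_head	lh_act_srgn;	/* sub-regions awaiting activation */
	spinlock_t		list_lock;
};

static void hpb_map_work_fn(struct work_struct *work)
{
	struct hpb_lu *hpb = container_of(work, struct hpb_lu, map_work);

	/* build and issue ufshpb_map requests for every queued sub-region */
	(void)hpb;
}

/*
 * Called from the response path: the activation is only *queued* here;
 * when hpb_map_work_fn() actually runs is up to the workqueue.
 */
static void hpb_rsp_activate(struct hpb_lu *hpb, struct list_head *srgn_node)
{
	spin_lock(&hpb->list_lock);
	list_add_tail(srgn_node, &hpb->lh_act_srgn);
	spin_unlock(&hpb->list_lock);

	queue_work(hpb->map_wq, &hpb->map_work);
}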
> Hi Avri,
>
> On Mon, 2020-06-29 at 05:24 +0000, Avri Altman wrote:
> > [...]
> >
> > As for the driver, do you see any issue that is causing unnecessary
> > latency?
>
> Daejun's patch now uses a work queue: whenever there is a new response
> for a sub-region to be activated, the driver queues "work" to this
> work queue. This is deferred work; we don't know when it will be
> scheduled or finished. We need to optimize it.

But those "to-do" lists are checked on every completion interrupt and
on every resume. Do you see any scenario in which the "to-be-activated"
or "to-be-inactivated" work is getting starved?
Hi Daejun,

On Mon, 2020-06-29 at 15:15 +0900, Daejun Park wrote:
> > It seems you intentionally ignored my suggestion without commenting
> > on it, so let me restate my reasoning.
>
> Sorry! I replied to your comment
> (https://lkml.org/lkml/2020/6/15/1492), but you didn't reply to that.
> I thought you agreed, because you didn't send any more comments.
>
> It is interesting that there is actually a performance improvement.
> Could you share the test environment, please? However, I think
> stability is important for the HPB driver. We have tested our method
> with real products, and the HPB 1.0 driver is based on that.

I just ran the fio benchmark tool with --rw=randread, --bs=4kb, and
--size=8G/10G/64G/100G, and compared the performance against the direct
submission approach.

> After this patch, could your approach be done as an incremental patch?
> I would like to test and verify the patch that you submitted.

Taking HPB 2.0 into consideration, can we submit the HPB write request
through the SCSI layer? If not, it will have to be a direct submission.
Why not use the direct way from the start? Or maybe you have a more
advisable approach to work around this; would you please share it with
us? I'd appreciate it.

> There was a comment from Bart that it was not good to add an arbitrary
> timeout value to the request (please refer to:
> https://lkml.org/lkml/2020/6/11/1043). When no timer is added to the
> request, the SD timeout is used as the default timeout at the block
> layer.

I saw that, so I should add a timer in order to optimize HPB request
scheduling/completion. This is OK so far.

Thanks,
Bean
On Mon, 2020-06-29 at 11:06 +0000, Avri Altman wrote:
> > [...]
> >
> > Daejun's patch now uses a work queue: whenever there is a new
> > response for a sub-region to be activated, the driver queues "work"
> > to this work queue. This is deferred work; we don't know when it
> > will be scheduled or finished. We need to optimize it.
>
> But those "to-do" lists are checked on every completion interrupt and
> on every resume. Do you see any scenario in which the
> "to-be-activated" or "to-be-inactivated" work is getting starved?

Let me run more test cases; I will get back to you if there are any new
updates.

Thanks,
Bean
Hi Bean,

> On Mon, 2020-06-29 at 15:15 +0900, Daejun Park wrote:
> > It is interesting that there is actually a performance improvement.
> > Could you share the test environment, please? However, I think
> > stability is important for the HPB driver. We have tested our method
> > with real products, and the HPB 1.0 driver is based on that.
>
> I just ran the fio benchmark tool with --rw=randread, --bs=4kb, and
> --size=8G/10G/64G/100G, and compared the performance against the
> direct submission approach.

Thanks!

> Taking HPB 2.0 into consideration, can we submit the HPB write request
> through the SCSI layer? If not, it will have to be a direct
> submission. Why not use the direct way from the start? Or maybe you
> have a more advisable approach to work around this; would you please
> share it with us? I'd appreciate it.

I am considering a direct submission path for the next version. We will
implement the write buffer command of HPB 2.0 after the HPB 1.0 patches.

As for the direct submission of HPB-related commands, including HPB
write buffer, I think we'd better discuss the right approach in depth
before moving on to the next step.

Thanks,
Daejun
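For reference, a hedged sketch of what such a direct submission could
look like on a ~v5.9 kernel: build a request on the device queue and
hand it straight to the block layer, bypassing the usual SCSI command
setup path. The function name and parameters are hypothetical
illustrations (Bean's RFC patchset is the authoritative example); the
calls shown are the era-appropriate block-layer APIs.

#include <linux/blkdev.h>
#include <linux/string.h>
#include <scsi/scsi_device.h>
#include <scsi/scsi_request.h>

static int hpb_submit_cmd_direct(struct scsi_device *sdev,
				 const unsigned char *cdb, int cdb_len,
				 void *buf, int buf_len, rq_end_io_fn *done)
{
	struct request_queue *q = sdev->request_queue;
	struct scsi_request *sreq;
	struct request *rq;
	int ret;

	rq = blk_get_request(q, REQ_OP_SCSI_IN, 0);
	if (IS_ERR(rq))
		return PTR_ERR(rq);

	/* map the kernel buffer that receives the L2P map data */
	ret = blk_rq_map_kern(q, rq, buf, buf_len, GFP_KERNEL);
	if (ret) {
		blk_put_request(rq);
		return ret;
	}

	sreq = scsi_req(rq);
	sreq->cmd_len = cdb_len;
	memcpy(sreq->cmd, cdb, cdb_len);

	/* 'done' is called on completion; at_head=1 jumps the queue */
	blk_execute_rq_nowait(q, NULL, rq, 1, done);
	return 0;
}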
Hi,

> > Taking HPB 2.0 into consideration, can we submit the HPB write
> > request through the SCSI layer? If not, it will have to be a direct
> > submission. Why not use the direct way from the start?
>
> I am considering a direct submission path for the next version. We
> will implement the write buffer command of HPB 2.0 after the HPB 1.0
> patches.
>
> As for the direct submission of HPB-related commands, including HPB
> write buffer, I think we'd better discuss the right approach in depth
> before moving on to the next step.

I vote to stay with the current implementation because:
1) Bean is probably right about 2.0, but it's out of scope for now -
   there is a long way to go before we'll need to worry about it.
2) For now, we should focus on the functional flows. Performance
   issues, should such issues indeed exist, can be dealt with later.
   And,
3) The current code base has been running in production for more than
   3 years now. I am not eager to dump robust, well-debugged code
   unless it is absolutely necessary.

Thanks,
Avri
On Tue, 2020-06-30 at 06:39 +0000, Avri Altman wrote:
> > [...]
>
> I vote to stay with the current implementation because:
> 1) Bean is probably right about 2.0, but it's out of scope for now -
>    there is a long way to go before we'll need to worry about it.
> 2) For now, we should focus on the functional flows. Performance
>    issues, should such issues indeed exist, can be dealt with later.
>    And,
> 3) The current code base has been running in production for more than
>    3 years now. I am not eager to dump robust, well-debugged code
>    unless it is absolutely necessary.

Hi Avri,

Thanks; I appreciate you sharing your position on this topic. I don't
know how I can convince you to change your opinion, but let me try.

1. HPB 2.0 is not out of scope. HPB 1.0 only supports a 4KB read
length, which is useless; I don't know whether there will be users who
want an HPB driver that only supports a 4KB chunk size. I think we all
know that some smartphone vendors already use HPB 2.0, even though HPB
2.0 has not been released yet; you mentioned this in your earlier
emails. HPB 1.0 is just a transitional (limited) version, so we need to
think about HPB 2.0 support while we develop the HPB 1.0 driver. To say
the least, if we don't think about HPB 2.0 support and just focus on
HPB 1.0, then in the end, after HPB 2.0 is released, we will need to
return to the original point and redo many things. Why can't we fix it
now and think one step further?

2. The major goal of the HPB feature is to improve random read
performance, and the HPB device-mode implementation flow is already
clear enough. I don't know which functional flows you mean. If it is
HPB host mode, no: that is another big topic, and I think we'd better
not add it to the current driver until we all agree on a final
approach.

3. Regarding how long Daejun's HPB driver has been in use, I can't
easily jump to a conclusion. But certainly, before he disclosed his HPB
driver and submitted it to the community, he made lots of changes and
deletions. That means it still needs lots of testing.

I didn't mean to disrupt the upstreaming of Daejun's patches. If Daejun
can consider HPB 2.0 support while developing the HPB 1.0 patches, that
would be super; then we can quickly add HPB 2.0 support once the HPB
2.0 spec is released. Think about it: who is actually using HPB 1.0
now?

Thanks,
Bean
On Tue, 2020-06-30 at 10:05 +0900, Daejun Park wrote:
> Hi Bean,
>
> > [...]
>
> I am considering a direct submission path for the next version. We
> will implement the write buffer command of HPB 2.0 after the HPB 1.0
> patches.
>
> As for the direct submission of HPB-related commands, including HPB
> write buffer, I think we'd better discuss the right approach in depth
> before moving on to the next step.

Hi Daejun,

If you need reference code, you can freely copy my code from my RFC v3
patchset. Or if you need testing support on my side, just let me know;
I can help you test your code.

Thanks,
Bean
> Hi Daejun,
>
> If you need reference code, you can freely copy my code from my RFC v3
> patchset. Or if you need testing support on my side, just let me know;
> I can help you test your code.

It will be good example code for developing HPB 2.0.

Thanks,
Daejun
> -----Original Message-----
> From: Avri Altman <Avri.Altman@wdc.com>
> Sent: 30 June 2020 12:09
> Subject: RE: [RFC PATCH v3 0/5] scsi: ufs: Add Host Performance
> Booster Support
>
> [...]
>
> I vote to stay with the current implementation because:
> 1) Bean is probably right about 2.0, but it's out of scope for now -
>    there is a long way to go before we'll need to worry about it.
> 2) For now, we should focus on the functional flows. Performance
>    issues, should such issues indeed exist, can be dealt with later.
>    And,
> 3) The current code base has been running in production for more than
>    3 years now. I am not eager to dump robust, well-debugged code
>    unless it is absolutely necessary.

Avri and Bean, I think this is a good approach to take; let us add
incremental patches for future specification enhancements.

> Thanks,
> Avri