[RFC,5/5] scsi: ufs: Prepare HPB read for cached sub-region

Message ID	336371513.41591323603173.JavaMail.epsvc@epcpadp1 (mailing list archive)
State	Superseded
Headers	show Return-Path: <SRS0=+N3d=7S=vger.kernel.org=linux-scsi-owner@kernel.org> Mime-Version: 1.0 Subject: [RFC PATCH 5/5] scsi: ufs: Prepare HPB read for cached sub-region Reply-To: daejun7.park@samsung.com From: Daejun Park <daejun7.park@samsung.com> To: Daejun Park <daejun7.park@samsung.com>, ALIM AKHTAR <alim.akhtar@samsung.com>, "avri.altman@wdc.com" <avri.altman@wdc.com>, "jejb@linux.ibm.com" <jejb@linux.ibm.com>, "martin.petersen@oracle.com" <martin.petersen@oracle.com>, "asutoshd@codeaurora.org" <asutoshd@codeaurora.org>, "beanhuo@micron.com" <beanhuo@micron.com>, "stanley.chu@mediatek.com" <stanley.chu@mediatek.com>, "cang@codeaurora.org" <cang@codeaurora.org>, "bvanassche@acm.org" <bvanassche@acm.org>, "tomas.winkler@intel.com" <tomas.winkler@intel.com> CC: "linux-scsi@vger.kernel.org" <linux-scsi@vger.kernel.org>, "linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>, Sang-yoon Oh <sangyoon.oh@samsung.com>, Sung-Jun Park <sungjun07.park@samsung.com>, yongmyung lee <ymhungry.lee@samsung.com>, Jinyoung CHOI <j-young.choi@samsung.com>, Adel Choi <adel.choi@samsung.com>, BoRam Shin <boram.shin@samsung.com> In-Reply-To: <963815509.21591323002276.JavaMail.epsvc@epcpadp1> Message-ID: <336371513.41591323603173.JavaMail.epsvc@epcpadp1> Date: Fri, 05 Jun 2020 11:12:16 +0900 Content-Transfer-Encoding: 7bit Content-Type: text/plain; charset="utf-8" References: <963815509.21591323002276.JavaMail.epsvc@epcpadp1> <231786897.01591322101492.JavaMail.epsvc@epcpadp1> <336371513.41591320902369.JavaMail.epsvc@epcpadp1> <963815509.21591320301642.JavaMail.epsvc@epcpadp1> <231786897.01591320001492.JavaMail.epsvc@epcpadp1> <CGME20200605011604epcms2p8bec8ef6682583d7248dc7d9dc1bfc882@epcms2p2> Sender: linux-scsi-owner@vger.kernel.org Precedence: bulk
Series	scsi: ufs: Add Host Performance Booster Support \| expand [RFC,0/5] scsi: ufs: Add Host Performance Booster Support [RFC,1/5] scsi: ufs: Add UFS feature related parameter [RFC,2/5] scsi: ufs: Add UFS-feature layer [RFC,3/5] scsi: ufs: Introduce HPB module [RFC,4/5] scsi: ufs: L2P map management for HPB read [RFC,5/5] scsi: ufs: Prepare HPB read for cached sub-region

Daejun Park June 5, 2020, 2:12 a.m. UTC

This patch changes the read I/O to the HPB read I/O.

If the logical address of the read I/O belongs to active sub-region, the
HPB driver modifies the read I/O command to HPB read. It modifies the upiu
command of UFS instead of modifying the existing SCSI command.

In the HPB version 1.0, the maximum read I/O size that can be converted to
HPB read is 4KB.

The dirty map of the active sub-region prevents an incorrect HPB read that
has stale physical page number which is updated by previous write I/O.

Signed-off-by: Daejun Park <daejun7.park@samsung.com>
---
 drivers/scsi/ufs/ufshpb.c | 249 ++++++++++++++++++++++++++++++++++++++
 1 file changed, 249 insertions(+)

Avri Altman June 6, 2020, 6:38 p.m. UTC | #1

> +static bool ufshpb_test_ppn_dirty(struct ufshpb_lu *hpb, int rgn_idx,
> +                                  int srgn_idx, int srgn_offset, int cnt)


> +
> +       for (i = 0; i < bit_len; i++) {
> +               if (test_bit(srgn_offset + i, srgn->mctx->ppn_dirty))
Maybe use a mask or hweight instead of testing bit by bit?

> +static void
> +ufshpb_set_hpb_read_to_upiu(struct ufshpb_lu *hpb, struct ufshcd_lrb
> *lrbp,
> +                                 u32 lpn, u64 ppn,  unsigned int transfer_len)
> +{
> +       unsigned char *cdb = lrbp->ucd_req_ptr->sc.cdb;
> +
> +       cdb[0] = UFSHPB_READ;
> +
> +       put_unaligned_be32(lpn, &cdb[2]);
Is this needed? The lba is already occupying bytes 2..5

> +       put_unaligned_be64(ppn, &cdb[6]);
> +       cdb[14] = transfer_len;
> +}
> +

Thanks,
Avri

Daejun Park June 9, 2020, 12:53 a.m. UTC | #2

> > +static bool ufshpb_test_ppn_dirty(struct ufshpb_lu *hpb, int rgn_idx,
> > +                                  int srgn_idx, int srgn_offset, int cnt)

> > +
> > +       for (i = 0; i < bit_len; i++) {
> > +               if (test_bit(srgn_offset + i, srgn->mctx->ppn_dirty))
> Maybe use a mask or hweight instead of testing bit by bit?
There is no problem in this HPB vesion because it only supports 4KB sized read IO.
However, this code is not as efficient as you pointed out. So I will change this in HPB version 2.0.

> > +static void
> > +ufshpb_set_hpb_read_to_upiu(struct ufshpb_lu *hpb, struct ufshcd_lrb
> > *lrbp,
> > +                                 u32 lpn, u64 ppn,  unsigned int transfer_len)
> > +{
> > +       unsigned char *cdb = lrbp->ucd_req_ptr->sc.cdb;
> > +
> > +       cdb[0] = UFSHPB_READ;
> > +
> > +       put_unaligned_be32(lpn, &cdb[2]);
> Is this needed? The lba is already occupying bytes 2..5
The needless code will be deleted on next patch.

Thanks,
Daejun

Bart Van Assche June 9, 2020, 1:23 a.m. UTC | #3

On 2020-06-06 11:38, Avri Altman wrote:
>> +       for (i = 0; i < bit_len; i++) {
>> +               if (test_bit(srgn_offset + i, srgn->mctx->ppn_dirty))
>
> Maybe use a mask or hweight instead of testing bit by bit?

How about using find_next_bit() from include/linux/bitmap.h?

/*
 *  find_next_bit(addr, nbits, bit) Position next set bit in *addr
 *                                  >= bit
 */

Thanks,

Bart.

Bart Van Assche June 11, 2020, 1:37 a.m. UTC | #4

On 2020-06-04 19:12, Daejun Park wrote:
> +static inline bool ufshpb_is_read_cmd(struct scsi_cmnd *cmd)
> +{
> +	if (cmd->cmnd[0] == READ_10 || cmd->cmnd[0] == READ_16)
> +		return true;
> +
> +	return false;
> +}

Has it been considered to check req_op(cmd->request) instead of checking
the SCSI CDB?

> +static inline bool ufshpb_is_write_discard_cmd(struct scsi_cmnd *cmd)
> +{
> +	if (cmd->cmnd[0] == WRITE_10 || cmd->cmnd[0] == WRITE_16 ||
> +	    cmd->cmnd[0] == UNMAP)
> +		return true;
> +
> +	return false;
> +}

Does the above code depend on how the sd driver translates write and
discard requests? Do UFS devices support the WRITE SAME SCSI command?
Has it been considered to check req_op(cmd->request) instead of checking
the SCSI CDB?

> +static inline bool ufshpb_is_support_chunk(int transfer_len)
> +{
> +	return transfer_len <= HPB_MULTI_CHUNK_HIGH;
> +}

The names used in the above function are mysterious. What is a support
chunk? What does "multi chunk high" mean? Please add a comment.

> +static inline u32 ufshpb_get_lpn(struct scsi_cmnd *cmnd)
> +{
> +	return blk_rq_pos(cmnd->request) >>
> +		(ilog2(cmnd->device->sector_size) - 9);
> +}
>
> +static inline unsigned int ufshpb_get_len(struct scsi_cmnd *cmnd)
> +{
> +	return blk_rq_sectors(cmnd->request) >>
> +		(ilog2(cmnd->device->sector_size) - 9);
> +}

Do the above two functions perhaps each include a duplicate of
sectors_to_logical()?

> +/* routine : READ10 -> HPB_READ  */
> +static void ufshpb_prep_fn(struct ufs_hba *hba, struct ufshcd_lrb *lrbp)
> +{
> +	struct ufshpb_lu *hpb;
> +	struct ufshpb_region *rgn;
> +	struct ufshpb_subregion *srgn;
> +	struct scsi_cmnd *cmd = lrbp->cmd;
> +	u32 lpn;
> +	u64 ppn;
> +	unsigned long flags;
> +	int transfer_len, rgn_idx, srgn_idx, srgn_offset;
> +	int err = 0;
> +
> +	hpb = ufshpb_get_hpb_data(cmd);
> +	err = ufshpb_lu_get(hpb);
> +	if (unlikely(err))
> +		return;
> +
> +	WARN_ON(hpb->lun != cmd->device->lun);
        ^^^^^^^
        WARN_ON_ONCE()?

> +	if (!ufshpb_is_write_discard_cmd(cmd) &&
> +	    !ufshpb_is_read_cmd(cmd))
> +		goto put_hpb;
> +
> +	transfer_len = ufshpb_get_len(cmd);
> +	if (unlikely(!transfer_len))
> +		goto put_hpb;
> +
> +	lpn = ufshpb_get_lpn(cmd);
> +	ufshpb_get_pos_from_lpn(hpb, lpn, &rgn_idx, &srgn_idx, &srgn_offset);
> +	rgn = hpb->rgn_tbl + rgn_idx;
> +	srgn = rgn->srgn_tbl + srgn_idx;
> +
> +	/* If commnad type is WRITE and DISCARD, set bitmap as drity */
              ^^^^^^^               ^^^                        ^^^^^
              command?              or?                        dirty?
> +	if (ufshpb_is_write_discard_cmd(cmd)) {
> +		spin_lock_irqsave(&hpb->hpb_state_lock, flags);
> +		ufshpb_set_ppn_dirty(hpb, rgn_idx, srgn_idx, srgn_offset,
> +				 transfer_len);
> +		spin_unlock_irqrestore(&hpb->hpb_state_lock, flags);
> +		goto put_hpb;
> +	}
> +
> +	WARN_ON(!ufshpb_is_read_cmd(cmd));
        ^^^^^^^
        WARN_ON_ONCE()?

> +	if (!ufshpb_is_support_chunk(transfer_len))
> +		goto put_hpb;
> +
> +	spin_lock_irqsave(&hpb->hpb_state_lock, flags);
> +	if (ufshpb_test_ppn_dirty(hpb, rgn_idx, srgn_idx, srgn_offset,
> +				   transfer_len)) {
> +		atomic_inc(&hpb->stats.miss_cnt);
> +		spin_unlock_irqrestore(&hpb->hpb_state_lock, flags);
> +		goto put_hpb;
> +	}
> +
> +	ppn = ufshpb_get_ppn(hpb, srgn->mctx, srgn_offset, &err);
> +	spin_unlock_irqrestore(&hpb->hpb_state_lock, flags);
> +	if (unlikely(err)) {
> +		/*
> +		 * In this case, the region state is active,
> +		 * but the ppn table is not allocated.
> +		 * Make sure that ppn tabie must be allocated on
                                      ^^^^^
                                      table?
> +		 * active state
> +		 */
> +		WARN_ON(true);
> +		dev_err(&hpb->hpb_lu_dev,
> +			"ufshpb_get_ppn failed. err %d\n", err);
> +		goto put_hpb;
> +	}
> +
> +	ufshpb_set_hpb_read_to_upiu(hpb, lrbp, lpn, ppn, transfer_len);
> +
> +	atomic_inc(&hpb->stats.hit_cnt);
> +put_hpb:
> +	ufshpb_lu_put(hpb);
> +}

Thanks,

Bart.

Avri Altman June 11, 2020, 6:43 a.m. UTC | #5

> > +static inline bool ufshpb_is_support_chunk(int transfer_len)
> > +{
> > +     return transfer_len <= HPB_MULTI_CHUNK_HIGH;
> > +}
> 
> The names used in the above function are mysterious. What is a support
> chunk? What does "multi chunk high" mean? Please add a comment.
HPB1.0 limits transfer_len to be at most 1.
HPB2.0, which is in its final draft, allows transfer_len to be at most 128,
But introduce some new behavior depends on transfer_len.
This is just preparing for that.

Thanks,
Avri

Daejun Park June 12, 2020, 3:39 a.m. UTC | #6

On 2020-06-04 18:38, Daejun Park wrote:
> > +  if (total_srgn_cnt != 0) {
> > +    dev_err(hba->dev, "ufshpb(%d) error total_subregion_count %d",
> > +      hpb->lun, total_srgn_cnt);
> > +    goto release_srgn_table;
> > +  }
> > +
> > +  return 0;
> > +release_srgn_table:
> > +  for (i = 0; i < rgn_idx; i++) {
> > +    rgn = rgn_table + i;
> > +    if (rgn->srgn_tbl)
> > +      kvfree(rgn->srgn_tbl);
> > +  }

> Please insert a blank line above goto labels as is done in most of the
> kernel code.
OK, I will fix it.

> > +static struct device_attribute ufshpb_sysfs_entries[] = {
> > +  __ATTR(hit_count, 0444, ufshpb_sysfs_show_hit_cnt, NULL),
> > +  __ATTR(miss_count, 0444, ufshpb_sysfs_show_miss_cnt, NULL),
> > +  __ATTR(rb_noti_count, 0444, ufshpb_sysfs_show_rb_noti_cnt, NULL),
> > +  __ATTR(rb_active_count, 0444, ufshpb_sysfs_show_rb_active_cnt, NULL),
> > +  __ATTR(rb_inactive_count, 0444, ufshpb_sysfs_show_rb_inactive_cnt,
> > +         NULL),
> > +  __ATTR(map_req_count, 0444, ufshpb_sysfs_show_map_req_cnt, NULL),
> > +  __ATTR_NULL
> > +};

> Please use __ATTR_RO() where appropriate.
They are only readable attributes. So I changed the code to use __ATTR_RO.

> > +static int ufshpb_create_sysfs(struct ufs_hba *hba, struct ufshpb_lu *hpb)
> > +{
> > +  struct device_attribute *attr;
> > +  int ret;
> > +
> > +  device_initialize(&hpb->hpb_lu_dev);
> > +
> > +  ufshpb_stat_init(hpb);
> > +
> > +  hpb->hpb_lu_dev.parent = get_device(&hba->ufsf.hpb_dev);
> > +  hpb->hpb_lu_dev.release = ufshpb_dev_release;
> > +  dev_set_name(&hpb->hpb_lu_dev, "ufshpb_lu%d", hpb->lun);
> > +
> > +  ret = device_add(&hpb->hpb_lu_dev);
> > +  if (ret) {
> > +    dev_err(hba->dev, "ufshpb(%d) device_add failed",
> > +      hpb->lun);
> > +    return -ENODEV;
> > +  }
> > +
> > +  for (attr = ufshpb_sysfs_entries; attr->attr.name != NULL; attr++) {
> > +    if (device_create_file(&hpb->hpb_lu_dev, attr))
> > +      dev_err(hba->dev, "ufshpb(%d) %s create file error\n",
> > +        hpb->lun, attr->attr.name);
> > +  }
> > +
> > +  return 0;
> > +}

> This is the wrong way to create sysfs attributes. Please set the
> 'groups' member of struct device instead of using a loop to create sysfs
> attributes. The former approach is compatible with udev but the latter
> approach is not.
OK, I changed to create attributes without loop.

> > +static void ufshpb_probe_async(void *data, async_cookie_t cookie)
> > +{
> > +  struct ufshpb_dev_info hpb_dev_info = { 0 };
> > +  struct ufs_hba *hba = data;
> > +  char *desc_buf;
> > +  int ret;
> > +
> > +  desc_buf = kzalloc(QUERY_DESC_MAX_SIZE, GFP_KERNEL);
> > +  if (!desc_buf)
> > +    goto release_desc_buf;
> > +
> > +  ret = ufshpb_get_dev_info(hba, &hpb_dev_info, desc_buf);
> > +  if (ret)
> > +    goto release_desc_buf;
> > +
> > +  /*
> > +   * Because HPB driver uses scsi_device data structure,
> > +   * we should wait at this point until finishing initialization of all
> > +   * scsi devices. Even if timeout occurs, HPB driver will search
> > +   * the scsi_device list on struct scsi_host (shost->__host list_head)
> > +   * and can find out HPB logical units in all scsi_devices
> > +   */
> > +  wait_event_timeout(hba->ufsf.sdev_wait,
> > +         (atomic_read(&hba->ufsf.slave_conf_cnt)
> > +        == hpb_dev_info.num_lu),
> > +         SDEV_WAIT_TIMEOUT);
> > +
> > +  dev_dbg(hba->dev, "ufshpb: slave count %d, lu count %d\n",
> > +    atomic_read(&hba->ufsf.slave_conf_cnt), hpb_dev_info.num_lu);
> > +
> > +  ufshpb_scan_hpb_lu(hba, &hpb_dev_info, desc_buf);
> > +release_desc_buf:
> > +  kfree(desc_buf);
> > +}

> What happens if two LUNs are added before the above code is woken up?
> Will that perhaps cause the wait_event_timeout() call to wait forever?
I don't think it is problem. I think that the wait_event_timeout() will
check the condition before waiting.

> > +static int ufshpb_probe(struct device *dev)
> > +{
> > +  struct ufs_hba *hba;
> > +  struct ufsf_feature_info *ufsf;
> > +
> > +  if (dev->type != &ufshpb_dev_type)
> > +    return -ENODEV;
> > +
> > +  ufsf = container_of(dev, struct ufsf_feature_info, hpb_dev);
> > +  hba = container_of(ufsf, struct ufs_hba, ufsf);
> > +
> > +  async_schedule(ufshpb_probe_async, hba);
> > +  return 0;
> > +}

> So this is an asynchronous probe that is not visible to the device
> driver core? Could the PROBE_PREFER_ASYNCHRONOUS flag have been used
> instead to make device probing asynchronous?
I added the PROBE_PREFER_ASYNCHRONOUS flag to code and changed it to probe
synchronously.
 
> > +static int ufshpb_remove(struct device *dev)
> > +{
> > +  struct ufshpb_lu *hpb, *n_hpb;
> > +  struct ufsf_feature_info *ufsf;
> > +  struct scsi_device *sdev;
> > +
> > +  ufsf = container_of(dev, struct ufsf_feature_info, hpb_dev);
> > +
> > +  dev_set_drvdata(&ufsf->hpb_dev, NULL);
> > +
> > +  list_for_each_entry_safe(hpb, n_hpb, &ufshpb_drv.lh_hpb_lu,
> > +         list_hpb_lu) {
> > +    ufshpb_set_state(hpb, HPB_FAILED);
> > +
> > +    sdev = hpb->sdev_ufs_lu;
> > +    sdev->hostdata = NULL;
> > +
> > +    device_del(&hpb->hpb_lu_dev);
> > +
> > +    dev_info(&hpb->hpb_lu_dev, "hpb_lu_dev refcnt %d\n",
> > +       kref_read(&hpb->hpb_lu_dev.kobj.kref));
> > +    put_device(&hpb->hpb_lu_dev);
> > +  }
> > +  dev_info(dev, "ufshpb: remove success\n");
> > +
> > +  return 0;
> > +}

> Where is the code that waits for the asynchronously scheduled probe
> calls to finish?
I changed it to probe without async_schedule.

> > diff --git a/drivers/scsi/ufs/ufshpb.h b/drivers/scsi/ufs/ufshpb.h
> > new file mode 100644
> > index 000000000000..c6dd88e00849
> > --- /dev/null
> > +++ b/drivers/scsi/ufs/ufshpb.h
> > @@ -0,0 +1,185 @@
> > +/* SPDX-License-Identifier: GPL-2.0-only */
> > +/*
> > + * Universal Flash Storage Host Performance Booster
> > + *
> > + * Copyright (C) 2017-2018 Samsung Electronics Co., Ltd.
> > + *
> > + * Authors:
> > + *  Yongmyung Lee <ymhungry.lee@samsung.com>
> > + *  Jinyoung Choi <j-young.choi@samsung.com>
> > + *
> > + * This program is free software; you can redistribute it and/or
> > + * modify it under the terms of the GNU General Public License
> > + * as published by the Free Software Foundation; either version 2
> > + * of the License, or (at your option) any later version.
> > + * See the COPYING file in the top-level directory or visit
> > + * <http://www.gnu.org/licenses/gpl-2.0.html>
> > + *
> > + * This program is distributed in the hope that it will be useful,
> > + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> > + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> > + * GNU General Public License for more details.
> > + *
> > + * This program is provided "AS IS" and "WITH ALL FAULTS" and
> > + * without warranty of any kind. You are solely responsible for
> > + * determining the appropriateness of using and distributing
> > + * the program and assume all risks associated with your exercise
> > + * of rights with respect to the program, including but not limited
> > + * to infringement of third party rights, the risks and costs of
> > + * program errors, damage to or loss of data, programs or equipment,
> > + * and unavailability or interruption of operations. Under no
> > + * circumstances will the contributor of this Program be liable for
> > + * any damages of any kind arising from your use or distribution of
> > + * this program.
> > + *
> > + * The Linux Foundation chooses to take subject only to the GPLv2
> > + * license terms, and distributes only under these terms.
> > + */

> Please use an SPDX declaration instead of the full GPLv2 text.
OK, I will.

Thanks,

Daejun.

[RFC,5/5] scsi: ufs: Prepare HPB read for cached sub-region

Commit Message

Comments

Patch