
[RFC,2/6] scsi: ufs: Rename clk_scaling_lock to host_rw_sem

Message ID 20211004120650.153218-3-adrian.hunter@intel.com (mailing list archive)
State Changes Requested
Series scsi: ufs: Start to make driver synchronization easier to understand

Commit Message

Adrian Hunter Oct. 4, 2021, 12:06 p.m. UTC
To fit its new purpose as a more general purpose sleeping lock for the
host.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
---
 drivers/scsi/ufs/ufshcd.c |  2 +-
 drivers/scsi/ufs/ufshcd.h | 14 +++++++-------
 2 files changed, 8 insertions(+), 8 deletions(-)

Comments

Bart Van Assche Oct. 4, 2021, 4:52 p.m. UTC | #1
On 10/4/21 5:06 AM, Adrian Hunter wrote:
> To fit its new purpose as a more general purpose sleeping lock for the
> host.
> 
> [ ... ]
>
> diff --git a/drivers/scsi/ufs/ufshcd.h b/drivers/scsi/ufs/ufshcd.h
> index 9b1ef272fb3c..495e1c0afae3 100644
> --- a/drivers/scsi/ufs/ufshcd.h
> +++ b/drivers/scsi/ufs/ufshcd.h
> @@ -897,7 +897,7 @@ struct ufs_hba {
>   	enum bkops_status urgent_bkops_lvl;
>   	bool is_urgent_bkops_lvl_checked;
>   
> -	struct rw_semaphore clk_scaling_lock;
> +	struct rw_semaphore host_rw_sem;
>   	unsigned char desc_size[QUERY_DESC_IDN_MAX];
>   	atomic_t scsi_block_reqs_cnt;

Hi Adrian,

It seems to me that this patch series goes in a different direction from the
one the JEDEC UFS committee is taking. The UFSHCI 4.0 specification
will support MCQ (Multi-Circular Queue). We will benefit most from the v4.0
MCQ support if there is no contention between the CPUs that submit UFS commands
to different queues. I think the intention of this patch series is to make a
single synchronization object protect all submission queues. I'm concerned that
this will prevent us from fully benefiting from multiqueue support. Has it been
considered to eliminate the clk_scaling_lock and to use RCU instead to
serialize clock scaling against command processing? One possible approach is to
use blk_mq_freeze_queue() and blk_mq_unfreeze_queue() around the clock scaling
code. A disadvantage of using RCU is that waiting for an RCU grace period takes
some time, about 10 ms on my test setup. I have not yet verified what the
performance and time impact would be of using an expedited RCU grace period
instead of a regular one.

Bart.
Adrian Hunter Oct. 5, 2021, 5:06 a.m. UTC | #2
On 04/10/2021 19:52, Bart Van Assche wrote:
> On 10/4/21 5:06 AM, Adrian Hunter wrote:
>> To fit its new purpose as a more general purpose sleeping lock for the
>> host.
>>
>> [ ... ]
>>
>> diff --git a/drivers/scsi/ufs/ufshcd.h b/drivers/scsi/ufs/ufshcd.h
>> index 9b1ef272fb3c..495e1c0afae3 100644
>> --- a/drivers/scsi/ufs/ufshcd.h
>> +++ b/drivers/scsi/ufs/ufshcd.h
>> @@ -897,7 +897,7 @@ struct ufs_hba {
>>       enum bkops_status urgent_bkops_lvl;
>>       bool is_urgent_bkops_lvl_checked;
>>   -    struct rw_semaphore clk_scaling_lock;
>> +    struct rw_semaphore host_rw_sem;
>>       unsigned char desc_size[QUERY_DESC_IDN_MAX];
>>       atomic_t scsi_block_reqs_cnt;
> 
> Hi Adrian,

Thanks for looking at this.

> 
> It seems to me that this patch series goes in a different direction from the
> one the JEDEC UFS committee is taking. The UFSHCI 4.0 specification
> will support MCQ (Multi-Circular Queue). We will benefit most from the v4.0
> MCQ support if there is no contention between the CPUs that submit UFS commands
> to different queues. I think the intention of this patch series is to make a
> single synchronization object protect all submission queues.

The intention is to make the locking easier to understand.  We either need to
share access to the host (e.g. ufshcd_queuecommand()) or take exclusive
ownership of it (e.g. ufshcd_err_handler(), PM, shutdown).  We should be able
to do that with a single rw_semaphore.

> I'm concerned that
> this will prevent us from fully benefiting from multiqueue support. Has it been

You are talking about contention between instances of ufshcd_queuecommand()
running simultaneously on 2 CPUs, right?  In that case, down_read() should be
practically atomic, so there is no contention unless a third process is waiting
on down_write(), which never happens under normal circumstances.

> Has it been
> considered to eliminate the clk_scaling_lock and to use RCU instead to
> serialize clock scaling against command processing? One possible approach is to
> use blk_mq_freeze_queue() and blk_mq_unfreeze_queue() around the clock scaling
> code. A disadvantage of using RCU is that waiting for an RCU grace period takes
> some time, about 10 ms on my test setup. I have not yet verified what the
> performance and time impact would be of using an expedited RCU grace period
> instead of a regular one.

It is probably worth measuring the performance of clk_scaling_lock first.
Bart Van Assche Oct. 5, 2021, 7:06 p.m. UTC | #3
On 10/4/21 10:06 PM, Adrian Hunter wrote:
> On 04/10/2021 19:52, Bart Van Assche wrote:
>> I'm concerned that
>> this will prevent us from fully benefiting from multiqueue support. Has it been
> 
> You are talking about contention between instances of ufshcd_queuecommand()
> running simultaneously on 2 CPUs, right?  In that case, down_read() should be
> practically atomic, so there is no contention unless a third process is waiting
> on down_write(), which never happens under normal circumstances.
> 
>> Has it been
>> considered to eliminate the clk_scaling_lock and to use RCU instead to
>> serialize clock scaling against command processing? One possible approach is to
>> use blk_mq_freeze_queue() and blk_mq_unfreeze_queue() around the clock scaling
>> code. A disadvantage of using RCU is that waiting for an RCU grace period takes
>> some time, about 10 ms on my test setup. I have not yet verified what the
>> performance and time impact would be of using an expedited RCU grace period
>> instead of a regular one.
> 
> It is probably worth measuring the performance of clk_scaling_lock first.

Upcoming UFS devices support several million IOPS. My experience, and that of
everyone else working with such storage devices, is that every single atomic
operation in the hot path causes a measurable performance overhead.
down_read() is a synchronization operation, and implementing synchronization
operations without atomic loads or stores is not possible. This is why
I see clk_scaling_lock as a performance bottleneck.

Thanks,

Bart.

Patch

diff --git a/drivers/scsi/ufs/ufshcd.c b/drivers/scsi/ufs/ufshcd.c
index 350ecfd90306..3912b74d50ae 100644
--- a/drivers/scsi/ufs/ufshcd.c
+++ b/drivers/scsi/ufs/ufshcd.c
@@ -9630,7 +9630,7 @@ int ufshcd_init(struct ufs_hba *hba, void __iomem *mmio_base, unsigned int irq)
 	/* Initialize mutex for exception event control */
 	mutex_init(&hba->ee_ctrl_mutex);
 
-	init_rwsem(&hba->clk_scaling_lock);
+	init_rwsem(&hba->host_rw_sem);
 
 	ufshcd_init_clk_gating(hba);
 
diff --git a/drivers/scsi/ufs/ufshcd.h b/drivers/scsi/ufs/ufshcd.h
index 9b1ef272fb3c..495e1c0afae3 100644
--- a/drivers/scsi/ufs/ufshcd.h
+++ b/drivers/scsi/ufs/ufshcd.h
@@ -897,7 +897,7 @@ struct ufs_hba {
 	enum bkops_status urgent_bkops_lvl;
 	bool is_urgent_bkops_lvl_checked;
 
-	struct rw_semaphore clk_scaling_lock;
+	struct rw_semaphore host_rw_sem;
 	unsigned char desc_size[QUERY_DESC_IDN_MAX];
 	atomic_t scsi_block_reqs_cnt;
 
@@ -1420,32 +1420,32 @@ static inline int ufshcd_rpmb_rpm_put(struct ufs_hba *hba)
 
 static inline void ufshcd_down_read(struct ufs_hba *hba)
 {
-	down_read(&hba->clk_scaling_lock);
+	down_read(&hba->host_rw_sem);
 }
 
 static inline void ufshcd_up_read(struct ufs_hba *hba)
 {
-	up_read(&hba->clk_scaling_lock);
+	up_read(&hba->host_rw_sem);
 }
 
 static inline int ufshcd_down_read_trylock(struct ufs_hba *hba)
 {
-	return down_read_trylock(&hba->clk_scaling_lock);
+	return down_read_trylock(&hba->host_rw_sem);
 }
 
 static inline void ufshcd_down_write(struct ufs_hba *hba)
 {
-	down_write(&hba->clk_scaling_lock);
+	down_write(&hba->host_rw_sem);
 }
 
 static inline void ufshcd_up_write(struct ufs_hba *hba)
 {
-	up_write(&hba->clk_scaling_lock);
+	up_write(&hba->host_rw_sem);
 }
 
 static inline void ufshcd_downgrade_write(struct ufs_hba *hba)
 {
-	downgrade_write(&hba->clk_scaling_lock);
+	downgrade_write(&hba->host_rw_sem);
 }
 
 #endif /* End of Header */