diff mbox series

[2/2] scsi: ufs: core: Fix the code for entering hibernation

Message ID 20240821182923.145631-3-bvanassche@acm.org (mailing list archive)
State Changes Requested
Headers show
Series Fix the UFS driver hibernation code | expand

Commit Message

Bart Van Assche Aug. 21, 2024, 6:29 p.m. UTC
Accessing a host controller register after the host controller has
entered the hibernation state may cause the host controller to exit the
hibernation state. Hence rework the hibernation entry code such that it
does not modify the interrupt enabled status. This patch relies on the
following:
* If an UIC command is submitted that should be completed by the UIC
  command completion interrupt, hba->uic_async_done == NULL.
* If an UIC command is submitted that should be completed by the power
  mode change interrupt or by a hibernation state change interrupt,
  hba->uic_async_done != NULL.

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
---
 drivers/ufs/core/ufshcd.c | 22 ++++++----------------
 include/ufs/ufshcd.h      |  7 ++++---
 2 files changed, 10 insertions(+), 19 deletions(-)

Comments

Bean Huo Aug. 21, 2024, 9:27 p.m. UTC | #1
On Wed, 2024-08-21 at 11:29 -0700, Bart Van Assche wrote:
> drivers/ufs/core/ufshcd.c | 22 ++++++----------------
>  include/ufs/ufshcd.h      |  7 ++++---
>  2 files changed, 10 insertions(+), 19 deletions(-)
> 
> diff --git a/drivers/ufs/core/ufshcd.c b/drivers/ufs/core/ufshcd.c
> index d0ae6e50becc..e12f30b8a83c 100644
> --- a/drivers/ufs/core/ufshcd.c
> +++ b/drivers/ufs/core/ufshcd.c
> @@ -2585,6 +2585,7 @@ int ufshcd_send_uic_cmd(struct ufs_hba *hba,
> struct uic_command *uic_cmd)
>         ufshcd_hold(hba);
>         mutex_lock(&hba->uic_cmd_mutex);
>         ufshcd_add_delay_before_dme_cmd(hba);
> +       WARN_ON(hba->uic_async_done);
>  
>         ret = __ufshcd_send_uic_cmd(hba, uic_cmd, true);
>         if (!ret)
> @@ -4255,7 +4256,6 @@ static int ufshcd_uic_pwr_ctrl(struct ufs_hba
> *hba, struct uic_command *cmd)
>         unsigned long flags;
>         u8 status;
>         int ret;
> -       bool reenable_intr = false;
>  
>         mutex_lock(&hba->uic_cmd_mutex);
>         ufshcd_add_delay_before_dme_cmd(hba);
> @@ -4266,15 +4266,6 @@ static int ufshcd_uic_pwr_ctrl(struct ufs_hba
> *hba, struct uic_command *cmd)
>                 goto out_unlock;
>         }
>         hba->uic_async_done = &uic_async_done;
> -       if (ufshcd_readl(hba, REG_INTERRUPT_ENABLE) &
> UIC_COMMAND_COMPL) {
> -               ufshcd_disable_intr(hba, UIC_COMMAND_COMPL);
> -               /*
> -                * Make sure UIC command completion interrupt is
> disabled before
> -                * issuing UIC command.
> -                */
> -               ufshcd_readl(hba, REG_INTERRUPT_ENABLE);
> -               reenable_intr = true;
> -       }
>         spin_unlock_irqrestore(hba->host->host_lock, flags);
>         ret = __ufshcd_send_uic_cmd(hba, cmd, false);
>         if (ret) {
> @@ -4318,8 +4309,6 @@ static int ufshcd_uic_pwr_ctrl(struct ufs_hba
> *hba, struct uic_command *cmd)
>         spin_lock_irqsave(hba->host->host_lock, flags);
>         hba->active_uic_cmd = NULL;
>         hba->uic_async_done = NULL;
> -       if (reenable_intr)
> -               ufshcd_enable_intr(hba, UIC_COMMAND_COMPL);
>         if (ret) {
>                 ufshcd_set_link_broken(hba);
>                 ufshcd_schedule_eh_work(hba);
> @@ -5472,11 +5461,12 @@ static irqreturn_t
> ufshcd_uic_cmd_compl(struct ufs_hba *hba, u32 intr_status)
>                 hba->errors |= (UFSHCD_UIC_HIBERN8_MASK &
> intr_status);
>  
>         if (intr_status & UIC_COMMAND_COMPL && cmd) {
> -               cmd->argument2 |= ufshcd_get_uic_cmd_result(hba);
> -               cmd->argument3 = ufshcd_get_dme_attr_val(hba);

Bart, 

My only concern is, removing disabling UIC completion IRQ, and keeping
is.uccs 1, then we don't read its status in case of ufshcd_uic_pwr_ctrl
path, whether this will affect the next UIC access result.

Other things look good to me.

Kind regards,
Bean
Bart Van Assche Aug. 21, 2024, 9:39 p.m. UTC | #2
On 8/21/24 2:27 PM, Bean Huo wrote:
> My only concern is, removing disabling UIC completion IRQ, and keeping
> is.uccs 1, then we don't read its status in case of ufshcd_uic_pwr_ctrl
> path, whether this will affect the next UIC access result.

Hmm ... I think I need more context information. If the UIC completion
interrupt is left enabled then ufshcd_intr() will execute the code
"intr_status = ufshcd_readl(hba, REG_INTERRUPT_STATUS)". This statement
reads all bits from the interrupt status register including the UCCS
bit, isn't it?

Thanks,

Bart.
Bao D. Nguyen Aug. 21, 2024, 11:26 p.m. UTC | #3
On 8/21/2024 11:29 AM, Bart Van Assche wrote:
> Accessing a host controller register after the host controller has
> entered the hibernation state may cause the host controller to exit the
> hibernation state. Hence rework the hibernation entry code such that it
> does not modify the interrupt enabled status. Bart,
I am not clear on the offending condition, particularly the term 
"hibernation" used in this context. In the function 
ufshcd_uic_pwr_ctrl() where you are making the change, the host 
controller is fully active at this point, right?
Please help me clarify the issue.

Thanks, Bao
Bart Van Assche Aug. 22, 2024, 12:14 a.m. UTC | #4
On 8/21/24 4:26 PM, Bao D. Nguyen wrote:
> On 8/21/2024 11:29 AM, Bart Van Assche wrote:
>> Accessing a host controller register after the host controller has
>> entered the hibernation state may cause the host controller to exit the
>> hibernation state. Hence rework the hibernation entry code such that it
>> does not modify the interrupt enabled status. Bart,
 >
> I am not clear on the offending condition, particularly the term 
> "hibernation" used in this context. In the function 
> ufshcd_uic_pwr_ctrl() where you are making the change, the host 
> controller is fully active at this point, right?
> Please help me clarify the issue.

Hi Bao,

Isn't "hibernation" terminology that comes from the M-PHY standard?
See also the DME_HIBERNATE_ENTER and DME_HIBERNATE_EXIT constants in
the UFSHCI driver source code. Please let me know if you need more
information.

Bart.
Bao D. Nguyen Aug. 22, 2024, 1:05 a.m. UTC | #5
On 8/21/2024 5:14 PM, Bart Van Assche wrote:
> On 8/21/24 4:26 PM, Bao D. Nguyen wrote:
>> On 8/21/2024 11:29 AM, Bart Van Assche wrote:
>>> Accessing a host controller register after the host controller has
>>> entered the hibernation state may cause the host controller to exit the
>>> hibernation state. Hence rework the hibernation entry code such that it
>>> does not modify the interrupt enabled status. Bart,
>  >
>> I am not clear on the offending condition, particularly the term 
>> "hibernation" used in this context. In the function 
>> ufshcd_uic_pwr_ctrl() where you are making the change, the host 
>> controller is fully active at this point, right?
>> Please help me clarify the issue.
> 
> Hi Bao,
> 
> Isn't "hibernation" terminology that comes from the M-PHY standard?
> See also the DME_HIBERNATE_ENTER and DME_HIBERNATE_EXIT constants in
> the UFSHCI driver source code. Please let me know if you need more
> information.

I see. Thanks Bart.
If I understand correctly, the link is hibernated because we had a 
successful ufshcd_uic_hibern8_enter() earlier. Then the 
ufshcd_uic_pwr_ctrl() is invoked probably from a power mode change 
command? (a callstack would be helpful in this case).
Anyway, accessing the UFSHCI host controller registers space somehow 
affected the M-PHY link state? If my understanding is correct, it seems 
like a hardware issue that we are trying to work around?

Thanks, Bao
Peter Wang (王信友) Aug. 22, 2024, 5:36 a.m. UTC | #6
On Wed, 2024-08-21 at 11:29 -0700, Bart Van Assche wrote:
>  	 
> External email : Please do not click links or open attachments until
> you have verified the sender or the content.
>  Accessing a host controller register after the host controller has
> entered the hibernation state may cause the host controller to exit
> the
> hibernation state. Hence rework the hibernation entry code such that
> it
> does not modify the interrupt enabled status. This patch relies on
> the
> following:
> * If an UIC command is submitted that should be completed by the UIC
>   command completion interrupt, hba->uic_async_done == NULL.
> * If an UIC command is submitted that should be completed by the
> power
>   mode change interrupt or by a hibernation state change interrupt,
>   hba->uic_async_done != NULL.
> 
> Signed-off-by: Bart Van Assche <bvanassche@acm.org>
> 

Hi Bart,

If link enter hibernate, change power mode should wakeup link 
to active before power mode change.
So, even manual hibernate exit by clock ungating or auto 
hiberante exit by hardware, write hci interrupt register exit 
hiberante should not a problem?

Thanks.
Peter
Manivannan Sadhasivam Aug. 22, 2024, 6:36 a.m. UTC | #7
On Wed, Aug 21, 2024 at 05:14:52PM -0700, Bart Van Assche wrote:
> On 8/21/24 4:26 PM, Bao D. Nguyen wrote:
> > On 8/21/2024 11:29 AM, Bart Van Assche wrote:
> > > Accessing a host controller register after the host controller has
> > > entered the hibernation state may cause the host controller to exit the
> > > hibernation state. Hence rework the hibernation entry code such that it
> > > does not modify the interrupt enabled status. Bart,
> >
> > I am not clear on the offending condition, particularly the term
> > "hibernation" used in this context. In the function
> > ufshcd_uic_pwr_ctrl() where you are making the change, the host
> > controller is fully active at this point, right?
> > Please help me clarify the issue.
> 
> Hi Bao,
> 
> Isn't "hibernation" terminology that comes from the M-PHY standard?

Yeah, it creates confusion between OS hibernation and M-PHY hibernation. Maybe
saying 'link hibernation' would avoid this.

- Mani
Bean Huo Aug. 22, 2024, 2:17 p.m. UTC | #8
On Wed, 2024-08-21 at 14:39 -0700, Bart Van Assche wrote:
> On 8/21/24 2:27 PM, Bean Huo wrote:
> > My only concern is, removing disabling UIC completion IRQ, and
> > keeping
> > is.uccs 1, then we don't read its status in case of
> > ufshcd_uic_pwr_ctrl
> > path, whether this will affect the next UIC access result.
> 
> Hmm ... I think I need more context information. If the UIC
> completion
> interrupt is left enabled then ufshcd_intr() will execute the code
> "intr_status = ufshcd_readl(hba, REG_INTERRUPT_STATUS)". This
> statement
> reads all bits from the interrupt status register including the UCCS
> bit, isn't it?
> 
> Thanks,
> 
> Bart.

One UIC power ctrl command will generate two Compl interrupts, one is
command complete (UIC_COMMAND_COMPL) and the other is power switch
complete (UFSHCD_UIC_PWR_MASK). Is that right? I checked the current
code and we don't read the UFSHCI registers except when we read intr
status before ufshcd_uic_cmd_compl() and re-enable UIC compl interrupt.

Do you mean re-enabling UIC complete interrupt will cause the problem?


Kind regards,
Bean
Bart Van Assche Aug. 22, 2024, 5:51 p.m. UTC | #9
On 8/22/24 7:17 AM, Bean Huo wrote:
> Do you mean re-enabling UIC complete interrupt will cause the problem?

That's correct. ufshcd_uic_hibern8_enter() calls ufshcd_uic_pwr_ctrl()
indirectly. For the test setup that is on my desk, the code in
ufshcd_uic_pwr_ctrl() that re-enables the UIC completion interrupt
causes the UFS host controller to exit hibernation.

Thanks,

Bart.
Bart Van Assche Aug. 22, 2024, 6:13 p.m. UTC | #10
On 8/21/24 6:05 PM, Bao D. Nguyen wrote:
> If I understand correctly, the link is hibernated because we had a 
> successful ufshcd_uic_hibern8_enter() earlier. Then the 
> ufshcd_uic_pwr_ctrl() is invoked probably from a power mode change 
> command? (a callstack would be helpful in this case).

Hi Bao,

ufshcd_uic_hibern8_enter() calls ufshcd_uic_pwr_ctrl() directly. The
former function causes the latter to send the UIC_CMD_DME_HIBER_ENTER
command. Although UIC_CMD_DME_HIBER_ENTER causes the link to enter the
hibernation state, the code in ufshcd_uic_pwr_ctrl() for re-enabling
interrupts causes the UFS host controller that I'm testing to exit the
hibernation state.

> Anyway, accessing the UFSHCI host controller registers space somehow 
> affected the M-PHY link state? If my understanding is correct, it seems 
> like a hardware issue that we are trying to work around?

Yes, this is a workaround particular behavior of a particular UFS
controller.

Thanks,

Bart.
Bao D. Nguyen Aug. 22, 2024, 8:54 p.m. UTC | #11
On 8/22/2024 11:13 AM, Bart Van Assche wrote:
> On 8/21/24 6:05 PM, Bao D. Nguyen wrote:
>> If I understand correctly, the link is hibernated because we had a 
>> successful ufshcd_uic_hibern8_enter() earlier. Then the 
>> ufshcd_uic_pwr_ctrl() is invoked probably from a power mode change 
>> command? (a callstack would be helpful in this case).
> 
> Hi Bao,
> 
> ufshcd_uic_hibern8_enter() calls ufshcd_uic_pwr_ctrl() directly. The
> former function causes the latter to send the UIC_CMD_DME_HIBER_ENTER
> command. Although UIC_CMD_DME_HIBER_ENTER causes the link to enter the
> hibernation state, the code in ufshcd_uic_pwr_ctrl() for re-enabling
> interrupts causes the UFS host controller that I'm testing to exit the
> hibernation state.
> 
>> Anyway, accessing the UFSHCI host controller registers space somehow 
>> affected the M-PHY link state? If my understanding is correct, it 
>> seems like a hardware issue that we are trying to work around?
> 
> Yes, this is a workaround particular behavior of a particular UFS
> controller.

Thank you for the clarification, Bart.
I am just curious about providing workaround for a hardware issue in the 
ufs core driver. Sometimes I notice the community is not accepting such 
a change and push the change to be implemented in the vendor/platform 
drivers.

Now about your workaround, I have the same concern as Bean.
For a ufshcd_uic_pwr_ctrl() command, i.e PMC, hibern8_enter/exit() 
commands, you will get 2 ufshcd_uic_cmd_compl() interrupts or 1 uic 
completion interrupt with 2 status bits set at once.
a. UIC_COMMAND_COMPL is set
b. and one of these bits UIC_POWER_MODE || UIC_HIBERNATE_EXIT || 
UIC_HIBERNATE_ENTER is set, depending on the completed uic command. 


In your proposed change to ufshcd_uic_cmd_compl(), you are signalling 
both complete(&cmd->done) and complete(hba->uic_async_done) for a single 
ufshcd_uic_pwr_ctrl() command which can cause issues.

Thanks, Bao

> 
> Thanks,
> 
> Bart.
>
Bart Van Assche Aug. 22, 2024, 9:08 p.m. UTC | #12
On 8/22/24 1:54 PM, Bao D. Nguyen wrote:
> I am just curious about providing workaround for a hardware issue in the 
> ufs core driver. Sometimes I notice the community is not accepting such 
> a change and push the change to be implemented in the vendor/platform 
> drivers.

There are two reasons why I propose to implement this workaround as a
change for the UFS driver core:
- I am not aware of any way to implement the behavior change in a vendor
   driver without modifying the UFS driver core.
- The workaround results in a simplification of the UFS driver core
   code. More lines are removed from the UFS driver than added.

Although it would be possible to select whether the old or the new
behavior is selected by introducing yet another host controller quirk, I
prefer not to do that because it would make the UFSHCI driver even more
complex.

> In your proposed change to ufshcd_uic_cmd_compl(), you are signalling 
> both complete(&cmd->done) and complete(hba->uic_async_done) for a single 
> ufshcd_uic_pwr_ctrl() command which can cause issues.

Please take another look at patch 2/2. With or without this patch
applied, only hba->uic_async_done is signaled when a power mode UIC
command completes.

Thanks,

Bart.
Bao D. Nguyen Aug. 22, 2024, 11:34 p.m. UTC | #13
On 8/21/2024 11:29 AM, Bart Van Assche wrote:
> Accessing a host controller register after the host controller has
> entered the hibernation state may cause the host controller to exit the
> hibernation state. Hence rework the hibernation entry code such that it
> does not modify the interrupt enabled status. This patch relies on the
> following:
> * If an UIC command is submitted that should be completed by the UIC
>    command completion interrupt, hba->uic_async_done == NULL.
> * If an UIC command is submitted that should be completed by the power
>    mode change interrupt or by a hibernation state change interrupt,
>    hba->uic_async_done != NULL.
> 
> Signed-off-by: Bart Van Assche <bvanassche@acm.org>
> ---
>   drivers/ufs/core/ufshcd.c | 22 ++++++----------------
>   include/ufs/ufshcd.h      |  7 ++++---
>   2 files changed, 10 insertions(+), 19 deletions(-)
> 
> diff --git a/drivers/ufs/core/ufshcd.c b/drivers/ufs/core/ufshcd.c
> index d0ae6e50becc..e12f30b8a83c 100644
> --- a/drivers/ufs/core/ufshcd.c
> +++ b/drivers/ufs/core/ufshcd.c
> @@ -2585,6 +2585,7 @@ int ufshcd_send_uic_cmd(struct ufs_hba *hba, struct uic_command *uic_cmd)
>   	ufshcd_hold(hba);
>   	mutex_lock(&hba->uic_cmd_mutex);
>   	ufshcd_add_delay_before_dme_cmd(hba);
> +	WARN_ON(hba->uic_async_done);
>   
>   	ret = __ufshcd_send_uic_cmd(hba, uic_cmd, true);
>   	if (!ret)
> @@ -4255,7 +4256,6 @@ static int ufshcd_uic_pwr_ctrl(struct ufs_hba *hba, struct uic_command *cmd)
>   	unsigned long flags;
>   	u8 status;
>   	int ret;
> -	bool reenable_intr = false;
>   
>   	mutex_lock(&hba->uic_cmd_mutex);
>   	ufshcd_add_delay_before_dme_cmd(hba);
> @@ -4266,15 +4266,6 @@ static int ufshcd_uic_pwr_ctrl(struct ufs_hba *hba, struct uic_command *cmd)
>   		goto out_unlock;
>   	}
>   	hba->uic_async_done = &uic_async_done;
> -	if (ufshcd_readl(hba, REG_INTERRUPT_ENABLE) & UIC_COMMAND_COMPL) {
> -		ufshcd_disable_intr(hba, UIC_COMMAND_COMPL);
> -		/*
> -		 * Make sure UIC command completion interrupt is disabled before
> -		 * issuing UIC command.
> -		 */
> -		ufshcd_readl(hba, REG_INTERRUPT_ENABLE);
> -		reenable_intr = true;
> -	}
>   	spin_unlock_irqrestore(hba->host->host_lock, flags);
>   	ret = __ufshcd_send_uic_cmd(hba, cmd, false);
>   	if (ret) {
> @@ -4318,8 +4309,6 @@ static int ufshcd_uic_pwr_ctrl(struct ufs_hba *hba, struct uic_command *cmd)
>   	spin_lock_irqsave(hba->host->host_lock, flags);
>   	hba->active_uic_cmd = NULL;
>   	hba->uic_async_done = NULL;
> -	if (reenable_intr)
> -		ufshcd_enable_intr(hba, UIC_COMMAND_COMPL);
>   	if (ret) {
>   		ufshcd_set_link_broken(hba);
>   		ufshcd_schedule_eh_work(hba);
> @@ -5472,11 +5461,12 @@ static irqreturn_t ufshcd_uic_cmd_compl(struct ufs_hba *hba, u32 intr_status)
>   		hba->errors |= (UFSHCD_UIC_HIBERN8_MASK & intr_status);
>   
>   	if (intr_status & UIC_COMMAND_COMPL && cmd) {

Let's consider this corner case. I am not sure if it's a valid concern, 
but it is a corner case that the extra interrupt (new behavior 
introduced by this patch) may bring.

Let's say you are sending a ufshcd_uic_pwr_ctrl() command. You will get 
2 uic completion interrupts:
[1] ufshcd_uic_cmd_compl() is called for the first interrupt which 
happens to be UFSHCD_UIC_PWR_MASK only. At the end of the 
ufshcd_uic_pwr_ctrl(), you would set the hba->active_uic_cmd to NULL.
[2]The second uic completion interrupt for UIC_COMMAND_COMP is delayed.
This interrupt is newly introduced by this patch.

Now let's say you have a new UIC command coming via 
ufshcd_send_uic_cmd(). The ufshcd_dispatch_uic_cmd() will update your 
hba->active_uic_cmd to the new uic_cmd.

The delayed interrupt mentioned in [2] arrives late. The 
ufshcd_uic_cmd_compl() is called. In this scenario, this check here may 
return true "if (intr_status & UIC_COMMAND_COMPL && cmd) {..}"

Now, the ufshcd_uic_cmd_compl() would proceed to complete the 
UIC_COMMAND_COMPL interrupt for the new command intended for the 
previous "old" uic command.

Thanks, Bao

> -		cmd->argument2 |= ufshcd_get_uic_cmd_result(hba);
> -		cmd->argument3 = ufshcd_get_dme_attr_val(hba);
> -		if (!hba->uic_async_done)
> +		if (!hba->uic_async_done) {
> +			cmd->argument2 |= ufshcd_get_uic_cmd_result(hba);
> +			cmd->argument3 = ufshcd_get_dme_attr_val(hba);
>   			cmd->cmd_active = 0;
> -		complete(&cmd->done);
> +			complete(&cmd->done);
> +		}
>   		retval = IRQ_HANDLED;
>   	}
>   
> diff --git a/include/ufs/ufshcd.h b/include/ufs/ufshcd.h
> index a43b14276bc3..0577013a4611 100644
> --- a/include/ufs/ufshcd.h
> +++ b/include/ufs/ufshcd.h
> @@ -868,9 +868,10 @@ enum ufshcd_mcq_opr {
>    * @tmf_tag_set: TMF tag set.
>    * @tmf_queue: Used to allocate TMF tags.
>    * @tmf_rqs: array with pointers to TMF requests while these are in progress.
> - * @active_uic_cmd: handle of active UIC command
> - * @uic_cmd_mutex: mutex for UIC command
> - * @uic_async_done: completion used during UIC processing
> + * @active_uic_cmd: active UIC command pointer.
> + * @uic_cmd_mutex: mutex used to serialize UIC command processing.
> + * @uic_async_done: completion used to wait for power mode or hibernation state
> + *	changes.
>    * @ufshcd_state: UFSHCD state
>    * @eh_flags: Error handling flags
>    * @intr_mask: Interrupt Mask Bits
>
Bart Van Assche Aug. 23, 2024, 2:06 a.m. UTC | #14
On 8/22/24 4:34 PM, Bao D. Nguyen wrote:
> Let's say you are sending a ufshcd_uic_pwr_ctrl() command. You will get 
> 2 uic completion interrupts:
> [1] ufshcd_uic_cmd_compl() is called for the first interrupt which 
> happens to be UFSHCD_UIC_PWR_MASK only. At the end of the 
> ufshcd_uic_pwr_ctrl(), you would set the hba->active_uic_cmd to NULL.

That's not correct. ufshcd_uic_pwr_ctrl() only clears
hba->active_uic_cmd after the power mode change interrupt has been
processed.

> [2]The second uic completion interrupt for UIC_COMMAND_COMP is delayed.
> This interrupt is newly introduced by this patch.

If UIC_COMMAND_COMPL is delivered after UFSHCD_UIC_PWR_MASK then the
UIC_COMMAND_COMPL interrupt will be ignored because hba->active_uic_cmd
is cleared by ufshcd_uic_pwr_ctrl() after it has processed the
power mode change interrupt.

> Now let's say you have a new UIC command coming via 
> ufshcd_send_uic_cmd(). The ufshcd_dispatch_uic_cmd() will update your 
> hba->active_uic_cmd to the new uic_cmd.

UIC command processing is serialized by hba->uic_cmd_mutex. Hence only
one UIC command is processed at any given time.

Does this address your concerns?

Thanks,

Bart.
Peter Wang (王信友) Aug. 23, 2024, 2:57 a.m. UTC | #15
On Thu, 2024-08-22 at 19:06 -0700, Bart Van Assche wrote:
>  	 
> External email : Please do not click links or open attachments until
> you have verified the sender or the content.
>  On 8/22/24 4:34 PM, Bao D. Nguyen wrote:
> > Let's say you are sending a ufshcd_uic_pwr_ctrl() command. You will
> get 
> > 2 uic completion interrupts:
> > [1] ufshcd_uic_cmd_compl() is called for the first interrupt which 
> > happens to be UFSHCD_UIC_PWR_MASK only. At the end of the 
> > ufshcd_uic_pwr_ctrl(), you would set the hba->active_uic_cmd to
> NULL.
> 
> That's not correct. ufshcd_uic_pwr_ctrl() only clears
> hba->active_uic_cmd after the power mode change interrupt has been
> processed.
> 
> > [2]The second uic completion interrupt for UIC_COMMAND_COMP is
> delayed.
> > This interrupt is newly introduced by this patch.
> 
> If UIC_COMMAND_COMPL is delivered after UFSHCD_UIC_PWR_MASK then the
> UIC_COMMAND_COMPL interrupt will be ignored because hba-
> >active_uic_cmd
> is cleared by ufshcd_uic_pwr_ctrl() after it has processed the
> power mode change interrupt.
> 
> > Now let's say you have a new UIC command coming via 
> > ufshcd_send_uic_cmd(). The ufshcd_dispatch_uic_cmd() will update
> your 
> > hba->active_uic_cmd to the new uic_cmd.
> 
> UIC command processing is serialized by hba->uic_cmd_mutex. Hence
> only
> one UIC command is processed at any given time.
> 
> Does this address your concerns?
> 
> Thanks,
> 
> Bart.

Hi Bart,

You assume that the following steps are executed in sequence.
(Theread A) send ufshcd_uic_pwr_ctrl
(ISR) UIC_POWER_MODE A
	clear hab->active_uic_cmd
(ISR) UIC_COMMAND_COMPL A (step A)
	do nothing
(Thread B) ufshcd_send_uic_cmd set hab->active_uic_cmd (step B)
(ISR) UIC_COMMAND_COMPL B
	complte thread B's cmd

But actually step A ISR may come after step B and cause error.

(Theread A) send ufshcd_uic_pwr_ctrl
(ISR) UIC_POWER_MODE A
	clear hab->active_uic_cmd
(Thread B) ufshcd_send_uic_cmd set hab->active_uic_cmd (step B)
(ISR) UIC_COMMAND_COMPL A (step A)
	complete Thread A cmd	


Thanks.
Peter
Bao D. Nguyen Aug. 23, 2024, 3:43 a.m. UTC | #16
On 8/22/2024 7:06 PM, Bart Van Assche wrote:
> On 8/22/24 4:34 PM, Bao D. Nguyen wrote:
>> Let's say you are sending a ufshcd_uic_pwr_ctrl() command. You will 
>> get 2 uic completion interrupts:
>> [1] ufshcd_uic_cmd_compl() is called for the first interrupt which 
>> happens to be UFSHCD_UIC_PWR_MASK only. At the end of the 
>> ufshcd_uic_pwr_ctrl(), you would set the hba->active_uic_cmd to NULL.
> 
> That's not correct. ufshcd_uic_pwr_ctrl() only clears
> hba->active_uic_cmd after the power mode change interrupt has been
> processed.
> 
Agree.

>> [2]The second uic completion interrupt for UIC_COMMAND_COMP is delayed.
>> This interrupt is newly introduced by this patch.
> 
> If UIC_COMMAND_COMPL is delivered after UFSHCD_UIC_PWR_MASK then the
> UIC_COMMAND_COMPL interrupt will be ignored because hba->active_uic_cmd
> is cleared by ufshcd_uic_pwr_ctrl() after it has processed the
> power mode change interrupt.
Agree.

> 
>> Now let's say you have a new UIC command coming via 
>> ufshcd_send_uic_cmd(). The ufshcd_dispatch_uic_cmd() will update your 
>> hba->active_uic_cmd to the new uic_cmd.
> 
> UIC command processing is serialized by hba->uic_cmd_mutex. Hence only
> one UIC command is processed at any given time.
> 
> Does this address your concerns?
Not really. I agree that the uic commands are serialized by the 
hba->uic_cmd_mutex. However, the UIC_COMMAND_COMPL extra interrupt 
belonging to the previous PMC/hibern8_enter/exit() command can come late 
and causes the ufshcd_uic_cmd_compl() to complete the current uic 
command incorrectly.

Thanks, Bao

> 
> Thanks,
> 
> Bart.
Bean Huo Aug. 23, 2024, 10:54 a.m. UTC | #17
On Thu, 2024-08-22 at 10:51 -0700, Bart Van Assche wrote:
> On 8/22/24 7:17 AM, Bean Huo wrote:
> > Do you mean re-enabling UIC complete interrupt will cause the
> > problem?
> 
> That's correct. ufshcd_uic_hibern8_enter() calls
> ufshcd_uic_pwr_ctrl()
> indirectly. For the test setup that is on my desk, the code in
> ufshcd_uic_pwr_ctrl() that re-enables the UIC completion interrupt
> causes the UFS host controller to exit hibernation.
> 
> Thanks,
> 
> Bart.


Do you think this is only true in your case or for a specific UFS
controller vendor? and this doesnnot mean that all UFS controller
vendors have this problem? Maybe MTK has confirmed this.

Kind regards,
Bean
Can Guo Aug. 23, 2024, 11:26 a.m. UTC | #18
On 8/23/2024 6:54 PM, Bean Huo wrote:
> On Thu, 2024-08-22 at 10:51 -0700, Bart Van Assche wrote:
>> On 8/22/24 7:17 AM, Bean Huo wrote:
>>> Do you mean re-enabling UIC complete interrupt will cause the
>>> problem?
>> That's correct. ufshcd_uic_hibern8_enter() calls
>> ufshcd_uic_pwr_ctrl()
>> indirectly. For the test setup that is on my desk, the code in
>> ufshcd_uic_pwr_ctrl() that re-enables the UIC completion interrupt
>> causes the UFS host controller to exit hibernation.

That is a weird UFS host controller behavior.

At least Qcom UFS host controller does not behave like this, because 
accessing the

UFSHCI IRQ register does not require/involve link communication with the 
UFS device.


Thanks,

Can Guo.

>>
>> Thanks,
>>
>> Bart.
>
> Do you think this is only true in your case or for a specific UFS
> controller vendor? and this doesnnot mean that all UFS controller
> vendors have this problem? Maybe MTK has confirmed this.
>
> Kind regards,
> Bean
Manivannan Sadhasivam Aug. 23, 2024, 12:01 p.m. UTC | #19
On Thu, Aug 22, 2024 at 02:08:24PM -0700, Bart Van Assche wrote:
> On 8/22/24 1:54 PM, Bao D. Nguyen wrote:
> > I am just curious about providing workaround for a hardware issue in the
> > ufs core driver. Sometimes I notice the community is not accepting such
> > a change and push the change to be implemented in the vendor/platform
> > drivers.
> 
> There are two reasons why I propose to implement this workaround as a
> change for the UFS driver core:
> - I am not aware of any way to implement the behavior change in a vendor
>   driver without modifying the UFS driver core.

Unfortunately you never mentioned which UFS controller this behavior applies to.

> - The workaround results in a simplification of the UFS driver core
>   code. More lines are removed from the UFS driver than added.
> 

This doesn't justify the modification of the UFS code driver for an errantic
behavior of a UFS controller.

> Although it would be possible to select whether the old or the new
> behavior is selected by introducing yet another host controller quirk, I
> prefer not to do that because it would make the UFSHCI driver even more
> complex.
> 

I strongly believe that using the quirk is the way forward to address this
issue. Because this is not a documented behavior to be handled in the core
driver and also defeats the purpose of having the quirks in first place.

- Mani
Bart Van Assche Aug. 23, 2024, 2:23 p.m. UTC | #20
On 8/23/24 5:01 AM, Manivannan Sadhasivam wrote:
> Unfortunately you never mentioned which UFS controller this behavior applies to.

Does your employer allow you to publish detailed information about
unreleased SoCs?

>> - The workaround results in a simplification of the UFS driver core
>>    code. More lines are removed from the UFS driver than added.
> 
> This doesn't justify the modification of the UFS code driver for an errantic
> behavior of a UFS controller.

Patch 2/2 deletes more code than it adds and hence helps to make the UFS
driver core easier to maintain. Adding yet another quirk would make the
UFS driver core more complicated and hence harder to maintain.

>> Although it would be possible to select whether the old or the new
>> behavior is selected by introducing yet another host controller quirk, I
>> prefer not to do that because it would make the UFSHCI driver even more
>> complex.
> 
> I strongly believe that using the quirk is the way forward to address this
> issue. Because this is not a documented behavior to be handled in the core
> driver and also defeats the purpose of having the quirks in first place.

Adding a quirk is not an option in this case because adding a new quirk
without code that uses the quirk is not allowed. It may take another
year before the code that uses that new quirk is sent upstream because
the team that sends Pixel SoC drivers upstream only does this after
devices with that SoC are publicly available.

Thanks,

Bart.
Manivannan Sadhasivam Aug. 23, 2024, 2:58 p.m. UTC | #21
On Fri, Aug 23, 2024 at 07:23:57AM -0700, Bart Van Assche wrote:
> On 8/23/24 5:01 AM, Manivannan Sadhasivam wrote:
> > Unfortunately you never mentioned which UFS controller this behavior applies to.
> 
> Does your employer allow you to publish detailed information about
> unreleased SoCs?
> 

Certainly not! But in that case I will explicitly mention that the controller is
used in an upcoming SoC or something like that. Otherwise, how can the community
know whether you are sending the patch for an existing controller or a secret
one?

> > > - The workaround results in a simplification of the UFS driver core
> > >    code. More lines are removed from the UFS driver than added.
> > 
> > This doesn't justify the modification of the UFS code driver for an errantic
> > behavior of a UFS controller.
> 
> Patch 2/2 deletes more code than it adds and hence helps to make the UFS
> driver core easier to maintain. Adding yet another quirk would make the
> UFS driver core more complicated and hence harder to maintain.
> 

IMO nobody can make the UFS driver more complicated. It is complicated at its
best :)

But you are just applying the plaster in the core code for a quirky behavior of
an unreleased SoC. IMO that is not something we would want. Imagine if other
vendor also decides to do the same with the argument of removing more lines of
code etc... we will end up with a situation where the core code doing something
out of the spec for a buggy controller without any mentioning of the quirky
behavior.

So that will make the code more complex to understand.

> > > Although it would be possible to select whether the old or the new
> > > behavior is selected by introducing yet another host controller quirk, I
> > > prefer not to do that because it would make the UFSHCI driver even more
> > > complex.
> > 
> > I strongly believe that using the quirk is the way forward to address this
> > issue. Because this is not a documented behavior to be handled in the core
> > driver and also defeats the purpose of having the quirks in first place.
> 
> Adding a quirk is not an option in this case because adding a new quirk
> without code that uses the quirk is not allowed. It may take another
> year before the code that uses that new quirk is sent upstream because
> the team that sends Pixel SoC drivers upstream only does this after
> devices with that SoC are publicly available.
> 

Then why can't you send the change at that time? We do the same for Qcom SoCs
all the time and I'm pretty sure that the Pixel SoCs are no different.

At that time, the SoC may get a new revision and we may end up not needing the
quirk at all. I'm not saying that it will happen, but that is a common practice
among the semiconductor companies.

- Mani
Bart Van Assche Aug. 23, 2024, 4:07 p.m. UTC | #22
On 8/23/24 7:58 AM, Manivannan Sadhasivam wrote:
> Then why can't you send the change at that time?

We use the Android GKI kernel and any patches must be sent upstream
first before these can be considered for inclusion in the GKI kernel.

Thanks,

Bart.
Bart Van Assche Aug. 23, 2024, 4:09 p.m. UTC | #23
On 8/23/24 3:54 AM, Bean Huo wrote:
> On Thu, 2024-08-22 at 10:51 -0700, Bart Van Assche wrote:
>> That's correct. ufshcd_uic_hibern8_enter() calls
>> ufshcd_uic_pwr_ctrl()
>> indirectly. For the test setup that is on my desk, the code in
>> ufshcd_uic_pwr_ctrl() that re-enables the UIC completion interrupt
>> causes the UFS host controller to exit hibernation.
> 
> Do you think this is only true in your case or for a specific UFS
> controller vendor? and this doesnnot mean that all UFS controller
> vendors have this problem? Maybe MTK has confirmed this.

Hi Bean,

I'm only aware of one UFS controller type that shows this behavior.

Thanks,

Bart.
Manivannan Sadhasivam Aug. 23, 2024, 4:48 p.m. UTC | #24
On Fri, Aug 23, 2024 at 09:07:18AM -0700, Bart Van Assche wrote:
> On 8/23/24 7:58 AM, Manivannan Sadhasivam wrote:
> > Then why can't you send the change at that time?
> 
> We use the Android GKI kernel and any patches must be sent upstream
> first before these can be considered for inclusion in the GKI kernel.
> 

But that's the same requirement for other SoC vendors as well. Anyway, these
don't justify the fact that the core code should be modified to workaround a
controller defect. Please use quirks as like other vendors.

- Mani
Bart Van Assche Aug. 23, 2024, 6:05 p.m. UTC | #25
On 8/23/24 9:48 AM, Manivannan Sadhasivam wrote:
> On Fri, Aug 23, 2024 at 09:07:18AM -0700, Bart Van Assche wrote:
>> On 8/23/24 7:58 AM, Manivannan Sadhasivam wrote:
>>> Then why can't you send the change at that time?
>>
>> We use the Android GKI kernel and any patches must be sent upstream
>> first before these can be considered for inclusion in the GKI kernel.
> 
> But that's the same requirement for other SoC vendors as well. Anyway, these
> don't justify the fact that the core code should be modified to workaround a
> controller defect. Please use quirks as like other vendors.

Let me repeat what I mentioned earlier:
* Introducing a new quirk without introducing a user for that quirk is
   not acceptable because that would involve introducing code that is
   dead code from the point of view of the upstream kernel.
* The UFS driver core is already complicated. If we don't need a new
   quirk we shouldn't introduce a new quirk.

Bart.
Bart Van Assche Aug. 23, 2024, 8:25 p.m. UTC | #26
On 8/22/24 8:43 PM, Bao D. Nguyen wrote:
> However, the UIC_COMMAND_COMPL extra interrupt 
> belonging to the previous PMC/hibern8_enter/exit() command can come late 
> and causes the ufshcd_uic_cmd_compl() to complete the current uic 
> command incorrectly.

Hi Bao,

If the UIC command completion interrupt could come late we would
already have observed unhandled interrupt errors in device logs, isn't
it?

Anyway, isn't this something that is easy to fix with something like the
(untested) patch below?

Thanks,

Bart.

diff --git a/drivers/ufs/core/ufshcd.c b/drivers/ufs/core/ufshcd.c
index d0ae6e50becc..e3a487ea83f9 100644
--- a/drivers/ufs/core/ufshcd.c
+++ b/drivers/ufs/core/ufshcd.c
@@ -2543,13 +2543,11 @@ ufshcd_wait_for_uic_cmd(struct ufs_hba *hba, 
struct uic_command *uic_cmd)
   * __ufshcd_send_uic_cmd - Send UIC commands and retrieve the result
   * @hba: per adapter instance
   * @uic_cmd: UIC command
- * @completion: initialize the completion only if this is set to true
   *
   * Return: 0 only if success.
   */
  static int
-__ufshcd_send_uic_cmd(struct ufs_hba *hba, struct uic_command *uic_cmd,
-		      bool completion)
+__ufshcd_send_uic_cmd(struct ufs_hba *hba, struct uic_command *uic_cmd)
  {
  	lockdep_assert_held(&hba->uic_cmd_mutex);

@@ -2559,8 +2557,7 @@ __ufshcd_send_uic_cmd(struct ufs_hba *hba, struct 
uic_command *uic_cmd,
  		return -EIO;
  	}

-	if (completion)
-		init_completion(&uic_cmd->done);
+	init_completion(&uic_cmd->done);

  	uic_cmd->cmd_active = 1;
  	ufshcd_dispatch_uic_cmd(hba, uic_cmd);
@@ -2586,7 +2583,7 @@ int ufshcd_send_uic_cmd(struct ufs_hba *hba, 
struct uic_command *uic_cmd)
  	mutex_lock(&hba->uic_cmd_mutex);
  	ufshcd_add_delay_before_dme_cmd(hba);

-	ret = __ufshcd_send_uic_cmd(hba, uic_cmd, true);
+	ret = __ufshcd_send_uic_cmd(hba, uic_cmd);
  	if (!ret)
  		ret = ufshcd_wait_for_uic_cmd(hba, uic_cmd);

@@ -4255,7 +4252,6 @@ static int ufshcd_uic_pwr_ctrl(struct ufs_hba 
*hba, struct uic_command *cmd)
  	unsigned long flags;
  	u8 status;
  	int ret;
-	bool reenable_intr = false;

  	mutex_lock(&hba->uic_cmd_mutex);
  	ufshcd_add_delay_before_dme_cmd(hba);
@@ -4266,17 +4262,8 @@ static int ufshcd_uic_pwr_ctrl(struct ufs_hba 
*hba, struct uic_command *cmd)
  		goto out_unlock;
  	}
  	hba->uic_async_done = &uic_async_done;
-	if (ufshcd_readl(hba, REG_INTERRUPT_ENABLE) & UIC_COMMAND_COMPL) {
-		ufshcd_disable_intr(hba, UIC_COMMAND_COMPL);
-		/*
-		 * Make sure UIC command completion interrupt is disabled before
-		 * issuing UIC command.
-		 */
-		ufshcd_readl(hba, REG_INTERRUPT_ENABLE);
-		reenable_intr = true;
-	}
  	spin_unlock_irqrestore(hba->host->host_lock, flags);
-	ret = __ufshcd_send_uic_cmd(hba, cmd, false);
+	ret = __ufshcd_send_uic_cmd(hba, cmd);
  	if (ret) {
  		dev_err(hba->dev,
  			"pwr ctrl cmd 0x%x with mode 0x%x uic error %d\n",
@@ -4300,6 +4287,13 @@ static int ufshcd_uic_pwr_ctrl(struct ufs_hba 
*hba, struct uic_command *cmd)
  		goto out;
  	}

+	ret = wait_for_completion_timeout(&cmd->done,
+					  msecs_to_jiffies(uic_cmd_timeout));
+	WARN_ON_ONCE(ret < 0);
+	if (ret == 0)
+		dev_err(hba->dev, "UIC command %#x timed out\n", cmd->command);
+	ret = 0;
+
  check_upmcrs:
  	status = ufshcd_get_upmcrs(hba);
  	if (status != PWR_LOCAL) {
@@ -4318,8 +4312,6 @@ static int ufshcd_uic_pwr_ctrl(struct ufs_hba 
*hba, struct uic_command *cmd)
  	spin_lock_irqsave(hba->host->host_lock, flags);
  	hba->active_uic_cmd = NULL;
  	hba->uic_async_done = NULL;
-	if (reenable_intr)
-		ufshcd_enable_intr(hba, UIC_COMMAND_COMPL);
  	if (ret) {
  		ufshcd_set_link_broken(hba);
  		ufshcd_schedule_eh_work(hba);
Bart Van Assche Aug. 23, 2024, 8:27 p.m. UTC | #27
On 8/22/24 7:57 PM, Peter Wang (王信友) wrote:
> You assume that the following steps are executed in sequence.
> (Theread A) send ufshcd_uic_pwr_ctrl
> (ISR) UIC_POWER_MODE A
> 	clear hab->active_uic_cmd
> (ISR) UIC_COMMAND_COMPL A (step A)
> 	do nothing
> (Thread B) ufshcd_send_uic_cmd set hab->active_uic_cmd (step B)
> (ISR) UIC_COMMAND_COMPL B
> 	complte thread B's cmd
> 
> But actually step A ISR may come after step B and cause error.
> 
> (Theread A) send ufshcd_uic_pwr_ctrl
> (ISR) UIC_POWER_MODE A
> 	clear hab->active_uic_cmd
> (Thread B) ufshcd_send_uic_cmd set hab->active_uic_cmd (step B)
> (ISR) UIC_COMMAND_COMPL A (step A)
> 	complete Thread A cmd	

Hi Peter,

Can you please take a look at the fix I proposed in my reply to Bao?

Thanks,

Bart.
Manivannan Sadhasivam Aug. 24, 2024, 2:29 a.m. UTC | #28
On Fri, Aug 23, 2024 at 11:05:12AM -0700, Bart Van Assche wrote:
> On 8/23/24 9:48 AM, Manivannan Sadhasivam wrote:
> > On Fri, Aug 23, 2024 at 09:07:18AM -0700, Bart Van Assche wrote:
> > > On 8/23/24 7:58 AM, Manivannan Sadhasivam wrote:
> > > > Then why can't you send the change at that time?
> > > 
> > > We use the Android GKI kernel and any patches must be sent upstream
> > > first before these can be considered for inclusion in the GKI kernel.
> > 
> > But that's the same requirement for other SoC vendors as well. Anyway, these
> > don't justify the fact that the core code should be modified to workaround a
> > controller defect. Please use quirks as like other vendors.
> 
> Let me repeat what I mentioned earlier:
> * Introducing a new quirk without introducing a user for that quirk is
>   not acceptable because that would involve introducing code that is
>   dead code from the point of view of the upstream kernel.

As I pointed out earlier, you should just submit the quirk change when you are
submitting your driver. But you said that for GKI requirement you are doing the
change in core driver. But again, that is applicable for other vendors as well.
What if other vendors start adding the workaround in the core driver citing GKI
requirement (provided it also removes some code as you justified)? Will it be
acceptable? NO.

> * The UFS driver core is already complicated. If we don't need a new
>   quirk we shouldn't introduce a new quirk.
> 

Sorry, the quirk is a quirk. All the existing quirks can be worked around in the
core driver in some way. But we have the quirk mechanisms to specifically not to
do that to avoid polluting the core code which has to follow the spec.

Moreover, this workaround you are adding is not at all common for other
controllers. So this definitely doesn't justify modifying the core code.

IMO adding more code alone will not make a driver complicated, but changing the
logic will.

- Mani
Bart Van Assche Aug. 24, 2024, 2:48 a.m. UTC | #29
On 8/23/24 7:29 PM, Manivannan Sadhasivam wrote:
> What if other vendors start adding the workaround in the core driver citing GKI
> requirement (provided it also removes some code as you justified)? Will it be
> acceptable? NO.

It's not up to you to define new rules for upstream kernel development.
Anyone is allowed to publish patches that rework kernel code, whether
or not the purpose of such a patch is to work around a SoC bug.

Additionally, it has already happened that one of your colleagues
submitted a workaround for a SoC bug to the UFS core driver.
 From the description of commit 0f52fcb99ea2 ("scsi: ufs: Try to save
power mode change and UIC cmd completion timeout"): "This is to deal
with the scenario in which completion has been raised but the one
waiting for the completion cannot be awaken in time due to kernel
scheduling problem." That description makes zero sense to me. My
conclusion from commit 0f52fcb99ea2 is that it is a workaround for a
bug in a UFS host controller, namely that a particular UFS host
controller not always generates a UIC completion interrupt when it
should.

Bart.
Manivannan Sadhasivam Aug. 24, 2024, 3:03 a.m. UTC | #30
On Fri, Aug 23, 2024 at 07:48:50PM -0700, Bart Van Assche wrote:
> On 8/23/24 7:29 PM, Manivannan Sadhasivam wrote:
> > What if other vendors start adding the workaround in the core driver citing GKI
> > requirement (provided it also removes some code as you justified)? Will it be
> > acceptable? NO.
> 
> It's not up to you to define new rules for upstream kernel development.

I'm not framing new rules, but just pointing out the common practice.

> Anyone is allowed to publish patches that rework kernel code, whether
> or not the purpose of such a patch is to work around a SoC bug.
> 

Yes, at the same time if that code deviates from the norm, then anyone can
complain. We are all working towards making the code better.

> Additionally, it has already happened that one of your colleagues
> submitted a workaround for a SoC bug to the UFS core driver.
> From the description of commit 0f52fcb99ea2 ("scsi: ufs: Try to save
> power mode change and UIC cmd completion timeout"): "This is to deal
> with the scenario in which completion has been raised but the one
> waiting for the completion cannot be awaken in time due to kernel
> scheduling problem." That description makes zero sense to me. My
> conclusion from commit 0f52fcb99ea2 is that it is a workaround for a
> bug in a UFS host controller, namely that a particular UFS host
> controller not always generates a UIC completion interrupt when it
> should.
> 

0f52fcb99ea2 was submitted in 2020 before I started contributing to UFS driver
seriously. But the description of that commit never mentioned any issue with the
controller. It vaguely mentions 'kernel scheduling problem' which I don't know
how to interpret. If I were looking into the code at that time, I would've
definitely asked for clarity during the review phase.

But there is no need to take it as an example. I can only assert the fact that
working around the controller defect in core code when we already have quirks
for the same purpose defeats the purpose of quirks. And it will encourage other
people to start changing the core code in the future thus bypassing the quirks.

But I'm not a maintainer of this part of the code. So I cannot definitely stop
you from getting this patch merged. I'll leave it up to Martin to decide.

- Mani
Peter Wang (王信友) Aug. 26, 2024, 6:16 a.m. UTC | #31
On Fri, 2024-08-23 at 13:27 -0700, Bart Van Assche wrote:
>  	 
> External email : Please do not click links or open attachments until
> you have verified the sender or the content.
>  On 8/22/24 7:57 PM, Peter Wang (王信友) wrote:
> > You assume that the following steps are executed in sequence.
> > (Theread A) send ufshcd_uic_pwr_ctrl
> > (ISR) UIC_POWER_MODE A
> > clear hab->active_uic_cmd
> > (ISR) UIC_COMMAND_COMPL A (step A)
> > do nothing
> > (Thread B) ufshcd_send_uic_cmd set hab->active_uic_cmd (step B)
> > (ISR) UIC_COMMAND_COMPL B
> > complte thread B's cmd
> > 
> > But actually step A ISR may come after step B and cause error.
> > 
> > (Theread A) send ufshcd_uic_pwr_ctrl
> > (ISR) UIC_POWER_MODE A
> > clear hab->active_uic_cmd
> > (Thread B) ufshcd_send_uic_cmd set hab->active_uic_cmd (step B)
> > (ISR) UIC_COMMAND_COMPL A (step A)
> > complete Thread A cmd
> 
> Hi Peter,
> 
> Can you please take a look at the fix I proposed in my reply to Bao?
> 
> Thanks,
> 
> Bart.
> 

Hi Bart,

In this case, I suggest use a vendor hooks ufshcd_vops_hibern8_notify.
When hibernate enter pre-change, disable UIC_COMMAND_COMPL interrupt
to prevent enable UIC_COMMAND_COMPL after hibernate enter.
When hibernate exit post-change, enable UIC_COMMAND_COMPL interrupt.

If it works, it won't modify the native kernel code, nor will it
require adding a quirk. It would simply use a vendor hook as a 
workaround, without violating GKI, right?

Thanks
Peter
Can Guo Aug. 26, 2024, 6:48 a.m. UTC | #32
On 1/1/1970 8:00 AM, wrote:
> On Fri, Aug 23, 2024 at 07:48:50PM -0700, Bart Van Assche wrote:
>> On 8/23/24 7:29 PM, Manivannan Sadhasivam wrote:
>>> What if other vendors start adding the workaround in the core driver citing GKI
>>> requirement (provided it also removes some code as you justified)? Will it be
>>> acceptable? NO.
>> It's not up to you to define new rules for upstream kernel development.
> I'm not framing new rules, but just pointing out the common practice.
>
>> Anyone is allowed to publish patches that rework kernel code, whether
>> or not the purpose of such a patch is to work around a SoC bug.
>>
> Yes, at the same time if that code deviates from the norm, then anyone can
> complain. We are all working towards making the code better.
>
>> Additionally, it has already happened that one of your colleagues
>> submitted a workaround for a SoC bug to the UFS core driver.
>>  From the description of commit 0f52fcb99ea2 ("scsi: ufs: Try to save
>> power mode change and UIC cmd completion timeout"): "This is to deal
>> with the scenario in which completion has been raised but the one
>> waiting for the completion cannot be awaken in time due to kernel
>> scheduling problem." That description makes zero sense to me. My
>> conclusion from commit 0f52fcb99ea2 is that it is a workaround for a
>> bug in a UFS host controller, namely that a particular UFS host
>> controller not always generates a UIC completion interrupt when it
>> should.
>>
> 0f52fcb99ea2 was submitted in 2020 before I started contributing to UFS driver
> seriously. But the description of that commit never mentioned any issue with the
> controller. It vaguely mentions 'kernel scheduling problem' which I don't know
> how to interpret. If I were looking into the code at that time, I would've
> definitely asked for clarity during the review phase.

0f52fcb99ea2 is my commit, apologize for the confusion due to poor commit msg.
What we were trying to fix was not a SoC BUG. More background for this change:
from our customer side, we used to hit corner cases where the UIC command is
sent, UFS host controller generates the UIC command completion interrupt fine,
then UIC completion IRQ handler fires and calls the complete(), however the
completion timeout error still happens. In this case, UFS, UFS host and UFS
driver are the victims. And whatever could cause this scheduling problem should
be fixed properly by the right PoC, but we thought making UFS driver robust in
this spot would be good for all of the users who may face the similar issue,
hence the change.

Thanks,
Can Guo.

>
> But there is no need to take it as an example. I can only assert the fact that
> working around the controller defect in core code when we already have quirks
> for the same purpose defeats the purpose of quirks. And it will encourage other
> people to start changing the core code in the future thus bypassing the quirks.
>
> But I'm not a maintainer of this part of the code. So I cannot definitely stop
> you from getting this patch merged. I'll leave it up to Martin to decide.
>
> - Mani
>
Bart Van Assche Aug. 26, 2024, 6:08 p.m. UTC | #33
On 8/25/24 11:16 PM, Peter Wang (王信友) wrote:
> In this case, I suggest use a vendor hooks ufshcd_vops_hibern8_notify.
> When hibernate enter pre-change, disable UIC_COMMAND_COMPL interrupt
> to prevent enable UIC_COMMAND_COMPL after hibernate enter.
> When hibernate exit post-change, enable UIC_COMMAND_COMPL interrupt.
> 
> If it works, it won't modify the native kernel code, nor will it
> require adding a quirk. It would simply use a vendor hook as a
> workaround, without violating GKI, right?

Hmm ... does this mean introducing a new vendor hook without introducing
any host driver code that implements that hook? I don't think this is
allowed.

How about introducing a quirk that selects the current behavior if set
(disable UIC completion interrupt around hibernation) and not touching
the UIC completion interrupt if that quirk is not set?

Thanks,

Bart.
Peter Wang (王信友) Aug. 27, 2024, 1:39 a.m. UTC | #34
On Mon, 2024-08-26 at 11:08 -0700, Bart Van Assche wrote:
>  	 
> External email : Please do not click links or open attachments until
> you have verified the sender or the content.
>  On 8/25/24 11:16 PM, Peter Wang (王信友) wrote:
> > In this case, I suggest use a vendor hooks
> ufshcd_vops_hibern8_notify.
> > When hibernate enter pre-change, disable UIC_COMMAND_COMPL
> interrupt
> > to prevent enable UIC_COMMAND_COMPL after hibernate enter.
> > When hibernate exit post-change, enable UIC_COMMAND_COMPL
> interrupt.
> > 
> > If it works, it won't modify the native kernel code, nor will it
> > require adding a quirk. It would simply use a vendor hook as a
> > workaround, without violating GKI, right?
> 
> Hmm ... does this mean introducing a new vendor hook without
> introducing
> any host driver code that implements that hook? I don't think this is
> allowed.
> 

Hi Bart,

It is not a new vendor hook, ufshcd_vops_hibern8_notify is exist 
in current kernel.

Thanks.
Peter


> How about introducing a quirk that selects the current behavior if
> set
> (disable UIC completion interrupt around hibernation) and not
> touching
> the UIC completion interrupt if that quirk is not set?
> 
> Thanks,
> 
> Bart.
Bart Van Assche Aug. 27, 2024, 3:42 p.m. UTC | #35
On 8/26/24 6:39 PM, Peter Wang (王信友) wrote:
> It is not a new vendor hook, ufshcd_vops_hibern8_notify is exist
> in current kernel.
Hi Peter,

Is something like the untested patch below perhaps what you have in
mind?

Thanks,

Bart.


diff --git a/drivers/ufs/core/ufshcd-priv.h b/drivers/ufs/core/ufshcd-priv.h
index ce36154ce963..69ee49a75c04 100644
--- a/drivers/ufs/core/ufshcd-priv.h
+++ b/drivers/ufs/core/ufshcd-priv.h
@@ -176,12 +176,14 @@ static inline void 
ufshcd_vops_setup_task_mgmt(struct ufs_hba *hba,
  		return hba->vops->setup_task_mgmt(hba, tag, tm_function);
  }

-static inline void ufshcd_vops_hibern8_notify(struct ufs_hba *hba,
-					enum uic_cmd_dme cmd,
-					enum ufs_notify_change_status status)
+static inline void
+ufshcd_vops_hibern8_notify(struct ufs_hba *hba, enum uic_cmd_dme cmd,
+			   enum ufs_notify_change_status status,
+			   bool *disable_uic_compl_intr)
  {
  	if (hba->vops && hba->vops->hibern8_notify)
-		return hba->vops->hibern8_notify(hba, cmd, status);
+		return hba->vops->hibern8_notify(hba, cmd, status,
+						 disable_uic_compl_intr);
  }

  static inline int ufshcd_vops_apply_dev_quirks(struct ufs_hba *hba)
diff --git a/drivers/ufs/core/ufshcd.c b/drivers/ufs/core/ufshcd.c
index e13b9ac145f6..614b24f2eb7f 100644
--- a/drivers/ufs/core/ufshcd.c
+++ b/drivers/ufs/core/ufshcd.c
@@ -2541,9 +2541,8 @@ ufshcd_wait_for_uic_cmd(struct ufs_hba *hba, 
struct uic_command *uic_cmd)
   *
   * Return: 0 only if success.
   */
-static int
-__ufshcd_send_uic_cmd(struct ufs_hba *hba, struct uic_command *uic_cmd,
-		      bool completion)
+static int __ufshcd_send_uic_cmd(struct ufs_hba *hba,
+				 struct uic_command *uic_cmd)
  {
  	lockdep_assert_held(&hba->uic_cmd_mutex);

@@ -2553,8 +2552,7 @@ __ufshcd_send_uic_cmd(struct ufs_hba *hba, struct 
uic_command *uic_cmd,
  		return -EIO;
  	}

-	if (completion)
-		init_completion(&uic_cmd->done);
+	init_completion(&uic_cmd->done);

  	uic_cmd->cmd_active = 1;
  	ufshcd_dispatch_uic_cmd(hba, uic_cmd);
@@ -2580,7 +2578,7 @@ int ufshcd_send_uic_cmd(struct ufs_hba *hba, 
struct uic_command *uic_cmd)
  	mutex_lock(&hba->uic_cmd_mutex);
  	ufshcd_add_delay_before_dme_cmd(hba);

-	ret = __ufshcd_send_uic_cmd(hba, uic_cmd, true);
+	ret = __ufshcd_send_uic_cmd(hba, uic_cmd);
  	if (!ret)
  		ret = ufshcd_wait_for_uic_cmd(hba, uic_cmd);

@@ -4243,7 +4241,8 @@ EXPORT_SYMBOL_GPL(ufshcd_dme_get_attr);
   *
   * Return: 0 on success, non-zero value on failure.
   */
-static int ufshcd_uic_pwr_ctrl(struct ufs_hba *hba, struct uic_command 
*cmd)
+static int ufshcd_uic_pwr_ctrl(struct ufs_hba *hba, struct uic_command 
*cmd,
+			       bool disable_uic_compl_intr)
  {
  	DECLARE_COMPLETION_ONSTACK(uic_async_done);
  	unsigned long flags;
@@ -4260,7 +4259,8 @@ static int ufshcd_uic_pwr_ctrl(struct ufs_hba 
*hba, struct uic_command *cmd)
  		goto out_unlock;
  	}
  	hba->uic_async_done = &uic_async_done;
-	if (ufshcd_readl(hba, REG_INTERRUPT_ENABLE) & UIC_COMMAND_COMPL) {
+	if (disable_uic_compl_intr &&
+	    ufshcd_readl(hba, REG_INTERRUPT_ENABLE) & UIC_COMMAND_COMPL) {
  		ufshcd_disable_intr(hba, UIC_COMMAND_COMPL);
  		/*
  		 * Make sure UIC command completion interrupt is disabled before
@@ -4270,7 +4270,7 @@ static int ufshcd_uic_pwr_ctrl(struct ufs_hba 
*hba, struct uic_command *cmd)
  		reenable_intr = true;
  	}
  	spin_unlock_irqrestore(hba->host->host_lock, flags);
-	ret = __ufshcd_send_uic_cmd(hba, cmd, false);
+	ret = __ufshcd_send_uic_cmd(hba, cmd);
  	if (ret) {
  		dev_err(hba->dev,
  			"pwr ctrl cmd 0x%x with mode 0x%x uic error %d\n",
@@ -4294,6 +4294,16 @@ static int ufshcd_uic_pwr_ctrl(struct ufs_hba 
*hba, struct uic_command *cmd)
  		goto out;
  	}

+	if (!disable_uic_compl_intr &&
+	    ufshcd_readl(hba, REG_INTERRUPT_ENABLE) & UIC_COMMAND_COMPL) {
+		ret = wait_for_completion_timeout(
+			&cmd->done, msecs_to_jiffies(uic_cmd_timeout));
+		if (ret == 0)
+			dev_err(hba->dev, "pwr ctrl cmd %#x timed out\n",
+				cmd->command);
+		ret = 0;
+	}
+
  check_upmcrs:
  	status = ufshcd_get_upmcrs(hba);
  	if (status != PWR_LOCAL) {
@@ -4353,7 +4363,7 @@ int ufshcd_uic_change_pwr_mode(struct ufs_hba 
*hba, u8 mode)
  	}

  	ufshcd_hold(hba);
-	ret = ufshcd_uic_pwr_ctrl(hba, &uic_cmd);
+	ret = ufshcd_uic_pwr_ctrl(hba, &uic_cmd, true);
  	ufshcd_release(hba);

  out:
@@ -4396,11 +4406,13 @@ int ufshcd_uic_hibern8_enter(struct ufs_hba *hba)
  		.command = UIC_CMD_DME_HIBER_ENTER,
  	};
  	ktime_t start = ktime_get();
+	bool disable_uic_compl_intr = true;
  	int ret;

-	ufshcd_vops_hibern8_notify(hba, UIC_CMD_DME_HIBER_ENTER, PRE_CHANGE);
+	ufshcd_vops_hibern8_notify(hba, UIC_CMD_DME_HIBER_ENTER, PRE_CHANGE,
+				   &disable_uic_compl_intr);

-	ret = ufshcd_uic_pwr_ctrl(hba, &uic_cmd);
+	ret = ufshcd_uic_pwr_ctrl(hba, &uic_cmd, disable_uic_compl_intr);
  	trace_ufshcd_profile_hibern8(dev_name(hba->dev), "enter",
  			     ktime_to_us(ktime_sub(ktime_get(), start)), ret);

@@ -4409,7 +4421,7 @@ int ufshcd_uic_hibern8_enter(struct ufs_hba *hba)
  			__func__, ret);
  	else
  		ufshcd_vops_hibern8_notify(hba, UIC_CMD_DME_HIBER_ENTER,
-								POST_CHANGE);
+					   POST_CHANGE, NULL);

  	return ret;
  }
@@ -4423,9 +4435,10 @@ int ufshcd_uic_hibern8_exit(struct ufs_hba *hba)
  	int ret;
  	ktime_t start = ktime_get();

-	ufshcd_vops_hibern8_notify(hba, UIC_CMD_DME_HIBER_EXIT, PRE_CHANGE);
+	ufshcd_vops_hibern8_notify(hba, UIC_CMD_DME_HIBER_EXIT, PRE_CHANGE,
+				   NULL);

-	ret = ufshcd_uic_pwr_ctrl(hba, &uic_cmd);
+	ret = ufshcd_uic_pwr_ctrl(hba, &uic_cmd, true);
  	trace_ufshcd_profile_hibern8(dev_name(hba->dev), "exit",
  			     ktime_to_us(ktime_sub(ktime_get(), start)), ret);

@@ -4434,7 +4447,7 @@ int ufshcd_uic_hibern8_exit(struct ufs_hba *hba)
  			__func__, ret);
  	} else {
  		ufshcd_vops_hibern8_notify(hba, UIC_CMD_DME_HIBER_EXIT,
-								POST_CHANGE);
+					   POST_CHANGE, NULL);
  		hba->ufs_stats.last_hibern8_exit_tstamp = local_clock();
  		hba->ufs_stats.hibern8_exit_cnt++;
  	}
diff --git a/drivers/ufs/host/cdns-pltfrm.c b/drivers/ufs/host/cdns-pltfrm.c
index 66811d8d1929..24c05e5c455d 100644
--- a/drivers/ufs/host/cdns-pltfrm.c
+++ b/drivers/ufs/host/cdns-pltfrm.c
@@ -164,7 +164,8 @@ static int cdns_ufs_hce_enable_notify(struct ufs_hba 
*hba,
   * @status: notify stage (pre, post change)
   */
  static void cdns_ufs_hibern8_notify(struct ufs_hba *hba, enum 
uic_cmd_dme cmd,
-				    enum ufs_notify_change_status status)
+				    enum ufs_notify_change_status status,
+				    bool *disable_uic_compl_intr)
  {
  	if (status == PRE_CHANGE && cmd == UIC_CMD_DME_HIBER_ENTER)
  		cdns_ufs_get_l4_attr(hba);
diff --git a/drivers/ufs/host/ufs-exynos.c b/drivers/ufs/host/ufs-exynos.c
index 16ad3528d80b..e991d9e5e2e4 100644
--- a/drivers/ufs/host/ufs-exynos.c
+++ b/drivers/ufs/host/ufs-exynos.c
@@ -1624,8 +1624,9 @@ static int exynos_ufs_pwr_change_notify(struct 
ufs_hba *hba,
  }

  static void exynos_ufs_hibern8_notify(struct ufs_hba *hba,
-				     enum uic_cmd_dme enter,
-				     enum ufs_notify_change_status notify)
+				      enum uic_cmd_dme enter,
+				      enum ufs_notify_change_status notify,
+				      bool *disable_uic_compl_intr)
  {
  	switch ((u8)notify) {
  	case PRE_CHANGE:
diff --git a/include/ufs/ufshcd.h b/include/ufs/ufshcd.h
index a43b14276bc3..59b901d67400 100644
--- a/include/ufs/ufshcd.h
+++ b/include/ufs/ufshcd.h
@@ -355,8 +355,9 @@ struct ufs_hba_variant_ops {
  	void	(*setup_xfer_req)(struct ufs_hba *hba, int tag,
  				  bool is_scsi_cmd);
  	void	(*setup_task_mgmt)(struct ufs_hba *, int, u8);
-	void    (*hibern8_notify)(struct ufs_hba *, enum uic_cmd_dme,
-					enum ufs_notify_change_status);
+	void	(*hibern8_notify)(struct ufs_hba *, enum uic_cmd_dme,
+				  enum ufs_notify_change_status,
+				  bool *disable_uic_compl_intr);
  	int	(*apply_dev_quirks)(struct ufs_hba *hba);
  	void	(*fixup_dev_quirks)(struct ufs_hba *hba);
  	int     (*suspend)(struct ufs_hba *, enum ufs_pm_op,
Bao D. Nguyen Aug. 27, 2024, 9:17 p.m. UTC | #36
On 8/23/2024 1:25 PM, Bart Van Assche wrote:

> If the UIC command completion interrupt could come late we would
> already have observed unhandled interrupt errors in device logs, isn't
> it?
> 
> Anyway, isn't this something that is easy to fix with something like the
> (untested) patch below?
> 
> 
> @@ -2559,8 +2557,7 @@ __ufshcd_send_uic_cmd(struct ufs_hba *hba, struct 
> uic_command *uic_cmd,
>           return -EIO;
>       }
> 
> -    if (completion)
> -        init_completion(&uic_cmd->done);
> +    init_completion(&uic_cmd->done);

This change may not work, Bart.
In this path ufshcd_uic_pwr_ctrl()->__ufshcd_send_uic_cmd(), the 
proposal always has a init_completion(&uic_cmd->done). However, there is 
no ufshcd_wait_for_uic_cmd() in this path. The isr will call 
ufshcd_uic_cmd_compl()->complete(&hba->active_uic_cmd->done); to signal 
the completion, but you are missing the code that waits for the 
&hba->active_uic_cmd->done signal, so it may cause stability issues.

I am going to review another of your proposal in the mail with Peter and 
get back.

Thanks, Bao
Bart Van Assche Aug. 27, 2024, 9:39 p.m. UTC | #37
On 8/27/24 5:17 PM, Bao D. Nguyen wrote:
> On 8/23/2024 1:25 PM, Bart Van Assche wrote:
>> -    if (completion)
>> -        init_completion(&uic_cmd->done);
>> +    init_completion(&uic_cmd->done);
> 
> This change may not work, Bart.

Why not? It is allowed to call init_completion() multiple times before
wait_for_completion() is called. It is even allowed to call 
init_completion() without ever calling wait_for_completion(). All 
init_completion() does is to perform a few assignments:

static inline void init_completion(struct completion *x)
{
	x->done = 0;
	init_swait_queue_head(&x->wait);
}

It is not allowed to call init_completion() concurrently with
wait_for_completion() because that would trigger a race condition. But
I don't think that my patch introduces such a race.

Thanks,

Bart.
Bao D. Nguyen Aug. 27, 2024, 9:59 p.m. UTC | #38
On 8/27/2024 8:42 AM, Bart Van Assche wrote:
> On 8/26/24 6:39 PM, Peter Wang (王信友) wrote:
>> It is not a new vendor hook, ufshcd_vops_hibern8_notify is exist
>> in current kernel.
> Hi Peter,
> 
> Is something like the untested patch below perhaps what you have in
> mind?

Hi Bart, IMHO even though the patch may work, it makes the existing code 
more complicated with the disable_uic_compl_intr flag being sent to the 
platform driver that has nothing to do with it. It seems the proposal 
makes the code harder to read as well. A new person without any history 
of this issue trying to understand the purpose of this flag would 
probably need to spend some extra time digging the history.

Thanks, Bao
Peter Wang (王信友) Aug. 28, 2024, 6:17 a.m. UTC | #39
On Tue, 2024-08-27 at 11:42 -0400, Bart Van Assche wrote:
>  	 
> External email : Please do not click links or open attachments until
> you have verified the sender or the content.
>  On 8/26/24 6:39 PM, Peter Wang (王信友) wrote:
> > It is not a new vendor hook, ufshcd_vops_hibern8_notify is exist
> > in current kernel.
> Hi Peter,
> 
> Is something like the untested patch below perhaps what you have in
> mind?
> 
> Thanks,
> 
> Bart.
> 

Hi Bart,

No, I means you can reference ufs-sprd.c driver. which may have the
same issue?

			/*
			 * Disable UIC COMPL INTR to prevent access to
UFSHCI after
			 * checking HCS.UPMCRS
			 */
			ufs_sprd_ctrl_uic_compl(hba, false);

Then after enter hibernte, you can prevent access to UFSHCI.
After exit hibernate, enable uic complete interrupt again for
workaround.

Thanks.
Peter


> 
> diff --git a/drivers/ufs/core/ufshcd-priv.h
> b/drivers/ufs/core/ufshcd-priv.h
> index ce36154ce963..69ee49a75c04 100644
> --- a/drivers/ufs/core/ufshcd-priv.h
> +++ b/drivers/ufs/core/ufshcd-priv.h
> @@ -176,12 +176,14 @@ static inline void 
> ufshcd_vops_setup_task_mgmt(struct ufs_hba *hba,
>   return hba->vops->setup_task_mgmt(hba, tag, tm_function);
>   }
> 
> -static inline void ufshcd_vops_hibern8_notify(struct ufs_hba *hba,
> -enum uic_cmd_dme cmd,
> -enum ufs_notify_change_status status)
> +static inline void
> +ufshcd_vops_hibern8_notify(struct ufs_hba *hba, enum uic_cmd_dme
> cmd,
> +   enum ufs_notify_change_status status,
> +   bool *disable_uic_compl_intr)
>   {
>   if (hba->vops && hba->vops->hibern8_notify)
> -return hba->vops->hibern8_notify(hba, cmd, status);
> +return hba->vops->hibern8_notify(hba, cmd, status,
> + disable_uic_compl_intr);
>   }
> 
>   static inline int ufshcd_vops_apply_dev_quirks(struct ufs_hba *hba)
> diff --git a/drivers/ufs/core/ufshcd.c b/drivers/ufs/core/ufshcd.c
> index e13b9ac145f6..614b24f2eb7f 100644
> --- a/drivers/ufs/core/ufshcd.c
> +++ b/drivers/ufs/core/ufshcd.c
> @@ -2541,9 +2541,8 @@ ufshcd_wait_for_uic_cmd(struct ufs_hba *hba, 
> struct uic_command *uic_cmd)
>    *
>    * Return: 0 only if success.
>    */
> -static int
> -__ufshcd_send_uic_cmd(struct ufs_hba *hba, struct uic_command
> *uic_cmd,
> -      bool completion)
> +static int __ufshcd_send_uic_cmd(struct ufs_hba *hba,
> + struct uic_command *uic_cmd)
>   {
>   lockdep_assert_held(&hba->uic_cmd_mutex);
> 
> @@ -2553,8 +2552,7 @@ __ufshcd_send_uic_cmd(struct ufs_hba *hba,
> struct 
> uic_command *uic_cmd,
>   return -EIO;
>   }
> 
> -if (completion)
> -init_completion(&uic_cmd->done);
> +init_completion(&uic_cmd->done);
> 
>   uic_cmd->cmd_active = 1;
>   ufshcd_dispatch_uic_cmd(hba, uic_cmd);
> @@ -2580,7 +2578,7 @@ int ufshcd_send_uic_cmd(struct ufs_hba *hba, 
> struct uic_command *uic_cmd)
>   mutex_lock(&hba->uic_cmd_mutex);
>   ufshcd_add_delay_before_dme_cmd(hba);
> 
> -ret = __ufshcd_send_uic_cmd(hba, uic_cmd, true);
> +ret = __ufshcd_send_uic_cmd(hba, uic_cmd);
>   if (!ret)
>   ret = ufshcd_wait_for_uic_cmd(hba, uic_cmd);
> 
> @@ -4243,7 +4241,8 @@ EXPORT_SYMBOL_GPL(ufshcd_dme_get_attr);
>    *
>    * Return: 0 on success, non-zero value on failure.
>    */
> -static int ufshcd_uic_pwr_ctrl(struct ufs_hba *hba, struct
> uic_command 
> *cmd)
> +static int ufshcd_uic_pwr_ctrl(struct ufs_hba *hba, struct
> uic_command 
> *cmd,
> +       bool disable_uic_compl_intr)
>   {
>   DECLARE_COMPLETION_ONSTACK(uic_async_done);
>   unsigned long flags;
> @@ -4260,7 +4259,8 @@ static int ufshcd_uic_pwr_ctrl(struct ufs_hba 
> *hba, struct uic_command *cmd)
>   goto out_unlock;
>   }
>   hba->uic_async_done = &uic_async_done;
> -if (ufshcd_readl(hba, REG_INTERRUPT_ENABLE) & UIC_COMMAND_COMPL) {
> +if (disable_uic_compl_intr &&
> +    ufshcd_readl(hba, REG_INTERRUPT_ENABLE) & UIC_COMMAND_COMPL) {
>   ufshcd_disable_intr(hba, UIC_COMMAND_COMPL);
>   /*
>    * Make sure UIC command completion interrupt is disabled before
> @@ -4270,7 +4270,7 @@ static int ufshcd_uic_pwr_ctrl(struct ufs_hba 
> *hba, struct uic_command *cmd)
>   reenable_intr = true;
>   }
>   spin_unlock_irqrestore(hba->host->host_lock, flags);
> -ret = __ufshcd_send_uic_cmd(hba, cmd, false);
> +ret = __ufshcd_send_uic_cmd(hba, cmd);
>   if (ret) {
>   dev_err(hba->dev,
>   "pwr ctrl cmd 0x%x with mode 0x%x uic error %d\n",
> @@ -4294,6 +4294,16 @@ static int ufshcd_uic_pwr_ctrl(struct ufs_hba 
> *hba, struct uic_command *cmd)
>   goto out;
>   }
> 
> +if (!disable_uic_compl_intr &&
> +    ufshcd_readl(hba, REG_INTERRUPT_ENABLE) & UIC_COMMAND_COMPL) {
> +ret = wait_for_completion_timeout(
> +&cmd->done, msecs_to_jiffies(uic_cmd_timeout));
> +if (ret == 0)
> +dev_err(hba->dev, "pwr ctrl cmd %#x timed out\n",
> +cmd->command);
> +ret = 0;
> +}
> +
>   check_upmcrs:
>   status = ufshcd_get_upmcrs(hba);
>   if (status != PWR_LOCAL) {
> @@ -4353,7 +4363,7 @@ int ufshcd_uic_change_pwr_mode(struct ufs_hba 
> *hba, u8 mode)
>   }
> 
>   ufshcd_hold(hba);
> -ret = ufshcd_uic_pwr_ctrl(hba, &uic_cmd);
> +ret = ufshcd_uic_pwr_ctrl(hba, &uic_cmd, true);
>   ufshcd_release(hba);
> 
>   out:
> @@ -4396,11 +4406,13 @@ int ufshcd_uic_hibern8_enter(struct ufs_hba
> *hba)
>   .command = UIC_CMD_DME_HIBER_ENTER,
>   };
>   ktime_t start = ktime_get();
> +bool disable_uic_compl_intr = true;
>   int ret;
> 
> -ufshcd_vops_hibern8_notify(hba, UIC_CMD_DME_HIBER_ENTER,
> PRE_CHANGE);
> +ufshcd_vops_hibern8_notify(hba, UIC_CMD_DME_HIBER_ENTER, PRE_CHANGE,
> +   &disable_uic_compl_intr);
> 
> -ret = ufshcd_uic_pwr_ctrl(hba, &uic_cmd);
> +ret = ufshcd_uic_pwr_ctrl(hba, &uic_cmd, disable_uic_compl_intr);
>   trace_ufshcd_profile_hibern8(dev_name(hba->dev), "enter",
>        ktime_to_us(ktime_sub(ktime_get(), start)), ret);
> 
> @@ -4409,7 +4421,7 @@ int ufshcd_uic_hibern8_enter(struct ufs_hba
> *hba)
>   __func__, ret);
>   else
>   ufshcd_vops_hibern8_notify(hba, UIC_CMD_DME_HIBER_ENTER,
> -POST_CHANGE);
> +   POST_CHANGE, NULL);
> 
>   return ret;
>   }
> @@ -4423,9 +4435,10 @@ int ufshcd_uic_hibern8_exit(struct ufs_hba
> *hba)
>   int ret;
>   ktime_t start = ktime_get();
> 
> -ufshcd_vops_hibern8_notify(hba, UIC_CMD_DME_HIBER_EXIT, PRE_CHANGE);
> +ufshcd_vops_hibern8_notify(hba, UIC_CMD_DME_HIBER_EXIT, PRE_CHANGE,
> +   NULL);
> 
> -ret = ufshcd_uic_pwr_ctrl(hba, &uic_cmd);
> +ret = ufshcd_uic_pwr_ctrl(hba, &uic_cmd, true);
>   trace_ufshcd_profile_hibern8(dev_name(hba->dev), "exit",
>        ktime_to_us(ktime_sub(ktime_get(), start)), ret);
> 
> @@ -4434,7 +4447,7 @@ int ufshcd_uic_hibern8_exit(struct ufs_hba
> *hba)
>   __func__, ret);
>   } else {
>   ufshcd_vops_hibern8_notify(hba, UIC_CMD_DME_HIBER_EXIT,
> -POST_CHANGE);
> +   POST_CHANGE, NULL);
>   hba->ufs_stats.last_hibern8_exit_tstamp = local_clock();
>   hba->ufs_stats.hibern8_exit_cnt++;
>   }
> diff --git a/drivers/ufs/host/cdns-pltfrm.c b/drivers/ufs/host/cdns-
> pltfrm.c
> index 66811d8d1929..24c05e5c455d 100644
> --- a/drivers/ufs/host/cdns-pltfrm.c
> +++ b/drivers/ufs/host/cdns-pltfrm.c
> @@ -164,7 +164,8 @@ static int cdns_ufs_hce_enable_notify(struct
> ufs_hba 
> *hba,
>    * @status: notify stage (pre, post change)
>    */
>   static void cdns_ufs_hibern8_notify(struct ufs_hba *hba, enum 
> uic_cmd_dme cmd,
> -    enum ufs_notify_change_status status)
> +    enum ufs_notify_change_status status,
> +    bool *disable_uic_compl_intr)
>   {
>   if (status == PRE_CHANGE && cmd == UIC_CMD_DME_HIBER_ENTER)
>   cdns_ufs_get_l4_attr(hba);
> diff --git a/drivers/ufs/host/ufs-exynos.c b/drivers/ufs/host/ufs-
> exynos.c
> index 16ad3528d80b..e991d9e5e2e4 100644
> --- a/drivers/ufs/host/ufs-exynos.c
> +++ b/drivers/ufs/host/ufs-exynos.c
> @@ -1624,8 +1624,9 @@ static int exynos_ufs_pwr_change_notify(struct 
> ufs_hba *hba,
>   }
> 
>   static void exynos_ufs_hibern8_notify(struct ufs_hba *hba,
> -     enum uic_cmd_dme enter,
> -     enum ufs_notify_change_status notify)
> +      enum uic_cmd_dme enter,
> +      enum ufs_notify_change_status notify,
> +      bool *disable_uic_compl_intr)
>   {
>   switch ((u8)notify) {
>   case PRE_CHANGE:
> diff --git a/include/ufs/ufshcd.h b/include/ufs/ufshcd.h
> index a43b14276bc3..59b901d67400 100644
> --- a/include/ufs/ufshcd.h
> +++ b/include/ufs/ufshcd.h
> @@ -355,8 +355,9 @@ struct ufs_hba_variant_ops {
>   void(*setup_xfer_req)(struct ufs_hba *hba, int tag,
>     bool is_scsi_cmd);
>   void(*setup_task_mgmt)(struct ufs_hba *, int, u8);
> -void    (*hibern8_notify)(struct ufs_hba *, enum uic_cmd_dme,
> -enum ufs_notify_change_status);
> +void(*hibern8_notify)(struct ufs_hba *, enum uic_cmd_dme,
> +  enum ufs_notify_change_status,
> +  bool *disable_uic_compl_intr);
>   int(*apply_dev_quirks)(struct ufs_hba *hba);
>   void(*fixup_dev_quirks)(struct ufs_hba *hba);
>   int     (*suspend)(struct ufs_hba *, enum ufs_pm_op,
>
Bart Van Assche Aug. 28, 2024, 11:18 a.m. UTC | #40
On 8/28/24 2:17 AM, Peter Wang (王信友) wrote:
> No, I means you can reference ufs-sprd.c driver. which may have the
> same issue?
> 
> 			/*
> 			 * Disable UIC COMPL INTR to prevent access to
> UFSHCI after
> 			 * checking HCS.UPMCRS
> 			 */
> 			ufs_sprd_ctrl_uic_compl(hba, false);
> 
> Then after enter hibernte, you can prevent access to UFSHCI.
> After exit hibernate, enable uic complete interrupt again for
> workaround.
Hi Peter,

My opinion about this is as follows:
* Host drivers should not disable or enable the UIC completion
   interrupt. Only the UFS controller core driver should do this.
* The behavior I'm observing is that modifying the REG_INTERRUPT_ENABLE
   register is sufficient to cause the UniPro link to exit the
   hibernation state. Avoiding this cannot be achieved in a clean way
   without modifying the UFS controller core driver.

Thanks,

Bart.
Peter Wang (王信友) Aug. 28, 2024, 1:46 p.m. UTC | #41
On Wed, 2024-08-28 at 07:18 -0400, Bart Van Assche wrote:
>  	 
> External email : Please do not click links or open attachments until
> you have verified the sender or the content.
>  On 8/28/24 2:17 AM, Peter Wang (王信友) wrote:
> > No, I means you can reference ufs-sprd.c driver. which may have the
> > same issue?
> > 
> > /*
> >  * Disable UIC COMPL INTR to prevent access to
> > UFSHCI after
> >  * checking HCS.UPMCRS
> >  */
> > ufs_sprd_ctrl_uic_compl(hba, false);
> > 
> > Then after enter hibernte, you can prevent access to UFSHCI.
> > After exit hibernate, enable uic complete interrupt again for
> > workaround.
> Hi Peter,
> 
> My opinion about this is as follows:
> * Host drivers should not disable or enable the UIC completion
>    interrupt. Only the UFS controller core driver should do this.
> * The behavior I'm observing is that modifying the
> REG_INTERRUPT_ENABLE
>    register is sufficient to cause the UniPro link to exit the
>    hibernation state. Avoiding this cannot be achieved in a clean way
>    without modifying the UFS controller core driver.
> 
> Thanks,
> 
> Bart.

Hi Bart,

I am confusing,

1. If only UFS controller core driver should do this, 
   What about registers that are not REG_INTERRUPT_ENABLE?
   Since ufshcd_writel has been exported, shouldn't the host 
   driver have the authority to control all Host REGs?
2. Set REG_INTERRUPT_ENABLE only when hibernate exit? 
   If cause the UniPro link to exit, then it should still be correct, 
   just exiting hibernate early?

Thanks.
Peter
Bart Van Assche Aug. 28, 2024, 2:10 p.m. UTC | #42
On 8/28/24 9:46 AM, Peter Wang (王信友) wrote:
> 1. If only UFS controller core driver should do this,
>     What about registers that are not REG_INTERRUPT_ENABLE?
>     Since ufshcd_writel has been exported, shouldn't the host
>     driver have the authority to control all Host REGs?

It is not because host drivers can change any host controller register
that host drivers should take the freedom to modify all host controller
registers. Modifying host controller registers that are vendor
specific from a host driver seems totally fine to me. I think that
standardized host controller registers should only be modified from the
UFS driver core. Otherwise the UFS core driver cannot be understood nor
verified without deep understanding of all the host drivers.

> 2. Set REG_INTERRUPT_ENABLE only when hibernate exit?
>     If cause the UniPro link to exit, then it should still be correct,
>     just exiting hibernate early?

This approach has two disadvantages:
- It requires that even more state information is tracked in struct
   ufs_hba.
- This approach is probably incompatible with the power management core.
   I think that there is code in the power management core for disabling
   interrupts during suspend and reenabling interrupts during resume.
   Enabling an interrupt that is already enabled is not allowed.

In general, disabling / reenabling interrupts is something that should
be avoided because it is not compatible with multithreading. The reason
why it works in the UFS driver for UIC commands is because these are
serialized.

Thanks,

Bart.
Peter Wang (王信友) Aug. 29, 2024, 2:34 a.m. UTC | #43
On Wed, 2024-08-28 at 10:10 -0400, Bart Van Assche wrote:
>  	 
> External email : Please do not click links or open attachments until
> you have verified the sender or the content.
>  On 8/28/24 9:46 AM, Peter Wang (王信友) wrote:
> > 1. If only UFS controller core driver should do this,
> >     What about registers that are not REG_INTERRUPT_ENABLE?
> >     Since ufshcd_writel has been exported, shouldn't the host
> >     driver have the authority to control all Host REGs?
> 
> It is not because host drivers can change any host controller
> register
> that host drivers should take the freedom to modify all host
> controller
> registers. Modifying host controller registers that are vendor
> specific from a host driver seems totally fine to me. I think that
> standardized host controller registers should only be modified from
> the
> UFS driver core. Otherwise the UFS core driver cannot be understood
> nor
> verified without deep understanding of all the host drivers.
> 

Hi Bart,

But what we are discussing is a 'workaround', which might allow 
modifications to standardized host controller registers, right?



> > 2. Set REG_INTERRUPT_ENABLE only when hibernate exit?
> >     If cause the UniPro link to exit, then it should still be
> correct,
> >     just exiting hibernate early?
> 
> This approach has two disadvantages:
> - It requires that even more state information is tracked in struct
>    ufs_hba.

It could utilize a host private structure, like ufs_mtk_host.

> - This approach is probably incompatible with the power management
> core.
>    I think that there is code in the power management core for
> disabling
>    interrupts during suspend and reenabling interrupts during resume.
>    Enabling an interrupt that is already enabled is not allowed.
> 

Power management enable or disable interrupt should through 
device driver hook, such as suspend/resume callback function? 
I am not sure because MediaTek's power management does 
not directly control interrupts.


> In general, disabling / reenabling interrupts is something that
> should
> be avoided because it is not compatible with multithreading. The
> reason
> why it works in the UFS driver for UIC commands is because these are
> serialized.

Yes, but entering and exiting hibernate are protected by 
clock gating or the runtime/system PM framework. There shouldn't 
be issues with multiple threads entering or exiting hibernate?

Thanks.
Peter

> 
> Thanks,
> 
> Bart.
diff mbox series

Patch

diff --git a/drivers/ufs/core/ufshcd.c b/drivers/ufs/core/ufshcd.c
index d0ae6e50becc..e12f30b8a83c 100644
--- a/drivers/ufs/core/ufshcd.c
+++ b/drivers/ufs/core/ufshcd.c
@@ -2585,6 +2585,7 @@  int ufshcd_send_uic_cmd(struct ufs_hba *hba, struct uic_command *uic_cmd)
 	ufshcd_hold(hba);
 	mutex_lock(&hba->uic_cmd_mutex);
 	ufshcd_add_delay_before_dme_cmd(hba);
+	WARN_ON(hba->uic_async_done);
 
 	ret = __ufshcd_send_uic_cmd(hba, uic_cmd, true);
 	if (!ret)
@@ -4255,7 +4256,6 @@  static int ufshcd_uic_pwr_ctrl(struct ufs_hba *hba, struct uic_command *cmd)
 	unsigned long flags;
 	u8 status;
 	int ret;
-	bool reenable_intr = false;
 
 	mutex_lock(&hba->uic_cmd_mutex);
 	ufshcd_add_delay_before_dme_cmd(hba);
@@ -4266,15 +4266,6 @@  static int ufshcd_uic_pwr_ctrl(struct ufs_hba *hba, struct uic_command *cmd)
 		goto out_unlock;
 	}
 	hba->uic_async_done = &uic_async_done;
-	if (ufshcd_readl(hba, REG_INTERRUPT_ENABLE) & UIC_COMMAND_COMPL) {
-		ufshcd_disable_intr(hba, UIC_COMMAND_COMPL);
-		/*
-		 * Make sure UIC command completion interrupt is disabled before
-		 * issuing UIC command.
-		 */
-		ufshcd_readl(hba, REG_INTERRUPT_ENABLE);
-		reenable_intr = true;
-	}
 	spin_unlock_irqrestore(hba->host->host_lock, flags);
 	ret = __ufshcd_send_uic_cmd(hba, cmd, false);
 	if (ret) {
@@ -4318,8 +4309,6 @@  static int ufshcd_uic_pwr_ctrl(struct ufs_hba *hba, struct uic_command *cmd)
 	spin_lock_irqsave(hba->host->host_lock, flags);
 	hba->active_uic_cmd = NULL;
 	hba->uic_async_done = NULL;
-	if (reenable_intr)
-		ufshcd_enable_intr(hba, UIC_COMMAND_COMPL);
 	if (ret) {
 		ufshcd_set_link_broken(hba);
 		ufshcd_schedule_eh_work(hba);
@@ -5472,11 +5461,12 @@  static irqreturn_t ufshcd_uic_cmd_compl(struct ufs_hba *hba, u32 intr_status)
 		hba->errors |= (UFSHCD_UIC_HIBERN8_MASK & intr_status);
 
 	if (intr_status & UIC_COMMAND_COMPL && cmd) {
-		cmd->argument2 |= ufshcd_get_uic_cmd_result(hba);
-		cmd->argument3 = ufshcd_get_dme_attr_val(hba);
-		if (!hba->uic_async_done)
+		if (!hba->uic_async_done) {
+			cmd->argument2 |= ufshcd_get_uic_cmd_result(hba);
+			cmd->argument3 = ufshcd_get_dme_attr_val(hba);
 			cmd->cmd_active = 0;
-		complete(&cmd->done);
+			complete(&cmd->done);
+		}
 		retval = IRQ_HANDLED;
 	}
 
diff --git a/include/ufs/ufshcd.h b/include/ufs/ufshcd.h
index a43b14276bc3..0577013a4611 100644
--- a/include/ufs/ufshcd.h
+++ b/include/ufs/ufshcd.h
@@ -868,9 +868,10 @@  enum ufshcd_mcq_opr {
  * @tmf_tag_set: TMF tag set.
  * @tmf_queue: Used to allocate TMF tags.
  * @tmf_rqs: array with pointers to TMF requests while these are in progress.
- * @active_uic_cmd: handle of active UIC command
- * @uic_cmd_mutex: mutex for UIC command
- * @uic_async_done: completion used during UIC processing
+ * @active_uic_cmd: active UIC command pointer.
+ * @uic_cmd_mutex: mutex used to serialize UIC command processing.
+ * @uic_async_done: completion used to wait for power mode or hibernation state
+ *	changes.
  * @ufshcd_state: UFSHCD state
  * @eh_flags: Error handling flags
  * @intr_mask: Interrupt Mask Bits