
[v4,0/2] Fix two regression issues for QCA controllers

Message ID 1713650800-29741-1-git-send-email-quic_zijuhu@quicinc.com (mailing list archive)

Message

quic_zijuhu April 20, 2024, 10:06 p.m. UTC
This patch series fixes the following two regression issues for QCA controllers:
1) BT can't be enabled again once it has been enabled for QCA_QCA6390
2) BT can't be enabled after disable followed by a warm reboot for QCA_QCA6390

The links for these issues are shown below:
https://bugzilla.kernel.org/show_bug.cgi?id=218726
https://lore.kernel.org/linux-bluetooth/ea20bb9b-6b60-47fc-ae42-5eed918ad7b4@quicinc.com/T/#m73d6a71d2f454bb03588c66f3ef7912274d37c6f

Changes:
V3 -> V4: Correct code style and commit message
V2 -> V3: Wrong patch set was sent
V1 -> V2: Remove debugging logs

Zijun Hu (2):
  Bluetooth: qca: Fix BT enable failure for QCA_QCA6390
  Bluetooth: qca: Fix BT enable failure for QCA_QCA6390 after disable
    then warm reboot

 drivers/bluetooth/hci_qca.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

Comments

Wren Turkal April 21, 2024, 7:44 a.m. UTC | #1
On 4/20/24 3:06 PM, Zijun Hu wrote:
> This patch series are to fix below 2 regression issues for QCA controllers
> 1) BT can't be enabled once BT was ever enabled for QCA_QCA6390
> 2) BT can't be enabled after disable then warm reboot for QCA_QCA6390

@Zijun @Krzysztof and @Bartosz Would it be helpful for me to test these 
to ensure they fix the issues I reported?

> the links for these issues are shown below:
> https://bugzilla.kernel.org/show_bug.cgi?id=218726
> https://lore.kernel.org/linux-bluetooth/ea20bb9b-6b60-47fc-ae42-5eed918ad7b4@quicinc.com/T/#m73d6a71d2f454bb03588c66f3ef7912274d37c6f
> 
> Changes:
> V3 -> V4: Correct code stype and commit message
> V2 -> V3: Wrong patch sets are sent
> V1 -> V2: Remove debugging logs
> 
> Zijun Hu (2):
>    Bluetooth: qca: Fix BT enable failure for QCA_QCA6390
>    Bluetooth: qca: Fix BT enable failure for QCA_QCA6390 after disable
>      then warm reboot
> 
>   drivers/bluetooth/hci_qca.c | 7 ++++---
>   1 file changed, 4 insertions(+), 3 deletions(-)
quic_zijuhu April 21, 2024, 9:30 a.m. UTC | #2
On 4/21/2024 3:44 PM, Wren Turkal wrote:
> On 4/20/24 3:06 PM, Zijun Hu wrote:
>> This patch series are to fix below 2 regression issues for QCA
>> controllers
>> 1) BT can't be enabled once BT was ever enabled for QCA_QCA6390
>> 2) BT can't be enabled after disable then warm reboot for QCA_QCA6390
> 
> @Zijun @Krzysztof and @Bartosz Would it be helpful for me to test these
> to ensure they fix the issues I reported?
> 
Hi Wren,
For QCA6390, this updated patch set is the same as the patch set you
already tested.
Sure, feel free to test this one if you would like.
>> the links for these issues are shown below:
>> https://bugzilla.kernel.org/show_bug.cgi?id=218726
>> https://lore.kernel.org/linux-bluetooth/ea20bb9b-6b60-47fc-ae42-5eed918ad7b4@quicinc.com/T/#m73d6a71d2f454bb03588c66f3ef7912274d37c6f
>>
>> Changes:
>> V3 -> V4: Correct code stype and commit message
>> V2 -> V3: Wrong patch sets are sent
>> V1 -> V2: Remove debugging logs
>>
>> Zijun Hu (2):
>>    Bluetooth: qca: Fix BT enable failure for QCA_QCA6390
>>    Bluetooth: qca: Fix BT enable failure for QCA_QCA6390 after disable
>>      then warm reboot
>>
>>   drivers/bluetooth/hci_qca.c | 7 ++++---
>>   1 file changed, 4 insertions(+), 3 deletions(-)
> 
>
Krzysztof Kozlowski April 21, 2024, 1:51 p.m. UTC | #3
On 21/04/2024 00:06, Zijun Hu wrote:
> This patch series are to fix below 2 regression issues for QCA controllers
> 1) BT can't be enabled once BT was ever enabled for QCA_QCA6390
> 2) BT can't be enabled after disable then warm reboot for QCA_QCA6390
> 
> the links for these issues are shown below:
> https://bugzilla.kernel.org/show_bug.cgi?id=218726
> https://lore.kernel.org/linux-bluetooth/ea20bb9b-6b60-47fc-ae42-5eed918ad7b4@quicinc.com/T/#m73d6a71d2f454bb03588c66f3ef7912274d37c6f
> 
> Changes:
> V3 -> V4: Correct code stype and commit message
> V2 -> V3: Wrong patch sets are sent

Didn't you get a comment asking you not to attach your postings to other
threads? Each posting is a separate thread.

Best regards,
Krzysztof
Krzysztof Kozlowski April 21, 2024, 6:41 p.m. UTC | #4
On 21/04/2024 09:44, Wren Turkal wrote:
> On 4/20/24 3:06 PM, Zijun Hu wrote:
>> This patch series are to fix below 2 regression issues for QCA controllers
>> 1) BT can't be enabled once BT was ever enabled for QCA_QCA6390
>> 2) BT can't be enabled after disable then warm reboot for QCA_QCA6390
> 
> @Zijun @Krzysztof and @Bartosz Would it be helpful for me to test these 
> to ensure they fix the issues I reported?
> 

I look forward to someone testing these on other hardware, not yours: on
the hardware where the original issues that led to these changes were
happening, e.g. RB5.

Anyway, the problem here is poor explanation of the problem which did
not improve in v3 and v4. Instead I receive explanations like:

"this is shutdown of serdev and not hdev's shutdown."
Not related...

"now. you understood why your merged change as shown link of 4) have
problems and introduced our discussed issue, right?"

No. I did not understand, and I feel I am wasting my time here.

The code could be correct or could be wrong. The second patch especially
looks suspicious. But the way Zijun Hu explains it and the way Zijun Hu
responds is not helping at all.

Sorry, with such replies to review, it is not worth my time.

Best regards,
Krzysztof
quic_zijuhu April 22, 2024, 12:14 a.m. UTC | #5
On 4/22/2024 2:41 AM, Krzysztof Kozlowski wrote:
> On 21/04/2024 09:44, Wren Turkal wrote:
>> On 4/20/24 3:06 PM, Zijun Hu wrote:
>>> This patch series are to fix below 2 regression issues for QCA controllers
>>> 1) BT can't be enabled once BT was ever enabled for QCA_QCA6390
>>> 2) BT can't be enabled after disable then warm reboot for QCA_QCA6390
>>
>> @Zijun @Krzysztof and @Bartosz Would it be helpful for me to test these 
>> to ensure they fix the issues I reported?
>>
> 
> I look forward to someone testing these on other hardware, not yours. On
> the hardware where the original issues were happening leading to this
> changes, e.g. RB5.
> 
> Anyway, the problem here is poor explanation of the problem which did
> not improve in v3 and v4. Instead I receive explanations like:
> 
> "this is shutdown of serdev and not hdev's shutdown."
> Not related...
> 
This was my reply for the second issue. I believe I have already explained
my fix for the 2nd issue in detail, as shown by the link below.
Let me add a bit more explanation in the "For the 2nd issue" section at
the end. I assume you are familiar with the generic flag
HCI_QUIRK_NON_PERSISTENT_SETUP (otherwise, see the header comment for the
quirk) and that you have looked at the commit history to find why
qca_serdev_shutdown() was introduced for QCA6390.
https://lore.kernel.org/all/fe1a0e3b-3408-4a33-90e9-d4ffcfc7a99b@quicinc.com/
> "now. you understood why your merged change as shown link of 4) have
> problems and introduced our discussed issue, right?"
> 
This was my reply for the first issue, as shown by the link below. It has
almost the same description as the "For 1st issue:" section that follows.
I believe it clearly illustrates why the commit has bugs.
https://lore.kernel.org/all/2166fc66-9340-4e8c-8662-17a19a7d8ce6@linaro.org/
> No. I did not understand and I feel I am wasting here time.
> 
> Code could be correct, could be wrong. Especially second patch looks
> suspicious. But the way Zijun Hu explains it and the way Zijun Hu
> responds is not helping at all.
> 
> Sorry, with such replies to review, it is not worth my time.
> 
> Best regards,
> Krzysztof
> 
Hi Luiz, Marcel,

It is time for me to ask you to comment on our discussion
and on my fixes. Let me explain the 1st issue, then the 2nd one.

For 1st issue:
1) The following commit causes a serious regression for QCA
controllers, and it has been merged into Linus's mainline kernel.

Commit 56d074d26c58 ("Bluetooth: hci_qca: don't use IS_ERR_OR_NULL()
with gpiod_get_optional()").

2) The regression is described by the [PATCH v4 1/2] commit message
  as follows:
  BT can't be enabled after the steps below:
  cold boot -> enable BT -> disable BT -> warm reboot -> BT enable
failure if property enable-gpios is not configured within DT|ACPI for
QCA_QCA6390.
  I will verify and confirm whether QCA_QCA2066 and QCA_ROME are also impacted.

3) Let me explain the bug in the commit mentioned in 1). Its
   commit message and the change that introduced the bug are shown below.

The optional variants for the gpiod_get() family of functions return
NULL if the GPIO in question is not associated with this device. They
return ERR_PTR() on any other error. NULL descriptors are graciously
handled by GPIOLIB and can be safely passed to any of the GPIO consumer
interfaces as they will return 0 and act as if the function succeeded.
If one is using the optional variant, then there's no point in checking
for NULL.

 		qcadev->bt_en = devm_gpiod_get_optional(&serdev->dev, "enable",
 					       GPIOD_OUT_LOW);
-		if (IS_ERR_OR_NULL(qcadev->bt_en)) {
+		if (IS_ERR(qcadev->bt_en)) {
 			dev_warn(&serdev->dev, "failed to acquire enable gpio\n");
 			power_ctrl_enabled = false;
 		}
   3.1) We only need to discuss how to handle the case "qcadev->bt_en ==
NULL", since this is the only difference between the commit and the
original BT design.
   3.2) The original BT design agrees with the point of the above commit
message that the case "qcadev->bt_en == NULL" should not be treated as an
error, so the original design does not return an error for this case and
uses dev_warn() instead of dev_err() to report it.
   3.3) The commit misunderstands the original BT design and wrongly
assumes that it treats "qcadev->bt_en == NULL" as an error case, so it
changes the logic that sets the flag power_ctrl_enabled and causes the
issue under discussion.
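
To make the "qcadev->bt_en == NULL" case concrete, here is a minimal
userspace sketch of the two checks. It is not the driver code: ERR_PTR(),
IS_ERR() and IS_ERR_OR_NULL() below are simplified stand-ins for the
kernel's <linux/err.h> helpers, and power_ctrl_before() and
power_ctrl_after() are hypothetical names that only model how the hunk
above sets power_ctrl_enabled before and after commit 56d074d26c58.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified userspace stand-ins for the kernel's <linux/err.h> helpers. */
#define MAX_ERRNO 4095

static inline void *ERR_PTR(long error)
{
	return (void *)(intptr_t)error;
}

static inline bool IS_ERR(const void *ptr)
{
	/* Error pointers live in the last MAX_ERRNO values of the address space. */
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

static inline bool IS_ERR_OR_NULL(const void *ptr)
{
	return !ptr || IS_ERR(ptr);
}

/*
 * Hypothetical model of the flag before the commit:
 * a NULL descriptor (no enable GPIO described) clears power_ctrl_enabled.
 */
static bool power_ctrl_before(const void *bt_en)
{
	bool power_ctrl_enabled = true;

	if (IS_ERR_OR_NULL(bt_en))
		power_ctrl_enabled = false;
	return power_ctrl_enabled;
}

/*
 * Hypothetical model of the flag after the commit:
 * only a real error clears the flag, so NULL leaves it set.
 */
static bool power_ctrl_after(const void *bt_en)
{
	bool power_ctrl_enabled = true;

	if (IS_ERR(bt_en))
		power_ctrl_enabled = false;
	return power_ctrl_enabled;
}
```

Under these assumptions the two models agree for a valid descriptor and
for an error pointer such as ERR_PTR(-EINVAL). They differ only for NULL:
power_ctrl_before(NULL) is false, while power_ctrl_after(NULL) is true.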

For the 2nd issue:
1) The following commit causes the regression below for QCA_QCA6390.
Commit 272970be3dab ("Bluetooth: hci_qca: Fix driver shutdown on closed
    serdev")

2) The regression is described by the [PATCH v4 2/2] commit message
  as follows:
  BT can't be enabled after the steps below:
  cold boot -> enable BT -> disable BT -> warm reboot -> BT enable
failure if property enable-gpios is not configured within DT|ACPI for
QCA_QCA6390.

3) qca_serdev_shutdown() is the serdev's shutdown and not the hdev's
shutdown(). It should not be, and never gets the chance to be, invoked even
if BT is disabled at step 2) above. qca_serdev_shutdown() needs to send the
VSC to reset the controller during the warm reset phase of the steps in 2).
Wren Turkal April 22, 2024, 5:21 a.m. UTC | #6
On 4/21/24 5:14 PM, quic_zijuhu wrote:
> On 4/22/2024 2:41 AM, Krzysztof Kozlowski wrote:
>> On 21/04/2024 09:44, Wren Turkal wrote:
>>> On 4/20/24 3:06 PM, Zijun Hu wrote:
>>>> This patch series are to fix below 2 regression issues for QCA controllers
>>>> 1) BT can't be enabled once BT was ever enabled for QCA_QCA6390
>>>> 2) BT can't be enabled after disable then warm reboot for QCA_QCA6390
>>>
>>> @Zijun @Krzysztof and @Bartosz Would it be helpful for me to test these
>>> to ensure they fix the issues I reported?
>>>
>>
>> I look forward to someone testing these on other hardware, not yours. On
>> the hardware where the original issues were happening leading to this
>> changes, e.g. RB5.
>>
>> Anyway, the problem here is poor explanation of the problem which did
>> not improve in v3 and v4. Instead I receive explanations like:
>>
>> "this is shutdown of serdev and not hdev's shutdown."
>> Not related...
>>
> this is the reply for secondary issue. i believe i have given much
> explain for my fix for the 2nd issue as shown by below links.
> let me add a bit more explanation within the ending "For the 2nd issue"
> section, supposed you known much for generic flag
> HCI_QUIRK_NON_PERSISTENT_SETUP, otherwise, see header comment for the
> quirk. also supposed you see commit history to find why
> qca_serdev_shutdown() was introduced for QCA6390.
> https://lore.kernel.org/all/fe1a0e3b-3408-4a33-90e9-d4ffcfc7a99b@quicinc.com/
>> "now. you understood why your merged change as shown link of 4) have
>> problems and introduced our discussed issue, right?"
>>
> this is the reply for the first issue as shown by below link. it almost
> have the same description as the following "For 1st issue:" section.
> i believe it have clear illustration why the commit have bugs.
> https://lore.kernel.org/all/2166fc66-9340-4e8c-8662-17a19a7d8ce6@linaro.org/
>> No. I did not understand and I feel I am wasting here time.
>>
>> Code could be correct, could be wrong. Especially second patch looks
>> suspicious. But the way Zijun Hu explains it and the way Zijun Hu
>> responds is not helping at all.
>>
>> Sorry, with such replies to review, it is not worth my time.
>>
>> Best regards,
>> Krzysztof
>>
> Hi luiz,marcel
> 
> it is time for me to request you give comments for our discussion
> and for my fixes, Let me explain the 1st issue then 2nd one.
> 
> For 1st issue:
> 1) the following commit will cause serious regression issue for QCA
> controllers, and it has been merged with linus's mainline kernel.
> 
> Commit 56d074d26c58 ("Bluetooth: hci_qca: don't use IS_ERR_OR_NULL()
> with gpiod_get_optional()").

As the user who originally reported this issue, I can confirm this. I 
was introduced to this regression because I use Fedora Rawhide on my 
laptop, which builds and pushes kernels based on mainline very regularly.

Here is my description of the regression: After the reverted change, the 
BT hardware in my laptop (qca6390) will only work after a cold boot when 
the hardware has only been enabled once by the driver. Once the hardware 
is enabled, the process of disabling/re-enabling fails. Also, the 
hardware cannot be enabled after a warm boot of the laptop.

Among other things, this makes logging into KDE Plasma break my 
bluetooth mouse. The cause of this breakage appears to be that Plasma 
disables/re-enables bluetooth hardware upon login.

GNOME operates slightly less badly in that bluetooth stays enabled. 
However, if I manually disable the bluetooth via the ui or by restarting 
the bluetooth service with systemctl, the mouse fails in the same way as 
happens with Plasma.

Once the bluetooth has failed, the only way to fix is a cold boot and 
only enable the hardware once. I cannot remove the modules (btqca, 
hci_uart, and bluetooth) and re-modprobe them to fix it. I can't restart 
the bluetooth service. I can't do both of those things. I haven't found 
any way to re-enable the hardware beyond cold boot with bluetooth 
service enabled.

If I disable the bluetooth service and cold boot the laptop, there also 
appears to be some kind of race condition as not enabling bluetooth 
service very soon after loading the hci_uart and btqca modules during 
boot puts the system in a state where I can never enable bluetooth. I do 
not know what causes this specifically, but my theory is that not 
starting the bluetooth service immediately puts the driver in a similar 
state as when the service is started immediately. Maybe some kind of 
lazy initialization that is forced to happen more quickly when the 
bluetooth service is enabled?

Anyway, this reversion by itself (which I did manually after a 
discussion with Zijun, before getting his test patches applied to my 
kernel for testing) fixed the enable/disable issue. However, this 
reversion did not get the hardware working after a warm boot.

> 2) the regression issue is described by [PATCH v4 1/2] commit message
>    as following:
>    BT can't be enabled after below steps:
>    cold boot -> enable BT -> disable BT -> warm reboot -> BT enable
> failure if property enable-gpios is not configured within DT|ACPI for
> QCA_QCA6390.
>    i will verify and confirm if QCA_QCA2066 and QCA_ROME also are impacted.

I can confirm this. Without this change (and with the #1 change), I can 
cold boot the laptop and disable/re-enable the hardware as many times as 
I want. However, warm booting will not allow the hardware to work. I 
believe that a similar problem existed before the 6.8 kernel (if memory 
serves), as I had been having issues of this sort for some time. I was 
able to reproduce a similar issue as far back as 5.19. I tested that and 
every intervening release until 6.8.0. I did not realize that the warm 
boot problem was separate from the enable/disable issue until working 
with Zijun.

> 3) let me explain the bug point for commit mentioned by 1), its
>     commit message and bug change applet are shown below.
> 
> The optional variants for the gpiod_get() family of functions return
> NULL if the GPIO in question is not associated with this device. They
> return ERR_PTR() on any other error. NULL descriptors are graciously
> handled by GPIOLIB and can be safely passed to any of the GPIO consumer
> interfaces as they will return 0 and act as if the function succeeded.
> If one is using the optional variant, then there's no point in checking
> for NULL.
> 
>   		qcadev->bt_en = devm_gpiod_get_optional(&serdev->dev, "enable",
>   					       GPIOD_OUT_LOW);
> -		if (IS_ERR_OR_NULL(qcadev->bt_en)) {
> +		if (IS_ERR(qcadev->bt_en)) {
>   			dev_warn(&serdev->dev, "failed to acquire enable gpio\n");
>   			power_ctrl_enabled = false;
>   		}
>     3.1) we only need to discuss how to handle case "qcadev->bt_en ==
> NULL" since this is only difference between the commit and BT original
> design.
>     3.2) BT original design are agree with the point of above commit
> message that case "qcadev->bt_en == NULL" should not be treated as
> error, so BT original design does not do error return for the case and
> use dev_warn() instead of dev_err() to give.
>     3.3) the commit misunderstands BT original design and wrongly think
> BT original design take "qcadev->bt_en == NULL" as error case,
> so change the following flag power_ctrl_enabled set logic and cause
> discussed issue.
> 
> For the 2nd issue:
> 1) the following commit will cause below regression issue for QCA_QCA6390.
> Commit 272970be3dab ("Bluetooth: hci_qca: Fix driver shutdown on closed
>      serdev")
> 
> 2) the regression issue is described by [PATCH v4 2/2] commit message
>    as following:
>    BT can't be enabled after below steps:
>    cold boot -> enable BT -> disable BT -> warm reboot -> BT enable
> failure if property enable-gpios is not configured within DT|ACPI for
> QCA_QCA6390.
> 
> 3) qca_serdev_shutdown() is serdev's shutdown and not hdev's shutdown()
> it should not and also never get chance to be invoked even if BT is
> disabled at above 2) step.  qca_serdev_shutdown() need to send the VSC
> to reset controller during warm reset phase of above 2) steps.

It was Zijun who realized that #1 and #2 were two separate but 
related issues. He really dug in and found the problem and produced test 
patches. It was impressive, and he should be given credit for finding 
that these were the issues so quickly.

The only reason I'm involved here is that I am a squeaky wheel that 
happened to be running Rawhide and got hurt by the kernel. I am a 
glorified beta tester who got unlucky, and I was hoping to find help in 
the kernel community. Zijun stepped up.

The only other thing that I am wondering about this patch set is if 
Zijun or some other party should be listed as the maintainer of the 
btqca module and hci_qca.c and btqca.* files so that they can be found 
more easily with the get_maintainer.pl script.

wt
Krzysztof Kozlowski April 22, 2024, 5:52 a.m. UTC | #7
On 22/04/2024 02:14, quic_zijuhu wrote:
> On 4/22/2024 2:41 AM, Krzysztof Kozlowski wrote:
>> On 21/04/2024 09:44, Wren Turkal wrote:
>>> On 4/20/24 3:06 PM, Zijun Hu wrote:
>>>> This patch series are to fix below 2 regression issues for QCA controllers
>>>> 1) BT can't be enabled once BT was ever enabled for QCA_QCA6390
>>>> 2) BT can't be enabled after disable then warm reboot for QCA_QCA6390
>>>
>>> @Zijun @Krzysztof and @Bartosz Would it be helpful for me to test these 
>>> to ensure they fix the issues I reported?
>>>
>>
>> I look forward to someone testing these on other hardware, not yours. On
>> the hardware where the original issues were happening leading to this
>> changes, e.g. RB5.
>>
>> Anyway, the problem here is poor explanation of the problem which did
>> not improve in v3 and v4. Instead I receive explanations like:
>>
>> "this is shutdown of serdev and not hdev's shutdown."
>> Not related...
>>
> this is the reply for secondary issue. i believe i have given much
> explain for my fix for the 2nd issue as shown by below links.

No, you did not.

> let me add a bit more explanation within the ending "For the 2nd issue"
> section, supposed you known much for generic flag
> HCI_QUIRK_NON_PERSISTENT_SETUP, otherwise, see header comment for the
> quirk. also supposed you see commit history to find why
> qca_serdev_shutdown() was introduced for QCA6390.
> https://lore.kernel.org/all/fe1a0e3b-3408-4a33-90e9-d4ffcfc7a99b@quicinc.com/

You did not answer my questions.

Let's quote:

"i don't explain much since these HCI_QUIRK_NON_PERSISTENT_SETUP and
HCI_SETUP is generic flag."

Srsly, what is such answer?





>> "now. you understood why your merged change as shown link of 4) have
>> problems and introduced our discussed issue, right?"
>>
> this is the reply for the first issue as shown by below link. it almost
> have the same description as the following "For 1st issue:" section.
> i believe it have clear illustration why the commit have bugs.
> https://lore.kernel.org/all/2166fc66-9340-4e8c-8662-17a19a7d8ce6@linaro.org/
>> No. I did not understand and I feel I am wasting here time.
>>
>> Code could be correct, could be wrong. Especially second patch looks
>> suspicious. But the way Zijun Hu explains it and the way Zijun Hu
>> responds is not helping at all.
>>
>> Sorry, with such replies to review, it is not worth my time.
>>
>> Best regards,
>> Krzysztof
>>
> Hi luiz,marcel
> 
> it is time for me to request you give comments for our discussion
> and for my fixes, Let me explain the 1st issue then 2nd one.

You keep pushing and pushing even though I stated my remarks.


> 
> For 1st issue:
> 1) the following commit will cause serious regression issue for QCA
> controllers, and it has been merged with linus's mainline kernel.
> 
> Commit 56d074d26c58 ("Bluetooth: hci_qca: don't use IS_ERR_OR_NULL()
> with gpiod_get_optional()").
> 
> 2) the regression issue is described by [PATCH v4 1/2] commit message
>   as following:
>   BT can't be enabled after below steps:
>   cold boot -> enable BT -> disable BT -> warm reboot -> BT enable
> failure if property enable-gpios is not configured within DT|ACPI for
> QCA_QCA6390.
>   i will verify and confirm if QCA_QCA2066 and QCA_ROME also are impacted.
> 
> 3) let me explain the bug point for commit mentioned by 1), its
>    commit message and bug change applet are shown below.
> 
> The optional variants for the gpiod_get() family of functions return
> NULL if the GPIO in question is not associated with this device. They
> return ERR_PTR() on any other error. NULL descriptors are graciously
> handled by GPIOLIB and can be safely passed to any of the GPIO consumer
> interfaces as they will return 0 and act as if the function succeeded.
> If one is using the optional variant, then there's no point in checking
> for NULL.
> 
>  		qcadev->bt_en = devm_gpiod_get_optional(&serdev->dev, "enable",
>  					       GPIOD_OUT_LOW);
> -		if (IS_ERR_OR_NULL(qcadev->bt_en)) {
> +		if (IS_ERR(qcadev->bt_en)) {
>  			dev_warn(&serdev->dev, "failed to acquire enable gpio\n");
>  			power_ctrl_enabled = false;
>  		}
>    3.1) we only need to discuss how to handle case "qcadev->bt_en ==
> NULL" since this is only difference between the commit and BT original
> design.
>    3.2) BT original design are agree with the point of above commit
> message that case "qcadev->bt_en == NULL" should not be treated as
> error, so BT original design does not do error return for the case and
> use dev_warn() instead of dev_err() to give.
>    3.3) the commit misunderstands BT original design and wrongly think
> BT original design take "qcadev->bt_en == NULL" as error case,
> so change the following flag power_ctrl_enabled set logic and cause
> discussed issue.
> 
> For the 2nd issue:
> 1) the following commit will cause below regression issue for QCA_QCA6390.
> Commit 272970be3dab ("Bluetooth: hci_qca: Fix driver shutdown on closed
>     serdev")
> 
> 2) the regression issue is described by [PATCH v4 2/2] commit message
>   as following:
>   BT can't be enabled after below steps:
>   cold boot -> enable BT -> disable BT -> warm reboot -> BT enable
> failure if property enable-gpios is not configured within DT|ACPI for
> QCA_QCA6390.

You did not address the original issue of the crash during shutdown and
did not answer my questions.

> 
> 3) qca_serdev_shutdown() is serdev's shutdown and not hdev's shutdown()
> it should not and also never get chance to be invoked even if BT is
> disabled at above 2) step.  qca_serdev_shutdown() need to send the VSC
> to reset controller during warm reset phase of above 2) steps.

Anyway, any explanation providing background how you are fixing this
issue while keeping *previous problem fixed* is useful but should be
provided in commit msg. I asked about this two or three times.

BTW, provide here the exact kernel version you tested these patches with.
Also the exact hardware.


Best regards,
Krzysztof
quic_zijuhu April 22, 2024, 6 a.m. UTC | #8
Hi Krzysztof,

Could you list the questions I need to address in the commit message,
based on the current v4 patch set?

Let me send a v5 patch set with updated commit messages.

On 4/22/2024 1:52 PM, Krzysztof Kozlowski wrote:
> On 22/04/2024 02:14, quic_zijuhu wrote:
>> On 4/22/2024 2:41 AM, Krzysztof Kozlowski wrote:
>>> On 21/04/2024 09:44, Wren Turkal wrote:
>>>> On 4/20/24 3:06 PM, Zijun Hu wrote:
>>>>> This patch series are to fix below 2 regression issues for QCA controllers
>>>>> 1) BT can't be enabled once BT was ever enabled for QCA_QCA6390
>>>>> 2) BT can't be enabled after disable then warm reboot for QCA_QCA6390
>>>>
>>>> @Zijun @Krzysztof and @Bartosz Would it be helpful for me to test these 
>>>> to ensure they fix the issues I reported?
>>>>
>>>
>>> I look forward to someone testing these on other hardware, not yours. On
>>> the hardware where the original issues were happening leading to this
>>> changes, e.g. RB5.
>>>
>>> Anyway, the problem here is poor explanation of the problem which did
>>> not improve in v3 and v4. Instead I receive explanations like:
>>>
>>> "this is shutdown of serdev and not hdev's shutdown."
>>> Not related...
>>>
>> this is the reply for secondary issue. i believe i have given much
>> explain for my fix for the 2nd issue as shown by below links.
> 
> No, you did not.
> 
>> let me add a bit more explanation within the ending "For the 2nd issue"
>> section, supposed you known much for generic flag
>> HCI_QUIRK_NON_PERSISTENT_SETUP, otherwise, see header comment for the
>> quirk. also supposed you see commit history to find why
>> qca_serdev_shutdown() was introduced for QCA6390.
>> https://lore.kernel.org/all/fe1a0e3b-3408-4a33-90e9-d4ffcfc7a99b@quicinc.com/
> 
> You did not answer my questions.
> 
> Let's quote:
> 
> "i don't explain much since these HCI_QUIRK_NON_PERSISTENT_SETUP and
> HCI_SETUP is generic flag."
> 
> Srsly, what is such answer?
> 
> 
> 
> 
> 
>>> "now. you understood why your merged change as shown link of 4) have
>>> problems and introduced our discussed issue, right?"
>>>
>> this is the reply for the first issue as shown by below link. it almost
>> have the same description as the following "For 1st issue:" section.
>> i believe it have clear illustration why the commit have bugs.
>> https://lore.kernel.org/all/2166fc66-9340-4e8c-8662-17a19a7d8ce6@linaro.org/
>>> No. I did not understand and I feel I am wasting here time.
>>>
>>> Code could be correct, could be wrong. Especially second patch looks
>>> suspicious. But the way Zijun Hu explains it and the way Zijun Hu
>>> responds is not helping at all.
>>>
>>> Sorry, with such replies to review, it is not worth my time.
>>>
>>> Best regards,
>>> Krzysztof
>>>
>> Hi luiz,marcel
>>
>> it is time for me to request you give comments for our discussion
>> and for my fixes, Let me explain the 1st issue then 2nd one.
> 
> You keep pushing and pushing even though I stated my remarks.
> 
> 
>>
>> For 1st issue:
>> 1) the following commit will cause serious regression issue for QCA
>> controllers, and it has been merged with linus's mainline kernel.
>>
>> Commit 56d074d26c58 ("Bluetooth: hci_qca: don't use IS_ERR_OR_NULL()
>> with gpiod_get_optional()").
>>
>> 2) the regression issue is described by [PATCH v4 1/2] commit message
>>   as following:
>>   BT can't be enabled after below steps:
>>   cold boot -> enable BT -> disable BT -> warm reboot -> BT enable
>> failure if property enable-gpios is not configured within DT|ACPI for
>> QCA_QCA6390.
>>   i will verify and confirm if QCA_QCA2066 and QCA_ROME also are impacted.
>>
>> 3) let me explain the bug point for commit mentioned by 1), its
>>    commit message and bug change applet are shown below.
>>
>> The optional variants for the gpiod_get() family of functions return
>> NULL if the GPIO in question is not associated with this device. They
>> return ERR_PTR() on any other error. NULL descriptors are graciously
>> handled by GPIOLIB and can be safely passed to any of the GPIO consumer
>> interfaces as they will return 0 and act as if the function succeeded.
>> If one is using the optional variant, then there's no point in checking
>> for NULL.
>>
>>  		qcadev->bt_en = devm_gpiod_get_optional(&serdev->dev, "enable",
>>  					       GPIOD_OUT_LOW);
>> -		if (IS_ERR_OR_NULL(qcadev->bt_en)) {
>> +		if (IS_ERR(qcadev->bt_en)) {
>>  			dev_warn(&serdev->dev, "failed to acquire enable gpio\n");
>>  			power_ctrl_enabled = false;
>>  		}
>>    3.1) we only need to discuss how to handle case "qcadev->bt_en ==
>> NULL" since this is only difference between the commit and BT original
>> design.
>>    3.2) BT original design are agree with the point of above commit
>> message that case "qcadev->bt_en == NULL" should not be treated as
>> error, so BT original design does not do error return for the case and
>> use dev_warn() instead of dev_err() to give.
>>    3.3) the commit misunderstands BT original design and wrongly think
>> BT original design take "qcadev->bt_en == NULL" as error case,
>> so change the following flag power_ctrl_enabled set logic and cause
>> discussed issue.
>>
>> For the 2nd issue:
>> 1) the following commit will cause below regression issue for QCA_QCA6390.
>> Commit 272970be3dab ("Bluetooth: hci_qca: Fix driver shutdown on closed
>>     serdev")
>>
>> 2) the regression issue is described by [PATCH v4 2/2] commit message
>>   as following:
>>   BT can't be enabled after below steps:
>>   cold boot -> enable BT -> disable BT -> warm reboot -> BT enable
>> failure if property enable-gpios is not configured within DT|ACPI for
>> QCA_QCA6390.
> 
> You did not address original issue of crash during shutdown and did not
> clarify my questions.
> 
>>
>> 3) qca_serdev_shutdown() is serdev's shutdown and not hdev's shutdown()
>> it should not and also never get chance to be invoked even if BT is
>> disabled at above 2) step.  qca_serdev_shutdown() need to send the VSC
>> to reset controller during warm reset phase of above 2) steps.
> 
> Anyway, any explanation providing background how you are fixing this
> issue while keeping *previous problem fixed* is useful but should be
> provided in commit msg. I asked about this two or three times.
> 
> BTW, provide here exact kernel version you tested this patches with.
> Also the exact hardware.
> 
> 
> Best regards,
> Krzysztof
>
Krzysztof Kozlowski April 22, 2024, 7:45 a.m. UTC | #9
On 22/04/2024 08:00, quic_zijuhu wrote:
> Hi Krzysztof,
> 
> could you list questions i need to explain within commit message based
> on current v4 patch sets ?
> 
> let me send v5 patch sets with updated commit messages.

NAK, no.

Stop sending new versions. You have already ignored several pieces of
feedback, as well as my question from that email.

Best regards,
Krzysztof
Bartosz Golaszewski April 22, 2024, 8:51 a.m. UTC | #10
On Mon, 22 Apr 2024 at 07:21, Wren Turkal <wt@penguintechs.org> wrote:
>
> As the user who originally reported thes issue, I can confirm this. I
> was introduced to this regression because I use Fedora Rawhide on my
> laptop, which builds and pushes kernels based on mainline very regularly.
>

I don't doubt my patch could have caused a regression.

> Here is my description of the regression: After the reverted change, the
> BT hardware in my laptop (qca6390) will only work after a cold boot when
> the hardware has only be enabled once by the driver. Once the hardware
> is enabled, the process of disabling/re-enabling fails. Also, the
> hardware cannot be enabled after a warm boot of the laptop.
>
> Among other things, this makes logging into KDE Plasma break my
> bluetooth mouse. The cause of this breakage appears to be that Plasma
> disables/re-enables bluetooth hardware upon login.
>
> GNOME operates slightly less badly in that bluetooth stays enabled.
> However, if I manually disable the bluetooth via the ui or by restarting
> the bluetooth service with systemctl, the mouse fails in the same way as
> happens with Plasma.
>
> Once the bluetooth has failed, the only way to fix is a cold boot and
> only enable the hardware once. I cannot remove the modules (btqca,
> hci_uart, and bluetooth) and re-modprobe them to fix it. I can't restart
> the bluetooth service. I can't do both of those things. I haven't found
> any way to re-enable the hardware beyond cold boot with bluetooth
> service enabled.
>
> If I disable the bluetooth service and cold boot the laptop, there also
> appears to be some kind of race condition as not enabling bluetooth
> service very soon after loading the hci_uart and btqca modules during
> boot puts the system in a state where I can never enable bluetooth. I do
> not know what causes this specifically, but my theory is that not
> starting the bluetooth service immediately puts the driver in a similar
> state as when the service is started immediately. Maybe some kind of
> lazy initialization that is forced to happen more quickly when the
> bluetooth service is enabled?
>
> Any way, this reversion by itself (which I manually did after a
> discussion with Zijun before getting his test patches applying to my
> kernel for test). However, this reversion did not get the hardware
> working after a warm boot.
>

This all sounds plausible. However, just reverting this patch is a
waste of time, as checking IS_ERR_OR_NULL() on the return value of
gpiod_get_optional() and continuing on error is wrong, as I explained
several times under Zijun's emails already. I provided a suggestion:
bail out on error returned from gpiod_get_optional() even if the
driver could technically continue in some cases. I don't want to have
to argue this anymore.
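
To make the suggestion above concrete, here is a minimal userspace sketch
of the three outcomes of gpiod_get_optional() and how a probe path could
treat them. It is only a model under stated assumptions, not the driver
code and not any posted patch: ERR_PTR(), PTR_ERR() and IS_ERR() are
re-implemented locally to mimic the kernel helpers, and model_probe() and
struct probe_result are hypothetical names invented for illustration.

```c
#include <assert.h>   /* used by the checks in the usage note below */
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace stand-ins that mimic the kernel's error-pointer helpers. */
#define MAX_ERRNO 4095

static inline void *ERR_PTR(long error)
{
	return (void *)error;
}

static inline long PTR_ERR(const void *ptr)
{
	return (long)ptr;
}

static inline bool IS_ERR(const void *ptr)
{
	/* Error pointers live in the last MAX_ERRNO addresses. */
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical result type: what probe returns and the resulting flag. */
struct probe_result {
	int ret;
	bool power_ctrl_enabled;
};

/*
 * Hypothetical model of the suggested handling. bt_en stands for the
 * value an optional GPIO lookup handed back:
 *   - error pointer: a real failure, so bail out with that error
 *   - NULL:          no enable GPIO described, so continue without
 *                    power control
 *   - anything else: a valid descriptor, so power control is available
 */
static struct probe_result model_probe(void *bt_en)
{
	struct probe_result res = { .ret = 0, .power_ctrl_enabled = true };

	if (IS_ERR(bt_en)) {
		res.ret = (int)PTR_ERR(bt_en);
		res.power_ctrl_enabled = false;
		return res;
	}

	if (!bt_en)
		res.power_ctrl_enabled = false;

	return res;
}
```

With this model, model_probe(NULL) succeeds with power_ctrl_enabled
cleared, while model_probe(ERR_PTR(-EBUSY)) fails with -EBUSY. The point
of the split is that NULL and an error pointer are different situations
and are handled by two separate checks rather than one IS_ERR_OR_NULL().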

Bart
quic_zijuhu April 22, 2024, 10:05 a.m. UTC | #11
On 4/22/2024 1:52 PM, Krzysztof Kozlowski wrote:
> On 22/04/2024 02:14, quic_zijuhu wrote:
>> On 4/22/2024 2:41 AM, Krzysztof Kozlowski wrote:
>>> On 21/04/2024 09:44, Wren Turkal wrote:
>>>> On 4/20/24 3:06 PM, Zijun Hu wrote:
>>>>> This patch series are to fix below 2 regression issues for QCA controllers
>>>>> 1) BT can't be enabled once BT was ever enabled for QCA_QCA6390
>>>>> 2) BT can't be enabled after disable then warm reboot for QCA_QCA6390
>>>>
>>>> @Zijun @Krzysztof and @Bartosz Would it be helpful for me to test these 
>>>> to ensure they fix the issues I reported?
>>>>
>>>
>>> I look forward to someone testing these on other hardware, not yours. On
>>> the hardware where the original issues were happening leading to this
>>> changes, e.g. RB5.
>>>
>>> Anyway, the problem here is poor explanation of the problem which did
>>> not improve in v3 and v4. Instead I receive explanations like:
>>>
>>> "this is shutdown of serdev and not hdev's shutdown."
>>> Not related...
>>>
>> this is the reply for secondary issue. i believe i have given much
>> explain for my fix for the 2nd issue as shown by below links.
> 
> No, you did not.
> 
>> let me add a bit more explanation within the ending "For the 2nd issue"
>> section, supposed you known much for generic flag
>> HCI_QUIRK_NON_PERSISTENT_SETUP, otherwise, see header comment for the
>> quirk. also supposed you see commit history to find why
>> qca_serdev_shutdown() was introduced for QCA6390.
>> https://lore.kernel.org/all/fe1a0e3b-3408-4a33-90e9-d4ffcfc7a99b@quicinc.com/
> 
> You did not answer my questions.
> 
> Let's quote:
> 
> "i don't explain much since these HCI_QUIRK_NON_PERSISTENT_SETUP and
> HCI_SETUP is generic flag."
> 
> Srsly, what is such answer?
> 
> 
I reviewed my reply. I have explained to you why my change fixes both this
issue and the issue your commit fixed.

So I don't think it is meaningful to explain further why I changed your
wrong condition.
> 
> 
> 
>>> "now. you understood why your merged change as shown link of 4) have
>>> problems and introduced our discussed issue, right?"
>>>
>> this is the reply for the first issue as shown by below link. it almost
>> have the same description as the following "For 1st issue:" section.
>> i believe it have clear illustration why the commit have bugs.
>> https://lore.kernel.org/all/2166fc66-9340-4e8c-8662-17a19a7d8ce6@linaro.org/
>>> No. I did not understand and I feel I am wasting here time.
>>>
>>> Code could be correct, could be wrong. Especially second patch looks
>>> suspicious. But the way Zijun Hu explains it and the way Zijun Hu
>>> responds is not helping at all.
>>>
>>> Sorry, with such replies to review, it is not worth my time.
>>>
>>> Best regards,
>>> Krzysztof
>>>
>> Hi luiz,marcel
>>
>> it is time for me to request you give comments for our discussion
>> and for my fixes, Let me explain the 1st issue then 2nd one.
> 
> You keep pushing and pushing even though I stated my remarks.
> 
> 
>>
>> For 1st issue:
>> 1) the following commit will cause serious regression issue for QCA
>> controllers, and it has been merged with linus's mainline kernel.
>>
>> Commit 56d074d26c58 ("Bluetooth: hci_qca: don't use IS_ERR_OR_NULL()
>> with gpiod_get_optional()").
>>
>> 2) the regression issue is described by [PATCH v4 1/2] commit message
>>   as following:
>>   BT can't be enabled after below steps:
>>   cold boot -> enable BT -> disable BT -> warm reboot -> BT enable
>> failure if property enable-gpios is not configured within DT|ACPI for
>> QCA_QCA6390.
>>   i will verify and confirm if QCA_QCA2066 and QCA_ROME also are impacted.
>>
>> 3) let me explain the bug point for commit mentioned by 1), its
>>    commit message and bug change applet are shown below.
>>
>> The optional variants for the gpiod_get() family of functions return
>> NULL if the GPIO in question is not associated with this device. They
>> return ERR_PTR() on any other error. NULL descriptors are graciously
>> handled by GPIOLIB and can be safely passed to any of the GPIO consumer
>> interfaces as they will return 0 and act as if the function succeeded.
>> If one is using the optional variant, then there's no point in checking
>> for NULL.
>>
>>  		qcadev->bt_en = devm_gpiod_get_optional(&serdev->dev, "enable",
>>  					       GPIOD_OUT_LOW);
>> -		if (IS_ERR_OR_NULL(qcadev->bt_en)) {
>> +		if (IS_ERR(qcadev->bt_en)) {
>>  			dev_warn(&serdev->dev, "failed to acquire enable gpio\n");
>>  			power_ctrl_enabled = false;
>>  		}
>>    3.1) we only need to discuss how to handle case "qcadev->bt_en ==
>> NULL" since this is only difference between the commit and BT original
>> design.
>>    3.2) BT original design are agree with the point of above commit
>> message that case "qcadev->bt_en == NULL" should not be treated as
>> error, so BT original design does not do error return for the case and
>> use dev_warn() instead of dev_err() to give.
>>    3.3) the commit misunderstands BT original design and wrongly think
>> BT original design take "qcadev->bt_en == NULL" as error case,
>> so change the following flag power_ctrl_enabled set logic and cause
>> discussed issue.
>>
>> For the 2nd issue:
>> 1) the following commit will cause below regression issue for QCA_QCA6390.
>> Commit 272970be3dab ("Bluetooth: hci_qca: Fix driver shutdown on closed
>>     serdev")
>>
>> 2) the regression issue is described by [PATCH v4 2/2] commit message
>>   as following:
>>   BT can't be enabled after below steps:
>>   cold boot -> enable BT -> disable BT -> warm reboot -> BT enable
>> failure if property enable-gpios is not configured within DT|ACPI for
>> QCA_QCA6390.
> 
> You did not address original issue of crash during shutdown and did not
> clarify my questions.
> 
As I stated, my fix has fixed both this issue and the original
crash issue. There is no need to discuss anything else.
>>
>> 3) qca_serdev_shutdown() is serdev's shutdown and not hdev's shutdown()
>> it should not and also never get chance to be invoked even if BT is
>> disabled at above 2) step.  qca_serdev_shutdown() need to send the VSC
>> to reset controller during warm reset phase of above 2) steps.
> 
> Anyway, any explanation providing background how you are fixing this
> issue while keeping *previous problem fixed* is useful but should be
> provided in commit msg. I asked about this two or three times.
> 
> BTW, provide here exact kernel version you tested this patches with.
> Also the exact hardware.
> 
Almost no commit carrying a Tested-by tag also provides the exact kernel
version. For one type of BT controller, different H/W has different configs.
What is important is that this issue is fixed on the reported H/W and that
the fix does not cause other issues.

Let us stop here and wait for other comments.

I have given too many explanations for a change of only 7 lines in total.
> 
> Best regards,
> Krzysztof
>
Wren Turkal April 22, 2024, 10:42 a.m. UTC | #12
On 4/22/24 1:51 AM, Bartosz Golaszewski wrote:
> On Mon, 22 Apr 2024 at 07:21, Wren Turkal <wt@penguintechs.org> wrote:
>>
>> As the user who originally reported thes issue, I can confirm this. I
>> was introduced to this regression because I use Fedora Rawhide on my
>> laptop, which builds and pushes kernels based on mainline very regularly.
>>
> 
> I don't doubt my patch could have caused a regression.
> 
>> Here is my description of the regression: After the reverted change, the
>> BT hardware in my laptop (qca6390) will only work after a cold boot when
>> the hardware has only be enabled once by the driver. Once the hardware
>> is enabled, the process of disabling/re-enabling fails. Also, the
>> hardware cannot be enabled after a warm boot of the laptop.
>>
>> Among other things, this makes logging into KDE Plasma break my
>> bluetooth mouse. The cause of this breakage appears to be that Plasma
>> disables/re-enables bluetooth hardware upon login.
>>
>> GNOME operates slightly less badly in that bluetooth stays enabled.
>> However, if I manually disable the bluetooth via the ui or by restarting
>> the bluetooth service with systemctl, the mouse fails in the same way as
>> happens with Plasma.
>>
>> Once the bluetooth has failed, the only way to fix is a cold boot and
>> only enable the hardware once. I cannot remove the modules (btqca,
>> hci_uart, and bluetooth) and re-modprobe them to fix it. I can't restart
>> the bluetooth service. I can't do both of those things. I haven't found
>> any way to re-enable the hardware beyond cold boot with bluetooth
>> service enabled.
>>
>> If I disable the bluetooth service and cold boot the laptop, there also
>> appears to be some kind of race condition as not enabling bluetooth
>> service very soon after loading the hci_uart and btqca modules during
>> boot puts the system in a state where I can never enable bluetooth. I do
>> not know what causes this specifically, but my theory is that not
>> starting the bluetooth service immediately puts the driver in a similar
>> state as when the service is started immediately. Maybe some kind of
>> lazy initialization that is forced to happen more quickly when the
>> bluetooth service is enabled?
>>
>> Any way, this reversion by itself (which I manually did after a
>> discussion with Zijun before getting his test patches applying to my
>> kernel for test). However, this reversion did not get the hardware
>> working after a warm boot.
>>
> 
> This all sounds plausible. However just reverting this patch is a
> waste of time as checking IS_ERR_OR_NULL() on the return value of
> gpiod_get_optional() and continuing on error is wrong as I explained
> several times under Ziju's emails already. I provided a suggestion:
> bail out on error returned from gpiod_get_optional() even if the
> driver could technically continue in some cases. I don't want to have
> to argue this anymore.

I'm not trying to argue. I am trying to find a path forward as a 
concerned user. I am also trying to figure out if there is any way I can 
help resolve this. I am not a kernel developer, but I would really like 
to contribute in some way, if possible.

> 
> Bart
Krzysztof Kozlowski April 22, 2024, 12:28 p.m. UTC | #13
On 22/04/2024 12:05, quic_zijuhu wrote:
>>> 3) qca_serdev_shutdown() is serdev's shutdown and not hdev's shutdown()
>>> it should not and also never get chance to be invoked even if BT is
>>> disabled at above 2) step.  qca_serdev_shutdown() need to send the VSC
>>> to reset controller during warm reset phase of above 2) steps.
>>
>> Anyway, any explanation providing background how you are fixing this
>> issue while keeping *previous problem fixed* is useful but should be
>> provided in commit msg. I asked about this two or three times.
>>
>> BTW, provide here exact kernel version you tested this patches with.
>> Also the exact hardware.
>>
> there are almost no commit with tag Tested-by also provide exact kernel

?!?

So you did not test this at all on a mainline kernel and you are pushing a
downstream patch? Is that how we should understand this?

> version. for one type bt controller. different h/w has different config.
> important is that this issue is fixed in reported H/W and don't cause
> issue for other issue.

The amount of pushback from your side, and of ignoring questions raised
during review, is way too much.

> 
> let us stop here and wait for other comments.

So why do you push again in v5?

Best regards,
Krzysztof
Bartosz Golaszewski April 22, 2024, 1:02 p.m. UTC | #14
On Mon, 22 Apr 2024 at 12:42, Wren Turkal <wt@penguintechs.org> wrote:
>
> On 4/22/24 1:51 AM, Bartosz Golaszewski wrote:
> >
> > This all sounds plausible. However just reverting this patch is a
> > waste of time as checking IS_ERR_OR_NULL() on the return value of
> > gpiod_get_optional() and continuing on error is wrong as I explained
> > several times under Ziju's emails already. I provided a suggestion:
> > bail out on error returned from gpiod_get_optional() even if the
> > driver could technically continue in some cases. I don't want to have
> > to argue this anymore.
>
> I'm not trying to argue. I am trying to find a path forward as a
> concerned user. I am also trying to figure out if there is any way I can
> help resolve this. I am not a kernel developer, but I would really like
> to contribute in some way, if possible.
>

Can you test the patch[1] I just sent?

Bart

[1] https://lore.kernel.org/linux-bluetooth/20240422130036.31856-1-brgl@bgdev.pl/
Wren Turkal April 24, 2024, 1:52 a.m. UTC | #15
On 4/22/24 6:02 AM, Bartosz Golaszewski wrote:
> Can you test the patch[1] I just sent?

I am doing this now. Just to be clear, I am testing the patch I found in 
the thread with subject "[PATCH] Bluetooth: qca: set power_ctrl_enabled 
on NULL returned by gpiod_get_optional()". If that isn't the one you're 
referring to, please let me know.

I will reply back to that patch after testing.

wt