[0/2] Disable SS instances in park mode for SC7180/ SC7280

Message ID 20240530082556.2960148-1-quic_kriskura@quicinc.com (mailing list archive)

Message

Krishna Kurapati PSSNV May 30, 2024, 8:25 a.m. UTC
When working in host mode, in certain conditions, when the USB
host controller is stressed, there is a HC died warning that comes up.
Fix this up by disabling SS instances in park mode for SC7280 and SC7180.

Krishna Kurapati (2):
  arm64: dts: qcom: sc7180: Disable SS instances in park mode
  arm64: dts: qcom: sc7280: Disable SS instances in park mode

 arch/arm64/boot/dts/qcom/sc7180.dtsi | 1 +
 arch/arm64/boot/dts/qcom/sc7280.dtsi | 1 +
 2 files changed, 2 insertions(+)
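
To check on a running board whether the quirk is present, something like the sketch below might help. It assumes the quirk added by these patches is the dwc3 binding's `snps,parkmode-disable-ss-quirk` property (this cover letter does not name it) and that the kernel exposes the live device tree under /sys/firmware/devicetree/base. `DT_ROOT` and `find_parkmode_quirk` are names made up for this example.

```shell
#!/bin/sh
# Assumption: the property name below is not stated anywhere in this thread.
QUIRK="snps,parkmode-disable-ss-quirk"
# Root of the live device tree; overridable so the sketch can be tried on a copy.
DT_ROOT="${DT_ROOT:-/sys/firmware/devicetree/base}"

# Print the directory of every device tree node that carries the quirk property.
find_parkmode_quirk() {
    find "$DT_ROOT" -name "$QUIRK" 2>/dev/null | while read -r prop; do
        dirname "$prop"
    done
}

find_parkmode_quirk
```

No output would mean that no node carries the property, or that the device tree is exposed somewhere else on that system.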

Comments

Doug Anderson May 30, 2024, 1:34 p.m. UTC | #1
Hi,

On Thu, May 30, 2024 at 1:26 AM Krishna Kurapati
<quic_kriskura@quicinc.com> wrote:
>
> When working in host mode, in certain conditions, when the USB
> host controller is stressed, there is a HC died warning that comes up.
> Fix this up by disabling SS instances in park mode for SC7280 and SC7180.
>
> Krishna Kurapati (2):
>   arm64: dts: qcom: sc7180: Disable SS instances in park mode
>   arm64: dts: qcom: sc7280: Disable SS instances in park mode
>
>  arch/arm64/boot/dts/qcom/sc7180.dtsi | 1 +
>  arch/arm64/boot/dts/qcom/sc7280.dtsi | 1 +
>  2 files changed, 2 insertions(+)

FWIW, the test case I used to reproduce this:

1. Plug in a USB dock w/ Ethernet
2. Plug a USB 3 SD card reader into the dock.
3. Use lsusb -t to confirm both Ethernet and card reader are on USB3.
4. From a shell, read from the card reader by running:
     for i in $(seq 5); do dd if=/dev/sdb of=/dev/null bs=4M; done
5. At the same time, stress the network link. If you've got a very fast
Internet connection then running Google's "Internet speed test" did
it, but I could also reproduce by just running this from a PC
connected to the same network as my DUT:
     ssh ${DUT} "dd of=/dev/null" < /dev/zero
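
Steps 3 to 5 above could be wrapped in a few shell helpers along these lines. This is only a sketch: `DEV`, `DUT`, `PASSES`, `DRY_RUN` and the function names are illustrative, the hostname default is hypothetical, and the `lsusb -t` filter assumes the usual "|__ Port ..., Class=..., Driver=..., 5000M" line layout, which may differ between usbutils versions.

```shell
#!/bin/sh
DEV="${DEV:-/dev/sdb}"      # card reader block device, as in step 4
DUT="${DUT:-dut.local}"     # hypothetical hostname of the device under test
PASSES="${PASSES:-5}"       # number of full reads of the card
DRY_RUN="${DRY_RUN:-1}"     # 1 = only print the commands, 0 = run them

# Step 3 (on the DUT): feed `lsusb -t` on stdin; prints the non-hub
# devices that enumerated at SuperSpeed (5000M) or faster.
superspeed_devices() {
    grep -F '|__' | grep -v 'Class=Hub' | grep -E '(5000|10000|20000)M'
}

# Step 4 (on the DUT): read the whole card PASSES times.
read_card() {
    i=1
    while [ "$i" -le "$PASSES" ]; do
        if [ "$DRY_RUN" = 1 ]; then
            echo "dd if=$DEV of=/dev/null bs=4M"
        else
            dd if="$DEV" of=/dev/null bs=4M
        fi
        i=$((i + 1))
    done
}

# Step 5 (on a PC on the same network): push zeros at the DUT over ssh.
stress_network() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "ssh $DUT \"dd of=/dev/null\" < /dev/zero"
    else
        ssh "$DUT" "dd of=/dev/null" < /dev/zero
    fi
}
```

After sourcing the file, `lsusb -t | superspeed_devices` should list both the Ethernet adapter and the card reader. `DRY_RUN=0 read_card` on the DUT and `DRY_RUN=0 stress_network` on the PC then need to run at the same time.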

I would also note that, though I personally reproduced this on sc7180
and sc7280 boards and thus Krishna posted the patch for those boards,
there's no reason to believe that this problem doesn't affect all of
Qualcomm's SoCs. It would be nice if someone at Qualcomm could post a
followup patch fixing this everywhere.

-Doug
Konrad Dybcio May 31, 2024, 12:33 p.m. UTC | #2
On 30.05.2024 3:34 PM, Doug Anderson wrote:
> Hi,
> 
> FWIW, the test case I used to reproduce this:
> 
> 1. Plug in a USB dock w/ Ethernet
> 2. Plug a USB 3 SD card reader into the dock.
> 3. Use lsusb -t to confirm both Ethernet and card reader are on USB3.
> 4. From a shell, run for i in $(seq 5); do dd if=/dev/sdb of=/dev/null
> bs=4M; done to read from the card reader.
> 5. At the same time, stress the Internet. If you've got a very fast
> Internet connection then running Google's "Internet speed test" did
> it, but I could also reproduce by just running this from a PC
> connected to the same network as my DUT: ssh ${DUT} "dd of=/dev/null"
> < /dev/zero
> 
> I would also note that, though I personally reproduced this on sc7180
> and sc7280 boards and thus Krishna posted the patch for those boards,
> there's no reason to believe that this problem doesn't affect all of
> Qualcomm's SoCs. It would be nice if someone at Qualcomm could post a
> followup patch fixing this everywhere.

Right, this sounds like a more widespread issue

That said, I couldn't reproduce it on SC8280XP / X13s (which does NOT mean
8280 isn't affected). My setup was:

- USB3 5 Gb/s hub plugged into one of the side USB ports
  - on-hub 1 Gb/s network hub connected straight to my router with a
    600 / 60 Mbps link, spamming speedtest-cli and dd-over-ssh
  - M.2 SSD connected over a USB adapter, nearing 280 MB/s speeds (the
    adapter isn't particularly speedy)

So it stands to reason that it might not have been enough to trigger it.

Konrad
Doug Anderson May 31, 2024, 2:17 p.m. UTC | #3
Hi,

On Fri, May 31, 2024 at 5:33 AM Konrad Dybcio <konrad.dybcio@linaro.org> wrote:
>
> Right, this sounds like a more widespread issue
>
> That said, I couldn't reproduce it on SC8280XP / X13s (which does NOT mean
> 8280 isn't affected). My setup was:
>
> - USB3 5GB/s hub plugged into one of the side USBs
>   - on-hub 1 Gb /s network hub connected straight to my router with a
>     600 / 60 Mbps link, spamming speedtest-cli and dd-over-ssh
>   - M.2 SSD connected over a USB adapter, nearing 280 MB/s speeds (the
>     adapter isn't particularly speedy)
>
> So it stands to reason that it might not have been enough to trigger it.

In my case I wasn't using anything nearly as fast as a M.2 SSD. I was
just using a normal USB3 SD card reader. That being said, multiple
people at Qualcomm were able to replicate the issue without lots of
back and forth, so I'd guess that the problem isn't that sensitive to
the exact storage device. I will also note that it's not sensitive to
the exact network device as I replicated it with two Ethernet adapters
with very different chipsets.

My only guess is that somehow SC8280XP is faster and that changes the
timing of how it handles interrupts. I guess you could try capping
your cpufreq in sysfs and see if that makes a difference in
reproducing. ;-) ...or maybe somehow SC8280XP has a newer version of
the IP where they've fixed this?
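
For reference, capping cpufreq through sysfs could look roughly like the sketch below. It assumes the standard cpufreq policy layout under /sys/devices/system/cpu/cpufreq and root privileges. Capping at the hardware minimum is just one possible choice, and `CPUFREQ_ROOT`, `cap_to_min` and `uncap` are names invented for this example.

```shell
#!/bin/sh
# Root of the cpufreq policies; overridable so the sketch can be tried on a scratch tree.
CPUFREQ_ROOT="${CPUFREQ_ROOT:-/sys/devices/system/cpu/cpufreq}"

# Cap every policy at its hardware minimum frequency.
cap_to_min() {
    for policy in "$CPUFREQ_ROOT"/policy*; do
        [ -r "$policy/cpuinfo_min_freq" ] || continue
        cat "$policy/cpuinfo_min_freq" > "$policy/scaling_max_freq"
    done
}

# Undo the cap by restoring the hardware maximum frequency.
uncap() {
    for policy in "$CPUFREQ_ROOT"/policy*; do
        [ -r "$policy/cpuinfo_max_freq" ] || continue
        cat "$policy/cpuinfo_max_freq" > "$policy/scaling_max_freq"
    done
}
```

Running `cap_to_min` before the stress test and `uncap` afterwards keeps the change temporary.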

It would be interesting if someone with a SDM845 dragonboard could try
replicating since that seems highly likely to reproduce, at least.

-Doug
Konrad Dybcio May 31, 2024, 2:26 p.m. UTC | #4
On 31.05.2024 4:17 PM, Doug Anderson wrote:
> Hi,
> 
> In my case I wasn't using anything nearly as fast as a M.2 SSD. I was
> just using a normal USB3 SD card reader. That being said, multiple
> people at Qualcomm were able to replicate the issue without lots of
> back and forth, so I'd guess that the problem isn't that sensitive to
> the exact storage device. I will also note that it's not sensitive to
> the exact network device as I replicated it with two Ethernet adapters
> with very different chipsets.
> 
> My only guess is that somehow SC8280XP is faster and that changes the
> timing of how it handles interrupts. I guess you could try capping
> your cpufreq in sysfs and see if that makes a difference in
> reproducing. ;-) ...or maybe somehow SC8280XP has a newer version of
> the IP where they've fixed this?

Well, great minds think alike :P I did cap it to f_min on all cores, but
that didn't change the situation. It might have been worth trying with all
cores except 0 powered off. I might do that at some point.
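
Taking every core except 0 offline could be done with the CPU hotplug files in sysfs, roughly as sketched here. It assumes each secondary CPU has a writable `online` file under /sys/devices/system/cpu and that the script runs as root. `CPU_ROOT` and `set_secondary_cpus` are made-up names.

```shell
#!/bin/sh
# Root of the per-CPU sysfs directories; overridable so the sketch can be tried on a scratch tree.
CPU_ROOT="${CPU_ROOT:-/sys/devices/system/cpu}"

# Write $1 (0 = offline, 1 = online) to every CPU except cpu0.
set_secondary_cpus() {
    for cpu in "$CPU_ROOT"/cpu[0-9]*; do
        [ "${cpu##*/}" = cpu0 ] && continue
        [ -w "$cpu/online" ] || continue
        echo "$1" > "$cpu/online"
    done
}
```

`set_secondary_cpus 0` before the test and `set_secondary_cpus 1` afterwards would bracket a single-core run.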

My guess is that with a process node change, they might have used some
newer/better IP revision though. Remains to be seen.

Konrad

Krishna Kurapati PSSNV May 31, 2024, 2:27 p.m. UTC | #5
On 5/31/2024 7:47 PM, Doug Anderson wrote:
> Hi,
> 
> In my case I wasn't using anything nearly as fast as a M.2 SSD. I was
> just using a normal USB3 SD card reader. That being said, multiple
> people at Qualcomm were able to replicate the issue without lots of
> back and forth, so I'd guess that the problem isn't that sensitive to
> the exact storage device. I will also note that it's not sensitive to
> the exact network device as I replicated it with two Ethernet adapters
> with very different chipsets.
> 
> My only guess is that somehow SC8280XP is faster and that changes the
> timing of how it handles interrupts. I guess you could try capping
> your cpufreq in sysfs and see if that makes a difference in
> reproducing. ;-) ...or maybe somehow SC8280XP has a newer version of
> the IP where they've fixed this?
> 
> It would be interesting if someone with a SDM845 dragonboard could try
> replicating since that seems highly likely to reproduce, at least.
> 

Hi Konrad, Doug,

  Downstream, we usually set this quirk for all Gen-1 targets (and only
those), not specifically for this test case but to avoid this kind of
controller-going-dead issue. I can pick out the Gen-1 targets (other
than sc7280/sc7180) and send a separate series to add this quirk to all
of them.

Regards,
Krishna
Doug Anderson May 31, 2024, 2:31 p.m. UTC | #6
Hi,

On Fri, May 31, 2024 at 7:27 AM Krishna Kurapati PSSNV
<quic_kriskura@quicinc.com> wrote:
>
> > My only guess is that somehow SC8280XP is faster and that changes the
> > timing of how it handles interrupts. I guess you could try capping
> > your cpufreq in sysfs and see if that makes a difference in
> > reproducing. ;-) ...or maybe somehow SC8280XP has a newer version of
> > the IP where they've fixed this?
> >
> > It would be interesting if someone with a SDM845 dragonboard could try
> > replicating since that seems highly likely to reproduce, at least.
> >
>
> Hi Konrad, Doug,
>
>   Usually on downstream we set this quirk only for all Gen-1 targets
> (not particularly for this testcase) but to avoid these kind of
> controller going dead issues. I can filter out the gen-1 targets (other
> than sc7280/sc7180) and send a separate series to add this quirk in all
> of them.

Sounds like a plan to me!

-Doug
Konrad Dybcio May 31, 2024, 2:41 p.m. UTC | #7
On 31.05.2024 4:31 PM, Doug Anderson wrote:
> Hi,
> 
> On Fri, May 31, 2024 at 7:27 AM Krishna Kurapati PSSNV
> <quic_kriskura@quicinc.com> wrote:
>>
>>> My only guess is that somehow SC8280XP is faster and that changes the
>>> timing of how it handles interrupts. I guess you could try capping
>>> your cpufreq in sysfs and see if that makes a difference in
>>> reproducing. ;-) ...or maybe somehow SC8280XP has a newer version of
>>> the IP where they've fixed this?
>>>
>>> It would be interesting if someone with a SDM845 dragonboard could try
>>> replicating since that seems highly likely to reproduce, at least.
>>>
>>
>> Hi Konrad, Doug,
>>
>>   Usually on downstream we set this quirk only for all Gen-1 targets
>> (not particularly for this testcase) but to avoid these kind of
>> controller going dead issues. I can filter out the gen-1 targets (other
>> than sc7280/sc7180) and send a separate series to add this quirk in all
>> of them.
> 
> Sounds like a plan to me!

Yep!

In case there are more gen1 platforms than what we have upstream, it would
be of great utility if you could list them all, so that we can have a reference
for future additions, Krishna.

Konrad
Krishna Kurapati PSSNV May 31, 2024, 2:52 p.m. UTC | #8
On 5/31/2024 8:11 PM, Konrad Dybcio wrote:
> On 31.05.2024 4:31 PM, Doug Anderson wrote:
>> Hi,
>>
>> On Fri, May 31, 2024 at 7:27 AM Krishna Kurapati PSSNV
>> <quic_kriskura@quicinc.com> wrote:
>>>
>>>> My only guess is that somehow SC8280XP is faster and that changes the
>>>> timing of how it handles interrupts. I guess you could try capping
>>>> your cpufreq in sysfs and see if that makes a difference in
>>>> reproducing. ;-) ...or maybe somehow SC8280XP has a newer version of
>>>> the IP where they've fixed this?
>>>>
>>>> It would be interesting if someone with a SDM845 dragonboard could try
>>>> replicating since that seems highly likely to reproduce, at least.
>>>>
>>>
>>> Hi Konrad, Doug,
>>>
>>>    Usually on downstream we set this quirk only for all Gen-1 targets
>>> (not particularly for this testcase) but to avoid these kind of
>>> controller going dead issues. I can filter out the gen-1 targets (other
>>> than sc7280/sc7180) and send a separate series to add this quirk in all
>>> of them.
>>
>> Sounds like a plan to me!
> 
> Yep!
> 
> In case there are more gen1 platforms than what we have upstream, it would
> be of great utility if you could list them all, so that we can have a reference
> for future additions, Krishna.
> 

I am not sure if I can give out info on targets that are not
upstream. I will check internally if I can do that. Otherwise, we can just
ensure that from now on, whenever a Gen-1 target is upstreamed,
this quirk is set.

Regards,
Krishna