diff mbox series

platform/chrome: cros_ec_typec: deferred probe when typec count mismatch

Message ID 20221220055954.11197-1-zhouruihai@huaqin.corp-partner.google.com (mailing list archive)
State New, archived
Headers show
Series platform/chrome: cros_ec_typec: deferred probe when typec count mismatch | expand

Commit Message

Ruihai Zhou Dec. 20, 2022, 5:59 a.m. UTC
The kernel bootup is much faster with normal mode. In this case,
there is a chance that the cros-ec-typec module get the actual typec
port counts but not accurate from ec before ec is able to setup it.
It will block the HDMI mux function.
Hence, return -EPROBE_DEFER to put the device onto the deferred probe
list when the typec count mismatch between ec and node.

Signed-off-by: Ruihai Zhou <zhouruihai@huaqin.corp-partner.google.com>
---
 drivers/platform/chrome/cros_ec_typec.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Guenter Roeck Dec. 20, 2022, 3:09 p.m. UTC | #1
On Mon, Dec 19, 2022 at 10:00 PM Ruihai Zhou
<zhouruihai@huaqin.corp-partner.google.com> wrote:
>
> The kernel bootup is much faster with normal mode. In this case,
> there is a chance that the cros-ec-typec module get the actual typec
> port counts but not accurate from ec before ec is able to setup it.
> It will block the HDMI mux function.
> Hence, return -EPROBE_DEFER to put the device onto the deferred probe
> list when the typec count mismatch between ec and node.
>
> Signed-off-by: Ruihai Zhou <zhouruihai@huaqin.corp-partner.google.com>
> ---
>  drivers/platform/chrome/cros_ec_typec.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/platform/chrome/cros_ec_typec.c b/drivers/platform/chrome/cros_ec_typec.c
> index 59de4ce01fab..d821501e875c 100644
> --- a/drivers/platform/chrome/cros_ec_typec.c
> +++ b/drivers/platform/chrome/cros_ec_typec.c
> @@ -382,7 +382,7 @@ static int cros_typec_init_ports(struct cros_typec_data *typec)
>
>         if (nports > typec->num_ports) {
>                 dev_err(dev, "More ports listed than can be supported.\n");
> -               return -EINVAL;
> +               return -EPROBE_DEFER;

I think that is problematic. It might as well be that nports >
EC_USB_PD_MAX_PORTS.

Is this really seen in the field ? The EC should never report a wrong
(random) number of ports. If it is not ready, there should be _some_
indication that it isn't ready. Does it really report a more or less
random number in this case ?

Guenter

>         }
>
>         /* DT uses "reg" to specify port number. */
> --
> 2.17.1
>
Ruihai Zhou Dec. 21, 2022, 2:58 a.m. UTC | #2
> I think that is problematic. It might as well be that nports >
> EC_USB_PD_MAX_PORTS.
Yes, you're right. so we should consider it's a invalid argument and return -EINVAL
if nports > EC_USB_PD_MAX_PORTS. right?
> Is this really seen in the field ? The EC should never report a wrong
> (random) number of ports. If it is not ready, there should be _some_
> indication that it isn't ready. Does it really report a more or less
> random number in this case ?
Yes, I saw this on corsola boards. The EC report a wrong(not random) number.
because corsola emulates HDMI MUX over the current type-c mux stack. The ec has to
fake a type-c port to pass the MUX info. But the task are not initiated on starting up,
and increase the type-c port counts after the tasks finished. In this case, I saw
the typec->num_ports = 1, but the nports = 2, which will be probe failed and block
the HDMI MUX function.
I will send v2 patch, if nports > EC_USB_PD_MAX_PORTS, then return -EINVAL,
but if nports > typec->num_ports, we consider wait a second to ec task increase the
type-c port counts if there're HDMI DB attach, then return -EPROBE_DEFER.
Guenter Roeck Dec. 21, 2022, 4:10 a.m. UTC | #3
On Tue, Dec 20, 2022 at 6:10 PM Ruihai Zhou
<zhouruihai@huaqin.corp-partner.google.com> wrote:
>>
>> I think that is problematic. It might as well be that nports >
>> EC_USB_PD_MAX_PORTS.
>>
>  Yes, you're right. so we should consider it's a invalid argument and return -EINVAL if nports > EC_USB_PD_MAX_PORTS. right?

Why ? While this would be a bug, it should hopefully be found early in
development and never hit the field. I don't see a reason for changing
the code.

>>
>> Is this really seen in the field ? The EC should never report a wrong
>> (random) number of ports. If it is not ready, there should be _some_
>> indication that it isn't ready. Does it really report a more or less
>> random number in this case ?
>
>  Yes, I saw this on corsola board. The EC report a wrong number(not random), because corsola emulates HDMI MUX over the current
>  type-c mux stack. The ec has to fake a type-c port to pass the MUX info. But the task are not initiated on starting up, and increase the
>  type-c port counts after the tasks finished. In this case, I saw the typec->num_ports = 1, but the nports = 2, which will be probe failed and
>  block the HDMI mux function.
>
>  I will send v2 patch, if nports > EC_USB_PD_MAX_PORTS, then return -EINVAL, but if nports > typec->num_ports, we consider wait a second
>  to ec task increase the type-c port counts if there're a HDMI DB attach, then return -EPROBE_DEFER

The current code just reduces nports if it is larger than
EC_USB_PD_MAX_PORTS. Again, I don't see a reason to change that.

Anyway, I am not sure if the above will work reliably. I am not sure
what "HDMI DB" refers to, but if it is an external connector its
status could change anytime. In that situation, no amount of waiting
would help. Either case, the EC on corsola should really not change
the number of ports it supports. Either it is a constant that should
not change, or it is dynamic and the EC would need to tell the host if
the number of ports changes (up or down). Trying to fix this in
cros_ec_typec without well defined protocol exchange with the EC is at
best a kludge, but not a real solution.

Thanks,
Guenter
Ruihai Zhou Dec. 21, 2022, 7:01 a.m. UTC | #4
>>> Is this really seen in the field ? The EC should never report a wrong
>>> (random) number of ports. If it is not ready, there should be _some_
>>> indication that it isn't ready. Does it really report a more or less
>>> random number in this case ?
>>
>>  Yes, I saw this on corsola board. The EC report a wrong number(not random), because corsola emulates HDMI MUX over the current
>>  type-c mux stack. The ec has to fake a type-c port to pass the MUX info. But the task are not initiated on starting up, and increase the
>>  type-c port counts after the tasks finished. In this case, I saw the typec->num_ports = 1, but the nports = 2, which will be probe failed and
>>  block the HDMI mux function.
>>
>>  I will send v2 patch, if nports > EC_USB_PD_MAX_PORTS, then return -EINVAL, but if nports > typec->num_ports, we consider wait a second
>>  to ec task increase the type-c port counts if there're a HDMI DB attach, then return -EPROBE_DEFER
>
> The current code just reduces nports if it is larger than
> EC_USB_PD_MAX_PORTS. Again, I don't see a reason to change that.
>
> Anyway, I am not sure if the above will work reliably. I am not sure
> what "HDMI DB" refers to, but if it is an external connector its
> status could change anytime. In that situation, no amount of waiting
> would help. Either case, the EC on corsola should really not change
> the number of ports it supports. Either it is a constant that should
> not change, or it is dynamic and the EC would need to tell the host if
> the number of ports changes (up or down). Trying to fix this in
> cros_ec_typec without well defined protocol exchange with the EC is at
> best a kludge, but not a real solution.

Thank you for the reply. The "HDMI DB" refer to the daughter board
attached to the mainboard, which don't change anytime. In fact,
the corsola EC increase the port count dynamically with some delay
(no see a standard yet) during bootup. There is the EC code to
increase the port count [1]. If the cros-ec-typec get the type-c
port counts from EC before the EC increase the port counts, then
will probe failed if return -EINVAL. I think the cros-ec-typec
is replying on the fragile delay in EC, cros-ec-typec module will
not get the fake(HDMI) type-c port counts once the EC deferred
call is later than the driver probe. That is why I make this
change. For the verification, the driver always probe failed
when EC reboot without the patch. But the driver probe passed with
the patch after a few times deferred probe.

Link:
[1] https://source.chromium.org/chromiumos/chromiumos/codesearch/+/main:src/platform/ec/zephyr/program/corsola/src/usbc.c;l=47
Prashant Malani Dec. 21, 2022, 9:20 a.m. UTC | #5
On Tue, Dec 20, 2022 at 11:01 PM Ruihai Zhou
<zhouruihai@huaqin.corp-partner.google.com> wrote:
>
> >>> Is this really seen in the field ? The EC should never report a wrong
> >>> (random) number of ports. If it is not ready, there should be _some_
> >>> indication that it isn't ready. Does it really report a more or less
> >>> random number in this case ?
> >>
> >>  Yes, I saw this on corsola board. The EC report a wrong number(not random), because corsola emulates HDMI MUX over the current
> >>  type-c mux stack. The ec has to fake a type-c port to pass the MUX info. But the task are not initiated on starting up, and increase the
> >>  type-c port counts after the tasks finished. In this case, I saw the typec->num_ports = 1, but the nports = 2, which will be probe failed and
> >>  block the HDMI mux function.
> >>
> >>  I will send v2 patch, if nports > EC_USB_PD_MAX_PORTS, then return -EINVAL, but if nports > typec->num_ports, we consider wait a second
> >>  to ec task increase the type-c port counts if there're a HDMI DB attach, then return -EPROBE_DEFER
> >
> > The current code just reduces nports if it is larger than
> > EC_USB_PD_MAX_PORTS. Again, I don't see a reason to change that.
> >
> > Anyway, I am not sure if the above will work reliably. I am not sure
> > what "HDMI DB" refers to, but if it is an external connector its
> > status could change anytime. In that situation, no amount of waiting
> > would help. Either case, the EC on corsola should really not change
> > the number of ports it supports. Either it is a constant that should
> > not change, or it is dynamic and the EC would need to tell the host if
> > the number of ports changes (up or down). Trying to fix this in
> > cros_ec_typec without well defined protocol exchange with the EC is at
> > best a kludge, but not a real solution.
>
> Thank you for the reply. The "HDMI DB" refer to the daughter board
> attached to the mainboard, which don't change anytime. In fact,
> the corsola EC increase the port count dynamically with some delay
> (no see a standard yet) during bootup. There is the EC code to
> increase the port count [1]. If the cros-ec-typec get the type-c
> port counts from EC before the EC increase the port counts, then
> will probe failed if return -EINVAL. I think the cros-ec-typec
> is replying on the fragile delay in EC, cros-ec-typec module will
> not get the fake(HDMI) type-c port counts once the EC deferred
> call is later than the driver probe. That is why I make this
> change. For the verification, the driver always probe failed
> when EC reboot without the patch. But the driver probe passed with
> the patch after a few times deferred probe.

Sorry, but I agree with Guenter; I don't think this is justification to add a
hack for 1 board in this driver.

In what situation does the EC task take so long after reboot that it
occurs after a module probe (which occurs shortly after system boot up)?

Why is the "corsola" board increasing the Type-C port count dynamically
after some delay, and what is that delay amount? Why can't this be set
via CBI/DT or its equivalent on your ARM platform?

This seems like fragile code in the EC and should be fixed there. The
number of Type-C ports reported by the EC really should be invariant
across the boot lifecycle. I don't think this patch can be accepted.

Frankly, I'm concerned that doing this might be introducing some subtle
bugs in the EC Type-C code too (I'm certain there are assumptions
about port count being static there too).

If you really need to change the number of ports dynamically in the EC,
add a board specific-hook to not respond to the EC_CMD_USB_PD_PORTS
host command until you have "set" the number of ports.

Best regards,

- Prashant
Ruihai Zhou Dec. 22, 2022, 1:53 a.m. UTC | #6
> Sorry, but I agree with Guenter; I don't think this is justification to add a
> hack for 1 board in this driver.
>
> In what situation does the EC task take so long after reboot that it
> occurs after a module probe (which occurs shortly after system boot up)?
>
> Why is the "corsola" board increasing the Type-C port count dynamically
> after some delay, and what is that delay amount? Why can't this be set
> via CBI/DT or its equivalent on your ARM platform?
>
> This seems like fragile code in the EC and should be fixed there. The
> number of Type-C ports reported by the EC really should be invariant
> across the boot lifecycle. I don't think this patch can be accepted.
>

Alright, I got it. Thank you for your patience and the suggestions.

Because the tcpc/pd tasks are not needed for the HDMI port.
The EC don't want the tasks initiated on starting up, and increase the
counts after the tasks are launched. The delay amount was 2 second before,
(always failed when ec reboot on firmware normal mode) and change to
1 second later, and 500ms yesterday(not reproduce until now).
The delay on EC can be referred to here [1]. However, with the current
design, the number of type-c ports reported by EC isn't invariant across
the boot lifecycle. Also, maybe printing the counts getting from EC
when encountered this field will be a littel helpful for debugging. just
an insignificant thought.

> If you really need to change the number of ports dynamically in the EC,
> add a board specific-hook to not respond to the EC_CMD_USB_PD_PORTS
> host command until you have "set" the number of ports.

Got it. Thank you again for your suggestions.

Link:
[1] https://source.chromium.org/chromiumos/chromiumos/codesearch/+/main:src/platform/ec/zephyr/program/corsola/src/usbc.c;l=222
diff mbox series

Patch

diff --git a/drivers/platform/chrome/cros_ec_typec.c b/drivers/platform/chrome/cros_ec_typec.c
index 59de4ce01fab..d821501e875c 100644
--- a/drivers/platform/chrome/cros_ec_typec.c
+++ b/drivers/platform/chrome/cros_ec_typec.c
@@ -382,7 +382,7 @@  static int cros_typec_init_ports(struct cros_typec_data *typec)
 
 	if (nports > typec->num_ports) {
 		dev_err(dev, "More ports listed than can be supported.\n");
-		return -EINVAL;
+		return -EPROBE_DEFER;
 	}
 
 	/* DT uses "reg" to specify port number. */