[RFC,net-next,0/2] net: dsa: realtek: fix LED support for rtl8366rb

Message ID 20240106184651.3665-1-luizluca@gmail.com (mailing list archive)

Message

Luiz Angelo Daros de Luca Jan. 6, 2024, 6:40 p.m. UTC
The rtl8366rb switch family has 4 LED groups, with one LED from each
group for each of its 6 ports. LEDs in this family can be controlled
manually using a bitmap or triggered by hardware. It's important to note
that hardware triggers are configured at the LED group level, meaning
all LEDs in the same group share the same hardware triggers settings.

The first part of this series involves dropping most of the existing
code, as, except for disabling the LEDs, it was not working as expected.
If not disabled, the LEDs will retain their default settings after a
switch reset, which may be sufficient for many devices.

The second part introduces the LED driver to control the switch LEDs
from sysfs or device-tree. This driver still allows the LEDs to retain
their default settings, but it will shift to the software-based OS LED
triggers if any configuration is changed. Subsequently, the LEDs will
operate as normal LEDs until the switch undergoes another reset.

Netdev LED trigger supports offloading to hardware triggers.
Unfortunately, this isn't possible with the current LED API for this
switch family. When the hardware trigger is enabled, it applies to all
LEDs in the LED group while the LED API decides to offload based on only
the state of a single LED. To avoid inconsistency between LEDs,
offloading would need to check if all LEDs in the group share the same
compatible settings and atomically enable offload for all LEDs.

This patch series has a minor conflict with the other realtek
refactoring series at the priv->ds access.

-       dsa_switch_for_each_port(dp, priv->ds) {
+       dsa_switch_for_each_port(dp, &priv->ds) {

Comments

Luiz Angelo Daros de Luca Jan. 6, 2024, 7:47 p.m. UTC | #1
> The rtl8366rb switch family has 4 LED groups, with one LED from each
> group for each of its 6 ports. LEDs in this family can be controlled
> manually using a bitmap or triggered by hardware. It's important to note
> that hardware triggers are configured at the LED group level, meaning
> all LEDs in the same group share the same hardware triggers settings.
>
> The first part of this series involves dropping most of the existing
> code, as, except for disabling the LEDs, it was not working as expected.
> If not disabled, the LEDs will retain their default settings after a
> switch reset, which may be sufficient for many devices.
>
> The second part introduces the LED driver to control the switch LEDs
> from sysfs or device-tree. This driver still allows the LEDs to retain
> their default settings, but it will shift to the software-based OS LED
> triggers if any configuration is changed. Subsequently, the LEDs will
> operate as normal LEDs until the switch undergoes another reset.
>
> Netdev LED trigger supports offloading to hardware triggers.
> Unfortunately, this isn't possible with the current LED API for this
> switch family. When the hardware trigger is enabled, it applies to all
> LEDs in the LED group while the LED API decides to offload based on only
> the state of a single LED. To avoid inconsistency between LEDs,
> offloading would need to check if all LEDs in the group share the same
> compatible settings and atomically enable offload for all LEDs.

Hi Christian,

I tried to implement something close to your work with qca8k and LED
hw control. However, I couldn't find a solution that would work with
the existing API. The HW LED configuration in realtek switches is
shared by all LEDs in a group. Before activating the hw control, all
LEDs in the same group must share the same netdev trigger config, use
the correct device and also use compatible netdev trigger settings.
In order to check that, I would need to expose some internal netdev
trigger info that is only available through sysfs (and I believe sysfs
is not suitable to be used from the kernel). Even if I got all LEDs
with the correct settings, I would need to atomically switch all LEDs to
use the hw control or, at least, I would need to stop all update jobs
because if the OS changes a LED brightness, it might be interpreted as
the OS disabling the hw control:

/*
...
* Deactivate hardware blink control by setting brightness to LED_OFF via
* the brightness_set() callback.
*
...
*/
int (*hw_control_set)(struct led_classdev *led_cdev,
 unsigned long flags);

Do you have any idea how to implement it?

BTW, during my tests with a single LED, ignoring the LED group
situation, I noticed that the OS was sending a brightness_set(LED_OFF)
after I changed the trigger to netdev, a moment after hw_control_set
was called. It doesn't make sense to enable hw control just to disable
it afterwards. The call came from set_brightness_delayed(). Maybe it
is because my test device is pretty slow and the previous trigger
event always gets queued. Touching any settings after that worked as
expected without the spurious brightness_set(LED_OFF). Did you see
something like this?

Regards,

Luiz

Linus Walleij Jan. 7, 2024, 8:16 p.m. UTC | #2
Hi Luiz,

On Sat, Jan 6, 2024 at 7:47 PM Luiz Angelo Daros de Luca
<luizluca@gmail.com> wrote:

> The rtl8366rb switch family has 4 LED groups, with one LED from each
> group for each of its 6 ports. LEDs in this family can be controlled
> manually using a bitmap or triggered by hardware. It's important to note
> that hardware triggers are configured at the LED group level, meaning
> all LEDs in the same group share the same hardware triggers settings.
>
> The first part of this series involves dropping most of the existing
> code, as, except for disabling the LEDs, it was not working as expected.
> If not disabled, the LEDs will retain their default settings after a
> switch reset, which may be sufficient for many devices.
>
> The second part introduces the LED driver to control the switch LEDs
> from sysfs or device-tree. This driver still allows the LEDs to retain
> their default settings, but it will shift to the software-based OS LED
> triggers if any configuration is changed. Subsequently, the LEDs will
> operate as normal LEDs until the switch undergoes another reset.
>
> Netdev LED trigger supports offloading to hardware triggers.
> Unfortunately, this isn't possible with the current LED API for this
> switch family. When the hardware trigger is enabled, it applies to all
> LEDs in the LED group while the LED API decides to offload based on only
> the state of a single LED. To avoid inconsistency between LEDs,
> offloading would need to check if all LEDs in the group share the same
> compatible settings and atomically enable offload for all LEDs.

I think these patches look great, and the driver certainly looks better
after these changes than before them, so if you resend without
RFC, please feel free to add my:

Reviewed-by: Linus Walleij <linus.walleij@linaro.org>

HW triggers may be hard to implement but plain software control
is not bad either so this is already way better than what we had
before.

HW control can always be discussed and added later.

Yours,
Linus Walleij

Christian Marangi Jan. 7, 2024, 8:51 p.m. UTC | #3
On Sat, Jan 06, 2024 at 04:47:10PM -0300, Luiz Angelo Daros de Luca wrote:
> > The rtl8366rb switch family has 4 LED groups, with one LED from each
> > group for each of its 6 ports. LEDs in this family can be controlled
> > manually using a bitmap or triggered by hardware. It's important to note
> > that hardware triggers are configured at the LED group level, meaning
> > all LEDs in the same group share the same hardware triggers settings.
> >
> > The first part of this series involves dropping most of the existing
> > code, as, except for disabling the LEDs, it was not working as expected.
> > If not disabled, the LEDs will retain their default settings after a
> > switch reset, which may be sufficient for many devices.
> >
> > The second part introduces the LED driver to control the switch LEDs
> > from sysfs or device-tree. This driver still allows the LEDs to retain
> > their default settings, but it will shift to the software-based OS LED
> > triggers if any configuration is changed. Subsequently, the LEDs will
> > operate as normal LEDs until the switch undergoes another reset.
> >
> > Netdev LED trigger supports offloading to hardware triggers.
> > Unfortunately, this isn't possible with the current LED API for this
> > switch family. When the hardware trigger is enabled, it applies to all
> > LEDs in the LED group while the LED API decides to offload based on only
> > the state of a single LED. To avoid inconsistency between LEDs,
> > offloading would need to check if all LEDs in the group share the same
> > compatible settings and atomically enable offload for all LEDs.
> 
> Hi Christian,
> 
> I tried to implement something close to your work with qca8k and LED
> hw control. However, I couldn't find a solution that would work with
> the existing API. The HW led configuration in realtek switches is
> shared with all LEDs in a group. Before activating the hw control, all
> LEDs in the same group must share the same netdev trigger config, use
> the correct device and also use a compatible netdev trigger settings.
> In order to check that, I would need to expose some internal netdev
> trigger info that is only available through sysfs (and I believe sysfs
> is not suitable to be used from the kernel). Even if I got all LEDs
> with the correct settings, I would need to atomicly switch all LEDs to
> use the hw control or, at least, I would need to stop all update jobs
> because if the OS changes a LED brightness, it might be interpreted as
> the OS disabling the hw control:
> 

Sadly we still don't have the concept of LED groups, but from what I
have seen, 99% of the time a switch has limitations on HW control but a
single LED can still be controlled separately.

With this limitation you can use the is_supported function and some priv
struct to enforce the constraint and reject unsupported configurations.
The netdev trigger will then fall back to software in this case. (I assume
that in a real-world scenario all the LEDs in the group are set to the
common rule set, resulting in is_supported never rejecting it.)

Also consider this situation: it's the first LED touched that enables HW
control and drives everything. LED configurations are not enabled all at
once. You can totally introduce a priv struct that caches the current
modes and, on the other LEDs, make sure the requested mode matches the
cached one.

And I guess this limitation should be printed and documented in DT.

> /*
> ...
> * Deactivate hardware blink control by setting brightness to LED_OFF via
> * the brightness_set() callback.
> *
> ...
> */
> int (*hw_control_set)(struct led_classdev *led_cdev,
>  unsigned long flags);
> 
> Do you have any idea how to implement it?
> 
> BTW, during my tests with a single LED, ignoring the LED group
> situation, I noticed that the OS was sending a brightness_set(LED_OFF)
> after I changed the trigger to netdev, a moment after hw_control_set
> was called. It doesn't make sense to enable hw control just to disable
> it afterwards. The call came from set_brightness_delayed(). Maybe it
> is because my test device is pretty slow and the previous trigger
> event always gets queued. Touching any settings after that worked as
> expected without the spurious brightness_set(LED_OFF). Did you see
> something like this?
>

Consider that brightness_set is called whenever a trigger is changed;
the logic is in the generic LED handling. Setting OFF and then enabling
hw control should not change a thing. In other drivers, though, I noticed
an extra measure is needed to reset any HW control rule already applied
by default.

Luiz Angelo Daros de Luca Jan. 8, 2024, 5:47 a.m. UTC | #4
> > Hi Christian,
> >
> > I tried to implement something close to your work with qca8k and LED
> > hw control. However, I couldn't find a solution that would work with
> > the existing API. The HW led configuration in realtek switches is
> > shared with all LEDs in a group. Before activating the hw control, all
> > LEDs in the same group must share the same netdev trigger config, use
> > the correct device and also use a compatible netdev trigger settings.
> > In order to check that, I would need to expose some internal netdev
> > trigger info that is only available through sysfs (and I believe sysfs
> > is not suitable to be used from the kernel). Even if I got all LEDs
> > with the correct settings, I would need to atomicly switch all LEDs to
> > use the hw control or, at least, I would need to stop all update jobs
> > because if the OS changes a LED brightness, it might be interpreted as
> > the OS disabling the hw control:
> >
>
> Saddly we still don't have the concept of LED groups, but from what I
> notice 99% of the time switch have limitation of HW control but single
> LED can still be controlled separately.

Individually, I can only turn them on/off. That is enough for software
control but not for hardware control. When you set a LED group to
blink on link activity, all LEDs will be affected.

> With this limitation you can use the is_supported function and some priv
> struct to enforce and reject unsupported configuration.
> netdev trigger will then fallback to software in this case. (I assume on
> real world scenario to have all the LED in the group to be set to the
> common rule set resulting in is_supported never rejecting it)

Maybe I wasn't clear enough about what the HW provides me. I have 4
16-bit registers:

REG1: a single blink rate used by all LEDs in all groups
REG2: configures the trigger for each group, 4-bit each, with one
special 4-bit value being "fixed", equivalent to "none" in Linux LED
trigger
REG3: bitmap to manually control LEDs in group 0 and 1 only when their
group trigger is configured as fixed.
REG4: bitmap to manually control LEDs in group 2 and 3 only when their
group trigger is configured as fixed.

And that's it.
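
The register layout described above can be modelled in a few lines of C. This is only a sketch: the struct, the function names, the "fixed" encoding (0x0) and the exact bit positions (4 bits per group in the mode register, 6 bits per group in each bitmap register) are assumptions made for illustration, not values taken from the datasheet or from the driver.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* All names, encodings and bit positions here are hypothetical. */
#define LED_NUM_GROUPS  4
#define LED_NUM_PORTS   6
#define LED_MODE_MASK   0xFu
#define LED_MODE_FIXED  0x0u	/* assumed encoding of the "fixed" mode */

struct led_regs {
	uint16_t blink_rate;	/* first register: one blink rate for every LED */
	uint16_t group_mode;	/* second register: one 4-bit trigger per group */
	uint16_t manual[2];	/* bitmaps: [0] = groups 0-1, [1] = groups 2-3 */
};

/* Read the 4-bit trigger field of one group. */
unsigned int led_get_group_mode(const struct led_regs *r, unsigned int group)
{
	assert(group < LED_NUM_GROUPS);
	return (r->group_mode >> (group * 4)) & LED_MODE_MASK;
}

/* Write the 4-bit trigger field of one group; this affects every LED in it. */
void led_set_group_mode(struct led_regs *r, unsigned int group,
			unsigned int mode)
{
	unsigned int shift = group * 4;

	assert(group < LED_NUM_GROUPS && mode <= LED_MODE_MASK);
	r->group_mode = (uint16_t)((r->group_mode & ~(LED_MODE_MASK << shift)) |
				   (mode << shift));
}

/*
 * Set or clear the manual bit of a single LED. The bit is always written,
 * but the return value tells whether it is honoured, i.e. whether the
 * group is currently in the "fixed" mode.
 */
bool led_set_manual(struct led_regs *r, unsigned int group,
		    unsigned int port, bool on)
{
	uint16_t bit;

	assert(group < LED_NUM_GROUPS && port < LED_NUM_PORTS);
	bit = (uint16_t)(1u << ((group % 2) * LED_NUM_PORTS + port));
	if (on)
		r->manual[group / 2] |= bit;
	else
		r->manual[group / 2] &= (uint16_t)~bit;
	return led_get_group_mode(r, group) == LED_MODE_FIXED;
}
```

The model makes the limitation visible: per-LED state exists only in the bitmaps, while the trigger is a property of the whole group.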

I can keep track of the netdev trigger from calls to the "is_supported"
function. I can also check if all LEDs are still using the netdev
trigger. However, I cannot detect if the user changed the device to
something else not related to the corresponding port as the netdev
trigger will not check the compatibility if the device does not match.
I would still need to expose at least some of the netdev trigger
internal data.

> Also consider this situation, it's the first LED touched that enables HW
> control that drive everything. LED configuration are not enabled all at
> once. You can totally introduce a priv struct that cache the current
> modes and on the other LEDs make sure the requested mode match the cache
> one.

Considering that I can externally check that all LEDs have netdev
trigger settings compatible with the HW control, once the last LED is
configured, I could return true for the hw_control_is_supported. When
hw_control_set is called, I could configure the hardware accordingly,
which would affect all LEDs in that group. However, the OS will still
use the software control for the other LEDs in that same group. That
way, once a netdev event turns off one LED, that message is the same
clue the LED driver receives to disable the hardware control. It will
undo the hardware change I just made. I could use
led_brightness_set(OFF) on those other LEDs during hw_control_set to
disable their software controlled triggers (actually changing the
trigger to "none"), but it might be a race condition of who stops the
other. And even then, the other LEDs will keep an inconsistent
configuration state, with "none" as their trigger.

I need:
1) expose the required info or allow an external caller to test a LED
configuration for compatibility (avoiding recursion).
2) something from hw_control_set() that stops the software triggers in
other LEDs without destroying their configuration.
3) something that could enable hw_control on those other LEDs

> And I guess this limitation should be printed and documented in DT.
>
> > /*
> > ...
> > * Deactivate hardware blink control by setting brightness to LED_OFF via
> > * the brightness_set() callback.
> > *
> > ...
> > */
> > int (*hw_control_set)(struct led_classdev *led_cdev,
> >  unsigned long flags);
> >
> > Do you have any idea how to implement it?
> >
> > BTW, during my tests with a single LED, ignoring the LED group
> > situation, I noticed that the OS was sending a brightness_set(LED_OFF)
> > after I changed the trigger to netdev, a moment after hw_control_set
> > was called. It doesn't make sense to enable hw control just to disable
> > it afterwards. The call came from set_brightness_delayed(). Maybe it
> > is because my test device is pretty slow and the previous trigger
> > event always gets queued. Touching any settings after that worked as
> > expected without the spurious brightness_set(LED_OFF). Did you see
> > something like this?
> >
>
> Consider that brightness_set is called whatever a trigger is changed,
> the logic is in the generic LED handling. Setting OFF and then enabling
> hw control should not change a thing. In other driver tho I notice an
> extra measure is needed to reset any HW control rule already applied by
> default.

It would be OK to call brightness_set(LED_OFF) if that is guaranteed
to happen before hw_control_set(). The problem is that the
brightness_set(LED_OFF) happens *after* hw_control_set() was called.
It looks like a race condition.

Regards,

Luiz

Christian Marangi Jan. 8, 2024, 1:09 p.m. UTC | #5
On Mon, Jan 08, 2024 at 02:47:22AM -0300, Luiz Angelo Daros de Luca wrote:
> > > Hi Christian,
> > >
> > > I tried to implement something close to your work with qca8k and LED
> > > hw control. However, I couldn't find a solution that would work with
> > > the existing API. The HW led configuration in realtek switches is
> > > shared with all LEDs in a group. Before activating the hw control, all
> > > LEDs in the same group must share the same netdev trigger config, use
> > > the correct device and also use a compatible netdev trigger settings.
> > > In order to check that, I would need to expose some internal netdev
> > > trigger info that is only available through sysfs (and I believe sysfs
> > > is not suitable to be used from the kernel). Even if I got all LEDs
> > > with the correct settings, I would need to atomicly switch all LEDs to
> > > use the hw control or, at least, I would need to stop all update jobs
> > > because if the OS changes a LED brightness, it might be interpreted as
> > > the OS disabling the hw control:
> > >
> >
> > Saddly we still don't have the concept of LED groups, but from what I
> > notice 99% of the time switch have limitation of HW control but single
> > LED can still be controlled separately.
>
> Individually, I can only turn them on/off. That is enough for software
> control but not for hardware control. When you set a LED group to
> blink on link activity, all LEDs will be affected.
>

Assuming we have the same 2005 datasheet, yes the LED situation is
complex for this switch. (If you have something better please link)

> > With this limitation you can use the is_supported function and some priv
> > struct to enforce and reject unsupported configuration.
> > netdev trigger will then fallback to software in this case. (I assume on
> > real world scenario to have all the LED in the group to be set to the
> > common rule set resulting in is_supported never rejecting it)
> 
> Maybe I wasn't clear enough about what the HW provides me. I have 4
> 16-bit registers:
> 
> REG1: a single blink rate used by all LEDs in all groups
> REG2: configures the trigger for each group, 4-bit each, with one
> special 4-bit value being "fixed", equivalent to "none" in Linux LED
> trigger
> REG3: bitmap to manually control LEDs in group 0 and 1 only when their
> group trigger is configured as fixed.
> REG4: bitmap to manually control LEDs in group 2 and 3 only when their
> group trigger is configured as fixed.
> 
> And that's it.
> 
> I can keep track of the netdev trigger form calls to "is_supported
> function". I can also check if all LEDs are still using the netdev
> trigger. However, I cannot detect if the user changed the device to
> something else not related to the corresponding port as the netdev
> trigger will not check the compatibility if the device does not match.
> I would still need to expose at least some of the netdev trigger
> internal data.
> 

We can make some assumptions and use a refcount, though. For very exotic
configurations it will always fall back to software (and make hw control
impossible), but for more generic ones we can benefit from it.

- We can only enable HW control on the LED group.

This means that for the group we need to make sure that:
1. We have the correct device set for each LED
2. We have an acceptable mode requested.

With these 2 prereq, we can correctly enable HW control for the LED
group.

HW control is enabled only if the netdev device currently set matches what
hw_control_get_device returns. With this, we can assume that for a HW
control request the correct netdev is always set.

We also use a refcount to check how many LEDs are actually "enabled".
With this count we can understand if we can enable HW control for the
LED group or return false from is_supported.

And with HW control enabled, we would reject all kind of invalid
settings and print a warning to alert the user of the limitation and
maybe how to remove it.
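
One possible reading of the refcount idea above, as a plain C sketch. The struct and function names are hypothetical and nothing here is kernel API: the first LED to join caches its rules for the group, later LEDs are accepted only if they request the same rules, and the count tells how many LEDs currently agree. It covers only the bookkeeping, not how software control on the other LEDs would be stopped.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-group bookkeeping; not taken from any driver. */
struct led_group_state {
	unsigned int users;	/* LEDs currently using HW control in this group */
	unsigned long rules;	/* rules cached from the first LED that joined */
};

/* What an is_supported-style check could answer for one LED in the group. */
bool led_group_is_supported(const struct led_group_state *g,
			    unsigned long rules)
{
	return g->users == 0 || g->rules == rules;
}

/* Take a reference for one LED; fails if the rules clash with the cache. */
bool led_group_get(struct led_group_state *g, unsigned long rules)
{
	if (!led_group_is_supported(g, rules))
		return false;
	g->rules = rules;
	g->users++;
	return true;
}

/* Drop the reference of one LED; the cache is free again at zero users. */
void led_group_put(struct led_group_state *g)
{
	assert(g->users > 0);
	g->users--;
}
```

With this shape, a mismatching request is rejected while any other LED in the group holds a reference, and the group accepts a new rule set once every LED has released it.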

> > Also consider this situation, it's the first LED touched that enables HW
> > control that drive everything. LED configuration are not enabled all at
> > once. You can totally introduce a priv struct that cache the current
> > modes and on the other LEDs make sure the requested mode match the cache
> > one.
> 
> Considering that I can externally check that all LEDs have a netdev
> trigger settings compatible with the HW control, once the last LED is
> configured, I could return true for the hw_control_is_supported. When
> hw_control_set is called, I could configure the hardware accordingly,
> which would affect all LEDs in that group. However, the OS will still
> use the software control for the other LEDs in that same group. That
> way, once a netdev event turns off one LED, that message is the same
> clue the LED driver receives to disable the hardware control. It will
> undo the hardware change I just made. I could use
> led_brightness_set(OFF) on those other LEDs during hw_control_set to
> disable their software controlled triggers (actually changing the
> trigger to "none"), but it might be a race condition of who stops the
> other. And even then, the other LEDs will keep an inconsistent
> configuration state, with "none" as their trigger.
> 
> I need:
> 1) expose the required info or allow an external caller to test a LED
> configuration for compatibility (avoiding recursion).
> 2) something from hw_control_set() that stops the software triggers in
> other LEDs without destroying their configuration.
> 3) something that could enable hw_control on those other LEDs
> 

I think it would be problematic for other LEDs to make changes. I need to
check how LED multicolor works... In a sense they are a LED group, so maybe
in the LED core we have a way to group LEDs and share some info with the
others.

> > And I guess this limitation should be printed and documented in DT.
> >
> > > /*
> > > ...
> > > * Deactivate hardware blink control by setting brightness to LED_OFF via
> > > * the brightness_set() callback.
> > > *
> > > ...
> > > */
> > > int (*hw_control_set)(struct led_classdev *led_cdev,
> > >  unsigned long flags);
> > >
> > > Do you have any idea how to implement it?
> > >
> > > BTW, during my tests with a single LED, ignoring the LED group
> > > situation, I noticed that the OS was sending a brightness_set(LED_OFF)
> > > after I changed the trigger to netdev, a moment after hw_control_set
> > > was called. It doesn't make sense to enable hw control just to disable
> > > it afterwards. The call came from set_brightness_delayed(). Maybe it
> > > is because my test device is pretty slow and the previous trigger
> > > event always gets queued. Touching any settings after that worked as
> > > expected without the spurious brightness_set(LED_OFF). Did you see
> > > something like this?
> > >
> >
> > Consider that brightness_set is called whatever a trigger is changed,
> > the logic is in the generic LED handling. Setting OFF and then enabling
> > hw control should not change a thing. In other driver tho I notice an
> > extra measure is needed to reset any HW control rule already applied by
> > default.
> 
> It would be OK to call brightness_set(LED_OFF) if that is guaranteed
> to happen before hw_control_set(). The problem is that the
> brightness_set(LED_OFF) happens *after* hw_control_set() was called.
> It looks like a race condition.
> 

This totally requires some further investigation; it seems strange,
though, that your system is that slow.

Luiz Angelo Daros de Luca Jan. 13, 2024, 4:06 a.m. UTC | #6
> On Mon, Jan 08, 2024 at 02:47:22AM -0300, Luiz Angelo Daros de Luca wrote:
> > > > Hi Christian,
> > > >
> > > > I tried to implement something close to your work with qca8k and LED
> > > > hw control. However, I couldn't find a solution that would work with
> > > > the existing API. The HW led configuration in realtek switches is
> > > > shared with all LEDs in a group. Before activating the hw control, all
> > > > LEDs in the same group must share the same netdev trigger config, use
> > > > the correct device and also use a compatible netdev trigger settings.
> > > > In order to check that, I would need to expose some internal netdev
> > > > trigger info that is only available through sysfs (and I believe sysfs
> > > > is not suitable to be used from the kernel). Even if I got all LEDs
> > > > with the correct settings, I would need to atomicly switch all LEDs to
> > > > use the hw control or, at least, I would need to stop all update jobs
> > > > because if the OS changes a LED brightness, it might be interpreted as
> > > > the OS disabling the hw control:
> > > >
> > >
> > > Saddly we still don't have the concept of LED groups, but from what I
> > > notice 99% of the time switch have limitation of HW control but single
> > > LED can still be controlled separately.
> >
> > Individually, I can only turn them on/off. That is enough for software
> > control but not for hardware control. When you set a LED group to
> > blink on link activity, all LEDs will be affected.
> >
>
> Assuming we have the same 2005 datasheet, yes the LED situation is
> complex for this switch. (If you have something better please link)

I wouldn't say complex, but limited.

The manual for rtl8365mb
(https://cdn.jsdelivr.net/gh/libc0607/Realtek_switch_hacking@files/Realtek_Unmanaged_Switch_ProgrammingGuide.pdf,
page 64) shows the vendor API for controlling LEDs. It is similar for
both families.
You have only 3 functions:

int32 rtk_led_blinkRate_set(rtk_led_blink_rate_t blinkRate)
int32 rtk_led_groupConfig_set(rtk_led_group_t group, rtk_led_congig_t config)
int32 rtk_led_enable_set(rtk_led_group_t group, rtk_portmask_t portmask)

rtk_led_blinkRate_set sets the blink rate but it does not mention
group or port, so it is global.
rtk_led_groupConfig_set defines the HW trigger but only by group.
rtk_led_enable_set uses a mask to manually set a LED

It closely matches the register behavior.

> > > Also consider this situation, it's the first LED touched that enables HW
> > > control that drive everything. LED configuration are not enabled all at
> > > once. You can totally introduce a priv struct that cache the current
> > > modes and on the other LEDs make sure the requested mode match the cache
> > > one.
> >
> > Considering that I can externally check that all LEDs have a netdev
> > trigger settings compatible with the HW control, once the last LED is
> > configured, I could return true for the hw_control_is_supported. When
> > hw_control_set is called, I could configure the hardware accordingly,
> > which would affect all LEDs in that group. However, the OS will still
> > use the software control for the other LEDs in that same group. That
> > way, once a netdev event turns off one LED, that message is the same
> > clue the LED driver receives to disable the hardware control. It will
> > undo the hardware change I just made. I could use
> > led_brightness_set(OFF) on those other LEDs during hw_control_set to
> > disable their software controlled triggers (actually changing the
> > trigger to "none"), but it might be a race condition of who stops the
> > other. And even then, the other LEDs will keep an inconsistent
> > configuration state, with "none" as their trigger.
> >
> > I need:
> > 1) expose the required info or allow an external caller to test a LED
> > configuration for compatibility (avoiding recursion).
> > 2) something from hw_control_set() that stops the software triggers in
> > other LEDs without destroying their configuration.
> > 3) something that could enable hw_control on those other LEDs
> >
>
> I think it would be problematic for other LED to do changes. I need to
> check how LED multicolor work... In a sense they are LED group so maybe
> in LED core we have a way to group LED and share some info with the
> others.

That's the main issue. I can expose the needed info to check if all
LEDs agree on a compatible configuration. However, once that happens,
I must stop all sw control and enable the hw control. Something like:

lock a group of leds
for all leds
   if not devname is correct
     fallback to sw control
   if settings are not compatible
     fallback to sw control
   if settings is different from other LEDs
     fallback to sw control
for all leds
   stop sw control work
for all leds
   enable hw control
set the group hw trigger
unlock the group of leds

And I need something that would also work to disable hw control once
the first LED changes anything, breaking the compatibility.
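
The steps listed above can be sketched as a plain C model. All names are hypothetical, including the HW_SUPPORTED_RULES mask and the struct fields. The group lock is left to the caller, and "stop sw control" and "enable hw control" are reduced to flag updates, so this shows only the ordering of the checks and the all-or-nothing switch, not real kernel code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define GROUP_MAX_LEDS      6
#define HW_SUPPORTED_RULES  0x7ul	/* hypothetical set of offloadable rules */

struct led_cfg {
	const char *devname;	/* netdev configured in the trigger */
	const char *port_name;	/* netdev of the port this LED belongs to */
	unsigned long rules;	/* requested trigger rules */
	bool sw_active;		/* software trigger work is running */
	bool hw_active;		/* LED is considered HW controlled */
};

struct led_group {
	struct led_cfg leds[GROUP_MAX_LEDS];
	size_t n;
	unsigned long hw_rules;	/* group-wide HW trigger */
	bool hw_enabled;
};

/*
 * Caller is assumed to hold a lock covering the whole group.
 * Returns false, changing nothing, when any LED forces a software fallback.
 */
bool led_group_try_offload(struct led_group *g)
{
	size_t i;

	assert(g->n > 0 && g->n <= GROUP_MAX_LEDS);

	for (i = 0; i < g->n; i++) {
		const struct led_cfg *l = &g->leds[i];

		if (strcmp(l->devname, l->port_name) != 0)
			return false;	/* devname is not the LED's port */
		if (l->rules == 0 || (l->rules & ~HW_SUPPORTED_RULES))
			return false;	/* rules cannot be offloaded */
		if (l->rules != g->leds[0].rules)
			return false;	/* differs from the other LEDs */
	}

	for (i = 0; i < g->n; i++)
		g->leds[i].sw_active = false;	/* stop sw control work */
	for (i = 0; i < g->n; i++)
		g->leds[i].hw_active = true;	/* enable hw control */

	g->hw_rules = g->leds[0].rules;		/* set the group hw trigger */
	g->hw_enabled = true;
	return true;
}
```

Because all checks run before any state is touched, a single incompatible LED leaves every LED in the group under software control.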

> > > > BTW, during my tests with a single LED, ignoring the LED group
> > > > situation, I noticed that the OS was sending a brightness_set(LED_OFF)
> > > > after I changed the trigger to netdev, a moment after hw_control_set
> > > > was called. It doesn't make sense to enable hw control just to disable
> > > > it afterwards. The call came from set_brightness_delayed(). Maybe it
> > > > is because my test device is pretty slow and the previous trigger
> > > > event always gets queued. Touching any settings after that worked as
> > > > expected without the spurious brightness_set(LED_OFF). Did you see
> > > > something like this?
> > > >
> > >
> > > Consider that brightness_set is called whatever a trigger is changed,
> > > the logic is in the generic LED handling. Setting OFF and then enabling
> > > hw control should not change a thing. In other driver tho I notice an
> > > extra measure is needed to reset any HW control rule already applied by
> > > default.
> >
> > It would be OK to call brightness_set(LED_OFF) if that is guaranteed
> > to happen before hw_control_set(). The problem is that the
> > brightness_set(LED_OFF) happens *after* hw_control_set() was called.
> > It looks like a race condition.
> >
>
> Totally require some further investigation, it seems strange tho that
> the your system is that slow.

I got some stacks. When I change the trigger to netdev, I get 2 calls
to set the hw control:

[  625.601449] CPU: 0 PID: 2607 Comm: ash Not tainted 6.1.59 #0
[  625.607153] Stack : 809b0000 77e70000 00000000 800c1e80 00000431 00000004 00000000 00000000
[  625.615627]         80c45c74 80980000 80850000 806ffc2c 80e23b88 00000001 80c45c18 f51e58b8
[  625.624094]         00000000 00000000 806ffc2c 80c45b48 ffffefff 00000000 00000000 ffffffea
[  625.632561]         00000112 80c45b54 00000112 807c99c0 00000001 806ffc2c 809ec080 00000005
[  625.641029]         ffff7fff 80840000 809b0000 77e70000 00000018 8039785c 00000000 80980000
[  625.649495]         ...
[  625.651965] Call Trace:
[  625.654422] [<80066e4c>] show_stack+0x28/0xf0
[  625.658848] [<8061b800>] dump_stack_lvl+0x38/0x60
[  625.663599] [<8189b890>] rtl8366rb_cled_hw_control_set+0xdc/0xf8 [rtl8366]
[  625.670578] [<8041dfb0>] netdev_trig_notify+0x114/0x280
[  625.675867] [<80450d14>] call_netdevice_register_net_notifiers+0x54/0x104
[  625.682729] [<804542dc>] register_netdevice_notifier+0x98/0x130
[  625.688702] [<8041dbf8>] netdev_trig_activate+0x160/0x1b0
[  625.694152] [<8041b948>] led_trigger_set+0xf8/0x254
[  625.699070] [<8041c2a4>] led_trigger_write+0xd4/0x148
[  625.704163] [<8026966c>] sysfs_kf_bin_write+0x80/0xbc
[  625.709263] [<80268438>] kernfs_fop_write_iter+0x118/0x244
[  625.714801] [<801e787c>] vfs_write+0x1fc/0x3c0
[  625.719301] [<801e7bdc>] ksys_write+0x70/0x124
[  625.723791] [<8006e1e4>] syscall_common+0x34/0x58

[  625.761665] CPU: 0 PID: 2607 Comm: ash Not tainted 6.1.59 #0
[  625.767374] Stack : 809b0000 77e70000 00000000 800c1e80 00000431 00000004 00000000 00000000
[  625.775848]         80c45c74 80980000 80850000 806ffc2c 80e23b88 00000001 80c45c18 f51e58b8
[  625.784315]         00000000 00000000 806ffc2c 80c45b48 ffffefff 00000000 00000000 ffffffea
[  625.792782]         00000130 80c45b54 00000130 807c99c0 00000001 806ffc2c 809ec080 00000001
[  625.801249]         ffff7fff 80840000 809b0000 77e70000 00000018 8039785c 00000000 80980000
[  625.809716]         ...
[  625.812186] Call Trace:
[  625.814643] [<80066e4c>] show_stack+0x28/0xf0
[  625.819068] [<8061b800>] dump_stack_lvl+0x38/0x60
[  625.823820] [<8189b890>] rtl8366rb_cled_hw_control_set+0xdc/0xf8 [rtl8366]
[  625.830800] [<8041dfb0>] netdev_trig_notify+0x114/0x280
[  625.836088] [<80450d94>] call_netdevice_register_net_notifiers+0xd4/0x104
[  625.842950] [<804542dc>] register_netdevice_notifier+0x98/0x130
[  625.848923] [<8041dbf8>] netdev_trig_activate+0x160/0x1b0
[  625.854374] [<8041b948>] led_trigger_set+0xf8/0x254
[  625.859300] [<8041c2a4>] led_trigger_write+0xd4/0x148
[  625.864401] [<8026966c>] sysfs_kf_bin_write+0x80/0xbc
[  625.869502] [<80268438>] kernfs_fop_write_iter+0x118/0x244
[  625.875040] [<801e787c>] vfs_write+0x1fc/0x3c0
[  625.879539] [<801e7bdc>] ksys_write+0x70/0x124
[  625.884030] [<8006e1e4>] syscall_common+0x34/0x58

That is not really a problem, but my guess is that it is called for
both NETDEV_REGISTER and NETDEV_UP, as both eventually call
set_baseline_state(). Shouldn't we avoid one of them?
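
To make the guess above concrete, here is a minimal user-space sketch of
the replay that seems to happen when the trigger registers its netdevice
notifier. It is only a model: every sim_* name is hypothetical, and it
assumes that registration replays NETDEV_REGISTER for an existing device
and then NETDEV_UP if the device is up, with both events ending in the
same set_baseline_state()-like path.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical event names; they only mirror NETDEV_REGISTER/NETDEV_UP. */
enum sim_event { SIM_NETDEV_REGISTER, SIM_NETDEV_UP };

/* Hypothetical trigger state: counts how often hw control gets programmed. */
struct sim_trigger {
	int hw_control_set_calls;
};

/* Models the trigger's notifier callback: both events re-apply the state. */
static void sim_notify(struct sim_trigger *t, enum sim_event ev)
{
	switch (ev) {
	case SIM_NETDEV_REGISTER:
	case SIM_NETDEV_UP:
		t->hw_control_set_calls++;
		break;
	}
}

/* Models notifier registration replaying events for an existing device. */
static void sim_register_notifier(struct sim_trigger *t, bool dev_is_up)
{
	sim_notify(t, SIM_NETDEV_REGISTER);
	if (dev_is_up)
		sim_notify(t, SIM_NETDEV_UP);
}

/* Returns how many times hw control was programmed during registration. */
static int sim_count_calls(bool dev_is_up)
{
	struct sim_trigger t = { 0 };

	sim_register_notifier(&t, dev_is_up);
	return t.hw_control_set_calls;
}
```

In this model, sim_count_calls(true) counts two calls for a device that
is already up, which matches the two stacks above.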

But after that, I get:

[  625.900712] CPU: 0 PID: 2626 Comm: kworker/0:2 Not tainted 6.1.59 #0
[  625.907154] Workqueue: events set_brightness_delayed
[  625.912178] Stack : 81000205 81a4ab40 817f9da4 800c1e80 8199e4e0 806ffc2c 807c0000 81a4ab00
[  625.920652]         81000200 00000000 80c08c5c 800c1f7c 80e21cc8 00000001 817f9d60 f37f8d2c
[  625.929119]         00000000 00000000 806ffc2c 817f9c78 ffffefff 00000000 00000000 ffffffea
[  625.937587]         00000148 817f9c84 00000148 807c99c0 00000001 806ffc2c 00000000 81000200
[  625.946054]         00000000 80c08c5c 81000205 81a4ab40 00000018 8039785c 00000000 80980000
[  625.954521]         ...
[  625.956990] Call Trace:
[  625.959448] [<80066e4c>] show_stack+0x28/0xf0
[  625.963873] [<8061b800>] dump_stack_lvl+0x38/0x60
[  625.968624] [<81899ec8>] rtl8366rb_cled_brightness_set_blocking+0x68/0x88 [rtl8366]
[  625.976389] [<8041a018>] set_brightness_delayed+0x84/0xec
[  625.981833] [<8009f724>] process_one_work+0x254/0x484
[  625.986934] [<8009fed0>] worker_thread+0x178/0x5a4
[  625.991766] [<800a61a0>] kthread+0xec/0x114
[  625.996004] [<800620b8>] ret_from_kernel_thread+0x14/0x1c

This one is the issue. It turns off the LED, forcing me to disable the
hw control I just configured (twice). Unfortunately, the call runs from
a workqueue, so the stack does not show who actually scheduled it.

I only saw set_brightness_delayed being used by a work created in
led_init_core(), called by led_classdev_register_ext. That work might
be scheduled by led_set_brightness(). So, a call to
led_set_brightness() moments before setting netdev trigger might
reproduce the issue I see.

I'll try to get who scheduled the work and the stack from there. It
might already pinpoint the cause. Checking if the work is pending
during netdev trigger activation might also help. I'll also try to
flush (or cancel) the work before activating the new trigger. I just
don't know if I can flush (and that way, blocking) inside
led_set_brightness().

I just bricked my device, so I couldn't continue the tests. I might
need to reserve an extra hour to fix that in the next few days.

Regards,

Luiz
Luiz Angelo Daros de Luca Jan. 13, 2024, 2:24 p.m. UTC | #7
> I'll try to get who scheduled the work and the stack from there. It
> might already pinpoint the cause. Checking if the work is pending
> during netdev trigger activation might also help. I'll also try to
> flush (or cancel) the work before activating the new trigger. I just
> don't know if I can flush (and that way, blocking) inside
> led_set_brightness().
>

Hi Christian,

I got it. It was actually from the netdev trigger. During activation we have:

        /* Check if hw control is active by default on the LED.
         * Init already enabled mode in hw control.
         */
        if (supports_hw_control(led_cdev)) {
                dev = led_cdev->hw_control_get_device(led_cdev);
                if (dev) {
                        const char *name = dev_name(dev);

                        set_device_name(trigger_data, name, strlen(name));
                        trigger_data->hw_control = true;

                        rc = led_cdev->hw_control_get(led_cdev, &mode);
                        if (!rc)
                                trigger_data->mode = mode;
                }
        }

set_device_name() calls set_baseline_state() which, at this point,
starts monitoring the device under sw control, because
trigger_data->hw_control is only set afterwards. set_baseline_state()
calls led_set_brightness() in most code paths (all of them when
trigger_data->hw_control is false). With the link down (and in other
situations), it calls led_set_brightness(led_cdev, LED_OFF). If that
led_set_brightness() takes some time to be processed, it runs after hw
control has been configured, undoing that configuration.
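
The ordering problem can be reproduced in a small user-space model. This
is only a sketch under stated assumptions: all sim_* names are
hypothetical, the brightness update is assumed to be deferred to a worker
that runs last (as on my slow device), and a manual brightness write is
assumed to switch the LED out of hw control, as my driver does. The second
entry point models calling set_device_name() only after hw_control is set.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of one LED; this is not struct led_classdev. */
struct sim_led {
	bool hw_control;    /* what the trigger believes */
	bool hw_offloaded;  /* what the modelled hardware is doing */
	bool work_pending;  /* a deferred brightness update is queued */
};

/* Models led_set_brightness() with a blocking driver: only queues work. */
static void sim_set_brightness_off(struct sim_led *led)
{
	led->work_pending = true;
}

/* Models the driver's hw_control_set(): the hardware drives the LED. */
static void sim_hw_control_set(struct sim_led *led)
{
	assert(led->hw_control);
	led->hw_offloaded = true;
}

/* Models set_baseline_state() with the link down. */
static void sim_set_baseline_state(struct sim_led *led)
{
	if (led->hw_control)
		sim_hw_control_set(led);
	else
		sim_set_brightness_off(led);
}

/* Models the delayed worker: a manual write disables hw control. */
static void sim_run_pending_work(struct sim_led *led)
{
	if (!led->work_pending)
		return;
	led->work_pending = false;
	led->hw_offloaded = false;
}

/* Original order: set_device_name() runs before hw_control is set. */
static bool sim_activate_original(void)
{
	struct sim_led led = { 0 };

	sim_set_baseline_state(&led);  /* queues LED_OFF under sw control */
	led.hw_control = true;
	sim_set_baseline_state(&led);  /* notifier replay programs hw control */
	sim_run_pending_work(&led);    /* late worker undoes it */
	return led.hw_offloaded;
}

/* Reordered: hw_control is set before set_device_name() runs. */
static bool sim_activate_reordered(void)
{
	struct sim_led led = { 0 };

	led.hw_control = true;
	sim_set_baseline_state(&led);  /* programs hw control directly */
	sim_set_baseline_state(&led);  /* notifier replay, same result */
	sim_run_pending_work(&led);    /* nothing was queued */
	return led.hw_offloaded;
}
```

In this model the original order ends with hw control disabled, while the
reordered variant keeps it enabled because no LED_OFF work is ever queued.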

Is there any good reason to call set_device_name before the led mode
and hw_control are defined? Will this break anything?

diff --git a/drivers/leds/trigger/ledtrig-netdev.c b/drivers/leds/trigger/ledtrig-netdev.c
index d76214fa9ad8..6f72d55c187a 100644
--- a/drivers/leds/trigger/ledtrig-netdev.c
+++ b/drivers/leds/trigger/ledtrig-netdev.c
@@ -572,12 +572,13 @@ static int netdev_trig_activate(struct led_classdev *led_cdev)
               if (dev) {
                       const char *name = dev_name(dev);

-                       set_device_name(trigger_data, name, strlen(name));
                       trigger_data->hw_control = true;

                       rc = led_cdev->hw_control_get(led_cdev, &mode);
                       if (!rc)
                               trigger_data->mode = mode;
+
+                       set_device_name(trigger_data, name, strlen(name));
               }
       }

With this patch, it will not undo the trigger setting in hardware
anymore. However, it now calls the hw_control_set 3 times during
activation:

1) set_device_name
2) register_netdevice_notifier on NETDEV_REGISTER
3) register_netdevice_notifier on NETDEV_UP

Anyway, calling it multiple times doesn't break anything.

Regards,

Luiz
Heiner Kallweit Jan. 13, 2024, 2:31 p.m. UTC | #8
On 13.01.2024 15:24, Luiz Angelo Daros de Luca wrote:
>> I'll try to get who scheduled the work and the stack from there. It
>> might already pinpoint the cause. Checking if the work is pending
>> during netdev trigger activation might also help. I'll also try to
>> flush (or cancel) the work before activating the new trigger. I just
>> don't know if I can flush (and that way, blocking) inside
>> led_set_brightness().
>>
> 
> Hi Christian,
> 
> I got it. It was actually from the netdev trigger. During activation we have:
> 
>         /* Check if hw control is active by default on the LED.
>          * Init already enabled mode in hw control.
>          */
>         if (supports_hw_control(led_cdev)) {
>                 dev = led_cdev->hw_control_get_device(led_cdev);
>                 if (dev) {
>                         const char *name = dev_name(dev);
> 
>                         set_device_name(trigger_data, name, strlen(name));
>                         trigger_data->hw_control = true;
> 
>                         rc = led_cdev->hw_control_get(led_cdev, &mode);
>                         if (!rc)
>                                 trigger_data->mode = mode;
>                 }
>         }
> 
> The set_device_name calls set_baseline_state() that, at this point,
> will start to monitor the device using sw control
> (trigger_data->hw_control is only set afterwards). In
> set_baseline_state(), it will call led_set_brightness in most
> codepaths (all of them if trigger_data->hw_control is false). With
> link down (and other situations), it will call
> led_set_brightness(led_cdev, LED_OFF). If that led_set_brightness
> takes some time to be processed, it will happen after the hw control
> was configured, undoing what it previously just did.
> 
> Is there any good reason to call set_device_name before the led mode
> and hw_control are defined? Will this break anything?
> 
> diff --git a/drivers/leds/trigger/ledtrig-netdev.c b/drivers/leds/trigger/ledtrig-netdev.c
> index d76214fa9ad8..6f72d55c187a 100644
> --- a/drivers/leds/trigger/ledtrig-netdev.c
> +++ b/drivers/leds/trigger/ledtrig-netdev.c
> @@ -572,12 +572,13 @@ static int netdev_trig_activate(struct led_classdev *led_cdev)
>                if (dev) {
>                        const char *name = dev_name(dev);
> 
> -                       set_device_name(trigger_data, name, strlen(name));
>                        trigger_data->hw_control = true;
> 
>                        rc = led_cdev->hw_control_get(led_cdev, &mode);
>                        if (!rc)
>                                trigger_data->mode = mode;
> +
> +                       set_device_name(trigger_data, name, strlen(name));
>                }
>        }
> 
> With this patch, it will not undo the trigger setting in hardware
> anymore. However, it now calls the hw_control_set 3 times during
> activation:
> 
This is addressed by the following patch, it should show up in linux-next
after the merge window.

https://git.kernel.org/pub/scm/linux/kernel/git/lee/leds.git/commit/?h=for-leds-next-next&id=5df2b4ed10a4ea636bb5ace99712a7d0c6226a55

> 1) set_device_name
> 2) register_netdevice_notifier on NETDEV_REGISTER
> 3) register_netdevice_notifier on NETDEV_UP
> 
> Anyway, calling it multiple times doesn't break anything.
> 
> Regards,
> 
> Luiz
>
Luiz Angelo Daros de Luca Jan. 13, 2024, 2:41 p.m. UTC | #9
> This is addressed by the following patch, it should show up in linux-next
> after the merge window.
>
> https://git.kernel.org/pub/scm/linux/kernel/git/lee/leds.git/commit/?h=for-leds-next-next&id=5df2b4ed10a4ea636bb5ace99712a7d0c6226a55

Hello Heiner,

Yes, exactly that. Thanks. I wish it had appeared some days ago. :-)

Regards,

Luiz