Message ID | 20240106184651.3665-1-luizluca@gmail.com (mailing list archive) |
---|---|
Headers | show |
Series | net: dsa: realtek: fix LED support for rtl8366rb | expand |
> The rtl8366rb switch family has 4 LED groups, with one LED from each > group for each of its 6 ports. LEDs in this family can be controlled > manually using a bitmap or triggered by hardware. It's important to note > that hardware triggers are configured at the LED group level, meaning > all LEDs in the same group share the same hardware triggers settings. > > The first part of this series involves dropping most of the existing > code, as, except for disabling the LEDs, it was not working as expected. > If not disabled, the LEDs will retain their default settings after a > switch reset, which may be sufficient for many devices. > > The second part introduces the LED driver to control the switch LEDs > from sysfs or device-tree. This driver still allows the LEDs to retain > their default settings, but it will shift to the software-based OS LED > triggers if any configuration is changed. Subsequently, the LEDs will > operate as normal LEDs until the switch undergoes another reset. > > Netdev LED trigger supports offloading to hardware triggers. > Unfortunately, this isn't possible with the current LED API for this > switch family. When the hardware trigger is enabled, it applies to all > LEDs in the LED group while the LED API decides to offload based on only > the state of a single LED. To avoid inconsistency between LEDs, > offloading would need to check if all LEDs in the group share the same > compatible settings and atomically enable offload for all LEDs. Hi Christian, I tried to implement something close to your work with qca8k and LED hw control. However, I couldn't find a solution that would work with the existing API. The HW led configuration in realtek switches is shared with all LEDs in a group. Before activating the hw control, all LEDs in the same group must share the same netdev trigger config, use the correct device and also use a compatible netdev trigger settings. In order to check that, I would need to expose some internal netdev trigger info that is only available through sysfs (and I believe sysfs is not suitable to be used from the kernel). Even if I got all LEDs with the correct settings, I would need to atomicly switch all LEDs to use the hw control or, at least, I would need to stop all update jobs because if the OS changes a LED brightness, it might be interpreted as the OS disabling the hw control: /* ... * Deactivate hardware blink control by setting brightness to LED_OFF via * the brightness_set() callback. * ... */ int (*hw_control_set)(struct led_classdev *led_cdev, unsigned long flags); Do you have any idea how to implement it? BTW, during my tests with a single LED, ignoring the LED group situation, I noticed that the OS was sending a brightness_set(LED_OFF) after I changed the trigger to netdev, a moment after hw_control_set was called. It doesn't make sense to enable hw control just to disable it afterwards. The call came from set_brightness_delayed(). Maybe it is because my test device is pretty slow and the previous trigger event always gets queued. Touching any settings after that worked as expected without the spurious brightness_set(LED_OFF). Did you see something like this? Regards, Luiz
Hi Luiz, On Sat, Jan 6, 2024 at 7:47 PM Luiz Angelo Daros de Luca <luizluca@gmail.com> wrote: > The rtl8366rb switch family has 4 LED groups, with one LED from each > group for each of its 6 ports. LEDs in this family can be controlled > manually using a bitmap or triggered by hardware. It's important to note > that hardware triggers are configured at the LED group level, meaning > all LEDs in the same group share the same hardware triggers settings. > > The first part of this series involves dropping most of the existing > code, as, except for disabling the LEDs, it was not working as expected. > If not disabled, the LEDs will retain their default settings after a > switch reset, which may be sufficient for many devices. > > The second part introduces the LED driver to control the switch LEDs > from sysfs or device-tree. This driver still allows the LEDs to retain > their default settings, but it will shift to the software-based OS LED > triggers if any configuration is changed. Subsequently, the LEDs will > operate as normal LEDs until the switch undergoes another reset. > > Netdev LED trigger supports offloading to hardware triggers. > Unfortunately, this isn't possible with the current LED API for this > switch family. When the hardware trigger is enabled, it applies to all > LEDs in the LED group while the LED API decides to offload based on only > the state of a single LED. To avoid inconsistency between LEDs, > offloading would need to check if all LEDs in the group share the same > compatible settings and atomically enable offload for all LEDs. I think these patches look great, and the driver certainly look better after these changes than before them so if you resend without RFC, please feel free to add my: Reviewed-by: Linus Walleij <linus.walleij@linaro.org> HW triggers may be hard to implement but plain software control is not bad either so this is already way better than what we had before. HW control can always be discussed and added later. Yours, Linus Walleij
On Sat, Jan 06, 2024 at 04:47:10PM -0300, Luiz Angelo Daros de Luca wrote: > > The rtl8366rb switch family has 4 LED groups, with one LED from each > > group for each of its 6 ports. LEDs in this family can be controlled > > manually using a bitmap or triggered by hardware. It's important to note > > that hardware triggers are configured at the LED group level, meaning > > all LEDs in the same group share the same hardware triggers settings. > > > > The first part of this series involves dropping most of the existing > > code, as, except for disabling the LEDs, it was not working as expected. > > If not disabled, the LEDs will retain their default settings after a > > switch reset, which may be sufficient for many devices. > > > > The second part introduces the LED driver to control the switch LEDs > > from sysfs or device-tree. This driver still allows the LEDs to retain > > their default settings, but it will shift to the software-based OS LED > > triggers if any configuration is changed. Subsequently, the LEDs will > > operate as normal LEDs until the switch undergoes another reset. > > > > Netdev LED trigger supports offloading to hardware triggers. > > Unfortunately, this isn't possible with the current LED API for this > > switch family. When the hardware trigger is enabled, it applies to all > > LEDs in the LED group while the LED API decides to offload based on only > > the state of a single LED. To avoid inconsistency between LEDs, > > offloading would need to check if all LEDs in the group share the same > > compatible settings and atomically enable offload for all LEDs. > > Hi Christian, > > I tried to implement something close to your work with qca8k and LED > hw control. However, I couldn't find a solution that would work with > the existing API. The HW led configuration in realtek switches is > shared with all LEDs in a group. Before activating the hw control, all > LEDs in the same group must share the same netdev trigger config, use > the correct device and also use a compatible netdev trigger settings. > In order to check that, I would need to expose some internal netdev > trigger info that is only available through sysfs (and I believe sysfs > is not suitable to be used from the kernel). Even if I got all LEDs > with the correct settings, I would need to atomicly switch all LEDs to > use the hw control or, at least, I would need to stop all update jobs > because if the OS changes a LED brightness, it might be interpreted as > the OS disabling the hw control: > Saddly we still don't have the concept of LED groups, but from what I notice 99% of the time switch have limitation of HW control but single LED can still be controlled separately. With this limitation you can use the is_supported function and some priv struct to enforce and reject unsupported configuration. netdev trigger will then fallback to software in this case. (I assume on real world scenario to have all the LED in the group to be set to the common rule set resulting in is_supported never rejecting it) Also consider this situation, it's the first LED touched that enables HW control that drive everything. LED configuration are not enabled all at once. You can totally introduce a priv struct that cache the current modes and on the other LEDs make sure the requested mode match the cache one. And I guess this limitation should be printed and documented in DT. > /* > ... > * Deactivate hardware blink control by setting brightness to LED_OFF via > * the brightness_set() callback. > * > ... > */ > int (*hw_control_set)(struct led_classdev *led_cdev, > unsigned long flags); > > Do you have any idea how to implement it? > > BTW, during my tests with a single LED, ignoring the LED group > situation, I noticed that the OS was sending a brightness_set(LED_OFF) > after I changed the trigger to netdev, a moment after hw_control_set > was called. It doesn't make sense to enable hw control just to disable > it afterwards. The call came from set_brightness_delayed(). Maybe it > is because my test device is pretty slow and the previous trigger > event always gets queued. Touching any settings after that worked as > expected without the spurious brightness_set(LED_OFF). Did you see > something like this? > Consider that brightness_set is called whatever a trigger is changed, the logic is in the generic LED handling. Setting OFF and then enabling hw control should not change a thing. In other driver tho I notice an extra measure is needed to reset any HW control rule already applied by default.
> > Hi Christian, > > > > I tried to implement something close to your work with qca8k and LED > > hw control. However, I couldn't find a solution that would work with > > the existing API. The HW led configuration in realtek switches is > > shared with all LEDs in a group. Before activating the hw control, all > > LEDs in the same group must share the same netdev trigger config, use > > the correct device and also use a compatible netdev trigger settings. > > In order to check that, I would need to expose some internal netdev > > trigger info that is only available through sysfs (and I believe sysfs > > is not suitable to be used from the kernel). Even if I got all LEDs > > with the correct settings, I would need to atomicly switch all LEDs to > > use the hw control or, at least, I would need to stop all update jobs > > because if the OS changes a LED brightness, it might be interpreted as > > the OS disabling the hw control: > > > > Saddly we still don't have the concept of LED groups, but from what I > notice 99% of the time switch have limitation of HW control but single > LED can still be controlled separately. Individually, I can only turn them on/off. That is enough for software control but not for hardware control. When you set a LED group to blink on link activity, all LEDs will be affected. > With this limitation you can use the is_supported function and some priv > struct to enforce and reject unsupported configuration. > netdev trigger will then fallback to software in this case. (I assume on > real world scenario to have all the LED in the group to be set to the > common rule set resulting in is_supported never rejecting it) Maybe I wasn't clear enough about what the HW provides me. I have 4 16-bit registers: REG1: a single blink rate used by all LEDs in all groups REG2: configures the trigger for each group, 4-bit each, with one special 4-bit value being "fixed", equivalent to "none" in Linux LED trigger REG3: bitmap to manually control LEDs in group 0 and 1 only when their group trigger is configured as fixed. REG3: bitmap to manually control LEDs in group 2 and 3 only when their group trigger is configured as fixed. And that's it. I can keep track of the netdev trigger form calls to "is_supported function". I can also check if all LEDs are still using the netdev trigger. However, I cannot detect if the user changed the device to something else not related to the corresponding port as the netdev trigger will not check the compatibility if the device does not match. I would still need to expose at least some of the netdev trigger internal data. > Also consider this situation, it's the first LED touched that enables HW > control that drive everything. LED configuration are not enabled all at > once. You can totally introduce a priv struct that cache the current > modes and on the other LEDs make sure the requested mode match the cache > one. Considering that I can externally check that all LEDs have a netdev trigger settings compatible with the HW control, once the last LED is configured, I could return true for the hw_control_is_supported. When hw_control_set is called, I could configure the hardware accordingly, which would affect all LEDs in that group. However, the OS will still use the software control for the other LEDs in that same group. That way, once a netdev event turns off one LED, that message is the same clue the LED driver receives to disable the hardware control. It will undo the hardware change I just made. I could use led_brightness_set(OFF) on those other LEDs during hw_control_set to disable their software controlled triggers (actually changing the trigger to "none"), but it might be a race condition of who stops the other. And even then, the other LEDs will keep an inconsistent configuration state, with "none" as their trigger. I need: 1) expose the required info or allow an external caller to test a LED configuration for compatibility (avoiding recursion). 2) something from hw_control_set() that stops the software triggers in other LEDs without destroying their configuration. 3) something that could enable hw_control on those other LEDs > And I guess this limitation should be printed and documented in DT. > > > /* > > ... > > * Deactivate hardware blink control by setting brightness to LED_OFF via > > * the brightness_set() callback. > > * > > ... > > */ > > int (*hw_control_set)(struct led_classdev *led_cdev, > > unsigned long flags); > > > > Do you have any idea how to implement it? > > > > BTW, during my tests with a single LED, ignoring the LED group > > situation, I noticed that the OS was sending a brightness_set(LED_OFF) > > after I changed the trigger to netdev, a moment after hw_control_set > > was called. It doesn't make sense to enable hw control just to disable > > it afterwards. The call came from set_brightness_delayed(). Maybe it > > is because my test device is pretty slow and the previous trigger > > event always gets queued. Touching any settings after that worked as > > expected without the spurious brightness_set(LED_OFF). Did you see > > something like this? > > > > Consider that brightness_set is called whatever a trigger is changed, > the logic is in the generic LED handling. Setting OFF and then enabling > hw control should not change a thing. In other driver tho I notice an > extra measure is needed to reset any HW control rule already applied by > default. It would be OK to call brightness_set(LED_OFF) if that is guaranteed to happen before hw_control_set(). The problem is that the brightness_set(LED_OFF) happens *after* hw_control_set() was called. It looks like a race condition. Regards, Luiz
On Mon, Jan 08, 2024 at 02:47:22AM -0300, Luiz Angelo Daros de Luca wrote: > > > Hi Christian, > > > > > > I tried to implement something close to your work with qca8k and LED > > > hw control. However, I couldn't find a solution that would work with > > > the existing API. The HW led configuration in realtek switches is > > > shared with all LEDs in a group. Before activating the hw control, all > > > LEDs in the same group must share the same netdev trigger config, use > > > the correct device and also use a compatible netdev trigger settings. > > > In order to check that, I would need to expose some internal netdev > > > trigger info that is only available through sysfs (and I believe sysfs > > > is not suitable to be used from the kernel). Even if I got all LEDs > > > with the correct settings, I would need to atomicly switch all LEDs to > > > use the hw control or, at least, I would need to stop all update jobs > > > because if the OS changes a LED brightness, it might be interpreted as > > > the OS disabling the hw control: > > > > > > > Saddly we still don't have the concept of LED groups, but from what I > > notice 99% of the time switch have limitation of HW control but single > > LED can still be controlled separately. > > Individually, I can only turn them on/off. That is enough for software > control but not for hardware control. When you set a LED group to > blink on link activity, all LEDs will be affected. > Assuming we have the same 2005 datasheet, yes the LED situation is complex for this switch. (If you have something better please link) > > With this limitation you can use the is_supported function and some priv > > struct to enforce and reject unsupported configuration. > > netdev trigger will then fallback to software in this case. (I assume on > > real world scenario to have all the LED in the group to be set to the > > common rule set resulting in is_supported never rejecting it) > > Maybe I wasn't clear enough about what the HW provides me. I have 4 > 16-bit registers: > > REG1: a single blink rate used by all LEDs in all groups > REG2: configures the trigger for each group, 4-bit each, with one > special 4-bit value being "fixed", equivalent to "none" in Linux LED > trigger > REG3: bitmap to manually control LEDs in group 0 and 1 only when their > group trigger is configured as fixed. > REG3: bitmap to manually control LEDs in group 2 and 3 only when their > group trigger is configured as fixed. > > And that's it. > > I can keep track of the netdev trigger form calls to "is_supported > function". I can also check if all LEDs are still using the netdev > trigger. However, I cannot detect if the user changed the device to > something else not related to the corresponding port as the netdev > trigger will not check the compatibility if the device does not match. > I would still need to expose at least some of the netdev trigger > internal data. > We can make some assumption and use refcount tho. For very exotic configuration it will always fallback to software (and make hw control impossible) but for more generic one we can benefit of it. - We can only enable HW control on the LED group. This means that for the group we need to make sure that: 1. We have the correct device set to each LED 2. We have an acceptable mode requesterd. With these 2 prereq, we can correctly enable HW control for the LED group. HW control is enable only IF the device netdev currently set match what hw_control_get_device returns. With this, we can assume for HW control request the correct netdev is always set. We also use refcount to check how many LED are actually ""enabled"". With this count we can understand if we can enable HW control for the LED group or return false from is_supported. And with HW control enabled, we would reject all kind of invalid settings and print a warning to alert the user of the limitation and maybe how to remove it. > > Also consider this situation, it's the first LED touched that enables HW > > control that drive everything. LED configuration are not enabled all at > > once. You can totally introduce a priv struct that cache the current > > modes and on the other LEDs make sure the requested mode match the cache > > one. > > Considering that I can externally check that all LEDs have a netdev > trigger settings compatible with the HW control, once the last LED is > configured, I could return true for the hw_control_is_supported. When > hw_control_set is called, I could configure the hardware accordingly, > which would affect all LEDs in that group. However, the OS will still > use the software control for the other LEDs in that same group. That > way, once a netdev event turns off one LED, that message is the same > clue the LED driver receives to disable the hardware control. It will > undo the hardware change I just made. I could use > led_brightness_set(OFF) on those other LEDs during hw_control_set to > disable their software controlled triggers (actually changing the > trigger to "none"), but it might be a race condition of who stops the > other. And even then, the other LEDs will keep an inconsistent > configuration state, with "none" as their trigger. > > I need: > 1) expose the required info or allow an external caller to test a LED > configuration for compatibility (avoiding recursion). > 2) something from hw_control_set() that stops the software triggers in > other LEDs without destroying their configuration. > 3) something that could enable hw_control on those other LEDs > I think it would be problematic for other LED to do changes. I need to check how LED multicolor work... In a sense they are LED group so maybe in LED core we have a way to group LED and share some info with the others. > > And I guess this limitation should be printed and documented in DT. > > > > > /* > > > ... > > > * Deactivate hardware blink control by setting brightness to LED_OFF via > > > * the brightness_set() callback. > > > * > > > ... > > > */ > > > int (*hw_control_set)(struct led_classdev *led_cdev, > > > unsigned long flags); > > > > > > Do you have any idea how to implement it? > > > > > > BTW, during my tests with a single LED, ignoring the LED group > > > situation, I noticed that the OS was sending a brightness_set(LED_OFF) > > > after I changed the trigger to netdev, a moment after hw_control_set > > > was called. It doesn't make sense to enable hw control just to disable > > > it afterwards. The call came from set_brightness_delayed(). Maybe it > > > is because my test device is pretty slow and the previous trigger > > > event always gets queued. Touching any settings after that worked as > > > expected without the spurious brightness_set(LED_OFF). Did you see > > > something like this? > > > > > > > Consider that brightness_set is called whatever a trigger is changed, > > the logic is in the generic LED handling. Setting OFF and then enabling > > hw control should not change a thing. In other driver tho I notice an > > extra measure is needed to reset any HW control rule already applied by > > default. > > It would be OK to call brightness_set(LED_OFF) if that is guaranteed > to happen before hw_control_set(). The problem is that the > brightness_set(LED_OFF) happens *after* hw_control_set() was called. > It looks like a race condition. > Totally require some further investigation, it seems strange tho that the your system is that slow.
> On Mon, Jan 08, 2024 at 02:47:22AM -0300, Luiz Angelo Daros de Luca wrote: > > > > Hi Christian, > > > > > > > > I tried to implement something close to your work with qca8k and LED > > > > hw control. However, I couldn't find a solution that would work with > > > > the existing API. The HW led configuration in realtek switches is > > > > shared with all LEDs in a group. Before activating the hw control, all > > > > LEDs in the same group must share the same netdev trigger config, use > > > > the correct device and also use a compatible netdev trigger settings. > > > > In order to check that, I would need to expose some internal netdev > > > > trigger info that is only available through sysfs (and I believe sysfs > > > > is not suitable to be used from the kernel). Even if I got all LEDs > > > > with the correct settings, I would need to atomicly switch all LEDs to > > > > use the hw control or, at least, I would need to stop all update jobs > > > > because if the OS changes a LED brightness, it might be interpreted as > > > > the OS disabling the hw control: > > > > > > > > > > Saddly we still don't have the concept of LED groups, but from what I > > > notice 99% of the time switch have limitation of HW control but single > > > LED can still be controlled separately. > > > > Individually, I can only turn them on/off. That is enough for software > > control but not for hardware control. When you set a LED group to > > blink on link activity, all LEDs will be affected. > > > > Assuming we have the same 2005 datasheet, yes the LED situation is > complex for this switch. (If you have something better please link) I wouldn't say complex, but limited. The manual for rtl8365mb (https://cdn.jsdelivr.net/gh/libc0607/Realtek_switch_hacking@files/Realtek_Unmanaged_Switch_ProgrammingGuide.pdf, page 64) shows the vendor API for controlling LEDs. It is similar for both families. You have only 3 functions: int32 rtk_led_blinkRate_set(rtk_led_blink_rate_t blinkRate) int32 rtk_led_groupConfig_set(rtk_led_group_t group, rtk_led_congig_t config) int32 rtk_led_enable_set(rtk_led_group_t group, rtk_portmask_t portmask) rtk_led_blinkRate_set sets the blink rate but it does not mention group or port, so it is global. rtk_led_groupConfig_set defines the HW trigger but only by group. rtk_led_enable_set uses a mask to manually set a LED It closely matches the register behavior. > > > Also consider this situation, it's the first LED touched that enables HW > > > control that drive everything. LED configuration are not enabled all at > > > once. You can totally introduce a priv struct that cache the current > > > modes and on the other LEDs make sure the requested mode match the cache > > > one. > > > > Considering that I can externally check that all LEDs have a netdev > > trigger settings compatible with the HW control, once the last LED is > > configured, I could return true for the hw_control_is_supported. When > > hw_control_set is called, I could configure the hardware accordingly, > > which would affect all LEDs in that group. However, the OS will still > > use the software control for the other LEDs in that same group. That > > way, once a netdev event turns off one LED, that message is the same > > clue the LED driver receives to disable the hardware control. It will > > undo the hardware change I just made. I could use > > led_brightness_set(OFF) on those other LEDs during hw_control_set to > > disable their software controlled triggers (actually changing the > > trigger to "none"), but it might be a race condition of who stops the > > other. And even then, the other LEDs will keep an inconsistent > > configuration state, with "none" as their trigger. > > > > I need: > > 1) expose the required info or allow an external caller to test a LED > > configuration for compatibility (avoiding recursion). > > 2) something from hw_control_set() that stops the software triggers in > > other LEDs without destroying their configuration. > > 3) something that could enable hw_control on those other LEDs > > > > I think it would be problematic for other LED to do changes. I need to > check how LED multicolor work... In a sense they are LED group so maybe > in LED core we have a way to group LED and share some info with the > others. That's the main issue. I can expose the needed info to check if all LEDs agree in a compatible configuration. However, once that happens, I must stop all sw control and enable the hw control. Something like: lock a group of leds for all leds if not devname is correct fallback to sw control if settings are not compatible fallback to sw control if settings is different from other LEDs fallback to sw control for all leds stop sw control work for all leds enable hw control set the group hw trigger unlock the group of leds And I need something that would also work to disable hw control once the first LED changes anything, breaking the compatibility. > > > > BTW, during my tests with a single LED, ignoring the LED group > > > > situation, I noticed that the OS was sending a brightness_set(LED_OFF) > > > > after I changed the trigger to netdev, a moment after hw_control_set > > > > was called. It doesn't make sense to enable hw control just to disable > > > > it afterwards. The call came from set_brightness_delayed(). Maybe it > > > > is because my test device is pretty slow and the previous trigger > > > > event always gets queued. Touching any settings after that worked as > > > > expected without the spurious brightness_set(LED_OFF). Did you see > > > > something like this? > > > > > > > > > > Consider that brightness_set is called whatever a trigger is changed, > > > the logic is in the generic LED handling. Setting OFF and then enabling > > > hw control should not change a thing. In other driver tho I notice an > > > extra measure is needed to reset any HW control rule already applied by > > > default. > > > > It would be OK to call brightness_set(LED_OFF) if that is guaranteed > > to happen before hw_control_set(). The problem is that the > > brightness_set(LED_OFF) happens *after* hw_control_set() was called. > > It looks like a race condition. > > > > Totally require some further investigation, it seems strange tho that > the your system is that slow. I got some stacks. When I change the trigger to netdev, I get 2 calls to set the hw control: [ 625.601449] CPU: 0 PID: 2607 Comm: ash Not tainted 6.1.59 #0 [ 625.607153] Stack : 809b0000 77e70000 00000000 800c1e80 00000431 00000004 00000000 00000000 [ 625.615627] 80c45c74 80980000 80850000 806ffc2c 80e23b88 00000001 80c45c18 f51e58b8 [ 625.624094] 00000000 00000000 806ffc2c 80c45b48 ffffefff 00000000 00000000 ffffffea [ 625.632561] 00000112 80c45b54 00000112 807c99c0 00000001 806ffc2c 809ec080 00000005 [ 625.641029] ffff7fff 80840000 809b0000 77e70000 00000018 8039785c 00000000 80980000 [ 625.649495] ... [ 625.651965] Call Trace: [ 625.654422] [<80066e4c>] show_stack+0x28/0xf0 [ 625.658848] [<8061b800>] dump_stack_lvl+0x38/0x60 [ 625.663599] [<8189b890>] rtl8366rb_cled_hw_control_set+0xdc/0xf8 [rtl8366] [ 625.670578] [<8041dfb0>] netdev_trig_notify+0x114/0x280 [ 625.675867] [<80450d14>] call_netdevice_register_net_notifiers+0x54/0x104 [ 625.682729] [<804542dc>] register_netdevice_notifier+0x98/0x130 [ 625.688702] [<8041dbf8>] netdev_trig_activate+0x160/0x1b0 [ 625.694152] [<8041b948>] led_trigger_set+0xf8/0x254 [ 625.699070] [<8041c2a4>] led_trigger_write+0xd4/0x148 [ 625.704163] [<8026966c>] sysfs_kf_bin_write+0x80/0xbc [ 625.709263] [<80268438>] kernfs_fop_write_iter+0x118/0x244 [ 625.714801] [<801e787c>] vfs_write+0x1fc/0x3c0 [ 625.719301] [<801e7bdc>] ksys_write+0x70/0x124 [ 625.723791] [<8006e1e4>] syscall_common+0x34/0x58 [ 625.761665] CPU: 0 PID: 2607 Comm: ash Not tainted 6.1.59 #0 [ 625.767374] Stack : 809b0000 77e70000 00000000 800c1e80 00000431 00000004 00000000 00000000 [ 625.775848] 80c45c74 80980000 80850000 806ffc2c 80e23b88 00000001 80c45c18 f51e58b8 [ 625.784315] 00000000 00000000 806ffc2c 80c45b48 ffffefff 00000000 00000000 ffffffea [ 625.792782] 00000130 80c45b54 00000130 807c99c0 00000001 806ffc2c 809ec080 00000001 [ 625.801249] ffff7fff 80840000 809b0000 77e70000 00000018 8039785c 00000000 80980000 [ 625.809716] ... [ 625.812186] Call Trace: [ 625.814643] [<80066e4c>] show_stack+0x28/0xf0 [ 625.819068] [<8061b800>] dump_stack_lvl+0x38/0x60 [ 625.823820] [<8189b890>] rtl8366rb_cled_hw_control_set+0xdc/0xf8 [rtl8366] [ 625.830800] [<8041dfb0>] netdev_trig_notify+0x114/0x280 [ 625.836088] [<80450d94>] call_netdevice_register_net_notifiers+0xd4/0x104 [ 625.842950] [<804542dc>] register_netdevice_notifier+0x98/0x130 [ 625.848923] [<8041dbf8>] netdev_trig_activate+0x160/0x1b0 [ 625.854374] [<8041b948>] led_trigger_set+0xf8/0x254 [ 625.859300] [<8041c2a4>] led_trigger_write+0xd4/0x148 [ 625.864401] [<8026966c>] sysfs_kf_bin_write+0x80/0xbc [ 625.869502] [<80268438>] kernfs_fop_write_iter+0x118/0x244 [ 625.875040] [<801e787c>] vfs_write+0x1fc/0x3c0 [ 625.879539] [<801e7bdc>] ksys_write+0x70/0x124 [ 625.884030] [<8006e1e4>] syscall_common+0x34/0x58 That is not really a problem but I my guess is that it is calling for both NETDEV_REGISTER and NETDEV_UP as both eventually call set_baseline_state(). Shouldn't we avoid one of them? But after that, I get: [ 625.900712] CPU: 0 PID: 2626 Comm: kworker/0:2 Not tainted 6.1.59 #0 [ 625.907154] Workqueue: events set_brightness_delayed [ 625.912178] Stack : 81000205 81a4ab40 817f9da4 800c1e80 8199e4e0 806ffc2c 807c0000 81a4ab00 [ 625.920652] 81000200 00000000 80c08c5c 800c1f7c 80e21cc8 00000001 817f9d60 f37f8d2c [ 625.929119] 00000000 00000000 806ffc2c 817f9c78 ffffefff 00000000 00000000 ffffffea [ 625.937587] 00000148 817f9c84 00000148 807c99c0 00000001 806ffc2c 00000000 81000200 [ 625.946054] 00000000 80c08c5c 81000205 81a4ab40 00000018 8039785c 00000000 80980000 [ 625.954521] ... [ 625.956990] Call Trace: [ 625.959448] [<80066e4c>] show_stack+0x28/0xf0 [ 625.963873] [<8061b800>] dump_stack_lvl+0x38/0x60 [ 625.968624] [<81899ec8>] rtl8366rb_cled_brightness_set_blocking+0x68/0x88 [rtl8366] [ 625.976389] [<8041a018>] set_brightness_delayed+0x84/0xec [ 625.981833] [<8009f724>] process_one_work+0x254/0x484 [ 625.986934] [<8009fed0>] worker_thread+0x178/0x5a4 [ 625.991766] [<800a61a0>] kthread+0xec/0x114 [ 625.996004] [<800620b8>] ret_from_kernel_thread+0x14/0x1c This one is the issue. It turns off the LED, forcing me to disable the hw control I just configured (twice). Unfortunately, a scheduled work breaks the stack, not showing who actually requested it. I only saw set_brightness_delayed being used by a work created in led_init_core(), called by led_classdev_register_ext. That work might be scheduled by led_set_brightness(). So, a call to led_set_brightness() moments before setting netdev trigger might reproduce the issue I see. I'll try to get who scheduled the work and the stack from there. It might already pinpoint the cause. Checking if the work is pending during netdev trigger activation might also help. I'll also try to flush (or cancel) the work before activating the new trigger. I just don't know if I can flush (and that way, blocking) inside led_set_brightness(). I just bricked my device and I couldn't continue the tests. I might need to reserve an extra hour to fix that in the next days. Regards, Luiz
> I'll try to get who scheduled the work and the stack from there. It > might already pinpoint the cause. Checking if the work is pending > during netdev trigger activation might also help. I'll also try to > flush (or cancel) the work before activating the new trigger. I just > don't know if I can flush (and that way, blocking) inside > led_set_brightness(). > Hi Christian, I got it. It was actually from the netdev trigger. During activation we have: /* Check if hw control is active by default on the LED. * Init already enabled mode in hw control. */ if (supports_hw_control(led_cdev)) { dev = led_cdev->hw_control_get_device(led_cdev); if (dev) { const char *name = dev_name(dev); set_device_name(trigger_data, name, strlen(name)); trigger_data->hw_control = true; rc = led_cdev->hw_control_get(led_cdev, &mode); if (!rc) trigger_data->mode = mode; } } The set_device_name calls set_baseline_state() that, at this point, will start to monitor the device using sw control (trigger_data->hw_control is only set afterwards). In set_baseline_state(), it will call led_set_brightness in most codepaths (all of them if trigger_data->hw_control is false). With link down (and other situations), it will call led_set_brightness(led_cdev, LED_OFF). If that led_set_brightness takes some time to be processed, it will happen after the hw control was configured, undoing what it previously just did. Is there any good reason to call set_device_name before the led mode and hw_control are defined? Will this break anything? diff --git a/drivers/leds/trigger/ledtrig-netdev.c b/drivers/leds/trigger/ledtrig-netdev.c index d76214fa9ad8..6f72d55c187a 100644 --- a/drivers/leds/trigger/ledtrig-netdev.c +++ b/drivers/leds/trigger/ledtrig-netdev.c @@ -572,12 +572,13 @@ static int netdev_trig_activate(struct led_classdev *led_cdev) if (dev) { const char *name = dev_name(dev); - set_device_name(trigger_data, name, strlen(name)); trigger_data->hw_control = true; rc = led_cdev->hw_control_get(led_cdev, &mode); if (!rc) trigger_data->mode = mode; + + set_device_name(trigger_data, name, strlen(name)); } } With this patch, it will not undo the trigger setting in hardware anymore. However, it now calls the hw_control_set 3 times during activation: 1) set_device_name 2) register_netdevice_notifier on NETDEV_REGISTER 3) register_netdevice_notifier on NETDEV_UP Anyway, calling it multiple times doesn't break anything. Regards, Luiz
On 13.01.2024 15:24, Luiz Angelo Daros de Luca wrote: >> I'll try to get who scheduled the work and the stack from there. It >> might already pinpoint the cause. Checking if the work is pending >> during netdev trigger activation might also help. I'll also try to >> flush (or cancel) the work before activating the new trigger. I just >> don't know if I can flush (and that way, blocking) inside >> led_set_brightness(). >> > > Hi Christian, > > I got it. It was actually from the netdev trigger. During activation we have: > > /* Check if hw control is active by default on the LED. > * Init already enabled mode in hw control. > */ > if (supports_hw_control(led_cdev)) { > dev = led_cdev->hw_control_get_device(led_cdev); > if (dev) { > const char *name = dev_name(dev); > > set_device_name(trigger_data, name, strlen(name)); > trigger_data->hw_control = true; > > rc = led_cdev->hw_control_get(led_cdev, &mode); > if (!rc) > trigger_data->mode = mode; > } > } > > The set_device_name calls set_baseline_state() that, at this point, > will start to monitor the device using sw control > (trigger_data->hw_control is only set afterwards). In > set_baseline_state(), it will call led_set_brightness in most > codepaths (all of them if trigger_data->hw_control is false). With > link down (and other situations), it will call > led_set_brightness(led_cdev, LED_OFF). If that led_set_brightness > takes some time to be processed, it will happen after the hw control > was configured, undoing what it previously just did. > > Is there any good reason to call set_device_name before the led mode > and hw_control are defined? Will this break anything? > > diff --git a/drivers/leds/trigger/ledtrig-netdev.c > b/drivers/leds/trigger/ledtrig-netdev.c > index d76214fa9ad8..6f72d55c187a 100644 > --- a/drivers/leds/trigger/ledtrig-netdev.c > +++ b/drivers/leds/trigger/ledtrig-netdev.c > @@ -572,12 +572,13 @@ static int netdev_trig_activate(struct > led_classdev *led_cdev) > if (dev) { > const char *name = dev_name(dev); > > - set_device_name(trigger_data, name, strlen(name)); > trigger_data->hw_control = true; > > rc = led_cdev->hw_control_get(led_cdev, &mode); > if (!rc) > trigger_data->mode = mode; > + > + set_device_name(trigger_data, name, strlen(name)); > } > } > > With this patch, it will not undo the trigger setting in hardware > anymore. However, it now calls the hw_control_set 3 times during > activation: > This is addressed by the following patch, it should show up in linux-next after the merge window. https://git.kernel.org/pub/scm/linux/kernel/git/lee/leds.git/commit/?h=for-leds-next-next&id=5df2b4ed10a4ea636bb5ace99712a7d0c6226a55 > 1) set_device_name > 2) register_netdevice_notifier on NETDEV_REGISTER > 3) register_netdevice_notifier on NETDEV_UP > > Anyway, calling it multiple times doesn't break anything. > > Regards, > > Luiz >
> This is addressed by the following patch, it should show up in linux-next > after the merge window. > > https://git.kernel.org/pub/scm/linux/kernel/git/lee/leds.git/commit/?h=for-leds-next-next&id=5df2b4ed10a4ea636bb5ace99712a7d0c6226a55 Hello Heiner, Yes, exactly that. Thanks. I wish it had appeared some days ago. :-) Regards, Luiz