Message ID | 20211008084348.7306-1-arun.ramadoss@microchip.com (mailing list archive) |
---|---|
State | Superseded |
Delegated to: | Netdev Maintainers |
Headers | show |
Series | [net] net: dsa: microchip: Added the condition for scheduling ksz_mib_read_work | expand |
Context | Check | Description |
---|---|---|
netdev/cover_letter | success | Single patches do not need cover letters |
netdev/fixes_present | success | Fixes tag present in non-next series |
netdev/patch_count | success | Link |
netdev/tree_selection | success | Clearly marked for net |
netdev/subject_prefix | success | Link |
netdev/cc_maintainers | success | CCed 10 of 10 maintainers |
netdev/source_inline | success | Was 0 now: 0 |
netdev/verify_signedoff | success | Signed-off-by tag matches author and committer |
netdev/module_param | success | Was 0 now: 0 |
netdev/build_32bit | success | Errors and warnings before: 0 this patch: 0 |
netdev/kdoc | success | Errors and warnings before: 0 this patch: 0 |
netdev/verify_fixes | success | Fixes tag looks correct |
netdev/checkpatch | success | total: 0 errors, 0 warnings, 0 checks, 20 lines checked |
netdev/build_allmodconfig_warn | success | Errors and warnings before: 0 this patch: 0 |
netdev/header_inline | success | No static functions without inline keyword in header files |
On Fri, Oct 08, 2021 at 02:13:48PM +0530, Arun Ramadoss wrote: > When the ksz module is installed and removed using rmmod, kernel crashes > with null pointer dereferrence error. During rmmod, ksz_switch_remove > function tries to cancel the mib_read_workqueue using > cancel_delayed_work_sync routine. > > At the end of mib_read_workqueue execution, it again reschedule the > workqueue unconditionally. Due to which queue rescheduled after > mib_interval, during this execution it tries to access dp->slave. But > the slave is unregistered in the ksz_switch_remove function. Hence > kernel crashes. Something not correct here. https://www.kernel.org/doc/html/latest/core-api/workqueue.html?highlight=delayed_work#c.cancel_delayed_work_sync This is cancel_work_sync() for delayed works. and https://www.kernel.org/doc/html/latest/core-api/workqueue.html?highlight=delayed_work#c.cancel_work_sync This function can be used even if the work re-queues itself or migrates to another workqueue. Maybe the real problem is a missing call to destroy_worker()? Andrew
On Fri, 8 Oct 2021 15:58:16 +0200 Andrew Lunn wrote: > On Fri, Oct 08, 2021 at 02:13:48PM +0530, Arun Ramadoss wrote: > > When the ksz module is installed and removed using rmmod, kernel crashes > > with null pointer dereferrence error. During rmmod, ksz_switch_remove > > function tries to cancel the mib_read_workqueue using > > cancel_delayed_work_sync routine. > > > > At the end of mib_read_workqueue execution, it again reschedule the > > workqueue unconditionally. Due to which queue rescheduled after > > mib_interval, during this execution it tries to access dp->slave. But > > the slave is unregistered in the ksz_switch_remove function. Hence > > kernel crashes. > > Something not correct here. > > https://www.kernel.org/doc/html/latest/core-api/workqueue.html?highlight=delayed_work#c.cancel_delayed_work_sync > > This is cancel_work_sync() for delayed works. > > and > > https://www.kernel.org/doc/html/latest/core-api/workqueue.html?highlight=delayed_work#c.cancel_work_sync > > This function can be used even if the work re-queues itself or > migrates to another workqueue. > > Maybe the real problem is a missing call to destroy_worker()? Also the cancel_delayed_work_sync() is suspiciously early in the remove flow. There is a schedule_work call in ksz_mac_link_down() which may schedule the work back in. That'd also explain why the patch helps since ksz_mac_link_down() only schedules if (dev->mib_read_interval).
On Fri, 2021-10-08 at 11:34 -0700, Jakub Kicinski wrote: > EXTERNAL EMAIL: Do not click links or open attachments unless you > know the content is safe > > On Fri, 8 Oct 2021 15:58:16 +0200 Andrew Lunn wrote: > > On Fri, Oct 08, 2021 at 02:13:48PM +0530, Arun Ramadoss wrote: > > > When the ksz module is installed and removed using rmmod, kernel > > > crashes > > > with null pointer dereferrence error. During rmmod, > > > ksz_switch_remove > > > function tries to cancel the mib_read_workqueue using > > > cancel_delayed_work_sync routine. > > > > > > At the end of mib_read_workqueue execution, it again reschedule > > > the > > > workqueue unconditionally. Due to which queue rescheduled after > > > mib_interval, during this execution it tries to access dp->slave. > > > But > > > the slave is unregistered in the ksz_switch_remove function. > > > Hence > > > kernel crashes. > > > > Something not correct here. > > > > https://www.kernel.org/doc/html/latest/core-api/workqueue.html?highlight=delayed_work#c.cancel_delayed_work_sync > > > > This is cancel_work_sync() for delayed works. > > > > and > > > > https://www.kernel.org/doc/html/latest/core-api/workqueue.html?highlight=delayed_work#c.cancel_work_sync > > > > This function can be used even if the work re-queues itself or > > migrates to another workqueue. > > > > Maybe the real problem is a missing call to destroy_worker()? > > Also the cancel_delayed_work_sync() is suspiciously early in the > remove > flow. There is a schedule_work call in ksz_mac_link_down() which may > schedule the work back in. That'd also explain why the patch helps > since > ksz_mac_link_down() only schedules if (dev->mib_read_interval). In this patch, I did two things. Added the if condition for rescheduling the queue and other is resetted the mib_read_interval to zero. As per the cancel_delayed_queue_sync() functionality, Now I tried rmod after removing the if condition for resheduling the queue,kernel didn't crash. So, concluded that it is not rearm in ksz_mib_read_work is causing the problem but it is due to scheduling in the ksz_mac_link_down function. This function is called due to the dsa_unregister_switch. Due to resetting of the mib_read_interval to zero in switch_remove, the queue is not scheduled in mac_link_down, so kernel didn't crash. And also, as per suggestion on cancel_delayed_work_sync() is suspiciously placed in switch_remove. I undo this patch, and tried just by moving the canceling of delayed_work after the dsa_unregister_switch function. As expected dsa_unregister_switch calls the ksz_mac_link_down, which schedules the mib_read_work. Now, when cancel_delayed_work_sync is called, it cancels all the workqueue. As a result, module is removed successfully without kernel crash. Can I send the updated patch as v1 or new patch with updated commit message and description.
On Mon, 11 Oct 2021 09:41:55 +0000 Arun.Ramadoss@microchip.com wrote: > On Fri, 2021-10-08 at 11:34 -0700, Jakub Kicinski wrote: > > Also the cancel_delayed_work_sync() is suspiciously early in the > > remove > > flow. There is a schedule_work call in ksz_mac_link_down() which may > > schedule the work back in. That'd also explain why the patch helps > > since > > ksz_mac_link_down() only schedules if (dev->mib_read_interval). > In this patch, I did two things. Added the if condition for > rescheduling the queue and other is resetted the mib_read_interval to > zero. > As per the cancel_delayed_queue_sync() functionality, Now I tried rmod > after removing the if condition for resheduling the queue,kernel didn't > crash. So, concluded that it is not rearm in ksz_mib_read_work is > causing the problem but it is due to scheduling in the > ksz_mac_link_down function. This function is called due to the > dsa_unregister_switch. Due to resetting of the mib_read_interval to > zero in switch_remove, the queue is not scheduled in mac_link_down, so > kernel didn't crash. > > And also, as per suggestion on cancel_delayed_work_sync() is > suspiciously placed in switch_remove. I undo this patch, and tried just > by moving the canceling of delayed_work after the dsa_unregister_switch > function. As expected dsa_unregister_switch calls the > ksz_mac_link_down, which schedules the mib_read_work. Now, when > cancel_delayed_work_sync is called, it cancels all the workqueue. As a > result, module is removed successfully without kernel crash. > > Can I send the updated patch as v1 or new patch with updated commit > message and description. Please send a patch with just the second chunk (zeroing mib_read_interval), you can mark it as v2.
diff --git a/drivers/net/dsa/microchip/ksz_common.c b/drivers/net/dsa/microchip/ksz_common.c index 1542bfb8b5e5..ffc8e6fb300a 100644 --- a/drivers/net/dsa/microchip/ksz_common.c +++ b/drivers/net/dsa/microchip/ksz_common.c @@ -94,7 +94,8 @@ static void ksz_mib_read_work(struct work_struct *work) mutex_unlock(&mib->cnt_mutex); } - schedule_delayed_work(&dev->mib_read, dev->mib_read_interval); + if (dev->mib_read_interval) + schedule_delayed_work(&dev->mib_read, dev->mib_read_interval); } void ksz_init_mib_timer(struct ksz_device *dev) @@ -449,8 +450,10 @@ EXPORT_SYMBOL(ksz_switch_register); void ksz_switch_remove(struct ksz_device *dev) { /* timer started */ - if (dev->mib_read_interval) + if (dev->mib_read_interval) { + dev->mib_read_interval = 0; cancel_delayed_work_sync(&dev->mib_read); + } dev->dev_ops->exit(dev); dsa_unregister_switch(dev->ds);
When the ksz module is installed and removed using rmmod, kernel crashes with null pointer dereferrence error. During rmmod, ksz_switch_remove function tries to cancel the mib_read_workqueue using cancel_delayed_work_sync routine. At the end of mib_read_workqueue execution, it again reschedule the workqueue unconditionally. Due to which queue rescheduled after mib_interval, during this execution it tries to access dp->slave. But the slave is unregistered in the ksz_switch_remove function. Hence kernel crashes. To avoid this crash, before canceling the workqueue, resetted the mib_interval to 0. In the work queue execution, it schedules the workqueue next time only if the mib_interval is non zero. Fixes: 469b390e1ba3 ("net: dsa: microchip: use delayed_work instead of timer + work") Signed-off-by: Arun Ramadoss <arun.ramadoss@microchip.com> --- drivers/net/dsa/microchip/ksz_common.c | 7 +++++-- 1 file changed, 5 insertions(+), 2 deletions(-)