diff mbox

clk: remove the clk_notifier from clk_notifier_list before free it (was: Re: [BUG] zynq | CCF | SRCU)

Message ID 51AC5F1B.4020409@cn.fujitsu.com (mailing list archive)
State New, archived
Headers show

Commit Message

Lai Jiangshan June 3, 2013, 9:17 a.m. UTC
On 06/01/2013 03:12 AM, Sören Brinkmann wrote:
> Hi,
> 
> we recently encountered some kernel panics when we compiled one of our
> drivers as module and tested inserting/removing the module.
> Trying to debug this issue, I could reproduce it on the mainline kernel
> with a dummy module.
> 
> What happens is, that when on driver remove clk_notifier_unregister() is
> called and no other notifier for that clock is registered, the kernel
> panics.
> I'm not sure what is going wrong here. If there is a bug (and if where)
> or I'm just using the infrastructure the wrong way,... So, any hint is
> appreciated.
> 
> I attach the output from the crashing system. The stacktrace indicates a
> crash in 'srcu_readers_seq_idx()'.
> I also attach the module I used to trigger the issue and a patch on top
> of mainline commit a93cb29acaa8f75618c3f202d1cf43c231984644 which has
> the DT modifications I need to make the module find its clock and boot
> with my initramfs.
> 
> 
> 	Thanks,
> 	Sören
> 



Hi, Sören Brinkmann

I guess:

modprobe clk_notif_dbg
modprobe clk_notif_dbg -r  # memory corrupt here
modprobe clk_notif_dbg     # access corrupted memroy, but no visiable bug
modprobe clk_notif_dbg -r  # access corrupted memroy, BUG


How the first "modprobe clk_notif_dbg -r" corrupt memroy:

Comments

Soren Brinkmann June 3, 2013, 4:49 p.m. UTC | #1
Hi Lai,

On Mon, Jun 03, 2013 at 05:17:15PM +0800, Lai Jiangshan wrote:
> On 06/01/2013 03:12 AM, Sören Brinkmann wrote:
> > Hi,
> > 
> > we recently encountered some kernel panics when we compiled one of our
> > drivers as module and tested inserting/removing the module.
> > Trying to debug this issue, I could reproduce it on the mainline kernel
> > with a dummy module.
> > 
> > What happens is, that when on driver remove clk_notifier_unregister() is
> > called and no other notifier for that clock is registered, the kernel
> > panics.
> > I'm not sure what is going wrong here. If there is a bug (and if where)
> > or I'm just using the infrastructure the wrong way,... So, any hint is
> > appreciated.
> > 
> > I attach the output from the crashing system. The stacktrace indicates a
> > crash in 'srcu_readers_seq_idx()'.
> > I also attach the module I used to trigger the issue and a patch on top
> > of mainline commit a93cb29acaa8f75618c3f202d1cf43c231984644 which has
> > the DT modifications I need to make the module find its clock and boot
> > with my initramfs.
> > 
> > 
> > 	Thanks,
> > 	Sören
> > 
> 
> 
> 
> Hi, Sören Brinkmann
> 
> I guess:
> 
> modprobe clk_notif_dbg
> modprobe clk_notif_dbg -r  # memory corrupt here
> modprobe clk_notif_dbg     # access corrupted memroy, but no visiable bug
> modprobe clk_notif_dbg -r  # access corrupted memroy, BUG
> 
> 
> How the first "modprobe clk_notif_dbg -r" corrupt memroy:
> 
> =========
> 
> int clk_notifier_unregister(struct clk *clk, struct notifier_block *nb)
> {
> 	struct clk_notifier *cn = NULL;
> 	int ret = -EINVAL;
> 
> 	if (!clk || !nb)
> 		return -EINVAL;
> 
> 	clk_prepare_lock();
> 
> 	list_for_each_entry(cn, &clk_notifier_list, node)
> 		if (cn->clk == clk)
> 			break;
> 
> 	if (cn->clk == clk) {
> 		ret = srcu_notifier_chain_unregister(&cn->notifier_head, nb);
> 
> 		clk->notifier_count--;
> 
> 		/* XXX the notifier code should handle this better */
> 		if (!cn->notifier_head.head) {
> 			srcu_cleanup_notifier_head(&cn->notifier_head);
> 
> ===========> the code forgot to remove @cn from the clk_notifier_list
> ===========> the second "modprobe clk_notif_dbg" will the same @clk and use the same corrupt @cn
> 
> 			kfree(cn);
> 		}
> 
> 	} else {
> 		ret = -ENOENT;
> 	}
> 
> 	clk_prepare_unlock();
> 
> 	return ret;
> }
> 
> ===========
> 
> Could you retry with the following patch?
Thanks for the patch. This fixes it. I can add/remove my module, and it
didn't crash since.
I had some trouble applying it though, due to some encoding hiccups. I
think the 'ö' in my name might be the culprit. I never know where things
go wrong, whether it's format-patch, or something in the email
transport, the sender or receiver side... But well, looks like full
UTF-8 support is missing somewhere.

> 
> Thanks,
> Lai
> 
> From 5e26b626724139070148df9f6bd0607bc7bc3812 Mon Sep 17 00:00:00 2001
> From: Lai Jiangshan <laijs@cn.fujitsu.com>
> Date: Mon, 3 Jun 2013 16:59:50 +0800
> Subject: [PATCH] clk: remove the clk_notifier from clk_notifier_list before
>  free it
> MIME-Version: 1.0
> Content-Type: text/plain; charset=UTF-8
> Content-Transfer-Encoding: 8bit
> 
> The @cn is stay in @clk_notifier_list after it is freed, it cause
> memory corruption.
> 
> Example, if @clk is registered(first), unregistered(first),
> registered(second), unregistered(second).
> 
> The freed @cn will be used when @clk is registered(second),
> and the bug will be happened when @clk is unregistered(second):
> 
> [  517.040000] clk_notif_dbg clk_notif_dbg.1: clk_notifier_unregister()
> [  517.040000] Unable to handle kernel paging request at virtual address 00df3008
> [  517.050000] pgd = ed858000
> [  517.050000] [00df3008] *pgd=00000000
> [  517.060000] Internal error: Oops: 5 [#1] PREEMPT SMP ARM
> [  517.060000] Modules linked in: clk_notif_dbg(O-) [last unloaded: clk_notif_dbg]
> [  517.060000] CPU: 1 PID: 499 Comm: modprobe Tainted: G           O 3.10.0-rc3-00119-ga93cb29-dirty #85
> [  517.060000] task: ee1e0180 ti: ee3e6000 task.ti: ee3e6000
> [  517.060000] PC is at srcu_readers_seq_idx+0x48/0x84
> [  517.060000] LR is at srcu_readers_seq_idx+0x60/0x84
> [  517.060000] pc : [<c0052720>]    lr : [<c0052738>]    psr: 80070013
> [  517.060000] sp : ee3e7d48  ip : 00000000  fp : ee3e7d6c
> [  517.060000] r10: 00000000  r9 : ee3e6000  r8 : 00000000
> [  517.060000] r7 : ed84fe4c  r6 : c068ec90  r5 : c068e430  r4 : 00000000
> [  517.060000] r3 : 00df3000  r2 : 00000000  r1 : 00000002  r0 : 00000000
> [  517.060000] Flags: Nzcv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment user
> [  517.060000] Control: 18c5387d  Table: 2d85804a  DAC: 00000015
> [  517.060000] Process modprobe (pid: 499, stack limit = 0xee3e6238)
> [  517.060000] Stack: (0xee3e7d48 to 0xee3e8000)
> ....
> [  517.060000] [<c0052720>] (srcu_readers_seq_idx+0x48/0x84) from [<c0052790>] (try_check_zero+0x34/0xfc)
> [  517.060000] [<c0052790>] (try_check_zero+0x34/0xfc) from [<c00528b0>] (srcu_advance_batches+0x58/0x114)
> [  517.060000] [<c00528b0>] (srcu_advance_batches+0x58/0x114) from [<c0052c30>] (__synchronize_srcu+0x114/0x1ac)
> [  517.060000] [<c0052c30>] (__synchronize_srcu+0x114/0x1ac) from [<c0052d14>] (synchronize_srcu+0x2c/0x34)
> [  517.060000] [<c0052d14>] (synchronize_srcu+0x2c/0x34) from [<c0053a08>] (srcu_notifier_chain_unregister+0x68/0x74)
> [  517.060000] [<c0053a08>] (srcu_notifier_chain_unregister+0x68/0x74) from [<c0375a78>] (clk_notifier_unregister+0x7c/0xc0)
> [  517.060000] [<c0375a78>] (clk_notifier_unregister+0x7c/0xc0) from [<bf008034>] (clk_notif_dbg_remove+0x34/0x9c [clk_notif_dbg])
> [  517.060000] [<bf008034>] (clk_notif_dbg_remove+0x34/0x9c [clk_notif_dbg]) from [<c02bb974>] (platform_drv_remove+0x24/0x28)
> [  517.060000] [<c02bb974>] (platform_drv_remove+0x24/0x28) from [<c02b9bf8>] (__device_release_driver+0x8c/0xd4)
> [  517.060000] [<c02b9bf8>] (__device_release_driver+0x8c/0xd4) from [<c02ba680>] (driver_detach+0x9c/0xc4)
> [  517.060000] [<c02ba680>] (driver_detach+0x9c/0xc4) from [<c02b99c4>] (bus_remove_driver+0xcc/0xfc)
> [  517.060000] [<c02b99c4>] (bus_remove_driver+0xcc/0xfc) from [<c02bace4>] (driver_unregister+0x54/0x78)
> [  517.060000] [<c02bace4>] (driver_unregister+0x54/0x78) from [<c02bbb44>] (platform_driver_unregister+0x1c/0x20)
> [  517.060000] [<c02bbb44>] (platform_driver_unregister+0x1c/0x20) from [<bf0081f8>] (clk_notif_dbg_driver_exit+0x14/0x1c [clk_notif_dbg])
> [  517.060000] [<bf0081f8>] (clk_notif_dbg_driver_exit+0x14/0x1c [clk_notif_dbg]) from [<c00835e4>] (SyS_delete_module+0x200/0x28c)
> [  517.060000] [<c00835e4>] (SyS_delete_module+0x200/0x28c) from [<c000edc0>] (ret_fast_syscall+0x0/0x48)
> [  517.060000] Code: e5973004 e7911102 e0833001 e2881002 (e7933101)
> 
> CC: stable@kernel.org
> Reported-by: Sören Brinkmann <soren.brinkmann@xilinx.com>
> Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Tested-by: Sören Brinkmann <soren.brinkmann@xilinx.com>

	Sören
Mike Turquette June 7, 2013, 1:42 a.m. UTC | #2
Quoting Lai Jiangshan (2013-06-03 02:17:15)
> The @cn is stay in @clk_notifier_list after it is freed, it cause
> memory corruption.
> 
> Example, if @clk is registered(first), unregistered(first),
> registered(second), unregistered(second).
> 
> The freed @cn will be used when @clk is registered(second),
> and the bug will be happened when @clk is unregistered(second):
> 
> [  517.040000] clk_notif_dbg clk_notif_dbg.1: clk_notifier_unregister()
> [  517.040000] Unable to handle kernel paging request at virtual address 00df3008
> [  517.050000] pgd = ed858000
> [  517.050000] [00df3008] *pgd=00000000
> [  517.060000] Internal error: Oops: 5 [#1] PREEMPT SMP ARM
> [  517.060000] Modules linked in: clk_notif_dbg(O-) [last unloaded: clk_notif_dbg]
> [  517.060000] CPU: 1 PID: 499 Comm: modprobe Tainted: G           O 3.10.0-rc3-00119-ga93cb29-dirty #85
> [  517.060000] task: ee1e0180 ti: ee3e6000 task.ti: ee3e6000
> [  517.060000] PC is at srcu_readers_seq_idx+0x48/0x84
> [  517.060000] LR is at srcu_readers_seq_idx+0x60/0x84
> [  517.060000] pc : [<c0052720>]    lr : [<c0052738>]    psr: 80070013
> [  517.060000] sp : ee3e7d48  ip : 00000000  fp : ee3e7d6c
> [  517.060000] r10: 00000000  r9 : ee3e6000  r8 : 00000000
> [  517.060000] r7 : ed84fe4c  r6 : c068ec90  r5 : c068e430  r4 : 00000000
> [  517.060000] r3 : 00df3000  r2 : 00000000  r1 : 00000002  r0 : 00000000
> [  517.060000] Flags: Nzcv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment user
> [  517.060000] Control: 18c5387d  Table: 2d85804a  DAC: 00000015
> [  517.060000] Process modprobe (pid: 499, stack limit = 0xee3e6238)
> [  517.060000] Stack: (0xee3e7d48 to 0xee3e8000)
> ....
> [  517.060000] [<c0052720>] (srcu_readers_seq_idx+0x48/0x84) from [<c0052790>] (try_check_zero+0x34/0xfc)
> [  517.060000] [<c0052790>] (try_check_zero+0x34/0xfc) from [<c00528b0>] (srcu_advance_batches+0x58/0x114)
> [  517.060000] [<c00528b0>] (srcu_advance_batches+0x58/0x114) from [<c0052c30>] (__synchronize_srcu+0x114/0x1ac)
> [  517.060000] [<c0052c30>] (__synchronize_srcu+0x114/0x1ac) from [<c0052d14>] (synchronize_srcu+0x2c/0x34)
> [  517.060000] [<c0052d14>] (synchronize_srcu+0x2c/0x34) from [<c0053a08>] (srcu_notifier_chain_unregister+0x68/0x74)
> [  517.060000] [<c0053a08>] (srcu_notifier_chain_unregister+0x68/0x74) from [<c0375a78>] (clk_notifier_unregister+0x7c/0xc0)
> [  517.060000] [<c0375a78>] (clk_notifier_unregister+0x7c/0xc0) from [<bf008034>] (clk_notif_dbg_remove+0x34/0x9c [clk_notif_dbg])
> [  517.060000] [<bf008034>] (clk_notif_dbg_remove+0x34/0x9c [clk_notif_dbg]) from [<c02bb974>] (platform_drv_remove+0x24/0x28)
> [  517.060000] [<c02bb974>] (platform_drv_remove+0x24/0x28) from [<c02b9bf8>] (__device_release_driver+0x8c/0xd4)
> [  517.060000] [<c02b9bf8>] (__device_release_driver+0x8c/0xd4) from [<c02ba680>] (driver_detach+0x9c/0xc4)
> [  517.060000] [<c02ba680>] (driver_detach+0x9c/0xc4) from [<c02b99c4>] (bus_remove_driver+0xcc/0xfc)
> [  517.060000] [<c02b99c4>] (bus_remove_driver+0xcc/0xfc) from [<c02bace4>] (driver_unregister+0x54/0x78)
> [  517.060000] [<c02bace4>] (driver_unregister+0x54/0x78) from [<c02bbb44>] (platform_driver_unregister+0x1c/0x20)
> [  517.060000] [<c02bbb44>] (platform_driver_unregister+0x1c/0x20) from [<bf0081f8>] (clk_notif_dbg_driver_exit+0x14/0x1c [clk_notif_dbg])
> [  517.060000] [<bf0081f8>] (clk_notif_dbg_driver_exit+0x14/0x1c [clk_notif_dbg]) from [<c00835e4>] (SyS_delete_module+0x200/0x28c)
> [  517.060000] [<c00835e4>] (SyS_delete_module+0x200/0x28c) from [<c000edc0>] (ret_fast_syscall+0x0/0x48)
> [  517.060000] Code: e5973004 e7911102 e0833001 e2881002 (e7933101)
> 
> CC: stable@kernel.org
> Reported-by: Sören Brinkmann <soren.brinkmann@xilinx.com>
> Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>

Thanks!  I picked this up for the next set of fixes.

Regards,
Mike

> ---
>  drivers/clk/clk.c |    1 +
>  1 files changed, 1 insertions(+), 0 deletions(-)
> 
> diff --git a/drivers/clk/clk.c b/drivers/clk/clk.c
> index 934cfd1..1144e8c 100644
> --- a/drivers/clk/clk.c
> +++ b/drivers/clk/clk.c
> @@ -1955,6 +1955,7 @@ int clk_notifier_unregister(struct clk *clk, struct notifier_block *nb)
>                 /* XXX the notifier code should handle this better */
>                 if (!cn->notifier_head.head) {
>                         srcu_cleanup_notifier_head(&cn->notifier_head);
> +                       list_del(&cn->node);
>                         kfree(cn);
>                 }
>  
> -- 
> 1.7.4.4
diff mbox

Patch

=========

int clk_notifier_unregister(struct clk *clk, struct notifier_block *nb)
{
	struct clk_notifier *cn = NULL;
	int ret = -EINVAL;

	if (!clk || !nb)
		return -EINVAL;

	clk_prepare_lock();

	list_for_each_entry(cn, &clk_notifier_list, node)
		if (cn->clk == clk)
			break;

	if (cn->clk == clk) {
		ret = srcu_notifier_chain_unregister(&cn->notifier_head, nb);

		clk->notifier_count--;

		/* XXX the notifier code should handle this better */
		if (!cn->notifier_head.head) {
			srcu_cleanup_notifier_head(&cn->notifier_head);

===========> the code forgot to remove @cn from the clk_notifier_list
===========> the second "modprobe clk_notif_dbg" will the same @clk and use the same corrupt @cn

			kfree(cn);
		}

	} else {
		ret = -ENOENT;
	}

	clk_prepare_unlock();

	return ret;
}

===========

Could you retry with the following patch?

Thanks,
Lai

From 5e26b626724139070148df9f6bd0607bc7bc3812 Mon Sep 17 00:00:00 2001
From: Lai Jiangshan <laijs@cn.fujitsu.com>
Date: Mon, 3 Jun 2013 16:59:50 +0800
Subject: [PATCH] clk: remove the clk_notifier from clk_notifier_list before
 free it
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

The @cn is stay in @clk_notifier_list after it is freed, it cause
memory corruption.

Example, if @clk is registered(first), unregistered(first),
registered(second), unregistered(second).

The freed @cn will be used when @clk is registered(second),
and the bug will be happened when @clk is unregistered(second):

[  517.040000] clk_notif_dbg clk_notif_dbg.1: clk_notifier_unregister()
[  517.040000] Unable to handle kernel paging request at virtual address 00df3008
[  517.050000] pgd = ed858000
[  517.050000] [00df3008] *pgd=00000000
[  517.060000] Internal error: Oops: 5 [#1] PREEMPT SMP ARM
[  517.060000] Modules linked in: clk_notif_dbg(O-) [last unloaded: clk_notif_dbg]
[  517.060000] CPU: 1 PID: 499 Comm: modprobe Tainted: G           O 3.10.0-rc3-00119-ga93cb29-dirty #85
[  517.060000] task: ee1e0180 ti: ee3e6000 task.ti: ee3e6000
[  517.060000] PC is at srcu_readers_seq_idx+0x48/0x84
[  517.060000] LR is at srcu_readers_seq_idx+0x60/0x84
[  517.060000] pc : [<c0052720>]    lr : [<c0052738>]    psr: 80070013
[  517.060000] sp : ee3e7d48  ip : 00000000  fp : ee3e7d6c
[  517.060000] r10: 00000000  r9 : ee3e6000  r8 : 00000000
[  517.060000] r7 : ed84fe4c  r6 : c068ec90  r5 : c068e430  r4 : 00000000
[  517.060000] r3 : 00df3000  r2 : 00000000  r1 : 00000002  r0 : 00000000
[  517.060000] Flags: Nzcv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment user
[  517.060000] Control: 18c5387d  Table: 2d85804a  DAC: 00000015
[  517.060000] Process modprobe (pid: 499, stack limit = 0xee3e6238)
[  517.060000] Stack: (0xee3e7d48 to 0xee3e8000)
....
[  517.060000] [<c0052720>] (srcu_readers_seq_idx+0x48/0x84) from [<c0052790>] (try_check_zero+0x34/0xfc)
[  517.060000] [<c0052790>] (try_check_zero+0x34/0xfc) from [<c00528b0>] (srcu_advance_batches+0x58/0x114)
[  517.060000] [<c00528b0>] (srcu_advance_batches+0x58/0x114) from [<c0052c30>] (__synchronize_srcu+0x114/0x1ac)
[  517.060000] [<c0052c30>] (__synchronize_srcu+0x114/0x1ac) from [<c0052d14>] (synchronize_srcu+0x2c/0x34)
[  517.060000] [<c0052d14>] (synchronize_srcu+0x2c/0x34) from [<c0053a08>] (srcu_notifier_chain_unregister+0x68/0x74)
[  517.060000] [<c0053a08>] (srcu_notifier_chain_unregister+0x68/0x74) from [<c0375a78>] (clk_notifier_unregister+0x7c/0xc0)
[  517.060000] [<c0375a78>] (clk_notifier_unregister+0x7c/0xc0) from [<bf008034>] (clk_notif_dbg_remove+0x34/0x9c [clk_notif_dbg])
[  517.060000] [<bf008034>] (clk_notif_dbg_remove+0x34/0x9c [clk_notif_dbg]) from [<c02bb974>] (platform_drv_remove+0x24/0x28)
[  517.060000] [<c02bb974>] (platform_drv_remove+0x24/0x28) from [<c02b9bf8>] (__device_release_driver+0x8c/0xd4)
[  517.060000] [<c02b9bf8>] (__device_release_driver+0x8c/0xd4) from [<c02ba680>] (driver_detach+0x9c/0xc4)
[  517.060000] [<c02ba680>] (driver_detach+0x9c/0xc4) from [<c02b99c4>] (bus_remove_driver+0xcc/0xfc)
[  517.060000] [<c02b99c4>] (bus_remove_driver+0xcc/0xfc) from [<c02bace4>] (driver_unregister+0x54/0x78)
[  517.060000] [<c02bace4>] (driver_unregister+0x54/0x78) from [<c02bbb44>] (platform_driver_unregister+0x1c/0x20)
[  517.060000] [<c02bbb44>] (platform_driver_unregister+0x1c/0x20) from [<bf0081f8>] (clk_notif_dbg_driver_exit+0x14/0x1c [clk_notif_dbg])
[  517.060000] [<bf0081f8>] (clk_notif_dbg_driver_exit+0x14/0x1c [clk_notif_dbg]) from [<c00835e4>] (SyS_delete_module+0x200/0x28c)
[  517.060000] [<c00835e4>] (SyS_delete_module+0x200/0x28c) from [<c000edc0>] (ret_fast_syscall+0x0/0x48)
[  517.060000] Code: e5973004 e7911102 e0833001 e2881002 (e7933101)

CC: stable@kernel.org
Reported-by: Sören Brinkmann <soren.brinkmann@xilinx.com>
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
---
 drivers/clk/clk.c |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/drivers/clk/clk.c b/drivers/clk/clk.c
index 934cfd1..1144e8c 100644
--- a/drivers/clk/clk.c
+++ b/drivers/clk/clk.c
@@ -1955,6 +1955,7 @@  int clk_notifier_unregister(struct clk *clk, struct notifier_block *nb)
 		/* XXX the notifier code should handle this better */
 		if (!cn->notifier_head.head) {
 			srcu_cleanup_notifier_head(&cn->notifier_head);
+			list_del(&cn->node);
 			kfree(cn);
 		}