Message ID | 20220309133020.1869-1-heming.zhao@suse.com (mailing list archive) |
---|---|
State | New, archived |
Series | fix kernel crash after mounting when dlm doesn't ready |
It looks like this issue was introduced by the following commit:
da5e7c87827e ocfs2: cleanup journal init and shutdown

Before that, the journal was initialized in ocfs2_initialize_super().

On 3/9/22 9:30 PM, Heming Zhao wrote:
> How to trigger:
>
> tb-ocfs1 # dd if=/dev/zero of=/dev/vdb bs=1M count=20 oflag=direct
> tb-ocfs1 # mkfs.ocfs2 --cluster-stack=pcmk -N 4 /dev/vdb \
>            --cluster-name=ocfstst
> == there is no dlm running ==
> tb-ocfs1 # mount -t ocfs2 /dev/vdb /mnt
> == kernel crash ==
>
> Crash log
>
> ```
> kernel: DLM installed
> kernel: ocfs2: Registered cluster interface user
> kernel: dlm: no local IP address has been set
> kernel: dlm: cannot start dlm lowcomms -107
> kernel: (mount.ocfs2,2225,0):ocfs2_dlm_init:3355 ERROR: status = -107
> kernel: (mount.ocfs2,2225,0):ocfs2_mount_volume:1817 ERROR: status = -107
> kernel: (mount.ocfs2,2225,0):ocfs2_fill_super:1186 ERROR: status = -107
> kernel: BUG: kernel NULL pointer dereference, address: 0000000000000030
> kernel: #PF: supervisor read access in kernel mode
> kernel: #PF: error_code(0x0000) - not-present page
> kernel: PGD 0 P4D 0
> kernel: Oops: 0000 [#1] PREEMPT SMP NOPTI
> kernel: CPU: 0 PID: 2225 Comm: mount.ocfs2 Not tainted 5.16.2-1-default #1 openSUSE Tumbleweed b40a195b7ff0f3399a616c3290f963c4ad189e84
> kernel: Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.14.0-0-g155821a-rebuilt.opensuse.org 04/01/2014
> kernel: RIP: 0010:ocfs2_clear_inode+0x2e9/0x720 [ocfs2]
> kernel: Code: 24 e8 9b 68 04 00 48 c7 c7 70 88 db c0 48 8b 80 98 03 00
> 00 48 8b 80 70 01 00 00 48 89 44 24 08 e8 3c 50 d8 d8 48 8b 44 24 08
> <48> 8b 48 30 49 39 4f c8 0f 88 ff 00 00 00 48 c>
> kernel: RSP: 0018:ffffbbf000847bf0 EFLAGS: 00010246
> kernel: RAX: 0000000000000000 RBX: ffff95f6834b8000 RCX: 0000000000000000
> kernel: RDX: 0000000000000001 RSI: ffff95f6a1fbcbe0 RDI: ffffffffc0db8870
> kernel: RBP: ffff95f6a1fbc6b8 R08: 00000ab5a9371b7a R09: 0000000000000230
> kernel: R10: ffffbbf000847bc0 R11: ffffffffc0d53ea0 R12: ffff95f6a1fbc560
> kernel: R13: ffff95f6a1fbc408 R14: ffff95f6834b8000 R15: ffff95f6a1fbc908
> kernel: FS: 00007f366f151740(0000) GS:ffff95f6fdc00000(0000) knlGS:0000000000000000
> kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> kernel: CR2: 0000000000000030 CR3: 0000000003fcc004 CR4: 0000000000370ef0
> kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> kernel: Call Trace:
> kernel: <TASK>
> kernel: ? ocfs2_evict_inode+0x1fe/0x630 [ocfs2 411bc..281]
> kernel: evict+0xc0/0x1c0
> kernel: ocfs2_release_system_inodes+0x21/0xc0 [ocfs2 411bc..281]
> kernel: ocfs2_dismount_volume+0x10b/0x2d0 [ocfs2 411bc..281]
> kernel: ocfs2_fill_super+0xaf/0x19e0 [ocfs2 411bc..281]
> kernel: mount_bdev+0x182/0x1b0
> kernel: ? ocfs2_initialize_super.isra.0+0xf50/0xf50 [ocfs2 411bc..281]
> kernel: legacy_get_tree+0x24/0x40
> kernel: vfs_get_tree+0x22/0xb0
> kernel: path_mount+0x465/0xac0
> kernel: __x64_sys_mount+0x103/0x140
> kernel: do_syscall_64+0x59/0x80
> kernel: ? syscall_exit_to_user_mode+0x18/0x40
> kernel: ? do_syscall_64+0x69/0x80
> kernel: ? syscall_exit_to_user_mode+0x18/0x40
> kernel: ? do_syscall_64+0x69/0x80
> kernel: ? exc_page_fault+0x68/0x150
> kernel: entry_SYSCALL_64_after_hwframe+0x44/0xae
> ```
>
> Analysis:
>
> ocfs2_fill_super
>  ocfs2_mount_volume
>   ocfs2_dlm_init //failed, journal is not initialized yet.
>  goto read_super_error

This is not part of the call trace; I don't think it is needed here.

>  ocfs2_dismount_volume
>   ocfs2_release_system_inodes
>    ...
>     evict
>      ...
>       ocfs2_clear_inode
>        ocfs2_checkpoint_inode
>         ocfs2_ci_fully_checkpointed
>          time_after(journal->j_trans_id, ci->ci_last_trans)
>           + journal is NULL, crash!
>

I suggest describing this commit in the following order:
<call trace>
<reproducer>
<your analysis and how to fix>

BTW, as I mentioned at the start, a Fixes tag should be added here.
Could you also please check all possible uses of the journal during ocfs2_dismount_volume()?

Thanks,
Joseph

> Signed-off-by: Heming Zhao <heming.zhao@suse.com>
> ---
>  fs/ocfs2/inode.c   | 3 ++-
>  fs/ocfs2/journal.h | 2 +-
>  2 files changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/fs/ocfs2/inode.c b/fs/ocfs2/inode.c
> index 6c2411c2afcf..3826ab7eab3e 100644
> --- a/fs/ocfs2/inode.c
> +++ b/fs/ocfs2/inode.c
> @@ -1205,7 +1205,8 @@ static void ocfs2_clear_inode(struct inode *inode)
>  	 * the journal is flushed before journal shutdown. Thus it is safe to
>  	 * have inodes get cleaned up after journal shutdown.
>  	 */
> -	jbd2_journal_release_jbd_inode(osb->journal->j_journal,
> +	if (osb->journal)
> +		jbd2_journal_release_jbd_inode(osb->journal->j_journal,
>  				       &oi->ip_jinode);
>  }
>  
> diff --git a/fs/ocfs2/journal.h b/fs/ocfs2/journal.h
> index 8dcb2f2cadbc..1610d49b4112 100644
> --- a/fs/ocfs2/journal.h
> +++ b/fs/ocfs2/journal.h
> @@ -189,7 +189,7 @@ static inline void ocfs2_checkpoint_inode(struct inode *inode)
>  {
>  	struct ocfs2_super *osb = OCFS2_SB(inode->i_sb);
>  
> -	if (ocfs2_mount_local(osb))
> +	if (!osb->journal || ocfs2_mount_local(osb))
>  		return;
>  
>  	if (!ocfs2_ci_fully_checkpointed(INODE_CACHE(inode))) {
Hello Joseph,

Thank you for your review.

On 3/10/22 11:59, Joseph Qi wrote:
> It looks like this issue was introduced by the following commit:
> da5e7c87827e ocfs2: cleanup journal init and shutdown
>
> Before that, the journal was initialized in ocfs2_initialize_super().

I agree, and will add a Fixes tag in V2.

>
> On 3/9/22 9:30 PM, Heming Zhao wrote:
>> How to trigger:
>>
>> tb-ocfs1 # dd if=/dev/zero of=/dev/vdb bs=1M count=20 oflag=direct
>> tb-ocfs1 # mkfs.ocfs2 --cluster-stack=pcmk -N 4 /dev/vdb \
>>            --cluster-name=ocfstst
>> == there is no dlm running ==
>> tb-ocfs1 # mount -t ocfs2 /dev/vdb /mnt
>> == kernel crash ==
>>
>> Crash log
>>
>> ```
>> kernel: DLM installed
>> kernel: ocfs2: Registered cluster interface user
>> kernel: dlm: no local IP address has been set
>> kernel: dlm: cannot start dlm lowcomms -107
>> kernel: (mount.ocfs2,2225,0):ocfs2_dlm_init:3355 ERROR: status = -107
>> kernel: (mount.ocfs2,2225,0):ocfs2_mount_volume:1817 ERROR: status = -107
>> kernel: (mount.ocfs2,2225,0):ocfs2_fill_super:1186 ERROR: status = -107
>> kernel: BUG: kernel NULL pointer dereference, address: 0000000000000030
>> kernel: #PF: supervisor read access in kernel mode
>> kernel: #PF: error_code(0x0000) - not-present page
>> kernel: PGD 0 P4D 0
>> kernel: Oops: 0000 [#1] PREEMPT SMP NOPTI
>> kernel: CPU: 0 PID: 2225 Comm: mount.ocfs2 Not tainted 5.16.2-1-default #1 openSUSE Tumbleweed b40a195b7ff0f3399a616c3290f963c4ad189e84
>> kernel: Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.14.0-0-g155821a-rebuilt.opensuse.org 04/01/2014
>> kernel: RIP: 0010:ocfs2_clear_inode+0x2e9/0x720 [ocfs2]
>> kernel: Code: 24 e8 9b 68 04 00 48 c7 c7 70 88 db c0 48 8b 80 98 03 00
>> 00 48 8b 80 70 01 00 00 48 89 44 24 08 e8 3c 50 d8 d8 48 8b 44 24 08
>> <48> 8b 48 30 49 39 4f c8 0f 88 ff 00 00 00 48 c>
>> kernel: RSP: 0018:ffffbbf000847bf0 EFLAGS: 00010246
>> kernel: RAX: 0000000000000000 RBX: ffff95f6834b8000 RCX: 0000000000000000
>> kernel: RDX: 0000000000000001 RSI: ffff95f6a1fbcbe0 RDI: ffffffffc0db8870
>> kernel: RBP: ffff95f6a1fbc6b8 R08: 00000ab5a9371b7a R09: 0000000000000230
>> kernel: R10: ffffbbf000847bc0 R11: ffffffffc0d53ea0 R12: ffff95f6a1fbc560
>> kernel: R13: ffff95f6a1fbc408 R14: ffff95f6834b8000 R15: ffff95f6a1fbc908
>> kernel: FS: 00007f366f151740(0000) GS:ffff95f6fdc00000(0000) knlGS:0000000000000000
>> kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> kernel: CR2: 0000000000000030 CR3: 0000000003fcc004 CR4: 0000000000370ef0
>> kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>> kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>> kernel: Call Trace:
>> kernel: <TASK>
>> kernel: ? ocfs2_evict_inode+0x1fe/0x630 [ocfs2 411bc..281]
>> kernel: evict+0xc0/0x1c0
>> kernel: ocfs2_release_system_inodes+0x21/0xc0 [ocfs2 411bc..281]
>> kernel: ocfs2_dismount_volume+0x10b/0x2d0 [ocfs2 411bc..281]
>> kernel: ocfs2_fill_super+0xaf/0x19e0 [ocfs2 411bc..281]
>> kernel: mount_bdev+0x182/0x1b0
>> kernel: ? ocfs2_initialize_super.isra.0+0xf50/0xf50 [ocfs2 411bc..281]
>> kernel: legacy_get_tree+0x24/0x40
>> kernel: vfs_get_tree+0x22/0xb0
>> kernel: path_mount+0x465/0xac0
>> kernel: __x64_sys_mount+0x103/0x140
>> kernel: do_syscall_64+0x59/0x80
>> kernel: ? syscall_exit_to_user_mode+0x18/0x40
>> kernel: ? do_syscall_64+0x69/0x80
>> kernel: ? syscall_exit_to_user_mode+0x18/0x40
>> kernel: ? do_syscall_64+0x69/0x80
>> kernel: ? exc_page_fault+0x68/0x150
>> kernel: entry_SYSCALL_64_after_hwframe+0x44/0xae
>> ```
>>
>> Analysis:
>>
>> ocfs2_fill_super
>>  ocfs2_mount_volume
>>   ocfs2_dlm_init //failed, journal is not initialized yet.
>>  goto read_super_error
> This is not part of the call trace; I don't think it is needed here.

OK, will drop it in V2.

>
>>  ocfs2_dismount_volume
>>   ocfs2_release_system_inodes
>>    ...
>>     evict
>>      ...
>>       ocfs2_clear_inode
>>        ocfs2_checkpoint_inode
>>         ocfs2_ci_fully_checkpointed
>>          time_after(journal->j_trans_id, ci->ci_last_trans)
>>           + journal is NULL, crash!
>>
>
> I suggest describing this commit in the following order:
> <call trace>
> <reproducer>
> <your analysis and how to fix>

OK, will follow this style.

>
> BTW, as I mentioned at the start, a Fixes tag should be added here.
> Could you also please check all possible uses of the journal during
> ocfs2_dismount_volume()?

I verified my patch before sending V1; the crash no longer occurred in the no-dlm case.
Following your comment, there may be other places in ocfs2_dismount_volume() that need fixing.
The reverse operations in ocfs2_dismount_volume() for the range
[ocfs2_initialize_super, ocfs2_journal_init) should be checked.

>
> Thanks,
> Joseph
>

Thanks,
Heming
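The point raised at the end of this reply, that teardown must tolerate a mount which stopped before journal setup, can be sketched in userspace C. Everything below is a hypothetical mock (`mock_volume`, `mock_mount`, `mock_dismount` are illustration names, not ocfs2 functions), and the stage order is an assumption made only to show the pattern: each undo step checks whether its resource was ever created.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for a volume that is set up in stages. */
struct mock_volume {
	bool super_ready;		/* set by the first init stage */
	int *journal;			/* created by the last stage, else NULL */
	int journal_teardowns;		/* counts journal undo steps that ran */
};

/* Init stops at the first failing stage; later fields stay untouched. */
static int mock_mount(struct mock_volume *vol, bool dlm_available)
{
	vol->super_ready = true;
	if (!dlm_available)
		return -107;	/* same status value as in the crash log */
	vol->journal = calloc(1, sizeof(*vol->journal));
	return vol->journal ? 0 : -12;	/* -12 is an assumed ENOMEM-style code */
}

/* Teardown undoes only the stages that actually completed. */
static void mock_dismount(struct mock_volume *vol)
{
	assert(vol->super_ready);	/* first stage always ran before teardown */

	if (vol->journal) {
		vol->journal_teardowns++;
		free(vol->journal);
		vol->journal = NULL;
	}
	vol->super_ready = false;
}
```

Calling `mock_mount(&vol, false)` followed by `mock_dismount(&vol)` on a zero-initialized `struct mock_volume` exercises the partial-init path without touching the missing journal.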
diff --git a/fs/ocfs2/inode.c b/fs/ocfs2/inode.c
index 6c2411c2afcf..3826ab7eab3e 100644
--- a/fs/ocfs2/inode.c
+++ b/fs/ocfs2/inode.c
@@ -1205,7 +1205,8 @@ static void ocfs2_clear_inode(struct inode *inode)
 	 * the journal is flushed before journal shutdown. Thus it is safe to
 	 * have inodes get cleaned up after journal shutdown.
 	 */
-	jbd2_journal_release_jbd_inode(osb->journal->j_journal,
+	if (osb->journal)
+		jbd2_journal_release_jbd_inode(osb->journal->j_journal,
 				       &oi->ip_jinode);
 }
 
diff --git a/fs/ocfs2/journal.h b/fs/ocfs2/journal.h
index 8dcb2f2cadbc..1610d49b4112 100644
--- a/fs/ocfs2/journal.h
+++ b/fs/ocfs2/journal.h
@@ -189,7 +189,7 @@ static inline void ocfs2_checkpoint_inode(struct inode *inode)
 {
 	struct ocfs2_super *osb = OCFS2_SB(inode->i_sb);
 
-	if (ocfs2_mount_local(osb))
+	if (!osb->journal || ocfs2_mount_local(osb))
 		return;
 
 	if (!ocfs2_ci_fully_checkpointed(INODE_CACHE(inode))) {
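The effect of the guard added to ocfs2_checkpoint_inode() in the diff above can be tried outside the kernel with a small mock. This is a sketch under stated assumptions: the `mock_*` types and helpers are hypothetical stand-ins, the wraparound comparison mirrors the idea of the kernel's time_after() rather than reusing it, and a counter replaces the real checkpoint work.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins; not the kernel's definitions. */
struct mock_journal {
	unsigned long j_trans_id;	/* id of the newest transaction */
};

struct mock_super {
	struct mock_journal *journal;	/* NULL until journal init succeeds */
	bool mount_local;		/* stand-in for ocfs2_mount_local() */
};

struct mock_inode {
	struct mock_super *sb;
	unsigned long last_trans;	/* stand-in for ci->ci_last_trans */
	int checkpoints_started;	/* counts simulated checkpoint requests */
};

/* Wraparound-safe "a is after b", the same idea as time_after(). */
static bool mock_time_after(unsigned long a, unsigned long b)
{
	return (long)(b - a) < 0;
}

static bool mock_fully_checkpointed(const struct mock_inode *inode)
{
	/* Reads through the journal pointer, so it must be non-NULL here. */
	assert(inode->sb->journal != NULL);
	return mock_time_after(inode->sb->journal->j_trans_id, inode->last_trans);
}

static void mock_checkpoint_inode(struct mock_inode *inode)
{
	struct mock_super *sb = inode->sb;

	/* Guard modeled on the patch: nothing to do without a journal. */
	if (!sb->journal || sb->mount_local)
		return;

	if (!mock_fully_checkpointed(inode))
		inode->checkpoints_started++;
}
```

With `journal` left NULL, `mock_checkpoint_inode()` returns before any dereference; without the first half of the condition, the assertion in `mock_fully_checkpointed()` would trip where the kernel hit the NULL pointer dereference.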
How to trigger:

tb-ocfs1 # dd if=/dev/zero of=/dev/vdb bs=1M count=20 oflag=direct
tb-ocfs1 # mkfs.ocfs2 --cluster-stack=pcmk -N 4 /dev/vdb \
           --cluster-name=ocfstst
== there is no dlm running ==
tb-ocfs1 # mount -t ocfs2 /dev/vdb /mnt
== kernel crash ==

Crash log

```
kernel: DLM installed
kernel: ocfs2: Registered cluster interface user
kernel: dlm: no local IP address has been set
kernel: dlm: cannot start dlm lowcomms -107
kernel: (mount.ocfs2,2225,0):ocfs2_dlm_init:3355 ERROR: status = -107
kernel: (mount.ocfs2,2225,0):ocfs2_mount_volume:1817 ERROR: status = -107
kernel: (mount.ocfs2,2225,0):ocfs2_fill_super:1186 ERROR: status = -107
kernel: BUG: kernel NULL pointer dereference, address: 0000000000000030
kernel: #PF: supervisor read access in kernel mode
kernel: #PF: error_code(0x0000) - not-present page
kernel: PGD 0 P4D 0
kernel: Oops: 0000 [#1] PREEMPT SMP NOPTI
kernel: CPU: 0 PID: 2225 Comm: mount.ocfs2 Not tainted 5.16.2-1-default #1 openSUSE Tumbleweed b40a195b7ff0f3399a616c3290f963c4ad189e84
kernel: Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.14.0-0-g155821a-rebuilt.opensuse.org 04/01/2014
kernel: RIP: 0010:ocfs2_clear_inode+0x2e9/0x720 [ocfs2]
kernel: Code: 24 e8 9b 68 04 00 48 c7 c7 70 88 db c0 48 8b 80 98 03 00
00 48 8b 80 70 01 00 00 48 89 44 24 08 e8 3c 50 d8 d8 48 8b 44 24 08
<48> 8b 48 30 49 39 4f c8 0f 88 ff 00 00 00 48 c>
kernel: RSP: 0018:ffffbbf000847bf0 EFLAGS: 00010246
kernel: RAX: 0000000000000000 RBX: ffff95f6834b8000 RCX: 0000000000000000
kernel: RDX: 0000000000000001 RSI: ffff95f6a1fbcbe0 RDI: ffffffffc0db8870
kernel: RBP: ffff95f6a1fbc6b8 R08: 00000ab5a9371b7a R09: 0000000000000230
kernel: R10: ffffbbf000847bc0 R11: ffffffffc0d53ea0 R12: ffff95f6a1fbc560
kernel: R13: ffff95f6a1fbc408 R14: ffff95f6834b8000 R15: ffff95f6a1fbc908
kernel: FS: 00007f366f151740(0000) GS:ffff95f6fdc00000(0000) knlGS:0000000000000000
kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
kernel: CR2: 0000000000000030 CR3: 0000000003fcc004 CR4: 0000000000370ef0
kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
kernel: Call Trace:
kernel: <TASK>
kernel: ? ocfs2_evict_inode+0x1fe/0x630 [ocfs2 411bc..281]
kernel: evict+0xc0/0x1c0
kernel: ocfs2_release_system_inodes+0x21/0xc0 [ocfs2 411bc..281]
kernel: ocfs2_dismount_volume+0x10b/0x2d0 [ocfs2 411bc..281]
kernel: ocfs2_fill_super+0xaf/0x19e0 [ocfs2 411bc..281]
kernel: mount_bdev+0x182/0x1b0
kernel: ? ocfs2_initialize_super.isra.0+0xf50/0xf50 [ocfs2 411bc..281]
kernel: legacy_get_tree+0x24/0x40
kernel: vfs_get_tree+0x22/0xb0
kernel: path_mount+0x465/0xac0
kernel: __x64_sys_mount+0x103/0x140
kernel: do_syscall_64+0x59/0x80
kernel: ? syscall_exit_to_user_mode+0x18/0x40
kernel: ? do_syscall_64+0x69/0x80
kernel: ? syscall_exit_to_user_mode+0x18/0x40
kernel: ? do_syscall_64+0x69/0x80
kernel: ? exc_page_fault+0x68/0x150
kernel: entry_SYSCALL_64_after_hwframe+0x44/0xae
```

Analysis:

ocfs2_fill_super
 ocfs2_mount_volume
  ocfs2_dlm_init //failed, journal is not initialized yet.
 goto read_super_error
 ocfs2_dismount_volume
  ocfs2_release_system_inodes
   ...
    evict
     ...
      ocfs2_clear_inode
       ocfs2_checkpoint_inode
        ocfs2_ci_fully_checkpointed
         time_after(journal->j_trans_id, ci->ci_last_trans)
          + journal is NULL, crash!

Signed-off-by: Heming Zhao <heming.zhao@suse.com>
---
 fs/ocfs2/inode.c   | 3 ++-
 fs/ocfs2/journal.h | 2 +-
 2 files changed, 3 insertions(+), 2 deletions(-)
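One way to connect the analysis to the faulting address: the log shows RAX = 0 and CR2 = 0x30, and the marked instruction bytes `48 8b 48 30` decode to a load from `0x30(%rax)`, which fits a field read at offset 0x30 through a NULL pointer. The sketch below shows how such an address arises. The struct is a hypothetical mock whose field order and stand-in types are assumptions chosen for illustration (the real layout depends on the kernel version and config), and it assumes an LP64 target.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* The sketch assumes an LP64 target such as x86-64 Linux. */
static_assert(sizeof(void *) == 8 && sizeof(unsigned long) == 8,
	      "sketch assumes 8-byte pointers and longs");

/*
 * Hypothetical mock of a journal descriptor. Field order and types are
 * assumptions for illustration, not copied from the kernel headers.
 */
struct mock_journal_layout {
	int j_state;			/* stand-in for an enum */
	void *j_journal;
	void *j_inode;
	void *j_osb;
	void *j_bh;
	int j_num_trans;		/* stand-in for atomic_t */
	int j_lock;			/* stand-in for a non-debug spinlock_t */
	unsigned long j_trans_id;
};

/* Address a read of journal->j_trans_id would touch for a given base. */
static uintptr_t mock_trans_id_address(const struct mock_journal_layout *journal)
{
	/* Pure address arithmetic; nothing is dereferenced here. */
	return (uintptr_t)journal + offsetof(struct mock_journal_layout, j_trans_id);
}
```

Under these assumptions, `mock_trans_id_address(NULL)` evaluates to 0x30, the same value the oops reports in CR2.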