Message ID | 20220310105850.1817-1-heming.zhao@suse.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | [v2] ocfs2: fix kernel crash after mounting when dlm doesn't ready | expand |
Hi, To prevent the crash, I think these changes are fine. But it looks wried to spread the check of ocfs2 journal. Can we intialize the ocfs2 journal much earlier? Thanks, Joseph On 3/10/22 6:58 PM, Heming Zhao wrote: > Call trace: > > ocfs2: Registered cluster interface user > dlm: no local IP address has been set > dlm: cannot start dlm lowcomms -107 > (mount.ocfs2,2225,0):ocfs2_dlm_init:3355 ERROR: status = -107 > (mount.ocfs2,2225,0):ocfs2_mount_volume:1817 ERROR: status = -107 > (mount.ocfs2,2225,0):ocfs2_fill_super:1186 ERROR: status = -107 > BUG: kernel NULL pointer dereference, address: 0000000000000030 > #PF: supervisor read access in kernel mode > #PF: error_code(0x0000) - not-present page > PGD 0 P4D 0 > Oops: 0000 [#1] PREEMPT SMP NOPTI > CPU: 0 PID: 2225 Comm: mount.ocfs2 Not tainted 5.16.2-1-default #1 openSUSE Tumbleweed b40...e84 > Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS ... > RIP: 0010:ocfs2_clear_inode+0x2e9/0x720 [ocfs2] > ... > Call Trace: > <TASK> > ? ocfs2_evict_inode+0x1fe/0x630 [ocfs2 411bc..281] > evict+0xc0/0x1c0 > ocfs2_release_system_inodes+0x21/0xc0 [ocfs2 411bc..281] > ocfs2_dismount_volume+0x10b/0x2d0 [ocfs2 411bc..281] > ocfs2_fill_super+0xaf/0x19e0 [ocfs2 411bc..281] > mount_bdev+0x182/0x1b0 > ? ocfs2_initialize_super.isra.0+0xf50/0xf50 [ocfs2 411bc..281] > legacy_get_tree+0x24/0x40 > vfs_get_tree+0x22/0xb0 > path_mount+0x465/0xac0 > __x64_sys_mount+0x103/0x140 > do_syscall_64+0x59/0x80 > ? syscall_exit_to_user_mode+0x18/0x40 > ? do_syscall_64+0x69/0x80 > ? syscall_exit_to_user_mode+0x18/0x40 > ? do_syscall_64+0x69/0x80 > ? exc_page_fault+0x68/0x150 > entry_SYSCALL_64_after_hwframe+0x44/0xae > > How to trigger: > > tb-ocfs1 # dd if=/dev/zero of=/dev/vdb bs=1M count=20 oflag=direct > tb-ocfs1 # mkfs.ocfs2 --cluster-stack=pcmk -N 4 /dev/vdb --cluster-name=ocfstst > > == there is no dlm running == > tb-ocfs1 # mount -t ocfs2 /dev/vdb /mnt > == kernel crash == > > Analysis: > > After commit da5e7c87827e8 ("ocfs2: cleanup journal init and shutdown"), > Journal init later than before, it makes NULL pointer access in free > routine. > > ocfs2_fill_super > + ocfs2_mount_volume > | ocfs2_dlm_init //failed, osb->journal still NULL. > + ... > + ocfs2_dismount_volume > ocfs2_release_system_inodes > ... > evict > ... > ocfs2_clear_inode > ocfs2_checkpoint_inode > ocfs2_ci_fully_checkpointed > time_after(journal->j_trans_id, ci->ci_last_trans) > + journal is empty, crash! > > The fix should assess osb->journal before journal operation. And the fix > should be only related with abnormal case before ocfs2_journal_init(). > > Fixes: da5e7c87827e8 ("ocfs2: cleanup journal init and shutdown") > Signed-off-by: Heming Zhao <heming.zhao@suse.com> > --- > v2: revise commit log > add checking in ocfs2_dismount_volume() > --- > fs/ocfs2/inode.c | 3 ++- > fs/ocfs2/journal.h | 2 +- > fs/ocfs2/super.c | 7 ++++--- > 3 files changed, 7 insertions(+), 5 deletions(-) > > diff --git a/fs/ocfs2/inode.c b/fs/ocfs2/inode.c > index 6c2411c2afcf..3826ab7eab3e 100644 > --- a/fs/ocfs2/inode.c > +++ b/fs/ocfs2/inode.c > @@ -1205,7 +1205,8 @@ static void ocfs2_clear_inode(struct inode *inode) > * the journal is flushed before journal shutdown. Thus it is safe to > * have inodes get cleaned up after journal shutdown. > */ > - jbd2_journal_release_jbd_inode(osb->journal->j_journal, > + if (osb->journal) > + jbd2_journal_release_jbd_inode(osb->journal->j_journal, > &oi->ip_jinode); > } > > diff --git a/fs/ocfs2/journal.h b/fs/ocfs2/journal.h > index 8dcb2f2cadbc..1610d49b4112 100644 > --- a/fs/ocfs2/journal.h > +++ b/fs/ocfs2/journal.h > @@ -189,7 +189,7 @@ static inline void ocfs2_checkpoint_inode(struct inode *inode) > { > struct ocfs2_super *osb = OCFS2_SB(inode->i_sb); > > - if (ocfs2_mount_local(osb)) > + if (!osb->journal || ocfs2_mount_local(osb)) > return; > > if (!ocfs2_ci_fully_checkpointed(INODE_CACHE(inode))) { > diff --git a/fs/ocfs2/super.c b/fs/ocfs2/super.c > index 2772dec9dcea..01f1ab7c8de8 100644 > --- a/fs/ocfs2/super.c > +++ b/fs/ocfs2/super.c > @@ -1886,9 +1886,10 @@ static void ocfs2_dismount_volume(struct super_block *sb, int mnt_err) > /* Wait for worker to be done with the work structure in osb */ > cancel_work_sync(&osb->dquot_drop_work); > > - ocfs2_shutdown_local_alloc(osb); > - > - ocfs2_truncate_log_shutdown(osb); > + if (osb->journal) { > + ocfs2_shutdown_local_alloc(osb); > + ocfs2_truncate_log_shutdown(osb); > + } > > /* This will disable recovery and flush any recovery work. */ > ocfs2_recovery_exit(osb);
On 3/11/22 10:33, Joseph Qi wrote: > Hi, > To prevent the crash, I think these changes are fine. > But it looks wried to spread the check of ocfs2 journal. Yes, I have same feeling when I'm making patch. > Can we intialize the ocfs2 journal much earlier? > calling it earlier only makes the gap more narrow not disappear. (another way is to revert the commit da5e7c87827e8, it will make author unhappy) I'm just learning ocfs2 code, don't have too much knowledge. please forgive me for below foolish writing. IMO all dirty data is useless on mounting phase, and can be safely dropped. (or IOW, all the dirty data before journal init should be dropped directly) I have tried to mark readonly when error happen, want to use readonly to skip write operations but still crash. I checked __ext4_fill_super, who calls ext4_load_journal lately too, but it uses specific free steps before function return. Maybe we could do the same clean job in ocfs2_fill_super or ocfs2_mount_volume(I prefer this func). This idea looks need to write some new functions. Thanks > > On 3/10/22 6:58 PM, Heming Zhao wrote: >> Call trace: >> >> ocfs2: Registered cluster interface user >> dlm: no local IP address has been set >> dlm: cannot start dlm lowcomms -107 >> (mount.ocfs2,2225,0):ocfs2_dlm_init:3355 ERROR: status = -107 >> (mount.ocfs2,2225,0):ocfs2_mount_volume:1817 ERROR: status = -107 >> (mount.ocfs2,2225,0):ocfs2_fill_super:1186 ERROR: status = -107 >> BUG: kernel NULL pointer dereference, address: 0000000000000030 >> #PF: supervisor read access in kernel mode >> #PF: error_code(0x0000) - not-present page >> PGD 0 P4D 0 >> Oops: 0000 [#1] PREEMPT SMP NOPTI >> CPU: 0 PID: 2225 Comm: mount.ocfs2 Not tainted 5.16.2-1-default #1 openSUSE Tumbleweed b40...e84 >> Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS ... >> RIP: 0010:ocfs2_clear_inode+0x2e9/0x720 [ocfs2] >> ... >> Call Trace: >> <TASK> >> ? ocfs2_evict_inode+0x1fe/0x630 [ocfs2 411bc..281] >> evict+0xc0/0x1c0 >> ocfs2_release_system_inodes+0x21/0xc0 [ocfs2 411bc..281] >> ocfs2_dismount_volume+0x10b/0x2d0 [ocfs2 411bc..281] >> ocfs2_fill_super+0xaf/0x19e0 [ocfs2 411bc..281] >> mount_bdev+0x182/0x1b0 >> ? ocfs2_initialize_super.isra.0+0xf50/0xf50 [ocfs2 411bc..281] >> legacy_get_tree+0x24/0x40 >> vfs_get_tree+0x22/0xb0 >> path_mount+0x465/0xac0 >> __x64_sys_mount+0x103/0x140 >> do_syscall_64+0x59/0x80 >> ? syscall_exit_to_user_mode+0x18/0x40 >> ? do_syscall_64+0x69/0x80 >> ? syscall_exit_to_user_mode+0x18/0x40 >> ? do_syscall_64+0x69/0x80 >> ? exc_page_fault+0x68/0x150 >> entry_SYSCALL_64_after_hwframe+0x44/0xae >> >> How to trigger: >> >> tb-ocfs1 # dd if=/dev/zero of=/dev/vdb bs=1M count=20 oflag=direct >> tb-ocfs1 # mkfs.ocfs2 --cluster-stack=pcmk -N 4 /dev/vdb --cluster-name=ocfstst >> >> == there is no dlm running == >> tb-ocfs1 # mount -t ocfs2 /dev/vdb /mnt >> == kernel crash == >> >> Analysis: >> >> After commit da5e7c87827e8 ("ocfs2: cleanup journal init and shutdown"), >> Journal init later than before, it makes NULL pointer access in free >> routine. >> >> ocfs2_fill_super >> + ocfs2_mount_volume >> | ocfs2_dlm_init //failed, osb->journal still NULL. >> + ... >> + ocfs2_dismount_volume >> ocfs2_release_system_inodes >> ... >> evict >> ... >> ocfs2_clear_inode >> ocfs2_checkpoint_inode >> ocfs2_ci_fully_checkpointed >> time_after(journal->j_trans_id, ci->ci_last_trans) >> + journal is empty, crash! >> >> The fix should assess osb->journal before journal operation. And the fix >> should be only related with abnormal case before ocfs2_journal_init(). >> >> Fixes: da5e7c87827e8 ("ocfs2: cleanup journal init and shutdown") >> Signed-off-by: Heming Zhao <heming.zhao@suse.com> >> --- >> v2: revise commit log >> add checking in ocfs2_dismount_volume() >> --- >> fs/ocfs2/inode.c | 3 ++- >> fs/ocfs2/journal.h | 2 +- >> fs/ocfs2/super.c | 7 ++++--- >> 3 files changed, 7 insertions(+), 5 deletions(-) >> >> diff --git a/fs/ocfs2/inode.c b/fs/ocfs2/inode.c >> index 6c2411c2afcf..3826ab7eab3e 100644 >> --- a/fs/ocfs2/inode.c >> +++ b/fs/ocfs2/inode.c >> @@ -1205,7 +1205,8 @@ static void ocfs2_clear_inode(struct inode *inode) >> * the journal is flushed before journal shutdown. Thus it is safe to >> * have inodes get cleaned up after journal shutdown. >> */ >> - jbd2_journal_release_jbd_inode(osb->journal->j_journal, >> + if (osb->journal) >> + jbd2_journal_release_jbd_inode(osb->journal->j_journal, >> &oi->ip_jinode); >> } >> >> diff --git a/fs/ocfs2/journal.h b/fs/ocfs2/journal.h >> index 8dcb2f2cadbc..1610d49b4112 100644 >> --- a/fs/ocfs2/journal.h >> +++ b/fs/ocfs2/journal.h >> @@ -189,7 +189,7 @@ static inline void ocfs2_checkpoint_inode(struct inode *inode) >> { >> struct ocfs2_super *osb = OCFS2_SB(inode->i_sb); >> >> - if (ocfs2_mount_local(osb)) >> + if (!osb->journal || ocfs2_mount_local(osb)) >> return; >> >> if (!ocfs2_ci_fully_checkpointed(INODE_CACHE(inode))) { >> diff --git a/fs/ocfs2/super.c b/fs/ocfs2/super.c >> index 2772dec9dcea..01f1ab7c8de8 100644 >> --- a/fs/ocfs2/super.c >> +++ b/fs/ocfs2/super.c >> @@ -1886,9 +1886,10 @@ static void ocfs2_dismount_volume(struct super_block *sb, int mnt_err) >> /* Wait for worker to be done with the work structure in osb */ >> cancel_work_sync(&osb->dquot_drop_work); >> >> - ocfs2_shutdown_local_alloc(osb); >> - >> - ocfs2_truncate_log_shutdown(osb); >> + if (osb->journal) { >> + ocfs2_shutdown_local_alloc(osb); >> + ocfs2_truncate_log_shutdown(osb); >> + } >> >> /* This will disable recovery and flush any recovery work. */ >> ocfs2_recovery_exit(osb); >
diff --git a/fs/ocfs2/inode.c b/fs/ocfs2/inode.c index 6c2411c2afcf..3826ab7eab3e 100644 --- a/fs/ocfs2/inode.c +++ b/fs/ocfs2/inode.c @@ -1205,7 +1205,8 @@ static void ocfs2_clear_inode(struct inode *inode) * the journal is flushed before journal shutdown. Thus it is safe to * have inodes get cleaned up after journal shutdown. */ - jbd2_journal_release_jbd_inode(osb->journal->j_journal, + if (osb->journal) + jbd2_journal_release_jbd_inode(osb->journal->j_journal, &oi->ip_jinode); } diff --git a/fs/ocfs2/journal.h b/fs/ocfs2/journal.h index 8dcb2f2cadbc..1610d49b4112 100644 --- a/fs/ocfs2/journal.h +++ b/fs/ocfs2/journal.h @@ -189,7 +189,7 @@ static inline void ocfs2_checkpoint_inode(struct inode *inode) { struct ocfs2_super *osb = OCFS2_SB(inode->i_sb); - if (ocfs2_mount_local(osb)) + if (!osb->journal || ocfs2_mount_local(osb)) return; if (!ocfs2_ci_fully_checkpointed(INODE_CACHE(inode))) { diff --git a/fs/ocfs2/super.c b/fs/ocfs2/super.c index 2772dec9dcea..01f1ab7c8de8 100644 --- a/fs/ocfs2/super.c +++ b/fs/ocfs2/super.c @@ -1886,9 +1886,10 @@ static void ocfs2_dismount_volume(struct super_block *sb, int mnt_err) /* Wait for worker to be done with the work structure in osb */ cancel_work_sync(&osb->dquot_drop_work); - ocfs2_shutdown_local_alloc(osb); - - ocfs2_truncate_log_shutdown(osb); + if (osb->journal) { + ocfs2_shutdown_local_alloc(osb); + ocfs2_truncate_log_shutdown(osb); + } /* This will disable recovery and flush any recovery work. */ ocfs2_recovery_exit(osb);
Call trace: ocfs2: Registered cluster interface user dlm: no local IP address has been set dlm: cannot start dlm lowcomms -107 (mount.ocfs2,2225,0):ocfs2_dlm_init:3355 ERROR: status = -107 (mount.ocfs2,2225,0):ocfs2_mount_volume:1817 ERROR: status = -107 (mount.ocfs2,2225,0):ocfs2_fill_super:1186 ERROR: status = -107 BUG: kernel NULL pointer dereference, address: 0000000000000030 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 0 P4D 0 Oops: 0000 [#1] PREEMPT SMP NOPTI CPU: 0 PID: 2225 Comm: mount.ocfs2 Not tainted 5.16.2-1-default #1 openSUSE Tumbleweed b40...e84 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS ... RIP: 0010:ocfs2_clear_inode+0x2e9/0x720 [ocfs2] ... Call Trace: <TASK> ? ocfs2_evict_inode+0x1fe/0x630 [ocfs2 411bc..281] evict+0xc0/0x1c0 ocfs2_release_system_inodes+0x21/0xc0 [ocfs2 411bc..281] ocfs2_dismount_volume+0x10b/0x2d0 [ocfs2 411bc..281] ocfs2_fill_super+0xaf/0x19e0 [ocfs2 411bc..281] mount_bdev+0x182/0x1b0 ? ocfs2_initialize_super.isra.0+0xf50/0xf50 [ocfs2 411bc..281] legacy_get_tree+0x24/0x40 vfs_get_tree+0x22/0xb0 path_mount+0x465/0xac0 __x64_sys_mount+0x103/0x140 do_syscall_64+0x59/0x80 ? syscall_exit_to_user_mode+0x18/0x40 ? do_syscall_64+0x69/0x80 ? syscall_exit_to_user_mode+0x18/0x40 ? do_syscall_64+0x69/0x80 ? exc_page_fault+0x68/0x150 entry_SYSCALL_64_after_hwframe+0x44/0xae How to trigger: tb-ocfs1 # dd if=/dev/zero of=/dev/vdb bs=1M count=20 oflag=direct tb-ocfs1 # mkfs.ocfs2 --cluster-stack=pcmk -N 4 /dev/vdb --cluster-name=ocfstst == there is no dlm running == tb-ocfs1 # mount -t ocfs2 /dev/vdb /mnt == kernel crash == Analysis: After commit da5e7c87827e8 ("ocfs2: cleanup journal init and shutdown"), Journal init later than before, it makes NULL pointer access in free routine. ocfs2_fill_super + ocfs2_mount_volume | ocfs2_dlm_init //failed, osb->journal still NULL. + ... + ocfs2_dismount_volume ocfs2_release_system_inodes ... evict ... ocfs2_clear_inode ocfs2_checkpoint_inode ocfs2_ci_fully_checkpointed time_after(journal->j_trans_id, ci->ci_last_trans) + journal is empty, crash! The fix should assess osb->journal before journal operation. And the fix should be only related with abnormal case before ocfs2_journal_init(). Fixes: da5e7c87827e8 ("ocfs2: cleanup journal init and shutdown") Signed-off-by: Heming Zhao <heming.zhao@suse.com> --- v2: revise commit log add checking in ocfs2_dismount_volume() --- fs/ocfs2/inode.c | 3 ++- fs/ocfs2/journal.h | 2 +- fs/ocfs2/super.c | 7 ++++--- 3 files changed, 7 insertions(+), 5 deletions(-)