diff mbox series

[v2] ocfs2: fix kernel crash after mounting when dlm doesn't ready

Message ID 20220310105850.1817-1-heming.zhao@suse.com (mailing list archive)
State New, archived
Headers show
Series [v2] ocfs2: fix kernel crash after mounting when dlm doesn't ready | expand

Commit Message

heming.zhao@suse.com March 10, 2022, 10:58 a.m. UTC
Call trace:

  ocfs2: Registered cluster interface user
  dlm: no local IP address has been set
  dlm: cannot start dlm lowcomms -107
  (mount.ocfs2,2225,0):ocfs2_dlm_init:3355 ERROR: status = -107
  (mount.ocfs2,2225,0):ocfs2_mount_volume:1817 ERROR: status = -107
  (mount.ocfs2,2225,0):ocfs2_fill_super:1186 ERROR: status = -107
  BUG: kernel NULL pointer dereference, address: 0000000000000030
  #PF: supervisor read access in kernel mode
  #PF: error_code(0x0000) - not-present page
  PGD 0 P4D 0
  Oops: 0000 [#1] PREEMPT SMP NOPTI
  CPU: 0 PID: 2225 Comm: mount.ocfs2 Not tainted 5.16.2-1-default #1 openSUSE Tumbleweed b40...e84
  Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS ...
  RIP: 0010:ocfs2_clear_inode+0x2e9/0x720 [ocfs2]
  ...
  Call Trace:
   <TASK>
   ? ocfs2_evict_inode+0x1fe/0x630 [ocfs2 411bc..281]
   evict+0xc0/0x1c0
   ocfs2_release_system_inodes+0x21/0xc0 [ocfs2 411bc..281]
   ocfs2_dismount_volume+0x10b/0x2d0 [ocfs2 411bc..281]
   ocfs2_fill_super+0xaf/0x19e0 [ocfs2 411bc..281]
   mount_bdev+0x182/0x1b0
   ? ocfs2_initialize_super.isra.0+0xf50/0xf50 [ocfs2 411bc..281]
   legacy_get_tree+0x24/0x40
   vfs_get_tree+0x22/0xb0
   path_mount+0x465/0xac0
   __x64_sys_mount+0x103/0x140
   do_syscall_64+0x59/0x80
   ? syscall_exit_to_user_mode+0x18/0x40
   ? do_syscall_64+0x69/0x80
   ? syscall_exit_to_user_mode+0x18/0x40
   ? do_syscall_64+0x69/0x80
   ? exc_page_fault+0x68/0x150
   entry_SYSCALL_64_after_hwframe+0x44/0xae

How to trigger:

  tb-ocfs1 # dd if=/dev/zero of=/dev/vdb bs=1M count=20 oflag=direct
  tb-ocfs1 # mkfs.ocfs2 --cluster-stack=pcmk -N 4 /dev/vdb --cluster-name=ocfstst

             == there is no dlm running ==
  tb-ocfs1 # mount -t ocfs2 /dev/vdb /mnt
             == kernel crash ==

Analysis:

After commit da5e7c87827e8 ("ocfs2: cleanup journal init and shutdown"),
Journal init later than before, it makes NULL pointer access in free
routine.

ocfs2_fill_super
 + ocfs2_mount_volume
 |  ocfs2_dlm_init //failed, osb->journal still NULL.
 + ...
 + ocfs2_dismount_volume
    ocfs2_release_system_inodes
      ...
       evict
        ...
         ocfs2_clear_inode
          ocfs2_checkpoint_inode
           ocfs2_ci_fully_checkpointed
            time_after(journal->j_trans_id, ci->ci_last_trans)
             + journal is empty, crash!

The fix should assess osb->journal before journal operation. And the fix
should be only related with abnormal case before ocfs2_journal_init().

Fixes: da5e7c87827e8 ("ocfs2: cleanup journal init and shutdown")
Signed-off-by: Heming Zhao <heming.zhao@suse.com>
---
v2: revise commit log
    add checking in ocfs2_dismount_volume()
---
 fs/ocfs2/inode.c   | 3 ++-
 fs/ocfs2/journal.h | 2 +-
 fs/ocfs2/super.c   | 7 ++++---
 3 files changed, 7 insertions(+), 5 deletions(-)

Comments

Joseph Qi March 11, 2022, 2:33 a.m. UTC | #1
Hi,
To prevent the crash, I think these changes are fine.
But it looks wried to spread the check of ocfs2 journal.
Can we intialize the ocfs2 journal much earlier?

Thanks,
Joseph

On 3/10/22 6:58 PM, Heming Zhao wrote:
> Call trace:
> 
>   ocfs2: Registered cluster interface user
>   dlm: no local IP address has been set
>   dlm: cannot start dlm lowcomms -107
>   (mount.ocfs2,2225,0):ocfs2_dlm_init:3355 ERROR: status = -107
>   (mount.ocfs2,2225,0):ocfs2_mount_volume:1817 ERROR: status = -107
>   (mount.ocfs2,2225,0):ocfs2_fill_super:1186 ERROR: status = -107
>   BUG: kernel NULL pointer dereference, address: 0000000000000030
>   #PF: supervisor read access in kernel mode
>   #PF: error_code(0x0000) - not-present page
>   PGD 0 P4D 0
>   Oops: 0000 [#1] PREEMPT SMP NOPTI
>   CPU: 0 PID: 2225 Comm: mount.ocfs2 Not tainted 5.16.2-1-default #1 openSUSE Tumbleweed b40...e84
>   Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS ...
>   RIP: 0010:ocfs2_clear_inode+0x2e9/0x720 [ocfs2]
>   ...
>   Call Trace:
>    <TASK>
>    ? ocfs2_evict_inode+0x1fe/0x630 [ocfs2 411bc..281]
>    evict+0xc0/0x1c0
>    ocfs2_release_system_inodes+0x21/0xc0 [ocfs2 411bc..281]
>    ocfs2_dismount_volume+0x10b/0x2d0 [ocfs2 411bc..281]
>    ocfs2_fill_super+0xaf/0x19e0 [ocfs2 411bc..281]
>    mount_bdev+0x182/0x1b0
>    ? ocfs2_initialize_super.isra.0+0xf50/0xf50 [ocfs2 411bc..281]
>    legacy_get_tree+0x24/0x40
>    vfs_get_tree+0x22/0xb0
>    path_mount+0x465/0xac0
>    __x64_sys_mount+0x103/0x140
>    do_syscall_64+0x59/0x80
>    ? syscall_exit_to_user_mode+0x18/0x40
>    ? do_syscall_64+0x69/0x80
>    ? syscall_exit_to_user_mode+0x18/0x40
>    ? do_syscall_64+0x69/0x80
>    ? exc_page_fault+0x68/0x150
>    entry_SYSCALL_64_after_hwframe+0x44/0xae
> 
> How to trigger:
> 
>   tb-ocfs1 # dd if=/dev/zero of=/dev/vdb bs=1M count=20 oflag=direct
>   tb-ocfs1 # mkfs.ocfs2 --cluster-stack=pcmk -N 4 /dev/vdb --cluster-name=ocfstst
> 
>              == there is no dlm running ==
>   tb-ocfs1 # mount -t ocfs2 /dev/vdb /mnt
>              == kernel crash ==
> 
> Analysis:
> 
> After commit da5e7c87827e8 ("ocfs2: cleanup journal init and shutdown"),
> Journal init later than before, it makes NULL pointer access in free
> routine.
> 
> ocfs2_fill_super
>  + ocfs2_mount_volume
>  |  ocfs2_dlm_init //failed, osb->journal still NULL.
>  + ...
>  + ocfs2_dismount_volume
>     ocfs2_release_system_inodes
>       ...
>        evict
>         ...
>          ocfs2_clear_inode
>           ocfs2_checkpoint_inode
>            ocfs2_ci_fully_checkpointed
>             time_after(journal->j_trans_id, ci->ci_last_trans)
>              + journal is empty, crash!
> 
> The fix should assess osb->journal before journal operation. And the fix
> should be only related with abnormal case before ocfs2_journal_init().
> 
> Fixes: da5e7c87827e8 ("ocfs2: cleanup journal init and shutdown")
> Signed-off-by: Heming Zhao <heming.zhao@suse.com>
> ---
> v2: revise commit log
>     add checking in ocfs2_dismount_volume()
> ---
>  fs/ocfs2/inode.c   | 3 ++-
>  fs/ocfs2/journal.h | 2 +-
>  fs/ocfs2/super.c   | 7 ++++---
>  3 files changed, 7 insertions(+), 5 deletions(-)
> 
> diff --git a/fs/ocfs2/inode.c b/fs/ocfs2/inode.c
> index 6c2411c2afcf..3826ab7eab3e 100644
> --- a/fs/ocfs2/inode.c
> +++ b/fs/ocfs2/inode.c
> @@ -1205,7 +1205,8 @@ static void ocfs2_clear_inode(struct inode *inode)
>  	 * the journal is flushed before journal shutdown. Thus it is safe to
>  	 * have inodes get cleaned up after journal shutdown.
>  	 */
> -	jbd2_journal_release_jbd_inode(osb->journal->j_journal,
> +	if (osb->journal)
> +		jbd2_journal_release_jbd_inode(osb->journal->j_journal,
>  				       &oi->ip_jinode);
>  }
>  
> diff --git a/fs/ocfs2/journal.h b/fs/ocfs2/journal.h
> index 8dcb2f2cadbc..1610d49b4112 100644
> --- a/fs/ocfs2/journal.h
> +++ b/fs/ocfs2/journal.h
> @@ -189,7 +189,7 @@ static inline void ocfs2_checkpoint_inode(struct inode *inode)
>  {
>  	struct ocfs2_super *osb = OCFS2_SB(inode->i_sb);
>  
> -	if (ocfs2_mount_local(osb))
> +	if (!osb->journal || ocfs2_mount_local(osb))
>  		return;
>  
>  	if (!ocfs2_ci_fully_checkpointed(INODE_CACHE(inode))) {
> diff --git a/fs/ocfs2/super.c b/fs/ocfs2/super.c
> index 2772dec9dcea..01f1ab7c8de8 100644
> --- a/fs/ocfs2/super.c
> +++ b/fs/ocfs2/super.c
> @@ -1886,9 +1886,10 @@ static void ocfs2_dismount_volume(struct super_block *sb, int mnt_err)
>  	/* Wait for worker to be done with the work structure in osb */
>  	cancel_work_sync(&osb->dquot_drop_work);
>  
> -	ocfs2_shutdown_local_alloc(osb);
> -
> -	ocfs2_truncate_log_shutdown(osb);
> +	if (osb->journal) {
> +		ocfs2_shutdown_local_alloc(osb);
> +		ocfs2_truncate_log_shutdown(osb);
> +	}
>  
>  	/* This will disable recovery and flush any recovery work. */
>  	ocfs2_recovery_exit(osb);
David Sterba via Ocfs2-devel March 11, 2022, 8:10 a.m. UTC | #2
On 3/11/22 10:33, Joseph Qi wrote:
> Hi,
> To prevent the crash, I think these changes are fine.
> But it looks wried to spread the check of ocfs2 journal.

Yes, I have same feeling when I'm making patch.

> Can we intialize the ocfs2 journal much earlier?
> 

calling it earlier only makes the gap more narrow not disappear.
(another way is to revert the commit da5e7c87827e8, it will make author unhappy)

I'm just learning ocfs2 code, don't have too much knowledge.
please forgive me for below foolish writing.

IMO all dirty data is useless on mounting phase, and can be safely dropped.

(or IOW, all the dirty data before journal init should be dropped directly)
I have tried to mark readonly when error happen, want to use readonly to skip
write operations but still crash.


I checked __ext4_fill_super, who calls ext4_load_journal lately too, but it uses
specific free steps before function return. Maybe we could do the same clean job
in ocfs2_fill_super or ocfs2_mount_volume(I prefer this func). This idea looks
need to write some new functions.

Thanks
> 
> On 3/10/22 6:58 PM, Heming Zhao wrote:
>> Call trace:
>>
>>    ocfs2: Registered cluster interface user
>>    dlm: no local IP address has been set
>>    dlm: cannot start dlm lowcomms -107
>>    (mount.ocfs2,2225,0):ocfs2_dlm_init:3355 ERROR: status = -107
>>    (mount.ocfs2,2225,0):ocfs2_mount_volume:1817 ERROR: status = -107
>>    (mount.ocfs2,2225,0):ocfs2_fill_super:1186 ERROR: status = -107
>>    BUG: kernel NULL pointer dereference, address: 0000000000000030
>>    #PF: supervisor read access in kernel mode
>>    #PF: error_code(0x0000) - not-present page
>>    PGD 0 P4D 0
>>    Oops: 0000 [#1] PREEMPT SMP NOPTI
>>    CPU: 0 PID: 2225 Comm: mount.ocfs2 Not tainted 5.16.2-1-default #1 openSUSE Tumbleweed b40...e84
>>    Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS ...
>>    RIP: 0010:ocfs2_clear_inode+0x2e9/0x720 [ocfs2]
>>    ...
>>    Call Trace:
>>     <TASK>
>>     ? ocfs2_evict_inode+0x1fe/0x630 [ocfs2 411bc..281]
>>     evict+0xc0/0x1c0
>>     ocfs2_release_system_inodes+0x21/0xc0 [ocfs2 411bc..281]
>>     ocfs2_dismount_volume+0x10b/0x2d0 [ocfs2 411bc..281]
>>     ocfs2_fill_super+0xaf/0x19e0 [ocfs2 411bc..281]
>>     mount_bdev+0x182/0x1b0
>>     ? ocfs2_initialize_super.isra.0+0xf50/0xf50 [ocfs2 411bc..281]
>>     legacy_get_tree+0x24/0x40
>>     vfs_get_tree+0x22/0xb0
>>     path_mount+0x465/0xac0
>>     __x64_sys_mount+0x103/0x140
>>     do_syscall_64+0x59/0x80
>>     ? syscall_exit_to_user_mode+0x18/0x40
>>     ? do_syscall_64+0x69/0x80
>>     ? syscall_exit_to_user_mode+0x18/0x40
>>     ? do_syscall_64+0x69/0x80
>>     ? exc_page_fault+0x68/0x150
>>     entry_SYSCALL_64_after_hwframe+0x44/0xae
>>
>> How to trigger:
>>
>>    tb-ocfs1 # dd if=/dev/zero of=/dev/vdb bs=1M count=20 oflag=direct
>>    tb-ocfs1 # mkfs.ocfs2 --cluster-stack=pcmk -N 4 /dev/vdb --cluster-name=ocfstst
>>
>>               == there is no dlm running ==
>>    tb-ocfs1 # mount -t ocfs2 /dev/vdb /mnt
>>               == kernel crash ==
>>
>> Analysis:
>>
>> After commit da5e7c87827e8 ("ocfs2: cleanup journal init and shutdown"),
>> Journal init later than before, it makes NULL pointer access in free
>> routine.
>>
>> ocfs2_fill_super
>>   + ocfs2_mount_volume
>>   |  ocfs2_dlm_init //failed, osb->journal still NULL.
>>   + ...
>>   + ocfs2_dismount_volume
>>      ocfs2_release_system_inodes
>>        ...
>>         evict
>>          ...
>>           ocfs2_clear_inode
>>            ocfs2_checkpoint_inode
>>             ocfs2_ci_fully_checkpointed
>>              time_after(journal->j_trans_id, ci->ci_last_trans)
>>               + journal is empty, crash!
>>
>> The fix should assess osb->journal before journal operation. And the fix
>> should be only related with abnormal case before ocfs2_journal_init().
>>
>> Fixes: da5e7c87827e8 ("ocfs2: cleanup journal init and shutdown")
>> Signed-off-by: Heming Zhao <heming.zhao@suse.com>
>> ---
>> v2: revise commit log
>>      add checking in ocfs2_dismount_volume()
>> ---
>>   fs/ocfs2/inode.c   | 3 ++-
>>   fs/ocfs2/journal.h | 2 +-
>>   fs/ocfs2/super.c   | 7 ++++---
>>   3 files changed, 7 insertions(+), 5 deletions(-)
>>
>> diff --git a/fs/ocfs2/inode.c b/fs/ocfs2/inode.c
>> index 6c2411c2afcf..3826ab7eab3e 100644
>> --- a/fs/ocfs2/inode.c
>> +++ b/fs/ocfs2/inode.c
>> @@ -1205,7 +1205,8 @@ static void ocfs2_clear_inode(struct inode *inode)
>>   	 * the journal is flushed before journal shutdown. Thus it is safe to
>>   	 * have inodes get cleaned up after journal shutdown.
>>   	 */
>> -	jbd2_journal_release_jbd_inode(osb->journal->j_journal,
>> +	if (osb->journal)
>> +		jbd2_journal_release_jbd_inode(osb->journal->j_journal,
>>   				       &oi->ip_jinode);
>>   }
>>   
>> diff --git a/fs/ocfs2/journal.h b/fs/ocfs2/journal.h
>> index 8dcb2f2cadbc..1610d49b4112 100644
>> --- a/fs/ocfs2/journal.h
>> +++ b/fs/ocfs2/journal.h
>> @@ -189,7 +189,7 @@ static inline void ocfs2_checkpoint_inode(struct inode *inode)
>>   {
>>   	struct ocfs2_super *osb = OCFS2_SB(inode->i_sb);
>>   
>> -	if (ocfs2_mount_local(osb))
>> +	if (!osb->journal || ocfs2_mount_local(osb))
>>   		return;
>>   
>>   	if (!ocfs2_ci_fully_checkpointed(INODE_CACHE(inode))) {
>> diff --git a/fs/ocfs2/super.c b/fs/ocfs2/super.c
>> index 2772dec9dcea..01f1ab7c8de8 100644
>> --- a/fs/ocfs2/super.c
>> +++ b/fs/ocfs2/super.c
>> @@ -1886,9 +1886,10 @@ static void ocfs2_dismount_volume(struct super_block *sb, int mnt_err)
>>   	/* Wait for worker to be done with the work structure in osb */
>>   	cancel_work_sync(&osb->dquot_drop_work);
>>   
>> -	ocfs2_shutdown_local_alloc(osb);
>> -
>> -	ocfs2_truncate_log_shutdown(osb);
>> +	if (osb->journal) {
>> +		ocfs2_shutdown_local_alloc(osb);
>> +		ocfs2_truncate_log_shutdown(osb);
>> +	}
>>   
>>   	/* This will disable recovery and flush any recovery work. */
>>   	ocfs2_recovery_exit(osb);
>
diff mbox series

Patch

diff --git a/fs/ocfs2/inode.c b/fs/ocfs2/inode.c
index 6c2411c2afcf..3826ab7eab3e 100644
--- a/fs/ocfs2/inode.c
+++ b/fs/ocfs2/inode.c
@@ -1205,7 +1205,8 @@  static void ocfs2_clear_inode(struct inode *inode)
 	 * the journal is flushed before journal shutdown. Thus it is safe to
 	 * have inodes get cleaned up after journal shutdown.
 	 */
-	jbd2_journal_release_jbd_inode(osb->journal->j_journal,
+	if (osb->journal)
+		jbd2_journal_release_jbd_inode(osb->journal->j_journal,
 				       &oi->ip_jinode);
 }
 
diff --git a/fs/ocfs2/journal.h b/fs/ocfs2/journal.h
index 8dcb2f2cadbc..1610d49b4112 100644
--- a/fs/ocfs2/journal.h
+++ b/fs/ocfs2/journal.h
@@ -189,7 +189,7 @@  static inline void ocfs2_checkpoint_inode(struct inode *inode)
 {
 	struct ocfs2_super *osb = OCFS2_SB(inode->i_sb);
 
-	if (ocfs2_mount_local(osb))
+	if (!osb->journal || ocfs2_mount_local(osb))
 		return;
 
 	if (!ocfs2_ci_fully_checkpointed(INODE_CACHE(inode))) {
diff --git a/fs/ocfs2/super.c b/fs/ocfs2/super.c
index 2772dec9dcea..01f1ab7c8de8 100644
--- a/fs/ocfs2/super.c
+++ b/fs/ocfs2/super.c
@@ -1886,9 +1886,10 @@  static void ocfs2_dismount_volume(struct super_block *sb, int mnt_err)
 	/* Wait for worker to be done with the work structure in osb */
 	cancel_work_sync(&osb->dquot_drop_work);
 
-	ocfs2_shutdown_local_alloc(osb);
-
-	ocfs2_truncate_log_shutdown(osb);
+	if (osb->journal) {
+		ocfs2_shutdown_local_alloc(osb);
+		ocfs2_truncate_log_shutdown(osb);
+	}
 
 	/* This will disable recovery and flush any recovery work. */
 	ocfs2_recovery_exit(osb);