fix kernel crash after mounting when dlm isn't ready

Message ID 20220309133020.1869-1-heming.zhao@suse.com (mailing list archive)
State New, archived

Commit Message

heming.zhao@suse.com March 9, 2022, 1:30 p.m. UTC
How to trigger:

 tb-ocfs1 # dd if=/dev/zero of=/dev/vdb bs=1M count=20 oflag=direct
 tb-ocfs1 # mkfs.ocfs2 --cluster-stack=pcmk -N 4 /dev/vdb \
            --cluster-name=ocfstst
            == there is no dlm running ==
 tb-ocfs1 # mount -t ocfs2 /dev/vdb /mnt
            == kernel crash ==

Crash log

```
kernel: DLM installed
kernel: ocfs2: Registered cluster interface user
kernel: dlm: no local IP address has been set
kernel: dlm: cannot start dlm lowcomms -107
kernel: (mount.ocfs2,2225,0):ocfs2_dlm_init:3355 ERROR: status = -107
kernel: (mount.ocfs2,2225,0):ocfs2_mount_volume:1817 ERROR: status = -107
kernel: (mount.ocfs2,2225,0):ocfs2_fill_super:1186 ERROR: status = -107
kernel: BUG: kernel NULL pointer dereference, address: 0000000000000030
kernel: #PF: supervisor read access in kernel mode
kernel: #PF: error_code(0x0000) - not-present page
kernel: PGD 0 P4D 0
kernel: Oops: 0000 [#1] PREEMPT SMP NOPTI
kernel: CPU: 0 PID: 2225 Comm: mount.ocfs2 Not tainted 5.16.2-1-default #1 openSUSE Tumbleweed b40a195b7ff0f3399a616c3290f963c4ad189e84
kernel: Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.14.0-0-g155821a-rebuilt.opensuse.org 04/01/2014
kernel: RIP: 0010:ocfs2_clear_inode+0x2e9/0x720 [ocfs2]
kernel: Code: 24 e8 9b 68 04 00 48 c7 c7 70 88 db c0 48 8b 80 98 03 00
00 48 8b 80 70 01 00 00 48 89 44 24 08 e8 3c 50 d8 d8 48 8b 44 24 08
<48> 8b 48 30 49 39 4f c8 0f 88 ff 00 00 00 48 c>
kernel: RSP: 0018:ffffbbf000847bf0 EFLAGS: 00010246
kernel: RAX: 0000000000000000 RBX: ffff95f6834b8000 RCX: 0000000000000000
kernel: RDX: 0000000000000001 RSI: ffff95f6a1fbcbe0 RDI: ffffffffc0db8870
kernel: RBP: ffff95f6a1fbc6b8 R08: 00000ab5a9371b7a R09: 0000000000000230
kernel: R10: ffffbbf000847bc0 R11: ffffffffc0d53ea0 R12: ffff95f6a1fbc560
kernel: R13: ffff95f6a1fbc408 R14: ffff95f6834b8000 R15: ffff95f6a1fbc908
kernel: FS:  00007f366f151740(0000) GS:ffff95f6fdc00000(0000) knlGS:0000000000000000
kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
kernel: CR2: 0000000000000030 CR3: 0000000003fcc004 CR4: 0000000000370ef0
kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
kernel: Call Trace:
kernel:  <TASK>
kernel:  ? ocfs2_evict_inode+0x1fe/0x630 [ocfs2 411bc..281]
kernel:  evict+0xc0/0x1c0
kernel:  ocfs2_release_system_inodes+0x21/0xc0 [ocfs2 411bc..281]
kernel:  ocfs2_dismount_volume+0x10b/0x2d0 [ocfs2 411bc..281]
kernel:  ocfs2_fill_super+0xaf/0x19e0 [ocfs2 411bc..281]
kernel:  mount_bdev+0x182/0x1b0
kernel:  ? ocfs2_initialize_super.isra.0+0xf50/0xf50 [ocfs2 411bc..281]
kernel:  legacy_get_tree+0x24/0x40
kernel:  vfs_get_tree+0x22/0xb0
kernel:  path_mount+0x465/0xac0
kernel:  __x64_sys_mount+0x103/0x140
kernel:  do_syscall_64+0x59/0x80
kernel:  ? syscall_exit_to_user_mode+0x18/0x40
kernel:  ? do_syscall_64+0x69/0x80
kernel:  ? syscall_exit_to_user_mode+0x18/0x40
kernel:  ? do_syscall_64+0x69/0x80
kernel:  ? exc_page_fault+0x68/0x150
kernel:  entry_SYSCALL_64_after_hwframe+0x44/0xae
```

Analysis:

ocfs2_fill_super
 ocfs2_mount_volume
  ocfs2_dlm_init // failed; the journal is not initialized yet
   goto read_super_error
    ocfs2_dismount_volume
     ocfs2_release_system_inodes
      ...
       evict
        ...
         ocfs2_clear_inode
          ocfs2_checkpoint_inode
           ocfs2_ci_fully_checkpointed
            time_after(journal->j_trans_id, ci->ci_last_trans)
             + journal is NULL, crash!

Signed-off-by: Heming Zhao <heming.zhao@suse.com>
---
 fs/ocfs2/inode.c   | 3 ++-
 fs/ocfs2/journal.h | 2 +-
 2 files changed, 3 insertions(+), 2 deletions(-)
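
The faulting address in the log (CR2 = 0x30, with RAX = 0) fits a read of a field 0x30 bytes into a NULL journal pointer: the bytes marked `<48> 8b 48 30` in the Code: line decode to a 64-bit load from `[rax+0x30]`. The user-space sketch below only illustrates that arithmetic. `struct model_journal` and both helper functions are hypothetical, and the padding is an assumption chosen to place the field at 0x30; it does not reproduce the real `struct ocfs2_journal` layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for the journal structure. The padding exists only
 * so that trans_id sits at offset 0x30, matching the faulting address in
 * the crash log; it is not the real struct ocfs2_journal layout.
 */
struct model_journal {
	uint64_t pad[6];	/* 6 * 8 = 48 = 0x30 bytes of earlier fields */
	uint64_t trans_id;	/* stands in for j_trans_id */
};

static_assert(offsetof(struct model_journal, trans_id) == 0x30,
	      "model field must sit at offset 0x30");

/* Address touched when reading a field `field_offset` bytes into `base`. */
static uintptr_t model_access_address(uintptr_t base, size_t field_offset)
{
	return base + field_offset;
}

/* Address touched when the journal pointer is NULL (base address 0). */
static uintptr_t model_null_journal_fault(void)
{
	return model_access_address(0, offsetof(struct model_journal, trans_id));
}
```

Under these assumptions, `model_null_journal_fault()` yields 0x30, the same value the oops reports as the faulting address.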

Comments

Joseph Qi March 10, 2022, 3:59 a.m. UTC | #1
It looks like this issue was introduced by the following commit:
da5e7c87827e ocfs2: cleanup journal init and shutdown

Before that, journal was initialized in ocfs2_initialize_super().

On 3/9/22 9:30 PM, Heming Zhao wrote:
> How to trigger:
> 
>  tb-ocfs1 # dd if=/dev/zero of=/dev/vdb bs=1M count=20 oflag=direct
>  tb-ocfs1 # mkfs.ocfs2 --cluster-stack=pcmk -N 4 /dev/vdb \
>             --cluster-name=ocfstst
>             == there is no dlm running ==
>  tb-ocfs1 # mount -t ocfs2 /dev/vdb /mnt
>             == kernel crash ==
> 
> Crash log
> 
> ```
> kernel: DLM installed
> kernel: ocfs2: Registered cluster interface user
> kernel: dlm: no local IP address has been set
> kernel: dlm: cannot start dlm lowcomms -107
> kernel: (mount.ocfs2,2225,0):ocfs2_dlm_init:3355 ERROR: status = -107
> kernel: (mount.ocfs2,2225,0):ocfs2_mount_volume:1817 ERROR: status = -107
> kernel: (mount.ocfs2,2225,0):ocfs2_fill_super:1186 ERROR: status = -107
> kernel: BUG: kernel NULL pointer dereference, address: 0000000000000030
> kernel: #PF: supervisor read access in kernel mode
> kernel: #PF: error_code(0x0000) - not-present page
> kernel: PGD 0 P4D 0
> kernel: Oops: 0000 [#1] PREEMPT SMP NOPTI
> kernel: CPU: 0 PID: 2225 Comm: mount.ocfs2 Not tainted 5.16.2-1-default #1 openSUSE Tumbleweed b40a195b7ff0f3399a616c3290f963c4ad189e84
> kernel: Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.14.0-0-g155821a-rebuilt.opensuse.org 04/01/2014
> kernel: RIP: 0010:ocfs2_clear_inode+0x2e9/0x720 [ocfs2]
> kernel: Code: 24 e8 9b 68 04 00 48 c7 c7 70 88 db c0 48 8b 80 98 03 00
> 00 48 8b 80 70 01 00 00 48 89 44 24 08 e8 3c 50 d8 d8 48 8b 44 24 08
> <48> 8b 48 30 49 39 4f c8 0f 88 ff 00 00 00 48 c>
> kernel: RSP: 0018:ffffbbf000847bf0 EFLAGS: 00010246
> kernel: RAX: 0000000000000000 RBX: ffff95f6834b8000 RCX: 0000000000000000
> kernel: RDX: 0000000000000001 RSI: ffff95f6a1fbcbe0 RDI: ffffffffc0db8870
> kernel: RBP: ffff95f6a1fbc6b8 R08: 00000ab5a9371b7a R09: 0000000000000230
> kernel: R10: ffffbbf000847bc0 R11: ffffffffc0d53ea0 R12: ffff95f6a1fbc560
> kernel: R13: ffff95f6a1fbc408 R14: ffff95f6834b8000 R15: ffff95f6a1fbc908
> kernel: FS:  00007f366f151740(0000) GS:ffff95f6fdc00000(0000) knlGS:0000000000000000
> kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> kernel: CR2: 0000000000000030 CR3: 0000000003fcc004 CR4: 0000000000370ef0
> kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> kernel: Call Trace:
> kernel:  <TASK>
> kernel:  ? ocfs2_evict_inode+0x1fe/0x630 [ocfs2 411bc..281]
> kernel:  evict+0xc0/0x1c0
> kernel:  ocfs2_release_system_inodes+0x21/0xc0 [ocfs2 411bc..281]
> kernel:  ocfs2_dismount_volume+0x10b/0x2d0 [ocfs2 411bc..281]
> kernel:  ocfs2_fill_super+0xaf/0x19e0 [ocfs2 411bc..281]
> kernel:  mount_bdev+0x182/0x1b0
> kernel:  ? ocfs2_initialize_super.isra.0+0xf50/0xf50 [ocfs2 411bc..281]
> kernel:  legacy_get_tree+0x24/0x40
> kernel:  vfs_get_tree+0x22/0xb0
> kernel:  path_mount+0x465/0xac0
> kernel:  __x64_sys_mount+0x103/0x140
> kernel:  do_syscall_64+0x59/0x80
> kernel:  ? syscall_exit_to_user_mode+0x18/0x40
> kernel:  ? do_syscall_64+0x69/0x80
> kernel:  ? syscall_exit_to_user_mode+0x18/0x40
> kernel:  ? do_syscall_64+0x69/0x80
> kernel:  ? exc_page_fault+0x68/0x150
> kernel:  entry_SYSCALL_64_after_hwframe+0x44/0xae
> ```
> 
> Analysis:
> 
> ocfs2_fill_super
>  ocfs2_mount_volume
>   ocfs2_dlm_init // failed; the journal is not initialized yet
>    goto read_super_error
This is not part of the call trace; I don't think it is needed here.

>     ocfs2_dismount_volume
>      ocfs2_release_system_inodes
>       ...
>        evict
>         ...
>          ocfs2_clear_inode
>           ocfs2_checkpoint_inode
>            ocfs2_ci_fully_checkpointed
>             time_after(journal->j_trans_id, ci->ci_last_trans)
>              + journal is NULL, crash!
> 

I suggest describing this commit in the following order:
<call trace>
<reproducer>
<your analysis and how to fix>

BTW, as I mentioned at the beginning, a Fixes tag should be added here.
Could you also please check all possible use of journal during
ocfs2_dismount_volume()?

Thanks,
Joseph

> Signed-off-by: Heming Zhao <heming.zhao@suse.com>
> ---
>  fs/ocfs2/inode.c   | 3 ++-
>  fs/ocfs2/journal.h | 2 +-
>  2 files changed, 3 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/ocfs2/inode.c b/fs/ocfs2/inode.c
> index 6c2411c2afcf..3826ab7eab3e 100644
> --- a/fs/ocfs2/inode.c
> +++ b/fs/ocfs2/inode.c
> @@ -1205,7 +1205,8 @@ static void ocfs2_clear_inode(struct inode *inode)
>  	 * the journal is flushed before journal shutdown. Thus it is safe to
>  	 * have inodes get cleaned up after journal shutdown.
>  	 */
> -	jbd2_journal_release_jbd_inode(osb->journal->j_journal,
> +	if (osb->journal)
> +		jbd2_journal_release_jbd_inode(osb->journal->j_journal,
>  				       &oi->ip_jinode);
>  }
>  
> diff --git a/fs/ocfs2/journal.h b/fs/ocfs2/journal.h
> index 8dcb2f2cadbc..1610d49b4112 100644
> --- a/fs/ocfs2/journal.h
> +++ b/fs/ocfs2/journal.h
> @@ -189,7 +189,7 @@ static inline void ocfs2_checkpoint_inode(struct inode *inode)
>  {
>  	struct ocfs2_super *osb = OCFS2_SB(inode->i_sb);
>  
> -	if (ocfs2_mount_local(osb))
> +	if (!osb->journal || ocfs2_mount_local(osb))
>  		return;
>  
>  	if (!ocfs2_ci_fully_checkpointed(INODE_CACHE(inode))) {
Heming Zhao via Ocfs2-devel March 10, 2022, 8:16 a.m. UTC | #2
Hello Joseph,

Thank you for your review.

On 3/10/22 11:59, Joseph Qi wrote:
> It looks like this issue was introduced by the following commit:
> da5e7c87827e ocfs2: cleanup journal init and shutdown
> 
> Before that, journal was initialized in ocfs2_initialize_super().

I agree, and will add a Fixes tag in v2.

> 
> On 3/9/22 9:30 PM, Heming Zhao wrote:
>> How to trigger:
>>
>>   tb-ocfs1 # dd if=/dev/zero of=/dev/vdb bs=1M count=20 oflag=direct
>>   tb-ocfs1 # mkfs.ocfs2 --cluster-stack=pcmk -N 4 /dev/vdb \
>>              --cluster-name=ocfstst
>>              == there is no dlm running ==
>>   tb-ocfs1 # mount -t ocfs2 /dev/vdb /mnt
>>              == kernel crash ==
>>
>> Crash log
>>
>> ```
>> kernel: DLM installed
>> kernel: ocfs2: Registered cluster interface user
>> kernel: dlm: no local IP address has been set
>> kernel: dlm: cannot start dlm lowcomms -107
>> kernel: (mount.ocfs2,2225,0):ocfs2_dlm_init:3355 ERROR: status = -107
>> kernel: (mount.ocfs2,2225,0):ocfs2_mount_volume:1817 ERROR: status = -107
>> kernel: (mount.ocfs2,2225,0):ocfs2_fill_super:1186 ERROR: status = -107
>> kernel: BUG: kernel NULL pointer dereference, address: 0000000000000030
>> kernel: #PF: supervisor read access in kernel mode
>> kernel: #PF: error_code(0x0000) - not-present page
>> kernel: PGD 0 P4D 0
>> kernel: Oops: 0000 [#1] PREEMPT SMP NOPTI
>> kernel: CPU: 0 PID: 2225 Comm: mount.ocfs2 Not tainted 5.16.2-1-default #1 openSUSE Tumbleweed b40a195b7ff0f3399a616c3290f963c4ad189e84
>> kernel: Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.14.0-0-g155821a-rebuilt.opensuse.org 04/01/2014
>> kernel: RIP: 0010:ocfs2_clear_inode+0x2e9/0x720 [ocfs2]
>> kernel: Code: 24 e8 9b 68 04 00 48 c7 c7 70 88 db c0 48 8b 80 98 03 00
>> 00 48 8b 80 70 01 00 00 48 89 44 24 08 e8 3c 50 d8 d8 48 8b 44 24 08
>> <48> 8b 48 30 49 39 4f c8 0f 88 ff 00 00 00 48 c>
>> kernel: RSP: 0018:ffffbbf000847bf0 EFLAGS: 00010246
>> kernel: RAX: 0000000000000000 RBX: ffff95f6834b8000 RCX: 0000000000000000
>> kernel: RDX: 0000000000000001 RSI: ffff95f6a1fbcbe0 RDI: ffffffffc0db8870
>> kernel: RBP: ffff95f6a1fbc6b8 R08: 00000ab5a9371b7a R09: 0000000000000230
>> kernel: R10: ffffbbf000847bc0 R11: ffffffffc0d53ea0 R12: ffff95f6a1fbc560
>> kernel: R13: ffff95f6a1fbc408 R14: ffff95f6834b8000 R15: ffff95f6a1fbc908
>> kernel: FS:  00007f366f151740(0000) GS:ffff95f6fdc00000(0000) knlGS:0000000000000000
>> kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> kernel: CR2: 0000000000000030 CR3: 0000000003fcc004 CR4: 0000000000370ef0
>> kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>> kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>> kernel: Call Trace:
>> kernel:  <TASK>
>> kernel:  ? ocfs2_evict_inode+0x1fe/0x630 [ocfs2 411bc..281]
>> kernel:  evict+0xc0/0x1c0
>> kernel:  ocfs2_release_system_inodes+0x21/0xc0 [ocfs2 411bc..281]
>> kernel:  ocfs2_dismount_volume+0x10b/0x2d0 [ocfs2 411bc..281]
>> kernel:  ocfs2_fill_super+0xaf/0x19e0 [ocfs2 411bc..281]
>> kernel:  mount_bdev+0x182/0x1b0
>> kernel:  ? ocfs2_initialize_super.isra.0+0xf50/0xf50 [ocfs2 411bc..281]
>> kernel:  legacy_get_tree+0x24/0x40
>> kernel:  vfs_get_tree+0x22/0xb0
>> kernel:  path_mount+0x465/0xac0
>> kernel:  __x64_sys_mount+0x103/0x140
>> kernel:  do_syscall_64+0x59/0x80
>> kernel:  ? syscall_exit_to_user_mode+0x18/0x40
>> kernel:  ? do_syscall_64+0x69/0x80
>> kernel:  ? syscall_exit_to_user_mode+0x18/0x40
>> kernel:  ? do_syscall_64+0x69/0x80
>> kernel:  ? exc_page_fault+0x68/0x150
>> kernel:  entry_SYSCALL_64_after_hwframe+0x44/0xae
>> ```
>>
>> Analysis:
>>
>> ocfs2_fill_super
>>   ocfs2_mount_volume
>>    ocfs2_dlm_init // failed; the journal is not initialized yet
>>     goto read_super_error
> This is not part of the call trace; I don't think it is needed here.

OK, will drop in v2.

> 
>>      ocfs2_dismount_volume
>>       ocfs2_release_system_inodes
>>        ...
>>         evict
>>          ...
>>           ocfs2_clear_inode
>>            ocfs2_checkpoint_inode
>>             ocfs2_ci_fully_checkpointed
>>              time_after(journal->j_trans_id, ci->ci_last_trans)
>>               + journal is NULL, crash!
>>
> 
> I suggest describing this commit in the following order:
> <call trace>
> <reproducer>
> <your analysis and how to fix>

OK, will follow this style.

> 
> BTW, as I mentioned at the beginning, a Fixes tag should be added here.
> Could you also please check all possible use of journal during
> ocfs2_dismount_volume()?

I verified my patch before posting v1; with it applied, the no-dlm case no longer crashes.

Following your comment, there may be other places in ocfs2_dismount_volume() that need fixing.
The teardown operations in ocfs2_dismount_volume() that undo the setup steps in the range
[ocfs2_initialize_super, ocfs2_journal_init) should be checked.
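
As a rough illustration of that audit, the sketch below models a mount that runs its setup stages in order and a dismount that undoes only the stages that actually completed. Every name in it (`model_stage`, `model_mount`, `model_mount_volume`, `model_dismount_volume`) is hypothetical, and the stage order simply follows the analysis in this thread (dlm before journal). It is not how ocfs2 tracks its state.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical setup stages, in the order the mount path would run them. */
enum model_stage { STAGE_SUPER, STAGE_DLM, STAGE_JOURNAL, STAGE_COUNT };

struct model_mount {
	bool done[STAGE_COUNT];		/* stages that completed */
	bool undone[STAGE_COUNT];	/* stages that teardown reversed */
};

/*
 * Run the stages in order and stop at `fail_at`.
 * Pass STAGE_COUNT to simulate a mount with no failure.
 * Returns 0 on success, -1 if a stage failed.
 */
static int model_mount_volume(struct model_mount *m, enum model_stage fail_at)
{
	assert(m != NULL);
	for (int s = 0; s < STAGE_COUNT; s++) {
		if (s == (int)fail_at)
			return -1;
		m->done[s] = true;
	}
	return 0;
}

/* Undo in reverse order, touching only what was really set up. */
static void model_dismount_volume(struct model_mount *m)
{
	assert(m != NULL);
	for (int s = STAGE_COUNT - 1; s >= 0; s--) {
		if (!m->done[s])
			continue;	/* never initialized: nothing to release */
		m->done[s] = false;
		m->undone[s] = true;
	}
}
```

With a failure injected at `STAGE_DLM`, only `STAGE_SUPER` is marked done, so the teardown never reaches the journal stage. The real code has no such per-stage flags, which is why each journal user on the dismount path needs its own check.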

> 
> Thanks,
> Joseph
> 

Thanks,
Heming
Patch

diff --git a/fs/ocfs2/inode.c b/fs/ocfs2/inode.c
index 6c2411c2afcf..3826ab7eab3e 100644
--- a/fs/ocfs2/inode.c
+++ b/fs/ocfs2/inode.c
@@ -1205,7 +1205,8 @@  static void ocfs2_clear_inode(struct inode *inode)
 	 * the journal is flushed before journal shutdown. Thus it is safe to
 	 * have inodes get cleaned up after journal shutdown.
 	 */
-	jbd2_journal_release_jbd_inode(osb->journal->j_journal,
+	if (osb->journal)
+		jbd2_journal_release_jbd_inode(osb->journal->j_journal,
 				       &oi->ip_jinode);
 }
 
diff --git a/fs/ocfs2/journal.h b/fs/ocfs2/journal.h
index 8dcb2f2cadbc..1610d49b4112 100644
--- a/fs/ocfs2/journal.h
+++ b/fs/ocfs2/journal.h
@@ -189,7 +189,7 @@  static inline void ocfs2_checkpoint_inode(struct inode *inode)
 {
 	struct ocfs2_super *osb = OCFS2_SB(inode->i_sb);
 
-	if (ocfs2_mount_local(osb))
+	if (!osb->journal || ocfs2_mount_local(osb))
 		return;
 
 	if (!ocfs2_ci_fully_checkpointed(INODE_CACHE(inode))) {
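
The effect of the journal.h hunk can be tried out in user space with the minimal model below. The structures, the result enum and `model_checkpoint_inode()` are hypothetical stand-ins rather than the ocfs2 definitions. `model_time_after()` uses the signed-difference comparison that the kernel's `time_after()` is built on, and the "fully checkpointed" test follows the `time_after(journal->j_trans_id, ci->ci_last_trans)` line quoted in the analysis.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Wraparound-safe "a is after b", the same idea as the kernel's time_after(). */
static bool model_time_after(unsigned long a, unsigned long b)
{
	return (long)(b - a) < 0;
}

struct model_journal { unsigned long trans_id; };	/* j_trans_id stand-in */
struct model_cache { unsigned long last_trans; };	/* ci_last_trans stand-in */

struct model_super {
	struct model_journal *journal;	/* NULL until journal init has run */
	bool mount_local;
};

enum model_result { MODEL_SKIPPED, MODEL_CLEAN, MODEL_NEEDS_CHECKPOINT };

static enum model_result
model_checkpoint_inode(const struct model_super *osb,
		       const struct model_cache *ci)
{
	assert(osb != NULL && ci != NULL);

	/* Patched guard: return before touching a journal that was never set up. */
	if (!osb->journal || osb->mount_local)
		return MODEL_SKIPPED;

	/* Without the guard, this read is the NULL dereference from the oops. */
	if (model_time_after(osb->journal->trans_id, ci->last_trans))
		return MODEL_CLEAN;

	return MODEL_NEEDS_CHECKPOINT;
}
```

A `model_super` whose `journal` is NULL returns `MODEL_SKIPPED` instead of dereferencing the pointer, which mirrors the failed-mount path in this report.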