diff mbox series

[1/4] ocfs2: Fix freeing uninitialized resource on ocfs2_dlm_shutdown

Message ID 20220730011411.11214-2-heming.zhao@suse.com (mailing list archive)
State New, archived
Headers show
Series re-enable non-clustered mount & add MMP support | expand

Commit Message

Heming Zhao July 30, 2022, 1:14 a.m. UTC
On local mount mode, there is no dlm resource initalized. If
ocfs2_mount_volume() fails in ocfs2_find_slot(), error handling
flow will call ocfs2_dlm_shutdown(), then does dlm resource
cleanup job, which will trigger kernel crash.

Fixes: 0737e01de9c4 ("ocfs2: ocfs2_mount_volume does cleanup job before
return error")
Signed-off-by: Heming Zhao <heming.zhao@suse.com>
---
 fs/ocfs2/dlmglue.c | 3 +++
 1 file changed, 3 insertions(+)

Comments

Joseph Qi Aug. 8, 2022, 6:51 a.m. UTC | #1
On 7/30/22 9:14 AM, Heming Zhao wrote:
> On local mount mode, there is no dlm resource initalized. If
> ocfs2_mount_volume() fails in ocfs2_find_slot(), error handling
> flow will call ocfs2_dlm_shutdown(), then does dlm resource
> cleanup job, which will trigger kernel crash.
> 
> Fixes: 0737e01de9c4 ("ocfs2: ocfs2_mount_volume does cleanup job before
> return error")

Should be put at the same line.

> Signed-off-by: Heming Zhao <heming.zhao@suse.com>
> ---
>  fs/ocfs2/dlmglue.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/fs/ocfs2/dlmglue.c b/fs/ocfs2/dlmglue.c
> index 801e60bab955..1438ac14940b 100644
> --- a/fs/ocfs2/dlmglue.c
> +++ b/fs/ocfs2/dlmglue.c
> @@ -3385,6 +3385,9 @@ int ocfs2_dlm_init(struct ocfs2_super *osb)
>  void ocfs2_dlm_shutdown(struct ocfs2_super *osb,
>  			int hangup_pending)
>  {
> +	if (ocfs2_mount_local(osb))
> +		return;
> +

IMO, we have to do part of ocfs2_dlm_shutdown() jobs such as
ocfs2_lock_res_free(), which will remove lockres from d_lockres_tracking
added by ocfs2_xxx_lock_res_init().

Before commit 0737e01de9c4, it seems this issue also exists since
osb->cconn is already set under local mount mode. 

Thanks,
Joseph
Heming Zhao Aug. 8, 2022, 12:09 p.m. UTC | #2
On Mon, Aug 08, 2022 at 02:51:12PM +0800, Joseph Qi wrote:
> 
> 
> On 7/30/22 9:14 AM, Heming Zhao wrote:
> > On local mount mode, there is no dlm resource initalized. If
> > ocfs2_mount_volume() fails in ocfs2_find_slot(), error handling
> > flow will call ocfs2_dlm_shutdown(), then does dlm resource
> > cleanup job, which will trigger kernel crash.
> > 
> > Fixes: 0737e01de9c4 ("ocfs2: ocfs2_mount_volume does cleanup job before
> > return error")
> 
> Should be put at the same line.
OK

> 
> > Signed-off-by: Heming Zhao <heming.zhao@suse.com>
> > ---
> >  fs/ocfs2/dlmglue.c | 3 +++
> >  1 file changed, 3 insertions(+)
> > 
> > diff --git a/fs/ocfs2/dlmglue.c b/fs/ocfs2/dlmglue.c
> > index 801e60bab955..1438ac14940b 100644
> > --- a/fs/ocfs2/dlmglue.c
> > +++ b/fs/ocfs2/dlmglue.c
> > @@ -3385,6 +3385,9 @@ int ocfs2_dlm_init(struct ocfs2_super *osb)
> >  void ocfs2_dlm_shutdown(struct ocfs2_super *osb,
> >  			int hangup_pending)
> >  {
> > +	if (ocfs2_mount_local(osb))
> > +		return;
> > +
> 
> IMO, we have to do part of ocfs2_dlm_shutdown() jobs such as
> ocfs2_lock_res_free(), which will remove lockres from d_lockres_tracking
> added by ocfs2_xxx_lock_res_init().
> 

ocfs2_dlm_shutdown does the cleanup job for ocfs2_dlm_init.
This patch fixed crash in local mount error flow.
In local mount mode, ocfs2_dlm_init does nothing, which should make
ocfs2_dlm_shutdown do nothing.

And I checked all calling ocfs2_dlm_shutdown cases:
1. mount flow:

   ocfs2_fill_super
    + xxx =fails=> label:out_super (checked, work fine)
    |
    + ocfs2_mount_volume =fails=> label:out_debugfs (checked, work fine)
    |  |
    |  + xxx =fails=> cleanup everything before returning
    |
    + xxx =fails=> label:out_dismout
         At this time, dlm has been init successfully, we can call all lines
         of ocfs2_dlm_shutdown.
    
   
2. ocfs2_dismount_volume => 'osb->cconn' is true.
   this MUST be dlm successfully init case. everything looks fine.

In previous mail/patch: [PATCH] test error handling during mount stage, I may
forget to test local mount mode. So this crash didn't be triggered.

> Before commit 0737e01de9c4, it seems this issue also exists since
> osb->cconn is already set under local mount mode. 

Yes. The bug exists since local mount feature was introduced, commit number:
c271c5c22b0a7ca45fda15f1f4d258bca36a5b94.  I will change 'Fixes' on next version.

(Hope 'CC' list takes effect for this mail. -_-# )

Thanks,
Heming
Joseph Qi Aug. 10, 2022, 1:31 a.m. UTC | #3
On 8/8/22 8:09 PM, Heming Zhao wrote:
> On Mon, Aug 08, 2022 at 02:51:12PM +0800, Joseph Qi wrote:
>>
>>
>> On 7/30/22 9:14 AM, Heming Zhao wrote:
>>> On local mount mode, there is no dlm resource initalized. If
>>> ocfs2_mount_volume() fails in ocfs2_find_slot(), error handling
>>> flow will call ocfs2_dlm_shutdown(), then does dlm resource
>>> cleanup job, which will trigger kernel crash.
>>>
>>> Fixes: 0737e01de9c4 ("ocfs2: ocfs2_mount_volume does cleanup job before
>>> return error")
>>
>> Should be put at the same line.
> OK
> 
>>
>>> Signed-off-by: Heming Zhao <heming.zhao@suse.com>
>>> ---
>>>  fs/ocfs2/dlmglue.c | 3 +++
>>>  1 file changed, 3 insertions(+)
>>>
>>> diff --git a/fs/ocfs2/dlmglue.c b/fs/ocfs2/dlmglue.c
>>> index 801e60bab955..1438ac14940b 100644
>>> --- a/fs/ocfs2/dlmglue.c
>>> +++ b/fs/ocfs2/dlmglue.c
>>> @@ -3385,6 +3385,9 @@ int ocfs2_dlm_init(struct ocfs2_super *osb)
>>>  void ocfs2_dlm_shutdown(struct ocfs2_super *osb,
>>>  			int hangup_pending)
>>>  {
>>> +	if (ocfs2_mount_local(osb))
>>> +		return;
>>> +
>>
>> IMO, we have to do part of ocfs2_dlm_shutdown() jobs such as
>> ocfs2_lock_res_free(), which will remove lockres from d_lockres_tracking
>> added by ocfs2_xxx_lock_res_init().
>>
> 
> ocfs2_dlm_shutdown does the cleanup job for ocfs2_dlm_init.
> This patch fixed crash in local mount error flow.
> In local mount mode, ocfs2_dlm_init does nothing, which should make
> ocfs2_dlm_shutdown do nothing.
> 
Not exactly, even in local mount we have to initialize lockres like
super lock.


> And I checked all calling ocfs2_dlm_shutdown cases:
> 1. mount flow:
> 
>    ocfs2_fill_super
>     + xxx =fails=> label:out_super (checked, work fine)
>     |
>     + ocfs2_mount_volume =fails=> label:out_debugfs (checked, work fine)
>     |  |
>     |  + xxx =fails=> cleanup everything before returning
>     |
>     + xxx =fails=> label:out_dismout
>          At this time, dlm has been init successfully, we can call all lines
>          of ocfs2_dlm_shutdown.
>     
>    
> 2. ocfs2_dismount_volume => 'osb->cconn' is true.
>    this MUST be dlm successfully init case. everything looks fine.
> 
> In previous mail/patch: [PATCH] test error handling during mount stage, I may
> forget to test local mount mode. So this crash didn't be triggered.
> 
>> Before commit 0737e01de9c4, it seems this issue also exists since
>> osb->cconn is already set under local mount mode. 
> 
> Yes. The bug exists since local mount feature was introduced, commit number:
> c271c5c22b0a7ca45fda15f1f4d258bca36a5b94.  I will change 'Fixes' on next version.
> 
This patch can be a separate fix and we don't have to bind with
nocluster mount feature, even though you've triggered it by related
local mount.

Thanks,
Joseph

> (Hope 'CC' list takes effect for this mail. -_-# )
> 
> Thanks,
> Heming
David Sterba via Ocfs2-devel Aug. 10, 2022, 11:52 p.m. UTC | #4
On 8/10/22 09:31, Joseph Qi wrote:
> 
> 
> On 8/8/22 8:09 PM, Heming Zhao wrote:
>> On Mon, Aug 08, 2022 at 02:51:12PM +0800, Joseph Qi wrote:
>>>
>>>
>>> On 7/30/22 9:14 AM, Heming Zhao wrote:
>>>> On local mount mode, there is no dlm resource initalized. If
>>>> ocfs2_mount_volume() fails in ocfs2_find_slot(), error handling
>>>> flow will call ocfs2_dlm_shutdown(), then does dlm resource
>>>> cleanup job, which will trigger kernel crash.
>>>>
>>>> Fixes: 0737e01de9c4 ("ocfs2: ocfs2_mount_volume does cleanup job before
>>>> return error")
>>>
>>> Should be put at the same line.
>> OK
>>
>>>
>>>> Signed-off-by: Heming Zhao <heming.zhao@suse.com>
>>>> ---
>>>>   fs/ocfs2/dlmglue.c | 3 +++
>>>>   1 file changed, 3 insertions(+)
>>>>
>>>> diff --git a/fs/ocfs2/dlmglue.c b/fs/ocfs2/dlmglue.c
>>>> index 801e60bab955..1438ac14940b 100644
>>>> --- a/fs/ocfs2/dlmglue.c
>>>> +++ b/fs/ocfs2/dlmglue.c
>>>> @@ -3385,6 +3385,9 @@ int ocfs2_dlm_init(struct ocfs2_super *osb)
>>>>   void ocfs2_dlm_shutdown(struct ocfs2_super *osb,
>>>>   			int hangup_pending)
>>>>   {
>>>> +	if (ocfs2_mount_local(osb))
>>>> +		return;
>>>> +
>>>
>>> IMO, we have to do part of ocfs2_dlm_shutdown() jobs such as
>>> ocfs2_lock_res_free(), which will remove lockres from d_lockres_tracking
>>> added by ocfs2_xxx_lock_res_init().
>>>
>>
>> ocfs2_dlm_shutdown does the cleanup job for ocfs2_dlm_init.
>> This patch fixed crash in local mount error flow.
>> In local mount mode, ocfs2_dlm_init does nothing, which should make
>> ocfs2_dlm_shutdown do nothing.
>>
> Not exactly, even in local mount we have to initialize lockres like
> super lock.

Thank you for your patient explaination, I understood your meaning, will change it in
next version.
I am very busy these days, it makes my brain stop working.

> 
> 
>> And I checked all calling ocfs2_dlm_shutdown cases:
>> 1. mount flow:
>>
>>     ocfs2_fill_super
>>      + xxx =fails=> label:out_super (checked, work fine)
>>      |
>>      + ocfs2_mount_volume =fails=> label:out_debugfs (checked, work fine)
>>      |  |
>>      |  + xxx =fails=> cleanup everything before returning
>>      |
>>      + xxx =fails=> label:out_dismout
>>           At this time, dlm has been init successfully, we can call all lines
>>           of ocfs2_dlm_shutdown.
>>      
>>     
>> 2. ocfs2_dismount_volume => 'osb->cconn' is true.
>>     this MUST be dlm successfully init case. everything looks fine.
>>
>> In previous mail/patch: [PATCH] test error handling during mount stage, I may
>> forget to test local mount mode. So this crash didn't be triggered.
>>
>>> Before commit 0737e01de9c4, it seems this issue also exists since
>>> osb->cconn is already set under local mount mode.
>>
>> Yes. The bug exists since local mount feature was introduced, commit number:
>> c271c5c22b0a7ca45fda15f1f4d258bca36a5b94.  I will change 'Fixes' on next version.
>>
> This patch can be a separate fix and we don't have to bind with
> nocluster mount feature, even though you've triggered it by related
> local mount.

OK.

> 
> Thanks,
> Joseph
> 
>> (Hope 'CC' list takes effect for this mail. -_-# )
>>
>> Thanks,
>> Heming

Thanks,
Heming
diff mbox series

Patch

diff --git a/fs/ocfs2/dlmglue.c b/fs/ocfs2/dlmglue.c
index 801e60bab955..1438ac14940b 100644
--- a/fs/ocfs2/dlmglue.c
+++ b/fs/ocfs2/dlmglue.c
@@ -3385,6 +3385,9 @@  int ocfs2_dlm_init(struct ocfs2_super *osb)
 void ocfs2_dlm_shutdown(struct ocfs2_super *osb,
 			int hangup_pending)
 {
+	if (ocfs2_mount_local(osb))
+		return;
+
 	ocfs2_drop_osb_locks(osb);
 
 	/*