ocfs2: protect get_block with inode cluster lock
diff mbox series

Message ID 20190910173046.27360-1-junxiao.bi@oracle.com
State New
Headers show
Series
  • ocfs2: protect get_block with inode cluster lock
Related show

Commit Message

Junxiao Bi Sept. 10, 2019, 5:30 p.m. UTC
Inode cluster lock should be acquired to avoid extent
tree changed by other cluster nodes.

Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com>
---
 fs/ocfs2/aops.c | 6 ++++++
 1 file changed, 6 insertions(+)

Comments

Joseph Qi Sept. 11, 2019, 1:32 a.m. UTC | #1
Hi Junxiao,

On 19/9/11 01:30, Junxiao Bi wrote:
> Inode cluster lock should be acquired to avoid extent
> tree changed by other cluster nodes.
> 
Could you please elaborate more on the real issue?
IIUC, ocfs2_lock_get_block() will only be called under:
1) rw = READ;
2) rw = WRITE && overwrite.

Thanks,
Joseph

> Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com>
> ---
>  fs/ocfs2/aops.c | 6 ++++++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/fs/ocfs2/aops.c b/fs/ocfs2/aops.c
> index a4c905d6b575..5ae7253d04b0 100644
> --- a/fs/ocfs2/aops.c
> +++ b/fs/ocfs2/aops.c
> @@ -127,9 +127,15 @@ static int ocfs2_lock_get_block(struct inode *inode, sector_t iblock,
>  	int ret = 0;
>  	struct ocfs2_inode_info *oi = OCFS2_I(inode);
>  
> +	ret = ocfs2_inode_lock(inode, NULL, 0);
> +	if (ret) {
> +		mlog_errno(ret);
> +		return ret;
> +	}
>  	down_read(&oi->ip_alloc_sem);
>  	ret = ocfs2_get_block(inode, iblock, bh_result, create);
>  	up_read(&oi->ip_alloc_sem);
> +	ocfs2_inode_unlock(inode, 0);
>  
>  	return ret;
>  }
>
Changwei Ge Sept. 11, 2019, 2:33 a.m. UTC | #2
Hi Junxiao,


As _RW_ cluster lock with EX level is held by local node, is that 
possible other node has a chance to touch the extent tree belonging to 
the same inode?


Thanks,

Changwei


On 2019/9/11 1:30 上午, Junxiao Bi wrote:
> Inode cluster lock should be acquired to avoid extent
> tree changed by other cluster nodes.
>
> Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com>
> ---
>   fs/ocfs2/aops.c | 6 ++++++
>   1 file changed, 6 insertions(+)
>
> diff --git a/fs/ocfs2/aops.c b/fs/ocfs2/aops.c
> index a4c905d6b575..5ae7253d04b0 100644
> --- a/fs/ocfs2/aops.c
> +++ b/fs/ocfs2/aops.c
> @@ -127,9 +127,15 @@ static int ocfs2_lock_get_block(struct inode *inode, sector_t iblock,
>   	int ret = 0;
>   	struct ocfs2_inode_info *oi = OCFS2_I(inode);
>   
> +	ret = ocfs2_inode_lock(inode, NULL, 0);
> +	if (ret) {
> +		mlog_errno(ret);
> +		return ret;
> +	}
>   	down_read(&oi->ip_alloc_sem);
>   	ret = ocfs2_get_block(inode, iblock, bh_result, create);
>   	up_read(&oi->ip_alloc_sem);
> +	ocfs2_inode_unlock(inode, 0);
>   
>   	return ret;
>   }
Junxiao Bi Sept. 11, 2019, 5:40 p.m. UTC | #3
Hi Joseph & Changwei,

This is found by just reviewing source code, according the lock 
semantic, for accessing extent tree, alloc_sem and ip cluster lock 
should be acquired. ocfs2_dio_wr_get_block() had these two lock 
acquired. rw_lock can protect the tree from the write, but i am not sure 
whether there were other flow changing extent tree without rw lock held.

Thanks,

Junxiao.

On 9/10/19 7:33 PM, Changwei Ge wrote:
> Hi Junxiao,
>
>
> As _RW_ cluster lock with EX level is held by local node, is that 
> possible other node has a chance to touch the extent tree belonging to 
> the same inode?
>
>
> Thanks,
>
> Changwei
>
>
> On 2019/9/11 1:30 上午, Junxiao Bi wrote:
>> Inode cluster lock should be acquired to avoid extent
>> tree changed by other cluster nodes.
>>
>> Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com>
>> ---
>>   fs/ocfs2/aops.c | 6 ++++++
>>   1 file changed, 6 insertions(+)
>>
>> diff --git a/fs/ocfs2/aops.c b/fs/ocfs2/aops.c
>> index a4c905d6b575..5ae7253d04b0 100644
>> --- a/fs/ocfs2/aops.c
>> +++ b/fs/ocfs2/aops.c
>> @@ -127,9 +127,15 @@ static int ocfs2_lock_get_block(struct inode 
>> *inode, sector_t iblock,
>>       int ret = 0;
>>       struct ocfs2_inode_info *oi = OCFS2_I(inode);
>>   +    ret = ocfs2_inode_lock(inode, NULL, 0);
>> +    if (ret) {
>> +        mlog_errno(ret);
>> +        return ret;
>> +    }
>>       down_read(&oi->ip_alloc_sem);
>>       ret = ocfs2_get_block(inode, iblock, bh_result, create);
>>       up_read(&oi->ip_alloc_sem);
>> +    ocfs2_inode_unlock(inode, 0);
>>         return ret;
>>   }
Gang He Sept. 12, 2019, 9:34 a.m. UTC | #4
Hi Guys,

I went through the ocfs2 code today.
I think there should be a ocfs2_inode_lock() in the ocfs2_lock_get_block() function.
Why?
ocfs2_rw_lock is used to protect read/write operation from the different nodes, 
but it cannot synchronize the inode meta-data changes, that means one node cannot on-time see the meta-data changes(e.g. extent blocks) from the other nodes.
The meta-data consistency should be guaranteed by ocfs2_inode_lock.
Anyway, 
this ocfs2_inode_lock should be added here, but we should do a full testing after this patch, to make sure there is not any deadlock.
Second, according to my comments, I think the patch reporter should be able to give an example for why we need to add this patch. 

Thanks
Gang

> -----Original Message-----
> From: ocfs2-devel-bounces@oss.oracle.com
> [mailto:ocfs2-devel-bounces@oss.oracle.com] On Behalf Of Junxiao Bi
> Sent: 2019年9月12日 1:40
> To: jiangqi903@gmail.com; Changwei Ge <chge@linux.alibaba.com>;
> ocfs2-devel@oss.oracle.com
> Subject: Re: [Ocfs2-devel] [PATCH] ocfs2: protect get_block with inode cluster
> lock
> 
> Hi Joseph & Changwei,
> 
> This is found by just reviewing source code, according the lock semantic, for
> accessing extent tree, alloc_sem and ip cluster lock should be acquired.
> ocfs2_dio_wr_get_block() had these two lock acquired. rw_lock can protect the
> tree from the write, but i am not sure whether there were other flow changing
> extent tree without rw lock held.
> 
> Thanks,
> 
> Junxiao.
> 
> On 9/10/19 7:33 PM, Changwei Ge wrote:
> > Hi Junxiao,
> >
> >
> > As _RW_ cluster lock with EX level is held by local node, is that
> > possible other node has a chance to touch the extent tree belonging to
> > the same inode?
> >
> >
> > Thanks,
> >
> > Changwei
> >
> >
> > On 2019/9/11 1:30 上午, Junxiao Bi wrote:
> >> Inode cluster lock should be acquired to avoid extent tree changed by
> >> other cluster nodes.
> >>
> >> Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com>
> >> ---
> >>   fs/ocfs2/aops.c | 6 ++++++
> >>   1 file changed, 6 insertions(+)
> >>
> >> diff --git a/fs/ocfs2/aops.c b/fs/ocfs2/aops.c index
> >> a4c905d6b575..5ae7253d04b0 100644
> >> --- a/fs/ocfs2/aops.c
> >> +++ b/fs/ocfs2/aops.c
> >> @@ -127,9 +127,15 @@ static int ocfs2_lock_get_block(struct inode
> >> *inode, sector_t iblock,
> >>       int ret = 0;
> >>       struct ocfs2_inode_info *oi = OCFS2_I(inode);
> >>   +    ret = ocfs2_inode_lock(inode, NULL, 0);
> >> +    if (ret) {
> >> +        mlog_errno(ret);
> >> +        return ret;
> >> +    }
> >>       down_read(&oi->ip_alloc_sem);
> >>       ret = ocfs2_get_block(inode, iblock, bh_result, create);
> >>       up_read(&oi->ip_alloc_sem);
> >> +    ocfs2_inode_unlock(inode, 0);
> >>         return ret;
> >>   }
> 
> _______________________________________________
> Ocfs2-devel mailing list
> Ocfs2-devel@oss.oracle.com
> https://oss.oracle.com/mailman/listinfo/ocfs2-devel
Changwei Ge Sept. 12, 2019, 10:26 a.m. UTC | #5
Hi Gang,

IIUC, you worried that local node might use the old inode meta like 
'inode size", right?

If so, in ocfs2_direct_IO() which is the caller of 
ocfs2_lock_get_block() and ocfs2_dio_wr_get_block() is already using 
inode meta.

Should we also lock inode there?

I think the way how we guarantee what you mentioned is we acquire inode 
cluster lock with RW cluster lock held(in ocfs2_file_write_iter()), 
after which we a see a consistent inode.


I suppose if other node wants to change inode extent tree(truncating, 
appending, punching hole), it should
acquire first, by which during local node write operation, we can use a 
stable extent tree.


Thank,

Changwei


On 2019/9/12 5:34 下午, Gang He wrote:
> Hi Guys,
>
> I went through the ocfs2 code today.
> I think there should be a ocfs2_inode_lock() in the ocfs2_lock_get_block() function.
> Why?
> ocfs2_rw_lock is used to protect read/write operation from the different nodes,
> but it cannot synchronize the inode meta-data changes, that means one node cannot on-time see the meta-data changes(e.g. extent blocks) from the other nodes.
> The meta-data consistency should be guaranteed by ocfs2_inode_lock.
> Anyway,
> this ocfs2_inode_lock should be added here, but we should do a full testing after this patch, to make sure there is not any deadlock.
> Second, according to my comments, I think the patch reporter should be able to give an example for why we need to add this patch.
>
> Thanks
> Gang
>
>> -----Original Message-----
>> From: ocfs2-devel-bounces@oss.oracle.com
>> [mailto:ocfs2-devel-bounces@oss.oracle.com] On Behalf Of Junxiao Bi
>> Sent: 2019年9月12日 1:40
>> To: jiangqi903@gmail.com; Changwei Ge <chge@linux.alibaba.com>;
>> ocfs2-devel@oss.oracle.com
>> Subject: Re: [Ocfs2-devel] [PATCH] ocfs2: protect get_block with inode cluster
>> lock
>>
>> Hi Joseph & Changwei,
>>
>> This is found by just reviewing source code, according the lock semantic, for
>> accessing extent tree, alloc_sem and ip cluster lock should be acquired.
>> ocfs2_dio_wr_get_block() had these two lock acquired. rw_lock can protect the
>> tree from the write, but i am not sure whether there were other flow changing
>> extent tree without rw lock held.
>>
>> Thanks,
>>
>> Junxiao.
>>
>> On 9/10/19 7:33 PM, Changwei Ge wrote:
>>> Hi Junxiao,
>>>
>>>
>>> As _RW_ cluster lock with EX level is held by local node, is that
>>> possible other node has a chance to touch the extent tree belonging to
>>> the same inode?
>>>
>>>
>>> Thanks,
>>>
>>> Changwei
>>>
>>>
>>> On 2019/9/11 1:30 上午, Junxiao Bi wrote:
>>>> Inode cluster lock should be acquired to avoid extent tree changed by
>>>> other cluster nodes.
>>>>
>>>> Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com>
>>>> ---
>>>>    fs/ocfs2/aops.c | 6 ++++++
>>>>    1 file changed, 6 insertions(+)
>>>>
>>>> diff --git a/fs/ocfs2/aops.c b/fs/ocfs2/aops.c index
>>>> a4c905d6b575..5ae7253d04b0 100644
>>>> --- a/fs/ocfs2/aops.c
>>>> +++ b/fs/ocfs2/aops.c
>>>> @@ -127,9 +127,15 @@ static int ocfs2_lock_get_block(struct inode
>>>> *inode, sector_t iblock,
>>>>        int ret = 0;
>>>>        struct ocfs2_inode_info *oi = OCFS2_I(inode);
>>>>    +    ret = ocfs2_inode_lock(inode, NULL, 0);
>>>> +    if (ret) {
>>>> +        mlog_errno(ret);
>>>> +        return ret;
>>>> +    }
>>>>        down_read(&oi->ip_alloc_sem);
>>>>        ret = ocfs2_get_block(inode, iblock, bh_result, create);
>>>>        up_read(&oi->ip_alloc_sem);
>>>> +    ocfs2_inode_unlock(inode, 0);
>>>>          return ret;
>>>>    }
>> _______________________________________________
>> Ocfs2-devel mailing list
>> Ocfs2-devel@oss.oracle.com
>> https://oss.oracle.com/mailman/listinfo/ocfs2-devel
> _______________________________________________
> Ocfs2-devel mailing list
> Ocfs2-devel@oss.oracle.com
> https://oss.oracle.com/mailman/listinfo/ocfs2-devel
Gang He Sept. 12, 2019, 10:54 a.m. UTC | #6
Hi ChangWei,

Please see my comments inline.

> -----Original Message-----
> From: ocfs2-devel-bounces@oss.oracle.com
> [mailto:ocfs2-devel-bounces@oss.oracle.com] On Behalf Of Changwei Ge
> Sent: 2019年9月12日 18:26
> To: jiangqi903@gmail.com; Changwei Ge <chge@linux.alibaba.com>; Junxiao
> Bi <junxiao.bi@oracle.com>; ocfs2-devel@oss.oracle.com; Gang He
> <GHe@suse.com>
> Subject: Re: [Ocfs2-devel] [PATCH] ocfs2: protect get_block with inode cluster
> lock
> 
> Hi Gang,
> 
> IIUC, you worried that local node might use the old inode meta like 'inode
> size", right?
Inode block and extent blocks for a file.
My means, if there is not inode meta lock acquire, the inode meta-blocks changes from the other node cannot write-back to the disk immediately, since meta-data block write-back depends on lock downgrade hook.

> 
> If so, in ocfs2_direct_IO() which is the caller of
> ocfs2_lock_get_block() and ocfs2_dio_wr_get_block() is already using inode
> meta.
> 
> Should we also lock inode there?
> 
> I think the way how we guarantee what you mentioned is we acquire inode
> cluster lock with RW cluster lock held(in ocfs2_file_write_iter()), after which we
> a see a consistent inode.
My concerns are,
If the direct-io inode has been in the memory in the past (maybe this is second read for this node).
Then, before do the direct-io read, the caller cannot see the meta-data block changes, which were just made from the other nodes.
In this case, if we can hit the problem?

Thanks
Gang

> 
> 
> I suppose if other node wants to change inode extent tree(truncating,
> appending, punching hole), it should acquire first, by which during local node
> write operation, we can use a stable extent tree.
> 
> 
> Thank,
> 
> Changwei
> 
> 
> On 2019/9/12 5:34 下午, Gang He wrote:
> > Hi Guys,
> >
> > I went through the ocfs2 code today.
> > I think there should be a ocfs2_inode_lock() in the ocfs2_lock_get_block()
> function.
> > Why?
> > ocfs2_rw_lock is used to protect read/write operation from the
> > different nodes, but it cannot synchronize the inode meta-data changes, that
> means one node cannot on-time see the meta-data changes(e.g. extent blocks)
> from the other nodes.
> > The meta-data consistency should be guaranteed by ocfs2_inode_lock.
> > Anyway,
> > this ocfs2_inode_lock should be added here, but we should do a full testing
> after this patch, to make sure there is not any deadlock.
> > Second, according to my comments, I think the patch reporter should be
> able to give an example for why we need to add this patch.
> >
> > Thanks
> > Gang
> >
> >> -----Original Message-----
> >> From: ocfs2-devel-bounces@oss.oracle.com
> >> [mailto:ocfs2-devel-bounces@oss.oracle.com] On Behalf Of Junxiao Bi
> >> Sent: 2019年9月12日 1:40
> >> To: jiangqi903@gmail.com; Changwei Ge <chge@linux.alibaba.com>;
> >> ocfs2-devel@oss.oracle.com
> >> Subject: Re: [Ocfs2-devel] [PATCH] ocfs2: protect get_block with
> >> inode cluster lock
> >>
> >> Hi Joseph & Changwei,
> >>
> >> This is found by just reviewing source code, according the lock
> >> semantic, for accessing extent tree, alloc_sem and ip cluster lock should be
> acquired.
> >> ocfs2_dio_wr_get_block() had these two lock acquired. rw_lock can
> >> protect the tree from the write, but i am not sure whether there were
> >> other flow changing extent tree without rw lock held.
> >>
> >> Thanks,
> >>
> >> Junxiao.
> >>
> >> On 9/10/19 7:33 PM, Changwei Ge wrote:
> >>> Hi Junxiao,
> >>>
> >>>
> >>> As _RW_ cluster lock with EX level is held by local node, is that
> >>> possible other node has a chance to touch the extent tree belonging
> >>> to the same inode?
> >>>
> >>>
> >>> Thanks,
> >>>
> >>> Changwei
> >>>
> >>>
> >>> On 2019/9/11 1:30 上午, Junxiao Bi wrote:
> >>>> Inode cluster lock should be acquired to avoid extent tree changed
> >>>> by other cluster nodes.
> >>>>
> >>>> Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com>
> >>>> ---
> >>>>    fs/ocfs2/aops.c | 6 ++++++
> >>>>    1 file changed, 6 insertions(+)
> >>>>
> >>>> diff --git a/fs/ocfs2/aops.c b/fs/ocfs2/aops.c index
> >>>> a4c905d6b575..5ae7253d04b0 100644
> >>>> --- a/fs/ocfs2/aops.c
> >>>> +++ b/fs/ocfs2/aops.c
> >>>> @@ -127,9 +127,15 @@ static int ocfs2_lock_get_block(struct inode
> >>>> *inode, sector_t iblock,
> >>>>        int ret = 0;
> >>>>        struct ocfs2_inode_info *oi = OCFS2_I(inode);
> >>>>    +    ret = ocfs2_inode_lock(inode, NULL, 0);
> >>>> +    if (ret) {
> >>>> +        mlog_errno(ret);
> >>>> +        return ret;
> >>>> +    }
> >>>>        down_read(&oi->ip_alloc_sem);
> >>>>        ret = ocfs2_get_block(inode, iblock, bh_result, create);
> >>>>        up_read(&oi->ip_alloc_sem);
> >>>> +    ocfs2_inode_unlock(inode, 0);
> >>>>          return ret;
> >>>>    }
> >> _______________________________________________
> >> Ocfs2-devel mailing list
> >> Ocfs2-devel@oss.oracle.com
> >> https://oss.oracle.com/mailman/listinfo/ocfs2-devel
> > _______________________________________________
> > Ocfs2-devel mailing list
> > Ocfs2-devel@oss.oracle.com
> > https://oss.oracle.com/mailman/listinfo/ocfs2-devel
> _______________________________________________
> Ocfs2-devel mailing list
> Ocfs2-devel@oss.oracle.com
> https://oss.oracle.com/mailman/listinfo/ocfs2-devel

Patch
diff mbox series

diff --git a/fs/ocfs2/aops.c b/fs/ocfs2/aops.c
index a4c905d6b575..5ae7253d04b0 100644
--- a/fs/ocfs2/aops.c
+++ b/fs/ocfs2/aops.c
@@ -127,9 +127,15 @@  static int ocfs2_lock_get_block(struct inode *inode, sector_t iblock,
 	int ret = 0;
 	struct ocfs2_inode_info *oi = OCFS2_I(inode);
 
+	ret = ocfs2_inode_lock(inode, NULL, 0);
+	if (ret) {
+		mlog_errno(ret);
+		return ret;
+	}
 	down_read(&oi->ip_alloc_sem);
 	ret = ocfs2_get_block(inode, iblock, bh_result, create);
 	up_read(&oi->ip_alloc_sem);
+	ocfs2_inode_unlock(inode, 0);
 
 	return ret;
 }