diff mbox

ocfs2: llseek requires to ocfs2 inode lock for the file in SEEK_END

Message ID 51C2BC1F.2010106@huawei.com (mailing list archive)
State New, archived
Headers show

Commit Message

Shen Canquan June 20, 2013, 8:23 a.m. UTC
llseek requires ocfs2 inode lock for updating the file size in SEEK_END.
because the file size maybe update on another node.
if it not . after call llseek in SEEK_END. the position is old.

this bug can be reproduce the following scenario:
at first ,we dd a test fileA,the file size is 10k.
on NodeA:
---------
1) open the test fileA, lseek the end of file. and print the position.
2) close the test fileA

on NodeB:
1) open the test fileA, append the 5k data to test FileA.
2) lseek the end of file. and print the position.
3) close file.

at first we run the test program1 on NodeA , the result is 10k.
and then run the test program2 on NodeB,  the result is 15k.
at last, we run the test program1 on NodeA again, the result is 10k.

after apply this patch.  the three step result is 15k.

Signed-off-by: jensen <shencanquan@huawei.com>
---
 fs/ocfs2/file.c |    9 +++++++++
 1 file changed, 9 insertions(+)

Comments

Andrew Morton June 26, 2013, 9:18 p.m. UTC | #1
On Thu, 20 Jun 2013 16:23:59 +0800 shencanquan <shencanquan@huawei.com> wrote:

> llseek requires ocfs2 inode lock for updating the file size in SEEK_END.
> because the file size maybe update on another node.
> if it not . after call llseek in SEEK_END. the position is old.
> 
> this bug can be reproduce the following scenario:
> at first ,we dd a test fileA,the file size is 10k.
> on NodeA:
> ---------
> 1) open the test fileA, lseek the end of file. and print the position.
> 2) close the test fileA
> 
> on NodeB:
> 1) open the test fileA, append the 5k data to test FileA.
> 2) lseek the end of file. and print the position.
> 3) close file.
> 
> at first we run the test program1 on NodeA , the result is 10k.
> and then run the test program2 on NodeB,  the result is 15k.
> at last, we run the test program1 on NodeA again, the result is 10k.
> 
> after apply this patch.  the three step result is 15k.
> 
> ...
>
> --- a/fs/ocfs2/file.c
> +++ b/fs/ocfs2/file.c
> @@ -2626,7 +2626,16 @@ static loff_t ocfs2_file_llseek(struct file *file, loff_t offset, int whence)
>  	case SEEK_SET:
>  		break;
>  	case SEEK_END:
> +		/* SEEK_END requires the OCFS2 inode lock for the file
> +		 * because it references the file's size.
> +		 */
> +		ret = ocfs2_inode_lock(inode, NULL, 0);
> +		if (ret < 0) {
> +			mlog_errno(ret);
> +			goto out;
> +		}
>  		offset += inode->i_size;
> +		ocfs2_inode_unlock(inode, 0);
>  		break;

I don't understand this.  The lock for inode->i_size is inode->i_mutex,
and we're already holding i_mutex here.  The current mainline code
looks correct.

My guess is that there is some other code path which is modifying
inode->i_size without holding inode->i_mutex, and while holding
ocfs2_inode_lock().  If so, that code is surely wrong - it should hold
i_mutex while modifying i_size.

Also, safely reading i_size should be performed via i_size_read(), and
modifications to i_size should use i_size_write().

And all this is only really applicable to 32-bit CPUs, which you
probably aren't using.

So.... please let's take a second look at this.
Shen Canquan June 27, 2013, 1:19 a.m. UTC | #2
On 2013/6/27 5:18, Andrew Morton wrote:

> On Thu, 20 Jun 2013 16:23:59 +0800 shencanquan <shencanquan@huawei.com> wrote:
> 
>> llseek requires ocfs2 inode lock for updating the file size in SEEK_END.
>> because the file size maybe update on another node.
>> if it not . after call llseek in SEEK_END. the position is old.
>>
>> this bug can be reproduce the following scenario:
>> at first ,we dd a test fileA,the file size is 10k.
>> on NodeA:
>> ---------
>> 1) open the test fileA, lseek the end of file. and print the position.
>> 2) close the test fileA
>>
>> on NodeB:
>> 1) open the test fileA, append the 5k data to test FileA.
>> 2) lseek the end of file. and print the position.
>> 3) close file.
>>
>> at first we run the test program1 on NodeA , the result is 10k.
>> and then run the test program2 on NodeB,  the result is 15k.
>> at last, we run the test program1 on NodeA again, the result is 10k.
>>
>> after apply this patch.  the three step result is 15k.
>>
>> ...
>>
>> --- a/fs/ocfs2/file.c
>> +++ b/fs/ocfs2/file.c
>> @@ -2626,7 +2626,16 @@ static loff_t ocfs2_file_llseek(struct file *file, loff_t offset, int whence)
>>  	case SEEK_SET:
>>  		break;
>>  	case SEEK_END:
>> +		/* SEEK_END requires the OCFS2 inode lock for the file
>> +		 * because it references the file's size.
>> +		 */
>> +		ret = ocfs2_inode_lock(inode, NULL, 0);
>> +		if (ret < 0) {
>> +			mlog_errno(ret);
>> +			goto out;
>> +		}
>>  		offset += inode->i_size;
>> +		ocfs2_inode_unlock(inode, 0);
>>  		break;
> 
> I don't understand this.  The lock for inode->i_size is inode->i_mutex,
> and we're already holding i_mutex here.  The current mainline code
> looks correct.
> 
> My guess is that there is some other code path which is modifying
> inode->i_size without holding inode->i_mutex, and while holding
> ocfs2_inode_lock().  If so, that code is surely wrong - it should hold
> i_mutex while modifying i_size.


inode->i_mutex lock only protect the inode size on the same machine.
because ocfs2 is cluster file system. so other machine maybe append the file.
and modify the file size.
and so before get the inode size, it need cluster lock, when other node unlock
cluster lock it will write the inode size to disk. and this machine acquire the lock
it will read the inode size from disk.

> 
> Also, safely reading i_size should be performed via i_size_read(), and
> modifications to i_size should use i_size_write().
> 

ok, thanks for your comments, I will modify this and resent the patch.

> And all this is only really applicable to 32-bit CPUs, which you
> probably aren't using.

I don't understand this.

> 
> So.... please let's take a second look at this.
> 
> .
>
Andrew Morton June 27, 2013, 1:25 a.m. UTC | #3
On Thu, 27 Jun 2013 09:19:52 +0800 shencanquan <shencanquan@huawei.com> wrote:

> On 2013/6/27 5:18, Andrew Morton wrote:
> 
> > 
> > My guess is that there is some other code path which is modifying
> > inode->i_size without holding inode->i_mutex, and while holding
> > ocfs2_inode_lock().  If so, that code is surely wrong - it should hold
> > i_mutex while modifying i_size.
> 
> 
> inode->i_mutex lock only protect the inode size on the same machine.

ah ;)

> > And all this is only really applicable to 32-bit CPUs, which you
> > probably aren't using.
> 
> I don't understand this.

The i_size_read/i_size_write infrastructure is designed to efficiently
handle the situation where a 32-bit machine reads a 64-bit number which
might be undergoing modification on another CPU.  We don't want the
reading CPU to see an invalid number when the writing CPU is in the
middle of modifying the two 32-bit words.
Shen Canquan June 27, 2013, 1:50 a.m. UTC | #4
On 2013/6/27 9:25, Andrew Morton wrote:

> On Thu, 27 Jun 2013 09:19:52 +0800 shencanquan <shencanquan@huawei.com> wrote:
> 
>> On 2013/6/27 5:18, Andrew Morton wrote:
>>
>>>
>>> My guess is that there is some other code path which is modifying
>>> inode->i_size without holding inode->i_mutex, and while holding
>>> ocfs2_inode_lock().  If so, that code is surely wrong - it should hold
>>> i_mutex while modifying i_size.
>>
>>
>> inode->i_mutex lock only protect the inode size on the same machine.
> 
> ah ;)
> 
>>> And all this is only really applicable to 32-bit CPUs, which you
>>> probably aren't using.
>>
>> I don't understand this.
> 
> The i_size_read/i_size_write infrastructure is designed to efficiently
> handle the situation where a 32-bit machine reads a 64-bit number which
> might be undergoing modification on another CPU.  We don't want the
> reading CPU to see an invalid number when the writing CPU is in the
> middle of modifying the two 32-bit words.
> 
> 
> 

ok. thanks.
Sunil Mushran June 27, 2013, 3:34 a.m. UTC | #5
AFAIR, this behavior has been there since day 1 and changing it will impact
performance negatively. I would recommend against making this change for
one app.


On Wed, Jun 26, 2013 at 6:50 PM, shencanquan <shencanquan@huawei.com> wrote:

> On 2013/6/27 9:25, Andrew Morton wrote:
>
> > On Thu, 27 Jun 2013 09:19:52 +0800 shencanquan <shencanquan@huawei.com>
> wrote:
> >
> >> On 2013/6/27 5:18, Andrew Morton wrote:
> >>
> >>>
> >>> My guess is that there is some other code path which is modifying
> >>> inode->i_size without holding inode->i_mutex, and while holding
> >>> ocfs2_inode_lock().  If so, that code is surely wrong - it should hold
> >>> i_mutex while modifying i_size.
> >>
> >>
> >> inode->i_mutex lock only protect the inode size on the same machine.
> >
> > ah ;)
> >
> >>> And all this is only really applicable to 32-bit CPUs, which you
> >>> probably aren't using.
> >>
> >> I don't understand this.
> >
> > The i_size_read/i_size_write infrastructure is designed to efficiently
> > handle the situation where a 32-bit machine reads a 64-bit number which
> > might be undergoing modification on another CPU.  We don't want the
> > reading CPU to see an invalid number when the writing CPU is in the
> > middle of modifying the two 32-bit words.
> >
> >
> >
>
> ok. thanks.
>
>
> _______________________________________________
> Ocfs2-devel mailing list
> Ocfs2-devel@oss.oracle.com
> https://oss.oracle.com/mailman/listinfo/ocfs2-devel
>
Shen Canquan June 27, 2013, 8:56 a.m. UTC | #6
On 2013/6/27 11:34, Sunil Mushran wrote:

> AFAIR, this behavior has been there since day 1 and changing it will impact performance negatively. I would recommend against making this
> change for one app.
> 

I think it will impact the performance negatively. but I think it is very
very little, because in cluster lock function, the inode size will update
from dlm in lvb. and it don't update by disk (another machine write file
size to disk and this machine read file size from disk to update the inode size)

if the file size update by disk. I think it will impact the performance very much.

> 
> On Wed, Jun 26, 2013 at 6:50 PM, shencanquan <shencanquan@huawei.com <mailto:shencanquan@huawei.com>> wrote:
> 
>     On 2013/6/27 9:25, Andrew Morton wrote:
> 
>     > On Thu, 27 Jun 2013 09:19:52 +0800 shencanquan <shencanquan@huawei.com <mailto:shencanquan@huawei.com>> wrote:
>     >
>     >> On 2013/6/27 5:18, Andrew Morton wrote:
>     >>
>     >>>
>     >>> My guess is that there is some other code path which is modifying
>     >>> inode->i_size without holding inode->i_mutex, and while holding
>     >>> ocfs2_inode_lock().  If so, that code is surely wrong - it should hold
>     >>> i_mutex while modifying i_size.
>     >>
>     >>
>     >> inode->i_mutex lock only protect the inode size on the same machine.
>     >
>     > ah ;)
>     >
>     >>> And all this is only really applicable to 32-bit CPUs, which you
>     >>> probably aren't using.
>     >>
>     >> I don't understand this.
>     >
>     > The i_size_read/i_size_write infrastructure is designed to efficiently
>     > handle the situation where a 32-bit machine reads a 64-bit number which
>     > might be undergoing modification on another CPU.  We don't want the
>     > reading CPU to see an invalid number when the writing CPU is in the
>     > middle of modifying the two 32-bit words.
>     >
>     >
>     >
> 
>     ok. thanks.
> 
> 
>     _______________________________________________
>     Ocfs2-devel mailing list
>     Ocfs2-devel@oss.oracle.com <mailto:Ocfs2-devel@oss.oracle.com>
>     https://oss.oracle.com/mailman/listinfo/ocfs2-devel
> 
>
Andrew Morton June 27, 2013, 10:08 p.m. UTC | #7
On Wed, 26 Jun 2013 20:34:19 -0700 Sunil Mushran <sunil.mushran@gmail.com> wrote:

> AFAIR, this behavior has been there since day 1 and changing it will impact
> performance negatively. I would recommend against making this change for
> one app.

Well, it's a bug fix isn't it?  SEEK_END on a 15k file is presently returning
10k.

Obviously tradeoffs are involved - is there any way in which this
behaviour can be fixed without serious performance damage?
Sunil Mushran June 27, 2013, 10:59 p.m. UTC | #8
The qs is whether this change is required for a real problem or not. If so,
what is that logic
that gets tripped up by this behaviour.


On Thu, Jun 27, 2013 at 3:08 PM, Andrew Morton <akpm@linux-foundation.org>wrote:

> On Wed, 26 Jun 2013 20:34:19 -0700 Sunil Mushran <sunil.mushran@gmail.com>
> wrote:
>
> > AFAIR, this behavior has been there since day 1 and changing it will
> impact
> > performance negatively. I would recommend against making this change for
> > one app.
>
> Well, it's a bug fix isn't it?  SEEK_END on a 15k file is presently
> returning
> 10k.
>
> Obviously tradeoffs are involved - is there any way in which this
> behaviour can be fixed without serious performance damage?
>
>
jeff.liu June 28, 2013, 9:11 a.m. UTC | #9
On 06/28/2013 06:59 AM, Sunil Mushran wrote:

> The qs is whether this change is required for a real problem or not. If
> so, what is that logic
> that gets tripped up by this behaviour.


IMHO, there have a couple of commands in Coreutils would be affected by this
behavior:
- tail(1) with "-c bytes" option.
- wc(1): when counting only bytes, it will try to figure out the file size
	 through lseek (SEEK_END - SEEK_CUR) when counting only bytes.

Other popular programs need get the file size via SEEK_END only in some corner
cases.  e.g, if the input file is not a regular file or the utility failed to
fetch the file size via fstat(2), they will call lseek as an alternative approach,
like split(1), truncate(1), head(1) with '-c bytes' option.

Generally, the user applications might fetch the file size through fstat(2) which
will go through ocfs2_getattr() so that it's safe to us because we perform inode
re-validation before filling up the generic attributes.

IOWs, the user application could be changed accordingly to avoid this issue.
However, this most likely a workaround to some extent, and we have a bug for
the SEEK_END business.

As per our current discussion, one big concern of us is regarding the performance
with this fix, I'd like to consider it from another point of view, that is the user
should take all the blame to themselves if they would like to run SEEK_END bounds
operation on OCFS2, does it sounds make sense? :-P.
If yes, it would be better to consolidate the code comments to indicate the
performance negatively and it is not advisable to fetch file size via SEEK_END
on OCFS2.

Thanks,
-Jeff

> 
> 
> On Thu, Jun 27, 2013 at 3:08 PM, Andrew Morton
> <akpm@linux-foundation.org <mailto:akpm@linux-foundation.org>> wrote:
> 
>     On Wed, 26 Jun 2013 20:34:19 -0700 Sunil Mushran
>     <sunil.mushran@gmail.com <mailto:sunil.mushran@gmail.com>> wrote:
> 
>     > AFAIR, this behavior has been there since day 1 and changing it
>     will impact
>     > performance negatively. I would recommend against making this
>     change for
>     > one app.
> 
>     Well, it's a bug fix isn't it?  SEEK_END on a 15k file is presently
>     returning
>     10k.
> 
>     Obviously tradeoffs are involved - is there any way in which this
>     behaviour can be fixed without serious performance damage?
> 
> 
> 
> 
> _______________________________________________
> Ocfs2-devel mailing list
> Ocfs2-devel@oss.oracle.com
> https://oss.oracle.com/mailman/listinfo/ocfs2-devel
Joel Becker June 29, 2013, 1:28 p.m. UTC | #10
On Fri, Jun 28, 2013 at 05:11:25PM +0800, Jeff Liu wrote:
> On 06/28/2013 06:59 AM, Sunil Mushran wrote:
> 
> > The qs is whether this change is required for a real problem or not. If
> > so, what is that logic
> > that gets tripped up by this behaviour.

This is a straight correctness fix.  We take the hit and apply it.  I'll
check the actual patch on that email.

Joel

> 
> 
> IMHO, there have a couple of commands in Coreutils would be affected by this
> behavior:
> - tail(1) with "-c bytes" option.
> - wc(1): when counting only bytes, it will try to figure out the file size
> 	 through lseek (SEEK_END - SEEK_CUR) when counting only bytes.
> 
> Other popular programs need get the file size via SEEK_END only in some corner
> cases.  e.g, if the input file is not a regular file or the utility failed to
> fetch the file size via fstat(2), they will call lseek as an alternative approach,
> like split(1), truncate(1), head(1) with '-c bytes' option.
> 
> Generally, the user applications might fetch the file size through fstat(2) which
> will go through ocfs2_getattr() so that it's safe to us because we perform inode
> re-validation before filling up the generic attributes.
> 
> IOWs, the user application could be changed accordingly to avoid this issue.
> However, this most likely a workaround to some extent, and we have a bug for
> the SEEK_END business.
> 
> As per our current discussion, one big concern of us is regarding the performance
> with this fix, I'd like to consider it from another point of view, that is the user
> should take all the blame to themselves if they would like to run SEEK_END bounds
> operation on OCFS2, does it sounds make sense? :-P.
> If yes, it would be better to consolidate the code comments to indicate the
> performance negatively and it is not advisable to fetch file size via SEEK_END
> on OCFS2.
> 
> Thanks,
> -Jeff
> 
> > 
> > 
> > On Thu, Jun 27, 2013 at 3:08 PM, Andrew Morton
> > <akpm@linux-foundation.org <mailto:akpm@linux-foundation.org>> wrote:
> > 
> >     On Wed, 26 Jun 2013 20:34:19 -0700 Sunil Mushran
> >     <sunil.mushran@gmail.com <mailto:sunil.mushran@gmail.com>> wrote:
> > 
> >     > AFAIR, this behavior has been there since day 1 and changing it
> >     will impact
> >     > performance negatively. I would recommend against making this
> >     change for
> >     > one app.
> > 
> >     Well, it's a bug fix isn't it?  SEEK_END on a 15k file is presently
> >     returning
> >     10k.
> > 
> >     Obviously tradeoffs are involved - is there any way in which this
> >     behaviour can be fixed without serious performance damage?
> > 
> > 
> > 
> > 
> > _______________________________________________
> > Ocfs2-devel mailing list
> > Ocfs2-devel@oss.oracle.com
> > https://oss.oracle.com/mailman/listinfo/ocfs2-devel
> 
> 
> 
> _______________________________________________
> Ocfs2-devel mailing list
> Ocfs2-devel@oss.oracle.com
> https://oss.oracle.com/mailman/listinfo/ocfs2-devel
Joel Becker June 29, 2013, 1:37 p.m. UTC | #11
On Thu, Jun 20, 2013 at 04:23:59PM +0800, shencanquan wrote:
> llseek requires ocfs2 inode lock for updating the file size in SEEK_END.
> because the file size maybe update on another node.
> if it not . after call llseek in SEEK_END. the position is old.
> 
> this bug can be reproduce the following scenario:
> at first ,we dd a test fileA,the file size is 10k.
> on NodeA:
> ---------
> 1) open the test fileA, lseek the end of file. and print the position.
> 2) close the test fileA
> 
> on NodeB:
> 1) open the test fileA, append the 5k data to test FileA.
> 2) lseek the end of file. and print the position.
> 3) close file.
> 
> at first we run the test program1 on NodeA , the result is 10k.
> and then run the test program2 on NodeB,  the result is 15k.
> at last, we run the test program1 on NodeA again, the result is 10k.
> 
> after apply this patch.  the three step result is 15k.
> 
> Signed-off-by: jensen <shencanquan@huawei.com>
> ---
>  fs/ocfs2/file.c |    9 +++++++++
>  1 file changed, 9 insertions(+)
> 
> diff --git a/fs/ocfs2/file.c b/fs/ocfs2/file.c
> index ff54014..3afd24c 100644
> --- a/fs/ocfs2/file.c
> +++ b/fs/ocfs2/file.c
> @@ -2626,7 +2626,16 @@ static loff_t ocfs2_file_llseek(struct file *file, loff_t offset, int whence)
>  	case SEEK_SET:
>  		break;
>  	case SEEK_END:
> +		/* SEEK_END requires the OCFS2 inode lock for the file
> +		 * because it references the file's size.
> +		 */
> +		ret = ocfs2_inode_lock(inode, NULL, 0);
> +		if (ret < 0) {
> +			mlog_errno(ret);
> +			goto out;
> +		}
>  		offset += inode->i_size;
> +		ocfs2_inode_unlock(inode, 0);

Why wouldn't ocfs2_rw_lock() work?  Just because we dont get the LVB
from it?

Joel

>  		break;
>  	case SEEK_CUR:
>  		if (offset == 0) {
> -- 
> 1.7.9.7
> 
> 
> _______________________________________________
> Ocfs2-devel mailing list
> Ocfs2-devel@oss.oracle.com
> https://oss.oracle.com/mailman/listinfo/ocfs2-devel
Shen Canquan July 1, 2013, 1:38 a.m. UTC | #12
On 2013/6/29 21:37, Joel Becker wrote:

> On Thu, Jun 20, 2013 at 04:23:59PM +0800, shencanquan wrote:
>> llseek requires ocfs2 inode lock for updating the file size in SEEK_END.
>> because the file size maybe update on another node.
>> if it not . after call llseek in SEEK_END. the position is old.
>>
>> this bug can be reproduce the following scenario:
>> at first ,we dd a test fileA,the file size is 10k.
>> on NodeA:
>> ---------
>> 1) open the test fileA, lseek the end of file. and print the position.
>> 2) close the test fileA
>>
>> on NodeB:
>> 1) open the test fileA, append the 5k data to test FileA.
>> 2) lseek the end of file. and print the position.
>> 3) close file.
>>
>> at first we run the test program1 on NodeA , the result is 10k.
>> and then run the test program2 on NodeB,  the result is 15k.
>> at last, we run the test program1 on NodeA again, the result is 10k.
>>
>> after apply this patch.  the three step result is 15k.
>>
>> Signed-off-by: jensen <shencanquan@huawei.com>
>> ---
>>  fs/ocfs2/file.c |    9 +++++++++
>>  1 file changed, 9 insertions(+)
>>
>> diff --git a/fs/ocfs2/file.c b/fs/ocfs2/file.c
>> index ff54014..3afd24c 100644
>> --- a/fs/ocfs2/file.c
>> +++ b/fs/ocfs2/file.c
>> @@ -2626,7 +2626,16 @@ static loff_t ocfs2_file_llseek(struct file *file, loff_t offset, int whence)
>>  	case SEEK_SET:
>>  		break;
>>  	case SEEK_END:
>> +		/* SEEK_END requires the OCFS2 inode lock for the file
>> +		 * because it references the file's size.
>> +		 */
>> +		ret = ocfs2_inode_lock(inode, NULL, 0);
>> +		if (ret < 0) {
>> +			mlog_errno(ret);
>> +			goto out;
>> +		}
>>  		offset += inode->i_size;
>> +		ocfs2_inode_unlock(inode, 0);
> 
> Why wouldn't ocfs2_rw_lock() work?  Just because we dont get the LVB
> from it?
> 


Yes. we want to update the inode size from lvb.

I also think the file size maybe protected by inode lock. not rw lock.

> Joel
> 
>>  		break;
>>  	case SEEK_CUR:
>>  		if (offset == 0) {
>> -- 
>> 1.7.9.7
>>
>>
>> _______________________________________________
>> Ocfs2-devel mailing list
>> Ocfs2-devel@oss.oracle.com
>> https://oss.oracle.com/mailman/listinfo/ocfs2-devel
>
Mark Fasheh July 2, 2013, 7:58 p.m. UTC | #13
On Mon, Jul 01, 2013 at 09:38:31AM +0800, Jensen wrote:
> On 2013/6/29 21:37, Joel Becker wrote:
> 
> > On Thu, Jun 20, 2013 at 04:23:59PM +0800, shencanquan wrote:
> >> llseek requires ocfs2 inode lock for updating the file size in SEEK_END.
> >> because the file size maybe update on another node.
> >> if it not . after call llseek in SEEK_END. the position is old.
> >>
> >> this bug can be reproduce the following scenario:
> >> at first ,we dd a test fileA,the file size is 10k.
> >> on NodeA:
> >> ---------
> >> 1) open the test fileA, lseek the end of file. and print the position.
> >> 2) close the test fileA
> >>
> >> on NodeB:
> >> 1) open the test fileA, append the 5k data to test FileA.
> >> 2) lseek the end of file. and print the position.
> >> 3) close file.
> >>
> >> at first we run the test program1 on NodeA , the result is 10k.
> >> and then run the test program2 on NodeB,  the result is 15k.
> >> at last, we run the test program1 on NodeA again, the result is 10k.
> >>
> >> after apply this patch.  the three step result is 15k.
> >>
> >> Signed-off-by: jensen <shencanquan@huawei.com>
> >> ---
> >>  fs/ocfs2/file.c |    9 +++++++++
> >>  1 file changed, 9 insertions(+)
> >>
> >> diff --git a/fs/ocfs2/file.c b/fs/ocfs2/file.c
> >> index ff54014..3afd24c 100644
> >> --- a/fs/ocfs2/file.c
> >> +++ b/fs/ocfs2/file.c
> >> @@ -2626,7 +2626,16 @@ static loff_t ocfs2_file_llseek(struct file *file, loff_t offset, int whence)
> >>  	case SEEK_SET:
> >>  		break;
> >>  	case SEEK_END:
> >> +		/* SEEK_END requires the OCFS2 inode lock for the file
> >> +		 * because it references the file's size.
> >> +		 */
> >> +		ret = ocfs2_inode_lock(inode, NULL, 0);
> >> +		if (ret < 0) {
> >> +			mlog_errno(ret);
> >> +			goto out;
> >> +		}
> >>  		offset += inode->i_size;
> >> +		ocfs2_inode_unlock(inode, 0);
> > 
> > Why wouldn't ocfs2_rw_lock() work?  Just because we dont get the LVB
> > from it?
> > 
> 
> 
> Yes. we want to update the inode size from lvb.
> 
> I also think the file size maybe protected by inode lock. not rw lock.

Correct, if you want to get the most up to date i_size you'll have to be
holding the meta (inode) lock.
	--Mark

--
Mark Fasheh
Joel Becker July 2, 2013, 9:19 p.m. UTC | #14
On Tue, Jul 02, 2013 at 12:58:26PM -0700, Mark Fasheh wrote:
> On Mon, Jul 01, 2013 at 09:38:31AM +0800, Jensen wrote:
> > On 2013/6/29 21:37, Joel Becker wrote:
> > 
> > > On Thu, Jun 20, 2013 at 04:23:59PM +0800, shencanquan wrote:
> > >> llseek requires ocfs2 inode lock for updating the file size in SEEK_END.
> > >> because the file size maybe update on another node.
> > >> if it not . after call llseek in SEEK_END. the position is old.
> > >>
> > >> this bug can be reproduce the following scenario:
> > >> at first ,we dd a test fileA,the file size is 10k.
> > >> on NodeA:
> > >> ---------
> > >> 1) open the test fileA, lseek the end of file. and print the position.
> > >> 2) close the test fileA
> > >>
> > >> on NodeB:
> > >> 1) open the test fileA, append the 5k data to test FileA.
> > >> 2) lseek the end of file. and print the position.
> > >> 3) close file.
> > >>
> > >> at first we run the test program1 on NodeA , the result is 10k.
> > >> and then run the test program2 on NodeB,  the result is 15k.
> > >> at last, we run the test program1 on NodeA again, the result is 10k.
> > >>
> > >> after apply this patch.  the three step result is 15k.
> > >>
> > >> Signed-off-by: jensen <shencanquan@huawei.com>
> > >> ---
> > >>  fs/ocfs2/file.c |    9 +++++++++
> > >>  1 file changed, 9 insertions(+)
> > >>
> > >> diff --git a/fs/ocfs2/file.c b/fs/ocfs2/file.c
> > >> index ff54014..3afd24c 100644
> > >> --- a/fs/ocfs2/file.c
> > >> +++ b/fs/ocfs2/file.c
> > >> @@ -2626,7 +2626,16 @@ static loff_t ocfs2_file_llseek(struct file *file, loff_t offset, int whence)
> > >>  	case SEEK_SET:
> > >>  		break;
> > >>  	case SEEK_END:
> > >> +		/* SEEK_END requires the OCFS2 inode lock for the file
> > >> +		 * because it references the file's size.
> > >> +		 */
> > >> +		ret = ocfs2_inode_lock(inode, NULL, 0);
> > >> +		if (ret < 0) {
> > >> +			mlog_errno(ret);
> > >> +			goto out;
> > >> +		}
> > >>  		offset += inode->i_size;
> > >> +		ocfs2_inode_unlock(inode, 0);
> > > 
> > > Why wouldn't ocfs2_rw_lock() work?  Just because we dont get the LVB
> > > from it?
> > > 
> > 
> > 
> > Yes. we want to update the inode size from lvb.
> > 
> > I also think the file size maybe protected by inode lock. not rw lock.
> 
> Correct, if you want to get the most up to date i_size you'll have to be
> holding the meta (inode) lock.

Ok, then:

Acked-by: Joel Becker <jlbec@evilplan.org>

> 	--Mark
> 
> --
> Mark Fasheh
> 
> _______________________________________________
> Ocfs2-devel mailing list
> Ocfs2-devel@oss.oracle.com
> https://oss.oracle.com/mailman/listinfo/ocfs2-devel
diff mbox

Patch

diff --git a/fs/ocfs2/file.c b/fs/ocfs2/file.c
index ff54014..3afd24c 100644
--- a/fs/ocfs2/file.c
+++ b/fs/ocfs2/file.c
@@ -2626,7 +2626,16 @@  static loff_t ocfs2_file_llseek(struct file *file, loff_t offset, int whence)
 	case SEEK_SET:
 		break;
 	case SEEK_END:
+		/* SEEK_END requires the OCFS2 inode lock for the file
+		 * because it references the file's size.
+		 */
+		ret = ocfs2_inode_lock(inode, NULL, 0);
+		if (ret < 0) {
+			mlog_errno(ret);
+			goto out;
+		}
 		offset += inode->i_size;
+		ocfs2_inode_unlock(inode, 0);
 		break;
 	case SEEK_CUR:
 		if (offset == 0) {