Message ID | 1472252646-29618-1-git-send-email-ashish.samant@oracle.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Hi, Thanks for this fix. I'd like to reproduce this issue locally and test this patch, could you elaborate the detailed steps of reproduction? Thanks, Eric On 08/27/2016 07:04 AM, Ashish Samant wrote: > If we punch a hole on a reflink such that following conditions are met: > > 1. start offset is on a cluster boundary > 2. end offset is not on a cluster boundary > 3. (end offset is somewhere in another extent) or > (hole range > MAX_CONTIG_BYTES(1MB)), > > we dont COW the first cluster starting at the start offset. But in this > case, we were wrongly passing this cluster to > ocfs2_zero_range_for_truncate() to zero out. This will modify the cluster > in place and zero it in the source too. > > Fix this by skipping this cluster in such a scenario. > > Reported-by: Saar Maoz <saar.maoz@oracle.com> > Signed-off-by: Ashish Samant <ashish.samant@oracle.com> > Reviewed-by: Srinivas Eeda <srinivas.eeda@oracle.com> > --- > v1->v2: > -Changed the commit msg to include a better and generic description of > the problem, for all cluster sizes. > -Added Reported-by and Reviewed-by tags. > > fs/ocfs2/file.c | 34 ++++++++++++++++++++++++---------- > 1 file changed, 24 insertions(+), 10 deletions(-) > > diff --git a/fs/ocfs2/file.c b/fs/ocfs2/file.c > index 4e7b0dc..0b055bf 100644 > --- a/fs/ocfs2/file.c > +++ b/fs/ocfs2/file.c > @@ -1506,7 +1506,8 @@ static int ocfs2_zero_partial_clusters(struct inode *inode, > u64 start, u64 len) > { > int ret = 0; > - u64 tmpend, end = start + len; > + u64 tmpend = 0; > + u64 end = start + len; > struct ocfs2_super *osb = OCFS2_SB(inode->i_sb); > unsigned int csize = osb->s_clustersize; > handle_t *handle; > @@ -1538,18 +1539,31 @@ static int ocfs2_zero_partial_clusters(struct inode *inode, > } > > /* > - * We want to get the byte offset of the end of the 1st cluster. > + * If start is on a cluster boundary and end is somewhere in another > + * cluster, we have not COWed the cluster starting at start, unless > + * end is also within the same cluster. So, in this case, we skip this > + * first call to ocfs2_zero_range_for_truncate() truncate and move on > + * to the next one. > */ > - tmpend = (u64)osb->s_clustersize + (start & ~(osb->s_clustersize - 1)); > - if (tmpend > end) > - tmpend = end; > + if ((start & (csize - 1)) != 0) { > + /* > + * We want to get the byte offset of the end of the 1st > + * cluster. > + */ > + tmpend = (u64)osb->s_clustersize + > + (start & ~(osb->s_clustersize - 1)); > + if (tmpend > end) > + tmpend = end; > > - trace_ocfs2_zero_partial_clusters_range1((unsigned long long)start, > - (unsigned long long)tmpend); > + trace_ocfs2_zero_partial_clusters_range1( > + (unsigned long long)start, > + (unsigned long long)tmpend); > > - ret = ocfs2_zero_range_for_truncate(inode, handle, start, tmpend); > - if (ret) > - mlog_errno(ret); > + ret = ocfs2_zero_range_for_truncate(inode, handle, start, > + tmpend); > + if (ret) > + mlog_errno(ret); > + } > > if (tmpend < end) { > /*
Hi Eric, The easiest way to reproduce this is : 1. Create a random file of say 10 MB xfs_io -c 'pwrite -b 4k 0 10M' -f 10MBfile 2. Reflink it reflink -f 10MBfile reflnktest 3. Punch a hole at starting at cluster boundary with range greater that 1MB. You can also use a range that will put the end offset in another extent. fallocate -p -o 0 -l 1048615 reflnktest 4. sync 5. Check the first cluster in the source file. (It will be zeroed out). dd if=10MBfile iflag=direct bs=<cluster size> count=1 | hexdump -C Thanks, Ashish On 08/28/2016 10:39 PM, Eric Ren wrote: > Hi, > > Thanks for this fix. I'd like to reproduce this issue locally and test > this patch, > could you elaborate the detailed steps of reproduction? > > Thanks, > Eric > > On 08/27/2016 07:04 AM, Ashish Samant wrote: >> If we punch a hole on a reflink such that following conditions are met: >> >> 1. start offset is on a cluster boundary >> 2. end offset is not on a cluster boundary >> 3. (end offset is somewhere in another extent) or >> (hole range > MAX_CONTIG_BYTES(1MB)), >> >> we dont COW the first cluster starting at the start offset. But in this >> case, we were wrongly passing this cluster to >> ocfs2_zero_range_for_truncate() to zero out. This will modify the >> cluster >> in place and zero it in the source too. >> >> Fix this by skipping this cluster in such a scenario. >> >> Reported-by: Saar Maoz <saar.maoz@oracle.com> >> Signed-off-by: Ashish Samant <ashish.samant@oracle.com> >> Reviewed-by: Srinivas Eeda <srinivas.eeda@oracle.com> >> --- >> v1->v2: >> -Changed the commit msg to include a better and generic description of >> the problem, for all cluster sizes. >> -Added Reported-by and Reviewed-by tags. >> fs/ocfs2/file.c | 34 ++++++++++++++++++++++++---------- >> 1 file changed, 24 insertions(+), 10 deletions(-) >> >> diff --git a/fs/ocfs2/file.c b/fs/ocfs2/file.c >> index 4e7b0dc..0b055bf 100644 >> --- a/fs/ocfs2/file.c >> +++ b/fs/ocfs2/file.c >> @@ -1506,7 +1506,8 @@ static int ocfs2_zero_partial_clusters(struct >> inode *inode, >> u64 start, u64 len) >> { >> int ret = 0; >> - u64 tmpend, end = start + len; >> + u64 tmpend = 0; >> + u64 end = start + len; >> struct ocfs2_super *osb = OCFS2_SB(inode->i_sb); >> unsigned int csize = osb->s_clustersize; >> handle_t *handle; >> @@ -1538,18 +1539,31 @@ static int ocfs2_zero_partial_clusters(struct >> inode *inode, >> } >> /* >> - * We want to get the byte offset of the end of the 1st cluster. >> + * If start is on a cluster boundary and end is somewhere in >> another >> + * cluster, we have not COWed the cluster starting at start, unless >> + * end is also within the same cluster. So, in this case, we >> skip this >> + * first call to ocfs2_zero_range_for_truncate() truncate and >> move on >> + * to the next one. >> */ >> - tmpend = (u64)osb->s_clustersize + (start & ~(osb->s_clustersize >> - 1)); >> - if (tmpend > end) >> - tmpend = end; >> + if ((start & (csize - 1)) != 0) { >> + /* >> + * We want to get the byte offset of the end of the 1st >> + * cluster. >> + */ >> + tmpend = (u64)osb->s_clustersize + >> + (start & ~(osb->s_clustersize - 1)); >> + if (tmpend > end) >> + tmpend = end; >> - trace_ocfs2_zero_partial_clusters_range1((unsigned long >> long)start, >> - (unsigned long long)tmpend); >> + trace_ocfs2_zero_partial_clusters_range1( >> + (unsigned long long)start, >> + (unsigned long long)tmpend); >> - ret = ocfs2_zero_range_for_truncate(inode, handle, start, >> tmpend); >> - if (ret) >> - mlog_errno(ret); >> + ret = ocfs2_zero_range_for_truncate(inode, handle, start, >> + tmpend); >> + if (ret) >> + mlog_errno(ret); >> + } >> if (tmpend < end) { >> /* > >
On 08/30/2016 03:23 AM, Ashish Samant wrote: > Hi Eric, > > The easiest way to reproduce this is : > > 1. Create a random file of say 10 MB > xfs_io -c 'pwrite -b 4k 0 10M' -f 10MBfile > 2. Reflink it > reflink -f 10MBfile reflnktest > 3. Punch a hole at starting at cluster boundary with range greater that > 1MB. You can also use a range that will put the end offset in another > extent. > fallocate -p -o 0 -l 1048615 reflnktest > 4. sync > 5. Check the first cluster in the source file. (It will be zeroed out). > dd if=10MBfile iflag=direct bs=<cluster size> count=1 | hexdump -C > Cool, this reproduce step deserved to be add into patch log. Thanks, Junxiao. > Thanks, > Ashish > > On 08/28/2016 10:39 PM, Eric Ren wrote: >> Hi, >> >> Thanks for this fix. I'd like to reproduce this issue locally and test >> this patch, >> could you elaborate the detailed steps of reproduction? >> >> Thanks, >> Eric >> >> On 08/27/2016 07:04 AM, Ashish Samant wrote: >>> If we punch a hole on a reflink such that following conditions are met: >>> >>> 1. start offset is on a cluster boundary >>> 2. end offset is not on a cluster boundary >>> 3. (end offset is somewhere in another extent) or >>> (hole range > MAX_CONTIG_BYTES(1MB)), >>> >>> we dont COW the first cluster starting at the start offset. But in this >>> case, we were wrongly passing this cluster to >>> ocfs2_zero_range_for_truncate() to zero out. This will modify the >>> cluster >>> in place and zero it in the source too. >>> >>> Fix this by skipping this cluster in such a scenario. >>> >>> Reported-by: Saar Maoz <saar.maoz@oracle.com> >>> Signed-off-by: Ashish Samant <ashish.samant@oracle.com> >>> Reviewed-by: Srinivas Eeda <srinivas.eeda@oracle.com> >>> --- >>> v1->v2: >>> -Changed the commit msg to include a better and generic description of >>> the problem, for all cluster sizes. >>> -Added Reported-by and Reviewed-by tags. >>> fs/ocfs2/file.c | 34 ++++++++++++++++++++++++---------- >>> 1 file changed, 24 insertions(+), 10 deletions(-) >>> >>> diff --git a/fs/ocfs2/file.c b/fs/ocfs2/file.c >>> index 4e7b0dc..0b055bf 100644 >>> --- a/fs/ocfs2/file.c >>> +++ b/fs/ocfs2/file.c >>> @@ -1506,7 +1506,8 @@ static int ocfs2_zero_partial_clusters(struct >>> inode *inode, >>> u64 start, u64 len) >>> { >>> int ret = 0; >>> - u64 tmpend, end = start + len; >>> + u64 tmpend = 0; >>> + u64 end = start + len; >>> struct ocfs2_super *osb = OCFS2_SB(inode->i_sb); >>> unsigned int csize = osb->s_clustersize; >>> handle_t *handle; >>> @@ -1538,18 +1539,31 @@ static int ocfs2_zero_partial_clusters(struct >>> inode *inode, >>> } >>> /* >>> - * We want to get the byte offset of the end of the 1st cluster. >>> + * If start is on a cluster boundary and end is somewhere in >>> another >>> + * cluster, we have not COWed the cluster starting at start, unless >>> + * end is also within the same cluster. So, in this case, we >>> skip this >>> + * first call to ocfs2_zero_range_for_truncate() truncate and >>> move on >>> + * to the next one. >>> */ >>> - tmpend = (u64)osb->s_clustersize + (start & ~(osb->s_clustersize >>> - 1)); >>> - if (tmpend > end) >>> - tmpend = end; >>> + if ((start & (csize - 1)) != 0) { >>> + /* >>> + * We want to get the byte offset of the end of the 1st >>> + * cluster. >>> + */ >>> + tmpend = (u64)osb->s_clustersize + >>> + (start & ~(osb->s_clustersize - 1)); >>> + if (tmpend > end) >>> + tmpend = end; >>> - trace_ocfs2_zero_partial_clusters_range1((unsigned long >>> long)start, >>> - (unsigned long long)tmpend); >>> + trace_ocfs2_zero_partial_clusters_range1( >>> + (unsigned long long)start, >>> + (unsigned long long)tmpend); >>> - ret = ocfs2_zero_range_for_truncate(inode, handle, start, >>> tmpend); >>> - if (ret) >>> - mlog_errno(ret); >>> + ret = ocfs2_zero_range_for_truncate(inode, handle, start, >>> + tmpend); >>> + if (ret) >>> + mlog_errno(ret); >>> + } >>> if (tmpend < end) { >>> /* >> >> > > > _______________________________________________ > Ocfs2-devel mailing list > Ocfs2-devel@oss.oracle.com > https://oss.oracle.com/mailman/listinfo/ocfs2-devel >
Hello, On 08/30/2016 03:23 AM, Ashish Samant wrote: > Hi Eric, > > The easiest way to reproduce this is : > > 1. Create a random file of say 10 MB > xfs_io -c 'pwrite -b 4k 0 10M' -f 10MBfile > 2. Reflink it > reflink -f 10MBfile reflnktest > 3. Punch a hole at starting at cluster boundary with range greater that 1MB. You can also > use a range that will put the end offset in another extent. > fallocate -p -o 0 -l 1048615 reflnktest > 4. sync > 5. Check the first cluster in the source file. (It will be zeroed out). > dd if=10MBfile iflag=direct bs=<cluster size> count=1 | hexdump -C Thanks! I have a try myself, but I'm not sure what is our expected output and if the test result meet it: 1. After applying this patch: ocfs2dev1:/mnt/ocfs2 # rm 10MBfile reflnktest ocfs2dev1:/mnt/ocfs2 # xfs_io -c 'pwrite -b 4k 0 10M' -f 10MBfile wrote 10485760/10485760 bytes at offset 0 10 MiB, 2560 ops; 0.0000 sec (1.089 GiB/sec and 285427.5839 ops/sec) ocfs2dev1:/mnt/ocfs2 # reflink -f 10MBfile reflnktest ocfs2dev1:/mnt/ocfs2 # fallocate -p -o 0 -l 1048615 reflnktest ocfs2dev1:/mnt/ocfs2 # dd if=10MBfile iflag=direct bs=1048576 count=1 | hexdump -C 00000000 cd cd cd cd cd cd cd cd cd cd cd cd cd cd cd cd |................| * 1+0 records in 1+0 records out 1048576 bytes (1.0 MB, 1.0 MiB) copied, 0.0952464 s, 11.0 MB/s 00100000 2. Before this patch: .... ocfs2dev1:/mnt/ocfs2 # dd if=10MBfile iflag=direct bs=1048576 count=1 | hexdump -C 00000000 cd cd cd cd cd cd cd cd cd cd cd cd cd cd cd cd |................| * 1+0 records in 1+0 records out 1048576 bytes (1.0 MB, 1.0 MiB) copied, 0.0954648 s, 11.0 MB/s 00100000 3. debugfs.ocfs2 -R stats /dev/sdb ... Block Size Bits: 12 Cluster Size Bits: 20 ... Eric > > Thanks, > Ashish > > On 08/28/2016 10:39 PM, Eric Ren wrote: >> Hi, >> >> Thanks for this fix. I'd like to reproduce this issue locally and test this patch, >> could you elaborate the detailed steps of reproduction? >> >> Thanks, >> Eric >> >> On 08/27/2016 07:04 AM, Ashish Samant wrote: >>> If we punch a hole on a reflink such that following conditions are met: >>> >>> 1. start offset is on a cluster boundary >>> 2. end offset is not on a cluster boundary >>> 3. (end offset is somewhere in another extent) or >>> (hole range > MAX_CONTIG_BYTES(1MB)), >>> >>> we dont COW the first cluster starting at the start offset. But in this >>> case, we were wrongly passing this cluster to >>> ocfs2_zero_range_for_truncate() to zero out. This will modify the cluster >>> in place and zero it in the source too. >>> >>> Fix this by skipping this cluster in such a scenario. >>> >>> Reported-by: Saar Maoz <saar.maoz@oracle.com> >>> Signed-off-by: Ashish Samant <ashish.samant@oracle.com> >>> Reviewed-by: Srinivas Eeda <srinivas.eeda@oracle.com> >>> --- >>> v1->v2: >>> -Changed the commit msg to include a better and generic description of >>> the problem, for all cluster sizes. >>> -Added Reported-by and Reviewed-by tags. >>> fs/ocfs2/file.c | 34 ++++++++++++++++++++++++---------- >>> 1 file changed, 24 insertions(+), 10 deletions(-) >>> >>> diff --git a/fs/ocfs2/file.c b/fs/ocfs2/file.c >>> index 4e7b0dc..0b055bf 100644 >>> --- a/fs/ocfs2/file.c >>> +++ b/fs/ocfs2/file.c >>> @@ -1506,7 +1506,8 @@ static int ocfs2_zero_partial_clusters(struct inode *inode, >>> u64 start, u64 len) >>> { >>> int ret = 0; >>> - u64 tmpend, end = start + len; >>> + u64 tmpend = 0; >>> + u64 end = start + len; >>> struct ocfs2_super *osb = OCFS2_SB(inode->i_sb); >>> unsigned int csize = osb->s_clustersize; >>> handle_t *handle; >>> @@ -1538,18 +1539,31 @@ static int ocfs2_zero_partial_clusters(struct inode *inode, >>> } >>> /* >>> - * We want to get the byte offset of the end of the 1st cluster. >>> + * If start is on a cluster boundary and end is somewhere in another >>> + * cluster, we have not COWed the cluster starting at start, unless >>> + * end is also within the same cluster. So, in this case, we skip this >>> + * first call to ocfs2_zero_range_for_truncate() truncate and move on >>> + * to the next one. >>> */ >>> - tmpend = (u64)osb->s_clustersize + (start & ~(osb->s_clustersize - 1)); >>> - if (tmpend > end) >>> - tmpend = end; >>> + if ((start & (csize - 1)) != 0) { >>> + /* >>> + * We want to get the byte offset of the end of the 1st >>> + * cluster. >>> + */ >>> + tmpend = (u64)osb->s_clustersize + >>> + (start & ~(osb->s_clustersize - 1)); >>> + if (tmpend > end) >>> + tmpend = end; >>> - trace_ocfs2_zero_partial_clusters_range1((unsigned long long)start, >>> - (unsigned long long)tmpend); >>> + trace_ocfs2_zero_partial_clusters_range1( >>> + (unsigned long long)start, >>> + (unsigned long long)tmpend); >>> - ret = ocfs2_zero_range_for_truncate(inode, handle, start, tmpend); >>> - if (ret) >>> - mlog_errno(ret); >>> + ret = ocfs2_zero_range_for_truncate(inode, handle, start, >>> + tmpend); >>> + if (ret) >>> + mlog_errno(ret); >>> + } >>> if (tmpend < end) { >>> /* >> >> >
Hmm, thats weird. I see this on 4.7 kernel without the patch: # xfs_io -c 'pwrite -b 4k 0 10M' -f 10MBfile wrote 10485760/10485760 bytes at offset 0 10 MiB, 2560 ops; 0.0000 sec (683.995 MiB/sec and 175102.5992 ops/sec) # reflink -f 10MBfile reflnktest # fallocate -p -o 0 -l 1048615 reflnktest # dd if=10MBfile iflag=direct bs=1048576 count=1 | hexdump -C 00000000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................| * 1+0 records in 1+0 records out 1048576 bytes (1.0 MB) copied, 0.0321517 s, 32.6 MB/s 00100000 and with patch ---- # dd if=10MBfile iflag=direct bs=1M count=1 | hexdump -C 00000000 cd cd cd cd cd cd cd cd cd cd cd cd cd cd cd cd |................| * 1+0 records in 1+0 records out 00100000 Thanks, Ashish On 08/29/2016 08:33 PM, Eric Ren wrote: > Hello, > > On 08/30/2016 03:23 AM, Ashish Samant wrote: >> Hi Eric, >> >> The easiest way to reproduce this is : >> >> 1. Create a random file of say 10 MB >> xfs_io -c 'pwrite -b 4k 0 10M' -f 10MBfile >> 2. Reflink it >> reflink -f 10MBfile reflnktest >> 3. Punch a hole at starting at cluster boundary with range greater >> that 1MB. You can also use a range that will put the end offset in >> another extent. >> fallocate -p -o 0 -l 1048615 reflnktest >> 4. sync >> 5. Check the first cluster in the source file. (It will be zeroed out). >> dd if=10MBfile iflag=direct bs=<cluster size> count=1 | hexdump -C > > Thanks! I have a try myself, but I'm not sure what is our expected > output and if the test result meet > it: > > 1. After applying this patch: > ocfs2dev1:/mnt/ocfs2 # rm 10MBfile reflnktest > ocfs2dev1:/mnt/ocfs2 # xfs_io -c 'pwrite -b 4k 0 10M' -f 10MBfile > wrote 10485760/10485760 bytes at offset 0 > 10 MiB, 2560 ops; 0.0000 sec (1.089 GiB/sec and 285427.5839 ops/sec) > ocfs2dev1:/mnt/ocfs2 # reflink -f 10MBfile reflnktest > ocfs2dev1:/mnt/ocfs2 # fallocate -p -o 0 -l 1048615 reflnktest > ocfs2dev1:/mnt/ocfs2 # dd if=10MBfile iflag=direct bs=1048576 count=1 > | hexdump -C > 00000000 cd cd cd cd cd cd cd cd cd cd cd cd cd cd cd cd > |................| > * > 1+0 records in > 1+0 records out > 1048576 bytes (1.0 MB, 1.0 MiB) copied, 0.0952464 s, 11.0 MB/s > 00100000 > > 2. Before this patch: > .... > ocfs2dev1:/mnt/ocfs2 # dd if=10MBfile iflag=direct bs=1048576 count=1 > | hexdump -C > 00000000 cd cd cd cd cd cd cd cd cd cd cd cd cd cd cd cd > |................| > * > 1+0 records in > 1+0 records out > 1048576 bytes (1.0 MB, 1.0 MiB) copied, 0.0954648 s, 11.0 MB/s > 00100000 > > 3. debugfs.ocfs2 -R stats /dev/sdb > ... > Block Size Bits: 12 Cluster Size Bits: 20 > ... > > Eric >> >> Thanks, >> Ashish >> >> On 08/28/2016 10:39 PM, Eric Ren wrote: >>> Hi, >>> >>> Thanks for this fix. I'd like to reproduce this issue locally and >>> test this patch, >>> could you elaborate the detailed steps of reproduction? >>> >>> Thanks, >>> Eric >>> >>> On 08/27/2016 07:04 AM, Ashish Samant wrote: >>>> If we punch a hole on a reflink such that following conditions are >>>> met: >>>> >>>> 1. start offset is on a cluster boundary >>>> 2. end offset is not on a cluster boundary >>>> 3. (end offset is somewhere in another extent) or >>>> (hole range > MAX_CONTIG_BYTES(1MB)), >>>> >>>> we dont COW the first cluster starting at the start offset. But in >>>> this >>>> case, we were wrongly passing this cluster to >>>> ocfs2_zero_range_for_truncate() to zero out. This will modify the >>>> cluster >>>> in place and zero it in the source too. >>>> >>>> Fix this by skipping this cluster in such a scenario. >>>> >>>> Reported-by: Saar Maoz <saar.maoz@oracle.com> >>>> Signed-off-by: Ashish Samant <ashish.samant@oracle.com> >>>> Reviewed-by: Srinivas Eeda <srinivas.eeda@oracle.com> >>>> --- >>>> v1->v2: >>>> -Changed the commit msg to include a better and generic description of >>>> the problem, for all cluster sizes. >>>> -Added Reported-by and Reviewed-by tags. >>>> fs/ocfs2/file.c | 34 ++++++++++++++++++++++++---------- >>>> 1 file changed, 24 insertions(+), 10 deletions(-) >>>> >>>> diff --git a/fs/ocfs2/file.c b/fs/ocfs2/file.c >>>> index 4e7b0dc..0b055bf 100644 >>>> --- a/fs/ocfs2/file.c >>>> +++ b/fs/ocfs2/file.c >>>> @@ -1506,7 +1506,8 @@ static int ocfs2_zero_partial_clusters(struct >>>> inode *inode, >>>> u64 start, u64 len) >>>> { >>>> int ret = 0; >>>> - u64 tmpend, end = start + len; >>>> + u64 tmpend = 0; >>>> + u64 end = start + len; >>>> struct ocfs2_super *osb = OCFS2_SB(inode->i_sb); >>>> unsigned int csize = osb->s_clustersize; >>>> handle_t *handle; >>>> @@ -1538,18 +1539,31 @@ static int >>>> ocfs2_zero_partial_clusters(struct inode *inode, >>>> } >>>> /* >>>> - * We want to get the byte offset of the end of the 1st cluster. >>>> + * If start is on a cluster boundary and end is somewhere in >>>> another >>>> + * cluster, we have not COWed the cluster starting at start, >>>> unless >>>> + * end is also within the same cluster. So, in this case, we >>>> skip this >>>> + * first call to ocfs2_zero_range_for_truncate() truncate and >>>> move on >>>> + * to the next one. >>>> */ >>>> - tmpend = (u64)osb->s_clustersize + (start & >>>> ~(osb->s_clustersize - 1)); >>>> - if (tmpend > end) >>>> - tmpend = end; >>>> + if ((start & (csize - 1)) != 0) { >>>> + /* >>>> + * We want to get the byte offset of the end of the 1st >>>> + * cluster. >>>> + */ >>>> + tmpend = (u64)osb->s_clustersize + >>>> + (start & ~(osb->s_clustersize - 1)); >>>> + if (tmpend > end) >>>> + tmpend = end; >>>> - trace_ocfs2_zero_partial_clusters_range1((unsigned long >>>> long)start, >>>> - (unsigned long long)tmpend); >>>> + trace_ocfs2_zero_partial_clusters_range1( >>>> + (unsigned long long)start, >>>> + (unsigned long long)tmpend); >>>> - ret = ocfs2_zero_range_for_truncate(inode, handle, start, >>>> tmpend); >>>> - if (ret) >>>> - mlog_errno(ret); >>>> + ret = ocfs2_zero_range_for_truncate(inode, handle, start, >>>> + tmpend); >>>> + if (ret) >>>> + mlog_errno(ret); >>>> + } >>>> if (tmpend < end) { >>>> /* >>> >>> >> >
Hi, I'm on 4.8.0-rc3 kernel. Hope someone else can double-confirm this;-) On 08/30/2016 12:11 PM, Ashish Samant wrote: > Hmm, thats weird. I see this on 4.7 kernel without the patch: > > # xfs_io -c 'pwrite -b 4k 0 10M' -f 10MBfile > wrote 10485760/10485760 bytes at offset 0 > 10 MiB, 2560 ops; 0.0000 sec (683.995 MiB/sec and 175102.5992 ops/sec) > # reflink -f 10MBfile reflnktest > # fallocate -p -o 0 -l 1048615 reflnktest > # dd if=10MBfile iflag=direct bs=1048576 count=1 | hexdump -C > 00000000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................| > * > 1+0 records in > 1+0 records out > 1048576 bytes (1.0 MB) copied, 0.0321517 s, 32.6 MB/s > 00100000 > > and with patch > ---- > # dd if=10MBfile iflag=direct bs=1M count=1 | hexdump -C > 00000000 cd cd cd cd cd cd cd cd cd cd cd cd cd cd cd cd |................| I'm not familiar with this code. So why is the output "cd ..."? because we didn't write anything into "10MBfile". Is it a magic number when reading from a hole? Eric > * > 1+0 records in > 1+0 records out > 00100000 > > Thanks, > Ashish > > > On 08/29/2016 08:33 PM, Eric Ren wrote: >> Hello, >> >> On 08/30/2016 03:23 AM, Ashish Samant wrote: >>> Hi Eric, >>> >>> The easiest way to reproduce this is : >>> >>> 1. Create a random file of say 10 MB >>> xfs_io -c 'pwrite -b 4k 0 10M' -f 10MBfile >>> 2. Reflink it >>> reflink -f 10MBfile reflnktest >>> 3. Punch a hole at starting at cluster boundary with range greater that 1MB. You can >>> also use a range that will put the end offset in another extent. >>> fallocate -p -o 0 -l 1048615 reflnktest >>> 4. sync >>> 5. Check the first cluster in the source file. (It will be zeroed out). >>> dd if=10MBfile iflag=direct bs=<cluster size> count=1 | hexdump -C >> >> Thanks! I have a try myself, but I'm not sure what is our expected output and if the test >> result meet >> it: >> >> 1. After applying this patch: >> ocfs2dev1:/mnt/ocfs2 # rm 10MBfile reflnktest >> ocfs2dev1:/mnt/ocfs2 # xfs_io -c 'pwrite -b 4k 0 10M' -f 10MBfile >> wrote 10485760/10485760 bytes at offset 0 >> 10 MiB, 2560 ops; 0.0000 sec (1.089 GiB/sec and 285427.5839 ops/sec) >> ocfs2dev1:/mnt/ocfs2 # reflink -f 10MBfile reflnktest >> ocfs2dev1:/mnt/ocfs2 # fallocate -p -o 0 -l 1048615 reflnktest >> ocfs2dev1:/mnt/ocfs2 # dd if=10MBfile iflag=direct bs=1048576 count=1 | hexdump -C >> 00000000 cd cd cd cd cd cd cd cd cd cd cd cd cd cd cd cd |................| >> * >> 1+0 records in >> 1+0 records out >> 1048576 bytes (1.0 MB, 1.0 MiB) copied, 0.0952464 s, 11.0 MB/s >> 00100000 >> >> 2. Before this patch: >> .... >> ocfs2dev1:/mnt/ocfs2 # dd if=10MBfile iflag=direct bs=1048576 count=1 | hexdump -C >> 00000000 cd cd cd cd cd cd cd cd cd cd cd cd cd cd cd cd |................| >> * >> 1+0 records in >> 1+0 records out >> 1048576 bytes (1.0 MB, 1.0 MiB) copied, 0.0954648 s, 11.0 MB/s >> 00100000 >> >> 3. debugfs.ocfs2 -R stats /dev/sdb >> ... >> Block Size Bits: 12 Cluster Size Bits: 20 >> ... >> >> Eric >>> >>> Thanks, >>> Ashish >>> >>> On 08/28/2016 10:39 PM, Eric Ren wrote: >>>> Hi, >>>> >>>> Thanks for this fix. I'd like to reproduce this issue locally and test this patch, >>>> could you elaborate the detailed steps of reproduction? >>>> >>>> Thanks, >>>> Eric >>>> >>>> On 08/27/2016 07:04 AM, Ashish Samant wrote: >>>>> If we punch a hole on a reflink such that following conditions are met: >>>>> >>>>> 1. start offset is on a cluster boundary >>>>> 2. end offset is not on a cluster boundary >>>>> 3. (end offset is somewhere in another extent) or >>>>> (hole range > MAX_CONTIG_BYTES(1MB)), >>>>> >>>>> we dont COW the first cluster starting at the start offset. But in this >>>>> case, we were wrongly passing this cluster to >>>>> ocfs2_zero_range_for_truncate() to zero out. This will modify the cluster >>>>> in place and zero it in the source too. >>>>> >>>>> Fix this by skipping this cluster in such a scenario. >>>>> >>>>> Reported-by: Saar Maoz <saar.maoz@oracle.com> >>>>> Signed-off-by: Ashish Samant <ashish.samant@oracle.com> >>>>> Reviewed-by: Srinivas Eeda <srinivas.eeda@oracle.com> >>>>> --- >>>>> v1->v2: >>>>> -Changed the commit msg to include a better and generic description of >>>>> the problem, for all cluster sizes. >>>>> -Added Reported-by and Reviewed-by tags. >>>>> fs/ocfs2/file.c | 34 ++++++++++++++++++++++++---------- >>>>> 1 file changed, 24 insertions(+), 10 deletions(-) >>>>> >>>>> diff --git a/fs/ocfs2/file.c b/fs/ocfs2/file.c >>>>> index 4e7b0dc..0b055bf 100644 >>>>> --- a/fs/ocfs2/file.c >>>>> +++ b/fs/ocfs2/file.c >>>>> @@ -1506,7 +1506,8 @@ static int ocfs2_zero_partial_clusters(struct inode *inode, >>>>> u64 start, u64 len) >>>>> { >>>>> int ret = 0; >>>>> - u64 tmpend, end = start + len; >>>>> + u64 tmpend = 0; >>>>> + u64 end = start + len; >>>>> struct ocfs2_super *osb = OCFS2_SB(inode->i_sb); >>>>> unsigned int csize = osb->s_clustersize; >>>>> handle_t *handle; >>>>> @@ -1538,18 +1539,31 @@ static int ocfs2_zero_partial_clusters(struct inode *inode, >>>>> } >>>>> /* >>>>> - * We want to get the byte offset of the end of the 1st cluster. >>>>> + * If start is on a cluster boundary and end is somewhere in another >>>>> + * cluster, we have not COWed the cluster starting at start, unless >>>>> + * end is also within the same cluster. So, in this case, we skip this >>>>> + * first call to ocfs2_zero_range_for_truncate() truncate and move on >>>>> + * to the next one. >>>>> */ >>>>> - tmpend = (u64)osb->s_clustersize + (start & ~(osb->s_clustersize - 1)); >>>>> - if (tmpend > end) >>>>> - tmpend = end; >>>>> + if ((start & (csize - 1)) != 0) { >>>>> + /* >>>>> + * We want to get the byte offset of the end of the 1st >>>>> + * cluster. >>>>> + */ >>>>> + tmpend = (u64)osb->s_clustersize + >>>>> + (start & ~(osb->s_clustersize - 1)); >>>>> + if (tmpend > end) >>>>> + tmpend = end; >>>>> - trace_ocfs2_zero_partial_clusters_range1((unsigned long long)start, >>>>> - (unsigned long long)tmpend); >>>>> + trace_ocfs2_zero_partial_clusters_range1( >>>>> + (unsigned long long)start, >>>>> + (unsigned long long)tmpend); >>>>> - ret = ocfs2_zero_range_for_truncate(inode, handle, start, tmpend); >>>>> - if (ret) >>>>> - mlog_errno(ret); >>>>> + ret = ocfs2_zero_range_for_truncate(inode, handle, start, >>>>> + tmpend); >>>>> + if (ret) >>>>> + mlog_errno(ret); >>>>> + } >>>>> if (tmpend < end) { >>>>> /* >>>> >>>> >>> >> > >
Hi Eric, I am able to reproduce this on 4.8.0-rc3 as well. Can you try again and issue a sync between fallocate and dd? On 08/30/2016 12:38 AM, Eric Ren wrote: > Hi, > > I'm on 4.8.0-rc3 kernel. Hope someone else can double-confirm this;-) > > On 08/30/2016 12:11 PM, Ashish Samant wrote: >> Hmm, thats weird. I see this on 4.7 kernel without the patch: >> >> # xfs_io -c 'pwrite -b 4k 0 10M' -f 10MBfile >> wrote 10485760/10485760 bytes at offset 0 >> 10 MiB, 2560 ops; 0.0000 sec (683.995 MiB/sec and 175102.5992 ops/sec) >> # reflink -f 10MBfile reflnktest >> # fallocate -p -o 0 -l 1048615 reflnktest >> # dd if=10MBfile iflag=direct bs=1048576 count=1 | hexdump -C >> 00000000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 >> |................| >> * >> 1+0 records in >> 1+0 records out >> 1048576 bytes (1.0 MB) copied, 0.0321517 s, 32.6 MB/s >> 00100000 >> >> and with patch >> ---- >> # dd if=10MBfile iflag=direct bs=1M count=1 | hexdump -C >> 00000000 cd cd cd cd cd cd cd cd cd cd cd cd cd cd cd cd >> |................| > > I'm not familiar with this code. So why is the output "cd ..."? > because we didn't write anything > into "10MBfile". Is it a magic number when reading from a hole? No, "cd" is what xfs_io wrote into the file. Those are the original contents of the file which are overwritten by 0 in the first cluster because of this bug. Thanks, Ashish > > Eric > >> * >> 1+0 records in >> 1+0 records out >> 00100000 > > > >> >> Thanks, >> Ashish >> >> >> On 08/29/2016 08:33 PM, Eric Ren wrote: >>> Hello, >>> >>> On 08/30/2016 03:23 AM, Ashish Samant wrote: >>>> Hi Eric, >>>> >>>> The easiest way to reproduce this is : >>>> >>>> 1. Create a random file of say 10 MB >>>> xfs_io -c 'pwrite -b 4k 0 10M' -f 10MBfile >>>> 2. Reflink it >>>> reflink -f 10MBfile reflnktest >>>> 3. Punch a hole at starting at cluster boundary with range greater >>>> that 1MB. You can also use a range that will put the end offset in >>>> another extent. >>>> fallocate -p -o 0 -l 1048615 reflnktest >>>> 4. sync >>>> 5. Check the first cluster in the source file. (It will be zeroed >>>> out). >>>> dd if=10MBfile iflag=direct bs=<cluster size> count=1 | hexdump -C >>> >>> Thanks! I have a try myself, but I'm not sure what is our expected >>> output and if the test result meet >>> it: >>> >>> 1. After applying this patch: >>> ocfs2dev1:/mnt/ocfs2 # rm 10MBfile reflnktest >>> ocfs2dev1:/mnt/ocfs2 # xfs_io -c 'pwrite -b 4k 0 10M' -f 10MBfile >>> wrote 10485760/10485760 bytes at offset 0 >>> 10 MiB, 2560 ops; 0.0000 sec (1.089 GiB/sec and 285427.5839 ops/sec) >>> ocfs2dev1:/mnt/ocfs2 # reflink -f 10MBfile reflnktest >>> ocfs2dev1:/mnt/ocfs2 # fallocate -p -o 0 -l 1048615 reflnktest >>> ocfs2dev1:/mnt/ocfs2 # dd if=10MBfile iflag=direct bs=1048576 >>> count=1 | hexdump -C >>> 00000000 cd cd cd cd cd cd cd cd cd cd cd cd cd cd cd cd >>> |................| >>> * >>> 1+0 records in >>> 1+0 records out >>> 1048576 bytes (1.0 MB, 1.0 MiB) copied, 0.0952464 s, 11.0 MB/s >>> 00100000 >>> >>> 2. Before this patch: >>> .... >>> ocfs2dev1:/mnt/ocfs2 # dd if=10MBfile iflag=direct bs=1048576 >>> count=1 | hexdump -C >>> 00000000 cd cd cd cd cd cd cd cd cd cd cd cd cd cd cd cd >>> |................| >>> * >>> 1+0 records in >>> 1+0 records out >>> 1048576 bytes (1.0 MB, 1.0 MiB) copied, 0.0954648 s, 11.0 MB/s >>> 00100000 >>> >>> 3. debugfs.ocfs2 -R stats /dev/sdb >>> ... >>> Block Size Bits: 12 Cluster Size Bits: 20 >>> ... >>> >>> Eric >>>> >>>> Thanks, >>>> Ashish >>>> >>>> On 08/28/2016 10:39 PM, Eric Ren wrote: >>>>> Hi, >>>>> >>>>> Thanks for this fix. I'd like to reproduce this issue locally and >>>>> test this patch, >>>>> could you elaborate the detailed steps of reproduction? >>>>> >>>>> Thanks, >>>>> Eric >>>>> >>>>> On 08/27/2016 07:04 AM, Ashish Samant wrote: >>>>>> If we punch a hole on a reflink such that following conditions >>>>>> are met: >>>>>> >>>>>> 1. start offset is on a cluster boundary >>>>>> 2. end offset is not on a cluster boundary >>>>>> 3. (end offset is somewhere in another extent) or >>>>>> (hole range > MAX_CONTIG_BYTES(1MB)), >>>>>> >>>>>> we dont COW the first cluster starting at the start offset. But >>>>>> in this >>>>>> case, we were wrongly passing this cluster to >>>>>> ocfs2_zero_range_for_truncate() to zero out. This will modify the >>>>>> cluster >>>>>> in place and zero it in the source too. >>>>>> >>>>>> Fix this by skipping this cluster in such a scenario. >>>>>> >>>>>> Reported-by: Saar Maoz <saar.maoz@oracle.com> >>>>>> Signed-off-by: Ashish Samant <ashish.samant@oracle.com> >>>>>> Reviewed-by: Srinivas Eeda <srinivas.eeda@oracle.com> >>>>>> --- >>>>>> v1->v2: >>>>>> -Changed the commit msg to include a better and generic >>>>>> description of >>>>>> the problem, for all cluster sizes. >>>>>> -Added Reported-by and Reviewed-by tags. >>>>>> fs/ocfs2/file.c | 34 ++++++++++++++++++++++++---------- >>>>>> 1 file changed, 24 insertions(+), 10 deletions(-) >>>>>> >>>>>> diff --git a/fs/ocfs2/file.c b/fs/ocfs2/file.c >>>>>> index 4e7b0dc..0b055bf 100644 >>>>>> --- a/fs/ocfs2/file.c >>>>>> +++ b/fs/ocfs2/file.c >>>>>> @@ -1506,7 +1506,8 @@ static int >>>>>> ocfs2_zero_partial_clusters(struct inode *inode, >>>>>> u64 start, u64 len) >>>>>> { >>>>>> int ret = 0; >>>>>> - u64 tmpend, end = start + len; >>>>>> + u64 tmpend = 0; >>>>>> + u64 end = start + len; >>>>>> struct ocfs2_super *osb = OCFS2_SB(inode->i_sb); >>>>>> unsigned int csize = osb->s_clustersize; >>>>>> handle_t *handle; >>>>>> @@ -1538,18 +1539,31 @@ static int >>>>>> ocfs2_zero_partial_clusters(struct inode *inode, >>>>>> } >>>>>> /* >>>>>> - * We want to get the byte offset of the end of the 1st >>>>>> cluster. >>>>>> + * If start is on a cluster boundary and end is somewhere in >>>>>> another >>>>>> + * cluster, we have not COWed the cluster starting at start, >>>>>> unless >>>>>> + * end is also within the same cluster. So, in this case, we >>>>>> skip this >>>>>> + * first call to ocfs2_zero_range_for_truncate() truncate >>>>>> and move on >>>>>> + * to the next one. >>>>>> */ >>>>>> - tmpend = (u64)osb->s_clustersize + (start & >>>>>> ~(osb->s_clustersize - 1)); >>>>>> - if (tmpend > end) >>>>>> - tmpend = end; >>>>>> + if ((start & (csize - 1)) != 0) { >>>>>> + /* >>>>>> + * We want to get the byte offset of the end of the 1st >>>>>> + * cluster. >>>>>> + */ >>>>>> + tmpend = (u64)osb->s_clustersize + >>>>>> + (start & ~(osb->s_clustersize - 1)); >>>>>> + if (tmpend > end) >>>>>> + tmpend = end; >>>>>> - trace_ocfs2_zero_partial_clusters_range1((unsigned long >>>>>> long)start, >>>>>> - (unsigned long long)tmpend); >>>>>> + trace_ocfs2_zero_partial_clusters_range1( >>>>>> + (unsigned long long)start, >>>>>> + (unsigned long long)tmpend); >>>>>> - ret = ocfs2_zero_range_for_truncate(inode, handle, start, >>>>>> tmpend); >>>>>> - if (ret) >>>>>> - mlog_errno(ret); >>>>>> + ret = ocfs2_zero_range_for_truncate(inode, handle, start, >>>>>> + tmpend); >>>>>> + if (ret) >>>>>> + mlog_errno(ret); >>>>>> + } >>>>>> if (tmpend < end) { >>>>>> /* >>>>> >>>>> >>>> >>> >> >> >
Hi Ashish, On 08/31/2016 07:17 AM, Ashish Samant wrote: > Hi Eric, > > I am able to reproduce this on 4.8.0-rc3 as well. Can you try again and issue a sync > between fallocate and dd? It works! ocfs2dev2 is not patched: ==================== ocfs2dev2:/mnt/ocfs2 # reflink -f 10MBfile reflnktest ocfs2dev2:/mnt/ocfs2 # fallocate -p -o 0 -l 1048615 reflnktest ocfs2dev2:/mnt/ocfs2 # sync ocfs2dev2:/mnt/ocfs2 # dd if=10MBfile iflag=direct bs=1048576 count=1 | hexdump -C 00000000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................| * 1+0 records in 1+0 records out 1048576 bytes (1.0 MB, 1.0 MiB) copied, 0.0936888 s, 11.2 MB/s 00100000 while ocfs2dev1 is patched: ====================== ocfs2dev1:/mnt/ocfs2 # xfs_io -c 'pwrite -b 4k 0 10M' -f 10MBfile wrote 10485760/10485760 bytes at offset 0 10 MiB, 2560 ops; 0.0000 sec (183.137 MiB/sec and 46883.0122 ops/sec) ocfs2dev1:/mnt/ocfs2 # reflink -f 10MBfile reflnktest ocfs2dev1:/mnt/ocfs2 # fallocate -p -o 0 -l 1048615 reflnktest ocfs2dev1:/mnt/ocfs2 # sync ocfs2dev1:/mnt/ocfs2 # dd if=10MBfile iflag=direct bs=1048576 count=1 | hexdump -C 00000000 cd cd cd cd cd cd cd cd cd cd cd cd cd cd cd cd |................| * 1+0 records in 1+0 records out 1048576 bytes (1.0 MB, 1.0 MiB) copied, 0.0933082 s, 11.2 MB/s 00100000 > > On 08/30/2016 12:38 AM, Eric Ren wrote: >> Hi, >> >> I'm on 4.8.0-rc3 kernel. Hope someone else can double-confirm this;-) >> >> On 08/30/2016 12:11 PM, Ashish Samant wrote: >>> Hmm, thats weird. I see this on 4.7 kernel without the patch: >>> >>> # xfs_io -c 'pwrite -b 4k 0 10M' -f 10MBfile >>> wrote 10485760/10485760 bytes at offset 0 >>> 10 MiB, 2560 ops; 0.0000 sec (683.995 MiB/sec and 175102.5992 ops/sec) >>> # reflink -f 10MBfile reflnktest >>> # fallocate -p -o 0 -l 1048615 reflnktest >>> # dd if=10MBfile iflag=direct bs=1048576 count=1 | hexdump -C >>> 00000000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................| >>> * >>> 1+0 records in >>> 1+0 records out >>> 1048576 bytes (1.0 MB) copied, 0.0321517 s, 32.6 MB/s >>> 00100000 >>> >>> and with patch >>> ---- >>> # dd if=10MBfile iflag=direct bs=1M count=1 | hexdump -C >>> 00000000 cd cd cd cd cd cd cd cd cd cd cd cd cd cd cd cd |................| >> >> I'm not familiar with this code. So why is the output "cd ..."? because we didn't write >> anything >> into "10MBfile". Is it a magic number when reading from a hole? > No, "cd" is what xfs_io wrote into the file. Those are the original contents of the file > which are overwritten by 0 in the first cluster because of this bug. Ah, gotcha, thanks! Eric > > Thanks, > Ashish >> >> Eric >> >>> * >>> 1+0 records in >>> 1+0 records out >>> 00100000 >> >> >> >>> >>> Thanks, >>> Ashish >>> >>> >>> On 08/29/2016 08:33 PM, Eric Ren wrote: >>>> Hello, >>>> >>>> On 08/30/2016 03:23 AM, Ashish Samant wrote: >>>>> Hi Eric, >>>>> >>>>> The easiest way to reproduce this is : >>>>> >>>>> 1. Create a random file of say 10 MB >>>>> xfs_io -c 'pwrite -b 4k 0 10M' -f 10MBfile >>>>> 2. Reflink it >>>>> reflink -f 10MBfile reflnktest >>>>> 3. Punch a hole at starting at cluster boundary with range greater that 1MB. You can >>>>> also use a range that will put the end offset in another extent. >>>>> fallocate -p -o 0 -l 1048615 reflnktest >>>>> 4. sync >>>>> 5. Check the first cluster in the source file. (It will be zeroed out). >>>>> dd if=10MBfile iflag=direct bs=<cluster size> count=1 | hexdump -C >>>> >>>> Thanks! I have a try myself, but I'm not sure what is our expected output and if the >>>> test result meet >>>> it: >>>> >>>> 1. After applying this patch: >>>> ocfs2dev1:/mnt/ocfs2 # rm 10MBfile reflnktest >>>> ocfs2dev1:/mnt/ocfs2 # xfs_io -c 'pwrite -b 4k 0 10M' -f 10MBfile >>>> wrote 10485760/10485760 bytes at offset 0 >>>> 10 MiB, 2560 ops; 0.0000 sec (1.089 GiB/sec and 285427.5839 ops/sec) >>>> ocfs2dev1:/mnt/ocfs2 # reflink -f 10MBfile reflnktest >>>> ocfs2dev1:/mnt/ocfs2 # fallocate -p -o 0 -l 1048615 reflnktest >>>> ocfs2dev1:/mnt/ocfs2 # dd if=10MBfile iflag=direct bs=1048576 count=1 | hexdump -C >>>> 00000000 cd cd cd cd cd cd cd cd cd cd cd cd cd cd cd cd |................| >>>> * >>>> 1+0 records in >>>> 1+0 records out >>>> 1048576 bytes (1.0 MB, 1.0 MiB) copied, 0.0952464 s, 11.0 MB/s >>>> 00100000 >>>> >>>> 2. Before this patch: >>>> .... >>>> ocfs2dev1:/mnt/ocfs2 # dd if=10MBfile iflag=direct bs=1048576 count=1 | hexdump -C >>>> 00000000 cd cd cd cd cd cd cd cd cd cd cd cd cd cd cd cd |................| >>>> * >>>> 1+0 records in >>>> 1+0 records out >>>> 1048576 bytes (1.0 MB, 1.0 MiB) copied, 0.0954648 s, 11.0 MB/s >>>> 00100000 >>>> >>>> 3. debugfs.ocfs2 -R stats /dev/sdb >>>> ... >>>> Block Size Bits: 12 Cluster Size Bits: 20 >>>> ... >>>> >>>> Eric >>>>> >>>>> Thanks, >>>>> Ashish >>>>> >>>>> On 08/28/2016 10:39 PM, Eric Ren wrote: >>>>>> Hi, >>>>>> >>>>>> Thanks for this fix. I'd like to reproduce this issue locally and test this patch, >>>>>> could you elaborate the detailed steps of reproduction? >>>>>> >>>>>> Thanks, >>>>>> Eric >>>>>> >>>>>> On 08/27/2016 07:04 AM, Ashish Samant wrote: >>>>>>> If we punch a hole on a reflink such that following conditions are met: >>>>>>> >>>>>>> 1. start offset is on a cluster boundary >>>>>>> 2. end offset is not on a cluster boundary >>>>>>> 3. (end offset is somewhere in another extent) or >>>>>>> (hole range > MAX_CONTIG_BYTES(1MB)), >>>>>>> >>>>>>> we dont COW the first cluster starting at the start offset. But in this >>>>>>> case, we were wrongly passing this cluster to >>>>>>> ocfs2_zero_range_for_truncate() to zero out. This will modify the cluster >>>>>>> in place and zero it in the source too. >>>>>>> >>>>>>> Fix this by skipping this cluster in such a scenario. >>>>>>> >>>>>>> Reported-by: Saar Maoz <saar.maoz@oracle.com> >>>>>>> Signed-off-by: Ashish Samant <ashish.samant@oracle.com> >>>>>>> Reviewed-by: Srinivas Eeda <srinivas.eeda@oracle.com> >>>>>>> --- >>>>>>> v1->v2: >>>>>>> -Changed the commit msg to include a better and generic description of >>>>>>> the problem, for all cluster sizes. >>>>>>> -Added Reported-by and Reviewed-by tags. >>>>>>> fs/ocfs2/file.c | 34 ++++++++++++++++++++++++---------- >>>>>>> 1 file changed, 24 insertions(+), 10 deletions(-) >>>>>>> >>>>>>> diff --git a/fs/ocfs2/file.c b/fs/ocfs2/file.c >>>>>>> index 4e7b0dc..0b055bf 100644 >>>>>>> --- a/fs/ocfs2/file.c >>>>>>> +++ b/fs/ocfs2/file.c >>>>>>> @@ -1506,7 +1506,8 @@ static int ocfs2_zero_partial_clusters(struct inode *inode, >>>>>>> u64 start, u64 len) >>>>>>> { >>>>>>> int ret = 0; >>>>>>> - u64 tmpend, end = start + len; >>>>>>> + u64 tmpend = 0; >>>>>>> + u64 end = start + len; >>>>>>> struct ocfs2_super *osb = OCFS2_SB(inode->i_sb); >>>>>>> unsigned int csize = osb->s_clustersize; >>>>>>> handle_t *handle; >>>>>>> @@ -1538,18 +1539,31 @@ static int ocfs2_zero_partial_clusters(struct inode *inode, >>>>>>> } >>>>>>> /* >>>>>>> - * We want to get the byte offset of the end of the 1st cluster. >>>>>>> + * If start is on a cluster boundary and end is somewhere in another >>>>>>> + * cluster, we have not COWed the cluster starting at start, unless >>>>>>> + * end is also within the same cluster. So, in this case, we skip this >>>>>>> + * first call to ocfs2_zero_range_for_truncate() truncate and move on >>>>>>> + * to the next one. >>>>>>> */ >>>>>>> - tmpend = (u64)osb->s_clustersize + (start & ~(osb->s_clustersize - 1)); >>>>>>> - if (tmpend > end) >>>>>>> - tmpend = end; >>>>>>> + if ((start & (csize - 1)) != 0) { >>>>>>> + /* >>>>>>> + * We want to get the byte offset of the end of the 1st >>>>>>> + * cluster. >>>>>>> + */ >>>>>>> + tmpend = (u64)osb->s_clustersize + >>>>>>> + (start & ~(osb->s_clustersize - 1)); >>>>>>> + if (tmpend > end) >>>>>>> + tmpend = end; >>>>>>> - trace_ocfs2_zero_partial_clusters_range1((unsigned long long)start, >>>>>>> - (unsigned long long)tmpend); >>>>>>> + trace_ocfs2_zero_partial_clusters_range1( >>>>>>> + (unsigned long long)start, >>>>>>> + (unsigned long long)tmpend); >>>>>>> - ret = ocfs2_zero_range_for_truncate(inode, handle, start, tmpend); >>>>>>> - if (ret) >>>>>>> - mlog_errno(ret); >>>>>>> + ret = ocfs2_zero_range_for_truncate(inode, handle, start, >>>>>>> + tmpend); >>>>>>> + if (ret) >>>>>>> + mlog_errno(ret); >>>>>>> + } >>>>>>> if (tmpend < end) { >>>>>>> /* >>>>>> >>>>>> >>>>> >>>> >>> >>> >> > >
diff --git a/fs/ocfs2/file.c b/fs/ocfs2/file.c index 4e7b0dc..0b055bf 100644 --- a/fs/ocfs2/file.c +++ b/fs/ocfs2/file.c @@ -1506,7 +1506,8 @@ static int ocfs2_zero_partial_clusters(struct inode *inode, u64 start, u64 len) { int ret = 0; - u64 tmpend, end = start + len; + u64 tmpend = 0; + u64 end = start + len; struct ocfs2_super *osb = OCFS2_SB(inode->i_sb); unsigned int csize = osb->s_clustersize; handle_t *handle; @@ -1538,18 +1539,31 @@ static int ocfs2_zero_partial_clusters(struct inode *inode, } /* - * We want to get the byte offset of the end of the 1st cluster. + * If start is on a cluster boundary and end is somewhere in another + * cluster, we have not COWed the cluster starting at start, unless + * end is also within the same cluster. So, in this case, we skip this + * first call to ocfs2_zero_range_for_truncate() truncate and move on + * to the next one. */ - tmpend = (u64)osb->s_clustersize + (start & ~(osb->s_clustersize - 1)); - if (tmpend > end) - tmpend = end; + if ((start & (csize - 1)) != 0) { + /* + * We want to get the byte offset of the end of the 1st + * cluster. + */ + tmpend = (u64)osb->s_clustersize + + (start & ~(osb->s_clustersize - 1)); + if (tmpend > end) + tmpend = end; - trace_ocfs2_zero_partial_clusters_range1((unsigned long long)start, - (unsigned long long)tmpend); + trace_ocfs2_zero_partial_clusters_range1( + (unsigned long long)start, + (unsigned long long)tmpend); - ret = ocfs2_zero_range_for_truncate(inode, handle, start, tmpend); - if (ret) - mlog_errno(ret); + ret = ocfs2_zero_range_for_truncate(inode, handle, start, + tmpend); + if (ret) + mlog_errno(ret); + } if (tmpend < end) { /*