diff mbox

Re: question about striped_read

Message ID 201307261007003577701@gmail.com (mailing list archive)
State New, archived
Headers show

Commit Message

majianpeng July 26, 2013, 2:07 a.m. UTC
>On Fri, Jul 26, 2013 at 9:38 AM, majianpeng <majianpeng@gmail.com> wrote:

>>>On Fri, Jul 26, 2013 at 9:22 AM, majianpeng <majianpeng@gmail.com> wrote:

>>>>>On Fri, Jul 26, 2013 at 8:48 AM, majianpeng <majianpeng@gmail.com> wrote:

>>>>>>>On Thu, 25 Jul 2013, Yan, Zheng wrote:

>>>>>>>> On Thu, Jul 25, 2013 at 2:55 PM, majianpeng <majianpeng@gmail.com> wrote:

>>>>>>>> >>On Thu, 25 Jul 2013, majianpeng wrote:

>>>>>>>> >>> Hi all,

>>>>>>>> >>>      I met a problem and ask somebody could help me.

>>>>>>>> >>> In func striped_read()

>>>>>>>> >>> > if (ret > 0) {

>>>>>>>> >>> >                int didpages = (page_align + ret) >> PAGE_CACHE_SHIFT;

>>>>>>>> >>>

>>>>>>>> >>> >                if (read < pos - off) {

>>>>>>>> >>> >                       dout(" zero gap %llu to %llu\n", off + read, pos);

>>>>>>>> >>> >                        ceph_zero_page_vector_range(page_align + read,

>>>>>>>> >>> >                                                    pos - off - read, pages);

>>>>>>>> >>> >                }

>>>>>>>> >>> >                pos += ret;

>>>>>>>>

>>>>>>>> I think you are right. probably above line should be 'pos += this_len'

>>>>>>>

>>>>>>>It should be easy to construct a simple test for this.  E.g., something

>>>>>>>like

>>>>>>>

>>>>>>> pwrite(fd, buf, 0, 3000000);

>>>>>>> pwrite(fd, buf, 4194304, 1000);

>>>>>>> pread(fd, buf, 0, 6000000);

>>>>>>> ...

>>>>>>>

>>>>>>>and whatever else to verify that pos was in fact advanced properly?

>>>>>>>

>>>>>> The following is my test code:

>>>>>> void hole_test()

>>>>>> {

>>>>>>         char buf[4194304];

>>>>>>         ssize_t ret;

>>>>>>         int fd = open("/media/ceph/test", O_RDWR|O_CREAT|O_DIRECT|O_TRUNC);

>>>>>>         if (fd < 0) {

>>>>>>                 printf("open error %s\n", strerror(errno));

>>>>>>                 return;

>>>>>>         }

>>>>>>

>>>>>>         ret = pwrite(fd, buf, 0, 3000000);

>>>>>>         ret = pwrite(fd, buf, 4194304, 1000);

>>>>>>         ret = pread(fd, buf , 0, 6000000);

>>>>>>         close(fd);

>>>>>> }

>>>>>>

>>>>>> The debug message from striped_read are:

>>>>>> [  267.530266] ceph:           file.c:356  : striped_read 6000000~0 (read 0) got 0

>>>>>> [  267.530270] ceph:           file.c:396  : striped_read returns 0

>>>>>>

>>>>>> The result isn't what's your said.

>>>>>> Am i missing something?

>>>>>>

>>>>>> BTW, i think Yan is ok,

>>>>>> The code

>>>>>> pos += ret; should pos += this_len.

>>>>>> Because this_len can larger than ret.

>>>>>> But the question is what's condition can cause this?

>>>>>

>>>>>by default, ceph strips file to 4M objects. In above example, the

>>>>>first object only has

>>>>>3M data, so 'ret = 3M'  and 'this_len = 4M'

>>>> actually, in func calc_layout: the read length is the smaller between object and left.

>>>>>

>>>>>> And can the short_read operation handle this situation?

>>>>>

>>>>>I don't think it's good idea to short read unless we really reach EOF.

>>>> Can you explain in detail?

>>>>

>>>

>>>because some user program interpret short read as EOF reached. If we

>>>return short

>>>read they get confused.

>> The reason cause short-read is only EOF? or other reason?

>>

>

>I think yes, (at least for reading from FS and read size is not very large)

we can remove those code:

Becase in fun striped_read, the short-read have two reasons:eof or met a hole.
In either case ,the later was_short handler can handle.
Is it ok?

Thanks
Jianpeng Ma

Comments

Yan, Zheng July 29, 2013, 5:02 a.m. UTC | #1
On Mon, Jul 29, 2013 at 11:00 AM, majianpeng <majianpeng@gmail.com> wrote:
>
> [snip]
> >I don't think the later was_short can handle the hole case. For the hole case,
> >we should try reading next strip object instead of return. how about
> >below patch.
> >
> Hi Yan,
>         i uesed this demo to test hole case.
> dd if=/dev/urandom bs=4096 count=2 of=file_with_holes
> dd if=/dev/urandom bs=4096 seek=7 count=2 of=file_with_holes
>
> dd if=file_with_holes of=/dev/null bs=16k count=1 iflag=direct
> Using the dynamic_debug in striped_read,  the message are:
> >[ 8743.663499] ceph:           file.c:350  : striped_read 0~16384 (read 0) got 16384
> >[ 8743.663502] ceph:           file.c:390  : striped_read returns 16384
> From the messages, we can see it can't hit the short-read.
> For the ceph-file-hole, how does the ceph handle?
> Or am i missing something?

the default strip size is 4M, all data are written to the first object
in your test case.
could you try something like below.

dd if=/dev/urandom bs=1M count=2 of=file_with_holes
dd if=/dev/urandom bs=1M count=2 seek=4 of=file_with_holes conv=notrunc
dd if=file_with_holes bs=8M >/dev/null

Regards
Yan, Zheng


>
>
> Thanks!
> Jianpeng Ma
>
> >Regards
> >Yan, Zheng
> >---
> >diff --git a/fs/ceph/file.c b/fs/ceph/file.c
> >index 271a346..6ca2921 100644
> >--- a/fs/ceph/file.c
> >+++ b/fs/ceph/file.c
> >@@ -350,16 +350,17 @@ more:
> >            ret, hit_stripe ? " HITSTRIPE" : "", was_short ? " SHORT" : "");
> >
> >       if (ret > 0) {
> >-              int didpages = (page_align + ret) >> PAGE_CACHE_SHIFT;
> >+              int didpages = (page_align + this_len) >> PAGE_CACHE_SHIFT;
> >
> >-              if (read < pos - off) {
> >-                      dout(" zero gap %llu to %llu\n", off + read, pos);
> >-                      ceph_zero_page_vector_range(page_align + read,
> >-                                                  pos - off - read, pages);
> >+              if (was_short) {
> >+                      dout(" zero gap %llu to %llu\n",
> >+                           pos + ret, pos + this_len);
> >+                      ceph_zero_page_vector_range(page_align + ret,
> >+                                                  this_len - ret, page_pos);
> >               }
> >-              pos += ret;
> >+              pos += this_len;
> >               read = pos - off;
> >-              left -= ret;
> >+              left -= this_len;
> >               page_pos += didpages;
> >               pages_left -= didpages;
> >
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
majianpeng July 30, 2013, 2:08 a.m. UTC | #2
Pk9uIE1vbiwgSnVsIDI5LCAyMDEzIGF0IDExOjAwIEFNLCBtYWppYW5wZW5nIDxtYWppYW5wZW5n
QGdtYWlsLmNvbT4gd3JvdGU6DQo+Pg0KPj4gW3NuaXBdDQo+PiA+SSBkb24ndCB0aGluayB0aGUg
bGF0ZXIgd2FzX3Nob3J0IGNhbiBoYW5kbGUgdGhlIGhvbGUgY2FzZS4gRm9yIHRoZSBob2xlIGNh
c2UsDQo+PiA+d2Ugc2hvdWxkIHRyeSByZWFkaW5nIG5leHQgc3RyaXAgb2JqZWN0IGluc3RlYWQg
b2YgcmV0dXJuLiBob3cgYWJvdXQNCj4+ID5iZWxvdyBwYXRjaC4NCj4+ID4NCj4+IEhpIFlhbiwN
Cj4+ICAgICAgICAgaSB1ZXNlZCB0aGlzIGRlbW8gdG8gdGVzdCBob2xlIGNhc2UuDQo+PiBkZCBp
Zj0vZGV2L3VyYW5kb20gYnM9NDA5NiBjb3VudD0yIG9mPWZpbGVfd2l0aF9ob2xlcw0KPj4gZGQg
aWY9L2Rldi91cmFuZG9tIGJzPTQwOTYgc2Vlaz03IGNvdW50PTIgb2Y9ZmlsZV93aXRoX2hvbGVz
DQo+Pg0KPj4gZGQgaWY9ZmlsZV93aXRoX2hvbGVzIG9mPS9kZXYvbnVsbCBicz0xNmsgY291bnQ9
MSBpZmxhZz1kaXJlY3QNCj4+IFVzaW5nIHRoZSBkeW5hbWljX2RlYnVnIGluIHN0cmlwZWRfcmVh
ZCwgIHRoZSBtZXNzYWdlIGFyZToNCj4+ID5bIDg3NDMuNjYzNDk5XSBjZXBoOiAgICAgICAgICAg
ZmlsZS5jOjM1MCAgOiBzdHJpcGVkX3JlYWQgMH4xNjM4NCAocmVhZCAwKSBnb3QgMTYzODQNCj4+
ID5bIDg3NDMuNjYzNTAyXSBjZXBoOiAgICAgICAgICAgZmlsZS5jOjM5MCAgOiBzdHJpcGVkX3Jl
YWQgcmV0dXJucyAxNjM4NA0KPj4gRnJvbSB0aGUgbWVzc2FnZXMsIHdlIGNhbiBzZWUgaXQgY2Fu
J3QgaGl0IHRoZSBzaG9ydC1yZWFkLg0KPj4gRm9yIHRoZSBjZXBoLWZpbGUtaG9sZSwgaG93IGRv
ZXMgdGhlIGNlcGggaGFuZGxlPw0KPj4gT3IgYW0gaSBtaXNzaW5nIHNvbWV0aGluZz8NCj4NCj50
aGUgZGVmYXVsdCBzdHJpcCBzaXplIGlzIDRNLCBhbGwgZGF0YSBhcmUgd3JpdHRlbiB0byB0aGUg
Zmlyc3Qgb2JqZWN0DQo+aW4geW91ciB0ZXN0IGNhc2UuDQo+Y291bGQgeW91IHRyeSBzb21ldGhp
bmcgbGlrZSBiZWxvdy4NCj4NCj5kZCBpZj0vZGV2L3VyYW5kb20gYnM9MU0gY291bnQ9MiBvZj1m
aWxlX3dpdGhfaG9sZXMNCj5kZCBpZj0vZGV2L3VyYW5kb20gYnM9MU0gY291bnQ9MiBzZWVrPTQg
b2Y9ZmlsZV93aXRoX2hvbGVzIGNvbnY9bm90cnVuYw0KPmRkIGlmPWZpbGVfd2l0aF9ob2xlcyBi
cz04TSA+L2Rldi9udWxsDQo+DQoNCkZyb20gYWJvdmUgdGVzdCwgaSB0aGluayB5b3VyIHBhdGNo
IGlzIHJpZ2h0Lg0KQWx0aG91Z2gsIHRoZSBvcmlnaW5hbCBjb2RlIGNhbiB3b3JrIGJ1dCBpdCAg
Y2FsbCBtdWx0aSBzdHJpcGVkX3JlYWQuDQpBcyB5b3VyIHNhaWQgZm9yIHN0cmlwZSBzaG9ydC1y
ZWFkLGl0IGRvZXNuJ3QgbWFrZSBzZW5zZSB0byByZXR1cm4gcmF0aGVyIHRoYW4gcmVhZGluZyBu
ZXh0IHN0cmlwZS4NCkJ1dCBjYW4geW91IGFkZCBzb21lIGNvbW1lbnRzIGZvciB0aGlzPw0KVGhl
IHNob3J0LXJlYWQgcmVhc29uZ3MgYXJlIHR3bzpFT0Ygb3IgaGl0LWhvbGUuDQpCdXQgZm9yIGhp
dC1ob2xlIHRoZXJlIGFyZSBzb21lIGRpZmZlcmVudHMgY2FzZS4gRm9yIHRoYXQgaSBkb24ndCBr
bm93Lg0KDQpUaGFua3MhDQpKaWFucGVuZyBNYQ0KPlJlZ2FyZHMNCj5ZYW4sIFpoZW5nDQo+DQo+
DQo+Pg0KPj4NCj4+IFRoYW5rcyENCj4+IEppYW5wZW5nIE1hDQo+Pg0KPj4gPlJlZ2FyZHMNCj4+
ID5ZYW4sIFpoZW5nDQo+PiA+LS0tDQo+PiA+ZGlmZiAtLWdpdCBhL2ZzL2NlcGgvZmlsZS5jIGIv
ZnMvY2VwaC9maWxlLmMNCj4+ID5pbmRleCAyNzFhMzQ2Li42Y2EyOTIxIDEwMDY0NA0KPj4gPi0t
LSBhL2ZzL2NlcGgvZmlsZS5jDQo+PiA+KysrIGIvZnMvY2VwaC9maWxlLmMNCj4+ID5AQCAtMzUw
LDE2ICszNTAsMTcgQEAgbW9yZToNCj4+ID4gICAgICAgICAgICByZXQsIGhpdF9zdHJpcGUgPyAi
IEhJVFNUUklQRSIgOiAiIiwgd2FzX3Nob3J0ID8gIiBTSE9SVCIgOiAiIik7DQo+PiA+DQo+PiA+
ICAgICAgIGlmIChyZXQgPiAwKSB7DQo+PiA+LSAgICAgICAgICAgICAgaW50IGRpZHBhZ2VzID0g
KHBhZ2VfYWxpZ24gKyByZXQpID4+IFBBR0VfQ0FDSEVfU0hJRlQ7DQo+PiA+KyAgICAgICAgICAg
ICAgaW50IGRpZHBhZ2VzID0gKHBhZ2VfYWxpZ24gKyB0aGlzX2xlbikgPj4gUEFHRV9DQUNIRV9T
SElGVDsNCj4+ID4NCj4+ID4tICAgICAgICAgICAgICBpZiAocmVhZCA8IHBvcyAtIG9mZikgew0K
Pj4gPi0gICAgICAgICAgICAgICAgICAgICAgZG91dCgiIHplcm8gZ2FwICVsbHUgdG8gJWxsdVxu
Iiwgb2ZmICsgcmVhZCwgcG9zKTsNCj4+ID4tICAgICAgICAgICAgICAgICAgICAgIGNlcGhfemVy
b19wYWdlX3ZlY3Rvcl9yYW5nZShwYWdlX2FsaWduICsgcmVhZCwNCj4+ID4tICAgICAgICAgICAg
ICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICBwb3MgLSBvZmYgLSByZWFkLCBw
YWdlcyk7DQo+PiA+KyAgICAgICAgICAgICAgaWYgKHdhc19zaG9ydCkgew0KPj4gPisgICAgICAg
ICAgICAgICAgICAgICAgZG91dCgiIHplcm8gZ2FwICVsbHUgdG8gJWxsdVxuIiwNCj4+ID4rICAg
ICAgICAgICAgICAgICAgICAgICAgICAgcG9zICsgcmV0LCBwb3MgKyB0aGlzX2xlbik7DQo+PiA+
KyAgICAgICAgICAgICAgICAgICAgICBjZXBoX3plcm9fcGFnZV92ZWN0b3JfcmFuZ2UocGFnZV9h
bGlnbiArIHJldCwNCj4+ID4rICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAg
ICAgICAgICAgICB0aGlzX2xlbiAtIHJldCwgcGFnZV9wb3MpOw0KPj4gPiAgICAgICAgICAgICAg
IH0NCj4+ID4tICAgICAgICAgICAgICBwb3MgKz0gcmV0Ow0KPj4gPisgICAgICAgICAgICAgIHBv
cyArPSB0aGlzX2xlbjsNCj4+ID4gICAgICAgICAgICAgICByZWFkID0gcG9zIC0gb2ZmOw0KPj4g
Pi0gICAgICAgICAgICAgIGxlZnQgLT0gcmV0Ow0KPj4gPisgICAgICAgICAgICAgIGxlZnQgLT0g
dGhpc19sZW47DQo+PiA+ICAgICAgICAgICAgICAgcGFnZV9wb3MgKz0gZGlkcGFnZXM7DQo+PiA+
ICAgICAgICAgICAgICAgcGFnZXNfbGVmdCAtPSBkaWRwYWdlczsNCj4+ID4NClRoYW5rcyENCkpp
YW5wZW5nIE1hDQo+T24gTW9uLCBKdWwgMjksIDIwMTMgYXQgMTE6MDAgQU0sIG1hamlhbnBlbmcg
PG1hamlhbnBlbmdAZ21haWwuY29tPiB3cm90ZToNCj4+DQo+PiBbc25pcF0NCj4+ID5JIGRvbid0
IHRoaW5rIHRoZSBsYXRlciB3YXNfc2hvcnQgY2FuIGhhbmRsZSB0aGUgaG9sZSBjYXNlLiBGb3Ig
dGhlIGhvbGUgY2FzZSwNCj4+ID53ZSBzaG91bGQgdHJ5IHJlYWRpbmcgbmV4dCBzdHJpcCBvYmpl
Y3QgaW5zdGVhZCBvZiByZXR1cm4uIGhvdyBhYm91dA0KPj4gPmJlbG93IHBhdGNoLg0KPj4gPg0K
Pj4gSGkgWWFuLA0KPj4gICAgICAgICBpIHVlc2VkIHRoaXMgZGVtbyB0byB0ZXN0IGhvbGUgY2Fz
ZS4NCj4+IGRkIGlmPS9kZXYvdXJhbmRvbSBicz00MDk2IGNvdW50PTIgb2Y9ZmlsZV93aXRoX2hv
bGVzDQo+PiBkZCBpZj0vZGV2L3VyYW5kb20gYnM9NDA5NiBzZWVrPTcgY291bnQ9MiBvZj1maWxl
X3dpdGhfaG9sZXMNCj4+DQo+PiBkZCBpZj1maWxlX3dpdGhfaG9sZXMgb2Y9L2Rldi9udWxsIGJz
PTE2ayBjb3VudD0xIGlmbGFnPWRpcmVjdA0KPj4gVXNpbmcgdGhlIGR5bmFtaWNfZGVidWcgaW4g
c3RyaXBlZF9yZWFkLCAgdGhlIG1lc3NhZ2UgYXJlOg0KPj4gPlsgODc0My42NjM0OTldIGNlcGg6
ICAgICAgICAgICBmaWxlLmM6MzUwICA6IHN0cmlwZWRfcmVhZCAwfjE2Mzg0IChyZWFkIDApIGdv
dCAxNjM4NA0KPj4gPlsgODc0My42NjM1MDJdIGNlcGg6ICAgICAgICAgICBmaWxlLmM6MzkwICA6
IHN0cmlwZWRfcmVhZCByZXR1cm5zIDE2Mzg0DQo+PiBGcm9tIHRoZSBtZXNzYWdlcywgd2UgY2Fu
IHNlZSBpdCBjYW4ndCBoaXQgdGhlIHNob3J0LXJlYWQuDQo+PiBGb3IgdGhlIGNlcGgtZmlsZS1o
b2xlLCBob3cgZG9lcyB0aGUgY2VwaCBoYW5kbGU/DQo+PiBPciBhbSBpIG1pc3Npbmcgc29tZXRo
aW5nPw0KPg0KPnRoZSBkZWZhdWx0IHN0cmlwIHNpemUgaXMgNE0sIGFsbCBkYXRhIGFyZSB3cml0
dGVuIHRvIHRoZSBmaXJzdCBvYmplY3QNCj5pbiB5b3VyIHRlc3QgY2FzZS4NCj5jb3VsZCB5b3Ug
dHJ5IHNvbWV0aGluZyBsaWtlIGJlbG93Lg0KPg0KPmRkIGlmPS9kZXYvdXJhbmRvbSBicz0xTSBj
b3VudD0yIG9mPWZpbGVfd2l0aF9ob2xlcw0KPmRkIGlmPS9kZXYvdXJhbmRvbSBicz0xTSBjb3Vu
dD0yIHNlZWs9NCBvZj1maWxlX3dpdGhfaG9sZXMgY29udj1ub3RydW5jDQo+ZGQgaWY9ZmlsZV93
aXRoX2hvbGVzIGJzPThNID4vZGV2L251bGwNCj4NCj5SZWdhcmRzDQo+WWFuLCBaaGVuZw0KPg0K
Pg0KPj4NCj4+DQo+PiBUaGFua3MhDQo+PiBKaWFucGVuZyBNYQ0KPj4NCj4+ID5SZWdhcmRzDQo+
PiA+WWFuLCBaaGVuZw0KPj4gPi0tLQ0KPj4gPmRpZmYgLS1naXQgYS9mcy9jZXBoL2ZpbGUuYyBi
L2ZzL2NlcGgvZmlsZS5jDQo+PiA+aW5kZXggMjcxYTM0Ni4uNmNhMjkyMSAxMDA2NDQNCj4+ID4t
LS0gYS9mcy9jZXBoL2ZpbGUuYw0KPj4gPisrKyBiL2ZzL2NlcGgvZmlsZS5jDQo+PiA+QEAgLTM1
MCwxNiArMzUwLDE3IEBAIG1vcmU6DQo+PiA+ICAgICAgICAgICAgcmV0LCBoaXRfc3RyaXBlID8g
IiBISVRTVFJJUEUiIDogIiIsIHdhc19zaG9ydCA/ICIgU0hPUlQiIDogIiIpOw0KPj4gPg0KPj4g
PiAgICAgICBpZiAocmV0ID4gMCkgew0KPj4gPi0gICAgICAgICAgICAgIGludCBkaWRwYWdlcyA9
IChwYWdlX2FsaWduICsgcmV0KSA+PiBQQUdFX0NBQ0hFX1NISUZUOw0KPj4gPisgICAgICAgICAg
ICAgIGludCBkaWRwYWdlcyA9IChwYWdlX2FsaWduICsgdGhpc19sZW4pID4+IFBBR0VfQ0FDSEVf
U0hJRlQ7DQo+PiA+DQo+PiA+LSAgICAgICAgICAgICAgaWYgKHJlYWQgPCBwb3MgLSBvZmYpIHsN
Cj4+ID4tICAgICAgICAgICAgICAgICAgICAgIGRvdXQoIiB6ZXJvIGdhcCAlbGx1IHRvICVsbHVc
biIsIG9mZiArIHJlYWQsIHBvcyk7DQo+PiA+LSAgICAgICAgICAgICAgICAgICAgICBjZXBoX3pl
cm9fcGFnZV92ZWN0b3JfcmFuZ2UocGFnZV9hbGlnbiArIHJlYWQsDQo+PiA+LSAgICAgICAgICAg
ICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgcG9zIC0gb2ZmIC0gcmVhZCwg
cGFnZXMpOw0KPj4gPisgICAgICAgICAgICAgIGlmICh3YXNfc2hvcnQpIHsNCj4+ID4rICAgICAg
ICAgICAgICAgICAgICAgIGRvdXQoIiB6ZXJvIGdhcCAlbGx1IHRvICVsbHVcbiIsDQo+PiA+KyAg
ICAgICAgICAgICAgICAgICAgICAgICAgIHBvcyArIHJldCwgcG9zICsgdGhpc19sZW4pOw0KPj4g
PisgICAgICAgICAgICAgICAgICAgICAgY2VwaF96ZXJvX3BhZ2VfdmVjdG9yX3JhbmdlKHBhZ2Vf
YWxpZ24gKyByZXQsDQo+PiA+KyAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAg
ICAgICAgICAgICAgdGhpc19sZW4gLSByZXQsIHBhZ2VfcG9zKTsNCj4+ID4gICAgICAgICAgICAg
ICB9DQo+PiA+LSAgICAgICAgICAgICAgcG9zICs9IHJldDsNCj4+ID4rICAgICAgICAgICAgICBw
b3MgKz0gdGhpc19sZW47DQo+PiA+ICAgICAgICAgICAgICAgcmVhZCA9IHBvcyAtIG9mZjsNCj4+
ID4tICAgICAgICAgICAgICBsZWZ0IC09IHJldDsNCj4+ID4rICAgICAgICAgICAgICBsZWZ0IC09
IHRoaXNfbGVuOw0KPj4gPiAgICAgICAgICAgICAgIHBhZ2VfcG9zICs9IGRpZHBhZ2VzOw0KPj4g
PiAgICAgICAgICAgICAgIHBhZ2VzX2xlZnQgLT0gZGlkcGFnZXM7DQo+PiA+

--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Yan, Zheng July 30, 2013, 2:56 a.m. UTC | #3
On Tue, Jul 30, 2013 at 10:08 AM, majianpeng <majianpeng@gmail.com> wrote:
>>On Mon, Jul 29, 2013 at 11:00 AM, majianpeng <majianpeng@gmail.com> wrote:
>>>
>>> [snip]
>>> >I don't think the later was_short can handle the hole case. For the hole case,
>>> >we should try reading next strip object instead of return. how about
>>> >below patch.
>>> >
>>> Hi Yan,
>>>         i uesed this demo to test hole case.
>>> dd if=/dev/urandom bs=4096 count=2 of=file_with_holes
>>> dd if=/dev/urandom bs=4096 seek=7 count=2 of=file_with_holes
>>>
>>> dd if=file_with_holes of=/dev/null bs=16k count=1 iflag=direct
>>> Using the dynamic_debug in striped_read,  the message are:
>>> >[ 8743.663499] ceph:           file.c:350  : striped_read 0~16384 (read 0) got 16384
>>> >[ 8743.663502] ceph:           file.c:390  : striped_read returns 16384
>>> From the messages, we can see it can't hit the short-read.
>>> For the ceph-file-hole, how does the ceph handle?
>>> Or am i missing something?
>>
>>the default strip size is 4M, all data are written to the first object
>>in your test case.
>>could you try something like below.
>>
>>dd if=/dev/urandom bs=1M count=2 of=file_with_holes
>>dd if=/dev/urandom bs=1M count=2 seek=4 of=file_with_holes conv=notrunc
>>dd if=file_with_holes bs=8M >/dev/null
>>
>
> From above test, i think your patch is right.
> Although, the original code can work but it  call multi striped_read.

For test case
---
dd if=/dev/urandom bs=1M count=2 of=file_with_holes
dd if=/dev/urandom bs=1M count=2 seek=4 of=file_with_holes conv=notrunc
dd if=file_with_holes bs=8M iflag=direct >/dev/null

I got
---
ceph:  striped_read 0~8388608 (read 0) got 2097152 HITSTRIPE SHORT
ceph:  striped_read 2097152~6291456 (read 2097152) got 0 HITSTRIPE SHORT
ceph:  zero tail 4194304
ceph:  striped_read returns 6291456
ceph:  sync_read result 6291456
ceph:  aio_read ffff88000fb22f98 10000193e8c.fffffffffffffffe dropping
cap refs on Fcr = 6291456

the original code zeros data in range 2M~6M, it's obvious incorrect.

> As your said for stripe short-read,it doesn't make sense to return rather than reading next stripe.
> But can you add some comments for this?
> The short-read reasongs are two:EOF or hit-hole.
> But for hit-hole there are some differents case. For that i don't know.
>

For hit-hole, there is only one case: the strip object's size is
smaller then 4M. When reading
a strip object, if the returned data is less than we expected, we need
to check if following strip
objects have data.

I think the original code and my patch doesn't handle the below case properly.

| object 0 |  hole  |  hole |  object 3 |
dd if=testfile iflag=direct bs=16M >/dev/null

Could you write a patch, do some tests and submit it.

Regards
Yan, Zheng
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/fs/ceph/file.c b/fs/ceph/file.c
index 2ddf061..94fa378 100644
--- a/fs/ceph/file.c
+++ b/fs/ceph/file.c
@@ -352,11 +352,6 @@  more:
        if (ret > 0) {
                int didpages = (page_align + ret) >> PAGE_CACHE_SHIFT;
 
-               if (read < pos - off) {
-                       dout(" zero gap %llu to %llu\n", off + read, pos);
-                       ceph_zero_page_vector_range(page_align + read,
-                                                   pos - off - read, pages);
-               }
                pos += ret;
                read = pos - off;
                left -= ret;