diff mbox

filestore_fiemap is bad, memstore is buggy.

Message ID CACJqLyb31LmurPPpmi9_LbwvRGZaSp2sXmqpaFE5UXP3ynE4Ow@mail.gmail.com (mailing list archive)
State New, archived
Headers show

Commit Message

Haomai Wang Sept. 16, 2016, 4:29 p.m. UTC
On Fri, Sep 16, 2016 at 9:24 PM, Sage Weil <sweil@redhat.com> wrote:
> On Fri, 16 Sep 2016, Haomai Wang wrote:
>> On Fri, Sep 16, 2016 at 5:48 AM, Sage Weil <sweil@redhat.com> wrote:
>> > I was adding tests for clone_range in preparation for improving the
>> > bluestore implementation and got stuck on memstore and filestore bugs!
>> >
>> > The most alarming is that filestore very quickly fails the tests if
>> > filestore_fiemap is enabled.  It's off by default (phew!) but the
>> > ceph_test_objectstore test was explicitly enabling it to get better
>> > coverage.  For now I've just removed that so we stick with the default (no
>> > fiemap == good).  I wonder if we should consider removing fiemap from the
>> > code entirely, though, since it is clearly buggy, even on a modern kernel
>> > (I'm running 4.4 and XFS).
>>
>> I tried with the new test case, I found even disable fiemap, I will
>> ran into failure. So I suspect the splice codes, I disable it, test
>> passed.
>>
>> But fiemap and seek_data/seek_hole is still failing even disable
>> splice, I would like to dive into more to verify our codes if correct
>> since _do_seek_hole_data is copied from fiemap....
>
> Yeah, good idea.  Thanks for looking into it!

Oh, I think I found the problem.

If source file exists hole in [offset, length], fiemap or
seek_data/seek_hole will return empty data map(map<uint64_t, uint64_t>
exomap). So we won't write any to target file. But we exactly need to
write zero to target file in [offset, length]....

And the other case is we may write garbage data to target file. This
fix may help a lot to cases:

       r = -errno;
@@ -3661,6 +3661,7 @@ int FileStore::_do_copy_range(int from, int to,
uint64_t srcoff, u
     char buf[buflen];
     while (pos < end) {
       int l = MIN(end-pos, buflen);
+      memset(buf, 0, l);
       r = ::read(from, buf, l);
       dout(25) << "  read from " << pos << "~" << l << " got " << r << dendl;
       if (r < 0) {

Hmm, I'm on the vacation and don't have test machine at hand. Let me
rethink this logic and verify it's true. Anyone who want to fix this
problem immediately, plz feel free to do... Or correct me if I'm
wrong!

>
> sage
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Haomai Wang Sept. 17, 2016, 1:05 p.m. UTC | #1
*splice* feature is buggy in any case(disable fiemap/seek_data, enable
fiemap, enable seek_data). I'd like to disable it by default until we
have a clear answer to test failure. In other hand, from my previous
perf tests, splice doesn't show up outstanding improvements in ceph
usages(I think it's natural since splice only eliminate memory copy,
compared to ceph context, it's really not problem).

https://github.com/ceph/ceph/pull/11113

On Sat, Sep 17, 2016 at 12:29 AM, Haomai Wang <haomai@xsky.com> wrote:
> On Fri, Sep 16, 2016 at 9:24 PM, Sage Weil <sweil@redhat.com> wrote:
>> On Fri, 16 Sep 2016, Haomai Wang wrote:
>>> On Fri, Sep 16, 2016 at 5:48 AM, Sage Weil <sweil@redhat.com> wrote:
>>> > I was adding tests for clone_range in preparation for improving the
>>> > bluestore implementation and got stuck on memstore and filestore bugs!
>>> >
>>> > The most alarming is that filestore very quickly fails the tests if
>>> > filestore_fiemap is enabled.  It's off by default (phew!) but the
>>> > ceph_test_objectstore test was explicitly enabling it to get better
>>> > coverage.  For now I've just removed that so we stick with the default (no
>>> > fiemap == good).  I wonder if we should consider removing fiemap from the
>>> > code entirely, though, since it is clearly buggy, even on a modern kernel
>>> > (I'm running 4.4 and XFS).
>>>
>>> I tried with the new test case, I found even disable fiemap, I will
>>> ran into failure. So I suspect the splice codes, I disable it, test
>>> passed.
>>>
>>> But fiemap and seek_data/seek_hole is still failing even disable
>>> splice, I would like to dive into more to verify our codes if correct
>>> since _do_seek_hole_data is copied from fiemap....
>>
>> Yeah, good idea.  Thanks for looking into it!
>
> Oh, I think I found the problem.
>
> If source file exists hole in [offset, length], fiemap or
> seek_data/seek_hole will return empty data map(map<uint64_t, uint64_t>
> exomap). So we won't write any to target file. But we exactly need to
> write zero to target file in [offset, length]....
>
> And the other case is we may write garbage data to target file. This
> fix may help a lot to cases:
>
> --- a/src/os/filestore/FileStore.cc
> +++ b/src/os/filestore/FileStore.cc
> @@ -3597,7 +3597,7 @@ int FileStore::_do_copy_range(int from, int to,
> uint64_t srcoff, u
>    int buflen = 4096 * 16; //limit by pipe max size.see fcntl
>
>  #ifdef CEPH_HAVE_SPLICE
> -  if (backend->has_splice()) {
> +  if (backend->has_splice() && 0) {
>      int pipefd[2];
>      if (pipe(pipefd) < 0) {
>        r = -errno;
> @@ -3661,6 +3661,7 @@ int FileStore::_do_copy_range(int from, int to,
> uint64_t srcoff, u
>      char buf[buflen];
>      while (pos < end) {
>        int l = MIN(end-pos, buflen);
> +      memset(buf, 0, l);
>        r = ::read(from, buf, l);
>        dout(25) << "  read from " << pos << "~" << l << " got " << r << dendl;
>        if (r < 0) {
>
> Hmm, I'm on the vacation and don't have test machine at hand. Let me
> rethink this logic and verify it's true. Anyone who want to fix this
> problem immediately, plz feel free to do... Or correct me if I'm
> wrong!
>
>>
>> sage
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

--- a/src/os/filestore/FileStore.cc
+++ b/src/os/filestore/FileStore.cc
@@ -3597,7 +3597,7 @@  int FileStore::_do_copy_range(int from, int to,
uint64_t srcoff, u
   int buflen = 4096 * 16; //limit by pipe max size.see fcntl

 #ifdef CEPH_HAVE_SPLICE
-  if (backend->has_splice()) {
+  if (backend->has_splice() && 0) {
     int pipefd[2];
     if (pipe(pipefd) < 0) {