write and read error handling problems

Message ID CABAwU-ao6=DfMEeJZLZnYniVnzs+Gp7BYLoGv-36oUUPCR77Qg@mail.gmail.com (mailing list archive)
State New, archived

Commit Message

huang jun Aug. 10, 2011, 12:11 p.m. UTC
Hi all,
For OSD read ops, if the OSD hits an error it just returns,
which can leak memory. We patched it.
Thanks!
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Sage Weil Aug. 10, 2011, 2:36 p.m. UTC | #1
On Wed, 10 Aug 2011, huang jun wrote:

> Hi all,
> For OSD read ops, if the OSD hits an error it just returns,
> which can leak memory. We patched it.
> diff --git a/src/osd/ReplicatedPG.cc b/src/osd/ReplicatedPG.cc
> index 2ab21bb..21fbca7 100644
> --- a/src/osd/ReplicatedPG.cc
> +++ b/src/osd/ReplicatedPG.cc
> @@ -588,8 +588,18 @@ void ReplicatedPG::do_op(MOSDOp *op)
>      obc->ondisk_read_unlock();
>    }
> 
> -  if (result == -EAGAIN)
> +  if (result == -EAGAIN) {
> +    delete ctx;
>      return;
> +  }
> Please take a look!

That was a leak, yep!  In the current master it's already fixed up (along 
with the obc and src_obc [something new]):

  if (result == -EAGAIN) {
    // clean up after the ctx
    delete ctx;
    put_object_context(obc);
    put_object_contexts(src_obc);
    return;
  }
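
The early-return leak discussed above is the kind of bug that scope-bound ownership avoids. A minimal sketch, assuming hypothetical stand-in names (`OpContext` and `do_op_sketch` are illustrative only, not the Ceph classes or the actual fix in master), of how a `std::unique_ptr` releases the context on every return path:

```cpp
#include <cerrno>
#include <memory>

// Hypothetical stand-in for the per-op context; not the real Ceph type.
struct OpContext {
  static int live;           // number of contexts currently allocated
  OpContext() { ++live; }
  ~OpContext() { --live; }
};
int OpContext::live = 0;

// Sketch of an op handler: the unique_ptr frees ctx on every return path,
// so the -EAGAIN early return cannot leak it.
int do_op_sketch(int result) {
  std::unique_ptr<OpContext> ctx(new OpContext);
  if (result == -EAGAIN)
    return result;           // ctx is destroyed here automatically
  // ... normal processing would continue here ...
  return 0;
}
```

After any call to `do_op_sketch`, `OpContext::live` is back to zero regardless of which branch returned. Reference-counted objects such as the obc would need their own release step in a similar guard.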

> So I'm confused about the error handling strategy for write/read
> operations in the OSD.
> If Ceph just returns when it encounters an error, is the work passed to the client?
> Take writing a file as an example. The client sends a request to write a 4MB file,
> and the OSD first writes the OSD journal, then returns a commit message to the client.
> But if the file write is interrupted by a broken disk sector or
> some other error, the write op has failed. What is the OSD going to
> do? Replay it from the previously written journal entry? Or something else?

Currently if cosd gets an error back from the underlying file system it 
will make itself crash, effectively escalating a sector error into a 
failure of cosd itself.  If you (the admin) are able to repair the disk/fs 
and restart cosd, it will replay from the journal and continue.  
Otherwise you can replace the disk and it will recover the whole osd's 
data set from the rest of the cluster.

sage

Patch

diff --git a/src/osd/ReplicatedPG.cc b/src/osd/ReplicatedPG.cc
index 2ab21bb..21fbca7 100644
--- a/src/osd/ReplicatedPG.cc
+++ b/src/osd/ReplicatedPG.cc
@@ -588,8 +588,18 @@  void ReplicatedPG::do_op(MOSDOp *op)
     obc->ondisk_read_unlock();
   }

-  if (result == -EAGAIN)
+  if (result == -EAGAIN) {
+    delete ctx;
     return;
+  }
Please take a look!

So I'm confused about the error handling strategy for write/read
operations in the OSD.
If Ceph just returns when it encounters an error, is the work passed to the client?
Take writing a file as an example. The client sends a request to write a 4MB file,
and the OSD first writes the OSD journal, then returns a commit message to the client.
But if the file write is interrupted by a broken disk sector or
some other error, the write op has failed. What is the OSD going to
do? Replay it from the previously written journal entry? Or something else?