From patchwork Wed Nov 12 23:21:24 2014 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Andrew Shewmaker X-Patchwork-Id: 5292551 Return-Path: X-Original-To: patchwork-ceph-devel@patchwork.kernel.org Delivered-To: patchwork-parsemail@patchwork2.web.kernel.org Received: from mail.kernel.org (mail.kernel.org [198.145.19.201]) by patchwork2.web.kernel.org (Postfix) with ESMTP id 38F45C11AC for ; Wed, 12 Nov 2014 23:21:32 +0000 (UTC) Received: from mail.kernel.org (localhost [127.0.0.1]) by mail.kernel.org (Postfix) with ESMTP id 43C02201BC for ; Wed, 12 Nov 2014 23:21:31 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 4E214201BB for ; Wed, 12 Nov 2014 23:21:30 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753690AbaKLXV2 (ORCPT ); Wed, 12 Nov 2014 18:21:28 -0500 Received: from mail-pa0-f48.google.com ([209.85.220.48]:60456 "EHLO mail-pa0-f48.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753683AbaKLXV2 (ORCPT ); Wed, 12 Nov 2014 18:21:28 -0500 Received: by mail-pa0-f48.google.com with SMTP id rd3so371052pab.35 for ; Wed, 12 Nov 2014 15:21:28 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=date:from:to:subject:message-id:mime-version:content-type :content-disposition:user-agent; bh=6l32+j1/wch1/r0I3OtWq07uDFEVcir6dsPep94Hz9c=; b=hfz3A6n/6kFdijUExhVFosZMwkSN1uOmev30PC1sRZb3jOcTZdW45iI306rEsQU60j Zb/rX5tRC2IiY1OsnousXK+NAbWO+gM9aKHkTfelvhMIw6GfWczhQ4cebTVBJB48UXBj FiZ9JuZ17J7GmwA4XSLsGvhbufgaHNMB5Js1wQC3M8MpWgH9i88AZddX6s8fhiELTFYl d1qNtnMpc29YiOxT7HC5nXlqGw7xFwuAEzsVEY+HoXtkZoyl21OJFKLsolReV4q+ik49 zeP1bimjmnBVaSdrZsfqRLS7mt+3+PGrxmdbqPW1KXVdFm8PlH4GHj3V++Nbsqw9nSGB maZw== X-Received: by 10.68.69.76 with SMTP id c12mr50117977pbu.59.1415834487944; Wed, 12 Nov 2014 15:21:27 -0800 (PST) Received: from localhost (host-69-145-59-76.bln-mt.client.bresnan.net. [69.145.59.76]) by mx.google.com with ESMTPSA id fm15sm23122242pdb.58.2014.11.12.15.21.27 for (version=TLSv1.2 cipher=RC4-SHA bits=128/128); Wed, 12 Nov 2014 15:21:27 -0800 (PST) Date: Wed, 12 Nov 2014 16:21:24 -0700 From: Andrew Shewmaker To: ceph-devel@vger.kernel.org Subject: [PATCH 6/6] Documentation for testing blkin tracing Message-ID: <20141112232124.GG6892@localhost> MIME-Version: 1.0 Content-Disposition: inline User-Agent: Mutt/1.5.21 (2010-09-15) Sender: ceph-devel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: ceph-devel@vger.kernel.org X-Spam-Status: No, score=-7.3 required=5.0 tests=BAYES_00, DKIM_ADSP_CUSTOM_MED, DKIM_SIGNED, FREEMAIL_FROM, RCVD_IN_DNSWL_HI, RP_MATCHES_RCVD, T_DKIM_INVALID, UNPARSEABLE_RELAY autolearn=ham version=3.3.1 X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on mail.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP Signed-off-by: Andrew Shewmaker --- doc/dev/trace.rst | 86 +++++++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 86 insertions(+) create mode 100644 doc/dev/trace.rst diff --git a/doc/dev/trace.rst b/doc/dev/trace.rst new file mode 100644 index 0000000..53be5f5 --- /dev/null +++ b/doc/dev/trace.rst @@ -0,0 +1,86 @@ +================== + Tracing Ceph +================== + +Ceph can use Blkin, a library created by Marios Kogias and others, +which enables tracking a specific request from the time it enters +the system at higher levels till it is finally served by RADOS. + +In general, Blkin implements the tracing semantics described in the Dapper +paper http://static.googleusercontent.com/media/research.google.com/el//pubs/archive/36356.pdf +in order to trace the causal relationships between the different +processing phases that an IO request may trigger. The goal is an end-to-end +visualisation of the request's route in the system, accompanied by information +concerning latencies in each processing phase. Thanks to LTTng this can happen +with a minimal overhead and in realtime. In order to visualize the results we +have integrated Twitter's Zipkin http://twitter.github.io/zipkin/ +(which is a tracing system entirely based on Dapper) with LTTng. + +Testing Blkin +============= + +It's easy to test Ceph's Blkin tracing. Compile Ceph with the Blkin +changes, then launch Ceph with the vstart script so you can see the +possible tracepoints.:: + + cd src + OSD=3 MON=3 RGW=1 ./vstart.sh -n + lttng list --userspace + +You'll see something like the following::: + + UST events: + ------------- + + PID: 8987 - Name: ./ceph-osd + zipkin:timestamp (loglevel: TRACE_WARNING (4)) (type: tracepoint) + zipkin:keyval (loglevel: TRACE_WARNING (4)) (type: tracepoint) + ust_baddr_statedump:soinfo (loglevel: TRACE_DEBUG_LINE (13)) (type: tracepoint) + + PID: 8407 - Name: ./ceph-mon + zipkin:timestamp (loglevel: TRACE_WARNING (4)) (type: tracepoint) + zipkin:keyval (loglevel: TRACE_WARNING (4)) (type: tracepoint) + ust_baddr_statedump:soinfo (loglevel: TRACE_DEBUG_LINE (13)) (type: tracepoint) + + ... + +Next, stop Ceph so that the tracepoints can be enabled.:: + + ./stop.sh + +Start up an LTTng session and enable the tracepoints.:: + + lttng create blkin-test + lttng enable-event --userspace zipkin:timestamp + lttng enable-event --userspace zipkin:keyval + lttng start + +Then start up Ceph again.:: + + OSD=3 MON=3 RGW=1 ./vstart.sh -n + +You may want to check that ceph is up.:: + + ./ceph status + +Now put something in usin rados, check that it made it, get it back, and remove it.:: + + rados put test-object-1 ./vstart.sh --pool=data + rados -p data ls + ceph osd map data test-object-1 + rados get test-object-1 ./vstart-copy.sh --pool=data + md5sum vstart* + rados rm test-object-1 --pool=data + +Then stop the LTTng session and see what was collected.:: + + lttng stop + lttng view + +You'll see something like::: + + [13:09:07.755054973] (+?.?????????) scruffy zipkin:timestamp: { cpu_id = 5 }, { trace_name = "Main", service_name = "MOSDOp", port_no = 0, ip = "0.0.0.0", trace_id = 7492589359882233221, span_id = 2694140257089376129, parent_span_id = 0, event = "Message allocated" } + [13:09:07.755071569] (+0.000016596) scruffy zipkin:keyval: { cpu_id = 5 }, { trace_name = "Main", service_name = "MOSDOp", port_no = 0, ip = "0.0.0.0", trace_id = 7492589359882233221, span_id = 2694140257089376129, parent_span_id = 0, key = "Type", val = "MOSDOp" } + [13:09:07.755074217] (+0.000002648) scruffy zipkin:keyval: { cpu_id = 5 }, { trace_name = "Main", service_name = "MOSDOp", port_no = 0, ip = "0.0.0.0", trace_id = 7492589359882233221, span_id = 2694140257089376129, parent_span_id = 0, key = "Reqid", val = "client.4126.0:1" } + + ...