From patchwork Thu Mar 25 07:14:09 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Tzvetomir Stoyanov (VMware)" X-Patchwork-Id: 12163223 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-15.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 6D89EC433DB for ; Thu, 25 Mar 2021 07:15:04 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 30F8961A26 for ; Thu, 25 Mar 2021 07:15:04 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229574AbhCYHOe (ORCPT ); Thu, 25 Mar 2021 03:14:34 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:41170 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229590AbhCYHOS (ORCPT ); Thu, 25 Mar 2021 03:14:18 -0400 Received: from mail-wr1-x42c.google.com (mail-wr1-x42c.google.com [IPv6:2a00:1450:4864:20::42c]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 1EAF1C06174A for ; Thu, 25 Mar 2021 00:14:18 -0700 (PDT) Received: by mail-wr1-x42c.google.com with SMTP id b9so1158139wrt.8 for ; Thu, 25 Mar 2021 00:14:18 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=n/PcC3vShOs0tOTVX2TYRBKGhpAnKTLL0S2OF2+kdrc=; b=hYROnmDwxHa8Izone0UIqfa99cOOKlUmJ2RD47mETbMUS14/QlsBUSFYTRkQOgjjv7 42IgzqWl8pm9thDSz7owffnkHR8JVEHHfl8lG1VHamq0dO71ypUvtSRg0YpQKIcjOAp4 5sx3StowtdTRBechQfc//4AXgguKc1fib8MlxwUS/WhOm3XsD4GNfUIDqxGi2b8Rqu9W IuVvGrnOTMkDqxGguLItJRKmmGEyjh/UQTHJ3ij4u0fptUq/QODHd9g7UmY/Vhh1NZkE yei2YpXRdWIYkMGrkYbr2hswJduip7b5EthiiVKWC6JwE/k6h+smMB3G2z4dr4oSeD/g 6iUA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=n/PcC3vShOs0tOTVX2TYRBKGhpAnKTLL0S2OF2+kdrc=; b=akr/L8rPh4YoTBmKMQRDJBwKKBeA0jl0RTtf2VE22VUM73Ssbln5pGsuHmFBZx6wCU tqOybHhtEw2g+UMedATMtAjvsiSvv018vwzcBNl4zUudxa5Xo8auVYqQT2ZENwtARsdh RmzhpAC+a5AZ2Pl8uAlfbVcAz+ZGUTwkJsGg5NPCdxxGWjvRKROf9JpAtw9fn3VjodHw aF5rsZ7hnxUecMWOUaUHcupv/9fV9FFeQ/3HOaayvn6iIkfgY4cnnGWBtwAKvafCsSk0 fgamgPVR+KusXgZLKlC9QGSZSKewu3rc+D+XNfSgmAnFp6WtHoeLSm3Fso2/hswPBxJj mT9g== X-Gm-Message-State: AOAM532RlNN4/cpW/yq/qwkQ4+r7rbzC9IfP1FhF/6xeTfQbKzh7Pi3P RKlFMSAJPjMahfCcdDZbVBg= X-Google-Smtp-Source: ABdhPJwuugIaOxfXgRGLj8Hp+U4/MpLQOZ2vI05RiIlgRfogSQTHaZ0kX7vEmhw/PGdeBQMS6Ci4iA== X-Received: by 2002:a5d:628c:: with SMTP id k12mr7158189wru.316.1616656456867; Thu, 25 Mar 2021 00:14:16 -0700 (PDT) Received: from oberon.zico.biz ([83.222.187.186]) by smtp.gmail.com with ESMTPSA id q4sm5177487wma.20.2021.03.25.00.14.15 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 25 Mar 2021 00:14:16 -0700 (PDT) From: "Tzvetomir Stoyanov (VMware)" To: rostedt@goodmis.org Cc: linux-trace-devel@vger.kernel.org, stefano.devenuto99@gmail.com Subject: [PATCH v33 1/5] trace-cmd: Add dummy function to initialize timestamp sync logic Date: Thu, 25 Mar 2021 09:14:09 +0200 Message-Id: <20210325071413.543640-2-tz.stoyanov@gmail.com> X-Mailer: git-send-email 2.30.2 In-Reply-To: <20210325071413.543640-1-tz.stoyanov@gmail.com> References: <20210325071413.543640-1-tz.stoyanov@gmail.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-trace-devel@vger.kernel.org A dummy empty function is added, will be used to initialize timestamp synchronization logic: tracecmd_tsync_init() When this code is implemented as real plugins, this function will be removed. Then the initializion will be triggered at plugin load time. Signed-off-by: Tzvetomir Stoyanov (VMware) --- lib/trace-cmd/trace-timesync.c | 8 ++++++++ tracecmd/trace-agent.c | 1 + tracecmd/trace-record.c | 25 ++++++++++++++++--------- 3 files changed, 25 insertions(+), 9 deletions(-) diff --git a/lib/trace-cmd/trace-timesync.c b/lib/trace-cmd/trace-timesync.c index 0097ce15..62f5dc1e 100644 --- a/lib/trace-cmd/trace-timesync.c +++ b/lib/trace-cmd/trace-timesync.c @@ -55,6 +55,14 @@ static struct tsync_proto *tsync_proto_find(const char *proto_name) return NULL; } +/** + * tracecmd_tsync_init - Initialize the global, per task, time sync data. + */ +void tracecmd_tsync_init(void) +{ + +} + int tracecmd_tsync_proto_register(const char *proto_name, int accuracy, int roles, int supported_clocks, unsigned int flags, int (*init)(struct tracecmd_time_sync *), diff --git a/tracecmd/trace-agent.c b/tracecmd/trace-agent.c index 49f6dd8c..1f7d9461 100644 --- a/tracecmd/trace-agent.c +++ b/tracecmd/trace-agent.c @@ -250,6 +250,7 @@ static void agent_serve(unsigned int port) sd = trace_make_vsock(port); if (sd < 0) die("Failed to open vsocket"); + tracecmd_tsync_init(); if (!get_local_cid(&cid)) printf("listening on @%u:%u\n", cid, port); diff --git a/tracecmd/trace-record.c b/tracecmd/trace-record.c index 946bcb4d..6e579c22 100644 --- a/tracecmd/trace-record.c +++ b/tracecmd/trace-record.c @@ -6541,10 +6541,6 @@ out: free(clock); } -/* - * This function contains common code for the following commands: - * record, start, stream, profile. - */ static void record_trace(int argc, char **argv, struct common_record_context *ctx) { @@ -6708,12 +6704,23 @@ static void record_trace(int argc, char **argv, finalize_record_trace(ctx); } +/* + * This function contains common code for the following commands: + * record, start, stream, profile. + */ +static void record_trace_command(int argc, char **argv, + struct common_record_context *ctx) +{ + tracecmd_tsync_init(); + record_trace(argc, argv, ctx); +} + void trace_start(int argc, char **argv) { struct common_record_context ctx; parse_record_options(argc, argv, CMD_start, &ctx); - record_trace(argc, argv, &ctx); + record_trace_command(argc, argv, &ctx); exit(0); } @@ -6722,7 +6729,7 @@ void trace_set(int argc, char **argv) struct common_record_context ctx; parse_record_options(argc, argv, CMD_set, &ctx); - record_trace(argc, argv, &ctx); + record_trace_command(argc, argv, &ctx); exit(0); } @@ -6805,7 +6812,7 @@ void trace_stream(int argc, char **argv) struct common_record_context ctx; parse_record_options(argc, argv, CMD_stream, &ctx); - record_trace(argc, argv, &ctx); + record_trace_command(argc, argv, &ctx); exit(0); } @@ -6824,7 +6831,7 @@ void trace_profile(int argc, char **argv) if (!buffer_instances) top_instance.flags |= BUFFER_FL_PROFILE; - record_trace(argc, argv, &ctx); + record_trace_command(argc, argv, &ctx); do_trace_profile(); exit(0); } @@ -6834,7 +6841,7 @@ void trace_record(int argc, char **argv) struct common_record_context ctx; parse_record_options(argc, argv, CMD_record, &ctx); - record_trace(argc, argv, &ctx); + record_trace_command(argc, argv, &ctx); exit(0); } From patchwork Thu Mar 25 07:14:10 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Tzvetomir Stoyanov (VMware)" X-Patchwork-Id: 12163231 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-15.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 7D910C433E1 for ; Thu, 25 Mar 2021 07:15:04 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 4F35C61A1E for ; Thu, 25 Mar 2021 07:15:04 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229493AbhCYHOc (ORCPT ); Thu, 25 Mar 2021 03:14:32 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:41180 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229653AbhCYHOU (ORCPT ); Thu, 25 Mar 2021 03:14:20 -0400 Received: from mail-wr1-x433.google.com (mail-wr1-x433.google.com [IPv6:2a00:1450:4864:20::433]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id A2FAAC06175F for ; Thu, 25 Mar 2021 00:14:19 -0700 (PDT) Received: by mail-wr1-x433.google.com with SMTP id v4so1137313wrp.13 for ; Thu, 25 Mar 2021 00:14:19 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=6H5X34djLj6aslcjonc1F9JN1n1EpZSApu1HPrLR3hA=; b=eaTjUMMjMmvr+p8or2KgfglXGAeDKcqOeZqZ4jnOKA8IiMN3l4rGuJ/uGoDyVf+xo3 OR29Yi4E8jh/CR+SbdMoPsJLDWOG7o5/3rkFZYV6uXCvvnz+tDrAgmH02ypblClUFauf XbvYGZ0HFhzPWdJlf8E831Wa9wdDtsbODP4Tgqvfxeb80mrVgb6Jn/V7O/c6U3+Budtr wCexWGcaPOmDuwIApD5VPXYLF5anrxYrCd5xio0MD0/me8FIc7kFj/MSX9suNN1hN625 E0lywY2M2cHP8g7ha7H2iTQok9Xv9o8HClJ6qKvR0w01ABKWwoY1w6aU8fId7Sgb++Cw 5Lqw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=6H5X34djLj6aslcjonc1F9JN1n1EpZSApu1HPrLR3hA=; b=hLcJBfbv+L6A7Uyp5EXsZLljJsPtkNTr9j9tkIy4kX6Hus0iQ+rHH/YjnWzEHhIKyZ cSyhjw09X2im7tBJ61E4tFAjzgwDxtHazIQ6gGfZsxTsyK7xAQhQYrIQ2E6UVLDR8JrX aUZUgAriNu74tG7kWNTnoO4oaa3CEKbInIs+tDJaNFilm5ivmDvrd/9WutJ8h5nHwreb a35Q729BdKDQSAPJaJlBnYLqNR6eQXvfPrLiD7LPYen2bQHkL9MxSq8a4tsgxefCi94b zC/nAvWPEBtM8mmbglisqUmmeUJbiDWqAgblOeE20rNFw987zaN8cxnzwglM9kRcshC3 FbLA== X-Gm-Message-State: AOAM532sx/7fpfeRS61p6x3jZbIPUwGQpKp7lVjO/+Bt5b8qtOdrtw0o dP1f2szTPUqvpcN8/EVD/I0= X-Google-Smtp-Source: ABdhPJyXlCkMsOd/O84/ptBz6kw3EFtyWcc7QgVNgExOty4VRARx4PdjyXrfFbvJfOcKvi06V+cE1w== X-Received: by 2002:a5d:4fcb:: with SMTP id h11mr5996691wrw.53.1616656458255; Thu, 25 Mar 2021 00:14:18 -0700 (PDT) Received: from oberon.zico.biz ([83.222.187.186]) by smtp.gmail.com with ESMTPSA id q4sm5177487wma.20.2021.03.25.00.14.16 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 25 Mar 2021 00:14:17 -0700 (PDT) From: "Tzvetomir Stoyanov (VMware)" To: rostedt@goodmis.org Cc: linux-trace-devel@vger.kernel.org, stefano.devenuto99@gmail.com Subject: [PATCH v33 2/5] trace-cmd: Add timestamp synchronization per vCPU Date: Thu, 25 Mar 2021 09:14:10 +0200 Message-Id: <20210325071413.543640-3-tz.stoyanov@gmail.com> X-Mailer: git-send-email 2.30.2 In-Reply-To: <20210325071413.543640-1-tz.stoyanov@gmail.com> References: <20210325071413.543640-1-tz.stoyanov@gmail.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-trace-devel@vger.kernel.org Timestamp synchronization logic is changed to work per virtual CPU. Some hypervisors maintain time offset and scaling per vCPU. The host-guest communication protocol is changed to request time stamp offset calculation for particular vCPU. The guest thread, responsible for running that logic is pinned to the requested CPU. The time sync medata data, saved in the trace.dat file is changed to an array of vCPUs. When an event time stamp is corrected, the CPU on that the event happened is used to get the correct offset. Signed-off-by: Tzvetomir Stoyanov (VMware) --- .../include/private/trace-cmd-private.h | 2 +- lib/trace-cmd/include/trace-tsync-local.h | 22 +- lib/trace-cmd/trace-input.c | 175 +++++++---- lib/trace-cmd/trace-timesync.c | 281 +++++++++++++----- tracecmd/trace-dump.c | 52 ++-- 5 files changed, 361 insertions(+), 171 deletions(-) diff --git a/lib/trace-cmd/include/private/trace-cmd-private.h b/lib/trace-cmd/include/private/trace-cmd-private.h index fdca7494..899b3fef 100644 --- a/lib/trace-cmd/include/private/trace-cmd-private.h +++ b/lib/trace-cmd/include/private/trace-cmd-private.h @@ -455,7 +455,7 @@ tracecmd_tsync_with_guest(unsigned long long trace_id, int loop_interval, unsigned int cid, unsigned int port, int guest_pid, int guest_cpus, const char *proto_name, const char *clock); int tracecmd_tsync_with_guest_stop(struct tracecmd_time_sync *tsync); -int tracecmd_tsync_get_offsets(struct tracecmd_time_sync *tsync, +int tracecmd_tsync_get_offsets(struct tracecmd_time_sync *tsync, int cpu, int *count, long long **ts, long long **offsets, long long **scalings); int tracecmd_tsync_get_session_params(struct tracecmd_time_sync *tsync, diff --git a/lib/trace-cmd/include/trace-tsync-local.h b/lib/trace-cmd/include/trace-tsync-local.h index 5cf4f843..b3a3bdfc 100644 --- a/lib/trace-cmd/include/trace-tsync-local.h +++ b/lib/trace-cmd/include/trace-tsync-local.h @@ -24,12 +24,7 @@ struct tracecmd_time_sync { int vcpu_count; }; -struct clock_sync_context { - void *proto_data; /* time sync protocol specific data */ - bool is_server; /* server side time sync role */ - bool is_guest; /* guest or host time sync role */ - struct tracefs_instance *instance; /* ftrace buffer, used for time sync events */ - +struct clock_sync_offsets { /* Arrays with calculated time offsets at given time */ int sync_size; /* Allocated size of sync_ts, * sync_offsets and sync_scalings @@ -40,6 +35,18 @@ struct clock_sync_context { long long *sync_ts; long long *sync_offsets; long long *sync_scalings; +}; + +struct clock_sync_context { + void *proto_data; /* time sync protocol specific data */ + bool is_server; /* server side time sync role */ + bool is_guest; /* guest or host time sync role */ + struct tracefs_instance *instance; /* ftrace buffer, used for time sync events */ + + int cpu_count; + struct clock_sync_offsets *offsets; /* Array of size cpu_count + * calculated offsets per CPU + */ /* Identifiers of local and remote time sync peers: cid and port */ unsigned int local_cid; @@ -53,7 +60,8 @@ int tracecmd_tsync_proto_register(const char *proto_name, int accuracy, int role int (*init)(struct tracecmd_time_sync *), int (*free)(struct tracecmd_time_sync *), int (*calc)(struct tracecmd_time_sync *, - long long *, long long *, long long *)); + long long *, long long *, + long long *, unsigned int)); int tracecmd_tsync_proto_unregister(char *proto_name); int ptp_clock_sync_register(void); diff --git a/lib/trace-cmd/trace-input.c b/lib/trace-cmd/trace-input.c index c0b47345..2f7ba5db 100644 --- a/lib/trace-cmd/trace-input.c +++ b/lib/trace-cmd/trace-input.c @@ -91,12 +91,19 @@ struct guest_trace_info { int *cpu_pid; }; +struct timesync_offsets { + int ts_samples_count; + struct ts_offset_sample *ts_samples; +}; + struct host_trace_info { unsigned long long peer_trace_id; unsigned int flags; bool sync_enable; int ts_samples_count; struct ts_offset_sample *ts_samples; + int cpu_count; + struct timesync_offsets *ts_offsets; }; struct tsc2nsec { @@ -1241,50 +1248,54 @@ timestamp_correction_calc(unsigned long long ts, unsigned int flags, return ts + tscor; } -static unsigned long long timestamp_host_sync(unsigned long long ts, +static unsigned long long timestamp_host_sync(unsigned long long ts, int cpu, struct tracecmd_input *handle) { - struct host_trace_info *host = &handle->host; + struct timesync_offsets *tsync; int min, mid, max; + if (cpu >= handle->host.cpu_count) + return ts; + tsync = &handle->host.ts_offsets[cpu]; + /* We have one sample, nothing to calc here */ - if (host->ts_samples_count == 1) - return ts + host->ts_samples[0].offset; + if (tsync->ts_samples_count == 1) + return ts + tsync->ts_samples[0].offset; /* We have two samples, nothing to search here */ - if (host->ts_samples_count == 2) - return timestamp_correction_calc(ts, host->flags, - &host->ts_samples[0], - &host->ts_samples[1]); + if (tsync->ts_samples_count == 2) + return timestamp_correction_calc(ts, handle->host.flags, + &tsync->ts_samples[0], + &tsync->ts_samples[1]); /* We have more than two samples */ - if (ts <= host->ts_samples[0].time) - return timestamp_correction_calc(ts, host->flags, - &host->ts_samples[0], - &host->ts_samples[1]); - else if (ts >= host->ts_samples[host->ts_samples_count-1].time) - return timestamp_correction_calc(ts, host->flags, - &host->ts_samples[host->ts_samples_count-2], - &host->ts_samples[host->ts_samples_count-1]); + if (ts <= tsync->ts_samples[0].time) + return timestamp_correction_calc(ts, handle->host.flags, + &tsync->ts_samples[0], + &tsync->ts_samples[1]); + else if (ts >= tsync->ts_samples[tsync->ts_samples_count-1].time) + return timestamp_correction_calc(ts, handle->host.flags, + &tsync->ts_samples[tsync->ts_samples_count-2], + &tsync->ts_samples[tsync->ts_samples_count-1]); min = 0; - max = host->ts_samples_count-1; + max = tsync->ts_samples_count-1; mid = (min + max)/2; while (min <= max) { - if (ts < host->ts_samples[mid].time) + if (ts < tsync->ts_samples[mid].time) max = mid - 1; - else if (ts > host->ts_samples[mid].time) + else if (ts > tsync->ts_samples[mid].time) min = mid + 1; else break; mid = (min + max)/2; } - return timestamp_correction_calc(ts, host->flags, - &host->ts_samples[mid], - &host->ts_samples[mid+1]); + return timestamp_correction_calc(ts, handle->host.flags, + &tsync->ts_samples[mid], + &tsync->ts_samples[mid+1]); } -static unsigned long long timestamp_calc(unsigned long long ts, +static unsigned long long timestamp_calc(unsigned long long ts, int cpu, struct tracecmd_input *handle) { /* do not modify raw timestamps */ @@ -1293,7 +1304,7 @@ static unsigned long long timestamp_calc(unsigned long long ts, /* Guest trace file, sync with host timestamps */ if (handle->host.sync_enable) - ts = timestamp_host_sync(ts, handle); + ts = timestamp_host_sync(ts, cpu, handle); if (handle->ts2secs) { /* user specified clock frequency */ @@ -1333,7 +1344,7 @@ static int update_page_info(struct tracecmd_input *handle, int cpu) return -1; } handle->cpu_data[cpu].timestamp = timestamp_calc(kbuffer_timestamp(kbuf), - handle); + cpu, handle); return 0; } @@ -1963,7 +1974,7 @@ read_again: goto read_again; } - handle->cpu_data[cpu].timestamp = timestamp_calc(ts, handle); + handle->cpu_data[cpu].timestamp = timestamp_calc(ts, cpu, handle); index = kbuffer_curr_offset(kbuf); @@ -2293,34 +2304,79 @@ static int tsync_offset_cmp(const void *a, const void *b) return 0; } -static void tsync_offset_load(struct tracecmd_input *handle, char *buf) +#define safe_read(R, C) \ + do { \ + if ((C) > size) \ + return -EFAULT; \ + (R) = tep_read_number(tep, buf, (C)); \ + buf += (C); \ + size -= (C); \ + } while (0) + +#define safe_read_loop(type) \ + do { \ + int i; \ + for (i = 0; i < ts_offsets->ts_samples_count; i++) \ + safe_read(ts_offsets->ts_samples[i].type, 8); \ + } while (0) + +static int tsync_offset_load(struct tep_handle *tep, + struct timesync_offsets *ts_offsets, char *buf, int size) { - struct host_trace_info *host = &handle->host; - long long *buf8 = (long long *)buf; + int start_size = size; int i, j; - for (i = 0; i < host->ts_samples_count; i++) { - host->ts_samples[i].time = tep_read_number(handle->pevent, - buf8 + i, 8); - host->ts_samples[i].offset = tep_read_number(handle->pevent, - buf8 + host->ts_samples_count + i, 8); - host->ts_samples[i].scaling = tep_read_number(handle->pevent, - buf8 + (2 * host->ts_samples_count) + i, 8); - } - qsort(host->ts_samples, host->ts_samples_count, + safe_read_loop(time); + safe_read_loop(offset); + safe_read_loop(scaling); + qsort(ts_offsets->ts_samples, ts_offsets->ts_samples_count, sizeof(struct ts_offset_sample), tsync_offset_cmp); /* Filter possible samples with equal time */ - for (i = 0, j = 0; i < host->ts_samples_count; i++) { - if (i == 0 || host->ts_samples[i].time != host->ts_samples[i-1].time) - host->ts_samples[j++] = host->ts_samples[i]; + for (i = 0, j = 0; i < ts_offsets->ts_samples_count; i++) { + if (i == 0 || ts_offsets->ts_samples[i].time != ts_offsets->ts_samples[i-1].time) + ts_offsets->ts_samples[j++] = ts_offsets->ts_samples[i]; } - host->ts_samples_count = j; + ts_offsets->ts_samples_count = j; + + return start_size - size; +} + +static int tsync_cpu_offsets_load(struct tracecmd_input *handle, char *buf, int size) +{ + struct tep_handle *tep = handle->pevent; + int ret; + int i; + + safe_read(handle->host.cpu_count, 4); + handle->host.ts_offsets = calloc(handle->host.cpu_count, + sizeof(struct timesync_offsets)); + if (!handle->host.ts_offsets) + return -ENOMEM; + for (i = 0; i < handle->host.cpu_count; i++) { + safe_read(handle->host.ts_offsets[i].ts_samples_count, 4); + handle->host.ts_offsets[i].ts_samples = calloc(handle->host.ts_offsets[i].ts_samples_count, + sizeof(struct ts_offset_sample)); + if (!handle->host.ts_offsets[i].ts_samples) + return -ENOMEM; + ret = tsync_offset_load(tep, &handle->host.ts_offsets[i], buf, size); + if (ret <= 0) + return -EFAULT; + size -= ret; + buf += ret; + } + return 0; } static void trace_tsync_offset_free(struct host_trace_info *host) { - free(host->ts_samples); - host->ts_samples = NULL; + int i; + + if (host->ts_offsets) { + for (i = 0; i < host->cpu_count; i++) + free(host->ts_offsets[i].ts_samples); + free(host->ts_offsets); + host->ts_offsets = NULL; + } } static int trace_pid_map_cmp(const void *a, const void *b) @@ -2574,8 +2630,8 @@ static int handle_options(struct tracecmd_input *handle) struct input_buffer_instance *buffer; struct hook_list *hook; char *buf; - int samples_size; int cpus; + int ret; /* By default, use usecs, unless told otherwise */ handle->flags |= TRACECMD_FL_IN_USECS; @@ -2627,11 +2683,15 @@ static int handle_options(struct tracecmd_input *handle) /* * long long int (8 bytes) trace session ID * int (4 bytes) protocol flags. - * int (4 bytes) count of timestamp offsets. - * long long array of size [count] of times, + * int (4 bytes) CPU count. + * array of size [CPU count]: + * [ + * int (4 bytes) count of timestamp offsets. + * long long array of size [count] of times, * when the offsets were calculated. - * long long array of size [count] of timestamp offsets. - * long long array of size [count] of timestamp scaling ratios.* + * long long array of size [count] of timestamp offsets. + * long long array of size [count] of timestamp scaling ratios.* + * ] */ if (size < 16 || (handle->flags & TRACECMD_FL_RAW_TS)) break; @@ -2639,18 +2699,9 @@ static int handle_options(struct tracecmd_input *handle) buf, 8); handle->host.flags = tep_read_number(handle->pevent, buf + 8, 4); - handle->host.ts_samples_count = tep_read_number(handle->pevent, - buf + 12, 4); - samples_size = (8 * handle->host.ts_samples_count); - if (size != (16 + (2 * samples_size))) { - warning("Failed to extract Time Shift information from the file: found size %d, expected is %d", - size, 16 + (2 * samples_size)); - break; - } - handle->host.ts_samples = malloc(2 * samples_size); - if (!handle->host.ts_samples) - return -ENOMEM; - tsync_offset_load(handle, buf + 16); + ret = tsync_cpu_offsets_load(handle, buf + 12, size - 12); + if (ret < 0) + return ret; tracecmd_enable_tsync(handle, true); break; case TRACECMD_OPTION_CPUSTAT: @@ -4057,7 +4108,7 @@ int tracecmd_get_guest_cpumap(struct tracecmd_input *handle, int tracecmd_enable_tsync(struct tracecmd_input *handle, bool enable) { if (enable && - (!handle->host.ts_samples || !handle->host.ts_samples_count)) + (!handle->host.ts_offsets || !handle->host.cpu_count)) return -1; handle->host.sync_enable = enable; diff --git a/lib/trace-cmd/trace-timesync.c b/lib/trace-cmd/trace-timesync.c index 62f5dc1e..556004f1 100644 --- a/lib/trace-cmd/trace-timesync.c +++ b/lib/trace-cmd/trace-timesync.c @@ -36,9 +36,13 @@ struct tsync_proto { int (*clock_sync_free)(struct tracecmd_time_sync *clock_context); int (*clock_sync_calc)(struct tracecmd_time_sync *clock_context, long long *offset, long long *scaling, - long long *timestamp); + long long *timestamp, unsigned int cpu); }; +struct tsync_probe_request_msg { + unsigned short cpu; +} __packed; + static struct tsync_proto *tsync_proto_list; static struct tsync_proto *tsync_proto_find(const char *proto_name) @@ -68,7 +72,8 @@ int tracecmd_tsync_proto_register(const char *proto_name, int accuracy, int role int (*init)(struct tracecmd_time_sync *), int (*free)(struct tracecmd_time_sync *), int (*calc)(struct tracecmd_time_sync *, - long long *, long long *, long long *)) + long long *, long long *, + long long *, unsigned int)) { struct tsync_proto *proto = NULL; @@ -80,6 +85,7 @@ int tracecmd_tsync_proto_register(const char *proto_name, int accuracy, int role strncpy(proto->proto_name, proto_name, TRACECMD_TSYNC_PNAME_LENGTH); proto->accuracy = accuracy; proto->roles = roles; + proto->flags = flags; proto->supported_clocks = supported_clocks; proto->clock_sync_init = init; proto->clock_sync_free = free; @@ -122,6 +128,7 @@ bool tsync_proto_is_supported(const char *proto_name) * tracecmd_tsync_get_offsets - Return the calculated time offsets * * @tsync: Pointer to time sync context + * @cpu: CPU for which to get the calculated offsets * @count: Returns the number of calculated time offsets * @ts: Array of size @count containing timestamps of callculated offsets * @offsets: array of size @count, containing offsets for each timestamp @@ -129,23 +136,25 @@ bool tsync_proto_is_supported(const char *proto_name) * * Retuns -1 in case of an error, or 0 otherwise */ -int tracecmd_tsync_get_offsets(struct tracecmd_time_sync *tsync, - int *count, long long **ts, - long long **offsets, long long **scalings) +int tracecmd_tsync_get_offsets(struct tracecmd_time_sync *tsync, int cpu, + int *count, long long **ts, + long long **offsets, long long **scalings) { struct clock_sync_context *tsync_context; if (!tsync || !tsync->context) return -1; tsync_context = (struct clock_sync_context *)tsync->context; + if (cpu >= tsync_context->cpu_count || !tsync_context->offsets) + return -1; if (count) - *count = tsync_context->sync_count; + *count = tsync_context->offsets[cpu].sync_count; if (ts) - *ts = tsync_context->sync_ts; + *ts = tsync_context->offsets[cpu].sync_ts; if (offsets) - *offsets = tsync_context->sync_offsets; + *offsets = tsync_context->offsets[cpu].sync_offsets; if (scalings) - *scalings = tsync_context->sync_scalings; + *scalings = tsync_context->offsets[cpu].sync_scalings; return 0; } @@ -496,6 +505,13 @@ static int clock_context_init(struct tracecmd_time_sync *tsync, if (!clock->instance) goto error; + clock->cpu_count = tsync->vcpu_count; + if (clock->cpu_count) { + clock->offsets = calloc(clock->cpu_count, sizeof(struct clock_sync_offsets)); + if (!clock->offsets) + goto error; + } + tsync->context = clock; if (protocol->clock_sync_init && protocol->clock_sync_init(tsync) < 0) goto error; @@ -505,6 +521,9 @@ static int clock_context_init(struct tracecmd_time_sync *tsync, return 0; error: tsync->context = NULL; + if (clock->instance) + clock_synch_delete_instance(clock->instance); + free(clock->offsets); free(clock); return -1; } @@ -520,6 +539,7 @@ void tracecmd_tsync_free(struct tracecmd_time_sync *tsync) { struct clock_sync_context *tsync_context; struct tsync_proto *proto; + int i; if (!tsync->context) return; @@ -532,14 +552,20 @@ void tracecmd_tsync_free(struct tracecmd_time_sync *tsync) clock_synch_delete_instance(tsync_context->instance); tsync_context->instance = NULL; - free(tsync_context->sync_ts); - free(tsync_context->sync_offsets); - free(tsync_context->sync_scalings); - tsync_context->sync_ts = NULL; - tsync_context->sync_offsets = NULL; - tsync_context->sync_scalings = NULL; - tsync_context->sync_count = 0; - tsync_context->sync_size = 0; + if (tsync_context->cpu_count && tsync_context->offsets) { + for (i = 0; i < tsync_context->cpu_count; i++) { + free(tsync_context->offsets[i].sync_ts); + free(tsync_context->offsets[i].sync_offsets); + free(tsync_context->offsets[i].sync_scalings); + tsync_context->offsets[i].sync_ts = NULL; + tsync_context->offsets[i].sync_offsets = NULL; + tsync_context->offsets[i].sync_scalings = NULL; + tsync_context->offsets[i].sync_count = 0; + tsync_context->offsets[i].sync_size = 0; + } + free(tsync_context->offsets); + tsync_context->offsets = NULL; + } pthread_mutex_destroy(&tsync->lock); pthread_cond_destroy(&tsync->cond); pthread_barrier_destroy(&tsync->first_sync); @@ -548,15 +574,69 @@ void tracecmd_tsync_free(struct tracecmd_time_sync *tsync) free(tsync); } +static cpu_set_t *pin_to_cpu(int cpu) +{ + static size_t size; + static int cpus; + cpu_set_t *mask = NULL; + cpu_set_t *old = NULL; + + if (!cpus) { + cpus = tracecmd_count_cpus(); + size = CPU_ALLOC_SIZE(cpus); + } + if (cpu >= cpus) + goto error; + + mask = CPU_ALLOC(cpus); + if (!mask) + goto error; + old = CPU_ALLOC(cpus); + if (!old) + goto error; + + CPU_ZERO_S(size, mask); + CPU_SET_S(cpu, size, mask); + if (pthread_getaffinity_np(pthread_self(), size, old)) + goto error; + if (pthread_setaffinity_np(pthread_self(), size, mask)) + goto error; + + CPU_FREE(mask); + return old; + +error: + if (mask) + CPU_FREE(mask); + if (old) + CPU_FREE(old); + return NULL; +} + +static void restore_pin_to_cpu(cpu_set_t *mask) +{ + static size_t size; + + if (!size) + size = CPU_ALLOC_SIZE(tracecmd_count_cpus()); + + pthread_setaffinity_np(pthread_self(), size, mask); + CPU_FREE(mask); +} + static int tsync_send(struct tracecmd_time_sync *tsync, - struct tsync_proto *proto) + struct tsync_proto *proto, unsigned int cpu) { + cpu_set_t *old_set = NULL; long long timestamp = 0; long long scaling = 0; long long offset = 0; int ret; - ret = proto->clock_sync_calc(tsync, &offset, &scaling, ×tamp); + old_set = pin_to_cpu(cpu); + ret = proto->clock_sync_calc(tsync, &offset, &scaling, ×tamp, cpu); + if (old_set) + restore_pin_to_cpu(old_set); return ret; } @@ -564,55 +644,48 @@ static int tsync_send(struct tracecmd_time_sync *tsync, static void tsync_with_host(struct tracecmd_time_sync *tsync) { char protocol[TRACECMD_TSYNC_PNAME_LENGTH]; + struct tsync_probe_request_msg probe; struct tsync_proto *proto; unsigned int command; + unsigned int size; + char *msg; int ret; clock_context_init(tsync, &proto, true); if (!tsync->context) return; + msg = (char *)&probe; + size = sizeof(probe); while (true) { + memset(&probe, 0, size); ret = tracecmd_msg_recv_time_sync(tsync->msg_handle, protocol, &command, - NULL, NULL); + &size, &msg); if (ret || strncmp(protocol, TRACECMD_TSYNC_PROTO_NONE, TRACECMD_TSYNC_PNAME_LENGTH) || command != TRACECMD_TIME_SYNC_CMD_PROBE) break; - ret = tsync_send(tsync, proto); + ret = tsync_send(tsync, proto, probe.cpu); if (ret) break; } } -static int tsync_get_sample(struct tracecmd_time_sync *tsync, - struct tsync_proto *proto, int array_step) +static int record_sync_sample(struct clock_sync_offsets *offsets, int array_step, + long long offset, long long scaling, long long ts) { - struct clock_sync_context *clock; long long *sync_scalings = NULL; long long *sync_offsets = NULL; long long *sync_ts = NULL; - long long timestamp = 0; - long long scaling = 0; - long long offset = 0; - int ret; - ret = proto->clock_sync_calc(tsync, &offset, &scaling, ×tamp); - if (ret) { - warning("Failed to synchronize timestamps with guest"); - return -1; - } - if (!offset || !timestamp || !scaling) - return 0; - clock = tsync->context; - if (clock->sync_count >= clock->sync_size) { - sync_ts = realloc(clock->sync_ts, - (clock->sync_size + array_step) * sizeof(long long)); - sync_offsets = realloc(clock->sync_offsets, - (clock->sync_size + array_step) * sizeof(long long)); - sync_scalings = realloc(clock->sync_scalings, - (clock->sync_size + array_step) * sizeof(long long)); + if (offsets->sync_count >= offsets->sync_size) { + sync_ts = realloc(offsets->sync_ts, + (offsets->sync_size + array_step) * sizeof(long long)); + sync_offsets = realloc(offsets->sync_offsets, + (offsets->sync_size + array_step) * sizeof(long long)); + sync_scalings = realloc(offsets->sync_scalings, + (offsets->sync_size + array_step) * sizeof(long long)); if (!sync_ts || !sync_offsets || !sync_scalings) { free(sync_ts); @@ -620,20 +693,43 @@ static int tsync_get_sample(struct tracecmd_time_sync *tsync, free(sync_scalings); return -1; } - clock->sync_size += array_step; - clock->sync_ts = sync_ts; - clock->sync_offsets = sync_offsets; - clock->sync_scalings = sync_scalings; + offsets->sync_size += array_step; + offsets->sync_ts = sync_ts; + offsets->sync_offsets = sync_offsets; + offsets->sync_scalings = sync_scalings; } - clock->sync_ts[clock->sync_count] = timestamp; - clock->sync_offsets[clock->sync_count] = offset; - clock->sync_scalings[clock->sync_count] = scaling; - clock->sync_count++; + offsets->sync_ts[offsets->sync_count] = ts; + offsets->sync_offsets[offsets->sync_count] = offset; + offsets->sync_scalings[offsets->sync_count] = scaling; + offsets->sync_count++; return 0; } +static int tsync_get_sample(struct tracecmd_time_sync *tsync, unsigned int cpu, + struct tsync_proto *proto, int array_step) +{ + struct clock_sync_context *clock; + long long timestamp = 0; + long long scaling = 0; + long long offset = 0; + int ret; + + ret = proto->clock_sync_calc(tsync, &offset, &scaling, ×tamp, cpu); + if (ret) { + warning("Failed to synchronize timestamps with guest"); + return -1; + } + if (!offset || !timestamp || !scaling) + return 0; + clock = tsync->context; + if (!clock || cpu >= clock->cpu_count || !clock->offsets) + return -1; + return record_sync_sample(&clock->offsets[cpu], array_step, + offset, scaling, timestamp); +} + #define TIMER_SEC_NANO 1000000000LL static inline void get_ts_loop_delay(struct timespec *timeout, int delay_ms) { @@ -651,12 +747,14 @@ static inline void get_ts_loop_delay(struct timespec *timeout, int delay_ms) #define CLOCK_TS_ARRAY 5 static int tsync_with_guest(struct tracecmd_time_sync *tsync) { + struct tsync_probe_request_msg probe; int ts_array_size = CLOCK_TS_ARRAY; struct tsync_proto *proto; struct timespec timeout; bool first = true; bool end = false; int ret; + int i; clock_context_init(tsync, &proto, false); if (!tsync->context) { @@ -670,16 +768,21 @@ static int tsync_with_guest(struct tracecmd_time_sync *tsync) while (true) { pthread_mutex_lock(&tsync->lock); - ret = tracecmd_msg_send_time_sync(tsync->msg_handle, - TRACECMD_TSYNC_PROTO_NONE, - TRACECMD_TIME_SYNC_CMD_PROBE, - 0, NULL); - ret = tsync_get_sample(tsync, proto, ts_array_size); + for (i = 0; i < tsync->vcpu_count; i++) { + probe.cpu = i; + ret = tracecmd_msg_send_time_sync(tsync->msg_handle, + TRACECMD_TSYNC_PROTO_NONE, + TRACECMD_TIME_SYNC_CMD_PROBE, + sizeof(probe), (char *)&probe); + ret = tsync_get_sample(tsync, i, proto, ts_array_size); + if (ret) + break; + } if (first) { first = false; pthread_barrier_wait(&tsync->first_sync); } - if (ret || end) + if (end || i < tsync->vcpu_count) break; if (tsync->loop_interval > 0) { get_ts_loop_delay(&timeout, tsync->loop_interval); @@ -813,41 +916,63 @@ error: int tracecmd_write_guest_time_shift(struct tracecmd_output *handle, struct tracecmd_time_sync *tsync) { - struct iovec vector[6]; + struct iovec *vector = NULL; unsigned int flags; long long *scalings = NULL; long long *offsets = NULL; long long *ts = NULL; + int vcount; int count; - int ret; + int i, j; + int ret = -1; - ret = tracecmd_tsync_get_offsets(tsync, &count, - &ts, &offsets, &scalings); - if (ret < 0 || !count || !ts || !offsets || !scalings) + if (!tsync->vcpu_count) + return -1; + vcount = 3 + (4 * tsync->vcpu_count); + vector = calloc(vcount, sizeof(struct iovec)); + if (!vector) return -1; ret = tsync_get_proto_flags(tsync, &flags); if (ret < 0) - return -1; - - vector[0].iov_len = 8; - vector[0].iov_base = &(tsync->trace_id); - vector[1].iov_len = 4; - vector[1].iov_base = &flags; - vector[2].iov_len = 4; - vector[2].iov_base = &count; - vector[3].iov_len = 8 * count; - vector[3].iov_base = ts; - vector[4].iov_len = 8 * count; - vector[4].iov_base = offsets; - vector[5].iov_len = 8 * count; - vector[5].iov_base = scalings; - tracecmd_add_option_v(handle, TRACECMD_OPTION_TIME_SHIFT, vector, 6); + goto out; + + j = 0; + vector[j].iov_len = 8; + vector[j++].iov_base = &tsync->trace_id; + vector[j].iov_len = 4; + vector[j++].iov_base = &flags; + vector[j].iov_len = 4; + vector[j++].iov_base = &tsync->vcpu_count; + for (i = 0; i < tsync->vcpu_count; i++) { + if (j >= vcount) + break; + ret = tracecmd_tsync_get_offsets(tsync, i, &count, + &ts, &offsets, &scalings); + if (ret < 0 || !count || !ts || !offsets || !scalings) + break; + vector[j].iov_len = 4; + vector[j++].iov_base = &count; + vector[j].iov_len = 8 * count; + vector[j++].iov_base = ts; + vector[j].iov_len = 8 * count; + vector[j++].iov_base = offsets; + vector[j].iov_len = 8 * count; + vector[j++].iov_base = scalings; + } + if (i < tsync->vcpu_count) { + ret = -1; + goto out; + } + tracecmd_add_option_v(handle, TRACECMD_OPTION_TIME_SHIFT, vector, vcount); #ifdef TSYNC_DEBUG if (count > 1) printf("Got %d timestamp synch samples in %lld ns trace\n\r", count, ts[count - 1] - ts[0]); #endif - return 0; + ret = 0; +out: + free(vector); + return ret; } /** diff --git a/tracecmd/trace-dump.c b/tracecmd/trace-dump.c index b6ae5638..3f56f65a 100644 --- a/tracecmd/trace-dump.c +++ b/tracecmd/trace-dump.c @@ -377,7 +377,8 @@ static void dump_option_timeshift(int fd, int size) long long trace_id; unsigned int count; unsigned int flags; - int i; + unsigned int cpus; + int i, j; /* * long long int (8 bytes) trace session ID @@ -395,29 +396,34 @@ static void dump_option_timeshift(int fd, int size) do_print(OPTIONS, "0x%llX [peer's trace id]\n", trace_id); read_file_number(fd, &flags, 4); do_print(OPTIONS, "0x%llX [peer's protocol flags]\n", flags); - read_file_number(fd, &count, 4); - do_print(OPTIONS, "%lld [samples count]\n", count); - times = calloc(count, sizeof(long long)); - if (!times) - goto out; - offsets = calloc(count, sizeof(long long)); - if (!offsets) - goto out; - scalings = calloc(count, sizeof(long long)); - if (!scalings) - goto out; - - for (i = 0; i < count; i++) - read_file_number(fd, times + i, 8); - for (i = 0; i < count; i++) - read_file_number(fd, offsets + i, 8); - for (i = 0; i < count; i++) - read_file_number(fd, scalings + i, 8); - - for (i = 0; i < count; i++) - do_print(OPTIONS, "\t%lld * %lld %llu [offset * scaling @ time]\n", - offsets[i], scalings[1], times[i]); + read_file_number(fd, &cpus, 4); + do_print(OPTIONS, "0x%llX [peer's CPU count]\n", cpus); + for (j = 0; j < cpus; j++) { + read_file_number(fd, &count, 4); + do_print(OPTIONS, "%lld [samples count for CPU %d]\n", count, j); + times = calloc(count, sizeof(long long)); + offsets = calloc(count, sizeof(long long)); + scalings = calloc(count, sizeof(long long)); + if (!times || !offsets || !scalings) + goto out; + for (i = 0; i < count; i++) + read_file_number(fd, times + i, 8); + for (i = 0; i < count; i++) + read_file_number(fd, offsets + i, 8); + for (i = 0; i < count; i++) + read_file_number(fd, scalings + i, 8); + + for (i = 0; i < count; i++) + do_print(OPTIONS, "\t%lld %lld %llu [offset * scaling @ time]\n", + offsets[i], scalings[1], times[i]); + free(times); + free(offsets); + free(scalings); + times = NULL; + offsets = NULL; + scalings = NULL; + } out: free(times); free(offsets); From patchwork Thu Mar 25 07:14:11 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Tzvetomir Stoyanov (VMware)" X-Patchwork-Id: 12163233 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-15.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 9AB9AC433E2 for ; Thu, 25 Mar 2021 07:15:04 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 81B7161A2D for ; Thu, 25 Mar 2021 07:15:04 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229461AbhCYHOf (ORCPT ); Thu, 25 Mar 2021 03:14:35 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:41188 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229669AbhCYHOV (ORCPT ); Thu, 25 Mar 2021 03:14:21 -0400 Received: from mail-wm1-x336.google.com (mail-wm1-x336.google.com [IPv6:2a00:1450:4864:20::336]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id E7E0CC06174A for ; Thu, 25 Mar 2021 00:14:20 -0700 (PDT) Received: by mail-wm1-x336.google.com with SMTP id d8-20020a1c1d080000b029010f15546281so2482970wmd.4 for ; Thu, 25 Mar 2021 00:14:20 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=hhSBNWv4lmojCIYcNxFkvRVRI7WZ8QTBP3Y8R5EmkUA=; b=Dur3TpYadjNGYUeLsjTJ0qL/UAkbregr3zCsiL0drY2QOi0ii3cqQQJUdMaf+FWoie 8i8wRJOQ2c22FhrH8OXBe9oWkY94sDDm5IkhPutQhPOByvcfLD9uoA2k+qc10LSnZZKG XPvEes3xXbP4d0TtuQZR0VN546D+yY/HZYR4bBRE7RpLXZ8/87G/zT1yUojh5EO/eaRo qewHCADnRCTAWRE0WbgzE2xdbwhcZ/GZvIYlst5qP0VaH99cVtEj9gXNsUgzZefBzbKI VZ3JuuDI5jZINqRoObDxTeJlvfcRuDNGjrahzP0TwiiEufH82PHWSUm9ZeTz8gGRvWE7 hC+A== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=hhSBNWv4lmojCIYcNxFkvRVRI7WZ8QTBP3Y8R5EmkUA=; b=LwrI9/71skLHYO/6jTaZ3CQ/EKtjHb1r8Q1uZh46sTNBQf698HCtcHjrAdrMPQ09zx ep+2QGQxhAjjcvh353/USkAb680kP8EHRheLFmVmlFIjpQaaO62ZZbivNxL+ituNYGtl RwYnvXO+oJqqGWx78PG6VpG7lIZgWalWnKuWYa6dfaxuolTQuoiUgUb4RYFZCIwQpj/Z RfaeRvm6xs3cgR4ewPwSEWjRxGkfvgZisdp/fX+IW9Defc+ETBB325Cuoa+JBRFgrrms LX0UFjUGXxgzEC3xTCnRZrmItskOOCdy/zdJc8KsSi1glqjJDcgBt5kODDqtDMFwuAeH OEnA== X-Gm-Message-State: AOAM531Lu4f9bDdjs2Ik8PDVzFsqu+QS842/AgKykJesqZ8zVVOvyWHP pCpI6nk8qZHm5evNUVi/Ux8= X-Google-Smtp-Source: ABdhPJw8BTv6CFVmxtXEAJ+L2QSTnoLsFRhAroVVWmkXLx52MPhdPMaOKtGZrvIcBb2+ZWj0ZaR02g== X-Received: by 2002:a05:600c:4ed1:: with SMTP id g17mr6346457wmq.67.1616656459588; Thu, 25 Mar 2021 00:14:19 -0700 (PDT) Received: from oberon.zico.biz ([83.222.187.186]) by smtp.gmail.com with ESMTPSA id q4sm5177487wma.20.2021.03.25.00.14.18 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 25 Mar 2021 00:14:18 -0700 (PDT) From: "Tzvetomir Stoyanov (VMware)" To: rostedt@goodmis.org Cc: linux-trace-devel@vger.kernel.org, stefano.devenuto99@gmail.com Subject: [PATCH v33 3/5] trace-cmd: PTP-like algorithm for host - guest timestamp synchronization Date: Thu, 25 Mar 2021 09:14:11 +0200 Message-Id: <20210325071413.543640-4-tz.stoyanov@gmail.com> X-Mailer: git-send-email 2.30.2 In-Reply-To: <20210325071413.543640-1-tz.stoyanov@gmail.com> References: <20210325071413.543640-1-tz.stoyanov@gmail.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-trace-devel@vger.kernel.org PTP protocol is designed for synchronizing clocks of machines in a local network. The same approach can be used for host - guest timestamp synchronization. This implementation uses ftrace raw markers to track trace timestamps of PTP events. The patch is a POC, two different algorithms for PTP calculations are proposed: - Choosing the sample with the fastest response time for calculating the clocks offset. - Calculating the clocks offset using the average of all PTP samples. The implementation can be tuned using those parameters: - #define FASTEST_RESPONSE - is defined, the sample with the fastest response time is used for calculating the clocks offset. Otherwise the histogram of all samples is used. - #define PTP_SYNC_LOOP 340 - defines the number of samples, used for one calculation. - --tsync-interval - a trace-cmd argument, choose the intervals between offset calculations, performed continuously during the trace. - #define TSYNC_DEBUG - if defined, a debug information is collected and stored in files, in the guest machine: s-cid*.txt - For each offset calculation: host and guest clocks and calculated offset. res-cid*.txt - For each tracing session: all calculated clock offsets. Signed-off-by: Tzvetomir Stoyanov (VMware) --- lib/trace-cmd/Makefile | 3 + lib/trace-cmd/trace-timesync-ptp.c | 715 +++++++++++++++++++++++++++++ lib/trace-cmd/trace-timesync.c | 4 +- 3 files changed, 721 insertions(+), 1 deletion(-) create mode 100644 lib/trace-cmd/trace-timesync-ptp.c diff --git a/lib/trace-cmd/Makefile b/lib/trace-cmd/Makefile index aa92c8c0..486ea7d1 100644 --- a/lib/trace-cmd/Makefile +++ b/lib/trace-cmd/Makefile @@ -21,6 +21,9 @@ ifeq ($(PERF_DEFINED), 1) OBJS += trace-perf.o endif OBJS += trace-timesync.o +ifeq ($(VSOCK_DEFINED), 1) +OBJS += trace-timesync-ptp.o +endif # Additional util objects OBJS += trace-blk-hack.o diff --git a/lib/trace-cmd/trace-timesync-ptp.c b/lib/trace-cmd/trace-timesync-ptp.c new file mode 100644 index 00000000..b05f1cd0 --- /dev/null +++ b/lib/trace-cmd/trace-timesync-ptp.c @@ -0,0 +1,715 @@ +// SPDX-License-Identifier: LGPL-2.1 +/* + * Copyright (C) 2019, VMware, Tzvetomir Stoyanov tz.stoyanov@gmail.com> + * + */ + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include "trace-cmd.h" +#include "trace-cmd-private.h" +#include "tracefs.h" +#include "trace-tsync-local.h" +#include "trace-msg.h" +#include "trace-cmd-local.h" + +typedef __be32 be32; +typedef __u64 u64; +typedef __s64 s64; + +#define PTP_SYNC_LOOP 339 + +#define PTP_SYNC_PKT_START 1 +#define PTP_SYNC_PKT_PROBE 2 +#define PTP_SYNC_PKT_PROBES 3 +#define PTP_SYNC_PKT_OFFSET 4 +#define PTP_SYNC_PKT_END 5 + +/* print time sync debug messages */ +#define TSYNC_DEBUG + +struct ptp_clock_sync { + struct tep_handle *tep; + struct tep_format_field *id; + int raw_id; + int marker_fd; + int series_id; + int flags; + int debug_fd; +}; + +enum { +/* + * Consider only the probe with fastest response time, + * otherwise make a histogram from all probes. + */ + PTP_FLAG_FASTEST_RESPONSE = (1 << 0), +/* + * Use trace marker to get the clock, + * otherwise use the system clock directly. + */ + PTP_FLAG_USE_MARKER = (1 << 1), +}; +static int ptp_flags = PTP_FLAG_FASTEST_RESPONSE | PTP_FLAG_USE_MARKER; + +/* + * Calculated using formula [CPU rate]*[calculated offset deviation] + * tested on 3GHz CPU, with x86-tsc trace clock and compare the calculated + * offset with /sys/kernel/debug/kvm//vcpu0/tsc-offset + * measured 2000ns deviation + * using PTP flags PTP_FLAG_FASTEST_RESPONSE | PTP_FLAG_USE_MARKER + */ +#define PTP_ACCURACY 6000 +#define PTP_NAME "ptp" + +struct ptp_clock_start_msg { + be32 series_id; + be32 flags; +} __packed; + +struct ptp_clock_sample { + s64 ts; + be32 id; +} __packed; + +struct ptp_clock_result_msg { + be32 series_id; + be32 count; + struct ptp_clock_sample samples[2*PTP_SYNC_LOOP]; +} __packed; + +struct ptp_clock_offset_msg { + s64 ts; + s64 offset; +}; + +struct ptp_markers_context { + struct clock_sync_context *clock; + struct ptp_clock_sync *ptp; + struct ptp_clock_result_msg msg; + int size; +}; + +struct ptp_marker_buf { + int local_cid; + int remote_cid; + int count; + int packet_id; +} __packed; + +struct ptp_marker { + int series_id; + struct ptp_marker_buf data; +} __packed; + +static int ptp_clock_sync_init(struct tracecmd_time_sync *tsync) +{ + const char *systems[] = {"ftrace", NULL}; + struct clock_sync_context *clock_context; + struct ptp_clock_sync *ptp; + struct tep_event *raw; + char *path; + + if (!tsync || !tsync->context) + return -1; + clock_context = (struct clock_sync_context *)tsync->context; + if (clock_context->proto_data) + return 0; + + ptp = calloc(1, sizeof(struct ptp_clock_sync)); + if (!ptp) + return -1; + + ptp->marker_fd = -1; + ptp->debug_fd = -1; + + path = tracefs_instance_get_dir(clock_context->instance); + if (!path) + goto error; + ptp->tep = tracefs_local_events_system(path, systems); + tracefs_put_tracing_file(path); + if (!ptp->tep) + goto error; + raw = tep_find_event_by_name(ptp->tep, "ftrace", "raw_data"); + if (!raw) + goto error; + ptp->id = tep_find_field(raw, "id"); + if (!ptp->id) + goto error; + ptp->raw_id = raw->id; + + tep_set_file_bigendian(ptp->tep, tracecmd_host_bigendian()); + tep_set_local_bigendian(ptp->tep, tracecmd_host_bigendian()); + + path = tracefs_instance_get_file(clock_context->instance, "trace_marker_raw"); + if (!path) + goto error; + ptp->marker_fd = open(path, O_WRONLY); + tracefs_put_tracing_file(path); + + clock_context->proto_data = ptp; + +#ifdef TSYNC_DEBUG + if (clock_context->is_server) { + char buff[256]; + int res_fd; + + sprintf(buff, "res-cid%d.txt", clock_context->remote_cid); + + res_fd = open(buff, O_CREAT|O_WRONLY|O_TRUNC, 0644); + if (res_fd > 0) + close(res_fd); + } +#endif + + return 0; + +error: + if (ptp) { + tep_free(ptp->tep); + if (ptp->marker_fd >= 0) + close(ptp->marker_fd); + } + free(ptp); + return -1; +} + +static int ptp_clock_sync_free(struct tracecmd_time_sync *tsync) +{ + struct clock_sync_context *clock_context; + struct ptp_clock_sync *ptp; + + if (!tsync || !tsync->context) + return -1; + clock_context = (struct clock_sync_context *)tsync->context; + + if (clock_context && clock_context->proto_data) { + ptp = (struct ptp_clock_sync *)clock_context->proto_data; + tep_free(ptp->tep); + if (ptp->marker_fd >= 0) + close(ptp->marker_fd); + if (ptp->debug_fd >= 0) + close(ptp->debug_fd); + free(clock_context->proto_data); + clock_context->proto_data = NULL; + } + return 0; +} + +/* Save the timestamps of sent ('s') and returned ('r') probes in the + * ctx->msg.samples[] array. Depending of the context (server or client), there + * may be only returned probes, or both sent and returned probes. The returned + * probes are saved first in the array, after them are the sent probes. + * Depending of the context, the array can be with size: + * [0 .. max data.count] - holds only returned probes + * [0 .. 2 * max data.count] - holds both returned and sent probes + */ +static void ptp_probe_store(struct ptp_markers_context *ctx, + struct ptp_marker *marker, + unsigned long long ts) +{ + int index = -1; + + if (marker->data.packet_id == 'r' && + marker->data.count <= ctx->size) { + index = marker->data.count - 1; + } else if (marker->data.packet_id == 's' && + marker->data.count * 2 <= ctx->size){ + index = ctx->size / 2 + marker->data.count - 1; + } + + if (index >= 0) { + ctx->msg.samples[index].id = marker->data.count; + ctx->msg.samples[index].ts = ts; + ctx->msg.count++; + } +} + +static int ptp_marker_find(struct tep_event *event, struct tep_record *record, + int cpu, void *context) +{ + struct ptp_markers_context *ctx; + struct ptp_marker *marker; + + ctx = (struct ptp_markers_context *)context; + + /* Make sure this is our event */ + if (event->id != ctx->ptp->raw_id || !ctx->ptp->id) + return 0; + if (record->size >= (ctx->ptp->id->offset + sizeof(struct ptp_marker))) { + marker = (struct ptp_marker *)(record->data + ctx->ptp->id->offset); + if (marker->data.local_cid == ctx->clock->local_cid && + marker->data.remote_cid == ctx->clock->remote_cid && + marker->series_id == ctx->ptp->series_id && + marker->data.count) + ptp_probe_store(ctx, marker, record->ts); + } + + return 0; +} + +static inline bool good_probe(struct ptp_clock_sample *server_sample, + struct ptp_clock_sample *send_sample, + struct ptp_clock_sample *client_sample, + int *bad_probes) +{ + if (server_sample->ts && send_sample->ts && client_sample->ts && + server_sample->id == send_sample->id && + server_sample->id == client_sample->id) + return true; + (*bad_probes)++; + return false; +} + +static int ptp_calc_offset_fastest(struct clock_sync_context *clock, + struct ptp_clock_result_msg *server, + struct ptp_clock_result_msg *client, + long long *offset_ret, long long *ts_ret, + int *bad_probes) +{ + struct ptp_clock_sample *sample_send; + long long delta_min = LLONG_MAX; + long long offset = 0; + long long delta = 0; + long long ts = 0; + int max_i; + int i; + + *bad_probes = 0; + sample_send = server->samples + (server->count / 2); + max_i = server->count / 2 < client->count ? + server->count / 2 : client->count; + for (i = 0; i < max_i; i++) { + if (!good_probe(&server->samples[i], &sample_send[i], + &client->samples[i], bad_probes)) + continue; + ts = (sample_send[i].ts + server->samples[i].ts) / 2; + offset = client->samples[i].ts - ts; + + delta = server->samples[i].ts - sample_send[i].ts; + if (delta_min > delta) { + delta_min = delta; + *offset_ret = offset; + *ts_ret = ts; + } +#ifdef TSYNC_DEBUG + { + struct ptp_clock_sync *ptp; + + ptp = (struct ptp_clock_sync *)clock->proto_data; + if (ptp && ptp->debug_fd > 0) { + char buff[256]; + + sprintf(buff, "%lld %lld %lld\n", + ts, client->samples[i].ts, offset); + write(ptp->debug_fd, buff, strlen(buff)); + } + } +#endif + } + + return 0; +} + +static int ptp_calc_offset_hist(struct clock_sync_context *clock, + struct ptp_clock_result_msg *server, + struct ptp_clock_result_msg *client, + long long *offset_ret, long long *ts_ret, + int *bad_probes) +{ + struct ptp_clock_sample *sample_send; + long long timestamps[PTP_SYNC_LOOP]; + long long offsets[PTP_SYNC_LOOP]; + long long offset_min = LLONG_MAX; + long long offset_max = 0; + int hist[PTP_SYNC_LOOP]; + int ind, max = 0; + long long bin; + int i, k = 0; + + *bad_probes = 0; + memset(hist, 0, sizeof(int) * PTP_SYNC_LOOP); + sample_send = server->samples + (server->count / 2); + for (i = 0; i * 2 < server->count && i < client->count; i++) { + if (!good_probe(&server->samples[i], &sample_send[i], + &client->samples[i], bad_probes)) + continue; + timestamps[k] = (sample_send[i].ts + server->samples[i].ts) / 2; + offsets[k] = client->samples[i].ts - timestamps[k]; + if (offset_max < llabs(offsets[k])) + offset_max = llabs(offsets[k]); + if (offset_min > llabs(offsets[k])) + offset_min = llabs(offsets[k]); +#ifdef TSYNC_DEBUG + { + struct ptp_clock_sync *ptp; + + ptp = (struct ptp_clock_sync *)clock->proto_data; + + if (ptp && ptp->debug_fd > 0) { + char buff[256]; + + sprintf(buff, "%lld %lld %lld\n", + timestamps[k], + client->samples[i].ts, offsets[k]); + write(ptp->debug_fd, buff, strlen(buff)); + } + } +#endif + k++; + } + + bin = (offset_max - offset_min) / PTP_SYNC_LOOP; + for (i = 0; i < k; i++) { + ind = (llabs(offsets[i]) - offset_min) / bin; + if (ind < PTP_SYNC_LOOP) { + hist[ind]++; + if (max < hist[ind]) { + max = hist[ind]; + *offset_ret = offsets[i]; + *ts_ret = timestamps[i]; + } + } + } + + return 0; +} + +static void ntoh_ptp_results(struct ptp_clock_result_msg *msg) +{ + int i; + + msg->count = ntohl(msg->count); + for (i = 0; i < msg->count; i++) { + msg->samples[i].id = ntohl(msg->samples[i].id); + msg->samples[i].ts = ntohll(msg->samples[i].ts); + } + msg->series_id = ntohl(msg->series_id); +} + + +static void hton_ptp_results(struct ptp_clock_result_msg *msg) +{ + int i; + + for (i = 0; i < msg->count; i++) { + msg->samples[i].id = htonl(msg->samples[i].id); + msg->samples[i].ts = htonll(msg->samples[i].ts); + } + msg->series_id = htonl(msg->series_id); + msg->count = htonl(msg->count); +} + +static inline void ptp_track_clock(struct ptp_markers_context *ctx, + struct ptp_marker *marker) +{ + if (ctx->ptp->flags & PTP_FLAG_USE_MARKER) { + write(ctx->ptp->marker_fd, marker, sizeof(struct ptp_marker)); + } else { + struct timespec clock; + unsigned long long ts; + + clock_gettime(CLOCK_MONOTONIC_RAW, &clock); + ts = clock.tv_sec * 1000000000LL; + ts += clock.tv_nsec; + ptp_probe_store(ctx, marker, ts); + } +} + +static int ptp_clock_client(struct tracecmd_time_sync *tsync, + long long *offset, long long *timestamp) +{ + char sync_proto[TRACECMD_TSYNC_PNAME_LENGTH]; + struct clock_sync_context *clock_context; + struct ptp_clock_offset_msg res_offset; + struct ptp_clock_start_msg start; + struct ptp_markers_context ctx; + struct ptp_clock_sync *ptp; + struct ptp_marker marker; + unsigned int sync_msg; + unsigned int size; + char *msg; + int count; + int ret; + + if (!tsync || !tsync->context || !tsync->msg_handle) + return -1; + + clock_context = (struct clock_sync_context *)tsync->context; + if (clock_context->proto_data == NULL) + return -1; + + ptp = (struct ptp_clock_sync *)clock_context->proto_data; + size = sizeof(start); + msg = (char *)&start; + ret = tracecmd_msg_recv_time_sync(tsync->msg_handle, + sync_proto, &sync_msg, + &size, &msg); + if (ret || strncmp(sync_proto, PTP_NAME, TRACECMD_TSYNC_PNAME_LENGTH) || + sync_msg != PTP_SYNC_PKT_START) + return -1; + ret = tracecmd_msg_send_time_sync(tsync->msg_handle, PTP_NAME, + PTP_SYNC_PKT_START, sizeof(start), + (char *)&start); + marker.data.local_cid = clock_context->local_cid; + marker.data.remote_cid = clock_context->remote_cid; + marker.series_id = ntohl(start.series_id); + marker.data.packet_id = 'r'; + ptp->series_id = marker.series_id; + ptp->flags = ntohl(start.flags); + msg = (char *)&count; + size = sizeof(count); + ctx.msg.count = 0; + ctx.size = PTP_SYNC_LOOP; + ctx.ptp = ptp; + ctx.clock = clock_context; + ctx.msg.series_id = ptp->series_id; + while (true) { + count = 0; + ret = tracecmd_msg_recv_time_sync(tsync->msg_handle, + sync_proto, &sync_msg, + &size, &msg); + if (ret || strncmp(sync_proto, PTP_NAME, TRACECMD_TSYNC_PNAME_LENGTH) || + sync_msg != PTP_SYNC_PKT_PROBE || !ntohl(count)) + break; + marker.data.count = ntohl(count); + ptp_track_clock(&ctx, &marker); + ret = tracecmd_msg_send_time_sync(tsync->msg_handle, PTP_NAME, + PTP_SYNC_PKT_PROBE, + sizeof(count), (char *)&count); + if (ret) + break; + } + + if (strncmp(sync_proto, PTP_NAME, TRACECMD_TSYNC_PNAME_LENGTH) || + sync_msg != PTP_SYNC_PKT_END) + return -1; + + if (ptp->flags & PTP_FLAG_USE_MARKER) + tracefs_iterate_raw_events(ptp->tep, clock_context->instance, + NULL, 0, ptp_marker_find, &ctx); + + hton_ptp_results(&ctx.msg); + ret = tracecmd_msg_send_time_sync(tsync->msg_handle, PTP_NAME, + PTP_SYNC_PKT_PROBES, + sizeof(ctx.msg), (char *)&ctx.msg); + + msg = (char *)&res_offset; + size = sizeof(res_offset); + ret = tracecmd_msg_recv_time_sync(tsync->msg_handle, + sync_proto, &sync_msg, + &size, (char **)&msg); + if (ret || strncmp(sync_proto, PTP_NAME, TRACECMD_TSYNC_PNAME_LENGTH) || + sync_msg != PTP_SYNC_PKT_OFFSET) + return -1; + + *offset = ntohll(res_offset.offset); + *timestamp = ntohll(res_offset.ts); + + return 0; +} + + +static int ptp_clock_server(struct tracecmd_time_sync *tsync, + long long *offset, long long *timestamp) +{ + char sync_proto[TRACECMD_TSYNC_PNAME_LENGTH]; + struct ptp_clock_result_msg *results = NULL; + struct clock_sync_context *clock_context; + struct ptp_clock_offset_msg res_offset; + struct ptp_clock_start_msg start; + struct ptp_markers_context ctx; + int sync_loop = PTP_SYNC_LOOP; + struct ptp_clock_sync *ptp; + struct ptp_marker marker; + unsigned int sync_msg; + unsigned int size; + int bad_probes; + int count = 1; + int msg_count; + int msg_ret; + char *msg; + int ret; + + if (!tsync || !tsync->context || !tsync->msg_handle) + return -1; + + clock_context = (struct clock_sync_context *)tsync->context; + if (clock_context->proto_data == NULL) + return -1; + + ptp = (struct ptp_clock_sync *)clock_context->proto_data; + ptp->flags = ptp_flags; + memset(&start, 0, sizeof(start)); + start.series_id = htonl(ptp->series_id + 1); + start.flags = htonl(ptp->flags); + ret = tracecmd_msg_send_time_sync(tsync->msg_handle, PTP_NAME, + PTP_SYNC_PKT_START, sizeof(start), + (char *)&start); + if (!ret) + ret = tracecmd_msg_recv_time_sync(tsync->msg_handle, + sync_proto, &sync_msg, + NULL, NULL); + if (ret || strncmp(sync_proto, PTP_NAME, TRACECMD_TSYNC_PNAME_LENGTH) || + sync_msg != PTP_SYNC_PKT_START) + return -1; + + tracefs_instance_file_write(clock_context->instance, "trace", "\0"); + + ptp->series_id++; + marker.data.local_cid = clock_context->local_cid; + marker.data.remote_cid = clock_context->remote_cid; + marker.series_id = ptp->series_id; + msg = (char *)&msg_ret; + size = sizeof(msg_ret); + ctx.size = 2*PTP_SYNC_LOOP; + ctx.ptp = ptp; + ctx.clock = clock_context; + ctx.msg.count = 0; + ctx.msg.series_id = ptp->series_id; + do { + marker.data.count = count++; + marker.data.packet_id = 's'; + msg_count = htonl(marker.data.count); + ptp_track_clock(&ctx, &marker); + ret = tracecmd_msg_send_time_sync(tsync->msg_handle, PTP_NAME, + PTP_SYNC_PKT_PROBE, + sizeof(msg_count), + (char *)&msg_count); + if (!ret) + ret = tracecmd_msg_recv_time_sync(tsync->msg_handle, + sync_proto, &sync_msg, + &size, &msg); + + marker.data.packet_id = 'r'; + ptp_track_clock(&ctx, &marker); + if (ret || strncmp(sync_proto, PTP_NAME, TRACECMD_TSYNC_PNAME_LENGTH) || + sync_msg != PTP_SYNC_PKT_PROBE || + ntohl(msg_ret) != marker.data.count) + break; + } while (--sync_loop); + + if (sync_loop) + return -1; + + ret = tracecmd_msg_send_time_sync(tsync->msg_handle, PTP_NAME, + PTP_SYNC_PKT_END, 0, NULL); + + size = 0; + ret = tracecmd_msg_recv_time_sync(tsync->msg_handle, + sync_proto, &sync_msg, + &size, (char **)&results); + if (ret || strncmp(sync_proto, PTP_NAME, TRACECMD_TSYNC_PNAME_LENGTH) || + sync_msg != PTP_SYNC_PKT_PROBES || size == 0 || results == NULL) + return -1; + + ntoh_ptp_results(results); + if (ptp->flags & PTP_FLAG_USE_MARKER) + tracefs_iterate_raw_events(ptp->tep, clock_context->instance, + NULL, 0, ptp_marker_find, &ctx); + if (ptp->flags & PTP_FLAG_FASTEST_RESPONSE) + ptp_calc_offset_fastest(clock_context, &ctx.msg, results, offset, + timestamp, &bad_probes); + else + ptp_calc_offset_hist(clock_context, &ctx.msg, results, offset, + timestamp, &bad_probes); +#ifdef TSYNC_DEBUG + { + char buff[256]; + int res_fd; + + sprintf(buff, "res-cid%d.txt", clock_context->remote_cid); + + res_fd = open(buff, O_WRONLY|O_APPEND, 0644); + if (res_fd > 0) { + if (*offset && *timestamp) { + sprintf(buff, "%d %lld %lld\n", + ptp->series_id, *offset, *timestamp); + write(res_fd, buff, strlen(buff)); + } + close(res_fd); + } + + printf("\n calculated offset %d: %lld, %d probes, filtered out %d, PTP flags 0x%X\n\r", + ptp->series_id, *offset, results->count, bad_probes, ptp->flags); + if (ptp && ptp->debug_fd > 0) { + sprintf(buff, "%lld %lld 0\n", *offset, *timestamp); + write(ptp->debug_fd, buff, strlen(buff)); + close(ptp->debug_fd); + ptp->debug_fd = -1; + } + + } +#endif + + res_offset.offset = htonll(*offset); + res_offset.ts = htonll(*timestamp); + ret = tracecmd_msg_send_time_sync(tsync->msg_handle, PTP_NAME, + PTP_SYNC_PKT_OFFSET, + sizeof(res_offset), + (char *)&res_offset); + + free(results); + return 0; +} + +static int ptp_clock_sync_calc(struct tracecmd_time_sync *tsync, + long long *offset, long long *scaling, + long long *timestamp, unsigned int cpu) +{ + struct clock_sync_context *clock_context; + int ret; + + if (!tsync || !tsync->context) + return -1; + clock_context = (struct clock_sync_context *)tsync->context; + +#ifdef TSYNC_DEBUG + if (clock_context->is_server) { + struct ptp_clock_sync *ptp; + char buff[256]; + + ptp = (struct ptp_clock_sync *)clock_context->proto_data; + if (ptp->debug_fd > 0) + close(ptp->debug_fd); + sprintf(buff, "s-cid%d_%d.txt", + clock_context->remote_cid, ptp->series_id+1); + ptp->debug_fd = open(buff, O_CREAT|O_WRONLY|O_TRUNC, 0644); + } +#endif + + if (scaling) + *scaling = 1; + if (clock_context->is_server) + ret = ptp_clock_server(tsync, offset, timestamp); + else + ret = ptp_clock_client(tsync, offset, timestamp); + + return ret; +} + +int ptp_clock_sync_register(void) +{ + return tracecmd_tsync_proto_register(PTP_NAME, PTP_ACCURACY, + TRACECMD_TIME_SYNC_ROLE_GUEST | + TRACECMD_TIME_SYNC_ROLE_HOST, + 0, TRACECMD_TSYNC_FLAG_INTERPOLATE, + ptp_clock_sync_init, + ptp_clock_sync_free, + ptp_clock_sync_calc); + +} + +int ptp_clock_sync_unregister(void) +{ + return tracecmd_tsync_proto_unregister(PTP_NAME); +} diff --git a/lib/trace-cmd/trace-timesync.c b/lib/trace-cmd/trace-timesync.c index 556004f1..ada7ddb6 100644 --- a/lib/trace-cmd/trace-timesync.c +++ b/lib/trace-cmd/trace-timesync.c @@ -64,7 +64,9 @@ static struct tsync_proto *tsync_proto_find(const char *proto_name) */ void tracecmd_tsync_init(void) { - +#ifdef VSOCK + ptp_clock_sync_register(); +#endif } int tracecmd_tsync_proto_register(const char *proto_name, int accuracy, int roles, From patchwork Thu Mar 25 07:14:12 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Tzvetomir Stoyanov (VMware)" X-Patchwork-Id: 12163227 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-15.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 9133EC433E3 for ; Thu, 25 Mar 2021 07:15:04 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 728CB61A2C for ; Thu, 25 Mar 2021 07:15:04 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229581AbhCYHOg (ORCPT ); Thu, 25 Mar 2021 03:14:36 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:41200 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229693AbhCYHOX (ORCPT ); Thu, 25 Mar 2021 03:14:23 -0400 Received: from mail-wm1-x32a.google.com (mail-wm1-x32a.google.com [IPv6:2a00:1450:4864:20::32a]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 82AF0C06174A for ; Thu, 25 Mar 2021 00:14:22 -0700 (PDT) Received: by mail-wm1-x32a.google.com with SMTP id r10-20020a05600c35cab029010c946c95easo543339wmq.4 for ; Thu, 25 Mar 2021 00:14:22 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=s/wYTd27b2A1EJhLKQ1KtM3Ao7q7OdGdfUq/Zly0vtY=; b=bfUqKO0Ass318CveSiyUR5+3y4Go39wsf0+VJIA5wlcn/AtaGbHISwxwY/ljaabzJv 97l7LAhJ1GHRl//z6i8pd3yPpFF2Nt24SOyxKQpRHPxUcwChFMR3MRw6vOEhaRqhPB0A 19zLIZgMvAKRm0TH2W6iYsykNF1JZTmAxpE/PK3nJGjdHYmP5G+sdrwqbXqfkgpV6Ra5 IhnC1KMnLbpNwZKRhQ630foSO/bdmxNn9z/fsghfLbmlwqT7qvIeY8I6fkpQhrjijo1b d3FCa5DciAvuJXew2EkUVjVBJdv8fbmvQDTxQTqK5ldonXQRnYGZ3xiVTDR21QPAoOKi cHuQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=s/wYTd27b2A1EJhLKQ1KtM3Ao7q7OdGdfUq/Zly0vtY=; b=sFJe7wWl1Urz3qCn8B43JLE/bT5aMXfU004evibVSlFRrLGPbQhedcvueMhWM8Dsvp SiMtSCjF2FjM5Nb7o0E9qyb4e1CP16vVRbl9DTAcMxoalwsj+tEjrUtHYZq/6mh+MQ/t yoZlzz0h7nxdAmL8T1DN3G/HCxXF1zFjbEHEz0Zch0jOOElKhkCis8wOl6+kUUDaWrov 8QAA5l9XBx8uQl0X63qhM8u0ZwPzbSZdSi9mRc4Vk7PlInd+md42KGu9rW0/w95Oa+db /eDno+krqhrqljkgVqy9XWFWdik5z0Y8mOQfFCnBAcL51pcXKET1I5hwClZyJAUYA0Xq WY7w== X-Gm-Message-State: AOAM531tVVCkmHMTRZjnAu7Mz7daM4W+kd2VIdtooGGS+/MuA8yVNfsb DZCHPpHHQ5PKlJjcOgkS1e3ucDW70JtT0g== X-Google-Smtp-Source: ABdhPJzI7bw1fiE1XaTr/OhBs8P9uOSslY9y0pgqwhkGjuXfEkEVnEwT31vaLQ8MfxziKMsFPT5xew== X-Received: by 2002:a1c:c20a:: with SMTP id s10mr6343878wmf.144.1616656460870; Thu, 25 Mar 2021 00:14:20 -0700 (PDT) Received: from oberon.zico.biz ([83.222.187.186]) by smtp.gmail.com with ESMTPSA id q4sm5177487wma.20.2021.03.25.00.14.19 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 25 Mar 2021 00:14:20 -0700 (PDT) From: "Tzvetomir Stoyanov (VMware)" To: rostedt@goodmis.org Cc: linux-trace-devel@vger.kernel.org, stefano.devenuto99@gmail.com Subject: [PATCH v33 4/5] trace-cmd: Debug scripts for PTP-like algorithm for host - guest timestamp synchronization Date: Thu, 25 Mar 2021 09:14:12 +0200 Message-Id: <20210325071413.543640-5-tz.stoyanov@gmail.com> X-Mailer: git-send-email 2.30.2 In-Reply-To: <20210325071413.543640-1-tz.stoyanov@gmail.com> References: <20210325071413.543640-1-tz.stoyanov@gmail.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-trace-devel@vger.kernel.org These scripts can be used to visualise debug files, written when the PTP-like algorithm is compiled with TSYNC_DEBUG defined. The files are located in the guest machine: s-cid*.txt - For each offset calculation: host and guest clocks and calculated offset. res-cid*.txt - For each tracing session: all calculated clock offsets. tsync_hist.py plots a histogram, using data from a s-cid*.txt file: "python tsync_hist.py s-cid2_1.txt" tsync_res.py plots a line, using data from res-cid*.txt file: "python tsync_res.py res-cid2.txt" Signed-off-by: Tzvetomir Stoyanov (VMware) --- scripts/debug/tsync_hist.py | 57 +++++++++++++++++++++++++++++++++++++ scripts/debug/tsync_readme | 12 ++++++++ scripts/debug/tsync_res.py | 46 ++++++++++++++++++++++++++++++ 3 files changed, 115 insertions(+) create mode 100644 scripts/debug/tsync_hist.py create mode 100644 scripts/debug/tsync_readme create mode 100644 scripts/debug/tsync_res.py diff --git a/scripts/debug/tsync_hist.py b/scripts/debug/tsync_hist.py new file mode 100644 index 00000000..819d1e8f --- /dev/null +++ b/scripts/debug/tsync_hist.py @@ -0,0 +1,57 @@ +# SPDX-License-Identifier: GPL-2.0 +# +# Copyright (C) 2019, VMware Inc, Tzvetomir Stoyanov +# Copyright (C) 2019, VMware Inc, Yordan Karadzhov + + +import matplotlib.pyplot as plt +import matplotlib.lines as mlines +import numpy as np +import sys + +def newline(p1, p2): + ax = plt.gca() + xmin, xmax = ax.get_xbound() + + if(p2[0] == p1[0]): + xmin = xmax = p1[0] + ymin, ymax = ax.get_ybound() + else: + ymax = p1[1]+(p2[1]-p1[1])/(p2[0]-p1[0])*(xmax-p1[0]) + ymin = p1[1]+(p2[1]-p1[1])/(p2[0]-p1[0])*(xmin-p1[0]) + + l = mlines.Line2D([xmin,xmax], [ymin,ymax], color='red') + ax.add_line(l) + return l + + +data = np.loadtxt(fname = sys.argv[1]) +selected_ts = data[-1, 1] +selected_ofs = data[-1, 0] +data = data[:-1,:] + +x = data[:, 1] - data[:, 0] + +mean = x.mean() +std = x.std() + +num_bins = 500 +min = x.min() #+ .4 * (x.max() - x.min()) +max = x.max() #- .4 * (x.max() - x.min()) +bins = np.linspace(min, max, num_bins, endpoint = False, dtype=int) + +fig, ax = plt.subplots() + +# the histogram of the data +n, bins, patches = ax.hist(x, bins, histtype=u'step'); + +ax.set_xlabel('clock offset [$\mu$s]') +ax.set_ylabel('entries') +ax.set_title("$\sigma$=%i" % std) + +x1, y1 = [selected_ofs, min], [selected_ofs, max] +newline(x1, y1) + +# Tweak spacing to prevent clipping of ylabel +fig.tight_layout() +plt.show() diff --git a/scripts/debug/tsync_readme b/scripts/debug/tsync_readme new file mode 100644 index 00000000..f3ebb25d --- /dev/null +++ b/scripts/debug/tsync_readme @@ -0,0 +1,12 @@ +PTP-like algorithm debug +======================== + +tsync_*.py scripts can be used to visualise debug files, written when the PTP-like algorithm +is compiled with TSYNC_DEBUG defined. The files are located in the guest machine: + s-cid*.txt - For each offset calculation: host and guest clocks and calculated offset. + res-cid*.txt - For each tracing session: all calculated clock offsets. + +tsync_hist.py plots a histogram, using data from a s-cid*.txt file: + "python tsync_hist.py s-cid2_1.txt" +tsync_res.py plots a line, using data from res-cid*.txt file: + "python tsync_res.py res-cid2.txt" diff --git a/scripts/debug/tsync_res.py b/scripts/debug/tsync_res.py new file mode 100644 index 00000000..7d109863 --- /dev/null +++ b/scripts/debug/tsync_res.py @@ -0,0 +1,46 @@ +# SPDX-License-Identifier: GPL-2.0 +# +# Copyright (C) 2019, VMware Inc, Tzvetomir Stoyanov +# Copyright (C) 2019, VMware Inc, Yordan Karadzhov + + +import matplotlib.pyplot as plt +import matplotlib.lines as mlines +import numpy as np +import sys + +def newline(p1, p2): + ax = plt.gca() + xmin, xmax = ax.get_xbound() + + if(p2[0] == p1[0]): + xmin = xmax = p1[0] + ymin, ymax = ax.get_ybound() + else: + ymax = p1[1]+(p2[1]-p1[1])/(p2[0]-p1[0])*(xmax-p1[0]) + ymin = p1[1]+(p2[1]-p1[1])/(p2[0]-p1[0])*(xmin-p1[0]) + + l = mlines.Line2D([xmin,xmax], [ymin,ymax], color='red') + ax.add_line(l) + return l + +data = np.loadtxt(fname = sys.argv[1]) +x = data[:, 0] +y = data[:, 1] + +fig, ax = plt.subplots() + +ax.set_xlabel('samples (t)') +ax.set_ylabel('clock offset') +ax.set_title("$\delta$=%i ns" % (max(y) - min(y))) + +l = mlines.Line2D(x, y) +ax.add_line(l) +ax.set_xlim(min(x), max(x)) +ax.set_ylim(min(y), max(y) ) + +print(min(y), max(y), max(y) - min(y)) + +# Tweak spacing to prevent clipping of ylabel +fig.tight_layout() +plt.show() From patchwork Thu Mar 25 07:14:13 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Tzvetomir Stoyanov (VMware)" X-Patchwork-Id: 12163229 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-15.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 8E108C433E0 for ; Thu, 25 Mar 2021 07:15:04 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 62FC161A2A for ; Thu, 25 Mar 2021 07:15:04 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229590AbhCYHOg (ORCPT ); Thu, 25 Mar 2021 03:14:36 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:41202 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229716AbhCYHOY (ORCPT ); Thu, 25 Mar 2021 03:14:24 -0400 Received: from mail-wr1-x431.google.com (mail-wr1-x431.google.com [IPv6:2a00:1450:4864:20::431]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 8CEA7C06174A for ; Thu, 25 Mar 2021 00:14:23 -0700 (PDT) Received: by mail-wr1-x431.google.com with SMTP id c8so1148362wrq.11 for ; Thu, 25 Mar 2021 00:14:23 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=XYSmMLM3iWs4wwsj53ncOba+VO0uBWwhQt0ixqq+Igc=; b=rqx3PPPK8FYodPZ6a4XKMYW8SlTdd1P+8ALfY/rDZnkW19tniIeBGvlsx0b2WBOM2X vV/goiPiRNEp+3tHUWriMAWXiafZ1MQpZNo4FnvyzatltAWItJraiXK6MtyW2h/9pCfs RWzgCDMw43Zdj6W9tpoup/3haxp9YknaNlqHvNDiwr/wrQxKK4HBP1ZTLQXx2ZcW893y THuLbXmOOuoEJPHtAUw9sYbKVTOAksiTGk2S1/ftAwehJ1VkSelam/Yj1IhcK8yFN8u+ aAnmZHDEZgM3UVroT6ByWOogVwmigPlKFPC587Sn71KwlxuR0NcvZtztY302eRHV+lak lWMQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=XYSmMLM3iWs4wwsj53ncOba+VO0uBWwhQt0ixqq+Igc=; b=e9NJgEoQZZx6L6RnysmhpHpprkIj5HBUC6CkRU/oQ2f8AoctxQF97YdTJLLeCRnZgl BORuggKVO8gtOISTPoaBlYPQnC72oVECl3s9P/ZvbAwveY33y8pOMfLFHazQ3SMPn5E7 QV6wtVYeftfnOyv0mQjsMsZq6gYIO0S3Aax3mXvHd/IV+CX4MoV2smrCmKNAwdJB4BUe lUVI4aYUyThpl/a1waskmnosq5j2VG90arxSTfWQhShv+KJ94jRDHf5o8c3Pb/LpIdq+ GL5uHq1DF1CP0Rv/HNnsqKqgV+kUHa0pDCzrBd1/oIg1VegdZGFGGgU0vtDxv7TYYid0 DhHQ== X-Gm-Message-State: AOAM53121E3AS+RG9Ze3Mj/7r7924d6bJHQznDxVkQZIcgaI03etB5oj 7op/GEbmfAnLqN4f7f0M4R0= X-Google-Smtp-Source: ABdhPJyyOu7rTOYZp7jVwK2CXcHXVp6UMkymPC5LTNi9ZM8jC28e72OXSeXAVEM53BaZmX1l/Ihllw== X-Received: by 2002:a5d:4ac7:: with SMTP id y7mr7304732wrs.395.1616656462289; Thu, 25 Mar 2021 00:14:22 -0700 (PDT) Received: from oberon.zico.biz ([83.222.187.186]) by smtp.gmail.com with ESMTPSA id q4sm5177487wma.20.2021.03.25.00.14.21 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 25 Mar 2021 00:14:21 -0700 (PDT) From: "Tzvetomir Stoyanov (VMware)" To: rostedt@goodmis.org Cc: linux-trace-devel@vger.kernel.org, stefano.devenuto99@gmail.com Subject: [PATCH v33 5/5] trace-cmd [POC]: Add KVM timestamp synchronization plugin Date: Thu, 25 Mar 2021 09:14:13 +0200 Message-Id: <20210325071413.543640-6-tz.stoyanov@gmail.com> X-Mailer: git-send-email 2.30.2 In-Reply-To: <20210325071413.543640-1-tz.stoyanov@gmail.com> References: <20210325071413.543640-1-tz.stoyanov@gmail.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-trace-devel@vger.kernel.org Added new timestamp synchronization plugin for KVM hosts. It reads the clock offsets directly from the KVM debug filesystem, if available. The plugin works only with x86-tsc ftrace clock. Note: KVM clock scaling is not supported yet. If a non-default KVM scaling is detected, the KVM plugin is not used for timestamp synchronization. Signed-off-by: Tzvetomir Stoyanov (VMware) --- lib/trace-cmd/Makefile | 1 + lib/trace-cmd/include/trace-tsync-local.h | 1 + lib/trace-cmd/trace-timesync-kvm.c | 533 ++++++++++++++++++++++ lib/trace-cmd/trace-timesync.c | 1 + 4 files changed, 536 insertions(+) create mode 100644 lib/trace-cmd/trace-timesync-kvm.c diff --git a/lib/trace-cmd/Makefile b/lib/trace-cmd/Makefile index 486ea7d1..afd74e1d 100644 --- a/lib/trace-cmd/Makefile +++ b/lib/trace-cmd/Makefile @@ -23,6 +23,7 @@ endif OBJS += trace-timesync.o ifeq ($(VSOCK_DEFINED), 1) OBJS += trace-timesync-ptp.o +OBJS += trace-timesync-kvm.o endif # Additional util objects diff --git a/lib/trace-cmd/include/trace-tsync-local.h b/lib/trace-cmd/include/trace-tsync-local.h index b3a3bdfc..7e56d187 100644 --- a/lib/trace-cmd/include/trace-tsync-local.h +++ b/lib/trace-cmd/include/trace-tsync-local.h @@ -65,5 +65,6 @@ int tracecmd_tsync_proto_register(const char *proto_name, int accuracy, int role int tracecmd_tsync_proto_unregister(char *proto_name); int ptp_clock_sync_register(void); +int kvm_clock_sync_register(void); #endif /* _TRACE_TSYNC_LOCAL_H */ diff --git a/lib/trace-cmd/trace-timesync-kvm.c b/lib/trace-cmd/trace-timesync-kvm.c new file mode 100644 index 00000000..c229c2d7 --- /dev/null +++ b/lib/trace-cmd/trace-timesync-kvm.c @@ -0,0 +1,533 @@ +// SPDX-License-Identifier: LGPL-2.1 +/* + * Copyright (C) 2020, VMware, Tzvetomir Stoyanov tz.stoyanov@gmail.com> + * + */ + +#include +#include +#include +#include +#include +#include + +#include "trace-cmd.h" +#include "trace-cmd-private.h" +#include "tracefs.h" +#include "trace-tsync-local.h" + +#define KVM_DEBUG_FS "/sys/kernel/debug/kvm" +#define KVM_DEBUG_OFFSET_FILE "tsc-offset" +#define KVM_DEBUG_SCALING_FILE "tsc-scaling-ratio" +#define KVM_DEBUG_VCPU_DIR "vcpu" + +/* default KVM scaling values, taken from the Linux kernel */ +#define KVM_SCALING_AMD_DEFAULT (1ULL<<32) +#define KVM_SCALING_INTEL_DEFAULT (1ULL<<48) + +#define KVM_SYNC_PKT_REQUEST 1 +#define KVM_SYNC_PKT_RESPONSE 2 + +typedef __s64 s64; + +#define KVM_ACCURACY 0 +#define KVM_NAME "kvm" + +struct kvm_clock_sync { + int vcpu_count; + char **vcpu_offsets; + char **vcpu_scalings; + int marker_fd; + struct tep_handle *tep; + int raw_id; + unsigned long long ts; +}; + +struct kvm_clock_offset_msg { + s64 ts; + s64 offset; + s64 scaling; +}; + +static int read_ll_form_file(char *file, long long *res) +{ + char buf[32]; + int ret; + int fd; + + if (!file) + return -1; + fd = open(file, O_RDONLY | O_NONBLOCK); + if (fd < 0) + return -1; + ret = read(fd, buf, 32); + close(fd); + if (ret <= 0) + return -1; + + *res = strtoll(buf, NULL, 10); + + return 0; +} + +static bool kvm_scaling_check_vm_cpu(char *vname, char *cpu) +{ + long long scaling; + char *path; + int ret; + + if (asprintf(&path, "%s/%s/%s", vname, cpu, KVM_DEBUG_SCALING_FILE) < 0) + return true; + ret = read_ll_form_file(path, &scaling); + free(path); + + /* + * If there is a scaling, different from the default - + * return false, not supported. + */ + if (!ret && + scaling != KVM_SCALING_AMD_DEFAULT && scaling != KVM_SCALING_INTEL_DEFAULT) + return false; + + return true; +} + +static bool kvm_scaling_check_vm(char *name) +{ + struct dirent *entry; + char *vdir; + DIR *dir; + + if (asprintf(&vdir, "%s/%s", KVM_DEBUG_FS, name) < 0) + return true; + + dir = opendir(vdir); + if (!dir) { + free(vdir); + return true; + } + while ((entry = readdir(dir))) { + if (entry->d_type == DT_DIR && !strncmp(entry->d_name, "vcpu", 4) && + !kvm_scaling_check_vm_cpu(vdir, entry->d_name)) + break; + } + + closedir(dir); + free(vdir); + return entry == NULL; +} +static bool kvm_scaling_check(void) +{ + struct dirent *entry; + DIR *dir; + + dir = opendir(KVM_DEBUG_FS); + if (!dir) + return true; + + while ((entry = readdir(dir))) { + if (entry->d_type == DT_DIR && isdigit(entry->d_name[0]) && + !kvm_scaling_check_vm(entry->d_name)) + break; + } + closedir(dir); + return entry == NULL; +} + +static bool kvm_support_check(bool guest) +{ + struct stat st; + int ret; + + if (guest) + return true; + + ret = stat(KVM_DEBUG_FS, &st); + if (ret < 0) + return false; + + if (!S_ISDIR(st.st_mode)) + return false; + + return kvm_scaling_check(); +} + +static int kvm_open_vcpu_dir(struct kvm_clock_sync *kvm, int cpu, char *dir_str) +{ + struct dirent *entry; + char path[PATH_MAX]; + DIR *dir; + + dir = opendir(dir_str); + if (!dir) + goto error; + while ((entry = readdir(dir))) { + if (entry->d_type != DT_DIR) { + if (!strncmp(entry->d_name, KVM_DEBUG_OFFSET_FILE, + strlen(KVM_DEBUG_OFFSET_FILE))) { + snprintf(path, sizeof(path), "%s/%s", + dir_str, entry->d_name); + kvm->vcpu_offsets[cpu] = strdup(path); + } + if (!strncmp(entry->d_name, KVM_DEBUG_SCALING_FILE, + strlen(KVM_DEBUG_SCALING_FILE))) { + snprintf(path, sizeof(path), "%s/%s", + dir_str, entry->d_name); + kvm->vcpu_scalings[cpu] = strdup(path); + } + } + } + if (!kvm->vcpu_offsets[cpu]) + goto error; + closedir(dir); + return 0; + +error: + if (dir) + closedir(dir); + free(kvm->vcpu_offsets[cpu]); + kvm->vcpu_offsets[cpu] = NULL; + free(kvm->vcpu_scalings[cpu]); + kvm->vcpu_scalings[cpu] = NULL; + return -1; +} + +static int kvm_open_debug_files(struct kvm_clock_sync *kvm, int pid) +{ + char *vm_dir_str = NULL; + struct dirent *entry; + char *pid_str = NULL; + char path[PATH_MAX]; + long vcpu; + DIR *dir; + int i; + + dir = opendir(KVM_DEBUG_FS); + if (!dir) + goto error; + if (asprintf(&pid_str, "%d-", pid) <= 0) + goto error; + while ((entry = readdir(dir))) { + if (!(entry->d_type == DT_DIR && + !strncmp(entry->d_name, pid_str, strlen(pid_str)))) + continue; + asprintf(&vm_dir_str, "%s/%s", KVM_DEBUG_FS, entry->d_name); + break; + } + closedir(dir); + dir = NULL; + if (!vm_dir_str) + goto error; + dir = opendir(vm_dir_str); + if (!dir) + goto error; + while ((entry = readdir(dir))) { + if (!(entry->d_type == DT_DIR && + !strncmp(entry->d_name, KVM_DEBUG_VCPU_DIR, strlen(KVM_DEBUG_VCPU_DIR)))) + continue; + vcpu = strtol(entry->d_name + strlen(KVM_DEBUG_VCPU_DIR), NULL, 10); + if (vcpu < 0 || vcpu >= kvm->vcpu_count) + continue; + snprintf(path, sizeof(path), "%s/%s", vm_dir_str, entry->d_name); + if (kvm_open_vcpu_dir(kvm, vcpu, path) < 0) + goto error; + } + for (i = 0; i < kvm->vcpu_count; i++) { + if (!kvm->vcpu_offsets[i]) + goto error; + } + closedir(dir); + free(pid_str); + free(vm_dir_str); + return 0; +error: + free(pid_str); + free(vm_dir_str); + if (dir) + closedir(dir); + return -1; +} + +static int kvm_clock_sync_init_host(struct tracecmd_time_sync *tsync, + struct kvm_clock_sync *kvm) +{ + kvm->vcpu_count = tsync->vcpu_count; + kvm->vcpu_offsets = calloc(kvm->vcpu_count, sizeof(char *)); + kvm->vcpu_scalings = calloc(kvm->vcpu_count, sizeof(char *)); + if (!kvm->vcpu_offsets || !kvm->vcpu_scalings) + goto error; + if (kvm_open_debug_files(kvm, tsync->guest_pid) < 0) + goto error; + return 0; + +error: + free(kvm->vcpu_offsets); + free(kvm->vcpu_scalings); + return -1; +} + +static int kvm_clock_sync_init_guest(struct tracecmd_time_sync *tsync, + struct kvm_clock_sync *kvm) +{ + const char *systems[] = {"ftrace", NULL}; + struct clock_sync_context *clock_context; + struct tep_event *raw; + char *path; + + clock_context = (struct clock_sync_context *)tsync->context; + path = tracefs_instance_get_dir(clock_context->instance); + if (!path) + goto error; + kvm->tep = tracefs_local_events_system(path, systems); + tracefs_put_tracing_file(path); + if (!kvm->tep) + goto error; + raw = tep_find_event_by_name(kvm->tep, "ftrace", "raw_data"); + if (!raw) + goto error; + + kvm->raw_id = raw->id; + tep_set_file_bigendian(kvm->tep, tracecmd_host_bigendian()); + tep_set_local_bigendian(kvm->tep, tracecmd_host_bigendian()); + + path = tracefs_instance_get_file(clock_context->instance, "trace_marker_raw"); + if (!path) + goto error; + kvm->marker_fd = open(path, O_WRONLY); + tracefs_put_tracing_file(path); + + return 0; + +error: + if (kvm->tep) + tep_free(kvm->tep); + if (kvm->marker_fd >= 0) + close(kvm->marker_fd); + + return -1; +} + +static int kvm_clock_sync_init(struct tracecmd_time_sync *tsync) +{ + struct clock_sync_context *clock_context; + struct kvm_clock_sync *kvm; + int ret; + + if (!tsync || !tsync->context) + return -1; + clock_context = (struct clock_sync_context *)tsync->context; + + if (!kvm_support_check(clock_context->is_guest)) + return -1; + kvm = calloc(1, sizeof(struct kvm_clock_sync)); + if (!kvm) + return -1; + kvm->marker_fd = -1; + if (clock_context->is_guest) + ret = kvm_clock_sync_init_guest(tsync, kvm); + else + ret = kvm_clock_sync_init_host(tsync, kvm); + if (ret < 0) + goto error; + + clock_context->proto_data = kvm; + return 0; + +error: + free(kvm); + return -1; +} + +static int kvm_clock_sync_free(struct tracecmd_time_sync *tsync) +{ + struct clock_sync_context *clock_context; + struct kvm_clock_sync *kvm = NULL; + int i; + + clock_context = (struct clock_sync_context *)tsync->context; + if (clock_context) + kvm = (struct kvm_clock_sync *)clock_context->proto_data; + if (kvm) { + for (i = 0; i < kvm->vcpu_count; i++) { + free(kvm->vcpu_offsets[i]); + kvm->vcpu_offsets[i] = NULL; + free(kvm->vcpu_scalings[i]); + kvm->vcpu_scalings[i] = NULL; + } + if (kvm->tep) + tep_free(kvm->tep); + if (kvm->marker_fd >= 0) + close(kvm->marker_fd); + free(kvm); + } + return -1; +} + +static int kvm_clock_host(struct tracecmd_time_sync *tsync, + long long *offset, long long *scaling, + long long *timestamp, unsigned int cpu) +{ + char sync_proto[TRACECMD_TSYNC_PNAME_LENGTH]; + struct clock_sync_context *clock_context; + struct kvm_clock_offset_msg packet; + struct kvm_clock_sync *kvm = NULL; + long long kvm_scaling = 1; + unsigned int sync_msg; + long long kvm_offset; + unsigned int size; + char *msg; + int ret; + + clock_context = (struct clock_sync_context *)tsync->context; + if (clock_context) + kvm = (struct kvm_clock_sync *)clock_context->proto_data; + if (!kvm || !kvm->vcpu_offsets || !kvm->vcpu_offsets[0]) + return -1; + if (cpu >= kvm->vcpu_count) + return -1; + ret = read_ll_form_file(kvm->vcpu_offsets[cpu], &kvm_offset); + if (ret < 0) + return -1; + if (kvm->vcpu_scalings && kvm->vcpu_scalings[cpu]) { + read_ll_form_file(kvm->vcpu_scalings[cpu], &kvm_scaling); + if (kvm_scaling == KVM_SCALING_AMD_DEFAULT || + kvm_scaling == KVM_SCALING_INTEL_DEFAULT) + kvm_scaling = 1; + } + msg = (char *)&packet; + size = sizeof(packet); + ret = tracecmd_msg_recv_time_sync(tsync->msg_handle, + sync_proto, &sync_msg, + &size, &msg); + if (ret || strncmp(sync_proto, KVM_NAME, TRACECMD_TSYNC_PNAME_LENGTH) || + sync_msg != KVM_SYNC_PKT_REQUEST) + return -1; + + packet.offset = -kvm_offset; + packet.scaling = kvm_scaling; + ret = tracecmd_msg_send_time_sync(tsync->msg_handle, KVM_NAME, + KVM_SYNC_PKT_RESPONSE, sizeof(packet), + (char *)&packet); + if (ret) + return -1; + + *scaling = packet.scaling; + *offset = packet.offset; + *timestamp = packet.ts; + + return 0; +} + +#define KVM_EVENT_MARKER "kvm sync event" +static int kvm_marker_find(struct tep_event *event, struct tep_record *record, + int cpu, void *context) +{ + struct kvm_clock_sync *kvm = (struct kvm_clock_sync *)context; + struct tep_format_field *field; + struct tep_format_field *id; + char *marker; + + /* Make sure this is our event */ + if (event->id != kvm->raw_id) + return 0; + id = tep_find_field(event, "id"); + field = tep_find_field(event, "buf"); + if (field && id && + record->size >= (id->offset + strlen(KVM_EVENT_MARKER) + 1)) { + marker = (char *)(record->data + id->offset); + if (!strcmp(marker, KVM_EVENT_MARKER)) { + kvm->ts = record->ts; + return 1; + } + } + + return 0; +} + + +static int kvm_clock_guest(struct tracecmd_time_sync *tsync, + long long *offset, + long long *scaling, + long long *timestamp) +{ + char sync_proto[TRACECMD_TSYNC_PNAME_LENGTH]; + struct clock_sync_context *clock_context; + struct kvm_clock_offset_msg packet; + struct kvm_clock_sync *kvm = NULL; + unsigned int sync_msg; + unsigned int size; + char *msg; + int ret; + + clock_context = (struct clock_sync_context *)tsync->context; + if (clock_context) + kvm = (struct kvm_clock_sync *)clock_context->proto_data; + if (!kvm) + return -1; + kvm->ts = 0; + memset(&packet, 0, sizeof(packet)); + tracefs_instance_file_write(clock_context->instance, "trace", "\0"); + write(kvm->marker_fd, KVM_EVENT_MARKER, strlen(KVM_EVENT_MARKER) + 1); + kvm->ts = 0; + tracefs_iterate_raw_events(kvm->tep, clock_context->instance, + NULL, 0, kvm_marker_find, kvm); + packet.ts = kvm->ts; + ret = tracecmd_msg_send_time_sync(tsync->msg_handle, KVM_NAME, + KVM_SYNC_PKT_REQUEST, sizeof(packet), + (char *)&packet); + if (ret) + return -1; + msg = (char *)&packet; + size = sizeof(packet); + ret = tracecmd_msg_recv_time_sync(tsync->msg_handle, + sync_proto, &sync_msg, + &size, &msg); + if (ret || strncmp(sync_proto, KVM_NAME, TRACECMD_TSYNC_PNAME_LENGTH) || + sync_msg != KVM_SYNC_PKT_RESPONSE) + return -1; + + *scaling = packet.scaling; + *offset = packet.offset; + *timestamp = packet.ts; + return 0; +} + +static int kvm_clock_sync_calc(struct tracecmd_time_sync *tsync, + long long *offset, long long *scaling, + long long *timestamp, unsigned int cpu) +{ + struct clock_sync_context *clock_context; + int ret; + + if (!tsync || !tsync->context) + return -1; + + clock_context = (struct clock_sync_context *)tsync->context; + + if (clock_context->is_guest) + ret = kvm_clock_guest(tsync, offset, scaling, timestamp); + else + ret = kvm_clock_host(tsync, offset, scaling, timestamp, cpu); + return ret; +} + +int kvm_clock_sync_register(void) +{ + int role = TRACECMD_TIME_SYNC_ROLE_GUEST; + int clock = 0; + + if (kvm_support_check(false)) { + role |= TRACECMD_TIME_SYNC_ROLE_HOST; + clock = TRACECMD_CLOCK_X86_TSC; + } + return tracecmd_tsync_proto_register(KVM_NAME, KVM_ACCURACY, + role, clock, 0, + kvm_clock_sync_init, + kvm_clock_sync_free, + kvm_clock_sync_calc); +} + +int kvm_clock_sync_unregister(void) +{ + return tracecmd_tsync_proto_unregister(KVM_NAME); +} diff --git a/lib/trace-cmd/trace-timesync.c b/lib/trace-cmd/trace-timesync.c index ada7ddb6..8b8afd01 100644 --- a/lib/trace-cmd/trace-timesync.c +++ b/lib/trace-cmd/trace-timesync.c @@ -66,6 +66,7 @@ void tracecmd_tsync_init(void) { #ifdef VSOCK ptp_clock_sync_register(); + kvm_clock_sync_register(); #endif }