[v6,25/45] trace-cmd: Read compressed trace data

Message ID	20210614075029.598048-26-tz.stoyanov@gmail.com (mailing list archive)
State	Superseded

Headers

From: "Tzvetomir Stoyanov (VMware)" <tz.stoyanov@gmail.com>
To: rostedt@goodmis.org
Cc: linux-trace-devel@vger.kernel.org
Subject: [PATCH v6 25/45] trace-cmd: Read compressed trace data
Date: Mon, 14 Jun 2021 10:50:09 +0300
Message-Id: <20210614075029.598048-26-tz.stoyanov@gmail.com>
In-Reply-To: <20210614075029.598048-1-tz.stoyanov@gmail.com>
References: <20210614075029.598048-1-tz.stoyanov@gmail.com>
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
Precedence: bulk

Series

Add trace file compression

Commit Message

Tzvetomir Stoyanov (VMware) June 14, 2021, 7:50 a.m. UTC

When reading a trace.dat file of version 7, uncompress the trace data.
The trace data for each CPU is uncompressed in a temporary file, located
in /tmp directory with prefix "trace_cpu_data".

Signed-off-by: Tzvetomir Stoyanov (VMware) <tz.stoyanov@gmail.com>
---
 lib/trace-cmd/trace-input.c | 74 +++++++++++++++++++++++++++++--------
 tracecmd/trace-read.c       |  8 ++++
 2 files changed, 67 insertions(+), 15 deletions(-)

Comments

Steven Rostedt June 21, 2021, 11:23 p.m. UTC | #1

On Mon, 14 Jun 2021 10:50:09 +0300
"Tzvetomir Stoyanov (VMware)" <tz.stoyanov@gmail.com> wrote:

> When reading a trace.dat file of version 7, uncompress the trace data.
> The trace data for each CPU is uncompressed in a temporary file, located
> in /tmp directory with prefix "trace_cpu_data".

With large trace files, this will be an issue. Several systems set up the
/tmp directory as a ramfs file system (that is, it is located in RAM and
not backed by disk). If you have very large trace files, which you would
if you are going to bother compressing them, then uncompressing them into
/tmp could take up all the memory of the machine, or easily fill the
/tmp limit.

Simply uncompressing the entire trace data is not an option. The best we
can do is to uncompress on an as-needed basis. That would require storing
metadata to know which pages are compressed.

-- Steve

Tzvetomir Stoyanov (VMware) June 22, 2021, 10:50 a.m. UTC | #2

On Tue, Jun 22, 2021 at 2:23 AM Steven Rostedt <rostedt@goodmis.org> wrote:
>
> On Mon, 14 Jun 2021 10:50:09 +0300
> "Tzvetomir Stoyanov (VMware)" <tz.stoyanov@gmail.com> wrote:
>
> > When reading a trace.dat file of version 7, uncompress the trace data.
> > The trace data for each CPU is uncompressed in a temporary file, located
> > in /tmp directory with prefix "trace_cpu_data".
>
> With large trace files, this will be an issue. Several systems set up the
> /tmp directory as a ramfs file system (that is, it is located in RAM and
> not backed by disk). If you have very large trace files, which you would
> if you are going to bother compressing them, then uncompressing them into
> /tmp could take up all the memory of the machine, or easily fill the
> /tmp limit.

There are a few possible approaches for solving that:
 - use the same directory where the input trace file is located
 - use an environment variable for user specified temp directory for these files
 - check if there is enough free space on the FS before uncompressing
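
The second and third options could be sketched roughly as below. This is only an illustration, not code from the patch: TRACECMD_TEMPDIR is a hypothetical variable name, and pick_temp_dir() and has_free_space() are made-up helpers. Only getenv() and statvfs() are real library calls.

```c
#include <stdlib.h>
#include <sys/statvfs.h>

/*
 * Pick the directory for the temporary uncompressed data.
 * TRACECMD_TEMPDIR is a hypothetical name, not an existing trace-cmd
 * setting. Fall back to @fallback when it is unset or empty.
 */
static const char *pick_temp_dir(const char *fallback)
{
	const char *dir = getenv("TRACECMD_TEMPDIR");

	return (dir && *dir) ? dir : fallback;
}

/*
 * Return 1 if the file system holding @dir has at least @needed bytes
 * available to unprivileged users, 0 if it does not, and -1 if
 * statvfs() fails (for example, when @dir does not exist).
 */
static int has_free_space(const char *dir, unsigned long long needed)
{
	struct statvfs st;
	unsigned long long avail;

	if (statvfs(dir, &st) < 0)
		return -1;

	avail = (unsigned long long)st.f_bavail * st.f_frsize;
	return avail >= needed;
}
```

A reader would call has_free_space(pick_temp_dir("/tmp"), uncompressed_size) before creating the temporary file. Note that such a check is inherently racy, and it does not reduce the amount of space the uncompressed data needs.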

>
> Simply uncompressing the entire trace data is not an option. The best we
> can do is to uncompress on an as-needed basis. That would require storing
> metadata to know which pages are compressed.
>
I can modify that logic to compress page by page, as the data is
loaded by pages. Or use some of the above approaches?

> -- Steve

Steven Rostedt June 22, 2021, 1:51 p.m. UTC | #3

On Tue, 22 Jun 2021 13:50:44 +0300
Tzvetomir Stoyanov <tz.stoyanov@gmail.com> wrote:

> > With large trace files, this will be an issue. Several systems set up the
> > /tmp directory as a ramfs file system (that is, it is located in RAM and
> > not backed by disk). If you have very large trace files, which you would
> > if you are going to bother compressing them, then uncompressing them into
> > /tmp could take up all the memory of the machine, or easily fill the
> > /tmp limit.
> 
> There are a few possible approaches for solving that:
>  - use the same directory where the input trace file is located

I thought about that, but then decided against it, because there's a reason
people compress it. If we have to uncompress it to read it, I can see
people saying "why is it compressed in the first place?" When data is
compressed to save disk space (which I consider this case to be), then the
reader has to uncompress it on an as-needed basis.

>  - use an environment variable for user specified temp directory for these files
>  - check if there is enough free space on the FS before uncompressing
> 
> >
> > Simply uncompressing the entire trace data is not an option. The best we
> > can do is to uncompress on an as-needed basis. That would require storing
> > metadata to know which pages are compressed.
> >  
> I can modify that logic to compress page by page, as the data is
> loaded by pages. Or use some of the above approaches ?

Doing it page by page is probably the most logical solution. It will make
it easier to manage without needing to create separate temporary files.

I'm guessing we need an index of each page and where they start. We need a
way to map the record offset to the page that contains it in such a way
that tracecmd_read_at() still works.

We could keep this in the file, or create it from the data. I'm thinking
saving this as a section in the file would be good as it would be quicker
for loading.

Have a section for each CPU that maps each page to its compressed
offset in the file, and then just consider each uncompressed page to be
one buffer page in size.

Oh, which reminds me, we need to make sure that we don't use
"getpagesize()" to determine the size of the page buffers, because I may be
making the buffers more than a single page. It must use the header_page
file in the events directory, because that might change in the future!
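
As a rough illustration of deriving the buffer size from header_page instead of getpagesize(): the sketch below assumes the usual text layout of that file, where the last field is "char data" and its offset plus its size gives the size of one buffer page. The helper name is hypothetical and this is not how the patch or the library does it.

```c
#include <stdio.h>
#include <string.h>

/*
 * Parse the text of the header_page file and return the size of one
 * ring buffer page (offset of the "data" field plus its size), or -1
 * if the field is not found. The layout is assumed to look like:
 *
 *	field: char data;	offset:16;	size:4080;	signed:1;
 */
static long subbuf_size_from_header_page(const char *text)
{
	const char *line = strstr(text, "field: char data;");
	long offset, size;

	if (!line)
		return -1;

	/* White space in the format matches the tabs between the columns */
	if (sscanf(line, "field: char data; offset:%ld; size:%ld;",
		   &offset, &size) != 2)
		return -1;

	return offset + size;
}
```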

Anyway, we can have this:

	buffer_page_size:	4096

/* let's say the compressed data starts at 10,000 just to make this easier
to explain. */

	u64 cpu_array[0]	10000 <- page 1 (compressed to 100 bytes)
				10100 <- page 2 (compressed to 150 bytes)
				10250 <- page 3
				[...]
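
One possible in-memory shape for such an index is sketched below. The struct and helper names are hypothetical. The compressed length of a page is taken as the distance to the start of the next page, and the data_end field (the file offset just past the last compressed page) is an assumption added so that the last page has a length too.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-CPU index of where each compressed page starts */
struct cpu_page_index {
	const uint64_t	*page_start;	/* nr_pages file offsets */
	size_t		nr_pages;
	uint64_t	data_end;	/* offset just past the last page */
};

/*
 * Look up where page @page (zero-based) is stored in the file and how
 * many compressed bytes it occupies. Returns 0 on success, -1 if the
 * page is out of range.
 */
static int compressed_extent(const struct cpu_page_index *idx, size_t page,
			     uint64_t *start, uint64_t *len)
{
	uint64_t end;

	if (page >= idx->nr_pages)
		return -1;

	end = (page + 1 < idx->nr_pages) ?
		idx->page_start[page + 1] : idx->data_end;

	*start = idx->page_start[page];
	*len = end - *start;
	return 0;
}
```

With the numbers above, the first page (index 0) is at 10000 with 100 compressed bytes, and the second page (index 1) is at 10100 with 150 bytes.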

But the record->offset should contain the offset of the uncompressed data.
That is, if the record is on page 2 at offset 400 (uncompressed) then
offset should be:

	record->offset = 14496 (10000 + 4096 + 400)

Which would be calculated as:

	record->offset = cpu_data_start[cpu] + page * buffer_page_size + offset;
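
As a sketch of that calculation and its inverse (the direction a reader such as tracecmd_read_at() would need), assuming "page" in the formula is a zero-based index, so the "page 2" of the example is index 1. The function names are hypothetical.

```c
#include <stdint.h>

/* Uncompressed offset of a record: CPU start + whole pages + in-page offset */
static uint64_t record_offset(uint64_t cpu_start, uint64_t page,
			      uint64_t page_size, uint64_t in_page)
{
	return cpu_start + page * page_size + in_page;
}

/*
 * Inverse mapping: split an uncompressed record offset back into the
 * zero-based page index and the offset inside that page. The caller
 * must make sure that @offset >= @cpu_start.
 */
static void split_offset(uint64_t offset, uint64_t cpu_start,
			 uint64_t page_size, uint64_t *page,
			 uint64_t *in_page)
{
	uint64_t rel = offset - cpu_start;

	*page = rel / page_size;
	*in_page = rel % page_size;
}
```

Here record_offset(10000, 1, 4096, 400) gives the 14496 from the example, and split_offset() on 14496 recovers page index 1 and in-page offset 400.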

This also means that cpu_array[1] has to save its uncompressed start. That
is, even though it may start at 20,000 in the trace data file (10,000 more
than the cpu_array[0] start), its uncompressed location needs to account
for all the cpu_array[0] pages, such that no two records' offsets will
overlap if they are on different CPUs.

	cpu_data_start[0] = 10000 (but has 1000 pages, where 1000 * 4096 = 4,096,000)

But even if cpu_array[1] starts at 20000, it has to account for the
uncompressed cpu_array[0] data, thus we have:

	cpu_data_start[1] = 4106000 (4096000 + 10000)
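
The per-CPU starts could be computed in one pass, as sketched below. The function names and the nr_pages[] array are hypothetical. The first CPU keeps its real file offset, and every following CPU starts where the previous CPU's uncompressed pages would end.

```c
#include <stdint.h>

/* Fill cpu_start[] so that the uncompressed ranges of the CPUs never overlap */
static void compute_cpu_starts(uint64_t first_start, const uint64_t *nr_pages,
			       int nr_cpus, uint64_t page_size,
			       uint64_t *cpu_start)
{
	uint64_t next = first_start;
	int cpu;

	for (cpu = 0; cpu < nr_cpus; cpu++) {
		cpu_start[cpu] = next;
		next += nr_pages[cpu] * page_size;
	}
}

/* Return the CPU whose uncompressed range contains @offset, or -1 */
static int cpu_for_offset(uint64_t offset, const uint64_t *cpu_start,
			  const uint64_t *nr_pages, int nr_cpus,
			  uint64_t page_size)
{
	int cpu;

	for (cpu = 0; cpu < nr_cpus; cpu++) {
		if (offset >= cpu_start[cpu] &&
		    offset - cpu_start[cpu] < nr_pages[cpu] * page_size)
			return cpu;
	}
	return -1;
}
```

With a first start of 10000, 1000 pages on CPU 0 and a 4096 byte page, cpu_start[1] comes out as 4106000, matching the example above.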

-- Steve

diff --git a/lib/trace-cmd/trace-input.c b/lib/trace-cmd/trace-input.c
index 8fff003e..327082a2 100644
--- a/lib/trace-cmd/trace-input.c
+++ b/lib/trace-cmd/trace-input.c
@@ -54,6 +54,7 @@  struct page {
 #endif
 };
 
+#define COMPR_TEMP_FILE "/tmp/trace_cpu_dataXXXXXX"
 struct cpu_data {
 	/* the first two never change */
 	unsigned long long	file_offset;
@@ -72,6 +73,10 @@  struct cpu_data {
 	int			page_cnt;
 	int			cpu;
 	int			pipe_fd;
+
+	/* temporary file for uncompressed cpu data */
+	int			cfd;
+	char			cfile[26]; /* strlen(COMPR_TEMP_FILE) + 1 */
 };
 
 struct input_buffer_instance {
@@ -1080,6 +1085,7 @@  static void *allocate_page_map(struct tracecmd_input *handle,
 	off64_t map_offset;
 	void *map;
 	int ret;
+	int fd;
 
 	if (handle->read_page) {
 		map = malloc(handle->page_size);
@@ -1119,12 +1125,15 @@  static void *allocate_page_map(struct tracecmd_input *handle,
 		map_size -= map_offset + map_size -
 			(cpu_data->file_offset + cpu_data->file_size);
 
+	if (cpu_data->cfd >= 0)
+		fd = cpu_data->cfd;
+	else
+		fd = handle->fd;
  again:
 	page_map->size = map_size;
 	page_map->offset = map_offset;
 
-	page_map->map = mmap(NULL, map_size, PROT_READ, MAP_PRIVATE,
-			 handle->fd, map_offset);
+	page_map->map = mmap(NULL, map_size, PROT_READ, MAP_PRIVATE, fd, map_offset);
 
 	if (page_map->map == MAP_FAILED) {
 		/* Try a smaller map */
@@ -2316,13 +2325,41 @@  tracecmd_read_prev(struct tracecmd_input *handle, struct tep_record *record)
 	/* Not reached */
 }
 
+static int cpu_data_uncompress(struct tracecmd_input *handle, int cpu, unsigned long long *size)
+{
+	struct cpu_data *cpu_data;
+
+	cpu_data = &handle->cpu_data[cpu];
+	strcpy(cpu_data->cfile, COMPR_TEMP_FILE);
+	cpu_data->cfd = mkstemp(cpu_data->cfile);
+	if (cpu_data->cfd < 0)
+		return -1;
+	return tracecmd_uncompress_copy_to(handle->compress, cpu_data->cfd, NULL, size);
+}
+
 static int init_cpu(struct tracecmd_input *handle, int cpu)
 {
 	struct cpu_data *cpu_data = &handle->cpu_data[cpu];
+	unsigned long long size;
+	off64_t offset;
 	int i;
 
-	cpu_data->offset = cpu_data->file_offset;
-	cpu_data->size = cpu_data->file_size;
+	if (handle->file_version >= 7 && cpu_data->file_size > 0) {
+		offset = lseek64(handle->fd, 0, SEEK_CUR);
+		if (lseek64(handle->fd, cpu_data->file_offset, SEEK_SET) == (off_t)-1)
+			return -1;
+		if (cpu_data_uncompress(handle, cpu, &size) < 0)
+			return -1;
+		cpu_data->offset = 0;
+		cpu_data->file_offset = 0;
+		cpu_data->file_size = size;
+		cpu_data->size = size;
+		if (lseek64(handle->fd, offset, SEEK_SET) == (off_t)-1)
+			return -1;
+	} else {
+		cpu_data->offset = cpu_data->file_offset;
+		cpu_data->size = cpu_data->file_size;
+	}
 	cpu_data->timestamp = 0;
 
 	list_head_init(&cpu_data->page_maps);
@@ -3015,6 +3052,7 @@  static int read_cpu_data(struct tracecmd_input *handle)
 
 		handle->cpu_data[cpu].file_offset = offset;
 		handle->cpu_data[cpu].file_size = size;
+		handle->cpu_data[cpu].cfd = -1;
 		if (size > max_size)
 			max_size = size;
 
@@ -3635,17 +3673,23 @@  void tracecmd_close(struct tracecmd_input *handle)
 		/* The tracecmd_peek_data may have cached a record */
 		free_next(handle, cpu);
 		free_page(handle, cpu);
-		if (handle->cpu_data && handle->cpu_data[cpu].kbuf) {
-			kbuffer_free(handle->cpu_data[cpu].kbuf);
-			if (handle->cpu_data[cpu].page_map)
-				free_page_map(handle->cpu_data[cpu].page_map);
-
-			if (handle->cpu_data[cpu].page_cnt)
-				tracecmd_warning("%d pages still allocated on cpu %d%s",
-						 handle->cpu_data[cpu].page_cnt, cpu,
-						 show_records(handle->cpu_data[cpu].pages,
-							      handle->cpu_data[cpu].nr_pages));
-			free(handle->cpu_data[cpu].pages);
+		if (handle->cpu_data) {
+			if (handle->cpu_data[cpu].kbuf) {
+				kbuffer_free(handle->cpu_data[cpu].kbuf);
+				if (handle->cpu_data[cpu].page_map)
+					free_page_map(handle->cpu_data[cpu].page_map);
+
+				if (handle->cpu_data[cpu].page_cnt)
+					tracecmd_warning("%d pages still allocated on cpu %d%s",
+							 handle->cpu_data[cpu].page_cnt, cpu,
+							 show_records(handle->cpu_data[cpu].pages,
+								      handle->cpu_data[cpu].nr_pages));
+				free(handle->cpu_data[cpu].pages);
+			}
+			if (handle->cpu_data[cpu].cfd >= 0) {
+				close(handle->cpu_data[cpu].cfd);
+				unlink(handle->cpu_data[cpu].cfile);
+			}
 		}
 	}
 
diff --git a/tracecmd/trace-read.c b/tracecmd/trace-read.c
index 0cf6e773..d605d05a 100644
--- a/tracecmd/trace-read.c
+++ b/tracecmd/trace-read.c
@@ -1363,7 +1363,14 @@  struct tracecmd_input *read_trace_header(const char *file, int flags)
 
 static void sig_end(int sig)
 {
+	struct handle_list *handles;
+
 	fprintf(stderr, "trace-cmd: Received SIGINT\n");
+
+	list_for_each_entry(handles, &handle_list, list) {
+		tracecmd_close(handles->handle);
+	}
+
 	exit(0);
 }
 
@@ -1924,6 +1931,7 @@  void trace_report (int argc, char **argv)
 	/* and version overrides uname! */
 	if (show_version)
 		otype = OUTPUT_VERSION_ONLY;
+
 	read_data_info(&handle_list, otype, global, align_ts);
 
 	list_for_each_entry(handles, &handle_list, list) {
