[v6,25/45] trace-cmd: Read compressed trace data

Message ID 20210614075029.598048-26-tz.stoyanov@gmail.com (mailing list archive)
State Superseded
Series Add trace file compression

Commit Message

Tzvetomir Stoyanov (VMware) June 14, 2021, 7:50 a.m. UTC
When reading a trace.dat file of version 7, uncompress the trace data.
The trace data for each CPU is uncompressed in a temporary file, located
in /tmp directory with prefix "trace_cpu_data".

Signed-off-by: Tzvetomir Stoyanov (VMware) <tz.stoyanov@gmail.com>
---
 lib/trace-cmd/trace-input.c | 74 +++++++++++++++++++++++++++++--------
 tracecmd/trace-read.c       |  8 ++++
 2 files changed, 67 insertions(+), 15 deletions(-)

Comments

Steven Rostedt June 21, 2021, 11:23 p.m. UTC | #1
On Mon, 14 Jun 2021 10:50:09 +0300
"Tzvetomir Stoyanov (VMware)" <tz.stoyanov@gmail.com> wrote:

> When reading a trace.dat file of version 7, uncompress the trace data.
> The trace data for each CPU is uncompressed in a temporary file, located
> in /tmp directory with prefix "trace_cpu_data".

With large trace files, this will be an issue. Several systems set up the
/tmp directory as a ramfs file system (that is, it is located in RAM, and
not backed by disk). If you have very large trace files, which you would
if you are going to bother compressing them, then uncompressing them into
/tmp could take up all the memory of the machine, or easily fill the
/tmp limit.

Simply uncompressing the entire trace data is not an option. The best we
can do is to uncompress on an as-needed basis. That would require stored
metadata that tells us which pages are compressed.

-- Steve
Tzvetomir Stoyanov (VMware) June 22, 2021, 10:50 a.m. UTC | #2
On Tue, Jun 22, 2021 at 2:23 AM Steven Rostedt <rostedt@goodmis.org> wrote:
>
> On Mon, 14 Jun 2021 10:50:09 +0300
> "Tzvetomir Stoyanov (VMware)" <tz.stoyanov@gmail.com> wrote:
>
> > When reading a trace.dat file of version 7, uncompress the trace data.
> > The trace data for each CPU is uncompressed in a temporary file, located
> > in /tmp directory with prefix "trace_cpu_data".
>
> With large trace files, this will be an issue. Several systems set up the
> /tmp directory as a ramfs file system (that is, it is located in RAM, and
> not backed by disk). If you have very large trace files, which you would
> if you are going to bother compressing them, then uncompressing them into
> /tmp could take up all the memory of the machine, or easily fill the
> /tmp limit.

There are a few possible approaches for solving that:
 - use the same directory where the input trace file is located
 - use an environment variable for user specified temp directory for these files
 - check if there is enough free space on the FS before uncompressing
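As an illustration of the second and third options only, here is a minimal sketch in C. It assumes a hypothetical environment variable named TRACECMD_TEMPDIR (not an existing trace-cmd setting) with a fallback to TMPDIR and then /tmp, and it uses the POSIX statvfs() call to estimate free space. The helper names are made up for this example. Note that this does not address the objection that the whole trace gets uncompressed at once.

```c
#include <stdlib.h>
#include <sys/statvfs.h>

/*
 * Pick the directory for temporary uncompressed data.
 * TRACECMD_TEMPDIR is a hypothetical variable name used only for
 * this sketch; fall back to TMPDIR and then to /tmp.
 */
static const char *pick_temp_dir(void)
{
	const char *dir = getenv("TRACECMD_TEMPDIR");

	if (!dir || !*dir)
		dir = getenv("TMPDIR");
	if (!dir || !*dir)
		dir = "/tmp";
	return dir;
}

/*
 * Return 1 if the file system holding @dir has at least @needed bytes
 * available to an unprivileged user, 0 if it does not, -1 on error.
 */
static int has_free_space(const char *dir, unsigned long long needed)
{
	struct statvfs st;
	unsigned long long avail;

	if (statvfs(dir, &st) < 0)
		return -1;
	/* f_bavail is counted in units of f_frsize */
	avail = (unsigned long long)st.f_bavail * st.f_frsize;
	return avail >= needed;
}
```

A caller would check has_free_space(pick_temp_dir(), uncompressed_size) before calling mkstemp(), where uncompressed_size would have to come from the compression metadata in the file.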

>
> Simply uncompressing the entire trace data is not an option. The best we
> can do is to uncompress on an as-needed basis. That would require stored
> metadata that tells us which pages are compressed.
>
I can modify that logic to compress page by page, as the data is
loaded by pages. Or should we use one of the above approaches?

> -- Steve
Steven Rostedt June 22, 2021, 1:51 p.m. UTC | #3
On Tue, 22 Jun 2021 13:50:44 +0300
Tzvetomir Stoyanov <tz.stoyanov@gmail.com> wrote:

> > With large trace files, this will be an issue. Several systems set up the
> > /tmp directory as a ramfs file system (that is, it is located in RAM, and
> > not backed by disk). If you have very large trace files, which you would
> > if you are going to bother compressing them, then uncompressing them into
> > /tmp could take up all the memory of the machine, or easily fill the
> > /tmp limit.
> 
> There are a few possible approaches for solving that:
>  - use the same directory where the input trace file is located

I thought about that, but then decided against it, because there's a reason
people compress it. If we have to uncompress it to read it, I can see
people saying "why is it compressed in the first place?" When data is
compressed to save disk space (which I consider this case to be), then the
reader has to uncompress it on an as-needed basis.

>  - use an environment variable for user specified temp directory for these files
>  - check if there is enough free space on the FS before uncompressing
> 
> >
> > Simply uncompressing the entire trace data is not an option. The best we
> > can do is to uncompress on an as-needed basis. That would require stored
> > metadata that tells us which pages are compressed.
> >  
> I can modify that logic to compress page by page, as the data is
> loaded by pages. Or should we use one of the above approaches?

Doing it page by page is probably the most logical solution. It will make
it easier to manage without needing to create separate temporary files.

I'm guessing we need an index of each page and where they start. We need a
way to map the record offset to the page that contains it in such a way
that tracecmd_read_at() still works.

We could keep this in the file, or create it from the data. I'm thinking
saving this as a section in the file would be good as it would be quicker
for loading.

Have a section for each CPU that maps each page to its compressed
offset in the file, and then just consider each uncompressed page to be
page size.

Oh, which reminds me, we need to make sure that we don't use
"getpagesize()" to determine the size of the page buffers, because I may be
making the buffers more than a single page. It must use the header_page
file in the events directory, because that might change in the future!
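To illustrate the point about header_page, here is a minimal sketch in C that derives the buffer size from the text of that file instead of calling getpagesize(). It assumes the file describes the payload with a line of the form "field: char data; offset:16; size:4080; signed:1;" and that the buffer size is the offset of that field plus its size. Both the assumed layout and the helper name buffer_page_size_from_header() are illustrative and not taken from the patch.

```c
#include <stdio.h>
#include <string.h>

/*
 * Derive the ring-buffer page (sub-buffer) size from the contents of
 * the events/header_page file: offset of the "data" field plus its size.
 * Returns the size in bytes, or -1 if the "data" field cannot be parsed.
 */
static long buffer_page_size_from_header(const char *header)
{
	const char *field = strstr(header, "char data;");
	long offset, size;

	if (!field)
		return -1;
	/* White space in the format matches the tabs used in the file */
	if (sscanf(field, "char data; offset:%ld; size:%ld;", &offset, &size) != 2)
		return -1;
	return offset + size;
}
```

With the example line above, the function returns 16 + 4080 = 4096. A header that reported a larger "data" size would yield a larger buffer size without any change to the reader.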

Anyway, we can have this:

	buffer_page_size:	4096

/* let's say the compressed data starts at 10,000 just to make this easier
to explain. */

	u64 cpu_array[0]	10000 <- page 1 (compressed to 100 bytes)
				10100 <- page 2 (compressed to 150 bytes)
				10250 <- page 3
				[...]

But the record->offset should contain the offset of the uncompressed data.
That is, if the record is on page 2 at offset 400 (uncompressed) then
offset should be:

	record->offset = 14496 (10000 + 4096 + 400)

Which would be calculated as:

	record->offset = cpu_data_start[cpu] + page * buffer_page_size + offset;
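The formula above, and the reverse lookup that tracecmd_read_at() would need, can be sketched in C as follows. This is only an illustration: the helper names are hypothetical, the page index is zero-based (so "page 2" in the example is index 1), and the per-CPU table of compressed offsets is assumed to be a plain array as in the example.

```c
#include <assert.h>
#include <stddef.h>

/* Uncompressed offset of a record, following the formula in the thread. */
static unsigned long long record_offset(unsigned long long cpu_data_start,
					unsigned long long page_index,
					unsigned long long buffer_page_size,
					unsigned long long offset_in_page)
{
	assert(offset_in_page < buffer_page_size);
	return cpu_data_start + page_index * buffer_page_size + offset_in_page;
}

/* Inverse mapping: split a record offset into page index and in-page offset. */
static void record_location(unsigned long long offset,
			    unsigned long long cpu_data_start,
			    unsigned long long buffer_page_size,
			    unsigned long long *page_index,
			    unsigned long long *offset_in_page)
{
	unsigned long long rel = offset - cpu_data_start;

	*page_index = rel / buffer_page_size;
	*offset_in_page = rel % buffer_page_size;
}

/*
 * Look up where the compressed bytes of a page start in the file.
 * Returns 0 on success, -1 if the page index is out of range.
 */
static int compressed_page_offset(const unsigned long long *cpu_array,
				  size_t nr_pages,
				  unsigned long long page_index,
				  unsigned long long *file_offset)
{
	if (page_index >= nr_pages)
		return -1;
	*file_offset = cpu_array[page_index];
	return 0;
}
```

With the numbers from the example, record_offset(10000, 1, 4096, 400) gives 14496, record_location() turns 14496 back into page index 1 and in-page offset 400, and the table lookup for index 1 gives the compressed file offset 10100.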

This also means that cpu_array[1] has to save its uncompressed start. That
is, even though it may start at 20,000 in the trace data file (10,000 more
than the cpu_array[0] start), its uncompressed location needs to account
for all the cpu_array[0] pages, such that no two records' offsets will
overlap if they are on different CPUs.

	cpu_data_start[0] = 10000 (but has 1000 pages, where 1000 * 4096 = 4,096,000)

But even if cpu_array[1] starts at 20000, it has to account for the
uncompressed cpu_array[0] data, thus we have:

	cpu_data_start[1] = 4106000 (4096000 + 10000)
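The accumulation of uncompressed start offsets across CPUs can be sketched in C as below. The function name and parameters are hypothetical. The sketch assumes every page of every CPU uncompresses to exactly buffer_page_size bytes and that the first CPU's start equals the file offset where the compressed data begins, as in the example.

```c
#include <stddef.h>

/*
 * Fill cpu_data_start[] with the uncompressed start offset of each CPU.
 * Each CPU starts where the previous CPU's uncompressed pages end, so
 * record offsets from different CPUs can never overlap.
 */
static void compute_cpu_data_start(unsigned long long first_start,
				   const unsigned long long *nr_pages,
				   size_t nr_cpus,
				   unsigned long long buffer_page_size,
				   unsigned long long *cpu_data_start)
{
	unsigned long long next = first_start;
	size_t cpu;

	for (cpu = 0; cpu < nr_cpus; cpu++) {
		cpu_data_start[cpu] = next;
		next += nr_pages[cpu] * buffer_page_size;
	}
}
```

For a first start of 10000, a page size of 4096 and 1000 pages on CPU 0, this yields cpu_data_start[0] = 10000 and cpu_data_start[1] = 4106000, matching the numbers above.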

-- Steve

Patch

diff --git a/lib/trace-cmd/trace-input.c b/lib/trace-cmd/trace-input.c
index 8fff003e..327082a2 100644
--- a/lib/trace-cmd/trace-input.c
+++ b/lib/trace-cmd/trace-input.c
@@ -54,6 +54,7 @@  struct page {
 #endif
 };
 
+#define COMPR_TEMP_FILE "/tmp/trace_cpu_dataXXXXXX"
 struct cpu_data {
 	/* the first two never change */
 	unsigned long long	file_offset;
@@ -72,6 +73,10 @@  struct cpu_data {
 	int			page_cnt;
 	int			cpu;
 	int			pipe_fd;
+
+	/* temporary file for uncompressed cpu data */
+	int			cfd;
+	char			cfile[26]; /* strlen(COMPR_TEMP_FILE) + 1 */
 };
 
 struct input_buffer_instance {
@@ -1080,6 +1085,7 @@  static void *allocate_page_map(struct tracecmd_input *handle,
 	off64_t map_offset;
 	void *map;
 	int ret;
+	int fd;
 
 	if (handle->read_page) {
 		map = malloc(handle->page_size);
@@ -1119,12 +1125,15 @@  static void *allocate_page_map(struct tracecmd_input *handle,
 		map_size -= map_offset + map_size -
 			(cpu_data->file_offset + cpu_data->file_size);
 
+	if (cpu_data->cfd >= 0)
+		fd = cpu_data->cfd;
+	else
+		fd = handle->fd;
  again:
 	page_map->size = map_size;
 	page_map->offset = map_offset;
 
-	page_map->map = mmap(NULL, map_size, PROT_READ, MAP_PRIVATE,
-			 handle->fd, map_offset);
+	page_map->map = mmap(NULL, map_size, PROT_READ, MAP_PRIVATE, fd, map_offset);
 
 	if (page_map->map == MAP_FAILED) {
 		/* Try a smaller map */
@@ -2316,13 +2325,41 @@  tracecmd_read_prev(struct tracecmd_input *handle, struct tep_record *record)
 	/* Not reached */
 }
 
+static int cpu_data_uncompress(struct tracecmd_input *handle, int cpu, unsigned long long *size)
+{
+	struct cpu_data *cpu_data;
+
+	cpu_data = &handle->cpu_data[cpu];
+	strcpy(cpu_data->cfile, COMPR_TEMP_FILE);
+	cpu_data->cfd = mkstemp(cpu_data->cfile);
+	if (cpu_data->cfd < 0)
+		return -1;
+	return tracecmd_uncompress_copy_to(handle->compress, cpu_data->cfd, NULL, size);
+}
+
 static int init_cpu(struct tracecmd_input *handle, int cpu)
 {
 	struct cpu_data *cpu_data = &handle->cpu_data[cpu];
+	unsigned long long size;
+	off64_t offset;
 	int i;
 
-	cpu_data->offset = cpu_data->file_offset;
-	cpu_data->size = cpu_data->file_size;
+	if (handle->file_version >= 7 && cpu_data->file_size > 0) {
+		offset = lseek64(handle->fd, 0, SEEK_CUR);
+		if (lseek64(handle->fd, cpu_data->file_offset, SEEK_SET) == (off_t)-1)
+			return -1;
+		if (cpu_data_uncompress(handle, cpu, &size) < 0)
+			return -1;
+		cpu_data->offset = 0;
+		cpu_data->file_offset = 0;
+		cpu_data->file_size = size;
+		cpu_data->size = size;
+		if (lseek64(handle->fd, offset, SEEK_SET) == (off_t)-1)
+			return -1;
+	} else {
+		cpu_data->offset = cpu_data->file_offset;
+		cpu_data->size = cpu_data->file_size;
+	}
 	cpu_data->timestamp = 0;
 
 	list_head_init(&cpu_data->page_maps);
@@ -3015,6 +3052,7 @@  static int read_cpu_data(struct tracecmd_input *handle)
 
 		handle->cpu_data[cpu].file_offset = offset;
 		handle->cpu_data[cpu].file_size = size;
+		handle->cpu_data[cpu].cfd = -1;
 		if (size > max_size)
 			max_size = size;
 
@@ -3635,17 +3673,23 @@  void tracecmd_close(struct tracecmd_input *handle)
 		/* The tracecmd_peek_data may have cached a record */
 		free_next(handle, cpu);
 		free_page(handle, cpu);
-		if (handle->cpu_data && handle->cpu_data[cpu].kbuf) {
-			kbuffer_free(handle->cpu_data[cpu].kbuf);
-			if (handle->cpu_data[cpu].page_map)
-				free_page_map(handle->cpu_data[cpu].page_map);
-
-			if (handle->cpu_data[cpu].page_cnt)
-				tracecmd_warning("%d pages still allocated on cpu %d%s",
-						 handle->cpu_data[cpu].page_cnt, cpu,
-						 show_records(handle->cpu_data[cpu].pages,
-							      handle->cpu_data[cpu].nr_pages));
-			free(handle->cpu_data[cpu].pages);
+		if (handle->cpu_data) {
+			if (handle->cpu_data[cpu].kbuf) {
+				kbuffer_free(handle->cpu_data[cpu].kbuf);
+				if (handle->cpu_data[cpu].page_map)
+					free_page_map(handle->cpu_data[cpu].page_map);
+
+				if (handle->cpu_data[cpu].page_cnt)
+					tracecmd_warning("%d pages still allocated on cpu %d%s",
+							 handle->cpu_data[cpu].page_cnt, cpu,
+							 show_records(handle->cpu_data[cpu].pages,
+								      handle->cpu_data[cpu].nr_pages));
+				free(handle->cpu_data[cpu].pages);
+			}
+			if (handle->cpu_data[cpu].cfd >= 0) {
+				close(handle->cpu_data[cpu].cfd);
+				unlink(handle->cpu_data[cpu].cfile);
+			}
 		}
 	}
 
diff --git a/tracecmd/trace-read.c b/tracecmd/trace-read.c
index 0cf6e773..d605d05a 100644
--- a/tracecmd/trace-read.c
+++ b/tracecmd/trace-read.c
@@ -1363,7 +1363,14 @@  struct tracecmd_input *read_trace_header(const char *file, int flags)
 
 static void sig_end(int sig)
 {
+	struct handle_list *handles;
+
 	fprintf(stderr, "trace-cmd: Received SIGINT\n");
+
+	list_for_each_entry(handles, &handle_list, list) {
+		tracecmd_close(handles->handle);
+	}
+
 	exit(0);
 }
 
@@ -1924,6 +1931,7 @@  void trace_report (int argc, char **argv)
 	/* and version overrides uname! */
 	if (show_version)
 		otype = OUTPUT_VERSION_ONLY;
+
 	read_data_info(&handle_list, otype, global, align_ts);
 
 	list_for_each_entry(handles, &handle_list, list) {