diff mbox series

tr2: log parent process name

Message ID 20210507002908.1495061-1-emilyshaffer@google.com (mailing list archive)
State Superseded
Headers show
Series tr2: log parent process name | expand

Commit Message

Emily Shaffer May 7, 2021, 12:29 a.m. UTC
It can be useful to tell who invoked Git - was it invoked manually by a
user via CLI or script? By an IDE? Knowing where the Git invocation came
from can help with debugging to isolate where the problem came from.

Unfortunately, there's no cross-platform reliable way to gather the name
of the parent process. If procfs is present, we can use that; otherwise
we will need to discover the name another way. However, the process ID
should be sufficient regardless of platform.

Signed-off-by: Emily Shaffer <emilyshaffer@google.com>
---
We briefly discussed hiding this behind a config, internally. However, I
wanted to include the parent name alongside the cmd_start event, which
happens very early (maybe before config gathering?).

Maybe it's better to log the parent_name as its own event, since it
shouldn't change over the lifetime of the process?

procfs is very non-portable, though - I think this won't even work on
MacOS. So I'm curious if anybody has better suggestions for how to do
this.

 - Emily

 Makefile                  |  1 +
 compat/procinfo.c         | 19 +++++++++++++++++++
 git-compat-util.h         |  6 ++++++
 t/t0212-trace2-event.sh   |  8 ++++++++
 t/t0212/parse_events.perl |  1 +
 trace2/tr2_tgt_event.c    |  3 +++
 6 files changed, 38 insertions(+)
 create mode 100644 compat/procinfo.c

Comments

Bagas Sanjaya May 7, 2021, 3:25 a.m. UTC | #1
On 07/05/21 07.29, Emily Shaffer wrote:
> It can be useful to tell who invoked Git - was it invoked manually by a
> user via CLI or script? By an IDE? Knowing where the Git invocation came
> from can help with debugging to isolate where the problem came from.
> 
> Unfortunately, there's no cross-platform reliable way to gather the name
> of the parent process. If procfs is present, we can use that; otherwise
> we will need to discover the name another way. However, the process ID
> should be sufficient regardless of platform.

What about on Windows?

> Signed-off-by: Emily Shaffer <emilyshaffer@google.com>
> ---
> We briefly discussed hiding this behind a config, internally. However, I
> wanted to include the parent name alongside the cmd_start event, which
> happens very early (maybe before config gathering?).
> 
> Maybe it's better to log the parent_name as its own event, since it
> shouldn't change over the lifetime of the process?
> 
> procfs is very non-portable, though - I think this won't even work on
> MacOS. So I'm curious if anybody has better suggestions for how to do
> this.

Maybe we can say that "currently the method of gathering parent process
name that Git uses only work on Linux".
Emily Shaffer May 7, 2021, 5:09 p.m. UTC | #2
On Thu, May 06, 2021 at 05:29:08PM -0700, Emily Shaffer wrote:
> 
> It can be useful to tell who invoked Git - was it invoked manually by a
> user via CLI or script? By an IDE? Knowing where the Git invocation came
> from can help with debugging to isolate where the problem came from.
> 
> Unfortunately, there's no cross-platform reliable way to gather the name
> of the parent process. If procfs is present, we can use that; otherwise
> we will need to discover the name another way. However, the process ID
> should be sufficient regardless of platform.
> 
> Signed-off-by: Emily Shaffer <emilyshaffer@google.com>
> ---
> We briefly discussed hiding this behind a config, internally. However, I
> wanted to include the parent name alongside the cmd_start event, which
> happens very early (maybe before config gathering?).
> 
> Maybe it's better to log the parent_name as its own event, since it
> shouldn't change over the lifetime of the process?
> 
> procfs is very non-portable, though - I think this won't even work on
> MacOS. So I'm curious if anybody has better suggestions for how to do
> this.

I wrote this and then I wrote nonportable tests anyways, bah. Working on
a fix now - the tests break for MacOS and Windows.

The parent_name needs to be set conditionally below.

> diff --git a/t/t0212-trace2-event.sh b/t/t0212-trace2-event.sh
> index 1529155cf0..3a2a8a5b5f 100755
> --- a/t/t0212-trace2-event.sh
> +++ b/t/t0212-trace2-event.sh
> @@ -61,6 +61,7 @@ test_expect_success JSON_PP 'event stream, error event' '
>  	|    "exit_code":0,
>  	|    "hierarchy":"trace2",
>  	|    "name":"trace2",
> +	|    "parent_name":"/bin/sh",
>  	|    "version":"$V"
>  	|  }
>  	|};
> @@ -115,6 +116,7 @@ test_expect_success JSON_PP 'event stream, return code 0' '
>  	|    "exit_code":0,
>  	|    "hierarchy":"trace2",
>  	|    "name":"trace2",
> +	|    "parent_name":"/bin/sh",
>  	|    "version":"$V"
>  	|  },
>  	|  "_SID0_/_SID1_":{
> @@ -143,6 +145,7 @@ test_expect_success JSON_PP 'event stream, return code 0' '
>  	|    "exit_code":0,
>  	|    "hierarchy":"trace2/trace2",
>  	|    "name":"trace2",
> +	|    "parent_name":"test-tool",
>  	|    "version":"$V"
>  	|  },
>  	|  "_SID0_/_SID1_/_SID2_":{
> @@ -155,6 +158,7 @@ test_expect_success JSON_PP 'event stream, return code 0' '
>  	|    "exit_code":0,
>  	|    "hierarchy":"trace2/trace2/trace2",
>  	|    "name":"trace2",
> +	|    "parent_name":"$TEST_DIRECTORY/../t/helper//test-tool",
>  	|    "version":"$V"
>  	|  }
>  	|};
> @@ -192,6 +196,7 @@ test_expect_success JSON_PP 'event stream, list config' '
>  	|        "value":"hello world"
>  	|      }
>  	|    ],
> +	|    "parent_name":"/bin/sh",
>  	|    "version":"$V"
>  	|  }
>  	|};
> @@ -229,6 +234,7 @@ test_expect_success JSON_PP 'event stream, list env vars' '
>  	|        "value":"hello world"
>  	|      }
>  	|    ],
> +	|    "parent_name":"/bin/sh",
>  	|    "version":"$V"
>  	|  }
>  	|};
> @@ -263,6 +269,7 @@ test_expect_success JSON_PP 'basic trace2_data' '
>  	|    "exit_code":0,
>  	|    "hierarchy":"trace2",
>  	|    "name":"trace2",
> +	|    "parent_name":"/bin/sh",
>  	|    "version":"$V"
>  	|  }
>  	|};
> @@ -295,6 +302,7 @@ test_expect_success JSON_PP 'using global config, event stream, error event' '
>  	|    "exit_code":0,
>  	|    "hierarchy":"trace2",
>  	|    "name":"trace2",
> +	|    "parent_name":"/bin/sh",
>  	|    "version":"$V"
>  	|  }
>  	|};


 - Emily
Ævar Arnfjörð Bjarmason May 10, 2021, 12:29 p.m. UTC | #3
On Thu, May 06 2021, Emily Shaffer wrote:

> It can be useful to tell who invoked Git - was it invoked manually by a
> user via CLI or script? By an IDE? Knowing where the Git invocation came
> from can help with debugging to isolate where the problem came from.

Aside from the portability concerns others have raised, I don't really
see why you'd need this.

We already have the nest-level as part of the SID, so isn't it
sufficient (and portable) at the top-level to log what isatty says + set
the initial SID "root" in the IDE (which presumably knows about git).

Wouldn't this log passwords in cases of e.g.:

    some-script --git-password secret # invokes "git"

In older versions of linux reading e.g. smaps from /proc/self would
stall the kernel while the read was happening, I haven't checked whether
cmdline is such a thing (probably not), but it's a subtle thing to have
in mind for this / follow-ups if it's made portable (is that an issue on
other OS's?).

All that being said I've got nothing fundamentally against this.
Jeff Hostetler May 11, 2021, 5:28 p.m. UTC | #4
On 5/6/21 8:29 PM, Emily Shaffer wrote:
> It can be useful to tell who invoked Git - was it invoked manually by a
> user via CLI or script? By an IDE? Knowing where the Git invocation came
> from can help with debugging to isolate where the problem came from.
> 
> Unfortunately, there's no cross-platform reliable way to gather the name
> of the parent process. If procfs is present, we can use that; otherwise
> we will need to discover the name another way. However, the process ID
> should be sufficient regardless of platform.
> 
> Signed-off-by: Emily Shaffer <emilyshaffer@google.com>
> ---
> We briefly discussed hiding this behind a config, internally. However, I
> wanted to include the parent name alongside the cmd_start event, which
> happens very early (maybe before config gathering?).
> 
> Maybe it's better to log the parent_name as its own event, since it
> shouldn't change over the lifetime of the process?
> 
> procfs is very non-portable, though - I think this won't even work on
> MacOS. So I'm curious if anybody has better suggestions for how to do
> this.
> 
>   - Emily


Look at `trace2_collect_process_info()` in `trace2.h` and
`compat/win32/trace2_win32_process_info.c`.

That function is designed to let the process call out to
platform-specific code to do things like getting the name
of the invoking process.

It is called with an enum to indicate when/why it is being
called.

On Windows, I have platform code to get the process ancestry,
the peak VM usage, and whether the process is being debugged.
Two of those were easy....  And all are very platform-specific.

They all generate events of the form:

	trace2_data_*("process", "windows/<something>", ...)

Some of my `t021[012]/` helper scripts exclude "process" category
messages from the output to make the t*.sh tests portable.


To implement a /proc lookup as you have suggested, I would
fixup the Makefile or config.mak.uname to include something
like a "HAVE_..." feature and then use that at the
bottom of trace2.h to declare a trace2_collect_process_info()
function and then have your code in compat/procinfo.c hide
behind my interface.

Your code should emits events of the form:

	trace2_data_*("process", "linux/<something>", ...)

I notice there are several `PROCFS_*` symbols currently defined
in `config.mak.uname` so maybe this should be:

	trace2_data_*("process", "procfs/<something>", ...)

(I'm not sure how similar code for Linux and the various BSD
versions would be.)


Jeff
Junio C Hamano May 11, 2021, 9:31 p.m. UTC | #5
Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:

> On Thu, May 06 2021, Emily Shaffer wrote:
>
>> It can be useful to tell who invoked Git - was it invoked manually by a
>> user via CLI or script? By an IDE? Knowing where the Git invocation came
>> from can help with debugging to isolate where the problem came from.
>
> Aside from the portability concerns others have raised, I don't really
> see why you'd need this.
>
> We already have the nest-level as part of the SID, so isn't it
> sufficient (and portable) at the top-level to log what isatty says + set
> the initial SID "root" in the IDE (which presumably knows about git).
>
> Wouldn't this log passwords in cases of e.g.:
>
>     some-script --git-password secret # invokes "git"

Both valid and excellent points, I would think.
Emily Shaffer May 14, 2021, 10:06 p.m. UTC | #6
On Mon, May 10, 2021 at 02:29:15PM +0200, Ævar Arnfjörð Bjarmason wrote:
> 
> 
> On Thu, May 06 2021, Emily Shaffer wrote:
> 
> > It can be useful to tell who invoked Git - was it invoked manually by a
> > user via CLI or script? By an IDE? Knowing where the Git invocation came
> > from can help with debugging to isolate where the problem came from.
> 
> Aside from the portability concerns others have raised, I don't really
> see why you'd need this.
> 
> We already have the nest-level as part of the SID, so isn't it
> sufficient (and portable) at the top-level to log what isatty says + set
> the initial SID "root" in the IDE (which presumably knows about git).

If you already know all the IDEs and scripts which invoke Git, sure, you
could set the SID root - we do this with 'repo' tool today. But actually
we want this because we aren't sure exactly who at Google is invoking
Git in their tooling, how, why, etc. - this logline was supposed to help
with that. Chicken, egg, etc.

Or else I'm misunderstanding your suggestion; to me it sounded like "go
fix your VSCode plugin so that it sets an additional envvar" and I
responded accordingly. If you mean something else I didn't see,
implemented in Git land, then I'm curious to hear more.

> 
> Wouldn't this log passwords in cases of e.g.:
> 
>     some-script --git-password secret # invokes "git"

I have nothing but nasty things to say about scripts which would do
that, but your point is valid enough to make me think this logline
should be gated behind a config and posted much later (when config is
available - I think it's not yet, with the patch as-is).

> 
> In older versions of linux reading e.g. smaps from /proc/self would
> stall the kernel while the read was happening, I haven't checked whether
> cmdline is such a thing (probably not), but it's a subtle thing to have
> in mind for this / follow-ups if it's made portable (is that an issue on
> other OS's?).

That's an interesting point and I wonder how I can validate if it
does/doesn't stall in this way - I guess I can go manpage diving?

 - Emily
Emily Shaffer May 14, 2021, 10:07 p.m. UTC | #7
On Tue, May 11, 2021 at 01:28:39PM -0400, Jeff Hostetler wrote:
> 
> 
> 
> On 5/6/21 8:29 PM, Emily Shaffer wrote:
> > It can be useful to tell who invoked Git - was it invoked manually by a
> > user via CLI or script? By an IDE? Knowing where the Git invocation came
> > from can help with debugging to isolate where the problem came from.
> > 
> > Unfortunately, there's no cross-platform reliable way to gather the name
> > of the parent process. If procfs is present, we can use that; otherwise
> > we will need to discover the name another way. However, the process ID
> > should be sufficient regardless of platform.
> > 
> > Signed-off-by: Emily Shaffer <emilyshaffer@google.com>
> > ---
> > We briefly discussed hiding this behind a config, internally. However, I
> > wanted to include the parent name alongside the cmd_start event, which
> > happens very early (maybe before config gathering?).
> > 
> > Maybe it's better to log the parent_name as its own event, since it
> > shouldn't change over the lifetime of the process?
> > 
> > procfs is very non-portable, though - I think this won't even work on
> > MacOS. So I'm curious if anybody has better suggestions for how to do
> > this.
> > 
> >   - Emily
> 
> 
> Look at `trace2_collect_process_info()` in `trace2.h` and
> `compat/win32/trace2_win32_process_info.c`.
> 
> That function is designed to let the process call out to
> platform-specific code to do things like getting the name
> of the invoking process.
> 
> It is called with an enum to indicate when/why it is being
> called.
> 
> On Windows, I have platform code to get the process ancestry,
> the peak VM usage, and whether the process is being debugged.
> Two of those were easy....  And all are very platform-specific.
> 
> They all generate events of the form:
> 
> 	trace2_data_*("process", "windows/<something>", ...)
> 
> Some of my `t021[012]/` helper scripts exclude "process" category
> messages from the output to make the t*.sh tests portable.
> 
> 
> To implement a /proc lookup as you have suggested, I would
> fixup the Makefile or config.mak.uname to include something
> like a "HAVE_..." feature and then use that at the
> bottom of trace2.h to declare a trace2_collect_process_info()
> function and then have your code in compat/procinfo.c hide
> behind my interface.
> 
> Your code should emits events of the form:
> 
> 	trace2_data_*("process", "linux/<something>", ...)
> 
> I notice there are several `PROCFS_*` symbols currently defined
> in `config.mak.uname` so maybe this should be:
> 
> 	trace2_data_*("process", "procfs/<something>", ...)
> 
> (I'm not sure how similar code for Linux and the various BSD
> versions would be.)

Thanks a bunch for the pointers - this is great.

 - Emily
Junio C Hamano May 16, 2021, 3:48 a.m. UTC | #8
Emily Shaffer <emilyshaffer@google.com> writes:

> On Mon, May 10, 2021 at 02:29:15PM +0200, Ævar Arnfjörð Bjarmason wrote:
>
>> We already have the nest-level as part of the SID, so isn't it
>> sufficient (and portable) at the top-level to log what isatty says + set
>> the initial SID "root" in the IDE (which presumably knows about git).
>
> If you already know all the IDEs and scripts which invoke Git, sure, you
> could set the SID root - we do this with 'repo' tool today. But actually
> we want this because we aren't sure exactly who at Google is invoking
> Git in their tooling, how, why, etc. - this logline was supposed to help
> with that. Chicken, egg, etc.

I agreed with Æver's suggestion exectly because I failed to read the
above motivation from the patch.  If you are trying to find out who
called, then you'd need to do the "find the parent process" dance
when your parent did not give you their SID to append your ident to.

Perhaps it was obvious to the author of the patch, but it was
unclear from the point of view of readers.  Perhaps the first
paragraph of the proposed message wants a bit of rephrasing.  "we
aren't sure exactly ... how, why, etc." part of the above really
helped.

Thanks.
Emily Shaffer May 17, 2021, 8:17 p.m. UTC | #9
On Sun, May 16, 2021 at 12:48:37PM +0900, Junio C Hamano wrote:
> 
> Emily Shaffer <emilyshaffer@google.com> writes:
> 
> > On Mon, May 10, 2021 at 02:29:15PM +0200, Ævar Arnfjörð Bjarmason wrote:
> >
> >> We already have the nest-level as part of the SID, so isn't it
> >> sufficient (and portable) at the top-level to log what isatty says + set
> >> the initial SID "root" in the IDE (which presumably knows about git).
> >
> > If you already know all the IDEs and scripts which invoke Git, sure, you
> > could set the SID root - we do this with 'repo' tool today. But actually
> > we want this because we aren't sure exactly who at Google is invoking
> > Git in their tooling, how, why, etc. - this logline was supposed to help
> > with that. Chicken, egg, etc.
> 
> I agreed with Æver's suggestion exectly because I failed to read the
> above motivation from the patch.  If you are trying to find out who
> called, then you'd need to do the "find the parent process" dance
> when your parent did not give you their SID to append your ident to.
> 
> Perhaps it was obvious to the author of the patch, but it was
> unclear from the point of view of readers.  Perhaps the first
> paragraph of the proposed message wants a bit of rephrasing.  "we
> aren't sure exactly ... how, why, etc." part of the above really
> helped.
> 
> Thanks.

Thanks for the feedback. I was vacationing last week but should be
sending a rework today or tomorrow, will see if I can shore up the
commit-msg as well as the portability stuff Jeff H raised.

 - Emily
diff mbox series

Patch

diff --git a/Makefile b/Makefile
index 93664d6714..19f5189c6f 100644
--- a/Makefile
+++ b/Makefile
@@ -855,6 +855,7 @@  LIB_OBJS += commit-graph.o
 LIB_OBJS += commit-reach.o
 LIB_OBJS += commit.o
 LIB_OBJS += compat/obstack.o
+LIB_OBJS += compat/procinfo.o
 LIB_OBJS += compat/terminal.o
 LIB_OBJS += config.o
 LIB_OBJS += connect.o
diff --git a/compat/procinfo.c b/compat/procinfo.c
new file mode 100644
index 0000000000..7f4c8dd284
--- /dev/null
+++ b/compat/procinfo.c
@@ -0,0 +1,19 @@ 
+#include "git-compat-util.h"
+
+#include "strbuf.h"
+
+char *get_process_name(int pid)
+{
+	struct strbuf procfs_path = STRBUF_INIT;
+	struct strbuf out = STRBUF_INIT;
+	/* try to use procfs if it's present. */
+	strbuf_addf(&procfs_path, "/proc/%d/cmdline", pid);
+	if (strbuf_read_file(&out, procfs_path.buf, 0) > 0)
+	{
+		strbuf_release(&procfs_path);
+		return strbuf_detach(&out, NULL);
+	}
+
+	/* NEEDSWORK: add non-procfs implementations here. */
+	return NULL;
+}
diff --git a/git-compat-util.h b/git-compat-util.h
index a508dbe5a3..cc7d5d8a2a 100644
--- a/git-compat-util.h
+++ b/git-compat-util.h
@@ -1382,4 +1382,10 @@  static inline void *container_of_or_null_offset(void *ptr, size_t offset)
 
 void sleep_millisec(int millisec);
 
+/*
+ * Convert PID to process name (as would show in top/task manager). Returns
+ * NULL if unimplemented - be sure to check for NULL at callsite.
+ */
+char *get_process_name(int pid);
+
 #endif
diff --git a/t/t0212-trace2-event.sh b/t/t0212-trace2-event.sh
index 1529155cf0..3a2a8a5b5f 100755
--- a/t/t0212-trace2-event.sh
+++ b/t/t0212-trace2-event.sh
@@ -61,6 +61,7 @@  test_expect_success JSON_PP 'event stream, error event' '
 	|    "exit_code":0,
 	|    "hierarchy":"trace2",
 	|    "name":"trace2",
+	|    "parent_name":"/bin/sh",
 	|    "version":"$V"
 	|  }
 	|};
@@ -115,6 +116,7 @@  test_expect_success JSON_PP 'event stream, return code 0' '
 	|    "exit_code":0,
 	|    "hierarchy":"trace2",
 	|    "name":"trace2",
+	|    "parent_name":"/bin/sh",
 	|    "version":"$V"
 	|  },
 	|  "_SID0_/_SID1_":{
@@ -143,6 +145,7 @@  test_expect_success JSON_PP 'event stream, return code 0' '
 	|    "exit_code":0,
 	|    "hierarchy":"trace2/trace2",
 	|    "name":"trace2",
+	|    "parent_name":"test-tool",
 	|    "version":"$V"
 	|  },
 	|  "_SID0_/_SID1_/_SID2_":{
@@ -155,6 +158,7 @@  test_expect_success JSON_PP 'event stream, return code 0' '
 	|    "exit_code":0,
 	|    "hierarchy":"trace2/trace2/trace2",
 	|    "name":"trace2",
+	|    "parent_name":"$TEST_DIRECTORY/../t/helper//test-tool",
 	|    "version":"$V"
 	|  }
 	|};
@@ -192,6 +196,7 @@  test_expect_success JSON_PP 'event stream, list config' '
 	|        "value":"hello world"
 	|      }
 	|    ],
+	|    "parent_name":"/bin/sh",
 	|    "version":"$V"
 	|  }
 	|};
@@ -229,6 +234,7 @@  test_expect_success JSON_PP 'event stream, list env vars' '
 	|        "value":"hello world"
 	|      }
 	|    ],
+	|    "parent_name":"/bin/sh",
 	|    "version":"$V"
 	|  }
 	|};
@@ -263,6 +269,7 @@  test_expect_success JSON_PP 'basic trace2_data' '
 	|    "exit_code":0,
 	|    "hierarchy":"trace2",
 	|    "name":"trace2",
+	|    "parent_name":"/bin/sh",
 	|    "version":"$V"
 	|  }
 	|};
@@ -295,6 +302,7 @@  test_expect_success JSON_PP 'using global config, event stream, error event' '
 	|    "exit_code":0,
 	|    "hierarchy":"trace2",
 	|    "name":"trace2",
+	|    "parent_name":"/bin/sh",
 	|    "version":"$V"
 	|  }
 	|};
diff --git a/t/t0212/parse_events.perl b/t/t0212/parse_events.perl
index 6584bb5634..dd8b1be844 100644
--- a/t/t0212/parse_events.perl
+++ b/t/t0212/parse_events.perl
@@ -99,6 +99,7 @@ 
     }
 
     elsif ($event eq 'start') {
+	$processes->{$sid}->{'parent_name'} = $line->{'parent_name'};
 	$processes->{$sid}->{'argv'} = $line->{'argv'};
 	$processes->{$sid}->{'argv'}[0] = "_EXE_";
     }
diff --git a/trace2/tr2_tgt_event.c b/trace2/tr2_tgt_event.c
index 6353e8ad91..d258d5807c 100644
--- a/trace2/tr2_tgt_event.c
+++ b/trace2/tr2_tgt_event.c
@@ -145,10 +145,12 @@  static void fn_start_fl(const char *file, int line,
 	const char *event_name = "start";
 	struct json_writer jw = JSON_WRITER_INIT;
 	double t_abs = (double)us_elapsed_absolute / 1000000.0;
+	char *parent_name = get_process_name(getppid());
 
 	jw_object_begin(&jw, 0);
 	event_fmt_prepare(event_name, file, line, NULL, &jw);
 	jw_object_double(&jw, "t_abs", 6, t_abs);
+	jw_object_string(&jw, "parent_name", parent_name ? parent_name : NULL );
 	jw_object_inline_begin_array(&jw, "argv");
 	jw_array_argv(&jw, argv);
 	jw_end(&jw);
@@ -156,6 +158,7 @@  static void fn_start_fl(const char *file, int line,
 
 	tr2_dst_write_line(&tr2dst_event, &jw.json);
 	jw_release(&jw);
+	free(parent_name);
 }
 
 static void fn_exit_fl(const char *file, int line, uint64_t us_elapsed_absolute,