diff mbox series

[6/6] tr2: log N parent process names on Linux

Message ID patch-6.6-da003330800-20210825T231400Z-avarab@gmail.com (mailing list archive)
State Superseded
Headers show
Series tr2: plug memory leaks + logic errors + Win32 & Linux feature parity | expand

Commit Message

Ævar Arnfjörð Bjarmason Aug. 25, 2021, 11:19 p.m. UTC
In 2f732bf15e6 (tr2: log parent process name, 2021-07-21) we started
logging parent process names, but only logged all parents on Windows.
on Linux only the name of the immediate parent process was logged.

Extend the functionality added there to also log full parent chain on
Linux. In 2f732bf15e6 it was claimed that "further ancestry info can
be gathered with procfs, but it's unwieldy to do so.".

I don't know what the author meant by that, but I think it probably
referred to needing to slurp this up from the FS, as opposed to having
an API. The underlying semantics on Linux are easier to deal with than
on Windows though, at least as far as finding the parent PIDs
goes. See the get_processes() function used on Windows. As shown in
353d3d77f4f (trace2: collect Windows-specific process information,
2019-02-22) it needs to deal with cycles.

What is more complex on Linux is getting at the process name, a
simpler approach is to use fscanf(), see [1] for an implementation of
that, but as noted in the comment being added here it would fail in
the face of some weird process names, so we need our own
parse_proc_stat() to parse it out.

With this patch the "ancestry" chain for a trace2 event might look
like this:

    $ GIT_TRACE2_EVENT=/dev/stdout ~/g/git/git version | grep ancestry | jq -r .ancestry
    [
      "bash",
      "screen",
      "systemd"
    ]

And in the case of naughty process names. This uses perl's ability to
use prctl(PR_SET_NAME, ...). See Perl/perl5@7636ea95c5 (Set the legacy
process name with prctl() on assignment to $0 on Linux, 2010-04-15)[2]:

    $ perl -e '$0 = "(naughty\nname)"; system "GIT_TRACE2_EVENT=/dev/stdout ~/g/git/git version"' | grep ancestry | jq -r .ancestry
    [
      "sh",
      "(naughty\nname)",
      "bash",
      "screen",
      "systemd"
    ]

1. https://lore.kernel.org/git/87o8agp29o.fsf@evledraar.gmail.com/
2. https://github.com/Perl/perl5/commit/7636ea95c57762930accf4358f7c0c2dec086b5e

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 compat/linux/procinfo.c | 134 ++++++++++++++++++++++++++++++++++------
 1 file changed, 116 insertions(+), 18 deletions(-)

Comments

Eric Sunshine Aug. 25, 2021, 11:49 p.m. UTC | #1
On Wed, Aug 25, 2021 at 7:20 PM Ævar Arnfjörð Bjarmason
<avarab@gmail.com> wrote:
> [...]
> Extend the functionality added there to also log full parent chain on
> Linux. In 2f732bf15e6 it was claimed that "further ancestry info can
> be gathered with procfs, but it's unwieldy to do so.".
> [...]
> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
> ---
> diff --git a/compat/linux/procinfo.c b/compat/linux/procinfo.c
> @@ -4,27 +4,129 @@
> +/*
> + * We need more complex parsing instat_parent_pid() and

s/instat_parent_pid/in stat_parent_pid/

> + * parse_proc_stat() below than a dumb fscanf(). That's because while
> + * the statcomm field is surrounded by parentheses, the process itself
> + * is free to insert any arbitrary byte sequence its its name. That
> + * can include newlines, spaces, closing parentheses etc. See
> + * do_task_stat() in fs/proc/array.c in linux.git, this is in contrast
> + * with the escaped version of the name found in /proc/%d/status.
> + *
> + * So instead of using fscanf() we'll read N bytes from it, look for
> + * the first "(", and then the last ")", anything in-between is our
> + * process name.
> + *
> + * How much N do we need? On Linux /proc/sys/kernel/pid_max is 2^15 by
> + * default, but it can be raised set to values of up to 2^22. So
> + * that's 7 digits for a PID. We have 2 PIDs in the first four fields
> + * we're interested in, so 2 * 7 = 14.
> + *
> + * We then have 4 spaces between those four values, which brings us up
> + * to 18. Add the two parentheses and it's 20. The "state" is then one
> + * character (now at 21).
> + *
> + * Finally the maximum length of the "comm" name itself is 15
> + * characters, e.g. a setting of "123456789abcdefg" will be truncated
> + * to "123456789abcdef". See PR_SET_NAME in prctl(2). So all in all
> + * we'd need to read 21 + 15 = 36 bytes.
> + *
> + * Let's just read 2^6 (64) instead for good measure. If PID_MAX ever
> + * grows past 2^22 we'll be future-proof. We'll then anchor at the
> + * last ")" we find to locate the parent PID.
> + */
Taylor Blau Aug. 26, 2021, 4:07 a.m. UTC | #2
On Thu, Aug 26, 2021 at 01:19:24AM +0200, Ævar Arnfjörð Bjarmason wrote:
> In 2f732bf15e6 (tr2: log parent process name, 2021-07-21) we started
> logging parent process names, but only logged all parents on Windows.
> on Linux only the name of the immediate parent process was logged.
>
> Extend the functionality added there to also log full parent chain on
> Linux. In 2f732bf15e6 it was claimed that "further ancestry info can
> be gathered with procfs, but it's unwieldy to do so.".
>
> I don't know what the author meant by that, but I think it probably
> referred to needing to slurp this up from the FS, as opposed to having
> an API.

I don't think that this (specifically, "I don't know what the author
meant by that") is necessary information to include in a patch message.

If you're looking for a replacement (and you may not be, but just my
$.02) I would suggest:

    "2f732bf15e6 does not log the full parent chain on Linux; implement
    that functionality here."

> What is more complex on Linux is getting at the process name, a
> simpler approach is to use fscanf(), see [1] for an implementation of
> that, but as noted in the comment being added here it would fail in
> the face of some weird process names, so we need our own
> parse_proc_stat() to parse it out.

This is helpful information to have for readers that aren't familiar
with procfs (especially the detail about why the naive approach doesn't
work).

> diff --git a/compat/linux/procinfo.c b/compat/linux/procinfo.c
> index 46a751c9a1d..937084126a6 100644
> --- a/compat/linux/procinfo.c
> +++ b/compat/linux/procinfo.c
> @@ -4,27 +4,129 @@
>  #include "strvec.h"
>  #include "trace2.h"
>
> -static void get_ancestry_names(struct strvec *names)
> +/*
> + * We need more complex parsing instat_parent_pid() and
> + * parse_proc_stat() below than a dumb fscanf(). That's because while
> + * the statcomm field is surrounded by parentheses, the process itself
> + * is free to insert any arbitrary byte sequence its its name. That
> + * can include newlines, spaces, closing parentheses etc. See
> + * do_task_stat() in fs/proc/array.c in linux.git, this is in contrast
> + * with the escaped version of the name found in /proc/%d/status.
> + *
> + * So instead of using fscanf() we'll read N bytes from it, look for
> + * the first "(", and then the last ")", anything in-between is our
> + * process name.
> + *
> + * How much N do we need? On Linux /proc/sys/kernel/pid_max is 2^15 by
> + * default, but it can be raised set to values of up to 2^22. So
> + * that's 7 digits for a PID. We have 2 PIDs in the first four fields
> + * we're interested in, so 2 * 7 = 14.
> + *
> + * We then have 4 spaces between those four values, which brings us up
> + * to 18. Add the two parentheses and it's 20. The "state" is then one
> + * character (now at 21).

Hmm, aren't there three spaces, not four?

> + * Finally the maximum length of the "comm" name itself is 15
> + * characters, e.g. a setting of "123456789abcdefg" will be truncated
> + * to "123456789abcdef". See PR_SET_NAME in prctl(2). So all in all
> + * we'd need to read 21 + 15 = 36 bytes.

Ah, 36 is the right number even though you and I counted a different
number of spaces, since the name is truncated when it goes over *16*
characters, but that includes the NUL byte. So we both arrive at the
same number in the end ;).

But I agree it's safer to just read more (but not too much more) than
what we need.

> + * Let's just read 2^6 (64) instead for good measure. If PID_MAX ever
> + * grows past 2^22 we'll be future-proof. We'll then anchor at the
> + * last ")" we find to locate the parent PID.
> + */
> +#define STAT_PARENT_PID_READ_N 64
> +
> +static int parse_proc_stat(struct strbuf *sb, struct strbuf *name,
> +			    int *statppid)

Going to think aloud to make sure that this parsing looks right.

>  {
> +	const char *lhs = strchr(sb->buf, '(');
> +	const char *rhs = strrchr(sb->buf, ')');

lhs and rhs are going to be on either side of the comm field (which may
be helpful to indicate by calling these comm_lhs and comm_rhs). And
strrchr makes sure to handle process names that have a ')' in them.
Looks right.

> +	const char *ppid_lhs, *ppid_rhs;
> +	char *p;
> +	pid_t ppid;
> +
> +	if (!lhs || !rhs)
> +		goto bad_kernel;
> +

OK.

>  	/*
> -	 * NEEDSWORK: We could gather the entire pstree into an array to match
> -	 * functionality with compat/win32/trace2_win32_process_info.c.
> -	 * To do so, we may want to examine /proc/<pid>/stat. For now, just
> -	 * gather the immediate parent name which is readily accessible from
> -	 * /proc/$(getppid())/comm.
> +	 * We're at the ")", that's followed by " X ", where X is a
> +	 * single "state" character. So advance by 4 bytes.
>  	 */
> +	ppid_lhs = rhs + 4;
> +
> +	ppid_rhs = strchr(ppid_lhs, ' ');

Skipping over the state field gives us the first character of ppid as
you say, good. And reading until the first space character will point us
right after the end, good.

> +	if (!ppid_rhs)
> +		goto bad_kernel;
> +
> +	ppid = strtol(ppid_lhs, &p, 10);
> +	if (ppid_rhs == p) {

Then parse the ppid and make sure we stopped at the right-hand side
where we should have. Good.

> +		const char *comm = lhs + 1;

Skipping past the '(', but now I feel like we really should
s/lhs/comm_&/.

> +		int commlen = rhs - lhs - 1;

This is right, but you could simplify the expression to be "rhs - comm",
since you just took into account the left-hand parenthesis in the
previous line. Also recommend a size_t here: it's obvious we're not
going to overflow int here, but it saves future readers of having to
wonder the same thing.

> +
> +		strbuf_addf(name, "%.*s", commlen, comm);
> +		*statppid = ppid;
> +
> +		return 0;
> +	}
> +
> +bad_kernel:
> +	/*
> +	 * We were able to read our STAT_PARENT_PID_READ_N bytes from
> +	 * /proc/%d/stat, but the content is bad. Broken kernel?
> +	 * Should not happen, but handle it gracefully.
> +	 */
> +	return -1;
> +}

Phew, all seems good. Thanks for bearing with me while I read through
all of that ;).

> +static int stat_parent_pid(pid_t pid, struct strbuf *name, int *statppid)
> +{
>  	struct strbuf procfs_path = STRBUF_INIT;
> -	struct strbuf name = STRBUF_INIT;
> +	struct strbuf sb = STRBUF_INIT;
> +	size_t n;
> +	FILE *fp = NULL;

fopen() will return NULL, and you call it unconditionally, so no need to
initialize here.

> +	int ret = -1;
>
>  	/* try to use procfs if it's present. */
> -	strbuf_addf(&procfs_path, "/proc/%d/comm", getppid());
> -	if (strbuf_read_file(&name, procfs_path.buf, 0) > 0) {
> -		strbuf_trim_trailing_newline(&name);
> -		strvec_push(names, name.buf);
> -		strbuf_release(&name);
> -	}
> +	strbuf_addf(&procfs_path, "/proc/%d/stat", pid);
> +	fp = fopen(procfs_path.buf, "r");
> +	if (!fp)
> +		goto cleanup;
> +
> +	n = strbuf_fread(&sb, STAT_PARENT_PID_READ_N, fp);
> +	if (n != STAT_PARENT_PID_READ_N)
> +		goto cleanup;

Hmm. Wouldn't we always goto cleanup here, since STAT_PARENT_PID_READ_N
is deliberately oversized (and not constant anyways, since process ids
could be anywhere from 1-7 digits long)?

I think we could probably drop 'n' entirely here, and instead:

    if (strbuf_fread(...) < 0)
      goto cleanup;

> +	if (parse_proc_stat(&sb, name, statppid) < 0)
> +		goto cleanup;
>
> +	ret = 0;
> +cleanup:
> +	if (fp)
> +		fclose(fp);
>  	strbuf_release(&procfs_path);
> +	strbuf_release(&sb);
> +
> +	return ret;
> +}
> +
> +static void push_ancestry_name(struct strvec *names, pid_t pid)
> +{
> +	struct strbuf name = STRBUF_INIT;
> +	int ppid;
> +
> +	if (stat_parent_pid(pid, &name, &ppid) < 0)
> +		goto cleanup;
> +
> +	strvec_push(names, name.buf);
> +
> +	/*
> +	 * Both errors and reaching the end of the process chain are
> +	 * reported as fields of 0 by proc(5)
> +	 */
> +	if (ppid)
> +		push_ancestry_name(names, ppid);
> +cleanup:
> +	strbuf_release(&name);
>  	return;
>  }

The rest looks good to me, but it looks like you overwrote all of the
work that you did in patch 4/6. I guess separating them out makes sense
if this patch wasn't taken, but I probably would have gone right to this
patch instead of fixing leaks that you were going to get rid of anyway.

Thanks,
Taylor
Ævar Arnfjörð Bjarmason Aug. 26, 2021, 12:24 p.m. UTC | #3
On Thu, Aug 26 2021, Taylor Blau wrote:

> On Thu, Aug 26, 2021 at 01:19:24AM +0200, Ævar Arnfjörð Bjarmason wrote:
>> In 2f732bf15e6 (tr2: log parent process name, 2021-07-21) we started
>> logging parent process names, but only logged all parents on Windows.
>> on Linux only the name of the immediate parent process was logged.
>>
>> Extend the functionality added there to also log full parent chain on
>> Linux. In 2f732bf15e6 it was claimed that "further ancestry info can
>> be gathered with procfs, but it's unwieldy to do so.".
>>
>> I don't know what the author meant by that, but I think it probably
>> referred to needing to slurp this up from the FS, as opposed to having
>> an API.
>
> I don't think that this (specifically, "I don't know what the author
> meant by that") is necessary information to include in a patch message.
>
> If you're looking for a replacement (and you may not be, but just my
> $.02) I would suggest:
>
>     "2f732bf15e6 does not log the full parent chain on Linux; implement
>     that functionality here."

I hope Taylor doesn't mind me quoting it, but he sent me this follow-up
off-list:

    For what it's worth, I really struggled to write this. What I was trying
    to say was something along the lines of "let's give Emily a little more
    credit for not doing this right off the bat", but I didn't want to write
    that exactly, since I don't think it was your original intention.
    
    At the very least, saying something to the effect of "I don't know what
    the original author thought was so tough, here's a patch to implement
    it" isn't helping anybody, so I tried to focus on that.
    
    Anyway, just writing to you off-list to say that I do think you had good
    intentions, but that what you wrote may be read differently by others in
    a way that you didn't intend it to be.

First, thanks to Taylor for pointing this out, and second I'd like to
apologize for that comment.

More than some "sorry someone read it that way", this really does read
even to me, its author, shortly after having written it like something
that's way more on the side of sneakiness than a charitable comment.

I.e. like some snipe-y paraphrasing of "maybe they found this problem
too complex, but look how easy it is!" than something charitable that's
meant to barely pass under the radar.

Hopefully it helps that I'm honestly just being a bit of an
inconsiderate idiot here than actively malicious.

What I meant to accomplish here was to guide a reader of these patches
through the same mental states I went through when reading the original
patch.

I.e. I took it from its description that there were some unstated
special-cases in reading procfs that made it harder to deal with than
not. I think those comments were rather just a way to say that scraping
procfs was a bit of a pain, and the MVP in 2f732bf15e6 was good enough
for now, which is fair enough.

I rephrased those comments in v2 of this patch[1]. The summary starting
at the 4th paragraph ("It's possible given the semantics[...]") is an
edge case I hadn't considered when I wrote v1. I in turn copied that
"unweildy" comment from an even older version[2], where the difficulties
of parsing the "comm" field out of "/proc/*/stat" weren't apparent to
me.

I.e. I think if I had to describe that interface now I would describe it
as unwieldy, but wouldn't when I wrote v1[1] or that v0[2]. I.e. I think
there's no convenient way to get full atomic snapshot of the process
tree, so "give me the names of all my parents as an array" is definitely
somewhere between brittle and unwieldy, in particular considering the
hassle of parsing "/proc/*/stat" properly.

But I digress, which is also a problem I have with being overly verbose
sometimes and weaving enough rope to hang myself with.

The point isn't the difficulties of the procfs(5) interface, but that
particularly with a medium like the text-based communication we mainly
use in this project it behooves the sender to think not only about what
they're saying, but how what they're saying is going to be interpreted
by the recipient and bystanders alike.

I think this is a case where I clearly failed at that, sorry Emily, and
thanks Taylor for pointing this out to me.

1. https://lore.kernel.org/git/cover-v2-0.6-00000000000-20210826T121820Z-avarab@gmail.com/
2. https://lore.kernel.org/git/87o8agp29o.fsf@evledraar.gmail.com/
diff mbox series

Patch

diff --git a/compat/linux/procinfo.c b/compat/linux/procinfo.c
index 46a751c9a1d..937084126a6 100644
--- a/compat/linux/procinfo.c
+++ b/compat/linux/procinfo.c
@@ -4,27 +4,129 @@ 
 #include "strvec.h"
 #include "trace2.h"
 
-static void get_ancestry_names(struct strvec *names)
+/*
+ * We need more complex parsing instat_parent_pid() and
+ * parse_proc_stat() below than a dumb fscanf(). That's because while
+ * the statcomm field is surrounded by parentheses, the process itself
+ * is free to insert any arbitrary byte sequence its its name. That
+ * can include newlines, spaces, closing parentheses etc. See
+ * do_task_stat() in fs/proc/array.c in linux.git, this is in contrast
+ * with the escaped version of the name found in /proc/%d/status.
+ *
+ * So instead of using fscanf() we'll read N bytes from it, look for
+ * the first "(", and then the last ")", anything in-between is our
+ * process name.
+ *
+ * How much N do we need? On Linux /proc/sys/kernel/pid_max is 2^15 by
+ * default, but it can be raised set to values of up to 2^22. So
+ * that's 7 digits for a PID. We have 2 PIDs in the first four fields
+ * we're interested in, so 2 * 7 = 14.
+ *
+ * We then have 4 spaces between those four values, which brings us up
+ * to 18. Add the two parentheses and it's 20. The "state" is then one
+ * character (now at 21).
+ *
+ * Finally the maximum length of the "comm" name itself is 15
+ * characters, e.g. a setting of "123456789abcdefg" will be truncated
+ * to "123456789abcdef". See PR_SET_NAME in prctl(2). So all in all
+ * we'd need to read 21 + 15 = 36 bytes.
+ *
+ * Let's just read 2^6 (64) instead for good measure. If PID_MAX ever
+ * grows past 2^22 we'll be future-proof. We'll then anchor at the
+ * last ")" we find to locate the parent PID.
+ */
+#define STAT_PARENT_PID_READ_N 64
+
+static int parse_proc_stat(struct strbuf *sb, struct strbuf *name,
+			    int *statppid)
 {
+	const char *lhs = strchr(sb->buf, '(');
+	const char *rhs = strrchr(sb->buf, ')');
+	const char *ppid_lhs, *ppid_rhs;
+	char *p;
+	pid_t ppid;
+
+	if (!lhs || !rhs)
+		goto bad_kernel;
+
 	/*
-	 * NEEDSWORK: We could gather the entire pstree into an array to match
-	 * functionality with compat/win32/trace2_win32_process_info.c.
-	 * To do so, we may want to examine /proc/<pid>/stat. For now, just
-	 * gather the immediate parent name which is readily accessible from
-	 * /proc/$(getppid())/comm.
+	 * We're at the ")", that's followed by " X ", where X is a
+	 * single "state" character. So advance by 4 bytes.
 	 */
+	ppid_lhs = rhs + 4;
+
+	ppid_rhs = strchr(ppid_lhs, ' ');
+	if (!ppid_rhs)
+		goto bad_kernel;
+
+	ppid = strtol(ppid_lhs, &p, 10);
+	if (ppid_rhs == p) {
+		const char *comm = lhs + 1;
+		int commlen = rhs - lhs - 1;
+
+		strbuf_addf(name, "%.*s", commlen, comm);
+		*statppid = ppid;
+
+		return 0;
+	}
+
+bad_kernel:
+	/*
+	 * We were able to read our STAT_PARENT_PID_READ_N bytes from
+	 * /proc/%d/stat, but the content is bad. Broken kernel?
+	 * Should not happen, but handle it gracefully.
+	 */
+	return -1;
+}
+
+static int stat_parent_pid(pid_t pid, struct strbuf *name, int *statppid)
+{
 	struct strbuf procfs_path = STRBUF_INIT;
-	struct strbuf name = STRBUF_INIT;
+	struct strbuf sb = STRBUF_INIT;
+	size_t n;
+	FILE *fp = NULL;
+	int ret = -1;
 
 	/* try to use procfs if it's present. */
-	strbuf_addf(&procfs_path, "/proc/%d/comm", getppid());
-	if (strbuf_read_file(&name, procfs_path.buf, 0) > 0) {
-		strbuf_trim_trailing_newline(&name);
-		strvec_push(names, name.buf);
-		strbuf_release(&name);
-	}
+	strbuf_addf(&procfs_path, "/proc/%d/stat", pid);
+	fp = fopen(procfs_path.buf, "r");
+	if (!fp)
+		goto cleanup;
+
+	n = strbuf_fread(&sb, STAT_PARENT_PID_READ_N, fp);
+	if (n != STAT_PARENT_PID_READ_N)
+		goto cleanup;
+	if (parse_proc_stat(&sb, name, statppid) < 0)
+		goto cleanup;
 
+	ret = 0;
+cleanup:
+	if (fp)
+		fclose(fp);
 	strbuf_release(&procfs_path);
+	strbuf_release(&sb);
+
+	return ret;
+}
+
+static void push_ancestry_name(struct strvec *names, pid_t pid)
+{
+	struct strbuf name = STRBUF_INIT;
+	int ppid;
+
+	if (stat_parent_pid(pid, &name, &ppid) < 0)
+		goto cleanup;
+
+	strvec_push(names, name.buf);
+
+	/*
+	 * Both errors and reaching the end of the process chain are
+	 * reported as fields of 0 by proc(5)
+	 */
+	if (ppid)
+		push_ancestry_name(names, ppid);
+cleanup:
+	strbuf_release(&name);
 	return;
 }
 
@@ -44,11 +146,7 @@  void trace2_collect_process_info(enum trace2_process_info_reason reason)
 		 */
 		break;
 	case TRACE2_PROCESS_INFO_STARTUP:
-		/*
-		 * NEEDSWORK: we could do the entire ptree in an array instead,
-		 * see compat/win32/trace2_win32_process_info.c.
-		 */
-		get_ancestry_names(&names);
+		push_ancestry_name(&names, getppid());
 
 		if (names.nr)
 			trace2_cmd_ancestry(names.v);