diff mbox series

selftests: proc: Make sure wchan works when it exists

Message ID 20211008235504.2957528-1-keescook@chromium.org (mailing list archive)
State Handled Elsewhere
Headers show
Series selftests: proc: Make sure wchan works when it exists | expand

Commit Message

Kees Cook Oct. 8, 2021, 11:55 p.m. UTC
This makes sure that wchan contains a sensible symbol when a process is
blocked. Specifically this calls the sleep() syscall, and expects the
architecture to have called schedule() from a function that has "sleep"
somewhere in its name. For example, on the architectures I tested
(x86_64, arm64, arm, mips, and powerpc) this is "hrtimer_nanosleep":

$ tools/testing/selftests/proc/proc-pid-wchan
ok: found 'sleep' in wchan 'hrtimer_nanosleep'

Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: linux-kselftest@vger.kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
---
Hi Peter,

Can you add this to the wchan series, please? This should help wchan from
regressing in the future, and allow us to notice if the depth accidentally
changes, like Mark saw.
---
 tools/testing/selftests/proc/Makefile         |  1 +
 tools/testing/selftests/proc/proc-pid-wchan.c | 69 +++++++++++++++++++
 2 files changed, 70 insertions(+)
 create mode 100644 tools/testing/selftests/proc/proc-pid-wchan.c

Comments

Alexey Dobriyan Oct. 9, 2021, 12:52 p.m. UTC | #1
On Fri, Oct 08, 2021 at 04:55:04PM -0700, Kees Cook wrote:
> This makes sure that wchan contains a sensible symbol when a process is
> blocked.

> Specifically this calls the sleep() syscall, and expects the
> architecture to have called schedule() from a function that has "sleep"
> somewhere in its name.

This exposes internal kernel symbol to userspace.
Why would want to test that?

Doing s/sleep/SLEEP/g doesn't change kernel but now the test is broken.

> For example, on the architectures I tested
> (x86_64, arm64, arm, mips, and powerpc) this is "hrtimer_nanosleep":

> +/*
> + * Make sure that wchan returns a reasonable symbol when blocked.
> + */

Test should be "contains C identifier" then?

> +int main(void)
> +{
> +	char buf[64];
> +	pid_t child;
> +	int sync[2], fd;
> +
> +	if (pipe(sync) < 0)
> +		perror_exit("pipe");
> +
> +	child = fork();
> +	if (child < 0)
> +		perror_exit("fork");
> +	if (child == 0) {
> +		/* Child */
> +		if (close(sync[0]) < 0)
> +			perror_exit("child close sync[0]");
> +		if (close(sync[1]) < 0)
> +			perror_exit("child close sync[1]");

Redundant close().

> +		sleep(10);
> +		_exit(0);
> +	}
> +	/* Parent */
> +	if (close(sync[1]) < 0)
> +		perror_exit("parent close sync[1]");

Redundant close().

> +	if (read(sync[0], buf, 1) != 0)
> +		perror_exit("parent read sync[0]");

Racy if child is scheduled out after first close in the child.

> +	snprintf(buf, sizeof(buf), "/proc/%d/wchan", child);
> +	fd = open(buf, O_RDONLY);
> +	if (fd < 0) {
> +		if (errno == ENOENT)
> +			return 4;
> +		perror_exit(buf);
> +	}
> +
> +	memset(buf, 0, sizeof(buf));
> +	if (read(fd, buf, sizeof(buf) - 1) < 1)
> +		perror_exit(buf);
> +	if (strstr(buf, "sleep") == NULL) {
> +		fprintf(stderr, "FAIL: did not find 'sleep' in wchan '%s'\n", buf);
> +		return 1;
> +	}
> +	printf("ok: found 'sleep' in wchan '%s'\n", buf);
> +
> +	if (kill(child, SIGKILL) < 0)
> +		perror_exit("kill");
> +	if (waitpid(child, NULL, 0) != child) {
> +		fprintf(stderr, "waitpid: got the wrong child!?\n");
> +		return 1;
> +	}
> +
> +	return 0;
> +}
Kees Cook Oct. 11, 2021, 5:42 p.m. UTC | #2
On Sat, Oct 09, 2021 at 03:52:02PM +0300, Alexey Dobriyan wrote:
> On Fri, Oct 08, 2021 at 04:55:04PM -0700, Kees Cook wrote:
> > This makes sure that wchan contains a sensible symbol when a process is
> > blocked.
> 
> > Specifically this calls the sleep() syscall, and expects the
> > architecture to have called schedule() from a function that has "sleep"
> > somewhere in its name.
> 
> This exposes internal kernel symbol to userspace.

Correct; we're verifying the results of the wchan output, which produces
a kernel symbol for blocked processes.

> Why would want to test that?

This is part of a larger series refactoring/fixing wchan[1], and we've
now tripped over several different failure conditions, so I want to make
sure this doesn't regress in the future.

> Doing s/sleep/SLEEP/g doesn't change kernel but now the test is broken.

Yes; the test would be doing it's job, as that would mean there was a
userspace visible change to wchan, so we'd want to catch it and either
fix the kernel or update the test to reflect the new reality.

> 
> > For example, on the architectures I tested
> > (x86_64, arm64, arm, mips, and powerpc) this is "hrtimer_nanosleep":
> 
> > +/*
> > + * Make sure that wchan returns a reasonable symbol when blocked.
> > + */
> 
> Test should be "contains C identifier" then?

Nope, this was intentional. Expanding to a C identifier won't catch the
"we unwound the stack to the wrong depth and now all wchan shows is
'__switch_to'" bug[2]. We're specifically checking that wchan is doing
at least the right thing for the most common blocking state.

> 
> > +int main(void)
> > +{
> > +	char buf[64];
> > +	pid_t child;
> > +	int sync[2], fd;
> > +
> > +	if (pipe(sync) < 0)
> > +		perror_exit("pipe");
> > +
> > +	child = fork();
> > +	if (child < 0)
> > +		perror_exit("fork");
> > +	if (child == 0) {
> > +		/* Child */
> > +		if (close(sync[0]) < 0)
> > +			perror_exit("child close sync[0]");
> > +		if (close(sync[1]) < 0)
> > +			perror_exit("child close sync[1]");
> 
> Redundant close().

Hmm, did you maybe miss the differing array indexes? This closes the
reading end followed by the writing end of the child's pipe.

> 
> > +		sleep(10);
> > +		_exit(0);
> > +	}
> > +	/* Parent */
> > +	if (close(sync[1]) < 0)
> > +		perror_exit("parent close sync[1]");
> 
> Redundant close().

It's not, though. This closes the write side of the parent's pipe.

> 
> > +	if (read(sync[0], buf, 1) != 0)
> > +		perror_exit("parent read sync[0]");
> 
> Racy if child is scheduled out after first close in the child.

No, the first close will close the child's read-side of the pipe, which
isn't being used. For example, see[3].

The parent's read of /proc/$child/wchan could technically race if the
child is scheduled out after the second close() and before the sleep(),
but the parent is doing at least 2 syscalls before then. I'm open to
a more exact synchronization method, but this should be sufficient.
(e.g. Using ptrace to catch sleep syscall entry seemed like overkill.)

-Kees

[1] https://lore.kernel.org/lkml/20211008111527.438276127@infradead.org/
[2] https://lore.kernel.org/lkml/20211008124052.GA976@C02TD0UTHF1T.local/
[3] https://man7.org/tlpi/code/online/diff/pipes/pipe_sync.c.html

> 
> > +	snprintf(buf, sizeof(buf), "/proc/%d/wchan", child);
> > +	fd = open(buf, O_RDONLY);
> > +	if (fd < 0) {
> > +		if (errno == ENOENT)
> > +			return 4;
> > +		perror_exit(buf);
> > +	}
> > +
> > +	memset(buf, 0, sizeof(buf));
> > +	if (read(fd, buf, sizeof(buf) - 1) < 1)
> > +		perror_exit(buf);
> > +	if (strstr(buf, "sleep") == NULL) {
> > +		fprintf(stderr, "FAIL: did not find 'sleep' in wchan '%s'\n", buf);
> > +		return 1;
> > +	}
> > +	printf("ok: found 'sleep' in wchan '%s'\n", buf);
> > +
> > +	if (kill(child, SIGKILL) < 0)
> > +		perror_exit("kill");
> > +	if (waitpid(child, NULL, 0) != child) {
> > +		fprintf(stderr, "waitpid: got the wrong child!?\n");
> > +		return 1;
> > +	}
> > +
> > +	return 0;
> > +}
Kees Cook Oct. 21, 2021, 5:09 p.m. UTC | #3
On Fri, Oct 08, 2021 at 04:55:04PM -0700, Kees Cook wrote:
> This makes sure that wchan contains a sensible symbol when a process is
> blocked. Specifically this calls the sleep() syscall, and expects the
> architecture to have called schedule() from a function that has "sleep"
> somewhere in its name. For example, on the architectures I tested
> (x86_64, arm64, arm, mips, and powerpc) this is "hrtimer_nanosleep":
> 
> $ tools/testing/selftests/proc/proc-pid-wchan
> ok: found 'sleep' in wchan 'hrtimer_nanosleep'
> 
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Mark Rutland <mark.rutland@arm.com>
> Cc: Shuah Khan <shuah@kernel.org>
> Cc: Alexey Dobriyan <adobriyan@gmail.com>
> Cc: linux-kselftest@vger.kernel.org
> Signed-off-by: Kees Cook <keescook@chromium.org>

Friendly ping.

> ---
> Hi Peter,
> 
> Can you add this to the wchan series, please? This should help wchan from
> regressing in the future, and allow us to notice if the depth accidentally
> changes, like Mark saw.
> ---

I'd like to make sure we have a regression test for this. Will you add
this to the wchan series please?

-Kees
Peter Zijlstra Oct. 21, 2021, 7:30 p.m. UTC | #4
On Thu, Oct 21, 2021 at 10:09:33AM -0700, Kees Cook wrote:

> > Hi Peter,
> > 
> > Can you add this to the wchan series, please? This should help wchan from
> > regressing in the future, and allow us to notice if the depth accidentally
> > changes, like Mark saw.
> > ---
> 
> I'd like to make sure we have a regression test for this. Will you add
> this to the wchan series please?

I have it there, but it's in the part that didn't make it in yet. I'm
currently in the progress of reworking that.

Do you want it it ahead of all that?
Andy Lutomirski Oct. 21, 2021, 8:03 p.m. UTC | #5
On Fri, Oct 8, 2021, at 4:55 PM, Kees Cook wrote:
> This makes sure that wchan contains a sensible symbol when a process is
> blocked. Specifically this calls the sleep() syscall, and expects the
> architecture to have called schedule() from a function that has "sleep"
> somewhere in its name. For example, on the architectures I tested
> (x86_64, arm64, arm, mips, and powerpc) this is "hrtimer_nanosleep":

Is this really better than admitting that the whole mechanism is nonsense and disabling it?

We could have a fixed string for each task state and call it a day.
Kees Cook Oct. 21, 2021, 8:10 p.m. UTC | #6
On Thu, Oct 21, 2021 at 09:30:33PM +0200, Peter Zijlstra wrote:
> On Thu, Oct 21, 2021 at 10:09:33AM -0700, Kees Cook wrote:
> 
> > > Hi Peter,
> > > 
> > > Can you add this to the wchan series, please? This should help wchan from
> > > regressing in the future, and allow us to notice if the depth accidentally
> > > changes, like Mark saw.
> > > ---
> > 
> > I'd like to make sure we have a regression test for this. Will you add
> > this to the wchan series please?
> 
> I have it there, but it's in the part that didn't make it in yet. I'm
> currently in the progress of reworking that.
> 
> Do you want it it ahead of all that?

Nah, that's fine. I'm not in any rush; I just wanted to make sure it
didn't get lost. I didn't realize it was already there (I had only seen
the -tip emails). Sorry for the noise; thanks!

-Kees
Kees Cook Oct. 21, 2021, 8:12 p.m. UTC | #7
On Thu, Oct 21, 2021 at 01:03:33PM -0700, Andy Lutomirski wrote:
> 
> 
> On Fri, Oct 8, 2021, at 4:55 PM, Kees Cook wrote:
> > This makes sure that wchan contains a sensible symbol when a process is
> > blocked. Specifically this calls the sleep() syscall, and expects the
> > architecture to have called schedule() from a function that has "sleep"
> > somewhere in its name. For example, on the architectures I tested
> > (x86_64, arm64, arm, mips, and powerpc) this is "hrtimer_nanosleep":
> 
> Is this really better than admitting that the whole mechanism is nonsense and disabling it?
> 
> We could have a fixed string for each task state and call it a day.

I consider this to be "future work". In earlier discussions it came up,
but there wasn't an obvious clean cost-free way to do this, so instead
we're just fixing the broken corner and keeping the mostly working rest
of it while cleaning up the weird edges. :)
Andy Lutomirski Oct. 22, 2021, 12:39 a.m. UTC | #8
On Thu, Oct 21, 2021, at 1:12 PM, Kees Cook wrote:
> On Thu, Oct 21, 2021 at 01:03:33PM -0700, Andy Lutomirski wrote:
>> 
>> 
>> On Fri, Oct 8, 2021, at 4:55 PM, Kees Cook wrote:
>> > This makes sure that wchan contains a sensible symbol when a process is
>> > blocked. Specifically this calls the sleep() syscall, and expects the
>> > architecture to have called schedule() from a function that has "sleep"
>> > somewhere in its name. For example, on the architectures I tested
>> > (x86_64, arm64, arm, mips, and powerpc) this is "hrtimer_nanosleep":
>> 
>> Is this really better than admitting that the whole mechanism is nonsense and disabling it?
>> 
>> We could have a fixed string for each task state and call it a day.
>
> I consider this to be "future work". In earlier discussions it came up,
> but there wasn't an obvious clean cost-free way to do this, so instead
> we're just fixing the broken corner and keeping the mostly working rest
> of it while cleaning up the weird edges. :)

True, but we have the caveat that wchan is currently broken, so in some sense we have an easier time killing it now as compared to later.  But if we don't have a fully-fleshed-out idea for how to kill it, then I'm fine with waiting.

>
> -- 
> Kees Cook
Kees Cook Oct. 22, 2021, 3:35 a.m. UTC | #9
On Thu, Oct 21, 2021 at 05:39:38PM -0700, Andy Lutomirski wrote:
> 
> 
> On Thu, Oct 21, 2021, at 1:12 PM, Kees Cook wrote:
> > On Thu, Oct 21, 2021 at 01:03:33PM -0700, Andy Lutomirski wrote:
> >> 
> >> 
> >> On Fri, Oct 8, 2021, at 4:55 PM, Kees Cook wrote:
> >> > This makes sure that wchan contains a sensible symbol when a process is
> >> > blocked. Specifically this calls the sleep() syscall, and expects the
> >> > architecture to have called schedule() from a function that has "sleep"
> >> > somewhere in its name. For example, on the architectures I tested
> >> > (x86_64, arm64, arm, mips, and powerpc) this is "hrtimer_nanosleep":
> >> 
> >> Is this really better than admitting that the whole mechanism is nonsense and disabling it?
> >> 
> >> We could have a fixed string for each task state and call it a day.
> >
> > I consider this to be "future work". In earlier discussions it came up,
> > but there wasn't an obvious clean cost-free way to do this, so instead
> > we're just fixing the broken corner and keeping the mostly working rest
> > of it while cleaning up the weird edges. :)
> 
> True, but we have the caveat that wchan is currently broken, so in some sense we have an easier time killing it now as compared to later.  But if we don't have a fully-fleshed-out idea for how to kill it, then I'm fine with waiting.

It's not actually that broken. Only ORC was fully broken, so all the
other architectures (and non-ORC x86) have been fine. But given the
method of fixing ORC vs wchan, it turns out we could further clean up
the other architectures. But yes, no real plan to remove it, but the
current series fixes things pretty well. :)

-Kees

> 
> >
> > -- 
> > Kees Cook
diff mbox series

Patch

diff --git a/tools/testing/selftests/proc/Makefile b/tools/testing/selftests/proc/Makefile
index 1054e40a499a..45cf35703ece 100644
--- a/tools/testing/selftests/proc/Makefile
+++ b/tools/testing/selftests/proc/Makefile
@@ -8,6 +8,7 @@  TEST_GEN_PROGS += fd-002-posix-eq
 TEST_GEN_PROGS += fd-003-kthread
 TEST_GEN_PROGS += proc-loadavg-001
 TEST_GEN_PROGS += proc-pid-vm
+TEST_GEN_PROGS += proc-pid-wchan
 TEST_GEN_PROGS += proc-self-map-files-001
 TEST_GEN_PROGS += proc-self-map-files-002
 TEST_GEN_PROGS += proc-self-syscall
diff --git a/tools/testing/selftests/proc/proc-pid-wchan.c b/tools/testing/selftests/proc/proc-pid-wchan.c
new file mode 100644
index 000000000000..7d7870c31cef
--- /dev/null
+++ b/tools/testing/selftests/proc/proc-pid-wchan.c
@@ -0,0 +1,69 @@ 
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Make sure that wchan returns a reasonable symbol when blocked.
+ */
+#include <sys/types.h>
+#include <sys/stat.h>
+#include <errno.h>
+#include <fcntl.h>
+#include <signal.h>
+#include <stdio.h>
+#include <string.h>
+#include <unistd.h>
+#include <sys/wait.h>
+
+#define perror_exit(str) do { perror(str); _exit(1); } while (0)
+
+int main(void)
+{
+	char buf[64];
+	pid_t child;
+	int sync[2], fd;
+
+	if (pipe(sync) < 0)
+		perror_exit("pipe");
+
+	child = fork();
+	if (child < 0)
+		perror_exit("fork");
+	if (child == 0) {
+		/* Child */
+		if (close(sync[0]) < 0)
+			perror_exit("child close sync[0]");
+		if (close(sync[1]) < 0)
+			perror_exit("child close sync[1]");
+		sleep(10);
+		_exit(0);
+	}
+	/* Parent */
+	if (close(sync[1]) < 0)
+		perror_exit("parent close sync[1]");
+	if (read(sync[0], buf, 1) != 0)
+		perror_exit("parent read sync[0]");
+
+	snprintf(buf, sizeof(buf), "/proc/%d/wchan", child);
+	fd = open(buf, O_RDONLY);
+	if (fd < 0) {
+		if (errno == ENOENT)
+			return 4;
+		perror_exit(buf);
+	}
+
+	memset(buf, 0, sizeof(buf));
+	if (read(fd, buf, sizeof(buf) - 1) < 1)
+		perror_exit(buf);
+	if (strstr(buf, "sleep") == NULL) {
+		fprintf(stderr, "FAIL: did not find 'sleep' in wchan '%s'\n", buf);
+		return 1;
+	}
+	printf("ok: found 'sleep' in wchan '%s'\n", buf);
+
+	if (kill(child, SIGKILL) < 0)
+		perror_exit("kill");
+	if (waitpid(child, NULL, 0) != child) {
+		fprintf(stderr, "waitpid: got the wrong child!?\n");
+		return 1;
+	}
+
+	return 0;
+}