[v1,2/2] Add selftests for pidfd polling

Message ID	20190425190010.46489-2-joel@joelfernandes.org (mailing list archive)
State	New
Headers	show Return-Path: <linux-kselftest-owner@kernel.org> From: "Joel Fernandes (Google)" <joel@joelfernandes.org> To: linux-kernel@vger.kernel.org Cc: "Joel Fernandes (Google)" <joel@joelfernandes.org>, Andrew Morton <akpm@linux-foundation.org>, Arnd Bergmann <arnd@arndb.de>, Christian Brauner <christian@brauner.io>, dancol@google.com, "Eric W. Biederman" <ebiederm@xmission.com>, Greg Kroah-Hartman <gregkh@linuxfoundation.org>, Ingo Molnar <mingo@kernel.org>, jannh@google.com, Jann Horn <jann@thejh.net>, Jonathan Kowalski <bl0pbl33p@gmail.com>, kernel-team@android.com, linux-kselftest@vger.kernel.org, luto@amacapital.net, Michal Hocko <mhocko@suse.com>, "Peter Zijlstra (Intel)" <peterz@infradead.org>, rostedt@goodmis.org, Serge Hallyn <serge@hallyn.com>, Shuah Khan <shuah@kernel.org>, sspatil@google.com, Stephen Rothwell <sfr@canb.auug.org.au>, surenb@google.com, Thomas Gleixner <tglx@linutronix.de>, timmurray@google.com, torvalds@linux-foundation.org, Tycho Andersen <tycho@tycho.ws> Subject: [PATCH v1 2/2] Add selftests for pidfd polling Date: Thu, 25 Apr 2019 15:00:10 -0400 Message-Id: <20190425190010.46489-2-joel@joelfernandes.org> In-Reply-To: <20190425190010.46489-1-joel@joelfernandes.org> References: <20190425190010.46489-1-joel@joelfernandes.org> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit Sender: linux-kselftest-owner@vger.kernel.org Precedence: bulk
Series	[v1,1/2] Add polling support to pidfd \| expand [v1,1/2] Add polling support to pidfd [v1,2/2] Add selftests for pidfd polling

Joel Fernandes April 25, 2019, 7 p.m. UTC

Other than verifying pidfd based polling, the tests make sure that
wait semantics are preserved with the pidfd poll. Notably the 2 cases:
1. If a thread group leader exits while threads still there, then no
   pidfd poll notifcation should happen.
2. If a non-thread group leader does an execve, then the thread group
   leader is signaled to exit and is replaced with the execing thread
   as the new leader, however the parent is not notified in this case.

Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
---
 tools/testing/selftests/pidfd/Makefile     |   2 +-
 tools/testing/selftests/pidfd/pidfd_test.c | 198 +++++++++++++++++++++
 2 files changed, 199 insertions(+), 1 deletion(-)

Tycho Andersen April 25, 2019, 8 p.m. UTC | #1

On Thu, Apr 25, 2019 at 03:00:10PM -0400, Joel Fernandes (Google) wrote:
>
> +void *test_pidfd_poll_exec_thread(void *priv)

I think everything in this file can be static, there's this one and
3-4 below.

> +int test_pidfd_poll_exec(int use_waitpid)
> +{
> +	int pid, pidfd = 0;
> +	int status, ret;
> +	pthread_t t1;
> +	time_t prog_start = time(NULL);
> +	const char *test_name = "pidfd_poll check for premature notification on child thread exec";
> +
> +	ksft_print_msg("Parent: pid: %d\n", getpid());
> +	pid = pidfd_clone(CLONE_PIDFD, &pidfd, child_poll_exec_test);

If pidfd_clone() fails here, I think things will go haywire below.

> +	ksft_print_msg("Parent: Waiting for Child (%d) to complete.\n", pid);
> +
> +	if (use_waitpid) {
> +		ret = waitpid(pid, &status, 0);
> +		if (ret == -1)
> +			ksft_print_msg("Parent: error\n");
> +
> +		if (ret == pid)
> +			ksft_print_msg("Parent: Child process waited for.\n");
> +	} else {
> +		poll_pidfd(test_name, pidfd);
> +	}
> +
> +	time_t prog_time = time(NULL) - prog_start;
> +
> +	ksft_print_msg("Time waited for child: %lu\n", prog_time);
> +
> +	close(pidfd);
> +
> +	if (prog_time < CHILD_THREAD_MIN_WAIT || prog_time > CHILD_THREAD_MIN_WAIT + 2)
> +		ksft_exit_fail_msg("%s test: Failed\n", test_name);
> +	else
> +		ksft_test_result_pass("%s test: Passed\n", test_name);
> +}
> +
> +void *test_pidfd_poll_leader_exit_thread(void *priv)
> +{
> +	char waittime[256];
> +
> +	ksft_print_msg("Child Thread: starting. pid %d tid %d ; and sleeping\n",
> +			getpid(), syscall(SYS_gettid));
> +	sleep(CHILD_THREAD_MIN_WAIT);
> +	ksft_print_msg("Child Thread: DONE. pid %d tid %d\n", getpid(), syscall(SYS_gettid));
> +	return NULL;
> +}
> +
> +static time_t *child_exit_secs;
> +static int child_poll_leader_exit_test(void *args)
> +{
> +	pthread_t t1, t2;
> +
> +	ksft_print_msg("Child: starting. pid %d tid %d\n", getpid(), syscall(SYS_gettid));
> +	pthread_create(&t1, NULL, test_pidfd_poll_leader_exit_thread, NULL);
> +	pthread_create(&t2, NULL, test_pidfd_poll_leader_exit_thread, NULL);
> +
> +	/*
> +	 * glibc exit calls exit_group syscall, so explicity call exit only
> +	 * so that only the group leader exits, leaving the threads alone.
> +	 */
> +	*child_exit_secs = time(NULL);
> +	syscall(SYS_exit, 0);
> +}
> +
> +int test_pidfd_poll_leader_exit(int use_waitpid)
> +{
> +	int pid, pidfd = 0;
> +	int status, ret;
> +	time_t prog_start = time(NULL);
> +	const char *test_name = "pidfd_poll check for premature notification on non-empty"
> +				"group leader exit";
> +
> +	child_exit_secs = mmap(NULL, sizeof *child_exit_secs, PROT_READ | PROT_WRITE,
> +			MAP_SHARED | MAP_ANONYMOUS, -1, 0);
> +
> +	ksft_print_msg("Parent: pid: %d\n", getpid());
> +	pid = pidfd_clone(CLONE_PIDFD, &pidfd, child_poll_leader_exit_test);

Same problem here, I think.

Tycho

Christian Brauner April 25, 2019, 9:29 p.m. UTC | #2

On Thu, Apr 25, 2019 at 03:00:10PM -0400, Joel Fernandes (Google) wrote:
> Other than verifying pidfd based polling, the tests make sure that
> wait semantics are preserved with the pidfd poll. Notably the 2 cases:
> 1. If a thread group leader exits while threads still there, then no
>    pidfd poll notifcation should happen.
> 2. If a non-thread group leader does an execve, then the thread group
>    leader is signaled to exit and is replaced with the execing thread
>    as the new leader, however the parent is not notified in this case.
> 
> Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> ---
>  tools/testing/selftests/pidfd/Makefile     |   2 +-
>  tools/testing/selftests/pidfd/pidfd_test.c | 198 +++++++++++++++++++++
>  2 files changed, 199 insertions(+), 1 deletion(-)
> 
> diff --git a/tools/testing/selftests/pidfd/Makefile b/tools/testing/selftests/pidfd/Makefile
> index deaf8073bc06..4b31c14f273c 100644
> --- a/tools/testing/selftests/pidfd/Makefile
> +++ b/tools/testing/selftests/pidfd/Makefile
> @@ -1,4 +1,4 @@
> -CFLAGS += -g -I../../../../usr/include/
> +CFLAGS += -g -I../../../../usr/include/ -lpthread
>  
>  TEST_GEN_PROGS := pidfd_test
>  
> diff --git a/tools/testing/selftests/pidfd/pidfd_test.c b/tools/testing/selftests/pidfd/pidfd_test.c
> index d59378a93782..e887f807645e 100644
> --- a/tools/testing/selftests/pidfd/pidfd_test.c
> +++ b/tools/testing/selftests/pidfd/pidfd_test.c
> @@ -4,18 +4,42 @@
>  #include <errno.h>
>  #include <fcntl.h>
>  #include <linux/types.h>
> +#include <pthread.h>
>  #include <sched.h>
>  #include <signal.h>
>  #include <stdio.h>
>  #include <stdlib.h>
>  #include <string.h>
>  #include <syscall.h>
> +#include <sys/epoll.h>
> +#include <sys/mman.h>
>  #include <sys/mount.h>
>  #include <sys/wait.h>
> +#include <time.h>
>  #include <unistd.h>
>  
>  #include "../kselftest.h"
>  
> +#define CHILD_THREAD_MIN_WAIT 3 /* seconds */
> +#define MAX_EVENTS 5
> +#define __NR_pidfd_send_signal 424

Should probably be ifndefed as well.

> +
> +#ifndef CLONE_PIDFD
> +#define CLONE_PIDFD 0x00001000
> +#endif
> +
> +static pid_t pidfd_clone(int flags, int *pidfd, int (*fn)(void *))
> +{
> +	size_t stack_size = 1024;
> +	char *stack[1024] = { 0 };
> +
> +#ifdef __ia64__
> +	return __clone2(fn, stack, stack_size, flags | SIGCHLD, NULL, pidfd);
> +#else
> +	return clone(fn, stack + stack_size, flags | SIGCHLD, NULL, pidfd);
> +#endif
> +}
> +
>  static inline int sys_pidfd_send_signal(int pidfd, int sig, siginfo_t *info,
>  					unsigned int flags)
>  {
> @@ -368,10 +392,184 @@ static int test_pidfd_send_signal_syscall_support(void)
>  	return 0;
>  }
>  
> +void *test_pidfd_poll_exec_thread(void *priv)
> +{
> +	char waittime[256];
> +
> +	ksft_print_msg("Child Thread: starting. pid %d tid %d ; and sleeping\n",
> +			getpid(), syscall(SYS_gettid));
> +	ksft_print_msg("Child Thread: doing exec of sleep\n");
> +
> +	sprintf(waittime, "%d", CHILD_THREAD_MIN_WAIT);

> +#define CHILD_THREAD_MIN_SLEEP "3" /* seconds */

Could also be

#define str(s) _str(s)
#define _str(s) #s
#define CHILD_THREAD_MIN_SLEEP 3

execl("/bin/sleep", "sleep", str(CHILD_THREAD_MIN_SLEEP), (char *)NULL);

getting rid of waittime, and snprintf().

> +	execl("/bin/sleep", "sleep", waittime, (char *)NULL);
> +
> +	ksft_print_msg("Child Thread: DONE. pid %d tid %d\n",
> +			getpid(), syscall(SYS_gettid));
> +	return NULL;
> +}
> +
> +static int poll_pidfd(const char *test_name, int pidfd)
> +{
> +	int c;
> +	int epoll_fd = epoll_create1(0);

You probably don't need the epoll_fd after an exec, so:
int epoll_fd = epoll_create1(EPOLL_CLOEXEC);

> +	struct epoll_event event, events[MAX_EVENTS];
> +
> +	if (epoll_fd == -1)
> +		ksft_exit_fail_msg("%s test: Failed to create epoll file descriptor\n",
> +				   test_name);

I think logging the errno is helpful here. 

> +
> +	event.events = EPOLLIN;
> +	event.data.fd = pidfd;
> +
> +	if (epoll_ctl(epoll_fd, EPOLL_CTL_ADD, pidfd, &event)) {
> +		ksft_print_msg("%s test: Failed to add epoll file descriptor: Skipping\n",
> +			       test_name);

I think logging the errno is helpful here. 

> +		_exit(PIDFD_SKIP);

Why do you skip when you can't add the pidfd to the epoll loop? Why
shouldn't this be a test failure?

> +	}
> +
> +	c = epoll_wait(epoll_fd, events, MAX_EVENTS, 5000);

Uhm 5000 timeout? Either do a -1 or something that is noticeably
shorter, please. :)

> +	if (c != 1 || !(events[0].events & EPOLLIN))
> +		ksft_exit_fail_msg("%s test: Unexpected epoll_wait result (c=%d, events=%x)\n",
> +				   test_name, c, events[0].events);

I think logging the errno is helpful here. 

> +
> +	close(epoll_fd);
> +	return events[0].events;
> +
> +}
> +
> +static int child_poll_exec_test(void *args)
> +{
> +	pthread_t t1;
> +
> +	ksft_print_msg("Child (pidfd): starting. pid %d tid %d\n", getpid(),
> +			syscall(SYS_gettid));
> +	pthread_create(&t1, NULL, test_pidfd_poll_exec_thread, NULL);
> +	/*
> +	 * Exec in the non-leader thread will destroy the leader immediately.
> +	 * If the wait in the parent returns too soon, the test fails.
> +	 */
> +	while (1)
> +		;

Wouldn't sleep(<some-value>) be better here or at least a:

while (true)
        sleep(<some-sensible-value);

instead of a busy loop?

> +}
> +
> +int test_pidfd_poll_exec(int use_waitpid)
> +{
> +	int pid, pidfd = 0;
> +	int status, ret;
> +	pthread_t t1;
> +	time_t prog_start = time(NULL);
> +	const char *test_name = "pidfd_poll check for premature notification on child thread exec";
> +
> +	ksft_print_msg("Parent: pid: %d\n", getpid());
> +	pid = pidfd_clone(CLONE_PIDFD, &pidfd, child_poll_exec_test);

That needs to check for error aka
if (pid < 0)
I think Tycho mentioned this already.

> +
> +	ksft_print_msg("Parent: Waiting for Child (%d) to complete.\n", pid);
> +
> +	if (use_waitpid) {
> +		ret = waitpid(pid, &status, 0);
> +		if (ret == -1)
> +			ksft_print_msg("Parent: error\n");
> +
> +		if (ret == pid)
> +			ksft_print_msg("Parent: Child process waited for.\n");
> +	} else {
> +		poll_pidfd(test_name, pidfd);

Either make poll_pidfd() void or check the error value. One of the two.

> +	}
> +
> +	time_t prog_time = time(NULL) - prog_start;
> +
> +	ksft_print_msg("Time waited for child: %lu\n", prog_time);
> +
> +	close(pidfd);
> +
> +	if (prog_time < CHILD_THREAD_MIN_WAIT || prog_time > CHILD_THREAD_MIN_WAIT + 2)

This timing-based testing seems kinda odd to be honest. Can't we do
something better than this?

> +		ksft_exit_fail_msg("%s test: Failed\n", test_name);
> +	else
> +		ksft_test_result_pass("%s test: Passed\n", test_name);
> +}
> +
> +void *test_pidfd_poll_leader_exit_thread(void *priv)
> +{
> +	char waittime[256];

Unused variable

> +
> +	ksft_print_msg("Child Thread: starting. pid %d tid %d ; and sleeping\n",
> +			getpid(), syscall(SYS_gettid));
> +	sleep(CHILD_THREAD_MIN_WAIT);
> +	ksft_print_msg("Child Thread: DONE. pid %d tid %d\n", getpid(), syscall(SYS_gettid));
> +	return NULL;
> +}
> +
> +static time_t *child_exit_secs;
> +static int child_poll_leader_exit_test(void *args)
> +{
> +	pthread_t t1, t2;
> +
> +	ksft_print_msg("Child: starting. pid %d tid %d\n", getpid(), syscall(SYS_gettid));
> +	pthread_create(&t1, NULL, test_pidfd_poll_leader_exit_thread, NULL);
> +	pthread_create(&t2, NULL, test_pidfd_poll_leader_exit_thread, NULL);
> +
> +	/*
> +	 * glibc exit calls exit_group syscall, so explicity call exit only
> +	 * so that only the group leader exits, leaving the threads alone.
> +	 */
> +	*child_exit_secs = time(NULL);
> +	syscall(SYS_exit, 0);
> +}
> +
> +int test_pidfd_poll_leader_exit(int use_waitpid)

static

> +{
> +	int pid, pidfd = 0;
> +	int status, ret;
> +	time_t prog_start = time(NULL);
> +	const char *test_name = "pidfd_poll check for premature notification on non-empty"
> +				"group leader exit";
> +
> +	child_exit_secs = mmap(NULL, sizeof *child_exit_secs, PROT_READ | PROT_WRITE,
> +			MAP_SHARED | MAP_ANONYMOUS, -1, 0);

Error checking, please:

if (child_exit_secs == MAP_FAILED)

> +
> +	ksft_print_msg("Parent: pid: %d\n", getpid());
> +	pid = pidfd_clone(CLONE_PIDFD, &pidfd, child_poll_leader_exit_test);

Error checking, please:

if (pid < 0)

> +
> +	ksft_print_msg("Parent: Waiting for Child (%d) to complete.\n", pid);
> +
> +	if (use_waitpid) {
> +		ret = waitpid(pid, &status, 0);
> +		if (ret == -1)
> +			ksft_print_msg("Parent: error\n");
> +	} else {
> +		/*
> +		 * This sleep tests for the case where if the child exits, and is in
> +		 * EXIT_ZOMBIE, but the thread group leader is non-empty, then the poll
> +		 * doesn't prematurely return even though there are active threads
> +		 */
> +		sleep(1);
> +		poll_pidfd(test_name, pidfd);

Make poll_pidfd() void or check error, please.

> +	}
> +
> +	if (ret == pid)
> +		ksft_print_msg("Parent: Child process waited for.\n");
> +
> +	time_t since_child_exit = time(NULL) - *child_exit_secs;
> +
> +	ksft_print_msg("Time since child exit: %lu\n", since_child_exit);
> +
> +	close(pidfd);
> +
> +	if (since_child_exit < CHILD_THREAD_MIN_WAIT ||
> +			since_child_exit > CHILD_THREAD_MIN_WAIT + 2)

This looks very magical. Especially without a comment. Now you add
random +2. Please comment it or better, come up with a non-timing
based test.

> +		ksft_exit_fail_msg("%s test: Failed\n", test_name);
> +	else
> +		ksft_test_result_pass("%s test: Passed\n", test_name);
> +}
> +
>  int main(int argc, char **argv)
>  {
>  	ksft_print_header();
>  
> +	test_pidfd_poll_exec(0);
> +	test_pidfd_poll_exec(1);
> +	test_pidfd_poll_leader_exit(0);
> +	test_pidfd_poll_leader_exit(1);
>  	test_pidfd_send_signal_syscall_support();
>  	test_pidfd_send_signal_simple_success();
>  	test_pidfd_send_signal_exited_fail();
> -- 
> 2.21.0.593.g511ec345e18-goog
>

Daniel Colascione April 25, 2019, 10:07 p.m. UTC | #3

On Thu, Apr 25, 2019 at 2:29 PM Christian Brauner <christian@brauner.io> wrote:
> This timing-based testing seems kinda odd to be honest. Can't we do
> something better than this?

Agreed. Timing-based tests have a substantial risk of becoming flaky.
We ought to be able to make these tests fully deterministic and not
subject to breakage from odd scheduling outcomes. We don't have
sleepable events for everything, granted, but sleep-waiting on a
condition with exponential backoff is fine in test code. In general,
if you start with a robust test, you can insert a sleep(100) anywhere
and not break the logic. Violating this rule always causes pain sooner
or later.

Other thoughts: IMHO, using poll(2) instead of epoll would simplify
the test code, and I think we can get away with calling
pthread_exit(3) instead of SYS_exit.

Joel Fernandes April 26, 2019, 1:42 p.m. UTC | #4

On Thu, Apr 25, 2019 at 11:29:18PM +0200, Christian Brauner wrote:
> On Thu, Apr 25, 2019 at 03:00:10PM -0400, Joel Fernandes (Google) wrote:
> > Other than verifying pidfd based polling, the tests make sure that
> > wait semantics are preserved with the pidfd poll. Notably the 2 cases:
> > 1. If a thread group leader exits while threads still there, then no
> >    pidfd poll notifcation should happen.
> > 2. If a non-thread group leader does an execve, then the thread group
> >    leader is signaled to exit and is replaced with the execing thread
> >    as the new leader, however the parent is not notified in this case.
> > 
> > Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> > ---
> >  tools/testing/selftests/pidfd/Makefile     |   2 +-
> >  tools/testing/selftests/pidfd/pidfd_test.c | 198 +++++++++++++++++++++
> >  2 files changed, 199 insertions(+), 1 deletion(-)
> > 
> > diff --git a/tools/testing/selftests/pidfd/Makefile b/tools/testing/selftests/pidfd/Makefile
> > index deaf8073bc06..4b31c14f273c 100644
> > --- a/tools/testing/selftests/pidfd/Makefile
> > +++ b/tools/testing/selftests/pidfd/Makefile
> > @@ -1,4 +1,4 @@
> > -CFLAGS += -g -I../../../../usr/include/
> > +CFLAGS += -g -I../../../../usr/include/ -lpthread
> >  
> >  TEST_GEN_PROGS := pidfd_test
> >  
> > diff --git a/tools/testing/selftests/pidfd/pidfd_test.c b/tools/testing/selftests/pidfd/pidfd_test.c
> > index d59378a93782..e887f807645e 100644
> > --- a/tools/testing/selftests/pidfd/pidfd_test.c
> > +++ b/tools/testing/selftests/pidfd/pidfd_test.c
> > @@ -4,18 +4,42 @@
> >  #include <errno.h>
> >  #include <fcntl.h>
> >  #include <linux/types.h>
> > +#include <pthread.h>
> >  #include <sched.h>
> >  #include <signal.h>
> >  #include <stdio.h>
> >  #include <stdlib.h>
> >  #include <string.h>
> >  #include <syscall.h>
> > +#include <sys/epoll.h>
> > +#include <sys/mman.h>
> >  #include <sys/mount.h>
> >  #include <sys/wait.h>
> > +#include <time.h>
> >  #include <unistd.h>
> >  
> >  #include "../kselftest.h"
> >  
> > +#define CHILD_THREAD_MIN_WAIT 3 /* seconds */
> > +#define MAX_EVENTS 5
> > +#define __NR_pidfd_send_signal 424
> 
> Should probably be ifndefed as well.

done

> > +#ifndef CLONE_PIDFD
> > +#define CLONE_PIDFD 0x00001000
> > +#endif
> > +
> > +static pid_t pidfd_clone(int flags, int *pidfd, int (*fn)(void *))
> > +{
> > +	size_t stack_size = 1024;
> > +	char *stack[1024] = { 0 };
> > +
> > +#ifdef __ia64__
> > +	return __clone2(fn, stack, stack_size, flags | SIGCHLD, NULL, pidfd);
> > +#else
> > +	return clone(fn, stack + stack_size, flags | SIGCHLD, NULL, pidfd);
> > +#endif
> > +}
> > +
> >  static inline int sys_pidfd_send_signal(int pidfd, int sig, siginfo_t *info,
> >  					unsigned int flags)
> >  {
> > @@ -368,10 +392,184 @@ static int test_pidfd_send_signal_syscall_support(void)
> >  	return 0;
> >  }
> >  
> > +void *test_pidfd_poll_exec_thread(void *priv)
> > +{
> > +	char waittime[256];
> > +
> > +	ksft_print_msg("Child Thread: starting. pid %d tid %d ; and sleeping\n",
> > +			getpid(), syscall(SYS_gettid));
> > +	ksft_print_msg("Child Thread: doing exec of sleep\n");
> > +
> > +	sprintf(waittime, "%d", CHILD_THREAD_MIN_WAIT);
> 
> > +#define CHILD_THREAD_MIN_SLEEP "3" /* seconds */
> 
> Could also be
> 
> #define str(s) _str(s)
> #define _str(s) #s
> #define CHILD_THREAD_MIN_SLEEP 3
> 
> execl("/bin/sleep", "sleep", str(CHILD_THREAD_MIN_SLEEP), (char *)NULL);
> 
> getting rid of waittime, and snprintf().

yep, much better, thanks.

> > +	execl("/bin/sleep", "sleep", waittime, (char *)NULL);
> > +
> > +	ksft_print_msg("Child Thread: DONE. pid %d tid %d\n",
> > +			getpid(), syscall(SYS_gettid));
> > +	return NULL;
> > +}
> > +
> > +static int poll_pidfd(const char *test_name, int pidfd)
> > +{
> > +	int c;
> > +	int epoll_fd = epoll_create1(0);
> 
> You probably don't need the epoll_fd after an exec, so:
> int epoll_fd = epoll_create1(EPOLL_CLOEXEC);

done

> > +	struct epoll_event event, events[MAX_EVENTS];
> > +
> > +	if (epoll_fd == -1)
> > +		ksft_exit_fail_msg("%s test: Failed to create epoll file descriptor\n",
> > +				   test_name);
> 
> I think logging the errno is helpful here. 
> 
> > +
> > +	event.events = EPOLLIN;
> > +	event.data.fd = pidfd;
> > +
> > +	if (epoll_ctl(epoll_fd, EPOLL_CTL_ADD, pidfd, &event)) {
> > +		ksft_print_msg("%s test: Failed to add epoll file descriptor: Skipping\n",
> > +			       test_name);
> 
> I think logging the errno is helpful here. 

no where else in other tests are we logging this. I don't have a preference.
Should ksft_exit_fail_msg() do this automatically? Although it could be
logging a stale errno if it did.  Anyway I added logging of errno here, as
you suggest.

> > +		_exit(PIDFD_SKIP);
> 
> Why do you skip when you can't add the pidfd to the epoll loop? Why
> shouldn't this be a test failure?

The original approach was to do this for proc pidfd, which means older
kernels could get a pidfd but couldn't do poll, in this case I wanted the
test to be skipped. Since we are now basing this on CLONE_PIDFD, there is
less of a reason for that. So I will just do ksft_exit_fail_msg() here.

> > +	}
> > +
> > +	c = epoll_wait(epoll_fd, events, MAX_EVENTS, 5000);
> 
> Uhm 5000 timeout? Either do a -1 or something that is noticeably
> shorter, please. :)

I want a timeout for the case where epoll_wait blocks indefinitely, in which
case it should be a test failure.

> > +	if (c != 1 || !(events[0].events & EPOLLIN))
> > +		ksft_exit_fail_msg("%s test: Unexpected epoll_wait result (c=%d, events=%x)\n",
> > +				   test_name, c, events[0].events);
> 
> I think logging the errno is helpful here. 

Ok, done.

> > +
> > +	close(epoll_fd);
> > +	return events[0].events;
> > +
> > +}
> > +
> > +static int child_poll_exec_test(void *args)
> > +{
> > +	pthread_t t1;
> > +
> > +	ksft_print_msg("Child (pidfd): starting. pid %d tid %d\n", getpid(),
> > +			syscall(SYS_gettid));
> > +	pthread_create(&t1, NULL, test_pidfd_poll_exec_thread, NULL);
> > +	/*
> > +	 * Exec in the non-leader thread will destroy the leader immediately.
> > +	 * If the wait in the parent returns too soon, the test fails.
> > +	 */
> > +	while (1)
> > +		;
> 
> Wouldn't sleep(<some-value>) be better here or at least a:
> 
> while (true)
>         sleep(<some-sensible-value);
> 
> instead of a busy loop?

Good catch, I will do sleep(1);

> > +}
> > +
> > +int test_pidfd_poll_exec(int use_waitpid)
> > +{
> > +	int pid, pidfd = 0;
> > +	int status, ret;
> > +	pthread_t t1;
> > +	time_t prog_start = time(NULL);
> > +	const char *test_name = "pidfd_poll check for premature notification on child thread exec";
> > +
> > +	ksft_print_msg("Parent: pid: %d\n", getpid());
> > +	pid = pidfd_clone(CLONE_PIDFD, &pidfd, child_poll_exec_test);
> 
> That needs to check for error aka
> if (pid < 0)
> I think Tycho mentioned this already.

fixed, thanks to Tycho as well!

> > +
> > +	ksft_print_msg("Parent: Waiting for Child (%d) to complete.\n", pid);
> > +
> > +	if (use_waitpid) {
> > +		ret = waitpid(pid, &status, 0);
> > +		if (ret == -1)
> > +			ksft_print_msg("Parent: error\n");
> > +
> > +		if (ret == pid)
> > +			ksft_print_msg("Parent: Child process waited for.\n");
> > +	} else {
> > +		poll_pidfd(test_name, pidfd);
> 
> Either make poll_pidfd() void or check the error value. One of the two.

done

> > +	}
> > +
> > +	time_t prog_time = time(NULL) - prog_start;
> > +
> > +	ksft_print_msg("Time waited for child: %lu\n", prog_time);
> > +
> > +	close(pidfd);
> > +
> > +	if (prog_time < CHILD_THREAD_MIN_WAIT || prog_time > CHILD_THREAD_MIN_WAIT + 2)
> 
> This timing-based testing seems kinda odd to be honest. Can't we do
> something better than this?

will try..

> > +		ksft_exit_fail_msg("%s test: Failed\n", test_name);
> > +	else
> > +		ksft_test_result_pass("%s test: Passed\n", test_name);
> > +}
> > +
> > +void *test_pidfd_poll_leader_exit_thread(void *priv)
> > +{
> > +	char waittime[256];
> 
> Unused variable

ouch, fixed

> > +
> > +	ksft_print_msg("Child Thread: starting. pid %d tid %d ; and sleeping\n",
> > +			getpid(), syscall(SYS_gettid));
> > +	sleep(CHILD_THREAD_MIN_WAIT);
> > +	ksft_print_msg("Child Thread: DONE. pid %d tid %d\n", getpid(), syscall(SYS_gettid));
> > +	return NULL;
> > +}
> > +
> > +static time_t *child_exit_secs;
> > +static int child_poll_leader_exit_test(void *args)
> > +{
> > +	pthread_t t1, t2;
> > +
> > +	ksft_print_msg("Child: starting. pid %d tid %d\n", getpid(), syscall(SYS_gettid));
> > +	pthread_create(&t1, NULL, test_pidfd_poll_leader_exit_thread, NULL);
> > +	pthread_create(&t2, NULL, test_pidfd_poll_leader_exit_thread, NULL);
> > +
> > +	/*
> > +	 * glibc exit calls exit_group syscall, so explicity call exit only
> > +	 * so that only the group leader exits, leaving the threads alone.
> > +	 */
> > +	*child_exit_secs = time(NULL);
> > +	syscall(SYS_exit, 0);
> > +}
> > +
> > +int test_pidfd_poll_leader_exit(int use_waitpid)
> 
> static

fixed

> > +{
> > +	int pid, pidfd = 0;
> > +	int status, ret;
> > +	time_t prog_start = time(NULL);
> > +	const char *test_name = "pidfd_poll check for premature notification on non-empty"
> > +				"group leader exit";
> > +
> > +	child_exit_secs = mmap(NULL, sizeof *child_exit_secs, PROT_READ | PROT_WRITE,
> > +			MAP_SHARED | MAP_ANONYMOUS, -1, 0);
> 
> Error checking, please:
> 
> if (child_exit_secs == MAP_FAILED)

done

> > +
> > +	ksft_print_msg("Parent: pid: %d\n", getpid());
> > +	pid = pidfd_clone(CLONE_PIDFD, &pidfd, child_poll_leader_exit_test);
> 
> Error checking, please:
> 
> if (pid < 0)

done

> > +
> > +	ksft_print_msg("Parent: Waiting for Child (%d) to complete.\n", pid);
> > +
> > +	if (use_waitpid) {
> > +		ret = waitpid(pid, &status, 0);
> > +		if (ret == -1)
> > +			ksft_print_msg("Parent: error\n");
> > +	} else {
> > +		/*
> > +		 * This sleep tests for the case where if the child exits, and is in
> > +		 * EXIT_ZOMBIE, but the thread group leader is non-empty, then the poll
> > +		 * doesn't prematurely return even though there are active threads
> > +		 */
> > +		sleep(1);
> > +		poll_pidfd(test_name, pidfd);
> 
> Make poll_pidfd() void or check error, please.

done, made void

> > +	}
> > +
> > +	if (ret == pid)
> > +		ksft_print_msg("Parent: Child process waited for.\n");
> > +
> > +	time_t since_child_exit = time(NULL) - *child_exit_secs;
> > +
> > +	ksft_print_msg("Time since child exit: %lu\n", since_child_exit);
> > +
> > +	close(pidfd);
> > +
> > +	if (since_child_exit < CHILD_THREAD_MIN_WAIT ||
> > +			since_child_exit > CHILD_THREAD_MIN_WAIT + 2)
> 
> This looks very magical. Especially without a comment. Now you add
> random +2. Please comment it or better, come up with a non-timing
> based test.

Will try a non-timing test, need to plan it out. Other comments are addressed
and will post again soon, thanks!

 - Joel

Joel Fernandes April 26, 2019, 1:47 p.m. UTC | #5

On Thu, Apr 25, 2019 at 02:00:35PM -0600, Tycho Andersen wrote:
> On Thu, Apr 25, 2019 at 03:00:10PM -0400, Joel Fernandes (Google) wrote:
> >
> > +void *test_pidfd_poll_exec_thread(void *priv)
> 
> I think everything in this file can be static, there's this one and
> 3-4 below.
> 
> > +int test_pidfd_poll_exec(int use_waitpid)
> > +{
> > +	int pid, pidfd = 0;
> > +	int status, ret;
> > +	pthread_t t1;
> > +	time_t prog_start = time(NULL);
> > +	const char *test_name = "pidfd_poll check for premature notification on child thread exec";
> > +
> > +	ksft_print_msg("Parent: pid: %d\n", getpid());
> > +	pid = pidfd_clone(CLONE_PIDFD, &pidfd, child_poll_exec_test);
> 
> If pidfd_clone() fails here, I think things will go haywire below.
> 
> > +	ksft_print_msg("Parent: Waiting for Child (%d) to complete.\n", pid);
> > +
> > +	if (use_waitpid) {
> > +		ret = waitpid(pid, &status, 0);
> > +		if (ret == -1)
> > +			ksft_print_msg("Parent: error\n");
> > +
> > +		if (ret == pid)
> > +			ksft_print_msg("Parent: Child process waited for.\n");
> > +	} else {
> > +		poll_pidfd(test_name, pidfd);
> > +	}
> > +
> > +	time_t prog_time = time(NULL) - prog_start;
> > +
> > +	ksft_print_msg("Time waited for child: %lu\n", prog_time);
> > +
> > +	close(pidfd);
> > +
> > +	if (prog_time < CHILD_THREAD_MIN_WAIT || prog_time > CHILD_THREAD_MIN_WAIT + 2)
> > +		ksft_exit_fail_msg("%s test: Failed\n", test_name);
> > +	else
> > +		ksft_test_result_pass("%s test: Passed\n", test_name);
> > +}
> > +
> > +void *test_pidfd_poll_leader_exit_thread(void *priv)
> > +{
> > +	char waittime[256];
> > +
> > +	ksft_print_msg("Child Thread: starting. pid %d tid %d ; and sleeping\n",
> > +			getpid(), syscall(SYS_gettid));
> > +	sleep(CHILD_THREAD_MIN_WAIT);
> > +	ksft_print_msg("Child Thread: DONE. pid %d tid %d\n", getpid(), syscall(SYS_gettid));
> > +	return NULL;
> > +}
> > +
> > +static time_t *child_exit_secs;
> > +static int child_poll_leader_exit_test(void *args)
> > +{
> > +	pthread_t t1, t2;
> > +
> > +	ksft_print_msg("Child: starting. pid %d tid %d\n", getpid(), syscall(SYS_gettid));
> > +	pthread_create(&t1, NULL, test_pidfd_poll_leader_exit_thread, NULL);
> > +	pthread_create(&t2, NULL, test_pidfd_poll_leader_exit_thread, NULL);
> > +
> > +	/*
> > +	 * glibc exit calls exit_group syscall, so explicity call exit only
> > +	 * so that only the group leader exits, leaving the threads alone.
> > +	 */
> > +	*child_exit_secs = time(NULL);
> > +	syscall(SYS_exit, 0);
> > +}
> > +
> > +int test_pidfd_poll_leader_exit(int use_waitpid)
> > +{
> > +	int pid, pidfd = 0;
> > +	int status, ret;
> > +	time_t prog_start = time(NULL);
> > +	const char *test_name = "pidfd_poll check for premature notification on non-empty"
> > +				"group leader exit";
> > +
> > +	child_exit_secs = mmap(NULL, sizeof *child_exit_secs, PROT_READ | PROT_WRITE,
> > +			MAP_SHARED | MAP_ANONYMOUS, -1, 0);
> > +
> > +	ksft_print_msg("Parent: pid: %d\n", getpid());
> > +	pid = pidfd_clone(CLONE_PIDFD, &pidfd, child_poll_leader_exit_test);
> 
> Same problem here, I think.


All comments address and fixed in the next revision, thanks!

 - Joel

Joel Fernandes April 26, 2019, 5:26 p.m. UTC | #6

On Thu, Apr 25, 2019 at 03:07:48PM -0700, Daniel Colascione wrote:
> On Thu, Apr 25, 2019 at 2:29 PM Christian Brauner <christian@brauner.io> wrote:
> > This timing-based testing seems kinda odd to be honest. Can't we do
> > something better than this?
> 
> Agreed. Timing-based tests have a substantial risk of becoming flaky.
> We ought to be able to make these tests fully deterministic and not
> subject to breakage from odd scheduling outcomes. We don't have
> sleepable events for everything, granted, but sleep-waiting on a
> condition with exponential backoff is fine in test code. In general,
> if you start with a robust test, you can insert a sleep(100) anywhere
> and not break the logic. Violating this rule always causes pain sooner
> or later.

I prefer if you can be more specific about how to redesign the test. Please
go through the code and make suggestions there. The tests have not been flaky
in my experience. Some tests do depend on timing like the preemptoff tests,
that can't be helped. Or a performance test that calculates framedrops.

In this case, we want to make sure that the poll unblocks at the right "time"
that is when the non-leader thread exits, and not when the leader thread
exits (test 1), or when the non-leader thread exits and not when the same
non-leader previous did an execve (test 2).

These are inherently timing related. Yes it is true that if this runs in a VM
and if the VM CPU is preempted for a couple seconds, then the test can fail
falsely. Still I would argue such a failure scenario of a multi-second CPU
lock-up can cause more serious issues like RCU stalls, and that's not a test
issue. We can increase the sleep intervals if you want, to reduce the risk of
such scenarios.

I would love to make the test not depend on timing, but I don't know how. And
the tests caught issues that I had in my development flow, so the tests
worked quite well in my experience.

thanks,

 - Joel

Daniel Colascione April 26, 2019, 7:35 p.m. UTC | #7

On Fri, Apr 26, 2019 at 10:26 AM Joel Fernandes <joel@joelfernandes.org> wrote:
> On Thu, Apr 25, 2019 at 03:07:48PM -0700, Daniel Colascione wrote:
> > On Thu, Apr 25, 2019 at 2:29 PM Christian Brauner <christian@brauner.io> wrote:
> > > This timing-based testing seems kinda odd to be honest. Can't we do
> > > something better than this?
> >
> > Agreed. Timing-based tests have a substantial risk of becoming flaky.
> > We ought to be able to make these tests fully deterministic and not
> > subject to breakage from odd scheduling outcomes. We don't have
> > sleepable events for everything, granted, but sleep-waiting on a
> > condition with exponential backoff is fine in test code. In general,
> > if you start with a robust test, you can insert a sleep(100) anywhere
> > and not break the logic. Violating this rule always causes pain sooner
> > or later.
>
> I prefer if you can be more specific about how to redesign the test. Please
> go through the code and make suggestions there. The tests have not been flaky
> in my experience.

You've been running them in an ideal environment.

Some tests do depend on timing like the preemptoff tests,
> that can't be helped. Or a performance test that calculates framedrops.

Performance tests are *about* timing. This is a functional test. Here,
we care about sequencing, not timing, and using a bare sleep instead
of sleeping with a condition check (see below) is always flaky.

> In this case, we want to make sure that the poll unblocks at the right "time"
> that is when the non-leader thread exits, and not when the leader thread
> exits (test 1), or when the non-leader thread exits and not when the same
> non-leader previous did an execve (test 2).

Instead of sleeping, you want to wait for some condition. Right now,
in a bunch of places, the test does something like this:

do_something()
sleep(SOME_TIMEOUT)
check(some_condition())

You can replace each of these clauses with something like this:

do_something()
start_time = now()
while(!some_condition() && now() - start_time < LONG_TIMEOUT)
  sleep(SHORT_DELAY)
check(some_condition())

This way, you're insensitive to timing, up to LONG_TIMEOUT (which can
be something like a minute). Yes, you can always write
sleep(LARGE_TIMEOUT) instead, but a good, robust value of LONG_TIMEOUT
(which should be tens of seconds) would make the test take far too
long to run in the happy case.

Note that this code is fine:

check(!some_condition())
sleep(SOME_REASONABLE_TIMEOUT)
check(!some_condition())

It's okay to sleep for a little while and check that something did
*not* happen, but it's not okay for the test to *fail* due to
scheduling delays. The difference is that
sleeping-and-checking-that-something-didn't-happen can only generate
false negatives when checking for failures, and it's much better from
a code health perspective for a test to sometimes fail to detect a bug
than for it to fire occasionally when there's no bug actually present.

> These are inherently timing related.

No they aren't. We don't care how long these operations take. We only
care that they happen in the right order.

(Well, we do care about performance, but not for the purposes of this
functional test.)

> Yes it is true that if this runs in a VM
> and if the VM CPU is preempted for a couple seconds, then the test can fail
> falsely. Still I would argue such a failure scenario of a multi-second CPU
> lock-up can cause more serious issues like RCU stalls, and that's not a test
> issue. We can increase the sleep intervals if you want, to reduce the risk of
> such scenarios.
>
> I would love to make the test not depend on timing, but I don't know how.

For threads, implement some_condition() above by opening a /proc
directory to the task you want. You can look by death by looking for
zombie status in stat or ESRCH.

If you want to test that poll() actually unblocks on exit (as opposed
to EPOLLIN-ing immediately when the waited process is already dead),
do something like this:

- [Main test thread] Start subprocess, getting a pidfd
- [Subprocess] Wait forever
- [Main test thread] Start a waiter thread
- [Waiter test thread] poll(2) (or epoll, if you insist) on process exit
- [Main test thread] sleep(FAIRLY_SHORT_TIMEOUT)
- [Main test thread] Check that the subprocess is alive
- [Main test thread] pthread_tryjoin_np (make sure the waiter thread
is still alive)
- [Main test thread] Kill the subprocess (or one of its threads, for
testing the leader-exit case)
- [Main test thread] pthread_timedjoin_np(LONG_TIMEOUT) the waiter thread
- [Waiter test thread] poll(2) returns and thread exits
- [Main test thread] pthread_join returns: test succeeds (or the
pthread_timedjoin_np fails with ETIMEOUT, it means poll(2) didn't
unblock, and the test should fail).

Tests that sleep for synchronization *do* end up being flaky. That the
flakiness doesn't show up in local iterative testing doesn't mean that
the test is adequately robust.

Joel Fernandes April 26, 2019, 8:31 p.m. UTC | #8

On Fri, Apr 26, 2019 at 12:35:40PM -0700, Daniel Colascione wrote:
> On Fri, Apr 26, 2019 at 10:26 AM Joel Fernandes <joel@joelfernandes.org> wrote:
> > On Thu, Apr 25, 2019 at 03:07:48PM -0700, Daniel Colascione wrote:
> > > On Thu, Apr 25, 2019 at 2:29 PM Christian Brauner <christian@brauner.io> wrote:
> > > > This timing-based testing seems kinda odd to be honest. Can't we do
> > > > something better than this?
> > >
> > > Agreed. Timing-based tests have a substantial risk of becoming flaky.
> > > We ought to be able to make these tests fully deterministic and not
> > > subject to breakage from odd scheduling outcomes. We don't have
> > > sleepable events for everything, granted, but sleep-waiting on a
> > > condition with exponential backoff is fine in test code. In general,
> > > if you start with a robust test, you can insert a sleep(100) anywhere
> > > and not break the logic. Violating this rule always causes pain sooner
> > > or later.
> >
> > I prefer if you can be more specific about how to redesign the test. Please
> > go through the code and make suggestions there. The tests have not been flaky
> > in my experience.
> 
> You've been running them in an ideal environment.

One would hope for a reliable test environment.

> > In this case, we want to make sure that the poll unblocks at the right "time"
> > that is when the non-leader thread exits, and not when the leader thread
> > exits (test 1), or when the non-leader thread exits and not when the same
> > non-leader previous did an execve (test 2).
> 
> Instead of sleeping, you want to wait for some condition. Right now,
> in a bunch of places, the test does something like this:
> 
> do_something()
> sleep(SOME_TIMEOUT)
> check(some_condition())

No. I don't have anything like "some_condition()". My some_condition() is
just the difference in time.

> 
> You can replace each of these clauses with something like this:
> 
> do_something()
> start_time = now()
> while(!some_condition() && now() - start_time < LONG_TIMEOUT)
>   sleep(SHORT_DELAY)
> check(some_condition())
> 
> This way, you're insensitive to timing, up to LONG_TIMEOUT (which can
> be something like a minute). Yes, you can always write
> sleep(LARGE_TIMEOUT) instead, but a good, robust value of LONG_TIMEOUT
> (which should be tens of seconds) would make the test take far too
> long to run in the happy case.

Yes, but try implementing some_condition()  :-). It is easy to talk in the
abstract, I think it would be more productive if you can come up with an
implementation/patchh of the test itself and send a patch for that. I know
you wrote some pseudocode below, but it is a complex reimplementation that I
don't think will make the test more robust. I mean reading /proc/pid stat?
yuck :) You are welcome to send a patch though if you have a better
implementation.

> Note that this code is fine:
> 
> check(!some_condition())
> sleep(SOME_REASONABLE_TIMEOUT)
> check(!some_condition())
> 
> It's okay to sleep for a little while and check that something did
> *not* happen, but it's not okay for the test to *fail* due to
> scheduling delays. The difference is that

As I said, multi-second scheduling delay are really unacceptable anyway. I
bet many kselftest may fail on a "bad" system like that way, that does not
mean the test is flaky. If there are any reports in the future that the test
fails or is flaky, I am happy to address them at that time. The tests work
and catch bugs reliably as I have seen. We could also make the test task as
RT if scheduling class is a concern.

I don't think its worth bikeshedding about hypothetical issues.

thanks,

 - Joel

[v1,2/2] Add selftests for pidfd polling

Commit Message

Comments

Patch