diff mbox series

KVM: Move VM's worker kthreads back to the original cgroups before exiting.

Message ID 20211214050708.4040200-1-vipinsh@google.com (mailing list archive)
State New, archived
Headers show
Series KVM: Move VM's worker kthreads back to the original cgroups before exiting. | expand

Commit Message

Vipin Sharma Dec. 14, 2021, 5:07 a.m. UTC
VM worker kthreads can linger in the VM process's cgroup for sometime
after KVM temrinates the VM process.

KVM terminates the worker kthreads by calling kthread_stop() which waits
on the signal generated by exit_mm() in do_exit() during kthread's exit.
However, these kthreads are removed from the cgroup using cgroup_exit()
call which happens after exit_mm() in do_exit(). A VM process can
terminate between the time window of exit_mm() to cgroup_exit(), leaving
only worker kthreads in the cgroup.

Moving worker kthreads back to the original cgroup (kthreadd_task's
cgroup) makes sure that cgroup is empty as soon as the main VM process
is terminated.

Signed-off-by: Vipin Sharma <vipinsh@google.com>
---
 virt/kvm/kvm_main.c | 11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)


base-commit: d8f6ef45a623d650f9b97e11553adb4978f6aa70

Comments

kernel test robot Dec. 14, 2021, 3:46 p.m. UTC | #1
Hi Vipin,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on d8f6ef45a623d650f9b97e11553adb4978f6aa70]

url:    https://github.com/0day-ci/linux/commits/Vipin-Sharma/KVM-Move-VM-s-worker-kthreads-back-to-the-original-cgroups-before-exiting/20211214-130827
base:   d8f6ef45a623d650f9b97e11553adb4978f6aa70
config: riscv-defconfig (https://download.01.org/0day-ci/archive/20211214/202112142328.a9ebDmd7-lkp@intel.com/config)
compiler: riscv64-linux-gcc (GCC) 11.2.0
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/0day-ci/linux/commit/fd29d23507ef3f06b61d9de1b7ecd1a0d70136f3
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Vipin-Sharma/KVM-Move-VM-s-worker-kthreads-back-to-the-original-cgroups-before-exiting/20211214-130827
        git checkout fd29d23507ef3f06b61d9de1b7ecd1a0d70136f3
        # save the config file to linux build tree
        mkdir build_dir
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-11.2.0 make.cross O=build_dir ARCH=riscv SHELL=/bin/bash

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All errors (new ones prefixed by >>, old ones prefixed by <<):

>> ERROR: modpost: "kthreadd_task" [arch/riscv/kvm/kvm.ko] undefined!

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org
Lai Jiangshan Dec. 14, 2021, 4:14 p.m. UTC | #2
On Tue, Dec 14, 2021 at 4:13 PM Vipin Sharma <vipinsh@google.com> wrote:
>
> VM worker kthreads can linger in the VM process's cgroup for sometime
> after KVM temrinates the VM process.
>
> KVM terminates the worker kthreads by calling kthread_stop() which waits
> on the signal generated by exit_mm() in do_exit() during kthread's exit.
> However, these kthreads are removed from the cgroup using cgroup_exit()
> call which happens after exit_mm() in do_exit(). A VM process can
> terminate between the time window of exit_mm() to cgroup_exit(), leaving
> only worker kthreads in the cgroup.
>
> Moving worker kthreads back to the original cgroup (kthreadd_task's
> cgroup) makes sure that cgroup is empty as soon as the main VM process
> is terminated.
>
> Signed-off-by: Vipin Sharma <vipinsh@google.com>
> ---
>  virt/kvm/kvm_main.c | 11 ++++++++++-
>  1 file changed, 10 insertions(+), 1 deletion(-)

Hello

Off-topic, can this kvm worker_thread and the thread to do async pagefault
be possibly changed to use something like io_uring's IOWQ (fs/io-wq.c)
created by create_io_thread()?

So that every resource the threads used are credited to the process
of the vm.

Thanks
Lai
Sean Christopherson Dec. 14, 2021, 5:16 p.m. UTC | #3
On Tue, Dec 14, 2021, Vipin Sharma wrote:
> VM worker kthreads can linger in the VM process's cgroup for sometime
> after KVM temrinates the VM process.
> 
> KVM terminates the worker kthreads by calling kthread_stop() which waits
> on the signal generated by exit_mm() in do_exit() during kthread's exit.
> However, these kthreads are removed from the cgroup using cgroup_exit()
> call which happens after exit_mm() in do_exit(). A VM process can
> terminate between the time window of exit_mm() to cgroup_exit(), leaving
> only worker kthreads in the cgroup.
> 
> Moving worker kthreads back to the original cgroup (kthreadd_task's
> cgroup) makes sure that cgroup is empty as soon as the main VM process
> is terminated.
> 
> Signed-off-by: Vipin Sharma <vipinsh@google.com>
> ---
>  virt/kvm/kvm_main.c | 11 ++++++++++-
>  1 file changed, 10 insertions(+), 1 deletion(-)
> 
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index b0f7e6eb00ff..edd304a18f16 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -5785,7 +5785,7 @@ static int kvm_vm_worker_thread(void *context)
>  	init_context = NULL;
>  
>  	if (err)
> -		return err;
> +		goto out;
>  
>  	/* Wait to be woken up by the spawner before proceeding. */
>  	kthread_parkme();
> @@ -5793,6 +5793,15 @@ static int kvm_vm_worker_thread(void *context)
>  	if (!kthread_should_stop())
>  		err = thread_fn(kvm, data);
>  
> +out:
> +	/*
> +	 * We need to move the kthread back to its original cgroups, so that it
> +	 * doesn't linger in the cgroups of the user process after that has
> +	 * already terminated. exit_mm() in do_exit() signals kthread_stop() to
> +	 * return, whereas, removal of the task from the cgroups happens in
> +	 * cgroup_exit() which happens after exit_mm().
> +	 */
> +	WARN_ON(cgroup_attach_task_all(kthreadd_task, current));

As the build bot noted, kthreadd_task isn't exported, and I doubt you'll convince
folks to let you export it.

Why is it problematic for the kthread to linger in the cgroup?  Conceptually, it's
not really wrong.

>  	return err;
>  }
>  
> 
> base-commit: d8f6ef45a623d650f9b97e11553adb4978f6aa70
> -- 
> 2.34.1.173.g76aa8bc2d0-goog
>
kernel test robot Dec. 14, 2021, 5:44 p.m. UTC | #4
Hi Vipin,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on d8f6ef45a623d650f9b97e11553adb4978f6aa70]

url:    https://github.com/0day-ci/linux/commits/Vipin-Sharma/KVM-Move-VM-s-worker-kthreads-back-to-the-original-cgroups-before-exiting/20211214-130827
base:   d8f6ef45a623d650f9b97e11553adb4978f6aa70
config: x86_64-rhel-8.3 (https://download.01.org/0day-ci/archive/20211215/202112150131.MaZ9xOJx-lkp@intel.com/config)
compiler: gcc-9 (Debian 9.3.0-22) 9.3.0
reproduce (this is a W=1 build):
        # https://github.com/0day-ci/linux/commit/fd29d23507ef3f06b61d9de1b7ecd1a0d70136f3
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Vipin-Sharma/KVM-Move-VM-s-worker-kthreads-back-to-the-original-cgroups-before-exiting/20211214-130827
        git checkout fd29d23507ef3f06b61d9de1b7ecd1a0d70136f3
        # save the config file to linux build tree
        mkdir build_dir
        make W=1 O=build_dir ARCH=x86_64 SHELL=/bin/bash

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All errors (new ones prefixed by >>, old ones prefixed by <<):

>> ERROR: modpost: "kthreadd_task" [arch/x86/kvm/kvm.ko] undefined!

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org
Vipin Sharma Dec. 15, 2021, 3:58 a.m. UTC | #5
On Tue, Dec 14, 2021 at 9:16 AM Sean Christopherson <seanjc@google.com> wrote:
>
> On Tue, Dec 14, 2021, Vipin Sharma wrote:
> > +     WARN_ON(cgroup_attach_task_all(kthreadd_task, current));
>
> As the build bot noted, kthreadd_task isn't exported, and I doubt you'll convince
> folks to let you export it.
>
> Why is it problematic for the kthread to linger in the cgroup?  Conceptually, it's
> not really wrong.

Issue comes when a process tries to clear up the resources when a VM
shutdown/dies. The process sometimes get an EBUSY error when it tries
to delete the cgroup directories which were created for that VM. It is
also difficult to know how many times to retry or how much time to
wait before the cgroup is empty. This issue is not always happening.
Paolo Bonzini Dec. 15, 2021, 6:27 p.m. UTC | #6
On 12/14/21 06:07, Vipin Sharma wrote:
> 
> KVM terminates the worker kthreads by calling kthread_stop() which waits
> on the signal generated by exit_mm() in do_exit() during kthread's exit.

Instead of "signal", please spell it as "the 'exited' completion, 
triggered by exit_mm(), via mm_release(), during the kthread's exit". 
That makes things a bit clearer.

So the issue is that the kthread_stop happens around the time 
exit_task_work() destroys the VM, but the process can go on and signal 
its demise to the parent process before the kthread has been completely 
dropped.  Not even close() can fix it, though it may reduce the window 
completely, so I agree that this is a bug and vhost has the same bug too.

Due to the issue with kthreadd_task not being exported, perhaps you can 
change cgroup_attach_task_all to use kthreadd_task if the "from" 
argument is NULL?

Paolo
Vipin Sharma Dec. 22, 2021, 8:12 p.m. UTC | #7
On Wed, Dec 15, 2021 at 10:27 AM Paolo Bonzini <pbonzini@redhat.com> wrote:
>
> So the issue is that the kthread_stop happens around the time
> exit_task_work() destroys the VM, but the process can go on and signal
> its demise to the parent process before the kthread has been completely
> dropped.  Not even close() can fix it, though it may reduce the window
> completely, so I agree that this is a bug and vhost has the same bug too.
>
> Due to the issue with kthreadd_task not being exported, perhaps you can
> change cgroup_attach_task_all to use kthreadd_task if the "from"
> argument is NULL?

Thanks for the suggestion. This also works, I will send out a patch.
Vipin Sharma Dec. 22, 2021, 8:14 p.m. UTC | #8
On Tue, Dec 14, 2021 at 8:14 AM Lai Jiangshan <jiangshanlai@gmail.com> wrote:
>
> Hello
>
> Off-topic, can this kvm worker_thread and the thread to do async pagefault
> be possibly changed to use something like io_uring's IOWQ (fs/io-wq.c)
> created by create_io_thread()?
>
> So that every resource the threads used are credited to the process
> of the vm.
>

Sorry, I am not very familiar with it. Maybe someone from the
community knows better about this.
diff mbox series

Patch

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index b0f7e6eb00ff..edd304a18f16 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -5785,7 +5785,7 @@  static int kvm_vm_worker_thread(void *context)
 	init_context = NULL;
 
 	if (err)
-		return err;
+		goto out;
 
 	/* Wait to be woken up by the spawner before proceeding. */
 	kthread_parkme();
@@ -5793,6 +5793,15 @@  static int kvm_vm_worker_thread(void *context)
 	if (!kthread_should_stop())
 		err = thread_fn(kvm, data);
 
+out:
+	/*
+	 * We need to move the kthread back to its original cgroups, so that it
+	 * doesn't linger in the cgroups of the user process after that has
+	 * already terminated. exit_mm() in do_exit() signals kthread_stop() to
+	 * return, whereas, removal of the task from the cgroups happens in
+	 * cgroup_exit() which happens after exit_mm().
+	 */
+	WARN_ON(cgroup_attach_task_all(kthreadd_task, current));
 	return err;
 }