Message ID | 20220222054848.563321-1-vipinsh@google.com (mailing list archive) |
---|---|
State | New, archived |
Series | [v4] KVM: Move VM's worker kthreads back to the original cgroup before exiting. |
On 2/22/22 06:48, Vipin Sharma wrote:
> VM worker kthreads can linger in the VM process's cgroup for some time
> after KVM terminates the VM process.
>
> KVM terminates the worker kthreads by calling kthread_stop() which waits
> on the 'exited' completion, triggered by exit_mm(), via mm_release(), in
> do_exit() during the kthread's exit. However, these kthreads are
> removed from the cgroup using the cgroup_exit() which happens after the
> exit_mm(). Therefore, a VM process can terminate in between the
> exit_mm() and cgroup_exit() calls, leaving only worker kthreads in the
> cgroup.
>
> Moving worker kthreads back to the original cgroup (kthreadd_task's
> cgroup) makes sure that the cgroup is empty as soon as the main VM
> process is terminated.
>
> Signed-off-by: Vipin Sharma <vipinsh@google.com>
> Suggested-by: Sean Christopherson <seanjc@google.com>
> ---

Queued, thanks.

Paolo

> Thanks Sean, for the example on how to use the real_parent outside of the RCU
> critical region. I wrote your name in Suggested-by, I hope you are fine with
> it and this is the right tag/way to give you the credit.
>
> v4:
> - Read task's real_parent in the RCU critical section.
> - Don't log error message from the cgroup_attach_task_all() API.
>
> v3: https://lore.kernel.org/lkml/20220217061616.3303271-1-vipinsh@google.com/
> - Use 'current->real_parent' (kthreadd_task) in the
>   cgroup_attach_task_all() call.
> - Revert cgroup APIs changes in v2. Now, patch does not touch cgroup
>   APIs.
> - Update commit and comment message
>
> v2: https://lore.kernel.org/lkml/20211222225350.1912249-1-vipinsh@google.com/
> - Use kthreadd_task in the cgroup API to avoid build issue.
>
> v1: https://lore.kernel.org/lkml/20211214050708.4040200-1-vipinsh@google.com/
>
>  virt/kvm/kvm_main.c | 22 +++++++++++++++++++++-
>  1 file changed, 21 insertions(+), 1 deletion(-)
>
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index 83c57bcc6eb6..cdf1fa3c60ae 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -5810,6 +5810,7 @@ static int kvm_vm_worker_thread(void *context)
>  	 * we have to locally copy anything that is needed beyond initialization
>  	 */
>  	struct kvm_vm_worker_thread_context *init_context = context;
> +	struct task_struct *parent;
>  	struct kvm *kvm = init_context->kvm;
>  	kvm_vm_thread_fn_t thread_fn = init_context->thread_fn;
>  	uintptr_t data = init_context->data;
> @@ -5836,7 +5837,7 @@ static int kvm_vm_worker_thread(void *context)
>  	init_context = NULL;
>
>  	if (err)
> -		return err;
> +		goto out;
>
>  	/* Wait to be woken up by the spawner before proceeding. */
>  	kthread_parkme();
> @@ -5844,6 +5845,25 @@ static int kvm_vm_worker_thread(void *context)
>  	if (!kthread_should_stop())
>  		err = thread_fn(kvm, data);
>
> +out:
> +	/*
> +	 * Move kthread back to its original cgroup to prevent it lingering in
> +	 * the cgroup of the VM process, after the latter finishes its
> +	 * execution.
> +	 *
> +	 * kthread_stop() waits on the 'exited' completion condition which is
> +	 * set in exit_mm(), via mm_release(), in do_exit(). However, the
> +	 * kthread is removed from the cgroup in the cgroup_exit() which is
> +	 * called after the exit_mm(). This causes the kthread_stop() to return
> +	 * before the kthread actually quits the cgroup.
> +	 */
> +	rcu_read_lock();
> +	parent = rcu_dereference(current->real_parent);
> +	get_task_struct(parent);
> +	rcu_read_unlock();
> +	cgroup_attach_task_all(parent, current);
> +	put_task_struct(parent);
> +
>  	return err;
>  }
>
>
> base-commit: 1bbc60d0c7e5728aced352e528ef936ebe2344c0
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 83c57bcc6eb6..cdf1fa3c60ae 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -5810,6 +5810,7 @@ static int kvm_vm_worker_thread(void *context)
 	 * we have to locally copy anything that is needed beyond initialization
 	 */
 	struct kvm_vm_worker_thread_context *init_context = context;
+	struct task_struct *parent;
 	struct kvm *kvm = init_context->kvm;
 	kvm_vm_thread_fn_t thread_fn = init_context->thread_fn;
 	uintptr_t data = init_context->data;
@@ -5836,7 +5837,7 @@ static int kvm_vm_worker_thread(void *context)
 	init_context = NULL;
 
 	if (err)
-		return err;
+		goto out;
 
 	/* Wait to be woken up by the spawner before proceeding. */
 	kthread_parkme();
@@ -5844,6 +5845,25 @@ static int kvm_vm_worker_thread(void *context)
 	if (!kthread_should_stop())
 		err = thread_fn(kvm, data);
 
+out:
+	/*
+	 * Move kthread back to its original cgroup to prevent it lingering in
+	 * the cgroup of the VM process, after the latter finishes its
+	 * execution.
+	 *
+	 * kthread_stop() waits on the 'exited' completion condition which is
+	 * set in exit_mm(), via mm_release(), in do_exit(). However, the
+	 * kthread is removed from the cgroup in the cgroup_exit() which is
+	 * called after the exit_mm(). This causes the kthread_stop() to return
+	 * before the kthread actually quits the cgroup.
+	 */
+	rcu_read_lock();
+	parent = rcu_dereference(current->real_parent);
+	get_task_struct(parent);
+	rcu_read_unlock();
+	cgroup_attach_task_all(parent, current);
+	put_task_struct(parent);
+
 	return err;
 }
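The reference handling at the end of the last hunk (load real_parent inside the RCU read-side section, pin it with get_task_struct(), use it after rcu_read_unlock(), then drop it) can be illustrated with a small userspace model. This is only a sketch and not kernel code: every toy_* name is hypothetical, the "read-side section" is a plain counter rather than real RCU, and the model is single-threaded, so it checks the ordering rules and nothing about actual concurrency.

```c
/*
 * Userspace toy model, NOT kernel code. All toy_* names are hypothetical
 * stand-ins: toy_read_lock()/toy_read_unlock() for rcu_read_lock()/
 * rcu_read_unlock(), toy_get()/toy_put() for get_task_struct()/
 * put_task_struct(). Single-threaded on purpose.
 */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_task {
	int refs;	/* reference count */
	bool freed;	/* set when the last reference is dropped */
};

static int toy_read_depth;			/* > 0 inside a read-side section */
static struct toy_task *toy_real_parent;	/* pointer that may be republished */

static void toy_read_lock(void)
{
	toy_read_depth++;
}

static void toy_read_unlock(void)
{
	assert(toy_read_depth > 0);
	toy_read_depth--;
}

/* The shared pointer may only be loaded inside a read-side section. */
static struct toy_task *toy_dereference_parent(void)
{
	assert(toy_read_depth > 0);
	return toy_real_parent;
}

static void toy_get(struct toy_task *t)
{
	assert(!t->freed);
	t->refs++;
}

static void toy_put(struct toy_task *t)
{
	assert(t->refs > 0);
	if (--t->refs == 0)
		t->freed = true;	/* a real implementation would free here */
}

/* Same shape as the patch: load and pin inside the section, use after it. */
static struct toy_task *toy_pin_parent(void)
{
	struct toy_task *parent;

	toy_read_lock();
	parent = toy_dereference_parent();
	toy_get(parent);
	toy_read_unlock();
	return parent;	/* caller must toy_put() when done */
}

/*
 * In this toy, the published pointer owns one reference. Publishing a new
 * parent drops the reference held on behalf of the old one.
 */
static void toy_reparent(struct toy_task *new_parent)
{
	struct toy_task *old = toy_real_parent;

	toy_real_parent = new_parent;
	if (old)
		toy_put(old);
}
```

In this model, a task returned by toy_pin_parent() stays un-freed across a later toy_reparent() until the caller issues the matching toy_put(), which is the property the patch relies on when it calls cgroup_attach_task_all() outside the read-side section.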
VM worker kthreads can linger in the VM process's cgroup for some time
after KVM terminates the VM process.

KVM terminates the worker kthreads by calling kthread_stop() which waits
on the 'exited' completion, triggered by exit_mm(), via mm_release(), in
do_exit() during the kthread's exit. However, these kthreads are
removed from the cgroup using the cgroup_exit() which happens after the
exit_mm(). Therefore, a VM process can terminate in between the
exit_mm() and cgroup_exit() calls, leaving only worker kthreads in the
cgroup.

Moving worker kthreads back to the original cgroup (kthreadd_task's
cgroup) makes sure that the cgroup is empty as soon as the main VM
process is terminated.

Signed-off-by: Vipin Sharma <vipinsh@google.com>
Suggested-by: Sean Christopherson <seanjc@google.com>
---

Thanks Sean, for the example on how to use the real_parent outside of the RCU
critical region. I wrote your name in Suggested-by, I hope you are fine with
it and this is the right tag/way to give you the credit.

v4:
- Read task's real_parent in the RCU critical section.
- Don't log error message from the cgroup_attach_task_all() API.

v3: https://lore.kernel.org/lkml/20220217061616.3303271-1-vipinsh@google.com/
- Use 'current->real_parent' (kthreadd_task) in the
  cgroup_attach_task_all() call.
- Revert cgroup APIs changes in v2. Now, patch does not touch cgroup
  APIs.
- Update commit and comment message

v2: https://lore.kernel.org/lkml/20211222225350.1912249-1-vipinsh@google.com/
- Use kthreadd_task in the cgroup API to avoid build issue.

v1: https://lore.kernel.org/lkml/20211214050708.4040200-1-vipinsh@google.com/

 virt/kvm/kvm_main.c | 22 +++++++++++++++++++++-
 1 file changed, 21 insertions(+), 1 deletion(-)

base-commit: 1bbc60d0c7e5728aced352e528ef936ebe2344c0
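The window described in the commit message can be walked through with a deterministic userspace model. This is a sketch under stated assumptions, not the kernel's exit path: the toy_* names and the two-step exit sequence are hypothetical simplifications that keep only the ordering the commit message describes (the 'exited' completion fires at the exit_mm() step, cgroup removal happens at the later cgroup_exit() step).

```c
/*
 * Userspace toy model, NOT kernel code. All toy_* names are hypothetical.
 * It keeps only the ordering from the commit message: the 'exited'
 * completion fires at the exit_mm() step, and the cgroup is left only at
 * the later cgroup_exit() step.
 */
#include <assert.h>
#include <stdbool.h>

struct toy_kthread {
	bool exited_completed;	/* models the 'exited' completion */
	bool in_vm_cgroup;	/* models membership in the VM's cgroup */
};

enum toy_exit_step {
	TOY_STEP_EXIT_MM,
	TOY_STEP_CGROUP_EXIT,
	TOY_STEP_DONE,
};

/* Run one step of the toy exit path and return the step that follows. */
static enum toy_exit_step toy_do_exit_step(struct toy_kthread *t,
					   enum toy_exit_step step)
{
	switch (step) {
	case TOY_STEP_EXIT_MM:
		/* From here on, a kthread_stop() caller may be woken. */
		t->exited_completed = true;
		return TOY_STEP_CGROUP_EXIT;
	case TOY_STEP_CGROUP_EXIT:
		t->in_vm_cgroup = false;
		return TOY_STEP_DONE;
	default:
		return TOY_STEP_DONE;
	}
}

/* Models the fix: leave the VM's cgroup before starting to exit. */
static void toy_move_to_parent_cgroup(struct toy_kthread *t)
{
	t->in_vm_cgroup = false;
}

/*
 * Returns what a kthread_stop() caller would observe if it ran in the
 * window after the exit_mm() step and before the cgroup_exit() step.
 */
static bool toy_lingers_after_stop(bool apply_fix)
{
	struct toy_kthread t = {
		.exited_completed = false,
		.in_vm_cgroup = true,
	};
	enum toy_exit_step next;

	if (apply_fix)
		toy_move_to_parent_cgroup(&t);

	next = toy_do_exit_step(&t, TOY_STEP_EXIT_MM);
	assert(t.exited_completed);
	assert(next == TOY_STEP_CGROUP_EXIT);

	return t.in_vm_cgroup;
}
```

Without the fix, toy_lingers_after_stop(false) reports the worker still in the VM's cgroup at the moment the stopper is woken. With the fix applied, the worker has already left by then, so the cgroup can be empty as soon as the VM process is gone.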