diff mbox series

Export mm_update_next_owner function for vhost-net

Message ID 1544676445-14897-1-git-send-email-gchen.guomin@gmail.com (mailing list archive)
State New, archived
Headers show
Series Export mm_update_next_owner function for vhost-net | expand

Commit Message

gchen chen Dec. 13, 2018, 4:47 a.m. UTC
From: guomin chen <gchen.guomin@gmail.com>

 Under normal circumstances,When do_exit exits, mm->owner will
 be updated on exit_mm(). but when the kernel process calls
 unuse_mm() and then exits,mm->owner cannot be updated. And it
 will point to a task that has been released.

 Below is my issue on vhost_net:
    A, B are two kernel processes(such as vhost_worker),
    C is a user space process(such as qemu), and all
    three use the mm of the user process C.
    Now, because user process C exits abnormally, the owner of this
    mm becomes A. When A calls unuse_mm and exits, this mm->ower
    still points to the A that has been released.
    When B accesses this mm->owner again, A has been released.

 Process A             Process B
 vhost_worker()        vhost_worker()
  ---------             ---------
  use_mm()              use_mm()
   ...
  unuse_mm()
     tsk->mm=NULL
   do_exit()            page fault
    exit_mm()           access mm->owner
   can't update owner   kernel Oops

                        unuse_mm()

Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: kvm@vger.kernel.org
Cc: virtualization@lists.linux-foundation.org
Cc: netdev@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Sudip Mukherjee <sudipm.mukherjee@gmail.com>
Cc: "Luis R. Rodriguez" <mcgrof@kernel.org>
Cc: Dominik Brodowski <linux@dominikbrodowski.net>
Signed-off-by: guomin chen <gchen.guomin@gmail.com>
---
 drivers/vhost/vhost.c | 1 +
 kernel/exit.c         | 1 +
 2 files changed, 2 insertions(+)

Comments

Jason Wang Dec. 13, 2018, 6:30 a.m. UTC | #1
On 2018/12/13 下午12:47, gchen.guomin@gmail.com wrote:
> From: guomin chen <gchen.guomin@gmail.com>
>
>   Under normal circumstances,When do_exit exits, mm->owner will
>   be updated on exit_mm(). but when the kernel process calls
>   unuse_mm() and then exits,mm->owner cannot be updated. And it
>   will point to a task that has been released.
>
>   Below is my issue on vhost_net:
>      A, B are two kernel processes(such as vhost_worker),
>      C is a user space process(such as qemu), and all
>      three use the mm of the user process C.
>      Now, because user process C exits abnormally, the owner of this
>      mm becomes A. When A calls unuse_mm and exits, this mm->ower
>      still points to the A that has been released.
>      When B accesses this mm->owner again, A has been released.


Could you describe how you reproduce this issue? I believe vhost process 
should exit before process C?


>
>   Process A             Process B
>   vhost_worker()        vhost_worker()
>    ---------             ---------
>    use_mm()              use_mm()
>     ...
>    unuse_mm()
>       tsk->mm=NULL
>     do_exit()            page fault
>      exit_mm()           access mm->owner
>     can't update owner   kernel Oops
>
>                          unuse_mm()
>
> Cc: "Michael S. Tsirkin" <mst@redhat.com>
> Cc: Jason Wang <jasowang@redhat.com>
> Cc: kvm@vger.kernel.org
> Cc: virtualization@lists.linux-foundation.org
> Cc: netdev@vger.kernel.org
> Cc: linux-kernel@vger.kernel.org
> Cc: "Eric W. Biederman" <ebiederm@xmission.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Sudip Mukherjee <sudipm.mukherjee@gmail.com>
> Cc: "Luis R. Rodriguez" <mcgrof@kernel.org>
> Cc: Dominik Brodowski <linux@dominikbrodowski.net>
> Signed-off-by: guomin chen <gchen.guomin@gmail.com>
> ---
>   drivers/vhost/vhost.c | 1 +
>   kernel/exit.c         | 1 +
>   2 files changed, 2 insertions(+)
>
> diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
> index 6b98d8e..7c09087 100644
> --- a/drivers/vhost/vhost.c
> +++ b/drivers/vhost/vhost.c
> @@ -368,6 +368,7 @@ static int vhost_worker(void *data)
>   		}
>   	}
>   	unuse_mm(dev->mm);
> +	mm_update_next_owner(dev->mm);


If you analysis is correct, this is still racy isn't it? (E.g page fault 
happen between unuse_mm() and mm_update_next_owner()).

Thanks


>   	set_fs(oldfs);
>   	return 0;
>   }
> diff --git a/kernel/exit.c b/kernel/exit.c
> index 0e21e6d..9e046dd 100644
> --- a/kernel/exit.c
> +++ b/kernel/exit.c
> @@ -486,6 +486,7 @@ void mm_update_next_owner(struct mm_struct *mm)
>   	task_unlock(c);
>   	put_task_struct(c);
>   }
> +EXPORT_SYMBOL(mm_update_next_owner);
>   #endif /* CONFIG_MEMCG */
>   
>   /*
guominchen(陈国民) Dec. 13, 2018, 9:13 a.m. UTC | #2
>>   Under normal circumstances,When do_exit exits, mm->owner will
>>   be updated on exit_mm(). but when the kernel process calls
>>   unuse_mm() and then exits,mm->owner cannot be updated. And it
>>   will point to a task that has been released.
>>
>>   Below is my issue on vhost_net:
>>      A, B are two kernel processes(such as vhost_worker),
>>      C is a user space process(such as qemu), and all
>>      three use the mm of the user process C.
>>      Now, because user process C exits abnormally, the owner of this
>>      mm becomes A. When A calls unuse_mm and exits, this mm->ower
>>      still points to the A that has been released.
>>      When B accesses this mm->owner again, A has been released.


Thank your for taking a look and apologize for my distrub.

>Could you describe how you reproduce this issue?
Sorry, this issue is hard for my to reproduce, But there is such a critical situation.

>I believe vhost process should exit before process C?
Yes, the A, B will exit before C, because usually C will close the open fd and then exit.
However, if C is abnormally exited, such as killed by some fatal signal, A may exit before C

The current issue flow is as follows:
Process C              Process A         Process B
qemu-system-x86_64:     kernel:vhost_net  kernel: vhost_net
open /dev/vhost-net
  VHOST_SET_OWNER   create kthread vhost-%d  create kthread vhost-%d
  network init           use_mm()          use_mm()
   ...                   ...
   Abnormal exited
   ...
  do_exit
  exit_mm()
  update mm->owner to A
  exit_files()
   close_files()
   kthread_should_stop() unuse_mm()
    Stop Process A       tsk->mm=NULL
                         do_exit()
                         can't update owner
                         A exit completed   vhost-%d  rcv first package
                                            vhost-%d build rcv buffer for vq
                                            page fault
                                            access mm & mm->owner
                                            NOW,mm->owner still pointer A
                                            kernel NULL pointer at mem_cgroup_from_task()
    stop Process B

>>
>> Cc: "Michael S. Tsirkin" <mst@redhat.com>
>> Cc: Jason Wang <jasowang@redhat.com>
>> Cc: kvm@vger.kernel.org
>> Cc: virtualization@lists.linux-foundation.org
>> Cc: netdev@vger.kernel.org
>> Cc: linux-kernel@vger.kernel.org
>> Cc: "Eric W. Biederman" <ebiederm@xmission.com>
>> Cc: Andrew Morton <akpm@linux-foundation.org>
>> Cc: Sudip Mukherjee <sudipm.mukherjee@gmail.com>
>> Cc: "Luis R. Rodriguez" <mcgrof@kernel.org>
>> Cc: Dominik Brodowski <linux@dominikbrodowski.net>
>> Signed-off-by: guomin chen <gchen.guomin@gmail.com>
>> ---
>>   drivers/vhost/vhost.c | 1 +
>>   kernel/exit.c         | 1 +
>>   2 files changed, 2 insertions(+)
>>
>> diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
>> index 6b98d8e..7c09087 100644
>> --- a/drivers/vhost/vhost.c
>> +++ b/drivers/vhost/vhost.c
>> @@ -368,6 +368,7 @@ static int vhost_worker(void *data)
>>   		}
>>   	}
>>   	unuse_mm(dev->mm);
>> +	mm_update_next_owner(dev->mm);


>If you analysis is correct, this is still racy isn't it? (E.g page fault 
>happen between unuse_mm() and mm_update_next_owner()).

No, I think this is not racy. 
When page fault happend Between unuse_mm() and mm_update_next_owner(), Although tsk->mm =NULL, 
But tsk has not exited, So mm->onwer = tsk can still be accessed.  

Thanks and regards
Christoph Hellwig Dec. 17, 2018, 11:32 a.m. UTC | #3
This seems like an issue all the unuse_mm users (at least those
outside of swapfile.c) have, so it should be solved in the core.

Bonus points for moving the set_fs magic into use_mm()..
Michael S. Tsirkin Dec. 18, 2018, 1:41 a.m. UTC | #4
On Tue, Dec 18, 2018 at 09:39:16AM +0800, gchen chen wrote:
> Yes, I think so. 
> and i think the point is that unuse_mm() can't directly set tsk->mm=NULL.

So why can't unuse_mm call mm_update_next_owner?
diff mbox series

Patch

diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index 6b98d8e..7c09087 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -368,6 +368,7 @@  static int vhost_worker(void *data)
 		}
 	}
 	unuse_mm(dev->mm);
+	mm_update_next_owner(dev->mm);
 	set_fs(oldfs);
 	return 0;
 }
diff --git a/kernel/exit.c b/kernel/exit.c
index 0e21e6d..9e046dd 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -486,6 +486,7 @@  void mm_update_next_owner(struct mm_struct *mm)
 	task_unlock(c);
 	put_task_struct(c);
 }
+EXPORT_SYMBOL(mm_update_next_owner);
 #endif /* CONFIG_MEMCG */
 
 /*