diff mbox series

[v9,8/8] task: rust: rework how current is accessed

Message ID 20241122-vma-v9-8-7127bfcdd54e@google.com (mailing list archive)
State New
Headers show
Series Rust support for mm_struct, vm_area_struct, and mmap | expand

Commit Message

Alice Ryhl Nov. 22, 2024, 3:40 p.m. UTC
Introduce a new type called `CurrentTask` that lets you perform various
operations that are only safe on the `current` task. Use the new type to
provide a way to access the current mm without incrementing its
refcount.

With this change, you can write stuff such as

	let vma = current!().mm().lock_vma_under_rcu(addr);

without incrementing any refcounts.

Signed-off-by: Alice Ryhl <aliceryhl@google.com>
---
Reviewers: Does accessing task->mm on a non-current task require rcu
protection?

Christian: If you submit the PidNamespace abstractions this cycle, I'll
update this to also apply to PidNamespace.
---
 rust/kernel/mm.rs   | 22 ------------------
 rust/kernel/task.rs | 64 ++++++++++++++++++++++++++++++++++++++++++-----------
 2 files changed, 51 insertions(+), 35 deletions(-)

Comments

Alice Ryhl Nov. 22, 2024, 3:53 p.m. UTC | #1
On Fri, Nov 22, 2024 at 4:41 PM Alice Ryhl <aliceryhl@google.com> wrote:
>
> Introduce a new type called `CurrentTask` that lets you perform various
> operations that are only safe on the `current` task. Use the new type to
> provide a way to access the current mm without incrementing its
> refcount.
>
> With this change, you can write stuff such as
>
>         let vma = current!().mm().lock_vma_under_rcu(addr);
>
> without incrementing any refcounts.
>
> Signed-off-by: Alice Ryhl <aliceryhl@google.com>

Oh, that's awkward, I was testing this change using a config file that
was missing CONFIG_RUST=y, so it didn't compile the code at all. You
need the following imports for this to work:

diff --git a/rust/kernel/task.rs b/rust/kernel/task.rs
index 103d235eb844..60659076997a 100644
--- a/rust/kernel/task.rs
+++ b/rust/kernel/task.rs
@@ -7,7 +7,8 @@
 use crate::{
     bindings,
     ffi::{c_int, c_long, c_uint},
-    types::{NotThreadSafe, Opaque},
+    mm::MmWithUser,
+    types::Opaque,
 };
 use core::{
     cmp::{Eq, PartialEq},

Otherwise the code should be correct. You can fetch the tree with this fixed at:
https://github.com/Darksonn/linux/commits/b4/vma/

I'll fix it in the next version, but I will wait for review before I send that.

Alice
Lorenzo Stoakes Nov. 22, 2024, 5:34 p.m. UTC | #2
On Fri, Nov 22, 2024 at 04:53:58PM +0100, Alice Ryhl wrote:
> On Fri, Nov 22, 2024 at 4:41 PM Alice Ryhl <aliceryhl@google.com> wrote:
> >
> > Introduce a new type called `CurrentTask` that lets you perform various
> > operations that are only safe on the `current` task. Use the new type to
> > provide a way to access the current mm without incrementing its
> > refcount.
> >
> > With this change, you can write stuff such as
> >
> >         let vma = current!().mm().lock_vma_under_rcu(addr);
> >
> > without incrementing any refcounts.
> >
> > Signed-off-by: Alice Ryhl <aliceryhl@google.com>
>
> Oh, that's awkward, I was testing this change using a config file that
> was missing CONFIG_RUST=y, so it didn't compile the code at all. You
> need the following imports for this to work:
>
> diff --git a/rust/kernel/task.rs b/rust/kernel/task.rs
> index 103d235eb844..60659076997a 100644
> --- a/rust/kernel/task.rs
> +++ b/rust/kernel/task.rs
> @@ -7,7 +7,8 @@
>  use crate::{
>      bindings,
>      ffi::{c_int, c_long, c_uint},
> -    types::{NotThreadSafe, Opaque},
> +    mm::MmWithUser,
> +    types::Opaque,
>  };
>  use core::{
>      cmp::{Eq, PartialEq},
>
> Otherwise the code should be correct. You can fetch the tree with this fixed at:
> https://github.com/Darksonn/linux/commits/b4/vma/
>
> I'll fix it in the next version, but I will wait for review before I send that.

Sure, no problem, we can just make this a predicate for the ack.

>
> Alice
Lorenzo Stoakes Nov. 22, 2024, 5:54 p.m. UTC | #3
On Fri, Nov 22, 2024 at 03:40:33PM +0000, Alice Ryhl wrote:
> Introduce a new type called `CurrentTask` that lets you perform various
> operations that are only safe on the `current` task. Use the new type to
> provide a way to access the current mm without incrementing its
> refcount.

Nice!

>
> With this change, you can write stuff such as
>
> 	let vma = current!().mm().lock_vma_under_rcu(addr);
>
> without incrementing any refcounts.
>
> Signed-off-by: Alice Ryhl <aliceryhl@google.com>

On assumption that the problem you reference with the rust imports is
corrected in v10, and that what you are doing with current_raw() is
sensible, then:

Acked-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>

Thanks!

> ---
> Reviewers: Does accessing task->mm on a non-current task require rcu
> protection?

Hm I am not actually sure, but it seems like you probably do, and I would say
you need the task lock right?

Looking at find_lock_task_mm() as used by the oomk for instance suggests as much.

>
> Christian: If you submit the PidNamespace abstractions this cycle, I'll
> update this to also apply to PidNamespace.
> ---
>  rust/kernel/mm.rs   | 22 ------------------
>  rust/kernel/task.rs | 64 ++++++++++++++++++++++++++++++++++++++++++-----------
>  2 files changed, 51 insertions(+), 35 deletions(-)
>
> diff --git a/rust/kernel/mm.rs b/rust/kernel/mm.rs
> index 50f4861ae4b9..f7d1079391ef 100644
> --- a/rust/kernel/mm.rs
> +++ b/rust/kernel/mm.rs
> @@ -142,28 +142,6 @@ fn deref(&self) -> &MmWithUser {
>
>  // These methods are safe to call even if `mm_users` is zero.
>  impl Mm {
> -    /// Call `mmgrab` on `current.mm`.
> -    #[inline]
> -    pub fn mmgrab_current() -> Option<ARef<Mm>> {
> -        // SAFETY: It's safe to get the `mm` field from current.
> -        let mm = unsafe {
> -            let current = bindings::get_current();
> -            (*current).mm
> -        };
> -
> -        if mm.is_null() {
> -            return None;
> -        }
> -
> -        // SAFETY: The value of `current->mm` is guaranteed to be null or a valid `mm_struct`. We
> -        // just checked that it's not null. Furthermore, the returned `&Mm` is valid only for the
> -        // duration of this function, and `current->mm` will stay valid for that long.
> -        let mm = unsafe { Mm::from_raw(mm) };
> -
> -        // This increments the refcount using `mmgrab`.
> -        Some(ARef::from(mm))
> -    }
> -

It's nice to drop this to discourage the unusual thing of grabbing current's mm
and incrementing reference count.

>      /// Returns a raw pointer to the inner `mm_struct`.
>      #[inline]
>      pub fn as_raw(&self) -> *mut bindings::mm_struct {
> diff --git a/rust/kernel/task.rs b/rust/kernel/task.rs
> index 9e59d86da42d..103d235eb844 100644
> --- a/rust/kernel/task.rs
> +++ b/rust/kernel/task.rs
> @@ -94,6 +94,26 @@ unsafe impl Send for Task {}
>  // synchronised by C code (e.g., `signal_pending`).
>  unsafe impl Sync for Task {}
>
> +/// Represents a [`Task`] obtained from the `current` global.
> +///
> +/// This type exists to provide more efficient operations that are only valid on the current task.
> +/// For example, to retrieve the pid-namespace of a task, you must use rcu protection unless it is
> +/// the current task.
> +///
> +/// # Invariants
> +///
> +/// Must be equal to `current` of some thread that is currently running somewhere.
> +pub struct CurrentTask(Task);

Nice, I do like the ability to express abstractions like this...

> +
> +// Make all `Task` methods available on `CurrentTask`.
> +impl Deref for CurrentTask {
> +    type Target = Task;
> +    #[inline]
> +    fn deref(&self) -> &Task {
> +        &self.0
> +    }
> +}
> +

It's nice to be able to 'alias' types like this too (ok I'm sure that's not
quite the write way of describing but you know what I mean), so you can abstract
something, then very simply create variants that have the same methods but
different attributes otherwise.

>  /// The type of process identifiers (PIDs).
>  type Pid = bindings::pid_t;
>
> @@ -121,27 +141,25 @@ pub fn current_raw() -> *mut bindings::task_struct {
>      /// # Safety
>      ///
>      /// Callers must ensure that the returned object doesn't outlive the current task/thread.
> -    pub unsafe fn current() -> impl Deref<Target = Task> {
> -        struct TaskRef<'a> {
> -            task: &'a Task,
> -            _not_send: NotThreadSafe,
> +    pub unsafe fn current() -> impl Deref<Target = CurrentTask> {
> +        struct TaskRef {
> +            task: *const CurrentTask,
>          }

Why do we drop the NotThreadSafe bit here? And it seems like the 'a lifetime
stuff has gone too?

I'm guessing the lifetime stuff is because of the SAFETY comment below about
assumptions about lifetime?

>
> -        impl Deref for TaskRef<'_> {
> -            type Target = Task;
> +        impl Deref for TaskRef {
> +            type Target = CurrentTask;
>
>              fn deref(&self) -> &Self::Target {
> -                self.task
> +                // SAFETY: The returned reference borrows from this `TaskRef`, so it cannot outlive
> +                // the `TaskRef`, which the caller of `Task::current()` has promised will not
> +                // outlive the task/thread for which `self.task` is the `current` pointer. Thus, it
> +                // is okay to return a `CurrentTask` reference here.
> +                unsafe { &*self.task }
>              }
>          }
>
> -        let current = Task::current_raw();
>          TaskRef {
> -            // SAFETY: If the current thread is still running, the current task is valid. Given
> -            // that `TaskRef` is not `Send`, we know it cannot be transferred to another thread
> -            // (where it could potentially outlive the caller).
> -            task: unsafe { &*current.cast() },
> -            _not_send: NotThreadSafe,
> +            task: Task::current_raw().cast(),
>          }

I guess these changes align with the changes above?

>      }
>
> @@ -203,6 +221,26 @@ pub fn wake_up(&self) {
>      }
>  }
>
> +impl CurrentTask {
> +    /// Access the address space of this task.
> +    ///
> +    /// To increment the refcount of the referenced `mm`, you can use `ARef::from`.
> +    #[inline]
> +    pub fn mm(&self) -> Option<&MmWithUser> {
> +        let mm = unsafe { (*self.as_ptr()).mm };
> +
> +        if mm.is_null() {
> +            None
> +        } else {
> +            // SAFETY: If `current->mm` is non-null, then it references a valid mm with a non-zero
> +            // value of `mm_users`. The returned `&MmWithUser` borrows from `CurrentTask`, so the
> +            // `&MmWithUser` cannot escape the current task, meaning `mm_users` can't reach zero
> +            // while the reference is still live.
> +            Some(unsafe { MmWithUser::from_raw(mm) })
> +        }
> +    }
> +}

Nice!

> +
>  // SAFETY: The type invariants guarantee that `Task` is always refcounted.
>  unsafe impl crate::types::AlwaysRefCounted for Task {
>      fn inc_ref(&self) {
>
> --
> 2.47.0.371.ga323438b13-goog
>
Boqun Feng Nov. 22, 2024, 6:03 p.m. UTC | #4
On Fri, Nov 22, 2024 at 03:40:33PM +0000, Alice Ryhl wrote:
> Introduce a new type called `CurrentTask` that lets you perform various
> operations that are only safe on the `current` task. Use the new type to
> provide a way to access the current mm without incrementing its
> refcount.
> 
> With this change, you can write stuff such as
> 
> 	let vma = current!().mm().lock_vma_under_rcu(addr);
> 
> without incrementing any refcounts.
> 
> Signed-off-by: Alice Ryhl <aliceryhl@google.com>
> ---
> Reviewers: Does accessing task->mm on a non-current task require rcu
> protection?
> 
> Christian: If you submit the PidNamespace abstractions this cycle, I'll
> update this to also apply to PidNamespace.
> ---
>  rust/kernel/mm.rs   | 22 ------------------
>  rust/kernel/task.rs | 64 ++++++++++++++++++++++++++++++++++++++++++-----------
>  2 files changed, 51 insertions(+), 35 deletions(-)
> 
> diff --git a/rust/kernel/mm.rs b/rust/kernel/mm.rs
> index 50f4861ae4b9..f7d1079391ef 100644
> --- a/rust/kernel/mm.rs
> +++ b/rust/kernel/mm.rs
> @@ -142,28 +142,6 @@ fn deref(&self) -> &MmWithUser {
>  
>  // These methods are safe to call even if `mm_users` is zero.
>  impl Mm {
> -    /// Call `mmgrab` on `current.mm`.
> -    #[inline]
> -    pub fn mmgrab_current() -> Option<ARef<Mm>> {
> -        // SAFETY: It's safe to get the `mm` field from current.
> -        let mm = unsafe {
> -            let current = bindings::get_current();
> -            (*current).mm
> -        };
> -
> -        if mm.is_null() {
> -            return None;
> -        }
> -
> -        // SAFETY: The value of `current->mm` is guaranteed to be null or a valid `mm_struct`. We
> -        // just checked that it's not null. Furthermore, the returned `&Mm` is valid only for the
> -        // duration of this function, and `current->mm` will stay valid for that long.
> -        let mm = unsafe { Mm::from_raw(mm) };
> -
> -        // This increments the refcount using `mmgrab`.
> -        Some(ARef::from(mm))
> -    }
> -
>      /// Returns a raw pointer to the inner `mm_struct`.
>      #[inline]
>      pub fn as_raw(&self) -> *mut bindings::mm_struct {
> diff --git a/rust/kernel/task.rs b/rust/kernel/task.rs
> index 9e59d86da42d..103d235eb844 100644
> --- a/rust/kernel/task.rs
> +++ b/rust/kernel/task.rs
> @@ -94,6 +94,26 @@ unsafe impl Send for Task {}
>  // synchronised by C code (e.g., `signal_pending`).
>  unsafe impl Sync for Task {}
>  
> +/// Represents a [`Task`] obtained from the `current` global.
> +///
> +/// This type exists to provide more efficient operations that are only valid on the current task.
> +/// For example, to retrieve the pid-namespace of a task, you must use rcu protection unless it is
> +/// the current task.
> +///
> +/// # Invariants
> +///
> +/// Must be equal to `current` of some thread that is currently running somewhere.
> +pub struct CurrentTask(Task);
> +

I think you need to make `CurrentTask` `!Sync`, right? Otherwise, other
threads can access the shared reference of a `CurrentTask` and the ->mm
field. I'm thinking if we have a scoped thread/workqueue support in the
future:

	let task = current!();
	Workqueue::scoped(|s| {
	    s.spawn(|| {
	        let mm = task.mm();
		// do something with the mm
	    });
	});

> +// Make all `Task` methods available on `CurrentTask`.
> +impl Deref for CurrentTask {
> +    type Target = Task;
> +    #[inline]
> +    fn deref(&self) -> &Task {
> +        &self.0
> +    }
> +}
> +
>  /// The type of process identifiers (PIDs).
>  type Pid = bindings::pid_t;
>  
> @@ -121,27 +141,25 @@ pub fn current_raw() -> *mut bindings::task_struct {
>      /// # Safety
>      ///
>      /// Callers must ensure that the returned object doesn't outlive the current task/thread.
> -    pub unsafe fn current() -> impl Deref<Target = Task> {
> -        struct TaskRef<'a> {
> -            task: &'a Task,
> -            _not_send: NotThreadSafe,
> +    pub unsafe fn current() -> impl Deref<Target = CurrentTask> {
> +        struct TaskRef {
> +            task: *const CurrentTask,
>          }
>  
> -        impl Deref for TaskRef<'_> {
> -            type Target = Task;
> +        impl Deref for TaskRef {
> +            type Target = CurrentTask;
>  
>              fn deref(&self) -> &Self::Target {
> -                self.task
> +                // SAFETY: The returned reference borrows from this `TaskRef`, so it cannot outlive
> +                // the `TaskRef`, which the caller of `Task::current()` has promised will not
> +                // outlive the task/thread for which `self.task` is the `current` pointer. Thus, it
> +                // is okay to return a `CurrentTask` reference here.
> +                unsafe { &*self.task }
>              }
>          }
>  
> -        let current = Task::current_raw();
>          TaskRef {
> -            // SAFETY: If the current thread is still running, the current task is valid. Given
> -            // that `TaskRef` is not `Send`, we know it cannot be transferred to another thread
> -            // (where it could potentially outlive the caller).
> -            task: unsafe { &*current.cast() },
> -            _not_send: NotThreadSafe,
> +            task: Task::current_raw().cast(),
>          }
>      }
>  
> @@ -203,6 +221,26 @@ pub fn wake_up(&self) {
>      }
>  }
>  
> +impl CurrentTask {
> +    /// Access the address space of this task.
> +    ///
> +    /// To increment the refcount of the referenced `mm`, you can use `ARef::from`.
> +    #[inline]
> +    pub fn mm(&self) -> Option<&MmWithUser> {

Hmm... similar issue, `MmWithUser` is `Sync`.

> +        let mm = unsafe { (*self.as_ptr()).mm };
> +
> +        if mm.is_null() {
> +            None
> +        } else {
> +            // SAFETY: If `current->mm` is non-null, then it references a valid mm with a non-zero
> +            // value of `mm_users`. The returned `&MmWithUser` borrows from `CurrentTask`, so the
> +            // `&MmWithUser` cannot escape the current task, meaning `mm_users` can't reach zero
> +            // while the reference is still live.

Regards,
Boqun

> +            Some(unsafe { MmWithUser::from_raw(mm) })
> +        }
> +    }
> +}
> +
>  // SAFETY: The type invariants guarantee that `Task` is always refcounted.
>  unsafe impl crate::types::AlwaysRefCounted for Task {
>      fn inc_ref(&self) {
> 
> -- 
> 2.47.0.371.ga323438b13-goog
>
Alice Ryhl Nov. 22, 2024, 6:48 p.m. UTC | #5
On Fri, Nov 22, 2024 at 7:03 PM Boqun Feng <boqun.feng@gmail.com> wrote:
>
> On Fri, Nov 22, 2024 at 03:40:33PM +0000, Alice Ryhl wrote:
> > +/// Represents a [`Task`] obtained from the `current` global.
> > +///
> > +/// This type exists to provide more efficient operations that are only valid on the current task.
> > +/// For example, to retrieve the pid-namespace of a task, you must use rcu protection unless it is
> > +/// the current task.
> > +///
> > +/// # Invariants
> > +///
> > +/// Must be equal to `current` of some thread that is currently running somewhere.
> > +pub struct CurrentTask(Task);
> > +
>
> I think you need to make `CurrentTask` `!Sync`, right? Otherwise, other
> threads can access the shared reference of a `CurrentTask` and the ->mm
> field. I'm thinking if we have a scoped thread/workqueue support in the
> future:
>
>         let task = current!();
>         Workqueue::scoped(|s| {
>             s.spawn(|| {
>                 let mm = task.mm();
>                 // do something with the mm
>             });
>         });

I don't think this is a problem? As long as a thread exists somewhere
with `current` being equal to the task, we should be fine?

> > +impl CurrentTask {
> > +    /// Access the address space of this task.
> > +    ///
> > +    /// To increment the refcount of the referenced `mm`, you can use `ARef::from`.
> > +    #[inline]
> > +    pub fn mm(&self) -> Option<&MmWithUser> {
>
> Hmm... similar issue, `MmWithUser` is `Sync`.

What is the problem with that?

> > +        let mm = unsafe { (*self.as_ptr()).mm };
> > +
> > +        if mm.is_null() {
> > +            None
> > +        } else {
> > +            // SAFETY: If `current->mm` is non-null, then it references a valid mm with a non-zero
> > +            // value of `mm_users`. The returned `&MmWithUser` borrows from `CurrentTask`, so the
> > +            // `&MmWithUser` cannot escape the current task, meaning `mm_users` can't reach zero
> > +            // while the reference is still live.
>
> Regards,
> Boqun
>
> > +            Some(unsafe { MmWithUser::from_raw(mm) })
> > +        }
> > +    }
> > +}
> > +
> >  // SAFETY: The type invariants guarantee that `Task` is always refcounted.
> >  unsafe impl crate::types::AlwaysRefCounted for Task {
> >      fn inc_ref(&self) {
> >
> > --
> > 2.47.0.371.ga323438b13-goog
> >
Alice Ryhl Nov. 22, 2024, 6:51 p.m. UTC | #6
On Fri, Nov 22, 2024 at 6:55 PM Lorenzo Stoakes
<lorenzo.stoakes@oracle.com> wrote:
>
> On Fri, Nov 22, 2024 at 03:40:33PM +0000, Alice Ryhl wrote:
> > Introduce a new type called `CurrentTask` that lets you perform various
> > operations that are only safe on the `current` task. Use the new type to
> > provide a way to access the current mm without incrementing its
> > refcount.
>
> Nice!
>
> >
> > With this change, you can write stuff such as
> >
> >       let vma = current!().mm().lock_vma_under_rcu(addr);
> >
> > without incrementing any refcounts.
> >
> > Signed-off-by: Alice Ryhl <aliceryhl@google.com>
>
> On assumption that the problem you reference with the rust imports is
> corrected in v10, and that what you are doing with current_raw() is
> sensible, then:
>
> Acked-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
>
> Thanks!
>
> > ---
> > Reviewers: Does accessing task->mm on a non-current task require rcu
> > protection?
>
> Hm I am not actually sure, but it seems like you probably do, and I would say
> you need the task lock right?
>
> Looking at find_lock_task_mm() as used by the oomk for instance suggests as much.

Okay, sounds complicated. I'm not going to bother with that right now.

> >  /// The type of process identifiers (PIDs).
> >  type Pid = bindings::pid_t;
> >
> > @@ -121,27 +141,25 @@ pub fn current_raw() -> *mut bindings::task_struct {
> >      /// # Safety
> >      ///
> >      /// Callers must ensure that the returned object doesn't outlive the current task/thread.
> > -    pub unsafe fn current() -> impl Deref<Target = Task> {
> > -        struct TaskRef<'a> {
> > -            task: &'a Task,
> > -            _not_send: NotThreadSafe,
> > +    pub unsafe fn current() -> impl Deref<Target = CurrentTask> {
> > +        struct TaskRef {
> > +            task: *const CurrentTask,
> >          }
>
> Why do we drop the NotThreadSafe bit here? And it seems like the 'a lifetime
> stuff has gone too?
>
> I'm guessing the lifetime stuff is because of the SAFETY comment below about
> assumptions about lifetime?

I dropped the lifetime because it's not doing anything. As for NotThreadSafe:

1. See thread with Boqun.
2. Raw pointers are already considered not thread safe by default, so
the *const CurrentTask field has the same effect.

Alice
Boqun Feng Nov. 22, 2024, 7:17 p.m. UTC | #7
On Fri, Nov 22, 2024 at 07:48:16PM +0100, Alice Ryhl wrote:
> On Fri, Nov 22, 2024 at 7:03 PM Boqun Feng <boqun.feng@gmail.com> wrote:
> >
> > On Fri, Nov 22, 2024 at 03:40:33PM +0000, Alice Ryhl wrote:
> > > +/// Represents a [`Task`] obtained from the `current` global.
> > > +///
> > > +/// This type exists to provide more efficient operations that are only valid on the current task.
> > > +/// For example, to retrieve the pid-namespace of a task, you must use rcu protection unless it is
> > > +/// the current task.
> > > +///
> > > +/// # Invariants
> > > +///
> > > +/// Must be equal to `current` of some thread that is currently running somewhere.
> > > +pub struct CurrentTask(Task);
> > > +
> >
> > I think you need to make `CurrentTask` `!Sync`, right? Otherwise, other
> > threads can access the shared reference of a `CurrentTask` and the ->mm
> > field. I'm thinking if we have a scoped thread/workqueue support in the
> > future:
> >
> >         let task = current!();
> >         Workqueue::scoped(|s| {
> >             s.spawn(|| {
> >                 let mm = task.mm();
> >                 // do something with the mm
> >             });
> >         });
> 
> I don't think this is a problem? As long as a thread exists somewhere
> with `current` being equal to the task, we should be fine?
> 

I think I had a misunderstanding on what you meant by "operations
that are only valid on the current task", you mean these operations can
be run by other threads, but it has to be *on* a task_struct that's
"currently running", right? BTW, you probably want to reword a bit,
because the "current" task may be blocked, so technically it's not
"running".

Basically, the operations that `CurrentTask` have are the methods that
are safe to call (even on a different thread) for the "current" task, as
long as it exists (not dead or exited). In that definition, not being
`Sync` is fine.

But I have to admit I'm a bit worried that people may be confused, and
add new methods that can be only run by the current thread in the
future.

> > > +impl CurrentTask {
> > > +    /// Access the address space of this task.
> > > +    ///
> > > +    /// To increment the refcount of the referenced `mm`, you can use `ARef::from`.
> > > +    #[inline]
> > > +    pub fn mm(&self) -> Option<&MmWithUser> {
> >
> > Hmm... similar issue, `MmWithUser` is `Sync`.
> 
> What is the problem with that?
> 

It should be no problem under your definition of `CurrentTask`.

Regards,
Boqun

> > > +        let mm = unsafe { (*self.as_ptr()).mm };
> > > +
> > > +        if mm.is_null() {
> > > +            None
> > > +        } else {
> > > +            // SAFETY: If `current->mm` is non-null, then it references a valid mm with a non-zero
> > > +            // value of `mm_users`. The returned `&MmWithUser` borrows from `CurrentTask`, so the
> > > +            // `&MmWithUser` cannot escape the current task, meaning `mm_users` can't reach zero
> > > +            // while the reference is still live.
> >
> > Regards,
> > Boqun
> >
> > > +            Some(unsafe { MmWithUser::from_raw(mm) })
> > > +        }
> > > +    }
> > > +}
> > > +
> > >  // SAFETY: The type invariants guarantee that `Task` is always refcounted.
> > >  unsafe impl crate::types::AlwaysRefCounted for Task {
> > >      fn inc_ref(&self) {
> > >
> > > --
> > > 2.47.0.371.ga323438b13-goog
> > >
Matthew Wilcox (Oracle) Nov. 22, 2024, 7:30 p.m. UTC | #8
On Fri, Nov 22, 2024 at 11:17:15AM -0800, Boqun Feng wrote:
> > I don't think this is a problem? As long as a thread exists somewhere
> > with `current` being equal to the task, we should be fine?
> > 
> 
> I think I had a misunderstanding on what you meant by "operations
> that are only valid on the current task", you mean these operations can
> be run by other threads, but it has to be *on* a task_struct that's
> "currently running", right? BTW, you probably want to reword a bit,
> because the "current" task may be blocked, so technically it's not
> "running".
> 
> Basically, the operations that `CurrentTask` have are the methods that
> are safe to call (even on a different thread) for the "current" task, as
> long as it exists (not dead or exited). In that definition, not being
> `Sync` is fine.
> 
> But I have to admit I'm a bit worried that people may be confused, and
> add new methods that can be only run by the current thread in the
> future.

I agree, I think CurrentTask should refer to "current".  Or we'll
confuse everyone.  Would ActiveTask be a good name for this CurrentTask?
Alice Ryhl Nov. 22, 2024, 7:43 p.m. UTC | #9
On Fri, Nov 22, 2024 at 8:30 PM Matthew Wilcox <willy@infradead.org> wrote:
>
> On Fri, Nov 22, 2024 at 11:17:15AM -0800, Boqun Feng wrote:
> > > I don't think this is a problem? As long as a thread exists somewhere
> > > with `current` being equal to the task, we should be fine?
> > >
> >
> > I think I had a misunderstanding on what you meant by "operations
> > that are only valid on the current task", you mean these operations can
> > be run by other threads, but it has to be *on* a task_struct that's
> > "currently running", right? BTW, you probably want to reword a bit,
> > because the "current" task may be blocked, so technically it's not
> > "running".
> >
> > Basically, the operations that `CurrentTask` have are the methods that
> > are safe to call (even on a different thread) for the "current" task, as
> > long as it exists (not dead or exited). In that definition, not being
> > `Sync` is fine.
> >
> > But I have to admit I'm a bit worried that people may be confused, and
> > add new methods that can be only run by the current thread in the
> > future.
>
> I agree, I think CurrentTask should refer to "current".  Or we'll
> confuse everyone.  Would ActiveTask be a good name for this CurrentTask?

I mean, it does refer to current. Any time you have a `&CurrentTask`,
then you know that you got the pointer by reading the value of
`current`, and that the task you got it from hasn't returned to
userspace (or otherwise exited) yet.

But the name ActiveTask also makes sense I guess.

Alice
Matthew Wilcox (Oracle) Nov. 22, 2024, 7:54 p.m. UTC | #10
On Fri, Nov 22, 2024 at 08:43:33PM +0100, Alice Ryhl wrote:
> On Fri, Nov 22, 2024 at 8:30 PM Matthew Wilcox <willy@infradead.org> wrote:
> >
> > On Fri, Nov 22, 2024 at 11:17:15AM -0800, Boqun Feng wrote:
> > > > I don't think this is a problem? As long as a thread exists somewhere
> > > > with `current` being equal to the task, we should be fine?
> > > >
> > >
> > > I think I had a misunderstanding on what you meant by "operations
> > > that are only valid on the current task", you mean these operations can
> > > be run by other threads, but it has to be *on* a task_struct that's
> > > "currently running", right? BTW, you probably want to reword a bit,
> > > because the "current" task may be blocked, so technically it's not
> > > "running".
> > >
> > > Basically, the operations that `CurrentTask` have are the methods that
> > > are safe to call (even on a different thread) for the "current" task, as
> > > long as it exists (not dead or exited). In that definition, not being
> > > `Sync` is fine.
> > >
> > > But I have to admit I'm a bit worried that people may be confused, and
> > > add new methods that can be only run by the current thread in the
> > > future.
> >
> > I agree, I think CurrentTask should refer to "current".  Or we'll
> > confuse everyone.  Would ActiveTask be a good name for this CurrentTask?
> 
> I mean, it does refer to current. Any time you have a `&CurrentTask`,
> then you know that you got the pointer by reading the value of
> `current`, and that the task you got it from hasn't returned to
> userspace (or otherwise exited) yet.
> 
> But the name ActiveTask also makes sense I guess.

OK, I'm going to be all rust newbie about this (zoea?)

Given that there are operations that we can do on 'current' that aren't
safe to do if we pass current to another thread, is the right thing
to say that we have Task, and you can get a (Rust) reference to Task
either by it being 'current', or by getting a refcount on it using
get_task_struct()?  And I think that's called a typestate?
Alice Ryhl Nov. 22, 2024, 8:16 p.m. UTC | #11
On Fri, Nov 22, 2024 at 8:54 PM Matthew Wilcox <willy@infradead.org> wrote:
>
> On Fri, Nov 22, 2024 at 08:43:33PM +0100, Alice Ryhl wrote:
> > On Fri, Nov 22, 2024 at 8:30 PM Matthew Wilcox <willy@infradead.org> wrote:
> > >
> > > On Fri, Nov 22, 2024 at 11:17:15AM -0800, Boqun Feng wrote:
> > > > > I don't think this is a problem? As long as a thread exists somewhere
> > > > > with `current` being equal to the task, we should be fine?
> > > > >
> > > >
> > > > I think I had a misunderstanding on what you meant by "operations
> > > > that are only valid on the current task", you mean these operations can
> > > > be run by other threads, but it has to be *on* a task_struct that's
> > > > "currently running", right? BTW, you probably want to reword a bit,
> > > > because the "current" task may be blocked, so technically it's not
> > > > "running".
> > > >
> > > > Basically, the operations that `CurrentTask` have are the methods that
> > > > are safe to call (even on a different thread) for the "current" task, as
> > > > long as it exists (not dead or exited). In that definition, not being
> > > > `Sync` is fine.
> > > >
> > > > But I have to admit I'm a bit worried that people may be confused, and
> > > > add new methods that can be only run by the current thread in the
> > > > future.
> > >
> > > I agree, I think CurrentTask should refer to "current".  Or we'll
> > > confuse everyone.  Would ActiveTask be a good name for this CurrentTask?
> >
> > I mean, it does refer to current. Any time you have a `&CurrentTask`,
> > then you know that you got the pointer by reading the value of
> > `current`, and that the task you got it from hasn't returned to
> > userspace (or otherwise exited) yet.
> >
> > But the name ActiveTask also makes sense I guess.
>
> OK, I'm going to be all rust newbie about this (zoea?)
>
> Given that there are operations that we can do on 'current' that aren't
> safe to do if we pass current to another thread, is the right thing
> to say that we have Task, and you can get a (Rust) reference to Task
> either by it being 'current', or by getting a refcount on it using
> get_task_struct()?  And I think that's called a typestate?

There are essentially three important types here:

* ARef<Task>
* &Task
* &CurrentTask

The first one is using the pointer type ARef<_> to hold ownership over
a refcount to the task. When variables of type ARef<Task> go out of
scope, put_task_struct() is called in the destructor. And constructing
a new ARef<Task> involves a call to get_task_struct().

Now, the second type &Task is a reference to a task. A reference is
when you *don't* have ownership over a refcount. They're used whenever
there is *any* mechanism keeping the value alive; the actual mechanism
in question is not part of the type. The way references work is that
they are annotated with a compile-time concept called a lifetime,
which is essentially a region of code that the reference is not
allowed to escape. The compiler generally enforces this. For example,
given an ARef<Task>, you can obtain a &Task without touching the
refcount. Attempting to use the &Task after the ARef<Task> is
destroyed is a hard compiler error. In this case, the ARef<_> keeps a
refcount alive, so accessing the task through the &Task is always okay
even though the &Task doesn't have a refcount.

Another mechanism is `current`. The way that Rust's `current!()` macro
works is essentially by defining a local variable which goes out of
scope at the end of the current function, and giving you a &Task where
the reference's lifetime is bounded by that local variable. So by
ensuring that the &Task is bounded by the current function scope, we
ensure that we're not using it after current becomes invalid.

With this change, the `current!()` macro instead gives you a
`&CurrentTask` whose lifetime is bounded to the function scope in the
same way. Given a &CurrentTask, you can deref it to a normal &Task
with the same lifetime, so you can also access the normal Task methods
on a &CurrentTask.

And what's happening here is basically that ... you can use the
&CurrentTask as long as the function you created it in has not yet
returned. So if you spawn something on the workqueue and sleep until
the workqueue finishes executing the job, then technically that
satisfies the requirement and Rust will not prevent you from using the
&CurrentTask within that workqueue job. And this is technically also
okay from a C perspective, since the address space isn't going to go
away as long as the function doesn't return.

Alice
diff mbox series

Patch

diff --git a/rust/kernel/mm.rs b/rust/kernel/mm.rs
index 50f4861ae4b9..f7d1079391ef 100644
--- a/rust/kernel/mm.rs
+++ b/rust/kernel/mm.rs
@@ -142,28 +142,6 @@  fn deref(&self) -> &MmWithUser {
 
 // These methods are safe to call even if `mm_users` is zero.
 impl Mm {
-    /// Call `mmgrab` on `current.mm`.
-    #[inline]
-    pub fn mmgrab_current() -> Option<ARef<Mm>> {
-        // SAFETY: It's safe to get the `mm` field from current.
-        let mm = unsafe {
-            let current = bindings::get_current();
-            (*current).mm
-        };
-
-        if mm.is_null() {
-            return None;
-        }
-
-        // SAFETY: The value of `current->mm` is guaranteed to be null or a valid `mm_struct`. We
-        // just checked that it's not null. Furthermore, the returned `&Mm` is valid only for the
-        // duration of this function, and `current->mm` will stay valid for that long.
-        let mm = unsafe { Mm::from_raw(mm) };
-
-        // This increments the refcount using `mmgrab`.
-        Some(ARef::from(mm))
-    }
-
     /// Returns a raw pointer to the inner `mm_struct`.
     #[inline]
     pub fn as_raw(&self) -> *mut bindings::mm_struct {
diff --git a/rust/kernel/task.rs b/rust/kernel/task.rs
index 9e59d86da42d..103d235eb844 100644
--- a/rust/kernel/task.rs
+++ b/rust/kernel/task.rs
@@ -94,6 +94,26 @@  unsafe impl Send for Task {}
 // synchronised by C code (e.g., `signal_pending`).
 unsafe impl Sync for Task {}
 
+/// Represents a [`Task`] obtained from the `current` global.
+///
+/// This type exists to provide more efficient operations that are only valid on the current task.
+/// For example, to retrieve the pid-namespace of a task, you must use rcu protection unless it is
+/// the current task.
+///
+/// # Invariants
+///
+/// Must be equal to `current` of some thread that is currently running somewhere.
+pub struct CurrentTask(Task);
+
+// Make all `Task` methods available on `CurrentTask`.
+impl Deref for CurrentTask {
+    type Target = Task;
+    #[inline]
+    fn deref(&self) -> &Task {
+        &self.0
+    }
+}
+
 /// The type of process identifiers (PIDs).
 type Pid = bindings::pid_t;
 
@@ -121,27 +141,25 @@  pub fn current_raw() -> *mut bindings::task_struct {
     /// # Safety
     ///
     /// Callers must ensure that the returned object doesn't outlive the current task/thread.
-    pub unsafe fn current() -> impl Deref<Target = Task> {
-        struct TaskRef<'a> {
-            task: &'a Task,
-            _not_send: NotThreadSafe,
+    pub unsafe fn current() -> impl Deref<Target = CurrentTask> {
+        struct TaskRef {
+            task: *const CurrentTask,
         }
 
-        impl Deref for TaskRef<'_> {
-            type Target = Task;
+        impl Deref for TaskRef {
+            type Target = CurrentTask;
 
             fn deref(&self) -> &Self::Target {
-                self.task
+                // SAFETY: The returned reference borrows from this `TaskRef`, so it cannot outlive
+                // the `TaskRef`, which the caller of `Task::current()` has promised will not
+                // outlive the task/thread for which `self.task` is the `current` pointer. Thus, it
+                // is okay to return a `CurrentTask` reference here.
+                unsafe { &*self.task }
             }
         }
 
-        let current = Task::current_raw();
         TaskRef {
-            // SAFETY: If the current thread is still running, the current task is valid. Given
-            // that `TaskRef` is not `Send`, we know it cannot be transferred to another thread
-            // (where it could potentially outlive the caller).
-            task: unsafe { &*current.cast() },
-            _not_send: NotThreadSafe,
+            task: Task::current_raw().cast(),
         }
     }
 
@@ -203,6 +221,26 @@  pub fn wake_up(&self) {
     }
 }
 
+impl CurrentTask {
+    /// Access the address space of this task.
+    ///
+    /// To increment the refcount of the referenced `mm`, you can use `ARef::from`.
+    #[inline]
+    pub fn mm(&self) -> Option<&MmWithUser> {
+        let mm = unsafe { (*self.as_ptr()).mm };
+
+        if mm.is_null() {
+            None
+        } else {
+            // SAFETY: If `current->mm` is non-null, then it references a valid mm with a non-zero
+            // value of `mm_users`. The returned `&MmWithUser` borrows from `CurrentTask`, so the
+            // `&MmWithUser` cannot escape the current task, meaning `mm_users` can't reach zero
+            // while the reference is still live.
+            Some(unsafe { MmWithUser::from_raw(mm) })
+        }
+    }
+}
+
 // SAFETY: The type invariants guarantee that `Task` is always refcounted.
 unsafe impl crate::types::AlwaysRefCounted for Task {
     fn inc_ref(&self) {