Message ID | 20240912-shrinker-v1-1-18b7f1253553@google.com (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | rust: shrinker: add shrinker abstraction | expand |
On Thu, Sep 12, 2024 at 09:54:01AM +0000, Alice Ryhl wrote: > Rust Binder holds incoming transactions in a read-only mmap'd region > where it manually manages the pages. These pages are only in use until > the incoming transaction is consumed by userspace, but the kernel will > keep the pages around for future transactions. Rust Binder registers a > shrinker with the kernel so that it can give back these pages if the > system comes under memory pressure. > > Separate types are provided for registered and unregistered shrinkers. > The unregistered shrinker type can be used to configure the shrinker > before registering it. Separating it into two types also enables the > user to construct the private data between the calls to `shrinker_alloc` > and `shrinker_register` and avoid constructing the private data if > allocating the shrinker fails. > > The user specifies the callbacks in use by implementing the Shrinker > trait for the type used for the private data. This requires specifying > three things: implementations for count_objects and scan_objects, and > the pointer type that the private data will be wrapped in. > > The return values of count_objects and scan_objects are provided using > new types called CountObjects and ScanObjects respectively. These types > prevent the user from e.g. returning SHRINK_STOP from count_objects or > returning SHRINK_EMPTY from scan_objects. > > Signed-off-by: Alice Ryhl <aliceryhl@google.com> Scary given that drm has a few, but I learned a few new thinks about shrinkers. Some comments below, but looks all really nice imo. Cheers, Sima > --- > rust/bindings/bindings_helper.h | 2 + > rust/kernel/lib.rs | 1 + > rust/kernel/shrinker.rs | 324 ++++++++++++++++++++++++++++++++++++++++ > 3 files changed, 327 insertions(+) > > diff --git a/rust/bindings/bindings_helper.h b/rust/bindings/bindings_helper.h > index ae82e9c941af..7fc958e05dc5 100644 > --- a/rust/bindings/bindings_helper.h > +++ b/rust/bindings/bindings_helper.h > @@ -18,6 +18,7 @@ > #include <linux/phy.h> > #include <linux/refcount.h> > #include <linux/sched.h> > +#include <linux/shrinker.h> > #include <linux/slab.h> > #include <linux/wait.h> > #include <linux/workqueue.h> > @@ -31,4 +32,5 @@ const gfp_t RUST_CONST_HELPER_GFP_KERNEL_ACCOUNT = GFP_KERNEL_ACCOUNT; > const gfp_t RUST_CONST_HELPER_GFP_NOWAIT = GFP_NOWAIT; > const gfp_t RUST_CONST_HELPER___GFP_ZERO = __GFP_ZERO; > const gfp_t RUST_CONST_HELPER___GFP_HIGHMEM = ___GFP_HIGHMEM; > +const gfp_t RUST_CONST_HELPER___GFP_FS = ___GFP_FS; > const blk_features_t RUST_CONST_HELPER_BLK_FEAT_ROTATIONAL = BLK_FEAT_ROTATIONAL; > diff --git a/rust/kernel/lib.rs b/rust/kernel/lib.rs > index f10b06a78b9d..f35eb290f2e0 100644 > --- a/rust/kernel/lib.rs > +++ b/rust/kernel/lib.rs > @@ -45,6 +45,7 @@ > pub mod prelude; > pub mod print; > pub mod rbtree; > +pub mod shrinker; > mod static_assert; > #[doc(hidden)] > pub mod std_vendor; > diff --git a/rust/kernel/shrinker.rs b/rust/kernel/shrinker.rs > new file mode 100644 > index 000000000000..9af726bfe0b1 > --- /dev/null > +++ b/rust/kernel/shrinker.rs > @@ -0,0 +1,324 @@ > +// SPDX-License-Identifier: GPL-2.0 > + > +// Copyright (C) 2024 Google LLC. > + > +//! Shrinker for handling memory pressure. > +//! > +//! C header: [`include/linux/shrinker.h`](srctree/include/linux/shrinker.h) > + > +use crate::{alloc::AllocError, bindings, c_str, str::CStr, types::ForeignOwnable}; > + > +use core::{ > + ffi::{c_int, c_ulong, c_void}, > + marker::PhantomData, > + ptr::NonNull, > +}; > + > +const SHRINK_STOP: c_ulong = bindings::SHRINK_STOP as c_ulong; > +const SHRINK_EMPTY: c_ulong = bindings::SHRINK_EMPTY as c_ulong; > + > +/// The default value for the number of seeks needed to recreate an object. > +pub const DEFAULT_SEEKS: u32 = bindings::DEFAULT_SEEKS; > + > +/// An unregistered shrinker. > +/// > +/// This type can be used to modify the settings of the shrinker before it is registered. > +/// > +/// # Invariants > +/// > +/// The `shrinker` pointer references an unregistered shrinker. > +pub struct UnregisteredShrinker { > + shrinker: NonNull<bindings::shrinker>, > +} > + > +// SAFETY: Moving an unregistered shrinker between threads is okay. > +unsafe impl Send for UnregisteredShrinker {} > +// SAFETY: An unregistered shrinker is thread safe. > +unsafe impl Sync for UnregisteredShrinker {} > + > +impl UnregisteredShrinker { > + /// Create a new shrinker. > + pub fn alloc(name: &CStr) -> Result<Self, AllocError> { > + // SAFETY: Passing `0` as flags is okay. Using `%s` as the format string is okay when we I'd elaborate here that we have to pass 0 as the only valid value because all the non-zero flags are for memcg and numa aware shrinkers, and we do not support those because we don't expose the relevant data from ShrinkControl. > + // pass a nul-terminated string as the string for `%s` to print. > + let ptr = > + unsafe { bindings::shrinker_alloc(0, c_str!("%s").as_char_ptr(), name.as_char_ptr()) }; > + > + let shrinker = NonNull::new(ptr).ok_or(AllocError)?; > + > + // INVARIANT: The creation of the shrinker was successful. > + Ok(Self { shrinker }) > + } > + > + /// Create a new shrinker using format arguments for the name. > + pub fn alloc_fmt(name: core::fmt::Arguments<'_>) -> Result<Self, AllocError> { > + // SAFETY: Passing `0` as flags is okay. Using `%pA` as the format string is okay when we > + // pass a `fmt::Arguments` as the value to print. > + let ptr = unsafe { > + bindings::shrinker_alloc( > + 0, > + c_str!("%pA").as_char_ptr(), > + &name as *const _ as *const c_void, > + ) > + }; > + > + let shrinker = NonNull::new(ptr).ok_or(AllocError)?; > + > + // INVARIANT: The creation of the shrinker was successful. > + Ok(Self { shrinker }) > + } > + > + /// Set the number of seeks needed to recreate an object. > + pub fn set_seeks(&mut self, seeks: u32) { Not sure we want to expose this, with ssd seeks kinda don't matter and it's just a bit about relative fairness. I think nowadays it's more important that the count_object values are normalized to size, if not all objects you shrink have the same fixed size. So not sure we need this, at least initially. > + unsafe { (*self.shrinker.as_ptr()).seeks = seeks as c_int }; > + } > + > + /// Register the shrinker. > + /// > + /// The provided pointer is used as the private data, and the type `T` determines the callbacks > + /// that the shrinker will use. > + pub fn register<T: Shrinker>(self, private_data: T::Ptr) -> RegisteredShrinker<T> { > + let shrinker = self.shrinker; > + let ptr = shrinker.as_ptr(); > + > + // The destructor of `self` calls `shrinker_free`, so skip the destructor. > + core::mem::forget(self); > + > + let private_data_ptr = <T::Ptr as ForeignOwnable>::into_foreign(private_data); > + > + // SAFETY: We own the private data, so we can assign to it. > + unsafe { (*ptr).private_data = private_data_ptr.cast_mut() }; > + // SAFETY: The shrinker is not yet registered, so we can update this field. > + unsafe { (*ptr).count_objects = Some(rust_count_objects::<T>) }; > + // SAFETY: The shrinker is not yet registered, so we can update this field. > + unsafe { (*ptr).scan_objects = Some(rust_scan_objects::<T>) }; > + > + // SAFETY: The shrinker is unregistered, so it's safe to register it. > + unsafe { bindings::shrinker_register(ptr) }; > + > + RegisteredShrinker { > + shrinker, > + _phantom: PhantomData, > + } > + } > +} > + > +impl Drop for UnregisteredShrinker { > + fn drop(&mut self) { > + // SAFETY: The shrinker is a valid but unregistered shrinker, and we will not use it > + // anymore. > + unsafe { bindings::shrinker_free(self.shrinker.as_ptr()) }; > + } > +} > + > +/// A shrinker that is registered with the kernel. > +/// > +/// # Invariants > +/// > +/// The `shrinker` pointer refers to a registered shrinker using `T` as the private data. > +pub struct RegisteredShrinker<T: Shrinker> { > + shrinker: NonNull<bindings::shrinker>, > + _phantom: PhantomData<T::Ptr>, > +} > + > +// SAFETY: This allows you to deregister the shrinker from a different thread, which means that > +// private data could be dropped from any thread. > +unsafe impl<T: Shrinker> Send for RegisteredShrinker<T> where T::Ptr: Send {} > +// SAFETY: The only thing you can do with an immutable reference is access the private data, which > +// is okay to access in parallel as the `Shrinker` trait requires the private data to be `Sync`. > +unsafe impl<T: Shrinker> Sync for RegisteredShrinker<T> {} > + > +impl<T: Shrinker> RegisteredShrinker<T> { > + /// Access the private data in this shrinker. > + pub fn private_data(&self) -> <T::Ptr as ForeignOwnable>::Borrowed<'_> { > + // SAFETY: We own the private data, so we can access it. > + let private = unsafe { (*self.shrinker.as_ptr()).private_data }; > + // SAFETY: By the type invariants, the private data is `T`. This access could happen in > + // parallel with a shrinker callback, but that's okay as the `Shrinker` trait ensures that > + // `T::Ptr` is `Sync`. > + unsafe { <T::Ptr as ForeignOwnable>::borrow(private) } > + } > +} > + > +impl<T: Shrinker> Drop for RegisteredShrinker<T> { > + fn drop(&mut self) { > + // SAFETY: We own the private data, so we can access it. > + let private = unsafe { (*self.shrinker.as_ptr()).private_data }; > + // SAFETY: We will not access the shrinker after this call. > + unsafe { bindings::shrinker_free(self.shrinker.as_ptr()) }; > + // SAFETY: The above call blocked until the completion of any shrinker callbacks, so there > + // are no longer any users of the private data. > + drop(unsafe { <T::Ptr as ForeignOwnable>::from_foreign(private) }); > + } > +} > + > +/// Callbacks for a shrinker. > +pub trait Shrinker { > + /// The pointer type used to store the private data of the shrinker. > + /// > + /// Needs to be `Sync` because the shrinker callback could access this value immutably from > + /// several thread in parallel. > + type Ptr: ForeignOwnable + Sync; > + > + /// Count the number of freeable items in the cache. > + /// > + /// May be called from atomic context. That's wrong, reclaim is allowed to block. Or my understanding of how this works is very busted. We do run in a pseudo locking context, the core code annotates that with fs_reclaim_acquire/release. > + fn count_objects( > + me: <Self::Ptr as ForeignOwnable>::Borrowed<'_>, > + sc: ShrinkControl<'_>, > + ) -> CountObjects; > + > + /// Count the number of freeable items in the cache. > + /// > + /// May be called from atomic context. Same here. > + fn scan_objects( > + me: <Self::Ptr as ForeignOwnable>::Borrowed<'_>, > + sc: ShrinkControl<'_>, > + ) -> ScanObjects; > +} > + > +/// How many objects are there in the cache? > +/// > +/// This is used as the return value of [`Shrinker::count_objects`]. > +pub struct CountObjects { > + inner: c_ulong, > +} > + > +impl CountObjects { > + /// Indicates that the number of objects is unknown. > + pub const UNKNOWN: Self = Self { inner: 0 }; > + > + /// Indicates that the number of objects is zero. > + pub const EMPTY: Self = Self { > + inner: SHRINK_EMPTY, So I spend way too much time looking at all this, and I think overflows don't matter to the core shrinker code (aside from maybe a bit of unfairness), as long as we don't accidently return SHRINK_EMPTY here. But that's only relevant for memcg aware shrinkers I think. > + }; > + > + /// The maximum possible number of freeable objects. > + pub const MAX: Self = Self { > + // The shrinker code assumes that it can multiply this value by two without overflow. > + inner: c_ulong::MAX / 2, I think the limits is actually ulong_max/4, since priority can be 0 from drop_slabs() and we multiply by 4 is seeks are nonzero. But then I tried to look at what the limit for nr_to_scan and hence ScanObjects, and I think aside from the special values the mm/shrinker.c simply does not care about overflows at all. Both unsigned and signed integer math is well defined for overflow in linux, so no compiler license to do stupid stuff, and worst case if you do overflow you just end up shrinking a bit unfairly. But afaict nothing breaks. So not sure we should enforce this when core mm doesn't bother. Same for ScanObjects below. > + }; > + > + /// Creates a new `CountObjects` with the given value. > + pub fn from_count(count: usize) -> Self { > + if count == 0 { > + return Self::EMPTY; > + } > + > + if count > Self::MAX.inner as usize { > + return Self::MAX; > + } > + > + Self { > + inner: count as c_ulong, > + } > + } > +} > + > +/// How many objects were freed? > +/// > +/// This is used as the return value of [`Shrinker::scan_objects`]. > +pub struct ScanObjects { > + inner: c_ulong, > +} > + > +impl ScanObjects { > + /// Indicates that the shrinker should stop trying to free objects from this cache due to > + /// potential deadlocks. > + pub const STOP: Self = Self { inner: SHRINK_STOP }; > + > + /// The maximum possible number of freeable objects. > + pub const MAX: Self = Self { > + // The shrinker code assumes that it can multiply this value by two without overflow. > + inner: c_ulong::MAX / 2, > + }; > + > + /// Creates a new `CountObjects` with the given value. > + pub fn from_count(count: usize) -> Self { > + if count > Self::MAX.inner as usize { > + return Self::MAX; > + } > + > + Self { > + inner: count as c_ulong, > + } > + } > +} > + > +/// This struct is used to pass information from page reclaim to the shrinkers. > +pub struct ShrinkControl<'a> { > + ptr: NonNull<bindings::shrink_control>, > + _phantom: PhantomData<&'a bindings::shrink_control>, > +} > + > +impl<'a> ShrinkControl<'a> { > + /// Create a `ShrinkControl` from a raw pointer. > + /// > + /// # Safety > + /// > + /// The pointer should point at a valid `shrink_control` for the duration of 'a. > + pub unsafe fn from_raw(ptr: *mut bindings::shrink_control) -> Self { > + Self { > + // SAFETY: Caller promises that this pointer is valid. > + ptr: unsafe { NonNull::new_unchecked(ptr) }, > + _phantom: PhantomData, > + } > + } > + > + /// Determines whether it is safe to recurse into filesystem code. > + pub fn gfp_fs(&self) -> bool { I guess you need this in your new binder code, because the current one seems to side-step __GFP_FS recursion with just trylocking absolutely everything aside from the lru spinlock. At least I haven't found any gfp_fs tests, but I might be blind. > + // SAFETY: Okay by type invariants. > + let mask = unsafe { (*self.ptr.as_ptr()).gfp_mask }; > + > + (mask & bindings::__GFP_FS) != 0 > + } > + > + /// Returns the number of objects that `scan_objects` should try to reclaim. > + pub fn nr_to_scan(&self) -> usize { > + // SAFETY: Okay by type invariants. > + unsafe { (*self.ptr.as_ptr()).nr_to_scan as usize } > + } > + > + /// The callback should set this value to the number of objects processed. > + pub fn set_nr_scanned(&mut self, val: usize) { Unless my grep skills are really bad I think the drm/i915 shrinker is the only one that bothers with this. Not sure we want to bother either? > + let mut val = val as c_ulong; > + // SAFETY: Okay by type invariants. > + let max = unsafe { (*self.ptr.as_ptr()).nr_to_scan }; > + if val > max { > + val = max; > + } > + > + // SAFETY: Okay by type invariants. > + unsafe { (*self.ptr.as_ptr()).nr_scanned = val }; > + } > +} > + > +unsafe extern "C" fn rust_count_objects<T: Shrinker>( > + shrink: *mut bindings::shrinker, > + sc: *mut bindings::shrink_control, > +) -> c_ulong { > + // SAFETY: We own the private data, so we can access it. > + let private = unsafe { (*shrink).private_data }; > + // SAFETY: This function is only used with shrinkers where `T` is the type of the private data. > + let private = unsafe { <T::Ptr as ForeignOwnable>::borrow(private) }; > + // SAFETY: The caller passes a valid `sc` pointer. > + let sc = unsafe { ShrinkControl::from_raw(sc) }; > + > + let ret = T::count_objects(private, sc); > + ret.inner > +} > + > +unsafe extern "C" fn rust_scan_objects<T: Shrinker>( > + shrink: *mut bindings::shrinker, > + sc: *mut bindings::shrink_control, > +) -> c_ulong { > + // SAFETY: We own the private data, so we can access it. > + let private = unsafe { (*shrink).private_data }; > + // SAFETY: This function is only used with shrinkers where `T` is the type of the private data. > + let private = unsafe { <T::Ptr as ForeignOwnable>::borrow(private) }; > + // SAFETY: The caller passes a valid `sc` pointer. > + let sc = unsafe { ShrinkControl::from_raw(sc) }; > + > + let ret = T::scan_objects(private, sc); > + ret.inner > +} > > --- > base-commit: 93dc3be19450447a3a7090bd1dfb9f3daac3e8d2 > change-id: 20240911-shrinker-f8371af00b68 > > Best regards, > -- > Alice Ryhl <aliceryhl@google.com> >
On Fri, Sep 13, 2024 at 10:15 PM Simona Vetter <simona.vetter@ffwll.ch> wrote: > > On Thu, Sep 12, 2024 at 09:54:01AM +0000, Alice Ryhl wrote: > > Rust Binder holds incoming transactions in a read-only mmap'd region > > where it manually manages the pages. These pages are only in use until > > the incoming transaction is consumed by userspace, but the kernel will > > keep the pages around for future transactions. Rust Binder registers a > > shrinker with the kernel so that it can give back these pages if the > > system comes under memory pressure. > > > > Separate types are provided for registered and unregistered shrinkers. > > The unregistered shrinker type can be used to configure the shrinker > > before registering it. Separating it into two types also enables the > > user to construct the private data between the calls to `shrinker_alloc` > > and `shrinker_register` and avoid constructing the private data if > > allocating the shrinker fails. > > > > The user specifies the callbacks in use by implementing the Shrinker > > trait for the type used for the private data. This requires specifying > > three things: implementations for count_objects and scan_objects, and > > the pointer type that the private data will be wrapped in. > > > > The return values of count_objects and scan_objects are provided using > > new types called CountObjects and ScanObjects respectively. These types > > prevent the user from e.g. returning SHRINK_STOP from count_objects or > > returning SHRINK_EMPTY from scan_objects. > > > > Signed-off-by: Alice Ryhl <aliceryhl@google.com> > > Scary given that drm has a few, but I learned a few new thinks about > shrinkers. Some comments below, but looks all really nice imo. > > Cheers, Sima > > > --- > > rust/bindings/bindings_helper.h | 2 + > > rust/kernel/lib.rs | 1 + > > rust/kernel/shrinker.rs | 324 ++++++++++++++++++++++++++++++++++++++++ > > 3 files changed, 327 insertions(+) > > > > diff --git a/rust/bindings/bindings_helper.h b/rust/bindings/bindings_helper.h > > index ae82e9c941af..7fc958e05dc5 100644 > > --- a/rust/bindings/bindings_helper.h > > +++ b/rust/bindings/bindings_helper.h > > @@ -18,6 +18,7 @@ > > #include <linux/phy.h> > > #include <linux/refcount.h> > > #include <linux/sched.h> > > +#include <linux/shrinker.h> > > #include <linux/slab.h> > > #include <linux/wait.h> > > #include <linux/workqueue.h> > > @@ -31,4 +32,5 @@ const gfp_t RUST_CONST_HELPER_GFP_KERNEL_ACCOUNT = GFP_KERNEL_ACCOUNT; > > const gfp_t RUST_CONST_HELPER_GFP_NOWAIT = GFP_NOWAIT; > > const gfp_t RUST_CONST_HELPER___GFP_ZERO = __GFP_ZERO; > > const gfp_t RUST_CONST_HELPER___GFP_HIGHMEM = ___GFP_HIGHMEM; > > +const gfp_t RUST_CONST_HELPER___GFP_FS = ___GFP_FS; > > const blk_features_t RUST_CONST_HELPER_BLK_FEAT_ROTATIONAL = BLK_FEAT_ROTATIONAL; > > diff --git a/rust/kernel/lib.rs b/rust/kernel/lib.rs > > index f10b06a78b9d..f35eb290f2e0 100644 > > --- a/rust/kernel/lib.rs > > +++ b/rust/kernel/lib.rs > > @@ -45,6 +45,7 @@ > > pub mod prelude; > > pub mod print; > > pub mod rbtree; > > +pub mod shrinker; > > mod static_assert; > > #[doc(hidden)] > > pub mod std_vendor; > > diff --git a/rust/kernel/shrinker.rs b/rust/kernel/shrinker.rs > > new file mode 100644 > > index 000000000000..9af726bfe0b1 > > --- /dev/null > > +++ b/rust/kernel/shrinker.rs > > @@ -0,0 +1,324 @@ > > +// SPDX-License-Identifier: GPL-2.0 > > + > > +// Copyright (C) 2024 Google LLC. > > + > > +//! Shrinker for handling memory pressure. > > +//! > > +//! C header: [`include/linux/shrinker.h`](srctree/include/linux/shrinker.h) > > + > > +use crate::{alloc::AllocError, bindings, c_str, str::CStr, types::ForeignOwnable}; > > + > > +use core::{ > > + ffi::{c_int, c_ulong, c_void}, > > + marker::PhantomData, > > + ptr::NonNull, > > +}; > > + > > +const SHRINK_STOP: c_ulong = bindings::SHRINK_STOP as c_ulong; > > +const SHRINK_EMPTY: c_ulong = bindings::SHRINK_EMPTY as c_ulong; > > + > > +/// The default value for the number of seeks needed to recreate an object. > > +pub const DEFAULT_SEEKS: u32 = bindings::DEFAULT_SEEKS; > > + > > +/// An unregistered shrinker. > > +/// > > +/// This type can be used to modify the settings of the shrinker before it is registered. > > +/// > > +/// # Invariants > > +/// > > +/// The `shrinker` pointer references an unregistered shrinker. > > +pub struct UnregisteredShrinker { > > + shrinker: NonNull<bindings::shrinker>, > > +} > > + > > +// SAFETY: Moving an unregistered shrinker between threads is okay. > > +unsafe impl Send for UnregisteredShrinker {} > > +// SAFETY: An unregistered shrinker is thread safe. > > +unsafe impl Sync for UnregisteredShrinker {} > > + > > +impl UnregisteredShrinker { > > + /// Create a new shrinker. > > + pub fn alloc(name: &CStr) -> Result<Self, AllocError> { > > + // SAFETY: Passing `0` as flags is okay. Using `%s` as the format string is okay when we > > I'd elaborate here that we have to pass 0 as the only valid value because > all the non-zero flags are for memcg and numa aware shrinkers, and we do > not support those because we don't expose the relevant data from > ShrinkControl. > > > + // pass a nul-terminated string as the string for `%s` to print. > > + let ptr = > > + unsafe { bindings::shrinker_alloc(0, c_str!("%s").as_char_ptr(), name.as_char_ptr()) }; > > + > > + let shrinker = NonNull::new(ptr).ok_or(AllocError)?; > > + > > + // INVARIANT: The creation of the shrinker was successful. > > + Ok(Self { shrinker }) > > + } > > + > > + /// Create a new shrinker using format arguments for the name. > > + pub fn alloc_fmt(name: core::fmt::Arguments<'_>) -> Result<Self, AllocError> { > > + // SAFETY: Passing `0` as flags is okay. Using `%pA` as the format string is okay when we > > + // pass a `fmt::Arguments` as the value to print. > > + let ptr = unsafe { > > + bindings::shrinker_alloc( > > + 0, > > + c_str!("%pA").as_char_ptr(), > > + &name as *const _ as *const c_void, > > + ) > > + }; > > + > > + let shrinker = NonNull::new(ptr).ok_or(AllocError)?; > > + > > + // INVARIANT: The creation of the shrinker was successful. > > + Ok(Self { shrinker }) > > + } > > + > > + /// Set the number of seeks needed to recreate an object. > > + pub fn set_seeks(&mut self, seeks: u32) { > > Not sure we want to expose this, with ssd seeks kinda don't matter and > it's just a bit about relative fairness. I think nowadays it's more > important that the count_object values are normalized to size, if not all > objects you shrink have the same fixed size. > > So not sure we need this, at least initially. Good point about keeping the objects the same fixed size. I'll incorporate that in the docs. > > + unsafe { (*self.shrinker.as_ptr()).seeks = seeks as c_int }; > > + } > > + > > + /// Register the shrinker. > > + /// > > + /// The provided pointer is used as the private data, and the type `T` determines the callbacks > > + /// that the shrinker will use. > > + pub fn register<T: Shrinker>(self, private_data: T::Ptr) -> RegisteredShrinker<T> { > > + let shrinker = self.shrinker; > > + let ptr = shrinker.as_ptr(); > > + > > + // The destructor of `self` calls `shrinker_free`, so skip the destructor. > > + core::mem::forget(self); > > + > > + let private_data_ptr = <T::Ptr as ForeignOwnable>::into_foreign(private_data); > > + > > + // SAFETY: We own the private data, so we can assign to it. > > + unsafe { (*ptr).private_data = private_data_ptr.cast_mut() }; > > + // SAFETY: The shrinker is not yet registered, so we can update this field. > > + unsafe { (*ptr).count_objects = Some(rust_count_objects::<T>) }; > > + // SAFETY: The shrinker is not yet registered, so we can update this field. > > + unsafe { (*ptr).scan_objects = Some(rust_scan_objects::<T>) }; > > + > > + // SAFETY: The shrinker is unregistered, so it's safe to register it. > > + unsafe { bindings::shrinker_register(ptr) }; > > + > > + RegisteredShrinker { > > + shrinker, > > + _phantom: PhantomData, > > + } > > + } > > +} > > + > > +impl Drop for UnregisteredShrinker { > > + fn drop(&mut self) { > > + // SAFETY: The shrinker is a valid but unregistered shrinker, and we will not use it > > + // anymore. > > + unsafe { bindings::shrinker_free(self.shrinker.as_ptr()) }; > > + } > > +} > > + > > +/// A shrinker that is registered with the kernel. > > +/// > > +/// # Invariants > > +/// > > +/// The `shrinker` pointer refers to a registered shrinker using `T` as the private data. > > +pub struct RegisteredShrinker<T: Shrinker> { > > + shrinker: NonNull<bindings::shrinker>, > > + _phantom: PhantomData<T::Ptr>, > > +} > > + > > +// SAFETY: This allows you to deregister the shrinker from a different thread, which means that > > +// private data could be dropped from any thread. > > +unsafe impl<T: Shrinker> Send for RegisteredShrinker<T> where T::Ptr: Send {} > > +// SAFETY: The only thing you can do with an immutable reference is access the private data, which > > +// is okay to access in parallel as the `Shrinker` trait requires the private data to be `Sync`. > > +unsafe impl<T: Shrinker> Sync for RegisteredShrinker<T> {} > > + > > +impl<T: Shrinker> RegisteredShrinker<T> { > > + /// Access the private data in this shrinker. > > + pub fn private_data(&self) -> <T::Ptr as ForeignOwnable>::Borrowed<'_> { > > + // SAFETY: We own the private data, so we can access it. > > + let private = unsafe { (*self.shrinker.as_ptr()).private_data }; > > + // SAFETY: By the type invariants, the private data is `T`. This access could happen in > > + // parallel with a shrinker callback, but that's okay as the `Shrinker` trait ensures that > > + // `T::Ptr` is `Sync`. > > + unsafe { <T::Ptr as ForeignOwnable>::borrow(private) } > > + } > > +} > > + > > +impl<T: Shrinker> Drop for RegisteredShrinker<T> { > > + fn drop(&mut self) { > > + // SAFETY: We own the private data, so we can access it. > > + let private = unsafe { (*self.shrinker.as_ptr()).private_data }; > > + // SAFETY: We will not access the shrinker after this call. > > + unsafe { bindings::shrinker_free(self.shrinker.as_ptr()) }; > > + // SAFETY: The above call blocked until the completion of any shrinker callbacks, so there > > + // are no longer any users of the private data. > > + drop(unsafe { <T::Ptr as ForeignOwnable>::from_foreign(private) }); > > + } > > +} > > + > > +/// Callbacks for a shrinker. > > +pub trait Shrinker { > > + /// The pointer type used to store the private data of the shrinker. > > + /// > > + /// Needs to be `Sync` because the shrinker callback could access this value immutably from > > + /// several thread in parallel. > > + type Ptr: ForeignOwnable + Sync; > > + > > + /// Count the number of freeable items in the cache. > > + /// > > + /// May be called from atomic context. > > That's wrong, reclaim is allowed to block. Or my understanding of how this > works is very busted. We do run in a pseudo locking context, the core > code annotates that with fs_reclaim_acquire/release. Ah, ok. Every shrinker I've looked at uses try_lock everywhere so I assumed this could happen. Thanks for verifying that. > > + fn count_objects( > > + me: <Self::Ptr as ForeignOwnable>::Borrowed<'_>, > > + sc: ShrinkControl<'_>, > > + ) -> CountObjects; > > + > > + /// Count the number of freeable items in the cache. > > + /// > > + /// May be called from atomic context. > > Same here. > > > + fn scan_objects( > > + me: <Self::Ptr as ForeignOwnable>::Borrowed<'_>, > > + sc: ShrinkControl<'_>, > > + ) -> ScanObjects; > > +} > > + > > +/// How many objects are there in the cache? > > +/// > > +/// This is used as the return value of [`Shrinker::count_objects`]. > > +pub struct CountObjects { > > + inner: c_ulong, > > +} > > + > > +impl CountObjects { > > + /// Indicates that the number of objects is unknown. > > + pub const UNKNOWN: Self = Self { inner: 0 }; > > + > > + /// Indicates that the number of objects is zero. > > + pub const EMPTY: Self = Self { > > + inner: SHRINK_EMPTY, > > So I spend way too much time looking at all this, and I think overflows > don't matter to the core shrinker code (aside from maybe a bit of > unfairness), as long as we don't accidently return SHRINK_EMPTY here. But > that's only relevant for memcg aware shrinkers I think. Overflow is one thing. The current API automatically converts 0 to SHRINK_EMPTY, whereas many C shrinkers just return the size directly, which means they return UNKNOWN when it's really empty. Thoughts? > > + }; > > + > > + /// The maximum possible number of freeable objects. > > + pub const MAX: Self = Self { > > + // The shrinker code assumes that it can multiply this value by two without overflow. > > + inner: c_ulong::MAX / 2, > > I think the limits is actually ulong_max/4, since priority can be 0 from > drop_slabs() and we multiply by 4 is seeks are nonzero. But then I tried > to look at what the limit for nr_to_scan and hence ScanObjects, and I > think aside from the special values the mm/shrinker.c simply does not care > about overflows at all. Both unsigned and signed integer math is well > defined for overflow in linux, so no compiler license to do stupid stuff, > and worst case if you do overflow you just end up shrinking a bit > unfairly. But afaict nothing breaks. > > So not sure we should enforce this when core mm doesn't bother. > > Same for ScanObjects below. > > > + }; > > + > > + /// Creates a new `CountObjects` with the given value. > > + pub fn from_count(count: usize) -> Self { > > + if count == 0 { > > + return Self::EMPTY; > > + } > > + > > + if count > Self::MAX.inner as usize { > > + return Self::MAX; > > + } > > + > > + Self { > > + inner: count as c_ulong, > > + } > > + } > > +} > > + > > +/// How many objects were freed? > > +/// > > +/// This is used as the return value of [`Shrinker::scan_objects`]. > > +pub struct ScanObjects { > > + inner: c_ulong, > > +} > > + > > +impl ScanObjects { > > + /// Indicates that the shrinker should stop trying to free objects from this cache due to > > + /// potential deadlocks. > > + pub const STOP: Self = Self { inner: SHRINK_STOP }; > > + > > + /// The maximum possible number of freeable objects. > > + pub const MAX: Self = Self { > > + // The shrinker code assumes that it can multiply this value by two without overflow. > > + inner: c_ulong::MAX / 2, > > + }; > > + > > + /// Creates a new `CountObjects` with the given value. > > + pub fn from_count(count: usize) -> Self { > > + if count > Self::MAX.inner as usize { > > + return Self::MAX; > > + } > > + > > + Self { > > + inner: count as c_ulong, > > + } > > + } > > +} > > + > > +/// This struct is used to pass information from page reclaim to the shrinkers. > > +pub struct ShrinkControl<'a> { > > + ptr: NonNull<bindings::shrink_control>, > > + _phantom: PhantomData<&'a bindings::shrink_control>, > > +} > > + > > +impl<'a> ShrinkControl<'a> { > > + /// Create a `ShrinkControl` from a raw pointer. > > + /// > > + /// # Safety > > + /// > > + /// The pointer should point at a valid `shrink_control` for the duration of 'a. > > + pub unsafe fn from_raw(ptr: *mut bindings::shrink_control) -> Self { > > + Self { > > + // SAFETY: Caller promises that this pointer is valid. > > + ptr: unsafe { NonNull::new_unchecked(ptr) }, > > + _phantom: PhantomData, > > + } > > + } > > + > > + /// Determines whether it is safe to recurse into filesystem code. > > + pub fn gfp_fs(&self) -> bool { > > I guess you need this in your new binder code, because the current one > seems to side-step __GFP_FS recursion with just trylocking absolutely > everything aside from the lru spinlock. At least I haven't found any > gfp_fs tests, but I might be blind. Ah, yeah, I don't think C binder tests for that. Either way, it probably makes more sense to just expose a method to get the flags directly, rather than have a dedicated method for testing this particular flags. Or what do you think? > > + // SAFETY: Okay by type invariants. > > + let mask = unsafe { (*self.ptr.as_ptr()).gfp_mask }; > > + > > + (mask & bindings::__GFP_FS) != 0 > > + } > > + > > + /// Returns the number of objects that `scan_objects` should try to reclaim. > > + pub fn nr_to_scan(&self) -> usize { > > + // SAFETY: Okay by type invariants. > > + unsafe { (*self.ptr.as_ptr()).nr_to_scan as usize } > > + } > > + > > + /// The callback should set this value to the number of objects processed. > > + pub fn set_nr_scanned(&mut self, val: usize) { > > Unless my grep skills are really bad I think the drm/i915 shrinker is the > only one that bothers with this. Not sure we want to bother either? > > > + let mut val = val as c_ulong; > > + // SAFETY: Okay by type invariants. > > + let max = unsafe { (*self.ptr.as_ptr()).nr_to_scan }; > > + if val > max { > > + val = max; > > + } > > + > > + // SAFETY: Okay by type invariants. > > + unsafe { (*self.ptr.as_ptr()).nr_scanned = val }; > > + } > > +} > > + > > +unsafe extern "C" fn rust_count_objects<T: Shrinker>( > > + shrink: *mut bindings::shrinker, > > + sc: *mut bindings::shrink_control, > > +) -> c_ulong { > > + // SAFETY: We own the private data, so we can access it. > > + let private = unsafe { (*shrink).private_data }; > > + // SAFETY: This function is only used with shrinkers where `T` is the type of the private data. > > + let private = unsafe { <T::Ptr as ForeignOwnable>::borrow(private) }; > > + // SAFETY: The caller passes a valid `sc` pointer. > > + let sc = unsafe { ShrinkControl::from_raw(sc) }; > > + > > + let ret = T::count_objects(private, sc); > > + ret.inner > > +} > > + > > +unsafe extern "C" fn rust_scan_objects<T: Shrinker>( > > + shrink: *mut bindings::shrinker, > > + sc: *mut bindings::shrink_control, > > +) -> c_ulong { > > + // SAFETY: We own the private data, so we can access it. > > + let private = unsafe { (*shrink).private_data }; > > + // SAFETY: This function is only used with shrinkers where `T` is the type of the private data. > > + let private = unsafe { <T::Ptr as ForeignOwnable>::borrow(private) }; > > + // SAFETY: The caller passes a valid `sc` pointer. > > + let sc = unsafe { ShrinkControl::from_raw(sc) }; > > + > > + let ret = T::scan_objects(private, sc); > > + ret.inner > > +} > > > > --- > > base-commit: 93dc3be19450447a3a7090bd1dfb9f3daac3e8d2 > > change-id: 20240911-shrinker-f8371af00b68 > > > > Best regards, > > -- > > Alice Ryhl <aliceryhl@google.com> > > > > -- > Simona Vetter > Software Engineer, Intel Corporation > http://blog.ffwll.ch
On Thu, Sep 12, 2024 at 09:54:01AM +0000, Alice Ryhl wrote: > Rust Binder holds incoming transactions in a read-only mmap'd region > where it manually manages the pages. These pages are only in use until > the incoming transaction is consumed by userspace, but the kernel will > keep the pages around for future transactions. Rust Binder registers a > shrinker with the kernel so that it can give back these pages if the > system comes under memory pressure. > > Separate types are provided for registered and unregistered shrinkers. > The unregistered shrinker type can be used to configure the shrinker > before registering it. Separating it into two types also enables the > user to construct the private data between the calls to `shrinker_alloc` > and `shrinker_register` and avoid constructing the private data if > allocating the shrinker fails. > > The user specifies the callbacks in use by implementing the Shrinker > trait for the type used for the private data. This requires specifying > three things: implementations for count_objects and scan_objects, and > the pointer type that the private data will be wrapped in. > > The return values of count_objects and scan_objects are provided using > new types called CountObjects and ScanObjects respectively. These types > prevent the user from e.g. returning SHRINK_STOP from count_objects or > returning SHRINK_EMPTY from scan_objects. > > Signed-off-by: Alice Ryhl <aliceryhl@google.com> > --- [...] > --- a/rust/kernel/lib.rs > +++ b/rust/kernel/lib.rs > @@ -45,6 +45,7 @@ > pub mod prelude; > pub mod print; > pub mod rbtree; > +pub mod shrinker; Meta comment: I think we should create a mm mod (memory management) and put shrinker there, otherwise there would be a very long list of "pub mod" in rust/kernel/lib.rs ;-) Thoughts? Regards, Boqun > mod static_assert; > #[doc(hidden)] > pub mod std_vendor; > diff --git a/rust/kernel/shrinker.rs b/rust/kernel/shrinker.rs > new file mode 100644 > index 000000000000..9af726bfe0b1 [...]
On Thu, 12 Sep 2024 09:54:01 +0000 Alice Ryhl <aliceryhl@google.com> wrote: > Rust Binder holds incoming transactions in a read-only mmap'd region > where it manually manages the pages. These pages are only in use until > the incoming transaction is consumed by userspace, but the kernel will > keep the pages around for future transactions. Rust Binder registers a > shrinker with the kernel so that it can give back these pages if the > system comes under memory pressure. > > Separate types are provided for registered and unregistered shrinkers. > The unregistered shrinker type can be used to configure the shrinker > before registering it. Separating it into two types also enables the > user to construct the private data between the calls to `shrinker_alloc` > and `shrinker_register` and avoid constructing the private data if > allocating the shrinker fails. > > The user specifies the callbacks in use by implementing the Shrinker > trait for the type used for the private data. This requires specifying > three things: implementations for count_objects and scan_objects, and > the pointer type that the private data will be wrapped in. > > The return values of count_objects and scan_objects are provided using > new types called CountObjects and ScanObjects respectively. These types > prevent the user from e.g. returning SHRINK_STOP from count_objects or > returning SHRINK_EMPTY from scan_objects. > > Signed-off-by: Alice Ryhl <aliceryhl@google.com> Some suggestions below. I agree with Boqun that we might want a `kernel::mm` to avoid everything flat in the `kernel::` namespace. Best, Gary > --- > rust/bindings/bindings_helper.h | 2 + > rust/kernel/lib.rs | 1 + > rust/kernel/shrinker.rs | 324 ++++++++++++++++++++++++++++++++++++++++ > 3 files changed, 327 insertions(+) > > diff --git a/rust/bindings/bindings_helper.h b/rust/bindings/bindings_helper.h > index ae82e9c941af..7fc958e05dc5 100644 > --- a/rust/bindings/bindings_helper.h > +++ b/rust/bindings/bindings_helper.h > @@ -18,6 +18,7 @@ > #include <linux/phy.h> > #include <linux/refcount.h> > #include <linux/sched.h> > +#include <linux/shrinker.h> > #include <linux/slab.h> > #include <linux/wait.h> > #include <linux/workqueue.h> > @@ -31,4 +32,5 @@ const gfp_t RUST_CONST_HELPER_GFP_KERNEL_ACCOUNT = GFP_KERNEL_ACCOUNT; > const gfp_t RUST_CONST_HELPER_GFP_NOWAIT = GFP_NOWAIT; > const gfp_t RUST_CONST_HELPER___GFP_ZERO = __GFP_ZERO; > const gfp_t RUST_CONST_HELPER___GFP_HIGHMEM = ___GFP_HIGHMEM; > +const gfp_t RUST_CONST_HELPER___GFP_FS = ___GFP_FS; > const blk_features_t RUST_CONST_HELPER_BLK_FEAT_ROTATIONAL = BLK_FEAT_ROTATIONAL; > diff --git a/rust/kernel/lib.rs b/rust/kernel/lib.rs > index f10b06a78b9d..f35eb290f2e0 100644 > --- a/rust/kernel/lib.rs > +++ b/rust/kernel/lib.rs > @@ -45,6 +45,7 @@ > pub mod prelude; > pub mod print; > pub mod rbtree; > +pub mod shrinker; > mod static_assert; > #[doc(hidden)] > pub mod std_vendor; > diff --git a/rust/kernel/shrinker.rs b/rust/kernel/shrinker.rs > new file mode 100644 > index 000000000000..9af726bfe0b1 > --- /dev/null > +++ b/rust/kernel/shrinker.rs > @@ -0,0 +1,324 @@ > +// SPDX-License-Identifier: GPL-2.0 > + > +// Copyright (C) 2024 Google LLC. > + > +//! Shrinker for handling memory pressure. > +//! > +//! C header: [`include/linux/shrinker.h`](srctree/include/linux/shrinker.h) > + > +use crate::{alloc::AllocError, bindings, c_str, str::CStr, types::ForeignOwnable}; > + > +use core::{ > + ffi::{c_int, c_ulong, c_void}, > + marker::PhantomData, > + ptr::NonNull, > +}; > + > +const SHRINK_STOP: c_ulong = bindings::SHRINK_STOP as c_ulong; > +const SHRINK_EMPTY: c_ulong = bindings::SHRINK_EMPTY as c_ulong; > + > +/// The default value for the number of seeks needed to recreate an object. > +pub const DEFAULT_SEEKS: u32 = bindings::DEFAULT_SEEKS; > + > +/// An unregistered shrinker. > +/// > +/// This type can be used to modify the settings of the shrinker before it is registered. > +/// > +/// # Invariants > +/// > +/// The `shrinker` pointer references an unregistered shrinker. > +pub struct UnregisteredShrinker { > + shrinker: NonNull<bindings::shrinker>, > +} > + > +// SAFETY: Moving an unregistered shrinker between threads is okay. > +unsafe impl Send for UnregisteredShrinker {} > +// SAFETY: An unregistered shrinker is thread safe. > +unsafe impl Sync for UnregisteredShrinker {} > + > +impl UnregisteredShrinker { I feel like this should have "Builder" in its name. Although I struggle to come up with a good one, maybe `ShrinkerRegistrationBuilder`? > + /// Create a new shrinker. > + pub fn alloc(name: &CStr) -> Result<Self, AllocError> { Feels like they should just be named `new`. C-side uses alloc/free pair but we don't have an explicit free function (just drop), so I think it should just be called `new` by convention. Also, would we want `&CStr` or `&str` (or `&BStr`)? This particular case doesn't really need NUL-terminated strings since you can just pass %.*s to shrinker_alloc. > + // SAFETY: Passing `0` as flags is okay. Using `%s` as the format string is okay when we > + // pass a nul-terminated string as the string for `%s` to print. > + let ptr = > + unsafe { bindings::shrinker_alloc(0, c_str!("%s").as_char_ptr(), name.as_char_ptr()) }; > + > + let shrinker = NonNull::new(ptr).ok_or(AllocError)?; > + > + // INVARIANT: The creation of the shrinker was successful. // INVARIANT: The allocated shrinker is unregistered. > + Ok(Self { shrinker }) > + } > + > + /// Create a new shrinker using format arguments for the name. > + pub fn alloc_fmt(name: core::fmt::Arguments<'_>) -> Result<Self, AllocError> { > + // SAFETY: Passing `0` as flags is okay. Using `%pA` as the format string is okay when we > + // pass a `fmt::Arguments` as the value to print. > + let ptr = unsafe { > + bindings::shrinker_alloc( > + 0, > + c_str!("%pA").as_char_ptr(), > + &name as *const _ as *const c_void, > + ) > + }; > + > + let shrinker = NonNull::new(ptr).ok_or(AllocError)?; > + > + // INVARIANT: The creation of the shrinker was successful. same here > + Ok(Self { shrinker }) > + } > + > + /// Set the number of seeks needed to recreate an object. > + pub fn set_seeks(&mut self, seeks: u32) { > + unsafe { (*self.shrinker.as_ptr()).seeks = seeks as c_int }; > + } > + > + /// Register the shrinker. > + /// > + /// The provided pointer is used as the private data, and the type `T` determines the callbacks > + /// that the shrinker will use. > + pub fn register<T: Shrinker>(self, private_data: T::Ptr) -> RegisteredShrinker<T> { > + let shrinker = self.shrinker; > + let ptr = shrinker.as_ptr(); > + > + // The destructor of `self` calls `shrinker_free`, so skip the destructor. > + core::mem::forget(self); > + > + let private_data_ptr = <T::Ptr as ForeignOwnable>::into_foreign(private_data); > + > + // SAFETY: We own the private data, so we can assign to it. > + unsafe { (*ptr).private_data = private_data_ptr.cast_mut() }; > + // SAFETY: The shrinker is not yet registered, so we can update this field. > + unsafe { (*ptr).count_objects = Some(rust_count_objects::<T>) }; > + // SAFETY: The shrinker is not yet registered, so we can update this field. > + unsafe { (*ptr).scan_objects = Some(rust_scan_objects::<T>) }; > + > + // SAFETY: The shrinker is unregistered, so it's safe to register it. > + unsafe { bindings::shrinker_register(ptr) }; > + > + RegisteredShrinker { > + shrinker, > + _phantom: PhantomData, > + } > + } > +} > + > +impl Drop for UnregisteredShrinker { > + fn drop(&mut self) { > + // SAFETY: The shrinker is a valid but unregistered shrinker, and we will not use it > + // anymore. > + unsafe { bindings::shrinker_free(self.shrinker.as_ptr()) }; > + } > +} > + > +/// A shrinker that is registered with the kernel. > +/// > +/// # Invariants > +/// > +/// The `shrinker` pointer refers to a registered shrinker using `T` as the private data. > +pub struct RegisteredShrinker<T: Shrinker> { > + shrinker: NonNull<bindings::shrinker>, > + _phantom: PhantomData<T::Ptr>, > +} > + > +// SAFETY: This allows you to deregister the shrinker from a different thread, which means that > +// private data could be dropped from any thread. > +unsafe impl<T: Shrinker> Send for RegisteredShrinker<T> where T::Ptr: Send {} > +// SAFETY: The only thing you can do with an immutable reference is access the private data, which > +// is okay to access in parallel as the `Shrinker` trait requires the private data to be `Sync`. > +unsafe impl<T: Shrinker> Sync for RegisteredShrinker<T> {} > + > +impl<T: Shrinker> RegisteredShrinker<T> { > + /// Access the private data in this shrinker. > + pub fn private_data(&self) -> <T::Ptr as ForeignOwnable>::Borrowed<'_> { > + // SAFETY: We own the private data, so we can access it. > + let private = unsafe { (*self.shrinker.as_ptr()).private_data }; > + // SAFETY: By the type invariants, the private data is `T`. This access could happen in > + // parallel with a shrinker callback, but that's okay as the `Shrinker` trait ensures that > + // `T::Ptr` is `Sync`. > + unsafe { <T::Ptr as ForeignOwnable>::borrow(private) } > + } > +} > + > +impl<T: Shrinker> Drop for RegisteredShrinker<T> { > + fn drop(&mut self) { > + // SAFETY: We own the private data, so we can access it. > + let private = unsafe { (*self.shrinker.as_ptr()).private_data }; > + // SAFETY: We will not access the shrinker after this call. > + unsafe { bindings::shrinker_free(self.shrinker.as_ptr()) }; > + // SAFETY: The above call blocked until the completion of any shrinker callbacks, so there > + // are no longer any users of the private data. > + drop(unsafe { <T::Ptr as ForeignOwnable>::from_foreign(private) }); > + } > +} > + > +/// Callbacks for a shrinker. > +pub trait Shrinker { > + /// The pointer type used to store the private data of the shrinker. > + /// > + /// Needs to be `Sync` because the shrinker callback could access this value immutably from > + /// several thread in parallel. > + type Ptr: ForeignOwnable + Sync; > + > + /// Count the number of freeable items in the cache. > + /// > + /// May be called from atomic context. > + fn count_objects( > + me: <Self::Ptr as ForeignOwnable>::Borrowed<'_>, > + sc: ShrinkControl<'_>, > + ) -> CountObjects; Any reason that we want this new type instead of just returning `Option<usize>` or something like `Result<usize, ShrinkUnknown>`? Would the `CountObjects` and `ScanObjects` be used in other places? If people always use `CountObjects::from_count()` or `ConstObjects::UNKNOWN` then I think it's better just to eliminate that type. > + > + /// Count the number of freeable items in the cache. > + /// > + /// May be called from atomic context. > + fn scan_objects( > + me: <Self::Ptr as ForeignOwnable>::Borrowed<'_>, > + sc: ShrinkControl<'_>, > + ) -> ScanObjects; > +} > + > +/// How many objects are there in the cache? > +/// > +/// This is used as the return value of [`Shrinker::count_objects`]. > +pub struct CountObjects { > + inner: c_ulong, > +} > + > +impl CountObjects { > + /// Indicates that the number of objects is unknown. > + pub const UNKNOWN: Self = Self { inner: 0 }; > + > + /// Indicates that the number of objects is zero. > + pub const EMPTY: Self = Self { > + inner: SHRINK_EMPTY, > + }; > + > + /// The maximum possible number of freeable objects. > + pub const MAX: Self = Self { > + // The shrinker code assumes that it can multiply this value by two without overflow. > + inner: c_ulong::MAX / 2, > + }; > + > + /// Creates a new `CountObjects` with the given value. > + pub fn from_count(count: usize) -> Self { > + if count == 0 { > + return Self::EMPTY; > + } > + > + if count > Self::MAX.inner as usize { > + return Self::MAX; > + } > + > + Self { > + inner: count as c_ulong, > + } > + } > +} > + > +/// How many objects were freed? > +/// > +/// This is used as the return value of [`Shrinker::scan_objects`]. > +pub struct ScanObjects { > + inner: c_ulong, > +} > + > +impl ScanObjects { > + /// Indicates that the shrinker should stop trying to free objects from this cache due to > + /// potential deadlocks. > + pub const STOP: Self = Self { inner: SHRINK_STOP }; > + > + /// The maximum possible number of freeable objects. > + pub const MAX: Self = Self { > + // The shrinker code assumes that it can multiply this value by two without overflow. > + inner: c_ulong::MAX / 2, > + }; > + > + /// Creates a new `CountObjects` with the given value. > + pub fn from_count(count: usize) -> Self { > + if count > Self::MAX.inner as usize { > + return Self::MAX; > + } > + > + Self { > + inner: count as c_ulong, > + } > + } > +} > + > +/// This struct is used to pass information from page reclaim to the shrinkers. > +pub struct ShrinkControl<'a> { > + ptr: NonNull<bindings::shrink_control>, > + _phantom: PhantomData<&'a bindings::shrink_control>, > +} I feel like this can just be a wrapper of `Opaque<ShrinkControl>` and we hand out `&'a ShrinkControl`? > + > +impl<'a> ShrinkControl<'a> { > + /// Create a `ShrinkControl` from a raw pointer. > + /// > + /// # Safety > + /// > + /// The pointer should point at a valid `shrink_control` for the duration of 'a. > + pub unsafe fn from_raw(ptr: *mut bindings::shrink_control) -> Self { > + Self { > + // SAFETY: Caller promises that this pointer is valid. > + ptr: unsafe { NonNull::new_unchecked(ptr) }, > + _phantom: PhantomData, > + } > + } > + > + /// Determines whether it is safe to recurse into filesystem code. > + pub fn gfp_fs(&self) -> bool { > + // SAFETY: Okay by type invariants. > + let mask = unsafe { (*self.ptr.as_ptr()).gfp_mask }; > + > + (mask & bindings::__GFP_FS) != 0 > + } > + > + /// Returns the number of objects that `scan_objects` should try to reclaim. > + pub fn nr_to_scan(&self) -> usize { > + // SAFETY: Okay by type invariants. > + unsafe { (*self.ptr.as_ptr()).nr_to_scan as usize } > + } > + > + /// The callback should set this value to the number of objects processed. > + pub fn set_nr_scanned(&mut self, val: usize) { > + let mut val = val as c_ulong; > + // SAFETY: Okay by type invariants. > + let max = unsafe { (*self.ptr.as_ptr()).nr_to_scan }; > + if val > max { > + val = max; > + } > + > + // SAFETY: Okay by type invariants. > + unsafe { (*self.ptr.as_ptr()).nr_scanned = val }; > + } > +} > + > +unsafe extern "C" fn rust_count_objects<T: Shrinker>( nit: I think we currently (at least in phy and block mq) use <callback_name>_callback as the `extern "C"` function name. > + shrink: *mut bindings::shrinker, > + sc: *mut bindings::shrink_control, > +) -> c_ulong { > + // SAFETY: We own the private data, so we can access it. > + let private = unsafe { (*shrink).private_data }; > + // SAFETY: This function is only used with shrinkers where `T` is the type of the private data. > + let private = unsafe { <T::Ptr as ForeignOwnable>::borrow(private) }; > + // SAFETY: The caller passes a valid `sc` pointer. > + let sc = unsafe { ShrinkControl::from_raw(sc) }; > + > + let ret = T::count_objects(private, sc); > + ret.inner > +} > + > +unsafe extern "C" fn rust_scan_objects<T: Shrinker>( > + shrink: *mut bindings::shrinker, > + sc: *mut bindings::shrink_control, > +) -> c_ulong { > + // SAFETY: We own the private data, so we can access it. > + let private = unsafe { (*shrink).private_data }; > + // SAFETY: This function is only used with shrinkers where `T` is the type of the private data. > + let private = unsafe { <T::Ptr as ForeignOwnable>::borrow(private) }; > + // SAFETY: The caller passes a valid `sc` pointer. > + let sc = unsafe { ShrinkControl::from_raw(sc) }; > + > + let ret = T::scan_objects(private, sc); > + ret.inner > +} > > --- > base-commit: 93dc3be19450447a3a7090bd1dfb9f3daac3e8d2 > change-id: 20240911-shrinker-f8371af00b68 > > Best regards,
On Sat, 14 Sep 2024 14:07:45 +0100 Gary Guo <gary@garyguo.net> wrote: > > +} > > + > > +/// This struct is used to pass information from page reclaim to the shrinkers. > > +pub struct ShrinkControl<'a> { > > + ptr: NonNull<bindings::shrink_control>, > > + _phantom: PhantomData<&'a bindings::shrink_control>, > > +} > > I feel like this can just be a wrapper of `Opaque<ShrinkControl>` and > we hand out `&'a ShrinkControl`? Correction: I meant `&'a mut ShrinkControl`. Best, Gary
On Thu, Sep 12, 2024 at 09:54:01AM +0000, Alice Ryhl wrote: > Rust Binder holds incoming transactions in a read-only mmap'd region > where it manually manages the pages. These pages are only in use until > the incoming transaction is consumed by userspace, but the kernel will > keep the pages around for future transactions. Rust Binder registers a > shrinker with the kernel so that it can give back these pages if the > system comes under memory pressure. > > Separate types are provided for registered and unregistered shrinkers. > The unregistered shrinker type can be used to configure the shrinker > before registering it. Separating it into two types also enables the > user to construct the private data between the calls to `shrinker_alloc` > and `shrinker_register` and avoid constructing the private data if > allocating the shrinker fails. This seems a bit nasty. It appears to me that the code does an unsafe copy of the internal shrinker state between the unregistered and registered types. Shrinkers have additional internal state when SHRINKER_MEMCG_AWARE and/or SHRINKER_NUMA_AWARE flags are set, and this abstraction doesn't seem to handle either those flags or the internal state they use at all. .... > diff --git a/rust/kernel/shrinker.rs b/rust/kernel/shrinker.rs > new file mode 100644 > index 000000000000..9af726bfe0b1 > --- /dev/null > +++ b/rust/kernel/shrinker.rs > @@ -0,0 +1,324 @@ > +// SPDX-License-Identifier: GPL-2.0 > + > +// Copyright (C) 2024 Google LLC. > + > +//! Shrinker for handling memory pressure. > +//! > +//! C header: [`include/linux/shrinker.h`](srctree/include/linux/shrinker.h) > + > +use crate::{alloc::AllocError, bindings, c_str, str::CStr, types::ForeignOwnable}; > + > +use core::{ > + ffi::{c_int, c_ulong, c_void}, > + marker::PhantomData, > + ptr::NonNull, > +}; > + > +const SHRINK_STOP: c_ulong = bindings::SHRINK_STOP as c_ulong; > +const SHRINK_EMPTY: c_ulong = bindings::SHRINK_EMPTY as c_ulong; > + > +/// The default value for the number of seeks needed to recreate an object. > +pub const DEFAULT_SEEKS: u32 = bindings::DEFAULT_SEEKS; > + > +/// An unregistered shrinker. > +/// > +/// This type can be used to modify the settings of the shrinker before it is registered. > +/// > +/// # Invariants > +/// > +/// The `shrinker` pointer references an unregistered shrinker. > +pub struct UnregisteredShrinker { > + shrinker: NonNull<bindings::shrinker>, > +} > + > +// SAFETY: Moving an unregistered shrinker between threads is okay. > +unsafe impl Send for UnregisteredShrinker {} > +// SAFETY: An unregistered shrinker is thread safe. > +unsafe impl Sync for UnregisteredShrinker {} > + > +impl UnregisteredShrinker { > + /// Create a new shrinker. > + pub fn alloc(name: &CStr) -> Result<Self, AllocError> { > + // SAFETY: Passing `0` as flags is okay. Using `%s` as the format string is okay when we > + // pass a nul-terminated string as the string for `%s` to print. > + let ptr = > + unsafe { bindings::shrinker_alloc(0, c_str!("%s").as_char_ptr(), name.as_char_ptr()) }; This passes flags as 0, so doesn't support SHRINKER_MEMCG_AWARE or SHRINKER_NUMA_AWARE shrinker variants. These flags should be passed through here. > + > + let shrinker = NonNull::new(ptr).ok_or(AllocError)?; > + > + // INVARIANT: The creation of the shrinker was successful. > + Ok(Self { shrinker }) > + } > + > + /// Create a new shrinker using format arguments for the name. > + pub fn alloc_fmt(name: core::fmt::Arguments<'_>) -> Result<Self, AllocError> { > + // SAFETY: Passing `0` as flags is okay. Using `%pA` as the format string is okay when we > + // pass a `fmt::Arguments` as the value to print. > + let ptr = unsafe { > + bindings::shrinker_alloc( > + 0, Same again. > + c_str!("%pA").as_char_ptr(), > + &name as *const _ as *const c_void, > + ) > + }; > + > + let shrinker = NonNull::new(ptr).ok_or(AllocError)?; > + > + // INVARIANT: The creation of the shrinker was successful. > + Ok(Self { shrinker }) > + } > + > + /// Set the number of seeks needed to recreate an object. > + pub fn set_seeks(&mut self, seeks: u32) { > + unsafe { (*self.shrinker.as_ptr()).seeks = seeks as c_int }; > + } Seems kinda weird to have a separate function for setting seeks, when... > + > + /// Register the shrinker. > + /// > + /// The provided pointer is used as the private data, and the type `T` determines the callbacks > + /// that the shrinker will use. > + pub fn register<T: Shrinker>(self, private_data: T::Ptr) -> RegisteredShrinker<T> { .... all the other data needed to set up a shrinker is provided to this function.... > + let shrinker = self.shrinker; > + let ptr = shrinker.as_ptr(); > + > + // The destructor of `self` calls `shrinker_free`, so skip the destructor. > + core::mem::forget(self); > + > + let private_data_ptr = <T::Ptr as ForeignOwnable>::into_foreign(private_data); > + > + // SAFETY: We own the private data, so we can assign to it. > + unsafe { (*ptr).private_data = private_data_ptr.cast_mut() }; > + // SAFETY: The shrinker is not yet registered, so we can update this field. > + unsafe { (*ptr).count_objects = Some(rust_count_objects::<T>) }; > + // SAFETY: The shrinker is not yet registered, so we can update this field. > + unsafe { (*ptr).scan_objects = Some(rust_scan_objects::<T>) }; .... and implemented exactly the same way. ..... > +/// How many objects are there in the cache? > +/// > +/// This is used as the return value of [`Shrinker::count_objects`]. > +pub struct CountObjects { > + inner: c_ulong, > +} > + > +impl CountObjects { > + /// Indicates that the number of objects is unknown. > + pub const UNKNOWN: Self = Self { inner: 0 }; > + > + /// Indicates that the number of objects is zero. > + pub const EMPTY: Self = Self { > + inner: SHRINK_EMPTY, > + }; > + > + /// The maximum possible number of freeable objects. > + pub const MAX: Self = Self { > + // The shrinker code assumes that it can multiply this value by two without overflow. > + inner: c_ulong::MAX / 2, > + }; > + > + /// Creates a new `CountObjects` with the given value. > + pub fn from_count(count: usize) -> Self { > + if count == 0 { > + return Self::EMPTY; > + } No. A return count of 0 is not the same as a return value of SHRINK_EMPTY. A return value of 0 means "no reclaim work can be done right now". This implies that there are objects that can be reclaimed in the near future, but right now they are unavailable for reclaim. This can be due to a trylock protecting the count operation failing, the all the objects in the cache being dirty and needing work to be done before they can be reclaimed, etc. A return value of SHRINK_EMPTY means "there are no reclaimable objects at all". This implies that the cache is empty - it has absolutely nothing in it that can be reclaimed either now or in the near future. These two return values are treated very differently by the high level code. SHRINK_EMPTY is used by shrink_slab_memcg() to maintain a "shrinker needs to run" bit in the memcg shrinker search bitmap. The bit is cleared when SHRINK_EMPTY is returned, meaning the shrinker will not get called again until a new object is queued to it's LRU. This sets the "shrinker needs to run" bit again, and so the shrinker will run next time shrink_slab_memcg() is called. In constrast, a return value of zero (i.e. no work to be done right now) does not change the "shrinker needs to run" state bit, and hence it will always be called the next time the shrink_slab_memcg() is run to try to reclaim objects from that memcg shrinker... ..... > +impl ScanObjects { > + /// Indicates that the shrinker should stop trying to free objects from this cache due to > + /// potential deadlocks. > + pub const STOP: Self = Self { inner: SHRINK_STOP }; > + > + /// The maximum possible number of freeable objects. > + pub const MAX: Self = Self { > + // The shrinker code assumes that it can multiply this value by two without overflow. > + inner: c_ulong::MAX / 2, No it doesn't. This is purely a count of objects freed by the shrinker. > + }; > + > + /// Creates a new `CountObjects` with the given value. > + pub fn from_count(count: usize) -> Self { > + if count > Self::MAX.inner as usize { > + return Self::MAX; > + } > + > + Self { > + inner: count as c_ulong, > + } > + } > +} > + > +/// This struct is used to pass information from page reclaim to the shrinkers. > +pub struct ShrinkControl<'a> { > + ptr: NonNull<bindings::shrink_control>, > + _phantom: PhantomData<&'a bindings::shrink_control>, > +} > + > +impl<'a> ShrinkControl<'a> { > + /// Create a `ShrinkControl` from a raw pointer. > + /// > + /// # Safety > + /// > + /// The pointer should point at a valid `shrink_control` for the duration of 'a. > + pub unsafe fn from_raw(ptr: *mut bindings::shrink_control) -> Self { > + Self { > + // SAFETY: Caller promises that this pointer is valid. > + ptr: unsafe { NonNull::new_unchecked(ptr) }, > + _phantom: PhantomData, > + } > + } > + > + /// Determines whether it is safe to recurse into filesystem code. > + pub fn gfp_fs(&self) -> bool { > + // SAFETY: Okay by type invariants. > + let mask = unsafe { (*self.ptr.as_ptr()).gfp_mask }; > + > + (mask & bindings::__GFP_FS) != 0 > + } This needs a check for __GFP_IO as well, as there are block layer shrinkers that need to be GFP_NOIO aware... > + > + /// Returns the number of objects that `scan_objects` should try to reclaim. > + pub fn nr_to_scan(&self) -> usize { > + // SAFETY: Okay by type invariants. > + unsafe { (*self.ptr.as_ptr()).nr_to_scan as usize } > + } > + > + /// The callback should set this value to the number of objects processed. > + pub fn set_nr_scanned(&mut self, val: usize) { > + let mut val = val as c_ulong; > + // SAFETY: Okay by type invariants. > + let max = unsafe { (*self.ptr.as_ptr()).nr_to_scan }; > + if val > max { > + val = max; > + } No. Shrinkers are allowed to scan more objects in a batch than the high level code asked them to scan. If they do this, they *must* report back the count of all the objects they scanned so the batched scan accounting can adjust the remaining amount of work that needs to be done appropriately. > + > + // SAFETY: Okay by type invariants. > + unsafe { (*self.ptr.as_ptr()).nr_scanned = val }; > + } > +} > + > +unsafe extern "C" fn rust_count_objects<T: Shrinker>( > + shrink: *mut bindings::shrinker, > + sc: *mut bindings::shrink_control, > +) -> c_ulong { > + // SAFETY: We own the private data, so we can access it. > + let private = unsafe { (*shrink).private_data }; > + // SAFETY: This function is only used with shrinkers where `T` is the type of the private data. > + let private = unsafe { <T::Ptr as ForeignOwnable>::borrow(private) }; > + // SAFETY: The caller passes a valid `sc` pointer. > + let sc = unsafe { ShrinkControl::from_raw(sc) }; > + > + let ret = T::count_objects(private, sc); > + ret.inner > +} Why are there two assignments to "private" here? And for the one that is borrowing a reference, why is it needed as the shrinker is supposed to hold a reference, right? Also, where is that reference dropped - it's not at all obvious to me (as a relative n00b to rust) what is going on here. -Dave.
On Mon, Sep 16, 2024 at 4:59 AM Dave Chinner <david@fromorbit.com> wrote: > > On Thu, Sep 12, 2024 at 09:54:01AM +0000, Alice Ryhl wrote: > > Rust Binder holds incoming transactions in a read-only mmap'd region > > where it manually manages the pages. These pages are only in use until > > the incoming transaction is consumed by userspace, but the kernel will > > keep the pages around for future transactions. Rust Binder registers a > > shrinker with the kernel so that it can give back these pages if the > > system comes under memory pressure. > > > > Separate types are provided for registered and unregistered shrinkers. > > The unregistered shrinker type can be used to configure the shrinker > > before registering it. Separating it into two types also enables the > > user to construct the private data between the calls to `shrinker_alloc` > > and `shrinker_register` and avoid constructing the private data if > > allocating the shrinker fails. > > This seems a bit nasty. It appears to me that the code does an > unsafe copy of the internal shrinker state between the unregistered > and registered types. Shrinkers have additional internal state when > SHRINKER_MEMCG_AWARE and/or SHRINKER_NUMA_AWARE flags are set, > and this abstraction doesn't seem to handle either those flags or > the internal state they use at all. Doing an unsafe copy of the internal shrinker state is certainly not my intent. The UnregisteredShrinker and RegisteredShrinker types just wrap a non-null pointer, so the moves inside `register` are not copies of the whole `struct shrinker` but just copies of a pointer to the shrinker. > > +impl UnregisteredShrinker { > > + /// Create a new shrinker. > > + pub fn alloc(name: &CStr) -> Result<Self, AllocError> { > > + // SAFETY: Passing `0` as flags is okay. Using `%s` as the format string is okay when we > > + // pass a nul-terminated string as the string for `%s` to print. > > + let ptr = > > + unsafe { bindings::shrinker_alloc(0, c_str!("%s").as_char_ptr(), name.as_char_ptr()) }; > > This passes flags as 0, so doesn't support SHRINKER_MEMCG_AWARE or > SHRINKER_NUMA_AWARE shrinker variants. These flags should be > passed through here. I don't have a user for memcg/numa aware shrinkers in Rust, which is why I did not extend these abstractions to support them. However, if you would like me to, I'm happy to do so. It is easy to add a flags argument to this method, but I imagine they also need a few other additions elsewhere in the API to really be supported? Now that I think about it, perhaps Binder's shrinker *could* be memcg aware? It uses the list_lru abstraction, and currently calls `list_lru_count` in `count_objects`, but maybe we could just use `list_lru_shrink_count` instead and magically become memcg aware ...? > > + /// Set the number of seeks needed to recreate an object. > > + pub fn set_seeks(&mut self, seeks: u32) { > > + unsafe { (*self.shrinker.as_ptr()).seeks = seeks as c_int }; > > + } > > Seems kinda weird to have a separate function for setting seeks, > when... > > > + > > + /// Register the shrinker. > > + /// > > + /// The provided pointer is used as the private data, and the type `T` determines the callbacks > > + /// that the shrinker will use. > > + pub fn register<T: Shrinker>(self, private_data: T::Ptr) -> RegisteredShrinker<T> { > > .... all the other data needed to set up a shrinker is provided to > this function.... > > > + let shrinker = self.shrinker; > > + let ptr = shrinker.as_ptr(); > > + > > + // The destructor of `self` calls `shrinker_free`, so skip the destructor. > > + core::mem::forget(self); > > + > > + let private_data_ptr = <T::Ptr as ForeignOwnable>::into_foreign(private_data); > > + > > + // SAFETY: We own the private data, so we can assign to it. > > + unsafe { (*ptr).private_data = private_data_ptr.cast_mut() }; > > + // SAFETY: The shrinker is not yet registered, so we can update this field. > > + unsafe { (*ptr).count_objects = Some(rust_count_objects::<T>) }; > > + // SAFETY: The shrinker is not yet registered, so we can update this field. > > + unsafe { (*ptr).scan_objects = Some(rust_scan_objects::<T>) }; > > .... and implemented exactly the same way. Well .. this design makes setting `seeks` optional, but `private_data` mandatory. We *have* to set the two function pointers, but you don't have to set `seeks` explicitly. It's not obvious, so it's probably worth mentioning that this design doesn't actually require you to make an allocation and store a real pointer in the private data. When implementing the Shrinker trait for your custom shrinker, you have to choose which pointer type to use. If you choose the unit type `()` as the pointer type, then no real pointer is actually stored. > > +/// How many objects are there in the cache? > > +/// > > +/// This is used as the return value of [`Shrinker::count_objects`]. > > +pub struct CountObjects { > > + inner: c_ulong, > > +} > > + > > +impl CountObjects { > > + /// Indicates that the number of objects is unknown. > > + pub const UNKNOWN: Self = Self { inner: 0 }; > > + > > + /// Indicates that the number of objects is zero. > > + pub const EMPTY: Self = Self { > > + inner: SHRINK_EMPTY, > > + }; > > + > > + /// The maximum possible number of freeable objects. > > + pub const MAX: Self = Self { > > + // The shrinker code assumes that it can multiply this value by two without overflow. > > + inner: c_ulong::MAX / 2, > > + }; > > + > > + /// Creates a new `CountObjects` with the given value. > > + pub fn from_count(count: usize) -> Self { > > + if count == 0 { > > + return Self::EMPTY; > > + } > > No. A return count of 0 is not the same as a return value of > SHRINK_EMPTY. > > A return value of 0 means "no reclaim work can be done right now". > > This implies that there are objects that can be reclaimed in the near > future, but right now they are unavailable for reclaim. This can be > due to a trylock protecting the count operation failing, the all the > objects in the cache being dirty and needing work to be done before > they can be reclaimed, etc. > > A return value of SHRINK_EMPTY means "there are no reclaimable > objects at all". > > This implies that the cache is empty - it has absolutely nothing in > it that can be reclaimed either now or in the near future. > > > These two return values are treated very differently by the high > level code. SHRINK_EMPTY is used by shrink_slab_memcg() to maintain > a "shrinker needs to run" bit in the memcg shrinker search bitmap. > The bit is cleared when SHRINK_EMPTY is returned, meaning the > shrinker will not get called again until a new object is queued > to it's LRU. This sets the "shrinker needs to run" bit again, and > so the shrinker will run next time shrink_slab_memcg() is called. > > In constrast, a return value of zero (i.e. no work to be done right > now) does not change the "shrinker needs to run" state bit, and > hence it will always be called the next time the shrink_slab_memcg() > is run to try to reclaim objects from that memcg shrinker... This is a part of the API that I was pretty unsure about, so very happy to get your input. For both of the shrinkers I am familiar with, they know the exact count of objects and there's no error case for any lock. The shrinkers just return the exact count directly from count_objects, but that seemed incorrect to me, as this means that the shrinker returns 0 when it really should return SHRINK_EMPTY. That's why I implemented `from_count` this way - the intent is that you use `CountObjects::from_count` when you know the exact count, so if you pass 0 to that method, then that means the shrinker really is empty. If you don't know what the count is, then you return `CountObjects::UNKNOWN` instead. That said, I guess there is some reason that the C shrinker API is designed to use the value 0 for unknown rather than empty, so perhaps I should not try to do it differently. > > +impl ScanObjects { > > + /// Indicates that the shrinker should stop trying to free objects from this cache due to > > + /// potential deadlocks. > > + pub const STOP: Self = Self { inner: SHRINK_STOP }; > > + > > + /// The maximum possible number of freeable objects. > > + pub const MAX: Self = Self { > > + // The shrinker code assumes that it can multiply this value by two without overflow. > > + inner: c_ulong::MAX / 2, > > No it doesn't. This is purely a count of objects freed by the > shrinker. In mm/shrinker.c it multiplies by two here: total_scan = min(total_scan, (2 * freeable)); And I guess there is even a multiplication by four a bit earlier in the same function if priority is zero. > > + /// Determines whether it is safe to recurse into filesystem code. > > + pub fn gfp_fs(&self) -> bool { > > + // SAFETY: Okay by type invariants. > > + let mask = unsafe { (*self.ptr.as_ptr()).gfp_mask }; > > + > > + (mask & bindings::__GFP_FS) != 0 > > + } > > This needs a check for __GFP_IO as well, as there are block layer > shrinkers that need to be GFP_NOIO aware... Would you prefer that I add additional methods, or that I modify this method to also check __GFP_IO, or that I just expose a way to get the `gfp_mask` value directly? > > + /// Returns the number of objects that `scan_objects` should try to reclaim. > > + pub fn nr_to_scan(&self) -> usize { > > + // SAFETY: Okay by type invariants. > > + unsafe { (*self.ptr.as_ptr()).nr_to_scan as usize } > > + } > > + > > + /// The callback should set this value to the number of objects processed. > > + pub fn set_nr_scanned(&mut self, val: usize) { > > + let mut val = val as c_ulong; > > + // SAFETY: Okay by type invariants. > > + let max = unsafe { (*self.ptr.as_ptr()).nr_to_scan }; > > + if val > max { > > + val = max; > > + } > > No. Shrinkers are allowed to scan more objects in a batch than > the high level code asked them to scan. If they do this, they > *must* report back the count of all the objects they scanned so the > batched scan accounting can adjust the remaining amount of work that > needs to be done appropriately. Acknowledged. This is related to another question that I had, actually. Other than returning SHRINK_STOP, in what situations should `nr_scanned` be different from the return value of `scan_objects`? > > + > > + // SAFETY: Okay by type invariants. > > + unsafe { (*self.ptr.as_ptr()).nr_scanned = val }; > > + } > > +} > > + > > +unsafe extern "C" fn rust_count_objects<T: Shrinker>( > > + shrink: *mut bindings::shrinker, > > + sc: *mut bindings::shrink_control, > > +) -> c_ulong { > > + // SAFETY: We own the private data, so we can access it. > > + let private = unsafe { (*shrink).private_data }; > > + // SAFETY: This function is only used with shrinkers where `T` is the type of the private data. > > + let private = unsafe { <T::Ptr as ForeignOwnable>::borrow(private) }; > > + // SAFETY: The caller passes a valid `sc` pointer. > > + let sc = unsafe { ShrinkControl::from_raw(sc) }; > > + > > + let ret = T::count_objects(private, sc); > > + ret.inner > > +} > > Why are there two assignments to "private" here? And for the one > that is borrowing a reference, why is it needed as the shrinker is > supposed to hold a reference, right? Also, where is that reference > dropped - it's not at all obvious to me (as a relative n00b to rust) > what is going on here. It's just to avoid having one long line. The first line obtains a void pointer to the private data. The second line converts it into a borrowed form of whichever pointer type the private data has chosen to use. For example: * If the private data is `Box<T>`, that is, a pointer with exclusive ownership over an allocation created with kmalloc, then the borrowed form is a normal shared Rust reference `&T`. * If the private data is `Arc<T>`, that is, a pointer that owns a refcount on an allocation created with kmalloc, then the borrowed form is `ArcBorrow<T>`, which is like `&T` but also gives you the ability to increment the refcount and create a new `Arc<T>` if you would like one. An `ArcBorrow<T>` does not hold a refcount itself. Notably, the call to `borrow` does *not* increment any refcounts. Think of it as a pointer cast from void pointer to whichever pointer type is appropriate for the kind of ownership used with the private data. If you look at the definition of ForeignOwnable in rust/kernel/types.rs, which is a trait for anything that can be stored in a void pointer, you'll find that it has three methods: * `into_foreign` - converts an owned version of the pointer into a void pointer. * `from_foreign` - converts the void pointer back into an owned version, effectively taking ownership of the void pointer * `borrow` - let's you look inside a void pointer without taking ownership of it Here, `from_foreign` requires that there are no active calls to `borrow` when you take back ownership. Alice
On Fri, Sep 13, 2024 at 10:31:20PM +0200, Alice Ryhl wrote: > On Fri, Sep 13, 2024 at 10:15 PM Simona Vetter <simona.vetter@ffwll.ch> wrote: > > > > On Thu, Sep 12, 2024 at 09:54:01AM +0000, Alice Ryhl wrote: > > > Rust Binder holds incoming transactions in a read-only mmap'd region > > > where it manually manages the pages. These pages are only in use until > > > the incoming transaction is consumed by userspace, but the kernel will > > > keep the pages around for future transactions. Rust Binder registers a > > > shrinker with the kernel so that it can give back these pages if the > > > system comes under memory pressure. > > > > > > Separate types are provided for registered and unregistered shrinkers. > > > The unregistered shrinker type can be used to configure the shrinker > > > before registering it. Separating it into two types also enables the > > > user to construct the private data between the calls to `shrinker_alloc` > > > and `shrinker_register` and avoid constructing the private data if > > > allocating the shrinker fails. > > > > > > The user specifies the callbacks in use by implementing the Shrinker > > > trait for the type used for the private data. This requires specifying > > > three things: implementations for count_objects and scan_objects, and > > > the pointer type that the private data will be wrapped in. > > > > > > The return values of count_objects and scan_objects are provided using > > > new types called CountObjects and ScanObjects respectively. These types > > > prevent the user from e.g. returning SHRINK_STOP from count_objects or > > > returning SHRINK_EMPTY from scan_objects. > > > > > > Signed-off-by: Alice Ryhl <aliceryhl@google.com> > > > > Scary given that drm has a few, but I learned a few new thinks about > > shrinkers. Some comments below, but looks all really nice imo. > > > > Cheers, Sima > > > > > --- > > > rust/bindings/bindings_helper.h | 2 + > > > rust/kernel/lib.rs | 1 + > > > rust/kernel/shrinker.rs | 324 ++++++++++++++++++++++++++++++++++++++++ > > > 3 files changed, 327 insertions(+) > > > > > > diff --git a/rust/bindings/bindings_helper.h b/rust/bindings/bindings_helper.h > > > index ae82e9c941af..7fc958e05dc5 100644 > > > --- a/rust/bindings/bindings_helper.h > > > +++ b/rust/bindings/bindings_helper.h > > > @@ -18,6 +18,7 @@ > > > #include <linux/phy.h> > > > #include <linux/refcount.h> > > > #include <linux/sched.h> > > > +#include <linux/shrinker.h> > > > #include <linux/slab.h> > > > #include <linux/wait.h> > > > #include <linux/workqueue.h> > > > @@ -31,4 +32,5 @@ const gfp_t RUST_CONST_HELPER_GFP_KERNEL_ACCOUNT = GFP_KERNEL_ACCOUNT; > > > const gfp_t RUST_CONST_HELPER_GFP_NOWAIT = GFP_NOWAIT; > > > const gfp_t RUST_CONST_HELPER___GFP_ZERO = __GFP_ZERO; > > > const gfp_t RUST_CONST_HELPER___GFP_HIGHMEM = ___GFP_HIGHMEM; > > > +const gfp_t RUST_CONST_HELPER___GFP_FS = ___GFP_FS; > > > const blk_features_t RUST_CONST_HELPER_BLK_FEAT_ROTATIONAL = BLK_FEAT_ROTATIONAL; > > > diff --git a/rust/kernel/lib.rs b/rust/kernel/lib.rs > > > index f10b06a78b9d..f35eb290f2e0 100644 > > > --- a/rust/kernel/lib.rs > > > +++ b/rust/kernel/lib.rs > > > @@ -45,6 +45,7 @@ > > > pub mod prelude; > > > pub mod print; > > > pub mod rbtree; > > > +pub mod shrinker; > > > mod static_assert; > > > #[doc(hidden)] > > > pub mod std_vendor; > > > diff --git a/rust/kernel/shrinker.rs b/rust/kernel/shrinker.rs > > > new file mode 100644 > > > index 000000000000..9af726bfe0b1 > > > --- /dev/null > > > +++ b/rust/kernel/shrinker.rs > > > @@ -0,0 +1,324 @@ > > > +// SPDX-License-Identifier: GPL-2.0 > > > + > > > +// Copyright (C) 2024 Google LLC. > > > + > > > +//! Shrinker for handling memory pressure. > > > +//! > > > +//! C header: [`include/linux/shrinker.h`](srctree/include/linux/shrinker.h) > > > + > > > +use crate::{alloc::AllocError, bindings, c_str, str::CStr, types::ForeignOwnable}; > > > + > > > +use core::{ > > > + ffi::{c_int, c_ulong, c_void}, > > > + marker::PhantomData, > > > + ptr::NonNull, > > > +}; > > > + > > > +const SHRINK_STOP: c_ulong = bindings::SHRINK_STOP as c_ulong; > > > +const SHRINK_EMPTY: c_ulong = bindings::SHRINK_EMPTY as c_ulong; > > > + > > > +/// The default value for the number of seeks needed to recreate an object. > > > +pub const DEFAULT_SEEKS: u32 = bindings::DEFAULT_SEEKS; > > > + > > > +/// An unregistered shrinker. > > > +/// > > > +/// This type can be used to modify the settings of the shrinker before it is registered. > > > +/// > > > +/// # Invariants > > > +/// > > > +/// The `shrinker` pointer references an unregistered shrinker. > > > +pub struct UnregisteredShrinker { > > > + shrinker: NonNull<bindings::shrinker>, > > > +} > > > + > > > +// SAFETY: Moving an unregistered shrinker between threads is okay. > > > +unsafe impl Send for UnregisteredShrinker {} > > > +// SAFETY: An unregistered shrinker is thread safe. > > > +unsafe impl Sync for UnregisteredShrinker {} > > > + > > > +impl UnregisteredShrinker { > > > + /// Create a new shrinker. > > > + pub fn alloc(name: &CStr) -> Result<Self, AllocError> { > > > + // SAFETY: Passing `0` as flags is okay. Using `%s` as the format string is okay when we > > > > I'd elaborate here that we have to pass 0 as the only valid value because > > all the non-zero flags are for memcg and numa aware shrinkers, and we do > > not support those because we don't expose the relevant data from > > ShrinkControl. > > > > > + // pass a nul-terminated string as the string for `%s` to print. > > > + let ptr = > > > + unsafe { bindings::shrinker_alloc(0, c_str!("%s").as_char_ptr(), name.as_char_ptr()) }; > > > + > > > + let shrinker = NonNull::new(ptr).ok_or(AllocError)?; > > > + > > > + // INVARIANT: The creation of the shrinker was successful. > > > + Ok(Self { shrinker }) > > > + } > > > + > > > + /// Create a new shrinker using format arguments for the name. > > > + pub fn alloc_fmt(name: core::fmt::Arguments<'_>) -> Result<Self, AllocError> { > > > + // SAFETY: Passing `0` as flags is okay. Using `%pA` as the format string is okay when we > > > + // pass a `fmt::Arguments` as the value to print. > > > + let ptr = unsafe { > > > + bindings::shrinker_alloc( > > > + 0, > > > + c_str!("%pA").as_char_ptr(), > > > + &name as *const _ as *const c_void, > > > + ) > > > + }; > > > + > > > + let shrinker = NonNull::new(ptr).ok_or(AllocError)?; > > > + > > > + // INVARIANT: The creation of the shrinker was successful. > > > + Ok(Self { shrinker }) > > > + } > > > + > > > + /// Set the number of seeks needed to recreate an object. > > > + pub fn set_seeks(&mut self, seeks: u32) { > > > > Not sure we want to expose this, with ssd seeks kinda don't matter and > > it's just a bit about relative fairness. I think nowadays it's more > > important that the count_object values are normalized to size, if not all > > objects you shrink have the same fixed size. > > > > So not sure we need this, at least initially. > > Good point about keeping the objects the same fixed size. I'll > incorporate that in the docs. > > > > + unsafe { (*self.shrinker.as_ptr()).seeks = seeks as c_int }; > > > + } > > > + > > > + /// Register the shrinker. > > > + /// > > > + /// The provided pointer is used as the private data, and the type `T` determines the callbacks > > > + /// that the shrinker will use. > > > + pub fn register<T: Shrinker>(self, private_data: T::Ptr) -> RegisteredShrinker<T> { > > > + let shrinker = self.shrinker; > > > + let ptr = shrinker.as_ptr(); > > > + > > > + // The destructor of `self` calls `shrinker_free`, so skip the destructor. > > > + core::mem::forget(self); > > > + > > > + let private_data_ptr = <T::Ptr as ForeignOwnable>::into_foreign(private_data); > > > + > > > + // SAFETY: We own the private data, so we can assign to it. > > > + unsafe { (*ptr).private_data = private_data_ptr.cast_mut() }; > > > + // SAFETY: The shrinker is not yet registered, so we can update this field. > > > + unsafe { (*ptr).count_objects = Some(rust_count_objects::<T>) }; > > > + // SAFETY: The shrinker is not yet registered, so we can update this field. > > > + unsafe { (*ptr).scan_objects = Some(rust_scan_objects::<T>) }; > > > + > > > + // SAFETY: The shrinker is unregistered, so it's safe to register it. > > > + unsafe { bindings::shrinker_register(ptr) }; > > > + > > > + RegisteredShrinker { > > > + shrinker, > > > + _phantom: PhantomData, > > > + } > > > + } > > > +} > > > + > > > +impl Drop for UnregisteredShrinker { > > > + fn drop(&mut self) { > > > + // SAFETY: The shrinker is a valid but unregistered shrinker, and we will not use it > > > + // anymore. > > > + unsafe { bindings::shrinker_free(self.shrinker.as_ptr()) }; > > > + } > > > +} > > > + > > > +/// A shrinker that is registered with the kernel. > > > +/// > > > +/// # Invariants > > > +/// > > > +/// The `shrinker` pointer refers to a registered shrinker using `T` as the private data. > > > +pub struct RegisteredShrinker<T: Shrinker> { > > > + shrinker: NonNull<bindings::shrinker>, > > > + _phantom: PhantomData<T::Ptr>, > > > +} > > > + > > > +// SAFETY: This allows you to deregister the shrinker from a different thread, which means that > > > +// private data could be dropped from any thread. > > > +unsafe impl<T: Shrinker> Send for RegisteredShrinker<T> where T::Ptr: Send {} > > > +// SAFETY: The only thing you can do with an immutable reference is access the private data, which > > > +// is okay to access in parallel as the `Shrinker` trait requires the private data to be `Sync`. > > > +unsafe impl<T: Shrinker> Sync for RegisteredShrinker<T> {} > > > + > > > +impl<T: Shrinker> RegisteredShrinker<T> { > > > + /// Access the private data in this shrinker. > > > + pub fn private_data(&self) -> <T::Ptr as ForeignOwnable>::Borrowed<'_> { > > > + // SAFETY: We own the private data, so we can access it. > > > + let private = unsafe { (*self.shrinker.as_ptr()).private_data }; > > > + // SAFETY: By the type invariants, the private data is `T`. This access could happen in > > > + // parallel with a shrinker callback, but that's okay as the `Shrinker` trait ensures that > > > + // `T::Ptr` is `Sync`. > > > + unsafe { <T::Ptr as ForeignOwnable>::borrow(private) } > > > + } > > > +} > > > + > > > +impl<T: Shrinker> Drop for RegisteredShrinker<T> { > > > + fn drop(&mut self) { > > > + // SAFETY: We own the private data, so we can access it. > > > + let private = unsafe { (*self.shrinker.as_ptr()).private_data }; > > > + // SAFETY: We will not access the shrinker after this call. > > > + unsafe { bindings::shrinker_free(self.shrinker.as_ptr()) }; > > > + // SAFETY: The above call blocked until the completion of any shrinker callbacks, so there > > > + // are no longer any users of the private data. > > > + drop(unsafe { <T::Ptr as ForeignOwnable>::from_foreign(private) }); > > > + } > > > +} > > > + > > > +/// Callbacks for a shrinker. > > > +pub trait Shrinker { > > > + /// The pointer type used to store the private data of the shrinker. > > > + /// > > > + /// Needs to be `Sync` because the shrinker callback could access this value immutably from > > > + /// several thread in parallel. > > > + type Ptr: ForeignOwnable + Sync; > > > + > > > + /// Count the number of freeable items in the cache. > > > + /// > > > + /// May be called from atomic context. > > > > That's wrong, reclaim is allowed to block. Or my understanding of how this > > works is very busted. We do run in a pseudo locking context, the core > > code annotates that with fs_reclaim_acquire/release. > > Ah, ok. Every shrinker I've looked at uses try_lock everywhere so I > assumed this could happen. Thanks for verifying that. > > > > + fn count_objects( > > > + me: <Self::Ptr as ForeignOwnable>::Borrowed<'_>, > > > + sc: ShrinkControl<'_>, > > > + ) -> CountObjects; > > > + > > > + /// Count the number of freeable items in the cache. > > > + /// > > > + /// May be called from atomic context. > > > > Same here. > > > > > + fn scan_objects( > > > + me: <Self::Ptr as ForeignOwnable>::Borrowed<'_>, > > > + sc: ShrinkControl<'_>, > > > + ) -> ScanObjects; > > > +} > > > + > > > +/// How many objects are there in the cache? > > > +/// > > > +/// This is used as the return value of [`Shrinker::count_objects`]. > > > +pub struct CountObjects { > > > + inner: c_ulong, > > > +} > > > + > > > +impl CountObjects { > > > + /// Indicates that the number of objects is unknown. > > > + pub const UNKNOWN: Self = Self { inner: 0 }; > > > + > > > + /// Indicates that the number of objects is zero. > > > + pub const EMPTY: Self = Self { > > > + inner: SHRINK_EMPTY, > > > > So I spend way too much time looking at all this, and I think overflows > > don't matter to the core shrinker code (aside from maybe a bit of > > unfairness), as long as we don't accidently return SHRINK_EMPTY here. But > > that's only relevant for memcg aware shrinkers I think. > > Overflow is one thing. The current API automatically converts 0 to > SHRINK_EMPTY, whereas many C shrinkers just return the size directly, > which means they return UNKNOWN when it's really empty. Thoughts? I think the automatic conversion SHRINK_EMPTY and making sure implementations don't accidentally return SHRINK_STOP or one of the special is good, since that has a direct impact on the code flow. It was just the additional overflow prevention you're doing here where I'm not sure it's enough, and the C code clearly doesn't care. > > > + }; > > > + > > > + /// The maximum possible number of freeable objects. > > > + pub const MAX: Self = Self { > > > + // The shrinker code assumes that it can multiply this value by two without overflow. > > > + inner: c_ulong::MAX / 2, > > > > I think the limits is actually ulong_max/4, since priority can be 0 from > > drop_slabs() and we multiply by 4 is seeks are nonzero. But then I tried > > to look at what the limit for nr_to_scan and hence ScanObjects, and I > > think aside from the special values the mm/shrinker.c simply does not care > > about overflows at all. Both unsigned and signed integer math is well > > defined for overflow in linux, so no compiler license to do stupid stuff, > > and worst case if you do overflow you just end up shrinking a bit > > unfairly. But afaict nothing breaks. > > > > So not sure we should enforce this when core mm doesn't bother. > > > > Same for ScanObjects below. > > > > > + }; > > > + > > > + /// Creates a new `CountObjects` with the given value. > > > + pub fn from_count(count: usize) -> Self { > > > + if count == 0 { > > > + return Self::EMPTY; > > > + } > > > + > > > + if count > Self::MAX.inner as usize { > > > + return Self::MAX; > > > + } > > > + > > > + Self { > > > + inner: count as c_ulong, > > > + } > > > + } > > > +} > > > + > > > +/// How many objects were freed? > > > +/// > > > +/// This is used as the return value of [`Shrinker::scan_objects`]. > > > +pub struct ScanObjects { > > > + inner: c_ulong, > > > +} > > > + > > > +impl ScanObjects { > > > + /// Indicates that the shrinker should stop trying to free objects from this cache due to > > > + /// potential deadlocks. > > > + pub const STOP: Self = Self { inner: SHRINK_STOP }; > > > + > > > + /// The maximum possible number of freeable objects. > > > + pub const MAX: Self = Self { > > > + // The shrinker code assumes that it can multiply this value by two without overflow. > > > + inner: c_ulong::MAX / 2, > > > + }; > > > + > > > + /// Creates a new `CountObjects` with the given value. > > > + pub fn from_count(count: usize) -> Self { > > > + if count > Self::MAX.inner as usize { > > > + return Self::MAX; > > > + } > > > + > > > + Self { > > > + inner: count as c_ulong, > > > + } > > > + } > > > +} > > > + > > > +/// This struct is used to pass information from page reclaim to the shrinkers. > > > +pub struct ShrinkControl<'a> { > > > + ptr: NonNull<bindings::shrink_control>, > > > + _phantom: PhantomData<&'a bindings::shrink_control>, > > > +} > > > + > > > +impl<'a> ShrinkControl<'a> { > > > + /// Create a `ShrinkControl` from a raw pointer. > > > + /// > > > + /// # Safety > > > + /// > > > + /// The pointer should point at a valid `shrink_control` for the duration of 'a. > > > + pub unsafe fn from_raw(ptr: *mut bindings::shrink_control) -> Self { > > > + Self { > > > + // SAFETY: Caller promises that this pointer is valid. > > > + ptr: unsafe { NonNull::new_unchecked(ptr) }, > > > + _phantom: PhantomData, > > > + } > > > + } > > > + > > > + /// Determines whether it is safe to recurse into filesystem code. > > > + pub fn gfp_fs(&self) -> bool { > > > > I guess you need this in your new binder code, because the current one > > seems to side-step __GFP_FS recursion with just trylocking absolutely > > everything aside from the lru spinlock. At least I haven't found any > > gfp_fs tests, but I might be blind. > > Ah, yeah, I don't think C binder tests for that. > > Either way, it probably makes more sense to just expose a method to > get the flags directly, rather than have a dedicated method for > testing this particular flags. Or what do you think? I think we could be a bit more opionated here and just expose checks like this. Like some don't make sense in shrinkers in general (like the reclaim related flags, we're in a shrinker, so one of those is set). And some only make sense when it's a memcg/numa aware shrinker. Beyond that I think there's really only good reasons to look at gfp_fs/io for recursion control, but core mm is moving towards a world where those are per-thread flags controlled with memalloc_*_save/restore functions. Also I just realized we now need to filter the raw gfp flags through current_gfp_context (but not sure core mm does that for us already). So definitely leaning towards exposing meaningful query functions and not raw flags here. Cheers, Sima > > > > + // SAFETY: Okay by type invariants. > > > + let mask = unsafe { (*self.ptr.as_ptr()).gfp_mask }; > > > + > > > + (mask & bindings::__GFP_FS) != 0 > > > + } > > > + > > > + /// Returns the number of objects that `scan_objects` should try to reclaim. > > > + pub fn nr_to_scan(&self) -> usize { > > > + // SAFETY: Okay by type invariants. > > > + unsafe { (*self.ptr.as_ptr()).nr_to_scan as usize } > > > + } > > > + > > > + /// The callback should set this value to the number of objects processed. > > > + pub fn set_nr_scanned(&mut self, val: usize) { > > > > Unless my grep skills are really bad I think the drm/i915 shrinker is the > > only one that bothers with this. Not sure we want to bother either? > > > > > + let mut val = val as c_ulong; > > > + // SAFETY: Okay by type invariants. > > > + let max = unsafe { (*self.ptr.as_ptr()).nr_to_scan }; > > > + if val > max { > > > + val = max; > > > + } > > > + > > > + // SAFETY: Okay by type invariants. > > > + unsafe { (*self.ptr.as_ptr()).nr_scanned = val }; > > > + } > > > +} > > > + > > > +unsafe extern "C" fn rust_count_objects<T: Shrinker>( > > > + shrink: *mut bindings::shrinker, > > > + sc: *mut bindings::shrink_control, > > > +) -> c_ulong { > > > + // SAFETY: We own the private data, so we can access it. > > > + let private = unsafe { (*shrink).private_data }; > > > + // SAFETY: This function is only used with shrinkers where `T` is the type of the private data. > > > + let private = unsafe { <T::Ptr as ForeignOwnable>::borrow(private) }; > > > + // SAFETY: The caller passes a valid `sc` pointer. > > > + let sc = unsafe { ShrinkControl::from_raw(sc) }; > > > + > > > + let ret = T::count_objects(private, sc); > > > + ret.inner > > > +} > > > + > > > +unsafe extern "C" fn rust_scan_objects<T: Shrinker>( > > > + shrink: *mut bindings::shrinker, > > > + sc: *mut bindings::shrink_control, > > > +) -> c_ulong { > > > + // SAFETY: We own the private data, so we can access it. > > > + let private = unsafe { (*shrink).private_data }; > > > + // SAFETY: This function is only used with shrinkers where `T` is the type of the private data. > > > + let private = unsafe { <T::Ptr as ForeignOwnable>::borrow(private) }; > > > + // SAFETY: The caller passes a valid `sc` pointer. > > > + let sc = unsafe { ShrinkControl::from_raw(sc) }; > > > + > > > + let ret = T::scan_objects(private, sc); > > > + ret.inner > > > +} > > > > > > --- > > > base-commit: 93dc3be19450447a3a7090bd1dfb9f3daac3e8d2 > > > change-id: 20240911-shrinker-f8371af00b68 > > > > > > Best regards, > > > -- > > > Alice Ryhl <aliceryhl@google.com> > > > > > > > -- > > Simona Vetter > > Software Engineer, Intel Corporation > > http://blog.ffwll.ch
On Mon, Sep 16, 2024 at 12:22:40PM +0200, Alice Ryhl wrote: > On Mon, Sep 16, 2024 at 4:59 AM Dave Chinner <david@fromorbit.com> wrote: > > > > On Thu, Sep 12, 2024 at 09:54:01AM +0000, Alice Ryhl wrote: > > > Rust Binder holds incoming transactions in a read-only mmap'd region > > > where it manually manages the pages. These pages are only in use until > > > the incoming transaction is consumed by userspace, but the kernel will > > > keep the pages around for future transactions. Rust Binder registers a > > > shrinker with the kernel so that it can give back these pages if the > > > system comes under memory pressure. > > > > > > Separate types are provided for registered and unregistered shrinkers. > > > The unregistered shrinker type can be used to configure the shrinker > > > before registering it. Separating it into two types also enables the > > > user to construct the private data between the calls to `shrinker_alloc` > > > and `shrinker_register` and avoid constructing the private data if > > > allocating the shrinker fails. > > > > This seems a bit nasty. It appears to me that the code does an > > unsafe copy of the internal shrinker state between the unregistered > > and registered types. Shrinkers have additional internal state when > > SHRINKER_MEMCG_AWARE and/or SHRINKER_NUMA_AWARE flags are set, > > and this abstraction doesn't seem to handle either those flags or > > the internal state they use at all. > > Doing an unsafe copy of the internal shrinker state is certainly not > my intent. The UnregisteredShrinker and RegisteredShrinker types just > wrap a non-null pointer, so the moves inside `register` are not copies > of the whole `struct shrinker` but just copies of a pointer to the > shrinker. So you're playing type casting games with the same structure just to avoid the potential doing a little bit of unnecessary work in an error situation? This all seems rather convoluted and largely unnecessary to me as the shrinker_free() C code transparent handles both registered and unregistered shrinkers being passed to it. There really is no difference between the two cases in terms of error handling and teardown.... > > > +impl UnregisteredShrinker { > > > + /// Create a new shrinker. > > > + pub fn alloc(name: &CStr) -> Result<Self, AllocError> { > > > + // SAFETY: Passing `0` as flags is okay. Using `%s` as the format string is okay when we > > > + // pass a nul-terminated string as the string for `%s` to print. > > > + let ptr = > > > + unsafe { bindings::shrinker_alloc(0, c_str!("%s").as_char_ptr(), name.as_char_ptr()) }; > > > > This passes flags as 0, so doesn't support SHRINKER_MEMCG_AWARE or > > SHRINKER_NUMA_AWARE shrinker variants. These flags should be > > passed through here. > > I don't have a user for memcg/numa aware shrinkers in Rust, which is > why I did not extend these abstractions to support them. However, if > you would like me to, I'm happy to do so. Please do. A partial implementation only makes it harder for the next person to use the API.... > It is easy to add a flags > argument to this method, but I imagine they also need a few other > additions elsewhere in the API to really be supported? In the shrinker API, nothing extra needs to be done. If you haven't added bindings to the list_lru interfaces yet, then you are going to need to, including the memcg and numa aware interfaces for that. > Now that I think about it, perhaps Binder's shrinker *could* be memcg > aware? It uses the list_lru abstraction, and currently calls > `list_lru_count` in `count_objects`, but maybe we could just use > `list_lru_shrink_count` instead and magically become memcg aware ...? The scan code also needs to use list_lru_shrink_walk(), too, not list_lru_walk(). And, yes, that's really the whole point of the list_lru abstraction - almost nothing extra needs to be done to enable both NUMA and memcg based reclaim context support for a cache once it's been converted to use list_lru for tracking reclaimable objects. > > > > + /// Set the number of seeks needed to recreate an object. > > > + pub fn set_seeks(&mut self, seeks: u32) { > > > + unsafe { (*self.shrinker.as_ptr()).seeks = seeks as c_int }; > > > + } > > > > Seems kinda weird to have a separate function for setting seeks, > > when... > > > > > + > > > + /// Register the shrinker. > > > + /// > > > + /// The provided pointer is used as the private data, and the type `T` determines the callbacks > > > + /// that the shrinker will use. > > > + pub fn register<T: Shrinker>(self, private_data: T::Ptr) -> RegisteredShrinker<T> { > > > > .... all the other data needed to set up a shrinker is provided to > > this function.... > > > > > + let shrinker = self.shrinker; > > > + let ptr = shrinker.as_ptr(); > > > + > > > + // The destructor of `self` calls `shrinker_free`, so skip the destructor. > > > + core::mem::forget(self); > > > + > > > + let private_data_ptr = <T::Ptr as ForeignOwnable>::into_foreign(private_data); > > > + > > > + // SAFETY: We own the private data, so we can assign to it. > > > + unsafe { (*ptr).private_data = private_data_ptr.cast_mut() }; > > > + // SAFETY: The shrinker is not yet registered, so we can update this field. > > > + unsafe { (*ptr).count_objects = Some(rust_count_objects::<T>) }; > > > + // SAFETY: The shrinker is not yet registered, so we can update this field. > > > + unsafe { (*ptr).scan_objects = Some(rust_scan_objects::<T>) }; > > > > .... and implemented exactly the same way. > > Well .. this design makes setting `seeks` optional, but `private_data` > mandatory. We *have* to set the two function pointers, but you don't > have to set `seeks` explicitly. > > It's not obvious, so it's probably worth mentioning that this design > doesn't actually require you to make an allocation and store a real > pointer in the private data. Yes, definitely worth mentioning, because I think we need to require new shrinkers to provide a direct pointer to the context the shrinker will operate on. We want to move away from anonymous "global" shrinkers that don't provide a context for the shrinker to operation on for a number of reasons... If we change the C code to work this way, then the private pointer in the rust binding will become mandatory. Hence it might be worth making the rust bindings and shrinker implementations behave this way from the start.... > When implementing the Shrinker trait for > your custom shrinker, you have to choose which pointer type to use. If > you choose the unit type `()` as the pointer type, then no real > pointer is actually stored. > > > > +/// How many objects are there in the cache? > > > +/// > > > +/// This is used as the return value of [`Shrinker::count_objects`]. > > > +pub struct CountObjects { > > > + inner: c_ulong, > > > +} > > > + > > > +impl CountObjects { > > > + /// Indicates that the number of objects is unknown. > > > + pub const UNKNOWN: Self = Self { inner: 0 }; > > > + > > > + /// Indicates that the number of objects is zero. > > > + pub const EMPTY: Self = Self { > > > + inner: SHRINK_EMPTY, > > > + }; > > > + > > > + /// The maximum possible number of freeable objects. > > > + pub const MAX: Self = Self { > > > + // The shrinker code assumes that it can multiply this value by two without overflow. > > > + inner: c_ulong::MAX / 2, > > > + }; > > > + > > > + /// Creates a new `CountObjects` with the given value. > > > + pub fn from_count(count: usize) -> Self { > > > + if count == 0 { > > > + return Self::EMPTY; > > > + } > > > > No. A return count of 0 is not the same as a return value of > > SHRINK_EMPTY. > > > > A return value of 0 means "no reclaim work can be done right now". > > > > This implies that there are objects that can be reclaimed in the near > > future, but right now they are unavailable for reclaim. This can be > > due to a trylock protecting the count operation failing, the all the > > objects in the cache being dirty and needing work to be done before > > they can be reclaimed, etc. > > > > A return value of SHRINK_EMPTY means "there are no reclaimable > > objects at all". > > > > This implies that the cache is empty - it has absolutely nothing in > > it that can be reclaimed either now or in the near future. > > > > These two return values are treated very differently by the high > > level code. SHRINK_EMPTY is used by shrink_slab_memcg() to maintain > > a "shrinker needs to run" bit in the memcg shrinker search bitmap. > > The bit is cleared when SHRINK_EMPTY is returned, meaning the > > shrinker will not get called again until a new object is queued > > to it's LRU. This sets the "shrinker needs to run" bit again, and > > so the shrinker will run next time shrink_slab_memcg() is called. > > > > In constrast, a return value of zero (i.e. no work to be done right > > now) does not change the "shrinker needs to run" state bit, and > > hence it will always be called the next time the shrink_slab_memcg() > > is run to try to reclaim objects from that memcg shrinker... > > This is a part of the API that I was pretty unsure about, so very > happy to get your input. For both of the shrinkers I am familiar with, > they know the exact count of objects and there's no error case for any > lock. The shrinkers just return the exact count directly from > count_objects, but that seemed incorrect to me, as this means that the > shrinker returns 0 when it really should return SHRINK_EMPTY. Yes, they should be returning SHRINK_EMPTY and not 0 in this case. See super_cache_count() for example. > > That's > why I implemented `from_count` this way - the intent is that you use > `CountObjects::from_count` when you know the exact count, so if you > pass 0 to that method, then that means the shrinker really is empty. > If you don't know what the count is, then you return > `CountObjects::UNKNOWN` instead. Understood, and I see your point. I think it's a good approach to take, but I don't think that the rust binding implementation is the appropriate place to introduce such API/semantic enhancements. I would really like to avoid the situation where rust-based shrinkers have explicitly different API and behavioural semantics to the C shrinkers. Hence I would prefer the rust bindings to implement the existing C semantics and then, as a separate piece of work, we move everything (both C and Rust) to the new set of semantics as a single changeset. I know that is more difficult to do and co-ordinate, but I think we are going to be much better off in the long run if we start from a common base and then make improvements to both sides of the API at the same time. Note that I'm not asking you to change all the C code here - I can certainly help there - but I am asking that we try really hard to keep APIs and behavioural semantics as close together as possible. > That said, I guess there is some reason that the C shrinker API is > designed to use the value 0 for unknown rather than empty, so perhaps > I should not try to do it differently. No reason, it just evolved organically that way over the past decade - SHRINK_EMPTY was introduced (historically speaking) fairly recently and only propogated into the shrinkers that needed to set it (i.e. the memcg aware shrinkers). This is the sort of API fragmentation I would like us to avoid - I don't want to see some shrinkers with SHRINK_EMPTY, some with 0, and then yet another new subset with UNKNOWN, yet all largely meaning the same thing in a global shrinker context but 2 of the three types being incorrect in a memcg shrinker context... > > > +impl ScanObjects { > > > + /// Indicates that the shrinker should stop trying to free objects from this cache due to > > > + /// potential deadlocks. > > > + pub const STOP: Self = Self { inner: SHRINK_STOP }; > > > + > > > + /// The maximum possible number of freeable objects. > > > + pub const MAX: Self = Self { > > > + // The shrinker code assumes that it can multiply this value by two without overflow. > > > + inner: c_ulong::MAX / 2, > > > > No it doesn't. This is purely a count of objects freed by the > > shrinker. > > In mm/shrinker.c it multiplies by two here: > total_scan = min(total_scan, (2 * freeable)); freeable is derived from the return of ->count_objects, not from ->scan_objects (i.e. it is a CountObjects type, not a ScanObjects type). ->scan_objects() returns SHRINK_STOP to indicate that scanning should stop immediately or the number of objects freed. The only thing that is done with value other than SHRINK_STOP is to aggregate it across repeated calls to ->scan_objects(). > > > + /// Determines whether it is safe to recurse into filesystem code. > > > + pub fn gfp_fs(&self) -> bool { > > > + // SAFETY: Okay by type invariants. > > > + let mask = unsafe { (*self.ptr.as_ptr()).gfp_mask }; > > > + > > > + (mask & bindings::__GFP_FS) != 0 > > > + } > > > > This needs a check for __GFP_IO as well, as there are block layer > > shrinkers that need to be GFP_NOIO aware... > > Would you prefer that I add additional methods, or that I modify this > method to also check __GFP_IO, or that I just expose a way to get the > `gfp_mask` value directly? If it was C code, I would suggest the API would something like: if (!reclaim_allowed(sc, __GFP_FS)) return SHRINK_STOP; or if (!reclaim_allowed(sc, __GFP_IO)) return SHRINK_STOP; As this matches the typical usage (e.g. see super_cache_scan()). > > > > + /// Returns the number of objects that `scan_objects` should try to reclaim. > > > + pub fn nr_to_scan(&self) -> usize { > > > + // SAFETY: Okay by type invariants. > > > + unsafe { (*self.ptr.as_ptr()).nr_to_scan as usize } > > > + } > > > + > > > + /// The callback should set this value to the number of objects processed. > > > + pub fn set_nr_scanned(&mut self, val: usize) { > > > + let mut val = val as c_ulong; > > > + // SAFETY: Okay by type invariants. > > > + let max = unsafe { (*self.ptr.as_ptr()).nr_to_scan }; > > > + if val > max { > > > + val = max; > > > + } > > > > No. Shrinkers are allowed to scan more objects in a batch than > > the high level code asked them to scan. If they do this, they > > *must* report back the count of all the objects they scanned so the > > batched scan accounting can adjust the remaining amount of work that > > needs to be done appropriately. > > Acknowledged. > > This is related to another question that I had, actually. Other than > returning SHRINK_STOP, in what situations should `nr_scanned` be > different from the return value of `scan_objects`? nr_scanned will almost always be different to the return value of ->scan_objects. There are 3 values of note here. - nr_scanned is set by the shrinker scan setup code to the batch size the shrinker should scan in it's next invocation. If the shrinker ignores nr_to_scan batch limits, it can indicate how many objects it scanned by setting the number it scanned to nr_scanned before it returns. - nr_to_scan is set to nr_scanned at setup. The shrinker should then decrement nr_scanned for each object that is scanned by the shrinker. The shrinker should stop scanning and return nr_freed when nr_to_scan reaches zero. If the value of this is non-zero when the scan returns, it is an indication that the shrinker either did not scan the entire batch or the batch size was larger than the remaining number of objects in the cache. - nr_freed: return value from the shrinker This is a count of the number of objects the shrinker freed during the scan. If the shrinker cannot make progress for any reason it should immediately return SHRINK_STOP. So on a typical shrinker pass, we'll exit with: nr_scanned = batch size; nr_to_scan = 0; nr_freed = [0..batch size]; Some shrinkers, however, have variable sized objects, and so they will normalise nr_freed and nr_scanned to the object size. The i915/gem shrinker does this, as it considers a page to be an object, but in reality is has a smaller number of variable sized multi-page objects being tracked. Hence it might be asked to scan 128 objects (default batch size), but the scan iterated over objects that contain 256 pages. In that case, it will return: nr_scanned = 256; nr_to_scan = 0; nr_freed = [0..256] And the high level shrinker code will treat that as having scanned 256 objects rather than the 128 objects it asked for. There are several shrinkers that play weird accounting games like this. It's not really all that nice and it would be ice to clean all this up, but there's been no real urgency to do this because it's just a slightly different manner of accounting reclaim progress.... -Dave.
On Tue, Sep 17, 2024 at 3:59 AM Dave Chinner <david@fromorbit.com> wrote: > > On Mon, Sep 16, 2024 at 12:22:40PM +0200, Alice Ryhl wrote: > > On Mon, Sep 16, 2024 at 4:59 AM Dave Chinner <david@fromorbit.com> wrote: > > > > > > On Thu, Sep 12, 2024 at 09:54:01AM +0000, Alice Ryhl wrote: > > > > Rust Binder holds incoming transactions in a read-only mmap'd region > > > > where it manually manages the pages. These pages are only in use until > > > > the incoming transaction is consumed by userspace, but the kernel will > > > > keep the pages around for future transactions. Rust Binder registers a > > > > shrinker with the kernel so that it can give back these pages if the > > > > system comes under memory pressure. > > > > > > > > Separate types are provided for registered and unregistered shrinkers. > > > > The unregistered shrinker type can be used to configure the shrinker > > > > before registering it. Separating it into two types also enables the > > > > user to construct the private data between the calls to `shrinker_alloc` > > > > and `shrinker_register` and avoid constructing the private data if > > > > allocating the shrinker fails. > > > > > > This seems a bit nasty. It appears to me that the code does an > > > unsafe copy of the internal shrinker state between the unregistered > > > and registered types. Shrinkers have additional internal state when > > > SHRINKER_MEMCG_AWARE and/or SHRINKER_NUMA_AWARE flags are set, > > > and this abstraction doesn't seem to handle either those flags or > > > the internal state they use at all. > > > > Doing an unsafe copy of the internal shrinker state is certainly not > > my intent. The UnregisteredShrinker and RegisteredShrinker types just > > wrap a non-null pointer, so the moves inside `register` are not copies > > of the whole `struct shrinker` but just copies of a pointer to the > > shrinker. > > So you're playing type casting games with the same structure just to > avoid the potential doing a little bit of unnecessary work in an > error situation? > > This all seems rather convoluted and largely unnecessary to me as > the shrinker_free() C code transparent handles both registered and > unregistered shrinkers being passed to it. There really is no > difference between the two cases in terms of error handling and > teardown.... Well another thing it does is make it possible to specify shrinker without making it mandatory. I don't think it's so bad to have a builder type, but I can drop it if you prefer that. If we replace the builder with a single function for constructing it, then we need to pick one of: * Not accept a value for "seeks" at all. * Always require an explicit value for "seeks". * Make a "shrinker options" struct that we can pass to the constructor. * Have multiple constructors. > > > > +impl UnregisteredShrinker { > > > > + /// Create a new shrinker. > > > > + pub fn alloc(name: &CStr) -> Result<Self, AllocError> { > > > > + // SAFETY: Passing `0` as flags is okay. Using `%s` as the format string is okay when we > > > > + // pass a nul-terminated string as the string for `%s` to print. > > > > + let ptr = > > > > + unsafe { bindings::shrinker_alloc(0, c_str!("%s").as_char_ptr(), name.as_char_ptr()) }; > > > > > > This passes flags as 0, so doesn't support SHRINKER_MEMCG_AWARE or > > > SHRINKER_NUMA_AWARE shrinker variants. These flags should be > > > passed through here. > > > > I don't have a user for memcg/numa aware shrinkers in Rust, which is > > why I did not extend these abstractions to support them. However, if > > you would like me to, I'm happy to do so. > > Please do. A partial implementation only makes it harder for the > next person to use the API.... Alright. > > It is easy to add a flags > > argument to this method, but I imagine they also need a few other > > additions elsewhere in the API to really be supported? > > In the shrinker API, nothing extra needs to be done. > > If you haven't added bindings to the list_lru interfaces yet, then > you are going to need to, including the memcg and numa aware > interfaces for that. I'm not familiar with the memcg/numa aware interfaces for list_lru, as I only know about Binder's usage. Is there a good example of using them somewhere? Or does it not require anything beyond just adding Rust versions of list_lru_shrink_walk and list_lru_shrink_count that accept a ShrinkControl from a shrinker? > > Now that I think about it, perhaps Binder's shrinker *could* be memcg > > aware? It uses the list_lru abstraction, and currently calls > > `list_lru_count` in `count_objects`, but maybe we could just use > > `list_lru_shrink_count` instead and magically become memcg aware ...? > > The scan code also needs to use list_lru_shrink_walk(), too, > not list_lru_walk(). > > And, yes, that's really the whole point of the list_lru abstraction - > almost nothing extra needs to be done to enable both NUMA and memcg > based reclaim context support for a cache once it's been converted > to use list_lru for tracking reclaimable objects. If it's that easy, then that's great! > > > > + /// Set the number of seeks needed to recreate an object. > > > > + pub fn set_seeks(&mut self, seeks: u32) { > > > > + unsafe { (*self.shrinker.as_ptr()).seeks = seeks as c_int }; > > > > + } > > > > > > Seems kinda weird to have a separate function for setting seeks, > > > when... > > > > > > > + > > > > + /// Register the shrinker. > > > > + /// > > > > + /// The provided pointer is used as the private data, and the type `T` determines the callbacks > > > > + /// that the shrinker will use. > > > > + pub fn register<T: Shrinker>(self, private_data: T::Ptr) -> RegisteredShrinker<T> { > > > > > > .... all the other data needed to set up a shrinker is provided to > > > this function.... > > > > > > > + let shrinker = self.shrinker; > > > > + let ptr = shrinker.as_ptr(); > > > > + > > > > + // The destructor of `self` calls `shrinker_free`, so skip the destructor. > > > > + core::mem::forget(self); > > > > + > > > > + let private_data_ptr = <T::Ptr as ForeignOwnable>::into_foreign(private_data); > > > > + > > > > + // SAFETY: We own the private data, so we can assign to it. > > > > + unsafe { (*ptr).private_data = private_data_ptr.cast_mut() }; > > > > + // SAFETY: The shrinker is not yet registered, so we can update this field. > > > > + unsafe { (*ptr).count_objects = Some(rust_count_objects::<T>) }; > > > > + // SAFETY: The shrinker is not yet registered, so we can update this field. > > > > + unsafe { (*ptr).scan_objects = Some(rust_scan_objects::<T>) }; > > > > > > .... and implemented exactly the same way. > > > > Well .. this design makes setting `seeks` optional, but `private_data` > > mandatory. We *have* to set the two function pointers, but you don't > > have to set `seeks` explicitly. > > > > It's not obvious, so it's probably worth mentioning that this design > > doesn't actually require you to make an allocation and store a real > > pointer in the private data. > > Yes, definitely worth mentioning, because I think we need to require > new shrinkers to provide a direct pointer to the context the > shrinker will operate on. We want to move away from anonymous > "global" shrinkers that don't provide a context for the shrinker to > operation on for a number of reasons... > > If we change the C code to work this way, then the private pointer > in the rust binding will become mandatory. Hence it might be worth > making the rust bindings and shrinker implementations behave this > way from the start.... Thanks for the context. That is good to know. > > When implementing the Shrinker trait for > > your custom shrinker, you have to choose which pointer type to use. If > > you choose the unit type `()` as the pointer type, then no real > > pointer is actually stored. > > > > > > +/// How many objects are there in the cache? > > > > +/// > > > > +/// This is used as the return value of [`Shrinker::count_objects`]. > > > > +pub struct CountObjects { > > > > + inner: c_ulong, > > > > +} > > > > + > > > > +impl CountObjects { > > > > + /// Indicates that the number of objects is unknown. > > > > + pub const UNKNOWN: Self = Self { inner: 0 }; > > > > + > > > > + /// Indicates that the number of objects is zero. > > > > + pub const EMPTY: Self = Self { > > > > + inner: SHRINK_EMPTY, > > > > + }; > > > > + > > > > + /// The maximum possible number of freeable objects. > > > > + pub const MAX: Self = Self { > > > > + // The shrinker code assumes that it can multiply this value by two without overflow. > > > > + inner: c_ulong::MAX / 2, > > > > + }; > > > > + > > > > + /// Creates a new `CountObjects` with the given value. > > > > + pub fn from_count(count: usize) -> Self { > > > > + if count == 0 { > > > > + return Self::EMPTY; > > > > + } > > > > > > No. A return count of 0 is not the same as a return value of > > > SHRINK_EMPTY. > > > > > > A return value of 0 means "no reclaim work can be done right now". > > > > > > This implies that there are objects that can be reclaimed in the near > > > future, but right now they are unavailable for reclaim. This can be > > > due to a trylock protecting the count operation failing, the all the > > > objects in the cache being dirty and needing work to be done before > > > they can be reclaimed, etc. > > > > > > A return value of SHRINK_EMPTY means "there are no reclaimable > > > objects at all". > > > > > > This implies that the cache is empty - it has absolutely nothing in > > > it that can be reclaimed either now or in the near future. > > > > > > These two return values are treated very differently by the high > > > level code. SHRINK_EMPTY is used by shrink_slab_memcg() to maintain > > > a "shrinker needs to run" bit in the memcg shrinker search bitmap. > > > The bit is cleared when SHRINK_EMPTY is returned, meaning the > > > shrinker will not get called again until a new object is queued > > > to it's LRU. This sets the "shrinker needs to run" bit again, and > > > so the shrinker will run next time shrink_slab_memcg() is called. > > > > > > In constrast, a return value of zero (i.e. no work to be done right > > > now) does not change the "shrinker needs to run" state bit, and > > > hence it will always be called the next time the shrink_slab_memcg() > > > is run to try to reclaim objects from that memcg shrinker... > > > > This is a part of the API that I was pretty unsure about, so very > > happy to get your input. For both of the shrinkers I am familiar with, > > they know the exact count of objects and there's no error case for any > > lock. The shrinkers just return the exact count directly from > > count_objects, but that seemed incorrect to me, as this means that the > > shrinker returns 0 when it really should return SHRINK_EMPTY. > > Yes, they should be returning SHRINK_EMPTY and not 0 in this case. > See super_cache_count() for example. Okay. > > That's > > why I implemented `from_count` this way - the intent is that you use > > `CountObjects::from_count` when you know the exact count, so if you > > pass 0 to that method, then that means the shrinker really is empty. > > If you don't know what the count is, then you return > > `CountObjects::UNKNOWN` instead. > > Understood, and I see your point. I think it's a good approach to > take, but I don't think that the rust binding implementation is the > appropriate place to introduce such API/semantic enhancements. > > I would really like to avoid the situation where rust-based > shrinkers have explicitly different API and behavioural semantics to > the C shrinkers. Hence I would prefer the rust bindings to implement > the existing C semantics and then, as a separate piece of work, we > move everything (both C and Rust) to the new set of semantics as a > single changeset. > > I know that is more difficult to do and co-ordinate, but I think we > are going to be much better off in the long run if we start from a > common base and then make improvements to both sides of the API at > the same time. Note that I'm not asking you to change all the C code > here - I can certainly help there - but I am asking that we try > really hard to keep APIs and behavioural semantics as close together > as possible. That is reasonable. What do you prefer that I do with the CountObjects type I have now? I can change it to this: * CountObjects::EMPTY * CountObjects::new(number) Where `new` just uses the provided number as-is (unless it's too big). I can also get rid of the CountObjects type entirely. > > That said, I guess there is some reason that the C shrinker API is > > designed to use the value 0 for unknown rather than empty, so perhaps > > I should not try to do it differently. > > No reason, it just evolved organically that way over the past decade > - SHRINK_EMPTY was introduced (historically speaking) fairly > recently and only propogated into the shrinkers that needed to set > it (i.e. the memcg aware shrinkers). > > This is the sort of API fragmentation I would like us to avoid - I > don't want to see some shrinkers with SHRINK_EMPTY, some with 0, and > then yet another new subset with UNKNOWN, yet all largely meaning > the same thing in a global shrinker context but 2 of the three types > being incorrect in a memcg shrinker context... Yeah, that makes sense. I'm on board with keeping things consistent. > > > > +impl ScanObjects { > > > > + /// Indicates that the shrinker should stop trying to free objects from this cache due to > > > > + /// potential deadlocks. > > > > + pub const STOP: Self = Self { inner: SHRINK_STOP }; > > > > + > > > > + /// The maximum possible number of freeable objects. > > > > + pub const MAX: Self = Self { > > > > + // The shrinker code assumes that it can multiply this value by two without overflow. > > > > + inner: c_ulong::MAX / 2, > > > > > > No it doesn't. This is purely a count of objects freed by the > > > shrinker. > > > > In mm/shrinker.c it multiplies by two here: > > total_scan = min(total_scan, (2 * freeable)); > > freeable is derived from the return of ->count_objects, not > from ->scan_objects (i.e. it is a CountObjects type, not a > ScanObjects type). > > ->scan_objects() returns SHRINK_STOP to indicate that scanning > should stop immediately or the number of objects freed. The only > thing that is done with value other than SHRINK_STOP is to aggregate > it across repeated calls to ->scan_objects(). Fair point. I can change the maximum to SHRINK_STOP-1. Or do you want me to get rid of it completely? > > > > + /// Determines whether it is safe to recurse into filesystem code. > > > > + pub fn gfp_fs(&self) -> bool { > > > > + // SAFETY: Okay by type invariants. > > > > + let mask = unsafe { (*self.ptr.as_ptr()).gfp_mask }; > > > > + > > > > + (mask & bindings::__GFP_FS) != 0 > > > > + } > > > > > > This needs a check for __GFP_IO as well, as there are block layer > > > shrinkers that need to be GFP_NOIO aware... > > > > Would you prefer that I add additional methods, or that I modify this > > method to also check __GFP_IO, or that I just expose a way to get the > > `gfp_mask` value directly? > > If it was C code, I would suggest the API would something like: > > if (!reclaim_allowed(sc, __GFP_FS)) > return SHRINK_STOP; > > or > if (!reclaim_allowed(sc, __GFP_IO)) > return SHRINK_STOP; > > As this matches the typical usage (e.g. see super_cache_scan()). I guess there are several options. A few Rust versions: fn reclaim_allowed(&self, mode: ReclaimMode) -> bool; enum ReclaimMode { __GFP_FS, __GFP_IO, } or perhaps just: fn reclaim_fs_allowed(&self) -> bool; fn reclaim_io_allowed(&self) -> bool; > > > > + /// Returns the number of objects that `scan_objects` should try to reclaim. > > > > + pub fn nr_to_scan(&self) -> usize { > > > > + // SAFETY: Okay by type invariants. > > > > + unsafe { (*self.ptr.as_ptr()).nr_to_scan as usize } > > > > + } > > > > + > > > > + /// The callback should set this value to the number of objects processed. > > > > + pub fn set_nr_scanned(&mut self, val: usize) { > > > > + let mut val = val as c_ulong; > > > > + // SAFETY: Okay by type invariants. > > > > + let max = unsafe { (*self.ptr.as_ptr()).nr_to_scan }; > > > > + if val > max { > > > > + val = max; > > > > + } > > > > > > No. Shrinkers are allowed to scan more objects in a batch than > > > the high level code asked them to scan. If they do this, they > > > *must* report back the count of all the objects they scanned so the > > > batched scan accounting can adjust the remaining amount of work that > > > needs to be done appropriately. > > > > Acknowledged. > > > > This is related to another question that I had, actually. Other than > > returning SHRINK_STOP, in what situations should `nr_scanned` be > > different from the return value of `scan_objects`? > > nr_scanned will almost always be different to the return value of > ->scan_objects. > > There are 3 values of note here. > > - nr_scanned is set by the shrinker scan setup code to the batch > size the shrinker should scan in it's next invocation. > > If the shrinker ignores nr_to_scan batch limits, it can indicate > how many objects it scanned by setting the number it scanned to > nr_scanned before it returns. > > - nr_to_scan is set to nr_scanned at setup. > > The shrinker should then decrement nr_scanned for each object that > is scanned by the shrinker. The shrinker should stop scanning and > return nr_freed when nr_to_scan reaches zero. > > If the value of this is non-zero when the scan returns, it is an > indication that the shrinker either did not scan the entire > batch or the batch size was larger than the remaining number of > objects in the cache. > > - nr_freed: return value from the shrinker > > This is a count of the number of objects the shrinker freed during > the scan. > > If the shrinker cannot make progress for any reason it > should immediately return SHRINK_STOP. > > So on a typical shrinker pass, we'll exit with: > > nr_scanned = batch size; > nr_to_scan = 0; > nr_freed = [0..batch size]; > > Some shrinkers, however, have variable sized objects, and so they > will normalise nr_freed and nr_scanned to the object size. The > i915/gem shrinker does this, as it considers a page to be an object, > but in reality is has a smaller number of variable sized multi-page > objects being tracked. Hence it might be asked to scan 128 objects > (default batch size), but the scan iterated over objects that > contain 256 pages. > > In that case, it will return: > > nr_scanned = 256; > nr_to_scan = 0; > nr_freed = [0..256] > > And the high level shrinker code will treat that as having scanned > 256 objects rather than the 128 objects it asked for. > > There are several shrinkers that play weird accounting games like > this. It's not really all that nice and it would be ice to clean all > this up, but there's been no real urgency to do this because it's > just a slightly different manner of accounting reclaim progress.... Ahh ... so the numbers can be different if you scan an object without freeing it. Alice
On Sat, Sep 14, 2024 at 3:07 PM Gary Guo <gary@garyguo.net> wrote: > > On Thu, 12 Sep 2024 09:54:01 +0000 > Alice Ryhl <aliceryhl@google.com> wrote: > > +/// This struct is used to pass information from page reclaim to the shrinkers. > > +pub struct ShrinkControl<'a> { > > + ptr: NonNull<bindings::shrink_control>, > > + _phantom: PhantomData<&'a bindings::shrink_control>, > > +} > > I feel like this can just be a wrapper of `Opaque<ShrinkControl>` and > we hand out `&'a ShrinkControl`? We need mutable access, but using a pinned mutable reference is too inconvenient. I prefer this. Alice
diff --git a/rust/bindings/bindings_helper.h b/rust/bindings/bindings_helper.h index ae82e9c941af..7fc958e05dc5 100644 --- a/rust/bindings/bindings_helper.h +++ b/rust/bindings/bindings_helper.h @@ -18,6 +18,7 @@ #include <linux/phy.h> #include <linux/refcount.h> #include <linux/sched.h> +#include <linux/shrinker.h> #include <linux/slab.h> #include <linux/wait.h> #include <linux/workqueue.h> @@ -31,4 +32,5 @@ const gfp_t RUST_CONST_HELPER_GFP_KERNEL_ACCOUNT = GFP_KERNEL_ACCOUNT; const gfp_t RUST_CONST_HELPER_GFP_NOWAIT = GFP_NOWAIT; const gfp_t RUST_CONST_HELPER___GFP_ZERO = __GFP_ZERO; const gfp_t RUST_CONST_HELPER___GFP_HIGHMEM = ___GFP_HIGHMEM; +const gfp_t RUST_CONST_HELPER___GFP_FS = ___GFP_FS; const blk_features_t RUST_CONST_HELPER_BLK_FEAT_ROTATIONAL = BLK_FEAT_ROTATIONAL; diff --git a/rust/kernel/lib.rs b/rust/kernel/lib.rs index f10b06a78b9d..f35eb290f2e0 100644 --- a/rust/kernel/lib.rs +++ b/rust/kernel/lib.rs @@ -45,6 +45,7 @@ pub mod prelude; pub mod print; pub mod rbtree; +pub mod shrinker; mod static_assert; #[doc(hidden)] pub mod std_vendor; diff --git a/rust/kernel/shrinker.rs b/rust/kernel/shrinker.rs new file mode 100644 index 000000000000..9af726bfe0b1 --- /dev/null +++ b/rust/kernel/shrinker.rs @@ -0,0 +1,324 @@ +// SPDX-License-Identifier: GPL-2.0 + +// Copyright (C) 2024 Google LLC. + +//! Shrinker for handling memory pressure. +//! +//! C header: [`include/linux/shrinker.h`](srctree/include/linux/shrinker.h) + +use crate::{alloc::AllocError, bindings, c_str, str::CStr, types::ForeignOwnable}; + +use core::{ + ffi::{c_int, c_ulong, c_void}, + marker::PhantomData, + ptr::NonNull, +}; + +const SHRINK_STOP: c_ulong = bindings::SHRINK_STOP as c_ulong; +const SHRINK_EMPTY: c_ulong = bindings::SHRINK_EMPTY as c_ulong; + +/// The default value for the number of seeks needed to recreate an object. +pub const DEFAULT_SEEKS: u32 = bindings::DEFAULT_SEEKS; + +/// An unregistered shrinker. +/// +/// This type can be used to modify the settings of the shrinker before it is registered. +/// +/// # Invariants +/// +/// The `shrinker` pointer references an unregistered shrinker. +pub struct UnregisteredShrinker { + shrinker: NonNull<bindings::shrinker>, +} + +// SAFETY: Moving an unregistered shrinker between threads is okay. +unsafe impl Send for UnregisteredShrinker {} +// SAFETY: An unregistered shrinker is thread safe. +unsafe impl Sync for UnregisteredShrinker {} + +impl UnregisteredShrinker { + /// Create a new shrinker. + pub fn alloc(name: &CStr) -> Result<Self, AllocError> { + // SAFETY: Passing `0` as flags is okay. Using `%s` as the format string is okay when we + // pass a nul-terminated string as the string for `%s` to print. + let ptr = + unsafe { bindings::shrinker_alloc(0, c_str!("%s").as_char_ptr(), name.as_char_ptr()) }; + + let shrinker = NonNull::new(ptr).ok_or(AllocError)?; + + // INVARIANT: The creation of the shrinker was successful. + Ok(Self { shrinker }) + } + + /// Create a new shrinker using format arguments for the name. + pub fn alloc_fmt(name: core::fmt::Arguments<'_>) -> Result<Self, AllocError> { + // SAFETY: Passing `0` as flags is okay. Using `%pA` as the format string is okay when we + // pass a `fmt::Arguments` as the value to print. + let ptr = unsafe { + bindings::shrinker_alloc( + 0, + c_str!("%pA").as_char_ptr(), + &name as *const _ as *const c_void, + ) + }; + + let shrinker = NonNull::new(ptr).ok_or(AllocError)?; + + // INVARIANT: The creation of the shrinker was successful. + Ok(Self { shrinker }) + } + + /// Set the number of seeks needed to recreate an object. + pub fn set_seeks(&mut self, seeks: u32) { + unsafe { (*self.shrinker.as_ptr()).seeks = seeks as c_int }; + } + + /// Register the shrinker. + /// + /// The provided pointer is used as the private data, and the type `T` determines the callbacks + /// that the shrinker will use. + pub fn register<T: Shrinker>(self, private_data: T::Ptr) -> RegisteredShrinker<T> { + let shrinker = self.shrinker; + let ptr = shrinker.as_ptr(); + + // The destructor of `self` calls `shrinker_free`, so skip the destructor. + core::mem::forget(self); + + let private_data_ptr = <T::Ptr as ForeignOwnable>::into_foreign(private_data); + + // SAFETY: We own the private data, so we can assign to it. + unsafe { (*ptr).private_data = private_data_ptr.cast_mut() }; + // SAFETY: The shrinker is not yet registered, so we can update this field. + unsafe { (*ptr).count_objects = Some(rust_count_objects::<T>) }; + // SAFETY: The shrinker is not yet registered, so we can update this field. + unsafe { (*ptr).scan_objects = Some(rust_scan_objects::<T>) }; + + // SAFETY: The shrinker is unregistered, so it's safe to register it. + unsafe { bindings::shrinker_register(ptr) }; + + RegisteredShrinker { + shrinker, + _phantom: PhantomData, + } + } +} + +impl Drop for UnregisteredShrinker { + fn drop(&mut self) { + // SAFETY: The shrinker is a valid but unregistered shrinker, and we will not use it + // anymore. + unsafe { bindings::shrinker_free(self.shrinker.as_ptr()) }; + } +} + +/// A shrinker that is registered with the kernel. +/// +/// # Invariants +/// +/// The `shrinker` pointer refers to a registered shrinker using `T` as the private data. +pub struct RegisteredShrinker<T: Shrinker> { + shrinker: NonNull<bindings::shrinker>, + _phantom: PhantomData<T::Ptr>, +} + +// SAFETY: This allows you to deregister the shrinker from a different thread, which means that +// private data could be dropped from any thread. +unsafe impl<T: Shrinker> Send for RegisteredShrinker<T> where T::Ptr: Send {} +// SAFETY: The only thing you can do with an immutable reference is access the private data, which +// is okay to access in parallel as the `Shrinker` trait requires the private data to be `Sync`. +unsafe impl<T: Shrinker> Sync for RegisteredShrinker<T> {} + +impl<T: Shrinker> RegisteredShrinker<T> { + /// Access the private data in this shrinker. + pub fn private_data(&self) -> <T::Ptr as ForeignOwnable>::Borrowed<'_> { + // SAFETY: We own the private data, so we can access it. + let private = unsafe { (*self.shrinker.as_ptr()).private_data }; + // SAFETY: By the type invariants, the private data is `T`. This access could happen in + // parallel with a shrinker callback, but that's okay as the `Shrinker` trait ensures that + // `T::Ptr` is `Sync`. + unsafe { <T::Ptr as ForeignOwnable>::borrow(private) } + } +} + +impl<T: Shrinker> Drop for RegisteredShrinker<T> { + fn drop(&mut self) { + // SAFETY: We own the private data, so we can access it. + let private = unsafe { (*self.shrinker.as_ptr()).private_data }; + // SAFETY: We will not access the shrinker after this call. + unsafe { bindings::shrinker_free(self.shrinker.as_ptr()) }; + // SAFETY: The above call blocked until the completion of any shrinker callbacks, so there + // are no longer any users of the private data. + drop(unsafe { <T::Ptr as ForeignOwnable>::from_foreign(private) }); + } +} + +/// Callbacks for a shrinker. +pub trait Shrinker { + /// The pointer type used to store the private data of the shrinker. + /// + /// Needs to be `Sync` because the shrinker callback could access this value immutably from + /// several thread in parallel. + type Ptr: ForeignOwnable + Sync; + + /// Count the number of freeable items in the cache. + /// + /// May be called from atomic context. + fn count_objects( + me: <Self::Ptr as ForeignOwnable>::Borrowed<'_>, + sc: ShrinkControl<'_>, + ) -> CountObjects; + + /// Count the number of freeable items in the cache. + /// + /// May be called from atomic context. + fn scan_objects( + me: <Self::Ptr as ForeignOwnable>::Borrowed<'_>, + sc: ShrinkControl<'_>, + ) -> ScanObjects; +} + +/// How many objects are there in the cache? +/// +/// This is used as the return value of [`Shrinker::count_objects`]. +pub struct CountObjects { + inner: c_ulong, +} + +impl CountObjects { + /// Indicates that the number of objects is unknown. + pub const UNKNOWN: Self = Self { inner: 0 }; + + /// Indicates that the number of objects is zero. + pub const EMPTY: Self = Self { + inner: SHRINK_EMPTY, + }; + + /// The maximum possible number of freeable objects. + pub const MAX: Self = Self { + // The shrinker code assumes that it can multiply this value by two without overflow. + inner: c_ulong::MAX / 2, + }; + + /// Creates a new `CountObjects` with the given value. + pub fn from_count(count: usize) -> Self { + if count == 0 { + return Self::EMPTY; + } + + if count > Self::MAX.inner as usize { + return Self::MAX; + } + + Self { + inner: count as c_ulong, + } + } +} + +/// How many objects were freed? +/// +/// This is used as the return value of [`Shrinker::scan_objects`]. +pub struct ScanObjects { + inner: c_ulong, +} + +impl ScanObjects { + /// Indicates that the shrinker should stop trying to free objects from this cache due to + /// potential deadlocks. + pub const STOP: Self = Self { inner: SHRINK_STOP }; + + /// The maximum possible number of freeable objects. + pub const MAX: Self = Self { + // The shrinker code assumes that it can multiply this value by two without overflow. + inner: c_ulong::MAX / 2, + }; + + /// Creates a new `CountObjects` with the given value. + pub fn from_count(count: usize) -> Self { + if count > Self::MAX.inner as usize { + return Self::MAX; + } + + Self { + inner: count as c_ulong, + } + } +} + +/// This struct is used to pass information from page reclaim to the shrinkers. +pub struct ShrinkControl<'a> { + ptr: NonNull<bindings::shrink_control>, + _phantom: PhantomData<&'a bindings::shrink_control>, +} + +impl<'a> ShrinkControl<'a> { + /// Create a `ShrinkControl` from a raw pointer. + /// + /// # Safety + /// + /// The pointer should point at a valid `shrink_control` for the duration of 'a. + pub unsafe fn from_raw(ptr: *mut bindings::shrink_control) -> Self { + Self { + // SAFETY: Caller promises that this pointer is valid. + ptr: unsafe { NonNull::new_unchecked(ptr) }, + _phantom: PhantomData, + } + } + + /// Determines whether it is safe to recurse into filesystem code. + pub fn gfp_fs(&self) -> bool { + // SAFETY: Okay by type invariants. + let mask = unsafe { (*self.ptr.as_ptr()).gfp_mask }; + + (mask & bindings::__GFP_FS) != 0 + } + + /// Returns the number of objects that `scan_objects` should try to reclaim. + pub fn nr_to_scan(&self) -> usize { + // SAFETY: Okay by type invariants. + unsafe { (*self.ptr.as_ptr()).nr_to_scan as usize } + } + + /// The callback should set this value to the number of objects processed. + pub fn set_nr_scanned(&mut self, val: usize) { + let mut val = val as c_ulong; + // SAFETY: Okay by type invariants. + let max = unsafe { (*self.ptr.as_ptr()).nr_to_scan }; + if val > max { + val = max; + } + + // SAFETY: Okay by type invariants. + unsafe { (*self.ptr.as_ptr()).nr_scanned = val }; + } +} + +unsafe extern "C" fn rust_count_objects<T: Shrinker>( + shrink: *mut bindings::shrinker, + sc: *mut bindings::shrink_control, +) -> c_ulong { + // SAFETY: We own the private data, so we can access it. + let private = unsafe { (*shrink).private_data }; + // SAFETY: This function is only used with shrinkers where `T` is the type of the private data. + let private = unsafe { <T::Ptr as ForeignOwnable>::borrow(private) }; + // SAFETY: The caller passes a valid `sc` pointer. + let sc = unsafe { ShrinkControl::from_raw(sc) }; + + let ret = T::count_objects(private, sc); + ret.inner +} + +unsafe extern "C" fn rust_scan_objects<T: Shrinker>( + shrink: *mut bindings::shrinker, + sc: *mut bindings::shrink_control, +) -> c_ulong { + // SAFETY: We own the private data, so we can access it. + let private = unsafe { (*shrink).private_data }; + // SAFETY: This function is only used with shrinkers where `T` is the type of the private data. + let private = unsafe { <T::Ptr as ForeignOwnable>::borrow(private) }; + // SAFETY: The caller passes a valid `sc` pointer. + let sc = unsafe { ShrinkControl::from_raw(sc) }; + + let ret = T::scan_objects(private, sc); + ret.inner +}
Rust Binder holds incoming transactions in a read-only mmap'd region where it manually manages the pages. These pages are only in use until the incoming transaction is consumed by userspace, but the kernel will keep the pages around for future transactions. Rust Binder registers a shrinker with the kernel so that it can give back these pages if the system comes under memory pressure. Separate types are provided for registered and unregistered shrinkers. The unregistered shrinker type can be used to configure the shrinker before registering it. Separating it into two types also enables the user to construct the private data between the calls to `shrinker_alloc` and `shrinker_register` and avoid constructing the private data if allocating the shrinker fails. The user specifies the callbacks in use by implementing the Shrinker trait for the type used for the private data. This requires specifying three things: implementations for count_objects and scan_objects, and the pointer type that the private data will be wrapped in. The return values of count_objects and scan_objects are provided using new types called CountObjects and ScanObjects respectively. These types prevent the user from e.g. returning SHRINK_STOP from count_objects or returning SHRINK_EMPTY from scan_objects. Signed-off-by: Alice Ryhl <aliceryhl@google.com> --- rust/bindings/bindings_helper.h | 2 + rust/kernel/lib.rs | 1 + rust/kernel/shrinker.rs | 324 ++++++++++++++++++++++++++++++++++++++++ 3 files changed, 327 insertions(+) --- base-commit: 93dc3be19450447a3a7090bd1dfb9f3daac3e8d2 change-id: 20240911-shrinker-f8371af00b68 Best regards,