diff mbox series

[v3,4/4] rust: add abstraction for `struct page`

Message ID 20240311-alice-mm-v3-4-cdf7b3a2049c@google.com (mailing list archive)
State New
Headers show
Series Memory management patches needed by Rust Binder | expand

Commit Message

Alice Ryhl March 11, 2024, 10:47 a.m. UTC
Adds a new struct called `Page` that wraps a pointer to `struct page`.
This struct is assumed to hold ownership over the page, so that Rust
code can allocate and manage pages directly.

The page type has various methods for reading and writing into the page.
These methods will temporarily map the page to allow the operation. All
of these methods use a helper that takes an offset and length, performs
bounds checks, and returns a pointer to the given offset in the page.

This patch only adds support for pages of order zero, as that is all
Rust Binder needs. However, it is written to make it easy to add support
for higher-order pages in the future. To do that, you would add a const
generic parameter to `Page` that specifies the order. Most of the
methods do not need to be adjusted, as the logic for dealing with
mapping multiple pages at once can be isolated to just the
`with_pointer_into_page` method. Finally, the struct can be renamed to
`Pages<ORDER>`, and the type alias `Page = Pages<0>` can be introduced.

Rust Binder needs to manage pages directly as that is how transactions
are delivered: Each process has an mmap'd region for incoming
transactions. When an incoming transaction arrives, the Binder driver
will choose a region in the mmap, allocate and map the relevant pages
manually, and copy the incoming transaction directly into the page. This
architecture allows the driver to copy transactions directly from the
address space of one process to another, without an intermediate copy
to a kernel buffer.

This code is based on Wedson's page abstractions from the old rust
branch, but it has been modified by Alice by removing the incomplete
support for higher-order pages, by introducing the `with_*` helpers
to consolidate the bounds checking logic into a single place, and by
introducing gfp flags.

Co-developed-by: Wedson Almeida Filho <wedsonaf@gmail.com>
Signed-off-by: Wedson Almeida Filho <wedsonaf@gmail.com>
Signed-off-by: Alice Ryhl <aliceryhl@google.com>
---
 rust/bindings/bindings_helper.h |   3 +
 rust/helpers.c                  |  20 ++++
 rust/kernel/lib.rs              |   1 +
 rust/kernel/page.rs             | 223 ++++++++++++++++++++++++++++++++++++++++
 4 files changed, 247 insertions(+)

Comments

Alice Ryhl March 11, 2024, 10:50 a.m. UTC | #1
Alice Ryhl <aliceryhl@google.com> writes:
> +/// Flags for the "get free page" function that underlies all memory allocations.
> +pub mod flags {
> +    pub type gfp_t = bindings::gfp_t;
> +
> +    /// `GFP_KERNEL` is typical for kernel-internal allocations. The caller requires `ZONE_NORMAL`
> +    /// or a lower zone for direct access but can direct reclaim.
> +    pub const GFP_KERNEL: gfp_t = bindings::GFP_KERNEL;
> +    /// `GFP_ZERO` returns a zeroed page on success.
> +    pub const __GFP_ZERO: gfp_t = bindings::__GFP_ZERO;
> +    /// `GFP_HIGHMEM` indicates that the allocated memory may be located in high memory.
> +    pub const __GFP_HIGHMEM: gfp_t = bindings::__GFP_HIGHMEM;
> +}
>
> [...]
>
> +impl Page {
> +    /// Allocates a new page.
> +    pub fn alloc_page(gfp_flags: flags::gfp_t) -> Result<Self, AllocError> {
> +        // SAFETY: The specified order is zero and we want one page.
> +        let page = unsafe { bindings::alloc_pages(gfp_flags, 0) };
> +        let page = NonNull::new(page).ok_or(AllocError)?;
> +        // INVARIANT: We checked that the allocation succeeded.
> +        Ok(Self { page })
> +    }

Matthew Wilcox: You suggested on a previous version that I use gfp flags
here, or that I rename it to e.g. BinderPage to make it clear that this
is specific to the kind of pages that Binder needs.

In this version I added some gfp flags, but I'm not actually sure that
the Page abstraction works for all combinations of gfp flags. For
example, I use kmap_local_page when accessing the page, but is that
correct if there's a user that doesn't pass GFP_HIGHMEM?

So perhaps it should be called HighmemPage since the methods on it
hardcode that. Or maybe it really doesn't make sense to generalize it
beyond what Binder needs.

What do you think? How broadly does these implementations generalize? I
would be happy to hear your advice on this.


Andreas Hindborg: I recall you mentioning that you also needed an
abstraction for pages. To what extent do these abstractions fit your
needs? Which gfp flags do you need?


Also, sorry for taking so long to submit this version. I spent a long
time debugging the crash that led to the submission of [1].

Alice

[1]: https://lore.kernel.org/rust-for-linux/20240305-shadow-call-stack-v2-1-c7b4a3f4d616@google.com/
Andreas Hindborg March 15, 2024, 8:16 a.m. UTC | #2
Alice Ryhl <aliceryhl@google.com> writes:

> Alice Ryhl <aliceryhl@google.com> writes:
>
> Andreas Hindborg: I recall you mentioning that you also needed an
> abstraction for pages. To what extent do these abstractions fit your
> needs? Which gfp flags do you need?
>

I based the block device driver API and null block driver series on v1
of this patch and v3 should still be good for that. The null block
driver uses `Page` indirectly through `UniqueFolio` with `GFP_KERNEL`
alloc flags. I do not need to customize the flags outside of that.

As an aside, I added methods to safely operate on the page contents [1].
`kernel::block::vec::Segment` indirectly uses this to move data to and
from pages [2].

Best regards,
Andreas


[1] https://github.com/metaspace/linux/commit/e88f4dc928233fcedcb0afec40be9bc2f8f74e3b
[2] https://lore.kernel.org/rust-for-linux/87y1akso83.fsf@metaspace.dk/T/#me6497ec69544efd21908f1acc6b3a1ab8b148ba0
Matthew Wilcox (Oracle) March 16, 2024, 3:30 p.m. UTC | #3
On Mon, Mar 11, 2024 at 10:50:56AM +0000, Alice Ryhl wrote:
> Alice Ryhl <aliceryhl@google.com> writes:
> > +/// Flags for the "get free page" function that underlies all memory allocations.
> > +pub mod flags {
> > +    pub type gfp_t = bindings::gfp_t;
> > +
> > +    /// `GFP_KERNEL` is typical for kernel-internal allocations. The caller requires `ZONE_NORMAL`
> > +    /// or a lower zone for direct access but can direct reclaim.
> > +    pub const GFP_KERNEL: gfp_t = bindings::GFP_KERNEL;
> > +    /// `GFP_ZERO` returns a zeroed page on success.
> > +    pub const __GFP_ZERO: gfp_t = bindings::__GFP_ZERO;
> > +    /// `GFP_HIGHMEM` indicates that the allocated memory may be located in high memory.
> > +    pub const __GFP_HIGHMEM: gfp_t = bindings::__GFP_HIGHMEM;
> > +}
> >
> > [...]
> >
> > +impl Page {
> > +    /// Allocates a new page.
> > +    pub fn alloc_page(gfp_flags: flags::gfp_t) -> Result<Self, AllocError> {
> > +        // SAFETY: The specified order is zero and we want one page.
> > +        let page = unsafe { bindings::alloc_pages(gfp_flags, 0) };
> > +        let page = NonNull::new(page).ok_or(AllocError)?;
> > +        // INVARIANT: We checked that the allocation succeeded.
> > +        Ok(Self { page })
> > +    }
> 
> Matthew Wilcox: You suggested on a previous version that I use gfp flags
> here, or that I rename it to e.g. BinderPage to make it clear that this
> is specific to the kind of pages that Binder needs.

I think what you have here is good.

> In this version I added some gfp flags, but I'm not actually sure that
> the Page abstraction works for all combinations of gfp flags. For
> example, I use kmap_local_page when accessing the page, but is that
> correct if there's a user that doesn't pass GFP_HIGHMEM?

Yes, kmap_local_page() works for non-highmem pages (it's essentially a
no-op)
Benno Lossin March 16, 2024, 8:39 p.m. UTC | #4
On 3/11/24 11:47, Alice Ryhl wrote:
> +/// A pointer to a page that owns the page allocation.
> +///
> +/// # Invariants
> +///
> +/// The pointer points at a page, and has ownership over the page.

Why not "`page` is valid"?
Do you mean by ownership of the page that `page` has ownership of the
allocation, or does that entail any other property/privilege?

> +pub struct Page {
> +    page: NonNull<bindings::page>,
> +}
> +
> +// SAFETY: It is safe to transfer page allocations between threads.

Why?

> +unsafe impl Send for Page {}
> +
> +// SAFETY: As long as the safety requirements for `&self` methods on this type
> +// are followed, there is no problem with calling them in parallel.

Why?

> +unsafe impl Sync for Page {}
> +
> +impl Page {
> +    /// Allocates a new page.
> +    pub fn alloc_page(gfp_flags: flags::gfp_t) -> Result<Self, AllocError> {
> +        // SAFETY: The specified order is zero and we want one page.

This doesn't explain why it is sound to call the function. I expect that
it is always sound to call this function with valid arguments.

> +        let page = unsafe { bindings::alloc_pages(gfp_flags, 0) };
> +        let page = NonNull::new(page).ok_or(AllocError)?;
> +        // INVARIANT: We checked that the allocation succeeded.

Doesn't mention ownership.

> +        Ok(Self { page })
> +    }
> +
> +    /// Returns a raw pointer to the page.
> +    pub fn as_ptr(&self) -> *mut bindings::page {
> +        self.page.as_ptr()
> +    }
> +
> +    /// Runs a piece of code with this page mapped to an address.
> +    ///
> +    /// The page is unmapped when this call returns.
> +    ///
> +    /// It is up to the caller to use the provided raw pointer correctly.

This says nothing about what 'correctly' means. What I gathered from the
implementation is that the supplied pointer is valid for the execution
of `f` for `PAGE_SIZE` bytes.
What other things are you allowed to rely upon?

Is it really OK for this function to be called from multiple threads?
Could that not result in the same page being mapped multiple times? If
that is fine, what about potential data races when two threads write to
the pointer given to `f`?

> +    pub fn with_page_mapped<T>(&self, f: impl FnOnce(*mut u8) -> T) -> T {
> +        // SAFETY: `page` is valid due to the type invariants on `Page`.
> +        let mapped_addr = unsafe { bindings::kmap_local_page(self.as_ptr()) };
> +
> +        let res = f(mapped_addr.cast());
> +
> +        // SAFETY: This unmaps the page mapped above.

This doesn't explain why it is sound.

> +        //
> +        // Since this API takes the user code as a closure, it can only be used
> +        // in a manner where the pages are unmapped in reverse order. This is as
> +        // required by `kunmap_local`.
> +        //
> +        // In other words, if this call to `kunmap_local` happens when a
> +        // different page should be unmapped first, then there must necessarily
> +        // be a call to `kmap_local_page` other than the call just above in
> +        // `with_page_mapped` that made that possible. In this case, it is the
> +        // unsafe block that wraps that other call that is incorrect.
> +        unsafe { bindings::kunmap_local(mapped_addr) };
> +
> +        res
> +    }
> +
> +    /// Runs a piece of code with a raw pointer to a slice of this page, with
> +    /// bounds checking.
> +    ///
> +    /// If `f` is called, then it will be called with a pointer that points at
> +    /// `off` bytes into the page, and the pointer will be valid for at least
> +    /// `len` bytes. The pointer is only valid on this task, as this method uses
> +    /// a local mapping.

This information about the pointer only being valid on this task should
also apply to `with_page_mapped`, right?

> +    ///
> +    /// If `off` and `len` refers to a region outside of this page, then this
> +    /// method returns `EINVAL` and does not call `f`.
> +    ///
> +    /// It is up to the caller to use the provided raw pointer correctly.

Again, please specify what 'correctly' means.
Boqun Feng March 19, 2024, 10:16 p.m. UTC | #5
On Mon, Mar 11, 2024 at 10:47:16AM +0000, Alice Ryhl wrote:
[...]
>  /* `bindgen` gets confused at certain things. */
>  const size_t RUST_CONST_HELPER_ARCH_SLAB_MINALIGN = ARCH_SLAB_MINALIGN;
> +const size_t RUST_CONST_HELPER_PAGE_SIZE = PAGE_SIZE;
> +const size_t RUST_CONST_HELPER_PAGE_MASK = PAGE_MASK;

At least for me, bindgen couldn't work out the macro expansion, and I
got:

	pub const PAGE_SIZE: usize = 4096;
	extern "C" {
	    pub static RUST_CONST_HELPER_PAGE_MASK: usize;
	}

in rust/bindings/bindings_generated.rs, which will eventually cause the
code cannot compile.

I'm using bindgen-cli 0.65.1, libclang (16 or 17), rustc (1.76 or 1.77).

Anyone else sees the same thing?

Regards,
Boqun

>  const gfp_t RUST_CONST_HELPER_GFP_KERNEL = GFP_KERNEL;
>  const gfp_t RUST_CONST_HELPER___GFP_ZERO = __GFP_ZERO;
> +const gfp_t RUST_CONST_HELPER___GFP_HIGHMEM = ___GFP_HIGHMEM;
[...]
Benno Lossin March 19, 2024, 10:28 p.m. UTC | #6
On 3/19/24 23:16, Boqun Feng wrote:
> On Mon, Mar 11, 2024 at 10:47:16AM +0000, Alice Ryhl wrote:
> [...]
>>   /* `bindgen` gets confused at certain things. */
>>   const size_t RUST_CONST_HELPER_ARCH_SLAB_MINALIGN = ARCH_SLAB_MINALIGN;
>> +const size_t RUST_CONST_HELPER_PAGE_SIZE = PAGE_SIZE;
>> +const size_t RUST_CONST_HELPER_PAGE_MASK = PAGE_MASK;
> 
> At least for me, bindgen couldn't work out the macro expansion, and I
> got:
> 
> 	pub const PAGE_SIZE: usize = 4096;
> 	extern "C" {
> 	    pub static RUST_CONST_HELPER_PAGE_MASK: usize;
> 	}
> 
> in rust/bindings/bindings_generated.rs, which will eventually cause the
> code cannot compile.
> 
> I'm using bindgen-cli 0.65.1, libclang (16 or 17), rustc (1.76 or 1.77).
> 
> Anyone else sees the same thing?

I also have this problem with bindgen-cli 0.69.1 libclang 16 and rustc 1.76.0.
For reference, here is the actual compilation error:

error[E0425]: cannot find value `PAGE_MASK` in crate `bindings`
      --> rust/kernel/page.rs:17:40
       |
17    | pub const PAGE_MASK: usize = bindings::PAGE_MASK as usize;
       |                                        ^^^^^^^^^ help: a constant with a similar name exists: `GATE_TASK`
       |
      ::: /home/benno/kernel/review/mem-man-binder/rust/bindings/bindings_generated.rs:12188:1
       |
12188 | pub const GATE_TASK: _bindgen_ty_4 = 5;
       | ---------------------------------- similarly named constant `GATE_TASK` defined here

error: type `gfp_t` should have an upper camel case name
   --> rust/kernel/page.rs:21:14
    |
21 |     pub type gfp_t = bindings::gfp_t;
    |              ^^^^^ help: convert the identifier to upper camel case: `GfpT`
    |
    = note: `-D non-camel-case-types` implied by `-D warnings`
    = help: to override `-D warnings` add `#[allow(non_camel_case_types)]`

error: aborting due to 2 previous errors

For more information about this error, try `rustc --explain E0425`.

@Alice: the second error should be unrelated to this problem.
Alice Ryhl March 20, 2024, 8:46 a.m. UTC | #7
> On 3/11/24 11:47, Alice Ryhl wrote:
> > +/// A pointer to a page that owns the page allocation.
> > +///
> > +/// # Invariants
> > +///
> > +/// The pointer points at a page, and has ownership over the page.
> 
> Why not "`page` is valid"?
> Do you mean by ownership of the page that `page` has ownership of the
> allocation, or does that entail any other property/privilege?

I can add "at a valid page".

By ownership I mean that we are allowed to pass it to __free_page and
that until we do, we can access the page. If you want me to reword this,
please tell me what you want it to say.

> > +// SAFETY: It is safe to transfer page allocations between threads.
> 
> Why?
> 
> > +unsafe impl Send for Page {}

How about:

// SAFETY: Pages have no logic that relies on them staying on a given
// thread, so moving them across threads is safe.

> > +// SAFETY: As long as the safety requirements for `&self` methods on this type
> > +// are followed, there is no problem with calling them in parallel.
> 
> Why?
> 
> > +unsafe impl Sync for Page {}

How about:

// SAFETY: Pages have no logic that relies on them not being accessed
// concurrently, so accessing them concurrently is safe.

> > +        // SAFETY: The specified order is zero and we want one page.
> 
> This doesn't explain why it is sound to call the function. I expect that
> it is always sound to call this function with valid arguments.
> 
> > +        let page = unsafe { bindings::alloc_pages(gfp_flags, 0) };

How about:

// SAFETY: Depending on the value of `gfp_flags`, this call may sleep.
// Other than that, it is always safe to call this method.

> > +        // INVARIANT: We checked that the allocation succeeded.
> 
> Doesn't mention ownership.
> 
> > +        Ok(Self { page })

How about:

// INVARIANT: We just successfully allocated a page, so we now have
// ownership of the newly allocated page. We transfer that ownership to
// the new `Page` object.

> > +    /// Runs a piece of code with this page mapped to an address.
> > +    ///
> > +    /// The page is unmapped when this call returns.
> > +    ///
> > +    /// It is up to the caller to use the provided raw pointer correctly.
> 
> This says nothing about what 'correctly' means. What I gathered from the
> implementation is that the supplied pointer is valid for the execution
> of `f` for `PAGE_SIZE` bytes.
> What other things are you allowed to rely upon?
> 
> Is it really OK for this function to be called from multiple threads?
> Could that not result in the same page being mapped multiple times? If
> that is fine, what about potential data races when two threads write to
> the pointer given to `f`?
> 
> > +    pub fn with_page_mapped<T>(&self, f: impl FnOnce(*mut u8) -> T) -> T {

I will say:

/// It is up to the caller to use the provided raw pointer correctly.
/// The pointer is valid for `PAGE_SIZE` bytes and for the duration in
/// which the closure is called. Depending on the gfp flags and kernel
/// configuration, the pointer may only be mapped on the current thread,
/// and in those cases, dereferencing it on other threads is UB. Other
/// than that, the usual rules for dereferencing a raw pointer apply.
/// (E.g., don't cause data races, the memory may be uninitialized, and
/// so on.)

It's okay to map it multiple times from different threads.

> > +        // SAFETY: This unmaps the page mapped above.
> 
> This doesn't explain why it is sound.
> 
> > +        //
> > +        // Since this API takes the user code as a closure, it can only be used
> > +        // in a manner where the pages are unmapped in reverse order. This is as
> > +        // required by `kunmap_local`.
> > +        //
> > +        // In other words, if this call to `kunmap_local` happens when a
> > +        // different page should be unmapped first, then there must necessarily
> > +        // be a call to `kmap_local_page` other than the call just above in
> > +        // `with_page_mapped` that made that possible. In this case, it is the
> > +        // unsafe block that wraps that other call that is incorrect.
> > +        unsafe { bindings::kunmap_local(mapped_addr) };

Why do you say that? The kunmap_local method requires that the address
being unmapped is currently mapped, and that pages are unmapped in
reverse order. The safety comment explains that the page is currently
mapped and that this method cannot be used to unmap them in anything
other than reverse order.

> > +    /// Runs a piece of code with a raw pointer to a slice of this page, with
> > +    /// bounds checking.
> > +    ///
> > +    /// If `f` is called, then it will be called with a pointer that points at
> > +    /// `off` bytes into the page, and the pointer will be valid for at least
> > +    /// `len` bytes. The pointer is only valid on this task, as this method uses
> > +    /// a local mapping.
> 
> This information about the pointer only being valid on this task should
> also apply to `with_page_mapped`, right?
> 
> > +    ///
> > +    /// If `off` and `len` refers to a region outside of this page, then this
> > +    /// method returns `EINVAL` and does not call `f`.
> > +    ///
> > +    /// It is up to the caller to use the provided raw pointer correctly.
> 
> Again, please specify what 'correctly' means.

I will remove the "The pointer is only valid on this task, as this
method uses a local mapping." sentence and copy the same paragraph as
previously (without the `PAGE_SIZE` remark).

Alice
Benno Lossin March 21, 2024, 1:15 p.m. UTC | #8
On 3/20/24 09:46, Alice Ryhl wrote:
>> On 3/11/24 11:47, Alice Ryhl wrote:
>>> +/// A pointer to a page that owns the page allocation.
>>> +///
>>> +/// # Invariants
>>> +///
>>> +/// The pointer points at a page, and has ownership over the page.
>>
>> Why not "`page` is valid"?
>> Do you mean by ownership of the page that `page` has ownership of the
>> allocation, or does that entail any other property/privilege?
> 
> I can add "at a valid page".

I don't think that helps, what you need as an invariant is that the
pointer is valid.

> By ownership I mean that we are allowed to pass it to __free_page and
> that until we do, we can access the page. If you want me to reword this,
> please tell me what you want it to say.

I see, no need to change it.

>>> +// SAFETY: It is safe to transfer page allocations between threads.
>>
>> Why?
>>
>>> +unsafe impl Send for Page {}
> 
> How about:
> 
> // SAFETY: Pages have no logic that relies on them staying on a given
> // thread, so moving them across threads is safe.

Sounds good.

>>> +// SAFETY: As long as the safety requirements for `&self` methods on this type
>>> +// are followed, there is no problem with calling them in parallel.
>>
>> Why?
>>
>>> +unsafe impl Sync for Page {}
> 
> How about:
> 
> // SAFETY: Pages have no logic that relies on them not being accessed
> // concurrently, so accessing them concurrently is safe.

Sounds good.

>>> +        // SAFETY: The specified order is zero and we want one page.
>>
>> This doesn't explain why it is sound to call the function. I expect that
>> it is always sound to call this function with valid arguments.
>>
>>> +        let page = unsafe { bindings::alloc_pages(gfp_flags, 0) };
> 
> How about:
> 
> // SAFETY: Depending on the value of `gfp_flags`, this call may sleep.
> // Other than that, it is always safe to call this method.

Sounds good.

>>> +        // INVARIANT: We checked that the allocation succeeded.
>>
>> Doesn't mention ownership.
>>
>>> +        Ok(Self { page })
> 
> How about:
> 
> // INVARIANT: We just successfully allocated a page, so we now have
> // ownership of the newly allocated page. We transfer that ownership to
> // the new `Page` object.

Sounds good.

>>> +    /// Runs a piece of code with this page mapped to an address.
>>> +    ///
>>> +    /// The page is unmapped when this call returns.
>>> +    ///
>>> +    /// It is up to the caller to use the provided raw pointer correctly.
>>
>> This says nothing about what 'correctly' means. What I gathered from the
>> implementation is that the supplied pointer is valid for the execution
>> of `f` for `PAGE_SIZE` bytes.
>> What other things are you allowed to rely upon?
>>
>> Is it really OK for this function to be called from multiple threads?
>> Could that not result in the same page being mapped multiple times? If
>> that is fine, what about potential data races when two threads write to
>> the pointer given to `f`?
>>
>>> +    pub fn with_page_mapped<T>(&self, f: impl FnOnce(*mut u8) -> T) -> T {
> 
> I will say:
> 
> /// It is up to the caller to use the provided raw pointer correctly.
> /// The pointer is valid for `PAGE_SIZE` bytes and for the duration in
> /// which the closure is called. Depending on the gfp flags and kernel
> /// configuration, the pointer may only be mapped on the current thread,
> /// and in those cases, dereferencing it on other threads is UB. Other
> /// than that, the usual rules for dereferencing a raw pointer apply.
> /// (E.g., don't cause data races, the memory may be uninitialized, and
> /// so on.)

I would simplify and drop "depending on the gfp flags and kernel..." and
just say that the pointer is only valid on the current thread.

Also would it make sense to make the pointer type *mut [u8; PAGE_SIZE]?

> It's okay to map it multiple times from different threads.

Do you still need to take care of data races?
So would it be fine to execute this code on two threads in parallel?

     static PAGE: Page = ...; // assume we have a page accessible by both threads
     
     PAGE.with_page_mapped(|ptr| {
         loop {
             unsafe { ptr.write(0) };
             pr_info!("{}", unsafe { ptr.read() });
         }
     });

If this is not allowed, I don't really like the API. As a raw version it
would be fine, but I think we should have a safer version (eg by taking
`&mut self`).

>>> +        // SAFETY: This unmaps the page mapped above.
>>
>> This doesn't explain why it is sound.
>>
>>> +        //
>>> +        // Since this API takes the user code as a closure, it can only be used
>>> +        // in a manner where the pages are unmapped in reverse order. This is as
>>> +        // required by `kunmap_local`.
>>> +        //
>>> +        // In other words, if this call to `kunmap_local` happens when a
>>> +        // different page should be unmapped first, then there must necessarily
>>> +        // be a call to `kmap_local_page` other than the call just above in
>>> +        // `with_page_mapped` that made that possible. In this case, it is the
>>> +        // unsafe block that wraps that other call that is incorrect.
>>> +        unsafe { bindings::kunmap_local(mapped_addr) };
> 
> Why do you say that? The kunmap_local method requires that the address
> being unmapped is currently mapped, and that pages are unmapped in
> reverse order. The safety comment explains that the page is currently
> mapped and that this method cannot be used to unmap them in anything
> other than reverse order.

Sorry it seems I thought that the safety comment ended after the first
sentence. Can you (re)move that first sentence, since it is not part of
a justification?
The rest is fine.

>>> +    /// Runs a piece of code with a raw pointer to a slice of this page, with
>>> +    /// bounds checking.
>>> +    ///
>>> +    /// If `f` is called, then it will be called with a pointer that points at
>>> +    /// `off` bytes into the page, and the pointer will be valid for at least
>>> +    /// `len` bytes. The pointer is only valid on this task, as this method uses
>>> +    /// a local mapping.
>>
>> This information about the pointer only being valid on this task should
>> also apply to `with_page_mapped`, right?
>>
>>> +    ///
>>> +    /// If `off` and `len` refers to a region outside of this page, then this
>>> +    /// method returns `EINVAL` and does not call `f`.
>>> +    ///
>>> +    /// It is up to the caller to use the provided raw pointer correctly.
>>
>> Again, please specify what 'correctly' means.
> 
> I will remove the "The pointer is only valid on this task, as this
> method uses a local mapping." sentence and copy the same paragraph as
> previously (without the `PAGE_SIZE` remark).

Sounds good.
Alice Ryhl March 21, 2024, 1:42 p.m. UTC | #9
On Thu, Mar 21, 2024 at 2:16 PM Benno Lossin <benno.lossin@proton.me> wrote:
>
> On 3/20/24 09:46, Alice Ryhl wrote:
> >> On 3/11/24 11:47, Alice Ryhl wrote:
> >>> +/// A pointer to a page that owns the page allocation.
> >>> +///
> >>> +/// # Invariants
> >>> +///
> >>> +/// The pointer points at a page, and has ownership over the page.
> >>
> >> Why not "`page` is valid"?
> >> Do you mean by ownership of the page that `page` has ownership of the
> >> allocation, or does that entail any other property/privilege?
> >
> > I can add "at a valid page".
>
> I don't think that helps, what you need as an invariant is that the
> pointer is valid.

To me "points at a page" implies that the pointer is valid. I mean, if
it was dangling, it would not point at a page?

But I can reword to something else if you have a preferred phrasing.

> >>> +    /// Runs a piece of code with this page mapped to an address.
> >>> +    ///
> >>> +    /// The page is unmapped when this call returns.
> >>> +    ///
> >>> +    /// It is up to the caller to use the provided raw pointer correctly.
> >>
> >> This says nothing about what 'correctly' means. What I gathered from the
> >> implementation is that the supplied pointer is valid for the execution
> >> of `f` for `PAGE_SIZE` bytes.
> >> What other things are you allowed to rely upon?
> >>
> >> Is it really OK for this function to be called from multiple threads?
> >> Could that not result in the same page being mapped multiple times? If
> >> that is fine, what about potential data races when two threads write to
> >> the pointer given to `f`?
> >>
> >>> +    pub fn with_page_mapped<T>(&self, f: impl FnOnce(*mut u8) -> T) -> T {
> >
> > I will say:
> >
> > /// It is up to the caller to use the provided raw pointer correctly.
> > /// The pointer is valid for `PAGE_SIZE` bytes and for the duration in
> > /// which the closure is called. Depending on the gfp flags and kernel
> > /// configuration, the pointer may only be mapped on the current thread,
> > /// and in those cases, dereferencing it on other threads is UB. Other
> > /// than that, the usual rules for dereferencing a raw pointer apply.
> > /// (E.g., don't cause data races, the memory may be uninitialized, and
> > /// so on.)
>
> I would simplify and drop "depending on the gfp flags and kernel..." and
> just say that the pointer is only valid on the current thread.

Sure, that works for me.

> Also would it make sense to make the pointer type *mut [u8; PAGE_SIZE]?

I think it's a trade-off. That makes the code more error-prone, since
`pointer::add` now doesn't move by a number of bytes, but a number of
pages.

> > It's okay to map it multiple times from different threads.
>
> Do you still need to take care of data races?
> So would it be fine to execute this code on two threads in parallel?
>
>      static PAGE: Page = ...; // assume we have a page accessible by both threads
>
>      PAGE.with_page_mapped(|ptr| {
>          loop {
>              unsafe { ptr.write(0) };
>              pr_info!("{}", unsafe { ptr.read() });
>          }
>      });

Like I said, the usual pointer rules apply. Two threads can access it
in parallel as long as one of the following are satisfied:

* Both accesses are reads.
* Both accesses are atomic.
* They access disjoint byte ranges.

Other than the fact that it uses a thread-local mapping on machines
that can't address all of their memory at the same time, it's
completely normal memory. It's literally just a PAGE_SIZE-aligned
allocation of PAGE_SIZE bytes.

> If this is not allowed, I don't really like the API. As a raw version it
> would be fine, but I think we should have a safer version (eg by taking
> `&mut self`).

I don't understand what you mean. It is the *most* raw API that `Page`
has. I can make them private if you want me to. The API cannot take
`&mut self` because I need to be able to unsafely perform concurrent
writes to disjoint byte ranges.

Alice
Benno Lossin March 21, 2024, 1:56 p.m. UTC | #10
On 3/21/24 14:42, Alice Ryhl wrote:
> On Thu, Mar 21, 2024 at 2:16 PM Benno Lossin <benno.lossin@proton.me> wrote:
>>
>> On 3/20/24 09:46, Alice Ryhl wrote:
>>>> On 3/11/24 11:47, Alice Ryhl wrote:
>>>>> +/// A pointer to a page that owns the page allocation.
>>>>> +///
>>>>> +/// # Invariants
>>>>> +///
>>>>> +/// The pointer points at a page, and has ownership over the page.
>>>>
>>>> Why not "`page` is valid"?
>>>> Do you mean by ownership of the page that `page` has ownership of the
>>>> allocation, or does that entail any other property/privilege?
>>>
>>> I can add "at a valid page".
>>
>> I don't think that helps, what you need as an invariant is that the
>> pointer is valid.
> 
> To me "points at a page" implies that the pointer is valid. I mean, if
> it was dangling, it would not point at a page?
> 
> But I can reword to something else if you have a preferred phrasing.

I would just say "`page` is valid" or "`self.page` is valid".

>>>>> +    /// Runs a piece of code with this page mapped to an address.
>>>>> +    ///
>>>>> +    /// The page is unmapped when this call returns.
>>>>> +    ///
>>>>> +    /// It is up to the caller to use the provided raw pointer correctly.
>>>>
>>>> This says nothing about what 'correctly' means. What I gathered from the
>>>> implementation is that the supplied pointer is valid for the execution
>>>> of `f` for `PAGE_SIZE` bytes.
>>>> What other things are you allowed to rely upon?
>>>>
>>>> Is it really OK for this function to be called from multiple threads?
>>>> Could that not result in the same page being mapped multiple times? If
>>>> that is fine, what about potential data races when two threads write to
>>>> the pointer given to `f`?
>>>>
>>>>> +    pub fn with_page_mapped<T>(&self, f: impl FnOnce(*mut u8) -> T) -> T {
>>>
>>> I will say:
>>>
>>> /// It is up to the caller to use the provided raw pointer correctly.
>>> /// The pointer is valid for `PAGE_SIZE` bytes and for the duration in
>>> /// which the closure is called. Depending on the gfp flags and kernel
>>> /// configuration, the pointer may only be mapped on the current thread,
>>> /// and in those cases, dereferencing it on other threads is UB. Other
>>> /// than that, the usual rules for dereferencing a raw pointer apply.
>>> /// (E.g., don't cause data races, the memory may be uninitialized, and
>>> /// so on.)
>>
>> I would simplify and drop "depending on the gfp flags and kernel..." and
>> just say that the pointer is only valid on the current thread.
> 
> Sure, that works for me.
> 
>> Also would it make sense to make the pointer type *mut [u8; PAGE_SIZE]?
> 
> I think it's a trade-off. That makes the code more error-prone, since
> `pointer::add` now doesn't move by a number of bytes, but a number of
> pages.

Yeah. As long as you document that the pointer is valid for r/w with
offsets in `0..PAGE_SIZE` bytes, leaving the type as is, is fine by me.


>>> It's okay to map it multiple times from different threads.
>>
>> Do you still need to take care of data races?
>> So would it be fine to execute this code on two threads in parallel?
>>
>>       static PAGE: Page = ...; // assume we have a page accessible by both threads
>>
>>       PAGE.with_page_mapped(|ptr| {
>>           loop {
>>               unsafe { ptr.write(0) };
>>               pr_info!("{}", unsafe { ptr.read() });
>>           }
>>       });
> 
> Like I said, the usual pointer rules apply. Two threads can access it
> in parallel as long as one of the following are satisfied:
> 
> * Both accesses are reads.
> * Both accesses are atomic.
> * They access disjoint byte ranges.
> 
> Other than the fact that it uses a thread-local mapping on machines
> that can't address all of their memory at the same time, it's
> completely normal memory. It's literally just a PAGE_SIZE-aligned
> allocation of PAGE_SIZE bytes.

Thanks for the info, what do you think of this?:

/// It is up to the caller to use the provided raw pointer correctly. The pointer is valid for reads
/// and writes for `PAGE_SIZE` bytes and for the duration in which the closure is called. The
/// pointer must only be used on the current thread. The caller must also ensure that no data races
/// occur: when mapping the same page on two threads accesses to memory with the same offset must be
/// synchronized.

> 
>> If this is not allowed, I don't really like the API. As a raw version it
>> would be fine, but I think we should have a safer version (eg by taking
>> `&mut self`).
> 
> I don't understand what you mean. It is the *most* raw API that `Page`
> has. I can make them private if you want me to. The API cannot take
> `&mut self` because I need to be able to unsafely perform concurrent
> writes to disjoint byte ranges.

If you don't need these functions to be public, I think we should
definitely make them private.
Also we could add a `raw` suffix to the functions to make it clear that
it is a primitive API. If you think that it is highly unlikely that we
get a safer version, then I don't think there is value in adding the
suffix.
Alice Ryhl March 21, 2024, 2:11 p.m. UTC | #11
On Thu, Mar 21, 2024 at 2:56 PM Benno Lossin <benno.lossin@proton.me> wrote:
>
> On 3/21/24 14:42, Alice Ryhl wrote:
> > On Thu, Mar 21, 2024 at 2:16 PM Benno Lossin <benno.lossin@proton.me> wrote:
> >>
> >> On 3/20/24 09:46, Alice Ryhl wrote:
> >>>> On 3/11/24 11:47, Alice Ryhl wrote:
> >>>>> +/// A pointer to a page that owns the page allocation.
> >>>>> +///
> >>>>> +/// # Invariants
> >>>>> +///
> >>>>> +/// The pointer points at a page, and has ownership over the page.
> >>>>
> >>>> Why not "`page` is valid"?
> >>>> Do you mean by ownership of the page that `page` has ownership of the
> >>>> allocation, or does that entail any other property/privilege?
> >>>
> >>> I can add "at a valid page".
> >>
> >> I don't think that helps, what you need as an invariant is that the
> >> pointer is valid.
> >
> > To me "points at a page" implies that the pointer is valid. I mean, if
> > it was dangling, it would not point at a page?
> >
> > But I can reword to something else if you have a preferred phrasing.
>
> I would just say "`page` is valid" or "`self.page` is valid".
>
> >>>>> +    /// Runs a piece of code with this page mapped to an address.
> >>>>> +    ///
> >>>>> +    /// The page is unmapped when this call returns.
> >>>>> +    ///
> >>>>> +    /// It is up to the caller to use the provided raw pointer correctly.
> >>>>
> >>>> This says nothing about what 'correctly' means. What I gathered from the
> >>>> implementation is that the supplied pointer is valid for the execution
> >>>> of `f` for `PAGE_SIZE` bytes.
> >>>> What other things are you allowed to rely upon?
> >>>>
> >>>> Is it really OK for this function to be called from multiple threads?
> >>>> Could that not result in the same page being mapped multiple times? If
> >>>> that is fine, what about potential data races when two threads write to
> >>>> the pointer given to `f`?
> >>>>
> >>>>> +    pub fn with_page_mapped<T>(&self, f: impl FnOnce(*mut u8) -> T) -> T {
> >>>
> >>> I will say:
> >>>
> >>> /// It is up to the caller to use the provided raw pointer correctly.
> >>> /// The pointer is valid for `PAGE_SIZE` bytes and for the duration in
> >>> /// which the closure is called. Depending on the gfp flags and kernel
> >>> /// configuration, the pointer may only be mapped on the current thread,
> >>> /// and in those cases, dereferencing it on other threads is UB. Other
> >>> /// than that, the usual rules for dereferencing a raw pointer apply.
> >>> /// (E.g., don't cause data races, the memory may be uninitialized, and
> >>> /// so on.)
> >>
> >> I would simplify and drop "depending on the gfp flags and kernel..." and
> >> just say that the pointer is only valid on the current thread.
> >
> > Sure, that works for me.
> >
> >> Also would it make sense to make the pointer type *mut [u8; PAGE_SIZE]?
> >
> > I think it's a trade-off. That makes the code more error-prone, since
> > `pointer::add` now doesn't move by a number of bytes, but a number of
> > pages.
>
> Yeah. As long as you document that the pointer is valid for r/w with
> offsets in `0..PAGE_SIZE` bytes, leaving the type as is, is fine by me.
>
>
> >>> It's okay to map it multiple times from different threads.
> >>
> >> Do you still need to take care of data races?
> >> So would it be fine to execute this code on two threads in parallel?
> >>
> >>       static PAGE: Page = ...; // assume we have a page accessible by both threads
> >>
> >>       PAGE.with_page_mapped(|ptr| {
> >>           loop {
> >>               unsafe { ptr.write(0) };
> >>               pr_info!("{}", unsafe { ptr.read() });
> >>           }
> >>       });
> >
> > Like I said, the usual pointer rules apply. Two threads can access it
> > in parallel as long as one of the following are satisfied:
> >
> > * Both accesses are reads.
> > * Both accesses are atomic.
> > * They access disjoint byte ranges.
> >
> > Other than the fact that it uses a thread-local mapping on machines
> > that can't address all of their memory at the same time, it's
> > completely normal memory. It's literally just a PAGE_SIZE-aligned
> > allocation of PAGE_SIZE bytes.
>
> Thanks for the info, what do you think of this?:
>
> /// It is up to the caller to use the provided raw pointer correctly. The pointer is valid for reads
> /// and writes for `PAGE_SIZE` bytes and for the duration in which the closure is called. The
> /// pointer must only be used on the current thread. The caller must also ensure that no data races
> /// occur: when mapping the same page on two threads accesses to memory with the same offset must be
> /// synchronized.

I would much rather phrase it in terms of "the usual pointer" rules. I
mean, the memory could also be uninitialized if you don't pass
__GFP_ZERO when you create it, so you also have to make sure to follow
the rules about uninitialized memory. I don't want to be in the
business of listing all requirements for accessing memory here.

> >> If this is not allowed, I don't really like the API. As a raw version it
> >> would be fine, but I think we should have a safer version (eg by taking
> >> `&mut self`).
> >
> > I don't understand what you mean. It is the *most* raw API that `Page`
> > has. I can make them private if you want me to. The API cannot take
> > `&mut self` because I need to be able to unsafely perform concurrent
> > writes to disjoint byte ranges.
>
> If you don't need these functions to be public, I think we should
> definitely make them private.
> Also we could add a `raw` suffix to the functions to make it clear that
> it is a primitive API. If you think that it is highly unlikely that we
> get a safer version, then I don't think there is value in adding the
> suffix.

The old code on the Rust branch didn't have these functions, but
that's because the old `read_raw` and `write_raw` methods did all of
these things directly in their implementation:

* Map the memory so we can get a pointer.
* Get a pointer to a subslice (with bounds checks!)
* Do the actual read/write.

I thought that doing this many things in a single function was
convoluted, so I decided to refactor the code by extracting the "get a
pointer to the page" logic into `with_page_mapped` and the "point to
subslice with bounds check" logic into `with_pointer_into_page`. That
way, each function has only one responsibility, instead of mixing
three responsibilities into one.

So even if we get a safer version, I would not want to get rid of this
method. I don't want to inline its implementation into more
complicated functions. The safer method would call the raw method, and
then do whatever additional logic it wants to do on top of that.

Alice
Alice Ryhl March 21, 2024, 2:16 p.m. UTC | #12
On Thu, Mar 21, 2024 at 3:11 PM Alice Ryhl <aliceryhl@google.com> wrote:
>
> On Thu, Mar 21, 2024 at 2:56 PM Benno Lossin <benno.lossin@proton.me> wrote:
> >
> > On 3/21/24 14:42, Alice Ryhl wrote:
> > > On Thu, Mar 21, 2024 at 2:16 PM Benno Lossin <benno.lossin@proton.me> wrote:
> > >>
> > >> On 3/20/24 09:46, Alice Ryhl wrote:
> > >>>> On 3/11/24 11:47, Alice Ryhl wrote:
> > >>>>> +/// A pointer to a page that owns the page allocation.
> > >>>>> +///
> > >>>>> +/// # Invariants
> > >>>>> +///
> > >>>>> +/// The pointer points at a page, and has ownership over the page.
> > >>>>
> > >>>> Why not "`page` is valid"?
> > >>>> Do you mean by ownership of the page that `page` has ownership of the
> > >>>> allocation, or does that entail any other property/privilege?
> > >>>
> > >>> I can add "at a valid page".
> > >>
> > >> I don't think that helps, what you need as an invariant is that the
> > >> pointer is valid.
> > >
> > > To me "points at a page" implies that the pointer is valid. I mean, if
> > > it was dangling, it would not point at a page?
> > >
> > > But I can reword to something else if you have a preferred phrasing.
> >
> > I would just say "`page` is valid" or "`self.page` is valid".
> >
> > >>>>> +    /// Runs a piece of code with this page mapped to an address.
> > >>>>> +    ///
> > >>>>> +    /// The page is unmapped when this call returns.
> > >>>>> +    ///
> > >>>>> +    /// It is up to the caller to use the provided raw pointer correctly.
> > >>>>
> > >>>> This says nothing about what 'correctly' means. What I gathered from the
> > >>>> implementation is that the supplied pointer is valid for the execution
> > >>>> of `f` for `PAGE_SIZE` bytes.
> > >>>> What other things are you allowed to rely upon?
> > >>>>
> > >>>> Is it really OK for this function to be called from multiple threads?
> > >>>> Could that not result in the same page being mapped multiple times? If
> > >>>> that is fine, what about potential data races when two threads write to
> > >>>> the pointer given to `f`?
> > >>>>
> > >>>>> +    pub fn with_page_mapped<T>(&self, f: impl FnOnce(*mut u8) -> T) -> T {
> > >>>
> > >>> I will say:
> > >>>
> > >>> /// It is up to the caller to use the provided raw pointer correctly.
> > >>> /// The pointer is valid for `PAGE_SIZE` bytes and for the duration in
> > >>> /// which the closure is called. Depending on the gfp flags and kernel
> > >>> /// configuration, the pointer may only be mapped on the current thread,
> > >>> /// and in those cases, dereferencing it on other threads is UB. Other
> > >>> /// than that, the usual rules for dereferencing a raw pointer apply.
> > >>> /// (E.g., don't cause data races, the memory may be uninitialized, and
> > >>> /// so on.)
> > >>
> > >> I would simplify and drop "depending on the gfp flags and kernel..." and
> > >> just say that the pointer is only valid on the current thread.
> > >
> > > Sure, that works for me.
> > >
> > >> Also would it make sense to make the pointer type *mut [u8; PAGE_SIZE]?
> > >
> > > I think it's a trade-off. That makes the code more error-prone, since
> > > `pointer::add` now doesn't move by a number of bytes, but a number of
> > > pages.
> >
> > Yeah. As long as you document that the pointer is valid for r/w with
> > offsets in `0..PAGE_SIZE` bytes, leaving the type as is, is fine by me.
> >
> >
> > >>> It's okay to map it multiple times from different threads.
> > >>
> > >> Do you still need to take care of data races?
> > >> So would it be fine to execute this code on two threads in parallel?
> > >>
> > >>       static PAGE: Page = ...; // assume we have a page accessible by both threads
> > >>
> > >>       PAGE.with_page_mapped(|ptr| {
> > >>           loop {
> > >>               unsafe { ptr.write(0) };
> > >>               pr_info!("{}", unsafe { ptr.read() });
> > >>           }
> > >>       });
> > >
> > > Like I said, the usual pointer rules apply. Two threads can access it
> > > in parallel as long as one of the following are satisfied:
> > >
> > > * Both accesses are reads.
> > > * Both accesses are atomic.
> > > * They access disjoint byte ranges.
> > >
> > > Other than the fact that it uses a thread-local mapping on machines
> > > that can't address all of their memory at the same time, it's
> > > completely normal memory. It's literally just a PAGE_SIZE-aligned
> > > allocation of PAGE_SIZE bytes.
> >
> > Thanks for the info, what do you think of this?:
> >
> > /// It is up to the caller to use the provided raw pointer correctly. The pointer is valid for reads
> > /// and writes for `PAGE_SIZE` bytes and for the duration in which the closure is called. The
> > /// pointer must only be used on the current thread. The caller must also ensure that no data races
> > /// occur: when mapping the same page on two threads accesses to memory with the same offset must be
> > /// synchronized.
>
> I would much rather phrase it in terms of "the usual pointer" rules. I
> mean, the memory could also be uninitialized if you don't pass
> __GFP_ZERO when you create it, so you also have to make sure to follow
> the rules about uninitialized memory. I don't want to be in the
> business of listing all requirements for accessing memory here.
>
> > >> If this is not allowed, I don't really like the API. As a raw version it
> > >> would be fine, but I think we should have a safer version (eg by taking
> > >> `&mut self`).
> > >
> > > I don't understand what you mean. It is the *most* raw API that `Page`
> > > has. I can make them private if you want me to. The API cannot take
> > > `&mut self` because I need to be able to unsafely perform concurrent
> > > writes to disjoint byte ranges.
> >
> > If you don't need these functions to be public, I think we should
> > definitely make them private.
> > Also we could add a `raw` suffix to the functions to make it clear that
> > it is a primitive API. If you think that it is highly unlikely that we
> > get a safer version, then I don't think there is value in adding the
> > suffix.
>
> The old code on the Rust branch didn't have these functions, but
> that's because the old `read_raw` and `write_raw` methods did all of
> these things directly in their implementation:
>
> * Map the memory so we can get a pointer.
> * Get a pointer to a subslice (with bounds checks!)
> * Do the actual read/write.
>
> I thought that doing this many things in a single function was
> convoluted, so I decided to refactor the code by extracting the "get a
> pointer to the page" logic into `with_page_mapped` and the "point to
> subslice with bounds check" logic into `with_pointer_into_page`. That
> way, each function has only one responsibility, instead of mixing
> three responsibilities into one.
>
> So even if we get a safer version, I would not want to get rid of this
> method. I don't want to inline its implementation into more
> complicated functions. The safer method would call the raw method, and
> then do whatever additional logic it wants to do on top of that.

Adding to this: To me, we *do* already have safer versions of this
method. Those are the read_raw and write_raw and fill_zero and
copy_from_user_slice methods.

Alice
Benno Lossin March 21, 2024, 2:19 p.m. UTC | #13
On 3/21/24 15:11, Alice Ryhl wrote:
> On Thu, Mar 21, 2024 at 2:56 PM Benno Lossin <benno.lossin@proton.me> wrote:
>>
>> On 3/21/24 14:42, Alice Ryhl wrote:
>>> On Thu, Mar 21, 2024 at 2:16 PM Benno Lossin <benno.lossin@proton.me> wrote:
>>>>
>>>> On 3/20/24 09:46, Alice Ryhl wrote:
>>>>>> On 3/11/24 11:47, Alice Ryhl wrote:
>>>>>>> +/// A pointer to a page that owns the page allocation.
>>>>>>> +///
>>>>>>> +/// # Invariants
>>>>>>> +///
>>>>>>> +/// The pointer points at a page, and has ownership over the page.
>>>>>>
>>>>>> Why not "`page` is valid"?
>>>>>> Do you mean by ownership of the page that `page` has ownership of the
>>>>>> allocation, or does that entail any other property/privilege?
>>>>>
>>>>> I can add "at a valid page".
>>>>
>>>> I don't think that helps, what you need as an invariant is that the
>>>> pointer is valid.
>>>
>>> To me "points at a page" implies that the pointer is valid. I mean, if
>>> it was dangling, it would not point at a page?
>>>
>>> But I can reword to something else if you have a preferred phrasing.
>>
>> I would just say "`page` is valid" or "`self.page` is valid".
>>
>>>>>>> +    /// Runs a piece of code with this page mapped to an address.
>>>>>>> +    ///
>>>>>>> +    /// The page is unmapped when this call returns.
>>>>>>> +    ///
>>>>>>> +    /// It is up to the caller to use the provided raw pointer correctly.
>>>>>>
>>>>>> This says nothing about what 'correctly' means. What I gathered from the
>>>>>> implementation is that the supplied pointer is valid for the execution
>>>>>> of `f` for `PAGE_SIZE` bytes.
>>>>>> What other things are you allowed to rely upon?
>>>>>>
>>>>>> Is it really OK for this function to be called from multiple threads?
>>>>>> Could that not result in the same page being mapped multiple times? If
>>>>>> that is fine, what about potential data races when two threads write to
>>>>>> the pointer given to `f`?
>>>>>>
>>>>>>> +    pub fn with_page_mapped<T>(&self, f: impl FnOnce(*mut u8) -> T) -> T {
>>>>>
>>>>> I will say:
>>>>>
>>>>> /// It is up to the caller to use the provided raw pointer correctly.
>>>>> /// The pointer is valid for `PAGE_SIZE` bytes and for the duration in
>>>>> /// which the closure is called. Depending on the gfp flags and kernel
>>>>> /// configuration, the pointer may only be mapped on the current thread,
>>>>> /// and in those cases, dereferencing it on other threads is UB. Other
>>>>> /// than that, the usual rules for dereferencing a raw pointer apply.
>>>>> /// (E.g., don't cause data races, the memory may be uninitialized, and
>>>>> /// so on.)
>>>>
>>>> I would simplify and drop "depending on the gfp flags and kernel..." and
>>>> just say that the pointer is only valid on the current thread.
>>>
>>> Sure, that works for me.
>>>
>>>> Also would it make sense to make the pointer type *mut [u8; PAGE_SIZE]?
>>>
>>> I think it's a trade-off. That makes the code more error-prone, since
>>> `pointer::add` now doesn't move by a number of bytes, but a number of
>>> pages.
>>
>> Yeah. As long as you document that the pointer is valid for r/w with
>> offsets in `0..PAGE_SIZE` bytes, leaving the type as is, is fine by me.
>>
>>
>>>>> It's okay to map it multiple times from different threads.
>>>>
>>>> Do you still need to take care of data races?
>>>> So would it be fine to execute this code on two threads in parallel?
>>>>
>>>>        static PAGE: Page = ...; // assume we have a page accessible by both threads
>>>>
>>>>        PAGE.with_page_mapped(|ptr| {
>>>>            loop {
>>>>                unsafe { ptr.write(0) };
>>>>                pr_info!("{}", unsafe { ptr.read() });
>>>>            }
>>>>        });
>>>
>>> Like I said, the usual pointer rules apply. Two threads can access it
>>> in parallel as long as one of the following are satisfied:
>>>
>>> * Both accesses are reads.
>>> * Both accesses are atomic.
>>> * They access disjoint byte ranges.
>>>
>>> Other than the fact that it uses a thread-local mapping on machines
>>> that can't address all of their memory at the same time, it's
>>> completely normal memory. It's literally just a PAGE_SIZE-aligned
>>> allocation of PAGE_SIZE bytes.
>>
>> Thanks for the info, what do you think of this?:
>>
>> /// It is up to the caller to use the provided raw pointer correctly. The pointer is valid for reads
>> /// and writes for `PAGE_SIZE` bytes and for the duration in which the closure is called. The
>> /// pointer must only be used on the current thread. The caller must also ensure that no data races
>> /// occur: when mapping the same page on two threads accesses to memory with the same offset must be
>> /// synchronized.
> 
> I would much rather phrase it in terms of "the usual pointer" rules. I
> mean, the memory could also be uninitialized if you don't pass
> __GFP_ZERO when you create it, so you also have to make sure to follow
> the rules about uninitialized memory. I don't want to be in the
> business of listing all requirements for accessing memory here.

Sure you can add that part again, I just want to highlight that mapping
the same page multiple times means that the caller has to synchronize
accesses to those pointers even if the pointers do not have the same
address value. That is not normally something you need to take care of,
ie normally if `ptr1.addr() != ptr2.addr()` then you can access them
without synchronization.

>>>> If this is not allowed, I don't really like the API. As a raw version it
>>>> would be fine, but I think we should have a safer version (eg by taking
>>>> `&mut self`).
>>>
>>> I don't understand what you mean. It is the *most* raw API that `Page`
>>> has. I can make them private if you want me to. The API cannot take
>>> `&mut self` because I need to be able to unsafely perform concurrent
>>> writes to disjoint byte ranges.
>>
>> If you don't need these functions to be public, I think we should
>> definitely make them private.
>> Also we could add a `raw` suffix to the functions to make it clear that
>> it is a primitive API. If you think that it is highly unlikely that we
>> get a safer version, then I don't think there is value in adding the
>> suffix.
> 
> The old code on the Rust branch didn't have these functions, but
> that's because the old `read_raw` and `write_raw` methods did all of
> these things directly in their implementation:
> 
> * Map the memory so we can get a pointer.
> * Get a pointer to a subslice (with bounds checks!)
> * Do the actual read/write.
> 
> I thought that doing this many things in a single function was
> convoluted, so I decided to refactor the code by extracting the "get a
> pointer to the page" logic into `with_page_mapped` and the "point to
> subslice with bounds check" logic into `with_pointer_into_page`. That
> way, each function has only one responsibility, instead of mixing
> three responsibilities into one.

I think that design decision is good.

> So even if we get a safer version, I would not want to get rid of this
> method. I don't want to inline its implementation into more
> complicated functions. The safer method would call the raw method, and
> then do whatever additional logic it wants to do on top of that.

I was not suggesting removing this method, rather renaming it to reflect
that it is a primitive API that should be avoided if possible.
diff mbox series

Patch

diff --git a/rust/bindings/bindings_helper.h b/rust/bindings/bindings_helper.h
index 65b98831b975..1073005ca449 100644
--- a/rust/bindings/bindings_helper.h
+++ b/rust/bindings/bindings_helper.h
@@ -20,5 +20,8 @@ 
 
 /* `bindgen` gets confused at certain things. */
 const size_t RUST_CONST_HELPER_ARCH_SLAB_MINALIGN = ARCH_SLAB_MINALIGN;
+const size_t RUST_CONST_HELPER_PAGE_SIZE = PAGE_SIZE;
+const size_t RUST_CONST_HELPER_PAGE_MASK = PAGE_MASK;
 const gfp_t RUST_CONST_HELPER_GFP_KERNEL = GFP_KERNEL;
 const gfp_t RUST_CONST_HELPER___GFP_ZERO = __GFP_ZERO;
+const gfp_t RUST_CONST_HELPER___GFP_HIGHMEM = ___GFP_HIGHMEM;
diff --git a/rust/helpers.c b/rust/helpers.c
index 312b6fcb49d5..298d2ee16e61 100644
--- a/rust/helpers.c
+++ b/rust/helpers.c
@@ -25,6 +25,8 @@ 
 #include <linux/build_bug.h>
 #include <linux/err.h>
 #include <linux/errname.h>
+#include <linux/gfp.h>
+#include <linux/highmem.h>
 #include <linux/mutex.h>
 #include <linux/refcount.h>
 #include <linux/sched/signal.h>
@@ -93,6 +95,24 @@  int rust_helper_signal_pending(struct task_struct *t)
 }
 EXPORT_SYMBOL_GPL(rust_helper_signal_pending);
 
+struct page *rust_helper_alloc_pages(gfp_t gfp_mask, unsigned int order)
+{
+       return alloc_pages(gfp_mask, order);
+}
+EXPORT_SYMBOL_GPL(rust_helper_alloc_pages);
+
+void *rust_helper_kmap_local_page(struct page *page)
+{
+       return kmap_local_page(page);
+}
+EXPORT_SYMBOL_GPL(rust_helper_kmap_local_page);
+
+void rust_helper_kunmap_local(const void *addr)
+{
+       kunmap_local(addr);
+}
+EXPORT_SYMBOL_GPL(rust_helper_kunmap_local);
+
 refcount_t rust_helper_REFCOUNT_INIT(int n)
 {
 	return (refcount_t)REFCOUNT_INIT(n);
diff --git a/rust/kernel/lib.rs b/rust/kernel/lib.rs
index 37f84223b83f..667fc67fa24f 100644
--- a/rust/kernel/lib.rs
+++ b/rust/kernel/lib.rs
@@ -39,6 +39,7 @@ 
 pub mod kunit;
 #[cfg(CONFIG_NET)]
 pub mod net;
+pub mod page;
 pub mod prelude;
 pub mod print;
 mod static_assert;
diff --git a/rust/kernel/page.rs b/rust/kernel/page.rs
new file mode 100644
index 000000000000..02d25b142fc8
--- /dev/null
+++ b/rust/kernel/page.rs
@@ -0,0 +1,223 @@ 
+// SPDX-License-Identifier: GPL-2.0
+
+//! Kernel page allocation and management.
+
+use crate::{bindings, error::code::*, error::Result, uaccess::UserSliceReader};
+use core::{
+    alloc::AllocError,
+    ptr::{self, NonNull},
+};
+
+/// A bitwise shift for the page size.
+pub const PAGE_SHIFT: usize = bindings::PAGE_SHIFT as usize;
+/// The number of bytes in a page.
+pub const PAGE_SIZE: usize = bindings::PAGE_SIZE as usize;
+/// A bitmask that can be used to get the page containing a given address by masking away the lower
+/// bits.
+pub const PAGE_MASK: usize = bindings::PAGE_MASK as usize;
+
+/// Flags for the "get free page" function that underlies all memory allocations.
+pub mod flags {
+    pub type gfp_t = bindings::gfp_t;
+
+    /// `GFP_KERNEL` is typical for kernel-internal allocations. The caller requires `ZONE_NORMAL`
+    /// or a lower zone for direct access but can direct reclaim.
+    pub const GFP_KERNEL: gfp_t = bindings::GFP_KERNEL;
+    /// `GFP_ZERO` returns a zeroed page on success.
+    pub const __GFP_ZERO: gfp_t = bindings::__GFP_ZERO;
+    /// `GFP_HIGHMEM` indicates that the allocated memory may be located in high memory.
+    pub const __GFP_HIGHMEM: gfp_t = bindings::__GFP_HIGHMEM;
+}
+
+/// A pointer to a page that owns the page allocation.
+///
+/// # Invariants
+///
+/// The pointer points at a page, and has ownership over the page.
+pub struct Page {
+    page: NonNull<bindings::page>,
+}
+
+// SAFETY: It is safe to transfer page allocations between threads.
+unsafe impl Send for Page {}
+
+// SAFETY: As long as the safety requirements for `&self` methods on this type
+// are followed, there is no problem with calling them in parallel.
+unsafe impl Sync for Page {}
+
+impl Page {
+    /// Allocates a new page.
+    pub fn alloc_page(gfp_flags: flags::gfp_t) -> Result<Self, AllocError> {
+        // SAFETY: The specified order is zero and we want one page.
+        let page = unsafe { bindings::alloc_pages(gfp_flags, 0) };
+        let page = NonNull::new(page).ok_or(AllocError)?;
+        // INVARIANT: We checked that the allocation succeeded.
+        Ok(Self { page })
+    }
+
+    /// Returns a raw pointer to the page.
+    pub fn as_ptr(&self) -> *mut bindings::page {
+        self.page.as_ptr()
+    }
+
+    /// Runs a piece of code with this page mapped to an address.
+    ///
+    /// The page is unmapped when this call returns.
+    ///
+    /// It is up to the caller to use the provided raw pointer correctly.
+    pub fn with_page_mapped<T>(&self, f: impl FnOnce(*mut u8) -> T) -> T {
+        // SAFETY: `page` is valid due to the type invariants on `Page`.
+        let mapped_addr = unsafe { bindings::kmap_local_page(self.as_ptr()) };
+
+        let res = f(mapped_addr.cast());
+
+        // SAFETY: This unmaps the page mapped above.
+        //
+        // Since this API takes the user code as a closure, it can only be used
+        // in a manner where the pages are unmapped in reverse order. This is as
+        // required by `kunmap_local`.
+        //
+        // In other words, if this call to `kunmap_local` happens when a
+        // different page should be unmapped first, then there must necessarily
+        // be a call to `kmap_local_page` other than the call just above in
+        // `with_page_mapped` that made that possible. In this case, it is the
+        // unsafe block that wraps that other call that is incorrect.
+        unsafe { bindings::kunmap_local(mapped_addr) };
+
+        res
+    }
+
+    /// Runs a piece of code with a raw pointer to a slice of this page, with
+    /// bounds checking.
+    ///
+    /// If `f` is called, then it will be called with a pointer that points at
+    /// `off` bytes into the page, and the pointer will be valid for at least
+    /// `len` bytes. The pointer is only valid on this task, as this method uses
+    /// a local mapping.
+    ///
+    /// If `off` and `len` refers to a region outside of this page, then this
+    /// method returns `EINVAL` and does not call `f`.
+    ///
+    /// It is up to the caller to use the provided raw pointer correctly.
+    pub fn with_pointer_into_page<T>(
+        &self,
+        off: usize,
+        len: usize,
+        f: impl FnOnce(*mut u8) -> Result<T>,
+    ) -> Result<T> {
+        let bounds_ok = off <= PAGE_SIZE && len <= PAGE_SIZE && (off + len) <= PAGE_SIZE;
+
+        if bounds_ok {
+            self.with_page_mapped(move |page_addr| {
+                // SAFETY: The `off` integer is at most `PAGE_SIZE`, so this pointer offset will
+                // result in a pointer that is in bounds or one off the end of the page.
+                f(unsafe { page_addr.add(off) })
+            })
+        } else {
+            Err(EINVAL)
+        }
+    }
+
+    /// Maps the page and reads from it into the given buffer.
+    ///
+    /// This method will perform bounds checks on the page offset. If `offset ..
+    /// offset+len` goes outside ot the page, then this call returns `EINVAL`.
+    ///
+    /// # Safety
+    ///
+    /// * Callers must ensure that `dst` is valid for writing `len` bytes.
+    /// * Callers must ensure that this call does not race with a write to the
+    ///   same page that overlaps with this read.
+    pub unsafe fn read_raw(&self, dst: *mut u8, offset: usize, len: usize) -> Result {
+        self.with_pointer_into_page(offset, len, move |src| {
+            // SAFETY: If `with_pointer_into_page` calls into this closure, then
+            // it has performed a bounds check and guarantees that `src` is
+            // valid for `len` bytes.
+            //
+            // There caller guarantees that there is no data race.
+            unsafe { ptr::copy_nonoverlapping(src, dst, len) };
+            Ok(())
+        })
+    }
+
+    /// Maps the page and writes into it from the given buffer.
+    ///
+    /// This method will perform bounds checks on the page offset. If `offset ..
+    /// offset+len` goes outside ot the page, then this call returns `EINVAL`.
+    ///
+    /// # Safety
+    ///
+    /// * Callers must ensure that `src` is valid for reading `len` bytes.
+    /// * Callers must ensure that this call does not race with a read or write
+    ///   to the same page that overlaps with this write.
+    pub unsafe fn write_raw(&self, src: *const u8, offset: usize, len: usize) -> Result {
+        self.with_pointer_into_page(offset, len, move |dst| {
+            // SAFETY: If `with_pointer_into_page` calls into this closure, then
+            // it has performed a bounds check and guarantees that `dst` is
+            // valid for `len` bytes.
+            //
+            // There caller guarantees that there is no data race.
+            unsafe { ptr::copy_nonoverlapping(src, dst, len) };
+            Ok(())
+        })
+    }
+
+    /// Maps the page and zeroes the given slice.
+    ///
+    /// This method will perform bounds checks on the page offset. If `offset ..
+    /// offset+len` goes outside ot the page, then this call returns `EINVAL`.
+    ///
+    /// # Safety
+    ///
+    /// Callers must ensure that this call does not race with a read or write to
+    /// the same page that overlaps with this write.
+    pub unsafe fn fill_zero(&self, offset: usize, len: usize) -> Result {
+        self.with_pointer_into_page(offset, len, move |dst| {
+            // SAFETY: If `with_pointer_into_page` calls into this closure, then
+            // it has performed a bounds check and guarantees that `dst` is
+            // valid for `len` bytes.
+            //
+            // There caller guarantees that there is no data race.
+            unsafe { ptr::write_bytes(dst, 0u8, len) };
+            Ok(())
+        })
+    }
+
+    /// Copies data from userspace into this page.
+    ///
+    /// This method will perform bounds checks on the page offset. If `offset ..
+    /// offset+len` goes outside ot the page, then this call returns `EINVAL`.
+    ///
+    /// Like the other `UserSliceReader` methods, data races are allowed on the
+    /// userspace address. However, they are not allowed on the page you are
+    /// copying into.
+    ///
+    /// # Safety
+    ///
+    /// Callers must ensure that this call does not race with a read or write to
+    /// the same page that overlaps with this write.
+    pub unsafe fn copy_from_user_slice(
+        &self,
+        reader: &mut UserSliceReader,
+        offset: usize,
+        len: usize,
+    ) -> Result {
+        self.with_pointer_into_page(offset, len, move |dst| {
+            // SAFETY: If `with_pointer_into_page` calls into this closure, then
+            // it has performed a bounds check and guarantees that `dst` is
+            // valid for `len` bytes.
+            //
+            // There caller guarantees that there is no data race when writing
+            // to `dst`.
+            unsafe { reader.read_raw(dst, len) }
+        })
+    }
+}
+
+impl Drop for Page {
+    fn drop(&mut self) {
+        // SAFETY: By the type invariants, we have ownership of the page and can
+        // free it.
+        unsafe { bindings::__free_pages(self.page.as_ptr(), 0) };
+    }
+}