diff mbox series

[1/7] rust: file: add Rust abstraction for `struct file`

Message ID 20231129-alice-file-v1-1-f81afe8c7261@google.com (mailing list archive)
State New, archived
Headers show
Series File abstractions needed by Rust Binder | expand

Commit Message

Alice Ryhl Nov. 29, 2023, 12:51 p.m. UTC
From: Wedson Almeida Filho <wedsonaf@gmail.com>

This abstraction makes it possible to manipulate the open files for a
process. The new `File` struct wraps the C `struct file`. When accessing
it using the smart pointer `ARef<File>`, the pointer will own a
reference count to the file. When accessing it as `&File`, then the
reference does not own a refcount, but the borrow checker will ensure
that the reference count does not hit zero while the `&File` is live.

Since this is intended to manipulate the open files of a process, we
introduce a `from_fd` constructor that corresponds to the C `fget`
method. In future patches, it will become possible to create a new fd in
a process and bind it to a `File`. Rust Binder will use these to send
fds from one process to another.

We also provide a method for accessing the file's flags. Rust Binder
will use this to access the flags of the Binder fd to check whether the
non-blocking flag is set, which affects what the Binder ioctl does.

This introduces a struct for the EBADF error type, rather than just
using the Error type directly. This has two advantages:
* `File::from_fd` returns a `Result<ARef<File>, BadFdError>`, which the
  compiler will represent as a single pointer, with null being an error.
  This is possible because the compiler understands that `BadFdError`
  has only one possible value, and it also understands that the
  `ARef<File>` smart pointer is guaranteed non-null.
* Additionally, we promise to users of the method that the method can
  only fail with EBADF, which means that they can rely on this promise
  without having to inspect its implementation.
That said, there are also two disadvantages:
* Defining additional error types involves boilerplate.
* The question mark operator will only utilize the `From` trait once,
  which prevents you from using the question mark operator on
  `BadFdError` in methods that return some third error type that the
  kernel `Error` is convertible into. (However, it works fine in methods
  that return `Error`.)

Signed-off-by: Wedson Almeida Filho <wedsonaf@gmail.com>
Co-developed-by: Daniel Xu <dxu@dxuuu.xyz>
Signed-off-by: Daniel Xu <dxu@dxuuu.xyz>
Co-developed-by: Alice Ryhl <aliceryhl@google.com>
Signed-off-by: Alice Ryhl <aliceryhl@google.com>
---
 rust/bindings/bindings_helper.h |   2 +
 rust/helpers.c                  |   7 ++
 rust/kernel/file.rs             | 182 ++++++++++++++++++++++++++++++++++++++++
 rust/kernel/lib.rs              |   1 +
 4 files changed, 192 insertions(+)

Comments

Matthew Wilcox Nov. 29, 2023, 3:13 p.m. UTC | #1
On Wed, Nov 29, 2023 at 12:51:07PM +0000, Alice Ryhl wrote:
> This introduces a struct for the EBADF error type, rather than just
> using the Error type directly. This has two advantages:
> * `File::from_fd` returns a `Result<ARef<File>, BadFdError>`, which the
>   compiler will represent as a single pointer, with null being an error.
>   This is possible because the compiler understands that `BadFdError`
>   has only one possible value, and it also understands that the
>   `ARef<File>` smart pointer is guaranteed non-null.
> * Additionally, we promise to users of the method that the method can
>   only fail with EBADF, which means that they can rely on this promise
>   without having to inspect its implementation.
> That said, there are also two disadvantages:
> * Defining additional error types involves boilerplate.
> * The question mark operator will only utilize the `From` trait once,
>   which prevents you from using the question mark operator on
>   `BadFdError` in methods that return some third error type that the
>   kernel `Error` is convertible into. (However, it works fine in methods
>   that return `Error`.)

I haven't looked at how Rust-for-Linux handles errors yet, but it's
disappointing to see that it doesn't do something like the PTR_ERR /
ERR_PTR / IS_ERR C thing under the hood.

> @@ -157,6 +158,12 @@ void rust_helper_init_work_with_key(struct work_struct *work, work_func_t func,
>  }
>  EXPORT_SYMBOL_GPL(rust_helper_init_work_with_key);
>  
> +struct file *rust_helper_get_file(struct file *f)
> +{
> +	return get_file(f);
> +}
> +EXPORT_SYMBOL_GPL(rust_helper_get_file);

This is ridiculous.  A function call instead of doing the
atomic_long_inc() in Rust?
Peter Zijlstra Nov. 29, 2023, 3:23 p.m. UTC | #2
On Wed, Nov 29, 2023 at 03:13:22PM +0000, Matthew Wilcox wrote:

> > @@ -157,6 +158,12 @@ void rust_helper_init_work_with_key(struct work_struct *work, work_func_t func,
> >  }
> >  EXPORT_SYMBOL_GPL(rust_helper_init_work_with_key);
> >  
> > +struct file *rust_helper_get_file(struct file *f)
> > +{
> > +	return get_file(f);
> > +}
> > +EXPORT_SYMBOL_GPL(rust_helper_get_file);
> 
> This is ridiculous.  A function call instead of doing the
> atomic_long_inc() in Rust?

Yeah, I complained about something similar a while ago. And recently
talked to Boqun about this as well,

Bindgen *could* in theory 'compile' the inline C headers into (unsafe)
Rust, the immediate problem is that Rust has a wildly different inline
asm syntax (because Rust needs terrible syntax or whatever).

Boqun said it should all be fixable, but is a non-trivial amount of
work.
Alice Ryhl Nov. 29, 2023, 4:42 p.m. UTC | #3
Matthew Wilcox <willy@infradead.org>
> I haven't looked at how Rust-for-Linux handles errors yet, but it's
> disappointing to see that it doesn't do something like the PTR_ERR /
> ERR_PTR / IS_ERR C thing under the hood.

It would be cool to do that, but we haven't written the infrastructure
to do that yet. (Note that in this particular case, the C function also
returns the error as a null pointer.)

>> @@ -157,6 +158,12 @@ void rust_helper_init_work_with_key(struct work_struct *work, work_func_t func,
>>  }
>>  EXPORT_SYMBOL_GPL(rust_helper_init_work_with_key);
>>  
>> +struct file *rust_helper_get_file(struct file *f)
>> +{
>> +	return get_file(f);
>> +}
>> +EXPORT_SYMBOL_GPL(rust_helper_get_file);
> 
> This is ridiculous.  A function call instead of doing the
> atomic_long_inc() in Rust?

I think there are two factors to consider here:

First, doing the atomic increment from Rust currently runs into the
memory model split between the C++ and LKMM memory models. It would be
like using the C11 atomic_fetch_add instead of the one that the Kernel
defines for LKMM using inline assembly. When I discussed this with Paul
McKenney, we were advised that its best to avoid mixing the memory
models.

Avoiding this would require that we replicate the inline assembly that C
uses to define its atomic operations on the Rust side. This is something
that I think should be done, but it hasn't been done yet.


Second, there's potentially an increased maintenance burden when C
methods are reimplemented in Rust. Any change to the implementation on
the C side would have to be reflected on the Rust side.

Alice
Peter Zijlstra Nov. 29, 2023, 4:45 p.m. UTC | #4
On Wed, Nov 29, 2023 at 04:42:51PM +0000, Alice Ryhl wrote:
> Second, there's potentially an increased maintenance burden when C
> methods are reimplemented in Rust. Any change to the implementation on
> the C side would have to be reflected on the Rust side.

C to Rust compiler FTW :-)
Christian Brauner Nov. 29, 2023, 5:06 p.m. UTC | #5
On Wed, Nov 29, 2023 at 12:51:07PM +0000, Alice Ryhl wrote:
> From: Wedson Almeida Filho <wedsonaf@gmail.com>
> 
> This abstraction makes it possible to manipulate the open files for a
> process. The new `File` struct wraps the C `struct file`. When accessing
> it using the smart pointer `ARef<File>`, the pointer will own a
> reference count to the file. When accessing it as `&File`, then the
> reference does not own a refcount, but the borrow checker will ensure
> that the reference count does not hit zero while the `&File` is live.

Could you explain this in more details please? Ideally with some C and
how that translates to your Rust wrappers, please. Sorry, this is going
to be a long journey...

> 
> Since this is intended to manipulate the open files of a process, we
> introduce a `from_fd` constructor that corresponds to the C `fget`
> method. In future patches, it will become possible to create a new fd in
> a process and bind it to a `File`. Rust Binder will use these to send
> fds from one process to another.
> 
> We also provide a method for accessing the file's flags. Rust Binder
> will use this to access the flags of the Binder fd to check whether the
> non-blocking flag is set, which affects what the Binder ioctl does.
> 
> This introduces a struct for the EBADF error type, rather than just
> using the Error type directly. This has two advantages:
> * `File::from_fd` returns a `Result<ARef<File>, BadFdError>`, which the
>   compiler will represent as a single pointer, with null being an error.
>   This is possible because the compiler understands that `BadFdError`
>   has only one possible value, and it also understands that the
>   `ARef<File>` smart pointer is guaranteed non-null.
> * Additionally, we promise to users of the method that the method can
>   only fail with EBADF, which means that they can rely on this promise
>   without having to inspect its implementation.
> That said, there are also two disadvantages:
> * Defining additional error types involves boilerplate.
> * The question mark operator will only utilize the `From` trait once,
>   which prevents you from using the question mark operator on
>   `BadFdError` in methods that return some third error type that the
>   kernel `Error` is convertible into. (However, it works fine in methods
>   that return `Error`.)
> 
> Signed-off-by: Wedson Almeida Filho <wedsonaf@gmail.com>
> Co-developed-by: Daniel Xu <dxu@dxuuu.xyz>
> Signed-off-by: Daniel Xu <dxu@dxuuu.xyz>
> Co-developed-by: Alice Ryhl <aliceryhl@google.com>
> Signed-off-by: Alice Ryhl <aliceryhl@google.com>
> ---
>  rust/bindings/bindings_helper.h |   2 +
>  rust/helpers.c                  |   7 ++
>  rust/kernel/file.rs             | 182 ++++++++++++++++++++++++++++++++++++++++
>  rust/kernel/lib.rs              |   1 +
>  4 files changed, 192 insertions(+)
> 
> diff --git a/rust/bindings/bindings_helper.h b/rust/bindings/bindings_helper.h
> index 85f013ed4ca4..beed3ef1fbc3 100644
> --- a/rust/bindings/bindings_helper.h
> +++ b/rust/bindings/bindings_helper.h
> @@ -8,6 +8,8 @@
>  
>  #include <kunit/test.h>
>  #include <linux/errname.h>
> +#include <linux/file.h>
> +#include <linux/fs.h>
>  #include <linux/slab.h>
>  #include <linux/refcount.h>
>  #include <linux/wait.h>
> diff --git a/rust/helpers.c b/rust/helpers.c
> index 70e59efd92bc..03141a3608a4 100644
> --- a/rust/helpers.c
> +++ b/rust/helpers.c
> @@ -25,6 +25,7 @@
>  #include <linux/build_bug.h>
>  #include <linux/err.h>
>  #include <linux/errname.h>
> +#include <linux/fs.h>
>  #include <linux/mutex.h>
>  #include <linux/refcount.h>
>  #include <linux/sched/signal.h>
> @@ -157,6 +158,12 @@ void rust_helper_init_work_with_key(struct work_struct *work, work_func_t func,
>  }
>  EXPORT_SYMBOL_GPL(rust_helper_init_work_with_key);
>  
> +struct file *rust_helper_get_file(struct file *f)
> +{
> +	return get_file(f);
> +}
> +EXPORT_SYMBOL_GPL(rust_helper_get_file);
> +
>  /*
>   * `bindgen` binds the C `size_t` type as the Rust `usize` type, so we can
>   * use it in contexts where Rust expects a `usize` like slice (array) indices.
> diff --git a/rust/kernel/file.rs b/rust/kernel/file.rs
> new file mode 100644
> index 000000000000..ee4ec8b919af
> --- /dev/null
> +++ b/rust/kernel/file.rs
> @@ -0,0 +1,182 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +//! Files and file descriptors.
> +//!
> +//! C headers: [`include/linux/fs.h`](../../../../include/linux/fs.h) and
> +//! [`include/linux/file.h`](../../../../include/linux/file.h)
> +
> +use crate::{
> +    bindings,
> +    error::{code::*, Error, Result},
> +    types::{ARef, AlwaysRefCounted, Opaque},
> +};
> +use core::ptr;
> +
> +/// Flags associated with a [`File`].
> +pub mod flags {
> +    /// File is opened in append mode.
> +    pub const O_APPEND: u32 = bindings::O_APPEND;
> +
> +    /// Signal-driven I/O is enabled.
> +    pub const O_ASYNC: u32 = bindings::FASYNC;
> +
> +    /// Close-on-exec flag is set.
> +    pub const O_CLOEXEC: u32 = bindings::O_CLOEXEC;
> +
> +    /// File was created if it didn't already exist.
> +    pub const O_CREAT: u32 = bindings::O_CREAT;
> +
> +    /// Direct I/O is enabled for this file.
> +    pub const O_DIRECT: u32 = bindings::O_DIRECT;
> +
> +    /// File must be a directory.
> +    pub const O_DIRECTORY: u32 = bindings::O_DIRECTORY;
> +
> +    /// Like [`O_SYNC`] except metadata is not synced.
> +    pub const O_DSYNC: u32 = bindings::O_DSYNC;
> +
> +    /// Ensure that this file is created with the `open(2)` call.
> +    pub const O_EXCL: u32 = bindings::O_EXCL;
> +
> +    /// Large file size enabled (`off64_t` over `off_t`).
> +    pub const O_LARGEFILE: u32 = bindings::O_LARGEFILE;
> +
> +    /// Do not update the file last access time.
> +    pub const O_NOATIME: u32 = bindings::O_NOATIME;
> +
> +    /// File should not be used as process's controlling terminal.
> +    pub const O_NOCTTY: u32 = bindings::O_NOCTTY;
> +
> +    /// If basename of path is a symbolic link, fail open.
> +    pub const O_NOFOLLOW: u32 = bindings::O_NOFOLLOW;
> +
> +    /// File is using nonblocking I/O.
> +    pub const O_NONBLOCK: u32 = bindings::O_NONBLOCK;
> +
> +    /// Also known as `O_NDELAY`.
> +    ///
> +    /// This is effectively the same flag as [`O_NONBLOCK`] on all architectures
> +    /// except SPARC64.
> +    pub const O_NDELAY: u32 = bindings::O_NDELAY;
> +
> +    /// Used to obtain a path file descriptor.
> +    pub const O_PATH: u32 = bindings::O_PATH;
> +
> +    /// Write operations on this file will flush data and metadata.
> +    pub const O_SYNC: u32 = bindings::O_SYNC;
> +
> +    /// This file is an unnamed temporary regular file.
> +    pub const O_TMPFILE: u32 = bindings::O_TMPFILE;
> +
> +    /// File should be truncated to length 0.
> +    pub const O_TRUNC: u32 = bindings::O_TRUNC;
> +
> +    /// Bitmask for access mode flags.
> +    ///
> +    /// # Examples
> +    ///
> +    /// ```
> +    /// use kernel::file;
> +    /// # fn do_something() {}
> +    /// # let flags = 0;
> +    /// if (flags & file::flags::O_ACCMODE) == file::flags::O_RDONLY {
> +    ///     do_something();
> +    /// }
> +    /// ```
> +    pub const O_ACCMODE: u32 = bindings::O_ACCMODE;
> +
> +    /// File is read only.
> +    pub const O_RDONLY: u32 = bindings::O_RDONLY;
> +
> +    /// File is write only.
> +    pub const O_WRONLY: u32 = bindings::O_WRONLY;
> +
> +    /// File can be both read and written.
> +    pub const O_RDWR: u32 = bindings::O_RDWR;
> +}
> +
> +/// Wraps the kernel's `struct file`.
> +///
> +/// # Invariants
> +///
> +/// Instances of this type are always ref-counted, that is, a call to `get_file` ensures that the
> +/// allocation remains valid at least until the matching call to `fput`.
> +#[repr(transparent)]
> +pub struct File(Opaque<bindings::file>);
> +
> +// SAFETY: By design, the only way to access a `File` is via an immutable reference or an `ARef`.
> +// This means that the only situation in which a `File` can be accessed mutably is when the
> +// refcount drops to zero and the destructor runs. It is safe for that to happen on any thread, so
> +// it is ok for this type to be `Send`.
> +unsafe impl Send for File {}
> +
> +// SAFETY: It's OK to access `File` through shared references from other threads because we're
> +// either accessing properties that don't change or that are properly synchronised by C code.

Uhm, what guarantees are you talking about specifically, please?
Examples would help.

> +unsafe impl Sync for File {}
> +
> +impl File {
> +    /// Constructs a new `struct file` wrapper from a file descriptor.
> +    ///
> +    /// The file descriptor belongs to the current process.
> +    pub fn from_fd(fd: u32) -> Result<ARef<Self>, BadFdError> {
> +        // SAFETY: FFI call, there are no requirements on `fd`.
> +        let ptr = ptr::NonNull::new(unsafe { bindings::fget(fd) }).ok_or(BadFdError)?;
> +
> +        // INVARIANT: `fget` increments the refcount before returning.
> +        Ok(unsafe { ARef::from_raw(ptr.cast()) })
> +    }

I think this is really misnamed.

File reference counting has two modes. For simplicity let's ignore
fdget_pos() for now:

(1) fdget()
    Return file either with or without an increased reference count.
    If the fdtable was shared increment reference count, if not don't
    increment refernce count.
(2) fget()
    Always increase refcount.

Your File implementation currently only deals with (2). And this
terminology is terribly important as far as I'm concerned. This wants to
be fget() and not from_fd(). The latter tells me nothing. I feel we
really need to try and mirror the current naming closely. Not
religiously ofc but core stuff such as this really benefits from having
an almost 1:1 mapping between C names and Rust names, I think.
Especially in the beginning.

> +
> +    /// Creates a reference to a [`File`] from a valid pointer.
> +    ///
> +    /// # Safety
> +    ///
> +    /// The caller must ensure that `ptr` points at a valid file and that its refcount does not
> +    /// reach zero during the lifetime 'a.
> +    pub unsafe fn from_ptr<'a>(ptr: *const bindings::file) -> &'a File {
> +        // INVARIANT: The safety requirements guarantee that the refcount does not hit zero during
> +        // 'a. The cast is okay because `File` is `repr(transparent)`.
> +        unsafe { &*ptr.cast() }
> +    }

How does that work and what is this used for? It's required that a
caller has called from_fd()/fget() first before from_ptr() can be used?

Can you show how this would be used in an example, please? Unless you
hold file_lock it is now invalid to access fields in struct file just
with rcu lock held for example. Which is why returning a pointer without
holding a reference seems dodgy. I'm probably just missing context.

> +
> +    /// Returns the flags associated with the file.
> +    ///
> +    /// The flags are a combination of the constants in [`flags`].
> +    pub fn flags(&self) -> u32 {
> +        // This `read_volatile` is intended to correspond to a READ_ONCE call.
> +        //
> +        // SAFETY: The file is valid because the shared reference guarantees a nonzero refcount.

I really need to understand what you mean by shared reference. At least
in the current C implementation you can't share a reference without
another task as the other task might fput() behind you and then you're
hosed. That's why we have the fdget() logic.

> +        //
> +        // TODO: Replace with `read_once` when available on the Rust side.
> +        unsafe { core::ptr::addr_of!((*self.0.get()).f_flags).read_volatile() }
> +    }
> +}
> +
> +// SAFETY: The type invariants guarantee that `File` is always ref-counted.
> +unsafe impl AlwaysRefCounted for File {
> +    fn inc_ref(&self) {
> +        // SAFETY: The existence of a shared reference means that the refcount is nonzero.
> +        unsafe { bindings::get_file(self.0.get()) };
> +    }

Why inc_ref() and not just get_file()?

> +
> +    unsafe fn dec_ref(obj: ptr::NonNull<Self>) {
> +        // SAFETY: The safety requirements guarantee that the refcount is nonzero.
> +        unsafe { bindings::fput(obj.cast().as_ptr()) }
> +    }

Ok, so this makes me think that from_ptr() requires you to have called
from_fd()/fget() first which would be good.

> +}
> +
> +/// Represents the `EBADF` error code.
> +///
> +/// Used for methods that can only fail with `EBADF`.
> +pub struct BadFdError;
> +
> +impl From<BadFdError> for Error {
> +    fn from(_: BadFdError) -> Error {
> +        EBADF
> +    }
> +}
> +
> +impl core::fmt::Debug for BadFdError {
> +    fn fmt(&self, f: &mut core::fmt::Formatter<'_>) -> core::fmt::Result {
> +        f.pad("EBADF")
> +    }
> +}
> diff --git a/rust/kernel/lib.rs b/rust/kernel/lib.rs
> index e6aff80b521f..ce9abceab784 100644
> --- a/rust/kernel/lib.rs
> +++ b/rust/kernel/lib.rs
> @@ -34,6 +34,7 @@
>  mod allocator;
>  mod build_assert;
>  pub mod error;
> +pub mod file;
>  pub mod init;
>  pub mod ioctl;
>  #[cfg(CONFIG_KUNIT)]
> 
> -- 
> 2.43.0.rc1.413.gea7ed67945-goog
>
Boqun Feng Nov. 29, 2023, 5:08 p.m. UTC | #6
On Wed, Nov 29, 2023 at 04:23:05PM +0100, Peter Zijlstra wrote:
> On Wed, Nov 29, 2023 at 03:13:22PM +0000, Matthew Wilcox wrote:
> 
> > > @@ -157,6 +158,12 @@ void rust_helper_init_work_with_key(struct work_struct *work, work_func_t func,
> > >  }
> > >  EXPORT_SYMBOL_GPL(rust_helper_init_work_with_key);
> > >  
> > > +struct file *rust_helper_get_file(struct file *f)
> > > +{
> > > +	return get_file(f);
> > > +}
> > > +EXPORT_SYMBOL_GPL(rust_helper_get_file);
> > 
> > This is ridiculous.  A function call instead of doing the
> > atomic_long_inc() in Rust?
> 
> Yeah, I complained about something similar a while ago. And recently
> talked to Boqun about this as well,
> 
> Bindgen *could* in theory 'compile' the inline C headers into (unsafe)
> Rust, the immediate problem is that Rust has a wildly different inline
> asm syntax (because Rust needs terrible syntax or whatever).
> 
> Boqun said it should all be fixable, but is a non-trivial amount of
> work.
> 

Right, but TBH, I was only thinking about "inlining" our atomic
primitives back then. The idea is since atomic primitives only have
small body (most of which is asm code), it's relatively easy to
translate that from a C function into a Rust one. And what's left is
translating asm blocks. Things get interesting here:


Originally I think the translation, despite the different syntax, might
be relatively easy, for example, considering smp_store_release() on
ARM64, we are going to translate from

	asm volatile ("stlr %w1, %0"				\
			: "=Q" (*__p)				\
			: "rZ" (*(__u32 *)__u.__c)		\
			: "memory");

to something like:

	asm!("stlr {val}, [{ptr}]",
	     val = in(reg) __u.__c,
	     ptr = in(reg) __p);

, the translation is non-trivial, but it's not that hard, since it's
basically find-and-replace.

But but but, I then realized we have asm goto in C but Rust doesn't
support them, and I haven't thought through how hard tht would be..

Regards,
Boqun
Alice Ryhl Nov. 29, 2023, 9:27 p.m. UTC | #7
Christian Brauner <brauner@kernel.org> writes:
>> This abstraction makes it possible to manipulate the open files for a
>> process. The new `File` struct wraps the C `struct file`. When accessing
>> it using the smart pointer `ARef<File>`, the pointer will own a
>> reference count to the file. When accessing it as `&File`, then the
>> reference does not own a refcount, but the borrow checker will ensure
>> that the reference count does not hit zero while the `&File` is live.
> 
> Could you explain this in more details please? Ideally with some C and
> how that translates to your Rust wrappers, please. Sorry, this is going
> to be a long journey...

Yes of course. This touches on what I think is one of the most important
features that Rust brings to the table, which is the idea of defining
many different pointer types for different ownership semantics.

In the case of `struct file`, there are two pointer types that are
relevant:

 * `&File`. This is an "immutable reference" or "shared reference"
   (both names are used). This pointer type is used when you don't have
   any ownership over the target at all. All you have is _some_ sort of
   guarantee that target stays alive while the reference is live. In
   many cases, the borrow checker helps validate this at compile-time,
   but you can also use a backdoor (in this case from_ptr) to just
   unsafely say "I know this value is valid, so shut up compiler".
   Shared references have no destructor.

 * `ARef<File>`. The `ARef` type is a custom pointer type defined in the
   kernel in `rust/kernel/types.rs`. It represents a pointer that owns a
   ref-count to the inner value. ARef can only be used with types that
   have an `unsafe impl AlwaysRefCounted for T` block. Whenever you
   clone an `ARef`, it calls into the `inc_ref` method that you defined
   for the type, and whenever it goes out of scope and the destructor
   runs, it calls the `dec_ref` method that you defined for the type.

Potentially we might want a third in the future. The third pointer type
could be another custom pointer type just for `struct file` that uses
`fdget` instead of `fget`. However, I haven't added this since I don't
need it (dead code and so on).

To give an example of this, consider this really simple C function:

	bool is_nonblocking(struct file *file) {
		return !!(filp->f_flags & O_NONBLOCK);
	}

What are the ownership semantics of `file`? Well, we don't really care.
The caller needs to somehow ensure that the `file` is valid, but we
don't care if they're doing that with `fdget` or `fget` or whatever.
This corresponds to &File, so the Rust equivalent would be:

	fn is_nonblocking(file: &File) -> bool {
		(file.flags() & O_NONBLOCK) != 0
	}

Another example:

	void set_nonblocking_and_fput(struct file *file) {
		// Let's just ignore the lock for this example.
		file->f_flags |= O_NONBLOCK;

		fput(file);
	}

This method takes a file, sets it to non-blocking, and then destroys the
ref-count. What are the ownership semantics? Well, the caller should own
an `fget` ref-count, and we consume that ref-count. The equivalent Rust
code would be to take an `ARef<File>`:

	fn set_nonblocking_and_fput(file: ARef<File>) {
		file.set_flag(O_NONBLOCK);

		// When `file` goes out of scope here, the destructor
		// runs and calls `fput`. (Since that's what we defined
		// `ARef` to do on drop in `fn dec_ref`.)
	}

You can also explicitly call the destructor with `drop(file)`:

	fn set_nonblocking_and_fput(file: ARef<File>) {
		file.set_flag(O_NONBLOCK);
		drop(file);

		// When `file` goes out of scope, the destructor does
		// *not* run. This is because `drop(file)` is a move
		// (due to the signature of drop), and if you perform a
		// move, then the destructor no longer runs at the end
		// of the scope.
	}

And note that this would not compile, because we give up ownership of
the `ARef` by passing it to `drop`:

	fn set_nonblocking_and_fput(file: ARef<File>) {
		drop(file);
		file.set_flag(O_NONBLOCK);
	}

A third example:

	struct holds_a_file {
		struct file *inner;
	};

	struct file *get_the_file(struct holds_a_file *holder) {
		return holder->inner;
	}

What are the ownership semantics? Well, let's say that `holds_a_file`
owns a refcount to the file. Then, the pointer returned by get_the_file
is valid as long as `holder` is, but it doesn't have any ownership
over the file. You must stop using the returned file pointer before the
holder is destroyed.

The Rust equivalent is:

	struct HoldsAFile {
		inner: ARef<File>,
	}

	fn get_the_file(holder: &HoldsAFile) -> &File {
		&holder.inner
	}

The method signature is short-hand for (see [1]):

	fn get_the_file<'a>(holder: &'a HoldsAFile) -> &'a File {
		&holder.inner
	}

Here, 'a is a lifetime, and it ties together `holder` and the returned
reference in the way I described above. So e.g., this compiles:

	let holder = ...;
	let file = get_the_file(&holder);
	use_the_file(file);

But this doesn't:

	let holder = ...;
	let file = get_the_file(&holder);
	drop(holder);
	use_the_file(file); // Oops, destroying holder calls fput.

Notice also how the compiler accepted `&holder.inner` as the type
`&File` even though `inner` has type `ARef<File>`. This is because
`ARef` is defined to use something called deref coercion, which makes it
act like a real pointer type. This means that if you have an
`ARef<File>`, but you want to call a method that accepts `&File`, then
it will just work. (Deref coercion only exists for conversions into
reference types, so you can't pass a `&File` to something that takes an
`ARef<File>` without explicitly upgrading it to an `ARef<File>` by
taking a ref-count.)

[1]: https://doc.rust-lang.org/reference/lifetime-elision.html

>> +    /// Constructs a new `struct file` wrapper from a file descriptor.
>> +    ///
>> +    /// The file descriptor belongs to the current process.
>> +    pub fn from_fd(fd: u32) -> Result<ARef<Self>, BadFdError> {
>> +        // SAFETY: FFI call, there are no requirements on `fd`.
>> +        let ptr = ptr::NonNull::new(unsafe { bindings::fget(fd) }).ok_or(BadFdError)?;
>> +
>> +        // INVARIANT: `fget` increments the refcount before returning.
>> +        Ok(unsafe { ARef::from_raw(ptr.cast()) })
>> +    }
> 
> I think this is really misnamed.
> 
> File reference counting has two modes. For simplicity let's ignore
> fdget_pos() for now:
> 
> (1) fdget()
>     Return file either with or without an increased reference count.
>     If the fdtable was shared increment reference count, if not don't
>     increment refernce count.
> (2) fget()
>     Always increase refcount.
> 
> Your File implementation currently only deals with (2). And this
> terminology is terribly important as far as I'm concerned. This wants to
> be fget() and not from_fd(). The latter tells me nothing. I feel we
> really need to try and mirror the current naming closely. Not
> religiously ofc but core stuff such as this really benefits from having
> an almost 1:1 mapping between C names and Rust names, I think.
> Especially in the beginning.

Sure, I'll rename these methods in the next version.

>> +    /// Creates a reference to a [`File`] from a valid pointer.
>> +    ///
>> +    /// # Safety
>> +    ///
>> +    /// The caller must ensure that `ptr` points at a valid file and that its refcount does not
>> +    /// reach zero during the lifetime 'a.
>> +    pub unsafe fn from_ptr<'a>(ptr: *const bindings::file) -> &'a File {
>> +        // INVARIANT: The safety requirements guarantee that the refcount does not hit zero during
>> +        // 'a. The cast is okay because `File` is `repr(transparent)`.
>> +        unsafe { &*ptr.cast() }
>> +    }
> 
> How does that work and what is this used for? It's required that a
> caller has called from_fd()/fget() first before from_ptr() can be used?
> 
> Can you show how this would be used in an example, please? Unless you
> hold file_lock it is now invalid to access fields in struct file just
> with rcu lock held for example. Which is why returning a pointer without
> holding a reference seems dodgy. I'm probably just missing context.

This is the backdoor. You use it when *you* know that the file is okay
to access, but Rust doesn't. It's unsafe because it's not checked by
Rust.

For example you could do this:

	let ptr = unsafe { bindings::fdget(fd) };

	// SAFETY: We just called `fdget`.
	let file = unsafe { File::from_ptr(ptr) };
	use_file(file);

	// SAFETY: We're not using `file` after this call.
	unsafe { bindings::fdput(ptr) };

It's used in Binder here:
https://github.com/Darksonn/linux/blob/dca45e6c7848e024709b165a306cdbe88e5b086a/drivers/android/rust_binder.rs#L331-L332

Basically, I use it to say "C code has called fdget for us so it's okay
to access the file", whenever userspace uses a syscall to call into the
driver.

>> +// SAFETY: The type invariants guarantee that `File` is always ref-counted.
>> +unsafe impl AlwaysRefCounted for File {
>> +    fn inc_ref(&self) {
>> +        // SAFETY: The existence of a shared reference means that the refcount is nonzero.
>> +        unsafe { bindings::get_file(self.0.get()) };
>> +    }
> 
> Why inc_ref() and not just get_file()?

Whenever you see an impl block that uses the keyword "for", then the
code is implementing a trait. In this case, the trait being implemented
is AlwaysRefCounted, which allows File to work with ARef.

It has to be `inc_ref` because that's what AlwaysRefCounted calls this
method.

>> +    unsafe fn dec_ref(obj: ptr::NonNull<Self>) {
>> +        // SAFETY: The safety requirements guarantee that the refcount is nonzero.
>> +        unsafe { bindings::fput(obj.cast().as_ptr()) }
>> +    }
> 
> Ok, so this makes me think that from_ptr() requires you to have called
> from_fd()/fget() first which would be good.

Actually, `from_ptr` has nothing to do with this. The above code only
applies to code that uses the `ARef` pointer type, but `from_ptr` uses
the `&File` pointer type instead.

>> +    /// Returns the flags associated with the file.
>> +    ///
>> +    /// The flags are a combination of the constants in [`flags`].
>> +    pub fn flags(&self) -> u32 {
>> +        // This `read_volatile` is intended to correspond to a READ_ONCE call.
>> +        //
>> +        // SAFETY: The file is valid because the shared reference guarantees a nonzero refcount.
> 
> I really need to understand what you mean by shared reference. At least
> in the current C implementation you can't share a reference without
> another task as the other task might fput() behind you and then you're
> hosed. That's why we have the fdget() logic.

By "shared reference" I just mean an `&File`. They're called shared
because there could be other pointers to the same object elsewhere in
the program, and not because we have explicitly shared it ourselves.

Rust's other type of reference `&mut T` is called a "mutable reference"
or "exclusive reference". Like with `&T`, both names are used.

> > +// SAFETY: It's OK to access `File` through shared references from other threads because we're
> > +// either accessing properties that don't change or that are properly synchronised by C code.
> 
> Uhm, what guarantees are you talking about specifically, please?
> Examples would help.
> 
> > +unsafe impl Sync for File {}

The Sync trait defines whether a value may be accessed from several
threads in parallel (using shared/immutable references). In our case,
every method on `File` that accepts a `&File` is okay to be called in
parallel from several threads, so it's okay for `File` to implement
`Sync`.

I'm actually making a statement about the rest of the Rust code in this
file here. If I added a method that took `&File`, but couldn't be called
in parallel, then `File` could no longer be `Sync`.



I hope that helps, and let me know if you have any other questions.

Alice
Benno Lossin Nov. 29, 2023, 11:17 p.m. UTC | #8
On 29.11.23 22:27, Alice Ryhl wrote:
> Another example:
> 
> 	void set_nonblocking_and_fput(struct file *file) {
> 		// Let's just ignore the lock for this example.
> 		file->f_flags |= O_NONBLOCK;
> 
> 		fput(file);
> 	}
> 
> This method takes a file, sets it to non-blocking, and then destroys the
> ref-count. What are the ownership semantics? Well, the caller should own
> an `fget` ref-count, and we consume that ref-count. The equivalent Rust
> code would be to take an `ARef<File>`:
> 
> 	fn set_nonblocking_and_fput(file: ARef<File>) {
> 		file.set_flag(O_NONBLOCK);
> 
> 		// When `file` goes out of scope here, the destructor
> 		// runs and calls `fput`. (Since that's what we defined
> 		// `ARef` to do on drop in `fn dec_ref`.)
> 	}
> 
> You can also explicitly call the destructor with `drop(file)`:
> 
> 	fn set_nonblocking_and_fput(file: ARef<File>) {
> 		file.set_flag(O_NONBLOCK);
> 		drop(file);
> 
> 		// When `file` goes out of scope, the destructor does
> 		// *not* run. This is because `drop(file)` is a move
> 		// (due to the signature of drop), and if you perform a
> 		// move, then the destructor no longer runs at the end
> 		// of the scope.

I want to note that while the destructor does not run at the end of the
scope, it still *does* run: the `drop(file)` call runs the destructor.

> 	}
> 
> And note that this would not compile, because we give up ownership of
> the `ARef` by passing it to `drop`:
> 
> 	fn set_nonblocking_and_fput(file: ARef<File>) {
> 		drop(file);
> 		file.set_flag(O_NONBLOCK);
> 	}
>

[...]

>>> +// SAFETY: The type invariants guarantee that `File` is always ref-counted.
>>> +unsafe impl AlwaysRefCounted for File {
>>> +    fn inc_ref(&self) {
>>> +        // SAFETY: The existence of a shared reference means that the refcount is nonzero.
>>> +        unsafe { bindings::get_file(self.0.get()) };
>>> +    }
>>
>> Why inc_ref() and not just get_file()?
> 
> Whenever you see an impl block that uses the keyword "for", then the
> code is implementing a trait. In this case, the trait being implemented
> is AlwaysRefCounted, which allows File to work with ARef.
> 
> It has to be `inc_ref` because that's what AlwaysRefCounted calls this
> method.

I am not sure if the Rust term "trait" is well-known, so for a bit more
context, I am quoting the Rust Book [1]:

    A *trait* defines functionality a particular type has and can share
    with other types. We can use traits to define shared behavior in an
    abstract way. We can use *trait bounds* to specify that a generic type
    can be any type that has certain behavior.

[1]: https://doc.rust-lang.org/book/ch10-02-traits.html

We have created an abstraction over reference counting:
the trait `AlwaysRefCounted` and the struct `ARef<T>` where `T`
implements `AlwaysRefCounted`.
As Alice already explained, `ARef<T>` is a pointer that owns a refcount
on the object. Because `ARef<T>` needs to know how to increment and
decrement that counter. For example, when you want to create another
`ARef<T>` you can `clone()` it and therefore `ARef<T>` needs to
increment the refcount. And when you drop it, `ARef<T>` needs to
decrement it.
The "`ARef<T>` knows how to inc/dec the refcount" part is done by the
`AlwaysRefCounted` trait. And there we chose to name the functions
`inc_ref` and `dec_ref`, since these are the *general*/*abstract*
operations and not any specific refcount adjustment.



Hope that also helped and did not create confusion.

--
Cheers,
Benno
Peter Zijlstra Nov. 30, 2023, 10:42 a.m. UTC | #9
On Wed, Nov 29, 2023 at 09:08:14AM -0800, Boqun Feng wrote:

> But but but, I then realized we have asm goto in C but Rust doesn't
> support them, and I haven't thought through how hard tht would be..

You're kidding right?

I thought we *finally* deprecated all compilers that didn't support
asm-goto and x86 now mandates asm-goto to build, and then this toy
language comes around ?

What a load of crap ...
Christian Brauner Nov. 30, 2023, 10:48 a.m. UTC | #10
On Wed, Nov 29, 2023 at 09:27:07PM +0000, Alice Ryhl wrote:
> Christian Brauner <brauner@kernel.org> writes:
> >> This abstraction makes it possible to manipulate the open files for a
> >> process. The new `File` struct wraps the C `struct file`. When accessing
> >> it using the smart pointer `ARef<File>`, the pointer will own a
> >> reference count to the file. When accessing it as `&File`, then the
> >> reference does not own a refcount, but the borrow checker will ensure
> >> that the reference count does not hit zero while the `&File` is live.
> > 
> > Could you explain this in more details please? Ideally with some C and
> > how that translates to your Rust wrappers, please. Sorry, this is going
> > to be a long journey...
> 
> Yes of course. This touches on what I think is one of the most important

Thanks for all the background.

> features that Rust brings to the table, which is the idea of defining
> many different pointer types for different ownership semantics.
> 
> In the case of `struct file`, there are two pointer types that are
> relevant:
> 
>  * `&File`. This is an "immutable reference" or "shared reference"
>    (both names are used). This pointer type is used when you don't have
>    any ownership over the target at all. All you have is _some_ sort of
>    guarantee that target stays alive while the reference is live. In
>    many cases, the borrow checker helps validate this at compile-time,
>    but you can also use a backdoor (in this case from_ptr) to just
>    unsafely say "I know this value is valid, so shut up compiler".
>    Shared references have no destructor.
> 
>  * `ARef<File>`. The `ARef` type is a custom pointer type defined in the
>    kernel in `rust/kernel/types.rs`. It represents a pointer that owns a
>    ref-count to the inner value. ARef can only be used with types that
>    have an `unsafe impl AlwaysRefCounted for T` block. Whenever you
>    clone an `ARef`, it calls into the `inc_ref` method that you defined
>    for the type, and whenever it goes out of scope and the destructor
>    runs, it calls the `dec_ref` method that you defined for the type.
> 
> Potentially we might want a third in the future. The third pointer type
> could be another custom pointer type just for `struct file` that uses
> `fdget` instead of `fget`. However, I haven't added this since I don't
> need it (dead code and so on).
> 
> To give an example of this, consider this really simple C function:
> 
> 	bool is_nonblocking(struct file *file) {
> 		return !!(filp->f_flags & O_NONBLOCK);
> 	}
> 
> What are the ownership semantics of `file`? Well, we don't really care.
> The caller needs to somehow ensure that the `file` is valid, but we
> don't care if they're doing that with `fdget` or `fget` or whatever.
> This corresponds to &File, so the Rust equivalent would be:
> 
> 	fn is_nonblocking(file: &File) -> bool {
> 		(file.flags() & O_NONBLOCK) != 0
> 	}
> 
> Another example:
> 
> 	void set_nonblocking_and_fput(struct file *file) {
> 		// Let's just ignore the lock for this example.
> 		file->f_flags |= O_NONBLOCK;
> 
> 		fput(file);
> 	}
> 
> This method takes a file, sets it to non-blocking, and then destroys the
> ref-count. What are the ownership semantics? Well, the caller should own
> an `fget` ref-count, and we consume that ref-count. The equivalent Rust
> code would be to take an `ARef<File>`:
> 
> 	fn set_nonblocking_and_fput(file: ARef<File>) {
> 		file.set_flag(O_NONBLOCK);
> 
> 		// When `file` goes out of scope here, the destructor
> 		// runs and calls `fput`. (Since that's what we defined
> 		// `ARef` to do on drop in `fn dec_ref`.)
> 	}
> 
> You can also explicitly call the destructor with `drop(file)`:
> 
> 	fn set_nonblocking_and_fput(file: ARef<File>) {
> 		file.set_flag(O_NONBLOCK);
> 		drop(file);
> 
> 		// When `file` goes out of scope, the destructor does
> 		// *not* run. This is because `drop(file)` is a move
> 		// (due to the signature of drop), and if you perform a
> 		// move, then the destructor no longer runs at the end
> 		// of the scope.
> 	}
> 
> And note that this would not compile, because we give up ownership of
> the `ARef` by passing it to `drop`:
> 
> 	fn set_nonblocking_and_fput(file: ARef<File>) {
> 		drop(file);
> 		file.set_flag(O_NONBLOCK);
> 	}
> 
> A third example:
> 
> 	struct holds_a_file {
> 		struct file *inner;
> 	};
> 
> 	struct file *get_the_file(struct holds_a_file *holder) {
> 		return holder->inner;
> 	}
> 
> What are the ownership semantics? Well, let's say that `holds_a_file`
> owns a refcount to the file. Then, the pointer returned by get_the_file
> is valid as long as `holder` is, but it doesn't have any ownership
> over the file. You must stop using the returned file pointer before the
> holder is destroyed.
> 
> The Rust equivalent is:
> 
> 	struct HoldsAFile {
> 		inner: ARef<File>,
> 	}
> 
> 	fn get_the_file(holder: &HoldsAFile) -> &File {
> 		&holder.inner
> 	}
> 
> The method signature is short-hand for (see [1]):
> 
> 	fn get_the_file<'a>(holder: &'a HoldsAFile) -> &'a File {

Whenever you implement something like this - at least for fs/vfs
wrappers - I would ask you to please annotate the lifetimes with
comments. I've done a decent amount of (userspace) Rust
https://github.com/brauner/rlxc but the syntax isn't second nature to me
and I expect there to be quite a few other developers/maintainers that
aren't familiar.

> 		&holder.inner
> 	}
> 
> Here, 'a is a lifetime, and it ties together `holder` and the returned

The lieftime of the file is bound to the lifetime of the holder, ok.

> reference in the way I described above. So e.g., this compiles:
> 
> 	let holder = ...;
> 	let file = get_the_file(&holder);
> 	use_the_file(file);
> 
> But this doesn't:
> 
> 	let holder = ...;
> 	let file = get_the_file(&holder);
> 	drop(holder);
> 	use_the_file(file); // Oops, destroying holder calls fput.
> 
> Notice also how the compiler accepted `&holder.inner` as the type
> `&File` even though `inner` has type `ARef<File>`. This is because
> `ARef` is defined to use something called deref coercion, which makes it
> act like a real pointer type. This means that if you have an
> `ARef<File>`, but you want to call a method that accepts `&File`, then
> it will just work. (Deref coercion only exists for conversions into
> reference types, so you can't pass a `&File` to something that takes an
> `ARef<File>` without explicitly upgrading it to an `ARef<File>` by
> taking a ref-count.)
> 
> [1]: https://doc.rust-lang.org/reference/lifetime-elision.html
> 
> >> +    /// Constructs a new `struct file` wrapper from a file descriptor.
> >> +    ///
> >> +    /// The file descriptor belongs to the current process.
> >> +    pub fn from_fd(fd: u32) -> Result<ARef<Self>, BadFdError> {
> >> +        // SAFETY: FFI call, there are no requirements on `fd`.
> >> +        let ptr = ptr::NonNull::new(unsafe { bindings::fget(fd) }).ok_or(BadFdError)?;
> >> +
> >> +        // INVARIANT: `fget` increments the refcount before returning.
> >> +        Ok(unsafe { ARef::from_raw(ptr.cast()) })
> >> +    }
> > 
> > I think this is really misnamed.
> > 
> > File reference counting has two modes. For simplicity let's ignore
> > fdget_pos() for now:
> > 
> > (1) fdget()
> >     Return file either with or without an increased reference count.
> >     If the fdtable was shared increment reference count, if not don't
> >     increment refernce count.
> > (2) fget()
> >     Always increase refcount.
> > 
> > Your File implementation currently only deals with (2). And this
> > terminology is terribly important as far as I'm concerned. This wants to
> > be fget() and not from_fd(). The latter tells me nothing. I feel we
> > really need to try and mirror the current naming closely. Not
> > religiously ofc but core stuff such as this really benefits from having
> > an almost 1:1 mapping between C names and Rust names, I think.
> > Especially in the beginning.
> 
> Sure, I'll rename these methods in the next version.
> 
> >> +    /// Creates a reference to a [`File`] from a valid pointer.
> >> +    ///
> >> +    /// # Safety
> >> +    ///
> >> +    /// The caller must ensure that `ptr` points at a valid file and that its refcount does not
> >> +    /// reach zero during the lifetime 'a.
> >> +    pub unsafe fn from_ptr<'a>(ptr: *const bindings::file) -> &'a File {
> >> +        // INVARIANT: The safety requirements guarantee that the refcount does not hit zero during
> >> +        // 'a. The cast is okay because `File` is `repr(transparent)`.
> >> +        unsafe { &*ptr.cast() }
> >> +    }
> > 
> > How does that work and what is this used for? It's required that a
> > caller has called from_fd()/fget() first before from_ptr() can be used?
> > 
> > Can you show how this would be used in an example, please? Unless you
> > hold file_lock it is now invalid to access fields in struct file just
> > with rcu lock held for example. Which is why returning a pointer without
> > holding a reference seems dodgy. I'm probably just missing context.
> 
> This is the backdoor. You use it when *you* know that the file is okay

And a huge one.

> to access, but Rust doesn't. It's unsafe because it's not checked by
> Rust.
> 
> For example you could do this:
> 
> 	let ptr = unsafe { bindings::fdget(fd) };
> 
> 	// SAFETY: We just called `fdget`.
> 	let file = unsafe { File::from_ptr(ptr) };
> 	use_file(file);
> 
> 	// SAFETY: We're not using `file` after this call.
> 	unsafe { bindings::fdput(ptr) };
> 
> It's used in Binder here:
> https://github.com/Darksonn/linux/blob/dca45e6c7848e024709b165a306cdbe88e5b086a/drivers/android/rust_binder.rs#L331-L332
> 
> Basically, I use it to say "C code has called fdget for us so it's okay
> to access the file", whenever userspace uses a syscall to call into the
> driver.

Yeah, ok, because the fd you're operating on may be coming from fdget(). Iirc,
binder is almost by default used multi-threaded with a shared file descriptor
table? But while that means fdget() will usually bump the reference count you
can't be sure. Hmkay.

> 
> >> +// SAFETY: The type invariants guarantee that `File` is always ref-counted.
> >> +unsafe impl AlwaysRefCounted for File {
> >> +    fn inc_ref(&self) {
> >> +        // SAFETY: The existence of a shared reference means that the refcount is nonzero.
> >> +        unsafe { bindings::get_file(self.0.get()) };
> >> +    }
> > 
> > Why inc_ref() and not just get_file()?
> 
> Whenever you see an impl block that uses the keyword "for", then the
> code is implementing a trait. In this case, the trait being implemented
> is AlwaysRefCounted, which allows File to work with ARef.

Ah, right. Thanks.

> 
> It has to be `inc_ref` because that's what AlwaysRefCounted calls this
> method.
> 
> >> +    unsafe fn dec_ref(obj: ptr::NonNull<Self>) {
> >> +        // SAFETY: The safety requirements guarantee that the refcount is nonzero.
> >> +        unsafe { bindings::fput(obj.cast().as_ptr()) }
> >> +    }
> > 
> > Ok, so this makes me think that from_ptr() requires you to have called
> > from_fd()/fget() first which would be good.
> 
> Actually, `from_ptr` has nothing to do with this. The above code only
> applies to code that uses the `ARef` pointer type, but `from_ptr` uses
> the `&File` pointer type instead.
> 
> >> +    /// Returns the flags associated with the file.
> >> +    ///
> >> +    /// The flags are a combination of the constants in [`flags`].
> >> +    pub fn flags(&self) -> u32 {
> >> +        // This `read_volatile` is intended to correspond to a READ_ONCE call.
> >> +        //
> >> +        // SAFETY: The file is valid because the shared reference guarantees a nonzero refcount.
> > 
> > I really need to understand what you mean by shared reference. At least
> > in the current C implementation you can't share a reference without
> > another task as the other task might fput() behind you and then you're
> > hosed. That's why we have the fdget() logic.
> 
> By "shared reference" I just mean an `&File`. They're called shared
> because there could be other pointers to the same object elsewhere in
> the program, and not because we have explicitly shared it ourselves.

Ok, that was confusing to me because I wasn't sure whether you were talking
about sharing an ->f_count reference.

> 
> Rust's other type of reference `&mut T` is called a "mutable reference"
> or "exclusive reference". Like with `&T`, both names are used.
> 
> > > +// SAFETY: It's OK to access `File` through shared references from other threads because we're
> > > +// either accessing properties that don't change or that are properly synchronised by C code.
> > 
> > Uhm, what guarantees are you talking about specifically, please?
> > Examples would help.
> > 
> > > +unsafe impl Sync for File {}
> 
> The Sync trait defines whether a value may be accessed from several
> threads in parallel (using shared/immutable references). In our case,

So let me put this into my own words and you correct me, please:

So, this really just means that if I have two processes both with their own
fdtable and they happen to hold fds that refer to the same @file:

P1				P2
struct fd fd1 = fdget(1234);
                                 struct fd fd2 = fdget(5678);
if (!fd1.file)                   if (!fd2.file)
	return -EBADF;                 return -EBADF;

// fd1.file == fd2.file

the only if the Sync trait is implemented both P1 and P2 can in parallel call
file->f_op->poll(@file)?

So if the Sync trait isn't implemented then the compiler will prohibit that P1
and P2 at the same time call file->f_op->poll(@file)? And that's all that's
meant by a shared reference? It's really about sharing the pointer.

The thing is that "shared reference" gets a bit in our way here:

(1) If you have SCM_RIGHTs in the mix then P1 can open fd1 to @file and then
    send that @file to P2 which now has fd2 refering to @file as well. The
    @file->f_count is bumped in that process. So @file->f_count is now 2.

    Now both P1 and P2 call fdget(). Since they don't have a shared fdtable
    none of them take an additional reference to @file. IOW, @file->f_count
    may remain 2 all throughout the @file->f_op->*() operation.

    So they share a reference to that file and elide both the
    atomic_inc_not_zero() and the atomic_dec_not_zero().

(2) io_uring has fixed files whose reference count always stays at 1.
    So all io_uring operations on such fixed files share a single reference.

So that's why this is a bit confusing at first to read "shared reference".

Please add a comment on top of unsafe impl Sync for File {}
explaining/clarifying this a little that it's about calling methods on the same
file.

> every method on `File` that accepts a `&File` is okay to be called in
> parallel from several threads, so it's okay for `File` to implement
> `Sync`.
> 
> I'm actually making a statement about the rest of the Rust code in this
> file here. If I added a method that took `&File`, but couldn't be called
> in parallel, then `File` could no longer be `Sync`.
Alice Ryhl Nov. 30, 2023, 12:10 p.m. UTC | #11
Christian Brauner <brauner@kernel.org> writes:
>> This is the backdoor. You use it when *you* know that the file is okay
> 
> And a huge one.
> 
>> to access, but Rust doesn't. It's unsafe because it's not checked by
>> Rust.
>> 
>> For example you could do this:
>> 
>> 	let ptr = unsafe { bindings::fdget(fd) };
>> 
>> 	// SAFETY: We just called `fdget`.
>> 	let file = unsafe { File::from_ptr(ptr) };
>> 	use_file(file);
>> 
>> 	// SAFETY: We're not using `file` after this call.
>> 	unsafe { bindings::fdput(ptr) };
>> 
>> It's used in Binder here:
>> https://github.com/Darksonn/linux/blob/dca45e6c7848e024709b165a306cdbe88e5b086a/drivers/android/rust_binder.rs#L331-L332
>> 
>> Basically, I use it to say "C code has called fdget for us so it's okay
>> to access the file", whenever userspace uses a syscall to call into the
>> driver.
> 
> Yeah, ok, because the fd you're operating on may be coming from fdget(). Iirc,
> binder is almost by default used multi-threaded with a shared file descriptor
> table? But while that means fdget() will usually bump the reference count you
> can't be sure. Hmkay.

Even if the syscall used `fget` instead of `fdget`, I would still be
using `from_ptr` here. The `ARef` type only really makes sense when *we*
have ownership of the ref-count, but in this case we don't own it. We're
just given a promise that the caller is keeping it alive for us using
some mechanism or another.

>>>> +// SAFETY: It's OK to access `File` through shared references from other threads because we're
>>>> +// either accessing properties that don't change or that are properly synchronised by C code.
>>> 
>>> Uhm, what guarantees are you talking about specifically, please?
>>> Examples would help.
>>> 
>>>> +unsafe impl Sync for File {}
>> 
>> The Sync trait defines whether a value may be accessed from several
>> threads in parallel (using shared/immutable references). In our case,
> 
> So let me put this into my own words and you correct me, please:
> 
> So, this really just means that if I have two processes both with their own
> fdtable and they happen to hold fds that refer to the same @file:
> 
> P1				P2
> struct fd fd1 = fdget(1234);
>                                  struct fd fd2 = fdget(5678);
> if (!fd1.file)                   if (!fd2.file)
> 	return -EBADF;                 return -EBADF;
> 
> // fd1.file == fd2.file
> 
> the only if the Sync trait is implemented both P1 and P2 can in parallel call
> file->f_op->poll(@file)?
> 
> So if the Sync trait isn't implemented then the compiler will prohibit that P1
> and P2 at the same time call file->f_op->poll(@file)? And that's all that's
> meant by a shared reference? It's really about sharing the pointer.

Yeah, what you're saying sounds correct. For a type that is not Sync,
you would need a lock around the call to `poll` before the compiler
would accept the call.

(Or some other mechanism to convince the compiler that no other thread
is looking at the file at the same time. Of course, a lock is just one
way to do that.)

> The thing is that "shared reference" gets a bit in our way here:
> 
> (1) If you have SCM_RIGHTs in the mix then P1 can open fd1 to @file and then
>     send that @file to P2 which now has fd2 refering to @file as well. The
>     @file->f_count is bumped in that process. So @file->f_count is now 2.
> 
>     Now both P1 and P2 call fdget(). Since they don't have a shared fdtable
>     none of them take an additional reference to @file. IOW, @file->f_count
>     may remain 2 all throughout the @file->f_op->*() operation.
> 
>     So they share a reference to that file and elide both the
>     atomic_inc_not_zero() and the atomic_dec_not_zero().
> 
> (2) io_uring has fixed files whose reference count always stays at 1.
>     So all io_uring operations on such fixed files share a single reference.
> 
> So that's why this is a bit confusing at first to read "shared reference".
> 
> Please add a comment on top of unsafe impl Sync for File {}
> explaining/clarifying this a little that it's about calling methods on the same
> file.

Yeah, I agree, the terminology gets a bit mixed up here because we both
use the word "reference" for different things.

How about this comment?

/// All methods defined on `File` that take `&self` are safe to call even if
/// other threads are concurrently accessing the same `struct file`, because
/// those methods either access immutable properties or have proper
/// synchronization to ensure that such accesses are safe.

Note: Here, I say "take &self" to refer to methods with &self in the
signature. This signature means that you pass a &File to the method when
you call it.

Alice
Christian Brauner Nov. 30, 2023, 12:36 p.m. UTC | #12
On Thu, Nov 30, 2023 at 12:10:12PM +0000, Alice Ryhl wrote:
> Christian Brauner <brauner@kernel.org> writes:
> >> This is the backdoor. You use it when *you* know that the file is okay
> > 
> > And a huge one.
> > 
> >> to access, but Rust doesn't. It's unsafe because it's not checked by
> >> Rust.
> >> 
> >> For example you could do this:
> >> 
> >> 	let ptr = unsafe { bindings::fdget(fd) };
> >> 
> >> 	// SAFETY: We just called `fdget`.
> >> 	let file = unsafe { File::from_ptr(ptr) };
> >> 	use_file(file);
> >> 
> >> 	// SAFETY: We're not using `file` after this call.
> >> 	unsafe { bindings::fdput(ptr) };
> >> 
> >> It's used in Binder here:
> >> https://github.com/Darksonn/linux/blob/dca45e6c7848e024709b165a306cdbe88e5b086a/drivers/android/rust_binder.rs#L331-L332
> >> 
> >> Basically, I use it to say "C code has called fdget for us so it's okay
> >> to access the file", whenever userspace uses a syscall to call into the
> >> driver.
> > 
> > Yeah, ok, because the fd you're operating on may be coming from fdget(). Iirc,
> > binder is almost by default used multi-threaded with a shared file descriptor
> > table? But while that means fdget() will usually bump the reference count you
> > can't be sure. Hmkay.
> 
> Even if the syscall used `fget` instead of `fdget`, I would still be
> using `from_ptr` here. The `ARef` type only really makes sense when *we*
> have ownership of the ref-count, but in this case we don't own it. We're
> just given a promise that the caller is keeping it alive for us using
> some mechanism or another.
> 
> >>>> +// SAFETY: It's OK to access `File` through shared references from other threads because we're
> >>>> +// either accessing properties that don't change or that are properly synchronised by C code.
> >>> 
> >>> Uhm, what guarantees are you talking about specifically, please?
> >>> Examples would help.
> >>> 
> >>>> +unsafe impl Sync for File {}
> >> 
> >> The Sync trait defines whether a value may be accessed from several
> >> threads in parallel (using shared/immutable references). In our case,
> > 
> > So let me put this into my own words and you correct me, please:
> > 
> > So, this really just means that if I have two processes both with their own
> > fdtable and they happen to hold fds that refer to the same @file:
> > 
> > P1				P2
> > struct fd fd1 = fdget(1234);
> >                                  struct fd fd2 = fdget(5678);
> > if (!fd1.file)                   if (!fd2.file)
> > 	return -EBADF;                 return -EBADF;
> > 
> > // fd1.file == fd2.file
> > 
> > the only if the Sync trait is implemented both P1 and P2 can in parallel call
> > file->f_op->poll(@file)?
> > 
> > So if the Sync trait isn't implemented then the compiler will prohibit that P1
> > and P2 at the same time call file->f_op->poll(@file)? And that's all that's
> > meant by a shared reference? It's really about sharing the pointer.
> 
> Yeah, what you're saying sounds correct. For a type that is not Sync,
> you would need a lock around the call to `poll` before the compiler
> would accept the call.
> 
> (Or some other mechanism to convince the compiler that no other thread
> is looking at the file at the same time. Of course, a lock is just one
> way to do that.)
> 
> > The thing is that "shared reference" gets a bit in our way here:
> > 
> > (1) If you have SCM_RIGHTs in the mix then P1 can open fd1 to @file and then
> >     send that @file to P2 which now has fd2 refering to @file as well. The
> >     @file->f_count is bumped in that process. So @file->f_count is now 2.
> > 
> >     Now both P1 and P2 call fdget(). Since they don't have a shared fdtable
> >     none of them take an additional reference to @file. IOW, @file->f_count
> >     may remain 2 all throughout the @file->f_op->*() operation.
> > 
> >     So they share a reference to that file and elide both the
> >     atomic_inc_not_zero() and the atomic_dec_not_zero().
> > 
> > (2) io_uring has fixed files whose reference count always stays at 1.
> >     So all io_uring operations on such fixed files share a single reference.
> > 
> > So that's why this is a bit confusing at first to read "shared reference".
> > 
> > Please add a comment on top of unsafe impl Sync for File {}
> > explaining/clarifying this a little that it's about calling methods on the same
> > file.
> 
> Yeah, I agree, the terminology gets a bit mixed up here because we both
> use the word "reference" for different things.

> 
> How about this comment?

Sounds good.
Benno Lossin Nov. 30, 2023, 2:53 p.m. UTC | #13
On 11/29/23 13:51, Alice Ryhl wrote:
> +/// Flags associated with a [`File`].
> +pub mod flags {
> +    /// File is opened in append mode.
> +    pub const O_APPEND: u32 = bindings::O_APPEND;

Why do all of these constants begin with `O_`?

[...]

> +impl File {
> +    /// Constructs a new `struct file` wrapper from a file descriptor.
> +    ///
> +    /// The file descriptor belongs to the current process.
> +    pub fn from_fd(fd: u32) -> Result<ARef<Self>, BadFdError> {
> +        // SAFETY: FFI call, there are no requirements on `fd`.
> +        let ptr = ptr::NonNull::new(unsafe { bindings::fget(fd) }).ok_or(BadFdError)?;
> +
> +        // INVARIANT: `fget` increments the refcount before returning.
> +        Ok(unsafe { ARef::from_raw(ptr.cast()) })

Missing `SAFETY` comment.

> +    }
> +
> +    /// Creates a reference to a [`File`] from a valid pointer.
> +    ///
> +    /// # Safety
> +    ///
> +    /// The caller must ensure that `ptr` points at a valid file and that its refcount does not
> +    /// reach zero during the lifetime 'a.
> +    pub unsafe fn from_ptr<'a>(ptr: *const bindings::file) -> &'a File {
> +        // INVARIANT: The safety requirements guarantee that the refcount does not hit zero during
> +        // 'a. The cast is okay because `File` is `repr(transparent)`.
> +        unsafe { &*ptr.cast() }

Missing `SAFETY` comment.

> +    }
> +
> +    /// Returns the flags associated with the file.
> +    ///
> +    /// The flags are a combination of the constants in [`flags`].
> +    pub fn flags(&self) -> u32 {
> +        // This `read_volatile` is intended to correspond to a READ_ONCE call.
> +        //
> +        // SAFETY: The file is valid because the shared reference guarantees a nonzero refcount.
> +        //
> +        // TODO: Replace with `read_once` when available on the Rust side.
> +        unsafe { core::ptr::addr_of!((*self.0.get()).f_flags).read_volatile() }
> +    }
> +}
> +
> +// SAFETY: The type invariants guarantee that `File` is always ref-counted.
> +unsafe impl AlwaysRefCounted for File {
> +    fn inc_ref(&self) {
> +        // SAFETY: The existence of a shared reference means that the refcount is nonzero.
> +        unsafe { bindings::get_file(self.0.get()) };
> +    }
> +
> +    unsafe fn dec_ref(obj: ptr::NonNull<Self>) {
> +        // SAFETY: The safety requirements guarantee that the refcount is nonzero.
> +        unsafe { bindings::fput(obj.cast().as_ptr()) }
> +    }
> +}
> +
> +/// Represents the `EBADF` error code.
> +///
> +/// Used for methods that can only fail with `EBADF`.
> +pub struct BadFdError;
> +
> +impl From<BadFdError> for Error {
> +    fn from(_: BadFdError) -> Error {
> +        EBADF
> +    }
> +}
> +
> +impl core::fmt::Debug for BadFdError {
> +    fn fmt(&self, f: &mut core::fmt::Formatter<'_>) -> core::fmt::Result {
> +        f.pad("EBADF")
> +    }
> +}

Do we want to generalize this to the other errors as well? We could modify
the `declare_error!` macro in `error.rs` to create these unit structs.
Greg KH Nov. 30, 2023, 2:59 p.m. UTC | #14
On Thu, Nov 30, 2023 at 02:53:35PM +0000, Benno Lossin wrote:
> On 11/29/23 13:51, Alice Ryhl wrote:
> > +/// Flags associated with a [`File`].
> > +pub mod flags {
> > +    /// File is opened in append mode.
> > +    pub const O_APPEND: u32 = bindings::O_APPEND;
> 
> Why do all of these constants begin with `O_`?

Because that is how they are defined in the kernel in the C code.  Why
would they not be the same here?

thanks,

greg k-h
Benno Lossin Nov. 30, 2023, 3:02 p.m. UTC | #15
On 11/29/23 16:13, Matthew Wilcox wrote:
> On Wed, Nov 29, 2023 at 12:51:07PM +0000, Alice Ryhl wrote:
>> This introduces a struct for the EBADF error type, rather than just
>> using the Error type directly. This has two advantages:
>> * `File::from_fd` returns a `Result<ARef<File>, BadFdError>`, which the
>>   compiler will represent as a single pointer, with null being an error.
>>   This is possible because the compiler understands that `BadFdError`
>>   has only one possible value, and it also understands that the
>>   `ARef<File>` smart pointer is guaranteed non-null.
>> * Additionally, we promise to users of the method that the method can
>>   only fail with EBADF, which means that they can rely on this promise
>>   without having to inspect its implementation.
>> That said, there are also two disadvantages:
>> * Defining additional error types involves boilerplate.
>> * The question mark operator will only utilize the `From` trait once,
>>   which prevents you from using the question mark operator on
>>   `BadFdError` in methods that return some third error type that the
>>   kernel `Error` is convertible into. (However, it works fine in methods
>>   that return `Error`.)
> 
> I haven't looked at how Rust-for-Linux handles errors yet, but it's
> disappointing to see that it doesn't do something like the PTR_ERR /
> ERR_PTR / IS_ERR C thing under the hood.

In this case we are actually doing that: `ARef<T>` is a non-null pointer
to a `T` and since `BadFdError` is a unit struct (i.e. there exists only
a single value it can take) `Result<ARef<T>, BadFdError>` has the same
size as a pointer. This is because the Rust compiler represents the
`Err` variant as null.

We also do have support for `ERR_PTR`, but that requires `unsafe`, since
we do not know which kind of pointer the C side returned (was it an
`ARef<T>`, `&mut T`, `&T` etc.?) and can therefore only support `*mut T`.
Boqun Feng Nov. 30, 2023, 3:25 p.m. UTC | #16
On Thu, Nov 30, 2023 at 11:42:26AM +0100, Peter Zijlstra wrote:
> On Wed, Nov 29, 2023 at 09:08:14AM -0800, Boqun Feng wrote:
> 
> > But but but, I then realized we have asm goto in C but Rust doesn't
> > support them, and I haven't thought through how hard tht would be..
> 
> You're kidding right?
> 

I'm not, but I've found this:

	https://github.com/Amanieu/rfcs/blob/inline-asm/text/0000-inline-asm.md#asm-goto

seems to me, the plan for this is something like below:

	asm!(
		"cmp {}, 42",
		"jeq {}",
		in(reg) val,
		label { println!("a"); },
		fallthrough { println!("b"); }
    	);

But it's not implemented yet. Cc Josh in case that he knows more about
this.

Regards,
Boqun

> I thought we *finally* deprecated all compilers that didn't support
> asm-goto and x86 now mandates asm-goto to build, and then this toy
> language comes around ?
> 
> What a load of crap ...
Benno Lossin Nov. 30, 2023, 3:46 p.m. UTC | #17
On 11/30/23 15:59, Greg Kroah-Hartman wrote:
> On Thu, Nov 30, 2023 at 02:53:35PM +0000, Benno Lossin wrote:
>> On 11/29/23 13:51, Alice Ryhl wrote:
>>> +/// Flags associated with a [`File`].
>>> +pub mod flags {
>>> +    /// File is opened in append mode.
>>> +    pub const O_APPEND: u32 = bindings::O_APPEND;
>>
>> Why do all of these constants begin with `O_`?
> 
> Because that is how they are defined in the kernel in the C code.  Why
> would they not be the same here?

Then why does the C side name them that way? Is it because `O_*` is
supposed to mean something, or is it done due to namespacing?

In Rust we have namespacing, so we generally drop common prefixes.
Greg KH Nov. 30, 2023, 3:56 p.m. UTC | #18
On Thu, Nov 30, 2023 at 03:46:55PM +0000, Benno Lossin wrote:
> On 11/30/23 15:59, Greg Kroah-Hartman wrote:
> > On Thu, Nov 30, 2023 at 02:53:35PM +0000, Benno Lossin wrote:
> >> On 11/29/23 13:51, Alice Ryhl wrote:
> >>> +/// Flags associated with a [`File`].
> >>> +pub mod flags {
> >>> +    /// File is opened in append mode.
> >>> +    pub const O_APPEND: u32 = bindings::O_APPEND;
> >>
> >> Why do all of these constants begin with `O_`?
> > 
> > Because that is how they are defined in the kernel in the C code.  Why
> > would they not be the same here?
> 
> Then why does the C side name them that way? Is it because `O_*` is
> supposed to mean something, or is it done due to namespacing?

Because this is a unix-like system, we all "know" what they mean. :)

See 'man 2 open' for details.

> In Rust we have namespacing, so we generally drop common prefixes.

Fine, but we know what this namespace is, please don't override it to be
something else.

thanks,

greg k-h
Theodore Ts'o Nov. 30, 2023, 3:58 p.m. UTC | #19
On Thu, Nov 30, 2023 at 03:46:55PM +0000, Benno Lossin wrote:
> >>> +    pub const O_APPEND: u32 = bindings::O_APPEND;
> >>
> >> Why do all of these constants begin with `O_`?
> > 
> > Because that is how they are defined in the kernel in the C code.  Why
> > would they not be the same here?
> 
> Then why does the C side name them that way? Is it because `O_*` is
> supposed to mean something, or is it done due to namespacing?

It's because these sets of constants were flags passed to the open(2)
system call, and so they are dictated by the POSIX specification.  So
O_ means that they are a set of integer values which are used by
open(2), and they are defined when userspace #include's the fcntl.h
header file.  One could consider it be namespacing --- we need to
distinguish these from other constants: MAY_APPEND, RWF_APPEND,
ESCAPE_APPEND, STATX_ATTR_APPEND, BTRFS_INODE_APPEND.

But it's also a convention that dates back for ***decades*** and if we
want code to be understandable by kernel programmers, we need to obey
standard kernel naming conventions.  

> In Rust we have namespacing, so we generally drop common prefixes.

I don't know about Rust namespacing, but in other languages, how you
have to especify namespaces tend to be ***far*** more verbose than
just adding an O_ prefix.

Cheers,

					- Ted
Benno Lossin Nov. 30, 2023, 4:12 p.m. UTC | #20
On 11/30/23 16:58, Theodore Ts'o wrote:
> On Thu, Nov 30, 2023 at 03:46:55PM +0000, Benno Lossin wrote:
>>>>> +    pub const O_APPEND: u32 = bindings::O_APPEND;
>>>>
>>>> Why do all of these constants begin with `O_`?
>>>
>>> Because that is how they are defined in the kernel in the C code.  Why
>>> would they not be the same here?
>>
>> Then why does the C side name them that way? Is it because `O_*` is
>> supposed to mean something, or is it done due to namespacing?
> 
> It's because these sets of constants were flags passed to the open(2)
> system call, and so they are dictated by the POSIX specification.  So
> O_ means that they are a set of integer values which are used by
> open(2), and they are defined when userspace #include's the fcntl.h
> header file.  One could consider it be namespacing --- we need to
> distinguish these from other constants: MAY_APPEND, RWF_APPEND,
> ESCAPE_APPEND, STATX_ATTR_APPEND, BTRFS_INODE_APPEND.
> 
> But it's also a convention that dates back for ***decades*** and if we
> want code to be understandable by kernel programmers, we need to obey
> standard kernel naming conventions.

I see, that makes a lot of sense. Thanks for the explanation.

>> In Rust we have namespacing, so we generally drop common prefixes.
> 
> I don't know about Rust namespacing, but in other languages, how you
> have to especify namespaces tend to be ***far*** more verbose than
> just adding an O_ prefix.

In this case we already have the `flags` namespace, so I thought about
just dropping the `O_` prefix altogether.
Theodore Ts'o Dec. 1, 2023, 1:16 a.m. UTC | #21
On Thu, Nov 30, 2023 at 04:12:14PM +0000, Benno Lossin wrote:
> > I don't know about Rust namespacing, but in other languages, how you
> > have to especify namespaces tend to be ***far*** more verbose than
> > just adding an O_ prefix.
> 
> In this case we already have the `flags` namespace, so I thought about
> just dropping the `O_` prefix altogether.

Note that in C code, the flags are known to be an integer, and there
are times when we assume that it's possible to take the bitfield, and
then either (a) or'ing in bitfields from some other "namespace",
because it's known that the open flags only use a certain number of
the low bits of the integer, or even that O_RDONLY, O_WRONLY, and
O_RDWR are 0, 1, and 2, repsectively, and so you can do something like
((flags & 0x03) + 1) such that 1 means "read access", 2 means "write
access", and 3 (1|2) is read and write.

This may make a programmer used to a type-strict language feel a
little dirty, but again, this is a convention going back deckades,
back when a PDP-11 had only 32k of words in its address space....

Cheers,


     	    	       	    	- Ted
Peter Zijlstra Dec. 1, 2023, 8:53 a.m. UTC | #22
On Thu, Nov 30, 2023 at 07:25:01AM -0800, Boqun Feng wrote:
> On Thu, Nov 30, 2023 at 11:42:26AM +0100, Peter Zijlstra wrote:
> > On Wed, Nov 29, 2023 at 09:08:14AM -0800, Boqun Feng wrote:
> > 
> > > But but but, I then realized we have asm goto in C but Rust doesn't
> > > support them, and I haven't thought through how hard tht would be..
> > 
> > You're kidding right?
> > 
> 
> I'm not, but I've found this:
> 
> 	https://github.com/Amanieu/rfcs/blob/inline-asm/text/0000-inline-asm.md#asm-goto

Reading that makes all this even worse, apparently rust can't even use
memops.

So to summarise, Rust cannot properly interop with C, it cannot do
inline asm from this side of the millenium. Why are we even trying to
use it again?
Peter Zijlstra Dec. 1, 2023, 9 a.m. UTC | #23
On Thu, Nov 30, 2023 at 07:25:01AM -0800, Boqun Feng wrote:

> seems to me, the plan for this is something like below:
> 
> 	asm!(
> 		"cmp {}, 42",
> 		"jeq {}",
> 		in(reg) val,
> 		label { println!("a"); },
> 		fallthrough { println!("b"); }
>     	);

Because rust has horrible syntax I can't parse, I can't tell if this is
useful or not :/ Can this be used to implement arch_static_branch*() ?
Boqun Feng Dec. 1, 2023, 9:19 a.m. UTC | #24
On Fri, Dec 01, 2023 at 09:53:28AM +0100, Peter Zijlstra wrote:
> On Thu, Nov 30, 2023 at 07:25:01AM -0800, Boqun Feng wrote:
> > On Thu, Nov 30, 2023 at 11:42:26AM +0100, Peter Zijlstra wrote:
> > > On Wed, Nov 29, 2023 at 09:08:14AM -0800, Boqun Feng wrote:
> > > 
> > > > But but but, I then realized we have asm goto in C but Rust doesn't
> > > > support them, and I haven't thought through how hard tht would be..
> > > 
> > > You're kidding right?
> > > 
> > 
> > I'm not, but I've found this:
> > 
> > 	https://github.com/Amanieu/rfcs/blob/inline-asm/text/0000-inline-asm.md#asm-goto
> 
> Reading that makes all this even worse, apparently rust can't even use
> memops.

What do you mean by "memops"?

Regards,
Boqun

> 
> So to summarise, Rust cannot properly interop with C, it cannot do
> inline asm from this side of the millenium. Why are we even trying to
> use it again?
Peter Zijlstra Dec. 1, 2023, 9:40 a.m. UTC | #25
On Fri, Dec 01, 2023 at 01:19:14AM -0800, Boqun Feng wrote:

> > > 	https://github.com/Amanieu/rfcs/blob/inline-asm/text/0000-inline-asm.md#asm-goto
> > 
> > Reading that makes all this even worse, apparently rust can't even use
> > memops.
> 
> What do you mean by "memops"?

Above link has the below in "future possibilities":

"Memory operands

We could support mem as an alternative to specifying a register class
which would leave the operand in memory and instead produce a memory
address when inserted into the asm string. This would allow generating
more efficient code by taking advantage of addressing modes instead of
using an intermediate register to hold the computed address."

Just so happens that every x86 atomic block uses memops.. and per-cpu
and ...
Boqun Feng Dec. 1, 2023, 9:52 a.m. UTC | #26
On Fri, Dec 01, 2023 at 10:00:39AM +0100, Peter Zijlstra wrote:
> On Thu, Nov 30, 2023 at 07:25:01AM -0800, Boqun Feng wrote:
> 
> > seems to me, the plan for this is something like below:
> > 
> > 	asm!(
> > 		"cmp {}, 42",
> > 		"jeq {}",
> > 		in(reg) val,
> > 		label { println!("a"); },
> > 		fallthrough { println!("b"); }
> >     	);
> 
> Because rust has horrible syntax I can't parse, I can't tell if this is
> useful or not :/ Can this be used to implement arch_static_branch*() ?

I should think so:

	asm!("jmp {l_yes}", // jump to l_yes
	     "..."          // directives are supported
	     l_yes { return true; } // label "l_yes"
	     fallthrough { return false; } // otherwise return false
	)

Rust uses LLVM backend, so the inline asm should have the same ability
of clang.

But as I said, AFAIK jumping to label hasn't been implemented yet.

Regards,
Boqun
Boqun Feng Dec. 1, 2023, 10:36 a.m. UTC | #27
On Fri, Dec 01, 2023 at 10:40:37AM +0100, Peter Zijlstra wrote:
> On Fri, Dec 01, 2023 at 01:19:14AM -0800, Boqun Feng wrote:
> 
> > > > 	https://github.com/Amanieu/rfcs/blob/inline-asm/text/0000-inline-asm.md#asm-goto
> > > 
> > > Reading that makes all this even worse, apparently rust can't even use
> > > memops.
> > 
> > What do you mean by "memops"?
> 
> Above link has the below in "future possibilities":
> 
> "Memory operands
> 
> We could support mem as an alternative to specifying a register class
> which would leave the operand in memory and instead produce a memory
> address when inserted into the asm string. This would allow generating
> more efficient code by taking advantage of addressing modes instead of
> using an intermediate register to hold the computed address."
> 
> Just so happens that every x86 atomic block uses memops.. and per-cpu
> and ...
> 

Oh yes, I found out Rust's asm! doesn't support specifying a memory
location as input or output recently as well.


I don't speak for the Rust langauge community, but I think this is
something that they should improve. I understand it could be frustrating
that we find out the new stuff doesn't support good old tools we use
(trust me, I do!), but I believe you also understand that a higher level
language can help in some places, for example, SBRM is naturally
supported ;-) This answers half of the question: "Why are we even trying
to use it again?".

The other half is how languages are designed is different in these days:
a language community may do a better job on listening to the users and
the real use cases can affect the language design in return. While we
are doing our own experiment, we might well give that a shot too.

And at least the document admits these are "future possibilities", so
they should be more motivated to implement these.

It's never perfect, but we gotta start somewhere.

Regards,
Boqun

>
Peter Zijlstra Dec. 1, 2023, 11:05 a.m. UTC | #28
On Fri, Dec 01, 2023 at 02:36:40AM -0800, Boqun Feng wrote:

> I don't speak for the Rust langauge community, but I think this is
> something that they should improve. I understand it could be frustrating
> that we find out the new stuff doesn't support good old tools we use
> (trust me, I do!), but I believe you also understand that a higher level
> language can help in some places, for example, SBRM is naturally
> supported ;-) This answers half of the question: "Why are we even trying
> to use it again?".

C++ does that too (and a ton of other languages), and has a much less
craptastic syntax (not claiming C++ syntax doesn't have problems, but at
least its the same language family). Now I realize C++ isn't ideal, it
inherits much of the safety issues from C. But gah, rust is such a royal
pain.

> The other half is how languages are designed is different in these days:
> a language community may do a better job on listening to the users and
> the real use cases can affect the language design in return. While we
> are doing our own experiment, we might well give that a shot too.

Well, rust was clearly not designed to interact with C/C++ sanely. Given
the kernel is a giant C project, this is somewhat of an issue IMO.

IIRC the way Chrome makes it work with C++ is by defining the interface
in a *third* language which compiles into 'compatible' Rust and C++,
which is total idiocy if you ask me.

Some langauges (Zig IIUC) can consume regular C headers and are much
less painful to interact with (I know very little about Zig, no
endorsement beyond it integrating much better with C).

> And at least the document admits these are "future possibilities", so
> they should be more motivated to implement these.
> 
> It's never perfect, but we gotta start somewhere.

How about they start by using this LLVM goodness to implement the rust
equivalent of Zig's @cImport? Have it use clang to munge the C/C++
headers into IR and squash the lot into the rust thing.

The syntax is ofcourse unfixable :-(
David Laight Dec. 1, 2023, 12:11 p.m. UTC | #29
> > I don't know about Rust namespacing, but in other languages, how you
> > have to especify namespaces tend to be ***far*** more verbose than
> > just adding an O_ prefix.
> 
> In this case we already have the `flags` namespace, so I thought about
> just dropping the `O_` prefix altogether.

Does rust have a 'using namespace' (or similar) so that namespace doesn't
have to be explicitly specified each time a value is used?
If so you still need a hint about which set of values it is from.

Otherwise you get into the same mess as C++ class members (I think
they should have been .member from the start).
Or, worse still, Pascal and multiple 'with' blocks.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)
Alice Ryhl Dec. 1, 2023, 12:27 p.m. UTC | #30
David Laight <David.Laight@ACULAB.COM> writes:
> > > I don't know about Rust namespacing, but in other languages, how you
> > > have to especify namespaces tend to be ***far*** more verbose than
> > > just adding an O_ prefix.
> > 
> > In this case we already have the `flags` namespace, so I thought about
> > just dropping the `O_` prefix altogether.
> 
> Does rust have a 'using namespace' (or similar) so that namespace doesn't
> have to be explicitly specified each time a value is used?
> If so you still need a hint about which set of values it is from.
> 
> Otherwise you get into the same mess as C++ class members (I think
> they should have been .member from the start).
> Or, worse still, Pascal and multiple 'with' blocks.

Yes.

You can import it with a use statement. For example:

use kernel::file::flags::O_RDONLY;
// use as O_RDONLY

or:

use kernel::file::flags::{O_RDONLY, O_WRONLY, O_RDWR};
// use as O_RDONLY

or:

use kernel::file::flags::*;
// use as O_RDONLY

If you want to specify a namespace every time you use it, then it is
possible: (But often you wouldn't do that.)

use kernel::file::flags;
// use as flags::O_RDONLY

or:

use kernel::file;
// use as file::flags::O_RDONLY

or:

use kernel::file::flags as file_flags;
// use as file_flags::O_RDONLY

And you can also use the full path if you don't want to add a `use`
statement.

Alice
Theodore Ts'o Dec. 1, 2023, 3:04 p.m. UTC | #31
On Fri, Dec 01, 2023 at 12:27:40PM +0000, Alice Ryhl wrote:
> 
> You can import it with a use statement. For example:
> 
> use kernel::file::flags::O_RDONLY;
> // use as O_RDONLY

That's good to hear, but it still means that we have to use the XYZ_*
prefix, because otherwise, after something like

use kernel::file::flags::RDONLY;
use kernel::uapi::rwf::RDONLY;

that will blow up.  So that has to be

use kernel::file::flags::O_RDONLY;
use kernel::uapi::rwf::RWF_RDONLY;

Which is a bit more verbose, at least things won't blow up
spectacularly when you need to use both namespaces in the same
codepath.

Also note how we do things like this:

#define IOCB_APPEND          (__force int) RWF_APPEND

In other words, the IOCB_* namespace and the RWF_* namespace partially
share code points, and so they *have* to be assigned to the same value
--- and note that since RWF_APPEND is defined as part of the UAPI, it
might not even be the same across different architectures....

Cheers,

						- Ted
Benno Lossin Dec. 1, 2023, 3:14 p.m. UTC | #32
On 12/1/23 16:04, Theodore Ts'o wrote:
> On Fri, Dec 01, 2023 at 12:27:40PM +0000, Alice Ryhl wrote:
>>
>> You can import it with a use statement. For example:
>>
>> use kernel::file::flags::O_RDONLY;
>> // use as O_RDONLY
> 
> That's good to hear, but it still means that we have to use the XYZ_*
> prefix, because otherwise, after something like
> 
> use kernel::file::flags::RDONLY;
> use kernel::uapi::rwf::RDONLY;
> 
> that will blow up.  So that has to be
> 
> use kernel::file::flags::O_RDONLY;
> use kernel::uapi::rwf::RWF_RDONLY;

You can just import the `flags` and `rwf` modules (the fourth option
posted by Alice):

    use kernel::file::flags;
    use kernel::uapi::rwf;
    
    // usage:
    
    flags::O_RDONLY
    
    rwf::RDONLY

Alternatively if we end up with multiple flags modules you can do this
(the sixth option from Alice):

    use kernel::file::flags as file_flags;
    use kernel::foo::flags as foo_flags;

    // usage:

    file_flags::O_RDONLY

    foo_flags::O_RDONLY
David Laight Dec. 1, 2023, 5:25 p.m. UTC | #33
From: Benno Lossin
> Sent: 01 December 2023 15:14
> 
> On 12/1/23 16:04, Theodore Ts'o wrote:
> > On Fri, Dec 01, 2023 at 12:27:40PM +0000, Alice Ryhl wrote:
> >>
> >> You can import it with a use statement. For example:
> >>
> >> use kernel::file::flags::O_RDONLY;
> >> // use as O_RDONLY
> >
> > That's good to hear,

Except that the examples here seem to imply you can't import
all of the values without listing them all.

From what I've seen of the rust patches the language seems
to have a lower SNR than ADA or VHDL.
Too much syntatic 'goop' makes it difficult to see what code
is actually doing.

....
> Alternatively if we end up with multiple flags modules you can do this
> (the sixth option from Alice):
> 
>     use kernel::file::flags as file_flags;
>     use kernel::foo::flags as foo_flags;
> 
>     // usage:
> 
>     file_flags::O_RDONLY
> 
>     foo_flags::O_RDONLY

That looks useful for the 'obfuscated rust' competition.
Consider:
	use kernel::file::flags as foo_flags;
	use kernel::foo::flags as file_flags;

It's probably fortunate that I' old enough retire before anyone forces
me to write any of this stuff :-)

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)
Benno Lossin Dec. 1, 2023, 5:37 p.m. UTC | #34
On 12/1/23 18:25, David Laight wrote:
> From: Benno Lossin
>> Sent: 01 December 2023 15:14
>>
>> On 12/1/23 16:04, Theodore Ts'o wrote:
>>> On Fri, Dec 01, 2023 at 12:27:40PM +0000, Alice Ryhl wrote:
>>>>
>>>> You can import it with a use statement. For example:
>>>>
>>>> use kernel::file::flags::O_RDONLY;
>>>> // use as O_RDONLY
>>>
>>> That's good to hear,
> 
> Except that the examples here seem to imply you can't import
> all of the values without listing them all.

Alice has given an example above, but you might not have noticed:

    use kernel::file::flags::*;
    
    // usage:

    O_RDONLY
    O_APPEND

> From what I've seen of the rust patches the language seems
> to have a lower SNR than ADA or VHDL.
> Too much syntatic 'goop' makes it difficult to see what code
> is actually doing.

This is done for better readability, e.g. when you do not have
rust-analyzer to help you jump to the right definition. But there are
certainly instances where we use the `::*` imports (just look at the
first patch).

> ....
>> Alternatively if we end up with multiple flags modules you can do this
>> (the sixth option from Alice):
>>
>>     use kernel::file::flags as file_flags;
>>     use kernel::foo::flags as foo_flags;
>>
>>     // usage:
>>
>>     file_flags::O_RDONLY
>>
>>     foo_flags::O_RDONLY
> 
> That looks useful for the 'obfuscated rust' competition.
> Consider:
> 	use kernel::file::flags as foo_flags;
> 	use kernel::foo::flags as file_flags;

This is no worse than C preprocessor macros doing funky stuff.
We will just have to catch this in review.
diff mbox series

Patch

diff --git a/rust/bindings/bindings_helper.h b/rust/bindings/bindings_helper.h
index 85f013ed4ca4..beed3ef1fbc3 100644
--- a/rust/bindings/bindings_helper.h
+++ b/rust/bindings/bindings_helper.h
@@ -8,6 +8,8 @@ 
 
 #include <kunit/test.h>
 #include <linux/errname.h>
+#include <linux/file.h>
+#include <linux/fs.h>
 #include <linux/slab.h>
 #include <linux/refcount.h>
 #include <linux/wait.h>
diff --git a/rust/helpers.c b/rust/helpers.c
index 70e59efd92bc..03141a3608a4 100644
--- a/rust/helpers.c
+++ b/rust/helpers.c
@@ -25,6 +25,7 @@ 
 #include <linux/build_bug.h>
 #include <linux/err.h>
 #include <linux/errname.h>
+#include <linux/fs.h>
 #include <linux/mutex.h>
 #include <linux/refcount.h>
 #include <linux/sched/signal.h>
@@ -157,6 +158,12 @@  void rust_helper_init_work_with_key(struct work_struct *work, work_func_t func,
 }
 EXPORT_SYMBOL_GPL(rust_helper_init_work_with_key);
 
+struct file *rust_helper_get_file(struct file *f)
+{
+	return get_file(f);
+}
+EXPORT_SYMBOL_GPL(rust_helper_get_file);
+
 /*
  * `bindgen` binds the C `size_t` type as the Rust `usize` type, so we can
  * use it in contexts where Rust expects a `usize` like slice (array) indices.
diff --git a/rust/kernel/file.rs b/rust/kernel/file.rs
new file mode 100644
index 000000000000..ee4ec8b919af
--- /dev/null
+++ b/rust/kernel/file.rs
@@ -0,0 +1,182 @@ 
+// SPDX-License-Identifier: GPL-2.0
+
+//! Files and file descriptors.
+//!
+//! C headers: [`include/linux/fs.h`](../../../../include/linux/fs.h) and
+//! [`include/linux/file.h`](../../../../include/linux/file.h)
+
+use crate::{
+    bindings,
+    error::{code::*, Error, Result},
+    types::{ARef, AlwaysRefCounted, Opaque},
+};
+use core::ptr;
+
+/// Flags associated with a [`File`].
+pub mod flags {
+    /// File is opened in append mode.
+    pub const O_APPEND: u32 = bindings::O_APPEND;
+
+    /// Signal-driven I/O is enabled.
+    pub const O_ASYNC: u32 = bindings::FASYNC;
+
+    /// Close-on-exec flag is set.
+    pub const O_CLOEXEC: u32 = bindings::O_CLOEXEC;
+
+    /// File was created if it didn't already exist.
+    pub const O_CREAT: u32 = bindings::O_CREAT;
+
+    /// Direct I/O is enabled for this file.
+    pub const O_DIRECT: u32 = bindings::O_DIRECT;
+
+    /// File must be a directory.
+    pub const O_DIRECTORY: u32 = bindings::O_DIRECTORY;
+
+    /// Like [`O_SYNC`] except metadata is not synced.
+    pub const O_DSYNC: u32 = bindings::O_DSYNC;
+
+    /// Ensure that this file is created with the `open(2)` call.
+    pub const O_EXCL: u32 = bindings::O_EXCL;
+
+    /// Large file size enabled (`off64_t` over `off_t`).
+    pub const O_LARGEFILE: u32 = bindings::O_LARGEFILE;
+
+    /// Do not update the file last access time.
+    pub const O_NOATIME: u32 = bindings::O_NOATIME;
+
+    /// File should not be used as process's controlling terminal.
+    pub const O_NOCTTY: u32 = bindings::O_NOCTTY;
+
+    /// If basename of path is a symbolic link, fail open.
+    pub const O_NOFOLLOW: u32 = bindings::O_NOFOLLOW;
+
+    /// File is using nonblocking I/O.
+    pub const O_NONBLOCK: u32 = bindings::O_NONBLOCK;
+
+    /// Also known as `O_NDELAY`.
+    ///
+    /// This is effectively the same flag as [`O_NONBLOCK`] on all architectures
+    /// except SPARC64.
+    pub const O_NDELAY: u32 = bindings::O_NDELAY;
+
+    /// Used to obtain a path file descriptor.
+    pub const O_PATH: u32 = bindings::O_PATH;
+
+    /// Write operations on this file will flush data and metadata.
+    pub const O_SYNC: u32 = bindings::O_SYNC;
+
+    /// This file is an unnamed temporary regular file.
+    pub const O_TMPFILE: u32 = bindings::O_TMPFILE;
+
+    /// File should be truncated to length 0.
+    pub const O_TRUNC: u32 = bindings::O_TRUNC;
+
+    /// Bitmask for access mode flags.
+    ///
+    /// # Examples
+    ///
+    /// ```
+    /// use kernel::file;
+    /// # fn do_something() {}
+    /// # let flags = 0;
+    /// if (flags & file::flags::O_ACCMODE) == file::flags::O_RDONLY {
+    ///     do_something();
+    /// }
+    /// ```
+    pub const O_ACCMODE: u32 = bindings::O_ACCMODE;
+
+    /// File is read only.
+    pub const O_RDONLY: u32 = bindings::O_RDONLY;
+
+    /// File is write only.
+    pub const O_WRONLY: u32 = bindings::O_WRONLY;
+
+    /// File can be both read and written.
+    pub const O_RDWR: u32 = bindings::O_RDWR;
+}
+
+/// Wraps the kernel's `struct file`.
+///
+/// # Invariants
+///
+/// Instances of this type are always ref-counted, that is, a call to `get_file` ensures that the
+/// allocation remains valid at least until the matching call to `fput`.
+#[repr(transparent)]
+pub struct File(Opaque<bindings::file>);
+
+// SAFETY: By design, the only way to access a `File` is via an immutable reference or an `ARef`.
+// This means that the only situation in which a `File` can be accessed mutably is when the
+// refcount drops to zero and the destructor runs. It is safe for that to happen on any thread, so
+// it is ok for this type to be `Send`.
+unsafe impl Send for File {}
+
+// SAFETY: It's OK to access `File` through shared references from other threads because we're
+// either accessing properties that don't change or that are properly synchronised by C code.
+unsafe impl Sync for File {}
+
+impl File {
+    /// Constructs a new `struct file` wrapper from a file descriptor.
+    ///
+    /// The file descriptor belongs to the current process.
+    pub fn from_fd(fd: u32) -> Result<ARef<Self>, BadFdError> {
+        // SAFETY: FFI call, there are no requirements on `fd`.
+        let ptr = ptr::NonNull::new(unsafe { bindings::fget(fd) }).ok_or(BadFdError)?;
+
+        // INVARIANT: `fget` increments the refcount before returning.
+        Ok(unsafe { ARef::from_raw(ptr.cast()) })
+    }
+
+    /// Creates a reference to a [`File`] from a valid pointer.
+    ///
+    /// # Safety
+    ///
+    /// The caller must ensure that `ptr` points at a valid file and that its refcount does not
+    /// reach zero during the lifetime 'a.
+    pub unsafe fn from_ptr<'a>(ptr: *const bindings::file) -> &'a File {
+        // INVARIANT: The safety requirements guarantee that the refcount does not hit zero during
+        // 'a. The cast is okay because `File` is `repr(transparent)`.
+        unsafe { &*ptr.cast() }
+    }
+
+    /// Returns the flags associated with the file.
+    ///
+    /// The flags are a combination of the constants in [`flags`].
+    pub fn flags(&self) -> u32 {
+        // This `read_volatile` is intended to correspond to a READ_ONCE call.
+        //
+        // SAFETY: The file is valid because the shared reference guarantees a nonzero refcount.
+        //
+        // TODO: Replace with `read_once` when available on the Rust side.
+        unsafe { core::ptr::addr_of!((*self.0.get()).f_flags).read_volatile() }
+    }
+}
+
+// SAFETY: The type invariants guarantee that `File` is always ref-counted.
+unsafe impl AlwaysRefCounted for File {
+    fn inc_ref(&self) {
+        // SAFETY: The existence of a shared reference means that the refcount is nonzero.
+        unsafe { bindings::get_file(self.0.get()) };
+    }
+
+    unsafe fn dec_ref(obj: ptr::NonNull<Self>) {
+        // SAFETY: The safety requirements guarantee that the refcount is nonzero.
+        unsafe { bindings::fput(obj.cast().as_ptr()) }
+    }
+}
+
+/// Represents the `EBADF` error code.
+///
+/// Used for methods that can only fail with `EBADF`.
+pub struct BadFdError;
+
+impl From<BadFdError> for Error {
+    fn from(_: BadFdError) -> Error {
+        EBADF
+    }
+}
+
+impl core::fmt::Debug for BadFdError {
+    fn fmt(&self, f: &mut core::fmt::Formatter<'_>) -> core::fmt::Result {
+        f.pad("EBADF")
+    }
+}
diff --git a/rust/kernel/lib.rs b/rust/kernel/lib.rs
index e6aff80b521f..ce9abceab784 100644
--- a/rust/kernel/lib.rs
+++ b/rust/kernel/lib.rs
@@ -34,6 +34,7 @@ 
 mod allocator;
 mod build_assert;
 pub mod error;
+pub mod file;
 pub mod init;
 pub mod ioctl;
 #[cfg(CONFIG_KUNIT)]