diff mbox series

[2/3] rust: file: add f_pos and set_f_pos

Message ID 20240926-b4-miscdevice-v1-2-7349c2b2837a@google.com (mailing list archive)
State New
Headers show
Series Miscdevices in Rust | expand

Commit Message

Alice Ryhl Sept. 26, 2024, 2:58 p.m. UTC
Add accessors for the file position. Most of the time, you should not
use these methods directly, and you should instead use a guard for the
file position to prove that you hold the fpos lock. However, under
limited circumstances, files are allowed to choose a different locking
strategy for their file position. These accessors can be used to handle
that case.

For now, these accessors are the only way to access the file position
within the llseek and read_iter callbacks.

Signed-off-by: Alice Ryhl <aliceryhl@google.com>
---
 rust/kernel/fs/file.rs | 20 ++++++++++++++++++++
 1 file changed, 20 insertions(+)

Comments

Al Viro Sept. 26, 2024, 10:08 p.m. UTC | #1
On Thu, Sep 26, 2024 at 02:58:56PM +0000, Alice Ryhl wrote:
> Add accessors for the file position. Most of the time, you should not
> use these methods directly, and you should instead use a guard for the
> file position to prove that you hold the fpos lock. However, under
> limited circumstances, files are allowed to choose a different locking
> strategy for their file position. These accessors can be used to handle
> that case.
> 
> For now, these accessors are the only way to access the file position
> within the llseek and read_iter callbacks.

You really should not do that within ->read_iter().  If your method
does that, it has the wrong signature.

If nothing else, it should be usable for preadv(2), so what file position
are you talking about?
Al Viro Sept. 26, 2024, 10:47 p.m. UTC | #2
On Thu, Sep 26, 2024 at 11:08:21PM +0100, Al Viro wrote:
> On Thu, Sep 26, 2024 at 02:58:56PM +0000, Alice Ryhl wrote:
> > Add accessors for the file position. Most of the time, you should not
> > use these methods directly, and you should instead use a guard for the
> > file position to prove that you hold the fpos lock. However, under
> > limited circumstances, files are allowed to choose a different locking
> > strategy for their file position. These accessors can be used to handle
> > that case.
> > 
> > For now, these accessors are the only way to access the file position
> > within the llseek and read_iter callbacks.
> 
> You really should not do that within ->read_iter().  If your method
> does that, it has the wrong signature.
> 
> If nothing else, it should be usable for preadv(2), so what file position
> are you talking about?

To elaborate: ->llseek() is the only method that has any business accessing
->f_pos (and that - possibly not forever).  Note, BTW, that most of the
time ->llseek() should be using one of the safe instances from fs/libfs.c
or helpers from the same place; direct ->f_pos access in drivers is
basically for things like
static loff_t cfam_llseek(struct file *file, loff_t offset, int whence)
{
        switch (whence) {
	case SEEK_CUR:
		break;
	case SEEK_SET:
		file->f_pos = offset;
		break;
	default:
		return -EINVAL;
	}

	return offset;
}
which is... really special.  Translation: lseek(fd, n, SEEK_CUR) - return n
and do nothing.  lseek(fd, n, SEEK_SET) - usual semantics.  Anything else
- fail with EINVAL.  The mind-boggling part is SEEK_CUR, but that's
userland ABI of that particular driver; if the authors can be convinced that
we don't need to preserve that wart, it can be replaced with use of
no_seek_end_llseek.  If their very special userland relies upon it...
not much we can do.

Anything else outside of core VFS should not touch the damn thing, unless
they have a very good reason and are willing to explain what makes them
special.

From quick grep through the tree, we seem to have grown a bunch of bogosities
in vfio (including one in samples, presumably responsible for that infestation),
there's a few strange ioctls that reset it to 0 or do other unnatural things
(hell, VFAT has readdir() variant called that way), there are _really_ shitty
cases in HFS, HFS+ and HPFS, where things like unlink() while somebody has the
parent directory open will modify the current position(s), and then there's
whatever ksmbd is playing at.

We really should not expose ->f_pos - that can't be done on the C side (yet),
but let's not spread that idiocy.
Al Viro Sept. 26, 2024, 10:52 p.m. UTC | #3
On Thu, Sep 26, 2024 at 11:47:33PM +0100, Al Viro wrote:
> time ->llseek() should be using one of the safe instances from fs/libfs.c

d'oh... s/libfs.c/read_write.c/ - sorry.
Alice Ryhl Sept. 27, 2024, 6:48 a.m. UTC | #4
On Fri, Sep 27, 2024 at 12:08 AM Al Viro <viro@zeniv.linux.org.uk> wrote:
>
> On Thu, Sep 26, 2024 at 02:58:56PM +0000, Alice Ryhl wrote:
> > Add accessors for the file position. Most of the time, you should not
> > use these methods directly, and you should instead use a guard for the
> > file position to prove that you hold the fpos lock. However, under
> > limited circumstances, files are allowed to choose a different locking
> > strategy for their file position. These accessors can be used to handle
> > that case.
> >
> > For now, these accessors are the only way to access the file position
> > within the llseek and read_iter callbacks.
>
> You really should not do that within ->read_iter().  If your method
> does that, it has the wrong signature.
>
> If nothing else, it should be usable for preadv(2), so what file position
> are you talking about?

Sorry, you are right. In read_iter we can access the file position via
the kiocb. It only makes sense for seek.

Alice
Alice Ryhl Sept. 27, 2024, 6:56 a.m. UTC | #5
On Fri, Sep 27, 2024 at 12:47 AM Al Viro <viro@zeniv.linux.org.uk> wrote:
>
> On Thu, Sep 26, 2024 at 11:08:21PM +0100, Al Viro wrote:
> > On Thu, Sep 26, 2024 at 02:58:56PM +0000, Alice Ryhl wrote:
> > > Add accessors for the file position. Most of the time, you should not
> > > use these methods directly, and you should instead use a guard for the
> > > file position to prove that you hold the fpos lock. However, under
> > > limited circumstances, files are allowed to choose a different locking
> > > strategy for their file position. These accessors can be used to handle
> > > that case.
> > >
> > > For now, these accessors are the only way to access the file position
> > > within the llseek and read_iter callbacks.
> >
> > You really should not do that within ->read_iter().  If your method
> > does that, it has the wrong signature.
> >
> > If nothing else, it should be usable for preadv(2), so what file position
> > are you talking about?
>
> To elaborate: ->llseek() is the only method that has any business accessing
> ->f_pos (and that - possibly not forever).  Note, BTW, that most of the
> time ->llseek() should be using one of the safe instances from fs/libfs.c
> or helpers from the same place; direct ->f_pos access in drivers is
> basically for things like
> static loff_t cfam_llseek(struct file *file, loff_t offset, int whence)
> {
>         switch (whence) {
>         case SEEK_CUR:
>                 break;
>         case SEEK_SET:
>                 file->f_pos = offset;
>                 break;
>         default:
>                 return -EINVAL;
>         }
>
>         return offset;
> }
> which is... really special.  Translation: lseek(fd, n, SEEK_CUR) - return n
> and do nothing.  lseek(fd, n, SEEK_SET) - usual semantics.  Anything else
> - fail with EINVAL.  The mind-boggling part is SEEK_CUR, but that's
> userland ABI of that particular driver; if the authors can be convinced that
> we don't need to preserve that wart, it can be replaced with use of
> no_seek_end_llseek.  If their very special userland relies upon it...
> not much we can do.
>
> Anything else outside of core VFS should not touch the damn thing, unless
> they have a very good reason and are willing to explain what makes them
> special.
>
> From quick grep through the tree, we seem to have grown a bunch of bogosities
> in vfio (including one in samples, presumably responsible for that infestation),
> there's a few strange ioctls that reset it to 0 or do other unnatural things
> (hell, VFAT has readdir() variant called that way), there are _really_ shitty
> cases in HFS, HFS+ and HPFS, where things like unlink() while somebody has the
> parent directory open will modify the current position(s), and then there's
> whatever ksmbd is playing at.
>
> We really should not expose ->f_pos - that can't be done on the C side (yet),
> but let's not spread that idiocy.

Okay, interesting. I did not know about all of these llseek helpers.
I'm definitely happy to make the Rust API force users to do the right
thing if we can.

It sounds like we basically have a few different seeking behaviors
that the driver can choose between, and we want to force the user to
use one of them?

Alice
Christian Brauner Sept. 27, 2024, 7:32 a.m. UTC | #6
On Thu, Sep 26, 2024 at 02:58:56PM GMT, Alice Ryhl wrote:
> Add accessors for the file position. Most of the time, you should not
> use these methods directly, and you should instead use a guard for the
> file position to prove that you hold the fpos lock. However, under
> limited circumstances, files are allowed to choose a different locking
> strategy for their file position. These accessors can be used to handle
> that case.
> 
> For now, these accessors are the only way to access the file position
> within the llseek and read_iter callbacks.
> 
> Signed-off-by: Alice Ryhl <aliceryhl@google.com>
> ---
>  rust/kernel/fs/file.rs | 20 ++++++++++++++++++++
>  1 file changed, 20 insertions(+)
> 
> diff --git a/rust/kernel/fs/file.rs b/rust/kernel/fs/file.rs
> index e03dbe14d62a..c896a3b1d182 100644
> --- a/rust/kernel/fs/file.rs
> +++ b/rust/kernel/fs/file.rs
> @@ -333,6 +333,26 @@ pub fn flags(&self) -> u32 {
>          // FIXME(read_once): Replace with `read_once` when available on the Rust side.
>          unsafe { core::ptr::addr_of!((*self.as_ptr()).f_flags).read_volatile() }
>      }
> +
> +    /// Read the file position.
> +    ///
> +    /// # Safety
> +    ///
> +    /// You must hold the fpos lock or otherwise ensure that no data race will happen.
> +    #[inline]
> +    pub unsafe fn f_pos(&self) -> i64 {
> +        unsafe { (*self.as_ptr()).f_pos }
> +    }
> +
> +    /// Set the file position.
> +    ///
> +    /// # Safety
> +    ///
> +    /// You must hold the fpos lock or otherwise ensure that no data race will happen.
> +    #[inline]
> +    pub unsafe fn set_f_pos(&self, value: i64) {
> +        unsafe { (*self.as_ptr()).f_pos = value };
> +    }

I don't think we want to expose f_pos with its weird locking rule
through rust wrappers. Ideally, it's completely opaque to the callers
and not accessed directly at all.
Al Viro Sept. 27, 2024, 7:38 p.m. UTC | #7
On Fri, Sep 27, 2024 at 08:56:50AM +0200, Alice Ryhl wrote:

> Okay, interesting. I did not know about all of these llseek helpers.
> I'm definitely happy to make the Rust API force users to do the right
> thing if we can.
> 
> It sounds like we basically have a few different seeking behaviors
> that the driver can choose between, and we want to force the user to
> use one of them?

Depends...  Basically, SEEK_HOLE/SEEK_DATA is seriously fs-specific
(unsurprisingly), and pretty much everything wants the usual relation
between SEEK_SET and SEEK_CUR (<SEEK_CUR,n> is the same as <SEEK_SET,
current position + n>).  SEEK_END availability varies - the simplest
variant is <SEEK_END, n> == <SEEK_SET, size + n>, but there are
cases that genuinely have nothing resembling end-relative seek
(e.g. anything seq_file-related).

It's not so much available instances as available helpers; details of
semantics may seriously vary by the driver.

Note that once upon a time ->f_pos had been exposed to ->read() et.al.;
caused recurring bugs, until we switched to "sample ->f_pos before calling
->read(), pass the reference to local copy into the method, then put
what's the method left behind in there back into ->f_pos".

Something similar might be a good idea for ->llseek().  Locking is
an unpleasant problem, unfortunately.  lseek() is not a terribly hot
codepath, but read() and write() are.  For a while we used to do exclusion
on per-struct file basis for _all_ read/write/lseek; see 797964253d35
"file: reinstate f_pos locking optimization for regular files" for the
point where it eroded.

FWIW, I suspect that unconditionally taking ->f_pos_mutex for llseek(2)
would solve most of the problems - for one thing, with guaranteed
per-struct-file serialization of vfs_llseek() we could handle SEEK_CUR
right there, so that ->llseek() instances would never see it; for another,
we just might be able to pull the same 'pass a reference to local variable
and let it be handled there' trick for ->llseek().  That would require
an audit of locking in the instances, though...
Alice Ryhl Oct. 1, 2024, 8:20 a.m. UTC | #8
On Fri, Sep 27, 2024 at 9:38 PM Al Viro <viro@zeniv.linux.org.uk> wrote:
>
> On Fri, Sep 27, 2024 at 08:56:50AM +0200, Alice Ryhl wrote:
>
> > Okay, interesting. I did not know about all of these llseek helpers.
> > I'm definitely happy to make the Rust API force users to do the right
> > thing if we can.
> >
> > It sounds like we basically have a few different seeking behaviors
> > that the driver can choose between, and we want to force the user to
> > use one of them?
>
> Depends...  Basically, SEEK_HOLE/SEEK_DATA is seriously fs-specific
> (unsurprisingly), and pretty much everything wants the usual relation
> between SEEK_SET and SEEK_CUR (<SEEK_CUR,n> is the same as <SEEK_SET,
> current position + n>).  SEEK_END availability varies - the simplest
> variant is <SEEK_END, n> == <SEEK_SET, size + n>, but there are
> cases that genuinely have nothing resembling end-relative seek
> (e.g. anything seq_file-related).
>
> It's not so much available instances as available helpers; details of
> semantics may seriously vary by the driver.
>
> Note that once upon a time ->f_pos had been exposed to ->read() et.al.;
> caused recurring bugs, until we switched to "sample ->f_pos before calling
> ->read(), pass the reference to local copy into the method, then put
> what's the method left behind in there back into ->f_pos".
>
> Something similar might be a good idea for ->llseek().  Locking is
> an unpleasant problem, unfortunately.  lseek() is not a terribly hot
> codepath, but read() and write() are.  For a while we used to do exclusion
> on per-struct file basis for _all_ read/write/lseek; see 797964253d35
> "file: reinstate f_pos locking optimization for regular files" for the
> point where it eroded.
>
> FWIW, I suspect that unconditionally taking ->f_pos_mutex for llseek(2)
> would solve most of the problems - for one thing, with guaranteed
> per-struct-file serialization of vfs_llseek() we could handle SEEK_CUR
> right there, so that ->llseek() instances would never see it; for another,
> we just might be able to pull the same 'pass a reference to local variable
> and let it be handled there' trick for ->llseek().  That would require
> an audit of locking in the instances, though...

Okay, thanks for the explanation. The file position stuff seems pretty
complicated.

One thing to think about is whether there are some behaviors used by
old drivers that new drivers should not use. We can design our Rust
APIs to prevent using it in those legacy ways.

For now I'm dropping this patch from the series at Greg's request.

Alice
diff mbox series

Patch

diff --git a/rust/kernel/fs/file.rs b/rust/kernel/fs/file.rs
index e03dbe14d62a..c896a3b1d182 100644
--- a/rust/kernel/fs/file.rs
+++ b/rust/kernel/fs/file.rs
@@ -333,6 +333,26 @@  pub fn flags(&self) -> u32 {
         // FIXME(read_once): Replace with `read_once` when available on the Rust side.
         unsafe { core::ptr::addr_of!((*self.as_ptr()).f_flags).read_volatile() }
     }
+
+    /// Read the file position.
+    ///
+    /// # Safety
+    ///
+    /// You must hold the fpos lock or otherwise ensure that no data race will happen.
+    #[inline]
+    pub unsafe fn f_pos(&self) -> i64 {
+        unsafe { (*self.as_ptr()).f_pos }
+    }
+
+    /// Set the file position.
+    ///
+    /// # Safety
+    ///
+    /// You must hold the fpos lock or otherwise ensure that no data race will happen.
+    #[inline]
+    pub unsafe fn set_f_pos(&self, value: i64) {
+        unsafe { (*self.as_ptr()).f_pos = value };
+    }
 }
 
 impl File {