Message ID | 20241209-miscdevice-file-param-v2-2-83ece27e9ff6@google.com (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | Additional miscdevice fops parameters | expand |
On Mon, Dec 09, 2024 at 07:27:47AM +0000, Alice Ryhl wrote: > Providing access to the underlying `struct miscdevice` is useful for > various reasons. For example, this allows you access the miscdevice's > internal `struct device` for use with the `dev_*` printing macros. > > Note that since the underlying `struct miscdevice` could get freed at > any point after the fops->open() call, only the open call is given > access to it. To print from other calls, they should take a refcount on > the device to keep it alive. The lifespan of the miscdevice is at least from open until close, so it's safe for at least then (i.e. read/write/ioctl/etc.) > Signed-off-by: Alice Ryhl <aliceryhl@google.com> > --- > rust/kernel/miscdevice.rs | 19 ++++++++++++++++--- > 1 file changed, 16 insertions(+), 3 deletions(-) > > diff --git a/rust/kernel/miscdevice.rs b/rust/kernel/miscdevice.rs > index 0cb79676c139..c5af1d5ec4be 100644 > --- a/rust/kernel/miscdevice.rs > +++ b/rust/kernel/miscdevice.rs > @@ -104,7 +104,7 @@ pub trait MiscDevice { > /// Called when the misc device is opened. > /// > /// The returned pointer will be stored as the private data for the file. > - fn open(_file: &File) -> Result<Self::Ptr>; > + fn open(_file: &File, _misc: &MiscDeviceRegistration<Self>) -> Result<Self::Ptr>; > > /// Called when the misc device is released. > fn release(device: Self::Ptr, _file: &File) { > @@ -190,14 +190,27 @@ impl<T: MiscDevice> VtableHelper<T> { > return ret; > } > > + // SAFETY: The opwn call of a file can access the private data. s/opwn/open/ :) > + let misc_ptr = unsafe { (*file).private_data }; Blank line here? > + // SAFETY: This is a miscdevice, so `misc_open()` set the private data to a pointer to the > + // associated `struct miscdevice` before calling into this method. Furthermore, `misc_open()` > + // ensures that the miscdevice can't be unregistered and freed during this call to `fops_open`. Aren't we wrapping comment lines at 80 columns still? I can't remember anymore... > + let misc = unsafe { &*misc_ptr.cast::<MiscDeviceRegistration<T>>() }; > + > // SAFETY: > - // * The file is valid for the duration of this call. > + // * The file is valid for the duration of the `T::open` call. It's valid for the lifespan between open/release. > // * There is no active fdget_pos region on the file on this thread. > - let ptr = match T::open(unsafe { File::from_raw_file(file) }) { > + let file = unsafe { File::from_raw_file(file) }; > + > + let ptr = match T::open(file, misc) { > Ok(ptr) => ptr, > Err(err) => return err.to_errno(), > }; > > + // This overwrites the private data from above. It makes sense to not hold on to the misc > + // pointer since the `struct miscdevice` can get unregistered as soon as we return from this > + // call, so the misc pointer might be dangling on future file operations. > + // Wait, what are we overwriting this here with? Now private data points to the misc device when before it was the file structure. No other code needed to be changed because of that? Can't we enforce this pointer type somewhere so that any casts in any read/write/ioctl also "knows" it has the right type? This feels "dangerous" to me. > // SAFETY: The open call of a file owns the private data. > unsafe { (*file).private_data = ptr.into_foreign().cast_mut() }; Is this SAFETY comment still correct? thanks, greg k-h
On Mon, Dec 9, 2024 at 9:48 AM Greg Kroah-Hartman <gregkh@linuxfoundation.org> wrote: > > On Mon, Dec 09, 2024 at 07:27:47AM +0000, Alice Ryhl wrote: > > Providing access to the underlying `struct miscdevice` is useful for > > various reasons. For example, this allows you access the miscdevice's > > internal `struct device` for use with the `dev_*` printing macros. > > > > Note that since the underlying `struct miscdevice` could get freed at > > any point after the fops->open() call, only the open call is given > > access to it. To print from other calls, they should take a refcount on > > the device to keep it alive. > > The lifespan of the miscdevice is at least from open until close, so > it's safe for at least then (i.e. read/write/ioctl/etc.) How is that enforced? What happens if I call misc_deregister while there are open fds? > > Signed-off-by: Alice Ryhl <aliceryhl@google.com> > > --- > > rust/kernel/miscdevice.rs | 19 ++++++++++++++++--- > > 1 file changed, 16 insertions(+), 3 deletions(-) > > > > diff --git a/rust/kernel/miscdevice.rs b/rust/kernel/miscdevice.rs > > index 0cb79676c139..c5af1d5ec4be 100644 > > --- a/rust/kernel/miscdevice.rs > > +++ b/rust/kernel/miscdevice.rs > > @@ -104,7 +104,7 @@ pub trait MiscDevice { > > /// Called when the misc device is opened. > > /// > > /// The returned pointer will be stored as the private data for the file. > > - fn open(_file: &File) -> Result<Self::Ptr>; > > + fn open(_file: &File, _misc: &MiscDeviceRegistration<Self>) -> Result<Self::Ptr>; > > > > /// Called when the misc device is released. > > fn release(device: Self::Ptr, _file: &File) { > > @@ -190,14 +190,27 @@ impl<T: MiscDevice> VtableHelper<T> { > > return ret; > > } > > > > + // SAFETY: The opwn call of a file can access the private data. > > s/opwn/open/ :) > > > + let misc_ptr = unsafe { (*file).private_data }; > > Blank line here? > > > + // SAFETY: This is a miscdevice, so `misc_open()` set the private data to a pointer to the > > + // associated `struct miscdevice` before calling into this method. Furthermore, `misc_open()` > > + // ensures that the miscdevice can't be unregistered and freed during this call to `fops_open`. > > Aren't we wrapping comment lines at 80 columns still? I can't remember > anymore... Not sure what the rules are, but I don't think Rust comments are being wrapped at 80. > > + let misc = unsafe { &*misc_ptr.cast::<MiscDeviceRegistration<T>>() }; > > + > > // SAFETY: > > - // * The file is valid for the duration of this call. > > + // * The file is valid for the duration of the `T::open` call. > > It's valid for the lifespan between open/release. > > > // * There is no active fdget_pos region on the file on this thread. > > - let ptr = match T::open(unsafe { File::from_raw_file(file) }) { > > + let file = unsafe { File::from_raw_file(file) }; > > + > > + let ptr = match T::open(file, misc) { > > Ok(ptr) => ptr, > > Err(err) => return err.to_errno(), > > }; > > > > + // This overwrites the private data from above. It makes sense to not hold on to the misc > > + // pointer since the `struct miscdevice` can get unregistered as soon as we return from this > > + // call, so the misc pointer might be dangling on future file operations. > > + // > > Wait, what are we overwriting this here with? Now private data points > to the misc device when before it was the file structure. No other code > needed to be changed because of that? Can't we enforce this pointer > type somewhere so that any casts in any read/write/ioctl also "knows" it > has the right type? This feels "dangerous" to me. Ultimately, when interfacing with C code using void pointers, Rust is going to need a pointer cast somewhere to assert what the type is. With the current design, that place is the fops_* functions. We need to get the pointer casts right there, but anywhere else the types are enforced. > > // SAFETY: The open call of a file owns the private data. > > unsafe { (*file).private_data = ptr.into_foreign().cast_mut() }; > > Is this SAFETY comment still correct? Well, it could probably be worded better at least. The point is that nobody else is going to touch this field and we can do what we want with it. Alice
On Mon, Dec 09, 2024 at 07:27:47AM +0000, Alice Ryhl wrote: > Providing access to the underlying `struct miscdevice` is useful for > various reasons. For example, this allows you access the miscdevice's > internal `struct device` for use with the `dev_*` printing macros. > > Note that since the underlying `struct miscdevice` could get freed at > any point after the fops->open() call, only the open call is given > access to it. To print from other calls, they should take a refcount on > the device to keep it alive. > > Signed-off-by: Alice Ryhl <aliceryhl@google.com> > --- > rust/kernel/miscdevice.rs | 19 ++++++++++++++++--- > 1 file changed, 16 insertions(+), 3 deletions(-) > > diff --git a/rust/kernel/miscdevice.rs b/rust/kernel/miscdevice.rs > index 0cb79676c139..c5af1d5ec4be 100644 > --- a/rust/kernel/miscdevice.rs > +++ b/rust/kernel/miscdevice.rs > @@ -104,7 +104,7 @@ pub trait MiscDevice { > /// Called when the misc device is opened. > /// > /// The returned pointer will be stored as the private data for the file. > - fn open(_file: &File) -> Result<Self::Ptr>; > + fn open(_file: &File, _misc: &MiscDeviceRegistration<Self>) -> Result<Self::Ptr>; How is the user of this abstraction supposed to access the underlying struct miscdevice e.g. from other fops? AFAICS, there is no way for the user to store a device pointer / reference in their driver private data. I also think it's a bit weird to pass the registration structure in open() to access the device. I think we need an actual representation of a struct miscdevice, i.e. `misc::Device`. We can discuss whether we want to implement it like I implemented `pci::Device` and `platform::Device`, i.e. as an `ARef<device::Device>` or if we do it like you proposed, but I think things should be aligned. > > /// Called when the misc device is released. > fn release(device: Self::Ptr, _file: &File) { > @@ -190,14 +190,27 @@ impl<T: MiscDevice> VtableHelper<T> { > return ret; > } > > + // SAFETY: The opwn call of a file can access the private data. > + let misc_ptr = unsafe { (*file).private_data }; > + // SAFETY: This is a miscdevice, so `misc_open()` set the private data to a pointer to the > + // associated `struct miscdevice` before calling into this method. Furthermore, `misc_open()` > + // ensures that the miscdevice can't be unregistered and freed during this call to `fops_open`. > + let misc = unsafe { &*misc_ptr.cast::<MiscDeviceRegistration<T>>() }; > + > // SAFETY: > - // * The file is valid for the duration of this call. > + // * The file is valid for the duration of the `T::open` call. > // * There is no active fdget_pos region on the file on this thread. > - let ptr = match T::open(unsafe { File::from_raw_file(file) }) { > + let file = unsafe { File::from_raw_file(file) }; > + > + let ptr = match T::open(file, misc) { > Ok(ptr) => ptr, > Err(err) => return err.to_errno(), > }; > > + // This overwrites the private data from above. It makes sense to not hold on to the misc > + // pointer since the `struct miscdevice` can get unregistered as soon as we return from this > + // call, so the misc pointer might be dangling on future file operations. > + // > // SAFETY: The open call of a file owns the private data. > unsafe { (*file).private_data = ptr.into_foreign().cast_mut() }; > > > -- > 2.47.1.545.g3c1d2e2a6a-goog > >
On Mon, Dec 09, 2024 at 11:50:57AM +0100, Alice Ryhl wrote: > On Mon, Dec 9, 2024 at 9:48 AM Greg Kroah-Hartman > <gregkh@linuxfoundation.org> wrote: > > > > On Mon, Dec 09, 2024 at 07:27:47AM +0000, Alice Ryhl wrote: > > > Providing access to the underlying `struct miscdevice` is useful for > > > various reasons. For example, this allows you access the miscdevice's > > > internal `struct device` for use with the `dev_*` printing macros. > > > > > > Note that since the underlying `struct miscdevice` could get freed at > > > any point after the fops->open() call, only the open call is given > > > access to it. To print from other calls, they should take a refcount on > > > the device to keep it alive. > > > > The lifespan of the miscdevice is at least from open until close, so > > it's safe for at least then (i.e. read/write/ioctl/etc.) > > How is that enforced? What happens if I call misc_deregister while > there are open fds? You shouldn't be able to do that as the code that would be calling misc_deregister() (i.e. in a module unload path) would not work because the module reference count is incremented at this point in time due to the file operation module reference. Wait, we are plumbing in the module owner logic here, right? That should be in the file operations structure. Yeah, it's a horrid hack, and one day we will put "real" revoke logic in here to detach the misc device from the file operations if this were to happen. It's a very very common anti-pattern that many subsystems have that is a bug that we all have been talking about for a very very long time. Wolfram even has a plan for how to fix it all up (see his Japan LinuxCon talk from 2 years ago), but I don't think anyone is doing the work on it :( The media and drm layers have internal hacks/work-arounds to try to handle this issue, but luckily for us, the odds of a misc device being dynamically removed from the system is pretty low. Once / if ever, we get the revoke type logic implemented, then we can apply that to the misc device code and follow it through to the rust side if needed. > > > Signed-off-by: Alice Ryhl <aliceryhl@google.com> > > > --- > > > rust/kernel/miscdevice.rs | 19 ++++++++++++++++--- > > > 1 file changed, 16 insertions(+), 3 deletions(-) > > > > > > diff --git a/rust/kernel/miscdevice.rs b/rust/kernel/miscdevice.rs > > > index 0cb79676c139..c5af1d5ec4be 100644 > > > --- a/rust/kernel/miscdevice.rs > > > +++ b/rust/kernel/miscdevice.rs > > > @@ -104,7 +104,7 @@ pub trait MiscDevice { > > > /// Called when the misc device is opened. > > > /// > > > /// The returned pointer will be stored as the private data for the file. > > > - fn open(_file: &File) -> Result<Self::Ptr>; > > > + fn open(_file: &File, _misc: &MiscDeviceRegistration<Self>) -> Result<Self::Ptr>; > > > > > > /// Called when the misc device is released. > > > fn release(device: Self::Ptr, _file: &File) { > > > @@ -190,14 +190,27 @@ impl<T: MiscDevice> VtableHelper<T> { > > > return ret; > > > } > > > > > > + // SAFETY: The opwn call of a file can access the private data. > > > > s/opwn/open/ :) > > > > > + let misc_ptr = unsafe { (*file).private_data }; > > > > Blank line here? > > > > > + // SAFETY: This is a miscdevice, so `misc_open()` set the private data to a pointer to the > > > + // associated `struct miscdevice` before calling into this method. Furthermore, `misc_open()` > > > + // ensures that the miscdevice can't be unregistered and freed during this call to `fops_open`. > > > > Aren't we wrapping comment lines at 80 columns still? I can't remember > > anymore... > > Not sure what the rules are, but I don't think Rust comments are being > wrapped at 80. Ok, that's fine, I didn't remember either. > > > + let misc = unsafe { &*misc_ptr.cast::<MiscDeviceRegistration<T>>() }; > > > + > > > // SAFETY: > > > - // * The file is valid for the duration of this call. > > > + // * The file is valid for the duration of the `T::open` call. > > > > It's valid for the lifespan between open/release. > > > > > // * There is no active fdget_pos region on the file on this thread. > > > - let ptr = match T::open(unsafe { File::from_raw_file(file) }) { > > > + let file = unsafe { File::from_raw_file(file) }; > > > + > > > + let ptr = match T::open(file, misc) { > > > Ok(ptr) => ptr, > > > Err(err) => return err.to_errno(), > > > }; > > > > > > + // This overwrites the private data from above. It makes sense to not hold on to the misc > > > + // pointer since the `struct miscdevice` can get unregistered as soon as we return from this > > > + // call, so the misc pointer might be dangling on future file operations. > > > + // > > > > Wait, what are we overwriting this here with? Now private data points > > to the misc device when before it was the file structure. No other code > > needed to be changed because of that? Can't we enforce this pointer > > type somewhere so that any casts in any read/write/ioctl also "knows" it > > has the right type? This feels "dangerous" to me. > > Ultimately, when interfacing with C code using void pointers, Rust is > going to need a pointer cast somewhere to assert what the type is. > With the current design, that place is the fops_* functions. We need > to get the pointer casts right there, but anywhere else the types are > enforced. So where else is this type enforced? A read/write/ioctl call also wants this pointer, or is it up to the open call to stash it somewhere that those calls can get to it? It's hanging off of the file pointer now: > > > // SAFETY: The open call of a file owns the private data. > > > unsafe { (*file).private_data = ptr.into_foreign().cast_mut() }; So any place that casts back from that structure needs to get this correct. Is that up to each individual misc device driver to do it (to be fair, that's what the C drivers do, just wanting to be sure here.) Ah, wait, I get it, you are just storing the "raw" pointer to the rust implementation of the structure, NOT the C "raw" pointer like it currently is today. You are making this all match up with the existing code. Sorry for the noise, this all makes sense to me now, I didn't have enough coffee for that first code review. > > > > Is this SAFETY comment still correct? > > Well, it could probably be worded better at least. The point is that > nobody else is going to touch this field and we can do what we want > with it. True, ok, that's fine. Care to respin this with at least the typo fixed? thanks, greg k-h
On Mon, Dec 09, 2024 at 12:07:13PM +0100, Danilo Krummrich wrote: > On Mon, Dec 09, 2024 at 07:27:47AM +0000, Alice Ryhl wrote: > > Providing access to the underlying `struct miscdevice` is useful for > > various reasons. For example, this allows you access the miscdevice's > > internal `struct device` for use with the `dev_*` printing macros. > > > > Note that since the underlying `struct miscdevice` could get freed at > > any point after the fops->open() call, only the open call is given > > access to it. To print from other calls, they should take a refcount on > > the device to keep it alive. > > > > Signed-off-by: Alice Ryhl <aliceryhl@google.com> > > --- > > rust/kernel/miscdevice.rs | 19 ++++++++++++++++--- > > 1 file changed, 16 insertions(+), 3 deletions(-) > > > > diff --git a/rust/kernel/miscdevice.rs b/rust/kernel/miscdevice.rs > > index 0cb79676c139..c5af1d5ec4be 100644 > > --- a/rust/kernel/miscdevice.rs > > +++ b/rust/kernel/miscdevice.rs > > @@ -104,7 +104,7 @@ pub trait MiscDevice { > > /// Called when the misc device is opened. > > /// > > /// The returned pointer will be stored as the private data for the file. > > - fn open(_file: &File) -> Result<Self::Ptr>; > > + fn open(_file: &File, _misc: &MiscDeviceRegistration<Self>) -> Result<Self::Ptr>; > > How is the user of this abstraction supposed to access the underlying struct > miscdevice e.g. from other fops? AFAICS, there is no way for the user to store a > device pointer / reference in their driver private data. That should be "hung" off of the miscdevice structure. In C that's done through embedding the miscdevice structure within something else, don't know how you all are going to do that in rust :) Or, better yet, in your open callback, the rust code can set the file private data pointer, that's what is done a lot as well, either could work. > I also think it's a bit weird to pass the registration structure in open() to > access the device. That's what the miscdevice api does today in C. Well, it's embedded in the file private pointer, so I guess just a function to call to get it instead would work. > I think we need an actual representation of a struct miscdevice, i.e. > `misc::Device`. I thought we have that already? thanks, greg k-h
On Mon, Dec 9, 2024 at 12:07 PM Danilo Krummrich <dakr@kernel.org> wrote: > > On Mon, Dec 09, 2024 at 07:27:47AM +0000, Alice Ryhl wrote: > > Providing access to the underlying `struct miscdevice` is useful for > > various reasons. For example, this allows you access the miscdevice's > > internal `struct device` for use with the `dev_*` printing macros. > > > > Note that since the underlying `struct miscdevice` could get freed at > > any point after the fops->open() call, only the open call is given > > access to it. To print from other calls, they should take a refcount on > > the device to keep it alive. > > > > Signed-off-by: Alice Ryhl <aliceryhl@google.com> > > --- > > rust/kernel/miscdevice.rs | 19 ++++++++++++++++--- > > 1 file changed, 16 insertions(+), 3 deletions(-) > > > > diff --git a/rust/kernel/miscdevice.rs b/rust/kernel/miscdevice.rs > > index 0cb79676c139..c5af1d5ec4be 100644 > > --- a/rust/kernel/miscdevice.rs > > +++ b/rust/kernel/miscdevice.rs > > @@ -104,7 +104,7 @@ pub trait MiscDevice { > > /// Called when the misc device is opened. > > /// > > /// The returned pointer will be stored as the private data for the file. > > - fn open(_file: &File) -> Result<Self::Ptr>; > > + fn open(_file: &File, _misc: &MiscDeviceRegistration<Self>) -> Result<Self::Ptr>; > > How is the user of this abstraction supposed to access the underlying struct > miscdevice e.g. from other fops? AFAICS, there is no way for the user to store a > device pointer / reference in their driver private data. I had assumed that the miscdevice does not necessarily live long enough for that to be okay ... but if it does we can change it. See other thread with Greg. > I also think it's a bit weird to pass the registration structure in open() to > access the device. > > I think we need an actual representation of a struct miscdevice, i.e. > `misc::Device`. It sounds like we can just rename `MiscDeviceRegistration` to `Device`. > We can discuss whether we want to implement it like I implemented `pci::Device` > and `platform::Device`, i.e. as an `ARef<device::Device>` or if we do it like > you proposed, but I think things should be aligned. Let's figure out the lifetime of `struct miscdevice` first ... Alice
On Mon, Dec 9, 2024 at 12:10 PM Greg Kroah-Hartman <gregkh@linuxfoundation.org> wrote: > > On Mon, Dec 09, 2024 at 11:50:57AM +0100, Alice Ryhl wrote: > > On Mon, Dec 9, 2024 at 9:48 AM Greg Kroah-Hartman > > <gregkh@linuxfoundation.org> wrote: > > > > > > On Mon, Dec 09, 2024 at 07:27:47AM +0000, Alice Ryhl wrote: > > > > Providing access to the underlying `struct miscdevice` is useful for > > > > various reasons. For example, this allows you access the miscdevice's > > > > internal `struct device` for use with the `dev_*` printing macros. > > > > > > > > Note that since the underlying `struct miscdevice` could get freed at > > > > any point after the fops->open() call, only the open call is given > > > > access to it. To print from other calls, they should take a refcount on > > > > the device to keep it alive. > > > > > > The lifespan of the miscdevice is at least from open until close, so > > > it's safe for at least then (i.e. read/write/ioctl/etc.) > > > > How is that enforced? What happens if I call misc_deregister while > > there are open fds? > > You shouldn't be able to do that as the code that would be calling > misc_deregister() (i.e. in a module unload path) would not work because > the module reference count is incremented at this point in time due to > the file operation module reference. Oh .. so misc_deregister must only be called when the module is being unloaded? > Wait, we are plumbing in the module owner logic here, right? That > should be in the file operations structure. Right ... it's missing but I will add it. > Yeah, it's a horrid hack, and one day we will put "real" revoke logic in > here to detach the misc device from the file operations if this were to > happen. It's a very very common anti-pattern that many subsystems have > that is a bug that we all have been talking about for a very very long > time. Wolfram even has a plan for how to fix it all up (see his Japan > LinuxCon talk from 2 years ago), but I don't think anyone is doing the > work on it :( > > The media and drm layers have internal hacks/work-arounds to try to > handle this issue, but luckily for us, the odds of a misc device being > dynamically removed from the system is pretty low. > > Once / if ever, we get the revoke type logic implemented, then we can > apply that to the misc device code and follow it through to the rust > side if needed. If dynamically deregistering is not safe, then we need to change the Rust abstractions to prevent it. Alice
On Mon, Dec 09, 2024 at 12:38:32PM +0100, Alice Ryhl wrote: > On Mon, Dec 9, 2024 at 12:10 PM Greg Kroah-Hartman > <gregkh@linuxfoundation.org> wrote: > > > > On Mon, Dec 09, 2024 at 11:50:57AM +0100, Alice Ryhl wrote: > > > On Mon, Dec 9, 2024 at 9:48 AM Greg Kroah-Hartman > > > <gregkh@linuxfoundation.org> wrote: > > > > > > > > On Mon, Dec 09, 2024 at 07:27:47AM +0000, Alice Ryhl wrote: > > > > > Providing access to the underlying `struct miscdevice` is useful for > > > > > various reasons. For example, this allows you access the miscdevice's > > > > > internal `struct device` for use with the `dev_*` printing macros. > > > > > > > > > > Note that since the underlying `struct miscdevice` could get freed at > > > > > any point after the fops->open() call, only the open call is given > > > > > access to it. To print from other calls, they should take a refcount on > > > > > the device to keep it alive. > > > > > > > > The lifespan of the miscdevice is at least from open until close, so > > > > it's safe for at least then (i.e. read/write/ioctl/etc.) > > > > > > How is that enforced? What happens if I call misc_deregister while > > > there are open fds? > > > > You shouldn't be able to do that as the code that would be calling > > misc_deregister() (i.e. in a module unload path) would not work because > > the module reference count is incremented at this point in time due to > > the file operation module reference. > > Oh .. so misc_deregister must only be called when the module is being unloaded? Traditionally yes, that's when it is called. Do you see it happening in any other place in the kernel today? > > Wait, we are plumbing in the module owner logic here, right? That > > should be in the file operations structure. > > Right ... it's missing but I will add it. Thanks! > > Yeah, it's a horrid hack, and one day we will put "real" revoke logic in > > here to detach the misc device from the file operations if this were to > > happen. It's a very very common anti-pattern that many subsystems have > > that is a bug that we all have been talking about for a very very long > > time. Wolfram even has a plan for how to fix it all up (see his Japan > > LinuxCon talk from 2 years ago), but I don't think anyone is doing the > > work on it :( > > > > The media and drm layers have internal hacks/work-arounds to try to > > handle this issue, but luckily for us, the odds of a misc device being > > dynamically removed from the system is pretty low. > > > > Once / if ever, we get the revoke type logic implemented, then we can > > apply that to the misc device code and follow it through to the rust > > side if needed. > > If dynamically deregistering is not safe, then we need to change the > Rust abstractions to prevent it. Dynamically deregistering is not unsafe, it's just that I don't think you will physically ever have the misc_deregister() path called if a file handle is open. Same should be the case for rust code, it should "just work" without any extra code to do so. thanks, greg k-h
On Mon, Dec 9, 2024 at 12:53 PM Greg Kroah-Hartman <gregkh@linuxfoundation.org> wrote: > > On Mon, Dec 09, 2024 at 12:38:32PM +0100, Alice Ryhl wrote: > > On Mon, Dec 9, 2024 at 12:10 PM Greg Kroah-Hartman > > <gregkh@linuxfoundation.org> wrote: > > > > > > On Mon, Dec 09, 2024 at 11:50:57AM +0100, Alice Ryhl wrote: > > > > On Mon, Dec 9, 2024 at 9:48 AM Greg Kroah-Hartman > > > > <gregkh@linuxfoundation.org> wrote: > > > > > > > > > > On Mon, Dec 09, 2024 at 07:27:47AM +0000, Alice Ryhl wrote: > > > > > > Providing access to the underlying `struct miscdevice` is useful for > > > > > > various reasons. For example, this allows you access the miscdevice's > > > > > > internal `struct device` for use with the `dev_*` printing macros. > > > > > > > > > > > > Note that since the underlying `struct miscdevice` could get freed at > > > > > > any point after the fops->open() call, only the open call is given > > > > > > access to it. To print from other calls, they should take a refcount on > > > > > > the device to keep it alive. > > > > > > > > > > The lifespan of the miscdevice is at least from open until close, so > > > > > it's safe for at least then (i.e. read/write/ioctl/etc.) > > > > > > > > How is that enforced? What happens if I call misc_deregister while > > > > there are open fds? > > > > > > You shouldn't be able to do that as the code that would be calling > > > misc_deregister() (i.e. in a module unload path) would not work because > > > the module reference count is incremented at this point in time due to > > > the file operation module reference. > > > > Oh .. so misc_deregister must only be called when the module is being unloaded? > > Traditionally yes, that's when it is called. Do you see it happening in > any other place in the kernel today? I had not looked, but I know that Binder allows dynamically creating and removing its devices at runtime. It happens to be the case that this is only supported when binderfs is used, which is when it doesn't use miscdevice, so technically Binder does not call misc_deregister() outside of module unload, but following its example it's not hard to imagine that such removals could happen. > > > Yeah, it's a horrid hack, and one day we will put "real" revoke logic in > > > here to detach the misc device from the file operations if this were to > > > happen. It's a very very common anti-pattern that many subsystems have > > > that is a bug that we all have been talking about for a very very long > > > time. Wolfram even has a plan for how to fix it all up (see his Japan > > > LinuxCon talk from 2 years ago), but I don't think anyone is doing the > > > work on it :( > > > > > > The media and drm layers have internal hacks/work-arounds to try to > > > handle this issue, but luckily for us, the odds of a misc device being > > > dynamically removed from the system is pretty low. > > > > > > Once / if ever, we get the revoke type logic implemented, then we can > > > apply that to the misc device code and follow it through to the rust > > > side if needed. > > > > If dynamically deregistering is not safe, then we need to change the > > Rust abstractions to prevent it. > > Dynamically deregistering is not unsafe, it's just that I don't think > you will physically ever have the misc_deregister() path called if a > file handle is open. Same should be the case for rust code, it should > "just work" without any extra code to do so. Well, if I give files access to the struct miscdevice in all fops hooks, then deregistering does become unsafe since accessing it in an ioctl after deregistering would be a UAF. I'd like to prevent the user from doing that. Alice
On Mon, Dec 09, 2024 at 01:00:05PM +0100, Alice Ryhl wrote: > On Mon, Dec 9, 2024 at 12:53 PM Greg Kroah-Hartman > <gregkh@linuxfoundation.org> wrote: > > > > On Mon, Dec 09, 2024 at 12:38:32PM +0100, Alice Ryhl wrote: > > > On Mon, Dec 9, 2024 at 12:10 PM Greg Kroah-Hartman > > > <gregkh@linuxfoundation.org> wrote: > > > > > > > > On Mon, Dec 09, 2024 at 11:50:57AM +0100, Alice Ryhl wrote: > > > > > On Mon, Dec 9, 2024 at 9:48 AM Greg Kroah-Hartman > > > > > <gregkh@linuxfoundation.org> wrote: > > > > > > > > > > > > On Mon, Dec 09, 2024 at 07:27:47AM +0000, Alice Ryhl wrote: > > > > > > > Providing access to the underlying `struct miscdevice` is useful for > > > > > > > various reasons. For example, this allows you access the miscdevice's > > > > > > > internal `struct device` for use with the `dev_*` printing macros. > > > > > > > > > > > > > > Note that since the underlying `struct miscdevice` could get freed at > > > > > > > any point after the fops->open() call, only the open call is given > > > > > > > access to it. To print from other calls, they should take a refcount on > > > > > > > the device to keep it alive. > > > > > > > > > > > > The lifespan of the miscdevice is at least from open until close, so > > > > > > it's safe for at least then (i.e. read/write/ioctl/etc.) > > > > > > > > > > How is that enforced? What happens if I call misc_deregister while > > > > > there are open fds? > > > > > > > > You shouldn't be able to do that as the code that would be calling > > > > misc_deregister() (i.e. in a module unload path) would not work because > > > > the module reference count is incremented at this point in time due to > > > > the file operation module reference. > > > > > > Oh .. so misc_deregister must only be called when the module is being unloaded? > > > > Traditionally yes, that's when it is called. Do you see it happening in > > any other place in the kernel today? > > I had not looked, but I know that Binder allows dynamically creating > and removing its devices at runtime. It happens to be the case that > this is only supported when binderfs is used, which is when it doesn't > use miscdevice, so technically Binder does not call misc_deregister() > outside of module unload, but following its example it's not hard to > imagine that such removals could happen. That's why those are files and not misc devices :) > > > > Yeah, it's a horrid hack, and one day we will put "real" revoke logic in > > > > here to detach the misc device from the file operations if this were to > > > > happen. It's a very very common anti-pattern that many subsystems have > > > > that is a bug that we all have been talking about for a very very long > > > > time. Wolfram even has a plan for how to fix it all up (see his Japan > > > > LinuxCon talk from 2 years ago), but I don't think anyone is doing the > > > > work on it :( > > > > > > > > The media and drm layers have internal hacks/work-arounds to try to > > > > handle this issue, but luckily for us, the odds of a misc device being > > > > dynamically removed from the system is pretty low. > > > > > > > > Once / if ever, we get the revoke type logic implemented, then we can > > > > apply that to the misc device code and follow it through to the rust > > > > side if needed. > > > > > > If dynamically deregistering is not safe, then we need to change the > > > Rust abstractions to prevent it. > > > > Dynamically deregistering is not unsafe, it's just that I don't think > > you will physically ever have the misc_deregister() path called if a > > file handle is open. Same should be the case for rust code, it should > > "just work" without any extra code to do so. > > Well, if I give files access to the struct miscdevice in all fops > hooks, then deregistering does become unsafe since accessing it in an > ioctl after deregistering would be a UAF. I'd like to prevent the user > from doing that. I don't think that the deregister would succeed in the vfs layer if an open file reference was currently held, but I haven't tried that in a long time. If you can come up with a way to prevent that, wonderful, but I wouldn't worry too much as again, this "should not" happen due to the file reference count, and if it does, it's a major logic error on the driver's part, just like we have today in C. thanks, greg k-h
On Mon, Dec 9, 2024 at 1:08 PM Greg Kroah-Hartman <gregkh@linuxfoundation.org> wrote: > > On Mon, Dec 09, 2024 at 01:00:05PM +0100, Alice Ryhl wrote: > > On Mon, Dec 9, 2024 at 12:53 PM Greg Kroah-Hartman > > <gregkh@linuxfoundation.org> wrote: > > > > > > On Mon, Dec 09, 2024 at 12:38:32PM +0100, Alice Ryhl wrote: > > > > On Mon, Dec 9, 2024 at 12:10 PM Greg Kroah-Hartman > > > > <gregkh@linuxfoundation.org> wrote: > > > > > > > > > > On Mon, Dec 09, 2024 at 11:50:57AM +0100, Alice Ryhl wrote: > > > > > > On Mon, Dec 9, 2024 at 9:48 AM Greg Kroah-Hartman > > > > > > <gregkh@linuxfoundation.org> wrote: > > > > > > > > > > > > > > On Mon, Dec 09, 2024 at 07:27:47AM +0000, Alice Ryhl wrote: > > > > > > > > Providing access to the underlying `struct miscdevice` is useful for > > > > > > > > various reasons. For example, this allows you access the miscdevice's > > > > > > > > internal `struct device` for use with the `dev_*` printing macros. > > > > > > > > > > > > > > > > Note that since the underlying `struct miscdevice` could get freed at > > > > > > > > any point after the fops->open() call, only the open call is given > > > > > > > > access to it. To print from other calls, they should take a refcount on > > > > > > > > the device to keep it alive. > > > > > > > > > > > > > > The lifespan of the miscdevice is at least from open until close, so > > > > > > > it's safe for at least then (i.e. read/write/ioctl/etc.) > > > > > > > > > > > > How is that enforced? What happens if I call misc_deregister while > > > > > > there are open fds? > > > > > > > > > > You shouldn't be able to do that as the code that would be calling > > > > > misc_deregister() (i.e. in a module unload path) would not work because > > > > > the module reference count is incremented at this point in time due to > > > > > the file operation module reference. > > > > > > > > Oh .. so misc_deregister must only be called when the module is being unloaded? > > > > > > Traditionally yes, that's when it is called. Do you see it happening in > > > any other place in the kernel today? > > > > I had not looked, but I know that Binder allows dynamically creating > > and removing its devices at runtime. It happens to be the case that > > this is only supported when binderfs is used, which is when it doesn't > > use miscdevice, so technically Binder does not call misc_deregister() > > outside of module unload, but following its example it's not hard to > > imagine that such removals could happen. > > That's why those are files and not misc devices :) I grepped for misc_deregister and the first driver I looked at is drivers/misc/bcm-vk which seems to allow dynamic deregistration if the pci device is removed. Another tricky path is error cleanup in its probe function. Technically, if probe fails after registering the misc device, there's a brief moment where you could open the miscdevice before it gets removed in the cleanup path, which seems to me that it could lead to UAF? Or is there something I'm missing? Alice
On Mon, Dec 09, 2024 at 01:53:42PM +0100, Alice Ryhl wrote: > On Mon, Dec 9, 2024 at 1:08 PM Greg Kroah-Hartman > <gregkh@linuxfoundation.org> wrote: > > > > On Mon, Dec 09, 2024 at 01:00:05PM +0100, Alice Ryhl wrote: > > > On Mon, Dec 9, 2024 at 12:53 PM Greg Kroah-Hartman > > > <gregkh@linuxfoundation.org> wrote: > > > > > > > > On Mon, Dec 09, 2024 at 12:38:32PM +0100, Alice Ryhl wrote: > > > > > On Mon, Dec 9, 2024 at 12:10 PM Greg Kroah-Hartman > > > > > <gregkh@linuxfoundation.org> wrote: > > > > > > > > > > > > On Mon, Dec 09, 2024 at 11:50:57AM +0100, Alice Ryhl wrote: > > > > > > > On Mon, Dec 9, 2024 at 9:48 AM Greg Kroah-Hartman > > > > > > > <gregkh@linuxfoundation.org> wrote: > > > > > > > > > > > > > > > > On Mon, Dec 09, 2024 at 07:27:47AM +0000, Alice Ryhl wrote: > > > > > > > > > Providing access to the underlying `struct miscdevice` is useful for > > > > > > > > > various reasons. For example, this allows you access the miscdevice's > > > > > > > > > internal `struct device` for use with the `dev_*` printing macros. > > > > > > > > > > > > > > > > > > Note that since the underlying `struct miscdevice` could get freed at > > > > > > > > > any point after the fops->open() call, only the open call is given > > > > > > > > > access to it. To print from other calls, they should take a refcount on > > > > > > > > > the device to keep it alive. > > > > > > > > > > > > > > > > The lifespan of the miscdevice is at least from open until close, so > > > > > > > > it's safe for at least then (i.e. read/write/ioctl/etc.) > > > > > > > > > > > > > > How is that enforced? What happens if I call misc_deregister while > > > > > > > there are open fds? > > > > > > > > > > > > You shouldn't be able to do that as the code that would be calling > > > > > > misc_deregister() (i.e. in a module unload path) would not work because > > > > > > the module reference count is incremented at this point in time due to > > > > > > the file operation module reference. > > > > > > > > > > Oh .. so misc_deregister must only be called when the module is being unloaded? > > > > > > > > Traditionally yes, that's when it is called. Do you see it happening in > > > > any other place in the kernel today? > > > > > > I had not looked, but I know that Binder allows dynamically creating > > > and removing its devices at runtime. It happens to be the case that > > > this is only supported when binderfs is used, which is when it doesn't > > > use miscdevice, so technically Binder does not call misc_deregister() > > > outside of module unload, but following its example it's not hard to > > > imagine that such removals could happen. > > > > That's why those are files and not misc devices :) > > I grepped for misc_deregister and the first driver I looked at is > drivers/misc/bcm-vk which seems to allow dynamic deregistration if the > pci device is removed. Ah, yeah, that's going to get messy and will be a problem if someone has the file open then. > Another tricky path is error cleanup in its probe function. > Technically, if probe fails after registering the misc device, there's > a brief moment where you could open the miscdevice before it gets > removed in the cleanup path, which seems to me that it could lead to > UAF? > > Or is there something I'm missing? Nope, that too is a window of a problem, luckily you "should" only register the misc device after you know the device is safe to use as once it is registered, it could be used so it "should" be the last thing you do in probe. So yes, you are right, and we do know about these issues (again see the talk I mentioned and some previous ones for many years at plumbers conferences by different people.) It's just up to someone to do the work to fix them. If you think we can prevent the race in the rust side, wonderful, I'm all for that being a valid fix. thanks, greg k-h
On Mon, Dec 9, 2024 at 2:13 PM Greg Kroah-Hartman <gregkh@linuxfoundation.org> wrote: > > On Mon, Dec 09, 2024 at 01:53:42PM +0100, Alice Ryhl wrote: > > On Mon, Dec 9, 2024 at 1:08 PM Greg Kroah-Hartman > > <gregkh@linuxfoundation.org> wrote: > > > > > > On Mon, Dec 09, 2024 at 01:00:05PM +0100, Alice Ryhl wrote: > > > > On Mon, Dec 9, 2024 at 12:53 PM Greg Kroah-Hartman > > > > <gregkh@linuxfoundation.org> wrote: > > > > > > > > > > On Mon, Dec 09, 2024 at 12:38:32PM +0100, Alice Ryhl wrote: > > > > > > On Mon, Dec 9, 2024 at 12:10 PM Greg Kroah-Hartman > > > > > > <gregkh@linuxfoundation.org> wrote: > > > > > > > > > > > > > > On Mon, Dec 09, 2024 at 11:50:57AM +0100, Alice Ryhl wrote: > > > > > > > > On Mon, Dec 9, 2024 at 9:48 AM Greg Kroah-Hartman > > > > > > > > <gregkh@linuxfoundation.org> wrote: > > > > > > > > > > > > > > > > > > On Mon, Dec 09, 2024 at 07:27:47AM +0000, Alice Ryhl wrote: > > > > > > > > > > Providing access to the underlying `struct miscdevice` is useful for > > > > > > > > > > various reasons. For example, this allows you access the miscdevice's > > > > > > > > > > internal `struct device` for use with the `dev_*` printing macros. > > > > > > > > > > > > > > > > > > > > Note that since the underlying `struct miscdevice` could get freed at > > > > > > > > > > any point after the fops->open() call, only the open call is given > > > > > > > > > > access to it. To print from other calls, they should take a refcount on > > > > > > > > > > the device to keep it alive. > > > > > > > > > > > > > > > > > > The lifespan of the miscdevice is at least from open until close, so > > > > > > > > > it's safe for at least then (i.e. read/write/ioctl/etc.) > > > > > > > > > > > > > > > > How is that enforced? What happens if I call misc_deregister while > > > > > > > > there are open fds? > > > > > > > > > > > > > > You shouldn't be able to do that as the code that would be calling > > > > > > > misc_deregister() (i.e. in a module unload path) would not work because > > > > > > > the module reference count is incremented at this point in time due to > > > > > > > the file operation module reference. > > > > > > > > > > > > Oh .. so misc_deregister must only be called when the module is being unloaded? > > > > > > > > > > Traditionally yes, that's when it is called. Do you see it happening in > > > > > any other place in the kernel today? > > > > > > > > I had not looked, but I know that Binder allows dynamically creating > > > > and removing its devices at runtime. It happens to be the case that > > > > this is only supported when binderfs is used, which is when it doesn't > > > > use miscdevice, so technically Binder does not call misc_deregister() > > > > outside of module unload, but following its example it's not hard to > > > > imagine that such removals could happen. > > > > > > That's why those are files and not misc devices :) > > > > I grepped for misc_deregister and the first driver I looked at is > > drivers/misc/bcm-vk which seems to allow dynamic deregistration if the > > pci device is removed. > > Ah, yeah, that's going to get messy and will be a problem if someone has > the file open then. > > > Another tricky path is error cleanup in its probe function. > > Technically, if probe fails after registering the misc device, there's > > a brief moment where you could open the miscdevice before it gets > > removed in the cleanup path, which seems to me that it could lead to > > UAF? > > > > Or is there something I'm missing? > > Nope, that too is a window of a problem, luckily you "should" only > register the misc device after you know the device is safe to use as > once it is registered, it could be used so it "should" be the last thing > you do in probe. > > So yes, you are right, and we do know about these issues (again see the > talk I mentioned and some previous ones for many years at plumbers > conferences by different people.) It's just up to someone to do the > work to fix them. > > If you think we can prevent the race in the rust side, wonderful, I'm > all for that being a valid fix. The current patch prevents the race by only allowing access to the `struct miscdevice` in fops->open(). That's safe since `file->f_op->open` runs with `misc_mtx` held. Do we really need the miscdevice to stay alive for longer? You can already take a refcount on `this_device` if you want to keep the device alive for longer for dev_* printing purposes, but it seems like that is the only field you really need from the `struct miscdevice` past fops->open()? Alice
Hi Alice, kernel test robot noticed the following build errors: [auto build test ERROR on 40384c840ea1944d7c5a392e8975ed088ecf0b37] url: https://github.com/intel-lab-lkp/linux/commits/Alice-Ryhl/rust-miscdevice-access-file-in-fops/20241209-153054 base: 40384c840ea1944d7c5a392e8975ed088ecf0b37 patch link: https://lore.kernel.org/r/20241209-miscdevice-file-param-v2-2-83ece27e9ff6%40google.com patch subject: [PATCH v2 2/2] rust: miscdevice: access the `struct miscdevice` from fops->open() config: x86_64-rhel-9.4-rust (https://download.01.org/0day-ci/archive/20241209/202412092214.P4acQ6Rn-lkp@intel.com/config) compiler: clang version 19.1.3 (https://github.com/llvm/llvm-project ab51eccf88f5321e7c60591c5546b254b6afab99) reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20241209/202412092214.P4acQ6Rn-lkp@intel.com/reproduce) If you fix the issue in a separate patch/commit (i.e. not just a new version of the same patch/commit), kindly add following tags | Reported-by: kernel test robot <lkp@intel.com> | Closes: https://lore.kernel.org/oe-kbuild-all/202412092214.P4acQ6Rn-lkp@intel.com/ All errors (new ones prefixed by >>): In file included from arch/x86/kernel/asm-offsets.c:14: In file included from include/linux/suspend.h:5: In file included from include/linux/swap.h:9: In file included from include/linux/memcontrol.h:21: In file included from include/linux/mm.h:2223: include/linux/vmstat.h:504:43: warning: arithmetic between different enumeration types ('enum zone_stat_item' and 'enum numa_stat_item') [-Wenum-enum-conversion] 504 | return vmstat_text[NR_VM_ZONE_STAT_ITEMS + | ~~~~~~~~~~~~~~~~~~~~~ ^ 505 | item]; | ~~~~ include/linux/vmstat.h:511:43: warning: arithmetic between different enumeration types ('enum zone_stat_item' and 'enum numa_stat_item') [-Wenum-enum-conversion] 511 | return vmstat_text[NR_VM_ZONE_STAT_ITEMS + | ~~~~~~~~~~~~~~~~~~~~~ ^ 512 | NR_VM_NUMA_EVENT_ITEMS + | ~~~~~~~~~~~~~~~~~~~~~~ include/linux/vmstat.h:518:36: warning: arithmetic between different enumeration types ('enum node_stat_item' and 'enum lru_list') [-Wenum-enum-conversion] 518 | return node_stat_name(NR_LRU_BASE + lru) + 3; // skip "nr_" | ~~~~~~~~~~~ ^ ~~~ include/linux/vmstat.h:524:43: warning: arithmetic between different enumeration types ('enum zone_stat_item' and 'enum numa_stat_item') [-Wenum-enum-conversion] 524 | return vmstat_text[NR_VM_ZONE_STAT_ITEMS + | ~~~~~~~~~~~~~~~~~~~~~ ^ 525 | NR_VM_NUMA_EVENT_ITEMS + | ~~~~~~~~~~~~~~~~~~~~~~ 4 warnings generated. *** *** Rust bindings generator 'bindgen' < 0.69.5 together with libclang >= 19.1 *** may not work due to a bug (https://github.com/rust-lang/rust-bindgen/pull/2824), *** unless patched (like Debian's). *** Your bindgen version: 0.65.1 *** Your libclang version: 19.1.3 *** *** *** Please see Documentation/rust/quick-start.rst for details *** on how to set up the Rust support. *** In file included from rust/helpers/helpers.c:10: In file included from rust/helpers/blk.c:3: In file included from include/linux/blk-mq.h:5: In file included from include/linux/blkdev.h:9: In file included from include/linux/blk_types.h:10: In file included from include/linux/bvec.h:10: In file included from include/linux/highmem.h:8: In file included from include/linux/cacheflush.h:5: In file included from arch/x86/include/asm/cacheflush.h:5: In file included from include/linux/mm.h:2223: include/linux/vmstat.h:504:43: warning: arithmetic between different enumeration types ('enum zone_stat_item' and 'enum numa_stat_item') [-Wenum-enum-conversion] 504 | return vmstat_text[NR_VM_ZONE_STAT_ITEMS + | ~~~~~~~~~~~~~~~~~~~~~ ^ 505 | item]; | ~~~~ include/linux/vmstat.h:511:43: warning: arithmetic between different enumeration types ('enum zone_stat_item' and 'enum numa_stat_item') [-Wenum-enum-conversion] 511 | return vmstat_text[NR_VM_ZONE_STAT_ITEMS + | ~~~~~~~~~~~~~~~~~~~~~ ^ 512 | NR_VM_NUMA_EVENT_ITEMS + | ~~~~~~~~~~~~~~~~~~~~~~ include/linux/vmstat.h:518:36: warning: arithmetic between different enumeration types ('enum node_stat_item' and 'enum lru_list') [-Wenum-enum-conversion] 518 | return node_stat_name(NR_LRU_BASE + lru) + 3; // skip "nr_" | ~~~~~~~~~~~ ^ ~~~ include/linux/vmstat.h:524:43: warning: arithmetic between different enumeration types ('enum zone_stat_item' and 'enum numa_stat_item') [-Wenum-enum-conversion] 524 | return vmstat_text[NR_VM_ZONE_STAT_ITEMS + | ~~~~~~~~~~~~~~~~~~~~~ ^ 525 | NR_VM_NUMA_EVENT_ITEMS + | ~~~~~~~~~~~~~~~~~~~~~~ clang diag: include/linux/vmstat.h:504:43: warning: arithmetic between different enumeration types ('enum zone_stat_item' and 'enum numa_stat_item') [-Wenum-enum-conversion] clang diag: include/linux/vmstat.h:511:43: warning: arithmetic between different enumeration types ('enum zone_stat_item' and 'enum numa_stat_item') [-Wenum-enum-conversion] clang diag: include/linux/vmstat.h:518:36: warning: arithmetic between different enumeration types ('enum node_stat_item' and 'enum lru_list') [-Wenum-enum-conversion] clang diag: include/linux/vmstat.h:524:43: warning: arithmetic between different enumeration types ('enum zone_stat_item' and 'enum numa_stat_item') [-Wenum-enum-conversion] 4 warnings generated. clang diag: include/linux/vmstat.h:504:43: warning: arithmetic between different enumeration types ('enum zone_stat_item' and 'enum numa_stat_item') [-Wenum-enum-conversion] clang diag: include/linux/vmstat.h:511:43: warning: arithmetic between different enumeration types ('enum zone_stat_item' and 'enum numa_stat_item') [-Wenum-enum-conversion] clang diag: include/linux/vmstat.h:518:36: warning: arithmetic between different enumeration types ('enum node_stat_item' and 'enum lru_list') [-Wenum-enum-conversion] clang diag: include/linux/vmstat.h:524:43: warning: arithmetic between different enumeration types ('enum zone_stat_item' and 'enum numa_stat_item') [-Wenum-enum-conversion] clang diag: include/linux/vmstat.h:504:43: warning: arithmetic between different enumeration types ('enum zone_stat_item' and 'enum numa_stat_item') [-Wenum-enum-conversion] clang diag: include/linux/vmstat.h:511:43: warning: arithmetic between different enumeration types ('enum zone_stat_item' and 'enum numa_stat_item') [-Wenum-enum-conversion] clang diag: include/linux/vmstat.h:518:36: warning: arithmetic between different enumeration types ('enum node_stat_item' and 'enum lru_list') [-Wenum-enum-conversion] clang diag: include/linux/vmstat.h:524:43: warning: arithmetic between different enumeration types ('enum zone_stat_item' and 'enum numa_stat_item') [-Wenum-enum-conversion] >> error[E0277]: the size for values of type `Self` cannot be known at compilation time --> rust/kernel/miscdevice.rs:107:35 | 107 | fn open(_file: &File, _misc: &MiscDeviceRegistration<Self>) -> Result<Self::Ptr>; | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ doesn't have a size known at compile-time | note: required by an implicit `Sized` bound in `MiscDeviceRegistration` --> rust/kernel/miscdevice.rs:52:35 | 52 | pub struct MiscDeviceRegistration<T> { | ^ required by the implicit `Sized` requirement on this type parameter in `MiscDeviceRegistration` help: consider further restricting `Self` | 107 | fn open(_file: &File, _misc: &MiscDeviceRegistration<Self>) -> Result<Self::Ptr> where Self: Sized; | +++++++++++++++++ help: consider relaxing the implicit `Sized` restriction | 52 | pub struct MiscDeviceRegistration<T: ?Sized> { | ++++++++ -- >> error[E0609]: no field `private_data` on type `File` --> rust/kernel/miscdevice.rs:215:22 | 215 | unsafe { (*file).private_data = ptr.into_foreign().cast_mut() }; | ^^^^^^^^^^^^ unknown field
On Mon, Dec 09, 2024 at 02:36:31PM +0100, Alice Ryhl wrote: > On Mon, Dec 9, 2024 at 2:13 PM Greg Kroah-Hartman > <gregkh@linuxfoundation.org> wrote: > > > > On Mon, Dec 09, 2024 at 01:53:42PM +0100, Alice Ryhl wrote: > > > On Mon, Dec 9, 2024 at 1:08 PM Greg Kroah-Hartman > > > <gregkh@linuxfoundation.org> wrote: > > > > > > > > On Mon, Dec 09, 2024 at 01:00:05PM +0100, Alice Ryhl wrote: > > > > > On Mon, Dec 9, 2024 at 12:53 PM Greg Kroah-Hartman > > > > > <gregkh@linuxfoundation.org> wrote: > > > > > > > > > > > > On Mon, Dec 09, 2024 at 12:38:32PM +0100, Alice Ryhl wrote: > > > > > > > On Mon, Dec 9, 2024 at 12:10 PM Greg Kroah-Hartman > > > > > > > <gregkh@linuxfoundation.org> wrote: > > > > > > > > > > > > > > > > On Mon, Dec 09, 2024 at 11:50:57AM +0100, Alice Ryhl wrote: > > > > > > > > > On Mon, Dec 9, 2024 at 9:48 AM Greg Kroah-Hartman > > > > > > > > > <gregkh@linuxfoundation.org> wrote: > > > > > > > > > > > > > > > > > > > > On Mon, Dec 09, 2024 at 07:27:47AM +0000, Alice Ryhl wrote: > > > > > > > > > > > Providing access to the underlying `struct miscdevice` is useful for > > > > > > > > > > > various reasons. For example, this allows you access the miscdevice's > > > > > > > > > > > internal `struct device` for use with the `dev_*` printing macros. > > > > > > > > > > > > > > > > > > > > > > Note that since the underlying `struct miscdevice` could get freed at > > > > > > > > > > > any point after the fops->open() call, only the open call is given > > > > > > > > > > > access to it. To print from other calls, they should take a refcount on > > > > > > > > > > > the device to keep it alive. > > > > > > > > > > > > > > > > > > > > The lifespan of the miscdevice is at least from open until close, so > > > > > > > > > > it's safe for at least then (i.e. read/write/ioctl/etc.) > > > > > > > > > > > > > > > > > > How is that enforced? What happens if I call misc_deregister while > > > > > > > > > there are open fds? > > > > > > > > > > > > > > > > You shouldn't be able to do that as the code that would be calling > > > > > > > > misc_deregister() (i.e. in a module unload path) would not work because > > > > > > > > the module reference count is incremented at this point in time due to > > > > > > > > the file operation module reference. > > > > > > > > > > > > > > Oh .. so misc_deregister must only be called when the module is being unloaded? > > > > > > > > > > > > Traditionally yes, that's when it is called. Do you see it happening in > > > > > > any other place in the kernel today? > > > > > > > > > > I had not looked, but I know that Binder allows dynamically creating > > > > > and removing its devices at runtime. It happens to be the case that > > > > > this is only supported when binderfs is used, which is when it doesn't > > > > > use miscdevice, so technically Binder does not call misc_deregister() > > > > > outside of module unload, but following its example it's not hard to > > > > > imagine that such removals could happen. > > > > > > > > That's why those are files and not misc devices :) > > > > > > I grepped for misc_deregister and the first driver I looked at is > > > drivers/misc/bcm-vk which seems to allow dynamic deregistration if the > > > pci device is removed. > > > > Ah, yeah, that's going to get messy and will be a problem if someone has > > the file open then. > > > > > Another tricky path is error cleanup in its probe function. > > > Technically, if probe fails after registering the misc device, there's > > > a brief moment where you could open the miscdevice before it gets > > > removed in the cleanup path, which seems to me that it could lead to > > > UAF? > > > > > > Or is there something I'm missing? > > > > Nope, that too is a window of a problem, luckily you "should" only > > register the misc device after you know the device is safe to use as > > once it is registered, it could be used so it "should" be the last thing > > you do in probe. > > > > So yes, you are right, and we do know about these issues (again see the > > talk I mentioned and some previous ones for many years at plumbers > > conferences by different people.) It's just up to someone to do the > > work to fix them. > > > > If you think we can prevent the race in the rust side, wonderful, I'm > > all for that being a valid fix. > > The current patch prevents the race by only allowing access to the > `struct miscdevice` in fops->open(). That's safe since > `file->f_op->open` runs with `misc_mtx` held. Do we really need the > miscdevice to stay alive for longer? You can already take a refcount > on `this_device` if you want to keep the device alive for longer for > dev_* printing purposes, but it seems like that is the only field you > really need from the `struct miscdevice` past fops->open()? Good point, I also can't really see anything within struct miscdevice that a driver could need other than `this_device`. How would you provide the `device::Device` within the `MiscDevice` trait functions? If we don't guarantee that the `struct miscdevice` is still alive past open() we need to take a reference on `this_device` in open(). I guess the idea would be to let `MiscDeviceRegistration` provide a function to obtain an `ARef<device::Device>`? > > Alice >
On Mon, Dec 9, 2024 at 4:01 PM Danilo Krummrich <dakr@kernel.org> wrote: > > On Mon, Dec 09, 2024 at 02:36:31PM +0100, Alice Ryhl wrote: > > On Mon, Dec 9, 2024 at 2:13 PM Greg Kroah-Hartman > > <gregkh@linuxfoundation.org> wrote: > > > > > > On Mon, Dec 09, 2024 at 01:53:42PM +0100, Alice Ryhl wrote: > > > > On Mon, Dec 9, 2024 at 1:08 PM Greg Kroah-Hartman > > > > <gregkh@linuxfoundation.org> wrote: > > > > > > > > > > On Mon, Dec 09, 2024 at 01:00:05PM +0100, Alice Ryhl wrote: > > > > > > On Mon, Dec 9, 2024 at 12:53 PM Greg Kroah-Hartman > > > > > > <gregkh@linuxfoundation.org> wrote: > > > > > > > > > > > > > > On Mon, Dec 09, 2024 at 12:38:32PM +0100, Alice Ryhl wrote: > > > > > > > > On Mon, Dec 9, 2024 at 12:10 PM Greg Kroah-Hartman > > > > > > > > <gregkh@linuxfoundation.org> wrote: > > > > > > > > > > > > > > > > > > On Mon, Dec 09, 2024 at 11:50:57AM +0100, Alice Ryhl wrote: > > > > > > > > > > On Mon, Dec 9, 2024 at 9:48 AM Greg Kroah-Hartman > > > > > > > > > > <gregkh@linuxfoundation.org> wrote: > > > > > > > > > > > > > > > > > > > > > > On Mon, Dec 09, 2024 at 07:27:47AM +0000, Alice Ryhl wrote: > > > > > > > > > > > > Providing access to the underlying `struct miscdevice` is useful for > > > > > > > > > > > > various reasons. For example, this allows you access the miscdevice's > > > > > > > > > > > > internal `struct device` for use with the `dev_*` printing macros. > > > > > > > > > > > > > > > > > > > > > > > > Note that since the underlying `struct miscdevice` could get freed at > > > > > > > > > > > > any point after the fops->open() call, only the open call is given > > > > > > > > > > > > access to it. To print from other calls, they should take a refcount on > > > > > > > > > > > > the device to keep it alive. > > > > > > > > > > > > > > > > > > > > > > The lifespan of the miscdevice is at least from open until close, so > > > > > > > > > > > it's safe for at least then (i.e. read/write/ioctl/etc.) > > > > > > > > > > > > > > > > > > > > How is that enforced? What happens if I call misc_deregister while > > > > > > > > > > there are open fds? > > > > > > > > > > > > > > > > > > You shouldn't be able to do that as the code that would be calling > > > > > > > > > misc_deregister() (i.e. in a module unload path) would not work because > > > > > > > > > the module reference count is incremented at this point in time due to > > > > > > > > > the file operation module reference. > > > > > > > > > > > > > > > > Oh .. so misc_deregister must only be called when the module is being unloaded? > > > > > > > > > > > > > > Traditionally yes, that's when it is called. Do you see it happening in > > > > > > > any other place in the kernel today? > > > > > > > > > > > > I had not looked, but I know that Binder allows dynamically creating > > > > > > and removing its devices at runtime. It happens to be the case that > > > > > > this is only supported when binderfs is used, which is when it doesn't > > > > > > use miscdevice, so technically Binder does not call misc_deregister() > > > > > > outside of module unload, but following its example it's not hard to > > > > > > imagine that such removals could happen. > > > > > > > > > > That's why those are files and not misc devices :) > > > > > > > > I grepped for misc_deregister and the first driver I looked at is > > > > drivers/misc/bcm-vk which seems to allow dynamic deregistration if the > > > > pci device is removed. > > > > > > Ah, yeah, that's going to get messy and will be a problem if someone has > > > the file open then. > > > > > > > Another tricky path is error cleanup in its probe function. > > > > Technically, if probe fails after registering the misc device, there's > > > > a brief moment where you could open the miscdevice before it gets > > > > removed in the cleanup path, which seems to me that it could lead to > > > > UAF? > > > > > > > > Or is there something I'm missing? > > > > > > Nope, that too is a window of a problem, luckily you "should" only > > > register the misc device after you know the device is safe to use as > > > once it is registered, it could be used so it "should" be the last thing > > > you do in probe. > > > > > > So yes, you are right, and we do know about these issues (again see the > > > talk I mentioned and some previous ones for many years at plumbers > > > conferences by different people.) It's just up to someone to do the > > > work to fix them. > > > > > > If you think we can prevent the race in the rust side, wonderful, I'm > > > all for that being a valid fix. > > > > The current patch prevents the race by only allowing access to the > > `struct miscdevice` in fops->open(). That's safe since > > `file->f_op->open` runs with `misc_mtx` held. Do we really need the > > miscdevice to stay alive for longer? You can already take a refcount > > on `this_device` if you want to keep the device alive for longer for > > dev_* printing purposes, but it seems like that is the only field you > > really need from the `struct miscdevice` past fops->open()? > > Good point, I also can't really see anything within struct miscdevice that a > driver could need other than `this_device`. > > How would you provide the `device::Device` within the `MiscDevice` trait > functions? > > If we don't guarantee that the `struct miscdevice` is still alive past open() we > need to take a reference on `this_device` in open(). > > I guess the idea would be to let `MiscDeviceRegistration` provide a function to > obtain an `ARef<device::Device>`? Yes, you take a refcount on the device and store an ARef<device::Device> in your own struct. You would need Lee's accessor to obtain the device refcount: https://lore.kernel.org/all/20241206090515.752267-3-lee@kernel.org/ Alice
On Mon, Dec 09, 2024 at 04:04:48PM +0100, Alice Ryhl wrote: > On Mon, Dec 9, 2024 at 4:01 PM Danilo Krummrich <dakr@kernel.org> wrote: > > > > On Mon, Dec 09, 2024 at 02:36:31PM +0100, Alice Ryhl wrote: > > > On Mon, Dec 9, 2024 at 2:13 PM Greg Kroah-Hartman > > > <gregkh@linuxfoundation.org> wrote: > > > > > > > > On Mon, Dec 09, 2024 at 01:53:42PM +0100, Alice Ryhl wrote: > > > > > On Mon, Dec 9, 2024 at 1:08 PM Greg Kroah-Hartman > > > > > <gregkh@linuxfoundation.org> wrote: > > > > > > > > > > > > On Mon, Dec 09, 2024 at 01:00:05PM +0100, Alice Ryhl wrote: > > > > > > > On Mon, Dec 9, 2024 at 12:53 PM Greg Kroah-Hartman > > > > > > > <gregkh@linuxfoundation.org> wrote: > > > > > > > > > > > > > > > > On Mon, Dec 09, 2024 at 12:38:32PM +0100, Alice Ryhl wrote: > > > > > > > > > On Mon, Dec 9, 2024 at 12:10 PM Greg Kroah-Hartman > > > > > > > > > <gregkh@linuxfoundation.org> wrote: > > > > > > > > > > > > > > > > > > > > On Mon, Dec 09, 2024 at 11:50:57AM +0100, Alice Ryhl wrote: > > > > > > > > > > > On Mon, Dec 9, 2024 at 9:48 AM Greg Kroah-Hartman > > > > > > > > > > > <gregkh@linuxfoundation.org> wrote: > > > > > > > > > > > > > > > > > > > > > > > > On Mon, Dec 09, 2024 at 07:27:47AM +0000, Alice Ryhl wrote: > > > > > > > > > > > > > Providing access to the underlying `struct miscdevice` is useful for > > > > > > > > > > > > > various reasons. For example, this allows you access the miscdevice's > > > > > > > > > > > > > internal `struct device` for use with the `dev_*` printing macros. > > > > > > > > > > > > > > > > > > > > > > > > > > Note that since the underlying `struct miscdevice` could get freed at > > > > > > > > > > > > > any point after the fops->open() call, only the open call is given > > > > > > > > > > > > > access to it. To print from other calls, they should take a refcount on > > > > > > > > > > > > > the device to keep it alive. > > > > > > > > > > > > > > > > > > > > > > > > The lifespan of the miscdevice is at least from open until close, so > > > > > > > > > > > > it's safe for at least then (i.e. read/write/ioctl/etc.) > > > > > > > > > > > > > > > > > > > > > > How is that enforced? What happens if I call misc_deregister while > > > > > > > > > > > there are open fds? > > > > > > > > > > > > > > > > > > > > You shouldn't be able to do that as the code that would be calling > > > > > > > > > > misc_deregister() (i.e. in a module unload path) would not work because > > > > > > > > > > the module reference count is incremented at this point in time due to > > > > > > > > > > the file operation module reference. > > > > > > > > > > > > > > > > > > Oh .. so misc_deregister must only be called when the module is being unloaded? > > > > > > > > > > > > > > > > Traditionally yes, that's when it is called. Do you see it happening in > > > > > > > > any other place in the kernel today? > > > > > > > > > > > > > > I had not looked, but I know that Binder allows dynamically creating > > > > > > > and removing its devices at runtime. It happens to be the case that > > > > > > > this is only supported when binderfs is used, which is when it doesn't > > > > > > > use miscdevice, so technically Binder does not call misc_deregister() > > > > > > > outside of module unload, but following its example it's not hard to > > > > > > > imagine that such removals could happen. > > > > > > > > > > > > That's why those are files and not misc devices :) > > > > > > > > > > I grepped for misc_deregister and the first driver I looked at is > > > > > drivers/misc/bcm-vk which seems to allow dynamic deregistration if the > > > > > pci device is removed. > > > > > > > > Ah, yeah, that's going to get messy and will be a problem if someone has > > > > the file open then. > > > > > > > > > Another tricky path is error cleanup in its probe function. > > > > > Technically, if probe fails after registering the misc device, there's > > > > > a brief moment where you could open the miscdevice before it gets > > > > > removed in the cleanup path, which seems to me that it could lead to > > > > > UAF? > > > > > > > > > > Or is there something I'm missing? > > > > > > > > Nope, that too is a window of a problem, luckily you "should" only > > > > register the misc device after you know the device is safe to use as > > > > once it is registered, it could be used so it "should" be the last thing > > > > you do in probe. > > > > > > > > So yes, you are right, and we do know about these issues (again see the > > > > talk I mentioned and some previous ones for many years at plumbers > > > > conferences by different people.) It's just up to someone to do the > > > > work to fix them. > > > > > > > > If you think we can prevent the race in the rust side, wonderful, I'm > > > > all for that being a valid fix. > > > > > > The current patch prevents the race by only allowing access to the > > > `struct miscdevice` in fops->open(). That's safe since > > > `file->f_op->open` runs with `misc_mtx` held. Do we really need the > > > miscdevice to stay alive for longer? You can already take a refcount > > > on `this_device` if you want to keep the device alive for longer for > > > dev_* printing purposes, but it seems like that is the only field you > > > really need from the `struct miscdevice` past fops->open()? > > > > Good point, I also can't really see anything within struct miscdevice that a > > driver could need other than `this_device`. > > > > How would you provide the `device::Device` within the `MiscDevice` trait > > functions? > > > > If we don't guarantee that the `struct miscdevice` is still alive past open() we > > need to take a reference on `this_device` in open(). > > > > I guess the idea would be to let `MiscDeviceRegistration` provide a function to > > obtain an `ARef<device::Device>`? > > Yes, you take a refcount on the device and store an > ARef<device::Device> in your own struct. You would need Lee's accessor > to obtain the device refcount: > https://lore.kernel.org/all/20241206090515.752267-3-lee@kernel.org/ Sounds good! > > Alice
diff --git a/rust/kernel/miscdevice.rs b/rust/kernel/miscdevice.rs index 0cb79676c139..c5af1d5ec4be 100644 --- a/rust/kernel/miscdevice.rs +++ b/rust/kernel/miscdevice.rs @@ -104,7 +104,7 @@ pub trait MiscDevice { /// Called when the misc device is opened. /// /// The returned pointer will be stored as the private data for the file. - fn open(_file: &File) -> Result<Self::Ptr>; + fn open(_file: &File, _misc: &MiscDeviceRegistration<Self>) -> Result<Self::Ptr>; /// Called when the misc device is released. fn release(device: Self::Ptr, _file: &File) { @@ -190,14 +190,27 @@ impl<T: MiscDevice> VtableHelper<T> { return ret; } + // SAFETY: The opwn call of a file can access the private data. + let misc_ptr = unsafe { (*file).private_data }; + // SAFETY: This is a miscdevice, so `misc_open()` set the private data to a pointer to the + // associated `struct miscdevice` before calling into this method. Furthermore, `misc_open()` + // ensures that the miscdevice can't be unregistered and freed during this call to `fops_open`. + let misc = unsafe { &*misc_ptr.cast::<MiscDeviceRegistration<T>>() }; + // SAFETY: - // * The file is valid for the duration of this call. + // * The file is valid for the duration of the `T::open` call. // * There is no active fdget_pos region on the file on this thread. - let ptr = match T::open(unsafe { File::from_raw_file(file) }) { + let file = unsafe { File::from_raw_file(file) }; + + let ptr = match T::open(file, misc) { Ok(ptr) => ptr, Err(err) => return err.to_errno(), }; + // This overwrites the private data from above. It makes sense to not hold on to the misc + // pointer since the `struct miscdevice` can get unregistered as soon as we return from this + // call, so the misc pointer might be dangling on future file operations. + // // SAFETY: The open call of a file owns the private data. unsafe { (*file).private_data = ptr.into_foreign().cast_mut() };
Providing access to the underlying `struct miscdevice` is useful for various reasons. For example, this allows you access the miscdevice's internal `struct device` for use with the `dev_*` printing macros. Note that since the underlying `struct miscdevice` could get freed at any point after the fops->open() call, only the open call is given access to it. To print from other calls, they should take a refcount on the device to keep it alive. Signed-off-by: Alice Ryhl <aliceryhl@google.com> --- rust/kernel/miscdevice.rs | 19 ++++++++++++++++--- 1 file changed, 16 insertions(+), 3 deletions(-)