mbox series

[RFC,0/5] Rust block device driver API and null block driver

Message ID 20240313110515.70088-1-nmi@metaspace.dk (mailing list archive)
Headers show
Series Rust block device driver API and null block driver | expand

Message

Andreas Hindborg March 13, 2024, 11:05 a.m. UTC
From: Andreas Hindborg <a.hindborg@samsung.com>

Hi All!

This is the second version of the Rust block device driver API and the Rust null
block driver. The context and motivation can be seen in cover letter of the RFC
v1 [1]. If more context is required, a talk about this effort was recorded at
LPC [2]. I hope to be able to discuss this series at LSF this year [3].

The series is still in RFC state, as many dependencies have not yet been merged.

Changes from v1:
 - Improved request lifetime tracking
 - Changed rust null block driver name to `rnull`
 - Added timer completion support to `rnull`
 - Added softirq completion support to `rnull`
 - Moved to `xarray` for memory backing
 - Adopted the folio abstraction where applicable
 - Added `SAFETY` comments to all unsafe blocks
 - Improved documentation and added examples
 - Fixed up `vtable` default method behavior
 - Dropped `unsafe` from vtable builder
 - Fixed a safety issue in `RawWriter`
 - Dropped the `TagSetRef` abstraction
 - Renamed `Bio::iter` -> `Bio::raw_iter`
 - Updated `core::fmt::Display for Bio` for readability
 - Simplified `BioIterator::next`
 - Documented and refactored `bvec_iter` functions
 - Moved cache alignment out of `Lock`
 - Added MAINTAINER entry

Thanks for all the input on v1!


Performance
===========

Rather than include an abundance of performance numbers in this letter, I would
refer to [4] for a comparison of performance to the C null block driver. In
general across all of the benchmark configurations, the mean of the difference
is -2.5%, with individual configurations showing [-10%;30%] difference in IO/s.


Request lifetime modeling
=========================

While implementing timer completion, it became clear that the request lifetime
modeling applied in the v1 RFC was not adequate. With that modeling, combined
with the timer callback feature, a developer would be able to write safe Rust
code that violates the Rust invariants for references, thus leading to undefined
behavior. A similar situation would happen in the completion path when
developers write code to convert from a completion id to a `&Request`.

To make these situations safe, and thus free from undefined behavior, the
`Request` type now applies reference counting. The logic piggybacks on the
`atomic_t ref` field of the C `struct request`. In order to make this work, the
Rust code has to be able to free requests when the refcount reaches zero.
Therefore, the series exposes (EXPORT_SYMBOL_GPL) the internal blk-mq symbol
`__blk_mq_free_request`.

I am curious what the community thinks of this approach and whether it is OK to
expose this symbol so that Rust can call it.

One consequence of these changes is that requests that are processed by a Rust
block device driver, are freed at a different place in the completion path, than
requests processed by C drivers. Requests processed by C drivers are typically
freed and recycled inline during the call to `blk_mq_complete_request`. The
requests processed by a Rust driver will be recycled when the `RequestRef` is
dropped, which is usually at some time after the request has been completed.

There does not seem to be any statistically significant effect on performance
for the rust null block implementation due to this change.


Dependencies
============

This series is based on the branch `rnull-deps-v6.8` of [5]. This tree is based
on the upstream `v6.8` tag and has been prepared with dependencies for this
series:

- [17] rust: add `CacheAligned` for easy cache line alignment of values
- [16] rust: xarray: add mutable access through `Guard`
- [15] rust: hrtimer: introduce hrtimer support
- [14] rust: add `Opaque::try_ffi_init`
- [13] rust: make `ARef` repr transparent
- [12] rust: folio: add `with_slice_into_page()`
- [11] rust: page: add `from_raw()`
- [10] rust: page: add `with_slice_into_page()`
- [ 9] IM: implement `ForeignOwnable` for `Pin`
- [ 8] IM: add `module_params` macro
- [ 7] rust: xarray: fix example
- [ 6] LIST: rust: xarray: Add an abstraction for XArray
- [ 5] LIST: rust: types: add FOREIGN_ALIGN to ForeignOwnable
- [ 4] LIST: rust: lock: implement `IrqSaveBackend` for `SpinLock`
- [ 3] LIST: rust: lock: add support for `Lock::lock_irqsave`
- [ 2] LIST: rust: add improved version of `ForeignOwnable::borrow_mut`
- [ 1] LIST: rust: folio: introduce basic support for folios
- [ 0] LIST: rust: add abstraction for `struct page`

Dependencies 0-6 are patches that have appeared on the list but are not yet
merged. Dependencies 8-9 are imports from the `rust` branch [6,7] that have not
yet been submitted. Dependencies 10-17 I will submit independently in the near
future.

If the upstream maintainers agree, then when the dependencies are merged, I will
eventually start submitting PRs to Jens. I fully commit to maintain this code as
indicated in the MAINTAINERS entry.

Best regards,
Andreas Hindborg


[1] https://lore.kernel.org/rust-for-linux/20230503090708.2524310-1-nmi@metaspace.dk
[2] https://lpc.events/event/17/contributions/1425
[3] https://lore.kernel.org/rust-for-linux/87v87cgsp8.fsf@metaspace.dk
[4] https://rust-for-linux.com/null-block-driver
[5] git https://github.com/metaspace/linux.git rnull-deps-v6.8
[6] git https://github.com/rust-for-Linux/linux.git rust
[7] https://rust-for-linux.com/branches#rust


Andreas Hindborg (5):
  rust: block: introduce `kernel::block::mq` module
  rust: block: introduce `kernel::block::bio` module
  rust: block: allow `hrtimer::Timer` in `RequestData`
  rust: block: add rnull, Rust null_blk implementation
  MAINTAINERS: add entry for Rust block device driver API

 MAINTAINERS                        |  14 ++
 block/blk-mq.c                     |   3 +-
 drivers/block/Kconfig              |   4 +
 drivers/block/Makefile             |   3 +
 drivers/block/rnull.rs             | 323 +++++++++++++++++++++++++++
 include/linux/blk-mq.h             |   1 +
 rust/bindings/bindings_helper.h    |   2 +
 rust/helpers.c                     |  46 ++++
 rust/kernel/block.rs               |   6 +
 rust/kernel/block/bio.rs           | 112 ++++++++++
 rust/kernel/block/bio/vec.rs       | 279 +++++++++++++++++++++++
 rust/kernel/block/mq.rs            | 131 +++++++++++
 rust/kernel/block/mq/gen_disk.rs   | 174 +++++++++++++++
 rust/kernel/block/mq/operations.rs | 346 +++++++++++++++++++++++++++++
 rust/kernel/block/mq/raw_writer.rs |  60 +++++
 rust/kernel/block/mq/request.rs    | 269 ++++++++++++++++++++++
 rust/kernel/block/mq/tag_set.rs    | 117 ++++++++++
 rust/kernel/error.rs               |   5 +
 rust/kernel/lib.rs                 |   1 +
 scripts/Makefile.build             |   2 +-
 20 files changed, 1896 insertions(+), 2 deletions(-)
 create mode 100644 drivers/block/rnull.rs
 create mode 100644 rust/kernel/block.rs
 create mode 100644 rust/kernel/block/bio.rs
 create mode 100644 rust/kernel/block/bio/vec.rs
 create mode 100644 rust/kernel/block/mq.rs
 create mode 100644 rust/kernel/block/mq/gen_disk.rs
 create mode 100644 rust/kernel/block/mq/operations.rs
 create mode 100644 rust/kernel/block/mq/raw_writer.rs
 create mode 100644 rust/kernel/block/mq/request.rs
 create mode 100644 rust/kernel/block/mq/tag_set.rs


base-commit: e8f897f4afef0031fe618a8e94127a0934896aba
prerequisite-patch-id: 299c2cc48c45c8149e7cb18c6146a0308e5d0a44
prerequisite-patch-id: 5153ee1c410dbdf22a6fd40667712943b4b89a97
prerequisite-patch-id: 4d025bab9bc9741aedecc5327ad18f88f9105271
prerequisite-patch-id: a5e932c86fa6c68234764aa3d7f314e5b534b1d9
prerequisite-patch-id: aef3042976c4c678b7aa96154fc280f9061ebaf7
prerequisite-patch-id: 8bf108ad0af2a3ec89acb8d99ee1b49ca2f51c69
prerequisite-patch-id: a803b221c3232db3258406a4075558e85acefd09
prerequisite-patch-id: 5e9cbcd0dc56a83353f0e4a3b5d4e8d5b51f3160
prerequisite-patch-id: 28bae4a7fe83b36afed9892515a6dde1ea51e98a
prerequisite-patch-id: 5b5ea2a21b37afb05fdf655396a6f74d83bb99c4
prerequisite-patch-id: dc53b6ce21f74726d5d13988398c2954da07bcb6
prerequisite-patch-id: b86d2b14f1770c1d4756ca10b93efaada643e560
prerequisite-patch-id: 6a155859eb9a18afcd22a5bda3350d45d92e2fc7
prerequisite-patch-id: c8ca075008f50d3cf1781c1ea1130a8ee735e7d2
prerequisite-patch-id: b000cd190fe80dea3f4dd9172ecf8787f23b72be
prerequisite-patch-id: b72f1fc3bd44b60911d9d91ecc5a45812a75eba3
prerequisite-patch-id: 167c7770e124b9afa44bead742f90a57ac73f2d7
prerequisite-patch-id: cc24a3135b4f258f4b1ea83ac91c2be4ffe31772

Comments

Bart Van Assche March 13, 2024, 6:02 p.m. UTC | #1
On 3/13/24 04:05, Andreas Hindborg wrote:
> This is the second version of the Rust block device driver API and the Rust null
> block driver. The context and motivation can be seen in cover letter of the RFC
> v1 [1]. If more context is required, a talk about this effort was recorded at
> LPC [2]. I hope to be able to discuss this series at LSF this year [3].

Memory safety may land in C++ in the near future (see also
https://herbsutter.com/2024/03/). If memory-safe C++ or memory-safe C
would be adopted in the kernel, it would allow writing memory-safe
drivers without having to add complicated bindings between existing C
code and new Rust code. Please do not take this as personal criticism -
I appreciate the effort that has been spent on coming up with great
Rust bindings for the Linux kernel block layer.

Thanks,

Bart.
Boqun Feng March 13, 2024, 6:22 p.m. UTC | #2
On Wed, Mar 13, 2024 at 11:02:23AM -0700, Bart Van Assche wrote:
> On 3/13/24 04:05, Andreas Hindborg wrote:
> > This is the second version of the Rust block device driver API and the Rust null
> > block driver. The context and motivation can be seen in cover letter of the RFC
> > v1 [1]. If more context is required, a talk about this effort was recorded at
> > LPC [2]. I hope to be able to discuss this series at LSF this year [3].
> 
> Memory safety may land in C++ in the near future (see also
> https://herbsutter.com/2024/03/). If memory-safe C++ or memory-safe C
> would be adopted in the kernel, it would allow writing memory-safe
> drivers without having to add complicated bindings between existing C

I honestly doubt it, memory-safe is not free, basically you will still
want unsafe part for the performance reason (or interacting with
hardware), and provide a safe API for driver development. I don't think
that part will be gone with a memory-safe C++. So the complication still
exists. But I'm happy to be proved wrong ;-)

Regards,
Boqun

> code and new Rust code. Please do not take this as personal criticism -
> I appreciate the effort that has been spent on coming up with great
> Rust bindings for the Linux kernel block layer.
> 
> Thanks,
> 
> Bart.
>
Andreas Hindborg March 13, 2024, 7:03 p.m. UTC | #3
Boqun Feng <boqun.feng@gmail.com> writes:

> On Wed, Mar 13, 2024 at 11:02:23AM -0700, Bart Van Assche wrote:
>> On 3/13/24 04:05, Andreas Hindborg wrote:
>> > This is the second version of the Rust block device driver API and the Rust null
>> > block driver. The context and motivation can be seen in cover letter of the RFC
>> > v1 [1]. If more context is required, a talk about this effort was recorded at
>> > LPC [2]. I hope to be able to discuss this series at LSF this year [3].
>> 
>> Memory safety may land in C++ in the near future (see also
>> https://herbsutter.com/2024/03/). If memory-safe C++ or memory-safe C
>> would be adopted in the kernel, it would allow writing memory-safe
>> drivers without having to add complicated bindings between existing C
>
> I honestly doubt it, memory-safe is not free, basically you will still
> want unsafe part for the performance reason (or interacting with
> hardware), and provide a safe API for driver development. I don't think
> that part will be gone with a memory-safe C++. So the complication still
> exists. But I'm happy to be proved wrong ;-)

I think it is great that people are starting to realize that bringing
memory safety to other systems languages is a good idea. But from one
person blogging about it to things being ready for production is a long
journey. Language designers have to design ways to express the new
semantics, standards committees has to agree, compiler engineers have to
build and test their compilers. Probably we need a few cycles of this to
get things right. At any rate, it is going to take a while.

Second, as Boqun is saying, interfacing the unsafe C part of the kernel
with memory safe C++ is going to require the same complicated
abstraction logic as the safe Rust APIs are bringing. The complexity in
bringing Rust to the kernel is not coming from interfacing a foreign
language. It stems from the inherent difficulty in designing memory safe
wrappers around unsafe C APIs.

Best regards,
Andreas
Bart Van Assche March 13, 2024, 7:11 p.m. UTC | #4
On 3/13/24 12:03, Andreas Hindborg wrote:
> I think it is great that people are starting to realize that bringing
> memory safety to other systems languages is a good idea.

Totally agree :-)

> But from one person blogging about it to things being ready for
> production is a long journey.
In case you wouldn't be aware of this, Herb Sutter is not a random 
blogger - he is the chair of the ISO C++ standards committee since 2002.

Thanks,

Bart.
Matthew Wilcox March 13, 2024, 7:12 p.m. UTC | #5
On Wed, Mar 13, 2024 at 11:02:23AM -0700, Bart Van Assche wrote:
> On 3/13/24 04:05, Andreas Hindborg wrote:
> > This is the second version of the Rust block device driver API and the Rust null
> > block driver. The context and motivation can be seen in cover letter of the RFC
> > v1 [1]. If more context is required, a talk about this effort was recorded at
> > LPC [2]. I hope to be able to discuss this series at LSF this year [3].
> 
> Memory safety may land in C++ in the near future (see also
> https://herbsutter.com/2024/03/). If memory-safe C++ or memory-safe C
> would be adopted in the kernel, it would allow writing memory-safe
> drivers without having to add complicated bindings between existing C
> code and new Rust code. Please do not take this as personal criticism -
> I appreciate the effort that has been spent on coming up with great
> Rust bindings for the Linux kernel block layer.

You know, this reminds me of when I was at Intel working on NVMe.
We had a product all ready to ship and a manager suggested that we delay
shipping it until the SCSI-over-PCI standard was finished and ratified
so that the same piece of silicon could support both.

Fortunately, said manager was overruled, but it's a great example of
how this kind of FUD can actually work.
Philipp Stanner March 14, 2024, 12:14 p.m. UTC | #6
On Wed, 2024-03-13 at 11:02 -0700, Bart Van Assche wrote:
> On 3/13/24 04:05, Andreas Hindborg wrote:
> > This is the second version of the Rust block device driver API and
> > the Rust null
> > block driver. The context and motivation can be seen in cover
> > letter of the RFC
> > v1 [1]. If more context is required, a talk about this effort was
> > recorded at
> > LPC [2]. I hope to be able to discuss this series at LSF this year
> > [3].
> 
> Memory safety may land in C++ in the near future (see also
> https://herbsutter.com/2024/03/). If memory-safe C++ or memory-safe C
> would be adopted in the kernel, it would allow writing memory-safe
> drivers without having to add complicated bindings between existing C
> code and new Rust code.

Correct, but it would also most likely allow to just arbitrarily ignore
the "modern, memory safe C" (or C++) and write it the old way.

Those discussions are, below the surface, frequently in truth about the
question: Can you (and do you want to) _force_ people?

The Kernel's C already has more memory safety than standardized C:
There's devres, and since last year there's the __cleanup attribute.
– but the thing is, you can just ignore it and do it the old way.

Once you give people freedom (which is necessary frequently), you'll
get people who ignore "the right way to do it". You certainly get them
once thousands of people are participating in your project.
Actually, Rust in userspace has a similar problem: Your coprogrammers
can call unwrap(), just ignoring those neat Result types and blow up
your thread. So you have to review and reject that.

One of the stronger arguments behind the push for Rust is that the
language by design forces you to obey, because otherwise the compiler
will just reject building.


P.


>  Please do not take this as personal criticism -
> I appreciate the effort that has been spent on coming up with great
> Rust bindings for the Linux kernel block layer.
> 
> Thanks,
> 
> Bart.
> 
>
Bart Van Assche March 14, 2024, 5:03 p.m. UTC | #7
On 3/14/24 05:14, Philipp Stanner wrote:
> On Wed, 2024-03-13 at 11:02 -0700, Bart Van Assche wrote:
>> On 3/13/24 04:05, Andreas Hindborg wrote:
>>> This is the second version of the Rust block device driver API and
>>> the Rust null
>>> block driver. The context and motivation can be seen in cover
>>> letter of the RFC
>>> v1 [1]. If more context is required, a talk about this effort was
>>> recorded at
>>> LPC [2]. I hope to be able to discuss this series at LSF this year
>>> [3].
>>
>> Memory safety may land in C++ in the near future (see also
>> https://herbsutter.com/2024/03/). If memory-safe C++ or memory-safe C
>> would be adopted in the kernel, it would allow writing memory-safe
>> drivers without having to add complicated bindings between existing C
>> code and new Rust code.
> 
> Correct, but it would also most likely allow to just arbitrarily ignore
> the "modern, memory safe C" (or C++) and write it the old way.

Linux kernel developers have the choice today between memory-unsafe C
and memory-safe Rust so they already have the freedom to ignore memory
safety by choosing for C. In my opinion a compiler option should be
introduced once memory-safe C or C++ is available that can be set per
source file and that makes the build fail for source files that
unintentionally do not follow the memory-safety rules.

> The Kernel's C already has more memory safety than standardized C:
> There's devres, and since last year there's the __cleanup attribute.
> – but the thing is, you can just ignore it and do it the old way.

devres is controversial - see also Laurent Pinchart, "Why is
devm_kzalloc() harmful and what can we do about it", LPC, 2022
(https://lpc.events/event/16/contributions/1227/).

> One of the stronger arguments behind the push for Rust is that the
> language by design forces you to obey, because otherwise the compiler
> will just reject building.

Rust has a very significant disadvantage that memory-safe C/C++ won't
have: supporting Rust means adding Rust bindings for all C functions
called from Rust code. This forces everyone who wants to change an
interface to also change the Rust bindings and hence will make it
harder to maintain the Linux kernel in its entirety.

Thanks,

Bart.
Conor Dooley March 14, 2024, 5:16 p.m. UTC | #8
Just a passer-by here, but I noticed the link to Laurent's talk..

On Thu, Mar 14, 2024 at 10:03:28AM -0700, Bart Van Assche wrote:
> On 3/14/24 05:14, Philipp Stanner wrote:
> 
> > The Kernel's C already has more memory safety than standardized C:
> > There's devres, and since last year there's the __cleanup attribute.
> > – but the thing is, you can just ignore it and do it the old way.
> 
> devres is controversial - see also Laurent Pinchart, "Why is
> devm_kzalloc() harmful and what can we do about it", LPC, 2022
> (https://lpc.events/event/16/contributions/1227/).

I don't think that's a great thing to cite, that talk prompted a series
of others with (AFAIK*) the most recent being from Bart at LPC this year:
https://lpc.events/event/17/contributions/16f
The TL;DR is that it's not actually problem caused by devres.

* I think Wolfram also talked about it at an automotive conference, but
  that seemed like a bit of a pitch for funding from the safety
  conscious
Andreas Hindborg March 14, 2024, 5:43 p.m. UTC | #9
Hi Bart,

Bart Van Assche <bvanassche@acm.org> writes:
> On 3/14/24 05:14, Philipp Stanner wrote:
>> On Wed, 2024-03-13 at 11:02 -0700, Bart Van Assche wrote:

[...]

>> One of the stronger arguments behind the push for Rust is that the
>> language by design forces you to obey, because otherwise the compiler
>> will just reject building.
>
> Rust has a very significant disadvantage that memory-safe C/C++ won't
> have: supporting Rust means adding Rust bindings for all C functions
> called from Rust code. This forces everyone who wants to change an
> interface to also change the Rust bindings and hence will make it
> harder to maintain the Linux kernel in its entirety.

I think you might be missing a key point here. We actually generate Rust
bindings to the existing C kernel automatically. No hand editing
required, except for some corner cases we currently have with static
methods and certain macros. If we just wanted to call the C APIa
directly, there would be no engineering required. The main reason to
deploy Rust would also go away, we might as well stay in C.

The actual engineering effort goes into building memory safe versions of
the C APIs. This requirement will not magically go away, no matter what
memory safe language (or language extensions) your use to interface the
existing unsafe C APIs.

Best regards,
Andreas
Matthew Wilcox March 17, 2024, 2:50 a.m. UTC | #10
On Wed, Mar 13, 2024 at 12:05:07PM +0100, Andreas Hindborg wrote:
>  - Adopted the folio abstraction where applicable

I don't think this was the correct move.  The memory you're managing
with folios is only used for storing the data being stored in the blkdev.
It's not mapped to userspace, it doesn't make use of the flags (locked,
uptodate, writeback, etc), it doesn't need an LRU list, a mapping,
an index, a refcount or memcg.

I think you should use pages instead of folios to keep a handle on
this memory.  Apologies for the confusion; we're in the middle of a
years-long transition from the overly complex and overused struct page
to splitting it into different data types for different purposes.

More detail on this here, if you're interested:
https://kernelnewbies.org/MatthewWilcox/Memdescs
Andreas Hindborg March 17, 2024, 7:09 a.m. UTC | #11
Matthew Wilcox <willy@infradead.org> writes:

> On Wed, Mar 13, 2024 at 12:05:07PM +0100, Andreas Hindborg wrote:
>>  - Adopted the folio abstraction where applicable
>
> I don't think this was the correct move.  The memory you're managing
> with folios is only used for storing the data being stored in the blkdev.
> It's not mapped to userspace, it doesn't make use of the flags (locked,
> uptodate, writeback, etc), it doesn't need an LRU list, a mapping,
> an index, a refcount or memcg.
>
> I think you should use pages instead of folios to keep a handle on
> this memory.  Apologies for the confusion; we're in the middle of a
> years-long transition from the overly complex and overused struct page
> to splitting it into different data types for different purposes.

Ok, thanks. I'll swap it back.

I was under the impression that using folios also give the advantage
that handles are always head pages. No need to worry about head/tail
pages. If the driver moves to use higher order pages for larger block
sizes, would it then make sense with folios, or are page still
preferred?

> More detail on this here, if you're interested:
> https://kernelnewbies.org/MatthewWilcox/Memdescs

Thanks, that is useful.

Best regards,
Andreas
Matthew Wilcox March 17, 2024, 9:34 p.m. UTC | #12
On Sun, Mar 17, 2024 at 08:09:53AM +0100, Andreas Hindborg wrote:
> I was under the impression that using folios also give the advantage
> that handles are always head pages. No need to worry about head/tail
> pages. If the driver moves to use higher order pages for larger block
> sizes, would it then make sense with folios, or are page still
> preferred?

This is a good question.

If you do enhance this driver to support larger block sizes, I would
recommend indexing the xarray by sector number instead of page number.
You would allocate a compound page, but at no point would you need to
ask the page what order it is (you already know), you don't need to
change the page flags, etc, etc.

The page cache is in a bit of a different spot.  It has to support
multiple folio sizes for the same file; it has to support folios being
split, it has to support lookup from users who don't know what the folio
size is, etc, etc.

I don't think there's ever a point at which you'd need to call
compound_head() because you'd always look up the head page.  You'd still
want an iterator of some kind to copy to a compound page (because you
can only map one at a time).  And that might look a lot like the folio
copying code.  Probably the right way to handle this is for the folio
copying code to call the page copying code since a folio is always
composed of pages.