
[RFC,00/11] Rust null block driver

Message ID 20230503090708.2524310-1-nmi@metaspace.dk (mailing list archive)

Andreas Hindborg May 3, 2023, 9:06 a.m. UTC
From: Andreas Hindborg <a.hindborg@samsung.com>

Hi All!

This is an early preview of a null block driver written in Rust. It is a
follow-up to my LSF topic proposal [1]. I am sending this series with the
intention of collecting feedback and comments at LSF/MM/BPF.

The patches apply on top of the `rust-next` tree (rust-6.4
ea76e08f4d901a450619831a255e9e0a4c0ed162) [4].

A null block driver is a good opportunity to evaluate Rust bindings for the
block layer. It is a small and simple driver, and thus should be simple to
reason about and fairly straightforward to review. Further, the null block
driver is not usually deployed in production environments, so any potential
issues are not going to bring down production workloads.

Being small and simple, the null block driver is a good place to introduce the
Linux kernel storage community to Rust. This will help prepare the community for
future Rust projects and facilitate a better maintenance process for these
projects.

The statistics presented in my previous message [1] show that the C null block
driver has had a significant number of memory-safety problems in the past:
41% of the fixes merged for the C null block driver address memory-safety
issues. This makes the null block driver a good candidate for a rewrite in
Rust.

Patches in this series prefixed with capitalized RUST are code taken from the
R4L `rust` tree [2]. This code is not yet in mainline Linux and has not yet been
submitted as patches for inclusion in Linux. I am not the original author of
these patches, except for small changes required to include them in this series.

The driver performs similarly to the C driver for the implemented features, see
table below [5].

In this table, each cell shows the relative performance of the Rust driver to
the C driver, with the throughput of the C driver in parentheses:
`rel_read rel_write (read_miops write_miops)`. For example, the Rust driver
performs 4.74 percent better than the C driver for fio randread with 2 jobs at
16 KiB block size.
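To make the cell format concrete, here is a small illustrative sketch of how
the relative percentage in each cell relates to the two drivers' IOPS. This is
not part of the benchmark scripts; the function name is hypothetical.

```rust
// Hypothetical helper mirroring how a table cell is computed: relative
// performance of the Rust driver vs. the C driver, in percent.
fn relative_percent(rust_iops: f64, c_iops: f64) -> f64 {
    (rust_iops / c_iops - 1.0) * 100.0
}

fn main() {
    // E.g. the C driver does 0.64 Miops and the Rust driver does 4.74% more.
    let c_miops = 0.64;
    let rust_miops = c_miops * 1.0474;
    // The cell would then read: `4.74 ... (0.64, ...)`.
    println!("{:.2}", relative_percent(rust_miops, c_miops));
}
```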

Over the 432 benchmark configurations, the relative performance of the Rust
driver to the C driver (in terms of IOPS) ranges from -11.8 to 6.8 percent for
reads, with an average of 0.2 percent better. For writes, the span is -4.5 to
16.8 percent, with an average of 0.9 percent better.

For each measurement, the drivers are loaded and a drive is configured with
memory backing and a size of 4 GiB. The C null_blk is configured to match the
implemented modes of the Rust driver: `blocksize` is set to 4 KiB,
`completion_nsec` to 0, `irqmode` to 0 (IRQ_NONE), `queue_mode` to 2 (MQ),
`hw_queue_depth` to 256 and `memory_backed` to 1. For both drivers, the queue
scheduler is set to `none`. The measurements are made using 30-second runs of
`fio` with the `psync` IO engine and workers pinned to separate CPU cores, and
are taken inside a virtual machine (qemu/kvm) on an Intel Alder Lake
workstation (i5-12600).
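For reference, the setup described above corresponds roughly to the following
commands. This is a sketch only; the exact module parameter spellings and fio
flags should be checked against the null_blk documentation and the fio manual
for the kernel and fio versions in use.

```shell
# Load the C null_blk with the settings listed above (4 GiB, memory backed).
modprobe null_blk gb=4 bs=4096 completion_nsec=0 irqmode=0 \
        queue_mode=2 hw_queue_depth=256 memory_backed=1
echo none > /sys/block/nullb0/queue/scheduler

# One configuration point from the table: 2 jobs, 16 KiB random reads,
# psync engine, 30-second time-based run, workers pinned to separate cores.
fio --name=rnull-bench --filename=/dev/nullb0 --ioengine=psync \
    --rw=randread --bs=16k --numjobs=2 --runtime=30 --time_based \
    --cpus_allowed=0-1 --cpus_allowed_policy=split --group_reporting
```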

Given the relatively large spread in these numbers, it is possible that there is
quite a bit of noise in the results. If anybody is able to dedicate some CPU
time to run benchmarks in a more controlled and standardized environment, I
would appreciate it.

The feature list of the driver is relatively short for now:

 - `blk-mq` support
 - Direct completion of IO
 - Read and write requests
 - Optional memory backing

Features available in the C null_blk driver that are currently not implemented
in this work:

 - Bio-based submission
 - NUMA support
 - Block size configuration
 - Multiple devices
 - Dynamic device creation/destruction
 - Soft-IRQ and timer mode
 - Queue depth configuration
 - Queue count configuration
 - Discard operation support
 - Cache emulation
 - Bandwidth throttling
 - Per node hctx
 - IO scheduler configuration
 - Blocking submission mode
 - Shared tags configuration (for >1 device)
 - Zoned storage support
 - Bad block simulation
 - Poll queues

The driver logic itself is implemented entirely in safe Rust, with all unsafe
code fully contained in the wrappers for the C APIs.
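The pattern is the usual one for Rust kernel abstractions: a small amount of
unsafe code upholds the C API's invariants behind a safe interface, so the
driver logic never needs `unsafe`. Below is a self-contained, purely
illustrative sketch of that pattern; none of these names are real kernel APIs.

```rust
// Stand-in for a raw C binding. The caller must uphold the safety contract.
unsafe fn raw_block_read(buf: *const u8, len: usize, idx: usize) -> u8 {
    // SAFETY: caller guarantees `idx < len` and that `buf` is valid for
    // `len` bytes. We only read within those bounds.
    let _ = len;
    *buf.add(idx)
}

/// Safe wrapper: validates the index before delegating to the unsafe call,
/// so callers cannot trigger undefined behavior.
fn block_read(data: &[u8], idx: usize) -> Option<u8> {
    if idx < data.len() {
        // SAFETY: `idx` is bounds-checked above and the slice guarantees
        // `data.as_ptr()` is valid for `data.len()` bytes.
        Some(unsafe { raw_block_read(data.as_ptr(), data.len(), idx) })
    } else {
        None
    }
}

fn main() {
    let sector = [1u8, 2, 3, 4];
    assert_eq!(block_read(&sector, 2), Some(3));
    assert_eq!(block_read(&sector, 9), None);
}
```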

Patches 07/11 and 08/11, which enable the equivalent of the C
`spin_lock_irqsave()`, appeared on the Rust list [6,3] but have not yet been
applied to the `rust-next` tree [4]. I am not the author of these patches, but
I include them here for convenience. Please defer any discussion of these
patches to their original posts.

The null block driver is provided in patch 10/11. Patches 03/11 and 04/11
introduce Rust abstractions for the block C APIs required by the driver.


[1] https://lore.kernel.org/all/87y1ofj5tt.fsf@metaspace.dk/
[2] https://github.com/Rust-for-Linux/linux/tree/rust
[3] https://lore.kernel.org/all/20230411054543.21278-7-wedsonaf@gmail.com/
[4] https://github.com/Rust-for-Linux/linux/tree/rust-next
[6] https://lore.kernel.org/all/20230411054543.21278-6-wedsonaf@gmail.com/

[5]:
+---------+-----------+-------------------------------+-----------------------------+-------------------------------+
| jobs/bs |  workload |               1               |              2              |               3               |
+---------+-----------+-------------------------------+-----------------------------+-------------------------------+
|    4k   |  randread |      1.54 0.00 (0.7,0.0)      |     0.35 0.00 (1.3,0.0)     |      -0.52 0.00 (1.3,0.0)     |
|    8k   |  randread |      3.30 0.00 (0.57,0.0)     |     6.79 0.00 (0.92,0.0)    |      0.24 0.00 (0.97,0.0)     |
|   16k   |  randread |      0.89 0.00 (0.42,0.0)     |     4.74 0.00 (0.64,0.0)    |      3.98 0.00 (0.65,0.0)     |
|   32k   |  randread |      2.04 0.00 (0.27,0.0)     |     3.26 0.00 (0.37,0.0)    |      1.41 0.00 (0.37,0.0)     |
|   64k   |  randread |      1.35 0.00 (0.16,0.0)     |     0.71 0.00 (0.21,0.0)    |      0.66 0.00 (0.22,0.0)     |
|   128k  |  randread |      0.59 0.00 (0.09,0.0)     |     0.13 0.00 (0.1,0.0)     |     -6.47 0.00 (0.11,0.0)     |
|   256k  |  randread |     0.13 0.00 (0.048,0.0)     |    1.00 0.00 (0.055,0.0)    |     0.03 0.00 (0.059,0.0)     |
|   512k  |  randread |     1.16 0.00 (0.023,0.0)     |    -1.82 0.00 (0.027,0.0)   |     -0.90 0.00 (0.028,0.0)    |
|  1024k  |  randread |     0.14 0.00 (0.011,0.0)     |    0.58 0.00 (0.012,0.0)    |     -1.14 0.00 (0.013,0.0)    |
|  2048k  |  randread |     2.02 0.00 (0.005,0.0)     |    0.13 0.00 (0.0062,0.0)   |    -0.37 0.00 (0.0062,0.0)    |
|  4096k  |  randread |     1.21 0.00 (0.0026,0.0)    |   -5.68 0.00 (0.0031,0.0)   |    -2.39 0.00 (0.0029,0.0)    |
|  8192k  |  randread |    -1.02 0.00 (0.0013,0.0)    |    0.66 0.00 (0.0013,0.0)   |    -1.72 0.00 (0.0013,0.0)    |
|    4k   |   randrw  |     2.82 2.82 (0.25,0.25)     |     2.94 2.94 (0.3,0.3)     |     3.19 3.20 (0.34,0.34)     |
|    8k   |   randrw  |     2.80 2.82 (0.21,0.21)     |    2.32 2.32 (0.25,0.25)    |     1.95 1.96 (0.28,0.28)     |
|   16k   |   randrw  |     4.54 4.52 (0.16,0.16)     |     2.29 2.29 (0.2,0.2)     |     2.27 2.30 (0.21,0.21)     |
|   32k   |   randrw  |     3.51 3.51 (0.11,0.11)     |    2.43 2.40 (0.13,0.13)    |     1.38 1.38 (0.14,0.14)     |
|   64k   |   randrw  |    2.40 2.42 (0.069,0.069)    |    1.98 1.99 (0.08,0.08)    |    0.51 0.51 (0.082,0.082)    |
|   128k  |   randrw  |    1.50 1.49 (0.039,0.039)    |   1.28 1.27 (0.042,0.042)   |    0.83 0.84 (0.043,0.043)    |
|   256k  |   randrw  |    2.02 2.06 (0.021,0.021)    |   0.92 0.91 (0.022,0.022)   |    0.70 0.75 (0.022,0.022)    |
|   512k  |   randrw  |    0.82 0.82 (0.011,0.011)    |   0.69 0.70 (0.011,0.011)   |    0.97 0.98 (0.011,0.011)    |
|  1024k  |   randrw  |   3.78 3.84 (0.0046,0.0046)   |  0.98 0.98 (0.0053,0.0053)  |   1.30 1.29 (0.0054,0.0054)   |
|  2048k  |   randrw  |   1.60 1.54 (0.0023,0.0023)   |  1.76 1.69 (0.0026,0.0026)  |   1.23 0.92 (0.0026,0.0026)   |
|  4096k  |   randrw  |   1.18 1.10 (0.0011,0.0012)   |  1.32 1.20 (0.0014,0.0014)  |   1.49 1.41 (0.0013,0.0013)   |
|  8192k  |   randrw  | -2.01 -1.89 (0.00057,0.00057) | 1.20 1.14 (0.00061,0.00061) |  -0.74 -0.61 (0.0006,0.00062) |
|    4k   | randwrite |      0.00 2.48 (0.0,0.41)     |     0.00 3.61 (0.0,0.39)    |      0.00 3.93 (0.0,0.44)     |
|    8k   | randwrite |      0.00 3.46 (0.0,0.35)     |     0.00 1.90 (0.0,0.34)    |      0.00 2.20 (0.0,0.37)     |
|   16k   | randwrite |      0.00 3.01 (0.0,0.28)     |     0.00 2.86 (0.0,0.27)    |      0.00 2.25 (0.0,0.29)     |
|   32k   | randwrite |      0.00 3.02 (0.0,0.19)     |     0.00 2.68 (0.0,0.19)    |      0.00 2.20 (0.0,0.21)     |
|   64k   | randwrite |      0.00 3.39 (0.0,0.12)     |     0.00 3.75 (0.0,0.12)    |     0.00 -0.79 (0.0,0.14)     |
|   128k  | randwrite |     0.00 3.81 (0.0,0.069)     |    0.00 1.10 (0.0,0.072)    |     0.00 5.48 (0.0,0.078)     |
|   256k  | randwrite |     0.00 4.28 (0.0,0.038)     |    0.00 1.27 (0.0,0.037)    |     0.00 -0.10 (0.0,0.041)    |
|   512k  | randwrite |     0.00 2.71 (0.0,0.019)     |    0.00 0.46 (0.0,0.017)    |     0.00 2.96 (0.0,0.019)     |
|  1024k  | randwrite |     0.00 2.67 (0.0,0.0082)    |    0.00 2.54 (0.0,0.0087)   |     0.00 0.20 (0.0,0.0093)    |
|  2048k  | randwrite |     0.00 2.40 (0.0,0.0041)    |    0.00 1.90 (0.0,0.0041)   |     0.00 3.44 (0.0,0.0043)    |
|  4096k  | randwrite |     0.00 3.38 (0.0,0.002)     |    0.00 1.69 (0.0,0.0025)   |     0.00 6.00 (0.0,0.0025)    |
|  8192k  | randwrite |    0.00 3.09 (0.0,0.00098)    |    0.00 1.10 (0.0,0.0012)   |     0.00 0.45 (0.0,0.0012)    |
|    4k   |    read   |      1.26 0.00 (1.0,0.0)      |     1.65 0.00 (2.0,0.0)     |      -9.94 0.00 (2.8,0.0)     |
|    8k   |    read   |      1.86 0.00 (0.78,0.0)     |     4.60 0.00 (1.5,0.0)     |      0.47 0.00 (1.7,0.0)      |
|   16k   |    read   |      1.13 0.00 (0.54,0.0)     |     4.00 0.00 (0.95,0.0)    |     -10.28 0.00 (0.97,0.0)    |
|   32k   |    read   |      0.64 0.00 (0.32,0.0)     |     5.54 0.00 (0.46,0.0)    |     -0.67 0.00 (0.47,0.0)     |
|   64k   |    read   |      0.29 0.00 (0.18,0.0)     |    -0.08 0.00 (0.25,0.0)    |     -0.82 0.00 (0.25,0.0)     |
|   128k  |    read   |     0.11 0.00 (0.095,0.0)     |     0.12 0.00 (0.11,0.0)    |     -4.99 0.00 (0.12,0.0)     |
|   256k  |    read   |      1.66 0.00 (0.05,0.0)     |    -0.41 0.00 (0.058,0.0)   |     -0.30 0.00 (0.061,0.0)    |
|   512k  |    read   |     1.06 0.00 (0.024,0.0)     |    -1.77 0.00 (0.029,0.0)   |     0.81 0.00 (0.029,0.0)     |
|  1024k  |    read   |     0.41 0.00 (0.011,0.0)     |    -2.28 0.00 (0.013,0.0)   |     -0.33 0.00 (0.013,0.0)    |
|  2048k  |    read   |    -1.18 0.00 (0.0053,0.0)    |    1.41 0.00 (0.0063,0.0)   |    -0.86 0.00 (0.0063,0.0)    |
|  4096k  |    read   |    -0.54 0.00 (0.0027,0.0)    |   -11.77 0.00 (0.0031,0.0)  |     0.02 0.00 (0.003,0.0)     |
|  8192k  |    read   |    -1.60 0.00 (0.0013,0.0)    |   -4.25 0.00 (0.0013,0.0)   |    -1.64 0.00 (0.0013,0.0)    |
|    4k   | readwrite |     1.01 1.01 (0.34,0.34)     |    1.00 1.00 (0.42,0.42)    |     1.30 1.29 (0.45,0.44)     |
|    8k   | readwrite |     1.54 1.54 (0.28,0.28)     |    1.65 1.66 (0.32,0.32)    |     1.18 1.18 (0.36,0.36)     |
|   16k   | readwrite |      1.07 1.08 (0.2,0.2)      |    2.82 2.81 (0.24,0.24)    |     0.40 0.39 (0.26,0.26)     |
|   32k   | readwrite |     0.56 0.56 (0.13,0.13)     |   -1.31 -1.30 (0.16,0.16)   |     0.09 0.09 (0.16,0.16)     |
|   64k   | readwrite |    1.82 1.83 (0.077,0.077)    |  -0.96 -0.96 (0.091,0.091)  |   -0.99 -0.97 (0.092,0.092)   |
|   128k  | readwrite |    2.09 2.10 (0.041,0.041)    |   1.07 1.06 (0.045,0.045)   |    0.28 0.29 (0.045,0.045)    |
|   256k  | readwrite |    2.06 2.05 (0.022,0.022)    |   1.84 1.85 (0.023,0.023)   |    0.66 0.59 (0.023,0.023)    |
|   512k  | readwrite |     5.90 5.84 (0.01,0.01)     |   1.20 1.17 (0.011,0.011)   |    1.18 1.21 (0.011,0.011)    |
|  1024k  | readwrite |   1.94 1.89 (0.0047,0.0047)   |  2.43 2.48 (0.0053,0.0053)  |   2.18 2.21 (0.0054,0.0054)   |
|  2048k  | readwrite |  -0.47 -0.47 (0.0023,0.0023)  |  2.45 2.38 (0.0026,0.0026)  |   1.76 1.78 (0.0026,0.0026)   |
|  4096k  | readwrite |   1.88 1.77 (0.0011,0.0012)   |  1.37 1.17 (0.0014,0.0014)  |   1.80 1.90 (0.0013,0.0013)   |
|  8192k  | readwrite |  1.27 1.42 (0.00057,0.00057)  | 0.86 0.73 (0.00061,0.00062) | -0.99 -1.28 (0.00061,0.00062) |
|    4k   |   write   |      0.00 1.53 (0.0,0.51)     |     0.00 0.35 (0.0,0.51)    |      0.00 0.92 (0.0,0.52)     |
|    8k   |   write   |      0.00 1.04 (0.0,0.42)     |     0.00 1.01 (0.0,0.4)     |      0.00 2.90 (0.0,0.44)     |
|   16k   |   write   |      0.00 0.86 (0.0,0.32)     |     0.00 1.89 (0.0,0.35)    |     0.00 -1.38 (0.0,0.36)     |
|   32k   |   write   |      0.00 0.47 (0.0,0.21)     |    0.00 -0.52 (0.0,0.26)    |     0.00 -1.40 (0.0,0.24)     |
|   64k   |   write   |      0.00 0.30 (0.0,0.13)     |    0.00 13.34 (0.0,0.14)    |      0.00 2.01 (0.0,0.15)     |
|   128k  |   write   |     0.00 2.24 (0.0,0.073)     |    0.00 16.77 (0.0,0.076)   |     0.00 -0.36 (0.0,0.085)    |
|   256k  |   write   |     0.00 3.49 (0.0,0.039)     |    0.00 9.76 (0.0,0.039)    |     0.00 -4.52 (0.0,0.043)    |
|   512k  |   write   |      0.00 4.08 (0.0,0.02)     |    0.00 1.03 (0.0,0.017)    |     0.00 1.05 (0.0,0.019)     |
|  1024k  |   write   |     0.00 2.72 (0.0,0.0083)    |    0.00 2.89 (0.0,0.0086)   |     0.00 2.98 (0.0,0.0095)    |
|  2048k  |   write   |     0.00 2.14 (0.0,0.0041)    |    0.00 3.12 (0.0,0.0041)   |     0.00 1.84 (0.0,0.0044)    |
|  4096k  |   write   |     0.00 1.76 (0.0,0.002)     |    0.00 1.80 (0.0,0.0026)   |     0.00 6.01 (0.0,0.0025)    |
|  8192k  |   write   |    0.00 1.41 (0.0,0.00099)    |    0.00 0.98 (0.0,0.0012)   |     0.00 0.13 (0.0,0.0012)    |
+---------+-----------+-------------------------------+-----------------------------+-------------------------------+

+---------+-----------+-------------------------------+------------------------------+-------------------------------+
| jobs/bs |  workload |               4               |              5               |               6               |
+---------+-----------+-------------------------------+------------------------------+-------------------------------+
|    4k   |  randread |      1.66 0.00 (1.3,0.0)      |     -0.81 0.00 (1.3,0.0)     |      -0.50 0.00 (1.2,0.0)     |
|    8k   |  randread |      2.52 0.00 (0.98,0.0)     |     2.54 0.00 (0.95,0.0)     |      2.52 0.00 (0.92,0.0)     |
|   16k   |  randread |      0.68 0.00 (0.65,0.0)     |     3.43 0.00 (0.64,0.0)     |      5.84 0.00 (0.63,0.0)     |
|   32k   |  randread |      1.88 0.00 (0.38,0.0)     |     0.46 0.00 (0.38,0.0)     |      2.33 0.00 (0.37,0.0)     |
|   64k   |  randread |      1.28 0.00 (0.21,0.0)     |     0.64 0.00 (0.21,0.0)     |     -0.26 0.00 (0.21,0.0)     |
|   128k  |  randread |     -0.05 0.00 (0.11,0.0)     |     0.10 0.00 (0.11,0.0)     |      0.09 0.00 (0.11,0.0)     |
|   256k  |  randread |     -0.01 0.00 (0.059,0.0)    |    -1.01 0.00 (0.059,0.0)    |     -0.39 0.00 (0.059,0.0)    |
|   512k  |  randread |     -1.17 0.00 (0.029,0.0)    |    0.39 0.00 (0.028,0.0)     |     -0.33 0.00 (0.028,0.0)    |
|  1024k  |  randread |     -0.20 0.00 (0.013,0.0)    |    -0.85 0.00 (0.013,0.0)    |     -0.57 0.00 (0.013,0.0)    |
|  2048k  |  randread |    -0.07 0.00 (0.0062,0.0)    |   -0.26 0.00 (0.0061,0.0)    |     -0.15 0.00 (0.006,0.0)    |
|  4096k  |  randread |    -0.25 0.00 (0.0027,0.0)    |   -2.70 0.00 (0.0027,0.0)    |    -2.35 0.00 (0.0026,0.0)    |
|  8192k  |  randread |    -2.94 0.00 (0.0013,0.0)    |   -3.01 0.00 (0.0012,0.0)    |    -2.09 0.00 (0.0012,0.0)    |
|    4k   |   randrw  |     2.83 2.84 (0.36,0.36)     |    2.24 2.23 (0.37,0.37)     |     3.49 3.50 (0.38,0.38)     |
|    8k   |   randrw  |     1.31 1.32 (0.29,0.29)     |     2.01 2.01 (0.3,0.3)      |      2.09 2.09 (0.3,0.3)      |
|   16k   |   randrw  |     1.96 1.95 (0.22,0.22)     |    1.61 1.61 (0.22,0.22)     |     1.99 2.00 (0.22,0.22)     |
|   32k   |   randrw  |     1.01 1.03 (0.14,0.14)     |    0.58 0.59 (0.14,0.14)     |    -0.04 -0.05 (0.14,0.14)    |
|   64k   |   randrw  |   -0.19 -0.18 (0.081,0.081)   |  -0.72 -0.72 (0.081,0.081)   |   -0.50 -0.50 (0.081,0.081)   |
|   128k  |   randrw  |    0.42 0.42 (0.043,0.043)    |  -0.33 -0.36 (0.043,0.043)   |   -0.62 -0.59 (0.043,0.043)   |
|   256k  |   randrw  |    0.38 0.41 (0.022,0.022)    |  -0.35 -0.35 (0.022,0.022)   |    0.37 0.31 (0.022,0.022)    |
|   512k  |   randrw  |    0.40 0.41 (0.011,0.011)    |  -0.29 -0.26 (0.011,0.011)   |     2.76 2.48 (0.01,0.01)     |
|  1024k  |   randrw  |   1.16 1.01 (0.0052,0.0052)   |  0.74 0.72 (0.0051,0.0051)   |    1.72 1.77 (0.005,0.005)    |
|  2048k  |   randrw  |   1.09 0.96 (0.0026,0.0026)   |  1.19 1.57 (0.0026,0.0026)   |   1.49 1.21 (0.0025,0.0025)   |
|  4096k  |   randrw  |   -0.13 0.06 (0.0013,0.0013)  |  0.69 0.77 (0.0013,0.0013)   |  -0.20 -0.13 (0.0012,0.0012)  |
|  8192k  |   randrw  | -0.71 -1.27 (0.00059,0.00061) | -1.78 -2.02 (0.00059,0.0006) | -1.60 -1.25 (0.00058,0.00059) |
|    4k   | randwrite |      0.00 2.17 (0.0,0.44)     |     0.00 4.08 (0.0,0.45)     |      0.00 5.47 (0.0,0.46)     |
|    8k   | randwrite |      0.00 0.98 (0.0,0.38)     |     0.00 2.39 (0.0,0.39)     |      0.00 2.30 (0.0,0.4)      |
|   16k   | randwrite |      0.00 2.02 (0.0,0.3)      |     0.00 2.63 (0.0,0.3)      |      0.00 3.17 (0.0,0.3)      |
|   32k   | randwrite |      0.00 1.82 (0.0,0.21)     |     0.00 1.12 (0.0,0.22)     |      0.00 1.40 (0.0,0.21)     |
|   64k   | randwrite |      0.00 1.41 (0.0,0.14)     |     0.00 1.92 (0.0,0.13)     |      0.00 1.76 (0.0,0.13)     |
|   128k  | randwrite |     0.00 2.32 (0.0,0.075)     |    0.00 4.61 (0.0,0.074)     |     0.00 2.31 (0.0,0.074)     |
|   256k  | randwrite |     0.00 -0.77 (0.0,0.038)    |    0.00 2.61 (0.0,0.036)     |     0.00 2.48 (0.0,0.036)     |
|   512k  | randwrite |     0.00 1.83 (0.0,0.019)     |    0.00 1.68 (0.0,0.018)     |     0.00 2.80 (0.0,0.018)     |
|  1024k  | randwrite |     0.00 3.32 (0.0,0.0086)    |    0.00 1.20 (0.0,0.0087)    |     0.00 0.92 (0.0,0.0081)    |
|  2048k  | randwrite |     0.00 0.92 (0.0,0.0043)    |    0.00 2.95 (0.0,0.0042)    |     0.00 3.15 (0.0,0.0042)    |
|  4096k  | randwrite |     0.00 3.22 (0.0,0.0025)    |    0.00 0.50 (0.0,0.0024)    |     0.00 0.40 (0.0,0.0024)    |
|  8192k  | randwrite |    0.00 -0.57 (0.0,0.0012)    |   0.00 -0.41 (0.0,0.0011)    |    0.00 -0.36 (0.0,0.0011)    |
|    4k   |    read   |      4.34 0.00 (2.5,0.0)      |     -0.40 0.00 (2.6,0.0)     |      -8.45 0.00 (2.6,0.0)     |
|    8k   |    read   |      -1.43 0.00 (1.7,0.0)     |     0.01 0.00 (1.7,0.0)      |      -3.89 0.00 (1.7,0.0)     |
|   16k   |    read   |      1.68 0.00 (0.96,0.0)     |    -0.38 0.00 (0.94,0.0)     |      0.47 0.00 (0.91,0.0)     |
|   32k   |    read   |     -0.98 0.00 (0.48,0.0)     |    -0.09 0.00 (0.47,0.0)     |     -1.24 0.00 (0.47,0.0)     |
|   64k   |    read   |     -0.42 0.00 (0.25,0.0)     |    -0.26 0.00 (0.25,0.0)     |     -0.94 0.00 (0.25,0.0)     |
|   128k  |    read   |     -0.49 0.00 (0.12,0.0)     |     0.06 0.00 (0.12,0.0)     |     -0.59 0.00 (0.12,0.0)     |
|   256k  |    read   |     -0.89 0.00 (0.062,0.0)    |    -0.11 0.00 (0.062,0.0)    |     -0.77 0.00 (0.062,0.0)    |
|   512k  |    read   |     -0.64 0.00 (0.03,0.0)     |    -0.11 0.00 (0.03,0.0)     |      0.08 0.00 (0.03,0.0)     |
|  1024k  |    read   |     0.00 0.00 (0.013,0.0)     |    -0.05 0.00 (0.013,0.0)    |     0.42 0.00 (0.013,0.0)     |
|  2048k  |    read   |    -0.69 0.00 (0.0063,0.0)    |   -0.36 0.00 (0.0063,0.0)    |     1.12 0.00 (0.006,0.0)     |
|  4096k  |    read   |    -2.69 0.00 (0.0029,0.0)    |   -1.85 0.00 (0.0027,0.0)    |    -1.13 0.00 (0.0026,0.0)    |
|  8192k  |    read   |    -2.15 0.00 (0.0013,0.0)    |   -1.66 0.00 (0.0013,0.0)    |    -2.90 0.00 (0.0012,0.0)    |
|    4k   | readwrite |     0.32 0.32 (0.46,0.46)     |    0.68 0.69 (0.47,0.47)     |     -0.26 -0.26 (0.5,0.5)     |
|    8k   | readwrite |     0.18 0.18 (0.37,0.37)     |    0.77 0.76 (0.38,0.38)     |    -0.47 -0.48 (0.39,0.39)    |
|   16k   | readwrite |     0.24 0.25 (0.27,0.27)     |   -0.17 -0.15 (0.27,0.27)    |    -0.51 -0.52 (0.27,0.27)    |
|   32k   | readwrite |    -0.91 -0.91 (0.17,0.17)    |   -1.40 -1.37 (0.17,0.17)    |    -1.67 -1.70 (0.17,0.17)    |
|   64k   | readwrite |    -1.22 -1.21 (0.09,0.09)    |   -1.33 -1.30 (0.09,0.09)    |   -2.49 -2.50 (0.091,0.091)   |
|   128k  | readwrite |   -0.67 -0.69 (0.045,0.045)   |  -1.23 -1.24 (0.045,0.045)   |   -1.01 -0.96 (0.045,0.045)   |
|   256k  | readwrite |   -0.31 -0.35 (0.023,0.023)   |  -0.28 -0.28 (0.023,0.023)   |    0.25 0.08 (0.022,0.023)    |
|   512k  | readwrite |    0.77 0.72 (0.011,0.011)    |   0.96 0.96 (0.011,0.011)    |    0.62 1.16 (0.011,0.011)    |
|  1024k  | readwrite |   0.93 0.91 (0.0053,0.0053)   |  0.72 0.78 (0.0051,0.0051)   |  -0.48 -0.10 (0.0051,0.0051)  |
|  2048k  | readwrite |   1.84 1.77 (0.0026,0.0026)   |  0.86 0.71 (0.0026,0.0026)   |   0.31 1.37 (0.0026,0.0026)   |
|  4096k  | readwrite |   2.71 2.43 (0.0013,0.0013)   |  1.54 1.73 (0.0013,0.0013)   |  -1.18 -0.79 (0.0012,0.0013)  |
|  8192k  | readwrite |  -2.61 -1.52 (0.0006,0.00061) | -1.83 -1.27 (0.00059,0.0006) | -2.25 -1.53 (0.00059,0.00059) |
|    4k   |   write   |     0.00 -2.94 (0.0,0.53)     |    0.00 -2.23 (0.0,0.57)     |      0.00 0.93 (0.0,0.56)     |
|    8k   |   write   |     0.00 -1.78 (0.0,0.45)     |    0.00 -1.14 (0.0,0.47)     |      0.00 0.47 (0.0,0.47)     |
|   16k   |   write   |     0.00 -1.46 (0.0,0.35)     |    0.00 -1.46 (0.0,0.36)     |      0.00 0.25 (0.0,0.36)     |
|   32k   |   write   |     0.00 -1.05 (0.0,0.24)     |    0.00 -1.53 (0.0,0.24)     |     0.00 -0.74 (0.0,0.24)     |
|   64k   |   write   |     0.00 -2.78 (0.0,0.15)     |     0.00 1.39 (0.0,0.15)     |     0.00 -2.42 (0.0,0.15)     |
|   128k  |   write   |     0.00 -0.03 (0.0,0.082)    |    0.00 0.32 (0.0,0.081)     |     0.00 2.76 (0.0,0.079)     |
|   256k  |   write   |     0.00 -0.33 (0.0,0.039)    |    0.00 -0.32 (0.0,0.038)    |     0.00 -2.37 (0.0,0.038)    |
|   512k  |   write   |     0.00 4.00 (0.0,0.019)     |     0.00 6.45 (0.0,0.02)     |     0.00 2.94 (0.0,0.019)     |
|  1024k  |   write   |     0.00 1.22 (0.0,0.0088)    |    0.00 1.57 (0.0,0.0088)    |    0.00 -3.98 (0.0,0.0087)    |
|  2048k  |   write   |     0.00 0.05 (0.0,0.0043)    |    0.00 3.33 (0.0,0.0044)    |     0.00 8.69 (0.0,0.0044)    |
|  4096k  |   write   |     0.00 2.36 (0.0,0.0025)    |    0.00 1.52 (0.0,0.0024)    |     0.00 0.44 (0.0,0.0024)    |
|  8192k  |   write   |    0.00 -0.35 (0.0,0.0012)    |   0.00 -0.45 (0.0,0.0011)    |    0.00 -0.73 (0.0,0.0011)    |
+---------+-----------+-------------------------------+------------------------------+-------------------------------+


Andreas Hindborg (9):
  rust: add radix tree abstraction
  rust: add `pages` module for handling page allocation
  rust: block: introduce `kernel::block::mq` module
  rust: block: introduce `kernel::block::bio` module
  RUST: add `module_params` macro
  rust: apply cache line padding for `SpinLock`
  RUST: implement `ForeignOwnable` for `Pin`
  rust: add null block driver
  rust: inline a number of short functions

Wedson Almeida Filho (2):
  rust: lock: add support for `Lock::lock_irqsave`
  rust: lock: implement `IrqSaveBackend` for `SpinLock`

 drivers/block/Kconfig              |   4 +
 drivers/block/Makefile             |   4 +
 drivers/block/rnull-helpers.c      |  60 ++++
 drivers/block/rnull.rs             | 177 ++++++++++
 rust/bindings/bindings_helper.h    |   4 +
 rust/bindings/lib.rs               |   1 +
 rust/helpers.c                     |  91 ++++++
 rust/kernel/block.rs               |   6 +
 rust/kernel/block/bio.rs           |  93 ++++++
 rust/kernel/block/bio/vec.rs       | 181 +++++++++++
 rust/kernel/block/mq.rs            |  15 +
 rust/kernel/block/mq/gen_disk.rs   | 133 ++++++++
 rust/kernel/block/mq/operations.rs | 260 +++++++++++++++
 rust/kernel/block/mq/raw_writer.rs |  30 ++
 rust/kernel/block/mq/request.rs    |  87 +++++
 rust/kernel/block/mq/tag_set.rs    |  92 ++++++
 rust/kernel/cache_padded.rs        |  33 ++
 rust/kernel/error.rs               |   4 +
 rust/kernel/lib.rs                 |  13 +
 rust/kernel/module_param.rs        | 501 +++++++++++++++++++++++++++++
 rust/kernel/pages.rs               | 284 ++++++++++++++++
 rust/kernel/radix_tree.rs          | 156 +++++++++
 rust/kernel/sync/lock.rs           |  49 ++-
 rust/kernel/sync/lock/mutex.rs     |   2 +
 rust/kernel/sync/lock/spinlock.rs  |  50 ++-
 rust/kernel/types.rs               |  30 ++
 rust/macros/helpers.rs             |  46 ++-
 rust/macros/module.rs              | 402 +++++++++++++++++++++--
 scripts/Makefile.build             |   2 +-
 29 files changed, 2766 insertions(+), 44 deletions(-)
 create mode 100644 drivers/block/rnull-helpers.c
 create mode 100644 drivers/block/rnull.rs
 create mode 100644 rust/kernel/block.rs
 create mode 100644 rust/kernel/block/bio.rs
 create mode 100644 rust/kernel/block/bio/vec.rs
 create mode 100644 rust/kernel/block/mq.rs
 create mode 100644 rust/kernel/block/mq/gen_disk.rs
 create mode 100644 rust/kernel/block/mq/operations.rs
 create mode 100644 rust/kernel/block/mq/raw_writer.rs
 create mode 100644 rust/kernel/block/mq/request.rs
 create mode 100644 rust/kernel/block/mq/tag_set.rs
 create mode 100644 rust/kernel/cache_padded.rs
 create mode 100644 rust/kernel/module_param.rs
 create mode 100644 rust/kernel/pages.rs
 create mode 100644 rust/kernel/radix_tree.rs


base-commit: ea76e08f4d901a450619831a255e9e0a4c0ed162

Comments

Niklas Cassel May 3, 2023, 11:32 a.m. UTC | #1
On Wed, May 03, 2023 at 11:06:57AM +0200, Andreas Hindborg wrote:
> From: Andreas Hindborg <a.hindborg@samsung.com>
> 

(cut)

> 
> For each measurement the drivers are loaded, a drive is configured with memory
> backing and a size of 4 GiB. C null_blk is configured to match the implemented
> modes of the Rust driver: `blocksize` is set to 4 KiB, `completion_nsec` to 0,
> `irqmode` to 0 (IRQ_NONE), `queue_mode` to 2 (MQ), `hw_queue_depth` to 256 and
> `memory_backed` to 1. For both the drivers, the queue scheduler is set to
> `none`. These measurements are made using 30 second runs of `fio` with the
> `PSYNC` IO engine with workers pinned to separate CPU cores. The measurements
> are done inside a virtual machine (qemu/kvm) on an Intel Alder Lake workstation
> (i5-12600).

Hello Andreas,

I'm curious why you used psync ioengine for the benchmarks.

As psync is a sync ioengine, it means queue depth == 1.

Wouldn't it have been more interesting to see an async ioengine,
together with different queue depths?


You might want to explain your table a bit more.
It might be nice to see IOPS and average latencies.

As an example of a table that I find easier to interpret,
see e.g. the table on page 29 in the SPDK performance report:
https://ci.spdk.io/download/performance-reports/SPDK_nvme_bdev_perf_report_2301.pdf


Kind regards,
Niklas
Andreas Hindborg May 3, 2023, 12:29 p.m. UTC | #2
Hi Niklas,

Niklas Cassel <Niklas.Cassel@wdc.com> writes:

> On Wed, May 03, 2023 at 11:06:57AM +0200, Andreas Hindborg wrote:
>> From: Andreas Hindborg <a.hindborg@samsung.com>
>> 
>
> (cut)
>
>> 
>> For each measurement the drivers are loaded, a drive is configured with memory
>> backing and a size of 4 GiB. C null_blk is configured to match the implemented
>> modes of the Rust driver: `blocksize` is set to 4 KiB, `completion_nsec` to 0,
>> `irqmode` to 0 (IRQ_NONE), `queue_mode` to 2 (MQ), `hw_queue_depth` to 256 and
>> `memory_backed` to 1. For both the drivers, the queue scheduler is set to
>> `none`. These measurements are made using 30 second runs of `fio` with the
>> `PSYNC` IO engine with workers pinned to separate CPU cores. The measurements
>> are done inside a virtual machine (qemu/kvm) on an Intel Alder Lake workstation
>> (i5-12600).
>
> Hello Andreas,
>
> I'm curious why you used psync ioengine for the benchmarks.
>
> As psync is a sync ioengine, it means queue depth == 1.
>
> Wouldn't it have been more interesting to see an async ioengine,
> together with different queue depths?

That would also be interesting. I was a bit constrained on CPU cycles, so I
had to choose. I intend to produce the numbers you ask for. For now, here are
two runs of random read using io_uring with queue depth 128 (same table
style):


For iodepth_batch_submit=1, iodepth_batch_complete=1:
+---------+----------+---------------------+---------------------+
| jobs/bs | workload |          1          |          6          |
+---------+----------+---------------------+---------------------+
|    4k   | randread | 2.97 0.00 (0.9,0.0) | 4.06 0.00 (1.8,0.0) |
+---------+----------+---------------------+---------------------+

For iodepth_batch_submit=16, iodepth_batch_complete=16:
+---------+----------+---------------------+---------------------+
| jobs/bs | workload |          1          |          6          |
+---------+----------+---------------------+---------------------+
|    4k   | randread | 4.40 0.00 (1.1,0.0) | 4.87 0.00 (1.8,0.0) |
+---------+----------+---------------------+---------------------+

The numbers above are from 60-second runs on bare metal, so they are not
entirely comparable with the ones in the cover letter.

> You might want to explain your table a bit more.

I understand that the table can be difficult to read. It is not easy to
convey all this information in an ASCII email. The numbers in parentheses in
the cells _are_ IOPS in millions (read,write). Referring to the second table
above, for 1 job at 4k bs the Rust driver performed 4.4 percent more IOPS
than the C driver, and the C driver did 1.1M IOPS. I hope this clarifies the
table; otherwise let me know!

> It might be nice to see IOPS and average latencies.

I did collect latency info as well, including completion latency
percentiles. It's just difficult to fit all that data in an email. I
have the fio json output, let me know if you want it and I will find a
way to get it to you. I am considering setting up some kind of CI that
will publish the performance results online automatically so that it
will be a link instead of an inline table.

>
> As an example of a table that I find easier to interpret,
> see e.g. the table on page 29 in the SPDK performance report:
> https://ci.spdk.io/download/performance-reports/SPDK_nvme_bdev_perf_report_2301.pdf

Thanks for the input, I will be sure to reference that next time. Just
for clarity, as you mentioned there is only one queue depth in play for
the numbers in the cover letter.

Best regards
Andreas
Niklas Cassel May 3, 2023, 1:54 p.m. UTC | #3
On Wed, May 03, 2023 at 02:29:08PM +0200, Andreas Hindborg wrote:
> 

(cut)

> 
> For iodepth_batch_submit=1, iodepth_batch_complete=1:
> +---------+----------+---------------------+---------------------+
> | jobs/bs | workload |          1          |          6          |
> +---------+----------+---------------------+---------------------+
> |    4k   | randread | 2.97 0.00 (0.9,0.0) | 4.06 0.00 (1.8,0.0) |
> +---------+----------+---------------------+---------------------+
> 
> For iodepth_batch_submit=16, iodepth_batch_complete=16:
> +---------+----------+---------------------+---------------------+
> | jobs/bs | workload |          1          |          6          |
> +---------+----------+---------------------+---------------------+
> |    4k   | randread | 4.40 0.00 (1.1,0.0) | 4.87 0.00 (1.8,0.0) |
> +---------+----------+---------------------+---------------------+
> 
> Above numbers are 60 second runs on bare metal, so not entirely
> comparable with the ones in the cover letter.
> 
> > You might want to explain your table a bit more.
> 
> I understand that the table can be difficult to read. It is not easy to
> convey all this information in an ASCII email. The numbers in parentheses
> in the cells _are_ IOPS x 10^6 (read,write). Referring to the second
> table above, for 1 job at 4k bs the Rust driver performed 4.8 percent
> more IOPS than the C driver. The C driver did 1.1M IOPS. I hope this
> clarifies the table; otherwise let me know!

Yes, I can fully understand that it is hard to convey all the information
in a small text table. I do think it is better to have more, smaller
tables. It will be a lot of text, but perhaps one table per "nr jobs"
would have been clearer.

Perhaps you don't need such fine granularity of "nr jobs" either; you
could skip some. The same thing can probably be said about the block
size increments: you can probably skip some.

It seems like the numbers are very jittery. I assume that this is because
the drive utilization is so low. The IOPS are also very low.
You probably need to increase the QD for these numbers to be consistent.



Perhaps move this text to be just before the table itself:
"In this table each cell shows the relative performance of the Rust
driver to the C driver with the throughput of the C driver in parenthesis:
`rel_read rel_write (read_miops write_miops)`. Ex: the Rust driver performs 4.74
percent better than the C driver for fio randread with 2 jobs at 16 KiB
block size."

I think it would be good to print the IOPS for both C and Rust, as you now
only seem to print the C driver IOPS. That way it is also easier to verify
the relative differences. (Not that I think that you are making up numbers,
but you usually see IOPS for both implementations, followed by percentage
> diffs.)

See e.g. how Jens did it:
https://lore.kernel.org/linux-block/20190116175003.17880-1-axboe@kernel.dk/#r


Kind regards,
Niklas
Bart Van Assche May 3, 2023, 4:47 p.m. UTC | #4
On 5/3/23 02:06, Andreas Hindborg wrote:
> This is an early preview of a null block driver written in Rust.

It is not clear to me why this effort was started. As far as I know the
null_blk driver is only used by kernel developers for testing kernel
changes, so end users are not affected by bugs in this driver.
Additionally, the performance of this driver is critical, since it is
used to measure block layer performance. Does this mean that C is a
better choice than Rust for this driver?

Thanks,

Bart.
Andreas Hindborg May 4, 2023, 6:15 p.m. UTC | #5
Hi Bart,

Bart Van Assche <bvanassche@acm.org> writes:

> On 5/3/23 02:06, Andreas Hindborg wrote:
>> This is an early preview of a null block driver written in Rust.
>
> It is not clear to me why this effort was started? As far as I know the null_blk
> driver is only used by kernel developers for testing kernel changes so end users
> are not affected by bugs in this driver. Additionally, performance of this
> driver is critical since this driver is used to measure block layer performance.
> Does this mean that C is a better choice than Rust for this driver?

I take it you did not read the rest of the cover letter. Let me quote
some of it here:

> A null block driver is a good opportunity to evaluate Rust bindings for the
> block layer. It is a small and simple driver and thus should be simple to reason
> about. Further, the null block driver is not usually deployed in production
> environments. Thus, it should be fairly straight forward to review, and any
> potential issues are not going to bring down any production workloads.
>
> Being small and simple, the null block driver is a good place to introduce the
> Linux kernel storage community to Rust. This will help prepare the community for
> future Rust projects and facilitate a better maintenance process for these
> projects.
>
> The statistics presented in my previous message [1] show that the C null block
> driver has had a significant amount of memory safety related problems in the
> past. 41% of fixes merged for the C null block driver are fixes for memory
> safety issues. This makes the null block driver a good candidate for rewriting
> in Rust.

In relation to performance, it turns out that there is not much of a
difference. For memory safety bugs - I think we are better off without
them, no matter if we are user facing or not.

If it is still unclear to you why this effort was started, please do let
me know and I shall try to clarify further :)

Best regards,
Andreas
Bart Van Assche May 4, 2023, 6:36 p.m. UTC | #6
On 5/4/23 11:15, Andreas Hindborg wrote:
> If it is still unclear to you why this effort was started, please do let
> me know and I shall try to clarify further :)

It seems like I was too polite in my previous email. What I meant is that
rewriting code is useful if it provides a clear advantage to the users of
a driver. For null_blk, the users are kernel developers. The code that has
been posted is the start of a rewrite of the null_blk driver. The benefits
of this rewrite (making low-level memory errors less likely) do not outweigh
the risks that this effort will introduce functional or performance regressions.

Bart.
Andreas Hindborg May 4, 2023, 6:46 p.m. UTC | #7
Bart Van Assche <bvanassche@acm.org> writes:

> On 5/4/23 11:15, Andreas Hindborg wrote:
>> If it is still unclear to you why this effort was started, please do let
>> me know and I shall try to clarify further :)
>
> It seems like I was too polite in my previous email. What I meant is that
> rewriting code is useful if it provides a clear advantage to the users of
> a driver. For null_blk, the users are kernel developers. The code that has
> been posted is the start of a rewrite of the null_blk driver. The benefits
> of this rewrite (making low-level memory errors less likely) do not outweigh
> the risks that this effort will introduce functional or performance regressions.

If this turns into a full rewrite instead of just a demonstrator, we
will be in the lucky situation that we have the existing C version to
verify performance and functionality against. Unnoticed regressions are
unlikely in this sense.

If we want to have Rust abstractions for the block layer in the kernel
(some people do), then having a simple driver in Rust to regression-test
these abstractions with is good value.

Best regards,
Andreas
Keith Busch May 4, 2023, 6:52 p.m. UTC | #8
On Thu, May 04, 2023 at 11:36:01AM -0700, Bart Van Assche wrote:
> On 5/4/23 11:15, Andreas Hindborg wrote:
> > If it is still unclear to you why this effort was started, please do let
> > me know and I shall try to clarify further :)
> 
> It seems like I was too polite in my previous email. What I meant is that
> rewriting code is useful if it provides a clear advantage to the users of
> a driver. For null_blk, the users are kernel developers. The code that has
> been posted is the start of a rewrite of the null_blk driver. The benefits
> of this rewrite (making low-level memory errors less likely) do not outweigh
> the risks that this effort will introduce functional or performance regressions.

Instead of replacing, would co-existing be okay? Of course as long as
there's no requirement to maintain feature parity between the two.
Actually, just call it "rust_blk" and declare it has no relationship to
null_blk, despite their functional similarities: it's a developer
reference implementation for a rust block driver.
Jens Axboe May 4, 2023, 7:02 p.m. UTC | #9
On 5/4/23 12:52?PM, Keith Busch wrote:
> On Thu, May 04, 2023 at 11:36:01AM -0700, Bart Van Assche wrote:
>> On 5/4/23 11:15, Andreas Hindborg wrote:
>>> If it is still unclear to you why this effort was started, please do let
>>> me know and I shall try to clarify further :)
>>
>> It seems like I was too polite in my previous email. What I meant is that
>> rewriting code is useful if it provides a clear advantage to the users of
>> a driver. For null_blk, the users are kernel developers. The code that has
>> been posted is the start of a rewrite of the null_blk driver. The benefits
>> of this rewrite (making low-level memory errors less likely) do not outweigh
>> the risks that this effort will introduce functional or performance regressions.
> 
> Instead of replacing, would co-existing be okay? Of course as long as
> there's no requirement to maintain feature parity between the two.
> Actually, just call it "rust_blk" and declare it has no relationship to
> null_blk, despite their functional similarities: it's a developer
> reference implementation for a rust block driver.

To me, the big discussion point isn't really whether we're doing
null_blk or not, it's more if we want to go down this path of
maintaining rust bindings for the block code in general. If the answer
to that is yes, then doing null_blk seems like a great choice as it's
not a critical piece of infrastructure. It might even be a good idea to
be able to run both, for performance purposes, as the bindings or core
change.

But back to the real question... This is obviously extra burden on
maintainers, and that needs to be sorted out first. Block drivers in
general are not super security sensitive, as it's mostly privileged code
and there's not a whole lot of user-visible API. And the stuff we do
have is reasonably basic. So what's the long term win of having rust
bindings? This is a legitimate question. I can see a lot of other more
user exposed subsystems being of higher interest here.
Andreas Hindborg May 4, 2023, 7:59 p.m. UTC | #10
Jens Axboe <axboe@kernel.dk> writes:

> On 5/4/23 12:52?PM, Keith Busch wrote:
>> On Thu, May 04, 2023 at 11:36:01AM -0700, Bart Van Assche wrote:
>>> On 5/4/23 11:15, Andreas Hindborg wrote:
>>>> If it is still unclear to you why this effort was started, please do let
>>>> me know and I shall try to clarify further :)
>>>
>>> It seems like I was too polite in my previous email. What I meant is that
>>> rewriting code is useful if it provides a clear advantage to the users of
>>> a driver. For null_blk, the users are kernel developers. The code that has
>>> been posted is the start of a rewrite of the null_blk driver. The benefits
>>> of this rewrite (making low-level memory errors less likely) do not outweigh
>>> the risks that this effort will introduce functional or performance regressions.
>> 
>> Instead of replacing, would co-existing be okay? Of course as long as
>> there's no requirement to maintain feature parity between the two.
>> Actually, just call it "rust_blk" and declare it has no relationship to
>> null_blk, despite their functional similarities: it's a developer
>> reference implementation for a rust block driver.
>
> To me, the big discussion point isn't really whether we're doing
> null_blk or not, it's more if we want to go down this path of
> maintaining rust bindings for the block code in general. If the answer
> to that is yes, then doing null_blk seems like a great choice as it's
> not a critical piece of infrastructure. It might even be a good idea to
> be able to run both, for performance purposes, as the bindings or core
> changes.
>
> But back to the real question... This is obviously extra burden on
> maintainers, and that needs to be sorted out first. Block drivers in
> general are not super security sensitive, as it's mostly privileged code
> and there's not a whole lot of user visibile API. And the stuff we do
> have is reasonably basic. So what's the long term win of having rust
> bindings? This is a legitimate question. I can see a lot of other more
> user exposed subsystems being of higher interest here.

Even though the block layer is not usually exposed in the same way that
something like the USB stack is, absence of memory safety bugs is a very
useful property. If this is attainable without sacrificing performance,
it seems like a nice option to offer future block device driver
developers. Some would argue that it is worth offering even in the face
of performance regression.

While memory safety is the primary feature that Rust brings to the
table, it comes with other nice features as well. The type system, the
language support for stackless coroutines, and the error handling
support are all very useful.

Regarding maintenance of the bindings, it _is_ an amount of extra work. But
there is more than one way to structure that work. If Rust is accepted
into the block layer at some point, maintenance could be structured in
such a way that it does not get in the way of existing C maintenance
work. A "rust keeps up or it breaks" model. That could work for a while.

Best regards
Andreas
Miguel Ojeda May 4, 2023, 8:11 p.m. UTC | #11
On Thu, May 4, 2023 at 9:02 PM Jens Axboe <axboe@kernel.dk> wrote:
>
> But back to the real question... This is obviously extra burden on
> maintainers, and that needs to be sorted out first. Block drivers in

Regarding maintenance, something we have suggested in similar cases to
other subsystems is that the author gets involved as a maintainer of,
at least, the Rust abstractions/driver (possibly with a different
`MAINTAINERS` entry).
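For concreteness, such an entry might look roughly like this (the file
paths and `S:` level here are assumptions for illustration, not an
actual proposal):

```
RUST NULL BLOCK DRIVER
M:	Andreas Hindborg <a.hindborg@samsung.com>
L:	linux-block@vger.kernel.org
S:	Supported
F:	drivers/block/rnull.rs
F:	rust/kernel/block/
```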

Of course, that is still work for the existing maintainer(s), i.e.
you, since coordination takes time. However, it can also be a nice way
to learn Rust on the side, meanwhile things are getting upstreamed and
discussed (I think Daniel, in Cc, is taking that approach).

And it may also be a way for you to get an extra
maintainer/reviewer/... later on for the C parts, too, even if Rust
does not succeed.

> general are not super security sensitive, as it's mostly privileged code
> and there's not a whole lot of user visibile API. And the stuff we do
> have is reasonably basic. So what's the long term win of having rust
> bindings? This is a legitimate question. I can see a lot of other more
> user exposed subsystems being of higher interest here.

From the experience of other kernel maintainers/developers that are
making the move, the advantages seem to be well worth it, even
disregarding the security aspect, i.e. on the language side alone.

Cheers,
Miguel
Jens Axboe May 4, 2023, 8:22 p.m. UTC | #12
On 5/4/23 2:11?PM, Miguel Ojeda wrote:
> On Thu, May 4, 2023 at 9:02?PM Jens Axboe <axboe@kernel.dk> wrote:
>>
>> But back to the real question... This is obviously extra burden on
>> maintainers, and that needs to be sorted out first. Block drivers in
> 
> Regarding maintenance, something we have suggested in similar cases to
> other subsystems is that the author gets involved as a maintainer of,
> at least, the Rust abstractions/driver (possibly with a different
> `MAINTAINERS` entry).

Right, but that doesn't really solve the problem when the rust bindings
get in the way of changes that you are currently making. Or if you break
them inadvertently. I do see benefits to that approach, but it's no
panacea.

> Of course, that is still work for the existing maintainer(s), i.e.
> you, since coordination takes time. However, it can also be a nice way
> to learn Rust on the side, meanwhile things are getting upstreamed and
> discussed (I think Daniel, in Cc, is taking that approach).

This seems to assume that time is plentiful and we can just add more to
our plate, which isn't always true. While I'd love to do more rust and
get more familiar with it, the time still has to be there for that. I'm
actually typing this on a laptop with a rust gpu driver :-)

And this isn't just on me, there are other regular contributors and
reviewers that would need to be onboard with this.

> And it may also be a way for you to get an extra
> maintainer/reviewer/... later on for the C parts, too, even if Rust
> does not succeed.

That is certainly a win!

>> general are not super security sensitive, as it's mostly privileged code
>> and there's not a whole lot of user visibile API. And the stuff we do
>> have is reasonably basic. So what's the long term win of having rust
>> bindings? This is a legitimate question. I can see a lot of other more
>> user exposed subsystems being of higher interest here.
> 
> From the experience of other kernel maintainers/developers that are
> making the move, the advantages seem to be well worth it, even
> disregarding the security aspect, i.e. on the language side alone.

Each case is different though, different people and different schedules
and priorities. So while the above is promising, it's also just
anecdotal and doesn't necessarily apply to our case.
Jens Axboe May 4, 2023, 8:55 p.m. UTC | #13
On 5/4/23 1:59?PM, Andreas Hindborg wrote:
> 
> Jens Axboe <axboe@kernel.dk> writes:
> 
>> On 5/4/23 12:52?PM, Keith Busch wrote:
>>> On Thu, May 04, 2023 at 11:36:01AM -0700, Bart Van Assche wrote:
>>>> On 5/4/23 11:15, Andreas Hindborg wrote:
>>>>> If it is still unclear to you why this effort was started, please do let
>>>>> me know and I shall try to clarify further :)
>>>>
>>>> It seems like I was too polite in my previous email. What I meant is that
>>>> rewriting code is useful if it provides a clear advantage to the users of
>>>> a driver. For null_blk, the users are kernel developers. The code that has
>>>> been posted is the start of a rewrite of the null_blk driver. The benefits
>>>> of this rewrite (making low-level memory errors less likely) do not outweigh
>>>> the risks that this effort will introduce functional or performance regressions.
>>>
>>> Instead of replacing, would co-existing be okay? Of course as long as
>>> there's no requirement to maintain feature parity between the two.
>>> Actually, just call it "rust_blk" and declare it has no relationship to
>>> null_blk, despite their functional similarities: it's a developer
>>> reference implementation for a rust block driver.
>>
>> To me, the big discussion point isn't really whether we're doing
>> null_blk or not, it's more if we want to go down this path of
>> maintaining rust bindings for the block code in general. If the answer
>> to that is yes, then doing null_blk seems like a great choice as it's
>> not a critical piece of infrastructure. It might even be a good idea to
>> be able to run both, for performance purposes, as the bindings or core
>> changes.
>>
>> But back to the real question... This is obviously extra burden on
>> maintainers, and that needs to be sorted out first. Block drivers in
>> general are not super security sensitive, as it's mostly privileged code
>> and there's not a whole lot of user visibile API. And the stuff we do
>> have is reasonably basic. So what's the long term win of having rust
>> bindings? This is a legitimate question. I can see a lot of other more
>> user exposed subsystems being of higher interest here.
> 
> Even though the block layer is not usually exposed in the same way
> that something like the USB stack is, absence of memory safety bugs is
> a very useful property. If this is attainable without sacrificing
> performance, it seems like a nice option to offer future block device
> driver developers. Some would argue that it is worth offering even in
> the face of performance regression.
> 
> While memory safety is the primary feature that Rust brings to the
> table, it does come with other nice features as well. The type system,
> language support stackless coroutines and error handling language
> support are all very useful.

We're in violent agreement on this part, I don't think anyone sane would
argue that memory safety with the same performance [1] isn't something
you'd want. And the error handling with rust is so much better than the
C stuff drivers do now that I can't see anyone disagreeing on that being
a great thing as well.

The discussion point here is the price being paid in terms of people
time.

> Regarding maintenance of the bindings, it _is_ an amount extra work. But
> there is more than one way to structure that work. If Rust is accepted
> into the block layer at some point, maintenance could be structured in
> such a way that it does not get in the way of existing C maintenance
> work. A "rust keeps up or it breaks" model. That could work for a while.

That potentially works for null_blk, but it would not work for anything
that people actually depend on. In other words, anything that isn't
null_blk. And I don't believe we'd be actively discussing these bindings
if just doing null_blk is the end goal, because that isn't useful by
itself, and at that point we'd all just be wasting our time. In the real
world, once we have just one actual driver using it, then we'd be
looking at "this driver regressed because of change X/Y/Z and that needs
to get sorted before the next release". And THAT is the real issue for
me. So a "rust keeps up or it breaks" model is a bit naive in my
opinion, it's just not a viable approach. In fact, even for null_blk,
this doesn't really fly as we rely on blktests to continually vet the
sanity of the IO stack, and null_blk is an integral part of that.

So I really don't think there's much to debate between "rust people vs
jens" here, as we agree on the benefits, but my end of the table has to
stomach the cons. And like I mentioned in an earlier email, that's not
just on me, there are other regular contributors and reviewers that are
relevant to this discussion. This is something we need to discuss.

[1] We obviously need to do real numbers here; the ones posted I don't
consider stable enough to be useful in saying "yeah it's fully on par".
If you have an updated rust nvme driver that uses these bindings I'd
be happy to run some testing that will definitively tell us if there's a
performance win, loss, or parity, and how much.
Christoph Hellwig May 5, 2023, 3:52 a.m. UTC | #14
On Thu, May 04, 2023 at 01:02:19PM -0600, Jens Axboe wrote:
> null_blk or not, it's more if we want to go down this path of
> maintaining rust bindings for the block code in general. If the answer
> to that is yes, then doing null_blk seems like a great choice as it's
> not a critical piece of infrastructure. It might even be a good idea to
> be able to run both, for performance purposes, as the bindings or core
> changes.

Yes.  And I'm not in favor of it, especially right now.  There is
so much work we need to do that requires changes all over (e.g.
sorting out the request_queue vs gendisk, and making the bio_vec
folio or even better physical address based), and the last thing I
need is a binding to another language, one that happens to have
nice features but that also is really ugly.
Andreas Hindborg May 5, 2023, 5:06 a.m. UTC | #15
Jens Axboe <axboe@kernel.dk> writes:

> On 5/4/23 1:59?PM, Andreas Hindborg wrote:
>> 
>> Jens Axboe <axboe@kernel.dk> writes:
>> 
>>> On 5/4/23 12:52?PM, Keith Busch wrote:
>>>> On Thu, May 04, 2023 at 11:36:01AM -0700, Bart Van Assche wrote:
>>>>> On 5/4/23 11:15, Andreas Hindborg wrote:
>>>>>> If it is still unclear to you why this effort was started, please do let
>>>>>> me know and I shall try to clarify further :)
>>>>>
>>>>> It seems like I was too polite in my previous email. What I meant is that
>>>>> rewriting code is useful if it provides a clear advantage to the users of
>>>>> a driver. For null_blk, the users are kernel developers. The code that has
>>>>> been posted is the start of a rewrite of the null_blk driver. The benefits
>>>>> of this rewrite (making low-level memory errors less likely) do not outweigh
>>>>> the risks that this effort will introduce functional or performance regressions.
>>>>
>>>> Instead of replacing, would co-existing be okay? Of course as long as
>>>> there's no requirement to maintain feature parity between the two.
>>>> Actually, just call it "rust_blk" and declare it has no relationship to
>>>> null_blk, despite their functional similarities: it's a developer
>>>> reference implementation for a rust block driver.
>>>
>>> To me, the big discussion point isn't really whether we're doing
>>> null_blk or not, it's more if we want to go down this path of
>>> maintaining rust bindings for the block code in general. If the answer
>>> to that is yes, then doing null_blk seems like a great choice as it's
>>> not a critical piece of infrastructure. It might even be a good idea to
>>> be able to run both, for performance purposes, as the bindings or core
>>> changes.
>>>
>>> But back to the real question... This is obviously extra burden on
>>> maintainers, and that needs to be sorted out first. Block drivers in
>>> general are not super security sensitive, as it's mostly privileged code
>>> and there's not a whole lot of user visibile API. And the stuff we do
>>> have is reasonably basic. So what's the long term win of having rust
>>> bindings? This is a legitimate question. I can see a lot of other more
>>> user exposed subsystems being of higher interest here.
>> 
>> Even though the block layer is not usually exposed in the same way
>> that something like the USB stack is, absence of memory safety bugs is
>> a very useful property. If this is attainable without sacrificing
>> performance, it seems like a nice option to offer future block device
>> driver developers. Some would argue that it is worth offering even in
>> the face of performance regression.
>> 
>> While memory safety is the primary feature that Rust brings to the
>> table, it does come with other nice features as well. The type system,
>> language support stackless coroutines and error handling language
>> support are all very useful.
>
> We're in violent agreement on this part, I don't think anyone sane would
> argue that memory safety with the same performance [1] isn't something
> you'd want. And the error handling with rust is so much better than the
> C stuff drivers do now that I can't see anyone disagreeing on that being
> a great thing as well.
>
> The discussion point here is the price being paid in terms of people
> time.
>
>> Regarding maintenance of the bindings, it _is_ an amount extra work. But
>> there is more than one way to structure that work. If Rust is accepted
>> into the block layer at some point, maintenance could be structured in
>> such a way that it does not get in the way of existing C maintenance
>> work. A "rust keeps up or it breaks" model. That could work for a while.
>
> That potentially works for null_blk, but it would not work for anything
> that people actually depend on. In other words, anything that isn't
> null_blk. And I don't believe we'd be actively discussing these bindings
> if just doing null_blk is the end goal, because that isn't useful by
> itself, and at that point we'd all just be wasting our time. In the real
> world, once we have just one actual driver using it, then we'd be
> looking at "this driver regressed because of change X/Y/Z and that needs
> to get sorted before the next release". And THAT is the real issue for
> me. So a "rust keeps up or it breaks" model is a bit naive in my
> opinion, it's just not a viable approach. In fact, even for null_blk,
> this doesn't really fly as we rely on blktests to continually vet the
> sanity of the IO stack, and null_blk is an integral part of that.

Sure, once there are actual users, this model would not work. But during
an introduction period it might be a useful model. Having Rust around
without having to take care of it might give maintainers, reviewers, and
contributors a no-strings-attached opportunity to dabble with the
language in a domain they are familiar with.

>
> So I really don't think there's much to debate between "rust people vs
> jens" here, as we agree on the benefits, but my end of the table has to
> stomach the cons. And like I mentioned in an earlier email, that's not
> just on me, there are other regular contributors and reviewers that are
> relevant to this discussion. This is something we need to discuss.
>
> [1] We obviously need to do real numbers here, the ones posted I don't
> consider stable enough to be useful in saying "yeah it's fully on part".
> If you have an updated rust nvme driver that uses these bindings I'd
> be happy to run some testing that will definitively tell us if there's a
> performance win, loss, or parity, and how much.

I do plan to rebase the NVMe driver somewhere in the next few months.
I'll let you know when that work is done.

Best regards
Andreas
Hannes Reinecke May 5, 2023, 5:28 a.m. UTC | #16
On 5/4/23 20:36, Bart Van Assche wrote:
> On 5/4/23 11:15, Andreas Hindborg wrote:
>> If it is still unclear to you why this effort was started, please do let
>> me know and I shall try to clarify further :)
> 
> It seems like I was too polite in my previous email. What I meant is that
> rewriting code is useful if it provides a clear advantage to the users of
> a driver. For null_blk, the users are kernel developers. The code that has
> been posted is the start of a rewrite of the null_blk driver. The benefits
> of this rewrite (making low-level memory errors less likely) do not 
> outweigh
> the risks that this effort will introduce functional or performance 
> regressions.
> 
I have to disagree here. While the null_blk driver in itself is
certainly not _that_ useful, it does provide a good sounding board for
whether all the design principles of the Linux block layer can be
adequately expressed in Rust.

And by posting this driver you just proved that, and we all have a 
better understanding what would be needed to convert old or create new 
drivers.

But I guess we'll have a longer discussion at LSF :-)

Cheers,

Hannes
Miguel Ojeda May 5, 2023, 10:53 a.m. UTC | #17
On Thu, May 4, 2023 at 10:22 PM Jens Axboe <axboe@kernel.dk> wrote:
>
> Right, but that doesn't really solve the problem when the rust bindings
> get in the way of changes that you are currently making. Or if you break
> them inadvertently. I do see benefits to that approach, but it's no
> panacea.
>
> This seems to assume that time is plentiful and we can just add more to
> our plate, which isn't always true. While I'd love to do more rust and
> get more familiar with it, the time still has to be there for that. I'm
> actually typing this on a laptop with a rust gpu driver :-)
>
> And this isn't just on me, there are other regular contributors and
> reviewers that would need to be onboard with this.

Indeed -- I didn't mean to imply it wouldn't be time consuming, only
that it might be an alternative approach compared to having existing
maintainers do it. Of course, it depends on the dynamics of the
subsystem, how busy the subsystem is, whether there is good rapport,
etc.

> Each case is different though, different people and different schedules
> and priorities. So while the above is promising, it's also just
> annecdotal and doesn't necessarily apply to our case.

Definitely, in the end subsystems know best if there is enough time
available (from everybody) to pull it off. I only meant to say that
the security angle is not the only benefit.

For instance, like you said, the error handling, plus a bunch more
that people usually enjoy: stricter typing, more information on
signatures, sum types, pattern matching, privacy, closures, generics,
etc.

Cheers,
Miguel
Miguel Ojeda May 5, 2023, 11:14 a.m. UTC | #18
On Thu, May 4, 2023 at 10:55 PM Jens Axboe <axboe@kernel.dk> wrote:
>
> That potentially works for null_blk, but it would not work for anything
> that people actually depend on. In other words, anything that isn't
> null_blk. And I don't believe we'd be actively discussing these bindings
> if just doing null_blk is the end goal, because that isn't useful by
> itself, and at that point we'd all just be wasting our time. In the real
> world, once we have just one actual driver using it, then we'd be
> looking at "this driver regressed because of change X/Y/Z and that needs
> to get sorted before the next release". And THAT is the real issue for
> me. So a "rust keeps up or it breaks" model is a bit naive in my
> opinion, it's just not a viable approach. In fact, even for null_blk,
> this doesn't really fly as we rely on blktests to continually vet the
> sanity of the IO stack, and null_blk is an integral part of that.

But `null_blk` shouldn't be affected, no? The Rust abstractions can be
behind an explicit "experimental" / "broken" / "compile-test only"
gate (or similar) in the beginning, as a way to test how much
maintenance it actually requires.

In such a setup, Andreas could be the one responsible to keep them up
to date in the beginning. That is, in the worst case, a kernel release
could happen with the Rust side broken -- that way `null_blk` is not
impacted.

That is why a separate `MAINTAINERS` entry may be interesting if you
want to keep e.g. the `S:` level separate (though Andreas, I think,
may be able to do "Supported" at this time).

When the first real driver comes, a similar approach could be
repeated, to buy some more time too.

Cheers,
Miguel
Boqun Feng May 5, 2023, 12:24 p.m. UTC | #19
On Fri, May 05, 2023 at 12:53:41PM +0200, Miguel Ojeda wrote:
> On Thu, May 4, 2023 at 10:22 PM Jens Axboe <axboe@kernel.dk> wrote:
> >
> > Right, but that doesn't really solve the problem when the rust bindings
> > get in the way of changes that you are currently making. Or if you break
> > them inadvertently. I do see benefits to that approach, but it's no
> > panacea.

One thing I want to point out is: not having a block layer Rust API
doesn't keep the block layer away from Rust ;-) Rust will get in the way
as long as the block layer is used, directly or indirectly, by any Rust
code in the kernel.

Take the M1 GPU driver for example: it can totally be done without a drm
Rust API. Lina would have to call C functions directly in her GPU driver,
or she could have her own drm Rust bindings which are not blessed by the
drm maintainers. But as long as drm is used in a Rust driver, a
refactoring/improvement of drm will need to take the usage on the Rust
side into consideration. Unless, of course, someone is willing to write a
C driver for the M1 GPU.

The Rust bindings are actually the way of communication between
subsystem maintainers and Rust driver writers, and can help reduce the
amount of work: You can have the abstraction the way you like.

Of course, there is always "don't do it until there are actually users",
and I totally agree with that. But what is a better way to design the
Rust binding for a subsystem?

*	Sit down and use the wisdom of maintainers and active
	developers, and really spend time on it right now? Or

*	Let one future user drag the API/binding design to insanity?

I'd rather prefer the first approach. Time spent is time saved.

Personally, my biggest fear is: RCU stalls/lockdep warnings in the Rust
code (or ones that don't happen because of incorrect bindings), and who
is going to fix them ;-) So I have to spend my time on making sure these
bindings are in good shape, which is not always a pleasant experience:
the more you use something, the more you hate it ;-) But I think it's
worth it.

Of course, by no means do I want to force anyone to learn Rust; I
totally understand people who want to see zero Rust. I just want to say
that the maintenance burden may exist anyway, and the Rust bindings are
actually the thing to help here.

Regards,
Boqun

> >
> > This seems to assume that time is plentiful and we can just add more to
> > our plate, which isn't always true. While I'd love to do more rust and
> > get more familiar with it, the time still has to be there for that. I'm
> > actually typing this on a laptop with a rust gpu driver :-)
> >
> > And this isn't just on me, there are other regular contributors and
> > reviewers that would need to be onboard with this.
> 
> Indeed -- I didn't mean to imply it wouldn't be time consuming, only
> that it might be an alternative approach compared to having existing
> maintainers do it. Of course, it depends on the dynamics of the
> subsystem, how busy the subsystem is, whether there is good rapport,
> etc.
> 
> > Each case is different though, different people and different schedules
> > and priorities. So while the above is promising, it's also just
> annecdotal and doesn't necessarily apply to our case. [sic: anecdotal]
> 
> Definitely, in the end subsystems know best if there is enough time
> available (from everybody) to pull it off. I only meant to say that
> the security angle is not the only benefit.
> 
> For instance, like you said, the error handling, plus a bunch more
> that people usually enjoy: stricter typing, more information on
> signatures, sum types, pattern matching, privacy, closures, generics,
> etc.
> 
> Cheers,
> Miguel
Boqun Feng May 5, 2023, 1:52 p.m. UTC | #20
On Fri, May 05, 2023 at 05:24:56AM -0700, Boqun Feng wrote:
> On Fri, May 05, 2023 at 12:53:41PM +0200, Miguel Ojeda wrote:
> > On Thu, May 4, 2023 at 10:22 PM Jens Axboe <axboe@kernel.dk> wrote:
> > >
> > > Right, but that doesn't really solve the problem when the rust bindings
> > > get in the way of changes that you are currently making. Or if you break
> > > them inadvertently. I do see benefits to that approach, but it's no
> > > panacea.
> 
> One thing I want to point out is: not having a block layer Rust API
> doesn't keep the block layer away from Rust ;-) Rust will get in the way
> as long as the block layer is used, directly or indirectly, by any Rust
> code in the kernel.
> 
> Take the M1 GPU driver for example: it can totally be done without a drm
> Rust API. Lina would have to call C functions directly in her GPU driver,
> or she could have her own drm Rust bindings which are not blessed by the
> drm maintainers. But as long as drm is used in a Rust driver, a
> refactoring/improvement of drm will need to take the usage on the Rust
> side into consideration. Unless, of course, someone is willing to write a
> C driver for the M1 GPU.
> 
> The Rust bindings are actually the way of communication between
> subsystem maintainers and Rust driver writers, and can help reduce the
> amount of work: You can have the abstraction the way you like.
> 
> Of course, there is always "don't do it until there are actually users",
> and I totally agree with that. But what is a better way to design the
> Rust binding for a subsystem?
> 
> *	Sit down and use the wisdom of maintainers and active
> 	developers, and really spend time on it right now? Or
> 
> *	Let one future user drag the API/binding design to insanity?
> 

Ah, of course, I should add: this is not the usual case; most of the
time, users (e.g. a real driver) can help the design. I was just trying
to say: without the wisdom of maintainers and active developers, a Rust
binding designed solely by one user could have some design issues. In
other words, the experience of maintaining the C side API is very
valuable for designing the Rust bindings.

Regards,
Boqun

> I'd rather prefer the first approach. Time spent is time saved.
> 
> Personally, my biggest fear is: RCU stalls/lockdep warnings in the Rust
> code (or ones that don't happen because of incorrect bindings), and who
> is going to fix them ;-) So I have to spend my time on making sure these
> bindings are in good shape, which is not always a pleasant experience:
> the more you use something, the more you hate it ;-) But I think it's
> worth it.
> 
> Of course, by no means do I want to force anyone to learn Rust; I
> totally understand people who want to see zero Rust. I just want to say
> that the maintenance burden may exist anyway, and the Rust bindings are
> actually the thing to help here.
> 
> Regards,
> Boqun
> 
> > >
> > > This seems to assume that time is plentiful and we can just add more to
> > > our plate, which isn't always true. While I'd love to do more rust and
> > > get more familiar with it, the time still has to be there for that. I'm
> > > actually typing this on a laptop with a rust gpu driver :-)
> > >
> > > And this isn't just on me, there are other regular contributors and
> > > reviewers that would need to be onboard with this.
> > 
> > Indeed -- I didn't mean to imply it wouldn't be time consuming, only
> > that it might be an alternative approach compared to having existing
> > maintainers do it. Of course, it depends on the dynamics of the
> > subsystem, how busy the subsystem is, whether there is good rapport,
> > etc.
> > 
> > > Each case is different though, different people and different schedules
> > > and priorities. So while the above is promising, it's also just
> > anecdotal and doesn't necessarily apply to our case.
> > 
> > Definitely, in the end subsystems know best if there is enough time
> > available (from everybody) to pull it off. I only meant to say that
> > the security angle is not the only benefit.
> > 
> > For instance, like you said, the error handling, plus a bunch more
> > that people usually enjoy: stricter typing, more information on
> > signatures, sum types, pattern matching, privacy, closures, generics,
> > etc.
> > 
> > Cheers,
> > Miguel
Bart Van Assche May 5, 2023, 7:38 p.m. UTC | #21
On 5/5/23 03:53, Miguel Ojeda wrote:
> Definitely, in the end subsystems know best if there is enough time
> available (from everybody) to pull it off. I only meant to say that
> the security angle is not the only benefit.
> 
> For instance, like you said, the error handling, plus a bunch more
> that people usually enjoy: stricter typing, more information on
> signatures, sum types, pattern matching, privacy, closures, generics,
> etc.

These are all great advantages of Rust.

One potential cause of memory corruption in block drivers is
misprogramming the DMA engine of the storage controller. This is something
no borrow checker can protect against. Only an IOMMU can protect against
the storage controller accessing memory that it shouldn't access. This is
not a criticism of Rust - I'm bringing this up because I think this is
something that is important to realize.

Bart.
Keith Busch May 5, 2023, 7:42 p.m. UTC | #22
On Fri, May 05, 2023 at 05:24:56AM -0700, Boqun Feng wrote:
> 
> The Rust bindings are actually the way of communication between
> subsystem maintainers and Rust driver writers, and can help reduce the
> amount of work: You can have the abstraction the way you like.

We don't have stable APIs or structures here, so let's be clear-eyed
about the maintenance burden these bindings create for linux-block
contributors. Not a hard "no" from me, but this isn't something to
handwave over.
Boqun Feng May 5, 2023, 9:46 p.m. UTC | #23
On Fri, May 05, 2023 at 01:42:42PM -0600, Keith Busch wrote:
> On Fri, May 05, 2023 at 05:24:56AM -0700, Boqun Feng wrote:
> > 
> > The Rust bindings are actually the way of communication between
> > subsystem maintainers and Rust driver writers, and can help reduce the
> > amount of work: You can have the abstraction the way you like.
> 
> We don't have stable APIs or structures here, so let's be clear-eyed

Of course, but every API change needs to cover all in-tree users, right?

> about the maintenance burden these bindings create for linux-block
> contributors. Not a hard "no" from me, but this isn't something to
> handwave over.

Not trying to handwave over anything ;-) The fact, IIUC, is simply that
Rust drivers can call C functions, so say an in-tree Rust driver does
something as follows:

	struct Foo {
		ptr: *mut bindings::bio, // A pointer to a C `struct bio`.
		...
	}

	impl Foo {
		pub fn bar(&self) {
			unsafe {
				// calling a C function "do_sth_to_bio".
				// void do_sth_to_bio(struct bio *bio);
				bindings::do_sth_to_bio(self.ptr);
			}
		}
	}

That's a user of the block layer, and that user could exist even
without a Bio abstraction. Whenever a linux-block developer wants to
refactor "do_sth_to_bio" with slightly different semantics, that user
will need to be taken into consideration, which means reading the Rust
code of Foo to understand the usage.

Now with a Bio abstraction, the immediate effect would be that no Rust
code is allowed to call block layer functions directly without going
through the Bio abstraction. And hopefully the Bio abstraction, along
with the other bindings, is a good enough place to reason about semantic
changes or refactoring, so there is no need to read the code of Foo to
understand the usage. Of course, some C side changes may result in
changes to the Rust bindings as well, but that doesn't make things
worse. (Foo would need to be understood in that case anyway if there
were no Rust bindings.)
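As a self-contained illustration of that point, here is a minimal sketch of the pattern: the raw pointer and every `unsafe` call are confined to one wrapper type, so only the wrapper needs auditing when the C semantics change. The `bio` struct and `bio_sectors` helper below merely stand in for the C side; they are not the real kernel API.

```rust
// Sketch of the "confine unsafe in one abstraction" pattern discussed
// above. The `bio` struct and `bio_sectors` function are stand-ins for
// the C side, purely for illustration.

#[allow(non_camel_case_types)]
#[repr(C)]
struct bio {
    bi_size: u32, // size of the I/O in bytes
}

// Stand-in for a C helper; in the kernel this would cross the FFI
// boundary via the generated bindings.
unsafe fn bio_sectors(b: *const bio) -> u32 {
    (*b).bi_size >> 9
}

// Safe wrapper: the only code that touches the raw pointer.
struct Bio {
    ptr: *mut bio,
}

impl Bio {
    // SAFETY: `ptr` must be valid for the lifetime of the wrapper.
    unsafe fn from_raw(ptr: *mut bio) -> Self {
        Bio { ptr }
    }

    // Safe method: the invariant established in `from_raw` makes the
    // call sound, so users of `Bio` never write `unsafe` themselves.
    fn sectors(&self) -> u32 {
        unsafe { bio_sectors(self.ptr) }
    }
}

fn main() {
    let mut raw = bio { bi_size: 4096 };
    let b = unsafe { Bio::from_raw(&mut raw) };
    assert_eq!(b.sectors(), 8); // 4096 bytes == eight 512-byte sectors
    println!("sectors = {}", b.sectors());
}
```

The point of the design is that a semantic change on the C side only requires re-auditing the wrapper, not every driver that uses it.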

Of course, this is just my 2 cents. I could be completely wrong.
(Putting my Rust-for-Linux hat on:) Needless to say, with or without
Rust bindings for the block layer, we are (at least I am) happy to help
with any Rust related questions/bugs/issues for linux-block ;-)

Regards,
Boqun
Luis Chamberlain May 7, 2023, 11:31 p.m. UTC | #24
On Wed, May 03, 2023 at 11:06:57AM +0200, Andreas Hindborg wrote:
> The statistics presented in my previous message [1] show that the C null block
> driver has had a significant amount of memory safety related problems in the
> past. 41% of fixes merged for the C null block driver are fixes for memory
> safety issues. This makes the null block driver a good candidate for rewriting
> in Rust.

Curious, how long does it take to do an analysis like this? Are there efforts
to automate this a bit more? We have efforts to use machine learning to
evaluate stable candidate patches; we should probably be able to qualify
commits as fixing "memory safety" issues too, I figure.

Because what I'd love to see is if we could easily obtain similar
statistics for arbitrary parts of the kernel. The easiest way to break
this down might be by kconfig symbol for instance, and then based on
that gather more information about subsystems.
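A rough sketch of how such automation could start is a keyword heuristic over commit subjects; the keyword list and function name below are purely illustrative, not an existing tool, and a real effort would still need to inspect diffs (as Andreas did by hand) or train a model on labeled commits:

```rust
// Illustrative heuristic: flag commit subjects that look like
// memory-safety fixes. Keyword list is a guess, not a validated model.
fn looks_like_memory_safety_fix(subject: &str) -> bool {
    const KEYWORDS: &[&str] = &[
        "use-after-free", "use after free", "uaf",
        "null pointer", "null-ptr", "out-of-bounds", "oob",
        "memory leak", "memleak", "double free", "overflow",
    ];
    let s = subject.to_lowercase();
    KEYWORDS.iter().any(|k| s.contains(k))
}

fn main() {
    // Hypothetical commit subjects, for demonstration only.
    assert!(looks_like_memory_safety_fix(
        "null_blk: fix use-after-free of nullb->disk"
    ));
    assert!(!looks_like_memory_safety_fix(
        "null_blk: add support for zoned devices"
    ));
    println!("classifier sketch ok");
}
```

Feeding it `git log --format=%s -- drivers/block/null_blk/` output would give a first (noisy) cut at the per-subsystem statistics.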

Then the rationale for considering adopting Rust bindings for certain areas
of the kernel becomes a bit clearer.

I figured some of this work has already been done, but I just haven't seen it
yet.

  Luis
Andreas Hindborg May 7, 2023, 11:37 p.m. UTC | #25
Luis Chamberlain <mcgrof@kernel.org> writes:

> On Wed, May 03, 2023 at 11:06:57AM +0200, Andreas Hindborg wrote:
>> The statistics presented in my previous message [1] show that the C null block
>> driver has had a significant amount of memory safety related problems in the
>> past. 41% of fixes merged for the C null block driver are fixes for memory
>> safety issues. This makes the null block driver a good candidate for rewriting
>> in Rust.
>
> Curious, how long does it take to do an analysis like this? Are there efforts
> to automate this a bit more? We have efforts to use machine learning to
> evaluate stable candidate patches; we should probably be able to qualify
> commits as fixing "memory safety" issues too, I figure.
>
> Because what I'd love to see is if we could easily obtain similar
> statistics for arbitrary parts of the kernel. The easiest way to break
> this down might be by kconfig symbol for instance, and then based on
> that gather more information about subsystems.
>

I spent around 4 hours with a spreadsheet and git. It would be cool if
that work could be automated. It's not always clear from the commit
heading that a commit is a fix. When it is clear that it is a fix, it
might not be clear what is fixed. I had to look at the diff for quite a
few commits.

There is some work mentioning the ratio of memory safety issues fixed in
the kernel, but none of it goes into detail for specific subsystems as
far as I know. 20% of bugs fixed in stable Linux kernel branches for
drivers are memory safety issues [1]. 65% of recent Linux kernel
vulnerabilities are memory safety issues [2].

> Then the rationale for considering adopting Rust bindings for certain areas
> of the kernel becomes a bit clearer.

As mentioned elsewhere in this thread there are other benefits from
deploying Rust than provable absence of memory safety issues.

Best regards
Andreas

[1] http://dx.doi.org/10.15514/ISPRAS-2018-30(6)-8
[2] https://lssna19.sched.com/event/RHaT/writing-linux-kernel-modules-in-safe-rust-geoffrey-thomas-two-sigma-investments-alex-gaynor-alloy
Andreas Hindborg June 6, 2023, 1:33 p.m. UTC | #26
Hi All,

I apologize for the lengthy email, but I have a lot of things to cover.

As some of you know, a goal of mine is to make it possible to write blk-mq
device drivers in Rust. The RFC patches I have sent to this list are the first
steps of making that goal a reality. They are a sample of the work I am doing.

My current plan of action is to provide a Rust API that allows implementation of
blk-mq device drivers, along with a Rust implementation of null_blk to serve as a
reference implementation. This reference implementation will demonstrate how to
use the API.

I attended LSF in Vancouver a few weeks back where I led a discussion on the
topic. My goal for that session was to obtain input from the community on how to
upstream the work as it becomes more mature.

I received a lot of feedback, both during the session, in the hallway, and on
the mailing list. Ultimately, we did not achieve consensus on a path forward. I
will try to condense the key points raised by the community here. If anyone feels
their point is not contained below, please chime in.

Please note that I am paraphrasing the points below, they are not citations.

1) "Block layer community does not speak Rust and thus cannot review Rust patches"

   This work hinges on one of two things happening. Either block layer reviewers
   and maintainers eventually becoming fluent in Rust, or they accept code in
   their tree that is maintained by the "rust people". I would very much prefer
   the first option.

   I would suggest to use this work to facilitate gradual adoption of Rust. I
   understand that this will be a multi-year effort. By giving the community
   access to Rust bindings specifically designed for the block layer, the block
   layer community will have a helpful reference to consult when investigating
   Rust.

   While the block community is getting up to speed in Rust, the Rust for Linux
   community is ready to conduct review of patches targeting the block layer.
   Until such time as Rust code can be reviewed by block layer experts, the
   work could be gated behind an "EXPERIMENTAL" flag.

   Selection of the null_blk driver for a reference implementation to drive the
   Rust block API was not random. The null_blk driver is relatively simple and
   thus makes for a good platform to demonstrate the Rust API without having to
   deal with actual hardware.

   The null_blk driver is a piece of testing infrastructure that is not usually
   deployed in production environments, so people who are worried about Rust in
   general will not have to worry about their production environments being
   infested with Rust.

   Finally there have been suggestions both to replace and/or complement the
   existing C null_blk driver with the Rust version. I would suggest
   (eventually, not _now_) complementing the existing driver, since it can be
   very useful to benchmark and test the two drivers side by side.

2) "Having Rust bindings for the block layer in-tree is a burden for the
   maintainers"

   I believe we can integrate the bindings in a way so that any potential
   breakage in the Rust API does not impact current maintenance work.
   Maintainers and reviewers that do not wish to bother with Rust should be able
   to opt out. All Rust parts should be gated behind a default N kconfig option.
   With this scheme there should be very little inconvenience for current
   maintainers.
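   Such a gate could look roughly like the following Kconfig fragment (the
   symbol name and dependencies are hypothetical, for illustration only):

```
config BLK_DEV_RUST_NULL
	tristate "Rust null block driver (EXPERIMENTAL)"
	depends on RUST && BLK_DEV
	default n
	help
	  Experimental Rust implementation of the null block driver.

	  If unsure, say N.
```

   With `default n`, kernels built without explicitly enabling the symbol
   would be byte-for-byte unaffected by the Rust code.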

   I will take the necessary steps to make sure the block layer Rust bindings
   are always up to date with changes to the kernel C API. I would run CI
   against

   - for-next of https://git.kernel.dk/linux.git
   - master of https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
   - mainline releases including RCs
   - stable and longterm kernels with queues applied
   - stable and longterm releases including RCs

   Samsung will provide resources to support this CI effort. Through this effort
   I will aim to minimize any inconvenience for maintainers.

3) "How will you detect breakage in the Rust API caused by changes to C code?"

   The way we call C code from Rust in the kernel guarantees that most changes
   to C APIs that are called by Rust code will cause a compile failure when
   building the kernel with Rust enabled. This includes changing C function
   argument names or types, and struct field names or types. Thus, we do not need
   to rely on symvers CRC calculation as suggested by James Bottomley at LSF.

   However, if the semantics of a kernel C function are changed without changing
   its name or signature, potential breakage will not be detected by the build
   system. To detect breakage resulting from this kind of change, we have to
   rely _on the same mechanics_ that maintainers of kernel C code are relying on
   today:

   - kunit tests
   - blktests
   - fstests
   - staying in the loop wrt changes in general

   We also have Rust support in Intel 0-day CI, although only compile tests for
   now.
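   To illustrate the compile-time guarantee in miniature: the `c_side`
   module below stands in for bindgen output, and the function name and
   behavior are illustrative, not the actual generated bindings.

```rust
// Sketch: Rust call sites are typed against a concrete declaration of
// the C function, so signature changes become type errors at build
// time. `c_side` is a hypothetical stand-in for bindgen output.
mod c_side {
    // Pretend this was generated from a C prototype like:
    //     int validate_block_size(unsigned long bsize);
    pub fn validate_block_size(bsize: u64) -> i32 {
        if bsize >= 512 && bsize <= 4096 && bsize.is_power_of_two() {
            0
        } else {
            -22 // -EINVAL
        }
    }
}

fn main() {
    // If the C prototype gained an argument or changed a type, bindgen
    // would regenerate a different Rust declaration and this call site
    // would no longer compile; semantic-only changes would not be caught.
    assert_eq!(c_side::validate_block_size(4096), 0);
    assert_eq!(c_side::validate_block_size(100), -22);
    println!("ok");
}
```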

4) "How will you prevent breakage in C code resulting from changes to Rust code"

   The way the Rust API is designed, existing C code is not going to be reliant
   on Rust code. If anything breaks just disable Rust and no Rust code will be
   built. Or disable block layer Rust code if you want to keep general Rust
   support. With Rust disabled by default, nothing in the kernel should break
   because of Rust unless it is explicitly enabled.

5) "Block drivers in general are not security sensitive because they are mostly
   privileged code and have limited user visible API"

   There are probably easier ways to exploit a Linux system than to target the
   block layer, although people are plugging in potentially malicious block
   devices all the time in the form of USB Mass Storage devices or CF cards.

   While memory safety is very relevant for preventing exploitable security
   vulnerabilities, it is also incredibly useful in preventing memory safety
   bugs in general. Fewer bugs means less risk of bugs leading to data
   corruption. It means less time spent on tracking down and fixing bugs, and
   less time spent reviewing bug fixes. It also means less time required to
   review patches in general, because reviewers do not have to review for memory
   safety issues.

   So while Rust has high merit in exposed and historically exploited
   subsystems, this does not mean that it has no merit in other subsystems.

6) "Other subsystems may benefit more from adopting Rust"

   While this might be true, it does not prevent the block subsystem from
   benefiting from adopting Rust (see 5).


7) "Do not waste time re-implementing null_blk, it is test infrastructure so
   memory safety does not matter. Why don't you do loop instead?"

   I strongly believe that memory safety is also relevant in test
   infrastructure. We waste time and energy fixing memory safety issues in our
   code, no matter if the code is test infrastructure or not. I refer to the
   statistics I posted to the list at an earlier date [3].

   Further, I think it is a benefit to all if the storage community can become
   fluent in Rust before any critical infrastructure is deployed using Rust.
   This is one reason that I switched my efforts to null_block and that I am not
   pushing Rust NVMe.

8) "Why don't you wait with this work until you have a driver for a new storage
   standard"

   Let's be proactive. I think it is important to iron out the details of the
   Rust API before we implement any potential new driver. When we eventually
   need to implement a driver for a future storage standard, the choice to do so
   in Rust should be easy. By making the API available ahead of time, we will be
   able to provide future developers with a stable implementation to choose
   from.

9) "You are a new face in our community. How do we know you will not disappear?"

   I recognize this consideration and I acknowledge that the community is trust
   based. Trust takes time to build. I can do little more than state that I
   intend to stay with my team at Samsung to take care of this project for many
   years to come. Samsung is behind this particular effort. In general Google
   and Microsoft are actively contributing to the wider Rust for Linux project.
   Perhaps that can be an indication that the project in general is not going
   away.

10) "How can I learn how to build the kernel with Rust enabled?"

    We have a guide in `Documentation/rust/quick-start.rst`. If that guide does
    not get you started, please reach out to us [1] and we will help you get
    started (and fix the documentation since it must not be good enough then).

11) "What if something catches fire and you are out of office?"

    If I am for some reason not responding to pings during a merge, please
    contact the Rust subsystem maintainer and the Rust for Linux list [2]. There
    are quite a few people capable of firefighting if it should ever become
    necessary.

12) "These patches are not ready yet, we should not accept them"

    They most definitely are _not_ ready, and I would not ask for them to be
    included at all in their current state. The RFC is meant to give a sample of
    the work that I am doing and to start this conversation. I would rather have
    this conversation preemptively. I did not intend to give the impression that
    the patches are in a finalized state at all.


With all this in mind I would suggest that we treat the Rust block layer API and
associated null block driver as an experiment. I would suggest that we merge it
in when it is ready, and we gate it behind an experimental kconfig option. If it
turns out that all your worst nightmares come true and it becomes an unbearable
load for maintainers, reviewers and contributors, it will be low effort to
remove it again. I very much doubt this will be the case though.

Jens, Keith, Christoph, Ming, I would kindly ask you to comment on my suggestion
for next steps, or perhaps suggest an alternate path. In general I would
appreciate any constructive feedback from the community.

[1] https://rust-for-linux.com/contact
[2] rust-for-linux@vger.kernel.org
[3] https://lore.kernel.org/all/87y1ofj5tt.fsf@metaspace.dk/

Best regards,
Andreas Hindborg
Miguel Ojeda June 6, 2023, 2:46 p.m. UTC | #27
On Tue, Jun 6, 2023 at 3:40 PM Andreas Hindborg (Samsung)
<nmi@metaspace.dk> wrote:
>
>    Samsung will provide resources to support this CI effort. Through this effort
>    I will aim to minimize any inconvenience for maintainers.

This is great.

>    We also have Rust support in Intel 0-day CI, although only compile tests for
>    now.

In addition, we also have initial Rust support in KernelCI (including
`linux-next` [1], `rust-next` and the old `rust` branch), which could
be expanded upon, especially with more resources (Cc'ing Guillaume).

Moreover, our plan is to replicate the simple CI we originally had for
the `rust` branch (which included QEMU boot tests under a matrix of
several archs/configs/compilers) to our `rust-next`, `rust-fixes` and
`rust-dev` branches, in order to complement the other CIs and to get
some early smoke testing (especially for `rust-dev`).

[1] https://github.com/kernelci/kernelci-core/blob/65b0900438e0ed20e7efe0ada681ab212dc8c774/config/core/build-configs.yaml#L1152-L1197

Cheers,
Miguel
Andreas Hindborg July 28, 2023, 6:49 a.m. UTC | #28
Hi,

Yexuan Yang <1182282462@bupt.edu.cn> writes:

>> Over the 432 benchmark configurations, the relative performance of the Rust
>> driver to the C driver (in terms of IOPS) is between 6.8 and -11.8 percent with
>> an average of 0.2 percent better for reads. For writes the span is 16.8 to -4.5
>> percent with an average of 0.9 percent better.
>> 
>> For each measurement the drivers are loaded, a drive is configured with memory
>> backing and a size of 4 GiB. C null_blk is configured to match the implemented
>> modes of the Rust driver: `blocksize` is set to 4 KiB, `completion_nsec` to 0,
>> `irqmode` to 0 (IRQ_NONE), `queue_mode` to 2 (MQ), `hw_queue_depth` to 256 and
>> `memory_backed` to 1. For both the drivers, the queue scheduler is set to
>> `none`. These measurements are made using 30 second runs of `fio` with the
>> `PSYNC` IO engine with workers pinned to separate CPU cores. The measurements
>> are done inside a virtual machine (qemu/kvm) on an Intel Alder Lake workstation
>> (i5-12600).
>
> Hi All!
> I have some questions about your benchmark test.
> In Ubuntu 22.02, I compiled an RFL kernel with both C null_blk driver and Rust
> null_blk_driver as modules. For the C null_blk driver, I used the `modprobe`
> command to set relevant parameters, while for the Rust null_blk_driver, I simply
> started it. I used the following two commands to start the drivers:
>
> ```bash
> sudo modprobe null_blk \
>     queue_mode=2 irqmode=0 hw_queue_depth=256 \
>     memory_backed=1 bs=4096 completion_nsec=0 \
>     no_sched=1 gb=4;
> sudo modprobe rnull_mod
> ```
>
> After that, I tested their performance in `randread` with the fio command, specifying the first parameter as 4 and the second parameter as 1:
>
> ```bash
> fio --filename=/dev/nullb0 --iodepth=64 --ioengine=psync --direct=1 --rw=randread --bs=$1k --numjobs=$2 --runtime=30 --group_reporting --name=test-rand-read --output=test_c_randread.log
> fio --filename=/dev/rnullb0 --iodepth=64 --ioengine=psync --direct=1 --rw=randread --bs=$1k --numjobs=$2 --runtime=30 --group_reporting --name=test-rand-read --output=test_rust_randread.log
> ```
>
> The test results showed a significant performance difference between the two
> drivers, which was drastically different from the data you tested in the
> community. Specifically, for `randread`, the C driver had a bandwidth of
> 487MiB/s and IOPS of 124.7k, while the Rust driver had a bandwidth of 264MiB/s
> and IOPS of 67.7k. However, for other I/O types, the performance of the C and
> Rust drivers was more similar. Could you please provide more information about
> the actual bandwidth and IOPS data from your tests, rather than just the
> performance difference between the C and Rust drivers? Additionally, if you
> could offer possible reasons for this abnormality, I would greatly appreciate
> it!

Thanks for trying out the code! I am not sure why you get these numbers.
I am currently out of office, but I will rerun the benchmarks next week
when I get back in. Maybe I can provide you with some scripts and my
kernel configuration. Hopefully we can figure out the difference in our
setups.

Best regards,
Andreas
Andreas Hindborg July 31, 2023, 2:14 p.m. UTC | #29
Hi,

"Andreas Hindborg (Samsung)" <nmi@metaspace.dk> writes:

> Hi,
>
> Yexuan Yang <1182282462@bupt.edu.cn> writes:
>
>>> Over the 432 benchmark configurations, the relative performance of the Rust
>>> driver to the C driver (in terms of IOPS) is between 6.8 and -11.8 percent with
>>> an average of 0.2 percent better for reads. For writes the span is 16.8 to -4.5
>>> percent with an average of 0.9 percent better.
>>> 
>>> For each measurement the drivers are loaded, a drive is configured with memory
>>> backing and a size of 4 GiB. C null_blk is configured to match the implemented
>>> modes of the Rust driver: `blocksize` is set to 4 KiB, `completion_nsec` to 0,
>>> `irqmode` to 0 (IRQ_NONE), `queue_mode` to 2 (MQ), `hw_queue_depth` to 256 and
>>> `memory_backed` to 1. For both the drivers, the queue scheduler is set to
>>> `none`. These measurements are made using 30 second runs of `fio` with the
>>> `PSYNC` IO engine with workers pinned to separate CPU cores. The measurements
>>> are done inside a virtual machine (qemu/kvm) on an Intel Alder Lake workstation
>>> (i5-12600).
>>
>> Hi All!
>> I have some questions about your benchmark test.
>> In Ubuntu 22.02, I compiled an RFL kernel with both C null_blk driver and Rust
>> null_blk_driver as modules. For the C null_blk driver, I used the `modprobe`
>> command to set relevant parameters, while for the Rust null_blk_driver, I simply
>> started it. I used the following two commands to start the drivers:
>>
>> ```bash
>> sudo modprobe null_blk \
>>     queue_mode=2 irqmode=0 hw_queue_depth=256 \
>>     memory_backed=1 bs=4096 completion_nsec=0 \
>>     no_sched=1 gb=4;
>> sudo modprobe rnull_mod
>> ```
>>
>> After that, I tested their performance in `randread` with the fio command, specifying the first parameter as 4 and the second parameter as 1:
>>
>> ```bash
>> fio --filename=/dev/nullb0 --iodepth=64 --ioengine=psync --direct=1 --rw=randread --bs=$1k --numjobs=$2 --runtime=30 --group_reporting --name=test-rand-read --output=test_c_randread.log
>> fio --filename=/dev/rnullb0 --iodepth=64 --ioengine=psync --direct=1 --rw=randread --bs=$1k --numjobs=$2 --runtime=30 --group_reporting --name=test-rand-read --output=test_rust_randread.log
>> ```
>>
>> The test results showed a significant performance difference between the two
>> drivers, which was drastically different from the data you tested in the
>> community. Specifically, for `randread`, the C driver had a bandwidth of
>> 487MiB/s and IOPS of 124.7k, while the Rust driver had a bandwidth of 264MiB/s
>> and IOPS of 67.7k. However, for other I/O types, the performance of the C and
>> Rust drivers was more similar. Could you please provide more information about
>> the actual bandwidth and IOPS data from your tests, rather than just the
>> performance difference between the C and Rust drivers? Additionally, if you
>> could offer possible reasons for this abnormality, I would greatly appreciate
>> it!
>
> Thanks for trying out the code! I am not sure why you get these numbers.
> I am currently out of office, but I will rerun the benchmarks next week
> when I get back in. Maybe I can provide you with some scripts and my
> kernel configuration. Hopefully we can figure out the difference in our
> setups.
>

I just ran the benchmarks for that configuration again. My setup is an
Intel Alder Lake CPU (i5-12600) with a Debian Bullseye user space
running a 6.2.12 host kernel, with a kernel config based on the stock
Debian Bullseye config (debian config + make olddefconfig), and 32 GB of
DDR5 4800 MHz memory. The benchmarks (and the patched kernel) run inside
QEMU/KVM (Debian package 1:5.2+dfsg-11+deb11u2). I use the following
qemu command to boot (edited a bit for clarity):

"qemu-system-x86_64" "-nographic" "-enable-kvm" "-m" "16G" "-cpu" "host"
"-M" "q35" "-smp" "6" "-kernel" "vmlinux" "-append" "console=ttyS0
nokaslr rdinit=/sbin/init root=/dev/vda1 null_blk.nr_devices=0"

I use a Debian Bullseye user space inside the virtual machine.

I think I used the stock Bullseye kernel for the host when I did the
numbers for the cover letter, so that is different for this run.

Here are the results:

+---------+----------+------+---------------------+
| jobs/bs | workload | prep |          4          |
+---------+----------+------+---------------------+
|    4    | randread |  1   | 0.28 0.00 (1.7,0.0) |
|    4    | randread |  0   | 5.75 0.00 (6.8,0.0) |
+---------+----------+------+---------------------+

I used the following parameters for both tests:

Block size: 4k
Run time: 120s
Workload: randread
Jobs: 4

This is the `fio` command I used:

['fio', '--group_reporting', '--name=default', '--filename=/dev/<snip>', '--time_based=1', '--runtime=120', '--rw=randread', '--output=<snip>', '--output-format=json', '--numjobs=4', '--cpus_allowed=0-3', '--cpus_allowed_policy=split', '--ioengine=psync', '--bs=4k', '--direct=1', '--norandommap', '--random_generator=lfsr']

For the line in the table where `prep` is 1, I filled the target device
with data before running the benchmark. I use the following `fio` job to
do that:

['fio', '--name=prep', '--rw=write', '--filename=/dev/None', '--bs=4k', '--direct=1']

Did you prepare the volumes with data in your experiment?

How to read the data in the table: for the benchmark where the drive is
prepared, Rust null_blk performs 0.28 percent better than C null_blk.
For the case where the drive is empty, Rust null_blk performs 5.75
percent better than C null_blk. I calculated the relative performance
as "(r_iops - c_iops) / c_iops * 100".
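The relative-performance formula above can be sketched as follows. The C
IOPS value (6,800,000) is taken from the table's parenthesized entry for
the unprepared case; the Rust IOPS value is hypothetical, back-computed
so that the formula reproduces the table's 5.75 percent entry:

```python
def relative_perf(r_iops: float, c_iops: float) -> float:
    # Positive result means the Rust driver is faster than the C driver.
    return (r_iops - c_iops) / c_iops * 100

# 6,800,000 C IOPS from the table; ~7,191,000 Rust IOPS is an assumed
# value chosen to illustrate the 5.75 percent result.
print(round(relative_perf(7_191_000, 6_800_000), 2))  # 5.75
```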

The IOPS info you request is present in the tables from the cover
letter. As mentioned there, the numbers in parentheses in each cell are
throughput in 1,000,000 IOPS for read and write for the C null_blk
driver. For the table above, C null_blk did 1,700,000 read IOPS in the
case where the drive is prepared and 6,800,000 IOPS in the case where
the drive is not prepared.

You can obtain bandwidth by multiplying IOPS by block size. I can also
provide the raw json output of `fio` if you want to have a look.
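As a sanity check of that IOPS-to-bandwidth conversion, plugging in the
C-driver numbers reported earlier in this thread (124.7k IOPS at a 4 KiB
block size) recovers the reported 487 MiB/s:

```python
def bandwidth_mib_s(iops: float, block_size_bytes: int) -> float:
    # Bandwidth is IOPS multiplied by block size; convert bytes to MiB.
    return iops * block_size_bytes / (1024 * 1024)

print(round(bandwidth_mib_s(124_700, 4096), 1))  # 487.1
```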

One last note: I unload the module between each run, and I never have
both modules loaded at the same time. If you are exhausting your memory
pool, that could impact performance.

So things to check:
 - Do you prepare the drive?
 - Do you release the drive memory after each test (you can unload the module)?
 - Use the lfsr random number generator.
 - Do not use the randommap.
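Putting the checklist together, one run might look like the sketch
below. This is only an illustration assembled from the parameters quoted
in this thread (module name `rnull_mod`, device `/dev/rnullb0`, and the
`fio` flags from my command above); it is not the exact script I use:

```shell
#!/bin/sh
set -e

# Start from a clean state so a previous run's memory pool is released.
sudo modprobe -r rnull_mod || true
sudo modprobe rnull_mod

# Prepare: fill the device with data before measuring.
sudo fio --name=prep --rw=write --filename=/dev/rnullb0 --bs=4k --direct=1

# Measure with the lfsr generator and without the randommap.
sudo fio --name=bench --filename=/dev/rnullb0 --rw=randread --bs=4k \
    --numjobs=4 --cpus_allowed=0-3 --cpus_allowed_policy=split \
    --ioengine=psync --direct=1 --time_based=1 --runtime=120 \
    --norandommap --random_generator=lfsr --group_reporting

# Release the device memory afterwards.
sudo modprobe -r rnull_mod
```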

I am not sure what hardware you are running on, but the throughput
numbers you obtain seem _really_ low.

Best regards
Andreas Hindborg