[RFC,03/24] erofs: add Errno in Rust

Message ID	20240916135634.98554-4-toolmanp@tlmp.cc (mailing list archive)
State	New
Headers	show Received: from mail.tlmp.cc (unknown [148.135.17.20]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 4FF80158DDD; Mon, 16 Sep 2024 13:56:44 +0000 (UTC) From: Yiyang Wu <toolmanp@tlmp.cc> To: linux-erofs@lists.ozlabs.org Cc: rust-for-linux@vger.kernel.org, linux-fsdevel@vger.kernel.org, LKML <linux-kernel@vger.kernel.org> Subject: [RFC PATCH 03/24] erofs: add Errno in Rust Date: Mon, 16 Sep 2024 21:56:13 +0800 Message-ID: <20240916135634.98554-4-toolmanp@tlmp.cc> In-Reply-To: <20240916135634.98554-1-toolmanp@tlmp.cc> References: <20240916135634.98554-1-toolmanp@tlmp.cc> Precedence: bulk MIME-Version: 1.0 Content-Transfer-Encoding: 8bit
Series	erofs: introduce Rust implementation \| expand [RFC,00/24] erofs: introduce Rust implementation [RFC,01/24] erofs: lift up erofs_fill_inode to global [RFC,02/24] erofs: add superblock data structure in Rust [RFC,03/24] erofs: add Errno in Rust [RFC,04/24] erofs: add xattrs data structure in Rust [RFC,05/24] erofs: add inode data structure in Rust [RFC,06/24] erofs: add alloc_helper in Rust [RFC,07/24] erofs: add data abstraction in Rust [RFC,08/24] erofs: add device data structure in Rust [RFC,09/24] erofs: add continuous iterators in Rust [RFC,10/24] erofs: add device_infos implementation in Rust [RFC,11/24] erofs: add map data structure in Rust [RFC,12/24] erofs: add directory entry data structure in Rust [RFC,13/24] erofs: add runtime filesystem and inode in Rust [RFC,14/24] erofs: add block mapping capability in Rust [RFC,15/24] erofs: add iter methods in filesystem in Rust [RFC,16/24] erofs: implement dir and inode operations in Rust [RFC,17/24] erofs: introduce Rust SBI to C [RFC,18/24] erofs: introduce iget alternative to C [RFC,19/24] erofs: introduce namei alternative to C [RFC,20/24] erofs: introduce readdir alternative to C [RFC,21/24] erofs: introduce erofs_map_blocks alternative to C [RFC,22/24] erofs: add skippable iters in Rust [RFC,23/24] erofs: implement xattrs operations in Rust [RFC,24/24] erofs: introduce xattrs replacement to C

Yiyang Wu Sept. 16, 2024, 1:56 p.m. UTC

Introduce Errno to Rust side code. Note that in current Rust For Linux,
Errnos are implemented as core::ffi::c_uint unit structs.
However, EUCLEAN, a.k.a EFSCORRUPTED is missing from error crate.

Since the errno_base hasn't changed for over 13 years,
This patch merely serves as a temporary workaround for the missing
errno in the Rust For Linux.

Signed-off-by: Yiyang Wu <toolmanp@tlmp.cc>
---
 fs/erofs/rust/erofs_sys.rs        |   6 +
 fs/erofs/rust/erofs_sys/errnos.rs | 191 ++++++++++++++++++++++++++++++
 2 files changed, 197 insertions(+)
 create mode 100644 fs/erofs/rust/erofs_sys/errnos.rs

Greg KH Sept. 16, 2024, 5:51 p.m. UTC | #1

On Mon, Sep 16, 2024 at 09:56:13PM +0800, Yiyang Wu wrote:
> Introduce Errno to Rust side code. Note that in current Rust For Linux,
> Errnos are implemented as core::ffi::c_uint unit structs.
> However, EUCLEAN, a.k.a EFSCORRUPTED is missing from error crate.
> 
> Since the errno_base hasn't changed for over 13 years,
> This patch merely serves as a temporary workaround for the missing
> errno in the Rust For Linux.

Why not just add the missing errno to the core rust code instead?  No
need to define a whole new one for this.

thanks,

greg k-h

Gary Guo Sept. 16, 2024, 8:01 p.m. UTC | #2

On Mon, 16 Sep 2024 21:56:13 +0800
Yiyang Wu <toolmanp@tlmp.cc> wrote:

> Introduce Errno to Rust side code. Note that in current Rust For Linux,
> Errnos are implemented as core::ffi::c_uint unit structs.
> However, EUCLEAN, a.k.a EFSCORRUPTED is missing from error crate.
> 
> Since the errno_base hasn't changed for over 13 years,
> This patch merely serves as a temporary workaround for the missing
> errno in the Rust For Linux.
> 
> Signed-off-by: Yiyang Wu <toolmanp@tlmp.cc>

As Greg said, please add missing errno that you need to kernel crate
instead.

Also, it seems that you're building abstractions into EROFS directly
without building a generic abstraction. We have been avoiding that. If
there's an abstraction that you need and missing, please add that
abstraction. In fact, there're a bunch of people trying to add FS
support, please coordinate instead of rolling your own.

You also have been referencing `kernel::bindings::` directly in various
places in the patch series. The module is marked as `#[doc(hidden)]`
for a reason -- it's not supposed to referenced directly. It's only
exposed so that macros can reference them. In fact, we have a policy
that direct reference to raw bindings are not allowed from drivers.

There're a few issues with this patch itself that I pointed out below,
although as already said this would require big changes so most points
are probably moot anyway.

Thanks,
Gary

> ---
>  fs/erofs/rust/erofs_sys.rs        |   6 +
>  fs/erofs/rust/erofs_sys/errnos.rs | 191 ++++++++++++++++++++++++++++++
>  2 files changed, 197 insertions(+)
>  create mode 100644 fs/erofs/rust/erofs_sys/errnos.rs
> 
> diff --git a/fs/erofs/rust/erofs_sys.rs b/fs/erofs/rust/erofs_sys.rs
> index 0f1400175fc2..2bd1381da5ab 100644
> --- a/fs/erofs/rust/erofs_sys.rs
> +++ b/fs/erofs/rust/erofs_sys.rs
> @@ -19,4 +19,10 @@
>  pub(crate) type Nid = u64;
>  /// Erofs Super Offset to read the ondisk superblock
>  pub(crate) const EROFS_SUPER_OFFSET: Off = 1024;
> +/// PosixResult as a type alias to kernel::error::Result
> +/// to avoid naming conflicts.
> +pub(crate) type PosixResult<T> = Result<T, Errno>;
> +
> +pub(crate) mod errnos;
>  pub(crate) mod superblock;
> +pub(crate) use errnos::Errno;
> diff --git a/fs/erofs/rust/erofs_sys/errnos.rs b/fs/erofs/rust/erofs_sys/errnos.rs
> new file mode 100644
> index 000000000000..40e5cdbcb353
> --- /dev/null
> +++ b/fs/erofs/rust/erofs_sys/errnos.rs
> @@ -0,0 +1,191 @@
> +// Copyright 2024 Yiyang Wu
> +// SPDX-License-Identifier: MIT or GPL-2.0-or-later
> +
> +#[repr(i32)]
> +#[non_exhaustive]
> +#[allow(clippy::upper_case_acronyms)]
> +#[derive(Debug, Copy, Clone, PartialEq)]
> +pub(crate) enum Errno {
> +    NONE = 0,

Why is NONE an error? No "error: operation completed successfully"
please.

> +    EPERM,
> +    ENOENT,
> +    ESRCH,
> +    EINTR,
> +    EIO,
> +    ENXIO,
> +    E2BIG,
> +    ENOEXEC,
> +    EBADF,
> +    ECHILD,
> +    EAGAIN,
> +    ENOMEM,
> +    EACCES,
> +    EFAULT,
> +    ENOTBLK,
> +    EBUSY,
> +    EEXIST,
> +    EXDEV,
> +    ENODEV,
> +    ENOTDIR,
> +    EISDIR,
> +    EINVAL,
> +    ENFILE,
> +    EMFILE,
> +    ENOTTY,
> +    ETXTBSY,
> +    EFBIG,
> +    ENOSPC,
> +    ESPIPE,
> +    EROFS,
> +    EMLINK,
> +    EPIPE,
> +    EDOM,
> +    ERANGE,
> +    EDEADLK,
> +    ENAMETOOLONG,
> +    ENOLCK,
> +    ENOSYS,
> +    ENOTEMPTY,
> +    ELOOP,
> +    ENOMSG = 42,

This looks very fragile way to maintain an enum.

> +    EIDRM,
> +    ECHRNG,
> +    EL2NSYNC,
> +    EL3HLT,
> +    EL3RST,
> +    ELNRNG,
> +    EUNATCH,
> +    ENOCSI,
> +    EL2HLT,
> +    EBADE,
> +    EBADR,
> +    EXFULL,
> +    ENOANO,
> +    EBADRQC,
> +    EBADSLT,
> +    EBFONT = 59,
> +    ENOSTR,
> +    ENODATA,
> +    ETIME,
> +    ENOSR,
> +    ENONET,
> +    ENOPKG,
> +    EREMOTE,
> +    ENOLINK,
> +    EADV,
> +    ESRMNT,
> +    ECOMM,
> +    EPROTO,
> +    EMULTIHOP,
> +    EDOTDOT,
> +    EBADMSG,
> +    EOVERFLOW,
> +    ENOTUNIQ,
> +    EBADFD,
> +    EREMCHG,
> +    ELIBACC,
> +    ELIBBAD,
> +    ELIBSCN,
> +    ELIBMAX,
> +    ELIBEXEC,
> +    EILSEQ,
> +    ERESTART,
> +    ESTRPIPE,
> +    EUSERS,
> +    ENOTSOCK,
> +    EDESTADDRREQ,
> +    EMSGSIZE,
> +    EPROTOTYPE,
> +    ENOPROTOOPT,
> +    EPROTONOSUPPORT,
> +    ESOCKTNOSUPPORT,
> +    EOPNOTSUPP,
> +    EPFNOSUPPORT,
> +    EAFNOSUPPORT,
> +    EADDRINUSE,
> +    EADDRNOTAVAIL,
> +    ENETDOWN,
> +    ENETUNREACH,
> +    ENETRESET,
> +    ECONNABORTED,
> +    ECONNRESET,
> +    ENOBUFS,
> +    EISCONN,
> +    ENOTCONN,
> +    ESHUTDOWN,
> +    ETOOMANYREFS,
> +    ETIMEDOUT,
> +    ECONNREFUSED,
> +    EHOSTDOWN,
> +    EHOSTUNREACH,
> +    EALREADY,
> +    EINPROGRESS,
> +    ESTALE,
> +    EUCLEAN,
> +    ENOTNAM,
> +    ENAVAIL,
> +    EISNAM,
> +    EREMOTEIO,
> +    EDQUOT,
> +    ENOMEDIUM,
> +    EMEDIUMTYPE,
> +    ECANCELED,
> +    ENOKEY,
> +    EKEYEXPIRED,
> +    EKEYREVOKED,
> +    EKEYREJECTED,
> +    EOWNERDEAD,
> +    ENOTRECOVERABLE,
> +    ERFKILL,
> +    EHWPOISON,
> +    EUNKNOWN,
> +}
> +
> +impl From<i32> for Errno {
> +    fn from(value: i32) -> Self {
> +        if (-value) <= 0 || (-value) > Errno::EUNKNOWN as i32 {
> +            Errno::EUNKNOWN
> +        } else {
> +            // Safety: The value is guaranteed to be a valid errno and the memory
> +            // layout is the same for both types.
> +            unsafe { core::mem::transmute(value) }

This is just unsound. As evident from the fact that you need to manually
specify a few constants, the errno enum doesn't cover all values from 1
to EUNKNOWN.

> +        }
> +    }
> +}
> +
> +impl From<Errno> for i32 {
> +    fn from(value: Errno) -> Self {
> +        -(value as i32)
> +    }
> +}
> +
> +/// Replacement for ERR_PTR in Linux Kernel.
> +impl From<Errno> for *const core::ffi::c_void {
> +    fn from(value: Errno) -> Self {
> +        (-(value as core::ffi::c_long)) as *const core::ffi::c_void
> +    }
> +}
> +
> +impl From<Errno> for *mut core::ffi::c_void {
> +    fn from(value: Errno) -> Self {
> +        (-(value as core::ffi::c_long)) as *mut core::ffi::c_void
> +    }
> +}
> +
> +/// Replacement for PTR_ERR in Linux Kernel.
> +impl From<*const core::ffi::c_void> for Errno {
> +    fn from(value: *const core::ffi::c_void) -> Self {
> +        (-(value as i32)).into()
> +    }
> +}
> +
> +impl From<*mut core::ffi::c_void> for Errno {
> +    fn from(value: *mut core::ffi::c_void) -> Self {
> +        (-(value as i32)).into()
> +    }
> +}
> +/// Replacement for IS_ERR in Linux Kernel.
> +#[inline(always)]
> +pub(crate) fn is_value_err(value: *const core::ffi::c_void) -> bool {
> +    (value as core::ffi::c_ulong) >= (-4095 as core::ffi::c_long) as core::ffi::c_ulong
> +}

Gao Xiang Sept. 16, 2024, 11:45 p.m. UTC | #3

Hi Greg,

On 2024/9/17 01:51, Greg KH wrote:
> On Mon, Sep 16, 2024 at 09:56:13PM +0800, Yiyang Wu wrote:
>> Introduce Errno to Rust side code. Note that in current Rust For Linux,
>> Errnos are implemented as core::ffi::c_uint unit structs.
>> However, EUCLEAN, a.k.a EFSCORRUPTED is missing from error crate.
>>
>> Since the errno_base hasn't changed for over 13 years,
>> This patch merely serves as a temporary workaround for the missing
>> errno in the Rust For Linux.
> 
> Why not just add the missing errno to the core rust code instead?  No
> need to define a whole new one for this.

I've discussed with Yiyang about this last week.  I also tend to avoid
our own errno.

The main reason is that Rust errno misses EUCLEAN error number. TBH, I
don't know why not just introduces all kernel supported errnos for Rust
in one shot.

I guess just because no Rust user uses other errno?  But for errno
cases, I think it's odd for users to add their own errno.

Thanks,
Gao Xiang

> 
> thanks,
> 
> greg k-h

Gao Xiang Sept. 16, 2024, 11:58 p.m. UTC | #4

Hi Gary,

On 2024/9/17 04:01, Gary Guo wrote:
> On Mon, 16 Sep 2024 21:56:13 +0800
> Yiyang Wu <toolmanp@tlmp.cc> wrote:
> 
>> Introduce Errno to Rust side code. Note that in current Rust For Linux,
>> Errnos are implemented as core::ffi::c_uint unit structs.
>> However, EUCLEAN, a.k.a EFSCORRUPTED is missing from error crate.
>>
>> Since the errno_base hasn't changed for over 13 years,
>> This patch merely serves as a temporary workaround for the missing
>> errno in the Rust For Linux.
>>
>> Signed-off-by: Yiyang Wu <toolmanp@tlmp.cc>
> 
> As Greg said, please add missing errno that you need to kernel crate
> instead.

I've answered Greg about this in another email.

> 
> Also, it seems that you're building abstractions into EROFS directly
> without building a generic abstraction. We have been avoiding that. If
> there's an abstraction that you need and missing, please add that
> abstraction. In fact, there're a bunch of people trying to add FS

No, I'd like to try to replace some EROFS C logic first to Rust (by
using EROFS C API interfaces) and try if Rust is really useful for
a real in-tree filesystem.  If Rust can improve EROFS security or
performance (although I'm sceptical on performance), As an EROFS
maintainer, I'm totally fine to accept EROFS Rust logic landed to
help the whole filesystem better.

For Rust VFS abstraction, that is a different and indepenent story,
Yiyang don't have any bandwidth on this due to his limited time.
And I _also_ don't think an incomplete ROFS VFS Rust abstraction
is useful to Linux community (because IMO for generic interface
design, we need a global vision for all filesystems instead of
just ROFSes.  No existing user is not an excuse for an incomplete
abstraction.)

If a reasonble Rust VFS abstraction landed, I think we will switch
to use that, but as I said, they are completely two stories.

> support, please coordinate instead of rolling your own.
> 
> You also have been referencing `kernel::bindings::` directly in various
> places in the patch series. The module is marked as `#[doc(hidden)]`
> for a reason -- it's not supposed to referenced directly. It's only
> exposed so that macros can reference them. In fact, we have a policy
> that direct reference to raw bindings are not allowed from drivers.

This patch can be avoided if EUCLEAN is added to errno.

Thanks,
Gao Xiang

Benno Lossin Sept. 19, 2024, 1:45 p.m. UTC | #5

Hi,

Thanks for the patch series. I think it's great that you want to use
Rust for this filesystem.

On 17.09.24 01:58, Gao Xiang wrote:
> On 2024/9/17 04:01, Gary Guo wrote:
>> Also, it seems that you're building abstractions into EROFS directly
>> without building a generic abstraction. We have been avoiding that. If
>> there's an abstraction that you need and missing, please add that
>> abstraction. In fact, there're a bunch of people trying to add FS
> 
> No, I'd like to try to replace some EROFS C logic first to Rust (by
> using EROFS C API interfaces) and try if Rust is really useful for
> a real in-tree filesystem.  If Rust can improve EROFS security or
> performance (although I'm sceptical on performance), As an EROFS
> maintainer, I'm totally fine to accept EROFS Rust logic landed to
> help the whole filesystem better.

As Gary already said, we have been using a different approach and it has
served us well. Your approach of calling directly into C from the driver
can be used to create a proof of concept, but in our opinion it is not
something that should be put into mainline. That is because calling C
from Rust is rather complicated due to the many nuanced features that
Rust provides (for example the safety requirements of references).
Therefore moving the dangerous parts into a central location is crucial
for making use of all of Rust's advantages inside of your code.

> For Rust VFS abstraction, that is a different and indepenent story,
> Yiyang don't have any bandwidth on this due to his limited time.

This seems a bit weird, you have the bandwidth to write your own
abstractions, but not use the stuff that has already been developed?

I have quickly glanced over the patchset and the abstractions seem
rather immature, not general enough for other filesystems to also take
advantage of them. They also miss safety documentation and are in
general poorly documented.

Additionally, all of the code that I saw is put into the `fs/erofs` and
`rust/erofs_sys` directories. That way people can't directly benefit
from your code, put your general abstractions into the kernel crate.
Soon we will be split the kernel crate, I could imagine that we end up
with an `fs` crate, when that happens, we would put those abstractions
there.

As I don't have the bandwidth to review two different sets of filesystem
abstractions, I can only provide you with feedback if you use the
existing abstractions.

> And I _also_ don't think an incomplete ROFS VFS Rust abstraction
> is useful to Linux community

IIRC Wedson created ROFS VFS abstractions before going for the full
filesystem. So it would definitely be useful for other read-only
filesystems (as well as filesystems that also allow writing, since last
time I checked, they often also support reading).

> (because IMO for generic interface
> design, we need a global vision for all filesystems instead of
> just ROFSes.  No existing user is not an excuse for an incomplete
> abstraction.)

Yes we need a global vision, but if you would use the existing
abstractions, then you would participate in this global vision.

Sorry for repeating this point so many times, but it is *really*
important that we don't have multiple abstractions for the same thing.

> If a reasonble Rust VFS abstraction landed, I think we will switch
> to use that, but as I said, they are completely two stories.

For them to land, there has to be some kind of user. For example, a rust
reference driver, or a new filesystem. For example this one.

---
Cheers,
Benno

Gao Xiang Sept. 19, 2024, 3:13 p.m. UTC | #6

Hi Benno,

On 2024/9/19 21:45, Benno Lossin wrote:
> Hi,
> 
> Thanks for the patch series. I think it's great that you want to use
> Rust for this filesystem.
> 
> On 17.09.24 01:58, Gao Xiang wrote:
>> On 2024/9/17 04:01, Gary Guo wrote:
>>> Also, it seems that you're building abstractions into EROFS directly
>>> without building a generic abstraction. We have been avoiding that. If
>>> there's an abstraction that you need and missing, please add that
>>> abstraction. In fact, there're a bunch of people trying to add FS
>>
>> No, I'd like to try to replace some EROFS C logic first to Rust (by
>> using EROFS C API interfaces) and try if Rust is really useful for
>> a real in-tree filesystem.  If Rust can improve EROFS security or
>> performance (although I'm sceptical on performance), As an EROFS
>> maintainer, I'm totally fine to accept EROFS Rust logic landed to
>> help the whole filesystem better.
> 
> As Gary already said, we have been using a different approach and it has
> served us well. Your approach of calling directly into C from the driver
> can be used to create a proof of concept, but in our opinion it is not
> something that should be put into mainline. That is because calling C
> from Rust is rather complicated due to the many nuanced features that
> Rust provides (for example the safety requirements of references).
> Therefore moving the dangerous parts into a central location is crucial
> for making use of all of Rust's advantages inside of your code.

I'm not quite sure about your point honestly.  In my opinion, there
is nothing different to use Rust _within a filesystem_ or _within a
driver_ or _within a Linux subsystem_ as long as all negotiated APIs
are audited.

Otherwise, it means Rust will never be used to write Linux core parts
such as MM, VFS or block layer. Does this point make sense? At least,
Rust needs to get along with the existing C code (in an audited way)
rather than refuse C code.

My personal idea about Rust: I think Rust is just another _language
tool_ for the Linux kernel which could save us time and make the
kernel development better.

Or I wonder why not writing a complete new Rust stuff instead rather
than living in the C world?

> 
>> For Rust VFS abstraction, that is a different and indepenent story,
>> Yiyang don't have any bandwidth on this due to his limited time.
> 
> This seems a bit weird, you have the bandwidth to write your own
> abstractions, but not use the stuff that has already been developed?

It's not written by me, Yiyang is still an undergraduate tudent.
It's his research project and I don't think it's his responsibility
to make an upstreamable VFS abstraction.

> 
> I have quickly glanced over the patchset and the abstractions seem
> rather immature, not general enough for other filesystems to also take

I don't have enough time to take a full look of this patchset too
due to other ongoing work for now (Rust EROFS is not quite a high
priority stuff for me).

And that's why it's called "RFC PATCH".

> advantage of them. They also miss safety documentation and are in

I don't think it needs to be general enough, since we'd like to use
the new Rust language tool within a subsystem.

So why it needs to take care of other filesystems? Again, I'm not
working on a full VFS abstriction.

Yes, this patchset is not perfect.  But I've asked Yiyang to isolate
all VFS structures as much as possible, but it seems that it still
touches something.

> general poorly documented.

Okay, I think it can be improved then if you give more detailed hints.

> 
> Additionally, all of the code that I saw is put into the `fs/erofs` and
> `rust/erofs_sys` directories. That way people can't directly benefit
> from your code, put your general abstractions into the kernel crate.
> Soon we will be split the kernel crate, I could imagine that we end up
> with an `fs` crate, when that happens, we would put those abstractions
> there.
> 
> As I don't have the bandwidth to review two different sets of filesystem
> abstractions, I can only provide you with feedback if you use the
> existing abstractions.

I think Rust is just a tool, if you could have extra time to review
our work, that would be wonderful!  Many thanks then.

However, if you don't have time to review, IMHO, Rust is just a tool,
I think each subsystem can choose to use Rust in their codebase, or
I'm not sure what's your real point is?

> 
>> And I _also_ don't think an incomplete ROFS VFS Rust abstraction
>> is useful to Linux community
> 
> IIRC Wedson created ROFS VFS abstractions before going for the full
> filesystem. So it would definitely be useful for other read-only
> filesystems (as well as filesystems that also allow writing, since last
> time I checked, they often also support reading).

Leaving aside everything else, an incomplete Rust read-only VFS
abstraction itself is just an unsafe stuff.

> 
>> (because IMO for generic interface
>> design, we need a global vision for all filesystems instead of
>> just ROFSes.  No existing user is not an excuse for an incomplete
>> abstraction.)
> 
> Yes we need a global vision, but if you would use the existing
> abstractions, then you would participate in this global vision.
> 
> Sorry for repeating this point so many times, but it is *really*
> important that we don't have multiple abstractions for the same thing.

I've expressed my viewpoint.

> 
>> If a reasonble Rust VFS abstraction landed, I think we will switch
>> to use that, but as I said, they are completely two stories.
> 
> For them to land, there has to be some kind of user. For example, a rust
> reference driver, or a new filesystem. For example this one.

Without a full proper VFS abstraction, it's just broken and
needs to be refactored.  And that will be painful to all
users then.

=======
In the end,

Other thoughts, comments are helpful here since I wonder how "Rust
-for-Linux" works in the long term, and decide whether I will work
on Kernel Rust or not at least in the short term.

Thanks,
Gao Xiang

> 
> ---
> Cheers,
> Benno
>

Benno Lossin Sept. 19, 2024, 7:36 p.m. UTC | #7

On 19.09.24 17:13, Gao Xiang wrote:
> Hi Benno,
> 
> On 2024/9/19 21:45, Benno Lossin wrote:
>> Hi,
>>
>> Thanks for the patch series. I think it's great that you want to use
>> Rust for this filesystem.
>>
>> On 17.09.24 01:58, Gao Xiang wrote:
>>> On 2024/9/17 04:01, Gary Guo wrote:
>>>> Also, it seems that you're building abstractions into EROFS directly
>>>> without building a generic abstraction. We have been avoiding that. If
>>>> there's an abstraction that you need and missing, please add that
>>>> abstraction. In fact, there're a bunch of people trying to add FS
>>>
>>> No, I'd like to try to replace some EROFS C logic first to Rust (by
>>> using EROFS C API interfaces) and try if Rust is really useful for
>>> a real in-tree filesystem.  If Rust can improve EROFS security or
>>> performance (although I'm sceptical on performance), As an EROFS
>>> maintainer, I'm totally fine to accept EROFS Rust logic landed to
>>> help the whole filesystem better.
>>
>> As Gary already said, we have been using a different approach and it has
>> served us well. Your approach of calling directly into C from the driver
>> can be used to create a proof of concept, but in our opinion it is not
>> something that should be put into mainline. That is because calling C
>> from Rust is rather complicated due to the many nuanced features that
>> Rust provides (for example the safety requirements of references).
>> Therefore moving the dangerous parts into a central location is crucial
>> for making use of all of Rust's advantages inside of your code.
> 
> I'm not quite sure about your point honestly.  In my opinion, there
> is nothing different to use Rust _within a filesystem_ or _within a
> driver_ or _within a Linux subsystem_ as long as all negotiated APIs
> are audited.

To us there is a big difference: If a lot of functions in an API are
`unsafe` without being inherent from the problem that it solves, then
it's a bad API.

> Otherwise, it means Rust will never be used to write Linux core parts
> such as MM, VFS or block layer. Does this point make sense? At least,
> Rust needs to get along with the existing C code (in an audited way)
> rather than refuse C code.

I am neither requiring you to write solely safe code, nor am I banning
interacting with the C side. What we mean when we talk about
abstractions is that we want to minimize the Rust code that directly
interfaces with C. Rust-to-Rust interfaces can be a lot safer and are
easier to implement correctly.

> My personal idea about Rust: I think Rust is just another _language
> tool_ for the Linux kernel which could save us time and make the
> kernel development better.

Yes, but we do have conventions, rules and guidelines for writing such
code. C code also has them. If you want/need to break them, there should
be a good reason to do so. I don't see one in this instance.

> Or I wonder why not writing a complete new Rust stuff instead rather
> than living in the C world?

There are projects that do that yes. But Rust-for-Linux is about
bringing Rust to the kernel and part of that is coming up with good
conventions and rules.

>>> For Rust VFS abstraction, that is a different and indepenent story,
>>> Yiyang don't have any bandwidth on this due to his limited time.
>>
>> This seems a bit weird, you have the bandwidth to write your own
>> abstractions, but not use the stuff that has already been developed?
> 
> It's not written by me, Yiyang is still an undergraduate tudent.
> It's his research project and I don't think it's his responsibility
> to make an upstreamable VFS abstraction.

That is fair, but he wouldn't have to start from scratch, Wedsons
abstractions were good enough for him to write a Rust version of ext2.
In addition, tarfs and puzzlefs also use those bindings.
To me it sounds as if you have not taken the time to try to make it work
with the existing abstractions. Have you tried reaching out to Ariel? He
is working on puzzlefs and might have some insight to give you. Sadly
Wedson has left the project, so someone will have to pick up his work.

I hope that you understand that we can't have two abstractions for the
same C API. It confuses people which to use, some features might only be
available in one version and others only in the other. It would be a
total mess. It's just like the rule for no duplicated drivers that you
have on the C side.

People (mostly Wedson) also have put in a lot of work into making the
VFS abstractions good. Why ignore all of that?

>> I have quickly glanced over the patchset and the abstractions seem
>> rather immature, not general enough for other filesystems to also take
> 
> I don't have enough time to take a full look of this patchset too
> due to other ongoing work for now (Rust EROFS is not quite a high
> priority stuff for me).
> 
> And that's why it's called "RFC PATCH".

Yeah I saw the RFC title. I just wanted to communicate early that I
would not review it if it were a normal patch. In fact, I would advise
against taking the patch, due to the reasons I outlined.

>> advantage of them. They also miss safety documentation and are in
> 
> I don't think it needs to be general enough, since we'd like to use
> the new Rust language tool within a subsystem.
> 
> So why it needs to take care of other filesystems? Again, I'm not
> working on a full VFS abstriction.

And that's OK, feel free to just pick the parts of the existing VFS that
you need and extend as you (or your student) see fit. What you said
yourself is that we need a global vision for VFS abstractions. If you
only use a subset of them, then you only care about that subset, other
people can extend it if they need. If everyone would roll their own
abstractions without communicating, then how would we create a global
vision?

> Yes, this patchset is not perfect.  But I've asked Yiyang to isolate
> all VFS structures as much as possible, but it seems that it still
> touches something.

It would already be a big improvement to put the VFS structures into the
kernel crate. Because then everyone can benefit from your work.

>> general poorly documented.
> 
> Okay, I think it can be improved then if you give more detailed hints.
> 
>>
>> Additionally, all of the code that I saw is put into the `fs/erofs` and
>> `rust/erofs_sys` directories. That way people can't directly benefit
>> from your code, put your general abstractions into the kernel crate.
>> Soon we will be split the kernel crate, I could imagine that we end up
>> with an `fs` crate, when that happens, we would put those abstractions
>> there.
>>
>> As I don't have the bandwidth to review two different sets of filesystem
>> abstractions, I can only provide you with feedback if you use the
>> existing abstractions.
> 
> I think Rust is just a tool, if you could have extra time to review
> our work, that would be wonderful!  Many thanks then.
> 
> However, if you don't have time to review, IMHO, Rust is just a tool,
> I think each subsystem can choose to use Rust in their codebase, or
> I'm not sure what's your real point is?

I don't want to prevent or discourage you from using Rust in the kernel.
In fact, I can't prevent you from putting this in, since after all you
are the maintainer.
What I can do, is advise against not using abstractions. That has been
our philosophy since very early on. They are the reason that you can
write PHY drivers without any `unsafe` code whatsoever *today*. I think
that is an impressive feat and our recipe for success.

We even have this in our documentation:
https://docs.kernel.org/rust/general-information.html#abstractions-vs-bindings

My real point is that I want Rust to succeed in the kernel. I strongly
believe that good abstractions (in the sense that you can do as much as
possible using only safe Rust) are a crucial factor.
I and others from the RfL team can help you if you (or your student)
have any Rust related questions for the abstractions. Feel free to reach
out.

Maybe Miguel can say more on this matter, since he was at the
maintainers summit, but our takeaways essentially are that we want
maintainers to experiment with Rust. And if you don't have any real
users, then breaking the Rust code is fine.
Though I think that with breaking we mean that changes to the C side
prevent the Rust side from working, not shipping Rust code without
abstractions.

We might be able to make an exception to the "your driver can only use
abstractions" rule, but only with the promise that the subsystem is
working towards creating suitable abstractions and replacing the direct
C accesses with that.

I personally think that we should not make that the norm, instead try to
create the minimal abstraction and minimal driver (without directly
calling C) that you need to start. Of course this might not work, the
"minimal driver" might need to be rather complex for you to start, but I
don't know your subsystem to make that judgement.

>>> And I _also_ don't think an incomplete ROFS VFS Rust abstraction
>>> is useful to Linux community
>>
>> IIRC Wedson created ROFS VFS abstractions before going for the full
>> filesystem. So it would definitely be useful for other read-only
>> filesystems (as well as filesystems that also allow writing, since last
>> time I checked, they often also support reading).
> 
> Leaving aside everything else, an incomplete Rust read-only VFS
> abstraction itself is just an unsafe stuff.

I don't understand what you want to say.

>>> (because IMO for generic interface
>>> design, we need a global vision for all filesystems instead of
>>> just ROFSes.  No existing user is not an excuse for an incomplete
>>> abstraction.)
>>
>> Yes we need a global vision, but if you would use the existing
>> abstractions, then you would participate in this global vision.
>>
>> Sorry for repeating this point so many times, but it is *really*
>> important that we don't have multiple abstractions for the same thing.
> 
> I've expressed my viewpoint.
> 
>>
>>> If a reasonble Rust VFS abstraction landed, I think we will switch
>>> to use that, but as I said, they are completely two stories.
>>
>> For them to land, there has to be some kind of user. For example, a rust
>> reference driver, or a new filesystem. For example this one.
> 
> Without a full proper VFS abstraction, it's just broken and
> needs to be refactored.  And that will be painful to all
> users then.

I also don't understand your point here. What is broken, this EROFS
implementation? Why will it be painful to refactor?

> =======
> In the end,
> 
> Other thoughts, comments are helpful here since I wonder how "Rust
> -for-Linux" works in the long term, and decide whether I will work
> on Kernel Rust or not at least in the short term.

The longterm goal is to make everything that is possible in C, possible
in Rust. For more info, please take a look at the kernel summit talk by
Miguel Ojeda.
However, we can only reach that longterm goal if maintainers are willing
and ready to put Rust into their subsystems (either by knowing/learning
Rust themselves or by having a co-maintainer that does just the Rust
part). So you wanting to experiment is great. I appreciate that you also
have a student working on this. Still, I think we should follow our
guidelines and create abstractions in order to require as little
`unsafe` code as possible.

---
Cheers,
Benno

Gao Xiang Sept. 20, 2024, 12:49 a.m. UTC | #8

On 2024/9/20 03:36, Benno Lossin wrote:
> On 19.09.24 17:13, Gao Xiang wrote:
>> Hi Benno,
>>
>> On 2024/9/19 21:45, Benno Lossin wrote:
>>> Hi,
>>>
>>> Thanks for the patch series. I think it's great that you want to use
>>> Rust for this filesystem.
>>>
>>> On 17.09.24 01:58, Gao Xiang wrote:
>>>> On 2024/9/17 04:01, Gary Guo wrote:
>>>>> Also, it seems that you're building abstractions into EROFS directly
>>>>> without building a generic abstraction. We have been avoiding that. If
>>>>> there's an abstraction that you need and missing, please add that
>>>>> abstraction. In fact, there're a bunch of people trying to add FS
>>>>
>>>> No, I'd like to try to replace some EROFS C logic first to Rust (by
>>>> using EROFS C API interfaces) and try if Rust is really useful for
>>>> a real in-tree filesystem.  If Rust can improve EROFS security or
>>>> performance (although I'm sceptical on performance), As an EROFS
>>>> maintainer, I'm totally fine to accept EROFS Rust logic landed to
>>>> help the whole filesystem better.
>>>
>>> As Gary already said, we have been using a different approach and it has
>>> served us well. Your approach of calling directly into C from the driver
>>> can be used to create a proof of concept, but in our opinion it is not
>>> something that should be put into mainline. That is because calling C
>>> from Rust is rather complicated due to the many nuanced features that
>>> Rust provides (for example the safety requirements of references).
>>> Therefore moving the dangerous parts into a central location is crucial
>>> for making use of all of Rust's advantages inside of your code.
>>
>> I'm not quite sure about your point honestly.  In my opinion, there
>> is nothing different to use Rust _within a filesystem_ or _within a
>> driver_ or _within a Linux subsystem_ as long as all negotiated APIs
>> are audited.
> 
> To us there is a big difference: If a lot of functions in an API are
> `unsafe` without being inherent from the problem that it solves, then
> it's a bad API.

Which one? If you point it out, we will update the EROFS kernel
APIs then.

> 
>> Otherwise, it means Rust will never be used to write Linux core parts
>> such as MM, VFS or block layer. Does this point make sense? At least,
>> Rust needs to get along with the existing C code (in an audited way)
>> rather than refuse C code.
> 
> I am neither requiring you to write solely safe code, nor am I banning
> interacting with the C side. What we mean when we talk about
> abstractions is that we want to minimize the Rust code that directly
> interfaces with C. Rust-to-Rust interfaces can be a lot safer and are

We will definitly minimize the API interface between Rust and C in
EROFS.

And it can be done incrementally, why not?  I assume your world is
not pure C and pure Rust as for the Rust for Linux project, no?

> easier to implement correctly.
> 
>> My personal idea about Rust: I think Rust is just another _language
>> tool_ for the Linux kernel which could save us time and make the
>> kernel development better.
> 
> Yes, but we do have conventions, rules and guidelines for writing such
> code. C code also has them. If you want/need to break them, there should
> be a good reason to do so. I don't see one in this instance.
> >> Or I wonder why not writing a complete new Rust stuff instead rather
>> than living in the C world?
> 
> There are projects that do that yes. But Rust-for-Linux is about
> bringing Rust to the kernel and part of that is coming up with good
> conventions and rules.

Which rule is broken?  Was they discussed widely around the
Linux world?

> 
>>>> For Rust VFS abstraction, that is a different and indepenent story,
>>>> Yiyang don't have any bandwidth on this due to his limited time.
>>>
>>> This seems a bit weird, you have the bandwidth to write your own
>>> abstractions, but not use the stuff that has already been developed?
>>
>> It's not written by me, Yiyang is still an undergraduate tudent.
>> It's his research project and I don't think it's his responsibility
>> to make an upstreamable VFS abstraction.
> 
> That is fair, but he wouldn't have to start from scratch, Wedsons
> abstractions were good enough for him to write a Rust version of ext2.

The Wedson one is just broken, I assume that you've read
https://lwn.net/Articles/978738/ ?

The initial Linux VFS C version is already for generic
read-write use.

> In addition, tarfs and puzzlefs also use those bindings.

These are both toy fses, I don't know who will use these two
fses for their customers.

> To me it sounds as if you have not taken the time to try to make it work
> with the existing abstractions. Have you tried reaching out to Ariel? He
> is working on puzzlefs and might have some insight to give you. Sadly

IMHO, puzzlefs is another Rust incomplete clone of EROFS, I
could tell him what EROFS currently do.

I'm very happy to collaborate with him to work on his use
cases (and tell him why EROFS can already be used for his
use cases), just
like the previous ComposeFS discussion.

There are enough FS projects which reinvents in-tree fses without
enough good reasons (for example, performance or design): ZUFS,
FamFS, ComposeFS.

Tarfs (here tar is not the real tar format), and Puzzlefs are
two special one just because they are written in Rust.  But
other than that they are just incomplete approach to EROFS.

I do hope Ariel could attend LSF/MM/BPF to discuss his use
cases with filesystem developpers.  And I'm very happy to
collaborate with him .

> Wedson has left the project, so someone will have to pick up his work.

It's not necessary to be Yiyang, since he's interested in
EROFS only.

> 
> I hope that you understand that we can't have two abstractions for the
> same C API. It confuses people which to use, some features might only be
> available in one version and others only in the other. It would be a
> total mess. It's just like the rule for no duplicated drivers that you
> have on the C side.
> 
> People (mostly Wedson) also have put in a lot of work into making the
> VFS abstractions good. Why ignore all of that?

How good? TBH, I think there could be something left eventually,
but the current prososed Rust VFS abstraction is just broken.

I don't think the current abstraction is of any use to be
upstreamed, at least, it should be driven with a generic
read-write filesystem, and resolve lifetime issues during
development (for example, just like Al mentioned d_name and
d_parent, etc.)

Because the initial Linux VFS C version is completely out of
minix fs, rather than an incomplete broken one just for some
toy (I don't know how to express more accurately, since each
upstream filesystem should have strong use cases and users,
but tarfs and puzzlefs are both not.)

Otherwise, all the broken Rust VFS users will be painful for
bugs and endless API refactering.

> 
>>> I have quickly glanced over the patchset and the abstractions seem
>>> rather immature, not general enough for other filesystems to also take
>>
>> I don't have enough time to take a full look of this patchset too
>> due to other ongoing work for now (Rust EROFS is not quite a high
>> priority stuff for me).
>>
>> And that's why it's called "RFC PATCH".
> 
> Yeah I saw the RFC title. I just wanted to communicate early that I
> would not review it if it were a normal patch. In fact, I would advise
> against taking the patch, due to the reasons I outlined.

You reason currently is still not valid.  IMO, again, Rust
is just a tool, you cannot forbid a real Linux subsystem to
use Rust as an experiment.

Or what's your real point?  An Rust gatekeeper?

> 
>>> advantage of them. They also miss safety documentation and are in
>>
>> I don't think it needs to be general enough, since we'd like to use
>> the new Rust language tool within a subsystem.
>>
>> So why it needs to take care of other filesystems? Again, I'm not
>> working on a full VFS abstriction.
> 
> And that's OK, feel free to just pick the parts of the existing VFS that
> you need and extend as you (or your student) see fit. What you said
> yourself is that we need a global vision for VFS abstractions. If you
> only use a subset of them, then you only care about that subset, other
> people can extend it if they need. If everyone would roll their own
> abstractions without communicating, then how would we create a global
> vision?

No.  We don't roll our own abstraction, instead we define a
clear C <-> Rust boundary of EROFS APIs, just like
"fs/xfs/libxfs" if you could take a look.

> 
>> Yes, this patchset is not perfect.  But I've asked Yiyang to isolate
>> all VFS structures as much as possible, but it seems that it still
>> touches something.
> 
> It would already be a big improvement to put the VFS structures into the
> kernel crate. Because then everyone can benefit from your work.

Again, that is not Yiyang's interest.  Which is just like to sell
something you don't want, I don't think it's reasonable.

> 
>>> general poorly documented.
>>
>> Okay, I think it can be improved then if you give more detailed hints.
>>
>>>
>>> Additionally, all of the code that I saw is put into the `fs/erofs` and
>>> `rust/erofs_sys` directories. That way people can't directly benefit
>>> from your code, put your general abstractions into the kernel crate.
>>> Soon we will be split the kernel crate, I could imagine that we end up
>>> with an `fs` crate, when that happens, we would put those abstractions
>>> there.
>>>
>>> As I don't have the bandwidth to review two different sets of filesystem
>>> abstractions, I can only provide you with feedback if you use the
>>> existing abstractions.
>>
>> I think Rust is just a tool, if you could have extra time to review
>> our work, that would be wonderful!  Many thanks then.
>>
>> However, if you don't have time to review, IMHO, Rust is just a tool,
>> I think each subsystem can choose to use Rust in their codebase, or
>> I'm not sure what's your real point is?
> 
> I don't want to prevent or discourage you from using Rust in the kernel.
> In fact, I can't prevent you from putting this in, since after all you
> are the maintainer.

I do think you're discouraging anyone to use Rust in their codebase,
because I've said we _will_ form a good abstraction in our codebase.

But you're just selling another stuff forcely.

> What I can do, is advise against not using abstractions. That has been
> our philosophy since very early on. They are the reason that you can
> write PHY drivers without any `unsafe` code whatsoever *today*. I think

I don't think filesystems are comparable to some PHY drivers.  If you
take ext4, it's more than 65000 line, and XFS, that is almost 78000.

I think filesystems can have a way to be reimplmented in Rust
incrementally, rather than purely black and write world.

> that is an impressive feat and our recipe for success.
> 
> We even have this in our documentation:
> https://docs.kernel.org/rust/general-information.html#abstractions-vs-bindings
> 
> My real point is that I want Rust to succeed in the kernel. I strongly
> believe that good abstractions (in the sense that you can do as much as
> possible using only safe Rust) are a crucial factor.

On my side, you are just isolating any useful subsystem to try
to use Rust and

“Leaf” modules (e.g. drivers) should not use the C bindings directly

is unreasonable for filesystems because it cannot be done in one
shot.

In addition, there are a lot ongoing features on both C and Rust
side, you need at least a fallback to make end users happy with a
unique feature view rather than just return "it's broken".

Users don't care C or Rust (they only care full functionality)
but only developers care.  And mixing up two different things
(VFS abstraction and use Rust in the codebase) is not good to
RFL success.

IMHO, this is "Rust for Linux" not "Linux for Rust".

> I and others from the RfL team can help you if you (or your student)
> have any Rust related questions for the abstractions. Feel free to reach
> out.
> 
> 
> Maybe Miguel can say more on this matter, since he was at the
> maintainers summit, but our takeaways essentially are that we want
> maintainers to experiment with Rust. And if you don't have any real
> users, then breaking the Rust code is fine.
> Though I think that with breaking we mean that changes to the C side
> prevent the Rust side from working, not shipping Rust code without
> abstractions.

I think you're still mixing them up.

> 
> We might be able to make an exception to the "your driver can only use
> abstractions" rule, but only with the promise that the subsystem is
> working towards creating suitable abstractions and replacing the direct
> C accesses with that.

I don't think those rules are reasonable for RFL success, honestly.

You are artificially isolating the Linux C and Rust world, not from
Linux users or Linux ecosystem perspective, but only from some
developer perspersive.

Good luck, anyway.

> 
> I personally think that we should not make that the norm, instead try to
> create the minimal abstraction and minimal driver (without directly
> calling C) that you need to start. Of course this might not work, the
> "minimal driver" might need to be rather complex for you to start, but I
> don't know your subsystem to make that judgement.

...

>>
>> Without a full proper VFS abstraction, it's just broken and
>> needs to be refactored.  And that will be painful to all
>> users then.
> 
> I also don't understand your point here. What is broken, this EROFS
> implementation? Why will it be painful to refactor?

I've said earlier.

> 
>> =======
>> In the end,
>>
>> Other thoughts, comments are helpful here since I wonder how "Rust
>> -for-Linux" works in the long term, and decide whether I will work
>> on Kernel Rust or not at least in the short term.
> 
> The longterm goal is to make everything that is possible in C, possible
> in Rust. For more info, please take a look at the kernel summit talk by

But you're disallowing Rust in the codebase.

> Miguel Ojeda.
> However, we can only reach that longterm goal if maintainers are willing
> and ready to put Rust into their subsystems (either by knowing/learning
> Rust themselves or by having a co-maintainer that does just the Rust
> part). So you wanting to experiment is great. I appreciate that you also
> have a student working on this. Still, I think we should follow our
> guidelines and create abstractions in order to require as little
> `unsafe` code as possible.

I've expressed my point.  I don't think some `guideline`
could bring success to RFL.  Since many subsystems needs
an incremental way, not just a black-or-white thing.

Thanks,
Gao Xiang

> 
> ---
> Cheers,
> Benno

Yiyang Wu Sept. 20, 2024, 2:57 a.m. UTC | #9

On Mon, Sep 16, 2024 at 07:51:40PM GMT, Greg KH wrote:
> On Mon, Sep 16, 2024 at 09:56:13PM +0800, Yiyang Wu wrote:
> > Introduce Errno to Rust side code. Note that in current Rust For Linux,
> > Errnos are implemented as core::ffi::c_uint unit structs.
> > However, EUCLEAN, a.k.a EFSCORRUPTED is missing from error crate.
> > 
> > Since the errno_base hasn't changed for over 13 years,
> > This patch merely serves as a temporary workaround for the missing
> > errno in the Rust For Linux.
> 
> Why not just add the missing errno to the core rust code instead?  No
> need to define a whole new one for this.
> 
> thanks,
> 
> greg k-h

I have added all the missing errnos by autogenerating declare_err!
in the preceding patches. Please check :)

Best Regards,

Yiyang Wu

Greg KH Sept. 21, 2024, 8:37 a.m. UTC | #10

On Fri, Sep 20, 2024 at 08:49:26AM +0800, Gao Xiang wrote:
> 
> 
> On 2024/9/20 03:36, Benno Lossin wrote:
> > On 19.09.24 17:13, Gao Xiang wrote:
> > > Hi Benno,
> > > 
> > > On 2024/9/19 21:45, Benno Lossin wrote:
> > > > Hi,
> > > > 
> > > > Thanks for the patch series. I think it's great that you want to use
> > > > Rust for this filesystem.
> > > > 
> > > > On 17.09.24 01:58, Gao Xiang wrote:
> > > > > On 2024/9/17 04:01, Gary Guo wrote:
> > > > > > Also, it seems that you're building abstractions into EROFS directly
> > > > > > without building a generic abstraction. We have been avoiding that. If
> > > > > > there's an abstraction that you need and missing, please add that
> > > > > > abstraction. In fact, there're a bunch of people trying to add FS
> > > > > 
> > > > > No, I'd like to try to replace some EROFS C logic first to Rust (by
> > > > > using EROFS C API interfaces) and try if Rust is really useful for
> > > > > a real in-tree filesystem.  If Rust can improve EROFS security or
> > > > > performance (although I'm sceptical on performance), As an EROFS
> > > > > maintainer, I'm totally fine to accept EROFS Rust logic landed to
> > > > > help the whole filesystem better.
> > > > 
> > > > As Gary already said, we have been using a different approach and it has
> > > > served us well. Your approach of calling directly into C from the driver
> > > > can be used to create a proof of concept, but in our opinion it is not
> > > > something that should be put into mainline. That is because calling C
> > > > from Rust is rather complicated due to the many nuanced features that
> > > > Rust provides (for example the safety requirements of references).
> > > > Therefore moving the dangerous parts into a central location is crucial
> > > > for making use of all of Rust's advantages inside of your code.
> > > 
> > > I'm not quite sure about your point honestly.  In my opinion, there
> > > is nothing different to use Rust _within a filesystem_ or _within a
> > > driver_ or _within a Linux subsystem_ as long as all negotiated APIs
> > > are audited.
> > 
> > To us there is a big difference: If a lot of functions in an API are
> > `unsafe` without being inherent from the problem that it solves, then
> > it's a bad API.
> 
> Which one? If you point it out, we will update the EROFS kernel
> APIs then.
> 
> > 
> > > Otherwise, it means Rust will never be used to write Linux core parts
> > > such as MM, VFS or block layer. Does this point make sense? At least,
> > > Rust needs to get along with the existing C code (in an audited way)
> > > rather than refuse C code.
> > 
> > I am neither requiring you to write solely safe code, nor am I banning
> > interacting with the C side. What we mean when we talk about
> > abstractions is that we want to minimize the Rust code that directly
> > interfaces with C. Rust-to-Rust interfaces can be a lot safer and are
> 
> We will definitly minimize the API interface between Rust and C in
> EROFS.
> 
> And it can be done incrementally, why not?  I assume your world is
> not pure C and pure Rust as for the Rust for Linux project, no?
> 
> > easier to implement correctly.
> > 
> > > My personal idea about Rust: I think Rust is just another _language
> > > tool_ for the Linux kernel which could save us time and make the
> > > kernel development better.
> > 
> > Yes, but we do have conventions, rules and guidelines for writing such
> > code. C code also has them. If you want/need to break them, there should
> > be a good reason to do so. I don't see one in this instance.
> > >> Or I wonder why not writing a complete new Rust stuff instead rather
> > > than living in the C world?
> > 
> > There are projects that do that yes. But Rust-for-Linux is about
> > bringing Rust to the kernel and part of that is coming up with good
> > conventions and rules.
> 
> Which rule is broken?  Was they discussed widely around the
> Linux world?
> 
> > 
> > > > > For Rust VFS abstraction, that is a different and indepenent story,
> > > > > Yiyang don't have any bandwidth on this due to his limited time.
> > > > 
> > > > This seems a bit weird, you have the bandwidth to write your own
> > > > abstractions, but not use the stuff that has already been developed?
> > > 
> > > It's not written by me, Yiyang is still an undergraduate tudent.
> > > It's his research project and I don't think it's his responsibility
> > > to make an upstreamable VFS abstraction.
> > 
> > That is fair, but he wouldn't have to start from scratch, Wedsons
> > abstractions were good enough for him to write a Rust version of ext2.
> 
> The Wedson one is just broken, I assume that you've read
> https://lwn.net/Articles/978738/ ?

Yes, and if you see the patches on linux-fsdevel, people are working to
get these vfs bindings correct for any filesystem to use.  Please review
them and see if they will work for you for erofs, as "burying" the
binding in just one filesystem is not a good idea.

> > In addition, tarfs and puzzlefs also use those bindings.
> 
> These are both toy fses, I don't know who will use these two
> fses for their customers.

tarfs is being used by real users as it solves a need they have today.
And it's a good example of how the vfs bindings would work, although
in a very simple way.  You have to start somewhere :)

> > Miguel Ojeda.
> > However, we can only reach that longterm goal if maintainers are willing
> > and ready to put Rust into their subsystems (either by knowing/learning
> > Rust themselves or by having a co-maintainer that does just the Rust
> > part). So you wanting to experiment is great. I appreciate that you also
> > have a student working on this. Still, I think we should follow our
> > guidelines and create abstractions in order to require as little
> > `unsafe` code as possible.
> 
> I've expressed my point.  I don't think some `guideline`
> could bring success to RFL.  Since many subsystems needs
> an incremental way, not just a black-or-white thing.

Incremental is good, and if you want to use a .rs file or two in the
middle of your module, that's fine.  But please don't try to implement
bindings to common kernel data structures like inodes and dentries and
superblocks if at all possible and ignore the work that others are doing
in this area as that's just duplicated work and will cause more
confusion over time.

It's the same for drivers, I will object strongly if someone attempted
to create a USB binding for 'struct usb_interface' in the middle of a
USB driver and instead insist they work on a generic binding that can be
used by all USB drivers.  I imagine the VFS maintainers have the same
opinion on their apis as well for valid reasons.

thanks,

greg k-h

Gao Xiang Sept. 21, 2024, 9:29 a.m. UTC | #11

Hi Greg,

On 2024/9/21 16:37, Greg Kroah-Hartman wrote:
> On Fri, Sep 20, 2024 at 08:49:26AM +0800, Gao Xiang wrote:
>>
>>

...

>>
>>>
>>>>>> For Rust VFS abstraction, that is a different and indepenent story,
>>>>>> Yiyang don't have any bandwidth on this due to his limited time.
>>>>>
>>>>> This seems a bit weird, you have the bandwidth to write your own
>>>>> abstractions, but not use the stuff that has already been developed?
>>>>
>>>> It's not written by me, Yiyang is still an undergraduate tudent.
>>>> It's his research project and I don't think it's his responsibility
>>>> to make an upstreamable VFS abstraction.
>>>
>>> That is fair, but he wouldn't have to start from scratch, Wedsons
>>> abstractions were good enough for him to write a Rust version of ext2.
>>
>> The Wedson one is just broken, I assume that you've read
>> https://lwn.net/Articles/978738/ ?
> 
> Yes, and if you see the patches on linux-fsdevel, people are working to
> get these vfs bindings correct for any filesystem to use.  Please review
> them and see if they will work for you for erofs, as "burying" the
> binding in just one filesystem is not a good idea.

Thanks for the reply!

I do think the first Rust filesystem should be ext2 or other
simple read-write fses due to many VFS member lifetime
concerns as other filesystem developpers suggested before [1],
otherwise honestly the VFS abstraction will be refactoredagain
and again just due to limited vision and broken functionality,
I do think which way is not how currently new C Linux kernel
APIs are proposed too (e.g. carefully review all potential use
cases).

[1] https://lore.kernel.org/linux-fsdevel/ZZ3GeehAw%2F78gZJk@dread.disaster.area/

> 
>>> In addition, tarfs and puzzlefs also use those bindings.
>>
>> These are both toy fses, I don't know who will use these two
>> fses for their customers.
> 
> tarfs is being used by real users as it solves a need they have today.
> And it's a good example of how the vfs bindings would work, although
> in a very simple way.  You have to start somewhere :)

EROFS has resolved the same functionality upstream in
2023, see [2]

```
Replacing tar or cpio archives with a filesystem is a
potential use case for EROFS. There has been a proposal
from the confidential-computing community for a kernel
tarfs filesystem, which would allow guest VMs to
efficiently mount a tar file directly. But EROFS would
be a better choice, he said. There is a proof-of-concept
patch set that allows directly mounting a downloaded tar
file using EROFS that performs better than unpacking the
tarball to ext4, then mounting it in the guest using
overlayfs.
```

Honestly, I've kept doing very friendly public/private
communitation with Wedson in the confidential-computing
community to see if there could be a collaboration for
our tar direct mount use cases, but he just ignored my
suggestion [3] and keep on his "tarfs" development (even
this "tarfs" has no relationship with the standard
POSIX tar/pax format because you cannot mount a real
POSIX tar/pax by his implementation.)

So my concern is just as below:
  1) EROFS can already work well for his "tarfs" use
     cases, so there is already an in-tree stuff works
     without any issue;

  2) No matter from his "tarfs" current on-disk format,
     and on-disk extendability perspersive, I think it
     will be used for a very narrow use case.
     So in the long term, it could be vanished or forget
     since there are more powerful alternatives in the
     kernel tree for more wider use cases.

I think there could be some example fs to show Rust VFS
abstraction (such as ext2, and even minix fs).  Those
fses shouldn't be too hard to get a Rust implementation
(e.g. minix fs for pre Linux 1.0).  But honestly I don't
think it's a good idea to upstream a narrow use case
stuff even it's written in Rust: also considering Wedson
has been quited, so the code may not be maintainerd
anymore.

In short, I do _hope_ a nice Rust VFS abstraction could
be landed upstream.  But it should be driven by a simple
no-journal read-write filesystem to check all Rust VFS
components in the global vision instead of some
unsustainable random upstream work just for
corporation pride likewise.

And if some other approach could compare EROFS as a known
prior art (as I once fully compared with SquashFS in the
paper) with good reasons, I will be very happy and feel
respect (also I could know the limitation of EROFS or how
to improve EROFS.)  But if there is no reason and just
ignore EROFS exists, and I think it's not the proper way
to propose a new kernel feature / filesystem.

[2] https://lwn.net/Articles/934047
[3] https://github.com/kata-containers/kata-containers/pull/7106#issuecomment-1592192981

> 
>>> Miguel Ojeda.
>>> However, we can only reach that longterm goal if maintainers are willing
>>> and ready to put Rust into their subsystems (either by knowing/learning
>>> Rust themselves or by having a co-maintainer that does just the Rust
>>> part). So you wanting to experiment is great. I appreciate that you also
>>> have a student working on this. Still, I think we should follow our
>>> guidelines and create abstractions in order to require as little
>>> `unsafe` code as possible.
>>
>> I've expressed my point.  I don't think some `guideline`
>> could bring success to RFL.  Since many subsystems needs
>> an incremental way, not just a black-or-white thing.
> 
> Incremental is good, and if you want to use a .rs file or two in the
> middle of your module, that's fine.  But please don't try to implement
> bindings to common kernel data structures like inodes and dentries and
> superblocks if at all possible and ignore the work that others are doing
> in this area as that's just duplicated work and will cause more
> confusion over time.

Yeah, agreed. That is what I'd like to say.

Honestly, Yiyang don't have enough time to implement
VFS abstraction due to his studies and my time slot is
limited for now too.

So I also asked him to "don't touch common kernel data
structures like inodes and dentries and superblocks
if possible" and just convert the EROFS core logic.

But it seems his RFC patch is still left something,
I think he will address them the next version.

> 
> It's the same for drivers, I will object strongly if someone attempted
> to create a USB binding for 'struct usb_interface' in the middle of a
> USB driver and instead insist they work on a generic binding that can be
> used by all USB drivers.  I imagine the VFS maintainers have the same
> opinion on their apis as well for valid reasons.

Agreed.

Thanks,
Gao Xiang

> 
> thanks,
> 
> greg k-h

Ariel Miculas Sept. 25, 2024, 3:48 p.m. UTC | #12

On 24/09/19 07:36, Benno Lossin via Linux-erofs wrote:
> On 19.09.24 17:13, Gao Xiang wrote:
> > Hi Benno,
> > 
> > On 2024/9/19 21:45, Benno Lossin wrote:
> >> Hi,
> >>
> >> Thanks for the patch series. I think it's great that you want to use
> >> Rust for this filesystem.
> >>
> >> On 17.09.24 01:58, Gao Xiang wrote:
> >>> On 2024/9/17 04:01, Gary Guo wrote:
> >>>> Also, it seems that you're building abstractions into EROFS directly
> >>>> without building a generic abstraction. We have been avoiding that. If
> >>>> there's an abstraction that you need and missing, please add that
> >>>> abstraction. In fact, there're a bunch of people trying to add FS
> >>>
> >>> No, I'd like to try to replace some EROFS C logic first to Rust (by
> >>> using EROFS C API interfaces) and try if Rust is really useful for
> >>> a real in-tree filesystem.  If Rust can improve EROFS security or
> >>> performance (although I'm sceptical on performance), As an EROFS
> >>> maintainer, I'm totally fine to accept EROFS Rust logic landed to
> >>> help the whole filesystem better.
> >>
> >> As Gary already said, we have been using a different approach and it has
> >> served us well. Your approach of calling directly into C from the driver
> >> can be used to create a proof of concept, but in our opinion it is not
> >> something that should be put into mainline. That is because calling C
> >> from Rust is rather complicated due to the many nuanced features that
> >> Rust provides (for example the safety requirements of references).
> >> Therefore moving the dangerous parts into a central location is crucial
> >> for making use of all of Rust's advantages inside of your code.
> > 
> > I'm not quite sure about your point honestly.  In my opinion, there
> > is nothing different to use Rust _within a filesystem_ or _within a
> > driver_ or _within a Linux subsystem_ as long as all negotiated APIs
> > are audited.
> 
> To us there is a big difference: If a lot of functions in an API are
> `unsafe` without being inherent from the problem that it solves, then
> it's a bad API.
> 
> > Otherwise, it means Rust will never be used to write Linux core parts
> > such as MM, VFS or block layer. Does this point make sense? At least,
> > Rust needs to get along with the existing C code (in an audited way)
> > rather than refuse C code.
> 
> I am neither requiring you to write solely safe code, nor am I banning
> interacting with the C side. What we mean when we talk about
> abstractions is that we want to minimize the Rust code that directly
> interfaces with C. Rust-to-Rust interfaces can be a lot safer and are
> easier to implement correctly.
> 
> > My personal idea about Rust: I think Rust is just another _language
> > tool_ for the Linux kernel which could save us time and make the
> > kernel development better.
> 
> Yes, but we do have conventions, rules and guidelines for writing such
> code. C code also has them. If you want/need to break them, there should
> be a good reason to do so. I don't see one in this instance.
> 
> > Or I wonder why not writing a complete new Rust stuff instead rather
> > than living in the C world?
> 
> There are projects that do that yes. But Rust-for-Linux is about
> bringing Rust to the kernel and part of that is coming up with good
> conventions and rules.
> 
> >>> For Rust VFS abstraction, that is a different and indepenent story,
> >>> Yiyang don't have any bandwidth on this due to his limited time.
> >>
> >> This seems a bit weird, you have the bandwidth to write your own
> >> abstractions, but not use the stuff that has already been developed?
> > 
> > It's not written by me, Yiyang is still an undergraduate tudent.
> > It's his research project and I don't think it's his responsibility
> > to make an upstreamable VFS abstraction.
> 
> That is fair, but he wouldn't have to start from scratch, Wedsons
> abstractions were good enough for him to write a Rust version of ext2.
> In addition, tarfs and puzzlefs also use those bindings.
> To me it sounds as if you have not taken the time to try to make it work
> with the existing abstractions. Have you tried reaching out to Ariel? He
> is working on puzzlefs and might have some insight to give you. Sadly
> Wedson has left the project, so someone will have to pick up his work.

I share the same opinions as Benno that we should try to use the
existing filesystem abstractions, even if they are not yet upstream.
Since erofs is a read-only filesystem and the Rust filesystem
abstractions are also used by other two read-only filesystems (TarFS and
PuzzleFS), it shouldn't be too difficult to adapt the erofs Rust code so
that it also uses the existing filesystem abstractions. And if there is
anything lacking, we can improve the existing generic APIs. This would
also increase the chances of upstreaming them.

I'm happy to help you if you decide to go down this route.

Cheers,
Ariel

Gao Xiang Sept. 25, 2024, 4:35 p.m. UTC | #13

Hi Ariel,

On 2024/9/25 23:48, Ariel Miculas wrote:

...

> I share the same opinions as Benno that we should try to use the
> existing filesystem abstractions, even if they are not yet upstream.
> Since erofs is a read-only filesystem and the Rust filesystem
> abstractions are also used by other two read-only filesystems (TarFS and
> PuzzleFS), it shouldn't be too difficult to adapt the erofs Rust code so
> that it also uses the existing filesystem abstractions. And if there is
> anything lacking, we can improve the existing generic APIs. This would
> also increase the chances of upstreaming them.

I've expressed my ideas about "TarFS" [1] and PuzzleFS already: since
I'm one of the EROFS authors, I should be responsible for this
long-term project as my own promise to the Linux community and makes
it serve for more Linux users (it has not been interrupted since 2017,
even I sacrificed almost all my leisure time because the EROFS project
isn't all my paid job, I need to maintain our internal kernel storage
stack too).

[1] https://lore.kernel.org/r/3a6314fc-7956-47f3-8727-9dc026f3f50e@linux.alibaba.com

Basically there should be some good reasons to upstream a new stuff to
Linux kernel, I believe it has no exception on the Rust side even it's
somewhat premature: please help compare to the prior arts in details.

And there are all thoughts for reference [2][3][4][5]:
[2] https://github.com/project-machine/puzzlefs/issues/114#issuecomment-2369872133
[3] https://github.com/opencontainers/image-spec/issues/1190#issuecomment-2138572683
[4] https://lore.kernel.org/linux-fsdevel/b9358e7c-8615-1b12-e35d-aae59bf6a467@linux.alibaba.com/
[5] https://lore.kernel.org/linux-fsdevel/20230609-nachrangig-handwagen-375405d3b9f1@brauner/

Here still, I do really want to collaborate with you on your
reasonable use cases.  But if you really want to do your upstream
attempt without even any comparsion, please go ahead because I
believe I can only express my own opinion, but I really don't
decide if your work is acceptable for the kernel.

> 
> I'm happy to help you if you decide to go down this route.

Again, the current VFS abstraction is totally incomplete and broken
[6].

I believe it should be driven by a full-featured read-write fs [7]
(even like a simple minix fs in pre-Linux 1.0 era) and EROFS will
use Rust in "fs/erofs" as the experiment, but I will definitely
polish the Rust version until it looks good before upstreaming.

I really don't want to be a repeater again.

[6] https://lwn.net/SubscriberLink/991062/9de8e9a466a3faf5
[7] https://lore.kernel.org/linux-fsdevel/ZZ3GeehAw%2F78gZJk@dread.disaster.area

Thanks,
Gao Xiang

> 
> Cheers,
> Ariel

Ariel Miculas Sept. 25, 2024, 9:45 p.m. UTC | #14

On 24/09/26 12:35, Gao Xiang wrote:
> Hi Ariel,
> 
> On 2024/9/25 23:48, Ariel Miculas wrote:
> 
> ...
> 
> > I share the same opinions as Benno that we should try to use the
> > existing filesystem abstractions, even if they are not yet upstream.
> > Since erofs is a read-only filesystem and the Rust filesystem
> > abstractions are also used by other two read-only filesystems (TarFS and
> > PuzzleFS), it shouldn't be too difficult to adapt the erofs Rust code so
> > that it also uses the existing filesystem abstractions. And if there is
> > anything lacking, we can improve the existing generic APIs. This would
> > also increase the chances of upstreaming them.
> 
> I've expressed my ideas about "TarFS" [1] and PuzzleFS already: since
> I'm one of the EROFS authors, I should be responsible for this
> long-term project as my own promise to the Linux community and makes
> it serve for more Linux users (it has not been interrupted since 2017,
> even I sacrificed almost all my leisure time because the EROFS project
> isn't all my paid job, I need to maintain our internal kernel storage
> stack too).
> 
> [1] https://lore.kernel.org/r/3a6314fc-7956-47f3-8727-9dc026f3f50e@linux.alibaba.com
> 
> Basically there should be some good reasons to upstream a new stuff to
> Linux kernel, I believe it has no exception on the Rust side even it's
> somewhat premature: please help compare to the prior arts in details.
> 
> And there are all thoughts for reference [2][3][4][5]:
> [2] https://github.com/project-machine/puzzlefs/issues/114#issuecomment-2369872133
> [3] https://github.com/opencontainers/image-spec/issues/1190#issuecomment-2138572683
> [4] https://lore.kernel.org/linux-fsdevel/b9358e7c-8615-1b12-e35d-aae59bf6a467@linux.alibaba.com/
> [5] https://lore.kernel.org/linux-fsdevel/20230609-nachrangig-handwagen-375405d3b9f1@brauner/
> 
> Here still, I do really want to collaborate with you on your
> reasonable use cases.  But if you really want to do your upstream
> attempt without even any comparsion, please go ahead because I
> believe I can only express my own opinion, but I really don't
> decide if your work is acceptable for the kernel.
> 

Thanks for your thoughts on PuzzleFS, I would really like if we could
centralize the discussions on the latest patch series I sent to the
mailing lists back in May [1]. The reason I say this is because looking
at that thread, it seems there is no feedback for PuzzleFS. The feedback
exists, it's just scattered throughout different mediums. On top of
this, I would also like to engage in the discussions with Dave Chinner,
so I can better understand the limitations of PuzzleFS and the reasons
for which it might be rejected in the Linux Kernel. I do appreciate your
feedback and I need to take my time to respond to the technical issues
that you brought up in the github issue.

However, even if it's not upstream, PuzzleFS does use the latest Rust
filesystem abstractions and thus it stands as an example of how to use
them. And this thread is not about PuzzleFS, but about the Rust
filesystem abstractions and how one might start to use them. That's
where I offered to help, since I already went through the process of
having to use them.

[1] https://lore.kernel.org/all/20240516190345.957477-1-amiculas@cisco.com/

> > 
> > I'm happy to help you if you decide to go down this route.
> 
> Again, the current VFS abstraction is totally incomplete and broken
> [6].

If they're incomplete, we can work together to implement the missing
functionalities. Furthermore, we can work to fix the broken stuff. I
don't think these are good reasons to completely ignore the work that's
already been done on this topic.

By the way, what is it that's actually broken? You've linked to an LWN
article [2] (or at least I think your 6th link was supposed to link to
"Rust for filesystems" instead of the "Committing to Rust in the kernel"
one), but I'm interested in the specifics. What exactly doesn't work as
expected from the filesystem abstractions?

[2] https://lwn.net/Articles/978738/

> 
> I believe it should be driven by a full-featured read-write fs [7]
> (even like a simple minix fs in pre-Linux 1.0 era) and EROFS will

I do find it weird that you want a full-featured read-write fs
implemented in Rust, when erofs is a read-only filesystem.

> use Rust in "fs/erofs" as the experiment, but I will definitely
> polish the Rust version until it looks good before upstreaming.

I honestly don't see how it would look good if they're not using the
existing filesystem abstractions. And I'm not convinced that Rust in the
kernel would be useful in any way without the many subsystem
abstractions which were implemented by the Rust for Linux team for the
past few years.

Cheers,
Ariel

> 
> I really don't want to be a repeater again.
> 
> [6] https://lwn.net/SubscriberLink/991062/9de8e9a466a3faf5
> [7] https://lore.kernel.org/linux-fsdevel/ZZ3GeehAw%2F78gZJk@dread.disaster.area
> 
> Thanks,
> Gao Xiang
> 
> > 
> > Cheers,
> > Ariel
>

Gao Xiang Sept. 26, 2024, 12:40 a.m. UTC | #15

On 2024/9/26 05:45, Ariel Miculas wrote:
> On 24/09/26 12:35, Gao Xiang wrote:
>> Hi Ariel,
>>
>> On 2024/9/25 23:48, Ariel Miculas wrote:
>>

...

>>
>> And there are all thoughts for reference [2][3][4][5]:
>> [2] https://github.com/project-machine/puzzlefs/issues/114#issuecomment-2369872133
>> [3] https://github.com/opencontainers/image-spec/issues/1190#issuecomment-2138572683
>> [4] https://lore.kernel.org/linux-fsdevel/b9358e7c-8615-1b12-e35d-aae59bf6a467@linux.alibaba.com/
>> [5] https://lore.kernel.org/linux-fsdevel/20230609-nachrangig-handwagen-375405d3b9f1@brauner/
>>
>> Here still, I do really want to collaborate with you on your
>> reasonable use cases.  But if you really want to do your upstream
>> attempt without even any comparsion, please go ahead because I
>> believe I can only express my own opinion, but I really don't
>> decide if your work is acceptable for the kernel.
>>
> 
> Thanks for your thoughts on PuzzleFS, I would really like if we could
> centralize the discussions on the latest patch series I sent to the
> mailing lists back in May [1]. The reason I say this is because looking
> at that thread, it seems there is no feedback for PuzzleFS. The feedback
> exists, it's just scattered throughout different mediums. On top of
> this, I would also like to engage in the discussions with Dave Chinner,
> so I can better understand the limitations of PuzzleFS and the reasons
> for which it might be rejected in the Linux Kernel. I do appreciate your
> feedback and I need to take my time to respond to the technical issues
> that you brought up in the github issue.

In short, I really want to avoid open arbitary number files in the
page fault path regardless of the performance concerns, because
even there are many cases that mmap_lock is dropped, but IMHO there
is still cases that mmap_lock will be taken.

IOWs, I think it's controversal for a kernel fs to open random file
in the page fault context under mmap_lock in the begining.
Otherwise, it's pretty straight-forward to add some similiar feature
to EROFS.

> 
> However, even if it's not upstream, PuzzleFS does use the latest Rust
> filesystem abstractions and thus it stands as an example of how to use
> them. And this thread is not about PuzzleFS, but about the Rust
> filesystem abstractions and how one might start to use them. That's
> where I offered to help, since I already went through the process of
> having to use them.
> 
> [1] https://lore.kernel.org/all/20240516190345.957477-1-amiculas@cisco.com/
> 
>>>
>>> I'm happy to help you if you decide to go down this route.
>>
>> Again, the current VFS abstraction is totally incomplete and broken
>> [6].
> 
> If they're incomplete, we can work together to implement the missing
> functionalities. Furthermore, we can work to fix the broken stuff. I
> don't think these are good reasons to completely ignore the work that's
> already been done on this topic.

I've said, we don't miss any Rust VFS abstraction work, as long as
some work lands in the Linux kernel, we will switch to use them.

The reason we don't do that is again
  - I don't have time to work on this because my life is still limited
    for RFL in any way at least this year; I don't know if Yiyang has
    time to work on a complete ext2 and a Rust VFS abstraction.

  - We just would like to _use Rust_ for the core EROFS logic, instead
    of touching any VFS stuff.  I'm not sure why it's called "completely
    ignore the VFS abstraction", because there is absolutely no
    relationship between these two things.  Why we need to mix them up?

> 
> By the way, what is it that's actually broken? You've linked to an LWN
> article [2] (or at least I think your 6th link was supposed to link to
> "Rust for filesystems" instead of the "Committing to Rust in the kernel"
> one), but I'm interested in the specifics. What exactly doesn't work as
> expected from the filesystem abstractions?

For example, with my current Rust skill, I'm not sure why
fill_super for "T::SUPER_TYPE, sb::Type::BlockDev" must use
"new_sb.bdev().inode().mapper()".

It's unnecessary for a bdev-based fs to use bdev inode page
cache to read metadata;

Also it's unnecessary for a const fs type to be
sb::Type::BlockDev or sb::Type::Independent as

/// Determines how superblocks for this file system type are keyed.
+    const SUPER_TYPE: sb::Type = sb::Type::Independent;

because at least for the current EROFS use cases, we will
decide to use get_tree_bdev() or get_tree_nodev() according
to the mount source or mount options.

> 
> [2] https://lwn.net/Articles/978738/
> 
>>
>> I believe it should be driven by a full-featured read-write fs [7]
>> (even like a simple minix fs in pre-Linux 1.0 era) and EROFS will
> 
> I do find it weird that you want a full-featured read-write fs
> implemented in Rust, when erofs is a read-only filesystem.

I'm not sure why it's weird from the sane Rust VFS abstraction
perspective.

> 
>> use Rust in "fs/erofs" as the experiment, but I will definitely
>> polish the Rust version until it looks good before upstreaming.
> 
> I honestly don't see how it would look good if they're not using the
> existing filesystem abstractions. And I'm not convinced that Rust in the
> kernel would be useful in any way without the many subsystem
> abstractions which were implemented by the Rust for Linux team for the
> past few years.

So let's see the next version.

Thanks,
Gao Xiang

> 
> Cheers,
> Ariel
>

Gao Xiang Sept. 26, 2024, 1:04 a.m. UTC | #16

On 2024/9/26 08:40, Gao Xiang wrote:
> 
> 
> On 2024/9/26 05:45, Ariel Miculas wrote:

...

>>
>> I honestly don't see how it would look good if they're not using the
>> existing filesystem abstractions. And I'm not convinced that Rust in the
>> kernel would be useful in any way without the many subsystem
>> abstractions which were implemented by the Rust for Linux team for the
>> past few years.
> 
> So let's see the next version.

Some more words, regardless of in-tree "fs/xfs/libxfs",
you also claimed "Another goal is to share the same code between user
space and kernel space in order to provide one secure implementation."
for example in [1].

I wonder Rust kernel VFS abstraction is forcely used in your userspace
implementation, or (somewhat) your argument is still broken here.

[1] https://lore.kernel.org/r/20230609-feldversuch-fixieren-fa141a2d9694@brauner

Thanks,
Gao Xiang

> 
> Thanks,
> Gao Xiang
> 
>>
>> Cheers,
>> Ariel
>>

Ariel Miculas Sept. 26, 2024, 8:10 a.m. UTC | #17

On 24/09/26 09:04, Gao Xiang wrote:
> 
> 
> On 2024/9/26 08:40, Gao Xiang wrote:
> > 
> > 
> > On 2024/9/26 05:45, Ariel Miculas wrote:
> 
> ...
> 
> > > 
> > > I honestly don't see how it would look good if they're not using the
> > > existing filesystem abstractions. And I'm not convinced that Rust in the
> > > kernel would be useful in any way without the many subsystem
> > > abstractions which were implemented by the Rust for Linux team for the
> > > past few years.
> > 
> > So let's see the next version.
> 
> Some more words, regardless of in-tree "fs/xfs/libxfs",
> you also claimed "Another goal is to share the same code between user
> space and kernel space in order to provide one secure implementation."
> for example in [1].
> 
> I wonder Rust kernel VFS abstraction is forcely used in your userspace
> implementation, or (somewhat) your argument is still broken here.

Of course the implementations cannot be identical, but there is a lot of
shared code between the user space and kernel space PuzzleFS
implementations. The user space implementation uses the fuser [1] crate
and implicitly its API for implementing the read/seek/list_xattrs etc.
operations, while the kernel implementation uses the Rust filesystem
abstractions.

While it's currently not possible to use external crates in the Linux
kernel (maybe it won't ever be), one area for improvement would be to
keep the shared code between these PuzzleFS implementations in the
kernel and publish releases to crates.io from there. In this way it will
be obvious which parts of the code are shared and they will actually be
shared (right now the code is duplicated).

I've actually touched on these points [2] during my last year's
presentation of PuzzleFS at Open Source Summit Europe [3].

And here [4] you can see the space savings achieved by PuzzleFS. In
short, if you take 10 versions of Ubuntu Jammy from dockerhub, they take
up 282 MB. Convert them to PuzzleFS and they only take up 130 MB (this
is before applying any compression, the space savings are only due to
the chunking algorithm). If we enable compression (PuzzleFS uses Zstd
seekable compression), which is a fairer comparison (considering that
the OCI image uses gzip compression), then we get down to 53 MB for
storing all 10 Ubuntu Jammy versions using PuzzleFS.

Here's a summary:
# Steps

* I’ve downloaded 10 versions of Jammy from hub.docker.com
* These images only have one layer which is in tar.gz format
* I’ve built 10 equivalent puzzlefs images
* Compute the tarball_total_size by summing the sizes of every Jammy
  tarball (uncompressed) => 766 MB (use this as baseline)
* Sum the sizes of every oci/puzzlefs image => total_size
* Compute the total size as if all the versions were stored in a single
  oci/puzzlefs repository => total_unified_size
* Saved space = tarball_total_size - total_unified_size

# Results
(See [5] if you prefer the video format)

| Type | Total size (MB) | Average layer size (MB) | Unified size (MB) | Saved (MB) / 766 MB |
| --- | --- | --- | --- | --- |
| Oci (uncompressed) | 766 | 77 | 766 | 0 (0%) |
| PuzzleFS uncompressed | 748 | 74 | 130 | 635 (83%) |
| Oci (compressed) | 282 | 28 | 282 | 484 (63%) |
| PuzzleFS (compressed) | 298 | 30 | 53 | 713 (93%) |

Here's the script I used to download the Ubuntu Jammy versions and
generate the PuzzleFS images [6] to get an idea about how I got to these
results.

Can we achieve these results with the current erofs features?  I'm
referring specifically to this comment: "EROFS already supports
variable-sized chunks + CDC" [7].

[1] https://docs.rs/fuser/latest/fuser/
[2] https://youtu.be/OhMtoLrjiBY?si=iuk7PstznEUgnr4g&t=1150
[3] https://osseu2023.sched.com/event/1OGjk/puzzlefs-the-next-generation-container-filesystem-armand-ariel-miculas-cisco-systems
[4] https://youtu.be/OhMtoLrjiBY?si=jwReE1qjs1wXLUCr&t=1732
[5] https://youtu.be/OhMtoLrjiBY?si=Nhlz8FJ9CGnwgOlS&t=1862
[6] https://gist.github.com/ariel-miculas/956056f213db2d3027905c61264d160b
[7] https://github.com/project-machine/puzzlefs/issues/114#issuecomment-2367039464

Regards,
Ariel

> 
> [1] https://lore.kernel.org/r/20230609-feldversuch-fixieren-fa141a2d9694@brauner
> 
> Thanks,
> Gao Xiang
> 
> > 
> > Thanks,
> > Gao Xiang
> > 
> > > 
> > > Cheers,
> > > Ariel
> > > 
>

Gao Xiang Sept. 26, 2024, 8:25 a.m. UTC | #18

On 2024/9/26 16:10, Ariel Miculas wrote:
> On 24/09/26 09:04, Gao Xiang wrote:
>>


...

> 
> And here [4] you can see the space savings achieved by PuzzleFS. In
> short, if you take 10 versions of Ubuntu Jammy from dockerhub, they take
> up 282 MB. Convert them to PuzzleFS and they only take up 130 MB (this
> is before applying any compression, the space savings are only due to
> the chunking algorithm). If we enable compression (PuzzleFS uses Zstd
> seekable compression), which is a fairer comparison (considering that
> the OCI image uses gzip compression), then we get down to 53 MB for
> storing all 10 Ubuntu Jammy versions using PuzzleFS.
> 
> Here's a summary:
> # Steps
> 
> * I’ve downloaded 10 versions of Jammy from hub.docker.com
> * These images only have one layer which is in tar.gz format
> * I’ve built 10 equivalent puzzlefs images
> * Compute the tarball_total_size by summing the sizes of every Jammy
>    tarball (uncompressed) => 766 MB (use this as baseline)
> * Sum the sizes of every oci/puzzlefs image => total_size
> * Compute the total size as if all the versions were stored in a single
>    oci/puzzlefs repository => total_unified_size
> * Saved space = tarball_total_size - total_unified_size
> 
> # Results
> (See [5] if you prefer the video format)
> 
> | Type | Total size (MB) | Average layer size (MB) | Unified size (MB) | Saved (MB) / 766 MB |
> | --- | --- | --- | --- | --- |
> | Oci (uncompressed) | 766 | 77 | 766 | 0 (0%) |
> | PuzzleFS uncompressed | 748 | 74 | 130 | 635 (83%) |
> | Oci (compressed) | 282 | 28 | 282 | 484 (63%) |
> | PuzzleFS (compressed) | 298 | 30 | 53 | 713 (93%) |
> 
> Here's the script I used to download the Ubuntu Jammy versions and
> generate the PuzzleFS images [6] to get an idea about how I got to these
> results.
> 
> Can we achieve these results with the current erofs features?  I'm
> referring specifically to this comment: "EROFS already supports
> variable-sized chunks + CDC" [7].

Please see
https://erofs.docs.kernel.org/en/latest/comparsion/dedupe.html

	                Total Size (MiB)	Average layer size (MiB)	Saved / 766.1MiB
Compressed OCI (tar.gz)	282.5	28.3	63%
Uncompressed OCI (tar)	766.1	76.6	0%
Uncomprssed EROFS	109.5	11.0	86%
EROFS (DEFLATE,9,32k)	46.4	4.6	94%
EROFS (LZ4HC,12,64k)	54.2	5.4	93%

I don't know which compression algorithm are you using (maybe Zstd?),
but from the result is
   EROFS (LZ4HC,12,64k)  54.2
   PuzzleFS compressed   53?
   EROFS (DEFLATE,9,32k) 46.4

I could reran with EROFS + Zstd, but it should be smaller. This feature
has been supported since Linux 6.1, thanks.

Thanks,
Gao Xiang

Gao Xiang Sept. 26, 2024, 8:48 a.m. UTC | #19

On 2024/9/26 16:10, Ariel Miculas via Linux-erofs wrote:
> On 24/09/26 09:04, Gao Xiang wrote:
>>
>>
>> On 2024/9/26 08:40, Gao Xiang wrote:
>>>
>>>
>>> On 2024/9/26 05:45, Ariel Miculas wrote:
>>
>> ...
>>
>>>>
>>>> I honestly don't see how it would look good if they're not using the
>>>> existing filesystem abstractions. And I'm not convinced that Rust in the
>>>> kernel would be useful in any way without the many subsystem
>>>> abstractions which were implemented by the Rust for Linux team for the
>>>> past few years.
>>>
>>> So let's see the next version.
>>
>> Some more words, regardless of in-tree "fs/xfs/libxfs",
>> you also claimed "Another goal is to share the same code between user
>> space and kernel space in order to provide one secure implementation."
>> for example in [1].
>>
>> I wonder Rust kernel VFS abstraction is forcely used in your userspace
>> implementation, or (somewhat) your argument is still broken here.
> 
> Of course the implementations cannot be identical, but there is a lot of
> shared code between the user space and kernel space PuzzleFS
> implementations. The user space implementation uses the fuser [1] crate
If you know what you're doing, you may know what Yiyang is doing
here, he will just form a Rust EROFS core logic and upstream later.

Thanks,
Gao Xiang

Ariel Miculas Sept. 26, 2024, 9:51 a.m. UTC | #20

On 24/09/26 04:25, Gao Xiang wrote:
> 
> 
> On 2024/9/26 16:10, Ariel Miculas wrote:
> > On 24/09/26 09:04, Gao Xiang wrote:
> > > 
> 
> 
> ...
> 
> > 
> > And here [4] you can see the space savings achieved by PuzzleFS. In
> > short, if you take 10 versions of Ubuntu Jammy from dockerhub, they take
> > up 282 MB. Convert them to PuzzleFS and they only take up 130 MB (this
> > is before applying any compression, the space savings are only due to
> > the chunking algorithm). If we enable compression (PuzzleFS uses Zstd
> > seekable compression), which is a fairer comparison (considering that
> > the OCI image uses gzip compression), then we get down to 53 MB for
> > storing all 10 Ubuntu Jammy versions using PuzzleFS.
> > 
> > Here's a summary:
> > # Steps
> > 
> > * I’ve downloaded 10 versions of Jammy from hub.docker.com
> > * These images only have one layer which is in tar.gz format
> > * I’ve built 10 equivalent puzzlefs images
> > * Compute the tarball_total_size by summing the sizes of every Jammy
> >    tarball (uncompressed) => 766 MB (use this as baseline)
> > * Sum the sizes of every oci/puzzlefs image => total_size
> > * Compute the total size as if all the versions were stored in a single
> >    oci/puzzlefs repository => total_unified_size
> > * Saved space = tarball_total_size - total_unified_size
> > 
> > # Results
> > (See [5] if you prefer the video format)
> > 
> > | Type | Total size (MB) | Average layer size (MB) | Unified size (MB) | Saved (MB) / 766 MB |
> > | --- | --- | --- | --- | --- |
> > | Oci (uncompressed) | 766 | 77 | 766 | 0 (0%) |
> > | PuzzleFS uncompressed | 748 | 74 | 130 | 635 (83%) |
> > | Oci (compressed) | 282 | 28 | 282 | 484 (63%) |
> > | PuzzleFS (compressed) | 298 | 30 | 53 | 713 (93%) |
> > 
> > Here's the script I used to download the Ubuntu Jammy versions and
> > generate the PuzzleFS images [6] to get an idea about how I got to these
> > results.
> > 
> > Can we achieve these results with the current erofs features?  I'm
> > referring specifically to this comment: "EROFS already supports
> > variable-sized chunks + CDC" [7].
> 
> Please see
> https://erofs.docs.kernel.org/en/latest/comparsion/dedupe.html

Great, I see you've used the same example as I did. Though I must admit
I'm a little surprised there's no mention of PuzzleFS in your document.

> 
> 	                Total Size (MiB)	Average layer size (MiB)	Saved / 766.1MiB
> Compressed OCI (tar.gz)	282.5	28.3	63%
> Uncompressed OCI (tar)	766.1	76.6	0%
> Uncomprssed EROFS	109.5	11.0	86%
> EROFS (DEFLATE,9,32k)	46.4	4.6	94%
> EROFS (LZ4HC,12,64k)	54.2	5.4	93%
> 
> I don't know which compression algorithm are you using (maybe Zstd?),
> but from the result is
>   EROFS (LZ4HC,12,64k)  54.2
>   PuzzleFS compressed   53?
>   EROFS (DEFLATE,9,32k) 46.4
> 
> I could reran with EROFS + Zstd, but it should be smaller. This feature
> has been supported since Linux 6.1, thanks.

The average layer size is very impressive for EROFS, great work.
However, if we multiply the average layer size by 10, we get the total
size (5.4 MiB * 10 ~ 54.2 MiB), whereas for PuzzleFS, we see that while
the average layer size is 30 MIB (for the compressed case), the unified
size is only 53 MiB. So this tells me there's blob sharing between the
different versions of Ubuntu Jammy with PuzzleFS, but there's no sharing
with EROFS (what I'm talking about is deduplication across the multiple
versions of Ubuntu Jammy and not within one single version).

Of course, with only 10 images, the space savings don't seem that
impressive for PuzzleFS compared to EROFS, but imagine we are storing
hundreds/thousands of Ubuntu versions. Then we're also building OCI
images on top of these versions. So if the user already has all the
blobs for an Ubuntu version, then we only need to ship the chunks that
have changed / have been added as a result of the specific application
that we've built on top of an existing Ubuntu version.

One more thing: the "Unified size" column is the key for understanding
the space savings offered by PuzzleFS and I see that you've left this
column out of your table.

Regards,
Ariel

> 
> Thanks,
> Gao Xiang

Gao Xiang Sept. 26, 2024, 10:46 a.m. UTC | #21

On 2024/9/26 17:51, Ariel Miculas wrote:
> On 24/09/26 04:25, Gao Xiang wrote:
>>
>>
>> On 2024/9/26 16:10, Ariel Miculas wrote:
>>> On 24/09/26 09:04, Gao Xiang wrote:
>>>>
>>
>>
>> ...
>>
>>>
>>> And here [4] you can see the space savings achieved by PuzzleFS. In
>>> short, if you take 10 versions of Ubuntu Jammy from dockerhub, they take
>>> up 282 MB. Convert them to PuzzleFS and they only take up 130 MB (this
>>> is before applying any compression, the space savings are only due to
>>> the chunking algorithm). If we enable compression (PuzzleFS uses Zstd
>>> seekable compression), which is a fairer comparison (considering that
>>> the OCI image uses gzip compression), then we get down to 53 MB for
>>> storing all 10 Ubuntu Jammy versions using PuzzleFS.
>>>
>>> Here's a summary:
>>> # Steps
>>>
>>> * I’ve downloaded 10 versions of Jammy from hub.docker.com
>>> * These images only have one layer which is in tar.gz format
>>> * I’ve built 10 equivalent puzzlefs images
>>> * Compute the tarball_total_size by summing the sizes of every Jammy
>>>     tarball (uncompressed) => 766 MB (use this as baseline)
>>> * Sum the sizes of every oci/puzzlefs image => total_size
>>> * Compute the total size as if all the versions were stored in a single
>>>     oci/puzzlefs repository => total_unified_size
>>> * Saved space = tarball_total_size - total_unified_size
>>>
>>> # Results
>>> (See [5] if you prefer the video format)
>>>
>>> | Type | Total size (MB) | Average layer size (MB) | Unified size (MB) | Saved (MB) / 766 MB |
>>> | --- | --- | --- | --- | --- |
>>> | Oci (uncompressed) | 766 | 77 | 766 | 0 (0%) |
>>> | PuzzleFS uncompressed | 748 | 74 | 130 | 635 (83%) |
>>> | Oci (compressed) | 282 | 28 | 282 | 484 (63%) |
>>> | PuzzleFS (compressed) | 298 | 30 | 53 | 713 (93%) |
>>>
>>> Here's the script I used to download the Ubuntu Jammy versions and
>>> generate the PuzzleFS images [6] to get an idea about how I got to these
>>> results.
>>>
>>> Can we achieve these results with the current erofs features?  I'm
>>> referring specifically to this comment: "EROFS already supports
>>> variable-sized chunks + CDC" [7].
>>
>> Please see
>> https://erofs.docs.kernel.org/en/latest/comparsion/dedupe.html
> 
> Great, I see you've used the same example as I did. Though I must admit
> I'm a little surprised there's no mention of PuzzleFS in your document.

Why I need to mention and even try PuzzleFS here (there are too many
attempts why I need to try them all)?  It just compares to the EROFS
prior work.

> 
>>
>> 	                Total Size (MiB)	Average layer size (MiB)	Saved / 766.1MiB
>> Compressed OCI (tar.gz)	282.5	28.3	63%
>> Uncompressed OCI (tar)	766.1	76.6	0%
>> Uncomprssed EROFS	109.5	11.0	86%
>> EROFS (DEFLATE,9,32k)	46.4	4.6	94%
>> EROFS (LZ4HC,12,64k)	54.2	5.4	93%
>>
>> I don't know which compression algorithm are you using (maybe Zstd?),
>> but from the result is
>>    EROFS (LZ4HC,12,64k)  54.2
>>    PuzzleFS compressed   53?
>>    EROFS (DEFLATE,9,32k) 46.4
>>
>> I could reran with EROFS + Zstd, but it should be smaller. This feature
>> has been supported since Linux 6.1, thanks.
> 
> The average layer size is very impressive for EROFS, great work.
> However, if we multiply the average layer size by 10, we get the total
> size (5.4 MiB * 10 ~ 54.2 MiB), whereas for PuzzleFS, we see that while
> the average layer size is 30 MIB (for the compressed case), the unified
> size is only 53 MiB. So this tells me there's blob sharing between the
> different versions of Ubuntu Jammy with PuzzleFS, but there's no sharing
> with EROFS (what I'm talking about is deduplication across the multiple
> versions of Ubuntu Jammy and not within one single version).

Don't make me wrong, I don't think you got the point.

First, what you asked was `I'm referring specifically to this
comment: "EROFS already supports variable-sized chunks + CDC"`,
so I clearly answered with the result of compressed data global
deduplication with CDC.

Here both EROFS and Squashfs compresses 10 Ubuntu images into
one image for fair comparsion to show the benefit of CDC, so
I believe they basically equal to your `Unified size`s, so
the result is

			Your unified size
	EROFS (LZ4HC,12,64k)  54.2
	PuzzleFS compressed   53?
	EROFS (DEFLATE,9,32k) 46.4

That is why I used your 53 unified size to show EROFS is much
smaller than PuzzleFS.

The reason why EROFS and SquashFS doesn't have the `Total Size`s
is just because we cannot store every individual chunk into some
seperate file.

Currently, I have seen no reason to open arbitary kernel files
(maybe hundreds due to large folio feature at once) in the page
fault context.  If I modified `mkfs.erofs` tool, I could give
some similar numbers, but I don't want to waste time now due
to `open arbitary kernel files in the page fault context`.

As I said, if PuzzleFS finally upstream some work to open kernel
files in page fault context, I will definitely work out the same
feature for EROFS soon, but currently I don't do that just
because it's very controversal and no in-tree kernel filesystem
does that.

Thanks,
Gao Xiang

Ariel Miculas Sept. 26, 2024, 11:01 a.m. UTC | #22

On 24/09/26 06:46, Gao Xiang wrote:
> 
> 
> On 2024/9/26 17:51, Ariel Miculas wrote:
> > On 24/09/26 04:25, Gao Xiang wrote:
> > > 
> > > 
> > > On 2024/9/26 16:10, Ariel Miculas wrote:
> > > > On 24/09/26 09:04, Gao Xiang wrote:
> > > > > 
> > > 
> > > 
> > > ...
> > > 
> > > > 
> > > > And here [4] you can see the space savings achieved by PuzzleFS. In
> > > > short, if you take 10 versions of Ubuntu Jammy from dockerhub, they take
> > > > up 282 MB. Convert them to PuzzleFS and they only take up 130 MB (this
> > > > is before applying any compression, the space savings are only due to
> > > > the chunking algorithm). If we enable compression (PuzzleFS uses Zstd
> > > > seekable compression), which is a fairer comparison (considering that
> > > > the OCI image uses gzip compression), then we get down to 53 MB for
> > > > storing all 10 Ubuntu Jammy versions using PuzzleFS.
> > > > 
> > > > Here's a summary:
> > > > # Steps
> > > > 
> > > > * I’ve downloaded 10 versions of Jammy from hub.docker.com
> > > > * These images only have one layer which is in tar.gz format
> > > > * I’ve built 10 equivalent puzzlefs images
> > > > * Compute the tarball_total_size by summing the sizes of every Jammy
> > > >     tarball (uncompressed) => 766 MB (use this as baseline)
> > > > * Sum the sizes of every oci/puzzlefs image => total_size
> > > > * Compute the total size as if all the versions were stored in a single
> > > >     oci/puzzlefs repository => total_unified_size
> > > > * Saved space = tarball_total_size - total_unified_size
> > > > 
> > > > # Results
> > > > (See [5] if you prefer the video format)
> > > > 
> > > > | Type | Total size (MB) | Average layer size (MB) | Unified size (MB) | Saved (MB) / 766 MB |
> > > > | --- | --- | --- | --- | --- |
> > > > | Oci (uncompressed) | 766 | 77 | 766 | 0 (0%) |
> > > > | PuzzleFS uncompressed | 748 | 74 | 130 | 635 (83%) |
> > > > | Oci (compressed) | 282 | 28 | 282 | 484 (63%) |
> > > > | PuzzleFS (compressed) | 298 | 30 | 53 | 713 (93%) |
> > > > 
> > > > Here's the script I used to download the Ubuntu Jammy versions and
> > > > generate the PuzzleFS images [6] to get an idea about how I got to these
> > > > results.
> > > > 
> > > > Can we achieve these results with the current erofs features?  I'm
> > > > referring specifically to this comment: "EROFS already supports
> > > > variable-sized chunks + CDC" [7].
> > > 
> > > Please see
> > > https://erofs.docs.kernel.org/en/latest/comparsion/dedupe.html
> > 
> > Great, I see you've used the same example as I did. Though I must admit
> > I'm a little surprised there's no mention of PuzzleFS in your document.
> 
> Why I need to mention and even try PuzzleFS here (there are too many
> attempts why I need to try them all)?  It just compares to the EROFS
> prior work.
> 
> > 
> > > 
> > > 	                Total Size (MiB)	Average layer size (MiB)	Saved / 766.1MiB
> > > Compressed OCI (tar.gz)	282.5	28.3	63%
> > > Uncompressed OCI (tar)	766.1	76.6	0%
> > > Uncomprssed EROFS	109.5	11.0	86%
> > > EROFS (DEFLATE,9,32k)	46.4	4.6	94%
> > > EROFS (LZ4HC,12,64k)	54.2	5.4	93%
> > > 
> > > I don't know which compression algorithm are you using (maybe Zstd?),
> > > but from the result is
> > >    EROFS (LZ4HC,12,64k)  54.2
> > >    PuzzleFS compressed   53?
> > >    EROFS (DEFLATE,9,32k) 46.4
> > > 
> > > I could reran with EROFS + Zstd, but it should be smaller. This feature
> > > has been supported since Linux 6.1, thanks.
> > 
> > The average layer size is very impressive for EROFS, great work.
> > However, if we multiply the average layer size by 10, we get the total
> > size (5.4 MiB * 10 ~ 54.2 MiB), whereas for PuzzleFS, we see that while
> > the average layer size is 30 MIB (for the compressed case), the unified
> > size is only 53 MiB. So this tells me there's blob sharing between the
> > different versions of Ubuntu Jammy with PuzzleFS, but there's no sharing
> > with EROFS (what I'm talking about is deduplication across the multiple
> > versions of Ubuntu Jammy and not within one single version).
> 
> Don't make me wrong, I don't think you got the point.
> 
> First, what you asked was `I'm referring specifically to this
> comment: "EROFS already supports variable-sized chunks + CDC"`,
> so I clearly answered with the result of compressed data global
> deduplication with CDC.
> 
> Here both EROFS and Squashfs compresses 10 Ubuntu images into
> one image for fair comparsion to show the benefit of CDC, so

It might be a fair comparison, but that's not how container images are
distributed. You're trying to argue that I should just use EROFS and I'm
showing you that EROFS doesn't currently support the functionality
provided by PuzzleFS: the deduplication across multiple images.

> I believe they basically equal to your `Unified size`s, so
> the result is
> 
> 			Your unified size
> 	EROFS (LZ4HC,12,64k)  54.2
> 	PuzzleFS compressed   53?
> 	EROFS (DEFLATE,9,32k) 46.4
> 
> That is why I used your 53 unified size to show EROFS is much
> smaller than PuzzleFS.
> 
> The reason why EROFS and SquashFS doesn't have the `Total Size`s
> is just because we cannot store every individual chunk into some
> seperate file.

Well storing individual chunks into separate files is the entire point
of PuzzleFS.

> 
> Currently, I have seen no reason to open arbitary kernel files
> (maybe hundreds due to large folio feature at once) in the page
> fault context.  If I modified `mkfs.erofs` tool, I could give
> some similar numbers, but I don't want to waste time now due
> to `open arbitary kernel files in the page fault context`.
> 
> As I said, if PuzzleFS finally upstream some work to open kernel
> files in page fault context, I will definitely work out the same
> feature for EROFS soon, but currently I don't do that just
> because it's very controversal and no in-tree kernel filesystem
> does that.

The PuzzleFS kernel filesystem driver is still in an early POC stage, so
there's still a lot more work to be done.

Regards,
Ariel

> 
> Thanks,
> Gao Xiang

Gao Xiang Sept. 26, 2024, 11:05 a.m. UTC | #23

On 2024/9/26 19:01, Ariel Miculas wrote:

..

> 
>> I believe they basically equal to your `Unified size`s, so
>> the result is
>>
>> 			Your unified size
>> 	EROFS (LZ4HC,12,64k)  54.2
>> 	PuzzleFS compressed   53?
>> 	EROFS (DEFLATE,9,32k) 46.4
>>
>> That is why I used your 53 unified size to show EROFS is much
>> smaller than PuzzleFS.
>>
>> The reason why EROFS and SquashFS doesn't have the `Total Size`s
>> is just because we cannot store every individual chunk into some
>> seperate file.
> 
> Well storing individual chunks into separate files is the entire point
> of PuzzleFS.
> 
>>
>> Currently, I have seen no reason to open arbitary kernel files
>> (maybe hundreds due to large folio feature at once) in the page
>> fault context.  If I modified `mkfs.erofs` tool, I could give
>> some similar numbers, but I don't want to waste time now due
>> to `open arbitary kernel files in the page fault context`.
>>
>> As I said, if PuzzleFS finally upstream some work to open kernel
>> files in page fault context, I will definitely work out the same
>> feature for EROFS soon, but currently I don't do that just
>> because it's very controversal and no in-tree kernel filesystem
>> does that.
> 
> The PuzzleFS kernel filesystem driver is still in an early POC stage, so
> there's still a lot more work to be done.

I suggest that you could just ask FS/MM folks about this ("open
kernel files when reading in the page fault")  first.

If they say "no", I suggest please don't waste on this anymore.

Thanks,
Gao Xiang

> 
> Regards,
> Ariel
> 
>>
>> Thanks,
>> Gao Xiang

Gao Xiang Sept. 26, 2024, 11:23 a.m. UTC | #24

On 2024/9/26 19:01, Ariel Miculas via Linux-erofs wrote:
> On 24/09/26 06:46, Gao Xiang wrote:

...

>>
>>>
>>>>
>>>> 	                Total Size (MiB)	Average layer size (MiB)	Saved / 766.1MiB
>>>> Compressed OCI (tar.gz)	282.5	28.3	63%
>>>> Uncompressed OCI (tar)	766.1	76.6	0%
>>>> Uncomprssed EROFS	109.5	11.0	86%
>>>> EROFS (DEFLATE,9,32k)	46.4	4.6	94%
>>>> EROFS (LZ4HC,12,64k)	54.2	5.4	93%
>>>>
>>>> I don't know which compression algorithm are you using (maybe Zstd?),
>>>> but from the result is
>>>>     EROFS (LZ4HC,12,64k)  54.2
>>>>     PuzzleFS compressed   53?
>>>>     EROFS (DEFLATE,9,32k) 46.4
>>>>
>>>> I could reran with EROFS + Zstd, but it should be smaller. This feature
>>>> has been supported since Linux 6.1, thanks.
>>>
>>> The average layer size is very impressive for EROFS, great work.
>>> However, if we multiply the average layer size by 10, we get the total
>>> size (5.4 MiB * 10 ~ 54.2 MiB), whereas for PuzzleFS, we see that while
>>> the average layer size is 30 MIB (for the compressed case), the unified
>>> size is only 53 MiB. So this tells me there's blob sharing between the
>>> different versions of Ubuntu Jammy with PuzzleFS, but there's no sharing
>>> with EROFS (what I'm talking about is deduplication across the multiple
>>> versions of Ubuntu Jammy and not within one single version).
>>
>> Don't make me wrong, I don't think you got the point.
>>
>> First, what you asked was `I'm referring specifically to this
>> comment: "EROFS already supports variable-sized chunks + CDC"`,
>> so I clearly answered with the result of compressed data global
>> deduplication with CDC.
>>
>> Here both EROFS and Squashfs compresses 10 Ubuntu images into
>> one image for fair comparsion to show the benefit of CDC, so
> 
> It might be a fair comparison, but that's not how container images are
> distributed. You're trying to argue that I should just use EROFS and I'm

First, OCI layer is just distributed like what I said.

For example, I could introduce some common blobs to keep
chunks as chunk dictionary.   And then the each image
will be just some index, and all data will be
deduplicated.  That is also what Nydus works.

> showing you that EROFS doesn't currently support the functionality
> provided by PuzzleFS: the deduplication across multiple images.

No, EROFS supports external devices/blobs to keep a lot of
chunks too (as dictionary to share data among images), but
clearly it has the upper limit.

But PuzzleFS just treat each individual chunk as a seperate
file, that will cause unavoidable "open arbitary number of
files on reading, even in page fault context".

> 
>> I believe they basically equal to your `Unified size`s, so
>> the result is
>>
>> 			Your unified size
>> 	EROFS (LZ4HC,12,64k)  54.2
>> 	PuzzleFS compressed   53?
>> 	EROFS (DEFLATE,9,32k) 46.4
>>
>> That is why I used your 53 unified size to show EROFS is much
>> smaller than PuzzleFS.
>>
>> The reason why EROFS and SquashFS doesn't have the `Total Size`s
>> is just because we cannot store every individual chunk into some
>> seperate file.
> 
> Well storing individual chunks into separate files is the entire point
> of PuzzleFS.
> 
>>
>> Currently, I have seen no reason to open arbitary kernel files
>> (maybe hundreds due to large folio feature at once) in the page
>> fault context.  If I modified `mkfs.erofs` tool, I could give
>> some similar numbers, but I don't want to waste time now due
>> to `open arbitary kernel files in the page fault context`.
>>
>> As I said, if PuzzleFS finally upstream some work to open kernel
>> files in page fault context, I will definitely work out the same
>> feature for EROFS soon, but currently I don't do that just
>> because it's very controversal and no in-tree kernel filesystem
>> does that.
> 
> The PuzzleFS kernel filesystem driver is still in an early POC stage, so
> there's still a lot more work to be done.

I suggest that you could just ask FS/MM folks about this ("open
kernel files when reading in the page fault") first.

If they say "no", I suggest please don't waste on this anymore.

Thanks,
Gao Xiang

Ariel Miculas Sept. 26, 2024, 12:50 p.m. UTC | #25

On 24/09/26 07:23, Gao Xiang wrote:
> 
> 
> On 2024/9/26 19:01, Ariel Miculas via Linux-erofs wrote:
> > On 24/09/26 06:46, Gao Xiang wrote:
> 
> ...
> 
> > > 
> > > > 
> > > > > 
> > > > > 	                Total Size (MiB)	Average layer size (MiB)	Saved / 766.1MiB
> > > > > Compressed OCI (tar.gz)	282.5	28.3	63%
> > > > > Uncompressed OCI (tar)	766.1	76.6	0%
> > > > > Uncomprssed EROFS	109.5	11.0	86%
> > > > > EROFS (DEFLATE,9,32k)	46.4	4.6	94%
> > > > > EROFS (LZ4HC,12,64k)	54.2	5.4	93%
> > > > > 
> > > > > I don't know which compression algorithm are you using (maybe Zstd?),
> > > > > but from the result is
> > > > >     EROFS (LZ4HC,12,64k)  54.2
> > > > >     PuzzleFS compressed   53?
> > > > >     EROFS (DEFLATE,9,32k) 46.4
> > > > > 
> > > > > I could reran with EROFS + Zstd, but it should be smaller. This feature
> > > > > has been supported since Linux 6.1, thanks.
> > > > 
> > > > The average layer size is very impressive for EROFS, great work.
> > > > However, if we multiply the average layer size by 10, we get the total
> > > > size (5.4 MiB * 10 ~ 54.2 MiB), whereas for PuzzleFS, we see that while
> > > > the average layer size is 30 MIB (for the compressed case), the unified
> > > > size is only 53 MiB. So this tells me there's blob sharing between the
> > > > different versions of Ubuntu Jammy with PuzzleFS, but there's no sharing
> > > > with EROFS (what I'm talking about is deduplication across the multiple
> > > > versions of Ubuntu Jammy and not within one single version).
> > > 
> > > Don't make me wrong, I don't think you got the point.
> > > 
> > > First, what you asked was `I'm referring specifically to this
> > > comment: "EROFS already supports variable-sized chunks + CDC"`,
> > > so I clearly answered with the result of compressed data global
> > > deduplication with CDC.
> > > 
> > > Here both EROFS and Squashfs compresses 10 Ubuntu images into
> > > one image for fair comparsion to show the benefit of CDC, so
> > 
> > It might be a fair comparison, but that's not how container images are
> > distributed. You're trying to argue that I should just use EROFS and I'm
> 
> First, OCI layer is just distributed like what I said.
> 
> For example, I could introduce some common blobs to keep
> chunks as chunk dictionary.   And then the each image
> will be just some index, and all data will be
> deduplicated.  That is also what Nydus works.

I don't really follow what Nydus does. Here [1] it says they're using
fixed size chunks of 1 MB. Where is the CDC step exactly?

[1] https://github.com/dragonflyoss/nydus/blob/master/docs/nydus-design.md#2-rafs

> 
> > showing you that EROFS doesn't currently support the functionality
> > provided by PuzzleFS: the deduplication across multiple images.
> 
> No, EROFS supports external devices/blobs to keep a lot of
> chunks too (as dictionary to share data among images), but
> clearly it has the upper limit.
> 
> But PuzzleFS just treat each individual chunk as a seperate
> file, that will cause unavoidable "open arbitary number of
> files on reading, even in page fault context".
> 
> > 
> > > I believe they basically equal to your `Unified size`s, so
> > > the result is
> > > 
> > > 			Your unified size
> > > 	EROFS (LZ4HC,12,64k)  54.2
> > > 	PuzzleFS compressed   53?
> > > 	EROFS (DEFLATE,9,32k) 46.4
> > > 
> > > That is why I used your 53 unified size to show EROFS is much
> > > smaller than PuzzleFS.
> > > 
> > > The reason why EROFS and SquashFS doesn't have the `Total Size`s
> > > is just because we cannot store every individual chunk into some
> > > seperate file.
> > 
> > Well storing individual chunks into separate files is the entire point
> > of PuzzleFS.
> > 
> > > 
> > > Currently, I have seen no reason to open arbitary kernel files
> > > (maybe hundreds due to large folio feature at once) in the page
> > > fault context.  If I modified `mkfs.erofs` tool, I could give
> > > some similar numbers, but I don't want to waste time now due
> > > to `open arbitary kernel files in the page fault context`.
> > > 
> > > As I said, if PuzzleFS finally upstream some work to open kernel
> > > files in page fault context, I will definitely work out the same
> > > feature for EROFS soon, but currently I don't do that just
> > > because it's very controversal and no in-tree kernel filesystem
> > > does that.
> > 
> > The PuzzleFS kernel filesystem driver is still in an early POC stage, so
> > there's still a lot more work to be done.
> 
> I suggest that you could just ask FS/MM folks about this ("open
> kernel files when reading in the page fault") first.
> 
> If they say "no", I suggest please don't waste on this anymore.
> 
> Thanks,
> Gao Xiang

Gao Xiang Sept. 27, 2024, 2:18 a.m. UTC | #26

On 2024/9/26 20:50, Ariel Miculas wrote:
> On 24/09/26 07:23, Gao Xiang wrote:

...

>>>
>>> It might be a fair comparison, but that's not how container images are
>>> distributed. You're trying to argue that I should just use EROFS and I'm
>>
>> First, OCI layer is just distributed like what I said.
>>
>> For example, I could introduce some common blobs to keep
>> chunks as chunk dictionary.   And then the each image
>> will be just some index, and all data will be
>> deduplicated.  That is also what Nydus works.
> 
> I don't really follow what Nydus does. Here [1] it says they're using
> fixed size chunks of 1 MB. Where is the CDC step exactly?

Dragonfly Nydus uses fixed-size chunks of 1MiB by default with
limited external blobs as chunk dictionaries.  And ComposeFS
uses per-file blobs.

Currently, Both are all EROFS users using different EROFS
features.  EROFS itself supports fixed-size chunks (unencoded),
variable-sized extents (encoded, CDC optional) and limited
external blobs.

Honestly, for your testload (10 versions of ubuntu:jammy), I
don't think CDC made a significant difference in the final
result compared to per-file blobs likewise.  Because most of
the files in these images are identical, I think there are
only binary differences due to CVE fixes or similar issues.
Maybe delta compression could do more help, but I never try
this.  So as I asked in [1], does ComposeFS already meet your
requirement?

Again, EROFS could keep every different extent (or each chunk,
whatever) as a seperate file with minor update, it's a trivial
stuff for some userspace archive system, but IMO it's
controversal for an in-tree kernel filesystem.

[1] https://github.com/project-machine/puzzlefs/issues/114#issuecomment-2369971291

Thanks,
Gao Xiang

[RFC,03/24] erofs: add Errno in Rust

Commit Message

Comments

Patch