Message ID | 20230903032555.np6lu5mouv5tw4ff@moria.home.lan (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | [GIT,PULL] bcachefs | expand |
Hi Kent, I thought you'd follow Christians proposal to actually work with people to try to use common APIs (i.e. to use iomap once it's been even more iter-like, for which I'm still waiting for suggestions), and make your new APIs used more widely if they are a good idea (which also requires explaining them better) and aim for 6.7 once that is done. But without that, and without being in linux-next and the VFS maintainer ACK required for I think this is a very bad idea.
On Tue, Sep 05, 2023 at 06:24:47AM -0700, Christoph Hellwig wrote: > Hi Kent, > > I thought you'd follow Christians proposal to actually work with > people to try to use common APIs (i.e. to use iomap once it's been > even more iter-like, for which I'm still waiting for suggestions), > and make your new APIs used more widely if they are a good idea > (which also requires explaining them better) and aim for 6.7 once > that is done. Christoph, I get that iomap is your pet project and you want to make it better and see it more widely used. But the reasons bcachefs doesn't use iomap have been discussed at length, and I've posted and talked about the bcachefs equivalents of that code. You were AWOL on those discussions; you consistently say "bcachefs should use iomap" and then walk away, so those discussions haven't moved forwards. To recap, besides being more iterator like (passing data structures around with iterators, vs. indirect function calls into the filesystem), bcachefs also hangs a bit more state off the pagecache, due to being multi device and also using the same data structure for tracking disk reservations (because why make the buffered write paths look that up separately?). > But without that, and without being in linux-next and the VFS maintainer > ACK required for I think this is a very bad idea. Christain gave his reviewed-by for the dcache patch. Since this patchset doesn't change existing code maintained by others aside from that one patch, not sure how linux-next is relevant here...
On Tue, Sep 05, 2023 at 08:00:07PM -0400, Kent Overstreet wrote: > On Tue, Sep 05, 2023 at 06:24:47AM -0700, Christoph Hellwig wrote: > > Hi Kent, > > > > I thought you'd follow Christians proposal to actually work with > > people to try to use common APIs (i.e. to use iomap once it's been > > even more iter-like, for which I'm still waiting for suggestions), > > and make your new APIs used more widely if they are a good idea > > (which also requires explaining them better) and aim for 6.7 once > > that is done. > > Christoph, I get that iomap is your pet project and you want to make it > better and see it more widely used. > > But the reasons bcachefs doesn't use iomap have been discussed at > length, and I've posted and talked about the bcachefs equivalents of > that code. You were AWOL on those discussions; you consistently say > "bcachefs should use iomap" and then walk away, so those discussions > haven't moved forwards. > > To recap, besides being more iterator like (passing data structures > around with iterators, vs. indirect function calls into the filesystem), > bcachefs also hangs a bit more state off the pagecache, due to being > multi device and also using the same data structure for tracking disk > reservations (because why make the buffered write paths look that up > separately?). I /thought/ the proposal was to use iomap for bcachefs DIO and leave buffered writes for a different day. I agree the iomap buffered write path is inappropriate for bcachefs today. I'd like that to change, but there's a lot of functionality that it would need to support. I don't see the point in keeping bcachefs out of tree for another cycle. Yes, more use of iomap would be great, but I don't think that it being in-tree is going to hinder anything. It's not like it uses bufferheads.
On Wed, Sep 06, 2023 at 01:41:22AM +0100, Matthew Wilcox wrote: > On Tue, Sep 05, 2023 at 08:00:07PM -0400, Kent Overstreet wrote: > > On Tue, Sep 05, 2023 at 06:24:47AM -0700, Christoph Hellwig wrote: > > > Hi Kent, > > > > > > I thought you'd follow Christians proposal to actually work with > > > people to try to use common APIs (i.e. to use iomap once it's been > > > even more iter-like, for which I'm still waiting for suggestions), > > > and make your new APIs used more widely if they are a good idea > > > (which also requires explaining them better) and aim for 6.7 once > > > that is done. > > > > Christoph, I get that iomap is your pet project and you want to make it > > better and see it more widely used. > > > > But the reasons bcachefs doesn't use iomap have been discussed at > > length, and I've posted and talked about the bcachefs equivalents of > > that code. You were AWOL on those discussions; you consistently say > > "bcachefs should use iomap" and then walk away, so those discussions > > haven't moved forwards. > > > > To recap, besides being more iterator like (passing data structures > > around with iterators, vs. indirect function calls into the filesystem), > > bcachefs also hangs a bit more state off the pagecache, due to being > > multi device and also using the same data structure for tracking disk > > reservations (because why make the buffered write paths look that up > > separately?). > > I /thought/ the proposal was to use iomap for bcachefs DIO and leave > buffered writes for a different day. I agree the iomap buffered write > path is inappropriate for bcachefs today. I'd like that to change, > but there's a lot of functionality that it would need to support. No, I'm not going to convert the bcachefs DIO path to iomap; not as it exitss now. Right now I've got a clean separation between the VFS level DIO code, and the lower level bcachefs code that walks the extents btree and issues IOs. I have to consider the iomap approach where the loop-up-mappings-and-issue loop is in iomap code but calling into filesystem code pretty gross. I was talking about this /years/ ago when I did the work to make it possible to efficiently split bios - iomap could have taken the approach bcachefs did, the prereqs were in place when iomap was started, but it didn't happen - iomap ended up being a more conservative approach, a cleaned up version of buffer heads and fs/direct-IO.c. That's fine, iomap is certainly an improvement over what it was replacing, but it would not be an improvement for bcachefs. I think it might be more fruitful to look at consolidating the buffered IO code in bcachefs and iomap. The conceptual models are a bit closer, bcachefs's buffered IO code is just a bit more fully-featured in that it does the dirty block tracking in a unified way. That was a project that turned out pretty nicely.
On Wed, Sep 06, 2023 at 12:10:02PM -0400, Kent Overstreet wrote: > On Wed, Sep 06, 2023 at 01:41:22AM +0100, Matthew Wilcox wrote: > > On Tue, Sep 05, 2023 at 08:00:07PM -0400, Kent Overstreet wrote: > > > On Tue, Sep 05, 2023 at 06:24:47AM -0700, Christoph Hellwig wrote: > > > > Hi Kent, > > > > > > > > I thought you'd follow Christians proposal to actually work with > > > > people to try to use common APIs (i.e. to use iomap once it's been > > > > even more iter-like, for which I'm still waiting for suggestions), > > > > and make your new APIs used more widely if they are a good idea > > > > (which also requires explaining them better) and aim for 6.7 once > > > > that is done. > > > > > > Christoph, I get that iomap is your pet project and you want to make it > > > better and see it more widely used. Christoph quit as maintainer 78 days ago. > > > But the reasons bcachefs doesn't use iomap have been discussed at > > > length, and I've posted and talked about the bcachefs equivalents of > > > that code. You were AWOL on those discussions; you consistently say > > > "bcachefs should use iomap" and then walk away, so those discussions > > > haven't moved forwards. > > > > > > To recap, besides being more iterator like (passing data structures > > > around with iterators, vs. indirect function calls into the filesystem), > > > bcachefs also hangs a bit more state off the pagecache, due to being > > > multi device and also using the same data structure for tracking disk > > > reservations (because why make the buffered write paths look that up > > > separately?). > > > > I /thought/ the proposal was to use iomap for bcachefs DIO and leave I wasn't aware of /any/ active effort to make bcachefs use iomap for any purpose. > > buffered writes for a different day. I agree the iomap buffered write > > path is inappropriate for bcachefs today. I'd like that to change, > > but there's a lot of functionality that it would need to support. > > No, I'm not going to convert the bcachefs DIO path to iomap; not as it > exitss now. > > Right now I've got a clean separation between the VFS level DIO code, > and the lower level bcachefs code that walks the extents btree and > issues IOs. I have to consider the iomap approach where the > loop-up-mappings-and-issue loop is in iomap code but calling into > filesystem code pretty gross. > > I was talking about this /years/ ago when I did the work to make it > possible to efficiently split bios - iomap could have taken the approach > bcachefs did, the prereqs were in place when iomap was started, but it > didn't happen - iomap ended up being a more conservative approach, a > cleaned up version of buffer heads and fs/direct-IO.c. <shrug> iirc the genesis of xfs "iomap" was that it was created to supply space mappings to pnfs, and then got reused for directio and pagecache operations. Later that was hoisted wholesale to the vfs. Then spectre/meltdown happened and our indirect function call toy got taken away, and now we're stuck figuring out how to remove them all. IOWs, individual tactics that each made sense for maintaining the overall stability of xfs, but overall didn't amount to anything ambitious. In the end I'd probably rather do something like this to eliminate all the indirect function calls: int xfs_zero_range( struct xfs_inode *ip, loff_t pos, loff_t len, bool *did_zero) { struct iomap_iter iter = { }; struct inode *inode = VFS_I(ip); int ret = 0; iomap_start_zero_range(&iter, inode, pos, len); while (iter.len > 0) { ret = xfs_buffered_write_iomap_begin(&iter, &iter.write_map, &iter.read_map); if (ret) break; len = iomap_zero_range_iter(&iter, did_zero); if (len < 0) { ret = len; break; } ret = xfs_buffered_write_iomap_end(&iter, len, &iter.write_map); if (ret) break; iomap_iter_advance(&iter, len); } iomap_end_zero_range(&iter); return iter.write > 0 ? iter.write : ret; } (I would also rename iter.iomap and iter.srcmap to make it more obvious which ones get filled out under what circumstances.) > That's fine, iomap is certainly an improvement over what it was > replacing, but it would not be an improvement for bcachefs. Filesystems are free to implement read_ and write_iter as they choose, whether or not that involves iomap. > I think it might be more fruitful to look at consolidating the buffered > IO code in bcachefs and iomap. The conceptual models are a bit closer, > bcachefs's buffered IO code is just a bit more fully-featured in that it > does the dirty block tracking in a unified way. That was a project that > turned out pretty nicely. I think I'd much rather gradually revise iomap for everyone and cut bcachefs over when its ready. I don't think revising iomap is a reasonable precondition for merging bcachefs, nor do I think it's a good idea to risk destabilizing bcachefs just to hack in a new dependency that won't even fit well. --D
So I'm starting to look at this because I have most other pull requests done, and while I realize there's no universal support for it I suspect any further changes are better done in-tree. The out-of-tree thing has been done. However, while I'll continue to look at it in this form, I just realized that it's completely unacceptable for one very obvious reason: On Sat, 2 Sept 2023 at 20:26, Kent Overstreet <kent.overstreet@linux.dev> wrote: > > https://evilpiepirate.org/git/bcachefs.git bcachefs-for-upstream No way am I pulling that without a signed tag and a pgp key with a chain of trust. You've been around for long enough that having such a key shouldn't be a problem for you, so make it happen. There are a few other issues that I have with this, and Christoph did mention a big one: it's not been in linux-next. I don't know why I thought it had been, it's just such an obvious thing for any new "I want this merged upstream" tree. So these kinds of "I'll just ignore _all_ basic rules" kinds of issues do annoy me. I need to know that you understand that if you actually want this upstream, you need to work with upstream. That very much means *NOT* continuing this "I'll just do it my way". You need to show that you can work with others, that you can work within the framework of upstream, and that not every single thread you get into becomes an argument. This, btw, is not negotiable. If you feel uncomfortable with that basic notion, you had better just continue doing development outside the main kernel tree for another decade. The fact that I only now notice that you never submitted this to linux-next is obviously on me. My bad. But at the same time it worries me that it might be a sign of you just thinking that your way is special. Linus
On Wed, 6 Sept 2023 at 12:36, Linus Torvalds <torvalds@linux-foundation.org> wrote: > > There are a few other issues that I have with this, and Christoph did > mention a big one: it's not been in linux-next. Oh, and the fact that it hasn't been in linux-next became immediately clear, and it's a real practical problem. I get a compile error when just doing a basic built-test: fs/bcachefs/btree_key_cache.c: In function ‘btree_key_cache_create’: fs/bcachefs/btree_key_cache.c:356:56: error: implicit conversion from ‘enum btree_node_locked_type’ to ‘enum six_lock_type’ [-Werror=enum-conversion] 356 | mark_btree_node_locked(trans, path, 0, BTREE_NODE_UNLOCKED); | ^~~~~~~~~~~~~~~~~~~ and that compile error is actually a sign of a pretty serious bug. BTREE_NODE_UNLOCKED is of the wrong enum type, and has a value (-1) that simply cannot be represented in a 'enum six_lock_type'. I'm pretty sure that the compiler is actually allowed to assume it is in the range of the source enum. Yes, the inline function then does a cast to 'enum btree_node_locked_type', but because the compiler may assume that the incoming enum has values 0..2, it's not clear that it will necessarily cast the value -1 to the expected value. It probably works in practice on x86 (without -fshort-enum, I think gcc will always use an 'int'), but that code really is wrong. On other platforms, gcc defaults -fshort-enums on, and in that case a 'enum six_lock_type' is actually of underlying type 'unsigned char'. And guess what happens when you have (unsigned char)-1? It does *not* cast back to -1. I don't know whether any of the architectures we support has that "default to -fshort-enums" behaviour, and at least hexagon uses "-fno-short-enums" to avoid this issue, but it really is a serious bug. A Clang build also results in objtool complaints about bcachefs, with fs/bcachefs/buckets.o: warning: objtool: bch2_trans_mark_metadata_bucket() falls through to next function bch2_trans_mark_dev_sb() fs/bcachefs/fsck.o: warning: objtool: write_inode() falls through to next function hash_check_key() and sure enough, the code generation ends up being movq %rbx, %rdi movl %r12d, %esi callq bch2_trans_restart_error .Lfunc_end25: .size write_inode, .Lfunc_end25-write_inode and the issue is that bch2_trans_restart_error() is marked __noreturn, but objtool hasn't been told about it, so objtool is very unhappy about that "code seems to fall off the side of the earth" kind of issue. Both of these are very much signs of "this has never been tested in other environments", and would presumably have been found immediately in linux-next. As it is, I'm not convinced I want to even continue to go through this all, when it fails at the very first hurdle. A clean build is not some "would be nice" feature, and it is big part of why we have automation for new code. Linus
On Wed, 6 Sept 2023 at 13:02, Linus Torvalds <torvalds@linux-foundation.org> wrote: > > And guess what happens when you have (unsigned char)-1? It does *not* > cast back to -1. Side note: again, this may be one of those "it works in practice", because if we have -fshort-enums, I think 'enum btree_node_locked_type' in turn ends up being represented as a 'signed char', because that's the smallest simple type that can fit all those values. I don't think gcc ever uses less than that (ie while a six_lock_type could fit in two bits, it's still going to be considered at least a 8-bit value in practice). So we may have 'enum six_lock_type' essentially being 'unsigned char', and when the code does mark_btree_node_locked(trans, path, 0, BTREE_NODE_UNLOCKED); that BTREE_NODE_UNLOCKED value might actually be 255. And then when it's cast to 'enum btree_node_locked_type' in the inline function, the 255 will be cast to 'signed char', and we'll end up compatible with '(enum btree_node_locked_type)-1' again. So it's one of those things that are seriously wrong to do, but might generate the expected code anyway. Unless the compiler adds any other sanity checks, like UBSAN or something, that actually uses the exact range of the enums. It could happen even without UBSAN, if the compiler ends up going "I can see that the original value came from a 'enum six_lock_type', so I know the original value can't be signed, so any comparison with BTREE_NODE_UNLOCKED can never be true. But again, I suspect that in practice this all just happens to work. That doesn't make it right. Linus
Em Wed, Sep 06, 2023 at 01:20:59PM -0700, Linus Torvalds escreveu: > On Wed, 6 Sept 2023 at 13:02, Linus Torvalds > <torvalds@linux-foundation.org> wrote: > > > > And guess what happens when you have (unsigned char)-1? It does *not* > > cast back to -1. > > Side note: again, this may be one of those "it works in practice", > because if we have -fshort-enums, I think 'enum > btree_node_locked_type' in turn ends up being represented as a 'signed > char', because that's the smallest simple type that can fit all those > values. > I don't think gcc ever uses less than that (ie while a six_lock_type > could fit in two bits, it's still going to be considered at least a > 8-bit value in practice). There are some cases where people stuff the enum into a bitfield, but no, no simple type. ⬢[acme@toolbox perf-tools-next]$ pahole | grep -w enum | grep : enum btrfs_rsv_type type:8; /* 28:16 4 */ enum btrfs_delayed_item_type type:8; /* 100: 0 4 */ enum kernel_pkey_operation op:8; /* 40: 0 4 */ enum integrity_status ima_file_status:4; /* 96: 0 4 */ enum integrity_status ima_mmap_status:4; /* 96: 4 4 */ enum integrity_status ima_bprm_status:4; /* 96: 8 4 */ enum integrity_status ima_read_status:4; /* 96:12 4 */ enum integrity_status ima_creds_status:4; /* 96:16 4 */ enum integrity_status evm_status:4; /* 96:20 4 */ enum fs_context_purpose purpose:8; /* 152: 0 4 */ enum fs_context_phase phase:8; /* 152: 8 4 */ enum fs_value_type type:8; /* 8: 0 4 */ enum sgx_page_type type:16; /* 8: 8 4 */ enum nf_hook_ops_type hook_ops_type:8; /* 24: 8 4 */ enum resctrl_event_id evtid:8; /* 0:10 4 */ enum _cache_type type:5; /* 0: 0 4 */ ⬢[acme@toolbox perf-tools-next]$ pahole _cache_type enum _cache_type { CTYPE_NULL = 0, CTYPE_DATA = 1, CTYPE_INST = 2, CTYPE_UNIFIED = 3, }; ⬢[acme@toolbox perf-tools-next]$ pahole integrity_status enum integrity_status { INTEGRITY_PASS = 0, INTEGRITY_PASS_IMMUTABLE = 1, INTEGRITY_FAIL = 2, INTEGRITY_FAIL_IMMUTABLE = 3, INTEGRITY_NOLABEL = 4, INTEGRITY_NOXATTRS = 5, INTEGRITY_UNKNOWN = 6, }; ⬢[acme@toolbox perf-tools-next] > > So we may have 'enum six_lock_type' essentially being 'unsigned char', > and when the code does > > mark_btree_node_locked(trans, path, 0, BTREE_NODE_UNLOCKED); > > that BTREE_NODE_UNLOCKED value might actually be 255. > > And then when it's cast to 'enum btree_node_locked_type' in the inline > function, the 255 will be cast to 'signed char', and we'll end up > compatible with '(enum btree_node_locked_type)-1' again. > > So it's one of those things that are seriously wrong to do, but might > generate the expected code anyway. > > Unless the compiler adds any other sanity checks, like UBSAN or > something, that actually uses the exact range of the enums. > > It could happen even without UBSAN, if the compiler ends up going "I > can see that the original value came from a 'enum six_lock_type', so I > know the original value can't be signed, so any comparison with > BTREE_NODE_UNLOCKED can never be true. > > But again, I suspect that in practice this all just happens to work. > That doesn't make it right. > > Linus
Hi Kent, On Sat, Sep 02, 2023 at 11:25:55PM -0400, Kent Overstreet wrote: > here's the bcachefs pull request, for 6.6. Hopefully everything > outstanding from the previous PR thread has been resolved; the block > layer prereqs are in now via Jens's tree and the dcache helper has a > reviewed-by from Christain. I pulled this into mainline locally and did an LLVM build, which found an immediate issue. It appears the bcachefs codes uses zero length arrays for flexible arrays instead of the C99 syntax that the kernel is moving to, which will cause issues with -fstrict-flex-arrays=3 (clang 16+, gcc 13+), see commit df8fc4e934c1 ("kbuild: Enable -fstrict-flex-arrays=3"). Currently, building x86_64 defconfig + the bcachefs configs with clang warns (or errors with CONFIG_WERROR): In file included from fs/bcachefs/replicas.c:6: fs/bcachefs/replicas.h:46:2: error: array index 0 is past the end of the array (that has type '__u8[0]' (aka 'unsigned char[0]')) [-Werror,-Warray-bounds] 46 | e->devs[0] = dev; | ^ ~ fs/bcachefs/bcachefs_format.h:1392:2: note: array 'devs' declared here 1392 | __u8 devs[0]; | ^ 1 error generated. GCC would warn in the same manner if -Warray-bounds was not disabled for it... :( In file included from fs/bcachefs/buckets.c:22: In function 'bch2_replicas_entry_cached', inlined from 'update_cached_sectors' at fs/bcachefs/buckets.c:409:2, inlined from 'bch2_mark_alloc' at fs/bcachefs/buckets.c:590:9: fs/bcachefs/replicas.h:46:16: error: array subscript 0 is outside array bounds of '__u8[0]' {aka 'unsigned char[]'} [-Werror=array-bounds=] 46 | e->devs[0] = dev; | ~~~~~~~^~~ In file included from fs/bcachefs/bcachefs.h:206, from fs/bcachefs/buckets.c:8: fs/bcachefs/bcachefs_format.h: In function 'bch2_mark_alloc': fs/bcachefs/bcachefs_format.h:1392:33: note: while referencing 'devs' 1392 | __u8 devs[0]; | ^~~~ GCC shows many other instances of this problem for the same reason (I can send a build log if you'd like), such as 'struct snapshot_table' in subvolume_types.h and several other structures in bcachefs_format.h. Cheers, Nathan
On Wed, Sep 06, 2023 at 06:55:38PM -0300, Arnaldo Carvalho de Melo wrote: > Em Wed, Sep 06, 2023 at 01:20:59PM -0700, Linus Torvalds escreveu: > > On Wed, 6 Sept 2023 at 13:02, Linus Torvalds > > <torvalds@linux-foundation.org> wrote: > > > > > > And guess what happens when you have (unsigned char)-1? It does *not* > > > cast back to -1. > > > > Side note: again, this may be one of those "it works in practice", > > because if we have -fshort-enums, I think 'enum > > btree_node_locked_type' in turn ends up being represented as a 'signed > > char', because that's the smallest simple type that can fit all those > > values. > > > I don't think gcc ever uses less than that (ie while a six_lock_type > > could fit in two bits, it's still going to be considered at least a > > 8-bit value in practice). > > There are some cases where people stuff the enum into a bitfield, but > no, no simple type. > > ⬢[acme@toolbox perf-tools-next]$ pahole | grep -w enum | grep : > enum btrfs_rsv_type type:8; /* 28:16 4 */ > enum btrfs_delayed_item_type type:8; /* 100: 0 4 */ The simple grep might give a skewed view, in the above case there's also /* Bitfield combined with previous fields */ in the full output, with adjacent bool struct members it's all packed into one int. I think I've always seen an int for enums, unless it was explicitly narrowed in the structure (:8) or by __packed attribute in the enum definition. > enum kernel_pkey_operation op:8; /* 40: 0 4 */ > enum integrity_status ima_file_status:4; /* 96: 0 4 */ > enum integrity_status ima_mmap_status:4; /* 96: 4 4 */ > enum integrity_status ima_bprm_status:4; /* 96: 8 4 */ > enum integrity_status ima_read_status:4; /* 96:12 4 */ > enum integrity_status ima_creds_status:4; /* 96:16 4 */ > enum integrity_status evm_status:4; /* 96:20 4 */ > enum fs_context_purpose purpose:8; /* 152: 0 4 */ > enum fs_context_phase phase:8; /* 152: 8 4 */ > enum fs_value_type type:8; /* 8: 0 4 */ > enum sgx_page_type type:16; /* 8: 8 4 */ > enum nf_hook_ops_type hook_ops_type:8; /* 24: 8 4 */ > enum resctrl_event_id evtid:8; /* 0:10 4 */ > enum _cache_type type:5; /* 0: 0 4 */
On Wed, 6 Sept 2023 at 14:55, Arnaldo Carvalho de Melo <acme@kernel.org> wrote: > > > I don't think gcc ever uses less than that (ie while a six_lock_type > > could fit in two bits, it's still going to be considered at least a > > 8-bit value in practice). > > There are some cases where people stuff the enum into a bitfield, but > no, no simple type. Note that I am talking about the types gcc uses natively. To show what I'm talking about, build this (silly) code that has some of the same enum types that bcachefs has: #include <stdio.h> enum enum1 { val1 = 0, val2 = 1, val3 = 2, }; enum enum2 { num1 = -1, num2 = 0, num3 = 1, num4 = 2, }; int main(int argc, char **argv) { printf("%d %d (%zu %zu)\n", (enum enum1) num1, (enum enum1) num1 == num1, sizeof(enum enum1), sizeof(enum enum2)); return 0; } and run it. On x86 with no special options, you should get something like this: -1 1 (4 4) ie both types have a four-byte size, and casting 'num1' to 'enum enum1' will in fact give you back -1, and will then compare equal to num1 in the end. Because both types are in practice just 'int'. But now do the same with -fshort-enums, and you instead get 255 0 (1 1) because both types are just a single byte, and casting 'num1' to 'enum enum1' will in fact result in 255 (because it's an _unsigned_ type), and then comparing with the original num1 value will no longer compare equal. (But casting it then further back to 'enum enum2' will in fact result in -1 again, because you're effectively casting it back to 'signed char', and then 255 and -1 are in fact the same in that type). End result: you can get some really unexpected behavior if you cast enums to other types. Which is, of course, exactly why the compiler is warning about the comparison and about passing in the wrong type of enum. Sadly, the compiler doesn't warn about the cast, so you can hide all of this by just adding casts. That doesn't make it *right*, of course. Linus
On Wed, 6 Sept 2023 at 16:20, David Sterba <dsterba@suse.cz> wrote: > > I think I've always seen an int for enums, unless it was > explicitly narrowed in the structure (:8) or by __packed attribute in > the enum definition. 'int' is definitely the default (and traditional) behavior. But exactly because enums can act very differently depending on compiler options (and some of those may have different defaults on different architectures), we should never ever have a bare 'enum' as part of a structure in any UAPI. In fact, having an enum as a bitfield is much better for that case. Doing a quick grep shows that sadly people haven't realized that. Now: using -fshort-enum can break a _lot_ of libraries exactly for this kind of reason, so the kernel isn't unusual, and I don't know of anybody who actually uses -fshort-enum. I'm mentioning -fshort-enum not because it's likely to be used, but mainly because it's an easy way to show some issues. You can get very similar issues by just having unusual enum values. Doing enum mynum { val = 0x80000000 }; does something special too. I leave it to the reader to figure out, but as a hint it's basically exactly the same issue as I was trying to show with my crazy -fshort-enum example. Linus
Em Wed, Sep 06, 2023 at 04:34:32PM -0700, Linus Torvalds escreveu: > On Wed, 6 Sept 2023 at 16:20, David Sterba <dsterba@suse.cz> wrote: > > I think I've always seen an int for enums, unless it was > > explicitly narrowed in the structure (:8) or by __packed attribute in > > the enum definition. > 'int' is definitely the default (and traditional) behavior. > But exactly because enums can act very differently depending on > compiler options (and some of those may have different defaults on > different architectures), we should never ever have a bare 'enum' as > part of a structure in any UAPI. > In fact, having an enum as a bitfield is much better for that case. > Doing a quick grep shows that sadly people haven't realized that. > Now: using -fshort-enum can break a _lot_ of libraries exactly for > this kind of reason, so the kernel isn't unusual, and I don't know of > anybody who actually uses -fshort-enum. I'm mentioning -fshort-enum > not because it's likely to be used, but mainly because it's an easy > way to show some issues. > You can get very similar issues by just having unusual enum values. Doing > > enum mynum { val = 0x80000000 }; > does something special too. > I leave it to the reader to figure out, but as a hint it's basically > exactly the same issue as I was trying to show with my crazy > -fshort-enum example. Two extra hints: ⬢[acme@toolbox perf-tools-next]$ grep KIND_ENUM64 include/uapi/linux/btf.h BTF_KIND_ENUM64 = 19, /* Enumeration up to 64-bit values */ /* BTF_KIND_ENUM64 is followed by multiple "struct btf_enum64". ⬢[acme@toolbox perf-tools-next]$ ⬢[acme@toolbox perf-tools-next]$ pahole --help |& grep enum --skip_encoding_btf_enum64 Do not encode ENUM64s in BTF. ⬢[acme@toolbox perf-tools-next]$ :-) - Arnaldo
Em Wed, Sep 06, 2023 at 08:46:14PM -0300, Arnaldo Carvalho de Melo escreveu: > Em Wed, Sep 06, 2023 at 04:34:32PM -0700, Linus Torvalds escreveu: > > On Wed, 6 Sept 2023 at 16:20, David Sterba <dsterba@suse.cz> wrote: > > > I think I've always seen an int for enums, unless it was > > > explicitly narrowed in the structure (:8) or by __packed attribute in > > > the enum definition. > > 'int' is definitely the default (and traditional) behavior. > > But exactly because enums can act very differently depending on > > compiler options (and some of those may have different defaults on > > different architectures), we should never ever have a bare 'enum' as > > part of a structure in any UAPI. > > In fact, having an enum as a bitfield is much better for that case. > > Doing a quick grep shows that sadly people haven't realized that. > > Now: using -fshort-enum can break a _lot_ of libraries exactly for > > this kind of reason, so the kernel isn't unusual, and I don't know of > > anybody who actually uses -fshort-enum. I'm mentioning -fshort-enum > > not because it's likely to be used, but mainly because it's an easy > > way to show some issues. > > You can get very similar issues by just having unusual enum values. Doing > > enum mynum { val = 0x80000000 }; > > does something special too. > > I leave it to the reader to figure out, but as a hint it's basically > > exactly the same issue as I was trying to show with my crazy > > -fshort-enum example. > > Two extra hints: > ⬢[acme@toolbox perf-tools-next]$ grep KIND_ENUM64 include/uapi/linux/btf.h > BTF_KIND_ENUM64 = 19, /* Enumeration up to 64-bit values */ > /* BTF_KIND_ENUM64 is followed by multiple "struct btf_enum64". > ⬢[acme@toolbox perf-tools-next]$ > ⬢[acme@toolbox perf-tools-next]$ pahole --help |& grep enum > --skip_encoding_btf_enum64 Do not encode ENUM64s in BTF. > ⬢[acme@toolbox perf-tools-next]$ > :-) Some examples: [root@five perf-tools-next]# bpftool btf dump file /sys/kernel/btf/vmlinux | grep -w ENUM64 [2954] ENUM64 '(anon)' encoding=SIGNED size=8 vlen=28 [6519] ENUM64 'blake2b_iv' encoding=UNSIGNED size=8 vlen=8 [12990] ENUM64 '(anon)' encoding=UNSIGNED size=8 vlen=3 [16283] ENUM64 'netdev_priv_flags' encoding=UNSIGNED size=8 vlen=33 [21717] ENUM64 '(anon)' encoding=UNSIGNED size=8 vlen=11 [28247] ENUM64 'ib_uverbs_device_cap_flags' encoding=UNSIGNED size=8 vlen=28 [34836] ENUM64 'perf_callchain_context' encoding=UNSIGNED size=8 vlen=7 [48851] ENUM64 '(anon)' encoding=UNSIGNED size=8 vlen=9 [54703] ENUM64 'hmm_pfn_flags' encoding=UNSIGNED size=8 vlen=7 [root@five perf-tools-next]# pahole netdev_priv_flags enum netdev_priv_flags { IFF_802_1Q_VLAN = 1, IFF_EBRIDGE = 2, IFF_BONDING = 4, IFF_ISATAP = 8, IFF_WAN_HDLC = 16, IFF_XMIT_DST_RELEASE = 32, IFF_DONT_BRIDGE = 64, IFF_DISABLE_NETPOLL = 128, IFF_MACVLAN_PORT = 256, IFF_BRIDGE_PORT = 512, IFF_OVS_DATAPATH = 1024, IFF_TX_SKB_SHARING = 2048, IFF_UNICAST_FLT = 4096, IFF_TEAM_PORT = 8192, IFF_SUPP_NOFCS = 16384, IFF_LIVE_ADDR_CHANGE = 32768, IFF_MACVLAN = 65536, IFF_XMIT_DST_RELEASE_PERM = 131072, IFF_L3MDEV_MASTER = 262144, IFF_NO_QUEUE = 524288, IFF_OPENVSWITCH = 1048576, IFF_L3MDEV_SLAVE = 2097152, IFF_TEAM = 4194304, IFF_RXFH_CONFIGURED = 8388608, IFF_PHONY_HEADROOM = 16777216, IFF_MACSEC = 33554432, IFF_NO_RX_HANDLER = 67108864, IFF_FAILOVER = 134217728, IFF_FAILOVER_SLAVE = 268435456, IFF_L3MDEV_RX_HANDLER = 536870912, IFF_NO_ADDRCONF = 1073741824, IFF_TX_SKB_NO_LINEAR = 2147483648, IFF_CHANGE_PROTO_DOWN = 4294967296, } [root@five perf-tools-next]# pahole --hex ib_uverbs_device_cap_flags enum ib_uverbs_device_cap_flags { IB_UVERBS_DEVICE_RESIZE_MAX_WR = 0x1, IB_UVERBS_DEVICE_BAD_PKEY_CNTR = 0x2, IB_UVERBS_DEVICE_BAD_QKEY_CNTR = 0x4, IB_UVERBS_DEVICE_RAW_MULTI = 0x8, IB_UVERBS_DEVICE_AUTO_PATH_MIG = 0x10, IB_UVERBS_DEVICE_CHANGE_PHY_PORT = 0x20, IB_UVERBS_DEVICE_UD_AV_PORT_ENFORCE = 0x40, IB_UVERBS_DEVICE_CURR_QP_STATE_MOD = 0x80, IB_UVERBS_DEVICE_SHUTDOWN_PORT = 0x100, IB_UVERBS_DEVICE_PORT_ACTIVE_EVENT = 0x400, IB_UVERBS_DEVICE_SYS_IMAGE_GUID = 0x800, IB_UVERBS_DEVICE_RC_RNR_NAK_GEN = 0x1000, IB_UVERBS_DEVICE_SRQ_RESIZE = 0x2000, IB_UVERBS_DEVICE_N_NOTIFY_CQ = 0x4000, IB_UVERBS_DEVICE_MEM_WINDOW = 0x20000, IB_UVERBS_DEVICE_UD_IP_CSUM = 0x40000, IB_UVERBS_DEVICE_XRC = 0x100000, IB_UVERBS_DEVICE_MEM_MGT_EXTENSIONS = 0x200000, IB_UVERBS_DEVICE_MEM_WINDOW_TYPE_2A = 0x800000, IB_UVERBS_DEVICE_MEM_WINDOW_TYPE_2B = 0x1000000, IB_UVERBS_DEVICE_RC_IP_CSUM = 0x2000000, IB_UVERBS_DEVICE_RAW_IP_CSUM = 0x4000000, IB_UVERBS_DEVICE_MANAGED_FLOW_STEERING = 0x20000000, IB_UVERBS_DEVICE_RAW_SCATTER_FCS = 0x400000000, IB_UVERBS_DEVICE_PCI_WRITE_END_PADDING = 0x1000000000, IB_UVERBS_DEVICE_FLUSH_GLOBAL = 0x4000000000, IB_UVERBS_DEVICE_FLUSH_PERSISTENT = 0x8000000000, IB_UVERBS_DEVICE_ATOMIC_WRITE = 0x10000000000, } [root@five perf-tools-next]#
On Wed, Sep 06, 2023 at 03:28:47PM -0700, Nathan Chancellor wrote: > Hi Kent, > > On Sat, Sep 02, 2023 at 11:25:55PM -0400, Kent Overstreet wrote: > > here's the bcachefs pull request, for 6.6. Hopefully everything > > outstanding from the previous PR thread has been resolved; the block > > layer prereqs are in now via Jens's tree and the dcache helper has a > > reviewed-by from Christain. > > I pulled this into mainline locally and did an LLVM build, which found > an immediate issue. It appears the bcachefs codes uses zero length It looks like this series hasn't been in -next at all? That seems like a pretty important step. Also, when I look at the PR, it seems to be a branch history going back _years_. For this kind of a feature, I'd expect a short series of "here's the code" in incremental additions (e.g. look at the x86 shstk series), not the development history from it being out of tree -- this could easily lead to ugly bisection problems, etc. -Kees
On 9/6/23 8:03 PM, Kees Cook wrote: > On Wed, Sep 06, 2023 at 03:28:47PM -0700, Nathan Chancellor wrote: >> Hi Kent, >> >> On Sat, Sep 02, 2023 at 11:25:55PM -0400, Kent Overstreet wrote: >>> here's the bcachefs pull request, for 6.6. Hopefully everything >>> outstanding from the previous PR thread has been resolved; the block >>> layer prereqs are in now via Jens's tree and the dcache helper has a >>> reviewed-by from Christain. >> >> I pulled this into mainline locally and did an LLVM build, which found >> an immediate issue. It appears the bcachefs codes uses zero length > > It looks like this series hasn't been in -next at all? That seems like a > pretty important step. > > Also, when I look at the PR, it seems to be a branch history going > back _years_. For this kind of a feature, I'd expect a short series of > "here's the code" in incremental additions (e.g. look at the x86 shstk > series), not the development history from it being out of tree -- this > could easily lead to ugly bisection problems, etc. When we merged btrfs, Linus helped redo all of the btrfs out of tree commits on top of kernel git. I can't remember at this point if it was his idea or mine, but git history is obviously improved by remembering my sparse file joy: commit 3a686375629da5d2e2ad019265b66ef113c87455 Author: Chris Mason <chris.mason@oracle.com> Date: Thu May 24 13:35:57 2007 -0400 Btrfs: sparse files! I'd have a preference for keeping the old history, warts and all, but wanted to give a data point to help jog people's memory around problems it might have caused. -chris
On Wed, Sep 06, 2023 at 12:36:18PM -0700, Linus Torvalds wrote: > So I'm starting to look at this because I have most other pull > requests done, and while I realize there's no universal support for it > I suspect any further changes are better done in-tree. The out-of-tree > thing has been done. > > However, while I'll continue to look at it in this form, I just > realized that it's completely unacceptable for one very obvious > reason: > > On Sat, 2 Sept 2023 at 20:26, Kent Overstreet <kent.overstreet@linux.dev> wrote: > > > > https://evilpiepirate.org/git/bcachefs.git bcachefs-for-upstream > > No way am I pulling that without a signed tag and a pgp key with a > chain of trust. You've been around for long enough that having such a > key shouldn't be a problem for you, so make it happen. > > There are a few other issues that I have with this, and Christoph did > mention a big one: it's not been in linux-next. I don't know why I > thought it had been, it's just such an obvious thing for any new "I > want this merged upstream" tree. > > So these kinds of "I'll just ignore _all_ basic rules" kinds of issues > do annoy me. > > I need to know that you understand that if you actually want this > upstream, you need to work with upstream. > > That very much means *NOT* continuing this "I'll just do it my way". > You need to show that you can work with others, that you can work > within the framework of upstream, and that not every single thread you > get into becomes an argument. > > This, btw, is not negotiable. If you feel uncomfortable with that > basic notion, you had better just continue doing development outside > the main kernel tree for another decade. > > The fact that I only now notice that you never submitted this to > linux-next is obviously on me. My bad. > > But at the same time it worries me that it might be a sign of you just > thinking that your way is special. Linus, I specifically asked you about linux-next in the offlist pre-pull request thread back in May. It would have been nice if I could've gotten an answer back then; instead, I'm only getting a definitive answer on that now. That question has even come up in meetings and no one could give a definitive answer; the suggestion was to email you and CC people and ask, which is precisely what I did. Sigh. Now I'm wondering what other surprises I'm going to get the next time around...
On Wed, Sep 06, 2023 at 05:03:06PM -0700, Kees Cook wrote: > On Wed, Sep 06, 2023 at 03:28:47PM -0700, Nathan Chancellor wrote: > > Hi Kent, > > > > On Sat, Sep 02, 2023 at 11:25:55PM -0400, Kent Overstreet wrote: > > > here's the bcachefs pull request, for 6.6. Hopefully everything > > > outstanding from the previous PR thread has been resolved; the block > > > layer prereqs are in now via Jens's tree and the dcache helper has a > > > reviewed-by from Christain. > > > > I pulled this into mainline locally and did an LLVM build, which found > > an immediate issue. It appears the bcachefs codes uses zero length > > It looks like this series hasn't been in -next at all? That seems like a > pretty important step. > > Also, when I look at the PR, it seems to be a branch history going > back _years_. For this kind of a feature, I'd expect a short series of > "here's the code" in incremental additions (e.g. look at the x86 shstk > series), not the development history from it being out of tree -- this > could easily lead to ugly bisection problems, etc. Chris already commented on this, but - we really want the full history available, that's important context for anyone working on the code going forward. Brian was just mentioning digging through the history for context in the meeting last Tuesday, and I've been going to a lot of effort to make sure every commit builds and runs.
On Thu, 7 Sept 2023 at 13:37, Kent Overstreet <kent.overstreet@linux.dev> wrote: > > Linus, I specifically asked you about linux-next in the offlist pre-pull > request thread back in May. It would have been nice if I could've gotten > an answer back then; instead, I'm only getting a definitive answer on > that now. I may not have answered because it was so *obvious*. Was there really any question about it? Anyway, the fact that the code doesn't even build for me is kind of a show-stopper regardless. It's the kind of things linux-next is supposed to notice early, so that we aren't in the situation where the code has not even been build-tested properly. Linus
On Wed, Sep 06, 2023 at 12:36:18PM -0700, Linus Torvalds wrote: > So I'm starting to look at this because I have most other pull > requests done, and while I realize there's no universal support for it > I suspect any further changes are better done in-tree. The out-of-tree > thing has been done. > > However, while I'll continue to look at it in this form, I just > realized that it's completely unacceptable for one very obvious > reason: > > On Sat, 2 Sept 2023 at 20:26, Kent Overstreet <kent.overstreet@linux.dev> wrote: > > > > https://evilpiepirate.org/git/bcachefs.git bcachefs-for-upstream > > No way am I pulling that without a signed tag and a pgp key with a > chain of trust. You've been around for long enough that having such a > key shouldn't be a problem for you, so make it happen. > > There are a few other issues that I have with this, and Christoph did > mention a big one: it's not been in linux-next. I don't know why I > thought it had been, it's just such an obvious thing for any new "I > want this merged upstream" tree. > > So these kinds of "I'll just ignore _all_ basic rules" kinds of issues > do annoy me. > > I need to know that you understand that if you actually want this > upstream, you need to work with upstream. > > That very much means *NOT* continuing this "I'll just do it my way". > You need to show that you can work with others, that you can work > within the framework of upstream, and that not every single thread you > get into becomes an argument. > > This, btw, is not negotiable. If you feel uncomfortable with that > basic notion, you had better just continue doing development outside > the main kernel tree for another decade. > > The fact that I only now notice that you never submitted this to > linux-next is obviously on me. My bad. > > But at the same time it worries me that it might be a sign of you just > thinking that your way is special. > > Linus Honestly, though, this process is getting entirely kafkaesque. I've been spending the past month or two working laying the groundwork for putting together a team to work on this, because god knows we need fresh blood in filesystem land - but that's on hold. Getting blindsided by another three month delay hurts, but that's not even the main thing. The biggest thing has just been the non stop hostility and accusations - everything from "fracturing the community" too "ignoring all the rules" and my favorite, "is this the hill Kent wants to die on?" - when I'm just trying to get work done. I don't generally think of myself as being particularly difficult to work with, I get along fine with most of the filesystem developers I interact with - regularly sharing ideas back and forth with the XFS people - but these review discussions have been entirely dominated by the most divisive people in our community, and I'm being told it's entirely on me to work with the guy whos one constant in the past 15 years has been to try and block everything I submit? I'm just trying to get work done here. I'm not trying to ignore the rules. I'm trying to work with people who are willing to have reasonable discussions. ------------------- When I was a teenager, I wanted nothing more than to be a Linux kernel programmer. I thought it utterly amazing that this huge group of people from around the world were working together over the internet, and that anyone could take part if they had the chops. That was my escape from a shitty family situation and the assholes in my life. But my life is different now; I have new and better people in my life, and I have to be thinking about them, and if merging bcachefs means I have to spend a lot more time in interactions like this then it's going to make me a shitty person to be around; and I don't want to do that to myself and I definitely don't want to do that to the people I care about. I'm going to go offline for awhile and think about what I want to do next.
Hi Kent, hi Linus, hi everyone, Kent Overstreet - 08.09.23, 01:40:01 CEST: > The biggest thing has just been the non stop hostility and accusations - > everything from "fracturing the community" too "ignoring all the rules" > and my favorite, "is this the hill Kent wants to die on?" - when I'm > just trying to get work done. I observed this for a while now, without commenting and found the following pattern on both "sides" of the story: Accusing the other one of wrong-doing. As long as those involved in the merging process continue that pattern that story of not merging bcachefs most likely will continue. And even if it gets merged, there would be ongoing conflict about it. Cause I have no control over how someone else acts. Quite the contrary: The more I expect and require someone else to change the more resistance I am most likely to meet. I only can change how I act. This pattern stops exactly when everyone involved looks at their own part in this repeated and frustrating "bcachefs is not merged to the mainline Linux kernel" dance. And from what I observed the failure to merge it is not caused by a single developer. Neither from you, Kent, neither from anyone else. It is the combination of the single actions of several developers and the social interaction between them that caused the failure to merge it so far. Accusing the other one is giving all the power to change the situation to someone else. I am sure merging it will work when everyone involved first looks at themselves and asks themselves the questions "Have I contributed to make merging bcachefs difficult and if so how and most importantly how can I act more constructive about it?". And I mean that for the developers who have been skeptical about the merge as well as the supportive developers including Kent. There have been actions on both "sides" that contributed to delay a merge. I am not going to make a list but leave it to everyone involved to consider themselves what those were. For the recent requests of having it GPG signed as well as having it go through next: I think those requests are reasonable. As far as I read bcache back then went through next as well. Would it have been nice to have been told that earlier? Yes. But both of those requests are certainly not a show-stopper to have bcachefs merged at a later time. Of course I know I have been asking others to go within and consider their own behavior in this mail while being perfectly aware that I cannot change how anyone else acts. However, maybe it is an inspiration to some to decide for themselves to consider a change. In the best hopes to see bcachefs being merged to the "official" Linux kernel soon,
On 9/8/23 00:40, Kent Overstreet wrote: > On Wed, Sep 06, 2023 at 12:36:18PM -0700, Linus Torvalds wrote: >> So I'm starting to look at this because I have most other pull >> requests done, and while I realize there's no universal support for it >> I suspect any further changes are better done in-tree. The out-of-tree >> thing has been done. >> >> However, while I'll continue to look at it in this form, I just >> realized that it's completely unacceptable for one very obvious >> reason: >> >> On Sat, 2 Sept 2023 at 20:26, Kent Overstreet <kent.overstreet@linux.dev> wrote: >>> >>> https://evilpiepirate.org/git/bcachefs.git bcachefs-for-upstream >> >> No way am I pulling that without a signed tag and a pgp key with a >> chain of trust. You've been around for long enough that having such a >> key shouldn't be a problem for you, so make it happen. >> >> There are a few other issues that I have with this, and Christoph did >> mention a big one: it's not been in linux-next. I don't know why I >> thought it had been, it's just such an obvious thing for any new "I >> want this merged upstream" tree. >> >> So these kinds of "I'll just ignore _all_ basic rules" kinds of issues >> do annoy me. >> >> I need to know that you understand that if you actually want this >> upstream, you need to work with upstream. >> >> That very much means *NOT* continuing this "I'll just do it my way". >> You need to show that you can work with others, that you can work >> within the framework of upstream, and that not every single thread you >> get into becomes an argument. >> >> This, btw, is not negotiable. If you feel uncomfortable with that >> basic notion, you had better just continue doing development outside >> the main kernel tree for another decade. >> >> The fact that I only now notice that you never submitted this to >> linux-next is obviously on me. My bad. >> >> But at the same time it worries me that it might be a sign of you just >> thinking that your way is special. >> >> Linus > > Honestly, though, this process is getting entirely kafkaesque. > > I've been spending the past month or two working laying the groundwork > for putting together a team to work on this, because god knows we need > fresh blood in filesystem land - but that's on hold. Getting blindsided > by another three month delay hurts, but that's not even the main thing. > > The biggest thing has just been the non stop hostility and accusations - > everything from "fracturing the community" too "ignoring all the rules" > and my favorite, "is this the hill Kent wants to die on?" - when I'm > just trying to get work done. > > I don't generally think of myself as being particularly difficult to > work with, I get along fine with most of the filesystem developers I > interact with - regularly sharing ideas back and forth with the XFS > people - but these review discussions have been entirely dominated by > the most divisive people in our community, and I'm being told it's > entirely on me to work with the guy whos one constant in the past 15 > years has been to try and block everything I submit? > > I'm just trying to get work done here. I'm not trying to ignore the > rules. I'm trying to work with people who are willing to have reasonable > discussions. > > ------------------- > > When I was a teenager, I wanted nothing more than to be a Linux kernel > programmer. I thought it utterly amazing that this huge group of people > from around the world were working together over the internet, and that > anyone could take part if they had the chops. > > That was my escape from a shitty family situation and the assholes in my > life. > > But my life is different now; I have new and better people in my life, > and I have to be thinking about them, and if merging bcachefs means I > have to spend a lot more time in interactions like this then it's going > to make me a shitty person to be around; and I don't want to do that to > myself and I definitely don't want to do that to the people I care > about. > > I'm going to go offline for awhile and think about what I want to do > next. I've been holding off replying here for a while because I really hoped that this situation would just work itself out. (I apologise for adding more noise in advance) I agree that it really sucks that sometimes you don't get replies to things sometimes or the review from the people you need it from all the time, or didn't tell you something you needed to know. But, I think it's really important though to realize that you are talking to other people on the ML and not review machines (unless that person is Dave Airlie on Zink ;P) and very often, other work can come up that would block them being able to spend time reviewing or guiding you on this process. Everyone on here is another person who has their own huuuge slog of work that is super important for security, stability, shipping a product/feature, keeping their job, etc. Eg. I proposed several revisions on the casefolding support for bcachefs, but right now I am busy doing some other AMDGPU and Gamescope/Proton + color work so I haven't had a chance to follow up more on that since the last discussion. You might think that because X takes a while to respond/review or a didn't mention that you actually needed to do Y or missed your meeting; it's because they don't care, but it's probably way more likely that they are just busy and going through their own personal hell. One of the harsh things about open source is rationalizing that nobody owes you a review or any of their time. If people are willing to review your features and changes in any capacity, then they also have an interest in your project. If you can understand that, then you are going to have a much better time proposing things upstream. I also really want to see bcachefs in mainline, and I know you can do it. :-) Cheers - Joshie
On Tue, Sep 05, 2023 at 08:00:07PM -0400, Kent Overstreet wrote: > But the reasons bcachefs doesn't use iomap have been discussed at > length, Care to send a pointer? > that code. You were AWOL on those discussions; you consistently say > "bcachefs should use iomap" and then walk away, so those discussions > haven't moved forwards. Well, let's hsave the proper discussion then. I defintively don't think "I just walked away", because that's not what I do. I did give up on quite a few of discussions with your because they turned from technical into personal very quick, but I don't even remember that for this case. > To recap, besides being more iterator like (passing data structures > around with iterators, Which is something you can do with iomap, and we're making use of that in quite a few places. Thanks to the work from willy that I helped a bit with this has been done years ago. It might or might not make sense to replace iomap_begin/end with a single iter callback that can be inlined, but that's not changing anything fundamental. > bcachefs also hangs a bit more state off the pagecache, due to being > multi device and also using the same data structure for tracking disk > reservations (because why make the buffered write paths look that up > separately?). I'm not even parsing the sentence. But as said many times I'm all ear if we stick to technical questions. We've added all kinds of improvements to iomap especially for gfs2 and btrfs, and while it sometimes took some time we're all better off with that as we're converging on a model of doing I/O instead of everytone doing something slightly different. > > > But without that, and without being in linux-next and the VFS maintainer > > ACK required for I think this is a very bad idea. > > Christain gave his reviewed-by for the dcache patch. Since this patchset > doesn't change existing code maintained by others aside from that one > patch, not sure how linux-next is relevant here... Because file systems really should have a VFS maintainer ACK, just like block drivers have a block maintainer ACK, or network drivers have a net maintainer ACK. Doing this differently for the most complex driver interface in the kernel doesn't make any sense whatsover.
On Thu, Sep 07, 2023 at 04:39:46PM -0400, Kent Overstreet wrote: > On Wed, Sep 06, 2023 at 05:03:06PM -0700, Kees Cook wrote: > > On Wed, Sep 06, 2023 at 03:28:47PM -0700, Nathan Chancellor wrote: > > > Hi Kent, > > > > > > On Sat, Sep 02, 2023 at 11:25:55PM -0400, Kent Overstreet wrote: > > > > here's the bcachefs pull request, for 6.6. Hopefully everything > > > > outstanding from the previous PR thread has been resolved; the block > > > > layer prereqs are in now via Jens's tree and the dcache helper has a > > > > reviewed-by from Christain. > > > > > > I pulled this into mainline locally and did an LLVM build, which found > > > an immediate issue. It appears the bcachefs codes uses zero length > > > > It looks like this series hasn't been in -next at all? That seems like a > > pretty important step. > > > > Also, when I look at the PR, it seems to be a branch history going > > back _years_. For this kind of a feature, I'd expect a short series of > > "here's the code" in incremental additions (e.g. look at the x86 shstk > > series), not the development history from it being out of tree -- this > > could easily lead to ugly bisection problems, etc. > > Chris already commented on this, but - we really want the full history > available, that's important context for anyone working on the code going > forward. Brian was just mentioning digging through the history for > context in the meeting last Tuesday, and I've been going to a lot of > effort to make sure every commit builds and runs. > Yeah.. IMO the main advantages of a sort of squashed down/sanitized git history is to either aid in code review or just clean up a history that is aesthetically a mess. For the former, the consensus seems to be that no one person is going to sit down and review the entire codebase, but rather folks have been digging into peripheral areas they have experience in (i.e., locking, pagecache, etc.) to call out any major concerns. I believe Kent has also offered to give pointers or just sit down with anybody who needs assistance to navigate the codebase for review purposes. For the latter, ISTM that bcachefs has pretty much followed kernel patch conventions, with it being originally derived from another upstream kernel subsystem and whatnot. The flipside is that losing the history makes it incrementally more annoying for developers working on bcachefs going forward. So I can see an argument for doing things either way in general just depending on context, but it looks like there's precedent for either approach. Looking back at btrfs in v2.6.29, that looks like a ~900 or so commit history that was pulled in. bcachefs has a larger commit log (~2500+) at this point, but if we can do whatever magic Chris referred to to try and avoid any logistical issues for the broader kernel community, I think that would be ideal. BTW this is just my .02 of course, but I'm also fairly certain at least one or two developers have looked at the git log and expressed the exact opposite opinion expressed here: that seeing an upstream-like history is appreciated because it reflects a sane/compatible development process. That again isn't to say one way or the other is the right approach for a merge, just that it seems subjective to some degree and so inevitably there will be different opinions.. Brian
To all kernel developers.
Kent Overstreet - 03.09.23, 05:25:55 CEST:
> Hi Linus,
[…]
Sometimes it is all too easy to forget saying thank you!
Thank you to all of you for your work on the Linux kernel.
I greatly appreciate it.
Except for some older and one (almost insanely nice) newer device that run
AmigaOS or variants of that operating system, all my computing devices
including router and phone run a Linux kernel. And then considering the
huge amount of Linux servers that actually power most of what we call the
internet and its services… awesome!
That would not have been possible without your work!
So: Thank you!
Best,
On Wed, Sep 06, 2023 at 05:03:06PM -0700, Kees Cook wrote: > On Wed, Sep 06, 2023 at 03:28:47PM -0700, Nathan Chancellor wrote: > > Hi Kent, > > > > On Sat, Sep 02, 2023 at 11:25:55PM -0400, Kent Overstreet wrote: > > > here's the bcachefs pull request, for 6.6. Hopefully everything > > > outstanding from the previous PR thread has been resolved; the block > > > layer prereqs are in now via Jens's tree and the dcache helper has a > > > reviewed-by from Christain. > > > > I pulled this into mainline locally and did an LLVM build, which found > > an immediate issue. It appears the bcachefs codes uses zero length > > It looks like this series hasn't been in -next at all? That seems like a > pretty important step. > > Also, when I look at the PR, it seems to be a branch history going > back _years_. For this kind of a feature, I'd expect a short series of > "here's the code" in incremental additions (e.g. look at the x86 shstk > series), not the development history from it being out of tree -- this > could easily lead to ugly bisection problems, etc. I disagree entirely. One of the most valuable things we have for XFS is a complete change history going back to the first commit in 1993. We don't have all that in the kernel git tree - that only goes back to 2005. We do, however, have a separate git archive that runs from the initial commit in 1993 to 2008, and so we can track bugs and changes back all the way to when they were first introduced. I'm looking at something in that historic XFS archive every second day; understanding the XFS code and why it is like it is requires looking back in time on a regular basis. And because XFS engineers have been really good with commit messages for a really long time, there is a lot of information in that historic archive that is simply not documented anywhere else. So if we have the full history of bcachefs developement sitting in a git tree, we'd be *utterly stupid* to discard it when merging it into the mainline tree. Nothing else documents the reasons for the code being like it is right now better than the change history of the code. For people trying to understand the code or trying to perform root cause analysis of a bug, a complete revision history is a rich gold mine full of valuable nuggets.... -Dave.
On Wed, Sep 06, 2023 at 01:20:59PM -0700, Linus Torvalds wrote: > On Wed, 6 Sept 2023 at 13:02, Linus Torvalds > <torvalds@linux-foundation.org> wrote: > > > > And guess what happens when you have (unsigned char)-1? It does *not* > > cast back to -1. > > Side note: again, this may be one of those "it works in practice", > because if we have -fshort-enums, I think 'enum > btree_node_locked_type' in turn ends up being represented as a 'signed > char', because that's the smallest simple type that can fit all those > values. > > I don't think gcc ever uses less than that (ie while a six_lock_type > could fit in two bits, it's still going to be considered at least a > 8-bit value in practice). > > So we may have 'enum six_lock_type' essentially being 'unsigned char', > and when the code does > > mark_btree_node_locked(trans, path, 0, BTREE_NODE_UNLOCKED); > > that BTREE_NODE_UNLOCKED value might actually be 255. > > And then when it's cast to 'enum btree_node_locked_type' in the inline > function, the 255 will be cast to 'signed char', and we'll end up > compatible with '(enum btree_node_locked_type)-1' again. > > So it's one of those things that are seriously wrong to do, but might > generate the expected code anyway. > > Unless the compiler adds any other sanity checks, like UBSAN or > something, that actually uses the exact range of the enums. > > It could happen even without UBSAN, if the compiler ends up going "I > can see that the original value came from a 'enum six_lock_type', so I > know the original value can't be signed, so any comparison with > BTREE_NODE_UNLOCKED can never be true. > > But again, I suspect that in practice this all just happens to work. > That doesn't make it right. No, this was just broken - it should have been mark_btree_node_unlocked(), we never should've been passing that enum val there.