Message ID | 20210308150650.18626-1-avarab@gmail.com (mailing list archive) |
---|---|
Headers | show |
Series | tree-walk: mostly "mode" to "enum object_type" | expand |
On Mon, Mar 8, 2021 at 7:07 AM Ævar Arnfjörð Bjarmason <avarab@gmail.com> wrote: > > This large series goes on top of my 6 patch series for > read_tree_recursive() as this one further refactors that function. See > https://lore.kernel.org/git/20210308022138.28166-1-avarab@gmail.com/ > for that series. > > I noticed that since 2014 or so we haven't been doing the fsck checks > for bad file modes in trees. This series fixes that. I plan to add > tests etc. for that in another follow-up series. > > I wanted to get this out for review sooner than later, particularly > since the fsck testing will probably get me down another refactoring > path (fsck testing in general in this area is pretty bad...). > > As noted in 30/30 it would have been way easier to simply do an > isolated fix for that bug by introducing some fsck-specific API for > raw tree reading. > > But I thought the bug was symptomatic of a wider problem in our > codebase. Namely that we pass around the tree's mode *a lot*. > > But almost everything that then deals with the mode doesn't per-se > care about the mode bits in the tree, but using them to map that mode > to a tree entry for one of of OBJ_{BLOB,TREE,COMMIT}. > > So this is a large refactoring of all users of the widely used > tree-walk.h API to "enum obj2ect_type", finally in 29/30 I rename the > field to a scary "raw_mode". > > At that point we have just ~30-50 grep hits left for "raw_mode" in the > codebase (depending on whether we count names in function parameters). > > Hopefully being in that state alleviates e.g. Elijah's concerns > expressed in > https://lore.kernel.org/git/CABPp-BEdu1PqV5W=FuL0f08iFhGzvzV8oSUybNj4eF0aAwTnAw@mail.gmail.com/ > I agree that doing the equivalent of 30/30 on top of master would be > way too scary, but once we're at 29/30 I think it's sane. > > I tested this in combination with his on-list series to add more > merge-ort testing: > https://lore.kernel.org/git/pull.973.git.git.1614905738.gitgitgadget@gmail.com/ > > I found a regression I'd caused in the merge-ort.c code with those > tests, fixed here. See the comment in merge-ort.c in 30/30. I'll start reading through this series, but two quick early notes: - I was worried about touching the tree-walk code message up performance. Since collect_rename_info() in merge-ort.c takes the most time (and it's mostly a tree-walk), I was worried this would regress performance for me. A couple runs of my mega-renames big rebase testcase suggests performance is not that different, so my fears look unfounded. (Note: I tested in combination with all my performance improvements, because otherwise tree-walking would just be in the noise of overall runtime.) - There's a textual conflict with this series and my sets of patch series that finish off the ort implementation[1], but it's just two lines that are pretty easy to resolve. [1] https://github.com/gitgitgadget/git/pulls?q=is%3Aopen+is%3Apr+author%3Anewren+Optimization+batch > Ævar Arnfjörð Bjarmason (30): > diff.c: remove redundant canon_mode() call > notes & match-trees: use name_entry's "pathlen" member > cache.h: add a comment to object_type() > tree-walk.h: add object_type member to name_entry > tree-walk.c: migrate to using new "object_type" field when possible > cache.h: have base_name_compare() take "is tree?", not "mode" > tree-walk.h users: switch object_type(...) to new .object_type > tree.h: format argument lists of read_tree_recursive() users > tree.h users: format argument lists in archive.c > archive: get rid of 'stage' parameter > tree.h API: make read_tree_fn_t take an "enum object_type" > tree-walk.h users: migrate "p->mode &&" pattern > tree-walk.h users: refactor chained "mode" if/else into switch > tree-walk.h users: migrate miscellaneous "mode" to "object_type" > merge-tree tests: test for the mode comparison in same_entry() > merge-ort: correct reference to test in 62fdec17a11 > fsck.c: switch on "object_type" in fsck_walk_tree() > tree-walk.h users: use temporary variable(s) for "mode" > tree-walk.h API: formatting changes for subsequent commit > tree-walk.h API: rename get_tree_entry() to get_tree_entry_mode() > tree-walk.h API users: use "tmp" for mode in shift_tree_by() > tree-walk.h API: Add get_tree_entry_type() > tree-walk.h API: add a get_tree_entry_path() function > tree-walk.h API: document and format tree_entry_extract() > tree-entry.h API: rename tree_entry_extract() to > tree_entry_extract_mode() > tree-walk.h API: add a tree_entry_extract_all() function > tree-walk.h API: add a tree_entry_extract_type() function > tree-walk.h API users: rename "struct name_entry"'s "mode" to > "raw_mode" > tree.h API users: rename read_tree_fn_t's "mode" to "raw_mode" > tree-walk.h API: move canon_mode() back out of decode_tree_entry() > > archive.c | 51 +++++++++++++----------- > blame.c | 9 +++-- > builtin/checkout.c | 7 +++- > builtin/fast-import.c | 8 ++-- > builtin/grep.c | 6 +-- > builtin/log.c | 7 ++-- > builtin/ls-files.c | 13 +++--- > builtin/ls-tree.c | 18 ++++----- > builtin/merge-tree.c | 32 +++++++++------ > builtin/mktree.c | 4 +- > builtin/pack-objects.c | 6 +-- > builtin/reflog.c | 3 +- > builtin/rm.c | 2 +- > builtin/update-index.c | 7 +++- > cache-tree.c | 2 +- > cache.h | 11 ++++-- > combine-diff.c | 8 ++-- > delta-islands.c | 2 +- > diff.c | 2 +- > fsck.c | 23 +++++------ > http-push.c | 6 ++- > line-log.c | 2 +- > list-objects.c | 20 +++++++--- > match-trees.c | 52 ++++++++++++------------ > merge-ort.c | 34 ++++++++++------ > merge-recursive.c | 33 ++++++++-------- > notes.c | 15 +++---- > object-name.c | 7 ++-- > pack-bitmap-write.c | 8 ++-- > read-cache.c | 16 ++++---- > revision.c | 12 ++++-- > t/t4300-merge-tree.sh | 44 +++++++++++++++++++++ > tree-diff.c | 44 ++++++++++++--------- > tree-walk.c | 89 +++++++++++++++++++++++++++++++----------- > tree-walk.h | 67 ++++++++++++++++++++++++++----- > tree.c | 19 +++++---- > tree.h | 5 ++- > unpack-trees.c | 30 ++++++++------ > walker.c | 22 ++++++----- > 39 files changed, 482 insertions(+), 264 deletions(-) > > -- > 2.31.0.rc0.126.g04f22c5b82 >
On Mon, Mar 8, 2021 at 7:07 AM Ævar Arnfjörð Bjarmason <avarab@gmail.com> wrote: > > This large series goes on top of my 6 patch series for > read_tree_recursive() as this one further refactors that function. See > https://lore.kernel.org/git/20210308022138.28166-1-avarab@gmail.com/ > for that series. > > I noticed that since 2014 or so we haven't been doing the fsck checks > for bad file modes in trees. This series fixes that. I plan to add > tests etc. for that in another follow-up series. > > I wanted to get this out for review sooner than later, particularly > since the fsck testing will probably get me down another refactoring > path (fsck testing in general in this area is pretty bad...). > > As noted in 30/30 it would have been way easier to simply do an > isolated fix for that bug by introducing some fsck-specific API for > raw tree reading. > > But I thought the bug was symptomatic of a wider problem in our > codebase. Namely that we pass around the tree's mode *a lot*. > > But almost everything that then deals with the mode doesn't per-se > care about the mode bits in the tree, but using them to map that mode > to a tree entry for one of of OBJ_{BLOB,TREE,COMMIT}. > > So this is a large refactoring of all users of the widely used > tree-walk.h API to "enum obj2ect_type", finally in 29/30 I rename the > field to a scary "raw_mode". > > At that point we have just ~30-50 grep hits left for "raw_mode" in the > codebase (depending on whether we count names in function parameters). > > Hopefully being in that state alleviates e.g. Elijah's concerns > expressed in > https://lore.kernel.org/git/CABPp-BEdu1PqV5W=FuL0f08iFhGzvzV8oSUybNj4eF0aAwTnAw@mail.gmail.com/ > I agree that doing the equivalent of 30/30 on top of master would be > way too scary, but once we're at 29/30 I think it's sane. It's partially less scary (good cleanups that help make things clearer, your comment about the code having been in a similar state once upon a time in patch 30), but in some ways even more scary (after reading through 30/30 and readily noticing a few missing areas and starting to dig and finding several more). > > I tested this in combination with his on-list series to add more > merge-ort testing: > https://lore.kernel.org/git/pull.973.git.git.1614905738.gitgitgadget@gmail.com/ > > I found a regression I'd caused in the merge-ort.c code with those > tests, fixed here. See the comment in merge-ort.c in 30/30. I've read through the whole series now. It is nicely structured, and has lots of good cleanups. Unfortunately, there are also some clear regressions noted in my comments on both patches 6 and 30. I'm particularly worried with patch 30's basic plan; I think it'd be far safer to have the tree-walking default to returning canonicalized modes but allowing callers to request it be off. I think each caller is going to need someone to audit the particular path for whether it can be safely switched over to using raw modes on a case-by-case basis and with the introduction of new tests. Some callers probably aren't worth the effort (e.g. merge-recursive). Others might be, but require a fair amount of work or other trade-offs. I'm split about whether merge-ort should consider it. Using raw_modes might allow me to fix one funny corner case issue in merge-ort that to my knowledge no user has ever hit in practice, but I'm not sure fixing that testcase is worth it. To fix it, we'd also have to allow writing tree objects with non-canonicalized modes to the object store (for a "temporary" tree defining the virtual merge-base); that means new objects with "broken"/"non-canonicalized" modes being written that users can access. > Ævar Arnfjörð Bjarmason (30): > diff.c: remove redundant canon_mode() call > notes & match-trees: use name_entry's "pathlen" member > cache.h: add a comment to object_type() > tree-walk.h: add object_type member to name_entry > tree-walk.c: migrate to using new "object_type" field when possible > cache.h: have base_name_compare() take "is tree?", not "mode" > tree-walk.h users: switch object_type(...) to new .object_type > tree.h: format argument lists of read_tree_recursive() users > tree.h users: format argument lists in archive.c > archive: get rid of 'stage' parameter > tree.h API: make read_tree_fn_t take an "enum object_type" > tree-walk.h users: migrate "p->mode &&" pattern > tree-walk.h users: refactor chained "mode" if/else into switch > tree-walk.h users: migrate miscellaneous "mode" to "object_type" > merge-tree tests: test for the mode comparison in same_entry() > merge-ort: correct reference to test in 62fdec17a11 > fsck.c: switch on "object_type" in fsck_walk_tree() > tree-walk.h users: use temporary variable(s) for "mode" > tree-walk.h API: formatting changes for subsequent commit > tree-walk.h API: rename get_tree_entry() to get_tree_entry_mode() > tree-walk.h API users: use "tmp" for mode in shift_tree_by() > tree-walk.h API: Add get_tree_entry_type() > tree-walk.h API: add a get_tree_entry_path() function > tree-walk.h API: document and format tree_entry_extract() > tree-entry.h API: rename tree_entry_extract() to > tree_entry_extract_mode() > tree-walk.h API: add a tree_entry_extract_all() function > tree-walk.h API: add a tree_entry_extract_type() function > tree-walk.h API users: rename "struct name_entry"'s "mode" to > "raw_mode" > tree.h API users: rename read_tree_fn_t's "mode" to "raw_mode" > tree-walk.h API: move canon_mode() back out of decode_tree_entry() > > archive.c | 51 +++++++++++++----------- > blame.c | 9 +++-- > builtin/checkout.c | 7 +++- > builtin/fast-import.c | 8 ++-- > builtin/grep.c | 6 +-- > builtin/log.c | 7 ++-- > builtin/ls-files.c | 13 +++--- > builtin/ls-tree.c | 18 ++++----- > builtin/merge-tree.c | 32 +++++++++------ > builtin/mktree.c | 4 +- > builtin/pack-objects.c | 6 +-- > builtin/reflog.c | 3 +- > builtin/rm.c | 2 +- > builtin/update-index.c | 7 +++- > cache-tree.c | 2 +- > cache.h | 11 ++++-- > combine-diff.c | 8 ++-- > delta-islands.c | 2 +- > diff.c | 2 +- > fsck.c | 23 +++++------ > http-push.c | 6 ++- > line-log.c | 2 +- > list-objects.c | 20 +++++++--- > match-trees.c | 52 ++++++++++++------------ > merge-ort.c | 34 ++++++++++------ > merge-recursive.c | 33 ++++++++-------- > notes.c | 15 +++---- > object-name.c | 7 ++-- > pack-bitmap-write.c | 8 ++-- > read-cache.c | 16 ++++---- > revision.c | 12 ++++-- > t/t4300-merge-tree.sh | 44 +++++++++++++++++++++ > tree-diff.c | 44 ++++++++++++--------- > tree-walk.c | 89 +++++++++++++++++++++++++++++++----------- > tree-walk.h | 67 ++++++++++++++++++++++++++----- > tree.c | 19 +++++---- > tree.h | 5 ++- > unpack-trees.c | 30 ++++++++------ > walker.c | 22 ++++++----- > 39 files changed, 482 insertions(+), 264 deletions(-) > > -- > 2.31.0.rc0.126.g04f22c5b82 >
On Tue, Mar 09 2021, Elijah Newren wrote: > On Mon, Mar 8, 2021 at 7:07 AM Ævar Arnfjörð Bjarmason <avarab@gmail.com> wrote: >> >> This large series goes on top of my 6 patch series for >> read_tree_recursive() as this one further refactors that function. See >> https://lore.kernel.org/git/20210308022138.28166-1-avarab@gmail.com/ >> for that series. >> >> I noticed that since 2014 or so we haven't been doing the fsck checks >> for bad file modes in trees. This series fixes that. I plan to add >> tests etc. for that in another follow-up series. >> >> I wanted to get this out for review sooner than later, particularly >> since the fsck testing will probably get me down another refactoring >> path (fsck testing in general in this area is pretty bad...). >> >> As noted in 30/30 it would have been way easier to simply do an >> isolated fix for that bug by introducing some fsck-specific API for >> raw tree reading. >> >> But I thought the bug was symptomatic of a wider problem in our >> codebase. Namely that we pass around the tree's mode *a lot*. >> >> But almost everything that then deals with the mode doesn't per-se >> care about the mode bits in the tree, but using them to map that mode >> to a tree entry for one of of OBJ_{BLOB,TREE,COMMIT}. >> >> So this is a large refactoring of all users of the widely used >> tree-walk.h API to "enum obj2ect_type", finally in 29/30 I rename the >> field to a scary "raw_mode". >> >> At that point we have just ~30-50 grep hits left for "raw_mode" in the >> codebase (depending on whether we count names in function parameters). >> >> Hopefully being in that state alleviates e.g. Elijah's concerns >> expressed in >> https://lore.kernel.org/git/CABPp-BEdu1PqV5W=FuL0f08iFhGzvzV8oSUybNj4eF0aAwTnAw@mail.gmail.com/ >> I agree that doing the equivalent of 30/30 on top of master would be >> way too scary, but once we're at 29/30 I think it's sane. > > It's partially less scary (good cleanups that help make things > clearer, your comment about the code having been in a similar state > once upon a time in patch 30), but in some ways even more scary (after > reading through 30/30 and readily noticing a few missing areas and > starting to dig and finding several more). Thanks a lot for your reviews. I'm going to let this sit for a fair bit longer before any re-roll, especially with the rc period, other eyeballs on this most welcome though :) >> >> I tested this in combination with his on-list series to add more >> merge-ort testing: >> https://lore.kernel.org/git/pull.973.git.git.1614905738.gitgitgadget@gmail.com/ >> >> I found a regression I'd caused in the merge-ort.c code with those >> tests, fixed here. See the comment in merge-ort.c in 30/30. > > I've read through the whole series now. It is nicely structured, and > has lots of good cleanups. Unfortunately, there are also some clear > regressions noted in my comments on both patches 6 and 30. > > I'm particularly worried with patch 30's basic plan; I think it'd be > far safer to have the tree-walking default to returning canonicalized > modes but allowing callers to request it be off. I think each caller > is going to need someone to audit the particular path for whether it > can be safely switched over to using raw modes on a case-by-case basis > and with the introduction of new tests. Some callers probably aren't > worth the effort (e.g. merge-recursive). Others might be, but require > a fair amount of work or other trade-offs. I'm split about whether > merge-ort should consider it. Using raw_modes might allow me to fix > one funny corner case issue in merge-ort that to my knowledge no user > has ever hit in practice, but I'm not sure fixing that testcase is > worth it. To fix it, we'd also have to allow writing tree objects > with non-canonicalized modes to the object store (for a "temporary" > tree defining the virtual merge-base); that means new objects with > "broken"/"non-canonicalized" modes being written that users can > access. To add a bit to your worries, I think your "[...]does lower my worry some[...]" in 30/30 is unfortunately based on some unintentional lying on my part. I.e. my "yes our test coverage sucks, but[...]" was based on some misreading of the history. Here's a correction: 7146e66f086 didn't break the fsck check in mid-2014, it had been broken for much longer. See Jeff King's late-2014 E-mail about it here (which I've just now discovered): https://lore.kernel.org/git/20140923154751.GA19319@peff.net/#t Basically, decode_tree_entry() didn't sanitize the mode, but we did that in the tree_entry_extract() function, which is what fsck.c was using all along, so it was always getting pre-sanitized modes. But I think I was mostly right, somewhat by accident, AFAICT this was the state of things back then: $ git grep '\b(init_tree_desc|fill_tree_descriptor|update_tree_entry|update_tree_entry|tree_entry)\(' -- '*.c' | wc -l 78 $ git grep '\b(get_tree_entry|tree_entry_extract)\(' -- '*.c' | wc -l 25 Those were the API functions that gave you the un-canonical and canonical mode, respectively. But yes, I agree that it's probably a bit too scary. Where I was mainly trying to get to with 30/30 was that for any future code we'd more carefully review this whole "raw_mode" thing just because of its name. So what do you think about a version of 30/30 where existing API users immediately call canon_mode() upon calling the current API functions? It would be ugly and verbose now, but the benefit would be that we'd eye any future change that deals with this "raw_mode" with more suspicion, and likely convert it to use the object_type instead. Or I could just back out of this whole 29-30 step and just add a "bare" API for fsck in particular. I'm partial to renaming it to *something* just to make it more grep-able though, we have a "mode" in all sorts of structs all over the place... >> Ævar Arnfjörð Bjarmason (30): >> diff.c: remove redundant canon_mode() call >> notes & match-trees: use name_entry's "pathlen" member >> cache.h: add a comment to object_type() >> tree-walk.h: add object_type member to name_entry >> tree-walk.c: migrate to using new "object_type" field when possible >> cache.h: have base_name_compare() take "is tree?", not "mode" >> tree-walk.h users: switch object_type(...) to new .object_type >> tree.h: format argument lists of read_tree_recursive() users >> tree.h users: format argument lists in archive.c >> archive: get rid of 'stage' parameter >> tree.h API: make read_tree_fn_t take an "enum object_type" >> tree-walk.h users: migrate "p->mode &&" pattern >> tree-walk.h users: refactor chained "mode" if/else into switch >> tree-walk.h users: migrate miscellaneous "mode" to "object_type" >> merge-tree tests: test for the mode comparison in same_entry() >> merge-ort: correct reference to test in 62fdec17a11 >> fsck.c: switch on "object_type" in fsck_walk_tree() >> tree-walk.h users: use temporary variable(s) for "mode" >> tree-walk.h API: formatting changes for subsequent commit >> tree-walk.h API: rename get_tree_entry() to get_tree_entry_mode() >> tree-walk.h API users: use "tmp" for mode in shift_tree_by() >> tree-walk.h API: Add get_tree_entry_type() >> tree-walk.h API: add a get_tree_entry_path() function >> tree-walk.h API: document and format tree_entry_extract() >> tree-entry.h API: rename tree_entry_extract() to >> tree_entry_extract_mode() >> tree-walk.h API: add a tree_entry_extract_all() function >> tree-walk.h API: add a tree_entry_extract_type() function >> tree-walk.h API users: rename "struct name_entry"'s "mode" to >> "raw_mode" >> tree.h API users: rename read_tree_fn_t's "mode" to "raw_mode" >> tree-walk.h API: move canon_mode() back out of decode_tree_entry() >> >> archive.c | 51 +++++++++++++----------- >> blame.c | 9 +++-- >> builtin/checkout.c | 7 +++- >> builtin/fast-import.c | 8 ++-- >> builtin/grep.c | 6 +-- >> builtin/log.c | 7 ++-- >> builtin/ls-files.c | 13 +++--- >> builtin/ls-tree.c | 18 ++++----- >> builtin/merge-tree.c | 32 +++++++++------ >> builtin/mktree.c | 4 +- >> builtin/pack-objects.c | 6 +-- >> builtin/reflog.c | 3 +- >> builtin/rm.c | 2 +- >> builtin/update-index.c | 7 +++- >> cache-tree.c | 2 +- >> cache.h | 11 ++++-- >> combine-diff.c | 8 ++-- >> delta-islands.c | 2 +- >> diff.c | 2 +- >> fsck.c | 23 +++++------ >> http-push.c | 6 ++- >> line-log.c | 2 +- >> list-objects.c | 20 +++++++--- >> match-trees.c | 52 ++++++++++++------------ >> merge-ort.c | 34 ++++++++++------ >> merge-recursive.c | 33 ++++++++-------- >> notes.c | 15 +++---- >> object-name.c | 7 ++-- >> pack-bitmap-write.c | 8 ++-- >> read-cache.c | 16 ++++---- >> revision.c | 12 ++++-- >> t/t4300-merge-tree.sh | 44 +++++++++++++++++++++ >> tree-diff.c | 44 ++++++++++++--------- >> tree-walk.c | 89 +++++++++++++++++++++++++++++++----------- >> tree-walk.h | 67 ++++++++++++++++++++++++++----- >> tree.c | 19 +++++---- >> tree.h | 5 ++- >> unpack-trees.c | 30 ++++++++------ >> walker.c | 22 ++++++----- >> 39 files changed, 482 insertions(+), 264 deletions(-) >> >> -- >> 2.31.0.rc0.126.g04f22c5b82 >>
On Tue, Mar 9, 2021 at 1:48 PM Ævar Arnfjörð Bjarmason <avarab@gmail.com> wrote: > > On Tue, Mar 09 2021, Elijah Newren wrote: > > > On Mon, Mar 8, 2021 at 7:07 AM Ævar Arnfjörð Bjarmason <avarab@gmail.com> wrote: > >> > >> This large series goes on top of my 6 patch series for > >> read_tree_recursive() as this one further refactors that function. See > >> https://lore.kernel.org/git/20210308022138.28166-1-avarab@gmail.com/ > >> for that series. > >> > >> I noticed that since 2014 or so we haven't been doing the fsck checks > >> for bad file modes in trees. This series fixes that. I plan to add > >> tests etc. for that in another follow-up series. > >> > >> I wanted to get this out for review sooner than later, particularly > >> since the fsck testing will probably get me down another refactoring > >> path (fsck testing in general in this area is pretty bad...). > >> > >> As noted in 30/30 it would have been way easier to simply do an > >> isolated fix for that bug by introducing some fsck-specific API for > >> raw tree reading. > >> > >> But I thought the bug was symptomatic of a wider problem in our > >> codebase. Namely that we pass around the tree's mode *a lot*. > >> > >> But almost everything that then deals with the mode doesn't per-se > >> care about the mode bits in the tree, but using them to map that mode > >> to a tree entry for one of of OBJ_{BLOB,TREE,COMMIT}. > >> > >> So this is a large refactoring of all users of the widely used > >> tree-walk.h API to "enum obj2ect_type", finally in 29/30 I rename the > >> field to a scary "raw_mode". > >> > >> At that point we have just ~30-50 grep hits left for "raw_mode" in the > >> codebase (depending on whether we count names in function parameters). > >> > >> Hopefully being in that state alleviates e.g. Elijah's concerns > >> expressed in > >> https://lore.kernel.org/git/CABPp-BEdu1PqV5W=FuL0f08iFhGzvzV8oSUybNj4eF0aAwTnAw@mail.gmail.com/ > >> I agree that doing the equivalent of 30/30 on top of master would be > >> way too scary, but once we're at 29/30 I think it's sane. > > > > It's partially less scary (good cleanups that help make things > > clearer, your comment about the code having been in a similar state > > once upon a time in patch 30), but in some ways even more scary (after > > reading through 30/30 and readily noticing a few missing areas and > > starting to dig and finding several more). > > Thanks a lot for your reviews. I'm going to let this sit for a fair bit > longer before any re-roll, especially with the rc period, other eyeballs > on this most welcome though :) > > >> > >> I tested this in combination with his on-list series to add more > >> merge-ort testing: > >> https://lore.kernel.org/git/pull.973.git.git.1614905738.gitgitgadget@gmail.com/ > >> > >> I found a regression I'd caused in the merge-ort.c code with those > >> tests, fixed here. See the comment in merge-ort.c in 30/30. > > > > I've read through the whole series now. It is nicely structured, and > > has lots of good cleanups. Unfortunately, there are also some clear > > regressions noted in my comments on both patches 6 and 30. > > > > I'm particularly worried with patch 30's basic plan; I think it'd be > > far safer to have the tree-walking default to returning canonicalized > > modes but allowing callers to request it be off. I think each caller > > is going to need someone to audit the particular path for whether it > > can be safely switched over to using raw modes on a case-by-case basis > > and with the introduction of new tests. Some callers probably aren't > > worth the effort (e.g. merge-recursive). Others might be, but require > > a fair amount of work or other trade-offs. I'm split about whether > > merge-ort should consider it. Using raw_modes might allow me to fix > > one funny corner case issue in merge-ort that to my knowledge no user > > has ever hit in practice, but I'm not sure fixing that testcase is > > worth it. To fix it, we'd also have to allow writing tree objects > > with non-canonicalized modes to the object store (for a "temporary" > > tree defining the virtual merge-base); that means new objects with > > "broken"/"non-canonicalized" modes being written that users can > > access. > > To add a bit to your worries, I think your "[...]does lower my worry > some[...]" in 30/30 is unfortunately based on some unintentional lying > on my part. > > I.e. my "yes our test coverage sucks, but[...]" was based on some > misreading of the history. Here's a correction: > > 7146e66f086 didn't break the fsck check in mid-2014, it had been broken > for much longer. See Jeff King's late-2014 E-mail about it here (which > I've just now discovered): > https://lore.kernel.org/git/20140923154751.GA19319@peff.net/#t > > Basically, decode_tree_entry() didn't sanitize the mode, but we did that > in the tree_entry_extract() function, which is what fsck.c was using all > along, so it was always getting pre-sanitized modes. > > But I think I was mostly right, somewhat by accident, AFAICT this was > the state of things back then: > > $ git grep '\b(init_tree_desc|fill_tree_descriptor|update_tree_entry|update_tree_entry|tree_entry)\(' -- '*.c' | wc -l > 78 > $ git grep '\b(get_tree_entry|tree_entry_extract)\(' -- '*.c' | wc -l > 25 > > Those were the API functions that gave you the un-canonical and > canonical mode, respectively. > > But yes, I agree that it's probably a bit too scary. > > Where I was mainly trying to get to with 30/30 was that for any future > code we'd more carefully review this whole "raw_mode" thing just because > of its name. > > So what do you think about a version of 30/30 where existing API users > immediately call canon_mode() upon calling the current API functions? > > It would be ugly and verbose now, but the benefit would be that we'd eye > any future change that deals with this "raw_mode" with more suspicion, > and likely convert it to use the object_type instead. I don't have any strong objections to this. I'm less worried about ugly and verbose now, than I am in missing cases while converting; it'd be easy to get some of the conversions wrong accidentally (similar to the mode->is_tree changes for fast-import in patch 06 of this series or the original read_tree() removal patch before you fixed it up in v2), so I think it'd need to be done with some care. > Or I could just back out of this whole 29-30 step and just add a "bare" > API for fsck in particular. That'd be fine as well, and certainly less concerning; perhaps it's even the first step. But I don't think we have to do it this way, just that if we move towards raw_mode it needs to be done with a lot of caution. > I'm partial to renaming it to *something* just to make it more grep-able > though, we have a "mode" in all sorts of structs all over the place... > > >> Ævar Arnfjörð Bjarmason (30): > >> diff.c: remove redundant canon_mode() call > >> notes & match-trees: use name_entry's "pathlen" member > >> cache.h: add a comment to object_type() > >> tree-walk.h: add object_type member to name_entry > >> tree-walk.c: migrate to using new "object_type" field when possible > >> cache.h: have base_name_compare() take "is tree?", not "mode" > >> tree-walk.h users: switch object_type(...) to new .object_type > >> tree.h: format argument lists of read_tree_recursive() users > >> tree.h users: format argument lists in archive.c > >> archive: get rid of 'stage' parameter > >> tree.h API: make read_tree_fn_t take an "enum object_type" > >> tree-walk.h users: migrate "p->mode &&" pattern > >> tree-walk.h users: refactor chained "mode" if/else into switch > >> tree-walk.h users: migrate miscellaneous "mode" to "object_type" > >> merge-tree tests: test for the mode comparison in same_entry() > >> merge-ort: correct reference to test in 62fdec17a11 > >> fsck.c: switch on "object_type" in fsck_walk_tree() > >> tree-walk.h users: use temporary variable(s) for "mode" > >> tree-walk.h API: formatting changes for subsequent commit > >> tree-walk.h API: rename get_tree_entry() to get_tree_entry_mode() > >> tree-walk.h API users: use "tmp" for mode in shift_tree_by() > >> tree-walk.h API: Add get_tree_entry_type() > >> tree-walk.h API: add a get_tree_entry_path() function > >> tree-walk.h API: document and format tree_entry_extract() > >> tree-entry.h API: rename tree_entry_extract() to > >> tree_entry_extract_mode() > >> tree-walk.h API: add a tree_entry_extract_all() function > >> tree-walk.h API: add a tree_entry_extract_type() function > >> tree-walk.h API users: rename "struct name_entry"'s "mode" to > >> "raw_mode" > >> tree.h API users: rename read_tree_fn_t's "mode" to "raw_mode" > >> tree-walk.h API: move canon_mode() back out of decode_tree_entry() > >> > >> archive.c | 51 +++++++++++++----------- > >> blame.c | 9 +++-- > >> builtin/checkout.c | 7 +++- > >> builtin/fast-import.c | 8 ++-- > >> builtin/grep.c | 6 +-- > >> builtin/log.c | 7 ++-- > >> builtin/ls-files.c | 13 +++--- > >> builtin/ls-tree.c | 18 ++++----- > >> builtin/merge-tree.c | 32 +++++++++------ > >> builtin/mktree.c | 4 +- > >> builtin/pack-objects.c | 6 +-- > >> builtin/reflog.c | 3 +- > >> builtin/rm.c | 2 +- > >> builtin/update-index.c | 7 +++- > >> cache-tree.c | 2 +- > >> cache.h | 11 ++++-- > >> combine-diff.c | 8 ++-- > >> delta-islands.c | 2 +- > >> diff.c | 2 +- > >> fsck.c | 23 +++++------ > >> http-push.c | 6 ++- > >> line-log.c | 2 +- > >> list-objects.c | 20 +++++++--- > >> match-trees.c | 52 ++++++++++++------------ > >> merge-ort.c | 34 ++++++++++------ > >> merge-recursive.c | 33 ++++++++-------- > >> notes.c | 15 +++---- > >> object-name.c | 7 ++-- > >> pack-bitmap-write.c | 8 ++-- > >> read-cache.c | 16 ++++---- > >> revision.c | 12 ++++-- > >> t/t4300-merge-tree.sh | 44 +++++++++++++++++++++ > >> tree-diff.c | 44 ++++++++++++--------- > >> tree-walk.c | 89 +++++++++++++++++++++++++++++++----------- > >> tree-walk.h | 67 ++++++++++++++++++++++++++----- > >> tree.c | 19 +++++---- > >> tree.h | 5 ++- > >> unpack-trees.c | 30 ++++++++------ > >> walker.c | 22 ++++++----- > >> 39 files changed, 482 insertions(+), 264 deletions(-) > >> > >> -- > >> 2.31.0.rc0.126.g04f22c5b82