Message ID | ZqumdJz3-0mqh6Rc@ArchLinux (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Series | ref consistency check infra setup | expand |
On Thu, Aug 01, 2024 at 11:15:00PM +0800, shejialuo wrote: > Introduce a new function "fsck_refs" that initializes and runs a child > process to execute the "git-refs verify" command. It's `git refs verify`, not `git-refs verify` both in the commit body and subject. > Mentored-by: Patrick Steinhardt <ps@pks.im> > Mentored-by: Karthik Nayak <karthik.188@gmail.com> > Signed-off-by: shejialuo <shejialuo@gmail.com> > --- > builtin/fsck.c | 17 +++++++++++++++++ > 1 file changed, 17 insertions(+) > > diff --git a/builtin/fsck.c b/builtin/fsck.c > index 766bbd014d..b6ac878270 100644 > --- a/builtin/fsck.c > +++ b/builtin/fsck.c > @@ -899,6 +899,21 @@ static int check_pack_rev_indexes(struct repository *r, int show_progress) > return res; > } > > +static void fsck_refs(void) > +{ > + struct child_process refs_verify = CHILD_PROCESS_INIT; > + child_process_init(&refs_verify); > + refs_verify.git_cmd = 1; > + strvec_pushl(&refs_verify.args, "refs", "verify", NULL); > + if (verbose) > + strvec_push(&refs_verify.args, "--verbose"); > + if (check_strict) > + strvec_push(&refs_verify.args, "--strict"); > + > + if (run_command(&refs_verify)) > + errors_found |= ERROR_REFS; > +} Okay. I think that it's sensible to execute this as part of git-fsck(1). But do we want to provide an option to disable this new check, as well? It does feel a bit like opening a can of worms, though. None of the other checks have trivial ways to disable them, and git-fsck(1) is gaining more and more checks. So if we can disable ref checks, we may also want to have options to disable checks for objects, connectivity, reverse indices, indices, commit graphs and whatnot. In other words, in my opinion we need to think a bit bigger and design a proper UI around this. But I don't think that should happen as part of this commit series, as it is already big enough. So either we just accept this patch as-is. Or we evict it from this series and handle it in the future together with all the other taks that one may potentially want to disable. I'd rather pick option two. Patrick
> > +static void fsck_refs(void) > > +{ > > + struct child_process refs_verify = CHILD_PROCESS_INIT; > > + child_process_init(&refs_verify); > > + refs_verify.git_cmd = 1; > > + strvec_pushl(&refs_verify.args, "refs", "verify", NULL); > > + if (verbose) > > + strvec_push(&refs_verify.args, "--verbose"); > > + if (check_strict) > > + strvec_push(&refs_verify.args, "--strict"); > > + > > + if (run_command(&refs_verify)) > > + errors_found |= ERROR_REFS; > > +} > > Okay. I think that it's sensible to execute this as part of git-fsck(1). > But do we want to provide an option to disable this new check, as well? > > It does feel a bit like opening a can of worms, though. None of the > other checks have trivial ways to disable them, and git-fsck(1) is > gaining more and more checks. So if we can disable ref checks, we may > also want to have options to disable checks for objects, connectivity, > reverse indices, indices, commit graphs and whatnot. In other words, in > my opinion we need to think a bit bigger and design a proper UI around > this. > > But I don't think that should happen as part of this commit series, as > it is already big enough. So either we just accept this patch as-is. Or > we evict it from this series and handle it in the future together with > all the other taks that one may potentially want to disable. > > I'd rather pick option two. > After talking with Patrick offline, we decide to drop this patch. At current, we should put this change slowly for the user. Because many people use "git-fsck(1)", currently we don't have a way to disable ref checks by default. It's a little beyond this series. We may consider later. > Patrick
shejialuo <shejialuo@gmail.com> writes: >> > +static void fsck_refs(void) >> > +{ >> > + struct child_process refs_verify = CHILD_PROCESS_INIT; >> > + child_process_init(&refs_verify); >> > + refs_verify.git_cmd = 1; >> > + strvec_pushl(&refs_verify.args, "refs", "verify", NULL); >> > + if (verbose) >> > + strvec_push(&refs_verify.args, "--verbose"); >> > + if (check_strict) >> > + strvec_push(&refs_verify.args, "--strict"); >> > + >> > + if (run_command(&refs_verify)) >> > + errors_found |= ERROR_REFS; >> > +} >> >> Okay. I think that it's sensible to execute this as part of git-fsck(1). >> But do we want to provide an option to disable this new check, as well? >> >> It does feel a bit like opening a can of worms, though. None of the >> other checks have trivial ways to disable them, and git-fsck(1) is >> gaining more and more checks. So if we can disable ref checks, we may >> also want to have options to disable checks for objects, connectivity, >> reverse indices, indices, commit graphs and whatnot. In other words, in >> my opinion we need to think a bit bigger and design a proper UI around >> this. >> >> But I don't think that should happen as part of this commit series, as >> it is already big enough. So either we just accept this patch as-is. Or >> we evict it from this series and handle it in the future together with >> all the other taks that one may potentially want to disable. >> >> I'd rather pick option two. >> > > After talking with Patrick offline, we decide to drop this patch. At > current, we should put this change slowly for the user. Because many > people use "git-fsck(1)", currently we don't have a way to disable ref > checks by default. It's a little beyond this series. > > We may consider later. Hmph, I am fine with the approach to take it slower. BUT. Here is what the diffstat for the whole thing in the updated round v15 looked like. Do most of these changes outside refs/ still needed if we do not give any way to even optionally enable it via "fsck" for those who feel adventurous? Should we still be touching fsck.[ch] and other fsck related infrastructure in the series? Documentation/fsck-msgids.txt | 6 ++ Documentation/git-refs.txt | 13 +++++ builtin/fsck.c | 17 +++--- builtin/mktag.c | 3 +- builtin/refs.c | 34 +++++++++++ fsck.c | 127 +++++++++++++++++++++++++++++++++--------- fsck.h | 76 +++++++++++++++++++------ object-file.c | 9 ++- refs.c | 5 ++ refs.h | 8 +++ refs/debug.c | 11 ++++ refs/files-backend.c | 116 +++++++++++++++++++++++++++++++++++++- refs/packed-backend.c | 8 +++ refs/refs-internal.h | 6 ++ refs/reftable-backend.c | 8 +++ t/t0602-reffiles-fsck.sh | 92 ++++++++++++++++++++++++++++++ 16 files changed, 480 insertions(+), 59 deletions(-)
> > Hmph, I am fine with the approach to take it slower. > > BUT. > > Here is what the diffstat for the whole thing in the updated round > v15 looked like. Do most of these changes outside refs/ still > needed if we do not give any way to even optionally enable it via > "fsck" for those who feel adventurous? Should we still be touching > fsck.[ch] and other fsck related infrastructure in the series? > From my current standing, regardless of whether we run the "git refs verify" subprocess inside "git-fsck(1)", we must change the fsck infrastructure illustrated by the following: From the development of this series, we can know the main problem is that fsck error message is highly coupled with the object checks. Even if we don't perform "git refs verify" in "git-fsck(1)", we still need to follow the fsck msg framework when executing "git refs verify". We cannot avoid this. The content we change for "fsck.[ch]" is mainly the fsck msg part. We do not change anything about the objects. I agree with you that it would be strange if we do not expose any interfaces for user who are adventurous. Actually we may simply add an option "--refs-experimental" or simply "--refs" to allow the users check ref consistency by using "git-fsck(1)". I guess the concern that Patrick cares about is that we ONLY make refs optional here, but do not provide options for other checks. It will be strange from this perspective. > Documentation/fsck-msgids.txt | 6 ++ > Documentation/git-refs.txt | 13 +++++ > builtin/fsck.c | 17 +++--- > builtin/mktag.c | 3 +- > builtin/refs.c | 34 +++++++++++ > fsck.c | 127 +++++++++++++++++++++++++++++++++--------- > fsck.h | 76 +++++++++++++++++++------ > object-file.c | 9 ++- > refs.c | 5 ++ > refs.h | 8 +++ > refs/debug.c | 11 ++++ > refs/files-backend.c | 116 +++++++++++++++++++++++++++++++++++++- > refs/packed-backend.c | 8 +++ > refs/refs-internal.h | 6 ++ > refs/reftable-backend.c | 8 +++ > t/t0602-reffiles-fsck.sh | 92 ++++++++++++++++++++++++++++++ > 16 files changed, 480 insertions(+), 59 deletions(-)
shejialuo <shejialuo@gmail.com> writes: > I agree with you that it would be strange if we do not expose any > interfaces for user who are adventurous. Actually we may simply add an > option "--refs-experimental" or simply "--refs" to allow the users check > ref consistency by using "git-fsck(1)". > > I guess the concern that Patrick cares about is that we ONLY make refs > optional here, but do not provide options for other checks. It will be > strange from this perspective. I do not care about strange all that much. I however care about new complexity in the code, complexity that is not taken advantage of and is not exercised. You said > From the development of this series, we can know the main problem is > that fsck error message is highly coupled with the object checks. and even if it is true and we have problem in fsck code paths, we cannot see if _your_ solution to that problem is a good one without having the code that exercises your additional code. But if "git refs verify" does exercise all the new code paths (and the refactored code that existed before this series, sitting now in different places), then I do not have to worry about it. My question was primarily to extract "even though we do not wire this up to fsck, we already have another code paths that uses all these changes" out of you. Thanks.
On Mon, Aug 05, 2024 at 12:38:43PM -0700, Junio C Hamano wrote: > shejialuo <shejialuo@gmail.com> writes: > > > I agree with you that it would be strange if we do not expose any > > interfaces for user who are adventurous. Actually we may simply add an > > option "--refs-experimental" or simply "--refs" to allow the users check > > ref consistency by using "git-fsck(1)". > > > > I guess the concern that Patrick cares about is that we ONLY make refs > > optional here, but do not provide options for other checks. It will be > > strange from this perspective. > > I do not care about strange all that much. I however care about new > complexity in the code, complexity that is not taken advantage of > and is not exercised. > > You said > > > From the development of this series, we can know the main problem is > > that fsck error message is highly coupled with the object checks. > > and even if it is true and we have problem in fsck code paths, we > cannot see if _your_ solution to that problem is a good one without > having the code that exercises your additional code. > > But if "git refs verify" does exercise all the new code paths (and > the refactored code that existed before this series, sitting now in > different places), then I do not have to worry about it. My question > was primarily to extract "even though we do not wire this up to fsck, > we already have another code paths that uses all these changes" out > of you. > I understand what you mean here. I can say that "git refs verify" only exercises a part of the new code paths. The main reason why this series changes a lot of "fsck.[ch]" is that I want to expose the "fsck_report_ref" interface to report refs-related errors. So I guess this part should be covered. At the current implementation, we change the "fsck.[ch]" for the following three things: 1. Refactor "report" to use "fsck_vreport" 2. Create "fsck_report_ref" for refs check. 3. Do some simple renames to distinguish between refs and objects. We do cover the second case in "git refs verify". But sadly, the first case and third case are covered in "git-fsck(1)". So, "git refs verify" does not exercise the refactored code. However, I am a little confused. Actually, in the implementation, refs check and objects check are independent. I think we should wire up "git refs verify" to "git-fsck(1)". Because we have no way to exercise the above case 1 and 3. If we do not, we will bring a lot of complexity here. > Thanks.
On Mon, Aug 05, 2024 at 12:38:43PM -0700, Junio C Hamano wrote: > shejialuo <shejialuo@gmail.com> writes: > > > I agree with you that it would be strange if we do not expose any > > interfaces for user who are adventurous. Actually we may simply add an > > option "--refs-experimental" or simply "--refs" to allow the users check > > ref consistency by using "git-fsck(1)". > > > > I guess the concern that Patrick cares about is that we ONLY make refs > > optional here, but do not provide options for other checks. It will be > > strange from this perspective. > > I do not care about strange all that much. I however care about new > complexity in the code, complexity that is not taken advantage of > and is not exercised. > > You said > > > From the development of this series, we can know the main problem is > > that fsck error message is highly coupled with the object checks. > > and even if it is true and we have problem in fsck code paths, we > cannot see if _your_ solution to that problem is a good one without > having the code that exercises your additional code. > > But if "git refs verify" does exercise all the new code paths (and > the refactored code that existed before this series, sitting now in > different places), then I do not have to worry about it. My question > was primarily to extract "even though we do not wire this up to fsck, > we already have another code paths that uses all these changes" out > of you. Yeah, there is quite some complexity introduced in fsck code, where most of the complexity comes from making the reportin functions more generic. And we do end up using that in this series even if we don't integrate `git refs verify` with `git fsck`. It brings us the ability to configure the checks performed by `git refs verify` in the same way like we can configure all the other checks in `git fsck`, such that we can enable, disable or change severity levels for each of the messages reported by it. I think that this is a sensible way to go about it, and it leaves us with a more flexible and consistent error reporting infrastructure that we can eventually also use for other commands that git-fsck(1) shells out to. Patrick
On Tue, Aug 06, 2024 at 08:42:23AM +0800, shejialuo wrote: > On Mon, Aug 05, 2024 at 12:38:43PM -0700, Junio C Hamano wrote: > > shejialuo <shejialuo@gmail.com> writes: > > > > > I agree with you that it would be strange if we do not expose any > > > interfaces for user who are adventurous. Actually we may simply add an > > > option "--refs-experimental" or simply "--refs" to allow the users check > > > ref consistency by using "git-fsck(1)". > > > > > > I guess the concern that Patrick cares about is that we ONLY make refs > > > optional here, but do not provide options for other checks. It will be > > > strange from this perspective. > > > > I do not care about strange all that much. I however care about new > > complexity in the code, complexity that is not taken advantage of > > and is not exercised. > > > > You said > > > > > From the development of this series, we can know the main problem is > > > that fsck error message is highly coupled with the object checks. > > > > and even if it is true and we have problem in fsck code paths, we > > cannot see if _your_ solution to that problem is a good one without > > having the code that exercises your additional code. > > > > But if "git refs verify" does exercise all the new code paths (and > > the refactored code that existed before this series, sitting now in > > different places), then I do not have to worry about it. My question > > was primarily to extract "even though we do not wire this up to fsck, > > we already have another code paths that uses all these changes" out > > of you. > > > > I understand what you mean here. I can say that "git refs verify" only > exercises a part of the new code paths. The main reason why this series > changes a lot of "fsck.[ch]" is that I want to expose the > "fsck_report_ref" interface to report refs-related errors. So I guess > this part should be covered. > > At the current implementation, we change the "fsck.[ch]" for the > following three things: > > 1. Refactor "report" to use "fsck_vreport" > 2. Create "fsck_report_ref" for refs check. > 3. Do some simple renames to distinguish between refs and objects. > > We do cover the second case in "git refs verify". But sadly, the first > case and third case are covered in "git-fsck(1)". So, "git refs verify" > does not exercise the refactored code. > > However, I am a little confused. Actually, in the implementation, refs > check and objects check are independent. > > I think we should wire up "git refs verify" to "git-fsck(1)". Because we > have no way to exercise the above case 1 and 3. If we do not, we will > bring a lot of complexity here. I don't think that is necessary. Basically, what you are saying is that we seem to be testing only the new refs-related reporting that you have introduced via your refactorings, but not the preexisting objects related functionality. That isn't true though. The object-specific reporting functions that you have refactored already had callers before, and it still has callers now because you adapted those accordingly. You can assume that those callers already had tests before your refactorings, and as no tests broken you can be reasonably sure that your refactorings are sound for the object related code. Furthermore, you do exercise the new ref-related parts via a couple of tests that exercise `git refs verify`. Consequently, all parts of the refactoring are covered by either old or new tests, and we should be good here. So even though we do not exercise the ref-related parts via git-fsck(1), it is still subjected to tests. Eventually, when we start calling `git refs verify` in git-fsck(1), we'd introduce more tests that verify that the integration of those two commands works as expected, as well. So, to summarize: the refactored functionality is both used and tested and I think it's sensible to defer the integration of git-fsck(1) and git-refs(1). Thanks! Patrick
Patrick Steinhardt <ps@pks.im> writes: >> > But if "git refs verify" does exercise all the new code paths (and >> > the refactored code that existed before this series, sitting now in >> > different places), then I do not have to worry about it. My question >> > was primarily to extract "even though we do not wire this up to fsck, >> > we already have another code paths that uses all these changes" out >> > of you. >> ... > So, to summarize: the refactored functionality is both used and tested > and I think it's sensible to defer the integration of git-fsck(1) and > git-refs(1). After refactoring, existing functionality about objects are used, of course, (there is no other code that does so), the refactoring lets the code to learn to perform checks on references, and these new checks are exercised by "git refs verify". I took what shejialuo said that way, and that is fine by me when I said the above ;-). So I think we all are on the same page? Thanks.
On Tue, Aug 06, 2024 at 08:54:34AM -0700, Junio C Hamano wrote: > Patrick Steinhardt <ps@pks.im> writes: > > >> > But if "git refs verify" does exercise all the new code paths (and > >> > the refactored code that existed before this series, sitting now in > >> > different places), then I do not have to worry about it. My question > >> > was primarily to extract "even though we do not wire this up to fsck, > >> > we already have another code paths that uses all these changes" out > >> > of you. > >> ... > > So, to summarize: the refactored functionality is both used and tested > > and I think it's sensible to defer the integration of git-fsck(1) and > > git-refs(1). > > After refactoring, existing functionality about objects are used, of > course, (there is no other code that does so), the refactoring lets > the code to learn to perform checks on references, and these new > checks are exercised by "git refs verify". > > I took what shejialuo said that way, and that is fine by me when I > said the above ;-). So I think we all are on the same page? Yup, we are. I was mostly aiming to reassure Jialuo, who was a bit uncertain whether his answers had been sufficient or not. Thanks! Patrick
diff --git a/builtin/fsck.c b/builtin/fsck.c index 766bbd014d..b6ac878270 100644 --- a/builtin/fsck.c +++ b/builtin/fsck.c @@ -899,6 +899,21 @@ static int check_pack_rev_indexes(struct repository *r, int show_progress) return res; } +static void fsck_refs(void) +{ + struct child_process refs_verify = CHILD_PROCESS_INIT; + child_process_init(&refs_verify); + refs_verify.git_cmd = 1; + strvec_pushl(&refs_verify.args, "refs", "verify", NULL); + if (verbose) + strvec_push(&refs_verify.args, "--verbose"); + if (check_strict) + strvec_push(&refs_verify.args, "--strict"); + + if (run_command(&refs_verify)) + errors_found |= ERROR_REFS; +} + static char const * const fsck_usage[] = { N_("git fsck [--tags] [--root] [--unreachable] [--cache] [--no-reflogs]\n" " [--[no-]full] [--strict] [--verbose] [--lost-found]\n" @@ -1068,6 +1083,8 @@ int cmd_fsck(int argc, const char **argv, const char *prefix) check_connectivity(); + fsck_refs(); + if (the_repository->settings.core_commit_graph) { struct child_process commit_graph_verify = CHILD_PROCESS_INIT;
Introduce a new function "fsck_refs" that initializes and runs a child process to execute the "git-refs verify" command. Mentored-by: Patrick Steinhardt <ps@pks.im> Mentored-by: Karthik Nayak <karthik.188@gmail.com> Signed-off-by: shejialuo <shejialuo@gmail.com> --- builtin/fsck.c | 17 +++++++++++++++++ 1 file changed, 17 insertions(+)