Message ID | pull.1018.git.git.1620240022594.gitgitgadget@gmail.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | builtin/init-db: preemptively clear repo_fmt to avoid leaks | expand |
Hi Andrzej, On Wed, 5 May 2021, Andrzej Hunt via GitGitGadget wrote: > diff --git a/builtin/init-db.c b/builtin/init-db.c > index 31b718259a64..b25147ebaf59 100644 > --- a/builtin/init-db.c > +++ b/builtin/init-db.c > @@ -512,6 +512,7 @@ int init_db(const char *git_dir, const char *real_git_dir, > git_dir, len && git_dir[len-1] != '/' ? "/" : ""); > } > > + clear_repository_format(&repo_fmt); I am afraid that this might not be correct, as t0410.27 experienced a segmentation fault (see https://github.com/git/git/pull/1018/checks?check_run_id=2511749719#step:5:2845 for the full details): -- snip -- [...] + repack_and_check -a bd6a66906d425462f8fbf1e3a61cda48cc4e456c 3251618c5193c77083052fb7d6ed5e8b0b227a94 + rm -rf repo2 + cp -r repo repo2 + git -C repo2 repack -a -d warning: reflog of 'HEAD' references pruned commits warning: reflog of 'refs/heads/master' references pruned commits Segmentation fault (core dumped) error: last command exited with $?=139 not ok 27 - repack -d does not irreversibly delete promisor objects -- snap -- When I encountered similar problems with my patches in the past, I found the `--valgrind-only` option incredibly useful, i.e. something like cd t && sh t0410-*.sh -i -v -x --valgrind-only=27 Maybe you can give it a try? Ciao, Dscho > free(original_git_dir); > return 0; > } > > base-commit: c3901c8daf043cdc16ffb20d3a410f3a2d5494fd > -- > gitgitgadget > >
Hi Johannes, On 05/05/2021 21:28, Johannes Schindelin wrote: > Hi Andrzej, > > On Wed, 5 May 2021, Andrzej Hunt via GitGitGadget wrote: > >> diff --git a/builtin/init-db.c b/builtin/init-db.c >> index 31b718259a64..b25147ebaf59 100644 >> --- a/builtin/init-db.c >> +++ b/builtin/init-db.c >> @@ -512,6 +512,7 @@ int init_db(const char *git_dir, const char *real_git_dir, >> git_dir, len && git_dir[len-1] != '/' ? "/" : ""); >> } >> >> + clear_repository_format(&repo_fmt); > > I am afraid that this might not be correct, as t0410.27 experienced a > segmentation fault (see > https://github.com/git/git/pull/1018/checks?check_run_id=2511749719#step:5:2845 > for the full details): Thanks for spotting that. On further investigation this looks like a preexisting issue on seen (my github PR was based on seen - in hindsight that was probably not a good idea) - here's a CI run from seen this morning exhibiting the same failure: https://github.com/git/git/runs/2515095446?check_suite_focus=true To gain more confidence, I've rebased my patch onto next, and I no longer hit any CI failures: https://github.com/ahunt/git/runs/2515309312?check_suite_focus=true (I ran this on my fork because changing the base branch appears to have confused the PR's CI configuration.) Nevertheless, I was still curious about what might be causing the failure in the first place: it appears to only happen in CI with gcc on Linux, and I was not able to repro on my machine using (tested with gcc-7.5 and gcc-10, with seen from this morning): make CC=gcc-10 T=t0410-partial-clone.sh test I'm guessing that understanding this failure might require the help of someone with an Ubuntu install to debug (unless there's some easy way of using Github's Actions to run a bisect)? -- snip -- ATB, Andrzej
Hi Andrzej, On Thu, 6 May 2021, Andrzej Hunt wrote: > On 05/05/2021 21:28, Johannes Schindelin wrote: > > > On Wed, 5 May 2021, Andrzej Hunt via GitGitGadget wrote: > > > > > diff --git a/builtin/init-db.c b/builtin/init-db.c > > > index 31b718259a64..b25147ebaf59 100644 > > > --- a/builtin/init-db.c > > > +++ b/builtin/init-db.c > > > @@ -512,6 +512,7 @@ int init_db(const char *git_dir, const char > > > *real_git_dir, > > > git_dir, len && git_dir[len-1] != '/' ? "/" : ""); > > > } > > > > > > + clear_repository_format(&repo_fmt); > > > > I am afraid that this might not be correct, as t0410.27 experienced a > > segmentation fault (see > > https://github.com/git/git/pull/1018/checks?check_run_id=2511749719#step:5:2845 > > for the full details): > > > Thanks for spotting that. On further investigation this looks like a > preexisting issue on seen (my github PR was based on seen - in hindsight that > was probably not a good idea) - here's a CI run from seen this morning > exhibiting the same failure: > https://github.com/git/git/runs/2515095446?check_suite_focus=true > > To gain more confidence, I've rebased my patch onto next, and I no longer hit > any CI failures: > https://github.com/ahunt/git/runs/2515309312?check_suite_focus=true > (I ran this on my fork because changing the base branch appears to have > confused the PR's CI configuration.) Oh, I'm sorry! But it is great news that this is not _your_ patch's problem (I had a brief look over it and wondered why it would cause this segfault). > Nevertheless, I was still curious about what might be causing the failure in > the first place: it appears to only happen in CI with gcc on Linux, and I was > not able to repro on my machine using (tested with gcc-7.5 and gcc-10, with > seen from this morning): > make CC=gcc-10 T=t0410-partial-clone.sh test There are more settings used in the CI run, indeed. So yeah, it would be good to debug this in an environment where it does fail. > I'm guessing that understanding this failure might require the help of someone > with an Ubuntu install to debug (unless there's some easy way of using > Github's Actions to run a bisect)? As a matter of fact, there is a neat way in GitHub Actions to debug this: the `action-tmate` Action. It installs and starts a `tmate` instance, which is a `tmux` inside kind of a proxied SSH server. This is what I usually do when I have time to track down such problems: I edit `.github/workflows/main.yml` and push it into a branch in my fork. The edits are essentially: - I delete the `ci-config` job (because I know that I want to run this workflow and don't want to wait for that job to be queued and run) - Obviously, this requires all the `needs: ci-config` and `if: needs.ci-config.outputs.enabled == 'yes'` lines to be deleted, too. - I also delete the jobs, tasks, and matrix elements that are not necessary for my debugging session (in your case, that would be e.g. the `dockerized` job as well as the vectors in `regular` that are not `linux-gcc`). - Then comes the most important part: after the `ci/print-test-failures.sh` task, I insert this task: - name: action-tmate if: failure() uses: mxschmitt/action-tmate@v3 with: limit-access-to-actor: true The `limit-access-to-actor` thing only applies if you uploaded your public SSH key to your GitHub profile. Otherwise, anybody who monitors the run could see the command-line to connect to your debugging instance and interfere with you. For details, see https://mxschmitt.github.io/action-tmate/ Thanks, Dscho
diff --git a/builtin/init-db.c b/builtin/init-db.c index 31b718259a64..b25147ebaf59 100644 --- a/builtin/init-db.c +++ b/builtin/init-db.c @@ -512,6 +512,7 @@ int init_db(const char *git_dir, const char *real_git_dir, git_dir, len && git_dir[len-1] != '/' ? "/" : ""); } + clear_repository_format(&repo_fmt); free(original_git_dir); return 0; }