Message ID | Z5r6-52eBgT4TUYG@ArchLinux (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | add more ref consistency checks | expand |
shejialuo <shejialuo@gmail.com> writes: > It might seems that the method one is much easier than method two. > However, method one has a significant drawback. When we have checked the > file mode using "lstat", we will need to read the file content, there is > a possibility that when finishing reading the file content to the > memory, the file could be changed into a symlink and we cannot notice. To me, the above sounds like saying: The user can run 'git refs verify' and it may declare that refs are all good, and then somebody else can come in and turn the packed-refs file into a bad one, but the user will not notice the mischeif until the check is run the next time. It is just the time that somebody else comes in becomes a bit earlier than the time the 'git refs verify' command finishes, and there is no fundamental difference. > With method two, we could get the "fd" firstly. Even if the file is > changed into a symlink, we could still operate the "fd" in the memory > which is consistent across the checking which avoids race condition. The end result is the same with the lstat(2) approach, isn't it, though?. 'git refs verify' may say "I opened the file without following symlink and checked the contents, which turned out to be perfectly fine". But because that somebody else came in just after the command did nofollow-open and swapped the packed-refs file, the repository has a packed-refs file that is not a regular file after the command returns success. So I am not sure if I am following your argument to favor the latter over the former. What am I missing? As long as both approaches are equally portable, I do not think it matters which one we pick from correctness point of view, and we can pick the one that is easier to use to implement the feature. On a platform without O_NOFOLLOW, open_nofollow() falls back to the lstat and open, so your "open_nofollow() is better than lstat() and open()" argument does not portably work, though. > Reuse "FSCK_MSG_BAD_REF_FILETYPE" fsck message id to report the error to > the user if "packed-refs" is not a regular file. Good. Say "regular file" on the commit title, too, and it would be perfect. > -static int packed_fsck(struct ref_store *ref_store UNUSED, > - struct fsck_options *o UNUSED, > +static int packed_fsck(struct ref_store *ref_store, > + struct fsck_options *o, > struct worktree *wt) > { > + struct packed_ref_store *refs = packed_downcast(ref_store, > + REF_STORE_READ, "fsck"); > + int ret = 0; > + int fd; > > if (!is_main_worktree(wt)) > - return 0; > + goto cleanup; > > - return 0; > + if (o->verbose) > + fprintf_ln(stderr, "Checking packed-refs file %s", refs->path); > + > + fd = open_nofollow(refs->path, O_RDONLY); > + if (fd < 0) { > + /* > + * If the packed-refs file doesn't exist, there's nothing > + * to check. > + */ > + if (errno == ENOENT) > + goto cleanup; > + > + if (errno == ELOOP) { > + struct fsck_ref_report report = { 0 }; > + report.path = "packed-refs"; > + ret = fsck_report_ref(o, &report, > + FSCK_MSG_BAD_REF_FILETYPE, > + "not a regular file"); > + goto cleanup; > + } > + > + ret = error_errno(_("unable to open %s"), refs->path); > + goto cleanup; > + } > + > +cleanup: > + return ret; > } Looking good. > diff --git a/t/t0602-reffiles-fsck.sh b/t/t0602-reffiles-fsck.sh > index cf7a202d0d..42c8d4ca1e 100755 > --- a/t/t0602-reffiles-fsck.sh > +++ b/t/t0602-reffiles-fsck.sh > @@ -617,4 +617,26 @@ test_expect_success 'ref content checks should work with worktrees' ' > ) > ' > > +test_expect_success SYMLINKS 'the filetype of packed-refs should be checked' ' > + test_when_finished "rm -rf repo" && > + git init repo && > + ( > + cd repo && > + test_commit default && > + git branch branch-1 && > + git branch branch-2 && > + git branch branch-3 && > + git pack-refs --all && > + > + mv .git/packed-refs .git/packed-refs-back && > + ln -sf packed-refs-bak .git/packed-refs && > + test_must_fail git refs verify 2>err && > + cat >expect <<-EOF && > + error: packed-refs: badRefFiletype: not a regular file > + EOF > + rm .git/packed-refs && > + test_cmp expect err > + ) > +' > + > test_done OK. I notice that the previous step did not have any new test associated with it. Perhaps we can corrupt "HEAD" *and* replace packed-refs file with a symbolic link (or do some other damage to the refs) and make sure both breakages are reported? It does not have to be done in this step, and certainly not as a part of this single test this step adds, but we'd want it tested somewhere. Thanks.
On Thu, Jan 30, 2025 at 10:23:15AM -0800, Junio C Hamano wrote: > shejialuo <shejialuo@gmail.com> writes: > > > It might seems that the method one is much easier than method two. > > However, method one has a significant drawback. When we have checked the > > file mode using "lstat", we will need to read the file content, there is > > a possibility that when finishing reading the file content to the > > memory, the file could be changed into a symlink and we cannot notice. > > To me, the above sounds like saying: > > The user can run 'git refs verify' and it may declare that refs > are all good, and then somebody else can come in and turn the > packed-refs file into a bad one, but the user will not notice > the mischeif until the check is run the next time. > Yes, it is. > It is just the time that somebody else comes in becomes a bit > earlier than the time the 'git refs verify' command finishes, and > there is no fundamental difference. > > > With method two, we could get the "fd" firstly. Even if the file is > > changed into a symlink, we could still operate the "fd" in the memory > > which is consistent across the checking which avoids race condition. > > The end result is the same with the lstat(2) approach, isn't it, > though?. 'git refs verify' may say "I opened the file without > following symlink and checked the contents, which turned out to be > perfectly fine". But because that somebody else came in just after > the command did nofollow-open and swapped the packed-refs file, the > repository has a packed-refs file that is not a regular file after > the command returns success. So I am not sure if I am following > your argument to favor the latter over the former. What am I > missing? > Let me give you some background. In the version 1, I used the following way: ```c lstat(...) if (!IS_REG(...)) report_error(...); strbuf_read(...) ``` Patrick has told me that there is a possibility that between the `IS_REG` and `strbuf_read`, the "packed-refs" could be converted into a symlink. So, my idea is that we could use `open_nofollow`, when we have got the file descriptor, no matter what happens to `packed-refs` file (deleted or changed into a symlink), we could operate the file descriptor and read its content. However, on a platform with O_NOFOLLOW, this situation will also happen. So, I think we may just use "open_nofollow" now and don't talk about the method one at all to avoid confusing readers. > As long as both approaches are equally portable, I do not think it > matters which one we pick from correctness point of view, and we can > pick the one that is easier to use to implement the feature. > > On a platform without O_NOFOLLOW, open_nofollow() falls back to the > lstat and open, so your "open_nofollow() is better than lstat() and > open()" argument does not portably work, though. > Yes, actually in my first implementation, I didn't notice this. But the CI told me that and I finally chose "open_nofollow". > > Reuse "FSCK_MSG_BAD_REF_FILETYPE" fsck message id to report the error to > > the user if "packed-refs" is not a regular file. > > Good. Say "regular file" on the commit title, too, and it would be > perfect. > Let me improve this in the next version. > > diff --git a/t/t0602-reffiles-fsck.sh b/t/t0602-reffiles-fsck.sh > > index cf7a202d0d..42c8d4ca1e 100755 > > --- a/t/t0602-reffiles-fsck.sh > > +++ b/t/t0602-reffiles-fsck.sh > > @@ -617,4 +617,26 @@ test_expect_success 'ref content checks should work with worktrees' ' > > ) > > ' > > > > +test_expect_success SYMLINKS 'the filetype of packed-refs should be checked' ' > > + test_when_finished "rm -rf repo" && > > + git init repo && > > + ( > > + cd repo && > > + test_commit default && > > + git branch branch-1 && > > + git branch branch-2 && > > + git branch branch-3 && > > + git pack-refs --all && > > + > > + mv .git/packed-refs .git/packed-refs-back && > > + ln -sf packed-refs-bak .git/packed-refs && > > + test_must_fail git refs verify 2>err && > > + cat >expect <<-EOF && > > + error: packed-refs: badRefFiletype: not a regular file > > + EOF > > + rm .git/packed-refs && > > + test_cmp expect err > > + ) > > +' > > + > > test_done > > OK. I notice that the previous step did not have any new test > associated with it. Perhaps we can corrupt "HEAD" *and* replace > packed-refs file with a symbolic link (or do some other damage > to the refs) and make sure both breakages are reported? > As I have said in the previous comment, we cannot detect the error if "HEAD" itself is corrupted. However, we will check the referent in the later. So, we don't need to do this. > It does not have to be done in this step, and certainly not as a > part of this single test this step adds, but we'd want it tested > somewhere. > If we need to check the referent of the "HEAD" in the "packed-refs". We could do this in the later test. I could cover this in [PATCH 6/8]. Thanks, Jialuo
shejialuo <shejialuo@gmail.com> writes: > However, on a platform with O_NOFOLLOW, this situation will also happen. > So, I think we may just use "open_nofollow" now and don't talk about the > method one at all to avoid confusing readers. Exactly. That is what you see below ;-) >> As long as both approaches are equally portable, I do not think it >> matters which one we pick from correctness point of view, and we can >> pick the one that is easier to use to implement the feature. >> >> On a platform without O_NOFOLLOW, open_nofollow() falls back to the >> lstat and open, so your "open_nofollow() is better than lstat() and >> open()" argument does not portably work, though. >> ... >> OK. I notice that the previous step did not have any new test >> associated with it. Perhaps we can corrupt "HEAD" *and* replace >> packed-refs file with a symbolic link (or do some other damage >> to the refs) and make sure both breakages are reported? > > As I have said in the previous comment, we cannot detect the error if > "HEAD" itself is corrupted. However, we will check the referent in the > later. So, we don't need to do this. I still think you absolutely need to diagnose and tell the user about the broken HEAD. With your "don't check HEAD because a repository with a broken HEAD is not a repository", a check run in such a place may find everything else in the repository perfectly fine, but because the user wanted "git refs verify" to tell them about breakages, you would want to somehow tell them about it. Either it is missing, malformed, whatever.
On Fri, Jan 31, 2025 at 08:20:36AM -0800, Junio C Hamano wrote: > shejialuo <shejialuo@gmail.com> writes: > > > > > As I have said in the previous comment, we cannot detect the error if > > "HEAD" itself is corrupted. However, we will check the referent in the > > later. So, we don't need to do this. > > I still think you absolutely need to diagnose and tell the user > about the broken HEAD. With your "don't check HEAD because a > repository with a broken HEAD is not a repository", a check run in > such a place may find everything else in the repository perfectly > fine, but because the user wanted "git refs verify" to tell them > about breakages, you would want to somehow tell them about it. > Either it is missing, malformed, whatever. Yes, that's absolutely correct. However, I don't want to do this in this series. Actually, there is no check for root ref. I will add checks for root refs later. Thanks, Jialuo
On Thu, Jan 30, 2025 at 12:07:23PM +0800, shejialuo wrote: > Although "git-fsck(1)" and "packed-backend.c" will check some > consistency and correctness of "packed-refs" file, they never check the > filetype of the "packed-refs". The user should always use "git > packed-refs" command to create the raw regular "packed-refs" file, so we It's `git pack-refs`, not `git packed-refs`. Otherwise I'm not going to comment on the rest of the commit, as Junio has already sufficiently discussed it with you, and I very much agree with his assessment that we don't need to discuss whether or not to use `open_nofollow()` in this depth. Patrick
shejialuo <shejialuo@gmail.com> writes: > On Fri, Jan 31, 2025 at 08:20:36AM -0800, Junio C Hamano wrote: >> shejialuo <shejialuo@gmail.com> writes: >> >> > >> > As I have said in the previous comment, we cannot detect the error if >> > "HEAD" itself is corrupted. However, we will check the referent in the >> > later. So, we don't need to do this. >> >> I still think you absolutely need to diagnose and tell the user >> about the broken HEAD. With your "don't check HEAD because a >> repository with a broken HEAD is not a repository", a check run in >> such a place may find everything else in the repository perfectly >> fine, but because the user wanted "git refs verify" to tell them >> about breakages, you would want to somehow tell them about it. >> Either it is missing, malformed, whatever. > > Yes, that's absolutely correct. However, I don't want to do this in > this series. Actually, there is no check for root ref. I will add checks > for root refs later. Another thing I just thought of is that what is your plans for repository discovery when HEAD is iffy. In the working tree of our project, you go to a subdirectory, say "t/", and then corrupt the HEAD, would "git refs verify" still recognise that ../.git/ is the "repository" the user is interested in, but it has a broken HEAD? setup.c:is_git_directory() would say "no", so I am not sure the discovery would work without changing that, and I am not sure if it is worth doing (i.e. when the user knows the repository's HEAD is broken, it is OK to disable discovery and force them to say GIT_DIR=/this/directory).
On Mon, Feb 03, 2025 at 12:15:39PM -0800, Junio C Hamano wrote: > shejialuo <shejialuo@gmail.com> writes: > > > On Fri, Jan 31, 2025 at 08:20:36AM -0800, Junio C Hamano wrote: > >> shejialuo <shejialuo@gmail.com> writes: > >> > >> > > >> > As I have said in the previous comment, we cannot detect the error if > >> > "HEAD" itself is corrupted. However, we will check the referent in the > >> > later. So, we don't need to do this. > >> > >> I still think you absolutely need to diagnose and tell the user > >> about the broken HEAD. With your "don't check HEAD because a > >> repository with a broken HEAD is not a repository", a check run in > >> such a place may find everything else in the repository perfectly > >> fine, but because the user wanted "git refs verify" to tell them > >> about breakages, you would want to somehow tell them about it. > >> Either it is missing, malformed, whatever. > > > > Yes, that's absolutely correct. However, I don't want to do this in > > this series. Actually, there is no check for root ref. I will add checks > > for root refs later. > > Another thing I just thought of is that what is your plans for > repository discovery when HEAD is iffy. In the working tree of our > project, you go to a subdirectory, say "t/", and then corrupt the > HEAD, would "git refs verify" still recognise that ../.git/ is the > "repository" the user is interested in, but it has a broken HEAD? > > setup.c:is_git_directory() would say "no", so I am not sure the > discovery would work without changing that, and I am not sure if it > is worth doing (i.e. when the user knows the repository's HEAD is > broken, it is OK to disable discovery and force them to say > GIT_DIR=/this/directory). I have to say I am not so familiar with the "setup.c" code. Thanks for the direction here, I will dive into to figure out a solution. Thanks, Jialuo
diff --git a/refs/packed-backend.c b/refs/packed-backend.c index a7b6f74b6e..6401cecd5f 100644 --- a/refs/packed-backend.c +++ b/refs/packed-backend.c @@ -4,6 +4,7 @@ #include "../git-compat-util.h" #include "../config.h" #include "../dir.h" +#include "../fsck.h" #include "../gettext.h" #include "../hash.h" #include "../hex.h" @@ -1748,15 +1749,45 @@ static struct ref_iterator *packed_reflog_iterator_begin(struct ref_store *ref_s return empty_ref_iterator_begin(); } -static int packed_fsck(struct ref_store *ref_store UNUSED, - struct fsck_options *o UNUSED, +static int packed_fsck(struct ref_store *ref_store, + struct fsck_options *o, struct worktree *wt) { + struct packed_ref_store *refs = packed_downcast(ref_store, + REF_STORE_READ, "fsck"); + int ret = 0; + int fd; if (!is_main_worktree(wt)) - return 0; + goto cleanup; - return 0; + if (o->verbose) + fprintf_ln(stderr, "Checking packed-refs file %s", refs->path); + + fd = open_nofollow(refs->path, O_RDONLY); + if (fd < 0) { + /* + * If the packed-refs file doesn't exist, there's nothing + * to check. + */ + if (errno == ENOENT) + goto cleanup; + + if (errno == ELOOP) { + struct fsck_ref_report report = { 0 }; + report.path = "packed-refs"; + ret = fsck_report_ref(o, &report, + FSCK_MSG_BAD_REF_FILETYPE, + "not a regular file"); + goto cleanup; + } + + ret = error_errno(_("unable to open %s"), refs->path); + goto cleanup; + } + +cleanup: + return ret; } struct ref_storage_be refs_be_packed = { diff --git a/t/t0602-reffiles-fsck.sh b/t/t0602-reffiles-fsck.sh index cf7a202d0d..42c8d4ca1e 100755 --- a/t/t0602-reffiles-fsck.sh +++ b/t/t0602-reffiles-fsck.sh @@ -617,4 +617,26 @@ test_expect_success 'ref content checks should work with worktrees' ' ) ' +test_expect_success SYMLINKS 'the filetype of packed-refs should be checked' ' + test_when_finished "rm -rf repo" && + git init repo && + ( + cd repo && + test_commit default && + git branch branch-1 && + git branch branch-2 && + git branch branch-3 && + git pack-refs --all && + + mv .git/packed-refs .git/packed-refs-back && + ln -sf packed-refs-bak .git/packed-refs && + test_must_fail git refs verify 2>err && + cat >expect <<-EOF && + error: packed-refs: badRefFiletype: not a regular file + EOF + rm .git/packed-refs && + test_cmp expect err + ) +' + test_done
Although "git-fsck(1)" and "packed-backend.c" will check some consistency and correctness of "packed-refs" file, they never check the filetype of the "packed-refs". The user should always use "git packed-refs" command to create the raw regular "packed-refs" file, so we need to explicitly check this in "git refs verify". We could use the following two ways to check whether the "packed-refs" is regular: 1. We could use "lstat" system call to check the file mode. 2. We could use "open_nofollow" wrapper to open the raw "packed-refs" file If the returned fd value is less than 0, we could check whether the "errno" is "ELOOP" to report an error to the user. It might seems that the method one is much easier than method two. However, method one has a significant drawback. When we have checked the file mode using "lstat", we will need to read the file content, there is a possibility that when finishing reading the file content to the memory, the file could be changed into a symlink and we cannot notice. With method two, we could get the "fd" firstly. Even if the file is changed into a symlink, we could still operate the "fd" in the memory which is consistent across the checking which avoids race condition. Reuse "FSCK_MSG_BAD_REF_FILETYPE" fsck message id to report the error to the user if "packed-refs" is not a regular file. Mentored-by: Patrick Steinhardt <ps@pks.im> Mentored-by: Karthik Nayak <karthik.188@gmail.com> Signed-off-by: shejialuo <shejialuo@gmail.com> --- refs/packed-backend.c | 39 +++++++++++++++++++++++++++++++++++---- t/t0602-reffiles-fsck.sh | 22 ++++++++++++++++++++++ 2 files changed, 57 insertions(+), 4 deletions(-)