Message ID | pull.673.v3.git.1594925141.gitgitgadget@gmail.com (mailing list archive) |
---|---|
Headers | show |
Series | Remove special casing for PSEUDOREF updates | expand |
"Han-Wen Nienhuys via GitGitGadget" <gitgitgadget@gmail.com> writes: > This gets rid of the special casing code for pseudorefs in refs.c > > This is in preparation for reftable. > > v3 > > * tweak git-update-ref.txt description for logAllRefUpdates > > Han-Wen Nienhuys (3): > t1400: use git rev-parse for testing PSEUDOREF existence > Modify pseudo refs through ref backend storage > Make HEAD a PSEUDOREF rather than PER_WORKTREE. I reviewed some codepaths that deal with FETCH_HEAD recently. As the file is quite different from all the other pseudo references in that it needs to store more than one object name and in that each ref in it needs more than just the object name, I doubt that it makes much sense to enhance the refs API so that its requirements can be covered. It would probably make sense for FETCH_HEAD to stay "a text file directly underneath $GIT_DIR", but get_oid() needs to be able to read it and return the first object name recorded in the file.
On Fri, Jul 17, 2020 at 12:10 AM Junio C Hamano <gitster@pobox.com> wrote: > I reviewed some codepaths that deal with FETCH_HEAD recently. > > As the file is quite different from all the other pseudo references > in that it needs to store more than one object name and in that each > ref in it needs more than just the object name, I doubt that it > makes much sense to enhance the refs API so that its requirements > can be covered. I agree. Do we ever pretend that FETCH_HEAD is a ref today?
Han-Wen Nienhuys <hanwen@google.com> writes: > On Fri, Jul 17, 2020 at 12:10 AM Junio C Hamano <gitster@pobox.com> wrote: >> I reviewed some codepaths that deal with FETCH_HEAD recently. >> >> As the file is quite different from all the other pseudo references >> in that it needs to store more than one object name and in that each >> ref in it needs more than just the object name, I doubt that it >> makes much sense to enhance the refs API so that its requirements >> can be covered. > > I agree. Do we ever pretend that FETCH_HEAD is a ref today? "git rev-parse FETCH_HEAD", "git show FETCH_HEAD" etc. all should keep working, so in that sense, it is treated as a ref. It does not protect the history leading to the objects listed in it from being collected, though. "git merge FETCH_HEAD" is an interesting case---I haven't thought it through. Jung fubhyq unccra nsgre "tvg chyy bevtva sbb one" nggrzcgf gb teno gjb oenapurf naq znxr na bpgbchf zretr vagb gur oenapu pheeragyl purpxrq bhg, naq gura "tvg erfrg --uneq && tvg zretr SRGPU_URNQ" vf tvira?
On Mon, Jul 27, 2020 at 6:20 PM Junio C Hamano <gitster@pobox.com> wrote: > > On Fri, Jul 17, 2020 at 12:10 AM Junio C Hamano <gitster@pobox.com> wrote: > >> I reviewed some codepaths that deal with FETCH_HEAD recently. > >> > >> As the file is quite different from all the other pseudo references > >> in that it needs to store more than one object name and in that each > >> ref in it needs more than just the object name, I doubt that it > >> makes much sense to enhance the refs API so that its requirements > >> can be covered. > > > > I agree. Do we ever pretend that FETCH_HEAD is a ref today? > > "git rev-parse FETCH_HEAD", "git show FETCH_HEAD" etc. all should keep > working, so in that sense, it is treated as a ref. I added this to the last version of the full reftable patch series that I posted, as patches "Split off reading loose ref data in separate function" and "Read FETCH_HEAD as loose ref". Which other refs that aren't really refs should also be supported? The JGit source code suggests that MERGE_HEAD should also be special cased? It's not pretty, though. The entry point is read_raw_ref(), which gets a ref_store as argument. ref_store doesn't have generic API to get at the repository directory, so the implementation of reading the FETCH_HEAD file has to be pushed down to the ref backend (which does know the directory). > It does not > protect the history leading to the objects listed in it from being > collected, though. Is there documented agreement of how pseudo refs and GC should interact? > "git merge FETCH_HEAD" is an interesting case---I haven't thought it > through. > > What should happen after "git pull origin foo bar" attempts to grab > two branches and make an octopus merge into the branch currently > checked out, and then "git reset --hard && git merge FETCH_HEAD" is > given? I don't understand this question.
Han-Wen Nienhuys <hanwen@google.com> writes: >> >> As the file is quite different from all the other pseudo references >> >> in that it needs to store more than one object name and in that each >> >> ref in it needs more than just the object name, I doubt that it >> >> makes much sense to enhance the refs API so that its requirements >> >> can be covered. >> > >> > I agree. Do we ever pretend that FETCH_HEAD is a ref today? >> >> "git rev-parse FETCH_HEAD", "git show FETCH_HEAD" etc. all should keep >> working, so in that sense, it is treated as a ref. > > I added this to the last version of the full reftable patch series > that I posted, as patches > "Split off reading loose ref data in separate function" and "Read > FETCH_HEAD as loose ref". > > Which other refs that aren't really refs should also be supported? The > JGit source code suggests that MERGE_HEAD should also be special > cased? I'd think all .git/${SOMETHING}_HEAD are of transitory nature that can be left as simple on-disk files that are read (and preferrably written---except for FETCH_HEAD for obvious reasons) as if they are loose refs handled by files backend. It probably makes sense not to even write reflog entries for them---it is not like the MERGE_HEAD I see now in .git/ directory is an updated version of MERGE_HEAD I had there yesterday. "git log -g MERGE_HEAD" gives no interesting information. >> "git merge FETCH_HEAD" is an interesting case---I haven't thought it >> through. >> >> What should happen after "git pull origin foo bar" attempts to grab >> two branches and make an octopus merge into the branch currently >> checked out, and then "git reset --hard && git merge FETCH_HEAD" is >> given? > > I don't understand this question. The request "git pull origin foo bar" is "grab the tip of 'foo' and 'bar' branches from remote whose name is 'origin'; merge these two commits into the commit that I have checked out". To fulfill the request, "git pull" runs "git fetch", and the latter leaves two lines of interest in .git/FETCH_HEAD file. Each line lists the name of the object at the ref, an optional "not-for-merge" token (which in this case does not exist, as both records are for merging), and piece of human-readable text to describe where that object came from to help later step that computes the default message for the resulting merge commit. If the octopus merge does not finish correctly (e.g. due to conflicts), with "git reset --hard", we can recover to the original state and re-attempt the opeation with "git merge FETCH_HEAD". Such a merge using FETCH_HEAD will produce an octopus merge. Which means that at least "git merge", FETCH_HEAD is not just a regular ref where you can ask what object it points at and it gives you a single object name back. But to other commands like "git log master..FETCH_HEAD", it acts as if there is only one object recorded. That makes it an interesting case. Making it act as if "git log ^master origin/foo origin/bar" were given might be a "bugfix" to make it behave more "correctly", but I do not know how large a fallout such a change brings in.
Junio C Hamano <gitster@pobox.com> writes: >> Which other refs that aren't really refs should also be supported? The >> JGit source code suggests that MERGE_HEAD should also be special >> cased? > > I'd think all .git/${SOMETHING}_HEAD are of transitory nature that > can be left as simple on-disk files that are read (and preferrably > written---except for FETCH_HEAD for obvious reasons) as if they are > loose refs handled by files backend. Sorry for flipping and flopping. The above goes directly against the spirit of 09743417 (Modify pseudo refs through ref backend storage, 2020-07-27). I still think .git/${SOMETHING}_HEAD except for FETCH_HEAD should be written and read via the ref subsystem, but I was wrong to say that it should always be done via the files backend. There is no reason to insist on the use of files backend here. > It probably makes sense not to > even write reflog entries for them---it is not like the MERGE_HEAD > I see now in .git/ directory is an updated version of MERGE_HEAD I > had there yesterday. "git log -g MERGE_HEAD" gives no interesting > information. This still is true, but that is pretty much orthogonal to which backend is used. > If the octopus merge does not finish correctly (e.g. due to > conflicts), with "git reset --hard", we can recover to the original > state and re-attempt the opeation with "git merge FETCH_HEAD". Such > a merge using FETCH_HEAD will produce an octopus merge. > > Which means that at least "git merge", FETCH_HEAD is not just a > regular ref where you can ask what object it points at and it gives > you a single object name back. > > But to other commands like "git log master..FETCH_HEAD", it acts as > if there is only one object recorded. All of which means FETCH_HEAD is special and we may not want to burden the special casing of it to newer backends.
On Wed, Aug 5, 2020 at 3:45 AM Junio C Hamano <gitster@pobox.com> wrote: > > I'd think all .git/${SOMETHING}_HEAD are of transitory nature that > > can be left as simple on-disk files that are read (and preferrably > > written---except for FETCH_HEAD for obvious reasons) as if they are > > loose refs handled by files backend. > > Sorry for flipping and flopping. The above goes directly against > the spirit of 09743417 (Modify pseudo refs through ref backend > storage, 2020-07-27). I still think .git/${SOMETHING}_HEAD except > for FETCH_HEAD should be written and read via the ref subsystem, but > I was wrong to say that it should always be done via the files > backend. There is no reason to insist on the use of files backend > here. Yes, that confused me. Thanks for setting that straight. > > If the octopus merge does not finish correctly (e.g. due to > > conflicts), with "git reset --hard", we can recover to the original > > state and re-attempt the opeation with "git merge FETCH_HEAD". Such > > a merge using FETCH_HEAD will produce an octopus merge. > > > > Which means that at least "git merge", FETCH_HEAD is not just a > > regular ref where you can ask what object it points at and it gives > > you a single object name back. > > > > But to other commands like "git log master..FETCH_HEAD", it acts as > > if there is only one object recorded. > > All of which means FETCH_HEAD is special and we may not want to > burden the special casing of it to newer backends. Can you confirm that FETCH_HEAD is the only thing that can store more than just a symref / SHA1 ? Based on the name, and a comment in the JGit source, I thought that MERGE_HEAD might contain more than one SHA1 at a time.
Han-Wen Nienhuys <hanwen@google.com> writes: >> All of which means FETCH_HEAD is special and we may not want to >> burden the special casing of it to newer backends. > > Can you confirm that FETCH_HEAD is the only thing that can store more > than just a symref / SHA1 ? Based on the name, and a comment in the > JGit source, I thought that MERGE_HEAD might contain more than one > SHA1 at a time. No I cannot. I do not think MERGE_HEAD is something I added with as deep a thought as I did with FETCH_HEAD. If it mimics FETCH_HEAD, then perhaps it does, but I somehow doubt it. As can be seen in builtin/merge.c::collect_parents(), we do special case FETCH_HEAD when grabbing what commit*S* to merge, but all others are read with get_merge_parent() that makes a single call to get_oid(), i.e. anything other than FETCH_HEAD cannot have more than one object recorded.
On Wed, Aug 5, 2020 at 5:54 PM Junio C Hamano <gitster@pobox.com> wrote: > >> All of which means FETCH_HEAD is special and we may not want to > >> burden the special casing of it to newer backends. > > > > Can you confirm that FETCH_HEAD is the only thing that can store more > > than just a symref / SHA1 ? Based on the name, and a comment in the > > JGit source, I thought that MERGE_HEAD might contain more than one > > SHA1 at a time. > > No I cannot. I do not think MERGE_HEAD is something I added with as > deep a thought as I did with FETCH_HEAD. If it mimics FETCH_HEAD, > then perhaps it does, but I somehow doubt it. > > As can be seen in builtin/merge.c::collect_parents(), we do special > case FETCH_HEAD when grabbing what commit*S* to merge, but all > others are read with get_merge_parent() that makes a single call to > get_oid(), i.e. anything other than FETCH_HEAD cannot have more than > one object recorded. Understood. Is there anything you'd like me to do with this patch series so it can be merged? Dealing with FETCH_HEAD generically isn't possible unless we extend the API of the ref backend: the generic ref_store instance doesn't offer a way to get at the path that corresponds to FETCH_HEAD, so we can't handle it in refs_read_raw_ref(). In the current reftable series, FETCH_HEAD is dealt with in the backends separately.
Han-Wen Nienhuys <hanwen@google.com> writes: > Dealing with FETCH_HEAD generically isn't possible unless we extend > the API of the ref backend: the generic ref_store instance doesn't > offer a way to get at the path that corresponds to FETCH_HEAD, so we > can't handle it in refs_read_raw_ref(). In the current reftable > series, FETCH_HEAD is dealt with in the backends separately. I am not sure what the best way would be, but I do not think any existing code writes into it using the refs API at all, even though it may be read only for the first object name using the refs API. And I am not sure if we want to extend the write side API so that the callers can express the full flexibility of that single file. So perhaps the best way forward would be to ensure that anybody who tries to read from FETCH_HEAD using the ref API reads the first object name in it from $GIT_DIR/FETCH_HEAD file as we've always done since the beginning of time, regardless of what ref backend is used, that anybody who tries to write FETCH_HEAD using the ref API gets an error, and letting the writers write into it directly?
On Mon, Aug 10, 2020 at 6:04 PM Junio C Hamano <gitster@pobox.com> wrote: > > Han-Wen Nienhuys <hanwen@google.com> writes: > > > Dealing with FETCH_HEAD generically isn't possible unless we extend > > the API of the ref backend: the generic ref_store instance doesn't > > offer a way to get at the path that corresponds to FETCH_HEAD, so we > > can't handle it in refs_read_raw_ref(). In the current reftable > > series, FETCH_HEAD is dealt with in the backends separately. > > I am not sure what the best way would be, but I do not think any > existing code writes into it using the refs API at all, even though > it may be read only for the first object name using the refs API. > > And I am not sure if we want to extend the write side API so that > the callers can express the full flexibility of that single file. That's not what I am getting at. I am just interested in how to handle FETCH_HEAD in refs_read_raw_ref. > So perhaps the best way forward would be to ensure that anybody who > tries to read from FETCH_HEAD using the ref API reads the first > object name in it from $GIT_DIR/FETCH_HEAD file as we've always done > since the beginning of time, regardless of what ref backend is used, Right, but how do we get at the value of $GIT_DIR given a struct ref_store? We can either push that out to the ref store backend, because each backend knows what $GIT_DIR is, or we can make $GIT_DIR a property of the generic ref_store. I suppose it's cleaner to make the latter API extension.
Han-Wen Nienhuys <hanwen@google.com> writes: > On Mon, Aug 10, 2020 at 6:04 PM Junio C Hamano <gitster@pobox.com> wrote: >> >> Han-Wen Nienhuys <hanwen@google.com> writes: >> >> > Dealing with FETCH_HEAD generically isn't possible unless we extend >> > the API of the ref backend: the generic ref_store instance doesn't >> > offer a way to get at the path that corresponds to FETCH_HEAD, so we >> > can't handle it in refs_read_raw_ref(). In the current reftable >> > series, FETCH_HEAD is dealt with in the backends separately. >> >> I am not sure what the best way would be, but I do not think any >> existing code writes into it using the refs API at all, even though >> it may be read only for the first object name using the refs API. >> >> And I am not sure if we want to extend the write side API so that >> the callers can express the full flexibility of that single file. > > That's not what I am getting at. I am just interested in how to handle > FETCH_HEAD in refs_read_raw_ref. > >> So perhaps the best way forward would be to ensure that anybody who >> tries to read from FETCH_HEAD using the ref API reads the first >> object name in it from $GIT_DIR/FETCH_HEAD file as we've always done >> since the beginning of time, regardless of what ref backend is used, > > Right, but how do we get at the value of $GIT_DIR given a struct > ref_store? We can either push that out to the ref store backend, > because each backend knows what $GIT_DIR is, or we can make $GIT_DIR a > property of the generic ref_store. > > I suppose it's cleaner to make the latter API extension. Yup. Sounds like the right direction to go. Thanks.
On Wed, Aug 5, 2020 at 5:54 PM Junio C Hamano <gitster@pobox.com> wrote: > >> All of which means FETCH_HEAD is special and we may not want to > >> burden the special casing of it to newer backends. > > > > Can you confirm that FETCH_HEAD is the only thing that can store more > > than just a symref / SHA1 ? Based on the name, and a comment in the > > JGit source, I thought that MERGE_HEAD might contain more than one > > SHA1 at a time. > > No I cannot. I do not think MERGE_HEAD is something I added with as > deep a thought as I did with FETCH_HEAD. If it mimics FETCH_HEAD, > then perhaps it does, but I somehow doubt it. blame.c has an append_merge_parents() which reads multiple lines from MERGE_HEAD, and cmd_commit does so too. It's populated from write_merge_heads(). So the special casing should be extended to MERGE_HEAD.
Han-Wen Nienhuys <hanwen@google.com> writes: > blame.c has an append_merge_parents() which reads multiple lines from > MERGE_HEAD, and cmd_commit does so too. It's populated from > write_merge_heads(). So the special casing should be extended to > MERGE_HEAD. Ahh. That makes sense.
This gets rid of the special casing code for pseudorefs in refs.c This is in preparation for reftable. v3 * tweak git-update-ref.txt description for logAllRefUpdates Han-Wen Nienhuys (3): t1400: use git rev-parse for testing PSEUDOREF existence Modify pseudo refs through ref backend storage Make HEAD a PSEUDOREF rather than PER_WORKTREE. Documentation/git-update-ref.txt | 13 ++-- refs.c | 127 +++---------------------------- t/t1400-update-ref.sh | 30 ++++---- t/t1405-main-ref-store.sh | 5 +- 4 files changed, 36 insertions(+), 139 deletions(-) base-commit: b6a658bd00c9c29e07f833cabfc0ef12224e277a Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-673%2Fhanwen%2Fpseudoref-v3 Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-673/hanwen/pseudoref-v3 Pull-Request: https://github.com/gitgitgadget/git/pull/673 Range-diff vs v2: 1: 9c3dc4b2cb = 1: 28bd3534d0 t1400: use git rev-parse for testing PSEUDOREF existence 2: 871b411517 ! 2: 79cd5dd480 Modify pseudo refs through ref backend storage @@ Commit message Signed-off-by: Han-Wen Nienhuys <hanwen@google.com> + ## Documentation/git-update-ref.txt ## +@@ Documentation/git-update-ref.txt: still see a subset of the modifications. + + LOGGING UPDATES + --------------- +-If config parameter "core.logAllRefUpdates" is true and the ref is one under +-"refs/heads/", "refs/remotes/", "refs/notes/", or the symbolic ref HEAD; or +-the file "$GIT_DIR/logs/<ref>" exists then `git update-ref` will append +-a line to the log file "$GIT_DIR/logs/<ref>" (dereferencing all +-symbolic refs before creating the log name) describing the change +-in ref value. Log lines are formatted as: ++If config parameter "core.logAllRefUpdates" is true and the ref is one ++under "refs/heads/", "refs/remotes/", "refs/notes/", or a pseudoref ++like HEAD or ORIG_HEAD; or the file "$GIT_DIR/logs/<ref>" exists then ++`git update-ref` will append a line to the log file ++"$GIT_DIR/logs/<ref>" (dereferencing all symbolic refs before creating ++the log name) describing the change in ref value. Log lines are ++formatted as: + + oldsha1 SP newsha1 SP committer LF + + ## refs.c ## @@ refs.c: long get_files_ref_lock_timeout_ms(void) return timeout_ms; 3: 1c2b9d5f17 = 3: 3ab9f2f04e Make HEAD a PSEUDOREF rather than PER_WORKTREE.