mbox series

[0/5] refs: remove functions without ref store

Message ID cover.1714717057.git.ps@pks.im (mailing list archive)
Headers show
Series refs: remove functions without ref store | expand

Message

Patrick Steinhardt May 3, 2024, 6:27 a.m. UTC
Hi,

this patch series aims to convert the ref subsystem to rely less on
`the_repository`. The focus of it is to remove those cases where we have
two variants of the same function: one with a `struct ref_store`, and
one without. There are still other cases in "refs.c" where we implicitly
assume `the_repository`, but those require a bit more thought and will
thus be handled in a subsequent patch series.

The biggest part of this patch is a set of new Coccinelle rules added by
patch 3. Those rules are applied in patch 4 and the now-unused functions
that do not take a `struct ref_store` are then removed in patch 5. This
of course results in quite a lot of churn, but given that it is fully
automated via Coccinelle I don't think it is particularly bad.

It is quite likely that this patch series will impact in-flight patch
series. I'd be quite happy to drop the last patch that removes the old
interfaces to make this a bit less painful.

Patrick

Patrick Steinhardt (5):
  refs: introduce missing functions that accept a `struct ref_store`
  refs: add `exclude_patterns` parameter to `for_each_fullref_in()`
  cocci: introduce rules to transform "refs" to pass ref store
  cocci: apply rules to rewrite callers of "refs" interfaces
  refs: remove functions without ref store

 add-interactive.c             |  17 ++-
 bisect.c                      |  25 +++--
 blame.c                       |   4 +-
 branch.c                      |   5 +-
 builtin/am.c                  |  38 ++++---
 builtin/bisect.c              |  44 +++++---
 builtin/blame.c               |   4 +-
 builtin/branch.c              |  49 +++++----
 builtin/checkout.c            |  35 +++---
 builtin/clone.c               |  36 +++---
 builtin/describe.c            |   3 +-
 builtin/fast-import.c         |  11 +-
 builtin/fetch.c               |  20 +++-
 builtin/fsck.c                |  11 +-
 builtin/gc.c                  |   3 +-
 builtin/log.c                 |   6 +-
 builtin/merge.c               |  34 ++++--
 builtin/name-rev.c            |   5 +-
 builtin/notes.c               |  26 +++--
 builtin/pack-objects.c        |  10 +-
 builtin/pull.c                |   2 +-
 builtin/rebase.c              |  18 +--
 builtin/receive-pack.c        |  15 ++-
 builtin/reflog.c              |  25 +++--
 builtin/remote.c              |  37 ++++---
 builtin/repack.c              |   7 +-
 builtin/replace.c             |   9 +-
 builtin/reset.c               |  13 ++-
 builtin/rev-parse.c           |  25 +++--
 builtin/show-branch.c         |  22 ++--
 builtin/show-ref.c            |  19 +++-
 builtin/stash.c               |  23 ++--
 builtin/submodule--helper.c   |   7 +-
 builtin/symbolic-ref.c        |  13 ++-
 builtin/tag.c                 |  11 +-
 builtin/update-index.c        |   2 +-
 builtin/update-ref.c          |  21 ++--
 builtin/worktree.c            |  19 ++--
 bundle-uri.c                  |  12 +-
 bundle.c                      |   2 +-
 commit-graph.c                |   3 +-
 commit.c                      |   3 +-
 config.c                      |   3 +-
 contrib/coccinelle/refs.cocci | 103 +++++++++++++++++
 delta-islands.c               |   3 +-
 fetch-pack.c                  |   6 +-
 fmt-merge-msg.c               |   4 +-
 help.c                        |   5 +-
 http-backend.c                |  13 ++-
 log-tree.c                    |   9 +-
 ls-refs.c                     |  10 +-
 midx-write.c                  |   3 +-
 negotiator/default.c          |   3 +-
 negotiator/skipping.c         |   3 +-
 notes-cache.c                 |   6 +-
 notes-merge.c                 |   2 +-
 notes-utils.c                 |   7 +-
 notes.c                       |   5 +-
 reachable.c                   |   5 +-
 ref-filter.c                  |  35 ++++--
 reflog-walk.c                 |  27 +++--
 reflog.c                      |  20 ++--
 refs.c                        | 200 ++++------------------------------
 refs.h                        |  84 +++-----------
 remote.c                      |  38 ++++---
 reset.c                       |  29 +++--
 revision.c                    |  27 +++--
 sequencer.c                   |  61 ++++++-----
 server-info.c                 |   3 +-
 setup.c                       |   2 +-
 shallow.c                     |  16 ++-
 submodule.c                   |   6 +-
 transport-helper.c            |  29 +++--
 transport.c                   |  16 ++-
 upload-pack.c                 |  20 ++--
 walker.c                      |   6 +-
 wt-status.c                   |  22 ++--
 77 files changed, 845 insertions(+), 680 deletions(-)
 create mode 100644 contrib/coccinelle/refs.cocci

Comments

Junio C Hamano May 3, 2024, 5:24 p.m. UTC | #1
Patrick Steinhardt <ps@pks.im> writes:

> It is quite likely that this patch series will impact in-flight patch
> series. I'd be quite happy to drop the last patch that removes the old
> interfaces to make this a bit less painful.

The last step could replace these deprecated-to-be-removed functions
with a stub that BUG()s out [*], with a comment to instruct how a
caller can be rewritten to use the corresponding refs_ variant with
a call to get_main_ref_store(the_repository) as the first parameter,
which would help out of tree and in-flight series to migrate.

[Footnote]

 * The exact mechanism to cause an attempted use of an old function
   fail is immaterial.  We can remove the definition of these
   functions while retaining the old implementation as comments, or
   wrap them in an #ifdef USE_REF_STORE_LESS_FUNCTIONS .. #endif
   pair _without_ defining USE_REF_STORE_LESS_FUNCTIONS, purely for
   the documentation value to help us migration.
Jeff King May 3, 2024, 5:35 p.m. UTC | #2
On Fri, May 03, 2024 at 10:24:00AM -0700, Junio C Hamano wrote:

> Patrick Steinhardt <ps@pks.im> writes:
> 
> > It is quite likely that this patch series will impact in-flight patch
> > series. I'd be quite happy to drop the last patch that removes the old
> > interfaces to make this a bit less painful.
> 
> The last step could replace these deprecated-to-be-removed functions
> with a stub that BUG()s out [*], with a comment to instruct how a
> caller can be rewritten to use the corresponding refs_ variant with
> a call to get_main_ref_store(the_repository) as the first parameter,
> which would help out of tree and in-flight series to migrate.
> 
> [Footnote]
> 
>  * The exact mechanism to cause an attempted use of an old function
>    fail is immaterial.  We can remove the definition of these
>    functions while retaining the old implementation as comments, or
>    wrap them in an #ifdef USE_REF_STORE_LESS_FUNCTIONS .. #endif
>    pair _without_ defining USE_REF_STORE_LESS_FUNCTIONS, purely for
>    the documentation value to help us migration.

I prefer strict removal, as then the problem is caught by the compiler,
rather than runtime/tests. The error message does not point the user in
the right direction, but IMHO that is trumped by finding it sooner in
the edit-compile-test cycle.

Though maybe an even more radical proposal: now that read_ref_full(),
etc, are gone, and we have only refs_read_ref_full(), could/should we
shorten the latter to drop the "refs_" prefix? Its original purpose of
distinguishing the "takes a ref store vs using the_repository" variants
is now done, and shorter names are less annoying. But:
 
  - maybe there is value in having ref-related functions namespaced? We
    certainly don't cover all ref functions here, though, and aside from
    tight OO-ish APIs (e.g. strbuf) we don't usually do so at all.

  - the error message for in-flight callers of the "old" names will be
    even more confusing (it will not be "foo() does not exist" but
    rather "you did not pass enough arguments to foo()").

-Peff
Junio C Hamano May 3, 2024, 6:24 p.m. UTC | #3
Jeff King <peff@peff.net> writes:

> Though maybe an even more radical proposal: now that read_ref_full(),
> etc, are gone, and we have only refs_read_ref_full(), could/should we
> shorten the latter to drop the "refs_" prefix?

I view it as a good longer-term goal.  But I also view it as an
orthogonal issue to the transition.

We need a smooth migration path for remaining callers of these older
functions.  We could do the USE_THE_INDEX_MACROS like compatibility
layer during transition period.

Thanks.
Taylor Blau May 3, 2024, 6:58 p.m. UTC | #4
On Fri, May 03, 2024 at 01:35:53PM -0400, Jeff King wrote:
> Though maybe an even more radical proposal: now that read_ref_full(),
> etc, are gone, and we have only refs_read_ref_full(), could/should we
> shorten the latter to drop the "refs_" prefix? Its original purpose of
> distinguishing the "takes a ref store vs using the_repository" variants
> is now done, and shorter names are less annoying. But:
>
>   - maybe there is value in having ref-related functions namespaced? We
>     certainly don't cover all ref functions here, though, and aside from
>     tight OO-ish APIs (e.g. strbuf) we don't usually do so at all.
>
>   - the error message for in-flight callers of the "old" names will be
>     even more confusing (it will not be "foo() does not exist" but
>     rather "you did not pass enough arguments to foo()").

I actually thought something like the approach we take in banned.h might
be helpful, e.g.:

    #define REFS_DEPRECATE(func) use_refs_##func##_instead

    #define read_ref(refname, oid) REFS_DEPRECATE(read_ref)

Then, we could add a bunch of these to the top of refs.h, which would
give semi-helpful compiler messages to those whose series are affected
by this change.

But TBH I think this is probably overkill and anybody who encounters an
issue like this likely does not need the extra hand.

Thanks,
Taylor
Junio C Hamano May 3, 2024, 7:35 p.m. UTC | #5
Taylor Blau <me@ttaylorr.com> writes:

> But TBH I think this is probably overkill and anybody who encounters an
> issue like this likely does not need the extra hand.

As long as the replacement is clearly obvious.  Is it always the
rule that "when the function F you tried to call is missing, just
prefix F with refs_ and add the first parameter that is a ref_store,
i.e., refs_f(get_main_ref_store(the_repository), ...)"?  Even if
that were the case, some in-header comment would help the affected
topic owners than having nothing at all.
Patrick Steinhardt May 6, 2024, 6:44 a.m. UTC | #6
On Fri, May 03, 2024 at 11:24:11AM -0700, Junio C Hamano wrote:
> Jeff King <peff@peff.net> writes:
> 
> > Though maybe an even more radical proposal: now that read_ref_full(),
> > etc, are gone, and we have only refs_read_ref_full(), could/should we
> > shorten the latter to drop the "refs_" prefix?
> 
> I view it as a good longer-term goal.  But I also view it as an
> orthogonal issue to the transition.

Personally, I'd prefer to keep the `refs_` prefix. This may be personal
preference, but I find it way easier to reason about code when there are
prefixes for our functions that clearly indicate the subsystem they
belong to.

It's also in line with how other subsystems behave. Everything relating
to strbufs has a `strbuf_` prefix, attr-related code has `attr_` or
`git_attr_`, mem-pool has `mem_pool_`. So ref-related code having a
`ref_` prefix just feels natural to me.

> We need a smooth migration path for remaining callers of these older
> functions.  We could do the USE_THE_INDEX_MACROS like compatibility
> layer during transition period.

Wouldn't this be overengineered? We already have all the required
functions. So my take would be to drop the last patch for now, wait a
release cycle, and then remove it in the next one.

The only problem is that this allows in-flight patch series to introduce
_new_ callers. So we're basically working against a moving target in
that case. That is something that would be addressed by having something
like your proposed `USE_THE_INDEX_MACROS` macro. But honestly, I doubt
that it would be faster for any author of a patch series to figure out
that they now need to a define something compared to just adding the
`refs_` prefix to their functions.

Patrick
Junio C Hamano May 6, 2024, 4:14 p.m. UTC | #7
Patrick Steinhardt <ps@pks.im> writes:

> It's also in line with how other subsystems behave. Everything relating
> to strbufs has a `strbuf_` prefix, attr-related code has `attr_` or
> `git_attr_`, mem-pool has `mem_pool_`. So ref-related code having a
> `ref_` prefix just feels natural to me.

OK.  This is a tangent but there are a few functions with strbuf_
prefix whose sole association with the strbuf is that it is the type
that happens to be used to store the operand of the operation (it is
like naming getenv() as str_getenv() or something silly like that).
We should rename them to lose their strbuf_ prefix in the longer
term.

> ... But honestly, I doubt
> that it would be faster for any author of a patch series to figure out
> that they now need to a define something compared to just adding the
> `refs_` prefix to their functions.

The authors would not bother figuring that out while your series is
not yet in 'master'.  The alternative they have is to base their
series on top of yours.  You may have "what changes are needed on
the callers side" in your head, but they don't.

Somebody brought up the approach used in <banned.h> to move the
problem to link time, but in the context the only message we are
giving is "it is banned - do not use it", which is sufficient over
there, but probably not in this context.  "it was removed - use this
instead by adding this suffix and add that as the first parameter"
is the message I want whoever needs to deal with the fallout to see.
Patrick Steinhardt May 7, 2024, 5:56 a.m. UTC | #8
On Mon, May 06, 2024 at 09:14:55AM -0700, Junio C Hamano wrote:
> Patrick Steinhardt <ps@pks.im> writes:
> > ... But honestly, I doubt
> > that it would be faster for any author of a patch series to figure out
> > that they now need to a define something compared to just adding the
> > `refs_` prefix to their functions.
> 
> The authors would not bother figuring that out while your series is
> not yet in 'master'.  The alternative they have is to base their
> series on top of yours.  You may have "what changes are needed on
> the callers side" in your head, but they don't.

Yeah, that is fair indeed. Theoretically one could re-run Coccinelle
whenever merging a new topic. But that would of course put the burden on
you, which is something we definitely want to avoid.

> Somebody brought up the approach used in <banned.h> to move the
> problem to link time, but in the context the only message we are
> giving is "it is banned - do not use it", which is sufficient over
> there, but probably not in this context.  "it was removed - use this
> instead by adding this suffix and add that as the first parameter"
> is the message I want whoever needs to deal with the fallout to see.

Unfortunately we cannot quite get there at compile time because it is
not possible to expand a macro into an error macro with an arbitrary
message. The best I could come up with is the following:

    #define REPLACED_REFS_FUNC(func) func ## was_replaced_by_refs_### func

Which results in compiler errors like this:

    bisect.c:712:6: error: use of undeclared identifier 'read_ref_was_replaced_by_refs_read_ref'
      712 |         if (read_ref("BISECT_EXPECTED_REV", &expected_oid))

What is still missing is the bit of informatino that you need to pass in
`get_main_ref_store()`. But maybe this is good enough?

Patrick
Junio C Hamano May 7, 2024, 6:20 a.m. UTC | #9
Patrick Steinhardt <ps@pks.im> writes:

> message. The best I could come up with is the following:
>
>     #define REPLACED_REFS_FUNC(func) func ## was_replaced_by_refs_### func
>
> Which results in compiler errors like this:
>
>     bisect.c:712:6: error: use of undeclared identifier 'read_ref_was_replaced_by_refs_read_ref'
>       712 |         if (read_ref("BISECT_EXPECTED_REV", &expected_oid))
>
> What is still missing is the bit of informatino that you need to pass in
> `get_main_ref_store()`. But maybe this is good enough?

What I had in mind was a lot more stupid like the attached.  For
illustration purposes, I just did only one, but you got the idea.
Thanks to "#if 0", the compilation will fail, the compiler telling
the developer "resolve_ref_unsafe()? what are you talking about?",
and the developer will grep for that name to find the hint at the
end.

 refs.h | 14 +++++++++++---
 1 file changed, 11 insertions(+), 3 deletions(-)

diff --git c/refs.h w/refs.h
index d278775e08..a7e1c261ac 100644
--- c/refs.h
+++ w/refs.h
@@ -72,9 +72,6 @@ const char *refs_resolve_ref_unsafe(struct ref_store *refs,
 				    struct object_id *oid,
 				    int *flags);
 
-const char *resolve_ref_unsafe(const char *refname, int resolve_flags,
-			       struct object_id *oid, int *flags);
-
 char *refs_resolve_refdup(struct ref_store *refs,
 			  const char *refname, int resolve_flags,
 			  struct object_id *oid, int *flags);
@@ -1054,4 +1051,15 @@ void update_ref_namespace(enum ref_namespace namespace, char *ref);
 int is_pseudoref(struct ref_store *refs, const char *refname);
 int is_headref(struct ref_store *refs, const char *refname);
 
+/* The following are removed - rewrite your callers */
+#if 0
+static inline const char *resolve_ref_unsafe(
+	const char *refname, int resolve_flags,
+	struct object_id *oid, int *flags)
+{
+	return refs_resolve_ref_unsafe(get_main_ref_store(the_repository),
+				       refname, resolve_flags, oid, flags);
+}
+
+#endif
 #endif /* REFS_H */
Patrick Steinhardt May 7, 2024, 6:30 a.m. UTC | #10
On Mon, May 06, 2024 at 11:20:05PM -0700, Junio C Hamano wrote:
> Patrick Steinhardt <ps@pks.im> writes:
> 
> > message. The best I could come up with is the following:
> >
> >     #define REPLACED_REFS_FUNC(func) func ## was_replaced_by_refs_### func
> >
> > Which results in compiler errors like this:
> >
> >     bisect.c:712:6: error: use of undeclared identifier 'read_ref_was_replaced_by_refs_read_ref'
> >       712 |         if (read_ref("BISECT_EXPECTED_REV", &expected_oid))
> >
> > What is still missing is the bit of informatino that you need to pass in
> > `get_main_ref_store()`. But maybe this is good enough?
> 
> What I had in mind was a lot more stupid like the attached.  For
> illustration purposes, I just did only one, but you got the idea.
> Thanks to "#if 0", the compilation will fail, the compiler telling
> the developer "resolve_ref_unsafe()? what are you talking about?",
> and the developer will grep for that name to find the hint at the
> end.

That works for me, too. Thanks!

Patrick
Junio C Hamano May 7, 2024, 3:46 p.m. UTC | #11
Patrick Steinhardt <ps@pks.im> writes:

>> What I had in mind was a lot more stupid like the attached.  For
>> illustration purposes, I just did only one, but you got the idea.
>> Thanks to "#if 0", the compilation will fail, the compiler telling
>> the developer "resolve_ref_unsafe()? what are you talking about?",
>> and the developer will grep for that name to find the hint at the
>> end.
>
> That works for me, too. Thanks!

And if I replace 0 with USE_IMPLICIT_MAIN_REFSTORE (which is never
defined anywhere, and only serves for documentation purposes with
its rather explicit name), you get an exact implementation of what I
meant by "an approach similar to USE_THE_INDEX_MACROS".

Having said all that, as I expect that the topic itself will go
through smoothly down to 'master' (once a topic hits 'next', unless
it still has dubious parts, it usually takes 7 calendar days before
it goes to 'master') because the other parts of the series are fairly
straight-forward, we do not need anything more than just removing them,
without the helpful documentation in this case, especially given that
the adjustment other people need to do is very mechanical (and the
recipe is already in the coccinelle step).  They can notice that
their new call to resolve_ref_unsafe() does not compile, grep for
resolve_ref_unsafe and find nothing, and then they'll know to use
"log -Sresolve_ref_unsafe" to find what series removed it, what
happened to its old callers, and then adjust their calls.

Thanks.
Jeff King May 9, 2024, 4:55 p.m. UTC | #12
On Mon, May 06, 2024 at 08:44:45AM +0200, Patrick Steinhardt wrote:

> On Fri, May 03, 2024 at 11:24:11AM -0700, Junio C Hamano wrote:
> > Jeff King <peff@peff.net> writes:
> > 
> > > Though maybe an even more radical proposal: now that read_ref_full(),
> > > etc, are gone, and we have only refs_read_ref_full(), could/should we
> > > shorten the latter to drop the "refs_" prefix?
> > 
> > I view it as a good longer-term goal.  But I also view it as an
> > orthogonal issue to the transition.
> 
> Personally, I'd prefer to keep the `refs_` prefix. This may be personal
> preference, but I find it way easier to reason about code when there are
> prefixes for our functions that clearly indicate the subsystem they
> belong to.
> 
> It's also in line with how other subsystems behave. Everything relating
> to strbufs has a `strbuf_` prefix, attr-related code has `attr_` or
> `git_attr_`, mem-pool has `mem_pool_`. So ref-related code having a
> `ref_` prefix just feels natural to me.

I'd find that more compelling if all of the ref-related code had such a
prefix. But try reading refs.h sometime. ;)

That said, if we want to move in that direction I am OK with it.

-Peff
Patrick Steinhardt May 10, 2024, 5:54 a.m. UTC | #13
On Thu, May 09, 2024 at 12:55:57PM -0400, Jeff King wrote:
> On Mon, May 06, 2024 at 08:44:45AM +0200, Patrick Steinhardt wrote:
> 
> > On Fri, May 03, 2024 at 11:24:11AM -0700, Junio C Hamano wrote:
> > > Jeff King <peff@peff.net> writes:
> > > 
> > > > Though maybe an even more radical proposal: now that read_ref_full(),
> > > > etc, are gone, and we have only refs_read_ref_full(), could/should we
> > > > shorten the latter to drop the "refs_" prefix?
> > > 
> > > I view it as a good longer-term goal.  But I also view it as an
> > > orthogonal issue to the transition.
> > 
> > Personally, I'd prefer to keep the `refs_` prefix. This may be personal
> > preference, but I find it way easier to reason about code when there are
> > prefixes for our functions that clearly indicate the subsystem they
> > belong to.
> > 
> > It's also in line with how other subsystems behave. Everything relating
> > to strbufs has a `strbuf_` prefix, attr-related code has `attr_` or
> > `git_attr_`, mem-pool has `mem_pool_`. So ref-related code having a
> > `ref_` prefix just feels natural to me.
> 
> I'd find that more compelling if all of the ref-related code had such a
> prefix. But try reading refs.h sometime. ;)
> 
> That said, if we want to move in that direction I am OK with it.

Oh yeah, it's still quite a mixed bag. I will follow up this patch
series to get rid of `the_repository` in "refs.c" completely. While at
it I'll adapt some of the functions to gain the `refs_` prefix.

Patrick