[RFC,00/19] Bring more repository handles into our code base

Message ID 20181011211754.31369-1-sbeller@google.com


Stefan Beller Oct. 11, 2018, 9:17 p.m. UTC
This applies on nd/the-index (b3c7eef9b05) and is the logical continuation of
the object store series, which I sent over the last year.

The previous series took a very slow and pedantic approach, using a
#define trick (see cfc62fc98c for details), but it turns out that it
doesn't work:
   When changing the signature of widely used functions, it burdens the
   maintainer in resolving the semantic conflicts.
   
   In the original approach this was called a feature, as then we can ensure
   that no bugs creep into the code base during the merge window (while such
   a refactoring series wanders from pu to master). It turns out this
   was not well received and was just burdensome.
   
   The #define trick doesn't buy us much to begin with when dealing with
   non-merge-conflicts.  For example, see deref_tag at tag.c:68, which got
   the repository argument in 286d258d4f (tag.c: allow deref_tag to handle
   arbitrary repositories, 2018-06-28) but lost its property of working on any
   repository while 8c4cc32689 (tag: don't warn if target is missing but
   promised, 2018-07-12) was in flight simultaneously.
   
   Another example of failure of this approach is seen in patch 5, which
   shows that the pedantry was missed.
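
For readers who have not looked at cfc62fc98c, here is a rough sketch of
the kind of macro the trick relies on, assuming it works by token pasting.
The names count_alternates and count_alternates_the_repository are
hypothetical and exist only for illustration; the struct is a stand-in
for the real struct repository.

```c
#include <stddef.h>

/* Stand-in for git's struct repository; only here to make this compile. */
struct repository { const char *gitdir; };

static struct repository main_repo = { ".git" };
struct repository *the_repository = &main_repo;

/* Hypothetical function that cannot handle other repositories yet. */
int count_alternates_the_repository(void)
{
	return the_repository->gitdir != NULL;
}

/*
 * Token pasting: count_alternates(the_repository) expands to
 * count_alternates_the_repository().  Any other argument, e.g.
 * count_alternates(r), expands to the undeclared count_alternates_r()
 * and fails to build, so a half-converted function cannot be called
 * with an arbitrary repository by accident.
 */
#define count_alternates(r) count_alternates_##r()
```

The check is purely textual: it only fires when the macro is invoked, and
it says nothing about what the function body does with the_repository.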
        
This series takes another approach as it doesn't change the signature of
functions, but introduces new functions that can deal with arbitrary 
repositories, keeping the old function signature around using a shallow wrapper.
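
A minimal sketch of that pattern, under stated assumptions: gitdir_len and
repo_gitdir_len are hypothetical names chosen for illustration and are not
functions touched by this series, and the struct is a stand-in for the
real struct repository.

```c
#include <string.h>

/* Stand-in for git's struct repository; only here to make this compile. */
struct repository { const char *gitdir; };

static struct repository main_repo = { ".git" };
struct repository *the_repository = &main_repo;

/* New function: works on whichever repository the caller passes in. */
size_t repo_gitdir_len(struct repository *r)
{
	return strlen(r->gitdir);
}

/*
 * Old signature kept around as a shallow wrapper, so existing callers
 * (including topics in flight) keep building without any change.
 */
size_t gitdir_len(void)
{
	return repo_gitdir_len(the_repository);
}
```

A caller that already holds a handle, say for a submodule, would call
repo_gitdir_len(&sub) directly, while unconverted callers keep using
gitdir_len().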

Additionally, each patch adds a semantic patch that would port from the old to
the new function. These semantic patches are all applied in the very last patch,
but we could omit applying the last patch if it causes too many merge conflicts
and trickle in the semantic patches over time when there are no merge conflicts.


The original goal of all these refactoring series was to remove add_submodule_odb
in submodule.c, which was partially reached with this series. I'll investigate the
remaining calls in another series, but it shows we're close to being done with these
large refactorings as far as I am concerned.

Thanks,
Stefan

Stefan Beller (19):
  sha1_file: allow read_object to read objects in arbitrary repositories
  packfile: allow has_packed_and_bad to handle arbitrary repositories
  object-store: allow read_object_file_extended to read from arbitrary
    repositories
  object-store: prepare read_object_file to deal with arbitrary
    repositories
  object: parse_object to honor its repository argument
  commit: allow parse_commit* to handle arbitrary repositories
  commit.c: allow paint_down_to_common to handle arbitrary repositories
  commit.c: allow merge_bases_many to handle arbitrary repositories
  commit.c: allow remove_redundant to handle arbitrary repositories
  commit: allow get_merge_bases_many_0 to handle arbitrary repositories
  commit: prepare get_merge_bases to handle arbitrary repositories
  commit: prepare get_commit_buffer to handle arbitrary repositories
  commit: prepare in_merge_bases[_many] to handle arbitrary repositories
  commit: prepare repo_unuse_commit_buffer to handle arbitrary
    repositories
  commit: prepare logmsg_reencode to handle arbitrary repositories
  pretty: prepare format_commit_message to handle arbitrary repositories
  submodule: use submodule repos for object lookup
  submodule: don't add submodule as odb for push
  Apply semantic patches from previous patches

 apply.c                                 |   6 +-
 archive.c                               |   5 +-
 bisect.c                                |   5 +-
 blame.c                                 |  15 +--
 builtin/am.c                            |   2 +-
 builtin/blame.c                         |   4 +-
 builtin/cat-file.c                      |  21 +++--
 builtin/checkout.c                      |   4 +-
 builtin/commit.c                        |  13 ++-
 builtin/describe.c                      |   4 +-
 builtin/difftool.c                      |   3 +-
 builtin/fast-export.c                   |   7 +-
 builtin/fmt-merge-msg.c                 |   8 +-
 builtin/grep.c                          |   2 +-
 builtin/index-pack.c                    |   8 +-
 builtin/log.c                           |   4 +-
 builtin/merge-base.c                    |   2 +-
 builtin/merge-tree.c                    |   9 +-
 builtin/mktag.c                         |   3 +-
 builtin/name-rev.c                      |   2 +-
 builtin/notes.c                         |  12 ++-
 builtin/pack-objects.c                  |  22 +++--
 builtin/reflog.c                        |   5 +-
 builtin/replace.c                       |   2 +-
 builtin/shortlog.c                      |   5 +-
 builtin/show-branch.c                   |   4 +-
 builtin/tag.c                           |   4 +-
 builtin/unpack-file.c                   |   2 +-
 builtin/unpack-objects.c                |   3 +-
 builtin/verify-commit.c                 |   2 +-
 bundle.c                                |   2 +-
 combine-diff.c                          |   2 +-
 commit-graph.c                          |   8 +-
 commit.c                                | 120 ++++++++++++++----------
 commit.h                                |  67 ++++++++++---
 config.c                                |   2 +-
 contrib/coccinelle/the_repository.cocci | 114 ++++++++++++++++++++++
 diff.c                                  |   3 +-
 dir.c                                   |   2 +-
 entry.c                                 |   3 +-
 fast-import.c                           |   7 +-
 fsck.c                                  |   9 +-
 grep.c                                  |   3 +-
 http-push.c                             |   3 +-
 log-tree.c                              |   3 +-
 mailmap.c                               |   2 +-
 match-trees.c                           |   4 +-
 merge-blobs.c                           |   6 +-
 merge-recursive.c                       |  13 +--
 negotiator/default.c                    |   6 +-
 negotiator/skipping.c                   |   2 +-
 notes-cache.c                           |   5 +-
 notes-merge.c                           |   4 +-
 notes-utils.c                           |   2 +-
 notes.c                                 |  10 +-
 object-store.h                          |  13 ++-
 object.c                                |   2 +-
 packfile.c                              |   5 +-
 packfile.h                              |   2 +-
 pretty.c                                |  33 ++++---
 pretty.h                                |   7 +-
 read-cache.c                            |   5 +-
 remote-testsvn.c                        |   4 +-
 remote.c                                |   2 +-
 rerere.c                                |   5 +-
 revision.c                              |  12 +--
 sequencer.c                             |  55 ++++++-----
 sha1-file.c                             |  22 +++--
 sha1-name.c                             |   9 +-
 shallow.c                               |   4 +-
 streaming.c                             |   2 +-
 submodule-config.c                      |   3 +-
 submodule.c                             |  51 +++++++---
 t/helper/test-revision-walking.c        |   3 +-
 tag.c                                   |   5 +-
 tree-walk.c                             |   6 +-
 tree.c                                  |   5 +-
 walker.c                                |   2 +-
 xdiff-interface.c                       |   2 +-
 79 files changed, 571 insertions(+), 278 deletions(-)
 create mode 100644 contrib/coccinelle/the_repository.cocci

Comments

Jonathan Tan Oct. 11, 2018, 11:07 p.m. UTC | #1
> This series takes another approach as it doesn't change the signature of
> functions, but introduces new functions that can deal with arbitrary 
> repositories, keeping the old function signature around using a shallow wrapper.
> 
> Additionally each patch adds a semantic patch, that would port from the old to
> the new function. These semantic patches are all applied in the very last patch,
> but we could omit applying the last patch if it causes too many merge conflicts
> and trickle in the semantic patches over time when there are no merge conflicts.

Thanks, this looks like a good plan.

One concern is that if we leave 2 versions of functions around, it will
be difficult to look at a function and see if it's truly
multi-repository-compatible (or making a call to a function that
internally uses the_repository, and is thus wrong). But with the plan
Stefan quoted [1], mentioned in commit e675765235 ("diff.c: remove
implicit dependency on the_index", 2018-09-21):

  The plan is these macros will always be defined for all library files
  and the macros are only accessible in builtin/

(The macros include NO_THE_REPOSITORY_COMPATIBILITY_MACROS, which
disables the single-repository function-like macros.) This mitigates the
concern somewhat.
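
As a sketch of how such a guard could look in a header (an assumption
about the shape, not a copy of the actual code; parse_thing and
repo_parse_thing are hypothetical names):

```c
/* Stand-in for git's struct repository; only here to make this compile. */
struct repository { int parsed; };

static struct repository main_repo;
struct repository *the_repository = &main_repo;

/* Multi-repository-aware function; always available. */
int repo_parse_thing(struct repository *r, int x)
{
	r->parsed += x;
	return r->parsed;
}

/*
 * Single-repository convenience macro.  A library file that defines
 * NO_THE_REPOSITORY_COMPATIBILITY_MACROS before including the header
 * does not get it, so any leftover parse_thing() call there fails to
 * build instead of silently using the_repository.
 */
#ifndef NO_THE_REPOSITORY_COMPATIBILITY_MACROS
#define parse_thing(x) repo_parse_thing(the_repository, x)
#endif
```

With the macro hidden from library code, a function there that compiles
can only reach the_repository by naming it explicitly, which is easier to
spot in review.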

[1] https://public-inbox.org/git/20181011211754.31369-1-sbeller@google.com/
Junio C Hamano Oct. 11, 2018, 11:31 p.m. UTC | #2
Stefan Beller <sbeller@google.com> writes:

> Additionally each patch adds a semantic patch, that would port from the old to
> the new function. These semantic patches are all applied in the very last patch,
> but we could omit applying the last patch if it causes too many merge conflicts
> and trickle in the semantic patches over time when there are no merge conflicts.

That's an interesting approach ;-)

> The original goal of all these refactoring series was to remove add_submodule_odb 
> in submodule.c, which was partially reached with this series.

Yup, that is a very good goalpost to keep in mind.

> remaining calls in another series, but it shows we're close to being done with these
> large refactorings as far as I am concerned.

Nice.
Jonathan Nieder Oct. 12, 2018, 6:50 p.m. UTC | #3
Hi,

Stefan Beller wrote:

> This applies on nd/the-index (b3c7eef9b05) and is the logical continuation of
> the object store series, which I sent over the last year.
>
> The previous series did take a very slow and pedantic approach,
> using a #define trick, see cfc62fc98c for details, but it turns out,
> that it doesn't work:

Thanks for the heads up --- this will remind me to review this new
series more carefully, since it differs from what was reviewed before.

I think this will be easiest to review with --function-context.  I can
generate that diff locally, so no need to resend.

>    When changing the signature of widely used functions, it burdens the
>    maintainer in resolving the semantic conflicts.
>
>    In the original approach this was called a feature, as then we can ensure
>    that no bugs creep into the code base during the merge window (while such
>    a refactoring series wanders from pu to master). It turns out this
>    was not well received and was just burdensome.

I don't agree with this characterization.

The question of who resolves conflicts is separate from the question
of whether conflicts appear, which is in turn separate from the
question of whether the build breaks.

I consider making the build break when a caller tries to use a
half-converted function too early to be a very useful feature.  There
is a way to do that in C++ that allows decoupled conversions, but the
C version forced an ordering of the conversions.  It seems that the
pain was caused by the combination of

 1. that coupling, which forced an ordering on the conversions and
    prevented us from ordering the patches in an order based on
    convenience of integration (unlike e.g. the "struct object_id"
    series which was able to proceed by taking a batch covering a
    quiet area of the tree at a time)

 2. as you mentioned, removal of old API at the same time as addition
    of new API forced callers across the tree to update at once

 3. the lack of having decided how to handle the anticipated churn

Now most of the conversions are done (thanks much for that) so the
ordering (1) is not the main remaining pain point.  Thanks for
tackling the other two in this series.

I want future API changes to be easier.  That means tackling the
following questions up front:

 i. Where does this fit in Rusty's API rating scheme
    <http://sweng.the-davies.net/Home/rustys-api-design-manifesto>?
    Does misuse (or misconverted callers) break the build, break
    visibly at runtime, or are the effects more subtle?

 ii. Is there good test coverage for the new API?  Are there tests
     that need to be migrated?

 iii. Is there a way to automatically migrate callers, or does this
      require manual, error-prone work?  (Thanks for tackling that in
      this one.)

 iv. How are we planning to handle multiple patches in flight?  Will
     the change produce merge conflicts?  How can others on the list
     help the maintainer with integrating this set of changes?

 v. Is the ending point cleaner than where we started?

The #define trick you're referring to was a way of addressing (i).

[...]
>  79 files changed, 571 insertions(+), 278 deletions(-)

Most of the increase is in the coccinelle file and in improved
documentation.

It appears that some patches use a the_index-style
NO_THE_REPOSITORY_COMPATIBILITY_MACROS backward compatibility synonym
and others don't.  Can you say a little more about this aspect of the
approach?  Would the compatibility macros go away eventually?

Thanks,
Jonathan
Stefan Beller Oct. 13, 2018, 12:30 a.m. UTC | #4
On Fri, Oct 12, 2018 at 11:50 AM Jonathan Nieder <jrnieder@gmail.com> wrote:

>
> It appears that some patches use a the_index-style
> NO_THE_REPOSITORY_COMPATIBILITY_MACROS backward compatibility synonym
> and others don't.  Can you say a little more about this aspect of the
> approach?  Would the compatibility macros go away eventually?

I use the macro only when not doing the whole conversion in the patch
(i.e. there is a coccinelle patch IFF there is the macro and vice versa).

It's quite frankly a judgement call what I would convert as a whole
and what not; it depends on the usage of the functions and on whether I
know of series in flight that use them. The full conversion is easy to
write if there are fewer than a handful of callers, so for the "small
case", I just did it, hoping it won't break other topics in flight.