mbox series

[0/6] commit-graph: use 'struct object_directory *' everywhere

Message ID cover.1580424766.git.me@ttaylorr.com (mailing list archive)
Headers show
Series commit-graph: use 'struct object_directory *' everywhere | expand

Message

Taylor Blau Jan. 30, 2020, 11 p.m. UTC
Hi,

While working on deploying layered commit-graphs at GitHub, I noticed
that I was able to trigger the mechanism to prevent merging
cross-alternate commit-graph layers in funny ways, like claiming that
"./objects" and "objects" were in different directories.

This resulted in removing all of the places in the commit-graph.c code
that use 'char *object_dir's with 'struct object_directory *'s instead.
This allows us to replace brittle string-based path comparisons with raw
pointer comparisons to check whether two commit-graphs belong to the
same object store or not.

This has a pleasant side effect in PATCH 5/6 of fixing an uninitialized
read as described in [1].

This series became a little bit longer than I was expecting it to be, so
here is the high-level structure:

  - 1/6 fixes a bug in a test that would cause a subsequent failure if
    unaddressed.

  - 2/6 does the first half of the removal by using 'struct
    object_directory *'s within the 'commit_graph' structure.

  - 4/6 does the second half by removing 'char *object_dir' usage in the
    'write_commit_graph_context' structure.

  - 5/6 ties 2/6 and 4/6 together by removing all path normalization
    completely, fixing the uninitialized read bug.

  - And 6/6 cleans up.

We've been running a version of this series based on our fork (which is
in turn based on 2.24) for a few hours without issue.

Thanks in advance for your review.

[1]: https://lore.kernel.org/git/20191027042116.GA5801@sigill.intra.peff.net/

Taylor Blau (6):
  t5318: don't pass non-object directory to '--object-dir'
  commit-graph.h: store object directory in 'struct commit_graph'
  builtin/commit-graph.c: die() with unknown '--object-dir'
  commit-graph.h: store an odb in 'struct write_commit_graph_context'
  commit-graph.c: remove path normalization, comparison
  commit-graph.h: use odb in 'load_commit_graph_one_fd_st'

 Documentation/git-commit-graph.txt |   5 +-
 builtin/commit-graph.c             |  25 ++++--
 builtin/commit.c                   |   3 +-
 builtin/fetch.c                    |   2 +-
 builtin/gc.c                       |   2 +-
 commit-graph.c                     | 135 +++++++++++++++--------------
 commit-graph.h                     |  17 ++--
 t/helper/test-read-graph.c         |   8 +-
 t/t5318-commit-graph.sh            |   4 +-
 9 files changed, 114 insertions(+), 87 deletions(-)

--
2.25.0.dirty

Comments

Jeff King Jan. 31, 2020, 10:30 a.m. UTC | #1
On Thu, Jan 30, 2020 at 03:00:40PM -0800, Taylor Blau wrote:

> This series became a little bit longer than I was expecting it to be, so
> here is the high-level structure:
> 
>   - 1/6 fixes a bug in a test that would cause a subsequent failure if
>     unaddressed.
> 
>   - 2/6 does the first half of the removal by using 'struct
>     object_directory *'s within the 'commit_graph' structure.
> 
>   - 4/6 does the second half by removing 'char *object_dir' usage in the
>     'write_commit_graph_context' structure.
> 
>   - 5/6 ties 2/6 and 4/6 together by removing all path normalization
>     completely, fixing the uninitialized read bug.
> 
>   - And 6/6 cleans up.

With the exception of the patch-ordering discussion in the sub-thread
with Martin, this looks good to me.

Patch 3 is a change in user-visible behavior, as it restricts how
--object-dir can be used (it must be the main object-dir or an alternate
within the repository). I don't _think_ anybody would care, as the
semantics of those options seemed kind of ill-defined to me in the first
place. But it's worth calling out as a potential risk. I suppose the
alternative is to make a one-off fake "struct object_directory" within
the process that isn't connected to the repository. But if nobody cares,
I'd just as soon avoid that.

One other funny thing with this series: the Date headers of your patches
seem out of order. They ordering in your cover letter here is fine and
presumably reflects the commit topology:

> Taylor Blau (6):
>   t5318: don't pass non-object directory to '--object-dir'
>   commit-graph.h: store object directory in 'struct commit_graph'
>   builtin/commit-graph.c: die() with unknown '--object-dir'
>   commit-graph.h: store an odb in 'struct write_commit_graph_context'
>   commit-graph.c: remove path normalization, comparison
>   commit-graph.h: use odb in 'load_commit_graph_one_fd_st'

but the Date headers in order of 1-6 are:

  Date:   Thu, 30 Jan 2020 15:00:43 -0800
  Date:   Thu, 30 Jan 2020 15:00:50 -0800
  Date:   Thu, 30 Jan 2020 15:00:54 -0800
  Date:   Thu, 30 Jan 2020 15:00:52 -0800
  Date:   Thu, 30 Jan 2020 15:00:45 -0800
  Date:   Thu, 30 Jan 2020 15:00:47 -0800

It's like your sending script rewrites the Date header and puts a
2-second bump between each one (which is good, and what git-send-email
does), but got fed the patches in the wrong order (perhaps their
_original_ date order, if there was clean-up rebasing).

-Peff
Derrick Stolee Jan. 31, 2020, 1:22 p.m. UTC | #2
On 1/31/2020 5:30 AM, Jeff King wrote:
> On Thu, Jan 30, 2020 at 03:00:40PM -0800, Taylor Blau wrote:
> 
>> This series became a little bit longer than I was expecting it to be, so
>> here is the high-level structure:
>>
>>   - 1/6 fixes a bug in a test that would cause a subsequent failure if
>>     unaddressed.
>>
>>   - 2/6 does the first half of the removal by using 'struct
>>     object_directory *'s within the 'commit_graph' structure.
>>
>>   - 4/6 does the second half by removing 'char *object_dir' usage in the
>>     'write_commit_graph_context' structure.
>>
>>   - 5/6 ties 2/6 and 4/6 together by removing all path normalization
>>     completely, fixing the uninitialized read bug.
>>
>>   - And 6/6 cleans up.
> 
> With the exception of the patch-ordering discussion in the sub-thread
> with Martin, this looks good to me.

I agree. Martin's comment is a good one. I can't find anything else
to improve the series.

> Patch 3 is a change in user-visible behavior, as it restricts how
> --object-dir can be used (it must be the main object-dir or an alternate
> within the repository). I don't _think_ anybody would care, as the
> semantics of those options seemed kind of ill-defined to me in the first
> place. But it's worth calling out as a potential risk. I suppose the
> alternative is to make a one-off fake "struct object_directory" within
> the process that isn't connected to the repository. But if nobody cares,
> I'd just as soon avoid that.

I think that this change of behavior is fine, especially because if
someone writes a commit-graph to an --object-dir that is not an
alternate, then that repo will not discover the new commit-graph
anyway.

I do like that you state a possible work-around in case someone shows
up with a legitimate use case for a non-alternate object-dir.

Thanks,
-Stolee
Taylor Blau Feb. 3, 2020, 4:38 a.m. UTC | #3
On Fri, Jan 31, 2020 at 08:22:42AM -0500, Derrick Stolee wrote:
> On 1/31/2020 5:30 AM, Jeff King wrote:
> > On Thu, Jan 30, 2020 at 03:00:40PM -0800, Taylor Blau wrote:
> >
> >> This series became a little bit longer than I was expecting it to be, so
> >> here is the high-level structure:
> >>
> >>   - 1/6 fixes a bug in a test that would cause a subsequent failure if
> >>     unaddressed.
> >>
> >>   - 2/6 does the first half of the removal by using 'struct
> >>     object_directory *'s within the 'commit_graph' structure.
> >>
> >>   - 4/6 does the second half by removing 'char *object_dir' usage in the
> >>     'write_commit_graph_context' structure.
> >>
> >>   - 5/6 ties 2/6 and 4/6 together by removing all path normalization
> >>     completely, fixing the uninitialized read bug.
> >>
> >>   - And 6/6 cleans up.
> >
> > With the exception of the patch-ordering discussion in the sub-thread
> > with Martin, this looks good to me.
>
> I agree. Martin's comment is a good one. I can't find anything else
> to improve the series.

Thanks for your review!

> > Patch 3 is a change in user-visible behavior, as it restricts how
> > --object-dir can be used (it must be the main object-dir or an alternate
> > within the repository). I don't _think_ anybody would care, as the
> > semantics of those options seemed kind of ill-defined to me in the first
> > place. But it's worth calling out as a potential risk. I suppose the
> > alternative is to make a one-off fake "struct object_directory" within
> > the process that isn't connected to the repository. But if nobody cares,
> > I'd just as soon avoid that.
>
> I think that this change of behavior is fine, especially because if
> someone writes a commit-graph to an --object-dir that is not an
> alternate, then that repo will not discover the new commit-graph
> anyway.

And thanks for the ack. I would be somewhat surprised if someone were
really relying on this behavior in practice.

> I do like that you state a possible work-around in case someone shows
> up with a legitimate use case for a non-alternate object-dir.
>
> Thanks,
> -Stolee

Thanks,
Taylor