diff mbox series

[RFC/PATCH] commit-graph: generation v5 (backward compatible date ceiling)

Message ID 86o8ziatb2.fsf_-_@gmail.com (mailing list archive)
State New, archived
Headers show
Series [RFC/PATCH] commit-graph: generation v5 (backward compatible date ceiling) | expand

Commit Message

Jakub Narębski Sept. 18, 2019, 8:43 a.m. UTC
Derrick Stolee <stolee@gmail.com> writes:
> On 6/25/2019 3:51 AM, Jakub Narebski wrote:
>> Jakub Narebski <jnareb@gmail.com> writes:
>>> Derrick Stolee <stolee@gmail.com> writes:
[...]
>>> O.K., so the "generation number v2 (legacy)" would be incremental and
>>> backward-compatibile in use (though not in generation and validation).
[...]
>>> Do you have benchmark for this "monotonically offset corrected commit
>>> date" generation number in https://github.com/derrickstolee/git/commits/reach-perf
>>> and https://github.com/derrickstolee/gen-test ?
>> 
>> I guess this will have to wait...
>
> I have not had time to revisit this topic and re-run performance
> numbers, sorry.

I have created pull requests against `reach-perf` branch of
derrickstolee/git repository[1], and companion pull request against
gen-test repository[2] with proposed prototype of backward-compatible
corrected commit date (with monotonic offsets).

Could you please run the tests for this generation number v5?  I was not
able to do so.  It was my first time trying to compile Git on MS
Windows, and while there were no problems compiling `master` (well,
except for compilation taking a long time), I was unable to do it for
`reach-perf` branch because of independent of change compilation errors.

It would be good to test also the legacy mode, i.e. old Git (using
generation number v0, i.e. topological levels (shortest path from node
to sink, plus one) with all backward-compatible generation numbers, or
at least generation number v5.


Do I understand it correctly that for the time being because of
backward-compatibility concerns instead of incrementing the version
number there would be used some kind of "metadata" chunk (at least until
version of Git that fails hard, instead of turning off commit-graph
feature softly, on unknown version numbers dies out).  Is there some
proposal how such chunk should look like?

[1]: https://github.com/derrickstolee/git/pull/10
[2]: https://github.com/derrickstolee/gen-test/pull/1

--- >8 ---- >8 ---- >8 ---- >8 ---- >8 ---- >8 ---- >8 ---- >8 ----
From: Jakub Narębski <jnareb@gmail.com>
Subject: [PATCH] commit-graph: generation v5 (backward compatible date ceiling)

This generation number option is backward-compatible (with
reachability index v0) version of the generation number v3, that is
the corrected commit date.  It takes the simple approach of faking
"commit dates" being in the right order by adding an offset to the
commit date (which is stored in the generation number column), so the
modified commit date is always at least one more than the maximum
modified commit date of all ancestors.  Additionally the offset itself
is adjusted in such way that the offset of a commit is always at least
one more than the offsets of all ancestors.

Two following conditions are satisfied on the offset of a commit C for
every parent P:

 1. committer_date(C) + offset(C) > committer_date(P) + offset(P)
 2. offset(C) > offset(P)

This reachability index is locally computable, which means it is
compatible with incremental commit graph feature.

It has the same problem as v3, namely that we have a "maximum clock
skew" based on the maximum generation number we can store, only more
so.

Note: it includes incidental whitespace cleanups for v3 code.

Signed-off-by: Jakub Narębski <jnareb@gmail.com>
---
Note that the diff looks a bit strange in the part adding the code for
the new compute_generation_numbers_5() subroutine, because it got
entangled in whitespace cleanup (which I shouldn't do, I'm sorry).

Relevant part of README.md from `gen-test` repository:

--------
**V5: Corrected Commit Date with Strictly Monotonic Offset.**
For a commit C, let its _offset commit date_ (denoted by odate(C))
be a commit date plus some offset, i.e. odate(C) = date(C) + offset(C),
such that:

1. Offset commit date is greater than the maximum of the commit date
   of C and the offset commit dates of its parents.

     If odate(A) < odate(B), then A cannot reach B.

2. Offset of a commit is one more than the maximum offset of a parent,
   or more

     If offset(A) < offset(B), then A cannot reach B.

This is backward-compatible version of V3: Corrected Commit Date.

### Comparing Reachability Index Versions Viability
[...]
| Index                     | Compatible? | Immutable? | Local? |
|---------------------------|-------------|------------|--------|
| Minimum Generation Number | Yes         | Yes        | Yes    |
| (Epoch, Date) pairs       | Yes         | Yes        | Yes    |
| Maximum Generation Number | Yes         | No         | No     |
| Corrected Commit Date     | No          | Yes        | Yes    |
| FELINE index              | Yes         | No         | No     |
| Offset Commit Date *NEW*  | Yes         | Yes        | Yes    |

_Note:_ The corrected commit date uses the generation number column
to store an offset of "how much do I need to add to my commit date
to get my corrected commit date?" The values stored in that column
are then not backwards-compatible.

_Note:_ The corrected commit date with strictly monotonic offset also
uses the generation number column to store the date offset, but the
offset alone can be used as generation number (as reachability index)
itself.


 commit-graph.c | 126 +++++++++++++++++++++++++++++++++++++------------
 1 file changed, 95 insertions(+), 31 deletions(-)

Comments

Derrick Stolee Sept. 18, 2019, 12:26 p.m. UTC | #1
On 9/18/2019 4:43 AM, Jakub Narebski wrote:
> Derrick Stolee <stolee@gmail.com> writes:
>> On 6/25/2019 3:51 AM, Jakub Narebski wrote:
>>> Jakub Narebski <jnareb@gmail.com> writes:
>>>> Derrick Stolee <stolee@gmail.com> writes:
> [...]
>>>> O.K., so the "generation number v2 (legacy)" would be incremental and
>>>> backward-compatibile in use (though not in generation and validation).
> [...]
>>>> Do you have benchmark for this "monotonically offset corrected commit
>>>> date" generation number in https://github.com/derrickstolee/git/commits/reach-perf
>>>> and https://github.com/derrickstolee/gen-test ?
>>>
>>> I guess this will have to wait...
>>
>> I have not had time to revisit this topic and re-run performance
>> numbers, sorry.
> 
> I have created pull requests against `reach-perf` branch of
> derrickstolee/git repository[1], and companion pull request against
> gen-test repository[2] with proposed prototype of backward-compatible
> corrected commit date (with monotonic offsets).
> 
> Could you please run the tests for this generation number v5?

I'll try to get to this, but it may take a few days.

> I was not
> able to do so.  It was my first time trying to compile Git on MS
> Windows, and while there were no problems compiling `master` (well,
> except for compilation taking a long time), I was unable to do it for
> `reach-perf` branch because of independent of change compilation errors.

I don't think I tested this on Windows. I do most of my performance tests
in Linux, especially when presenting numbers to the mailing list. The
gen-test repo has a bunch of shell scripts that I used for testing in
that environment.

Thanks,
-Stolee
diff mbox series

Patch

diff --git a/commit-graph.c b/commit-graph.c
index 91863d4895..633b4b24f8 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -99,10 +99,9 @@  int compare_generations(struct generation *a, struct generation *b)
 			return 1;
 		return 0;
 
-	case 3:
-		ta = a->date + a->value1;
-		tb = b->date + b->value1;
-
+	case 3: /* V3: Corrected Commit Date */
+	case 5: /* V5: Strictly Monotonic Corrected Commit Date */
+		/* handle special cases, i.e. commits outside commit graph */
 		if (a->value1 == GENERATION_NUMBER_INFINITY) {
 			if (b->value1 == GENERATION_NUMBER_INFINITY)
 				return 0;
@@ -112,6 +111,10 @@  int compare_generations(struct generation *a, struct generation *b)
 			return -1;
 		}
 
+		/* corrected commit date = date + offset (correction) */
+		ta = a->date + a->value1;
+		tb = b->date + b->value1;
+
 		if (ta < tb)
 			return -1;
 		if (ta > tb)
@@ -162,6 +165,7 @@  void get_generation_version_from_commit(const struct commit *c,
 
 		case 1:
 		case 3:
+		case 5:
 			gen->value1 = c->generation;
 			gen->date = c->date;
 			break;
@@ -212,9 +216,10 @@  void set_generation_below_commit(const struct commit *c, struct generation *g)
 			break;
 
 		case 3:
+		case 5: /* ??? */
 			if (g->value1 + g->date >= gc.value1 + gc.date) {
 				g->value1 = 0;
-				g->date = gc.value1 + gc.date;			
+				g->date = gc.value1 + gc.date;
 			}
 			break;
 
@@ -363,7 +368,7 @@  struct commit_graph *load_commit_graph_one(const char *graph_file)
 			else
 				graph->chunk_large_edges = data + chunk_offset;
 			break;
-		
+
 		case GRAPH_CHUNKID_FELINE:
 			if (graph->chunk_feline_gen)
 				chunk_repeated = 1;
@@ -971,27 +976,27 @@  static void compute_generation_numbers_3(struct packed_commit_list *commits)
 	int i;
 	struct commit_list *list = NULL;
 
-        for (i = 0; i < commits->nr; i++) {
-                if (commits->list[i]->generation != GENERATION_NUMBER_INFINITY &&
-                    commits->list[i]->generation != GENERATION_NUMBER_ZERO)
-                        continue;
+	for (i = 0; i < commits->nr; i++) {
+		if (commits->list[i]->generation != GENERATION_NUMBER_INFINITY &&
+			commits->list[i]->generation != GENERATION_NUMBER_ZERO)
+			continue;
 
-                commit_list_insert(commits->list[i], &list);
+		commit_list_insert(commits->list[i], &list);
 
-                while (list) {
-                        struct commit *current = list->item;
-                        struct commit_list *parent;
-                        int all_parents_computed = 1;
+		while (list) {
+			struct commit *current = list->item;
+			struct commit_list *parent;
+			int all_parents_computed = 1;
 
-                        timestamp_t max_timestamp = current->date;
+			timestamp_t max_timestamp = current->date;
 
-                        for (parent = current->parents; parent; parent = parent->next) {
-                                if (parent->item->generation == GENERATION_NUMBER_INFINITY ||
-                                    parent->item->generation == GENERATION_NUMBER_ZERO) {
-                                        all_parents_computed = 0;
-                                        commit_list_insert(parent->item, &list);
-                                        break;
-                                } else {
+			for (parent = current->parents; parent; parent = parent->next) {
+				if (parent->item->generation == GENERATION_NUMBER_INFINITY ||
+					parent->item->generation == GENERATION_NUMBER_ZERO) {
+					all_parents_computed = 0;
+					commit_list_insert(parent->item, &list);
+					break;
+				} else {
 					struct generation pg;
 					timestamp_t pt;
 					get_generation_version_from_commit(parent->item, 3, &pg);
@@ -1001,19 +1006,75 @@  static void compute_generation_numbers_3(struct packed_commit_list *commits)
 					if (pt > max_timestamp)
 						max_timestamp = pt + 1;
 				}
-                        }
+			}
+
+			if (all_parents_computed) {
+				current->generation = (uint32_t)(max_timestamp - current->date) + 1;
+				pop_commit(&list);
+
+				if (current->generation > GENERATION_NUMBER_MAX)
+					die(_("generation number gap is too high!"));
+			}
+		}
+	}
+}
+
+static void compute_generation_numbers_5(struct packed_commit_list *commits)
+{
+	int i;
+	struct commit_list *list = NULL;
+
+	for (i = 0; i < commits->nr; i++) {
+		/* skip already computed generation numbers */
+		if (commits->list[i]->generation != GENERATION_NUMBER_INFINITY &&
+			commits->list[i]->generation != GENERATION_NUMBER_ZERO)
+			continue;
+
+		commit_list_insert(commits->list[i], &list);
+
+		while (list) {
+			struct commit *current = list->item;
+			struct commit_list *parent;
+			int all_parents_computed = 1;
+
+			timestamp_t max_timestamp = current->date;
+			uint32_t max_generation = 0;
+
+			for (parent = current->parents; parent; parent = parent->next) {
+				if (parent->item->generation == GENERATION_NUMBER_INFINITY ||
+				    parent->item->generation == GENERATION_NUMBER_ZERO) {
+					all_parents_computed = 0;
+					commit_list_insert(parent->item, &list);
+					break;
+
+				} else {
+					struct generation pg;
+					timestamp_t pt;
+					get_generation_version_from_commit(parent->item, 5, &pg);
+
+					pt = pg.value1 + pg.date;
+
+					if (pt > max_timestamp)
+						max_timestamp = pt + 1;
+					if (pg.value1 > max_generation)
+						max_generation = pg.value1;
+				}
+			}
 
-                        if (all_parents_computed) {
+			if (all_parents_computed) {
 				current->generation = (uint32_t)(max_timestamp - current->date) + 1;
-                                pop_commit(&list);
+				if (current->generation < max_generation + 1)
+					current->generation = max_generation + 1;
+				pop_commit(&list);
 
-                                if (current->generation > GENERATION_NUMBER_MAX)
-                                        die(_("generation number gap is too high!"));
-                        }
-                }
-        }
+				if (current->generation > GENERATION_NUMBER_MAX)
+					die(_("generation number gap is too high!"));
+			}
+		}
+	}
 }
 
+
 static void compute_generation_numbers(struct packed_commit_list *commits,
 				       int generation_version)
 {
@@ -1038,6 +1099,9 @@  static void compute_generation_numbers(struct packed_commit_list *commits,
 	case 4:
 		/* compute at write time */
 		return;
+	case 5:
+		compute_generation_numbers_5(commits);
+		return;
 	}
 }