diff mbox series

[v2,7/8] packed-backend: check whether the "packed-refs" is sorted

Message ID Z5r7KvL1bvSO4UQY@ArchLinux (mailing list archive)
State New
Headers show
Series add more ref consistency checks | expand

Commit Message

shejialuo Jan. 30, 2025, 4:08 a.m. UTC
We will always try to sort the "packed-refs" increasingly by comparing
the refname. So, we should add checks to verify whether the "packed-refs"
is sorted.

We already have code to parse the content. Let's create a new structure
"fsck_packed_ref_entry" to store the state during the parsing process
for every entry. It may seem that we could just add a new "struct strbuf
refname" into the "struct fsck_packed_ref_entry" and during the parsing
process, we could store the refname into this structure and we could
compare later. However, this is not a good design due to the following
reasons:

1. Because we need to store the state across the whole checking
   lifetime, we would consume a lot of memory if there are many entries
   in the "packed-refs" file.
2. The most important thing is that we cannot reuse the existing compare
   functions which cause repetition.

So, instead of storing the "struct strbuf", let's use the existing
structure "struct snaphost_record". And thus we could use the existing
function "cmp_packed_ref_records".

However, this function need an extra parameter for "struct snaphost".
Extract the common part into a new function "cmp_packed_ref_records" to
reuse this function to compare.

Then, create a new function "packed_fsck_ref_sorted" to use the new fsck
message "packedRefUnsorted(ERROR)" to report to the user.

Mentored-by: Patrick Steinhardt <ps@pks.im>
Mentored-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: shejialuo <shejialuo@gmail.com>
---
 Documentation/fsck-msgids.txt |   3 +
 fsck.h                        |   1 +
 refs/packed-backend.c         | 100 +++++++++++++++++++++++++++++++---
 t/t0602-reffiles-fsck.sh      |  38 +++++++++++++
 4 files changed, 135 insertions(+), 7 deletions(-)

Comments

Junio C Hamano Jan. 30, 2025, 7:02 p.m. UTC | #1
shejialuo <shejialuo@gmail.com> writes:

> We will always try to sort the "packed-refs" increasingly by comparing
> the refname. So, we should add checks to verify whether the "packed-refs"
> is sorted.

Do this _ONLY_ when the packed-refs file has a header that declares
"sorted" trait.  Insisting on a packed-refs file that does not would
mean you are stricter than the runtime contract allows.

> +struct fsck_packed_ref_entry {
> +	int line_number;
> +
> +	struct snapshot_record record;
> +};

Not a huge deal, as 1 billion is still plenty of a large number, but
the same comment on the line-number applies here.  We might want to
consistently use ulong for line numbers of files we read from.
shejialuo Jan. 31, 2025, 2:35 p.m. UTC | #2
On Thu, Jan 30, 2025 at 11:02:18AM -0800, Junio C Hamano wrote:
> shejialuo <shejialuo@gmail.com> writes:
> 
> > We will always try to sort the "packed-refs" increasingly by comparing
> > the refname. So, we should add checks to verify whether the "packed-refs"
> > is sorted.
> 
> Do this _ONLY_ when the packed-refs file has a header that declares
> "sorted" trait.  Insisting on a packed-refs file that does not would
> mean you are stricter than the runtime contract allows.
> 

From my perspective, we should check whether it is sorted when the
header has a "sorted" trait. Actually, in the runtime, when calling
`create_snapshot` method, the following would happen:

1. If there is no "sorted" trait, it will sort the "packed-refs".
2. If there is, it won't sort the "packed-refs".

So, we DO allow refs unsorted.

Actually, I have used `git show v1.5.0:builtin-pack-refs.c`, in this
version, it does not sort the ref. However, I quite don't understand the
comment from Patrick in the version one about this patch:

> Makes sense. It has been a source of bugs a couple years ago, and it can
> silently make you receive wrong results, so this is quite a sensible
> check to have.

Patrick, could you please help to explain this. I don't know whether we
need to check whether "packed-refs" is sorted always. It seems that we
truly allow refs unsorted. We need to know whether we should tighten
this?

> > +struct fsck_packed_ref_entry {
> > +	int line_number;
> > +
> > +	struct snapshot_record record;
> > +};
> 
> Not a huge deal, as 1 billion is still plenty of a large number, but
> the same comment on the line-number applies here.  We might want to
> consistently use ulong for line numbers of files we read from.

Yes, let me improve this.

Thanks,
Jialuo
Junio C Hamano Jan. 31, 2025, 4:23 p.m. UTC | #3
shejialuo <shejialuo@gmail.com> writes:

> On Thu, Jan 30, 2025 at 11:02:18AM -0800, Junio C Hamano wrote:
>> shejialuo <shejialuo@gmail.com> writes:
>> 
>> > We will always try to sort the "packed-refs" increasingly by comparing
>> > the refname. So, we should add checks to verify whether the "packed-refs"
>> > is sorted.
>> 
>> Do this _ONLY_ when the packed-refs file has a header that declares
>> "sorted" trait.  Insisting on a packed-refs file that does not would
>> mean you are stricter than the runtime contract allows.
>> 
>
> From my perspective, we should check whether it is sorted when the
> header has a "sorted" trait.

So the three-lines you wrote is not accurate, then.  That is why I
said that "should add checks" should not be unconditional---we
should not check if the file contents is sorted when "sorted" trait
is not declared.
shejialuo Feb. 1, 2025, 9:50 a.m. UTC | #4
On Fri, Jan 31, 2025 at 08:23:22AM -0800, Junio C Hamano wrote:
> shejialuo <shejialuo@gmail.com> writes:
> 
> > On Thu, Jan 30, 2025 at 11:02:18AM -0800, Junio C Hamano wrote:
> >> shejialuo <shejialuo@gmail.com> writes:
> >> 
> >> > We will always try to sort the "packed-refs" increasingly by comparing
> >> > the refname. So, we should add checks to verify whether the "packed-refs"
> >> > is sorted.
> >> 
> >> Do this _ONLY_ when the packed-refs file has a header that declares
> >> "sorted" trait.  Insisting on a packed-refs file that does not would
> >> mean you are stricter than the runtime contract allows.
> >> 
> >
> > From my perspective, we should check whether it is sorted when the
> > header has a "sorted" trait.
> 
> So the three-lines you wrote is not accurate, then.  That is why I
> said that "should add checks" should not be unconditional---we
> should not check if the file contents is sorted when "sorted" trait
> is not declared.

I have made confusion here. Sorry. Let me improve this in the next
version.

Thanks,
Jialuo
Patrick Steinhardt Feb. 3, 2025, 8:40 a.m. UTC | #5
On Thu, Jan 30, 2025 at 12:08:10PM +0800, shejialuo wrote:
> diff --git a/refs/packed-backend.c b/refs/packed-backend.c
> index 271c740728..b250f987b2 100644
> --- a/refs/packed-backend.c
> +++ b/refs/packed-backend.c
> @@ -1768,6 +1774,28 @@ static struct ref_iterator *packed_reflog_iterator_begin(struct ref_store *ref_s
>  	return empty_ref_iterator_begin();
>  }
>  
> +struct fsck_packed_ref_entry {
> +	int line_number;

This should rather be a `size_t`, or at least `unsigned`.

> +
> +	struct snapshot_record record;
> +};
> +
> +static struct fsck_packed_ref_entry *create_fsck_packed_ref_entry(int line_number,
> +								  const char *start)
> +{
> +	struct fsck_packed_ref_entry *entry = xcalloc(1, sizeof(*entry));
> +	entry->line_number = line_number;
> +	entry->record.start = start;
> +	return entry;
> +}
> +
> +static void free_fsck_packed_ref_entries(struct fsck_packed_ref_entry **entries, int nr)
> +{
> +	for (int i = 0; i < nr; i++)

Let's use `size_t` for both `i` and `nr`.

> +		free(entries[i]);
> +	free(entries);
> +}
> +
>  static int packed_fsck_ref_next_line(struct fsck_options *o,
>  				     struct strbuf *packed_entry, const char *start,
>  				     const char *eof, const char **eol)
> @@ -1893,13 +1921,60 @@ static int packed_fsck_ref_main_line(struct fsck_options *o,
>  	return 0;
>  }
>  
> +static int packed_fsck_ref_sorted(struct fsck_options *o,
> +				  struct ref_store *ref_store,
> +				  struct fsck_packed_ref_entry **entries,
> +				  int nr)
> +{
> +	size_t hexsz = ref_store->repo->hash_algo->hexsz;
> +	struct strbuf packed_entry = STRBUF_INIT;
> +	struct fsck_ref_report report = { 0 };
> +	struct strbuf refname1 = STRBUF_INIT;
> +	struct strbuf refname2 = STRBUF_INIT;
> +	int ret = 0;
> +
> +	for (int i = 1; i < nr; i++) {

Here, as well.

Patrick
Patrick Steinhardt Feb. 3, 2025, 8:40 a.m. UTC | #6
On Fri, Jan 31, 2025 at 10:35:51PM +0800, shejialuo wrote:
> On Thu, Jan 30, 2025 at 11:02:18AM -0800, Junio C Hamano wrote:
> > shejialuo <shejialuo@gmail.com> writes:
> > Makes sense. It has been a source of bugs a couple years ago, and it can
> > silently make you receive wrong results, so this is quite a sensible
> > check to have.
> 
> Patrick, could you please help to explain this. I don't know whether we
> need to check whether "packed-refs" is sorted always. It seems that we
> truly allow refs unsorted. We need to know whether we should tighten
> this?

The context here is that packed-refs sometimes claim that they are
sorted, but indeed they aren't. There are two sources for this that I've
seen in the wild:

  - An invalid comparison function. I think I remember that libgit2 at
    one point sorted them incorrectly, but not a 100% sure anymore where
    I've seen this.

  - A user manually edits the packed-refs file, but isn't aware of the
    sorting.

So we should assert that a packed-refs file is correctly sorted, but
only when the header claims that it should be sorted.

Patrick
diff mbox series

Patch

diff --git a/Documentation/fsck-msgids.txt b/Documentation/fsck-msgids.txt
index 2a7ec7592e..7a11d35c5e 100644
--- a/Documentation/fsck-msgids.txt
+++ b/Documentation/fsck-msgids.txt
@@ -190,6 +190,9 @@ 
 `packedRefMissingHeader`::
 	(INFO) The "packed-refs" file does not contain the header.
 
+`packedRefUnsorted`::
+	(ERROR) The "packed-refs" file is not sorted.
+
 `refMissingNewline`::
 	(INFO) A loose ref that does not end with newline(LF). As
 	valid implementations of Git never created such a loose ref
diff --git a/fsck.h b/fsck.h
index 40126242a4..0d3d1045ae 100644
--- a/fsck.h
+++ b/fsck.h
@@ -56,6 +56,7 @@  enum fsck_msg_type {
 	FUNC(MISSING_TYPE_ENTRY, ERROR) \
 	FUNC(MULTIPLE_AUTHORS, ERROR) \
 	FUNC(PACKED_REF_ENTRY_NOT_TERMINATED, ERROR) \
+	FUNC(PACKED_REF_UNSORTED, ERROR) \
 	FUNC(TREE_NOT_SORTED, ERROR) \
 	FUNC(UNKNOWN_TYPE, ERROR) \
 	FUNC(ZERO_PADDED_DATE, ERROR) \
diff --git a/refs/packed-backend.c b/refs/packed-backend.c
index 271c740728..b250f987b2 100644
--- a/refs/packed-backend.c
+++ b/refs/packed-backend.c
@@ -300,14 +300,9 @@  struct snapshot_record {
 	size_t len;
 };
 
-static int cmp_packed_ref_records(const void *v1, const void *v2,
-				  void *cb_data)
-{
-	const struct snapshot *snapshot = cb_data;
-	const struct snapshot_record *e1 = v1, *e2 = v2;
-	const char *r1 = e1->start + snapshot_hexsz(snapshot) + 1;
-	const char *r2 = e2->start + snapshot_hexsz(snapshot) + 1;
 
+static int cmp_packed_refname(const char *r1, const char *r2)
+{
 	while (1) {
 		if (*r1 == '\n')
 			return *r2 == '\n' ? 0 : -1;
@@ -322,6 +317,17 @@  static int cmp_packed_ref_records(const void *v1, const void *v2,
 	}
 }
 
+static int cmp_packed_ref_records(const void *v1, const void *v2,
+				  void *cb_data)
+{
+	const struct snapshot *snapshot = cb_data;
+	const struct snapshot_record *e1 = v1, *e2 = v2;
+	const char *r1 = e1->start + snapshot_hexsz(snapshot) + 1;
+	const char *r2 = e2->start + snapshot_hexsz(snapshot) + 1;
+
+	return cmp_packed_refname(r1, r2);
+}
+
 /*
  * Compare a snapshot record at `rec` to the specified NUL-terminated
  * refname.
@@ -1768,6 +1774,28 @@  static struct ref_iterator *packed_reflog_iterator_begin(struct ref_store *ref_s
 	return empty_ref_iterator_begin();
 }
 
+struct fsck_packed_ref_entry {
+	int line_number;
+
+	struct snapshot_record record;
+};
+
+static struct fsck_packed_ref_entry *create_fsck_packed_ref_entry(int line_number,
+								  const char *start)
+{
+	struct fsck_packed_ref_entry *entry = xcalloc(1, sizeof(*entry));
+	entry->line_number = line_number;
+	entry->record.start = start;
+	return entry;
+}
+
+static void free_fsck_packed_ref_entries(struct fsck_packed_ref_entry **entries, int nr)
+{
+	for (int i = 0; i < nr; i++)
+		free(entries[i]);
+	free(entries);
+}
+
 static int packed_fsck_ref_next_line(struct fsck_options *o,
 				     struct strbuf *packed_entry, const char *start,
 				     const char *eof, const char **eol)
@@ -1893,13 +1921,60 @@  static int packed_fsck_ref_main_line(struct fsck_options *o,
 	return 0;
 }
 
+static int packed_fsck_ref_sorted(struct fsck_options *o,
+				  struct ref_store *ref_store,
+				  struct fsck_packed_ref_entry **entries,
+				  int nr)
+{
+	size_t hexsz = ref_store->repo->hash_algo->hexsz;
+	struct strbuf packed_entry = STRBUF_INIT;
+	struct fsck_ref_report report = { 0 };
+	struct strbuf refname1 = STRBUF_INIT;
+	struct strbuf refname2 = STRBUF_INIT;
+	int ret = 0;
+
+	for (int i = 1; i < nr; i++) {
+		const char *r1 = entries[i - 1]->record.start + hexsz + 1;
+		const char *r2 = entries[i]->record.start + hexsz + 1;
+
+		if (cmp_packed_refname(r1, r2) >= 0) {
+			const char *err_fmt =
+				"refname '%s' is not less than next refname '%s'";
+			const char *eol;
+			eol = memchr(entries[i - 1]->record.start, '\n',
+				     entries[i - 1]->record.len);
+			strbuf_add(&refname1, r1, eol - r1);
+			eol = memchr(entries[i]->record.start, '\n',
+				     entries[i]->record.len);
+			strbuf_add(&refname2, r2, eol - r2);
+
+			strbuf_addf(&packed_entry, "packed-refs line %d",
+				    entries[i - 1]->line_number);
+			report.path = packed_entry.buf;
+			ret = fsck_report_ref(o, &report,
+					      FSCK_MSG_PACKED_REF_UNSORTED,
+					      err_fmt, refname1.buf, refname2.buf);
+			goto cleanup;
+		}
+	}
+
+cleanup:
+	strbuf_release(&packed_entry);
+	strbuf_release(&refname1);
+	strbuf_release(&refname2);
+	return ret;
+}
+
 static int packed_fsck_ref_content(struct fsck_options *o,
 				   struct ref_store *ref_store,
 				   const char *start, const char *eof)
 {
 	struct strbuf packed_entry = STRBUF_INIT;
+	struct fsck_packed_ref_entry **entries;
 	struct strbuf refname = STRBUF_INIT;
+	int entry_alloc = 20;
 	int line_number = 1;
+	int entry_nr = 0;
 	const char *eol;
 	int ret = 0;
 
@@ -1919,7 +1994,13 @@  static int packed_fsck_ref_content(struct fsck_options *o,
 				       "missing header line");
 	}
 
+	ALLOC_ARRAY(entries, entry_alloc);
 	while (start < eof) {
+		struct fsck_packed_ref_entry *entry
+			= create_fsck_packed_ref_entry(line_number, start);
+
+		ALLOC_GROW(entries, entry_nr + 1, entry_alloc);
+		entries[entry_nr++] = entry;
 		strbuf_reset(&packed_entry);
 		strbuf_addf(&packed_entry, "packed-refs line %d", line_number);
 		ret |= packed_fsck_ref_next_line(o, &packed_entry, start, eof, &eol);
@@ -1935,11 +2016,16 @@  static int packed_fsck_ref_content(struct fsck_options *o,
 			start = eol + 1;
 			line_number++;
 		}
+		entry->record.len = start - entry->record.start;
 	}
 
+	if (!ret)
+		ret |= packed_fsck_ref_sorted(o, ref_store, entries, entry_nr);
+
 	strbuf_release(&packed_entry);
 	strbuf_release(&refname);
 	strbuf_release(&packed_entry);
+	free_fsck_packed_ref_entries(entries, entry_nr);
 	return ret;
 }
 
diff --git a/t/t0602-reffiles-fsck.sh b/t/t0602-reffiles-fsck.sh
index e4b4a58684..9d802d71a9 100755
--- a/t/t0602-reffiles-fsck.sh
+++ b/t/t0602-reffiles-fsck.sh
@@ -727,4 +727,42 @@  test_expect_success 'packed-refs content should be checked' '
 	)
 '
 
+test_expect_success 'packed-ref sorted should be checked' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+	(
+		cd repo &&
+		test_commit default &&
+		git branch branch-1 &&
+		git branch branch-2 &&
+		git tag -a annotated-tag-1 -m tag-1 &&
+		branch_1_oid=$(git rev-parse branch-1) &&
+		branch_2_oid=$(git rev-parse branch-2) &&
+		tag_1_oid=$(git rev-parse annotated-tag-1) &&
+		tag_1_peeled_oid=$(git rev-parse annotated-tag-1^{}) &&
+		refname1="refs/heads/main" &&
+		refname2="refs/heads/foo" &&
+		refname3="refs/tags/foo" &&
+		printf "# pack-refs with: peeled fully-peeled sorted \n"  >.git/packed-refs &&
+		printf "%s %s\n" "$branch_2_oid" "$refname1" >>.git/packed-refs &&
+		printf "%s %s\n" "$branch_1_oid" "$refname2" >>.git/packed-refs &&
+		test_must_fail git refs verify 2>err &&
+		cat >expect <<-EOF &&
+		error: packed-refs line 2: packedRefUnsorted: refname '\''$refname1'\'' is not less than next refname '\''$refname2'\''
+		EOF
+		rm .git/packed-refs &&
+		test_cmp expect err &&
+		printf "# pack-refs with: peeled fully-peeled sorted \n"  >.git/packed-refs &&
+		printf "%s %s\n" "$tag_1_oid" "$refname3" >>.git/packed-refs &&
+		printf "^%s\n" "$tag_1_peeled_oid" >>.git/packed-refs &&
+		printf "%s %s\n" "$branch_2_oid" "$refname2" >>.git/packed-refs &&
+		test_must_fail git refs verify 2>err &&
+		cat >expect <<-EOF &&
+		error: packed-refs line 2: packedRefUnsorted: refname '\''$refname3'\'' is not less than next refname '\''$refname2'\''
+		EOF
+		rm .git/packed-refs &&
+		test_cmp expect err
+	)
+'
+
 test_done