Message ID | Z3qN_8-HKdspwcDb@ArchLinux (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Series | add more ref consistency checks | expand |
On Sun, Jan 05, 2025 at 09:49:51PM +0800, shejialuo wrote: > We have already implemented the header consistency check for the raw > "packed-refs" file. Before we implement the consistency check for each > ref entry, let's analysis [1] which reports that "git fsck" cannot > detect some binary zeros. > > "packed-backend.c::next_record" will use "check_refname_format" to check > the consistency of the refname. If it is not OK, the program will die. > So, we already have the code path and we must miss out something. > > We use the following code to get the refname: > > strbuf_add(&iter->refname_buf, p, eol - p); > iter->base.refname = iter->refname_buf.buf > > In the above code, `p` is the start pointer of the refname and `eol` is > the next newline pointer. We calculate the length of the refname by > subtracting the two pointers. Then we add the memory range between `p` > and `eol` to get the refname. > > However, if there are some NULL binaries in the memory range between `p` You probably mean NUL characters, not NULL binaries? > diff --git a/refs/packed-backend.c b/refs/packed-backend.c > index 3b11abe5f8..f6142a4402 100644 > --- a/refs/packed-backend.c > +++ b/refs/packed-backend.c > @@ -493,6 +493,23 @@ static void verify_buffer_safe(struct snapshot *snapshot) > last_line, eof - last_line); > } > > +/* > + * When parsing the "packed-refs" file, we will parse it line by line. > + * Because we know the start pointer of the refname and the next > + * newline pointer, we could calculate the length of the refname by > + * subtracting the two pointers. However, there is a corner case where > + * the refname contains corrupted embedded NULL binaries. And > + * `check_refname_format()` will not catch this when the truncated > + * refname is still a valid refname. To prevent this, we need to check > + * whether the refname contains the NULL binaries. > + */ > +static int refname_contains_null(struct strbuf refname) > +{ > + if (refname.len != strlen(refname.buf)) > + return 1; > + return 0; > +} > + > #define SMALL_FILE_SIZE (32*1024) > > /* > @@ -894,6 +911,9 @@ static int next_record(struct packed_ref_iterator *iter) > strbuf_add(&iter->refname_buf, p, eol - p); > iter->base.refname = iter->refname_buf.buf; > > + if (refname_contains_null(iter->refname_buf)) We can replace this with `memchr(iter->refname_buf.buf, '\0', iter->refname_buf.len)`, which should be more efficient than using strlen(3p). > + die("packed refname contains embedded NULL: %s", iter->base.refname); > + I was a bit surprised to find that we modify the way that we read refs from the packed-refs file instead of adapting the fsck code. But I think this check is sensible. Patrick
On Thu, Jan 16, 2025 at 02:57:40PM +0100, Patrick Steinhardt wrote: > On Sun, Jan 05, 2025 at 09:49:51PM +0800, shejialuo wrote: > > We have already implemented the header consistency check for the raw > > "packed-refs" file. Before we implement the consistency check for each > > ref entry, let's analysis [1] which reports that "git fsck" cannot > > detect some binary zeros. > > > > "packed-backend.c::next_record" will use "check_refname_format" to check > > the consistency of the refname. If it is not OK, the program will die. > > So, we already have the code path and we must miss out something. > > > > We use the following code to get the refname: > > > > strbuf_add(&iter->refname_buf, p, eol - p); > > iter->base.refname = iter->refname_buf.buf > > > > In the above code, `p` is the start pointer of the refname and `eol` is > > the next newline pointer. We calculate the length of the refname by > > subtracting the two pointers. Then we add the memory range between `p` > > and `eol` to get the refname. > > > > However, if there are some NULL binaries in the memory range between `p` > > You probably mean NUL characters, not NULL binaries? > Yes, I will improve this in the next version. > > diff --git a/refs/packed-backend.c b/refs/packed-backend.c > > index 3b11abe5f8..f6142a4402 100644 > > --- a/refs/packed-backend.c > > +++ b/refs/packed-backend.c > > @@ -493,6 +493,23 @@ static void verify_buffer_safe(struct snapshot *snapshot) > > last_line, eof - last_line); > > } > > > > +/* > > + * When parsing the "packed-refs" file, we will parse it line by line. > > + * Because we know the start pointer of the refname and the next > > + * newline pointer, we could calculate the length of the refname by > > + * subtracting the two pointers. However, there is a corner case where > > + * the refname contains corrupted embedded NULL binaries. And > > + * `check_refname_format()` will not catch this when the truncated > > + * refname is still a valid refname. To prevent this, we need to check > > + * whether the refname contains the NULL binaries. > > + */ > > +static int refname_contains_null(struct strbuf refname) > > +{ > > + if (refname.len != strlen(refname.buf)) > > + return 1; > > + return 0; > > +} > > + > > #define SMALL_FILE_SIZE (32*1024) > > > > /* > > @@ -894,6 +911,9 @@ static int next_record(struct packed_ref_iterator *iter) > > strbuf_add(&iter->refname_buf, p, eol - p); > > iter->base.refname = iter->refname_buf.buf; > > > > + if (refname_contains_null(iter->refname_buf)) > > We can replace this with `memchr(iter->refname_buf.buf, '\0', > iter->refname_buf.len)`, which should be more efficient than using > strlen(3p). Thanks for the suggestion. Will improve this in the next version. > > > + die("packed refname contains embedded NULL: %s", iter->base.refname); > > + > > I was a bit surprised to find that we modify the way that we read refs > from the packed-refs file instead of adapting the fsck code. But I think > this check is sensible. Actually, I am also surprised here. And this thing is extremely interesting. When I implement all the fsck code, I find I still cannot detect the error reported in [1] which is the motivation why we want to add checks for ref explicitly. And I dive into the code to fix this problem. The reason why I put here is that we are going to implement the checks like what "next_record" does. [1] https://lore.kernel.org/git/6cfee0e4-3285-4f18-91ff-d097da9de737@rd10.de/ Thanks, Jialuo
diff --git a/refs/packed-backend.c b/refs/packed-backend.c index 3b11abe5f8..f6142a4402 100644 --- a/refs/packed-backend.c +++ b/refs/packed-backend.c @@ -493,6 +493,23 @@ static void verify_buffer_safe(struct snapshot *snapshot) last_line, eof - last_line); } +/* + * When parsing the "packed-refs" file, we will parse it line by line. + * Because we know the start pointer of the refname and the next + * newline pointer, we could calculate the length of the refname by + * subtracting the two pointers. However, there is a corner case where + * the refname contains corrupted embedded NULL binaries. And + * `check_refname_format()` will not catch this when the truncated + * refname is still a valid refname. To prevent this, we need to check + * whether the refname contains the NULL binaries. + */ +static int refname_contains_null(struct strbuf refname) +{ + if (refname.len != strlen(refname.buf)) + return 1; + return 0; +} + #define SMALL_FILE_SIZE (32*1024) /* @@ -894,6 +911,9 @@ static int next_record(struct packed_ref_iterator *iter) strbuf_add(&iter->refname_buf, p, eol - p); iter->base.refname = iter->refname_buf.buf; + if (refname_contains_null(iter->refname_buf)) + die("packed refname contains embedded NULL: %s", iter->base.refname); + if (check_refname_format(iter->base.refname, REFNAME_ALLOW_ONELEVEL)) { if (!refname_is_safe(iter->base.refname)) die("packed refname is dangerous: %s",
We have already implemented the header consistency check for the raw "packed-refs" file. Before we implement the consistency check for each ref entry, let's analysis [1] which reports that "git fsck" cannot detect some binary zeros. "packed-backend.c::next_record" will use "check_refname_format" to check the consistency of the refname. If it is not OK, the program will die. So, we already have the code path and we must miss out something. We use the following code to get the refname: strbuf_add(&iter->refname_buf, p, eol - p); iter->base.refname = iter->refname_buf.buf In the above code, `p` is the start pointer of the refname and `eol` is the next newline pointer. We calculate the length of the refname by subtracting the two pointers. Then we add the memory range between `p` and `eol` to get the refname. However, if there are some NULL binaries in the memory range between `p` and `eol`, we will see the refname as a valid ref name as long as the memory range between `p` and the first occurred NULL binary is valid. In order to catch above corruption, create a new function "refname_contains_null" by checking whether the "refname.len" equals to the length of the raw string pointer "refname.buf". If not equal, there must be some NULL binaries in the refname. Use this function in "next_record" function to die the program if "refname_contains_null" returns true. [1] https://lore.kernel.org/git/6cfee0e4-3285-4f18-91ff-d097da9de737@rd10.de/ Reported-by: R. Diez <rdiez-temp3@rd10.de> Mentored-by: Patrick Steinhardt <ps@pks.im> Mentored-by: Karthik Nayak <karthik.188@gmail.com> Signed-off-by: shejialuo <shejialuo@gmail.com> --- refs/packed-backend.c | 20 ++++++++++++++++++++ 1 file changed, 20 insertions(+)