Message ID | ed9eeef8513e08935c59defafde99956eb62d49a.1727199118.git.me@ttaylorr.com (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Series | hash.h: support choosing a separate SHA-1 for non-cryptographic uses | expand |
On Tue, Sep 24, 2024 at 01:32:17PM -0400, Taylor Blau wrote: > Loose objects are exempt from this check, and the collision check may be > skipped by calling the _flags variant of this function with the > FOF_SKIP_COLLISION_CHECK bit set. This is done for a couple of reasons: > > - We don't treat the hash of the loose object file's contents as a > checksum, since the same loose object can be stored using different > bytes on disk (e.g., when adjusting core.compression, using a > different version of zlib, etc.). > > This is fundamentally different from cases where > finalize_object_file() is operating over a file which uses the hash > value as a checksum of the contents. In other words, a pair of > identical loose objects can be stored using different bytes on disk, > and that should not be treated as a collision. OK, this explains why a byte-for-byte hash-check would be the wrong thing (it would cause false positives)... > - We already use the path of the loose object as its hash value / > object name, so checking for collisions at the content level doesn't > add anything. > > This is why we do not bother to check the inflated object contents > for collisions either, since either (a) the object contents have the > fingerprint of a SHA-1 collision, in which case the collision > detecting SHA-1 implementation used to hash the contents to give us > a path would have already rejected it, or (b) the contents are part > of a colliding pair which does not bear the same fingerprints of > known collision attacks, in which case we would not have caught it > anyway. > > So skipping the collision check here does not change for better or > worse the hardness of loose object writes. ...and this is the argument for why it is not necessary to do a content-level check. As you say, we're not making anything worse here, as we've never implemented this collision check. But I think the "FIXME" in the code was really there under the notion that it was a backstop to SHA-1 being broken (belt-and-suspenders, if you will). And your argument above assumes that SHA-1 (using the DC variant) is safe. But I think we can expand the argument a bit. I don't think this is a very good place for such a collision check, because race conditions aside, we'll already have bailed long before! When we do a write via write_object_file(), for example, we'll hit the freshen paths that assume the contents are identical, and skip the write (and thus this finalize) entirely. So if you want a collision check, we have to do it at a much higher level. And indeed we do: index-pack already implements such a check (through the confusingly named "check_collison" function). I don't think unpack-objects does so (it looks like it just feeds the result to write_object_file(), which does the freshening thing). So the argument I'd make here is more like: this is the wrong place to do it. Side thoughts on collision checks: I think index-pack is safe in the sense that it will always prefer on-disk copies and will complain if it sees a collision. unpack-objects is not, nor are regular in-repo writes (which normally cannot be triggered by remote, but on forges that do merges, etc, that's not always true). Both of the latter are also subject to races, where a simultaneous collision might let the attacker win. That race is kind of moot in a world where anybody can push to a fork of a repo that ends up in the same shared location (so they can actually win and become the "on disk" copy), and we're relying on sha1dc for protection there. That's specific to certain forges, but they do represent a lot of Git use. In general, the use of unpack-objects versus index-pack is up to the attacker (based on pack size). So I think it would be nice if unpack-objects did the collision check. Even if the attacker beats you to writing the object, it would be nice to see "holy crap, there was a collision" instead of just silently throwing your pushed object away. I know that GitHub only ever runs index-pack, which may give some amount of protection here. In general, I think we should consider deprecating unpack-objects in favor of teaching index-pack to unpack. It has many enhancements (this one, but also threading) that unpack-objects does not. I have an old patch series for this (sorry, only from 2017, I'm slipping). IIRC the sticking point was that unpack-objects is better about memory use in some cases (streaming large blobs?) and I hadn't figured out a way around that. Phew. Sorry, that was a long digression for "I think you're right, but I might have argued it a little differently". I think the direction of the patch (skipping checks entirely for loose objects) is the right thing. > As a small note related to the latter bullet point above, we must teach > the tmp-objdir routines to similarly skip the content-level collision > checks when calling migrate_one() on a loose object file, which we do by > setting the FOF_SKIP_COLLISION_CHECK bit when we are inside of a loose > object shard. OK, this is the part I was wondering how you were going to deal with. :) Let's look at the implementation... > @@ -239,11 +247,15 @@ static int migrate_paths(struct strbuf *src, struct strbuf *dst) > > for (i = 0; i < paths.nr; i++) { > const char *name = paths.items[i].string; > + enum finalize_object_file_flags flags_copy = flags; > > strbuf_addf(src, "/%s", name); > strbuf_addf(dst, "/%s", name); > > - ret |= migrate_one(src, dst); > + if (is_loose_object_shard(name)) > + flags_copy |= FOF_SKIP_COLLISION_CHECK; > + > + ret |= migrate_one(src, dst, flags_copy); > > strbuf_setlen(src, src_len); > strbuf_setlen(dst, dst_len); This looks pretty reasonable overall, though I'd note that migrate_paths() is called recursively. So if I had: tmp-objdir-XXXXXX/pack/ab/foo.pack we'd skip the collision check. I'm not sure how much we want to worry about that. I don't think we'd ever create such a path, and the general form of the paths is all under local control, so an attacker can't convince us to do so. And I find it pretty unlikely that we'd change the on-disk layout to accidentally trigger this. So even though there are security implications if we have a false positive from is_loose_object_shard(), I don't know that it's worth caring about. If we did want to, I think a more robust check is to make sure the shard comes at the start of the path. Which is tricky, of course, because we're building a path on top of an arbitrary root (tmp-objdir for the src, and the repo object dir for the dst). So you'd have to save that base_len and use it as a root, like the patch below. I'm OK if you want to write all of this off as over-engineering, though. diff --git a/tmp-objdir.c b/tmp-objdir.c index 9da0071cba..27ba9f4f57 100644 --- a/tmp-objdir.c +++ b/tmp-objdir.c @@ -207,10 +207,18 @@ static int read_dir_paths(struct string_list *out, const char *path) } static int migrate_paths(struct strbuf *src, struct strbuf *dst, - enum finalize_object_file_flags flags); + size_t base_len); + +static int is_loose_path(const char *name) +{ + return *name++ == '/' && + isxdigit(*name++) && + isxdigit(*name++) && + *name++ == '/'; +} static int migrate_one(struct strbuf *src, struct strbuf *dst, - enum finalize_object_file_flags flags) + size_t base_len) { struct stat st; @@ -222,18 +230,16 @@ static int migrate_one(struct strbuf *src, struct strbuf *dst, return -1; } else if (errno != EEXIST) return -1; - return migrate_paths(src, dst, flags); + return migrate_paths(src, dst, base_len); } - return finalize_object_file_flags(src->buf, dst->buf, flags); + return finalize_object_file_flags(src->buf, dst->buf, + is_loose_path(src->buf + base_len) ? + FOF_SKIP_COLLISION_CHECK : 0); } -static int is_loose_object_shard(const char *name) -{ - return strlen(name) == 2 && isxdigit(name[0]) && isxdigit(name[1]); -} static int migrate_paths(struct strbuf *src, struct strbuf *dst, - enum finalize_object_file_flags flags) + size_t base_len) { size_t src_len = src->len, dst_len = dst->len; struct string_list paths = STRING_LIST_INIT_DUP; @@ -247,15 +253,11 @@ static int migrate_paths(struct strbuf *src, struct strbuf *dst, for (i = 0; i < paths.nr; i++) { const char *name = paths.items[i].string; - enum finalize_object_file_flags flags_copy = flags; strbuf_addf(src, "/%s", name); strbuf_addf(dst, "/%s", name); - if (is_loose_object_shard(name)) - flags_copy |= FOF_SKIP_COLLISION_CHECK; - - ret |= migrate_one(src, dst, flags_copy); + ret |= migrate_one(src, dst, base_len); strbuf_setlen(src, src_len); strbuf_setlen(dst, dst_len); @@ -283,7 +285,7 @@ int tmp_objdir_migrate(struct tmp_objdir *t) strbuf_addbuf(&src, &t->path); strbuf_addstr(&dst, repo_get_object_directory(the_repository)); - ret = migrate_paths(&src, &dst, 0); + ret = migrate_paths(&src, &dst, src.len); strbuf_release(&src); strbuf_release(&dst);
Taylor Blau <me@ttaylorr.com> writes: > One of the reasons we never implemented this is because the files it > moves are all named after the cryptographic hash of their contents > (either loose objects, or packs which have their hash in the name these > days). So we are saying that both loose objects and packfiles would benefit if we did collision checks here. > ... So in preparation, let's actually > implement a byte-for-byte collision check. But the byte-for-byte collision check is only good for packfiles. In other words, the definition of "content" that gets hashed to derive the name can differ among csum-file users, so the way to check for collision can be written for these different types. > Note that this may cause some extra computation when the files are in > fact identical, but this should happen rarely. > > Loose objects are exempt from this check, and the collision check may be > skipped by calling the _flags variant of this function with the > FOF_SKIP_COLLISION_CHECK bit set. This is done for a couple of reasons: > > - We don't treat the hash of the loose object file's contents as a > checksum, since the same loose object can be stored using different > bytes on disk (e.g., when adjusting core.compression, using a > different version of zlib, etc.). That's a good explanation why byte-for-byte check is inappropriate. > - We already use the path of the loose object as its hash value / > object name, so checking for collisions at the content level doesn't > add anything. "We already ... doesn't add anything" -> "The point of collision check is to find an attempt to store different contents that hashes down to the same object name, and we continue to use sha1dc to hash the content name so such a collision would have already been detected". > This is why we do not bother to check the inflated object contents > for collisions either, since either (a) the object contents have the > fingerprint of a SHA-1 collision, in which case the collision > detecting SHA-1 implementation used to hash the contents to give us > a path would have already rejected it, or (b) the contents are part > of a colliding pair which does not bear the same fingerprints of > known collision attacks, in which case we would not have caught it > anyway. I do not understand (b). If you compare inflated contents, you would find such a pair that sha1dc missed but the "implement collision check" patch would have caught as a pair of byte sequences that hash to the same value. Isn't it an opportunity to make it safer to implement such a loose object collision check? Thanks.
On Tue, Sep 24, 2024 at 04:37:18PM -0400, Jeff King wrote: > But I think we can expand the argument a bit. I don't think this is a > very good place for such a collision check, because race conditions > aside, we'll already have bailed long before! When we do a write via > write_object_file(), for example, we'll hit the freshen paths that > assume the contents are identical, and skip the write (and thus this > finalize) entirely. > > So if you want a collision check, we have to do it at a much higher > level. And indeed we do: index-pack already implements such a check > (through the confusingly named "check_collison" function). I don't think > unpack-objects does so (it looks like it just feeds the result to > write_object_file(), which does the freshening thing). > > So the argument I'd make here is more like: this is the wrong place to > do it. I think that is reasonable, and I agree with your reasoning here. I'm happy to reword the commit message if you think doing so would be useful, or drop the paragraph entirely if you think it's just confusing. Let me know what you think, I'm happy to do whatever here, reroll or not :-). > Side thoughts on collision checks: > > I think index-pack is safe in the sense that it will always prefer > on-disk copies and will complain if it sees a collision. > unpack-objects is not, nor are regular in-repo writes (which > normally cannot be triggered by remote, but on forges that do > merges, etc, that's not always true). Both of the latter are also > subject to races, where a simultaneous collision might let the > attacker win. Yup. > That race is kind of moot in a world where anybody can push to a > fork of a repo that ends up in the same shared location (so they can > actually win and become the "on disk" copy), and we're relying on > sha1dc for protection there. That's specific to certain forges, but > they do represent a lot of Git use. > > In general, the use of unpack-objects versus index-pack is up to the > attacker (based on pack size). So I think it would be nice if > unpack-objects did the collision check. Even if the attacker beats > you to writing the object, it would be nice to see "holy crap, there > was a collision" instead of just silently throwing your pushed > object away. Right, I agree that unpack-objects definitely should do the collision check here. And indeed, that is the case, since that code (which all is directly implemented in the builtin) uses the regular collision-detecting SHA-1 implementation. > I know that GitHub only ever runs index-pack, which may give some > amount of protection here. In general, I think we should consider > deprecating unpack-objects in favor of teaching index-pack to > unpack. It has many enhancements (this one, but also threading) that > unpack-objects does not. I have an old patch series for this (sorry, > only from 2017, I'm slipping). IIRC the sticking point was that > unpack-objects is better about memory use in some cases (streaming > large blobs?) and I hadn't figured out a way around that. Only seven years old? Sheesh, you really are slipping ;-). > Phew. Sorry, that was a long digression for "I think you're right, but I > might have argued it a little differently". I think the direction of the > patch (skipping checks entirely for loose objects) is the right thing. > > > As a small note related to the latter bullet point above, we must teach > > the tmp-objdir routines to similarly skip the content-level collision > > checks when calling migrate_one() on a loose object file, which we do by > > setting the FOF_SKIP_COLLISION_CHECK bit when we are inside of a loose > > object shard. > > OK, this is the part I was wondering how you were going to deal with. :) > Let's look at the implementation... > > > @@ -239,11 +247,15 @@ static int migrate_paths(struct strbuf *src, struct strbuf *dst) > > > > for (i = 0; i < paths.nr; i++) { > > const char *name = paths.items[i].string; > > + enum finalize_object_file_flags flags_copy = flags; > > > > strbuf_addf(src, "/%s", name); > > strbuf_addf(dst, "/%s", name); > > > > - ret |= migrate_one(src, dst); > > + if (is_loose_object_shard(name)) > > + flags_copy |= FOF_SKIP_COLLISION_CHECK; > > + > > + ret |= migrate_one(src, dst, flags_copy); > > > > strbuf_setlen(src, src_len); > > strbuf_setlen(dst, dst_len); > > This looks pretty reasonable overall, though I'd note that > migrate_paths() is called recursively. So if I had: > > tmp-objdir-XXXXXX/pack/ab/foo.pack > > we'd skip the collision check. I'm not sure how much we want to worry > about that. I don't think we'd ever create such a path, and the general > form of the paths is all under local control, so an attacker can't > convince us to do so. And I find it pretty unlikely that we'd change the > on-disk layout to accidentally trigger this. I thought about this when writing it, and came to the conclusion that the checks we have for "are we in something that looks like a loose object shard?" are sane. That's only because we won't bother reading a pack in $GIT_DIR/objects/pack/xx/yy.pack, since we do not read packs recursively from the pack sub-directory. So I think that the diff you posted below isn't hurting anything, and the implementation looks correct to me, but I also don't think it's helping anything either as a consequence of the above. Thanks, Taylor
On Tue, Sep 24, 2024 at 02:32:21PM -0700, Junio C Hamano wrote: > > - We already use the path of the loose object as its hash value / > > object name, so checking for collisions at the content level doesn't > > add anything. > > "We already ... doesn't add anything" -> "The point of collision > check is to find an attempt to store different contents that hashes > down to the same object name, and we continue to use sha1dc to hash > the content name so such a collision would have already been > detected". Exactly. > > This is why we do not bother to check the inflated object contents > > for collisions either, since either (a) the object contents have the > > fingerprint of a SHA-1 collision, in which case the collision > > detecting SHA-1 implementation used to hash the contents to give us > > a path would have already rejected it, or (b) the contents are part > > of a colliding pair which does not bear the same fingerprints of > > known collision attacks, in which case we would not have caught it > > anyway. > > I do not understand (b). If you compare inflated contents, you > would find such a pair that sha1dc missed but the "implement > collision check" patch would have caught as a pair of byte sequences > that hash to the same value. Isn't it an opportunity to make it safer > to implement such a loose object collision check? All I was saying with (b) was that if you had some new collision attack that didn't trigger the existing detection mechanisms in the existing collision-detecting SHA-1 implementation, then you wouldn't notice the collision anyway. You could inflate the contents and compare them, but I don't think it would much matter at that point since you've already let one of them enter the repository, and it's not clear which is the "correct" one to keep since they both hash to the same value. Thanks, Taylor
On Tue, Sep 24, 2024 at 05:59:10PM -0400, Taylor Blau wrote: > > So the argument I'd make here is more like: this is the wrong place to > > do it. > > I think that is reasonable, and I agree with your reasoning here. I'm > happy to reword the commit message if you think doing so would be > useful, or drop the paragraph entirely if you think it's just confusing. > > Let me know what you think, I'm happy to do whatever here, reroll or not > :-). I'm content to let this live in the list archive, but it sounds like Junio had the same reaction, so it may be worth trying to rework the commit message a bit. > > In general, the use of unpack-objects versus index-pack is up to the > > attacker (based on pack size). So I think it would be nice if > > unpack-objects did the collision check. Even if the attacker beats > > you to writing the object, it would be nice to see "holy crap, there > > was a collision" instead of just silently throwing your pushed > > object away. > > Right, I agree that unpack-objects definitely should do the collision > check here. And indeed, that is the case, since that code (which all is > directly implemented in the builtin) uses the regular > collision-detecting SHA-1 implementation. It uses sha1dc, but what I mean by "collision check" is an additional belt-and-suspenders check. That even once we see an object which hashes to a particular sha1 which we already have, we'll do the byte-for-byte comparison. See index-pack's "check_collison". And that's what I think should be added to unpack-objects (but again, this is all orthogonal to your series). > I thought about this when writing it, and came to the conclusion that > the checks we have for "are we in something that looks like a loose > object shard?" are sane. That's only because we won't bother reading a > pack in $GIT_DIR/objects/pack/xx/yy.pack, since we do not read packs > recursively from the pack sub-directory. > > So I think that the diff you posted below isn't hurting anything, and > the implementation looks correct to me, but I also don't think it's > helping anything either as a consequence of the above. Right, it's purely about future-proofing against a hypothetical change where we'd be more inclusive (both in what we read, but also in what we'd ourselves write!). -Peff
On Tue, Sep 24, 2024 at 06:20:39PM -0400, Jeff King wrote: > On Tue, Sep 24, 2024 at 05:59:10PM -0400, Taylor Blau wrote: > > > > So the argument I'd make here is more like: this is the wrong place to > > > do it. > > > > I think that is reasonable, and I agree with your reasoning here. I'm > > happy to reword the commit message if you think doing so would be > > useful, or drop the paragraph entirely if you think it's just confusing. > > > > Let me know what you think, I'm happy to do whatever here, reroll or not > > :-). > > I'm content to let this live in the list archive, but it sounds like > Junio had the same reaction, so it may be worth trying to rework the > commit message a bit. Here's the relevant portion of my range-diff that has the new wording (which is more or less equivalent to your "this isn't the right place to do it, and we're not fundamentally changing anything from a security perspective here" argument). Let me know what you think: --- 8< --- 3: ed9eeef851 ! 3: 41d38352a4 finalize_object_file(): implement collision check @@ Commit message object name, so checking for collisions at the content level doesn't add anything. - This is why we do not bother to check the inflated object contents - for collisions either, since either (a) the object contents have the - fingerprint of a SHA-1 collision, in which case the collision - detecting SHA-1 implementation used to hash the contents to give us - a path would have already rejected it, or (b) the contents are part - of a colliding pair which does not bear the same fingerprints of - known collision attacks, in which case we would not have caught it - anyway. + Adding a content-level collision check would have to happen at a + higher level than in finalize_object_file(), since (avoiding race + conditions) writing an object loose which already exists in the + repository will prevent us from even reaching finalize_object_file() + via the object freshening code. + + There is a collision check in index-pack via its `check_collision()` + function, but there isn't an analogous function in unpack-objects, + which just feeds the result to write_object_file(). So skipping the collision check here does not change for better or worse the hardness of loose object writes. @@ Commit message Co-authored-by: Jeff King <peff@peff.net> Signed-off-by: Jeff King <peff@peff.net> + Helped-by: Elijah Newren <newren@gmail.com> Signed-off-by: Taylor Blau <me@ttaylorr.com> ## object-file.c ## --- >8 --- Thanks, Taylor
diff --git a/object-file.c b/object-file.c index 5a1da577115..b9a3a1f62db 100644 --- a/object-file.c +++ b/object-file.c @@ -1932,10 +1932,67 @@ static void write_object_file_prepare_literally(const struct git_hash_algo *algo hash_object_body(algo, &c, buf, len, oid, hdr, hdrlen); } +static int check_collision(const char *filename_a, const char *filename_b) +{ + char buf_a[4096], buf_b[4096]; + int fd_a = -1, fd_b = -1; + int ret = 0; + + fd_a = open(filename_a, O_RDONLY); + if (fd_a < 0) { + ret = error_errno(_("unable to open %s"), filename_a); + goto out; + } + + fd_b = open(filename_b, O_RDONLY); + if (fd_b < 0) { + ret = error_errno(_("unable to open %s"), filename_b); + goto out; + } + + while (1) { + ssize_t sz_a, sz_b; + + sz_a = read_in_full(fd_a, buf_a, sizeof(buf_a)); + if (sz_a < 0) { + ret = error_errno(_("unable to read %s"), filename_a); + goto out; + } + + sz_b = read_in_full(fd_b, buf_b, sizeof(buf_b)); + if (sz_b < 0) { + ret = error_errno(_("unable to read %s"), filename_b); + goto out; + } + + if (sz_a != sz_b || memcmp(buf_a, buf_b, sz_a)) { + ret = error(_("files '%s' and '%s' differ in contents"), + filename_a, filename_b); + goto out; + } + + if (sz_a < sizeof(buf_a)) + break; + } + +out: + if (fd_a > -1) + close(fd_a); + if (fd_b > -1) + close(fd_b); + return ret; +} + /* * Move the just written object into its final resting place. */ int finalize_object_file(const char *tmpfile, const char *filename) +{ + return finalize_object_file_flags(tmpfile, filename, 0); +} + +int finalize_object_file_flags(const char *tmpfile, const char *filename, + enum finalize_object_file_flags flags) { struct stat st; int ret = 0; @@ -1974,7 +2031,9 @@ int finalize_object_file(const char *tmpfile, const char *filename) errno = saved_errno; return error_errno(_("unable to write file %s"), filename); } - /* FIXME!!! Collision check here ? */ + if (!(flags & FOF_SKIP_COLLISION_CHECK) && + check_collision(tmpfile, filename)) + return -1; unlink_or_warn(tmpfile); } @@ -2228,7 +2287,8 @@ static int write_loose_object(const struct object_id *oid, char *hdr, warning_errno(_("failed utime() on %s"), tmp_file.buf); } - return finalize_object_file(tmp_file.buf, filename.buf); + return finalize_object_file_flags(tmp_file.buf, filename.buf, + FOF_SKIP_COLLISION_CHECK); } static int freshen_loose_object(const struct object_id *oid) @@ -2350,7 +2410,8 @@ int stream_loose_object(struct input_stream *in_stream, size_t len, strbuf_release(&dir); } - err = finalize_object_file(tmp_file.buf, filename.buf); + err = finalize_object_file_flags(tmp_file.buf, filename.buf, + FOF_SKIP_COLLISION_CHECK); if (!err && compat) err = repo_add_loose_object_map(the_repository, oid, &compat_oid); cleanup: diff --git a/object-file.h b/object-file.h index d6414610f80..81b30d269c8 100644 --- a/object-file.h +++ b/object-file.h @@ -117,7 +117,13 @@ int check_object_signature(struct repository *r, const struct object_id *oid, */ int stream_object_signature(struct repository *r, const struct object_id *oid); +enum finalize_object_file_flags { + FOF_SKIP_COLLISION_CHECK = 1, +}; + int finalize_object_file(const char *tmpfile, const char *filename); +int finalize_object_file_flags(const char *tmpfile, const char *filename, + enum finalize_object_file_flags flags); /* Helper to check and "touch" a file */ int check_and_freshen_file(const char *fn, int freshen); diff --git a/tmp-objdir.c b/tmp-objdir.c index c2fb9f91930..9da0071cba8 100644 --- a/tmp-objdir.c +++ b/tmp-objdir.c @@ -206,9 +206,11 @@ static int read_dir_paths(struct string_list *out, const char *path) return 0; } -static int migrate_paths(struct strbuf *src, struct strbuf *dst); +static int migrate_paths(struct strbuf *src, struct strbuf *dst, + enum finalize_object_file_flags flags); -static int migrate_one(struct strbuf *src, struct strbuf *dst) +static int migrate_one(struct strbuf *src, struct strbuf *dst, + enum finalize_object_file_flags flags) { struct stat st; @@ -220,12 +222,18 @@ static int migrate_one(struct strbuf *src, struct strbuf *dst) return -1; } else if (errno != EEXIST) return -1; - return migrate_paths(src, dst); + return migrate_paths(src, dst, flags); } - return finalize_object_file(src->buf, dst->buf); + return finalize_object_file_flags(src->buf, dst->buf, flags); } -static int migrate_paths(struct strbuf *src, struct strbuf *dst) +static int is_loose_object_shard(const char *name) +{ + return strlen(name) == 2 && isxdigit(name[0]) && isxdigit(name[1]); +} + +static int migrate_paths(struct strbuf *src, struct strbuf *dst, + enum finalize_object_file_flags flags) { size_t src_len = src->len, dst_len = dst->len; struct string_list paths = STRING_LIST_INIT_DUP; @@ -239,11 +247,15 @@ static int migrate_paths(struct strbuf *src, struct strbuf *dst) for (i = 0; i < paths.nr; i++) { const char *name = paths.items[i].string; + enum finalize_object_file_flags flags_copy = flags; strbuf_addf(src, "/%s", name); strbuf_addf(dst, "/%s", name); - ret |= migrate_one(src, dst); + if (is_loose_object_shard(name)) + flags_copy |= FOF_SKIP_COLLISION_CHECK; + + ret |= migrate_one(src, dst, flags_copy); strbuf_setlen(src, src_len); strbuf_setlen(dst, dst_len); @@ -271,7 +283,7 @@ int tmp_objdir_migrate(struct tmp_objdir *t) strbuf_addbuf(&src, &t->path); strbuf_addstr(&dst, repo_get_object_directory(the_repository)); - ret = migrate_paths(&src, &dst); + ret = migrate_paths(&src, &dst, 0); strbuf_release(&src); strbuf_release(&dst);