diff mbox series

[v2,01/13] hash: add an algo member to struct object_id

Message ID 20210426010301.1093562-2-sandals@crustytoothpaste.net (mailing list archive)
State New, archived
Headers show
Series SHA-256 / SHA-1 interop, part 1 | expand

Commit Message

brian m. carlson April 26, 2021, 1:02 a.m. UTC
Now that we're working with multiple hash algorithms in the same repo,
it's best if we label each object ID with its algorithm so we can
determine how to format a given object ID. Add a member called algo to
struct object_id.

Performance testing on object ID-heavy workloads doesn't reveal a clear
change in performance.  Out of performance tests t0001 and t1450, there
are slight variations in performance both up and down, but all
measurements are within the margin of error.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 hash.h | 1 +
 1 file changed, 1 insertion(+)

Comments

Matheus Tavares May 7, 2021, 1:58 p.m. UTC | #1
Hi, brian

On Sun, Apr 25, 2021 at 10:03 PM brian m. carlson
<sandals@crustytoothpaste.net> wrote:
>
> Now that we're working with multiple hash algorithms in the same repo,
> it's best if we label each object ID with its algorithm so we can
> determine how to format a given object ID. Add a member called algo to
> struct object_id.

In parallel-checkout.c:send_one_item(), I used hashcpy() instead of
oidcpy() to prepare the packet data that is sent to the checkout
workers through a pipe.

I avoided oidcpy() because it would copy the whole GIT_MAX_RAWSZ
bytes, and the last part could be uninitialized, leading to a Valgrind
warning about passing uninitialized bytes to a write() syscall. There
is no real harm in this case, but I wanted to avoid the warning as it
might confuse someone trying to debug this code, me included.

The problem with this approach, of course, is that it will not copy
the new `algo` field, leaving it as zero for all items. So, what do
you think would be best in this situation? Some ideas that came
through my mind were:

1. Make oidcpy() only copy `hash_algos[src->algo].rawsz` bytes. (But
then we would probably need to branch in case `src->algo` is zero,
right?)

2. Reintroduce the oid_pad_buffer() function from your v1, and use it
in parallel-checkout.c:send_one_item(), after oidcpy(). This would
then zero out the copied uninitialized bytes (with the cost of one
additional memcpy() per item, but this might be neglectable here).

3. Use oidcpy() as-is, without additional padding, and let Valgrind
warn. This false-positive warn might not be so problematic after all,
and maybe I'm just overthinking things :)

What do you think?

Thanks,
Matheus
brian m. carlson May 7, 2021, 8:07 p.m. UTC | #2
On 2021-05-07 at 13:58:42, Matheus Tavares Bernardino wrote:
> Hi, brian

Hey,

> 1. Make oidcpy() only copy `hash_algos[src->algo].rawsz` bytes. (But
> then we would probably need to branch in case `src->algo` is zero,
> right?)

Yeah, this will likely incur a performance cost.  I'd recommend avoiding
this if possible.

> 2. Reintroduce the oid_pad_buffer() function from your v1, and use it
> in parallel-checkout.c:send_one_item(), after oidcpy(). This would
> then zero out the copied uninitialized bytes (with the cost of one
> additional memcpy() per item, but this might be neglectable here).

This is fine with me.  I didn't have a use for it anymore, but you've
clearly found one, and I think this is probably the best approach here.

> 3. Use oidcpy() as-is, without additional padding, and let Valgrind
> warn. This false-positive warn might not be so problematic after all,
> and maybe I'm just overthinking things :)

I'm okay with this, but I don't know if the other end is security
sensitive and might need unused data zeroed.  If so, we should
definitely avoid this option.
diff mbox series

Patch

diff --git a/hash.h b/hash.h
index 3fb0c3d400..dafdcb3335 100644
--- a/hash.h
+++ b/hash.h
@@ -181,6 +181,7 @@  static inline int hash_algo_by_ptr(const struct git_hash_algo *p)
 
 struct object_id {
 	unsigned char hash[GIT_MAX_RAWSZ];
+	int algo;
 };
 
 #define the_hash_algo the_repository->hash_algo