read-cache: avoid misaligned reads in index v4

Message ID	pull.1366.git.1663962236069.gitgitgadget@gmail.com (mailing list archive)
State	Superseded
Headers	show Return-Path: <git-owner@kernel.org> Message-Id: <pull.1366.git.1663962236069.gitgitgadget@gmail.com> Date: Fri, 23 Sep 2022 19:43:55 +0000 Subject: [PATCH] read-cache: avoid misaligned reads in index v4 Fcc: Sent Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit MIME-Version: 1.0 To: git@vger.kernel.org Cc: peff@peff.net, derrickstolee@github.com, gitster@pobox.com, Victoria Dye <vdye@github.com>, Victoria Dye <vdye@github.com> Precedence: bulk From: Victoria Dye <vdye@github.com>
Series	read-cache: avoid misaligned reads in index v4 \| expand read-cache: avoid misaligned reads in index v4

Victoria Dye Sept. 23, 2022, 7:43 p.m. UTC

From: Victoria Dye <vdye@github.com>

The process for reading the index into memory from disk is to first read its
contents into a single memory-mapped file buffer (type 'char *'), then
sequentially convert each on-disk index entry into a corresponding incore
'cache_entry'. To access the contents of the on-disk entry for processing, a
moving pointer within the memory-mapped file is cast to type 'struct
ondisk_cache_entry *'.

In index v4, the entries in the on-disk index file are written *without*
aligning their first byte to a 4-byte boundary; entries are a variable
length (depending on the entry name and whether or not extended flags are
used). As a result, casting the 'char *' buffer pointer to 'struct
ondisk_cache_entry *' then accessing its contents in a 'SANITIZE=undefined'
build can trigger the following error:

  read-cache.c:1886:46: runtime error: member access within misaligned
  address <address> for type 'struct ondisk_cache_entry', which requires 4
  byte alignment

Avoid this error by reading fields directly from the 'char *' buffer, using
the 'offsetof' individual fields in 'struct ondisk_cache_entry'.

Reported-by: Jeff King <peff@peff.net>
Signed-off-by: Victoria Dye <vdye@github.com>
---
    read-cache: avoid misaligned reads in index v4
    
    This fixes the bug reported in [1], where unaligned index entries in the
    memory-mapped index file triggered a 'SANITIZE=undefined' error due to
    casting to & accessing unaligned data a 4 byte-aligned 'struct
    ondisk_cache_entry *' type.
    
    In addition to the originally-reported 't9210-scalar.sh' now passing in
    a 'SANITIZE=undefined' build, I did some light testing by first writing
    a v4 index with a released version of Git (v2.37), then running some
    index-modifying operations ('git status', 'git add') with this patch's
    changes, then again running 'git status' with the stable version. I
    didn't see anything out of the ordinary but, considering how critical
    "reading the index" is, I'd very much appreciate some extra-thorough
    reviews on this patch. :)
    
    Thanks!
    
     * Victoria
    
    [1]
    https://lore.kernel.org/git/YywzNTzd72tox8Z+@coredump.intra.peff.net/

Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1366%2Fvdye%2Fbugfix%2Findex-v4-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1366/vdye/bugfix/index-v4-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/1366

 read-cache.c | 42 +++++++++++++++++++++++-------------------
 1 file changed, 23 insertions(+), 19 deletions(-)


base-commit: 1b3d6e17fe83eb6f79ffbac2f2c61bbf1eaef5f8

Jeff King Sept. 23, 2022, 9:39 p.m. UTC | #1

On Fri, Sep 23, 2022 at 07:43:55PM +0000, Victoria Dye via GitGitGadget wrote:

> Avoid this error by reading fields directly from the 'char *' buffer, using
> the 'offsetof' individual fields in 'struct ondisk_cache_entry'.

Thanks for moving this forward. I agree this should fix the alignment
problems, and I didn't see anything in the patch that would do the wrong
thing. I do have some style/technique suggestions, though.

> @@ -1883,7 +1883,7 @@ static struct cache_entry *create_from_disk(struct mem_pool *ce_mem_pool,
>  	size_t len;
>  	const char *name;
>  	const unsigned hashsz = the_hash_algo->rawsz;
> -	const uint16_t *flagsp = (const uint16_t *)(ondisk->data + hashsz);
> +	const char *flagsp = ondisk + offsetof(struct ondisk_cache_entry, data) + hashsz;

Now we use the "const char *" pointer instead of the cast to the
ondisk_cache_entry struct, which is good, and is what fixes the
alignment question.

But we also convert flagsp from being a uint16_t into a byte pointer.
I'm not sure if that's strictly necessary from an alignment perspective,
as we'd dereference it only via get_be16(), which handles alignment and
type conversion itself.

I'd imagine the standard probably says that even forming such a pointer
is illegal, so in that sense, it probably is undefined behavior. But I
think it's one of those things that's OK in practice.

That might be splitting hairs, but if you kept it as a uint16_t pointer,
then code like this:

> @@ -1901,15 +1901,15 @@ static struct cache_entry *create_from_disk(struct mem_pool *ce_mem_pool,
>  
>  	if (flags & CE_EXTENDED) {
>  		int extended_flags;
> -		extended_flags = get_be16(flagsp + 1) << 16;
> +		extended_flags = get_be16(flagsp + sizeof(uint16_t)) << 16;

doesn't need to be changed. I don't know if it's that big a deal either
way, though.

> @@ -1935,20 +1935,24 @@ static struct cache_entry *create_from_disk(struct mem_pool *ce_mem_pool,
>  
>  	ce = mem_pool__ce_alloc(ce_mem_pool, len);
>  
> -	ce->ce_stat_data.sd_ctime.sec = get_be32(&ondisk->ctime.sec);
> [...]
> +	ce->ce_stat_data.sd_ctime.sec = get_be32(ondisk + offsetof(struct ondisk_cache_entry, ctime)
> +							+ offsetof(struct cache_time, sec));

I had figured we'd be able to drop ondisk_cache_entry entirely. But here
you're using it essentially as a template for a set of constants
retrieved via offsetof().

That's OK from an alignment perspective. It does mean we'd be in trouble
if a compiler ever decided to introduce padding into the struct. That's
probably unlikely. We don't use __attribute__((packed)) because it's not
portable, and our existing uses have generally been OK, because our
data structures are organized around 8-byte alignment. We might have
problems on a theoretical 128-bit processor or something.

So I don't think this is a problem now, and unlikely to be in the near
future. But another way to do it would just be an actual set of offsets
(either #define or an enum). That maybe makes the intended use more
obvious, and also prevents people from accidentally misusing the struct.
I'm not sure if it's worth it for not.

It is a bit of a pain to write. Either you have magic numbers, or you
have to reference the offset and size of the previous entry:

  #define ONDISK_CACHE_CTIME 0
  #define ONDISK_CACHE_MTIME (ONDISK_CACHE_CTIME + sizeof(struct cache_time))
  #define ONDISK_CACHE_DEV (ONDISK_CACHE_MTIME + sizeof(struct cache_time))

Another strategy is to just parse left-to-right, advancing the byte
pointer. Like:

  ce->ce_state_data.sd_ctime.sec = get_be32(ondisk);
  ondisk += sizeof(uint32_t);
  ce->ce_state_data.sd_mtime.sec = get_be32(ondisk);
  ondisk += sizeof(uint32_t);
  ...etc...

You can even stick that in a helper function that does the get_b32() and
advances, so you know they're always done in sync. See pack-bitmap.c's
read_be32(), etc. IMHO this produces a nice result because the reading
code itself becomes the source of truth for the format.

But one tricky thing there is if you want to parse out of order. And it
does seem that we read the struct out of order in this case. But I don't
think there's any reason we need to do so. Of course reordering the
function would make the change much more invasive.

So all that said, I'm OK with this approach as the minimal fix, and then
we can think about further refactoring or cleanup on top.

One final note, though:

> +	ce->ce_stat_data.sd_mtime.sec = get_be32(ondisk + offsetof(struct ondisk_cache_entry, mtime)
> +							+ offsetof(struct cache_time, sec));

Here (and elsewhere), you can assume that the offsetof() "sec" in
cache_time is 0, for two reasons:

  - I didn't look up chapter and verse, but I'm pretty sure the standard
    does guarantee that the first field of a struct is at the beginning.

  - If there's any padding, this whole scheme is hosed anyway, because
    it means sizeof(cache_time) is bigger than we expect, which messes
    up the offsetof() the entry after us (in this case sd_dev).

So this can just be:

  ce->ce_stat_data.sd_mtime.sec = get_be32(ondisk + offsetof(struct ondisk_cache_entry, mtime));

which is mercifully shorter.

Assuming we dismiss the rest of what I said as not worth it for a
minimal fix, I do think that simplification is worth rolling a v2.

-Peff

PS BTW, I mentioned earlier "can we just get rid of ondisk_cache_entry".
   We also use it for the writing side, of course. That doesn't have
   alignment issues, but it does have the same "I hope there's never any
   padding" question. In an ideal world, it would be using the
   equivalent put_be32(), but again, that's getting out of the "minimal
   fix" territory.

Junio C Hamano Sept. 23, 2022, 10:04 p.m. UTC | #2

Jeff King <peff@peff.net> writes:

> Here (and elsewhere), you can assume that the offsetof() "sec" in
> cache_time is 0, for two reasons:
>
>   - I didn't look up chapter and verse, but I'm pretty sure the standard
>     does guarantee that the first field of a struct is at the beginning.

https://www.open-std.org/jtc1/sc22/wg14/www/docs/n1256.pdf

6.7.2.1 #13 (page 103)

    Within a structure object, the non-bit-field members and the
    units in which bit-fields reside have addresses that increase in
    the order in which they are declared. A pointer to a structure
    object, suitably converted, points to its initial member (or if
    that member is a bit-field, then to the unit in which it
    resides), and vice versa. There may be unnamed padding within a
    structure object, but not at its beginning.

As an initial padding is forbidden, the first member's offset is zero.

Phillip Wood Sept. 25, 2022, 8:25 a.m. UTC | #3

Hi Victoria

On 23/09/2022 20:43, Victoria Dye via GitGitGadget wrote:
> From: Victoria Dye <vdye@github.com>
> 
> The process for reading the index into memory from disk is to first read its
> contents into a single memory-mapped file buffer (type 'char *'), then
> sequentially convert each on-disk index entry into a corresponding incore
> 'cache_entry'. To access the contents of the on-disk entry for processing, a
> moving pointer within the memory-mapped file is cast to type 'struct
> ondisk_cache_entry *'.
> 
> In index v4, the entries in the on-disk index file are written *without*
> aligning their first byte to a 4-byte boundary; entries are a variable
> length (depending on the entry name and whether or not extended flags are
> used). As a result, casting the 'char *' buffer pointer to 'struct
> ondisk_cache_entry *' then accessing its contents in a 'SANITIZE=undefined'
> build can trigger the following error:
> 
>    read-cache.c:1886:46: runtime error: member access within misaligned
>    address <address> for type 'struct ondisk_cache_entry', which requires 4
>    byte alignment
> 
> Avoid this error by reading fields directly from the 'char *' buffer, using
> the 'offsetof' individual fields in 'struct ondisk_cache_entry'.

I was confused as to why this was safe as it means we're still reading 
the individual fields from unaligned addresses. The reason it is safe is 
that we use get_bexx() to read the fields and those functions handle 
unaligned access. I wonder if it is worth adding a note to clarify that 
if you re-roll.

Best Wishes

Phillip
> 
> Reported-by: Jeff King <peff@peff.net>
> Signed-off-by: Victoria Dye <vdye@github.com>
> ---
>      read-cache: avoid misaligned reads in index v4
>      
>      This fixes the bug reported in [1], where unaligned index entries in the
>      memory-mapped index file triggered a 'SANITIZE=undefined' error due to
>      casting to & accessing unaligned data a 4 byte-aligned 'struct
>      ondisk_cache_entry *' type.
>      
>      In addition to the originally-reported 't9210-scalar.sh' now passing in
>      a 'SANITIZE=undefined' build, I did some light testing by first writing
>      a v4 index with a released version of Git (v2.37), then running some
>      index-modifying operations ('git status', 'git add') with this patch's
>      changes, then again running 'git status' with the stable version. I
>      didn't see anything out of the ordinary but, considering how critical
>      "reading the index" is, I'd very much appreciate some extra-thorough
>      reviews on this patch. :)
>      
>      Thanks!
>      
>       * Victoria
>      
>      [1]
>      https://lore.kernel.org/git/YywzNTzd72tox8Z+@coredump.intra.peff.net/
> 
> Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1366%2Fvdye%2Fbugfix%2Findex-v4-v1
> Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1366/vdye/bugfix/index-v4-v1
> Pull-Request: https://github.com/gitgitgadget/git/pull/1366
> 
>   read-cache.c | 42 +++++++++++++++++++++++-------------------
>   1 file changed, 23 insertions(+), 19 deletions(-)
> 
> diff --git a/read-cache.c b/read-cache.c
> index b09128b1884..d16eb979060 100644
> --- a/read-cache.c
> +++ b/read-cache.c
> @@ -1875,7 +1875,7 @@ static int read_index_extension(struct index_state *istate,
>   
>   static struct cache_entry *create_from_disk(struct mem_pool *ce_mem_pool,
>   					    unsigned int version,
> -					    struct ondisk_cache_entry *ondisk,
> +					    const char *ondisk,
>   					    unsigned long *ent_size,
>   					    const struct cache_entry *previous_ce)
>   {
> @@ -1883,7 +1883,7 @@ static struct cache_entry *create_from_disk(struct mem_pool *ce_mem_pool,
>   	size_t len;
>   	const char *name;
>   	const unsigned hashsz = the_hash_algo->rawsz;
> -	const uint16_t *flagsp = (const uint16_t *)(ondisk->data + hashsz);
> +	const char *flagsp = ondisk + offsetof(struct ondisk_cache_entry, data) + hashsz;
>   	unsigned int flags;
>   	size_t copy_len = 0;
>   	/*
> @@ -1901,15 +1901,15 @@ static struct cache_entry *create_from_disk(struct mem_pool *ce_mem_pool,
>   
>   	if (flags & CE_EXTENDED) {
>   		int extended_flags;
> -		extended_flags = get_be16(flagsp + 1) << 16;
> +		extended_flags = get_be16(flagsp + sizeof(uint16_t)) << 16;
>   		/* We do not yet understand any bit out of CE_EXTENDED_FLAGS */
>   		if (extended_flags & ~CE_EXTENDED_FLAGS)
>   			die(_("unknown index entry format 0x%08x"), extended_flags);
>   		flags |= extended_flags;
> -		name = (const char *)(flagsp + 2);
> +		name = (const char *)(flagsp + 2 * sizeof(uint16_t));
>   	}
>   	else
> -		name = (const char *)(flagsp + 1);
> +		name = (const char *)(flagsp + sizeof(uint16_t));
>   
>   	if (expand_name_field) {
>   		const unsigned char *cp = (const unsigned char *)name;
> @@ -1935,20 +1935,24 @@ static struct cache_entry *create_from_disk(struct mem_pool *ce_mem_pool,
>   
>   	ce = mem_pool__ce_alloc(ce_mem_pool, len);
>   
> -	ce->ce_stat_data.sd_ctime.sec = get_be32(&ondisk->ctime.sec);
> -	ce->ce_stat_data.sd_mtime.sec = get_be32(&ondisk->mtime.sec);
> -	ce->ce_stat_data.sd_ctime.nsec = get_be32(&ondisk->ctime.nsec);
> -	ce->ce_stat_data.sd_mtime.nsec = get_be32(&ondisk->mtime.nsec);
> -	ce->ce_stat_data.sd_dev   = get_be32(&ondisk->dev);
> -	ce->ce_stat_data.sd_ino   = get_be32(&ondisk->ino);
> -	ce->ce_mode  = get_be32(&ondisk->mode);
> -	ce->ce_stat_data.sd_uid   = get_be32(&ondisk->uid);
> -	ce->ce_stat_data.sd_gid   = get_be32(&ondisk->gid);
> -	ce->ce_stat_data.sd_size  = get_be32(&ondisk->size);
> +	ce->ce_stat_data.sd_ctime.sec = get_be32(ondisk + offsetof(struct ondisk_cache_entry, ctime)
> +							+ offsetof(struct cache_time, sec));
> +	ce->ce_stat_data.sd_mtime.sec = get_be32(ondisk + offsetof(struct ondisk_cache_entry, mtime)
> +							+ offsetof(struct cache_time, sec));
> +	ce->ce_stat_data.sd_ctime.nsec = get_be32(ondisk + offsetof(struct ondisk_cache_entry, ctime)
> +							 + offsetof(struct cache_time, nsec));
> +	ce->ce_stat_data.sd_mtime.nsec = get_be32(ondisk + offsetof(struct ondisk_cache_entry, mtime)
> +							 + offsetof(struct cache_time, nsec));
> +	ce->ce_stat_data.sd_dev   = get_be32(ondisk + offsetof(struct ondisk_cache_entry, dev));
> +	ce->ce_stat_data.sd_ino   = get_be32(ondisk + offsetof(struct ondisk_cache_entry, ino));
> +	ce->ce_mode  = get_be32(ondisk + offsetof(struct ondisk_cache_entry, mode));
> +	ce->ce_stat_data.sd_uid   = get_be32(ondisk + offsetof(struct ondisk_cache_entry, uid));
> +	ce->ce_stat_data.sd_gid   = get_be32(ondisk + offsetof(struct ondisk_cache_entry, gid));
> +	ce->ce_stat_data.sd_size  = get_be32(ondisk + offsetof(struct ondisk_cache_entry, size));
>   	ce->ce_flags = flags & ~CE_NAMEMASK;
>   	ce->ce_namelen = len;
>   	ce->index = 0;
> -	oidread(&ce->oid, ondisk->data);
> +	oidread(&ce->oid, (const unsigned char *)ondisk + offsetof(struct ondisk_cache_entry, data));
>   
>   	if (expand_name_field) {
>   		if (copy_len)
> @@ -2117,12 +2121,12 @@ static unsigned long load_cache_entry_block(struct index_state *istate,
>   	unsigned long src_offset = start_offset;
>   
>   	for (i = offset; i < offset + nr; i++) {
> -		struct ondisk_cache_entry *disk_ce;
>   		struct cache_entry *ce;
>   		unsigned long consumed;
>   
> -		disk_ce = (struct ondisk_cache_entry *)(mmap + src_offset);
> -		ce = create_from_disk(ce_mem_pool, istate->version, disk_ce, &consumed, previous_ce);
> +		ce = create_from_disk(ce_mem_pool, istate->version,
> +				      mmap + src_offset,
> +				      &consumed, previous_ce);
>   		set_index_entry(istate, i, ce);
>   
>   		src_offset += consumed;
> 
> base-commit: 1b3d6e17fe83eb6f79ffbac2f2c61bbf1eaef5f8

Victoria Dye Sept. 26, 2022, 3:39 p.m. UTC | #4

Jeff King wrote:
> On Fri, Sep 23, 2022 at 07:43:55PM +0000, Victoria Dye via GitGitGadget wrote:
>> @@ -1883,7 +1883,7 @@ static struct cache_entry *create_from_disk(struct mem_pool *ce_mem_pool,
>>  	size_t len;
>>  	const char *name;
>>  	const unsigned hashsz = the_hash_algo->rawsz;
>> -	const uint16_t *flagsp = (const uint16_t *)(ondisk->data + hashsz);
>> +	const char *flagsp = ondisk + offsetof(struct ondisk_cache_entry, data) + hashsz;
> 
> Now we use the "const char *" pointer instead of the cast to the
> ondisk_cache_entry struct, which is good, and is what fixes the
> alignment question.
> 
> But we also convert flagsp from being a uint16_t into a byte pointer.
> I'm not sure if that's strictly necessary from an alignment perspective,
> as we'd dereference it only via get_be16(), which handles alignment and
> type conversion itself.
> 
> I'd imagine the standard probably says that even forming such a pointer
> is illegal, so in that sense, it probably is undefined behavior. But I
> think it's one of those things that's OK in practice.

Yep, per the C standard §6.3.2.3 #7 [1]:

  A pointer to an object or incomplete type may be converted to a pointer to
  a different object or incomplete type. If the resulting pointer is not
  correctly aligned for the pointed-to type, the behavior is undefined.

To your point, it is probably fine in practice, but I'd lean towards
sticking with a 'char *' to play it safe.

[1] https://www.open-std.org/JTC1/SC22/WG14/www/docs/n1256.pdf

>> @@ -1935,20 +1935,24 @@ static struct cache_entry *create_from_disk(struct mem_pool *ce_mem_pool,
>>  
>>  	ce = mem_pool__ce_alloc(ce_mem_pool, len);
>>  
>> -	ce->ce_stat_data.sd_ctime.sec = get_be32(&ondisk->ctime.sec);
>> [...]
>> +	ce->ce_stat_data.sd_ctime.sec = get_be32(ondisk + offsetof(struct ondisk_cache_entry, ctime)
>> +							+ offsetof(struct cache_time, sec));
> 
> I had figured we'd be able to drop ondisk_cache_entry entirely. But here
> you're using it essentially as a template for a set of constants
> retrieved via offsetof().
> 
> That's OK from an alignment perspective. It does mean we'd be in trouble
> if a compiler ever decided to introduce padding into the struct. That's
> probably unlikely. We don't use __attribute__((packed)) because it's not
> portable, and our existing uses have generally been OK, because our
> data structures are organized around 8-byte alignment. We might have
> problems on a theoretical 128-bit processor or something.

In addition to portability, using '__attribute__((packed))' could hurt
performance (and, in a large index, that might have a noticeable effect).

As for dropping 'ondisk_cache_entry()', I didn't want to drop it only from
the "read" operation (and use something like the "parse left-to-right"
strategy below) while leaving it in "write." And, as you mentioned later,
changing 'ce_write_entry()' is a lot more invasive than what's already in
this patch and possibly out-of-scope.

> Another strategy is to just parse left-to-right, advancing the byte
> pointer. Like:
> 
>   ce->ce_state_data.sd_ctime.sec = get_be32(ondisk);
>   ondisk += sizeof(uint32_t);
>   ce->ce_state_data.sd_mtime.sec = get_be32(ondisk);
>   ondisk += sizeof(uint32_t);
>   ...etc...
> 
> You can even stick that in a helper function that does the get_b32() and
> advances, so you know they're always done in sync. See pack-bitmap.c's
> read_be32(), etc. IMHO this produces a nice result because the reading
> code itself becomes the source of truth for the format.
> 

...

> One final note, though:
> 
>> +	ce->ce_stat_data.sd_mtime.sec = get_be32(ondisk + offsetof(struct ondisk_cache_entry, mtime)
>> +							+ offsetof(struct cache_time, sec));
> 
> Here (and elsewhere), you can assume that the offsetof() "sec" in
> cache_time is 0, for two reasons:
> 
>   - I didn't look up chapter and verse, but I'm pretty sure the standard
>     does guarantee that the first field of a struct is at the beginning.
> 
>   - If there's any padding, this whole scheme is hosed anyway, because
>     it means sizeof(cache_time) is bigger than we expect, which messes
>     up the offsetof() the entry after us (in this case sd_dev).
> 
> So this can just be:
> 
>   ce->ce_stat_data.sd_mtime.sec = get_be32(ondisk + offsetof(struct ondisk_cache_entry, mtime));
> 
> which is mercifully shorter.
> 
> Assuming we dismiss the rest of what I said as not worth it for a
> minimal fix, I do think that simplification is worth rolling a v2.

That makes sense from a technical perspective, but I included the starting
entry offset for readability reasons. It might be confusing to someone
unfamiliar with C struct memory alignment to see every other 'get_be32'
refer to the exact entry it's reading via the 'offsetof()', but have that
information absent only for a few entries. And, the double 'offsetof()'
would still be used by the 'mtime.nsec'/'ctime.nsec' fields anyway.

In any case, if this patch is intended to be a short-lived change on the way
to a more complete refactor and/or I'm being overzealous on the readability,
I'd be happy to change it. :) 

Thanks!

> 
> -Peff
> 
> PS BTW, I mentioned earlier "can we just get rid of ondisk_cache_entry".
>    We also use it for the writing side, of course. That doesn't have
>    alignment issues, but it does have the same "I hope there's never any
>    padding" question. In an ideal world, it would be using the
>    equivalent put_be32(), but again, that's getting out of the "minimal
>    fix" territory.

Jeff King Sept. 26, 2022, 5:35 p.m. UTC | #5

On Mon, Sep 26, 2022 at 08:39:10AM -0700, Victoria Dye wrote:

> > So this can just be:
> > 
> >   ce->ce_stat_data.sd_mtime.sec = get_be32(ondisk + offsetof(struct ondisk_cache_entry, mtime));
> > 
> > which is mercifully shorter.
> > 
> > Assuming we dismiss the rest of what I said as not worth it for a
> > minimal fix, I do think that simplification is worth rolling a v2.
> 
> That makes sense from a technical perspective, but I included the starting
> entry offset for readability reasons. It might be confusing to someone
> unfamiliar with C struct memory alignment to see every other 'get_be32'
> refer to the exact entry it's reading via the 'offsetof()', but have that
> information absent only for a few entries. And, the double 'offsetof()'
> would still be used by the 'mtime.nsec'/'ctime.nsec' fields anyway.

Ah, right, I wasn't looking close enough. I was thinking that you were
reading the whole struct via a single function call, but of course that
is not true with get_be32(), and the nsec loads just below make that
obvious.

> In any case, if this patch is intended to be a short-lived change on the way
> to a more complete refactor and/or I'm being overzealous on the readability,
> I'd be happy to change it. :) 

No, I was just mis-reading it. I think what you've got here is a good
stopping point to fix the immediate problem.

-Peff

Junio C Hamano Sept. 26, 2022, 5:54 p.m. UTC | #6

Phillip Wood <phillip.wood123@gmail.com> writes:

> I was confused as to why this was safe as it means we're still reading
> the individual fields from unaligned addresses. The reason it is safe
> is that we use get_bexx() to read the fields and those functions
> handle unaligned access. I wonder if it is worth adding a note to
> clarify that if you re-roll.

I've got so used to ours that gives unaligned accesses, but I guess
in some other circles, get_XeYY() means reading YY bit-wide integer
in X byte-order from a preperly aligned address, and if that is the
case, I do not mind such a comment somewhere.  It would help those
coming from such a background very much.

Thanks.

Jeff King Sept. 26, 2022, 7:08 p.m. UTC | #7

On Fri, Sep 23, 2022 at 05:39:28PM -0400, Jeff King wrote:

> Another strategy is to just parse left-to-right, advancing the byte
> pointer. Like:
> 
>   ce->ce_state_data.sd_ctime.sec = get_be32(ondisk);
>   ondisk += sizeof(uint32_t);
>   ce->ce_state_data.sd_mtime.sec = get_be32(ondisk);
>   ondisk += sizeof(uint32_t);
>   ...etc...
> 
> You can even stick that in a helper function that does the get_b32() and
> advances, so you know they're always done in sync. See pack-bitmap.c's
> read_be32(), etc. IMHO this produces a nice result because the reading
> code itself becomes the source of truth for the format.
> 
> But one tricky thing there is if you want to parse out of order. And it
> does seem that we read the struct out of order in this case. But I don't
> think there's any reason we need to do so. Of course reordering the
> function would make the change much more invasive.

By the way, this last paragraph turns out not to be true. We do rely on
the order because we need to know the length of the name (retrieved from
the flags field) in order to allocate the internal ce_entry, which has a
FLEX_ARRAY. And we must allocate the struct before populating its fields
from the earlier bytes of the ondisk entry.

So we either need to go out of order, or parse into a dummy ce_entry and
then memcpy the results into the heap-allocated one.

You can still do a partial conversion as below, which I do think
improves readability, but without getting rid of the match for the
flagsp pointer, it feels like it may not be accomplishing enough to be
worth it.

Note also that there is some confusion with signed vs unsigned pointers.
It doesn't really matter in practice because get_be* is casting under
the hood, but the compiler is picky here. Arguably read_be32() should
take a void (just like get_be32() does). But I do find it a bit odd that
all of the index code uses a signed pointer for the mmap. Most of our
other code uses "const unsigned char *" to indicate that we expect
binary data. We could switch over, but it's a rather invasive patch. And
while we get rid of some casts (e.g., when we call oidread()), we'd gain
some new ones (some code uses strtol() to parse ascii numbers).

In the patch below I hacked around it by passing through a local void
pointer. ;)

diff --git a/read-cache.c b/read-cache.c
index d16eb97906..8668ded8f5 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -1875,17 +1875,20 @@ static int read_index_extension(struct index_state *istate,
 
 static struct cache_entry *create_from_disk(struct mem_pool *ce_mem_pool,
 					    unsigned int version,
-					    const char *ondisk,
+					    const void *ondisk_map,
 					    unsigned long *ent_size,
 					    const struct cache_entry *previous_ce)
 {
+	const unsigned char *ondisk = ondisk_map;
 	struct cache_entry *ce;
 	size_t len;
 	const char *name;
 	const unsigned hashsz = the_hash_algo->rawsz;
-	const char *flagsp = ondisk + offsetof(struct ondisk_cache_entry, data) + hashsz;
+	const unsigned char *flagsp = ondisk + offsetof(struct ondisk_cache_entry, data) + hashsz;
 	unsigned int flags;
 	size_t copy_len = 0;
+	size_t pos;
+
 	/*
 	 * Adjacent cache entries tend to share the leading paths, so it makes
 	 * sense to only store the differences in later entries.  In the v4
@@ -1935,24 +1938,21 @@ static struct cache_entry *create_from_disk(struct mem_pool *ce_mem_pool,
 
 	ce = mem_pool__ce_alloc(ce_mem_pool, len);
 
-	ce->ce_stat_data.sd_ctime.sec = get_be32(ondisk + offsetof(struct ondisk_cache_entry, ctime)
-							+ offsetof(struct cache_time, sec));
-	ce->ce_stat_data.sd_mtime.sec = get_be32(ondisk + offsetof(struct ondisk_cache_entry, mtime)
-							+ offsetof(struct cache_time, sec));
-	ce->ce_stat_data.sd_ctime.nsec = get_be32(ondisk + offsetof(struct ondisk_cache_entry, ctime)
-							 + offsetof(struct cache_time, nsec));
-	ce->ce_stat_data.sd_mtime.nsec = get_be32(ondisk + offsetof(struct ondisk_cache_entry, mtime)
-							 + offsetof(struct cache_time, nsec));
-	ce->ce_stat_data.sd_dev   = get_be32(ondisk + offsetof(struct ondisk_cache_entry, dev));
-	ce->ce_stat_data.sd_ino   = get_be32(ondisk + offsetof(struct ondisk_cache_entry, ino));
-	ce->ce_mode  = get_be32(ondisk + offsetof(struct ondisk_cache_entry, mode));
-	ce->ce_stat_data.sd_uid   = get_be32(ondisk + offsetof(struct ondisk_cache_entry, uid));
-	ce->ce_stat_data.sd_gid   = get_be32(ondisk + offsetof(struct ondisk_cache_entry, gid));
-	ce->ce_stat_data.sd_size  = get_be32(ondisk + offsetof(struct ondisk_cache_entry, size));
+	pos = 0;
+	ce->ce_stat_data.sd_ctime.sec = read_be32(ondisk, &pos);
+	ce->ce_stat_data.sd_ctime.nsec = read_be32(ondisk, &pos);
+	ce->ce_stat_data.sd_mtime.sec = read_be32(ondisk, &pos);
+	ce->ce_stat_data.sd_mtime.nsec = read_be32(ondisk, &pos);
+	ce->ce_stat_data.sd_dev   = read_be32(ondisk, &pos);
+	ce->ce_stat_data.sd_ino   = read_be32(ondisk, &pos);
+	ce->ce_mode  = read_be32(ondisk, &pos);
+	ce->ce_stat_data.sd_uid   = read_be32(ondisk, &pos);
+	ce->ce_stat_data.sd_gid   = read_be32(ondisk, &pos);
+	ce->ce_stat_data.sd_size  = read_be32(ondisk, &pos);
 	ce->ce_flags = flags & ~CE_NAMEMASK;
 	ce->ce_namelen = len;
 	ce->index = 0;
-	oidread(&ce->oid, (const unsigned char *)ondisk + offsetof(struct ondisk_cache_entry, data));
+	oidread(&ce->oid, ondisk + pos);
 
 	if (expand_name_field) {
 		if (copy_len)

Jeff King Sept. 26, 2022, 7:31 p.m. UTC | #8

On Mon, Sep 26, 2022 at 03:08:28PM -0400, Jeff King wrote:

> So we either need to go out of order, or parse into a dummy ce_entry and
> then memcpy the results into the heap-allocated one.

Here's the "dummy" version. I did this mostly to satisfy my own
curiosity, and am sharing so the effort isn't lost. If you are ready to
move on from the topic, don't feel compelled to read or respond. :)

I do find it a bit more straight-forward, but the extra copy is ugly,
plus the "yuck" comment.

diff --git a/read-cache.c b/read-cache.c
index d16eb97906..8773f833bb 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -1875,17 +1875,19 @@ static int read_index_extension(struct index_state *istate,
 
 static struct cache_entry *create_from_disk(struct mem_pool *ce_mem_pool,
 					    unsigned int version,
-					    const char *ondisk,
+					    const void *ondisk_map,
 					    unsigned long *ent_size,
 					    const struct cache_entry *previous_ce)
 {
+	const unsigned char *ondisk = ondisk_map;
+	struct cache_entry pe; /* parsed entry; not sized to hold name */
 	struct cache_entry *ce;
 	size_t len;
-	const char *name;
 	const unsigned hashsz = the_hash_algo->rawsz;
-	const char *flagsp = ondisk + offsetof(struct ondisk_cache_entry, data) + hashsz;
 	unsigned int flags;
 	size_t copy_len = 0;
+	size_t pos = 0;
+
 	/*
 	 * Adjacent cache entries tend to share the leading paths, so it makes
 	 * sense to only store the differences in later entries.  In the v4
@@ -1895,24 +1897,35 @@ static struct cache_entry *create_from_disk(struct mem_pool *ce_mem_pool,
 	 */
 	int expand_name_field = version == 4;
 
-	/* On-disk flags are just 16 bits */
-	flags = get_be16(flagsp);
+	pe.ce_stat_data.sd_ctime.sec = read_be32(ondisk, &pos);
+	pe.ce_stat_data.sd_ctime.nsec = read_be32(ondisk, &pos);
+	pe.ce_stat_data.sd_mtime.sec = read_be32(ondisk, &pos);
+	pe.ce_stat_data.sd_mtime.nsec = read_be32(ondisk, &pos);
+	pe.ce_stat_data.sd_dev   = read_be32(ondisk, &pos);
+	pe.ce_stat_data.sd_ino   = read_be32(ondisk, &pos);
+	pe.ce_mode  = read_be32(ondisk, &pos);
+	pe.ce_stat_data.sd_uid   = read_be32(ondisk, &pos);
+	pe.ce_stat_data.sd_gid   = read_be32(ondisk, &pos);
+	pe.ce_stat_data.sd_size  = read_be32(ondisk, &pos);
+
+	oidread(&pe.oid, ondisk + pos);
+	pos += hashsz;
+
+	flags = read_be16(ondisk, &pos);
 	len = flags & CE_NAMEMASK;
 
 	if (flags & CE_EXTENDED) {
 		int extended_flags;
-		extended_flags = get_be16(flagsp + sizeof(uint16_t)) << 16;
+		extended_flags = read_be16(ondisk, &pos) << 16;
 		/* We do not yet understand any bit out of CE_EXTENDED_FLAGS */
 		if (extended_flags & ~CE_EXTENDED_FLAGS)
 			die(_("unknown index entry format 0x%08x"), extended_flags);
 		flags |= extended_flags;
-		name = (const char *)(flagsp + 2 * sizeof(uint16_t));
 	}
-	else
-		name = (const char *)(flagsp + sizeof(uint16_t));
+	pe.ce_flags = flags & ~CE_NAMEMASK;
 
 	if (expand_name_field) {
-		const unsigned char *cp = (const unsigned char *)name;
+		const unsigned char *cp = ondisk + pos;
 		size_t strip_len, previous_len;
 
 		/* If we're at the beginning of a block, ignore the previous name */
@@ -1924,43 +1937,29 @@ static struct cache_entry *create_from_disk(struct mem_pool *ce_mem_pool,
 					previous_ce->name);
 			copy_len = previous_len - strip_len;
 		}
-		name = (const char *)cp;
+		pos = cp - ondisk;
 	}
 
 	if (len == CE_NAMEMASK) {
-		len = strlen(name);
+		len = strlen((const char *)ondisk + pos);
 		if (expand_name_field)
 			len += copy_len;
 	}
 
 	ce = mem_pool__ce_alloc(ce_mem_pool, len);
-
-	ce->ce_stat_data.sd_ctime.sec = get_be32(ondisk + offsetof(struct ondisk_cache_entry, ctime)
-							+ offsetof(struct cache_time, sec));
-	ce->ce_stat_data.sd_mtime.sec = get_be32(ondisk + offsetof(struct ondisk_cache_entry, mtime)
-							+ offsetof(struct cache_time, sec));
-	ce->ce_stat_data.sd_ctime.nsec = get_be32(ondisk + offsetof(struct ondisk_cache_entry, ctime)
-							 + offsetof(struct cache_time, nsec));
-	ce->ce_stat_data.sd_mtime.nsec = get_be32(ondisk + offsetof(struct ondisk_cache_entry, mtime)
-							 + offsetof(struct cache_time, nsec));
-	ce->ce_stat_data.sd_dev   = get_be32(ondisk + offsetof(struct ondisk_cache_entry, dev));
-	ce->ce_stat_data.sd_ino   = get_be32(ondisk + offsetof(struct ondisk_cache_entry, ino));
-	ce->ce_mode  = get_be32(ondisk + offsetof(struct ondisk_cache_entry, mode));
-	ce->ce_stat_data.sd_uid   = get_be32(ondisk + offsetof(struct ondisk_cache_entry, uid));
-	ce->ce_stat_data.sd_gid   = get_be32(ondisk + offsetof(struct ondisk_cache_entry, gid));
-	ce->ce_stat_data.sd_size  = get_be32(ondisk + offsetof(struct ondisk_cache_entry, size));
-	ce->ce_flags = flags & ~CE_NAMEMASK;
-	ce->ce_namelen = len;
+	memcpy(ce, &pe, sizeof(pe));
 	ce->index = 0;
-	oidread(&ce->oid, (const unsigned char *)ondisk + offsetof(struct ondisk_cache_entry, data));
+	ce->ce_namelen = len;
+	/* yuck, our memcpy overwrote this */
+	ce->mem_pool_allocated = 1;
 
 	if (expand_name_field) {
 		if (copy_len)
 			memcpy(ce->name, previous_ce->name, copy_len);
-		memcpy(ce->name + copy_len, name, len + 1 - copy_len);
-		*ent_size = (name - ((char *)ondisk)) + len + 1 - copy_len;
+		memcpy(ce->name + copy_len, ondisk + pos, len + 1 - copy_len);
+		*ent_size = pos + len + 1 - copy_len;
 	} else {
-		memcpy(ce->name, name, len + 1);
+		memcpy(ce->name, ondisk + pos, len + 1);
 		*ent_size = ondisk_ce_size(ce);
 	}
 	return ce;

Junio C Hamano Sept. 26, 2022, 11:02 p.m. UTC | #9

Jeff King <peff@peff.net> writes:

> I do find it a bit more straight-forward, but the extra copy is ugly,
> plus the "yuck" comment.

Yeah, the "yuck" memcpy() is, eh, unfortunate.  Consistently using a
single "pointer" (pos) to keep track of where we are does make the
structure of the code appear uniform and easier to follow, though.

read-cache: avoid misaligned reads in index v4

Commit Message

Comments

Patch