diff mbox series

[v2,15/16] packed: no out-of-bound access of packed bitfields

Message ID 20201226175129.9621-16-luc.vanoostenryck@gmail.com (mailing list archive)
State Superseded, archived
Headers show
Series support __packed struct | expand

Commit Message

Luc Van Oostenryck Dec. 26, 2020, 5:51 p.m. UTC
There is (at least) 2 ways by which packed bitfields doesn't
follow normal layout/access rules and as consequence can't (always)
be accessed the usual way (load the whole underlying word, then shift
and mask to isolate the bitfield).

At least two different cases are a concern:
1) there is no padding at the end of a bitfield sequence. For example,
   the following struct is only 3 bytes width:
	struct s {
		int f:24;
	} __packed;
   So, trying to access the bitfield by first doing a 32-bit load
   will create an out-of-bound access.

2) a bitfield smaller than one word may need more than one word to be
   accessed. For example, with the following struct
	struct {
		int a:5;
		int f:30;
		int z:5;
	} __packed;
   the bitfield 'f', while smaller than one 32-bit word, can't be accessed
   with a single 32-bit access.

At machine level, these bitfields should be accessed with several, possibly
smaller, loads and their corresponding values reconstructed form these,
making things much more complicated than for non-packed bitfields.

But at IR level, things can be a little more flexible and things can stay
simple by using sub-word or super-word accesses (until these need to
be lowered to be usable at machine level). In other words, the example here
can be safely accessed with respectively a 24-bit and a 40-bit load.
This is what is done in this patch.
---
 linearize.c | 13 +++++++++++--
 symbol.h    |  1 +
 2 files changed, 12 insertions(+), 2 deletions(-)

Comments

Ramsay Jones Dec. 28, 2020, 5:10 p.m. UTC | #1
On 26/12/2020 17:51, Luc Van Oostenryck wrote:
> There is (at least) 2 ways by which packed bitfields doesn't
> follow normal layout/access rules and as consequence can't (always)
> be accessed the usual way (load the whole underlying word, then shift
> and mask to isolate the bitfield).
> 
> At least two different cases are a concern:
> 1) there is no padding at the end of a bitfield sequence. For example,
>    the following struct is only 3 bytes width:
> 	struct s {
> 		int f:24;
> 	} __packed;
>    So, trying to access the bitfield by first doing a 32-bit load
>    will create an out-of-bound access.
> 
> 2) a bitfield smaller than one word may need more than one word to be
>    accessed. For example, with the following struct
> 	struct {
> 		int a:5;
> 		int f:30;
> 		int z:5;
> 	} __packed;
>    the bitfield 'f', while smaller than one 32-bit word, can't be accessed
>    with a single 32-bit access.
> 
> At machine level, these bitfields should be accessed with several, possibly
> smaller, loads and their corresponding values reconstructed form these,

s/form/from/

> making things much more complicated than for non-packed bitfields.
> 
> But at IR level, things can be a little more flexible and things can stay
> simple by using sub-word or super-word accesses (until these need to
> be lowered to be usable at machine level). In other words, the example here
> can be safely accessed with respectively a 24-bit and a 40-bit load.
> This is what is done in this patch.

Hmm, I didn't know the IR could represent this! ;-)
Is the 'lowering' code already present? Maybe next patch.

ATB,
Ramsay Jones

> ---
>  linearize.c | 13 +++++++++++--
>  symbol.h    |  1 +
>  2 files changed, 12 insertions(+), 2 deletions(-)
> 
> diff --git a/linearize.c b/linearize.c
> index 0250c6bb17ef..e80715ab2458 100644
> --- a/linearize.c
> +++ b/linearize.c
> @@ -977,8 +977,17 @@ static struct symbol *bitfield_base_type(struct symbol *sym)
>  	if (sym) {
>  		if (sym->type == SYM_NODE)
>  			base = base->ctype.base_type;
> -		if (base->type == SYM_BITFIELD)
> -			return base->ctype.base_type;
> +		if (base->type == SYM_BITFIELD) {
> +			base = base->ctype.base_type;
> +			if (sym->packed) {
> +				int size = bits_to_bytes(sym->bit_offset + sym->bit_size);
> +				sym = __alloc_symbol(0);
> +				*sym = *base;
> +				sym->bit_size = bytes_to_bits(size);
> +				return sym;
> +			}
> +			return base;
> +		}
>  	}
>  	return sym;
>  }
> diff --git a/symbol.h b/symbol.h
> index 5c5a7f12affa..866d57522f49 100644
> --- a/symbol.h
> +++ b/symbol.h
> @@ -192,6 +192,7 @@ struct symbol {
>  					accessed:1,
>  					builtin:1,
>  					torename:1,
> +					packed:1,
>  					transparent_union:1;
>  			int		rank:3;	// arithmetic's rank
>  			struct expression *array_size;
>
Luc Van Oostenryck Dec. 28, 2020, 9:12 p.m. UTC | #2
> 
> Hmm, I didn't know the IR could represent this! ;-)

It was already used but I would prefer to avoid it.
For example, when copying a structure:
	struct s {
		char name[5];
	} s, d;

	...
	d = s;
will linearize into a single 40-bit load + store.
In this case, it's quite OK because it directly translate to
a memcpy().

> Is the 'lowering' code already present? Maybe next patch.

I had several versions, all more ugly than the others. It's why
I ended with this 'OK, keep things simple for now'.
Also, there is several ways of doing this and I'm not convinced
of which one  should be used. Worse, the case:
	struct {
		a:10;
		f:14;
	};
should probably not be handled like the case:
	struct {
		a:5;
		f:30;
		z:5;
	};
since the problems are different (the first one is just a question
of not doing an out-of-bound access, while for the second case we
have a field not wider than 4-bytes but which can't be accessed 
in less than 5 bytes).
 
-- Luc
diff mbox series

Patch

diff --git a/linearize.c b/linearize.c
index 0250c6bb17ef..e80715ab2458 100644
--- a/linearize.c
+++ b/linearize.c
@@ -977,8 +977,17 @@  static struct symbol *bitfield_base_type(struct symbol *sym)
 	if (sym) {
 		if (sym->type == SYM_NODE)
 			base = base->ctype.base_type;
-		if (base->type == SYM_BITFIELD)
-			return base->ctype.base_type;
+		if (base->type == SYM_BITFIELD) {
+			base = base->ctype.base_type;
+			if (sym->packed) {
+				int size = bits_to_bytes(sym->bit_offset + sym->bit_size);
+				sym = __alloc_symbol(0);
+				*sym = *base;
+				sym->bit_size = bytes_to_bits(size);
+				return sym;
+			}
+			return base;
+		}
 	}
 	return sym;
 }
diff --git a/symbol.h b/symbol.h
index 5c5a7f12affa..866d57522f49 100644
--- a/symbol.h
+++ b/symbol.h
@@ -192,6 +192,7 @@  struct symbol {
 					accessed:1,
 					builtin:1,
 					torename:1,
+					packed:1,
 					transparent_union:1;
 			int		rank:3;	// arithmetic's rank
 			struct expression *array_size;