[RFC,v3,1/3] arm64/kernel: kaslr: reduce module randomization range to 4 GB

Message ID 20180214113645.16793-2-ard.biesheuvel@linaro.org (mailing list archive)
State New, archived

Commit Message

Ard Biesheuvel Feb. 14, 2018, 11:36 a.m. UTC
We currently have to rely on the GCC large code model for KASLR for
two distinct but related reasons:
- if we enable full randomization, modules will be loaded very far away
  from the core kernel, where they are out of range for ADRP instructions,
- even without full randomization, the fact that the 128 MB module region
  is no longer fully reserved for kernel modules means that there is a
  small but real chance that the normal bottom-up allocation of other
  vmalloc regions collides with it, using up the range for other things.

Large model code is suboptimal, given that each symbol reference involves
a literal load that goes through the D-cache, reducing cache utilization.
But more importantly, literals are not instructions but part of .text
nonetheless, and hence mapped with executable permissions.
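
By way of illustration, this is roughly what each code model does for an
external symbol reference such as the one below. The sequences are
indicative only; actual compiler output varies:

extern int external_sym;

int *get_sym_addr(void)
{
	/*
	 * Large model: the full 64-bit address is loaded from a literal
	 * pool that is emitted into .text next to the code (and hence
	 * mapped executable), with the load going through the D-cache:
	 *
	 *	ldr	x0, .Llit
	 *	...
	 * .Llit:
	 *	.quad	external_sym
	 *
	 * Small model: the address is built PC-relatively, with no data
	 * access at all, but ADRP only reaches +/- 4 GB:
	 *
	 *	adrp	x0, external_sym
	 *	add	x0, x0, :lo12:external_sym
	 */
	return &external_sym;
}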

So let's get rid of our dependency on the large model for KASLR, by:
- reducing the full randomization range to 4 GB, thereby ensuring that
  ADRP references between modules and the kernel are always in range,
- reducing the spillover range to 4 GB as well, so that we fall back to a
  region that is still guaranteed to be in range,
- moving the randomization window of the core kernel to the middle of the
  VMALLOC space (a worked example of the resulting window follows below).

Note that KASAN always puts modules in the dedicated module region outside
of the vmalloc space, so keep the kernel close to that region if KASAN is
enabled.
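
As a worked example of the new kernel window, assuming VA_BITS == 48 (a
standalone sketch mirroring the offset computation in the patch below; the
seed value is made up):

#include <stdint.h>
#include <stdio.h>

#define VA_BITS	48
#define SZ_2M	0x200000ULL
#define BIT(n)	(1ULL << (n))

int main(void)
{
	uint64_t seed = 0x0123456789abcdefULL;	/* example value only */

	/* randomize over a 2^(VA_BITS - 2) byte range, 2 MB aligned */
	uint64_t mask = (BIT(VA_BITS - 2) - 1) & ~(SZ_2M - 1);

	/* start a quarter of the way into the vmalloc area */
	uint64_t offset = BIT(VA_BITS - 3) + (seed & mask);

	/*
	 * offset now lies in [2^45, 3 * 2^45), i.e. the middle half of
	 * the ~2^47 byte vmalloc span, clear of both outer quarters.
	 */
	printf("offset = 0x%016llx\n", (unsigned long long)offset);
	return 0;
}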

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
---
 arch/arm64/Kconfig         |  7 +++----
 arch/arm64/kernel/kaslr.c  | 20 ++++++++++++--------
 arch/arm64/kernel/module.c |  7 ++++---
 include/linux/sizes.h      |  2 ++
 4 files changed, 21 insertions(+), 15 deletions(-)

Comments

Mark Rutland Feb. 23, 2018, 5 p.m. UTC | #1
On Wed, Feb 14, 2018 at 11:36:43AM +0000, Ard Biesheuvel wrote:
> We currently have to rely on the GCC large code model for KASLR for
> two distinct but related reasons:
> - if we enable full randomization, modules will be loaded very far away
>   from the core kernel, where they are out of range for ADRP instructions,
> - even without full randomization, the fact that the 128 MB module region
>   is no longer fully reserved for kernel modules means that there is a
>   small but real chance that the normal bottom-up allocation of other
>   vmalloc regions collides with it, using up the range for other things.
> 
> Large model code is suboptimal, given that each symbol reference involves
> a literal load that goes through the D-cache, reducing cache utilization.
> But more importantly, literals are not instructions but part of .text
> nonetheless, and hence mapped with executable permissions.

I guess that means they pollute the I-caches, too?

How big a difference does this series make to .text size?

I don't really have a strong opinion here. IIRC the idea for randomizing
modules across the whole vmalloc space was to make it harder for module
bugs to leak "real" kernel addresses, but I don't know how much that's
likely to help in practice, and the performance / cache footprint wins
are enticing.

[...]

> @@ -149,21 +151,23 @@ u64 __init kaslr_early_init(u64 dt_phys)
>  		 * vmalloc region, since shadow memory is allocated for each
>  		 * module at load time, whereas the vmalloc region is shadowed
>  		 * by KASAN zero pages. So keep modules out of the vmalloc
> -		 * region if KASAN is enabled.
> +		 * region if KASAN is enabled, and put the kernel well within
> +		 * 4 GB of the module region.
>  		 */
> -		return offset;
> +		return offset % SZ_2G;

I wonder if we can do more here, taking the kernel size into account.

[...]

> diff --git a/include/linux/sizes.h b/include/linux/sizes.h
> index ce3e8150c174..bc621db852d9 100644
> --- a/include/linux/sizes.h
> +++ b/include/linux/sizes.h
> @@ -44,4 +44,6 @@
>  #define SZ_1G				0x40000000
>  #define SZ_2G				0x80000000
>  
> +#define SZ_4G				0x100000000ULL

Some asm includes <linux/sizes.h>, so it'd be nice for this to use
ULL().

Masahiro Yamada had patches moving that to <linux/const.h>.
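
Something like this, i.e. reusing the _AC() pattern from
<uapi/linux/const.h> (sketch only):

#ifdef __ASSEMBLY__
#define _AC(X, Y)	X
#else
#define __AC(X, Y)	(X##Y)
#define _AC(X, Y)	__AC(X, Y)
#endif

#define ULL(x)		(_AC(x, ULL))

#define SZ_4G		ULL(0x100000000)	/* usable from C and asm */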

Thanks,
Mark.
Ard Biesheuvel Feb. 23, 2018, 5:07 p.m. UTC | #2
On 23 February 2018 at 17:00, Mark Rutland <mark.rutland@arm.com> wrote:
> On Wed, Feb 14, 2018 at 11:36:43AM +0000, Ard Biesheuvel wrote:
>> We currently have to rely on the GCC large code model for KASLR for
>> two distinct but related reasons:
>> - if we enable full randomization, modules will be loaded very far away
>>   from the core kernel, where they are out of range for ADRP instructions,
>> - even without full randomization, the fact that the 128 MB module region
>>   is no longer fully reserved for kernel modules means that there is a
>>   small but real chance that the normal bottom-up allocation of other
>>   vmalloc regions collides with it, using up the range for other things.
>>
>> Large model code is suboptimal, given that each symbol reference involves
>> a literal load that goes through the D-cache, reducing cache utilization.
>> But more importantly, literals are not instructions but part of .text
>> nonetheless, and hence mapped with executable permissions.
>
> I guess that means they pollute the I-caches, too?
>

Yes.

> How big a difference does this series make to .text size?
>

I will add the numbers for a couple of sizable modules when I respin,
although I don't expect that aspect to be the most convincing.

> I don't really have a strong opinion here. IIRC the idea for randomizing
> modules across the whole vmalloc space was to make it harder for module
> bugs to leak "real" kernel addresses, but I don't know how much that's
> likely to help in practice, and the performance / cache footprint wins
> are enticing.
>

Yes. I think the important thing is that they are randomized
independently, and the additional entropy in the high order bits is
unlikely to make a huge difference imo.

When we added KASLR, the reason we enabled full module randomization
by default was to get coverage for the new PLT code, not because it was
deemed 'better' in some respect.
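
For context, each such veneer is a short trampoline that can reach any
address; a sketch modeled on the arm64 module PLT entry layout (kernel
context assumed for __le32):

struct plt_entry {
	/*
	 * Materialize the branch target in x16 (IP0, which the AAPCS
	 * reserves for exactly this kind of veneer) and jump:
	 *
	 *	movn	x16, #imm16		// bits [15:0], inverted
	 *	movk	x16, #imm16, lsl #16	// bits [31:16]
	 *	movk	x16, #imm16, lsl #32	// bits [47:32]
	 *	br	x16
	 */
	__le32	mov0;
	__le32	mov1;
	__le32	mov2;
	__le32	br;
};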

> [...]
>
>> @@ -149,21 +151,23 @@ u64 __init kaslr_early_init(u64 dt_phys)
>>                * vmalloc region, since shadow memory is allocated for each
>>                * module at load time, whereas the vmalloc region is shadowed
>>                * by KASAN zero pages. So keep modules out of the vmalloc
>> -              * region if KASAN is enabled.
>> +              * region if KASAN is enabled, and put the kernel well within
>> +              * 4 GB of the module region.
>>                */
>> -             return offset;
>> +             return offset % SZ_2G;
>
> I wonder if we can do more here, taking the kernel size into account.
>
> [...]
>

Not sure whether it matters, to be honest. I think most KASAN users
turn KASLR off unless they are debugging some aspect of KASLR itself
(that's certainly how I use it).

>> diff --git a/include/linux/sizes.h b/include/linux/sizes.h
>> index ce3e8150c174..bc621db852d9 100644
>> --- a/include/linux/sizes.h
>> +++ b/include/linux/sizes.h
>> @@ -44,4 +44,6 @@
>>  #define SZ_1G                                0x40000000
>>  #define SZ_2G                                0x80000000
>>
>> +#define SZ_4G                                0x100000000ULL
>
> Some asm includes <linux/sizes.h>, so it'd be nice for this to use
> ULL().
>
> Masahiro Yamada had patches moving that to <linux/const.h>.
>

OK
Ard Biesheuvel March 5, 2018, 12:22 p.m. UTC | #3
On 23 February 2018 at 17:07, Ard Biesheuvel <ard.biesheuvel@linaro.org> wrote:
> On 23 February 2018 at 17:00, Mark Rutland <mark.rutland@arm.com> wrote:
>> On Wed, Feb 14, 2018 at 11:36:43AM +0000, Ard Biesheuvel wrote:
>>> We currently have to rely on the GCC large code model for KASLR for
>>> two distinct but related reasons:
>>> - if we enable full randomization, modules will be loaded very far away
>>>   from the core kernel, where they are out of range for ADRP instructions,
>>> - even without full randomization, the fact that the 128 MB module region
>>>   is no longer fully reserved for kernel modules means that there is a
>>>   small but real chance that the normal bottom-up allocation of other
>>>   vmalloc regions collides with it, using up the range for other things.
>>>
>>> Large model code is suboptimal, given that each symbol reference involves
>>> a literal load that goes through the D-cache, reducing cache utilization.
>>> But more importantly, literals are not instructions but part of .text
>>> nonetheless, and hence mapped with executable permissions.
>>
>> I guess that means they pollute the I-caches, too?
>>
>
> Yes.
>
>> How big a difference does this series make to .text size?
>>
>
> I will add the numbers for a couple of sizable modules when I respin,
> although I don't expect that aspect to be the most convincing.
>

For the record, this is the size(1) output before and after for a widely
used, fairly large module:

   text    data     bss     dec     hex filename
 317907    7004      16 324927   4f53f net/mac80211/mac80211.ko

   text    data     bss     dec     hex filename
 315603    7004      19 322626   4ec42 net/mac80211/mac80211.ko

This is large model vs small model, both built with the stack protector
in strong mode.

So while the size does decrease slightly, I think the reduced cache
footprint is a more important metric here, and I am not sure how to
measure that.
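
One rough option might be to count kernel-side L1 I-cache misses around a
module-heavy workload. A sketch using perf_event_open(2); the hw-cache
event is not supported on every CPU, counting kernel events needs a
permissive perf_event_paranoid setting or root, and the numbers are noisy
at best:

#include <linux/perf_event.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <unistd.h>

static int open_l1i_miss_counter(void)
{
	struct perf_event_attr attr;

	memset(&attr, 0, sizeof(attr));
	attr.size = sizeof(attr);
	attr.type = PERF_TYPE_HW_CACHE;
	attr.config = PERF_COUNT_HW_CACHE_L1I |
		      (PERF_COUNT_HW_CACHE_OP_READ << 8) |
		      (PERF_COUNT_HW_CACHE_RESULT_MISS << 16);
	attr.disabled = 1;
	attr.exclude_user = 1;	/* kernel-side misses only */

	/* count for this task, on whatever CPU it runs on */
	return syscall(__NR_perf_event_open, &attr, 0, -1, -1, 0);
}

int main(void)
{
	uint64_t count;
	int fd = open_l1i_miss_counter();

	if (fd < 0) {
		perror("perf_event_open");
		return 1;
	}

	ioctl(fd, PERF_EVENT_IOC_RESET, 0);
	ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);

	/* ... drive the module code path under test here ... */

	ioctl(fd, PERF_EVENT_IOC_DISABLE, 0);
	if (read(fd, &count, sizeof(count)) == sizeof(count))
		printf("L1I read misses: %llu\n", (unsigned long long)count);

	close(fd);
	return 0;
}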

>> I don't really have a strong opinion here. IIRC the idea for randomizing
>> modules across the whole vmalloc space was to make it harder for module
>> bugs to leak "real" kernel addresses, but I don't know how much that's
>> likely to help in practice, and the performance / cache footprint wins
>> are enticing.
>>
>
> Yes. I think the important thing is that they are randomized
> independently, and the additional entropy in the high order bits is
> unlikely to make a huge difference imo.
>
> When we added KASLR, the reason we enabled full module randomization
> by default was to get coverage for the new PLT code, not because it was
> deemed 'better' in some respect.
>
>> [...]
>>
>>> @@ -149,21 +151,23 @@ u64 __init kaslr_early_init(u64 dt_phys)
>>>                * vmalloc region, since shadow memory is allocated for each
>>>                * module at load time, whereas the vmalloc region is shadowed
>>>                * by KASAN zero pages. So keep modules out of the vmalloc
>>> -              * region if KASAN is enabled.
>>> +              * region if KASAN is enabled, and put the kernel well within
>>> +              * 4 GB of the module region.
>>>                */
>>> -             return offset;
>>> +             return offset % SZ_2G;
>>
>> I wonder if we can do more here, taking the kernel size into account.
>>
>> [...]
>>
>
> Not sure whether it matters, to be honest. I think most KASAN users
> turn KASLR off unless they are debugging some aspect of KASLR itself
> (that's certainly how I use it).
>
>>> diff --git a/include/linux/sizes.h b/include/linux/sizes.h
>>> index ce3e8150c174..bc621db852d9 100644
>>> --- a/include/linux/sizes.h
>>> +++ b/include/linux/sizes.h
>>> @@ -44,4 +44,6 @@
>>>  #define SZ_1G                                0x40000000
>>>  #define SZ_2G                                0x80000000
>>>
>>> +#define SZ_4G                                0x100000000ULL
>>
>> Some asm includes <linux/sizes.h>, so it'd be nice for this to use
>> ULL().
>>
>> Masahiro Yamada had patches moving that to <linux/const.h>.
>>
>
> OK

Patch

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 7381eeb7ef8e..ae7d3d4c0bbe 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -1109,7 +1109,6 @@ config ARM64_MODULE_CMODEL_LARGE
 
 config ARM64_MODULE_PLTS
 	bool
-	select ARM64_MODULE_CMODEL_LARGE
 	select HAVE_MOD_ARCH_SPECIFIC
 
 config RELOCATABLE
@@ -1143,12 +1142,12 @@ config RANDOMIZE_BASE
 	  If unsure, say N.
 
 config RANDOMIZE_MODULE_REGION_FULL
-	bool "Randomize the module region independently from the core kernel"
+	bool "Randomize the module region over a 4 GB range"
 	depends on RANDOMIZE_BASE
 	default y
 	help
-	  Randomizes the location of the module region without considering the
-	  location of the core kernel. This way, it is impossible for modules
+	  Randomizes the location of the module region inside a 4 GB window
+	  covering the core kernel. This way, it is less likely for modules
 	  to leak information about the location of core kernel data structures
 	  but it does imply that function calls between modules and the core
 	  kernel will need to be resolved via veneers in the module PLT.
diff --git a/arch/arm64/kernel/kaslr.c b/arch/arm64/kernel/kaslr.c
index 47080c49cc7e..17dbdb055314 100644
--- a/arch/arm64/kernel/kaslr.c
+++ b/arch/arm64/kernel/kaslr.c
@@ -117,13 +117,15 @@ u64 __init kaslr_early_init(u64 dt_phys)
 	/*
 	 * OK, so we are proceeding with KASLR enabled. Calculate a suitable
 	 * kernel image offset from the seed. Let's place the kernel in the
-	 * lower half of the VMALLOC area (VA_BITS - 2).
+	 * middle half of the VMALLOC area (VA_BITS - 2), and stay clear of
+	 * the lower and upper quarters to avoid colliding with other
+	 * allocations.
 	 * Even if we could randomize at page granularity for 16k and 64k pages,
 	 * let's always round to 2 MB so we don't interfere with the ability to
 	 * map using contiguous PTEs
 	 */
 	mask = ((1UL << (VA_BITS - 2)) - 1) & ~(SZ_2M - 1);
-	offset = seed & mask;
+	offset = BIT(VA_BITS - 3) + (seed & mask);
 
 	/* use the top 16 bits to randomize the linear region */
 	memstart_offset_seed = seed >> 48;
@@ -149,21 +151,23 @@ u64 __init kaslr_early_init(u64 dt_phys)
 		 * vmalloc region, since shadow memory is allocated for each
 		 * module at load time, whereas the vmalloc region is shadowed
 		 * by KASAN zero pages. So keep modules out of the vmalloc
-		 * region if KASAN is enabled.
+		 * region if KASAN is enabled, and put the kernel well within
+		 * 4 GB of the module region.
 		 */
-		return offset;
+		return offset % SZ_2G;
 
 	if (IS_ENABLED(CONFIG_RANDOMIZE_MODULE_REGION_FULL)) {
 		/*
-		 * Randomize the module region independently from the core
-		 * kernel. This prevents modules from leaking any information
+		 * Randomize the module region over a 4 GB window covering the
+		 * kernel. This reduces the risk of modules leaking information
 		 * about the address of the kernel itself, but results in
 		 * branches between modules and the core kernel that are
 		 * resolved via PLTs. (Branches between modules will be
 		 * resolved normally.)
 		 */
-		module_range = VMALLOC_END - VMALLOC_START - MODULES_VSIZE;
-		module_alloc_base = VMALLOC_START;
+		module_range = SZ_4G - (u64)(_end - _stext);
+		module_alloc_base = max((u64)_end + offset - SZ_4G,
+					(u64)MODULES_VADDR);
 	} else {
 		/*
 		 * Randomize the module region by setting module_alloc_base to
diff --git a/arch/arm64/kernel/module.c b/arch/arm64/kernel/module.c
index f469e0435903..10c6ab9534e8 100644
--- a/arch/arm64/kernel/module.c
+++ b/arch/arm64/kernel/module.c
@@ -55,9 +55,10 @@ void *module_alloc(unsigned long size)
 		 * less likely that the module region gets exhausted, so we
 		 * can simply omit this fallback in that case.
 		 */
-		p = __vmalloc_node_range(size, MODULE_ALIGN, VMALLOC_START,
-				VMALLOC_END, GFP_KERNEL, PAGE_KERNEL_EXEC, 0,
-				NUMA_NO_NODE, __builtin_return_address(0));
+		p = __vmalloc_node_range(size, MODULE_ALIGN, module_alloc_base,
+				module_alloc_base + SZ_4G, GFP_KERNEL,
+				PAGE_KERNEL_EXEC, 0, NUMA_NO_NODE,
+				__builtin_return_address(0));
 
 	if (p && (kasan_module_alloc(p, size) < 0)) {
 		vfree(p);
diff --git a/include/linux/sizes.h b/include/linux/sizes.h
index ce3e8150c174..bc621db852d9 100644
--- a/include/linux/sizes.h
+++ b/include/linux/sizes.h
@@ -44,4 +44,6 @@
 #define SZ_1G				0x40000000
 #define SZ_2G				0x80000000
 
+#define SZ_4G				0x100000000ULL
+
 #endif /* __LINUX_SIZES_H__ */