| Message ID | 20200205223950.1212394-7-kristen@linux.intel.com (mailing list archive) |
|---|---|
| State | New, archived |
| Series | Finer grained kernel address space randomization |
On Wed, Feb 05, 2020 at 02:39:45PM -0800, Kristen Carlson Accardi wrote:
> We will be using -ffunction-sections to place each function in
> it's own text section so it can be randomized at load time. The
> linker considers these .text.* sections "orphaned sections", and
> will place them after the first similar section (.text). However,
> we need to move _etext so that it is after both .text and .text.*
> We also need to calculate text size to include .text AND .text.*

The dependency on the linker's orphan section handling is, I feel,
rather fragile (during work on CFI and generally building kernels with
Clang's LLD linker, we keep tripping over differences between how BFD
and LLD handle orphans). However, there is currently no way to perform
a section "pass through" where input sections retain their name as an
output section. (If anyone knows a way to do this, I'm all ears).

Right now, you can only collect sections like this:

.text : AT(ADDR(.text) - LOAD_OFFSET) {
	*(.text.*)
}

or let them be orphans, in which case the linker attempts to find a
"similar" (code, data, etc.) section to put them near:
https://sourceware.org/binutils/docs-2.33.1/ld/Orphan-Sections.html

So, basically, yes, this works, but I'd like to see BFD and LLD grow
some kind of /PASSTHRU/ special section (like /DISCARD/) that would let
a linker script specify _where_ these sections should roughly live.

Related thoughts:

I know x86_64 stack alignment is 16 bytes. I cannot find evidence for
what function start alignment should be. It seems the linker is 16-byte
aligning these functions, when I think no alignment is needed for
function starts, so we're wasting some memory (an average of 8 bytes per
function, at say 50,000 functions, so approaching 512KB) between
functions. If we can specify a 1-byte alignment for these orphan
sections, that would be nice; as mentioned in the cover letter, we lose
4 bits of entropy to this alignment, since all randomized function
addresses will have their low bits set to zero.

And if we can't adjust function section alignment, or if there is some
benefit to a larger alignment, I would like to have a way for the linker
to specify the inter-section padding (or fill) bytes. Right now, the
FILL(0xnn) (or =0xnn) syntax can be used to specify the padding bytes
_within_ a section, but not between sections. Currently, BFD appears to
0-pad between sections; I'd like that to be 0xCC so "guessing" addresses
incorrectly will trigger a trap.

-Kees

> 
> Signed-off-by: Kristen Carlson Accardi <kristen@linux.intel.com>
> ---
>  arch/x86/kernel/vmlinux.lds.S     | 18 +++++++++++++++++-
>  include/asm-generic/vmlinux.lds.h |  2 +-
>  2 files changed, 18 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/x86/kernel/vmlinux.lds.S b/arch/x86/kernel/vmlinux.lds.S
> index 3a1a819da137..e54e9ac5b429 100644
> --- a/arch/x86/kernel/vmlinux.lds.S
> +++ b/arch/x86/kernel/vmlinux.lds.S
> @@ -146,8 +146,24 @@ SECTIONS
>  #endif
>  	} :text =0xcccc
>  
> -	/* End of text section, which should occupy whole number of pages */
> +#ifdef CONFIG_FG_KASLR
> +	/*
> +	 * -ffunction-sections creates .text.* sections, which are considered
> +	 * "orphan sections" and added after the first similar section (.text).
> +	 * Adding this ALIGN statement causes the address of _etext
> +	 * to be below that of all the .text.* orphaned sections
> +	 */
> +	. = ALIGN(PAGE_SIZE);
> +#endif
>  	_etext = .;
> +
> +	/*
> +	 * the size of the .text section is used to calculate the address
> +	 * range for orc lookups. If we just use SIZEOF(.text), we will
> +	 * miss all the .text.* sections. Calculate the size using _etext
> +	 * and _stext and save the value for later.
> +	 */
> +	text_size = _etext - _stext;
>  	. = ALIGN(PAGE_SIZE);
>  
>  	X86_ALIGN_RODATA_BEGIN
> diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h
> index e00f41aa8ec4..edf19f4296e2 100644
> --- a/include/asm-generic/vmlinux.lds.h
> +++ b/include/asm-generic/vmlinux.lds.h
> @@ -798,7 +798,7 @@
>  	. = ALIGN(4);						\
>  	.orc_lookup : AT(ADDR(.orc_lookup) - LOAD_OFFSET) {	\
>  		orc_lookup = .;					\
> -		. += (((SIZEOF(.text) + LOOKUP_BLOCK_SIZE - 1) / \
> +		. += (((text_size + LOOKUP_BLOCK_SIZE - 1) / \
>  			LOOKUP_BLOCK_SIZE) + 1) * 4;			\
>  		orc_lookup_end = .;				\
>  	}
> -- 
> 2.24.1
> 
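The sizing change in the second hunk above is easier to see with numbers. Below is a small standalone illustration, not taken from the patch; the two text sizes are made up, and LOOKUP_BLOCK_SIZE is assumed to be 256 bytes as defined in the kernel's orc_lookup.h:

```c
#include <stdio.h>

#define LOOKUP_BLOCK_SIZE 256	/* assumed, per arch/x86/include/asm/orc_lookup.h */

/* One 4-byte lookup entry per LOOKUP_BLOCK_SIZE of text, plus one terminator. */
static unsigned long orc_lookup_bytes(unsigned long text_size)
{
	return (((text_size + LOOKUP_BLOCK_SIZE - 1) / LOOKUP_BLOCK_SIZE) + 1) * 4;
}

int main(void)
{
	unsigned long sizeof_text = 12UL << 20;	/* hypothetical SIZEOF(.text) */
	unsigned long etext_stext = 18UL << 20;	/* hypothetical _etext - _stext,
						   which also covers .text.*   */

	/* Sizing by SIZEOF(.text) alone would leave the table too small to
	 * cover addresses that land in the randomized .text.* sections. */
	printf("lookup table, SIZEOF(.text):   %lu bytes\n", orc_lookup_bytes(sizeof_text));
	printf("lookup table, _etext - _stext: %lu bytes\n", orc_lookup_bytes(etext_stext));
	return 0;
}
```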
On Thu, Feb 6, 2020 at 1:26 PM Kees Cook <keescook@chromium.org> wrote:
> I know x86_64 stack alignment is 16 bytes.

That's true for the standard sysv ABI that is used in userspace; but
the kernel uses a custom ABI with 8-byte stack alignment. See
arch/x86/Makefile:

# For gcc stack alignment is specified with -mpreferred-stack-boundary,
# clang has the option -mstack-alignment for that purpose.
ifneq ($(call cc-option, -mpreferred-stack-boundary=4),)
      cc_stack_align4 := -mpreferred-stack-boundary=2
      cc_stack_align8 := -mpreferred-stack-boundary=3
else ifneq ($(call cc-option, -mstack-alignment=16),)
      cc_stack_align4 := -mstack-alignment=4
      cc_stack_align8 := -mstack-alignment=8
endif
[...]
# By default gcc and clang use a stack alignment of 16 bytes for x86.
# However the standard kernel entry on x86-64 leaves the stack on an
# 8-byte boundary. If the compiler isn't informed about the actual
# alignment it will generate extra alignment instructions for the
# default alignment which keep the stack *mis*aligned.
# Furthermore an alignment to the register width reduces stack usage
# and the number of alignment instructions.
KBUILD_CFLAGS += $(call cc-option,$(cc_stack_align8))

> I cannot find evidence for
> what function start alignment should be.

There is no architecturally required alignment for functions, but
Intel's Optimization Manual
(<https://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-optimization-manual.pdf>)
recommends in section 3.4.1.5, "Code Alignment":

| Assembly/Compiler Coding Rule 12. (M impact, H generality)
| All branch targets should be 16-byte aligned.

AFAIK this is recommended because, as documented in section 2.3.2.1,
"Legacy Decode Pipeline" (describing the frontend of Sandy Bridge, and
used as the base for newer microarchitectures):

| An instruction fetch is a 16-byte aligned lookup through the ITLB and into the instruction cache.
| The instruction cache can deliver every cycle 16 bytes to the instruction pre-decoder.

AFAIK this means that if a branch ends close to the end of a 16-byte
block, the frontend is less efficient because it may have to run two
instruction fetches before the first instruction can even be decoded.
On Thu, Feb 06, 2020 at 04:26:23AM -0800, Kees Cook wrote:
> I know x86_64 stack alignment is 16 bytes. I cannot find evidence for
> what function start alignment should be. It seems the linker is 16 byte
> aligning these functions, when I think no alignment is needed for
> function starts, so we're wasting some memory (average 8 bytes per
> function, at say 50,000 functions, so approaching 512KB) between
> functions. If we can specify a 1 byte alignment for these orphan
> sections, that would be nice, as mentioned in the cover letter: we lose
> a 4 bits of entropy to this alignment, since all randomized function
> addresses will have their low bits set to zero.
> 

The default function alignment is 16 bytes for x64, at least with gcc.
You can use -falign-functions to specify a different alignment.

There was some old discussion on reducing it [1] but it doesn't seem to
have been merged.

[1] https://lore.kernel.org/lkml/tip-4874fe1eeb40b403a8c9d0ddeb4d166cab3f37ba@git.kernel.org/
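For reference, one quick way to see the default function padding and the effect of -falign-functions is to inspect the generated assembly. The file below is a made-up demo; the exact directives emitted (typically a .p2align 4 before each function at -O2 on x86-64) depend on the gcc version and optimization level:

```c
/* align_demo.c -- illustrative only
 *
 *   gcc -O2 -S align_demo.c
 *       each function is typically preceded by ".p2align 4" (16-byte alignment)
 *   gcc -O2 -falign-functions=1 -S align_demo.c
 *       the per-function alignment padding directives are dropped
 */
int f(int x) { return x + 1; }
int g(int x) { return x * 2; }
```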
On Thu, Feb 06, 2020 at 04:26:23AM -0800, Kees Cook wrote:
> On Wed, Feb 05, 2020 at 02:39:45PM -0800, Kristen Carlson Accardi wrote:
> > We will be using -ffunction-sections to place each function in
> > it's own text section so it can be randomized at load time. The
> > linker considers these .text.* sections "orphaned sections", and
> > will place them after the first similar section (.text). However,
> > we need to move _etext so that it is after both .text and .text.*
> > We also need to calculate text size to include .text AND .text.*
> 
> The dependency on the linker's orphan section handling is, I feel,
> rather fragile (during work on CFI and generally building kernels with
> Clang's LLD linker, we keep tripping over difference between how BFD and
> LLD handle orphans). However, this is currently no way to perform a
> section "pass through" where input sections retain their name as an
> output section. (If anyone knows a way to do this, I'm all ears).
> 
> Right now, you can only collect sections like this:
> 
> .text : AT(ADDR(.text) - LOAD_OFFSET) {
> 	*(.text.*)
> }
> 
> or let them be orphans, which then the linker attempts to find a
> "similar" (code, data, etc) section to put them near:
> https://sourceware.org/binutils/docs-2.33.1/ld/Orphan-Sections.html
> 
> So, basically, yes, this works, but I'd like to see BFD and LLD grow
> some kind of /PASSTHRU/ special section (like /DISCARD/), that would let
> a linker script specify _where_ these sections should roughly live.
> 

You could go through the objects that are being linked and find the
individual text sections, and generate the linker script using that?
On Thu, Feb 06, 2020 at 09:39:43AM -0500, Arvind Sankar wrote:
> On Thu, Feb 06, 2020 at 04:26:23AM -0800, Kees Cook wrote:
> > I know x86_64 stack alignment is 16 bytes. I cannot find evidence for
> > what function start alignment should be. It seems the linker is 16 byte
> > aligning these functions, when I think no alignment is needed for
> > function starts, so we're wasting some memory (average 8 bytes per
> > function, at say 50,000 functions, so approaching 512KB) between
> > functions. If we can specify a 1 byte alignment for these orphan
> > sections, that would be nice, as mentioned in the cover letter: we lose
> > a 4 bits of entropy to this alignment, since all randomized function
> > addresses will have their low bits set to zero.
> >
> 
> The default function alignment is 16-bytes for x64 at least with gcc.
> You can use -falign-functions to specify a different alignment.
> 
> There was some old discussion on reducing it [1] but it doesn't seem to
> have been merged.
> 
> [1] https://lore.kernel.org/lkml/tip-4874fe1eeb40b403a8c9d0ddeb4d166cab3f37ba@git.kernel.org/

Though I don't think the entropy loss is real. With 50k functions, you
can use at most log(50k!) = ~35 KiB worth of entropy in permuting them,
no matter what the alignment is. The only way you can get more is if you
have more than 50k slots to put them in.
On Thu, Feb 06, 2020 at 09:57:40AM -0500, Arvind Sankar wrote:
> On Thu, Feb 06, 2020 at 04:26:23AM -0800, Kees Cook wrote:
> > On Wed, Feb 05, 2020 at 02:39:45PM -0800, Kristen Carlson Accardi wrote:
> > > We will be using -ffunction-sections to place each function in
> > > it's own text section so it can be randomized at load time. The
> > > linker considers these .text.* sections "orphaned sections", and
> > > will place them after the first similar section (.text). However,
> > > we need to move _etext so that it is after both .text and .text.*
> > > We also need to calculate text size to include .text AND .text.*
> >
> > The dependency on the linker's orphan section handling is, I feel,
> > rather fragile (during work on CFI and generally building kernels with
> > Clang's LLD linker, we keep tripping over difference between how BFD and
> > LLD handle orphans). However, this is currently no way to perform a
> > section "pass through" where input sections retain their name as an
> > output section. (If anyone knows a way to do this, I'm all ears).
> >
> > Right now, you can only collect sections like this:
> >
> > .text : AT(ADDR(.text) - LOAD_OFFSET) {
> > 	*(.text.*)
> > }
> >
> > or let them be orphans, which then the linker attempts to find a
> > "similar" (code, data, etc) section to put them near:
> > https://sourceware.org/binutils/docs-2.33.1/ld/Orphan-Sections.html
> >
> > So, basically, yes, this works, but I'd like to see BFD and LLD grow
> > some kind of /PASSTHRU/ special section (like /DISCARD/), that would let
> > a linker script specify _where_ these sections should roughly live.
> >
> 
> You could go through the objects that are being linked and find the
> individual text sections, and generate the linker script using that?

Also, one thing to note about the orphan section handling -- by default
ld will combine multiple orphan sections with the same name into a
single output section. So if you have sections corresponding to static
functions with the same name but from different files, they will get
unnecessarily combined. You may want to add --unique to the ld options
to keep them separate. That will create multiple sections with the same
name instead of merging them.
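To make that collision concrete, here is a hedged two-file illustration (file and function names are invented). With -ffunction-sections, each static helper lands in a section named .text.helper, and a default link folds the two together; per the ld documentation, --unique keeps same-named orphan input sections as separate output sections:

```c
/* a.c -- build with: gcc -c -ffunction-sections a.c */
static int helper(void) { return 1; }		/* emitted into section .text.helper */
int a_entry(void) { return helper(); }

/* b.c -- build with: gcc -c -ffunction-sections b.c */
static int helper(void) { return 2; }		/* also emitted into .text.helper */
int b_entry(void) { return helper(); }

/*
 * "ld -r a.o b.o -o combined.o" merges the two .text.helper input
 * sections into one output section, so the two helpers can no longer
 * be placed independently; "ld -r --unique a.o b.o -o combined.o"
 * keeps them as two separate, same-named .text.helper sections.
 */
```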
> On Feb 6, 2020, at 7:29 AM, Arvind Sankar <nivedita@alum.mit.edu> wrote:
> 
> On Thu, Feb 06, 2020 at 09:39:43AM -0500, Arvind Sankar wrote:
>>> On Thu, Feb 06, 2020 at 04:26:23AM -0800, Kees Cook wrote:
>>> I know x86_64 stack alignment is 16 bytes. I cannot find evidence for
>>> what function start alignment should be. It seems the linker is 16 byte
>>> aligning these functions, when I think no alignment is needed for
>>> function starts, so we're wasting some memory (average 8 bytes per
>>> function, at say 50,000 functions, so approaching 512KB) between
>>> functions. If we can specify a 1 byte alignment for these orphan
>>> sections, that would be nice, as mentioned in the cover letter: we lose
>>> a 4 bits of entropy to this alignment, since all randomized function
>>> addresses will have their low bits set to zero.
>>>
>> 
>> The default function alignment is 16-bytes for x64 at least with gcc.
>> You can use -falign-functions to specify a different alignment.
>> 
>> There was some old discussion on reducing it [1] but it doesn't seem to
>> have been merged.
>> 
>> [1] https://lore.kernel.org/lkml/tip-4874fe1eeb40b403a8c9d0ddeb4d166cab3f37ba@git.kernel.org/
> 
> Though I don't think the entropy loss is real. With 50k functions, you
> can use at most log(50k!) = ~35 KiB worth of entropy in permuting them,
> no matter what the alignment is. The only way you can get more is if you
> have more than 50k slots to put them in.

There is a security consideration here that has nothing to do with
entropy per se. If an attacker locates two functions, they learn the
distance between them. This constrains what can fit in the gap. Padding
reduces the strength of this type of attack, as would some degree of
random padding.
From: Jann Horn
> Sent: 06 February 2020 13:16
...
> > I cannot find evidence for
> > what function start alignment should be.
> 
> There is no architecturally required alignment for functions, but
> Intel's Optimization Manual
> (<https://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-
> optimization-manual.pdf>)
> recommends in section 3.4.1.5, "Code Alignment":
> 
> | Assembly/Compiler Coding Rule 12. (M impact, H generality)
> | All branch targets should be 16-byte aligned.
> 
> AFAIK this is recommended because, as documented in section 2.3.2.1,
> "Legacy Decode Pipeline" (describing the frontend of Sandy Bridge, and
> used as the base for newer microarchitectures):
> 
> | An instruction fetch is a 16-byte aligned lookup through the ITLB
> and into the instruction cache.
> | The instruction cache can deliver every cycle 16 bytes to the
> instruction pre-decoder.
> 
> AFAIK this means that if a branch ends close to the end of a 16-byte
> block, the frontend is less efficient because it may have to run two
> instruction fetches before the first instruction can even be decoded.

See also "The microarchitecture of Intel, AMD and VIA CPUs" from
www.agner.org/optimize

My suspicion is that reducing the code's cache footprint (so more code
fits in the cache) will almost always be a win over aligning branch
targets and entry points. If the alignment of a function matters, then
there are probably other changes to that bit of code that will give a
larger benefit.

	David
On Thu, 2020-02-06 at 04:26 -0800, Kees Cook wrote:
> On Wed, Feb 05, 2020 at 02:39:45PM -0800, Kristen Carlson Accardi
> wrote:
> > We will be using -ffunction-sections to place each function in
> > it's own text section so it can be randomized at load time. The
> > linker considers these .text.* sections "orphaned sections", and
> > will place them after the first similar section (.text). However,
> > we need to move _etext so that it is after both .text and .text.*
> > We also need to calculate text size to include .text AND .text.*
> 
> The dependency on the linker's orphan section handling is, I feel,
> rather fragile (during work on CFI and generally building kernels
> with
> Clang's LLD linker, we keep tripping over difference between how BFD
> and
> LLD handle orphans). However, this is currently no way to perform a
> section "pass through" where input sections retain their name as an
> output section. (If anyone knows a way to do this, I'm all ears).
> 
> Right now, you can only collect sections like this:
> 
> .text : AT(ADDR(.text) - LOAD_OFFSET) {
> 	*(.text.*)
> }
> 
> or let them be orphans, which then the linker attempts to find a
> "similar" (code, data, etc) section to put them near:
> https://sourceware.org/binutils/docs-2.33.1/ld/Orphan-Sections.html
> 
> So, basically, yes, this works, but I'd like to see BFD and LLD grow
> some kind of /PASSTHRU/ special section (like /DISCARD/), that would
> let
> a linker script specify _where_ these sections should roughly live.
> 
> Related thoughts:
> 
> I know x86_64 stack alignment is 16 bytes. I cannot find evidence for
> what function start alignment should be. It seems the linker is 16
> byte
> aligning these functions, when I think no alignment is needed for
> function starts, so we're wasting some memory (average 8 bytes per
> function, at say 50,000 functions, so approaching 512KB) between
> functions. If we can specify a 1 byte alignment for these orphan
> sections, that would be nice, as mentioned in the cover letter: we
> lose
> a 4 bits of entropy to this alignment, since all randomized function
> addresses will have their low bits set to zero.

So, when I was developing this patch set, I initially ignored the value
of sh_addralign and just packed the functions in one right after
another when I did the new layout. They were byte aligned :). I later
realized that I should probably pay attention to alignment and thus
started respecting the value that was in sh_addralign. There is
actually nothing stopping me from ignoring it again, other than I am
concerned that I will make runtime performance suffer even more than I
already have.

> 
> And we can't adjust function section alignment, or there is some
> benefit to a larger alignment, I would like to have a way for the
> linker
> to specify the inter-section padding (or fill) bytes. Right now, the
> FILL(0xnn) (or =0xnn) can be used to specify the padding bytes
> _within_
> a section, but not between sections. Right now, BFD appears to 0-pad.
> I'd
> like that to be 0xCC so "guessing" addresses incorrectly will trigger
> a trap.

Padding the space between functions with int3 is easy to add during
boot time, and I've got it on my todo list.
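A minimal sketch of that boot-time int3-padding idea, assuming a hypothetical helper in the randomizer (none of these names come from the patchset): after a function section is copied to its randomized slot, the remaining slack up to the next slot is filled with 0xCC so that an incorrect guess at a function address lands on a breakpoint rather than on usable bytes.

```c
#include <string.h>

#define INT3_OPCODE 0xCC	/* x86 breakpoint instruction */

/*
 * Hypothetical helper: copy one function section to its randomized
 * destination and fill the unused tail of the slot with int3 bytes.
 */
static void place_and_pad(void *dst, const void *src, size_t func_len, size_t slot_len)
{
	memcpy(dst, src, func_len);
	if (slot_len > func_len)
		memset((char *)dst + func_len, INT3_OPCODE, slot_len - func_len);
}
```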
> On Feb 6, 2020, at 11:41 AM, Kristen Carlson Accardi <kristen@linux.intel.com> wrote:
> 
> On Thu, 2020-02-06 at 04:26 -0800, Kees Cook wrote:
>>> On Wed, Feb 05, 2020 at 02:39:45PM -0800, Kristen Carlson Accardi
>>> wrote:
>>> We will be using -ffunction-sections to place each function in
>>> it's own text section so it can be randomized at load time. The
>>> linker considers these .text.* sections "orphaned sections", and
>>> will place them after the first similar section (.text). However,
>>> we need to move _etext so that it is after both .text and .text.*
>>> We also need to calculate text size to include .text AND .text.*
>> 
>> The dependency on the linker's orphan section handling is, I feel,
>> rather fragile (during work on CFI and generally building kernels
>> with
>> Clang's LLD linker, we keep tripping over difference between how BFD
>> and
>> LLD handle orphans). However, this is currently no way to perform a
>> section "pass through" where input sections retain their name as an
>> output section. (If anyone knows a way to do this, I'm all ears).
>> 
>> Right now, you can only collect sections like this:
>> 
>> .text : AT(ADDR(.text) - LOAD_OFFSET) {
>> 	*(.text.*)
>> }
>> 
>> or let them be orphans, which then the linker attempts to find a
>> "similar" (code, data, etc) section to put them near:
>> https://sourceware.org/binutils/docs-2.33.1/ld/Orphan-Sections.html
>> 
>> So, basically, yes, this works, but I'd like to see BFD and LLD grow
>> some kind of /PASSTHRU/ special section (like /DISCARD/), that would
>> let
>> a linker script specify _where_ these sections should roughly live.
>> 
>> Related thoughts:
>> 
>> I know x86_64 stack alignment is 16 bytes. I cannot find evidence for
>> what function start alignment should be. It seems the linker is 16
>> byte
>> aligning these functions, when I think no alignment is needed for
>> function starts, so we're wasting some memory (average 8 bytes per
>> function, at say 50,000 functions, so approaching 512KB) between
>> functions. If we can specify a 1 byte alignment for these orphan
>> sections, that would be nice, as mentioned in the cover letter: we
>> lose
>> a 4 bits of entropy to this alignment, since all randomized function
>> addresses will have their low bits set to zero.
> 
> So, when I was developing this patch set, I initially ignored the value
> of sh_addralign and just packed the functions in one right after
> another when I did the new layout. They were byte aligned :). I later
> realized that I should probably pay attention to alignment and thus
> started respecting the value that was in sh_addralign. There is
> actually nothing stopping me from ignoring it again, other than I am
> concerned that I will make runtime performance suffer even more than I
> already have.

If you start randomizing *data* sections, then alignment matters.

Also, in the shiny new era of
Intel-CPUs-can’t-handle-Jcc-spanning-a-cacheline, function alignment
may actually matter. Sigh. The symptom will be horrible
maybe-exploitable crashes on old microcode and "minimal performance
impact" on new microcode. In this context, "minimal" may actually mean
"huge, throw away your CPU and replace it with one from a different
vendor."

Of course, there doesn’t appear to be anything resembling credible
public documentation for any of this.
On Thu, Feb 06, 2020 at 12:02:36PM -0800, Andy Lutomirski wrote:
> Also, in the shiny new era of
> Intel-CPUs-can’t-handle-Jcc-spanning-a-cacheline, function alignment
> may actually matter.

*groan*, indeed. I just went and looked that up. I missed this one in
all the other fuss :/

So per:

  https://www.intel.com/content/dam/support/us/en/documents/processors/mitigations-jump-conditional-code-erratum.pdf

the toolchain mitigations only work if the offset in the ifetch window
(32 bytes) is preserved. Which seems to suggest we ought to align all
functions to 32 bytes before randomizing it, otherwise we're almost
guaranteed to change this offset by the act of randomizing.
On Fri, Feb 07, 2020 at 10:24:23AM +0100, Peter Zijlstra wrote:
> On Thu, Feb 06, 2020 at 12:02:36PM -0800, Andy Lutomirski wrote:
> > Also, in the shiny new era of
> > Intel-CPUs-can’t-handle-Jcc-spanning-a-cacheline, function alignment
> > may actually matter.
> 
> *groan*, indeed. I just went and looked that up. I missed this one in
> all the other fuss :/
> 
> So per:
> 
>   https://www.intel.com/content/dam/support/us/en/documents/processors/mitigations-jump-conditional-code-erratum.pdf
> 
> the toolchain mitigations only work if the offset in the ifetch window
> (32 bytes) is preserved. Which seems to suggest we ought to align all
> functions to 32byte before randomizing it, otherwise we're almost
> guaranteed to change this offset by the act of randomizing.

Wheee! This sounds like it needs to be fixed generally, yes? (And I see
the "FUNCTION_ALIGN" macro is currently 16 bytes...)
On Sun, Feb 09, 2020 at 05:43:40PM -0800, Kees Cook wrote:
> On Fri, Feb 07, 2020 at 10:24:23AM +0100, Peter Zijlstra wrote:
> > On Thu, Feb 06, 2020 at 12:02:36PM -0800, Andy Lutomirski wrote:
> > > Also, in the shiny new era of
> > > Intel-CPUs-can’t-handle-Jcc-spanning-a-cacheline, function alignment
> > > may actually matter.
> > 
> > *groan*, indeed. I just went and looked that up. I missed this one in
> > all the other fuss :/
> > 
> > So per:
> > 
> >   https://www.intel.com/content/dam/support/us/en/documents/processors/mitigations-jump-conditional-code-erratum.pdf
> > 
> > the toolchain mitigations only work if the offset in the ifetch window
> > (32 bytes) is preserved. Which seems to suggest we ought to align all
> > functions to 32byte before randomizing it, otherwise we're almost
> > guaranteed to change this offset by the act of randomizing.
> 
> Wheee! This sounds like in needs to be fixed generally, yes? (And I see
> "FUNCTION_ALIGN" macro is currently 16 bytes...

It depends a bit on how it all works, I suppose (I'm not too clear on
the details).

Suppose the linker appends translation units at (at least) 32-byte
alignment, but the function alignment inside the translation unit is
smaller; then it could still work, because the assembler (which is
going to insert NOPs to avoid instructions being in the 'wrong' place)
can still know the offset.

If the linker is going to be fancy (say LTO) and move code around
inside sections/translation units, then this goes out the window
obviously.

The same goes for this fine-grained randomization: if the section
alignment is smaller than 32 bytes, the offset is going to change and
the mitigation will be nullified.

I'll leave it to others to figure out the exact details. But afaict it
should be possible to have fine-grained-randomization and preserve the
workaround in the end.
> 
> I'll leave it to others to figure out the exact details. But afaict it
> should be possible to have fine-grained-randomization and preserve the
> workaround in the end.
> 

The most obvious "solution" is to compile with an alignment of 4 bytes
(so tight packing) and then have the randomizer preserve each function's
offset within 32 bytes, no matter what it is.

That would get you an average padding of 16 bytes, which is a bit more
than now but not too insane (cue Kees' argument that tiny bits of
padding are actually good).
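A hedged sketch of that placement rule (the helper name and calling context are invented, not taken from the patchset): functions are packed tightly, but each one is placed at the next address that preserves its link-time offset within the 32-byte instruction-fetch window, which is what the compiler-inserted padding for the JCC erratum relies on.

```c
/*
 * Hypothetical placement helper: return the lowest address at or above
 * 'cursor' that has the same offset within a 32-byte ifetch window as
 * the function's original link-time address, so the compiler-arranged
 * 32-byte-boundary padding keeps working after randomization.
 */
static unsigned long keep_ifetch_offset(unsigned long cursor, unsigned long orig_addr)
{
	unsigned long want = orig_addr & 31;		/* original offset in the window */

	return cursor + ((want - (cursor & 31)) & 31);	/* insert 0..31 bytes of padding */
}
```

On average this inserts about 16 bytes of padding per function, matching the estimate above.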
On Mon, Feb 10, 2020 at 07:54:58AM -0800, Arjan van de Ven wrote:
> > 
> > I'll leave it to others to figure out the exact details. But afaict it
> > should be possible to have fine-grained-randomization and preserve the
> > workaround in the end.
> > 
> 
> the most obvious "solution" is to compile with an alignment of 4 bytes (so tight packing)
> and then in the randomizer preserve the offset within 32 bytes, no matter what it is
> 
> that would get you an average padding of 16 bytes which is a bit more than now but not too insane
> (queue Kees' argument that tiny bits of padding are actually good)
> 

With the patchset for adding the mbranches-within-32B-boundaries option,
the section alignment gets forced to 32. With function-sections that
means function alignment has to be 32 too.
On Mon, Feb 10, 2020 at 11:36:29AM -0500, Arvind Sankar wrote:
> On Mon, Feb 10, 2020 at 07:54:58AM -0800, Arjan van de Ven wrote:
> > > 
> > > I'll leave it to others to figure out the exact details. But afaict it
> > > should be possible to have fine-grained-randomization and preserve the
> > > workaround in the end.
> > > 
> > 
> > the most obvious "solution" is to compile with an alignment of 4 bytes (so tight packing)
> > and then in the randomizer preserve the offset within 32 bytes, no matter what it is
> > 
> > that would get you an average padding of 16 bytes which is a bit more than now but not too insane
> > (queue Kees' argument that tiny bits of padding are actually good)
> > 
> 
> With the patchset for adding the mbranches-within-32B-boundaries option,
> the section alignment gets forced to 32. With function-sections that
> means function alignment has to be 32 too.

We should be careful about enabling -mbranches-within-32B-boundaries.
It will hurt AMD, and presumably future Intel CPUs which don't need it.
On Fri, Feb 21, 2020 at 01:50:39PM -0600, Josh Poimboeuf wrote:
> On Mon, Feb 10, 2020 at 11:36:29AM -0500, Arvind Sankar wrote:
> > On Mon, Feb 10, 2020 at 07:54:58AM -0800, Arjan van de Ven wrote:
> > > > 
> > > > I'll leave it to others to figure out the exact details. But afaict it
> > > > should be possible to have fine-grained-randomization and preserve the
> > > > workaround in the end.
> > > > 
> > > 
> > > the most obvious "solution" is to compile with an alignment of 4 bytes (so tight packing)
> > > and then in the randomizer preserve the offset within 32 bytes, no matter what it is
> > > 
> > > that would get you an average padding of 16 bytes which is a bit more than now but not too insane
> > > (queue Kees' argument that tiny bits of padding are actually good)
> > > 
> > 
> > With the patchset for adding the mbranches-within-32B-boundaries option,
> > the section alignment gets forced to 32. With function-sections that
> > means function alignment has to be 32 too.
> 
> We should be careful about enabling -mbranches-within-32B-boundaries.
> It will hurt AMD, and presumably future Intel CPUs which don't need it.
> 
> -- 
> Josh
> 

And past Intel CPUs too :) As I understand it, it only appears from
Skylake onwards.
diff --git a/arch/x86/kernel/vmlinux.lds.S b/arch/x86/kernel/vmlinux.lds.S
index 3a1a819da137..e54e9ac5b429 100644
--- a/arch/x86/kernel/vmlinux.lds.S
+++ b/arch/x86/kernel/vmlinux.lds.S
@@ -146,8 +146,24 @@ SECTIONS
 #endif
 	} :text =0xcccc
 
-	/* End of text section, which should occupy whole number of pages */
+#ifdef CONFIG_FG_KASLR
+	/*
+	 * -ffunction-sections creates .text.* sections, which are considered
+	 * "orphan sections" and added after the first similar section (.text).
+	 * Adding this ALIGN statement causes the address of _etext
+	 * to be below that of all the .text.* orphaned sections
+	 */
+	. = ALIGN(PAGE_SIZE);
+#endif
 	_etext = .;
+
+	/*
+	 * the size of the .text section is used to calculate the address
+	 * range for orc lookups. If we just use SIZEOF(.text), we will
+	 * miss all the .text.* sections. Calculate the size using _etext
+	 * and _stext and save the value for later.
+	 */
+	text_size = _etext - _stext;
 	. = ALIGN(PAGE_SIZE);
 
 	X86_ALIGN_RODATA_BEGIN
diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h
index e00f41aa8ec4..edf19f4296e2 100644
--- a/include/asm-generic/vmlinux.lds.h
+++ b/include/asm-generic/vmlinux.lds.h
@@ -798,7 +798,7 @@
 	. = ALIGN(4);						\
 	.orc_lookup : AT(ADDR(.orc_lookup) - LOAD_OFFSET) {	\
 		orc_lookup = .;					\
-		. += (((SIZEOF(.text) + LOOKUP_BLOCK_SIZE - 1) / \
+		. += (((text_size + LOOKUP_BLOCK_SIZE - 1) / \
 			LOOKUP_BLOCK_SIZE) + 1) * 4;			\
 		orc_lookup_end = .;				\
 	}
We will be using -ffunction-sections to place each function in
its own text section so it can be randomized at load time. The
linker considers these .text.* sections "orphaned sections", and
will place them after the first similar section (.text). However,
we need to move _etext so that it is after both .text and .text.*
We also need to calculate text size to include .text AND .text.*

Signed-off-by: Kristen Carlson Accardi <kristen@linux.intel.com>
---
 arch/x86/kernel/vmlinux.lds.S     | 18 +++++++++++++++++-
 include/asm-generic/vmlinux.lds.h |  2 +-
 2 files changed, 18 insertions(+), 2 deletions(-)