Message ID | 20180413100204.9674-1-peter.maydell@linaro.org (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On Fri, Apr 13, 2018 at 11:02:04AM +0100, Peter Maydell wrote: > If userspace faults on a kernel address, handing them the raw ESR > value on the sigframe as part of the delivered signal can leak data > useful to attackers who are using information about the underlying hardware > fault type (e.g. translation vs permission) as a mechanism to defeat KASLR. > > However there are also legitimate uses for the information provided > in the ESR -- notably the GCC and LLVM sanitizers would like to be "would like to" or "currently rely on for correct behaviour"? The usefulness of some of this information (particularly WnR) does not seem to be arch-dependent. If this were reported in a portable way via siginfo, would it allow any known users to be simplified? I'm wondering whether there's an argument for coming up with a generic solution for the future. That wouldn't replace this patch though. > able to report whether wild pointer accesses by the application are > reads or writes (since a wild write is a more serious bug than a > wild read), so we don't want to drop the ESR information entirely. > > For faulting addresses in the kernel, sanitize the ESR. We choose > to present userspace with the illusion that there is nothing mapped > in the kernel's part of the address space at all, by reporting all > faults as level 0 translation faults. > > These fields are safe to pass through to userspace as they depend > only on the instruction that userspace used to provoke the fault: > EC IL ISV SAS SSE SRT SF AR CM WNR > All the other fields in ESR except DFSC are architecturally zero > for an L1 translation fault, so can be zeroed out without confusing > userspace. While we're about it, it occurs to me that even for faults taken on user addresses, ESR exposes information that is not really relevant to userspace. For example, the layout of the pagetables is not something that userspace can normally see, so it is strange to report the level at which a translation fault occurred etc. If we are sanitising this information, we should perhaps squash all faults that specify a translation table level to a single level. There are certain fault types that should never be exposed to userspace, such as TLB conflict aborts, access flag faults, etc. See the changes upstream between v4.16 and torvalds/master (which I had temporarily forgotten about...) Now, __do_user_fault() is called for these cases but the signal has already been mapped to SIGKILL by this point via the fault_info[] table. It doesn't matter too much what we do in such cases because the user task will be killed on signal delivery and so doesn't get a chance to inspect the ESR contents anyway. It might be worth expanding the WARN() to catch mismaintenance of the fault_info[] table -- but that may be overkill. Maybe adding some comments explaining the dependency is sufficient. > The illusion is not entirely perfect, as there is a tiny wrinkle > where we will report an alignment fault that was not due to the memory > type (for instance a LDREX to an unaligned address) as a translation > fault, whereas if you do this on real unmapped memory the alignment > fault takes precedence. This is not likely to trip anybody up in > practice, as the only users we know of for the ESR information who > care about the behaviour for kernel addresses only really want to > know about the WnR bit. I tend to agree with this. We could probably allow the alignment fault to show through, but I can't think of a reasonable scenario where userspace would legitimately rely on this. If we err on the size of exposing too little information, we are less likely to get into problems later on... > > Signed-off-by: Peter Maydell <peter.maydell@linaro.org> > --- > This RFC patch is an alternative proposal to Will's patch > https://patchwork.kernel.org/patch/10258781/ > which simply removed the ESR record entirely for kernel addresses. > > arch/arm64/mm/fault.c | 51 +++++++++++++++++++++++++++++++++++++++++++++++++++ > 1 file changed, 51 insertions(+) > > diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c > index bff11553eb05..933c6d3b906e 100644 > --- a/arch/arm64/mm/fault.c > +++ b/arch/arm64/mm/fault.c > @@ -307,6 +307,57 @@ static void __do_user_fault(struct task_struct *tsk, unsigned long addr, > __show_regs(regs); > } > > + /* > + * If the faulting address is in the kernel, we must sanitize the ESR. > + * From userspace's point of view, kernel-only mappings don't exist > + * at all, so we report them as level 0 translation faults. > + * (This is not quite the way that "no mapping there at all" behaves: > + * an alignment fault not caused by the memory type would take > + * precedence over translation fault for a real access to empty > + * space. Unfortunately we can't easily distinguish "alignment fault > + * not caused by memory type" from "alignment fault caused by memory > + * type", so we ignore this wrinkle and just return the translation > + * fault.) > + */ > + if (addr >= TASK_SIZE) { > + switch (ESR_ELx_EC(esr)) { > + case ESR_ELx_EC_DABT_CUR: How do we hit the _CUR cases, and if we can, aren't we reporting information about an insn executed by the _kernel_ here? Possibly we should delete those cases: if we took a _CUR exception with user_mode(regs) it looks a bit like we would have to have been executing at EL0 but somehow took the exception from EL1 to EL1. That sounds like it should be impossible... > + case ESR_ELx_EC_DABT_LOW: > + /* > + * These bits provide only information about the > + * faulting instruction, which userspace knows already. > + * All other bits are architecturally required to be > + * zero for faults reported with a DFSCR indicating > + * a level 0 translation fault. > + */ > + esr &= ESR_ELx_EC_MASK | ESR_ELx_IL | ESR_ELx_ISV | In the !ISV case, some of the other fields become architecturally UNKNOWN. We should probably mask them out in that case, since they could expose something that EL0 is not supposed to see. > + ESR_ELx_SAS | ESR_ELx_SSE | ESR_ELx_SRT_MASK | > + ESR_ELx_SF | ESR_ELx_AR | ESR_ELx_CM | > + ESR_ELx_WNR; > + esr |= ESR_ELx_FSC_FAULT; > + break; > + case ESR_ELx_EC_IABT_CUR: > + case ESR_ELx_EC_IABT_LOW: > + /* > + * Claim a level 0 translation fault. > + * All other bits are architecturally required to be > + * zero for faults reported with that DFSC value. Nit: they're RES0, so they might become nonzero in the future, with unpredicatable (and possibly sensitive) meanings. So it makes sense to hide them until we know they're safe to expose. > + */ > + esr &= ESR_ELx_EC_MASK | ESR_ELx_IL; > + esr |= ESR_ELx_FSC_FAULT; > + break; > + default: > + /* > + * This should never happen (entry.S only brings us > + * into this code for insn and data aborts). Fail > + * safe by not providing an ESR context record at all. > + */ > + WARN(1, "ESR 0x%x is not DABT or IABT\n", esr); > + esr = 0; > + break; > + } > + } > + > tsk->thread.fault_address = addr; > tsk->thread.fault_code = esr; > si.si_signo = sig; > -- > 2.16.2 > > > _______________________________________________ > linux-arm-kernel mailing list > linux-arm-kernel@lists.infradead.org > http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
On 13 April 2018 at 15:21, Dave Martin <Dave.Martin@arm.com> wrote: > On Fri, Apr 13, 2018 at 11:02:04AM +0100, Peter Maydell wrote: >> If userspace faults on a kernel address, handing them the raw ESR >> value on the sigframe as part of the delivered signal can leak data >> useful to attackers who are using information about the underlying hardware >> fault type (e.g. translation vs permission) as a mechanism to defeat KASLR. >> >> However there are also legitimate uses for the information provided >> in the ESR -- notably the GCC and LLVM sanitizers would like to be > > "would like to" or "currently rely on for correct behaviour"? Closer to the latter -- certainly if we just dropped ESR we would change their behaviour in those situations, though they have a fallback codepath I think so they wouldn't die entirely. > The usefulness of some of this information (particularly WnR) does not > seem to be arch-dependent. If this were reported in a portable way via > siginfo, would it allow any known users to be simplified? > > I'm wondering whether there's an argument for coming up with a generic > solution for the future. That wouldn't replace this patch though. That would certainly be nice. QEMU has accumulated a collection of per-arch per-OS signal handlers that have code of varying degrees of ugliness to try to determine "was this a write" when in a SIGSEGV/SIGBUS handler: https://git.qemu.org/?p=qemu.git;a=blob;f=accel/tcg/user-exec.c;h=26a3ffbba1de229060f683f9f6a0ca80db988eb9;hb=refs/heads/master#l180 as has the LLVM sanitizer code: https://github.com/llvm-mirror/compiler-rt/blob/master/lib/sanitizer_common/sanitizer_linux.cc#L1724 (though they seem to have managed to make it a bit less ugly than QEMU :-)) > While we're about it, it occurs to me that even for faults taken > on user addresses, ESR exposes information that is not really > relevant to userspace. For example, the layout of the pagetables > is not something that userspace can normally see, so it is strange > to report the level at which a translation fault occurred etc. > > If we are sanitising this information, we should perhaps squash > all faults that specify a translation table level to a single level. Mmm, maybe (though I guess level 0 fault is not so appropriate here). TBH it feels a bit like an unnecessary level of gold-plating. > There are certain fault types that should never be exposed to > userspace, such as TLB conflict aborts, access flag faults, etc. > See the changes upstream between v4.16 and torvalds/master (which I had > temporarily forgotten about...) > > Now, __do_user_fault() is called for these cases but the signal has > already been mapped to SIGKILL by this point via the fault_info[] > table. It doesn't matter too much what we do in such cases because > the user task will be killed on signal delivery and so doesn't get a > chance to inspect the ESR contents anyway. It might be worth expanding > the WARN() to catch mismaintenance of the fault_info[] table -- but that > may be overkill. Maybe adding some comments explaining the dependency > is sufficient. I don't entirely understand this paragraph, but if you want to provide a comment I'm happy to fill it into a v2 of this patch :-) >> The illusion is not entirely perfect, as there is a tiny wrinkle >> where we will report an alignment fault that was not due to the memory >> type (for instance a LDREX to an unaligned address) as a translation >> fault, whereas if you do this on real unmapped memory the alignment >> fault takes precedence. This is not likely to trip anybody up in >> practice, as the only users we know of for the ESR information who >> care about the behaviour for kernel addresses only really want to >> know about the WnR bit. > > I tend to agree with this. We could probably allow the alignment fault > to show through, but I can't think of a reasonable scenario where > userspace would legitimately rely on this. If you always allow "alignment fault" DFSC values through, then you are permitting userspace to distinguish kernel-space memory that is Device memory from (unmapped or Normal), because it can do an unaligned LDR access or similar, and see whether it gets a DFSC for alignment fault or one for a translation error. >> + if (addr >= TASK_SIZE) { >> + switch (ESR_ELx_EC(esr)) { >> + case ESR_ELx_EC_DABT_CUR: > > How do we hit the _CUR cases, and if we can, aren't we reporting > information about an insn executed by the _kernel_ here? > > Possibly we should delete those cases: if we took a _CUR exception with > user_mode(regs) it looks a bit like we would have to have been executing > at EL0 but somehow took the exception from EL1 to EL1. That sounds like > it should be impossible... Yes, you're right that we shouldn't get here with the _CUR cases. I think I was thinking there might be cases where the kernel accessed userspace memory and reported faults with a signal, but looking at the code we can't get to this function unless user_mode() returned true. So we can let the default case handle the _CUR cases too. >> + case ESR_ELx_EC_DABT_LOW: >> + /* >> + * These bits provide only information about the >> + * faulting instruction, which userspace knows already. >> + * All other bits are architecturally required to be >> + * zero for faults reported with a DFSCR indicating >> + * a level 0 translation fault. >> + */ >> + esr &= ESR_ELx_EC_MASK | ESR_ELx_IL | ESR_ELx_ISV | > > In the !ISV case, some of the other fields become architecturally > UNKNOWN. We should probably mask them out in that case, since they > could expose something that EL0 is not supposed to see. The spec says they're RES0 when ISV is 0, and only UNKNOWN if ISV is UNKNOWN (which only happens for Debug state accesses, which I think are not relevant here). We can certainly mask out the RES0 bits in the ISV==0 case, though. >> + ESR_ELx_SAS | ESR_ELx_SSE | ESR_ELx_SRT_MASK | >> + ESR_ELx_SF | ESR_ELx_AR | ESR_ELx_CM | >> + ESR_ELx_WNR; >> + esr |= ESR_ELx_FSC_FAULT; >> + break; >> + case ESR_ELx_EC_IABT_CUR: >> + case ESR_ELx_EC_IABT_LOW: >> + /* >> + * Claim a level 0 translation fault. >> + * All other bits are architecturally required to be >> + * zero for faults reported with that DFSC value. > > Nit: they're RES0, so they might become nonzero in the future, with > unpredicatable (and possibly sensitive) meanings. So it makes sense to > hide them until we know they're safe to expose. OK, I'll tweak the comment text. >> + */ >> + esr &= ESR_ELx_EC_MASK | ESR_ELx_IL; >> + esr |= ESR_ELx_FSC_FAULT; >> + break; >> + default: >> + /* >> + * This should never happen (entry.S only brings us >> + * into this code for insn and data aborts). Fail >> + * safe by not providing an ESR context record at all. >> + */ >> + WARN(1, "ESR 0x%x is not DABT or IABT\n", esr); >> + esr = 0; >> + break; >> + } >> + } >> + >> tsk->thread.fault_address = addr; >> tsk->thread.fault_code = esr; >> si.si_signo = sig; thanks -- PMM
On 13/04/18 15:21, Dave Martin wrote: > On Fri, Apr 13, 2018 at 11:02:04AM +0100, Peter Maydell wrote: >> If userspace faults on a kernel address, handing them the raw ESR >> value on the sigframe as part of the delivered signal can leak data >> useful to attackers who are using information about the underlying hardware >> fault type (e.g. translation vs permission) as a mechanism to defeat KASLR. >> >> However there are also legitimate uses for the information provided >> in the ESR -- notably the GCC and LLVM sanitizers would like to be > > "would like to" or "currently rely on for correct behaviour"? > > The usefulness of some of this information (particularly WnR) does not > seem to be arch-dependent. If this were reported in a portable way via > siginfo, would it allow any known users to be simplified? > > I'm wondering whether there's an argument for coming up with a generic > solution for the future. That wouldn't replace this patch though. > >> able to report whether wild pointer accesses by the application are >> reads or writes (since a wild write is a more serious bug than a >> wild read), so we don't want to drop the ESR information entirely. >> >> For faulting addresses in the kernel, sanitize the ESR. We choose >> to present userspace with the illusion that there is nothing mapped >> in the kernel's part of the address space at all, by reporting all >> faults as level 0 translation faults. >> >> These fields are safe to pass through to userspace as they depend >> only on the instruction that userspace used to provoke the fault: >> EC IL ISV SAS SSE SRT SF AR CM WNR >> All the other fields in ESR except DFSC are architecturally zero >> for an L1 translation fault, so can be zeroed out without confusing >> userspace. > > While we're about it, it occurs to me that even for faults taken > on user addresses, ESR exposes information that is not really > relevant to userspace. For example, the layout of the pagetables > is not something that userspace can normally see, so it is strange > to report the level at which a translation fault occurred etc. > > If we are sanitising this information, we should perhaps squash > all faults that specify a translation table level to a single level. > > There are certain fault types that should never be exposed to > userspace, such as TLB conflict aborts, access flag faults, etc. > See the changes upstream between v4.16 and torvalds/master (which I had > temporarily forgotten about...) > > Now, __do_user_fault() is called for these cases but the signal has > already been mapped to SIGKILL by this point via the fault_info[] > table. It doesn't matter too much what we do in such cases because > the user task will be killed on signal delivery and so doesn't get a > chance to inspect the ESR contents anyway. It might be worth expanding > the WARN() to catch mismaintenance of the fault_info[] table -- but that > may be overkill. Maybe adding some comments explaining the dependency > is sufficient. > >> The illusion is not entirely perfect, as there is a tiny wrinkle >> where we will report an alignment fault that was not due to the memory >> type (for instance a LDREX to an unaligned address) as a translation >> fault, whereas if you do this on real unmapped memory the alignment >> fault takes precedence. This is not likely to trip anybody up in >> practice, as the only users we know of for the ESR information who >> care about the behaviour for kernel addresses only really want to >> know about the WnR bit. > > I tend to agree with this. We could probably allow the alignment fault > to show through, but I can't think of a reasonable scenario where > userspace would legitimately rely on this. I think the problem with that is that memory type alignment faults also still have precedence over permission faults, meaning that valid device mappings could potentially be discovered by probing with deliberately misaligned accesses, which could give away the location of the vmalloc region. Robin. > If we err on the size of exposing too little information, we are less > likely to get into problems later on... > >> >> Signed-off-by: Peter Maydell <peter.maydell@linaro.org> >> --- >> This RFC patch is an alternative proposal to Will's patch >> https://patchwork.kernel.org/patch/10258781/ >> which simply removed the ESR record entirely for kernel addresses. >> >> arch/arm64/mm/fault.c | 51 +++++++++++++++++++++++++++++++++++++++++++++++++++ >> 1 file changed, 51 insertions(+) >> >> diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c >> index bff11553eb05..933c6d3b906e 100644 >> --- a/arch/arm64/mm/fault.c >> +++ b/arch/arm64/mm/fault.c >> @@ -307,6 +307,57 @@ static void __do_user_fault(struct task_struct *tsk, unsigned long addr, >> __show_regs(regs); >> } >> >> + /* >> + * If the faulting address is in the kernel, we must sanitize the ESR. >> + * From userspace's point of view, kernel-only mappings don't exist >> + * at all, so we report them as level 0 translation faults. >> + * (This is not quite the way that "no mapping there at all" behaves: >> + * an alignment fault not caused by the memory type would take >> + * precedence over translation fault for a real access to empty >> + * space. Unfortunately we can't easily distinguish "alignment fault >> + * not caused by memory type" from "alignment fault caused by memory >> + * type", so we ignore this wrinkle and just return the translation >> + * fault.) >> + */ >> + if (addr >= TASK_SIZE) { >> + switch (ESR_ELx_EC(esr)) { >> + case ESR_ELx_EC_DABT_CUR: > > How do we hit the _CUR cases, and if we can, aren't we reporting > information about an insn executed by the _kernel_ here? > > Possibly we should delete those cases: if we took a _CUR exception with > user_mode(regs) it looks a bit like we would have to have been executing > at EL0 but somehow took the exception from EL1 to EL1. That sounds like > it should be impossible... > > >> + case ESR_ELx_EC_DABT_LOW: >> + /* >> + * These bits provide only information about the >> + * faulting instruction, which userspace knows already. >> + * All other bits are architecturally required to be >> + * zero for faults reported with a DFSCR indicating >> + * a level 0 translation fault. >> + */ >> + esr &= ESR_ELx_EC_MASK | ESR_ELx_IL | ESR_ELx_ISV | > > In the !ISV case, some of the other fields become architecturally > UNKNOWN. We should probably mask them out in that case, since they > could expose something that EL0 is not supposed to see. > >> + ESR_ELx_SAS | ESR_ELx_SSE | ESR_ELx_SRT_MASK | >> + ESR_ELx_SF | ESR_ELx_AR | ESR_ELx_CM | >> + ESR_ELx_WNR; >> + esr |= ESR_ELx_FSC_FAULT; >> + break; >> + case ESR_ELx_EC_IABT_CUR: >> + case ESR_ELx_EC_IABT_LOW: >> + /* >> + * Claim a level 0 translation fault. >> + * All other bits are architecturally required to be >> + * zero for faults reported with that DFSC value. > > Nit: they're RES0, so they might become nonzero in the future, with > unpredicatable (and possibly sensitive) meanings. So it makes sense to > hide them until we know they're safe to expose. > >> + */ >> + esr &= ESR_ELx_EC_MASK | ESR_ELx_IL; >> + esr |= ESR_ELx_FSC_FAULT; >> + break; >> + default: >> + /* >> + * This should never happen (entry.S only brings us >> + * into this code for insn and data aborts). Fail >> + * safe by not providing an ESR context record at all. >> + */ >> + WARN(1, "ESR 0x%x is not DABT or IABT\n", esr); >> + esr = 0; >> + break; >> + } >> + } >> + >> tsk->thread.fault_address = addr; >> tsk->thread.fault_code = esr; >> si.si_signo = sig; >> -- >> 2.16.2 >> >> >> _______________________________________________ >> linux-arm-kernel mailing list >> linux-arm-kernel@lists.infradead.org >> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
On Fri, Apr 13, 2018 at 03:47:08PM +0100, Peter Maydell wrote: > On 13 April 2018 at 15:21, Dave Martin <Dave.Martin@arm.com> wrote: > > On Fri, Apr 13, 2018 at 11:02:04AM +0100, Peter Maydell wrote: > >> If userspace faults on a kernel address, handing them the raw ESR > >> value on the sigframe as part of the delivered signal can leak data > >> useful to attackers who are using information about the underlying hardware > >> fault type (e.g. translation vs permission) as a mechanism to defeat KASLR. > >> > >> However there are also legitimate uses for the information provided > >> in the ESR -- notably the GCC and LLVM sanitizers would like to be > > > > "would like to" or "currently rely on for correct behaviour"? > > Closer to the latter -- certainly if we just dropped ESR we would > change their behaviour in those situations, though they have a > fallback codepath I think so they wouldn't die entirely. I see. > > The usefulness of some of this information (particularly WnR) does not > > seem to be arch-dependent. If this were reported in a portable way via > > siginfo, would it allow any known users to be simplified? > > > > I'm wondering whether there's an argument for coming up with a generic > > solution for the future. That wouldn't replace this patch though. > > That would certainly be nice. QEMU has accumulated a collection > of per-arch per-OS signal handlers that have code of varying > degrees of ugliness to try to determine "was this a write" > when in a SIGSEGV/SIGBUS handler: > > https://git.qemu.org/?p=qemu.git;a=blob;f=accel/tcg/user-exec.c;h=26a3ffbba1de229060f683f9f6a0ca80db988eb9;hb=refs/heads/master#l180 > > as has the LLVM sanitizer code: > > https://github.com/llvm-mirror/compiler-rt/blob/master/lib/sanitizer_common/sanitizer_linux.cc#L1724 > > (though they seem to have managed to make it a bit less ugly > than QEMU :-)) > > > While we're about it, it occurs to me that even for faults taken > > on user addresses, ESR exposes information that is not really > > relevant to userspace. For example, the layout of the pagetables > > is not something that userspace can normally see, so it is strange > > to report the level at which a translation fault occurred etc. > > > > If we are sanitising this information, we should perhaps squash > > all faults that specify a translation table level to a single level. > > Mmm, maybe (though I guess level 0 fault is not so appropriate > here). TBH it feels a bit like an unnecessary level of gold-plating. > > > There are certain fault types that should never be exposed to > > userspace, such as TLB conflict aborts, access flag faults, etc. > > See the changes upstream between v4.16 and torvalds/master (which I had > > temporarily forgotten about...) > > > > Now, __do_user_fault() is called for these cases but the signal has > > already been mapped to SIGKILL by this point via the fault_info[] > > table. It doesn't matter too much what we do in such cases because > > the user task will be killed on signal delivery and so doesn't get a > > chance to inspect the ESR contents anyway. It might be worth expanding > > the WARN() to catch mismaintenance of the fault_info[] table -- but that > > may be overkill. Maybe adding some comments explaining the dependency > > is sufficient. > > I don't entirely understand this paragraph, but if you want to > provide a comment I'm happy to fill it into a v2 of this patch :-) Take a look at arch/arm64/mm/fault.c in torvalds/master. That's probably easier than trying to explain it. What I was trying to say was that for certain DFSC values __do_user_fault() may assume that userspace is sent SIGKILL and so never sees the syndrome information. But if we are relying on that for safety here, we should comment and/or double-check with a WARN(). > >> The illusion is not entirely perfect, as there is a tiny wrinkle > >> where we will report an alignment fault that was not due to the memory > >> type (for instance a LDREX to an unaligned address) as a translation > >> fault, whereas if you do this on real unmapped memory the alignment > >> fault takes precedence. This is not likely to trip anybody up in > >> practice, as the only users we know of for the ESR information who > >> care about the behaviour for kernel addresses only really want to > >> know about the WnR bit. > > > > I tend to agree with this. We could probably allow the alignment fault > > to show through, but I can't think of a reasonable scenario where > > userspace would legitimately rely on this. > > If you always allow "alignment fault" DFSC values through, then you > are permitting userspace to distinguish kernel-space memory that > is Device memory from (unmapped or Normal), because it can do an > unaligned LDR access or similar, and see whether it gets a DFSC > for alignment fault or one for a translation error. Yes, true. Bleh. I think your argument about this apparent repriotisation of fault types being harmless is reasonable in any case. > >> + if (addr >= TASK_SIZE) { > >> + switch (ESR_ELx_EC(esr)) { > >> + case ESR_ELx_EC_DABT_CUR: > > > > How do we hit the _CUR cases, and if we can, aren't we reporting > > information about an insn executed by the _kernel_ here? > > > > Possibly we should delete those cases: if we took a _CUR exception with > > user_mode(regs) it looks a bit like we would have to have been executing > > at EL0 but somehow took the exception from EL1 to EL1. That sounds like > > it should be impossible... > > Yes, you're right that we shouldn't get here with the _CUR cases. > I think I was thinking there might be cases where the kernel > accessed userspace memory and reported faults with a signal, > but looking at the code we can't get to this function unless > user_mode() returned true. So we can let the default case handle > the _CUR cases too. OK. The WARN text should probably change if so. > >> + case ESR_ELx_EC_DABT_LOW: > >> + /* > >> + * These bits provide only information about the > >> + * faulting instruction, which userspace knows already. > >> + * All other bits are architecturally required to be > >> + * zero for faults reported with a DFSCR indicating > >> + * a level 0 translation fault. > >> + */ > >> + esr &= ESR_ELx_EC_MASK | ESR_ELx_IL | ESR_ELx_ISV | > > > > In the !ISV case, some of the other fields become architecturally > > UNKNOWN. We should probably mask them out in that case, since they > > could expose something that EL0 is not supposed to see. > > The spec says they're RES0 when ISV is 0, and only UNKNOWN if ISV > is UNKNOWN (which only happens for Debug state accesses, which I > think are not relevant here). Hmmm, you're right. I wasn't reading the spec carefully enough. RES0 still probably means we should mask those bits out though (same rationale as in the next case). > We can certainly mask out the RES0 bits in the ISV==0 case, though. Agreed. > >> + ESR_ELx_SAS | ESR_ELx_SSE | ESR_ELx_SRT_MASK | > >> + ESR_ELx_SF | ESR_ELx_AR | ESR_ELx_CM | > >> + ESR_ELx_WNR; > >> + esr |= ESR_ELx_FSC_FAULT; > >> + break; > >> + case ESR_ELx_EC_IABT_CUR: > >> + case ESR_ELx_EC_IABT_LOW: > >> + /* > >> + * Claim a level 0 translation fault. > >> + * All other bits are architecturally required to be > >> + * zero for faults reported with that DFSC value. > > > > Nit: they're RES0, so they might become nonzero in the future, with > > unpredicatable (and possibly sensitive) meanings. So it makes sense to > > hide them until we know they're safe to expose. > > OK, I'll tweak the comment text. > > >> + */ > >> + esr &= ESR_ELx_EC_MASK | ESR_ELx_IL; > >> + esr |= ESR_ELx_FSC_FAULT; > >> + break; > >> + default: > >> + /* > >> + * This should never happen (entry.S only brings us > >> + * into this code for insn and data aborts). Fail > >> + * safe by not providing an ESR context record at all. > >> + */ > >> + WARN(1, "ESR 0x%x is not DABT or IABT\n", esr); > >> + esr = 0; > >> + break; > >> + } > >> + } > >> + > >> tsk->thread.fault_address = addr; > >> tsk->thread.fault_code = esr; > >> si.si_signo = sig; > > thanks > -- PMM > > _______________________________________________ > linux-arm-kernel mailing list > linux-arm-kernel@lists.infradead.org > http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
On 13 April 2018 at 15:21, Dave Martin <Dave.Martin@arm.com> wrote: > There are certain fault types that should never be exposed to > userspace, such as TLB conflict aborts, access flag faults, etc. > See the changes upstream between v4.16 and torvalds/master (which I had > temporarily forgotten about...) > > Now, __do_user_fault() is called for these cases but the signal has > already been mapped to SIGKILL by this point via the fault_info[] > table. It doesn't matter too much what we do in such cases because > the user task will be killed on signal delivery and so doesn't get a > chance to inspect the ESR contents anyway. It might be worth expanding > the WARN() to catch mismaintenance of the fault_info[] table -- but that > may be overkill. Maybe adding some comments explaining the dependency > is sufficient. I had a look at the code in current master, and from my reading of it it does not call __do_user_fault() for faults like TLB conflict aborts, etc. Those all have fault_info[] table entries that specify do_bad() as the function pointer, which just returns 1. It's only do_page_fault(), do_translation_fault() and do_alignment() fault that can get into __do_user_fault(). So I don't think there's any change that needs to be made for this point. Am I misunderstanding what you have in mind? If so it would probably be simplest if you explained the change you'd like to see. (The changes in master since v4.16 do require a rebase anyway, mostly for textual reasons, so I'll base v2 on master unless you have a different branch you'd rather I based it on.) I've made the various other changes you suggest. thanks -- PMM
On Tue, Apr 17, 2018 at 06:34:11PM +0100, Peter Maydell wrote: > On 13 April 2018 at 15:21, Dave Martin <Dave.Martin@arm.com> wrote: > > There are certain fault types that should never be exposed to > > userspace, such as TLB conflict aborts, access flag faults, etc. > > See the changes upstream between v4.16 and torvalds/master (which I had > > temporarily forgotten about...) > > > > Now, __do_user_fault() is called for these cases but the signal has > > already been mapped to SIGKILL by this point via the fault_info[] > > table. It doesn't matter too much what we do in such cases because > > the user task will be killed on signal delivery and so doesn't get a > > chance to inspect the ESR contents anyway. It might be worth expanding > > the WARN() to catch mismaintenance of the fault_info[] table -- but that > > may be overkill. Maybe adding some comments explaining the dependency > > is sufficient. > > I had a look at the code in current master, and from my reading > of it it does not call __do_user_fault() for faults like TLB > conflict aborts, etc. Those all have fault_info[] table entries > that specify do_bad() as the function pointer, which just returns 1. > It's only do_page_fault(), do_translation_fault() and do_alignment() > fault that can get into __do_user_fault(). > > So I don't think there's any change that needs to be made for this > point. Am I misunderstanding what you have in mind? If so it would > probably be simplest if you explained the change you'd like to see. Don't worry about this; I think you're right. I was confusing do_bad() with do_bad_area() in my head somewhere along the line... > > (The changes in master since v4.16 do require a rebase anyway, mostly > for textual reasons, so I'll base v2 on master unless you have a > different branch you'd rather I based it on.) -rc1 or master is probably fine. Most of the dust from the merge window ought to have settled now. > I've made the various other changes you suggest. Cheers ---Dave
diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c index bff11553eb05..933c6d3b906e 100644 --- a/arch/arm64/mm/fault.c +++ b/arch/arm64/mm/fault.c @@ -307,6 +307,57 @@ static void __do_user_fault(struct task_struct *tsk, unsigned long addr, __show_regs(regs); } + /* + * If the faulting address is in the kernel, we must sanitize the ESR. + * From userspace's point of view, kernel-only mappings don't exist + * at all, so we report them as level 0 translation faults. + * (This is not quite the way that "no mapping there at all" behaves: + * an alignment fault not caused by the memory type would take + * precedence over translation fault for a real access to empty + * space. Unfortunately we can't easily distinguish "alignment fault + * not caused by memory type" from "alignment fault caused by memory + * type", so we ignore this wrinkle and just return the translation + * fault.) + */ + if (addr >= TASK_SIZE) { + switch (ESR_ELx_EC(esr)) { + case ESR_ELx_EC_DABT_CUR: + case ESR_ELx_EC_DABT_LOW: + /* + * These bits provide only information about the + * faulting instruction, which userspace knows already. + * All other bits are architecturally required to be + * zero for faults reported with a DFSCR indicating + * a level 0 translation fault. + */ + esr &= ESR_ELx_EC_MASK | ESR_ELx_IL | ESR_ELx_ISV | + ESR_ELx_SAS | ESR_ELx_SSE | ESR_ELx_SRT_MASK | + ESR_ELx_SF | ESR_ELx_AR | ESR_ELx_CM | + ESR_ELx_WNR; + esr |= ESR_ELx_FSC_FAULT; + break; + case ESR_ELx_EC_IABT_CUR: + case ESR_ELx_EC_IABT_LOW: + /* + * Claim a level 0 translation fault. + * All other bits are architecturally required to be + * zero for faults reported with that DFSC value. + */ + esr &= ESR_ELx_EC_MASK | ESR_ELx_IL; + esr |= ESR_ELx_FSC_FAULT; + break; + default: + /* + * This should never happen (entry.S only brings us + * into this code for insn and data aborts). Fail + * safe by not providing an ESR context record at all. + */ + WARN(1, "ESR 0x%x is not DABT or IABT\n", esr); + esr = 0; + break; + } + } + tsk->thread.fault_address = addr; tsk->thread.fault_code = esr; si.si_signo = sig;
If userspace faults on a kernel address, handing them the raw ESR value on the sigframe as part of the delivered signal can leak data useful to attackers who are using information about the underlying hardware fault type (e.g. translation vs permission) as a mechanism to defeat KASLR. However there are also legitimate uses for the information provided in the ESR -- notably the GCC and LLVM sanitizers would like to be able to report whether wild pointer accesses by the application are reads or writes (since a wild write is a more serious bug than a wild read), so we don't want to drop the ESR information entirely. For faulting addresses in the kernel, sanitize the ESR. We choose to present userspace with the illusion that there is nothing mapped in the kernel's part of the address space at all, by reporting all faults as level 0 translation faults. These fields are safe to pass through to userspace as they depend only on the instruction that userspace used to provoke the fault: EC IL ISV SAS SSE SRT SF AR CM WNR All the other fields in ESR except DFSC are architecturally zero for an L1 translation fault, so can be zeroed out without confusing userspace. The illusion is not entirely perfect, as there is a tiny wrinkle where we will report an alignment fault that was not due to the memory type (for instance a LDREX to an unaligned address) as a translation fault, whereas if you do this on real unmapped memory the alignment fault takes precedence. This is not likely to trip anybody up in practice, as the only users we know of for the ESR information who care about the behaviour for kernel addresses only really want to know about the WnR bit. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> --- This RFC patch is an alternative proposal to Will's patch https://patchwork.kernel.org/patch/10258781/ which simply removed the ESR record entirely for kernel addresses. arch/arm64/mm/fault.c | 51 +++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 51 insertions(+)