Message ID | 20170925142648.25959-1-george.dunlap@citrix.com (mailing list archive) |
---|---|
State | New, archived |
On Mon, Sep 25, 2017 at 3:26 PM, George Dunlap <george.dunlap@citrix.com> wrote:
> From: Jan Beulich <jbeulich@suse.com>
>
> fuzz_insn_fetch() is the only data access helper where it is possible
> to see offsets larger than 4Gb in 16- or 32-bit modes, as we leave the
> incoming rIP untouched in the emulator itself. The check is needed here
> as otherwise, after successfully fetching insn bytes, we may end up
> zero-extending EIP soon after complete_insn, which collides with the
> X86EMUL_EXCEPTION-conditional respective ASSERT() in
> x86_emulate_wrapper(). (NB: put_rep_prefix() is what allows
> complete_insn to be reached with rc set to other than X86EMUL_OKAY or
> X86EMUL_DONE. See also commit 53f87c03b4 ["x86emul: generalize
> exception handling for rep_* hooks"].)
>
> Add assert()-s for all other (data) access routines, as effective
> address generation in the emulator ought to guarantee in-range values.
> For them to not trigger, several adjustments to the emulator's address
> calculations are needed: While for DstBitBase it is really mandatory,
> the specification allows for either behavior for two-part accesses.

Something seems to be missing here -- what's mandatory for DstBitBase,
and what are the two behaviors allowed by the specification for
two-part accesses?

> Observed behavior on real hardware, however, is for such accesses to
> silently wrap at the 2^^32 boundary in other than 64-bit mode, just
> like they do at the 2^^64 boundary in 64-bit mode. While adding
> truncate_ea() invocations there, also convert open coded instances of
> it.
>
> Reported-by: George Dunlap <george.dunlap@citrix.com>
> Signed-off-by: Jan Beulich <jbeulich@suse.com>

This certainly fixes the issue AFL discovered; and on the whole it
seems to do what the description says, and seems to be pretty
reasonable.

 -George
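The fuzz_insn_fetch() check described in the commit message can be modeled in isolation. The sketch below is illustrative only, with hypothetical names (model_fetch_check, MODEL_EXCEPTION) standing in for the fuzzer's hook and Xen's return codes: in other than 64-bit mode, a fetch offset with any of bits 32-63 set is rejected (the real hook injects #GP(0) via x86_emul_hw_exception()) instead of being passed through.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for X86EMUL_OKAY / X86EMUL_EXCEPTION. */
enum model_rc { MODEL_OKAY, MODEL_EXCEPTION };

/*
 * Model of the minimal limit check added to fuzz_insn_fetch(): offsets
 * above 4Gb are only legitimate when the address size is 64 bits.
 */
static enum model_rc model_fetch_check(unsigned int addr_size, uint64_t offset)
{
    if ( addr_size < 64 && (offset >> 32) )
        return MODEL_EXCEPTION; /* the real hook raises #GP(0) here */
    return MODEL_OKAY;
}
```

This mirrors why only the fetch path needs a runtime check rather than an assert: the incoming rIP is not truncated by the emulator itself, so out-of-range values can legitimately reach this hook.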
>>> On 06.10.17 at 17:21, <george.dunlap@citrix.com> wrote:
> On Mon, Sep 25, 2017 at 3:26 PM, George Dunlap <george.dunlap@citrix.com> wrote:
>> From: Jan Beulich <jbeulich@suse.com>
>>
>> fuzz_insn_fetch() is the only data access helper where it is possible
>> to see offsets larger than 4Gb in 16- or 32-bit modes, as we leave the
>> incoming rIP untouched in the emulator itself. The check is needed here
>> as otherwise, after successfully fetching insn bytes, we may end up
>> zero-extending EIP soon after complete_insn, which collides with the
>> X86EMUL_EXCEPTION-conditional respective ASSERT() in
>> x86_emulate_wrapper(). (NB: put_rep_prefix() is what allows
>> complete_insn to be reached with rc set to other than X86EMUL_OKAY or
>> X86EMUL_DONE. See also commit 53f87c03b4 ["x86emul: generalize
>> exception handling for rep_* hooks"].)
>>
>> Add assert()-s for all other (data) access routines, as effective
>> address generation in the emulator ought to guarantee in-range values.
>> For them to not trigger, several adjustments to the emulator's address
>> calculations are needed: While for DstBitBase it is really mandatory,
>> the specification allows for either behavior for two-part accesses.
>
> Something seems to be missing here -- what's mandatory for DstBitBase,
> and what are the two behaviors allowed by the specification for
> two-part accesses?

"it" in the sentence above refers to "adjustments" (for DstBitBase
it's just one adjustment, so the singular/plural mismatch is not
really helpful). Similarly "either behavior" refers to the code with
and without the changes done. Would

"Add assert()-s for all other (data) access routines, as effective
address generation in the emulator ought to guarantee in-range values.
For them to not trigger, several adjustments to the emulator's address
calculations are needed: While the DstBitBase one is really mandatory,
the specification allows for either original or new behavior for two-
part accesses. Observed behavior on real hardware, however, is for such
accesses to silently wrap at the 2^^32 boundary in other than 64-bit
mode, just like they do at the 2^^64 boundary in 64-bit mode, which our
code is now being brought in line with. While adding truncate_ea()
invocations there, also convert open coded instances of it."

be any better?

Jan
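The wrapping behavior being discussed — effective addresses silently wrapping at 2^32 in other than 64-bit mode, and at 2^64 in 64-bit mode — comes down to masking by the address size. A minimal model, assuming truncate_ea() reduces to a mask derived from the emulator state's ad_bytes (the function name here is hypothetical):

```c
#include <stdint.h>

/*
 * Model of effective-address truncation: mask the offset to ad_bytes
 * bytes' worth of bits (2 -> 2^16 wrap, 4 -> 2^32 wrap, 8 -> no mask,
 * i.e. the natural 2^64 wrap of unsigned arithmetic).
 */
static uint64_t model_truncate_ea(uint64_t ea, unsigned int ad_bytes)
{
    return ad_bytes < sizeof(uint64_t)
           ? ea & (((uint64_t)1 << (ad_bytes * 8)) - 1)
           : ea;
}
```

Under this model a two-part access such as reading op_bytes beyond the operand in 32-bit mode computes its second offset as model_truncate_ea(off + op_bytes, 4), so an access straddling 0xFFFFFFFF wraps back to low addresses rather than producing a 33-bit offset.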
On 10/06/2017 04:54 PM, Jan Beulich wrote:
>>>> On 06.10.17 at 17:21, <george.dunlap@citrix.com> wrote:
>> On Mon, Sep 25, 2017 at 3:26 PM, George Dunlap <george.dunlap@citrix.com> wrote:
>>> From: Jan Beulich <jbeulich@suse.com>
>>>
>>> fuzz_insn_fetch() is the only data access helper where it is possible
>>> to see offsets larger than 4Gb in 16- or 32-bit modes, as we leave the
>>> incoming rIP untouched in the emulator itself. The check is needed here
>>> as otherwise, after successfully fetching insn bytes, we may end up
>>> zero-extending EIP soon after complete_insn, which collides with the
>>> X86EMUL_EXCEPTION-conditional respective ASSERT() in
>>> x86_emulate_wrapper(). (NB: put_rep_prefix() is what allows
>>> complete_insn to be reached with rc set to other than X86EMUL_OKAY or
>>> X86EMUL_DONE. See also commit 53f87c03b4 ["x86emul: generalize
>>> exception handling for rep_* hooks"].)
>>>
>>> Add assert()-s for all other (data) access routines, as effective
>>> address generation in the emulator ought to guarantee in-range values.
>>> For them to not trigger, several adjustments to the emulator's address
>>> calculations are needed: While for DstBitBase it is really mandatory,
>>> the specification allows for either behavior for two-part accesses.
>>
>> Something seems to be missing here -- what's mandatory for DstBitBase,
>> and what are the two behaviors allowed by the specification for
>> two-part accesses?
>
> "it" in the sentence above refers to "adjustments" (for DstBitBase
> it's just one adjustment, so the singular/plural mismatch is not
> really helpful). Similarly "either behavior" refers to the code with
> and without the changes done. Would
>
> "Add assert()-s for all other (data) access routines, as effective
> address generation in the emulator ought to guarantee in-range values.
> For them to not trigger, several adjustments to the emulator's address
> calculations are needed: While the DstBitBase one is really mandatory,
> the specification allows for either original or new behavior for two-
> part accesses. Observed behavior on real hardware, however, is for such
> accesses to silently wrap at the 2^^32 boundary in other than 64-bit
> mode, just like they do at the 2^^64 boundary in 64-bit mode, which our
> code is now being brought in line with. While adding truncate_ea()
> invocations there, also convert open coded instances of it."
>
> be any better?

Yes, I think so.  One more thing:

> @@ -1249,10 +1249,10 @@ static void __put_rep_prefix(
>
>  /* Clip maximum repetitions so that the index register at most just wraps. */
>  #define truncate_ea_and_reps(ea, reps, bytes_per_rep) ({
> -    unsigned long todo__, ea__ = truncate_word(ea, ad_bytes);
> +    unsigned long todo__, ea__ = truncate_ea(ea);
>      if ( !(_regs.eflags & X86_EFLAGS_DF) )
> -        todo__ = truncate_word(-(ea), ad_bytes) / (bytes_per_rep);
> -    else if ( truncate_word((ea) + (bytes_per_rep) - 1, ad_bytes) < ea__ )\
> +        todo__ = truncate_ea(-ea__) / (bytes_per_rep);
> +    else if ( truncate_ea(ea__ + (bytes_per_rep) - 1) < ea__ )

This changes truncate_ea(-ea) to truncate_ea(-truncate_ea(ea)), and
truncate_ea(ea + bpr-1) to truncate_ea(truncate_ea(ea) + bpr-1). That
sounds like a plausible change, but it's worth checking to see that it
was intentional.

I'm not at the moment able to evaluate the assertion that "the
specification allows for either original or new behavior for two-part
accesses", nor that there is a 1:1 mapping between what this patch
changes and what needs to be changed.

But assuming the above are true (and the change I asked about was
intentional):

Reviewed-by: George Dunlap <george.dunlap@citrix.com>

If that's not good enough to check it in, feel free to give me the
appropriate references to the Intel manual that will allow me to
independently verify those facts, and get rid of my caveat.

 -George
>>> On 06.10.17 at 19:06, <george.dunlap@citrix.com> wrote:
> On 10/06/2017 04:54 PM, Jan Beulich wrote:
>>>>> On 06.10.17 at 17:21, <george.dunlap@citrix.com> wrote:
>>> On Mon, Sep 25, 2017 at 3:26 PM, George Dunlap <george.dunlap@citrix.com> wrote:
> One more thing:
>
>> @@ -1249,10 +1249,10 @@ static void __put_rep_prefix(
>>
>>  /* Clip maximum repetitions so that the index register at most just wraps. */
>>  #define truncate_ea_and_reps(ea, reps, bytes_per_rep) ({
>> -    unsigned long todo__, ea__ = truncate_word(ea, ad_bytes);
>> +    unsigned long todo__, ea__ = truncate_ea(ea);
>>      if ( !(_regs.eflags & X86_EFLAGS_DF) )
>> -        todo__ = truncate_word(-(ea), ad_bytes) / (bytes_per_rep);
>> -    else if ( truncate_word((ea) + (bytes_per_rep) - 1, ad_bytes) < ea__ )\
>> +        todo__ = truncate_ea(-ea__) / (bytes_per_rep);
>> +    else if ( truncate_ea(ea__ + (bytes_per_rep) - 1) < ea__ )
>
> This changes truncate_ea(-ea) to truncate_ea(-truncate_ea(ea)), and
> truncate_ea(ea + bpr-1) to truncate_ea(truncate_ea(ea) + bpr-1). That
> sounds like a plausible change, but it's worth checking to see that it
> was intentional.

The main goal here was to eliminate the multiple evaluation of the
macro argument plus the open-coding of truncate_ea(). With
truncate_ea(truncate_ea(x)) == truncate_ea(x) and
truncate_ea(-truncate_ea(x)) == truncate_ea(-x) there's no change in
net result. So yes, the change was intentional.

> I'm not at the moment able to evaluate the assertion that "the
> specification allows for either original or new behavior for two-part
> accesses", nor that there is a 1:1 mapping between what this patch
> changes and what needs to be changed.

Section "Limit Checking" in volume 3 has "When the effective limit is
FFFFFFFFH (4 GBytes), these accesses may or may not cause the indicated
exceptions. Behavior is implementation-specific and may vary from one
execution to another."

Additionally section "Segment Wraparound" in volume 3 has "The behavior
when executing near the limit of a 4-GByte selector (limit = FFFFFFFFH)
is different between the Pentium Pro and the Pentium 4 family of
processors. On the Pentium Pro, instructions which cross the limit --
for example, a two byte instruction such as INC EAX that is encoded as
FFH C0H starting exactly at the limit faults for a segment violation (a
one byte instruction at FFFFFFFFH does not cause an exception). Using
the Pentium 4 microprocessor family, neither of these situations causes
a fault."

> But assuming the above are true (and the change I asked about was
> intentional):
>
> Reviewed-by: George Dunlap <george.dunlap@citrix.com>

Thanks.

> If that's not good enough to check it in, feel free to give me the
> appropriate references to the Intel manual that will allow me to
> independently verify those facts, and get rid of my caveat.

The patch still needs Andrew's ack before it can go in.

Jan
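The two identities Jan relies on — truncate_ea(truncate_ea(x)) == truncate_ea(x) and truncate_ea(-truncate_ea(x)) == truncate_ea(-x) — are ordinary modular-arithmetic facts, and can be checked mechanically. The sketch below uses a 32-bit wrap (hypothetical helper trunc32) as a stand-in for truncate_ea() with ad_bytes == 4; it also checks the analogous identity for the DF-set branch, truncate_ea(truncate_ea(ea) + k) == truncate_ea(ea + k):

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for truncate_ea() in 32-bit address-size mode. */
static uint64_t trunc32(uint64_t x)
{
    return (uint32_t)x; /* wrap at the 2^32 boundary */
}

/*
 * Verify, for one sample (x, k), the three identities that make
 * truncate_ea_and_reps() equivalent before and after the patch:
 * idempotency, truncate-before-negate, and truncate-before-add.
 */
static bool identities_hold(uint64_t x, uint64_t k)
{
    return trunc32(trunc32(x)) == trunc32(x) &&
           trunc32(-trunc32(x)) == trunc32(-x) &&
           trunc32(trunc32(x) + k) == trunc32(x + k);
}
```

All three hold because trunc32(x) and x are congruent modulo 2^32, and negation and addition preserve congruence; the same argument applies to the 2^16 mask when ad_bytes == 2.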
diff --git a/tools/fuzz/x86_instruction_emulator/fuzz-emul.c b/tools/fuzz/x86_instruction_emulator/fuzz-emul.c
index a2329f84a5..105145e9f9 100644
--- a/tools/fuzz/x86_instruction_emulator/fuzz-emul.c
+++ b/tools/fuzz/x86_instruction_emulator/fuzz-emul.c
@@ -139,7 +139,18 @@ static int fuzz_read(
     struct x86_emulate_ctxt *ctxt)
 {
     /* Reads expected for all user and system segments. */
-    assert(is_x86_user_segment(seg) || is_x86_system_segment(seg));
+    if ( is_x86_user_segment(seg) )
+        assert(ctxt->addr_size == 64 || !(offset >> 32));
+    else if ( seg == x86_seg_tr )
+        /*
+         * The TSS is special in that accesses below the segment base are
+         * possible, as the Interrupt Redirection Bitmap starts 32 bytes
+         * ahead of the I/O Bitmap, regardless of the value of the latter.
+         */
+        assert((long)offset < 0 ? (long)offset > -32 : !(offset >> 17));
+    else
+        assert(is_x86_system_segment(seg) &&
+               (ctxt->lma ? offset <= 0x10007 : !(offset >> 16)));
 
     return data_read(ctxt, seg, "read", p_data, bytes);
 }
@@ -162,6 +173,13 @@ static int fuzz_insn_fetch(
 {
     assert(seg == x86_seg_cs);
 
+    /* Minimal segment limit checking, until full one is being put in place. */
+    if ( ctxt->addr_size < 64 && (offset >> 32) )
+    {
+        x86_emul_hw_exception(13, 0, ctxt);
+        return X86EMUL_EXCEPTION;
+    }
+
     /*
      * Zero-length instruction fetches are made at the destination of jumps,
      * to perform segmentation checks. No data needs returning.
@@ -232,6 +250,7 @@ static int fuzz_rep_ins(
     struct x86_emulate_ctxt *ctxt)
 {
     assert(dst_seg == x86_seg_es);
+    assert(ctxt->addr_size == 64 || !(dst_offset >> 32));
 
     return _fuzz_rep_read(ctxt, "rep_ins", reps);
 }
@@ -247,6 +266,7 @@ static int fuzz_rep_movs(
 {
     assert(is_x86_user_segment(src_seg));
     assert(dst_seg == x86_seg_es);
+    assert(ctxt->addr_size == 64 || !((src_offset | dst_offset) >> 32));
 
     return _fuzz_rep_read(ctxt, "rep_movs", reps);
 }
@@ -260,6 +280,7 @@ static int fuzz_rep_outs(
     struct x86_emulate_ctxt *ctxt)
 {
     assert(is_x86_user_segment(src_seg));
+    assert(ctxt->addr_size == 64 || !(src_offset >> 32));
 
     return _fuzz_rep_write(ctxt, "rep_outs", reps);
 }
@@ -277,6 +298,7 @@ static int fuzz_rep_stos(
      * for CLZERO.
      */
     assert(is_x86_user_segment(seg));
+    assert(ctxt->addr_size == 64 || !(offset >> 32));
 
     return _fuzz_rep_write(ctxt, "rep_stos", reps);
 }
@@ -290,6 +312,7 @@ static int fuzz_write(
 {
     /* Writes not expected for any system segments. */
     assert(is_x86_user_segment(seg));
+    assert(ctxt->addr_size == 64 || !(offset >> 32));
 
     return maybe_fail(ctxt, "write", true);
 }
@@ -306,8 +329,10 @@ static int fuzz_cmpxchg(
      * Cmpxchg expected for user segments, and setting accessed/busy bits in
      * GDT/LDT enties, but not expected for any IDT or TR accesses.
      */
-    assert(is_x86_user_segment(seg) ||
-           seg == x86_seg_gdtr || seg == x86_seg_ldtr);
+    if ( is_x86_user_segment(seg) )
+        assert(ctxt->addr_size == 64 || !(offset >> 32));
+    else
+        assert((seg == x86_seg_gdtr || seg == x86_seg_ldtr) && !(offset >> 16));
 
     return maybe_fail(ctxt, "cmpxchg", true);
 }
@@ -319,6 +344,7 @@ static int fuzz_invlpg(
 {
     /* invlpg(), unlike all other hooks, may be called with x86_seg_none. */
     assert(is_x86_user_segment(seg) || seg == x86_seg_none);
+    assert(ctxt->addr_size == 64 || !(offset >> 32));
 
     return maybe_fail(ctxt, "invlpg", false);
 }
diff --git a/xen/arch/x86/x86_emulate/x86_emulate.c b/xen/arch/x86/x86_emulate/x86_emulate.c
index c1e2300b39..31df5aeb97 100644
--- a/xen/arch/x86/x86_emulate/x86_emulate.c
+++ b/xen/arch/x86/x86_emulate/x86_emulate.c
@@ -1249,10 +1249,10 @@ static void __put_rep_prefix(
 
 /* Clip maximum repetitions so that the index register at most just wraps. */
 #define truncate_ea_and_reps(ea, reps, bytes_per_rep) ({                  \
-    unsigned long todo__, ea__ = truncate_word(ea, ad_bytes);             \
+    unsigned long todo__, ea__ = truncate_ea(ea);                         \
     if ( !(_regs.eflags & X86_EFLAGS_DF) )                                \
-        todo__ = truncate_word(-(ea), ad_bytes) / (bytes_per_rep);        \
-    else if ( truncate_word((ea) + (bytes_per_rep) - 1, ad_bytes) < ea__ )\
+        todo__ = truncate_ea(-ea__) / (bytes_per_rep);                    \
+    else if ( truncate_ea(ea__ + (bytes_per_rep) - 1) < ea__ )            \
         todo__ = 1;                                                       \
     else                                                                  \
         todo__ = ea__ / (bytes_per_rep) + 1;                              \
@@ -3136,6 +3136,7 @@ x86_emulate(
                 op_bytes + (((-src.val - 1) >> 3) & ~(op_bytes - 1L));
         else
             ea.mem.off += (src.val >> 3) & ~(op_bytes - 1L);
+        ea.mem.off = truncate_ea(ea.mem.off);
     }
 
     /* Bit index always truncated to within range. */
@@ -3354,7 +3355,7 @@ x86_emulate(
         unsigned long src_val2;
         int lb, ub, idx;
         generate_exception_if(src.type != OP_MEM, EXC_UD);
-        if ( (rc = read_ulong(src.mem.seg, src.mem.off + op_bytes,
+        if ( (rc = read_ulong(src.mem.seg, truncate_ea(src.mem.off + op_bytes),
                               &src_val2, op_bytes, ctxt, ops)) )
             goto done;
         ub  = (op_bytes == 2) ? (int16_t)src_val2 : (int32_t)src_val2;
@@ -3905,7 +3906,7 @@ x86_emulate(
         seg = (b & 1) * 3; /* es = 0, ds = 3 */
     les:
         generate_exception_if(src.type != OP_MEM, EXC_UD);
-        if ( (rc = read_ulong(src.mem.seg, src.mem.off + src.bytes,
+        if ( (rc = read_ulong(src.mem.seg, truncate_ea(src.mem.off + src.bytes),
                               &dst.val, 2, ctxt, ops)) != X86EMUL_OKAY )
             goto done;
         ASSERT(is_x86_user_segment(seg));
@@ -4939,7 +4940,8 @@ x86_emulate(
 
         case 5: /* jmp (far, absolute indirect) */
             generate_exception_if(src.type != OP_MEM, EXC_UD);
-            if ( (rc = read_ulong(src.mem.seg, src.mem.off + op_bytes,
+            if ( (rc = read_ulong(src.mem.seg,
+                                  truncate_ea(src.mem.off + op_bytes),
                                   &imm2, 2, ctxt, ops)) )
                 goto done;
             imm1 = src.val;
@@ -5126,8 +5128,8 @@ x86_emulate(
         }
         if ( (rc = ops->write(ea.mem.seg, ea.mem.off, &sreg.limit,
                               2, ctxt)) != X86EMUL_OKAY ||
-             (rc = ops->write(ea.mem.seg, ea.mem.off + 2, &sreg.base,
-                              op_bytes, ctxt)) != X86EMUL_OKAY )
+             (rc = ops->write(ea.mem.seg, truncate_ea(ea.mem.off + 2),
+                              &sreg.base, op_bytes, ctxt)) != X86EMUL_OKAY )
             goto done;
         break;
 
@@ -5137,9 +5139,9 @@ x86_emulate(
         generate_exception_if(!mode_ring0(), EXC_GP, 0);
         fail_if(ops->write_segment == NULL);
         memset(&sreg, 0, sizeof(sreg));
-        if ( (rc = read_ulong(ea.mem.seg, ea.mem.off+0,
+        if ( (rc = read_ulong(ea.mem.seg, ea.mem.off,
                               &limit, 2, ctxt, ops)) ||
-             (rc = read_ulong(ea.mem.seg, ea.mem.off+2,
+             (rc = read_ulong(ea.mem.seg, truncate_ea(ea.mem.off + 2),
                               &base, mode_64bit() ? 8 : 4, ctxt, ops)) )
             goto done;
         generate_exception_if(!is_canonical_address(base), EXC_GP, 0);
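The DstBitBase hunk above is the one adjustment the thread calls "really mandatory": BT-family instructions convert the bit index into an op_bytes-aligned byte displacement and add it to the operand's offset, and without the trailing truncate_ea() that sum can exceed 2^32 in 32-bit mode, tripping the new asserts. A standalone model of that displacement math (hypothetical function name; the mask stands in for truncate_ea()):

```c
#include <stdint.h>

/*
 * Model of the DstBitBase offset computation: positive bit indices move
 * the base forward by an op_bytes-aligned byte count, negative ones move
 * it backward, and the result wraps per the current address size.
 */
static uint64_t bit_base_off(uint64_t ea_off, int64_t bit_idx,
                             unsigned int op_bytes, unsigned int ad_bytes)
{
    if ( bit_idx < 0 )
        ea_off -= op_bytes +
                  (((-bit_idx - 1) >> 3) & ~((uint64_t)op_bytes - 1));
    else
        ea_off += (bit_idx >> 3) & ~((uint64_t)op_bytes - 1);

    /* The patch's added truncate_ea() step. */
    return ad_bytes == 8 ? ea_off
                         : ea_off & (((uint64_t)1 << (ad_bytes * 8)) - 1);
}
```

For example, a dword BT with bit index 64 against a base offset of 0xFFFFFFFC in 32-bit mode yields 0x100000004 before truncation; masking brings it back to 4, matching the silent 2^32 wrap observed on real hardware, whereas in 64-bit mode the untruncated value is kept.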