From patchwork Fri Jan 24 00:10:30 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Michael Clark X-Patchwork-Id: 13948794 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id D4BA0C0218B for ; Fri, 24 Jan 2025 00:13:33 +0000 (UTC) Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1tb7Ja-0002ti-Bw; Thu, 23 Jan 2025 19:12:50 -0500 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1tb7JX-0002sp-36 for qemu-devel@nongnu.org; Thu, 23 Jan 2025 19:12:47 -0500 Received: from anarch128.org ([2001:4801:7825:104:be76:4eff:fe10:52ae]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1tb7JS-0001JC-2T for qemu-devel@nongnu.org; Thu, 23 Jan 2025 19:12:46 -0500 Received: from localhost.localdomain (default-rdns.vocus.co.nz [202.150.110.104] (may be forged)) (authenticated bits=0) by anarch128.org (8.15.2/8.15.2/Debian-22+deb11u3) with ESMTPSA id 50O0CHKp2247057 (version=TLSv1.3 cipher=TLS_AES_256_GCM_SHA384 bits=256 verify=NO); Fri, 24 Jan 2025 00:12:27 GMT Authentication-Results: anarch128.org; auth=pass; dkim=pass (2048-bit rsa key sha256) header.d=anarch128.org header.i=@anarch128.org header.b=ZPpesUFQ header.a=rsa-sha256 header.s=100003; x-return-mx=pass header.domain=anarch128.org policy.is_org=yes (MX Records found: mail.anarch128.org); x-return-mx=pass smtp.domain=anarch128.org policy.is_org=yes (MX Records found: mail.anarch128.org) DKIM-Signature: v=1; a=rsa-sha256; c=simple/simple; d=anarch128.org; s=100003; t=1737677550; bh=tz9f/13fOkjpsPPzzOrlrjDrof3elwb/EdndyVEDpXo=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=ZPpesUFQb1XArZNm3S443LaiwYldcFTpYZIqDYwCsaSowIPPHrOjcTV+xj2XpqVXv cfyhD6nuHlMwDjRlSCkpz8GhxZyiMfPMYFCngrgSwH9aRkKvsgiYh3AewaZf+JNimU mbSbm1oQtLkHbk3JxnOZ7lC5JeE9aIGw/pv12J1oQ6OSDL+NS45T2SkxU9taYI+HKy 0bD0wam9DEIcyRVJikWa+gkAPBJqS8PUO/gA5t8di7gGbffLc5KC8KUzVK4dkWT4ff GdcezbWftB3/ZVdwvp+pwmMKPk00NdubU3aOYBYK4RMLwoKtWL8FH+RurmXgQBMc94 qZeiPWJBFsp9g== From: Michael Clark To: qemu-devel@nongnu.org, Richard Henderson , Eduardo Habkost , Paolo Bonzini , Zhao Liu Cc: Michael Clark Subject: [PATCH v1 2/4] x86-disas: add x86-mini metadata documentation Date: Fri, 24 Jan 2025 13:10:30 +1300 Message-ID: <20250124001032.1073926-3-michael@anarch128.org> X-Mailer: git-send-email 2.43.0 In-Reply-To: <20250124001032.1073926-1-michael@anarch128.org> References: <20250124001032.1073926-1-michael@anarch128.org> MIME-Version: 1.0 Received-SPF: pass client-ip=2001:4801:7825:104:be76:4eff:fe10:52ae; envelope-from=michael@anarch128.org; helo=anarch128.org X-Spam_score_int: -16 X-Spam_score: -1.7 X-Spam_bar: - X-Spam_report: (-1.7 / 5.0 requ) BAYES_00=-1.9, DKIM_INVALID=0.1, DKIM_SIGNED=0.1, SPF_HELO_PASS=-0.001, SPF_PASS=-0.001 autolearn=no autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Sender: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org add detailed information on the instruction opcode encoding format for LEX/VEX/EVEX prefix, map and opcode encoding, the operand encoding format, the field order encoding format and notes on instruction synthesis for parameterized opcodes. Signed-off-by: Michael Clark --- docs/x86-metadata.txt | 301 ++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 301 insertions(+) create mode 100644 docs/x86-metadata.txt diff --git a/docs/x86-metadata.txt b/docs/x86-metadata.txt new file mode 100644 index 000000000000..1e4756069d9d --- /dev/null +++ b/docs/x86-metadata.txt @@ -0,0 +1,301 @@ +x86 Instruction Set Metadata +============================ + +Legacy x86 instructions have been parameterized in the instruction set +metadata using a new LEX prefix for instruction encoding with abstract width +suffix codes that synthesize multiple instruction widths using combinations +of operand size prefixes and `REX.W` bits. This new LEX format makes legacy +instruction encodings consistent with VEX and EVEX encodings as well as +eliminating some redundancy in the metadata. + +There are a small number of special cases for legacy instructions which need +mode-dependent overrides for cases such as, a different opcode is used for +different modes, or the instruction has a quirk where the operand size does +not follow the default rules for instruction word and address sizes: + +- `.wx` is used to specify 64-bit instructions that default to 32-bit + operands in 64-bit mode. +- `.ww` is used to specify 64-bit instructions that default to 64-bit + operands in 64-bit mode. +- `.o16` is used to specify an instruction override specific to 16-bit mode. +- `.o32` is used to specify an instruction override specific to 32-bit mode. +- `.o64` is used to specify an instruction override specific to 64-bit mode. + +CSV File Format +=============== + +The instruction set metadata in the `data` directory has the following fields +which map to instruction encoding tables in the Intel Architecture Software +Developer's Manual: + +- _Instruction_: opcode and operands from Opcode/Instruction column. +- _Opcode_: instruction encoding from Opcode/Instruction column. +- _Valid 64-bit_: 64-bit valid field from 64/32 bit Mode Support column. +- _Valid 32-bit_: 32-bit valid field from 64/32 bit Mode Support column. +- _Valid 16-bit_: 16-bit valid field from Compat/Legacy Mode column. +- _Feature Flags_: extension name from CPUID Feature Flag column. +- _Operand 1_: Operand 1 column from Instruction Operand Encoding table. +- _Operand 2_: Operand 2 column from Instruction Operand Encoding table. +- _Operand 3_: Operand 3 column from Instruction Operand Encoding table. +- _Operand 4_: Operand 4 column from Instruction Operand Encoding table. +- _Tuple Type_: Tuple Type column from Instruction Operand Encoding table. + +The instruction set metadata in the `data` directory is derived from +[x86-csv](https://github.com/GregoryComer/x86-csv), although it has had +extensive modifications to fix transcription errors, to revise legacy +instruction encodings to conform to the new LEX format, as well as add +missing details such as missing operands or recently added AVX-512 +instruction encodings and various other instruction set extensions. + +Table Generation +================ + +The appendices outline the printable form of the mnemonics used in the +generated tables to describe operands, instruction encodings and field order. +The mnemonics are referenced in the instruction set metadata files which are +translated to enums and arrays by `scripts/x86-tablegen.py` which then map to +the enum type and set definitions in `disas/x86.h`: + +- _enum x86_opr_ - operand encoding enum type and set attributes. +- _enum x86_enc_ - instruction encoding enum type and set attributes. +- _enum x86_ord_ - operand to instruction encoding field map set attributes. + +The enum values are combined together with _logical or_ combinations to +form the primary metadata tables used by the encoder and decoder library: + +- _struct x86_opc_data_ - table type for instruction opcode encodings. +- _struct x86_opr_data_ - table type for unique sets of instruction operands. +- _struct x86_ord_data_ - table type for unique sets of instruction field orders. + +***Note***: There are some differences between the mnemonics used in the +CSV metadata and the C enums. Exceptions are described in `operand_map` and +`opcode_map` within `scripts/x86_tablegen.py`. The primary differences are +in the names used in the operand columns to indicate operand field order, +otherwise a type prefix is added, dots and brackets are omitted, and forward +slashes are translated to underscores. + +Appendices +========== + +This section describes the mnemonics used in the primary data structures: + +- _Appendix A - Operand Encoding_ - describes instruction operands. +- _Appendix B - Operand Order_ - describes instruction field ordering. +- _Appendix C - Instruction Encoding Prefixes_ - describes encoding prefixes. +- _Appendix D - Instruction Encoding Suffixes_ - describes encoding suffixes. +- _Appendix E - Instruction Synthesis Notes_ - notes on prefix synthesis. + +Appendix A - Operand Encoding +============================= + +This table outlines the operand mnemonics used in instruction operands +_(enum x86_opr)_. + +| operand | description | +|:-------------------|:------------------------------------------------------| +| `r` | integer register | +| `v` | vector register | +| `k` | mask register | +| `seg` | segment register | +| `creg` | control register | +| `dreg` | debug register | +| `bnd` | bound register | +| `mem` | memory reference | +| `rw` | integer register word-sized (16/32/64 bit) | +| `ra` | integer register addr-sized (16/32/64 bit) | +| `mw` | memory reference word-sized (16/32/64 bit) | +| `mm` | vector register 64-bit | +| `xmm` | vector register 128-bit | +| `ymm` | vector register 256-bit | +| `zmm` | vector register 512-bit | +| `r8` | register 8-bit | +| `r16` | register 16-bit | +| `r32` | register 32-bit | +| `r64` | register 64-bit | +| `m8` | memory reference 8-bit byte | +| `m16` | memory reference 16-bit word | +| `m32` | memory reference 32-bit dword | +| `m64` | memory reference 64-bit qword | +| `m128` | memory reference 128-bit oword/xmmword | +| `m256` | memory reference 256-bit ymmword | +| `m512` | memory reference 512-bit zmmword | +| `m80` | memory reference 80-bit tword/tbyte | +| `m384` | memory reference 384-bit key locker handle | +| `mib` | memory reference bound | +| `m16bcst` | memory reference 16-bit word broadcast | +| `m32bcst` | memory reference 32-bit word broadcast | +| `m64bcst` | memory reference 64-bit word broadcast | +| `vm32` | vector memory 32-bit | +| `vm64` | vector memory 64-bit | +| `{er}` | operand suffix - embedded rounding control | +| `{k}` | operand suffix - apply mask register | +| `{sae}` | operand suffix - suppress all execptions | +| `{z}` | operand suffix - zero instead of merge | +| `{rs2}` | operand suffix - register stride 2 | +| `{rs4}` | operand suffix - register stride 4 | +| `r/m8` | register unsized memory 8-bit | +| `r/m16` | register unsized memory 16-bit | +| `r/m32` | register unsized memory 32-bit | +| `r/m64` | register unsized memory 64-bit | +| `k/m8` | mask register memory 8-bit | +| `k/m16` | mask register memory 16-bit | +| `k/m32` | mask register memory 32-bit | +| `k/m64` | mask register memory 64-bit | +| `bnd/m64` | bound register memory 64-bit | +| `bnd/m128` | bound register memory 128-bit | +| `rw/mw` | register or memory 16/32/64-bit (word size) | +| `r8/m8` | 8-bit register 8-bit memory | +| `r?/m?` | N-bit register N-bit memory | +| `mm/m?` | 64-bit vector N-bit memory | +| `xmm/m?` | 128-bit vector N-bit memory | +| `ymm/m?` | 256-bit vector N-bit memory | +| `zmm/m?` | 512-bit vector N-bit memory | +| `xmm/m?/m?bcst` | 128-bit vector N-bit memory N-bit broadcast | +| `ymm/m?/m?bcst` | 256-bit vector N-bit memory N-bit broadcast | +| `zmm/m?/m?bcst` | 512-bit vector N-bit memory N-bit broadcast | +| `vm32x` | 32-bit vector memory in xmm | +| `vm32y` | 32-bit vector memory in ymm | +| `vm32z` | 32-bit vector memory in zmm | +| `vm64x` | 64-bit vector memory in xmm | +| `vm64y` | 64-bit vector memory in ymm | +| `vm64z` | 64-bit vector memory in zmm | +| `st0` | implicit register st0 | +| `st1` | implicit register st1 | +| `es` | implicit segment es | +| `cs` | implicit segment cs | +| `ss` | implicit segment ss | +| `ds` | implicit segment ds | +| `fs` | implicit segment fs | +| `gs` | implicit segment gs | +| `aw` | implicit register (ax/eax/rax) | +| `cw` | implicit register (cx/ecx/rcx) | +| `dw` | implicit register (dx/edx/rdx) | +| `bw` | implicit register (bx/ebx/rbx) | +| `pa` | implicit indirect register (ax/eax/rax) | +| `pc` | implicit indirect register (cx/ecx/rcx) | +| `pd` | implicit indirect register (dx/edx/rdx) | +| `pb` | implicit indirect register (bx/ebx/rbx) | +| `psi` | implicit indirect register (si/esi/rsi) | +| `pdi` | implicit indirect register (di/edi/rdi) | +| `xmm0` | implicit register xmm0 | +| `xmm0_7` | implicit registers xmm0-xmm7 | +| `1` | constant 1 | +| `ib` | 8-bit immediate | +| `iw` | 16-bit or 32-bit immediate (mode + operand size) | +| `id` | 32-bit immediate | +| `iq` | 64-bit immediate | +| `rel8` | 8-bit displacement | +| `relw` | 6-bit or 32-bit displacement (mode + operand size) | +| `moffs` | indirect memory offset | +| `far16/16` | 16-bit seg 16-bit far displacement | +| `far16/32` | 16-bit seg 32-bit far displacement | +| `memfar16/16` | indirect 16-bit seg 16-bit far displacement | +| `memfar16/32` | indirect 16-bit seg 32-bit far displacement | +| `memfar16/64` | indirect 16-bit seg 64-bit far displacement | + +Appendix B - Operand Order +========================== + +This table outlines the mnemonics used to map operand field order +_(enum x86_ord)_. + +| mnemonic | description | +|:---------|:----------------------------------------------------------------| +| `imm` | ib, iw, i16, i32, i64 | +| `reg` | modrm.reg | +| `mrm` | modrm.r/m | +| `sib` | modrm.r/m sib | +| `is4` | register from ib | +| `ime` | i8, i16 (special case for CALLF/JMPF/ENTER) | +| `vec` | VEX.vvvv | +| `opr` | opcode +r | +| `one` | constant 1 | +| `rax` | constant al/ax/eax/rax | +| `rcx` | constant cl/cx/ecx/rcx | +| `rdx` | constant dl/dx/edx/rdx | +| `rbx` | constant bl/bx/ebx/rbx | +| `rsp` | constant sp/esp/rsp | +| `rbp` | constant bp/ebp/rbp | +| `rsi` | constant si/esi/rsi | +| `rdi` | constant di/edi/rdi | +| `st0` | constant st(0) | +| `stx` | constant st(i) | +| `seg` | constant segment | +| `xmm0` | constant xmm0 | +| `xmm0_7` | constant xmm0-xmm7 | +| `mxcsr` | constant mxcsr | +| `rflags` | constant rflags | + +Appendix C - Instruction Encoding Prefixes +========================================== + +This table outlines the mnemonic prefixes used in instruction encodings +_(enum x86_enc)_. + +| mnemonic | description | +|:---------|:----------------------------------------------------------------| +| `lex` | legacy instruction | +| `vex` | VEX encoded instruction | +| `evex` | EVEX encoded instruction | +| `.lz` | VEX encoding L=0 and L=1 is unassigned | +| `.l0` | VEX encoding L=0 | +| `.l1` | VEX encoding L=1 | +| `.lig` | VEX/EVEX encoding ignores length L=any | +| `.128` | VEX/EVEX encoding uses 128-bit vector L=0 | +| `.256` | VEX/EVEX encoding uses 256-bit vector L=1 | +| `.512` | EVEX encoding uses 512-bit vector L=2 | +| `.66` | prefix byte 66 is used for opcode mapping | +| `.f2` | prefix byte f2 is used for opcode mapping | +| `.f3` | prefix byte f3 is used for opcode mapping | +| `.9b` | prefix byte 9b is used for opcode mapping (x87 only) | +| `.0f` | map 0f is used in opcode | +| `.0f38` | map 0f38 is used in opcode | +| `.0f3a` | map 0f3a is used in opcode | +| `.wn` | no register extension, fixed operand size | +| `.wb` | register extension, fixed operand size | +| `.wx` | REX and/or operand size extension, optional 66 or REX.W0/W1 | +| `.ww` | REX and/or operand size extension, optional 66 and REX.WIG | +| `.w0` | LEX/VEX/EVEX optional REX W0 with operand size used in opcode | +| `.w1` | LEX/VEX/EVEX mandatory REX W1 with operand size used in opcode | +| `.wig` | VEX/EVEX encoding width ignored | + +Appendix D - Instruction Encoding Suffixes +========================================== + +This table outlines the mnemonic suffixes used in instruction encodings +_(enum x86_enc)_. + +| mnemonic | description | +|:---------|:----------------------------------------------------------------| +| `/r` | ModRM byte | +| `/0../9` | ModRM byte with 'r' field used for functions 0 to 7 | +| `XX+r` | opcode byte with 3-bit register added to the opcode | +| `XX` | opcode byte | +| `ib` | 8-bit immediate | +| `iw` | 16-bit or 32-bit immediate (real mode XOR operand size) | +| `i16` | 16-bit immediate | +| `i32` | 32-bit immediate | +| `i64` | 64-bit immediate | +| `o16` | encoding uses prefix 66 in 32-bit and 64-bit modes | +| `o32` | encoding uses prefix 66 in 16-bit mode | +| `o64` | encoding is used exclusively in 64-bit mode with REX.W=1 | +| `a16` | encoding uses prefix 67 in 32-bit and 64-bit modes | +| `a32` | encoding uses prefix 67 in 16-bit mode | +| `a64` | encoding is used exclusively in 64-bit mode | +| `lock` | memory operand encodings can be used with the LOCK prefix | + +Appendix E - Instruction Synthesis Notes +======================================== + +The `.wx` and `.ww` mnemonics are used to synthesize prefix combinations: + +- `.wx` labels opcodes with _default 32-bit operand size in 64-bit mode_ + to synthesize 16/32/64-bit versions using REX and operand size prefix, + or in 16/32-bit modes synthesizes 16/32-bit versions using only the + operand size prefix. REX is used for register extension on opcodes + with `rw` or `rw/mw` operands or fixed register operands like `aw`. +- `.ww` labels opcodes with _default 64-bit operand size in 64-bit mode_ + to synthesize 16/64-bit versions using only the operand size prefix, + or in 16/32-bit modes. synthesizes 16/32-bit versions using only the + operand size prefix. REX is used for register extension on opcodes + with `rw` or `rw/mw` operands or fixed register operands like `aw`.