diff mbox

[RFC,2/2] KVM: emulate: clean up initializations in init_decode_cache

Message ID 1396564070-5586-3-git-send-email-bsd@redhat.com (mailing list archive)
State New, archived
Headers show

Commit Message

Bandan Das April 3, 2014, 10:27 p.m. UTC
A lot of initializations are unnecessary as they get set to
appropriate values before actually being used. Remove some
of them and rework some others if the conditions that set
them are not true

Signed-off-by: Bandan Das <bsd@redhat.com>
---
 arch/x86/include/asm/kvm_emulate.h | 16 +++++++------
 arch/x86/kvm/emulate.c             | 46 +++++++++++++++++++++++++++++++++-----
 2 files changed, 50 insertions(+), 12 deletions(-)

Comments

Paolo Bonzini April 4, 2014, 9:47 a.m. UTC | #1
Hi Bandan.

> diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
> index 8e2b866..eac488b 100644
> --- a/arch/x86/kvm/emulate.c
> +++ b/arch/x86/kvm/emulate.c
> @@ -1072,6 +1072,9 @@ static int decode_modrm(struct x86_emulate_ctxt *ctxt,
>  		ctxt->modrm_reg = (ctxt->rex_prefix & 4) << 1;	/* REX.R */
>  		index_reg = (ctxt->rex_prefix & 2) << 2; /* REX.X */
>  		ctxt->modrm_rm = base_reg = (ctxt->rex_prefix & 1) << 3; /* REG.B */
> +	} else {
> +		ctxt->modrm_reg = 0;
> +		ctxt->modrm_rm = 0;
>  	}

You can drop the if altogether here (and also the initialization
in "int index_reg = 0, base_reg = 0, scale;").

Also, getting down to micro-micro-optimization, the following will give
slightly better code:

 		ctxt->modrm_reg = (ctxt->rex_prefix << 1) & 8; /* REX.R */
 		index_reg = (ctxt->rex_prefix << 2) & 8; /* REX.X */
 		ctxt->modrm_rm = base_reg = (ctxt->rex_prefix << 3) & 8; /* REG.B */

because x86 can do a three-operand shift (using the LEA instruction), but
not three-operand AND.

>  	ctxt->modrm_mod |= (ctxt->modrm & 0xc0) >> 6;
> @@ -4357,6 +4360,8 @@ done_prefixes:
>  
>  	if (ctxt->d & ModRM)
>  		ctxt->modrm = insn_fetch(u8, ctxt);
> +	else
> +		ctxt->modrm = 0;

I think in the "else" case modrm won't be read at all, but it is indeed a bit
safer this way.  You can just leave it in the "zeroed" part of ctxt.  There is
padding in the struct, so you get the initialization for free.


>  	while (ctxt->d & GroupMask) {
>  		switch (ctxt->d & GroupMask) {
> @@ -4435,10 +4440,14 @@ done_prefixes:
>  			ctxt->op_bytes = 16;
>  		else if (ctxt->d & Mmx)
>  			ctxt->op_bytes = 8;
> +	} else {
> +		ctxt->intercept = 0;

You can add a preparatory patch that checks (ctxt->d & Intercept) in
x86_emulate_insn instead of ctxt->intercept and skip this initialization.

> +		ctxt->check_perm = NULL;

Similarly you can check (ctxt->d & CheckPerm) and skip this initialization.

>  	}

Same here: the common case will zero them, might as well leave it to the memset.

>  
>  	/* ModRM and SIB bytes. */
>  	if (ctxt->d & ModRM) {
> +		ctxt->modrm_mod = 0;

What about instead changing

        ctxt->modrm_mod |= (ctxt->modrm & 0xc0) >> 6;

from "|=" to "="?

>  		rc = decode_modrm(ctxt, &ctxt->memop);
>  		if (!ctxt->has_seg_override)
>  			set_seg_override(ctxt, ctxt->modrm_seg);
> @@ -4552,14 +4561,41 @@ static int fastop(struct x86_emulate_ctxt *ctxt, void (*fop)(struct fastop *))
>  
>  void init_decode_cache(struct x86_emulate_ctxt *ctxt)
>  {
> -	memset(&ctxt->opcode_len, 0,
> -	       (void *)&ctxt->_regs - (void *)&ctxt->opcode_len);
>  
> -	ctxt->fetch.start = 0;
> -	ctxt->fetch.end = 0;
> +	/*
> +	 * Variables that don't require initializing to 0
> +	 * opcode_len - set in x86_decode_insn
> +	 * b - set in x86_decode_insn
> +	 * intercept - conditionally set in x86_decode_insn, added
> +	 *             else set to 0
> +	 * op_bytes - initialized in x86_decode_insn
> +	 * ad_bytes - initialized in x86_decode_insn
> +	 * rex_prefix - conditionally set in x86_decode_isn

I think rex_prefix must be in the zeroed area.  Again, with careful field
placement you get that for free.

> +	 * struct operands src,src2,dst - set by calling decode_operand
> +	 *                                in x86_decode_insn,
> +	 *                                default.type = OP_NONE
> +	 * (*execute) - set in x86_decode_insn
> +	 * (*check_perm) - conditionally set in x86_decode_insn, added
> +	 *                 else set to 0
> +	 * d - set in x86_decode_insn
> +	 * modrm - conditionally set in x86_decode_insn, added else set to 0
> +	 * modrm_mod - or'ed in decode_modrm which is conditionally called in
> +	 *             in x86_decode_insn, added initialization to 0 before call
> +	 * modrm_reg - set in decode_modrm or else decode_register_operand
> +	 * modrm_rm - set in decode_modrm, added else set to 0
> +	 * modrm_seg - set in decode_modrm
> +	 * _eip - set in x86_decode_insn
> +	 * memop - .type set to OP_NONE in x86_decode_insn
> +	 * ctxt->fetch.start - set in x86_decode_insn
> +	 * ctxt->fetch.end
> +	 * ctxt->mem_read.pos - set in x86_emulate_insn

Apart from the above suggestions, I agree with this patch.

Also, memopp is initialized to NULL and can be moved to the zeroed area.
But perhaps we can remove memopp, see below.

You now have:

+	u8 lock_prefix;
+	u8 rep_prefix;
 	bool has_seg_override;
 	u8 seg_override;
 	u64 d;
+	bool rip_relative;
+	/* bitmaps of registers in _regs[] that can be read */
+	u32 regs_valid;
+	/* bitmaps of registers in _regs[] that have been written */
+	u32 regs_dirty;

Even if we add back a couple of fields, that's pretty good!  It's 32
bytes (four stores).  rip_relative introduces a lot of padding; move
it together with other u8 and bool fields, and that makes 24.
And ctxt->d is in the zeroed area by mistake, which makes it 16.

There is a little more that you can do in this field.

seg_override can be moved out of the zeroed area, and the seg_override()
calls can be changed to direct accesses of the field.  Please double check
though. :)  Actually, if the above is true, has_seg_override can also be
eliminated and moved to x86_decode_insn as a local variable.

We could also get rid completely of rip_relative and move it to a local
variable in decode_modrm.  Currently it is in x86_decode_insn because
of this earlier check:

        if (ctxt->memop.type == OP_MEM && ctxt->ad_bytes != 8)
                ctxt->memop.addr.mem.ea = (u32)ctxt->memop.addr.mem.ea;

	...

        if (ctxt->memopp && ctxt->memopp->type == OP_MEM && ctxt->rip_relative)
                ctxt->memopp->addr.mem.ea += ctxt->_eip;


but I think the earlier check can be moved to decode_modrm as well.
This is my reasoning:

- the first "if" can only trigger for modrm (decode_abs will not set
mem.ea to an out-of-range value if ctxt->ad_bytes != 8)

- ctxt->memopp != NULL means that *ctxt->memopp was copied from ctxt->memop
(so it must be either ModRM or MemAbs), but rip_relative is set to 1
only if the instruction is ModRM, never for MemAbs.

And once you do this, memopp is never read anymore and can be dropped.


This gives the following series:

- patch 1 from here

- replace ->intercept and ->check_perm checks with ctxt->d checks

- cleanups and code changes from this patch, except struct reorganization and
memset change

- struct reorganization and memset change; instead of the large comment in
init_decode_cache, please add comments in front of each field or group of
fields

- removing has_seg_override and moving seg_override out of the zeroed area

- removing memopp and rip_relative

(BTW, I found a bug in my second series while studying your patch.  It
ignores the mem_read cache in prepare_memory_operand, but it shouldn't).

Paolo
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/arch/x86/include/asm/kvm_emulate.h b/arch/x86/include/asm/kvm_emulate.h
index ad4cca8..ccb7911 100644
--- a/arch/x86/include/asm/kvm_emulate.h
+++ b/arch/x86/include/asm/kvm_emulate.h
@@ -315,30 +315,32 @@  struct x86_emulate_ctxt {
 	u8 opcode_len;
 	u8 b;
 	u8 intercept;
-	u8 lock_prefix;
-	u8 rep_prefix;
 	u8 op_bytes;
 	u8 ad_bytes;
 	u8 rex_prefix;
 	struct operand src;
 	struct operand src2;
 	struct operand dst;
+	int (*execute)(struct x86_emulate_ctxt *ctxt);
+	int (*check_perm)(struct x86_emulate_ctxt *ctxt);
+	u8 lock_prefix;
+	u8 rep_prefix;
 	bool has_seg_override;
 	u8 seg_override;
 	u64 d;
-	int (*execute)(struct x86_emulate_ctxt *ctxt);
-	int (*check_perm)(struct x86_emulate_ctxt *ctxt);
+	bool rip_relative;
+	/* bitmaps of registers in _regs[] that can be read */
+	u32 regs_valid;
+	/* bitmaps of registers in _regs[] that have been written */
+	u32 regs_dirty;
 	/* modrm */
 	u8 modrm;
 	u8 modrm_mod;
 	u8 modrm_reg;
 	u8 modrm_rm;
 	u8 modrm_seg;
-	bool rip_relative;
 	unsigned long _eip;
 	struct operand memop;
-	u32 regs_valid;  /* bitmaps of registers in _regs[] that can be read */
-	u32 regs_dirty;  /* bitmaps of registers in _regs[] that have been written */
 	/* Fields above regs are cleared together. */
 	unsigned long _regs[NR_VCPU_REGS];
 	struct operand *memopp;
diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 8e2b866..eac488b 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -1072,6 +1072,9 @@  static int decode_modrm(struct x86_emulate_ctxt *ctxt,
 		ctxt->modrm_reg = (ctxt->rex_prefix & 4) << 1;	/* REX.R */
 		index_reg = (ctxt->rex_prefix & 2) << 2; /* REX.X */
 		ctxt->modrm_rm = base_reg = (ctxt->rex_prefix & 1) << 3; /* REG.B */
+	} else {
+		ctxt->modrm_reg = 0;
+		ctxt->modrm_rm = 0;
 	}
 
 	ctxt->modrm_mod |= (ctxt->modrm & 0xc0) >> 6;
@@ -4357,6 +4360,8 @@  done_prefixes:
 
 	if (ctxt->d & ModRM)
 		ctxt->modrm = insn_fetch(u8, ctxt);
+	else
+		ctxt->modrm = 0;
 
 	while (ctxt->d & GroupMask) {
 		switch (ctxt->d & GroupMask) {
@@ -4435,10 +4440,14 @@  done_prefixes:
 			ctxt->op_bytes = 16;
 		else if (ctxt->d & Mmx)
 			ctxt->op_bytes = 8;
+	} else {
+		ctxt->intercept = 0;
+		ctxt->check_perm = NULL;
 	}
 
 	/* ModRM and SIB bytes. */
 	if (ctxt->d & ModRM) {
+		ctxt->modrm_mod = 0;
 		rc = decode_modrm(ctxt, &ctxt->memop);
 		if (!ctxt->has_seg_override)
 			set_seg_override(ctxt, ctxt->modrm_seg);
@@ -4552,14 +4561,41 @@  static int fastop(struct x86_emulate_ctxt *ctxt, void (*fop)(struct fastop *))
 
 void init_decode_cache(struct x86_emulate_ctxt *ctxt)
 {
-	memset(&ctxt->opcode_len, 0,
-	       (void *)&ctxt->_regs - (void *)&ctxt->opcode_len);
 
-	ctxt->fetch.start = 0;
-	ctxt->fetch.end = 0;
+	/*
+	 * Variables that don't require initializing to 0
+	 * opcode_len - set in x86_decode_insn
+	 * b - set in x86_decode_insn
+	 * intercept - conditionally set in x86_decode_insn, added
+	 *             else set to 0
+	 * op_bytes - initialized in x86_decode_insn
+	 * ad_bytes - initialized in x86_decode_insn
+	 * rex_prefix - conditionally set in x86_decode_isn
+	 * struct operands src,src2,dst - set by calling decode_operand
+	 *                                in x86_decode_insn,
+	 *                                default.type = OP_NONE
+	 * (*execute) - set in x86_decode_insn
+	 * (*check_perm) - conditionally set in x86_decode_insn, added
+	 *                 else set to 0
+	 * d - set in x86_decode_insn
+	 * modrm - conditionally set in x86_decode_insn, added else set to 0
+	 * modrm_mod - or'ed in decode_modrm which is conditionally called in
+	 *             in x86_decode_insn, added initialization to 0 before call
+	 * modrm_reg - set in decode_modrm or else decode_register_operand
+	 * modrm_rm - set in decode_modrm, added else set to 0
+	 * modrm_seg - set in decode_modrm
+	 * _eip - set in x86_decode_insn
+	 * memop - .type set to OP_NONE in x86_decode_insn
+	 * ctxt->fetch.start - set in x86_decode_insn
+	 * ctxt->fetch.end
+	 * ctxt->mem_read.pos - set in x86_emulate_insn
+	 */
+
+	memset(&ctxt->lock_prefix, 0,
+	       (void *)&ctxt->modrm - (void *)&ctxt->lock_prefix);
+
 	ctxt->io_read.pos = 0;
 	ctxt->io_read.end = 0;
-	ctxt->mem_read.pos = 0;
 	ctxt->mem_read.end = 0;
 }