diff mbox

[2/2] trace: separate MMIO tracepoints from TB-access tracepoints

Message ID 1455744555-22101-2-git-send-email-hollis_blanchard@mentor.com (mailing list archive)
State New, archived
Headers show

Commit Message

Hollis Blanchard Feb. 17, 2016, 9:29 p.m. UTC
Memory accesses to code which has previously been translated into a TB show up
in the MMIO path, so that they may invalidate the TB. It's extremely confusing
to mix those in with device MMIOs, so split them into their own tracepoint.

Signed-off-by: Hollis Blanchard <hollis_blanchard@mentor.com>

---
It took many hours to figure out why some RAM accesses were coming through the
MMIO path instead of being handled inline in the TBs.

On IRC, Paolo expressed some concern about performance, but ultimately agreed
that adding one conditional to an already heavy codepath wouldn't have much
impact.
---
 memory.c     | 25 +++++++++++++++++++++++++
 trace-events |  2 ++
 2 files changed, 27 insertions(+)

Comments

Hollis Blanchard Feb. 18, 2016, 6:37 p.m. UTC | #1
On 02/17/2016 01:29 PM, Hollis Blanchard wrote:
> diff --git a/trace-events b/trace-events
> index 756ce86..7994420 100644
> --- a/trace-events
> +++ b/trace-events
> @@ -1630,6 +1630,8 @@ memory_region_ops_read(int cpu_index, void *mr, uint64_t addr, uint64_t value, u
>   memory_region_ops_write(int cpu_index, void *mr, uint64_t addr, uint64_t value, unsigned size) "cpu %d mr %p addr %#"PRIx64" value %#"PRIx64" size %u"
>   memory_region_subpage_read(int cpu_index, void *mr, uint64_t offset, uint64_t value, unsigned size) "cpu %d mr %p offset %#"PRIx64" value %#"PRIx64" size %u"
>   memory_region_subpage_write(int cpu_index, void *mr, uint64_t offset, uint64_t value, unsigned size) "cpu %d mr %p offset %#"PRIx64" value %#"PRIx64" size %u"
> +memory_region_ops_tb_read(int cpu_index, uint64_t addr, uint64_t value, unsigned size) "cpu %d addr %#"PRIx64" value %#"PRIx64" size %u"
> +memory_region_ops_tb_write(int cpu_index, uint64_t addr, uint64_t value, unsigned size) "cpu %d addr %#"PRIx64" value %#"PRIx64" size %u"

Actually, I'd like to rename these new tracepoints to 
"memory_region_tb_read/write", for consistency with 
"memory_region_subpage_read/write". I'll wait for other comments before 
re-submitting though.

Hollis Blanchard
Mentor Graphics Emulation Division
Stefan Hajnoczi Feb. 24, 2016, 2:13 p.m. UTC | #2
On Wed, Feb 17, 2016 at 01:29:15PM -0800, Hollis Blanchard wrote:
> Memory accesses to code which has previously been translated into a TB show up
> in the MMIO path, so that they may invalidate the TB. It's extremely confusing
> to mix those in with device MMIOs, so split them into their own tracepoint.
> 
> Signed-off-by: Hollis Blanchard <hollis_blanchard@mentor.com>
> 
> ---
> It took many hours to figure out why some RAM accesses were coming through the
> MMIO path instead of being handled inline in the TBs.
> 
> On IRC, Paolo expressed some concern about performance, but ultimately agreed
> that adding one conditional to an already heavy codepath wouldn't have much
> impact.
> ---
>  memory.c     | 25 +++++++++++++++++++++++++
>  trace-events |  2 ++
>  2 files changed, 27 insertions(+)

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Hollis Blanchard March 23, 2016, 4:47 p.m. UTC | #3
Paolo, is it true that only TB-invalidating writes go through the
io_mem_notdirty path? I'm looking at the live migration code now, and
it seems like every memory write will go through that path when global
dirty memory logging is enabled.
Paolo Bonzini March 23, 2016, 4:53 p.m. UTC | #4
On 23/03/2016 17:47, Hollis Blanchard wrote:
> Paolo, is it true that only TB-invalidating writes go through the
> io_mem_notdirty path? I'm looking at the live migration code now, and it
> seems like every memory write will go through that path when global
> dirty memory logging is enabled.

When live migration is enabled, writes to clean memory (almost all of
them) will go through that path indeed.  Some writes to the framebuffer
will go through that path too.

It depends on

      cpu_physical_memory_is_clean(
                        memory_region_get_ram_addr(section->mr) + xlat))

in tlb_set_page_with_attrs.

Paolo
Hollis Blanchard March 23, 2016, 5:16 p.m. UTC | #5
On Wed, 2016-03-23 at 17:53 +0100, Paolo Bonzini wrote:
> 
> On 23/03/2016 17:47, Hollis Blanchard wrote:
> > 
> > Paolo, is it true that only TB-invalidating writes go through the
> > io_mem_notdirty path? I'm looking at the live migration code now,
> > and it
> > seems like every memory write will go through that path when global
> > dirty memory logging is enabled.
> When live migration is enabled, writes to clean memory (almost all of
> them) will go through that path indeed.  Some writes to the
> framebuffer
> will go through that path too.
> 
> It depends on
> 
>       cpu_physical_memory_is_clean(
>                         memory_region_get_ram_addr(section->mr) +
> xlat))
> 
> in tlb_set_page_with_attrs.

Would "memory_region_notdirty_read/write" be a better tracepoint name
than "memory_region_tb_read/write"?

-- 
Hollis Blanchard <hollis_blanchard@mentor.com>
Mentor Graphics Emulation Division
Hollis Blanchard March 24, 2016, 7:30 p.m. UTC | #6
On 03/23/2016 09:53 AM, Paolo Bonzini wrote:
> On 23/03/2016 17:47, Hollis Blanchard wrote:
>> Paolo, is it true that only TB-invalidating writes go through the
>> io_mem_notdirty path? I'm looking at the live migration code now, and it
>> seems like every memory write will go through that path when global
>> dirty memory logging is enabled.
> When live migration is enabled, writes to clean memory (almost all of
> them) will go through that path indeed.  Some writes to the framebuffer
> will go through that path too.
>
> It depends on
>
>        cpu_physical_memory_is_clean(
>                          memory_region_get_ram_addr(section->mr) + xlat))
>
> in tlb_set_page_with_attrs.

I'm guessing that when live migration starts (ram_save_setup), the TLB 
must be flushed so that new entries can be created with the TLB_NOTDIRTY 
flag. Otherwise, pre-migration entries without TLB_NOTDIRTY flag could 
live on, allowing the TBs to directly modify guest RAM without tracking, 
right?

I can't find anything underneath ram_save_setup() that does this, 
though. Am I just missing it?
Paolo Bonzini March 24, 2016, 10:01 p.m. UTC | #7
----- Original Message -----
> From: "Hollis Blanchard" <hollis_blanchard@mentor.com>
> To: "Paolo Bonzini" <pbonzini@redhat.com>
> Cc: qemu-devel@nongnu.org
> Sent: Thursday, March 24, 2016 8:30:01 PM
> Subject: Re: io_mem_notdirty and live migration
> 
> On 03/23/2016 09:53 AM, Paolo Bonzini wrote:
> > On 23/03/2016 17:47, Hollis Blanchard wrote:
> >> Paolo, is it true that only TB-invalidating writes go through the
> >> io_mem_notdirty path? I'm looking at the live migration code now, and it
> >> seems like every memory write will go through that path when global
> >> dirty memory logging is enabled.
> > When live migration is enabled, writes to clean memory (almost all of
> > them) will go through that path indeed.  Some writes to the framebuffer
> > will go through that path too.
> >
> > It depends on
> >
> >        cpu_physical_memory_is_clean(
> >                          memory_region_get_ram_addr(section->mr) + xlat))
> >
> > in tlb_set_page_with_attrs.
> 
> I'm guessing that when live migration starts (ram_save_setup), the TLB
> must be flushed so that new entries can be created with the TLB_NOTDIRTY
> flag. Otherwise, pre-migration entries without TLB_NOTDIRTY flag could
> live on, allowing the TBs to directly modify guest RAM without tracking,
> right?
> 
> I can't find anything underneath ram_save_setup() that does this,
> though. Am I just missing it?

It's done (in a pretty roundabout way) by tcg_commit, which is called
by memory_global_dirty_log_start's call to memory_region_transaction_commit.

Paolo
diff mbox

Patch

diff --git a/memory.c b/memory.c
index 6ae7bae..3d125c9 100644
--- a/memory.c
+++ b/memory.c
@@ -403,6 +403,11 @@  static MemTxResult memory_region_oldmmio_read_accessor(MemoryRegion *mr,
     tmp = mr->ops->old_mmio.read[ctz32(size)](mr->opaque, addr);
     if (mr->subpage) {
         trace_memory_region_subpage_read(cpu_index, mr, addr, tmp, size);
+    } else if (mr == &io_mem_notdirty) {
+        /* Accesses to code which has previously been translated into a TB show
+         * up in the MMIO path, as accesses to the io_mem_notdirty
+         * MemoryRegion. */
+        trace_memory_region_ops_tb_read(cpu_index, addr, tmp, size);
     } else if (TRACE_MEMORY_REGION_OPS_READ_ENABLED) {
         hwaddr abs_addr = memory_region_to_absolute_addr(mr, addr);
         trace_memory_region_ops_read(cpu_index, mr, abs_addr, tmp, size);
@@ -428,6 +433,11 @@  static MemTxResult  memory_region_read_accessor(MemoryRegion *mr,
     tmp = mr->ops->read(mr->opaque, addr, size);
     if (mr->subpage) {
         trace_memory_region_subpage_read(cpu_index, mr, addr, tmp, size);
+    } else if (mr == &io_mem_notdirty) {
+        /* Accesses to code which has previously been translated into a TB show
+         * up in the MMIO path, as accesses to the io_mem_notdirty
+         * MemoryRegion. */
+        trace_memory_region_ops_tb_read(cpu_index, addr, tmp, size);
     } else if (TRACE_MEMORY_REGION_OPS_READ_ENABLED) {
         hwaddr abs_addr = memory_region_to_absolute_addr(mr, addr);
         trace_memory_region_ops_read(cpu_index, mr, abs_addr, tmp, size);
@@ -454,6 +464,11 @@  static MemTxResult memory_region_read_with_attrs_accessor(MemoryRegion *mr,
     r = mr->ops->read_with_attrs(mr->opaque, addr, &tmp, size, attrs);
     if (mr->subpage) {
         trace_memory_region_subpage_read(cpu_index, mr, addr, tmp, size);
+    } else if (mr == &io_mem_notdirty) {
+        /* Accesses to code which has previously been translated into a TB show
+         * up in the MMIO path, as accesses to the io_mem_notdirty
+         * MemoryRegion. */
+        trace_memory_region_ops_tb_read(cpu_index, addr, tmp, size);
     } else if (TRACE_MEMORY_REGION_OPS_READ_ENABLED) {
         hwaddr abs_addr = memory_region_to_absolute_addr(mr, addr);
         trace_memory_region_ops_read(cpu_index, mr, abs_addr, tmp, size);
@@ -479,6 +494,11 @@  static MemTxResult memory_region_oldmmio_write_accessor(MemoryRegion *mr,
     tmp = (*value >> shift) & mask;
     if (mr->subpage) {
         trace_memory_region_subpage_write(cpu_index, mr, addr, tmp, size);
+    } else if (mr == &io_mem_notdirty) {
+        /* Accesses to code which has previously been translated into a TB show
+         * up in the MMIO path, as accesses to the io_mem_notdirty
+         * MemoryRegion. */
+        trace_memory_region_ops_tb_write(cpu_index, addr, tmp, size);
     } else if (TRACE_MEMORY_REGION_OPS_WRITE_ENABLED) {
         hwaddr abs_addr = memory_region_to_absolute_addr(mr, addr);
         trace_memory_region_ops_write(cpu_index, mr, abs_addr, tmp, size);
@@ -504,6 +524,11 @@  static MemTxResult memory_region_write_accessor(MemoryRegion *mr,
     tmp = (*value >> shift) & mask;
     if (mr->subpage) {
         trace_memory_region_subpage_write(cpu_index, mr, addr, tmp, size);
+    } else if (mr == &io_mem_notdirty) {
+        /* Accesses to code which has previously been translated into a TB show
+         * up in the MMIO path, as accesses to the io_mem_notdirty
+         * MemoryRegion. */
+        trace_memory_region_ops_tb_write(cpu_index, addr, tmp, size);
     } else if (TRACE_MEMORY_REGION_OPS_WRITE_ENABLED) {
         hwaddr abs_addr = memory_region_to_absolute_addr(mr, addr);
         trace_memory_region_ops_write(cpu_index, mr, abs_addr, tmp, size);
diff --git a/trace-events b/trace-events
index 756ce86..7994420 100644
--- a/trace-events
+++ b/trace-events
@@ -1630,6 +1630,8 @@  memory_region_ops_read(int cpu_index, void *mr, uint64_t addr, uint64_t value, u
 memory_region_ops_write(int cpu_index, void *mr, uint64_t addr, uint64_t value, unsigned size) "cpu %d mr %p addr %#"PRIx64" value %#"PRIx64" size %u"
 memory_region_subpage_read(int cpu_index, void *mr, uint64_t offset, uint64_t value, unsigned size) "cpu %d mr %p offset %#"PRIx64" value %#"PRIx64" size %u"
 memory_region_subpage_write(int cpu_index, void *mr, uint64_t offset, uint64_t value, unsigned size) "cpu %d mr %p offset %#"PRIx64" value %#"PRIx64" size %u"
+memory_region_ops_tb_read(int cpu_index, uint64_t addr, uint64_t value, unsigned size) "cpu %d addr %#"PRIx64" value %#"PRIx64" size %u"
+memory_region_ops_tb_write(int cpu_index, uint64_t addr, uint64_t value, unsigned size) "cpu %d addr %#"PRIx64" value %#"PRIx64" size %u"
 
 # qom/object.c
 object_dynamic_cast_assert(const char *type, const char *target, const char *file, int line, const char *func) "%s->%s (%s:%d:%s)"