Message ID | 4712D8F4B26E034E80552F30A67BE0B1A19A11@ORSMSX112.amr.corp.intel.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On 16/03/2017 21:02, Xu, Anthony wrote: >>> memory_region_finalize. >>> Let me know if you think otherwise. >> >> Yes, you can replace memory_region_del_subregion in >> memory_region_finalize >> with special code that does >> >> assert(!mr->enabled); >> assert(subregion->container == mr); >> subregion->container = NULL; >> QTAILQ_REMOVE(&mr->subregions, subregion, subregions_link); >> memory_region_unref(subregion); >> >> The last four lines are shared with memory_region_del_subregion, so please >> factor them in a new function, for example >> memory_region_del_subregion_internal. > > After adding synchronize_rcu, I saw an infinite recursive call, > mem_commit-> memory_region_finalize-> mem_commit-> > memory_region_finalize-> ...... > it caused a segment fault, because 8M stack space is used up, and found when > memory_region_finalize is called, memory_region_transaction_depth is 0 and > memory_region_update_pending is true. That's not normal! Ok, this is a bug. This would fix it: > Please review below patch > > diff --git a/memory.c b/memory.c > index 64b0a60..4c95aaf 100644 > --- a/memory.c > +++ b/memory.c > @@ -906,12 +906,6 @@ void memory_region_transaction_begin(void) > ++memory_region_transaction_depth; > } > > -static void memory_region_clear_pending(void) > -{ > - memory_region_update_pending = false; > - ioeventfd_update_pending = false; > -} > - > void memory_region_transaction_commit(void) > { > AddressSpace *as; > @@ -927,14 +921,14 @@ void memory_region_transaction_commit(void) > QTAILQ_FOREACH(as, &address_spaces, address_spaces_link) { > address_space_update_topology(as); > } > - > + memory_region_update_pending = false; > MEMORY_LISTENER_CALL_GLOBAL(commit, Forward); > } else if (ioeventfd_update_pending) { > QTAILQ_FOREACH(as, &address_spaces, address_spaces_link) { > address_space_update_ioeventfds(as); > } > + ioeventfd_update_pending = false; > } > - memory_region_clear_pending(); > } > } So please send it to the list with Signed-off-by line. > The thing is, seems both address_space_translate and address_space_dispatch_free > are called under the global lock. When synchronize_rcu is called, no other threads > are in RCU critical section. No, not necessarily. address_space_write for example is called outside the global lock by KVM and it calls address_space_translate. > Seems RCU is not that useful for address space. I suggest that you study the code more closely... there is this in kvm-all.c: DPRINTF("handle_mmio\n"); /* Called outside BQL */ address_space_rw(&address_space_memory, run->mmio.phys_addr, attrs, run->mmio.data, run->mmio.len, run->mmio.is_write); and adding a simple assert() would have quickly disproved your theory. > After adding synchronize_rcu, we noticed a RCU dead loop. synchronize_rcu is called > inside RCU critical section. It happened when guest OS programmed the PCI bar. > The call trace is like, > address_space_write-> pci_host_config_write_common -> > memory_region_transaction_commit ->mem_commit-> synchronize_rcu > pci_host_config_write_common is called inside RCU critical section. > > The address_space_write change fixed this issue. It's not a fix if the code is not thread-safe anymore! But I think you have the answer now as to why you cannot use synchronize_rcu. Paolo
> So please send it to the list with Signed-off-by line. Thanks, > > DPRINTF("handle_mmio\n"); > /* Called outside BQL */ > address_space_rw(&address_space_memory, > run->mmio.phys_addr, attrs, > run->mmio.data, > run->mmio.len, > run->mmio.is_write); > > and adding a simple assert() would have quickly disproved your theory. You are right here. If it is a PCI bar write, a memory region operation(del/add) may be called inside address_space_rw, and memory_region_transaction_commit will be called. If address_space_rw is called without the global lock, memory_region_transaction_commit is called with the global lock. It conflicts with what you said before. >No, you don't need to check it. Most functions (in this case, >memory_region_transaction_commit) can only be called under the global lock. And if two vcpus program the different PCI bars at the same time (it is unlikely, but QEMU should not assume it), without the global lock, region operations may be called at the same time. Are memory_region_del_subregion and memory_region_add_subregion_overlap thread-safe? > It's not a fix if the code is not thread-safe anymore! But I think you > have the answer now as to why you cannot use synchronize_rcu. Can you elaborate why the code is not thread-safe? Thanks Anthony
diff --git a/memory.c b/memory.c index 64b0a60..4c95aaf 100644 --- a/memory.c +++ b/memory.c @@ -906,12 +906,6 @@ void memory_region_transaction_begin(void) ++memory_region_transaction_depth; } -static void memory_region_clear_pending(void) -{ - memory_region_update_pending = false; - ioeventfd_update_pending = false; -} - void memory_region_transaction_commit(void) { AddressSpace *as; @@ -927,14 +921,14 @@ void memory_region_transaction_commit(void) QTAILQ_FOREACH(as, &address_spaces, address_spaces_link) { address_space_update_topology(as); } - + memory_region_update_pending = false; MEMORY_LISTENER_CALL_GLOBAL(commit, Forward); } else if (ioeventfd_update_pending) { QTAILQ_FOREACH(as, &address_spaces, address_spaces_link) { address_space_update_ioeventfds(as); } + ioeventfd_update_pending = false; } - memory_region_clear_pending(); } }