Message ID | 20221223142307.1614945-3-xuchuangxclwt@bytedance.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | migration: reduce time of loading non-iterable vmstate | expand |
On Fri, Dec 23, 2022 at 10:23:06PM +0800, Chuang Xu wrote: > Before using any flatview, sanity check we're not during a memory > region transaction or the map can be invalid. > > Signed-off-by: Chuang Xu <xuchuangxclwt@bytedance.com> > --- > include/exec/memory.h | 9 +++++++++ > softmmu/memory.c | 5 +++++ > 2 files changed, 14 insertions(+) > > diff --git a/include/exec/memory.h b/include/exec/memory.h > index 91f8a2395a..66c43b4862 100644 > --- a/include/exec/memory.h > +++ b/include/exec/memory.h > @@ -1069,8 +1069,17 @@ struct FlatView { > MemoryRegion *root; > }; > > +int memory_region_transaction_get_depth(void); > + > static inline FlatView *address_space_to_flatview(AddressSpace *as) > { > + /* > + * Before using any flatview, sanity check we're not during a memory > + * region transaction or the map can be invalid. Note that this can > + * also be called during commit phase of memory transaction, but that > + * should also only happen when the depth decreases to 0 first. Nitpick: after adding the RCU check the comment may need slight touch up: * Meanwhile it's safe to access current_map with RCU read lock held * even if during a memory transaction. It means the user can bear * with an obsolete map. > + */ > + assert(memory_region_transaction_get_depth() == 0 || rcu_read_locked()); > return qatomic_rcu_read(&as->current_map); > } > > diff --git a/softmmu/memory.c b/softmmu/memory.c > index bc0be3f62c..01192e2e5b 100644 > --- a/softmmu/memory.c > +++ b/softmmu/memory.c > @@ -1116,6 +1116,11 @@ void memory_region_transaction_commit(void) > } > } > > +int memory_region_transaction_get_depth(void) > +{ > + return memory_region_transaction_depth; > +} > + > static void memory_region_destructor_none(MemoryRegion *mr) > { > } > -- > 2.20.1 >
On 12/23/22 15:23, Chuang Xu wrote: > static inline FlatView *address_space_to_flatview(AddressSpace *as) > { > + /* > + * Before using any flatview, sanity check we're not during a memory > + * region transaction or the map can be invalid. Note that this can > + * also be called during commit phase of memory transaction, but that > + * should also only happen when the depth decreases to 0 first. > + */ > + assert(memory_region_transaction_get_depth() == 0 || rcu_read_locked()); > return qatomic_rcu_read(&as->current_map); > } This is not valid because the transaction could happen in *another* thread. In that case memory_region_transaction_depth() will be > 0, but RCU is needed. Paolo
Hi, Paolo, On Fri, Dec 23, 2022 at 04:47:57PM +0100, Paolo Bonzini wrote: > On 12/23/22 15:23, Chuang Xu wrote: > > static inline FlatView *address_space_to_flatview(AddressSpace *as) > > { > > + /* > > + * Before using any flatview, sanity check we're not during a memory > > + * region transaction or the map can be invalid. Note that this can > > + * also be called during commit phase of memory transaction, but that > > + * should also only happen when the depth decreases to 0 first. > > + */ > > + assert(memory_region_transaction_get_depth() == 0 || rcu_read_locked()); > > return qatomic_rcu_read(&as->current_map); > > } > > This is not valid because the transaction could happen in *another* thread. > In that case memory_region_transaction_depth() will be > 0, but RCU is > needed. Do you mean the code is wrong, or the comment? Note that the code has checked rcu_read_locked() where introduced in patch 1, but maybe something else was missed? Thanks,
Il ven 23 dic 2022, 16:54 Peter Xu <peterx@redhat.com> ha scritto: > > This is not valid because the transaction could happen in *another* > thread. > > In that case memory_region_transaction_depth() will be > 0, but RCU is > > needed. > > Do you mean the code is wrong, or the comment? Note that the code has > checked rcu_read_locked() where introduced in patch 1, but maybe something > else was missed? > The assertion is wrong. It will succeed even if RCU is unlocked in this thread but a transaction is in progress in another thread. Perhaps you can check (memory_region_transaction_depth() > 0 && !qemu_mutex_iothread_locked()) || rcu_read_locked() instead? Paolo Thanks, > > -- > Peter Xu > >
On 23/12/22 15:23, Chuang Xu wrote: > Before using any flatview, sanity check we're not during a memory > region transaction or the map can be invalid. > > Signed-off-by: Chuang Xu <xuchuangxclwt@bytedance.com> > --- > include/exec/memory.h | 9 +++++++++ > softmmu/memory.c | 5 +++++ > 2 files changed, 14 insertions(+) > > diff --git a/include/exec/memory.h b/include/exec/memory.h > index 91f8a2395a..66c43b4862 100644 > --- a/include/exec/memory.h > +++ b/include/exec/memory.h > @@ -1069,8 +1069,17 @@ struct FlatView { > MemoryRegion *root; > }; > > +int memory_region_transaction_get_depth(void); Do we want to expose this; isn't the depth internal? If we need to expose something, can we restrict it to bool memory_region_in_transaction(void) or bool memory_region_transaction_in_progress(void)? > static inline FlatView *address_space_to_flatview(AddressSpace *as) > { > + /* > + * Before using any flatview, sanity check we're not during a memory > + * region transaction or the map can be invalid. Note that this can > + * also be called during commit phase of memory transaction, but that > + * should also only happen when the depth decreases to 0 first. > + */ > + assert(memory_region_transaction_get_depth() == 0 || rcu_read_locked()); > return qatomic_rcu_read(&as->current_map); > }
Hi, Paolo, On Wed, Dec 28, 2022 at 09:27:50AM +0100, Paolo Bonzini wrote: > Il ven 23 dic 2022, 16:54 Peter Xu <peterx@redhat.com> ha scritto: > > > > This is not valid because the transaction could happen in *another* > > thread. > > > In that case memory_region_transaction_depth() will be > 0, but RCU is > > > needed. > > > > Do you mean the code is wrong, or the comment? Note that the code has > > checked rcu_read_locked() where introduced in patch 1, but maybe something > > else was missed? > > > > The assertion is wrong. It will succeed even if RCU is unlocked in this > thread but a transaction is in progress in another thread. IIUC this is the case where the context: (1) doesn't have RCU read lock held, and, (2) doesn't have BQL held. Is it safe at all to reference any flatview in such a context? The thing is I think the flatview pointer can be freed anytime if both locks are not taken. > Perhaps you can check (memory_region_transaction_depth() > 0 && > !qemu_mutex_iothread_locked()) || rcu_read_locked() instead? What if one thread calls address_space_to_flatview() with BQL held but not RCU read lock held? I assume it's a legal operation, but it seems to be able to trigger the assert already? Thanks,
On 2022/12/28 下午6:50, Philippe Mathieu-Daudé wrote:
On 23/12/22 15:23, Chuang Xu wrote:
Before using any flatview, sanity check we're not during a memory
region transaction or the map can be invalid.
Signed-off-by: Chuang Xu <xuchuangxclwt@bytedance.com>
<xuchuangxclwt@bytedance.com>
---
include/exec/memory.h | 9 +++++++++
softmmu/memory.c | 5 +++++
2 files changed, 14 insertions(+)
diff --git a/include/exec/memory.h b/include/exec/memory.h
index 91f8a2395a..66c43b4862 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -1069,8 +1069,17 @@ struct FlatView {
MemoryRegion *root;
};
+int memory_region_transaction_get_depth(void);
Do we want to expose this; isn't the depth internal?
If we need to expose something, can we restrict it to
bool memory_region_in_transaction(void) or
bool memory_region_transaction_in_progress(void)?
Yes, we'd better not expose the value of an internal
variable. I'll make changes in v5.
Thanks!
Hi, Peter and Paolo, I'm sorry I didn't reply to your email in time. I was infected with COVID-19 two weeks ago, so I couldn't think about the problems discussed in your email for a long time.
On Tue, Jan 10, 2023 at 12:09:41AM -0800, Chuang Xu wrote: > Hi, Peter and Paolo, Hi, Chuang, Paolo, > I'm sorry I didn't reply to your email in time. I was infected with > COVID-19 two weeks ago, so I couldn't think about the problems discussed > in your email for a long time.
Hi, Peter, Paolo, On 2023/1/10 下午10:45, Peter Xu wrote: > On Tue, Jan 10, 2023 at 12:09:41AM -0800, Chuang Xu wrote: >> Hi, Peter and Paolo, > Hi, Chuang, Paolo, > >> I'm sorry I didn't reply to your email in time. I was infected with >> COVID-19 two weeks ago, so I couldn't think about the problems discussed >> in your email for a long time.
On Thu, Jan 12, 2023 at 03:59:55PM +0800, Chuang Xu wrote: > Hi, Peter, Paolo, Chuang, > > On 2023/1/10 下午10:45, Peter Xu wrote: > > On Tue, Jan 10, 2023 at 12:09:41AM -0800, Chuang Xu wrote: > > > Hi, Peter and Paolo, > > Hi, Chuang, Paolo, > > > > > I'm sorry I didn't reply to your email in time. I was infected with > > > COVID-19 two weeks ago, so I couldn't think about the problems discussed > > > in your email for a long time.
Hi, Peter, On 2023/1/12 下午11:13, Peter Xu wrote: > We wanted to capture outliers when you apply the follow up patch to vm load > procedure. > > That will make depth>0 for the whole process of vm load during migration, > and we wanted to make sure it's safe, hence this patch, right? > >> In my perspective, both BQL and RCU can ensure that the flatview will not be >> released when the worker thread accesses the flatview, because before the rcu >> thread releases the flatview, it will make sure itself holding BQL and the >> worker thread not holding RCU. So, whether the depth is 0 or not, as long as >> BQL or RCU is held, the worker thread will not access the obsolete flatview >> (IIUC 'obsolete' means that flatview is released). >> >>> To summarize, the original check didn't consider BQL, and if to consider >>> BQL I think it should be something like: >>> >>> /* Guarantees valid access to the flatview, either lock works */ >>> assert(BQL_HELD() || RCU_HELD()); >>> >>> /* >>> * Guarantees any BQL holder is not reading obsolete flatview (e.g. when >>> * during vm load) >>> */ >>> if (BQL_HELD()) >>> assert(depth==0); >>> >>> IIUC it can be merged into: >>> >>> assert((BQL_HELD() && depth==0) || RCU_HELD()); >> IMHO assert(BQL_HELD() || RCU_HELD()) is enough.. > Yes, but IMHO this will guarantee safe use of flatview only if _before_ > your follow up patch. > > Before that patch, the depth==0 should always stand (when BQL_HELD() > stands) I think. > > After that patch, since depth will be increased at the entry of vm load > there's risk that we can overlook code that will be referencing flatview > during vm load and that can reference an obsolete flatview. Since the > whole process of qemu_loadvm_state_main() will have BQL held we won't hit > the assertion if only to check "BQL_HELD() || RCU_HELD()" because BQL_HELD > always satisfies. > >> Or you think that once a mr transaction is in progress, the old flatview has >> been obsolete? If we regard flatview as obsolete when a mr transaction is in >> progress, How can RCU ensure that flatview is not obsolete? > AFAIU RCU cannot guarantee that. So IIUC any RCU lock user need to be able > to tolerant obsolete flatviews being referenced and it should not harm the > system. If it needs the latest update, it should take care of that > separately. > > For example, the virtio code we're looking at in this series uses RCU lock > to build address space cache for the device vrings which is based on the > current flatview of mem. It's safe to reference obsolete flatview in this > case (it means the flatview can be during an update when virtio is > establishing the address space cache), IMHO that's fine because the address > space cache will be updated again in virtio_memory_listener_commit() so > it'll be consistent at last. The intermediate phase of inconsistency > should be fine in this case just like any DMA happens during a memory > hotplug. > > For this specific patch, IMHO the core is to check depth>0 reference, and > we need RCU_HELD to mask out where the obsolete references are fine. > > Thanks, Thanks a lot for your patience! I ignored the preconditions for this discussion, so that I asked a stupid question.. Now I'm in favor of checking “(BQL_HELD() && depth==0) || RCU_HELD()” in v5. And what does Paolo thinks of Peter's solution? Thanks again!
diff --git a/include/exec/memory.h b/include/exec/memory.h index 91f8a2395a..66c43b4862 100644 --- a/include/exec/memory.h +++ b/include/exec/memory.h @@ -1069,8 +1069,17 @@ struct FlatView { MemoryRegion *root; }; +int memory_region_transaction_get_depth(void); + static inline FlatView *address_space_to_flatview(AddressSpace *as) { + /* + * Before using any flatview, sanity check we're not during a memory + * region transaction or the map can be invalid. Note that this can + * also be called during commit phase of memory transaction, but that + * should also only happen when the depth decreases to 0 first. + */ + assert(memory_region_transaction_get_depth() == 0 || rcu_read_locked()); return qatomic_rcu_read(&as->current_map); } diff --git a/softmmu/memory.c b/softmmu/memory.c index bc0be3f62c..01192e2e5b 100644 --- a/softmmu/memory.c +++ b/softmmu/memory.c @@ -1116,6 +1116,11 @@ void memory_region_transaction_commit(void) } } +int memory_region_transaction_get_depth(void) +{ + return memory_region_transaction_depth; +} + static void memory_region_destructor_none(MemoryRegion *mr) { }
Before using any flatview, sanity check we're not during a memory region transaction or the map can be invalid. Signed-off-by: Chuang Xu <xuchuangxclwt@bytedance.com> --- include/exec/memory.h | 9 +++++++++ softmmu/memory.c | 5 +++++ 2 files changed, 14 insertions(+)