Message ID | 1489554682-6126-5-git-send-email-chao.gao@intel.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
>>> On 15.03.17 at 06:11, <chao.gao@intel.com> wrote: > +static void update_irte(struct iremap_entry *entry, > + const struct iremap_entry *new_ire) > +{ > + if ( cpu_has_cx16 ) > + { > + __uint128_t ret; > + struct iremap_entry old_ire; > + > + old_ire = *entry; > + ret = cmpxchg16b(entry, &old_ire, new_ire); > + > + /* > + * In the above, we use cmpxchg16 to atomically update the 128-bit > + * IRTE, and the hardware cannot update the IRTE behind us, so > + * the return value of cmpxchg16 should be the same as old_ire. > + * This ASSERT validate it. > + */ > + ASSERT(ret == old_ire.val); > + } > + else > + { > + /* > + * The following method to update IRTE is safe on condition that > + * only the high qword or the low qword is to be updated. > + * If entire IRTE is to be updated, callers should make sure the > + * IRTE is not in use. > + */ > + entry->lo = new_ire->lo; > + entry->hi = new_ire->hi; How is this any better than structure assignment? Furthermore the comment here partially contradicts the commit message. I guess callers need to be given a way (another function parameter?) to signal the function whether the unsafe variant is okay to use. You should then add a suitable BUG_ON() in the else path here. Jan
On Wed, Mar 15, 2017 at 10:48:25AM -0600, Jan Beulich wrote: >>>> On 15.03.17 at 06:11, <chao.gao@intel.com> wrote: >> + /* >> + * The following method to update IRTE is safe on condition that >> + * only the high qword or the low qword is to be updated. >> + * If entire IRTE is to be updated, callers should make sure the >> + * IRTE is not in use. >> + */ >> + entry->lo = new_ire->lo; >> + entry->hi = new_ire->hi; > >How is this any better than structure assignment? Furthermore Indeed, not better. when using structure assignment, the assembly code is 48 8b 06 mov (%rsi),%rax 48 8b 56 08 mov 0x8(%rsi),%rdx 48 89 07 mov %rax,(%rdi) 48 89 57 08 mov %rdx,0x8(%rdi) Using the code above, the assembly code is 48 8b 06 mov (%rsi),%rax 48 89 07 mov %rax,(%rdi) 48 8b 46 08 mov 0x8(%rsi),%rax 48 89 47 08 mov %rax,0x8(%rdi) I thought structure assignment maybe ultilize memcpy considering structure of a big size, so I made this change. I will change this back. Although that, this patch is trying to make the change safer when cmpxchg16() is supported. >the comment here partially contradicts the commit message. I Yes. >guess callers need to be given a way (another function parameter?) >to signal the function whether the unsafe variant is okay to use. This means we need to add the new parameter to iommu ops for only IOAPIC/MSI know the entry they want to change is masked. Is there any another reasonable and correct solution? How about... >You should then add a suitable BUG_ON() in the else path here. just add a BUG_ON() like this BUG_ON( (entry->hi != new_ire->hi) && (entry->lo != new_ire->lo) ); Adding this BUG_ON() means update_irte() can't be used for initializing or clearing IRTE which are not bugs. Thanks Chao > >Jan >
>>> On 15.03.17 at 23:39, <chao.gao@intel.com> wrote: > On Wed, Mar 15, 2017 at 10:48:25AM -0600, Jan Beulich wrote: >>>>> On 15.03.17 at 06:11, <chao.gao@intel.com> wrote: >>> + /* >>> + * The following method to update IRTE is safe on condition that >>> + * only the high qword or the low qword is to be updated. >>> + * If entire IRTE is to be updated, callers should make sure the >>> + * IRTE is not in use. >>> + */ >>> + entry->lo = new_ire->lo; >>> + entry->hi = new_ire->hi; >> >>How is this any better than structure assignment? Furthermore > > Indeed, not better. when using structure assignment, the assembly code is > 48 8b 06 mov (%rsi),%rax > 48 8b 56 08 mov 0x8(%rsi),%rdx > 48 89 07 mov %rax,(%rdi) > 48 89 57 08 mov %rdx,0x8(%rdi) > Using the code above, the assembly code is > 48 8b 06 mov (%rsi),%rax > 48 89 07 mov %rax,(%rdi) > 48 8b 46 08 mov 0x8(%rsi),%rax > 48 89 47 08 mov %rax,0x8(%rdi) > > I thought structure assignment maybe ultilize memcpy considering structure > of a big size, so I made this change. I will change this back. Although > that, this patch is trying to make the change safer when cmpxchg16() is > supported. Perhaps you've really meant to use write_atomic()? >>the comment here partially contradicts the commit message. I > > Yes. > >>guess callers need to be given a way (another function parameter?) >>to signal the function whether the unsafe variant is okay to use. > > This means we need to add the new parameter to iommu ops for only > IOAPIC/MSI know the entry they want to change is masked. Is there > any another reasonable and correct solution? Well, users you convert in this patch must be okay to use the non-atomic variant. The PI user(s) know(s) that cmpxchg16b is available, so could always request the safe variant. No need for a new parameter higher up in the call trees afaics. > How about... > >>You should then add a suitable BUG_ON() in the else path here. > > just add a BUG_ON() like this > BUG_ON( (entry->hi != new_ire->hi) && (entry->lo != new_ire->lo) ); > Adding this BUG_ON() means update_irte() can't be used for initializing > or clearing IRTE which are not bugs. Yes, that's an option too, albeit then I'd suggest (pseudo code) if ( high_up_to_date ) update_low; else if ( low_up_to_date ) update_high; else BUG(); But you'll want to have the okay from Kevin as the maintainer for something like this. Jan
On Thu, Mar 16, 2017 at 04:29:29AM -0600, Jan Beulich wrote: >>>> On 15.03.17 at 23:39, <chao.gao@intel.com> wrote: >> On Wed, Mar 15, 2017 at 10:48:25AM -0600, Jan Beulich wrote: >>>>>> On 15.03.17 at 06:11, <chao.gao@intel.com> wrote: >>>> + /* >>>> + * The following method to update IRTE is safe on condition that >>>> + * only the high qword or the low qword is to be updated. >>>> + * If entire IRTE is to be updated, callers should make sure the >>>> + * IRTE is not in use. >>>> + */ >>>> + entry->lo = new_ire->lo; >>>> + entry->hi = new_ire->hi; >>> >>>How is this any better than structure assignment? Furthermore >> >> Indeed, not better. when using structure assignment, the assembly code is >> 48 8b 06 mov (%rsi),%rax >> 48 8b 56 08 mov 0x8(%rsi),%rdx >> 48 89 07 mov %rax,(%rdi) >> 48 89 57 08 mov %rdx,0x8(%rdi) >> Using the code above, the assembly code is >> 48 8b 06 mov (%rsi),%rax >> 48 89 07 mov %rax,(%rdi) >> 48 8b 46 08 mov 0x8(%rsi),%rax >> 48 89 47 08 mov %rax,0x8(%rdi) >> >> I thought structure assignment maybe ultilize memcpy considering structure >> of a big size, so I made this change. I will change this back. Although >> that, this patch is trying to make the change safer when cmpxchg16() is >> supported. > >Perhaps you've really meant to use write_atomic()? I don't understand what you mean. But I think write_atomic may be not related to the problem how to update a 16 byte memory atomically if cmpxchg16() is not supported. > >>>the comment here partially contradicts the commit message. I >> >> Yes. >> >>>guess callers need to be given a way (another function parameter?) >>>to signal the function whether the unsafe variant is okay to use. >> >> This means we need to add the new parameter to iommu ops for only >> IOAPIC/MSI know the entry they want to change is masked. Is there >> any another reasonable and correct solution? > >Well, users you convert in this patch must be okay to use the >non-atomic variant. The PI user(s) know(s) that cmpxchg16b is >available, so could always request the safe variant. No need for >a new parameter higher up in the call trees afaics. > >> How about... >> >>>You should then add a suitable BUG_ON() in the else path here. >> >> just add a BUG_ON() like this >> BUG_ON( (entry->hi != new_ire->hi) && (entry->lo != new_ire->lo) ); >> Adding this BUG_ON() means update_irte() can't be used for initializing >> or clearing IRTE which are not bugs. > >Yes, that's an option too, albeit then I'd suggest (pseudo code) > > if ( high_up_to_date ) > update_low; > else if ( low_up_to_date ) > update_high; > else > BUG(); > >But you'll want to have the okay from Kevin as the maintainer for >something like this. ok. I will wait for comments of Kevin. Thank, Chao > >Jan >
>>> On 17.03.17 at 02:52, <chao.gao@intel.com> wrote: > On Thu, Mar 16, 2017 at 04:29:29AM -0600, Jan Beulich wrote: >>>>> On 15.03.17 at 23:39, <chao.gao@intel.com> wrote: >>> On Wed, Mar 15, 2017 at 10:48:25AM -0600, Jan Beulich wrote: >>>>>>> On 15.03.17 at 06:11, <chao.gao@intel.com> wrote: >>>>> + /* >>>>> + * The following method to update IRTE is safe on condition that >>>>> + * only the high qword or the low qword is to be updated. >>>>> + * If entire IRTE is to be updated, callers should make sure the >>>>> + * IRTE is not in use. >>>>> + */ >>>>> + entry->lo = new_ire->lo; >>>>> + entry->hi = new_ire->hi; >>>> >>>>How is this any better than structure assignment? Furthermore >>> >>> Indeed, not better. when using structure assignment, the assembly code is >>> 48 8b 06 mov (%rsi),%rax >>> 48 8b 56 08 mov 0x8(%rsi),%rdx >>> 48 89 07 mov %rax,(%rdi) >>> 48 89 57 08 mov %rdx,0x8(%rdi) >>> Using the code above, the assembly code is >>> 48 8b 06 mov (%rsi),%rax >>> 48 89 07 mov %rax,(%rdi) >>> 48 8b 46 08 mov 0x8(%rsi),%rax >>> 48 89 47 08 mov %rax,0x8(%rdi) >>> >>> I thought structure assignment maybe ultilize memcpy considering structure >>> of a big size, so I made this change. I will change this back. Although >>> that, this patch is trying to make the change safer when cmpxchg16() is >>> supported. >> >>Perhaps you've really meant to use write_atomic()? > > I don't understand what you mean. But I think write_atomic may be not related > to the problem how to update a 16 byte memory atomically if cmpxchg16() is not > supported. Of course not, but you were concerned about memcpy() being called by the compiler. I.e. I did assume that when doing the 2x8-byte update you would want to have them carried out as two 8-byte writes, instead of possibly being broken up further by the compiler. Jan
> From: Jan Beulich [mailto:JBeulich@suse.com] > Sent: Thursday, March 16, 2017 6:29 PM > > >>> On 15.03.17 at 23:39, <chao.gao@intel.com> wrote: > > On Wed, Mar 15, 2017 at 10:48:25AM -0600, Jan Beulich wrote: > >>>>> On 15.03.17 at 06:11, <chao.gao@intel.com> wrote: > >>> + /* > >>> + * The following method to update IRTE is safe on condition that > >>> + * only the high qword or the low qword is to be updated. > >>> + * If entire IRTE is to be updated, callers should make sure the > >>> + * IRTE is not in use. > >>> + */ > >>> + entry->lo = new_ire->lo; > >>> + entry->hi = new_ire->hi; > >> > >>How is this any better than structure assignment? Furthermore > > > > Indeed, not better. when using structure assignment, the assembly code is > > 48 8b 06 mov (%rsi),%rax > > 48 8b 56 08 mov 0x8(%rsi),%rdx > > 48 89 07 mov %rax,(%rdi) > > 48 89 57 08 mov %rdx,0x8(%rdi) > > Using the code above, the assembly code is > > 48 8b 06 mov (%rsi),%rax > > 48 89 07 mov %rax,(%rdi) > > 48 8b 46 08 mov 0x8(%rsi),%rax > > 48 89 47 08 mov %rax,0x8(%rdi) > > > > I thought structure assignment maybe ultilize memcpy considering > > structure of a big size, so I made this change. I will change this > > back. Although that, this patch is trying to make the change safer > > when cmpxchg16() is supported. > > Perhaps you've really meant to use write_atomic()? > > >>the comment here partially contradicts the commit message. I > > > > Yes. > > > >>guess callers need to be given a way (another function parameter?) to > >>signal the function whether the unsafe variant is okay to use. > > > > This means we need to add the new parameter to iommu ops for only > > IOAPIC/MSI know the entry they want to change is masked. Is there any > > another reasonable and correct solution? > > Well, users you convert in this patch must be okay to use the non-atomic > variant. The PI user(s) know(s) that cmpxchg16b is available, so could always > request the safe variant. No need for a new parameter higher up in the call > trees afaics. > > > How about... > > > >>You should then add a suitable BUG_ON() in the else path here. > > > > just add a BUG_ON() like this > > BUG_ON( (entry->hi != new_ire->hi) && (entry->lo != new_ire->lo) ); > > Adding this BUG_ON() means update_irte() can't be used for > > initializing or clearing IRTE which are not bugs. > > Yes, that's an option too, albeit then I'd suggest (pseudo code) > > if ( high_up_to_date ) > update_low; > else if ( low_up_to_date ) > update_high; > else > BUG(); > > But you'll want to have the okay from Kevin as the maintainer for something > like this. > > Jan We'll have an internal discussion about exact requirement of update atomicity here and then get back with a proposal. Thanks Kevin
> From: Jan Beulich [mailto:JBeulich@suse.com] > Sent: Thursday, March 16, 2017 6:29 PM > > >>> On 15.03.17 at 23:39, <chao.gao@intel.com> wrote: > > On Wed, Mar 15, 2017 at 10:48:25AM -0600, Jan Beulich wrote: > >>>>> On 15.03.17 at 06:11, <chao.gao@intel.com> wrote: > >>> + /* > >>> + * The following method to update IRTE is safe on condition that > >>> + * only the high qword or the low qword is to be updated. > >>> + * If entire IRTE is to be updated, callers should make sure the > >>> + * IRTE is not in use. > >>> + */ > >>> + entry->lo = new_ire->lo; > >>> + entry->hi = new_ire->hi; > >> > >>How is this any better than structure assignment? Furthermore > > > > Indeed, not better. when using structure assignment, the assembly code is > > 48 8b 06 mov (%rsi),%rax > > 48 8b 56 08 mov 0x8(%rsi),%rdx > > 48 89 07 mov %rax,(%rdi) > > 48 89 57 08 mov %rdx,0x8(%rdi) > > Using the code above, the assembly code is > > 48 8b 06 mov (%rsi),%rax > > 48 89 07 mov %rax,(%rdi) > > 48 8b 46 08 mov 0x8(%rsi),%rax > > 48 89 47 08 mov %rax,0x8(%rdi) > > > > I thought structure assignment maybe ultilize memcpy considering > > structure of a big size, so I made this change. I will change this > > back. Although that, this patch is trying to make the change safer > > when cmpxchg16() is supported. > > Perhaps you've really meant to use write_atomic()? > > >>the comment here partially contradicts the commit message. I > > > > Yes. > > > >>guess callers need to be given a way (another function parameter?) to > >>signal the function whether the unsafe variant is okay to use. > > > > This means we need to add the new parameter to iommu ops for only > > IOAPIC/MSI know the entry they want to change is masked. Is there any > > another reasonable and correct solution? > > Well, users you convert in this patch must be okay to use the non-atomic > variant. The PI user(s) know(s) that cmpxchg16b is available, so could always > request the safe variant. No need for a new parameter higher up in the call > trees afaics. > > > How about... > > > >>You should then add a suitable BUG_ON() in the else path here. > > > > just add a BUG_ON() like this > > BUG_ON( (entry->hi != new_ire->hi) && (entry->lo != new_ire->lo) ); > > Adding this BUG_ON() means update_irte() can't be used for > > initializing or clearing IRTE which are not bugs. > > Yes, that's an option too, albeit then I'd suggest (pseudo code) > > if ( high_up_to_date ) > update_low; > else if ( low_up_to_date ) > update_high; > else > BUG(); > > But you'll want to have the okay from Kevin as the maintainer for something > like this. > Just had a discussion with Chao. Here is a summary: - PI implies availability of cmpxchg16b, so can always request atomicity - The problem is really about remap mode. cmpxchg16b might be not available on those platforms: * for init/clear operations, atomicity is not a requirement. We can have high/low parts updated separately; * for reprogram operations (e.g. irq balance), atomicity is a requirement. But in reality only lower 64bit should be touched. Then: - for remapping IRTE * do update_irte(high) and update_irte(low) separately when IRTE is newly allocated * Then only do update_irte(low) in case an IRTE is found - for posting IRTE * always do update_irte(high&low) Then we can use the pseudo code you suggested. Thanks Kevin
diff --git a/xen/drivers/passthrough/vtd/intremap.c b/xen/drivers/passthrough/vtd/intremap.c index 7d4c7e1..342b45f 100644 --- a/xen/drivers/passthrough/vtd/intremap.c +++ b/xen/drivers/passthrough/vtd/intremap.c @@ -169,6 +169,38 @@ bool_t __init iommu_supports_eim(void) return 1; } +static void update_irte(struct iremap_entry *entry, + const struct iremap_entry *new_ire) +{ + if ( cpu_has_cx16 ) + { + __uint128_t ret; + struct iremap_entry old_ire; + + old_ire = *entry; + ret = cmpxchg16b(entry, &old_ire, new_ire); + + /* + * In the above, we use cmpxchg16 to atomically update the 128-bit + * IRTE, and the hardware cannot update the IRTE behind us, so + * the return value of cmpxchg16 should be the same as old_ire. + * This ASSERT validate it. + */ + ASSERT(ret == old_ire.val); + } + else + { + /* + * The following method to update IRTE is safe on condition that + * only the high qword or the low qword is to be updated. + * If entire IRTE is to be updated, callers should make sure the + * IRTE is not in use. + */ + entry->lo = new_ire->lo; + entry->hi = new_ire->hi; + } +} + /* Mark specified intr remap entry as free */ static void free_remap_entry(struct iommu *iommu, int index) { @@ -353,7 +385,7 @@ static int ioapic_rte_to_remap_entry(struct iommu *iommu, remap_rte->format = 1; /* indicate remap format */ } - *iremap_entry = new_ire; + update_irte(iremap_entry, &new_ire); iommu_flush_cache_entry(iremap_entry, sizeof(*iremap_entry)); iommu_flush_iec_index(iommu, 0, index); @@ -640,7 +672,7 @@ static int msi_msg_to_remap_entry( remap_rte->address_hi = 0; remap_rte->data = index - i; - *iremap_entry = new_ire; + update_irte(iremap_entry, &new_ire); iommu_flush_cache_entry(iremap_entry, sizeof(*iremap_entry)); iommu_flush_iec_index(iommu, 0, index);
We used structure assignment to update irte which was not safe when an interrupt happened during update. It is better to update IRTE atomically through cmpxchg16b(). When cmpxchg16b is not supported, two 64-bit write operations can update IRTE safely when only the high qword or the low qword is intented to be updated. Signed-off-by: Chao Gao <chao.gao@intel.com> --- v10: - rename copy_irte_to_irt to update_irte - remove copy_from_to_irt - change commmit message and add some comments to illustrate on which condition update_irte() is safe. xen/drivers/passthrough/vtd/intremap.c | 36 ++++++++++++++++++++++++++++++++-- 1 file changed, 34 insertions(+), 2 deletions(-)