| Message ID | 20190716162355.1321-1-andrew.cooper3@citrix.com (mailing list archive) |
|---|---|
| State | New, archived |
| Series | [v3] passthrough/vtd: Don't DMA to the stack in queue_invalidate_wait() |
On 16.07.2019 18:23, Andrew Cooper wrote:
> DMA-ing to the stack is considered bad practice. In this case, if a
> timeout occurs because of a sluggish device which is processing the
> request, the completion notification will corrupt the stack of a
> subsequent deeper call tree.
>
> Place the poll_slot in a percpu area and DMA to that instead.
>
> Fix the declaration of saddr in struct qinval_entry, to avoid a shift by
> two. The requirement here is that the DMA address is dword aligned,
> which is covered by poll_slot's type.
>
> This change does not address other issues. Correlating completions
> after a timeout with their request is a more complicated change.
>
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>

Reviewed-by: Jan Beulich <JBeulich@suse.com>

Must have been quite some time since v2 ...

Jan
On 16/07/2019 17:33, Jan Beulich wrote:
> On 16.07.2019 18:23, Andrew Cooper wrote:
>> DMA-ing to the stack is considered bad practice. In this case, if a
>> timeout occurs because of a sluggish device which is processing the
>> request, the completion notification will corrupt the stack of a
>> subsequent deeper call tree.
>>
>> Place the poll_slot in a percpu area and DMA to that instead.
>>
>> Fix the declaration of saddr in struct qinval_entry, to avoid a shift by
>> two. The requirement here is that the DMA address is dword aligned,
>> which is covered by poll_slot's type.
>>
>> This change does not address other issues. Correlating completions
>> after a timeout with their request is a more complicated change.
>>
>> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
> Reviewed-by: Jan Beulich <JBeulich@suse.com>
>
> Must have been quite some time since v2 ...

Oct. 19, 2017, 4:22 p.m. UTC, according to patchwork.

And now I can talk about it: that's when I was getting really stuck in to
trying to fix Spectre/Meltdown.

~Andrew
> From: Andrew Cooper [mailto:andrew.cooper3@citrix.com]
> Sent: Wednesday, July 17, 2019 12:24 AM
>
> DMA-ing to the stack is considered bad practice. In this case, if a
> timeout occurs because of a sluggish device which is processing the
> request, the completion notification will corrupt the stack of a
> subsequent deeper call tree.
>
> Place the poll_slot in a percpu area and DMA to that instead.
>
> Fix the declaration of saddr in struct qinval_entry, to avoid a shift by
> two. The requirement here is that the DMA address is dword aligned,
> which is covered by poll_slot's type.
>
> This change does not address other issues. Correlating completions
> after a timeout with their request is a more complicated change.
>
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>

Reviewed-by: Kevin Tian <kevin.tian@intel.com>
```diff
diff --git a/xen/drivers/passthrough/vtd/iommu.h b/xen/drivers/passthrough/vtd/iommu.h
index 1a992f72d6..c9290a3996 100644
--- a/xen/drivers/passthrough/vtd/iommu.h
+++ b/xen/drivers/passthrough/vtd/iommu.h
@@ -444,8 +444,7 @@ struct qinval_entry {
                     sdata   : 32;
             }lo;
             struct {
-                u64 res_1   : 2,
-                    saddr   : 62;
+                u64 saddr;
             }hi;
         }inv_wait_dsc;
     }q;
diff --git a/xen/drivers/passthrough/vtd/qinval.c b/xen/drivers/passthrough/vtd/qinval.c
index 01447cf9a8..09cbd36ebb 100644
--- a/xen/drivers/passthrough/vtd/qinval.c
+++ b/xen/drivers/passthrough/vtd/qinval.c
@@ -147,13 +147,15 @@ static int __must_check queue_invalidate_wait(struct iommu *iommu,
                                               u8 iflag, u8 sw, u8 fn,
                                               bool_t flush_dev_iotlb)
 {
-    volatile u32 poll_slot = QINVAL_STAT_INIT;
+    static DEFINE_PER_CPU(uint32_t, poll_slot);
     unsigned int index;
     unsigned long flags;
     u64 entry_base;
     struct qinval_entry *qinval_entry, *qinval_entries;
+    uint32_t *this_poll_slot = &this_cpu(poll_slot);
 
     spin_lock_irqsave(&iommu->register_lock, flags);
+    ACCESS_ONCE(*this_poll_slot) = QINVAL_STAT_INIT;
     index = qinval_next_index(iommu);
     entry_base = iommu_qi_ctrl(iommu)->qinval_maddr +
                  ((index >> QINVAL_ENTRY_ORDER) << PAGE_SHIFT);
@@ -166,8 +168,7 @@ static int __must_check queue_invalidate_wait(struct iommu *iommu,
     qinval_entry->q.inv_wait_dsc.lo.fn = fn;
     qinval_entry->q.inv_wait_dsc.lo.res_1 = 0;
     qinval_entry->q.inv_wait_dsc.lo.sdata = QINVAL_STAT_DONE;
-    qinval_entry->q.inv_wait_dsc.hi.res_1 = 0;
-    qinval_entry->q.inv_wait_dsc.hi.saddr = virt_to_maddr(&poll_slot) >> 2;
+    qinval_entry->q.inv_wait_dsc.hi.saddr = virt_to_maddr(this_poll_slot);
 
     unmap_vtd_domain_page(qinval_entries);
     qinval_update_qtail(iommu, index);
@@ -182,7 +183,7 @@ static int __must_check queue_invalidate_wait(struct iommu *iommu,
         timeout = NOW() + MILLISECS(flush_dev_iotlb ?
                                     iommu_dev_iotlb_timeout : VTD_QI_TIMEOUT);
 
-        while ( poll_slot != QINVAL_STAT_DONE )
+        while ( ACCESS_ONCE(*this_poll_slot) != QINVAL_STAT_DONE )
         {
             if ( NOW() > timeout )
             {
```
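As an aside for readers outside the Xen tree, the following self-contained C sketch mimics the pattern the patch adopts: a per-CPU status slot that is re-armed before each wait descriptor is queued and then polled with ACCESS_ONCE()-style reads rather than through a volatile stack variable. Every name here (NR_FAKE_CPUS, fake_cpu_id(), fake_device_complete(), and the simplified ACCESS_ONCE()) is an invented stand-in for Xen's DEFINE_PER_CPU/this_cpu machinery and the IOMMU's status write; this is an illustration, not the hypervisor code.

```c
/*
 * Illustrative sketch (not Xen code): why the poll slot must outlive
 * the polling function.  A per-CPU slot has static storage duration,
 * so a late "device" write lands in known storage; a stack slot is
 * reused by the next call frame and gets corrupted.
 */
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

#define QINVAL_STAT_INIT 1
#define QINVAL_STAT_DONE 2
#define NR_FAKE_CPUS     4

/* Simplified ACCESS_ONCE(): force a single, untorn access (GCC typeof). */
#define ACCESS_ONCE(x) (*(volatile typeof(x) *)&(x))

/* Stand-in for DEFINE_PER_CPU(uint32_t, poll_slot). */
static uint32_t poll_slot[NR_FAKE_CPUS];

static unsigned int fake_cpu_id(void) { return 0; } /* demo: always CPU 0 */

/* Simulates the IOMMU writing the status dword on completion. */
static void fake_device_complete(uint32_t *slot)
{
    ACCESS_ONCE(*slot) = QINVAL_STAT_DONE;
}

static int queue_invalidate_wait_demo(void)
{
    uint32_t *this_poll_slot = &poll_slot[fake_cpu_id()];

    /* Re-arm the slot before queueing the wait descriptor. */
    ACCESS_ONCE(*this_poll_slot) = QINVAL_STAT_INIT;

    /* In Xen, the descriptor would carry virt_to_maddr(this_poll_slot). */
    fake_device_complete(this_poll_slot);

    while ( ACCESS_ONCE(*this_poll_slot) != QINVAL_STAT_DONE )
        ; /* the real code bounds this loop with a timeout */

    return 0;
}

int main(void)
{
    assert(queue_invalidate_wait_demo() == 0);
    puts("completion observed in per-CPU slot");
    return 0;
}
```

A per-CPU slot suffices because a CPU has at most one such synchronous wait outstanding at a time, and even if a sluggish device writes after a timeout, the write now lands in static storage rather than in a reused stack frame (modulo the completion-correlation issue the commit message explicitly leaves unaddressed).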
DMA-ing to the stack is considered bad practice. In this case, if a
timeout occurs because of a sluggish device which is processing the
request, the completion notification will corrupt the stack of a
subsequent deeper call tree.

Place the poll_slot in a percpu area and DMA to that instead.

Fix the declaration of saddr in struct qinval_entry, to avoid a shift by
two. The requirement here is that the DMA address is dword aligned,
which is covered by poll_slot's type.

This change does not address other issues. Correlating completions
after a timeout with their request is a more complicated change.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
CC: Jan Beulich <JBeulich@suse.com>
CC: Kevin Tian <kevin.tian@intel.com>

It turns out that this has been pending since 4.10, and is grossly late.

v3:
 * Fix saddr declaration to drop a shift-by-two.
 * Drop volatile attribute. Use ACCESS_ONCE() instead.
---
 xen/drivers/passthrough/vtd/iommu.h  | 3 +--
 xen/drivers/passthrough/vtd/qinval.c | 9 +++++----
 2 files changed, 6 insertions(+), 6 deletions(-)
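The "shift by two" in the changelog refers to the wait descriptor's high qword: the old bitfield layout (res_1 : 2, saddr : 62) stored the status address pre-shifted, with the hardware treating the two low bits as reserved. The standalone check below (not Xen code; the machine address is made up for illustration) shows that the old encoding and the new plain-u64 one produce the same bit pattern whenever the slot is dword aligned, which poll_slot's uint32_t type guarantees.

```c
/*
 * Illustrative check (not Xen code): the old and new encodings of the
 * invalidation wait descriptor's high qword are bit-identical whenever
 * the status address is dword aligned.
 */
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

int main(void)
{
    /* A dword-aligned "machine address" of the poll slot (made up). */
    uint64_t maddr = 0x00000012345678a4ULL;

    assert((maddr & 3) == 0);   /* a uint32_t slot is 4-byte aligned */

    /*
     * Old layout: res_1:2 zeroed in bits 1:0, (maddr >> 2) written to
     * the saddr:62 bitfield occupying bits 63:2.
     */
    uint64_t old_hi = ((maddr >> 2) << 2) | 0;

    /* New layout: plain u64, no shift; the low two bits are already 0. */
    uint64_t new_hi = maddr;

    assert(old_hi == new_hi);
    printf("hi qword: %#018" PRIx64 "\n", new_hi);
    return 0;
}
```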