From patchwork Sat Jul 22 00:46:01 2017
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Stefano Stabellini <sstabellini@kernel.org>
X-Patchwork-Id: 9857933
Return-Path: <xen-devel-bounces@lists.xen.org>
Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org
	[172.30.200.125])
	by pdx-korg-patchwork.web.codeaurora.org (Postfix) with ESMTP id
	E846660392 for <patchwork-xen-devel@patchwork.kernel.org>;
	Sat, 22 Jul 2017 00:48:16 +0000 (UTC)
Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1])
	by mail.wl.linuxfoundation.org (Postfix) with ESMTP id D83112864F
	for <patchwork-xen-devel@patchwork.kernel.org>;
	Sat, 22 Jul 2017 00:48:16 +0000 (UTC)
Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486)
	id CD08528670; Sat, 22 Jul 2017 00:48:16 +0000 (UTC)
X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on
	pdx-wl-mail.web.codeaurora.org
X-Spam-Level: 
X-Spam-Status: No, score=-4.2 required=2.0 tests=BAYES_00, RCVD_IN_DNSWL_MED
	autolearn=ham version=3.3.1
Received: from lists.xenproject.org (lists.xenproject.org [192.237.175.120])
	(using TLSv1.2 with cipher AES128-GCM-SHA256 (128/128 bits))
	(No client certificate requested)
	by mail.wl.linuxfoundation.org (Postfix) with ESMTPS id 375C02864F
	for <patchwork-xen-devel@patchwork.kernel.org>;
	Sat, 22 Jul 2017 00:48:16 +0000 (UTC)
Received: from localhost ([127.0.0.1] helo=lists.xenproject.org)
	by lists.xenproject.org with esmtp (Exim 4.84_2)
	(envelope-from <xen-devel-bounces@lists.xen.org>)
	id 1dYiYa-0007XG-L1; Sat, 22 Jul 2017 00:46:08 +0000
Received: from mail6.bemta6.messagelabs.com ([193.109.254.103])
	by lists.xenproject.org with esmtp (Exim 4.84_2)
	(envelope-from <sstabellini@kernel.org>) id 1dYiYZ-0007WX-7l
	for xen-devel@lists.xenproject.org; Sat, 22 Jul 2017 00:46:07 +0000
Received: from [85.158.143.35] by server-2.bemta-6.messagelabs.com id
	74/74-27137-E40A2795; Sat, 22 Jul 2017 00:46:06 +0000
X-Brightmail-Tracker: H4sIAAAAAAAAA+NgFvrFLMWRWlGSWpSXmKPExsVybKJssq7vgqJ
	Ig947Ehbft0xmcmD0OPzhCksAYxRrZl5SfkUCa8aU9Y9ZC5rtKqb/msXawHjXsIuRi0NIYB2T
	xM9tR9i6GDk5WAQcJJr/P2TuYuTgYBSIkXjwwxokzCgQJjH58hJWEJtNwFDi75NNYOUiQPaDr
	ctZQeYwCyxmlJi27ykjiCMsMIFR4sqv7UwQQ1UlNly/xAgylFfATWLeZWGQsISAnMTJY5PBhn
	IKuEtMPbuNCeKgdkaJdYtvM01g5F3AyLCKUb04tagstUjXUC+pKDM9oyQ3MTNH19DATC83tbg
	4MT01JzGpWC85P3cTIzAcGIBgB+PO506HGCU5mJREeTWtiiKF+JLyUyozEosz4otKc1KLDzHK
	cHAoSfD6zAfKCRalpqdWpGXmAAMTJi3BwaMkwhsAkuYtLkjMLc5Mh0idYtTleDXh/zcmIZa8/
	LxUKXFeAZAiAZCijNI8uBGwKLnEKCslzMsIdJQQT0FqUW5mCar8K0ZxDkYlYd6IeUBTeDLzSu
	A2vQI6ggnoiEduBSBHlCQipKQaGNX0XT4GNztkLbxyfGnJW6ZYceX4Cy1cUS0PHufkBK/f5fZ
	d5c0J+6Tjnd8377jxST2txu3G64e7vYvU+k/M7T7x5PjK9QH3fl6Pe8BxhaucnSdkEds1w87r
	aubnNkbMm97sbquRPO25xbrin3+r5H5xd7u4SPELPdl8edLyL5MXu3mczi846K3EUpyRaKjFX
	FScCAD53/5djQIAAA==
X-Env-Sender: sstabellini@kernel.org
X-Msg-Ref: server-7.tower-21.messagelabs.com!1500684364!74453349!1
X-Originating-IP: [198.145.29.99]
X-SpamReason: No, hits=0.5 required=7.0 tests=BODY_RANDOM_LONG
X-StarScan-Received: 
X-StarScan-Version: 9.4.25; banners=-,-,-
X-VirusChecked: Checked
Received: (qmail 6102 invoked from network); 22 Jul 2017 00:46:05 -0000
Received: from mail.kernel.org (HELO mail.kernel.org) (198.145.29.99)
	by server-7.tower-21.messagelabs.com with DHE-RSA-AES256-GCM-SHA384
	encrypted SMTP; 22 Jul 2017 00:46:05 -0000
Received: from localhost.localdomain
	(162-198-228-33.lightspeed.wlfrct.sbcglobal.net [162.198.228.33])
	(using TLSv1.2 with cipher ECDHE-RSA-AES128-SHA256 (128/128 bits))
	(No client certificate requested)
	by mail.kernel.org (Postfix) with ESMTPSA id 34FFA22C9A;
	Sat, 22 Jul 2017 00:46:03 +0000 (UTC)
DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 34FFA22C9A
From: Stefano Stabellini <sstabellini@kernel.org>
To: peter.maydell@linaro.org,
	stefanha@gmail.com
Date: Fri, 21 Jul 2017 17:46:01 -0700
Message-Id: <1500684361-20532-2-git-send-email-sstabellini@kernel.org>
X-Mailer: git-send-email 1.9.1
In-Reply-To: <1500684361-20532-1-git-send-email-sstabellini@kernel.org>
References: <alpine.DEB.2.10.1707211740220.22381@sstabellini-ThinkPad-X260>
	<1500684361-20532-1-git-send-email-sstabellini@kernel.org>
Cc: sstabellini@kernel.org, qemu-devel@nongnu.org,
	Alexey G <x1917x@gmail.com>,
	stefanha@redhat.com, anthony.perard@citrix.com,
	xen-devel@lists.xenproject.org
Subject: [Xen-devel] [PULL for-2.10 2/2] xen-mapcache: Fix the bug when
	overlapping emulated DMA operations may cause inconsistency
	in guest memory mappings
X-BeenThere: xen-devel@lists.xen.org
X-Mailman-Version: 2.1.18
Precedence: list
List-Id: Xen developer discussion <xen-devel.lists.xen.org>
List-Unsubscribe: <https://lists.xen.org/cgi-bin/mailman/options/xen-devel>,
	<mailto:xen-devel-request@lists.xen.org?subject=unsubscribe>
List-Post: <mailto:xen-devel@lists.xen.org>
List-Help: <mailto:xen-devel-request@lists.xen.org?subject=help>
List-Subscribe: <https://lists.xen.org/cgi-bin/mailman/listinfo/xen-devel>,
	<mailto:xen-devel-request@lists.xen.org?subject=subscribe>
MIME-Version: 1.0
Errors-To: xen-devel-bounces@lists.xen.org
Sender: "Xen-devel" <xen-devel-bounces@lists.xen.org>
X-Virus-Scanned: ClamAV using ClamSMTP

From: Alexey G <x1917x@gmail.com>

Under certain circumstances, normal operation of xen-mapcache can be broken
by the guest's actions. This leads either to QEMU calling exit() after
catching a bad pointer (with the QEMU process gone, the guest domain simply
appears hung afterwards) or to actual use of an incorrect pointer inside
the QEMU address space -- a write to unmapped memory is possible. The bug is
hard to reproduce on an i440 machine, as multiple DMA sources are required
(though it is possible in theory, using multiple emulated devices), but it
can be reproduced fairly easily on a Q35 machine using an emulated AHCI
controller -- each NCQ command slot may be used as an independent DMA
source, e.g. via the READ FPDMA QUEUED command, so a single storage device
on an AHCI controller port is enough to produce multiple DMAs (up to 32).
A detailed description of the issue follows.

Xen-mapcache lets QEMU map parts of guest memory into its own address
space so that it can work with them.

There are two types of cache lookups:
 - translating a guest physical address into a pointer in QEMU's address
   space, mapping a part of guest domain memory if necessary (while trying
   to keep the number of such (re)mappings to a minimum)
 - translating a QEMU pointer back to its physical address in guest RAM

These lookups are managed via two linked lists of structures:
MapCacheEntry is used for forward cache lookups and MapCacheRev for
reverse lookups.

Every guest physical address is broken down into 2 parts:
    address_index  = phys_addr >> MCACHE_BUCKET_SHIFT;
    address_offset = phys_addr & (MCACHE_BUCKET_SIZE - 1);

MCACHE_BUCKET_SHIFT depends on the system (32/64-bit) and is equal to 20
for a 64-bit system (which is assumed for the rest of this description).
This means that we deal with 1MB chunks and offsets within those 1MB
chunks. All mappings are created with 1MB granularity, i.e. 1MB/2MB/3MB
etc. Most DMA transfers are smaller than 1MB; however, if a transfer
crosses one or more 1MB boundaries, then the next larger mapping size is
used, so e.g. a 512-byte DMA transfer with the start address 700FFF80h
actually requires a 2MB range.
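
The 700FFF80h example can be checked with a short sketch. It assumes the
64-bit value MCACHE_BUCKET_SHIFT = 20; the helper names toy_address_index()
and toy_mapping_size() are hypothetical, and the rounding follows the
description above rather than quoting xen-mapcache.c verbatim.

```c
#include <stdint.h>

#define MCACHE_BUCKET_SHIFT 20
#define MCACHE_BUCKET_SIZE  (1ULL << MCACHE_BUCKET_SHIFT)

/* Hypothetical helper: index of the 1MB bucket containing phys_addr. */
static uint64_t toy_address_index(uint64_t phys_addr)
{
    return phys_addr >> MCACHE_BUCKET_SHIFT;
}

/* Hypothetical helper: size of the mapping needed for a transfer of
 * `size` bytes starting at `phys_addr`, rounded up to 1MB granularity. */
static uint64_t toy_mapping_size(uint64_t phys_addr, uint64_t size)
{
    uint64_t address_offset = phys_addr & (MCACHE_BUCKET_SIZE - 1);
    uint64_t cache_size = address_offset + size;

    /* Round up to the next multiple of the bucket size. */
    if (cache_size % MCACHE_BUCKET_SIZE) {
        cache_size += MCACHE_BUCKET_SIZE - (cache_size % MCACHE_BUCKET_SIZE);
    }
    return cache_size;
}
```

For phys_addr = 700FFF80h and size = 512, address_index is 700h,
address_offset is FFF80h, and offset plus size (100180h) rounds up to
200000h, i.e. 2MB.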

The current implementation assumes that MapCacheEntries are unique for a
given address_index and size pair, and that a single MapCacheEntry may be
reused by multiple requests -- in this case the 'lock' field will be larger
than 1. On the other hand, each requested guest physical address (with the
'lock' flag set) is described by its own MapCacheRev, so there may be
multiple MapCacheRev entries corresponding to a single MapCacheEntry. The
xen-mapcache code uses MapCacheRev entries to retrieve the address_index &
size pair, which in turn is used to find the related MapCacheEntry. The
'lock' field within a MapCacheEntry structure is actually a reference
counter which holds the number of corresponding MapCacheRev entries.

The bug lies in the guest's ability to indirectly manipulate the
xen-mapcache MapCacheEntry list via a special sequence of DMA operations,
typically for storage devices. In order to trigger the bug, the guest needs
to issue DMA operations in a specific order and with specific timing.
Although xen-mapcache is protected by a mutex, this doesn't help in this
case, as the bug is not due to a race condition.

Suppose we have 3 DMA transfers, namely A, B and C, where
- transfer A crosses a 1MB boundary and thus uses a 2MB mapping
- transfers B and C are normal transfers within a 1MB range
- all 3 transfers belong to the same address_index

In this case, if these transfers are executed one by one (without
overlaps), no special treatment is necessary -- each transfer's mapping
lock is set and then cleared on unmap before the next transfer starts.

The situation changes when DMA transfers overlap in time, e.g. like this:

  |===== transfer A (2MB) =====|

              |===== transfer B (1MB) =====|

                          |===== transfer C (1MB) =====|
 time --->

In this situation the following sequence of actions happens:

1. transfer A creates a mapping of a 2MB area (lock=1)
2. transfer B (1MB) tries to find an available mapping but cannot find one
   because transfer A is still in progress, and its entry has a 2MB size
   and a non-zero lock. So transfer B creates another mapping -- same
   address_index, but 1MB size.
3. transfer A completes, making the 1st mapping entry available by setting
   its lock to 0
4. transfer C starts, looks for an available mapping entry and sees that
   the 1st entry has lock=0, so it uses this entry but remaps it to a 1MB
   size
5. transfer B completes, and by this time
  - there are two locked entries in the MapCacheEntry list with the SAME
    values for both address_index and size
  - the entry for transfer B actually resides farther down the list, while
    transfer C's entry is first
6. xen_ram_addr_from_mapcache() for transfer B gets the correct
   address_index and size pair from the corresponding MapCacheRev entry,
   but then it starts looking for a MapCacheEntry with these values and
   finds the first entry -- which belongs to transfer C.
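
The six steps above can be replayed on a toy model. This is only a sketch:
ToyEntry, toy_find() and toy_replay() are hypothetical names, the 'owner'
field exists purely for illustration, and the real MapCacheEntry carries
more state (vaddr_base, valid_mapping, ...) that is omitted here.

```c
#include <stddef.h>
#include <stdint.h>

/* Toy stand-in for MapCacheEntry; only the fields used by the lookup. */
typedef struct ToyEntry {
    uint64_t paddr_index;
    uint64_t size;
    int lock;
    int owner;                /* illustration only: 'A', 'B' or 'C' */
    struct ToyEntry *next;
} ToyEntry;

/* Reverse-lookup step as described in step 6: return the first entry
 * whose (paddr_index, size) pair matches. */
static ToyEntry *toy_find(ToyEntry *head, uint64_t paddr_index, uint64_t size)
{
    ToyEntry *e = head;

    while (e && (e->paddr_index != paddr_index || e->size != size)) {
        e = e->next;
    }
    return e;
}

/* Replays steps 1-5 and returns the owner of the entry that the lookup
 * on behalf of transfer B finds in step 6. */
static int toy_replay(void)
{
    const uint64_t MB = 1ULL << 20;
    ToyEntry first, second;

    /* 1: transfer A maps 2MB at index 700h. */
    first = (ToyEntry){ 0x700, 2 * MB, 1, 'A', NULL };
    /* 2: A's entry is locked and has a different size, so B adds its own. */
    second = (ToyEntry){ 0x700, MB, 1, 'B', NULL };
    first.next = &second;
    /* 3: transfer A completes. */
    first.lock = 0;
    /* 4: transfer C reuses the unlocked first entry, remapped to 1MB. */
    first.size = MB;
    first.lock = 1;
    first.owner = 'C';
    /* 5-6: B completes; its (index, size) pair is now ambiguous. */
    return toy_find(&first, 0x700, MB)->owner;
}
```

In this model toy_replay() yields 'C': the lookup made on behalf of
transfer B stops at transfer C's entry, because both entries carry the same
address_index and size.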

At this point the following (bad) consequences are possible:

1. xen_ram_addr_from_mapcache() will use a wrong entry->vaddr_base value
   in this statement:

   raddr = (reventry->paddr_index << MCACHE_BUCKET_SHIFT) +
       ((unsigned long) ptr - (unsigned long) entry->vaddr_base);

   resulting in an incorrect raddr value being returned from the function.
   The (ptr - entry->vaddr_base) expression may yield both positive and
   negative numbers, and its actual value may vary greatly as many
   map/unmap operations take place. If the value is beyond the guest RAM
   limits, a "Bad RAM offset" error is triggered and logged, followed by
   exit() in QEMU.

2. If the raddr value does not exceed the guest RAM boundaries, the same
   sequence of actions is performed for xen_invalidate_map_cache_entry()
   on DMA unmap, resulting in the wrong MapCacheEntry being unmapped while
   the DMA operation that uses it is still active. The above example must
   be extended by one more DMA transfer to allow the unmapping, as the
   first mapping in the list is effectively resident.
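
To see how far raddr can drift when the wrong vaddr_base is used, here is a
minimal sketch of the same arithmetic. toy_raddr() is a hypothetical name,
uint64_t stands in for unsigned long so the result does not depend on the
host ABI, and the pointer values in the note below are made up.

```c
#include <stdint.h>

#define MCACHE_BUCKET_SHIFT 20

/* Same arithmetic as the raddr statement quoted above. The subtraction
 * is unsigned, so a base above ptr wraps around modulo 2^64, which acts
 * as a negative offset once added to the bucket start. */
static uint64_t toy_raddr(uint64_t paddr_index, uint64_t ptr,
                          uint64_t vaddr_base)
{
    return (paddr_index << MCACHE_BUCKET_SHIFT) + (ptr - vaddr_base);
}
```

With paddr_index = 700h and ptr = 7f0000201000h, the correct base
7f0000200000h gives raddr = 70001000h, while a wrong base of 7f0000600000h
gives 6FC01000h, which lies outside the 1MB bucket the transfer actually
used.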

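The patch changes how MapCacheEntry structures are added to the list,
avoiding duplicates. When a locked mapping is requested, the whole list is
now searched for a matching entry first, and an unlocked entry is reused
only if no match is found.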

Signed-off-by: Alexey Gerasimenko <x1917x@gmail.com>
Signed-off-by: Stefano Stabellini <sstabellini@kernel.org>
---
 hw/i386/xen/xen-mapcache.c | 13 +++++++++++--
 1 file changed, 11 insertions(+), 2 deletions(-)

diff --git a/hw/i386/xen/xen-mapcache.c b/hw/i386/xen/xen-mapcache.c
index bb1078c..369c3df 100644
--- a/hw/i386/xen/xen-mapcache.c
+++ b/hw/i386/xen/xen-mapcache.c
@@ -234,7 +234,8 @@ static void xen_remap_bucket(MapCacheEntry *entry,
 static uint8_t *xen_map_cache_unlocked(hwaddr phys_addr, hwaddr size,
                                        uint8_t lock, bool dma)
 {
-    MapCacheEntry *entry, *pentry = NULL;
+    MapCacheEntry *entry, *pentry = NULL,
+                  *free_entry = NULL, *free_pentry = NULL;
     hwaddr address_index;
     hwaddr address_offset;
     hwaddr cache_size = size;
@@ -281,14 +282,22 @@ tryagain:
 
     entry = &mapcache->entry[address_index % mapcache->nr_buckets];
 
-    while (entry && entry->lock && entry->vaddr_base &&
+    while (entry && (lock || entry->lock) && entry->vaddr_base &&
             (entry->paddr_index != address_index || entry->size != cache_size ||
              !test_bits(address_offset >> XC_PAGE_SHIFT,
                  test_bit_size >> XC_PAGE_SHIFT,
                  entry->valid_mapping))) {
+        if (!free_entry && !entry->lock) {
+            free_entry = entry;
+            free_pentry = pentry;
+        }
         pentry = entry;
         entry = entry->next;
     }
+    if (!entry && free_entry) {
+        entry = free_entry;
+        pentry = free_pentry;
+    }
     if (!entry) {
         entry = g_malloc0(sizeof (MapCacheEntry));
         pentry->next = entry;