From patchwork Mon May 22 11:26:22 2017
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Sameeh Jubran
X-Patchwork-Id: 9739967
From: Sameeh Jubran
To: qemu-devel@nongnu.org, Jason Wang, Dmitry Fleytman
Date: Mon, 22 May 2017 14:26:22 +0300
Message-Id: <1495452382-19840-1-git-send-email-sameeh@daynix.com>
X-Mailer: git-send-email 2.7.0.windows.1
Subject: [Qemu-devel] [PATCH v2] e1000e: Fix ICR "Other" causes clear logic
Cc: Yan Vugenfirer

This commit fixes a bug that causes the guest to hang. The bug was
observed on a "receive overrun" interrupt (bit #6 of the ICR register),
which can be triggered after migration in a heavy traffic environment.

Even though the "receive overrun" bit (#6) is masked out by the IMS
register (refer to the log below), the driver still receives an
interrupt, because the "receive overrun" bit (#6) causes the "Other" bit
(bit #24 of the ICR register) to be set, as documented below. The driver
handles the interrupt and clears the "Other" bit (#24) but doesn't clear
the "receive overrun" bit (#6), which leads to an infinite loop.
Apparently the Windows driver expects the "receive overrun" bit and the
other cause bits documented below to be cleared when the "Other" bit
(#24) is cleared.

To sum up:
1. Bit #6 of the ICR register is set by heavy traffic.
2. As a result of bit #6 being set, bit #24 is set.
3. The driver receives an interrupt for bit #24 (it doesn't receive an
   interrupt for bit #6, as it is masked out by IMS).
4. The driver handles and clears the interrupt of bit #24.
5. Bit #6 is still set.
6. Step 2 happens all over again.

The Interrupt Cause Read (ICR) register:

The ICR has the "Other" bit (bit #24), which is set when one or more of
the following ICR register bits are set:

LSC - bit #2, RXO - bit #6, MDAC - bit #9, SRPD - bit #16, ACK - bit #17,
MNG - bit #18

This bug can occur with any of these bits, depending on the driver's
behaviour and the way it configures the device.
However, reproducing it with any bit other than RXO is challenging. Our
attempts failed because the drivers don't implement most of these bits,
and an attempt with the LSC (Link Status Change, bit #2) bit didn't
succeed either, as it seems that Windows handles this bit differently.

Log sample of the storm:

27563@1494850819.411877:e1000e_irq_pending_interrupts ICR PENDING: 0x1000000 (ICR: 0x815000c2, IMS: 0x1a00004)
27563@1494850819.411900:e1000e_irq_pending_interrupts ICR PENDING: 0x0 (ICR: 0x815000c2, IMS: 0xa00004)
27563@1494850819.411915:e1000e_irq_pending_interrupts ICR PENDING: 0x0 (ICR: 0x815000c2, IMS: 0xa00004)
27563@1494850819.412380:e1000e_irq_pending_interrupts ICR PENDING: 0x0 (ICR: 0x815000c2, IMS: 0xa00004)
27563@1494850819.412395:e1000e_irq_pending_interrupts ICR PENDING: 0x0 (ICR: 0x815000c2, IMS: 0xa00004)
27563@1494850819.412436:e1000e_irq_pending_interrupts ICR PENDING: 0x0 (ICR: 0x815000c2, IMS: 0xa00004)
27563@1494850819.412441:e1000e_irq_pending_interrupts ICR PENDING: 0x0 (ICR: 0x815000c2, IMS: 0xa00004)
27563@1494850819.412998:e1000e_irq_pending_interrupts ICR PENDING: 0x1000000 (ICR: 0x815000c2, IMS: 0x1a00004)

* This behaviour wasn't observed with the Linux driver.
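The log sample can be checked by hand. The following is a minimal sketch, not code from the patch: it assumes the traced PENDING value is simply ICR filtered by IMS, which is consistent with every line of the sample. The `sketch_`/`SKETCH_` names are hypothetical, and the bit positions are the ones quoted in this message.

```c
#include <assert.h>
#include <stdint.h>

/* Bit positions quoted in the commit message. */
#define SKETCH_ICR_RXO   (1u << 6)   /* receive overrun */
#define SKETCH_ICR_OTHER (1u << 24)  /* "Other" summary bit */

/* Assumption: the traced PENDING value is the ICR filtered by IMS. */
uint32_t sketch_pending(uint32_t icr, uint32_t ims)
{
    return icr & ims;
}

/* Replays the two distinct (ICR, IMS) pairs from the log sample. */
void sketch_check_log(void)
{
    const uint32_t icr = 0x815000c2u;

    /* "Other" unmasked in IMS: the guest sees a pending interrupt. */
    assert(sketch_pending(icr, 0x1a00004u) == SKETCH_ICR_OTHER);
    /* "Other" masked in IMS: nothing is pending, yet ICR is unchanged. */
    assert(sketch_pending(icr, 0x0a00004u) == 0u);
    /* RXO stays latched in ICR although neither IMS value unmasks it. */
    assert((icr & SKETCH_ICR_RXO) != 0u);
    assert((0x1a00004u & SKETCH_ICR_RXO) == 0u);
}
```

Under that assumption the ICR value never changes across the sample: only bit #24 of IMS toggles, so the same "Other" interrupt becomes pending again as soon as the driver unmasks it.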

This commit solves:
https://bugzilla.redhat.com/show_bug.cgi?id=1447935
https://bugzilla.redhat.com/show_bug.cgi?id=1449490

Signed-off-by: Sameeh Jubran
---
 hw/net/e1000e_core.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/hw/net/e1000e_core.c b/hw/net/e1000e_core.c
index 28c5be1..8174b53 100644
--- a/hw/net/e1000e_core.c
+++ b/hw/net/e1000e_core.c
@@ -2454,14 +2454,17 @@ e1000e_set_ics(E1000ECore *core, int index, uint32_t val)
 static void
 e1000e_set_icr(E1000ECore *core, int index, uint32_t val)
 {
+    uint32_t icr = 0;
     if ((core->mac[ICR] & E1000_ICR_ASSERTED) &&
         (core->mac[CTRL_EXT] & E1000_CTRL_EXT_IAME)) {
         trace_e1000e_irq_icr_process_iame();
         e1000e_clear_ims_bits(core, core->mac[IAM]);
     }
 
-    trace_e1000e_irq_icr_write(val, core->mac[ICR], core->mac[ICR] & ~val);
-    core->mac[ICR] &= ~val;
+    icr = core->mac[ICR] & ~val;
+    icr = (val & E1000_ICR_OTHER) ? (icr & ~E1000_ICR_OTHER_CAUSES) : icr;
+    trace_e1000e_irq_icr_write(val, core->mac[ICR], icr);
+    core->mac[ICR] = icr;
     e1000e_update_interrupt_state(core);
 }
 
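For review purposes, the effect of the change can be modelled outside QEMU. The sketch below is an illustration under stated assumptions, not the device code: the `MODEL_` masks are built from the bit numbers listed in the commit message rather than taken from QEMU's headers, and `model_raise_other()` is a hypothetical stand-in for the device setting "Other" again whenever one of its causes is still latched (step 2 of the summary).

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative masks built from the bit numbers in the commit message. */
#define MODEL_ICR_LSC   (1u << 2)
#define MODEL_ICR_RXO   (1u << 6)
#define MODEL_ICR_MDAC  (1u << 9)
#define MODEL_ICR_SRPD  (1u << 16)
#define MODEL_ICR_ACK   (1u << 17)
#define MODEL_ICR_MNG   (1u << 18)
#define MODEL_ICR_OTHER (1u << 24)
#define MODEL_ICR_OTHER_CAUSES                                   \
    (MODEL_ICR_LSC | MODEL_ICR_RXO | MODEL_ICR_MDAC |            \
     MODEL_ICR_SRPD | MODEL_ICR_ACK | MODEL_ICR_MNG)

/* Behaviour before the patch: clear exactly the bits the guest wrote. */
uint32_t model_icr_write_old(uint32_t icr, uint32_t val)
{
    return icr & ~val;
}

/* Behaviour after the patch: clearing "Other" also clears its causes. */
uint32_t model_icr_write_new(uint32_t icr, uint32_t val)
{
    uint32_t next = icr & ~val;

    if (val & MODEL_ICR_OTHER) {
        next &= ~MODEL_ICR_OTHER_CAUSES;
    }
    return next;
}

/* Hypothetical: "Other" is raised again while any cause is latched. */
uint32_t model_raise_other(uint32_t icr)
{
    return (icr & MODEL_ICR_OTHER_CAUSES) ? (icr | MODEL_ICR_OTHER) : icr;
}

/* Replays the guest clearing "Other" on the ICR value from the log. */
void model_check_clear_logic(void)
{
    const uint32_t icr = 0x815000c2u;

    /* Old logic: RXO survives, so "Other" comes straight back. */
    assert(model_raise_other(model_icr_write_old(icr, MODEL_ICR_OTHER)) == icr);
    /* New logic: RXO is cleared together with "Other". */
    assert(model_raise_other(model_icr_write_new(icr, MODEL_ICR_OTHER))
           == 0x80500082u);
}
```

Writes that do not include the "Other" bit behave the same in both versions of the model, so only the "Other" acknowledgement path changes.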