[v3] x86/mce: Avoid infinite loop for copy from user recovery

There are two cases for machine check recovery:
1) The machine check was triggered by ring3 (application) code.
   This is the simpler case. The machine check handler simply queues
   work to be executed on return to user. That code unmaps the page
   from all users and arranges to send a SIGBUS to the task that
   triggered the poison.
2) The machine check was triggered in kernel code that is covered by
   an extable entry.
   In this case, the machine check handler still queues a work entry to
   unmap the page, etc., but this will not run right away because the
   #MC handler returns to the fixup code address in the extable
   entry.
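A minimal user-space sketch of the two cases, assuming a simplified model in which "queuing work" is just a flag on a per-task structure. All names (mc_model, model_handle_mc, the mc_resume values) are hypothetical and are not kernel APIs. The third verdict, for kernel code without an extable entry, is an assumption added only to keep the function total; that case is outside what is described above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of where execution resumes after a recoverable #MC. */
enum mc_resume {
	RESUME_USER_RUN_WORK, /* case 1: queued work runs on the way back to ring3 */
	RESUME_KERNEL_FIXUP,  /* case 2: work stays queued, kernel continues at fixup */
	RESUME_UNHANDLED      /* assumption: anything else is outside this sketch */
};

/* Stand-in for the per-task state; work_queued models the queued callback. */
struct mc_model {
	bool work_queued;
};

static enum mc_resume model_handle_mc(struct mc_model *t, bool from_user,
				      bool has_extable_fixup)
{
	assert(t != NULL);
	if (from_user) {
		/* Case 1: the work will run almost immediately, on return to user. */
		t->work_queued = true;
		return RESUME_USER_RUN_WORK;
	}
	if (has_extable_fixup) {
		/* Case 2: the work is queued, but the kernel keeps running first. */
		t->work_queued = true;
		return RESUME_KERNEL_FIXUP;
	}
	return RESUME_UNHANDLED;
}
```

In case 2 the work stays pending while the kernel executes the fixup path, and that window is where a second machine check can arrive.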

Problems occur if the kernel triggers another machine check before the
return to user processes the first queued work item.

Specifically, the work is queued using the "mce_kill_me" callback
structure in the task struct for the current thread. Attempting to queue
a second work item using this same callback creates a loop in the
linked list of work functions to call. So when the kernel does return to
user, it enters an infinite loop processing the same entry forever.
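The loop can be reproduced with a tiny model, assuming (as a simplification) that pending work is a singly linked list with new entries pushed at the head, so the new entry's next pointer is set to the old head. Pushing the same entry twice makes it point at itself. The struct and function names below are hypothetical; the walk is bounded so the demo itself cannot hang.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for a queued work entry (not the kernel's struct). */
struct work_item {
	struct work_item *next;
};

/* Push at head: the new entry's next points at the old head. */
static void push_work(struct work_item **head, struct work_item *w)
{
	assert(head != NULL && w != NULL);
	w->next = *head;
	*head = w;
}

/*
 * Walk the list, giving up after max_steps so a cycle cannot hang the demo.
 * Returns the number of entries visited, or -1 if the walk did not end.
 */
static int walk_work(const struct work_item *head, int max_steps)
{
	int steps = 0;

	for (const struct work_item *w = head; w != NULL; w = w->next) {
		if (++steps > max_steps)
			return -1;
	}
	return steps;
}
```

Queuing one entry once gives a walk of length 1; queuing the same entry a second time leaves `a.next == &a`, and the walk never reaches NULL.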

There are some legitimate scenarios where the kernel may take a second
machine check before returning to the user.

1) Some code (e.g. futex) first tries a get_user() with page faults
   disabled. If this fails, the code retries with page faults enabled
   expecting that this will resolve the page fault.
2) Copy from user code retries a copy in byte-at-a-time mode to check
   whether any additional bytes can be copied.
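To see why case 2 hits the same address twice, here is a simplified model of a copy that works in 8-byte chunks and, after a failure, retries the remainder one byte at a time. Assumptions: poison is a single byte index (real poison covers more than one byte), every read that covers it counts as one machine check, and all names are hypothetical. Like copy_from_user(), it returns the number of bytes not copied.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model: reading data[poison] raises a "machine check". */
struct user_buf {
	const unsigned char *data;
	size_t len;
	size_t poison;  /* index of the poisoned byte */
	int mc_count;   /* machine checks taken so far */
};

/* Read n bytes at off; fails (and counts an #MC) if the range covers the poison. */
static int model_read(struct user_buf *u, size_t off, size_t n, unsigned char *dst)
{
	if (u->poison >= off && u->poison < off + n) {
		u->mc_count++;
		return -1;
	}
	memcpy(dst, u->data + off, n);
	return 0;
}

/* Copy in 8-byte chunks; on failure retry the rest byte at a time. */
static size_t model_copy_from_user(unsigned char *dst, struct user_buf *u)
{
	size_t off = 0;

	assert(dst != NULL && u != NULL);
	while (u->len - off >= 8) {
		if (model_read(u, off, 8, dst + off) != 0)
			break;          /* first #MC: fall back to byte mode */
		off += 8;
	}
	while (off < u->len) {
		if (model_read(u, off, 1, dst + off) != 0)
			break;          /* second #MC at the same address */
		off++;
	}
	return u->len - off;    /* bytes NOT copied */
}
```

With a 32-byte buffer poisoned at index 13, the chunk covering bytes 8..15 fails, the byte loop copies bytes 8..12, then faults again on byte 13: two machine checks, 19 bytes not copied.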

On the other side of the fence are some bad drivers that do not check
the return value from individual get_user() calls and may access
multiple user addresses without noticing that some/all calls have
failed.
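The difference between a driver that ignores get_user() failures and one that checks them can be sketched as follows. This is a hypothetical model: model_get_user() stands in for get_user(), every call is assumed to touch the same poisoned location (so each one takes a machine check), and EFAULT comes from the POSIX `<errno.h>`.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Counts machine checks taken by the current (modelled) task. */
struct mc_counter {
	int mc_count;
};

/* Hypothetical get_user() stand-in: a poisoned access takes an #MC. */
static int model_get_user(struct mc_counter *c, bool poisoned, int *val)
{
	assert(c != NULL && val != NULL);
	if (poisoned) {
		c->mc_count++;
		return -EFAULT;
	}
	*val = 42;  /* arbitrary value for a successful read */
	return 0;
}

/* Careless pattern: return values ignored, so every access hits the poison. */
static void careless_reads(struct mc_counter *c, int n)
{
	int v;

	for (int i = 0; i < n; i++)
		(void)model_get_user(c, true, &v);
}

/* Careful pattern: stop at the first failure. */
static int careful_reads(struct mc_counter *c, int n)
{
	int v;

	for (int i = 0; i < n; i++) {
		int err = model_get_user(c, true, &v);
		if (err)
			return err;
	}
	return 0;
}
```

Sixteen unchecked reads take sixteen machine checks before the task returns to user, which is more than the limit of 10 this patch enforces; the careful loop takes one.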

Fix by adding a counter (current->mce_count) to keep track of repeated
machine checks before task_work() is called. The first machine check
saves the address information and calls task_work_add(). Any subsequent
machine check that arrives before that task_work callback has run checks
that its address is in the same page as the first machine check (since
the callback will offline exactly one page).
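A user-space sketch of the counting scheme, not the patch's code. Assumptions: 4 KiB pages (shift of 12), the callback resets the counter when it finally runs, and the outcome for a page mismatch or an exceeded limit is reported as a verdict because the text above does not describe the action taken. All names are hypothetical; only the limit of 10 and the one-page rule come from the commit message.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_PAGE_SHIFT 12  /* assumption: 4 KiB pages */
#define MODEL_MCE_LIMIT  10  /* limit stated in the commit message */

/* Hypothetical per-task state mirroring a counter plus a saved address. */
struct task_model {
	int mce_count;
	uint64_t mce_addr;
	int work_adds;  /* how many times the callback was queued */
};

enum mc_verdict {
	MC_QUEUED,         /* first #MC: save address, queue the callback */
	MC_SAME_PAGE,      /* repeat on the same page: nothing new to queue */
	MC_DIFFERENT_PAGE, /* repeat on another page: one callback cannot cover it */
	MC_TOO_MANY        /* more than MODEL_MCE_LIMIT before the callback ran */
};

static enum mc_verdict model_queue_task_work(struct task_model *t, uint64_t addr)
{
	assert(t != NULL);
	int count = ++t->mce_count;

	if (count == 1) {
		t->mce_addr = addr;
		t->work_adds++;  /* the only place the callback is queued */
		return MC_QUEUED;
	}
	if (count > MODEL_MCE_LIMIT)
		return MC_TOO_MANY;
	if ((t->mce_addr >> MODEL_PAGE_SHIFT) != (addr >> MODEL_PAGE_SHIFT))
		return MC_DIFFERENT_PAGE;
	return MC_SAME_PAGE;
}

/* Assumption: the callback resets the counter when it runs on return to user. */
static void model_task_work_ran(struct task_model *t)
{
	t->mce_count = 0;
}
```

Because the callback is only queued when the count is 1, the same entry is never linked into the list twice, which avoids the self-loop described earlier.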

Expected worst case is four machine checks before moving on (e.g. one
user access with page faults disabled, then a repeat to the same address
with page faults enabled ... repeat in copy tail bytes). Just in case
there is some code that loops forever, enforce a limit of 10.
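The worst-case count can be written out as arithmetic. This reflects one reading of the example above, not a measurement: two attempts (page faults disabled, then enabled), each followed by a tail-byte retry that faults again on the same address. The function names are hypothetical.

```c
#include <assert.h>

#define MODEL_MCE_LIMIT 10  /* limit stated in the commit message */

/* Assumed reading of the worst case: 2 attempts x 2 faults per attempt. */
static int expected_worst_case(void)
{
	const int attempts = 2;           /* pagefaults disabled, then enabled */
	const int faults_per_attempt = 2; /* initial access + tail-byte retry */

	assert(attempts > 0 && faults_per_attempt > 0);
	return attempts * faults_per_attempt;
}

/* How many extra machine checks the limit tolerates beyond that case. */
static int headroom(void)
{
	return MODEL_MCE_LIMIT - expected_worst_case();
}
```

Under that reading the worst case is 4, leaving a margin of 6 before the limit of 10 is reached.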

Also mark queue_task_work() as "noinstr" (as reported by the kernel test
robot <lkp@intel.com>).

Cc: <stable@vger.kernel.org>
Fixes: 5567d11c21a1 ("x86/mce: Send #MC singal from task work")
Signed-off-by: Tony Luck <tony.luck@intel.com>
---

> What about a Fixes: tag?

Added a Fixes tag.

Also added "noinstr" to queue_task_work() per a kernel test robot report.

Also rewrote the commit message (based on questions raised against v2).

> I guess backporting this to the respective kernels is predicated upon
> the existence of those other "places" in the kernel where code assumes
> the EFAULT was because of a #PF.

Not really. I don't expect to change any kernel code that just bounces
off the same machine check a few times. This patch does work best in
conjunction with patches 2 & 3 (unchanged, not reposted here). But it
will fix some old issues even without those two.

 arch/x86/kernel/cpu/mce/core.c | 43 +++++++++++++++++++++++++---------
 include/linux/sched.h          |  1 +
 2 files changed, 33 insertions(+), 11 deletions(-)

Message-ID: <YT/IJ9ziLqmtqEPu@agluck-desk2.amr.corp.intel.com>
Date: Mon, 13 Sep 2021 14:52:39 -0700
From: "Luck, Tony" <tony.luck@intel.com>
To: Borislav Petkov <bp@alien8.de>
Cc: Jue Wang <juew@google.com>, Ding Hui <dinghui@sangfor.com.cn>,
    naoya.horiguchi@nec.com, osalvador@suse.de,
    Youquan Song <youquan.song@intel.com>, huangcun@sangfor.com.cn,
    x86@kernel.org, linux-edac@vger.kernel.org, linux-mm@kvack.org,
    linux-kernel@vger.kernel.org
Subject: [PATCH v3] x86/mce: Avoid infinite loop for copy from user recovery
In-Reply-To: <YT8Y5cBiaD3NpAIi@zn.tnic>