
[v3] x86/sgx: Fix deadlock and race conditions between fork() and EPC reclaim

Message ID 20200404010741.24486-1-jarkko.sakkinen@linux.intel.com (mailing list archive)
State New, archived

Commit Message

Jarkko Sakkinen April 4, 2020, 1:07 a.m. UTC
From: Sean Christopherson <sean.j.christopherson@intel.com>

Drop the synchronize_srcu() from sgx_encl_mm_add() and replace it with a
mm_list versioning concept to avoid deadlock when adding a mm during
dup_mmap()/fork(), and to ensure copied PTEs are zapped.

When dup_mmap() runs, it holds mmap_sem for write in both the old mm and
new mm.  Invoking synchronize_srcu() while holding mmap_sem of a mm that
is already attached to the enclave will deadlock if the reclaimer is in
the process of walking mm_list, as the reclaimer will try to acquire
mmap_sem (of the old mm) while holding encl->srcu for read.

 INFO: task ksgxswapd:181 blocked for more than 120 seconds.
 ksgxswapd       D    0   181      2 0x80004000
 Call Trace:
  __schedule+0x2db/0x700
  schedule+0x44/0xb0
  rwsem_down_read_slowpath+0x370/0x470
  down_read+0x95/0xa0
  sgx_reclaim_pages+0x1d2/0x7d0
  ksgxswapd+0x151/0x2e0
  kthread+0x120/0x140
  ret_from_fork+0x35/0x40

 INFO: task fork_consistenc:18824 blocked for more than 120 seconds.
 fork_consistenc D    0 18824  18786 0x00004320
 Call Trace:
  __schedule+0x2db/0x700
  schedule+0x44/0xb0
  schedule_timeout+0x205/0x300
  wait_for_completion+0xb7/0x140
  __synchronize_srcu.part.22+0x81/0xb0
  synchronize_srcu_expedited+0x27/0x30
  synchronize_srcu+0x57/0xe0
  sgx_encl_mm_add+0x12b/0x160
  sgx_vma_open+0x22/0x40
  dup_mm+0x521/0x580
  copy_process+0x1a56/0x1b50
  _do_fork+0x85/0x3a0
  __x64_sys_clone+0x8e/0xb0
  do_syscall_64+0x57/0x1b0
  entry_SYSCALL_64_after_hwframe+0x44/0xa9

Furthermore, doing synchronize_srcu() in sgx_encl_mm_add() does not
prevent the new mm from having stale PTEs pointing at the EPC page to be
reclaimed.  dup_mmap() calls vm_ops->open()/sgx_encl_mm_add() _after_
PTEs are copied to the new mm, i.e. blocking fork() until reclaim zaps
the old mm is pointless as the stale PTEs have already been created in
the new mm.

All other flows that walk mm_list can safely race with dup_mmap() or are
protected by a different mechanism.  Add comments to all srcu readers
that don't check the list version to document why it's OK for the flow to
ignore the version.

Note, synchronize_srcu() is still needed when removing a mm from an
enclave, as the srcu readers must complete their walk before the mm can
be freed.  Removing a mm is never done while holding mmap_sem.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
---
v3:
* Sanitized the list version handling in sgx_reclaimer_block().
  With the fences it was quite complicated given that the version
  was read both at the beginning and at the end of the loop.
* Removed comment before cpumask_clear() because technically it is
  not part of this bug fix.
v2:
* Remove smp_wmb() as x86 does not reorder writes in the pipeline.
* Refine comments to be more to the point and easier to maintain when
  things change.
* Replace the ad hoc (goto-based) loop construct with a proper loop
  construct.
 arch/x86/kernel/cpu/sgx/encl.c    | 11 +++++++--
 arch/x86/kernel/cpu/sgx/encl.h    |  1 +
 arch/x86/kernel/cpu/sgx/ioctl.c   |  1 +
 arch/x86/kernel/cpu/sgx/reclaim.c | 41 ++++++++++++++++++++++---------
 4 files changed, 40 insertions(+), 14 deletions(-)
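
For illustration, the synchronization described in the commit message pairs a
writer-side version bump with a reader-side retry loop.  Stripped of the
surrounding reclaim logic, the scheme is roughly the sketch below (not the
literal hunks, which follow in full):

	/* Writer side, sgx_encl_mm_add(), mmap_sem held for write: */
	spin_lock(&encl->mm_lock);
	list_add_rcu(&encl_mm->list, &encl->mm_list);
	encl->mm_list_version++;
	spin_unlock(&encl->mm_lock);

	/* Reader side, reclaimer: re-walk until the version stops moving. */
	uint64_t version = 0, next;

	for ( ; ; ) {
		next = encl->mm_list_version;
		if (version == next)
			break;

		version = next;
		/* Don't walk an old list snapshot with a new version. */
		smp_rmb();

		idx = srcu_read_lock(&encl->srcu);
		list_for_each_entry_rcu(encl_mm, &encl->mm_list, list) {
			/* zap stale PTEs in this mm */
		}
		srcu_read_unlock(&encl->srcu, idx);
	}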

Comments

Sean Christopherson April 6, 2020, 4:15 p.m. UTC | #1
On Sat, Apr 04, 2020 at 04:07:41AM +0300, Jarkko Sakkinen wrote:
> From: Sean Christopherson <sean.j.christopherson@intel.com>
> diff --git a/arch/x86/kernel/cpu/sgx/encl.c b/arch/x86/kernel/cpu/sgx/encl.c
> index e0124a2f22d5..5b15352b3d4f 100644
> --- a/arch/x86/kernel/cpu/sgx/encl.c
> +++ b/arch/x86/kernel/cpu/sgx/encl.c
> @@ -196,6 +196,9 @@ int sgx_encl_mm_add(struct sgx_encl *encl, struct mm_struct *mm)
>  	struct sgx_encl_mm *encl_mm;
>  	int ret;
>  
> +	/* mm_list can be accessed only by a single thread at a time. */

s/accessed/mutated


> +	lockdep_assert_held_write(&mm->mmap_sem);
> +
>  	if (atomic_read(&encl->flags) & SGX_ENCL_DEAD)
>  		return -EINVAL;
>  
> @@ -221,12 +224,16 @@ int sgx_encl_mm_add(struct sgx_encl *encl, struct mm_struct *mm)
>  		return ret;
>  	}
>  
> +	/*
> +	 * The page reclaimer uses list version for synchronization instead of
> +	 * synchronize_srcu() because otherwise we could conflict with
> +	 * dup_mmap().
> +	 */
>  	spin_lock(&encl->mm_lock);
>  	list_add_rcu(&encl_mm->list, &encl->mm_list);
> +	encl->mm_list_version++;
>  	spin_unlock(&encl->mm_lock);
>  
> -	synchronize_srcu(&encl->srcu);
> -
>  	return 0;
>  }
>  
> diff --git a/arch/x86/kernel/cpu/sgx/encl.h b/arch/x86/kernel/cpu/sgx/encl.h
> index 44b353aa8866..aa664363f229 100644
> --- a/arch/x86/kernel/cpu/sgx/encl.h
> +++ b/arch/x86/kernel/cpu/sgx/encl.h
> @@ -74,6 +74,7 @@ struct sgx_encl {
>  	struct mutex lock;
>  	struct list_head mm_list;
>  	spinlock_t mm_lock;
> +	uint64_t mm_list_version;
>  	struct file *backing;
>  	struct kref refcount;
>  	struct srcu_struct srcu;
> diff --git a/arch/x86/kernel/cpu/sgx/ioctl.c b/arch/x86/kernel/cpu/sgx/ioctl.c
> index 3af0596530a8..9b516f41b4d9 100644
> --- a/arch/x86/kernel/cpu/sgx/ioctl.c
> +++ b/arch/x86/kernel/cpu/sgx/ioctl.c
> @@ -212,6 +212,7 @@ static int sgx_encl_create(struct sgx_encl *encl, struct sgx_secs *secs)
>  	encl->base = secs->base;
>  	encl->size = secs->size;
>  	encl->ssaframesize = secs->ssa_frame_size;
> +	encl->mm_list_version = 1;

This is unnecessary.  A mm_list_version of '0' means the list walk started
when there were no mm structs associated with the enclave, i.e. skipping
everything related to walking the list is ok.  It's subtle, and I dislike
relying on that behavior, but IMO it's preferable to incorrectly implying
that a list version of '0' is somehow bad.

>  	/*
>  	 * Set SGX_ENCL_CREATED only after the enclave is fully prepped.  This
> diff --git a/arch/x86/kernel/cpu/sgx/reclaim.c b/arch/x86/kernel/cpu/sgx/reclaim.c
> index 39f0ddefbb79..3483e9bc590a 100644
> --- a/arch/x86/kernel/cpu/sgx/reclaim.c
> +++ b/arch/x86/kernel/cpu/sgx/reclaim.c
> @@ -186,26 +186,43 @@ static void sgx_reclaimer_block(struct sgx_epc_page *epc_page)
>  	struct sgx_encl *encl = page->encl;
>  	struct sgx_encl_mm *encl_mm;
>  	struct vm_area_struct *vma;
> +	uint64_t version, next;
>  	int idx, ret;
>  
> -	idx = srcu_read_lock(&encl->srcu);
> +	version = 0;
>  
> -	list_for_each_entry_rcu(encl_mm, &encl->mm_list, list) {
> -		if (!mmget_not_zero(encl_mm->mm))
> -			continue;
> +	for ( ; ; ) {
> +		next = encl->mm_list_version;
>  
> -		down_read(&encl_mm->mm->mmap_sem);
> +		if (version == next)
> +			break;

Functionally this works, but I personally find the logic kludgy, and it
generates worse code.  Not that we're at the point where counting uops is a
top priority, but I don't think it makes sense to go out of our way to make
the resulting code worse.

The main issue is that the "0 is invalid" approach means the loop
termination condition is both likely and unlikely, e.g. the first test of
"version == next", when version is explicitly 0, is unlikely, but subsequent
checks are likely since racing with adding a mm is expected to be very rare.


Without "likely", it requires a taken Jcc to break the loop.

        next = encl->mm_list_version;
0xffffffff8102e423 <+51>:    mov    0x58(%r12),%r15

        if (version == next) // if (next == 0)
                break;
0xffffffff8102e43d <+77>:    test   %r15,%r15
0xffffffff8102e440 <+80>:    je     0xffffffff8102e51f <sgx_reclaimer_block+303>

        next = encl->mm_list_version;
0xffffffff8102e509 <+281>:   mov    0x58(%r12),%rax

        if (version == next), i.e. if (next != 0)
                break;
0xffffffff8102e50e <+286>:   cmp    %r15,%rax
0xffffffff8102e511 <+289>:   je     0xffffffff8102e51f <sgx_reclaimer_block+303>
0xffffffff8102e513 <+291>:   mov    %rax,%r15
0xffffffff8102e516 <+294>:   jmpq   0xffffffff8102e446 <sgx_reclaimer_block+86>
0xffffffff8102e51b <+299>:   ud2


Using likely results in even worse code because the guts of the loop get
moved out of line at the bottom of the function, and executing the first (and,
most likely, only) iteration of the loop requires a taken Jcc.

        next = encl->mm_list_version;
0xffffffff8102e41e <+46>:    mov    0x58(%rbx),%r12

        if (likely(version == next))  // if (next == 0)
0xffffffff8102e422 <+50>:    test   %r12,%r12
0xffffffff8102e425 <+53>:    jne    0xffffffff8102e4cf <sgx_reclaimer_block+223>

        ...

        next = encl->mm_list_version;
0xffffffff8102e5ad <+445>:   mov    0x58(%rbx),%rax

        if (likely(version == next))
0xffffffff8102e5b1 <+449>:   cmp    %r12,%rax
0xffffffff8102e5b4 <+452>:   je     0xffffffff8102e42b <sgx_reclaimer_block+59>
0xffffffff8102e5ba <+458>:   mov    %rax,%r12
0xffffffff8102e5bd <+461>:   jmpq   0xffffffff8102e4e6 <sgx_reclaimer_block+246>


Contrast that with the do-while form, which puts everything inline and does
not require a taken Jcc.  Note, the number of reads from encl->mm_list_version
is identical, i.e. the compiler isn't stupid.

        mm_list_version = encl->mm_list_version;
0xffffffff8102e441 <+49>:    mov    0x58(%r12),%rax
0xffffffff8102e452 <+66>:    mov    %rax,0x8(%rsp)

	...

        } while (unlikely(encl->mm_list_version != mm_list_version));
0xffffffff8102e527 <+279>:   mov    0x58(%r12),%rax
0xffffffff8102e52c <+284>:   cmp    0x8(%rsp),%rax
0xffffffff8102e531 <+289>:   jne    0xffffffff8102e5e5 <sgx_reclaimer_block+469>
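
In C, the do-while form being compared corresponds roughly to the following
shape (a sketch with the loop body elided, reconstructed from the annotations
above; the actual v2 code may differ in detail):

	uint64_t mm_list_version;

	do {
		mm_list_version = encl->mm_list_version;

		/* Fence reads, as in the patch below. */
		smp_rmb();

		idx = srcu_read_lock(&encl->srcu);

		list_for_each_entry_rcu(encl_mm, &encl->mm_list, list) {
			/* ... same zap logic as in the hunk below ... */
		}

		srcu_read_unlock(&encl->srcu, idx);
	} while (unlikely(encl->mm_list_version != mm_list_version));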

>  
> -		ret = sgx_encl_find(encl_mm->mm, addr, &vma);
> -		if (!ret && encl == vma->vm_private_data)
> -			zap_vma_ptes(vma, addr, PAGE_SIZE);
> +		version = next;
>  
> -		up_read(&encl_mm->mm->mmap_sem);
> +		/* Fence reads as the CPU can reorder them. This guarantees
> +		 * that we don't access old list with a new version.
> +		 */
> +		smp_rmb();
>  
> -		mmput_async(encl_mm->mm);
> -	}
> +		idx = srcu_read_lock(&encl->srcu);
>  
> -	srcu_read_unlock(&encl->srcu, idx);
> +		list_for_each_entry_rcu(encl_mm, &encl->mm_list, list) {
> +			if (!mmget_not_zero(encl_mm->mm))
> +				continue;
> +
> +			down_read(&encl_mm->mm->mmap_sem);
> +
> +			ret = sgx_encl_find(encl_mm->mm, addr, &vma);
> +			if (!ret && encl == vma->vm_private_data)
> +				zap_vma_ptes(vma, addr, PAGE_SIZE);
> +
> +			up_read(&encl_mm->mm->mmap_sem);
> +
> +			mmput_async(encl_mm->mm);
> +		}
> +
> +		srcu_read_unlock(&encl->srcu, idx);
> +	}
>  
>  	mutex_lock(&encl->lock);
>  
> -- 
> 2.25.1
>
Jarkko Sakkinen April 6, 2020, 8:41 p.m. UTC | #2
On Mon, Apr 06, 2020 at 09:15:57AM -0700, Sean Christopherson wrote:
> >  	encl->ssaframesize = secs->ssa_frame_size;
> > +	encl->mm_list_version = 1;
> 
> This is unnecessary.  A mm_list_version of '0' means the list walk started
> when there were no mm structs associated with the enclave, i.e. skipping
> everything related to walking the list is ok.  It's subtle, and I dislike
> relying on that behavior, but IMO it's preferable to incorrectly implying
> that a list version of '0' is somehow bad.

'0' means whatever the code requires it to mean. There is no absolute meaning.

> > +	for ( ; ; ) {
> > +		next = encl->mm_list_version;
> >  
> > -		down_read(&encl_mm->mm->mmap_sem);
> > +		if (version == next)
> > +			break;
> 
> Functionally this works, but I personally find the logic kludgy, and it
> generates worse code.  Not that we're at the point where counting uops is a
> top priority, but I don't think it makes sense to go out of our way to make
> the resulting code worse.

I'll pick v2 then.

/Jarkko

Patch

diff --git a/arch/x86/kernel/cpu/sgx/encl.c b/arch/x86/kernel/cpu/sgx/encl.c
index e0124a2f22d5..5b15352b3d4f 100644
--- a/arch/x86/kernel/cpu/sgx/encl.c
+++ b/arch/x86/kernel/cpu/sgx/encl.c
@@ -196,6 +196,9 @@  int sgx_encl_mm_add(struct sgx_encl *encl, struct mm_struct *mm)
 	struct sgx_encl_mm *encl_mm;
 	int ret;
 
+	/* mm_list can be accessed only by a single thread at a time. */
+	lockdep_assert_held_write(&mm->mmap_sem);
+
 	if (atomic_read(&encl->flags) & SGX_ENCL_DEAD)
 		return -EINVAL;
 
@@ -221,12 +224,16 @@  int sgx_encl_mm_add(struct sgx_encl *encl, struct mm_struct *mm)
 		return ret;
 	}
 
+	/*
+	 * The page reclaimer uses list version for synchronization instead of
+	 * synchronize_srcu() because otherwise we could conflict with
+	 * dup_mmap().
+	 */
 	spin_lock(&encl->mm_lock);
 	list_add_rcu(&encl_mm->list, &encl->mm_list);
+	encl->mm_list_version++;
 	spin_unlock(&encl->mm_lock);
 
-	synchronize_srcu(&encl->srcu);
-
 	return 0;
 }
 
diff --git a/arch/x86/kernel/cpu/sgx/encl.h b/arch/x86/kernel/cpu/sgx/encl.h
index 44b353aa8866..aa664363f229 100644
--- a/arch/x86/kernel/cpu/sgx/encl.h
+++ b/arch/x86/kernel/cpu/sgx/encl.h
@@ -74,6 +74,7 @@  struct sgx_encl {
 	struct mutex lock;
 	struct list_head mm_list;
 	spinlock_t mm_lock;
+	uint64_t mm_list_version;
 	struct file *backing;
 	struct kref refcount;
 	struct srcu_struct srcu;
diff --git a/arch/x86/kernel/cpu/sgx/ioctl.c b/arch/x86/kernel/cpu/sgx/ioctl.c
index 3af0596530a8..9b516f41b4d9 100644
--- a/arch/x86/kernel/cpu/sgx/ioctl.c
+++ b/arch/x86/kernel/cpu/sgx/ioctl.c
@@ -212,6 +212,7 @@  static int sgx_encl_create(struct sgx_encl *encl, struct sgx_secs *secs)
 	encl->base = secs->base;
 	encl->size = secs->size;
 	encl->ssaframesize = secs->ssa_frame_size;
+	encl->mm_list_version = 1;
 
 	/*
 	 * Set SGX_ENCL_CREATED only after the enclave is fully prepped.  This
diff --git a/arch/x86/kernel/cpu/sgx/reclaim.c b/arch/x86/kernel/cpu/sgx/reclaim.c
index 39f0ddefbb79..3483e9bc590a 100644
--- a/arch/x86/kernel/cpu/sgx/reclaim.c
+++ b/arch/x86/kernel/cpu/sgx/reclaim.c
@@ -186,26 +186,43 @@  static void sgx_reclaimer_block(struct sgx_epc_page *epc_page)
 	struct sgx_encl *encl = page->encl;
 	struct sgx_encl_mm *encl_mm;
 	struct vm_area_struct *vma;
+	uint64_t version, next;
 	int idx, ret;
 
-	idx = srcu_read_lock(&encl->srcu);
+	version = 0;
 
-	list_for_each_entry_rcu(encl_mm, &encl->mm_list, list) {
-		if (!mmget_not_zero(encl_mm->mm))
-			continue;
+	for ( ; ; ) {
+		next = encl->mm_list_version;
 
-		down_read(&encl_mm->mm->mmap_sem);
+		if (version == next)
+			break;
 
-		ret = sgx_encl_find(encl_mm->mm, addr, &vma);
-		if (!ret && encl == vma->vm_private_data)
-			zap_vma_ptes(vma, addr, PAGE_SIZE);
+		version = next;
 
-		up_read(&encl_mm->mm->mmap_sem);
+		/* Fence reads as the CPU can reorder them. This guarantees
+		 * that we don't access old list with a new version.
+		 */
+		smp_rmb();
 
-		mmput_async(encl_mm->mm);
-	}
+		idx = srcu_read_lock(&encl->srcu);
 
-	srcu_read_unlock(&encl->srcu, idx);
+		list_for_each_entry_rcu(encl_mm, &encl->mm_list, list) {
+			if (!mmget_not_zero(encl_mm->mm))
+				continue;
+
+			down_read(&encl_mm->mm->mmap_sem);
+
+			ret = sgx_encl_find(encl_mm->mm, addr, &vma);
+			if (!ret && encl == vma->vm_private_data)
+				zap_vma_ptes(vma, addr, PAGE_SIZE);
+
+			up_read(&encl_mm->mm->mmap_sem);
+
+			mmput_async(encl_mm->mm);
+		}
+
+		srcu_read_unlock(&encl->srcu, idx);
+	}
 
 	mutex_lock(&encl->lock);