From patchwork Sat Oct 13 00:24:28 2018
X-Patchwork-Submitter: Andrea Arcangeli
X-Patchwork-Id: 10639761
From: Andrea Arcangeli <aarcange@redhat.com>
To: linux-mm@kvack.org
Cc: Aaron Tomlin, Mel Gorman, Jerome Glisse, "Kirill A. Shutemov", Andrew Morton
Subject: [PATCH 1/3] mm: thp: fix MADV_DONTNEED vs migrate_misplaced_transhuge_page race condition
Date: Fri, 12 Oct 2018 20:24:28 -0400
Message-Id: <20181013002430.698-2-aarcange@redhat.com>
In-Reply-To: <20181013002430.698-1-aarcange@redhat.com>
References: <20181013002430.698-1-aarcange@redhat.com>

This is a corollary of ced108037c2aa542b3ed8b7afd1576064ad1362a,
58ceeb6bec86d9140f9d91d71a710e963523d063 and
5b7abeae3af8c08c577e599dd0578b9e3ee6687b.

When the above three fixes were posted, Dave asked
https://lkml.kernel.org/r/929b3844-aec2-0111-fef7-8002f9d4e2b9@intel.com
but apparently this was missed.

The pmdp_clear_flush* in migrate_misplaced_transhuge_page() was
introduced in commit a54a407fbf7735fd8f7841375574f5d9b0375f93. The
important part of that commit is only the part where the page lock is
not released until the first do_huge_pmd_numa_page() finishes
disarming the pagenuma/protnone. The addition of pmdp_clear_flush()
wasn't beneficial to that commit and there's no commentary about the
addition either. I guess the pmdp_clear_flush() was added just in case
for safety, but it ended up introducing the MADV_DONTNEED race
condition found by Aaron. At that point in time nobody had thought of
this kind of MADV_DONTNEED race condition yet (those races were fixed
later), so the code may have looked more robust with the
pmdp_clear_flush().

This specific race condition won't destabilize the kernel, but it can
confuse userland because after MADV_DONTNEED the memory won't be
zeroed out.

This also optimizes the code and removes a superfluous TLB flush.
Reported-by: Aaron Tomlin
Signed-off-by: Andrea Arcangeli
Acked-by: Mel Gorman
Acked-by: Kirill A. Shutemov
---
 mm/migrate.c | 26 +++++++++++++++++++-------
 1 file changed, 19 insertions(+), 7 deletions(-)

diff --git a/mm/migrate.c b/mm/migrate.c
index d6a2e89b086a..180e3d0ed16d 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -2082,15 +2082,27 @@ int migrate_misplaced_transhuge_page(struct mm_struct *mm,
 	entry = maybe_pmd_mkwrite(pmd_mkdirty(entry), vma);
 
 	/*
-	 * Clear the old entry under pagetable lock and establish the new PTE.
-	 * Any parallel GUP will either observe the old page blocking on the
-	 * page lock, block on the page table lock or observe the new page.
-	 * The SetPageUptodate on the new page and page_add_new_anon_rmap
-	 * guarantee the copy is visible before the pagetable update.
+	 * Overwrite the old entry under pagetable lock and establish
+	 * the new PTE. Any parallel GUP will either observe the old
+	 * page blocking on the page lock, block on the page table
+	 * lock or observe the new page. The SetPageUptodate on the
+	 * new page and page_add_new_anon_rmap guarantee the copy is
+	 * visible before the pagetable update.
 	 */
 	flush_cache_range(vma, mmun_start, mmun_end);
 	page_add_anon_rmap(new_page, vma, mmun_start, true);
-	pmdp_huge_clear_flush_notify(vma, mmun_start, pmd);
+	/*
+	 * At this point the pmd is numa/protnone (i.e. non present)
+	 * and the TLB has already been flushed globally. So no TLB
+	 * can be currently caching this non present pmd mapping.
+	 * There's no need of clearing the pmd before doing
+	 * set_pmd_at(), nor to flush the TLB after
+	 * set_pmd_at(). Clearing the pmd here would introduce a race
+	 * condition against MADV_DONTNEED, because MADV_DONTNEED only
+	 * holds the mmap_sem for reading. If the pmd is set to NULL
+	 * at any given time, MADV_DONTNEED won't wait on the pmd lock
+	 * and it'll skip clearing this pmd.
+	 */
 	set_pmd_at(mm, mmun_start, pmd, entry);
 	update_mmu_cache_pmd(vma, address, &entry);
 
@@ -2104,7 +2116,7 @@ int migrate_misplaced_transhuge_page(struct mm_struct *mm,
 	 * No need to double call mmu_notifier->invalidate_range() callback as
 	 * the above pmdp_huge_clear_flush_notify() did already call it.
 	 */
-	mmu_notifier_invalidate_range_only_end(mm, mmun_start, mmun_end);
+	mmu_notifier_invalidate_range_end(mm, mmun_start, mmun_end);
 
 	/* Take an "isolate" reference and put new page on the LRU. */
 	get_page(new_page);

From patchwork Sat Oct 13 00:24:29 2018
X-Patchwork-Submitter: Andrea Arcangeli
X-Patchwork-Id: 10639763
From: Andrea Arcangeli <aarcange@redhat.com>
To: linux-mm@kvack.org
Cc: Aaron Tomlin, Mel Gorman, Jerome Glisse, "Kirill A. Shutemov", Andrew Morton
Subject: [PATCH 2/3] mm: thp: fix mmu_notifier in migrate_misplaced_transhuge_page()
Date: Fri, 12 Oct 2018 20:24:29 -0400
Message-Id: <20181013002430.698-3-aarcange@redhat.com>
In-Reply-To: <20181013002430.698-1-aarcange@redhat.com>
References: <20181013002430.698-1-aarcange@redhat.com>

change_huge_pmd() after arming the numa/protnone pmd doesn't flush the
TLB right away.
do_huge_pmd_numa_page() flushes the TLB before calling
migrate_misplaced_transhuge_page(). By the time
do_huge_pmd_numa_page() runs some CPU could still access the page
through the TLB.

change_huge_pmd() before arming the numa/protnone transhuge pmd calls
mmu_notifier_invalidate_range_start(). So there's no need of a
mmu_notifier_invalidate_range_start()/mmu_notifier_invalidate_range_only_end()
sequence in migrate_misplaced_transhuge_page() too, because by the
time migrate_misplaced_transhuge_page() runs, the pmd mapping has
already been invalidated in the secondary MMUs. It has to be, or if a
secondary MMU could still write to the page, migrate_page_copy() would
lose data.

However an explicit mmu_notifier_invalidate_range() is needed before
migrate_misplaced_transhuge_page() starts copying the data of the
transhuge page, or the below can happen for MMU notifier users sharing
the primary MMU pagetables and only implementing ->invalidate_range():

	CPU0			CPU1		GPU sharing linux pagetables
						using only ->invalidate_range
	-----------		------------	----------------------------
						GPU secondary MMU writes to
						the page mapped by the
						transhuge pmd
	change_pmd_range()
	mmu..._range_start()
	->invalidate_range_start() noop
	change_huge_pmd()
	set_pmd_at(numa/protnone)
	pmd_unlock()
				do_huge_pmd_numa_page()
				CPU TLB flush globally (1)
						CPU cannot write to page
				migrate_misplaced_transhuge_page()
						GPU writes to the page...
				migrate_page_copy()
						...GPU stops writing to
						the page
				CPU TLB flush (2)
	mmu..._range_end() (3)
	->invalidate_range_stop() noop
	->invalidate_range()
						GPU secondary MMU is
						invalidated and cannot write
						to the page anymore (too late)

Just like we need the CPU TLB flush (1) because the TLB flush (2)
arrives too late, we also need an mmu_notifier_invalidate_range()
before calling migrate_misplaced_transhuge_page(), because the
->invalidate_range() in (3) also arrives too late.
This requirement is the result of the lazy optimization in
change_huge_pmd() that releases the pmd_lock without first flushing
the TLB and without first calling mmu_notifier_invalidate_range().

Even converting the removed mmu_notifier_invalidate_range_only_end()
into a mmu_notifier_invalidate_range_end() would not have been enough
to fix this, because it runs after migrate_page_copy().

After the hugepage data copy is done,
migrate_misplaced_transhuge_page() can proceed and call set_pmd_at()
without having to flush the TLB nor any secondary MMUs, because the
secondary MMU invalidate, just like the CPU TLB flush, has to happen
before migrate_page_copy() is called or it would be a bug in the first
place (and it was for drivers using ->invalidate_range()). KVM is
unaffected because it doesn't implement ->invalidate_range().

The standard PAGE_SIZEd migrate_misplaced_page() is less accelerated
and uses the generic migrate_pages(), which transitions the pte from
numa/protnone to a migration entry in try_to_unmap_one() and flushes
TLBs and all mmu notifiers there before copying the page.

Signed-off-by: Andrea Arcangeli
Acked-by: Mel Gorman
Acked-by: Kirill A. Shutemov
Reviewed-by: Aaron Tomlin
---
 mm/huge_memory.c | 14 +++++++++++++-
 mm/migrate.c     | 19 ++++++-------------
 2 files changed, 19 insertions(+), 14 deletions(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index a5b28547e321..70b5104075ef 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1562,8 +1562,20 @@ vm_fault_t do_huge_pmd_numa_page(struct vm_fault *vmf, pmd_t pmd)
 	 * We are not sure a pending tlb flush here is for a huge page
 	 * mapping or not. Hence use the tlb range variant
 	 */
-	if (mm_tlb_flush_pending(vma->vm_mm))
+	if (mm_tlb_flush_pending(vma->vm_mm)) {
 		flush_tlb_range(vma, haddr, haddr + HPAGE_PMD_SIZE);
+		/*
+		 * change_huge_pmd() released the pmd lock before
+		 * invalidating the secondary MMUs sharing the primary
+		 * MMU pagetables (with ->invalidate_range()). The
+		 * mmu_notifier_invalidate_range_end() (which
+		 * internally calls ->invalidate_range()) in
+		 * change_pmd_range() will run after us, so we can't
+		 * rely on it here and we need an explicit invalidate.
+		 */
+		mmu_notifier_invalidate_range(vma->vm_mm, haddr,
+					      haddr + HPAGE_PMD_SIZE);
+	}
 
 	/*
 	 * Migrate the THP to the requested node, returns with page unlocked
diff --git a/mm/migrate.c b/mm/migrate.c
index 180e3d0ed16d..c9e9b7db8b6d 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -2018,8 +2018,8 @@ int migrate_misplaced_transhuge_page(struct mm_struct *mm,
 	int isolated = 0;
 	struct page *new_page = NULL;
 	int page_lru = page_is_file_cache(page);
-	unsigned long mmun_start = address & HPAGE_PMD_MASK;
-	unsigned long mmun_end = mmun_start + HPAGE_PMD_SIZE;
+	unsigned long start = address & HPAGE_PMD_MASK;
+	unsigned long end = start + HPAGE_PMD_SIZE;
 
 	/*
 	 * Rate-limit the amount of data that is being migrated to a node.
@@ -2054,11 +2054,9 @@ int migrate_misplaced_transhuge_page(struct mm_struct *mm,
 	WARN_ON(PageLRU(new_page));
 
 	/* Recheck the target PMD */
-	mmu_notifier_invalidate_range_start(mm, mmun_start, mmun_end);
 	ptl = pmd_lock(mm, pmd);
 	if (unlikely(!pmd_same(*pmd, entry) || !page_ref_freeze(page, 2))) {
 		spin_unlock(ptl);
-		mmu_notifier_invalidate_range_end(mm, mmun_start, mmun_end);
 
 		/* Reverse changes made by migrate_page_copy() */
 		if (TestClearPageActive(new_page))
@@ -2089,8 +2087,8 @@ int migrate_misplaced_transhuge_page(struct mm_struct *mm,
 	 * new page and page_add_new_anon_rmap guarantee the copy is
 	 * visible before the pagetable update.
 	 */
-	flush_cache_range(vma, mmun_start, mmun_end);
-	page_add_anon_rmap(new_page, vma, mmun_start, true);
+	flush_cache_range(vma, start, end);
+	page_add_anon_rmap(new_page, vma, start, true);
 	/*
 	 * At this point the pmd is numa/protnone (i.e. non present)
 	 * and the TLB has already been flushed globally. So no TLB
@@ -2103,7 +2101,7 @@ int migrate_misplaced_transhuge_page(struct mm_struct *mm,
 	 * at any given time, MADV_DONTNEED won't wait on the pmd lock
 	 * and it'll skip clearing this pmd.
 	 */
-	set_pmd_at(mm, mmun_start, pmd, entry);
+	set_pmd_at(mm, start, pmd, entry);
 	update_mmu_cache_pmd(vma, address, &entry);
 
 	page_ref_unfreeze(page, 2);
@@ -2112,11 +2110,6 @@ int migrate_misplaced_transhuge_page(struct mm_struct *mm,
 	set_page_owner_migrate_reason(new_page, MR_NUMA_MISPLACED);
 	spin_unlock(ptl);
 
-	/*
-	 * No need to double call mmu_notifier->invalidate_range() callback as
-	 * the above pmdp_huge_clear_flush_notify() did already call it.
-	 */
-	mmu_notifier_invalidate_range_end(mm, mmun_start, mmun_end);
 
 	/* Take an "isolate" reference and put new page on the LRU. */
 	get_page(new_page);
@@ -2141,7 +2134,7 @@ int migrate_misplaced_transhuge_page(struct mm_struct *mm,
 	ptl = pmd_lock(mm, pmd);
 	if (pmd_same(*pmd, entry)) {
 		entry = pmd_modify(entry, vma->vm_page_prot);
-		set_pmd_at(mm, mmun_start, pmd, entry);
+		set_pmd_at(mm, start, pmd, entry);
 		update_mmu_cache_pmd(vma, address, &entry);
 	}
 	spin_unlock(ptl);

From patchwork Sat Oct 13 00:24:30 2018
X-Patchwork-Submitter: Andrea Arcangeli
X-Patchwork-Id: 10639759
From: Andrea Arcangeli <aarcange@redhat.com>
To: linux-mm@kvack.org
Cc: Aaron Tomlin, Mel Gorman, Jerome Glisse, "Kirill A. Shutemov", Andrew Morton
Subject: [PATCH 3/3] mm: thp: relocate flush_cache_range() in migrate_misplaced_transhuge_page()
Date: Fri, 12 Oct 2018 20:24:30 -0400
Message-Id: <20181013002430.698-4-aarcange@redhat.com>
In-Reply-To: <20181013002430.698-1-aarcange@redhat.com>
References: <20181013002430.698-1-aarcange@redhat.com>

There should be no cache left by the time we overwrite the old
transhuge pmd with the new one.
It's already too late to flush through the virtual address because we
already copied the page data to the new physical address. So flush the
cache before the data copy.

Also delete the "end" variable to shut off an "unused variable"
warning on x86 where flush_cache_range() is a noop.

Signed-off-by: Andrea Arcangeli
Acked-by: Kirill A. Shutemov
---
 mm/migrate.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/mm/migrate.c b/mm/migrate.c
index c9e9b7db8b6d..9bf5fe9a1008 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -2019,7 +2019,6 @@ int migrate_misplaced_transhuge_page(struct mm_struct *mm,
 	struct page *new_page = NULL;
 	int page_lru = page_is_file_cache(page);
 	unsigned long start = address & HPAGE_PMD_MASK;
-	unsigned long end = start + HPAGE_PMD_SIZE;
 
 	/*
 	 * Rate-limit the amount of data that is being migrated to a node.
@@ -2050,6 +2049,8 @@ int migrate_misplaced_transhuge_page(struct mm_struct *mm,
 	/* anon mapping, we can simply copy page->mapping to the new page: */
 	new_page->mapping = page->mapping;
 	new_page->index = page->index;
+	/* flush the cache before copying using the kernel virtual address */
+	flush_cache_range(vma, start, start + HPAGE_PMD_SIZE);
 	migrate_page_copy(new_page, page);
 	WARN_ON(PageLRU(new_page));
 
@@ -2087,7 +2088,6 @@ int migrate_misplaced_transhuge_page(struct mm_struct *mm,
 	 * new page and page_add_new_anon_rmap guarantee the copy is
 	 * visible before the pagetable update.
 	 */
-	flush_cache_range(vma, start, end);
 	page_add_anon_rmap(new_page, vma, start, true);
 	/*
 	 * At this point the pmd is numa/protnone (i.e. non present)