From: Liang Li
Date: Mon, 21 Dec 2020 11:25:22 -0500
To: Alexander Duyck, Mel Gorman, Andrew Morton, Andrea Arcangeli, Dan Williams, "Michael S. Tsirkin", David Hildenbrand, Jason Wang, Dave Hansen, Michal Hocko, Liang Li
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, virtualization@lists.linux-foundation.org
Subject: [RFC v2 PATCH 0/4] speed up page allocation for __GFP_ZERO
Message-ID: <20201221162519.GA22504@open-light-1.localdomain>

The first version can be found at: https://lkml.org/lkml/2020/4/12/42

Zeroing out page contents usually happens when pages are allocated with the __GFP_ZERO flag. This is a time-consuming operation, and it makes populating a large VMA very slow.

This patch set introduces a new feature that zeroes out free pages ahead of page allocation, which helps speed up page allocation with __GFP_ZERO.
My original motivation for this feature was to shorten VM creation time when an SR-IOV device is attached. It works well: VM creation time is reduced by close to 90% with THP on, and by more than half with THP off.

Creating a VM [64G RAM, 32 CPUs] with GPU passthrough
=====================================================
QEMU uses 4K pages, THP is off
                round1   round2   round3
w/o this patch:  23.5s    24.7s    24.6s
w/ this patch:   10.2s    10.3s    11.2s

QEMU uses 4K pages, THP is on
                round1   round2   round3
w/o this patch:  17.9s    14.8s    14.9s
w/ this patch:    1.9s     1.8s     1.9s
=====================================================

Obviously, it can do more than this. This feature can also help in the following cases:

Interactive scenarios
=====================
Shorten application launch time on desktops and mobile phones to improve the user experience. Tests on a server [Intel(R) Xeon(R) CPU E5-2620 v3 @ 2.40GHz] show that zeroing out 1GB of RAM in the kernel takes about 200ms, while commonly used applications such as the Firefox browser or Office consume 100 ~ 300 MB of RAM right after launch. Pre-zeroing free pages could therefore reduce their launch time by about 20 ~ 60ms (perhaps noticeable to the user?). Maybe this feature could also be used to speed up the launch of Android apps (I didn't do any tests on Android).

Virtualization
==============
Speed up VM creation and shorten guest boot time, especially in the PCI SR-IOV device passthrough scenario. Compared with some paravirtualization solutions, it is easy to deploy because it is transparent to the guest and handles DMA properly in the BIOS stage, which the paravirtualization solutions can't handle well.

Improve guest performance when VIRTIO_BALLOON_F_REPORTING is used for memory overcommit.
The VIRTIO_BALLOON_F_REPORTING feature reports free guest pages to the VMM, which unmaps the corresponding host pages so they can be reclaimed. When the guest later allocates a page that was just reclaimed, the host has to allocate a new page and zero it out for the guest. In this case, pre-zeroing free pages helps speed up the fault-in process and reduces the performance impact.

Speed up kernel routines
========================
This can't be guaranteed because not all free pages are pre-zeroed, but it holds in most cases. It can speed up some important system calls such as fork, which allocates zeroed pages to build page tables. It also speeds up page fault handling, especially for huge page faults. A POC of pre-zeroing free hugetlb pages has been done.

Security
========
This is a weaker version of "introduce init_on_alloc=1 and init_on_free=1 boot options" that zeroes out pages asynchronously. For users who can't tolerate the overhead that 'init_on_alloc=1' or 'init_on_free=1' brings, this feature provides another choice.

In the feedback on the first version, cache pollution was the main concern of the mm developers. On the other hand, this feature is really helpful for some use cases, so perhaps the user should decide whether to use it. A switch is therefore added in /sys: users who don't want the feature can turn it off, or configure a large batch size to reduce cache pollution.

To make the whole feature work, support for pre-zeroing free huge pages needs to be added to hugetlbfs; I will send another patch for it.
Liang Li (4):
  mm: let user decide page reporting option
  mm: pre zero out free pages to speed up page allocation for __GFP_ZERO
  mm: make page reporing worker works better for low order page
  mm: Add batch size for free page reporting

 drivers/virtio/virtio_balloon.c |   3 +
 include/linux/highmem.h         |  31 +++-
 include/linux/page-flags.h      |  16 +-
 include/linux/page_reporting.h  |   3 +
 include/trace/events/mmflags.h  |   7 +
 mm/Kconfig                      |  10 ++
 mm/Makefile                     |   1 +
 mm/huge_memory.c                |   3 +-
 mm/page_alloc.c                 |   4 +
 mm/page_prezero.c               | 266 ++++++++++++++++++++++++++++++++
 mm/page_prezero.h               |  13 ++
 mm/page_reporting.c             |  49 +++++-
 mm/page_reporting.h             |  16 +-
 13 files changed, 405 insertions(+), 17 deletions(-)
 create mode 100644 mm/page_prezero.c
 create mode 100644 mm/page_prezero.h

Cc: Alexander Duyck
Cc: Mel Gorman
Cc: Andrea Arcangeli
Cc: Dan Williams
Cc: Dave Hansen
Cc: David Hildenbrand
Cc: Michal Hocko
Cc: Andrew Morton
Cc: Alex Williamson
Cc: Michael S. Tsirkin
Signed-off-by: Liang Li