From patchwork Sat Jul 22 00:43:13 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "Fabio M. De Francesco"
X-Patchwork-Id: 13322688
From: "Fabio M. De Francesco"
To: Jonathan Corbet, Jonathan Cameron, Linus Walleij,
 "Fabio M. De Francesco", Mike Rapoport, linux-doc@vger.kernel.org,
 linux-kernel@vger.kernel.org, linux-mm@kvack.org, Matthew Wilcox
Cc: Andrew Morton, Bagas Sanjaya, Randy Dunlap
Subject: [RFC PATCH] Documentation/page_tables: MMU, TLB, and Page Faults
Date: Sat, 22 Jul 2023 02:43:13 +0200
Message-ID: <20230722004451.7730-1-fmdefrancesco@gmail.com>
X-Mailer: git-send-email 2.41.0

Extend page_tables.rst by adding a small introductory section about the role
of the MMU and TLB in translating virtual addresses to physical page frames.
Furthermore, explain the concepts behind page fault exceptions and how Linux
handles them.

Cc: Andrew Morton
Cc: Bagas Sanjaya
Cc: Jonathan Cameron
Cc: Jonathan Corbet
Cc: Linus Walleij
Cc: Matthew Wilcox
Cc: Mike Rapoport
Cc: Randy Dunlap
Signed-off-by: Fabio M. De Francesco
---

This is an RFC PATCH for two reasons:

1) I've heard that there is consensus about the need to revise and extend the
MM documentation, but I'm not sure whether developers need this kind of
introductory information.

2) While preparing this little patch I decided to take a quick look at the
code and found out it currently is not how I thought I remembered it. I'm
especially speaking about the x86 case.
I'm not sure that I've been able to properly understand what I described as a
difference in workflow compared to most of the other architectures.

Therefore, for the two reasons explained above, I'd like to hear from people
actively involved in MM. If this is not what you want, feel free to throw it
away. Otherwise I'd be happy to write more on this and other MM topics. I'm
looking forward to comments on this small work.

 Documentation/mm/page_tables.rst | 61 ++++++++++++++++++++++++++++++++
 1 file changed, 61 insertions(+)

diff --git a/Documentation/mm/page_tables.rst b/Documentation/mm/page_tables.rst
index 7840c1891751..fa617894fda8 100644
--- a/Documentation/mm/page_tables.rst
+++ b/Documentation/mm/page_tables.rst
@@ -152,3 +152,64 @@ Page table handling code that wishes to be architecture-neutral, such as the
 virtual memory manager, will need to be written so that it traverses all of the
 currently five levels. This style should also be preferred for
 architecture-specific code, so as to be robust to future changes.
+
+
+MMU, TLB, and Page Faults
+=========================
+
+The Memory Management Unit (MMU) is a hardware component that handles virtual to
+physical address translations. It uses a relatively small hardware cache, the
+Translation Lookaside Buffer (TLB), to speed up these translations. When a
+process accesses a memory location, the CPU passes the virtual address to the
+MMU, which first looks in the TLB for the corresponding physical address. On a
+TLB miss, the translation is looked up in the page tables instead.
+
+However, sometimes no valid translation exists, or the access is not permitted
+by the translation that is found. This could be because the process is trying
+to access a range of memory that it's not allowed to, or because the memory
+hasn't been loaded into RAM yet. When this happens, the MMU raises a page
+fault, a type of exception that makes the CPU suspend the current instruction
+stream and run a kernel handler for the fault.
+
+One cause of page faults is due to bugs (or maliciously crafted addresses) and
+happens when a process tries to access a range of memory that it doesn't have
+permission to. This could be because the memory is reserved for the kernel or
+is not mapped in the process's address space, or because the process tries to
+write to a read-only section of memory. When this happens, the kernel sends a
+Segmentation Fault (SIGSEGV) signal to the process, which usually terminates it.
+
+An expected and more common cause of page faults is "lazy allocation". This is
+a technique used by the kernel to improve memory efficiency and reduce
+footprint. Instead of allocating physical memory to a process as soon as it's
+requested, the kernel waits until the process actually tries to use the memory.
+This can save a significant amount of memory in cases where a process requests
+a large block but only uses a small portion of it.
+
+A related technique is "Copy-on-Write" (COW), where the kernel allows multiple
+processes to share the same physical memory as long as they're only reading
+from it. If a process tries to write to the shared memory, the MMU raises a
+page fault and the kernel allocates a separate copy of the page for that
+process. This saves memory and avoids unnecessary data copying, which in turn
+reduces latency.
+
+Now, let's see how the Linux kernel handles these page faults:
+
+1. For most architectures, `do_page_fault()` is the primary exception handler
+   for page faults. It delegates the actual handling of the page fault to
+   `handle_mm_fault()`, which takes the appropriate action, such as loading
+   the required page into memory or copying a COW page. If the access is not
+   valid for the faulting address, the architecture code sends a SIGSEGV
+   signal to the process instead.
+
+2. In the specific case of the x86 architecture, the exception handler is
+   defined by the `DEFINE_IDTENTRY_RAW_ERRORCODE()` macro, which calls
+   `handle_page_fault()`. This function then calls either
+   `do_user_addr_fault()` or `do_kern_addr_fault()`, depending on whether
+   the faulting address lies in user space or kernel space. For user-space
+   addresses, `do_user_addr_fault()` eventually calls `handle_mm_fault()`,
+   similar to the workflow in other architectures.
+
+The actual implementation of the workflow is very complex. Its design allows
+Linux to handle page faults in a way that is tailored to the specific
+characteristics of each architecture, while still sharing a common overall
+structure.
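
For readers who want to observe the lazy allocation described in the patch
from user space, here is a minimal sketch. It is not part of the patch and not
kernel code. It assumes a Linux system where getrusage() reports minor faults
in ru_minflt. The helper names minor_faults() and minor_faults_for_touch() are
hypothetical and exist only for this illustration.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE /* for MAP_ANONYMOUS on glibc */
#endif
#include <stddef.h>
#include <sys/mman.h>
#include <sys/resource.h>
#include <unistd.h>

/* Hypothetical helper: minor-fault count of this process so far, or -1. */
static long minor_faults(void)
{
	struct rusage ru;

	if (getrusage(RUSAGE_SELF, &ru) != 0)
		return -1;
	return ru.ru_minflt;
}

/*
 * Hypothetical helper: map @mapped anonymous pages, write to the first
 * @touched of them, and return the number of minor faults taken while
 * writing. Returns -1 on error.
 */
static long minor_faults_for_touch(size_t mapped, size_t touched)
{
	long page = sysconf(_SC_PAGESIZE);
	long before, after;
	volatile char *mem;
	size_t i;

	if (page <= 0 || mapped == 0 || touched > mapped)
		return -1;

	/* mmap() only reserves address space; no page frames are used yet. */
	mem = mmap(NULL, mapped * (size_t)page, PROT_READ | PROT_WRITE,
		   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (mem == MAP_FAILED)
		return -1;

	before = minor_faults();
	for (i = 0; i < touched; i++)
		mem[i * (size_t)page] = 1; /* first write faults the page in */
	after = minor_faults();

	munmap((void *)mem, mapped * (size_t)page);
	if (before < 0 || after < 0)
		return -1;
	return after - before;
}
```

Calling minor_faults_for_touch(64, 0) and minor_faults_for_touch(64, 64) from
a small main() and printing both results should show that mapping memory alone
costs few or no faults, while the count grows once pages are written. The
exact numbers depend on the kernel configuration (for example, transparent
huge pages), so treat them as indicative only.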