From patchwork Wed Aug 24 12:03:16 2022
X-Patchwork-Submitter: Brian Foster
X-Patchwork-Id: 12953318
Date: Wed, 24 Aug 2022 08:03:16 -0400
From: Brian Foster
To: linux-s390@vger.kernel.org
Cc: linux-mm@kvack.org, Jason Gunthorpe, David Hildenbrand
Subject: [RFC] s390: kernel BUG at include/linux/mm.h:1529!

Hi all,

When running a fuzzer workload to test an unrelated patch[1], I've been
reproducing the VM_BUG_ON() splat below[2] on s390x. I've narrowed the
problem down to a deterministic reproducer. The code for that is also
appended below[3].

The splat occurs because during fork() we end up down in
copy_present_pte() -> page_try_dup_anon_rmap() ->
page_needs_cow_for_dma() for a !is_cow mapping, so copy_page_range()
didn't acquire the ->write_protect_seq seqlock as expected. After
digging into this a bit, I _think_ this boils down to a bug in the s390
arch fault code dealing with a write fault to a !VM_WRITE mapping.

The sequence of events implemented by the reproducer that leads to this:

1. Create a shmem segment and attach it SHM_RDONLY. This causes
   do_mmap() to set up a !VM_WRITE mapping, but also clear
   (VM_MAYWRITE|VM_SHARED) on the mapping because the backing shmem
   file is read-only.

2. Take a write fault on the mapping in kernel mode (via getrlimit() in
   this case). The write fault ultimately causes getrlimit() to fail
   with a bad access error, but not before the generic fault code
   creates an anon_vma mapping for the page. This occurs because first
   do_dat_exception() calls handle_mm_fault() with FAULT_FLAG_WRITE via
   the following logic:

	access = VM_ACCESS_FLAGS;
	...
	if (access == VM_WRITE || is_write)
		flags |= FAULT_FLAG_WRITE;
	...
	if (unlikely(!(vma->vm_flags & access)))
		goto out_up;
	...
	fault = handle_mm_fault(...);

   So the FAULT_FLAG_WRITE fault proceeds because is_write is true and
   ->vm_flags has read or exec permission (but not VM_WRITE).
   This eventually gets down into do_cow_fault() -> finish_fault() ->
   do_set_pte(), the latter of which calls page_add_new_anon_rmap()
   because this is a write fault to a !shared mapping.

   Note this is immediately followed by a do_protection_exception()
   that uses access = VM_WRITE and thus fails the above check and
   returns with VM_FAULT_BADACCESS. So I think this ultimately DTRT wrt
   failing the syscall to userspace, but the do_dat_exception()
   handling sets up an unexpected situation for fork().

3. fork() -> dup_mm() comes across this mapping with ->anon_vma set (so
   vma_needs_copy() returns true), but is_cow_mapping() returns false
   because VM_MAYWRITE is cleared. From there we fall down into the
   page table copying path described by the BUG splat.

This problem doesn't occur on x86, seemingly because we don't call into
handle_mm_fault() for a write fault to a !VM_WRITE mapping, which is
specifically checked in access_error(). Therefore, something like the
patch below[4] seems to prevent the problem on s390. However, the access
checking logic looks wonky enough to me that I wonder whether it
warrants a closer look from s390 experts. For example, does this code
need to care about any flags/context beyond write or read faults vs.
write or !write mappings? I don't have enough context to reason about
it. Could somebody more familiar with these two s390 exception variants
chime in?

Finally, note that so far I've only really tested the patch against the
reproducer. I'm happy to try and form it into a proper patch and test
further after some feedback... thanks.

Brian

[1] https://lore.kernel.org/linux-s390/20220816155407.537372-1-bfoster@redhat.com

[2] BUG splat:

kernel BUG at include/linux/mm.h:1529!
monitor event: 0040 ilc:2 [#1] SMP
Modules linked in: rfkill sunrpc ghash_s390 prng xts aes_s390 des_s390 libdes sha3_512_s390 sha3_256_s390 vfio_ccw sha512_s390 mdev vfio_iommu_type1 vfio xfs libcrc32c virtio_blk virtio_net net_failover failover dm_mirror dm_region_hash dm_log dm_mod pkey zcrypt
CPU: 1 PID: 1401 Comm: shmem-fork-test Not tainted 6.0.0-rc2+ #20
Hardware name: IBM 8561 LT1 400 (KVM/Linux)
Krnl PSW : 0704c00180000000 0000000014928240 (copy_pte_range+0xa40/0xe58)
           R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:0 PM:0 RI:0 EA:3
Krnl GPRS: 000003ff85b80000 000000000000000c 0000000000000000 000003ff85b80000
           0000000091c5f31f 0000000087d70640 000000008160e800 00000372024717c0
           000003ff85b80000 0000000000000000 00000000831c9c00 0000000091c5f31f
           00000000823ada00 0000000087d70640 00000000149279c2 0000038000773880
Krnl Code: 0000000014928232: c0e5fffff48f	brasl	%r14,0000000014926b50
           0000000014928238: a7f4fd43		brc	15,0000000014927cbe
          #000000001492823c: af000000		mc	0,0
          >0000000014928240: b904005b		lgr	%r5,%r11
           0000000014928244: a7f4ffde		brc	15,0000000014928200
           0000000014928248: e310f0e80004	lg	%r1,232(%r15)
           000000001492824e: a7f4ff17		brc	15,000000001492807c
           0000000014928252: ec3800091c7c	cgij	%r3,28,8,0000000014928264
Call Trace:
 [<0000000014928240>] copy_pte_range+0xa40/0xe58
([<00000000149279c2>] copy_pte_range+0x1c2/0xe58)
 [<000000001492e258>] copy_page_range+0x510/0x770
 [<00000000146f3896>] dup_mmap+0x47e/0x6c0
 [<00000000146f3b52>] dup_mm+0x7a/0x278
 [<00000000146f5a48>] copy_process+0x1298/0x1a48
 [<00000000146f62fe>] kernel_clone+0x5e/0x3c0
 [<00000000146f6742>] __do_sys_clone+0x5a/0x68
 [<00000000146f67e0>] __s390x_sys_clone+0x40/0x50
 [<0000000014f68dac>] __do_syscall+0x1d4/0x200
 [<0000000014f78c22>] system_call+0x82/0xb0
Last Breaking-Event-Address:
 [<0000000014927a46>] copy_pte_range+0x246/0xe58
Kernel panic - not syncing: Fatal exception: panic_on_oops

[3] minimal reproducer:

#include <unistd.h>
#include <sys/shm.h>
#include <sys/resource.h>

int main()
{
	int id;
	void *p;

	/* step 1: read-only attach of a private shmem segment */
	id = shmget(IPC_PRIVATE, 4096, IPC_CREAT);
	p = shmat(id, NULL, SHM_RDONLY);
	/* step 2: kernel-mode write fault on the read-only mapping */
	getrlimit(RLIMIT_AS, p);
	/* step 3: fork() copies the mapping and trips the VM_BUG_ON() */
	fork();

	return 0;
}

[4] RFC patch:

diff --git a/arch/s390/mm/fault.c b/arch/s390/mm/fault.c
index 13449941516c..c12722da1558 100644
--- a/arch/s390/mm/fault.c
+++ b/arch/s390/mm/fault.c
@@ -418,6 +418,8 @@ static inline vm_fault_t do_exception(struct pt_regs *regs, int access)
 	fault = VM_FAULT_BADACCESS;
 	if (unlikely(!(vma->vm_flags & access)))
 		goto out_up;
+	if ((flags & FAULT_FLAG_WRITE) && !(vma->vm_flags & VM_WRITE))
+		goto out_up;
 
 	if (is_vm_hugetlb_page(vma))
 		address &= HPAGE_MASK;