From patchwork Tue May 15 13:15:29 2018
Subject: Re: [PATCH] KVM: arm/arm64: fix unaligned hva start and end in handle_hva_to_gpa
From: Jia He
To: Suzuki K Poulose, Marc Zyngier, Christoffer Dall,
    linux-arm-kernel@lists.infradead.org, kvmarm@lists.cs.columbia.edu
Cc: li.zhang@hxt-semitech.com, hughd@google.com, linux-kernel@vger.kernel.org,
    Jia He, Andrea Arcangeli, Minchan Kim, Claudio Imbrenda, Arvind Yadav,
    Mike Rapoport, akpm@linux-foundation.org
Date: Tue, 15 May 2018 21:15:29 +0800
Message-ID: <5efb2da1-fed8-443b-d06b-cc03152b2c2e@gmail.com>
References: <1525244911-5519-1-git-send-email-hejianet@gmail.com>
 <04e6109f-cbbd-24b0-03bb-9247b930d42c@arm.com>
 <85e04362-05dd-c697-e9c4-ad5824e63819@gmail.com>
 <0f134188-12d5-7184-3fbd-0ec8204cf649@arm.com>
 <695beacb-ff51-bb2c-72ef-d268f7d4e59d@arm.com>
 <1e065e9b-4dad-611d-fc5b-26fe6c031507@gmail.com>
 <14954b3a-6a2a-fe05-47b7-1890375ab8a4@arm.com>

On 5/15/2018 8:38 PM, Jia He wrote:
> Hi Suzuki
>
> On 5/15/2018 4:36 PM, Suzuki K Poulose wrote:
>>
>> Hi Jia
>>
>> On 05/15/2018 03:03 AM, Jia He wrote:
>>> Hi Suzuki
>>>
>>> I will merge the other thread into this one and add the necessary Cc
>>> list.
>>>
>>> That WARN_ON call trace is very easy to reproduce on my armv8a server
>>> after I start 20 guests and run memhog in the host. Of course, KSM
>>> should be enabled.
>>>
>>> For your question about my inject-fault debug patch:
>>
>> Thanks for the patch, comments below.
>>
>> ...
>>
>>> index 7f6a944..ab8545e 100644
>>> --- a/virt/kvm/arm/mmu.c
>>> +++ b/virt/kvm/arm/mmu.c
>>> @@ -290,12 +290,17 @@ static void unmap_stage2_puds(struct kvm *kvm, pgd_t *pgd,
>>>   * destroying the VM), otherwise another faulting VCPU may come in and mess
>>>   * with things behind our backs.
>>>   */
>>> +extern int trigger_by_ksm;
>>>  static void unmap_stage2_range(struct kvm *kvm, phys_addr_t start, u64 size)
>>>  {
>>>         pgd_t *pgd;
>>>         phys_addr_t addr = start, end = start + size;
>>>         phys_addr_t next;
>>>
>>> +       if (trigger_by_ksm) {
>>> +               end -= 0x200;
>>> +       }
>>> +
>>>         assert_spin_locked(&kvm->mmu_lock);
>>>         pgd = kvm->arch.pgd + stage2_pgd_index(addr);
>>>         do {
>>>
>>> I need to point out that I never reproduced it without this debugging
>>> patch.
>>
>> That could trigger the panic iff your "size" <= 0x200, leading to the
>> condition (end < start), which can make the loop go on forever, as we
>> do while (addr < end) and end up accessing something which may not be a
>> PGD entry, and thus get a bad page with bad numbers all around. This
>> case could be hit only with your change and the bug in KSM which gives
>> us an address near the page boundary.
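To make that hazard concrete: the stage-2 table walkers exit on an exact
match rather than an ordered comparison, so an end that is not a whole
number of pages away from addr is simply stepped over. Below is a minimal
userspace sketch (illustrative only, not kernel code; PAGE_SIZE, the
addresses, and the guard counter are assumptions chosen to match the
numbers in this thread, and the exact-match exit mirrors the style of the
loops in virt/kvm/arm/mmu.c):

    /*
     * Illustrative userspace sketch, not kernel source: a page-granular
     * walk that exits with an exact-match test (addr != end).  If end is
     * unaligned relative to addr, addr steps straight past it and the
     * exit test never fires.
     */
    #include <stdio.h>

    #define PAGE_SIZE 0x10000UL   /* assumption: 64K, since 65024 = PAGE_SIZE - 0x200 */

    int main(void)
    {
        unsigned long addr = 0x12330200UL;              /* illustrative address */
        unsigned long end  = addr + PAGE_SIZE - 0x200;  /* unaligned span, 0xfe00 bytes */
        int guard = 0;

        do {
            printf("visit pte for %#lx\n", addr);
            addr += PAGE_SIZE;
        } while (addr != end && ++guard < 4);  /* guard added only so the demo stops */

        if (addr != end)
            printf("addr %#lx overshot end %#lx: the real loop would never exit\n",
                   addr, end);
        return 0;
    }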
> No, I injected the fault on purpose to simulate the case when the size
> is less than PAGE_SIZE (e.g. PAGE_SIZE - 0x200 = 65024).
> I did get the panic info [1] *without* the debugging patch, but only
> once:
>
> [1] https://lkml.org/lkml/2018/5/9/992
>>
>> So, I think we can safely ignore the PANIC().
>> More below.
>>
>>>>> Suzuki, thanks for the comments.
>>>>>
>>>>> I proposed another KSM patch: https://lkml.org/lkml/2018/5/3/1042
>>>>> The root cause is that KSM adds some extra flags to indicate whether
>>>>> or not the page is in the stable tree. This leaves the address not
>>>>> aligned to PAGE_SIZE.
>>>> Thanks for the pointer. In the future, please Cc the people relevant
>>>> to the discussion in the patches.
>>>>
>>>>> From the arm KVM MMU point of view, do you think handle_hva_to_gpa
>>>>> still needs to handle the unaligned case?
>>>> I don't think we should do that. Had we done this, we would never
>>>> have caught this bug in KSM. Eventually, if some other new
>>>> implementation comes up with a new notifier consumer which doesn't
>>>> check alignment and doesn't WARN, it could simply do the wrong thing.
>>>> So I believe what we have is a good measure to make sure that things
>>>> are in the right order.
>>>>
>>>>> IMO, the PAGE_SIZE alignment is still needed, because we should not
>>>>> let the bottom function kvm_age_hva_handler handle the exception.
>>>>> Please refer to the x86 and powerpc implementations of
>>>>> kvm_handle_hva_range(). They both align the hva with
>>>>> hva_to_gfn_memslot.
>>>>>
>>>> From an API perspective, you are passed a "start" and "end" address.
>>>> So, you could potentially do the wrong thing if you align the "start"
>>>> and "end". Maybe those handlers should also do the same thing as we
>>>> do.
>>
>>> But handle_hva_to_gpa has possibly already partially adjusted the
>>> alignment:
>>>     1750         kvm_for_each_memslot(memslot, slots) {
>>>     1751                 unsigned long hva_start, hva_end;
>>>     1752                 gfn_t gpa;
>>>     1753
>>>     1754                 hva_start = max(start, memslot->userspace_addr);
>>>     1755                 hva_end = min(end, memslot->userspace_addr +
>>>     1756                             (memslot->npages << PAGE_SHIFT));
>>>
>>> At line 1755, let us assume that end = 0x12340200 and
>>> memslot->userspace_addr + (memslot->npages << PAGE_SHIFT) = 0x12340000.
>>> Then hva_start is not PAGE_SIZE-aligned while hva_end is, and the size
>>> will be PAGE_SIZE - 0x200, just as in my inject-fault debugging patch.
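The clamping arithmetic above can be checked in isolation. A small
userspace sketch (all values are assumptions picked to reproduce the
worked example: 64K PAGE_SIZE, an unaligned start/end as KSM would
produce, and a memslot ending at 0x12340000; max()/min() stand in for the
kernel macros):

    /* Illustrative sketch of the memslot clamping in handle_hva_to_gpa(). */
    #include <stdio.h>

    #define PAGE_SIZE 0x10000UL
    #define max(a, b) ((a) > (b) ? (a) : (b))
    #define min(a, b) ((a) < (b) ? (a) : (b))

    int main(void)
    {
        unsigned long start = 0x12330200UL, end = 0x12340200UL;  /* unaligned */
        unsigned long slot_start = 0x12300000UL;  /* memslot->userspace_addr */
        unsigned long slot_end   = 0x12340000UL;  /* + (npages << PAGE_SHIFT) */

        unsigned long hva_start = max(start, slot_start);
        unsigned long hva_end   = min(end, slot_end);

        /* the handler receives (hva_end - hva_start), not the caller's size */
        printf("caller size  = %#lx\n", end - start);              /* 0x10000 */
        printf("handler size = %#lx (%lu)\n",
               hva_end - hva_start, hva_end - hva_start);          /* 0xfe00 = 65024 */
        printf("page-aligned? %s\n",
               ((hva_end - hva_start) & (PAGE_SIZE - 1)) ? "no" : "yes");
        return 0;
    }

This prints a handler size of 65024 bytes, i.e. PAGE_SIZE - 0x200, which
is exactly the value the debugging patch injected.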
>>
>> That's because we want to limit the handling of the hva/gpa range by
>> memslot. So, we make sure we pass on the range within the given memslot
>> to hva_to_gfn_memslot(). But we do iterate over the next memslot if the
>> original range falls into the next slot. So, in practice, there is no
>> alignment/trimming of the range. It's just that we pass on the
>> appropriate range for each slot.
>>
> Yes, I understand what the code does in hva_to_gfn_memslot(). What I
> mean is that hva_end may be changed, so (hva_end - hva_start) will not
> be the same as the parameter _size_:
>
>> ret |= handler(kvm, gpa, (u64)(hva_end - hva_start), data);
>
> Anyway, I have to admit that all the exceptions are originally caused
> by the STABLE_FLAG in the KSM code. What I want to discuss here is how
> to make arm KVM handle the exception more gracefully.
>
Hi Suzuki

How about this patch (balancing avoidance of the WARN_ON storm against
debugging convenience)?

diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c
index 7f6a944..4033946 100644
--- a/virt/kvm/arm/mmu.c
+++ b/virt/kvm/arm/mmu.c
@@ -297,6 +297,8 @@ static void unmap_stage2_range(struct kvm *kvm, phys_addr_t start, u64 size)
        phys_addr_t next;

        assert_spin_locked(&kvm->mmu_lock);
+
+       WARN_ON_ONCE(size & ~PAGE_MASK);
        pgd = kvm->arch.pgd + stage2_pgd_index(addr);
        do {
@@ -1792,7 +1794,7 @@ static int kvm_set_spte_handler(struct kvm *kvm, gpa_t gpa, u64 size, void *data)
 {
        pte_t *pte = (pte_t *)data;

-       WARN_ON(size != PAGE_SIZE);
+       WARN_ON_ONCE(size != PAGE_SIZE);
        /*
         * We can always call stage2_set_pte with KVM_S2PTE_FLAG_LOGGING_ACTIVE
         * flag clear because MMU notifiers will have unmapped a huge PMD before
@@ -1823,7 +1825,7 @@ static int kvm_age_hva_handler(struct kvm *kvm, gpa_t gpa, u64 size, void *data)
        pmd_t *pmd;
        pte_t *pte;

-       WARN_ON(size != PAGE_SIZE && size != PMD_SIZE);
+       WARN_ON_ONCE(size != PAGE_SIZE && size != PMD_SIZE);
        pmd = stage2_get_pmd(kvm, NULL, gpa);
        if (!pmd || pmd_none(*pmd))     /* Nothing there */
                return 0;
@@ -1843,7 +1845,7 @@ static int kvm_test_age_hva_handler(struct kvm *kvm, gpa_t gpa, u64 size, void *data)
        pmd_t *pmd;
        pte_t *pte;

-       WARN_ON(size != PAGE_SIZE && size != PMD_SIZE);
+       WARN_ON_ONCE(size != PAGE_SIZE && size != PMD_SIZE);
        pmd = stage2_get_pmd(kvm, NULL, gpa);
        if (!pmd || pmd_none(*pmd))     /* Nothing there */
                return 0;
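On the design choice: WARN_ON_ONCE() prints the backtrace only the first
time its condition is true at a given call site, so a KSM-induced stream
of unaligned ranges is flagged once instead of flooding the log, which is
the storm being avoided. The new check in unmap_stage2_range() works
because PAGE_MASK clears the in-page offset bits, so (size & ~PAGE_MASK)
is non-zero exactly when size is not a whole number of pages. A minimal
userspace sketch of just that test (illustrative, not kernel code; the
64K PAGE_SIZE is an assumption matching this report):

    #include <stdio.h>

    #define PAGE_SIZE 0x10000UL
    #define PAGE_MASK (~(PAGE_SIZE - 1))

    /* report whether a size would trip the proposed WARN_ON_ONCE() */
    static void check(unsigned long size)
    {
        printf("size %#lx: %s\n", size,
               (size & ~PAGE_MASK) ? "unaligned -> would WARN once" : "ok");
    }

    int main(void)
    {
        check(PAGE_SIZE);          /* ok */
        check(PAGE_SIZE - 0x200);  /* unaligned: the KSM case, 65024 bytes */
        check(3 * PAGE_SIZE);      /* ok */
        return 0;
    }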