From patchwork Fri Jul 26 13:04:02 2013
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Christian Borntraeger
X-Patchwork-Id: 2834129
From: Christian Borntraeger
To: Gleb Natapov, Paolo Bonzini
Cc: Cornelia Huck, Heiko Carstens, Martin Schwidefsky, KVM, linux-s390,
 Christian Borntraeger
Subject: [PATCH 3/8] KVM: s390: allow sie enablement for multi-threaded programs
Date: Fri, 26 Jul 2013 15:04:02 +0200
Message-Id: <1374843847-5046-4-git-send-email-borntraeger@de.ibm.com>
X-Mailer: git-send-email 1.8.3.1
In-Reply-To: <1374843847-5046-1-git-send-email-borntraeger@de.ibm.com>
References: <1374843847-5046-1-git-send-email-borntraeger@de.ibm.com>

From: Martin Schwidefsky

Improve the code to upgrade the standard 2K page tables to 4K page
tables with PGSTEs to allow the operation to happen when the program
is already multi-threaded.

Signed-off-by: Martin Schwidefsky
Signed-off-by: Christian Borntraeger
---
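Illustrative note, not part of the commit: the helper example_create_vm()
below is made up purely to show the intended call pattern; only
s390_enable_sie() is the real interface touched by this patch. With the
in-place page table reallocation the calling process may already have
started additional threads when it enables SIE.

/*
 * Hypothetical caller sketch - not part of this patch.  The calling
 * process may already be multi-threaded; s390_enable_sie() upgrades
 * the page tables of its mm in place under mmap_sem.
 */
#include <asm/pgtable.h>	/* assumed to declare s390_enable_sie() */

static int example_create_vm(void)
{
	int rc;

	rc = s390_enable_sie();	/* upgrade to 4K page tables with pgstes */
	if (rc)
		return rc;	/* e.g. -ENOMEM if a pgste table allocation failed */

	/* ... continue with SIE / virtual machine setup ... */
	return 0;
}
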
 arch/s390/include/asm/mmu.h         |   2 -
 arch/s390/include/asm/mmu_context.h |  19 +---
 arch/s390/include/asm/pgtable.h     |  11 +++
 arch/s390/mm/pgtable.c              | 181 +++++++++++++++++++++++-------------
 4 files changed, 129 insertions(+), 84 deletions(-)

diff --git a/arch/s390/include/asm/mmu.h b/arch/s390/include/asm/mmu.h
index 6340178..ff132ac 100644
--- a/arch/s390/include/asm/mmu.h
+++ b/arch/s390/include/asm/mmu.h
@@ -12,8 +12,6 @@ typedef struct {
 	unsigned long asce_bits;
 	unsigned long asce_limit;
 	unsigned long vdso_base;
-	/* Cloned contexts will be created with extended page tables. */
-	unsigned int alloc_pgste:1;
 	/* The mmu context has extended page tables. */
 	unsigned int has_pgste:1;
 } mm_context_t;
diff --git a/arch/s390/include/asm/mmu_context.h b/arch/s390/include/asm/mmu_context.h
index 084e775..4fb67a0 100644
--- a/arch/s390/include/asm/mmu_context.h
+++ b/arch/s390/include/asm/mmu_context.h
@@ -21,24 +21,7 @@ static inline int init_new_context(struct task_struct *tsk,
 #ifdef CONFIG_64BIT
 	mm->context.asce_bits |= _ASCE_TYPE_REGION3;
 #endif
-	if (current->mm && current->mm->context.alloc_pgste) {
-		/*
-		 * alloc_pgste indicates, that any NEW context will be created
-		 * with extended page tables. The old context is unchanged. The
-		 * page table allocation and the page table operations will
-		 * look at has_pgste to distinguish normal and extended page
-		 * tables. The only way to create extended page tables is to
-		 * set alloc_pgste and then create a new context (e.g. dup_mm).
-		 * The page table allocation is called after init_new_context
-		 * and if has_pgste is set, it will create extended page
-		 * tables.
-		 */
-		mm->context.has_pgste = 1;
-		mm->context.alloc_pgste = 1;
-	} else {
-		mm->context.has_pgste = 0;
-		mm->context.alloc_pgste = 0;
-	}
+	mm->context.has_pgste = 0;
 	mm->context.asce_limit = STACK_TOP_MAX;
 	crst_table_init((unsigned long *) mm->pgd, pgd_entry_type(mm));
 	return 0;
diff --git a/arch/s390/include/asm/pgtable.h b/arch/s390/include/asm/pgtable.h
index 75fb726..7a60bb9 100644
--- a/arch/s390/include/asm/pgtable.h
+++ b/arch/s390/include/asm/pgtable.h
@@ -1361,6 +1361,17 @@ static inline pmd_t pmd_mkwrite(pmd_t pmd)
 }
 #endif /* CONFIG_TRANSPARENT_HUGEPAGE || CONFIG_HUGETLB_PAGE */

+static inline void pmdp_flush_lazy(struct mm_struct *mm,
+				   unsigned long address, pmd_t *pmdp)
+{
+	int active = (mm == current->active_mm) ? 1 : 0;
+
+	if ((atomic_read(&mm->context.attach_count) & 0xffff) > active)
+		__pmd_idte(address, pmdp);
+	else
+		mm->context.flush_mm = 1;
+}
+
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE

 #define __HAVE_ARCH_PGTABLE_DEPOSIT
diff --git a/arch/s390/mm/pgtable.c b/arch/s390/mm/pgtable.c
index a8154a1..6d33248 100644
--- a/arch/s390/mm/pgtable.c
+++ b/arch/s390/mm/pgtable.c
@@ -731,6 +731,11 @@ void gmap_do_ipte_notify(struct mm_struct *mm, unsigned long addr, pte_t *pte)
 	spin_unlock(&gmap_notifier_lock);
 }

+static inline int page_table_with_pgste(struct page *page)
+{
+	return atomic_read(&page->_mapcount) == 0;
+}
+
 static inline unsigned long *page_table_alloc_pgste(struct mm_struct *mm,
 						    unsigned long vmaddr)
 {
@@ -750,7 +755,7 @@ static inline unsigned long *page_table_alloc_pgste(struct mm_struct *mm,
 	mp->vmaddr = vmaddr & PMD_MASK;
 	INIT_LIST_HEAD(&mp->mapper);
 	page->index = (unsigned long) mp;
-	atomic_set(&page->_mapcount, 3);
+	atomic_set(&page->_mapcount, 0);
 	table = (unsigned long *) page_to_phys(page);
 	clear_table(table, _PAGE_TYPE_EMPTY, PAGE_SIZE/2);
 	clear_table(table + PTRS_PER_PTE, 0, PAGE_SIZE/2);
@@ -821,6 +826,11 @@ EXPORT_SYMBOL(set_guest_storage_key);

 #else /* CONFIG_PGSTE */

+static inline int page_table_with_pgste(struct page *page)
+{
+	return 0;
+}
+
 static inline unsigned long *page_table_alloc_pgste(struct mm_struct *mm,
 						    unsigned long vmaddr)
 {
@@ -897,12 +907,12 @@ void page_table_free(struct mm_struct *mm, unsigned long *table)
 	struct page *page;
 	unsigned int bit, mask;

-	if (mm_has_pgste(mm)) {
+	page = pfn_to_page(__pa(table) >> PAGE_SHIFT);
+	if (page_table_with_pgste(page)) {
 		gmap_disconnect_pgtable(mm, table);
 		return page_table_free_pgste(table);
 	}
 	/* Free 1K/2K page table fragment of a 4K page */
-	page = pfn_to_page(__pa(table) >> PAGE_SHIFT);
 	bit = 1 << ((__pa(table) & ~PAGE_MASK)/(PTRS_PER_PTE*sizeof(pte_t)));
 	spin_lock_bh(&mm->context.list_lock);
 	if ((atomic_read(&page->_mapcount) & FRAG_MASK) != FRAG_MASK)
@@ -940,14 +950,14 @@ void page_table_free_rcu(struct mmu_gather *tlb, unsigned long *table)
 	unsigned int bit, mask;

 	mm = tlb->mm;
-	if (mm_has_pgste(mm)) {
+	page = pfn_to_page(__pa(table) >> PAGE_SHIFT);
+	if (page_table_with_pgste(page)) {
 		gmap_disconnect_pgtable(mm, table);
 		table = (unsigned long *) (__pa(table) | FRAG_MASK);
 		tlb_remove_table(tlb, table);
 		return;
 	}
 	bit = 1 << ((__pa(table) & ~PAGE_MASK) / (PTRS_PER_PTE*sizeof(pte_t)));
-	page = pfn_to_page(__pa(table) >> PAGE_SHIFT);
 	spin_lock_bh(&mm->context.list_lock);
 	if ((atomic_read(&page->_mapcount) & FRAG_MASK) != FRAG_MASK)
 		list_del(&page->lru);
@@ -1033,36 +1043,120 @@ void tlb_remove_table(struct mmu_gather *tlb, void *table)
 }

 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
-void thp_split_vma(struct vm_area_struct *vma)
+static inline void thp_split_vma(struct vm_area_struct *vma)
 {
 	unsigned long addr;
-	struct page *page;

-	for (addr = vma->vm_start; addr < vma->vm_end; addr += PAGE_SIZE) {
-		page = follow_page(vma, addr, FOLL_SPLIT);
-	}
+	for (addr = vma->vm_start; addr < vma->vm_end; addr += PAGE_SIZE)
+		follow_page(vma, addr, FOLL_SPLIT);
 }

-void thp_split_mm(struct mm_struct *mm)
+static inline void thp_split_mm(struct mm_struct *mm)
 {
-	struct vm_area_struct *vma = mm->mmap;
+	struct vm_area_struct *vma;

-	while (vma != NULL) {
+	for (vma = mm->mmap; vma != NULL; vma = vma->vm_next) {
 		thp_split_vma(vma);
 		vma->vm_flags &= ~VM_HUGEPAGE;
 		vma->vm_flags |= VM_NOHUGEPAGE;
-		vma = vma->vm_next;
 	}
+	mm->def_flags |= VM_NOHUGEPAGE;
+}
+#else
+static inline void thp_split_mm(struct mm_struct *mm)
+{
 }
 #endif /* CONFIG_TRANSPARENT_HUGEPAGE */

+static unsigned long page_table_realloc_pmd(struct mmu_gather *tlb,
+				struct mm_struct *mm, pud_t *pud,
+				unsigned long addr, unsigned long end)
+{
+	unsigned long next, *table, *new;
+	struct page *page;
+	pmd_t *pmd;
+
+	pmd = pmd_offset(pud, addr);
+	do {
+		next = pmd_addr_end(addr, end);
+again:
+		if (pmd_none_or_clear_bad(pmd))
+			continue;
+		table = (unsigned long *) pmd_deref(*pmd);
+		page = pfn_to_page(__pa(table) >> PAGE_SHIFT);
+		if (page_table_with_pgste(page))
+			continue;
+		/* Allocate new page table with pgstes */
+		new = page_table_alloc_pgste(mm, addr);
+		if (!new) {
+			mm->context.has_pgste = 0;
+			continue;
+		}
+		spin_lock(&mm->page_table_lock);
+		if (likely((unsigned long *) pmd_deref(*pmd) == table)) {
+			/* Nuke pmd entry pointing to the "short" page table */
+			pmdp_flush_lazy(mm, addr, pmd);
+			pmd_clear(pmd);
+			/* Copy ptes from old table to new table */
+			memcpy(new, table, PAGE_SIZE/2);
+			clear_table(table, _PAGE_INVALID, PAGE_SIZE/2);
+			/* Establish new table */
+			pmd_populate(mm, pmd, (pte_t *) new);
+			/* Free old table with rcu, there might be a walker! */
+			page_table_free_rcu(tlb, table);
+			new = NULL;
+		}
+		spin_unlock(&mm->page_table_lock);
+		if (new) {
+			page_table_free_pgste(new);
+			goto again;
+		}
+	} while (pmd++, addr = next, addr != end);
+
+	return addr;
+}
+
+static unsigned long page_table_realloc_pud(struct mmu_gather *tlb,
+				struct mm_struct *mm, pgd_t *pgd,
+				unsigned long addr, unsigned long end)
+{
+	unsigned long next;
+	pud_t *pud;
+
+	pud = pud_offset(pgd, addr);
+	do {
+		next = pud_addr_end(addr, end);
+		if (pud_none_or_clear_bad(pud))
+			continue;
+		next = page_table_realloc_pmd(tlb, mm, pud, addr, next);
+	} while (pud++, addr = next, addr != end);
+
+	return addr;
+}
+
+static void page_table_realloc(struct mmu_gather *tlb, struct mm_struct *mm,
+			       unsigned long addr, unsigned long end)
+{
+	unsigned long next;
+	pgd_t *pgd;
+
+	pgd = pgd_offset(mm, addr);
+	do {
+		next = pgd_addr_end(addr, end);
+		if (pgd_none_or_clear_bad(pgd))
+			continue;
+		next = page_table_realloc_pud(tlb, mm, pgd, addr, next);
+	} while (pgd++, addr = next, addr != end);
+}
+
 /*
  * switch on pgstes for its userspace process (for kvm)
  */
 int s390_enable_sie(void)
 {
 	struct task_struct *tsk = current;
-	struct mm_struct *mm, *old_mm;
+	struct mm_struct *mm = tsk->mm;
+	struct mmu_gather tlb;

 	/* Do we have switched amode? If no, we cannot do sie */
 	if (s390_user_mode == HOME_SPACE_MODE)
@@ -1072,57 +1166,16 @@ int s390_enable_sie(void)
 	if (mm_has_pgste(tsk->mm))
 		return 0;

-	/* lets check if we are allowed to replace the mm */
-	task_lock(tsk);
-	if (!tsk->mm || atomic_read(&tsk->mm->mm_users) > 1 ||
-#ifdef CONFIG_AIO
-	    !hlist_empty(&tsk->mm->ioctx_list) ||
-#endif
-	    tsk->mm != tsk->active_mm) {
-		task_unlock(tsk);
-		return -EINVAL;
-	}
-	task_unlock(tsk);
-
-	/* we copy the mm and let dup_mm create the page tables with_pgstes */
-	tsk->mm->context.alloc_pgste = 1;
-	/* make sure that both mms have a correct rss state */
-	sync_mm_rss(tsk->mm);
-	mm = dup_mm(tsk);
-	tsk->mm->context.alloc_pgste = 0;
-	if (!mm)
-		return -ENOMEM;
-
-#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+	down_write(&mm->mmap_sem);
 	/* split thp mappings and disable thp for future mappings */
 	thp_split_mm(mm);
-	mm->def_flags |= VM_NOHUGEPAGE;
-#endif
-
-	/* Now lets check again if something happened */
-	task_lock(tsk);
-	if (!tsk->mm || atomic_read(&tsk->mm->mm_users) > 1 ||
-#ifdef CONFIG_AIO
-	    !hlist_empty(&tsk->mm->ioctx_list) ||
-#endif
-	    tsk->mm != tsk->active_mm) {
-		mmput(mm);
-		task_unlock(tsk);
-		return -EINVAL;
-	}
-
-	/* ok, we are alone. No ptrace, no threads, etc. */
-	old_mm = tsk->mm;
-	tsk->mm = tsk->active_mm = mm;
-	preempt_disable();
-	update_mm(mm, tsk);
-	atomic_inc(&mm->context.attach_count);
-	atomic_dec(&old_mm->context.attach_count);
-	cpumask_set_cpu(smp_processor_id(), mm_cpumask(mm));
-	preempt_enable();
-	task_unlock(tsk);
-	mmput(old_mm);
-	return 0;
+	/* Reallocate the page tables with pgstes */
+	mm->context.has_pgste = 1;
+	tlb_gather_mmu(&tlb, mm, 0);
+	page_table_realloc(&tlb, mm, 0, TASK_SIZE);
+	tlb_finish_mmu(&tlb, 0, -1);
+	up_write(&mm->mmap_sem);
+	return mm->context.has_pgste ? 0 : -ENOMEM;
 }
 EXPORT_SYMBOL_GPL(s390_enable_sie);