From patchwork Mon Jan 15 17:03:11 2018
X-Patchwork-Submitter: Claudio Imbrenda
X-Patchwork-Id: 10164843
From: Claudio Imbrenda <imbrenda@linux.vnet.ibm.com>
To: kvm@vger.kernel.org
Cc: borntraeger@de.ibm.com, cohuck@redhat.com, pbonzini@redhat.com,
    david@redhat.com, schwidefsky@de.ibm.com
Subject: [PATCH v1 2/2] KVM: s390: Fix storage attributes migration with memory slots
Date: Mon, 15 Jan 2018 18:03:11 +0100
X-Mailer: git-send-email 2.7.4
In-Reply-To: <1516035791-10609-1-git-send-email-imbrenda@linux.vnet.ibm.com>
References: <1516035791-10609-1-git-send-email-imbrenda@linux.vnet.ibm.com>
Message-Id: <1516035791-10609-3-git-send-email-imbrenda@linux.vnet.ibm.com>

This is a fix for several issues that were found in the original code for
storage attributes migration.

No extra bitmap is allocated any more to keep track of dirty storage
attributes; instead, the extra bits of the per-memslot bitmap that are
always present anyway are now used for this purpose.

The code has also been refactored a little to improve readability.
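As background for the approach (this note and the sketch are illustrative and
not part of the patch): kvm_create_dirty_bitmap() allocates every memslot
dirty bitmap with room for 2 * kvm_dirty_bitmap_bytes(), and the second half
is only used as a scratch buffer on architectures such as x86, never on s390.
The _cmma_bitmap() helper referenced in the diff below points at that unused
second half; a minimal sketch of such a helper, under that assumption, could
look like this (the name cmma_bitmap here is hypothetical):

    /*
     * Illustrative sketch, not part of the patch: address the unused second
     * half of the per-memslot dirty bitmap as a bitmap of one bit per guest
     * page, used to mark pages with dirty storage (CMMA) attributes.
     */
    static inline unsigned long *cmma_bitmap(struct kvm_memory_slot *ms)
    {
            return ms->dirty_bitmap +
                   kvm_dirty_bitmap_bytes(ms) / sizeof(*ms->dirty_bitmap);
    }

Starting migration then only has to memset() this second half to 0xFF for
every slot (all pages dirty), while the ESSA handler and
KVM_S390_GET_CMMA_BITS set and clear individual bits in it, with
kvm->arch.cmma_dirty_pages counting how many bits are still set.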
Signed-off-by: Claudio Imbrenda
Fixes: 190df4a212a ("KVM: s390: CMMA tracking, ESSA emulation, migration mode")
Fixes: 4036e3874a1 ("KVM: s390: ioctls to get and set guest storage attributes")
---
 arch/s390/include/asm/kvm_host.h |   9 +-
 arch/s390/kvm/kvm-s390.c         | 246 ++++++++++++++++++++++-----------------
 arch/s390/kvm/priv.c             |  33 ++++--
 3 files changed, 168 insertions(+), 120 deletions(-)

diff --git a/arch/s390/include/asm/kvm_host.h b/arch/s390/include/asm/kvm_host.h
index 819838c..2ca73b0 100644
--- a/arch/s390/include/asm/kvm_host.h
+++ b/arch/s390/include/asm/kvm_host.h
@@ -763,12 +763,6 @@ struct kvm_s390_vsie {
 	struct page *pages[KVM_MAX_VCPUS];
 };
 
-struct kvm_s390_migration_state {
-	unsigned long bitmap_size;	/* in bits (number of guest pages) */
-	atomic64_t dirty_pages;		/* number of dirty pages */
-	unsigned long *pgste_bitmap;
-};
-
 struct kvm_arch{
 	void *sca;
 	int use_esca;
@@ -796,7 +790,8 @@ struct kvm_arch{
 	struct kvm_s390_vsie vsie;
 	u8 epdx;
 	u64 epoch;
-	struct kvm_s390_migration_state *migration_state;
+	atomic_t migration_mode;
+	atomic64_t cmma_dirty_pages;
 	/* subset of available cpu features enabled by user space */
 	DECLARE_BITMAP(cpu_feat, KVM_S390_VM_CPU_FEAT_NR_BITS);
 	struct kvm_s390_gisa *gisa;
diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
index 100ea15..c8e1cce 100644
--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -778,52 +778,38 @@ static inline unsigned long *_cmma_bitmap(struct kvm_memory_slot *ms)
  */
 static int kvm_s390_vm_start_migration(struct kvm *kvm)
 {
-	struct kvm_s390_migration_state *mgs;
-	struct kvm_memory_slot *ms;
-	/* should be the only one */
 	struct kvm_memslots *slots;
-	unsigned long ram_pages;
+	struct kvm_memory_slot *ms;
+	unsigned long ram_pages = 0;
 	int slotnr;
 
 	/* migration mode already enabled */
-	if (kvm->arch.migration_state)
+	if (atomic_read(&kvm->arch.migration_mode))
 		return 0;
-
 	slots = kvm_memslots(kvm);
 	if (!slots || !slots->used_slots)
 		return -EINVAL;
 
-	mgs = kzalloc(sizeof(*mgs), GFP_KERNEL);
-	if (!mgs)
-		return -ENOMEM;
-	kvm->arch.migration_state = mgs;
-
 	if (kvm->arch.use_cmma) {
-		/*
-		 * Get the last slot. They should be sorted by base_gfn, so the
-		 * last slot is also the one at the end of the address space.
-		 * We have verified above that at least one slot is present.
-		 */
-		ms = slots->memslots + slots->used_slots - 1;
-		/* round up so we only use full longs */
-		ram_pages = roundup(ms->base_gfn + ms->npages, BITS_PER_LONG);
-		/* allocate enough bytes to store all the bits */
-		mgs->pgste_bitmap = vmalloc(ram_pages / 8);
-		if (!mgs->pgste_bitmap) {
-			kfree(mgs);
-			kvm->arch.migration_state = NULL;
-			return -ENOMEM;
-		}
-
-		mgs->bitmap_size = ram_pages;
-		atomic64_set(&mgs->dirty_pages, ram_pages);
 		/* mark all the pages in active slots as dirty */
 		for (slotnr = 0; slotnr < slots->used_slots; slotnr++) {
 			ms = slots->memslots + slotnr;
-			bitmap_set(mgs->pgste_bitmap, ms->base_gfn, ms->npages);
+			/*
+			 * The second half of the bitmap is only used on x86,
+			 * and would be wasted otherwise, so we put it to good
+			 * use here to keep track of the state of the storage
+			 * attributes.
+			 */
+			memset(_cmma_bitmap(ms), 0xFF,
+			       kvm_dirty_bitmap_bytes(ms));
+			ram_pages += ms->npages;
 		}
+		atomic64_set(&kvm->arch.cmma_dirty_pages, ram_pages);
+		atomic_set(&kvm->arch.migration_mode, 1);
 		kvm_s390_sync_request_broadcast(kvm, KVM_REQ_START_MIGRATION);
+	} else {
+		atomic_set(&kvm->arch.migration_mode, 1);
 	}
 	return 0;
 }
@@ -834,19 +820,12 @@ static int kvm_s390_vm_start_migration(struct kvm *kvm)
  */
 static int kvm_s390_vm_stop_migration(struct kvm *kvm)
 {
-	struct kvm_s390_migration_state *mgs;
-
 	/* migration mode already disabled */
-	if (!kvm->arch.migration_state)
+	if (!atomic_read(&kvm->arch.migration_mode))
 		return 0;
-	mgs = kvm->arch.migration_state;
-	kvm->arch.migration_state = NULL;
-
-	if (kvm->arch.use_cmma) {
+	atomic_set(&kvm->arch.migration_mode, 0);
+	if (kvm->arch.use_cmma)
 		kvm_s390_sync_request_broadcast(kvm, KVM_REQ_STOP_MIGRATION);
-		vfree(mgs->pgste_bitmap);
-	}
-	kfree(mgs);
 	return 0;
 }
 
@@ -876,7 +855,7 @@ static int kvm_s390_vm_set_migration(struct kvm *kvm,
 static int kvm_s390_vm_get_migration(struct kvm *kvm,
 				     struct kvm_device_attr *attr)
 {
-	u64 mig = (kvm->arch.migration_state != NULL);
+	u64 mig = atomic_read(&kvm->arch.migration_mode);
 
 	if (attr->attr != KVM_S390_VM_MIGRATION_STATUS)
 		return -ENXIO;
@@ -1551,6 +1530,112 @@ static int gfn_to_memslot_approx(struct kvm *kvm, gfn_t gfn)
 	return start;
 }
 
+static int kvm_s390_peek_cmma(struct kvm *kvm, struct kvm_s390_cmma_log *args,
+			      u8 *res, unsigned long bufsize)
+{
+	unsigned long pgstev, cur, hva, i = 0;
+	int r, ret = 0;
+
+	cur = args->start_gfn;
+	while (i < bufsize) {
+		hva = gfn_to_hva(kvm, cur);
+		if (kvm_is_error_hva(hva)) {
+			if (!i)
+				ret = -EFAULT;
+			break;
+		}
+		r = get_pgste(kvm->mm, hva, &pgstev);
+		if (r < 0)
+			pgstev = 0;
+		res[i++] = (pgstev >> 24) & 0x43;
+		cur++;
+	}
+	args->count = i;
+
+	return ret;
+}
+
+static unsigned long kvm_s390_next_dirty_cmma(struct kvm *kvm,
+					      unsigned long cur)
+{
+	struct kvm_memslots *slots = kvm_memslots(kvm);
+	struct kvm_memory_slot *ms;
+	int slotidx;
+
+	slotidx = gfn_to_memslot_approx(kvm, cur);
+	ms = slots->memslots + slotidx;
+
+	if (ms->base_gfn + ms->npages <= cur) {
+		slotidx--;
+		/* If we are above the highest slot, wrap around */
+		if (slotidx < 0)
+			slotidx = slots->used_slots - 1;
+
+		ms = slots->memslots + slotidx;
+		cur = ms->base_gfn;
+	}
+	cur = find_next_bit(_cmma_bitmap(ms), ms->npages, cur - ms->base_gfn);
+	while ((slotidx > 0) && (cur >= ms->npages)) {
+		slotidx--;
+		ms = slots->memslots + slotidx;
+		cur = find_next_bit(_cmma_bitmap(ms), ms->npages,
+				    cur - ms->base_gfn);
+	}
+	return cur + ms->base_gfn;
+}
+
+static int kvm_s390_get_cmma(struct kvm *kvm, struct kvm_s390_cmma_log *args,
+			     u8 *res, unsigned long bufsize)
+{
+	unsigned long next, mem_end, cur, hva, pgstev, i = 0;
+	struct kvm_memslots *slots = kvm_memslots(kvm);
+	struct kvm_memory_slot *ms;
+	int r, ret = 0;
+
+	cur = kvm_s390_next_dirty_cmma(kvm, args->start_gfn);
+	ms = gfn_to_memslot(kvm, cur);
+	args->count = 0;
+	args->start_gfn = 0;
+	if (!ms)
+		return 0;
+	next = kvm_s390_next_dirty_cmma(kvm, cur + 1);
+	mem_end = slots->memslots[0].base_gfn + slots->memslots[0].npages;
+
+	args->start_gfn = cur;
+	while (i < bufsize) {
+		hva = gfn_to_hva(kvm, cur);
+		if (kvm_is_error_hva(hva))
+			break;
+		/* decrement only if we actually flipped the bit to 0 */
+		if (test_and_clear_bit(cur - ms->base_gfn, _cmma_bitmap(ms)))
+			atomic64_dec(&kvm->arch.cmma_dirty_pages);
+		r = get_pgste(kvm->mm, hva, &pgstev);
+		if (r < 0)
+			pgstev = 0;
+		/* save the value */
+		res[i++] = (pgstev >> 24) & 0x43;
+		/*
+		 * if the next bit is too far away, stop.
+		 * if we reached the previous "next", find the next one
+		 */
+		if (next > cur + KVM_S390_MAX_BIT_DISTANCE)
+			break;
+		if (cur == next)
+			next = kvm_s390_next_dirty_cmma(kvm, cur + 1);
+		/* reached the end of the bitmap or of the buffer, stop */
+		if ((next >= mem_end) || (next - args->start_gfn >= bufsize))
+			break;
+		cur++;
+		if (cur - ms->base_gfn >= ms->npages) {
+			ms = gfn_to_memslot(kvm, cur);
+			if (!ms)
+				break;
+		}
+	}
+	args->count = i;
+	return ret;
+}
+
 /*
  * This function searches for the next page with dirty CMMA attributes, and
  * saves the attributes in the buffer up to either the end of the buffer or
@@ -1562,90 +1647,47 @@ static int gfn_to_memslot_approx(struct kvm *kvm, gfn_t gfn)
 static int kvm_s390_get_cmma_bits(struct kvm *kvm,
 				  struct kvm_s390_cmma_log *args)
 {
-	struct kvm_s390_migration_state *s = kvm->arch.migration_state;
-	unsigned long bufsize, hva, pgstev, i, next, cur;
-	int srcu_idx, peek, r = 0, rr;
+	unsigned long bufsize;
+	int srcu_idx, peek, s, rr, r = 0;
 	u8 *res;
 
-	cur = args->start_gfn;
-	i = next = pgstev = 0;
+	s = atomic_read(&kvm->arch.migration_mode);
 	if (unlikely(!kvm->arch.use_cmma))
 		return -ENXIO;
 	/* Invalid/unsupported flags were specified */
-	if (args->flags & ~KVM_S390_CMMA_PEEK)
+	if (unlikely(args->flags & ~KVM_S390_CMMA_PEEK))
 		return -EINVAL;
 	/* Migration mode query, and we are not doing a migration */
 	peek = !!(args->flags & KVM_S390_CMMA_PEEK);
-	if (!peek && !s)
+	if (unlikely(!peek && !s))
 		return -EINVAL;
 	/* CMMA is disabled or was not used, or the buffer has length zero */
 	bufsize = min(args->count, KVM_S390_CMMA_SIZE_MAX);
-	if (!bufsize || !kvm->mm->context.use_cmma) {
+	if (unlikely(!bufsize || !kvm->mm->context.use_cmma)) {
 		memset(args, 0, sizeof(*args));
 		return 0;
 	}
-
-	if (!peek) {
-		/* We are not peeking, and there are no dirty pages */
-		if (!atomic64_read(&s->dirty_pages)) {
-			memset(args, 0, sizeof(*args));
-			return 0;
-		}
-		cur = find_next_bit(s->pgste_bitmap, s->bitmap_size,
-				    args->start_gfn);
-		if (cur >= s->bitmap_size)	/* nothing found, loop back */
-			cur = find_next_bit(s->pgste_bitmap, s->bitmap_size, 0);
-		if (cur >= s->bitmap_size) {	/* again! (very unlikely) */
-			memset(args, 0, sizeof(*args));
-			return 0;
-		}
-		next = find_next_bit(s->pgste_bitmap, s->bitmap_size, cur + 1);
+	/* We are not peeking, and there are no dirty pages */
+	if (!peek && !atomic64_read(&kvm->arch.cmma_dirty_pages)) {
+		memset(args, 0, sizeof(*args));
+		return 0;
 	}
 
 	res = vmalloc(bufsize);
 	if (!res)
		return -ENOMEM;
 
-	args->start_gfn = cur;
-
 	down_read(&kvm->mm->mmap_sem);
 	srcu_idx = srcu_read_lock(&kvm->srcu);
-	while (i < bufsize) {
-		hva = gfn_to_hva(kvm, cur);
-		if (kvm_is_error_hva(hva)) {
-			r = -EFAULT;
-			break;
-		}
-		/* decrement only if we actually flipped the bit to 0 */
-		if (!peek && test_and_clear_bit(cur, s->pgste_bitmap))
-			atomic64_dec(&s->dirty_pages);
-		r = get_pgste(kvm->mm, hva, &pgstev);
-		if (r < 0)
-			pgstev = 0;
-		/* save the value */
-		res[i++] = (pgstev >> 24) & 0x43;
-		/*
-		 * if the next bit is too far away, stop.
-		 * if we reached the previous "next", find the next one
-		 */
-		if (!peek) {
-			if (next > cur + KVM_S390_MAX_BIT_DISTANCE)
-				break;
-			if (cur == next)
-				next = find_next_bit(s->pgste_bitmap,
-						     s->bitmap_size, cur + 1);
-			/* reached the end of the bitmap or of the buffer, stop */
-			if ((next >= s->bitmap_size) ||
-			    (next >= args->start_gfn + bufsize))
-				break;
-		}
-		cur++;
-	}
+	if (peek)
+		r = kvm_s390_peek_cmma(kvm, args, res, bufsize);
+	else
+		r = kvm_s390_get_cmma(kvm, args, res, bufsize);
 	srcu_read_unlock(&kvm->srcu, srcu_idx);
 	up_read(&kvm->mm->mmap_sem);
-	args->count = i;
-	args->remaining = s ? atomic64_read(&s->dirty_pages) : 0;
+
+	args->remaining = s ? atomic64_read(&kvm->arch.cmma_dirty_pages) : 0;
 
 	rr = copy_to_user((void __user *)args->values, res, args->count);
 	if (rr)
@@ -2096,10 +2138,6 @@ void kvm_arch_destroy_vm(struct kvm *kvm)
 	kvm_s390_destroy_adapters(kvm);
 	kvm_s390_clear_float_irqs(kvm);
 	kvm_s390_vsie_destroy(kvm);
-	if (kvm->arch.migration_state) {
-		vfree(kvm->arch.migration_state->pgste_bitmap);
-		kfree(kvm->arch.migration_state);
-	}
 	KVM_EVENT(3, "vm 0x%pK destroyed", kvm);
 }
 
diff --git a/arch/s390/kvm/priv.c b/arch/s390/kvm/priv.c
index 572496c..321d6b2 100644
--- a/arch/s390/kvm/priv.c
+++ b/arch/s390/kvm/priv.c
@@ -954,9 +954,11 @@ static int handle_pfmf(struct kvm_vcpu *vcpu)
 	return 0;
 }
 
-static inline int do_essa(struct kvm_vcpu *vcpu, const int orc)
+/*
+ * Must be called with relevant read locks held (kvm->mm->mmap_sem, kvm->srcu)
+ */
+static inline int __do_essa(struct kvm_vcpu *vcpu, const int orc)
 {
-	struct kvm_s390_migration_state *ms = vcpu->kvm->arch.migration_state;
 	int r1, r2, nappended, entries;
 	unsigned long gfn, hva, res, pgstev, ptev;
 	unsigned long *cbrlo;
@@ -965,7 +967,6 @@ static inline int do_essa(struct kvm_vcpu *vcpu, const int orc)
 	 * We don't need to set SD.FPF.SK to 1 here, because if we have a
 	 * machine check here we either handle it or crash
 	 */
-
 	kvm_s390_get_regs_rre(vcpu, &r1, &r2);
 	gfn = vcpu->run->s.regs.gprs[r2] >> PAGE_SHIFT;
 	hva = gfn_to_hva(vcpu->kvm, gfn);
@@ -1007,9 +1008,17 @@ static inline int do_essa(struct kvm_vcpu *vcpu, const int orc)
 	}
 
 	if (orc) {
-		/* increment only if we are really flipping the bit to 1 */
-		if (!test_and_set_bit(gfn, ms->pgste_bitmap))
-			atomic64_inc(&ms->dirty_pages);
+		struct kvm_memory_slot *ms = gfn_to_memslot(vcpu->kvm, gfn);
+		unsigned long *bm;
+
+		if (ms) {
+			/* The cmma bitmap is right after the memory bitmap */
+			bm = ms->dirty_bitmap + kvm_dirty_bitmap_bytes(ms)
+						/ sizeof(*ms->dirty_bitmap);
+			/* Increment only if we are really flipping the bit */
+			if (!test_and_set_bit(gfn - ms->base_gfn, bm))
+				atomic64_inc(&vcpu->kvm->arch.cmma_dirty_pages);
+		}
 	}
 
 	return nappended;
@@ -1038,7 +1047,7 @@ static int handle_essa(struct kvm_vcpu *vcpu)
 						: ESSA_SET_STABLE_IF_RESIDENT))
 		return kvm_s390_inject_program_int(vcpu, PGM_SPECIFICATION);
 
-	if (likely(!vcpu->kvm->arch.migration_state)) {
+	if (likely(!atomic_read(&vcpu->kvm->arch.migration_mode))) {
 		/*
 		 * CMMA is enabled in the KVM settings, but is disabled in
 		 * the SIE block and in the mm_context, and we are not doing
@@ -1066,10 +1075,16 @@ static int handle_essa(struct kvm_vcpu *vcpu)
 		/* Retry the ESSA instruction */
 		kvm_s390_retry_instr(vcpu);
 	} else {
-		/* Account for the possible extra cbrl entry */
-		i = do_essa(vcpu, orc);
+		int srcu_idx;
+
+		down_read(&vcpu->kvm->mm->mmap_sem);
+		srcu_idx = srcu_read_lock(&vcpu->kvm->srcu);
+		i = __do_essa(vcpu, orc);
+		srcu_read_unlock(&vcpu->kvm->srcu, srcu_idx);
+		up_read(&vcpu->kvm->mm->mmap_sem);
 		if (i < 0)
 			return i;
+		/* Account for the possible extra cbrl entry */
 		entries += i;
 	}
 	vcpu->arch.sie_block->cbrlo &= PAGE_MASK;	/* reset nceo */
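
For completeness, a hedged sketch of how a userspace migration tool could
consume the data exposed by this code through the KVM_S390_GET_CMMA_BITS vm
ioctl and struct kvm_s390_cmma_log from the KVM uapi. The function name,
buffer size and vm_fd below are made up for illustration, and error handling
is omitted:

    #include <stdint.h>
    #include <sys/ioctl.h>
    #include <linux/kvm.h>

    #define CMMA_BUF_PAGES 4096     /* hypothetical chunk size */

    /* vm_fd is an open KVM VM file descriptor */
    static void drain_cmma(int vm_fd)
    {
            uint8_t values[CMMA_BUF_PAGES];
            struct kvm_s390_cmma_log log = {
                    .start_gfn = 0,
                    /* 0 consumes dirty bits; KVM_S390_CMMA_PEEK only reads */
                    .flags = 0,
            };

            do {
                    log.count = CMMA_BUF_PAGES;
                    log.values = (uint64_t)(uintptr_t)values;
                    if (ioctl(vm_fd, KVM_S390_GET_CMMA_BITS, &log) < 0)
                            break;
                    /*
                     * values[0..log.count-1] now holds one attribute byte per
                     * guest page starting at log.start_gfn (the kernel moves
                     * start_gfn forward to the first dirty page it found);
                     * forward them to the destination here.
                     */
                    log.start_gfn += log.count;
            } while (log.count && log.remaining);
    }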