From patchwork Wed Nov 6 15:58:49 2019
From: Dario Faggioli
To: xen-devel@lists.xenproject.org
Cc: George.Dunlap@eu.citrix.com, julien.grall@arm.com, sstabellini@kernel.org, jgross@suse.de
Date: Wed, 06 Nov 2019 16:58:49 +0100
Message-ID: <157305592941.20672.10855835711915878073.stgit@Palanthas>
User-Agent: StGit/0.19
Subject: [Xen-devel] [BUGFIX PATCH for-4.13] sched: fix dom0less boot with the null scheduler

In a dom0less configuration, if the null scheduler is used, the system
may fail to boot, because the loop in null_unit_wake() never exits.

Bisection showed that this behavior occurs since commit d545f1d6
("xen: sched: deal with vCPUs being or becoming online or offline"),
but the real problem is that, in this case, pick_res() always returns
the same CPU.

Fix this by dealing, in null_unit_wake(), only with the simple case,
i.e., when the vCPU that is coming online can be assigned to a sched.
resource right away. If it can't, just add it to the waitqueue; it will
be dealt with in null_schedule(), taking care not to race with
vcpu_wake().

Reported-by: Stefano Stabellini
Signed-off-by: Dario Faggioli
Tested-by: Stefano Stabellini
Reviewed-by: George Dunlap
---
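Just to illustrate the idea for reviewers, here is a minimal,
standalone C model of the new wake path (not Xen code: cpus_free,
hard_affinity, unit_wake() and tickle() are invented stand-ins for the
sched_null internals). A waking unit first tries the simple case, i.e.,
its previous CPU being free and within its hard affinity; otherwise it
goes to the waitqueue (elided here) and every matching free CPU is
tickled, so the first one that reschedules picks it up:

/* Build with: gcc -Wall wakepath.c -o wakepath */
#include <stdio.h>
#include <stdint.h>

#define NR_CPUS 8

static uint32_t cpus_free;            /* bit i set => CPU i is free */

struct unit {
    int prev_cpu;                     /* resource used last time */
    uint32_t hard_affinity;           /* CPUs the unit may run on */
};

static void tickle(int cpu)
{
    printf("raise SCHEDULE_SOFTIRQ on CPU %d\n", cpu);
}

/* Returns the CPU the unit was assigned to, or -1 if it was enqueued. */
static int unit_wake(struct unit *u)
{
    uint32_t mask;
    int cpu;

    /* Simple case: previous CPU is free and in the hard affinity mask. */
    if ( (cpus_free & u->hard_affinity) & (1u << u->prev_cpu) )
    {
        cpus_free &= ~(1u << u->prev_cpu);    /* i.e., unit_assign() */
        tickle(u->prev_cpu);
        return u->prev_cpu;
    }

    /*
     * Otherwise, enqueue and tickle all the free CPUs the unit can run
     * on; the first one that goes through the scheduler picks it up.
     */
    mask = cpus_free & u->hard_affinity;
    if ( mask == 0 )
        printf("WARNING: unit not assigned to any CPU!\n");
    for ( cpu = 0; cpu < NR_CPUS; cpu++ )
        if ( mask & (1u << cpu) )
            tickle(cpu);
    return -1;
}

int main(void)
{
    struct unit u = { .prev_cpu = 2, .hard_affinity = 0x0c }; /* CPUs 2,3 */

    cpus_free = 0x08;     /* only CPU 3 free: slow path, tickle CPU 3 */
    unit_wake(&u);
    cpus_free = 0x0c;     /* CPU 2 free again: fast path, take CPU 2  */
    unit_wake(&u);
    return 0;
}

Note how the slow path only tickles CPUs; with the old code, the wake
path itself tried to grab a CPU, and could spin forever when pick_res()
kept returning the same, already-taken one.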
 xen/common/sched_null.c |  113 +++++++++++++++++++++++++++++++----------------
 1 file changed, 75 insertions(+), 38 deletions(-)

diff --git a/xen/common/sched_null.c b/xen/common/sched_null.c
index 2525464a7c..88bd11a187 100644
--- a/xen/common/sched_null.c
+++ b/xen/common/sched_null.c
@@ -568,50 +568,52 @@ static void null_unit_wake(const struct scheduler *ops,
     else
         SCHED_STAT_CRANK(unit_wake_not_runnable);
 
+    if ( likely(per_cpu(npc, cpu).unit == unit) )
+    {
+        cpu_raise_softirq(cpu, SCHEDULE_SOFTIRQ);
+        return;
+    }
+
     /*
      * If a unit is neither on a pCPU nor in the waitqueue, it means it was
-     * offline, and that it is now coming back being online.
+     * offline, and that it is now coming back being online. If we're lucky,
+     * and its previous resource is free (and affinities match), we can just
+     * assign the unit to it (we own the proper lock already) and be done.
      */
-    if ( unlikely(per_cpu(npc, cpu).unit != unit && list_empty(&nvc->waitq_elem)) )
+    if ( per_cpu(npc, cpu).unit == NULL &&
+         unit_check_affinity(unit, cpu, BALANCE_HARD_AFFINITY) )
     {
-        spin_lock(&prv->waitq_lock);
-        list_add_tail(&nvc->waitq_elem, &prv->waitq);
-        spin_unlock(&prv->waitq_lock);
-
-        cpumask_and(cpumask_scratch_cpu(cpu), unit->cpu_hard_affinity,
-                    cpupool_domain_master_cpumask(unit->domain));
-
-        if ( !cpumask_intersects(&prv->cpus_free, cpumask_scratch_cpu(cpu)) )
+        if ( !has_soft_affinity(unit) ||
+             unit_check_affinity(unit, cpu, BALANCE_SOFT_AFFINITY) )
         {
-            dprintk(XENLOG_G_WARNING, "WARNING: d%dv%d not assigned to any CPU!\n",
-                    unit->domain->domain_id, unit->unit_id);
+            unit_assign(prv, unit, cpu);
+            cpu_raise_softirq(cpu, SCHEDULE_SOFTIRQ);
             return;
         }
+    }
 
-        /*
-         * Now we would want to assign the unit to cpu, but we can't, because
-         * we don't have the lock. So, let's do the following:
-         *  - try to remove cpu from the list of free cpus, to avoid races with
-         *    other onlining, inserting or migrating operations;
-         *  - tickle the cpu, which will pickup work from the waitqueue, and
-         *    assign it to itself;
-         *  - if we're racing already, and if there still are free cpus, try
-         *    again.
-         */
-        while ( cpumask_intersects(&prv->cpus_free, cpumask_scratch_cpu(cpu)) )
-        {
-            unsigned int new_cpu = pick_res(prv, unit)->master_cpu;
+    /*
+     * If the resource is not free (or affinities do not match) we need
+     * to assign unit to some other one, but we can't do it here, as:
+     *  - we don't own the proper lock,
+     *  - we can't change v->processor under vcpu_wake()'s feet.
+     * So we add it to the waitqueue, and tickle all the free CPUs (if any)
+     * on which unit can run. The first one that schedules will pick it up.
+     */
+    spin_lock(&prv->waitq_lock);
+    list_add_tail(&nvc->waitq_elem, &prv->waitq);
+    spin_unlock(&prv->waitq_lock);
 
-            if ( test_and_clear_bit(new_cpu, &prv->cpus_free) )
-            {
-                cpu_raise_softirq(new_cpu, SCHEDULE_SOFTIRQ);
-                return;
-            }
-        }
-    }
+    cpumask_and(cpumask_scratch_cpu(cpu), unit->cpu_hard_affinity,
+                cpupool_domain_master_cpumask(unit->domain));
+    cpumask_and(cpumask_scratch_cpu(cpu), cpumask_scratch_cpu(cpu),
+                &prv->cpus_free);
 
-    /* Note that we get here only for units assigned to a pCPU */
-    cpu_raise_softirq(sched_unit_master(unit), SCHEDULE_SOFTIRQ);
+    if ( cpumask_empty(cpumask_scratch_cpu(cpu)) )
+        dprintk(XENLOG_G_WARNING, "WARNING: d%dv%d not assigned to any CPU!\n",
+                unit->domain->domain_id, unit->unit_id);
+    else
+        cpumask_raise_softirq(cpumask_scratch_cpu(cpu), SCHEDULE_SOFTIRQ);
 }
 
 static void null_unit_sleep(const struct scheduler *ops,
@@ -827,6 +829,8 @@ static void null_schedule(const struct scheduler *ops, struct sched_unit *prev,
      */
     if ( unlikely(prev->next_task == NULL) )
     {
+        bool unit_found;
+
         spin_lock(&prv->waitq_lock);
 
         if ( list_empty(&prv->waitq) )
@@ -839,6 +843,7 @@ static void null_schedule(const struct scheduler *ops, struct sched_unit *prev,
          * it only in cases where a pcpu has no unit associated (e.g., as
          * said above, the cpu has just joined a cpupool).
          */
+        unit_found = false;
         for_each_affinity_balance_step( bs )
         {
             list_for_each_entry( wvc, &prv->waitq, waitq_elem )
@@ -849,13 +854,45 @@ static void null_schedule(const struct scheduler *ops, struct sched_unit *prev,
 
                 if ( unit_check_affinity(wvc->unit, sched_cpu, bs) )
                 {
-                    unit_assign(prv, wvc->unit, sched_cpu);
-                    list_del_init(&wvc->waitq_elem);
-                    prev->next_task = wvc->unit;
-                    goto unlock;
+                    spinlock_t *lock;
+
+                    unit_found = true;
+
+                    /*
+                     * If the unit in the waitqueue has just come up online,
+                     * we risk racing with vcpu_wake(). To avoid this, sync
+                     * on the spinlock that vcpu_wake() holds, but only with
+                     * trylock, to avoid deadlock.
+                     */
+                    lock = pcpu_schedule_trylock(sched_unit_master(wvc->unit));
+
+                    /*
+                     * We know the vcpu's lock is not this resource's lock. In
+                     * fact, if it were, since this cpu is free, vcpu_wake()
+                     * would have assigned the unit here directly.
+                     */
+                    ASSERT(lock != get_sched_res(sched_cpu)->schedule_lock);
+
+                    if ( lock ) {
+                        unit_assign(prv, wvc->unit, sched_cpu);
+                        list_del_init(&wvc->waitq_elem);
+                        prev->next_task = wvc->unit;
+                        spin_unlock(lock);
+                        goto unlock;
+                    }
                 }
             }
         }
 
+        /*
+         * If we did find a unit with suitable affinity in the waitqueue, but
+         * we could not pick it up (due to lock contention), and hence we are
+         * still free, plan for another try. In fact, we don't want such a
+         * unit to be stuck in the waitqueue when there are free cpus where
+         * it could run.
+         */
+        if ( unlikely( unit_found && prev->next_task == NULL &&
+                       !list_empty(&prv->waitq)) )
+            cpu_raise_softirq(cur_cpu, SCHEDULE_SOFTIRQ);
 
  unlock:
         spin_unlock(&prv->waitq_lock);
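As a side note for reviewers: the trylock-based synchronization used in
null_schedule() above can be modeled outside of Xen as below (a minimal
pthread sketch; unit_lock and try_pick_from_waitq() are invented names,
not Xen APIs). The scheduler side never blocks on the lock that the
wake side may hold; on contention it backs off and relies on being
tickled again:

/* Build with: gcc -Wall trylock.c -o trylock -lpthread */
#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>

static pthread_mutex_t unit_lock = PTHREAD_MUTEX_INITIALIZER;

/* Scheduler side: take the waking unit's lock only opportunistically. */
static bool try_pick_from_waitq(void)
{
    if ( pthread_mutex_trylock(&unit_lock) != 0 )
        return false;          /* contended: give up, retry via softirq */

    /* ... unit_assign() + list_del_init() would happen here ... */
    pthread_mutex_unlock(&unit_lock);
    return true;
}

int main(void)
{
    /* Uncontended: the pick succeeds on the first attempt. */
    printf("picked: %s\n", try_pick_from_waitq() ? "yes" : "no");

    /* Contended (as if vcpu_wake() held the lock): trylock backs off. */
    pthread_mutex_lock(&unit_lock);
    printf("picked: %s\n", try_pick_from_waitq() ? "yes" : "no");
    pthread_mutex_unlock(&unit_lock);
    return 0;
}

This also mirrors why the ASSERT in the hunk above holds: if the waking
side owned this resource's lock, it would have assigned the unit to the
free CPU directly, and the scheduler side would never find it in the
waitqueue.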