From patchwork Thu Sep 19 20:04:27 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Mina Almasry X-Patchwork-Id: 11153217 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id E92AF14DB for ; Thu, 19 Sep 2019 20:04:39 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id A9544217D6 for ; Thu, 19 Sep 2019 20:04:39 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b="Omgake2b" DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org A9544217D6 Authentication-Results: mail.kernel.org; dmarc=fail (p=reject dis=none) header.from=google.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 51FA56B0266; Thu, 19 Sep 2019 16:04:38 -0400 (EDT) Delivered-To: linux-mm-outgoing@kvack.org Received: by kanga.kvack.org (Postfix, from userid 40) id 487396B026B; Thu, 19 Sep 2019 16:04:38 -0400 (EDT) X-Original-To: int-list-linux-mm@kvack.org X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 3263A6B0276; Thu, 19 Sep 2019 16:04:38 -0400 (EDT) X-Original-To: linux-mm@kvack.org X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0005.hostedemail.com [216.40.44.5]) by kanga.kvack.org (Postfix) with ESMTP id 0BFC66B0266 for ; Thu, 19 Sep 2019 16:04:38 -0400 (EDT) Received: from smtpin17.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay01.hostedemail.com (Postfix) with SMTP id A4F20180AD807 for ; Thu, 19 Sep 2019 20:04:37 +0000 (UTC) X-FDA: 75952747794.17.play00_457236f1a8447 X-Spam-Summary: 2,0,0,5f58f82fe8aedf3d,d41d8cd98f00b204,3vn-dxqskcbuv67vdcj738v19916z.x97638fi-775gvx5.9c1@flex--almasrymina.bounces.google.com,:mike.kravetz@oracle.com:almasrymina@google.com:rientjes@google.com:shakeelb@google.com:gthelen@google.com:akpm@linux-foundation.org:linux-kernel@vger.kernel.org:,RULES_HIT:2:41:69:152:355:379:541:800:960:966:968:973:982:988:989:1260:1277:1313:1314:1345:1359:1431:1437:1516:1518:1535:1593:1594:1605:1606:1730:1747:1777:1792:2196:2198:2199:2200:2393:2559:2562:2693:2902:3138:3139:3140:3141:3142:3152:3865:3866:3867:3868:3870:3871:3872:3874:4119:4321:4385:5007:6119:6261:6653:7903:7904:8603:8660:9010:9592:9969:10004:11026:11232:11473:11658:11914:12043:12295:12296:12297:12555:12679:12683:12895:13141:13148:13230:14096:14097:14394:14659:21080:21324:21444:21451:21627:21740:21939:30012:30034:30054:30064:30070,0,RBL:209.85.222.73:@flex--almasrymina.bounces.google.com:.lbl8.mailshell.net-62.18.0.100 66.100.201.100,CacheIP:none,Bayesian:0.5,0.5,0.5, Netcheck X-HE-Tag: play00_457236f1a8447 X-Filterd-Recvd-Size: 8577 Received: from mail-ua1-f73.google.com (mail-ua1-f73.google.com [209.85.222.73]) by imf41.hostedemail.com (Postfix) with ESMTP for ; Thu, 19 Sep 2019 20:04:37 +0000 (UTC) Received: by mail-ua1-f73.google.com with SMTP id t16so918406uae.21 for ; Thu, 19 Sep 2019 13:04:36 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20161025; h=date:in-reply-to:message-id:mime-version:references:subject:from:to :cc; bh=VgzUh1n1rberaTXXihsqmsmieopIuBTorRpxm9/WwWg=; b=Omgake2bCvusDoi9TS/gj/AqeLx5vELUrVjRQKHqhEkF4Tgs2bPF15on7pAmUpSTf1 TX75BbMM8sNz9Job3O0OlVylOKu+udFtDc19H3cEPJYtzrrJh+fx3HBKn09ouO2QuOVO LDna1cAUm2xmBFMEaH/DIqALGed7Kk1YxGfsKFT30uzvzaxUGN8wxB+/ATzarVrRS3O4 lQmWEn9MMfzf3QfTpPTL2ItkdgZGZTP8JNf6pBBvXGFxg53nq3c4Svp9beVDZrgaqvgH ZFb9gM5KSvjCrEYfeBOOBj2vJcmDAEPvdgAGrngnL7O0IMy4HSGz8ijxnkgA359O6k3c 2LOA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:in-reply-to:message-id:mime-version :references:subject:from:to:cc; bh=VgzUh1n1rberaTXXihsqmsmieopIuBTorRpxm9/WwWg=; b=Gqmo598rkP7KHn2lOWa2+KQc1hv5QJCwKR1IF/ClHuA6zQcSmogIqNRHmvYew5UMuj sgii2IsgxTAIbAydeVdhvlhkOL/M5j701as/k11amewBjBBNdMR2RC1cDjI9HbZtcUmb hTIswHlEvFv35MeGOdSwkkRgcxmcWaT2/hw0yjoxQEZsPHqDAbLdESjdbDU/wp5Wc8fR c6QIQWmc6GO7cta3R2FQcU0vPpv59sdZe5+/k9WWAoe2bw3fzNb4qyV7WFIR4bVrpCSU DUpLAKH0/dKCDZO8944opi4rH3thHljm8HF/ofH541f/n6JALZ90tHtFHR+kpsz+E6Uu d4aw== X-Gm-Message-State: APjAAAUQQ+4DQwGLKSVuqZ0Z8R646PsCPvDxYlnBYNljVrzMOhf/+wyG R+sFfpsSatWhTI8FKIovPLVfu4ZxKUgMzveHkw== X-Google-Smtp-Source: APXvYqzyOKasUBwfbVumiUXbU2+j6HGE8tYCxCuoJohu/6zOJWSyZX9I9TiTmOtlhlUuc6NKPZaGSAV3RwMfKd7Djw== X-Received: by 2002:a1f:19d8:: with SMTP id 207mr5740981vkz.54.1568923476284; Thu, 19 Sep 2019 13:04:36 -0700 (PDT) Date: Thu, 19 Sep 2019 13:04:27 -0700 In-Reply-To: <20190919200428.188797-1-almasrymina@google.com> Message-Id: <20190919200428.188797-2-almasrymina@google.com> Mime-Version: 1.0 References: <20190919200428.188797-1-almasrymina@google.com> X-Mailer: git-send-email 2.23.0.351.gc4317032e6-goog Subject: [PATCH 1/2] hugetlb: region_chg provides only cache entry From: Mina Almasry To: mike.kravetz@oracle.com Cc: almasrymina@google.com, rientjes@google.com, shakeelb@google.com, gthelen@google.com, akpm@linux-foundation.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: Current behavior is that region_chg provides both a cache entry in resv->region_cache, AND a placeholder entry in resv->regions. region_add first tries to use the placeholder, and if it finds that the placeholder has been deleted by a racing region_del call, it uses the cache entry. This behavior is completely unnecessary and is removed in this patch for a couple of reasons: 1. region_add needs to either find a cached file_region entry in resv->region_cache, or find an entry in resv->regions to expand. It does not need both. 2. region_chg adding a placeholder entry in resv->regions opens up a possible race with region_del, where region_chg adds a placeholder region in resv->regions, and this region is deleted by a racing call to region_del during region_chg execution or before region_add is called. Removing the race makes the code easier to reason about and maintain. In addition, a follow up patch in another series that disables region coalescing, which would be further complicated if the race with region_del exists. Signed-off-by: Mina Almasry Reviewed-by: Mike Kravetz --- mm/hugetlb.c | 63 +++++++++------------------------------------------- 1 file changed, 11 insertions(+), 52 deletions(-) -- 2.23.0.351.gc4317032e6-goog diff --git a/mm/hugetlb.c b/mm/hugetlb.c index 6d7296dd11b8..a14f6047fc7e 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -246,14 +246,10 @@ struct file_region { /* * Add the huge page range represented by [f, t) to the reserve - * map. In the normal case, existing regions will be expanded - * to accommodate the specified range. Sufficient regions should - * exist for expansion due to the previous call to region_chg - * with the same range. However, it is possible that region_del - * could have been called after region_chg and modifed the map - * in such a way that no region exists to be expanded. In this - * case, pull a region descriptor from the cache associated with - * the map and use that for the new range. + * map. Existing regions will be expanded to accommodate the specified + * range, or a region will be taken from the cache. Sufficient regions + * must exist in the cache due to the previous call to region_chg with + * the same range. * * Return the number of new huge pages added to the map. This * number is greater than or equal to zero. @@ -272,9 +268,8 @@ static long region_add(struct resv_map *resv, long f, long t) /* * If no region exists which can be expanded to include the - * specified range, the list must have been modified by an - * interleving call to region_del(). Pull a region descriptor - * from the cache and use it for this range. + * specified range, pull a region descriptor from the cache + * and use it for this range. */ if (&rg->link == head || t < rg->from) { VM_BUG_ON(resv->region_cache_count <= 0); @@ -339,15 +334,9 @@ static long region_add(struct resv_map *resv, long f, long t) * call to region_add that will actually modify the reserve * map to add the specified range [f, t). region_chg does * not change the number of huge pages represented by the - * map. However, if the existing regions in the map can not - * be expanded to represent the new range, a new file_region - * structure is added to the map as a placeholder. This is - * so that the subsequent region_add call will have all the - * regions it needs and will not fail. - * - * Upon entry, region_chg will also examine the cache of region descriptors - * associated with the map. If there are not enough descriptors cached, one - * will be allocated for the in progress add operation. + * map. A new file_region structure is added to the cache + * as a placeholder, so that the subsequent region_add + * call will have all the regions it needs and will not fail. * * Returns the number of huge pages that need to be added to the existing * reservation map for the range [f, t). This number is greater or equal to @@ -357,10 +346,9 @@ static long region_add(struct resv_map *resv, long f, long t) static long region_chg(struct resv_map *resv, long f, long t) { struct list_head *head = &resv->regions; - struct file_region *rg, *nrg = NULL; + struct file_region *rg; long chg = 0; -retry: spin_lock(&resv->lock); retry_locked: resv->adds_in_progress++; @@ -378,10 +366,8 @@ static long region_chg(struct resv_map *resv, long f, long t) spin_unlock(&resv->lock); trg = kmalloc(sizeof(*trg), GFP_KERNEL); - if (!trg) { - kfree(nrg); + if (!trg) return -ENOMEM; - } spin_lock(&resv->lock); list_add(&trg->link, &resv->region_cache); @@ -394,28 +380,6 @@ static long region_chg(struct resv_map *resv, long f, long t) if (f <= rg->to) break; - /* If we are below the current region then a new region is required. - * Subtle, allocate a new region at the position but make it zero - * size such that we can guarantee to record the reservation. */ - if (&rg->link == head || t < rg->from) { - if (!nrg) { - resv->adds_in_progress--; - spin_unlock(&resv->lock); - nrg = kmalloc(sizeof(*nrg), GFP_KERNEL); - if (!nrg) - return -ENOMEM; - - nrg->from = f; - nrg->to = f; - INIT_LIST_HEAD(&nrg->link); - goto retry; - } - - list_add(&nrg->link, rg->link.prev); - chg = t - f; - goto out_nrg; - } - /* Round our left edge to the current segment if it encloses us. */ if (f > rg->from) f = rg->from; @@ -439,11 +403,6 @@ static long region_chg(struct resv_map *resv, long f, long t) } out: - spin_unlock(&resv->lock); - /* We already know we raced and no longer need the new region */ - kfree(nrg); - return chg; -out_nrg: spin_unlock(&resv->lock); return chg; }