From patchwork Tue Oct 10 03:28:33 2023
X-Patchwork-Submitter: lai bin
X-Patchwork-Id: 13414783
From: Bin Lai
To: mingo@redhat.com, peterz@infradead.org, juri.lelli@redhat.com,
 vincent.guittot@linaro.org, akpm@linux-foundation.org
Cc: dietmar.eggemann@arm.com, rostedt@goodmis.org, bsegall@google.com,
 mgorman@suse.de, bristot@redhat.com, vschneid@redhat.com,
 willy@infradead.org, linux-kernel@vger.kernel.org,
 linux-fsdevel@vger.kernel.org, linux-mm@kvack.org, Bin Lai
Subject: [PATCH] sched/wait: introduce endmark in __wake_up_common
Date: Tue, 10 Oct 2023 11:28:33 +0800
Message-Id: <20231010032833.398033-1-robinlai@tencent.com>

Without this patch applied, the waker can fall into an infinite loop in
some cases. Commit 2554db916586 ("sched/wait: Break up long wake list
walk") introduced WQ_FLAG_BOOKMARK to break up long wake list walks.
When the number of walked entries exceeds WAITQUEUE_WALK_BREAK_CNT (64),
the waker records its scan position and releases the queue lock, which
reduces interrupt and rescheduling latency.

Take the ltp-aiodio case of runltp-ng as an example. It creates 100
processes for I/O testing. These processes write to the same test file
and wait for page writeback to complete. Because they all write to the
same file, they may all be waiting for the writeback bit of the same
page to be cleared. When the page writeback completes,
end_page_writeback clears the writeback bit of the page and wakes up all
processes on the wake list for that page. At the same time, another
process can submit the page and set the writeback bit again. When the
awakened processes find that the writeback bit is still set, they
immediately try to add themselves to the wake list again. Because of the
WQ_FLAG_BOOKMARK feature, the awakened processes are added to the tail
of the wake list after the waker releases the queue lock, so the waker
never reaches the end of the list and falls into an infinite loop.

Therefore, introduce an endmark to indicate the end of the bookmark
state. Once the waker reaches the endmark entry, it stops placing the
bookmark and holds the lock until all remaining entries have been
walked.
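
The livelock and the effect of the endmark can be illustrated with a
small userspace model. This is only a sketch under simplifying
assumptions, not kernel code: the wait list is modeled as a FIFO of
integer IDs, every waiter is removed from the list when woken, and every
woken waiter re-queues itself as soon as the lock is dropped (the worst
case described above). The names simulate(), QCAP, MAX_ROUNDS and
ENDMARK are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stdbool.h>

#define WALK_BREAK_CNT 64    /* mirrors WAITQUEUE_WALK_BREAK_CNT */
#define ENDMARK        (-1)  /* hypothetical marker entry for this model */
#define QCAP           4096  /* capacity of the toy queue */
#define MAX_ROUNDS     1000  /* give up and report a livelock after this */

struct queue { int buf[QCAP]; int head; int len; };

static void push(struct queue *q, int v)
{
	assert(q->len < QCAP);
	q->buf[(q->head + q->len) % QCAP] = v;
	q->len++;
}

static int pop(struct queue *q)
{
	int v = q->buf[q->head];

	q->head = (q->head + 1) % QCAP;
	q->len--;
	return v;
}

/*
 * Returns the number of lock acquisitions the waker needed to walk the
 * whole list, or -1 if it was still looping after MAX_ROUNDS.
 */
int simulate(int nwaiters, bool use_endmark)
{
	struct queue q = {0};
	bool endmark_queued = false;

	assert(nwaiters >= 0 && nwaiters < QCAP);
	for (int i = 0; i < nwaiters; i++)
		push(&q, i);

	for (int round = 1; round <= MAX_ROUNDS; round++) {
		int woken[QCAP], nwoken = 0, cnt = 0;
		bool touch_endmark = false, bookmarked = false;

		/* One walk with the queue lock held. */
		while (q.len > 0) {
			int id = pop(&q);

			if (id == ENDMARK) {
				touch_endmark = true; /* stop bookmarking */
				continue;
			}
			woken[nwoken++] = id;
			if (!touch_endmark && ++cnt > WALK_BREAK_CNT && q.len > 0) {
				if (use_endmark && !endmark_queued) {
					push(&q, ENDMARK); /* tail at first break */
					endmark_queued = true;
				}
				bookmarked = true;
				break;
			}
		}
		if (!bookmarked)
			return round; /* reached the end of the list */

		/* Lock dropped: woken waiters see the bit set and re-queue. */
		for (int i = 0; i < nwoken; i++)
			push(&q, woken[i]);
	}
	return -1;
}
```

In this model, simulate(100, false) never finishes and returns -1,
while simulate(100, true) finishes on the second lock acquisition,
because the waiters that re-queued behind the endmark are walked without
dropping the lock again.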

Signed-off-by: Bin Lai

diff --git a/include/linux/wait.h b/include/linux/wait.h
index 5ec7739400f4..3413babd2db4 100644
--- a/include/linux/wait.h
+++ b/include/linux/wait.h
@@ -213,7 +213,8 @@ int __wake_up(struct wait_queue_head *wq_head, unsigned int mode, int nr, void *
 void __wake_up_on_current_cpu(struct wait_queue_head *wq_head, unsigned int mode, void *key);
 void __wake_up_locked_key(struct wait_queue_head *wq_head, unsigned int mode, void *key);
 void __wake_up_locked_key_bookmark(struct wait_queue_head *wq_head,
-		unsigned int mode, void *key, wait_queue_entry_t *bookmark);
+		unsigned int mode, void *key, wait_queue_entry_t *bookmark,
+		wait_queue_entry_t *endmark);
 void __wake_up_sync_key(struct wait_queue_head *wq_head, unsigned int mode, void *key);
 void __wake_up_locked_sync_key(struct wait_queue_head *wq_head, unsigned int mode, void *key);
 void __wake_up_locked(struct wait_queue_head *wq_head, unsigned int mode, int nr);
diff --git a/kernel/sched/wait.c b/kernel/sched/wait.c
index 802d98cf2de3..9ecb59193710 100644
--- a/kernel/sched/wait.c
+++ b/kernel/sched/wait.c
@@ -79,10 +79,11 @@ EXPORT_SYMBOL(remove_wait_queue);
  */
 static int __wake_up_common(struct wait_queue_head *wq_head, unsigned int mode,
 			int nr_exclusive, int wake_flags, void *key,
-			wait_queue_entry_t *bookmark)
+			wait_queue_entry_t *bookmark,
+			wait_queue_entry_t *endmark)
 {
 	wait_queue_entry_t *curr, *next;
-	int cnt = 0;
+	int cnt = 0, touch_endmark = 0;
 
 	lockdep_assert_held(&wq_head->lock);
 
@@ -95,12 +96,17 @@ static int __wake_up_common(struct wait_queue_head *wq_head, unsigned int mode,
 		curr = list_first_entry(&wq_head->head, wait_queue_entry_t, entry);
 
 	if (&curr->entry == &wq_head->head)
-		return nr_exclusive;
+		goto out;
 
 	list_for_each_entry_safe_from(curr, next, &wq_head->head, entry) {
 		unsigned flags = curr->flags;
 		int ret;
 
+		if (curr == endmark) {
+			touch_endmark = 1;
+			continue;
+		}
+
 		if (flags & WQ_FLAG_BOOKMARK)
 			continue;
 
@@ -110,14 +116,24 @@ static int __wake_up_common(struct wait_queue_head *wq_head, unsigned int mode,
 		if (ret && (flags & WQ_FLAG_EXCLUSIVE) && !--nr_exclusive)
 			break;
 
-		if (bookmark && (++cnt > WAITQUEUE_WALK_BREAK_CNT) &&
+		if (bookmark && !touch_endmark && (++cnt > WAITQUEUE_WALK_BREAK_CNT) &&
 				(&next->entry != &wq_head->head)) {
 			bookmark->flags = WQ_FLAG_BOOKMARK;
 			list_add_tail(&bookmark->entry, &next->entry);
-			break;
+
+			if (endmark && !(endmark->flags & WQ_FLAG_BOOKMARK)) {
+				endmark->flags = WQ_FLAG_BOOKMARK;
+				list_add_tail(&endmark->entry, &wq_head->head);
+			}
+
+			return nr_exclusive;
 		}
 	}
 
+out:
+	if (endmark && (endmark->flags & WQ_FLAG_BOOKMARK))
+		list_del(&endmark->entry);
+
 	return nr_exclusive;
 }
 
@@ -125,7 +141,7 @@ static int __wake_up_common_lock(struct wait_queue_head *wq_head, unsigned int m
 			int nr_exclusive, int wake_flags, void *key)
 {
 	unsigned long flags;
-	wait_queue_entry_t bookmark;
+	wait_queue_entry_t bookmark, endmark;
 	int remaining = nr_exclusive;
 
 	bookmark.flags = 0;
@@ -133,10 +149,15 @@ static int __wake_up_common_lock(struct wait_queue_head *wq_head, unsigned int m
 	bookmark.func = NULL;
 	INIT_LIST_HEAD(&bookmark.entry);
 
+	endmark.flags = 0;
+	endmark.private = NULL;
+	endmark.func = NULL;
+	INIT_LIST_HEAD(&endmark.entry);
+
 	do {
 		spin_lock_irqsave(&wq_head->lock, flags);
 		remaining = __wake_up_common(wq_head, mode, remaining,
-						wake_flags, key, &bookmark);
+						wake_flags, key, &bookmark, &endmark);
 		spin_unlock_irqrestore(&wq_head->lock, flags);
 	} while (bookmark.flags & WQ_FLAG_BOOKMARK);
 
@@ -171,20 +192,21 @@ void __wake_up_on_current_cpu(struct wait_queue_head *wq_head, unsigned int mode
  */
 void __wake_up_locked(struct wait_queue_head *wq_head, unsigned int mode, int nr)
 {
-	__wake_up_common(wq_head, mode, nr, 0, NULL, NULL);
+	__wake_up_common(wq_head, mode, nr, 0, NULL, NULL, NULL);
 }
 EXPORT_SYMBOL_GPL(__wake_up_locked);
 
 void __wake_up_locked_key(struct wait_queue_head *wq_head, unsigned int mode, void *key)
 {
-	__wake_up_common(wq_head, mode, 1, 0, key, NULL);
+	__wake_up_common(wq_head, mode, 1, 0, key, NULL, NULL);
 }
 EXPORT_SYMBOL_GPL(__wake_up_locked_key);
 
 void __wake_up_locked_key_bookmark(struct wait_queue_head *wq_head,
-		unsigned int mode, void *key, wait_queue_entry_t *bookmark)
+		unsigned int mode, void *key, wait_queue_entry_t *bookmark,
+		wait_queue_entry_t *endmark)
 {
-	__wake_up_common(wq_head, mode, 1, 0, key, bookmark);
+	__wake_up_common(wq_head, mode, 1, 0, key, bookmark, endmark);
 }
 EXPORT_SYMBOL_GPL(__wake_up_locked_key_bookmark);
 
@@ -233,7 +255,7 @@ EXPORT_SYMBOL_GPL(__wake_up_sync_key);
 void __wake_up_locked_sync_key(struct wait_queue_head *wq_head,
 			       unsigned int mode, void *key)
 {
-	__wake_up_common(wq_head, mode, 1, WF_SYNC, key, NULL);
+	__wake_up_common(wq_head, mode, 1, WF_SYNC, key, NULL, NULL);
 }
 EXPORT_SYMBOL_GPL(__wake_up_locked_sync_key);
 
diff --git a/mm/filemap.c b/mm/filemap.c
index 4ea4387053e8..49dc8620271d 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -1135,7 +1135,7 @@ static void folio_wake_bit(struct folio *folio, int bit_nr)
 	wait_queue_head_t *q = folio_waitqueue(folio);
 	struct wait_page_key key;
 	unsigned long flags;
-	wait_queue_entry_t bookmark;
+	wait_queue_entry_t bookmark, endmark;
 
 	key.folio = folio;
 	key.bit_nr = bit_nr;
@@ -1146,8 +1146,13 @@ static void folio_wake_bit(struct folio *folio, int bit_nr)
 	bookmark.func = NULL;
 	INIT_LIST_HEAD(&bookmark.entry);
 
+	endmark.flags = 0;
+	endmark.private = NULL;
+	endmark.func = NULL;
+	INIT_LIST_HEAD(&endmark.entry);
+
 	spin_lock_irqsave(&q->lock, flags);
-	__wake_up_locked_key_bookmark(q, TASK_NORMAL, &key, &bookmark);
+	__wake_up_locked_key_bookmark(q, TASK_NORMAL, &key, &bookmark, &endmark);
 
 	while (bookmark.flags & WQ_FLAG_BOOKMARK) {
 		/*
@@ -1159,7 +1164,7 @@ static void folio_wake_bit(struct folio *folio, int bit_nr)
 		spin_unlock_irqrestore(&q->lock, flags);
 		cpu_relax();
 		spin_lock_irqsave(&q->lock, flags);
-		__wake_up_locked_key_bookmark(q, TASK_NORMAL, &key, &bookmark);
+		__wake_up_locked_key_bookmark(q, TASK_NORMAL, &key, &bookmark, &endmark);
 	}
 
 	/*