[055/131] padata: add basic support for multithreaded jobs


Message ID: 20200603225943.18h45Damv%akpm@linux-foundation.org
State: New, archived

Headers

DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 68DB3208C9
Date: Wed, 03 Jun 2020 15:59:43 -0700
From: Andrew Morton <akpm@linux-foundation.org>
To: akpm@linux-foundation.org, alex.williamson@redhat.com,
 alexander.h.duyck@linux.intel.com, corbet@lwn.net,
 dan.j.williams@intel.com, daniel.m.jordan@oracle.com,
 dave.hansen@linux.intel.com, david@redhat.com, elliott@hpe.com,
 herbert@gondor.apana.org.au, jgg@ziepe.ca, josh@joshtriplett.org,
 ktkhai@virtuozzo.com, linux-mm@kvack.org, mhocko@kernel.org,
 mm-commits@vger.kernel.org, pasha.tatashin@soleen.com, pavel@ucw.cz,
 peterz@infradead.org, rdunlap@infradead.org,
 shile.zhang@linux.alibaba.com, steffen.klassert@secunet.com,
 steven.sistare@oracle.com, tj@kernel.org, torvalds@linux-foundation.org,
 ziy@nvidia.com
Subject: [patch 055/131] padata: add basic support for
 multithreaded jobs
Message-ID: <20200603225943.18h45Damv%akpm@linux-foundation.org>
In-Reply-To: <20200603155549.e041363450869eaae4c7f05b@linux-foundation.org>
User-Agent: s-nail v14.8.16
Sender: owner-linux-mm@kvack.org
Precedence: bulk


Commit Message

Andrew Morton June 3, 2020, 10:59 p.m. UTC

From: Daniel Jordan <daniel.m.jordan@oracle.com>
Subject: padata: add basic support for multithreaded jobs

Sometimes the kernel doesn't take full advantage of system memory
bandwidth, leading to a single CPU spending excessive time in
initialization paths where the data scales with memory size.

Multithreading naturally addresses this problem.

Extend padata, a framework that handles many parallel yet singlethreaded
jobs, to also handle multithreaded jobs by adding support for splitting up
the work evenly, specifying a minimum amount of work that's appropriate
for one helper thread to do, load balancing between helpers, and
coordinating them.

This is inspired by work from Pavel Tatashin and Steve Sistare.

Link: http://lkml.kernel.org/r/20200527173608.2885243-5-daniel.m.jordan@oracle.com
Signed-off-by: Daniel Jordan <daniel.m.jordan@oracle.com>
Tested-by: Josh Triplett <josh@joshtriplett.org>
Cc: Alexander Duyck <alexander.h.duyck@linux.intel.com>
Cc: Alex Williamson <alex.williamson@redhat.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Kirill Tkhai <ktkhai@virtuozzo.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Robert Elliott <elliott@hpe.com>
Cc: Shile Zhang <shile.zhang@linux.alibaba.com>
Cc: Steffen Klassert <steffen.klassert@secunet.com>
Cc: Steven Sistare <steven.sistare@oracle.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
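
Note for readers: the sketch below shows how a client might describe a job
with the new struct padata_mt_job.  It is a user-space illustration only, not
part of the patch.  The struct is copied from the header change in this patch
so the example compiles on its own; mock_do_multithreaded(), init_range() and
init_all() are hypothetical names, and the mock runs the chunks serially
rather than on helper threads.

```c
/* Copy of the struct added by this patch, so the sketch builds in user space. */
struct padata_mt_job {
	void (*thread_fn)(unsigned long start, unsigned long end, void *arg);
	void		*fn_arg;
	unsigned long	start;
	unsigned long	size;
	unsigned long	align;
	unsigned long	min_chunk;
	int		max_threads;
};

/*
 * Hypothetical serial stand-in for padata_do_multithreaded(): it feeds
 * min_chunk-sized ranges to thread_fn one after another, with no threads.
 */
void mock_do_multithreaded(struct padata_mt_job *job)
{
	unsigned long pos = job->start;
	unsigned long end = job->start + job->size;

	while (pos < end) {
		unsigned long next = pos + job->min_chunk;

		if (next > end)
			next = end;
		job->thread_fn(pos, next, job->fn_arg);
		pos = next;
	}
}

/* Hypothetical client work: mark every element in [start, end) as done. */
void init_range(unsigned long start, unsigned long end, void *arg)
{
	unsigned char *flags = arg;
	unsigned long i;

	for (i = start; i < end; i++)
		flags[i] = 1;
}

/* Hypothetical caller: describe the job, then hand it off. */
void init_all(unsigned char *flags, unsigned long n)
{
	struct padata_mt_job job = {
		.thread_fn   = init_range,
		.fn_arg      = flags,
		.start       = 0,
		.size        = n,
		.align       = 1,
		.min_chunk   = 64,	/* illustrative value */
		.max_threads = 4,	/* illustrative value */
	};

	mock_do_multithreaded(&job);
}
```

With the real padata_do_multithreaded(), thread_fn may be called from several
helpers at once on disjoint [start, end) ranges, so a real thread function
has to be safe to run concurrently.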

 include/linux/padata.h |   29 +++++++
 kernel/padata.c        |  152 ++++++++++++++++++++++++++++++++++++++-
 2 files changed, 178 insertions(+), 3 deletions(-)
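
Note for readers: the thread-count and chunk-size arithmetic in
padata_do_multithreaded() can be tried out in user space.  The sketch below
mirrors the computation in the patch, assuming every requested work item is
successfully allocated (in the patch, ps.nworks can be smaller if the pool
runs out).  mt_nworks(), mt_chunk_size() and the small helpers are
hypothetical names, not kernel functions.

```c
/* Same value as load_balance_factor in the patch. */
#define LOAD_BALANCE_FACTOR 4UL

static unsigned long max_ul(unsigned long a, unsigned long b)
{
	return a > b ? a : b;
}

static unsigned long min_ul(unsigned long a, unsigned long b)
{
	return a < b ? a : b;
}

/* Round x up to the next multiple of y, like the kernel's roundup(). */
static unsigned long roundup_ul(unsigned long x, unsigned long y)
{
	return ((x + y - 1) / y) * y;
}

/* Number of threads: at least one, at most max_threads. */
unsigned long mt_nworks(unsigned long size, unsigned long min_chunk,
			unsigned long max_threads)
{
	unsigned long nworks = max_ul(size / min_chunk, 1UL);

	return min_ul(nworks, max_threads);
}

/*
 * Chunk size: split the job into several chunks per thread for load
 * balancing, never go below min_chunk, then round up to the alignment.
 */
unsigned long mt_chunk_size(unsigned long size, unsigned long nworks,
			    unsigned long min_chunk, unsigned long align)
{
	unsigned long chunk = size / (nworks * LOAD_BALANCE_FACTOR);

	chunk = max_ul(chunk, min_chunk);
	return roundup_ul(chunk, align);
}
```

For example, size = 1048576, min_chunk = 4096, max_threads = 8 and
align = 512 give 8 threads and a chunk size of 32768, that is, 32 chunks.
A job smaller than min_chunk gets a single thread, which the patch runs
directly in the caller.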

--- a/include/linux/padata.h~padata-add-basic-support-for-multithreaded-jobs
+++ a/include/linux/padata.h
@@ -4,6 +4,9 @@ 
  *
  * Copyright (C) 2008, 2009 secunet Security Networks AG
  * Copyright (C) 2008, 2009 Steffen Klassert <steffen.klassert@secunet.com>
+ *
+ * Copyright (c) 2020 Oracle and/or its affiliates.
+ * Author: Daniel Jordan <daniel.m.jordan@oracle.com>
  */
 
 #ifndef PADATA_H
@@ -131,6 +134,31 @@  struct padata_shell {
 };
 
 /**
+ * struct padata_mt_job - represents one multithreaded job
+ *
+ * @thread_fn: Called for each chunk of work that a padata thread does.
+ * @fn_arg: The thread function argument.
+ * @start: The start of the job (units are job-specific).
+ * @size: size of this node's work (units are job-specific).
+ * @align: Ranges passed to the thread function fall on this boundary, with the
+ *         possible exceptions of the beginning and end of the job.
+ * @min_chunk: The minimum chunk size in job-specific units.  This allows
+ *             the client to communicate the minimum amount of work that's
+ *             appropriate for one worker thread to do at once.
+ * @max_threads: Max threads to use for the job, actual number may be less
+ *               depending on task size and minimum chunk size.
+ */
+struct padata_mt_job {
+	void (*thread_fn)(unsigned long start, unsigned long end, void *arg);
+	void			*fn_arg;
+	unsigned long		start;
+	unsigned long		size;
+	unsigned long		align;
+	unsigned long		min_chunk;
+	int			max_threads;
+};
+
+/**
  * struct padata_instance - The overall control structure.
  *
  * @cpu_online_node: Linkage for CPU online callback.
@@ -173,6 +201,7 @@  extern void padata_free_shell(struct pad
 extern int padata_do_parallel(struct padata_shell *ps,
 			      struct padata_priv *padata, int *cb_cpu);
 extern void padata_do_serial(struct padata_priv *padata);
+extern void __init padata_do_multithreaded(struct padata_mt_job *job);
 extern int padata_set_cpumask(struct padata_instance *pinst, int cpumask_type,
 			      cpumask_var_t cpumask);
 extern int padata_start(struct padata_instance *pinst);
--- a/kernel/padata.c~padata-add-basic-support-for-multithreaded-jobs
+++ a/kernel/padata.c
@@ -7,6 +7,9 @@ 
  * Copyright (C) 2008, 2009 secunet Security Networks AG
  * Copyright (C) 2008, 2009 Steffen Klassert <steffen.klassert@secunet.com>
  *
+ * Copyright (c) 2020 Oracle and/or its affiliates.
+ * Author: Daniel Jordan <daniel.m.jordan@oracle.com>
+ *
  * This program is free software; you can redistribute it and/or modify it
  * under the terms and conditions of the GNU General Public License,
  * version 2, as published by the Free Software Foundation.
@@ -21,6 +24,7 @@ 
  * 51 Franklin St - Fifth Floor, Boston, MA 02110-1301 USA.
  */
 
+#include <linux/completion.h>
 #include <linux/export.h>
 #include <linux/cpumask.h>
 #include <linux/err.h>
@@ -32,6 +36,8 @@ 
 #include <linux/sysfs.h>
 #include <linux/rcupdate.h>
 
+#define	PADATA_WORK_ONSTACK	1	/* Work's memory is on stack */
+
 struct padata_work {
 	struct work_struct	pw_work;
 	struct list_head	pw_list;  /* padata_free_works linkage */
@@ -42,7 +48,17 @@  static DEFINE_SPINLOCK(padata_works_lock
 static struct padata_work *padata_works;
 static LIST_HEAD(padata_free_works);
 
+struct padata_mt_job_state {
+	spinlock_t		lock;
+	struct completion	completion;
+	struct padata_mt_job	*job;
+	int			nworks;
+	int			nworks_fini;
+	unsigned long		chunk_size;
+};
+
 static void padata_free_pd(struct parallel_data *pd);
+static void __init padata_mt_helper(struct work_struct *work);
 
 static int padata_index_to_cpu(struct parallel_data *pd, int cpu_index)
 {
@@ -81,18 +97,56 @@  static struct padata_work *padata_work_a
 }
 
 static void padata_work_init(struct padata_work *pw, work_func_t work_fn,
-			     void *data)
+			     void *data, int flags)
 {
-	INIT_WORK(&pw->pw_work, work_fn);
+	if (flags & PADATA_WORK_ONSTACK)
+		INIT_WORK_ONSTACK(&pw->pw_work, work_fn);
+	else
+		INIT_WORK(&pw->pw_work, work_fn);
 	pw->pw_data = data;
 }
 
+static int __init padata_work_alloc_mt(int nworks, void *data,
+				       struct list_head *head)
+{
+	int i;
+
+	spin_lock(&padata_works_lock);
+	/* Start at 1 because the current task participates in the job. */
+	for (i = 1; i < nworks; ++i) {
+		struct padata_work *pw = padata_work_alloc();
+
+		if (!pw)
+			break;
+		padata_work_init(pw, padata_mt_helper, data, 0);
+		list_add(&pw->pw_list, head);
+	}
+	spin_unlock(&padata_works_lock);
+
+	return i;
+}
+
 static void padata_work_free(struct padata_work *pw)
 {
 	lockdep_assert_held(&padata_works_lock);
 	list_add(&pw->pw_list, &padata_free_works);
 }
 
+static void __init padata_works_free(struct list_head *works)
+{
+	struct padata_work *cur, *next;
+
+	if (list_empty(works))
+		return;
+
+	spin_lock(&padata_works_lock);
+	list_for_each_entry_safe(cur, next, works, pw_list) {
+		list_del(&cur->pw_list);
+		padata_work_free(cur);
+	}
+	spin_unlock(&padata_works_lock);
+}
+
 static void padata_parallel_worker(struct work_struct *parallel_work)
 {
 	struct padata_work *pw = container_of(parallel_work, struct padata_work,
@@ -168,7 +222,7 @@  int padata_do_parallel(struct padata_she
 	pw = padata_work_alloc();
 	spin_unlock(&padata_works_lock);
 	if (pw) {
-		padata_work_init(pw, padata_parallel_worker, padata);
+		padata_work_init(pw, padata_parallel_worker, padata, 0);
 		queue_work(pinst->parallel_wq, &pw->pw_work);
 	} else {
 		/* Maximum works limit exceeded, run in the current task. */
@@ -409,6 +463,98 @@  out:
 	return err;
 }
 
+static void __init padata_mt_helper(struct work_struct *w)
+{
+	struct padata_work *pw = container_of(w, struct padata_work, pw_work);
+	struct padata_mt_job_state *ps = pw->pw_data;
+	struct padata_mt_job *job = ps->job;
+	bool done;
+
+	spin_lock(&ps->lock);
+
+	while (job->size > 0) {
+		unsigned long start, size, end;
+
+		start = job->start;
+		/* So end is chunk size aligned if enough work remains. */
+		size = roundup(start + 1, ps->chunk_size) - start;
+		size = min(size, job->size);
+		end = start + size;
+
+		job->start = end;
+		job->size -= size;
+
+		spin_unlock(&ps->lock);
+		job->thread_fn(start, end, job->fn_arg);
+		spin_lock(&ps->lock);
+	}
+
+	++ps->nworks_fini;
+	done = (ps->nworks_fini == ps->nworks);
+	spin_unlock(&ps->lock);
+
+	if (done)
+		complete(&ps->completion);
+}
+
+/**
+ * padata_do_multithreaded - run a multithreaded job
+ * @job: Description of the job.
+ *
+ * See the definition of struct padata_mt_job for more details.
+ */
+void __init padata_do_multithreaded(struct padata_mt_job *job)
+{
+	/* In case threads finish at different times. */
+	static const unsigned long load_balance_factor = 4;
+	struct padata_work my_work, *pw;
+	struct padata_mt_job_state ps;
+	LIST_HEAD(works);
+	int nworks;
+
+	if (job->size == 0)
+		return;
+
+	/* Ensure at least one thread when size < min_chunk. */
+	nworks = max(job->size / job->min_chunk, 1ul);
+	nworks = min(nworks, job->max_threads);
+
+	if (nworks == 1) {
+		/* Single thread, no coordination needed, cut to the chase. */
+		job->thread_fn(job->start, job->start + job->size, job->fn_arg);
+		return;
+	}
+
+	spin_lock_init(&ps.lock);
+	init_completion(&ps.completion);
+	ps.job	       = job;
+	ps.nworks      = padata_work_alloc_mt(nworks, &ps, &works);
+	ps.nworks_fini = 0;
+
+	/*
+	 * Chunk size is the amount of work a helper does per call to the
+	 * thread function.  Load balance large jobs between threads by
+	 * increasing the number of chunks, guarantee at least the minimum
+	 * chunk size from the caller, and honor the caller's alignment.
+	 */
+	ps.chunk_size = job->size / (ps.nworks * load_balance_factor);
+	ps.chunk_size = max(ps.chunk_size, job->min_chunk);
+	ps.chunk_size = roundup(ps.chunk_size, job->align);
+
+	list_for_each_entry(pw, &works, pw_list)
+		queue_work(system_unbound_wq, &pw->pw_work);
+
+	/* Use the current thread, which saves starting a workqueue worker. */
+	padata_work_init(&my_work, padata_mt_helper, &ps, PADATA_WORK_ONSTACK);
+	padata_mt_helper(&my_work.pw_work);
+
+	/* Wait for all the helpers to finish. */
+	wait_for_completion(&ps.completion);
+
+	destroy_work_on_stack(&my_work.pw_work);
+	padata_works_free(&works);
+}
+
 static void __padata_list_init(struct padata_list *pd_list)
 {
 	INIT_LIST_HEAD(&pd_list->list);
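
Note for readers: the loop in padata_mt_helper() hands out chunks so that
each chunk ends on a chunk_size boundary whenever enough work remains.  The
user-space sketch below reproduces just that range computation, without the
locking or the call to thread_fn.  struct mt_cursor, struct mt_range and
mt_claim_chunk() are hypothetical names used for illustration only.

```c
/* Remaining work, playing the role of job->start and job->size. */
struct mt_cursor {
	unsigned long start;
	unsigned long size;
};

/* One claimed chunk, covering [start, end). */
struct mt_range {
	unsigned long start;
	unsigned long end;
};

/* Round x up to the next multiple of y, like the kernel's roundup(). */
static unsigned long roundup_ul(unsigned long x, unsigned long y)
{
	return ((x + y - 1) / y) * y;
}

/*
 * Claim the next chunk.  Returns 1 and fills *out if work was left,
 * 0 if the job is already exhausted.  In the patch this step runs
 * under ps->lock so helpers never claim overlapping ranges.
 */
int mt_claim_chunk(struct mt_cursor *cur, unsigned long chunk_size,
		   struct mt_range *out)
{
	unsigned long size;

	if (cur->size == 0)
		return 0;

	/* So end is chunk size aligned if enough work remains. */
	size = roundup_ul(cur->start + 1, chunk_size) - cur->start;
	if (size > cur->size)
		size = cur->size;

	out->start = cur->start;
	out->end = cur->start + size;

	cur->start = out->end;
	cur->size -= size;
	return 1;
}
```

Starting from start = 100, size = 1000 with chunk_size = 256, successive
claims yield [100, 256), [256, 512), [512, 768), [768, 1024) and
[1024, 1100).  Only the first and last chunks are shorter than chunk_size,
which matches the exception described for @align in struct padata_mt_job.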
