[v4,0/5] Parallel Checkout (part 2)

This version is almost identical to v3, but the last patch incorporates the typo fixes and other rewording suggestions Christian made about the design doc in the last round. I decided to remove the sentence about Step 3 dominating the execution time, as that's not always the case, e.g. on a non-local clone or a sparse checkout.

Matheus Tavares (5):
  unpack-trees: add basic support for parallel checkout
  parallel-checkout: make it truly parallel
  parallel-checkout: add configuration options
  parallel-checkout: support progress displaying
  parallel-checkout: add design documentation

 .gitignore                                    |   1 +
 Documentation/Makefile                        |   1 +
 Documentation/config/checkout.txt             |  21 +
 Documentation/technical/parallel-checkout.txt | 270 ++++++++
 Makefile                                      |   2 +
 builtin.h                                     |   1 +
 builtin/checkout--worker.c                    | 145 ++++
 entry.c                                       |  17 +-
 git.c                                         |   2 +
 parallel-checkout.c                           | 655 ++++++++++++++++++
 parallel-checkout.h                           | 111 +++
 unpack-trees.c                                |  19 +-
 12 files changed, 1240 insertions(+), 5 deletions(-)
 create mode 100644 Documentation/technical/parallel-checkout.txt
 create mode 100644 builtin/checkout--worker.c
 create mode 100644 parallel-checkout.c
 create mode 100644 parallel-checkout.h

Range-diff against v3:
1: 7096822c14 = 1: 7096822c14 unpack-trees: add basic support for parallel checkout
2: 4526516ea0 = 2: 4526516ea0 parallel-checkout: make it truly parallel
3: ad165c0637 = 3: ad165c0637 parallel-checkout: add configuration options
4: cf9e28dc0e = 4: cf9e28dc0e parallel-checkout: support progress displaying
5: 415d4114aa ! 5: fd929f072c parallel-checkout: add design documentation
    @@ Documentation/technical/parallel-checkout.txt (new)
     +* Step 4: Write the new index to disk.
     +
     +Step 3 is the focus of the "parallel checkout" effort described here.
    -+It dominates the execution time for most of the above command types.
     +
     +Sequential Implementation
     +-------------------------
    @@ Documentation/technical/parallel-checkout.txt (new)
     +It wouldn't be safe to perform Step 3b in parallel, as there could be
     +race conditions between file creations and removals. Instead, the
     +parallel checkout framework lets the sequential code handle Step 3b,
    -+and use parallel workers to replace the sequential
    ++and uses parallel workers to replace the sequential
     +`entry.c:write_entry()` calls from Step 3c.
     +
     +Rejected Multi-Threaded Solution
    @@ Documentation/technical/parallel-checkout.txt (new)
     +warning for the user, like the classic sequential checkout does.
     +
     +The workers are able to detect both collisions among the entries being
    -+concurrently written and collisions among parallel-eligible and
    -+ineligible entries. The general idea for collision detection is quite
    -+straightforward: for each parallel-eligible entry, the main process must
    -+remove all files that prevent this entry from being written (before
    -+enqueueing it). This includes any non-directory file in the leading path
    -+of the entry. Later, when a worker gets assigned the entry, it looks
    -+again for the non-directories files and for an already existing file at
    -+the entry's path. If any of these checks finds something, the worker
    -+knows that there was a path collision.
    ++concurrently written and collisions between a parallel-eligible entry
    ++and an ineligible entry. The general idea for collision detection is
    ++quite straightforward: for each parallel-eligible entry, the main
    ++process must remove all files that prevent this entry from being written
    ++(before enqueueing it). This includes any non-directory file in the
    ++leading path of the entry. Later, when a worker gets assigned the entry,
    ++it looks again for the non-directories files and for an already existing
    ++file at the entry's path. If any of these checks finds something, the
    ++worker knows that there was a path collision.
     +
     +Because parallel checkout can distinguish path collisions from the case
     +where the file was already present in the working tree before checkout,
    @@ Documentation/technical/parallel-checkout.txt (new)
     +Besides, long-running filters may use the delayed checkout feature to
     +postpone the return of some filtered blobs. The delayed checkout queue
     +and the parallel checkout queue are not compatible and should remain
    -+separated.
    ++separate.
    ++
     +Note: regular files that only require internal filters, like end-of-line
     +conversion and re-encoding, are eligible for parallel checkout.
    @@ Documentation/technical/parallel-checkout.txt (new)
     +The API
     +-------
     +
    -+The parallel checkout API was designed with the goal to minimize changes
    -+to the current users of the checkout machinery. This means that they
    -+don't have to call a different function for sequential or parallel
    ++The parallel checkout API was designed with the goal of minimizing
    ++changes to the current users of the checkout machinery. This means that
    ++they don't have to call a different function for sequential or parallel
     +checkout. As already mentioned, `checkout_entry()` will automatically
     +insert the given entry in the parallel checkout queue when this feature
     +is enabled and the entry is eligible; otherwise, it will just write the
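The worker-side half of the collision-detection rule in the design-doc hunk above (the main process clears the way before enqueueing; a worker that still finds something in the way reports a collision) might be sketched in C as follows. This is a minimal illustration under stated assumptions, not the code in parallel-checkout.c: `worker_write_entry()`, `has_nondir_in_leading_path()` and `enum pc_write_result` are hypothetical names, the leading directories are assumed to have been created already by the main process (Step 3b), and using `open()` with `O_CREAT | O_EXCL` to detect an already-existing file is this sketch's own choice.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical result codes for this sketch only. */
enum pc_write_result { PC_WRITTEN = 0, PC_COLLIDED = 1, PC_FAILED = -1 };

/*
 * Hypothetical helper: returns 1 if some leading component of `path`
 * exists but is not a directory (e.g. a regular file or a symlink),
 * 0 if none does, and -1 if the path is too long for this sketch.
 */
static int has_nondir_in_leading_path(const char *path)
{
	char buf[4096];
	size_t len = strlen(path);

	if (len >= sizeof(buf))
		return -1;
	memcpy(buf, path, len + 1);

	/* Temporarily cut the path at each '/' and lstat() the prefix. */
	for (char *slash = strchr(buf, '/'); slash; slash = strchr(slash + 1, '/')) {
		struct stat st;

		*slash = '\0';
		if (lstat(buf, &st) == 0 && !S_ISDIR(st.st_mode))
			return 1;
		*slash = '/';
	}
	return 0;
}

/*
 * Hypothetical worker-side write. The main process is assumed to have
 * removed everything in the way before enqueueing the entry, so anything
 * found in the way now appeared afterwards (e.g. another entry mapping to
 * the same path) and is reported as a collision instead of being removed.
 */
static enum pc_write_result worker_write_entry(const char *path,
					       const char *data, size_t size)
{
	int ret = has_nondir_in_leading_path(path);

	if (ret < 0)
		return PC_FAILED;
	if (ret > 0)
		return PC_COLLIDED;

	/* With O_CREAT | O_EXCL, open() fails with EEXIST if path exists. */
	int fd = open(path, O_WRONLY | O_CREAT | O_EXCL, 0666);
	if (fd < 0)
		return (errno == EEXIST || errno == ENOTDIR) ? PC_COLLIDED : PC_FAILED;

	/* Write the whole buffer, retrying on short writes and EINTR. */
	while (size > 0) {
		ssize_t n = write(fd, data, size);

		if (n < 0) {
			if (errno == EINTR)
				continue;
			close(fd);
			return PC_FAILED;
		}
		data += n;
		size -= (size_t)n;
	}
	return close(fd) == 0 ? PC_WRITTEN : PC_FAILED;
}
```

Reporting instead of removing is the point of this split: the sequential code owns every removal, so a worker never races with it, and a collided entry can still be surfaced to the user as the design doc describes.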

Message ID: cover.1618861380.git.matheus.bernardino@usp.br
Headers:
Return-Path: <git-owner@kernel.org>
From: Matheus Tavares <matheus.bernardino@usp.br>
To: gitster@pobox.com
Cc: git@vger.kernel.org, christian.couder@gmail.com, git@jeffhostetler.com
Subject: [PATCH v4 0/5] Parallel Checkout (part 2)
Date: Mon, 19 Apr 2021 16:53:30 -0300
Message-Id: <cover.1618861380.git.matheus.bernardino@usp.br>
In-Reply-To: <cover.1618790794.git.matheus.bernardino@usp.br>
References: <cover.1618790794.git.matheus.bernardino@usp.br>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Precedence: bulk
Series: Parallel Checkout (part 2)
  [v4,0/5] Parallel Checkout (part 2)
  [v4,1/5] unpack-trees: add basic support for parallel checkout
  [v4,2/5] parallel-checkout: make it truly parallel
  [v4,3/5] parallel-checkout: add configuration options
  [v4,4/5] parallel-checkout: support progress displaying
  [v4,5/5] parallel-checkout: add design documentation
