From patchwork Mon Aug 10 21:33:09 2020
X-Patchwork-Submitter: Matheus Tavares
X-Patchwork-Id: 11708153
From: Matheus Tavares
To: git@vger.kernel.org
Cc: stolee@gmail.com, jeffhost@microsoft.com, Lars Schneider,
 Torsten Bögershausen, Junio C Hamano, "brian m. carlson"
Subject: [RFC PATCH 01/21] convert: make convert_attrs() and convert structs public
Date: Mon, 10 Aug 2020 18:33:09 -0300
X-Mailer: git-send-email 2.27.0

From: Jeff Hostetler

Move convert_attrs() declaration from convert.c to convert.h, together
with the conv_attrs struct and the crlf_action enum.
This function and the data structures will be used outside convert.c in
the upcoming parallel checkout implementation.

[matheus.bernardino: squash and reword msg]
Signed-off-by: Matheus Tavares
---
 convert.c | 23 ++---------------------
 convert.h | 24 ++++++++++++++++++++++++
 2 files changed, 26 insertions(+), 21 deletions(-)

diff --git a/convert.c b/convert.c
index 572449825c..9710d770dc 100644
--- a/convert.c
+++ b/convert.c
@@ -24,17 +24,6 @@
 #define CONVERT_STAT_BITS_TXT_CRLF 0x2
 #define CONVERT_STAT_BITS_BIN 0x4
 
-enum crlf_action {
-	CRLF_UNDEFINED,
-	CRLF_BINARY,
-	CRLF_TEXT,
-	CRLF_TEXT_INPUT,
-	CRLF_TEXT_CRLF,
-	CRLF_AUTO,
-	CRLF_AUTO_INPUT,
-	CRLF_AUTO_CRLF
-};
-
 struct text_stat {
 	/* NUL, CR, LF and CRLF counts */
 	unsigned nul, lonecr, lonelf, crlf;
@@ -1300,18 +1289,10 @@ static int git_path_check_ident(struct attr_check_item *check)
 	return !!ATTR_TRUE(value);
 }
 
-struct conv_attrs {
-	struct convert_driver *drv;
-	enum crlf_action attr_action; /* What attr says */
-	enum crlf_action crlf_action; /* When no attr is set, use core.autocrlf */
-	int ident;
-	const char *working_tree_encoding; /* Supported encoding or default encoding if NULL */
-};
-
 static struct attr_check *check;
 
-static void convert_attrs(const struct index_state *istate,
-			  struct conv_attrs *ca, const char *path)
+void convert_attrs(const struct index_state *istate,
+		   struct conv_attrs *ca, const char *path)
 {
 	struct attr_check_item *ccheck = NULL;
 
diff --git a/convert.h b/convert.h
index e29d1026a6..aeb4a1be9a 100644
--- a/convert.h
+++ b/convert.h
@@ -37,6 +37,27 @@ enum eol {
 #endif
 };
 
+enum crlf_action {
+	CRLF_UNDEFINED,
+	CRLF_BINARY,
+	CRLF_TEXT,
+	CRLF_TEXT_INPUT,
+	CRLF_TEXT_CRLF,
+	CRLF_AUTO,
+	CRLF_AUTO_INPUT,
+	CRLF_AUTO_CRLF
+};
+
+struct convert_driver;
+
+struct conv_attrs {
+	struct convert_driver *drv;
+	enum crlf_action attr_action; /* What attr says */
+	enum crlf_action crlf_action; /* When no attr is set, use core.autocrlf */
+	int ident;
+	const char *working_tree_encoding; /* Supported encoding or default encoding if NULL */
+};
+
 enum ce_delay_state {
 	CE_NO_DELAY = 0,
 	CE_CAN_DELAY = 1,
@@ -102,6 +123,9 @@ void convert_to_git_filter_fd(const struct index_state *istate,
 int would_convert_to_git_filter_fd(const struct index_state *istate,
 				   const char *path);
 
+void convert_attrs(const struct index_state *istate,
+		   struct conv_attrs *ca, const char *path);
+
 /*
  * Initialize the checkout metadata with the given values. Any argument may be
  * NULL if it is not applicable. The treeish should be a commit if that is

From patchwork Mon Aug 10 21:33:10 2020
X-Patchwork-Submitter: Matheus Tavares
X-Patchwork-Id: 11708155
From: Matheus Tavares
To: git@vger.kernel.org
Cc: stolee@gmail.com, jeffhost@microsoft.com, Nguyễn Thái Ngọc Duy,
 Junio C Hamano, Lars Schneider
Subject: [RFC PATCH 02/21] convert: add [async_]convert_to_working_tree_ca() variants
Date: Mon, 10 Aug 2020 18:33:10 -0300
X-Mailer: git-send-email 2.27.0

From: Jeff Hostetler

Separate the attribute gathering from the actual conversion by adding
_ca() variants of the conversion functions. These variants receive a
precomputed 'struct conv_attrs' and therefore do not rely on an index
state. They will be used in a future patch adding parallel checkout
support, for two reasons:

- We will already load the conversion attributes in checkout_entry(),
  before conversion, to decide whether a path is eligible for parallel
  checkout. Therefore, it would be wasteful to load them again later,
  for the actual conversion.

- The parallel workers will be responsible for reading, converting and
  writing blobs to the working tree. They won't have access to the main
  process' index state, so they cannot load the attributes. Instead,
  they will receive the preloaded ones and call the _ca() variant of
  the conversion functions. Furthermore, the attributes machinery is
  optimized to handle paths in sequential order, so it's better to leave
  it for the main process, anyway.
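
The split described above can be modelled outside git. The following is
only a minimal sketch under stated assumptions: every toy_* name is
hypothetical and does not exist in git, and a path-suffix rule stands in
for the real attribute lookup. It shows the shape of the change: one
function gathers attributes from the index-like state, a _ca() function
converts using only the precomputed attributes, and the original entry
point becomes a thin wrapper around the two.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for illustration; none of these exist in git. */
struct toy_index {
	const char *crlf_suffix;	/* paths ending in this get CRLF */
};

struct toy_attrs {
	int want_crlf;
};

/* Attribute gathering: the only step that needs the index-like state. */
void toy_load_attrs(const struct toy_index *idx, const char *path,
		    struct toy_attrs *ca)
{
	size_t plen = strlen(path), slen = strlen(idx->crlf_suffix);

	ca->want_crlf = plen >= slen &&
		!strcmp(path + plen - slen, idx->crlf_suffix);
}

/*
 * Conversion proper: depends only on the precomputed attributes, so a
 * worker without access to the index could call it. Returns the number
 * of bytes written (excluding the NUL), or -1 if dst is too small.
 */
int toy_convert_ca(const struct toy_attrs *ca, const char *src,
		   char *dst, size_t dstsize)
{
	size_t n = 0;

	if (!dstsize)
		return -1;
	for (; *src; src++) {
		if (*src == '\n' && ca->want_crlf) {
			if (n + 1 >= dstsize)
				return -1;
			dst[n++] = '\r';
		}
		if (n + 1 >= dstsize)
			return -1;
		dst[n++] = *src;
	}
	dst[n] = '\0';
	return (int)n;
}

/* The pre-existing entry point: load attributes, then delegate. */
int toy_convert(const struct toy_index *idx, const char *path,
		const char *src, char *dst, size_t dstsize)
{
	struct toy_attrs ca;

	toy_load_attrs(idx, path, &ca);
	return toy_convert_ca(&ca, src, dst, dstsize);
}
```

A caller that already holds a struct toy_attrs, for instance because it
inspected the attributes to make a scheduling decision, can call
toy_convert_ca() directly and skip the second lookup.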

[matheus.bernardino: squash, remove one function definition and reword]
Signed-off-by: Matheus Tavares
---
 convert.c | 50 ++++++++++++++++++++++++++++++++++++--------------
 convert.h |  9 +++++++++
 2 files changed, 45 insertions(+), 14 deletions(-)

diff --git a/convert.c b/convert.c
index 9710d770dc..757dc2585c 100644
--- a/convert.c
+++ b/convert.c
@@ -1450,7 +1450,7 @@ void convert_to_git_filter_fd(const struct index_state *istate,
 	ident_to_git(dst->buf, dst->len, dst, ca.ident);
 }
 
-static int convert_to_working_tree_internal(const struct index_state *istate,
+static int convert_to_working_tree_internal(const struct conv_attrs *ca,
 					    const char *path, const char *src,
 					    size_t len, struct strbuf *dst,
 					    int normalizing,
@@ -1458,11 +1458,8 @@ static int convert_to_working_tree_internal(const struct index_state *istate,
 					    struct delayed_checkout *dco)
 {
 	int ret = 0, ret_filter = 0;
-	struct conv_attrs ca;
-
-	convert_attrs(istate, &ca, path);
 
-	ret |= ident_to_worktree(src, len, dst, ca.ident);
+	ret |= ident_to_worktree(src, len, dst, ca->ident);
 	if (ret) {
 		src = dst->buf;
 		len = dst->len;
@@ -1472,24 +1469,24 @@ static int convert_to_working_tree_internal(const struct index_state *istate,
 	 * is a smudge or process filter (even if the process filter doesn't
 	 * support smudge). The filters might expect CRLFs.
 	 */
-	if ((ca.drv && (ca.drv->smudge || ca.drv->process)) || !normalizing) {
-		ret |= crlf_to_worktree(src, len, dst, ca.crlf_action);
+	if ((ca->drv && (ca->drv->smudge || ca->drv->process)) || !normalizing) {
+		ret |= crlf_to_worktree(src, len, dst, ca->crlf_action);
 		if (ret) {
 			src = dst->buf;
 			len = dst->len;
 		}
 	}
 
-	ret |= encode_to_worktree(path, src, len, dst, ca.working_tree_encoding);
+	ret |= encode_to_worktree(path, src, len, dst, ca->working_tree_encoding);
 	if (ret) {
 		src = dst->buf;
 		len = dst->len;
 	}
 
 	ret_filter = apply_filter(
-		path, src, len, -1, dst, ca.drv, CAP_SMUDGE, meta, dco);
-	if (!ret_filter && ca.drv && ca.drv->required)
-		die(_("%s: smudge filter %s failed"), path, ca.drv->name);
+		path, src, len, -1, dst, ca->drv, CAP_SMUDGE, meta, dco);
+	if (!ret_filter && ca->drv && ca->drv->required)
+		die(_("%s: smudge filter %s failed"), path, ca->drv->name);
 
 	return ret | ret_filter;
 }
@@ -1500,7 +1497,9 @@ int async_convert_to_working_tree(const struct index_state *istate,
 				  const struct checkout_metadata *meta,
 				  void *dco)
 {
-	return convert_to_working_tree_internal(istate, path, src, len, dst, 0, meta, dco);
+	struct conv_attrs ca;
+	convert_attrs(istate, &ca, path);
+	return convert_to_working_tree_internal(&ca, path, src, len, dst, 0, meta, dco);
 }
 
 int convert_to_working_tree(const struct index_state *istate,
@@ -1508,13 +1507,36 @@ int convert_to_working_tree(const struct index_state *istate,
 			    size_t len, struct strbuf *dst,
 			    const struct checkout_metadata *meta)
 {
-	return convert_to_working_tree_internal(istate, path, src, len, dst, 0, meta, NULL);
+	struct conv_attrs ca;
+	convert_attrs(istate, &ca, path);
+	return convert_to_working_tree_internal(&ca, path, src, len, dst, 0, meta, NULL);
+}
+
+int async_convert_to_working_tree_ca(const struct conv_attrs *ca,
+				     const char *path, const char *src,
+				     size_t len, struct strbuf *dst,
+				     const struct checkout_metadata *meta,
+				     void *dco)
+{
+	return convert_to_working_tree_internal(ca, path, src, len, dst, 0, meta, dco);
+}
+
+int convert_to_working_tree_ca(const struct conv_attrs *ca,
+			       const char *path, const char *src,
+			       size_t len, struct strbuf *dst,
+			       const struct checkout_metadata *meta)
+{
+	return convert_to_working_tree_internal(ca, path, src, len, dst, 0, meta, NULL);
 }
 
 int renormalize_buffer(const struct index_state *istate, const char *path,
 		       const char *src, size_t len, struct strbuf *dst)
 {
-	int ret = convert_to_working_tree_internal(istate, path, src, len, dst, 1, NULL, NULL);
+	struct conv_attrs ca;
+	int ret;
+
+	convert_attrs(istate, &ca, path);
+	ret = convert_to_working_tree_internal(&ca, path, src, len, dst, 1, NULL, NULL);
 	if (ret) {
 		src = dst->buf;
 		len = dst->len;
diff --git a/convert.h b/convert.h
index aeb4a1be9a..46d537d1ae 100644
--- a/convert.h
+++ b/convert.h
@@ -100,11 +100,20 @@ int convert_to_working_tree(const struct index_state *istate,
 			    const char *path, const char *src,
 			    size_t len, struct strbuf *dst,
 			    const struct checkout_metadata *meta);
+int convert_to_working_tree_ca(const struct conv_attrs *ca,
+			       const char *path, const char *src,
+			       size_t len, struct strbuf *dst,
+			       const struct checkout_metadata *meta);
 int async_convert_to_working_tree(const struct index_state *istate,
 				  const char *path, const char *src,
 				  size_t len, struct strbuf *dst,
 				  const struct checkout_metadata *meta,
 				  void *dco);
+int async_convert_to_working_tree_ca(const struct conv_attrs *ca,
+				     const char *path, const char *src,
+				     size_t len, struct strbuf *dst,
+				     const struct checkout_metadata *meta,
+				     void *dco);
 int async_query_available_blobs(const char *cmd,
 				struct string_list *available_paths);
 int renormalize_buffer(const struct index_state *istate,

From patchwork Mon Aug 10 21:33:11 2020
X-Patchwork-Submitter: Matheus Tavares
X-Patchwork-Id: 11708157
From: Matheus Tavares
To: git@vger.kernel.org
Cc: stolee@gmail.com, jeffhost@microsoft.com, Torsten Bögershausen,
 Nguyễn Thái Ngọc Duy, Johannes Schindelin, Jakub Narębski,
 Lars Schneider, Junio C Hamano, "brian m. carlson"
Subject: [RFC PATCH 03/21] convert: add get_stream_filter_ca() variant
Date: Mon, 10 Aug 2020 18:33:11 -0300
Message-Id: <437d31f3b1d6c333b95096c999b9549cab25768e.1597093021.git.matheus.bernardino@usp.br>
X-Mailer: git-send-email 2.27.0

From: Jeff Hostetler

Like the previous patch, we will also need to call get_stream_filter()
with a precomputed `struct conv_attrs`, when we add support for parallel
checkout workers. So add the _ca() variant which takes the conversion
attributes struct as a parameter.
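
As a reading aid, the checks that get_stream_filter_ca() performs on the
precomputed attributes before building a filter can be expressed as a
standalone predicate. This is a hedged sketch, not git code: the enum is
copied from this series, but struct toy_driver, struct toy_conv_attrs and
toy_can_stream() are hypothetical simplifications (the real
struct convert_driver is opaque in convert.h).

```c
#include <stddef.h>

/* Copied from the series; the values are the ones convert.h exports. */
enum crlf_action {
	CRLF_UNDEFINED,
	CRLF_BINARY,
	CRLF_TEXT,
	CRLF_TEXT_INPUT,
	CRLF_TEXT_CRLF,
	CRLF_AUTO,
	CRLF_AUTO_INPUT,
	CRLF_AUTO_CRLF
};

/* Hypothetical, reduced model of a filter driver. */
struct toy_driver {
	const char *smudge;
	const char *clean;
	const char *process;
};

/* Hypothetical, reduced model of struct conv_attrs. */
struct toy_conv_attrs {
	const struct toy_driver *drv;
	enum crlf_action crlf_action;
	int ident;
	const char *working_tree_encoding;
};

/*
 * Return 1 if a blob with these attributes could be streamed, 0 if it
 * would have to be read into memory first. Mirrors the three early
 * "return NULL" checks in get_stream_filter_ca().
 */
int toy_can_stream(const struct toy_conv_attrs *ca)
{
	if (ca->drv && (ca->drv->process || ca->drv->smudge || ca->drv->clean))
		return 0;
	if (ca->working_tree_encoding)
		return 0;
	if (ca->crlf_action == CRLF_AUTO || ca->crlf_action == CRLF_AUTO_CRLF)
		return 0;
	return 1;
}
```

Note that the predicate needs neither a path nor an index state, which
is why the same decision can be taken from the preloaded attributes
alone.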

[matheus.bernardino: move header comment to ca() variant and reword msg]
Signed-off-by: Matheus Tavares
---
 convert.c | 28 +++++++++++++++++-----------
 convert.h |  2 ++
 2 files changed, 19 insertions(+), 11 deletions(-)

diff --git a/convert.c b/convert.c
index 757dc2585c..8e995b39c3 100644
--- a/convert.c
+++ b/convert.c
@@ -1963,34 +1963,31 @@ static struct stream_filter *ident_filter(const struct object_id *oid)
 }
 
 /*
- * Return an appropriately constructed filter for the path, or NULL if
+ * Return an appropriately constructed filter for the given ca, or NULL if
  * the contents cannot be filtered without reading the whole thing
  * in-core.
  *
  * Note that you would be crazy to set CRLF, smudge/clean or ident to a
  * large binary blob you would want us not to slurp into the memory!
  */
-struct stream_filter *get_stream_filter(const struct index_state *istate,
-					const char *path,
-					const struct object_id *oid)
+struct stream_filter *get_stream_filter_ca(const struct conv_attrs *ca,
+					   const struct object_id *oid)
 {
-	struct conv_attrs ca;
 	struct stream_filter *filter = NULL;
 
-	convert_attrs(istate, &ca, path);
-	if (ca.drv && (ca.drv->process || ca.drv->smudge || ca.drv->clean))
+	if (ca->drv && (ca->drv->process || ca->drv->smudge || ca->drv->clean))
 		return NULL;
 
-	if (ca.working_tree_encoding)
+	if (ca->working_tree_encoding)
 		return NULL;
 
-	if (ca.crlf_action == CRLF_AUTO || ca.crlf_action == CRLF_AUTO_CRLF)
+	if (ca->crlf_action == CRLF_AUTO || ca->crlf_action == CRLF_AUTO_CRLF)
 		return NULL;
 
-	if (ca.ident)
+	if (ca->ident)
 		filter = ident_filter(oid);
 
-	if (output_eol(ca.crlf_action) == EOL_CRLF)
+	if (output_eol(ca->crlf_action) == EOL_CRLF)
 		filter = cascade_filter(filter, lf_to_crlf_filter());
 	else
 		filter = cascade_filter(filter, &null_filter_singleton);
@@ -1998,6 +1995,15 @@ struct stream_filter *get_stream_filter(const struct index_state *istate,
 	return filter;
 }
 
+struct stream_filter *get_stream_filter(const struct index_state *istate,
+					const char *path,
+					const struct object_id *oid)
+{
+	struct conv_attrs ca;
+	convert_attrs(istate, &ca, path);
+	return get_stream_filter_ca(&ca, oid);
+}
+
 void free_stream_filter(struct stream_filter *filter)
 {
 	filter->vtbl->free(filter);
diff --git a/convert.h b/convert.h
index 46d537d1ae..262c1a1d46 100644
--- a/convert.h
+++ b/convert.h
@@ -169,6 +169,8 @@ struct stream_filter; /* opaque */
 struct stream_filter *get_stream_filter(const struct index_state *istate,
 					const char *path,
 					const struct object_id *);
+struct stream_filter *get_stream_filter_ca(const struct conv_attrs *ca,
+					   const struct object_id *oid);
 void free_stream_filter(struct stream_filter *);
 int is_null_stream_filter(struct stream_filter *);
 

From patchwork Mon Aug 10 21:33:12 2020
X-Patchwork-Submitter: Matheus Tavares
X-Patchwork-Id: 11708159
From: Matheus Tavares
To: git@vger.kernel.org
Cc: stolee@gmail.com, jeffhost@microsoft.com, Lars Schneider,
 Torsten Bögershausen, "brian m. carlson", Nguyễn Thái Ngọc Duy,
 Junio C Hamano
Subject: [RFC PATCH 04/21] convert: add conv_attrs classification
Date: Mon, 10 Aug 2020 18:33:12 -0300
X-Mailer: git-send-email 2.27.0

From: Jeff Hostetler

Create `enum conv_attrs_classification` to express the different ways
that attributes are handled for a blob during checkout.

This will be used in a later commit when deciding whether to add a file
to the parallel or delayed queue during checkout. For now, we can also
use it in get_stream_filter_ca() to simplify the function (as the
classifying logic is the same).

[matheus.bernardino: use classification in get_stream_filter_ca()]
Signed-off-by: Matheus Tavares
---
 convert.c | 26 +++++++++++++++++++-------
 convert.h | 33 +++++++++++++++++++++++++++++++++
 2 files changed, 52 insertions(+), 7 deletions(-)

diff --git a/convert.c b/convert.c
index 8e995b39c3..c037bb99eb 100644
--- a/convert.c
+++ b/convert.c
@@ -1975,13 +1975,7 @@ struct stream_filter *get_stream_filter_ca(const struct conv_attrs *ca,
 {
 	struct stream_filter *filter = NULL;
 
-	if (ca->drv && (ca->drv->process || ca->drv->smudge || ca->drv->clean))
-		return NULL;
-
-	if (ca->working_tree_encoding)
-		return NULL;
-
-	if (ca->crlf_action == CRLF_AUTO || ca->crlf_action == CRLF_AUTO_CRLF)
+	if (classify_conv_attrs(ca) != CA_CLASS_STREAMABLE)
 		return NULL;
 
 	if (ca->ident)
@@ -2037,3 +2031,21 @@ void clone_checkout_metadata(struct checkout_metadata *dst,
 	if (blob)
 		oidcpy(&dst->blob, blob);
 }
+
+enum conv_attrs_classification classify_conv_attrs(const struct conv_attrs *ca)
+{
+	if (ca->drv) {
+		if (ca->drv->process)
+			return CA_CLASS_INCORE_PROCESS;
+		if (ca->drv->smudge || ca->drv->clean)
+			return CA_CLASS_INCORE_FILTER;
+	}
+
+	if (ca->working_tree_encoding)
+		return CA_CLASS_INCORE;
+
+	if (ca->crlf_action == CRLF_AUTO || ca->crlf_action == CRLF_AUTO_CRLF)
+		return CA_CLASS_INCORE;
+
+	return CA_CLASS_STREAMABLE;
+}
diff --git a/convert.h b/convert.h
index 262c1a1d46..523ba9b140 100644
--- a/convert.h
+++ b/convert.h
@@ -190,4 +190,37 @@ int stream_filter(struct stream_filter *,
 		  const char *input, size_t *isize_p,
 		  char *output, size_t *osize_p);
 
+enum conv_attrs_classification {
+	/*
+	 * The blob must be loaded into a buffer before it can be
+	 * smudged. All smudging is done in-proc.
+	 */
+	CA_CLASS_INCORE,
+
+	/*
+	 * The blob must be loaded into a buffer, but uses a
+	 * single-file driver filter, such as rot13.
+	 */
+	CA_CLASS_INCORE_FILTER,
+
+	/*
+	 * The blob must be loaded into a buffer, but uses a
+	 * long-running driver process, such as LFS. This might or
+	 * might not use delayed operations. (The important thing is
+	 * that there is a single subordinate long-running process
+	 * handling all associated blobs and in case of delayed
+	 * operations, may hold per-blob state.)
+	 */
+	CA_CLASS_INCORE_PROCESS,
+
+	/*
+	 * The blob can be streamed and smudged without needing to
+	 * completely read it into a buffer.
+	 */
+	CA_CLASS_STREAMABLE,
+};
+
+enum conv_attrs_classification classify_conv_attrs(
+	const struct conv_attrs *ca);
+
 #endif /* CONVERT_H */

From patchwork Mon Aug 10 21:33:13 2020
X-Patchwork-Submitter: Matheus Tavares
X-Patchwork-Id: 11708161
From: Matheus Tavares
To: git@vger.kernel.org
Cc: stolee@gmail.com, jeffhost@microsoft.com, Nguyễn Thái Ngọc Duy,
 Ben Peart, Christian Couder, Lars Schneider, Stefan Beller,
 René Scharfe, Junio C Hamano
Subject: [RFC PATCH 05/21] entry: extract a header file for entry.c functions
Date: Mon, 10 Aug 2020 18:33:13 -0300
X-Mailer: git-send-email 2.27.0

The declarations of entry.c's public functions and structures currently
reside in cache.h. Although not many, they contribute to the size of
cache.h and, when changed, cause the unnecessary recompilation of
modules that don't really use these functions. So let's move them to a
new entry.h header.

Original-patch-by: Nguyễn Thái Ngọc Duy
Signed-off-by: Nguyễn Thái Ngọc Duy
Signed-off-by: Matheus Tavares
---
 apply.c                  |  1 +
 builtin/checkout-index.c |  1 +
 builtin/checkout.c       |  1 +
 builtin/difftool.c       |  1 +
 cache.h                  | 24 -----------------------
 entry.c                  |  9 +--------
 entry.h                  | 41 ++++++++++++++++++++++++++++++++++++++++
 unpack-trees.c           |  1 +
 8 files changed, 47 insertions(+), 32 deletions(-)
 create mode 100644 entry.h

diff --git a/apply.c b/apply.c
index 8bff604dbe..1443c307a4 100644
--- a/apply.c
+++ b/apply.c
@@ -21,6 +21,7 @@
 #include "quote.h"
 #include "rerere.h"
 #include "apply.h"
+#include "entry.h"
 
 struct gitdiff_data {
 	struct strbuf *root;
diff --git a/builtin/checkout-index.c b/builtin/checkout-index.c
index a854fd16e7..0f1ff73129 100644
--- a/builtin/checkout-index.c
+++ b/builtin/checkout-index.c
@@ -11,6 +11,7 @@
 #include "quote.h"
 #include "cache-tree.h"
 #include "parse-options.h"
+#include "entry.h"
 
 #define CHECKOUT_ALL 4
 static int nul_term_line;
diff --git a/builtin/checkout.c b/builtin/checkout.c
index 2837195491..3e09b29cfe 100644
--- a/builtin/checkout.c
+++ b/builtin/checkout.c
@@ -26,6 +26,7 @@
 #include "unpack-trees.h"
 #include "wt-status.h"
 #include "xdiff-interface.h"
+#include "entry.h"
 
 static const char * const checkout_usage[] = {
 	N_("git checkout [] "),
diff --git a/builtin/difftool.c b/builtin/difftool.c
index 7ac432b881..dfa22b67eb 100644
--- a/builtin/difftool.c
+++ b/builtin/difftool.c
@@ -23,6 +23,7 @@
 #include "lockfile.h"
 #include "object-store.h"
 #include "dir.h"
+#include "entry.h"
 
 static int trust_exit_code;
 
diff --git a/cache.h b/cache.h
index 0290849c19..e6963cf8fe 100644
--- a/cache.h
+++ b/cache.h
@@ -1695,30 +1695,6 @@ const char *show_ident_date(const struct ident_split *id,
  */
 int ident_cmp(const struct ident_split *, const struct ident_split *);
 
-struct checkout {
-	struct index_state *istate;
-	const char *base_dir;
-	int base_dir_len;
-	struct delayed_checkout *delayed_checkout;
-	struct checkout_metadata meta;
-	unsigned force:1,
-		 quiet:1,
-		 not_new:1,
-		 clone:1,
-		 refresh_cache:1;
-};
-#define CHECKOUT_INIT { NULL, "" }
-
-#define TEMPORARY_FILENAME_LENGTH 25
-int checkout_entry(struct cache_entry *ce, const struct checkout *state, char *topath, int *nr_checkouts);
-void enable_delayed_checkout(struct checkout *state);
-int finish_delayed_checkout(struct checkout *state, int *nr_checkouts);
-/*
- * Unlink the last component and schedule the leading directories for
- * removal, such that empty directories get removed.
- */
-void unlink_entry(const struct cache_entry *ce);
-
 struct cache_def {
 	struct strbuf path;
 	int flags;
diff --git a/entry.c b/entry.c
index 449bd32dee..f46c06e831 100644
--- a/entry.c
+++ b/entry.c
@@ -6,6 +6,7 @@
 #include "submodule.h"
 #include "progress.h"
 #include "fsmonitor.h"
+#include "entry.h"
 
 static void create_directories(const char *path, int path_len,
 			       const struct checkout *state)
@@ -429,14 +430,6 @@ static void mark_colliding_entries(const struct checkout *state,
 	}
 }
 
-/*
- * Write the contents from ce out to the working tree.
- *
- * When topath[] is not NULL, instead of writing to the working tree
- * file named by ce, a temporary file is created by this function and
- * its name is returned in topath[], which must be able to hold at
- * least TEMPORARY_FILENAME_LENGTH bytes long.
- */ int checkout_entry(struct cache_entry *ce, const struct checkout *state, char *topath, int *nr_checkouts) { diff --git a/entry.h b/entry.h new file mode 100644 index 0000000000..2d69185448 --- /dev/null +++ b/entry.h @@ -0,0 +1,41 @@ +#ifndef ENTRY_H +#define ENTRY_H + +#include "cache.h" +#include "convert.h" + +struct checkout { + struct index_state *istate; + const char *base_dir; + int base_dir_len; + struct delayed_checkout *delayed_checkout; + struct checkout_metadata meta; + unsigned force:1, + quiet:1, + not_new:1, + clone:1, + refresh_cache:1; +}; +#define CHECKOUT_INIT { NULL, "" } + +#define TEMPORARY_FILENAME_LENGTH 25 + +/* + * Write the contents from ce out to the working tree. + * + * When topath[] is not NULL, instead of writing to the working tree + * file named by ce, a temporary file is created by this function and + * its name is returned in topath[], which must be able to hold at + * least TEMPORARY_FILENAME_LENGTH bytes long. + */ +int checkout_entry(struct cache_entry *ce, const struct checkout *state, + char *topath, int *nr_checkouts); +void enable_delayed_checkout(struct checkout *state); +int finish_delayed_checkout(struct checkout *state, int *nr_checkouts); +/* + * Unlink the last component and schedule the leading directories for + * removal, such that empty directories get removed. 
+ */ +void unlink_entry(const struct cache_entry *ce); + +#endif /* ENTRY_H */ diff --git a/unpack-trees.c b/unpack-trees.c index 323280dd48..a511fadd89 100644 --- a/unpack-trees.c +++ b/unpack-trees.c @@ -16,6 +16,7 @@ #include "fsmonitor.h" #include "object-store.h" #include "promisor-remote.h" +#include "entry.h" /* * Error messages expected by scripts out of plumbing commands such as From patchwork Mon Aug 10 21:33:14 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Matheus Tavares X-Patchwork-Id: 11708163 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 6AE38109A for ; Mon, 10 Aug 2020 21:34:55 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 50DBB20734 for ; Mon, 10 Aug 2020 21:34:55 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=usp-br.20150623.gappssmtp.com header.i=@usp-br.20150623.gappssmtp.com header.b="rpZBogLs" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726891AbgHJVey (ORCPT ); Mon, 10 Aug 2020 17:34:54 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:36108 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726615AbgHJVey (ORCPT ); Mon, 10 Aug 2020 17:34:54 -0400 Received: from mail-qk1-x742.google.com (mail-qk1-x742.google.com [IPv6:2607:f8b0:4864:20::742]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id DB719C061756 for ; Mon, 10 Aug 2020 14:34:52 -0700 (PDT) Received: by mail-qk1-x742.google.com with SMTP id n129so4194733qkd.6 for ; Mon, 10 Aug 2020 14:34:52 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=usp-br.20150623.gappssmtp.com; s=20150623; h=from:to:cc:subject:date:message-id:in-reply-to:references 
:mime-version:content-transfer-encoding; bh=/1NqtSVRsdQ/XrZ7wCWXEDYg8R3VKgCtklFQLw7ZAiA=; b=rpZBogLscM6/R/oZZ3sTOpV1nLTqcsvZ7V2nwQIYcVLeIIAUraoyFCC+vYm87RjPUZ /0ub4zx3TUVG8rxVhIFU82d2VmM22YmvukGk9P1v08RIMzl0i6RrJj6cBMA0CeLiwxM3 5oL8d18L8Im2NLMM73+Wfy6PPXbbxFPxb9wblz0s3UrokUH6HXbo9Bw4vPaljVZLb69S khsKgZpQi5Ye4kAb4rEyaCc3QsY7s5bIOhmcZRNLrf6eY3LfkHNn7wfpiu52DEmZ4Ex0 nMZGwc7DS0g+PrscHsq8mGLE0s+23fpP6mfOz8Dol6ouBuhnn1hQkY+zhD9Tw3oU174e 5zsQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=/1NqtSVRsdQ/XrZ7wCWXEDYg8R3VKgCtklFQLw7ZAiA=; b=H6s6h0wxOkP7S9JlxrvLRsNgapjbayyg9cocUK/OgcrsRuQXaGjpdB9lHTu1uZFUhF imiflHTJTYxSPn8evgCWs/GVCw7W+l09HZrVq8lIB71+YOYxayCgbCDC5rJfppvJxzox c0oSbtHUW7EqJfkrMXGHNyGOS+oTmLuIhe8CNDSqBTFtPB2s3a9ysXRQFPx4Y3D1iGqm mzgi5K7idlscvBsnLHj7FCfMgVQOnw3UIe+A2SlJyV5lzdla2/9D2y/AXta2OkzKxKIk JfffcXNqp88oaQsevS4rpEbsyXfmGwOUU+HKxdwALr6aQbX4p0z7/bCLtdexdOttyOnr CEhA== X-Gm-Message-State: AOAM5315ukPWAK9OAV8tOQ5W/sLGcqcUrJZBUWCoCxdUqJA39YMZvwHe 17VpCEsMXIn2HW5t/yp8Vc2DNZgaHQg= X-Google-Smtp-Source: ABdhPJzjmy3Grf0wylBvRcUIyvtlRMTha8ABKXroW4DY8S6hMuvlHfvqfSW1ZUECJ4rxiRfF4zTZ7A== X-Received: by 2002:a05:620a:11b7:: with SMTP id c23mr21575992qkk.70.1597095291712; Mon, 10 Aug 2020 14:34:51 -0700 (PDT) Received: from localhost.localdomain ([2804:18:87c:466:1120:3c2c:21e4:5931]) by smtp.gmail.com with ESMTPSA id z197sm15370674qkb.66.2020.08.10.14.34.47 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 10 Aug 2020 14:34:50 -0700 (PDT) From: Matheus Tavares To: git@vger.kernel.org Cc: stolee@gmail.com, jeffhost@microsoft.com, "brian m. 
carlson" , Junio C Hamano , Brandon Williams , Jeff King , Denton Liu , Jonathan Nieder , =?utf-8?b?Tmd1eeG7hW4gVGjDoWkgTmfhu41jIER1eQ==?= Subject: [RFC PATCH 06/21] entry: make fstat_output() and read_blob_entry() public Date: Mon, 10 Aug 2020 18:33:14 -0300 Message-Id: <4f86b585b7cd1ae60ba5f90a74f0cca9af59929c.1597093021.git.matheus.bernardino@usp.br> X-Mailer: git-send-email 2.27.0 In-Reply-To: References: MIME-Version: 1.0 Sender: git-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org These two functions will be used by the parallel checkout code, so let's make them public. Note: fstat_output() is renamed to fstat_checkout_output(), now that it has become public, seeking to avoid future name collisions. Signed-off-by: Matheus Tavares --- entry.c | 8 ++++---- entry.h | 2 ++ 2 files changed, 6 insertions(+), 4 deletions(-) diff --git a/entry.c b/entry.c index f46c06e831..cc27564473 100644 --- a/entry.c +++ b/entry.c @@ -84,7 +84,7 @@ static int create_file(const char *path, unsigned int mode) return open(path, O_WRONLY | O_CREAT | O_EXCL, mode); } -static void *read_blob_entry(const struct cache_entry *ce, unsigned long *size) +void *read_blob_entry(const struct cache_entry *ce, unsigned long *size) { enum object_type type; void *blob_data = read_object_file(&ce->oid, &type, size); @@ -109,7 +109,7 @@ static int open_output_fd(char *path, const struct cache_entry *ce, int to_tempf } } -static int fstat_output(int fd, const struct checkout *state, struct stat *st) +int fstat_checkout_output(int fd, const struct checkout *state, struct stat *st) { /* use fstat() only when path == ce->name */ if (fstat_is_reliable() && @@ -132,7 +132,7 @@ static int streaming_write_entry(const struct cache_entry *ce, char *path, return -1; result |= stream_blob_to_fd(fd, &ce->oid, filter, 1); - *fstat_done = fstat_output(fd, state, statbuf); + *fstat_done = fstat_checkout_output(fd, state, statbuf); result |= close(fd); if (result) @@ -346,7 +346,7 
@@ static int write_entry(struct cache_entry *ce, wrote = write_in_full(fd, new_blob, size); if (!to_tempfile) - fstat_done = fstat_output(fd, state, &st); + fstat_done = fstat_checkout_output(fd, state, &st); close(fd); free(new_blob); if (wrote < 0) diff --git a/entry.h b/entry.h index 2d69185448..f860e60846 100644 --- a/entry.h +++ b/entry.h @@ -37,5 +37,7 @@ int finish_delayed_checkout(struct checkout *state, int *nr_checkouts); * removal, such that empty directories get removed. */ void unlink_entry(const struct cache_entry *ce); +void *read_blob_entry(const struct cache_entry *ce, unsigned long *size); +int fstat_checkout_output(int fd, const struct checkout *state, struct stat *st); #endif /* ENTRY_H */ From patchwork Mon Aug 10 21:33:15 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Matheus Tavares X-Patchwork-Id: 11708165 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 1EA19109A for ; Mon, 10 Aug 2020 21:34:59 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id F321C2073E for ; Mon, 10 Aug 2020 21:34:58 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=usp-br.20150623.gappssmtp.com header.i=@usp-br.20150623.gappssmtp.com header.b="LtFTgYsm" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726933AbgHJVe5 (ORCPT ); Mon, 10 Aug 2020 17:34:57 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:36120 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726615AbgHJVe4 (ORCPT ); Mon, 10 Aug 2020 17:34:56 -0400 Received: from mail-qk1-x741.google.com (mail-qk1-x741.google.com [IPv6:2607:f8b0:4864:20::741]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id BDD45C061756 for ; Mon, 10 Aug 2020 
14:34:56 -0700 (PDT) Received: by mail-qk1-x741.google.com with SMTP id p4so9885332qkf.0 for ; Mon, 10 Aug 2020 14:34:56 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=usp-br.20150623.gappssmtp.com; s=20150623; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=eM9FUx2sec1LsRUfqnljMBGrcrO2+jcyZTwQ95XJhww=; b=LtFTgYsmsjBKqXqB/vhINbdAjd5/JGC9a1kPqiuoCRI0vKuh4B3vr7oPc/OP55VdeQ OMWGErrA1D/hjEZFmH1hutWuxd6W+CI8q5OF7u+r0X+MW+KpvmWMcLiiqdT4NbEkeNIP M8YWPrmdFuUCMtC5Zto92xcfy3osZi/82oGlaFwE6yAc3874FCtvWLgz57VXjF7dWZsK m8buBS1WG4AJ5DOOIbQy7W+0Mz9vZ58JiB430SnkMPKHZd0h5mJE4Xa5UkCT4pGFDF+F Z/FKwRvaDUYhLGXyLg8610mjtCvTqXiXI7nP7h0YW+0Fzat4ZsneeSZbfVxP18GvJYRP Vecg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=eM9FUx2sec1LsRUfqnljMBGrcrO2+jcyZTwQ95XJhww=; b=FMAm4ZohxxozqTZawAZ7I0nIztQOmIDfkqyDULay33/EzBRCeC5tU8GjHcAtMy8UBi zAqLZ/dAb0769les0VLY5PhShnud5V/rHIC5FsS+YN259qN3tlCl+EY00reWsGjm0NqW cOB/MYrsGq+728A/fFTjL8ihiOtekuQmtnsFh4vMLs9Buyg6n2f5aiGBe7DV9/GgYIEZ 9ESUrZlCqSXpDdpUcUgODPNYZpnWJxbtCbKRD0bJ7cNnwf/LGxsAAXcNcze0/+fIuoRB Z+/4KBZLFH6RpkdOgOqBxcAnCsg3upf9wC4nFsGwINdomOaOZGWDuscp5gQPSdvhah0m Ju/Q== X-Gm-Message-State: AOAM530te6zLelVNRD4HGfG6mnWOBqHy39mdWCWTJiJHK5qJn1DUTQmq PAbL+ZZhaO8Jphcp298v2OiwI8QVC/M= X-Google-Smtp-Source: ABdhPJxVj8VYggZBUbTmELpkJ1A0cmn8/xMMX7Dr8PjyhuK3Kq2AL1ry6M3mKAj5wcVb3pHe6dPuoA== X-Received: by 2002:a05:620a:9d0:: with SMTP id y16mr28874383qky.353.1597095295625; Mon, 10 Aug 2020 14:34:55 -0700 (PDT) Received: from localhost.localdomain ([2804:18:87c:466:1120:3c2c:21e4:5931]) by smtp.gmail.com with ESMTPSA id z197sm15370674qkb.66.2020.08.10.14.34.52 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 10 Aug 2020 14:34:54 -0700 (PDT) From: Matheus Tavares To: git@vger.kernel.org Cc: stolee@gmail.com, 
jeffhost@microsoft.com, Lars Schneider , Jeff King , Junio C Hamano , =?utf-8?b?Tmd1eeG7hW4gVGjDoWkgTmfhu41j?= =?utf-8?b?IER1eQ==?= , Johannes Schindelin , Ben Peart Subject: [RFC PATCH 07/21] entry: extract cache_entry update from write_entry() Date: Mon, 10 Aug 2020 18:33:15 -0300 Message-Id: <4b9e130278495aa7819e5cfd450b0b01d8b58b6d.1597093021.git.matheus.bernardino@usp.br> X-Mailer: git-send-email 2.27.0 In-Reply-To: References: MIME-Version: 1.0 Sender: git-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org This code will be used by the parallel checkout functions, outside entry.c, so extract it to a public function. Signed-off-by: Matheus Tavares --- entry.c | 26 +++++++++++++++++--------- entry.h | 2 ++ 2 files changed, 19 insertions(+), 9 deletions(-) diff --git a/entry.c b/entry.c index cc27564473..837629a804 100644 --- a/entry.c +++ b/entry.c @@ -251,6 +251,19 @@ int finish_delayed_checkout(struct checkout *state, int *nr_checkouts) return errs; } +void update_ce_after_write(const struct checkout *state, struct cache_entry *ce, + struct stat *st) +{ + if (state->refresh_cache) { + assert(state->istate); + fill_stat_cache_info(state->istate, ce, st); + ce->ce_flags |= CE_UPDATE_IN_BASE; + mark_fsmonitor_invalid(state->istate, ce); + state->istate->cache_changed |= CE_ENTRY_CHANGED; + } +} + + static int write_entry(struct cache_entry *ce, char *path, const struct checkout *state, int to_tempfile) { @@ -371,15 +384,10 @@ static int write_entry(struct cache_entry *ce, finish: if (state->refresh_cache) { - assert(state->istate); - if (!fstat_done) - if (lstat(ce->name, &st) < 0) - return error_errno("unable to stat just-written file %s", - ce->name); - fill_stat_cache_info(state->istate, ce, &st); - ce->ce_flags |= CE_UPDATE_IN_BASE; - mark_fsmonitor_invalid(state->istate, ce); - state->istate->cache_changed |= CE_ENTRY_CHANGED; + if (!fstat_done && lstat(ce->name, &st) < 0) + return error_errno("unable to stat just-written 
file %s", + ce->name); + update_ce_after_write(state, ce , &st); } delayed: return 0; diff --git a/entry.h b/entry.h index f860e60846..664aed1576 100644 --- a/entry.h +++ b/entry.h @@ -39,5 +39,7 @@ int finish_delayed_checkout(struct checkout *state, int *nr_checkouts); void unlink_entry(const struct cache_entry *ce); void *read_blob_entry(const struct cache_entry *ce, unsigned long *size); int fstat_checkout_output(int fd, const struct checkout *state, struct stat *st); +void update_ce_after_write(const struct checkout *state, struct cache_entry *ce, + struct stat *st); #endif /* ENTRY_H */ From patchwork Mon Aug 10 21:33:16 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Matheus Tavares X-Patchwork-Id: 11708167 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 287D3109A for ; Mon, 10 Aug 2020 21:35:03 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 0A6A920734 for ; Mon, 10 Aug 2020 21:35:03 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=usp-br.20150623.gappssmtp.com header.i=@usp-br.20150623.gappssmtp.com header.b="jUoBcOsB" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726948AbgHJVfC (ORCPT ); Mon, 10 Aug 2020 17:35:02 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:36132 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726615AbgHJVfB (ORCPT ); Mon, 10 Aug 2020 17:35:01 -0400 Received: from mail-qv1-xf43.google.com (mail-qv1-xf43.google.com [IPv6:2607:f8b0:4864:20::f43]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 5D413C061756 for ; Mon, 10 Aug 2020 14:35:01 -0700 (PDT) Received: by mail-qv1-xf43.google.com with SMTP id cs12so5018878qvb.2 for ; Mon, 10 Aug 2020 14:35:01 
-0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=usp-br.20150623.gappssmtp.com; s=20150623; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=PoEe5TuOwwZlwqBMI+ExawovhMZUiU10LGramkQL/CE=; b=jUoBcOsBiarO5hFXkUPcfAyTif4Kxin5w/bHAGnzz2EQGJ0jNCG1OSAgC6BAvueczE zleq7TIhafncogGZ3cK9pbSAmayESYhkMLIMfmDf709F7vLKb7MgbHuP2aR1DvDNIuBD 9SZC15RQrU3JboxAeTfNoIL99KeH4llSuV3YF/Wg55RAjpmb/uo+YhgUFE5r1qX4gUet XRbHXAL6KNPCzTAvYTJsZDnPDCHLU2krpzLG1peB/xPQEy+qjXOwalBE6bXn1QEWKGuq 1pr3OzJ+YmvMImKM8Yrva3vjKJ0w+mZ22AJe6iutXpacbcSfRMKH1hxsNvEEP02dRTUU LfeA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=PoEe5TuOwwZlwqBMI+ExawovhMZUiU10LGramkQL/CE=; b=LrK2PLjVudwfs3kdIWFSl2m60Oa9Nl5t8Wfx/zxBXwX7Wetm4TWTz2TgHUFhnw01sw XRqVVNwAxAY9Ybu0PJGXIdk+h8Eqp8nu/Nx5IlC2nvpEbOnbYQlLgLMcPAFYNzzT217O fuqH2EjmjcQ4EPMBkGdxqDrRJ2ap1aDrT1nTktr1crADnPBAIOTMKzn10RFB6+0L205Y lW95jDgS8GdpESIwIN2XJv7+AIqFbk+qiP0ccFfgaPYysjDbO9rRRM/BcFzul6jOC3Ac Jo4g5QiRjXpBSFcvFHmNXNXQnArWYBpNJ9PFyz03PjdFGmPsefxQK6XwLdhOV1HnopGu aqaA== X-Gm-Message-State: AOAM533iyPhvANEgdB/i0qYxCgB8MFeGAOGfOGns9v8v4fZFhN9u6ZE+ Swh/QvClPstVdnX1TqGDd8hXkU5tX7I= X-Google-Smtp-Source: ABdhPJwHf7qsysdiHjNlLZOXIozFMydW+MmbWz8Kddb1OXdfqnSwCnS4lhRuBFse3dXjXHr1KaIoOg== X-Received: by 2002:ad4:470f:: with SMTP id k15mr30389462qvz.216.1597095300156; Mon, 10 Aug 2020 14:35:00 -0700 (PDT) Received: from localhost.localdomain ([2804:18:87c:466:1120:3c2c:21e4:5931]) by smtp.gmail.com with ESMTPSA id z197sm15370674qkb.66.2020.08.10.14.34.56 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 10 Aug 2020 14:34:58 -0700 (PDT) From: Matheus Tavares To: git@vger.kernel.org Cc: stolee@gmail.com, jeffhost@microsoft.com, Thomas Gummerer , Junio C Hamano , =?utf-8?b?Tmd1eeG7hW4gVGjDoWkgTmfhu41jIER1eQ==?= , "brian m. 
carlson" Subject: [RFC PATCH 08/21] entry: move conv_attrs lookup up to checkout_entry() Date: Mon, 10 Aug 2020 18:33:16 -0300 Message-Id: <934b025526d8ced39e7cb5b7a161bf80bc2ef99a.1597093021.git.matheus.bernardino@usp.br> X-Mailer: git-send-email 2.27.0 In-Reply-To: References: MIME-Version: 1.0 Sender: git-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org In a following patch, checkout_entry() will use conv_attrs to decide whether an entry should be enqueued for parallel checkout or not. But the attributes lookup only happens lower in this call stack. To avoid the unnecessary work of loading the attributes twice, let's move it up to checkout_entry(), and pass the loaded struct down to write_entry(). Signed-off-by: Matheus Tavares --- entry.c | 39 +++++++++++++++++++++++++++------------ 1 file changed, 27 insertions(+), 12 deletions(-) diff --git a/entry.c b/entry.c index 837629a804..59d5335ff1 100644 --- a/entry.c +++ b/entry.c @@ -263,9 +263,9 @@ void update_ce_after_write(const struct checkout *state, struct cache_entry *ce, } } - -static int write_entry(struct cache_entry *ce, - char *path, const struct checkout *state, int to_tempfile) +/* Note: ca is used (and required) iff the entry refers to a regular file. 
*/ +static int write_entry(struct cache_entry *ce, char *path, struct conv_attrs *ca, + const struct checkout *state, int to_tempfile) { unsigned int ce_mode_s_ifmt = ce->ce_mode & S_IFMT; struct delayed_checkout *dco = state->delayed_checkout; @@ -282,8 +282,7 @@ static int write_entry(struct cache_entry *ce, clone_checkout_metadata(&meta, &state->meta, &ce->oid); if (ce_mode_s_ifmt == S_IFREG) { - struct stream_filter *filter = get_stream_filter(state->istate, ce->name, - &ce->oid); + struct stream_filter *filter = get_stream_filter_ca(ca, &ce->oid); if (filter && !streaming_write_entry(ce, path, filter, state, to_tempfile, @@ -330,14 +329,17 @@ static int write_entry(struct cache_entry *ce, * Convert from git internal format to working tree format */ if (dco && dco->state != CE_NO_DELAY) { - ret = async_convert_to_working_tree(state->istate, ce->name, new_blob, - size, &buf, &meta, dco); + ret = async_convert_to_working_tree_ca(ca, ce->name, + new_blob, size, + &buf, &meta, dco); if (ret && string_list_has_string(&dco->paths, ce->name)) { free(new_blob); goto delayed; } - } else - ret = convert_to_working_tree(state->istate, ce->name, new_blob, size, &buf, &meta); + } else { + ret = convert_to_working_tree_ca(ca, ce->name, new_blob, + size, &buf, &meta); + } if (ret) { free(new_blob); @@ -443,6 +445,7 @@ int checkout_entry(struct cache_entry *ce, const struct checkout *state, { static struct strbuf path = STRBUF_INIT; struct stat st; + struct conv_attrs ca; if (ce->ce_flags & CE_WT_REMOVE) { if (topath) @@ -455,8 +458,13 @@ int checkout_entry(struct cache_entry *ce, const struct checkout *state, return 0; } - if (topath) - return write_entry(ce, topath, state, 1); + if (topath) { + if (S_ISREG(ce->ce_mode)) { + convert_attrs(state->istate, &ca, ce->name); + return write_entry(ce, topath, &ca, state, 1); + } + return write_entry(ce, topath, NULL, state, 1); + } strbuf_reset(&path); strbuf_add(&path, state->base_dir, state->base_dir_len); @@ -520,9 +528,16 @@ int 
checkout_entry(struct cache_entry *ce, const struct checkout *state, return 0; create_directories(path.buf, path.len, state); + if (nr_checkouts) (*nr_checkouts)++; - return write_entry(ce, path.buf, state, 0); + + if (S_ISREG(ce->ce_mode)) { + convert_attrs(state->istate, &ca, ce->name); + return write_entry(ce, path.buf, &ca, state, 0); + } + + return write_entry(ce, path.buf, NULL, state, 0); } void unlink_entry(const struct cache_entry *ce) From patchwork Mon Aug 10 21:33:17 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Matheus Tavares X-Patchwork-Id: 11708169 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id DB32F109A for ; Mon, 10 Aug 2020 21:35:06 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id C1A932073E for ; Mon, 10 Aug 2020 21:35:06 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=usp-br.20150623.gappssmtp.com header.i=@usp-br.20150623.gappssmtp.com header.b="uA+0mJC8" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726574AbgHJVfF (ORCPT ); Mon, 10 Aug 2020 17:35:05 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:36146 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726615AbgHJVfF (ORCPT ); Mon, 10 Aug 2020 17:35:05 -0400 Received: from mail-qk1-x741.google.com (mail-qk1-x741.google.com [IPv6:2607:f8b0:4864:20::741]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 6DFF4C061787 for ; Mon, 10 Aug 2020 14:35:05 -0700 (PDT) Received: by mail-qk1-x741.google.com with SMTP id h7so9866073qkk.7 for ; Mon, 10 Aug 2020 14:35:05 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=usp-br.20150623.gappssmtp.com; s=20150623; 
h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=DgRCzcq3hK6Os5aV3Xq9Kffc33HyktM7Rm9FPSc61pw=; b=uA+0mJC8++PmIHbn1FOSecAexeEtwfO2zKIpGjaDNyX+3x8QmGLQ4P2v4HYp60v9zv 03kC9dNO9L/oajapTj4BumJ0Ow7yn9DihgpGYb0TRWwEE3VjXPHW57AoYxGMRUwRFKTH aes8kTfTAPuDuTaV3pK0qm0fwRL06Yo7FKF8pmv4+xa/bO/yiCsBaWFJfBtJYvUpno0u PEdI54+ZnnHmnz+SP/ji+gBICFBfI3k1zOFCnIrwmt4FFaDeb5DX40ppkdZYkf2hVuJQ XQB1u9u0X1oP8jFmMbFWPO6/E/ZvNjvkAfEEO9P/PsI33jeIhU4U5N0gihO1bJerE26L nDDQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=DgRCzcq3hK6Os5aV3Xq9Kffc33HyktM7Rm9FPSc61pw=; b=EkAKvm5DRlkEOhUsPZXeBrAh/i5D7MgXDcL9WzXH9ZXvoLfsSqRUlp7MsEQ9Q/5jRx VdUJdOBQ5kmFLu7I/6IM+O+jK9EcpJ62N+i5h5HoV/qiIVov5TCBPd0h12JJR+kYbrAF 9CxVRddVHpurYjDmH8eMRZ0iCr/mBYKi+i0stNahlKoWR81Xt4L8FqQOfT6tZt1egi5O 45DeA+V7Pa0O2j9E1u9bbah9xPHSS5Ve9GQoyHFQJIe48D/bPGzWQtltrYjT9/WkAxCA k0A5PEvUF8lphU8WaO5eb3whhh5NU+PgSMLbdZBJfVCIfb1bxXCeuxjZ20/H5p4YGpAO NN0A== X-Gm-Message-State: AOAM530W1IsK9GtfZMryFjaJiK5WCBxWlmM5PJuDT3JSI6J75q5LMI2E IlVi2E/DlbtcIOF/N6IvLBthPPN+XEo= X-Google-Smtp-Source: ABdhPJwLHx6j11BlAFO2taM9tUMgFwEsEiQ7SOGARaqi4q4UMkLjPHaTqmK3pX4MnmBaWCg8LS6GJA== X-Received: by 2002:a37:4144:: with SMTP id o65mr28736020qka.32.1597095304039; Mon, 10 Aug 2020 14:35:04 -0700 (PDT) Received: from localhost.localdomain ([2804:18:87c:466:1120:3c2c:21e4:5931]) by smtp.gmail.com with ESMTPSA id z197sm15370674qkb.66.2020.08.10.14.35.00 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 10 Aug 2020 14:35:03 -0700 (PDT) From: Matheus Tavares To: git@vger.kernel.org Cc: stolee@gmail.com, jeffhost@microsoft.com, Thomas Gummerer , Junio C Hamano , Denton Liu , =?utf-8?b?Tmd1eeG7hW4gVGjDoWkgTmfhu41j?= =?utf-8?b?IER1eQ==?= Subject: [RFC PATCH 09/21] entry: add checkout_entry_ca() which takes preloaded conv_attrs Date: Mon, 10 
Aug 2020 18:33:17 -0300 Message-Id: <6ff36c853294f2449f781cfa1baaa63e691fa3d9.1597093021.git.matheus.bernardino@usp.br> X-Mailer: git-send-email 2.27.0 In-Reply-To: References: MIME-Version: 1.0 Sender: git-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org The parallel checkout machinery will call checkout_entry() for entries that could not be written in parallel due to path collisions. At this point, we will already be holding the conversion attributes for each entry, and it would be wasteful to let checkout_entry() load these again. Instead, let's add the checkout_entry_ca() variant, which optionally takes a preloaded conv_attrs struct. Signed-off-by: Matheus Tavares --- entry.c | 23 ++++++++++++----------- entry.h | 13 +++++++++++-- 2 files changed, 23 insertions(+), 13 deletions(-) diff --git a/entry.c b/entry.c index 59d5335ff1..f9835afba3 100644 --- a/entry.c +++ b/entry.c @@ -440,12 +440,13 @@ static void mark_colliding_entries(const struct checkout *state, } } -int checkout_entry(struct cache_entry *ce, const struct checkout *state, - char *topath, int *nr_checkouts) +int checkout_entry_ca(struct cache_entry *ce, struct conv_attrs *ca, + const struct checkout *state, char *topath, + int *nr_checkouts) { static struct strbuf path = STRBUF_INIT; struct stat st; - struct conv_attrs ca; + struct conv_attrs ca_buf; if (ce->ce_flags & CE_WT_REMOVE) { if (topath) @@ -459,11 +460,11 @@ int checkout_entry(struct cache_entry *ce, const struct checkout *state, } if (topath) { - if (S_ISREG(ce->ce_mode)) { - convert_attrs(state->istate, &ca, ce->name); - return write_entry(ce, topath, &ca, state, 1); + if (S_ISREG(ce->ce_mode) && !ca) { + convert_attrs(state->istate, &ca_buf, ce->name); + ca = &ca_buf; } - return write_entry(ce, topath, NULL, state, 1); + return write_entry(ce, topath, ca, state, 1); } strbuf_reset(&path); @@ -532,12 +533,12 @@ int checkout_entry(struct cache_entry *ce, const struct checkout *state, if (nr_checkouts) 
(*nr_checkouts)++; - if (S_ISREG(ce->ce_mode)) { - convert_attrs(state->istate, &ca, ce->name); - return write_entry(ce, path.buf, &ca, state, 0); + if (S_ISREG(ce->ce_mode) && !ca) { + convert_attrs(state->istate, &ca_buf, ce->name); + ca = &ca_buf; } - return write_entry(ce, path.buf, NULL, state, 0); + return write_entry(ce, path.buf, ca, state, 0); } void unlink_entry(const struct cache_entry *ce) diff --git a/entry.h b/entry.h index 664aed1576..2081fbbbab 100644 --- a/entry.h +++ b/entry.h @@ -27,9 +27,18 @@ struct checkout { * file named by ce, a temporary file is created by this function and * its name is returned in topath[], which must be able to hold at * least TEMPORARY_FILENAME_LENGTH bytes long. + * + * With checkout_entry_ca(), callers can optionally pass a preloaded + * conv_attrs struct (to avoid reloading it), when ce refers to a + * regular file. If ca is NULL, the attributes will be loaded + * internally when (and if) needed. */ -int checkout_entry(struct cache_entry *ce, const struct checkout *state, - char *topath, int *nr_checkouts); +#define checkout_entry(ce, state, topath, nr_checkouts) \ + checkout_entry_ca(ce, NULL, state, topath, nr_checkouts) +int checkout_entry_ca(struct cache_entry *ce, struct conv_attrs *ca, + const struct checkout *state, char *topath, + int *nr_checkouts); + void enable_delayed_checkout(struct checkout *state); int finish_delayed_checkout(struct checkout *state, int *nr_checkouts); /* From patchwork Mon Aug 10 21:33:18 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Matheus Tavares X-Patchwork-Id: 11708171 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id E4286109A for ; Mon, 10 Aug 2020 21:35:11 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id B8F0D206DA for ; 
Mon, 10 Aug 2020 21:35:11 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=usp-br.20150623.gappssmtp.com header.i=@usp-br.20150623.gappssmtp.com header.b="qVCTxQMP" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726537AbgHJVfL (ORCPT ); Mon, 10 Aug 2020 17:35:11 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:36160 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726472AbgHJVfK (ORCPT ); Mon, 10 Aug 2020 17:35:10 -0400 Received: from mail-qt1-x844.google.com (mail-qt1-x844.google.com [IPv6:2607:f8b0:4864:20::844]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 56EEBC061756 for ; Mon, 10 Aug 2020 14:35:10 -0700 (PDT) Received: by mail-qt1-x844.google.com with SMTP id c12so7982459qtn.9 for ; Mon, 10 Aug 2020 14:35:10 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=usp-br.20150623.gappssmtp.com; s=20150623; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=wVm26iTCCXeQuT8Vip9MqcUaktabSiUVttAJe6FADfU=; b=qVCTxQMP4hUzCO1ZLGaAzyrM/XCyhpJ47KM5Ma1XXKUJxalGOC1mL/qSAM1Qkvsf4f XzHP56oWVUAHHsZxWYQg/fIa5JZMg2WUs10csKzG6KtBRFvgT1F0Wxz95p57NVEWG/6F Td4ieF+t0KeFRc1vnSvL346KGcHcN+vnqsoIOybDdltspOgMZKder874xjCfl8teVQMm ScuLYEhVj8r32bqKj9g6ccvSloSYEJDoqcoHogrUsthmNAB+/N0QMFG5WYPw2s9rjfX0 r7egN0e2I9OAzkMrKouqEAW7g3UMsMPcf8HYSQVl9TMyjcoI4q3o9GY0GOsmAaveM70E UaUA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=wVm26iTCCXeQuT8Vip9MqcUaktabSiUVttAJe6FADfU=; b=abIfl8Bdo/YIgH+Uv7mOgnn7c/GOwpZ0t/UB4I02sKp3PLARGX4mZ+JPzRbGRFcHdf GSuepFUhl/GjBTr6xFxNIAOWm22UZHMgzQ6Ed2UqGMj5nypphXAVSzU+rRQzBgJAI/Lv jZl/csJJdFXTpoP1yPbFEkWeLE1Ryj37B4mm3ohvwlj1LwKZlUTTGJ6BFnfxR02O7xJF 
N0gMIBciFH2qkw9c4QpKrsFEV2+fOKxnn58dLFEwqchONSzh9ze+gxLNbcC9eXAand41 2Gi0o2Ne1i9R3v5FKDWXqs1/2MQQnYmO2JNxMHlK6UT7zH8yNOQiy2iSElgCbeiCNSi/ 6TGg== X-Gm-Message-State: AOAM532mZFtRdEEeZKW8DIPwMFtg3IwsePoPY5kjCvtdGaDSL8vvpaEV 6BGK5gk5x7TmQCyfT+f/nLUej3iXA3o= X-Google-Smtp-Source: ABdhPJw11l5s5OrUX7KjKcGq960UrKVnP627WCj3Ty1X9vX9uWijSAqwGbtw4GcVfeDUqI4tyLSVNQ== X-Received: by 2002:ac8:428f:: with SMTP id o15mr27562365qtl.213.1597095308816; Mon, 10 Aug 2020 14:35:08 -0700 (PDT) Received: from localhost.localdomain ([2804:18:87c:466:1120:3c2c:21e4:5931]) by smtp.gmail.com with ESMTPSA id z197sm15370674qkb.66.2020.08.10.14.35.05 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 10 Aug 2020 14:35:07 -0700 (PDT) From: Matheus Tavares To: git@vger.kernel.org Cc: stolee@gmail.com, jeffhost@microsoft.com, =?utf-8?b?Tmd1eeG7hW4gVGjDoWkg?= =?utf-8?b?Tmfhu41jIER1eQ==?= , Jonathan Tan , =?utf-8?q?Ren=C3=A9_Scharfe?= , Christian Couder , Stefan Beller , Junio C Hamano , Lars Schneider Subject: [RFC PATCH 10/21] unpack-trees: add basic support for parallel checkout Date: Mon, 10 Aug 2020 18:33:18 -0300 Message-Id: <1b39a4099a69f2c42211d46d615055c783703fea.1597093021.git.matheus.bernardino@usp.br> X-Mailer: git-send-email 2.27.0 In-Reply-To: References: MIME-Version: 1.0 Sender: git-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org This new interface allows us to enqueue some of the entries being checked out to later call write_entry() for them in parallel. For now, the parallel checkout machinery is enabled by default and there is no user configuration, but run_parallel_checkout() just writes the queued entries in sequence (without spawning additional workers). The next patch will actually implement the parallelism and, later, we will make it configurable. When there are path collisions among the entries being written (which can happen e.g. 
with case-sensitive files in case-insensitive file systems), the parallel checkout code detects the problem and marks the checkout_item with CI_RETRY. Later, these items are sequentially fed to checkout_entry() again. This is similar to the way the sequential code deals with collisions, overwriting the previously checked out entries with the subsequent ones. The only difference is that, once we start writing the entries in parallel, we will no longer be able to determine which of the colliding entries will survive on disk (for the sequential algorithm, it is always the last one). Finally, just like the sequential code, there is no additional overhead when there are no collisions.

Note: we continue the loop of write_checkout_item() even if the previous call returned an error. This is how checkout_entry() is called in builtin/checkout.c:checkout_paths() and unpack-trees.c:check_updates(). In the case of fatal errors, die() aborts the loop.

Co-authored-by: Nguyễn Thái Ngọc Duy
Co-authored-by: Jeff Hostetler
Signed-off-by: Matheus Tavares
---
For consistency, the parallel code replicates the sequential behavior of overwriting colliding entries. However, during parallel checkout it's possible to distinguish a path collision from the case where a path was already present in the working tree before checkout. So, in the event of a collision, we could choose to write a single entry and skip overwriting it with the next ones. Does that sound reasonable, or are there other problems in not writing the extra colliding entries?
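To make the collision handling discussed above more concrete, here is a minimal standalone sketch of the idea (not the code in this patch): a file is created exclusively, and an EEXIST or EISDIR failure is treated as a probable path collision to be retried later, sequentially. The names sketch_status and sketch_try_create are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical status values, loosely mirroring enum ci_status in the patch. */
enum sketch_status { SK_SUCCESS, SK_RETRY, SK_FAILED };

/*
 * Try to create `path` exclusively. If the path already exists (EEXIST)
 * or is a directory (EISDIR), assume a probable path collision and ask
 * the caller to retry later, sequentially. Any other error is a failure.
 */
enum sketch_status sketch_try_create(const char *path)
{
	int fd;

	assert(path);
	fd = open(path, O_WRONLY | O_CREAT | O_EXCL, 0666);
	if (fd < 0)
		return (errno == EEXIST || errno == EISDIR) ? SK_RETRY : SK_FAILED;

	close(fd);
	return SK_SUCCESS;
}
```

Calling sketch_try_create() twice on the same fresh path would be expected to report success the first time and a retry the second time, which is the situation the real code marks with CI_RETRY.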
Makefile | 1 + entry.c | 4 + parallel-checkout.c | 340 ++++++++++++++++++++++++++++++++++++++++++++ parallel-checkout.h | 20 +++ unpack-trees.c | 6 +- 5 files changed, 370 insertions(+), 1 deletion(-) create mode 100644 parallel-checkout.c create mode 100644 parallel-checkout.h diff --git a/Makefile b/Makefile index 65f8cfb236..caab8e6401 100644 --- a/Makefile +++ b/Makefile @@ -933,6 +933,7 @@ LIB_OBJS += pack-revindex.o LIB_OBJS += pack-write.o LIB_OBJS += packfile.o LIB_OBJS += pager.o +LIB_OBJS += parallel-checkout.o LIB_OBJS += parse-options-cb.o LIB_OBJS += parse-options.o LIB_OBJS += patch-delta.o diff --git a/entry.c b/entry.c index f9835afba3..47c2c20d5a 100644 --- a/entry.c +++ b/entry.c @@ -7,6 +7,7 @@ #include "progress.h" #include "fsmonitor.h" #include "entry.h" +#include "parallel-checkout.h" static void create_directories(const char *path, int path_len, const struct checkout *state) @@ -538,6 +539,9 @@ int checkout_entry_ca(struct cache_entry *ce, struct conv_attrs *ca, ca = &ca_buf; } + if (!enqueue_checkout(ce, ca)) + return 0; + return write_entry(ce, path.buf, ca, state, 0); } diff --git a/parallel-checkout.c b/parallel-checkout.c new file mode 100644 index 0000000000..e3b44eeb34 --- /dev/null +++ b/parallel-checkout.c @@ -0,0 +1,340 @@ +#include "cache.h" +#include "entry.h" +#include "parallel-checkout.h" +#include "streaming.h" + +enum ci_status { + CI_PENDING = 0, + CI_SUCCESS, + CI_RETRY, + CI_FAILED, +}; + +struct checkout_item { + /* pointer to a istate->cache[] entry. Not owned by us. 
*/ + struct cache_entry *ce; + struct conv_attrs ca; + struct stat st; + enum ci_status status; +}; + +struct parallel_checkout { + struct checkout_item *items; + size_t nr, alloc; +}; + +static struct parallel_checkout *parallel_checkout = NULL; + +enum pc_status { + PC_UNINITIALIZED = 0, + PC_ACCEPTING_ENTRIES, + PC_RUNNING, + PC_HANDLING_RESULTS, +}; + +static enum pc_status pc_status = PC_UNINITIALIZED; + +void init_parallel_checkout(void) +{ + if (parallel_checkout) + BUG("parallel checkout already initialized"); + + parallel_checkout = xcalloc(1, sizeof(*parallel_checkout)); + pc_status = PC_ACCEPTING_ENTRIES; +} + +static void finish_parallel_checkout(void) +{ + if (!parallel_checkout) + BUG("cannot finish parallel checkout: not initialized yet"); + + free(parallel_checkout->items); + FREE_AND_NULL(parallel_checkout); + pc_status = PC_UNINITIALIZED; +} + +static int is_eligible_for_parallel_checkout(const struct cache_entry *ce, + const struct conv_attrs *ca) +{ + enum conv_attrs_classification c; + + if (!S_ISREG(ce->ce_mode)) + return 0; + + c = classify_conv_attrs(ca); + switch (c) { + case CA_CLASS_INCORE: + return 1; + + case CA_CLASS_INCORE_FILTER: + /* + * It would be safe to allow concurrent instances of + * single-file smudge filters, like rot13, but we should not + * assume that all filters are parallel-process safe. So we + * don't allow this. + */ + return 0; + + case CA_CLASS_INCORE_PROCESS: + /* + * The parallel queue and the delayed queue are not compatible, + * so they must be kept completely separated. And we can't tell + * if a long-running process will delay its response without + * actually asking it to perform the filtering. Therefore, this + * type of filter is not allowed in parallel checkout. + * + * Furthermore, there should only be one instance of the + * long-running process filter as we don't know how it is + * managing its own concurrency. 
So, spreading the entries that + * require such a filter among the parallel workers would + * require a lot more inter-process communication. We would + * probably have to designate a single process to interact with + * the filter and send all the necessary data to it, for each + * entry. + */ + return 0; + + case CA_CLASS_STREAMABLE: + return 1; + + default: + BUG("unsupported conv_attrs classification '%d'", c); + } +} + +int enqueue_checkout(struct cache_entry *ce, struct conv_attrs *ca) +{ + struct checkout_item *ci; + + if (!parallel_checkout || pc_status != PC_ACCEPTING_ENTRIES || + !is_eligible_for_parallel_checkout(ce, ca)) + return -1; + + ALLOC_GROW(parallel_checkout->items, parallel_checkout->nr + 1, + parallel_checkout->alloc); + + ci = &parallel_checkout->items[parallel_checkout->nr++]; + ci->ce = ce; + memcpy(&ci->ca, ca, sizeof(ci->ca)); + + return 0; +} + +static int handle_results(struct checkout *state) +{ + int ret = 0; + size_t i; + + pc_status = PC_HANDLING_RESULTS; + + for (i = 0; i < parallel_checkout->nr; ++i) { + struct checkout_item *ci = &parallel_checkout->items[i]; + struct stat *st = &ci->st; + + switch(ci->status) { + case CI_SUCCESS: + update_ce_after_write(state, ci->ce, st); + break; + case CI_RETRY: + /* + * The failures for which we set CI_RETRY are the ones + * that might have been caused by a path collision. So + * we let checkout_entry_ca() retry writing, as it will + * properly handle collisions and the creation of + * leading dirs in the entry's path.
+ */ + ret |= checkout_entry_ca(ci->ce, &ci->ca, state, NULL, NULL); + break; + case CI_FAILED: + ret = -1; + break; + case CI_PENDING: + BUG("parallel checkout finished with pending entries"); + default: + BUG("unknown checkout item status in parallel checkout"); + } + } + + return ret; +} + +static int reset_fd(int fd, const char *path) +{ + if (lseek(fd, 0, SEEK_SET) != 0) + return error_errno("failed to rewind descriptor of %s", path); + if (ftruncate(fd, 0)) + return error_errno("failed to truncate file %s", path); + return 0; +} + +static int write_checkout_item_to_fd(int fd, struct checkout *state, + struct checkout_item *ci, const char *path) +{ + int ret; + struct stream_filter *filter; + struct strbuf buf = STRBUF_INIT; + char *new_blob; + unsigned long size; + size_t newsize = 0; + ssize_t wrote; + + /* Sanity check */ + assert(is_eligible_for_parallel_checkout(ci->ce, &ci->ca)); + + filter = get_stream_filter_ca(&ci->ca, &ci->ce->oid); + if (filter) { + if (stream_blob_to_fd(fd, &ci->ce->oid, filter, 1)) { + /* On error, reset fd to try writing without streaming */ + if (reset_fd(fd, path)) + return -1; + } else { + return 0; + } + } + + new_blob = read_blob_entry(ci->ce, &size); + if (!new_blob) + return error("unable to read sha1 file of %s (%s)", path, + oid_to_hex(&ci->ce->oid)); + + /* + * checkout metadata is used to give context for external process + * filters. Files requiring such filters are not eligible for parallel + * checkout, so pass NULL. 
+ */ + ret = convert_to_working_tree_ca(&ci->ca, ci->ce->name, new_blob, size, + &buf, NULL); + + if (ret) { + free(new_blob); + new_blob = strbuf_detach(&buf, &newsize); + size = newsize; + } + + wrote = write_in_full(fd, new_blob, size); + free(new_blob); + if (wrote < 0) + return error("unable to write file %s", path); + + return 0; +} + +static int close_and_clear(int *fd) +{ + int ret = 0; + + if (*fd >= 0) { + ret = close(*fd); + *fd = -1; + } + + return ret; +} + +static int check_leading_dirs(const char *path, int len, int prefix_len) +{ + const char *slash = path + len; + + while (slash > path && *slash != '/') + slash--; + + return has_dirs_only_path(path, slash - path, prefix_len); +} + +static void write_checkout_item(struct checkout *state, struct checkout_item *ci) +{ + unsigned int mode = (ci->ce->ce_mode & 0100) ? 0777 : 0666; + int fd = -1, fstat_done = 0; + struct strbuf path = STRBUF_INIT; + + strbuf_add(&path, state->base_dir, state->base_dir_len); + strbuf_add(&path, ci->ce->name, ce_namelen(ci->ce)); + + /* + * At this point, leading dirs should have already been created. But if + * a symlink being checked out has collided with one of the dirs, due to + * file system folding rules, it's possible that the dirs are no longer + * present. So we have to check again, and report any path collisions. + */ + if (!check_leading_dirs(path.buf, path.len, state->base_dir_len)) { + ci->status = CI_RETRY; + goto out; + } + + fd = open(path.buf, O_WRONLY | O_CREAT | O_EXCL, mode); + + if (fd < 0) { + if (errno == EEXIST || errno == EISDIR) { + /* + * Errors which probably represent a path collision. + * Suppress the error message and mark the ci to be + * retried later, sequentially. ENOTDIR and ENOENT are + * also interesting, but check_leading_dirs() should + * have already caught these cases. 
+ */ + ci->status = CI_RETRY; + } else { + error_errno("failed to open file %s", path.buf); + ci->status = CI_FAILED; + } + goto out; + } + + if (write_checkout_item_to_fd(fd, state, ci, path.buf)) { + /* Error was already reported. */ + ci->status = CI_FAILED; + goto out; + } + + fstat_done = fstat_checkout_output(fd, state, &ci->st); + + if (close_and_clear(&fd)) { + error_errno("unable to close file %s", path.buf); + ci->status = CI_FAILED; + goto out; + } + + if (state->refresh_cache && !fstat_done && lstat(path.buf, &ci->st) < 0) { + error_errno("unable to stat just-written file %s", path.buf); + ci->status = CI_FAILED; + goto out; + } + + ci->status = CI_SUCCESS; + +out: + /* + * No need to check close() return. At this point, either fd is already + * closed, or we are on an error path, that has already been reported. + */ + close_and_clear(&fd); + strbuf_release(&path); +} + +static int run_checkout_sequentially(struct checkout *state) +{ + size_t i; + + for (i = 0; i < parallel_checkout->nr; ++i) { + struct checkout_item *ci = &parallel_checkout->items[i]; + write_checkout_item(state, ci); + } + + return handle_results(state); +} + + +int run_parallel_checkout(struct checkout *state) +{ + int ret; + + if (!parallel_checkout) + BUG("cannot run parallel checkout: not initialized yet"); + + pc_status = PC_RUNNING; + + ret = run_checkout_sequentially(state); + + finish_parallel_checkout(); + return ret; +} diff --git a/parallel-checkout.h b/parallel-checkout.h new file mode 100644 index 0000000000..8eef59ffcd --- /dev/null +++ b/parallel-checkout.h @@ -0,0 +1,20 @@ +#ifndef PARALLEL_CHECKOUT_H +#define PARALLEL_CHECKOUT_H + +struct cache_entry; +struct checkout; +struct conv_attrs; + +void init_parallel_checkout(void); + +/* + * Return -1 if parallel checkout is currently not enabled or if the entry is + * not eligible for parallel checkout. Otherwise, enqueue the entry for later + * write and return 0.
+ */ +int enqueue_checkout(struct cache_entry *ce, struct conv_attrs *ca); + +/* Write all the queued entries, returning 0 on success.*/ +int run_parallel_checkout(struct checkout *state); + +#endif /* PARALLEL_CHECKOUT_H */ diff --git a/unpack-trees.c b/unpack-trees.c index a511fadd89..1b1da7485a 100644 --- a/unpack-trees.c +++ b/unpack-trees.c @@ -17,6 +17,7 @@ #include "object-store.h" #include "promisor-remote.h" #include "entry.h" +#include "parallel-checkout.h" /* * Error messages expected by scripts out of plumbing commands such as @@ -438,7 +439,6 @@ static int check_updates(struct unpack_trees_options *o, if (should_update_submodules()) load_gitmodules_file(index, &state); - enable_delayed_checkout(&state); if (has_promisor_remote()) { /* * Prefetch the objects that are to be checked out in the loop @@ -461,6 +461,9 @@ static int check_updates(struct unpack_trees_options *o, to_fetch.oid, to_fetch.nr); oid_array_clear(&to_fetch); } + + enable_delayed_checkout(&state); + init_parallel_checkout(); for (i = 0; i < index->cache_nr; i++) { struct cache_entry *ce = index->cache[i]; @@ -474,6 +477,7 @@ static int check_updates(struct unpack_trees_options *o, } } stop_progress(&progress); + errs |= run_parallel_checkout(&state); errs |= finish_delayed_checkout(&state, NULL); git_attr_set_direction(GIT_ATTR_CHECKIN); From patchwork Mon Aug 10 21:33:19 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Matheus Tavares X-Patchwork-Id: 11708173 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 1850816B1 for ; Mon, 10 Aug 2020 21:35:18 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id DF1DE20734 for ; Mon, 10 Aug 2020 21:35:17 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) 
header.d=usp-br.20150623.gappssmtp.com header.i=@usp-br.20150623.gappssmtp.com header.b="hNs/nK5u" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726615AbgHJVfR (ORCPT ); Mon, 10 Aug 2020 17:35:17 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:36182 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726472AbgHJVfQ (ORCPT ); Mon, 10 Aug 2020 17:35:16 -0400 Received: from mail-qv1-xf29.google.com (mail-qv1-xf29.google.com [IPv6:2607:f8b0:4864:20::f29]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id AC75BC061756 for ; Mon, 10 Aug 2020 14:35:15 -0700 (PDT) Received: by mail-qv1-xf29.google.com with SMTP id s15so5004919qvv.7 for ; Mon, 10 Aug 2020 14:35:15 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=usp-br.20150623.gappssmtp.com; s=20150623; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=oB+9+sH+zzmqoJU3KU14T7RXeXV94nelW5FCDp0hghs=; b=hNs/nK5ukfprghI8vG76Ccect57KDLwVMyD3QXquPk1DAhRO/6xxIpE18EOGud21uI NiO/GZLy5/KlJQvFnKz3+waKZSsXLTCMDcY9tYoFve9JwmsIZz1EOnPC/1XkAbKJoDyX P1z0ALT69XbQKRVVfHJ6AeXZHFarUu2KKXJwujUWDD+z5r630NsmqgEKY1RXZCSoJ6wQ 5YzrDr9uaQmgmnI6XvKGm5Pc5Z9vLyd+/0dkg/XfNknmp81b3cLK7qeJdKEHe+fw+x/P ikG7x/LT+YI7wk/JAzI3d38v9G5NdxKtbsQAgSPvAXxTZ+M+Cb2XAO9cdwtqMEH9wdfG c4CA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=oB+9+sH+zzmqoJU3KU14T7RXeXV94nelW5FCDp0hghs=; b=Z7WV4H/gGUDC6NxzqAD5Wc3WLYZDONydzZXgR60riu/oAltIY1M/+YHCB/H6PVSOPf 4xcEYvfURflHYVzyvSReglU7/2nXZ+UkDp0dpHjf03Z1XahH5Db0aDaz5be0ARPyuDQe FTBhcgyiAo39D/Y8srBC0aaeV1/BqRtvuJ3KMqUtEWc6V6/w++QU21ANx2cTm8acJV8S 8JwqNfMnGRkgxZfGy4VQvtQRtrpDPAPjxVWV4FdAMVmnE2OX3/vL2Rnsdq4DY+XD5Bba flJ6SY0vOhpKviHArl7f9sNgLbweHqRZZ1f1qIfOGiEhcR8GgBJWTnKSi3SVXfxd2odf ZRMw== X-Gm-Message-State: 
AOAM533vtA6XZJZ9d+crx8UEQSxzaV1lZSVs4/PmrUYmqFw7QyLGD4BW F3CS+7b/NMJ4MngEP36MaqVmzqv8DvU= X-Google-Smtp-Source: ABdhPJwydQrRPCQcHVkwi2bM7hnK0JV+QcF9WuBL8rj4dXsZ5zM6y4Lkn553arN1e7GuvjAO87Pv2Q== X-Received: by 2002:a05:6214:d43:: with SMTP id 3mr30709556qvr.47.1597095313878; Mon, 10 Aug 2020 14:35:13 -0700 (PDT) Received: from localhost.localdomain ([2804:18:87c:466:1120:3c2c:21e4:5931]) by smtp.gmail.com with ESMTPSA id z197sm15370674qkb.66.2020.08.10.14.35.10 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 10 Aug 2020 14:35:13 -0700 (PDT) From: Matheus Tavares To: git@vger.kernel.org Cc: stolee@gmail.com, jeffhost@microsoft.com, =?utf-8?b?Tmd1eeG7hW4gVGjDoWkg?= =?utf-8?b?Tmfhu41jIER1eQ==?= , Paul Tan , Denton Liu , Remi Lespinet , Junio C Hamano Subject: [RFC PATCH 11/21] parallel-checkout: make it truly parallel Date: Mon, 10 Aug 2020 18:33:19 -0300 Message-Id: <7e7527ef3e8a9e71a012f1623e9642c47f7f741c.1597093021.git.matheus.bernardino@usp.br> X-Mailer: git-send-email 2.27.0 In-Reply-To: References: MIME-Version: 1.0 Sender: git-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org Use multiple worker processes to distribute the queued entries and call write_checkout_item() in parallel for them. The items are distributed uniformly in contiguous chunks. This minimizes the chances of two workers writing to the same directory simultaneously, which could affect performance due to lock contention in the kernel. Work stealing (or any other format of re-distribution) is not implemented yet. For now, the number of workers is equal to the number of logical cores available. But the next patch will add settings to configure this. Distributed file systems, such as NFS and EFS, can benefit from using more workers than the actual number of cores (see timings below). 
The parallel version was benchmarked during three operations in the linux repo, with cold cache: cloning v5.8, checking out v5.8 from v2.6.15 (checkout I) and checking out v5.8 from v5.7 (checkout II). The three tables below show the mean run times and standard deviations for 5 runs in a local file system, a Linux NFS server and Amazon EFS. The numbers of workers were chosen based on what produced the best result for each case.

Local:

             Clone              Checkout I         Checkout II
Sequential   8.180 s ± 0.021 s  6.936 s ± 0.030 s  2.585 s ± 0.005 s
10 workers   3.633 s ± 0.040 s  2.288 s ± 0.026 s  1.058 s ± 0.015 s
Speedup      2.25 ± 0.03        3.03 ± 0.04        2.44 ± 0.03

Linux NFS server (v4.1, on EBS, single availability zone):

             Clone                Checkout I           Checkout II
Sequential   208.069 s ± 2.522 s  198.610 s ± 1.979 s  54.376 s ± 1.333 s
32 workers   67.078 s ± 0.878 s   64.828 s ± 0.387 s   22.993 s ± 0.252 s
Speedup      3.10 ± 0.06          3.06 ± 0.04          2.36 ± 0.06

EFS (v4.1, replicated over multiple availability zones):

             Clone                  Checkout I             Checkout II
Sequential   1143.655 s ± 11.819 s  1277.891 s ± 10.481 s  396.891 s ± 7.505 s
64 workers   173.242 s ± 1.484 s    282.421 s ± 1.521 s    165.424 s ± 9.564 s
Speedup      6.60 ± 0.09            4.52 ± 0.04            2.40 ± 0.15

Local tests were executed on an i7-7700HQ (4 cores with hyper-threading) running Manjaro Linux, with SSD. NFS and EFS tests were executed on an Amazon EC2 c5n.large instance, with 2 vCPUs. The Linux NFS server was running on an m6g.large instance with a 1 TB EBS GP2 volume. Before each timing, the linux repository was removed (or checked back out), and `sync && sysctl vm.drop_caches=3` was executed.
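As a rough illustration of the contiguous-chunk distribution described earlier in this message, the following standalone sketch computes each worker's batch size and starting index. It assumes the same scheme as the patch (the first `nr % num_workers` workers take one extra item), but the helper names sketch_batch_size and sketch_batch_start are hypothetical and do not appear in the series.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Number of items in the contiguous batch of worker `i` when `nr` items
 * are split among `num_workers` workers. The first (nr % num_workers)
 * workers receive one extra item each.
 */
size_t sketch_batch_size(size_t nr, size_t num_workers, size_t i)
{
	assert(num_workers > 0 && i < num_workers);
	return nr / num_workers + (i < nr % num_workers ? 1 : 0);
}

/* Index of the first item in the batch of worker `i`. */
size_t sketch_batch_start(size_t nr, size_t num_workers, size_t i)
{
	size_t base = nr / num_workers;
	size_t extra = nr % num_workers;

	assert(num_workers > 0 && i < num_workers);
	/* Every earlier worker took `base` items, plus one for each of the
	 * earlier workers that was entitled to an extra item. */
	return i * base + (i < extra ? i : extra);
}
```

With 10 items and 4 workers, for instance, the batches would be [0, 3), [3, 6), [6, 8) and [8, 10), so neighboring index entries (which tend to share a directory) mostly stay with the same worker.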
Co-authored-by: Nguyễn Thái Ngọc Duy Co-authored-by: Jeff Hostetler Signed-off-by: Matheus Tavares --- .gitignore | 1 + Makefile | 1 + builtin.h | 1 + builtin/checkout--helper.c | 135 +++++++++++++++++++++ entry.c | 13 +- git.c | 2 + parallel-checkout.c | 237 +++++++++++++++++++++++++++++++------ parallel-checkout.h | 74 +++++++++++- 8 files changed, 425 insertions(+), 39 deletions(-) create mode 100644 builtin/checkout--helper.c diff --git a/.gitignore b/.gitignore index ee509a2ad2..6c01f0a58c 100644 --- a/.gitignore +++ b/.gitignore @@ -33,6 +33,7 @@ /git-check-mailmap /git-check-ref-format /git-checkout +/git-checkout--helper /git-checkout-index /git-cherry /git-cherry-pick diff --git a/Makefile b/Makefile index caab8e6401..926473d484 100644 --- a/Makefile +++ b/Makefile @@ -1049,6 +1049,7 @@ BUILTIN_OBJS += builtin/check-attr.o BUILTIN_OBJS += builtin/check-ignore.o BUILTIN_OBJS += builtin/check-mailmap.o BUILTIN_OBJS += builtin/check-ref-format.o +BUILTIN_OBJS += builtin/checkout--helper.o BUILTIN_OBJS += builtin/checkout-index.o BUILTIN_OBJS += builtin/checkout.o BUILTIN_OBJS += builtin/clean.o diff --git a/builtin.h b/builtin.h index a5ae15bfe5..5790c68750 100644 --- a/builtin.h +++ b/builtin.h @@ -122,6 +122,7 @@ int cmd_branch(int argc, const char **argv, const char *prefix); int cmd_bundle(int argc, const char **argv, const char *prefix); int cmd_cat_file(int argc, const char **argv, const char *prefix); int cmd_checkout(int argc, const char **argv, const char *prefix); +int cmd_checkout__helper(int argc, const char **argv, const char *prefix); int cmd_checkout_index(int argc, const char **argv, const char *prefix); int cmd_check_attr(int argc, const char **argv, const char *prefix); int cmd_check_ignore(int argc, const char **argv, const char *prefix); diff --git a/builtin/checkout--helper.c b/builtin/checkout--helper.c new file mode 100644 index 0000000000..269cf02feb --- /dev/null +++ b/builtin/checkout--helper.c @@ -0,0 +1,135 @@ +#include "builtin.h" 
+#include "config.h" +#include "entry.h" +#include "parallel-checkout.h" +#include "parse-options.h" +#include "pkt-line.h" + +static void packet_to_ci(char *line, int len, struct checkout_item *ci) +{ + struct ci_fixed_portion *fixed_portion; + char *encoding, *variant; + + if (len < sizeof(struct ci_fixed_portion)) + BUG("checkout worker received too short item (got %d, exp %d)", + len, (int)sizeof(struct ci_fixed_portion)); + + fixed_portion = (struct ci_fixed_portion *)line; + + if (len - sizeof(struct ci_fixed_portion) != + fixed_portion->name_len + fixed_portion->working_tree_encoding_len) + BUG("checkout worker received corrupted item"); + + variant = line + sizeof(struct ci_fixed_portion); + if (fixed_portion->working_tree_encoding_len) { + encoding = xmemdupz(variant, + fixed_portion->working_tree_encoding_len); + variant += fixed_portion->working_tree_encoding_len; + } else { + encoding = NULL; + } + + memset(ci, 0, sizeof(*ci)); + ci->ce = make_empty_transient_cache_entry(fixed_portion->name_len); + ci->ce->ce_namelen = fixed_portion->name_len; + ci->ce->ce_mode = fixed_portion->ce_mode; + memcpy(ci->ce->name, variant, ci->ce->ce_namelen); + oidcpy(&ci->ce->oid, &fixed_portion->oid); + + ci->id = fixed_portion->id; + ci->ca.attr_action = fixed_portion->attr_action; + ci->ca.crlf_action = fixed_portion->crlf_action; + ci->ca.ident = fixed_portion->ident; + ci->ca.working_tree_encoding = encoding; +} + +static void report_result(struct checkout_item *ci) +{ + struct ci_result res = { 0 }; + size_t size; + + res.id = ci->id; + res.status = ci->status; + + if (ci->status == CI_SUCCESS) { + res.st = ci->st; + size = sizeof(res); + } else { + size = ci_result_base_size(); + } + + packet_write(1, (const char *)&res, size); +} + +/* Free the worker-side malloced data, but not the ci itself. 
*/ +static void release_checkout_item_data(struct checkout_item *ci) +{ + free((char *)ci->ca.working_tree_encoding); + discard_cache_entry(ci->ce); +} + +static void worker_loop(struct checkout *state) +{ + struct checkout_item *items = NULL; + size_t i, nr = 0, alloc = 0; + + while (1) { + int len; + char *line = packet_read_line(0, &len); + + if (!line) + break; + + ALLOC_GROW(items, nr + 1, alloc); + packet_to_ci(line, len, &items[nr++]); + } + + for (i = 0; i < nr; ++i) { + struct checkout_item *ci = &items[i]; + write_checkout_item(state, ci); + report_result(ci); + release_checkout_item_data(ci); + } + + packet_flush(1); + + free(items); +} + +static const char * const checkout_helper_usage[] = { + N_("git checkout--helper []"), + NULL +}; + +int cmd_checkout__helper(int argc, const char **argv, const char *prefix) +{ + struct checkout state = CHECKOUT_INIT; + struct option checkout_helper_options[] = { + OPT_STRING(0, "prefix", &state.base_dir, N_("string"), + N_("when creating files, prepend ")), + OPT_END() + }; + + if (argc == 2 && !strcmp(argv[1], "-h")) + usage_with_options(checkout_helper_usage, + checkout_helper_options); + + git_config(git_default_config, NULL); + argc = parse_options(argc, argv, prefix, checkout_helper_options, + checkout_helper_usage, 0); + if (argc > 0) + usage_with_options(checkout_helper_usage, checkout_helper_options); + + if (state.base_dir) + state.base_dir_len = strlen(state.base_dir); + + /* + * Setting this on worker won't actually update the index. We just need + * to pretend so to induce the checkout machinery to stat() the written + * entries. 
+ */ + state.refresh_cache = 1; + + worker_loop(&state); + return 0; +} diff --git a/entry.c b/entry.c index 47c2c20d5a..b6c808dffa 100644 --- a/entry.c +++ b/entry.c @@ -427,8 +427,17 @@ static void mark_colliding_entries(const struct checkout *state, for (i = 0; i < state->istate->cache_nr; i++) { struct cache_entry *dup = state->istate->cache[i]; - if (dup == ce) - break; + if (dup == ce) { + /* + * Parallel checkout creates the files in a racy order. + * So the other side of the collision may appear after + * the given cache_entry in the array. + */ + if (parallel_checkout_status() == PC_HANDLING_RESULTS) + continue; + else + break; + } if (dup->ce_flags & (CE_MATCHED | CE_VALID | CE_SKIP_WORKTREE)) continue; diff --git a/git.c b/git.c index 8bd1d7551d..78c7bd412c 100644 --- a/git.c +++ b/git.c @@ -486,6 +486,8 @@ static struct cmd_struct commands[] = { { "check-mailmap", cmd_check_mailmap, RUN_SETUP }, { "check-ref-format", cmd_check_ref_format, NO_PARSEOPT }, { "checkout", cmd_checkout, RUN_SETUP | NEED_WORK_TREE }, + { "checkout--helper", cmd_checkout__helper, + RUN_SETUP | NEED_WORK_TREE | SUPPORT_SUPER_PREFIX }, { "checkout-index", cmd_checkout_index, RUN_SETUP | NEED_WORK_TREE}, { "cherry", cmd_cherry, RUN_SETUP }, diff --git a/parallel-checkout.c b/parallel-checkout.c index e3b44eeb34..ec42342bc8 100644 --- a/parallel-checkout.c +++ b/parallel-checkout.c @@ -1,39 +1,23 @@ #include "cache.h" #include "entry.h" #include "parallel-checkout.h" +#include "pkt-line.h" +#include "run-command.h" #include "streaming.h" -enum ci_status { - CI_PENDING = 0, - CI_SUCCESS, - CI_RETRY, - CI_FAILED, -}; - -struct checkout_item { - /* pointer to a istate->cache[] entry. Not owned by us. 
*/ - struct cache_entry *ce; - struct conv_attrs ca; - struct stat st; - enum ci_status status; -}; - struct parallel_checkout { struct checkout_item *items; size_t nr, alloc; }; static struct parallel_checkout *parallel_checkout = NULL; - -enum pc_status { - PC_UNINITIALIZED = 0, - PC_ACCEPTING_ENTRIES, - PC_RUNNING, - PC_HANDLING_RESULTS, -}; - static enum pc_status pc_status = PC_UNINITIALIZED; +enum pc_status parallel_checkout_status(void) +{ + return pc_status; +} + void init_parallel_checkout(void) { if (parallel_checkout) @@ -113,9 +97,11 @@ int enqueue_checkout(struct cache_entry *ce, struct conv_attrs *ca) ALLOC_GROW(parallel_checkout->items, parallel_checkout->nr + 1, parallel_checkout->alloc); - ci = &parallel_checkout->items[parallel_checkout->nr++]; + ci = &parallel_checkout->items[parallel_checkout->nr]; ci->ce = ce; memcpy(&ci->ca, ca, sizeof(ci->ca)); + ci->id = parallel_checkout->nr; + parallel_checkout->nr++; return 0; } @@ -200,7 +186,8 @@ static int write_checkout_item_to_fd(int fd, struct checkout *state, /* * checkout metadata is used to give context for external process * filters. Files requiring such filters are not eligible for parallel - * checkout, so pass NULL. + * checkout, so pass NULL. Note: if that changes, the metadata must also + * be passed from the main process to the workers. */ ret = convert_to_working_tree_ca(&ci->ca, ci->ce->name, new_blob, size, &buf, NULL); @@ -241,14 +228,14 @@ static int check_leading_dirs(const char *path, int len, int prefix_len) return has_dirs_only_path(path, slash - path, prefix_len); } -static void write_checkout_item(struct checkout *state, struct checkout_item *ci) +void write_checkout_item(struct checkout *state, struct checkout_item *ci) { unsigned int mode = (ci->ce->ce_mode & 0100) ?
0777 : 0666; int fd = -1, fstat_done = 0; struct strbuf path = STRBUF_INIT; strbuf_add(&path, state->base_dir, state->base_dir_len); - strbuf_add(&path, ci->ce->name, ce_namelen(ci->ce)); + strbuf_add(&path, ci->ce->name, ci->ce->ce_namelen); /* * At this point, leading dirs should have already been created. But if @@ -311,30 +298,214 @@ static void write_checkout_item(struct checkout *state, struct checkout_item *ci strbuf_release(&path); } -static int run_checkout_sequentially(struct checkout *state) +static void send_one_item(int fd, struct checkout_item *ci) +{ + size_t len_data; + char *data, *variant; + struct ci_fixed_portion *fixed_portion; + const char *working_tree_encoding = ci->ca.working_tree_encoding; + size_t name_len = ci->ce->ce_namelen; + size_t working_tree_encoding_len = working_tree_encoding ? + strlen(working_tree_encoding) : 0; + + len_data = sizeof(struct ci_fixed_portion) + name_len + + working_tree_encoding_len; + + data = xcalloc(1, len_data); + + fixed_portion = (struct ci_fixed_portion *)data; + fixed_portion->id = ci->id; + oidcpy(&fixed_portion->oid, &ci->ce->oid); + fixed_portion->ce_mode = ci->ce->ce_mode; + fixed_portion->attr_action = ci->ca.attr_action; + fixed_portion->crlf_action = ci->ca.crlf_action; + fixed_portion->ident = ci->ca.ident; + fixed_portion->name_len = name_len; + fixed_portion->working_tree_encoding_len = working_tree_encoding_len; + + variant = data + sizeof(*fixed_portion); + if (working_tree_encoding_len) { + memcpy(variant, working_tree_encoding, working_tree_encoding_len); + variant += working_tree_encoding_len; + } + memcpy(variant, ci->ce->name, name_len); + + packet_write(fd, data, len_data); + + free(data); +} + +static void send_batch(int fd, size_t start, size_t nr) { size_t i; + for (i = 0; i < nr; ++i) + send_one_item(fd, &parallel_checkout->items[start + i]); + packet_flush(fd); +} - for (i = 0; i < parallel_checkout->nr; ++i) { - struct checkout_item *ci = &parallel_checkout->items[i]; -
write_checkout_item(state, ci); +static struct child_process *setup_workers(struct checkout *state, int num_workers) +{ + struct child_process *workers; + int i, workers_with_one_extra_item; + size_t base_batch_size, next_to_assign = 0; + + base_batch_size = parallel_checkout->nr / num_workers; + workers_with_one_extra_item = parallel_checkout->nr % num_workers; + ALLOC_ARRAY(workers, num_workers); + + for (i = 0; i < num_workers; ++i) { + struct child_process *cp = &workers[i]; + size_t batch_size = base_batch_size; + + child_process_init(cp); + cp->git_cmd = 1; + cp->in = -1; + cp->out = -1; + strvec_push(&cp->args, "checkout--helper"); + if (state->base_dir_len) + strvec_pushf(&cp->args, "--prefix=%s", state->base_dir); + if (start_command(cp)) + die(_("failed to spawn checkout worker")); + + /* distribute the extra work evenly */ + if (i < workers_with_one_extra_item) + batch_size++; + + send_batch(cp->in, next_to_assign, batch_size); + next_to_assign += batch_size; } + return workers; +} + +static void finish_workers(struct child_process *workers, int num_workers) +{ + int i; + for (i = 0; i < num_workers; ++i) { + struct child_process *w = &workers[i]; + if (w->in >= 0) + close(w->in); + if (w->out >= 0) + close(w->out); + if (finish_command(w)) + die(_("checkout worker finished with error")); + } + free(workers); +} + +static void parse_and_save_result(const char *line, int len) +{ + struct ci_result *res; + struct checkout_item *ci; + + /* + * Worker should send either the full result struct or just the base + * (i.e. no stat data). + */ + if (len != ci_result_base_size() && len != sizeof(struct ci_result)) + BUG("received corrupted item from checkout worker"); + + res = (struct ci_result *)line; + + if (res->id >= parallel_checkout->nr) + BUG("checkout worker sent unknown item id"); + + ci = &parallel_checkout->items[res->id]; + ci->status = res->status; + + /* + * Worker only sends stat data on success.
Otherwise, we *cannot* access + * res->st as that will be an invalid address. + */ + if (res->status == CI_SUCCESS) + ci->st = res->st; +} + +static void gather_results_from_workers(struct child_process *workers, + int num_workers) +{ + int i, active_workers = num_workers; + struct pollfd *pfds; + + CALLOC_ARRAY(pfds, num_workers); + for (i = 0; i < num_workers; ++i) { + pfds[i].fd = workers[i].out; + pfds[i].events = POLLIN; + } + + while (active_workers) { + int nr = poll(pfds, num_workers, -1); + + if (nr < 0) { + if (errno == EINTR) + continue; + die_errno("failed to poll checkout workers"); + } + + for (i = 0; i < num_workers && nr > 0; ++i) { + struct pollfd *pfd = &pfds[i]; + + if (!pfd->revents) + continue; + + if (pfd->revents & POLLIN) { + int len; + const char *line = packet_read_line(pfd->fd, &len); + + if (!line) { + pfd->fd = -1; + active_workers--; + } else { + parse_and_save_result(line, len); + } + } else if (pfd->revents & POLLHUP) { + pfd->fd = -1; + active_workers--; + } else if (pfd->revents & (POLLNVAL | POLLERR)) { + die(_("error polling from checkout worker")); + } + + nr--; + } + } + + free(pfds); +} + +static int run_checkout_sequentially(struct checkout *state) +{ + size_t i; + for (i = 0; i < parallel_checkout->nr; ++i) + write_checkout_item(state, &parallel_checkout->items[i]); return handle_results(state); } +static const int workers_threshold = 0; int run_parallel_checkout(struct checkout *state) { - int ret; + int num_workers = online_cpus(); + int ret = 0; + struct child_process *workers; if (!parallel_checkout) BUG("cannot run parallel checkout: not initialized yet"); pc_status = PC_RUNNING; - ret = run_checkout_sequentially(state); + if (parallel_checkout->nr == 0) { + goto done; + } else if (parallel_checkout->nr < workers_threshold || num_workers == 1) { + ret = run_checkout_sequentially(state); + goto done; + } + + workers = setup_workers(state, num_workers); + gather_results_from_workers(workers, num_workers); +
finish_workers(workers, num_workers); + ret = handle_results(state); +done: finish_parallel_checkout(); return ret; } diff --git a/parallel-checkout.h b/parallel-checkout.h index 8eef59ffcd..f25f2874ae 100644 --- a/parallel-checkout.h +++ b/parallel-checkout.h @@ -1,10 +1,21 @@ #ifndef PARALLEL_CHECKOUT_H #define PARALLEL_CHECKOUT_H -struct cache_entry; -struct checkout; -struct conv_attrs; +#include "entry.h" +#include "convert.h" +/**************************************************************** + * Users of parallel checkout + ****************************************************************/ + +enum pc_status { + PC_UNINITIALIZED = 0, + PC_ACCEPTING_ENTRIES, + PC_RUNNING, + PC_HANDLING_RESULTS, +}; + +enum pc_status parallel_checkout_status(void); void init_parallel_checkout(void); /* @@ -14,7 +25,62 @@ void init_parallel_checkout(void); */ int enqueue_checkout(struct cache_entry *ce, struct conv_attrs *ca); -/* Write all the queued entries, returning 0 on success.*/ +/* Write all the queued entries, returning 0 on success. */ int run_parallel_checkout(struct checkout *state); +/**************************************************************** + * Interface with checkout--helper + ****************************************************************/ + +enum ci_status { + CI_PENDING = 0, + CI_SUCCESS, + CI_RETRY, + CI_FAILED, +}; + +struct checkout_item { + /* + * In main process ce points to a istate->cache[] entry. Thus, it's not + * owned by us. In workers they own the memory, which *must be* released. + */ + struct cache_entry *ce; + struct conv_attrs ca; + size_t id; /* position in parallel_checkout->items[] of main process */ + + /* Output fields, sent from workers. */ + enum ci_status status; + struct stat st; +}; + +/* + * The fixed-size portion of `struct checkout_item` that is sent to the workers. + * Following this will be 2 strings: ca.working_tree_encoding and ce.name; These + * are NOT null terminated, since we have the size in the fixed portion. 
+ */ +struct ci_fixed_portion { + size_t id; + struct object_id oid; + unsigned int ce_mode; + enum crlf_action attr_action; + enum crlf_action crlf_action; + int ident; + size_t working_tree_encoding_len; + size_t name_len; +}; + +/* + * The `struct checkout_item` fields returned by the workers. The order is + * important here, especially stat being the last one, as it is omitted on error. + */ +struct ci_result { + size_t id; + enum ci_status status; + struct stat st; +}; + +#define ci_result_base_size() offsetof(struct ci_result, st) + +void write_checkout_item(struct checkout *state, struct checkout_item *ci); + #endif /* PARALLEL_CHECKOUT_H */

From patchwork Mon Aug 10 21:33:20 2020
X-Patchwork-Submitter: Matheus Tavares
X-Patchwork-Id: 11708175
From: Matheus Tavares
To: git@vger.kernel.org
Cc: stolee@gmail.com, jeffhost@microsoft.com, Nguyễn Thái Ngọc Duy, Junio C Hamano, René Scharfe, Stefan Beller
Subject: [RFC PATCH 12/21] parallel-checkout: add configuration options
Date: Mon, 10 Aug 2020 18:33:20 -0300
Message-Id: <1263342110bfc96480146df0f876a5c936b61ce1.1597093021.git.matheus.bernardino@usp.br>

Add the checkout.workers and checkout.workersThreshold settings, which allow users to configure and/or disable the parallel checkout feature as desired. The first setting defines the number of workers and the second defines the minimum number of entries to attempt parallel checkout.

Co-authored-by: Jeff Hostetler
Signed-off-by: Matheus Tavares
---

I still have to evaluate what the best default value for checkout.workersThreshold is. For now, I used 0 so that the test suite uses parallel-checkout by default, exercising the new code. I'm open to suggestions on how we can improve testing for it, once checkout.workersThreshold is no longer 0.

Note: the default number of workers can probably be better calculated as well, multiplying the number of cores by some factor. My machine, for example, has 8 logical cores but 10 workers lead to the fastest execution.

 Documentation/config/checkout.txt | 16 ++++++++++++++++
 parallel-checkout.c | 26 +++++++++++++++++++++-----
 parallel-checkout.h | 11 +++++++++--
 unpack-trees.c | 10 +++++++---
 4 files changed, 53 insertions(+), 10 deletions(-)

diff --git a/Documentation/config/checkout.txt b/Documentation/config/checkout.txt index 6b646813ab..9dabdf9231 100644 --- a/Documentation/config/checkout.txt +++ b/Documentation/config/checkout.txt @@ -16,3 +16,19 @@ will checkout the '' branch on another remote, and by linkgit:git-worktree[1] when 'git worktree add' refers to a remote branch.
This setting might be used for other checkout-like commands or functionality in the future. + +checkout.workers:: + The number of worker processes to use when updating the working tree. + If unset (or set to a value less than one), Git will use as many + workers as the number of logical cores available. One means sequential + execution. This and the checkout.workersThreshold settings affect all + commands which perform checkout. E.g. checkout, switch, clone, + sparse-checkout, read-tree, etc. + +checkout.workersThreshold:: + If set to a positive number, parallel checkout will not be attempted + when the number of files to be updated is less than the defined limit. + When set to a negative number or unset, defaults to 0. The reasoning + behind this config is that, when modifying a small number of files, a + sequential execution might be faster, as it avoids the cost of spawning + subprocesses and inter-process communication. diff --git a/parallel-checkout.c b/parallel-checkout.c index ec42342bc8..e0fca4d380 100644 --- a/parallel-checkout.c +++ b/parallel-checkout.c @@ -4,6 +4,8 @@ #include "pkt-line.h" #include "run-command.h" #include "streaming.h" +#include "thread-utils.h" +#include "config.h" struct parallel_checkout { struct checkout_item *items; @@ -18,6 +20,19 @@ enum pc_status parallel_checkout_status(void) return pc_status; } +#define DEFAULT_WORKERS_THRESHOLD 0 + +void get_parallel_checkout_configs(int *num_workers, int *threshold) +{ + if (git_config_get_int("checkout.workers", num_workers) || + *num_workers < 1) + *num_workers = online_cpus(); + + if (git_config_get_int("checkout.workersThreshold", threshold) || + *threshold < 0) + *threshold = DEFAULT_WORKERS_THRESHOLD; +} + void init_parallel_checkout(void) { if (parallel_checkout) @@ -480,22 +495,23 @@ static int run_checkout_sequentially(struct checkout *state) return handle_results(state); } -static const int workers_threshold = 0; - -int run_parallel_checkout(struct checkout *state) +int 
run_parallel_checkout(struct checkout *state, int num_workers, int threshold) { - int num_workers = online_cpus(); int ret = 0; struct child_process *workers; if (!parallel_checkout) BUG("cannot run parallel checkout: not initialized yet"); + if (num_workers < 1) + BUG("invalid number of workers for run_parallel_checkout: %d", + num_workers); + pc_status = PC_RUNNING; if (parallel_checkout->nr == 0) { goto done; - } else if (parallel_checkout->nr < workers_threshold || num_workers == 1) { + } else if (parallel_checkout->nr < threshold || num_workers == 1) { ret = run_checkout_sequentially(state); goto done; } diff --git a/parallel-checkout.h b/parallel-checkout.h index f25f2874ae..b4d412c8b5 100644 --- a/parallel-checkout.h +++ b/parallel-checkout.h @@ -18,6 +18,9 @@ enum pc_status { enum pc_status parallel_checkout_status(void); void init_parallel_checkout(void); +/* Reads the checkout.workers and checkout.workersThreshold settings. */ +void get_parallel_checkout_configs(int *num_workers, int *threshold); + /* * Return -1 if parallel checkout is currently not enabled or if the entry is * not eligible for parallel checkout. Otherwise, enqueue the entry for later @@ -25,8 +28,12 @@ void init_parallel_checkout(void); */ int enqueue_checkout(struct cache_entry *ce, struct conv_attrs *ca); -/* Write all the queued entries, returning 0 on success. */ -int run_parallel_checkout(struct checkout *state); +/* + * Write all the queued entries, returning 0 on success. If the number of + * entries is below the specified threshold, the operation is performed + * sequentially. 
+ */ +int run_parallel_checkout(struct checkout *state, int num_workers, int threshold); /**************************************************************** * Interface with checkout--helper diff --git a/unpack-trees.c b/unpack-trees.c index 1b1da7485a..117ed42370 100644 --- a/unpack-trees.c +++ b/unpack-trees.c @@ -399,7 +399,7 @@ static int check_updates(struct unpack_trees_options *o, int errs = 0; struct progress *progress; struct checkout state = CHECKOUT_INIT; - int i; + int i, pc_workers, pc_threshold; trace_performance_enter(); state.force = 1; @@ -462,8 +462,11 @@ static int check_updates(struct unpack_trees_options *o, oid_array_clear(&to_fetch); } + get_parallel_checkout_configs(&pc_workers, &pc_threshold); + enable_delayed_checkout(&state); - init_parallel_checkout(); + if (pc_workers > 1) + init_parallel_checkout(); for (i = 0; i < index->cache_nr; i++) { struct cache_entry *ce = index->cache[i]; @@ -477,7 +480,8 @@ static int check_updates(struct unpack_trees_options *o, } } stop_progress(&progress); - errs |= run_parallel_checkout(&state); + if (pc_workers > 1) + errs |= run_parallel_checkout(&state, pc_workers, pc_threshold); errs |= finish_delayed_checkout(&state, NULL); git_attr_set_direction(GIT_ATTR_CHECKIN);

From patchwork Mon Aug 10 21:33:21 2020
X-Patchwork-Submitter: Matheus Tavares
X-Patchwork-Id: 11708177
From: Matheus Tavares
To: git@vger.kernel.org
Cc: stolee@gmail.com, jeffhost@microsoft.com, Nguyễn Thái Ngọc Duy, Junio C Hamano, Johannes Schindelin, Elijah Newren
Subject: [RFC PATCH 13/21] parallel-checkout: support progress displaying
Date: Mon, 10 Aug 2020 18:33:21 -0300

Original-patch-by: Nguyễn Thái Ngọc Duy
Signed-off-by: Nguyễn Thái Ngọc Duy
Signed-off-by: Matheus Tavares
---
 parallel-checkout.c | 40 +++++++++++++++++++++++++++++++++++++---
 parallel-checkout.h | 4 +++-
 unpack-trees.c | 11 ++++++++---
 3 files changed, 48 insertions(+), 7 deletions(-)

diff --git a/parallel-checkout.c b/parallel-checkout.c index e0fca4d380..78bf2de5ea 100644 --- a/parallel-checkout.c +++ b/parallel-checkout.c @@ -2,6 +2,7 @@ #include "entry.h" #include "parallel-checkout.h" #include "pkt-line.h" +#include "progress.h" #include "run-command.h" #include "streaming.h" #include "thread-utils.h" @@ -10,6 +11,8 @@ struct parallel_checkout { struct checkout_item *items; size_t nr, alloc; + struct progress *progress; + unsigned int *progress_cnt; }; static struct parallel_checkout *parallel_checkout = NULL; @@ -121,6 +124,22 @@ int enqueue_checkout(struct cache_entry *ce, struct conv_attrs *ca) return 0; } +size_t pc_queue_size(void) +{ + if (!parallel_checkout) + return 0; + return parallel_checkout->nr; +} + +static
void advance_progress_meter(void) +{ + if (parallel_checkout && parallel_checkout->progress) { + (*parallel_checkout->progress_cnt)++; + display_progress(parallel_checkout->progress, + *parallel_checkout->progress_cnt); + } +} + static int handle_results(struct checkout *state) { int ret = 0; @@ -132,6 +151,10 @@ static int handle_results(struct checkout *state) struct checkout_item *ci = &parallel_checkout->items[i]; struct stat *st = &ci->st; + /* + * Note: progress meter was already incremented for CI_SUCCESS + * and CI_FAILED. + */ switch(ci->status) { case CI_SUCCESS: update_ce_after_write(state, ci->ce, st); @@ -145,6 +168,7 @@ static int handle_results(struct checkout *state) * leading dirs in the entry's path. */ ret |= checkout_entry_ca(ci->ce, &ci->ca, state, NULL, NULL); + advance_progress_meter(); break; case CI_FAILED: ret = -1; @@ -434,6 +458,9 @@ static void parse_and_save_result(const char *line, int len) */ if (res->status == CI_SUCCESS) ci->st = res->st; + + if (res->status != CI_RETRY) + advance_progress_meter(); } static void gather_results_from_workers(struct child_process *workers, @@ -490,12 +517,17 @@ static void gather_results_from_workers(struct child_process *workers, static int run_checkout_sequentially(struct checkout *state) { size_t i; - for (i = 0; i < parallel_checkout->nr; ++i) - write_checkout_item(state, &parallel_checkout->items[i]); + for (i = 0; i < parallel_checkout->nr; ++i) { + struct checkout_item *ci = &parallel_checkout->items[i]; + write_checkout_item(state, ci); + if (ci->status != CI_RETRY) + advance_progress_meter(); + } return handle_results(state); } -int run_parallel_checkout(struct checkout *state, int num_workers, int threshold) +int run_parallel_checkout(struct checkout *state, int num_workers, int threshold, + struct progress *progress, unsigned int *progress_cnt) { int ret = 0; struct child_process *workers; @@ -508,6 +540,8 @@ int run_parallel_checkout(struct checkout *state, int num_workers, int threshold num_workers);
pc_status = PC_RUNNING; + parallel_checkout->progress = progress; + parallel_checkout->progress_cnt = progress_cnt; if (parallel_checkout->nr == 0) { goto done; diff --git a/parallel-checkout.h b/parallel-checkout.h index b4d412c8b5..2b81a5db6c 100644 --- a/parallel-checkout.h +++ b/parallel-checkout.h @@ -27,13 +27,15 @@ void get_parallel_checkout_configs(int *num_workers, int *threshold); * write and return 0. */ int enqueue_checkout(struct cache_entry *ce, struct conv_attrs *ca); +size_t pc_queue_size(void); /* * Write all the queued entries, returning 0 on success. If the number of * entries is below the specified threshold, the operation is performed * sequentially. */ -int run_parallel_checkout(struct checkout *state, int num_workers, int threshold); +int run_parallel_checkout(struct checkout *state, int num_workers, int threshold, + struct progress *progress, unsigned int *progress_cnt); /**************************************************************** * Interface with checkout--helper diff --git a/unpack-trees.c b/unpack-trees.c index 117ed42370..e05e6ceff2 100644 --- a/unpack-trees.c +++ b/unpack-trees.c @@ -471,17 +471,22 @@ static int check_updates(struct unpack_trees_options *o, struct cache_entry *ce = index->cache[i]; if (ce->ce_flags & CE_UPDATE) { + size_t last_pc_queue_size = pc_queue_size(); + if (ce->ce_flags & CE_WT_REMOVE) BUG("both update and delete flags are set on %s", ce->name); - display_progress(progress, ++cnt); ce->ce_flags &= ~CE_UPDATE; errs |= checkout_entry(ce, &state, NULL, NULL); + + if (last_pc_queue_size == pc_queue_size()) + display_progress(progress, ++cnt); } } - stop_progress(&progress); if (pc_workers > 1) - errs |= run_parallel_checkout(&state, pc_workers, pc_threshold); + errs |= run_parallel_checkout(&state, pc_workers, pc_threshold, + progress, &cnt); + stop_progress(&progress); errs |= finish_delayed_checkout(&state, NULL); git_attr_set_direction(GIT_ATTR_CHECKIN);

From patchwork Mon Aug 10 21:33:22 2020
X-Patchwork-Submitter: Matheus Tavares
X-Patchwork-Id: 11708179
From: Matheus Tavares
To: git@vger.kernel.org
Cc: stolee@gmail.com, jeffhost@microsoft.com, Nguyễn Thái Ngọc Duy, Patryk Obara, Johannes Schindelin, Junio C Hamano, Jameson Miller, Jeff King
Subject: [RFC PATCH 14/21] make_transient_cache_entry(): optionally alloc from mem_pool
Date: Mon, 10 Aug 2020 18:33:22 -0300
Message-Id: <9050b9da6748b7b64508255741f925d9d57983af.1597093021.git.matheus.bernardino@usp.br>

Allow make_transient_cache_entry() to optionally receive a mem_pool struct in which it should allocate the entry.
This will be used in the following patch, to store some transient entries which should persist until parallel checkout finishes. Signed-off-by: Matheus Tavares --- builtin/checkout--helper.c | 2 +- builtin/checkout.c | 2 +- builtin/difftool.c | 2 +- cache.h | 10 +++++----- read-cache.c | 12 ++++++++---- unpack-trees.c | 2 +- 6 files changed, 17 insertions(+), 13 deletions(-) diff --git a/builtin/checkout--helper.c b/builtin/checkout--helper.c index 269cf02feb..d2ab40cb4c 100644 --- a/builtin/checkout--helper.c +++ b/builtin/checkout--helper.c @@ -30,7 +30,7 @@ static void packet_to_ci(char *line, int len, struct checkout_item *ci) } memset(ci, 0, sizeof(*ci)); - ci->ce = make_empty_transient_cache_entry(fixed_portion->name_len); + ci->ce = make_empty_transient_cache_entry(fixed_portion->name_len, NULL); ci->ce->ce_namelen = fixed_portion->name_len; ci->ce->ce_mode = fixed_portion->ce_mode; memcpy(ci->ce->name, variant, ci->ce->ce_namelen); diff --git a/builtin/checkout.c b/builtin/checkout.c index 3e09b29cfe..8e4a3c1df0 100644 --- a/builtin/checkout.c +++ b/builtin/checkout.c @@ -291,7 +291,7 @@ static int checkout_merged(int pos, const struct checkout *state, int *nr_checko if (write_object_file(result_buf.ptr, result_buf.size, blob_type, &oid)) die(_("Unable to add merge result for '%s'"), path); free(result_buf.ptr); - ce = make_transient_cache_entry(mode, &oid, path, 2); + ce = make_transient_cache_entry(mode, &oid, path, 2, NULL); if (!ce) die(_("make_cache_entry failed for path '%s'"), path); status = checkout_entry(ce, state, NULL, nr_checkouts); diff --git a/builtin/difftool.c b/builtin/difftool.c index dfa22b67eb..5e7a57c8c2 100644 --- a/builtin/difftool.c +++ b/builtin/difftool.c @@ -323,7 +323,7 @@ static int checkout_path(unsigned mode, struct object_id *oid, struct cache_entry *ce; int ret; - ce = make_transient_cache_entry(mode, oid, path, 0); + ce = make_transient_cache_entry(mode, oid, path, 0, NULL); ret = checkout_entry(ce, state, NULL, NULL); 
discard_cache_entry(ce); diff --git a/cache.h b/cache.h index e6963cf8fe..e2b41c5f8b 100644 --- a/cache.h +++ b/cache.h @@ -355,16 +355,16 @@ struct cache_entry *make_empty_cache_entry(struct index_state *istate, size_t name_len); /* - * Create a cache_entry that is not intended to be added to an index. - * Caller is responsible for discarding the cache_entry - * with `discard_cache_entry`. + * Create a cache_entry that is not intended to be added to an index. If mp is + * not NULL, the entry is allocated within the given memory pool. Caller is + * responsible for discarding the cache_entry with `discard_cache_entry`. */ struct cache_entry *make_transient_cache_entry(unsigned int mode, const struct object_id *oid, const char *path, - int stage); + int stage, struct mem_pool *mp); -struct cache_entry *make_empty_transient_cache_entry(size_t name_len); +struct cache_entry *make_empty_transient_cache_entry(size_t len, struct mem_pool *mp); /* * Discard cache entry. diff --git a/read-cache.c b/read-cache.c index 8ed1c29b54..eeb122cca4 100644 --- a/read-cache.c +++ b/read-cache.c @@ -811,8 +811,10 @@ struct cache_entry *make_empty_cache_entry(struct index_state *istate, size_t le return mem_pool__ce_calloc(find_mem_pool(istate), len); } -struct cache_entry *make_empty_transient_cache_entry(size_t len) +struct cache_entry *make_empty_transient_cache_entry(size_t len, struct mem_pool *mp) { + if (mp) + return mem_pool__ce_calloc(mp, len); return xcalloc(1, cache_entry_size(len)); } @@ -846,8 +848,10 @@ struct cache_entry *make_cache_entry(struct index_state *istate, return ret; } -struct cache_entry *make_transient_cache_entry(unsigned int mode, const struct object_id *oid, - const char *path, int stage) +struct cache_entry *make_transient_cache_entry(unsigned int mode, + const struct object_id *oid, + const char *path, int stage, + struct mem_pool *mp) { struct cache_entry *ce; int len; @@ -858,7 +862,7 @@ struct cache_entry *make_transient_cache_entry(unsigned int mode, 
const struct o } len = strlen(path); - ce = make_empty_transient_cache_entry(len); + ce = make_empty_transient_cache_entry(len, mp); oidcpy(&ce->oid, oid); memcpy(ce->name, path, len); diff --git a/unpack-trees.c b/unpack-trees.c index e05e6ceff2..dcb40dc8fa 100644 --- a/unpack-trees.c +++ b/unpack-trees.c @@ -1031,7 +1031,7 @@ static struct cache_entry *create_ce_entry(const struct traverse_info *info, size_t len = traverse_path_len(info, tree_entry_len(n)); struct cache_entry *ce = is_transient ? - make_empty_transient_cache_entry(len) : + make_empty_transient_cache_entry(len, NULL) : make_empty_cache_entry(istate, len); ce->ce_mode = create_ce_mode(n->mode);

From patchwork Mon Aug 10 21:33:23 2020
X-Patchwork-Submitter: Matheus Tavares
X-Patchwork-Id: 11708181
From: Matheus Tavares
To: git@vger.kernel.org
Cc: stolee@gmail.com, jeffhost@microsoft.com, Junio C Hamano, Nguyễn Thái Ngọc Duy
Subject: [RFC PATCH 15/21] builtin/checkout.c: complete parallel checkout support
Date: Mon, 10 Aug 2020 18:33:23 -0300
Message-Id: <258b855e49463ed39f0e6865327714c614c786d5.1597093021.git.matheus.bernardino@usp.br>

There is one code path in builtin/checkout.c which still doesn't benefit from parallel checkout because it calls checkout_entry() directly, instead of unpack_trees(). Let's add parallel support for this missing spot as well.

Note: the transient cache entries allocated in checkout_merged() are now allocated in a mem_pool which is only discarded after parallel checkout finishes. This is done because the entries need to be valid when run_parallel_checkout() is called.

Signed-off-by: Matheus Tavares
---
 builtin/checkout.c | 20 ++++++++++++++++----
 1 file changed, 16 insertions(+), 4 deletions(-)

diff --git a/builtin/checkout.c b/builtin/checkout.c index 8e4a3c1df0..b9230d5009 100644 --- a/builtin/checkout.c +++ b/builtin/checkout.c @@ -27,6 +27,7 @@ #include "wt-status.h" #include "xdiff-interface.h" #include "entry.h" +#include "parallel-checkout.h" static const char * const checkout_usage[] = { N_("git checkout [] "), @@ -230,7 +231,8 @@ static int checkout_stage(int stage, const struct cache_entry *ce, int pos, return error(_("path '%s' does not have their version"), ce->name); } -static int checkout_merged(int pos, const struct checkout *state, int *nr_checkouts) +static int checkout_merged(int pos, const struct checkout *state, + int *nr_checkouts, struct mem_pool *ce_mem_pool) { struct cache_entry *ce = active_cache[pos]; const char *path = ce->name; @@ -291,11 +293,10 @@ static int checkout_merged(int pos, const struct checkout *state, int *nr_checko if (write_object_file(result_buf.ptr, result_buf.size,
blob_type, &oid)) die(_("Unable to add merge result for '%s'"), path); free(result_buf.ptr); - ce = make_transient_cache_entry(mode, &oid, path, 2, NULL); + ce = make_transient_cache_entry(mode, &oid, path, 2, ce_mem_pool); if (!ce) die(_("make_cache_entry failed for path '%s'"), path); status = checkout_entry(ce, state, NULL, nr_checkouts); - discard_cache_entry(ce); return status; } @@ -359,16 +360,22 @@ static int checkout_worktree(const struct checkout_opts *opts, int nr_checkouts = 0, nr_unmerged = 0; int errs = 0; int pos; + int pc_workers, pc_threshold; + struct mem_pool *ce_mem_pool = NULL; state.force = 1; state.refresh_cache = 1; state.istate = &the_index; + mem_pool_init(&ce_mem_pool, 0); + get_parallel_checkout_configs(&pc_workers, &pc_threshold); init_checkout_metadata(&state.meta, info->refname, info->commit ? &info->commit->object.oid : &info->oid, NULL); enable_delayed_checkout(&state); + if (pc_workers > 1) + init_parallel_checkout(); for (pos = 0; pos < active_nr; pos++) { struct cache_entry *ce = active_cache[pos]; if (ce->ce_flags & CE_MATCHED) { @@ -384,10 +391,15 @@ static int checkout_worktree(const struct checkout_opts *opts, &nr_checkouts, opts->overlay_mode); else if (opts->merge) errs |= checkout_merged(pos, &state, - &nr_unmerged); + &nr_unmerged, + ce_mem_pool); pos = skip_same_name(ce, pos) - 1; } } + if (pc_workers > 1) + errs |= run_parallel_checkout(&state, pc_workers, pc_threshold, + NULL, NULL); + mem_pool_discard(ce_mem_pool, should_validate_cache_entries()); remove_marked_cache_entries(&the_index, 1); remove_scheduled_dirs(); errs |= finish_delayed_checkout(&state, &nr_checkouts); From patchwork Mon Aug 10 21:33:24 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Matheus Tavares X-Patchwork-Id: 11708183 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with 
ESMTP id C4D7414E3 for ; Mon, 10 Aug 2020 21:35:38 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id AA7FB20748 for ; Mon, 10 Aug 2020 21:35:38 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=usp-br.20150623.gappssmtp.com header.i=@usp-br.20150623.gappssmtp.com header.b="BesY85Te" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726977AbgHJVfg (ORCPT ); Mon, 10 Aug 2020 17:35:36 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:36246 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726867AbgHJVfg (ORCPT ); Mon, 10 Aug 2020 17:35:36 -0400 Received: from mail-qk1-x743.google.com (mail-qk1-x743.google.com [IPv6:2607:f8b0:4864:20::743]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 81801C061787 for ; Mon, 10 Aug 2020 14:35:35 -0700 (PDT) Received: by mail-qk1-x743.google.com with SMTP id p4so9886844qkf.0 for ; Mon, 10 Aug 2020 14:35:35 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=usp-br.20150623.gappssmtp.com; s=20150623; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=uZ7xSdg5IPnYhdCWJBg2p+E3ZTG/S+5YSDTbJ2OAaZ0=; b=BesY85Tet11Uh9kobwIjb8ru9DZ/puskxRLK/o/DeH1D0aekpc5vaToAhs8+jVhr7Z XyYCXrecGY29v+saB669m/mdAoXgm00H8l0YJ2wHApnDzuqq571W8zjBC4X6xkwzpzB4 lQ/X6sdN8Mzwja99NQkvIBt8ncxafvl/pFXF+5Dirve0OZwbCrn/H2DCcbAyM+f3BUG2 57/SCFdtSA6j5xeMGeNlPw+P2DXj5r6u6ukDFdJpV6qiiFBZm6IZvvifOjRs4hZ9menb rpe/enaVxQuBxRri9sC27JWu0N/UYZ8BP3dfYQtDN+/Z7IQZoQsQoR1XbC7KQoYW7B9w g2Tg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=uZ7xSdg5IPnYhdCWJBg2p+E3ZTG/S+5YSDTbJ2OAaZ0=; b=OdhSNTni/jXN9BhOiM9M8F2fHpcjBVgCE2T2/Ww+kn9g3ch2/gD5JI+85LBT0taWeI 
i5Ruo8AR0n3+GOyaAA8k0m6klR7sl0JqOG+QST0RU18HA9JNNL1yo5p5Gp5QZDGc/jQM tKfcdJdP9VmU+eJWh8Kys35jsYe0GjG4Y+NGIYSeGcAkyMnD1VVv4Hb3SFd1DEv6XzT1 LRFxQtGzMQVC5xXv4qjJnd0M11CQA/grjN7hkTCvEZbVc+Ql79jOHQwGmQPEeEzY+H7Z Q1+oJtYrdwhlOoDA5F0QFyEU9WYSV3t2hjYIBUiz2EpVlySQhQN+woN8bJNlaGvxmp2f CkAw== X-Gm-Message-State: AOAM531zcCT1VBlwrTZncDZ0nJy8veVceUjFBqnXpZjhj79oXxPOabUI LRD0iXYIRhTJj2knz8/zxbPlajHyhcA= X-Google-Smtp-Source: ABdhPJysceHsdpSOBh1kKCAf7xMyoKmWjfkGBqZz9wUs3ydxWX5m+U6kC5h4jqWe1GzYWs22c2nGXQ== X-Received: by 2002:a37:a44b:: with SMTP id n72mr26697202qke.448.1597095334320; Mon, 10 Aug 2020 14:35:34 -0700 (PDT) Received: from localhost.localdomain ([2804:18:87c:466:1120:3c2c:21e4:5931]) by smtp.gmail.com with ESMTPSA id z197sm15370674qkb.66.2020.08.10.14.35.31 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 10 Aug 2020 14:35:33 -0700 (PDT) From: Matheus Tavares To: git@vger.kernel.org Cc: stolee@gmail.com, jeffhost@microsoft.com, =?utf-8?q?Martin_=C3=85gren?= , =?utf-8?b?Tmd1eQ==?= =?utf-8?b?4buFbiBUaMOhaSBOZ+G7jWMgRHV5?= , Jeff King , Junio C Hamano Subject: [RFC PATCH 16/21] checkout-index: add parallel checkout support Date: Mon, 10 Aug 2020 18:33:24 -0300 Message-Id: X-Mailer: git-send-email 2.27.0 In-Reply-To: References: MIME-Version: 1.0 Sender: git-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org Signed-off-by: Matheus Tavares --- builtin/checkout-index.c | 16 ++++++++++++++++ 1 file changed, 16 insertions(+) diff --git a/builtin/checkout-index.c b/builtin/checkout-index.c index 0f1ff73129..33fb933c30 100644 --- a/builtin/checkout-index.c +++ b/builtin/checkout-index.c @@ -12,6 +12,7 @@ #include "cache-tree.h" #include "parse-options.h" #include "entry.h" +#include "parallel-checkout.h" #define CHECKOUT_ALL 4 static int nul_term_line; @@ -160,6 +161,7 @@ int cmd_checkout_index(int argc, const char **argv, const char *prefix) int prefix_length; int force = 0, quiet = 0, not_new = 0; int index_opt = 0; + int 
pc_workers, pc_threshold; struct option builtin_checkout_index_options[] = { OPT_BOOL('a', "all", &all, N_("check out all files in the index")), @@ -214,6 +216,14 @@ int cmd_checkout_index(int argc, const char **argv, const char *prefix) hold_locked_index(&lock_file, LOCK_DIE_ON_ERROR); } + if (!to_tempfile) + get_parallel_checkout_configs(&pc_workers, &pc_threshold); + else + pc_workers = 1; + + if (pc_workers > 1) + init_parallel_checkout(); + /* Check out named files first */ for (i = 0; i < argc; i++) { const char *arg = argv[i]; @@ -256,6 +266,12 @@ int cmd_checkout_index(int argc, const char **argv, const char *prefix) if (all) checkout_all(prefix, prefix_length); + if (pc_workers > 1) { + /* Errors were already reported */ + run_parallel_checkout(&state, pc_workers, pc_threshold, + NULL, NULL); + } + if (is_lock_file_locked(&lock_file) && write_locked_index(&the_index, &lock_file, COMMIT_LOCK)) die("Unable to write new index file"); From patchwork Mon Aug 10 21:33:25 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Matheus Tavares X-Patchwork-Id: 11708185 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 01C6716B1 for ; Mon, 10 Aug 2020 21:35:43 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id D036220734 for ; Mon, 10 Aug 2020 21:35:42 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=usp-br.20150623.gappssmtp.com header.i=@usp-br.20150623.gappssmtp.com header.b="hZ+ssJXM" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726990AbgHJVfm (ORCPT ); Mon, 10 Aug 2020 17:35:42 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:36258 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id 
S1726806AbgHJVfk (ORCPT ); Mon, 10 Aug 2020 17:35:40 -0400 Received: from mail-qt1-x841.google.com (mail-qt1-x841.google.com [IPv6:2607:f8b0:4864:20::841]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 9A39EC061756 for ; Mon, 10 Aug 2020 14:35:39 -0700 (PDT) Received: by mail-qt1-x841.google.com with SMTP id x12so8007583qtp.1 for ; Mon, 10 Aug 2020 14:35:39 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=usp-br.20150623.gappssmtp.com; s=20150623; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=x6GQoMBXSvbFGOIPMm6h5bCv2EstWxxEOMbYw9I3O/4=; b=hZ+ssJXMtFL5IuV4X0KFOUCtZoCb/MbCTwSmKYVlJh9bYVPa5Y/OrGO4Qgn562wu9O BIYzIn7REt4KCgTC0uw4m0kHWJFab/tu/wA+2AXMSWG6oQXOuBfz+pn46i7/yPKieAVh tlBPo/khgpiDe8GG6GYA73m3+6gkXc6JlmxpdWaJ3/Ci7Q34y7LSHuAExP68Q5BtIfko F+EfK2MieBBuC4dVYjJh99mZ08yxq3IM9rPJuOog3rQcU9jEaAWGPrrUKJ3A3N8tP4N9 vLSoeQXq0cXq/bBBeSW/h0/UXtFijz6FOnwjN+JUc9jHL6xM86wJZkQwjCgCQXGe4TyS u2+A== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=x6GQoMBXSvbFGOIPMm6h5bCv2EstWxxEOMbYw9I3O/4=; b=mm7MqVV2j38cTx8s7qR0TyCkicoJ4IFN7IsmY/L1SGi2feV0qzQJWkrycTuI5vGgQU /sssBYxiUnJnAumS1rPlBEy/em3yGLPapEZV03c12NWeQ2+b9iLZU7OtUvLwk7dk33LG 26ZSG21xhtdrwoj2J7HmXt4oTO/x+nPZmW8Fh3eeYJWzFQmUNul2LJ6EY3o0hL2bYrp1 wL9CTDMBudRyelv0bJBprd9hrobrn7GEO1XX9d1mq96sem40K+rgi2HMlTfT6fMpBh5u Hc5lXp7kQSezZjAeq+oH32gLS0zavLgRrK/fBLmtRDbrwpJWD9nU82jXoc4c+lR84zx1 iakw== X-Gm-Message-State: AOAM532mMveIjX+B3vpsJmYOswuaDj8zazIkcdRGwzl1cSPV3EwUvrAc IQQSOHAVu8d/5lL5MHXP0uxge1OHnQs= X-Google-Smtp-Source: ABdhPJwFOPm844bt5X7x41QY2xCoKpOfUle9RbuKEe8fmliC0retMpIIus4UJT6XEG/s2frXI3BuhA== X-Received: by 2002:ac8:6901:: with SMTP id e1mr29594501qtr.352.1597095338218; Mon, 10 Aug 2020 14:35:38 -0700 (PDT) Received: from localhost.localdomain ([2804:18:87c:466:1120:3c2c:21e4:5931]) by 
smtp.gmail.com with ESMTPSA id z197sm15370674qkb.66.2020.08.10.14.35.35 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 10 Aug 2020 14:35:37 -0700 (PDT) From: Matheus Tavares To: git@vger.kernel.org Cc: stolee@gmail.com, jeffhost@microsoft.com, Johannes Schindelin , =?utf-8?b?Tmd1eeG7hW4g?= =?utf-8?b?VGjDoWkgTmfhu41jIER1eQ==?= , Elijah Newren , Junio C Hamano Subject: [RFC PATCH 17/21] parallel-checkout: avoid stat() calls in workers Date: Mon, 10 Aug 2020 18:33:25 -0300 Message-Id: <79de52b6952441e77d7276243b4b2ebe7ca16a1f.1597093021.git.matheus.bernardino@usp.br> X-Mailer: git-send-email 2.27.0 In-Reply-To: References: MIME-Version: 1.0 Sender: git-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org The current parallel checkout implementation requires the workers to stat() the path components of each entry before writing, to make sure they are all real directories and not symlinks or something else. The stat() info is cached, so this procedure should not be too costly. But the exact same check is already done by the main process, before enqueueing the entries for parallel checkout, to remove files that were in the way and create the leading dirs. The reason we still need the second check is that, in case of path collisions, a symlink X could be created after an entry x/f was enqueued, leading the parallel worker to wrongly create the file at X/f. If we postpone the symlinks' checkouts, though, we can avoid the need for these stat() calls in the workers. Other types of path collisions are still possible, such as a regular file X being written before the worker tries to write x/f. But that's OK, since the parallel checkout machinery will check the return value of open() to detect such collisions (which would not be possible for the symlink case, as open() would succeed).
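To illustrate the open()-based detection described above, here is a minimal standalone sketch in C. It is not the code from this series: the enum and the helper name try_create() are hypothetical, and only the open() flags and the set of errno values are taken from the patch. It shows why a non-symlink collision is visible to the worker: an exclusive create fails with a recognizable errno when a regular file sits where a leading directory was expected, or when the target already exists.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical status codes for this sketch; not Git's checkout_item states. */
enum write_status { WRITE_OK, WRITE_RETRY, WRITE_FAILED };

/*
 * Try to create `path` exclusively. On success the descriptor is stored in
 * *fd_out. The errno values treated as a probable path collision are the
 * four listed in the patch: EEXIST, EISDIR, ENOENT and ENOTDIR.
 */
static enum write_status try_create(const char *path, int *fd_out)
{
	int fd = open(path, O_WRONLY | O_CREAT | O_EXCL, 0666);

	if (fd >= 0) {
		*fd_out = fd;
		return WRITE_OK;
	}
	if (errno == EEXIST || errno == EISDIR ||
	    errno == ENOENT || errno == ENOTDIR)
		return WRITE_RETRY; /* probable collision: retry sequentially */
	return WRITE_FAILED;        /* any other error is a real failure */
}
```

If a leading component were a symlink to a directory, the same open() call would succeed and the file would be created behind the link, so no errno check could catch it. That is the case the series handles by postponing symlink checkouts instead.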
Signed-off-by: Matheus Tavares --- entry.c | 10 +++++++ parallel-checkout.c | 71 ++++++++++++++++++++++++++++----------------- parallel-checkout.h | 8 +++++ unpack-trees.c | 4 ++- 4 files changed, 65 insertions(+), 28 deletions(-) diff --git a/entry.c b/entry.c index b6c808dffa..6208df23df 100644 --- a/entry.c +++ b/entry.c @@ -477,6 +477,16 @@ int checkout_entry_ca(struct cache_entry *ce, struct conv_attrs *ca, return write_entry(ce, topath, ca, state, 1); } + /* + * If a regular file x/f is queued for parallel checkout and a symlink + * X is created now, the worker could wrongly create the file at X/f + * due to path collision. Thus, symlinks are only created after + * parallel-eligible entries. + */ + if (parallel_checkout_status() == PC_ACCEPTING_ENTRIES && + S_ISLNK(ce->ce_mode)) + enqueue_symlink_checkout(ce, nr_checkouts); + strbuf_reset(&path); strbuf_add(&path, state->base_dir, state->base_dir_len); strbuf_add(&path, ce->name, ce_namelen(ce)); diff --git a/parallel-checkout.c b/parallel-checkout.c index 78bf2de5ea..fee93460c1 100644 --- a/parallel-checkout.c +++ b/parallel-checkout.c @@ -140,6 +140,44 @@ static void advance_progress_meter(void) } } +struct symlink_checkout_item { + struct cache_entry *ce; + int *nr_checkouts; +}; + +static struct symlink_checkout_item *symlink_queue = NULL; +static size_t symlink_queue_nr = 0, symlink_queue_alloc = 0; + +void enqueue_symlink_checkout(struct cache_entry *ce, int *nr_checkouts) +{ + assert(S_ISLNK(ce->ce_mode)); + ALLOC_GROW(symlink_queue, symlink_queue_nr + 1, symlink_queue_alloc); + symlink_queue[symlink_queue_nr].ce = ce; + symlink_queue[symlink_queue_nr].nr_checkouts = nr_checkouts; + symlink_queue_nr++; +} + +size_t symlink_queue_size(void) +{ + return symlink_queue_nr; +} + +static int checkout_symlink_queue(struct checkout *state) +{ + size_t i; + int ret = 0; + + for (i = 0; i < symlink_queue_nr; ++i) { + struct symlink_checkout_item *sci = &symlink_queue[i]; + ret |= checkout_entry(sci->ce, state, 
NULL, sci->nr_checkouts); + advance_progress_meter(); + } + + FREE_AND_NULL(symlink_queue); + symlink_queue_nr = symlink_queue_alloc = 0; + return ret; +} + static int handle_results(struct checkout *state) { int ret = 0; @@ -257,16 +295,6 @@ static int close_and_clear(int *fd) return ret; } -static int check_leading_dirs(const char *path, int len, int prefix_len) -{ - const char *slash = path + len; - - while (slash > path && *slash != '/') - slash--; - - return has_dirs_only_path(path, slash - path, prefix_len); -} - void write_checkout_item(struct checkout *state, struct checkout_item *ci) { unsigned int mode = (ci->ce->ce_mode & 0100) ? 0777 : 0666; @@ -276,27 +304,15 @@ void write_checkout_item(struct checkout *state, struct checkout_item *ci) strbuf_add(&path, state->base_dir, state->base_dir_len); strbuf_add(&path, ci->ce->name, ci->ce->ce_namelen); - /* - * At this point, leading dirs should have already been created. But if - * a symlink being checked out has collided with one of the dirs, due to - * file system folding rules, it's possible that the dirs are no longer - * present. So we have to check again, and report any path collisions. - */ - if (!check_leading_dirs(path.buf, path.len, state->base_dir_len)) { - ci->status = CI_RETRY; - goto out; - } - fd = open(path.buf, O_WRONLY | O_CREAT | O_EXCL, mode); if (fd < 0) { - if (errno == EEXIST || errno == EISDIR) { + if (errno == EEXIST || errno == EISDIR || errno == ENOENT || + errno == ENOTDIR) { /* * Errors which probably represent a path collision. * Suppress the error message and mark the ci to be - * retried later, sequentially. ENOTDIR and ENOENT are - * also interesting, but check_leading_dirs() should - * have already caught these cases. + * retried later, sequentially. 
*/ ci->status = CI_RETRY; } else { @@ -523,7 +539,7 @@ static int run_checkout_sequentially(struct checkout *state) if (ci->status != CI_RETRY) advance_progress_meter(); } - return handle_results(state); + return handle_results(state) | checkout_symlink_queue(state); } int run_parallel_checkout(struct checkout *state, int num_workers, int threshold, @@ -553,7 +569,8 @@ int run_parallel_checkout(struct checkout *state, int num_workers, int threshold workers = setup_workers(state, num_workers); gather_results_from_workers(workers, num_workers); finish_workers(workers, num_workers); - ret = handle_results(state); + ret |= handle_results(state); + ret |= checkout_symlink_queue(state); done: finish_parallel_checkout(); diff --git a/parallel-checkout.h b/parallel-checkout.h index 2b81a5db6c..a4f7e5b7bd 100644 --- a/parallel-checkout.h +++ b/parallel-checkout.h @@ -29,6 +29,14 @@ void get_parallel_checkout_configs(int *num_workers, int *threshold); int enqueue_checkout(struct cache_entry *ce, struct conv_attrs *ca); size_t pc_queue_size(void); +/* + * Enqueues a symlink to be checked out *sequentially* after the parallel + * checkout finishes. This is done to avoid path collisions with leading dirs, + * which could make parallel workers write a file to the wrong place. + */ +void enqueue_symlink_checkout(struct cache_entry *ce, int *nr_checkouts); +size_t symlink_queue_size(void); + /* * Write all the queued entries, returning 0 on success. 
If the number of * entries is below the specified threshold, the operation is performed diff --git a/unpack-trees.c b/unpack-trees.c index dcb40dc8fa..01928d3d65 100644 --- a/unpack-trees.c +++ b/unpack-trees.c @@ -472,6 +472,7 @@ static int check_updates(struct unpack_trees_options *o, if (ce->ce_flags & CE_UPDATE) { size_t last_pc_queue_size = pc_queue_size(); + size_t last_symlink_queue_size = symlink_queue_size(); if (ce->ce_flags & CE_WT_REMOVE) BUG("both update and delete flags are set on %s", @@ -479,7 +480,8 @@ static int check_updates(struct unpack_trees_options *o, ce->ce_flags &= ~CE_UPDATE; errs |= checkout_entry(ce, &state, NULL, NULL); - if (last_pc_queue_size == pc_queue_size()) + if (last_pc_queue_size == pc_queue_size() && + last_symlink_queue_size == symlink_queue_size()) display_progress(progress, ++cnt); } } From patchwork Mon Aug 10 21:33:26 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Matheus Tavares X-Patchwork-Id: 11708187 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id CD6DD174A for ; Mon, 10 Aug 2020 21:35:43 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id AEAD42073E for ; Mon, 10 Aug 2020 21:35:43 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=usp-br.20150623.gappssmtp.com header.i=@usp-br.20150623.gappssmtp.com header.b="oGX/BlEP" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727002AbgHJVfn (ORCPT ); Mon, 10 Aug 2020 17:35:43 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:36266 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726806AbgHJVfm (ORCPT ); Mon, 10 Aug 2020 17:35:42 -0400 Received: from mail-qt1-x841.google.com (mail-qt1-x841.google.com 
[IPv6:2607:f8b0:4864:20::841]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 03D05C061756 for ; Mon, 10 Aug 2020 14:35:42 -0700 (PDT) Received: by mail-qt1-x841.google.com with SMTP id w9so7974630qts.6 for ; Mon, 10 Aug 2020 14:35:41 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=usp-br.20150623.gappssmtp.com; s=20150623; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=Hh+MYb8IWJsn+O485kBva+2apDsvaqnZccKmIGFqA3A=; b=oGX/BlEP3D6YvTAlXXwWW+/UpSBYj2FVOytrfnEgrvBEmc40Rf7OQ0cagHqInshCIQ JCtPkJIx8p742Fjhgy2jpP9AuKZQQ4IfuhpSRaYTooFs5/3aKs4nX1OMoR4Eb4o5tZEc ZdsPDEJAMBpbTfdS08qdYzRSEVyYspzidQqv4sMbxiD8YYNLazDGJ6SB9lmc4yeAg9KN E3p31kdog1NWsJOy/3jUJE6Ayv4CGKRB0hTeufGnWycy6TO18PXbZ/I7mnwlvcMVmzq9 D5MArRGf+5ryH6jBQMzpA69Hw1UYkLaniZdMEA+eR8vKeF6eSod4C7g+jeXxohup0Ely h7mQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=Hh+MYb8IWJsn+O485kBva+2apDsvaqnZccKmIGFqA3A=; b=LHQxEiRtjB4IHZbzM4/XkBlSHMdIzqTkjjzVB+AdmtiBC/e+H4hq+jKhM8b16TcAof w8e7gqsOKmd8SeIKfwjJQ+WzxZPsqJ6X28q9mASF0RkS2IClXoZUYv/N0Y79f0dDupVF z5oOLRENfFggopVKr9h93BApgUsZZdAIV0xU7gtO50a1rKh/NPSqXc+Wg/0AD0mqB5xf HPVXyesmAPVFxsgf5kp5gDM4J5X5PvmzsuTASoPIrPmZvOKx8bcQBxBUMI+pSBQt+FsO GnBJ0vM2L4YZN3/Bx4lpU3/bXjptHD5N8qRpGlAbCk8BzH9Lfj6/YDmssEqfPQWhhz+J uV2Q== X-Gm-Message-State: AOAM5317qn1lhJjXQQeo5KACxAKWQkIk7l3TLF2W2Je+IKFNC6Hovasw iBaK4o1NC5LD3pFUo1RSoGWfhUsc7HI= X-Google-Smtp-Source: ABdhPJyPmAiKGe28b/qz3B09nCYQ+rzXYSXdlJ+eCFoaUcaQnfsDbHJL1foPx1EPiWCpWEKKNX+9tw== X-Received: by 2002:aed:2946:: with SMTP id s64mr29699832qtd.204.1597095340806; Mon, 10 Aug 2020 14:35:40 -0700 (PDT) Received: from localhost.localdomain ([2804:18:87c:466:1120:3c2c:21e4:5931]) by smtp.gmail.com with ESMTPSA id z197sm15370674qkb.66.2020.08.10.14.35.38 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 
bits=256/256); Mon, 10 Aug 2020 14:35:40 -0700 (PDT) From: Matheus Tavares To: git@vger.kernel.org Cc: stolee@gmail.com, jeffhost@microsoft.com, Junio C Hamano Subject: [RFC PATCH 18/21] entry: use is_dir_sep() when checking leading dirs Date: Mon, 10 Aug 2020 18:33:26 -0300 Message-Id: <747e78a0a34c3044a3edf07ca038bd85e7c0afef.1597093021.git.matheus.bernardino@usp.br> X-Mailer: git-send-email 2.27.0 In-Reply-To: References: MIME-Version: 1.0 Sender: git-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org The test 'prevent git~1 squatting on Windows' in t7415 adds the file 'd./a/x' and the submodule 'd\a' to the index, with `git -c core.protectNTFS=false update-index --add --cacheinfo`. Then it performs a clone with `--recurse-submodules`. Since "d./" and "d\" represent the same entry on NTFS, the operation is expected to fail, because the submodule directory is not empty by the time "d\a" is cloned. With parallel checkout, this condition is still valid: although we call checkout_entry() for gitlinks before we write regular files (which are delayed for later parallel write), the actual submodule cloning only happens after unpack_trees() returns, in builtin/clone.c:checkout(). Note, however, that we do create the submodule directory (and leading directories) in unpack_trees(). But the current code iterates through path components only considering "/", not "\", which is also valid on Windows. The reason we don't fail to create the leading dir "d" for the gitlink "d\a" is that, by the time we call mkdir("d\a"), "d" was already created for the regular file 'd./a/x'. Again, this is still true for parallel checkout, since we create leading dirs sequentially, even for entries that are delayed for later writing. But in a following patch, we will allow checkout workers to create the leading directories in parallel, for better performance.
Therefore, when checkout_entry() is called for the gitlink "d\a", "d" won't be present yet, and mkdir("d\a") will fail with ENOENT. To solve this, in preparation for the said patch, let's use is_dir_sep() when checking path components, so that checkout_entry() can correctly create "d" for the gitlink "d\a". Signed-off-by: Matheus Tavares --- I'm not sure if this is the right way to make t7415 work with parallel-checkout; or if we should, perhaps, change the test to add the submodule at 'a/d'. I'd love it if someone more familiar with Windows could review this one. entry.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/entry.c b/entry.c index 6208df23df..19f2c1d132 100644 --- a/entry.c +++ b/entry.c @@ -19,7 +19,7 @@ static void create_directories(const char *path, int path_len, do { buf[len] = path[len]; len++; - } while (len < path_len && path[len] != '/'); + } while (len < path_len && !is_dir_sep(path[len])); if (len >= path_len) break; buf[len] = 0; @@ -404,7 +404,7 @@ static int check_path(const char *path, int len, struct stat *st, int skiplen) { const char *slash = path + len; - while (path < slash && *slash != '/') + while (path < slash && !is_dir_sep(*slash)) slash--; if (!has_dirs_only_path(path, slash - path, skiplen)) { errno = ENOENT; From patchwork Mon Aug 10 21:33:27 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Matheus Tavares X-Patchwork-Id: 11708189 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 5F2A4174A for ; Mon, 10 Aug 2020 21:35:48 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 37B4D20734 for ; Mon, 10 Aug 2020 21:35:48 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=usp-br.20150623.gappssmtp.com
header.i=@usp-br.20150623.gappssmtp.com header.b="FRa5A/J7" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727011AbgHJVfr (ORCPT ); Mon, 10 Aug 2020 17:35:47 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:36278 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726806AbgHJVfq (ORCPT ); Mon, 10 Aug 2020 17:35:46 -0400 Received: from mail-qk1-x741.google.com (mail-qk1-x741.google.com [IPv6:2607:f8b0:4864:20::741]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 671ACC061756 for ; Mon, 10 Aug 2020 14:35:46 -0700 (PDT) Received: by mail-qk1-x741.google.com with SMTP id 77so9873477qkm.5 for ; Mon, 10 Aug 2020 14:35:46 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=usp-br.20150623.gappssmtp.com; s=20150623; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=4wvrvKFNwrONSKucJStPte6w8ul1OeR/QlmjFedtXVY=; b=FRa5A/J7Aky2xaNOD4301relG1dBBJ0g0hvP7uoA7EYYkO34CgPhkZjQEFhWw3xDkS e1OGripsKwBb+se/dAGUkT1LPlZx1RaWAx0IGmRzbYhfpXTY/Q7SfUP9oD8Z/j27KJ0j yFuVOZu1rz7evdhWgoGRUGjtRUkgzf3I/6g1dch+pOTflowajSW94SUuOdcnO3d55BKL GkCdYvjW8EsHU/zTsulcIK1vSvikSjfTFocQV/h/nnK+OHTIMJO4wQwo80CUcgW8pSup sBtsbbm0Eh+DbylD38HbUUT0RnBWawJxUy22bRof6HueYLmONelBm1Qv9ldc9zD/k10z yy/A== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=4wvrvKFNwrONSKucJStPte6w8ul1OeR/QlmjFedtXVY=; b=LBqDZZCo4n5Z1ThV921GWa4flbelwYKDGCELnmv5dfH9LkTuAT5512nozVRhhaMVIR tyHzjnP5VOmBY98bKho8Gzw1evMiuoJOlarXkrMveO98+vR09ODf1T6uFq3aTiuq/XYu 014U5xaDFCnGbQsj8ArDfpta2b5ySuAxMwIw9CaF0rsrG2FGFj9ofdZ28p82vkwX6t+w XYTCOgTVUjYxpAmnEhvYimvHSzGZMl3g7kkSBLRBeSVO7ooMZkVenOsId6UCo7j+5fM2 7mMDuhtgOtovFJ5V8w3Val8WwlyIpvOMJI+LgMWbxR8JnMpH6w/hQYZI9qB8tFcXsY20 Sb4w== X-Gm-Message-State: 
AOAM531D70b+RN7Wj0pxtkIiyE81MRpQl9Nvq2J1RBm2wcVRMUGOAHeW oTYIzOv1xenyC5X/Gy6Eg/UP5hQXj2s= X-Google-Smtp-Source: ABdhPJywmZU/Unm3qr9VEhF17eEF0zemvb+DpS9gBqoLkxhUtap8ops9y/CoJOw3a2GVW9fePzZFNQ== X-Received: by 2002:a05:620a:9d0:: with SMTP id y16mr28877221qky.353.1597095345043; Mon, 10 Aug 2020 14:35:45 -0700 (PDT) Received: from localhost.localdomain ([2804:18:87c:466:1120:3c2c:21e4:5931]) by smtp.gmail.com with ESMTPSA id z197sm15370674qkb.66.2020.08.10.14.35.42 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 10 Aug 2020 14:35:44 -0700 (PDT) From: Matheus Tavares To: git@vger.kernel.org Cc: stolee@gmail.com, jeffhost@microsoft.com, Denton Liu , =?utf-8?b?Tmd1eeG7hW4gVGjDoWkgTmfhu41j?= =?utf-8?b?IER1eQ==?= , Jeff King , Junio C Hamano Subject: [RFC PATCH 19/21] symlinks: make has_dirs_only_path() track FL_NOENT Date: Mon, 10 Aug 2020 18:33:27 -0300 Message-Id: X-Mailer: git-send-email 2.27.0 In-Reply-To: References: MIME-Version: 1.0 Sender: git-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org In the next patch, the parallel-checkout workers will be able to create the leading directories of the entries being written by themselves, to increase performance. But to do so, the main process will first need to remove non-directory files that can potentially be in the way (the reasoning is discussed in the next patch). This can be done without much cost by calling has_dirs_only_path() for each path component until we find the first one which is not a real directory, which should then be removed. This operation is cheap because it doesn't have to call stat() again for each component, as the information is already cached from the previous call at entry.c:check_path(). However, when has_dirs_only_path() returns false, we don't know if the component doesn't exist or if it exists as another file type. The best we could do in this case would be to stat() the component again.
When there are many files to be checked out inside the same directory (yet to be created by a worker), we would have to call stat() for the same directory once for each path, even though there is nothing to be unlinked there. We can skip these stat() calls by making has_dirs_only_path() also ask for FL_NOENT caching, and converting its return value to a tri-state. Note: since we are now caching FL_NOENT, we also need to manually invalidate the cache when we create a directory in a path previously cached as non-existent. While we are here, also remove duplicated comments in has_dirs_only_path() and check_leading_path(). Signed-off-by: Matheus Tavares --- cache.h | 1 + entry.c | 11 +++++++++-- parallel-checkout.c | 3 +++ symlinks.c | 42 ++++++++++++++++++------------------------ 4 files changed, 31 insertions(+), 26 deletions(-) diff --git a/cache.h b/cache.h index e2b41c5f8b..7a08cd6f0e 100644 --- a/cache.h +++ b/cache.h @@ -1711,6 +1711,7 @@ int has_symlink_leading_path(const char *name, int len); int threaded_has_symlink_leading_path(struct cache_def *, const char *, int); int check_leading_path(const char *name, int len); int has_dirs_only_path(const char *name, int len, int prefix_len); +void reset_default_lstat_cache(void); void schedule_dir_for_removal(const char *name, int len); void remove_scheduled_dirs(void); diff --git a/entry.c b/entry.c index 19f2c1d132..e876adff19 100644 --- a/entry.c +++ b/entry.c @@ -14,6 +14,7 @@ static void create_directories(const char *path, int path_len, { char *buf = xmallocz(path_len); int len = 0; + int reset_cache = 0; while (len < path_len) { do { @@ -31,7 +32,7 @@ static void create_directories(const char *path, int path_len, * we test the path components of the prefix with the * stat() function instead of the lstat() function. */ - if (has_dirs_only_path(buf, len, state->base_dir_len)) + if (has_dirs_only_path(buf, len, state->base_dir_len) > 0) continue; /* ok, it is already a directory.
*/ /* @@ -45,8 +46,14 @@ static void create_directories(const char *path, int path_len, !unlink_or_warn(buf) && !mkdir(buf, 0777)) continue; die_errno("cannot create directory at '%s'", buf); + } else { + /* The cache had FL_NOENT, but we now created a dir */ + reset_cache = 1; } } + + if (reset_cache) + reset_default_lstat_cache(); free(buf); } @@ -406,7 +413,7 @@ static int check_path(const char *path, int len, struct stat *st, int skiplen) while (path < slash && !is_dir_sep(*slash)) slash--; - if (!has_dirs_only_path(path, slash - path, skiplen)) { + if (has_dirs_only_path(path, slash - path, skiplen) <= 0) { errno = ENOENT; return -1; } diff --git a/parallel-checkout.c b/parallel-checkout.c index fee93460c1..4d72540256 100644 --- a/parallel-checkout.c +++ b/parallel-checkout.c @@ -185,6 +185,9 @@ static int handle_results(struct checkout *state) pc_status = PC_HANDLING_RESULTS; + /* Workers might have created dirs, so the cache must be invalidated */ + reset_default_lstat_cache(); + for (i = 0; i < parallel_checkout->nr; ++i) { struct checkout_item *ci = &parallel_checkout->items[i]; struct stat *st = &ci->st; diff --git a/symlinks.c b/symlinks.c index 69d458a24d..3adf6ef8a1 100644 --- a/symlinks.c +++ b/symlinks.c @@ -47,6 +47,11 @@ static inline void reset_lstat_cache(struct cache_def *cache) */ } +void reset_default_lstat_cache(void) +{ + reset_lstat_cache(&default_cache); +} + #define FL_DIR (1 << 0) #define FL_NOENT (1 << 1) #define FL_SYMLINK (1 << 2) @@ -210,15 +215,6 @@ int has_symlink_leading_path(const char *name, int len) return threaded_has_symlink_leading_path(&default_cache, name, len); } -/* - * Return zero if path 'name' has a leading symlink component or - * if some leading path component does not exists. - * - * Return -1 if leading path exists and is a directory. - * - * Return path length if leading path exists and is neither a - * directory nor a symlink.
- */ int check_leading_path(const char *name, int len) { return threaded_check_leading_path(&default_cache, name, len); @@ -246,30 +242,28 @@ static int threaded_check_leading_path(struct cache_def *cache, const char *name return match_len; } -/* - * Return non-zero if all path components of 'name' exists as a - * directory. If prefix_len > 0, we will test with the stat() - * function instead of the lstat() function for a prefix length of - * 'prefix_len', thus we then allow for symlinks in the prefix part as - * long as those points to real existing directories. - */ int has_dirs_only_path(const char *name, int len, int prefix_len) { return threaded_has_dirs_only_path(&default_cache, name, len, prefix_len); } /* - * Return non-zero if all path components of 'name' exists as a - * directory. If prefix_len > 0, we will test with the stat() - * function instead of the lstat() function for a prefix length of - * 'prefix_len', thus we then allow for symlinks in the prefix part as - * long as those points to real existing directories. + * Return a positive number if all path components of 'name' exist as + * directories, a negative number if a component does not exist, and 0 otherwise + * (e.g. a component exists but as another file type). If prefix_len > 0, we + * will test with the stat() function instead of the lstat() function for a + * prefix length of 'prefix_len', thus we return +1 for symlinks in the prefix + * part as long as those point to real existing directories.
*/ static int threaded_has_dirs_only_path(struct cache_def *cache, const char *name, int len, int prefix_len) { - return lstat_cache(cache, name, len, - FL_DIR|FL_FULLPATH, prefix_len) & - FL_DIR; + int flags = lstat_cache(cache, name, len, + FL_NOENT|FL_DIR|FL_FULLPATH, prefix_len); + if (flags & FL_DIR) + return 1; + if (flags & FL_NOENT) + return -1; + return 0; } static struct strbuf removal = STRBUF_INIT; From patchwork Mon Aug 10 21:33:28 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Matheus Tavares X-Patchwork-Id: 11708191 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id EAC6D14E3 for ; Mon, 10 Aug 2020 21:35:52 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id C3B232073E for ; Mon, 10 Aug 2020 21:35:52 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=usp-br.20150623.gappssmtp.com header.i=@usp-br.20150623.gappssmtp.com header.b="Uk+mPO7p" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727012AbgHJVfw (ORCPT ); Mon, 10 Aug 2020 17:35:52 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:36290 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726806AbgHJVfv (ORCPT ); Mon, 10 Aug 2020 17:35:51 -0400 Received: from mail-qt1-x843.google.com (mail-qt1-x843.google.com [IPv6:2607:f8b0:4864:20::843]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 050C4C061756 for ; Mon, 10 Aug 2020 14:35:50 -0700 (PDT) Received: by mail-qt1-x843.google.com with SMTP id s16so7982769qtn.7 for ; Mon, 10 Aug 2020 14:35:49 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=usp-br.20150623.gappssmtp.com; s=20150623; h=from:to:cc:subject:date:message-id:in-reply-to:references 
:mime-version:content-transfer-encoding; bh=Rw4A1c5Qr5hHagunbyoBniK9C8/HmVggraJ+NBd3OcA=; b=Uk+mPO7pGzZF3gblALHy74ToBGwUzFena7h4SevbEuXhn9oIOW5toKSkyE3H5vXg/N e9RZkE3O73BDCehquS7WN7lX+fIl7Y6+NHkJL4z0l6hotPpjvgRiGb2/0wCs7FIe0lMz 1xJY8HbJzlk5TiQZJXmIwO2JLYVE8XWdIQcRVdQAoBnC9yMm2qIs2giwJkJodXABDGZI Fpun1eDjcy3rq/AZF8yA3NoqhZAP87LM6N/VO0W+k77rhgXMXmwGC75N0Y3yctuk+Zl6 BJVVB4lBtSj5phdUNkchyiIpLCT1VdV2AeKLEAB5vXZbht8O0GgRp8gUOwqUIJslWgL0 kwhw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=Rw4A1c5Qr5hHagunbyoBniK9C8/HmVggraJ+NBd3OcA=; b=ZA/HKv1Z3b526jnF4p7Zx6Nbg4ijYTU2heVV/7ugBtFXskbJMeDe/l5zLRmu6ToZ8Q QWPUBr0FKIduw9+vkI4jhV7ZERdB3Pnig6+PVDKWjhCENo3QiEI/3DeinkrEIY1sLc4i u+mbzPjK0JYVV1dQaH8AC1J463NTU2wamea0vecRCytubuVDtiG4ki7qDuXG90EcJQ4I 53tN1XBgXRoVsrd7Tu+F9KzGPxcNwtorvFEfjCq2n+hddC/JLtqrnysioI9ro2WbKYf8 0qO+UgCFDjJaBhJf1GBbEOtvvFdQkM+EFNUv5QbJBaHkIWmDk9Us5LjEYs1vDbPJt77H DKtw== X-Gm-Message-State: AOAM530pXGnj5qN6h0ENzDQWv8ovdHqtpnIsk/8kLkw+YlGoc/yzzhGK v+TymHnEHivSF2JaQ5U5wVtgiZ7rfVk= X-Google-Smtp-Source: ABdhPJzxkYBJFYgdbKoYi7s85gGtifZBBIq6Bj9ZZ8wkDvBtgWi86GqKygiMY8zkXHMqbINaSp3BfA== X-Received: by 2002:ac8:4b78:: with SMTP id g24mr29917028qts.248.1597095348694; Mon, 10 Aug 2020 14:35:48 -0700 (PDT) Received: from localhost.localdomain ([2804:18:87c:466:1120:3c2c:21e4:5931]) by smtp.gmail.com with ESMTPSA id z197sm15370674qkb.66.2020.08.10.14.35.46 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 10 Aug 2020 14:35:47 -0700 (PDT) From: Matheus Tavares To: git@vger.kernel.org Cc: stolee@gmail.com, jeffhost@microsoft.com, Junio C Hamano , =?utf-8?b?Tmd1eeG7hW4gVGjDoWkgTmfhu41j?= =?utf-8?b?IER1eQ==?= , Thomas Gummerer Subject: [RFC PATCH 20/21] parallel-checkout: create leading dirs in workers Date: Mon, 10 Aug 2020 18:33:28 -0300 Message-Id: X-Mailer: git-send-email 2.27.0 In-Reply-To: 
References: MIME-Version: 1.0 Sender: git-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org Allow the parallel workers to create the leading directories of the entries being checked out, instead of pre-creating them in the main process. This optimization should be more effective on file systems with higher I/O latency. Part of the process of creating leading dirs is the removal of any non-directory file that could be in the way. This is currently done inside entry.c:create_directories(). However, if we were to move this to the workers as well, we would risk removing a file just written by another worker, which collided with the one currently being written. In a worse scenario, we could remove the file right after a worker has closed it but before it called stat(). To avoid these problems, let's remove the non-directory files in the main process. And to avoid the cost of extra lstat() calls in this process, we use has_dirs_only_path(), which will have the necessary information already cached from check_path(). Finally, to create the leading dirs in the workers, we could re-use create_directories(). But, unlike the main process, we wouldn't have the stat() information cached. Thus, let's use raceproof_create_file(), which will only stat() the path components after an open() failure, saving us time when creating subsequent files in the same directory.
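A minimal sketch of the open-first strategy described above, under stated assumptions. It is not git's raceproof_create_file(); the helpers make_leading_dirs() and open_creating_leading_dirs() are hypothetical names used only for illustration. The idea it shows: try open() right away, and only walk the path components when that fails with ENOENT, so that once a directory exists, later files in it cost a single open().

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of git): create every leading directory
 * of 'path'. A component that already exists is not treated as an error
 * here. Returns 0 on success and -1 with errno set otherwise.
 */
static int make_leading_dirs(const char *path)
{
	size_t len = strlen(path);
	char *buf = malloc(len + 1);
	int ret = 0, saved_errno;
	size_t i;

	if (!buf)
		return -1;
	memcpy(buf, path, len + 1);

	/* Start at 1 so that a leading '/' is not cut into an empty name. */
	for (i = 1; i < len; i++) {
		if (buf[i] != '/')
			continue;
		buf[i] = '\0';
		if (mkdir(buf, 0777) && errno != EEXIST) {
			ret = -1;
			break;
		}
		buf[i] = '/';
	}

	saved_errno = errno;
	free(buf);
	errno = saved_errno;
	return ret;
}

/*
 * Hypothetical helper: try open() first and only pay for directory
 * creation when the open fails with ENOENT, i.e. when a leading
 * directory is missing. Any other error is returned to the caller
 * untouched (-1 with errno set).
 */
static int open_creating_leading_dirs(const char *path, mode_t mode)
{
	int fd;

	assert(path != NULL);

	fd = open(path, O_WRONLY | O_CREAT | O_EXCL, mode);
	if (fd >= 0 || errno != ENOENT)
		return fd;
	if (make_leading_dirs(path))
		return -1;
	return open(path, O_WRONLY | O_CREAT | O_EXCL, mode);
}
```

With this shape, the first file in a new directory pays for the mkdir() calls, and every later file in the same directory succeeds on the first open(). A return of -1 with errno set to EEXIST or ENOTDIR would correspond to the kind of path collision that this patch leaves for the main process to handle.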
Signed-off-by: Matheus Tavares --- entry.c | 45 ++++++++++++++++++++++++++++++++++++++++++--- parallel-checkout.c | 42 ++++++++++++++++++++++++++++++++++++------ 2 files changed, 78 insertions(+), 9 deletions(-) diff --git a/entry.c b/entry.c index e876adff19..5dfd4d150d 100644 --- a/entry.c +++ b/entry.c @@ -57,6 +57,43 @@ static void create_directories(const char *path, int path_len, free(buf); } +static void remove_non_dirs(const char *path, int path_len, + const struct checkout *state) +{ + char *buf = xmallocz(path_len); + int len = 0; + + while (len < path_len) { + int ret; + + do { + buf[len] = path[len]; + len++; + } while (len < path_len && !is_dir_sep(path[len])); + if (len >= path_len) + break; + buf[len] = 0; + + ret = has_dirs_only_path(buf, len, state->base_dir_len); + + if (ret > 0) + continue; /* Is directory. */ + if (ret < 0) + break; /* No entry */ + + /* ret == 0: not a directory, let's unlink it. */ + + if (!state->force) + die("'%s' already exists, and it's not a directory", buf); + + if (unlink(buf)) + die_errno("cannot unlink '%s'", buf); + else + break; + } + free(buf); +} + static void remove_subtree(struct strbuf *path) { DIR *dir = opendir(path->buf); @@ -555,8 +592,6 @@ int checkout_entry_ca(struct cache_entry *ce, struct conv_attrs *ca, } else if (state->not_new) return 0; - create_directories(path.buf, path.len, state); - if (nr_checkouts) (*nr_checkouts)++; @@ -565,9 +600,13 @@ int checkout_entry_ca(struct cache_entry *ce, struct conv_attrs *ca, ca = &ca_buf; } - if (!enqueue_checkout(ce, ca)) + if (!enqueue_checkout(ce, ca)) { + /* "clean" path so that workers can create leading dirs */ + remove_non_dirs(path.buf, path.len, state); return 0; + } + create_directories(path.buf, path.len, state); return write_entry(ce, path.buf, ca, state, 0); } diff --git a/parallel-checkout.c b/parallel-checkout.c index 4d72540256..5b73d8fa4b 100644 --- a/parallel-checkout.c +++ b/parallel-checkout.c @@ -298,20 +298,48 @@ static int 
close_and_clear(int *fd) return ret; } +struct ci_open_data { + int fd; + unsigned int mode; +}; + +static int ci_open(const char *path, void *cb) +{ + struct ci_open_data *data = cb; + data->fd = open(path, O_WRONLY | O_CREAT | O_EXCL, data->mode); + + if (data->fd < 0) { + /* + * EISDIR can only indicate path collisions among the entries + * being checked out. We don't need raceproof_create_file() to + * try removing empty dirs. Instead, just let the caller know + * that the path already exists, so that the collision can be + * properly handled later. + */ + if (errno == EISDIR) + errno = EEXIST; + return 1; + } + + return 0; +} + void write_checkout_item(struct checkout *state, struct checkout_item *ci) { - unsigned int mode = (ci->ce->ce_mode & 0100) ? 0777 : 0666; + struct ci_open_data open_data; int fd = -1, fstat_done = 0; struct strbuf path = STRBUF_INIT; + open_data.mode = (ci->ce->ce_mode & 0100) ? 0777 : 0666; strbuf_add(&path, state->base_dir, state->base_dir_len); strbuf_add(&path, ci->ce->name, ci->ce->ce_namelen); - fd = open(path.buf, O_WRONLY | O_CREAT | O_EXCL, mode); - - if (fd < 0) { - if (errno == EEXIST || errno == EISDIR || errno == ENOENT || - errno == ENOTDIR) { + /* + * The main process already removed any non-directory file that was in + * the way. So if we find one, it's a path collision. + */ + if (raceproof_create_file(path.buf, ci_open, &open_data)) { + if (errno == EEXIST || errno == ENOTDIR || errno == ENOENT) { /* * Errors which probably represent a path collision. * Suppress the error message and mark the ci to be @@ -325,6 +353,8 @@ void write_checkout_item(struct checkout *state, struct checkout_item *ci) goto out; } + fd = open_data.fd; + if (write_checkout_item_to_fd(fd, state, ci, path.buf)) { /* Error was already reported.
*/ ci->status = CI_FAILED; From patchwork Mon Aug 10 21:33:29 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Matheus Tavares X-Patchwork-Id: 11708193 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id A88D1174A for ; Mon, 10 Aug 2020 21:35:56 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 8AB80206DA for ; Mon, 10 Aug 2020 21:35:56 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=usp-br.20150623.gappssmtp.com header.i=@usp-br.20150623.gappssmtp.com header.b="I0DxX38f" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727021AbgHJVfz (ORCPT ); Mon, 10 Aug 2020 17:35:55 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:36302 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726806AbgHJVfy (ORCPT ); Mon, 10 Aug 2020 17:35:54 -0400 Received: from mail-qk1-x743.google.com (mail-qk1-x743.google.com [IPv6:2607:f8b0:4864:20::743]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id EE051C061756 for ; Mon, 10 Aug 2020 14:35:52 -0700 (PDT) Received: by mail-qk1-x743.google.com with SMTP id 2so9841935qkf.10 for ; Mon, 10 Aug 2020 14:35:52 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=usp-br.20150623.gappssmtp.com; s=20150623; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=pr52DQdJReuczN6yQWxg7xzpDo926gZ3vA2f9FtoTwA=; b=I0DxX38fQwZt4epJueQ3xwLYdTnjd5i6DC+rQ4i3zvx6Mvtp/nqCIrm9jLQqJa75Ch 6dPlkQADQ7E7NCJrclGmXOQSJ/HyilDCAYGJPPPWvQ9emXV4kmvObIQsC7u6SCuG1wCL FBOlrJuZKt6Z7395nerjDnUpdUDbReWE1pi2eGtDlrmncprUO6l1Ved8Dz3Cs8Afog1v qlrALFcHO6GIYBehVDhVoC1MNOV+DkbiVMC0HzIZN6e1Qqsi15SIJsTjVR4Rf4Er4oFr 
eu8fz7HQf+ZrXE+LtWqTshYJ8bUuxolWg4SK+HzpOu5GSbJluKApoJizkHHTSOUbRLxT HneA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=pr52DQdJReuczN6yQWxg7xzpDo926gZ3vA2f9FtoTwA=; b=A5gsD0D36WIeJbhXaUme5bJ0UEwOYl9dOzL5reb9u6dYjPgIH7CnNIcRVqXxtK6JQb CYD5nfl6hzkZ1Me1g2m/6Bpbj6Lsm7UiPiuXjw2AUD6QVrOHK+l5ik93OJ3n+RIGL8ct Kr5em7uclYZnHCXq7d2LdYb1+kUd8QMpHwOfrUOBcc8CZPg1IaU/0P7Jf4rhlNol5cr5 1K2rimdn08rcLlQc0hMuclY6fmP6+GJiQe0W305jQjuIUMLdLhjC4b+mRLCbwdgiZg55 qg0SDlPvILMrb9jA6kBIioQFaAi/geEIxRMro7dAlA6impJrbcq5NlBnpSnKcYjvU/em Vdeg== X-Gm-Message-State: AOAM532YyEf4uA+iI2RdzXhp97xVry0GA+/rnNQJbbntuxqYTosc+vGv NljdRRxVMwgPUVD7AD/4ebbM7u5QCek= X-Google-Smtp-Source: ABdhPJyvex9NhOUtkzLVXkrQp6IogngOYy+Xczns41EUIdGIyVoFCBhVmYvgsZ3JTocsAK+levuZYQ== X-Received: by 2002:ae9:dc45:: with SMTP id q66mr27956567qkf.55.1597095351743; Mon, 10 Aug 2020 14:35:51 -0700 (PDT) Received: from localhost.localdomain ([2804:18:87c:466:1120:3c2c:21e4:5931]) by smtp.gmail.com with ESMTPSA id z197sm15370674qkb.66.2020.08.10.14.35.49 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 10 Aug 2020 14:35:51 -0700 (PDT) From: Matheus Tavares To: git@vger.kernel.org Cc: stolee@gmail.com, jeffhost@microsoft.com, Junio C Hamano , Thomas Gummerer Subject: [RFC PATCH 21/21] parallel-checkout: skip checking the working tree on clone Date: Mon, 10 Aug 2020 18:33:29 -0300 Message-Id: <6356b499a4c12c86ec12ca6309c12e16662996f9.1597093021.git.matheus.bernardino@usp.br> X-Mailer: git-send-email 2.27.0 In-Reply-To: References: MIME-Version: 1.0 Sender: git-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org If the current checkout process is part of a clone, we can skip some steps that check paths in the working tree, as we know it was previously empty. 
More specifically, we can enqueue the entry for parallel checkout before calling check_path() to see if the path was already present and up-to-date. We can also skip calling remove_non_dirs(). Note: this optimization is only possible because the parallel checkout machinery will detect path collisions, and call checkout_entry_ca() again for them, going through the check_path() logic. Signed-off-by: Matheus Tavares --- entry.c | 16 ++++++++++++++-- 1 file changed, 14 insertions(+), 2 deletions(-) diff --git a/entry.c b/entry.c index 5dfd4d150d..8c03e23811 100644 --- a/entry.c +++ b/entry.c @@ -513,12 +513,24 @@ int checkout_entry_ca(struct cache_entry *ce, struct conv_attrs *ca, return 0; } - if (topath) { + if (topath || state->clone) { if (S_ISREG(ce->ce_mode) && !ca) { convert_attrs(state->istate, &ca_buf, ce->name); ca = &ca_buf; } - return write_entry(ce, topath, ca, state, 1); + if (topath) + return write_entry(ce, topath, ca, state, 1); + /* + * Since we are cloning, there should be no previous files in + * the working tree. So we can skip calling remove_non_dirs() + * and check_path(). (parallel-checkout.c will take care of path + * collision.) + */ + if (!enqueue_checkout(ce, ca)) { + if (nr_checkouts) + (*nr_checkouts)++; + return 0; + } } /*