From patchwork Fri Oct 6 22:01:50 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Taylor Blau
X-Patchwork-Id: 13412046
Date: Fri, 6 Oct 2023 18:01:50 -0400
From: Taylor Blau
To: git@vger.kernel.org
Cc: Elijah Newren, "Eric W. Biederman", Jeff King, Junio C Hamano
Subject: [PATCH 1/7] bulk-checkin: factor out `format_object_header_hash()`
Message-ID: <37f407281596dd596e49c847c35fdf163977b479.1696629697.git.me@ttaylorr.com>

Before deflating a blob into a pack, the bulk-checkin mechanism prepares
the pack object header by calling `format_object_header()` and writing
into a scratch buffer, the contents of which eventually make their way
into the pack.

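As a rough illustration of the operation described above, here is a
minimal, self-contained sketch of formatting a "<type> <size>" header
(with its trailing NUL) and feeding it to a hash context through a table
of function pointers. Everything prefixed with `toy_` is hypothetical and
is not Git's API; FNV-1a is only a placeholder for a real hash such as
SHA-1 or SHA-256.

```c
#include <assert.h>
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-ins for git_hash_ctx and struct git_hash_algo. */
struct toy_hash_ctx {
	uint32_t state;
};

struct toy_hash_algo {
	void (*init_fn)(struct toy_hash_ctx *ctx);
	void (*update_fn)(struct toy_hash_ctx *ctx, const void *data, size_t len);
};

/* FNV-1a: a placeholder hash, chosen only because it fits in a few lines. */
static void toy_fnv1a_init(struct toy_hash_ctx *ctx)
{
	ctx->state = 2166136261u;
}

static void toy_fnv1a_update(struct toy_hash_ctx *ctx, const void *data, size_t len)
{
	const unsigned char *p = data;

	while (len--) {
		ctx->state ^= *p++;
		ctx->state *= 16777619u;
	}
}

static const struct toy_hash_algo toy_fnv1a = { toy_fnv1a_init, toy_fnv1a_update };

/*
 * Write "<type> <size>" followed by a NUL byte into buf. The returned
 * length includes the NUL, since that byte is hashed as well.
 */
static size_t toy_format_object_header(char *buf, size_t bufsize,
				       const char *type, size_t size)
{
	int n = snprintf(buf, bufsize, "%s %" PRIuMAX, type, (uintmax_t)size);

	assert(n > 0 && (size_t)n < bufsize);
	return (size_t)n + 1;
}

/* Start a hash and feed it the header; the algorithm is an argument. */
static size_t toy_format_object_header_hash(const struct toy_hash_algo *algop,
					    struct toy_hash_ctx *ctx,
					    const char *type, size_t size)
{
	char header[64];
	size_t header_len = toy_format_object_header(header, sizeof(header),
						     type, size);

	algop->init_fn(ctx);
	algop->update_fn(ctx, header, header_len);
	return header_len;
}
```

A caller would invoke `toy_format_object_header_hash(&toy_fnv1a, &ctx,
"blob", size)` and then keep calling `update_fn` with the object's
contents. Taking the algorithm as a parameter is what lets the same
routine be reused with a different hash later.
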
Future commits will add support for deflating multiple kinds of objects
into a pack, and will likewise need to perform a similar operation as
below.

This is a mostly straightforward extraction, with one notable exception.
Instead of hard-coding `the_hash_algo`, pass it in to the new function
as an argument. This isn't strictly necessary for our immediate purposes
here, but will prove useful in the future if/when the bulk-checkin
mechanism grows support for the hash transition plan.

Signed-off-by: Taylor Blau
---
 bulk-checkin.c | 20 ++++++++++++++------
 1 file changed, 14 insertions(+), 6 deletions(-)

diff --git a/bulk-checkin.c b/bulk-checkin.c
index 223562b4e7..0aac3dfe31 100644
--- a/bulk-checkin.c
+++ b/bulk-checkin.c
@@ -247,6 +247,19 @@ static void prepare_to_stream(struct bulk_checkin_packfile *state,
 		die_errno("unable to write pack header");
 }
 
+static void format_object_header_hash(const struct git_hash_algo *algop,
+				      git_hash_ctx *ctx, enum object_type type,
+				      size_t size)
+{
+	unsigned char header[16384];
+	unsigned header_len = format_object_header((char *)header,
+						   sizeof(header),
+						   type, size);
+
+	algop->init_fn(ctx);
+	algop->update_fn(ctx, header, header_len);
+}
+
 static int deflate_blob_to_pack(struct bulk_checkin_packfile *state,
 				struct object_id *result_oid,
 				int fd, size_t size,
@@ -254,8 +267,6 @@ static int deflate_blob_to_pack(struct bulk_checkin_packfile *state,
 {
 	off_t seekback, already_hashed_to;
 	git_hash_ctx ctx;
-	unsigned char obuf[16384];
-	unsigned header_len;
 	struct hashfile_checkpoint checkpoint = {0};
 	struct pack_idx_entry *idx = NULL;
 
@@ -263,10 +274,7 @@ static int deflate_blob_to_pack(struct bulk_checkin_packfile *state,
 	if (seekback == (off_t) -1)
 		return error("cannot find the current offset");
 
-	header_len = format_object_header((char *)obuf, sizeof(obuf),
-					  OBJ_BLOB, size);
-	the_hash_algo->init_fn(&ctx);
-	the_hash_algo->update_fn(&ctx, obuf, header_len);
+	format_object_header_hash(the_hash_algo, &ctx, OBJ_BLOB, size);
 
 	/*
Note: idx is non-NULL when we are writing */
 	if ((flags & HASH_WRITE_OBJECT) != 0)

From patchwork Fri Oct 6 22:01:53 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Taylor Blau
X-Patchwork-Id: 13412047
Date: Fri, 6 Oct 2023 18:01:53 -0400
From: Taylor Blau
To: git@vger.kernel.org
Cc: Elijah Newren, "Eric W. Biederman", Jeff King, Junio C Hamano
Subject: [PATCH 2/7] bulk-checkin: factor out `prepare_checkpoint()`
Message-ID: <9cc1f3014abe7fec997a99b6ac93d8ebb5455fa6.1696629697.git.me@ttaylorr.com>

In a similar spirit as the previous commit, factor out the routine to
prepare streaming into a bulk-checkin pack into its own function. Unlike
the previous patch, this is a verbatim copy and paste.

Signed-off-by: Taylor Blau
---
 bulk-checkin.c | 20 ++++++++++++++------
 1 file changed, 14 insertions(+), 6 deletions(-)

diff --git a/bulk-checkin.c b/bulk-checkin.c
index 0aac3dfe31..377c41f3ad 100644
--- a/bulk-checkin.c
+++ b/bulk-checkin.c
@@ -260,6 +260,19 @@ static void format_object_header_hash(const struct git_hash_algo *algop,
 	algop->update_fn(ctx, header, header_len);
 }
 
+static void prepare_checkpoint(struct bulk_checkin_packfile *state,
+			       struct hashfile_checkpoint *checkpoint,
+			       struct pack_idx_entry *idx,
+			       unsigned flags)
+{
+	prepare_to_stream(state, flags);
+	if (idx) {
+		hashfile_checkpoint(state->f, checkpoint);
+		idx->offset = state->offset;
+		crc32_begin(state->f);
+	}
+}
+
 static int deflate_blob_to_pack(struct bulk_checkin_packfile *state,
 				struct object_id *result_oid,
 				int fd, size_t size,
@@ -283,12 +296,7 @@ static int deflate_blob_to_pack(struct bulk_checkin_packfile *state,
 	already_hashed_to = 0;
 
 	while (1) {
-		prepare_to_stream(state, flags);
-		if (idx) {
-			hashfile_checkpoint(state->f, &checkpoint);
-			idx->offset = state->offset;
-			crc32_begin(state->f);
-		}
+		prepare_checkpoint(state, &checkpoint, idx, flags);
 		if (!stream_blob_to_pack(state, &ctx, &already_hashed_to,
 					 fd, size, path, flags))
 			break;

From patchwork Fri Oct 6 22:01:55 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Taylor Blau
X-Patchwork-Id: 13412048
Date: Fri, 6 Oct 2023 18:01:55 -0400
From: Taylor Blau
To: git@vger.kernel.org
Cc: Elijah Newren, "Eric W. Biederman", Jeff King, Junio C Hamano
Subject: [PATCH 3/7] bulk-checkin: factor out `truncate_checkpoint()`

In a similar spirit as previous commits, factor out the routine to
truncate a bulk-checkin packfile when writing past the pack size limit.

Signed-off-by: Taylor Blau
---
 bulk-checkin.c | 27 +++++++++++++++++----------
 1 file changed, 17 insertions(+), 10 deletions(-)

diff --git a/bulk-checkin.c b/bulk-checkin.c
index 377c41f3ad..2dae8be461 100644
--- a/bulk-checkin.c
+++ b/bulk-checkin.c
@@ -273,6 +273,22 @@ static void prepare_checkpoint(struct bulk_checkin_packfile *state,
 	}
 }
 
+static void truncate_checkpoint(struct bulk_checkin_packfile *state,
+				struct hashfile_checkpoint *checkpoint,
+				struct pack_idx_entry *idx)
+{
+	/*
+	 * Writing this object to the current pack will make
+	 * it too big; we need to truncate it, start a new
+	 * pack, and write into it.
+	 */
+	if (!idx)
+		BUG("should not happen");
+	hashfile_truncate(state->f, checkpoint);
+	state->offset = checkpoint->offset;
+	flush_bulk_checkin_packfile(state);
+}
+
 static int deflate_blob_to_pack(struct bulk_checkin_packfile *state,
 				struct object_id *result_oid,
 				int fd, size_t size,
@@ -300,16 +316,7 @@ static int deflate_blob_to_pack(struct bulk_checkin_packfile *state,
 		if (!stream_blob_to_pack(state, &ctx, &already_hashed_to,
 					 fd, size, path, flags))
 			break;
-		/*
-		 * Writing this object to the current pack will make
-		 * it too big; we need to truncate it, start a new
-		 * pack, and write into it.
-		 */
-		if (!idx)
-			BUG("should not happen");
-		hashfile_truncate(state->f, &checkpoint);
-		state->offset = checkpoint.offset;
-		flush_bulk_checkin_packfile(state);
+		truncate_checkpoint(state, &checkpoint, idx);
 		if (lseek(fd, seekback, SEEK_SET) == (off_t) -1)
 			return error("cannot seek back");
 	}

From patchwork Fri Oct 6 22:01:58 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Taylor Blau
X-Patchwork-Id: 13412049
Date: Fri, 6 Oct 2023 18:01:58 -0400
From: Taylor Blau
To: git@vger.kernel.org
Cc: Elijah Newren, "Eric W. Biederman", Jeff King, Junio C Hamano
Subject: [PATCH 4/7] bulk-checkin: factor out `finalize_checkpoint()`
Message-ID: <9c6ca564adf297e77e3304ef06692b8c82cddfd6.1696629697.git.me@ttaylorr.com>

In a similar spirit as previous commits, factor out the routine to
finalize the just-written object from the bulk-checkin mechanism.

Signed-off-by: Taylor Blau
---
 bulk-checkin.c | 41 +++++++++++++++++++++++++----------------
 1 file changed, 25 insertions(+), 16 deletions(-)

diff --git a/bulk-checkin.c b/bulk-checkin.c
index 2dae8be461..a9497fcb28 100644
--- a/bulk-checkin.c
+++ b/bulk-checkin.c
@@ -289,6 +289,30 @@ static void truncate_checkpoint(struct bulk_checkin_packfile *state,
 	flush_bulk_checkin_packfile(state);
 }
 
+static void finalize_checkpoint(struct bulk_checkin_packfile *state,
+				git_hash_ctx *ctx,
+				struct hashfile_checkpoint *checkpoint,
+				struct pack_idx_entry *idx,
+				struct object_id *result_oid)
+{
+	the_hash_algo->final_oid_fn(result_oid, ctx);
+	if (!idx)
+		return;
+
+	idx->crc32 = crc32_end(state->f);
+	if (already_written(state, result_oid)) {
+		hashfile_truncate(state->f, checkpoint);
+		state->offset = checkpoint->offset;
+		free(idx);
+	} else {
+		oidcpy(&idx->oid, result_oid);
+		ALLOC_GROW(state->written,
+			   state->nr_written + 1,
+			   state->alloc_written);
+		state->written[state->nr_written++] = idx;
+	}
+}
+
 static int deflate_blob_to_pack(struct bulk_checkin_packfile *state,
 				struct object_id *result_oid,
 				int fd, size_t size,
@@ -320,22 +344,7 @@ static int deflate_blob_to_pack(struct
bulk_checkin_packfile *state,
 		if (lseek(fd, seekback, SEEK_SET) == (off_t) -1)
 			return error("cannot seek back");
 	}
-	the_hash_algo->final_oid_fn(result_oid, &ctx);
-	if (!idx)
-		return 0;
-
-	idx->crc32 = crc32_end(state->f);
-	if (already_written(state, result_oid)) {
-		hashfile_truncate(state->f, &checkpoint);
-		state->offset = checkpoint.offset;
-		free(idx);
-	} else {
-		oidcpy(&idx->oid, result_oid);
-		ALLOC_GROW(state->written,
-			   state->nr_written + 1,
-			   state->alloc_written);
-		state->written[state->nr_written++] = idx;
-	}
+	finalize_checkpoint(state, &ctx, &checkpoint, idx, result_oid);
 	return 0;
 }
 

From patchwork Fri Oct 6 22:02:01 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Taylor Blau
X-Patchwork-Id: 13412050
Date: Fri, 6 Oct 2023 18:02:01 -0400
From: Taylor Blau
To: git@vger.kernel.org
Cc: Elijah Newren, "Eric W. Biederman", Jeff King, Junio C Hamano
Subject: [PATCH 5/7] bulk-checkin: introduce `index_blob_bulk_checkin_incore()`
Message-ID: <30ca7334c7605a81b9a6bbb386627e436bf8ab33.1696629697.git.me@ttaylorr.com>

Now that we have factored out many of the common routines necessary to
index a new object into a pack created by the bulk-checkin machinery, we
can introduce a variant of `index_blob_bulk_checkin()` that acts on
blobs whose contents we can fit in memory.

This will be useful in a couple more commits in order to provide the
`merge-tree` builtin with a mechanism to create a new pack containing
any objects it created during the merge, instead of storing those
objects individually as loose.

Similar to the existing `index_blob_bulk_checkin()` function, the
entrypoint delegates to `deflate_blob_to_pack_incore()`, which is
responsible for formatting the pack header and then deflating the
contents into the pack. The latter is accomplished by calling
`deflate_obj_contents_to_pack_incore()`, which takes advantage of the
earlier refactoring and is responsible for writing the object to the
pack and handling any overage from pack.packSizeLimit.

The bulk of the new functionality is implemented in the function
`stream_obj_to_pack_incore()`, which is a generic implementation for
writing objects of arbitrary type (whose contents we can fit in-core)
into a bulk-checkin pack.

The new function shares an unfortunate degree of similarity with the
existing `stream_blob_to_pack()` function. But DRY-ing up these two
would likely be more trouble than it's worth, since the latter has to
deal with reading and writing the contents of the object.

Consistent with the rest of the bulk-checkin mechanism, there are no
direct tests here. In future commits when we expose this new
functionality via the `merge-tree` builtin, we will test it indirectly
there.

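The checkpoint, truncate and retry sequence that handles the
pack.packSizeLimit overage can be sketched with a toy model that tracks
only byte counts. This is a simplified illustration, not the code in
this patch: all `toy_` names and the `TOY_CHUNK` size are hypothetical,
no data is written, and the pack header that a fresh pack would start
with is ignored.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_CHUNK 4	/* hypothetical output-buffer size, in bytes */

/* Toy stand-in for struct bulk_checkin_packfile: sizes only. */
struct toy_pack {
	size_t offset;		/* bytes written to the current pack */
	size_t nr_written;	/* objects stored in the current pack */
	size_t limit;		/* size limit; 0 means "no limit" */
	size_t packs_flushed;	/* number of packs finished so far */
};

/*
 * "Stream" len bytes in chunks. As in the real loop, the limit is only
 * enforced once the pack already holds an object, so a single object
 * larger than the limit still gets written into a pack of its own.
 */
static int toy_stream(struct toy_pack *p, size_t len)
{
	while (len) {
		size_t chunk = len < TOY_CHUNK ? len : TOY_CHUNK;

		if (p->nr_written && p->limit &&
		    p->limit < p->offset + chunk)
			return -1;	/* would bust the size limit */
		p->offset += chunk;
		len -= chunk;
	}
	return 0;
}

/* Finish the current pack and start an empty one. */
static void toy_flush(struct toy_pack *p)
{
	if (p->nr_written)
		p->packs_flushed++;
	p->offset = 0;
	p->nr_written = 0;
}

static void toy_write_object(struct toy_pack *p, size_t len)
{
	assert(p);
	for (;;) {
		size_t checkpoint = p->offset;	/* cf. prepare_checkpoint() */

		if (!toy_stream(p, len))
			break;
		p->offset = checkpoint;		/* cf. truncate_checkpoint() */
		toy_flush(p);
	}
	p->nr_written++;			/* cf. finalize_checkpoint() */
}
```

With a limit of 10 bytes, writing two 6-byte objects leaves the first
object alone in a flushed pack: the second write is rolled back to the
checkpoint after its first chunk and replayed into a fresh pack. The
retry terminates because the limit is skipped while the new pack is
still empty.
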
Signed-off-by: Taylor Blau
---
 bulk-checkin.c | 116 +++++++++++++++++++++++++++++++++++++++++++++++++
 bulk-checkin.h |   4 ++
 2 files changed, 120 insertions(+)

diff --git a/bulk-checkin.c b/bulk-checkin.c
index a9497fcb28..319921efe7 100644
--- a/bulk-checkin.c
+++ b/bulk-checkin.c
@@ -140,6 +140,69 @@ static int already_written(struct bulk_checkin_packfile *state, struct object_id
 	return 0;
 }
 
+static int stream_obj_to_pack_incore(struct bulk_checkin_packfile *state,
+				     git_hash_ctx *ctx,
+				     off_t *already_hashed_to,
+				     const void *buf, size_t size,
+				     enum object_type type,
+				     const char *path, unsigned flags)
+{
+	git_zstream s;
+	unsigned char obuf[16384];
+	unsigned hdrlen;
+	int status = Z_OK;
+	int write_object = (flags & HASH_WRITE_OBJECT);
+
+	git_deflate_init(&s, pack_compression_level);
+
+	hdrlen = encode_in_pack_object_header(obuf, sizeof(obuf), type, size);
+	s.next_out = obuf + hdrlen;
+	s.avail_out = sizeof(obuf) - hdrlen;
+
+	if (*already_hashed_to < size) {
+		size_t hsize = size - *already_hashed_to;
+		if (hsize) {
+			the_hash_algo->update_fn(ctx, buf, hsize);
+		}
+		*already_hashed_to = size;
+	}
+	s.next_in = (void *)buf;
+	s.avail_in = size;
+
+	while (status != Z_STREAM_END) {
+		status = git_deflate(&s, Z_FINISH);
+		if (!s.avail_out || status == Z_STREAM_END) {
+			if (write_object) {
+				size_t written = s.next_out - obuf;
+
+				/* would we bust the size limit? */
+				if (state->nr_written &&
+				    pack_size_limit_cfg &&
+				    pack_size_limit_cfg < state->offset + written) {
+					git_deflate_abort(&s);
+					return -1;
+				}
+
+				hashwrite(state->f, obuf, written);
+				state->offset += written;
+			}
+			s.next_out = obuf;
+			s.avail_out = sizeof(obuf);
+		}
+
+		switch (status) {
+		case Z_OK:
+		case Z_BUF_ERROR:
+		case Z_STREAM_END:
+			continue;
+		default:
+			die("unexpected deflate failure: %d", status);
+		}
+	}
+	git_deflate_end(&s);
+	return 0;
+}
+
 /*
  * Read the contents from fd for size bytes, streaming it to the
  * packfile in state while updating the hash in ctx. Signal a failure
@@ -313,6 +376,48 @@ static void finalize_checkpoint(struct bulk_checkin_packfile *state,
 	}
 }
 
+static int deflate_obj_contents_to_pack_incore(struct bulk_checkin_packfile *state,
+					       git_hash_ctx *ctx,
+					       struct object_id *result_oid,
+					       const void *buf, size_t size,
+					       enum object_type type,
+					       const char *path, unsigned flags)
+{
+	struct hashfile_checkpoint checkpoint = {0};
+	struct pack_idx_entry *idx = NULL;
+	off_t already_hashed_to = 0;
+
+	/* Note: idx is non-NULL when we are writing */
+	if (flags & HASH_WRITE_OBJECT)
+		CALLOC_ARRAY(idx, 1);
+
+	while (1) {
+		prepare_checkpoint(state, &checkpoint, idx, flags);
+		if (!stream_obj_to_pack_incore(state, ctx, &already_hashed_to,
+					       buf, size, type, path, flags))
+			break;
+		truncate_checkpoint(state, &checkpoint, idx);
+	}
+
+	finalize_checkpoint(state, ctx, &checkpoint, idx, result_oid);
+
+	return 0;
+}
+
+static int deflate_blob_to_pack_incore(struct bulk_checkin_packfile *state,
+				       struct object_id *result_oid,
+				       const void *buf, size_t size,
+				       const char *path, unsigned flags)
+{
+	git_hash_ctx ctx;
+
+	format_object_header_hash(the_hash_algo, &ctx, OBJ_BLOB, size);
+
+	return deflate_obj_contents_to_pack_incore(state, &ctx, result_oid,
+						   buf, size, OBJ_BLOB, path,
+						   flags);
+}
+
 static int deflate_blob_to_pack(struct bulk_checkin_packfile *state,
 				struct object_id *result_oid,
 				int fd, size_t size,
@@ -392,6 +497,17 @@ int index_blob_bulk_checkin(struct object_id *oid,
 	return status;
 }
 
+int index_blob_bulk_checkin_incore(struct object_id *oid,
+				   const void *buf, size_t size,
+				   const char *path, unsigned flags)
+{
+	int status = deflate_blob_to_pack_incore(&bulk_checkin_packfile, oid,
+						 buf, size, path, flags);
+	if (!odb_transaction_nesting)
+		flush_bulk_checkin_packfile(&bulk_checkin_packfile);
+	return status;
+}
+
 void begin_odb_transaction(void)
 {
 	odb_transaction_nesting += 1;
diff --git a/bulk-checkin.h b/bulk-checkin.h
index aa7286a7b3..1b91daeaee 100644
--- a/bulk-checkin.h
+++
b/bulk-checkin.h @@ -13,6 +13,10 @@ int index_blob_bulk_checkin(struct object_id *oid, int fd, size_t size, const char *path, unsigned flags); +int index_blob_bulk_checkin_incore(struct object_id *oid, + const void *buf, size_t size, + const char *path, unsigned flags); + /* * Tell the object database to optimize for adding * multiple objects. end_odb_transaction must be called From patchwork Fri Oct 6 22:02:04 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Taylor Blau X-Patchwork-Id: 13412051 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 1A9C6E94133 for ; Fri, 6 Oct 2023 22:02:21 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S233774AbjJFWCT (ORCPT ); Fri, 6 Oct 2023 18:02:19 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:39732 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S233677AbjJFWCI (ORCPT ); Fri, 6 Oct 2023 18:02:08 -0400 Received: from mail-qv1-xf2b.google.com (mail-qv1-xf2b.google.com [IPv6:2607:f8b0:4864:20::f2b]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 86487C6 for ; Fri, 6 Oct 2023 15:02:06 -0700 (PDT) Received: by mail-qv1-xf2b.google.com with SMTP id 6a1803df08f44-65afac36b2cso14129586d6.3 for ; Fri, 06 Oct 2023 15:02:06 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ttaylorr-com.20230601.gappssmtp.com; s=20230601; t=1696629725; x=1697234525; darn=vger.kernel.org; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:from:to:cc:subject:date:message-id:reply-to; bh=kXgj0ajSWXQjnEa7XOp0OEDTb8UQQh3uAxx9x3ixfEI=; b=NirsXLMbYnuwR1Akd54UZcJ6sTfPgjaEUqyxg33F5yV1i1TQUTRmQ7bwD2FshEnE7H 
dB9+ALlDvE4v3mda1eXgNX5+aidjTQ/c/sn5ZC3lYsOsgJFsHiSRoOIg5SkDdoEMb5xO pMcW+1nUgxiUrRRswEABZ8izEHVtLTSX7RZgpNmHi8Wepkjo1MoGjOR7c9S1GMRG/Tfq frbwFNrwYvIPh0v9yM4MB4JBhfzMBX6IkSsBapEmyZi6Q99HlQryv/Alz0p1Q0lCLvM6 ObKRfGK5vq7HgEgnZrkpcqR6OaeB2nDNHduIgUvAS889iwrQ/o3W7vut7wIC3gA5nT4Y WpCw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1696629725; x=1697234525; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:x-gm-message-state:from:to:cc:subject:date :message-id:reply-to; bh=kXgj0ajSWXQjnEa7XOp0OEDTb8UQQh3uAxx9x3ixfEI=; b=NlUQMRot9pxMtyklqHY2VKrOuQcKTSxdwGlgfd8jMxtYEzw8W6PyM1UtFJVSoVkpaS Z/9tfai7aiIk3DS2PecmGxM9JVNAW4nCP0+rHYjlIK8h0v09rz4tTSy6w5cUMakumCDK HpQvC2cVpSCQ+flcTUtR0+Fh+d9IWbEg3fzMGaO3QuVdQr8ziv/2HbCuoPcNeAUUY6Ix oGiNdQ9PADkF4xaiQvJHCrgDZ3DUtP+/lxkL22NcvTGyVjPJEbpOsIg/wPZjQf36mhT+ Wds4X0Yqz9cV57BBWUbcnaP8GkVagUWUmbvtoZGKyaNiSP6uMUAM9ZjgEUrlLPcjdXAp Ta2A== X-Gm-Message-State: AOJu0Yz0n9KoYpDcVckUJp2a5SdJSzTW5kFQIIoCB+dRBMx/OOddWP7h PJXznZucYDFUi07DDsh5y46A4Z+KEGpGx5sTUNMRRA== X-Google-Smtp-Source: AGHT+IH21SJwmBeuH153MpqmiqrXfClcfQHJSIDAM3VZ74hrqhgvoEJgZAl3HOvMQ35NdDJTnAt7QA== X-Received: by 2002:a0c:e184:0:b0:641:8d17:96fd with SMTP id p4-20020a0ce184000000b006418d1796fdmr9564222qvl.41.1696629725490; Fri, 06 Oct 2023 15:02:05 -0700 (PDT) Received: from localhost (104-178-186-189.lightspeed.milwwi.sbcglobal.net. [104.178.186.189]) by smtp.gmail.com with ESMTPSA id c29-20020a0ca9dd000000b0065af24495easm1744157qvb.51.2023.10.06.15.02.05 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Fri, 06 Oct 2023 15:02:05 -0700 (PDT) Date: Fri, 6 Oct 2023 18:02:04 -0400 From: Taylor Blau To: git@vger.kernel.org Cc: Elijah Newren , "Eric W. 
Biederman" , Jeff King , Junio C Hamano Subject: [PATCH 6/7] bulk-checkin: introduce `index_tree_bulk_checkin_incore()` Message-ID: References: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org The remaining missing piece in order to teach the `merge-tree` builtin how to write the contents of a merge into a pack is a function to index tree objects into a bulk-checkin pack. This patch implements that missing piece, which is a thin wrapper around all of the functionality introduced in previous commits. If and when Git gains support for a "compatibility" hash algorithm, the changes to support that here will be minimal. The bulk-checkin machinery will need to convert the incoming tree to compute its length under the compatibility hash, necessary to reconstruct its header. With that information (and the converted contents of the tree), the bulk-checkin machinery will have enough to keep track of the converted object's hash in order to update the compatibility mapping. Within `deflate_tree_to_pack_incore()`, the changes should be limited to something like: if (the_repository->compat_hash_algo) { struct strbuf converted = STRBUF_INIT; if (convert_object_file(&compat_obj, the_repository->hash_algo, the_repository->compat_hash_algo, ...) < 0) die(...); format_object_header_hash(the_repository->compat_hash_algo, OBJ_TREE, size); strbuf_release(&converted); } , assuming related changes throughout the rest of the bulk-checkin machinery necessary to update the hash of the converted object, which are likewise minimal in size. 
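
As background for the header that `format_object_header_hash()` feeds into the hash context: a Git object ID is the hash of `"<type> <size>"`, a NUL byte, and then the object's contents. The block below is a minimal standalone sketch of that header layout only. `sketch_format_header()` is a hypothetical name used for illustration, not the function from this series, and the hashing step is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical helper (not part of Git): write "<type> <size>" plus the
 * terminating NUL into `out` and return the header length, counting
 * the NUL, because the NUL is hashed along with the rest of the header.
 */
static size_t sketch_format_header(char *out, size_t cap,
				   const char *type, size_t size)
{
	int n = snprintf(out, cap, "%s %zu", type, size);

	/* The caller must supply a buffer large enough for the header. */
	assert(n >= 0 && (size_t)n < cap);
	return (size_t)n + 1;
}
```

Under a compatibility hash the converted tree generally has a different length, which is why the message above says the size must be recomputed before the header can be rebuilt.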
Signed-off-by: Taylor Blau
---
 bulk-checkin.c | 25 +++++++++++++++++++++++++
 bulk-checkin.h |  4 ++++
 2 files changed, 29 insertions(+)

diff --git a/bulk-checkin.c b/bulk-checkin.c
index 319921efe7..d7d46f1dac 100644
--- a/bulk-checkin.c
+++ b/bulk-checkin.c
@@ -418,6 +418,20 @@ static int deflate_blob_to_pack_incore(struct bulk_checkin_packfile *state,
 						   flags);
 }
 
+static int deflate_tree_to_pack_incore(struct bulk_checkin_packfile *state,
+				       struct object_id *result_oid,
+				       const void *buf, size_t size,
+				       const char *path, unsigned flags)
+{
+	git_hash_ctx ctx;
+
+	format_object_header_hash(the_hash_algo, &ctx, OBJ_TREE, size);
+
+	return deflate_obj_contents_to_pack_incore(state, &ctx, result_oid,
+						   buf, size, OBJ_TREE, path,
+						   flags);
+}
+
 static int deflate_blob_to_pack(struct bulk_checkin_packfile *state,
 				struct object_id *result_oid,
 				int fd, size_t size,
@@ -508,6 +522,17 @@ int index_blob_bulk_checkin_incore(struct object_id *oid,
 	return status;
 }
 
+int index_tree_bulk_checkin_incore(struct object_id *oid,
+				   const void *buf, size_t size,
+				   const char *path, unsigned flags)
+{
+	int status = deflate_tree_to_pack_incore(&bulk_checkin_packfile, oid,
+						 buf, size, path, flags);
+	if (!odb_transaction_nesting)
+		flush_bulk_checkin_packfile(&bulk_checkin_packfile);
+	return status;
+}
+
 void begin_odb_transaction(void)
 {
 	odb_transaction_nesting += 1;
diff --git a/bulk-checkin.h b/bulk-checkin.h
index 1b91daeaee..89786b3954 100644
--- a/bulk-checkin.h
+++ b/bulk-checkin.h
@@ -17,6 +17,10 @@ int index_blob_bulk_checkin_incore(struct object_id *oid,
 				   const void *buf, size_t size,
 				   const char *path, unsigned flags);
 
+int index_tree_bulk_checkin_incore(struct object_id *oid,
+				   const void *buf, size_t size,
+				   const char *path, unsigned flags);
+
 /*
  * Tell the object database to optimize for adding
  * multiple objects. end_odb_transaction must be called

From patchwork Fri Oct 6 22:02:07 2023
X-Patchwork-Submitter: Taylor Blau
X-Patchwork-Id: 13412052
Date: Fri, 6 Oct 2023 18:02:07 -0400
From: Taylor Blau
To: git@vger.kernel.org
Cc: Elijah Newren, "Eric W. Biederman", Jeff King, Junio C Hamano
Subject: [PATCH 7/7] builtin/merge-tree.c: implement support for `--write-pack`
X-Mailing-List: git@vger.kernel.org

When using merge-tree often within a repository[^1], it is possible to
generate a relatively large number of loose objects, which can result in
degraded performance, and inode exhaustion in extreme cases.

Building on the functionality introduced in previous commits, the
bulk-checkin machinery now has support to write arbitrary blob and tree
objects which are small enough to be held in-core. We can use this to
write any blob/tree objects generated by ORT into a separate pack
instead of writing them out individually as loose.

This functionality is gated behind a new `--write-pack` option to
`merge-tree` that works with the (non-deprecated) `--write-tree` mode.

The implementation is relatively straightforward. There are two spots
within the ORT mechanism where we call `write_object_file()`, one for
content differences within blobs, and another to assemble any new trees
necessary to construct the merge. In each of those locations,
conditionally replace calls to `write_object_file()` with
`index_blob_bulk_checkin_incore()` or `index_tree_bulk_checkin_incore()`
depending on which kind of object we are writing.

The only remaining task is to begin and end the transaction necessary to
initialize the bulk-checkin machinery, and move any new pack(s) it
created into the main object store.

[^1]: Such is the case at GitHub, where we run presumptive "test merges"
  on open pull requests to see whether or not we can light up the merge
  button green depending on whether or not the presumptive merge was
  conflicted.

  This is done in response to a number of user-initiated events,
  including viewing an open pull request whose last test merge is stale
  with respect to the current base and tip of the pull request. As a
  result, merge-tree can be run very frequently on large, active
  repositories.

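
The interaction between the transaction and the `*_incore()` writers described above can be modelled in a few lines. This is only a sketch: every `sketch_*` name and the three counters are hypothetical stand-ins rather than Git's code. It also assumes that `end_odb_transaction()` flushes once the nesting count returns to zero, which the hunks in this series do not show.

```c
#include <assert.h>

/* Hypothetical stand-ins for the bulk-checkin state; not Git's own code. */
static int nesting;		/* models odb_transaction_nesting */
static int pending;		/* objects sitting in the not-yet-flushed pack */
static int packs_flushed;	/* packs moved into the main object store */

static void sketch_flush(void)
{
	if (!pending)
		return;	/* nothing buffered, so there is no pack to finalize */
	pending = 0;
	packs_flushed++;
}

static void sketch_begin_transaction(void)
{
	nesting += 1;
}

/* Assumption: the flush happens only when the outermost transaction ends. */
static void sketch_end_transaction(void)
{
	assert(nesting > 0);
	nesting -= 1;
	if (!nesting)
		sketch_flush();
}

/* Models index_{blob,tree}_bulk_checkin_incore(): flush only outside a transaction. */
static void sketch_index_object(void)
{
	pending++;
	if (!nesting)
		sketch_flush();
}
```

In this model, objects indexed between begin and end accumulate in a single pack, whereas the same calls made outside a transaction would each be flushed on their own. That is the reason `merge_start()` opens the transaction when `--write-pack` is given.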
Signed-off-by: Taylor Blau
---
 Documentation/git-merge-tree.txt |  4 ++
 builtin/merge-tree.c             |  5 ++
 merge-ort.c                      | 43 +++++++++++----
 merge-recursive.h                |  1 +
 t/t4301-merge-tree-write-tree.sh | 93 ++++++++++++++++++++++++++++++++
 5 files changed, 136 insertions(+), 10 deletions(-)

diff --git a/Documentation/git-merge-tree.txt b/Documentation/git-merge-tree.txt
index ffc4fbf7e8..9d37609ef1 100644
--- a/Documentation/git-merge-tree.txt
+++ b/Documentation/git-merge-tree.txt
@@ -69,6 +69,10 @@ OPTIONS
 	specify a merge-base for the merge, and specifying multiple bases is
 	currently not supported. This option is incompatible with `--stdin`.
 
+--write-pack::
+	Write any new objects into a separate packfile instead of as
+	individual loose objects.
+
 [[OUTPUT]]
 OUTPUT
 ------
diff --git a/builtin/merge-tree.c b/builtin/merge-tree.c
index 0de42aecf4..672ebd4c54 100644
--- a/builtin/merge-tree.c
+++ b/builtin/merge-tree.c
@@ -18,6 +18,7 @@
 #include "quote.h"
 #include "tree.h"
 #include "config.h"
+#include "bulk-checkin.h"
 
 static int line_termination = '\n';
 
@@ -414,6 +415,7 @@ struct merge_tree_options {
 	int show_messages;
 	int name_only;
 	int use_stdin;
+	int write_pack;
 };
 
 static int real_merge(struct merge_tree_options *o,
@@ -440,6 +442,7 @@ static int real_merge(struct merge_tree_options *o,
 
 	init_merge_options(&opt, the_repository);
 	opt.show_rename_progress = 0;
+	opt.write_pack = o->write_pack;
 
 	opt.branch1 = branch1;
 	opt.branch2 = branch2;
@@ -548,6 +551,8 @@ int cmd_merge_tree(int argc, const char **argv, const char *prefix)
 			   &merge_base,
 			   N_("commit"),
 			   N_("specify a merge-base for the merge")),
+		OPT_BOOL(0, "write-pack", &o.write_pack,
+			 N_("write new objects to a pack instead of as loose")),
 		OPT_END()
 	};
 
diff --git a/merge-ort.c b/merge-ort.c
index 8631c99700..85d8c5c6b3 100644
--- a/merge-ort.c
+++ b/merge-ort.c
@@ -48,6 +48,7 @@
 #include "tree.h"
 #include "unpack-trees.h"
 #include "xdiff-interface.h"
+#include "bulk-checkin.h"
 
 /*
  * We have many arrays of size 3. Whenever we have such an array, the
@@ -2124,11 +2125,19 @@ static int handle_content_merge(struct merge_options *opt,
 		if ((merge_status < 0) || !result_buf.ptr)
 			ret = err(opt, _("Failed to execute internal merge"));
 
-		if (!ret &&
-		    write_object_file(result_buf.ptr, result_buf.size,
-				      OBJ_BLOB, &result->oid))
-			ret = err(opt, _("Unable to add %s to database"),
-				  path);
+		if (!ret) {
+			ret = opt->write_pack
+				? index_blob_bulk_checkin_incore(&result->oid,
+								 result_buf.ptr,
+								 result_buf.size,
+								 path, 1)
+				: write_object_file(result_buf.ptr,
+						    result_buf.size,
+						    OBJ_BLOB, &result->oid);
+			if (ret)
+				ret = err(opt, _("Unable to add %s to database"),
+					  path);
+		}
 
 		free(result_buf.ptr);
 		if (ret)
@@ -3618,7 +3627,8 @@ static int tree_entry_order(const void *a_, const void *b_)
 				 b->string, strlen(b->string), bmi->result.mode);
 }
 
-static int write_tree(struct object_id *result_oid,
+static int write_tree(struct merge_options *opt,
+		      struct object_id *result_oid,
 		      struct string_list *versions,
 		      unsigned int offset,
 		      size_t hash_size)
@@ -3652,8 +3662,14 @@ static int write_tree(struct object_id *result_oid,
 	}
 
 	/* Write this object file out, and record in result_oid */
-	if (write_object_file(buf.buf, buf.len, OBJ_TREE, result_oid))
+	ret = opt->write_pack
+		? index_tree_bulk_checkin_incore(result_oid,
+						 buf.buf, buf.len, "", 1)
+		: write_object_file(buf.buf, buf.len, OBJ_TREE, result_oid);
+
+	if (ret)
 		ret = -1;
+
 	strbuf_release(&buf);
 	return ret;
 }
@@ -3818,8 +3834,8 @@ static int write_completed_directory(struct merge_options *opt,
 		 */
 		dir_info->is_null = 0;
 		dir_info->result.mode = S_IFDIR;
-		if (write_tree(&dir_info->result.oid, &info->versions, offset,
-			       opt->repo->hash_algo->rawsz) < 0)
+		if (write_tree(opt, &dir_info->result.oid, &info->versions,
+			       offset, opt->repo->hash_algo->rawsz) < 0)
 			ret = -1;
 	}
 
@@ -4353,9 +4369,13 @@ static int process_entries(struct merge_options *opt,
 		fflush(stdout);
 		BUG("dir_metadata accounting completely off; shouldn't happen");
 	}
-	if (write_tree(result_oid, &dir_metadata.versions, 0,
+	if (write_tree(opt, result_oid, &dir_metadata.versions, 0,
 		       opt->repo->hash_algo->rawsz) < 0)
 		ret = -1;
+
+	if (opt->write_pack)
+		end_odb_transaction();
+
 cleanup:
 	string_list_clear(&plist, 0);
 	string_list_clear(&dir_metadata.versions, 0);
@@ -4899,6 +4919,9 @@ static void merge_start(struct merge_options *opt, struct merge_result *result)
 	 */
 	strmap_init(&opt->priv->conflicts);
 
+	if (opt->write_pack)
+		begin_odb_transaction();
+
 	trace2_region_leave("merge", "allocate/init", opt->repo);
 }
 
diff --git a/merge-recursive.h b/merge-recursive.h
index b88000e3c2..156e160876 100644
--- a/merge-recursive.h
+++ b/merge-recursive.h
@@ -48,6 +48,7 @@ struct merge_options {
 	unsigned renormalize : 1;
 	unsigned record_conflict_msgs_as_headers : 1;
 	const char *msg_header_prefix;
+	unsigned write_pack : 1;
 
 	/* internal fields used by the implementation */
 	struct merge_options_internal *priv;
diff --git a/t/t4301-merge-tree-write-tree.sh b/t/t4301-merge-tree-write-tree.sh
index 250f721795..2d81ff4de5 100755
--- a/t/t4301-merge-tree-write-tree.sh
+++ b/t/t4301-merge-tree-write-tree.sh
@@ -922,4 +922,97 @@ test_expect_success 'check the input format when --stdin is passed' '
 	test_cmp expect actual
 '
 
+packdir=".git/objects/pack"
+
+test_expect_success 'merge-tree can pack its result with --write-pack' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+
+	# base has lines [3, 4, 5]
+	#   - side adds to the beginning, resulting in [1, 2, 3, 4, 5]
+	#   - other adds to the end, resulting in [3, 4, 5, 6, 7]
+	#
+	# merging the two should result in a new blob object containing
+	# [1, 2, 3, 4, 5, 6, 7], along with a new tree.
+	test_commit -C repo base file "$(test_seq 3 5)" &&
+	git -C repo branch -M main &&
+	git -C repo checkout -b side main &&
+	test_commit -C repo side file "$(test_seq 1 5)" &&
+	git -C repo checkout -b other main &&
+	test_commit -C repo other file "$(test_seq 3 7)" &&
+
+	find repo/$packdir -type f -name "pack-*.idx" >packs.before &&
+	tree="$(git -C repo merge-tree --write-pack \
+		refs/tags/side refs/tags/other)" &&
+	blob="$(git -C repo rev-parse $tree:file)" &&
+	find repo/$packdir -type f -name "pack-*.idx" >packs.after &&
+
+	test_must_be_empty packs.before &&
+	test_line_count = 1 packs.after &&
+
+	git show-index <$(cat packs.after) >objects &&
+	test_line_count = 2 objects &&
+	grep "^[1-9][0-9]* $tree" objects &&
+	grep "^[1-9][0-9]* $blob" objects
+'
+
+test_expect_success 'merge-tree can write multiple packs with --write-pack' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+	(
+		cd repo &&
+
+		git config pack.packSizeLimit 512 &&
+
+		test_seq 512 >f &&
+
+		# "f" contains roughly 2,000 bytes.
+		#
+		# Each side ("foo" and "bar") adds a small amount of data at the
+		# beginning and end of "base", respectively.
+		git add f &&
+		test_tick &&
+		git commit -m base &&
+		git branch -M main &&
+
+		git checkout -b foo main &&
+		{
+			echo foo && cat f
+		} >f.tmp &&
+		mv f.tmp f &&
+		git add f &&
+		test_tick &&
+		git commit -m foo &&
+
+		git checkout -b bar main &&
+		echo bar >>f &&
+		git add f &&
+		test_tick &&
+		git commit -m bar &&
+
+		find $packdir -type f -name "pack-*.idx" >packs.before &&
+		# Merging either side should result in a new object which is
+		# large relative to the 512-byte pack.packSizeLimit set above,
+		# thus the result should be split into two separate packs.
+		tree="$(git merge-tree --write-pack \
+			refs/heads/foo refs/heads/bar)" &&
+		blob="$(git rev-parse $tree:f)" &&
+		find $packdir -type f -name "pack-*.idx" >packs.after &&
+
+		test_must_be_empty packs.before &&
+		test_line_count = 2 packs.after &&
+		for idx in $(cat packs.after)
+		do
+			git show-index <$idx || return 1
+		done >objects &&
+
+		# The resulting set of packs should contain one copy of both
+		# objects, each in a separate pack.
+		test_line_count = 2 objects &&
+		grep "^[1-9][0-9]* $tree" objects &&
+		grep "^[1-9][0-9]* $blob" objects
+
+	)
+'
+
 test_done