From patchwork Tue Oct 17 16:31:12 2023
X-Patchwork-Submitter: Taylor Blau
X-Patchwork-Id: 13425622
Date: Tue, 17 Oct 2023 12:31:12 -0400
From: Taylor Blau
To: git@vger.kernel.org
Cc: Elijah Newren, "Eric W. Biederman", Jeff King, Junio C Hamano, Patrick Steinhardt
Subject: [PATCH v2 1/7] bulk-checkin: factor out `format_object_header_hash()`

Before deflating a blob into a pack, the bulk-checkin mechanism formats
the object header by calling `format_object_header()` into a scratch
buffer, then feeds that buffer into the hash context used to compute
the object's ID.

Future commits will add support for deflating multiple kinds of objects
into a pack, and will need to perform the same operation.

This is a mostly straightforward extraction, with one notable exception:
instead of hard-coding `the_hash_algo`, pass it in to the new function
as an argument. This isn't strictly necessary for our immediate purposes
here, but will prove useful in the future if/when the bulk-checkin
mechanism grows support for the hash transition plan.
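To illustrate the shape of the extracted helper, here is a minimal, self-contained sketch. It assumes the object header has the form "<type> <size>" followed by a NUL, with the returned length counting that NUL. Every name in it (`struct toy_hash_algo`, `toy_format_object_header()`, `toy_format_object_header_hash()`) is hypothetical. FNV-1a stands in for SHA-1/SHA-256 purely so the example needs no crypto library. The key point is that the algorithm arrives as a parameter rather than through a global like `the_hash_algo`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical stand-in for git's `struct git_hash_algo`: a table of
 * function pointers, so the caller decides which algorithm to use.
 */
struct toy_hash_algo {
	const char *name;
	void (*init_fn)(uint64_t *ctx);
	void (*update_fn)(uint64_t *ctx, const void *data, size_t len);
};

/* 64-bit FNV-1a: a toy substitute for SHA-1/SHA-256, NOT what git uses. */
static void fnv1a_init(uint64_t *ctx)
{
	*ctx = 0xcbf29ce484222325ULL;
}

static void fnv1a_update(uint64_t *ctx, const void *data, size_t len)
{
	const unsigned char *p = data;

	for (size_t i = 0; i < len; i++) {
		*ctx ^= p[i];
		*ctx *= 0x100000001b3ULL;
	}
}

static const struct toy_hash_algo toy_fnv1a = {
	"fnv1a-64", fnv1a_init, fnv1a_update
};

/* Header "<type> <size>" plus NUL; returns its length including the NUL. */
static size_t toy_format_object_header(char *buf, size_t bufsz,
				       const char *type, size_t size)
{
	int n = snprintf(buf, bufsz, "%s %zu", type, size);

	assert(n >= 0 && (size_t)n < bufsz); /* the header must fit */
	return (size_t)n + 1;                /* snprintf wrote the NUL */
}

/* Same shape as the extracted helper: the algorithm is a parameter. */
static void toy_format_object_header_hash(const struct toy_hash_algo *algop,
					  uint64_t *ctx,
					  const char *type, size_t size)
{
	char header[32];
	size_t len = toy_format_object_header(header, sizeof(header), type, size);

	algop->init_fn(ctx);
	algop->update_fn(ctx, header, len);
}
```

Switching to a different `struct toy_hash_algo` requires no change to the helper itself, which is the flexibility the commit message anticipates for the hash transition.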
Signed-off-by: Taylor Blau
---
 bulk-checkin.c | 25 ++++++++++++++++++-------
 1 file changed, 18 insertions(+), 7 deletions(-)

diff --git a/bulk-checkin.c b/bulk-checkin.c
index 6ce62999e5..fd3c110d1c 100644
--- a/bulk-checkin.c
+++ b/bulk-checkin.c
@@ -247,6 +247,22 @@ static void prepare_to_stream(struct bulk_checkin_packfile *state,
 		die_errno("unable to write pack header");
 }
 
+static void format_object_header_hash(const struct git_hash_algo *algop,
+				      git_hash_ctx *ctx,
+				      struct hashfile_checkpoint *checkpoint,
+				      enum object_type type,
+				      size_t size)
+{
+	unsigned char header[16384];
+	unsigned header_len = format_object_header((char *)header,
+						   sizeof(header),
+						   type, size);
+
+	algop->init_fn(ctx);
+	algop->update_fn(ctx, header, header_len);
+	algop->init_fn(&checkpoint->ctx);
+}
+
 static int deflate_blob_to_pack(struct bulk_checkin_packfile *state,
 				struct object_id *result_oid,
 				int fd, size_t size,
@@ -254,8 +270,6 @@ static int deflate_blob_to_pack(struct bulk_checkin_packfile *state,
 {
 	off_t seekback, already_hashed_to;
 	git_hash_ctx ctx;
-	unsigned char obuf[16384];
-	unsigned header_len;
 	struct hashfile_checkpoint checkpoint = {0};
 	struct pack_idx_entry *idx = NULL;
 
@@ -263,11 +277,8 @@ static int deflate_blob_to_pack(struct bulk_checkin_packfile *state,
 	if (seekback == (off_t) -1)
 		return error("cannot find the current offset");
 
-	header_len = format_object_header((char *)obuf, sizeof(obuf),
-					  OBJ_BLOB, size);
-	the_hash_algo->init_fn(&ctx);
-	the_hash_algo->update_fn(&ctx, obuf, header_len);
-	the_hash_algo->init_fn(&checkpoint.ctx);
+	format_object_header_hash(the_hash_algo, &ctx, &checkpoint, OBJ_BLOB,
+				  size);
 
 	/* Note: idx is non-NULL when we are writing */
 	if ((flags & HASH_WRITE_OBJECT) != 0)

From patchwork Tue Oct 17 16:31:15 2023
X-Patchwork-Submitter: Taylor Blau
X-Patchwork-Id: 13425623
Date: Tue, 17 Oct 2023 12:31:15 -0400
From: Taylor Blau
To: git@vger.kernel.org
Cc: Elijah Newren, "Eric W. Biederman", Jeff King, Junio C Hamano, Patrick Steinhardt
Subject: [PATCH v2 2/7] bulk-checkin: factor out `prepare_checkpoint()`

In the same spirit as the previous commit, factor out the routine that
prepares to stream an object into a bulk-checkin pack, moving it into
its own function. Unlike the previous patch, this is a verbatim copy and
paste.
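The checkpoint preparation can be modelled with a small in-memory writer. This is a minimal sketch under stated assumptions. `struct toy_packfile`, `toy_prepare_checkpoint()` and the other names are hypothetical. The sketch does not model `prepare_to_stream()` or git's real `hashfile_checkpoint()`/`crc32_begin()`. It only shows the two pieces of state the `if (idx)` branch records: where the object starts, and a fresh CRC-32 covering just that object's bytes (computed here with the standard reflected polynomial 0xEDB88320).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical in-memory pack writer; models only offset and CRC state. */
struct toy_packfile {
	unsigned char buf[4096];
	size_t offset;     /* bytes written so far */
	uint32_t crc;      /* CRC-32 of bytes written since the checkpoint */
	int crc_active;
};

struct toy_checkpoint {
	size_t offset;     /* where the current object started */
};

/* Standard reflected CRC-32 (polynomial 0xEDB88320), one bit at a time. */
static uint32_t toy_crc32(uint32_t crc, const unsigned char *p, size_t n)
{
	crc = ~crc;
	while (n--) {
		crc ^= *p++;
		for (int k = 0; k < 8; k++)
			crc = (crc & 1) ? (crc >> 1) ^ 0xEDB88320u : crc >> 1;
	}
	return ~crc;
}

static void toy_write(struct toy_packfile *f, const void *data, size_t n)
{
	assert(n <= sizeof(f->buf) - f->offset);
	memcpy(f->buf + f->offset, data, n);
	if (f->crc_active)
		f->crc = toy_crc32(f->crc, data, n);
	f->offset += n;
}

/*
 * Analogue of the `if (idx)` branch of prepare_checkpoint(): remember
 * where the object starts and restart the CRC for its index entry.
 * `idx_offset` plays the role of `idx`; NULL means "not writing".
 */
static void toy_prepare_checkpoint(struct toy_packfile *f,
				   struct toy_checkpoint *cp,
				   size_t *idx_offset)
{
	if (!idx_offset)
		return;
	cp->offset = f->offset;
	*idx_offset = f->offset;
	f->crc = 0;
	f->crc_active = 1;
}

static uint32_t toy_crc32_end(struct toy_packfile *f)
{
	f->crc_active = 0;
	return f->crc;
}
```

Bytes written before the checkpoint (a pack header, for instance) are excluded from the CRC, so each index entry's checksum covers only its own object.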
Signed-off-by: Taylor Blau
---
 bulk-checkin.c | 20 ++++++++++++++------
 1 file changed, 14 insertions(+), 6 deletions(-)

diff --git a/bulk-checkin.c b/bulk-checkin.c
index fd3c110d1c..c1f5450583 100644
--- a/bulk-checkin.c
+++ b/bulk-checkin.c
@@ -263,6 +263,19 @@ static void format_object_header_hash(const struct git_hash_algo *algop,
 	algop->init_fn(&checkpoint->ctx);
 }
 
+static void prepare_checkpoint(struct bulk_checkin_packfile *state,
+			       struct hashfile_checkpoint *checkpoint,
+			       struct pack_idx_entry *idx,
+			       unsigned flags)
+{
+	prepare_to_stream(state, flags);
+	if (idx) {
+		hashfile_checkpoint(state->f, checkpoint);
+		idx->offset = state->offset;
+		crc32_begin(state->f);
+	}
+}
+
 static int deflate_blob_to_pack(struct bulk_checkin_packfile *state,
 				struct object_id *result_oid,
 				int fd, size_t size,
@@ -287,12 +300,7 @@ static int deflate_blob_to_pack(struct bulk_checkin_packfile *state,
 	already_hashed_to = 0;
 
 	while (1) {
-		prepare_to_stream(state, flags);
-		if (idx) {
-			hashfile_checkpoint(state->f, &checkpoint);
-			idx->offset = state->offset;
-			crc32_begin(state->f);
-		}
+		prepare_checkpoint(state, &checkpoint, idx, flags);
 		if (!stream_blob_to_pack(state, &ctx, &already_hashed_to,
 					 fd, size, path, flags))
 			break;

From patchwork Tue Oct 17 16:31:19 2023
X-Patchwork-Submitter: Taylor Blau
X-Patchwork-Id: 13425624
Date: Tue, 17 Oct 2023 12:31:19 -0400
From: Taylor Blau
To: git@vger.kernel.org
Cc: Elijah Newren, "Eric W. Biederman", Jeff King, Junio C Hamano, Patrick Steinhardt
Subject: [PATCH v2 3/7] bulk-checkin: factor out `truncate_checkpoint()`

In the same spirit as previous commits, factor out the routine that
truncates a bulk-checkin packfile when writing past the pack size limit.

Signed-off-by: Taylor Blau
---
 bulk-checkin.c | 27 +++++++++++++++++----------
 1 file changed, 17 insertions(+), 10 deletions(-)

diff --git a/bulk-checkin.c b/bulk-checkin.c
index c1f5450583..b92d7a6f5a 100644
--- a/bulk-checkin.c
+++ b/bulk-checkin.c
@@ -276,6 +276,22 @@ static void prepare_checkpoint(struct bulk_checkin_packfile *state,
 	}
 }
 
+static void truncate_checkpoint(struct bulk_checkin_packfile *state,
+				struct hashfile_checkpoint *checkpoint,
+				struct pack_idx_entry *idx)
+{
+	/*
+	 * Writing this object to the current pack will make
+	 * it too big; we need to truncate it, start a new
+	 * pack, and write into it.
+	 */
+	if (!idx)
+		BUG("should not happen");
+	hashfile_truncate(state->f, checkpoint);
+	state->offset = checkpoint->offset;
+	flush_bulk_checkin_packfile(state);
+}
+
 static int deflate_blob_to_pack(struct bulk_checkin_packfile *state,
 				struct object_id *result_oid,
 				int fd, size_t size,
@@ -304,16 +320,7 @@ static int deflate_blob_to_pack(struct bulk_checkin_packfile *state,
 		if (!stream_blob_to_pack(state, &ctx, &already_hashed_to,
 					 fd, size, path, flags))
 			break;
-		/*
-		 * Writing this object to the current pack will make
-		 * it too big; we need to truncate it, start a new
-		 * pack, and write into it.
-		 */
-		if (!idx)
-			BUG("should not happen");
-		hashfile_truncate(state->f, &checkpoint);
-		state->offset = checkpoint.offset;
-		flush_bulk_checkin_packfile(state);
+		truncate_checkpoint(state, &checkpoint, idx);
 		if (lseek(fd, seekback, SEEK_SET) == (off_t) -1)
 			return error("cannot seek back");
 	}

From patchwork Tue Oct 17 16:31:22 2023
X-Patchwork-Submitter: Taylor Blau
X-Patchwork-Id: 13425625
Date: Tue, 17 Oct 2023 12:31:22 -0400
From: Taylor Blau
To: git@vger.kernel.org
Cc: Elijah Newren, "Eric W. Biederman", Jeff King, Junio C Hamano, Patrick Steinhardt
Subject: [PATCH v2 4/7] bulk-checkin: factor out `finalize_checkpoint()`
Message-ID: <0b855a6eb7f147a9fc4c41dd183b768162345220.1697560266.git.me@ttaylorr.com>

In the same spirit as previous commits, factor out the routine that
finalizes the just-written object in the bulk-checkin mechanism.
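The bookkeeping half of `finalize_checkpoint()` (everything after the final hash is computed) can be sketched in isolation. In this minimal sketch, `struct toy_idx_entry`, `struct toy_state` and `toy_finalize_checkpoint()` are hypothetical. Object IDs are short strings rather than real hashes, and the sketch omits computing the final object ID and the entry's CRC-32. The array growth doubles capacity, which is similar in spirit to, but not the same formula as, git's `ALLOC_GROW`.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical index entry: an object ID (hex string) and its pack offset. */
struct toy_idx_entry {
	char oid[41];
	size_t offset;
};

struct toy_state {
	size_t offset;                  /* current end of the pack */
	struct toy_idx_entry **written; /* entries for objects kept in the pack */
	size_t nr_written, alloc_written;
};

static int toy_already_written(const struct toy_state *s, const char *oid)
{
	for (size_t i = 0; i < s->nr_written; i++)
		if (!strcmp(s->written[i]->oid, oid))
			return 1;
	return 0;
}

/*
 * A duplicate object is rolled back to the checkpoint and its entry
 * freed; a new object's entry is appended to the written list.
 */
static void toy_finalize_checkpoint(struct toy_state *s,
				    size_t checkpoint_offset,
				    struct toy_idx_entry *idx, const char *oid)
{
	if (!idx)
		return; /* not writing: only the object ID was computed */

	if (toy_already_written(s, oid)) {
		s->offset = checkpoint_offset; /* "truncate" the duplicate copy */
		free(idx);
		return;
	}

	strncpy(idx->oid, oid, sizeof(idx->oid) - 1);
	idx->oid[sizeof(idx->oid) - 1] = '\0';
	if (s->nr_written + 1 > s->alloc_written) {
		size_t alloc = s->alloc_written ? 2 * s->alloc_written : 4;
		struct toy_idx_entry **p = realloc(s->written, alloc * sizeof(*p));

		if (!p)
			abort();
		s->written = p;
		s->alloc_written = alloc;
	}
	s->written[s->nr_written++] = idx;
}
```

Writing the same object twice therefore costs only the space until the rollback. The pack offset returns to the checkpoint, so the duplicate bytes are overwritten by whatever is written next.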
Signed-off-by: Taylor Blau
---
 bulk-checkin.c | 41 +++++++++++++++++++++++++----------------
 1 file changed, 25 insertions(+), 16 deletions(-)

diff --git a/bulk-checkin.c b/bulk-checkin.c
index b92d7a6f5a..f4914fb6d1 100644
--- a/bulk-checkin.c
+++ b/bulk-checkin.c
@@ -292,6 +292,30 @@ static void truncate_checkpoint(struct bulk_checkin_packfile *state,
 	flush_bulk_checkin_packfile(state);
 }
 
+static void finalize_checkpoint(struct bulk_checkin_packfile *state,
+				git_hash_ctx *ctx,
+				struct hashfile_checkpoint *checkpoint,
+				struct pack_idx_entry *idx,
+				struct object_id *result_oid)
+{
+	the_hash_algo->final_oid_fn(result_oid, ctx);
+	if (!idx)
+		return;
+
+	idx->crc32 = crc32_end(state->f);
+	if (already_written(state, result_oid)) {
+		hashfile_truncate(state->f, checkpoint);
+		state->offset = checkpoint->offset;
+		free(idx);
+	} else {
+		oidcpy(&idx->oid, result_oid);
+		ALLOC_GROW(state->written,
+			   state->nr_written + 1,
+			   state->alloc_written);
+		state->written[state->nr_written++] = idx;
+	}
+}
+
 static int deflate_blob_to_pack(struct bulk_checkin_packfile *state,
 				struct object_id *result_oid,
 				int fd, size_t size,
@@ -324,22 +348,7 @@ static int deflate_blob_to_pack(struct bulk_checkin_packfile *state,
 		if (lseek(fd, seekback, SEEK_SET) == (off_t) -1)
 			return error("cannot seek back");
 	}
-	the_hash_algo->final_oid_fn(result_oid, &ctx);
-	if (!idx)
-		return 0;
-
-	idx->crc32 = crc32_end(state->f);
-	if (already_written(state, result_oid)) {
-		hashfile_truncate(state->f, &checkpoint);
-		state->offset = checkpoint.offset;
-		free(idx);
-	} else {
-		oidcpy(&idx->oid, result_oid);
-		ALLOC_GROW(state->written,
-			   state->nr_written + 1,
-			   state->alloc_written);
-		state->written[state->nr_written++] = idx;
-	}
+	finalize_checkpoint(state, &ctx, &checkpoint, idx, result_oid);
 	return 0;
 }
 

From patchwork Tue Oct 17 16:31:26 2023
X-Patchwork-Submitter: Taylor Blau
X-Patchwork-Id: 13425626
Date: Tue, 17 Oct 2023 12:31:26 -0400
From: Taylor Blau
To: git@vger.kernel.org
Cc: Elijah Newren, "Eric W. Biederman", Jeff King, Junio C Hamano, Patrick Steinhardt
Subject: [PATCH v2 5/7] bulk-checkin: introduce `index_blob_bulk_checkin_incore()`
Message-ID: <239bf39bfb21ef621a15839bade34446dcbc3103.1697560266.git.me@ttaylorr.com>

Now that we have factored out many of the common routines necessary to
index a new object into a pack created by the bulk-checkin machinery, we
can introduce a variant of `index_blob_bulk_checkin()` that acts on
blobs whose contents we can fit in memory.

This will be useful a couple of commits from now in order to provide the
`merge-tree` builtin with a mechanism to create a new pack containing
any objects it created during the merge, instead of storing those
objects individually as loose.

Similar to the existing `index_blob_bulk_checkin()` function, the new
entry point, `index_blob_bulk_checkin_incore()`, delegates to
`deflate_blob_to_pack_incore()`, which is responsible for formatting and
hashing the object header (via `format_object_header_hash()`) and then
deflating the contents into the pack. The latter is accomplished by
calling `deflate_obj_contents_to_pack_incore()`, which takes advantage
of the earlier refactoring and is responsible for writing the object to
the pack and handling any overage from `pack.packSizeLimit`.

The bulk of the new functionality is implemented in the function
`stream_obj_to_pack_incore()`, which is a generic implementation for
writing objects of arbitrary type (whose contents we can fit in-core)
into a bulk-checkin pack.

The new function shares an unfortunate degree of similarity with the
existing `stream_blob_to_pack()` function. But DRY-ing up these two
would likely be more trouble than it's worth, since the latter has to
read the object's contents from a file descriptor in chunks rather than
from a buffer that is already in memory.

Consistent with the rest of the bulk-checkin mechanism, there are no
direct tests here. In future commits, when we expose this new
functionality via the `merge-tree` builtin, we will test it indirectly
there.
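Two ideas in `stream_obj_to_pack_incore()` and its caller can be shown without zlib: the variable-length in-pack object header that `encode_in_pack_object_header()` writes ahead of the deflated data, and the "would we bust the size limit?" check that drives the retry loop. This is a minimal sketch with every name hypothetical. It assumes the pack entry header layout as described in git's pack-format documentation. The first byte holds a continuation bit, a 3-bit type and the low 4 bits of the size, and each following byte carries 7 more size bits, least significant first. The size-limit model ignores compression and per-chunk writes, treating each object as a single `len`-byte write.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum toy_object_type {
	TOY_OBJ_COMMIT = 1, TOY_OBJ_TREE = 2, TOY_OBJ_BLOB = 3, TOY_OBJ_TAG = 4
};

/* Encode a pack entry header; returns the number of bytes used. */
static size_t toy_encode_pack_header(unsigned char *hdr, size_t hdr_len,
				     enum toy_object_type type, uintmax_t size)
{
	size_t n = 1;
	unsigned char c = (unsigned char)((type << 4) | (size & 15));

	size >>= 4;
	while (size) {
		assert(n < hdr_len);           /* header buffer too small */
		*hdr++ = c | 0x80;             /* more size bytes follow */
		c = (unsigned char)(size & 0x7f);
		size >>= 7;
		n++;
	}
	*hdr = c;
	return n;
}

/* Hypothetical pack state: bytes used and objects already in the pack. */
struct toy_pack {
	size_t offset;
	size_t nr_written;
	size_t packs_started;
};

/* Mirrors the size-limit test in the diff; a limit of 0 means "none". */
static int toy_write_object(struct toy_pack *p, size_t limit, size_t len)
{
	if (p->nr_written && limit && limit < p->offset + len)
		return -1; /* caller truncates and retries in a fresh pack */
	p->offset += len;
	p->nr_written++;
	return 0;
}

/* Retry loop in the style of deflate_obj_contents_to_pack_incore(). */
static void toy_add_object(struct toy_pack *p, size_t limit, size_t len)
{
	while (toy_write_object(p, limit, len)) {
		/* analogue of truncate_checkpoint(): finish this pack, start anew */
		p->offset = 12; /* a pack file begins with a 12-byte header */
		p->nr_written = 0;
		p->packs_started++;
	}
}
```

Because the limit is only enforced when the pack already holds at least one object, the retry loop always terminates. An object larger than the limit simply ends up alone in its own pack.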
Signed-off-by: Taylor Blau
---
 bulk-checkin.c | 118 +++++++++++++++++++++++++++++++++++++++++++++++++
 bulk-checkin.h |   4 ++
 2 files changed, 122 insertions(+)

diff --git a/bulk-checkin.c b/bulk-checkin.c
index f4914fb6d1..25cd1ffa25 100644
--- a/bulk-checkin.c
+++ b/bulk-checkin.c
@@ -140,6 +140,69 @@ static int already_written(struct bulk_checkin_packfile *state, struct object_id
 	return 0;
 }
 
+static int stream_obj_to_pack_incore(struct bulk_checkin_packfile *state,
+				     git_hash_ctx *ctx,
+				     off_t *already_hashed_to,
+				     const void *buf, size_t size,
+				     enum object_type type,
+				     const char *path, unsigned flags)
+{
+	git_zstream s;
+	unsigned char obuf[16384];
+	unsigned hdrlen;
+	int status = Z_OK;
+	int write_object = (flags & HASH_WRITE_OBJECT);
+
+	git_deflate_init(&s, pack_compression_level);
+
+	hdrlen = encode_in_pack_object_header(obuf, sizeof(obuf), type, size);
+	s.next_out = obuf + hdrlen;
+	s.avail_out = sizeof(obuf) - hdrlen;
+
+	if (*already_hashed_to < size) {
+		size_t hsize = size - *already_hashed_to;
+		if (hsize) {
+			the_hash_algo->update_fn(ctx, buf, hsize);
+		}
+		*already_hashed_to = size;
+	}
+	s.next_in = (void *)buf;
+	s.avail_in = size;
+
+	while (status != Z_STREAM_END) {
+		status = git_deflate(&s, Z_FINISH);
+		if (!s.avail_out || status == Z_STREAM_END) {
+			if (write_object) {
+				size_t written = s.next_out - obuf;
+
+				/* would we bust the size limit? */
+				if (state->nr_written &&
+				    pack_size_limit_cfg &&
+				    pack_size_limit_cfg < state->offset + written) {
+					git_deflate_abort(&s);
+					return -1;
+				}
+
+				hashwrite(state->f, obuf, written);
+				state->offset += written;
+			}
+			s.next_out = obuf;
+			s.avail_out = sizeof(obuf);
+		}
+
+		switch (status) {
+		case Z_OK:
+		case Z_BUF_ERROR:
+		case Z_STREAM_END:
+			continue;
+		default:
+			die("unexpected deflate failure: %d", status);
+		}
+	}
+	git_deflate_end(&s);
+	return 0;
+}
+
 /*
  * Read the contents from fd for size bytes, streaming it to the
  * packfile in state while updating the hash in ctx. Signal a failure
@@ -316,6 +379,50 @@ static void finalize_checkpoint(struct bulk_checkin_packfile *state,
 	}
 }
 
+static int deflate_obj_contents_to_pack_incore(struct bulk_checkin_packfile *state,
+					       git_hash_ctx *ctx,
+					       struct hashfile_checkpoint *checkpoint,
+					       struct object_id *result_oid,
+					       const void *buf, size_t size,
+					       enum object_type type,
+					       const char *path, unsigned flags)
+{
+	struct pack_idx_entry *idx = NULL;
+	off_t already_hashed_to = 0;
+
+	/* Note: idx is non-NULL when we are writing */
+	if (flags & HASH_WRITE_OBJECT)
+		CALLOC_ARRAY(idx, 1);
+
+	while (1) {
+		prepare_checkpoint(state, checkpoint, idx, flags);
+		if (!stream_obj_to_pack_incore(state, ctx, &already_hashed_to,
+					       buf, size, type, path, flags))
+			break;
+		truncate_checkpoint(state, checkpoint, idx);
+	}
+
+	finalize_checkpoint(state, ctx, checkpoint, idx, result_oid);
+
+	return 0;
+}
+
+static int deflate_blob_to_pack_incore(struct bulk_checkin_packfile *state,
+				       struct object_id *result_oid,
+				       const void *buf, size_t size,
+				       const char *path, unsigned flags)
+{
+	git_hash_ctx ctx;
+	struct hashfile_checkpoint checkpoint = {0};
+
+	format_object_header_hash(the_hash_algo, &ctx, &checkpoint, OBJ_BLOB,
+				  size);
+
+	return deflate_obj_contents_to_pack_incore(state, &ctx, &checkpoint,
+						   result_oid, buf, size,
+						   OBJ_BLOB, path, flags);
+}
+
 static int deflate_blob_to_pack(struct bulk_checkin_packfile *state,
 				struct object_id *result_oid,
 				int fd, size_t size,
@@ -396,6 +503,17 @@ int index_blob_bulk_checkin(struct object_id *oid,
 	return status;
 }
 
+int index_blob_bulk_checkin_incore(struct object_id *oid,
+				   const void *buf, size_t size,
+				   const char *path, unsigned flags)
+{
+	int status = deflate_blob_to_pack_incore(&bulk_checkin_packfile, oid,
+						 buf, size, path, flags);
+	if (!odb_transaction_nesting)
+		flush_bulk_checkin_packfile(&bulk_checkin_packfile);
+	return status;
+}
+
 void begin_odb_transaction(void)
 {
 	odb_transaction_nesting += 1;
diff --git a/bulk-checkin.h b/bulk-checkin.h
index aa7286a7b3..1b91daeaee 100644
--- a/bulk-checkin.h
+++ b/bulk-checkin.h
@@ -13,6 +13,10 @@ int index_blob_bulk_checkin(struct object_id *oid,
 			    int fd, size_t size,
 			    const char *path, unsigned flags);
 
+int index_blob_bulk_checkin_incore(struct object_id *oid,
+				   const void *buf, size_t size,
+				   const char *path, unsigned flags);
+
 /*
  * Tell the object database to optimize for adding
  * multiple objects. end_odb_transaction must be called

From patchwork Tue Oct 17 16:31:29 2023
X-Patchwork-Submitter: Taylor Blau
X-Patchwork-Id: 13425627
mSjy56QqGW+pn4Tp9uQ/5yjdAvwK47sQudSxZurErz4q3u8YF2ONoElCOOYP2LQGzLGX 4iT6XvAdWfDf7E7jsmPJgEuN9esLCVbxm1lfF3iOhX1EOoMrmPHKC9+4+LDRNvtO7i95 zWlg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1697560291; x=1698165091; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:x-gm-message-state:from:to:cc:subject:date :message-id:reply-to; bh=J2A8hcMW9oFp1Qg8zy8Z8vYpexHa6tGY5QnuDCf8HvQ=; b=Ni9B0q9RscWoZ6+WQ+J6d/y4R1Lg86Ym/VXZyV/NA+Fxbl/qdedP/SAE8PyIcsMXnp nef7gKItH6RGeMrtv292IEIAkON6MCQawYWmEWoCoDdlXIFkMCMC7pfCkZyW388lGNuw 92JBM0R/aPP7PGGBMs4qGceASstxlf/DYJgpMDgzhTynX2xXDWYj2UGmS/N6CR+GoeLb Wa8TRRaMVpiusEtprx7uXm3/98OYmaoruCU1HaEClujNoXikk7VFl4IcNWQlAUgLapk0 bJmMzjtCkz/mtDT0VHu2G4B/mXmeXtrxVBivkJz6YsZ+TdEhr8slaskW7s2pysmXcmyv iMPg== X-Gm-Message-State: AOJu0YyaZq4lN+DeGRfCJAoGNOw5f4MwfFTk6W7kt8FofSfWySxU9+Wu +Q1FCtZtTruRUg8QgsIHd+gTmF6+gFgkSktnY0v+uQ== X-Google-Smtp-Source: AGHT+IHZN7klXtHwx1L6AhEkJD8wFqMqn7QHoLBe0NVrtwi51o8fuYl9hDEOwcWUMjGk8cMrc7sxTw== X-Received: by 2002:a05:6214:258a:b0:66d:130c:bb9d with SMTP id fq10-20020a056214258a00b0066d130cbb9dmr3162940qvb.13.1697560291008; Tue, 17 Oct 2023 09:31:31 -0700 (PDT) Received: from localhost (104-178-186-189.lightspeed.milwwi.sbcglobal.net. [104.178.186.189]) by smtp.gmail.com with ESMTPSA id eu2-20020ad44f42000000b006575372c845sm676342qvb.119.2023.10.17.09.31.30 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 17 Oct 2023 09:31:30 -0700 (PDT) Date: Tue, 17 Oct 2023 12:31:29 -0400 From: Taylor Blau To: git@vger.kernel.org Cc: Elijah Newren , "Eric W. 
Biederman" , Jeff King , Junio C Hamano , Patrick Steinhardt Subject: [PATCH v2 6/7] bulk-checkin: introduce `index_tree_bulk_checkin_incore()` Message-ID: <57613807d84d19fa2691fcf7fe81c4aa9a575d4b.1697560266.git.me@ttaylorr.com> References: Precedence: bulk X-Mailing-List: git@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,RCVD_IN_DNSWL_NONE,SPF_HELO_NONE,SPF_NONE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net The remaining missing piece in order to teach the `merge-tree` builtin how to write the contents of a merge into a pack is a function to index tree objects into a bulk-checkin pack. This patch implements that missing piece, which is a thin wrapper around all of the functionality introduced in previous commits. If and when Git gains support for a "compatibility" hash algorithm, the changes to support that here will be minimal. The bulk-checkin machinery will need to convert the incoming tree to compute its length under the compatibility hash, necessary to reconstruct its header. With that information (and the converted contents of the tree), the bulk-checkin machinery will have enough to keep track of the converted object's hash in order to update the compatibility mapping. Within `deflate_tree_to_pack_incore()`, the changes should be limited to something like: struct strbuf converted = STRBUF_INIT; if (the_repository->compat_hash_algo) { if (convert_object_file(&compat_obj, the_repository->hash_algo, the_repository->compat_hash_algo, ...) 
< 0) die(...); format_object_header_hash(the_repository->compat_hash_algo, OBJ_TREE, size); } /* compute the converted tree's hash using the compat algorithm */ strbuf_release(&converted); , assuming related changes throughout the rest of the bulk-checkin machinery necessary to update the hash of the converted object, which are likewise minimal in size. Signed-off-by: Taylor Blau --- bulk-checkin.c | 27 +++++++++++++++++++++++++++ bulk-checkin.h | 4 ++++ 2 files changed, 31 insertions(+) diff --git a/bulk-checkin.c b/bulk-checkin.c index 25cd1ffa25..fe13100e04 100644 --- a/bulk-checkin.c +++ b/bulk-checkin.c @@ -423,6 +423,22 @@ static int deflate_blob_to_pack_incore(struct bulk_checkin_packfile *state, OBJ_BLOB, path, flags); } +static int deflate_tree_to_pack_incore(struct bulk_checkin_packfile *state, + struct object_id *result_oid, + const void *buf, size_t size, + const char *path, unsigned flags) +{ + git_hash_ctx ctx; + struct hashfile_checkpoint checkpoint = {0}; + + format_object_header_hash(the_hash_algo, &ctx, &checkpoint, OBJ_TREE, + size); + + return deflate_obj_contents_to_pack_incore(state, &ctx, &checkpoint, + result_oid, buf, size, + OBJ_TREE, path, flags); +} + static int deflate_blob_to_pack(struct bulk_checkin_packfile *state, struct object_id *result_oid, int fd, size_t size, @@ -514,6 +530,17 @@ int index_blob_bulk_checkin_incore(struct object_id *oid, return status; } +int index_tree_bulk_checkin_incore(struct object_id *oid, + const void *buf, size_t size, + const char *path, unsigned flags) +{ + int status = deflate_tree_to_pack_incore(&bulk_checkin_packfile, oid, + buf, size, path, flags); + if (!odb_transaction_nesting) + flush_bulk_checkin_packfile(&bulk_checkin_packfile); + return status; +} + void begin_odb_transaction(void) { odb_transaction_nesting += 1; diff --git a/bulk-checkin.h b/bulk-checkin.h index 1b91daeaee..89786b3954 100644 --- a/bulk-checkin.h +++ b/bulk-checkin.h @@ -17,6 +17,10 @@ int 
index_blob_bulk_checkin_incore(struct object_id *oid, const void *buf, size_t size, const char *path, unsigned flags); +int index_tree_bulk_checkin_incore(struct object_id *oid, + const void *buf, size_t size, + const char *path, unsigned flags); + /* * Tell the object database to optimize for adding * multiple objects. end_odb_transaction must be called From patchwork Tue Oct 17 16:31:32 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Taylor Blau X-Patchwork-Id: 13425628 Received: from lindbergh.monkeyblade.net (lindbergh.monkeyblade.net [23.128.96.19]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 195632D02B for ; Tue, 17 Oct 2023 16:31:37 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=ttaylorr-com.20230601.gappssmtp.com header.i=@ttaylorr-com.20230601.gappssmtp.com header.b="joTGcePD" Received: from mail-qv1-xf34.google.com (mail-qv1-xf34.google.com [IPv6:2607:f8b0:4864:20::f34]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 54443102 for ; Tue, 17 Oct 2023 09:31:35 -0700 (PDT) Received: by mail-qv1-xf34.google.com with SMTP id 6a1803df08f44-65b0e623189so34242006d6.1 for ; Tue, 17 Oct 2023 09:31:35 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ttaylorr-com.20230601.gappssmtp.com; s=20230601; t=1697560294; x=1698165094; darn=vger.kernel.org; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:from:to:cc:subject:date:message-id:reply-to; bh=v3/HE0HcYCCqncHcoyjGoMV6FIIG1SvaKRiOw57KuXc=; b=joTGcePDWUNHdLGhGwdEhvKZj4o+jVm0qSTA4knfXvbSlLzz0cav7LPuD6NDDHwMbZ poXi99nN+QgzXPk4N3QBzgFsL9WyzuBGfEiX4FGiAGrJl5O8qsDBhf9KGlNeyu1++0Z/ 5gtYjoFlvssT+ARVxKPK3Sbner2Pt6NWO99f5AXa7ujABqp0aVdsJMpNfeWlrY2B45GO B6QsWEVgHMisBhwQDUQOzp4bitx0p7y29fN4wSDisisuSAoWka3hmAcIHeIhiU+d+oSF 
f2/4CC9rlAvsLvun0h7Cq0CEKKO5uauAwzXN+6AgQHWzLO6jur9cLk6p+k7/0+OYPT9o 6EWg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1697560294; x=1698165094; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:x-gm-message-state:from:to:cc:subject:date :message-id:reply-to; bh=v3/HE0HcYCCqncHcoyjGoMV6FIIG1SvaKRiOw57KuXc=; b=w+sG3+1PYkYzltXO3oddr6782ugnEuzmovgf878M6t2VDvK3x9Zje9aKJlhz46J+8S 6tUJv+buI/nyvaBK8gGoBFsm8ElverBks/6tCt8+8h2e0+jap+CSkrDvlvw/6P9zWSTT SO3//UJh/CufOlzssbXIv13a2ixqrtqUgEhy0qu5l+gBSEOKErBLN+9X1q6pmKlgB1Ge dqKAgSu6LgAFJMACfaVBH9kRb+4WzjQ+y+TSoMJHu8/UmxjysSP8rBlj44g3kHny5LUQ 2y5jwfgutHHm+NZXvaYiaopj/q8MksgngmCqxFCMLMx1LL8Fc79UWW7yD3GIjK+JvaMP Zv7g== X-Gm-Message-State: AOJu0Yz6NWpgLf8XdKPdggtUt+YKrzRE4PXBt5RpaXZIfJvN8PJF9WWN 7eJrGtr/NmeEE5ItDJY+1xNqJ9cs5Mt+tSXZOkWX+w== X-Google-Smtp-Source: AGHT+IFv0SfcUFqtW8MCD5I3YDVNoOEkyM7dFF7GEe9QcHr+nr1MDwNsfHFpfET/XsP3rqAjyyhitg== X-Received: by 2002:a05:6214:1d22:b0:66d:1174:3b46 with SMTP id f2-20020a0562141d2200b0066d11743b46mr3630382qvd.50.1697560294250; Tue, 17 Oct 2023 09:31:34 -0700 (PDT) Received: from localhost (104-178-186-189.lightspeed.milwwi.sbcglobal.net. [104.178.186.189]) by smtp.gmail.com with ESMTPSA id g13-20020ad457ad000000b0065d0a4262e0sm677397qvx.70.2023.10.17.09.31.33 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 17 Oct 2023 09:31:33 -0700 (PDT) Date: Tue, 17 Oct 2023 12:31:32 -0400 From: Taylor Blau To: git@vger.kernel.org Cc: Elijah Newren , "Eric W. 
Biederman" , Jeff King , Junio C Hamano , Patrick Steinhardt Subject: [PATCH v2 7/7] builtin/merge-tree.c: implement support for `--write-pack` Message-ID: References: Precedence: bulk X-Mailing-List: git@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,RCVD_IN_DNSWL_BLOCKED,SPF_HELO_NONE,SPF_NONE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net When using merge-tree often within a repository[^1], it is possible to generate a relatively large number of loose objects, which can result in degraded performance, and inode exhaustion in extreme cases. Building on the functionality introduced in previous commits, the bulk-checkin machinery now has support to write arbitrary blob and tree objects which are small enough to be held in-core. We can use this to write any blob/tree objects generated by ORT into a separate pack instead of writing them out individually as loose. This functionality is gated behind a new `--write-pack` option to `merge-tree` that works with the (non-deprecated) `--write-tree` mode. The implementation is relatively straightforward. There are two spots within the ORT mechanism where we call `write_object_file()`, one for content differences within blobs, and another to assemble any new trees necessary to construct the merge. In each of those locations, conditionally replace calls to `write_object_file()` with `index_blob_bulk_checkin_incore()` or `index_tree_bulk_checkin_incore()` depending on which kind of object we are writing. The only remaining task is to begin and end the transaction necessary to initialize the bulk-checkin machinery, and move any new pack(s) it created into the main object store. 
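As a rough illustration of why the transaction matters, here is a toy
model in C. It is not Git code: `struct toy_pack`, `toy_write_incore()`,
`toy_flush()`, `toy_begin_transaction()` and `toy_end_transaction()` are
hypothetical stand-ins. They only mimic the control flow this series
relies on, assuming that an in-core write outside any transaction is
flushed into its own pack immediately (as in
`index_blob_bulk_checkin_incore()` above), while writes inside a
transaction are deferred until the outermost transaction ends. The toy
ignores deflating, hashing and `pack.packSizeLimit` splitting.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Toy model (hypothetical names, not Git's API) of the bulk-checkin
 * transaction pattern: a write outside any transaction is flushed into
 * its own "pack" right away; writes inside a transaction accumulate
 * until the outermost transaction ends.
 */
struct toy_pack {
	size_t nr_buffered; /* objects waiting in the current pack */
	size_t nr_flushed;  /* packs flushed so far */
};

static struct toy_pack toy_state;
static int toy_nesting;

/* Flush the current pack; an empty pack is never written. */
static void toy_flush(struct toy_pack *p)
{
	if (!p->nr_buffered)
		return;
	p->nr_flushed++;
	p->nr_buffered = 0;
}

/* Append one in-core object; flush at once if no transaction is open. */
static int toy_write_incore(const void *buf, size_t size)
{
	(void)buf;  /* a real implementation would deflate buf here */
	(void)size;
	toy_state.nr_buffered++;
	if (!toy_nesting)
		toy_flush(&toy_state);
	return 0;
}

static void toy_begin_transaction(void)
{
	toy_nesting++;
}

static void toy_end_transaction(void)
{
	assert(toy_nesting > 0); /* unbalanced begin/end is a bug */
	if (--toy_nesting)
		return; /* an outer transaction is still open */
	toy_flush(&toy_state);
}
```

For example, one write outside any transaction followed by two writes
inside a nested pair of transactions flushes two packs, not three.
Deferring the flush to the outermost end is what lets every object a
merge produces share a pack, rather than getting one pack per object.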
[^1]: Such is the case at GitHub, where we run presumptive "test merges"
  on open pull requests to see whether we can light up the merge button
  green, depending on whether or not the presumptive merge was
  conflicted.

  This is done in response to a number of user-initiated events,
  including viewing an open pull request whose last test merge is stale
  with respect to the current base and tip of the pull request. As a
  result, merge-tree can be run very frequently on large, active
  repositories.

Signed-off-by: Taylor Blau
---
 Documentation/git-merge-tree.txt |  4 ++
 builtin/merge-tree.c             |  5 ++
 merge-ort.c                      | 42 +++++++++++----
 merge-recursive.h                |  1 +
 t/t4301-merge-tree-write-tree.sh | 93 ++++++++++++++++++++++++++++++++
 5 files changed, 136 insertions(+), 9 deletions(-)

diff --git a/Documentation/git-merge-tree.txt b/Documentation/git-merge-tree.txt
index ffc4fbf7e8..9d37609ef1 100644
--- a/Documentation/git-merge-tree.txt
+++ b/Documentation/git-merge-tree.txt
@@ -69,6 +69,10 @@ OPTIONS
 	specify a merge-base for the merge, and specifying multiple bases is
 	currently not supported. This option is incompatible with `--stdin`.
 
+--write-pack::
+	Write any new objects into a separate packfile instead of as
+	individual loose objects.
+
 [[OUTPUT]]
 OUTPUT
 ------
diff --git a/builtin/merge-tree.c b/builtin/merge-tree.c
index 0de42aecf4..672ebd4c54 100644
--- a/builtin/merge-tree.c
+++ b/builtin/merge-tree.c
@@ -18,6 +18,7 @@
 #include "quote.h"
 #include "tree.h"
 #include "config.h"
+#include "bulk-checkin.h"
 
 static int line_termination = '\n';
 
@@ -414,6 +415,7 @@ struct merge_tree_options {
 	int show_messages;
 	int name_only;
 	int use_stdin;
+	int write_pack;
 };
 
 static int real_merge(struct merge_tree_options *o,
@@ -440,6 +442,7 @@ static int real_merge(struct merge_tree_options *o,
 
 	init_merge_options(&opt, the_repository);
 	opt.show_rename_progress = 0;
+	opt.write_pack = o->write_pack;
 
 	opt.branch1 = branch1;
 	opt.branch2 = branch2;
@@ -548,6 +551,8 @@ int cmd_merge_tree(int argc, const char **argv, const char *prefix)
 			   &merge_base,
 			   N_("commit"),
 			   N_("specify a merge-base for the merge")),
+		OPT_BOOL(0, "write-pack", &o.write_pack,
+			 N_("write new objects to a pack instead of as loose")),
 		OPT_END()
 	};
 
diff --git a/merge-ort.c b/merge-ort.c
index 7857ce9fbd..e198d2bc2b 100644
--- a/merge-ort.c
+++ b/merge-ort.c
@@ -48,6 +48,7 @@
 #include "tree.h"
 #include "unpack-trees.h"
 #include "xdiff-interface.h"
+#include "bulk-checkin.h"
 
 /*
  * We have many arrays of size 3.  Whenever we have such an array, the
@@ -2107,10 +2108,19 @@ static int handle_content_merge(struct merge_options *opt,
 		if ((merge_status < 0) || !result_buf.ptr)
 			ret = error(_("failed to execute internal merge"));
 
-		if (!ret &&
-		    write_object_file(result_buf.ptr, result_buf.size,
-				      OBJ_BLOB, &result->oid))
-			ret = error(_("unable to add %s to database"), path);
+		if (!ret) {
+			ret = opt->write_pack
+				? index_blob_bulk_checkin_incore(&result->oid,
+								 result_buf.ptr,
+								 result_buf.size,
+								 path, 1)
+				: write_object_file(result_buf.ptr,
+						    result_buf.size,
+						    OBJ_BLOB, &result->oid);
+			if (ret)
+				ret = error(_("unable to add %s to database"),
+					    path);
+		}
 
 		free(result_buf.ptr);
 		if (ret)
@@ -3596,7 +3606,8 @@ static int tree_entry_order(const void *a_, const void *b_)
 				 b->string, strlen(b->string), bmi->result.mode);
 }
 
-static int write_tree(struct object_id *result_oid,
+static int write_tree(struct merge_options *opt,
+		      struct object_id *result_oid,
 		      struct string_list *versions,
 		      unsigned int offset,
 		      size_t hash_size)
@@ -3630,8 +3641,14 @@ static int write_tree(struct object_id *result_oid,
 	}
 
 	/* Write this object file out, and record in result_oid */
-	if (write_object_file(buf.buf, buf.len, OBJ_TREE, result_oid))
+	ret = opt->write_pack
+		? index_tree_bulk_checkin_incore(result_oid,
+						 buf.buf, buf.len, "", 1)
+		: write_object_file(buf.buf, buf.len, OBJ_TREE, result_oid);
+
+	if (ret)
 		ret = -1;
+
 	strbuf_release(&buf);
 	return ret;
 }
@@ -3796,8 +3813,8 @@ static int write_completed_directory(struct merge_options *opt,
 		 */
 		dir_info->is_null = 0;
 		dir_info->result.mode = S_IFDIR;
-		if (write_tree(&dir_info->result.oid, &info->versions, offset,
-			       opt->repo->hash_algo->rawsz) < 0)
+		if (write_tree(opt, &dir_info->result.oid, &info->versions,
+			       offset, opt->repo->hash_algo->rawsz) < 0)
 			ret = -1;
 	}
 
@@ -4331,9 +4348,13 @@ static int process_entries(struct merge_options *opt,
 		fflush(stdout);
 		BUG("dir_metadata accounting completely off; shouldn't happen");
 	}
-	if (write_tree(result_oid, &dir_metadata.versions, 0,
+	if (write_tree(opt, result_oid, &dir_metadata.versions, 0,
 		       opt->repo->hash_algo->rawsz) < 0)
 		ret = -1;
+
+	if (opt->write_pack)
+		end_odb_transaction();
+
 cleanup:
 	string_list_clear(&plist, 0);
 	string_list_clear(&dir_metadata.versions, 0);
@@ -4877,6 +4898,9 @@ static void merge_start(struct merge_options *opt, struct merge_result *result)
 	 */
 	strmap_init(&opt->priv->conflicts);
 
+	if (opt->write_pack)
+		begin_odb_transaction();
+
 	trace2_region_leave("merge", "allocate/init", opt->repo);
 }
 
diff --git a/merge-recursive.h b/merge-recursive.h
index b88000e3c2..156e160876 100644
--- a/merge-recursive.h
+++ b/merge-recursive.h
@@ -48,6 +48,7 @@ struct merge_options {
 	unsigned renormalize : 1;
 	unsigned record_conflict_msgs_as_headers : 1;
 	const char *msg_header_prefix;
+	unsigned write_pack : 1;
 
 	/* internal fields used by the implementation */
 	struct merge_options_internal *priv;
diff --git a/t/t4301-merge-tree-write-tree.sh b/t/t4301-merge-tree-write-tree.sh
index 250f721795..2d81ff4de5 100755
--- a/t/t4301-merge-tree-write-tree.sh
+++ b/t/t4301-merge-tree-write-tree.sh
@@ -922,4 +922,97 @@ test_expect_success 'check the input format when --stdin is passed' '
 	test_cmp expect actual
 '
 
+packdir=".git/objects/pack"
+
+test_expect_success 'merge-tree can pack its result with --write-pack' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+
+	# base has lines [3, 4, 5]
+	#   - side adds to the beginning, resulting in [1, 2, 3, 4, 5]
+	#   - other adds to the end, resulting in [3, 4, 5, 6, 7]
+	#
+	# merging the two should result in a new blob object containing
+	# [1, 2, 3, 4, 5, 6, 7], along with a new tree.
+	test_commit -C repo base file "$(test_seq 3 5)" &&
+	git -C repo branch -M main &&
+	git -C repo checkout -b side main &&
+	test_commit -C repo side file "$(test_seq 1 5)" &&
+	git -C repo checkout -b other main &&
+	test_commit -C repo other file "$(test_seq 3 7)" &&
+
+	find repo/$packdir -type f -name "pack-*.idx" >packs.before &&
+	tree="$(git -C repo merge-tree --write-pack \
+		refs/tags/side refs/tags/other)" &&
+	blob="$(git -C repo rev-parse $tree:file)" &&
+	find repo/$packdir -type f -name "pack-*.idx" >packs.after &&
+
+	test_must_be_empty packs.before &&
+	test_line_count = 1 packs.after &&
+
+	git show-index <$(cat packs.after) >objects &&
+	test_line_count = 2 objects &&
+	grep "^[1-9][0-9]* $tree" objects &&
+	grep "^[1-9][0-9]* $blob" objects
+'
+
+test_expect_success 'merge-tree can write multiple packs with --write-pack' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+	(
+		cd repo &&
+
+		git config pack.packSizeLimit 512 &&
+
+		test_seq 512 >f &&
+
+		# "f" contains roughly 2,000 bytes.
+		#
+		# Each side ("foo" and "bar") adds a small amount of data at the
+		# beginning and end of "base", respectively.
+		git add f &&
+		test_tick &&
+		git commit -m base &&
+		git branch -M main &&
+
+		git checkout -b foo main &&
+		{
+			echo foo && cat f
+		} >f.tmp &&
+		mv f.tmp f &&
+		git add f &&
+		test_tick &&
+		git commit -m foo &&
+
+		git checkout -b bar main &&
+		echo bar >>f &&
+		git add f &&
+		test_tick &&
+		git commit -m bar &&
+
+		find $packdir -type f -name "pack-*.idx" >packs.before &&
+		# Merging the two sides should result in a new blob which is
+		# larger than the 512-byte pack.packSizeLimit, thus the result
+		# should be split into two separate packs.
+		tree="$(git merge-tree --write-pack \
+			refs/heads/foo refs/heads/bar)" &&
+		blob="$(git rev-parse $tree:f)" &&
+		find $packdir -type f -name "pack-*.idx" >packs.after &&
+
+		test_must_be_empty packs.before &&
+		test_line_count = 2 packs.after &&
+		for idx in $(cat packs.after)
+		do
+			git show-index <$idx || return 1
+		done >objects &&
+
+		# The resulting set of packs should contain one copy of both
+		# objects, each in a separate pack.
+		test_line_count = 2 objects &&
+		grep "^[1-9][0-9]* $tree" objects &&
+		grep "^[1-9][0-9]* $blob" objects
+
+	)
+'
+
 test_done