From patchwork Mon Oct 23 22:44:56 2023
X-Patchwork-Submitter: Taylor Blau
X-Patchwork-Id: 13433669
Date: Mon, 23 Oct 2023 18:44:56 -0400
From: Taylor Blau
To: git@vger.kernel.org
Cc: Elijah Newren, "Eric W. Biederman", Jeff King, Junio C Hamano, Patrick Steinhardt
Subject: [PATCH v5 1/5] bulk-checkin: extract abstract `bulk_checkin_source`
Message-ID: <696aa027e46ddec310812fad2d4b12082447d925.1698101088.git.me@ttaylorr.com>

A future commit will want to implement a routine very similar to
`stream_blob_to_pack()` with two notable changes:

  - Instead of streaming just OBJ_BLOBs, this new function may want to
    stream objects of arbitrary type.

  - Instead of streaming the object's contents from an open
    file-descriptor, this new function may want to "stream" its
    contents from memory.

To avoid duplicating a significant chunk of code between the existing
`stream_blob_to_pack()` and this new routine, extract an abstract
`bulk_checkin_source`. This concept is currently a thin wrapper around
`lseek()` and `read_in_full()`, but will grow to understand how to
perform analogous operations when writing out an object's contents from
memory.

Suggested-by: Junio C Hamano
Signed-off-by: Taylor Blau
---
 bulk-checkin.c | 65 +++++++++++++++++++++++++++++++++++++++++++-------
 1 file changed, 57 insertions(+), 8 deletions(-)

diff --git a/bulk-checkin.c b/bulk-checkin.c
index 6ce62999e5..174a6c24e4 100644
--- a/bulk-checkin.c
+++ b/bulk-checkin.c
@@ -140,8 +140,49 @@ static int already_written(struct bulk_checkin_packfile *state, struct object_id
 	return 0;
 }
 
+struct bulk_checkin_source {
+	off_t (*read)(struct bulk_checkin_source *, void *, size_t);
+	off_t (*seek)(struct bulk_checkin_source *, off_t);
+
+	union {
+		struct {
+			int fd;
+		} from_fd;
+	} data;
+
+	size_t size;
+	const char *path;
+};
+
+static off_t bulk_checkin_source_read_from_fd(struct bulk_checkin_source *source,
+					      void *buf, size_t nr)
+{
+	return read_in_full(source->data.from_fd.fd, buf, nr);
+}
+
+static off_t bulk_checkin_source_seek_from_fd(struct bulk_checkin_source *source,
+					      off_t offset)
+{
+	return lseek(source->data.from_fd.fd, offset, SEEK_SET);
+}
+
+static void init_bulk_checkin_source_from_fd(struct bulk_checkin_source *source,
+					     int fd, size_t size,
+					     const char *path)
+{
+	memset(source, 0, sizeof(struct bulk_checkin_source));
+
+	source->read = bulk_checkin_source_read_from_fd;
+	source->seek = bulk_checkin_source_seek_from_fd;
+
+	source->data.from_fd.fd = fd;
+
+	source->size = size;
+	source->path = path;
+}
+
 /*
- * Read the contents from fd for size bytes, streaming it to the
+ * Read the contents from 'source' for 'size' bytes, streaming it to the
  * packfile in state while updating the hash in ctx. Signal a failure
  * by returning a negative value when the resulting pack would exceed
  * the pack size limit and this is not the first object in the pack,
@@ -157,7 +198,7 @@ static int already_written(struct bulk_checkin_packfile *state, struct object_id
  */
 static int stream_blob_to_pack(struct bulk_checkin_packfile *state,
 			       git_hash_ctx *ctx, off_t *already_hashed_to,
-			       int fd, size_t size, const char *path,
+			       struct bulk_checkin_source *source,
 			       unsigned flags)
 {
 	git_zstream s;
@@ -167,22 +208,27 @@ static int stream_blob_to_pack(struct bulk_checkin_packfile *state,
 	int status = Z_OK;
 	int write_object = (flags & HASH_WRITE_OBJECT);
 	off_t offset = 0;
+	size_t size = source->size;
 
 	git_deflate_init(&s, pack_compression_level);
 
-	hdrlen = encode_in_pack_object_header(obuf, sizeof(obuf), OBJ_BLOB, size);
+	hdrlen = encode_in_pack_object_header(obuf, sizeof(obuf), OBJ_BLOB,
+					      size);
 	s.next_out = obuf + hdrlen;
 	s.avail_out = sizeof(obuf) - hdrlen;
 
 	while (status != Z_STREAM_END) {
 		if (size && !s.avail_in) {
 			ssize_t rsize = size < sizeof(ibuf) ? size : sizeof(ibuf);
-			ssize_t read_result = read_in_full(fd, ibuf, rsize);
+			ssize_t read_result;
+
+			read_result = source->read(source, ibuf, rsize);
 			if (read_result < 0)
-				die_errno("failed to read from '%s'", path);
+				die_errno("failed to read from '%s'",
+					  source->path);
 			if (read_result != rsize)
 				die("failed to read %d bytes from '%s'",
-				    (int)rsize, path);
+				    (int)rsize, source->path);
 			offset += rsize;
 			if (*already_hashed_to < offset) {
 				size_t hsize = offset - *already_hashed_to;
@@ -258,6 +304,9 @@ static int deflate_blob_to_pack(struct bulk_checkin_packfile *state,
 	unsigned header_len;
 	struct hashfile_checkpoint checkpoint = {0};
 	struct pack_idx_entry *idx = NULL;
+	struct bulk_checkin_source source;
+
+	init_bulk_checkin_source_from_fd(&source, fd, size, path);
 
 	seekback = lseek(fd, 0, SEEK_CUR);
 	if (seekback == (off_t) -1)
@@ -283,7 +332,7 @@ static int deflate_blob_to_pack(struct bulk_checkin_packfile *state,
 			crc32_begin(state->f);
 		}
 		if (!stream_blob_to_pack(state, &ctx, &already_hashed_to,
-					 fd, size, path, flags))
+					 &source, flags))
 			break;
 		/*
 		 * Writing this object to the current pack will make
@@ -295,7 +344,7 @@ static int deflate_blob_to_pack(struct bulk_checkin_packfile *state,
 		hashfile_truncate(state->f, &checkpoint);
 		state->offset = checkpoint.offset;
 		flush_bulk_checkin_packfile(state);
-		if (lseek(fd, seekback, SEEK_SET) == (off_t) -1)
+		if (source.seek(&source, seekback) == (off_t)-1)
 			return error("cannot seek back");
 	}
 	the_hash_algo->final_oid_fn(result_oid, &ctx);
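
As an aside for readers skimming the diff: the new struct amounts to a
two-entry vtable over the file descriptor. The following is an
illustrative sketch (not part of the patch) of how a caller would drive
it, with `fd`, `size`, and `path` standing in for values that
`deflate_blob_to_pack()` already has on hand:

	struct bulk_checkin_source source;
	unsigned char ibuf[16384];
	off_t nread;

	init_bulk_checkin_source_from_fd(&source, fd, size, path);

	/* delegates to read_in_full() on the underlying descriptor */
	nread = source.read(&source, ibuf, sizeof(ibuf));
	if (nread < 0)
		die_errno("failed to read from '%s'", source.path);

	/* delegates to lseek(fd, 0, SEEK_SET) */
	if (source.seek(&source, 0) == (off_t)-1)
		die("cannot seek '%s'", source.path);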
From patchwork Mon Oct 23 22:44:59 2023
X-Patchwork-Submitter: Taylor Blau
X-Patchwork-Id: 13433670
Date: Mon, 23 Oct 2023 18:44:59 -0400
From: Taylor Blau
To: git@vger.kernel.org
Cc: Elijah Newren, "Eric W. Biederman", Jeff King, Junio C Hamano, Patrick Steinhardt
Subject: [PATCH v5 2/5] bulk-checkin: generify `stream_blob_to_pack()` for arbitrary types
Message-ID: <596bd028a74f45c8f7ecf46dc5eb25f45ff5f523.1698101088.git.me@ttaylorr.com>

The existing `stream_blob_to_pack()` function is named based on the
fact that it knows only how to stream blobs into a bulk-checkin pack.

But there is no longer anything in this function which prevents us from
writing objects of arbitrary types to the bulk-checkin pack. Prepare to
write OBJ_TREEs by removing this assumption, adding an `enum
object_type` parameter to this function's argument list, and renaming
it to `stream_obj_to_pack()` accordingly.

Signed-off-by: Taylor Blau
---
 bulk-checkin.c | 61 +++++++++++++++++++++++++++++---------------------
 1 file changed, 36 insertions(+), 25 deletions(-)

diff --git a/bulk-checkin.c b/bulk-checkin.c
index 174a6c24e4..79776e679e 100644
--- a/bulk-checkin.c
+++ b/bulk-checkin.c
@@ -196,10 +196,10 @@ static void init_bulk_checkin_source_from_fd(struct bulk_checkin_source *source,
  * status before calling us just in case we ask it to call us again
  * with a new pack.
*/ -static int stream_blob_to_pack(struct bulk_checkin_packfile *state, - git_hash_ctx *ctx, off_t *already_hashed_to, - struct bulk_checkin_source *source, - unsigned flags) +static int stream_obj_to_pack(struct bulk_checkin_packfile *state, + git_hash_ctx *ctx, off_t *already_hashed_to, + struct bulk_checkin_source *source, + enum object_type type, unsigned flags) { git_zstream s; unsigned char ibuf[16384]; @@ -212,8 +212,7 @@ static int stream_blob_to_pack(struct bulk_checkin_packfile *state, git_deflate_init(&s, pack_compression_level); - hdrlen = encode_in_pack_object_header(obuf, sizeof(obuf), OBJ_BLOB, - size); + hdrlen = encode_in_pack_object_header(obuf, sizeof(obuf), type, size); s.next_out = obuf + hdrlen; s.avail_out = sizeof(obuf) - hdrlen; @@ -293,27 +292,23 @@ static void prepare_to_stream(struct bulk_checkin_packfile *state, die_errno("unable to write pack header"); } -static int deflate_blob_to_pack(struct bulk_checkin_packfile *state, - struct object_id *result_oid, - int fd, size_t size, - const char *path, unsigned flags) + +static int deflate_obj_to_pack(struct bulk_checkin_packfile *state, + struct object_id *result_oid, + struct bulk_checkin_source *source, + enum object_type type, + off_t seekback, + unsigned flags) { - off_t seekback, already_hashed_to; + off_t already_hashed_to = 0; git_hash_ctx ctx; unsigned char obuf[16384]; unsigned header_len; struct hashfile_checkpoint checkpoint = {0}; struct pack_idx_entry *idx = NULL; - struct bulk_checkin_source source; - init_bulk_checkin_source_from_fd(&source, fd, size, path); - - seekback = lseek(fd, 0, SEEK_CUR); - if (seekback == (off_t) -1) - return error("cannot find the current offset"); - - header_len = format_object_header((char *)obuf, sizeof(obuf), - OBJ_BLOB, size); + header_len = format_object_header((char *)obuf, sizeof(obuf), type, + source->size); the_hash_algo->init_fn(&ctx); the_hash_algo->update_fn(&ctx, obuf, header_len); the_hash_algo->init_fn(&checkpoint.ctx); @@ -322,8 +317,6 @@ static int deflate_blob_to_pack(struct bulk_checkin_packfile *state, if ((flags & HASH_WRITE_OBJECT) != 0) CALLOC_ARRAY(idx, 1); - already_hashed_to = 0; - while (1) { prepare_to_stream(state, flags); if (idx) { @@ -331,8 +324,8 @@ static int deflate_blob_to_pack(struct bulk_checkin_packfile *state, idx->offset = state->offset; crc32_begin(state->f); } - if (!stream_blob_to_pack(state, &ctx, &already_hashed_to, - &source, flags)) + if (!stream_obj_to_pack(state, &ctx, &already_hashed_to, + source, type, flags)) break; /* * Writing this object to the current pack will make @@ -344,7 +337,7 @@ static int deflate_blob_to_pack(struct bulk_checkin_packfile *state, hashfile_truncate(state->f, &checkpoint); state->offset = checkpoint.offset; flush_bulk_checkin_packfile(state); - if (source.seek(&source, seekback) == (off_t)-1) + if (source->seek(source, seekback) == (off_t)-1) return error("cannot seek back"); } the_hash_algo->final_oid_fn(result_oid, &ctx); @@ -366,6 +359,24 @@ static int deflate_blob_to_pack(struct bulk_checkin_packfile *state, return 0; } +static int deflate_blob_to_pack(struct bulk_checkin_packfile *state, + struct object_id *result_oid, + int fd, size_t size, + const char *path, unsigned flags) +{ + struct bulk_checkin_source source; + off_t seekback; + + init_bulk_checkin_source_from_fd(&source, fd, size, path); + + seekback = lseek(fd, 0, SEEK_CUR); + if (seekback == (off_t) -1) + return error("cannot find the current offset"); + + return deflate_obj_to_pack(state, result_oid, &source, OBJ_BLOB, + 
seekback, flags);
+}
+
 void prepare_loose_object_bulk_checkin(void)
 {
 	/*
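
The effect of the rename is easiest to see at the call sites: after
this patch, writing some other object type through the same code path
is only a matter of passing a different `enum object_type`. For
illustration only (the tree variant does not exist until later in this
series), the two calls would differ in a single argument:

	/* blob path, as rewritten by this patch */
	if (!stream_obj_to_pack(state, &ctx, &already_hashed_to,
				source, OBJ_BLOB, flags))
		break;

	/* a future tree path would be identical apart from the type */
	if (!stream_obj_to_pack(state, &ctx, &already_hashed_to,
				source, OBJ_TREE, flags))
		break;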
From patchwork Mon Oct 23 22:45:01 2023
X-Patchwork-Submitter: Taylor Blau
X-Patchwork-Id: 13433671
Date: Mon, 23 Oct 2023 18:45:01 -0400
From: Taylor Blau
To: git@vger.kernel.org
Cc: Elijah Newren, "Eric W. Biederman", Jeff King, Junio C Hamano, Patrick Steinhardt
Subject: [PATCH v5 3/5] bulk-checkin: introduce `index_blob_bulk_checkin_incore()`

Introduce `index_blob_bulk_checkin_incore()`, which allows streaming
arbitrary blob contents from memory into the bulk-checkin pack.

In order to support streaming from a location in memory, we must
implement a new kind of bulk_checkin_source that does just that. This
implementation is spread out across:

  - init_bulk_checkin_source_incore()
  - bulk_checkin_source_read_incore()
  - bulk_checkin_source_seek_incore()

Note that, unlike file descriptors, which manage their own offset
internally, we have to keep track of how many bytes we've read out of
the buffer, and make sure we don't read past the end of the buffer.

This will be useful in a couple more commits in order to provide the
`merge-tree` builtin with a mechanism to create a new pack containing
any objects it created during the merge, instead of storing those
objects individually as loose.

Similar to the existing `index_blob_bulk_checkin()` function, the new
entry point delegates to `deflate_obj_to_pack_incore()`. That function
in turn delegates to deflate_obj_to_pack(), which is responsible for
formatting the pack header and then deflating the contents into the
pack.

Consistent with the rest of the bulk-checkin mechanism, there are no
direct tests here. In future commits when we expose this new
functionality via the `merge-tree` builtin, we will test it indirectly
there.

Signed-off-by: Taylor Blau
---
 bulk-checkin.c | 75 ++++++++++++++++++++++++++++++++++++++++++++++++++
 bulk-checkin.h |  4 +++
 2 files changed, 79 insertions(+)

diff --git a/bulk-checkin.c b/bulk-checkin.c
index 79776e679e..b728210bc7 100644
--- a/bulk-checkin.c
+++ b/bulk-checkin.c
@@ -148,6 +148,10 @@ struct bulk_checkin_source {
 		struct {
 			int fd;
 		} from_fd;
+		struct {
+			const void *buf;
+			size_t nr_read;
+		} incore;
 	} data;
 
 	size_t size;
@@ -166,6 +170,36 @@ static off_t bulk_checkin_source_seek_from_fd(struct bulk_checkin_source *source
 	return lseek(source->data.from_fd.fd, offset, SEEK_SET);
 }
 
+static off_t bulk_checkin_source_read_incore(struct bulk_checkin_source *source,
+					     void *buf, size_t nr)
+{
+	const unsigned char *src = source->data.incore.buf;
+
+	if (source->data.incore.nr_read > source->size)
+		BUG("read beyond bulk-checkin source buffer end "
+		    "(%"PRIuMAX" > %"PRIuMAX")",
+		    (uintmax_t)source->data.incore.nr_read,
+		    (uintmax_t)source->size);
+
+	if (nr > source->size - source->data.incore.nr_read)
+		nr = source->size - source->data.incore.nr_read;
+
+	src += source->data.incore.nr_read;
+
+	memcpy(buf, src, nr);
+	source->data.incore.nr_read += nr;
+	return nr;
+}
+
+static off_t bulk_checkin_source_seek_incore(struct bulk_checkin_source *source,
+					     off_t offset)
+{
+	if (!(0 <= offset && offset < source->size))
+		return (off_t)-1;
+	source->data.incore.nr_read = offset;
+	return source->data.incore.nr_read;
+}
+
 static void init_bulk_checkin_source_from_fd(struct bulk_checkin_source *source,
 					     int fd, size_t size,
 					     const char *path)
@@ -181,6 +215,22 @@ static void init_bulk_checkin_source_from_fd(struct bulk_checkin_source *source,
 	source->path = path;
 }
 
+static void init_bulk_checkin_source_incore(struct bulk_checkin_source *source,
+					    const void *buf, size_t size,
+					    const char *path)
+{
+	memset(source, 0, sizeof(struct bulk_checkin_source));
+ + source->read = bulk_checkin_source_read_incore; + source->seek = bulk_checkin_source_seek_incore; + + source->data.incore.buf = buf; + source->data.incore.nr_read = 0; + + source->size = size; + source->path = path; +} + /* * Read the contents from 'source' for 'size' bytes, streaming it to the * packfile in state while updating the hash in ctx. Signal a failure @@ -359,6 +409,19 @@ static int deflate_obj_to_pack(struct bulk_checkin_packfile *state, return 0; } +static int deflate_obj_to_pack_incore(struct bulk_checkin_packfile *state, + struct object_id *result_oid, + const void *buf, size_t size, + const char *path, enum object_type type, + unsigned flags) +{ + struct bulk_checkin_source source; + + init_bulk_checkin_source_incore(&source, buf, size, path); + + return deflate_obj_to_pack(state, result_oid, &source, type, 0, flags); +} + static int deflate_blob_to_pack(struct bulk_checkin_packfile *state, struct object_id *result_oid, int fd, size_t size, @@ -421,6 +484,18 @@ int index_blob_bulk_checkin(struct object_id *oid, return status; } +int index_blob_bulk_checkin_incore(struct object_id *oid, + const void *buf, size_t size, + const char *path, unsigned flags) +{ + int status = deflate_obj_to_pack_incore(&bulk_checkin_packfile, oid, + buf, size, path, OBJ_BLOB, + flags); + if (!odb_transaction_nesting) + flush_bulk_checkin_packfile(&bulk_checkin_packfile); + return status; +} + void begin_odb_transaction(void) { odb_transaction_nesting += 1; diff --git a/bulk-checkin.h b/bulk-checkin.h index aa7286a7b3..1b91daeaee 100644 --- a/bulk-checkin.h +++ b/bulk-checkin.h @@ -13,6 +13,10 @@ int index_blob_bulk_checkin(struct object_id *oid, int fd, size_t size, const char *path, unsigned flags); +int index_blob_bulk_checkin_incore(struct object_id *oid, + const void *buf, size_t size, + const char *path, unsigned flags); + /* * Tell the object database to optimize for adding * multiple objects. 
end_odb_transaction must be called
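
For illustration, a minimal sketch (not part of the patch) of how a
caller is expected to use the new entry point; the buffer contents and
path below are placeholders, and `HASH_WRITE_OBJECT` is the same flag
the existing fd-based path already takes:

	struct object_id oid;
	struct strbuf buf = STRBUF_INIT;

	strbuf_addstr(&buf, "hello, world\n");

	begin_odb_transaction();
	if (index_blob_bulk_checkin_incore(&oid, buf.buf, buf.len,
					   "hello.txt", HASH_WRITE_OBJECT))
		die("unable to add blob to the bulk-checkin pack");
	/* flushes the bulk-checkin pack once nesting drops back to zero */
	end_odb_transaction();

	strbuf_release(&buf);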
From patchwork Mon Oct 23 22:45:04 2023
X-Patchwork-Submitter: Taylor Blau
X-Patchwork-Id: 13433672
Date: Mon, 23 Oct 2023 18:45:04 -0400
From: Taylor Blau
To: git@vger.kernel.org
Cc: Elijah Newren, "Eric W. Biederman", Jeff King, Junio C Hamano, Patrick Steinhardt
Subject: [PATCH v5 4/5] bulk-checkin: introduce `index_tree_bulk_checkin_incore()`
Message-ID: <2670192802a904b42fb0c11c26c9f7311aa8dd90.1698101088.git.me@ttaylorr.com>

The remaining missing piece in order to teach the `merge-tree` builtin
how to write the contents of a merge into a pack is a function to index
tree objects into a bulk-checkin pack.

This patch implements that missing piece, which is a thin wrapper
around all of the functionality introduced in previous commits.

If and when Git gains support for a "compatibility" hash algorithm, the
changes to support that here will be minimal. The bulk-checkin
machinery will need to convert the incoming tree to compute its length
under the compatibility hash, necessary to reconstruct its header. With
that information (and the converted contents of the tree), the
bulk-checkin machinery will have enough to keep track of the converted
object's hash in order to update the compatibility mapping.

Within some thin wrapper around `deflate_obj_to_pack_incore()` (perhaps
`deflate_tree_to_pack_incore()`), the changes should be limited to
something like:

    struct strbuf converted = STRBUF_INIT;
    if (the_repository->compat_hash_algo) {
            if (convert_object_file(&compat_obj,
                                    the_repository->hash_algo,
                                    the_repository->compat_hash_algo, ...) < 0)
                    die(...);

            format_object_header_hash(the_repository->compat_hash_algo,
                                      OBJ_TREE, size);
    }
    /* compute the converted tree's hash using the compat algorithm */
    strbuf_release(&converted);

, assuming related changes throughout the rest of the bulk-checkin
machinery necessary to update the hash of the converted object, which
are likewise minimal in size.

Signed-off-by: Taylor Blau
---
 bulk-checkin.c | 12 ++++++++++++
 bulk-checkin.h |  4 ++++
 2 files changed, 16 insertions(+)

diff --git a/bulk-checkin.c b/bulk-checkin.c
index b728210bc7..bd6151ba3c 100644
--- a/bulk-checkin.c
+++ b/bulk-checkin.c
@@ -496,6 +496,18 @@ int index_blob_bulk_checkin_incore(struct object_id *oid,
 	return status;
 }
 
+int index_tree_bulk_checkin_incore(struct object_id *oid,
+				   const void *buf, size_t size,
+				   const char *path, unsigned flags)
+{
+	int status = deflate_obj_to_pack_incore(&bulk_checkin_packfile, oid,
+						buf, size, path, OBJ_TREE,
+						flags);
+	if (!odb_transaction_nesting)
+		flush_bulk_checkin_packfile(&bulk_checkin_packfile);
+	return status;
+}
+
 void begin_odb_transaction(void)
 {
 	odb_transaction_nesting += 1;
diff --git a/bulk-checkin.h b/bulk-checkin.h
index 1b91daeaee..89786b3954 100644
--- a/bulk-checkin.h
+++ b/bulk-checkin.h
@@ -17,6 +17,10 @@ int index_blob_bulk_checkin_incore(struct object_id *oid,
 				   const void *buf, size_t size,
 				   const char *path, unsigned flags);
 
+int index_tree_bulk_checkin_incore(struct object_id *oid,
+				   const void *buf, size_t size,
+				   const char *path, unsigned flags);
+
 /*
  * Tell the object database to optimize for adding
  * multiple objects. end_odb_transaction must be called
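
Before the next patch wires this into merge-ort, a sketch of the
intended call pattern may help; `tree_buf` and the helper that fills it
are placeholders invented here, standing in for merge-ort's own tree
assembly:

	struct object_id tree_oid;
	struct strbuf tree_buf = STRBUF_INIT;

	/*
	 * Placeholder: fill tree_buf with canonical
	 * "<mode> <name>\0<binary oid>" entries, as merge-ort does when
	 * assembling a result tree.
	 */
	fill_tree_entries(&tree_buf);

	if (index_tree_bulk_checkin_incore(&tree_oid, tree_buf.buf,
					   tree_buf.len, "", HASH_WRITE_OBJECT))
		die("unable to add tree to the bulk-checkin pack");

	strbuf_release(&tree_buf);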
From patchwork Mon Oct 23 22:45:06 2023
X-Patchwork-Submitter: Taylor Blau
X-Patchwork-Id: 13433673
Date: Mon, 23 Oct 2023 18:45:06 -0400
From: Taylor Blau
To: git@vger.kernel.org
Cc: Elijah Newren, "Eric W. Biederman", Jeff King, Junio C Hamano, Patrick Steinhardt
Subject: [PATCH v5 5/5] builtin/merge-tree.c: implement support for `--write-pack`
Message-ID: <3595db76a525fcebc3c896e231246704b044310c.1698101088.git.me@ttaylorr.com>

When using merge-tree often within a repository[^1], it is possible to
generate a relatively large number of loose objects, which can result
in degraded performance and, in extreme cases, inode exhaustion.

Building on the functionality introduced in previous commits, the
bulk-checkin machinery now has support to write arbitrary blob and tree
objects which are small enough to be held in-core. We can use this to
write any blob/tree objects generated by ORT into a separate pack
instead of writing them out individually as loose.

This functionality is gated behind a new `--write-pack` option to
`merge-tree` that works with the (non-deprecated) `--write-tree` mode.

The implementation is relatively straightforward. There are two spots
within the ORT mechanism where we call `write_object_file()`: one for
content differences within blobs, and another to assemble any new trees
necessary to construct the merge. In each of those locations,
conditionally replace calls to `write_object_file()` with
`index_blob_bulk_checkin_incore()` or `index_tree_bulk_checkin_incore()`
depending on which kind of object we are writing.

The only remaining task is to begin and end the transaction necessary
to initialize the bulk-checkin machinery, and move any new pack(s) it
created into the main object store.

[^1]: Such is the case at GitHub, where we run presumptive "test
  merges" on open pull requests to see whether we can light up the
  merge button green, depending on whether the presumptive merge was
  conflicted. This is done in response to a number of user-initiated
  events, including viewing an open pull request whose last test merge
  is stale with respect to the current base and tip of the pull
  request. As a result, merge-tree can be run very frequently on large,
  active repositories.

Signed-off-by: Taylor Blau
---
 Documentation/git-merge-tree.txt |  4 ++
 builtin/merge-tree.c             |  5 ++
 merge-ort.c                      | 42 +++++++++++----
 merge-recursive.h                |  1 +
 t/t4301-merge-tree-write-tree.sh | 93 ++++++++++++++++++++++++++++++++
 5 files changed, 136 insertions(+), 9 deletions(-)

diff --git a/Documentation/git-merge-tree.txt b/Documentation/git-merge-tree.txt
index ffc4fbf7e8..9d37609ef1 100644
--- a/Documentation/git-merge-tree.txt
+++ b/Documentation/git-merge-tree.txt
@@ -69,6 +69,10 @@ OPTIONS
 	specify a merge-base for the merge, and specifying multiple bases is
 	currently not supported. This option is incompatible with `--stdin`.
 
+--write-pack::
+	Write any new objects into a separate packfile instead of as
+	individual loose objects.
+ [[OUTPUT]] OUTPUT ------ diff --git a/builtin/merge-tree.c b/builtin/merge-tree.c index a35e0452d6..218442ac9b 100644 --- a/builtin/merge-tree.c +++ b/builtin/merge-tree.c @@ -19,6 +19,7 @@ #include "tree.h" #include "config.h" #include "strvec.h" +#include "bulk-checkin.h" static int line_termination = '\n'; @@ -416,6 +417,7 @@ struct merge_tree_options { int name_only; int use_stdin; struct merge_options merge_options; + int write_pack; }; static int real_merge(struct merge_tree_options *o, @@ -441,6 +443,7 @@ static int real_merge(struct merge_tree_options *o, _("not something we can merge")); opt.show_rename_progress = 0; + opt.write_pack = o->write_pack; opt.branch1 = branch1; opt.branch2 = branch2; @@ -553,6 +556,8 @@ int cmd_merge_tree(int argc, const char **argv, const char *prefix) N_("specify a merge-base for the merge")), OPT_STRVEC('X', "strategy-option", &xopts, N_("option=value"), N_("option for selected merge strategy")), + OPT_BOOL(0, "write-pack", &o.write_pack, + N_("write new objects to a pack instead of as loose")), OPT_END() }; diff --git a/merge-ort.c b/merge-ort.c index 3653725661..523577d71e 100644 --- a/merge-ort.c +++ b/merge-ort.c @@ -48,6 +48,7 @@ #include "tree.h" #include "unpack-trees.h" #include "xdiff-interface.h" +#include "bulk-checkin.h" /* * We have many arrays of size 3. Whenever we have such an array, the @@ -2108,10 +2109,19 @@ static int handle_content_merge(struct merge_options *opt, if ((merge_status < 0) || !result_buf.ptr) ret = error(_("failed to execute internal merge")); - if (!ret && - write_object_file(result_buf.ptr, result_buf.size, - OBJ_BLOB, &result->oid)) - ret = error(_("unable to add %s to database"), path); + if (!ret) { + ret = opt->write_pack + ? index_blob_bulk_checkin_incore(&result->oid, + result_buf.ptr, + result_buf.size, + path, 1) + : write_object_file(result_buf.ptr, + result_buf.size, + OBJ_BLOB, &result->oid); + if (ret) + ret = error(_("unable to add %s to database"), + path); + } free(result_buf.ptr); if (ret) @@ -3597,7 +3607,8 @@ static int tree_entry_order(const void *a_, const void *b_) b->string, strlen(b->string), bmi->result.mode); } -static int write_tree(struct object_id *result_oid, +static int write_tree(struct merge_options *opt, + struct object_id *result_oid, struct string_list *versions, unsigned int offset, size_t hash_size) @@ -3631,8 +3642,14 @@ static int write_tree(struct object_id *result_oid, } /* Write this object file out, and record in result_oid */ - if (write_object_file(buf.buf, buf.len, OBJ_TREE, result_oid)) + ret = opt->write_pack + ? 
index_tree_bulk_checkin_incore(result_oid, + buf.buf, buf.len, "", 1) + : write_object_file(buf.buf, buf.len, OBJ_TREE, result_oid); + + if (ret) ret = -1; + strbuf_release(&buf); return ret; } @@ -3797,8 +3814,8 @@ static int write_completed_directory(struct merge_options *opt, */ dir_info->is_null = 0; dir_info->result.mode = S_IFDIR; - if (write_tree(&dir_info->result.oid, &info->versions, offset, - opt->repo->hash_algo->rawsz) < 0) + if (write_tree(opt, &dir_info->result.oid, &info->versions, + offset, opt->repo->hash_algo->rawsz) < 0) ret = -1; } @@ -4332,9 +4349,13 @@ static int process_entries(struct merge_options *opt, fflush(stdout); BUG("dir_metadata accounting completely off; shouldn't happen"); } - if (write_tree(result_oid, &dir_metadata.versions, 0, + if (write_tree(opt, result_oid, &dir_metadata.versions, 0, opt->repo->hash_algo->rawsz) < 0) ret = -1; + + if (opt->write_pack) + end_odb_transaction(); + cleanup: string_list_clear(&plist, 0); string_list_clear(&dir_metadata.versions, 0); @@ -4878,6 +4899,9 @@ static void merge_start(struct merge_options *opt, struct merge_result *result) */ strmap_init(&opt->priv->conflicts); + if (opt->write_pack) + begin_odb_transaction(); + trace2_region_leave("merge", "allocate/init", opt->repo); } diff --git a/merge-recursive.h b/merge-recursive.h index 3d3b3e3c29..5c5ff380a8 100644 --- a/merge-recursive.h +++ b/merge-recursive.h @@ -48,6 +48,7 @@ struct merge_options { unsigned renormalize : 1; unsigned record_conflict_msgs_as_headers : 1; const char *msg_header_prefix; + unsigned write_pack : 1; /* internal fields used by the implementation */ struct merge_options_internal *priv; diff --git a/t/t4301-merge-tree-write-tree.sh b/t/t4301-merge-tree-write-tree.sh index b2c8a43fce..d2a8634523 100755 --- a/t/t4301-merge-tree-write-tree.sh +++ b/t/t4301-merge-tree-write-tree.sh @@ -945,4 +945,97 @@ test_expect_success 'check the input format when --stdin is passed' ' test_cmp expect actual ' +packdir=".git/objects/pack" + +test_expect_success 'merge-tree can pack its result with --write-pack' ' + test_when_finished "rm -rf repo" && + git init repo && + + # base has lines [3, 4, 5] + # - side adds to the beginning, resulting in [1, 2, 3, 4, 5] + # - other adds to the end, resulting in [3, 4, 5, 6, 7] + # + # merging the two should result in a new blob object containing + # [1, 2, 3, 4, 5, 6, 7], along with a new tree. + test_commit -C repo base file "$(test_seq 3 5)" && + git -C repo branch -M main && + git -C repo checkout -b side main && + test_commit -C repo side file "$(test_seq 1 5)" && + git -C repo checkout -b other main && + test_commit -C repo other file "$(test_seq 3 7)" && + + find repo/$packdir -type f -name "pack-*.idx" >packs.before && + tree="$(git -C repo merge-tree --write-pack \ + refs/tags/side refs/tags/other)" && + blob="$(git -C repo rev-parse $tree:file)" && + find repo/$packdir -type f -name "pack-*.idx" >packs.after && + + test_must_be_empty packs.before && + test_line_count = 1 packs.after && + + git show-index <$(cat packs.after) >objects && + test_line_count = 2 objects && + grep "^[1-9][0-9]* $tree" objects && + grep "^[1-9][0-9]* $blob" objects +' + +test_expect_success 'merge-tree can write multiple packs with --write-pack' ' + test_when_finished "rm -rf repo" && + git init repo && + ( + cd repo && + + git config pack.packSizeLimit 512 && + + test_seq 512 >f && + + # "f" contains roughly ~2,000 bytes. 
+ # + # Each side ("foo" and "bar") adds a small amount of data at the + # beginning and end of "base", respectively. + git add f && + test_tick && + git commit -m base && + git branch -M main && + + git checkout -b foo main && + { + echo foo && cat f + } >f.tmp && + mv f.tmp f && + git add f && + test_tick && + git commit -m foo && + + git checkout -b bar main && + echo bar >>f && + git add f && + test_tick && + git commit -m bar && + + find $packdir -type f -name "pack-*.idx" >packs.before && + # Merging either side should result in a new object which is + # larger than 1M, thus the result should be split into two + # separate packs. + tree="$(git merge-tree --write-pack \ + refs/heads/foo refs/heads/bar)" && + blob="$(git rev-parse $tree:f)" && + find $packdir -type f -name "pack-*.idx" >packs.after && + + test_must_be_empty packs.before && + test_line_count = 2 packs.after && + for idx in $(cat packs.after) + do + git show-index <$idx || return 1 + done >objects && + + # The resulting set of packs should contain one copy of both + # objects, each in a separate pack. + test_line_count = 2 objects && + grep "^[1-9][0-9]* $tree" objects && + grep "^[1-9][0-9]* $blob" objects + + ) +' + test_done
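
Stepping back, the shape of the merge-ort change above can be
summarized by the following sketch of a dispatch helper; the patch
open-codes this as a ternary at each of the two call sites, and
`write_blob_result()` is a name invented here purely for illustration:

	static int write_blob_result(struct merge_options *opt,
				     struct object_id *oid, const char *path,
				     const char *ptr, size_t len)
	{
		/* route through bulk-checkin only when --write-pack is given */
		if (opt->write_pack)
			return index_blob_bulk_checkin_incore(oid, ptr, len,
							      path, 1);
		return write_object_file(ptr, len, OBJ_BLOB, oid);
	}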