From patchwork Wed Oct 18 17:07:48 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Taylor Blau X-Patchwork-Id: 13427515 Received: from lindbergh.monkeyblade.net (lindbergh.monkeyblade.net [23.128.96.19]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 5A24A3D961 for ; Wed, 18 Oct 2023 17:10:16 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=ttaylorr-com.20230601.gappssmtp.com header.i=@ttaylorr-com.20230601.gappssmtp.com header.b="KXm34lYm" Received: from mail-qt1-x832.google.com (mail-qt1-x832.google.com [IPv6:2607:f8b0:4864:20::832]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 562F630EC for ; Wed, 18 Oct 2023 10:08:03 -0700 (PDT) Received: by mail-qt1-x832.google.com with SMTP id d75a77b69052e-41c157bbd30so10545541cf.0 for ; Wed, 18 Oct 2023 10:08:03 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ttaylorr-com.20230601.gappssmtp.com; s=20230601; t=1697648870; x=1698253670; darn=vger.kernel.org; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:from:to:cc:subject:date:message-id:reply-to; bh=6bemIE568wjYmkD8+0jqFSRkI1H3MvGZ4fBnPWLhW4Y=; b=KXm34lYm1xthuWIMiS1js+9P8P2W/x9UfHiNVyQrawob3czVSK1gyHh61bX03LDJOW Y2sqQ+huoMZPVKcF6+GlABkwfHaNBCmBbgFaG+pncrx+IAXM42dOzPDSZDnVG/PKc/SW LC4S87XSAasVmVaM0lyVoiSCd0aKBMjzoxwtvTeeiQUtYaZuu1Uaos7cdluFvujg+21y nUjbzrcrp+hzAkxgl06JJDgbiPs7a3vbp0/aetmng7NiSQR1/+DonaOB108IU54KpZRH 03+heV2LjoTxFeAwtaIUXy8FBBPO60XfKkD+HR2Tl/fhWd35jndj/F2KpT5O6/98HT9U 5hIw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1697648870; x=1698253670; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:x-gm-message-state:from:to:cc:subject:date :message-id:reply-to; 
bh=6bemIE568wjYmkD8+0jqFSRkI1H3MvGZ4fBnPWLhW4Y=; b=lVHHK+mjNVD0BRyF+HYReFks/7LusFw4PgNA0sDES5G4Te2oU1SLXXHDi3EOBqRhE0 7hpZQDDQ++at+hB06X6g32p8ob4aRg6QvsbTKpyTS4ZBKYPKU4JDj7Zoo0Ut2vkg3g6w dO1vIi0BELnpIppB8cQ6iBlaERbBaWexSuu8ZK2A7h/kNuvdYCEmHeS6l080ZMPUDffP /qwTc3ygZ5JnNpeAILkJ3e5XxVpidy7lgOU5Vevh0l2M2kSfujqdET8LoQuknfY6dI5K VU+SOBABvAFtKdfc7RwdWkd9X6Ee8nMiHBC8gmAGteYOPwP92SldONBcNqpQ4oq2DL/v 2wOw== X-Gm-Message-State: AOJu0YyGnQu+TKP/i6SUrM9Dc77xwhLL75D3HNm53pPNrbdzrGisYhCJ X1xT46mDD+zYeG0cGGViCEi7fe664pFj+I/8J4ctzQ== X-Google-Smtp-Source: AGHT+IGYgKvORG9mKnBjLDinDoaTg6v4miW1iO+0/Ed4a3Kli0GSiMvau+565JUbn/cf+p1VItCOvw== X-Received: by 2002:a05:622a:34f:b0:40d:589d:9ce5 with SMTP id r15-20020a05622a034f00b0040d589d9ce5mr6854624qtw.34.1697648870530; Wed, 18 Oct 2023 10:07:50 -0700 (PDT) Received: from localhost (104-178-186-189.lightspeed.milwwi.sbcglobal.net. [104.178.186.189]) by smtp.gmail.com with ESMTPSA id x24-20020ac84a18000000b00417f330026bsm95197qtq.49.2023.10.18.10.07.50 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 18 Oct 2023 10:07:50 -0700 (PDT) Date: Wed, 18 Oct 2023 13:07:48 -0400 From: Taylor Blau To: git@vger.kernel.org Cc: Elijah Newren , "Eric W. 
Biederman" , Jeff King , Junio C Hamano , Patrick Steinhardt Subject: [PATCH v3 01/10] bulk-checkin: factor out `format_object_header_hash()` Message-ID: <2dffa4518339a7b96a885db4c64431276bfeb4d6.1697648864.git.me@ttaylorr.com> References: Precedence: bulk X-Mailing-List: git@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,RCVD_IN_DNSWL_BLOCKED,SPF_HELO_NONE,SPF_NONE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Before deflating a blob into a pack, the bulk-checkin mechanism prepares the object header by calling `format_object_header()` and writing the result into a scratch buffer, which is then fed to the hash context used to compute the object ID. Future commits will add support for deflating multiple kinds of objects into a pack, and will need to perform the same operation. This is a mostly straightforward extraction, with one notable exception. Instead of hard-coding `the_hash_algo`, pass it in to the new function as an argument. This isn't strictly necessary for our immediate purposes here, but will prove useful in the future if/when the bulk-checkin mechanism grows support for the hash transition plan. 
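As a rough illustration of the design choice above (not part of the patch), the sketch below passes the hash algorithm in as a table of function pointers instead of reading a global. All `toy_*` names are hypothetical stand-ins; git's real `git_hash_algo` and `git_hash_ctx` are richer, and the "<type> <size>" plus NUL layout is assumed here to mirror what `format_object_header()` produces.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for a hash context: a byte sum, not a real hash. */
struct toy_ctx {
	unsigned long sum;
	size_t len;
};

/* Hypothetical stand-in for an algorithm vtable. */
struct toy_algo {
	void (*init_fn)(struct toy_ctx *ctx);
	void (*update_fn)(struct toy_ctx *ctx, const void *data, size_t len);
};

static void toy_init(struct toy_ctx *ctx)
{
	ctx->sum = 0;
	ctx->len = 0;
}

static void toy_update(struct toy_ctx *ctx, const void *data, size_t len)
{
	const unsigned char *p = data;
	size_t i;

	for (i = 0; i < len; i++)
		ctx->sum += p[i];
	ctx->len += len;
}

static const struct toy_algo toy_sum_algo = { toy_init, toy_update };

/* Write "<type> <size>" and return its length including the trailing NUL. */
static int toy_format_header(char *buf, size_t buflen,
			     const char *type, size_t size)
{
	return snprintf(buf, buflen, "%s %zu", type, size) + 1;
}

/* The algorithm is a parameter, so the caller chooses which one is used. */
static void toy_header_hash(const struct toy_algo *algop,
			    struct toy_ctx *ctx,
			    const char *type, size_t size)
{
	char header[64];
	int header_len = toy_format_header(header, sizeof(header), type, size);

	algop->init_fn(ctx);
	algop->update_fn(ctx, header, (size_t)header_len);
}
```

Because the caller supplies `algop`, the same helper could hash under a different algorithm simply by being handed a different table, which is the flexibility the message above has in mind for the hash transition plan.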
Signed-off-by: Taylor Blau --- bulk-checkin.c | 25 ++++++++++++++++++------- 1 file changed, 18 insertions(+), 7 deletions(-) diff --git a/bulk-checkin.c b/bulk-checkin.c index 6ce62999e5..fd3c110d1c 100644 --- a/bulk-checkin.c +++ b/bulk-checkin.c @@ -247,6 +247,22 @@ static void prepare_to_stream(struct bulk_checkin_packfile *state, die_errno("unable to write pack header"); } +static void format_object_header_hash(const struct git_hash_algo *algop, + git_hash_ctx *ctx, + struct hashfile_checkpoint *checkpoint, + enum object_type type, + size_t size) +{ + unsigned char header[16384]; + unsigned header_len = format_object_header((char *)header, + sizeof(header), + type, size); + + algop->init_fn(ctx); + algop->update_fn(ctx, header, header_len); + algop->init_fn(&checkpoint->ctx); +} + static int deflate_blob_to_pack(struct bulk_checkin_packfile *state, struct object_id *result_oid, int fd, size_t size, @@ -254,8 +270,6 @@ static int deflate_blob_to_pack(struct bulk_checkin_packfile *state, { off_t seekback, already_hashed_to; git_hash_ctx ctx; - unsigned char obuf[16384]; - unsigned header_len; struct hashfile_checkpoint checkpoint = {0}; struct pack_idx_entry *idx = NULL; @@ -263,11 +277,8 @@ static int deflate_blob_to_pack(struct bulk_checkin_packfile *state, if (seekback == (off_t) -1) return error("cannot find the current offset"); - header_len = format_object_header((char *)obuf, sizeof(obuf), - OBJ_BLOB, size); - the_hash_algo->init_fn(&ctx); - the_hash_algo->update_fn(&ctx, obuf, header_len); - the_hash_algo->init_fn(&checkpoint.ctx); + format_object_header_hash(the_hash_algo, &ctx, &checkpoint, OBJ_BLOB, + size); /* Note: idx is non-NULL when we are writing */ if ((flags & HASH_WRITE_OBJECT) != 0) From patchwork Wed Oct 18 17:07:52 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Taylor Blau X-Patchwork-Id: 13427516 Received: from lindbergh.monkeyblade.net (lindbergh.monkeyblade.net 
[23.128.96.19]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id B7FA23FB01 for ; Wed, 18 Oct 2023 17:10:21 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=ttaylorr-com.20230601.gappssmtp.com header.i=@ttaylorr-com.20230601.gappssmtp.com header.b="rizEGvnB" Received: from mail-qv1-xf2d.google.com (mail-qv1-xf2d.google.com [IPv6:2607:f8b0:4864:20::f2d]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 77607324B for ; Wed, 18 Oct 2023 10:08:07 -0700 (PDT) Received: by mail-qv1-xf2d.google.com with SMTP id 6a1803df08f44-66cfd0b2d58so45115336d6.2 for ; Wed, 18 Oct 2023 10:08:07 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ttaylorr-com.20230601.gappssmtp.com; s=20230601; t=1697648874; x=1698253674; darn=vger.kernel.org; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:from:to:cc:subject:date:message-id:reply-to; bh=7BNgyPonwnkyrYBF3IPJVbzk9x7EP0eogM8Bn+m/YpI=; b=rizEGvnB7cv8Fhi1yh6zFEid3clfjSFKQWlmOEyXzb3gcOjjMJSHn8mutJ8OeT4kzg x9ir4G6ROH0EjTjOqYRaNeg3HbGB00tFU+MbHYxpWtGuSh7MxnKvMKueLlk4M+KfKa4o LBhKDyvtdg8dSY34OdCVYsQFwWGBRQ1S9Ujwiq1boNfuXZFgIcQ4MxH58zkv2ryezh5j rKFxdzYTLVq3784Q95lnr0sC/WNBqfP+lO9GplKBuunoG6qFzTvDlZv8Mz+0IiTlfUml 3zzBd4bRbgXyQduS8b42hQ3e3HeSe5/Iv60TeMQqpy382nrvljG8CygF2czG/FyXKPo3 l7nw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1697648874; x=1698253674; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:x-gm-message-state:from:to:cc:subject:date :message-id:reply-to; bh=7BNgyPonwnkyrYBF3IPJVbzk9x7EP0eogM8Bn+m/YpI=; b=UFaNbopVMB72xUxL9LOK+WBCBtSuREsgkaydYu8cktC+7gFo4bde7zOcnT3WGR0rNU EXfMjOmPwf2sO6+W3nz8uVaGnlQwzoBPymKGCF7YkPOCjBwyOFrKypbzvKi+SdNbvh/0 YmFSPDPCpNgZPO4zL4AX1lI+i5Qn/gi28iI5MJ7TLavNVhip7KgsMozItHsAwUKkU/tQ 
2MIJzvO8R4LHxlHg9GnDqMdJI+T0oOFxm2WJ7srSmPpleKsB1DfFmeSPh+gyDOTbohU1 RA7fwFDUkzh+xsaIuPTAcFnfc7wQhdy55b2Rsk2RJ9YaT6nhjz8pVmoSwOxlVWT29rkw 327A== X-Gm-Message-State: AOJu0Yy9rYwWwoiqP4xFmJk0ljH65ChSXZ78nx2znaun28lTnWp4NZlP nh8Ngxif/uq74mBr0pg7/h3lk5AdEAdnl4PIamFd3w== X-Google-Smtp-Source: AGHT+IGDJLNuJujknARTXunV9G2wC09IBUoNMX1UgDOmyROVbex3UEE+6dZqz0T6H9j9wDLElad6xQ== X-Received: by 2002:a05:6214:628:b0:65a:f332:10f6 with SMTP id a8-20020a056214062800b0065af33210f6mr7114438qvx.35.1697648874309; Wed, 18 Oct 2023 10:07:54 -0700 (PDT) Received: from localhost (104-178-186-189.lightspeed.milwwi.sbcglobal.net. [104.178.186.189]) by smtp.gmail.com with ESMTPSA id l15-20020ad4452f000000b0065b22afe53csm96001qvu.94.2023.10.18.10.07.53 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 18 Oct 2023 10:07:53 -0700 (PDT) Date: Wed, 18 Oct 2023 13:07:52 -0400 From: Taylor Blau To: git@vger.kernel.org Cc: Elijah Newren , "Eric W. Biederman" , Jeff King , Junio C Hamano , Patrick Steinhardt Subject: [PATCH v3 02/10] bulk-checkin: factor out `prepare_checkpoint()` Message-ID: <7a10dc794aad20cfc226184acda1d40b191164d5.1697648864.git.me@ttaylorr.com> References: Precedence: bulk X-Mailing-List: git@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,RCVD_IN_DNSWL_BLOCKED,SPF_HELO_NONE,SPF_NONE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net In a similar spirit as the previous commit, factor out the routine to prepare streaming into a bulk-checkin pack into its own function. Unlike the previous patch, this is a verbatim copy and paste. 
Signed-off-by: Taylor Blau --- bulk-checkin.c | 20 ++++++++++++++------ 1 file changed, 14 insertions(+), 6 deletions(-) diff --git a/bulk-checkin.c b/bulk-checkin.c index fd3c110d1c..c1f5450583 100644 --- a/bulk-checkin.c +++ b/bulk-checkin.c @@ -263,6 +263,19 @@ static void format_object_header_hash(const struct git_hash_algo *algop, algop->init_fn(&checkpoint->ctx); } +static void prepare_checkpoint(struct bulk_checkin_packfile *state, + struct hashfile_checkpoint *checkpoint, + struct pack_idx_entry *idx, + unsigned flags) +{ + prepare_to_stream(state, flags); + if (idx) { + hashfile_checkpoint(state->f, checkpoint); + idx->offset = state->offset; + crc32_begin(state->f); + } +} + static int deflate_blob_to_pack(struct bulk_checkin_packfile *state, struct object_id *result_oid, int fd, size_t size, @@ -287,12 +300,7 @@ static int deflate_blob_to_pack(struct bulk_checkin_packfile *state, already_hashed_to = 0; while (1) { - prepare_to_stream(state, flags); - if (idx) { - hashfile_checkpoint(state->f, &checkpoint); - idx->offset = state->offset; - crc32_begin(state->f); - } + prepare_checkpoint(state, &checkpoint, idx, flags); if (!stream_blob_to_pack(state, &ctx, &already_hashed_to, fd, size, path, flags)) break; From patchwork Wed Oct 18 17:07:56 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Taylor Blau X-Patchwork-Id: 13427450 Received: from lindbergh.monkeyblade.net (lindbergh.monkeyblade.net [23.128.96.19]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 1EFDA3C685 for ; Wed, 18 Oct 2023 17:08:39 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=ttaylorr-com.20230601.gappssmtp.com header.i=@ttaylorr-com.20230601.gappssmtp.com header.b="PbpGXYCf" Received: from mail-qv1-xf34.google.com (mail-qv1-xf34.google.com 
[IPv6:2607:f8b0:4864:20::f34]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 71877326A for ; Wed, 18 Oct 2023 10:08:10 -0700 (PDT) Received: by mail-qv1-xf34.google.com with SMTP id 6a1803df08f44-66d13ac2796so42057976d6.2 for ; Wed, 18 Oct 2023 10:08:10 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ttaylorr-com.20230601.gappssmtp.com; s=20230601; t=1697648877; x=1698253677; darn=vger.kernel.org; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:from:to:cc:subject:date:message-id:reply-to; bh=fzYxvgYmm/IptMXg/jU7ny3eQ+8FXjm1uBaFK6DT4hY=; b=PbpGXYCfjxy03wXqE4OuZENJKpnwr9vOK/4Q+uN7icLLceFUb8vRXrnZP74PFDsNVo 4MBVXjTY96OBKtetGh7stx6l9KTYI3GjWcD2mllcJyA1jgutqSTJ4jL5xWfWV87EXWpu haCUr2iJHZ7LlhTK1Iky0SiOSboa09qFUoyM2/m/0wyDPkk+HGDJKRFfAjtXbrGgkWjK gMAwHsKKIyhKShSbq5usv8oJFVoB5Yq8ACy6CSwvW9S4CcAYwBDV1oe3Bl8I/z4SRdKf 1jGnsYhJwqrH+B2VgVsIErzYcDfl0iwoS9kpnHdZruAXIxrG5F1hZr9fH6teI7n5BlWP tqIg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1697648877; x=1698253677; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:x-gm-message-state:from:to:cc:subject:date :message-id:reply-to; bh=fzYxvgYmm/IptMXg/jU7ny3eQ+8FXjm1uBaFK6DT4hY=; b=ug+o/6BwOQxDCiyMrDUije0nAF+Ew6784SbWnn+l8b5tbYQCF+xxSR++/LivxwFJNW 3nnTv2ttLJHi8PlJrpILgdK0BJQWNiAtk/ILCO0aYXfZgEqsRTQoELYLv7akJYgeXa9t yyc9eiIbchuIIm14hPkJJoBs4uwA2ywqIGjvHYfce9RMMWK9Y0+M+wCPSt8TSaWBnr+0 P5gmGHHZ8lzXjcb5dVnT7nvEVKMKn7Mkc/z09iEeDV1Tu+X4iLuxp+QxcxjUm4LwIUhT 3F+TY7LwUjbfGhV53gRkNKzdMkmdXbtRgahNElbl4gVjOW0iYBuPJkyUBrwecjuzkJEC n14w== X-Gm-Message-State: AOJu0YzRAbhWbm1vaBpKczZrSzPN1Y+qHr7YG1R4SeEF607K8V5aeG+n 3ZJxfw2cZTMEYy2dWpDRJpRj4nhCPLejMCIfgLrfag== X-Google-Smtp-Source: AGHT+IE8cTi6CJ/dAOlXX+UU5pH5Ie2erz3NhPCdw9TZZNpPJSuQeH2e1Djq6qjff8n2a/T5JuJUuQ== X-Received: by 2002:a05:6214:d8e:b0:65d:f1d:d383 with SMTP id e14-20020a0562140d8e00b0065d0f1dd383mr6251643qve.3.1697648877689; 
Wed, 18 Oct 2023 10:07:57 -0700 (PDT) Received: from localhost (104-178-186-189.lightspeed.milwwi.sbcglobal.net. [104.178.186.189]) by smtp.gmail.com with ESMTPSA id u11-20020a05621411ab00b0065b1bcd0d33sm94519qvv.93.2023.10.18.10.07.57 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 18 Oct 2023 10:07:57 -0700 (PDT) Date: Wed, 18 Oct 2023 13:07:56 -0400 From: Taylor Blau To: git@vger.kernel.org Cc: Elijah Newren , "Eric W. Biederman" , Jeff King , Junio C Hamano , Patrick Steinhardt Subject: [PATCH v3 03/10] bulk-checkin: factor out `truncate_checkpoint()` Message-ID: <20c32d2178560180692327d8b93fe2a7adcf6ffd.1697648864.git.me@ttaylorr.com> References: Precedence: bulk X-Mailing-List: git@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,RCVD_IN_DNSWL_BLOCKED,SPF_HELO_NONE,SPF_NONE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net In a similar spirit as previous commits, factor out the routine to truncate a bulk-checkin packfile when writing past the pack size limit. Signed-off-by: Taylor Blau --- bulk-checkin.c | 27 +++++++++++++++++---------- 1 file changed, 17 insertions(+), 10 deletions(-) diff --git a/bulk-checkin.c b/bulk-checkin.c index c1f5450583..b92d7a6f5a 100644 --- a/bulk-checkin.c +++ b/bulk-checkin.c @@ -276,6 +276,22 @@ static void prepare_checkpoint(struct bulk_checkin_packfile *state, } } +static void truncate_checkpoint(struct bulk_checkin_packfile *state, + struct hashfile_checkpoint *checkpoint, + struct pack_idx_entry *idx) +{ + /* + * Writing this object to the current pack will make + * it too big; we need to truncate it, start a new + * pack, and write into it. 
+ */ + if (!idx) + BUG("should not happen"); + hashfile_truncate(state->f, checkpoint); + state->offset = checkpoint->offset; + flush_bulk_checkin_packfile(state); +} + static int deflate_blob_to_pack(struct bulk_checkin_packfile *state, struct object_id *result_oid, int fd, size_t size, @@ -304,16 +320,7 @@ static int deflate_blob_to_pack(struct bulk_checkin_packfile *state, if (!stream_blob_to_pack(state, &ctx, &already_hashed_to, fd, size, path, flags)) break; - /* - * Writing this object to the current pack will make - * it too big; we need to truncate it, start a new - * pack, and write into it. - */ - if (!idx) - BUG("should not happen"); - hashfile_truncate(state->f, &checkpoint); - state->offset = checkpoint.offset; - flush_bulk_checkin_packfile(state); + truncate_checkpoint(state, &checkpoint, idx); if (lseek(fd, seekback, SEEK_SET) == (off_t) -1) return error("cannot seek back"); } From patchwork Wed Oct 18 17:07:59 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Taylor Blau X-Patchwork-Id: 13427517 Received: from lindbergh.monkeyblade.net (lindbergh.monkeyblade.net [23.128.96.19]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id A3FEE3FB12 for ; Wed, 18 Oct 2023 17:10:24 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=ttaylorr-com.20230601.gappssmtp.com header.i=@ttaylorr-com.20230601.gappssmtp.com header.b="2qFzci2b" Received: from mail-qv1-xf2d.google.com (mail-qv1-xf2d.google.com [IPv6:2607:f8b0:4864:20::f2d]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id F303F3580 for ; Wed, 18 Oct 2023 10:08:11 -0700 (PDT) Received: by mail-qv1-xf2d.google.com with SMTP id 6a1803df08f44-66d0169cf43so45160856d6.3 for ; Wed, 18 Oct 2023 10:08:11 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; 
d=ttaylorr-com.20230601.gappssmtp.com; s=20230601; t=1697648881; x=1698253681; darn=vger.kernel.org; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:from:to:cc:subject:date:message-id:reply-to; bh=fdOewODb38aAPzbzHPP7AVJkAW1Xajuiv60PskYm8LA=; b=2qFzci2bb1MnqIX39f9Eay08CiXn7KueCkeXnTfhDkU7aMH4oLmiebj+qS7XpC9GAJ Yj6A3anUpiJH7Bar6CApKjraGzTzmWbTYtQRyWEk/o2QIrWwlQdcw8LCnULBAgmLEYku d7Ze4kxsAPv/4rbTbg07t0BOnigjrlVxN6pXKWhdxz6cGVWDb+Na7kIKVzcJhkwd06bX cvaq268fB23LKXr7JwipFbC13dpgGi+DHtUZz8wE2BO6a9nbeawZzeShveSNoS/cl1N3 nD46OS6h8C4gVOphLGE2340/GTpptjPlTuCsHEs6IlzDwQ7tI6ZzEjPY0yfDsPwUbbEU KMKw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1697648881; x=1698253681; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:x-gm-message-state:from:to:cc:subject:date :message-id:reply-to; bh=fdOewODb38aAPzbzHPP7AVJkAW1Xajuiv60PskYm8LA=; b=s99T+cRt0N4+XEKeG99vqbhR3gbvUyUtiazwx1NXga9ox2nHQYzwaEVqNM7RYF/Aje CYif8qLiLzCXF/yvqihapstGOrLMEleUPgdPgavSvYdMZhwI0vg7pL7n+54lOFQ1dCql bWPE+8+dfW+3K0T4wD30s1/0hEIIxEL8naNe+5sgKl3Sy02jIzbyLIjG8cC/INFNtMt5 yhrmEUD5iGK4yTkAnGEsfb5udx9shbf9Jf6svXLd/cfaPrJLahd+/vHrI4Uur9TkFZ9B vKA4u2+DL8Gk0w/+TlG3qEUMj8r/w2jGPU96A/e84qqYD93LMas5E8T+sB0OKcH1veut PQ2A== X-Gm-Message-State: AOJu0YxUetb2DiVQ9zhwMcvTHMccZI0EZaI+48TKOtXd+DB0mhkyhkof l3MDiFYhgsV0u1+QNbvizsbzYw9xn3ZnCStHw6/VmA== X-Google-Smtp-Source: AGHT+IE0WF+KLVefTIqTLGMbKjQYZp7lFSQSYPi0MRTBNMgni0pxsfInBX55cyhVlxpb8ErLrNtmVA== X-Received: by 2002:a05:6214:2262:b0:66d:253c:9a80 with SMTP id gs2-20020a056214226200b0066d253c9a80mr7978835qvb.54.1697648880871; Wed, 18 Oct 2023 10:08:00 -0700 (PDT) Received: from localhost (104-178-186-189.lightspeed.milwwi.sbcglobal.net. 
[104.178.186.189]) by smtp.gmail.com with ESMTPSA id w16-20020a0cef90000000b006577e289d37sm101494qvr.2.2023.10.18.10.08.00 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 18 Oct 2023 10:08:00 -0700 (PDT) Date: Wed, 18 Oct 2023 13:07:59 -0400 From: Taylor Blau To: git@vger.kernel.org Cc: Elijah Newren , "Eric W. Biederman" , Jeff King , Junio C Hamano , Patrick Steinhardt Subject: [PATCH v3 04/10] bulk-checkin: factor out `finalize_checkpoint()` Message-ID: <893051d0b7aa162396778cd696e98ae507d7f3d6.1697648864.git.me@ttaylorr.com> References: Precedence: bulk X-Mailing-List: git@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,RCVD_IN_DNSWL_BLOCKED,SPF_HELO_NONE,SPF_NONE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net In a similar spirit as previous commits, factor out the routine to finalize the just-written object from the bulk-checkin mechanism. 
Signed-off-by: Taylor Blau --- bulk-checkin.c | 41 +++++++++++++++++++++++++---------------- 1 file changed, 25 insertions(+), 16 deletions(-) diff --git a/bulk-checkin.c b/bulk-checkin.c index b92d7a6f5a..f4914fb6d1 100644 --- a/bulk-checkin.c +++ b/bulk-checkin.c @@ -292,6 +292,30 @@ static void truncate_checkpoint(struct bulk_checkin_packfile *state, flush_bulk_checkin_packfile(state); } +static void finalize_checkpoint(struct bulk_checkin_packfile *state, + git_hash_ctx *ctx, + struct hashfile_checkpoint *checkpoint, + struct pack_idx_entry *idx, + struct object_id *result_oid) +{ + the_hash_algo->final_oid_fn(result_oid, ctx); + if (!idx) + return; + + idx->crc32 = crc32_end(state->f); + if (already_written(state, result_oid)) { + hashfile_truncate(state->f, checkpoint); + state->offset = checkpoint->offset; + free(idx); + } else { + oidcpy(&idx->oid, result_oid); + ALLOC_GROW(state->written, + state->nr_written + 1, + state->alloc_written); + state->written[state->nr_written++] = idx; + } +} + static int deflate_blob_to_pack(struct bulk_checkin_packfile *state, struct object_id *result_oid, int fd, size_t size, @@ -324,22 +348,7 @@ static int deflate_blob_to_pack(struct bulk_checkin_packfile *state, if (lseek(fd, seekback, SEEK_SET) == (off_t) -1) return error("cannot seek back"); } - the_hash_algo->final_oid_fn(result_oid, &ctx); - if (!idx) - return 0; - - idx->crc32 = crc32_end(state->f); - if (already_written(state, result_oid)) { - hashfile_truncate(state->f, &checkpoint); - state->offset = checkpoint.offset; - free(idx); - } else { - oidcpy(&idx->oid, result_oid); - ALLOC_GROW(state->written, - state->nr_written + 1, - state->alloc_written); - state->written[state->nr_written++] = idx; - } + finalize_checkpoint(state, &ctx, &checkpoint, idx, result_oid); return 0; } From patchwork Wed Oct 18 17:08:02 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Taylor Blau X-Patchwork-Id: 
13427518 Received: from lindbergh.monkeyblade.net (lindbergh.monkeyblade.net [23.128.96.19]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 9C7623FB26 for ; Wed, 18 Oct 2023 17:10:26 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=ttaylorr-com.20230601.gappssmtp.com header.i=@ttaylorr-com.20230601.gappssmtp.com header.b="foESqzsq" Received: from mail-qt1-x833.google.com (mail-qt1-x833.google.com [IPv6:2607:f8b0:4864:20::833]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id D3FAB3588 for ; Wed, 18 Oct 2023 10:08:11 -0700 (PDT) Received: by mail-qt1-x833.google.com with SMTP id d75a77b69052e-41b7ec4cceeso14614251cf.1 for ; Wed, 18 Oct 2023 10:08:11 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ttaylorr-com.20230601.gappssmtp.com; s=20230601; t=1697648884; x=1698253684; darn=vger.kernel.org; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:from:to:cc:subject:date:message-id:reply-to; bh=Xa+7gCAWaHsWWVREEEId/KoMSu/bRPLyZxQYLtv6Uzw=; b=foESqzsqzc+aXvwfu5au0yhIcbq9VmUX8TyfHCLwm0IlHD+/Eo3yyeIQeat38sQ7YE uTl0uIPeYYrQ8lsBCkYWYCJOQeoGUVYaAeuZIMPSV6PYQyDx/qVZSkbSgfjGEwW6aw/4 yPawhyIr5/PqZ82FzSgRB5P2vssxqas47gu6aaSMQBhYnOZhnMrlw8vthmRTrSrt/MR8 QoATE4T3PT7+ZyXHxmyOWpq8Gr2p6O2atwJ8Au4ktSQ4nlEhIVPonWAasUOC4xwVmKh3 a+r2YgsBqAMn/aovS5WVG5SODIjk8KPFjp18FfgrPT6TkEaa6eSiP6y3S5hCiS7uqJKI wn9w== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1697648884; x=1698253684; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:x-gm-message-state:from:to:cc:subject:date :message-id:reply-to; bh=Xa+7gCAWaHsWWVREEEId/KoMSu/bRPLyZxQYLtv6Uzw=; b=aPmiCogiIi2QO+s6lhIwsWcntmiJco0VAQqra4bX+Nr3CTN73Suesb9XyVQ4uu2TyW KCnIiGkGvsv2I1NZJ3SwO4mtn1H0nYy6V76krT+woh5VLm2hBL5x9/xtLez2vNTfmqX4 
sd7C0gZi33CeFbysHKGnDtF3xsm64mZW52DSxZG9lIQInsFNwwaJtto6B45VBNJPBn3n 1J5SORO7td41JObXyd+1kPM4CHKejUedYJxvhAJcTWDmHqMhhe2NyPoXt2HczPNxXvx+ 0ozW0isySG/34y302Rzqs46Ld3whWsL0SEclqOVnDOW2y6evRcUN2VB10fZfXP2/kn0i qFcQ== X-Gm-Message-State: AOJu0Ywt/oWCZpgmdZMmJl2tcs1O3n2gMREfiRxXbR7jsr5OmimgEoX/ VXUrC7aRb3A9MQdlDpTXfZssw6AASExApH8OQVo8jw== X-Google-Smtp-Source: AGHT+IGxp/RPPNU9Kp5CGZ5oqAfd3q13hWzz8QOKUWbyloMtL6tf2RNWbSzW2ohbF9WmYxc9H7yItQ== X-Received: by 2002:ac8:598a:0:b0:418:4e7:b82c with SMTP id e10-20020ac8598a000000b0041804e7b82cmr6913060qte.57.1697648883985; Wed, 18 Oct 2023 10:08:03 -0700 (PDT) Received: from localhost (104-178-186-189.lightspeed.milwwi.sbcglobal.net. [104.178.186.189]) by smtp.gmail.com with ESMTPSA id l30-20020ac84cde000000b0041818df8a0dsm97451qtv.36.2023.10.18.10.08.03 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 18 Oct 2023 10:08:03 -0700 (PDT) Date: Wed, 18 Oct 2023 13:08:02 -0400 From: Taylor Blau To: git@vger.kernel.org Cc: Elijah Newren , "Eric W. Biederman" , Jeff King , Junio C Hamano , Patrick Steinhardt Subject: [PATCH v3 05/10] bulk-checkin: extract abstract `bulk_checkin_source` Message-ID: References: Precedence: bulk X-Mailing-List: git@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,RCVD_IN_DNSWL_BLOCKED,SPF_HELO_NONE,SPF_NONE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net A future commit will want to implement a very similar routine as in `stream_blob_to_pack()` with two notable changes: - Instead of streaming just OBJ_BLOBs, this new function may want to stream objects of arbitrary type. - Instead of streaming the object's contents from an open file-descriptor, this new function may want to "stream" its contents from memory. 
To avoid duplicating a significant chunk of code between the existing `stream_blob_to_pack()` and that new function, extract an abstract `bulk_checkin_source`. This concept is currently a thin layer over `lseek()` and `read_in_full()`, but will grow to understand how to perform analogous operations when writing out an object's contents from memory. Suggested-by: Junio C Hamano Signed-off-by: Taylor Blau --- bulk-checkin.c | 61 +++++++++++++++++++++++++++++++++++++++++++------- 1 file changed, 53 insertions(+), 8 deletions(-) diff --git a/bulk-checkin.c b/bulk-checkin.c index f4914fb6d1..fc1d902018 100644 --- a/bulk-checkin.c +++ b/bulk-checkin.c @@ -140,8 +140,41 @@ static int already_written(struct bulk_checkin_packfile *state, struct object_id return 0; } +struct bulk_checkin_source { + enum { SOURCE_FILE } type; + + /* SOURCE_FILE fields */ + int fd; + + /* common fields */ + size_t size; + const char *path; +}; + +static off_t bulk_checkin_source_seek_to(struct bulk_checkin_source *source, + off_t offset) +{ + switch (source->type) { + case SOURCE_FILE: + return lseek(source->fd, offset, SEEK_SET); + default: + BUG("unknown bulk-checkin source: %d", source->type); + } +} + +static ssize_t bulk_checkin_source_read(struct bulk_checkin_source *source, + void *buf, size_t nr) +{ + switch (source->type) { + case SOURCE_FILE: + return read_in_full(source->fd, buf, nr); + default: + BUG("unknown bulk-checkin source: %d", source->type); + } +} + /* - * Read the contents from fd for size bytes, streaming it to the + * Read the contents from 'source' for 'size' bytes, streaming it to the * packfile in state while updating the hash in ctx. 
Signal a failure * by returning a negative value when the resulting pack would exceed * the pack size limit and this is not the first object in the pack, @@ -157,7 +190,7 @@ static int already_written(struct bulk_checkin_packfile *state, struct object_id */ static int stream_blob_to_pack(struct bulk_checkin_packfile *state, git_hash_ctx *ctx, off_t *already_hashed_to, - int fd, size_t size, const char *path, + struct bulk_checkin_source *source, unsigned flags) { git_zstream s; @@ -167,22 +200,28 @@ static int stream_blob_to_pack(struct bulk_checkin_packfile *state, int status = Z_OK; int write_object = (flags & HASH_WRITE_OBJECT); off_t offset = 0; + size_t size = source->size; git_deflate_init(&s, pack_compression_level); - hdrlen = encode_in_pack_object_header(obuf, sizeof(obuf), OBJ_BLOB, size); + hdrlen = encode_in_pack_object_header(obuf, sizeof(obuf), OBJ_BLOB, + size); s.next_out = obuf + hdrlen; s.avail_out = sizeof(obuf) - hdrlen; while (status != Z_STREAM_END) { if (size && !s.avail_in) { ssize_t rsize = size < sizeof(ibuf) ? 
size : sizeof(ibuf); - ssize_t read_result = read_in_full(fd, ibuf, rsize); + ssize_t read_result; + + read_result = bulk_checkin_source_read(source, ibuf, + rsize); if (read_result < 0) - die_errno("failed to read from '%s'", path); + die_errno("failed to read from '%s'", + source->path); if (read_result != rsize) die("failed to read %d bytes from '%s'", - (int)rsize, path); + (int)rsize, source->path); offset += rsize; if (*already_hashed_to < offset) { size_t hsize = offset - *already_hashed_to; @@ -325,6 +364,12 @@ static int deflate_blob_to_pack(struct bulk_checkin_packfile *state, git_hash_ctx ctx; struct hashfile_checkpoint checkpoint = {0}; struct pack_idx_entry *idx = NULL; + struct bulk_checkin_source source = { + .type = SOURCE_FILE, + .fd = fd, + .size = size, + .path = path, + }; seekback = lseek(fd, 0, SEEK_CUR); if (seekback == (off_t) -1) @@ -342,10 +387,10 @@ static int deflate_blob_to_pack(struct bulk_checkin_packfile *state, while (1) { prepare_checkpoint(state, &checkpoint, idx, flags); if (!stream_blob_to_pack(state, &ctx, &already_hashed_to, - fd, size, path, flags)) + &source, flags)) break; truncate_checkpoint(state, &checkpoint, idx); - if (lseek(fd, seekback, SEEK_SET) == (off_t) -1) + if (bulk_checkin_source_seek_to(&source, seekback) == (off_t)-1) return error("cannot seek back"); } finalize_checkpoint(state, &ctx, &checkpoint, idx, result_oid); From patchwork Wed Oct 18 17:08:05 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Taylor Blau X-Patchwork-Id: 13427519 Received: from lindbergh.monkeyblade.net (lindbergh.monkeyblade.net [23.128.96.19]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 711363FE22 for ; Wed, 18 Oct 2023 17:10:29 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) 
header.d=ttaylorr-com.20230601.gappssmtp.com header.i=@ttaylorr-com.20230601.gappssmtp.com header.b="BEqFhjS/" Received: from mail-qv1-xf32.google.com (mail-qv1-xf32.google.com [IPv6:2607:f8b0:4864:20::f32]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 9FF53359D for ; Wed, 18 Oct 2023 10:08:14 -0700 (PDT) Received: by mail-qv1-xf32.google.com with SMTP id 6a1803df08f44-66d24ccc6f2so47665276d6.0 for ; Wed, 18 Oct 2023 10:08:14 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ttaylorr-com.20230601.gappssmtp.com; s=20230601; t=1697648887; x=1698253687; darn=vger.kernel.org; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:from:to:cc:subject:date:message-id:reply-to; bh=t3xJACvcTogI6rqUYPzB/a+ZJMPHbJJ1R0f7E7mJydU=; b=BEqFhjS/16bkMm7ctYmXEeMCbpEEKkOfPt0BsY53DFByQxpih6AzZnGtfuPuyVlR0J GOrcUVdtCsOOgSL9vIIyl5lleBUH1DaxHog2wp5TkVMytSe7zlGva/0AMRII2moXNwf3 CjOusJ4MCWSrwxlM/PBXb+0euZS8PYYdaIqUo6dN8vBxa/7ec4BR29PtuEBTWNhi44Y4 1l8jnZkQGCfwjzldRSc/R5FS7jwJSGjG/w8Kb8TCTZ0Rv1ZilnsL5nJagAWo54Xg2j8m GGh0N9md1w8Z4dZoJ8oncp+brxEns9+C0cNpbt68ENJIzqDIA1urGc5r/wq/DZtTa9GO DYTA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1697648887; x=1698253687; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:x-gm-message-state:from:to:cc:subject:date :message-id:reply-to; bh=t3xJACvcTogI6rqUYPzB/a+ZJMPHbJJ1R0f7E7mJydU=; b=bC5fpP8JeRKuxvTr+LhKzWOscXMIpil+IpSbDp1aPQTyWVjtczl9/aeGcD5PUSzZFR ngzdKvmgL/0vK20K7DUMmsMSWQLhrV2Q+XmyQp0yMl6kohqZmF04yErcWKOEv1MP8b8O ctdwarvcZvyjvbclIqcjn+2N29tm00ZtYwJu0bfV2AGa5pCp3PVST5o+Ya/E01kjE0Ne /gVIQeXLFWsbfGUSp6QyJYpxu6Wl5EMhAriMbDbYXYwKAb0LyVINU7BMUZHhaQlMGmND 7wzYutjwLrNIBeGd0hD8WCMG2F5SZfRN/ZcaeB6V7UJIApHRSlt+jX6EiEnUBAXHoVOv B6Uw== X-Gm-Message-State: AOJu0YyZB/uYrm0yU90KEgVRfo3YchZ5w0XPg7ak4H2HZhvHhtl+Du0o OBC3aoWQV9xAhD9WbrTxAOK3H7eiIfeEytkudOcbBg== X-Google-Smtp-Source: 
AGHT+IHooKkC/yeCbuyOIe5EcDoJ/QznGk/BzFVssYEijkD+WrQi9tdEpdrI6Bv/vd3erh1e+qqumQ== X-Received: by 2002:ad4:4f29:0:b0:63d:580:9c68 with SMTP id fc9-20020ad44f29000000b0063d05809c68mr6273174qvb.32.1697648887025; Wed, 18 Oct 2023 10:08:07 -0700 (PDT) Received: from localhost (104-178-186-189.lightspeed.milwwi.sbcglobal.net. [104.178.186.189]) by smtp.gmail.com with ESMTPSA id c14-20020ac8518e000000b0041b83654af9sm97330qtn.30.2023.10.18.10.08.06 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 18 Oct 2023 10:08:06 -0700 (PDT) Date: Wed, 18 Oct 2023 13:08:05 -0400 From: Taylor Blau To: git@vger.kernel.org Cc: Elijah Newren , "Eric W. Biederman" , Jeff King , Junio C Hamano , Patrick Steinhardt Subject: [PATCH v3 06/10] bulk-checkin: implement `SOURCE_INCORE` mode for `bulk_checkin_source` Message-ID: <4e9bac5bc1a49ca7a96aaee84a46b389c6bfe99b.1697648864.git.me@ttaylorr.com> References: Precedence: bulk X-Mailing-List: git@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,RCVD_IN_DNSWL_BLOCKED,SPF_HELO_NONE,SPF_NONE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Continue to prepare for streaming an object's contents directly from memory by teaching `bulk_checkin_source` how to perform reads and seeks based on an address in memory. Unlike file descriptors, which manage their own offset internally, we have to keep track of how many bytes we've read out of the buffer, and make sure we don't read past the end of the buffer. 
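The bookkeeping described above can be sketched in isolation. This is only an illustration under stated assumptions: `struct mem_source`, `mem_source_read()` and `mem_source_seek_to()` are hypothetical names that do not appear in the patch, and the sketch accepts a seek to an offset equal to the buffer size (as `lseek()` does for a file), whereas the patch as posted rejects that offset.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical stand-in for the SOURCE_INCORE fields; not from the patch. */
struct mem_source {
	const void *buf; /* start of the in-memory contents */
	size_t size;     /* total number of bytes available */
	size_t pos;      /* bytes consumed so far, i.e. our own "file offset" */
};

/* Copy up to nr bytes into out, clamping so we never read past the end. */
static ssize_t mem_source_read(struct mem_source *src, void *out, size_t nr)
{
	size_t left;

	assert(src->pos <= src->size);
	left = src->size - src->pos;
	if (nr > left)
		nr = left;
	if (nr)
		memcpy(out, (const unsigned char *)src->buf + src->pos, nr);
	src->pos += nr;
	return (ssize_t)nr;
}

/* Move the cursor; offsets outside [0, size] are rejected with -1. */
static off_t mem_source_seek_to(struct mem_source *src, off_t offset)
{
	if (offset < 0 || (size_t)offset > src->size)
		return (off_t)-1;
	src->pos = (size_t)offset;
	return offset;
}
```

A short read near the end of the buffer returns fewer bytes than requested, and a read at the end returns 0, which is the same contract a caller already expects from a file descriptor at EOF.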
Suggested-by: Junio C Hamano Signed-off-by: Taylor Blau --- bulk-checkin.c | 18 +++++++++++++++++- 1 file changed, 17 insertions(+), 1 deletion(-) diff --git a/bulk-checkin.c b/bulk-checkin.c index fc1d902018..133e02ce36 100644 --- a/bulk-checkin.c +++ b/bulk-checkin.c @@ -141,11 +141,15 @@ static int already_written(struct bulk_checkin_packfile *state, struct object_id } struct bulk_checkin_source { - enum { SOURCE_FILE } type; + enum { SOURCE_FILE, SOURCE_INCORE } type; /* SOURCE_FILE fields */ int fd; + /* SOURCE_INCORE fields */ + const void *buf; + size_t read; + /* common fields */ size_t size; const char *path; @@ -157,6 +161,11 @@ static off_t bulk_checkin_source_seek_to(struct bulk_checkin_source *source, switch (source->type) { case SOURCE_FILE: return lseek(source->fd, offset, SEEK_SET); + case SOURCE_INCORE: + if (!(0 <= offset && offset < source->size)) + return (off_t)-1; + source->read = offset; + return source->read; default: BUG("unknown bulk-checkin source: %d", source->type); } @@ -168,6 +177,13 @@ static ssize_t bulk_checkin_source_read(struct bulk_checkin_source *source, switch (source->type) { case SOURCE_FILE: return read_in_full(source->fd, buf, nr); + case SOURCE_INCORE: + assert(source->read <= source->size); + if (nr > source->size - source->read) + nr = source->size - source->read; + memcpy(buf, (unsigned char *)source->buf + source->read, nr); + source->read += nr; + return nr; default: BUG("unknown bulk-checkin source: %d", source->type); } From patchwork Wed Oct 18 17:08:08 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Taylor Blau X-Patchwork-Id: 13427511 Received: from lindbergh.monkeyblade.net (lindbergh.monkeyblade.net [23.128.96.19]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 79AC43D396 for ; Wed, 18 Oct 2023 17:09:43 +0000 (UTC) 
jga10winaBhRY1fMKSOAsPzq2nfgcnzalIvWePMY/A== X-Google-Smtp-Source: AGHT+IHkIRVSsGt7RETl2PQ91C6hQzkyfLPrZPktxTF+DLvrHNJpjsM1uqGIyKG6TYpp6/EiGSAgUw== X-Received: by 2002:ad4:5b8f:0:b0:668:da55:6c17 with SMTP id 15-20020ad45b8f000000b00668da556c17mr6909231qvp.49.1697648889977; Wed, 18 Oct 2023 10:08:09 -0700 (PDT) Received: from localhost (104-178-186-189.lightspeed.milwwi.sbcglobal.net. [104.178.186.189]) by smtp.gmail.com with ESMTPSA id w11-20020a0562140b2b00b0066cf09f5ba9sm92912qvj.131.2023.10.18.10.08.09 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 18 Oct 2023 10:08:09 -0700 (PDT) Date: Wed, 18 Oct 2023 13:08:08 -0400 From: Taylor Blau To: git@vger.kernel.org Cc: Elijah Newren , "Eric W. Biederman" , Jeff King , Junio C Hamano , Patrick Steinhardt Subject: [PATCH v3 07/10] bulk-checkin: generify `stream_blob_to_pack()` for arbitrary types Message-ID: <04ec74e3574b8e0cfc503c46fa3481ef196348ac.1697648864.git.me@ttaylorr.com> References: Precedence: bulk X-Mailing-List: git@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,RCVD_IN_DNSWL_BLOCKED,SPF_HELO_NONE,SPF_NONE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net The existing `stream_blob_to_pack()` function is named based on the fact that it knows only how to stream blobs into a bulk-checkin pack. But there is no longer anything in this function which prevents us from writing objects of arbitrary types to the bulk-checkin pack. Prepare to write OBJ_TREEs by removing this assumption, adding an `enum object_type` parameter to this function's argument list, and renaming it to `stream_obj_to_pack()` accordingly. 
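To see why nothing else in the function depends on the object being a blob, it may help to look at where the type ends up: in the variable-length header that precedes each entry in a pack. The following is a rough sketch of that encoding, not Git's `encode_in_pack_object_header()` itself. The `sketch_` names are hypothetical, and the error handling (returning 0 instead of dying) is an assumption made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Type codes as stored in a pack entry header (names are illustrative). */
enum sketch_object_type {
	SKETCH_OBJ_COMMIT = 1,
	SKETCH_OBJ_TREE = 2,
	SKETCH_OBJ_BLOB = 3,
	SKETCH_OBJ_TAG = 4,
};

/*
 * Hypothetical helper: encode type and size into hdr.
 * First byte: continuation bit, 3 bits of type, low 4 bits of size.
 * Following bytes: continuation bit plus 7 more bits of size each.
 * Returns the number of bytes used, or 0 if hdr_len is too small.
 */
static size_t sketch_pack_entry_header(unsigned char *hdr, size_t hdr_len,
				       enum sketch_object_type type,
				       uint64_t size)
{
	size_t n = 0;
	unsigned char c;

	assert(type >= SKETCH_OBJ_COMMIT && type <= SKETCH_OBJ_TAG);
	if (!hdr_len)
		return 0;
	c = (unsigned char)((type << 4) | (size & 0x0f));
	size >>= 4;
	while (size) {
		/* keep room for the final byte written after the loop */
		if (n + 1 >= hdr_len)
			return 0;
		hdr[n++] = (unsigned char)(c | 0x80);
		c = (unsigned char)(size & 0x7f);
		size >>= 7;
	}
	hdr[n++] = c;
	return n;
}
```

Only the first byte differs between a blob and a tree of the same size, so passing the type through as a parameter is enough to reuse the streaming code.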
Signed-off-by: Taylor Blau --- bulk-checkin.c | 15 +++++++-------- 1 file changed, 7 insertions(+), 8 deletions(-) diff --git a/bulk-checkin.c b/bulk-checkin.c index 133e02ce36..f0115efb2e 100644 --- a/bulk-checkin.c +++ b/bulk-checkin.c @@ -204,10 +204,10 @@ static ssize_t bulk_checkin_source_read(struct bulk_checkin_source *source, * status before calling us just in case we ask it to call us again * with a new pack. */ -static int stream_blob_to_pack(struct bulk_checkin_packfile *state, - git_hash_ctx *ctx, off_t *already_hashed_to, - struct bulk_checkin_source *source, - unsigned flags) +static int stream_obj_to_pack(struct bulk_checkin_packfile *state, + git_hash_ctx *ctx, off_t *already_hashed_to, + struct bulk_checkin_source *source, + enum object_type type, unsigned flags) { git_zstream s; unsigned char ibuf[16384]; @@ -220,8 +220,7 @@ static int stream_blob_to_pack(struct bulk_checkin_packfile *state, git_deflate_init(&s, pack_compression_level); - hdrlen = encode_in_pack_object_header(obuf, sizeof(obuf), OBJ_BLOB, - size); + hdrlen = encode_in_pack_object_header(obuf, sizeof(obuf), type, size); s.next_out = obuf + hdrlen; s.avail_out = sizeof(obuf) - hdrlen; @@ -402,8 +401,8 @@ static int deflate_blob_to_pack(struct bulk_checkin_packfile *state, while (1) { prepare_checkpoint(state, &checkpoint, idx, flags); - if (!stream_blob_to_pack(state, &ctx, &already_hashed_to, - &source, flags)) + if (!stream_obj_to_pack(state, &ctx, &already_hashed_to, + &source, OBJ_BLOB, flags)) break; truncate_checkpoint(state, &checkpoint, idx); if (bulk_checkin_source_seek_to(&source, seekback) == (off_t)-1) From patchwork Wed Oct 18 17:08:11 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Taylor Blau X-Patchwork-Id: 13427512 Received: from lindbergh.monkeyblade.net (lindbergh.monkeyblade.net [23.128.96.19]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate 
l0Dipfc+FBPUt5qctuLX2zY091ux9fiC4zvuACqFCjEpRfsNOD+7O3Qk7+YKiPfGg7wr bKQg== X-Gm-Message-State: AOJu0Yw+z7rCYN1ABTf6s8g97S5XvQulV35lF1OGuoKNFk77TqYQ22+F EqXPfQvZcZne5dAj89lx7T5ghUAK+NofSOK7UA9dyQ== X-Google-Smtp-Source: AGHT+IEtvtBpwO4F11ZFWwiszhcQHIB0zCwyJTCamJH6Qyx/et0MSwvlwxfOWue5/PSmpObsApvJxw== X-Received: by 2002:a05:622a:19a8:b0:403:a662:a3c1 with SMTP id u40-20020a05622a19a800b00403a662a3c1mr6733658qtc.29.1697648893009; Wed, 18 Oct 2023 10:08:13 -0700 (PDT) Received: from localhost (104-178-186-189.lightspeed.milwwi.sbcglobal.net. [104.178.186.189]) by smtp.gmail.com with ESMTPSA id z6-20020ac86b86000000b004198ae7f841sm88836qts.90.2023.10.18.10.08.12 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 18 Oct 2023 10:08:12 -0700 (PDT) Date: Wed, 18 Oct 2023 13:08:11 -0400 From: Taylor Blau To: git@vger.kernel.org Cc: Elijah Newren , "Eric W. Biederman" , Jeff King , Junio C Hamano , Patrick Steinhardt Subject: [PATCH v3 08/10] bulk-checkin: introduce `index_blob_bulk_checkin_incore()` Message-ID: <8667b763652ffa71b52b7bd78821e46a6e5fe5a9.1697648864.git.me@ttaylorr.com> References: Precedence: bulk X-Mailing-List: git@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,RCVD_IN_DNSWL_BLOCKED,SPF_HELO_NONE,SPF_NONE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Now that we have factored out many of the common routines necessary to index a new object into a pack created by the bulk-checkin machinery, we can introduce a variant of `index_blob_bulk_checkin()` that acts on blobs whose contents we can fit in memory. 
This will be useful a couple of commits from now, when we give the `merge-tree` builtin a mechanism to create a new pack containing any objects it created during the merge, instead of storing those objects individually as loose.

Similar to the existing `index_blob_bulk_checkin()` function, the entrypoint delegates to `deflate_blob_to_pack_incore()`, which is responsible for formatting and hashing the object header and then deflating the contents into the pack. The latter is accomplished by calling `deflate_obj_contents_to_pack_incore()`, which takes advantage of the earlier refactorings and is responsible for writing the object to the pack and handling any overage from `pack.packSizeLimit`.

The bulk of the new functionality is implemented in `stream_obj_to_pack()`, which, as a result of the earlier refactoring, can stream objects from memory to the bulk-checkin pack.

Consistent with the rest of the bulk-checkin mechanism, there are no direct tests here. In future commits, when we expose this new functionality via the `merge-tree` builtin, we will test it indirectly there.
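The overage handling mentioned above boils down to a retry loop: try to stream the object, and if it would push a non-empty pack past the limit, finish that pack and try again from the start of the object. The toy model below is a sketch of that control flow only. `struct toy_pack` and the `toy_` functions are hypothetical, sizes are counted uncompressed, and no hashing or checkpointing is modeled.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the bulk-checkin pack state. */
struct toy_pack {
	size_t limit;        /* 0 means no limit, like an unset pack.packSizeLimit */
	size_t bytes;        /* bytes in the pack currently being written */
	unsigned nr_objects; /* objects in the current pack */
	unsigned nr_packs;   /* packs finished so far */
};

/* Returns 0 on success, -1 if the object would overflow a non-empty pack. */
static int toy_stream_obj(struct toy_pack *p, size_t size)
{
	if (p->nr_objects && p->limit && p->bytes + size > p->limit)
		return -1;
	p->bytes += size;
	p->nr_objects++;
	return 0;
}

/* Finish the current pack and continue with an empty one. */
static void toy_start_new_pack(struct toy_pack *p)
{
	p->nr_packs++;
	p->bytes = 0;
	p->nr_objects = 0;
}

/* Write one object, rolling over to a new pack if it does not fit. */
static void toy_deflate_obj(struct toy_pack *p, size_t size)
{
	while (1) {
		if (!toy_stream_obj(p, size))
			break;
		/* did not fit: roll over and retry the same object from offset 0 */
		toy_start_new_pack(p);
	}
	assert(p->nr_objects > 0);
}
```

Because an empty pack always accepts the object, the loop runs at most twice per object. An object larger than the limit still gets written, just into a pack of its own.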
Signed-off-by: Taylor Blau --- bulk-checkin.c | 64 ++++++++++++++++++++++++++++++++++++++++++++++++++ bulk-checkin.h | 4 ++++ 2 files changed, 68 insertions(+) diff --git a/bulk-checkin.c b/bulk-checkin.c index f0115efb2e..9ae43648ba 100644 --- a/bulk-checkin.c +++ b/bulk-checkin.c @@ -370,6 +370,59 @@ static void finalize_checkpoint(struct bulk_checkin_packfile *state, } } +static int deflate_obj_contents_to_pack_incore(struct bulk_checkin_packfile *state, + git_hash_ctx *ctx, + struct hashfile_checkpoint *checkpoint, + struct object_id *result_oid, + const void *buf, size_t size, + enum object_type type, + const char *path, unsigned flags) +{ + struct pack_idx_entry *idx = NULL; + off_t already_hashed_to = 0; + struct bulk_checkin_source source = { + .type = SOURCE_INCORE, + .buf = buf, + .size = size, + .read = 0, + .path = path, + }; + + /* Note: idx is non-NULL when we are writing */ + if (flags & HASH_WRITE_OBJECT) + CALLOC_ARRAY(idx, 1); + + while (1) { + prepare_checkpoint(state, checkpoint, idx, flags); + + if (!stream_obj_to_pack(state, ctx, &already_hashed_to, &source, + type, flags)) + break; + truncate_checkpoint(state, checkpoint, idx); + bulk_checkin_source_seek_to(&source, 0); + } + + finalize_checkpoint(state, ctx, checkpoint, idx, result_oid); + + return 0; +} + +static int deflate_blob_to_pack_incore(struct bulk_checkin_packfile *state, + struct object_id *result_oid, + const void *buf, size_t size, + const char *path, unsigned flags) +{ + git_hash_ctx ctx; + struct hashfile_checkpoint checkpoint = {0}; + + format_object_header_hash(the_hash_algo, &ctx, &checkpoint, OBJ_BLOB, + size); + + return deflate_obj_contents_to_pack_incore(state, &ctx, &checkpoint, + result_oid, buf, size, + OBJ_BLOB, path, flags); +} + static int deflate_blob_to_pack(struct bulk_checkin_packfile *state, struct object_id *result_oid, int fd, size_t size, @@ -456,6 +509,17 @@ int index_blob_bulk_checkin(struct object_id *oid, return status; } +int 
index_blob_bulk_checkin_incore(struct object_id *oid, + const void *buf, size_t size, + const char *path, unsigned flags) +{ + int status = deflate_blob_to_pack_incore(&bulk_checkin_packfile, oid, + buf, size, path, flags); + if (!odb_transaction_nesting) + flush_bulk_checkin_packfile(&bulk_checkin_packfile); + return status; +} + void begin_odb_transaction(void) { odb_transaction_nesting += 1; diff --git a/bulk-checkin.h b/bulk-checkin.h index aa7286a7b3..1b91daeaee 100644 --- a/bulk-checkin.h +++ b/bulk-checkin.h @@ -13,6 +13,10 @@ int index_blob_bulk_checkin(struct object_id *oid, int fd, size_t size, const char *path, unsigned flags); +int index_blob_bulk_checkin_incore(struct object_id *oid, + const void *buf, size_t size, + const char *path, unsigned flags); + /* * Tell the object database to optimize for adding * multiple objects. end_odb_transaction must be called From patchwork Wed Oct 18 17:08:14 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Taylor Blau X-Patchwork-Id: 13427514 Received: from lindbergh.monkeyblade.net (lindbergh.monkeyblade.net [23.128.96.19]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id B9E453D3AE for ; Wed, 18 Oct 2023 17:10:15 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=ttaylorr-com.20230601.gappssmtp.com header.i=@ttaylorr-com.20230601.gappssmtp.com header.b="kPYiS3N2" Received: from mail-qk1-x72a.google.com (mail-qk1-x72a.google.com [IPv6:2607:f8b0:4864:20::72a]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 56D433845 for ; Wed, 18 Oct 2023 10:08:17 -0700 (PDT) Received: by mail-qk1-x72a.google.com with SMTP id af79cd13be357-778925998cbso58809285a.0 for ; Wed, 18 Oct 2023 10:08:17 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ttaylorr-com.20230601.gappssmtp.com; 
[104.178.186.189]) by smtp.gmail.com with ESMTPSA id m17-20020ae9e011000000b00774830b40d4sm101450qkk.47.2023.10.18.10.08.15 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 18 Oct 2023 10:08:15 -0700 (PDT) Date: Wed, 18 Oct 2023 13:08:14 -0400 From: Taylor Blau To: git@vger.kernel.org Cc: Elijah Newren , "Eric W. Biederman" , Jeff King , Junio C Hamano , Patrick Steinhardt Subject: [PATCH v3 09/10] bulk-checkin: introduce `index_tree_bulk_checkin_incore()` Message-ID: References: Precedence: bulk X-Mailing-List: git@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: The remaining missing piece in order to teach the `merge-tree` builtin how to write the contents of a merge into a pack is a function to index tree objects into a bulk-checkin pack. This patch implements that missing piece, which is a thin wrapper around all of the functionality introduced in previous commits. If and when Git gains support for a "compatibility" hash algorithm, the changes to support that here will be minimal. The bulk-checkin machinery will need to convert the incoming tree to compute its length under the compatibility hash, necessary to reconstruct its header. With that information (and the converted contents of the tree), the bulk-checkin machinery will have enough to keep track of the converted object's hash in order to update the compatibility mapping. Within `deflate_tree_to_pack_incore()`, the changes should be limited to something like: struct strbuf converted = STRBUF_INIT; if (the_repository->compat_hash_algo) { if (convert_object_file(&compat_obj, the_repository->hash_algo, the_repository->compat_hash_algo, ...) 
< 0) die(...); format_object_header_hash(the_repository->compat_hash_algo, OBJ_TREE, size); } /* compute the converted tree's hash using the compat algorithm */ strbuf_release(&converted); , assuming related changes throughout the rest of the bulk-checkin machinery necessary to update the hash of the converted object, which are likewise minimal in size. Signed-off-by: Taylor Blau --- bulk-checkin.c | 27 +++++++++++++++++++++++++++ bulk-checkin.h | 4 ++++ 2 files changed, 31 insertions(+) diff --git a/bulk-checkin.c b/bulk-checkin.c index 9ae43648ba..d088a9c10b 100644 --- a/bulk-checkin.c +++ b/bulk-checkin.c @@ -423,6 +423,22 @@ static int deflate_blob_to_pack_incore(struct bulk_checkin_packfile *state, OBJ_BLOB, path, flags); } +static int deflate_tree_to_pack_incore(struct bulk_checkin_packfile *state, + struct object_id *result_oid, + const void *buf, size_t size, + const char *path, unsigned flags) +{ + git_hash_ctx ctx; + struct hashfile_checkpoint checkpoint = {0}; + + format_object_header_hash(the_hash_algo, &ctx, &checkpoint, OBJ_TREE, + size); + + return deflate_obj_contents_to_pack_incore(state, &ctx, &checkpoint, + result_oid, buf, size, + OBJ_TREE, path, flags); +} + static int deflate_blob_to_pack(struct bulk_checkin_packfile *state, struct object_id *result_oid, int fd, size_t size, @@ -520,6 +536,17 @@ int index_blob_bulk_checkin_incore(struct object_id *oid, return status; } +int index_tree_bulk_checkin_incore(struct object_id *oid, + const void *buf, size_t size, + const char *path, unsigned flags) +{ + int status = deflate_tree_to_pack_incore(&bulk_checkin_packfile, oid, + buf, size, path, flags); + if (!odb_transaction_nesting) + flush_bulk_checkin_packfile(&bulk_checkin_packfile); + return status; +} + void begin_odb_transaction(void) { odb_transaction_nesting += 1; diff --git a/bulk-checkin.h b/bulk-checkin.h index 1b91daeaee..89786b3954 100644 --- a/bulk-checkin.h +++ b/bulk-checkin.h @@ -17,6 +17,10 @@ int 
index_blob_bulk_checkin_incore(struct object_id *oid, const void *buf, size_t size, const char *path, unsigned flags); +int index_tree_bulk_checkin_incore(struct object_id *oid, + const void *buf, size_t size, + const char *path, unsigned flags); + /* * Tell the object database to optimize for adding * multiple objects. end_odb_transaction must be called From patchwork Wed Oct 18 17:08:17 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Taylor Blau X-Patchwork-Id: 13427451 Received: from lindbergh.monkeyblade.net (lindbergh.monkeyblade.net [23.128.96.19]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 5579D3C685 for ; Wed, 18 Oct 2023 17:08:49 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=ttaylorr-com.20230601.gappssmtp.com header.i=@ttaylorr-com.20230601.gappssmtp.com header.b="daHq/dsC" Received: from mail-qk1-x732.google.com (mail-qk1-x732.google.com [IPv6:2607:f8b0:4864:20::732]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id A31C6386D for ; Wed, 18 Oct 2023 10:08:21 -0700 (PDT) Received: by mail-qk1-x732.google.com with SMTP id af79cd13be357-777719639adso139233785a.3 for ; Wed, 18 Oct 2023 10:08:21 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ttaylorr-com.20230601.gappssmtp.com; s=20230601; t=1697648899; x=1698253699; darn=vger.kernel.org; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:from:to:cc:subject:date:message-id:reply-to; bh=WYc5Y9iYFqj6E4GnaqAXiGbXRhIZ8LpcV4snUepGbzg=; b=daHq/dsCjIoFVuq7tbSUuChb+9knxW2/B7iUWuLlW5AJwb16oHd4k3s3QQ/1pXaWZM AhItsYhEK5sNy7XR4iPXIetYLEa/IjJcioaziqiTXPycD6SBNhCnG1sL/PmZV+72DEse DxTNjdYLBUwKO//k+igz1PLtVROJa8XcLyT6cwx52A1e1mVbJh96Lj2rRMcj70tXIZdp vQQjzK9bQPpc0/q81tddxbU6m4aTwXktSWdItjZqoW+BrZr6/RTCZNQmeOgHHfhWGfnH 
sXSG+qP6lDc030qFHP58ms3zj9Y4L7wH1N3FSDYIBrYhrSR2Xa1Y6UaIBfD0OPXBKS/d kD2Q== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1697648899; x=1698253699; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:x-gm-message-state:from:to:cc:subject:date :message-id:reply-to; bh=WYc5Y9iYFqj6E4GnaqAXiGbXRhIZ8LpcV4snUepGbzg=; b=bfUkFLjmnHZlkuh1LfvVts97Rw699e03yN8ILqiNE47cl2DfUu+ZVss88IYsJGLvvi To276b4a4CFx8/okKm77VXwDSMFDKVaGasu8mTNP0pxH77ianpd8EiT+Eh6B7o/ratiH n/W/lEOAA5sMB8qsWOF9phSmy6LHZu3ew2BEVaA8D3hXQmsX6HW51/tS7I+Z/cbGTOsc k7ylFcZbLTSrxr2lzlXYad2+mJZ5xWPRgR+8mXYnycqffeY+m7+RsTSxbw3zMCjDSXXQ 0n16nTAJgCWpYlI7PJmCQFHMCIBWQEWq/RBcuEcgjC/SqwwXkI8qXUQuQHQ+42hSLzfk +YBQ== X-Gm-Message-State: AOJu0YwIgKNYGAfgSaoRtkDHYJ2e8OCf+2rG+cz2t3Zzr2t3ABayuTId b5NkEqdHzpFew8bj+zj2ruHJPg/gfWlNus95KxPYFw== X-Google-Smtp-Source: AGHT+IFx6J6h7115bdXERiZMa7dXuyKxB1zdMYD+SI2LWCUm+V2i+5b6Mgo4uEw7fmbnA6sQTrPadg== X-Received: by 2002:a05:620a:25d2:b0:774:3497:a7a3 with SMTP id y18-20020a05620a25d200b007743497a7a3mr5897601qko.17.1697648899039; Wed, 18 Oct 2023 10:08:19 -0700 (PDT) Received: from localhost (104-178-186-189.lightspeed.milwwi.sbcglobal.net. [104.178.186.189]) by smtp.gmail.com with ESMTPSA id s27-20020a05620a16bb00b007756f60bcacsm96178qkj.79.2023.10.18.10.08.18 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 18 Oct 2023 10:08:18 -0700 (PDT) Date: Wed, 18 Oct 2023 13:08:17 -0400 From: Taylor Blau To: git@vger.kernel.org Cc: Elijah Newren , "Eric W. 
Biederman" , Jeff King , Junio C Hamano , Patrick Steinhardt Subject: [PATCH v3 10/10] builtin/merge-tree.c: implement support for `--write-pack` Message-ID: References: Precedence: bulk X-Mailing-List: git@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,RCVD_IN_DNSWL_BLOCKED,SPF_HELO_NONE,SPF_NONE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net When using merge-tree often within a repository[^1], it is possible to generate a relatively large number of loose objects, which can result in degraded performance, and inode exhaustion in extreme cases. Building on the functionality introduced in previous commits, the bulk-checkin machinery now has support to write arbitrary blob and tree objects which are small enough to be held in-core. We can use this to write any blob/tree objects generated by ORT into a separate pack instead of writing them out individually as loose. This functionality is gated behind a new `--write-pack` option to `merge-tree` that works with the (non-deprecated) `--write-tree` mode. The implementation is relatively straightforward. There are two spots within the ORT mechanism where we call `write_object_file()`, one for content differences within blobs, and another to assemble any new trees necessary to construct the merge. In each of those locations, conditionally replace calls to `write_object_file()` with `index_blob_bulk_checkin_incore()` or `index_tree_bulk_checkin_incore()` depending on which kind of object we are writing. The only remaining task is to begin and end the transaction necessary to initialize the bulk-checkin machinery, and move any new pack(s) it created into the main object store. 
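The reason the transaction matters can be shown with a small model of the nesting counter. This is a sketch, not the bulk-checkin code: the `toy_` names and counters are hypothetical, and "flushing" here just counts how many packs would have been finalized.

```c
#include <assert.h>

/* Hypothetical counters standing in for the bulk-checkin state. */
static int toy_nesting;      /* depth of open transactions */
static unsigned toy_pending; /* objects written but not yet flushed */
static unsigned toy_packs;   /* packs moved into the object store */

/* Finalize the current pack, if it contains anything. */
static void toy_flush(void)
{
	if (!toy_pending)
		return;
	toy_packs++;
	toy_pending = 0;
}

static void toy_begin_transaction(void)
{
	toy_nesting++;
}

/* Only the outermost end flushes; inner ends just unwind the counter. */
static void toy_end_transaction(void)
{
	toy_nesting--;
	assert(toy_nesting >= 0);
	if (!toy_nesting)
		toy_flush();
}

/* Outside a transaction, every object ends up in a pack of its own. */
static void toy_index_object(void)
{
	toy_pending++;
	if (!toy_nesting)
		toy_flush();
}
```

In this model, two objects indexed outside a transaction produce two packs, while any number indexed between a begin and the matching end share one. That is the behavior the merge relies on when it brackets its writes with `begin_odb_transaction()` and `end_odb_transaction()`.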
[^1]: Such is the case at GitHub, where we run presumptive "test merges" on open pull requests to see whether or not we can light up the merge button green depending on whether or not the presumptive merge was conflicted. This is done in response to a number of user-initiated events, including viewing an open pull request whose last test merge is stale with respect to the current base and tip of the pull request. As a result, merge-tree can be run very frequently on large, active repositories. Signed-off-by: Taylor Blau --- Documentation/git-merge-tree.txt | 4 ++ builtin/merge-tree.c | 5 ++ merge-ort.c | 42 +++++++++++---- merge-recursive.h | 1 + t/t4301-merge-tree-write-tree.sh | 93 ++++++++++++++++++++++++++++++++ 5 files changed, 136 insertions(+), 9 deletions(-) diff --git a/Documentation/git-merge-tree.txt b/Documentation/git-merge-tree.txt index ffc4fbf7e8..9d37609ef1 100644 --- a/Documentation/git-merge-tree.txt +++ b/Documentation/git-merge-tree.txt @@ -69,6 +69,10 @@ OPTIONS specify a merge-base for the merge, and specifying multiple bases is currently not supported. This option is incompatible with `--stdin`. +--write-pack:: + Write any new objects into a separate packfile instead of as + individual loose objects. 
+
 [[OUTPUT]]
 OUTPUT
 ------
diff --git a/builtin/merge-tree.c b/builtin/merge-tree.c
index 0de42aecf4..672ebd4c54 100644
--- a/builtin/merge-tree.c
+++ b/builtin/merge-tree.c
@@ -18,6 +18,7 @@
 #include "quote.h"
 #include "tree.h"
 #include "config.h"
+#include "bulk-checkin.h"
 
 static int line_termination = '\n';
 
@@ -414,6 +415,7 @@ struct merge_tree_options {
 	int show_messages;
 	int name_only;
 	int use_stdin;
+	int write_pack;
 };
 
 static int real_merge(struct merge_tree_options *o,
@@ -440,6 +442,7 @@ static int real_merge(struct merge_tree_options *o,
 	init_merge_options(&opt, the_repository);
 
 	opt.show_rename_progress = 0;
+	opt.write_pack = o->write_pack;
 
 	opt.branch1 = branch1;
 	opt.branch2 = branch2;
@@ -548,6 +551,8 @@ int cmd_merge_tree(int argc, const char **argv, const char *prefix)
 			   &merge_base,
 			   N_("commit"),
 			   N_("specify a merge-base for the merge")),
+		OPT_BOOL(0, "write-pack", &o.write_pack,
+			 N_("write new objects to a pack instead of as loose")),
 		OPT_END()
 	};
 
diff --git a/merge-ort.c b/merge-ort.c
index 7857ce9fbd..e198d2bc2b 100644
--- a/merge-ort.c
+++ b/merge-ort.c
@@ -48,6 +48,7 @@
 #include "tree.h"
 #include "unpack-trees.h"
 #include "xdiff-interface.h"
+#include "bulk-checkin.h"
 
 /*
  * We have many arrays of size 3. Whenever we have such an array, the
@@ -2107,10 +2108,19 @@ static int handle_content_merge(struct merge_options *opt,
 		if ((merge_status < 0) || !result_buf.ptr)
 			ret = error(_("failed to execute internal merge"));
 
-		if (!ret &&
-		    write_object_file(result_buf.ptr, result_buf.size,
-				      OBJ_BLOB, &result->oid))
-			ret = error(_("unable to add %s to database"), path);
+		if (!ret) {
+			ret = opt->write_pack
+				? index_blob_bulk_checkin_incore(&result->oid,
+								 result_buf.ptr,
+								 result_buf.size,
+								 path, 1)
+				: write_object_file(result_buf.ptr,
+						    result_buf.size,
+						    OBJ_BLOB, &result->oid);
+			if (ret)
+				ret = error(_("unable to add %s to database"),
+					    path);
+		}
 
 		free(result_buf.ptr);
 		if (ret)
@@ -3596,7 +3606,8 @@ static int tree_entry_order(const void *a_, const void *b_)
 				 b->string, strlen(b->string), bmi->result.mode);
 }
 
-static int write_tree(struct object_id *result_oid,
+static int write_tree(struct merge_options *opt,
+		      struct object_id *result_oid,
 		      struct string_list *versions,
 		      unsigned int offset,
 		      size_t hash_size)
@@ -3630,8 +3641,14 @@ static int write_tree(struct object_id *result_oid,
 	}
 
 	/* Write this object file out, and record in result_oid */
-	if (write_object_file(buf.buf, buf.len, OBJ_TREE, result_oid))
+	ret = opt->write_pack
+		? index_tree_bulk_checkin_incore(result_oid,
+						 buf.buf, buf.len, "", 1)
+		: write_object_file(buf.buf, buf.len, OBJ_TREE, result_oid);
+
+	if (ret)
 		ret = -1;
+
 	strbuf_release(&buf);
 	return ret;
 }
@@ -3796,8 +3813,8 @@ static int write_completed_directory(struct merge_options *opt,
 		 */
 		dir_info->is_null = 0;
 		dir_info->result.mode = S_IFDIR;
-		if (write_tree(&dir_info->result.oid, &info->versions, offset,
-			       opt->repo->hash_algo->rawsz) < 0)
+		if (write_tree(opt, &dir_info->result.oid, &info->versions,
+			       offset, opt->repo->hash_algo->rawsz) < 0)
 			ret = -1;
 	}
 
@@ -4331,9 +4348,13 @@ static int process_entries(struct merge_options *opt,
 		fflush(stdout);
 		BUG("dir_metadata accounting completely off; shouldn't happen");
 	}
-	if (write_tree(result_oid, &dir_metadata.versions, 0,
+	if (write_tree(opt, result_oid, &dir_metadata.versions, 0,
 		       opt->repo->hash_algo->rawsz) < 0)
 		ret = -1;
+
+	if (opt->write_pack)
+		end_odb_transaction();
+
 cleanup:
 	string_list_clear(&plist, 0);
 	string_list_clear(&dir_metadata.versions, 0);
@@ -4877,6 +4898,9 @@ static void merge_start(struct merge_options *opt, struct merge_result *result)
 	 */
 	strmap_init(&opt->priv->conflicts);
 
+	if (opt->write_pack)
+		begin_odb_transaction();
+
 	trace2_region_leave("merge", "allocate/init", opt->repo);
 }
 
diff --git a/merge-recursive.h b/merge-recursive.h
index b88000e3c2..156e160876 100644
--- a/merge-recursive.h
+++ b/merge-recursive.h
@@ -48,6 +48,7 @@ struct merge_options {
 	unsigned renormalize : 1;
 	unsigned record_conflict_msgs_as_headers : 1;
 	const char *msg_header_prefix;
+	unsigned write_pack : 1;
 
 	/* internal fields used by the implementation */
 	struct merge_options_internal *priv;
diff --git a/t/t4301-merge-tree-write-tree.sh b/t/t4301-merge-tree-write-tree.sh
index 250f721795..2d81ff4de5 100755
--- a/t/t4301-merge-tree-write-tree.sh
+++ b/t/t4301-merge-tree-write-tree.sh
@@ -922,4 +922,97 @@ test_expect_success 'check the input format when --stdin is passed' '
 	test_cmp expect actual
 '
 
+packdir=".git/objects/pack"
+
+test_expect_success 'merge-tree can pack its result with --write-pack' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+
+	# base has lines [3, 4, 5]
+	#   - side adds to the beginning, resulting in [1, 2, 3, 4, 5]
+	#   - other adds to the end, resulting in [3, 4, 5, 6, 7]
+	#
+	# merging the two should result in a new blob object containing
+	# [1, 2, 3, 4, 5, 6, 7], along with a new tree.
+	test_commit -C repo base file "$(test_seq 3 5)" &&
+	git -C repo branch -M main &&
+	git -C repo checkout -b side main &&
+	test_commit -C repo side file "$(test_seq 1 5)" &&
+	git -C repo checkout -b other main &&
+	test_commit -C repo other file "$(test_seq 3 7)" &&
+
+	find repo/$packdir -type f -name "pack-*.idx" >packs.before &&
+	tree="$(git -C repo merge-tree --write-pack \
+		refs/tags/side refs/tags/other)" &&
+	blob="$(git -C repo rev-parse $tree:file)" &&
+	find repo/$packdir -type f -name "pack-*.idx" >packs.after &&
+
+	test_must_be_empty packs.before &&
+	test_line_count = 1 packs.after &&
+
+	git show-index <$(cat packs.after) >objects &&
+	test_line_count = 2 objects &&
+	grep "^[1-9][0-9]* $tree" objects &&
+	grep "^[1-9][0-9]* $blob" objects
+'
+
+test_expect_success 'merge-tree can write multiple packs with --write-pack' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+	(
+		cd repo &&
+
+		git config pack.packSizeLimit 512 &&
+
+		test_seq 512 >f &&
+
+		# "f" contains roughly 2,000 bytes.
+		#
+		# Each side ("foo" and "bar") adds a small amount of data at the
+		# beginning and end of "base", respectively.
+		git add f &&
+		test_tick &&
+		git commit -m base &&
+		git branch -M main &&
+
+		git checkout -b foo main &&
+		{
+			echo foo && cat f
+		} >f.tmp &&
+		mv f.tmp f &&
+		git add f &&
+		test_tick &&
+		git commit -m foo &&
+
+		git checkout -b bar main &&
+		echo bar >>f &&
+		git add f &&
+		test_tick &&
+		git commit -m bar &&
+
+		find $packdir -type f -name "pack-*.idx" >packs.before &&
+		# Merging either side should result in a new object which is
+		# larger than the pack.packSizeLimit configured above, thus
+		# the result should be split into two separate packs.
+		tree="$(git merge-tree --write-pack \
+			refs/heads/foo refs/heads/bar)" &&
+		blob="$(git rev-parse $tree:f)" &&
+		find $packdir -type f -name "pack-*.idx" >packs.after &&
+
+		test_must_be_empty packs.before &&
+		test_line_count = 2 packs.after &&
+		for idx in $(cat packs.after)
+		do
+			git show-index <$idx || return 1
+		done >objects &&
+
+		# The resulting set of packs should contain one copy of both
+		# objects, each in a separate pack.
+		test_line_count = 2 objects &&
+		grep "^[1-9][0-9]* $tree" objects &&
+		grep "^[1-9][0-9]* $blob" objects
+
+	)
+'
+
 test_done
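
As a rough sanity check on the sizes used in the second test, the merged contents of "f" can be rebuilt without git. This is only a sketch under stated assumptions: it assumes a POSIX shell with `seq` available (standing in for the test suite's `test_seq` helper), the variable names are illustrative, and it measures the uncompressed payload, whereas pack.packSizeLimit applies to the bytes actually written to the pack.

```shell
# Value passed to "git config pack.packSizeLimit" in the test.
limit=512

# Rebuild the merged "f": "foo" is prepended by one side, "bar" is
# appended by the other, and the base is the numbers 1 through 512.
# (seq is an assumption here; the test itself uses test_seq.)
merged_size=$( { echo foo; seq 1 512; echo bar; } | wc -c | tr -d ' ' )

echo "merged blob payload: $merged_size bytes (limit: $limit)"

if [ "$merged_size" -gt "$limit" ]; then
	echo "uncompressed payload alone exceeds the configured limit"
fi
```

The payload comes to about 2 KB, well above the 512-byte limit before compression. How the blob and tree are then distributed across packs is decided by the bulk-checkin code, which this sketch does not model.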